#1
Peace,

Consider the following, which represents the kind of code I am dealing with.

Task A:

    (Clocked) Iterative structure (value of iteration)
    begin
        c <= a*b;
        d <= b*f;
        g <= c/a;
        k <= g+d;
    end

(Ignore the sizes of the variables and any fixed-point arithmetic considerations; they are straightforward to deal with and beside the point of my question.)

Typically, later calculations depend on earlier ones, and series of calculations like these take place in iterative loops or nested iterative loops (using counters, not for loops). I have numerous such calculation sequences.

I only want to use 1 multiplier, 1 adder and 1 divider, so I cannot have two or more multiplications, additions or divisions at the same time. The two options I see are:

1. If I use blocking assignments, I will meet the hardware usage constraint; however, clock speed will suffer, because the whole sequence of calculations in Task A will need to complete in one clock cycle. Where there are nested loops of numerous calculations, the clock frequency would be greatly affected.
2. If I use nonblocking assignments, then each calculation will take place on a clock cycle, but all the calculations in the sequence of Task A take place simultaneously, which can imply the use of several multipliers, adders and dividers.

My question: is it possible to use nonblocking assignments for Task A, yet still:

* use only 1 multiplier, 1 adder and 1 divider, and
* have one calculation per clock cycle?

If not, how would you suggest this situation be dealt with in a minimum-hardware design? And what are the timing implications?

Thank you for your time and consideration.
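To make the two options concrete, here is a minimal Python model (a software sketch, not synthesizable HDL) of what one clock edge does to Task A's registers under each assignment style. Register names follow Task A; the starting values are arbitrary assumptions, and integer `//` stands in for the divider. With nonblocking assignments Task A behaves as a three-stage pipeline: `g` uses the previous edge's `c`, and `k` uses the previous edge's `g` and `d`. In both styles `c` and `d` are two separate multiplications in the same cycle, so synthesis would typically infer two multipliers either way.

```python
# Software model (not HDL) of one clock edge of Task A.
# Registers are held in a dict; integer // stands in for the divider.

def blocking_edge(regs):
    """Blocking (=): each statement sees the result of the one before it,
    so mult -> div -> add forms one combinational chain within the cycle."""
    r = dict(regs)
    r["c"] = r["a"] * r["b"]
    r["d"] = r["b"] * r["f"]
    r["g"] = r["c"] // r["a"]
    r["k"] = r["g"] + r["d"]
    return r

def nonblocking_edge(regs):
    """Nonblocking (<=): every right-hand side reads the pre-edge values,
    then all registers update together."""
    old = dict(regs)
    new = dict(regs)
    new["c"] = old["a"] * old["b"]
    new["d"] = old["b"] * old["f"]
    new["g"] = old["c"] // old["a"]
    new["k"] = old["g"] + old["d"]
    return new

start = {"a": 3, "b": 4, "f": 5, "c": 0, "d": 0, "g": 0, "k": 0}
print(blocking_edge(start)["k"])      # 24 after a single edge

r = start
for edge in range(1, 4):
    r = nonblocking_edge(r)
    print(edge, r["k"])               # 1 0, then 2 20, then 3 24
```

With constant inputs, the nonblocking version reaches the same `k` as the blocking one, but only after three edges.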
#2
"Marwan" <marwanboustany@gmail.com> wrote in message news:a43d48c2-d518-4533-83d6-69f4c96228dd@79g2000hsk.googlegroups.com...
> Peace,
>
> Consider: -
>
> The following represents the kind of code I am dealing with.
>
> Task A:
> (Clocked) Iterative structure (value of iteration)
> begin
> c <= a*b;
> d <= b*f;
> g <= c/a;
> k <= g+d;
> end
>
> (Ignore the sizes of the following variables and any fixed point
> arithmetic considerations, they are straight forward to deal with and
> besides the point for my question.)
>

In the particular case of Task A, see if you can apply the following optimization:

    g <= c/a;
    g <= a*b/a;
    g <= b;

Won't work in all similar tasks, though.
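The substitution relies on `c` holding the exact product `a*b` and on `a` being nonzero. A small Python check, assuming unsigned operands and a hypothetical register width `c_bits` for `c` (the thread deliberately leaves widths unspecified), shows where it stops being valid:

```python
# Check whether g = c/a really equals b once c = a*b is stored in a
# finite-width register. Unsigned values; c_bits is a hypothetical width.

def g_from_registers(a, b, c_bits):
    c = (a * b) % (1 << c_bits)   # product truncated to the width of c
    return c // a                 # integer divide, as a hardware divider would

def shortcut_is_valid(a, b, c_bits):
    # `and` short-circuits, so a == 0 never reaches the divide
    return a != 0 and g_from_registers(a, b, c_bits) == b

print(shortcut_is_valid(20, 20, 16))  # True: 400 fits in 16 bits
print(shortcut_is_valid(20, 20, 8))   # False: 400 wraps to 144, 144 // 20 == 7
print(shortcut_is_valid(0, 20, 16))   # False: the original divides by zero
```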
#3
On Aug 9, 4:58 pm, "Robert Miles" <robertmi...@bellsouthNOSPAM.net> wrote:
> In the particular case of Task A, see if you can apply the following
> optimization:
>
> g <= c/a;
>
> g <= a*b/a;
>
> g <= b;
>
> Won't work in all similar tasks, though.

A more general solution is needed because, as you rightly stated, your approach has limited applicability. Task A was just something I typed up quickly to show a multiply, a divide, an add and the interrelationship between the output of one calculation and the input of another... so your little trick there highlights that small oversight.

Peace
#4
Marwan wrote:
> My question: -
> Is it possible to use nonblocking assignments for Task A, yet still: -
> * Use only 1 multiplier, 1 adder and 1 divider
> * Have one calculation per clock cycle
> If not, how would you suggest this situation is dealt with in a
> minimum hardware situation? And what are the timing implications of
> this?

No, because a practical divide will require more than one clock cycle. However, if this is a homework problem, practicality may not be needed.

Non-blocking assignments are safe iff those registers are declared and used in one block. Example: http://mysite.verizon.net/miketreseler/count_enable.v

To spread logic over time, I like to use a case statement on a counter register. I work out timing implications by simulation and trial synthesis.

Sharing logic means adding code to mux the inputs and outputs of the shared section. This may or may not save resources. Note that devices with DSP blocks often have unused hardware multipliers.

> Thank you for your time and consideration.

You are welcome.

--
Mike Treseler
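As a sketch of the counter-register idea applied to Task A, the Python model below steps a counter through three states and routes each operation to the single multiplier, divider or adder allowed in that step. The schedule is one hand-chosen assumption, and each unit is modeled as finishing within its cycle (which, as the reply notes, is not realistic for a divider). The two different operand pairs fed to the multiplier correspond to the input mux mentioned above.

```python
# Model of a counter-driven schedule for Task A with one multiplier,
# one divider and one adder. The step assignment is an assumed hand schedule.

def run_task_a_shared(a, b, f):
    regs = {"c": 0, "d": 0, "g": 0, "k": 0}
    busy = []                       # (step, unit) pairs actually used
    for step in range(3):           # the counter register
        old = dict(regs)            # nonblocking: read pre-edge values
        if step == 0:
            regs["c"] = a * b                 # multiplier, mux selects (a, b)
            busy.append((step, "mul"))
        elif step == 1:
            regs["d"] = b * f                 # multiplier, mux selects (b, f)
            regs["g"] = old["c"] // a         # divider uses c stored at step 0
            busy += [(step, "mul"), (step, "div")]
        else:
            regs["k"] = old["g"] + old["d"]   # adder
            busy.append((step, "add"))
    return regs, busy

regs, busy = run_task_a_shared(3, 4, 5)
print(regs)   # {'c': 12, 'd': 20, 'g': 4, 'k': 24}
assert len(busy) == len(set(busy))  # no unit used twice in one step
```

Four operations fit in three cycles here because `c` and `d` must take turns on the one multiplier, while `d` and `g` can overlap on different units.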
#5
On Aug 9, 5:58 pm, Mike Treseler <mtrese...@gmail.com> wrote:
> No, because a practical divide
> will require more than one clock cycle.
> However, if this is a homework problem,
> practicality may not be needed.

How about if I did not have to have one calculation per clock cycle?

Also, are you saying that I would have to somehow control for the cycles taken by a divider and multiplier (by analogy?) within a sequence of calculations that includes calculations that can take place in one cycle? So if I have

    (Clocked) Iterative structure (value of iteration)
    begin
        ...
        a <= b*c;
        g <= d/r;
        k <= e+w;
        ...
    end

then it could take

    (Clocked) Iterative structure (value of iteration)
    begin
        1 clock cycle
        n clock cycles
        1 clock cycle
    end

?

That sounds difficult to deal with in a synchronous system... somehow making everything wait until the divider is finished before moving on to the next iteration of the set of calculations...

Also, why would a divide need more than one clock cycle? Surely it can calculate asynchronously, and as long as the operating frequency of the system is not too fast, you can get your result in one clock cycle?

>
> Non-blocking assignments are safe iff those registers
> are declared and used in one block. Example: http://mysite.verizon.net/miketreseler/count_enable.v
>
> To spread logic over time, I like to use
> a case of a counter register. I work out
> timing implications by simulation and trial synthesis.

Where the counter counts up to the number of stages/cycles you wish to spread your logic over?

>
> Sharing logic means adding code to mux the inputs
> and outputs of the shared section. This may
> or may not save resources. Note that devices
> with dsp blocks often have unused hardware multipliers.
>

If I were not to reuse hardware, I would need tens of multipliers and dividers... totally impractical. Also, I do not see how muxing inputs and outputs could possibly be more expensive than dividers and multipliers...

OK, an important question for me is this. If I do have the following kind of code:

Task B

    (Clocked) Iterative structure (value of iteration)
    begin
        .
        .
        .
        series of nonblocking mult, divide and add operations
        .
        .
        (Clocked) Iterative structure (value of iteration)
        begin
#6
Marwan wrote:
> On Aug 9, 5:58 pm, Mike Treseler <mtrese...@gmail.com> wrote:
>> No, because a practical divide
>> will require more than one clock cycle.
>> However, if this is a homework problem,
>> practicality may not be needed.
>
> How about if I did not have to have one calculation per clock cycle?

That makes things easier.

> Also are you saying that I would have to somehow control for the
> cycles taken by a divider and multiplier (by analogy?) within a
> sequence of calculations which include calculations that can take
> place in one cycle?

If the complete calculation takes longer than 1/clock_period, I have to break it up somehow.

> That sounds difficult to deal with in a synchronous system... Somehow
> making everything wait until the divider is finished before moving on
> to the next iteration of the set of calculations...

I think of it as a program counter in a CPU. If you don't like that, you can break it up structurally or use a register pipeline.

> Also, why would a divide need more than one clock cycle? Surely it
> can calculate asynchronously, and as long as the operating frequency
> of the system is not too fast, you can get your result in one clock
> cycle?

You might be able to arrange that with a 1 MHz clock. Try it and see.

>> To spread logic over time, I like to use
>> a case of a counter register. I work out
>> timing implications by simulation and trial synthesis.
>
> Where the counter counts up to the number of stages/cycles you wish to
> spread your logic over?

That's it.

> If I was not to reuse hardware I would need 10's of multipliers and
> dividers... totally impractical. Also, I do not see how muxing inputs
> and outputs could possibly be more expensive than dividers and
> multipliers...

Depends what the hardware is and how much it costs. Try it and see. Start with a simple case.

> Is there any way to make a general estimate for the number of clock
> cycles taken for a Task B type situation... Or a Task A even?

Write code for a simple example. Run a sim to check the cycle timing. Look at the RTL viewer to check synthesis. Edit code. Repeat.

> Thank you for your time.

You are welcome. Good luck.

--
Mike Treseler
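The "program counter" view can be sketched as a small state machine that issues one operation per state and, for a multi-cycle divider, holds in a wait state until the divider reports done. The Python model below is a cycle-level sketch only: the `MultiCycleDivider` class, its latency and the state names are hypothetical, and the multiply and add are assumed to finish in one cycle. Counting loop passes is the software analogue of running a sim to check the cycle timing.

```python
# Cycle-level model: a sequencer ("program counter") that waits on a
# multi-cycle divider. Class, latency and state names are hypothetical.

class MultiCycleDivider:
    def __init__(self, latency):
        self.latency = latency
        self.count = 0
        self.pending = None
        self.result = None
        self.done = False

    def start(self, num, den):
        self.pending = num // den    # value the hardware would produce later
        self.count = self.latency
        self.done = False

    def clock(self):
        if self.count > 0:
            self.count -= 1
            if self.count == 0:      # last cycle: result becomes visible
                self.result = self.pending
                self.done = True

def run_sequence(x, y, z, w, div_latency=8):
    """Computes s = (x*y)//z + w and returns (s, clock cycles used)."""
    div = MultiCycleDivider(div_latency)
    state, cycles = "MUL", 0
    p = q = s = None
    while state != "DONE":
        cycles += 1                  # one loop pass == one clock edge
        if state == "MUL":
            p = x * y
            state = "DIV_START"
        elif state == "DIV_START":
            div.start(p, z)
            state = "DIV_WAIT"
        elif state == "DIV_WAIT":
            if div.done:             # everything else holds until here
                q = div.result
                state = "ADD"
        elif state == "ADD":
            s = q + w
            state = "DONE"
        div.clock()                  # the divider advances on the same edge
    return s, cycles

print(run_sequence(6, 7, 4, 1))                  # (11, 11)
print(run_sequence(6, 7, 4, 1, div_latency=1))   # (11, 4)
```

Under this model one pass costs `div_latency + 3` cycles; an enclosing iteration loop would restart from the `MUL` state for its next pass.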
#7
> If the complete calculation takes longer
> than 1/clock_period, I have to break it up somehow.

1/clock_frequency
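Written out, this is the usual single-cycle timing budget. The sketch below ignores register clock-to-output delay, setup time and clock skew, which add to every path in practice; $t_{\text{mul}}$, $t_{\text{div}}$ and $t_{\text{add}}$ are the combinational delays of the three units.

$$
T_{\text{clk}} = \frac{1}{f_{\text{clk}}}, \qquad t_{\text{path}} \le T_{\text{clk}} \iff f_{\text{clk}} \le \frac{1}{t_{\text{path}}}
$$

If Task A's chain (multiply, then divide, then add) must settle within one cycle:

$$
f_{\text{clk}} \le \frac{1}{t_{\text{mul}} + t_{\text{div}} + t_{\text{add}}}
$$

If each operation gets its own cycle:

$$
f_{\text{clk}} \le \frac{1}{\max(t_{\text{mul}},\, t_{\text{div}},\, t_{\text{add}})}
$$

For scale, the 1 MHz clock mentioned in the thread gives $T_{\text{clk}} = 1\ \mu\text{s}$, so a purely combinational divider fits in one cycle only if its worst-case delay is below 1 µs.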
#8
Thank you for your time. It is kind of you to give this time to a stranger.

Peace
#9
Marwan wrote:

(snip)

> How about if I did not have to have one calculation per clock cycle?

For the divider, you probably want it pipelined. Either that or iterative (less logic).

> Also are you saying that I would have to somehow control for the
> cycles taken by a divider and multiplier (by analogy?) within a
> sequence of calculations which include calculations that can take
> place in one cycle?

> (Clocked) Iterative structure (value of iteration)
> begin
> ...
> a <= b*c;
> g <= d/r;
> k <= e+w;
> ...
> end

It all depends. It is possible to write a combinatorial multiplier or divider, but there isn't much reason to do it. Both pipeline very easily (especially on FPGAs, where registers are pretty much free). A pipelined multiplier or divider will take N clock cycles (N might be the width of the multiplier or quotient), but can do two operations in N+1 cycles.

A pipelined multiplier takes about 2N times as much logic as an adder, and a divider about 3N times an adder. (There isn't much reason to limit the number of adders, as the logic to reuse one will be bigger than the adder itself.)

If you have N clock cycles, but no need to pipeline the multiplier or divider, then an iterative one takes about three or four times the logic of an adder, and still N clock cycles to complete the operation.

> Then it could take
>
> (Clocked) Iterative structure (value of iteration)
> begin
> 1 clock cycle
> n clock cycles
> 1 clock cycle
> end
> ?

> That sounds difficult to deal with in a synchronous system... Somehow
> making everything wait until the divider is finished before moving on
> to the next iteration of the set of calculations...

> Also, why would a divide need more than one clock cycle? Surely it
> can calculate asynchronously, and as long as the operating frequency
> of the system is not too fast, you can get your result in one clock
> cycle?

Yes, but it is a waste of logic. Only if logic is especially cheap and you don't have a faster clock is there any reason to do that.

>>Non-blocking assignments are safe iff those registers
>>are declared and used in one block. Example: http://mysite.verizon.net/miketreseler/count_enable.v

(snip)

> If I was not to reuse hardware I would need 10's of multipliers and
> dividers... totally impractical. Also, I do not see how muxing inputs
> and outputs could possibly be more expensive than dividers and
> multipliers...

Muxes are fairly expensive in FPGAs, but not so bad in other technologies.

(snip)

> Is there any way to make a general estimate for the number of clock
> cycles taken for a Task B type situation... Or a Task A even?

Adders should work in one clock cycle. Multipliers take N cycles, where N is the width of one of the operands; dividers take N cycles, where N is the width of the quotient.

--
glen
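To illustrate why an iterative divider needs about N cycles for an N-bit quotient, here is a behavioral Python sketch of a shift-and-subtract (restoring-style) divider that produces one quotient bit per loop pass, each pass standing in for one clock. It assumes unsigned operands; `n_bits` is the quotient width chosen by the caller. In hardware the loop body is essentially one subtractor, a comparison and some shift registers, which fits the estimate of a few adders' worth of logic.

```python
# Behavioral model of an iterative shift-and-subtract divider:
# one quotient bit per clock, so an N-bit quotient takes N cycles.

def iterative_divide(num, den, n_bits):
    if den == 0:
        raise ZeroDivisionError("division by zero")
    if not 0 <= num < (1 << n_bits):
        raise ValueError("numerator must fit in n_bits")
    rem, quo, cycles = 0, 0, 0
    for i in range(n_bits - 1, -1, -1):      # one pass == one clock cycle
        rem = (rem << 1) | ((num >> i) & 1)  # shift in next numerator bit
        quo <<= 1
        if rem >= den:                       # trial subtraction fits
            rem -= den
            quo |= 1
        cycles += 1
    return quo, rem, cycles

print(iterative_divide(100, 7, 8))   # (14, 2, 8)
```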
#10
Thank you everyone for the responses. I did some research on Xilinx boards and came across a lot of what was written here... However, there is one question I am still interested in, to which I cannot find a good answer:

> (Clocked) Iterative structure (value of iteration)
> begin
> 1 clock cycle
> n clock cycles
> 1 clock cycle
> end

How does one practically deal with such a situation? Blocking assignments would ensure that the order of operations works nicely; however, the operating frequency would suffer dramatically.

I can appreciate that with a pipelined multiplier or divider fed a continuous stream of input data, one gets an output every clock, after a fixed latency of several clock cycles, at a good clock frequency... However, how can such a process be properly set up in a situation such as the example above? There are three constraints; the series of calculations in every iteration must satisfy the following:

1. The same data needs to be available to all the calculations.
2. The calculations preceding and following a multiplication or division can have related outputs or inputs.
3. The outputs of the calculations in one iteration need to be available to the next.

Nonblocking assignments would work nicely if one could somehow properly align the fully pipelined multiplier/divider to be in sync with the data being dealt with in the surrounding calculations; however, this seems ridiculous, as it implies hardware that predicts the future!

Any gurus around here with experience dealing with such a situation?
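One common way to keep a pipelined unit in step with the surrounding calculations, without predicting the future, is delay matching: pass the other operands through the same number of register stages as the unit's latency, so each pair reaching the next operator belongs to the same input sample. The Python sketch below models this for `k = a*b + d` with a multiplier of assumed latency `latency` (3 by default, a hypothetical figure). One sample enters per clock and, after the fill latency, one result leaves per clock.

```python
from collections import deque

# Cycle model of delay matching: a pipelined multiplier with `latency`
# stages and a delay line of the same depth carrying d to the adder.

def stream(samples, latency=3):
    """samples: list of (a, b, d). Returns [(cycle, a*b + d), ...]."""
    mul_pipe = deque([None] * latency)   # products in flight
    d_line = deque([None] * latency)     # d values delayed to match
    out = []
    feed = list(samples) + [None] * latency  # bubbles flush the pipe
    for cycle, sample in enumerate(feed):
        prod, d = mul_pipe.popleft(), d_line.popleft()
        if prod is not None:
            out.append((cycle, prod + d))    # one adder, aligned inputs
        if sample is None:
            mul_pipe.append(None)
            d_line.append(None)
        else:
            a, b, d_in = sample
            mul_pipe.append(a * b)
            d_line.append(d_in)
    return out

print(stream([(1, 2, 3), (4, 5, 6), (7, 8, 9)]))
# [(3, 5), (4, 26), (5, 65)]
```

This only keeps the pipeline full when successive samples are independent. Under constraint 3, a dependent iteration cannot enter until the previous result has left the pipeline, so the loop either waits out the latency on each pass or interleaves several independent iterations so that every pipeline stage has useful work.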