#1
Hi,

I have loosely followed the threads on this group and tried to extract some wisdom from everybody. This is quite useful to me /now/ because I'm implementing a "simple" SPI master interface for my Actel board. So I tried to use the single-process idea described by M. Treseler in his interesting http://mysite.verizon.net/miketreseler/uart.vhd

However, my circuit is not a UART. Also, I don't much like "state machines" (I try to avoid them unless really necessary) and I don't see them as a one-size-fits-all problem solver. And (most importantly) I need to integrate a clock divider...

I've been busy with this subject for the last 48 hours and I have been through at least 3 implementations. First I thought about how the gates would be connected and used to implement the function. Then I thought "Nah, Mike would beat me if I do that", so I restarted from scratch, in a "descriptive", process-driven way.

* Reading the SPI description (e.g. http://en.wikipedia.org/wiki/Serial_..._Interface_Bus), I see that the input is sampled on the clock edge opposite to the one that shifts the output. This means that I can't use the same register in the same process and in the same "if (clk'event and clk='1')" statement. Register reuse (as advocated in the UART document) is not possible here.

* I had put everything in variables. Then it became obvious that the ORDER of the assignments is important, while it is less critical for signals. Well, in a process you should take care not to write the same signal twice, but it is a double-edged sword: it can simplify some statements, but the synthesizer may not warn when you assign a value to the same signal (or variable) two or more times.

* Finally, my second attempt gave birth to an intricate, monolithic piece of dense code. It was a sort of huge state machine without the name, and it always ran at the CPU frequency, even though most signals/variables were updated very infrequently: this wasted clock activity, energy and routing resources. And I don't think it would have helped me avoid bugs (despite being seemingly easy to write).

* Oh, and P&R gave disappointing numbers. It was bloated, mostly because the FFs had to support set/reset/enable and this did not map: a single bit of storage turned into 4 logic elements (each roughly equivalent to a 3-LUT in Actel technology). Speed was OK, but that did not make me feel more comfortable.

Style depends a lot on the writer and his perception of the design. For me it's not a problem to write 3 (or even many more) versions of the same thing, as I'm not "paid for it" (I could be described as a professional hobbyist, not a full-time engineering mercenary). Experience varies, and various tricks have different acceptance.

So I decided to split the thing into several parts: the CPU interface, the programmable clock divider, the receive shift register and the sender. Each one can have its own clock, so
- power consumption is reduced
- enable and reset pins can be avoided whenever possible (which is very important, as I realised that my toolchain does not accept BOTH enable and set/reset on a FF despite the ability of the logic array: this led to a big bloat in the previous design).

Working with different clocks is tricky, and this is also why "one big process" can't be the 100% recommended solution for every design. Also, splitting things up allows one to test each part separately (in my case, I check the logic that is generated and measure its compactness). One can more easily see where the synthesizer uses the best approach, so it can be guided to the expected result. When everything is packed together, it's less easy...
Also, the behavioural/variable/process approach is not compelling (for me), at least for the final implementation. I am not concerned with the simulation speed gained by compact, high-level code. This does not keep me from using the features of the language (genericity, modularity, portability etc.). http://www.designabstraction.co.uk/A...Techniques.htm makes some good points. However, I put the emphasis on the quality of the result, and I don't care if the source code is bloated as long as I get what I want in the end. And I have been used to thinking the "mousetrap" way for so long that I have my own methods to ensure that everything works in the end. I admit, however, that high-level behavioural description/coding can be helpful to understand the difficulties and dark corners (side effects and special cases) of some circuitry.

Also, I now use variables in a slightly different way:
- variables inside processes to hold temporary results for logic/boolean stuff
- signals when a FF is desired.
So it's less prone to unwanted FF inference, and I get a warning when the order of the variable assignments could cause trouble.

Finally, I have gotten very nice results:
- the circuit uses half as many cells as before
- the code uses signals and variables and about 4 processes
- an interesting technique avoids glitches and clock-domain-crossing issues
- one saturated Gray counter is used (which could be thought of as a disguised FSM)

However, the design is not finished, and I'm curious to see what errors validation will find, where, and why. A few days will pass before I can draw the first conclusions.

But I really like VHDL because it is very expressive: one has the choice to use ANY level of description, and (even better) it is possible to MIX any levels together! So it is always possible to reach the desired result one way or another when a problem or limitation appears somewhere. It is also possible to learn this language progressively, starting with dumb RTL and mastering functions years later.

YG

#2
On Aug 21, 4:39 pm, whygee <why...@yg.yg> wrote:
> Hi,
....snip...
> * As I read the SPI description, like http://en.wikipedia.org/wiki/Serial_Peripheral_Interface_Bus
> I see that the sampling on the input is on the opposite edge
> of output shifting. This means that I can't use the same register
> in the same process and in the same "if (clk'event and clk='1')"
> statement. Register reuse (as advocated in the UART document)
> is not possible here.

Why not? You can use a single FF to delay the signal so that the data is shifted from one edge of the clock to the other. Then the same shift register can be used for both input and output.

Rick
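A minimal VHDL sketch of Rick's suggestion, assuming the master changes MOSI on the rising SPI clock edge and the slave samples on the falling edge (and vice versa for MISO). The entity and port names (`spi_shared_sr`, `load`, `din`, `dout`) are hypothetical, and `load` is assumed to be synchronous to `sclk`. The one extra flip-flop, `miso_q`, captures MISO on the falling edge; the next rising edge shifts it into the same register that drives MOSI.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: one shift register shared by MOSI and MISO,
-- plus a single FF that moves MISO from the falling edge to the rising edge.
entity spi_shared_sr is
  generic (WIDTH : positive := 16);
  port (
    sclk : in  std_logic;                           -- SPI clock
    load : in  std_logic;                           -- parallel load, assumed synchronous to sclk
    din  : in  std_logic_vector(WIDTH-1 downto 0);  -- word to transmit
    miso : in  std_logic;
    mosi : out std_logic;
    dout : out std_logic_vector(WIDTH-1 downto 0)   -- received word
  );
end entity;

architecture rtl of spi_shared_sr is
  signal sr     : std_logic_vector(WIDTH-1 downto 0) := (others => '0');
  signal miso_q : std_logic := '0';
begin
  -- The extra FF: sample MISO on the edge opposite to the one that shifts.
  sample_miso : process (sclk)
  begin
    if falling_edge(sclk) then
      miso_q <= miso;
    end if;
  end process;

  -- Rising edge: load, or shift the MSB out and the delayed MISO bit in.
  shift : process (sclk)
  begin
    if rising_edge(sclk) then
      if load = '1' then
        sr <= din;
      else
        sr <= sr(WIDTH-2 downto 0) & miso_q;
      end if;
    end if;
  end process;

  mosi <= sr(WIDTH-1);
  dout <= sr;
end architecture;
```

Note the off-by-one this introduces: after the last falling edge the final received bit is still in `miso_q`, and it only enters the register on one more rising edge (WIDTH rising edges after the one that loaded the register). That is the kind of corner case whygee mentions in his reply.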

#3
Hi!

rickman wrote:
> On Aug 21, 4:39 pm, whygee <why...@yg.yg> wrote:
>> Hi,
> ...snip...
>> * As I read the SPI description, like http://en.wikipedia.org/wiki/Serial_Peripheral_Interface_Bus
>> I see that the sampling on the input is on the opposite edge
>> of output shifting. This means that I can't use the same register
>> in the same process and in the same "if (clk'event and clk='1')"
>> statement. Register reuse (as advocated in the UART document)
>> is not possible here.
>
> Why not? You can use a single FF to delay the signal so that the data
> is shifted from one edge of the clock to the other. Then the same
> shift register can be used for both input and output.

I thought about that, of course. However, I have used a different approach. Maybe I'll change that again, as it takes more room than just a FF, but it avoids off-by-one errors and other corner cases while remaining glitch-free and asynchronous.

I have my eye on the ENC28J60, which has (at least in the first silicon revisions) particular clocking issues that require my SPI core to use either an internal or an external clock source. It will be cool to have a tiny 10 Mbps Ethernet transceiver hooked to my 16-bit core in an FPGA :-)

Anyway, this "small" SPI example is an eye-opener, and I learnt a few asynchronous design tricks yesterday (about clock synchronisation, not FIFOs) :-)

regards,
> Rick
yg

#4
rickman wrote:
> On Aug 21, 4:39 pm, whygee <why...@yg.yg> wrote:
>> Hi,
> ...snip...
>> * As I read the SPI description, like http://en.wikipedia.org/wiki/Serial_Peripheral_Interface_Bus
>> I see that the sampling on the input is on the opposite edge
>> of output shifting. This means that I can't use the same register
>> in the same process and in the same "if (clk'event and clk='1')"
>> statement. Register reuse (as advocated in the UART document)
>> is not possible here.
>
> Why not? You can use a single FF to delay the signal so that the data
> is shifted from one edge of the clock to the other. Then the same
> shift register can be used for both input and output.

Something just struck me: I understand that the slave SPI device samples MOSI on the clock edge opposite to the one that shifts the bit out. This ensures that setup and hold are roughly symmetric and leaves some margin.

However, my master SPI controller emits the clock itself (and resynchronises it), so for MOSI the system can be considered "source clocked", even if the slave provides some clock (it is looped back in my circuit).

So I could also sample the incoming MISO bit on the same clock edge as MOSI: the time it takes for my clock signal to be output, transmitted, received by the slave, trigger the shift, and come back gives more than enough time to meet setup and hold. I am thinking more about the "fast" applications (20 MHz), where the propagation delays start to eat into the sampling margin.

This is not what I intend to implement (I sample on opposite edges), but it is an interesting discussion anyway: most sampled/latched digital logic I know samples on one edge (one or the other, but not both).

cheers,
> Rick
yg

#5
whygee wrote:
> rickman wrote:
>> On Aug 21, 4:39 pm, whygee <why...@yg.yg> wrote:
>>> Hi,
>> ...snip...
>> Why not? You can use a single FF to delay the signal so that the data
>> is shifted from one edge of the clock to the other. Then the same
>> shift register can be used for both input and output.
>
> I thought about that, of course.
>
> however i have used a different approach.
> Maybe i'll change that again, as it takes more room
> than just a FF, but it avoids off-by-ones
> and other corner cases while remaining glitch-free and asynchronous.

Oh, now I remember exactly why I use the current, seemingly crazy method. It is because the SPI clock is different from the CPU clock. The shift register (in a shared/single configuration) must have 2 clock sources: one for loading the register, another for shifting. When only the CPU clock controls the shift, it is very easy, but NO FF I know of has dual clock inputs.

I have found a mostly equivalent circuit with 3 FFs and 1 MUX2 but... heeeeek! That would require at least 64 Actel cells for a 16-bit register (plus the rest). To reduce the cell count, one could switch the register's clock from the CPU clock to the SPI clock and back. I know how to do this, and it would take 16 FFs and 16 MUXes (half as many cells). However, the switching time creates excessive write latencies.

-oO0Oo-

The solution I have implemented is a bit unusual for a SPI master (AFAIK) but gives satisfying characteristics:

- The receive register is a standard shift register. Nothing fancy; I only added a single AND gate to "mask" the 8 MSBs in 8-bit mode. The data is easily read by the CPU once a "ready flag" is read, so there is no clock domain crossing issue here.

- The emit system is more interesting:
  - A "classic" 16-bit data register is written by the CPU in the CPU clock domain.
  - A kind of FSM (a 5-bit Gray counter with saturation) operates in the SPI clock domain. The clock is already filtered and resynchronised using techniques from http://www.design-reuse.com/articles...itch-free.html, and I add one cycle of delay to let everything settle before starting the counter.
  - The Gray counter (quite compact) controls a 16-input MUX tree built from MUX2 cells, whose output goes to MOSI.

Result: 32 FFs and 16 MUXes (48 cells instead of 64), and any clock works. Speed is good too.

I have found another structure for the emitter (one-hot encoding of the shift-register FSM, AND-ORed with the data-out register), but it was marginally larger (though less challenging than addressing a MUX tree with a bit-reversed Gray counter) and could not prevent some bit-to-bit transition glitches.

If someone has a better idea, please tell me... But when 2 different, unrelated clocks are used, there is no solution using less than 1 FF per side. Then again, I'm pretty sure that I am reinventing the wheel for the 100001th time :-)

have a nice weekend,
yg
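A hedged VHDL sketch of the emitter described above, assuming `start` has already been synchronised into the SPI clock domain and that the CPU keeps `data` stable for the whole transfer. All names (`spi_gray_emitter`, `start`, `done`, `LAST`) are hypothetical. For readability it decodes the saturating 5-bit Gray counter back to binary and indexes `data` with that; the design in the post instead drives the MUX tree directly from the (bit-reversed) Gray bits, which avoids the decoder.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical emitter: data is held by the CPU domain, a saturating 5-bit
-- Gray counter in the SPI clock domain selects which bit drives MOSI.
entity spi_gray_emitter is
  port (
    spi_clk : in  std_logic;
    start   : in  std_logic;                      -- assumed already synchronised to spi_clk
    data    : in  std_logic_vector(15 downto 0);  -- assumed stable during the transfer
    mosi    : out std_logic;
    done    : out std_logic
  );
end entity;

architecture rtl of spi_gray_emitter is
  constant LAST : unsigned(4 downto 0) := to_unsigned(16, 5);  -- saturation value
  signal gray : std_logic_vector(4 downto 0) := (others => '0');
  signal idx  : unsigned(4 downto 0) := (others => '0');

  function gray_to_bin (g : std_logic_vector(4 downto 0)) return unsigned is
    variable b : unsigned(4 downto 0);
  begin
    b(4) := g(4);
    for i in 3 downto 0 loop
      b(i) := b(i + 1) xor g(i);
    end loop;
    return b;
  end function;

  function bin_to_gray (b : unsigned(4 downto 0)) return std_logic_vector is
  begin
    return std_logic_vector(b xor shift_right(b, 1));
  end function;
begin
  count : process (spi_clk)
  begin
    if rising_edge(spi_clk) then
      if start = '1' then
        gray <= (others => '0');
      elsif gray_to_bin(gray) /= LAST then
        gray <= bin_to_gray(gray_to_bin(gray) + 1);  -- one bit changes per step
      end if;                                        -- at LAST: saturate
    end if;
  end process;

  idx  <= gray_to_bin(gray);
  -- MSB first: idx 0..15 selects data(15)..data(0); MOSI idles low once saturated.
  mosi <= data(15 - to_integer(idx(3 downto 0))) when idx /= LAST else '0';
  done <= '1' when idx = LAST else '0';
end architecture;
```

The counter stops at 16, so `done` stays high and MOSI idles at '0' until the next `start`. Because the binary index is decoded combinationally here, MOSI may show brief glitches between bits, which should be harmless as long as the slave samples on the opposite edge.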

#6
whygee wrote:
> This is because the SPI clock is different from the CPU clock.

Could SPI use the CPU clock also?

> If someone has a better idea, please tell me ...
> But when 2 different unrelated clocks are used, there is no solution
> using less than 1 FF per side. Then again, I'm pretty sure that
> I am reinventing the wheel for the 100001th time :-)

Is there some reason that a handshake won't work in both cases?

-- Mike Treseler
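One common way to build the kind of handshake Mike asks about is a toggle (two-phase) handshake. The following is a minimal sketch, not the design discussed in the thread, assuming the SPI-domain shifter produces a one-cycle `spi_done` pulse and only reads `spi_data` after `start`. All names are hypothetical. The CPU domain toggles `req_t` on each accepted write; the SPI domain sees the toggle through a two-flop synchroniser and answers by making `ack_t` match it when the transfer is done. Only these single-bit toggles cross domains.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical toggle handshake between the CPU and SPI clock domains.
entity spi_handshake is
  port (
    cpu_clk  : in  std_logic;
    wr       : in  std_logic;                      -- CPU write strobe (cpu_clk domain)
    wdata    : in  std_logic_vector(15 downto 0);
    busy     : out std_logic;                      -- '1' from the write until the acknowledge
    spi_clk  : in  std_logic;
    start    : out std_logic;                      -- one spi_clk pulse per accepted write
    spi_data : out std_logic_vector(15 downto 0);  -- static while busy = '1'
    spi_done : in  std_logic                       -- pulse from the SPI shifter when finished
  );
end entity;

architecture rtl of spi_handshake is
  signal data_reg     : std_logic_vector(15 downto 0) := (others => '0');
  signal req_t, ack_t : std_logic := '0';                                -- one toggle per domain
  signal req_s        : std_logic_vector(2 downto 0) := (others => '0'); -- req_t into spi_clk
  signal ack_s        : std_logic_vector(1 downto 0) := (others => '0'); -- ack_t into cpu_clk
begin
  cpu_side : process (cpu_clk)
  begin
    if rising_edge(cpu_clk) then
      ack_s <= ack_s(0) & ack_t;                 -- two-flop synchroniser
      if wr = '1' and req_t = ack_s(1) then      -- accept writes only when idle
        data_reg <= wdata;
        req_t    <= not req_t;                   -- signal a new request
      end if;
    end if;
  end process;

  spi_side : process (spi_clk)
  begin
    if rising_edge(spi_clk) then
      req_s <= req_s(1 downto 0) & req_t;        -- two-flop synchroniser + edge-detect stage
      if spi_done = '1' then
        ack_t <= req_s(2);                       -- acknowledge the request just served
      end if;
    end if;
  end process;

  busy     <= req_t xor ack_s(1);
  start    <= req_s(1) xor req_s(2);             -- toggle detected: one-cycle pulse
  spi_data <= data_reg;
end architecture;
```

The 16-bit word itself is not synchronised; the sketch relies on `data_reg` being stable well before the toggle has passed the synchroniser, which is the usual argument for this pattern. It costs the same 16-bit holding register that whygee counts in his reply.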

#7
"whygee" <whygee@yg.yg> wrote in message news:48b0287a$0$292$7a628cd7@news.club-internet.fr...
> whygee wrote:
>> rickman wrote:
>>> On Aug 21, 4:39 pm, whygee <why...@yg.yg> wrote:
>>>> Hi,
>>> ...snip...
>
> Oh, now I remember exactly why i use the current, seemingly crazy method.
>
> This is because the SPI clock is different from the CPU clock.
> The shift register (in shared/single configuration) must have 2 clock
> sources,
> one for loading the register, another for shifting.
> When only the CPU clock controls the shift, it is very easy, but

Since you said you're implementing the SPI master side, that implies that you're generating the SPI clock itself, which *should* be derived from the CPU clock... there should then be no need for more than a single clock domain (more later).

>
> If someone has a better idea, please tell me ...
> But when 2 different unrelated clocks are used, there is no solution
> using less than 1 FF per side. Then again, I'm pretty sure that
> I am reinventing the wheel for the 100001th time :-)
>

The CPU clock period and the desired SPI clock period are known constants. Therefore one can create a counter that counts from 0 to Spi_Clock_Period / Cpu_Clock_Period - 1. When the counter is 0, set your Spi_Sclk output signal to 1; when the counter reaches half of its count range (i.e. Spi_Clock_Period / Cpu_Clock_Period / 2), set Spi_Sclk back to 0.

The point where Counter = 0 can then also be used to define the 'rising edge of Spi_Sclk' state. So any place where you'd like to use "rising_edge(Spi_Sclk)" you would instead use "Counter = 0". The same can be done for the falling edge of Spi_Sclk; that point occurs when Counter = Spi_Clock_Period / Cpu_Clock_Period / 2.

Every flop in the design is then synchronously clocked by Cpu_Clock; there are no other clock domains and therefore no clock domain crossings. The counter is used as a divider to signal internally when things have reached a particular state.

KJ
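A minimal VHDL sketch of KJ's single-clock scheme, assuming `DIV` (Spi_Clock_Period / Cpu_Clock_Period) is an even constant of at least 2. The entity and port names are hypothetical. `sclk_rise` and `sclk_fall` are clock-enable strobes that are high during the `clk` cycle whose closing edge changes `sclk`, so any logic that tests them instead of `rising_edge(Spi_Sclk)` updates on the same `clk` edge as the SPI clock.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical single-clock SPI clock generator: every flop runs on clk,
-- and sclk_rise / sclk_fall replace the SPI clock edges as enables.
entity spi_sclk_gen is
  generic (DIV : positive := 8);   -- assumed Spi_Clock_Period / Cpu_Clock_Period
  port (
    clk       : in  std_logic;     -- CPU clock
    sclk      : out std_logic;     -- SPI clock output
    sclk_rise : out std_logic;     -- '1' in the clk cycle that ends with sclk going high
    sclk_fall : out std_logic      -- '1' in the clk cycle that ends with sclk going low
  );
end entity;

architecture rtl of spi_sclk_gen is
  signal counter : natural range 0 to DIV - 1 := 0;
  signal sclk_r  : std_logic := '0';
begin
  assert DIV >= 2 and DIV mod 2 = 0
    report "DIV must be even and at least 2" severity failure;

  process (clk)
  begin
    if rising_edge(clk) then
      if counter = DIV - 1 then          -- count 0 .. DIV-1
        counter <= 0;
      else
        counter <= counter + 1;
      end if;
      if counter = 0 then                -- "rising edge of Spi_Sclk" state
        sclk_r <= '1';
      elsif counter = DIV / 2 then       -- "falling edge of Spi_Sclk" state
        sclk_r <= '0';
      end if;
    end if;
  end process;

  sclk      <= sclk_r;
  sclk_rise <= '1' when counter = 0 else '0';
  sclk_fall <= '1' when counter = DIV / 2 else '0';
end architecture;
```

As whygee points out in his reply, this form cannot output the CPU clock itself (a ratio of 1), and an odd ratio would give unequal high and low times, hence the assertion on `DIV`.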

#8
"whygee" <whygee@yg.yg> wrote in message news:48b00011$0$290$7a628cd7@news.club-internet.fr...
> rickman wrote:
>> On Aug 21, 4:39 pm, whygee <why...@yg.yg> wrote:
>
> However,
>
> my master SPI controller emits the clock itself (and resynchronises it)

No need for the master to resynchronize something that it generates itself (see my other post).

> so for MOSI, the system can be considered as "source clocked", even
> if the slave provides some clock (it is looped back in my circuit).

I don't think you understand SPI. The master always generates the clock; it is up to the slave to synchronize to that clock. The master never has to synchronize to the SPI clock, since it generates it.

> So i can also sample the incoming MISO bit on the same clock edge as MOSI :
> the time it takes for my clock signal to be output, transmitted,
> received by the slave, trigger the shift, and come back, this is
> well enough time for sample & hold.
>

See my other post for the details, but basically you're making this harder than it needs to be. Since the master is generating the SPI clock, it knows when it is about to switch the SPI clock from low to high or from high to low. There is no need for it to detect the actual SPI clock edge; it simply needs to generate output data and sample input data at the point that corresponds to where it is going to switch the SPI clock.

KJ
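KJ's point can be sketched the same way. This is a minimal VHDL example, assuming SPI mode 0 (SCLK idles low, data is sampled on the rising edge and changed on the falling edge) and strobes like a `sclk_rise`/`sclk_fall` pair from a single-clock divider. The names are hypothetical. MISO is captured at the `clk` edge that raises SCLK, and the falling-edge strobe shifts the captured bit in while presenting the next MOSI bit, so one register plus one flop serves both directions without a second clock domain.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical SPI mode 0 shifter: everything runs on clk; the strobes
-- say when SCLK is about to rise or fall.
entity spi_master_shift is
  port (
    clk       : in  std_logic;
    load      : in  std_logic;                      -- load the word to send
    tx        : in  std_logic_vector(15 downto 0);
    sclk_rise : in  std_logic;                      -- strobes from a single-clock divider
    sclk_fall : in  std_logic;
    miso      : in  std_logic;
    mosi      : out std_logic;
    rx        : out std_logic_vector(15 downto 0)   -- received word, valid after 16 falls
  );
end entity;

architecture rtl of spi_master_shift is
  signal sr     : std_logic_vector(15 downto 0) := (others => '0');
  signal miso_q : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if sclk_rise = '1' then
        miso_q <= miso;                      -- sample at the clk edge that raises SCLK
      end if;
      if load = '1' then
        sr <= tx;                            -- MSB is on MOSI before the first rising edge
      elsif sclk_fall = '1' then
        sr <= sr(14 downto 0) & miso_q;      -- next MOSI bit out, sampled MISO bit in
      end if;
    end if;
  end process;

  mosi <= sr(15);
  rx   <= sr;
end architecture;
```

Because the falling-edge strobe is an ordinary enable in the same clock domain, the last received bit lands in the register on the 16th falling-edge strobe, without the extra edge needed in a dual-edge version.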

#9
Hi!

Mike Treseler wrote:
> whygee wrote:
>> This is because the SPI clock is different from the CPU clock.
> Could SPI use the CPU clock also?

Yes, that was the case in the first implementation. But some slaves have clocking restrictions. Also, my (current) clock divider does not have enough range to output quite low frequencies (for example, < 1 MHz for very low voltage SPI Flash memories). Now I can select between the internal, divided clock and an external pin.

>> If someone has a better idea, please tell me ...
>> But when 2 different unrelated clocks are used, there is no solution
>> using less than 1 FF per side. Then again, I'm pretty sure that
>> I am reinventing the wheel for the 100001th time :-)
>
> Is there some reason that a handshake
> won't work in both cases?

Do you mean a handshake between a CPU-driven 16-bit register and a 16-bit shift register in the SPI clock domain? That would work, why not, but is it better, safer, or smaller? Just as in my current implementation, it would amount to 32 FFs and 16 MUXes. I could strip one row of 16 FFs by exploiting the fact that there is a latch somewhere in the CPU datapath, between the pipeline and the I/O section, but I can't guarantee when this datapath latch is updated: the CPU may want to poll the control register, or access another peripheral, which would destroy the emitted data before it is sent.

The VHDL code may be shorter, but density is not an issue for me because it's not necessarily related to coding time.

regards,
> -- Mike Treseler
YG

#10
Hi!

KJ wrote:
> Since you said you're implementing the SPI master side, that implies that
> you're generating the SPI clock itself which *should* be derived from the
> CPU clock...there should be no need then for more than a single clock domain
> (more later).

As pointed out in my previous post, there is at least one peripheral (ENC28J60 rev. B4) that has clocking restrictions (also known as "errata"), and I happen to have some ready-to-use modules equipped with this otherwise nice chip... I don't know if my chip revision is B4, and the errata suggest using a clock between 8 and 10 MHz. However, they also suggest using the ENC28J60-provided 12.5 MHz output: I'm ready to add an external clock input to the master if I'm allowed to "legally" go beyond the 10 MHz rating (a 25% bandwidth increase is always a good thing, particularly with real-time communications).

As another "unintended use case", an external clock input opens the possibility to bit-bang data from a PC or uC. I know it sounds stupid :-) but I had a project 10 years ago that streamed bootstrap code to a DSP through the PC's parallel printer port. ADI's SHARC had a boot mode where a DMA channel loaded code from outside, and I had found a trick to single-cycle the transfer with external circuits. That's very handy for early software development, more than flashing the boot memory all the time... Now, if I can stream external boot code WITHOUT the hassle of external circuitry (which was a pain to develop without the test equipment I have now), that is an even better thing.

For me, and in the intended application, that's enough to justify another clock domain. If I had no ready-to-use ENC28J60 mini-module, I would not have bothered.

> The CPU clock period and the desired SPI clock period are known constants.

They are indicated in the datasheet of each individual product. And there is no "SPI standard", unlike I2C and others.
( http://en.wikipedia.org/wiki/Serial_..._Bus#Standards ) Some chips accept a falling CLK edge after CS goes low, and some other chips don't (even chips from the same manufacturer vary). So I have read the datasheets of the chips I want to interface to, and adapted the master interface to their various needs (and errata).

> Therefore one can create a counter that counts from 0 to Spi_Clock_Period /
> Cpu_Clock_Period - 1. When the counter is 0, set your Spi_Sclk output
> signal to 1; when that counter reaches one half the max value (i.e.
> "(Spi_Clock_Period / Cpu_Clock_Period/2") then set Spi_Sclk back to 0.

I have (more or less) that already; it is active when the internal CPU clock is selected, and it is used when booting the CPU soft core from an external SPI EEPROM. Note, however, that your version does not allow using the CPU clock at full speed: what happens if you set your "max value" to "00000"? And it does not guarantee that the high and low levels have equal durations. But I'm sure that in practice you would do much better (and I still have a few range limitations in my clock divider; I'll have to add an optional prescaler).

Here is the current (still perfectible) version:

```vhdl
  clk, ... : in std_logic;  -- the CPU clock
....
  signal clkdiv,                                -- the frequency register
         divcounter : std_logic_vector(4 downto 0);  -- the actual counter
  signal ... SPI_en, lCK, ... : std_logic;
begin
....
  -- free-running clock divider
  clock_process : process (clk)
    variable t : std_logic_vector(5 downto 0);  -- holds the carry bit without storing it
  begin
    -- no reset needed, synchro is done later
    if rising_edge(clk) then
      if SPI_en = '1' then
        t := std_logic_vector(unsigned('0' & divcounter) + 1);  -- increment the counter
        if t(t'left) = '1' then                 -- if (expected) overflow then toggle lCK
          divcounter <= clkdiv;
          lCK <= not lCK;
        else
          divcounter <= t(divcounter'range);    -- just update the counter
        end if;
      end if;
    end if;
  end process;
```

This method divides the CPU clock by 2 x (32 - clkdiv): the counter runs from clkdiv up to 31 before each toggle of lCK, so clkdiv = "11111" gives a division by 2 and clkdiv = "00000" a division by 64.
Without the 2x factor, it is impossible to work with clkdiv="00000", and the high and low durations are unequal when clkdiv is odd. I count up, not down, but the difference is marginal.

> The point where the counter = 0 can also then be used to define the 'rising
> edge of Spi_Sclk' state. So any place where you'd like to use
> "rising_edge(Spi_Sclk)" you would instead use "Counter = 0". The same can
> be done for the falling edge of Spi_Sclk; that point would occur when
> Counter = Spi_Clock_Period / Cpu_Clock_Period/2.
>
> Every flop in the design then is synchronously clocked by the Cpu_Clock,
> there are no other clock domains therefore no clock domain crossings. The
> counter is used as a divider to signal internally for when things have
> reached a particular state.

I understand that well, as this is how I started my first design iteration. I soon reached some inherent limitations, however, particularly because of (slightly broken?) tools that won't allow both a clock enable AND a preset on the same FF (even though the Actel cells in ProASIC3 have this capability). Synplicity infers the right cell, which is later broken into 2 or more cells by the Actel backend :-? Maybe I missed a restriction on the use of one of the signals, such as needing a specific kind of net (I hope).

As the RTL code grows, the synthesizer infers more and more stuff, often unforeseen, which leads to bloat: muxes everywhere, and logic cells duplicated to drive higher fanouts. I guess this is because I focused more on the "expression" of my need than on the actual result (but I was careful anyway).

I have split the design into 3 subparts (CPU interface, clock divider/synchronisation, and emit/receive, a total of 7 processes in a single architecture), and it now needs < 140 cells instead of the 180 cells of the first iteration. And I use whatever clock I want or need.

I could upload the source code somewhere so others can better understand my (fuzzy?) descriptions. I should finish the simulation first.

> KJ

(I answer in the same post to stay on topic)

KJ wrote:
>> my master SPI controller emits the clock itself (and resynchronises it)
>
> No need for the master to resynchronize something that it generates itself
> (see my other post).

In fact, there IS a need to resynchronise the clock, even when it is derived from the CPU clock, because of the divider. Imagine (I'm picky here) that the CPU runs at 100 MHz (my target) and the slave at 100 kHz (an imaginary old chip). The data transfer is set up in the control register, then the write to the data register triggers the transfer. But this can happen at any time, whatever the value of the predivider's counter. So the first toggle of the clock output may come much sooner than the slave's required setup time allows. That's a glitch.

In this case, the solution is easy: reset the counter whenever a transfer is requested. That's what I did too, the first time. But there is an even simpler solution: add a "clear" input condition to the FFs that are used to resynchronise the clocks, as in http://i.cmpnet.com/eedesign/2003/jun/mahmud3.jpg, so the next clock cycle will be well-formed whether the source is internal or external. The added delay is not an issue.

>> so for MOSI, the system can be considered as "source clocked", even
>> if the slave provides some clock (it is looped back in my circuit).
> I don't think you understand SPI. The master always generates the clock, it
> is up to the slave to synchronize to that clock. The master never has to
> synchronize to the SPI clock since it generates it.

I thought that too, until I read the errata of the chips I want to use. A friend told me years ago: "Never read the datasheet before the errata". Excellent advice, indeed.
>> So i can also sample the incoming MISO bit on the same clock edge as MOSI :
>> the time it takes for my clock signal to be output, transmitted,
>> received by the slave, trigger the shift, and come back, this is
>> well enough time for sample & hold.
> See my other post for the details, but basically you're making this harder
> than it need be.

Though sometimes something a bit more than the "theoretically, practically enough" is needed.

> Since the master is generating the SPI clock it knows when
> it is about to switch the SPI clock from low to high or from high to low,
> there is no need for it to detect the actual SPI clock edge, it simply needs
> to generate output data and sample input data at the point that corresponds
> to where it is going to be switching the SPI clock.

This is what I did in the first design iteration. However, I now avoid large single-clock processes because they give less control over what the synthesiser does. My code now uses 7 processes (one of them clockless, just because it's easier to code than concurrent statements), and fanout and MUXes are OK.

Which brings us back to the other thread: everybody has his own idea of what is "good", "acceptable", "required"... Style and taste are difficult to discuss, and no single rule applies to EVERY case :-)

Finally, I have the impression that you misunderstood my initial post about "SPI clocking". The idea was that the SPI master "could" sample MISO with the same (internal) clock signal and edge that shifts out MOSI. The issue this would "solve" is when capacitance and propagation delays on the PCB, along with a relatively high clock speed (the 25AA1024 by Microchip goes up to 20 MHz), delay the MISO signal enough to miss the normal clock edge.

regards,
> KJ
yg