PRBS generation is very useful in many digital design applications as well as in DFT.
I am almost always confused when given a PRBS polynomial and asked to implement it, so I find it handy to visit this site.
This is all nice and well for simple PRBS patterns. In some systems however, the PHY is working in a much higher rate than the digital core (say n times higher). The data is collected in wide buses in the core and then serialized (n:1 serialization) and driven out of the chip by the PHY.
This means that if we do a normal PRBS in the core clock domain, we would not get a real PRBS pattern on the pin of the chip but rather a mixed up version of PRBS with repeating sub-patterns. Best way to see this is to experiment with it on paper.
To get a real PRBS on the pin we must calculate n PRBS steps in each core clock cycle. That is, execute the polynomial, then execute it again on the result and then again, n times.
Let me describe a real life example I encountered not so long ago. The core was operating 8 times slower than the PHY and there was a requirement for a maximum length PRBS7 to be implemented.
There are a few maximum length polynomials for a PRBS7, here are two of them:
Both of these will generate a maximum length sequence of 127 different states. We now have to format it into 8 registers and hand it over to the PHY on each clock, But which of the two should we use? is there a speed/power/area advantage of one over the other? does it really matter?
Well, if you do a PRBS look-ahead, which is approximately the same order as your PRBS polynomial, then it really does matter. In our case we have to do a 8 look-ahead for a PRBS 7.
Compare the implementations of both polynomials below. For convenience both diagrams show the 8 intermediate steps needed for calculating the 8 look-ahead. In the circuit itself only the final result (the contents of the boxes in step 8 ) is used.
Because the XOR gate of the second polynomial is placed more close to where we have to shift in the new calculation of the PRBS, the amount of XORs (already too small in the second image to even notice) accumulate with each step. For the final step we have to use an XOR tree that basically XORs 7 of the 8 original bits – this is more in amount than the first implementation (even if you reuse some of the XORs in the logic) and the logic itself is deeper and thus the circuit becomes slower compared to the other implementation.
The first implementation requires at most a 3 input XOR for the calculation of look-ahead bit6 but the rest require only 2 input XOR gates.
Bottom line, if you do a PRBS look-ahead and have the possibility to choose a polynomial, choose one with lower exponents.