| Hardware pipelining example |
|
|
|
| Written by Yann Sionneau |
| Tuesday, 01 November 2011 11:24 |
|
Dear Open Source Hardware lovers, I wrote a small 3-stages pipeline example in Verilog and tested it using Icarus Verilog. This particular pipeline example has no real interest, all it does is adding 3 to the input integer, it is for educational purposes only and to serve as a basis for more advanced things in the future. What is a pipeline ? Quoting Wikipedia :
A drawing is better than 1 000 words, let assume we have a typical hardware block looking like this :
So we have basically :
Actually this is usually not enough. As users of this block, we need to know :
Moreover, the block needs to know :
Therefore, the previous block would usually look a little bit more like this :
This block is doing a job on its input data, therefore producing it's output data. Very simple so far, right ? Let assume this block does its computation in 10 clock cycles (10 periods T of the clock signal), this means that after feeding this block with data, you will wait for 10*T seconds to get the output out of it. This means the block has a 10*T seconds latency AND that the block will only output data each 10*T seconds. Pipelining is a way of improving the block throughput. We won't be able to improve the block latency, because we are not going to optimize the algorithm itself used inside this hardware block : the computation needed to transform an input into an output will still take 10*T seconds. What we can do with pipelining is making it possible for this block to reduce the time between two outputs, therefore increasing the throughput of the block.
Yes! That's the trick! You have to let the block accept another input BEFORE it has totally computed the previous input. The idea is to break the algorithm down to smaller blocks, all chained (pipelined) together : this is pipelining. Each of these smaller blocks constituting the pipeline is called "a stage". A 3-stages pipelined version of this hardware block would look like this :
This is a nice simplified drawing of what a typical pipeline looks like. Simplified? Yes. This just shows the idea of a chain of elements with a few control lines, I removed the clock and reset lines to make it simpler but we still need those.
Well, yes. That's it. Let assume the following statements are correct :
2 + 3 + 5 = 10 OK, that sounds logical, we didn't change the algorithm so it's not any better, we just split it up in 3 parts. But something has changed. Now when the "stage 1" is done with its data, it can give the output to the "stage 2" whenever the latter is ready and then start processing a new output. And this applies to the following stages too. Indeed none of the smaller block can start a new processing until it has passed its output data to the next smaller block, i.e untill the next smaller block is ready As a result, the pipeline "speed" will be the "speed" of its slowest stage. Which means that in our example the pipeline will output data every 5*T seconds! That's an improvement, we have twice more data coming out from the pipelined block than from the non-pipelined block in a given time period. This is why pipelines are widely used in the conception of CPUs. Usually, a CPU contains an Instruction Pipeline whose goal is to fetch the machine code instructions from main memory, decode them, execute them and write the results back into registers and memory. Naturally, the different stages of this Instruction Pipeline are :
Here is a an example of such an Instruction Pipeline : IF : Instruction Fetch ID : Instruction Decode EX : Execute MEM : Memory access WB : Register write back For more informations about pipelines you can look at the Instruction Pipeline Wikipedia page, it's pretty well documented. You can look at my code on github to see how I implemented a simple 3-stages pipeline in Verilog. In this code each stage is doing exactly the same thing, I just duplicated the code and renamed the stages. Each stage takes an 8-bits integer as input, increments it and then outputs it. I guess you can therefore easily conclude that all this pipeline does is adding 3 to a given 8-bits integer. How to run the code ?
Thanks for reading me !
|
| Last Updated on Friday, 04 November 2011 11:58 |







