Yann Sionneau's stories
Hardware pipelining example
Written by Yann Sionneau   
Tuesday, 01 November 2011 11:24

Dear Open Source Hardware lovers, 


I wrote a small 3-stage pipeline example in Verilog and tested it using Icarus Verilog.

This particular pipeline example has no practical use: all it does is add 3 to the input integer. It is for educational purposes only and is meant to serve as a basis for more advanced things in the future.

What is a pipeline?

Quoting Wikipedia:

"In computing, a pipeline is a set of data processing elements connected in series, so that the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel"

A picture is worth a thousand words, so let's assume we have a typical hardware block that looks like this:

[Figure: a hardware block with clock, reset, "Data in" and "Data out" signals]

So we basically have:


  • A clock (input), because we are doing a synchronous design, right?
  • A reset (input), because we want to be able to restart/reset our design.
  • A "Data in" input, which can be a bus of several lines, for example 8 input lines for an 8-bit input: this is the path for data coming into the block to be processed.
  • A "Data out" output, which can be a bus of several lines as well : this is the path for data coming out from the block, the result of the block's processing.


Actually this is usually not enough.

As users of this block, we need to know:


  • when it is available for a new computation (i.e. not busy with another one).
  • when the block has sampled our input data (and started working on it), so that we can present the next input data.
Moreover, the block needs to know:
  • when the input data is correctly set, so that it can sample it.
  • when the output data has been received/sampled, so that it can start working on other input data and then present the next results on the output data pins.


Therefore, the previous block would usually look a little more like this:

[Figure: the same block, with additional control lines for this handshaking]


This block does a job on its input data, thereby producing its output data. Very simple so far, right?


Let's assume this block does its computation in 10 clock cycles (10 periods T of the clock signal). This means that after feeding the block with data, you will wait 10*T seconds to get the output out of it.


This means the block has a latency of 10*T seconds AND, since it cannot accept new input while it is busy, it will only output data every 10*T seconds.
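A minimal Python sketch of that behaviour, modelling one clock cycle per loop iteration. The handshake is reduced to two conditions: the block samples new data only when it is idle (think of a hypothetical `in_ready` signal) and data is waiting (`in_valid`); the consumer is assumed to accept every result immediately. These signal names are illustrative and not taken from the drawing.

```python
def simulate_block(inputs, n_cycles, latency=10):
    """Cycle-by-cycle model of a non-pipelined block.

    Returns a list of (cycle, item) pairs, one per result produced.
    The consumer is assumed to accept a result as soon as it appears.
    """
    pending = list(inputs)      # data presented on "Data in", in order
    current, left = None, 0     # item being processed, cycles of work left
    outputs = []
    for cycle in range(n_cycles):
        if current is not None:
            left -= 1                           # one clock cycle of work
            if left == 0:                       # result ready ("out_valid")
                outputs.append((cycle, current))
                current = None                  # block is idle again
        # "in_ready and in_valid": sample new data only when idle.
        if current is None and pending:
            current, left = pending.pop(0), latency
    return outputs

print(simulate_block(["a", "b", "c"], 40))
# [(10, 'a'), (20, 'b'), (30, 'c')]
```

Each result arrives 10 cycles after the previous one: with a single block, the interval between outputs is the same as the latency.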


Pipelining is a way of improving the block throughput.


We won't be able to improve the block's latency, because we are not going to optimize the algorithm used inside this hardware block: the computation needed to transform an input into an output will still take 10*T seconds.


What pipelining can do is reduce the time between two outputs, thereby increasing the throughput of the block.


"But how is it possible?" "You said the block can only compute in 10*T seconds and cannot accept another input while it is still computing!"


Yes! That's the trick! You have to let the block accept another input BEFORE it has totally computed the previous input.


The idea is to break the algorithm down into smaller blocks, all chained (pipelined) together: this is pipelining.


Each of these smaller blocks constituting the pipeline is called "a stage".


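A 3-stage pipelined version of this hardware block would look like this:

[Figure: three stages chained one after the other, each passing its data and control lines to the next]
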

This is a nice simplified drawing of what a typical pipeline looks like.

Simplified? Yes. This just shows the idea of a chain of elements with a few control lines. I removed the clock and reset lines to make it simpler, but we still need them.


"So now what? We now have 3 blocks instead of 1, and that's it? Why is this any better?"

Well, yes. That's it.


Let's assume the following statements are correct:


  • The first stage takes 2 clock cycles to do its job.
  • The second stage takes 3 clock cycles to do its share of the job.
  • The third stage of the pipeline takes a little bit longer: 5 clock cycles to finish the job.
2 + 3 + 5 = 10. OK, that sounds logical: we didn't change the algorithm, so the latency is no better, we just split it into 3 parts.

But something has changed.

Now, when "stage 1" is done with its data, it can hand its output to "stage 2" as soon as the latter is ready, and then start processing new input data. The same applies to the following stages.

Note that none of the smaller blocks can start processing new data until it has passed its output to the next block, i.e. until the next block is ready.

As a result, the pipeline "speed" will be the "speed" of its slowest stage.

This means that in our example, once the pipeline is full, it will output data every 5*T seconds! The first result still takes 10*T seconds, but that's an improvement: in a given time period, twice as much data comes out of the pipelined block as out of the non-pipelined block.
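This reasoning can be checked with a small Python simulation. It is a sketch under simplifying assumptions: each stage holds at most one data item, works on it for a fixed number of cycles, then keeps it until the next stage is empty (a simple ready/valid handoff), and the consumer always accepts results. The stage durations are the 2, 3 and 5 cycles from above; the function name and structure are hypothetical.

```python
def simulate_pipeline(inputs, stage_cycles, n_cycles):
    """Cycle-level model of a pipeline with ready/valid handoffs.

    stage_cycles[s] is how many cycles stage s needs per item.
    Returns a list of (cycle, item) pairs, one per pipeline output.
    """
    pending = list(inputs)
    n = len(stage_cycles)
    item = [None] * n          # data held by each stage (None = empty)
    left = [0] * n             # cycles of work left on that data
    outputs = []
    for cycle in range(n_cycles):
        # 1. Every stage holding unfinished data does one cycle of work.
        for s in range(n):
            if item[s] is not None and left[s] > 0:
                left[s] -= 1
        # 2. Hand finished data downstream, last stage first, so that a
        #    stage emptied this cycle can accept new data right away.
        for s in reversed(range(n)):
            if item[s] is None or left[s] > 0:
                continue                          # empty or still working
            if s == n - 1:
                outputs.append((cycle, item[s]))  # leaves the pipeline
                item[s] = None
            elif item[s + 1] is None:             # next stage is ready
                item[s + 1], left[s + 1] = item[s], stage_cycles[s + 1]
                item[s] = None
            # otherwise the stage is blocked, waiting for the next one
        # 3. The first stage samples new input data when it is free.
        if item[0] is None and pending:
            item[0], left[0] = pending.pop(0), stage_cycles[0]
    return outputs

print(simulate_pipeline("abcd", [10], 50))       # one 10-cycle block
# [(10, 'a'), (20, 'b'), (30, 'c'), (40, 'd')]
print(simulate_pipeline("abcd", [2, 3, 5], 30))  # three stages, 2 + 3 + 5
# [(10, 'a'), (15, 'b'), (20, 'c'), (25, 'd')]
```

In both cases the first result needs 10 cycles, but with three stages a new result then comes out every 5 cycles, the duration of the slowest stage. The faster stages simply spend some cycles blocked, waiting for the stage after them.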

This is why pipelines are widely used in CPU design. A CPU usually contains an Instruction Pipeline whose job is to fetch machine code instructions from main memory, decode them, execute them and write the results back to registers and memory.

Naturally, the stages of this Instruction Pipeline are:

  • Fetch
  • Decode
  • Execute
  • Write Back
Here is an example of such an Instruction Pipeline:

[Figure: five-stage instruction pipeline diagram]

IF : Instruction Fetch
ID : Instruction Decode
EX : Execute
MEM : Memory access
WB : Register write back
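For the five-stage example, a short Python sketch can build the classic pipeline chart. It assumes an ideal pipeline in which every stage takes exactly one cycle and nothing ever stalls (no hazards). Instruction i enters IF at cycle i, so once the pipeline is full, five instructions are in flight and one completes every cycle: n instructions take n + 4 cycles instead of 5n.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_chart(n_instructions, stages=STAGES):
    """One row per instruction: the stage it occupies in each clock cycle
    of an ideal pipeline (one stage per cycle, no stalls)."""
    n_cycles = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        cells = []
        for cycle in range(n_cycles):
            k = cycle - i                  # instruction i enters IF at cycle i
            cells.append(stages[k] if 0 <= k < len(stages) else "")
        rows.append(cells)
    return rows

for i, row in enumerate(pipeline_chart(4)):
    print(f"i{i}: " + " ".join(f"{c:>3}" for c in row))
```

Reading one column top to bottom shows the overlap: in cycle 4, the first instruction is in WB while the fourth is already in ID.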

For more information about pipelines, have a look at the Instruction Pipeline Wikipedia page; it's well documented.

You can look at my code on GitHub to see how I implemented a simple 3-stage pipeline in Verilog.

In this code each stage is doing exactly the same thing, I just duplicated the code and renamed the stages.

Each stage takes an 8-bit integer as input, increments it and outputs it.

You can therefore easily guess that all this pipeline does is add 3 to a given 8-bit integer.
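Here is a Python model of such a pipeline. It is a sketch, not a translation of the Verilog on GitHub: it assumes each stage is a single 8-bit register updated on every clock edge with no stalls, so a value that overflows wraps around (255 + 1 gives 0). All registers are computed from their predecessor's *old* value, which mimics Verilog non-blocking assignments (<=). The function names are hypothetical.

```python
def inc8(value):
    """One stage's job: add 1, keeping only 8 bits. None marks an empty slot."""
    return None if value is None else (value + 1) & 0xFF

def run_adder_pipeline(inputs, n_stages=3):
    """Feed one input per clock edge; return (edge, output) pairs."""
    regs = [None] * n_stages                 # register of each stage
    outputs = []
    # Extra None inputs give the last values time to drain out of the pipe.
    for edge, x in enumerate(list(inputs) + [None] * n_stages):
        # All registers update at once, from the previous register values.
        regs = [inc8(x)] + [inc8(r) for r in regs[:-1]]
        if regs[-1] is not None:
            outputs.append((edge, regs[-1]))
    return outputs

print(run_adder_pipeline([10, 20, 255]))
# [(2, 13), (3, 23), (4, 2)]
```

The first result appears after the third clock edge (edge 2, counting from 0), and then a new result comes out on every edge: once the pipe is full, it performs one "add 3" per cycle.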

How to run the code?

  • Install Icarus Verilog
    • On Mac OS X you can use MacPorts and just run "sudo port install iverilog"
    • On Ubuntu/Debian you can run "sudo aptitude install iverilog" or "sudo apt-get install iverilog"
    • On other Linux distributions you can download the project archive and compile it: http://iverilog.icarus.com/
    • On RPM-based distributions you can download precompiled snapshots here: ftp://icarus.com/pub/eda/verilog/snapshots/precompiled/ 
  • git clone git://github.com/fallen/tinycpu.git && cd tinycpu/examples/pipeline/
  • make run

Thanks for reading!
Last Updated on Friday, 04 November 2011 11:58
 
Milkymist talk at OSHUG #8
Written by Yann Sionneau   
Friday, 11 March 2011 21:40

I was invited by Andrew Back to give a talk at OSHUG #8 to present the Milkymist project.

OSHUG stands for "Open Source Hardware User Group". It's based in London.

OSHUG has already held 7 events like this one. Too bad I couldn't attend them, they all looked very interesting; here is the list of their events: http://oshug.org/

I have to say those guys are really welcoming and friendly!

You can find the slides I used there and their sources there; the slides are under the CC BY-SA license.

I met really interesting and nice people at this OSHUG event, it was really great!

I met Ömer Kiliç (blog, twitter), hardware designer of concurrency.cc, an "Arduino-like" board.

The design adds a connector to wire a battery and several colored LEDs. But what's really interesting about the concurrency.cc Arduino is the way you program it.

You can program it using the occam-π programming language. A book has been written about programming Arduinos in this language.

The nice thing is that it's a concurrent language, like an HDL. You can tell the Arduino to execute several statements in parallel.

For example, just write:

PAR
  blink (11 , 500)
  blink (12 , 500)


And two LEDs will blink simultaneously at the same frequency. Brilliant, isn't it? And bloody easy!

blink() is just an example: you can put whatever function you want, f(), g(), etc.!
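For readers more at home in a software language, here is a rough Python analogue of the PAR example using asyncio. It is only an illustration: asyncio tasks are cooperatively scheduled on a single thread, whereas occam-π's PAR is genuine parallel composition. The LED is simulated by appending to a list, the `toggles` limit is added so the program terminates, and reading blink's arguments as a pin number and a delay in milliseconds is an assumption.

```python
import asyncio

async def blink(pin, delay_ms, toggles, log):
    """Toggle a simulated LED on `pin` every delay_ms milliseconds."""
    state = False
    for _ in range(toggles):
        state = not state
        log.append((pin, state))           # stands in for driving the pin
        await asyncio.sleep(delay_ms / 1000)

async def main():
    log = []
    # Roughly like occam's PAR: both blink() calls run concurrently.
    await asyncio.gather(blink(11, 50, 4, log), blink(12, 50, 4, log))
    return log

log = asyncio.run(main())
print(log)
```

Both LEDs start before either has finished its first delay, which is the essence of the PAR block; in real occam-π the two processes would not need to yield to each other.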

I also met Nick Ager, the director of get it made. His web site looks great, especially the video, which is really funny :)

I met a lot of other really interesting people, but I don't remember all their names (sorry guys, I have a really bad memory for names! I remember the two I mentioned because they gave me their business cards!).

I was also told about XMOS chips, whose programming style is highly parallel and event-driven too, and closely resembles FPGA programming.

I really enjoyed OSHUG #8. I am writing this post to report on my experience at the event and to thank everyone who came to listen to my talk, as well as the people who organized it (mostly Andrew, I suppose).

Very nice meeting you guys, see you around :)

Some pictures of the event on Flickr:


Some goodies they gave me (thanks! Sorry I didn't have any Milkymist stickers :/):

I'll let you imagine what the next goodie is all about and how to use it ;) (Yes, it's actually pinned on my coat from now on ;))

Last Updated on Sunday, 24 April 2011 19:01
 
Touchsurface Android app now PC compatible!
Written by Yann Sionneau   
Friday, 21 January 2011 15:57

Hi guys!

The touchsurface application (a port of the JGroups Draw demo to Android phones), which I talked about in my last blog post, is now PC compatible.

Which means you can now play with touchsurface on several phones AND on several computers at the same time !

Colors are now supported on the phone application :)

Source code of the app : https://github.com/fallen/touchsurface-android-jgroups

Source code of the JGroups port to Android : https://github.com/fallen/JGroups

Wanna try the application on your Android phone? Just scan the following QR code with your favourite barcode scanner:

QRCode

Application link : http://sionneau.net/touchsurface.apk

All you have to do:

  • Install the app on your Android phone (>= 2.1)
  • Connect your phone to a WiFi Access Point
  • The Access Point must not have a feature like "Access Point Isolation" enabled
  • The Access Point must forward broadcast packets to its clients for the discovery protocol (BPING) to work
  • Each device connected to the same WiFi Access Point and running the app (or the Draw demo from JGroups) should be able to participate in the game.

Enjoy :)

See ya !

Last Updated on Friday, 21 January 2011 17:29
 
JGroups port to Android
Written by Yann Sionneau   
Tuesday, 11 January 2011 22:15

Hi guys!

It's been a long time... I really don't have the time or the motivation to blog; I guess blogging is not for me!

Anyway, I have been doing some relatively nice stuff for a school project lately.

The goal of the project: making games on smartphones where several players can play together, with automatic discovery of the available players (auto-configuration). Basically, all players connect to a WiFi Access Point or set up a Bluetooth PAN, start a game and play in a mobile context where connections can be lost, data can be lost, and anyone could be disconnected at any second because of distance, a bug, the device shutting down or a dead battery. What's interesting about that?

Several points:

  • No client/server architecture: the instances of the game on each phone exchange the objects they need and communicate with each other using group communication (broadcast/multicast) or unicast.
  • The disconnection of a server phone (or a bug, a battery failure, whatever...) can't disconnect everybody, since there is *no* server phone.
  • No need to enter an IP address or choose a phone to connect to: as soon as the phones are connected to the same subnetwork (WiFi, Bluetooth or whatever), they discover each other and can join or leave a game.
  • Each game instance has the same code running, no server-side, no single point of failure.

That's it. All of this is beautiful theory, so how do you actually do it?

We are trying to use JGroups as the lowest-level communication API (over IP); it is a Java API for "multicast communications".

Basically, JGroups lets you create a group (identified by a name). Everyone in the same group (with the same name) can automatically discover the other members of the group and start talking with them, either one-to-one (unicast) or one-to-many (multicast/broadcast).

In this case the group name can be the name of the game, so that every phone trying to play the same game is "connected" to the same group and can talk with the others.
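To illustrate the group idea without a network, here is a toy, in-process Python model. It is not the JGroups API, and the class and method names are made up: members that join the same group name can see each other (discovery) and send a message either to one member (unicast) or to the whole group (multicast). The phone names are just labels.

```python
from collections import defaultdict

class Network:
    """Stands in for the shared subnetwork: group name -> {member name: member}."""
    def __init__(self):
        self.groups = defaultdict(dict)

class Member:
    def __init__(self, network, name):
        self.network, self.name = network, name
        self.group = None
        self.inbox = []                    # (sender, message) pairs received

    def join(self, group):
        self.group = group
        self.network.groups[group][self.name] = self

    def leave(self):
        del self.network.groups[self.group][self.name]
        self.group = None

    def view(self):
        """Discovery: names of everyone currently in my group."""
        return sorted(self.network.groups[self.group])

    def send(self, message, dest=None):
        """Unicast to `dest`, or multicast to every member (sender included)."""
        members = self.network.groups[self.group]
        targets = [members[dest]] if dest is not None else list(members.values())
        for member in targets:
            member.inbox.append((self.name, message))

net = Network()
phones = [Member(net, n) for n in ("desire", "hero", "nexus")]
for p in phones:
    p.join("draw")                           # group name = name of the game
print(phones[0].view())                      # ['desire', 'hero', 'nexus']
phones[0].send("hello")                      # multicast to the group
phones[1].send("hi desire", dest="desire")   # unicast to one member
phones[2].leave()                            # a phone leaves: nobody else is cut off
print(phones[0].view())                      # ['desire', 'hero']
```

Because every member runs the same code and there is no central server object, one member leaving only shrinks the group view of the others.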

So the game would use a game API (defined by some researchers from Télécom SudParis), and this game API would be implemented using the JGroups API.

So what is this blog post about?

Just to say that I have ported JGroups to Android (2.1 to 2.3); the port is available at https://github.com/fallen/JGroups

I successfully ran some demo programs using JGroups on Android phones (HTC Desire, HTC Hero and Nexus One); some demos were also run on Ubuntu and Mac OS.

I ported and ran a modified version of the "Draw" JGroups demo program on those three phones.

It's a whiteboard: you draw on it by touching the phone's screen with your finger, and each point you draw is transmitted to the other group members. Several players can draw on the same whiteboard using their own phones.

Then I ported the SimpleChat JGroups demo program. There is no GUI, but this time it is compatible with the computer version.

It has been tested simultaneously with 3 different phones and 2 computers (1 Mac OS X and 1 Ubuntu Linux).

You can type in the console of either computer and the messages will be transmitted to everyone; you can read them in the phones' system logs (via adb logcat).

The phones send a "Hello world from *phone name*" message to the group in an infinite loop; you can read these in the phones' system logs as well as in the computers' consoles.

I will keep you posted if I have anything new about this project!
Last Updated on Monday, 17 January 2011 09:26
 