Software Pipelining

This is a discussion on Software Pipelining within the Compilers forums in Theory and Concepts category; Hi, Software pipelining is a technique to reorder loops to achieve more parallelism at the instruction level. I was wondering if this optimization is mainly beneficial for processors that have multiple functional units like VLIWs, or can significant performance improvements be also achieved for RISC processors with a restricted number of functional unis like just two 4-stage pipelines? I was also wondering if software pipelining can benefit from profiling information, i.e. can the knowledge about frequently executed paths be exploited for a better pipelining? Do you know any successful applications? Regards, Tim...

Go Back   Application Development Forum > Theory and Concepts > Compilers

Object Mix

Register FAQ Calendar Search Today's Posts Mark Forums Read
  #1  
Old 08-26-2008, 10:57 AM
Tim Frink
Guest
 
Default Software Pipelining

Hi,

Software pipelining is a technique to reorder loops to achieve more
parallelism at the instruction level. I was wondering if this
optimization is mainly beneficial for processors that have multiple
functional units like VLIWs, or can significant performance
improvements be also achieved for RISC processors with a restricted
number of functional unis like just two 4-stage pipelines?

I was also wondering if software pipelining can benefit from profiling
information, i.e. can the knowledge about frequently executed paths be
exploited for a better pipelining? Do you know any successful
applications?

Regards,
Tim

Reply With Quote
  #2  
Old 08-28-2008, 02:19 AM
Jatin Bhateja
Guest
 
Default Re:Software Pipelining

It's a combination of loop unrolling and instruction scheduling. e.g

forloop (...) {
a[i] = b[i]; // stmt 1
c[i] = a[i]; // stmt 2
d[i] = c[i]; // stmt 3
}


there is a clear true dependence b/w stmt 1 , 2 and 3 which constrains
their execution order. If each statement takes one unit time then
each iteration will take 3 unit time, so if there are 2 iterations
then loop execution will take 6 unit time.

But if we unroll the loop and the reorder the instruction execution
order in such a way that all the non-dependent statements can execute
simultaneously then we can save on execution time. i.e.

1. Unrolling step .

a[0] = b[0]
c[0] = a[0]
d[0] = c[0]
a[1] = b[1]
c[1] = a[1]
d[1] = c[1]


2. After Instruction scheduling step.

a[0] = b[0]
a[1] = b[1]

c[0] = a[0]
c[1] = a[1]

d[0] = c[0]
d[1] = c[1]

Thus instructions in each of above pair of instruction can run
parallely. Thus if the architecture has multiple functional units then
we can run the above loop in 3 units times. Also we can club the
instruction present in each group in a long VLIW format if the
processor is VLIW, this will reduce the initial loop into few VLIW
instructions.

Optimization is called pipelining because we are able to overlap the
execution of multiple instruction after optimizations.

Thanks
- Jatin Bhateja
Reply With Quote
  #3  
Old 08-28-2008, 04:51 PM
Tim Frink
Guest
 
Default Re:Software Pipelining

On Thu, 28 Aug 2008 11:49:49 +0530, Jatin Bhateja wrote:

> It's a combination of loop unrolling and instruction scheduling. e.g


Thank you, but I know how software pipelining is working. :-)

My question was if also a significant performance increase for RISC
architectures with a restricted number of functional units can be
expected when software pipelining is applied.

And if profiling might be exploited here.
Reply With Quote
  #4  
Old 08-29-2008, 08:42 AM
Pertti Kellomaki
Guest
 
Default Re: Software Pipelining

Tim Frink wrote:
> My question was if also a significant performance increase for RISC
> architectures with a restricted number of functional units can be
> expected when software pipelining is applied.


I'm not sure about significant increase, but it seems that even with
a small number of FUs pipelining could be used to hide latency (e.g.
memory access). But if FUs are already busy without software
pipelining, then pipelining is not going to make them any busier.
--
Pertti

Reply With Quote
  #5  
Old 09-03-2008, 01:32 AM
Neeraj Goel
Guest
 
Default Re: Software Pipelining

Also, the overheads of modulo scheduling can make it worse.
Overheads include, more registers, prolog and epilog instructions.
As Tim said, only advantage is in hiding latencies such as memory
access
and floating point.

=Neeraj

On Aug 29, 5:42 pm, Pertti Kellomaki <pertti.kellom...@tut.fi> wrote:
> Tim Frink wrote:
> > My question was if also a significant performance increase for RISC
> > architectures with a restricted number of functional units can be
> > expected when software pipelining is applied.

>
> I'm not sure about significant increase, but it seems that even with
> a small number of FUs pipelining could be used to hide latency (e.g.
> memory access). But if FUs are already busy without software
> pipelining, then pipelining is not going to make them any busier.

Reply With Quote
  #6  
Old 09-08-2008, 08:25 AM
Touati Sid
Guest
 
Default Re: Software Pipelining

Neeraj Goel a icrit :

> Also, the overheads of modulo scheduling can make it worse.
> Overheads include, more registers, prolog and epilog instructions.
> As Tim said, only advantage is in hiding latencies such as memory
> access and floating point.


Software pipelining is usually beneficial for loops with a high number
of iterations.

S

ps: In order to hide memory latencies, non blocking caches should be
present in the underlying cpu. This is not common in vliw processors.

Reply With Quote
  #7  
Old 09-11-2008, 12:15 AM
kamal
Guest
 
Default Re: Software Pipelining

On Aug 29, 1:51 am, Tim Frink <plfr...@yahoo.de> wrote:
> On Thu, 28 Aug 2008 11:49:49 +0530, Jatin Bhateja wrote:
> > It's a combination of loop unrolling and instruction scheduling. e.g

>
> Thank you, but I know how software pipelining is working. :-)
>
> My question was if also a significant performance increase for RISC
> architectures with a restricted number of functional units can be
> expected when software pipelining is applied.


You need more than 1 functional unit to take advantage of pipelining.
Be it RISC or VLIW instruction format, that has no bearing on how many
functional units a processor can have,

The itanium processor has a RISC instruction format, and it has a
notion of an EPIC (explicitly parallel instruction) bundle. The
compiler will have to generate a bundle wherein multiple instructions
will be handed to the processor and scheduled for parallel execution
and it will yield a good performance if each of those instructions can
utilize different functional units at the same time.

> And if profiling might be exploited here.


I didn't get you there i.e. profiling doesn't seem related to
pipelining as described above.

regards
-kamal

Reply With Quote
  #8  
Old 09-11-2008, 04:57 PM
johnhull2008
Guest
 
Default Re: Software Pipelining

On Aug 28, 1:51 pm, Tim Frink <plfr...@yahoo.de> wrote:
> On Thu, 28 Aug 2008 11:49:49 +0530, Jatin Bhateja wrote:
> > It's a combination of loop unrolling and instruction scheduling. e.g

>
> Thank you, but I know how software pipelining is working. :-)
>
> My question was if also a significant performance increase for RISC
> architectures with a restricted number of functional units can be
> expected when software pipelining is applied.
>
> And if profiling might be exploited here.


I believe software pipelining can improve performance even when there
are only a few functional units.
For example, suppose the following code is run on a machine with a
single FU.
All the
It is just incrementing an array of 10 elements.

ldc r0, 0
L1:
0. load r1, 0(r0) # latency 2
1. add r1, r1, 1 # latency 1
2. store 0(r0), r1 # latency 1
3. add r0, r0, 1 # latency 1
4. cmp r0, 10 # latency 1
5. blt L1 # latency 1

For a non-SW pipelined code, a single iteration would take 7 cycles.
Now, for the SW pipelined code, a single iteration would take 6 cycles
on average.

## prologue
op0_0
noop
op1_0
op2_0
op3_0
op4_0
######## 1st iteration of steady state
op0_1
op5_0
op1_1
op2_1
op3_1
op4_1
######## 2nd iteration of steady state
op0_2
op5_1
op1_2
op2_2
op3_2
op4_2
########
..............

The degree of performance increase of a SW pipelined loop over a non-
SW pipelined loop depends on several factors, including the number of
resources (FUs), whether there are recurrences in the loop, the number
of iterations and also register pressure.

Reply With Quote
  #9  
Old 09-16-2008, 05:41 PM
Tim Frink
Guest
 
Default Re: Software Pipelining

>> And if profiling might be exploited here.
>
> I didn't get you there i.e. profiling doesn't seem related to
> pipelining as described above.


This was just a question if it would make sense to somehow
combine software pipelining with the most frequently executed
path (determined by profiling). I can't imagine any useful
combination of these two techniques, but maybe there are some?

Regards,
Tim

Reply With Quote
  #10  
Old 09-16-2008, 05:45 PM
Tim Frink
Guest
 
Default Re: Software Pipelining

> I believe software pipelining can improve performance even when there
> are only a few functional units.
> For example, suppose the following code is run on a machine with a
> single FU.
> All the
> It is just incrementing an array of 10 elements.
>
> ldc r0, 0
> L1:
> 0. load r1, 0(r0) # latency 2
> 1. add r1, r1, 1 # latency 1
> 2. store 0(r0), r1 # latency 1
> 3. add r0, r0, 1 # latency 1
> 4. cmp r0, 10 # latency 1
> 5. blt L1 # latency 1
>
> For a non-SW pipelined code, a single iteration would take 7 cycles.
> Now, for the SW pipelined code, a single iteration would take 6 cycles
> on average.


Even though I understand that this is an example, I still
have the feeling that with just a single functional unit, the
benefits of pipelining are marginal, in your example it's just
about 1 cycle. But I also agree that this might have some
positive effect when the loop has a high iteration count.

Reply With Quote
Reply


Thread Tools
Display Modes


All times are GMT -5. The time now is 06:05 AM.


Powered by vBulletin® Version 3.7.2
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
vB Ad Management by =RedTyger=

In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.