Array optimizing problem in C++? - Java


Array optimizing problem in C++?

  1. Re: Array optimizing problem in C++?

    On Mar 25, 12:20 pm, Lionel B <m...@privacy.net> wrote:
    >
    > Personally I've never managed to code up a scenario (using GCC with
    > various optimisations) where __restrict__ appears to have made any
    > difference whatsoever.


    Try the program I gave earlier on this topic.
    Compile it with :
    g++ -O3

    Then uncomment the line :
    //#define NO_ALIASING_OPTIMIZATION

    and compile it again with g++ -O3.

    There should be a difference.
    Let me know if you don't find any.

    Here is the program :

    #include <iostream>
    #include <ctime>

    // Uncomment to qualify the pointers with __restrict, which lets the
    // compiler assume that dest and src do not alias.
    //#define NO_ALIASING_OPTIMIZATION

    const int len = 50000;

    __attribute__((noinline))
    #ifndef NO_ALIASING_OPTIMIZATION
    void smooth (int* dest, int * src )
    #else
    void smooth (int* __restrict dest, int * __restrict src )
    #endif
    {
        for ( int i = 0 ; i < 17 ; ++i )
            dest[ i ] = src[ i ] + src[ i + 1 ] + src[ i + 2 ];
    }

    void fill (int* src)
    {
        for (int i = 0 ; i < len ; ++ i )
            src[i] = i;
    }


    int main()
    {
        int src_array [len] = {0} ;
        int dest_array [len] = {0};

        fill(src_array);

        smooth (dest_array, dest_array); // dummy call

        clock_t start=clock();

        for (int i = 0; i < 100000000; i++)
            smooth (dest_array, src_array);

        clock_t endt=clock();

        std::cout <<"Time smooth(): " <<
            double(endt-start)/CLOCKS_PER_SEC * 1000 << " ms\n";

        // doesn't work without the following cout on vc++
        std::cout << dest_array [0] ;

        return 0;
    }

    Alexandre Courpron.

  2. Re: Array optimizing problem in C++?

    On Tue, 25 Mar 2008 04:48:26 -0700, courpron wrote:

    > On Mar 25, 12:20 pm, Lionel B <m...@privacy.net> wrote:
    >>
    >> Personally I've never managed to code up a scenario (using GCC with
    >> various optimisations) where __restrict__ appears to have made any
    >> difference whatsoever.

    >
    > Try the program I gave earlier on this topic. Compile it with :
    > g++ -O3
    >
    > Then uncomment the line :
    > //#define NO_ALIASING_OPTIMIZATION
    >
    > and compile it again with g++ -O3.
    >
    > There should be a difference.
    > Let me know if you don't find any.


    With g++ 4.1.2 (same results with 4.2.0)

    With aliasing optimisation:

    Time smooth(): 2120 ms

    Without aliasing optimisation:

    Time smooth(): 2120 ms

    With ICC (Intel compiler) using -restrict -O3

    With aliasing optimisation:

    Time smooth(): 3410 ms

    Without aliasing optimisation:

    Time smooth(): 2990 ms

    So there appears to be a small improvement with aliasing optimisation for
    ICC (although there is a fair amount of variance in the results, so not
    very significant) and no discernible difference for GCC. In fact I've
    checked that there is absolutely no difference in the assembler generated
    by g++. This is on linux x86_64 - I tried compiling 32-bit binaries
    (using -m32 flag) and again, no difference.
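
    Given the variance mentioned above, one way to get steadier numbers is
    to repeat the measurement and keep the fastest run. The following is
    only a minimal sketch, assuming C++11: best_of_ms is a hypothetical
    helper that is not part of the posted program, and it measures
    wall-clock time with std::chrono::steady_clock rather than the CPU
    time that clock() reports.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper (not from the thread): call `work` `runs` times and
// return the fastest wall-clock time of a single call, in milliseconds.
template <typename Work>
double best_of_ms(int runs, Work work)
{
    assert(runs > 0);
    using clock = std::chrono::steady_clock;

    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < runs; ++r) {
        const auto start = clock::now();
        work();  // the code under test, e.g. the loop that calls smooth()
        const auto stop = clock::now();

        const std::chrono::duration<double, std::milli> elapsed = stop - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

    The timed loop from main() could be passed in as a lambda; comparing
    the best of several runs for each build is usually less noisy than
    comparing single runs.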

    --
    Lionel B

  3. Re: Array optimizing problem in C++?

    On Tue, 25 Mar 2008 12:54:47 +0000, Lionel B wrote:

    > On Tue, 25 Mar 2008 04:48:26 -0700, courpron wrote:
    >
    >> On Mar 25, 12:20 pm, Lionel B <m...@privacy.net> wrote:


    > With ICC (Intel compiler) using -restrict -O3
    >
    > With aliasing optimisation:
    >
    > Time smooth(): 3410 ms
    >
    > Without aliasing optimisation:
    >
    > Time smooth(): 2990 ms


    Sorry, those results were swapped round

    > So there appears to be a small improvement with aliasing optimisation
    > for ICC (although there is a fair amount of variance in the results, so
    > not very significant)


    --
    Lionel B

  4. Re: Array optimizing problem in C++?

    On Mar 25, 1:54 pm, Lionel B <m...@privacy.net> wrote:
    > On Tue, 25 Mar 2008 04:48:26 -0700, courpron wrote:
    > > On Mar 25, 12:20 pm, Lionel B <m...@privacy.net> wrote:

    >
    > >> Personally I've never managed to code up a scenario (using GCC with
    > >> various optimisations) where __restrict__ appears to have made any
    > >> difference whatsoever.

    >
    > > Try the program I gave earlier on this topic. Compile it with :
    > > g++ -O3

    >
    > > Then uncomment the line :
    > > //#define NO_ALIASING_OPTIMIZATION

    >
    > > and compile it again with g++ -O3.

    >
    > > There should be a difference.
    > > Let me know if you don't find any.

    >
    > With g++ 4.1.2 (same results with 4.2.0)
    >
    > With aliasing optimisation:
    >
    > Time smooth(): 2120 ms
    >
    > Without aliasing optimisation:
    >
    > Time smooth(): 2120 ms
    >


    I used exactly the same compiler (g++ 4.1.2) and there was a
    difference in the generated assembly listing (my architecture is 32
    bits, intel pentium 4)

    Could you post the generated assembly when NO_ALIASING_OPTIMIZATION is
    defined ?


    Alexandre Courpron.

  5. Re: Array optimizing problem in C++?

    On Tue, 25 Mar 2008 06:13:30 -0700, courpron wrote:

    > On Mar 25, 1:54 pm, Lionel B <m...@privacy.net> wrote:
    >> On Tue, 25 Mar 2008 04:48:26 -0700, courpron wrote:
    >> > On Mar 25, 12:20 pm, Lionel B <m...@privacy.net> wrote:

    >>
    >> >> Personally I've never managed to code up a scenario (using GCC with
    >> >> various optimisations) where __restrict__ appears to have made any
    >> >> difference whatsoever.

    >>
    >> > Try the program I gave earlier on this topic. Compile it with : g++
    >> > -O3

    >>
    >> > Then uncomment the line :
    >> > //#define NO_ALIASING_OPTIMIZATION

    >>
    >> > and compile it again with g++ -O3.

    >>
    >> > There should be a difference.
    >> > Let me know if you don't find any.

    >>
    >> With g++ 4.1.2 (same results with 4.2.0)
    >>
    >> With aliasing optimisation:
    >>
    >> Time smooth(): 2120 ms
    >>
    >> Without aliasing optimisation:
    >>
    >> Time smooth(): 2120 ms
    >>

    > I used exactly the same compiler (g++ 4.1.2) and there was a difference
    > in the generated assembly listing (my architecture is 32 bits, intel
    > pentium 4)
    >
    > Could you post the generated assembly when NO_ALIASING_OPTIMIZATION is
    > defined ?


    Sure (here's the 32-bit version, as you might be able to compare it better):

    $ g++ -m32 -O3 -S scratch.cpp
    $ cat scratch.s

    .file "scratch.cpp"
    .section .ctors,"aw",@progbits
    .align 4
    .long _GLOBAL__I__Z6smoothPiS_
    .text
    .align 2
    .p2align 4,,15
    .globl _Z6smoothPiS_
    .type _Z6smoothPiS_, @function
    _Z6smoothPiS_:
    .LFB1435:
    pushl %ebp
    .LCFI0:
    movl %esp, %ebp
    .LCFI1:
    movl 12(%ebp), %edx
    pushl %esi
    .LCFI2:
    movl 8(%ebp), %esi
    pushl %ebx
    .LCFI3:
    movl 8(%edx), %eax
    leal 8(%edx), %ebx
    addl 4(%edx), %eax
    addl (%edx), %eax
    leal 4(%edx), %ecx
    movl %eax, (%esi)
    movl 4(%ebx), %eax
    addl 4(%ecx), %eax
    addl 4(%edx), %eax
    movl %eax, 4(%esi)
    movl 8(%edx), %eax
    addl 8(%ecx), %eax
    addl 8(%ebx), %eax
    movl %eax, 8(%esi)
    movl 12(%edx), %eax
    addl 12(%ecx), %eax
    addl 12(%ebx), %eax
    movl %eax, 12(%esi)
    movl 16(%edx), %eax
    addl 16(%ecx), %eax
    addl 16(%ebx), %eax
    movl %eax, 16(%esi)
    movl 20(%edx), %eax
    addl 20(%ecx), %eax
    addl 20(%ebx), %eax
    movl %eax, 20(%esi)
    movl 24(%edx), %eax
    addl 24(%ecx), %eax
    addl 24(%ebx), %eax
    movl %eax, 24(%esi)
    movl 28(%edx), %eax
    addl 28(%ecx), %eax
    addl 28(%ebx), %eax
    movl %eax, 28(%esi)
    movl 32(%edx), %eax
    addl 32(%ecx), %eax
    addl 32(%ebx), %eax
    movl %eax, 32(%esi)
    movl 36(%edx), %eax
    addl 36(%ecx), %eax
    addl 36(%ebx), %eax
    movl %eax, 36(%esi)
    movl 40(%edx), %eax
    addl 40(%ecx), %eax
    addl 40(%ebx), %eax
    movl %eax, 40(%esi)
    movl 44(%edx), %eax
    addl 44(%ecx), %eax
    addl 44(%ebx), %eax
    movl %eax, 44(%esi)
    movl 48(%edx), %eax
    addl 48(%ecx), %eax
    addl 48(%ebx), %eax
    movl %eax, 48(%esi)
    movl 52(%edx), %eax
    addl 52(%ecx), %eax
    addl 52(%ebx), %eax
    movl %eax, 52(%esi)
    movl 56(%edx), %eax
    addl 56(%ecx), %eax
    addl 56(%ebx), %eax
    movl %eax, 56(%esi)
    movl 60(%edx), %eax
    addl 60(%ecx), %eax
    addl 60(%ebx), %eax
    movl %eax, 60(%esi)
    movl 64(%edx), %eax
    addl 64(%ecx), %eax
    addl 64(%ebx), %eax
    movl %eax, 64(%esi)
    popl %ebx
    popl %esi
    popl %ebp
    ret
    .LFE1435:
    .size _Z6smoothPiS_, .-_Z6smoothPiS_
    .globl __gxx_personality_v0
    .align 2
    .p2align 4,,15
    .globl _Z4fillPi
    .type _Z4fillPi, @function
    _Z4fillPi:
    .LFB1436:
    pushl %ebp
    .LCFI4:
    xorl %eax, %eax
    movl %esp, %ebp
    .LCFI5:
    movl 8(%ebp), %edx
    .p2align 4,,7
    .L4:
    movl %eax, (%edx,%eax,4)
    addl $1, %eax
    cmpl $50000, %eax
    jne .L4
    popl %ebp
    ret
    .LFE1436:
    .size _Z4fillPi, .-_Z4fillPi
    .align 2
    .p2align 4,,15
    .type _Z41__static_initialization_and_destruction_0ii, @function
    _Z41__static_initialization_and_destruction_0ii:
    .LFB1591:
    pushl %ebp
    .LCFI6:
    movl %esp, %ebp
    .LCFI7:
    subl $24, %esp
    .LCFI8:
    subl $1, %eax
    je .L15
    .L14:
    leave
    ret
    .p2align 4,,7
    .L15:
    cmpl $65535, %edx
    jne .L14
    movl $_ZSt8__ioinit, (%esp)
    call _ZNSt8ios_base4InitC1Ev
    movl $__dso_handle, 8(%esp)
    movl $0, 4(%esp)
    movl $__tcf_0, (%esp)
    call __cxa_atexit
    leave
    ret
    .LFE1591:
    .size _Z41__static_initialization_and_destruction_0ii, .-_Z41__static_initialization_and_destruction_0ii
    .align 2
    .p2align 4,,15
    .type _GLOBAL__I__Z6smoothPiS_, @function
    _GLOBAL__I__Z6smoothPiS_:
    .LFB1593:
    pushl %ebp
    .LCFI9:
    movl $65535, %edx
    movl %esp, %ebp
    .LCFI10:
    movl $1, %eax
    popl %ebp
    jmp _Z41__static_initialization_and_destruction_0ii
    .LFE1593:
    .size _GLOBAL__I__Z6smoothPiS_, .-_GLOBAL__I__Z6smoothPiS_
    .align 2
    .p2align 4,,15
    .type __tcf_0, @function
    __tcf_0:
    .LFB1592:
    pushl %ebp
    .LCFI11:
    movl %esp, %ebp
    .LCFI12:
    movl $_ZSt8__ioinit, 8(%ebp)
    popl %ebp
    jmp _ZNSt8ios_base4InitD1Ev
    .LFE1592:
    .size __tcf_0, .-__tcf_0
    .section .rodata.str1.1,"aMS",@progbits,1
    .LC0:
    .string "Time smooth(): "
    .LC3:
    .string " ms\n"
    .section .rodata.cst4,"aM",@progbits,4
    .align 4
    .LC1:
    .long 1232348160
    .align 4
    .LC2:
    .long 1148846080
    .text
    .align 2
    .p2align 4,,15
    .globl main
    .type main, @function
    main:
    .LFB1437:
    leal 4(%esp), %ecx
    .LCFI13:
    andl $-16, %esp
    pushl -4(%ecx)
    .LCFI14:
    pushl %ebp
    .LCFI15:
    movl %esp, %ebp
    .LCFI16:
    pushl %edi
    .LCFI17:
    pushl %esi
    .LCFI18:
    pushl %ebx
    .LCFI19:
    pushl %ecx
    .LCFI20:
    subl $400024, %esp
    .LCFI21:
    leal -200016(%ebp), %esi
    movl $200000, 8(%esp)
    leal -400016(%ebp), %edi
    movl $0, 4(%esp)
    movl %esi, (%esp)
    call memset
    movl $200000, 8(%esp)
    movl $0, 4(%esp)
    movl %edi, (%esp)
    call memset
    xorl %eax, %eax
    .p2align 4,,7
    .L21:
    movl %eax, (%esi,%eax,4)
    addl $1, %eax
    cmpl $50000, %eax
    jne .L21
    movl %edi, 4(%esp)
    xorl %ebx, %ebx
    movl %edi, (%esp)
    call _Z6smoothPiS_
    call clock
    movl %eax, -400020(%ebp)
    .p2align 4,,7
    .L23:
    movl %esi, 4(%esp)
    addl $1, %ebx
    movl %edi, (%esp)
    call _Z6smoothPiS_
    cmpl $100000000, %ebx
    jne .L23
    call clock
    movl $.LC0, 4(%esp)
    movl $_ZSt4cout, (%esp)
    movl %eax, %ebx
    call _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
    subl -400020(%ebp), %ebx
    pushl %ebx
    fildl (%esp)
    addl $4, %esp
    fdivs .LC1
    movl %eax, (%esp)
    fmuls .LC2
    fstpl 4(%esp)
    call _ZNSolsEd
    movl $.LC3, 4(%esp)
    movl %eax, (%esp)
    call _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
    movl -400016(%ebp), %eax
    movl $_ZSt4cout, (%esp)
    movl %eax, 4(%esp)
    call _ZNSolsEi
    addl $400024, %esp
    xorl %eax, %eax
    popl %ecx
    popl %ebx
    popl %esi
    popl %edi
    popl %ebp
    leal -4(%ecx), %esp
    ret
    .LFE1437:
    .size main, .-main
    .local _ZSt8__ioinit
    .comm _ZSt8__ioinit,1,1
    .weakref _Z20__gthrw_pthread_oncePiPFvvE,pthread_once
    .weakref _Z27__gthrw_pthread_getspecificj,pthread_getspecific
    .weakref _Z27__gthrw_pthread_setspecificjPKv,pthread_setspecific
    .weakref _Z22__gthrw_pthread_createPmPK14pthread_attr_tPFPvS3_ES3_,pthread_create
    .weakref _Z22__gthrw_pthread_cancelm,pthread_cancel
    .weakref _Z26__gthrw_pthread_mutex_lockP15pthread_mutex_t,pthread_mutex_lock
    .weakref _Z29__gthrw_pthread_mutex_trylockP15pthread_mutex_t,pthread_mutex_trylock
    .weakref _Z28__gthrw_pthread_mutex_unlockP15pthread_mutex_t,pthread_mutex_unlock
    .weakref _Z26__gthrw_pthread_mutex_initP15pthread_mutex_tPK19pthread_mutexattr_t,pthread_mutex_init
    .weakref _Z26__gthrw_pthread_key_createPjPFvPvE,pthread_key_create
    .weakref _Z26__gthrw_pthread_key_deletej,pthread_key_delete
    .weakref _Z30__gthrw_pthread_mutexattr_initP19pthread_mutexattr_t,pthread_mutexattr_init
    .weakref _Z33__gthrw_pthread_mutexattr_settypeP19pthread_mutexattr_ti,pthread_mutexattr_settype
    .weakref _Z33__gthrw_pthread_mutexattr_destroyP19pthread_mutexattr_t,pthread_mutexattr_destroy
    .section .eh_frame,"a",@progbits
    .Lframe1:
    .long .LECIE1-.LSCIE1
    .LSCIE1:
    .long 0x0
    .byte 0x1
    .string "zP"
    .uleb128 0x1
    .sleb128 -4
    .byte 0x8
    .uleb128 0x5
    .byte 0x0
    .long __gxx_personality_v0
    .byte 0xc
    .uleb128 0x4
    .uleb128 0x4
    .byte 0x88
    .uleb128 0x1
    .align 4
    .LECIE1:
    .LSFDE5:
    .long .LEFDE5-.LASFDE5
    .LASFDE5:
    .long .LASFDE5-.Lframe1
    .long .LFB1591
    .long .LFE1591-.LFB1591
    .uleb128 0x0
    .byte 0x4
    .long .LCFI6-.LFB1591
    .byte 0xe
    .uleb128 0x8
    .byte 0x85
    .uleb128 0x2
    .byte 0x4
    .long .LCFI7-.LCFI6
    .byte 0xd
    .uleb128 0x5
    .align 4
    .LEFDE5:
    .LSFDE11:
    .long .LEFDE11-.LASFDE11
    .LASFDE11:
    .long .LASFDE11-.Lframe1
    .long .LFB1437
    .long .LFE1437-.LFB1437
    .uleb128 0x0
    .byte 0x4
    .long .LCFI13-.LFB1437
    .byte 0xc
    .uleb128 0x1
    .uleb128 0x0
    .byte 0x9
    .uleb128 0x4
    .uleb128 0x1
    .byte 0x4
    .long .LCFI14-.LCFI13
    .byte 0xc
    .uleb128 0x4
    .uleb128 0x4
    .byte 0x4
    .long .LCFI15-.LCFI14
    .byte 0xe
    .uleb128 0x8
    .byte 0x85
    .uleb128 0x2
    .byte 0x4
    .long .LCFI16-.LCFI15
    .byte 0xd
    .uleb128 0x5
    .byte 0x4
    .long .LCFI20-.LCFI16
    .byte 0x84
    .uleb128 0x6
    .byte 0x83
    .uleb128 0x5
    .byte 0x86
    .uleb128 0x4
    .byte 0x87
    .uleb128 0x3
    .align 4
    .LEFDE11:
    .ident "GCC: (GNU) 4.1.2 20070626 (Red Hat 4.1.2-14)"
    .section .note.GNU-stack,"",@progbits


    --
    Lionel B

  6. Re: Array optimizing problem in C++?

    On Mar 25, 2:23 pm, Lionel B <m...@privacy.net> wrote:
    > [snip asm]


    It doesn't generate the same thing on my machine. Your assembly
    listing doesn't display any aliasing optimization. There are still 3
    memory accesses for each iteration (2 reads and 1 write). I also see
    that there are "weakref" entries in your assembly listing, which is
    not the case on my machine. This is probably a libstdc++ issue. So, if
    you don't mind, could you try the following program (simpler smooth
    function and no #include; we can always resort to the command
    "time ./a.out" to measure the performance of each version):

    //#define NO_ALIASING_OPTIMIZATION

    const int len = 16;

    __attribute__((noinline))
    #ifndef NO_ALIASING_OPTIMIZATION
    void smooth (int* dest, int * src )
    #else
    void smooth (int* __restrict dest, int * __restrict src )
    #endif
    {
        for ( int i = 0 ; i < len ; ++i )
            dest[ i ] = *src;
    }

    int main()
    {
        int dest_array [len];
        int *src = new int();

        for (int i = 0; i < 100000000; i++)
            smooth (dest_array, src);

        return 0;
    }


    Alexandre Courpron.

  7. Re: Array optimizing problem in C++?

    On Tue, 25 Mar 2008 07:22:36 -0700, courpron wrote:

    > On Mar 25, 2:23 pm, Lionel B <m...@privacy.net> wrote:
    >> [snip asm]

    >
    > It doesn't generate the same thing on my machine. Your assembly listing
    > doesn't display any aliasing optimization. There are still 3 memory
    > accesses for each iteration (2 reads and 1 write). I also see that
    > there are "weakref" entries in your assembly listing, which is not the
    > case on my machine. This is probably a libstdc++ issue. So, if you
    > don't mind, could you try the following program (simpler smooth
    > function and no #include; we can always resort to the command
    > "time ./a.out" to measure the performance of each version):


    That seems to have made a difference - to the assembly at least - I
    still don't see any significant difference in run time, however.

    Compiled with:

    g++ -m32 -S -O3 scratch.cpp

    **************************** with no-alias optimisation:

    .file "scratch.cpp"
    .text
    .align 2
    .p2align 4,,15
    .globl _Z6smoothPiS_
    .type _Z6smoothPiS_, @function
    _Z6smoothPiS_:
    .LFB2:
    pushl %ebp
    .LCFI0:
    movl %esp, %ebp
    .LCFI1:
    movl 12(%ebp), %edx
    movl 8(%ebp), %eax
    movl (%edx), %edx
    movl %edx, (%eax)
    movl %edx, 4(%eax)
    movl %edx, 8(%eax)
    movl %edx, 12(%eax)
    movl %edx, 16(%eax)
    movl %edx, 20(%eax)
    movl %edx, 24(%eax)
    movl %edx, 28(%eax)
    movl %edx, 32(%eax)
    movl %edx, 36(%eax)
    movl %edx, 40(%eax)
    movl %edx, 44(%eax)
    movl %edx, 48(%eax)
    movl %edx, 52(%eax)
    movl %edx, 56(%eax)
    movl %edx, 60(%eax)
    popl %ebp
    ret
    .LFE2:
    .size _Z6smoothPiS_, .-_Z6smoothPiS_
    .globl __gxx_personality_v0
    .align 2
    .p2align 4,,15
    .globl main
    .type main, @function
    main:
    .LFB3:
    leal 4(%esp), %ecx
    .LCFI2:
    andl $-16, %esp
    pushl -4(%ecx)
    .LCFI3:
    pushl %ebp
    .LCFI4:
    movl %esp, %ebp
    .LCFI5:
    pushl %edi
    .LCFI6:
    pushl %esi
    .LCFI7:
    pushl %ebx
    .LCFI8:
    xorl %ebx, %ebx
    pushl %ecx
    .LCFI9:
    subl $72, %esp
    .LCFI10:
    movl $4, (%esp)
    leal -80(%ebp), %edi
    call _Znwj
    movl %eax, %esi
    movl $0, (%eax)
    .p2align 4,,7
    .L4:
    movl %esi, 4(%esp)
    addl $1, %ebx
    movl %edi, (%esp)
    call _Z6smoothPiS_
    cmpl $100000000, %ebx
    jne .L4
    addl $72, %esp
    xorl %eax, %eax
    popl %ecx
    popl %ebx
    popl %esi
    popl %edi
    popl %ebp
    leal -4(%ecx), %esp
    ret
    .LFE3:
    .size main, .-main
    .section .eh_frame,"a",@progbits
    .Lframe1:
    .long .LECIE1-.LSCIE1
    .LSCIE1:
    .long 0x0
    .byte 0x1
    .string "zP"
    .uleb128 0x1
    .sleb128 -4
    .byte 0x8
    .uleb128 0x5
    .byte 0x0
    .long __gxx_personality_v0
    .byte 0xc
    .uleb128 0x4
    .uleb128 0x4
    .byte 0x88
    .uleb128 0x1
    .align 4
    .LECIE1:
    .LSFDE3:
    .long .LEFDE3-.LASFDE3
    .LASFDE3:
    .long .LASFDE3-.Lframe1
    .long .LFB3
    .long .LFE3-.LFB3
    .uleb128 0x0
    .byte 0x4
    .long .LCFI2-.LFB3
    .byte 0xc
    .uleb128 0x1
    .uleb128 0x0
    .byte 0x9
    .uleb128 0x4
    .uleb128 0x1
    .byte 0x4
    .long .LCFI3-.LCFI2
    .byte 0xc
    .uleb128 0x4
    .uleb128 0x4
    .byte 0x4
    .long .LCFI4-.LCFI3
    .byte 0xe
    .uleb128 0x8
    .byte 0x85
    .uleb128 0x2
    .byte 0x4
    .long .LCFI5-.LCFI4
    .byte 0xd
    .uleb128 0x5
    .byte 0x4
    .long .LCFI8-.LCFI5
    .byte 0x83
    .uleb128 0x5
    .byte 0x86
    .uleb128 0x4
    .byte 0x87
    .uleb128 0x3
    .byte 0x4
    .long .LCFI9-.LCFI8
    .byte 0x84
    .uleb128 0x6
    .align 4
    .LEFDE3:
    .ident "GCC: (GNU) 4.1.2 20070626 (Red Hat 4.1.2-14)"
    .section .note.GNU-stack,"",@progbits

    **************************** without no-alias optimisation:

    .file "scratch.cpp"
    .text
    .align 2
    .p2align 4,,15
    .globl _Z6smoothPiS_
    .type _Z6smoothPiS_, @function
    _Z6smoothPiS_:
    .LFB2:
    pushl %ebp
    .LCFI0:
    movl %esp, %ebp
    .LCFI1:
    movl 12(%ebp), %eax
    movl 8(%ebp), %edx
    movl (%eax), %ecx
    movl %ecx, (%edx)
    movl (%eax), %ecx
    movl %ecx, 4(%edx)
    movl (%eax), %ecx
    movl %ecx, 8(%edx)
    movl (%eax), %ecx
    movl %ecx, 12(%edx)
    movl (%eax), %ecx
    movl %ecx, 16(%edx)
    movl (%eax), %ecx
    movl %ecx, 20(%edx)
    movl (%eax), %ecx
    movl %ecx, 24(%edx)
    movl (%eax), %ecx
    movl %ecx, 28(%edx)
    movl (%eax), %ecx
    movl %ecx, 32(%edx)
    movl (%eax), %ecx
    movl %ecx, 36(%edx)
    movl (%eax), %ecx
    movl %ecx, 40(%edx)
    movl (%eax), %ecx
    movl %ecx, 44(%edx)
    movl (%eax), %ecx
    movl %ecx, 48(%edx)
    movl (%eax), %ecx
    movl %ecx, 52(%edx)
    movl (%eax), %ecx
    movl %ecx, 56(%edx)
    movl (%eax), %eax
    movl %eax, 60(%edx)
    popl %ebp
    ret
    .LFE2:
    .size _Z6smoothPiS_, .-_Z6smoothPiS_
    .globl __gxx_personality_v0
    .align 2
    .p2align 4,,15
    .globl main
    .type main, @function
    main:
    .LFB3:
    leal 4(%esp), %ecx
    .LCFI2:
    andl $-16, %esp
    pushl -4(%ecx)
    .LCFI3:
    pushl %ebp
    .LCFI4:
    movl %esp, %ebp
    .LCFI5:
    pushl %edi
    .LCFI6:
    pushl %esi
    .LCFI7:
    pushl %ebx
    .LCFI8:
    xorl %ebx, %ebx
    pushl %ecx
    .LCFI9:
    subl $72, %esp
    .LCFI10:
    movl $4, (%esp)
    leal -80(%ebp), %edi
    call _Znwj
    movl %eax, %esi
    movl $0, (%eax)
    .p2align 4,,7
    .L4:
    movl %esi, 4(%esp)
    addl $1, %ebx
    movl %edi, (%esp)
    call _Z6smoothPiS_
    cmpl $100000000, %ebx
    jne .L4
    addl $72, %esp
    xorl %eax, %eax
    popl %ecx
    popl %ebx
    popl %esi
    popl %edi
    popl %ebp
    leal -4(%ecx), %esp
    ret
    .LFE3:
    .size main, .-main
    .section .eh_frame,"a",@progbits
    .Lframe1:
    .long .LECIE1-.LSCIE1
    .LSCIE1:
    .long 0x0
    .byte 0x1
    .string "zP"
    .uleb128 0x1
    .sleb128 -4
    .byte 0x8
    .uleb128 0x5
    .byte 0x0
    .long __gxx_personality_v0
    .byte 0xc
    .uleb128 0x4
    .uleb128 0x4
    .byte 0x88
    .uleb128 0x1
    .align 4
    .LECIE1:
    .LSFDE3:
    .long .LEFDE3-.LASFDE3
    .LASFDE3:
    .long .LASFDE3-.Lframe1
    .long .LFB3
    .long .LFE3-.LFB3
    .uleb128 0x0
    .byte 0x4
    .long .LCFI2-.LFB3
    .byte 0xc
    .uleb128 0x1
    .uleb128 0x0
    .byte 0x9
    .uleb128 0x4
    .uleb128 0x1
    .byte 0x4
    .long .LCFI3-.LCFI2
    .byte 0xc
    .uleb128 0x4
    .uleb128 0x4
    .byte 0x4
    .long .LCFI4-.LCFI3
    .byte 0xe
    .uleb128 0x8
    .byte 0x85
    .uleb128 0x2
    .byte 0x4
    .long .LCFI5-.LCFI4
    .byte 0xd
    .uleb128 0x5
    .byte 0x4
    .long .LCFI8-.LCFI5
    .byte 0x83
    .uleb128 0x5
    .byte 0x86
    .uleb128 0x4
    .byte 0x87
    .uleb128 0x3
    .byte 0x4
    .long .LCFI9-.LCFI8
    .byte 0x84
    .uleb128 0x6
    .align 4
    .LEFDE3:
    .ident "GCC: (GNU) 4.1.2 20070626 (Red Hat 4.1.2-14)"
    .section .note.GNU-stack,"",@progbits



    --
    Lionel B

  8. Re: Array optimizing problem in C++?

    On Mar 25, 3:58 pm, Lionel B <m...@privacy.net> wrote:
    > On Tue, 25 Mar 2008 07:22:36 -0700, courpron wrote:
    > > On Mar 25, 2:23 pm, Lionel B <m...@privacy.net> wrote:
    > >> [snip asm]

    >
    > That seems to have made a difference - to the assembly at least - I
    > still don't see any significant difference in run time, however.


    Yes, the performance increase on my machine was around 4%.
    In this example, a single int is read to fill dest_array, so it
    stays in the cache easily and the performance degradation is minimal.
    Operations on large arrays can, however, lead to cache thrashing, and
    the performance degradation would then be much more visible.


    The assembly listing now shows the no-aliasing optimization.

    From this code :
    > for ( int i = 0 ; i < len ; ++i )
    > dest[ i ] = *src;



    generated asm with __restrict :

    > movl (%edx), %edx


    > movl %edx, (%eax)
    > movl %edx, 4(%eax)
    > movl %edx, 8(%eax)

    ...

    With __restrict, the value of *src is loaded into a register (edx)
    once and reused for all the remaining iterations.


    generated asm without __restrict :

    > movl (%eax), %ecx
    > movl %ecx, (%edx)


    > movl (%eax), %ecx
    > movl %ecx, 4(%edx)


    > movl (%eax), %ecx
    > movl %ecx, 8(%edx)

    ...

    Without __restrict, the value is reloaded into a register (ecx) on
    each iteration because the compiler doesn't know whether &dest[i] and
    src point to the same object. The compiler must allow for the
    situation where changing the value of dest[i] changes the value of
    *src. It deals with this by reloading the value of *src each time.
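
    To make the aliasing case concrete, here is a minimal sketch (not from
    the thread) in which src points into the destination array. The loop
    stores *src + i instead of *src, an assumption made only so that the
    difference becomes observable; fill_reloading, fill_hoisted and
    run_overlapping are hypothetical names. Rereading *src on every
    iteration and reading it once give different results here, which is
    why the load cannot be hoisted unless the compiler is told that the
    pointers do not alias.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reads *src on every iteration, which is what the compiler must do when
// dest and src are allowed to alias.
void fill_reloading(int* dest, const int* src, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dest[i] = *src + static_cast<int>(i);
}

// Reads *src once before the loop, which is what the optimizer may do
// when it can assume that no store through dest changes *src.
void fill_hoisted(int* dest, const int* src, std::size_t n)
{
    const int value = *src;
    for (std::size_t i = 0; i < n; ++i)
        dest[i] = value + static_cast<int>(i);
}

// Hypothetical driver: src deliberately points at element 1 of the
// buffer that is being overwritten.
std::vector<int> run_overlapping(void (*fill)(int*, const int*, std::size_t))
{
    assert(fill != nullptr);
    std::vector<int> buf = {10, 20, 30, 40};
    fill(buf.data(), &buf[1], buf.size());
    return buf;
}
```

    With the overlapping call, fill_reloading leaves {20, 21, 23, 24} in
    the buffer, while fill_hoisted leaves {20, 21, 22, 23}.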


    Alexandre Courpron.

  9. Re: Array optimizing problem in C++?

    On Tue, 25 Mar 2008 08:49:42 -0700, courpron wrote:

    > On Mar 25, 3:58 pm, Lionel B <m...@privacy.net> wrote:
    >> On Tue, 25 Mar 2008 07:22:36 -0700, courpron wrote:
    >> > On Mar 25, 2:23 pm, Lionel B <m...@privacy.net> wrote:
    >> >> [snip asm]

    >>
    >> That seems to have made a difference - to the assembly at least - I
    >> still don't see any significant difference in run time, however.

    >
    > Yes, the performance increase on my machine was around 4%.


    I've just checked your original code under g++ 4.3.0 (installed with its
    own libstdc++) and now I do indeed get the optimisation benefit:

    without no-alias optimisation: 3270 ms
    with no-alias optimisation: 1310 ms

    I suspect you were right about it being an issue with libstdc++

    Cheers,

    --
    Lionel B

  10. Re: Array optimizing problem in C++?

    Lionel B wrote:
    > On Sun, 23 Mar 2008 16:59:00 -0600, Jerry Coffin wrote:
    >
    >> In article <MIednRYljpBPQ3vanZ2dnUVZ_rLinZ2d@comcast.com>,
    >> andreytarasevich@hotmail.com says...
    >>
    >> [ ... ]
    >>
    >>> I don't really see much sense in comparing Java performance with
    >>> C++ performance. What might be more interesting is the effect of
    >>> the 'restrict' specifier supported by C99 compilers (and many
    >>> C89/90 and C++ compilers as an extension), which is intended to
    >>> assist the compiler in performing exactly this kind of optimization.

    >>
    >> Or, if you're really interested in C++, you could throw in a
    >> comparison using a valarray, which was designed to give the same
    >> sort of assurance. Unfortunately, I don't know of anybody who
    >> seems to have gone to much (if any) trouble to optimize valarray
    >> at all -- rather the contrary, it was an idea that even its own
    >> creator admits was in the wrong place at the wrong time, so it's
    >> ignored almost to death, so to speak.

    >
    > FWIW on my (gcc 4.1.1 supplied) implementation valarrays appear to
    > be simply pointers to new'ed memory with a (GCC-specific)
    > __restrict__ qualifier.
    >
    > Personally I've never managed to code up a scenario (using GCC with
    > various optimisations) where __restrict__ appears to have made any
    > difference whatsoever. I seem to recall that this has come up now
    > and again - inconclusively - on the g++ lists.


    You have to work with functions taking multiple pointers to arrays of
    the same type. This is not idiomatic for C++ code.
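
    For comparison, the valarray approach mentioned in the quoted text
    could look roughly like the sketch below. It computes the same
    three-point sum as the smooth() function from earlier in the thread,
    but smooth_valarray is a hypothetical name, and nothing in this thread
    establishes whether a given compiler actually optimizes it any better
    than the pointer version.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Hypothetical valarray counterpart of the thread's smooth():
// result[i] = src[i] + src[i + 1] + src[i + 2].
std::valarray<int> smooth_valarray(const std::valarray<int>& src)
{
    assert(src.size() >= 3);
    const std::size_t n = src.size() - 2;

    // Each std::slice(start, size, stride) selects n consecutive
    // elements; indexing a const valarray with a slice yields their values.
    std::valarray<int> result = src[std::slice(0, n, 1)];
    result += src[std::slice(1, n, 1)];
    result += src[std::slice(2, n, 1)];
    return result;
}
```

    The function takes no raw pointers at all, so there is no pair of
    same-typed pointer parameters for the compiler to worry about.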


    Bo Persson


