#51
"Ben Voigt [C++ MVP]" <rbv@nospam.nospam> wrote in message
news:%23WmD47o6HHA.3264@TK2MSFTNGP02.phx.gbl...
>
> "Willy Denoyette [MVP]" <willy.denoyette@telenet.be> wrote in message
> news:euVDFIc6HHA.3264@TK2MSFTNGP02.phx.gbl...
>> "Ben Voigt [C++ MVP]" <rbv@nospam.nospam> wrote in message
>> news:eRxhhZZ6HHA.1212@TK2MSFTNGP05.phx.gbl...
>>>
>>>> Sorry, above does not contain the correct verifiable C# function, here
>>>> she is........
>>>>
>>>> private const int MOD_ADLER = 65521;
>>>> private static uint AdlerSafe(byte[] databytes)
>>>> {
>>>>     const int unrollFactor = 16;
>>>>     uint a = 1, b = 0;
>>>>     int n = -1;
>>>>     int len = databytes.Length;
>>>>     while(len > 0)
>>>>     {
>>>>         int tlen = len > 5552 ? 5552 : len;
>>>
>>> Do you get the same result after changing this number? I would guess
>>> probably so, in which case that could make a significant difference as
>>> well.
>>>
>>
>> Changing into what?
>> If you mean into 5550, it makes no real (measurable) difference in
>> performance.
>> Take care, this is not true for the "optimized C# (unsafe)" sample
>> (based on Mark Adler's adler32 algorithm). There the NMAX length must be
>> a multiple of the "unroll factor", while NMAX is the largest n such that
>> 255n(n+1)/2 + (n+1)(BASE-1) <= (2^32)-1 (as per the adler32 algorithm).
>>
>> 5552 is the largest possible value for n, and this value is perfectly
>> suitable for an "unroll factor" of 16 (5552 = 16 * 347).
>> Changing NMAX into 5550 (which is a valid number) requires you to change
>> the "unroll factor" into 15 (5550 = 15 * 370), else the result of the
>> checksum will be incorrect!
>
> No fair, you're actually familiar with the algorithm in question!
>

Not really, the algorithm is well documented. All I did was derive the
unsafe C# code from a fast C implementation and optimize the code a bit so
as to reduce register pressure while executing the inner loop.
The result is that executing this:

    s1 += ptr[0];
    s2 += s1;
    s1 += ptr[1]
    ...

results in:

    movzx eax,byte ptr [edi]
    add   esi,eax
    add   ebx,esi
    movzx eax,byte ptr [edi+1]
    ....

That means you need a three-instruction sequence to handle one byte of the
array, which is the absolute minimum on x86; even an optimized C build
can't do better.
The loop unrolling reduces the number of writes of the intermediate result
to memory, but it has less effect than reducing the register pressure.

But again, my point is that you should not have to do this: the language
compilers and the JIT should do a better job here. It looks like MSFT is
focusing on features and less on performance.

Willy.
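
For readers following the NMAX / unroll-factor discussion, here is a minimal C sketch of the deferred-modulo scheme. It is not the C# code from this thread, which is only partly quoted. The names adler32_simple, adler32_blocked, largest_nmax and STEP are made up for illustration. Unlike the unsafe C# sample described above, this sketch handles leftover bytes in a tail loop, so it does not depend on NMAX being a multiple of the unroll factor.

```c
#include <stddef.h>
#include <stdint.h>

#define BASE 65521u /* largest prime below 2^16 */
#define NMAX 5552u  /* bytes that can be summed before s2 may overflow 32 bits */

/* Reference version: reduce modulo BASE after every byte (slow but obvious). */
static uint32_t adler32_simple(const unsigned char *p, size_t len)
{
    uint32_t s1 = 1, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + p[i]) % BASE;
        s2 = (s2 + s1) % BASE;
    }
    return (s2 << 16) | s1;
}

/* One byte of work: the "s1 += ptr[i]; s2 += s1;" pair from the post. */
#define STEP(i) do { s1 += ptr[i]; s2 += s1; } while (0)

/* Deferred-modulo version: sum up to NMAX bytes, then reduce once. */
static uint32_t adler32_blocked(const unsigned char *ptr, size_t len)
{
    uint32_t s1 = 1, s2 = 0;
    while (len > 0) {
        size_t tlen = len > NMAX ? NMAX : len;
        len -= tlen;
        /* Inner loop unrolled by 16. */
        while (tlen >= 16) {
            STEP(0);  STEP(1);  STEP(2);  STEP(3);
            STEP(4);  STEP(5);  STEP(6);  STEP(7);
            STEP(8);  STEP(9);  STEP(10); STEP(11);
            STEP(12); STEP(13); STEP(14); STEP(15);
            ptr += 16;
            tlen -= 16;
        }
        /* Tail: fewer than 16 bytes left in this block. */
        for (; tlen > 0; tlen--) {
            s1 += *ptr++;
            s2 += s1;
        }
        s1 %= BASE;
        s2 %= BASE;
    }
    return (s2 << 16) | s1;
}

/* Largest n with 255n(n+1)/2 + (n+1)(BASE-1) <= 2^32-1, found by search. */
static uint32_t largest_nmax(void)
{
    uint64_t n = 0;
    /* Advance while the next candidate (n + 1) still satisfies the bound. */
    while (255u * (n + 1) * (n + 2) / 2 + (n + 2) * (BASE - 1) <= 0xFFFFFFFFu)
        n++;
    return (uint32_t)n;
}
```

The bound checks out by hand as well: for n = 5552 the left side is 4,294,690,200, which fits in 32 bits, while for n = 5553 it is 4,296,171,735, which does not.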