Ultimate Prime Sieve -- Sieve of Zakiya (SoZ)

This is a discussion on Ultimate Prime Sieve -- Sieve of Zakiya (SoZ) within the Forth forums in Programming Languages category; jzakiya <jzakiya @ mail.com> writes Re: Ultimate Prime Sieve -- Sieve of Zakiya (SoZ) > On Aug 11, 4:43 pm, jzakiya <jzak...@mail.com> wrote: >> On Aug 11, 2:37 pm, jzakiya <jzak...@mail.com> wrote: [..] > Not knowing the AMD X2 architecture, these are generic concepts. > I hope they give you some good ideas on how to partition the > threading. The algorithm below is correct, so that is progress. I will keep this program as a benchmark. Currently the results are a bit disappointing: FORTH> largetest sieve of 1000000007 bytes. primesP5 : 17.505 seconds. (50,847,535 primes in sieve) 2 * ...

Go Back   Application Development Forum > Programming Languages > Forth

Object Mix

Register FAQ Calendar Search Today's Posts Mark Forums Read
  #21  
Old 08-11-2008, 07:06 PM
Marcel Hendrix
Guest
 
Default Re: Ultimate Prime Sieve -- Sieve of Zakiya (SoZ)

jzakiya <jzakiya@mail.com> writes Re: Ultimate Prime Sieve -- Sieve of Zakiya (SoZ)

> On Aug 11, 4:43 pm, jzakiya <jzak...@mail.com> wrote:
>> On Aug 11, 2:37 pm, jzakiya <jzak...@mail.com> wrote:

[..]
> Not knowing the AMD X2 architecture, these are generic concepts.
> I hope they give you some good ideas on how to partition the
> threading.


The algorithm below is correct, so that is progress. I will keep
this program as a benchmark. Currently the results are a bit
disappointing:

FORTH> largetest
sieve of 1000000007 bytes.
primesP5 : 17.505 seconds. (50,847,535 primes in sieve)
2 * primesP5 : 16.279 seconds. (50,847,535 primes in sieve) ok

... 1.2 seconds speedup.

Possibly the independent updates of the Sieve cause cacheline trashing.

The overhead of taskswitching is currently quite high. However, there is
no reason that taskswitches should happen here. Maybe I'll find the time
to test the memory hypothesis tomorrow.

-marcel

-- ---------------------------------
: array CREATE allocate ?allocate ,
FORGET> @ FREE ?allocate
does> @ + ;

#1000000007
\ #8000007
=: maxsize

0 value num
maxsize to num \ set number to find primes upto
num 1- 2/ value lndx \ max index for sieve array
lndx 105 + array sieve \ 105 will allow sieve for modulus <= 210

: setup ( num -- )
maxsize umin
DUP TO num 1- 2/ TO lndx
0 sieve lndx 105 + erase ;

\ show number of primes found
: .nprimes 1 ( for prime 2) lndx 0 do I sieve c@ if 1+ then loop u>d (n,3) ." primes in sieve" ;

\ display prime values
: .primes ." 2 " lndx 0 do I sieve c@ if I 2* 3 + . then loop ;

: 2*primesP5 ( N -- )
0 0 0 0 0 0 0 0 LOCALS| n1 n2 n3 n4 n5 n6 n7 n8 |

setup

-1 to n1 2 to n2 4 to n3 5 to n4 7 to n5 8 to n6 10 to n7 13 to n8
begin n8 lndx <
while 15 +to n1 15 +to n2 15 +to n3 15 +to n4
15 +to n5 15 +to n6 15 +to n7 15 +to n8
1 n1 sieve c! 1 n2 sieve c! 1 n3 sieve c! 1 n4 sieve c!
1 n5 sieve c! 1 n6 sieve c! 1 n7 sieve c! 1 n8 sieve c!
repeat

\ now initialize sieve array with (odd) primes < 30
1 0 sieve c! 1 1 sieve c! 1 2 sieve c! 1 4 sieve c! 1 5 sieve c!
1 7 sieve c! 1 8 sieve c! 1 10 sieve c! 1 13 sieve c!

PAR
STARTP \ Core 1
0 0 0 0 0 LOCALS| p1 p2 p7 p8 x1 |
num sqrt 3 - 2/ 1+ 2 do
I sieve c@ if
\ Compute p1,p2,p7,p8 and X1 for I
I 2* 3 + ( j) I over dup 2* dup >r + + ( p1) dup to p1 r@ + ( p2)
dup to p2 over + ( p3) r@ + ( p4) over + ( p5) r@ + ( p6) over + r> + ( p7)
dup to p7 + ( p8) dup to p8 I - to x1
begin p8 lndx < while
p1 sieve c0! p2 sieve c0! p7 sieve c0! p8 sieve c0!
x1 +to p1 x1 +to p2 x1 +to p7 x1 +to p8
repeat
p7 lndx < if p7 sieve c0! then
p2 lndx < if p2 sieve c0! then
p1 lndx < if p1 sieve c0! then
then
loop
ENDP
STARTP \ Core 2
0 0 0 0 0 LOCALS| p3 p4 p5 p6 x1 |
num sqrt 3 - 2/ 1+ 2 do
I sieve c@ if
\ Compute p3,p4,p5,p6 and X1 for I
I 2* 3 + ( j) I over dup 2* dup >r + + ( p1) r@ + ( p2) over + ( p3)
dup to p3 r@ + ( p4) dup to p4 over + ( p5)
dup to p5 r@ + ( p6) dup to p6 over + r> + ( p7) + ( p8) I - to x1
begin p6 lndx < while
p3 sieve c0! p4 sieve c0! p5 sieve c0! p6 sieve c0!
x1 +to p3 x1 +to p4 x1 +to p5 x1 +to p6
repeat
p5 lndx < if p5 sieve c0! then
p4 lndx < if p4 sieve c0! then
p3 lndx < if p3 sieve c0! then
then
loop
ENDP
ENDPAR ;

: largetest
maxsize local sz
cr ." sieve of " maxsize DEC. ." bytes."
cr ." primesP3 : " timer-reset sz primesP3 .ms ." (" .nprimes ." )"
cr ." primesP5 : " timer-reset sz primesP5 .ms ." (" .nprimes ." )"
cr ." primesP7 : " timer-reset sz primesP7 .ms ." (" .nprimes ." )"
cr ." 2 * primesP5 : " timer-reset sz 2*primesP5 .ms ." (" .nprimes ." )" ;

DOC
(*
FORTH> largetest
sieve of 1000000007 bytes.
primesP3 : 18.222 seconds. (50,847,535 primes in sieve)
primesP5 : 15.068 seconds. (50,847,535 primes in sieve)
primesP5p : 14.064 seconds. (50,847,535 primes in sieve)
primesP7 : 13.450 seconds. (50,847,535 primes in sieve)

FORTH> largetest
sieve of 1000000007 bytes.
primesP3 : 20.839 seconds. (50,847,535 primes in sieve)
primesP5 : 17.505 seconds. (50,847,535 primes in sieve)
primesP7 : 15.710 seconds. (50,847,535 primes in sieve)
2 * primesP5 : 16.279 seconds. (50,847,535 primes in sieve) ok
*)
ENDDOC

Reply With Quote
  #22  
Old 08-19-2008, 01:10 PM
jzakiya
Guest
 
Default Re: Ultimate Prime Sieve -- Sieve of Zakiya (SoZ)

On Aug 11, 7:06*pm, m...@iae.nl (Marcel Hendrix) wrote:
> jzakiya <jzak...@mail.com> writes Re: Ultimate Prime Sieve -- Sieve of Zakiya (SoZ)
>
>
>
> > On Aug 11, 4:43 pm, jzakiya <jzak...@mail.com> wrote:
> >> On Aug 11, 2:37 pm, jzakiya <jzak...@mail.com> wrote:

> [..]
> > Not knowing the AMD X2 architecture, these are generic concepts.
> > I hope they give you some good ideas on how to partition the
> > threading.

>
> The algorithm below is correct, so that is progress. I will keep
> this program as a benchmark. Currently the results are a bit
> disappointing:
>
> * * * * FORTH> largetest
> * * * * sieve of 1000000007 bytes.
> * * * * primesP5 * * : 17.505 seconds. (50,847,535 primes in sieve)
> * * * * 2 * primesP5 : 16.279 seconds. (50,847,535 primes in sieve) ok
>
> .. 1.2 seconds speedup.
>
> Possibly the independent updates of the Sieve cause cacheline trashing.
>
> The overhead of taskswitching is currently quite high. However, there is
> no reason that taskswitches should happen here. Maybe I'll find the time
> to test the memory hypothesis tomorrow.
>
> -marcel
>
> -- ---------------------------------
> : array CREATE *allocate ?allocate ,
> * * * * FORGET> @ FREE ?allocate
> * * * * does> * @ + ;
>
> *#1000000007
> \ * #8000007
> * * =: maxsize
>
> * *0 value num
> * *maxsize to num * * * * *\ set number to find primes upto
> * *num 1- 2/ value lndx * *\ max index for sieve array
> * *lndx 105 + array sieve *\ 105 will allow sieve for modulus <= 210
>
> : setup ( num -- )
> * * * * maxsize umin
> * * * * DUP TO num *1- 2/ TO lndx
> * * * * 0 sieve lndx 105 + erase ;
>
> \ show number of primes found
> : .nprimes *1 ( for prime 2) lndx 0 do I sieve *c@ if 1+ then loop *u>d (n,3) ." primes in sieve" *;
>
> \ display prime values
> : .primes *." 2 " lndx 0 do I sieve c@ if I 2* 3 + . then loop ;
>
> : 2*primesP5 ( N -- )
> * * * * 0 0 0 0 *0 0 0 0 LOCALS| n1 n2 n3 n4 *n5 n6 n7 n8 |
>
> * * * * setup
>
> * * * * -1 to n1 *2 to n2 *4 to n3 *5 to n4 *7 to n5 *8to n6 *10 to n7 *13 to n8
> * * * * begin n8 lndx <
> * * * * while 15 +to n1 * 15 +to n2 * 15 +to n3 *15 +to n4
> * * * * * * * 15 +to n5 * 15 +to n6 * 15 +to n7 *15 +to n8
> * * * * * * * 1 n1 sieve c! * 1 n2 sieve c! * 1 n3 sieve c! *1 n4 sieve c!
> * * * * * * * 1 n5 sieve c! * 1 n6 sieve c! * 1 n7 sieve c! *1 n8 sieve c!
> * * * * repeat
>
> * * * * \ now initialize sieve array with (odd) primes < 30
> * * * * 1 0 sieve c! *1 1 sieve c! *1 2 *sieve c! *1 4 *sieve c! *1 5 sieve c!
> * * * * 1 7 sieve c! *1 8 sieve c! *1 10 sieve c! *1 13 sieve c!
>
> * * * * PAR
> * * * * *STARTP \ Core 1
> * * * * * 0 0 0 0 0 LOCALS| p1 p2 p7 p8 x1 |
> * * * * * num sqrt 3 - 2/ 1+ 2 do
> * * * * * * * *I sieve c@ if
> * * * * * * * * * \ Compute p1,p2,p7,p8 and X1 for I
> * * * * * * * * * I 2* 3 + ( j) I over dup 2* dup >r + + ( p1) dup to p1 r@ *+ ( p2)
> * * * * * * * * * dup to p2 over + ( p3) r@ + ( p4) over + ( p5) r@ + ( p6) over + r> + ( p7)
> * * * * * * * * * dup to p7 * * *+ ( p8) dup to p8 *I - to x1
> * * * * * * * * * begin p8 lndx < while
> * * * * * * * * * * *p1 sieve c0! *p2 sieve c0! *p7 sieve c0! *p8 sieve c0!
> * * * * * * * * * * *x1 +to p1 * * x1 +to p2 * * x1 +to p7 * * x1 +to p8
> * * * * * * * * * repeat
> * * * * * * * * * p7 lndx < if *p7 sieve c0! *then
> * * * * * * * * * p2 lndx < if *p2 sieve c0! *then
> * * * * * * * * * p1 lndx < if *p1 sieve c0! *then
> * * * * * * * * then
> * * * * * *loop
> * * * * *ENDP
> * * * * *STARTP \ Core 2
> * * * * * 0 0 0 0 0 LOCALS| p3 p4 p5 p6 x1 |
> * * * * * num sqrt 3 - 2/ 1+ 2 do
> * * * * * * * *I sieve c@ if
> * * * * * * * * * \ Compute p3,p4,p5,p6 and X1 for I
> * * * * * * * * * I 2* 3 + ( j) I over dup 2* dup >r + + ( p1) r@ *+ ( p2) over + ( p3)
> * * * * * * * * * dup to p3 r@ + ( p4) dup to p4 over *+ ( p5)
> * * * * * * * * * dup to p5 r@ + ( p6) dup to p6 over *+ r> + ( p7) + ( p8) I - to x1
> * * * * * * * * * begin p6 lndx < while
> * * * * * * * * * * *p3 sieve c0! *p4 sieve c0! *p5 sieve c0! *p6 sieve c0!
> * * * * * * * * * * *x1 +to p3 * * x1 +to p4 * * x1 +to p5 * * x1 +to p6
> * * * * * * * * * repeat
> * * * * * * * * * p5 lndx < if *p5 sieve c0! *then
> * * * * * * * * * p4 lndx < if *p4 sieve c0! *then
> * * * * * * * * * p3 lndx < if *p3 sieve c0! *then
> * * * * * * * * then
> * * * * * *loop
> * * * * *ENDP
> * * * * ENDPAR ;
>
> : largetest
> * * * * maxsize local sz
> * * * * cr ." sieve of " maxsize DEC. ." bytes."
> * * * * cr ." primesP3 * * : " timer-reset *sz primesP3 ** .ms ." *(" .nprimes ." )"
> * * * * cr ." primesP5 * * : " timer-reset *sz primesP5 ** .ms ." *(" .nprimes ." )"
> * * * * cr ." primesP7 * * : " timer-reset *sz primesP7 ** .ms ." *(" .nprimes ." )"
> * * * * cr ." 2 * primesP5 : " timer-reset *sz 2*primesP5 * .ms ." *(" .nprimes ." )" ;
>
> DOC
> (*
> * * * * FORTH> largetest
> * * * * sieve of 1000000007 bytes.
> * * * * primesP3 * * : 18.222 seconds. (50,847,535 primes in sieve)
> * * * * primesP5 * * : 15.068 seconds. (50,847,535 primes in sieve)
> * * * * primesP5p * *: 14.064 seconds. (50,847,535 primes in sieve)
> * * * * primesP7 * * : 13.450 seconds. (50,847,535 primes in sieve)
>
> * * * * FORTH> largetest
> * * * * sieve of 1000000007 bytes.
> * * * * primesP3 * * : 20.839 seconds. (50,847,535 primes in sieve)
> * * * * primesP5 * * : 17.505 seconds. (50,847,535 primes in sieve)
> * * * * primesP7 * * : 15.710 seconds. (50,847,535 primes in sieve)
> * * * * 2 * primesP5 : 16.279 seconds. (50,847,535 primes in sieve) ok
> *)
> ENDDOC


Marcel,

The C guy who has the quadcore 9650 system said he had to do a lot of
cache optimizing with the sieve array, partitioning the array in
segments and working on it in the cache. I don't know the details, but
that is the primary optimization detail.

It may be useful to test your techniques using P3 first since it only
has two residues, which can be mapped to each core without too much
task switching.

Jabari
Reply With Quote
  #23  
Old 08-23-2008, 07:49 AM
Marcel Hendrix
Guest
 
Default Re: Ultimate Prime Sieve -- Sieve of Zakiya (SoZ)

mhx@iae.nl (Marcel Hendrix) writes Re: Ultimate Prime Sieve -- Sieve of Zakiya (SoZ)

> jzakiya <jzakiya@mail.com> writes Re: Ultimate Prime Sieve -- Sieve of Zakiya (SoZ)


>> On Aug 11, 4:43 pm, jzakiya <jzak...@mail.com> wrote:
>>> On Aug 11, 2:37 pm, jzakiya <jzak...@mail.com> wrote:

[..]
>> Not knowing the AMD X2 architecture, these are generic concepts.
>> I hope they give you some good ideas on how to partition the
>> threading.


> The algorithm below is correct, so that is progress. I will keep
> this program as a benchmark. Currently the results are a bit
> disappointing:


> FORTH> largetest
> sieve of 1000000007 bytes.
> primesP5 : 17.505 seconds. (50,847,535 primes in sieve)
> 2 * primesP5 : 16.279 seconds. (50,847,535 primes in sieve) ok


> .. 1.2 seconds speedup.


> Possibly the independent updates of the Sieve cause cacheline trashing.


> The overhead of taskswitching is currently quite high. However, there is
> no reason that taskswitches should happen here. Maybe I'll find the time
> to test the memory hypothesis tomorrow.


I still don't have a satisfying answer, but I can offer more numbers.

First, the results let me wonder if the thread idea was really capable
of generating speedups. Therefore I resurrected an old benchmark that
read accesses a big array (not fitting in cache) row-by-row and then column
by column. To parallellize this I let one cpu get the odd rows or columns
and the other the even ones. I compared this to code running on a single
cpu that first accessed the odd, then the even rows/columns. The results:

FORTH> test ( AMD X2, 3 GHz, each TEST repeated 40 times)
Testing access to a 2047x2047 matrix ...
rmTEST : 4.965 seconds elapsed.
rmTESTp : 2.614 seconds elapsed.
cmTEST : 1.073 seconds elapsed.
cmTESTp : 0.623 seconds elapsed.

Here r/cmTEST is the single-CPU code accessing by row/column. A trailing "p"
is for parallellized code running on two CPUs (AMD X2).

By comparing rmTEST and cmTEST it is clear that column-by-column access (adjacent
in memory) is far faster, but everybody knows that already.

By comparing rmTESTp and cmTESTp it is clear that column-by-column access is also
way faster when using two CPUs. Nothing unexpected.

By comparing r/cmTEST to r/cmTESTp it is clear that using two CPUs definitely
helps a lot. For rows it is 1.9 times faster, for columns 1.7 times. But for the
Sieve idea this gives no explanation.

After this, I hypothesized that cache problems are preventing the advantage of
having more CPUs. (e.g. because there is only one cache shared by both). To test
this in extreme form, I did fully random read access to the array.

FORTH> test ( AMD X2, 3 GHz, each TEST repeated 40 times)
Testing access to a 2047x2047 matrix ...
rmTESTr : 6.687 seconds elapsed. ( seq. column, random row 1 CPU)
rmTESTpr : 9.893 seconds elapsed. ( seq. column, random row 1 CPU)
cmTESTr : 1.833 seconds elapsed.
cmTESTpr : 3.456 seconds elapsed. ok

It can be seen that in this case using 2 CPU's is almost 1.5 - 1.9 times SLOWER
than using a single CPU. (The general slow-down is caused by using SIZE CHOOSE
instead of just I.)

I didn't see a slow-down from using two CPU's, however, this effect might explain
why naive parallelization simply does not help. First the cache problems should
be solved, and one should always check if the paralellel code does not run slower
than the original.

In case of the Sieve, it will probably very fast if it can be done in badges
that fit in the CPU cache (instead of allocating a single giant array).

-marcel

Reply With Quote
  #24  
Old 10-13-2008, 05:16 PM
jzakiya
Guest
 
Default Re: Ultimate Prime Sieve -- Sieve of Zakiya (SoZ)

Just uploaded an updated SieveOfZakiya.f file.

http://www.4shared.com/account/dir/7...bd7b71/sharing

Added generator primesP60, fast SoE (Sieve of Erat), and gforth
BENCHMARKS word.

When (if) I get around to it, I will add primesP11 to complete the
Forth versions of the SoZ generators I've done in Ruby and Python.

Jabari
Reply With Quote
  #25  
Old 10-18-2008, 01:07 PM
Helmar
Guest
 
Default Re: Ultimate Prime Sieve -- Sieve of Zakiya (SoZ)

On Oct 13, 5:16*pm, jzakiya <jzak...@mail.com> wrote:
> Just uploaded an updated SieveOfZakiya.f file.
>
> http://www.4shared.com/account/dir/7...bd7b71/sharing
>
> Added generator primesP60, fast SoE (Sieve of Erat), and gforth
> BENCHMARKS word.
>
> When (if) I get around to it, I will add primesP11 to complete the
> Forth versions of the SoZ generators I've done in Ruby and Python.
>
> Jabari


Sorry for beeing late with my answer, also for your first post. You
seem to play with the same interesting things I played some years ago.
So I for myself do not use full sieves anymore. I play with "partial"
sieves. You call your thing "Ultimate Prime Sieve" - I dont like this.
And I don't believe this.
Your paper is missing a nice explanation what you are doing. Maybe it
is there somewhere, but it should be at the beginning and it should be
short.

As I told, I played with similar issues. My ideas (directly coming
from the idea of Eratosthene) are somewhat unusual as I noticed - but
I guess they are faster

I only want to know the primes in a range. I do this by sieving.
Primality-tests or factoring are not my primary interest.

The sources you can get via the primekit in http://maschenwerk.de/dl/helforth2.51-src.tgz

It's inside the bench/ directory and there you find also sieve3-1.h4
which is an implementation with some bigmath that is only a "little
bit" slower than the C-implementation with native data types.

What do you guess the calculation of primes between
1120100127123123122 and 1120100127123146968 needs using a sieve? For
me it takes about 16 seconds and the processor used is very slow (Low-
end Celeron E1200, only a single thread, no use of the "dual core").

As far as I know the algorithm I do have was not beaten by anyone
(using a sieve) as of now. Does yours? It would be interesting why it
does so.

But what I know is that your "Ultimative" will not hold very long.

I'll soon take a closer look at your work, but you know, there was a
lot of work done before. Also my program beats only partitionally the
others. And one of the things you can not know exactly in this area
are unpublished works. I did not make any work to make the things
using multiple CPU-cores. But my ideas could be a step to use multiple
CPUs. And I'm sure other people did think about it...

Grüße,
Helmar


Reply With Quote
  #26  
Old 11-03-2008, 12:00 PM
jzakiya
Guest
 
Default Re: Ultimate Prime Sieve -- Sieve of Zakiya (SoZ)

On Oct 18, 12:07*pm, Helmar <hel...@gmail.com> wrote:
> On Oct 13, 5:16*pm, jzakiya <jzak...@mail.com> wrote:
>
> > Just uploaded an updated SieveOfZakiya.f file.

>
> >http://www.4shared.com/account/dir/7...bd7b71/sharing

>
> > Added generator primesP60, fast SoE (Sieveof Erat), and gforth
> > BENCHMARKS word.

>
> > When (if) I get around to it, I will add primesP11 to complete the
> > Forth versions of the SoZ generators I've done in Ruby and Python.

>
> > Jabari

>
> Sorry for beeing late with my answer, also for your first post. You
> seem to play with the same interesting things I played some years ago.
> So I for myself do not use full sieves anymore. I play with "partial"
> sieves. You call your thing "Ultimate PrimeSieve" - I dont like this.
> And I don't believe this.
> Your paper is missing a nice explanation what you are doing. Maybe it
> is there somewhere, but it should be at the beginning and it should be
> short.
>
> As I told, I played with similar issues. My ideas (directly coming
> from the idea of Eratosthene) are somewhat unusual as I noticed - but
> I guess they are faster
>
> I only want to know the primes in a range. I do this by sieving.
> Primality-tests or factoring are not my primary interest.
>
> The sources you can get via the primekit inhttp://maschenwerk.de/dl/helforth2.51-src.tgz
>
> It's inside the bench/ directory and there you find also sieve3-1.h4
> which is an implementation with some bigmath that is only a "little
> bit" slower than the C-implementation with native data types.
>
> What do you guess the calculation of primes between
> 1120100127123123122 and 1120100127123146968 needs using asieve? For
> me it takes about 16 seconds and the processor used is very slow (Low-
> end Celeron E1200, only a single thread, no use of the "dual core").
>
> As far as I know the algorithm I do have was not beaten by anyone
> (using asieve) as of now. Does yours? It would be interesting why it
> does so.
>
> But what I know is that your "Ultimative" will not hold very long.
>
> I'll soon take a closer look at your work, but you know, there was a
> lot of work done before. Also my program beats only partitionally the
> others. And one of the things you can not know exactly in this area
> are unpublished works. I did not make any work to make the things
> using multiple CPU-cores. But my ideas could be a step to use multiple
> CPUs. And I'm sure other people did think about it...
>
> Grüße,
> Helmar


Update: 2008/11/03
Architecture & coding improvements, 10-20% better performance on
gforth
Renamed generators. 'Benchmarks' verified with the new gforth 0.7.0.

I am 90% finished writing up a mathematical analysis of my method.
In the process I found an architectural optimization to the sieve
process
which is incorporated in the new code. Complexity analysis showing
other
interesting stuff for each generator. When finished will post paper
here:

http://www.4shared.com/account/dir/7...bd7b71/sharing

OK Marcel, please run it thru Iforth and post results. Should be much
better.

Jabari
Reply With Quote
  #27  
Old 11-04-2008, 04:23 PM
Marcel Hendrix
Guest
 
Default Re: Ultimate Prime Sieve -- Sieve of Zakiya (SoZ)

jzakiya <jzakiya@gmail.com> writes Re: Ultimate Prime Sieve -- Sieve of Zakiya (SoZ)
[..]
> http://www.4shared.com/account/dir/7...bd7b71/sharing


> OK Marcel, please run it thru Iforth and post results. Should be much better.


Changed ALLOT to ALLOCATE and SECS to TIMER-RESET .. .ELAPSED .

FORTH> benchmarks ( AMD X2, 3 GHz, iForth64 )
For N = 800000001 times in secs for these SoZ generators
SoZP7 9.557 seconds elapsed.
SoZP5 10.713 seconds elapsed.
SoZP60 11.185 seconds elapsed.
SoZP3 12.858 seconds elapsed.
SoE 15.980 seconds elapsed. ok

FORTH> benchmarks ( Intel PIV, 3 GHz, iForth32 )
For N = 800000001 times in secs for these SoZ generators
SoZP7 17.854 seconds elapsed.
SoZP5 20.201 seconds elapsed.
SoZP60 20.356 seconds elapsed.
SoZP3 24.329 seconds elapsed.
SoE 31.151 seconds elapsed. ok

Not converted to parallel code.

-marcel

Reply With Quote
  #28  
Old 11-05-2008, 02:42 AM
jzakiya
Guest
 
Default Re: Ultimate Prime Sieve -- Sieve of Zakiya (SoZ)

On Nov 4, 4:23*pm, m...@iae.nl (Marcel Hendrix) wrote:
> jzakiya <jzak...@gmail.com> writes Re: Ultimate Prime Sieve -- Sieve of Zakiya (SoZ)
> [..]
>
> >http://www.4shared.com/account/dir/7...bd7b71/sharing
> > OK Marcel, please run it thru Iforth and post results. Should be much better.

>
> Changed *ALLOT to ALLOCATE *and *SECS to TIMER-RESET .. .ELAPSED .
>
> FORTH> benchmarks ( AMD X2, 3 GHz, iForth64 )
> * * * * For N = 800000001 times in secs for these SoZ generators
> * * * * SoZP7 *9.557 seconds elapsed.
> * * * * SoZP5 *10.713 seconds elapsed.
> * * * * SoZP60 11.185 seconds elapsed.
> * * * * SoZP3 *12.858 seconds elapsed.
> * * * * SoE * *15.980 seconds elapsed. ok
>
> FORTH> benchmarks ( Intel PIV, 3 GHz, iForth32 )
> * * * * For N = 800000001 times in secs for these SoZ generators
> * * * * SoZP7 *17.854 seconds elapsed.
> * * * * SoZP5 *20.201 seconds elapsed.
> * * * * SoZP60 20.356 seconds elapsed.
> * * * * SoZP3 *24.329 seconds elapsed.
> * * * * SoE * *31.151 seconds elapsed. ok
>
> Not converted to parallel code.
>
> -marcel


Thanks Marcel, those are great times.

I'm finally doing SoZP11, which will be the fastest of the lot.
Hope to finish my paper my next week too.

Jabari
Reply With Quote
  #29  
Old 11-10-2008, 01:16 PM
jzakiya
Guest
 
Default Re: Ultimate Prime Sieve -- Sieve of Zakiya (SoZ)

On Nov 5, 2:42*am, jzakiya <jzak...@mail.com> wrote:
> On Nov 4, 4:23*pm, m...@iae.nl (Marcel Hendrix) wrote:
>
>
>
> > jzakiya <jzak...@gmail.com> writes Re: Ultimate Prime Sieve -- Sieve ofZakiya (SoZ)
> > [..]

>
> > >http://www.4shared.com/account/dir/7...bd7b71/sharing
> > > OK Marcel, please run it thru Iforth and post results. Should be muchbetter.

>
> > Changed *ALLOT to ALLOCATE *and *SECS to TIMER-RESET .. .ELAPSED ..

>
> > FORTH> benchmarks ( AMD X2, 3 GHz, iForth64 )
> > * * * * For N = 800000001 times in secs for these SoZ generators
> > * * * * SoZP7 *9.557 seconds elapsed.
> > * * * * SoZP5 *10.713 seconds elapsed.
> > * * * * SoZP60 11.185 seconds elapsed.
> > * * * * SoZP3 *12.858 seconds elapsed.
> > * * * * SoE * *15.980 seconds elapsed. ok

>
> > FORTH> benchmarks ( Intel PIV, 3 GHz, iForth32 )
> > * * * * For N = 800000001 times in secs for these SoZ generators
> > * * * * SoZP7 *17.854 seconds elapsed.
> > * * * * SoZP5 *20.201 seconds elapsed.
> > * * * * SoZP60 20.356 seconds elapsed.
> > * * * * SoZP3 *24.329 seconds elapsed.
> > * * * * SoE * *31.151 seconds elapsed. ok

>
> > Not converted to parallel code.

>
> > -marcel

>
> Thanks Marcel, those are great times.
>
> I'm finally doing SoZP11, which will be the fastest of the lot.
> Hope to finish my paper my next week too.
>
> Jabari


I've finally finished the SoZP11 primes generator.
It's been uploaded to my 4shared site and is ready to download.

Marcel, please post your times for this too.

Thanks

Jabari
Reply With Quote
  #30  
Old 11-10-2008, 01:25 PM
Marcel Hendrix
Guest
 
Default Re: Ultimate Prime Sieve -- Sieve of Zakiya (SoZ)

jzakiya <jzakiya@mail.com> writes Re: Ultimate Prime Sieve -- Sieve of Zakiya (SoZ)

> On Nov 5, 2:42 am, jzakiya <jzak...@mail.com> wrote:
>> On Nov 4, 4:23 pm, m...@iae.nl (Marcel Hendrix) wrote:


>> > jzakiya <jzak...@gmail.com> writes Re: Ultimate Prime Sieve -- Sieve of Zakiya (SoZ)
>> > [..]


>> > >http://www.4shared.com/account/dir/7...bd7b71/sharing
>> > > OK Marcel, please run it thru Iforth and post results. Should be much better.


[..]

> I've finally finished the SoZP11 primes generator.
> It's been uploaded to my 4shared site and is ready to download.


> Marcel, please post your times for this too.


FORTH> benchmarks (iForth64, Intel Core-Duo E8200, 2.66 GHz, XP64 PRO)
For N = 1000000001 times in secs for these SoZ prime generators
SoZP11 8.995 seconds elapsed.
SoZP7 9.868 seconds elapsed.
SoZP5 11.543 seconds elapsed.
SoZP60 11.522 seconds elapsed.
SoZP3 14.318 seconds elapsed.
SoE 19.310 seconds elapsed. ok

FORTH> benchmarks (iForth32, Intel PIV 3GHz, XP PRO)
For N = 1000000001 times in secs for these SoZ prime generators
SoZP11 23.827 seconds elapsed.
SoZP7 25.079 seconds elapsed.
SoZP5 28.101 seconds elapsed.
SoZP60 28.166 seconds elapsed.
SoZP3 34.314 seconds elapsed.
SoE 47.252 seconds elapsed. ok

I bought the E8200 to port iForth64 to Windows. These are the first results.
If you want I could run the code on the AMD X2 under Linux, too.

-marcel

Reply With Quote
Reply


Thread Tools
Display Modes


All times are GMT -5. The time now is 06:37 AM.


Powered by vBulletin® Version 3.7.2
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
vB Ad Management by =RedTyger=

In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.