The Harrington Compression Method (HCM) White Paper.

This is a discussion on The Harrington Compression Method (HCM) White Paper. within the Theory forums in Theory and Concepts category; Einstein wrote: ) The command section has been calculated out, it's rather simple. ) ) The files are accounted for in the command section. 50 bits per file ) should account for maximum length of each in turn. ) ) This is 250 bits. What, exactly, is recorded in these 50 bits then ? Each time a string ends up in a different file from the string before it, you will need to record this information. This takes more than one bit per switch, as I see it. SaSW, Willem -- Disclaimer: I am in no way responsible for any ...

Go Back   Application Development Forum > Theory and Concepts > Theory

Object Mix

Register FAQ Calendar Search Today's Posts Mark Forums Read
  #11  
Old 08-26-2008, 03:27 PM
Willem
Guest
 
Default Re: The Harrington Compression Method (HCM) White Paper.

Einstein wrote:
) The command section has been calculated out, it's rather simple.
)
) The files are accounted for in the command section. 50 bits per file
) should account for maximum length of each in turn.
)
) This is 250 bits.

What, exactly, is recorded in these 50 bits then ?

Each time a string ends up in a different file from the string before
it, you will need to record this information. This takes more than one
bit per switch, as I see it.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
Reply With Quote
  #12  
Old 08-26-2008, 03:31 PM
Jym
Guest
 
Default Re: The Harrington Compression Method (HCM) White Paper.

On Tue, 26 Aug 2008 21:03:27 +0200, Einstein <michaelhh@gmail.com> wrote:

> S1 = 11 01 00 00 -> 111 10 0 0
> S2 = 11 10 10 11 -> 111 110 110 111
>
> What I am going to do, to help explain this better is place a letter
> next to each number, so the description is more effective
>
>
> S1 = 11 01 00 00 -> 1a1b1c 1d0e 0f 0g
> S2 = 11 10 10 11 -> 1h1i1j 1k1l0m 1n1o0p 1q1r1s
>
>
> So what happens in this order is
>
> Letter per letter
>
> a goes in file #1, since a = 1 then b will go to file #3, since b = 1

#2, I guess
(as done in the example below)
> then c goes to file #3 , now since c is located in file #3 the next
> bit, regardless, starts in file #1. So d goes into file #1, as d = 1
> then e goes into file #2, and since e = 0 the next bit goes to file
> #1.


> So end result:
>
> S1: F1: 1100 F2: 10 F3: 1
> S2: F1: 1111 F2: 1111 F3: 1001


> Does this help with the understanding?


Yes it's much better, now I understand this step. And I'm pretty much
conviced it is indeed reversible.

Well, basically so far we have just done an Huffman's encoding and then
shuffled the bits a little. But all the bits in the three files are
exactly the same than the ones in the Huffman's encoding.

> 3)Repeat step #2 on the #1 files content.The results will astonish
> you. Using a sub-filing system of 1.1, 1.2, 1.3, 2, 3 we end up with
> very interesting results across the whole spectrum.


OK, obvioulsy with such small examples this will be a stupid thing to do,
but let's do it anyway (if for no other reason, at least to see whether
I've understood so far)...

So,
S1: F1.1: 10 F1.2: 1 F1.3: 0 F2: 10 F3: 1
S2: F1.1: 11 F1.2: 1 F1.3: 1 F2: 1111 F3: 1001
Yeah, obviously this is pretty stupid on these examples...

> Take the example below:
>
> 11 = 1
> 10 = 01
> 01 = 001
> 00 = 000
>
> This table relies upon a statistical imbalance of 1's to 0's. The
> formula is, if percentage of 1's = A, and percentage of 0's = B,
> ( (A*A) + (A*B*2) + (B*A*3) + (B*B*3) ) /2 = C. C is the percentage
> remaining of the original file size.


I disagree with this formula.
If we consider the string :
11001100
we have A = B = 0.5
After the encoding (with the Huffman's table you give just above), we will
get the string
10001000 whose size is the same as the original.
Yet, computing C will give :
C = ( (0.5*0.5) + 2*(0.5*0.5) + 3*(0.5*0.5) + 3*(0.5*0.5) )/2
= ( 0.25 + 2*0.25 + 3*0.25 + 3*0.25 )/2
= 1.125 != 1

I agree, however, that if a, b, c and d are the respective ratios of 11,
10, 01 and 00 the the size of the final result will be in ratio
(a+2b+3c+3d)/2 of the original.
[in this case, a=d=0.5, b=c=0 and (0.5 + 0 + 0 + 1.5)/2 = 1 ; the final
string has indeed the same size as the original.]

So, in order to get this formula working, you at least need some extra
assumptions somewhere...
[Similar examples invalidating your formula can easily be built for ratio
other than 50%]

--
Hypocoristiquement,
Jym.
Reply With Quote
  #13  
Old 08-26-2008, 03:37 PM
Einstein
Guest
 
Default Re: The Harrington Compression Method (HCM) White Paper.

On Aug 26, 12:27*pm, Willem <wil...@stack.nl> wrote:
>
> Each time a string ends up in a different file from the string before
> it, you will need to record this information. *This takes more than one
> bit per switch, as I see it.


Incorrect sir, the string identifies where to go on it's own, the only
factor is when it is done and we compress (or do not compress as may
happen with file #3) the files, we need to identify the length of the
remaining string.

Since file #1 is statistically 44.4% of the file size it can represent
the largest single chunk. If file #1 exceeds a terrabyte (sp?) then
frankly I am going to spank google for not getting a custom version
for their compression!
Reply With Quote
  #14  
Old 08-26-2008, 03:42 PM
Einstein
Guest
 
Default Re: The Harrington Compression Method (HCM) White Paper.

On Aug 26, 12:31*pm, Jym <Jean-Yves.Moyen+n...@ens-lyon.org> wrote:
> On Tue, 26 Aug 2008 21:03:27 +0200, Einstein <michae...@gmail.com> wrote:
> > S1 = 11 01 00 00 -> 111 10 0 0
> > S2 = 11 10 10 11 -> 111 110 110 111

>
> > What I am going to do, to help explain this better is place a letter
> > next to each number, so the description is more effective

>
> > S1 = 11 01 00 00 -> 1a1b1c 1d0e 0f 0g
> > S2 = 11 10 10 11 -> 1h1i1j 1k1l0m 1n1o0p 1q1r1s

>
> > So what happens in this order is

>
> > Letter per letter

>
> > a goes in file #1, since a = 1 then b will go to file #3, since b =1

>
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * *#2, I guess
> (as done in the example below)
>
> > then c goes to file #3 , now since c is located in file #3 the next
> > bit, regardless, starts in file #1. So d goes into file #1, as d = 1
> > then e goes into file #2, and since e = 0 the next bit goes to file
> > #1.
> > So end result:

>
> > S1: F1: 1100 F2: 10 F3: 1
> > S2: F1: 1111 F2: 1111 F3: 1001
> > Does this help with the understanding?

>
> Yes it's much better, now I understand this step. And I'm pretty much *
> conviced it is indeed reversible.
>
> Well, basically so far we have just done an Huffman's encoding and then *
> shuffled the bits a little. But all the bits in the three files are *
> exactly the same than the ones in the Huffman's encoding.
>
> > 3)Repeat step #2 on the #1 files content.The results will astonish
> > you. Using a sub-filing system of 1.1, 1.2, 1.3, 2, 3 we end up with
> > very interesting results across the whole spectrum.

>
> OK, obvioulsy with such small examples this will be a stupid thing to do,*
> but let's do it anyway (if for no other reason, at least to see whether *
> I've understood so far)...
>
> So,
> S1: F1.1: 10 F1.2: 1 F1.3: 0 * F2: 10 F3: 1
> S2: F1.1: 11 F1.2: 1 F1.3: 1 * F2: 1111 F3: 1001
> Yeah, obviously this is pretty stupid on these examples...
>
> > Take the example below:

>
> > 11 = 1
> > 10 = 01
> > 01 = 001
> > 00 = 000

>
> > This table relies upon a statistical imbalance of 1's to 0's. The
> > formula is, if percentage of 1's = A, and percentage of 0's = B,
> > ( (A*A) + (A*B*2) + (B*A*3) + (B*B*3) ) /2 = C. C is the percentage
> > remaining of the original file size.

>
> I disagree with this formula.
> If we consider the string :
> 11001100
> we have A = B = 0.5
> After the encoding (with the Huffman's table you give just above), we will *
> get the string
> 10001000 whose size is the same as the original.
> Yet, computing C will give :
> C = ( (0.5*0.5) + 2*(0.5*0.5) + 3*(0.5*0.5) + 3*(0.5*0.5) )/2
> * *= ( 0.25 + 2*0.25 + 3*0.25 + 3*0.25 )/2
> * *= 1.125 != 1
>
> I agree, however, that if a, b, c and d are the respective ratios of 11, *
> 10, 01 and 00 the the size of the final result will be in ratio *
> (a+2b+3c+3d)/2 of the original.
> [in this case, a=d=0.5, b=c=0 and (0.5 + 0 + 0 + 1.5)/2 = 1 ; the final *
> string has indeed the same size as the original.]
>
> So, in order to get this formula working, you at least need some extra *
> assumptions somewhere...
> [Similar examples invalidating your formula can easily be built for ratio*
> other than 50%]
>
> --
> Hypocoristiquement,
> Jym.


Ok Jym your getting there.

Now you need to follow statistics. If 50/50 even but random
distributation, then you work file #1 down. .44 * 1.125 * original
file size is the new size of files 1.1, 1.2, 1.3 when we run file #1
through the system.

However the numbers, 93.75 to 6.25, and 80% to 20%, and 75% to 25%, it
makes for enough compression using a mere huffman...

I would dare say that a more efficient compression routine than the
simplistic huffman I posted for step 4, will totally lead to a very
measurable increase in the already measurable compression rate.
Reply With Quote
  #15  
Old 08-26-2008, 04:21 PM
biject
Guest
 
Default Re: The Harrington Compression Method (HCM) White Paper.

On Aug 26, 1:25 pm, Einstein <michae...@gmail.com> wrote:
> Thomas you are trying to divert the actual make up.
>
> It's a simple concept, a 50/50%, a 25/25/25/25%, a
> 12.5/12.5/12.5/12.5/12.5/12.5/12.5/12.5%, and so forth situation can
> be made into a:
>
> Three files with a 75/25% a 67/33% and a 50/50%.
>
> Further work gets us a 93.75%, a 80%, a 75%, a 67%, and a 50%.
>
> When run through a Huffman the size decrease is better than the size
> increases required in the earlier Huffmans.


Remember the Goldman Craig compression thing awhile back.
In my view Craig won since Goldman seemed to allow extra output
files. Well thats not what compression is about.

You need to count the bits needed to make them one file. You
could use a tar or something like my DSC but its nor fair to count
seperate totals and wave your hands. Make it one file going to one
file. Where name is arbitatary.


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"
Reply With Quote
  #16  
Old 08-26-2008, 04:24 PM
Einstein
Guest
 
Default Re: The Harrington Compression Method (HCM) White Paper.

On Aug 26, 1:21*pm, biject <biject.b...@gmail.com> wrote:
> On Aug 26, 1:25 pm, Einstein <michae...@gmail.com> wrote:
>
> > Thomas you are trying to divert the actual make up.

>
> > It's a simple concept, a 50/50%, a 25/25/25/25%, a
> > 12.5/12.5/12.5/12.5/12.5/12.5/12.5/12.5%, and so forth situation can
> > be made into a:

>
> > Three files with a 75/25% a 67/33% and a 50/50%.

>
> > Further work gets us a 93.75%, a 80%, a 75%, a 67%, and a 50%.

>
> > When run through a Huffman the size decrease is better than the size
> > increases required in the earlier Huffmans.

>
> *Remember the Goldman Craig compression thing awhile back.
> In my view Craig won since Goldman seemed to allow extra output
> files. Well thats not what compression is about.
>
> * You need to count the bits needed to make them one file. You
> could use a tar or something like my DSC but its nor fair to count
> seperate totals and wave your hands. Make it one file going to one
> file. Where name is arbitatary.
>
> David A. Scott
> --
> *My Crypto codehttp://bijective.dogma.net/crypto/scott19u.ziphttp://www..jim.com/jamesd/Kong/scott19u.zipold version
> My Compression codehttp://bijective.dogma.net/
> **TO EMAIL ME drop the roman "five" **
> Disclaimer:I am in no way responsible for any of the statements
> *made in the above text. For all I know I might be drugged.
> As a famous person once said "any cryptograhic
> system is only as strong as its weakest link"




My thanks for telling him one file leading to one file. That is the
ultimate goal, start with a single data source, end with a single data
source.
Reply With Quote
  #17  
Old 08-26-2008, 04:43 PM
Willem
Guest
 
Default Re: The Harrington Compression Method (HCM) White Paper.

Einstein wrote:
) On Aug 26, 12:27*pm, Willem <wil...@stack.nl> wrote:
)> Each time a string ends up in a different file from the string before
)> it, you will need to record this information. *This takes more than one
)> bit per switch, as I see it.
)
) Incorrect sir, the string identifies where to go on it's own, the only
) factor is when it is done and we compress (or do not compress as may
) happen with file #3) the files, we need to identify the length of the
) remaining string.

How does it identify where to go ? You did not state that very clearly
in your original post. Note that it *also* needs to identify where it
came from, and/or in which file to find the next chunk, when decompressing.

How about you demonstrate your algorithm with, say, a 64-bit string ?
I'm sure you can work it out on paper, no need for any real programming.

Here are 64 bits I downloaded from a random-bit generating site:
10001010010111010101111001010000111000110010001000 0000110111110

And in hexadecimal:
452EAF28719101BE


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
Reply With Quote
  #18  
Old 08-26-2008, 04:53 PM
Jym
Guest
 
Default Re: The Harrington Compression Method (HCM) White Paper.

On Tue, 26 Aug 2008 21:42:37 +0200, Einstein <michaelhh@gmail.com> wrote:

> Now you need to follow statistics. If 50/50 even but random
> distributation, then you work file #1 down. .44 * 1.125 * original
> file size is the new size of files 1.1, 1.2, 1.3 when we run file #1
> through the system.
>
> However the numbers, 93.75 to 6.25, and 80% to 20%, and 75% to 25%, it
> makes for enough compression using a mere huffman...
>
> I would dare say that a more efficient compression routine than the
> simplistic huffman I posted for step 4, will totally lead to a very
> measurable increase in the already measurable compression rate.


Again, I'm slightly confused...

If I understand correctly, your claiming that *statistically* a random
file will have some kind of distribution between 0 and 1 and from that you
conclude (after some computation I haven't checked yet) that
*statistically* your method result in some compression.

Am I correct ?

OK, now what puzzles me is what is more or less hidden behind this
"staistically". I see it as some kind of average over all possible
outcomes, or something in that way.

The problem is that even if you have *statistically* (on average) a good
compression, that doesn't mean that *all* string get compressed.

For example, I can consider an "algorithm" that "compress" (encodes) all 3
bits strings in the following way :
000 -> 11
001 -> 101
011 -> 01
010 -> 111
110 -> 1
111 -> 11
101 -> 1000
100 -> 0

As you can easilly see, this encoding is reversible [I'm just working here
over all the possible 3-bits strings, I'm not considering that I cut a
string in chuncks of 3 bits and encode each one that way].

Now, you can also quite easilly see that statistically the size of the
result is smaller than the size of the input.

Yet not all 3 bits strings are compressed by this encoding, some of them
keep the same size, some even increase in size.



So, my point is that you claim [again, i haven't done the maths and trust
them for the while] that your method gives some statistical compression
but that doesn't prove that each single input will be compressed.

--
Hypocoristiquement,
Jym.
Reply With Quote
  #19  
Old 08-26-2008, 05:02 PM
Jym
Guest
 
Default Re: The Harrington Compression Method (HCM) White Paper.

On Tue, 26 Aug 2008 22:43:23 +0200, Willem <willem@stack.nl> wrote:

> Einstein wrote:
> ) On Aug 26, 12:27*pm, Willem <wil...@stack.nl> wrote:
> )> Each time a string ends up in a different file from the string before
> )> it, you will need to record this information. *This takes more than
> one
> )> bit per switch, as I see it.
> )
> ) Incorrect sir, the string identifies where to go on it's own, the only
> ) factor is when it is done and we compress (or do not compress as may
> ) happen with file #3) the files, we need to identify the length of the
> ) remaining string.
>
> How does it identify where to go ? You did not state that very clearly
> in your original post. Note that it *also* needs to identify where it
> came from, and/or in which file to find the next chunk, when
> decompressing.


I had the same question. See the first messages in the other branch of the
thread.

As fas as I understand, the algorithm for splitting a string into the
three files could be formalized as follows:
* First bit goes in file 1.
* If bit n is a 0, then bit (n+1) goes in file 1.
* If bit n goes in file 3, then bit (n+1) goes in file 1.
* If bit n is a 1 and goes in file i<3, then bit (n+1) goes in file (i+1).

The decoding algorithm is slightly more complicated to formalise because
you need to keep track of a pointer in each file, but I'm pretty sure you
can work it out if needed.

> How about you demonstrate your algorithm with, say, a 64-bit string ?
> I'm sure you can work it out on paper, no need for any real programming.
>
> Here are 64 bits I downloaded from a random-bit generating site:
> 10001010010111010101111001010000111000110010001000 0000110111110


If I'm correct, that sould be :
File 1: 10011011011110110001000101001000000111
File 2: 000010010001100111
File 3: 1110010

--
Hypocoristiquement,
Jym.
Reply With Quote
  #20  
Old 08-26-2008, 10:39 PM
Einstein
Guest
 
Default Re: The Harrington Compression Method (HCM) White Paper.

10 00 10 10 01 01 11 01 01 01 11 10 01 01 00 00 11 10 00 11 00 10 00
10 00 00 00 11 01 11 11 0

First I divided it into chunks of 2 bits. I found a missing bit, so I
add a 1 to the end (Command section identifies the missing bit by
using a bit to say "add a bit to the end")

10 00 10 10 01 01 11 01 01 01 11 10 01 01 00 00 11 10 00 11 00 10 00
10 00 00 00 11 01 11 11 01

Next step is the alter the string with the Huffman

1110 0 1110 1110 10 10 111 10 10 10 111 1110 10 10 0 0 111 1110 0 111
0 1110 0 1110 0 0 0 111 10 111 111 10

The Next step is to sort these into the files.



File #1
10111111111111001101010100011111

File #2
11100100011001111110110

File #3
00011010100111


Now this has an obvious deviation, it will compress good in file #1
and possibly file #2.

Split it up to identify occurences


File #1
10 11 11 11 11 11 11 00 11 01 01 01 00 01 11 11 (26)

File #2
11 10 01 00 01 10 01 11 11 10 11 0[1] (24)

File #3
00 01 10 10 10 01 11 (14)

Immeadiately I see something I missed on the command section, that
some outcomes may have an odd result. Add a 1 to it (Command section
per file, 1 bit to identify), What the (26) stands for is size after
compression if we were to stop now and use the Huffman. In this case
it would be a no way.

11 = 1
10 = 01
01 = 001
00 = 000

So we run file #1 through the system again prior to the compression
stage.
1111111011110111
11111111000011
0111111111

This means the following files now:

File #1.1
11 11 11 10 11 11 01 11 (9)


File #1.2
11 11 11 11 00 00 11 (9)


File #1.3
01 11 11 11 11 (6)


File #2
11 10 01 00 01 10 01 11 11 10 11 0[1] (24)

File #3
00 01 10 10 10 01 11 (14)

Note that statistically, since we do have command section issues, we
have obtained compression, even after adding two bits to the whole.
Yes the command section requires knowing which swaps we will do per
file, such as in file #1.2 we swap 00 for 10 in our make up so that it
has a value length of 2 bits., and 01 in file #1.3 has the same
happening.

This has 63 bits + command section, for 64 bits. Now the command
sections will come out to less than 350 bits total. I can even say a
rounded 500 bits if you desire. Therefore if this statistical sample
was 31500 bits or more we would have factual compression after the
command section.


And here is the real pisser for you.

I skipped a step that is sometimes non-essential.


I did not do swap 1 for 0 in the very first step, as clearly there is
an imbalance in favor of 0's in primacy in the source code provided.




It can take a serious programmer less than 8 hours to make this thing.
Hell it should take less than 4 hours even. I realize this little
fact, and instead will be forced to run figure after figure, all
compressing, until someone decides to make a system to test it through
larger random files.
Reply With Quote
Reply


Thread Tools
Display Modes


All times are GMT -5. The time now is 11:51 PM.


Powered by vBulletin® Version 3.7.2
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
vB Ad Management by =RedTyger=

In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.