The Harrington Compression Method (HCM) White Paper.


  #1  
Old 08-26-2008, 11:57 AM
Einstein
Guest
 
Default The Harrington Compression Method (HCM) White Paper.

Introduction:
This is a lossless compression method which WILL work on random binary
data and data considered entropic. It involves one unique step, not
previously considered by others, to obtain the compression. It can
work on data at nearly any level you want, with a minimum yet to be
calculated.

It does this via a self-creating filing system that yields many more
possible values from our actual outcomes, as well as a ratio imbalance
at the same time.

This post is to help people understand, for two purposes: investment in
the proposal, and peer review.


Therefore please understand I have the answer for the counting
argument, and I have the answer for entropic data. I am going to make
myself available for conversation on mIRC.

Find me at www.imperialconflict.com and in #compress

I am the registered person called Einstein there, and I will try to be
available as much as I can be today.

End of the Introduction






Harrington Compression Method


The Harrington Compression Method, henceforth known as HCM, is a
repeatable, self tabulating, statistical compression method like no
other compression system in use today. HCM incorporates a built-in
dictionary which allows the user to repeatedly run this system on
individual files, or subfiles, via triggers for certain events, and
includes command sections for each built-in. As a result, HCM allows
for nearly endless variations and possibilities. And, most importantly
of all, the degree of file compression is far greater than any
existing compression software currently available. In short, the HCM
is a revolutionary compression system.

This is a White Paper intended for peer review of the basic
fundamentals behind the system. Michael Hugh Harrington reserves all
rights.

The method:

1)Using a standard Huffman table:

11 = 111
10 = 110
01 = 10
00 = 0

Additional variations as needed can be utilized.

The Huffman produces 9 bits for every 8 initial bits.
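
As a quick check of the 9-for-8 figure, the sketch below averages the code lengths of the table above. It assumes the four 2-bit inputs are equally likely; the name `TABLE` is only illustrative and not part of the paper.

```python
from fractions import Fraction

# Step 1 table: 2-bit input symbol -> code word.
TABLE = {"11": "111", "10": "110", "01": "10", "00": "0"}

# Assumption: each 2-bit symbol occurs with probability 1/4.
avg_bits_per_symbol = sum(Fraction(len(code), 4) for code in TABLE.values())

# 8 input bits are 4 symbols.
bits_per_8_input_bits = 4 * avg_bits_per_symbol
print(avg_bits_per_symbol, bits_per_8_input_bits)  # 9/4 9
```

The figure is an average over uniformly random input; individual 8-bit strings can come out shorter or longer.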


2)Using HCM, we make different 'files' out of information. Any 1's in
a line (until three are used or a 0 occurs) will lead to a different
'file' being granted the following information from that string. In
this context each 'file' stands for a temporary file used to hold the
information.

The following Table 1 is divided for the standard conversion system
noted above.

File #1   File #2   File #3
   1         1         1
   1         1         0
   1         0
   0
Table 1

Note the ratios in each 'file'. File #1 has a 75% ratio of 1's to a
25% ratio of 0's. File #2 has a 66.7% ratio of 1's to a 33.3% ratio of
0's. File #3 has a 50% to 50% ratio. Of course this is purely a
statistical representation as we will have a large file being
compressed with many more results added to the files in question.
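
The ratios quoted for the three files can be reproduced with a small sketch. It assumes the four 2-bit inputs are equally likely and that the k-th bit of every code word lands in file k, which is what the rule in step 2 amounts to for this table. The function name is hypothetical.

```python
from fractions import Fraction

TABLE = {"11": "111", "10": "110", "01": "10", "00": "0"}

def ones_ratio_per_file(probabilities):
    """Return the share of 1 bits landing in File #1, #2 and #3.

    The k-th bit of every code word goes to file k (k = 1, 2, 3).
    """
    ones = [Fraction(0)] * 3
    total = [Fraction(0)] * 3
    for symbol, code in TABLE.items():
        for position, bit in enumerate(code):
            total[position] += probabilities[symbol]
            if bit == "1":
                ones[position] += probabilities[symbol]
    return [o / t for o, t in zip(ones, total)]

# Assumption: uniformly random input, so every symbol has probability 1/4.
uniform = {symbol: Fraction(1, 4) for symbol in TABLE}
ratios = ones_ratio_per_file(uniform)
print(ratios)
```

With the uniform assumption this gives 3/4, 2/3 and 1/2, matching the 75%, 66.7% and 50% figures above.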

If originally we had a preexisting ratio imbalance of 1's to 0's we
could conceivably have an imbalance that would affect the whole. We
will presume that the imbalance is to
the favor of the 1's for our current example, that we favor 1's in the
current system (1's and 0's can be interchanged at will), and that if
0's were predominant a single bit in a command section could switch
1's for 0's and make 1's predominant again. This imbalance will lead
to even greater imbalances than 75% in the File #1 and File #2, and
file #3 will then show an imbalance equal to the original imbalance.



3) Repeat step #2 on the contents of File #1. The results will astonish
you. Using a sub-filing system of 1.1, 1.2, 1.3, 2, 3 we end up with
very interesting results across the whole spectrum.

File 1.1 will have 19.75% of the results and a ratio imbalance of
93.75% to 6.25%.

File 1.2 will have 14.81% of the results and a ratio imbalance of 80%
to 20%

File 1.3 will have 9.87% of the results and a ratio imbalance of 75%
to 25%

File 2 will have 33.33% of the results and a ratio imbalance of 66.7%
to 33.3%

File 3 will have 22.22% of the results and a ratio imbalance of 50% to
50%
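
The imbalance figures for Files 1.1, 1.2 and 1.3 can be sketched as follows. This assumes the bits of File #1 are independent with a 75% share of 1's and are fed pairwise through the step 1 table. It reproduces only the ratio imbalances, not the "% of the results" shares, and the helper names are hypothetical.

```python
from fractions import Fraction

TABLE = {"11": "111", "10": "110", "01": "10", "00": "0"}
P_ONE = Fraction(3, 4)  # assumed share of 1s in File #1

def pair_probability(pair):
    """Probability of a 2-bit pair, assuming independent bits."""
    prob = Fraction(1)
    for bit in pair:
        prob *= P_ONE if bit == "1" else 1 - P_ONE
    return prob

def subfile_ones_ratio(position):
    """Share of 1s among the bits at this code-word position (0, 1 or 2)."""
    reach = sum(pair_probability(s) for s, c in TABLE.items() if len(c) > position)
    ones = sum(pair_probability(s) for s, c in TABLE.items()
               if len(c) > position and c[position] == "1")
    return ones / reach

for position, name in enumerate(["1.1", "1.2", "1.3"]):
    print(f"File {name}: {float(subfile_ones_ratio(position)):.2%} ones")
```

Under these assumptions the three ratios come out as 93.75%, 80% and 75%.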

These variables are critically important due to the mathematics
involved in how a Huffman can create compression.

Take the example below:

11 = 1
10 = 01
01 = 001
00 = 000

This table relies upon a statistical imbalance of 1's to 0's. The
formula is, if percentage of 1's = A, and percentage of 0's = B,
( (A*A) + (A*B*2) + (B*A*3) + (B*B*3) ) /2 = C. C is the percentage
remaining of the original file size. Using File 1.1 as an example, we
have: ( ( .9375 * .9375) + (.9375 * .0625 * 2) + (.0625 * .9375 * 3)
+ (.0625 * .0625 * 3) ) / 2 = 0.591796875, or in percentage 59.1796875% of
the original file size of 1.1. This is of course, statistically
speaking, due to the fact that all outcomes have variances. However,
in very large files processed this will be the end result's
approximate range, +/- .1%. This represents a dramatic decrease in the
size of File #1.1.
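
The formula can be written as a short function. This is a sketch that assumes independent bits with a share A of 1's; the function name is illustrative.

```python
def remaining_fraction(a):
    """Expected output size / input size for 11->1, 10->01, 01->001, 00->000.

    a is the share of 1 bits; bits are assumed to be independent.
    """
    b = 1.0 - a
    expected_code_bits = a * a * 1 + a * b * 2 + b * a * 3 + b * b * 3
    return expected_code_bits / 2  # each code word replaces 2 input bits

print(remaining_fraction(0.9375))  # 0.591796875
```

For a balanced input (a = 0.5) the same function returns 1.125, so the table only shrinks data whose 1's are sufficiently dominant.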


4)Using a replacement method as follows, except for File #3 (unless
ratio was previously in excess of 65% to 35%, a command section
notation can allow this):



11 = 1
10 = 01
01 = 001
00 = 000

We get the following sizes per file, statistically:

File 1.1 = 1.05 bits
File 1.2 = 1.04 bits
File 1.3 = .75 bits
File 2 = 2.83 bits
File 3 = 2 bits

Statistically speaking there are 7.67 bits for every original 8 bits.
This is based upon a 50% to 50% ratio of 1's to 0's in the original
code. This represents a 4% reduction in the size of our original file
prior to the next portion below. No other compression system can do
this on 'random' or 'entropic' data.
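
The per-file sizes can be recomputed from the figures given in steps 3 and 4. The sketch below takes the stated shares and ratios at face value rather than deriving them, and it leaves out the command section of step 5. All names are illustrative.

```python
def remaining_fraction(a):
    """Step 3 formula: expected output/input size when a is the share of 1s."""
    b = 1.0 - a
    return (a * a + a * b * 2 + b * a * 3 + b * b * 3) / 2

HUFFMAN_BITS = 9.0  # bits produced by step 1 per 8 input bits

# name: (share of the 9 bits, share of 1s, replacement table applied?)
# All figures are copied from steps 3 and 4, not derived here.
FILES = {
    "1.1": (0.1975, 0.9375, True),
    "1.2": (0.1481, 0.80, True),
    "1.3": (0.0987, 0.75, True),
    "2": (0.3333, 0.667, True),
    "3": (0.2222, 0.50, False),
}

sizes = {}
for name, (share, ones, replaced) in FILES.items():
    bits = HUFFMAN_BITS * share
    sizes[name] = bits * remaining_fraction(ones) if replaced else bits

for name, size in sizes.items():
    print(f"File {name} = {size:.2f} bits")
print(f"total = {sum(sizes.values()):.2f} bits per 8 input bits")
```

With these inputs the total rounds to 7.67 bits, the figure quoted above.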


5)Command Section
We now need a command section to handle all the different files,
changes, etc. Each file will have a specific number of bits. This can
be easily represented with a simplistic counting system allowing for
maximum space savings. If < 1 kilobyte then 00, if less than 1
megabyte then 01, if less than 1 gigabyte then 10, if greater than 1
gigabyte then 11.

This can be expanded for custom applications if needed to enter
terabyte or larger sizes. If 00 then 10 bits to count the number of
bits in file 1.1. If 01 then 20 bits to count the bits in file 1.2.
If 10 then 30 bits, and if 11 then 40 bits. Repeat for each file. Then
attach each file to each other, they are now accounted for exactly. We
might also include a switch for compressing file #3. We might also
include a switch to flip 0 for 1, and/or 11 for 00 results. We might
include other switches as needed. We should also include a command
section to allow for a number of compression cycles to be completed.
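
One way to picture a per-file length field is sketched below. It is not the paper's format: it assumes the class boundaries are 2**10, 2**20 and 2**30 bits (so that the count always fits the 10, 20, 30 or 40 bit field), and the function names are hypothetical.

```python
def length_field(file_bits):
    """Return (2-bit size class, width of the count field in bits)."""
    # Assumption: boundaries at 2**10, 2**20, 2**30 bits, then a 40-bit field.
    if file_bits < 0 or file_bits >= 2 ** 40:
        raise ValueError("length does not fit the 40-bit field")
    for size_class, width in (("00", 10), ("01", 20), ("10", 30)):
        if file_bits < 2 ** width:
            return size_class, width
    return "11", 40

def header(file_bits):
    """Size class followed by the file length as a fixed-width binary count."""
    size_class, width = length_field(file_bits)
    return size_class + format(file_bits, f"0{width}b")

print(header(5))          # 000000000101
print(len(header(5000)))  # 22
```

Under this reading every file costs between 12 and 42 header bits, before any of the switches are added.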

6)Reversing the Compression
Our compression results can be undone by exactly going in reverse of
our system. The command section will point the way accurately. There
can be no errors whatsoever so long as the code is written correctly.
It has an absolute lossless function built-in.
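
A minimal sketch of undoing steps 2 and 1 follows. It assumes the three files are already available as separate bit strings, meaning their lengths are known from the command section. The function names are illustrative.

```python
DECODE = {"111": "11", "110": "10", "10": "01", "0": "00"}

def merge_files(file1, file2, file3):
    """Rebuild the step 1 bit stream from the three files of step 2."""
    sources = [iter(file1), iter(file2), iter(file3)]
    stream = []
    current = 0  # index of the file the next bit is read from
    remaining = len(file1) + len(file2) + len(file3)
    while remaining:
        bit = next(sources[current])
        stream.append(bit)
        remaining -= 1
        # A 0, or a bit taken from File #3, sends us back to File #1.
        current = 0 if bit == "0" or current == 2 else current + 1
    return "".join(stream)

def huffman_decode(stream):
    """Invert the step 1 table; the code words are prefix-free."""
    out, word = [], ""
    for bit in stream:
        word += bit
        if word in DECODE:
            out.append(DECODE[word])
            word = ""
    return "".join(out)

stream = merge_files("1100", "10", "1")
print(stream, huffman_decode(stream))  # 1111000 11010000
```

The merge only works because the boundaries between the files are known; that information has to be stored somewhere.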




How is this all possible? There are several factors to consider.

Look at only the numbers listed in Table 2 below:






Source   File #1   File #2   File #3
 111        1         1         1
 110        1         1         0
 10         1         0
 0          0
Table 2

As you can see, any imbalance in the actual source will NOT hurt the
whole. If we are missing all 110 results, File #3 can have a command
section note to only count the 1's. If there is a single 110 leading
to a 0 in a slew of 1's then we still can obtain a near 50% reduction
in size via a standard replacement as noted in section 4. Other
replacement schemes can make our file as low
as desired if the ratio is truly imbalanced. This can be hard-coded
and triggered as needed. If 10 results are missing then File #2
compresses to a much higher extent. Even if 111 results are totally
off we can simply flip it in the beginning with a 11 switch with a 00
command section note. Many modifications can be installed to effect
the initial base code which will make any issues of preexisting
imbalances a nonissue. All of this can be easily coded into a command
section. Our command section can be large, or small, based upon our
needs. Tree triggers can be incorporated as needed to allow for
unusual contingencies which will lead to even better compression
ratios. In short, this method can kick all the current compression
systems out on its own. Any variation of the two bits ratios will not,
cannot, hinder the system. This leads to a situation where only the
number of cycles, and the command section, truly limits a binary
sequence from being compressed to the fullest extent.

The claim of the pigeon hole problem naturally arises at this point.
However, all the steps are reversible and lead IMMEDIATELY back to the
same original code. No two results will be exactly the same. It is
mathematically not possible. The HCM system completely leads to the
data as inputted originally.

What this means is that currently thought of entropic data is not
entropic for our purposes. Entropy exists at much lower levels than
previously expected for this system. This does not mean 1 = all data.
Neither does 0. It means that a new flexible view of data needs to be
researched and examined.

I liken the compression to being a camera in a wall in a fog filled
super-sized room. If you try to look too far you won't see to the end.
However, if you move the wall in, the fog thickens, but you can see
further than you did before. Eventually there is a point where the fog
is so thick that any forward movement will not gain you any viewing
distance, and even eventually the fog grows so thick you cannot see at
all.



  #2  
Old 08-26-2008, 12:05 PM
Einstein
Guest
 
Default Re: The Harrington Compression Method (HCM) White Paper.

Oh and patent pending!
  #3  
Old 08-26-2008, 12:14 PM
Willem
Guest
 
Default Re: The Harrington Compression Method (HCM) White Paper.

Einstein wrote:
) Introduction:
) This is a lossless compression method which WILL work on random binary
) data and data considered Entropic.

<snip>
I'll jump right to the blatant error.

) File 1.1 = 1.05 bits
) File 1.2 = 1.04 bits
) File 1.3 = .75 bits
) File 2 = 2.83 bits
) File 3 = 2 bits
)
) Statistically speaking there are 7.67 bits for every original 8 bits.

And you have 5 files instead of 1. This represents extra information.
So much extra that it easily accounts for the 0.33 bits/byte you 'gained'.

) 5)Command Section
) We now need a command section to handle all the different files,
) changes, etc. Each file will have a specific number of bits. This can
) be easily represented with a simplistic counting system allowing for
) maximum space savings. If < 1 kilobyte then 00, if less than 1
) megabyte then 01, if less than 1 gigabyte then 10, if greater than 1
) gigabyte then 11.

I note that you fail to calculate how many bits this command section will
take, statistically.

) The claim of the pigeon hole problem naturally arises at this point.
) However, all the steps are reversible and lead IMMEDIATELY back to the
) same original code. No two results will be exactly the same. It is
) mathematically not possible. The HCM system completely leads to the
) data as inputted originally.

Note that the counting theorem has been proven true.

Therefore, the only logical conclusion can be that the number of bits the
command section takes, a number which you failed to calculate, *must*
be at least as large as any gain you get from the compression.
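
The counting argument itself is easy to check numerically. The sketch below (illustrative names) compares the number of bit strings of length n with the number of all strictly shorter bit strings.

```python
def strings_of_length(n):
    """Number of distinct bit strings of exactly n bits."""
    return 2 ** n

def strings_shorter_than(n):
    """Number of distinct bit strings of length 0 .. n-1."""
    return sum(2 ** k for k in range(n))

n = 8
print(strings_of_length(n), strings_shorter_than(n))  # 256 255
```

There is always one fewer shorter string than there are inputs, so no reversible scheme can shorten every input of a given length.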


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
  #4  
Old 08-26-2008, 12:18 PM
Willem
Guest
 
Default Re: The Harrington Compression Method (HCM) White Paper.

Einstein wrote:
) Oh and patent pending!

I know of two patents that have been granted already
that claim compression on all data.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
  #5  
Old 08-26-2008, 12:20 PM
Einstein
Guest
 
Default Re: The Harrington Compression Method (HCM) White Paper.

And the first troll who did not think things through.

The command section has been calculated out, it's rather simple.

The files are accounted for in the command section. 50 bits per file
should account for maximum length of each in turn.

This is 250 bits.

Other commands add up to less than 100 bits for sure; reaching 100 is
the worst case.

Ergo our savings need to be in excess of 350 bits.

And this is ON ENTROPIC DATA.

Do your research, or walk away.


You did not even read the last argument in there where I showed how
the system works in general.
  #6  
Old 08-26-2008, 12:24 PM
Einstein
Guest
 
Default Re: The Harrington Compression Method (HCM) White Paper.

First, you ignore the facts by not working out for yourself how large it
will end up being (the command section).


The highest reasonable bound (unless you're compressing Google) is 350
bits.




The counting argument is defeated; you ignored the closing argument.
This is how my system, in general, has achieved it.



Please try to at least act like you're trying.
  #7  
Old 08-26-2008, 02:41 PM
Jym
Guest
 
Default Re: The Harrington Compression Method (HCM) White Paper.

On Tue, 26 Aug 2008 17:57:35 +0200, Einstein <michaelhh@gmail.com> wrote:

> Harrington Compression Method


OK, I haven't understood all the steps, so let's try a concrete example to
see how it works in practice. Please let me know if I do something wrong.

Let's consider two strings that we want to compress independently:
S1 = 11010000
S2 = 11101011

> The method:
>
> 1)Using a standard Huffman table:
>
> 11 = 111
> 10 = 110
> 01 = 10
> 00 = 0


Of course, this Huffman table will probably not be the correct one for the
examples I've chosen, but let's not worry about that. I don't want (yet) to see
the compression working (anyway, my two examples are probably too small to
work properly; you speak of 350 bits of command section later) but just to
understand how the method works.

Using this table, my two examples can be encoded as :
S1 = 11 01 00 00 -> 111 10 0 0
S2 = 11 10 10 11 -> 111 110 110 111
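
The two encodings can be reproduced with a few lines. This is only a sketch of the table lookup from step 1; the function name is illustrative.

```python
TABLE = {"11": "111", "10": "110", "01": "10", "00": "0"}

def huffman_encode(bits):
    """Encode a bit string two bits at a time with the table from step 1."""
    if len(bits) % 2:
        raise ValueError("input length must be even")
    return [TABLE[bits[i:i + 2]] for i in range(0, len(bits), 2)]

for name, source in (("S1", "11010000"), ("S2", "11101011")):
    words = huffman_encode(source)
    print(name, " ".join(words), sum(len(w) for w in words))
# S1 111 10 0 0 7
# S2 111 110 110 111 12
```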

Of course, Huffman coding also requires the decoding table to be sent
with the file in order to be able to decompress the data (this is the
point which makes certain data larger after Huffman encoding).

> The Huffman produces 9 bits for every 8 initial bits.


Really? I'm counting 7 for S1 and 12 for S2. Can you explain this
assertion a bit more?

> The following Table 1 is divided for the standard conversion system
> noted above.
>
> File #1
> 1
> 1
> 1
> 0
>
>
> File #2
> 1
> 1
> 0
> File #3
> 1
> 0
> Table 1


OK, so we have files corresponding to the columns in the Huffman's table.

> Note the ratios in each 'file'. File #1 has a 75% ratio of 1's to a
> 25% ratio of 0's. File #2 has a 66.7% ratio of 1's to a 33.3% ratio of
> 0's. File #3 has a 50% to 50% ratio. Of course this is purely a
> statistical representation as we will have a large file being
> compressed with many more results added to the files in question.


OK.

> If originally we had a preexisting ratio imbalance of 1's to 0's we
> could conceivably have an imbalance that would affect the whole.


Huh ? I may have missed something, but at this point I don't see how the
strings S1 and S2 can be encoded using these 3 files.
Could you please complete my examples ?

> We will presume that the imbalance is to
> the favor of the 1's for our current example, that we favor 1's in the
> current system (1's and 0's can be interchanged at will), and that if
> 0's were predominant a single bit in a command section could switch
> 1's for 0's and make 1's predominant again.


Just take example S2 if you want (where there are more '1' than '0').

> This imbalance will lead
> to even greater imbalances than 75% in the File #1 and File #2, and
> file #3 will then show an imbalance equal to the original imbalance.


At this point, I really don't see the link between the original data and
the three files.
Well, there is a link between the Huffman table and the three files, and
the Huffman table should depend on the data (Huffman's algorithm), but
the table you presented was for data with more 00 than 01 and more 01
than 10 and 11, so probably more 0 than 1 overall, so I don't see why
you're speaking about more 1 than 0...

--
Hypocoristiquement,
Jym.
  #8  
Old 08-26-2008, 03:03 PM
Einstein
Guest
 
Default Re: The Harrington Compression Method (HCM) White Paper.

On Aug 26, 11:41 am, Jym <Jean-Yves.Moyen+n...@ens-lyon.org> wrote:
> On Tue, 26 Aug 2008 17:57:35 +0200, Einstein <michae...@gmail.com> wrote:
> > Harrington Compression Method

>
> OK, I haven't understood all the steps, so let's try a concrete example to
> see how it work in practise. Please let me know if I do something wrong.
>
> Let's consider two strings that we want to compress, independantly :
> S1 = 11010000
> S2 = 11101011
>
> > The method:

>
> > 1)Using a standard Huffman table:

>
> > 11 = 111
> > 10 = 110
> > 01 = 10
> > 00 = 0

>
> Of course, this Huffman table will probably not be the correct one for the
> examples I've chose but let's not care about it. i don't want (yet) to see
> the compression working (anyway, my two examples are probably too small to
> work properly, you speak of 350 bits of command later) but just understand
> how the method work.
>
> Using this table, my two examples can be encoded as :
> S1 = 11 01 00 00 -> 111 10 0 0
> S2 = 11 10 10 11 -> 111 110 110 111
>
> Of course, Huffman's coding also require the decoding table to be send
> with the file in order to be able to decompress the data (this is the
> point which make certain data larger after an Hufman's encoding).
>
> > The Huffman produces 9 bits for every 8 initial bits.

>
> Really ? I'm counting 7 for S1 and 12 for S2. Can you explicit a bit more
> this assertion ?
>


My apologies, it should read statistically.

>
> > The following Table 1 is divided for the standard conversion system
> > noted above.

>
> > File #1
> > 1
> > 1
> > 1
> > 0
>
> > File #2
> > 1
> > 1
> > 0
> > File #3
> > 1
> > 0
> > Table 1

>
> OK, so we have files corresponding to the columns in the Huffman's table.
>
> > Note the ratios in each 'file'. File #1 has a 75% ratio of 1's to a
> > 25% ratio of 0's. File #2 has a 66.7% ratio of 1's to a 33.3% ratio of
> > 0's. File #3 has a 50% to 50% ratio. Of course this is purely a
> > statistical representation as we will have a large file being
> > compressed with many more results added to the files in question.

>
> OK.
>
> > If originally we had a preexisting ratio imbalance of 1's to 0's we
> > could conceivably have an imbalance that would affect the whole.

>
> Huh ? I may have missed something, but at this point I don't see how the
> strings S1 and S2 can be encoded using these 3 files.
> Could you please complete my examples ?
>
> > We will presume that the imbalance is to
> > the favor of the 1's for our current example, that we favor 1's in the
> > current system (1's and 0's can be interchanged at will), and that if
> > 0's were predominant a single bit in a command section could switch
> > 1's for 0's and make 1's predominant again.

>
> Just take example S2 if you want (where there are more '1' than '0').
>
> > This imbalance will lead
> > to even greater imbalances than 75% in the File #1 and File #2, and
> > file #3 will then show an imbalance equal to the original imbalance.

>
> At this point, I really don't see the link between the original data and
> the three files.
> Well, there is a link between the Huffman's table and the three files, and
> the Huffman's table should depends on the data (Huffman's algorithm), but
> the table you presented was for a data with more 00 than 01 and more 01
> than 10 and 11, so probably more 0 than 1 overall, so I don't see why
> you're speaking about more 1 than 0...
>
> --
> Hypocoristiquement,
> Jym.


S1 = 11 01 00 00 -> 111 10 0 0
S2 = 11 10 10 11 -> 111 110 110 111

What I am going to do, to help explain this better is place a letter
next to each number, so the description is more effective


S1 = 11 01 00 00 -> 1a1b1c 1d0e 0f 0g
S2 = 11 10 10 11 -> 1h1i1j 1k1l0m 1n1o0p 1q1r1s


So what happens in this order is

Letter per letter

a goes in file #1; since a = 1, b goes to file #2; since b = 1,
c goes to file #3. Since c is located in file #3 the next
bit, regardless, starts in file #1. So d goes into file #1; as d = 1,
e goes into file #2, and since e = 0 the next bit goes to file
#1.

Working this out we get

S1
File #1
1a1d0f0g

File #2
1b0e

File #3
1c


S2
File #1
1h1k1n1q

File #2
1i1l1o1r

File #3
1j0m0p1s


Now file #1 in S1 and S2 could be run again through the process, if it
would be worth doing, but frankly with such a small sample base it is
improper to even propose.


So end result:

S1: F1: 1100 F2: 10 F3: 1
S2: F1: 1111 F2: 1111 F3: 1001
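
The letter-by-letter walk above can be sketched as a short routine. The function name is illustrative, and the input is the Huffman output written without spaces.

```python
def split_into_files(stream):
    """Distribute the bits of a step 1 stream over File #1, #2 and #3."""
    files = ["", "", ""]
    current = 0  # index of the file receiving the next bit
    for bit in stream:
        files[current] += bit
        # A 0, or a bit placed in File #3, sends the next bit to File #1.
        current = 0 if bit == "0" or current == 2 else current + 1
    return files

print(split_into_files("1111000"))       # ['1100', '10', '1']
print(split_into_files("111110110111"))  # ['1111', '1111', '1001']
```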



Does this help with the understanding?
  #9  
Old 08-26-2008, 03:11 PM
Thomas Richter
Guest
 
Default Re: The Harrington Compression Method (HCM) White Paper.

Einstein wrote:
> Introduction:
> This is a lossless compression method which WILL work on random binary
> data and data considered Entropic.


No, it won't.

> The method:
>
> 1)Using a standard Huffman table:
>
> 11 = 111
> 10 = 110
> 01 = 10
> 00 = 0
>
> Additional variations as needed can be utilized.
>
> The Huffman produces 9 bits for every 8 initial bits.
>
> 2)Using HCM, we make different 'files' out of information. Any 1's in
> a line (until three are used or a 0 occurs) will lead to a different
> 'file' being granted the following information from that string. In
> this context each 'file' stands for a temporary file used to hold the
> information.
>
> The following Table 1 is divided for the standard conversion system
> noted above.
>
> File #1
> 1
> 1
> 1
> 0
>
>
> File #2
> 1
> 1
> 0
> File #3
> 1
> 0



That alone gives you three files, but the files alone do not help you to
recover any string. Up to now you defined how to encode the huffman
table (under 1) but not how to encode a message. So let's assume your
message is 00,01,10,11 and the above is the encoding, and that your
encoding works by grouping the initial message into groups of two bits
each, and translate each two bits via a huffman table as shown, and
furthermore, you read the output from that "in 90 degrees rotated" as
you did. And furthermore, that you start a new file whenever you find a
zero or three ones in that process: Then, to decode, you do not really
need any files (what are they good for if you don't use the file number
or the file index or the EOF indicator) but just the raw numbers
concatenated again. Thus, whether there are files or not is irrelevant,
we can just talk about scanning strings, right?

> Table 1
>
> Note the ratios in each 'file'. File #1 has a 75% ratio of 1's to a
> 25% ratio of 0's. File #2 has a 66.7% ratio of 1's to a 33.3% ratio of
> 0's. File #3 has a 50% to 50% ratio. Of course this is purely a
> statistical representation as we will have a large file being
> compressed with many more results added to the files


That is an example of using a sub-optimal huffman code on the input. In
fact, given that the statistics of the input is that each input is
equally likely, then the (or a) optimal huffman would be the identity
mapping:

00 -> 00
01 -> 01
10 -> 10
11 -> 11

and your files would be (probably, that is not explained in a useful way)

file 1: 0
file 2: 0
file 3: 11
file 4: 0
file 5: 10
file 6: 1

and there is no statistical weight to either zero's or one's. In fact,
there are three files where we have 100% of zeros, one file of two
one's, one of one one, and one with a 50%-50% difference.

> If originally we had a preexisting ratio imbalance of 1's to 0's we
> could conceivably have an imbalance.


Not for the input the code was designed for. The huffman table you
presented is optimal for a four-letter alphabet with the probabilities

p(00) = 0.5
p(01) = 0.25
p(10) = 0.125
p(11) = 0.125

under these conditions, the output would contain, on average

0.5 (0) + 0.25 (10) + 0.125 (110) + 0.125 (111)

which, when we count the number of zeros per symbol

0.5 + 0.25 + 0.125 = 0.875

and the number of ones per symbol

0.25 + 2*0.125 + 3*0.125 = 0.875

which is exactly equal. That is, if you apply the huffman to the source
it was designed to be applied to, the number of zeros is (on average)
exactly equal to the number of ones. That you parse the output
column-wise instead of row-wise doesn't change the fact. That you create
files and abort a file as soon as you find a zero doesn't change the
total number of zeros or ones either. Since zeros and ones are equally
likely in the output, the probability of having n ones (and one zero) is
2^(-n-1), constituting a file of size n+1, and the probability of having
a file of size 1 is (hence) 2^-1 = 0.5. Which means that even if one half
of your files consist of many ones, the other half will be
completely zero, one symbol each.
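
The zero and one counts can be checked with exact fractions. The sketch uses the table from step 1 and the four probabilities listed above; the variable names are illustrative.

```python
from fractions import Fraction

CODE = {"00": "0", "01": "10", "10": "110", "11": "111"}
P = {
    "00": Fraction(1, 2),
    "01": Fraction(1, 4),
    "10": Fraction(1, 8),
    "11": Fraction(1, 8),
}

# Expected number of 0 bits and 1 bits emitted per input symbol.
zeros = sum(P[s] * CODE[s].count("0") for s in CODE)
ones = sum(P[s] * CODE[s].count("1") for s in CODE)
print(zeros, ones)  # 7/8 7/8
```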

> that would affect the whole. We will presume that the imbalance is to
> the favor of the 1's for our current example, that we favor 1's in the
> current system (1's and 0's can be interchanged at will), and that if
> 0's were predominant a single bit in a command section could switch
> 1's for 0's and make 1's predominant again.


At the cost of one bit - big deal, since this will double the size of
half of your data.

> This imbalance will lead
> to even greater imbalances than 75% in the File #1 and File #2, and
> file #3 will then show an imbalance equal to the original imbalance.


Ok, so let's continue with this nonsense. Suppose we want to compress
the given files with a "slanted" probability, namely that found above.
One can either consider the statistics as a whole, in which case we had
exactly as many zeros as ones, giving you no compression. Or you could
create a huffman table for each file itself, which had to be
communicated - again not optimal. The easiest is, since we know how each
file looks like (n ones followed by a single zero) just to encode that
number (n) of consecutive ones. That would require log_2(n) bits to do,
namely the file size alone, obviously sufficient to decompress it.
However, if you combine that again to a total message (or, to put it in
different words, if you store that on a filing system) you also need to
encode where one file (or sub-message) stops, and where the next starts.
Not each file size is equally likely, i.e. a file of size log_2(n)
appears (as shown) with a probability of 2^(-n-1), and the optimal way
to encode the file size is again a huffman code (as we know), taking
(again as we know) exactly n+1 bits for the message of log_2(n) bits.
Now, as we only need to encode the file size (and not the contents), n+1
bits are enough to be written out. Strange enough, these are *exactly*
the number of bits stored in the file in first place, surprise,
surprise, it didn't get any shorter, but stayed optimal, and we used all
optimal codes on the way.


> 3)Repeat step #2 on the #1 files content.The results will astonish
> you. Using a sub-filing system of 1.1, 1.2, 1.3, 2, 3 we end up with
> very interesting results across the whole spectrum.
>
> File 1.1 will have 19.75% of the results and a ratio imbalance of
> 93.75% to 6.25%.
>
> File 1.2 will have 14.81% of the results and a ratio imbalance of 80%
> to 20%
>
> File 1.3 will have 9.87% of the results and a ratio imbalance of 75%
> to 25%
>
> File 2 will have 33.33% of the results and a ratio imbalance of 66.7%
> to 33.3%
>
> File 3 will have 22.22% of the results and a ratio imbalance of 50% to
> 50%


You again start with a wrong premise. If you start with the right
premise, namely that you apply the huffman to the code it was designed
for, you also get the math correctly, namely that you do not gain anything.

> 11 = 1
> 10 = 01
> 01 = 001
> 00 = 000
>
> This table relies upon a statistical imbalance of 1's to 0's.


There is none. See my example above. Your "observed imbalance" is a
by-product of using a sub-optimal code on a random source. Using the
code on the source it was designed for does not show an imbalance.

> 5)Command Section
> We now need a command section to handle all the different files,
> changes, etc. Each file will have a specific number of bits. This can
> be easily represented with a simplistic counting system allowing for
> maximum space savings. If < 1 kilobyte then 00, if less than 1
> megabyte then 01, if less than 1 gigabyte then 10, if greater than 1
> gigabyte then 11.


Again, this code is sub-optimal, you're wasting bits. If you do not want
to waste bits, see above.

> 6)Reversing the Compression
> Our compression results can be undone by exactly going in reverse of
> our system. The command section will point the way accurately. There
> can be no errors whatsoever so long as the code is written correctly.
> It has an absolute lossless function built-in.
>
> How is this all possible?


By starting from a wrong premise and not thinking the math through.

> The claim of the pigeon hole problem naturally arises at this point.
> However, all the steps are reversible and lead IMMEDIATELY back to the
> same original code. No two results will be exactly the same. It is
> mathematically not possible. The HCM system completely leads to the
> data as inputted originally.


The principle applies. However, the only thing your code does is that it
makes files larger, on average, unless you tune it to encode things
optimal, in which case the files stay the same size. Thus, no problem,
but no gain either.

Besides, what about changing your nick? It's funny and entertaining;
what about "Groucho Marx"?

So long,
Thomas
  #10  
Old 08-26-2008, 03:25 PM
Einstein
Guest
 
Default Re: The Harrington Compression Method (HCM) White Paper.

Thomas you are trying to divert the actual make up.


It's a simple concept, a 50/50%, a 25/25/25/25%, a
12.5/12.5/12.5/12.5/12.5/12.5/12.5/12.5%, and so forth situation can
be made into a:

Three files with a 75/25% a 67/33% and a 50/50%.

Further work gets us a 93.75%, a 80%, a 75%, a 67%, and a 50%.

When run through a Huffman the size decrease is better than the size
increases required in the earlier Huffmans.

All times are GMT -5.