- 09-01-2008, 11:13 PM, Application Development, Re: The Harrington Compression Method (HCM) White Paper.
I have some preliminary software that was given to me to test. I am testing it to see how well it works, and doing some bug checking as well.

My spreadsheet is 90% complete; sadly, due to my work schedule, it won't be done until tomorrow.

- 09-01-2008, 11:48 PM, Application Development, Re: The Harrington Compression Method (HCM) White Paper.
In article <1f00214b-8c75-4fd0-97a9-ad2bc744d65b@s20g2000prd.googlegroups.com>,

Einstein <michaelhh@gmail.com> wrote:

>On Aug 31, 12:20 pm, amor...@xenon.Stanford.EDU (Alan Morgan) wrote:

>>

>> I would assume that this means that for absolutely any file of any

>> (finite) size the command section will never be more than a gigabyte.

>> Is my interpretation correct? Can you prove your claim?

>

>Alan here is a question for you.

>

>If, say, for a hypothetical compression algorithm, a command section could vary between 100 bits and 10,000 bits, but would never be smaller or larger than that, regardless of the size of the file, and this compression algorithm was able to compress by only 1% each time on any file size...

So I take it my interpretation is correct. Now then: can you prove

that a command section of <= 10,000 bits will always suffice?

>What is the minimum size, and maximum size for files to be compressed?

>Or is it somewhat variable due to the variable length of the command

>section?

>

>Pray tell me this, then you have your own answer sir.

I know what the answer is if your claims are correct. What I'm asking

you to do is prove that your claims are correct. Sir.

Alan

--

Defendit numerus

- 09-02-2008, 12:10 PM, Application Development, Re: The Harrington Compression Method (HCM) White Paper.
On Aug 29, 7:15 pm, Einstein <michae...@gmail.com> wrote:

> I am not worried about a mere few bits. I am not worried about 1,000

> bits, I am not worried about 100,000,000 bits in the command section.

> It is IRRELEVANT at this moment. It can be whatever size it needs to be.

> It can be larger even, I do not care!

>

> What I care is that the final result can compress so long as the file

> is of sufficient initial size to account for the command section.

> Honestly this is so minute of an issue, it does not exist as far as I

> am concerned.

And that is the flaw in your method. You *must* account for the size

of your command section/control codes/dictionary references/whatever

because they take up space, and you will find that simply reordering

data does not save any bits.
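To make that concrete with a sketch of my own (a plain binary-string model, not HCM itself): sorting the bits of a file makes the sorted result trivially describable, but the permutation needed to undo the sort must be stored too, and it costs more bits than the sort "saves".

```python
from math import comb, log2

# Sorting an n-bit string with k zeros leaves something describable by
# the zero-count alone, but undoing the sort requires recording which
# k of the n positions held the zeros: at least log2(C(n, k)) bits.
def sorted_description_bits(n: int) -> float:
    """Bits to transmit 'k zeros out of n' once the string is sorted."""
    return log2(n + 1)

def permutation_bits(n: int, k: int) -> float:
    """Bits to record which k of the n positions held the zeros."""
    return log2(comb(n, k))

n, k = 64, 32
total = sorted_description_bits(n) + permutation_bits(n, k)
print(total > n)   # True: more bits out than went in, a net loss
```

The reordering information and the reordered data together always cost at least as much as the original.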

I am reminded of the "simple" compression method that can compress any

file by picking a symbol and recording where in the file that symbol

occurs, then the rest of the file minus that symbol. For example,

let's take the following sentence:

I enjoy compression as a hobby

Let's "compress" this by picking the letter "o", which occurs here

(fixed-width font please)

I enjoy compression as a hobby

-----^---^-------^--------^---

That's offsets 5, 9, 17, and 26. So my output is now:

o,5,9,17,26,I enjy cmpressin as a hbby

Each offset takes only 5 bits, versus the 8 bits each removed "o" took, saving 3 bits per occurrence (12 bits total); but I have to burn 8 bits once to specify the letter "o" as the symbol, so the net saving is 4 bits. At this point, you're probably thinking "Hey, that worked

great, I'm going to re-run it on the compressed output and get even

MORE compression!" but the more you do this the more bits you need to

store the offsets, plus there is a net loss for letters that only

appear once like "j" and "p", and in the end you will find that the

above method will not work for all sources.
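Here is that scheme in runnable form (a sketch; the bit costs assumed are the 5-bit offsets and the one-time 8-bit symbol from the example, under which the "o" pass nets 4 bits while a one-off letter like "j" loses bits):

```python
def offset_compress(text: str, symbol: str):
    """The toy scheme described above: record where `symbol` occurs,
    then emit the text with that symbol removed."""
    offsets = [i for i, c in enumerate(text) if c == symbol]
    return symbol, offsets, text.replace(symbol, "")

def net_bits_saved(text: str, symbol: str, offset_bits: int = 5) -> int:
    """8 bits saved per removed character, minus the offsets and the
    one-time 8-bit cost of naming the symbol."""
    count = text.count(symbol)
    return 8 * count - offset_bits * count - 8

sentence = "I enjoy compression as a hobby"
sym, offs, rest = offset_compress(sentence, "o")
print(offs)                           # [5, 9, 17, 26]
print(net_bits_saved(sentence, "o"))  # 4
print(net_bits_saved(sentence, "j"))  # -5: single-occurrence letters lose
```

Note that once offsets can exceed 31 you need more than 5 bits each, which is exactly why re-running it stops paying off.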

That is the key thing -- "for all sources". There is no one

compression method that will compress every single possible source,

yet people think they have come up with Magic Method X that does, and

that's where they fall into the perpetual compression trap.
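The "for all sources" point is just a pigeonhole count, easy to verify directly:

```python
from itertools import product

# There are 2**n distinct n-bit strings but only 2**n - 1 strings of
# length strictly less than n, so no lossless (injective) encoder can
# map every n-bit input to a shorter output.
def count_strings(n: int) -> int:
    return len([''.join(b) for b in product('01', repeat=n)])

n = 8
inputs = count_strings(n)                          # 256 inputs of length 8
shorter = sum(count_strings(k) for k in range(n))  # 255 shorter outputs
print(inputs - shorter)   # 1: at least one input cannot shrink
```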

- 09-03-2008, 02:40 PM, Application Development, Re: The Harrington Compression Method (HCM) White Paper.
On Fri, 29 Aug 2008 05:03:30 +0200, Einstein <michaelhh@gmail.com> wrote:

Hmmmm....

If I've understood properly what was said on Monday, the first two Huffman steps are *expansion* Huffmans and only the third one is a compression Huffman. Is this right?

If yes, then what's wrong with the following:

Start with string

11 11 11 11 (length 8)

> Steps in short:

>

> 1) String length source file, if odd numbered either trunc, or add a

> bit [command+1]

> 2) COUNT all 11, 10, 01, 00 outcomes. Switch them into most occurring

> to most desired (In this order desired: 11, 10, 01, 00). [Command

> +8]

Done.

> 3) 1st Huffman

expansion Huffman, hence 11 -> 111, right?

Result : 111 111 111 111 (length 12)

> 4) File Sort

11 11

11 11

11 11

> 5) String length files, if odd numbered either trunc, or add a bit

> [command+3]

> 6) COUNT all 11, 10, 01, 00 outcomes per file. Switch them into most

> occurring to most desired (In this order desired: 11, 10, 01, 00).

> [Command +24]

Done.

> 7) repeat 1st Huffman on all 3 files.

Again, expansion Huffman, right?

Result:

111 111

111 111

111 111

(sum of lengths: 18)

> 8) Subfile sort.

11 / 11 / 11

11 / 11 / 11

11 / 11 / 11

> 9) String length sub-files, if odd numbered either trunc, or add a

> bit [command+9]

> 10) COUNT all 11, 10, 01, 00 outcomes per file. Switch them into most

> occurring to most desired (In this order desired: 11, 10, 01, 00).

> [Command +24]

Done.

> 11) Compression Huffman on files 1.1, 1.2, 1.3, 2.1, 2.2, 2.3, 3.1,

> and 3.2. Do not attempt file #3 unless COUNT indicated 11 results >

> 01+11 results.

This time, compression Huffman, hence 11 -> 0

Result:

0 / 0 / 0

0 / 0 / 0

0 / 0 / 0

Sum of lengths: 9...

> 12) String Length each subfile, use 50 bits per subfile to identify

> string length [command+450]

> 13) Append information as:

> CommandSection&File1.1&File1.2&File1.3&File2.1&File2.2&File2.3&File3.1&File3.2&File3.3

So, even without the command section, just by appending the 9 files I get in the end, I have a result of length 9, starting from a string of length 8. This can hardly be called compression...

Where did I go wrong?

Are these first two Huffman steps really expansion Huffmans?
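For what it's worth, the whole toy trace can be checked mechanically. This sketch assumes exactly the two substitutions used above: the expansion step rewrites the pair 11 as 111, and the final compression step rewrites 11 as 0.

```python
def expanded_len(bits: str) -> int:
    """Length after the expansion step, assuming '11' -> '111'
    and the other pairs stay at 2 bits."""
    pairs = [bits[i:i + 2] for i in range(0, len(bits), 2)]
    return sum(3 if p == '11' else 2 for p in pairs)

def compressed_len(bits: str) -> int:
    """Length after the final step, assuming '11' -> '0'."""
    pairs = [bits[i:i + 2] for i in range(0, len(bits), 2)]
    return sum(1 if p == '11' else 2 for p in pairs)

start = '11' * 4                                # 8 bits
step3 = '1' * expanded_len(start)               # 12 bits, all ones
files = [step3[0:4], step3[4:8], step3[8:12]]   # step 4: three 4-bit files
step7 = ['1' * expanded_len(f) for f in files]  # 6 bits each, 18 total
final = sum(compressed_len(f) for f in step7)   # step 11
print(final)   # 9: one bit more than the 8 we started with
```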

After understanding that, and doing my own computations on the different sizes of files, I went back to an earlier message of yours:

http://groups.google.com/group/comp....d6ca4597c032bd

There, you stated:

> Let me break this down into a real simple example

> 64 bits goes in.

> Stage 1 increases it to 72 bits statistically

This is wrong. Stage 1 increases it to something between 72 and 96 bits.

(between 9/8 and 3/2 of the original)

72 bits are only obtainable if the input file has exactly 1/4 11, 1/4 10,

1/4 01 and 1/4 00.

If by "statistically" you actually meant "in the best case", then I agree

with you.

If you meant something like "on average", I'm afraid I'm going to

disagree...
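The [72; 96] range is easy to reproduce under one assumption (the white paper never spells out the expansion code): the favoured pair 11 gets a 3-bit codeword, matching "11 -> 111" above, and the other three pairs keep 2 bits. Step 2 guarantees 11 is the most frequent pair, so its share is at least 1/4.

```python
# Hypothetical code lengths for the expansion step -- an assumption,
# chosen only to match '11 -> 111' and the bounds quoted above.
CODE_LEN = {'11': 3, '10': 2, '01': 2, '00': 2}

def expanded_bits(pair_counts: dict) -> int:
    """Total bits after re-coding each pair with CODE_LEN."""
    return sum(CODE_LEN[p] * n for p, n in pair_counts.items())

# A 64-bit input holds 32 pairs.  Step 2 makes '11' the most frequent
# pair, so its count ranges from 8 (exactly 1/4) up to 32 (all '11').
best = expanded_bits({'11': 8, '10': 8, '01': 8, '00': 8})
worst = expanded_bits({'11': 32, '10': 0, '01': 0, '00': 0})
print(best, worst)   # 72 96: between 9/8 and 3/2 of the original 64
```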

> Stage 2 puts it in files, we then run file #1 again.

> File #1 had 32 bits when put in, now it has 36 and is split into 3

> subfiles.

File #1 had 32 bits, expanded to [40;48] bits, depending on its content.

Notice that, because of the way file #1 is obtained, it cannot contain less than 75% 1s. Indeed, file #1 contains a 0 for each 00 that was present in the original input. Since 00 was the least frequent pair of bits, there cannot be more than 25% 00 in the input, hence no more than 25% 0s in file #1.

If you work out the composition of file #1 in terms of pairs of bits, you can quickly see that there will be at least 50% 11. This immediately gives

at most 1/6 10, 01 and 00 (each).

So, file #1 cannot have the 25%/25%/25%/25% composition that allowed the smallest increase after a Huffman-expand, and so it cannot be expanded to only 36 bits.
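The "at least 50% 11" bound follows because every pair other than 11 contains at least one 0, so the number of non-11 pairs is at most the number of 0 bits. A quick randomized check of my own, on strings with at most 25% zeros:

```python
import random

def fraction_11(bits: str) -> float:
    """Fraction of non-overlapping bit pairs equal to '11'."""
    pairs = [bits[i:i + 2] for i in range(0, len(bits), 2)]
    return pairs.count('11') / len(pairs)

random.seed(1)
for _ in range(1000):
    n_bits = 128                             # 64 pairs
    zeros = random.randint(0, n_bits // 4)   # at most 25% zeros
    bits = ['1'] * n_bits
    for pos in random.sample(range(n_bits), zeros):
        bits[pos] = '0'
    # each zero spoils at most one pair, so >= 32 of 64 pairs are '11'
    assert fraction_11(''.join(bits)) >= 0.5
print("ok")
```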

> 1.1 has 16 bits
> 1.2 has 12 bits
> 1.3 has 8 bits

I obtained:

File 1.1: 16 (1/4)

File 1.2: [14;16] ([5/24;1/4])

File 1.3: [11;16] ([1/6;1/4])

> Now the ratios: File 1.1: 93.75% to 6.25%.

I obtained a percentage of 1s between 5/6 (83.33%) and 1.

> File 1.2: 80% to 20%

I obtained a percentage of 1s between 3/4 and 1.

> File 1.3: 75% to 25%

I obtained a percentage of 1s between 1/2 and 1.

> File 1.1 comes out with a statistical 9.47 bits remaining.

> File 1.2 comes out with a statistical 9.36 bits remaining

> File 1.3 comes out with a statistical 6.75 bits remaining

> This is 25.58 bits.

> Compare this to the 28.44 bits that it would normally hold

I have stopped computing the sizes/proportions in each file and subfile before the last Huffman (the Huffman-compress).

However, I do have ranges for the sizes of the files, the percentages of 1s in the files, and the percentages of 11, 10, 01, 00 in the files up to stage 10 included (after splitting into 1.1, ... and rearranging each file to get 11 as the most frequent pair, ...).

I can do the computation of the ranges of sizes after the Huffman-compress of step 11 and write up all my results if you really want...

--

Hypocoristiquement,

Jym.