I have some preliminary software that was given to me to test. I am
testing it to see how well it works, and doing bug checking as well.
My spreadsheet is 90% complete; sadly it won't be done until tomorrow
due to my work schedule.
In article <1f00214b-8c75-4fd0-97a9-ad2bc744d65b@s20g2000prd.googlegroups.com>,
Einstein <michaelhh@gmail.com> wrote:
>On Aug 31, 12:20 pm, amor...@xenon.Stanford.EDU (Alan Morgan) wrote:
>>
>> I would assume that this means that for absolutely any file of any
>> (finite) size the command section will never be more than a gigabyte.
>> Is my interpretation correct? Can you prove your claim?
>
>Alan here is a question for you.
>
>if, say, for a hypothetical compression algorithm, a command section
>could vary between 100 bits and 10,000 bits, but would never be smaller
>or larger than that regardless of the size of the file, and this
>compression algorithm was able to compress by only 1% each time on any
>file size...
So I take it my interpretation is correct. Now then: can you prove
that a command section of <= 10,000 bits will always suffice?
>What is the minimum size, and maximum size for files to be compressed?
>Or is it somewhat variable due to the variable length of the command
>section?
>
>Pray tell me this, then you have your own answer sir.
I know what the answer is if your claims are correct. What I'm asking
you to do is prove that your claims are correct. Sir.
Alan
--
Defendit numerus
On Aug 29, 7:15 pm, Einstein <michae...@gmail.com> wrote:
> I am not worried about a mere few bits. I am not worried about 1,000
> bits, I am not worried about 100,000,000 bits in the command section.
> It is IRRELEVANT at this moment. It can be what size it needs to be.
> It can be larger even, I do not care!
>
> What I care about is that the final result can compress so long as the file
> is of sufficient initial size to account for the command section.
> Honestly this is so minute of an issue, it does not exist as far as I
> am concerned.
And that is the flaw in your method. You *must* account for the size
of your command section/control codes/dictionary references/whatever
because they take up space, and you will find that simply reordering
data does not save any bits.
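That point has a clean counting argument behind it: a sorted n-bit file
plus the index needed to undo the sort carries exactly as much
information as the file itself. A minimal Python sketch of my own
(n = 64 is just an illustrative choice):

```python
import math

n = 64  # illustrative file length in bits

# "Sort the bits, then record how to undo the sort": a file with k ones
# sorts to the same string as C(n, k) - 1 other files, so the undo index
# for that k ranges over C(n, k) possibilities.
total_descriptions = sum(math.comb(n, k) for k in range(n + 1))

# Summed over every possible count of ones, the (count, index) pairs
# enumerate exactly the 2^n possible files -- reordering saved nothing.
print(total_descriptions == 2 ** n)  # True
```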
I am reminded of the "simple" compression method that supposedly can
compress any file by picking a symbol and recording where in the file
that symbol occurs, followed by the rest of the file minus that symbol.
For example,
let's take the following sentence:
I enjoy compression as a hobby
Let's "compress" this by picking the letter "o", which occurs here
(fixed-width font please)
I enjoy compression as a hobby
-----^---^-------^--------^---
That's offsets 5, 9, 17, and 26. So my output is now:
o,5,9,17,26,I enjy cmpressin as a hbby
I can store each offset using only 5 bits. Removing the four 8-bit "o"s
saves 32 bits; the four offsets cost 20 bits and naming the letter "o"
burns another 8, so I save 4 bits. At this point, you're probably
thinking "Hey, that worked great, I'm going to re-run it on the
compressed output and get even MORE compression!" but the more you do
this, the more bits you need to store the offsets, plus there is a net
loss for letters that only appear once, like "j" and "p", and in the end
you will find that the above method will not work for all sources.
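The bookkeeping can be checked with a short Python sketch of this offset
scheme (my own illustrative implementation, not anyone's actual code),
charging 8 bits per character, 8 bits to name the chosen symbol, and 5
bits per offset:

```python
def offset_cost(text, symbol, offset_bits=5):
    """Bit cost of the 'pick a symbol, store its offsets' scheme."""
    offsets = [i for i, c in enumerate(text) if c == symbol]
    remainder = text.replace(symbol, "")
    # 8 bits to name the symbol, offset_bits per occurrence,
    # 8 bits per remaining character
    return 8 + offset_bits * len(offsets) + 8 * len(remainder)

text = "I enjoy compression as a hobby"
plain = 8 * len(text)                  # 240 bits stored naively
print(offset_cost(text, "o") - plain)  # -4: picking "o" saves 4 bits
print(offset_cost(text, "j") - plain)  # +5: a one-off letter is a net loss
```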
That is the key thing -- "for all sources". There is no one
compression method that will compress every single possible source,
yet people think they have come up with Magic Method X that does, and
that's where they fall into the perpetual compression trap.
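"No one method compresses every source" is the pigeonhole principle:
there are more n-bit files than there are strictly shorter outputs, so
any lossless compressor must leave some input uncompressed. In Python
(n = 8 purely for illustration):

```python
n = 8
sources = 2 ** n                         # distinct n-bit files: 256
shorter = sum(2 ** k for k in range(n))  # outputs of length 0..n-1
# 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1: one output short of enough,
# so some n-bit file cannot map to anything shorter than itself
print(sources, shorter)  # 256 255
```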
On Fri, 29 Aug 2008 05:03:30 +0200, Einstein <michaelhh@gmail.com> wrote:
Hmmmm....
If I've understood properly what was said on Monday, the first two
Huffman passes are *expansion* Huffmans and only the third one is a
compression Huffman.
Is this right?
If yes, then what's wrong with the following:
Start with string
11 11 11 11 (length 8)
> Steps in short:
>
> 1) String length source file, if odd numbered either trunc, or add a
> bit [command+1]
> 2) COUNT all 11, 10, 01, 00 outcomes. Switch them into most occurring
> to most desired (In this order desired: 11, 10, 01, 00). [Command +8]
Done.
> 3) 1st Huffman
expansion Huffman, hence 11 -> 111, right ?
Result : 111 111 111 111 (length 12)
> 4) File Sort
11 11
11 11
11 11
> 5) String length files, if odd numbered either trunc, or add a bit
> [command+3]
> 6) COUNT all 11, 10, 01, 00 outcomes per file. Switch them into most
> occurring to most desired (In this order desired: 11, 10, 01, 00).
> [Command +24]
Done.
> 7) repeat 1st Huffman on all 3 files.
Again, expansion Huffman, right ?
Result:
111 111
111 111
111 111
(sum of lengths: 18)
> 8) Subfile sort.
11 / 11 / 11
11 / 11 / 11
11 / 11 / 11
> 9) String length sub-files, if odd numbered either trunc, or add a
> bit [command+9]
> 10) COUNT all 11, 10, 01, 00 outcomes per file. Switch them into most
> occurring to most desired (In this order desired: 11, 10, 01, 00).
> [Command +24]
Done.
> 11) Compression Huffman on files 1.1, 1.2, 1.3, 2.1, 2.2, 2.3, 3.1,
> and 3.2. Do not attempt file #3 unless COUNT indicated 11 results >
> 01+11 results.
This time, compression Huffman, hence 11 -> 0
Result:
0 / 0 / 0
0 / 0 / 0
0 / 0 / 0
Sum of lengths: 9...
> 12) String Length each subfile, use 50 bits per subfile to identify
> string length [command+450]
> 13) Append information as:
> CommandSection&File1.1&File1.2&File1.3&File2.1&File2.2&File2.3&File3.1&File3.2&File3.3
So, even without the command section, just by appending the 9 files I
get in the end, I have a result of length 9 starting from a string of
length 8. This can hardly be called compression...
Where did I do something wrong ?
Are these first two Huffman passes really expansion Huffmans?
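The walkthrough above can be replayed mechanically. Here is my own
Python sketch of how I read those steps (every pair in this input is 11,
so the expansion Huffman reduces to 11 -> 111 and the final compression
Huffman to 11 -> 0); it reproduces the 8 -> 12 -> 18 -> 9 lengths before
any command bits are counted:

```python
def expand(bits):
    # "expansion Huffman": the most frequent pair (11 here) gets the
    # 3-bit code 111; no other pair occurs in this input
    pairs = [bits[i:i + 2] for i in range(0, len(bits), 2)]
    return "".join("111" if p == "11" else p for p in pairs)

def split3(bits):
    # file sort: cut a file into three equal pieces
    n = len(bits) // 3
    return [bits[:n], bits[n:2 * n], bits[2 * n:]]

source = "11" * 4                                    # length 8
stage1 = expand(source)                              # length 12
files = split3(stage1)                               # three 4-bit files
stage2 = [expand(f) for f in files]                  # 6 bits each, 18 total
subfiles = [s for f in stage2 for s in split3(f)]    # nine 2-bit subfiles
final = ["0" if s == "11" else s for s in subfiles]  # compression Huffman
print(len(stage1), sum(map(len, stage2)), sum(map(len, final)))  # 12 18 9
```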
After understanding that, and doing my own computations on the different
sizes of files, I've been back to an earlier message of yours:
http://groups.google.com/group/comp....d6ca4597c032bd
There, you stated:
> Let me break this down into a real simple example
> 64 bits goes in.
> Stage 1 increases it to 72 bits statistically
This is wrong. Stage 1 increases it to something between 72 and 96 bits
(between 9/8 and 3/2 of the original).
72 bits is only obtainable if the input file has exactly 1/4 11, 1/4 10,
1/4 01 and 1/4 00.
If by "statistically" you actually meant "in the best case", then I agree
with you.
If you meant something like "on average", I'm afraid I'm going to
disagree...
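The 72-96 range is easy to brute-force. Under my reading of stage 1, the
relabelling of step 2 makes the pair counts non-increasing in the order
11, 10, 01, 00, and the expansion Huffman then assigns codes of lengths
3, 3, 2, 1 to those pairs (e.g. 111, 110, 10, 0); checking every
admissible composition of the 32 pairs confirms the bounds:

```python
# code lengths after the expansion Huffman for pairs 11, 10, 01, 00;
# the relabelling forces counts a >= b >= c >= d with a+b+c+d = 32
LENGTHS = (3, 3, 2, 1)

sizes = []
for a in range(33):
    for b in range(a + 1):
        for c in range(b + 1):
            d = 32 - a - b - c
            if 0 <= d <= c:
                sizes.append(sum(n * l for n, l in zip((a, b, c, d), LENGTHS)))

print(min(sizes), max(sizes))  # 72 96
```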
> Stage 2 puts it in files, we then run file #1 again.
> File #1 had 32 bits when put in, now it has 36 and is split into 3
> subfiles.
File #1 had 32 bits, expanded to [40;48] bits, depending on its content.
Notice that, given the way file #1 is obtained, it cannot contain less
than 75% 1. Indeed, file #1 contains a 0 for each 00 that was present in
the original input. Since 00 was the least frequent pair of bits, there
cannot be more than 25% 00 in the input, hence no more than 25% 0 in
file #1.
If you work out the composition of file #1 in terms of pairs of bits, you
can quickly see that there will be at least 50% 11. This immediately
gives at most 1/6 10, 01 and 00 (each).
So, file #1 cannot have the 25%/25%/25%/25% composition that allowed the
smallest increase after a Huffman-expand, and cannot be expanded to only
36 bits.
> 1.1 has 16 bits, 1.2 has 12 bits, 1.3 has 8 bits
I obtained:
File 1.1: 16 (1/4)
File 1.2: [14;16] ([5/24;1/4])
File 1.3: [11;16] ([1/6;1/4])
> Now the ratios: File 1.1 93.75% to 6.25%.
I obtained a percentage of 1 between 5/6 (83.33%) and 1.
> File 1.2 80% to 20%
I obtained a percentage of 1 between 3/4 and 1.
> File 1.3 75% to 25%
I obtained a percentage of 1 between 1/2 and 1.
> File 1.1 comes out with a statistical 9.47 bits remaining.
> File 1.2 comes out with a statistical 9.36 bits remaining
> File 1.3 comes out with a statistical 6.75 bits remaining
> This is 25.58 bits.
> Compare this to the 28.44 bits that it would normally hold
I have stopped computing the sizes/proportions in each file and subfile
before the last Huffman (the Huffman-compress).
However, I do have ranges for the sizes of files, percentages of 1 in
files, and percentages of 11, 10, 01, 00 in files up to stage 10
included (after splitting into 1.1, ... and rearranging each file to get
11 as the most frequent pair, ...)
I can do the computation of the ranges of sizes after the Huffman-compress
of step 11 and write up all my results if you really want me to...
--
Hypocoristiquement,
Jym.