New compression theory announcement soon

This is a discussion on New compression theory announcement soon within the Theory forums in Theory and Concepts category; On Wednesday I am doing a release of a new compression method, for peer review purposes, though Patent is Pending. This compression method will define a new range for Entropy, is lossless, and will not result in two files becoming the other. I am pre-preparing the different venues with the basic's so that when I talk of the method then people can quickly follow the exact method. ~~~~~~~~~~ A break down of the different portions of the invention, known as Harrington's Compression Method, or HCM for the layman. ~~~~~~~~ The First Fundamental is the Huffman. A Huffman is a way ...

Go Back   Application Development Forum > Theory and Concepts > Theory

Object Mix

Register FAQ Calendar Search Today's Posts Mark Forums Read
  #1  
Old 08-25-2008, 08:29 PM
Einstein
Guest
 
Default New compression theory announcement soon

On Wednesday I am doing a release of a new compression method, for
peer review purposes, though Patent is Pending.


This compression method will define a new range for Entropy, is
lossless, and will not result in two files becoming the other.

I am pre-preparing the different venues with the basic's so that when
I talk of the method then people can quickly follow the exact method.




~~~~~~~~~~

A break down of the different portions of the invention, known as
Harrington's Compression Method, or HCM for the layman.


~~~~~~~~

The First Fundamental is the Huffman.

A Huffman is a way for a computer to change a data line, and be able
to go back and change it back with out any errors, loss, or the likes.
It does this through identifying the following bit in a sequence of
bits until it recognizes a pattern.

This can be a typical Huffman

11 = 0
10 = 10
01 = 110
00 = 111

A Huffman can be patterned in many ways, and many lengths. The above
example works on switching 1's for 0's, and if 11 results outnumber 01
and 00 results it will also result in compression. A Huffman is a very
adaptable format, with many advantages to compression.

In the example above if the Huffman sees a 0 as the first bit it is
currently examining, it know it is actually a result and can end
identifying this group there. If the next bit is a 1, then it knows to
seek the following bit out and identify it. If it is a 0, then the
final result is a 10, and it ends that group there. Each is
identifiable in turn. Reversal of a Huffman is both easy and exacting.
There is no errors so long as no corruption occurred in the code.


~~~~~~~~~~


The Second Fundamental is Pascals triangle.
Pascals triangle has applications in many many different fields. For
our purposes the Triangle is a perfect tool for identifying the odds
of certain outcomes in certain sized files.

A triangle is made rather easily. The first two results are 11, and
are applied to the top of the triangle.

1-1

The left most, and right most result are always 1, however for those
used to a table, we can say the left is always a 1.
Therefore our Triangle starts to look like this:


1-1
1-
1-
1-
1-

now the second row down, 2nd column and on, operate in this manner:
Add the bit directly above the desired result, and the bit to the left
of that one.

1-1
1-2-1
1-3-3-1
1-4-6-4-1
1-5-10-10-5-1

This will always result in a SUM count over a row equal to the total
possible outcomes of that many bits, as represented in column 2. This
is always true. The row can be looked at like this. Assuming 0 is the
left, and 1 is the right, the first number represents the number of
outcomes with all 0's, the next is the number of bits, -1, that are
0's with +1 of the 1's. The 2nd column is all zero's minus two, and
plus two ones. This goes on until there is no more 0's. Therefore at
100 bits the area with 30 total 0's and 70 total 1's will have around
29,372,339,821,610,900,000,000,000 (with some rounding occurring
(darned maximums of OpenOffice and Excel) outcomes. As we progress in
total number of bits we end up with a odds of occurrence of most
outcomes (to a degree which rounds to 100% just by the decimal
displacement generated) to lay in/close to, the 50% range, with a
deviation of less than 1% in either direction.

This is a basic law of large software, so long as it is entropic, or
random. If the software is text based, largely non-random, or is
highly imbalanced in a low bit ratio, then it will fall outside this
norm.


~~~~~~~


The Third Fundamental is Separate Filing

This is a new concept to some extent. In typical compression the usage
of multiple files, of multiple sizes, of multiple possible names,
results in a potential compression method, for extremely imbalanced in
ratio levels information. It is extremely ineffective, and unlikely to
compress. The problem ultimately is an issue of the command section,
or real size of a factual file, to hold the data separated.

The numbers feel impressive, for this can be an example:

Assumption, up to 3 files, 1 to 3 bits in each. Each file therefore
has 14 outcomes. This is 14 * 14 * 14 + 14 * 14 + 14 * 14 + 14 * 14 +
14 + 14 + 14, or a total of 3374. This is 11.7202 bits in ratio. Where
we had 1 to 3 bits, with an average of 2.57 bits per file, of 1 to 3
files, and a predictable number of around 2.5 files averaged. This is
7.19 bits averaged! It seems wonderful... until you need to count how
many bits per which file. The command section requires 1 bit per file
to say if it is active, 1 bit per an active file to identify if there
is 1 bit or 2 bits, and this adds up fast. So fast that it out does
any gains statistically, and renders itself moot.

However if a means existed to sort information to different files,
without an overhead cost to keep track of this information then a huge
increase would occur in the possible data being stored per bit. As
would the ability to have different numbers of bits in a file tracked
without cost. Either way would be an increase, together would be a
dramatic increase in the capacity for our purposes.


~~~~~~~~~

The Fourth Fundamental is perfect tracking

If Step A tells you to do 00 = 11 then step B tells you 11 = 10, and
your command section is AB, then if your code was 001110 you get
instead 100011 as your final results. Reversing it is as simple as
doing the reverse, BA.


~~~~~~~~~

The Fifth Fundamental is Lossy versus Lossless

A lossy code system can mistake two results for each other. An example
of a lossy encoder would be one which takes a word, or words, it knows
in context, and tries to repair them. An example is “If then” type
statements. If the If then is in a programming context then the
software may be able to just predict a lot of the system, but if it
happens to be in a text portion it may lose some of the text to
errors.

A lossless system never makes a mistake in the code. There is no means
to create a mistake as steps are clearly written to produce step B
after step A, and with perfect accountability. A Huffman is a lossless
algorithm due to the fact it can be flawlessly reversed.


~~~~~~~~~~~~

The Sixth Fundamental is Ratio's

Ratio's can play an important part in compression. It is possible to
have anywhere from 100% ones, to 0% ones, versus a 0% zero's to 100%
zero's. This can lead to wildly different variables when talking two
bit outcomes, even if both are 50%. An example. You might have 50/50
1's to 0's, but 20% - 11, 30% - 10, 30% - 01, 20% 11. This would be a
ratio imbalance.

It is possible to create an imbalance from a pure statistical evenness
via any number of means. For instance a Huffman with 11 = 111, 10 =
110, 01 = 10, 00 = 0 will result in a statistical imbalance of 66.7%
total 1's versus 33.3% total 0's



~~~~~~~~~~~~


I will not go into Arithmetic encoding, bijective encoding, and other
compression techniques because frankly I do not need to. These are
better than Huffman is, by a good measure, but they are not needed to
validate the method.
Reply With Quote
  #2  
Old 08-26-2008, 02:07 AM
Einstein
Guest
 
Default Re: New compression theory announcement soon

Well in 24 hours I am posting the post. Please study up if your
weak :P

Some people have exclusively seen it in advance, or have a chance to
do so.
Reply With Quote
Reply


Thread Tools
Display Modes


All times are GMT -5. The time now is 04:01 AM.


Powered by vBulletin® Version 3.7.2
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
vB Ad Management by =RedTyger=

In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.