| Register | FAQ | Calendar | Search | Today's Posts | Mark Forums Read |
|
#1
| |||
| |||
| On Wednesday I am doing a release of a new compression method, for peer review purposes, though Patent is Pending. This compression method will define a new range for Entropy, is lossless, and will not result in two files becoming the other. I am pre-preparing the different venues with the basic's so that when I talk of the method then people can quickly follow the exact method. ~~~~~~~~~~ A break down of the different portions of the invention, known as Harrington's Compression Method, or HCM for the layman. ~~~~~~~~ The First Fundamental is the Huffman. A Huffman is a way for a computer to change a data line, and be able to go back and change it back with out any errors, loss, or the likes. It does this through identifying the following bit in a sequence of bits until it recognizes a pattern. This can be a typical Huffman 11 = 0 10 = 10 01 = 110 00 = 111 A Huffman can be patterned in many ways, and many lengths. The above example works on switching 1's for 0's, and if 11 results outnumber 01 and 00 results it will also result in compression. A Huffman is a very adaptable format, with many advantages to compression. In the example above if the Huffman sees a 0 as the first bit it is currently examining, it know it is actually a result and can end identifying this group there. If the next bit is a 1, then it knows to seek the following bit out and identify it. If it is a 0, then the final result is a 10, and it ends that group there. Each is identifiable in turn. Reversal of a Huffman is both easy and exacting. There is no errors so long as no corruption occurred in the code. ~~~~~~~~~~ The Second Fundamental is Pascals triangle. Pascals triangle has applications in many many different fields. For our purposes the Triangle is a perfect tool for identifying the odds of certain outcomes in certain sized files. A triangle is made rather easily. The first two results are 11, and are applied to the top of the triangle. 1-1 The left most, and right most result are always 1, however for those used to a table, we can say the left is always a 1. Therefore our Triangle starts to look like this: 1-1 1- 1- 1- 1- now the second row down, 2nd column and on, operate in this manner: Add the bit directly above the desired result, and the bit to the left of that one. 1-1 1-2-1 1-3-3-1 1-4-6-4-1 1-5-10-10-5-1 This will always result in a SUM count over a row equal to the total possible outcomes of that many bits, as represented in column 2. This is always true. The row can be looked at like this. Assuming 0 is the left, and 1 is the right, the first number represents the number of outcomes with all 0's, the next is the number of bits, -1, that are 0's with +1 of the 1's. The 2nd column is all zero's minus two, and plus two ones. This goes on until there is no more 0's. Therefore at 100 bits the area with 30 total 0's and 70 total 1's will have around 29,372,339,821,610,900,000,000,000 (with some rounding occurring (darned maximums of OpenOffice and Excel) outcomes. As we progress in total number of bits we end up with a odds of occurrence of most outcomes (to a degree which rounds to 100% just by the decimal displacement generated) to lay in/close to, the 50% range, with a deviation of less than 1% in either direction. This is a basic law of large software, so long as it is entropic, or random. If the software is text based, largely non-random, or is highly imbalanced in a low bit ratio, then it will fall outside this norm. ~~~~~~~ The Third Fundamental is Separate Filing This is a new concept to some extent. In typical compression the usage of multiple files, of multiple sizes, of multiple possible names, results in a potential compression method, for extremely imbalanced in ratio levels information. It is extremely ineffective, and unlikely to compress. The problem ultimately is an issue of the command section, or real size of a factual file, to hold the data separated. The numbers feel impressive, for this can be an example: Assumption, up to 3 files, 1 to 3 bits in each. Each file therefore has 14 outcomes. This is 14 * 14 * 14 + 14 * 14 + 14 * 14 + 14 * 14 + 14 + 14 + 14, or a total of 3374. This is 11.7202 bits in ratio. Where we had 1 to 3 bits, with an average of 2.57 bits per file, of 1 to 3 files, and a predictable number of around 2.5 files averaged. This is 7.19 bits averaged! It seems wonderful... until you need to count how many bits per which file. The command section requires 1 bit per file to say if it is active, 1 bit per an active file to identify if there is 1 bit or 2 bits, and this adds up fast. So fast that it out does any gains statistically, and renders itself moot. However if a means existed to sort information to different files, without an overhead cost to keep track of this information then a huge increase would occur in the possible data being stored per bit. As would the ability to have different numbers of bits in a file tracked without cost. Either way would be an increase, together would be a dramatic increase in the capacity for our purposes. ~~~~~~~~~ The Fourth Fundamental is perfect tracking If Step A tells you to do 00 = 11 then step B tells you 11 = 10, and your command section is AB, then if your code was 001110 you get instead 100011 as your final results. Reversing it is as simple as doing the reverse, BA. ~~~~~~~~~ The Fifth Fundamental is Lossy versus Lossless A lossy code system can mistake two results for each other. An example of a lossy encoder would be one which takes a word, or words, it knows in context, and tries to repair them. An example is “If then” type statements. If the If then is in a programming context then the software may be able to just predict a lot of the system, but if it happens to be in a text portion it may lose some of the text to errors. A lossless system never makes a mistake in the code. There is no means to create a mistake as steps are clearly written to produce step B after step A, and with perfect accountability. A Huffman is a lossless algorithm due to the fact it can be flawlessly reversed. ~~~~~~~~~~~~ The Sixth Fundamental is Ratio's Ratio's can play an important part in compression. It is possible to have anywhere from 100% ones, to 0% ones, versus a 0% zero's to 100% zero's. This can lead to wildly different variables when talking two bit outcomes, even if both are 50%. An example. You might have 50/50 1's to 0's, but 20% - 11, 30% - 10, 30% - 01, 20% 11. This would be a ratio imbalance. It is possible to create an imbalance from a pure statistical evenness via any number of means. For instance a Huffman with 11 = 111, 10 = 110, 01 = 10, 00 = 0 will result in a statistical imbalance of 66.7% total 1's versus 33.3% total 0's ~~~~~~~~~~~~ I will not go into Arithmetic encoding, bijective encoding, and other compression techniques because frankly I do not need to. These are better than Huffman is, by a good measure, but they are not needed to validate the method. |
|
#2
| |||
| |||
| Well in 24 hours I am posting the post. Please study up if your weak :P Some people have exclusively seen it in advance, or have a chance to do so. |
![]() |
| Thread Tools | |
| Display Modes | |
In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.