I recently came across a beautiful mathematical proof about data compression

Ionica Smiths, October 21, 2022, 10:30

The name S10 is an elegantly compact version of “Stien”. If you want to write the lyrics of her song “De Diepte” (“The Depth”) more compactly, you can, for example, replace “Tadadadadadadada” with “Ta 78 x da”.
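The “Ta 78 x da” trick is a form of run-length encoding. A minimal sketch of the idea in Python follows, assuming the lyric has already been split into syllables; the function names `rle_encode` and `rle_decode` are made up for this illustration and do not come from the column.

```python
from itertools import groupby

def rle_encode(chunks):
    """Collapse consecutive equal chunks into (chunk, count) pairs."""
    return [(chunk, len(list(run))) for chunk, run in groupby(chunks)]

def rle_decode(pairs):
    """Expand (chunk, count) pairs back into the original list of chunks."""
    return [chunk for chunk, count in pairs for _ in range(count)]

# 'Ta' followed by 78 times 'da', following the column's example
lyric = ["Ta"] + ["da"] * 78
packed = rle_encode(lyric)
print(packed)  # [('Ta', 1), ('da', 78)]

# Nothing is lost: decoding gives back exactly what we started with
assert rle_decode(packed) == lyric
```

The 79 syllables shrink to two pairs, and decoding restores every one of them.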

With lossless data compression, you can restore the original file exactly from the compressed file. That is not a given: think of the blocky images you sometimes see, where the file size was reduced in such a way that you can no longer get back to the original image.

I recently came across a beautiful mathematical proof that no lossless data compression method can work for all files. If you have a method that shrinks at least one file, there will inevitably be a file that the same method makes larger.
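You can watch this happen with a real compressor. The sketch below uses zlib from Python's standard library, which is my choice for illustration and is not mentioned in the column. It assumes a 1000-byte repetitive input and a 1000-byte pseudo-random input built with a fixed seed.

```python
import random
import zlib

# 1000 bytes with an obvious pattern: 'Ta' followed by 499 times 'da'
repetitive = b"Ta" + b"da" * 499

# 1000 bytes without a pattern; the fixed seed makes the run repeatable
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(1000))

sizes = {}
for label, data in [("repetitive", repetitive), ("noisy", noisy)]:
    packed = zlib.compress(data)
    # Lossless: decompressing must give back the exact input
    assert zlib.decompress(packed) == data
    sizes[label] = (len(data), len(packed))
    print(label, len(data), "->", len(packed))
```

The repetitive input should come out much shorter, while the patternless input comes out slightly longer than it went in, because the compressed format adds a few bytes of overhead that it cannot win back.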

It’s a proof by contradiction: you start by assuming the opposite of what you want to prove, then reason from there through a number of perfectly logical steps until you run into a contradiction. Since all of your steps were logical and correct, the only possible conclusion is that your first assumption was wrong.

In this proof, we think of stored files as finite sequences of bits (each of which can be 0 or 1). We assume that there is a compression method that converts at least one file into an output that is shorter than the original file, and that this method does not make any of the other files longer than they were. (Making files longer is not a very useful form of compression.)

Now here’s a little notation: let M be the smallest number such that there is a file B of length M that gets converted to a shorter file. Let’s say N is the length of the compressed version of this file.

Since N is smaller than M, every file of length N keeps exactly the same length when compressed: it cannot get shorter, because M was the smallest length at which a file gets shorter, and it cannot get longer, because we assumed our method never makes files larger.

How many files of length N are there? Each of the N bits can be 0 or 1, which makes a total of 2^N possible different files.
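For small lengths, the count is easy to check by simply listing every file. A quick sketch, where the helper name `files_of_length` is invented for this example:

```python
from itertools import product

def files_of_length(n):
    """Return every bit string of exactly n bits, written as text such as '010'."""
    return ["".join(bits) for bits in product("01", repeat=n)]

for n in range(4):
    files = files_of_length(n)
    # Each extra bit doubles the number of possible files
    assert len(files) == 2 ** n
    print(n, len(files), files)
```

For length 2, this lists the four files 00, 01, 10 and 11.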

Every one of these files is converted by the compression method into some file of length N. On top of that, the longer file B is also compressed into a file of length N. That gives 2^N + 1 input files whose compressed versions all have length N, while there are only 2^N possible files of that length. We have one file too many, which means that two different input files lead to exactly the same compressed file. That makes it impossible to determine from this compressed file what the original was.
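The "one file too many" step can be tried out on a toy example. The sketch below assumes a made-up method, `toy_compress`, that shortens exactly one file (111 becomes 11, so M = 3 and N = 2) and leaves every other file alone. Both it and the helper `find_collision` are hypothetical names for this illustration.

```python
from itertools import product

def files_of_length(n):
    """Return every bit string of exactly n bits."""
    return ["".join(bits) for bits in product("01", repeat=n)]

def toy_compress(file):
    """Hypothetical method: shortens one file ('111' -> '11'), keeps the rest unchanged."""
    return "11" if file == "111" else file

def find_collision(compress, inputs):
    """Return two different inputs with the same compressed output, or None if there are none."""
    seen = {}
    for original in inputs:
        packed = compress(original)
        if packed in seen:
            return seen[packed], original
        seen[packed] = original
    return None

M, N = 3, 2
B = "111"                        # the shortest file that gets smaller
inputs = files_of_length(N) + [B]
assert len(inputs) == 2 ** N + 1  # five inputs, but only four outputs of length N

print(find_collision(toy_compress, inputs))  # ('11', '111')
```

Both 11 and 111 end up as 11, so from the compressed file 11 there is no way to tell which of the two you started with.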

This means that our original assumption cannot be correct: there is no lossless compression method that makes at least one file smaller without making any other file larger than it was. So any meaningful compression method makes at least one file larger than it was. As S10 sang: “Do you know the feeling that your dream has not come true?”

