# I recently came across a beautiful mathematical proof about data compression

Ionica Smiths

With lossless data compression, you can restore the original file exactly from the compressed file. That is not a given: think of the blocky, pixelated images you sometimes see, where the file size was reduced in a way that means you can no longer get back to the original image.
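"Lossless" can be checked directly: decompressing must return exactly the bytes you started with. Below is a minimal sketch that uses Python's standard-library `zlib` module as one concrete lossless method. The article itself does not name any particular method, and the sample text is an arbitrary choice.

```python
import zlib

# Lossless round trip: decompressing the compressed bytes must give back
# exactly the original bytes, bit for bit.
original = b"the quick brown fox jumps over the lazy dog " * 20
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))  # this repetitive text gets shorter
print(restored == original)            # nothing was lost
```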

I recently came across a beautiful mathematical proof that no lossless compression method can work for all files. If a method shrinks at least one file, there will inevitably be another file that the same method makes larger.
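You can see the claim in action with a real compressor. This sketch again uses Python's `zlib` as an example method, which is an illustration and not a proof. One highly repetitive input shrinks. The empty input grows, because zlib always adds a small header and checksum. The helper name `size_change` is made up for this example.

```python
import zlib

def size_change(data: bytes) -> int:
    """Compressed size minus original size (negative means the file shrank)."""
    return len(zlib.compress(data)) - len(data)

repetitive = b"a" * 1000   # very redundant input: zlib shrinks it a lot
empty = b""                # nothing to compress, but zlib still adds its header and checksum

print(size_change(repetitive))  # negative: this file got smaller
print(size_change(empty))       # positive: this file got larger
```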

It is a proof by contradiction: you start by assuming the opposite of what you want to prove, and from there you take a number of perfectly logical steps until you run into a contradiction. Since every step was logical and correct, the only possible conclusion is that your initial assumption was wrong.

In this proof, we think of stored files as finite sequences of bits (each of which is 0 or 1). We assume that there is a lossless compression method that converts at least one file into an output shorter than the original, and that never makes any file longer than it was. (Making files longer is not a very useful form of compression.)

Now some notation: let M be the smallest number such that there is a file B of length M that gets converted into a shorter one, and let N be the length of the compressed version of this file.
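To make M, B and N concrete, here is a small sketch that models files as strings of `'0'` and `'1'` and searches for them by brute force. The compressor `toy_compress` is purely hypothetical. It shortens exactly one file, `"0000"`, and leaves every other file unchanged, so it satisfies the assumption of never making a file longer. The helpers `all_files` and `find_M_B_N` are also illustrative names.

```python
from itertools import product

def all_files(length):
    """Every bit string of the given length, as a string of '0' and '1'."""
    return ["".join(bits) for bits in product("01", repeat=length)]

def toy_compress(file):
    # Hypothetical compressor for illustration only: it shortens "0000" to "00"
    # and returns every other file unchanged, so no file ever gets longer.
    return "00" if file == "0000" else file

def find_M_B_N(compress, max_len):
    """Return (M, B, N): the smallest length M with a file B that shrinks, and
    the length N of B's compressed version. None if nothing up to max_len shrinks."""
    for length in range(max_len + 1):
        for file in all_files(length):
            out = compress(file)
            if len(out) < length:
                return length, file, len(out)
    return None

print(find_M_B_N(toy_compress, 6))  # (4, '0000', 2)
```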


Since N is smaller than M, every file of length N is compressed to a file of exactly length N. It cannot get shorter, because M is the smallest length at which a file can shrink. It cannot get longer, because we assumed our method never makes files larger.

How many files of length N are there? Each of the N bits can be 0 or 1, so there are $2^N$ possible different files.
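The count $2^N$ can be checked for small N by simply listing every bit string. This is a quick sketch, and `count_files` is an illustrative name.

```python
from itertools import product

def count_files(n):
    """Count the distinct bit strings of length n by listing them all."""
    return sum(1 for _ in product("01", repeat=n))

for n in range(6):
    assert count_files(n) == 2 ** n   # each extra bit doubles the number of files

print([count_files(n) for n in range(6)])  # [1, 2, 4, 8, 16, 32]
```

Note that for n = 0 there is exactly one file: the empty one.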

The compression method converts every one of these files into a file of length N. In addition, the longer file B is also compressed into a file of length N. That gives $2^N + 1$ compressed files of length N, while there are only $2^N$ possible files of this length. We have one file too many, so two different input files must lead to exactly the same compressed file. From that compressed file, it is then impossible to determine which of the two originals it came from.
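This counting step is the pigeonhole principle, and a small sketch can find the collision it predicts. Again, `toy_compress` is a hypothetical compressor that shortens only `"0000"` (so here M = 4, B = `"0000"`, N = 2), and `find_collision` is an illustrative helper. It feeds in the $2^N$ files of length N plus B and reports two different inputs with the same output.

```python
from itertools import product

def toy_compress(file):
    # Hypothetical compressor for illustration only: shortens "0000" to "00",
    # leaves every other file unchanged (so it never makes a file longer).
    return "00" if file == "0000" else file

def find_collision(compress, inputs):
    """Return two different inputs that compress to the same output, or None."""
    seen = {}  # maps each output to the first input that produced it
    for file in inputs:
        out = compress(file)
        if out in seen and seen[out] != file:
            return seen[out], file
        seen[out] = file
    return None

N = 2
B = "0000"
# The 2**N files of length N plus the longer file B: 2**N + 1 inputs in total,
# all mapped to outputs of length N, of which only 2**N exist.
inputs = ["".join(bits) for bits in product("01", repeat=N)] + [B]
print(find_collision(toy_compress, inputs))  # ('00', '0000')
```

Both `"00"` and `"0000"` compress to `"00"`, so the decompressor cannot know which one to return.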

This means our original assumption cannot be correct: there is no lossless compression method that makes at least one file smaller while never making any file larger. So every useful compression method makes at least one file larger than it was. As S10 sang: “Do you know the feeling that your dream has not come true?”