### There is equivalence between expressing large numbers and compression

Friday, June 4th, 2004There is an interesting discussion about large numbers in the Wikipedia and in Robert Munafo’s article.

Munafo describes several notations for concise expression of very large numbers. The basic idea is to define special operators (symbols) to allow one to express a very large number using a small number of characters (digits, operators/punctuation marks). Since a short string of characters, which represents a very large number, can be represented in a small number of bits (say, 16 times the string’s length in characters) - what we are really doing is an attempt to implement an “hash function” from large numbers into their representations, while doing this in inverse.

In other words, a mathematical notation for concisely expressing a very large number corresponds to a data compression algorithm, like the algorithms used in gzip or bzip2. I am not saying “equivalent” here, because I cannot demonstrate a way to derive a compression algorithm from a mathematical notation definition or vice versa. The data compression algorithm takes, as its input, a string, which represents the very large number in some straightforward form. The algorithm’s output is another string, which expresses the same number by means of a suitable mathematical notation.

It is well known that every compression algorithm compresses a subset of the strings presented to it, but expands another subset of the potential input strings. The point of using a compression algorithm is that the strings, which interest us, get compressed, while only “garbage” strings get expanded.

Therefore, for any mathematical notation for representing very large numbers, there is a subset of numbers, whose representation in that notation is shorter than a straightforward form; and a subset of numbers, whose representation is longer.

### What if?

Let’s say that the straightforward form, which I mentioned above, is the base-2 representation of a number. The corresponding representation, using a mathematical notation, is a string of N characters. We can consider it to be a base-2 number which is 16N+1 bits long (the extra bit is needed to guarantee that the number starts with ‘1′).

Let’s try a new way to represent very large numbers. We take a number, say 42, and represent it as binary number. We interpret the binary number as a mathematical notation, and translate it into the represented number, typically a very large number. An equivalent way of saying this is that we uncompress the string, which is the binary representation of 42.

Then, we take the very large number and read it as a mathematical notation, which denotes another very large number. In other words, we uncompress again the binary string. We can iterate this process several times.

The following questions, for which I don’t have answers, follow from the above discussion:

- Are there sequences of infinite length? Or do such iterations always repeat the same strings after a sufficiently long cycle?
- What is the largest number, which we can reach from 42 using this approach? What mathematical notation (uncompression algorithm) would allow us to reach the largest number from 42 by means of a finite (but very large) number of iterations?
- Can a similar methodology be applied also to transfinite numbers?