Analysis of the Collatz Conjecture

The Collatz Conjecture was proposed by Lothar Collatz in year 1937. The Collatz Conjecture is a very easy to define, and implement for a specific case, but so far no one has been able to prove or disprove it in general.

Algorithm

Defining the conjecture is simple. Take any positive integer. If it is odd, multiply it by 3 and add 1. If it is even, divide it by 2. Keep repeating this until you reach 1. The conjecture says that no matter which number you start with, if you follow the steps, you will always eventually reach 1.

History

In year 1985, J.C.Lagarias proved that the conjecture was applicable to all numbers less that 275,000.

Now many mathematicians think that Math is not ready for such a problem. Though the implementation for a specific number is easy, it is not easy to prove for numbers in general. Such a problem just needs one exception to be false. Yet those who try to solve it cannot stop because it is so engrossing. You just have to find the answer. That is why many people call it the “dangerous problem”.

The reason the Collatz Conjecture is so hard to prove or disprove is that when you are working out the series for a number and you have done millions of steps and it showing no signs of reaching 1, that does not mean in a trillion steps it will not reach one. The Collatz conjecture does not cap the number of steps required for the series of a number to reach one.

But one way to make it easier is, if are working on the Collatz series of a number and at some point in the series you reach a number you have checked before, then you don’t need to go on. Let’s call the number you are implementing the Collatz algorithm on: number A. After some steps you arrive at number B. If you have already proven that number B satisfies the Collatz conjecture, then it follows that number A also satisfies the Collatz conjecture, since after this step the algorithm will follow the same path as it did for number B. For example, suppose you are checking if the Collatz conjecture holds for number 24. Since 24 is even you have to divide it by 2. Half of 24 is 12. But if you have already proven the conjecture holds for 12, you know what the path for 12 is. So then you know that the series for 24 reaches one because in implementing the series for 24 you encounter 12 and you know, the series for 12 reaches 1.

Mathematician Terence Tao has proved the Collatz Conjecture can be applied to almost all numbers. He used partial differential equations as tool in his proof. He split all numbers in three different sets. The first set was was multiples of 3. The second set was numbers, which when divided by 3 had 1 as the remainder. The third set was numbers, which when divided by 3 had 2 as the remainder. He found it easy to prove the conjecture for multiples of 3 but the other sets posed tougher challenges. To find out more here is the link to Tao’s paper.

My Analysis

I started by trying to check the Collatz Conjecture for numbers at random by hand when I first heard of it. I tried to check it for number 27. This one was not easy. After about 20 steps I gave up. The numbers were getting huge and it seemed to go on forever. My mom suggested that I should write a program to do it for me. So I did.

import math

print("We want to examine the Collatz conjecture, and see how the series converges for various numbers")
print("This program generates the list for numbers up to infinity")

def collatz():
    number = input("Enter a positive integer less than 1000:  ")
    number = int(number)
    steps = 0
    highest_number = 0
    odd_steps = 0
    print(f"The number being calculated is {number}")
    f = open(f"collatz{number}.txt" , "w")
    while number >1:
        steps = steps + 1
        if number > highest_number:
            highest_number = number
        else:
            highest_number = highest_number
        is_even = number % 2 == 0
        if is_even:
            number = number/2
            f.write(f"{steps}   {number}")
            f.write("\n")
        else:
            number = 3 * number + 1
            odd_steps = odd_steps + 1
            f.write(f"{steps}  {number}")
            f.write("\n")
    f.close()
    print("This is the Collatz series for the number you have entered")
    print(f"the number of odd steps taken to reach one is: {odd_steps}")
    print(f" the number of steps taken to reach one is: {steps}")
    print(f" the highest number reached is: {highest_number}")

tryit = "Y"
while tryit == "Y":
    collatz()
    tryit = input("Type Y to get the Collatz series of your choice or press any other key to exit:  ")

I tested it for number 12 because that was easy to do by hand. I started running the program for various numbers. Here is the plot for number 37. The Y axis shows the output number, given by the collatz series, for each step, shown by the X axis.

Collatz series plot for number 37

I ran it again and this time for number 55. Here is the plot.

Collatz series plot for number 55

After a little while I decided to do it for 27 because I could not do it by hand. Here’s the plot.

Collatz series for number 27

Then I realized that number 27 had 111 steps. The next time, I ran it for numbers 25, 26, 28 and 29 to see how they converged. In particular, I noted, the highest number reached and the number of steps. Here’s the plot.

Collatz series for numbers 25, 26, 28, 29

As you can see in the plot, even the collatz series for 25, which takes the most steps, only takes 23 steps to reach one. So the highest number reached in the collatz series for 27 is unusually high. The collatz series for 27 also takes unusually long to converge.

Collatz series for numbers 28 and 29

After examining the plots more closely, I realized that the collatz series for 28 and 29 meet at a point, and then the line corresponding to the collatz series for 28 disappears from the plot completely. It puzzled me for a while. But I finally figured out the reason.

This is because once the output of two series coincidentally, reach the same number on the same step, henceforth they both have to follow the path of collatz series of the number they reached. That is why, in the case of 28 and 29’s collatz series by the seventh step, 28 gets hidden under 29, because, their paths overlap.

I continued playing around with more numbers and, to my surprise, found that 12345 took only 50 steps to converge but the numbers around it took approximately 100 steps. But for 12343 it took 262 steps and the highest number is 975,000 which is much higher than the rest.

Colatz series for 12345

Collatz series for numbers 12344, 1236 and 12347
Collatz series for 12343

In the plot above you can see that 12344 and 12346 follow the converge on the same path, after meeting at number below 10,000.

Convergence of the Collatz Series

According to Wikipedia “If one considers only the odd numbers in the sequence generated by the Collatz process, then each odd number is on average ¾ of the previous one.” Here is the link to the Wikipedia page.

To test this theory, I re-wrote the program, to only output the odd numbers of the Collatz series.

import math

print("We want to examine the Collatz conjecture, and see how the series converges for various numbers")
print("This program generates the list for numbers up to infinity")

def collatz():
    number = input("Enter a positive integer less than 1000:  ")
    number = int(number)
    odd_steps = 0
    print(f"The number being calculated is {number}")
    f = open(f"collatz_odd{number}.txt", "w")
    while number >1:
        is_even = number % 2 == 0
        if is_even:
            number = number/2
        else:
            odd_steps = odd_steps + 1
            f.write(f"{odd_steps}   {number}")
            f.write("\n")
            number = 3 * number + 1

    print("This is the Collatz series for the number you have entered")
    print(f"the number of odd steps taken to reach one is: {odd_steps}")
    f.close()

tryit = "Y"
while tryit == "Y":
    collatz()
    tryit = input("Type Y to get the Collatz series of your choice or press any other key to exit:  ")

For comparison, I wrote one more program to generate the geometric series with ratio ¾ for a given start value.

print(" Today we are going to talk about about the 3/4 problem")

def loop():
    number = input("Give me a number:  ")
    number = float(number)
    steps = 1
    f = open(f"CCcompare{number}.txt", "w")
    while number >1:
        f.write(f"{steps}   {number}")
        steps = steps + 1
        number = (number * 3.0) / 4.0
        f.write("\n")
    f.close()
tryit = "Y"
while tryit == "Y":
    loop()
    tryit = input("Type Y to learn the 3/4 series of another number.")

When we multiply something by ¾ we can get floating point values, unlike in the Collatz conjecture in which you only generate positive integers. For example, suppose you choose 16 as the start value. The program will generate ¾ of number 16 which is 12 and then it’ll multiply 12 by ¾ to yield 9 and then it’ll multiply 9 by ¾ and get 6.75 (which is a floating point number) and so on.

Here are some plots I made that compare the collatz series of a number, to the geometric series with ratio ¾ for that number. The picture below shows 2 plots. The green plot shows the geometric series with ratio ¾ for each step and the purple plot shows the Collatz series with only the odd numbers. The images given below show the plots numbers 25, 29 and 37.

Odd numbers from the Collatz series and the geometric series for ratio 3/4 for number 25
Odd numbers from the Collatz series and the geometric series for ratio 3/4 for number 29
Odd numbers from the Collatz series and the geometric series for ratio 3/4 for number 29

The above plots show that the odd numbers generated in the Collatz series more or less, follow the asymptotic trend of a geometric progression with ration 3/4.

Then I repeated the process for number 55 and got a completely different result. It really perplexed me. The Collatz series was going to much higher numbers and taking much longer to coverage than geometric series with ratio ¾.

Odd numbers from the Collatz series and the geometric series for ratio 3/4 for number 37

I realized that the Wikipedia page said that they converged at the same speed on average. So yes there were going to be some exceptions and 55 was on of them. Another is 47.

Odd numbers from the Collatz series and the geometric series for ratio 3/4 for number 55

Even though the Collatz conjecture has not been proved or disproved yet, I enjoyed learning about it because, analyzing the problem, implementing it for different numbers and seeing how the series proceeds was fun.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s