*I was very happy to speak at Curry On once again. Last year, following our work with Quasar, I spoke on continuations vs. monads. This year I gave a talk about computational complexity results in software verification, a subject I became interested in while formally specifying and verifying our upcoming revamped version of Galaxy.*

*What follows is the first half of an extended version of the talk (embedded above), containing some additional proofs and results I didn’t have time to cover, complete with footnotes, for those wishing to study the topics further.*

In this article, we try to understand the relationship between programs and correctness, and in particular, why writing correct programs must be hard. We will review results from computability theory and complexity theory, and see that programs and complete understanding – which is required for correctness – are fundamentally at odds. The main contribution of this text is aggregating in one place some results (as well as some references) pertaining to the *essential* difficulty of producing correct software. My motivation was various statements I found online (such as this, although it is far from the only one) claiming that software correctness is *easy* to achieve with the right tools. While some tools may (and do) certainly assist ensuring programs are correct, correctness comes at a significant cost. The cost of correctness – like the energy cost of reducing entropy – is a result of the “natural laws” of computation, that cannot possibly be avoided.

## Chapter I: Foundations

The story of theoretical CS began about a hundred years ago. With the growing success of mathematical logic, the mathematician David Hilbert laid out a program to formalize all of mathematics. He wanted to prove the formalism consistent and complete, and to find a decision procedure, an algorithm for determining the truth of every formal statement:

**Consistency**: proof that no contradiction can be obtained in the formalism.**Completeness**: all true mathematical statements can be proved in the formalism.**Decidability**: an algorithm deciding whether any statement is a theorem (**Entscheidungsproblem**)

Hilbert believed that anything that can be formally reasoned can also be fully understood, and by that I mean all of its properties proven. the 19th century view on the limits of knowledge was expressed by the maxim “ignoramus et ignorabimus”, meaning “we do not know and we will not know”. For his retirement in 1930, Hilbert gave a lecture – “Naturerkennen und Logik” (Logic and the understanding of nature) – in Königsberg to the Society of German Scientists and Physicians which he challenged this view. He said:

For the mathematician there is no Ignorabimus, and, in my opinion, not at all for natural science either. … The true reason why [no one] has succeeded in finding an unsolvable problem is, in my opinion, that there *is no* unsolvable problem. In contrast to the foolish Ignorabimus, our credo avers: **We must know, we shall know! ^{1}**

Those words – *Wir müssen wissen. Wir werden wissen* – are inscribed on his tombstone.

Unbeknownst to Hilbert at the time, his program had been demolished a day earlier, when Kurt Gödel tentatively announced his first incompleteness result^{2}.

Gödel’s incompleteness theorems demolished the first two pillars of Hilbert’s program (some say also the third, only he didn’t realize it^{3}), and in 1936, Church and Turing broke the third, that of the decision problem, with proofs of what is today known as the halting theorem.

The halting theorem – CS’s first theorem – was proven in both Alonzo Church’s^{4} and Alan Turing’s^{5} papers in 1936. The theorem is so fundamental to the essence of computation that it is no accident that it appeared alongside both formulations of computation. It states that there cannot exist an algorithm that decides whether any given program ever terminates. The proof (which is given here in a shortened form) is based on the idea of diagonalization, or self reference, the same idea Gödel had used in his incompleteness proof.

Assume such a program H exists:

```
H(M, x) := IF M halts on x THEN TRUE
ELSE FALSE
```

we can then easily construct G:

```
G(y) := IF H(y, y) = FALSE THEN TRUE
ELSE loop forever
```

If we then pass G to itself (G(G)) we get a contradiction between what G does and what H says that G does.

It is interesting to note that the halting theorem immediately yields a version of Gödel’s first incompleteness theorem. The proof says simply^{6}: suppose that for every true statement there was a logical proof in some logical theory (basically a set of axioms). So, given a program, starting from those axioms, we can enumerate all theorems one by one by using logical deduction. Because the number of all statements, let alone all theorems, is countable, and because every statement either has a proof or its negation does, we would eventually find the proof of one or the other. Therefore, given a program, we can start enumerating all theorems until we find a theorem showing that the program terminates, or one that says it doesn’t, thus violating the halting theorem.

However, the halting theorem has some other immediate corollaries that tie it more directly to the problem of software verification, which is our main focus. The first is the simple result about reachability.

There does not exist a program R, when given a program M, a set of initial states I and state configuration S, determines whether or not M ever reaches S when it starts in one of the states in I.

The proof is a simple reduction from halting: if there were an such an algorithm, we could ask it if, starting with the input x it ever reaches the halting state, and so solve the halting problem.

A more interesting corollary is Rice’s Theorem. Programs can compute what’s known as partial functions. If they stop when given an input, we consider the output written on the tape to be mapped to the initial input; if they never stop for a given input, we consider that input unmapped. Rice’s theorem states this:

For any non-trivial property of partial functions, no algorithm can decide whether or not a program (TM) M computes a partial function with that property.

For example, there is no algorithm to determine whether the output of a program is always the square of its input. This theorem is also proven by a reduction from Halting.

For our purposes, I think that the most interesting corollary of the halting theorem is what we’ll call “the Busy Beaver theorem”^{7}, because it captures best the notion of the complexity of analyzing programs, giving it some quantitative measure. Because this result applies to both dynamic and static views of programs (namely, their behavior as they run or their global properties), it helps demonstrate that they are two sides of the same coin.

In the dynamic version, we define the following function on the natural numbers:

F(n) = the maximum number of steps a halting TM with n control states may take before halting

A Turing machine’s control states constitute its program, so n is the size of the program. This function is well defined because there is a finite number of TMs of size n. The subset of them that terminates is also finite, so there exists a maximal number of states.

The theorem says F is non-computable. Suppose that F were computable. If so, that would be a violation of the halting theorem. Given a TM, we count the number of control states, n, compute F(n) (which is possible by our assumption of F’s computability), and then run the machine for F(n) steps. If it doesn’t terminate by then — then it must surely never terminate (by the definition of F), and we have a decidable procedure of the halting problem, in contradiction with the halting theorem.

Not only is F a non-computable function, but any function that is known to be greater than f for all n is also non computable, because if we knew of such a function, the same procedure would again yield a decision for halting.

The very same argument applies to a static view of programs. To understand this result, we need to know that given a logical theory with a finite set of axioms, an algorithm can apply deduction rules one by one starting with the axioms, and start enumerating all theorems (provable statements) one by one.

This time, we’ll define the function:

G(n) = maximum size of a minimal proof of termination of a halting TM of size n

For every terminating TM of size n, we pick the shortest proof of termination (there must be at least one proof, as a terminating trace is such a proof). We then take the maximum among all terminating TMs of size n (of which there’s a finite number).

Similarly to F, G must also be non-computable. If G(n) were computable, then given a TM of size n, we’d describe the machine with some axioms and then we would enumerate all theorems with proofs of size \leq G(n) — that’s a finite number — and if no proof is found, we’d know the machine to never terminate; again, in contradiction with the halting theorem.

This corollary of the halting theorem captures the following fact: the function tying the size of a program to the difficulty of proving that it terminates, regardless of whether we’re interested in a dynamic proof technique or a static one, is greater than all computable functions.

While we cannot compute F for all n, we can compute it, or lower bounds for it, for some values. For n = 6 (and a 2-symbol alphabet), it is > 7.4 \times 10^{36534}.

Another question we may ask is this: we know that we can’t come up with an algorithm to decide whether any arbitrary program halts, but are there *specific* programs that we cannot know whether they halt or not (even if we tailor a decision procedure specifically for them)?

Gödel’s second incompleteness theorem yields such a program. The theorem states it is impossible for a logical system to prove its own completeness. The logical system that is widely accepted as the foundation of mathematics is called ZFC — Zermello-Fraenkel set theory with the axiom of choice; within that system we can prove most mathematical theorems we care about (with some very notable, very special exceptions). So we can construct a program that does the following: starting from the ZFC axioms, enumerate all theorems and stop when you reach a contradiction (e.g. 1 = 2). The program would stop if-and-only-if ZFC is inconsistent. We *assume* ZFC to be consistent, and so would expect the program to never halt, but we can’t *prove* it, by the second incompleteness theorem.

This past May, Adam Yedidiya and Scott Aaronson published a program that compiles down to a Turing machine that works by that principle (or something equivalent to it)^{8}. The result was quickly improved upon by Stefan O’Rear, who created a Turing machine with 1919 control states which cannot be proven to terminate (i.e., it is “independent of ZFC”). This machine is so small that it comfortably fits in a 4K challenge; it is comparable to a Python program with 100-200 lines.

This mean that there are *small* programs — just a couple hundred lines or less — whose behavior cannot be mathematically determined. Not because the analysis would be too large or intractable, but because mathematics is not powerful enough to ever prove the behavior of this program, even in theory.

### Computer Science is Not Math

In any event, with the work of Gödel, Church and Turing, Hilbert’s program — at least as originally planned — was laid to rest, 80 years ago by simple, compact, mathematical objects that can be formally reasoned about, but not fully understood: computer programs.

There is one foundational question to be asked. The entire premise of the halting theorem rests on the idea that a function that requires infinite computational steps is uncomputable. This isn’t entirely obvious. After all, math often handles infinite objects^{9}. While Church glossed over this point (perhaps believing it to be obvious — he called it a “vague intuitive notion”) Turing, whose paper was much more philosophical than Church’s, made a point of explaining that an algorithm is something that is carried out by a person or by a machine.

In 1947, he gave a lecture to the London Mathematical Society. “Some years ago”, he said:

I was researching on what might now be described as an investigation of the theoretical possibilities and limitations of digital computing machines.It was essential in these theoretical arguments that the memory should be infinite… Machines such as the ACE may be regarded as practical versions of this same type of machine^{10}.

This is how Turing viewed his 1936 breakthrough achievement. While the mathematical formalization may appeal to the infinite, it was clear to Turing that the notion of computation is tied to some physical process. If that were not so, there would be no justification for the centrality of the halting problem and its use as a negative solution to Hilbert’s Entscheidungsproblem. An algorithm is something that is carried out in the physical world.

The Church-Turing thesis, as it is presented today, conjectures that *any* computation done by any realizable physical process can be computed by the universal computational models described by Church and Turing. In a sense, this means that everything in the universe could be formalized and simulated as a computer program.

Programs are fundamentally and essentially at odds with full understanding, proof and correctness.

As mathematics — as an academic discipline, not as a tool — is concerned with what’s inside the small circle, and computer science is largely concerned with what’s outside it — and, as we’ll see, even those programs that do lie within that inner circle, many are just provable in principle, but not feasibly provable — I think it is clear that computer science isn’t math.

### Turing’s Philosophical Contribution

Paul Cockshott, a computer scientist and a political economist at the University of Glasgow, writes that Turing *changed* math rather than ventured beyond it, but I think that the essence is the same:

Turing’s 1936 paper on the computable real numbers marks the epistemological break between idealism and a materialism in mathematics. Prior to Turing it was hard to get away from the idea that through mathematical reason, the human mind gained access to a higher domain of Platonic truths. Turing’s first proposal for a universal computing machine is based on an implicit rejection of this view. His machine is intended to model what a human mathematician does when calculating or reasoning, and by showing what limits this machine encounters, he identifies constraints which bind mathematical reasoning in general (whether done by humans or machines). … Turing starts a philosophical tradition of grounding mathematics on the material and hence ultimately on what can be allowed by the laws of physics. The truth of mathematics become truths like those of any other science — statements about sets of possible configurations of matter.

I think Turing also achieved the opposite: by constructing a universal machine that, at least by our current understanding, can simulate any physical process, and at the same time be governed by the halting problem, he created an impenetrable barrier of reduction. A computer’s behavior cannot possibly be understood by reduction to the behavior of the physical process by which it is implemented. In fact, a computer can (albeit only approximately, due to size concerns) be more general than the universe that contains it. I find this to be, in a way, a decisive victory to the philosophical tradition of dualism. Mind can be implemented in matter, but it can in no way be reduced to it.

*Alan Turing*

Continue reading* chapters 2, 3, and the conclusion on **The Parallel Universe blog*.

Selected Bibliography

The entries are ordered by their order of mention in the article/talk. The full list of references appears in the footnotes below.

- A. M. Turing,
*On Computable Numbers, with an application to the Entschidungsproblem*, 1936 - A. M. Turing, A lecture to the London Mathematical Society, February 20, 1947
- J. Hartmanis and R.E. Stearns, On The Computational Complexity of Algorithms, 1965
- Juris Hartmanis, Turing Award Lecture:
*On Computational Complexity and the Nature of Computer Science*, 1994 - S. Demri, F. Laroussinie and Ph. Schnoebelen,
*A Parametric Analysis of the State-Explosion Problem in Model Checking*, 2005 - Philippe Schnoebelen,
*The Complexity of Temporal Logic Model Checking*, 2002 - Edmund M. Clarke,
*Model checking: my 27-year quest to overcome the state explosion problem*, 2007 - Edmund M. Clarke,
*Progress on the State Explosion Problem in Model Checking*, 2001 - David A. Schmidt,
*Binary Relations for Abstraction and Refinement*, 1999 - Patrick Cousot,
*Abstract Interpretation Based Formal Methods and Future Challenges*, 2001 - Patrick Cousot,
*Types as Abstract Interpretations*, 1997 - Jean-Yves Girard (Translated and with appendices by Paul Taylor, Yves Lafont),
*Proofs and Types*, 1989 - Juris Hartmanis,
*Some Observations About the Nature of Computer Science*, 1993 - Frederick P. Brooks, Jr.,
*No Silver Bullet*and*No Silver Bullet, Refired*in*The Mythical Man-Month, Essays on Software Engineering, Anniversary Edition*, Addison-Wesley 1995

* * *

- John W. Dawson, Jr.,
*Logical Dilemmas, The Life and Work of Kurt Gödel*, A K Peters,1997 p. 71 ↩ - Dawson, p.69 ↩
- Saul Kripke,
*The Origins and Nature of Computation*, Lecture at Tel Aviv University, June 13, 2006, 1:45:50-1:47:10 ↩ - Alonzo Church,
*An Unsolvable Problem of Elementary Number Theory*, 1936 ↩ - A. M. Turing,
*On Computable Numbers, with an application to the Entschidungsproblem*, 1936 ↩ - Scott Aaronson,
*Rosser’s Theorem via Turing machines*, July 2011 ↩ - Tibor Rado,
*On Non-Computable Functions*, 1961 ↩ - Adam Yedidia and Scott Aaronson
*A Relatively Small Turing Machine Whose Behavior Is Independent of Set Theory*, 2016 ↩ - Although the subject of infinity in mathematics has always been controversial, with some logicians arguing against it, I would say that the reason for the opposition is precisely
*because*of the same physical limitations that we associate with computation. ↩ - A. M. Turing, A lecture to the London Mathematical Society, February 20, 1947 ↩

## {{ parent.title || parent.header.title}}

## {{ parent.tldr }}

## {{ parent.linkDescription }}

{{ parent.urlSource.name }}