The Proof

This page describes, in high-level terms, precisely what we prove, what we assume, and what the proof implies. It is aimed at readers with a technical background, but does not assume any expertise in formal verification.

What we prove

Formal proofs can be tricky. They prove exactly what you have stated, not necessarily what you mean or what you want.

Our proof statement in high-level natural language is the following:

The C code of the seL4 microkernel correctly implements the behaviour described in its abstract specification and nothing more.

This doesn't sound like much, but it is a very strong formal statement, called functional correctness. A cynic might say: proofs like this only show that every fault in the specification has been precisely implemented in C. This is true; the statement above does not exclude that problem. But at some point you have to say what you want (the "abstract specification"), and that is what you get.

The idea is that it is much easier and quicker to say in the specification what you want, because the language used to express it is more powerful (formal, higher-order logic in this case) and because you can leave out detail you do not care about: the specification only needs to say what the software does, not how it is done.
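
To make the "what, not how" distinction concrete, here is a small, purely hypothetical sketch in C. It is not seL4 code, and the real abstract specification is written in higher-order logic rather than C, but it shows the flavour: the first function states what the answer is in the simplest possible way, the second computes the same answer the way a kernel might, and functional correctness means the two always agree (in the real proof, a refinement relation connects the two views of the state).

    /* Hypothetical sketch, not seL4 code.
     * "What": the highest priority that currently has a runnable thread,
     * stated as directly as possible. */
    static int highest_prio_spec(const unsigned char runnable[256])
    {
        for (int p = 255; p >= 0; p--) {
            if (runnable[p]) {
                return p;
            }
        }
        return -1;                                     /* nothing is runnable */
    }

    /* "How": the bitmap-based implementation a kernel might actually use.
     * Correctness means it always returns the same answer as the simple
     * statement above, and does nothing else. */
    static int highest_prio_impl(const unsigned int bitmap[8])
    {
        for (int w = 7; w >= 0; w--) {
            if (bitmap[w] != 0) {
                int bit = 31 - __builtin_clz(bitmap[w]);   /* GCC/Clang builtin */
                return w * 32 + bit;
            }
        }
        return -1;
    }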

There is another reason implementation proofs like this give you a strong return in terms of software trustworthiness: they imply the absence of large classes of common programming errors. More on this below.

And finally, for the experts, proofs like this are valuable, because they enable you to prove further things about the software much more quickly and easily: for a very large class of properties, it is now enough to prove that the property holds on the specification. These additional proofs are a further way to make sure the specification states what you mean, not just what you say. Our correctness proof then guarantees that the same property also holds for the C implementation of the kernel without any further work to be done.
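
To see the shape of that argument, in notation of our own choosing (the actual proof is phrased as a data refinement in Isabelle/HOL): write B(A) for the set of possible behaviours of the abstract specification and B(C) for those of the C implementation. The correctness proof establishes

    \mathcal{B}(C) \subseteq \mathcal{B}(A)

and, by transitivity of the subset relation, any property P that can be stated as a set of allowed behaviours then transfers for free:

    \mathcal{B}(A) \subseteq P \;\Longrightarrow\; \mathcal{B}(C) \subseteq P

This is the "very large class of properties" mentioned above; properties that cannot be phrased this way are not automatically preserved.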

This is just the beginning. We plan to make good use of this last point in the future and we also plan to make use of it for proving the correctness of applications running on top of the seL4 microkernel. Without a verified OS kernel, verified applications stand on shaky ground.

What we assume

With a proof in formal logic, it is important to understand what its basic assumptions are, because this is where faults can still occur. Our proof about the seL4 microkernel goes down to the level of the C programming language.

  • Compiler/linker: we assume that the compiler has translated this specific program (seL4) correctly into ARM11 assembly code (according to our formalisation of C and the C standard) and that the linker lays out this code correctly in memory.
  • Assembly: the seL4 kernel, like all operating system kernels, contains some assembly code, about 600 lines in our case. For seL4, this concerns mainly entry to and exit from the kernel, as well as direct hardware accesses. For the proof, we assume this code is correct.
  • Hardware: we assume the hardware works correctly.
  • Hardware management: the proof makes only minimal assumptions about the underlying hardware. It abstracts from cache consistency, cache colouring and TLB (translation lookaside buffer) management. The proof assumes these functions are implemented correctly in the assembly layer mentioned above and that the hardware works as advertised. In particular, it assumes that these three hardware management functions do not have any effect on the behaviour of the C program. This is only true if they are used correctly.
  • Boot code: the proof currently covers the operation of the kernel after it has been loaded correctly into memory and brought into a consistent, minimal initial state. This leaves out about 1,200 lines of the code base that a kernel programmer would usually consider part of the kernel.
  • Virtual memory: virtual memory is the hardware mechanism that the kernel uses to protect itself from user programs and user programs from each other. Virtual memory makes life complicated for the proof, because it can change how memory is laid out and how the kernel accesses it. While our proof does talk about the concept of virtual memory and does verify the virtual memory subsystem that the kernel offers to user programs, the execution model of the kernel itself and its own access to memory are not reasoned about from first principles to the same standard as the rest of our proof. Instead, we assume a certain standard behaviour of memory, and we prove the conditions necessary to make these assumptions true, or at the very least we informally make sure they are implied by what we have proved so far. The catch is that you have to trust us that we identified all the necessary conditions and got them right: our machine-checked proof does not force us to be complete at this point. In short, in this part of the proof, unlike the other parts, there is potential for human error. If you only accept end-to-end machine-checked proof and don't trust humans, you might therefore count virtual memory for kernel execution among the assumptions.

What do these assumptions mean?

The proof assumptions mean that there may be faults remaining in the kernel that could be classified as implementation faults below the level of C. They will not be faults that are properly visible on the level of the C programming language (which is our main claim of correctness), but they could still be serious faults that make the seL4 kernel misbehave.

We have made these assumptions to fit into the carefully limited scope and the limited resources of a major research project. These specific assumptions are not a limitation of the general formal verification approach. In theory, it is possible to eliminate all of them: there are successful verification projects showing the correctness of even optimising C compilers; there are at least two prominent research groups that have demonstrated successful formal verification of assembly code and low-level hardware management functions; we have ourselves proved an earlier version of the boot code correct down to the level of a precise, executable specification; and we have a separate formalisation of ARM11 virtual memory from first principles. There are still significant scientific challenges in unifying all of these into a single framework, but it is clear at this point that it can be done.

With all the purity and strength of mathematical proof, it is easy to get carried away. There is a fundamental theoretical limit to formal verification: there will always be some bottom level of assumptions about the physical world left over, and these assumptions have to be validated by other means. Mathematical proof is proof as long as it talks about formal concepts; it is where assumptions connect to reality that things may still fail. Albert Einstein is quoted as saying "As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality." For instance, if the hardware is faulty, or if a cosmic ray randomly changes memory, the correctness predictions of our proof do not apply. No traditional test or verification method helps against cosmic rays either, but it is important to keep in mind that even with mathematical proof, there are no absolute guarantees about the physical world.

There are two other assumptions that we do not include in the list above, because they are of a different kind:

  • We assume the axioms of higher-order logic are logically consistent.
  • We assume our prover checks this particular proof correctly.

The first is a fundamental question of formal logic. If this is not true, mathematics in general has a much bigger problem than one verified OS kernel. The second is more interesting, but equally unlikely to be false. A typical conversation about the topic goes like this:

Q: So you say this OS kernel is correct. How do you know?
A: We proved it in formal logic.
Q: How do you know your proof is correct?
A: We use a computer program, a so-called theorem prover, to construct and check the proof.
Q: How do you know the theorem prover is correct? Have you verified it?
A: We have not formally verified the theorem prover (you could, but note the potential for an infinite conversation if we said yes here). It is a so-called LCF-style system, which confines everything correctness-critical to a very small part of the prover. We have strong confidence in this small core.
Q: Is there a way to check the proof independently?
A: There is. You can generate a gigantic text file that contains no proof search, nothing complicated, only the axioms and proof steps of the most basic, fundamental kind. You can run a small, independent proof-checking program on that. We haven't generated that huge file, but we could.

At the end of the conversation, we are three steps removed from the original problem. Software bugs in the last step may have an impact on the first, but they need to go through all the other steps first and pass under the eyes of the human verification engineer who spends most of her working time looking at precisely this proof. There is no absolute guarantee that the proof is correct, but when you come right down to it, humans are good at creating proofs and computers are very good at checking them. It is an easy problem for computers. If you are worried about the proof, be worried about the assumptions in the first part. They are much more likely to cause problems.
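
To give a flavour of the LCF idea from the conversation above, here is a purely illustrative sketch in C (Isabelle's actual kernel is written in ML, and all names here are made up). The point is architectural: the only way to obtain a value of type thm is through a handful of trusted inference functions, so the rest of the prover (tactics, proof search, user extensions) can at worst fail to prove something; it cannot manufacture a theorem that does not follow from the rules.

    /* Illustrative only: a toy LCF-style proof kernel for statements of the
     * form "a = b" over integers.  In ML the thm type would be abstract, so
     * code outside this trusted core could not build one by hand. */
    #include <assert.h>
    #include <stdio.h>

    typedef struct { int lhs, rhs; } thm;     /* asserts "lhs = rhs" */

    /* Trusted rule 1: reflexivity, |- n = n */
    static thm refl(int n) { thm t = { n, n }; return t; }

    /* Trusted rule 2: transitivity, from |- a = b and |- b = c derive |- a = c */
    static thm trans(thm ab, thm bc)
    {
        assert(ab.rhs == bc.lhs);             /* reject ill-formed derivations */
        thm t = { ab.lhs, bc.rhs };
        return t;
    }

    int main(void)
    {
        /* "Untrusted" code can only chain the trusted rules. */
        thm t = trans(refl(42), refl(42));
        printf("|- %d = %d\n", t.lhs, t.rhs);
        return 0;
    }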

What the proof implies

We mentioned above that our formal proof of functional correctness implies the absence of whole classes of common programming errors. Provided our assumptions above are true, some of these common errors are listed below, with a short illustration after the list:

  • Buffer overflows: buffer overflows are a classic security attack against operating systems, trying to make the software crash or even to inject malicious code into the system. We have proved that no such attack can be successful on seL4.
  • Null pointer dereferences: null pointer dereferences are another common issue in the C programming language. In applications they tend to lead to strange error messages and lost data. In operating systems they will usually crash the whole system. They do not occur in seL4.
  • Pointer errors in general: in C it is possible to accidentally use a pointer to the wrong type of data. This is a common programming error. It does not happen in the seL4 kernel.
  • Memory leaks: memory leaks occur when memory is requested, but never given back. The other direction is even worse: memory could be given back, even though it is still in use. Neither of these can happen in seL4.
  • Arithmetic overflows and exceptions: humans and mathematics usually have a concept of numbers that can be arbitrarily big. Machines do not; they need to fit numbers into memory, usually into 32 or 64 bits of storage. Machines also generate exceptions when you attempt something undefined, like dividing by zero. In the OS, such exceptions would typically crash the machine. This does not occur in seL4.
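
To make these concrete, here is a short, deliberately buggy C fragment. It is hypothetical and not taken from seL4; it simply commits several of the faults listed above, and the functional correctness proof guarantees that no execution of seL4 reaches a state like any of these.

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical, deliberately buggy code, not from seL4. */
    void what_cannot_happen(const char *input, unsigned int count)
    {
        char buf[16];
        strcpy(buf, input);              /* buffer overflow if input is longer than 15 chars */

        int *p = NULL;
        *p = 1;                          /* null pointer dereference */

        long *q = (long *)buf;           /* pointer used at the wrong type */
        *q = 2;

        char *mem = malloc(count * 4);   /* arithmetic overflow: count * 4 can wrap around */
        free(mem);
        mem[0] = 'x';                    /* memory given back while still in use */

        void *leak = malloc(64);         /* memory leak: requested, never given back */
        (void)leak;
    }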

We could go on, but it would become a bit boring... There are other techniques that can also be used to find some of these errors. Here, the absence of such bugs is just a useful by-product of the proof. To be able to complete our proof of functional correctness, we also prove a large number of so-called invariants: properties that we know to always be true when the kernel runs. To normal people these will not be exciting, but to experts and kernel programmers they give an immense amount of useful information. They give you the reasons why and how data structures work, why it is OK to optimise and leave out certain checks (because you know they will always be true anyway), and why the code always executes in a defined and safe manner.
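
As a hypothetical illustration of such an invariant (again, not literal seL4 code, and the names are made up): suppose we have proved that every thread's priority field is always smaller than the number of ready queues whenever the kernel runs. The code may then index the queue array without a run-time bounds check, and the invariant proof is the reason this is safe.

    #define NUM_PRIORITIES 256

    struct tcb { unsigned int priority; /* ... other fields ... */ };
    struct ready_queue { struct tcb *head; struct tcb *tail; };

    static struct ready_queue ready_queues[NUM_PRIORITIES];

    /* Invariant (proved once and for all, not checked at run time):
     *   for every thread t in the system, t->priority < NUM_PRIORITIES.
     * Because of it, this lookup needs no bounds check. */
    static struct ready_queue *ready_queue_of(struct tcb *t)
    {
        return &ready_queues[t->priority];
    }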

As mentioned above, this is just the beginning. Formal verification has a lot more to offer. Check back for the next research project.