Formal proofs can be tricky. They prove exactly what you have stated, not necessarily what you mean or what you want.
Our proof statement in high-level natural language is the following:
The C code of the seL4 microkernel correctly implements the behaviour described in its abstract specification and nothing more.
This doesn't sound like much, but it is a very strong formal statement, called functional correctness. A cynic might say: proofs like this only show that every fault in the specification has been precisely implemented in C. This is true, the above statement does not exclude this problem. But: at some point you have to say what you want (the "abstract specification") and that is what you get.
The idea is that it is much easier and quicker to say in the specification what you want, because the language used to express it is more powerful (formal, higher-order logic in this case) and because you can leave out detail you do not care about: the specification only needs to say what the software does, not how it is done.
There is another reason implementation proofs like this give you a strong return in terms of software trustworthiness: they imply the absence of large classes of common programming errors. More on this below.
And finally, for the experts, proofs like this are valuable, because they enable you to prove further things about the software much more quickly and easily: for a very large class of properties, it is now enough to prove that the property holds on the specification. These additional proofs are a further way to make sure the specification states what you mean, not just what you say. Our correctness proof then guarantees that the same property also holds for the C implementation of the kernel without any further work to be done.
This is just the beginning. We plan to make good use of this last point in the future and we also plan to make use of it for proving the correctness of applications running on top of the seL4 microkernel. Without a verified OS kernel, verified applications stand on shaky ground.
With a proof in formal logic, it is important to understand what its basic assumptions are, because this is where fault can still occur. Our proof about the seL4 microkernel goes down to the level of the C programming language.
What do these assumptions mean?
The proof assumptions mean that there may be faults remaining in the kernel that could be classified as implementation faults below the level of C. They will not be faults that are properly visible on the level of the C programming language (which is our main claim of correctness), but they could still be serious faults that make the seL4 kernel misbehave.
We have made these assumptions to fit into the carefully limited scope and the limited resources of a major research project. These specific assumptions are not a limitation of the general formal verification approach. In theory, it is possible to eliminate all of them: there are successful verification projects showing the correctness of even optimising C compilers; there are at least two prominent research groups that have demonstrated successful formal verification of assembly code and low-level hardware management functions; we have ourselves proved an earlier version of the boot code correct down to the level of a precise, executable specification; and we have a separate formalisation of ARM11 virtual memory from first principles. There are still significant scientific challenges in unifying all of these into a single framework, but it is clear at this point that it can be done.
With all the purity and strength of mathematical proof it is easy to get carried away. There is a fundamental theoretical limit of formal verification: there will always be some bottom level of assumptions about the physical world left and these assumptions have to be validated by other means. Mathematical proof is proof as long as it talks about formal concepts. It is when assumptions connect to reality where things may still fail. Albert Einstein is quoted as saying "As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality." For instance, if the hardware is faulty, or if a cosmic ray randomly changes memory, the correctness predictions of our proof do not apply. Neither do any traditional tests or verification methods help against cosmic rays, but it is important to keep in mind that even with mathematical proof, there are no absolute guarantees about the physical world.
There are two other assumptions that we do not include in the list above, because they are of a different kind:
The first is a fundamental question of formal logic. If this is not true, mathematics in general has a much bigger problem than one verified OS kernel. The second is more interesting, but equally unlikely to be false. A typical conversation about the topic goes like this:
Q: So you say this OS kernel is correct. How do you know?
A: We proved it in formal logic.
Q: How do you know your proof is correct?
A: We use a computer program, a so-called theorem prover to construct and check the proof.
Q: How do you know the theorem prover is correct? Have you verified it?
A: We have not formally verified the theorem prover (you could, but note the potential for an infinite conversation if we said yes here). It is a so-called LCF-system that constrains any correctness-critical problems to a very small part of the prover. We have strong confidence in this small core.
Q: Is there a way to check the proof independently?
A: There is. You can generate a gigantic text file that contains no proof search, nothing complicated, only the axioms and proof steps of the most basic, fundamental kind. You can run a small, independent proof-checking program on that. We haven't generated that huge file, but we could.
At the end of the conversation, we are three steps removed from the original problem. Software bugs in the last step may have an impact on the first, but they need to go through all the other steps first and pass by the eyes of the human verification engineer who spends most of her working time looking at precisely this proof. There is no absolute guarantee that the proof is correct, but when you come right down to it, humans are good at creating proofs, computers are very good at checking them. It is an easy problem for computers. If you are worried about the proof, be worried about the assumptions in the first part. They are much more likely to cause problems.
We mentioned above that our formal proof of functional correctness implies the absence of whole classes of common programming errors. Provided our assumptions above are true, some of these common errors are:
We could go on, but it would become a bit boring... There are other techniques that can also be used to find some of these errors. Here, the absence of such bugs is just a useful by-product of the proof. To be able to complete our proof of functional correctness, we also prove a large number of so-called invariants: properties that we know to always be true when the kernel runs. To normal people these will not be exciting, but to experts and kernel programmers they give an immense amount of useful information. They give you the reasons why and how data structures work, why it is OK to optimise and leave out certain checks (because you know they will be always be true anyway), and why the code always executes in a defined and safe manner.
As mentioned above, this is just the beginning. Formal verification has a lot more to offer. Check back for the next research project.