Malicious tampering of trusted libraries

Question

I gather that some proof assistants trust multi-precision arithmetic packages such as GMP. I'm wondering if anyone has, as a proof of concept, formally proved (for example) the Riemann hypothesis using a well-known proof assistant, by making a small malicious change to the GMP library? The idea would be to trigger the exploit only with a specific large-integer computation, so that the malicious library would behave normally under almost all circumstances.

If this hasn't been done already, I think it would be worth doing for illustrative purposes.

@AndrejBauer The idea here is that the proof assistant per se is completely bug-free; nevertheless, a bug in a trusted library causes it to prove False or some unproven statement. The point is to illustrate what can go wrong with trusting external libraries. — Timothy Chow
– Timothy Chow, Commented Sep 14 at 13:07
Does it have to be GMP specifically, or would any of the other attack vectors also be ok? There are, like, 50 libraries that are linked into a typical proof assistant. — Andrej Bauer
– Andrej Bauer, Commented Sep 14 at 13:08
@AndrejBauer It doesn't have to be GMP, but it should be something in the trusted code base other than the kernel of the proof assistant. The more "trusted" the library the better, which is why I picked GMP. — Timothy Chow
– Timothy Chow, Commented Sep 14 at 13:09
@TimothyChow Here is one github.com/clarus/falso that exploits implementation bug. I remember there was a repo of false proofs in Rocq on GitHub but I cannot find it. — keep_learning
– keep_learning, Commented Sep 14 at 13:29
Also, if this concerns you, you could look into CakeML, Metamath, and Metamath Zero, which are less susceptible to these issues, either because the full system is verified (in the case of CakeML and the plans for Metamath Zero) or because the kernel is much simpler and easier to implement (in the case of Metamath). — Jason Rute
– Jason Rute, Commented Sep 14 at 15:52

Jason Rute · Accepted Answer · 2025-09-15 00:10:31Z

Lean 4 uses GMP in a few ways, which can cause (and indeed already has caused) bugs.

GMP-based `native_decide` bugs

Lean 4 has a mechanism where mathematically pure functions can be overridden with external implementations calling external C code. These are used when executing the code. (Lean is a programming language.) Some common examples are operations on the integers and natural numbers. These use external C++ implementations using GMP, which are much faster than going through the standard unary definitions of natural numbers. If these external implementations are functionally the same as the mathematical definition, then there are no bugs. But if there is a mistake in the implementation, that could lead to a bug in two ways:

The compiler and code interpreter will execute this operation incorrectly when the code is executed. This is essentially a compiler bug. It's not a logical soundness bug, but a code soundness bug. It would lead to wrong results and code crashing.

However, there is a special tactic with its own axioms, native_decide. It is like rfl, but it bypasses the kernel and just runs code to evaluate an expression. If a function override is implemented incorrectly (i.e., it differs from the mathematical definition), then it allows you to reduce an expression in two different ways and prove False. Such bugs have been common in Lean 4, and a few complicated ones involving non-deterministic IO functions only got fixed this month. It certainly makes native_decide not 100% trustworthy, and most Lean projects avoid it unless necessary. (Again, native_decide has its own axioms, so if you only use the three standard axioms, you are avoiding this issue entirely.)

GMP-based kernel bugs

Lean 4's kernel, written in C++, also uses GMP-based C++ code to reduce terms of type Nat. This is a bit controversial, but it has a large speed advantage. If there is a bug in the implementation of any such GMP-externally-implemented function used in the kernel, it could lead to a kernel soundness bug. (Mario mentions that this happened a few times. Here is an old instance from before Lean 4 was stable.) This is similar to native_decide, but differs in a good and a bad way.

Good: In native_decide, the user can add new external implementations, so it is impossible to prevent native_decide soundness issues. But here, there is only a select few kernel primitives, and by now those have been scrutinized many times. A user hopefully can no longer create a proof of False using these.
Bad: There are no special axioms alerting the user to the kernel using MCP for nat reduction. If there were a bug, it would be a pure kernel soundness issue.

Bugs in GMP

I'm not aware of anyone finding and exploiting, nor creating and exploiting, a bug directly in GMP. The existing bugs have been in Lean C++ code, which just implements, say, Nat.mod incorrectly (or at least not the same as in Lean). But if there was a bug in the GMP bignum code used in the kernel to reduce Nat, it could be exploited just like the previous kernel bugs. (And it wouldn't be that hard to do, just compute the same expression in two different ways.)

Do we need GMP?

Lean does not need GMP to check proofs. It is just a convenience. One could build a kernel that avoids GMP and just reduces Nat as normal, using its definition. One could even optimize this kernel by using special theorems to speed up term reduction, either by proving a more complicated kernel correct or by replacing some reductions with explicit proof terms (not unlike the norm_num tactic, but built into the kernel). (I think I recall hearing that the Coq kernel was heavily optimized for the four color theorem, but I don't know the details.)

Also, Lean exports proof terms that can be checked by another program. So in some sense, with proof export, there is still the chance that any such GMP exploits would be found and caught if one runs the proof through a checker that doesn't use GMP. I don't know if such a checker currently exists or not, and how much slower it would be.

I think even Lean's checker could in theory be run without GMP. Code exists for it to use basic C++ code to implement nat operations. I'm not fully familiar with this, but here Mario mentions running lean4checker (an external checker which is basically the same as the Lean kernel) in an environment without GMP. If possible, that would be another way to ensure there are no GMP bugs. (The code still uses a C++ implementation relying on custom code and machine types like int64. It wouldn't use the unary definition of Nat.)

Is this a real worry?

As proof assistants become more important for verifying the correctness of security-critical applications, it is not inconceivable that a bad actor could induce a flaw in some upstream dependency and then exploit it in the Lean kernel. I would personally worry more about naturally occurring bugs first, but it is possible.

Also, AI systems trained via reinforcement learning are known to find exploits. (DeepSeek-Prover v2 exploited a subtle Lean bug recently. It was a bug in Lean's front end that removed the theorem from the environment, but not a bug in the kernel.)

Safer proof assistants

If this sort of thing worries you, there are provers with a much stronger guarantee of correctness.

CakeML is a verified system running HOL-Light. I don't think it depends on external libraries like GMP.
Metamath is a much simpler theorem prover, where you could program a checker yourself in a few days. There are many external checkers, and they don't need fancy external libraries. Also, everything is ASCII, and there is no overridden notation, so the meaning of each statement is non-ambiguous. It is also a meta logic, so it can be used to check other logics like ZFC or HOL. (It can even check Lean, but one would need to store each kernel reduction step explicitly. I don't think that would be fast.) Also, because there is no automation, the users use theorems to shorten and speed up proofs directly. Proof checking is really fast in Metamath.
Metamath Zero is a variation of Metamath, which is faster and differently designed. It has also been used to translate between proof assistant languages. One goal of the project is to verify (in Peano Arithmetic) the kernel's computer code (which is a very small amount of pure C code). This would check that the kernel is implementing the logical rules as given correctly.
MetaRocq and Lean4Lean are both projects using Rocq (resp. Lean) to verify the Rocq (resp. Lean) kernel. In Lean4Lean, the plan is to verify a version of the Lean kernel written in pure Lean. However, I don't know if any part of this plan involves avoiding externally implemented GMP functions in that kernel. (Currently, the code uses externally-implemented functions like Nat.add to reduce Nat just like in the C++ kernel. However, by proving the kernel correct, it would make it easier to refactor it without breaking it.)

As usual, a very good and comprehensive answer. Let me just add that it is worth keeping in mind the wider context here. Given the obliquity of GMP, if it has a bug then Lean 4 will be the least of our worries. — Andrej Bauer
– Andrej Bauer, Commented Sep 15 at 6:34
@AndrejBauer Certainly if there is an unintentional bug in GMP then it will have wide repercussions. However, a malicious version of GMP that is narrowly deployed against a particular target is another matter. — Timothy Chow
– Timothy Chow, Commented Sep 15 at 13:32

Stack Exchange Network

Malicious tampering of trusted libraries

1 Answer 1

GMP-based `native_decide` bugs

GMP-based kernel bugs

Bugs in GMP

Do we need GMP?

Is this a real worry?

Safer proof assistants

Your Answer

Linked

Hot Network Questions

Malicious tampering of trusted libraries

1 Answer 1

GMP-based native_decide bugs

GMP-based kernel bugs

Bugs in GMP

Do we need GMP?

Is this a real worry?

Safer proof assistants

Your Answer

Sign up or log in

Post as a guest

Linked

Hot Network Questions

GMP-based `native_decide` bugs