Surfer's blog

Let's Learn Lean : Proof By Induction

After learning the Basics, it's time to learn our first tactic for proving various theorems.

In this article, we will be learning about a powerful reasoning principle : induction. We will also have a separate section for learning how Binary Numerals can be written in Lean.

Proof By Induction

The Principle of Mathematical Induction is a very common topic in the high school mathematics textbooks. The good thing is that we can use the same principle for writing proofs in Lean as well.

The principle of induction over natural numbers :

If P(n) is some proposition involving a natural number n and we want to show that P holds for all numbers n, we can reason like this :

This works in a very similar way in Lean as well.
If n : ℕ is an object, and the goal mentions n, then induction n with d hd attempts to prove the goal by induction on n, with the inductive variable in the successor case being d, and the inductive hypothesis being hd.

Example:

If the goal is

0 + n = n

then

induction n with d hd

will turn into two goals. The first is 0+0=0; the second has an assumption hd : 0 + d = d and goal 0 + succ d = succ d.

We prove the first goal before moving onto second.

Let's try to prove a theorem using the same tactic. In the last article we were able to prove n+0=n but had difficulty in proving 0+n=n as n is an unknown and the recursion happens on n, but with the help of induction, this can be done pretty easily.

theorem add_0_n (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ n hd =>
    rw [add_succ, hd]
```![image](https://bear-images.sfo2.cdn.digitaloceanspaces.com/surfer/image.webp)


You'll notice that we are using `add_succ` theorem which is nothing but 
```Lean
theorem add_succ (n m : Nat) : n + succ m = succ (n + m) :=
  rfl

These are some pretty common theorems which we can use directly from the Nat library. ( In the Induction.lean file, you can see us using open Nat for making these usage of theorems work.)

Let's try to prove commutativity of addtion next i.e. n + m = m + n. If we try proving it directly, we'll find ourself stuck on the step succ n + m, as addition works on second variable and the succ n is in the first place. So let's prove this part first.

We prove this by using induction on m.

theorem succ_add_lemma (n m : Nat) : (succ n) + m = succ (n + m) := by
  induction m with
  | zero => rfl
  | succ m hd =>
    rw [add_succ, add_succ, hd]

With this in place, we prove commutativity.

theorem add_comm (n m : Nat) : n + m = m + n := by
  induction m with
  | zero =>
    rw [plus_n_0, add_0_n]
  | succ m hd =>
    rw [add_succ, succ_add_lemma, hd]

Finally, we prove associativity : n + (m + p) = (n + m) + p. Since addition on natural numbers in Lean recurses on the second argument, it is often easiest to induct on the right-most variable, which in this case is p.

  induction p with
  | zero => rfl
  | succ p hd =>
    rw [add_succ, add_succ, add_succ, hd]

As you can see in the proof for add_comm, wemake use of the previous proofs that we already wrote, namely plus_n_0,add_0_n and succ_add_lemma. We call this Proofs within Proofs, which is a good way to write proofs when they become huge.

Binary Numerals

We have been using the standard Unary representation (like Nat), where a number is just a sequence of successors. We can generalize this to a binary representation where a number is a sequence of bits (0s and 1s).

We define a new inductive type Bin with three constructors :

inductive Bin : Type
  | Z : Bin
  | B0 (n : Bin) : Bin
  | B1 (n : Bin) : Bin

open Bin

Now we define an incr function that increments a binary number by 1, so

def incr (m : Bin) : Bin :=
  match m with
  | Z => B1 Z
  | B0 n => B1 n
  | B1 n => B0 (incr n)

The next step would be have a binary to natural number function which would be :

def bin_to_nat (m : Bin) : Nat :=
  match m with
  | Z => 0
  | B0 n => 2 * (bin_to_nat n)
  | B1 n => 2 * (bin_to_nat n) + 1

Checking our definitions :

example : (incr (B1 Z)) = B0 (B1 Z) := rfl
example : (incr (B0 (B1 Z))) = B1 (B1 Z) := rfl
example : (incr (B1 (B1 Z))) = B0 (B0 (B1 Z)) := rfl


example : bin_to_nat (B0 (B1 Z)) = 2 := rfl
example : bin_to_nat (incr (B1 Z)) = 1 + bin_to_nat (B1 Z) := rfl
example : bin_to_nat (incr (incr (B1 Z))) = 2 + bin_to_nat (B1 Z) := rfl

But the examples don't prove the correctness. We need to prove that our increment preserves the values relative to the natural numbers. That is incrementing a binary number and then converting it to a (unary) natural number yields the same result as first converting it to a natural number and then incrementing.

theorem bin_to_nat_pres_incr (m : Bin) : bin_to_nat (incr m) = 1 + bin_to_nat m := by
  induction m with
  | Z => rfl
  | B0 n hd =>
    simp [bin_to_nat, incr]
    rw [Nat.add_comm]
  | B1 n hd =>
    simp [bin_to_nat, incr]
    rw [hd]
    rw [Nat.mul_add]
    rw [ Nat.mul_one, Nat.add_comm, Nat.add_left_comm]

Note: In the B0 case, we encounter 2n + 1 on the left and 1 + 2n on the right. These are not definitionally equal so we must use Nat.add_comm. In the B1 case, we use simp to handle the algebraic distribution 2(n+1).

So, similar to how we defined binary to natural, we describe a system to define natural to binary, i.e.

def nat_to_bin (n : Nat) : Bin :=
  match n with
  | 0 => Z
  | succ n' => incr (nat_to_bin n')

and we prove that the whole loop for natural to binary to natural is correct.

theorem nat_bin_nat (n : Nat) : bin_to_nat (nat_to_bin n) = n := by
  induction n with
  | zero => rfl
  | succ n hd =>
    dsimp [nat_to_bin]
    rw [bin_to_nat_pres_incr]
    rw [hd]
    rw [Nat.add_comm]

Perfect, things look good right now.

But what about the loop for binary to natural to binary, will it work the same way ?

Actually, it would fail.

The reason for this actually very simple to understand. In our Bin type, numbers can have multiple representations - specifically, multiple representations of zero.

The constructor Z represents zero.
But B0 Z also represents zero. And B0 (B0 Z) also represents zero.

You can see the issue happening here. The function bin_to_nat collapses Z, B0 Z, B0 (B0 Z) and multiple other representations all down to single natural number 0.

When we go back the other way using nat_to_bin, the function has to pick a single canonical representation for 0. Our definition maps 0 to Z.

So, the round trip looks like this:

  1. Start: B0 Z (a valid binary object).

  2. Convert to Nat: bin_to_nat (B0 Z) returns 0.

  3. Convert to Bin: nat_to_bin 0 returns Z.

Since Z is not structurally equal to B0 Z, the theorem bin_nat_bin fails.

To solve that problem, we can introduce a normalization function that selects the simplest bin out of all the equivalent bin. Then we can prove that the conversion from bin to nat and back again produces that normalized, simplest bin.

The code for the whole process can be found in the Induction.v file.

I did not try to port the code for this section to Lean, please do try it as an exercise if you feel like it.

Concluding and Moving Ahead ...

We conclude here with the knowledge of induction proofs and binary numerals. You can find all the source code in this repo.Do give a try to the exercises in the file as well.

In the next article, we focus on learning Lists and gear ourselves for working with structured data. See you there !

Main Reference : Software Foundations