Let's Learn Lean : Lists
Welcome back to the fourth installment of our series porting Software Foundations to Lean 4.
In the previous three articles, we were succesfully able to establish the Basics of functional programming and learned how to perform Proof by Induction.
In this article, we are going to look at structured data. Specifically, we will cover:
- Pairs: Packaging two values together.
- Lists: Finite sequences of data.
- Bags: Multisets implemented via lists.
- Reasoning: Proving theorems about these structures.
- Options: Handling errors gracefully.
- Partial Maps: Building a key-value store.
So let's begin.
1. Pairs of Numbers
In standard functional programming, a "pair" is the simplest composite data structure. It holds exactly two values. We will build NatProd (a pair of natural numbers) from scratch.
Defining the Type
We define NatProd as an inductive type with a single constructor pair that takes two Nat arguments.
namespace NatList
inductive NatProd where
| pair (n1 n2 : Nat)
-- Check the type of a pair
#check (NatProd.pair 3 5) -- : NatProd
Extraction Functions
To get data out of a pair, we use pattern matching. We define fst (first) and snd (second) functions:
def fst (p : NatProd) : Nat :=
match p with
| NatProd.pair x y => x
def snd (p : NatProd) : Nat :=
match p with
| NatProd.pair x y => y
#eval (fst (NatProd.pair 3 5)) -- Result: 3
Notation
In Lean, we want our custom types to feel as natural to use as built-in ones. Since NatProd is just a pair of numbers, it would feel better to write (3, 5) instead of NatProd.pair 3 5.
However, (3, 5) is already syntax for Lean's built-in product type, Nat × Nat.
To bridge this gap, we use a Coercion.
A coercion tells Lean: "If you see a value of type A where a value of type B is expected, run this function to convert it automatically."
@[coe]
def toNatProd (t : Nat × Nat) : NatProd := .pair t.1 t.2
instance : Coe (Nat × Nat) NatProd where coe := toNatProd
#eval (fst (3, 5))
Proving Properties of Pairs
Let's try to prove a simple fact: if you take a pair p, extract its first and second elements, and rebuild the pair, you get p back. This is called surjective pairing.
theorem surjective_pairing : ∀ (p : NatProd), p = (fst p, snd p) := by
intro p
cases p with
| pair n m => rfl
Note on Tactics:
cases p with: This tactic breakspdown into its constituent parts. SinceNatProdhas only one constructor (pair), we only have one case to handle.| pair n m: This names the two numbers inside the pairnandm.rfl: Reflexivity solves the goal becausefst (pair n m)evaluates tonandsnd (pair n m)tom.
We can define a function that swaps the elements of a pair:
def swap_pair (p : NatProd) : NatProd :=
match p with
| NatProd.pair x y => NatProd.pair y x
You can prove as an exercise that (snd p, fst p) is the same as swapping p.
theorem snd_fst_is_swap : ∀ (p : NatProd), (snd p, fst p) = swap_pair p := by
/- FILL IN HERE -/ sorry
2. Lists of Numbers
Now we move on to the main topic of this article: Lists. A list is slightly more complex than a pair because it is recursive. A list is either:
- Empty (
nil). - A number constructed onto another list (
cons).
inductive NatList where
| nil
| cons (n : Nat) (l : NatList)
Notation for Lists
While infixr and notation are great for simple symbols like :: or [], they aren't powerful enough to handle sequences of arbitrary length like [1, 2, 3]. To achieve this familiar syntax for our custom NatList, we need to use Lean's macro system.
A macro is essentially a rule that tells Lean: "When you see syntax pattern X, rewrite it into code Y before you even try to check types."
Here is the complete implementation that allows us to write [a, b, c] for NatList:
infixr:60 " :: " => NatList.cons
notation "[]" => NatList.nil
-- NOTE: We'll use ',' as the separator instead of ';' in the list`
macro_rules
| `([$hd:term , $tl:term,*]) => `(NatList.cons $(Lean.quote hd) ([$tl,*]))
| `([$hd:term]) => `(NatList.cons $(Lean.quote hd) NatList.nil)
| `([]) => `(NatList.nil)
Common Functions
We can now define standard functional programming tools.
Repeat: Because repeat is a reserved keyword in Lean (used for tactics), we rename this function to repeatN. It creates a list of length count containing only n.
def repeatN (n count : Nat) : NatList :=
match count with
| zero => nil
| .succ count' => n :: (repeatN n count')
Length: Calculates the size of the list.
def length (l : NatList) : Nat :=
match l with
| [] => 0
| _ :: t => 1 + length t
Append (++): Joins two lists together.
def app (l1 l2 : NatList) : NatList :=
match l1 with
| [] => l2
| h :: t => h :: (app t l2)
infixr:60 " ++ " => app
Head (hd) and Tail (tl)
Note that hd requires a default value because the list might be empty!
def hd (default : Nat) (l : NatList) : Nat :=
match l with
| [] => default
| h :: _ => h
def tl (l : NatList) : NatList :=
match l with
| [] => []
| _ :: t => t
Exercises: List Functions
The following functions can be found in the code file to be solved as an exercise:
nonzeros: Takes a list and returns a list with all0s removed.oddmembers: Returns a list containing only the odd numbers.countoddmembers: Returns the number of odd elements.
def nonzeros (l : NatList) : NatList :=
/- REPLACE THIS LINE WITH YOUR DEFINITION -/ sorry
def oddmembers (l : NatList) : NatList :=
/- REPLACE THIS LINE WITH YOUR DEFINITION -/ sorry
def countoddmembers (l : NatList) : Nat :=
/- REPLACE THIS LINE WITH YOUR DEFINITION -/ sorry
3. Bags via Lists
A Bag (or multiset) is a collection where order doesn't matter, but repetition does. We can model a Bag simply as a NatList.
def Bag := NatList
These are some of the operations you can perform on Bags.
- count : counts how many times a value
vappears in the bags. - sum : similar to
app, it unions two bags. - add : inserts an element into the bag.
Something like :
def count (v : Nat) (s : Bag) : Nat :=
/- REPLACE THIS LINE WITH YOUR DEFINITION -/ sorry
def sum: Bag → Bag → Bag :=
/- REPLACE THIS LINE WITH YOUR DEFINITION -/ sorry
def add (v : Nat) (s : Bag) : Bag :=
/- REPLACE THIS LINE WITH YOUR DEFINITION -/ sorry
I have not added the implementations for these functions in this article as they are meant to be exercises even in the original Software Foundations text.
However, you can find the solution implementations in the equivalent .v files of Lists in the linked repo, so nothing to worry about.
More functions on Bags to practice :
- remove_one : removes the first occurrence of a number
- remove_all : removes every occurrence
def remove_one (v : Nat) (s : Bag) : Bag :=
/- REPLACE THIS LINE WITH YOUR DEFINITION -/ sorry
def remove_all (v : Nat) (s : Bag) : Bag :=
/- REPLACE THIS LINE WITH YOUR DEFINITION -/ sorry
4. Reasoning About Lists
When we proved properties about numbers, we used induction on Nat. Similarly for proving properties about lists, we use induction on Lists.
The principle is the same:
- Base Case: Prove the property holds for
[](nil). - Inductive Step: Assume the property holds for a list
t, then prove it holds forh :: t.
Associativity of Append
We want to prove (l1 ++ l2) ++ l3 = l1 ++ (l2 ++ l3).
theorem app_assoc : ∀ l1 l2 l3 : NatList, (l1 ++ l2) ++ l3 = l1 ++ (l2 ++ l3) := by
intros l1 l2 l3
induction l1 with
| nil =>
rfl
| cons h t ih =>
simp [app]
rewrite [ih]
rfl
Notice the pattern: simp usually reduces the goal by unfolding the definition of app for the cons case, revealing a sub-term that matches our Inductive Hypothesis (ih). We rewrite using ih and we are done.
The Reversing List Problem
For a slightly more involved example of inductive proof over lists, suppose we use app to define a list-reversing function rev:
def rev (l : NatList) : NatList :=
match l with
| [] => []
| h :: t => rev t ++ [h]
example : rev [1, 2, 3] = [3, 2, 1] := rfl
example : rev [] = [] := rfl
For something a bit more challenging, let's prove that reversing a list does not change its length. Our first attempt gets stuck in the successor case...
theorem rev_length_firsttry : ∀ l : NatList, length (rev l) = length l := by
intro l
induction l with
| nil => rfl
| cons h t ih =>
-- simp: seems to do nothing!
-- We are stuck: the goal involves '++', but we don't have equations
-- in the context to handle 'length' distributing over '++'.
sorry
A first attempt to make progress would be to prove exactly the statement that we are missing at this point (involving rev and ++). But the attempt fails because the induction hypothesis is not general enough.
theorem app_rev_length_succ_firsttry :
∀ l n, length (rev l ++ [n]) = Nat.succ (length l) := by
intro l
induction l with
| nil => intro n <;> rfl
| cons h t ih =>
intro n
-- We are stuck again!
-- The Inductive Hypothesis 'ih' talks about 'rev t ++ [n]'.
-- But the goal talks about '(rev t ++ [h]) ++ [n]'.
-- The mismatch prevents us from using the IH.
sorry
It turns out that the lemma above is more specific than it needs to be. We can strengthen the lemma to work not only on reversed lists but on general lists. This removes the rev structure that was complicating the induction.
theorem app_length_succ : ∀ l n, length (l ++ [n]) = Nat.succ (length l) := by
intro l
induction l with
| nil => intro n <;> rfl
| cons h t ih => intro n <;> simp [app, length] <;> rewrite [ih] <;> rfl
Now we can complete the original proof.
theorem rev_length : ∀ l : NatList, length (rev l) = length l := by
intro l
induction l with
| nil => rfl
| cons h t ih =>
simp [rev, length]
rewrite [app_length_succ, ih]
simp [Nat.one_add]
Note that the app_length_succ lemma we proved above is pretty narrow, requiring that the second list contains only a single element. We can prove a more general version for any two lists.
theorem app_length : ∀ l1 l2 : NatList, length (l1 ++ l2) = length l1 + length l2 := by
intros l1 l2
induction l1 with
| nil => simp [app, length]
| cons h t ih =>
simp [app, length]
rewrite [ih]
simp [Nat.succ_add]
5. Options
Sometimes, functions fail. For example, hd [] has to return a default value. A better way to handle partial functions is using the Option type.
inductive NatOption where
| Some (n : Nat)
| None
Now we can write nth_error, which returns the n-th element of a list, or None if the index is out of bounds.
def nth_error (l : NatList) (n : Nat) : NatOption :=
match l with
| [] => .None
| a :: l' =>
match n with
| .zero => .Some a
| .succ n' => nth_error l' n'
This allows us to write robust code without making up fake default values like 0 or -1.
6. Partial Maps
Finally, we apply our list knowledge to build a Partial Map. This is a simple key-value database where keys are IDs and values are Nat.
We define an ID type:
inductive MyId : Type where
| Id (n : Nat)
And the Map itself:
inductive partial_map : Type where
| empty
| record (i : MyId) (v : Nat) (m : partial_map)
This is essentially a list of records! The update function adds a new record to the front, effectively "overwriting" any previous values for that key (since find searches from the front).
def update (d : partial_map) (x : MyId) (value : Nat) : partial_map :=
.record x value d
def find (x : MyId) (d : partial_map) : NatOption :=
match d with
| empty => .None
| record y v d' =>
if eqb_id x y then .Some v else find x d'
And we wrap this up with a few more exercises to prove :
theorem update_eq :
∀ (d : partial_map) (x : MyId) (v : Nat),
find x (update d x v) = .Some v
:= by
/- FILL IN HERE -/ sorry
theorem update_neq :
∀ (d : partial_map) (x y : MyId) (o : Nat),
eqb_id x y = false → find x (update d y o) = find x d
:= by
/- FILL IN HERE -/ sorry
Concluding and Moving Ahead...
We end here with a pretty good knowledge of how we handle structured data in Lean. You can find all the source code in this repo. All the exercises can be found here, and you can give them a try yourself.
In the next article, we focus on learning Polymorphisms and Higher-Order Functions. See you there !
Main Reference : Software Foundations