Surfer's blog

Let's Learn Lean : Basics-I

There has been a lot of discussion around Formal Verification, and Lean has been at the forefront of all this. So instead of observing from the sidelines, let's try to learn it from the ground up.

In this series, I am porting the Software Foundations book (written originally for Coq) into Lean 4. We will build everything from scratch to understand the mathematics that powers verified software.

Why Lean 4?

Lean is an interactive theorem prover developed by Leonardo de Moura at Microsoft Research. It has become incredibly popular among mathematicians (via the Mathlib library) and the ZK community.

Unlike older provers like Coq that require "extracting" code to an intermediate language like OCaml to run it, Lean 4 is a general-purpose language. We can write our compiler in Lean, verify it in Lean, and compile it directly to a standalone C binary. This reduces the "trust surface" and offers higher performance.

Data and Functions

We are ignoring the standard libraries and building everything from scratch to understand how these tools work.

"Days of the Week"

Let's start with a very simple example. We want to tell the theorem prover that we are defining a set of data values—a type.

We use the inductive keyword. We use where to start the definition (similar to Python syntax), and list the values. Note the naming convention: Lean types and constructors use CamelCase.

inductive Day where
  | monday
  | tuesday
  | wednesday
  | thursday
  | friday
  | saturday
  | sunday
deriving Repr

What is deriving Repr? Lean is efficient and "lazy." It won't generate the code to convert our type into a printable string unless we explicitly ask it to derive a "Representation" (Repr). Without this, we wouldn't be able to print the days to the console later.

Writing Functions

Now that we have defined Day, we can write functions that operate on it. Let's write a function that calculates the next business day.

We use def to define functions and match to handle the logic.

def nextWorkingDay (d : Day) : Day :=
  match d with
  | Day.monday    => Day.tuesday
  | Day.tuesday   => Day.wednesday
  | Day.wednesday => Day.thursday
  | Day.thursday  => Day.friday
  | Day.friday    => Day.monday
  | Day.saturday  => Day.monday
  | Day.sunday    => Day.monday

Note on Namespacing: You will notice we use Day.monday instead of just monday. In Lean, constructors are namespaced inside the type. This keeps the global namespace clean—a massive advantage in large projects.

Computation and Testing

To check our work, we use the #eval command. This calculates the result immediately inside the editor.

#eval nextWorkingDay Day.friday
-- Result: Day.monday

#eval nextWorkingDay (nextWorkingDay Day.saturday)
-- Result: Day.tuesday

The First Proof (Assertions)

Testing values is good, but proving them is better.

We can define a test case using the example keyword. Instead of writing a complex proof script, we can simply provide the proof term rfl (short for reflexivity) inside a by block.

example : nextWorkingDay (nextWorkingDay Day.saturday) = Day.tuesday := by
  rfl

The rfl tactic computes the values on both sides of the = sign and checks if they are definitionally equal. If they compute to the same thing, the proof passes.

Booleans

Now, let's implement Booleans from scratch.

Note: Because Bool is built into Lean's standard library, we will wrap our work in a namespace to avoid conflicts.

namespace MySpace

  inductive Bool where
    | true
    | false
  deriving Repr

Boolean Functions

We can define functions over booleans just like we did for days. Here are the definitions for negation (negb), logical AND (andb), and logical OR (orb).

  def negb (b : Bool) : Bool :=
    match b with
    | Bool.true => Bool.false
    | Bool.false => Bool.true

  def andb (b1 : Bool) (b2 : Bool) : Bool :=
    match b1 with
    | Bool.true => b2
    | Bool.false => Bool.false

  def orb (b1 : Bool) (b2 : Bool) : Bool :=
    match b1 with
    | Bool.true => Bool.true
    | Bool.false => b2

We can verify these with truth tables using example:

  example : orb Bool.true Bool.false = Bool.true := by rfl
  example : orb Bool.false Bool.false = Bool.false := by rfl

Notation

Writing orb x y is tedious. We prefer standard infix notation like x || y. Lean allows us to define this with specific precedence levels.

  infixl:65 " && " => andb
  infixl:60 " || " => orb

Now we can write readable proofs:

  example : (Bool.true || (Bool.true || Bool.true)) = Bool.true := by rfl

Conditional Expressions (if is tricky)

In most languages, if is a primitive. In Lean, if is tied strictly to logic.

A standard if b then x else y statement in Lean expects b to be a Proposition that is Decidable. Our custom Bool is just a data type; Lean doesn't automatically know how to treat it as a logical condition.

To use if with our custom type, we would need to derive DecidableEq and check equality explicitly:

  inductive Bool' where
    | true
    | false
  deriving Repr, DecidableEq -- Allows us to check equality (x = y)

  def negb' (b : Bool') : Bool' :=
    if b = Bool'.true then Bool'.false else Bool'.true

For the rest of this series, we will mostly stick to Pattern Matching (match), as it is more fundamental.

Types and Functions

Every expression in Lean has a type. We can inspect them using the #check command.

#check Bool.true
-- Result: Bool.true : Bool

#check negb
-- Result: negb : Bool → Bool

The type Bool → Bool is pronounced "Bool arrow Bool." It means: "Give me a Bool, and I will return a Bool." Similarly, the type of andb, written bool → bool → bool, can be read, "Given two inputs, each of type bool, this function produces an output of type bool."

Wrapper Types (New Types from Old)

Let's define a Color type that can be Black, White, or a Primary color (Red, Green, Blue).

Notice the syntax primary (p : Rgb). This means the primary constructor requires an argument.

inductive Rgb where
  | red
  | green
  | blue
deriving Repr

inductive Color where
  | black
  | white
  | primary (p : Rgb) -- Wraps the Rgb type
deriving Repr

When we pattern match, we can extract this data:

def monochrome (c : Color) : Bool :=
  match c with
  | Color.black => Bool.true
  | Color.white => Bool.true
  | Color.primary _ => Bool.false -- The '_' is a wildcard (we don't care which RGB it is)

Tuples

We can create types that hold multiple values by defining a single constructor with multiple parameters. This is effectively a "Struct."

namespace TuplePlayground

  inductive Bit where
    | B1
    | B0
  deriving Repr

  inductive Nybble where
    | bits (b0 b1 b2 b3 : Bit)
  deriving Repr

  -- Creating a Nybble
  #check (Nybble.bits Bit.B1 Bit.B0 Bit.B1 Bit.B0)

  -- Pattern matching to read the bits
  def allZero (nb : Nybble) : Bool :=
    match nb with
      | Nybble.bits Bit.B0 Bit.B0 Bit.B0 Bit.B0 => Bool.true
      | Nybble.bits _ _ _ _ => Bool.false

end TuplePlayground

Tip: In production Lean code, we often use the structure keyword for types that have only one constructor. However, for learning the foundations, using inductive helps us understand that a struct is just a data type with a single shape.

Numbers (Peano Arithmetic)

We wrap this section in a namespace so that our own definition of natural numbers does not interfere with the one from the standard library.

namespace NatPlayground

All the types we have defined so far—Enumerated types like Day and Tuples like Nybble—are finite. The natural numbers, on the other hand, are an infinite set. To represent them, we need a richer form of type declaration.

There are many representations of numbers to choose from. You are familiar with decimal notation (base 10), binary (base 2), or hexadecimal (base 16). The binary representation is valuable in computer hardware because digits map to voltage levels. However, here we choose a representation that makes proofs simpler.

We use Unary representation (base 1), where only a single digit is used—like making scratches on a cave wall. To represent unary numbers with a datatype, we use two constructors:

  1. O: Represents Zero.
  2. S: Represents the Successor of a number (n + 1).
inductive Nat where
  | O
  | S (n : Nat)
deriving Repr

With this definition:

The Logic: The definition says precisely how expressions in the set Nat can be built:

  1. O belongs to the set Nat.
  2. If n belongs to Nat, then S n also belongs to Nat.

Crucially, the constructor S has the type Nat → Nat, just like functions. However, S does not "do" anything. It doesn't perform a calculation. It is simply a way of writing down numbers. Just like the digit "1" in "100" is data, S is data.

Predecessor Function

We can write functions that "peel off" these wrappers. For example, the predecessor function:

def pred (n : Nat) : Nat :=
  match n with
  | Nat.O => Nat.O
  | Nat.S n' => n'

The second branch reads: "If n has the form S n', return n'."

Recursion

For most interesting computations, simple pattern matching isn't enough; we need recursion. For example, to check if n is even, we need to check if n-2 is even.

In Coq, you must use the keyword Fixpoint to define recursive functions. In Lean, we use standard def. Lean is smart enough to analyze the function and prove it terminates (because the argument n' is smaller than n).

def even (n : Nat) : Bool :=
  match n with
  | Nat.O => Bool.true        -- 0 is even
  | Nat.S Nat.O => Bool.false -- 1 is odd
  | Nat.S (Nat.S n') => even n' -- Recurse

Recursive Addition

Addition is defined by "peeling off" wrappers from one number and putting them on the other.

def plus (n m : Nat) : Nat :=
  match n with
  | Nat.O => m
  | Nat.S n' => Nat.S (plus n' m)

Mental Model: If we calculate plus 3 2, the simplification steps look like this:

plus (S (S (S O))) (S (S O))
==> S (plus (S (S O)) (S (S O)))
==> S (S (plus (S O) (S (S O))))
==> S (S (S (plus O (S (S O)))))
==> S (S (S (S (S O))))  -- which is 5

Multiplication and Subtraction

Multiplication is just repeated addition.

def mult (n m : Nat) : Nat :=
  match n with
  | Nat.O => Nat.O
  | Nat.S n' => plus m (mult n' m)

def minus (n m : Nat) : Nat :=
  match n, m with
  | Nat.O, _ => Nat.O
  | Nat.S n', Nat.O => n
  | Nat.S n', Nat.S m' => minus n' m'

infixl:65 " + " => plus
infixl:65 " - " => minus
infixl:70 " * " => mult

#eval (6 + 3 : _root_.Nat)

Here, we define eqb (equality boolean).

def eqb (n m : Nat) : Bool :=
  match n, m with
  | Nat.O,    Nat.O    => Bool.true
  | Nat.O,    Nat.S _  => Bool.false
  | Nat.S _,  Nat.O    => Bool.false
  | Nat.S n', Nat.S m' => eqb n' m'

infix:50 " =? " => eqb

-- Less than or Equal function
def leb (n m : Nat) : Bool :=
  match n with
  | Nat.O => Bool.true
  | Nat.S n' =>
      match m with
      | Nat.O => Bool.false
      | Nat.S m' => leb n' m'

infix:70 " <=? " => leb

While we can define this, we usually don't have to. Lean has a built-in typeclass called BEq (Boolean Equality). Nat already has an instance of this, so we can just write if n == m then .... We defined it here just for the sake of knowing.

Exercises

1. Factorial Recall the mathematical definition: factorial(0) = 1 and factorial(n) = n * factorial(n-1). Translate this into Lean.

def factorial (n : Nat) : Nat :=
  -- REPLACE THIS WITH YOUR DEFINITION
  sorry

example : factorial (Nat.S (Nat.S (Nat.S Nat.O))) = (Nat.S (Nat.S (Nat.S (Nat.S (Nat.S (Nat.S Nat.O)))))) := by sorry

2. Less Than (ltb) Fill in the definition of ltb (less-than boolean).

def ltb (n m : Nat) : Bool :=
  -- REPLACE THIS WITH YOUR DEFINITION
  sorry

infix:70 " <? " => ltb

example : (ltb (Nat.S (Nat.S Nat.O)) (Nat.S (Nat.S Nat.O))) = Bool.false := by sorry
example : (ltb (Nat.S (Nat.S Nat.O)) (Nat.S (Nat.S (Nat.S (Nat.S Nat.O))))) = Bool.true := by sorry

end NatPlayground

Concluding and moving forward...

With all these definitions, we will be concluding the first blog in this series. Basics-II will be continuation of the first part where we will cover tactics for proofs : Simplification, Rewriting, and Case Analysis. We will also look at a more efficient representation of numbers: Binary Numerals.

You can find all the source code in this repo. The repo provides you with the Lean code, and you can follow the Basics.v file for the Coq code. For all the exercises mentioned in this article, the solutions can be found in the equivalent .v files, and you can write them in the Lean file yourself.

Main Reference : Software Foundations