Lean mathlib notes
Various implementation details are noted in the mathlib source, and referenced later on. We collect these notes here.
bundled maps over different rings
Frequently, we find ourselves wanting to express a bilinear map M →ₗ[R] N →ₗ[R] P or an equivalence between maps (M →ₗ[R] N) ≃ₗ[R] (M' →ₗ[R] N') where the maps have an associated ring R. Unfortunately, using definitions like these requires that R satisfy comm_semiring R, and not just semiring R. Using M →ₗ[R] N →+ P and (M →ₗ[R] N) ≃+ (M' →ₗ[R] N') avoids this problem, but throws away structure that is useful when we do have a commutative (semi)ring.
To avoid making this compromise, we instead state these definitions as M →ₗ[R] N →ₗ[S] P or (M →ₗ[R] N) ≃ₗ[S] (M' →ₗ[R] N') and require smul_comm_class S R on the appropriate modules. When the caller has comm_semiring R, they can set S = R and smul_comm_class_self will populate the instance. If the caller only has semiring R they can still set either R = ℕ or S = ℕ, and add_comm_monoid.nat_smul_comm_class or add_comm_monoid.nat_smul_comm_class' will populate the typeclass, which is still sufficient to recover a ≃+ or →+ structure.
An example of where this is used is linear_map.prod_equiv.
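As a rough illustration of how the smul_comm_class argument gets filled in, the following Lean 3 sketch asks instance resolution for the two cases described above. The import path is an assumption and may differ between mathlib versions.

```lean
import algebra.module.basic  -- assumption: the exact file name depends on the mathlib version

-- With a commutative semiring we can take `S = R`;
-- `smul_comm_class_self` is expected to supply the instance.
example (R M : Type*) [comm_semiring R] [add_comm_monoid M] [module R M] :
  smul_comm_class R R M :=
infer_instance

-- With only a semiring we can still take `S = ℕ`;
-- `add_comm_monoid.nat_smul_comm_class` is expected to supply the instance.
example (R M : Type*) [semiring R] [add_comm_monoid M] [module R M] :
  smul_comm_class ℕ R M :=
infer_instance
```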
category_theory universes
The typeclass category C describes morphisms associated to objects of type C : Type u.
The universe levels of the objects and morphisms are independent, and will often need to be specified explicitly, as category.{v} C.
Typically any concrete example will either be a small_category, where v = u, which can be introduced as
universes u
variables {C : Type u} [small_category C]
or a large_category, where u = v+1, which can be introduced as
universes u
variables {C : Type (u+1)} [large_category C]
In order for the library to handle these cases uniformly, we generally work with the unconstrained category.{v u}, for which objects live in Type u and morphisms live in Type v.
Because the universe parameter u for the objects can be inferred from C when we write category C, while the universe parameter v for the morphisms cannot be automatically inferred, throughout the category theory library we introduce universe parameters with morphism levels listed first, as in
universes v u
or
universes v₁ v₂ u₁ u₂
when multiple independent universes are needed.
This has the effect that we can simply write category.{v} C (that is, only specifying a single parameter) while u will be inferred.
Often, however, it's not even necessary to include the .{v} (although it was in earlier versions of Lean). If it is omitted, a "free" universe will be used.
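A minimal Lean 3 sketch of the convention described above, assuming a mathlib version in which the category class lives in the imported file (the file name has changed over time):

```lean
import category_theory.category.basic  -- assumption: `category_theory.category` in older mathlib

open category_theory

universes v u  -- morphism level listed first

-- `u` is inferred from `C`; only the morphism universe `v` is given explicitly.
variables {C : Type u} [category.{v} C]

-- A statement in the unconstrained setting applies to small and large categories alike.
example {X Y : C} (f : X ⟶ Y) : 𝟙 X ≫ f = f := category.id_comp f
```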
coercion into rings
Coercions such as nat.cast_coe that go from a concrete structure such as ℕ to an arbitrary ring α should be set up as follows:
@[priority 900] instance : has_coe_t ℕ α := ⟨...⟩
It needs to be has_coe_t instead of has_coe because otherwise type-class inference would loop when constructing the transitive coercion ℕ → ℕ → ℕ → ....
The reduced priority is necessary so that it doesn't conflict with instances such as has_coe_t α (option α).
For this to work, we reduce the priority of the coe_base and coe_trans instances because we want the instances for has_coe_t to be tried in the following order:
1. has_coe_t instances declared in mathlib (such as has_coe_t α (with_top α), etc.)
2. coe_base, which contains instances such as has_coe (fin n) ℕ
3. nat.cast_coe : has_coe_t ℕ α etc.
4. coe_trans
If coe_trans is tried first, then nat.cast_coe doesn't get a chance to apply.
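From the user's side, a coercion set up this way is inserted silently wherever a natural number is used as an element of a ring. A small Lean 3 sketch (the import path is an assumption and varies between mathlib versions):

```lean
import data.nat.cast  -- assumption: `data.nat.cast.basic` in later mathlib versions

-- `n` is coerced from `ℕ` to `α` through the `has_coe_t ℕ α` instance.
example {α : Type*} [semiring α] (n : ℕ) : α := n

-- The usual cast lemmas are simp lemmas, so `simp` should close this goal.
example {α : Type*} [semiring α] (n : ℕ) : ((n + 1 : ℕ) : α) = n + 1 :=
by simp
```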
continuity lemma statement
The library contains many lemmas stating that functions/operations are continuous. There are many
ways to formulate the continuity of operations. Some are more convenient than others.
Note: for the most part this note also applies to other properties (measurable, differentiable, continuous_on, ...).
The traditional way
As an example, let's look at addition (+) : M → M → M. We can state that this is continuous in different definitionally equal ways (omitting some typing information):
- continuous (λ p, p.1 + p.2);
- continuous (function.uncurry (+));
- continuous ↿(+). (↿ is notation for recursively uncurrying a function)
However, lemmas with this conclusion are not nice to use in practice because
- They confuse the elaborator. The following two examples fail, because of limitations in the elaboration process.
variables {M : Type*} [has_add M] [topological_space M] [has_continuous_add M]
example : continuous (λ x : M, x + x) :=
continuous_add.comp _
example : continuous (λ x : M, x + x) :=
continuous_add.comp (continuous_id.prod_mk continuous_id)
The second is a valid proof, which is accepted if you write it as
continuous_add.comp (continuous_id.prod_mk continuous_id : _)
- If the operation has more than 2 arguments, they are impractical to use, because in your application the arguments in the domain might be in a different order or associated differently.
The convenient way
A much more convenient way to write continuity lemmas is like continuous.add:
continuous.add {f g : X → M} (hf : continuous f) (hg : continuous g) : continuous (λ x, f x + g x)
The conclusion can be continuous (f + g), which is definitionally equal.
This has the following advantages:
- It supports projection notation, so is shorter to write.
- continuous.add _ _ is recognized correctly by the elaborator and gives useful new goals.
- It works generally, since the domain is a variable.
As an example for a unary operation, we have continuous.neg.
continuous.neg {f : α → G} (hf : continuous f) : continuous (λ x, -f x)
For unary functions, the elaborator is not confused when applying the traditional lemma (like continuous_neg), but it's still convenient to have the short version available (compare hf.neg.neg.neg with continuous_neg.comp $ continuous_neg.comp $ continuous_neg.comp hf).
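To make the comparison concrete, here is a short Lean 3 sketch of the convenient form in use. It assumes a mathlib version where has_continuous_add and continuous.add are provided by the imported file.

```lean
import topology.algebra.monoid  -- assumption: home of `has_continuous_add` in this mathlib version

variables {X M : Type*} [topological_space X] [topological_space M]
  [has_add M] [has_continuous_add M]

-- Projection notation chains naturally, and the domain `X` is arbitrary.
example {f g : X → M} (hf : continuous f) (hg : continuous g) :
  continuous (λ x, f x + g x + f x) :=
(hf.add hg).add hf

-- The example that troubles the elaborator in the traditional form.
example : continuous (λ x : M, x + x) :=
continuous_id.add continuous_id
```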
As a harder example, consider an operation of the following type:
def strans {x : F} (γ γ' : path x x) (t₀ : I) : path x x
The precise definition is not important, only its type. The correct continuity principle for this operation is something like this:
{f : X → F} {γ γ' : ∀ x, path (f x) (f x)} {t₀ s : X → I}
(hγ : continuous ↿γ) (hγ' : continuous ↿γ')
(ht : continuous t₀) (hs : continuous s) :
continuous (λ x, strans (γ x) (γ' x) (t₀ x) (s x))
Note that all arguments of strans are indexed over X, even the basepoint x, and the last argument s that arises since path x x has a coercion to I → F. The paths γ and γ' (which are unary functions from I) become binary functions in the continuity lemma.
Summary
- Make sure that your continuity lemmas are stated in the most general way, and in a convenient form. That means that:
  - The conclusion has a variable X as domain (not something like Y × Z);
  - Wherever possible, all point arguments c : Y are replaced by functions c : X → Y;
  - All n-ary function arguments are replaced by n+1-ary functions (f : Y → Z becomes f : X → Y → Z);
  - All (relevant) arguments have continuity assumptions, and perhaps there are additional assumptions needed to make the operation continuous;
  - The function in the conclusion is fully applied.
- These remarks are mostly about the format of the conclusion of a continuity lemma. In assumptions it's fine to state that a function with more than 1 argument is continuous using ↿ or function.uncurry.
Functions with discontinuities
In some cases, you want to work with discontinuous functions, and in certain expressions they are still continuous. For example, consider the fractional part of a number, fract : ℝ → ℝ. In this case, you want to add conditions to when a function involving fract is continuous, so you get something like this (assumption hf could be weakened, but the important thing is the shape of the conclusion):
lemma continuous_on.comp_fract {X Y : Type*} [topological_space X] [topological_space Y]
{f : X → ℝ → Y} {g : X → ℝ} (hf : continuous ↿f) (hg : continuous g) (h : ∀ s, f s 0 = f s 1) :
continuous (λ x, f x (fract (g x)))
With continuous_at you can be even more precise about what to prove in case of discontinuities, see e.g. continuous_at.comp_div_cases.
custom simps projection
You can specify custom projections for the @[simps] attribute. Do this for the projection my_structure.original_projection by adding a declaration my_structure.simps.my_projection that is definitionally equal to my_structure.original_projection but has the projection in the desired (simp-normal) form. Then you can call
initialize_simps_projections (original_projection → my_projection, ...)
to register this projection. See initialize_simps_projections_cmd for more information.
You can also specify custom projections that are definitionally equal to a composite of multiple projections. This is often desirable when extending structures (without old_structure_cmd).
has_coe_to_fun and notation class (like has_mul) instances will be automatically used, if they are definitionally equal to a projection of the structure (but not when they are equal to the composite of multiple projections).
decidable arguments
As mathlib is primarily classical, if the type signature of a def or lemma does not require any decidable instances to state, it is preferable not to introduce any decidable instances that are needed in the proof as arguments, but rather to use the classical tactic as needed.
In the other direction, when decidable instances do appear in the type signature, it is better to use explicitly introduced ones rather than allowing Lean to automatically infer classical ones, as these may cause instance mismatch errors later.
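The two directions can be sketched in Lean 3 as follows. The examples are illustrative only and assume the finset lemmas named in the comments have these signatures in the mathlib version at hand.

```lean
import data.finset.basic

-- `finset.erase` appears in the statement, so `decidable_eq α` is taken as an explicit argument.
example {α : Type*} [decidable_eq α] (s : finset α) (a : α) : a ∉ s.erase a :=
finset.not_mem_erase a s

-- The statement needs no decidability; only the proof does, so we use `classical` there.
example {α : Type*} (s : finset α) (a : α) : ∃ t : finset α, t ⊆ s ∧ a ∉ t :=
begin
  classical,
  exact ⟨s.erase a, finset.erase_subset a s, finset.not_mem_erase a s⟩
end
```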
decidable namespace
In most of mathlib, we use the law of excluded middle (LEM) and the axiom of choice (AC) freely. The decidable namespace contains versions of lemmas from the root namespace that explicitly attempt to avoid the axiom of choice, usually by adding decidability assumptions on the inputs.
You can check if a lemma uses the axiom of choice by using #print axioms foo and seeing if classical.choice appears in the list.
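For instance, one might compare a root-namespace lemma with its decidable counterpart as in this Lean 3 sketch. The pair not_not / decidable.not_not is chosen for illustration; the exact axiom lists printed are not reproduced here.

```lean
import logic.basic

-- Root-namespace version: one would expect `classical.choice` among the printed axioms.
#print axioms not_not

-- `decidable` version: takes a `[decidable a]` argument and is intended to avoid choice.
#print axioms decidable.not_not
```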
dsimp, simp
Many proofs in the category theory library use the dsimp, simp pattern, which typically isn't necessary elsewhere.
One would usually hope that the same effect could be achieved simply with simp.
The essential issue is that composition of morphisms involves dependent types. When you have a chain of morphisms being composed, say f : X ⟶ Y and g : Y ⟶ Z, then simp can operate successfully on the morphisms (e.g. if f is the identity it can strip that off).
However if we have an equality of objects, say Y = Y', then simp can't operate because it would break the typing of the composition operations. We rarely have interesting equalities of objects (because that would be "evil" --- anything interesting should be expressed as an isomorphism and tracked explicitly), except of course that we have plenty of definitional equalities of objects.
dsimp can apply these safely, even inside a composition.
After dsimp has cleared up the object level, simp can resume work on the morphism level --- but without the dsimp step, because simp looks at expressions syntactically, the relevant lemmas might not fire.
There's no bound on how many times you potentially could have to switch back and forth: if the simp introduced new objects, we again need to dsimp. In practice this does occur, but only rarely, because simp tends to shorten chains of compositions (i.e. not introduce new objects at all).
fact non-instances
In most cases, we should not have global instances of fact; typeclass search only reads the head symbol and then tries any instances, which means that adding any such instance will cause slowdowns everywhere. We instead state them as lemmas and make them local instances as required.
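A minimal Lean 3 sketch of this pattern, assuming a mathlib version in which fact is a structure (so that the anonymous constructor applies). The lemma name fact_two_le_three is hypothetical.

```lean
import logic.basic

-- Stated as a lemma, not as a global instance (hypothetical name).
lemma fact_two_le_three : fact ((2 : ℕ) ≤ 3) := ⟨dec_trivial⟩

section
-- Enabled as an instance only where it is needed.
local attribute [instance] fact_two_le_three

example : fact ((2 : ℕ) ≤ 3) := infer_instance
end
```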
forgetful inheritance
Suppose that one can put two mathematical structures on a type, a rich one R and a poor one P, and that one can deduce the poor structure from the rich structure through a map F (called a forgetful functor) (think R = metric_space and P = topological_space). A possible implementation would be to have a type class rich containing a field R, a type class poor containing a field P, and an instance from rich to poor. However, this creates diamond problems, and a better approach is to let rich extend poor and have a field saying that F R = P.
To illustrate this, consider the pair metric_space / topological_space. Consider the topology on a product of two metric spaces. With the first approach, it could be obtained by going first from each metric space to its topology, and then taking the product topology. But it could also be obtained by considering the product metric space (with its sup distance) and then the topology coming from this distance. These would be the same topology, but not definitionally, which means that from the point of view of Lean's kernel, there would be two different topological_space instances on the product. This is not compatible with the way instances are designed and used: there should be at most one instance of a kind on each type. This approach has created an instance diamond that does not commute definitionally.
The second approach solves this issue. Now, a metric space contains a distance, a topology, and a proof that the topology coincides with the one coming from the distance. When one defines the product of two metric spaces, one uses the sup distance and the product topology, and one has to give the proof that the sup distance induces the product topology. Following both sides of the instance diamond then gives rise (definitionally) to the product topology on the product space.
Another approach would be to have the rich type class take the poor type class as an instance parameter. It would solve the diamond problem, but it would lead to a blow-up of the number of type classes one would need to declare to work with complicated classes, say a real inner product space, and would create exponential complexity when working with products of such complicated spaces. Both problems are avoided by bundling things carefully as above.
Note that this description of this specific case of the product of metric spaces is oversimplified compared to mathlib, as there is an intermediate typeclass between metric_space and topological_space called uniform_space. The above scheme is used at both levels, embedding a topology in the uniform space structure, and a uniform structure in the metric space structure.
Note also that, when P is a proposition, there is no such issue as any two proofs of P are definitionally equal in Lean.
To avoid boilerplate, there are some designs that can automatically fill the poor fields when creating a rich structure if one doesn't want to do something special about them. For instance, in the definition of metric spaces, default tactics fill the uniform space fields if they are not given explicitly. One can also have a helper function creating the rich structure from a structure with fewer fields, where the helper function fills the remaining fields. See for instance uniform_space.of_core or real_inner_product.of_core.
For more details on this question, called the forgetful inheritance pattern, see Competing inheritance paths in dependent type theory: a case study in functional analysis.
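The shape of the pattern can be sketched with a toy pair of classes in Lean 3. The names poor, rich, special, weight and special_eq are hypothetical and stand in for the topology, the distance, and the compatibility proof; this is not how metric_space is actually declared.

```lean
/-- Toy "poor" structure: a distinguished set of points (hypothetical). -/
class poor (α : Type*) :=
(special : set α)

/-- Toy "rich" structure: it extends `poor` instead of providing an instance to it,
and carries a proof that the poor data agrees with what the rich data would induce. -/
class rich (α : Type*) extends poor α :=
(weight : α → ℕ)
(special_eq : special = {x | weight x = 0})
```

With this layout, an instance of rich on a product can reuse the product instance of poor as its special field, so both paths around the diamond give the same poor structure by definition, at the cost of one proof of special_eq.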
function coercion
Many structures such as bundled morphisms coerce to functions so that you can transparently apply them to arguments. For example, if e : α ≃ β and a : α then you can write e a and this is elaborated as ⇑e a. This type of coercion is implemented using the has_coe_to_fun type class. There is one important consideration:
If a type coerces to another type which in turn coerces to a function, then it must implement has_coe_to_fun directly:
structure sparkling_equiv (α β) extends α ≃ β
-- if we add a `has_coe` instance,
instance {α β} : has_coe (sparkling_equiv α β) (α ≃ β) :=
⟨sparkling_equiv.to_equiv⟩
-- then a `has_coe_to_fun` instance **must** be added as well:
instance {α β} : has_coe_to_fun (sparkling_equiv α β) :=
⟨λ _, α → β, λ f, f.to_equiv.to_fun⟩
(Rationale: if we do not declare the direct coercion, then ⇑e a is not in simp-normal form. The lemma coe_fn_coe_base will unfold it to ⇑↑e a. This often causes loops in the simplifier.)
implicit instance arguments
There are places where typeclass arguments are specified with implicit {} brackets instead of the usual [] brackets. This is done when the instances can be inferred because they are implicit arguments to the type of one of the other arguments. When they can be inferred from these other arguments, it is faster to use this method than to use type class inference.
For example, when writing lemmas about (f : α →+* β), it is faster to specify the fact that α and β are semirings as {rα : semiring α} {rβ : semiring β} rather than the usual [semiring α] [semiring β].
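A lemma written in this style might look like the following Lean 3 sketch. The lemma name is hypothetical, and the import is only an assumption about where ring_hom is available.

```lean
import algebra.module.basic  -- assumption: any import that brings `ring_hom` into scope

-- The semiring structures are implicit arguments: they are determined by the type of `f`
-- by unification, so no instance search is needed when the lemma is applied.
lemma my_map_zero {α β : Type*} {rα : semiring α} {rβ : semiring β} (f : α →+* β) :
  f 0 = 0 :=
f.map_zero
```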
likely generated binder names
In surface Lean, we can write anonymous Π binders (i.e. binders where the argument is not named) using the function arrow notation:
inductive test : Type
| intro : unit → test
After elaboration, however, every binder must have a name, so Lean generates one. In the example, the binder in the type of intro is anonymous, so Lean gives it the name ᾰ:
test.intro : ∀ (ᾰ : unit), test
When there are multiple anonymous binders, they are named ᾰ_1, ᾰ_2 etc.
Thus, when we want to know whether the user named a binder, we can check whether the name follows this scheme. Note, however, that this is not reliable. When the user writes (for whatever reason)
inductive test : Type
| intro : ∀ (ᾰ : unit), test
we cannot tell that the binder was, in fact, named.
The function name.is_likely_generated_binder_name checks if a name is of the form ᾰ, ᾰ_1, etc.
lower instance priority
Certain instances always apply during type-class resolution. For example, the instance add_comm_group.to_add_group {α} [add_comm_group α] : add_group α applies to all type-class resolution problems of the form add_group _, and type-class inference will then do an exhaustive search to find a commutative group. These instances take a long time to fail. Other instances will only apply if the goal has a certain shape. For example int.add_group : add_group ℤ or add_group.prod {α β} [add_group α] [add_group β] : add_group (α × β). Usually these instances will fail quickly, and when they apply, they are almost always the desired instance. For this reason, we want the instances of the second type (that only apply in specific cases) to always have higher priority than the instances of the first type (that always apply). See also #1561.
Therefore, if we create an instance that always applies, we set its priority to 100 (or something similar, which is below the default value of 1000).
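In code this looks roughly as follows. The classes foo and bar are hypothetical toy classes used only to show where the attribute goes.

```lean
-- Hypothetical toy classes.
class foo (α : Type*) := (x : α)
class bar (α : Type*) := (x : α) (y : α)

-- This instance matches every goal of the form `foo _`, so it gets a lowered priority.
@[priority 100] -- see Note [lower instance priority]
instance bar.to_foo (α : Type*) [bar α] : foo α := ⟨bar.x⟩

-- A shape-specific instance keeps the default priority of 1000 and is tried first.
instance nat.foo : foo ℕ := ⟨0⟩
```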
nolint_ge
Currently, the linter forbids the use of > and ≥ in definitions and statements, as they cause problems in rewrites. They are still allowed in statements such as bounded (≥) or ∀ ε > 0 or ⨆ n ≥ m, and the linter allows that. If you write a pattern where you bind two or more variables, like ∃ n m > 0, the linter will flag this as illegal, but it is also allowed. In this case, add the line
@[nolint ge_or_gt] -- see Note [nolint_ge]
nonarchimedean non instances
The non archimedean subgroup basis lemmas cannot be instances because some instances (such as measure_theory.ae_eq_fun.add_monoid or topological_add_group.to_has_continuous_add) cause the search for @topological_add_group β ?m1 ?m2, i.e. a search for a topological group where the topology/group structure are unknown.
open expressions
Some declarations work with open expressions, i.e. an expr that has free variables. Terms with free variables are not well-typed, and one should not use them in tactics like infer_type or unify. You can still do syntactic analysis/manipulation on them. The reason for working with open expressions is performance: instantiating variables requires iterating through the expression. In one performance test pi_binders was more than 6x quicker than mk_local_pis (when applied to the type of all imported declarations 100x).
operator precedence of big operators
There is no established mathematical convention for the operator precedence of big operators like ∏ and ∑. We will have to make a choice. Online discussions, such as https://math.stackexchange.com/q/185538/30839 seem to suggest that ∏ and ∑ should have the same precedence, and that this should be somewhere between * and +. The latter have precedence levels 70 and 65 respectively, and we therefore choose the level 67.
In practice, this means that parentheses should be placed as follows:
∑ k in K, (a k + b k) = ∑ k in K, a k + ∑ k in K, b k →
∏ k in K, a k * b k = (∏ k in K, a k) * (∏ k in K, b k)
(Example taken from page 490 of Knuth's Concrete Mathematics.)
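The same two identities can be written as Lean 3 statements, which shows the parenthesization the chosen precedence requires. This sketch assumes the import path and lemma names of a mathlib version that provides the ∑ x in s, f x notation.

```lean
import algebra.big_operators.basic  -- assumption: `algebra.big_operators` in older mathlib

open_locale big_operators

variables {ι : Type*} (K : finset ι) (a b : ι → ℕ)

-- The summand needs parentheses because `+` (65) binds less tightly than `∑` (67).
example : ∑ k in K, (a k + b k) = ∑ k in K, a k + ∑ k in K, b k :=
finset.sum_add_distrib

-- The factor needs none because `*` (70) binds more tightly than `∏` (67).
example : ∏ k in K, a k * b k = (∏ k in K, a k) * (∏ k in K, b k) :=
finset.prod_mul_distrib
```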
partially-applied ext lemmas
When possible, ext lemmas are stated without a full set of arguments. As an example, for bundled homs f, g, and of, f.comp of = g.comp of → f = g is a better ext lemma than (∀ x, f (of x) = g (of x)) → f = g, as the former allows a second type-specific extensionality lemma to be applied to f.comp of = g.comp of. If the domain of of is ℕ or ℤ and of is a ring_hom, such a lemma could then make the goal f (of 1) = g (of 1).
For bundled morphisms, there is an ext lemma that always applies of the form (∀ x, ⇑f x = ⇑g x) → f = g. When adding type-specific ext lemmas like the one above, we want these to be tried first. This happens automatically since the type-specific lemmas are inevitably defined later.
range copy pattern
For many categories (monoids, modules, rings, ...) the set-theoretic image of a morphism f is a subobject of the codomain. When this is the case, it is useful to define the range of a morphism in such a way that the underlying carrier set of the range subobject is definitionally set.range f. In particular this means that the types ↥(set.range f) and ↥f.range are interchangeable without proof obligations.
A convenient candidate definition for range which is mathematically correct is map ⊤ f, just as set.range could have been defined as f '' set.univ. However, this lacks the desired definitional convenience, in that it both does not match set.range, and that it introduces a redundant x ∈ ⊤ term which clutters proofs. In such a case one may resort to the copy pattern. A copy function converts the definitional problem for the carrier set of a subobject into a one-off propositional proof obligation which one discharges while writing the definition of the definitionally convenient range (the parameter hs in the example below).
A good example is the case of a morphism of monoids. A convenient definition for monoid_hom.mrange would be (⊤ : submonoid M).map f. However since this lacks the required definitional convenience, we first define submonoid.copy as follows:
protected def copy (S : submonoid M) (s : set M) (hs : s = S) : submonoid M :=
{ carrier := s,
one_mem' := hs.symm ▸ S.one_mem',
mul_mem' := hs.symm ▸ S.mul_mem' }
and then finally define:
def mrange (f : M →* N) : submonoid N :=
((⊤ : submonoid M).map f).copy (set.range f) set.image_univ.symm
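The payoff of the pattern is that the carrier of the range is set.range f by definition. Assuming a mathlib version where mrange is defined as above, this can be checked with rfl:

```lean
import group_theory.submonoid.operations

-- The coercion of `f.mrange` to a set unfolds to `set.range f`, so `rfl` suffices.
example {M N : Type*} [monoid M] [monoid N] (f : M →* N) :
  (f.mrange : set N) = set.range f :=
rfl
```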
simp-normal form
This note gives you some tips to debug any errors that the simp-normal form linter raises.
A lemma is considered faulty when its left-hand side is not in simp-normal form. Such lemmas are hence never used by the simplifier.
This linter gives you a list of other simp lemmas: look at them!
Here are some tips depending on the error raised by the linter:
1. 'the left-hand side reduces to XYZ': you should probably use XYZ as the left-hand side.
2. 'simp can prove this': This typically means that the lemma is a duplicate, or is shadowed by another lemma:
   2a. Always put more general lemmas after specific ones, and not the other way around! The simplifier always picks the last matching lemma.
   2b. You can also use @[priority] instead of moving simp-lemmas around in the file. Tip: the default priority is 1000. Use @[priority 1100] instead of moving a lemma down, and @[priority 900] instead of moving a lemma up.
   2c. Conditional simp lemmas are tried last. If they are shadowed just remove the simp attribute.
   2d. If two lemmas are duplicates, the linter will complain about the first one. Try to fix the second one instead! (You can find it among the other simp lemmas the linter prints out!)
3. 'try_for tactic failed, timeout': This typically means that there is a loop of simp lemmas. Try to apply squeeze_simp to the right-hand side (removing this lemma from the simp set) to see what lemmas might be causing the loop. Another trick is to set_option trace.simplify.rewrite true and then apply try_for 10000 { simp } to the right-hand side. You will see a periodic sequence of lemma applications in the trace message.
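The ordering advice for shadowed lemmas (specific lemma first, general lemma after it) can be illustrated with a pair of Lean 3 lemmas. The primed names are hypothetical and chosen to avoid clashing with existing declarations.

```lean
-- Specific lemma first ...
@[simp] lemma zero_add_zero' : (0 : ℕ) + 0 = 0 := rfl

-- ... then the more general lemma that also matches `0 + 0`.
@[simp] lemma add_zero' (x : ℕ) : x + 0 = x := rfl
```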
use has_coe_t
We use the class has_coe_t instead of has_coe if the first argument is a variable, or if the second argument is a variable not occurring in the first. Using has_coe would cause looping of type-class inference. See
https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/remove.20all.20instances.20with.20variable.20domain
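A small Lean 3 sketch of the first case, where the source of the coercion is a variable. The structure wrapped is hypothetical.

```lean
-- Hypothetical wrapper type.
structure wrapped (α : Type*) := (val : α)

-- The first argument `α` is a variable, so we declare `has_coe_t`, not `has_coe`.
instance {α : Type*} : has_coe_t α (wrapped α) := ⟨wrapped.mk⟩

-- The coercion is inserted automatically.
example (n : ℕ) : wrapped ℕ := n
```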
user attribute parameters
For performance reasons, it is inadvisable to use user_attribute.get_param. The parameter is stored as a reflected expression. When calling get_param, the stored parameter is evaluated using eval_expr, which first compiles the expression into VM bytecode. The unevaluated expression is available using user_attribute.get_param_untyped.
In particular, user_attribute.get_param MUST NEVER BE USED in the implementation of an attribute cache. This is because calling eval_expr disables the attribute cache.
There are several possible workarounds:
- Set a different attribute depending on the parameter.
- Use your own evaluation function instead of eval_expr, such as e.g. expr.to_nat.
- Write your own has_reflect Param instance (using a more efficient serialization format). The user_attribute code unfortunately checks whether the expression has the correct type, but you can use `(id %%e : Param) to pretend that your expression e has type Param.