Recent Posts

Understanding Rabinowitsch's trick

Rabinowitsch’s trick is a clever argument to prove the Nullstellensatz from its weak form, by introducing an extra variable. Explictly, having proved that every maximal ideal in a polynomial ring over an algebraically closed field \(k\) has a zero, the trick allows you to prove that if you start with any ideal, take the corresponding algebraic set and then go back, you get the radical of the ideal you started with.

I have always found this argument quite confusing and mysterious. However, as it turns out, abstracting things out slightly and organising the proofs carefully makes the trick seem less magical and more like something anyone could have come up with.

So, let \(k\) be an algebraically closed field. For any commutative ring \(R\), let \(\mathsf{mSpec}(R)\) be the maximal spectrum of \(R\), i.e. the set of the maximal ideals of \(R\).

The weak Nullstellensatz as stated above can easily be generalised to any finitely generated \(k\)-algebra \(A\) as follows:

Theorem 1. The map \(k\text{-}\mathsf{Alg}(A, k) \to \mathsf{mSpec}(A)\) that maps a homomorphism to its kernel is a bijection.

Proof. The map is obviously injective. The weak Nullstellensatz is exactly the statement that the map is surjective in the case when \(A\) is a polynomial ring over \(k\). In fact, the points of the affine space \(\mathbb{A}^n\) are exactly the \(k\)-algebra homomorphisms \(k[x_1, \ldots, x_n] \to k\).

For a general finitely generated \(A\), write \(A\) as a quotient of a polynomial ring \(k[x_1, \ldots, x_n]\). Given any maximal ideal \(\mathfrak m\) of \(A\), there is a maximal ideal \(\mathfrak m'\) of \(k[x_1, \ldots, x_n]\) that contains the pullback of \(\mathfrak m\). Find a homomorphism \(\phi\) whose kernel is \(\mathfrak m'\). Then clearly \(\phi\) descends to the quotient, and its kernel is a maximal ideal containing \(\mathfrak m\), so it must be equal to it.\(\square\)

And now we can apply the trick itself. Let \(\mathfrak a\) be any ideal of \(k[x_1, \ldots, x_n]\). Denote by \(V(\mathfrak a)\) the corresponding algebraic set. This can be thought of as the set of \(k\)-algebra homomorphisms to \(k\) that are zero on \(\mathfrak a\). Suppose \(f\) vanishes on \(V(\mathfrak a)\), i.e. \(f\) is in the kernel of all the corresponding homomorphisms. Denote by \(\sqrt{\mathfrak a}\) the radical of \(\mathfrak a\).

Proposition 2. For \(f\) as above, \(f \in \sqrt{\mathfrak a}\).

Proof. Let \(A\) be the coordinate ring of \(V(\mathfrak a)\), i.e \(A = k[x_1, \ldots, x_n] / \mathfrak a\). If \(f\) is not in the radical of \(\mathfrak a\), then it is not nilpotent in \(A\). Therefore, the localisation \(A[f^{-1}]\) is not the zero ring. Now the key observation is that \(A[f^{-1}]\) is finitely generated, since it is isomorphic to \(A[t]/(tf - 1)\). Therefore, by Theorem 1 there exists at least one homomorphism \(\phi: A[f^{-1}] \to k\), since \(A[f^{-1}]\), being non-trivial, has at least one maximal ideal. But pulling back \(\phi\) to \(k[x_1, \ldots, x_n]\) we get a homomorphism that is zero on \(\mathfrak a\), but nonzero on \(f\), a contradiction.\(\square\)

I like how in this reformulation the whole idea of introducing a new variable is now a completely natural operation, since the variable basically appears by itself by unfolding one of the explicit definitions of localisation. The extra variable is only used to prove that the localisation is finitely generated, and does not play any role in the actual argument.

Also, the argument goes through unchanged if we replace the polynomial ring with an arbitrary finitely generated \(k\)-algebra, and define \(V\) directly in terms of homomorphisms to \(k\).

In fact, the essence of Proposition 2 is really the following statement, which has essentially the same proof.

Proposition 3. In finitely generated \(k\)-algebra, the nilradical is equal to the Jacobson radical, i.e. it is the intersection of all the maximal ideals.

Proof. Let \(A\) be a finitely generated \(k\)-algebra. All nilpotent elements of \(A\) are contained in every prime ideal, and in particular in every maximal ideal. Conversely, if \(f \in A\) is not nilpotent, the localisation \(A[f^{-1}]\) is a non-trivial finitely generated algebra, so it admits a homomorphism \(\phi\) to \(k\), by Theorem 1. Pulling back \(\phi\) gives a homomorphism of \(A\) that maps \(f\) to a unit. In particular, its kernel is a maximal ideal that does not contain \(f\).\(\square\)

If \(X\) is a set of homomorphisms \(A \to k\), denote with \(I(X)\) the ideal of \(A\) of elements that are sent to zero by every homomorphism in \(X\). So \(V\) and \(I\) generalise the usual Galois connection between algebraic sets and ideals of a polynomial ring to arbitrary finitely generated \(k\)-algebras.

We can now make a one-line calculation to prove Proposition 2: \[ I(V(\mathfrak a)) = \bigcap_{\phi: A/\mathfrak a \to k} \pi^{-1}(\mathsf{ker}(\phi)) = \bigcap_{\mathfrak m \in \mathsf{mSpec}(A/\mathfrak a)} \pi^{-1}(\mathfrak m) = \pi^{-1}\bigcap_{\mathfrak m \in \mathsf{mSpec}(A/\mathfrak a)} \mathfrak m = \pi^{-1}(\sqrt 0) = \sqrt{\mathfrak a}, \]

where \(\mathfrak a\) is any ideal of \(A\), and \(\pi: A \to A/\mathfrak a\) is the projection to the quotient.


O(2) as a semidirect product

It is a simple linear algebra exercise to show that \(O(n)\), the group of orthogonal \(n × n\) real matrices, is the semidirect product of its subgroup \(SO(n)\), consisting of matrices of determinant 1, and the cyclic group \(C_2\) on two elements.

In other words, there is a short exact sequence of Lie groups \[ 1 → SO(n) → O(n) → O(1) → 1, \] where the map \(O(n) → O(1)\) is the determinant, and this sequence splits through any homomorphism \(O(1) → O(n)\) that maps \(-1\) to a reflection.

We can replicate this construction in HoTT, but unfortunately only for the case \(n = 2\), where, thanks to some fortuitous coincidences, we can obtain (the homotopy types of) the Lie groups above and their deloopings.

First, let \(\mathsf{Aut}(F)\) denote the automorphism group of a type \(F\), i.e. the type \(F ≅ F\). If \(F : \mathcal{U}\), where \(\mathcal{U}\) is a univalent universe, then we can define a delooping of \(\mathsf{Aut}(F)\) simply as the connected component of \(F\) in \(\mathcal{U}\), i.e. \[ B\mathsf{Aut}(F) = (X : \mathcal{U}) × \| X = F \|. \]

Now we can define: \[ \begin{array}{l} O(1) :≡ \mathsf{Aut}(\underline{2}), \\ O(2) :≡ \mathsf{Aut}(S¹). \\ \end{array} \]

The first definition is justified by the fact that \(O(1)\) is the discrete group with two elements, which is isomorphic to the permutation group \(Σ_2 ≡ \mathsf{Aut}(\underline{2})\).

As for the second, consider the function \((S^1 → S^1) → S^1\) that maps a function to its value on the base point. One can show that the fibres of this map form the constant \(ℤ\) family. So \((S^1 → S^1) ≅ S^1 × ℤ\), where composition acts like multiplication on the second component. It follows that \(\mathsf{Aut}(S^1) ≅ S^1 × ℤ^* ≅ S^1 × \mathsf{Aut}(\underline{2})\). This argument shows that the inclusion \(O(2) → \mathsf{Aut}(S^1)\) is a homotopy equivalence in spaces, which justifies our definition.

To replicate the exact sequence above in type theory, we have to construct it at the level of deloopings. Let \(p: \mathcal{U}→ \mathcal{U}\) be defined by \[ p(X) :≡ \| X = S¹ \|_0. \] It follows from the above calculations of \(\mathsf{Aut}(S¹)\) that \(p(S¹) = \underline{2}\). Therefore, \(p\) maps the connected component of \(S¹\) to the connected component of \(\underline{2}\), hence it defines a function \(p: BO(2) → BO(1)\). Define \(BSO(2)\) to be the fibre of \(p\).

Lemma 1. For \(A : BO(1)\), there is an equivalence \((\underline{2} = A) ≅ A\).

Proof. Let \(ϕ_A : (\underline{2} = A) → A\) be defined by \(ϕ_A(α) = α(0)\), where we are implicitly regarding an element of \(\underline{2} = A\) as an equivalence. Since \(ϕ_{\underline{2}}\) is obviously an equivalence, it follows that \(ϕ_A\) is an equivalence for all \(A\) in the connected component of \(\underline{2}\).\(\square\)

Proposition 2. The map \(\| BO(2) \|_1 → BO(1)\) induced by \(p\) is an equivalence.

Proof. Since both types are connected, it is enough to show that the induced map on the loop spaces is an equivalence. For any \(X : BO(2)\), let \(ϕ: (X = S^1) → p(X)\) be the map obtained by composing \(\mathsf{ap}_p: (X = S^1) → (p(X) = p(S¹))\) with the equivalence \((p(X) = p(S¹)) → p(X)\) given by Lemma 1, and using the fact that \(p(S¹) = \underline{2}\). We will show that \(ϕ\) is equal to the canonical projection into the 0-truncation.

By path elimination, it is enough to show that the reflexivity path \(S^1 = S^1\) is mapped to its image in the 0-truncation, which follows immediately from expanding the definitions.

In particular, the map \(p(X) → p(X)\) induced by \(ϕ\) is an equivalence, hence \(\mathsf{ap}_p\) is an equivalence by 2-out-of-3.\(\square\)

Corollary 3. Ω(BSO(2)) ≅ S¹

Proof. By identifying \(Ω\| BO(2) \|\) with \(ΩBO(1) = \underline{2}\), we get that the map induced by \(p\) on the loop spaces is equivalent to the second projection \(S¹ × \underline{2} → \underline{2}\). In particular, its fibre is \(S¹\). The conclusion then follows from the long exact sequence of homotopy groups.\(\square\)

So we have the desired fibre sequence \(BSO(2) → BO(2) → BO(1)\). To show that our definition of \(BSO(2)\) matches what we have in spaces, there is one more subtle point that we need to address. We only showed that the loop space of \(BSO(2)\) is equivalent to \(S¹\) as a space, but a priori there could be another ∞-group structure on \(S¹\), corresponding to another delooping. Fortunately, though, \(S¹\) is the Eilenberg-MacLane space \(K(ℤ, 1)\), hence any delooping of it must be a \(K(ℤ, 2)\), which means that there is exactly one. In particular, \(BSO(2)\) as we defined it is a possible model for the infinite dimensional complex projective space, another one being what arises from the type-theory construction of Eilenberg-MacLane spaces, namely \(\| S² \|_2\).

The fibre sequence \(BSO(2) → BO(2) → BO(1)\) gives rise to an exact sequence of \(∞\)-groups \(SO(2) → O(2) → O(1)\). Such a sequence splits if and only if the second map has a section.

Since the suspension of \(\underline{2}\) is equivalent to \(S^1\), suspension can be regarded as a pointed map \(Σ : BO(1) → BO(2)\). As usual, we will denote by \(\mathsf{merid}: A → N = S\) the path constructor in the definition of the suspension \(Σ A\), and take the north pole \(N\) as the base point of \(Σ A\).

Proposition 4. The suspension map \(Σ: BO(1) → BO(2)\) is a section of \(p: BO(2) → BO(1)\).

Proof. For any type \(A\) in \(BO(1)\), let \(σ_A : A → A\) be the only non-identity equivalence of \(A\), and define a function \(ψ_A : A → S^1 → ΣA\) so that \(ψ_A(a)\) corresponds to the loop \(\mathsf{merid}(a) · \mathsf{merid}(σ_A(a))^{-1}\).

Then for all \(A : BO(1)\), and all \(a: A\), \(ψ_A(a)\) is an equivalence, since this is true for \(A ≡ \underline{2}\), and being an equivalence is a proposition. We have thus defined a map \(A → (S¹ ≅ ΣA)\), hence in particular a map \(A → \| S^1 ≅ ΣA \|_0\). It is easy to verify that the latter map is an equivalence for \(A ≡ 2\), which implies that it is an equivalence for all \(A : BO(1)\). This shows that \(Σ\) is a section of \(p\), concluding the proof.\(\square\)


Deloopings of 1-groups

By a delooping of a space \(X\) we mean a connected pointed space \(Y\) such that \(X ≅ ΩY\). Using higher inductive types in HoTT, one can construct deloopings of 1-groups, i.e. sets with a group structure. This follows from the more general construction of Eilenberg-MacLane spaces [1].

Based on the fact that 1-groups can be delooped, one can define an \(∞\)-group to be a space that can be delooped. Classically, there are more concrete definitions of \(∞\)-groups, often as algebras of certain operads, but they are not directly replicable in HoTT. Examples of \(∞\)-groups that are not \(1\)-groups include matrix Lie Groups such as \(O(n)\), \(SO(n)\), \(U(n)\) and \(SU(n)\). Neither these groups themselves, nor their deloopings, have been constructed in HoTT, except for a few special cases (for example, \(U(1) = S^1\), \(SO(3) = ℝP^3\) and \(SU(2) = S^3\)).

Furthermore, there is no general technique that allows us to construct deloopings of such objects. In a way, this is to be expected, since we don’t yet know how to express their full \(∞\)-group structure in HoTT in way that is not simply exhibiting a delooping. Therefore, it is not yet known whether deloopings of \(S^3\) or \(ℝP^3\) exist in HoTT.

However, some special cases can be dealt with, for example, automorphism groups \(\mathsf{Aut}(X)\), for a type \(X\) in a universe \(\mathcal{U}\), defined as \(\mathsf{Aut}(X) :≡ (X =_{\mathcal{U}} X)\). In fact, if we define \(B\mathsf{Aut}(X)\) to be the image of the map \(1 → \mathcal{U}\) determined by \(X\), it is easy to see that \(ΩB\mathsf{Aut}(X) ≅ (X =_{\mathcal{U}} X) ≡ \mathsf{Aut}(X)\). These kinds of definitions are quite convenient, since they are only based on the existence of a univalent universe containing \(X\), and propositional truncation (needed to define images of maps). In particular, they do not require the existence of any other higher inductive type besides propositional truncation.

One downside is that this construction of \(B\mathsf{Aut}(X)\) raises the universe level, since it ends up being as large as the universe \(\mathcal{U}\), hence larger than \(X\) and \(\mathsf{Aut}(X)\). Furthermore, clearly not all \(∞\)-groups are of the form \(\mathsf{Aut}(X)\) for some space \(X\). On the other hand, we are not limited to considering only bare types \(X\), but we can look at automorphism groups in other categories. Using this observation, we can actually replicate the construction of Eilenberg-MacLane spaces of the form \(K(G,1)\) only using a univalent universe of sets and propositional truncation. The idea is to express any \(1\)-group \(G\) as the group of automorphism of an object in a univalent category, and then use the same construction as in the case of \(B\mathsf{Aut}(X)\).

Let \(G\) be a 1-group, i.e. a set with a group structure. A \(G\)-set is defined to be a set \(X\), together with a group homomorphism \({G}^{\mathrm{op}} → (X ≅ X)\). If \(g : G\) and \(x : X\), let \(x · g\) denote the result of the action of \(g\) on \(x\).

If \(X\) and \(Y\) are \(G\)-sets, and \(f : X → Y\) is a function, we say that \(f\) is equivariant if for all \(x : X\) and \(g : G\), we have that \(f(x) · g = f(x · g)\).

More abstractly, we can regard \(G\) as a category with one object, denoted \(\mathbf{B}G\), and define a \(G\)-set to be a presheaf on \(\mathbf{B}G\). Then an equivariant function is simply a natural transformation, and \(G\)-sets with equivariant functions form a univalent category \(G\text{-}\mathsf{Set}\). Note, however, that \(\mathbf{B}G\) itself is not univalent, unless \(G\) is the trivial group.

Clearly, \(G\) itself can be made into a \(G\)-set via its right multiplication action. We will use the notation \(\underline G\) to refer to \(G\) regarded as a \(G\)-set. Note that \(\underline G\) can be thought of as the representable presheaf corresponding to the unique object of \(\mathbf{B}G\). It then follows from the Yoneda lemma that there is an equivalence \(G ≅ G\text{-}\mathsf{Set}(\underline G, \underline G)\). Since \(\mathbf{B}G\) is a groupoid, the monoid of endomorphisms of \(\underline G\) is a group, which together with the previously stated equivalence and univalence of \(G\text{-}\mathsf{Set}\) implies that \(G ≅ (\underline G = \underline G)\).

Now let \(BG\) denote the image of the map \(1 → G\text{-}\mathsf{Set}\) corresponding to \(\underline G\). In other words, we factor the map \(1 → G\text{-}\mathsf{Set}\) into a \((-1)\)-connected map \(b : 1 → BG\) followed by a \((-1)\)-truncated map \(i : BG → G\text{-}\mathsf{Set}\). In particular, \(BG\) is pointed and connected, and we will denote its basepoint also by \(b\).

Now we can calculate: \[ \begin{aligned} & G ≅ \\ & (\underline G = \underline G) ≅ \\ & (i(b) = i(b)) ≅ \\ & (b = b) ≅ \\ & \Omega B G, \end{aligned} \]

which shows that \(BG\) is a delooping of \(G\).

I formalised the above construction of \(BG\) in agda. You can find it here.


[1]: Licata, Daniel R., and Eric Finster. 2014. “Eilenberg-MacLane Spaces in Homotopy Type Theory.” In Proceedings of the joint meeting of the twenty-third EACSL annual conference on Computer Science Logic (CSL) and the twenty-ninth Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), 66:1–66:9. CSL-Lics ’14. ACM.


Monads as lax functors

Last week at FP Lunch I talked about how to generalise the notion of monad using lax functors to \(\mathsf{Cat}\).

Let’s begin by reviewing the classical definition. A monad is given by the following data:

  • a category \(\mathcal{C}\);
  • an endofunctor \(T : \mathcal{C}\to \mathcal{C}\);
  • natural transformations \(\eta : I \to T\) and \(\mu : T \circ T \to T\);

satisfying certains laws (namely: \(\mu \circ \eta T = \mu \circ T \eta = \mathsf{id}\) and \(\mu \circ T \mu = \mu \circ \mu T\)). Note that the category \(\mathcal{C}\) is considered to be part of the data, rather than fixed beforehand.

In this post, I will illustrate a compact formulation of the above definition that can easily be generalised to include other similar notions, which appear from time to time in functional programming.

Here is the punchline:

A monad is a lax 2-functor from the terminal 2-category 1 to \(\mathsf{Cat}\).

To make sense of this definition, we need to venture into the marvellous world of higher categories. If we take the definition of category that we are familiar with, we can regard it as some sort of 1-dimensional structure: we have a set of objects, which we can picture as points, and a set of arrows between them, which we imagine as (oriented) 1-dimensional lines.

It is then relatively easy to go one dimension up, and imagine an entity with 3 levels of structure: objects, morphisms, and 2-dimensional “cells” connecting arrows. This is what we call a 2-category.

More precisely, a 2-category is given by:

  • a set of objects (or 0-cells)
  • for any two objects \(x, y\), a set of morphisms (or 1-cells) \(\mathsf{hom}(x, y)\);
  • for any two objects \(x, y\), and any two morphisms \(f, g : \mathsf{hom}(x, y)\), a set of 2-morphisms \(\mathsf{hom}_2 (f, g)\).

Of course, this cannot really be the complete definition of 2-category, as we also need to be able to compose morphisms and 2-morphisms, but we won’t go into much detail here. The interested reader can find more details on this nLab page.

The primary example of 2-category is \(\mathsf{Cat}\), the 2-category of categories. Objects of \(\mathsf{Cat}\) are (ordinary) categories (also called 1-categories), morphisms are functors, and 2-morphisms are natural transformations. Another example is the terminal 2-category, containing only 1 object, and no non-identity morphisms or 2-morphisms.

As always happens in mathematics, every new structure that we define is accompanied by a corresponding notion of morphism. Given 2-categories \(\mathcal{C}\) and \(\mathcal{D}\), we want to define what it means to give a “map” \(\mathcal{C}\to \mathcal{D}\) that respects the 2-category structure. We call such maps 2-functors.

As it turns out, there are multiple ways to give a definition of 2-functor. They differ in the amount of strictness that they require. More precisely, a 2-functor \(\mathcal{C}\to \mathcal{D}\) is given by:

  • a function \(F\) mapping objects of \(\mathcal{C}\) to objects of \(\mathcal{D}\);
  • for any two objects \(x, y\) of \(\mathcal{C}\), a function (also denoted \(F\)) mapping morphisms between \(x\) and \(y\) in \(\mathcal{C}\) to morphisms between \(F x\) and \(F y\) in \(\mathcal{D}\);
  • for any two objects \(x, y\) of \(\mathcal{C}\), and morphisms \(f, g : \mathsf{hom}(x, y)\), a function mapping 2-morphisms between \(f\) and \(g\) to 2-morphisms between \(F f\) and \(F g\);

subject to certain “functoriality” properties. We can make this functoriality requirement precise in a number of different (non-equivalent) ways.

First, we might directly generalise the functoriality properties for functors, and require: \[ \begin{aligned} & F\mathsf{id}= \mathsf{id}, \\ & F (g \circ f) = F g \circ F f. \end{aligned} \]

If we do that, we get the notion of strict functor. However, the elements appearing in the above equations are objects of certain categories (namely, \(\mathsf{hom}\)-categories of \(\mathcal{D}\)), and if category theory has taught us anything, it is the idea that comparing objects of categories up to equality is often not very fruitful.

Therefore, we are naturally lead to the notion of pseudofunctor, which weakens the equalities to isomorphisms: \[ \begin{aligned} & F\mathsf{id}\cong \mathsf{id}, \\ & F (g \circ f) \cong F g \circ F f. \end{aligned} \]

However, we are interested in an even weaker notion here, called lax 2-functor, which replaces the isomorphisms above with arbitrary (possibly not invertible) 2-morphisms: \[ \begin{aligned} & \mathsf{id}\to F \mathsf{id}, \\ & F g \circ F f \to F (g \circ f). \end{aligned} \]

The direction of the arrows can be reversed, yielding the dual notion of oplax functor, which we won’t need here.

Now we understand all the terminology used in the definition above. Let \(F : 1 \to \mathsf{Cat}\) be a lax 2-functor. At the level of objects, \(F\) maps the unique object of \(1\) to \(\mathsf{Cat}\), which amounts to just picking a single category \(\mathcal{C}\). At the level of morphisms, we map the single (identity) morphisms of 1 to a functor \(T: \mathcal{C}\to \mathcal{C}\). Now, the “lax structure” produces 2-morphisms in \(\mathsf{Cat}\) (i.e. natural transformations): \(\eta : I \to T\) and \(\mu : T \circ T \to T\).

So it looks like lax 2-functors to \(\mathsf{Cat}\), at least ignoring certain details that we haven’t discussed, correspond perfectly to the classical definition of monad. I encourage the interested reader to look at the complete definition of lax functor, and verify that everything does indeed match, including the monad laws.

After all this work, generalising the definition is now extremely easy: just replace the 2-category 1 with a more general category. A simple example is: given a monoid \(S\), regard \(S\) as a 2-category with 1 object and no non-trivial 2-morphism. Lax functors \(S \to \mathsf{Cat}\) are exactly Wadler’s indexed monads.

It is also possible (although slightly more involved) to recover Atkey’s parameterised monads as lax functors. I’ll leave this as a fun exercise.


Free monads in category theory (part 3)


In the previous post, we investigated free monads, i.e. those whose monad algebras are the same as algebras of some functor. In general, however, not all monads are free, not even in Haskell! Nevertheless, monad algebras can often be regarded as algebras of some functor, satisfying certain “algebraic laws”.

In the first post of the series, we looked at the list monad \(L\). We observed that monad algebras of \(L\) can be regarded as monoids, which is to say they are algebras of the functor \(F\) given by \(F X = 1 + X²\), subject to unit and associativity laws.

The list example is interesting, because it suggests a strong connection between monads and algebraic structures. Can we always regard algebraic structures (such as groups, rings, vector spaces, etc…) as the algebras of some monad?

In this post, we will try to generalise this example to other monads by developing a categorical definition of algebraic theory based on monads and monad algebras.

Algebraic theories

The theory of monoids is a particular instance of a general pattern that occurs over and over in mathematics. We have a set of operations, each with a specified arity, and a set of laws that these operations are required to satisfy. The laws all have the form of equations with universally quantified variables.

For monoids, we have two operations: a unit \(e\), which is a nullary operation (i.e. a constant), and multiplication \(·\), a binary operation (written infix). The laws should be very familiar: \[ \begin{aligned} e · x & = x \\ x · e & = x \\ x · (y · z) & = (x · y) · z \end{aligned} \] where every free variable is implicitly considered to be universally quantified.

As we observed in the first post of this series, the functor \(F\) corresponding to the algebraic theory of monoids is given by \(F X = 1 + X²\). Algebras of \(F\) are sets equipped with the operations of a monoid, but there is no requirement that they satisfy the laws.

Since \(F\) is polynomial, it has an algebraically free monad \(F^*\), so \(F^* X\) is in particular an \(F\)-algebra. If we focus on the first law above, we see that it just consists of a pair of terms in \(F^* X\), parameterised over some unspecified element \(x : X\). This can be expressed as a natural transformation: \[ X → F^*X × F^* X \] The same holds for the second law, while the third can be regarded as a function: \[ X³ → F^*X × F^*X \]

We can assemble those three functions into a single datum, consisting of a pair of natural transformations: \[ X + X + X³ ⇉ F^* X \]

If we set \(G X = X + X + X³\), we have that the laws can be summed up concisely by giving a pair of natural transformations: \[ G ⇉ F^*, \] which, since algebraically free monads are free, is the same as a parallel pair of monad morphisms: \[ l, r : G^* ⇉ F^*, \] and this is something that we can easily generalise. Namely, we say that an algebraic theory is a parallel pair of morphisms of algebraically free monads.

Note that the terminology here is a bit fuzzy. Some authors might refer to the parallel pair above as a presentation of an algebraic theory. It ultimately depends on whether or not you want to consider theories with different syntactical presentations but identical models to be equal. With our definition, they would be considered different.


To really motivate this definition, we now need to explain what the models of an algebraic theory are. This is quite easy if we just follow our derivation of the general definition from the example.

We know that a monoid is an \(F\)-algebra \(θ : F X → X\) that satisfies the monoid laws. Since \(F\)-algebras are the same as \(F^*\)-algebras, we can work with the corresponding \(F^*\)-algebra instead, which we denote by \(θ^* : F^*X → X\).

This algebra satisfies the laws exactly when the two natural transformations above become equal when composed with \(θ^*\), i.e. when \(θ^* ∘ l = θ^* ∘ r\).

We thus define the category of models of an algebraic theory \(l, r : G^* ⇉ F^*\) as the full subcategory of \(\mathsf{Alg}_F ≅ \mathsf{mAlg}_{F^*}\) consisting of all those monad algebras \(θ^* : F^* X → X\) such that \(θ^* ∘ l = θ^* ∘ r\).

Now, we know that, in the case of monoids, this subcategory is monadic over \(\mathsf{Set}\), but is this true in general?

We begin by defining the notion of a free model for some algebraic theory. In the monoid example, we used the list monad to build a monoid out of any set, and then proceeded to prove that this construction gives the left adjoint of the forgetful functor \(\mathsf{Alg}_F → \mathsf{Set}\). This is of course the first step towards proving monadicity.

In general, there does not seem to be a way to generalise this construction directly. We pulled the list monad out of a hat, and showed that it was exactly the monad that we were looking for. We did not derive it using the functor \(F\) in a systematic way that we could replicate in the general case.

Fortunately, there is another way to produce the free monoid over a set \(X\). We start with the free \(F\)-algebra \(F^* X\), then quotient it according to the laws. Intuitively, we define an equivalence relation that relates two elements \(t₁\) and \(t₂\) whenever there is a law that requires them to be equal.

The straightforward way to formalise this intuition is to take the equivalence relation generated by such pairs \((t₁, t₂)\), then take the corresponding quotient. A more conceptual approach is to say that \(T X\) is obtained as a coequaliser: \[ G^* X ⇉ F^* X → T X. \]

In the monoid example, \(F^* X\) is the set of all trees with leaves labelled by elements of \(X\). If we regard a tree as a parenthesised string of elements of \(X\), then the equivalence relation on \(F^*\) given by the coequaliser above corresponds to identifying strings with the same underlying list of elements but possibly different parenthesizations. Therefore, \(T X\) is clearly isomorphic to the list monad.


More generally, we can take any algebraic theory, which we defined as a parallel pair of monad morphisms between free monads \(F^*\) and \(G^*\), and take the coequaliser in the category of (finitary) monads.

With some reasonable assumptions on the functors \(F\) and \(G\), we can show that this coequaliser always exists, and that the algebras of the resulting monad are exactly the models of the algebraic theory we started with.

Further reading

This concludes my series on the underlying theory of free monads and their relation with universal algebra.

Here is a list of resources where you can learn more about this topic:

  • Michael Barr and Charles Wells, Toposes, Triples and Theories

    “Triple” is the old term for monads. Chapter 3 is about the monadicity theorem and some of the material that I covered in this series.

  • Saunders Mac Lane, Categories for the Working Mathematician

    Chapter 6 is about monads and their algebras.

  • Steve Awodey, Category Theory

    The last chapter explains the relationship between initial algebras and monadic functors.

  • Francis Borceux, Handbook of Categorical Algebra

    A very comprehensive resource, with detailed proofs.