49

It is often said that instead of proving a great theorem a mathematician's fondest dream is to prove a great lemma. Something like Kőnig's tree lemma, or Yoneda's lemma, or really anything from this list.

When I was first learning algebra, one of the key lemmas we were taught was Zorn's lemma. It was almost magical in its power and utility. However, I can't remember the last time Zorn's lemma appeared in one of my papers (even though I'm an algebraist). In pondering why this is, a few reasons occurred to me, which I'll list below. I don't want to lose my old friend Zorn, and so my question is:

What are some reasons to keep (or, perhaps in line with my thoughts, abandon) Zorn's lemma?

Edited to add: One purpose to this question is to know whether or not I should be rewriting my proofs to use Zorn's lemma, instead of my usual practice of using transfinite recursion, if there is a mathematical reason to prefer one over the other. Hopefully this clarifies the mathematical content of this question.


To motivate the discussion, let me give an example of how I would now teach ungraduates a result that was taught to me using Zorn's lemma.

Theorem: Every vector space $V$ has a basis.

Proof: First, fix a well-ordering for $V$. We will recursively work our way through the ordering, deciding whether to keep or discard elements of $V$. Suppose we have reached a vector $v$; we keep it if it is linearly independent from the previously kept vectors (equivalently, it is not in their span), otherwise we discard it. If $B$ is the set of kept vectors we see it is a basis as follows. Any vector $v\in V$ is in the span of $B$, because it is either in $B$ or in the span of the vectors previously kept. On the other hand, the elements of $B$ are linearly independent because a nontrivial combination $c_1 v_1 + \dotsb +c_k v_k=0$, where $v_1<v_2<\dotsb<v_k$ and $c_k\neq 0$ can be rearranged so $v_k$ is a linear combination of the previous vectors, so $v_k$ cannot belong to $B$ after all. $\quad\square$

Here are some of the benefits I see for this type of proof over the usual Zorn's lemma argument.

1. The use of choice is disentangled from the other parts of the proof.

When applying Zorn's lemma, it is difficult to see exactly how the axiom of choice is being used to reach the conclusion of a maximal element. One way to visualize its use is that Zorn's lemma lets us recursively build a maximal chain through the poset. This chain must have a greatest element. However, this construction is hidden behind the magic words "Abracadabra Zornify".

Is it a historical artifact that choice is hidden this way?

2. We can more easily see whether or not to use a choice principle.

In the proof above, if $V$ is already well-orderable (without AC), then we don't need to ever use the axiom of choice.

3. Zorn's lemma is no easier than transfinite recursion.

Each part of transfinite recursion already (implicitly) occurs in most Zorn's lemma arguments. The base case of the recursion corresponds, roughly, to showing that that the poset is nonempty (i.e., has some starting point). The successor ordinal step often occurs at the end; after asserting that some maximal element of the poset exists, we show that this maximal element has some claimed property by working by contradiction, and then passing to a slightly bigger element of the poset (i.e., the next successor). The limit ordinal step occurs when we show that chains have upper bounds.

4. Zorn's lemma often includes unnecessary complications.

In the proof I gave above, there is no need to define a complicated set, together with a poset relation. We can use strong induction, to avoid differentiating between the zero, successor, and nonzero limit steps. We don't need to combine the contradiction at the end with any successor step; they are entirely separated.

5. Transfinite recursion is a more fundamental principle.

As a matter of pedagogy, shouldn't we teach students about transfinite induction before we teach them a version of it that is also combined with AC, and that requires the construction of a complicated poset?

6. Transfinite recursion applies to situations where Zorn's lemma does not.

To give just one example: There are some recursions that continue along all of the ordinals (for a proper class amount of time). Zorn's requires, as a hypothesis, an end.

LSpice
  • 11,423
Pace Nielsen
  • 18,047
  • 4
  • 72
  • 133
  • 27
    So, what is your question? – Asaf Karagila Dec 11 '22 at 18:03
  • 2
    Your title should have “or” rather than “of”. – KConrad Dec 11 '22 at 18:04
  • 1
    I am not an algebraist, but I've heard that Zorn's Lemma was endorsed by Bourbaki as "the way to do it", instead of the transfinite induction which since then became outdated:-) – Alexandre Eremenko Dec 11 '22 at 18:12
  • 17
    I would restate your question as: When presenting proofs to undergraduates, would you emphasize proofs by Zorn’s lemma or proofs by transfinite induction, and why? I think that is the clear question that fits the rest of your post, though it fits the MathEducators site better than this one. –  Dec 11 '22 at 18:12
  • 4
    a related discussion is at https://math.stackexchange.com/q/318167/87355 – Carlo Beenakker Dec 11 '22 at 18:20
  • @KConrad, I have fixed that. – LSpice Dec 11 '22 at 18:28
  • 13
    I don't think this is worth making an answer of, but it's worth noting that, in a constructive math setting, Zorn's lemma might hold even in settings where the Axiom of Choice and transfinite induction (in the guise of the Bourbaki-Witt principle) fail: see this other question and the paper of Bell it cites; see also this paper on the Bourbaki-Witt principle and its relation to the existence of large enough ordinals. – Gro-Tsen Dec 11 '22 at 19:08
  • @Gro-Tsen, my feeling is that this question, while interesting, is extremely opinion-based, whereas that comment gives a non-opinion-based answer and so elevates the question—so I, at least, would like to see it as an answer! – LSpice Dec 11 '22 at 19:19
  • 1
    @LSpice Thanks, but the second (untold) reason I made a comment and not an answer is that I'm pretty sure I'd say something wrong if I elaborated further than what I said in the comment. The person who should be writing an answer along these lines is Peter LeFanu Lumsdaine, but sadly there's no way to tag someone on StackExchange. – Gro-Tsen Dec 11 '22 at 19:26
  • 15
    Can we please leave this question open because the discussion around it gives very interesting insights about the connection between AC and ZL? Thank you. And on a personal note, ZL is a good friend of mine, :-). – Martin Brandenburg Dec 11 '22 at 21:46
  • Gro-Tsen, you can tag someone like @PeterLeFanuLumsdaine with the at-sign; however, if you start a comment with such a tag, StackExchange will not let you tag someone else later in the comment. –  Dec 12 '22 at 00:20
  • 3
    @MattF. I don't think you can (effectively) tag someone not already part of the thread, though. Or has that changed? – Noah Schweber Dec 12 '22 at 01:13
  • 4
    Not sure how useful this is, but as a student who hasn't studied very much logic I find Zorn's lemma slightly more intuitive than the well-ordering principle, and I think it's pedagogically valuable not to start a proof in a class with a statement that many students will look at and say "there's no way we can do that." So, while the proof you've presented that begins with well-ordering $V$ may be more "logically robust" or attractive from a logician's perspective, it could be less satisfying to a student seeing this argument for the first time. – Carl Schildkraut Dec 12 '22 at 03:53
  • 1
    (Concretely, the Zorn's lemma application here is somewhat clear in the case of a finite-dimensional vector space over $\mathbb R$, while the well-ordering requires a well-ordering of $\mathbb R$.) – Carl Schildkraut Dec 12 '22 at 03:57
  • 15
    Yes, Zorn's Lemma is just a way of setting up a transfinite recursion, and any proof that invokes it could just as well inline its argument instead. But that's hardly specific to Zorn's Lemma. Every lemma is just a way of using that lemma's proof. Zorn's Lemma is a particular usage pattern for transfinite recursion which comes up over and over. Ok, let's give that a name to recognize and understand that pattern quickly each time it comes up again. That's what we usually do in math, we name the patterns we see a lot so we needn't think about them from scratch each time. No more to it than that. – Sridhar Ramesh Dec 12 '22 at 06:00
  • 4
    Your proof of the existence of a basis actually also makes the involvement of choice not entirely clear: in the case of $V$ being a finite-dimensional vector space over $\mathbb{R}$ with non-zero dimension, the axiom of choice is not needed - but your proof invokes it! In fact one can argue that Zorn's lemma is some kind of induction over uncountable ordered sets and then the proof using Zorn's lemma reduces to induction over $\mathbb{N}$ exactly in the case above. So one could argue that the proof using well-ordering is even less clear than the proof using Zorn's lemma. – Jannik Pitt Dec 12 '22 at 08:31
  • 1
    @JannikPitt Here the question is what your definition of finite-dimensionality is without referring to a basis. If it is the existence of a finite spanning set, then you can restrict the well-ordering in the proof to any spanning set. Then it becomes obvious that you don't need AC for finite-dimensional spaces; the "transfinite" proof is no more complicated. – Michael Greinecker Dec 12 '22 at 12:10
  • 2
    Something I find interesting is how while much of the time it's clear how to abstract a Transfinite Induction + Choice proof into a Zorn's Lemma argument, there are situations where this is not the case, specifically, when it is important that the induction be carried out along an initial well-ordering. My favourite example would be the existence of a subset of the plane which intersects every line precisely twice. Can anyone furnish a Zorn's Lemma proof? – Adam Epstein Dec 12 '22 at 14:18
  • 1
    @AdamEpstein That's a nice example, since in the TR construction it is important that the number of lines considered at any stage during the construction is less than the continuum. The naive ZL approaches to this, in contrast, fall prey to problems, particularly if the continuum is singular. But of course, one can set up a correct ZL argument simply by taking P to consist of partial (or complete) solutions of the recursion. Every chain has an upper bound, since the union of partial solutions is a solution, and the maximal element is the complete solution. – Joel David Hamkins Dec 12 '22 at 16:46
  • 1
    "...Zorn's lemma lets us recursively build a maximal chain through the poset...": you mean, Hausdorff maximal principle? – Zach Teitler Dec 13 '22 at 00:57
  • 1
    I do some mathematical research but my formal education is in physics. I've never seriously studied logic and set theory -- I did not attend any courses on that as a student and later it never seemed useful or interesting enough to spend time on learning it on my own. I'm comfortable with applying ZL but I would not be able to tell if a proof mentioning ordinal numbers is correct. Here I'm essentially admitting to be lazy and incompetent, but presumably I'm not the only one. This might be an explanation why some mathematicians might prefer ZL. – Blazej Dec 13 '22 at 18:59
  • 6
    I believe Sridhar Ramesh's comment is the correct answer to your objections, but to add a further point, one should invoke Zorn's lemma instead of transfinite induction in case it does not matter which maximal element one chooses. By contrast, transfinite induction makes it opaque whether or not you are using some special feature of the object you constructed above and beyond maximality. Transfinite induction obscures the key idea that any maximal independent set is a basis, and instead focuses on how exactly you obtain such a maximal set, which does not matter one iota in the proof. – Adam Přenosil Dec 13 '22 at 22:44
  • @PaceNielsen could you indicate me a good, succinct PDF on well order and transfinite induction and recursion? – rfloc Dec 17 '22 at 11:16
  • 1
    "Abracadabra Zornify" if it were possible, I would double-upvote this question just for those two words – am70 Dec 20 '22 at 08:28
  • 1
    I find it somewhat bizarre that just two weeks after I posted this question, a coauthor sent me a proof using Zorn's lemma! – Pace Nielsen Dec 26 '22 at 21:37

7 Answers7

58

I agree with almost everything in your post. But still, I believe I know why people use Zorn's lemma.

My answer. Zorn's lemma encapsulates succinctly many of the consequences of AC via transfinite recursion, but without requiring any involvement of the ordinals or knowledge of transfinite recursion to be used.

To those who are deeply familiar with transfinite recursion, of course, every use of Zorn's lemma can be seen as sumblimating the underlying construction, which achieves the maximal elements by a transfinite process that simultanesously explains why they exist. To appeal to Zorn seems to hide this essential explanatory underlying mechanism.

And yet, the alternative perspective is that Zorn's lemma abstracts away from the recursive process, producing in the end a simpler argument that relies only on the core consequences of the recursive process, which do not rely on any explicit engagement with ordinals or recursion. And precisely because of that feature, Zorn's lemma arguments can be undertaken and understood by mathematicians who are unfamiliar with the ordinals and transfinite recursion.

In the vector space example, to show every vector space has a basis, one can mount a transfinite recursive process: you pick an element, and then if it doesn't span, you pick another, and so on transfinitely until you have a basis. (My view of this example is a little different from how you described it, since I view the choice function as more primitive than the well order — I would build the basis by choosing amongst the element not yet in the span — indeed I prefer to view WOP itself as the outcome of recursively choosing elements.)

With Zorn's lemma, however, there is no need for ordinals or transfinite recursion, and the Zorn's lemma argument instead encapsulates abstractly their effects — the partial order consists in effect of partial undertakings of the recursive process. In this sense, the Zorn argument is simpler, abstracting away from the transfinite constructive "process".

I find the situation to be analogous to Martin's axiom and forcing. Martin's axiom is the poor mathematician's forcing, just as Zorn's lemma is the poor mathematician's choice+transfinite recursion.

My personal view is that the ordinals and transfinite recursion are one of the wonders of mathematics, a sublime achievement of the intellect resulting in many beautiful arguments and constructions. I tend to prefer the transfinite recursive arguments as providing a deeply explanatory account of the consequences of Zorn's lemma. (Even the well-order theorem seems fundamentally less mysterious when explained via the transfinite recursion — pick any element as the least element, and now pick a next element, and a next, and so on transfinitely.)

Further, although I recognize that many mathematicians have little involvement or experience with the ordinals and transfinite recursion, I also believe that their mathematical life would be improved by knowing more of them.

  • 24
    I agree. Well-ordered uncountable sets, ordinals, and transfinite induction are often regarded by "ordinary mathematicians" as unpleasant intrusions from the realm of logic and set theory. Zorn's lemma looks more like "ordinary mathematics" and hence is regarded as a minimally invasive surgical procedure. So it's a question of whether you want to cater to their prejudices, or instead teach them to stop worrying and learn to love the bomb. – Timothy Chow Dec 11 '22 at 18:36
  • 4
    While I follow your reasoning that "transfinite recursive arguments [provide] a deeply explanatory account of the consequences of Zorn's lemma", I find your claim that "Zorn's lemma is the poor mathematician's choice+transfinite recursion" somewhat bewildering. Obviously, when writing proofs (be it for a course or for a paper), it is necessary to make certain choices regarding which points to stress in the presentation. A deeper explanation of a topic also means that more resources need to be invested by the audience to understand this explanation. So when writing, say, [...] – Jochen Glueck Dec 11 '22 at 21:24
  • 2
    [...] a functional analysis book, one might appreciate the insights provided by transfinite induction, but still choose to use Zorn's lemma whenever a transfinite argument is required - since this requires less overhead and thus saves the readers' efforts for other arguments that one might consider to be more central for this book. Declaring this as a "poor mathematician's" choice is a bit like claiming that convergent sequences in metric spaces were the poor mathematician's general topology: it ignores that people from different fields necessarily focus on different aspects of an argument. – Jochen Glueck Dec 11 '22 at 21:26
  • 1
    @JochenGlueck Yes, those kinds of situations are also when people appeal to Martin's axiom, instead of using a more sophisticated treatment of forcing. And while that process is pedagogically sound, since as you say readers don't have the background, it doesn't mean that we shouldn't still view MA as the poor-mathematicians forcing. A deeper understanding of forcing will go beyond MA. And similarly, in my view, one can ultimately gain a deeper insight to transfinite recursion (but prob not in a functional analysis book). – Joel David Hamkins Dec 11 '22 at 21:36
  • 3
    @JochenGlueck: Does ZL require less overhead? Not really. Just like you don't prove things by (standard) induction by defining a set $S={n\in\Bbb N\mid n\text{ has some property}}$ and then showing that $S$ is an inductive set; but rather you just write the inductive proof, this is not very different from transfinite recursion. Especially in ZL-type arguments that always have finite character (see the Teichmüller–Tukey Lemma), and so the limit steps tend to be kind of trivial. – Asaf Karagila Dec 12 '22 at 00:27
  • 6
    @TimothyChow, I hope the joke in your analogy was intentional: Dr Strangelove was modeled in part on the very mathematician who gave the now-standard definition of von Neumann ordinals. –  Dec 12 '22 at 03:40
  • 2
    Is there something a non-expert could read that amplifies on your remark that Martin's Axiom is 'the poor mathematician's forcing'? – Mark Wildon Dec 12 '22 at 06:17
  • 1
    @AsafKaragila: I agree that induction does not require much overhead, in general. Yet, I think we're mostly talking about recursion - and even in the standard case over $\mathbb{N}$ I often find arguments that include recursive constructions surprisingly messy to write down (mainly since one often needs to establish several properties of the object that one has just constructed before one can go on to construct the next one). [...] – Jochen Glueck Dec 12 '22 at 09:36
  • 1
    [...] So yes indeed, in my experience Zorn's lemma often requires less overhead than transfinite recursion. You might convince me of the contrary, though, if you show me a proof of the Hahn-Banach extension theorem (with a level of detail that is appropriate for a standard functional analysis textbook) by transfinite recursion that does not require more overhead than the a proof by Zorn's lemma with the same level of detail. – Jochen Glueck Dec 12 '22 at 09:36
  • 2
    @JochenGlueck: You mean how there is always a hands-on proof for extending by one more point, and then just rinse-repeat until you're done? I mean, in the case of ZL you still need to show that every chain has an upper bound, on Wikipedia it is given as a "one-liner", but it's not an actual proof, since we haven't shown that every chain has an upper bound. Oh, it's trivial you say? Well, then the transfinite recursion is trivial too. Extend, extend, extend, and at limit step we have a chain, and you just agreed it's trivial that it has an upper bound. – Asaf Karagila Dec 12 '22 at 10:14
  • 2
    @JochenGlueck: The point here, of course, is about how comfortable you are with a technique. I don't need to think about anything when I cook aglio e olio and it comes out top notch, but despite it being a very simple technique requiring very few ingredients, the first dozen times I've tried to do it (properly, not just winging it) were stressful and I had to pay attention to a lot of details and still ended up burning the garlic half the time. – Asaf Karagila Dec 12 '22 at 10:16
  • 2
    @AsafKaragila: Well, but my point was not whether something is trivial (which depends on the perspective anyway), but how much overhead it requires to write the argument down. I now wrote an answer to compare both approaches in an explicit example. I think this example illustrates quite well, why I said that transfinite recursion tends to have more overhead. – Jochen Glueck Dec 12 '22 at 11:10
  • 1
    @MarkWildon This is a standard perspective on Martin's axiom, and indeed was the view of Martin himself from the start, and most accounts of MA will mention a version of it. When teaching forcing, it is quite typical to begin with Martin's axiom, since it provides experience with dense sets and genericity and forcing combinatorics, but without the added complexity of names and the forcing relation. (See Jech's Set Theory.) Students first master MA and some of its consequences, and then move on to forcing over countable models. Eventually, they are forcing over V itself. – Joel David Hamkins Dec 12 '22 at 13:39
  • 4
    @MattF. I often see that claim repeated on blogs, but at least Peter Sellers said quite definitively that "[Dr. Strangelove] was always Wernher von Braun." – Carl-Fredrik Nyberg Brodda Dec 12 '22 at 20:01
41

I agree with the existing answers, but I personally like Zorn's lemma both pedagogically and mathematically for an additional reason: the "poset of partial solutions" that it introduces is a valuable new idea in its own right. Even when it fails to have a maximal element for some reason - either because we're working in a choiceless context or, more commonly, because the poset doesn't satisfy the Zornian hypotheses - it can still be a useful tool. In particular I think of forcing arguments as such an application.

So for me, Zorn's lemma feels central because of how it enriches the idea of partial orders rather than because of its role as a theorem-prover. This is of course subjective, but transfinite recursion/induction doesn't really have the same impact for me.


EDIT: In this context, as much as I love well-orderings I think the reaction to transfinite recursion/induction described in Timothy's comment to Joel's answer is understandable. If I want to use TR/I to prove that maximal ideals exist, I need choice to build me the relevant well-ordering (or choice function) but once I have it everything is choice-free. By contrast, the poset of partial solutions exists and is easy to describe, it's just that in the absence of choice it might not have the properties I need it to. In caricature, Zorn's lemma amounts to a wild analysis of a straightforward object while the TR/I approach is a straightforward analysis of a wild object.

Noah Schweber
  • 18,932
  • 2
    I like this point very much. – Joel David Hamkins Dec 11 '22 at 21:40
  • 3
    This is in line with Matteo Viale's approach to Zorn's Lemma as a forcing axiom. – Asaf Karagila Dec 11 '22 at 21:47
  • 4
    One good example to illustrate this is the lemma in field theory that if $L/K$ is algebraic and $M/K$ is algebraically closed, there is a $K$-homomorphism $L \to M$. The poset of all partial extensions is clearly interesting in its own right, and to be honest it would be very awkward to me to write this proof down with a well-ordering of $L$. – Martin Brandenburg Dec 11 '22 at 21:53
  • 1
    @AsafKaragila I'm not aware of that - is this in a paper somewhere? – Noah Schweber Dec 11 '22 at 21:58
  • 2
    @Noah: Probably? It was something he's been saying in talks about forcing axioms since quite a few years. I'd try maybe one of the expository papers (possibly with one or more of his students). – Asaf Karagila Dec 11 '22 at 21:59
  • 2
    @NoahSchweber Matteo's point is just this: if you turn the order upside down, then minimal elements are generic filters, since they are atoms in the forcing notion. So Zorn's lemma asserts that every forcing notion for which every chain has a lower bound has a (fully) generic filter. – Joel David Hamkins Dec 11 '22 at 22:03
  • 2
    A contrast to your edit is to build the maximal ideal by TR using a choice function at each step, rather than a WO. Philosophically I find this way of thinking more appealing, since the role of the WO is just to guide choices, but if we think of choosing as a primitive triviality, banal even, then TR seems more natural. You just choose to extend, extend, extend, until you are maximal. – Joel David Hamkins Dec 11 '22 at 23:03
  • 2
    @JoelDavidHamkins I think that relies on a false intuition, though - when you look at it carefully, a general choice function is still quite wild, and there's no canonical one. So I still would put that in the category of "wild object, straightforward analysis," it's just that some mathematicians might not realize why the object in question is wild at first. – Noah Schweber Dec 11 '22 at 23:21
  • 1
    @MartinBrandenburg and Noah: I really like your answers/comments. +1! To motivate further discussion consider the following counterarguments. (1) Just because the poset of ideals of a ring is natural, that doesn't mean that the existence of a specific object (like a maximal ideal) is naturally viewed in that lens, rather than in the lens of constructing a maximal ideal via building up to it. (2) Without choice at the start, what sort of poset are you creating for the algebraic extensions of $\mathbb{Q}$? (There may be no algebraic closure in which to work, etc...) Choice seems fundamental. – Pace Nielsen Dec 11 '22 at 23:33
  • 1
    @PaceNielsen For (2), if you don't want to use a poclass, just look at all algebraic extensions in (say) $V_{\omega_{17}}$. So I strongly disagree that choice is fundamental here. For (1), I don't think that's in tension with the point of my answer (and in particular the "wild object/straightforward analysis vs. straightforward object/wild analysis"). – Noah Schweber Dec 11 '22 at 23:37
  • 1
    @NoahSchweber My (2) comment was more directed to Martin, trying to explain that the picture of the poset without AC is quite different than the picture with AC (even for easy objects like algebraic extensions of $\mathbb{Q}$, say in $V_{\omega_{17}}$). Assuming AC from the start gives us a handle on how to build up more an more extensions. Assuming it later, gives us the mistaken impression that the poset under Zorn looks just like without Zorn, when in fact that hidden AC assumption changes things radically. – Pace Nielsen Dec 11 '22 at 23:52
  • 1
    That said, this might challenge your "straightforward object" characterization of ZL. Before the use of ZL, the object is actually quite wild. – Pace Nielsen Dec 12 '22 at 00:05
  • 2
    @PaceNielsen I don't agree with that at all. The sense of "wild" I meant was a rather specific one - it's not the properties of the object I'm looking at, but rather the complexity of even pointing at the object in the first place. If you prefer, replace "straightforward" with "nicely definable." The point is simply that the relevant poset is an unambiguously extant and interesting object even pre-choice (or sans choice), in a way that the appropriate well-orderings/choice functions aren't. – Noah Schweber Dec 12 '22 at 00:07
  • 3
    That "caricature" is spot on. – Asaf Karagila Dec 12 '22 at 00:28
  • 2
    Noah, I think you misunderstood my point, which was not aimed at your "wild" concept, but rather just the point in your edit, that we should first fix a well-order, whereas I would want to fix a choice function. I take it that both are "wild" on your account, but it seems more primitive to use the choice function than the well-order, since I think there is little pre-reflective support for the existence of well-orders, while there is enormous support for the existence of choice functions. – Joel David Hamkins Dec 12 '22 at 00:36
  • 2
    @JoelDavidHamkins Ah, sorry, I did misunderstand; that's quite fair. – Noah Schweber Dec 12 '22 at 00:40
  • 1
    @NoahSchweber Suppose that I changed the theorem in my question to the new "Theorem: Every well-orderable vector space $V$ has a basis." Moreover, suppose I change the first sentence to read "For every possible well-ordering of $V$, consider the following process." Would this new proof completely avoid any "wildness" in your view? (Other than, perhaps, in the new last sentence which would read: "There is at least one such basis by hypothesis.") Pedagogically, I could see some argument for this sort of "consider all possibilities" proof, although I'm more Platonic in my AC uses. – Pace Nielsen Dec 12 '22 at 01:46
  • 1
    This answer is very much how I think of ZL. Following up on @JoelDavidHamkins’ comment, I think also ZL points us to precisely where AC is usually needed — not on subsets of the “original space”, whatever that may be, but to choose some extension of every non-maximal “partial solution”. In principle, AC may be also needed to choose some upper bound for chains of partial solutions; in practice, as per my related question, there is almost always a canonical choice for that, and it’s almost always a supremum. – Peter LeFanu Lumsdaine Dec 12 '22 at 15:12
  • 1
    @PaceNielsen Are there examples of infinite-dimensional vector spaces with "explicit" well-order (not using AC)? Of course, you can start with a well-ordered set and equip a vector-space structure on it. But do you know examples without some use of AC? Otherwise, your theorem is not very meaningful. – Oleg Eroshkin Dec 12 '22 at 20:46
  • 3
    @OlegEroshkin Yes, that occurs quite often. For instance, the polynomial ring $F[x]$, as an $F$-vector space, has an explicit well-order as long as $F$ does (so, e.g., this works when $F=\mathbb{Q}$). – Pace Nielsen Dec 12 '22 at 21:17
  • 2
    @PaceNielsen OK, that's a nice example, but do you know examples without an explicit basis? – Oleg Eroshkin Dec 12 '22 at 21:26
  • 4
    @OlegEroshkin That's a much harder question to answer, because if I had an example, one could always argue that the basis you get from the well-order is an explicit one. – Pace Nielsen Dec 12 '22 at 21:33
  • 4
    I think this conversation should probably continue in chat rather than the comment thread (if only for the selfish reason that I keep getting pinged!). – Noah Schweber Dec 12 '22 at 21:40
19

Your choice (ha) to prove the existence of a basis by using a well-ordering in place of Zorn’s Lemma turns things around historically: the first proof of the existence of algebraic bases (using only finite linear combinations) in an infinite-dimensional example was given by Hamel in 1905 for $\mathbf R$ as a $\mathbf Q$-vector space using well-orderings (his interest in this was to show there are additive maps $\mathbf R \to \mathbf R$ that are not of the form $f(x)=cx$). That is why vector space bases in the algebraic sense are called Hamel bases.

Zorn’s paper proving Zorn’s lemma appeared in 1935. But many abstract existence theorems that are usually proved today with Zorn’s lemma were known before 1935 and they were first proved by well-ordering or transfinite induction. For example, Steinitz proved the existence of an algebraic closure of every field in 1910 using well-ordering and Hahn and Banach proved (independently) the existence of an extension of bounded linear functionals in the late 1920s used transfinite induction. Some originally controversial proofs based on the axiom of choice are listed in Section 6 here, and it is still common to use choice directly in some of those proofs today (like Vitali’s nonmeasurable subset of the real numbers).

While you may feel proofs making a direct use of a well-ordering or transfinite induction is more appealing than Zorn’s lemma, the formulation of these equivalent statements in a proof as Zorn’s lemma is more attractive to others, at least psychologically: it is regarded as more convenient. (Ultimately such proofs are nonconstructive no matter what method is used.) Your point $3$ (“Zorn’s lemma is no easier than transfinite induction”) is not borne out in practice: most mathematicians do think Zorn’s lemma is easier. You could argue that this is only true today because all the modern textbooks outside of set theory have been written to emphasize Zorn’s lemma over other approaches (to the extent that the other ones may not even be mentioned except in passing), but the reason they are written this way is because mathematicians in the era after 1935 also felt Zorn’s lemma is simpler. After all, many fundamental existence theorems we see today that are proved by Zorn’s lemma were first proved by well-ordering or transfinite recursion. Ask yourself why those original proofs were abandoned.

The purpose of Zorn’s paper was to offer a new principle that could replace the direct use of transfinite induction, especially in algebra, and it succeeded: outside of set theory, proofs that were originally done with well-orderings or transfinite induction are now essentially all done with Zorn’s lemma. As Joel indicates in his answer, proofs using Zorn’s lemma avoid needing to develop the background material about ordinal numbers and their well-ordering. That is arguably why proofs by Zorn’s lemma became the dominant approach in mathematics.

While you said that in contrast to your student days you can’t remember the last time you used Zorn’s lemma, perhaps you have more recently used results that are consequences of it (e.g., maximal ideals in a general nonzero commutative ring or the algebraic closure of a general field or the extension of a non-Archimedean absolute value from a field to a huge extension of that field). If so, then your mathematical work would still be relying on it. Also consider that Zorn’s lemma is a popular tool for foundational existence theorems in great generality throughout mathematics, and once you get past the foundations you are using those existence theorems all the time rather than the methods like Zorn’s lemma that proved those existence theorems. For example, the most fundamental foundational result in functional analysis is the Hahn-Banach theorem. It is proved by Zorn’s lemma in all functional analysis books today, so even if a functional analyst is not directly using Zorn’s lemma in their work, they are indirectly using it each time they appeal to the Hahn-Banach theorem.

KConrad
  • 49,546
  • Your answer currently ends mid-sentence (curiously, added in an edit). – LSpice Dec 11 '22 at 19:03
  • 7
    I am curious about the point "most mathematicians do think Zorn’s lemma is easier." How do you know? At least for me, one thing reading MO teaches me is that most things I think I know about most mathematicians are wrong. – LSpice Dec 11 '22 at 19:03
  • 1
    @LSpice thanks for catching that mid-sentence save. I have finished that paragraph now. – KConrad Dec 11 '22 at 19:17
  • 5
    @LSpice concerning most mathematicians thinking Zorn’s lemma is easier to use, today I think this holds from the way papers and books have been written for many decades. (Shall we blame van der Waerden?) Zorn’s lemma takes getting used to, but it’s in books on group theory, field theory, commutative algebra, topology, functional analysis, etc. so you catch on to how it works. Proofs 100 years ago via transfinite recursion or well-ordering have largely disappeared from the literature that is not in set theory or areas of math near set theory. Do you often see proofs by transfinite induction? – KConrad Dec 11 '22 at 19:41
  • 7
    @KConrad I have caught myself coming up with proofs by transfinite recursion and then rewriting them to cover my tracks. There might be others. – Michael Greinecker Dec 11 '22 at 19:49
  • 1
    @KConrad You mentioned someone using maximal ideals as implicitly using Zorn's lemma. As a (friendly!) challenge, I invite you to write a proof that maximal ideals exist using Zorn's lemma, and another proof using transfinite induction. (Both with all the necessary details.) If done right, you might be surprised that all the elements of the inductive proof will be present in the Zorn proof, but the Zorn proof will be longer and more complicated. – Pace Nielsen Dec 11 '22 at 19:57
  • 6
    @LSpice: I agree with your point that it's difficult to know what "most mathematicians" think. Regarding your remark on reading MO though, I think it's fair to point out that the MO community is not representative of the mathematical research community in general. Arbitrary example: a search for the algebraic geometry tag on MO yields 20,059 questions, while a search for the PDE tag yields 3,826 questions. Comparing this to the arXiv (which yields, for 2022, twice as many results for PDEs than for algebraic geometry) indicates that algebraic geometry is extremely overrepresented on MO. – Jochen Glueck Dec 11 '22 at 20:07
  • 1
    @PaceNielsen: Hmm, I might be misunderstanding you - but I just wrote down the proof of the existence of a maximal ideal in a commutative ring with $1$ via Zorn's lemma, and I found it to be extremely simple and clear - in particular, since the claim ("existence of a maximal element") is worded in a way which perfectly fits the framework of Zorn's lemma. If I'm supposed to do it via transfinite recursion, though, the only idea that comes to my mind is to do it via recursion over a well-ordering of the ring - but I fail to see why this is shorter or less complicated. Could you elaborate? – Jochen Glueck Dec 11 '22 at 20:24
  • 1
    @JochenGlueck, re, could one write a proof in which one does induction on the (ordinal) number of successive enlargement 'steps' to go from a non-maximal ideal to a maximal one, it being impossible to enlarge more times than the cardinality of the ring? – LSpice Dec 11 '22 at 20:30
  • 1
    @LSpice: Good point - that idea strikes me as more natural than the "well-ordering of the ring" approach that I suggested (as an analyst, I also like that your argument has a somewhat "analytic touch" - it loosely reminds me of the proof that an increasing function $\mathbb{R} \to \mathbb{R}$ has at most countably many discontinuities). I'm still not sure, though, why Pace Nielsen would consider the proof via Zorn's lemma to be more complicated than that. – Jochen Glueck Dec 11 '22 at 20:39
  • 1
    @PaceNielsen I agree with Jochen that proving there are maximal ideals gives an unfair advantage to Zorn's lemma since maximality for Zorn's lemma is baked into the meaning of a maximal ideal: a proof of the existence of maximal ideals in all nonzero commutative rings by Zorn is 1-2 lines long. While the step in a proof by Zorn's lemma where you show by contradiction that the maximal element is the object you want (if it weren't, you build a strictly bigger object) corresponds to the recursive step in transfinite induction, the Zorn's lemma method has less technical overhead. – KConrad Dec 11 '22 at 20:41
  • 1
    @JochenGlueck, re, your point is well taken, but anyway even a not wholly representative sample like MO is more representative than any other sample of mathematical thought I'm likely to gather by myself--so, if what I think everyone does conflicts with what I see on MO, it seems more reasonable to assume I got it wrong than that counterexamples are overrepresented. But my impressions are also quite naïve. – LSpice Dec 11 '22 at 20:45
  • 3
    @KConrad I mostly agree with you, but as one example of a proof by transfinite induction in an algebra book, see Chapter 4, Exercise 17 in Atiyah and Macdonald. – Timothy Chow Dec 11 '22 at 20:45
  • 1
    @MichaelGreinecker: I remember myself doing the same thing once. But my intention in rewriting the proof was not to cover my tracks - I just noted that the proof became half as long and much less technical the moment I replaced transfinite recursion with Zorn's lemma. As with many techniques, I think it's fair to say that, on some occasions, one technique can be more suitable to help us to come up with a proof, while an alternative technique might be more suitable to write down the proof in a clear and efficient way. – Jochen Glueck Dec 11 '22 at 20:53
  • 1
    @LSpice: "so, if what I think everyone does conflicts with what I see on MO, it seems more reasonable to assume I got it wrong" Point taken and agreed. – Jochen Glueck Dec 11 '22 at 20:55
  • 2
    @JochenGlueck I do think proofs that avoid transfinite recursion are often shorter and slicker. At least to me, they are often less intuitive, though. – Michael Greinecker Dec 11 '22 at 21:59
  • 1
    @JochenGlueck My proof for the existence of maximal ideals would be quite similar to the one I gave for bases for vector spaces. It can be summed up in one sentence: Keep ajoining elements as long as they don't generate the trivial ideal when put together with the previously kept elements. Any further elaboration would be essentially similar to checking the details of a Zorn's lemma proof. – Pace Nielsen Dec 11 '22 at 23:20
  • 1
    @PaceNielsen: I've now written an answer where I took up your challenge to write down both proofs in detail. The one by transfinite recursion is much longer. Would you be so kind to check and, in case that I got the recursion argument unnecessarily complicated, improve it to demonstrate how it can be made similarly simple as the argument by Zorn's lemma? – Jochen Glueck Dec 12 '22 at 11:12
  • 1
    @JochenGlueck's answer referenced above. – LSpice Dec 12 '22 at 17:18
  • 1
    @LSpice: Thanks for the adding the link! By the way, could you explain how you manage to link to individual comments? It seems that I can't figure out how to do this. – Jochen Glueck Dec 12 '22 at 17:23
  • 1
    @JochenGlueck, re, if you right-click on the time stamp next to a commenter's name, then you can get a link. See Is linking to comments possible?. – LSpice Dec 12 '22 at 17:27
  • 1
    @LSpice: Hah, this is great! Thanks! – Jochen Glueck Dec 12 '22 at 17:28
13

The answers and comments so far contain a lot of abstract and philosophical discussion. Personally I think that, when comparing the advantages and disadvantages of two approaches, concrete examples are important. So I'll give one here (in response to the OP's suggestion from a comment).

Let us compare how Zorn's lemma and transfinite recursion work for the following theorem.

Theorem. Let $R$ be a non-zero ring with a multiplicatively neutral element $1$. Then there exists a maximal ideal in $R$.

Below, there are four proofs of the theorem - one by Zorn's lemma and three by transfinite recursion. The one by Zoern's lemma and the first two by transfinite recursion need the following observation, which I thus outsource into the subsequent lemma.

Note. In an earlier version of this answer I (that contained fewer proofs rather than four) I compared the proofs and stated my opinion about which of the proofs I find easier. Since then, the answer has been expanded, so I removed my personal evaluation of the "simplicity" of the proofs - but I think it is still worthwhile to have all those proofs explicitly written down here, so that one can easily compare them.

Lemma. If $\mathcal{I}$ is a non-empty set of proper ideals in $R$ which is totally ordered with respect to set inclusion, then $\bigcup \mathcal{I}$ is also a proper ideal in $R$.

Proof of the lemma. Let $a,b \in \bigcup \mathcal{I}$ and $r \in R$. Then there are $I_1,I_2 \in I$ such that $a \in I_1$ and $b \in I_2$. As $\mathcal{I}$ is totally ordered, one of the ideals $I_1,I_2$ - call it $I_k$ - contains the other. Hence, $a,b \in I_k$, and thus the elements $a+b$, $ab$, $ar$, $ra$ are also in $I_k$ and thus in $\bigcup \mathcal{I}$. So $\bigcup \mathcal{I}$ is indeed an ideal. Finally, this ideal is proper since none of the ideals in $\mathcal{I}$ contains $1$, and hence neither does $\bigcup \mathcal{I}$. $\square$

Proof of the theorem via Zorn's lemma: Let $\mathcal{J}$ denote the set of all proper ideals in $R$; this is a poset with respect to set inclusion. Note that $\mathcal{J}$ is non-empty since it contains $\{0\}$. If $\mathcal{I} \subseteq \mathcal{J}$ is a non-empty chain, then the lemma implies that $\bigcup \mathcal{I} \in \mathcal{J}$, and clearly $\bigcup \mathcal{I}$ is an upper bound of $\mathcal{I}$. So by Zorn's lemma $\mathcal{J}$ has a maximal element. $\square$

First proof of the theorem by transfinite recursion (taken from a comment by Joel David Hamkins). We recursively define an increasing family of proper ideals $I_\beta$, indexed over the ordinal numbers:

  • $I_0 := \{0\}$.

  • If $\beta$ has a predeccesor $\alpha$, see if $I_\alpha$ is maximal. If yes, the proof is complete; if not, choose $I_\beta$ to be a larger proper ideal than $I_\alpha$.

  • If $\beta$ is a limit point, define $I_\beta$ to be the union of the preceding ideals. Then $I_\beta$ is, according to the lemma, also a proper ideal.

This procedure has to terminate at some ordinal number; if it didn't, we would find an ideal in $R$ whose cardinality is strictly larger than the cardinality of $R$. $\square$

Second proof of the theorem by transfinite recursion. Endow $R$ with a well-ordering. We recursively define proper ideals $I_r$ for each $r \in R$: assume that $I_s$ has already been defined for all $s < r$. If the ideal generated by $\bigcup_{s < r} I_s \cup \{r\}$ is not equal to $R$, then we define $I_r$ to be this ideal; this is clearly proper. Otherwise we define $I_r := \{0\}$ if $r$ is the smallest element in $R$ (so $I_r$ is proper since $R \ne \{0\}$), and $I_r:= \bigcup_{s < r} I_s$ in any other case; then $I_r$ is a proper ideal due to the lemma.

The family $(I_r)_{r \in R}$ is increasing, so the lemma implies that $I := \bigcup_{r \in R} I_r$ is a proper ideal. If $t \in R$ is not in $I$, then $t \not\in I_t$, so the ideal generated by $\bigcup_{s < t} I_s \cup \{t\}$ is $R$, and hence the ideal generated by $I \cup \{t\}$ is $R$. Thus, $I$ is maximal. $\square$

Added by Pace Nielsen:

A different proof of the theorem by transfinite recursion, without using the lemma. Given a well-ordering $(r_{\alpha})$ for $R$, we recursively set $$s_{\alpha}:= \left\{ \begin{matrix} r_{\alpha} & \text{ if }1\notin \langle r_{\alpha},s_{\beta}\ :\ \beta<\alpha\rangle\\ 0 & \text{ if }1\in \langle r_{\alpha},s_{\beta}\ :\ \beta<\alpha\rangle.\end{matrix}\right.$$ The ideal $M$ generated by the $s$'s is not $R$, otherwise $1$ would be generated by a minimal finite number of them $s_{\alpha_1},\ldots,s_{\alpha_k}$, with $\alpha_1<\ldots< \alpha_k$. This contradicts the minimality of $k$ if $s_{\alpha_k}=0$ (noting that $k\neq 0$ since $R\neq \{0\}$), otherwise this directly contradicts the definition of $s_{\alpha_k}$. Moreover, no new element can be adjoined to $M$ without generating the trivial ideal, so $M$ is maximal. $\square$

(If you like, you could also explain why the ideal $M$ equals the set of $s$ elements.)

Jochen Glueck
  • 11,625
  • Maybe a stupid question ... but why is the ideal generated by $I_t \cup {t}$ equal to $R$? – Martin Brandenburg Dec 12 '22 at 11:46
  • 6
    So, your argument is "I am much less familiar with TR/I, so it requires more careful proof writing and therefore significantly more overhead and more time in reading it as well". But that is not a good argument. Now, don't get me wrong. Some proof do lend themselves to ZL much more naturally than to TR/I, but the "overhead" is about how sloppy you are willing to be. In the case of ZL we all have so much experience, that we can be very sloppy. I have seen enough set theory papers where the proof literally read "By transfinite recursion." or something like that, which have no overheads. – Asaf Karagila Dec 12 '22 at 11:53
  • 8
    @AsafKaragila: No, my argument is that both proofs in my answer are equally carefully written, and still the second one is much longer (in fact I'm not under the impression that I am less familiar with transfinite induction than with ZL; I use both of them now and then). You seem to argue that the second proof includes more details than the first one, though. Could you please elaborate which details you would add to the ZL proof such that you would consider it to have a similar level of detail as the second proof? – Jochen Glueck Dec 12 '22 at 12:17
  • @MartinBrandenburg: Ah, sorry. I messed up the definition of $I_r$. Fixed now. If $\bigcup_{s < t} I_t \cup {t}$ were equal to $R$, then by the definition of $I_t$, $I_t$ would include $t$. – Jochen Glueck Dec 12 '22 at 12:26
  • 4
    "Fix a well-ordering of $R$, and construct by recursion an increasing sequence of ideals, $I_r$, by adding the $\alpha$th element in the $\alpha$th stage, if possible; taking unions at limit stages. This has to terminate, and therefore the ideal has to be maximal." – Asaf Karagila Dec 12 '22 at 13:19
  • 4
    My version of the proof by transfinite recursion: start with the trivial ideal, and at each stage, choose a proper extension if there is one. At limits take unions. Since this can't continue forever, one finds a maximal ideal. – Joel David Hamkins Dec 12 '22 at 13:19
  • @AsafKaragila We have the same view. – Joel David Hamkins Dec 12 '22 at 13:19
  • Note that there is no need to fix a wellorder, since one can just use AC directly. – Joel David Hamkins Dec 12 '22 at 13:21
  • 6
    @AsafKaragila: "Fix a well-ordering of R, and construct by recursion an increasing sequence of ideals, $I_r$, by adding the αth element in the αth stage" But this is not what actually happens: at some stages you do not add the $\alpha$th element, but continue without it. Apart from this, you are suggesting to shorten the recursive proof by leaving out a lot of technical details, whereas the ZL proof that I gave does not leave out any technical details at all. (If you disagree with this, it would be nice if you could point out which technical details you think are missing.) – Jochen Glueck Dec 12 '22 at 14:07
  • 1
    @JochenGlueck I don't think my short proof is missing any technical details. In my view, the construction simply doesn't require the complexities and notation you employ, either for the ZL proof or for the TR proof. – Joel David Hamkins Dec 12 '22 at 15:19
  • 4
    @JoelDavidHamkins: If one takes the transfinite recursion argument that you suggest (which is different from the one in my answer, that was essentially suggested by the OP), then I agree that things become rather simple. I'm not sure, though, what you mean by "I don't think my short proof is missing any technical details." Clearly, you leave many things for your readers to figure out for themselves: for instance, you write "at limits, take unions". You don't write which limits you are referring to, you don't write which sets you unify, [...] – Jochen Glueck Dec 12 '22 at 15:49
  • 4
    [...] and you don't explicitly point out what you do with the unions. You did not mention either that (and where) you use that the monotone union of proper ideals is again a proper ideal. So I find that you actually left out a lot of details. (I have no objections against doing so whenever you think this is fine for your audience. I just object the claim that there were no technical details missing from what you wrote.) – Jochen Glueck Dec 12 '22 at 15:50
  • 4
    But isn't this just the point? When one is familiar enough with TR, those details are obvious and don't need to be stated. For a reader less familiar with TR, one needs to include boring obvious details. There is a translation of details between ZL and TR, with the upper bound condition corresponding to the limit case, and the appeal to Zorn to get the maximal node corresponding to the appeal to AC at successors. – Joel David Hamkins Dec 12 '22 at 15:56
  • 1
    @JoelDavidHamkins: I have the following situation in mind: say one teaches the existence of maximal ideals in an advanced undergraduate course; the students don't know Zorn's lemma nor transfinite recursion, yet. Option 1: introduce ZL and use it to prove the theorem; option 2: introduce transfinite recursion and use it to prove the theorem. In case 1, I would feel comfortable to give precisely the proof from my answer; in case 2, I would not feel comfortable to give only the level of detail from your comment. – Jochen Glueck Dec 12 '22 at 16:16
  • 3
    Right, so this is just about familiarity with the method, which is what we are saying, and this is what I argued in my answer. What I disagree with is your claim that TR somehow inherently requires more complexity. It doesn't seem to, when the reader is familiar with the method. That said, for a class of beginners to understand why ZL is true, I would immediately start describing the process of building a transfinite chain, iteratively extending nodes forever until one finds oneself at a maximal element. That is, the intelligibility of TR underlies our reasons for accepting ZL. – Joel David Hamkins Dec 12 '22 at 16:24
  • 1
    @JoelDavidHamkins: "so this is just about familiarity with the method" Yes, I think we can agree on this. "your claim that TR somehow inherently requires more complexity" I don't think that I claimed this (at least, I did not intend to claim this). I just claimed that, if one writes down both proofs for the specific theorem in my answer with high - and comparably high - level of technical detail, this amounts to more technical overhead for the recursive argument. (But I admit that the difference becomes pretty small if one uses your recursive argument rather than the one in my answer.) – Jochen Glueck Dec 12 '22 at 16:45
  • Please note that I have no intention of claiming that Zorn's lemma were somehow "better" or easier in a general sense. As I said in some comments to other answers, I like to sometimes use transfinite recursion myself. I just believe that there are good reasons why proofs by ZL are sometimes perceived to be easier or clearer. – Jochen Glueck Dec 12 '22 at 16:45
  • 1
    It might be interesting to compare a different example, eg, existence of a minimal prime ideal in a commutative ring. The ZL proof is reasonably familiar, but I don't know how easy TI/TR would be, perhaps an expert here could say. – Zach Teitler Dec 12 '22 at 17:49
  • 3
    @JochenGlueck Thank you for taking a shot at trying my friendly challenge. I think this brings up another reason for using ZL: Some people will find it easier to create proofs in that framework. That said, your TI proof is quite different than the one I would have written. I'm going to edit your answer with my personal TI proof, and then you are welcome to compare/constrast with your ZL proof. (If you disagree with me editing this way, feel free to revert.) – Pace Nielsen Dec 12 '22 at 17:53
  • @PaceNielsen: Yes, I would be great if you could adjust the proof to show how you would have written it. – Jochen Glueck Dec 12 '22 at 18:07
  • 2
    @PaceNielsen For what is worth, and it is of course only anecdotal, I find your solution much harder to read than the Zorn lemma one (and I'm somewhat familiar with transfinite recursion). I needed to do more than a double take to understand what you meant with "keeping" an element. I'm sure this is on some level a matter of personal preference, but your solution strikes me as much more informal and less detailed than Jochen Glück's. – Denis Nardin Dec 13 '22 at 09:33
  • 2
    @PaceN To expand on what Denis wrote, I think that a lot of the complication in your proof is hidden behind the inoffensive-sounding word "keep". If you try rewriting the proof in the language of sets without using that word, it suddenly becomes much more convoluted. And if an undergrad honestly asks "what do you mean by 'keep'?" you'll have to do it. – Federico Poloni Dec 13 '22 at 21:47
  • @FedericoPoloni Agreed. But the same goes for "poset" in the ZL proof. :-) – Pace Nielsen Dec 13 '22 at 23:44
  • 1
    @PaceNielsen: Thanks for adding your proof! I tend to agree with Denis and Federico: Both our proofs use preliminary order theoretical knowledge which we don't spell out - my ZL proof uses posets and chains, yours uses well-orders. But if we assume this order theoretical knowledge to be given in both cases, then I don't think that there are any technical details missing from the ZL proof, while I agree with Federico that are still many details missing about what is meant by the word "keep". – Jochen Glueck Dec 14 '22 at 06:43
  • 1
    @JochenGlueck I personally felt "kept" was perfectly understandable, and not really related to any order-theoretic knowledge, but I've added details since you objected. In turn, I would object that your "proof of the lemma" is lacking important details (not related to order-theoretic details). To make a proper comparison, you might want to add those details (and perhaps remove your old TR proof). – Pace Nielsen Dec 14 '22 at 16:24
  • 1
    Actually, now that I think about it, the lemma also is hiding some very important order-theoretic details. Namely, a union of ideals is not always an ideal, so something very special happens with chains. Thus, those missing details are quite important. – Pace Nielsen Dec 14 '22 at 16:30
  • 1
    @PaceNielsen: Thanks for the edit! Yes, the lemma certainly includes relevant properties of chains (I would even go as far as saying that the lemma contains the essence of the arguments in both my proofs). The reason why I didn't prove the lemma is that the same lemma was also needed for my TR proof, so it didn't matter for the comparison. But your latest comment made me note for the first time that your TR proof actually does not use this lemma. I think that's indeed a very good point. I have to urgently prepare a lecture now, but I'll edit the post tomorrow to make for a better comparison. – Jochen Glueck Dec 14 '22 at 17:19
  • @PaceNielsen: Sorry the delay - somehow I just forgot that I planned to edit the post. I've now added the proof of the lemma, and yet another proof of the theorem by transfinite recursion. (I also removed my conclusion which proof if find easiest, since I think it might be better to just have the various proofs in the answer so that everybody can compare for themselves - which can, or course, still be discussed in the comments, though.) – Jochen Glueck Jan 20 '23 at 11:00
  • @PaceNielsen: Anway, I read your proof again, and I am now having trouble to understand the following part of the argument: "This contradicts the minimality of $k$ if $s_{\alpha_k}=0$." Probably I am just overlooking something simple - but could you elaborate on why there is a contradiction? – Jochen Glueck Jan 20 '23 at 11:06
  • @JochenGlueck If $s_{\alpha_k}=0$, then the other elements generate $1$ without it. (Zero doesn't contribute anything new.) – Pace Nielsen Jan 20 '23 at 16:12
  • @PaceNielsen: Hmm, I'm still confused. If $s_{\alpha_k} = 0$, then it follows from the definition of $s_{\alpha_k}$ that $1 \in \langle r_{\alpha_k}, s_\beta: \beta < \alpha_k \rangle$. Why does this imply that $1 \in \langle s_{\alpha_1}, \dots, s_{\alpha_{k-1}} \rangle$? – Jochen Glueck Jan 20 '23 at 18:54
9

Before transfinite induction or Zorn's lemma, there's already a choice with the integers. You can study elementary number theory using weak induction (WI)

$$(\forall P \subseteq \mathbb N)\bigl((P(0) \wedge \forall n (P(n) \implies P(n+1))) \implies \forall n P(n)\bigr) \tag{*}\label{WI}$$

or using the least-number principle (LNP):

$$(\forall P \subseteq \mathbb N)\bigl(P = \emptyset \vee \exists n (P(n) \wedge \forall m (P(m) \implies n \leq m))\bigr). \tag{**}\label{LNP}$$

Remarks:

  • I find WI closer in spirit to transfinite induction, and LNP closer in spirit to Zorn's lemma.

  • My sense is that WI is a more common pedagogical choice in this setting than LNP.

  • Nevertheless, it's possible to teach elementary number theory by invoking LNP repeatedly without ever discussing WI. For instance, the Ross Mathematics Program takes this approach systematically.

  • (It occurs to me that LNP only says that $\mathbb N$ is well-ordered, while WI also proves that $\mathbb N$ has order type $\omega$. The stronger claim follows from LNP plus the fact that $\mathbb N$ is an ordered commutative monoid with cancellative addition.)

Pros and cons of WI versus LNP

The philosophy at Ross is that students understand things more deeply by using LNP rather than WI.

  1. WI feels much more like a separate "law of reasoning", on a par with modus ponens, whereas LNP feels much more like an "axiom", more on a par with the associativity of multiplication in a group. I think that most mathematicians outside logic more-or-less learn the basic "laws of reasoning" once, and then set them in stone, while fluidly learning, inventing and considering new axioms for new mathematical objects, all the time. It’s convenient for them to keep the "laws of reasoning" to a minimum, packaging as many mathematical concepts as possible as "axioms", which are more flexible and more amenable to variations, generalizations, and modifications later.

  2. This is particularly convenient at Ross, where students are introduced to many things at once, e.g. the basic laws of reasoning at the same time as the abstract notion of a ring. For them the natural numbers $\mathbb N$ play a dual role — as a primitive domain for performing induction, and as a first example of the general notion of commutative semirings. So using LNP nicely identifies $\mathbb N$ as “the unique commutative semiring which is well-ordered”, while WI identifies $\mathbb N$ more awkwardly as “a commutative semiring with a separate logical role.”

  3. LNP encourages second-order reasoning, and picturing subsets geometrically, in a way that WI doesn't.

Pros and cons of transfinite induction versus Zorn's lemma

For transfinite induction and Zorn's lemma, maybe the shoe is on the other foot. After all, the ordinals are rarely thought of as a special instance of a more general class of mathematical objects. Perhaps it's more honest to consider the ordinals logically privileged. Maybe the image of building something step by step is more helpful than the image of teleporting to the maximal element.

LSpice
  • 11,423
Tim Campion
  • 60,951
  • 3
    For me, the reason the well-ordering principle is used at the Ross program in place of induction is that when using well-ordering there to prove there are no integers between 0 and 1 (sketch: if there is $0 < n < 1$ for a positive integer $n$, then $0 < n^2 < n$, so the nonempty set ${n : 0 < n < 1}$ has no least element, contradicting well-ordering), it feels like a proof. But proving that by induction ($1 \geq 1$, and if $n \geq 1$ then $n+1 \geq 1$) feels circular because the only reason we accept induction is that we know intuitively that the positive integers start at 1! – KConrad Dec 11 '22 at 20:49
  • 6
    Slightly off topic, but people might find this article interesting: Are induction and well-ordering equivalent? by Lars–Daniel Öhman. – Timothy Chow Dec 11 '22 at 20:49
  • Every other use of well-ordering in proofs at the Ross program is just an inductive proof dressed up to fit the well-ordering format since students are not allowed to use induction: you want to prove a property of all positive integers, so you look at the set $S$ of positive integers where the property fails, let $\ell$ be the least element where it fails, show $\ell \not= 1$ (since $1$ has the property, as in the base case of induction), so $\ell > 1$ (since no positive integers are less than $1$!), then $\ell-1$ fits the property, and then prove $\ell$ does too (like an inductive step). – KConrad Dec 11 '22 at 20:55
  • @KConrad, would starting your induction at $n = 0$ fix that? (Of course I'm kidding—that just moves the feeling of circularity from "the positive integers start at $1$" to "each positive integer is $1$ more than the last"—but, of course, we have to start somewhere, and a definition that builds in the obvious facts is arguably better than one where the proofs of the obvious facts feel difficult!) – LSpice Dec 11 '22 at 21:12
  • 2
    To my way of thinking, ($\ast$) and ($\ast\ast$) are essentially identical, proved equivalent by a very easy argument, and it seems unimportant to distinguish greatly between them. A deeper understanding of induction emphasizes this flexibility. In contrast, the main distinction to make wrt the question is not involving transfinite induction at all but rather transfinite recursion, that is, the distinction between a method of proof and a method of construction. Applications of Zorn are often viewed as constructions, and the alternative in question is a construction by transfinite recursion. – Joel David Hamkins Dec 11 '22 at 21:52
  • 1
    I find the LNP is much better for teaching students, because it focuses their attention on the smallest counterexample. Consider a student trying to prove "every positive integer has a prime factorization". If they are using WI, then there is a stumbling block right from the start: The correct inductive hypothesis is "for all $1 \leq m \leq n$, the integer $m$ has a prime factorization", not "the integer $n$ has a prime factorization". You can dodge that bullet by making your axiom be strong induction: $( (\forall_{m < n} P(m)) \implies P(n) ) \implies \forall_n P(n)$. (continued) – David E Speyer Dec 12 '22 at 14:02
  • 1
    But, even then, I find that students thinking about strong induction for this problem will (1) spend a lot of time trying to relate the factorization of $n+1$ to the factorization of $n$ and (2) will insist on calling the number which they want to factor $n+1$ and will then be confused because it has a long name. Using the LNP points their attention at the structure of $n$, where it should be. – David E Speyer Dec 12 '22 at 14:06
  • 2
    From a more sophisticated perspective, when I am building lengthy inductive combinatorial proofs, I also like thinking about a minimal counterexample, because I often don't know yet what I will induct on. I often fill up a notebook page with reductions between one instance of a problem and then another, and then I look at it to find a statistic which decreases in each reduction. – David E Speyer Dec 12 '22 at 14:08
  • 2
    @DavidESpeyer I agree with that, David, and this is just what I meant by flexibility. Students taught with a inflexible rigid approach to induction are often stymied by trivial variations of the method, such as having $-5$ as a base case, or induction on pairs, etc. When I teach induction, I emphasize the main idea of reducing to simpler instances, and prove all the standard forms (LNP, common, strong, etc.) are equivalent. In my proof-writing book, I also start with LNP as the easiest initial form. – Joel David Hamkins Dec 12 '22 at 15:45
  • 5
    The distinction described in this answer (and in the nice paper of Öhman that @TimothyChow links) is quite different from the Zorn’s Lemma / transfinite induction distinction, and also from other distinctions sometimes called “well-ordering” vs “induction”. The distinction here is specific to the naturals: it’s the difference between inductiveness/well-foundedness of the “sucessor” relation, and of the “less-than” relation. So it’s a nice point to discuss, and pedagogically important, but quite orthogonal to questions about more general well-orders. – Peter LeFanu Lumsdaine Dec 12 '22 at 16:18
  • 3
    @DavidESpeyer, you can also say “minimal criminal” instead of “minimal counterexample” for LNP. Maybe undergrads would enjoy the rhyme — I don’t know that I’ve heard that phrase more than once, but I remember it from my undergrad combinatorics class with Katalin Vesztergombi. –  Dec 12 '22 at 20:52
  • @MattF. How standard is "LNP"? At Ross, it's affectionately called "WOP" for "well-ordering principle", which seems equally descriptive to me. – Tim Campion Dec 13 '22 at 19:32
  • Any acronym is better than asterisks! But to me, “least number principle” sounds like natural English, and “well-ordering principle” sounds like an awkward translation of das Wohlordnungsprinzip — perhaps the most direct translation given that Wohl is a particle rather than an ordinary word like “well” or “good”, but still an awkward translation that I’m happy to avoid. –  Dec 13 '22 at 20:03
  • 1
    Before I found this answer I was about to say almost exactly your first paragraph (down to using the Ross Program as an example)! – Noah Snyder Dec 14 '22 at 17:19
  • 1
    By the way, I also learned to call this WOP, both at MOP and at PROMYS. I never saw LNP before. – David E Speyer Dec 14 '22 at 18:34
6

First, I like this organization because it gives a better intuition for what we're likely to lose if we would prefer to trade Choice for Countable Choice, say, since all the work of AoC is done in "Fix a well-ordering."

To answer your question, I'd say your presentation suggests that we should be exactly as comfortable with Zorn as we are with the Well-Ordering Principle, and allows the reader to follow that intuition where it leads -- if a reader is fine with asserting that we can fix a well-ordering, the rest follows; if a reader is going to choke on that then the rest is irrelevant, to some extent.

6

Here's another example, from a rather worm's-eye view. (Apologies for too many edits.)

Theorem: Let $R$ be a non-zero commutative ring with $1$. Then there exists a minimal prime ideal in $R$, i.e., a prime ideal which is minimal among prime ideals.

Lemma: Let $C$ be a totally-ordered collection of prime ideals. The intersection $\bigcap C$ is a prime ideal.

The proof of the lemma is pretty straightforward algebra.

Proof of Lemma: Suppose $xy \in \bigcap C$, but $x \notin \bigcap C$. There is some $P \in C$, $x \notin P$. Let $Q \in C$. If $Q \subseteq P$, we have $x \notin Q$, but $xy \in Q$; since $Q$ is prime, $y \in Q$. In particular, $y \in P$. If $P \subseteq Q$ then $y \in Q$. So $y \in Q$ for all $Q \in C$, and $y \in \bigcap C$. $\square$

Now, the theorem.

Proof (Zorn's Lemma): Let $S$ be the set of prime ideals in $R$. $S$ is nonempty, e.g., $R$ has at least one maximal ideal and it is prime. If $C \subseteq S$ is a totally-ordered collection of prime ideals then the intersection $\bigcap C$ is a prime ideal by the lemma. The collection $S$ has chains bounded (below), so by Zorn's lemma it has a minimal element. $\square$

Proof (Transfinite recursion; following comment below): Start from a maximal ideal $P_0$. At each $\alpha$, if $P_\alpha$ is not minimal then let $P_{\alpha+1}$ be a smaller prime ideal. At limit steps, take intersections (this uses the algebraic lemma). The recursion ends on a minimal prime ideal. $\square$

Proof (Well ordering): Well-order the elements of $R$. Start from a maximal ideal $P_{\varnothing}$. At each element $r$, consider the intersection of the prime ideals at elements before $r$; if it has a prime subideal not containing $r$, choose one such---maybe the lexicographically first such?---to be $P_r$, otherwise set $P_r$ to be that intersection. This gives a nested sequence of prime ideals, whose intersection is a prime ideal, and it is minimal because if there were a prime subideal avoiding any particular element $r$ in the intersection, then we would have already passed to a subideal avoiding $r$ at the $r$'th step. $\square$

All three of these proofs are very similar. They produce the minimal prime ideal by "sculpting" rather than by "building", i.e., carving away bad parts of the ring instead of gathering together elements of the ideal. (The slightly embarrassing earlier edits of this answer reveal my failure to find a "building" proof.)

I can't argue that the last two proofs are any harder to find or less natural than the Zorn's lemma proof. (I needed a hint, but that's just me.) But at least for this example there is no way that the Zorn proof is longer or more complicated.

Zach Teitler
  • 6,197
  • 4
    A transfinite recursion proof would proceed as follows: We construct a descending chain of prime ideals. To start, let $P_0$ be any maximal ideal (which requires AC to produce in the first place). At the successor steps, if $P_{\alpha}$ is not minimal, let $P_{\alpha+1}$ be any prime ideal below it (which is where we use choice). At the limit steps, take intersections. The recursion stops at a minimal prime ideal. – Pace Nielsen Dec 12 '22 at 22:32
  • 1
    Thanks. That makes sense. ... Is that also a hint for well-ordering? Maybe I should have started with a maximal ideal, and for each $r$, if there's a prime subideal of the current ideal that avoids $r$, then move to that ideal. We get a descending chain of prime ideals that avoids "as much as possible". – Zach Teitler Dec 12 '22 at 22:36
  • 1
    That would work, although I should mention that my question isn't so much about well-ordering. Rather, it is about the difference between recursively constructing objects (while using choice as needed, be it via well-ordering the back ring, or not) vs. using Zorn's lemma to assert the existence of the object you want. – Pace Nielsen Dec 12 '22 at 23:07