Jordan Ellenberg's Blog, page 5
January 11, 2025
Generation Z will have its revenge on Seattle
I was just in Seattle for the Joint Meetings and had dinner with an old friend. I asked her 15-year-old daughter, who’s lived in Seattle all her life, if she knew who Kurt Cobain was, and she looked at me with a slight tinge of recognition and said, “Did…. he play basketball?”
I do really love Seattle. I love the rocks coming out of the water, I love the pointy trees. Breath of happiness whenever I show up there.
December 26, 2024
Notes towards a logic puzzle
You arrive at a gate with two guards. One guard likes big butts and he cannot lie. The other guard hates big butts and he cannot tell the truth.
December 21, 2024
Live at the Lunchable
Much-needed new housing is going up on Madison’s north side where the Oscar Mayer plant used to stand, with more to come. The View and The Victoria will join other new apartment buildings in town, like Verve, and Chapter, and The Eastern. I think it would be a shame if the redevelopment failed to honor the greatest innovation Oscar Mayer ever devised at its Madison facility. There should be a luxury apartment building called The Lunchable.
December 20, 2024
Three ways to apply AI to mathematics
If you wanted to train a machine to play Go, there are three ways you could do it, at decreasing levels of “from-scratchness.”
You could tell the machine the rules of the game, and have it play many millions of games against itself; your goal is to learn a function that does a good job assigning a value to a game state, and you evaluate such a function by seeing how often it wins in this repeated arena. This is an oversimplified account of what AlphaGo does.
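If you like pseudocode, the first approach boils down to something like this (a toy sketch; every function name here is a placeholder of mine, it ignores subtleties like whose turn it is, and real systems like AlphaGo add tree search and neural networks on top):

```python
import random

def self_play_value_training(initial_state, legal_moves, apply_move, winner,
                             value, update_value, num_games=1_000_000):
    """Toy self-play loop: play games against itself using the current value
    estimate, then nudge the value function toward the observed outcomes.
    All the callables passed in are hypothetical placeholders."""
    for _ in range(num_games):
        state, history = initial_state(), []
        while winner(state) is None:
            moves = legal_moves(state)
            # Mostly pick the move whose resulting state the current value
            # function likes best; explore at random some of the time.
            if random.random() < 0.1:
                move = random.choice(moves)
            else:
                move = max(moves, key=lambda m: value(apply_move(state, m)))
            state = apply_move(state, move)
            history.append(state)
        # Push the value estimate of every visited state toward the outcome.
        outcome = winner(state)
        for s in history:
            update_value(s, outcome)
```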
Or you could have the machine try to learn a state function from some database of games actually played by expert human competitors — those games would be entered in some formal format and the machine would try to learn to imitate those expert players. Of course, you could then combine this with simulated internal games as in the first step, but you’d be starting with a leg up from accumulated human knowledge.
The third way would be to train on every natural-language book ever written about Go and try to produce natural-language responses to natural-language questions that just tell you what to do.
I don’t actually care about Go, but I do care about math, and I think all three of these approaches have loose analogues as we ask what it might look like for machines to help mathematicians. The first, “from scratch” approach, is the side of things I’ve worked on in projects like PatternBoost and FunSearch. (OK, maybe FunSearch has aspects of both method 1 and method 2.) Here you actively try to keep prior human knowledge away from the machine, because you want to see what it can do on its own.
The second approach is where I’d put formalized proof. If we try to train a machine to get from one assertion to another using a chain of proven theorems in a formal system like Lean, we’re starting from a high platform: a huge repository of theorems and even more importantly definitions which guide the machine along channels which people have already figured out are rich in meaning. AlphaProof is like this.
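To see what the raw material looks like, here is about the smallest possible example in Lean 4 (a toy of my own, not anything from AlphaProof): the goal is closed by pointing at a lemma the library already knows, and a prover has to learn to chain many such steps together.

```lean
-- A single formal step: the statement is proved by invoking a lemma
-- that already lives in the library.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```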
The third approach is more like what GPT o1 is doing — asking whether you can circumvent the formal language entirely and just generate text which kindles mathematical insight in the mind of the human reader.
I think all of these are reasonable things to try. I guess my own mostly unjustified prejudice is that the first approach is the one that has the most to teach us about what the scope of machine learning actually is, while the second is the one that will probably end up being most useful in practice. The third? So far I think it doesn’t work. I don’t think it couldn’t work. But I also don’t think it’s on an obvious trajectory towards working, if words like “trajectory” even make sense in this context. At some point I’ll post an o1-preview dialogue which I found very illuminating in this respect.
December 19, 2024
Makes no sense at all (an Orioles post)
The Orioles signed Tyler O’Neill, who is really good at hitting left-handers, for three years at $16.5m a year, and Tomoyuki Sugano, a 35-year-old pitcher with pinpoint control but whose fastball is down to 94 and doesn’t strike people out anymore, for one year at $13m. We are talking about trading for Dylan Cease, who’s a free agent in 2026, and who was very good last year but just good the year before that.
So the Orioles are adding payroll. We’re not cheaply Marlinning our way through the next few seasons. But unless reporting is badly wrong, it doesn’t look like they’re adding payroll by offering Corbin Burnes the $250m/8 year deal he’s likely to draw elsewhere.
There are teams for which this course of action would make sense. Probably most teams. For instance: if you had big holes at several spots, places where you were getting replacement-level performance, it makes a lot of sense to add 3-WAR guys for those spots.
But the Orioles aren’t that team. They have solid regulars all around the diamond. Unless Tyler O’Neill is playing only against lefties, he’s taking at-bats from Heston Kjerstad and (the probably gone) Anthony Santander, not Stevie Wilkerson. And Sugano is taking starts from Dean Kremer, not Asher Wojciechowski.
It also makes sense to focus on short-term deals if you have a one-year window because your best players are about to hit free agency. But the Orioles aren’t that team either. Adley Rutschman, the senior statesman of the team, isn’t a free agent until 2028.
In fact, if you were to imagine a team that should open the wallet and sign a huge contract for a free-agent starter, you’d say “it would have to be a team that had a good young core with multiple years of team control, so that at least four out of those eight years you’re paying for an ace to lead a contender. And you’d want the rotation to have enough depth to make the team a credible playoff threat every year, but lacking a guy with reliable #1 stuff.” And that is the team the Orioles are.
Twenty years ago I encountered Francis Bacon
Well, he’d already been dead a while; I mean I encountered his paintings, some of which are of people who appear to have been dead a while. I just read a short book about him, which is why he’s on my mind. I went to the Musée Maillol in Paris — it was a trip to Paris Prof. Dr. Mrs. Q and I took after I finished visiting Emmanuel Kowalski in Bordeaux — and saw a special exhibition of Bacon’s paintings. Probably the last time I encountered a new artist and felt a shock of recognition and affinity. Maybe the last time it will happen?
Oh, I have blogged about that exhibition once before, in opposition to Jed Perl.
December 16, 2024
Dumbass Corners
Resolved: the intersection of Regent, Highland, and Speedway in front of Madison West HS, the worst four-way stop in the city of Madison and possibly anywhere, is to be renamed Dumbass Corners, since it is essentially impossible to navigate it without witnessing an act of vehicular dumbassery.

December 15, 2024
PatternBoost (Yet Another Paper About Math and AI)
I have a new preprint up with some new collaborators: François Charton from Meta, Geordie Williamson from Sydney, and Adam Zsolt Wagner, who used to be from WPI and as of last month is from DeepMind. As you can guess from those affiliations, this is a paper at the math/machine learning interface. I’m having a lot of fun working on this stuff!
As in my earlier paper with DeepMind people, the emphasis here is on machines as fast, eccentric example generators. Suppose you wanted to find a graph on 33 nodes with as many edges as possible that has no 4-cycle. It’s not at all easy to see how you’d get started looking. Perhaps you’d think: well, what if it were a 3-cycle to exclude? And you’d quickly come up with the idea of using a bipartite graph, which has tons of edges (a positive density of all edges in the complete graph!) and no odd-length cycles at all. And maybe you’d hope some similarly clever construction would generate a very dense graph with no 4-cycle. But alas there is no such construction, or at least nobody has come up with one.
One natural thing to try is a greedy algorithm: start with the empty graph and add edges at random, subject to the constraint that you can’t add any edge that creates a 4-cycle. Keep doing that until you can’t. Of course if you could do this infinitely many times you’d eventually come up with whatever the maximal squareless graph was. But you can’t. And in practice if you do it 50 million times you never get past 89 edges.
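Here is roughly what that greedy baseline looks like in code (a sketch of the idea, not the implementation we actually used):

```python
import itertools
import random

def greedy_c4_free(n=33, seed=None):
    """One greedy run: go through all possible edges in a random order and
    add each one unless it would close a 4-cycle."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    candidates = list(itertools.combinations(range(n), 2))
    rng.shuffle(candidates)
    edges = []
    for u, v in candidates:
        # Adding (u, v) closes a 4-cycle exactly when the graph already has
        # a path of length 3 from u to v, i.e. some neighbor of u shares a
        # neighbor with v.
        if any(adj[a] & adj[v] for a in adj[u]):
            continue
        adj[u].add(v)
        adj[v].add(u)
        edges.append((u, v))
    return edges

# Repeat many times and keep the best run.
print(max(len(greedy_c4_free(33, seed=s)) for s in range(100)))
```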
So here’s what PatternBoost does. You start by doing that greedy search above. You pick out the 50,000 best graphs, and you pass them to a transformer called Makemore and say “give me 500,000 more like this.” Then you run the greedy search again, now starting from those 500,000 graphs instead of the empty graph. Now you have 550,000 squareless graphs, the ones you just made and the 50,000 you already have stockpiled. Take the 50,000 of these with the most edges and hand them back to Makemore. And now just repeat! The hope is that whatever features are common to graphs with many edges and no squares will be somehow captured by the transformer, leading it to produce candidates which after refinement by local search become still better examples. We call the greedy search the “local” step and the transformer the “global” step. Here’s the metaphor we use in the intro:
[C]onsider an example of a collective human endeavor, the design of bicycles. The “local” step involves many individual bicyclemakers each making a sequence of careful tweaks to their design, trying to make it as efficient and comfortable as possible. The “global” step involves people living in the world, seeing lots of bicycles around, each one of which has been carefully optimized locally, and then developing a new bicycle design based on those they have observed. Of course this new design would then be carefully optimized by both its designer and others; and if after those tweaks, the new bikes turned out to be notably comfortable and efficient, a lot of them would be sold, and they would join the general fleet of bicycles observed by the next prospective designer,… and on and on we go.
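Back to the algorithm itself: stripped of every detail, the loop looks roughly like this (a schematic sketch in which all the function names are placeholders, not our real code):

```python
def patternboost(initial_population, score, local_search, train_model, sample,
                 rounds=10, keep=50_000, generate=500_000):
    """Schematic PatternBoost loop: alternate a global step (train a model on
    the current best examples and sample new candidates from it) with a local
    step (refine each candidate by greedy search)."""
    stockpile = sorted(initial_population, key=score, reverse=True)[:keep]
    for _ in range(rounds):
        model = train_model(stockpile)                    # global step
        candidates = sample(model, generate)              # e.g. 500,000 new graphs
        refined = [local_search(c) for c in candidates]   # local step
        # Pool the new examples with the old stockpile and keep only the best.
        stockpile = sorted(stockpile + refined, key=score, reverse=True)[:keep]
    return stockpile
```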
So does this actually work? Yes! Well, kind of. For the problem I just mentioned, PatternBoost does find a squareless graph on 33 nodes with 96 edges, which is known to be best possible. But, as with a lot of problems in math/ML, it turns out to matter a lot how you encode a graph as a string. There are lots of ways to do this, of course! You can treat a graph as an adjacency matrix, and then record the 1s and 0s in some order to get a string of bits. But what order do you put the entries in? To a pure mathematician, this distinction is barely worth mentioning. But it matters to the transformer! In particular, it helped performance a lot when we switched from encoding the matrix row by row to encoding the matrix row by row but including a comma at the end of each row. One of the psychological difficulties mathematicians have to overcome when working in this area is that choices we expect not to matter actually do.
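To make the distinction concrete, here are the two encodings on a toy 3-node graph (an illustration of the kind of choice involved, not our exact tokenization):

```python
def encode_rows(adj_matrix):
    """Adjacency matrix flattened row by row into a string of bits."""
    return "".join("".join(str(b) for b in row) for row in adj_matrix)

def encode_rows_with_commas(adj_matrix):
    """Same flattening, but with a comma marking the end of each row --
    the variant that noticeably helped the transformer."""
    return "".join("".join(str(b) for b in row) + "," for row in adj_matrix)

A = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
print(encode_rows(A))              # 011100100
print(encode_rows_with_commas(A))  # 011,100,100,
```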
Also: n=33 is actually the largest number of nodes where PatternBoost finds the best squareless graph. For larger n, it certainly learns to do better than greedy search, but it doesn’t match the best known examples.
But the point of this paper isn’t to construct graphs with no 4-cycles, it’s to develop a protocol that can be used on lots of different problems, so that we can try to understand which problems are likely to be graspable by the transformer. And there are problems (like squareless graphs) where PatternBoost can’t match human performance, others where it’s about equal, and others where it finds things humans weren’t able to find. And you can try it on your own problem, too! The code is all right here.
(One problem we worked on that I’ve developed kind of a taste for: Suppose a 0-1 matrix avoids the pattern (312) — i.e. there is never an i < j < k such that a_{ik}, a_{ji}, and a_{kj} are all 1. How big can its permanent be? This is one where PatternBoost comes up with just about as good a solution as people have been able to. Maybe in another blog post I’ll write about why I think this is a pretty natural question, as long as you think pattern-avoiding permutations are natural in the first place.)
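If you want to poke at this one yourself, both the condition (exactly as stated above) and a brute-force permanent fit in a few lines of Python; this is a quick illustration, not the paper’s code:

```python
from itertools import combinations, permutations

def avoids_312(A):
    """The condition as stated above: no i < j < k with A[i][k], A[j][i],
    and A[k][j] all equal to 1 (0-indexed here)."""
    n = len(A)
    return not any(A[i][k] and A[j][i] and A[k][j]
                   for i, j, k in combinations(range(n), 3))

def permanent(A):
    """Brute-force permanent of a square 0-1 matrix -- exponential in n,
    fine for small examples."""
    n = len(A)
    return sum(all(A[i][p[i]] for i in range(n)) for p in permutations(range(n)))
```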
To be honest, I still don’t think I have a clear picture of what separates the problems PatternBoost finds hard from the problems it finds tractable. Except for one sort of obvious point — you are a lot less likely to beat the best human efforts on problems many humans have tried really hard to solve. If your goal is to show the superiority of machine cognition to the human version, this might be dispiriting. But if your goal is mathematical progress, it’s fine! Even when we’re working on a famous hard problem, our approach, if it’s at all original, is going to pass through questions and hypotheses that have not been thought about much before. If ML techniques help generate rapid answers to those unfamous questions, that’s great! Who cares if humans would have done better in a counterfactual universe where the question was famous and sought-after — or even in a counterfactual universe where one person had spent a couple of weeks on it? We don’t live in that universe! I’ve said it before and I’ll say it again: the promise of AI is mediocrity at scale.
December 13, 2024
Time for a Change
There is music that doesn’t exist on the Internet. Sometimes one forgets this! In my office I have a copy of Time for a Change, a promotional compilation Bar/None records put out in 1990. Some of this stuff is available online, like the They Might Be Giants B-sides and Otis Ball and the Chains’ catchy “Walk on Water” and Brian Dewan’s glorious and very strange “99 Cops”. But Bianca “Flystrip” Miller? The Brothers Kendall? Not sure where this music exists except on my cassette, and however many others there are out there like it.
December 12, 2024
Lifetime achievement award
The child actor who played the godlike child in the Twilight Zone episode “It’s a Good Life” grew up to be a member of the novelty rock act Barnes & Barnes, creator of “Fish Heads.”
If you don’t know “It’s a Good Life,” here’s the original short story by Jerome Bixby. It’s a doozy.