Galois theory for non-mathematicians
How a teenager invented a new branch of mathematics to solve a long-standing open question about equations
You might know that to solve an equation of degree 2, ax²+bx+c = 0, we use the quadratic formula.
There exist similar formulas for equations of degree 3 and 4, but they are
mysteriously missing for degree 5 or higher. More specifically, it seems like
we cannot construct the solutions to the quintic (equation of degree 5)
or higher using only addition, subtraction, multiplication, division,
and radicals (square roots, cube roots, etc). Why is that? What’s so
special about the number 5? These were questions that haunted the young
Frenchman Évariste Galois
in the early 1800s, and the night before he was fatally wounded in a
duel, he wrote down a theory of a new mathematical object called a
“group” that solves the issue in a surprisingly elegant way.
This is how he did it.
TL;DR
The sets of roots of different equations differ in complexity. Some
sets are so complex that they cannot be expressed using only simple
objects such as radicals. But how do we measure the complexity of the
roots if we cannot even calculate them, and what measure of complexity
should we use?
Permuting roots and symmetry
The answer lies in the symmetry of the roots.
Symmetry of the roots, you may ask? What does that have to do with anything? What does it even mean?
Let’s plot the roots of two equations and see if we can make sense of it:
The
left one is said to be less symmetric than the right one. This might
surprise you, because in the colloquial sense of the word, symmetric is
usually used if one can reflect or rotate the object without changing
the way it looks. In that sense, the left picture looks more symmetric.
For example: The star is more symmetric than the heart, because aside from reflecting it, one can also rotate it.
But
in our case, we are going to take a more general view of symmetries. We
are not restricting ourselves to only reflections and rotations, any
function that transforms the object without changing the way it looks is
fair game. In the case of the roots, that means that any function that
interchanges (permutes) the roots in any way is valid. More functions
means more symmetric.
It turns out that in the right case, there are functions for permuting all the roots in any conceivable order, as many as 5!=120, so it is highly symmetric. But in the left case, if we interchange r₂↔r₄ using the transformation i↔−i, we necessarily also interchange r₁↔r₅. This restricts us, and thus not all conceivable permutations are possible. It is less symmetric.
The
functions that permute the roots are called “Automorphisms”, and if we
group those automorphisms together we get what is called a “Group” (I
will get back to better definitions of automorphisms and groups later
on).
This
means that the group that represents the symmetries of the roots is
larger and more complex in the right case. In fact, the group in the
right case is so complex that the roots cannot be described using
radicals.
How do we know how complex a group is? To understand this we need a bit more theory.
The size of the quintic
First, let’s take a look at the size of a group. How do I know that there are some quintics that have a 5! large group?
A general quintic normally looks like this:
x⁵+ax⁴+bx³+cx²+dx+e=0
But if we take a more “root-centric” approach we can say that it looks like this:
(x−r₁)(x−r₂)(x−r₃)(x−r₄)(x−r₅)=
x⁵−(r₁+r₂+r₃+r₄+r₅)x⁴+
(r₁r₂+r₁r₃+r₂r₃+r₁r₄+r₂r₄+r₃r₄+r₁r₅+r₂r₅+r₃r₅+r₄r₅)x³…− r₁r₂r₃r₄r₅=0
That is, the constants a, b, c, d, e in the first equation are replaced by symmetric combinations of the roots (with alternating signs):
−(r₁+r₂+r₃+r₄+r₅)=a
r₁r₂+r₁r₃+r₂r₃+r₁r₄+r₂r₄+r₃r₄+r₁r₅+r₂r₅+r₃r₅+r₄r₅=b
(c and d omitted for brevity)
−r₁r₂r₃r₄r₅ = e
Looking at all the terms in detail, one discovers that interchanging the roots does not affect the equation (try it for b
above, for example). This is true for polynomials of any degree. Since
we are able to interchange all the roots, we can draw the conclusion
that the symmetry group for this general quintic is in fact all
permutations, also called S₅ (the symmetric group on 5 elements, which contains 5!=120 permutations).
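The claim that interchanging the roots leaves the coefficients alone can be checked by brute force. The sketch below is only an illustration: the five integer roots are made up so that the arithmetic stays exact, and the helper names are not from any library.

```python
from itertools import combinations, permutations
from math import prod

def elementary_symmetric(roots, k):
    """Sum of the products of every k-element subset of the roots."""
    return sum(prod(subset) for subset in combinations(roots, k))

def coefficients(roots):
    """The root combinations that (up to sign) make up a, b, c, d, e."""
    return tuple(elementary_symmetric(roots, k) for k in range(1, len(roots) + 1))

# Hypothetical integer roots, chosen only so the arithmetic is exact.
example_roots = (2, 3, 5, 7, 11)
reference = coefficients(example_roots)

# Try every possible ordering of the five roots.
orderings = list(permutations(example_roots))
all_unchanged = all(coefficients(p) == reference for p in orderings)

print(len(orderings))   # 120
print(all_unchanged)    # True
```

All 5!=120 orderings give the same coefficients, which is what makes every permutation a candidate symmetry of the general quintic.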
Fields and Automorphisms
Now
we are going to expand our definition of automorphisms a bit, as they
are more than just functions that permute roots. In the process we need
to introduce something called “fields”. Why would we want to do that,
you say? The reason is, that while working with roots and their
permutations is fun, it’s a bit easier to work with fields and their
automorphisms. It is exactly the same functions, don’t worry, just
another way to look at them.
So, if the equation is, say, x²−2=0, instead of working with the roots r₁=√2, r₂=−√2, we are going to introduce the field Q(√2). This is all the rational numbers Q with an added √2; Q(√2) is called a “field extension” of Q. Its elements look like this: a+b√2 with a,b∈Q. To be able to describe the roots of the equation we need the field Q(√2).
For every field extension (and also other mathematical objects) we have
a bunch of functions, σₙ, that each send every number to a unique number in
the same field and follow the conditions σ(a+b)=σ(a)+σ(b) and σ(ab)=σ(a)σ(b). σ is a function of the extension and does not touch the underlying field Q. These functions are called automorphisms. Incidentally, they also permute the roots. This is because for the root r:
r⁵+ar⁴+br³+cr²+dr+e=0⟹
σ(r⁵+ar⁴+br³+cr²+dr+e)=σ(0)⟹
σ(r)⁵+aσ(r)⁴+bσ(r)³+cσ(r)²+dσ(r)+e=0 (since σ does not touch Q, where a, b, c, d, e live)
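This means that σ(r) is also a solution to the equation. And since, for two different roots r₁≠r₂:
σ(r₁)−σ(r₂)=σ(r₁)+σ(−1)σ(r₂)=σ(r₁−r₂)≠0
(a nonzero number x cannot be sent to 0, because σ(x)σ(1/x)=σ(1)=1), different roots are sent to different roots. So we still have 5 of them, which must be the original 5. Thus σ must permute the roots.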
Of course, this works for an equation of any degree.
Hence:
- We have our equation.
- That equation has a field that might contain an extension of a few radicals
- That field extension has a group, which is a collection of all its automorphisms.
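To make the two automorphism conditions concrete, here is a minimal sketch for the field Q(√2). It stores a+b√2 as the pair (a, b) and tests the map that sends √2 to −√2 on a small sample of elements. The pair representation, the sample values and the function names are assumptions made for illustration, and a finite sample only suggests the result rather than proving it.

```python
from fractions import Fraction
from itertools import product

# An element a + b*sqrt(2) of Q(sqrt(2)) is stored as the pair (a, b).
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    # (a + b√2)(c + d√2) = (ac + 2bd) + (ad + bc)√2
    a, b = x
    c, d = y
    return (a * c + 2 * b * d, a * d + b * c)

def sigma(x):
    """Candidate automorphism: send sqrt(2) to -sqrt(2)."""
    return (x[0], -x[1])

# A small, arbitrary sample of rational numbers and of field elements.
values = [Fraction(n, d) for n in (-2, 0, 1, 3) for d in (1, 2)]
sample = list(product(values, repeat=2))

respects_addition = all(
    sigma(add(x, y)) == add(sigma(x), sigma(y)) for x in sample for y in sample
)
respects_multiplication = all(
    sigma(mul(x, y)) == mul(sigma(x), sigma(y)) for x in sample for y in sample
)
fixes_rationals = all(sigma((q, Fraction(0))) == (q, Fraction(0)) for q in values)

print(respects_addition, respects_multiplication, fixes_rationals)  # True True True
```

The same map swaps the two roots √2=(0, 1) and −√2=(0, −1), which is the root permutation it corresponds to.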
Two Examples of Degree 3
Example 1
Equation: x³−x²−2x+2=0
The roots are (1,√2,–√2) (you can verify this yourself by just plugging them in), so the field must be Q(√2)
Writing down all the ways we can think of to permute the roots (e means identity permutation, it does nothing):
(e)
(√2↔–√2)
(1↔√2)
(1↔–√2)
(1→√2→−√2→1)
(1→−√2→√2→1)
Let’s test one: Let (√2↔−√2) be σ₁:
σ₁(√2+−√2)=σ₁(0)=0=σ₁(√2)+σ₁(−√2)
σ₁((√2)(−√2))=σ₁(−2)=−2=σ₁(√2)σ₁(–√2)
So far so good. Another one.
Let (1↔√2) be σ₂:
σ₂(√2+−√2)=σ₂(0)=0 ≠ σ₂(√2)+σ₂(−√2)=1+−√2
σ₂((√2)(−√2))=σ₂(−2)=−2 ≠ σ₂(√2)σ₂(−√2)=1(−√2)
Apparently σ₂ is not an automorphism, so we will have to scrap it. (It never stood a chance: 1 is a rational number, and automorphisms do not touch Q.) The other σ run into similar problems, so the only ones remaining are e and σ₁. This group is called the cyclic group C₂, since we can only permute in a circle (a very small circle in this case).
Example 2
Equation: x³−2=0
The roots are ³√2, ³√2ζ and ³√2ζ², where ζ=e^(2πi/3) is a complex cube root of 1, so the field must be Q(³√2, ζ), using ζ for brevity. This is what it looks like:
One can play around with the root permutations a bit, and will soon notice
that in this case they are all automorphisms. Thus there are 3!=6 automorphisms, which is all the root permutations, so the group must be S₃.
Another fun thing to notice about the image above is that it looks like an
equilateral triangle and that the automorphisms correspond exactly to
rotating and reflecting the triangle. If the automorphisms correspond
to the symmetries of a regular polygon in this way, the group is called a
“Dihedral group”, in this case D₃. Usually the group of all permutations Sₙ is not the same as the dihedral group Dₙ, but in the case of n=3 it is.
Groups
This
seems to be a good place to segue into a little lengthier discussion
about groups. So, groups started out as collections of permutations of
roots, but can also be seen as collections of automorphisms, or
rotations and reflections of symmetrical geometrical objects. Any
collection of functions that changes an object in such a way that it
looks the same can be considered a group. But, we can actually look at
the transformations themselves without bothering about the symmetric
object that they act upon. Much in the same way that we do not bother
about piles of apples when we do arithmetic, we simply follow the rules,
similarly we can define some rules that the transformations of a group
follow, and use them.
The rules are something like this:
- If we first do a transform, and then another one, we get a third transform that is still in the group. For example, the group C₄ is the group of all rotations one can do on a square. If a is rotating 90∘, b is rotating 180∘ and c is rotating 270∘ then a∗b=c, where ∗ means “first do b, then a”. It is commonly called multiplication since it is (kind of) similar to multiplication of numbers. According to this rule, c has to be in the group. This is called closure.
- There has to be an identity element (e) that does nothing at all.
- For every element there has to be an inverse element that undoes it.
- How we bracket a chain of transforms must not matter: (a∗b)∗c=a∗(b∗c). This is called associativity, and it holds automatically when the elements are functions applied one after another.
Now we can investigate the features of different groups without having to worry about roots or polygons.
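As a quick illustration, the rules can be checked mechanically for the rotation group C₄. This is a minimal sketch in which each rotation is represented by its angle in degrees; that representation is a choice made here for convenience. It also checks associativity, which the standard definition of a group requires.

```python
from itertools import product

# Rotations of a square, named by their angle in degrees.
elements = [0, 90, 180, 270]

def compose(a, b):
    """First rotate by b, then by a."""
    return (a + b) % 360

# Closure: combining two rotations gives another rotation in the list.
closure = all(compose(a, b) in elements for a, b in product(elements, repeat=2))

# Identity: the element that changes nothing when combined with any other.
identity = next(
    e for e in elements
    if all(compose(e, a) == a == compose(a, e) for a in elements)
)

# Inverses: for every rotation, the rotation that undoes it.
inverses = {
    a: next(b for b in elements if compose(a, b) == identity) for a in elements
}

# Associativity: bracketing does not matter.
associative = all(
    compose(compose(a, b), c) == compose(a, compose(b, c))
    for a, b, c in product(elements, repeat=3)
)

print(closure, identity, inverses, associative)
# True 0 {0: 0, 90: 270, 180: 180, 270: 90} True
```

With a=90, b=180 and c=270 as in the text, `compose(90, 180)` returns 270, matching a∗b=c.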
Visualizing groups
Two fun ways to visualize groups are:
Cayley tables
The above is the Cayley table for an equilateral triangle, the D₃ group. It lists all the elements of the group and what elements we get when we multiply them. For example, if we first do a 120∘ rotation (r) and then the same rotation again we get a 240∘ rotation rr=r², as can be seen in the table. If we do a 120∘ rotation-flip rf and an r we end up with just a flip. Notice how the elements f and r do not commute. A group where the elements commute is called an abelian group.
This
particular table is still very symmetric though, but that doesn’t need
to be the case. Any scrambling around of the elements that follow the
rules is valid.
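A Cayley table like this can also be generated rather than drawn. The sketch below models the triangle's symmetries as permutations of its corners 0, 1, 2. The corner numbering, the choice of which flip to call f, and the convention that a∗b means "first b, then a" are assumptions, so the rows and columns may be ordered differently from the pictured table.

```python
def compose(p, q):
    """First apply q, then p (the same convention as a*b in the text)."""
    return tuple(p[q[i]] for i in range(len(q)))

identity = (0, 1, 2)
rot = (1, 2, 0)    # 120° rotation: corner 0 -> 1 -> 2 -> 0
flip = (0, 2, 1)   # flip that keeps corner 0 in place

named = {
    "e": identity,
    "r": rot,
    "r²": compose(rot, rot),
    "f": flip,
    "rf": compose(rot, flip),
    "r²f": compose(compose(rot, rot), flip),
}
name_of = {perm: name for name, perm in named.items()}

# table[a, b] is the name of the element a*b.
table = {
    (a, b): name_of[compose(named[a], named[b])] for a in named for b in named
}

print("     " + " ".join(f"{b:>4}" for b in named))
for a in named:
    print(f"{a:>4} " + " ".join(f"{table[a, b]:>4}" for b in named))
```

In this model `table["r", "f"]` is "rf" while `table["f", "r"]` is "r²f", so r and f do not commute, and `table["rf", "r"]` is "f", the plain flip.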
Cayley graphs
The above is the D₃ Cayley graph. Here the elements are displayed in a way that shows how to
get from one element to the next, where the edges are the operations. In
this case a 120∘ rotation and a flip are necessary. These (r and f)
are also called the generators of the group, because one can generate
the whole group with them, starting from the identity element.
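Walking a Cayley graph from the identity is easy to imitate in code. The following sketch keeps applying the generators until no new element turns up. It again models r and f as permutations of the triangle's corners, which is an assumption made for illustration, and the function name `generate` is made up.

```python
def compose(p, q):
    """First apply q, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def generate(generators):
    """All elements reachable from the identity by applying generators."""
    identity = tuple(range(len(generators[0])))
    found = {identity}
    frontier = [identity]
    while frontier:
        current = frontier.pop()
        for g in generators:
            reached = compose(g, current)   # follow one edge of the graph
            if reached not in found:
                found.add(reached)
                frontier.append(reached)
    return found

rot = (1, 2, 0)    # 120° rotation of the corners 0, 1, 2
flip = (0, 2, 1)   # flip that keeps corner 0 in place

print(len(generate([rot, flip])))  # 6
print(len(generate([rot])))        # 3
print(len(generate([flip])))       # 2
```

Neither generator reaches all six elements alone, which is why both r and f are needed.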
Usages of groups
Groups
tend to be useful everywhere where there is symmetry. For example, the
wallpaper groups are used to describe symmetric wallpapers. There are
some wallpapers that can be rotated 180∘ and some wallpapers that can be
reflected and some where we can do both, and so on. It turns out that
there are only 17 of them so it is a neat way of classifying wallpapers.
The above wallpapers both belong to a group called p6m.
Another, more surprising, use of groups is in physics. It seems like the laws of
nature follow certain symmetries. For example, if one moves Newton’s
second law F=ma 10 minutes into the future, it is
still the same. That the laws of nature do not change from one day to
the next seems to indicate that they are symmetrical with regard to
time-transformation. Neither do they change from one place to the next,
so transformations in space are also allowed. Since it is possible to
transform time and space in arbitrarily small or large chunks, the groups
describing these, Lie groups, contain infinitely many elements.
Interestingly, it turns out that each of these symmetries is related to a conservation
law. Time symmetry entails the conservation of energy, space
symmetry the conservation of momentum, angular symmetry (nature looks
the same from all angles) the conservation of angular momentum, and so
on. This was shown by Emmy Noether by combining the symmetries with
the principle of least action, a law of nature that states that nature
tends to “take the shortest path”.
I find it interesting how much of the complexity and apparent chaos
of nature can be explained by such intuitive concepts as “the laws of nature
do not change from day to day” and “nature tends to take the shortest
path”.
Back to fields
End of intermezzo, where were we? Right, we were talking about x³−2=0 and its roots and fields.
The field of that equation is Q(³√2, ζ) and it would be natural to think that it looks like this: a+b³√2+cζ,
but that is wrong. The reason for this is that we want our field to be
“closed”. That is, if we add or multiply two elements in the field we
want to stay in the field. So for example ³√2 and ζ are both of the
above form, but their product ³√2ζ is not. Adding the missing products gives six terms: a₀+a₁³√2+a₂³√4+a₃ζ+a₄³√2ζ+a₅³√4ζ.
Subfields and subgroups
Looking at our examples of degree 3 above, we have the field Q(√2) with the group C₂, and the field Q(³√2, ζ) with the group S₃.
It
would seem like the second field and group are more complex than the
first field and group. We can guess this by just counting the number of
terms in the field case or the number of automorphisms in the group
case. But just counting does not seem to really capture what it means to
be complex. Take for example the group C₁₂. Lots of elements, but it only cycles the roots around, so it doesn’t really seem all that complex. A corresponding field is Q(ζ₁₃) with ζ₁₃=e^(2πi/13). It contains ζ₁₃, ζ₁₃², ζ₁₃³, … and every automorphism just sends ζ₁₃ to one of its powers, so again, not very complex.
Remember, working out how complex a group is
is going to be key to understanding why
some roots can’t be described by radicals alone.
To
get a better way of appreciating the complexity we are going to
introduce the concept of a “Subfield” and a “Subgroup”. A subfield is
when you remove some of the terms but you still have a closed field.
Similarly, a subgroup is when you remove some of the automorphisms but
still have a closed group.
In the first case Q(√2), the only thing one can do is remove the √2 in the field and one of the two automorphisms in the group (we cannot remove (e) and still have a group).
As for the second case Q(³√2, ζ),
it gets a bit more complicated. One can manually distill the
sub-field/groups by just removing elements one at a time and see if the
resulting field/group is closed. After a while we arrive at this:
Interesting: both the field and the group have four constituents. Now, it would be a
reasonable guess that the subgroups always contain exactly the
automorphisms of the subfields. But they don’t.
Fixed fields
Don’t worry, we’re almost there, it’s just a tiny bit more complicated. To see this, let’s have a look at the field Q(⁴√2, i) and its subfields.
The field Q(⁴√2, i) has the permutation group D₄ (same as a square). Let’s look at D₄ and its subgroups.
The subgroup lattice is upside down in this picture, with D₄ at the bottom. I will get to that shortly, but let’s first look at the subfields versus the subgroups. Q(⁴√2, i) has 5 large subfields and 3 small sub-subfields, but D₄ only has 3 large subgroups and 5 smaller sub-subgroups.
It
would seem like there are not enough large groups to permute the 5
large fields. If you were to play around with the subgroups and
subfields you would eventually come to the conclusion that the subgroups
actually permute not the subfields, but rather everything that is not
in the subfields, that they “fix” or do not touch the subfields.
So for example (f) fixes Q(⁴√2) and (r², f) fixes Q(√2).
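The subgroup counts for D₄ quoted above (3 large subgroups, 5 smaller ones) can be confirmed by brute force. This sketch models D₄ as permutations of the four corners of a square, which is an assumption made for illustration (the four roots of x⁴−2 sit on such a square in the complex plane), and simply tests every subset that contains the identity for closure.

```python
from collections import Counter
from itertools import combinations

def compose(p, q):
    """First apply q, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def generate(generators):
    """All elements reachable from the identity by applying generators."""
    identity = tuple(range(len(generators[0])))
    found, frontier = {identity}, [identity]
    while frontier:
        current = frontier.pop()
        for g in generators:
            reached = compose(g, current)
            if reached not in found:
                found.add(reached)
                frontier.append(reached)
    return found

rot = (1, 2, 3, 0)    # 90° rotation of the corners 0, 1, 2, 3
flip = (0, 3, 2, 1)   # reflection through corners 0 and 2
identity = (0, 1, 2, 3)

d4 = generate([rot, flip])
others = sorted(d4 - {identity})

def is_closed(subset):
    # For a finite set containing the identity, closure is enough
    # to make it a subgroup.
    return all(compose(a, b) in subset for a in subset for b in subset)

subgroups = [
    {identity, *extra}
    for k in range(len(others) + 1)
    for extra in combinations(others, k)
    if is_closed({identity, *extra})
]
counts = Counter(len(s) for s in subgroups)

print(len(d4))                 # 8
print(sorted(counts.items()))  # [(1, 1), (2, 5), (4, 3), (8, 1)]
```

There are 3 subgroups with four elements and 5 with two elements, the mirror image of the 5 large and 3 small subfields.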
Why is it this way rather than the other way around, as we first guessed?
I don’t have an intuitive way of explaining this. The way I see it,
we discovered it empirically and now we can try to prove it. The
proof goes sort of like this:
Hand-wavy fundamental theorem of Galois theory proof sketch
We
want to show that if we turn the subgroup lattice upside down we get a
one-to-one correspondence with the subfield lattice where the fields are
the fixed fields of the groups.
First, I would like to point out that it is reasonable (sort of) that this is
the case. At the bottom group, we have all the automorphisms, which of
course move around everything except Q (fix Q), and at the top, we only have the e-automorphism, which moves around nothing (fixes everything).
If we start at the bottom group and remove a few of the automorphisms, the
remaining automorphisms may no longer move around some small part of the
field, and will thus fix that part of the field. As we remove more
automorphisms, a larger and larger part of the field will be unaffected
and thus we will have a larger fixed field.
To
be a bit more rigorous we will need to be able to compare the size of
the group and the field. The group size is, of course, the number of
automorphisms in it. The size of the field is the number of terms. These
two happen to be the same, but why is this the case?
Quotient
Now, we could look at the S₅ subgroup lattice of the quintic and see that
indeed it looks pretty complex. But in order to tie this together with
radicals we need a way to compare the complexity of a group and its
subgroups. That is: how much more complex is D₄ than C₄, for example? To
do this we introduce the concept of a “Quotient”. A quotient is
basically group division. How does that work?
In ordinary division we do something like this: to divide 15 apples among 5
people, we group the apples in the apple-set into 5 equal piles and every
pile will correspond to a person in the person-set. The answer to the
question 15/5 is 3, one of the piles. Any pile will do since they are
equal.
A similar thing happens when we divide groups. To divide D₄ by C₄ we
group the 8 elements of D₄ into 4 equal groups, one for each element in
C₄. How do we make the groups equal? It’s not like the elements are all
identical apples. They can be very different automorphisms, for example.
Well, quotients are not always possible for exactly that reason. But
sometimes a group can be divided into “Cosets”. Say we divide D₄ into 4
equal parts with 2 elements in each. If we are lucky we can have 4 piles
of elements where the relation between the two elements is the same in
all of the piles. To be able to do this the original group has to
display a high level of self-similarity. To see this, let’s look at a
Cayley graph of D₄.
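As one can see, there is in fact a high level of self-similarity here. The
top-left, top-right, bottom-left and bottom-right corners all look the
same. These are our cosets.
So D₄/C₄ is basically one of these cosets, which is C₂. Hence: D₄/C₄=C₂. (The textbook route gives the same answer: there the piles are the two shifted copies of C₄ inside D₄, the four rotations and the four reflections, and the quotient is the group formed by those two piles, which can only be C₂.)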
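Here is a small sketch of the division D₄/C₄ using the textbook definition of a coset, a shifted copy gH of the subgroup H. It also shows why quotients "are not always possible": multiplying two piles has to land exactly on another pile. D₄ is again modelled as permutations of a square's corners, and the helper name `quotient` is made up for this illustration.

```python
def compose(p, q):
    """First apply q, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def generate(generators):
    """All elements reachable from the identity by applying generators."""
    identity = tuple(range(len(generators[0])))
    found, frontier = {identity}, [identity]
    while frontier:
        current = frontier.pop()
        for g in generators:
            reached = compose(g, current)
            if reached not in found:
                found.add(reached)
                frontier.append(reached)
    return found

def quotient(group, subgroup):
    """Return the cosets g*subgroup and whether they multiply consistently."""
    cosets = {frozenset(compose(g, h) for h in subgroup) for g in group}
    well_defined = all(
        frozenset(compose(a, b) for a in A for b in B) in cosets
        for A in cosets
        for B in cosets
    )
    return cosets, well_defined

rot = (1, 2, 3, 0)    # 90° rotation of the corners 0, 1, 2, 3
flip = (0, 3, 2, 1)   # reflection through corners 0 and 2

d4 = generate([rot, flip])
c4 = generate([rot])        # the four rotations
c2 = generate([flip])       # identity and one reflection

cosets_c4, ok_c4 = quotient(d4, c4)
cosets_c2, ok_c2 = quotient(d4, c2)

print(len(cosets_c4), ok_c4)   # 2 True
print(len(cosets_c2), ok_c2)   # 4 False
```

Dividing by C₄ gives two piles (rotations and reflections) that multiply like C₂. Dividing by the two-element subgroup generated by the flip gives four piles of two, but their products spill over several piles, so no quotient group results.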
Now,
by introducing quotients we actually have a concept of how to build
groups from the ground up. Just as 21 consists of 3 and 7, so do groups
consist of their subgroups. And just as we can get the constituents of a
number by dividing, 21/7=3, so we can get the constituents of a group
by taking the quotient. Since D₄/C₄=C₂, this means that if we have a C₄
group, we have to multiply it by C₂ to get to D₄. Since there is a
correspondence between fields and groups, this will play a role in how
we construct fields.
Radicals
Subgroups of the quintic
Now, I won’t show a picture of the group lattice of S₅ because it is too big, but I will say a couple of things about its subgroups. One of the subgroups is A₅ (the alternating group), which is easily checked. To get from A₅ to S₅ we need S₅/A₅=C₂. A cyclic quotient like this is what adding a radical to the field corresponds to, so we can get that far by radicals. But the only subgroups of A₅ we are allowed to divide by are (e) and A₅ itself, and A₅/e=A₅ is not a cyclic group. This is true for any Aₙ with n≥5, by the way. Thus we cannot get there by radicals and, alas, the general polynomial of degree ≥5 cannot be solved by radicals.
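The statement that A₅ cannot be divided any further can be checked by counting. The sketch below relies on two standard facts that the text does not spell out, so treat them as assumptions of the example: a quotient is only possible by a "normal" subgroup, which is always a union of whole conjugacy classes including the identity, and the size of any subgroup divides the size of the group.

```python
from itertools import combinations, permutations

def compose(p, q):
    """First apply q, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def is_even(p):
    """A permutation is even when it has an even number of inversions."""
    inversions = sum(1 for i, j in combinations(range(len(p)), 2) if p[i] > p[j])
    return inversions % 2 == 0

a5 = [p for p in permutations(range(5)) if is_even(p)]
identity = tuple(range(5))

# Split A5 into conjugacy classes {g x g⁻¹ : g in A5}.
classes, seen = [], set()
for x in a5:
    if x not in seen:
        cls = {compose(compose(g, x), inverse(g)) for g in a5}
        seen |= cls
        classes.append(cls)
class_sizes = sorted(len(c) for c in classes)

# A normal subgroup is the identity plus some whole classes,
# and its size must divide the group size.
other_sizes = [len(c) for c in classes if identity not in c]
possible_sizes = {
    1 + sum(combo)
    for k in range(len(other_sizes) + 1)
    for combo in combinations(other_sizes, k)
    if len(a5) % (1 + sum(combo)) == 0
}

print(len(a5))                 # 60
print(class_sizes)             # [1, 12, 12, 15, 20]
print(sorted(possible_sizes))  # [1, 60]
```

The only sizes that survive are 1 and 60, that is (e) and A₅ itself, so there is no intermediate step with a cyclic quotient.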
And that is how Galois, as a teenager, invented the concept of a group to
settle a long-standing open question about the unsolvability of the
quintic⁹.
Trisecting the angle
One
fun bonus fact we get from the machinery surrounding Galois theory, in
this case the tower law for fields, is a nice proof of a problem that
stumped humanity since the ancient Greeks, namely: The impossibility of
trisecting an angle with a straightedge and a compass. Apparently the
Greeks loved to draw things in this manner and were curious about the
limitations of the method.
One example is finding the point in the middle of two other points. To do
this, set the compass on the two points and draw first a circle around
one and then around the other. Use the straightedge as a ruler and draw a
line between the points and then between the points where the circles
cross. The middle is where the lines cross.
But
how does this way of drawing translate to field theory? Well, one can
see the above problem like this: say we have a field of two points, (x₁,y₁) and (x₂,y₂). We would like to extend the field to also contain the middle point. To do this we find the intersections of the circles (x−x₁)²+(y−y₁)²=r² and (x−x₂)²+(y−y₂)²=r². We get two new points (x₃,y₃) and (x₄,y₄). The line between them is y−y₃=(y₄−y₃)/(x₄−x₃)·(x−x₃). The line between the first two points is y−y₁=(y₂−y₁)/(x₂−x₁)·(x−x₁). Solve for x to get where they cross.
Apparently, straightedge and compass constructions amount to solving equations of degree one and two.
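The midpoint construction can be followed step by step in code, which makes it visible that nothing beyond the four basic operations and one square root is needed. This is a sketch under assumptions: the two example points and the function names are made up, and the circle intersection uses the standard closed-form formula rather than anything specific to this article.

```python
from math import hypot, isclose, sqrt

def circle_intersections(c1, r1, c2, r2):
    """The two points where the circles (c1, r1) and (c2, r2) cross."""
    (x1, y1), (x2, y2) = c1, c2
    dx, dy = x2 - x1, y2 - y1
    d = hypot(dx, dy)
    # Distance from c1, along the line of centres, to the chord.
    a = (r1**2 - r2**2 + d**2) / (2 * d)
    h = sqrt(r1**2 - a**2)          # the quadratic step: a square root
    fx, fy = x1 + a * dx / d, y1 + a * dy / d
    return (fx - h * dy / d, fy + h * dx / d), (fx + h * dy / d, fy - h * dx / d)

def line_intersection(p, q, r, s):
    """Point where line pq meets line rs (assumes they are not parallel)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, r, s
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    det_pq = x1 * y2 - y1 * x2
    det_rs = x3 * y4 - y3 * x4
    x = (det_pq * (x3 - x4) - (x1 - x2) * det_rs) / denom
    y = (det_pq * (y3 - y4) - (y1 - y2) * det_rs) / denom
    return (x, y)

# Two hypothetical starting points.
p1, p2 = (0.0, 0.0), (4.0, 2.0)
radius = hypot(p2[0] - p1[0], p2[1] - p1[1])   # compass set on the two points

p3, p4 = circle_intersections(p1, radius, p2, radius)
middle = line_intersection(p1, p2, p3, p4)

print(middle)   # approximately (2.0, 1.0)
```

Every coordinate produced along the way is built from the starting coordinates with +, −, ×, ÷ and square roots only.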
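But what does trisecting an angle amount to?
The triple angle formula yields:
cos(3θ)=4cos³(θ)−3cos(θ)
So finding cos(θ) from a known cos(3θ) means solving an equation of degree three, which in general needs a field extension of degree 3.
But since using a straightedge and compass was the same as solving
equations of degree one and two, the only field extension possible in
one operation has degree 2, and then using the new points we can get to powers of 2: 4, 8, 16 etc, but never anything divisible by 3.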
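A concrete case makes this tangible: trisecting 60∘ means constructing cos(20∘). Putting cos(60∘)=1/2 into the identity cos(3θ)=4cos³(θ)−3cos(θ) and clearing fractions gives 8x³−6x−1=0 for x=cos(20∘). The sketch below checks this numerically and then uses the rational root theorem to confirm that the cubic has no rational root, so it cannot be broken down into pieces of degree one and two. The choice of 60∘ is the classic example and is an addition of this illustration, not something taken from the text above.

```python
from fractions import Fraction
from math import cos, isclose, radians

# Numerical check of the triple angle identity at θ = 20°.
x = cos(radians(20))
triple = 4 * x**3 - 3 * x
print(isclose(triple, 0.5))   # True, since cos(60°) = 1/2

def cubic(t):
    """8t³ − 6t − 1, the equation satisfied by cos(20°)."""
    return 8 * t**3 - 6 * t - 1

# Rational root theorem: a rational root p/q in lowest terms
# needs p to divide 1 and q to divide 8.
candidates = [Fraction(sign, q) for q in (1, 2, 4, 8) for sign in (1, -1)]
rational_roots = [t for t in candidates if cubic(t) == 0]

print(rational_roots)         # []
```

A cubic with no rational root cannot be factored over Q, so cos(20∘) lives in an extension of degree 3, which is out of reach of towers of degree-2 steps.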
Although it’s impossible to trisect the angle using only a straightedge and compass it is possible using origami.
Solving the general Quintic
It should be said that, although the general quintic cannot be solved by radicals, it can be solved by the “Jacobi theta function”.
References
- Galois Theory for Beginners: A Historical Perspective. Jörg Bewersdorff
- http://pi.math.cornell.edu/~kbro...
- Field Automorphisms
- https://kconrad.math.uconn.edu/b...
- https://faculty.math.illinois.ed...
- Wolfram|Alpha: Making the world’s knowledge computable
- https://www.wikiwand.com/en/Galois_theory
- https://www.wikiwand.com/en/%C3%89variste_Galois
- https://www.youtube.com/watch?v=8qkfW35AqrQ&list=PLwV-9DG53NDxU337smpTwm6sef4x-SCLv&index=36