## Graphs on the edge: When subgraphs become unavoidable

Archisha works (part-time) in the advertising section of a brand-new airline company. These are the times when the company has to finalize their flight schedule. The boss is a good fellow, but with a conspicuous streak of mischief. If he can help it, he’d make his passengers fly more than they really have to.

He decides to not offer a direct flight between two cities if there is a one-hop route between them already. That is to say, if there are flights from A to B and B to C, then the boss would like to avoid offering a direct flight from A to C, hoping that passengers will be tempted into the one-hop route. On the other hand, he would also like to advertise a large number of connections. He demands from his team the largest number of connections while still ensuring that one-hop routes cannot be bypassed by direct connections.

As is the case with most startups, the company is severely understaffed, so there are no strategists on hire. Using ad-hoc skills, the pilots come up with a plan with 2,500 direct flights between 100 cities, while still maintaining that if there are flights from A to B and B to C, there are no flights between A and C. The boss is impressed, but still greedy: are you sure you can’t squeeze in some more direct connections?

Naturally, over dinner that evening, there is much cursing. When Archisha extracts the full story from her colleagues, she is determined to help. Being a trained graph theorist, her first instinct is to make vertices out of cities and edges out of direct flights. She now sets out to bound the number of edges in graphs that meet the criteria that there are no edges between vertices that are connected by paths of length two. Her first observation about such graphs is that the neighborhood of every vertex must be an independent set (devoid of edges): indeed, if $u$ and $v$ are in the neighborhood of $w$ and have an edge between them, then there is a direct flight from $u$ to $v$ despite the one-hop route via $w$, which violates the business ethic of the company.

So let $X$ be the largest independent set in the graph, and let $x$ denote $|X|$. Note that this means that the degree of every vertex is at most $x$ (why?). Let’s now count the number of edges:

$$e(G) \leq \sum_{x \in G \setminus X} d(x)$$

The sum is only over $G \setminus X$, because we know there are no edges in $X$. Notice that some edges are potentially counted twice on the right hand side (these are edges that have both endpoints inside $G \setminus X$). If the graph has $n$ vertices, then the sum on the right has $(n-x)$ terms. We also know that each of these terms is at most $x$, so we have:

$$e(G) \leq \sum_{x \in G \setminus X} d(x)\leq x(n-x) \leq n^2/4.$$

The last inequality comes from the fact that the product $x(n-x)$ is maximized when $x = (n/2)$.

So the conclusion is that any flight network that satisfies the criteria demanded by the boss cannot have more than $(n^2/4)$ direct connections — thus the pilots couldn’t have managed more than 2,500 direct flights with 100 cities to connect. The argument with the boss is sealed with mathematical proof, which is about as good as it gets. Despite questionable business skills, the boss is satisfied with the reasoning, and the company finally moves on.

Notice that the restriction above amounts to saying that you want to avoid triangles in your graphs, so the question translates to the following: what is the largest number of edges that a triangle-free graph on $n$ vertices can have? As we have seen above, the answer is $(n^2/4)$. A natural question is if this is tight in general - notice that the pilots actually “achieved” this bound. In general, can you construct a graph on $n$ vertices with $(n^2/4)$ edges that does not contain any triangles?

There are also a couple of other ways of proving the bound above. A cute calculation is the following: notice that in a triangle-free graph, the two vertices that form the endpoints of an edge cannot have any common neighbors. In other words, if $(x,y) \in E$, then $d(x) + d(y) \leq n$, because every vertex other than $x$ or $y$ either lies in $N(x)$ or $N(y)$, but not both. So summing over all edges, we get:

$$\sum_{xy \in E} d(x) + d(y) \leq mn.$$

Let’s stare at the sum closely — if we write it out, we’ll see that for every vertex $x$, the term $d(x)$ appears $d(x)$ times, so the sum on the left hand side can also be written as $\sum_{x \in V} d^2(x)$. By a neat application of the Cauchy-Schwarz inequality, we have:

$$(1/n)\left[\sum_{x \in V} d(x) \right]^2 \leq \sum_{x \in V} d^2(x) = \sum_{xy \in E} d(x) + d(y) \leq mn.$$

The leftmost term involves sum of the degrees of all the vertices in a graph — but this is precisely twice the number of edges, since every edge “contributes” to two degree terms. So substituting for the sum of degrees with $2m$, we have:

$$(2m)^2 \leq mn^2 \Rightarrow m \leq n^2/4,$$

and we are done. There is also a simple proof by induction on the number of vertices, which is as follows. The base cases, for $n = 1$ and $n=2$, are trivial. Now, suppose we are given a graph on $n$ vertices and the statement is true for all values smaller than $n$. Consider any edge in $G$ with endpoints $(x,y)$. We know from before that $d(x) + d(y) \leq n$. Consider $G \setminus \{x,y\}$. Clearly, if $G$ is triangle free, so is $G \setminus \{x,y\}$. By the induction hypothesis, the number of edges in $G \setminus \{x,y\}$ is at most $(n-2)^2/4$. So the total number of edges in $G$ is bounded by:

$$m \leq [d(x) + d(y) - 1] + (n-2)^2/4 \leq [n-1] + (n-2)^2/4 = n^2/4$$

Notice that we have a $(-1)$ term to account for the fact that the expression $d(x) + d(y)$ double-counts the edge $(x,y)$.

### Beyond triangles: Turán’s Theorem

By now, the reader might have guessed that the triangle-free graphs with the largest number of edges are the complete bipartite graphs with partitions of equal size — that is, graphs where the vertex set is split into two parts of size $\lfloor (n/2) \rfloor$ each, and we have an edge for every pair of vertices that lie in different parts. The number of edges is clearly $\lfloor (n^2/4) \rfloor$, and there are clearly no triangles.

At this point, let us introduce some terminology that will be useful to remember. Let $H$ be a fixed graph. Among the class of graphs on $n$ vertices that do not contain $H$ as a subgraph, the ones that have the maximum number of edges are said to be extremal for $n$ and $H$, and the number of edges in these graphs is denoted by ex$(n,H)$. A graph $G$ that does not contain $H$ as a subgraph is said to be edge-maximal with respect to $H$ if $G \cup \{e\}$ contains $H$ as a subgraph for any edge $e$.

It should be clear that graphs that are extremal for $n$ and $H$ are necessarily edge-maximal for $H$, but the converse is not always true. Consider, for instance, a star on $(n-1)$ leaves (this is the graph on $n$ vertices that has one global vertex). This is certainly edge-maximal with respect to avoiding triangles, but is clearly not extremal.

The most natural way of generalizing the notion of triangle-freeness is to explore extremal structures that avoid $K_{r+1}$ — the complete graph on $(r+1)$ vertices — as a subgraph. After some reflection, we realize that complete $r$-partite graphs are edge-maximal with respect to avoiding $K_{r+1}$ — indeed, adding any edge to such a graph invokes the presence of a $K_{r+1}$. Among the $r$-partite graphs, it is easy to see that the ones that have the most edges are those where the parts are as equal as possible. Indeed, if two parts $A$ and $B$ of a $r$-partite graph differ in size by two or more, then shift the surplus from the larger set to the smaller till they are equal (or differ by one). A simple counting argument reveals that the number of edges in the resulting graph is more than what it was before. If $n$ divides $r$ then each part has size $(n/r)$ and the number of edges works out to being $(n^2/2) ( 1 − 1/r)$.

Pál Turán, a Hungarian mathematician, proved that graphs which are extremal with respect to avoiding $K_{r+1}$ are exactly the $r$-partite graphs where all partition sizes differ by at most one. We prove this by showing that in a graph that is extremal with respect to $K_{r+1}$, the property of being non-adjacent is an equivalence relation. Since every equivalence relation defines a partition, this would imply that the extremal graphs are partitioned into independent sets, with the only non-adjacencies being within the independent sets — thus proving Turán’s theorem.

That non-adjacency is reflexive is easy: we are dealing with simple graphs, and there are no self-loops, so every vertex is non-adjacent to itself. Symmetry is also immediate because the graph is undirected and if $x$ is not adjacent to $y$ then $y$ is also not adjacent to $x$. What about transitivity?

We would like to prove that if $xy$ is not an edge, and $yz$ is not an edge, then $xz$ is not an edge either. Suppose, for the sake of contradiction, that $xz$ was an edge. If $xz$ is an edge and $d(x) > d(y)$, then we claim that we can replace the vertex $y$ with a copy of $x$ to obtain a different graph which has more edges than $G$ and continues to avoid $K_{r+1}$. Indeed, the new graph cannot have a $K_{r+1}$ — if it does, it must involve both $x$ and the newly introduced copy of $x$, but these vertices are non-adjacent, so the new graph cannot admit a $K_{r+1}$. It is also clear that the number of edges has increased, because in removing $y$ we removed $d(y)$ edges and in adding $x$ we added $d(x)$ edges and we assumed that $d(x) > d(y)$. This contradicts the fact that $G$ was extremal with respect to $K_{r+1}$. A similar analysis shows that we cannot have $d(z) > d(y)$.

The situation that remains is that $d(x) \leq d(y)$ and $d(z) \leq d(y)$. In this scenario, we replace $x$ and $z$ each with a copy of $y$, without an edge between them. Again, it’s easy to see that the new graph also avoids $K_{r+1}$. Notice that in this process, we lost $d(x) + d(z) - 1$ edges and we added back $2d(y)$ edges. Since $d(y) \geq d(x)$ and $d(y) \geq d(z)$, we added more edges than we removed, and we have again contradicted the fact that $G$ is extremal.

Turán’s theorem also admits a slightly less imaginative proof by induction, which we leave as an exercise.