Modeling 3,000 Years of Human History

Interesting papers on history rarely appear in the Proceedings of the National Academy of Sciences, so I was glad to stumble upon “War, Space, and the Evolution of Old World Complex Societies” by Peter Turchin et al. The authors developed a model that uses cultural evolution mechanisms to predict where and when the largest-scale complex societies should have arisen in human history.

From their abstract:

How did human societies evolve from small groups, integrated by face-to-face cooperation, to huge anonymous societies of today, typically organized as states? Why is there so much variation in the ability of different human populations to construct viable states? Existing theories are usually formulated as verbal models and, as a result, do not yield sharply defined, quantitative predictions that could be unambiguously tested with data. Here we develop a cultural evolutionary model that predicts where and when the largest-scale complex societies arose in human history. The central premise of the model, which we test, is that costly institutions that enabled large human groups to function without splitting up evolved as a result of intense competition between societies—primarily warfare. Warfare intensity, in turn, depended on the spread of historically attested military technologies (e.g., chariots and cavalry) and on geographic factors (e.g., rugged landscape). The model was simulated within a realistic landscape of the Afroeurasian landmass and its predictions were tested against a large dataset documenting the spatiotemporal distribution of historical large-scale societies in Afroeurasia between 1,500 BCE and 1,500 CE. The model-predicted pattern of spread of large-scale societies was very similar to the observed one. Overall, the model explained 65% of variance in the data. An alternative model, omitting the effect of diffusing military technologies, explained only 16% of variance. Our results support theories that emphasize the role of institutions in state-building and suggest a possible explanation why a long history of statehood is positively correlated with political stability, institutional quality, and income per capita.

The model simulation runs from 1500 B.C.E. to 1500 C.E., so it encompasses the growth of societies such as Mesopotamia and ancient Egypt, and it explains 65 percent of the variance in the historical data.
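To make "explained 65% of variance" concrete: a figure like this is commonly read as a coefficient of determination (R²), which compares a model's squared prediction errors with the spread of the data around its mean. The sketch below is a generic Python illustration under that assumption. The function name and the toy numbers are hypothetical, and the paper's exact statistic may be defined differently.

```python
def r_squared(observed, predicted):
    """Fraction of the variance in `observed` accounted for by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    # Squared error left over after using the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Squared spread of the data around its own mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy data, not taken from the paper.
observed = [1.0, 2.0, 3.0, 4.0]
print(r_squared(observed, [1.0, 2.0, 3.0, 4.0]))  # perfect predictions
print(r_squared(observed, [2.5, 2.5, 2.5, 2.5]))  # always predicting the mean
```

On this scale a perfect model scores 1.0 and a model that always predicts the mean scores 0.0, so 0.65 versus 0.16 is a large gap between the two models in the abstract.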

Smithsonian Magazine summarizes:

Turchin began thinking about applying math to history in general about 15 years ago. “I always enjoyed history, but I realized then that it was the last major discipline which was not mathematized,” he explains. “But mathematical approaches—modeling, statistics, etc.—are an inherent part of any real science.”

In bringing these sorts of tools into the arena of world history and developing a mathematical model, his team was inspired by a theory called cultural multilevel selection, which predicts that competition between different groups is the main driver of the evolution of large-scale, complex societies. To build that into the model, they divided all of Africa and Eurasia into gridded squares which were each categorized by a few environmental variables (the type of habitat, elevation, and whether it had agriculture in 1500 B.C.E.). They then “seeded” military technology in squares adjacent to the grasslands of central Asia, because the domestication of horses—the dominant military technology of the age—likely arose there initially.

Over time, the model allowed for domesticated horses to spread between adjacent squares. It also simulated conflict between various entities, allowing squares to take over nearby squares, determining victory based on the area each entity controlled, and thus growing the sizes of empires. After plugging in these variables, they let the model simulate 3,000 years of human history, then compared its results to actual data, gleaned from a variety of historical atlases.
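The "spread between adjacent squares" idea can be sketched in a few lines of Python. This is only a toy under loud assumptions: the 5×5 grid, the function name `spread_step`, and the deterministic rule (every square with the technology passes it to its four neighbors each step) are all hypothetical. The real model also includes habitat, elevation, agriculture, and conflict between polities, none of which appear here.

```python
def spread_step(has_tech):
    """Return a new grid in which technology has spread to 4-neighbors."""
    rows, cols = len(has_tech), len(has_tech[0])
    nxt = [row[:] for row in has_tech]  # copy so updates happen simultaneously
    for r in range(rows):
        for c in range(cols):
            if has_tech[r][c]:
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        nxt[rr][cc] = True
    return nxt


def count_squares(grid):
    """Number of squares that currently have the technology."""
    return sum(cell for row in grid for cell in row)


# Hypothetical 5x5 world, "seeded" in the center square.
grid = [[False] * 5 for _ in range(5)]
grid[2][2] = True
for step in range(1, 5):
    grid = spread_step(grid)
    print(step, count_squares(grid))
```

Each step the technology front moves one square outward, which is the basic mechanism the Smithsonian summary describes for domesticated horses.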

Click here to see a movie of the model in action.

Of particular interest to me was the discussion of the limitations of the model (100-year sampling and exclusion of city-states of Greece):

Due to the nature of the question addressed in our study, there are inevitably several sources of error in historical and geographical data we have used. Our decision to collect historical data only at 100-year time-slices means that the model ‘misses’ peaks of some substantial polities such as the Empire of Alexander the Great, or Attila’s Hunnic Empire. This could be seen as a limitation for traditional historical analyses because we have not included a few polities known to be historically influential. However, for the purposes of our analyses this is actually a strength. Using a regular sampling strategy allows us to collect data in a systematic way independent of the hypothesis being tested rather than cherry-picking examples that support our ideas.

We have also only focused on the largest polities, i.e. those that were approximately greater than 100,000 km2. This means that some complex societies, such as the Ancient Greek city states, are not included in our database. The focus on territorial extent is also a result of our attempt to be systematic and minimize bias, and this large threshold was chosen for practical considerations. Historical information about the world varies partly in the degree to which modern societies can invest in uncovering it. Our information about the history of western civilization, thus, is disproportionately good compared to some other parts of the world. Employing a relatively large cut-off minimizes the risk of “missing” polities with large populations in less well-documented regions and time-frames, because the larger the polity the more likely it is to have left some trace in the historical record. At a smaller threshold there are simply too many polities about which we have very little information, including their territories, and the effects of a bias in our access to the historical record is increased.

Overall, I think the supporting information for the model is actually a much more interesting read than the paper itself.

Norbert Wiener’s The Machine Age, Published 64 Years Later

In 1949, The New York Times invited MIT mathematician Norbert Wiener to summarize his views about “what the ultimate machine age is likely to be,” in the words of its longtime Sunday editor, Lester Markel.

Wiener accepted the invitation and wrote a draft of the article, but Markel was dissatisfied and asked him to rewrite it. Wiener did. But the piece fell through the cracks, and neither version was ever published.

Until now.

Per the Times, “almost 64 years after Wiener wrote it, his essay is still remarkably topical, raising questions about the impact of smart machines on society and of automation on human labor. In the spirit of rectifying an old omission,” here are excerpts from The Machine Age.

My favorite one is titled “The Genie and the Bottle”:

These new machines have a great capacity for upsetting the present basis of industry, and of reducing the economic value of the routine factory employee to a point at which he is not worth hiring at any price. If we combine our machine-potentials of a factory with the valuation of human beings on which our present factory system is based, we are in for an industrial revolution of unmitigated cruelty.

We must be willing to deal in facts rather than in fashionable ideologies if we wish to get through this period unharmed. Not even the brightest picture of an age in which man is the master, and in which we all have an excess of mechanical services will make up for the pains of transition, if we are not both humane and intelligent.

Finally the machines will do what we ask them to do and not what we ought to ask them to do. In the discussion of the relation between man and powerful agencies controlled by man, the gnomic wisdom of the folk tales has a value far beyond the books of our sociologists.

There is general agreement among the sages of the peoples of the past ages, that if we are granted power commensurate with our will, we are more likely to use it wrongly than to use it rightly, more likely to use it stupidly than to use it intelligently. [W. W. Jacobs’s] terrible story of the “Monkey’s Paw” is a modern example of this — the father wishes for money and gets it as a compensation for the death of his son in a factory accident, then wishes for the return of his son. The son comes back as a ghost, and the father wishes him gone. This is the outcome of his three wishes.

Moreover, if we move in the direction of making machines which learn and whose behavior is modified by experience, we must face the fact that every degree of independence we give the machine is a degree of possible defiance of our wishes. The genie in the bottle will not willingly go back in the bottle, nor have we any reason to expect them to be well disposed to us.

In short, it is only a humanity which is capable of awe, which will also be capable of controlling the new potentials which we are opening for ourselves. We can be humble and live a good life with the aid of the machines, or we can be arrogant and die.


Interestingly, I didn’t make the connection to who this Wiener fellow was until I looked him up on Wikipedia. I studied the famous Wiener process (also known as standard Brownian motion) in my graduate course in stochastics.
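For readers who haven't met it, a Wiener process starts at zero and moves by independent Gaussian increments whose variance equals the elapsed time. Here is a minimal Python sketch of one discretized sample path. The function name, step count, and seed are arbitrary choices for illustration.

```python
import math
import random


def simulate_wiener(n_steps, dt, seed=0):
    """Return one discretized sample path of standard Brownian motion."""
    rng = random.Random(seed)  # fixed seed so the path is reproducible
    path = [0.0]               # W(0) = 0 by definition
    for _ in range(n_steps):
        # Each increment is Normal(0, dt), i.e. standard deviation sqrt(dt).
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


path = simulate_wiener(n_steps=1000, dt=0.001)
print(len(path), path[0], path[-1])
```

Smaller values of `dt` give a finer approximation of the continuous-time process over the same time span.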

The Mathematics of a Swimsuit

The New Yorker is currently presenting its Swimsuit issue, and one of the more interesting pieces comes from Gregory Buck, a mathematician. In the piece “A Mathematician Goes to the Beach,” Buck considers the mathematics of the swimsuit, breaking out terms such as visual volatility and singularity:

The job of a swimsuit is to uphold decency while you hang out in places where people might, conceivably, swim. We can think of this decency, this modesty, as a load or strain the suit must bear. Different suit designs solve this problem in different ways, though each must take into account the regions which must be covered (RMBCs). There has, it’s well known, been a considerable decline in the percentage of skin area covered by swimsuits over the last hundred years (which has increased visual volatility—dramatic swings to both ends of the attraction/repulsion spectrum). As the suit becomes smaller and smaller, each square inch takes on more and more of the weight of propriety.

The equation here is pretty straightforward. For example, let DL represent the total decency load. DL has been declining with time, but can be considered fixed during any given beach season. Let SA be the surface area of the suit, and SK the surface area of the skin. Then if VV is the visual volatility, we have:


The proper mathematical way to look at this is to say that since, as the suit shrinks, a finite decency mass is concentrated into an ever smaller region, the decency density grows larger and larger—growing toward infinity. This point of infinite density is called a singularity. So we have that each RMBC has an associated singularity. And each beach-goer, on each beach, has an associated decency surface, with some number of singularities. The first thing a mathematician does, when faced with a surface or space with singularities, is, naturally enough, count them. A most unusual aspect of this particular singularity problem is that the count is culturally dependent—in fact there are countries where the sum is less than it is in the United States. I have heard that there are beaches where a bather’s decency surface might have no singularities at all, a prospect I have not the courage to consider.
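The limiting argument in that passage can be written out in symbols. This is only a reading of the prose, reusing DL (decency load) and SA (suit surface area) from the excerpt. The symbol \rho for "decency density" is an assumption introduced here and is not the author's notation.

```latex
\rho = \frac{DL}{SA},
\qquad
\lim_{SA \to 0^{+}} \frac{DL}{SA} = \infty
\quad \text{for fixed } DL > 0 .
```

With DL held fixed for the beach season, halving SA doubles \rho, and the singularity is the limit in which SA shrinks to zero.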

Hilarious and enlightening.

Stephen Wolfram on Personal Data Analytics

Stephen Wolfram, the designer of Mathematica, believes that someday everyone will routinely collect all sorts of data about themselves.

In a fascinating blog post, Wolfram admits that he’s been collecting data for many years (since 1990!), and until now, hadn’t had the chance to truly analyze the data. Using the data analytics tools in the latest release of Wolfram Alpha, Stephen Wolfram provides a summary of his outgoing and incoming email (on a daily and monthly basis), the keystrokes he’s used on his computers, how much time he’s spent on the telephone, and the number of steps he’s taken on a daily basis (since 2010). He makes the following observation about his data collection:

The overall pattern is fairly clear. It’s meetings and collaborative work during the day, a dinner-time break, more meetings and collaborative work, and then in the later evening more work on my own. I have to say that looking at all this data I am struck by how shockingly regular many aspects of it are. But in general I am happy to see it. For my consistent experience has been that the more routine I can make the basic practical aspects of my life, the more I am able to be energetic—and spontaneous—about intellectual and other things.

Wolfram mentions that the data he presents in the blog post only scratches the surface of the kinds of data he’s collected over the years. He’s also got years of curated medical test data, his complete genome, GPS location tracks, room-by-room motion sensor data, and “endless corporate records.” I am guessing a follow-up post from him will be forthcoming some day.

As for Wolfram’s conclusions about the future of personal analytics?

There is so much that can be done. Some of it will focus on large-scale trends, some of it on identifying specific events or anomalies, and some of it on extracting “stories” from personal data.

And in time I’m looking forward to being able to ask Wolfram|Alpha all sorts of things about my life and times—and have it immediately generate reports about them. Not only being able to act as an adjunct to my personal memory, but also to be able to do automatic computational history—explaining how and why things happened—and then making projections and predictions.

As personal analytics develops, it’s going to give us a whole new dimension to experiencing our lives. At first it all may seem quite nerdy (and certainly as I glance back at this blog post there’s a risk of that). But it won’t be long before it’s clear how incredibly useful it all is—and everyone will be doing it, and wondering how they could have ever gotten by before. And wishing they had started sooner, and hadn’t “lost” their earlier years.

Definitely check out Stephen Wolfram’s detailed and insightful post. And if you’re interested in data analytics, this site is a great resource. I also recommend watching the brief TED talk “The Quantified Self” by Gary Wolf.

Solving the Sudoku Minimum Number of Clues Problem

Three mathematicians (Gary McGuire, Bastian Tugemann, and Gilles Civario) spent a year working on a sudoku puzzle. Well, it’s a bit more complicated than that. The essential question they sought to answer: what is the minimum number of clues a sudoku puzzle must provide to be solvable? It turns out that a puzzle needs at least 17 clues (out of a total of 81 squares on a sudoku board) to have a unique solution. From their paper, here is the abstract:

We apply our new hitting set enumeration algorithm to solve the sudoku minimum number of clues problem, which is the following question: What is the smallest number of clues (givens) that a sudoku puzzle may have? It was conjectured that the answer is 17. We have performed an exhaustive search for a 16-clue sudoku puzzle, and we did not find one, thereby proving that the answer is indeed 17. This article describes our method and the actual search.

If you aren’t familiar with sudoku…the puzzle solver is presented with a 9×9 grid, some of whose cells already contain a digit between 1 and 9. The puzzle solver must complete the grid by filling in the remaining cells such that each row, each column, and each 3×3 box contains all digits between 1 and 9 exactly once. It is always understood that any proper (valid) sudoku puzzle must have only one completion. In other words, there is only one solution, only one correct answer.
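The rules above, and the "only one completion" requirement, are easy to express in code. The following Python sketch checks a finished grid and counts completions of a puzzle by plain backtracking. It is a generic illustration, not the hitting set algorithm from the paper. The function names, the use of 0 for an empty cell, and the formula-generated example grid are all assumptions made for this example.

```python
def is_valid_solution(grid):
    """True if every row, column, and 3x3 box holds the digits 1-9 once each."""
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(9)} for c in range(9)]
    boxes = [{grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
             for br in (0, 3, 6) for bc in (0, 3, 6)]
    return all(unit == digits for unit in rows + cols + boxes)


def candidates(grid, r, c):
    """Digits not yet used in the row, column, or box of cell (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]


def count_solutions(grid, limit=2):
    """Count completions of `grid` (0 = empty), stopping once `limit` is hit."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                total = 0
                for d in candidates(grid, r, c):
                    grid[r][c] = d
                    total += count_solutions(grid, limit - total)
                    grid[r][c] = 0  # undo the guess before trying the next one
                    if total >= limit:
                        break
                return total
    # No empty cells left: the grid counts only if it obeys all the rules.
    return 1 if is_valid_solution(grid) else 0


# A completed grid built from a simple shifting pattern (illustration only).
full = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]

# Erase every 1 and 2: swapping them gives a second completion, so the
# result is not a proper puzzle.
ambiguous = [[0 if v in (1, 2) else v for v in row] for row in full]
print(is_valid_solution(full), count_solutions(ambiguous))
```

A proper puzzle is one for which `count_solutions` returns exactly 1. Stopping at a limit of 2 is enough to tell "unique" from "not unique" without enumerating every completion.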

And hence the interest in the minimum number of clues problem: What is the smallest number of clues that can possibly be given such that a sudoku puzzle still has only one solution?

There are exactly 6,670,903,752,021,072,936,960 possible completed sudoku grids (about 6.7 × 10^21). That’s far more than can be checked in a reasonable period of time. But thanks to various symmetry arguments (also known as equivalence transformations), many grids are equivalent to one another, which reduces the number of grids to be checked to 5,472,730,538.
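A little arithmetic shows how much the symmetry reduction buys. The two grid counts below come from the text. The rate of one billion grids per second is a purely hypothetical figure chosen for illustration, and it covers only visiting each grid once, not the much heavier search for 16-clue puzzles inside each grid.

```python
TOTAL_GRIDS = 6_670_903_752_021_072_936_960  # all completed grids
DISTINCT_GRIDS = 5_472_730_538               # after symmetry reduction

ASSUMED_RATE = 1_000_000_000                 # grids per second (hypothetical)
SECONDS_PER_YEAR = 365 * 24 * 3600

reduction_factor = TOTAL_GRIDS / DISTINCT_GRIDS
years_for_all = TOTAL_GRIDS / ASSUMED_RATE / SECONDS_PER_YEAR
seconds_for_distinct = DISTINCT_GRIDS / ASSUMED_RATE

print(f"total grids:        {TOTAL_GRIDS:.1e}")
print(f"reduction factor:   {reduction_factor:.2e}")
print(f"all grids:          {years_for_all:,.0f} years at the assumed rate")
print(f"distinct grids:     {seconds_for_distinct:.1f} seconds at the assumed rate")
print(f"17 clues fill {17 / 81:.0%} of the board")
```

The reduction is roughly a factor of a trillion, which is what turns an impossible enumeration into one that fits in a year of computing.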

I am always on the lookout for mathematicians doing fun things, so if I find any papers on solving other types of games or puzzles, I will post the results here.


(via Technology Review)

On Understanding Advanced Mathematics

What’s it like to have an understanding of very advanced mathematics? A very detailed answer in a Quora post:

  • You can answer many seemingly difficult questions quickly. But you are not very impressed by what can look like magic, because you know the trick. The trick is that your brain can quickly decide if a question is answerable by one of a small number of powerful general purpose “machines” (e.g. continuity arguments, combinatorial arguments, correspondence between geometric and algebraic objects, linear algebra, compactness arguments that reduce the infinite to the finite, dynamical systems, etc.). The number of fundamental ideas and techniques that people use to solve problems is pretty small — see for a partial list, maintained by Tim Gowers.
  • You are often confident that something is true long before you have an airtight proof for it (this happens especially often in geometry). The main reason is that you have a large catalogue of connections between concepts, and you can quickly intuit that if X were to be false, that would create tensions with other things you know to be true, so you are inclined to believe X is probably true to maintain the harmony of the conceptual space. It’s not so much that you can “imagine” the situation perfectly, but you can quickly imagine many other things that are logically connected to it.
  • Your intuitive thinking about a problem is productive and usefully structured, wasting little time on being puzzled. For example, when answering a question about a high-dimensional space (e.g., whether a certain kind of rotation of a five-dimensional object has a “fixed point” which does not move during the rotation), you do not spend much time straining to visualize those things that do not have obvious analogues in two and three dimensions. (Violating this principle is a huge source of frustration for beginning maths students who don’t know that they shouldn’t be straining.) Instead…
  • When trying to understand a new thing, you automatically focus on very simple examples that are easy to think about, and then you leverage intuition about simple examples into much more impressive insights. For example, you might imagine two- and three- dimensional rotations that are analogous to the one you really care about, and think about whether they clearly do or don’t have the desired property. Then you think about what was important to those examples and try to distill those ideas into symbols. Often, you see that the key idea in those symbolic manipulations doesn’t depend on anything about two or three dimensions, and you know how to answer your hard question.
    As you get more mathematically advanced, the examples you consider easy are actually complex insights built up from many easier examples; the “simple case” you think about now took you two years to become comfortable with. But at any given stage, you do not strain to obtain a magical illumination about something intractable; you work to reduce it to the things that feel friendly.
  • You go up in abstraction, “higher and higher”. The main object of study yesterday becomes just an example or a tiny part of what you are considering today. For example, in calculus classes you think about functions or curves. In functional analysis or algebraic geometry, you think of spaces whose points are functions or curves — that is, you “zoom out” so that every function is just a point in a space, surrounded by many other “nearby” functions. Using this kind of “zooming out” technique, you can say very complex things in very short sentences — things that, if unpacked and said at the “zoomed in” level, would take up pages. Abstracting and compressing in this way allows you to consider very complicated issues while using your limited memory and processing power.
  • Understanding something abstract or proving that something is true becomes a task a lot like building something. You think: “First I will lay this foundation, then I will build this framework using these familiar pieces, but leave the walls to fill in later, then I will test the beams…” All these steps have mathematical analogues, and structuring things in a modular way allows you to spend several days thinking about something without feeling lost or frustrated. Andrew Wiles, who proved Fermat’s Last Theorem, used an “exploring” metaphor: “Perhaps I can best describe my experience of doing mathematics in terms of a journey through a dark unexplored mansion. You enter the first room of the mansion and it’s completely dark. You stumble around bumping into the furniture, but gradually you learn where each piece of furniture is. Finally, after six months or so, you find the light switch, you turn it on, and suddenly it’s all illuminated. You can see exactly where you were. Then you move into the next room and spend another six months in the dark. So each of these breakthroughs, while sometimes they’re momentary, sometimes over a period of a day or two, they are the culmination of—and couldn’t exist without—the many months of stumbling around in the dark that precede them.”
  • You are humble about your knowledge because you are aware of how weak maths is, and you are comfortable with the fact that you can say nothing intelligent about most problems. There are only very few mathematical questions to which we have reasonably insightful answers. There are even fewer questions, obviously, to which any given mathematician can give a good answer. After two or three years of a standard university curriculum, a good maths undergraduate can effortlessly write down hundreds of mathematical questions to which the very best mathematicians could not venture even a tentative answer. This makes it more comfortable to be stumped by most problems; a sense that you know roughly what questions are tractable and which are currently far beyond our abilities is humbling, but also frees you from being intimidated, because you do know you are familiar with the most powerful apparatus we have for dealing with these kinds of problems.

(Hat tip: Chris Dixon)

Readings: Compressed Sensing, Future of Money, Google’s Search Algorithm

I finished reading, from cover to cover, the March 2010 edition of Wired magazine last week. Today’s links of the day are all from Wired.

(1) “Fill in the Blanks: Using Math to Turn Lo-Res Datasets into High-Res Samples” [Wired] – a fascinating look into the Compressed Sensing algorithm. This article explores how the algorithm, discovered accidentally by Emmanuel Candès, has applications in medical imaging, satellite imaging, and photography. On the origins of the algorithm:

Candès, with the assistance of postdoc Justin Romberg, came up with what he considered to be a sketchy and incomplete theory for what he saw on his computer. He then presented it on a blackboard to a colleague at UCLA named Terry Tao. Candès came away from the conversation thinking that Tao was skeptical — the improvement in image clarity was close to impossible, after all. But the next evening, Tao sent a set of notes to Candès about the blackboard session. It was the basis of their first paper together. And over the next two years, they would write several more.

If you’ve never heard of Terence Tao, you should find out more about him. He’s one of the most brilliant mathematicians alive today (at 24, Tao was promoted to full professor at UCLA, the youngest person ever to reach that rank there; he also won the Fields Medal in 2006, often described as the mathematics equivalent of the Nobel Prize). Tao maintains a very popular blog here (popular among mathematicians and those who really enjoy math, as most of the topics are quite esoteric for a general audience).

So how does compressed sensing work?

Compressed sensing works something like this: You’ve got a picture — of a kidney, of the president, doesn’t matter. The picture is made of 1 million pixels. In traditional imaging, that’s a million measurements you have to make. In compressed sensing, you measure only a small fraction — say, 100,000 pixels randomly selected from various parts of the image. From that starting point there is a gigantic, effectively infinite number of ways the remaining 900,000 pixels could be filled in.

So is this a revolutionary technique? The implication, of course, is that you can create something out of nothing. I remain unconvinced that this technology will be used in digital photography in the future, but I do anticipate that for gathering large data sets, such as in satellite imagery, this technique will become very popular…

(2) “The Future of Money” [Wired] – an excellent, comprehensive piece explaining how paying for things online has evolved since the early days of PayPal. This is a must-read if you’re unfamiliar with the history of PayPal, don’t know how credit card transactions are made, or haven’t heard of recent developments such as TwitPay and Square.

(3) “How Google’s Algorithm Rules the Web” [Wired] – most likely, you use Google every single day. This article explores the fascinating story behind the Google search algorithm, from PageRank to the rollout of real-time search in December 2009, and its adaptation and evolution over the years. This article is a must-read.