When Wolfram|Alpha launched three years ago, it did so with broad (but not very deep) socioeconomic data for most geographic places on Earth. Since then, each enhancement of this part of our knowledge base has tended to address just one type of place at a time. Sometimes we’ve added an entirely new category (like US congressional districts or school districts); other times, we’ve added a narrowly focused set of properties to an existing category (such as age pyramids for countries or home prices for US metro areas).
I’ve been proud of each of these individual features, but also frustrated by how hard it’s been to get detailed and directly comparable data for many different types of places at once—the kind of data, in other words, that Wolfram|Alpha is perfectly suited to work with.
But thanks to the outstanding work of our friends at the US Census Bureau, we’ve been able to take some big steps toward filling this “data gap.” The annual American Community Survey (ACS) is designed to replace the old long-form decennial census questionnaire, covering information about age, sex, race, ethnicity, education, income, and much more. In 2006, the Census Bureau released the first single-year ACS estimates, but only for areas with populations of 65,000 or more; in 2008, three-year estimates came out for areas with populations of 20,000 or more; and in 2010, the first five-year estimates were released, covering every geographic area in the country.
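To make those coverage rules concrete, here is a minimal Python sketch of which ACS estimate types would exist for a place of a given size. The function name `available_acs_estimates` is purely illustrative (it is not part of Wolfram|Alpha or any Census Bureau tool), and it assumes both population cutoffs are inclusive.

```python
from typing import List


def available_acs_estimates(population: int) -> List[str]:
    """Return the ACS estimate types published for an area of this size.

    Hypothetical helper: it encodes only the thresholds described above
    and assumes both cutoffs are inclusive.
    """
    if population < 0:
        raise ValueError("population must be non-negative")

    # Five-year estimates cover every geographic area.
    estimates = ["5-year"]
    # Three-year estimates require a population of 20,000 or more.
    if population >= 20_000:
        estimates.append("3-year")
    # Single-year estimates require a population of 65,000 or more.
    if population >= 65_000:
        estimates.append("1-year")
    return estimates
```

A small town falls through both checks and gets only the five-year estimates, which is why that release matters so much for broad geographic coverage.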
What does this mean for Wolfram|Alpha? It means that when we add new data from the five-year ACS estimates, we can immediately compute answers to a new set of questions about virtually every city, school district, congressional district, county, metropolitan area, and state in the country—as well as questions about the nation overall. You can ask about a specific place, compare several specific places, or generate distributions and rankings for a single property mapped over a large set of places.
Let’s start with one of the most fundamental—and most frequently requested—demographic breakdowns: population by age and sex. I’ve always been able to ask Wolfram|Alpha simple questions like “What’s the population of the city of Mars, PA?” (my tiny hometown). But I couldn’t dig any deeper into those numbers.
Now that we’ve added some ACS estimates to our knowledge base, I can ask for a population pyramid for Mars, PA, or ask what fraction of the city’s population is female, or even what fraction is girls age 0 to 4. But then I might be curious about how the city proper compares to my old school district. Or I might want to analyze and rank the proportion of school-age children among school districts in my home county. Since it’s an election year, I also find myself asking Wolfram|Alpha to do things like compare the middle-aged male population fraction of PA congressional districts or compare the senior citizen population fraction of Florida and Nevada, two other supposed swing states in the upcoming election.
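The arithmetic behind these questions is simple once you have a table of counts by age and sex. The Python sketch below shows one way it could work; it is not how Wolfram|Alpha computes its answers. The function names, the age bands, and the two places with their counts are all invented for illustration and are not ACS figures.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Counts keyed by (sex, age band). All numbers below are made up.
AgeSexTable = Dict[Tuple[str, str], int]


def fraction(table: AgeSexTable,
             sex: Optional[str] = None,
             age_bands: Optional[Iterable[str]] = None) -> float:
    """Share of the total population matching the given sex and age bands."""
    total = sum(table.values())
    if total == 0:
        raise ValueError("table has no population")
    bands = None if age_bands is None else set(age_bands)
    selected = sum(
        count for (s, band), count in table.items()
        if (sex is None or s == sex) and (bands is None or band in bands)
    )
    return selected / total


def rank_places(tables: Dict[str, AgeSexTable],
                sex: Optional[str] = None,
                age_bands: Optional[Iterable[str]] = None
                ) -> List[Tuple[str, float]]:
    """Rank places from highest to lowest share for the chosen group."""
    bands = None if age_bands is None else list(age_bands)
    scores = {name: fraction(t, sex, bands) for name, t in tables.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical example data for two imaginary places.
place_a: AgeSexTable = {
    ("female", "0-4"): 50, ("female", "5-17"): 150,
    ("female", "18-64"): 600, ("female", "65+"): 200,
    ("male", "0-4"): 50, ("male", "5-17"): 150,
    ("male", "18-64"): 650, ("male", "65+"): 150,
}
place_b: AgeSexTable = {
    ("female", "0-4"): 100, ("female", "5-17"): 300,
    ("female", "18-64"): 500, ("female", "65+"): 100,
    ("male", "0-4"): 100, ("male", "5-17"): 300,
    ("male", "18-64"): 500, ("male", "65+"): 100,
}

female_share = fraction(place_a, sex="female")
girls_under_5 = fraction(place_a, sex="female", age_bands=["0-4"])
school_age_ranking = rank_places(
    {"Place A": place_a, "Place B": place_b}, age_bands=["5-17"]
)
```

Because every fraction is taken against the place's own total, places of very different sizes can be compared and ranked directly.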
Even limiting myself to questions about population by age and sex, I’ve squandered a probably unhealthy amount of time comparing the shapes of specific cities’ age pyramids. Consider the distinctive spikes of college towns like Champaign, Illinois, and Binghamton, New York, or the dramatically different “bulges” of Manhattan and Staten Island.
And those are only questions related to a single table of ACS estimates. We’ve already added data on race, Hispanic origin, and poverty; estimates of educational attainment, school enrollment, household income, and more are coming within the next few weeks. Because each of these topics represents such a large volume of data and such a wealth of new things to compute with Wolfram|Alpha, we plan to publish a new blog post each week for the next month or so. Each post will focus on one or two new topics, with lots of examples of new ACS-based queries and other computations that mash up ACS estimates with other datasets in Wolfram|Alpha.
We’ll also be making some subtle improvements to Wolfram|Alpha’s ability to understand complex, natural-language queries about this data, but, as always, it helps to have lots of real test cases from users. So dig in, play around, and let us know what works—or what could work better. We’re excited to make this rich data more accessible to the general public and eager to hear what you think about it.