From the Wolfram Science Summer School to Wolfram|Alpha Pro

May 1, 2012 — Carlo Barbieri

In spring 2011, while adding the finishing touches to my PhD dissertation, I decided to enroll in the Wolfram Science Summer School (then called the NKS Summer School). I never suspected that my project at the Summer School would lead to a job and my involvement in one of the central features of Wolfram|Alpha Pro.

During my years as a graduate student I had the chance to live in three different countries and experience different working environments: besides my native Italy, I lived in Paris, where my PhD was based at ENS, and in Princeton, where I was lucky enough to spend time at the Institute for Advanced Study. However, by the end of my PhD I felt that I had lost most of my interest in what I was doing and that I needed to try something new.

Once at the Summer School, I had the chance to meet and chat with Stephen Wolfram as he helped me come up with a problem to work on. One of the first things I told him was that I was weary of open-ended academic problems and afraid that no one would ever read my papers. I said that I wanted intellectual challenges, but I also wanted to tackle something with a clear beginning and end.

His reply came as a disappointment, since what he suggested I work on was both completely outside my area of expertise and clearly one of those impossibly broad problems I had become skeptical of. What did he say?

Stephen asked me to devise a system to generate a plain English description of a time series. My disappointment vanished quickly when I realized that while a general answer to this kind of question was well beyond the scope of the three weeks I had at the Summer School, there was a combination of neat heuristics and information theory ideas that might do a reasonable job in most cases.
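The post doesn’t say which heuristics the Summer School project actually used. As a rough illustration only, here is a minimal Python sketch of one plausible rule of that kind, assuming (my assumption, not the project’s method) that the overall trend is read off a least-squares line and that a fitted change under 5% of the series’ typical level counts as “roughly flat”. The function name `describe_trend` and both thresholds are hypothetical.

```python
import statistics


def describe_trend(values, flat_threshold=0.05, strong_threshold=0.5):
    """Return a short English phrase for the overall trend of a series.

    Hypothetical heuristic: fit a least-squares line, then compare the total
    change implied by that line with the mean absolute level of the series.
    """
    n = len(values)
    if n < 2:
        return "too short to describe"
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(values)
    # Least-squares slope: covariance(x, y) / variance(x).
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    den = sum((x - x_mean) ** 2 for x in xs)
    slope = num / den
    # Total change implied by the fitted line from first to last point.
    fitted_change = slope * (n - 1)
    # Typical level of the series; fall back to 1.0 for an all-zero series.
    scale = statistics.fmean(abs(v) for v in values) or 1.0
    relative = fitted_change / scale
    if abs(relative) < flat_threshold:
        return "roughly flat"
    direction = "increasing" if relative > 0 else "decreasing"
    strength = "strongly" if abs(relative) > strong_threshold else "moderately"
    return f"{strength} {direction}"


print(describe_trend([3, 4, 6, 7, 9]))  # strongly increasing
```

Only the trend is covered here; the information-theoretic ideas mentioned above are not modeled.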

Little did I know how much I would love this kind of experimental coding, or that it would become my full-time job as a Wolfram employee a few months later. My project turned out to be a success, and soon I was encouraged to apply for a job at Wolfram Research, the company behind Mathematica, a product my friends and I had long considered a godsend for our work.

In my first four months at the company, I got involved in a very exciting project that has taken Wolfram|Alpha in an entirely new direction. I worked in a small group on what was known internally by the name tabular input (or TI to its friends). Along with image upload, file upload, and data download, TI formed one of the foundations of our subscription service, Wolfram|Alpha Pro.

The idea behind this project was to treat data as if it were language: columns of a single type of data, like numbers, dates, places, or what have you, act as words in a sentence. And just like sentences of human language, groups of these words are more than the sum of their parts. To give a particularly important example, think about a time series, which is a column of dates plus one or more columns of numeric quantities.
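To make the “words” idea concrete, here is a hedged Python sketch that assigns each column a type and then checks for the time-series pattern described above: exactly one date column plus at least one numeric column. It assumes ISO dates (`YYYY-MM-DD`) and plain decimal numbers only, whereas the real parser handled many more formats; all function names are hypothetical.

```python
from datetime import datetime


def _is_date(cell):
    # Assumption: only ISO dates such as "2012-05-01" are recognized.
    try:
        datetime.strptime(cell, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def _is_number(cell):
    try:
        float(cell)
        return True
    except ValueError:
        return False


def column_type(cells):
    """Classify a column of strings as 'date', 'number', or 'text' (hypothetical rule)."""
    if all(_is_date(c) for c in cells):
        return "date"
    if all(_is_number(c) for c in cells):
        return "number"
    return "text"


def is_time_series(columns):
    """True if the columns read like the 'sentence' date + one or more numbers."""
    types = [column_type(col) for col in columns]
    return types.count("date") == 1 and types.count("number") >= 1 and "text" not in types


table = [
    ["2012-01-01", "2012-02-01", "2012-03-01"],  # dates
    ["101.2", "99.8", "103.5"],                  # a numeric quantity
]
print([column_type(col) for col in table])  # ['date', 'number']
print(is_time_series(table))                # True
```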

We knew we had at our disposal a huge amount of data coming from Wolfram|Alpha as well as the power of the parser to recognize what this data was all about. We could parse dates in any format, as well as various currencies and units. This made it natural to think along these lines: This column is about currency. It appears alongside a column of dates. We happen to have data about inflation in that country. We can do an inflation adjustment on the currency values!
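The inflation adjustment at the end of that chain of reasoning is, in its usual textbook form, a ratio of price indices: a nominal amount from one period is multiplied by the consumer price index (CPI) of the target period and divided by the CPI of the original period. The sketch below assumes that standard formula; the CPI and price figures are made up for illustration and are not real data.

```python
def adjust_for_inflation(amount, cpi_then, cpi_now):
    """Convert an amount from one period into another period's money.

    real value = nominal value * (CPI of target period / CPI of original period)
    """
    if cpi_then <= 0 or cpi_now <= 0:
        raise ValueError("CPI values must be positive")
    return amount * cpi_now / cpi_then


# Hypothetical index values and prices, for illustration only (not real data).
cpi = {2000: 100.0, 2010: 125.0}
prices = {2000: 40.0, 2010: 50.0}

# Express every price in 2010 money.
real_2010 = {
    year: adjust_for_inflation(price, cpi[year], cpi[2010])
    for year, price in prices.items()
}
print(real_2010)  # {2000: 50.0, 2010: 50.0}
```

In this made-up example the nominal price rose from 40 to 50, but in 2010 money the two prices are identical, so the entire increase is inflation.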

So we had the notion of parsing data to determine what kinds of analysis to perform. What about those analyses? How should they look and feel? How should we rank them if many different ones were possible?

Now, along with a small group of smart people coming from wildly different backgrounds (economics, computational biology, pure math, and statistics, to mention a few), we started thinking about what this new data language was trying to say. (Interestingly, most of these people were Summer School alumni from several past years.)

A key fact about our group was that, while everyone had the vocabulary of mathematics in common, no single person knew it all. Take me, for example. As a physicist, I am perfectly at ease with nonlinear fits, but I couldn’t read a regression table to save my life! But to my colleague, an economist, regression tables were second nature.

And in arguing about how to display these results in a way that made sense to experts and non-experts alike, we got to the crux of each analysis. In fact, this was also how I came up with the idea of spelling out the key results of our data analysis in plain English, so, funnily enough, this is where my Summer School project came full circle.

For example, one of the sample datasets was the passenger list from the Titanic: age, class, gender, and whether or not they had made it to the lifeboats. Now, a logistic regression is the perfect tool to see if being a woman or a child actually increased the chances of surviving, but can you actually interpret one? As you can see below, we’ve done the job for you! And in case you were wondering what the result is, you had a better chance of survival if you were female, young, and a first-class passenger.
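A central step in interpreting a logistic regression is converting each coefficient into an odds ratio: for a 0/1 predictor with coefficient β, the odds of the outcome are multiplied by e^β when the predictor is 1, holding the other predictors fixed. Below is a hedged Python sketch of that one translation step. It is not Wolfram|Alpha’s implementation, the function name is hypothetical, and the coefficients in the usage example are illustrative values rather than fitted Titanic estimates. Note that an odds ratio is not a ratio of probabilities, so a careful write-up says “odds”, not “chance”.

```python
import math


def describe_coefficient(predictor, coef, outcome="survival"):
    """Translate a logistic-regression coefficient for a 0/1 predictor into English.

    exp(coef) is the odds ratio: the factor by which the odds of the outcome
    change when the predictor goes from 0 to 1, other predictors held fixed.
    """
    odds_ratio = math.exp(coef)
    if odds_ratio >= 1:
        return f"{predictor}: odds of {outcome} multiplied by {odds_ratio:.2f}"
    # For negative coefficients, "divided by" reads more naturally than "times 0.xx".
    return f"{predictor}: odds of {outcome} divided by {1 / odds_ratio:.2f}"


# Illustrative coefficients only (not estimates from the Titanic data).
print(describe_coefficient("female", math.log(4)))
# female: odds of survival multiplied by 4.00
print(describe_coefficient("third class (vs. first class)", math.log(0.25)))
# third class (vs. first class): odds of survival divided by 4.00
```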

Each one of us had to think: “What would I, as an expert in this field, do with this data? How would I visualize it? What kind of analysis would I perform on it?” It turns out this is one of the key insights of Wolfram|Alpha: to bring expert knowledge to the tips of everybody’s fingers (or vocal cords, if you happen to use Siri).

What I ended up finding really addictive about this job is that I get to wrap my mind around research-grade problems, and then, once I understand them, I have to turn my ideas quickly into working features of a website used daily by a lot of people. At Wolfram, I get the intellectual challenge I was looking for, and thousands of people read my research results every day.

4 Comments

Congratulations, Carlo! I’m glad to hear your new job is going great. Would you mind sharing the link (if any) for the demonstration you created during last year’s summer school?

Posted by Fernando Sanchez May 1, 2012 at 2:13 pm

Hi Fernando,
Long time no hear! I hope you are doing fine.
As to your question: no, I haven’t published any demonstration, but as you can see, some parts of my Summer School project are making their way into Wolfram|Alpha. If you are curious, get a Pro trial account and try it out!

Posted by Carlo Barbieri May 2, 2012 at 6:47 am

Now that you can produce a plain English description of a time series, I suggest you devise an algorithm to produce a plain (I would call it computable) English description of anything.

You start by seeing anything as an idea. For communication purposes, that idea is identified by, preferably, one word. Now you define that word as you would in the best dictionaries, but in addition the definition (the plain English description) must be computable, forming an algorithm.
Now you can use any idea/word as a factor in a formula to answer any query.
When you have finished, ask it “What is life?” and the answer should contain every other idea/word it knows. BG Volunteer Curator.

Posted by Brian Gilbert May 2, 2012 at 4:13 am

Hi,

Very interesting post.
I tried the Wolfram|Alpha trial with a small dataset I was working on. For basic statistics (descriptive analysis) it was pretty good, especially since everything was performed automatically. That is an important step forward for many, many people with a basic analytical background.
But it could not provide anything more elaborate. That is surely a matter of time. Now you have a platform, and little by little “Moore’s Law” will lead you to a kind of expert system where you throw in your data and automatically get many different insights, leaving the more elaborate analysis to highly specialized, analytically skilled people.

Regards,
Carlos.

Posted by Carlos Ortega May 3, 2012 at 3:45 am