Wolfram|Alpha Will Understand Your Language

June 4, 2009 — The Wolfram|Alpha Team

Today if you give input to Wolfram|Alpha in a language other than English, you’ll most likely see something like:

Wolfram|Alpha does not yet understand...

But in making Wolfram|Alpha accessible to as many people around the world as possible, our goal is eventually to have it understand every one of these languages.

A certain amount of Wolfram|Alpha input is actually quite language independent—because it’s really in math, or chemistry, or some other international notation, or because it’s asking about something (like a place) that’s always referred to by the same name.

But inevitably many inputs do depend on human language—and in fact even now about 5% of all inputs that are given try to use a language other than English.

Wolfram|Alpha knows quite a bit about the general properties of essentially every language (Spanish, Swahili, and so on). But it doesn’t yet know how to interpret input in any language other than English.

Handling English is of course difficult enough. And each language that is added is a huge project—requiring all kinds of local help, support, and investment. But the good news is that with the core technology of Wolfram|Alpha, any language can in principle be handled.

At the lowest level, Wolfram|Alpha inherits from Mathematica its comprehensive use of Unicode—allowing it immediately to represent any character set. (Try something like unicode 2345 or unicode 1000 through 1050.)
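As a rough illustration of what "representing any character set" means, the following Python sketch maps numeric code points to characters using only the standard library. It assumes the numbers in queries like unicode 2345 are decimal code points (Unicode charts usually use hexadecimal, so this is an assumption), and the helper names are hypothetical. It is not a description of how Wolfram|Alpha itself handles these queries.

```python
import unicodedata


def describe_code_point(n: int) -> str:
    """Return 'U+XXXX <char> <Unicode name>' for the decimal code point n."""
    ch = chr(n)
    # unicodedata.name raises ValueError for unnamed code points, so pass a default.
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{n:04X} {ch} {name}"


def code_point_range(start: int, stop: int) -> str:
    """Return the characters for decimal code points start..stop, inclusive."""
    return "".join(chr(n) for n in range(start, stop + 1))


# Rough analogues of the two example queries (assuming decimal input).
print(describe_code_point(2345))
print(code_point_range(1000, 1050))
```

Under the decimal assumption, 2345 falls in the Devanagari block and the range 1000 through 1050 runs from the Coptic letters in the Greek block into Cyrillic.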

But what’s more important is that Wolfram|Alpha’s whole approach to linguistic processing is general enough to be adapted to any detailed language structure.

And in fact, the very language that people use to interact with Wolfram|Alpha—even in English—is not really a language that’s been seen before.

Sometimes when people are first introduced to Wolfram|Alpha they’ll use a complete sentence, like What is the population of Italy? But remarkably quickly, they’ll abbreviate down to something that keeps the key concepts, but gets rid of other words, say just Italy population.

One might think this would mean that all one has to do is to spot the key words. But that wouldn’t get very far. Almost always one has to understand how the words are linked, and what actions, as well as objects, are being specified (e.g. 2 feet in inches, big apple population).
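A small sketch of why keyword spotting alone is not enough: the two queries below contain exactly the same keywords but ask opposite questions. The stopword list, the unit table and the "<number> <unit> in <unit>" pattern are hypothetical simplifications made up for this example, not Wolfram|Alpha’s actual parser.

```python
from fractions import Fraction

# Hypothetical linking words that a naive keyword spotter would throw away.
STOPWORDS = {"in", "to", "the", "of"}

# Hypothetical unit table: the length of each unit expressed in inches.
INCHES_PER_UNIT = {
    "inch": Fraction(1),
    "inches": Fraction(1),
    "foot": Fraction(12),
    "feet": Fraction(12),
}


def keywords(query: str) -> frozenset:
    """Bag-of-keywords view: word order and linking words are discarded."""
    return frozenset(query.lower().split()) - STOPWORDS


def convert(query: str) -> Fraction:
    """Order-aware reading of the pattern '<number> <unit> in <unit>'."""
    amount, source, linker, target = query.lower().split()
    if linker != "in":
        raise ValueError("expected '<number> <unit> in <unit>'")
    # Convert to inches first, then into the target unit.
    return Fraction(amount) * INCHES_PER_UNIT[source] / INCHES_PER_UNIT[target]


a, b = "2 feet in inches", "2 inches in feet"
print(keywords(a) == keywords(b))  # True: identical keyword sets
print(convert(a), convert(b))      # 24 1/6
```

The keyword sets are identical, yet the answers differ by a factor of 144; only the position of each unit relative to "in" tells them apart.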

The abbreviated “computese” that people enter into Wolfram|Alpha isn’t quite like any existing human language. When people give input to Wolfram|Alpha, they’re usually trying to get their ideas communicated as quickly and directly as possible—and that means that they don’t put on the same gloss as in ordinary human language.

There are often fragments of language left over, as well as pieces of phrase structure and so on. But the forms that occur are not ones that one can learn from traditional grammar books.

The approach our team took during the initial development of Wolfram|Alpha was to accumulate large corpora of linguistic usage in different areas, then to abstract from them rules and meta-rules that could be slotted into Wolfram|Alpha’s linguistic processing system.

Now that Wolfram|Alpha has been released, our team has a major new—and more accurate—source, at least for English: the millions and millions of actual inputs that are given to the system.

So what’s involved in generalizing to other languages? A certain amount can be done by word- or phrase-wise translation. Often there will be multiple translations at this level. And when there are several words or phrases together, there will often be a combinatorial explosion in the number of possibilities.
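To make the combinatorial explosion concrete, here is a toy word-by-word translator. The tiny Spanish lexicon is made up for this example, and words not in it (such as proper names) simply pass through unchanged. Each word’s number of candidate translations multiplies the total, so four two-way ambiguous words already give 16 readings.

```python
from itertools import product
from math import prod

# Hypothetical toy Spanish-to-English lexicon: each word has several candidate glosses.
TOY_LEXICON = {
    "altura": ["height", "altitude"],
    "de": ["of", "from"],
    "la": ["the", "her"],     # article or object pronoun
    "torre": ["tower", "rook"],  # "rook" as in chess
}


def options_for(query: str) -> list:
    """Candidate glosses for each word; unknown words pass through unchanged."""
    return [TOY_LEXICON.get(word.lower(), [word]) for word in query.split()]


def candidate_readings(query: str) -> list:
    """Every word-by-word English rendering of a whitespace-separated query."""
    return [" ".join(choice) for choice in product(*options_for(query))]


def reading_count(query: str) -> int:
    """Number of readings = product of the per-word candidate counts."""
    return prod(len(opts) for opts in options_for(query))


query = "altura de la torre Eiffel"
readings = candidate_readings(query)
print(reading_count(query))  # 16
print(readings[0])           # height of the tower Eiffel
```

With ten two-way ambiguous words the count would be 2^10 = 1024, which is why any real system needs some way to score and prune readings. This sketch makes no attempt to model Wolfram|Alpha’s own ambiguity handling.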

But conveniently enough, Wolfram|Alpha’s general ambiguity-handling system already deals very efficiently with some of this—providing an interesting foundation for a first level of language understanding.

A lot of language, however, does not factor in this kind of way, and it’s inevitable that all sorts of detailed linguistic curation will have to be done for each particular language—to capture all its particular idioms and special forms.

Different human languages often have rather different structures. For example, some languages (like English) have dominant subject-verb-object word orders, while others (like Japanese) have other word orders such as subject-object-verb. Similarly, some languages indicate the role of words by case endings, others by position or by using post- or prepositions.

Interestingly, though, when people write “computese” these differences don’t seem to be as marked as usual: word orders are jumbled; case endings are simplified or omitted. Often this makes understanding individual inputs more difficult, but it will make it easier to generalize Wolfram|Alpha to completely different classes of languages.

Of course, even once Wolfram|Alpha understands input in a particular language, we’re not finished. There’s also the problem of synthesizing correct output text in that language.

The automation of the underlying Mathematica system makes it feasible to have arbitrarily modified text flow immediately into tables, graphics, and everything else. But Wolfram|Alpha is mostly not dealing with literal pieces of text: it’s instead dealing with many small algorithms that form correct phrases from linguistic fragments. And directly or using appropriate meta-algorithms, each of these algorithms has to be converted for each output language.

The generalization of Wolfram|Alpha to all major human languages is a huge undertaking. But it’s one that we’re committed to pursuing.

We’ve already had many comments and suggestions, as well as offers of help, from the international Wolfram|Alpha community. And we look forward to extensive collaborations with many individuals and organizations as we pursue the goal of making Wolfram|Alpha fully accessible to as many people in the world as possible.

45 Comments

I’d really like to see Hebrew among the other languages.
I’m willing to help you translate into Hebrew, and I’m sure many more Israelis and Jews would like to help 🙂

Good Luck!

Posted by Ori June 4, 2009 at 3:14 pm

Latin? Really?

YAY!

Well, that is great news!

Thank you!

Posted by Johann June 4, 2009 at 3:16 pm

What about the Catalan language? We are 10 million people!

Posted by Afontcu June 4, 2009 at 3:16 pm

How about Albanian? I can help!

Posted by Ilir June 4, 2009 at 3:37 pm

At the Finnish line:

suomen -> suomea

“suomen” has a possessive suffix.

Posted by JJ Luukko June 4, 2009 at 3:41 pm

I guess Arabic doesn’t exist 😛

Posted by O. Humeid June 4, 2009 at 3:44 pm

Hey WolframAlpha... Where is the Arabic language? 1 billion people use it!!!

Posted by mohamed June 4, 2009 at 3:52 pm

The list of outputs above is merely a sample of the languages Wolfram|Alpha recognizes. Please be assured that our goal is to make Wolfram|Alpha fully accessible to as many people in the world as possible.

Posted by The PR Team June 5, 2009 at 10:06 am

What about Greek??

Posted by Yiannis Fr. June 4, 2009 at 3:52 pm

India has 63 languages!

Posted by Pavithra Kenjige June 4, 2009 at 4:07 pm

What about Arabic?

Posted by Abdullah June 4, 2009 at 4:14 pm

If things progress the way I’d like, W|A will be the standard way many people will learn math, science, and engineering. I don’t see that happening as a result of any designed lessons or curriculum but as a self-directed excursion through what is computable, guided by an initial interest or curiosity.

With that in mind, I’d like to see some more helpful responses to inquiries W|A can’t handle. Are any parts recognizable? Can a graph or network be shown with possibly related information? Are the units inconsistent? Can a poorly posed math or physics question be shunted to MathWorld or ScienceWorld for clarification?

For that matter, W|A should be more tightly linked to those sites.

The response shouldn’t be too accommodating. If the question is bogus, W|A shouldn’t knock itself (himself? herself?) out coming up with possible ways it could make sense.

Posted by Fred Klingener June 4, 2009 at 4:18 pm

@W|A-Team: Check out omegawiki.org for translations. I guess you will find it helpful to have translations in a computable form.

Posted by MovGP0 June 4, 2009 at 5:10 pm

Great job. I’ve noticed this language recognition feature before, and it works quite well! Perhaps we can all help translate the message into other languages.

Posted by Ray Moren June 4, 2009 at 5:27 pm

That’s pretty coool!

Posted by Serge gregor June 4, 2009 at 5:28 pm

Will there be support for Arabic someday?

Posted by martani June 4, 2009 at 6:19 pm

Speaking of Unicode characters, we should be able to copy paste the outputs for those… 😮

Posted by Steven June 4, 2009 at 6:26 pm

I’d like to see Traditional Chinese!!

Posted by Mark Walberg June 4, 2009 at 7:28 pm

I use the Google Toolbar with the integrated Translate button.
Why doesn’t W|A use the “Google Translation API” temporarily (as a test)?

Posted by NoodleGei June 4, 2009 at 8:03 pm

Hi Wolfram|Alpha team

I see that Hindi is not on your priority list at this time. Besides wondering why, I would like to offer any help I can to get things done in Hindi.

Posted by Prem Piyush Goyal June 4, 2009 at 10:34 pm

The Chinese translation seems inappropriate (the translation of “support” you are using is the fanboy-ish kind of support). A better one would be a phrase meaning “Wolfram|Alpha doesn’t understand Chinese.”

Posted by randName June 4, 2009 at 10:52 pm

Some interesting things (Easter eggs) to try out with Wolfram Alpha, apart from core mathematics:
http://talisman-rajiv.blogspot.com/

Posted by rajiv June 4, 2009 at 11:58 pm

Now, if only we could get Wolfram to give the correct density of water. I’m hoping someone reads the comments here; the water density problem is getting a lot of bad attention.

Posted by Andrew June 5, 2009 at 12:24 am

Thanks for the input Andrew. Could you give a specific example of a problematic input for water density? Are you specifying thermodynamic conditions, such as temperature and pressure?

Posted by WolframAlphaDataGuy June 5, 2009 at 11:57 am

http://www27.wolframalpha.com/input/?i=density+of+water+at+300K

The density of water at 300K should be 996.513 kg/m^3 according to Perry’s Handbook of chemical engineering. I’ve tried other temperatures, and they are different but are all off by 10-15kg/m^3. Let me emphatically say though that I love W|A and I appreciate the quick response!

Posted by Andrew June 5, 2009 at 1:01 pm

Thanks for the information. We are working on fixing this problem. Actually, the correct result is there if you click on the link at the top of the page: “referring to thermodynamics”, which will take you to:

http://www.wolframalpha.com/input/?i=density+of+water+at+300K&a=*MC.%7E-_*ThermodynamicPropertyPhrase-

We’re glad to hear that you find our site useful. We will get this issue resolved very soon.

Posted by WolframAlphaDataGuy June 5, 2009 at 3:38 pm

>>> or because it’s asking about something (like a place) that’s always referred to by the same name.
This is not true. In fact, places and institutions tend to be translation problems. There are a certain number of places (and institutions) whose names are translated in some languages but not in others. Even among speakers of a single language there’s no general rule for deciding whether a place is referred to one way or the other.

Sometimes even names (especially of historical figures) have a translation. It’s not as easy as it seems. It can be done, but I’m not sure of the engineering costs. Doing it for a handful of languages in a predefined field is OK. Doing it for twenty without limiting the fields…

Even Google Translate has some problems with places, institutions and other entities.

Posted by Lluc Potrony June 5, 2009 at 1:00 am

Thank you W|A team!

¡Gracias equipo de W|A!

Posted by Dieguico June 5, 2009 at 2:33 am

What about Ukrainian?
It’s a language spoken by 40 million people, even more than Danish, Swedish, Finnish or Polish.

Posted by Mudry June 5, 2009 at 2:54 am

Japanese is actually subject-object-verb, not verb-subject-object as stated in this entry. 🙂

Posted by Matthew Lanigan June 5, 2009 at 3:06 am

???? ????? ?? ?? ?????? ????
http://www.glk.wikipedia.org

Posted by ???? June 5, 2009 at 3:38 am

Coooool!!!!!

Posted by Srinivas June 5, 2009 at 7:16 am

What about the Filipino/Tagalog language, spoken in the Philippines by about 22 million people?
Overall (worldwide): about 90 million total speakers

Posted by Batibot June 5, 2009 at 9:43 am

Perhaps they lack translations. Why don’t we help them out and each write the same message in our own language? That could be very useful for them. I do wonder about character encoding, however, since some people have tried to post in other languages. Perhaps people could post the actual Unicode code points of the characters, or the HTML codes; otherwise the idea won’t work.

I would start providing translations but I’m afraid I only speak Italian and French and they already have them.

Posted by Serge gregor June 5, 2009 at 3:49 pm

I hope that, if this is implemented, I will actually get to choose which language I want to use instead of having it detected by geolocation or something. I’m tired of sites that default to horrible Russian translations, even though I can read English perfectly fine. For example, the Russian version of justin.tv is just cringeworthy.

Posted by Timofei June 6, 2009 at 8:40 am

Are you sure it’s geolocation? Multilingual sites should use content negotiation to serve the user-requested language that your browser sends in the HTTP header. Perhaps you have your browser mis-configured to request Russian?

Posted by Nicholas Shanks June 10, 2009 at 12:04 pm

Type in wolframalpha.com and look at the Alexa information.

You will see that German is the No. 2 user language, not far behind English.
Google earns a lot of money with the German version.

Posted by Matthias Zehe June 6, 2009 at 1:20 pm

Hmm, it seems the Japanese error message is not implemented yet… I tried a couple of Japanese inputs and only got an English error message…

Having high hopes on this multilanguage endeavor!

Posted by Agro Rachmatullah June 7, 2009 at 2:19 am

That is really going to be great. I agree that it is definitely going to increase the diversity of the people using W|A. One thing I wish W|A had is a specific place to find more information on the query. I know it would make the output a bit messy, but sometimes I do feel the need for more information (and I definitely do not mean junk) from a very good, informative, reliable source. Needless to say, I am a big fan of W|A and do understand that it’s growing.

Posted by Sanjiv Kumar June 7, 2009 at 11:37 am

Guys, will you support queries into biblical stuff in Greek and Hebrew for in-depth comparison, i.e., how often a verb form is used, the locations of its occurrences compared to other passages, etc.?

Posted by Andrew Meit June 7, 2009 at 9:36 pm

I’d be willing to help for Portuguese 🙂

Posted by J August 19, 2009 at 6:22 pm

I can help with Spanish! 😉

Posted by CarlosAC February 8, 2010 at 6:18 pm

Hi,

Do you support natural language processing of English? In other words, using your API, can I identify the verbs, nouns, objects, etc. in a given English sentence?

Thanks

Jerome

Posted by Jerome February 11, 2015 at 11:11 am

Thank you for your comment, unfortunately that information is not available.

Posted by The Wolfram Team February 20, 2015 at 11:33 am