Wolfram|Alpha Will Understand Your Language

June 4, 2009
Posted by The Wolfram|Alpha Team

Today if you give input to Wolfram|Alpha in a language other than English, you’ll most likely see something like:

Wolfram|Alpha does not yet understand...

But in making Wolfram|Alpha accessible to as many people around the world as possible, our goal is eventually to have it understand every one of these languages.

A certain amount of Wolfram|Alpha input is actually quite language independent—because it’s really in math, or chemistry, or some other international notation, or because it’s asking about something (like a place) that’s always referred to by the same name.

But inevitably many inputs do depend on human language—and in fact even now about 5% of all inputs that are given try to use a language other than English.

Wolfram|Alpha knows quite a bit about the general properties of essentially every language (Spanish, Swahili, ...). But it doesn’t yet know how to interpret input in any language other than English.

Handling English is of course difficult enough. And each language that is added is a huge project—requiring all kinds of local help, support, and investment. But the good news is that with the core technology of Wolfram|Alpha, any language can in principle be handled.

At the lowest level, Wolfram|Alpha inherits from Mathematica its comprehensive use of Unicode—allowing it immediately to represent any character set. (Try something like unicode 2345 or unicode 1000 through 1050.)
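As a rough illustration of what a query such as unicode 2345 involves, here is a minimal Python sketch built on the standard unicodedata module. It is not how Wolfram|Alpha or Mathematica does this internally; the function names are invented for the example, and reading the numbers in the sample queries as hexadecimal is an assumption.

```python
import unicodedata


def describe_code_point(code_point: int) -> tuple[str, str]:
    """Return the character at a code point together with its Unicode name."""
    char = chr(code_point)
    # Unassigned code points and noncharacters have no name, so use a placeholder.
    name = unicodedata.name(char, "<no name>")
    return char, name


def describe_range(first: int, last: int) -> list[tuple[str, str]]:
    """Describe every code point from first through last, inclusive."""
    return [describe_code_point(cp) for cp in range(first, last + 1)]


# Assumption: the numbers in the example queries are hexadecimal.
_, name = describe_code_point(0x2345)
print(f"U+2345: {name}")

for offset, (_, name) in enumerate(describe_range(0x1000, 0x1050)):
    print(f"U+{0x1000 + offset:04X}: {name}")
```

Only the code point and name are printed, so the output stays readable even on a terminal that cannot display the characters themselves.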

But what’s more important is that Wolfram|Alpha’s whole approach to linguistic processing is general enough to be adapted to any detailed language structure.

And in fact, the very language that people use to interact with Wolfram|Alpha—even in English—is not really a language that’s been seen before.

Sometimes when people are first introduced to Wolfram|Alpha they’ll use a complete sentence, like What is the population of Italy? But remarkably quickly, they’ll abbreviate down to something that keeps the key concepts, but gets rid of other words, say just Italy population.

One might think this would mean that all one has to do is to spot the key words. But that wouldn’t get very far. Almost always one has to understand how the words are linked, and what actions, as well as objects, are being specified (e.g. 2 feet in inches, big apple population).
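A small Python sketch can make the point about keyword spotting concrete. The two inputs below contain exactly the same words, yet they ask for different things, so any approach that only looks at which words are present cannot tell them apart. The pattern, the unit table, and the function names are all hypothetical and only cover this one toy case; Wolfram|Alpha's actual linguistic processing is far more general.

```python
import re
from collections import Counter

# Hypothetical unit table: the length of each unit expressed in inches.
INCHES_PER_UNIT = {"inch": 1.0, "inches": 1.0, "foot": 12.0, "feet": 12.0}

# Matches inputs of the form "<number> <unit> in <unit>".
CONVERSION = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+(\w+)\s+in\s+(\w+)\s*$")


def bag_of_words(text: str) -> Counter:
    """Keyword spotting: keep the words but discard their order."""
    return Counter(text.lower().split())


def convert(text: str) -> float:
    """Interpret the input using word order: the source unit comes before 'in'."""
    match = CONVERSION.match(text.lower())
    if match is None:
        raise ValueError(f"not a conversion this sketch understands: {text!r}")
    amount, source, target = match.groups()
    if source not in INCHES_PER_UNIT or target not in INCHES_PER_UNIT:
        raise ValueError(f"unknown unit in {text!r}")
    return float(amount) * INCHES_PER_UNIT[source] / INCHES_PER_UNIT[target]


first, second = "2 feet in inches", "2 inches in feet"
print(bag_of_words(first) == bag_of_words(second))  # same keywords
print(convert(first), convert(second))              # different answers
```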

The abbreviated “computese” that people enter into Wolfram|Alpha isn’t quite like any existing human language. When people give input to Wolfram|Alpha, they’re usually trying to get their ideas communicated as quickly and directly as possible—and that means that they don’t put on the same gloss as in ordinary human language.

There are often fragments of language left over, as well as pieces of phrase structure and so on. But the forms that occur are not ones that one can learn from traditional grammar books.

The approach our team took during the initial development of Wolfram|Alpha was to accumulate large corpuses of linguistic usage in different areas, then to abstract from these rules and meta-rules that could be slotted into Wolfram|Alpha’s linguistic processing system.

Now that Wolfram|Alpha has been released, our team has a major new—and more accurate—source, at least for English: the millions and millions of actual inputs that are given to the system.

So what’s involved in generalizing to other languages? A certain amount can be done by word- or phrase-wise translation. Often there will be multiple translations at this level. And when there are several words or phrases together, there will often be a combinatorial explosion in the number of possibilities.

But conveniently enough, Wolfram|Alpha’s general ambiguity-handling system already deals very efficiently with some of this—providing an interesting foundation for a first level of language understanding.
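To see how quickly word-wise translation multiplies the possibilities, consider this Python sketch. The tiny Spanish-to-English dictionary is a toy example made up for illustration, and the function names are hypothetical. The number of candidate readings is the product of the number of translations of each word, so it grows exponentially with input length.

```python
from itertools import product
from math import prod

# Toy dictionary (an assumption for this sketch): each Spanish word maps to
# several possible English translations.
TRANSLATIONS = {
    "tiempo": ["time", "weather"],
    "banco": ["bank", "bench"],
    "capital": ["capital city", "capital (money)"],
}


def count_readings(words: list[str]) -> int:
    """Number of word-by-word translations, without enumerating them."""
    return prod(len(TRANSLATIONS[word]) for word in words)


def enumerate_readings(words: list[str]) -> list[tuple[str, ...]]:
    """Every combination of one candidate translation per word."""
    return list(product(*(TRANSLATIONS[word] for word in words)))


query = ["tiempo", "banco", "capital"]
print(count_readings(query))
for reading in enumerate_readings(query):
    print(" ".join(reading))
```

With two candidates per word, three words already give eight readings, and ten words would give 1,024. This is why some mechanism for ranking or pruning ambiguous interpretations is needed.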

A lot of language, however, does not factor in this kind of way, and it’s inevitable that all sorts of detailed linguistic curation will have to be done for each particular language—to capture all its particular idioms and special forms.

Different human languages often have rather different structures. For example, some languages (like English) have dominant subject-verb-object word orders, while others (like Japanese) have other word orders such as subject-object-verb. Similarly, some languages indicate the role of words by case endings, others by position or by using post- or prepositions.

Interestingly, though, when people write “computese” these differences don’t seem to be as marked as usual: word orders are jumbled; case endings are simplified or omitted. Often this makes understanding individual inputs more difficult, but it will make it easier to generalize Wolfram|Alpha to completely different classes of languages.

Of course, even once Wolfram|Alpha understands input in a particular language, we’re not finished. There’s also the problem of synthesizing correct output text in that language.

The automation of the underlying Mathematica system makes it feasible to have arbitrarily modified text flow immediately into tables, graphics, and everything else. But Wolfram|Alpha is mostly not dealing with literal pieces of text: it’s instead dealing with many small algorithms that form correct phrases from linguistic fragments. And directly or using appropriate meta-algorithms, each of these algorithms has to be converted for each output language.
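One way to picture "small algorithms that form correct phrases from linguistic fragments" is a function that attaches a count to a noun. The Python sketch below is purely illustrative and not taken from Wolfram|Alpha: the function names are hypothetical, and forming plurals by appending "s" is a deliberate simplification that fails for many nouns. It does show why such an algorithm has to be converted per language, since English and French disagree about which counts take the singular.

```python
def english_count_phrase(count: int, noun: str) -> str:
    # English uses the singular only for exactly 1 ("0 towns", "1 town").
    form = noun if count == 1 else noun + "s"
    return f"{count} {form}"


def french_count_phrase(count: int, noun: str) -> str:
    # French uses the singular for both 0 and 1 ("0 ville", "1 ville").
    form = noun if count < 2 else noun + "s"
    return f"{count} {form}"


# One phrase-building algorithm per output language.
PHRASE_BUILDERS = {"en": english_count_phrase, "fr": french_count_phrase}


def count_phrase(language: str, count: int, noun: str) -> str:
    """Dispatch to the phrase builder for the requested output language."""
    return PHRASE_BUILDERS[language](count, noun)


for n in (0, 1, 2):
    print(count_phrase("en", n, "town"), "|", count_phrase("fr", n, "ville"))
```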

The generalization of Wolfram|Alpha to all major human languages is a huge undertaking. But it’s one that we’re committed to pursuing.

We’ve already had many comments and suggestions, as well as offers of help, from the international Wolfram|Alpha community. And we look forward to extensive collaborations with many individuals and organizations as we pursue the goal of making Wolfram|Alpha fully accessible to as many people in the world as possible.

43 Comments

I’d really like to see Hebrew among the other languages.
I’m willing to help you translate into Hebrew, and I’m sure many more Israelis and Jews would like to help :)

Good Luck!

Posted by Ori June 4, 2009 at 3:14 pm Reply

Latin? Really?

YAY!

Well, that is great news!

Thank you!

Posted by Johann June 4, 2009 at 3:16 pm Reply

What about the Catalan language? We are 10 million people!

Posted by Afontcu June 4, 2009 at 3:16 pm Reply

How about Albanian ? I can help!

Posted by Ilir June 4, 2009 at 3:37 pm Reply

On the Finnish line:

suomen -> suomea

“suomen” has possessive suffix.

Posted by JJ Luukko June 4, 2009 at 3:41 pm Reply

I guess Arabic doesn’t exist :P

Posted by O. Humeid June 4, 2009 at 3:44 pm Reply

hey WolframAlpha…..Where is the Arabic language…….1 billion people use it!!!

Posted by mohamed June 4, 2009 at 3:52 pm Reply

    The outputs listed above are merely samples of the languages Wolfram|Alpha recognizes. Please be assured that our goal is to make Wolfram|Alpha fully accessible to as many people in the world as possible.

    Posted by The PR Team June 5, 2009 at 10:06 am Reply

What about Greek??

Posted by Yiannis Fr. June 4, 2009 at 3:52 pm Reply

India has 63 languages !

Posted by Pavithra Kenjige June 4, 2009 at 4:07 pm Reply

What about Arabic?

Posted by Abdullah June 4, 2009 at 4:14 pm Reply

If things progress the way I’d like, W|A will be the standard way many people will learn math, science, and engineering. I don’t see that happening as a result of any designed lessons or curriculum but as a self-directed excursion through what is computable, guided by an initial interest or curiosity.

With that in mind, I’d like to see some more helpful responses to inquiries W|A can’t handle. Are any parts recognizable? Can a graph or network be shown with possibly related information? Are the units inconsistent? Can a poorly-posed math or physics question be shunted to MathWorld or ScienceWorld for clarification?

For that matter, W|A should be more tightly linked to those sites.

The response shouldn’t be too accommodating. If the question is bogus, W|A shouldn’t knock itself (himself? herself?) out coming up with possible ways it could make sense.

Posted by Fred Klingener June 4, 2009 at 4:18 pm Reply

@W|A-Team: Check out omegawiki.org for translations. I guess you will find it helpful to have translations in a computable form.

Posted by MovGP0 June 4, 2009 at 5:10 pm Reply

Great job, I’ve noticed this language recognition feature before and it works quite well! Perhaps we can all help translate the message into other languages.

Posted by Ray Moren June 4, 2009 at 5:27 pm Reply

That’s pretty coool!

Posted by Serge gregor June 4, 2009 at 5:28 pm Reply

Will there be support for Arabic someday?

Posted by martani June 4, 2009 at 6:19 pm Reply

Speaking of Unicode characters, we should be able to copy paste the outputs for those… :o

Posted by Steven June 4, 2009 at 6:26 pm Reply

I’d like to see Traditional Chinese!!

Posted by Mark Walberg June 4, 2009 at 7:28 pm Reply

I use the Google Toolbar with the integrated translation button.
Why doesn’t W|A use the “Google Translation API” temporarily (as a test)?

Posted by NoodleGei June 4, 2009 at 8:03 pm Reply

Hi Wolfram|Alpha team

I see that Hindi is not there in the priority list of yours at this time. Besides wondering why, I would like to extend any help possible to get things done in Hindi.

Posted by Prem Piyush Goyal June 4, 2009 at 10:34 pm Reply

The Chinese translation seems inappropriate (the translation of “support” you are using is the fanboy-ish kind of support). A better one I would suggest is “Wolfram|Alpha ??????” (“doesn’t understand Chinese”).

Posted by randName June 4, 2009 at 10:52 pm Reply

Some interesting things (easter eggs) to try out with Wolfram Alpha, apart from core mathematics:
http://talisman-rajiv.blogspot.com/

Posted by rajiv June 4, 2009 at 11:58 pm Reply

Now, if only we could get Wolfram to give the correct density of water. I’m hoping someone reads the comments here; the water density problem is getting a lot of bad attention.

Posted by Andrew June 5, 2009 at 12:24 am Reply

>>> or because it’s asking about something (like a place) that’s always referred to by the same name.
This is not true. In fact, places and institutions tend to be translation problems. There are a certain number of places (and institutions) whose names are translated in some languages but not in others. Even among the speakers of a single language there’s no general rule to decide whether a place is referred to one way or the other.

Sometimes even names (especially of historical figures) have a translation. It’s not as easy as it seems. It can be done, but I’m not sure of the engineering costs. Doing it for a handful of languages in a predefined field is OK. Doing it for twenty without limiting the fields…

Even Google Translator has some problems with places, institutions and other entities.

Posted by Lluc Potrony June 5, 2009 at 1:00 am Reply

Thank you W|A team!

¡Gracias equipo de W|A!

Posted by Dieguico June 5, 2009 at 2:33 am Reply

What about Ukrainian?
It’s a language spoken by 40 million people. Many more than Danish, Swedish, Finnish or Polish.

Posted by Mudry June 5, 2009 at 2:54 am Reply

Japanese is actually subject-object-verb, not verb-subject-object as stated in this entry. :)

Posted by Matthew Lanigan June 5, 2009 at 3:06 am Reply

???? ????? ?? ?? ?????? ????
http://www.glk.wikipedia.org

Posted by ???? June 5, 2009 at 3:38 am Reply

Coooool!!!!!

Posted by Srinivas June 5, 2009 at 7:16 am Reply

What about Filipino/Tagalog language spoken in the Philippines by about 22 million people?
Overall (worldwide): approx. 90 million total speakers

Posted by Batibot June 5, 2009 at 9:43 am Reply

Perhaps they lack translations. Why don’t we help them out and each write the same message in our own language? That would perhaps be very useful for them. I wonder about the problem of character encoding, however, since some have tried to post something in other languages, so perhaps people can post the actual Unicode numbers of the characters or the HTML codes or something; otherwise the idea won’t work.

I would start providing translations but I’m afraid I only speak Italian and French and they already have them.

Posted by Serge gregor June 5, 2009 at 3:49 pm Reply

I hope that if this is implemented, I will actually get to choose which language I want to use instead of it being detected by geolocation or something. I’m tired of sites that default to horrible Russian translations, even though I can read English perfectly fine. For example, the Russian version of justin.tv is just cringeworthy.

Posted by Timofei June 6, 2009 at 8:40 am Reply

    Are you sure it’s geolocation? Multilingual sites should use content negotiation to serve the user-requested language that your browser sends in the HTTP header. Perhaps you have your browser mis-configured to request Russian?

    Posted by Nicholas Shanks June 10, 2009 at 12:04 pm Reply

Type in
wolframalpha.com
and look at the Alexa-information.

You will see that German is the No. 2 user language, not far behind English.
Google earns a lot of money with the German version.

Posted by Matthias Zehe June 6, 2009 at 1:20 pm Reply

Hmm it seems the Japanese error message is not implemented yet… I tried ??? and ?? and only got an English error message…

Having high hopes on this multilanguage endeavor!

Posted by Agro Rachmatullah June 7, 2009 at 2:19 am Reply

That is really going to be great. I agree that it is definitely going to increase the diversity of the people using W|A. One thing I wish W|A had is a specific place to find more information on the query. I know it is going to make the output a bit messy, but sometimes I do feel the need for more information (and I definitely do not mean junk) from a very good, informative, reliable source. Needless to say, I am a big fan of W|A and do understand that it’s growing.

Posted by Sanjiv Kumar June 7, 2009 at 11:37 am Reply

Guys, will you support queries into biblical stuff in Greek and Hebrew for in-depth comparison, i.e. how often a verb form is used, locations of its occurrence compared to other passages, etc.?

Posted by Andrew Meit June 7, 2009 at 9:36 pm Reply

I’d be willing to help for Portuguese :-)

Posted by J August 19, 2009 at 6:22 pm Reply

I can help with spanish! ;)

Posted by CarlosAC February 8, 2010 at 6:18 pm Reply

I think the post doesn’t necessarily imply they think Japanese is verb-subject-object, but that it has a different order (like others, it says), where subject-object-verb and Japanese are two instances.

Posted by Serge gregor June 5, 2009 at 3:46 pm Reply