Today if you give input to Wolfram|Alpha in a language other than English, you’ll most likely see something like:
But in making Wolfram|Alpha accessible to as many people around the world as possible, our goal is eventually to have it understand every one of these languages.
A certain amount of Wolfram|Alpha input is actually quite language independent—because it’s really in math, or chemistry, or some other international notation, or because it’s asking about something (like a place) that’s always referred to by the same name.
But inevitably many inputs do depend on human language—and in fact even now about 5% of all inputs that are given try to use a language other than English.
Wolfram|Alpha knows quite a bit about the general properties of essentially every language (Spanish, Swahili, ...). But it doesn’t yet know how to interpret input in any language other than English.
Handling English is of course difficult enough. And each language that is added is a huge project—requiring all kinds of local help, support, and investment. But the good news is that with the core technology of Wolfram|Alpha, any language can in principle be handled.
At the lowest level, Wolfram|Alpha inherits from Mathematica its comprehensive use of Unicode—allowing it immediately to represent any character set. (Try something like unicode 2345 or unicode 1000 through 1050.)
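As a rough illustration of what those Unicode queries look up, here is a minimal Python sketch that lists code points with their official character names. It uses Python's standard unicodedata module rather than anything from Wolfram|Alpha or Mathematica. The helper names (describe, describe_range) are hypothetical, and reading "2345" as hexadecimal is an assumption about how such an input would be interpreted.

```python
import unicodedata


def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME' for one code point (hypothetical helper)."""
    ch = chr(code_point)
    # Unassigned or unnamed code points get a placeholder instead of raising.
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{code_point:04X} {name}"


def describe_range(first: int, last: int) -> list[str]:
    """Describe every code point from first to last, inclusive."""
    return [describe(cp) for cp in range(first, last + 1)]


if __name__ == "__main__":
    # Assumption: the numbers in the example queries are hexadecimal.
    print(describe(0x2345))
    for line in describe_range(0x1000, 0x1050):
        print(line)
```

Because every character is addressed by a single integer code point, the same lookup works for any script without special cases.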
But what’s more important is that Wolfram|Alpha’s whole approach to linguistic processing is general enough to be adapted to any detailed language structure.
And in fact, the very language that people use to interact with Wolfram|Alpha—even in English—is not really a language that’s been seen before.
Sometimes when people are first introduced to Wolfram|Alpha they’ll use a complete sentence, like What is the population of Italy? But remarkably quickly, they’ll abbreviate down to something that keeps the key concepts, but gets rid of other words, say just Italy population.
One might think this would mean that all one has to do is to spot the key words. But that wouldn’t get very far. Almost always one has to understand how the words are linked, and what actions, as well as objects, are being specified (e.g. 2 feet in inches, big apple population).
The abbreviated “computese” that people enter into Wolfram|Alpha isn’t quite like any existing human language. When people give input to Wolfram|Alpha, they’re usually trying to get their ideas communicated as quickly and directly as possible—and that means that they don’t put on the same gloss as in ordinary human language.
There are often fragments of language left over, as well as pieces of phrase structure and so on. But the forms that occur are not ones that one can learn from traditional grammar books.
The approach our team took during the initial development of Wolfram|Alpha was to accumulate large corpuses of linguistic usage in different areas, then to abstract from these rules and meta-rules that could be slotted into Wolfram|Alpha’s linguistic processing system.
Now that Wolfram|Alpha has been released, our team has a major new—and more accurate—source, at least for English: the millions and millions of actual inputs that are given to the system.
So what’s involved in generalizing to other languages? A certain amount can be done by word- or phrase-wise translation. Often there will be multiple translations at this level. And when there are several words or phrases together, there will often be a combinatorial explosion in the number of possibilities.
But conveniently enough, Wolfram|Alpha’s general ambiguity-handling system already deals very efficiently with some of this—providing an interesting foundation for a first level of language understanding.
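To make the combinatorial growth concrete, here is a small Python sketch of word-wise translation with several candidates per word. It is only an illustration: the toy dictionary and the function names are hypothetical, and nothing here reflects how Wolfram|Alpha’s ambiguity-handling system actually works.

```python
from itertools import product
from math import prod

# Hypothetical toy dictionary: each source word maps to several
# candidate English renderings. Real lexicons would be far larger.
CANDIDATES = {
    "población": ["population", "town"],
    "de": ["of", "from"],
    "italia": ["Italy"],
}


def options_for(word: str) -> list[str]:
    """Candidate translations for one word; unknown words pass through."""
    return CANDIDATES.get(word.lower(), [word])


def candidate_translations(words: list[str]) -> list[str]:
    """Enumerate every combination of per-word candidates."""
    return [" ".join(choice) for choice in product(*map(options_for, words))]


def count_candidates(words: list[str]) -> int:
    """Number of combinations: the product of the per-word option counts."""
    return prod(len(options_for(w)) for w in words)


if __name__ == "__main__":
    query = "población de Italia".split()
    print(count_candidates(query))
    for candidate in candidate_translations(query):
        print(candidate)
```

The count is the product of the per-word option counts, so it grows multiplicatively with each added word. That is why some way of ranking or pruning the candidates is needed rather than simply enumerating them.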
A lot of language, however, does not factor in this kind of way, and it’s inevitable that all sorts of detailed linguistic curation will have to be done for each particular language—to capture all its particular idioms and special forms.
Different human languages often have rather different structures. For example, some languages (like English) have dominant subject-verb-object word orders, while others (like Japanese) have other word orders such as subject-object-verb. Similarly, some languages indicate the role of words by case endings, others by position or by using post- or prepositions.
Interestingly, though, when people write “computese” these differences don’t seem to be as marked as usual: word orders are jumbled; case endings are simplified or omitted. Often this makes understanding individual inputs more difficult, but it will make it easier to generalize Wolfram|Alpha to completely different classes of languages.
Of course, even once Wolfram|Alpha understands input in a particular language, we’re not finished. There’s also the problem of synthesizing correct output text in that language.
The automation of the underlying Mathematica system makes it feasible to have arbitrarily modified text flow immediately into tables, graphics, and everything else. But Wolfram|Alpha is mostly not dealing with literal pieces of text: it’s instead dealing with many small algorithms that form correct phrases from linguistic fragments. And directly or using appropriate meta-algorithms, each of these algorithms has to be converted for each output language.
The generalization of Wolfram|Alpha to all major human languages is a huge undertaking. But it’s one that we’re committed to pursuing.
We’ve already had many comments and suggestions, as well as offers of help, from the international Wolfram|Alpha community. And we look forward to extensive collaborations with many individuals and organizations as we pursue the goal of making Wolfram|Alpha fully accessible to as many people in the world as possible.
I’d really like to see Hebrew among the other languages.
I’m willing to help you translate into Hebrew, and I’m sure many more Israelis and Jews would like to help 🙂
Good Luck!
On the Finnish line:
suomen -> suomea
“suomen” has possessive suffix.
hey WolframAlpha…..Where is the Arabic language…….1 billion people use it!!!
The outputs listed above are merely samples of the languages Wolfram|Alpha recognizes. Please be assured that our goal is to make Wolfram|Alpha fully accessible to as many people in the world as possible.
If things progress the way I’d like, W|A will be the standard way many people will learn math, science, and engineering. I don’t see that happening as a result of any designed lessons or curriculum but as a self-directed excursion through what is computable, guided by an initial interest or curiosity.
With that in mind, I’d like to see some more helpful responses to inquiries W|A can’t handle. Are any parts recognizable? Can a graph or network be shown with possibly related information? Are the units inconsistent? Can a poorly posed math or physics question be shunted to MathWorld or ScienceWorld for clarification?
For that matter, W|A should be more tightly linked to those sites.
The response shouldn’t be too accommodating. If the question is bogus, W|A shouldn’t knock itself (himself? herself?) out coming up with possible ways it could make sense.
@W|A-Team: Check out omegawiki.org for translations. I guess you will find it helpful to have translations in a computable form.
Great job. I’ve noticed this language recognition feature before, and it works quite well! Perhaps we can all help translate the message into other languages.
Speaking of Unicode characters, we should be able to copy paste the outputs for those… 😮
I use the Google Toolbar with the integrated translation button.
Why doesn’t W|A use the “Google Translation API” temporarily (as a test)?
Hi Wolfram|Alpha team
I see that Hindi is not there in the priority list of yours at this time. Besides wondering why, I would like to extend any help possible to get things done in Hindi.
The Chinese translation seems inappropriate (the translation of “support” you are using is the fanboy-ish kind of support). A better one I would suggest is “Wolfram|Alpha ??????” (“doesn’t understand Chinese”).
Some interesting things (Easter eggs) to try out with Wolfram Alpha, apart from core mathematics:
http://talisman-rajiv.blogspot.com/
Now, if only we could get Wolfram|Alpha to give the correct density of water. I’m hoping someone reads the comments here; the water density problem is getting a lot of bad attention.
Thanks for the input Andrew. Could you give a specific example of a problematic input for water density? Are you specifying thermodynamic conditions, such as temperature and pressure?
http://www27.wolframalpha.com/input/?i=density+of+water+at+300K
The density of water at 300 K should be 996.513 kg/m^3 according to Perry’s Chemical Engineers’ Handbook. I’ve tried other temperatures, and the results differ but are all off by 10-15 kg/m^3. Let me emphatically say, though, that I love W|A and I appreciate the quick response!
Thanks for the information. We are working on fixing this problem. Actually, the correct result is there if you click on the link at the top of the page: “referring to thermodynamics”, which will take you to:
We’re glad to hear that you find our site useful. We will get this issue resolved very soon.
>>> or because it’s asking about something (like a place) that’s always referred to by the same name.
This is not true. In fact, places and institutions tend to be translation problems. A certain number of places (and institutions) are translated in some languages but not in others. Even among the speakers of a single language there’s no general rule for deciding whether a place is referred to one way or the other.
Sometimes even names (especially of historical figures) have a translation. It’s not as easy as it seems. It can be done, but I’m not sure of the engineering costs. Doing it for a handful of languages in a predefined field is OK. Doing it for twenty without limiting the fields…
Even Google Translator has some problems with places, institutions, and other entities.
What about Ukrainian?
It’s a language spoken by 40 million people. That is even more than Danish, Swedish, Finnish, or Polish.
Japanese is actually subject-object-verb, not verb-subject-object as stated in this entry. 🙂
What about Filipino/Tagalog language spoken in the Philippines by about 22 million people?
Overall (worldwide): approximately 90 million total speakers
Perhaps they lack translations. Why don’t we help them out by each writing the same message in our own language? That could be very useful for them. I do wonder about character encoding, though, since some people have tried to post in other languages here. Perhaps people could post the actual Unicode numbers of the characters, or the HTML codes, or something; otherwise the idea won’t work.
I would start providing translations but I’m afraid I only speak Italian and French and they already have them.
I hope if this is implemented, I would actually get to choose which language I want to use instead of it being detected by geolocation or something. I’m tired of sites that default to horrible Russian translations, even though I can read English perfectly fine. For example Russian version of justin.tv is just cringeworthy.
Are you sure it’s geolocation? Multilingual sites should use content negotiation to serve the user-requested language that your browser sends in the HTTP header. Perhaps you have your browser mis-configured to request Russian?
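For readers curious what that negotiation looks like, here is a minimal Python sketch of choosing a language from the Accept-Language request header. It is a simplified illustration, not how any particular site (including Wolfram|Alpha or justin.tv) does it. The function names are hypothetical, and the matching ignores wildcards and most of the finer rules of the HTTP specification.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language value into (tag, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a missing q parameter means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not wanted"
        prefs.append((tag.strip().lower(), quality))
    # sorted() is stable, so equal weights keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_language(header: str, available: set[str], default: str = "en") -> str:
    """Pick the best available language, falling back to a default."""
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # e.g. "ru-ru" falls back to "ru"
        if primary in available:
            return primary
    return default


if __name__ == "__main__":
    print(choose_language("ru-RU,ru;q=0.9,en;q=0.8", {"en", "ru", "de"}))
    print(choose_language("en-US,en;q=0.9,ru;q=0.5", {"en", "ru"}))
```

A browser configured to prefer Russian would send a header like the first one, which would explain getting Russian pages without any geolocation being involved.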
Type in
wolframalpha.com
and look at the Alexa-information.
You will see that German is the No. 2 user language, not far behind English.
Google earns a lot of money with its German version.
Hmm it seems the Japanese error message is not implemented yet… I tried ??? and ?? and only got an English error message…
Having high hopes on this multilanguage endeavor!
That is really going to be great. I agree that it is definitely going to increase the diversity of the people using W|A. One thing I wish W|A had is a specific place to find more information on the query. I know it would make the output a bit messy, but sometimes I do feel the need for more information (and I definitely do not mean junk) from very good, informative, reliable sources. Needless to say, I am a big fan of W|A and understand that it’s growing.
Guys, will you support queries into biblical material in Greek and Hebrew for in-depth comparison, i.e. how often a verb form is used, the locations of its occurrences compared to other passages, etc.?
Hi,
Do you support natural-language processing of English? In other words, using your API, can I identify the verbs, nouns, objects, etc. in a given English sentence?
Thanks
Jerome
Thank you for your comment. Unfortunately, that information is not available.