The Wolfram|Alpha Team

Keeping Up with the Smiths

August 26, 2009

In an earlier post, we had some fun with Wolfram|Alpha’s popular collection of name data and its ability to compare given names’ popularity and demonstrate historical naming trends. Wolfram|Alpha can also compute statistics for surnames, rank them in order of commonality, and provide the approximate number of people living in the United States with any last name.

The data Wolfram|Alpha uses to compute surname statistics is drawn largely from U.S. Census name results. The United States is sometimes called a “melting pot” because so many people move to it from all corners of the world, bringing their native cultures and blending them together. As a result, surnames found in the U.S. have origins all over the globe.

In the example below, we compare a set of randomly chosen surnames. Take a guess at the most common surname in the U.S. Yes, it’s Smith. According to Wolfram|Alpha, there are approximately 2.376 million Smiths living in the U.S., which is almost the population of Nevada.

Wolfram|Alpha ranks the surnames Smith, Nguyen, Gonzales, Lee by

A quick query of the surname “Jefferson” reveals that it is the 594th most popular surname in the U.S. If you are a Jefferson, that means 1 in every 5,252 people shares your last name, for a total of 51,361 people. That would be some family reunion!

Wolfram|Alpha generates the numbers behind the last name Jefferson
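
The two Jefferson figures are linked by simple arithmetic: a count of people and a “1 in N” frequency together imply the size of the population the statistics were computed over. The short Python sketch below works that out from the numbers quoted in this post. The function and variable names are made up for illustration, and applying the resulting base to the Smith figure assumes both surnames were measured against the same population, which the post does not state.

```python
# Figures quoted in the post for the surname Jefferson.
JEFFERSON_COUNT = 51_361   # approximate number of people named Jefferson
JEFFERSON_ONE_IN = 5_252   # "1 in every 5252 people"

# Figure quoted in the post for Smith (rounded to the nearest thousand).
SMITH_COUNT = 2_376_000


def implied_population(count: int, one_in_n: int) -> int:
    """Population base implied by a head count and a '1 in N' frequency."""
    return count * one_in_n


def one_in(count: int, population: int) -> float:
    """Express a head count as '1 in N' for a given population base."""
    return population / count


base = implied_population(JEFFERSON_COUNT, JEFFERSON_ONE_IN)
print(f"Implied population base: {base:,}")

# Assumption: Smith is measured against the same base as Jefferson.
print(f"Smith: roughly 1 in {one_in(SMITH_COUNT, base):.0f} people")
```

The implied base comes out a little under 270 million, so the “1 in N” figure appears to be relative to the population covered by the name data rather than to a current population estimate.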

Go ahead and query Wolfram|Alpha to learn more about the number of people living in the U.S. who share your last name. It may be more (or less) common than you think!

7 Comments

Doesn’t always work. Checking the name Lindahl in Illinois shows that it exists for a few people, but WA doesn’t understand it, probably because of its scarcity.

Posted by Sean Gorman August 26, 2009 at 12:51 pm

I would love to check my surname, but Wolfram|Alpha doesn’t think it’s a name.

When I search on “Stegbauer” or even “name Stegbauer”, I get the message:

Wolfram|Alpha isn’t sure what to do with your input.

Boo Hoo,
Randy

Posted by Randy August 26, 2009 at 1:39 pm

I know the Durrett surname in the U.S. goes back to at least 1715. It returns no result in W|A. Where is the cutoff floor for inclusion in your database? 10,000 occurrences? 25,000? 50,000? Your database is incomplete and I am certain there are many other U.S. resident surnames that have been omitted. Isn’t the complete U.S. Census available to W|A? Why not?

Posted by Bob D. August 26, 2009 at 4:43 pm

    Hi Bob,

    We are curating and adding data to Wolfram|Alpha all the time. As we do so, its knowledge will continue to grow. Thank you for the feedback.

    Posted by The PR Team August 27, 2009 at 8:49 am

W|A, heal thyself. As part of its self-checking, I suggest that:
1. W|A takes every successful input, copies the ‘Input Interpretation’ (if there is one) back into the Input field, and tries again. If the second input is unsuccessful, it routes it for investigation.
2. W|A times the response to a spellcheck at frequent intervals and if the time is unacceptable it routes it for investigation.
3. Assign someone to review all problems and devise a W|A self-check when appropriate. In some cases the self-check would cover a range of problems.
4. Add a governor to W|A. This would vary its behaviour to suit current circumstances. For example, if a resource was near overload, it would reduce the amount of that resource allowed to a single question, and vice versa. While the service is free, this sort of experiment is quite acceptable. Output could also be varied, making some output subject to an explicit request if the resources required were near overload. For example, a graph might normally appear immediately, but under pressure it would be replaced by the usual type of red word for the user to click on if they definitely wanted it.
5. W|A to email the authors of all unresolved posts at intervals asking if they considered the post resolved, with a link to the post so that they can flag them as resolved if appropriate.

Posted by Brian Gilbert August 27, 2009 at 6:54 am

    All good suggestions, and #5 is excellent.

    Posted by Bob D. August 27, 2009 at 11:14 am

Any chance of getting non-U.S.-centric name data? I’d love to compare popularity of names in the U.K. against names in the U.S., for example.

Posted by Maristic August 30, 2009 at 1:57 am