To Compute or Not to Compute—Wolfram|Alpha Analyzes Shakespeare’s Plays

April 10, 2012 — The Wolfram|Alpha Team

For hundreds of years, scholars have studied the plays of Shakespeare, breaking down the language and carefully dissecting every act and scene. We thought it would be interesting to see what sorts of computational insights Wolfram|Alpha could provide, so we uploaded the complete catalog of Shakespeare’s plays into our database. This allows our users to examine Romeo and Juliet, Macbeth, Othello, and the rest of the Bard’s plays in an entirely new way.

Entering a play into Wolfram|Alpha, like A Midsummer Night’s Dream, brings up basic information, such as the number of acts, scenes, and characters. It also provides more in-depth information, like the longest word, the most frequent words, the number of words and sentences, and more. It’s also easy to ask more specific questions about a play with queries like “What is the longest word in King Lear?”, “What is the average sentence length of Macbeth?”, and “How many unique words are there in Twelfth Night?”.
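For readers curious about what sits behind numbers like these, here is a minimal Python sketch of the same kinds of statistics. It is not how Wolfram|Alpha computes its results; the function name `text_stats`, the word pattern, and the sentence-splitting rule are all assumptions made for illustration, and a different tokenizer would give somewhat different counts.

```python
import re
from collections import Counter

def text_stats(text, top_n=5):
    """Rough text statistics (hypothetical helper, not Wolfram|Alpha's method)."""
    # Assumption: a word is a run of letters, optionally with internal apostrophes.
    words = re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)*", text.lower())
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    counts = Counter(words)
    return {
        "word_count": len(words),
        "unique_words": len(counts),
        "sentence_count": len(sentences),
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "longest_word": max(words, key=len) if words else "",
        "most_frequent": counts.most_common(top_n),
    }

# A short sample; a full play would be loaded from a text file instead.
sample = "To be, or not to be: that is the question. Whether 'tis nobler in the mind to suffer."
stats = text_stats(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Ties for the longest word go to whichever word appears first in the text, which is one of several reasonable choices.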

These queries can also be used to analyze multiple plays at the same time.

Asking Wolfram|Alpha for information about specific characters is where things really begin to get interesting. We took the dialog from each play and organized it into dialog timelines that show when each character talks within a specific play. For example, if you look at the dialog timeline of Julius Caesar, you’ll notice that Brutus and Cassius have steady dialog throughout the whole play, but Caesar’s dialog stops about halfway through. I wonder why that is?
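The idea of a dialog timeline can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the play is assumed to be already split into an ordered list of (speaker, text) pairs, position is measured as the fraction of the play's words spoken before a speech begins, and the function name `dialog_timeline` is hypothetical. The toy data below uses placeholder words, not lines from the actual play.

```python
def dialog_timeline(speeches):
    """Map each speaker to the positions (0.0 to 1.0) where their speeches begin.

    `speeches` is an ordered list of (speaker, text) pairs. Position is the
    share of the play's total words spoken before that speech starts.
    """
    total = sum(len(text.split()) for _, text in speeches)
    timeline = {}
    offset = 0
    for speaker, text in speeches:
        position = offset / total if total else 0.0
        timeline.setdefault(speaker, []).append(position)
        offset += len(text.split())
    return timeline

# Toy data: placeholder words stand in for real lines of dialog.
speeches = [
    ("CAESAR", "word word word word"),
    ("BRUTUS", "word word"),
    ("CAESAR", "word word"),
    ("CASSIUS", "word word word word"),
    ("BRUTUS", "word word word word word word word word"),
]

timeline = dialog_timeline(speeches)
# The last position at which each character speaks.
last_spoken = {speaker: max(points) for speaker, points in timeline.items()}
for speaker, position in last_spoken.items():
    print(f"{speaker} last speaks {position:.0%} of the way through")
```

Plotting each speaker's positions along a horizontal axis would give a chart similar in spirit to the one described above, with a character who exits early showing no marks in the later part of the play.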

Wolfram|Alpha can also provide an analysis of just a specific act or scene of a play. The query “The Merchant of Venice, Act 2, Scene 5” brings up the data analysis for just that part of the play. Getting more specific is also possible, like finding the exact number of words in a specific act and scene of a play.

In addition to Shakespeare’s plays, Wolfram|Alpha can also analyze other famous works of literature, including Moby Dick, Great Expectations, and Adventures of Huckleberry Finn. We hope you enjoy being able to perform computational analysis on these texts and would love your suggestions on new features and texts to add.

30 Comments

Where did the data come from? Did a person or a computer painstakingly collect it all?

Posted by Mark Stewart April 10, 2012 at 10:16 am

Well this just shows Wolfram Alpha is the future.

Posted by Nick April 10, 2012 at 10:31 am

What about adding the Bible as a text?

Posted by M April 10, 2012 at 10:36 am

I agree with M. It would be very interesting and useful to have the Bible as a text (in multiple translations, of course).

Posted by Jeremy H April 11, 2012 at 1:15 am

Which version; God’s or man’s?
If man’s, which of the various versions, and then ya gotta decide from among the numerous translations of those versions?
Then you can ask Wolfram|Alpha, “Which one is the real ‘Bible’?”

Posted by fynali April 11, 2012 at 6:50 am

@fynali do you have God’s email so I could ask for an html copy of her version?

Posted by rational April 11, 2012 at 11:45 am

Second that… aside from the question of choosing a translation, it would be helpful to have the analysis.

Posted by PC April 11, 2012 at 9:23 am

This is fantastic. Can I do something similar with my own text in Mathematica? I was thinking about analyzing the text of Mozart’s opera Cosi Fan Tutte. The dialog time graph is nice.

Posted by ErnestoA April 10, 2012 at 12:29 pm

I asked
What plays did Shakespeare write?
and WA only understood the word Shakespeare.
I suggest that when data is curated, Wolfram Alpha itself also curates the data, doing its best to integrate it with all its existing data by asking the curating team the necessary questions. Initially this would create a lot of work, but the longer it is left, the worse it will get.

Posted by Brian Gilbert April 10, 2012 at 1:36 pm

Shakespeare’s plays each exist in one, two, or three discrete and different 16th/17th century texts which must be published separately or edited into a composite text by an editor. There have been thousands of editions of each play and of the complete works since the First Folio of 1623. Each edition has different words, lines, line numbers, scene divisions, spelling, and punctuation. Which text is Alpha using in its database? How does it compare to the many other databases with the complete plays now online?

Posted by Carol Thomas Neely April 10, 2012 at 2:38 pm

I think you guys should let the user read the text if they need to. Stuff like Shakespeare is in the public domain, which would let you avoid copyright issues. This could be pretty helpful.

Posted by Roger April 10, 2012 at 8:21 pm

This is awesome! Which editions of Shakespeare’s works are you using? I’m a college librarian and would love to recommend this to students…but it will matter to faculty which versions of the texts this includes.

Posted by Chris Strauber April 11, 2012 at 6:30 am

What about adding the Lord of the Rings book trilogy?

Posted by Drake April 11, 2012 at 7:19 am

F. Scott Fitzgerald?

Posted by TBK April 11, 2012 at 7:21 am

The search doesn’t accept that Edward III was a play as well as a monarch.

Posted by Stuart Ian Burns April 11, 2012 at 8:20 am

Be nice! So these dudes haven’t read JC; they chose not to. You, on the other hand, could more than likely not write the code that analyzes these literary works, if you chose to.

Posted by ms english teacher April 11, 2012 at 12:08 pm

PS. There’s a good chance that the question about JC was tongue-in-cheek.

Posted by ms english teacher April 11, 2012 at 1:09 pm

Concerning: “For example, if you look at the dialog timeline of Julius Caesar, you’ll notice that Brutus and Cassius have steady dialog throughout the whole play, but Caesar’s dialog stops about halfway through. I wonder why that is?”

Well, how about reading the play – the answer is very simple.

Posted by hairy April 11, 2012 at 8:35 am

I would suggest adding more popular books by authors such as Arthur Conan Doyle, Agatha Christie, Edgar Rice Burroughs, Isaac Asimov, etc. Most texts are now in the public domain and can be found on Project Gutenberg (http://www.gutenberg.org/). Of course, some high-level parsing will be required to remove the license text and forewords not written by the author.

Advantages:
– very handy for those who learn language,
– can be used for tasks of language processing and automated translation,
– it is just interesting!

Posted by Dmitry April 11, 2012 at 9:08 am

Any chance to get the complete works of Jonathan Edwards up on here? He is the most studied American Puritan historical figure in high schools due to his literary prowess.

Posted by Michael April 11, 2012 at 10:21 am

“How many words did you have to say as King Lear at the Aldwitch in ’52?

Sir Edwin: Ah, well, I don’t want you to get the impression it’s just a question of the number of words… um… I mean, getting them in the right order is just as important. Old Peter Hall used to say to me, ‘They’re all there Eddie, now we’ve got to get them in the right order.’ ”

–Monty Python, “Great Actors”

Posted by shakesyear April 12, 2012 at 8:57 am

5625 words.
http://www.wolframalpha.com/input/?i=King%20Lear&t=crmtb01
Aldwych, not Aldwitch.

Posted by This many words April 15, 2012 at 9:38 am

This is awesome! Stuff like Shakespeare is in the public domain, which would let you avoid copyright issues. This could be useful.

Posted by John April 13, 2012 at 12:03 pm

I got a text character count for the following query:
How many characters are there in Titus Andronicus?

How would I determine data like the number of named roles in a particular play?

Posted by Dan April 18, 2012 at 11:07 am

This reminds me of a character in “If on a winter’s night a traveler,” by Italo Calvino. She puts novels into a database that processes them and then relays lists of word frequencies, from which she can deduce the main themes and tones. Now WolframAlpha has not only word frequencies, but frequencies of characters and other such information for analysis. Thirty-three years, and now there’s a possibility of modern Lotarias. Interesting.

Posted by Susan April 22, 2012 at 6:43 pm

Marry me, Wolframalpha. Marry me.

Posted by Mumin April 22, 2012 at 7:06 pm

“We hope you enjoy being able to perform computational analysis on these texts and would love your suggestions on new features and texts to add.”…….

……it would be easy to do if the Wolfram|Alpha developers added a file import option to the input function.
🙂

Posted by Alex Kulay April 23, 2012 at 5:48 am

Analysis of this type with respect to the Hebrew Bible was done as far back as 1994. See http://www.torahcode.co.il/pdf_files/pub/wrr.pdf

Posted by Ariella (Dr. Brown) April 24, 2012 at 9:09 am

I would like to see the works of William Gibson made available for Wolfram Alpha. Already volunteers have made concordances of some of his works, so the demand is there.

Posted by Lilly Hunter (@LillyLyle) April 25, 2012 at 11:00 pm

It would be nice if W|A allowed counting not only word usage, but also digrams. Or, at least, showed a list of the most frequently used digrams. That might be very helpful in the future for text analysis using the power of W|A. Technically, that would not differ from word counting, of course. Second, it would be nice if it were possible to analyze other texts, not only those by Shakespeare.

Posted by Sv23 May 5, 2012 at 10:26 am