To Compute or Not to Compute—Wolfram|Alpha Analyzes Shakespeare’s Plays
For hundreds of years, scholars have studied the plays of Shakespeare, breaking down the language and carefully dissecting every act and scene. We thought it would be interesting to see what sorts of computational insights Wolfram|Alpha could provide, so we uploaded the complete catalog of Shakespeare’s plays into our database. This allows our users to examine Romeo and Juliet, Macbeth, Othello, and the rest of the Bard’s plays in an entirely new way.
Entering a play such as A Midsummer Night’s Dream into Wolfram|Alpha brings up basic information, such as the number of acts, scenes, and characters. It also provides more in-depth information, like the longest word, the most frequent words, and the number of words and sentences. It’s also easy to ask for a specific statistic with queries like “What is the longest word in King Lear?”, “What is the average sentence length of Macbeth?”, and “How many unique words are there in Twelfth Night?”.
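For readers who want to try this kind of analysis on their own text, here is a minimal Python sketch of the statistics mentioned above. It is not how Wolfram|Alpha computes its results: the function name `text_stats`, the sentence-splitting rule, and the definition of a "word" are all assumptions made for illustration, and different choices will give different counts.

```python
import re
from collections import Counter

def text_stats(text, top_n=5):
    """Rough text statistics (hypothetical helper, not Wolfram|Alpha's method)."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Assumption: a word is a run of letters, with optional internal
    # apostrophes or hyphens (so "o'er" and "well-known" count as one word).
    words = re.findall(r"[A-Za-z]+(?:['’-][A-Za-z]+)*", text)
    counts = Counter(w.lower() for w in words)
    return {
        "words": len(words),
        "sentences": len(sentences),
        "unique_words": len(counts),
        "longest_word": max(words, key=len) if words else "",
        "average_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "most_frequent": counts.most_common(top_n),
    }

sample = "To be, or not to be, that is the question. Whether 'tis nobler in the mind to suffer."
stats = text_stats(sample)
print(stats)
```

To analyze a whole play, read the text from a file and pass it to `text_stats`. Edition, spelling, and tokenization choices all affect the numbers, so the results will not necessarily match Wolfram|Alpha's.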
These queries can be used to analyze multiple plays at the same time as well.
Asking Wolfram|Alpha for information about specific characters is where things really begin to get interesting. We took the dialog from each play and organized it into dialog timelines that show when each character talks within a specific play. For example, if you look at the dialog timeline of Julius Caesar, you’ll notice that Brutus and Cassius have steady dialog throughout the whole play, but Caesar’s dialog stops about halfway through. I wonder why that is?
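A dialog timeline of this kind can be approximated with a short Python sketch. The input format (a list of speaker and text pairs in play order), the function names, and the use of word counts to measure position are assumptions made for illustration, not a description of Wolfram|Alpha's internals. The toy data below is placeholder text, not lines from the play.

```python
from collections import defaultdict

def dialog_timeline(speeches):
    """Map each speaker to the spans of the play in which they speak.

    speeches: list of (speaker, text) pairs in play order (assumed format).
    Returns {speaker: [(start, end), ...]} with positions as fractions
    of the total word count, from 0.0 to 1.0.
    """
    lengths = [len(text.split()) for _, text in speeches]
    total = sum(lengths)
    timeline = defaultdict(list)
    position = 0
    for (speaker, _), n in zip(speeches, lengths):
        if total:
            timeline[speaker].append((position / total, (position + n) / total))
        position += n
    return dict(timeline)

def last_spoken(timeline):
    """Fraction of the play at which each speaker's final speech ends."""
    return {speaker: spans[-1][1] for speaker, spans in timeline.items()}

# Made-up placeholder speeches, only to show the shape of the data.
toy = [
    ("BRUTUS", "one two three four"),
    ("CAESAR", "one two"),
    ("CASSIUS", "one two three four"),
    ("CAESAR", "one two"),
    ("BRUTUS", "one two three four"),
    ("CASSIUS", "one two three four"),
]
timeline = dialog_timeline(toy)
ends = last_spoken(timeline)
print(ends)
```

In the toy data, CAESAR's last span ends well before the others, which is the kind of pattern the timeline for Julius Caesar makes visible at a glance.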
Wolfram|Alpha can also provide an analysis of just a specific act or scene of a play. The query “The Merchant of Venice, Act 2, Scene 5” brings up the data analysis for just that part of the play. Getting more specific is also possible, like finding the exact number of words in a specific act and scene of a play.
In addition to Shakespeare’s plays, Wolfram|Alpha can also analyze other famous works of literature, including Moby Dick, Great Expectations, and Adventures of Huckleberry Finn. We hope you enjoy being able to perform computational analysis on these texts and would love your suggestions on new features and texts to add.
Where did the data come from? Did a person or a computer painstakingly collect it all?
I agree with M. It would be very interesting and useful to have the Bible as a text (in multiple translations, of course).
Which version; God’s or man’s?
If man’s, which of the various versions, and then ya gotta decide from among the numerous translations of those versions?
Then you can ask Wolfram|Alpha, “Which one is the real ‘Bible’?”
@fynali do you have God’s email so I could ask for an html copy of her version?
Second that… aside from the question of choosing a translation, it would be helpful to have the analysis.
This is fantastic. Can I do something similar with my own text in Mathematica? I was thinking about analyzing the text of a Mozart opera, Così fan tutte. The dialog time graph is nice.
I asked
What plays did Shakespeare write?
and WA only understood the word Shakespeare.
I suggest that when data is curated, Wolfram Alpha itself should also help curate it, doing its best to integrate it with all its existing data by asking the curating team the necessary questions. Initially this would create a lot of work, but the longer it is left, the worse it will get.
Shakespeare’s plays each exist in one, two, or three discrete and different 16th/17th century texts which must be published separately or edited into a composite text by an editor. There have been thousands of editions of each play and of the complete works since the First Folio of 1623. Each edition has different words, lines, line numbers, scene divisions, spelling, and punctuation. Which text is Alpha using in its database? How does it compare to the many other databases with the complete plays now online?
I think you guys should let the user read the text if they need to. Stuff like Shakespeare is in the public domain, which would let you avoid copyright issues. This could be pretty helpful.
This is awesome! Which editions of Shakespeare’s works are you using? I’m a college librarian and would love to recommend this to students…but it will matter to faculty which versions of the texts this includes.
The search doesn’t accept that Edward III was a play as well as a monarch.
Be nice! So these dudes haven’t read JC; they chose not to. You, on the other hand, could more than likely not write the code that analyzes these literary works, if you chose to.
PS. There’s a good chance that the question about JC was tongue-in-cheek.
Concerning: “For example, if you look at the dialog timeline of Julius Caesar, you’ll notice that Brutus and Cassius have steady dialog throughout the whole play, but Caesar’s dialog stops about halfway through. I wonder why that is?”
Well, how about reading the play – the answer is very simple.
I would suggest adding more popular books by authors such as Arthur Conan Doyle, Agatha Christie, Edgar Rice Burroughs, Isaac Asimov, etc. Most of the texts are now in the public domain and can be found on Project Gutenberg (http://www.gutenberg.org/). Of course, some high-level parsing will be required to remove the license text and any forewords not written by the author.
Advantages:
– very handy for those who are learning the language,
– can be used for tasks of language processing and automated translation,
– it is just interesting!
Any chance to get the complete works of Jonathan Edwards up on here? He is the most studied American Puritan historical figure in high schools due to his literary prowess.
“How many words did you have to say as King Lear at the Aldwitch in ’52?
Sir Edwin: Ah, well, I don’t want you to get the impression it’s just a question of the number of words… um… I mean, getting them in the right order is just as important. Old Peter Hall used to say to me, ‘They’re all there Eddie, now we’ve got to get them in the right order.’ ”
–Monty Python, “Great Actors”
5625 words.
http://www.wolframalpha.com/input/?i=King%20Lear&t=crmtb01
Aldwych, not Aldwitch.
This is awesome! Stuff like Shakespeare is in the public domain, which would let you avoid copyright issues. This could be useful.
I got a text character count for the following query:
How many characters are there in Titus Andronicus?
How would I determine data like the number of named roles in a particular play?
This reminds me of a character in “If on a winter’s night a traveler,” by Italo Calvino. She puts novels into a database that processes them and then relays lists of word frequencies, from which she can deduce the main themes and tones. Now WolframAlpha has not only word frequencies, but frequencies of characters and other such information for analysis. Thirty-three years, and now there’s a possibility of modern Lotarias. Interesting.
“We hope you enjoy being able to perform computational analysis on these texts and would love your suggestions on new features and texts to add.”…
…this would be easy to do if the Wolfram|Alpha developers added a file import option to the input function.
🙂
Analysis of this type with respect to the Hebrew Bible was done as far back as 1994. See http://www.torahcode.co.il/pdf_files/pub/wrr.pdf
I would like to see the works of William Gibson made available for Wolfram Alpha. Already volunteers have made concordances of some of his works, so the demand is there.
It would be nice if W|A allowed counting not only word usage, but also digrams. Or, at least, showed a list of the most frequently used digrams. That might, in future, be very helpful for text analysis using the power of W|A. Technically, that would not differ from word counting, of course. Second, it would be nice if it were possible to analyze other texts, not only those by Shakespeare.
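As a rough illustration of how close digram counting is to word counting, here is a minimal Python sketch. It assumes "digram" can mean either a pair of adjacent words or a pair of adjacent letters and handles both with one function. The name `ngram_counts` is hypothetical and this is not a Wolfram|Alpha feature.

```python
import re
from collections import Counter

def ngram_counts(items, n=2):
    """Count runs of n adjacent items in a sequence (list of words or a string)."""
    # zip over n shifted copies of the sequence yields every adjacent n-tuple.
    return Counter(zip(*(items[i:] for i in range(n))))

# Word digrams: pairs of adjacent words.
words = re.findall(r"[a-z']+", "to be or not to be".lower())
word_digrams = ngram_counts(words)

# Letter digrams: pairs of adjacent letters within a single word.
letter_digrams = ngram_counts("banana")

print(word_digrams.most_common(3))
print(letter_digrams.most_common(3))
```

The only change from plain word counting is that the counter is fed adjacent pairs instead of single items.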