…or, some highlights from the 2011 Conference of the Association for Computational Linguistics
First, a word from our curmudgeon: Far too many of the papers, unfortunately as usual, could be summarized as, “We applied several variations of a new complicated method, and one of the runs gave 1% improved results over the simple method that’s been around for 15 years.” Talks with new ideas were therefore very refreshing! I’ll talk about a few of these below (with more to follow, I hope, in future posts).
Identifying opposing sides in political debate from news articles
So much of sentiment analysis research over the last few years has amounted, more or less, to “how can we improve classification of positive and negative movie reviews by a couple of percent by adding features to bag-of-words?” None of these techniques add any new insights into the problem, nor do they improve results in any real way. The vein of incremental progress was exhausted years ago; advances now require new ways of looking at the problem.
A new view of sentiment analysis, one that reminds us that opinions have to do with people, was offered by Souneil Park, Kyung Soon Lee, and Junehwa Song in their paper Contrasting Opposing Views of News Articles on Contentious Issues. Rather than classifying articles according to sentiment (which is usually not even a well-defined task), their approach analyzes newspaper text to identify the key figures on opposite sides of (political) debates. The key idea is that figures (people or organizations) on each side will tend to attack figures on the other side, so if we can identify enough such attacks, we can automatically discover what the sides are (and who is on each). Sentiment analysis on sentences such as, “The government defined that the attack of North Korea is an act of invasion,” can (roughly) classify whether one actor (“the government”) is saying positive or negative things (in this case, negative: “attack”, “invasion”) about another (“North Korea”).
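To make the "attacks reveal the sides" idea concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it assumes the attack pairs have already been extracted from the text, and it swaps in a plain breadth-first two-coloring of the attack graph, in which anyone who attacks (or is attacked by) an actor lands on the other side. The function name `split_sides` and the actors in the toy data are invented for illustration.

```python
from collections import defaultdict, deque


def split_sides(attacks):
    """Assign each actor to side 0 or 1 from (attacker, target) pairs.

    Assumption: an attack in either direction means the two actors are
    on opposite sides. Conflicting evidence is ignored (the first
    assignment an actor receives wins), so this is only a heuristic.
    """
    graph = defaultdict(set)
    for attacker, target in attacks:
        graph[attacker].add(target)
        graph[target].add(attacker)

    side = {}
    for start in sorted(graph):  # sorted for a deterministic result
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            actor = queue.popleft()
            for other in sorted(graph[actor]):
                if other not in side:
                    side[other] = 1 - side[actor]
                    queue.append(other)
    return side


# Hypothetical attack pairs, as a sentence-level sentiment step might emit them.
toy_attacks = [
    ("Party A", "Party B"),
    ("Union C", "Party A"),
    ("Minister D", "Union C"),
    ("Party B", "Minister D"),
]
sides = split_sides(toy_attacks)
for actor in sorted(sides):
    print(actor, sides[actor])
```

On this toy input, Party A and Minister D end up together, opposite Party B and Union C. Real news text is far noisier, so a practical system would need to weigh conflicting attacks rather than take the first one it sees.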
“As cold as a robot fish” — from n-grams to linguistic creativity
Tony Veale, of University College Dublin, presented a fascinating exploration of how massive, but simple, language data can be mined for Creative Language Retrieval. By searching for certain kinds of fixed word patterns, pairs of stereotypically related terms can be found, which can then be used in creative ways. For example, by noting that “as cold as a fish” is common, the system knows that fish are stereotypically cold; since “as cold as a robot” is also common, the system knows that robots are stereotypically cold; and finally, since the phrase “robot fish” occurs more often than one would expect by chance, there is such a thing as a robot fish. The system can then create the novel (and evocative) simile, “as cold as a robot fish”. Novel metaphors can also be created: “he is a wise professor” can be transformed into “he is a wise old owl”. To see some very entertaining online demos of this technology, see Educated Insolence, where the demos include Idiom Savant, The Jigsaw Bard, and Aristotle, among others. This site is as exciting as the lifestyle of a princess (as the Jigsaw Bard puts it).
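The robot-fish example can be sketched in a few lines of Python. This is a toy illustration, not Veale's system: the n-gram counts are invented, the names `mine_stereotypes` and `novel_similes` are hypothetical, and a simple count threshold stands in for the "more often than one would expect by chance" test.

```python
import re
from collections import defaultdict

# Matches the fixed pattern "as ADJECTIVE as a/an NOUN".
SIMILE = re.compile(r"^as (\w+) as an? (\w+)$")


def mine_stereotypes(ngram_counts, min_count=2):
    """Map each property (e.g. 'cold') to the nouns it is stereotypical of."""
    props = defaultdict(set)
    for phrase, count in ngram_counts.items():
        match = SIMILE.match(phrase)
        if match and count >= min_count:
            props[match.group(1)].add(match.group(2))
    return props


def novel_similes(ngram_counts, min_count=2):
    """Combine two nouns sharing a property when the compound is attested."""
    props = mine_stereotypes(ngram_counts, min_count)
    similes = []
    for prop in sorted(props):
        nouns = sorted(props[prop])
        for first in nouns:
            for second in nouns:
                compound = f"{first} {second}"
                # Crude stand-in for a real association test (assumption).
                if first != second and ngram_counts.get(compound, 0) >= min_count:
                    article = "an" if first[0] in "aeiou" else "a"
                    similes.append(f"as {prop} as {article} {compound}")
    return similes


# Invented counts, for illustration only.
TOY_COUNTS = {
    "as cold as a fish": 40,
    "as cold as a robot": 25,
    "as wise as an owl": 60,
    "robot fish": 12,
    "fish robot": 1,
}
print(novel_similes(TOY_COUNTS))  # ['as cold as a robot fish']
```

The threshold is the weakest part of the sketch; with real n-gram data one would compare the compound's frequency against the frequencies of its parts.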
Automatic source criticism of the Hebrew Scriptures
For centuries Bible scholars have attempted to identify distinct authorial voices within books of the Bible, though their methods have often been criticized as subjective and even impressionistic. Moshe Koppel, Navot Akiva, Nachum Dershowitz, and Idan Dershowitz have devised an algorithm that automatically divides a document into distinct strands by detecting which parts of the text make different choices among available synonyms. For example, some parts of the text will consistently use the Hebrew word makel (meaning “staff”), while others will consistently use mateh (also meaning “staff”). By formalizing and generalizing this phenomenon, the researchers showed that when two Biblical books, such as Jeremiah and Ezekiel, are randomly mixed together, the merged book can be separated almost perfectly, and automatically, into its constituent components.
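The synonym-choice intuition can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the researchers' algorithm: the text is assumed to be pre-split into chunks, the synonym sets are given, and the split is a naive two-seed assignment. The makel/mateh pair comes from the example above; the English pair, the toy chunks, and all function names are invented.

```python
def synonym_choices(chunk, synsets):
    """Record which synonym a chunk uses from each set (if exactly one)."""
    words = set(chunk.lower().split())
    choices = {}
    for index, synset in enumerate(synsets):
        used = [word for word in synset if word in words]
        if len(used) == 1:
            choices[index] = used[0]
    return choices


def agreement(a, b):
    """+1 per synonym set where both chunks agree, -1 where they differ."""
    shared = set(a) & set(b)
    return sum(1 if a[i] == b[i] else -1 for i in shared)


def split_strands(chunks, synsets):
    """Label each chunk 0 or 1 (needs at least two chunks).

    The two chunks that disagree most act as seeds; every chunk joins
    the seed whose synonym choices it matches better.
    """
    profiles = [synonym_choices(chunk, synsets) for chunk in chunks]
    n = len(profiles)
    pairs = [
        (agreement(profiles[i], profiles[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    ]
    _, seed0, seed1 = min(pairs)
    return [
        0 if agreement(p, profiles[seed0]) >= agreement(p, profiles[seed1]) else 1
        for p in profiles
    ]


# Toy data: the second synonym set and all chunks are invented stand-ins.
SYNSETS = [("makel", "mateh"), ("big", "large")]
CHUNKS = [
    "he took the makel and saw a big hill",
    "the mateh was near a large river",
    "a big stone lay by the makel",
    "they carried a large mateh",
]
print(split_strands(CHUNKS, SYNSETS))  # [0, 1, 0, 1]
```

Chunks that use none of the listed synonyms carry no signal here and default to the first strand, which is one reason a real system would need richer features than this sketch uses.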
UPDATE: See this AP story on this research.