Can an Alligator Run the Hundred Meter Hurdles?

Gary Marcus, writing in The New Yorker, offers a summary of why artificial intelligence isn’t so intelligent (and has a long way to go to catch up with the human brain). He focuses on the research of Hector Levesque, a critic of modern A.I.:

In a terrific paper just presented at the premier international conference on artificial intelligence, Levesque, a University of Toronto computer scientist who studies these questions, has taken just about everyone in the field of A.I. to task. He argues that his colleagues have forgotten about the “intelligence” part of artificial intelligence.

Levesque starts with a critique of Alan Turing’s famous “Turing test,” in which a human, through a question-and-answer session, tries to distinguish machines from people. You’d think that if a machine could pass the test, we could safely conclude that the machine was intelligent. But Levesque argues that the Turing test is almost meaningless, because it is far too easy to game. Every year, a number of machines compete in the challenge for real, seeking something called the Loebner Prize. But the winners aren’t genuinely intelligent; instead, they tend to be more like parlor tricks, and they’re almost inherently deceitful. If a person asks a machine “How tall are you?” and the machine wants to win the Turing test, it has no choice but to confabulate. It has turned out, in fact, that the winners tend to use bluster and misdirection far more than anything approximating true intelligence. One program worked by pretending to be paranoid; others have done well by tossing off one-liners that distract interlocutors. The fakery involved in most efforts at beating the Turing test is emblematic: the real mission of A.I. ought to be building intelligence, not building software that is specifically tuned toward fixing some sort of arbitrary test.

The crux, it seems to me, is how machines interpret the subtleties of human communication. Marcus offers the following example, in which substituting a single word yields disparate answers:

The large ball crashed right through the table because it was made of Styrofoam. What was made of Styrofoam? (The alternative formulation replaces Styrofoam with steel.)

a) The large ball
b) The table

Continuing, he explains:

These examples, which hinge on the linguistic phenomenon known as anaphora, are hard both because they require common sense—which still eludes machines—and because they get at things people don’t bother to mention on Web pages, and that don’t end up in giant data sets.

More broadly, they are instances of what I like to call the Long-Tail Problem: common questions can often be answered simply by trawling the Web, but rare questions can still stymie all the resources of a whole Web full of Big Data. Most A.I. programs are in trouble if what they’re looking for is not spelled out explicitly on a Web page. This is part of the reason for Watson’s most famous gaffe—mistaking Toronto for a city in the United States.
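To make the difficulty concrete, here is a minimal sketch of the kind of purely statistical resolver Marcus is warning about. The co-occurrence counts are invented for illustration; a real system would estimate them from a large corpus:

```python
# Invented co-occurrence counts between candidate referents and material
# words; a real system would estimate these from a large web corpus.
cooccurrence = {
    ("ball", "Styrofoam"): 12,
    ("table", "Styrofoam"): 15,
    ("ball", "steel"): 20,
    ("table", "steel"): 26,
}

def naive_resolve(material):
    """Pick whichever candidate co-occurs more often with the material."""
    return max(["ball", "table"], key=lambda c: cooccurrence[(c, material)])

for material in ("Styrofoam", "steel"):
    print(material, "->", naive_resolve(material))
# Prints "table" both times, but the correct referent flips with the
# material: Styrofoam -> the table, steel -> the ball. Resolving that
# takes the common sense that the crashing object must be harder than
# the thing it crashes through, which no co-occurrence count encodes.
```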

Levesque’s paper is short and easily accessible to the layman.

Watson Gets a Job on Wall Street

After winning Jeopardy! against Ken Jennings and Brad Rutter last year, IBM’s Watson is on to bigger and better things. Bloomberg reports that Watson is now set to work on Wall Street:

IBM expects to generate billions in new revenue by 2015 by putting Watson to work. The technology giant has already sold Watson to health-care clients, helping WellPoint Inc. (WLP) and Seton Health Family analyze data to improve care. IBM executives say Watson’s skills — understanding and processing natural language, consulting vast volumes of unstructured information, and accurately answering questions with humanlike cognition — are also well suited for the finance industry.

Financial services is the “next big one for us,” said Manoj Saxena, the man responsible for finding Watson work. IBM is confident that with a little training, the quiz-show star that can read and understand 200 million pages in three seconds can make money for IBM by helping financial firms identify risks, rewards and customer wants mere human experts may overlook.

Banks spent about $400 billion on information technology last year, said Michael Versace, head of risk research at International Data Corp.’s Financial Insights, which has done research for IBM.

I am not sure if this is entirely good news…

Recap: IBM’s Watson Dominates at Jeopardy!

“I, for one, welcome our new computer overlords.”

So wrote Ken Jennings as part of his correct response to the Final Jeopardy! clue in tonight’s final game between him, Brad Rutter, and the newcomer, IBM’s Watson.

Though it didn’t do as well today as it did in Game 1, Watson had another impressive showing in Game 2: it earned $41,413, which, combined with its $35,734 from Game 1, won it the two-day affair with total winnings of $77,147. At the end of the Double Jeopardy! round, Ken Jennings computed that he couldn’t beat Watson even if he wagered everything, so he wagered conservatively in Final Jeopardy! (a quick sketch of that catch-up arithmetic appears below). The category was 19th Century Novelists, and the clue was:

“William Wilkinson’s ‘An Account of the Principalities of Wallachia and Moldavia’ inspired this author’s most famous novel.”

All three contestants got the right answer: Bram Stoker, who wrote Dracula (I got it right at home, playing along). In the end, Ken Jennings wound up with $24,000. Brad Rutter came in third with $21,600 in winnings. Of course, IBM’s Watson was playing for charity, and the cool $1,000,000 prize will be split between World Vision and World Community Grid.
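As for the catch-up arithmetic Jennings ran in his head, here is a minimal sketch of it. The scores below are placeholders rather than the actual pre-Final numbers from the broadcast (only Watson’s $35,734 from Game 1 is real):

```python
def trailer_can_win(trailer_g1, trailer_g2, leader_g1, leader_g2):
    """Two-day combined-score match, checked before Final Jeopardy! of
    Game 2. Trailer's best case: wager everything and answer correctly,
    doubling his Game 2 score. Leader's worst case: wager nothing."""
    return trailer_g1 + 2 * trailer_g2 > leader_g1 + leader_g2

# Placeholder scores (only the $35,734 figure is from the broadcast):
# even doubling up, a trailer this far behind cannot pass the leader,
# so a conservative wager protects second place instead.
print(trailer_can_win(5_000, 18_000, 35_734, 23_000))  # -> False
```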

Overall, I was very impressed with Watson’s showing. And yes, the prediction I made last week was wrong. The last three days on Jeopardy! have been a blast.

So what’s next for IBM? According to the New York Times:

For I.B.M., the future will happen very quickly, company executives said. On Thursday it plans to announce that it will collaborate with Columbia University and the University of Maryland to create a physician’s assistant service that will allow doctors to query a cybernetic assistant. The company also plans to work with Nuance Communications Inc. to add voice recognition to the physician’s assistant, possibly making the service available in as little as 18 months.

###
References:

1) Selected Nuances of Watson’s Strategies (How does Watson know what it knows?)

2) Watson’s Wagering Strategies (excellent blog post from one of IBM’s researchers)

3) All the questions and answers from Game 1 (part 1 and part 2) and from Game 2 in this Jeopardy! contest between Watson, Ken Jennings, and Brad Rutter.

 

Ken Jennings and Brad Rutter vs. IBM’s Watson on Jeopardy!

Jeopardy! is one of my all-time favorite shows on television. When I was in high school, I watched every episode (it was a nightly ritual). These days, I watch Jeopardy! less often than I did in my younger days, but I’ll certainly be tuning in next week to see Ken Jennings (winner of 74 consecutive games on the show, with a total loot of over $3 million in prize money) and Brad Rutter (the biggest all-time money winner on the show) take on IBM’s artificial intelligence software, Watson.

Perhaps I am downplaying my enthusiasm: I’m really, really looking forward to this Jeopardy! contest, which will take place over three days, on February 14, 15, and 16. Watson doesn’t just represent a machine force-fed a bunch of encyclopedias, dictionaries, and thesauri (although that’s certainly a major component of it); no, the machine also faces the arduous task of deciphering natural (English) language and everything it entails: nuance, hyperbole, puns, dialects, slang, metaphor, and so much more. In other words, Jeopardy! is a perfect test-drive for Watson: the questions on the show aren’t meant for a computer to answer. And this is why Watson is such a huge deal.

Watson is a powerful machine. Its setup is called “Massively Parallel Probabilistic Evidence-Based Architecture,” and it runs on 2,880 Power7 processing cores. If all that computing and interpreting power produces an answer that clears Watson’s confidence threshold, it “rings in” with the answer.
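In code, that ring-in rule is just a thresholded argmax over scored candidate answers. Here is a minimal sketch; the threshold and the candidate scores are made up, and Watson’s real pipeline combines hundreds of evidence scorers rather than a single confidence number:

```python
CONFIDENCE_THRESHOLD = 0.5  # made-up value; Watson's actual threshold isn't public

def decide_to_buzz(candidates):
    """candidates: list of (answer, confidence) pairs from the
    evidence-scoring pipeline. Ring in only if the best-scoring
    candidate clears the confidence threshold."""
    answer, confidence = max(candidates, key=lambda c: c[1])
    return answer if confidence >= CONFIDENCE_THRESHOLD else None

# Illustrative scores only:
print(decide_to_buzz([("Bram Stoker", 0.92), ("Mary Shelley", 0.04)]))  # Bram Stoker
print(decide_to_buzz([("Toronto", 0.31), ("Chicago", 0.28)]))           # None: stay silent
```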

Concerning Watson, Richard Powers made a superb op-ed contribution to the New York Times. In “What is Artificial Intelligence?” (link via @openculture) he elaborates on Watson’s challenge:

Open-domain question answering has long been one of the great holy grails of artificial intelligence. It is considerably harder to formalize than chess. It goes well beyond what search engines like Google do when they comb data for keywords. Google can give you 300,000 page matches for a search of the terms “greyhound,” “origin” and “African country,” which you can then comb through at your leisure to find what you need.

Asked in what African country the greyhound originated, Watson can tell you in a couple of seconds that the authoritative consensus favors Egypt. But to stand a chance of defeating Mr. Jennings and Mr. Rutter, Watson will have to be able to beat them to the buzzer at least half the time and answer with something like 90 percent accuracy.

So the task is two-fold: first arrive at the correct answer, then “buzz in” before Ken Jennings or Brad Rutter. From my observation, Ken Jennings is one of the players who has best mastered the art of the buzz-in (I suspect that on the many questions his competitors knew just as well as he did, his ability to buzz in first accounted for at least 30% of his daily winnings on the show). However, I believe the IBM engineers designed Watson for maximum efficiency: as soon as the last syllable escapes Alex Trebek’s lips, Watson will ring in with the answer.

Actually, Watson has already competed in a practice round against humans, and it beat them badly. The most interesting tidbit comes from this Discover Magazine piece:

The questions were fed in plain text to Watson, but it had to wait the same amount of time to ring in as the human players did. To make the game fair, it also had to trigger a mechanical signaling button. Watson spoke in a stilted computerized voice–and was almost never wrong.

So if that is the case, Watson isn’t truly “listening” to the clues posed by Trebek; rather, it is reading them as plain text. This is important for a reason: if Alex Trebek accentuates a certain part of a clue, or changes his intonation or even his accent, that will help the human contestants. The discrepancy between how the humans and Watson process the clues cannot be overlooked.

For those unfamiliar with Jeopardy!’s buzzer system: the buzzers are locked until Alex Trebek has finished reading the clue, and the lock-out period is controlled by a human producer, who sits off-camera with a button of his or her own that enables the buzzers. If a contestant buzzes in before the producer pushes that button, the contestant’s buzzer is automatically locked out for three seconds, and any attempt to buzz in before that “penalty” period expires locks the buzzer for another three seconds. So this raises an important question: how does Watson know when it can buzz in (i.e., how is it notified that the lock-out period has passed)? Will the human players have the advantage here, or will Watson?
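Those lock-out rules are easy to model in code. Here is a toy sketch; the class and method names are my own, and the show’s actual signaling hardware is of course not public:

```python
PENALTY_SECONDS = 3.0  # the early-buzz penalty described above

class BuzzerSystem:
    """Toy model of the Jeopardy! lock-out rules described above."""

    def __init__(self):
        self.enabled = False     # producer hasn't opened the buzzers yet
        self.locked_until = {}   # contestant -> time the penalty expires

    def enable(self):
        """The off-camera producer presses the button; buzzers go live."""
        self.enabled = True

    def buzz(self, contestant, now):
        """Return True for a valid buzz at time `now` (seconds)."""
        # Buzzing during a penalty restarts the three-second lock-out.
        if now < self.locked_until.get(contestant, 0.0):
            self.locked_until[contestant] = now + PENALTY_SECONDS
            return False
        # Buzzing before the producer enables the system also incurs one.
        if not self.enabled:
            self.locked_until[contestant] = now + PENALTY_SECONDS
            return False
        return True

# A contestant who anticipates the unlock by a fraction of a second
# pays for it with a three-second penalty:
system = BuzzerSystem()
print(system.buzz("Ken", 9.9))      # False: too early, locked until 12.9
system.enable()                     # producer opens the buzzers at ~10.0
print(system.buzz("Ken", 10.1))     # False: still inside the penalty
print(system.buzz("Watson", 10.1))  # True: valid buzz
```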

Another important question to ponder is this one raised by Richard Powers:

Answers, for Watson, are a statistical thing, a matter of frequency and likelihood. If, after a couple of seconds, the countless possibilities produced by the 100-some algorithms converge on a solution whose chances pass Watson’s threshold of confidence, it buzzes in.

This raises the question of whether Watson is really answering questions at all or is just noticing statistical correlations in vast amounts of data.

In a sense, Watson is like a machine trained in quantum mechanics: it can never be certain of any answer, but if an answer (err, a question) clears the confidence threshold (whatever that may be), it will surely try to buzz in with the response.

I like Powers’s conclusion:

It does not matter who will win this $1 million Valentine’s Day contest. We all know who will be champion, eventually. The real showdown is between us and our own future. Information is growing many times faster than anyone’s ability to manage it, and Watson may prove crucial in helping to turn all that noise into knowledge.

I finish this post with a prediction: I believe Ken Jennings will be the champion of this Jeopardy! contest. I have high hopes for Watson, and I believe it will do quite well, but I am sticking with the all-time champion. Ken is just too good with the buzzer, and too much of an all-around whiz, for me to bet against him.

What about you? Who do you think will win?