The original article can be found on SFGate.com here:
http://www.sfgate.com/cgi-bin/article.cgi?file=/c/a/2010/10/17/BUJ61FTF9I.DTL
Sunday, October 17, 2010 (SF Chronicle)
How Google understands language like a 10-year-old
James Temple, Chronicle Staff Writer
Language has long been one of the most difficult challenges in artificial
intelligence research, mainly because programs are based on rules, while
native tongues cobbled together over hundreds of years tend to flout them.
Researchers only began to make major strides in the last 15 years or so,
once they began supplementing rules with a so-called statistical approach.
Put very simply: By analyzing huge quantities of human text, initially
labeled and dissected in much the manner of English class sentence
diagramming, machines eventually begin to detect the patterns that define
the use of language. After a certain stage of development, the algorithms
can be unleashed onto raw or unstructured data, and continue to refine
their understanding.
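As a rough illustration of the statistical idea described above, the following Python sketch counts how often a word carries each part-of-speech tag after a given previous tag, using a tiny hand-labeled sample. The training sentences, the tag names and the functions train and predict are all hypothetical; this is a toy under those assumptions, not a description of Google's system.

```python
from collections import Counter, defaultdict

# Hypothetical hand-labeled training data: (word, part-of-speech) pairs.
LABELED = [
    [("you", "PRON"), ("can", "MODAL"), ("go", "VERB")],
    [("they", "PRON"), ("can", "MODAL"), ("swim", "VERB")],
    [("open", "VERB"), ("a", "DET"), ("can", "NOUN")],
    [("kick", "VERB"), ("the", "DET"), ("can", "NOUN")],
    [("we", "PRON"), ("can", "MODAL"), ("wait", "VERB")],
]

def train(sentences):
    """Count which tags each word received, keyed by the previous word's tag."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        prev_tag = "START"
        for word, tag in sentence:
            counts[(prev_tag, word)][tag] += 1
            prev_tag = tag
    return counts

def predict(counts, prev_tag, word):
    """Return the most frequent tag seen for this word after prev_tag, or None."""
    seen = counts.get((prev_tag, word))
    return seen.most_common(1)[0][0] if seen else None

model = train(LABELED)
print(predict(model, "PRON", "can"))  # tag most often seen after a pronoun
print(predict(model, "DET", "can"))   # tag most often seen after an article
```

With only five sentences the counts are trivial, but the same bookkeeping applied to far more text is what lets patterns emerge without anyone writing the rule by hand.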
The same process has led to similarly momentous advances in language
translation tools and in machine perception technologies like facial and
voice recognition.
The success of this approach has been further propelled by two key
developments: The sudden availability of massive amounts of digital text
by way of the Internet, and the enormous computing power available to
researchers through server farms strung together across the planet.
Now when Google's computers confront a word with multiple meanings, they
can rely on the same clues that humans use to understand the meaning.
Take the word "can." It might be a noun (a metal container), a verb (to
put something into such a container) or a modal verb (to be able to do
so). You can can something in a can.
Based on the billions of examples its algorithm has analyzed, Google knows
that if "can" is preceded by a pronoun ("you"), it's most likely the modal
verb. If it's followed by an object ("something"), it's most likely a
verb. If it comes after an article ("a"), it's most likely a metal
container. (And in just about every case other than the one in the
preceding paragraph, two cans in a row would probably denote a dance.)
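The three cues can be written out as a short Python sketch. The word lists and the function tag_can are hypothetical, and the rules are hard-coded here only for illustration; in the statistical approach the article describes, such cues would be learned from examples rather than typed in.

```python
# Hypothetical word lists; a real system would learn these cues from data.
PRONOUNS = {"i", "you", "he", "she", "we", "they"}
ARTICLES = {"a", "an", "the"}
OBJECT_WORDS = {"something", "anything", "tomatoes"}

def tag_can(sentence):
    """Label each occurrence of 'can' in the sentence using its neighbors."""
    tokens = sentence.lower().replace(".", " ").split()
    labels = []
    for i, token in enumerate(tokens):
        if token != "can":
            continue
        prev_word = tokens[i - 1] if i > 0 else ""
        next_word = tokens[i + 1] if i + 1 < len(tokens) else ""
        if prev_word in ARTICLES:
            labels.append("noun")      # "a can": a metal container
        elif prev_word in PRONOUNS:
            labels.append("modal")     # "you can": to be able to
        elif next_word in OBJECT_WORDS:
            labels.append("verb")      # "can something": to put in a container
        else:
            labels.append("unknown")
    return labels

print(tag_can("You can can something in a can."))
# ['modal', 'verb', 'noun']
```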
The search engine has also begun to understand which words are synonyms
for others. That's why today Google knows that a user typing the query
"change memory in my laptop" would probably be interested in a string of
text online that reads "install laptop RAM," even though only one word is
the same. Google was incapable of a match like that as recently as three
years ago.
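One simple way to picture this kind of match is to rewrite both the query and the page text into shared terms before comparing them. The Python sketch below assumes a tiny hand-written synonym table and stop-word list, both hypothetical; the article does not say how Google derives or applies its synonyms.

```python
# Hypothetical synonym table mapping words to a shared canonical term.
SYNONYMS = {"change": "install", "memory": "ram"}
# Hypothetical list of filler words to ignore.
STOP_WORDS = {"in", "my", "the", "a"}

def literal_overlap(query, text):
    """Words the two strings share exactly, ignoring case."""
    return set(query.lower().split()) & set(text.lower().split())

def normalize(text):
    """Lowercase, drop filler words, and map each word to its canonical term."""
    return {SYNONYMS.get(w, w) for w in text.lower().split() if w not in STOP_WORDS}

def synonym_overlap(query, text):
    """Terms the two strings share after synonym normalization."""
    return normalize(query) & normalize(text)

query = "change memory in my laptop"
page = "install laptop RAM"
print(literal_overlap(query, page))   # only "laptop" matches word for word
print(synonym_overlap(query, page))   # all three terms of the page match
```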
These improvements have allowed users to increasingly express their
queries using natural language, instead of breaking down their wants into
three-word Boolean expressions. As consumers have caught on to this, the
average length of queries has steadily grown.
Artificial intelligence isn't a silver bullet for online search, however.
Google is continually tweaking its algorithms to address shortcomings, but
some problems can be quite difficult to solve.
For instance, "pain killers that don't upset stomach," a fairly common
query, trips up the engine because it's not great at negation. Typically,
the words in a query represent things people do - not "don't" - want to
find.
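A toy scorer shows why negation is awkward when every query word is treated as something the user wants. In this Python sketch, the two page snippets and the function score are hypothetical, and counting shared words is a deliberately naive stand-in for real ranking.

```python
def score(query, page):
    """Naive relevance: the number of distinct query words that appear in the page."""
    return len(set(query.lower().split()) & set(page.lower().split()))

query = "pain killers that don't upset stomach"

# Hypothetical page snippets.
wanted = "pain killers that are gentle on the stomach"
unwanted = "pain killers that upset the stomach"

print(score(query, wanted))    # matches: pain, killers, that, stomach
print(score(query, unwanted))  # matches the same words plus "upset"
```

Because "upset" counts as a match, the page the user is trying to avoid outscores the one the user wants.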
And sometimes probability works against the search engine: Google tends to
think that Dell and Lenovo are the same thing because so many similar
words show up around the names of the two computer manufacturers.
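The effect can be reproduced on a small scale by describing each word by the words that appear near it and then comparing those descriptions. The Python sketch below uses a made-up five-sentence corpus and cosine similarity over co-occurrence counts; the corpus and function names are hypothetical, and this is one common way to measure such similarity, not necessarily Google's.

```python
import math
from collections import Counter, defaultdict

# Hypothetical miniature corpus.
CORPUS = [
    "dell laptop battery recall",
    "lenovo laptop battery recall",
    "dell sells business laptop",
    "lenovo sells business laptop",
    "banana smoothie recipe",
]

def context_vectors(sentences):
    """For each word, count the other words that share a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            for j, other in enumerate(words):
                if i != j:
                    vectors[word][other] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a)  # a Counter returns 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

vectors = context_vectors(CORPUS)
print(cosine(vectors["dell"], vectors["lenovo"]))  # close to 1.0: identical neighbors
print(cosine(vectors["dell"], vectors["banana"]))  # 0.0: no shared neighbors
```

Since the two brand names never differ in the company they keep, the counts alone give no way to tell them apart.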
The algorithm's understanding of language "has moved from a 2-year-old
infant to something close to an 8 or 10-year-old child," said Amit
Singhal, a Google Fellow, an honorific reserved for the company's top
engineers. "They're still not approaching the conversations you'd have as
a teenager."
E-mail James Temple at jtemple@sfchronicle.com.
Copyright 2010 SF Chronicle