Posted by Rob Knies


You might have noticed in a recent Bing search that, these days, you can often get an answer to your query without having to click through to one of the search results.

For example, I recently wondered about this year’s date for Mother’s Day in the United States. I typed “mother’s day” into Bing and was greeted with a bold line of text in the middle of the screen that read “Mother’s Day is on Sunday, May 13, 2012.” Great—just what I needed.

Such assistance, though, requires manual intervention. People research and write the direct answers for such requests, and, given that the list of potential web queries is endless, there aren’t enough people to go around. Expensive human help is available for only the most popular queries.

That’s what Direct Answers for Search Queries in the Long Tail aims to fix. The paper, written by Michael S. Bernstein of the Massachusetts Institute of Technology and Jaime Teevan, Susan Dumais, Dan Liebling, and Eric Horvitz of Microsoft Research Redmond, has been accepted for the Association for Computing Machinery’s 2012 Conference on Human Factors in Computing Systems (CHI 2012), being held in Austin, Texas, from May 5 to 10. It proposes Tail Answers, a large collection of crowdsourced direct answers that are unpopular individually but together address a large proportion of search traffic.

“There are many uncommon queries, like ‘molasses substitute’ or ‘body temperature of a dog,’ that could be answered but aren’t,” Teevan explains. “Individually, none of these queries are issued often enough to make it worth the editorial effort, but collectively, they represent a significant portion of the query volume.”

The authors offer a straightforward approach to what is essentially an issue of extracting knowledge from huge volumes of data, using a combination of search-log analysis and crowdsourcing.

“We developed a method that automatically creates answers for these tail queries using the aggregate knowledge of thousands of web users,” Teevan says. “We mine people’s web-search behavior to identify questions that could potentially be resolved with just a short snippet of text and use crowdsourcing to filter out the misidentified queries.

“Similarly, we use search behavior to identify web resources that answer the filtered queries and use the crowd to extract the answers from these resources. By taking advantage of people’s implicit search behavior wherever possible and using paid crowd effort when necessary, we are able to cheaply extend the reach of search-engine answers into the tail.”
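The post doesn’t give implementation details, but the pipeline Teevan describes can be sketched roughly. Below is a minimal Python sketch, under assumed names and data shapes: the log format, the short-dwell heuristic and its threshold, and the crowd-task helpers are all hypothetical illustrations, not the paper’s actual algorithm. It mines a toy query log for queries that look resolvable with a short snippet, pairs each with its most-clicked page, and emits crowd tasks to filter the candidates and extract answer text.

```python
from collections import Counter, defaultdict

# Hypothetical log record: (query, clicked_url, dwell_seconds).
# Short dwell time on a clicked result is used here as a stand-in signal
# that the searcher found a quick, snippet-sized answer on that page.
SEARCH_LOG = [
    ("molasses substitute", "http://example.com/baking-tips", 12),
    ("molasses substitute", "http://example.com/baking-tips", 9),
    ("body temperature of a dog", "http://example.com/dog-health", 15),
    ("weather in austin", "http://example.com/weather", 240),
]

SHORT_DWELL = 30  # seconds; assumed threshold, not from the paper


def candidate_answer_queries(log):
    """Group log records by query and keep queries whose clicks mostly
    end in short visits, i.e. queries that look answerable in a snippet."""
    dwells = defaultdict(list)
    clicks = defaultdict(Counter)
    for query, url, dwell in log:
        dwells[query].append(dwell)
        clicks[query][url] += 1

    candidates = {}
    for query, times in dwells.items():
        short_visits = sum(1 for t in times if t <= SHORT_DWELL)
        if short_visits / len(times) >= 0.5:  # most visits were short
            top_url, _ = clicks[query].most_common(1)[0]
            candidates[query] = top_url
    return candidates


def make_crowd_tasks(candidates):
    """Turn each (query, destination URL) pair into two crowd tasks:
    one to confirm the query really is answerable in a short snippet,
    one to extract the answer text from the destination page."""
    tasks = []
    for query, url in candidates.items():
        tasks.append({"type": "filter",
                      "question": f"Can '{query}' be answered in one short sentence?"})
        tasks.append({"type": "extract", "query": query, "source_url": url})
    return tasks


if __name__ == "__main__":
    for task in make_crowd_tasks(candidate_answer_queries(SEARCH_LOG)):
        print(task)
```

The point of the split is the one Teevan makes: the cheap, implicit signals in the log do the broad filtering, and paid crowd effort is spent only on the small set of surviving candidates.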

As you might imagine, one of the biggest challenges to such an approach involved quality control.

“Because the answers provided by search engines appear authoritative, they need to be accurate,” Teevan says. “We used people, via crowdsourcing, to vet the quality of the content identified by our algorithms, but we also needed to develop a way to vet the quality of the information provided by the people. We did this by collecting judgments for the same thing from multiple people and requiring them to agree with each other—and by sometimes asking questions we already knew the answer to.”
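Again as a rough sketch rather than the paper’s actual procedure, the agreement-plus-gold-question check Teevan describes might look like the following, where a judgment is accepted only when enough workers agree and workers who miss too many questions with known answers are dropped. The thresholds, data shapes, and worker names are assumptions for illustration.

```python
from collections import Counter


def reliable_workers(gold_answers, worker_responses, min_accuracy=0.75):
    """Keep only workers who answer enough known-answer ('gold')
    questions correctly. Threshold is an assumed value."""
    reliable = set()
    for worker, answers in worker_responses.items():
        graded = [(q, a) for q, a in answers.items() if q in gold_answers]
        if not graded:
            continue
        correct = sum(1 for q, a in graded if a == gold_answers[q])
        if correct / len(graded) >= min_accuracy:
            reliable.add(worker)
    return reliable


def consensus(labels, min_agreement=2):
    """Accept a judgment only when at least `min_agreement` workers agree."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agreement else None


# Example: three workers judge whether queries are answerable in a snippet.
gold = {"is 'mother's day' answerable?": "yes"}
responses = {
    "w1": {"is 'mother's day' answerable?": "yes",
           "is 'molasses substitute' answerable?": "yes"},
    "w2": {"is 'mother's day' answerable?": "yes",
           "is 'molasses substitute' answerable?": "yes"},
    "w3": {"is 'mother's day' answerable?": "no",
           "is 'molasses substitute' answerable?": "no"},
}

trusted = reliable_workers(gold, responses)
votes = [responses[w]["is 'molasses substitute' answerable?"] for w in trusted]
print(consensus(votes))  # "yes": the two workers who passed the gold check agree
```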

Clearly, the approach has proved effective. A user study with 361 participants showed that Tail Answers significantly improved users’ subjective ratings of search quality and their ability to find answers without clicking through to a result. The findings indicate that search engines can be extended to respond directly to a large new class of queries.

“This work represents a way for search engines to aggregate user knowledge to improve not just ranking, but the entire search user experience,” Teevan concludes. “To generate Tail Answers, we use algorithms whenever possible, but rely on people to complete the parts that are easy for humans but hard for computers.”