Posted by Rob Knies

Mark Manasse 

So, if you’re writing a book called On the Efficient Determination of Most Near Neighbors: Horseshoes, Hand Grenades, Web Search and Other Situations When Close is Close Enough, how exactly do you start?

You could start by providing an overview of the matter at hand. Or you could start with your own rich research history in the domain being discussed. Or you could just dive right into the discussion with a passage called Cumulative Distribution and Probability Density Functions.

But if you’re Mark Manasse,  principal researcher at Microsoft Research Silicon Valley, and you’re writing a book on the concept of most near neighbors, a book that combines mathematical and engineering principles in an effort to gather together the various research directions in his chosen field, you do all that—but first you lead with a joke.

It’s one you know, the one whose punch line involves the concept of not having to outrun a bear, just having to outrun the other person the bear is chasing.

It’s a fine joke, and, from all appearances, Manasse’s book is just as fine. Sometimes, you can judge a book by its cover, and the back of this one includes this comment, attributed to George Varghese, Professor, University of California, San Diego:

“From de-duplication to search, billion dollar industries rely on the ability to search for keys that are ‘close’ to a specified key. The book by Mark Manasse provides a beautiful exposition of the field.”

(The fact that Varghese now works for Microsoft Research certainly did not overly influence his admiration. Given the context, let’s just call his objectivity “close enough.”)

How did the book come about? We went to the source.

“My motivation,” Manasse says, “was that the chain of disconnected papers describing the series of results we and others have in describing Jaccard approximation has gotten pretty long—long enough that it’s hard to put all of the pieces together in a coherent way.

“Jaccard similarity [a statistic used to compare the similarity and differences of sample sets] has proven itself useful in a few contexts—web and image similarity, storage deduplication, and a few others—so having a single reference location for understanding both weighted and unweighted variants, and understanding the implementation efficiency tradeoffs when moving from unweighted sampling to weighted sampling, seemed useful.”

The book—from Morgan & Claypool Publishers and No. 24 in the series Synthesis Lectures on Information Concepts, Retrieval, and Services, edited by Gary Marchionini of the University of North Carolina at Chapel Hill—is a slender 88 pages, but for those interested in issues such as deduplication, uniform sampling, and, particularly, web search, it is certain to offer a fascinating survey of research to date, especially given that the author was around virtually at the dawn of web search, back in the mid-’90s at pioneering search site AltaVista.

“I would like people to take away two things,” Manasse says. “One, that Jaccard similarity estimations are much faster now than they were 15 years ago, as Andrei Broder and I first described them in a still cited, used, and reimplemented paper at a few universities. And two, that clever tricks in algorithm engineering are still as valuable as they were when the HAKMEM collection was published at MIT nearly 40 years ago.”

Other pertinent facts you might be interested in knowing about the author: He joined Microsoft Research in 2001, Wired has termed him “the guru of micropayments,” he helped establish the field of online algorithms, and, in 1994, Newsweek referred to his band Severe Tire Damage (Manasse’s the bass player) as “lesser-known” than the Rolling Stones—all documented and indisputable.

So how does an accomplished researcher tackling a complicated, research-heavy subject end up opening his book with a variant on the line “I don’t have to outrun the bear. I just have to outrun you”?

“Algorithms, theory, and wit are all important to me,” he says. “Having a theoretical basis and understanding is a key to much of what I do. Realizing that theory in algorithms is what makes it useful to others. Having a sense of humor is what makes it seem more like fun than work.

“That’s probably the least essential, but it’s how I roll.”