BingMatrix – A Windows Azure application that provides a fun way to mine data from Bing

I wanted to share a little application I put together using Windows Azure. It uses Bing queries to find out the popularity of a specific set of keywords on a specific set of sites. I actually created this for my own use while researching how frequently some registry keys are mentioned on Microsoft support, on TechNet blogs or on the TechNet forums.

To get started, go to http://bingmatrix.cloudapp.net and provide:

  1. Title (optional)
  2. List of keywords
  3. List of web sites
  4. Additional keywords (optional)

Here’s a sample screenshot of the input screen:

image

You can use one of the sample queries provided. For instance, to get the data above I simply clicked the “SMB2” button on the right. To get your results, click on the big “Build my BingMatrix” button on the left. Please note that it will take a few seconds to build the matrix, so be patient. Here are the results for the sample above:

image

For the specific example above, it searches for the 4 keywords on the 5 different sites. To do that, it goes to Bing 20 times, once for each keyword and site combination. For each query, it uses one keyword from the “keywords list”, one of the sites from the “sites list” and adds the additional keywords from the “additional keywords” field. For instance, for the query on “Performance” on the “blogs” site, it passes the following query string to Bing: +"Performance" +"File Server" +SMB2 site:blogs.technet.com. The table with the results includes links to each individual query, so you can go directly to Bing to find the details.
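To make the keyword and site combinations concrete, here is a minimal sketch in Python of how such a matrix of query strings could be assembled. This is not the application's actual code (which is an ASP.NET application); the function names, the placeholder keywords and the placeholder sites are all assumptions for illustration. Only the “Performance” keyword, the blogs.technet.com site and the additional keywords come from the example above.

```python
from itertools import product
from urllib.parse import quote_plus


def build_query(keyword: str, domain: str, additional: str = "") -> str:
    """Build one query string: the quoted keyword, the additional
    keywords exactly as typed, and a site: restriction."""
    parts = [f'+"{keyword}"']
    if additional:
        parts.append(additional)
    parts.append(f"site:{domain}")
    return " ".join(parts)


def build_matrix(keywords, sites, additional=""):
    """Return {(keyword, site label): query string} for every combination.

    `sites` maps a short label (such as "blogs") to a domain.
    """
    return {
        (keyword, label): build_query(keyword, domain, additional)
        for keyword, (label, domain) in product(keywords, sites.items())
    }


def bing_url(query: str) -> str:
    """Link to the Bing results page for one cell of the matrix."""
    return "https://www.bing.com/search?q=" + quote_plus(query)


# Hypothetical inputs: only "Performance" and blogs.technet.com are taken
# from the post; the other keywords and sites are placeholders.
keywords = ["Performance", "Keyword2", "Keyword3", "Keyword4"]
sites = {
    "blogs": "blogs.technet.com",
    "site2": "site2.example.com",
    "site3": "site3.example.com",
    "site4": "site4.example.com",
    "site5": "site5.example.com",
}
additional = '+"File Server" +SMB2'

matrix = build_matrix(keywords, sites, additional)
print(len(matrix))                         # 4 keywords x 5 sites
print(matrix[("Performance", "blogs")])    # query string for one cell
print(bing_url(matrix[("Performance", "blogs")]))
```

With 4 keywords and 5 sites the dictionary has 20 entries, which matches the 20 trips to Bing described above.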

Here are a few additional sample results:

image

image

You should interpret these results carefully, since they can vary widely depending on the additional keywords provided. Also, Bing is constantly crawling the Internet, so the output for the same query will change over time. For instance, the numbers you see on the screenshots above will probably be different by the time you try them out. It's also important to note that if you get millions of hits for a certain query, the numbers are obviously less precise. If you get just a few dozen, they are usually fairly accurate.

You can also provide direct links to a BingMatrix query. For instance, here are the direct URLs to the complete list of 12 sample queries provided in the main page of the site:

Try it out at http://bingmatrix.cloudapp.net and make sure to post a comment if you like it. You can obviously type in any keywords, sites or additional keywords to build your own matrix. Just keep in mind that if your matrix is too big, it will take longer to process. It might also time out. This was a weekend project for me and I have made a few updates in the last few weekends. Feel free to provide feedback and suggest improvements.


Updated on 12/15/2010: Deployed a new version that works faster by using multiple threads to query Bing.
Updated on 12/21/2010: New sample queries added (8 total now).
Updated on 12/23/2010: New sample queries added (12 total now). Support for passing in parameters in the URL. Added title field. Added “Building…” message.
Updated on 01/03/2011: More sample queries added (18 total now). Samples now come from a SQL Azure database.
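
The 12/15/2010 update mentions using multiple threads to query Bing. As a rough illustration of that idea only, here is a Python sketch that runs the queries of a matrix through a thread pool. It is not the application's code; fetch_count is a hypothetical stand-in for whatever call returns the hit count for one query, and it returns a dummy value so the sketch runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_count(query: str) -> int:
    """Hypothetical stand-in for asking Bing how many hits a query has.

    It returns the query length as a dummy value so the sketch can run
    without network access; a real version would call a search service.
    """
    return len(query)


def run_queries(queries, fetch=fetch_count, max_workers=8):
    """Run every query concurrently.

    `queries` maps a matrix cell, e.g. (keyword, site), to a query string.
    Returns a dict mapping the same cells to their hit counts.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as its input,
        # so they can be zipped back onto the cell keys.
        counts = pool.map(fetch, queries.values())
        return dict(zip(queries.keys(), counts))


# Hypothetical two-cell matrix used only to exercise the sketch.
sample = {
    ("Performance", "blogs"): '+"Performance" site:blogs.technet.com',
    ("Performance", "site2"): '+"Performance" site:site2.example.com',
}
results = run_queries(sample)
for cell, count in results.items():
    print(cell, count)
```

Because the work is mostly waiting on the network, running the queries in parallel shortens the total time for a large matrix, although a very large matrix can still take long enough to time out.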

  • In a previous blog post, I described the BingMatrix search tool: blogs.technet.com/.../archive

  • Hi, Jose,

    I created a BingMatrix search for terms related to Windows Azure at oakleafblog.blogspot.com/.../windows-azure-and-cloud-computing-posts_24.html.

    Results for all but OData and Azure App Fabric appear to me to be excessive.

    Your comments?

    --rj

  • I have spent my last few weekends building an ASP.NET web application that sends multiple queries to

  • @rogerj

    BingMatrix simply reports the results provided by Bing with the required set of parameters. I have run into similar issues when using broader searches with a high result count by Bing. This seems to happen especially when the search term is a small word or when a site includes the specific search term very frequently. In your case, this seems to happen because your blog site includes those terms in every page multiple times (they are part of the heading and the labels that show in every page). More focused searches that do not hit every page in the site seem to work better.

    Thanks for experimenting with BingMatrix.

    Jose