A group of researchers at CMU have been considering a notion of blog importance based on how likely a set of blogs is to ensure that you will be informed of topics bursting in the blogosphere. By analogy, they consider a graph of water pipelines. Their paper, Cost-Effective Outbreak Detection in Networks (Leskovec, Krause, Guestrin, Faloutsos, VanBriesen, Glance), poses the problem:
Given a water distribution network, where should we place sensors to quickly detect contaminants? Or, which blogs should we read to avoid missing important stories? These seemingly different problems share common structure: Outbreak detection can be modeled as selecting nodes (sensor locations, blogs) in a network, in order to detect the spreading of a virus or information as quickly as possible.
As a result of this work, the authors have published some blog lists which answer a fundamentally important question in terms of weblog reading habits: Which weblogs should I read to be most up to date? The lists answering this question – generated by the approach described in their paper – come in a number of varieties to be found on the project’s page.
I scanned through the extended version of the paper (skipping most of the math 🙂) and this is something I would love to see applied to niche blogging networks, for example starting from a subset of weblogs that mention topic X or, better, those that participate in a discussion (cascade) that mentions topic X.
A few points are relevant from a practical perspective, that is, for having a tool that helps a blog reader select blogs to read (my expectations in that respect are pretty high given that Natalie Glance is working for Google now 🙂):
1. “Costs” of reading. The authors played with optimising both the number of blogs and the number of posts one reads. Assuming that reading fewer blog posts is more cost-effective, the algorithm shows that “the popular blogs might not be the most effective way to catch relevant information cascades” (p.23). Instead, it makes more sense to read “good summarizer blogs that may not be very popular, but which, by using few posts, catch most of the important stories propagating over the blogosphere” (p.15).
2. Predicting the future. From a reader’s perspective, one would like to have a recommendation of blogs that will cover the most interesting stories in the future. From what I understood, the algorithm does not work that well for making those predictions. The authors improved the performance by including only big blogs (= at least one post per day), but I wonder if there are other alternatives.
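To make the “costs of reading” point a bit more concrete, here is a minimal sketch of a cost-aware greedy selection. It is not the authors’ algorithm: it simply picks, at each step, the blog with the best ratio of newly covered stories to posts read, and it uses plain story coverage as a stand-in for the detection-based rewards discussed in the paper. The blog names, story ids, post counts and the `greedy_select` function are all invented for illustration.

```python
# Toy data, invented for illustration: which stories (cascades) each blog
# takes part in, and how many posts a reader would have to read per blog.
cascades_by_blog = {
    "popular_blog": {"s1", "s2", "s3", "s4"},
    "summarizer": {"s1", "s2", "s3", "s5"},
    "niche_a": {"s5", "s6"},
    "niche_b": {"s4"},
}
posts_per_blog = {"popular_blog": 40, "summarizer": 5, "niche_a": 3, "niche_b": 2}


def greedy_select(cascades_by_blog, cost, budget):
    """Repeatedly pick the affordable blog with the best
    (newly covered stories / cost) ratio until nothing useful fits."""
    chosen, covered, spent = [], set(), 0
    remaining = set(cascades_by_blog)
    while remaining:
        best, best_ratio = None, 0.0
        for blog in sorted(remaining):  # sorted only to make ties deterministic
            if spent + cost[blog] > budget:
                continue  # reading this blog would exceed the post budget
            gain = len(cascades_by_blog[blog] - covered)
            ratio = gain / cost[blog]
            if ratio > best_ratio:
                best, best_ratio = blog, ratio
        if best is None:
            break  # nothing affordable adds any new story
        chosen.append(best)
        covered |= cascades_by_blog[best]
        spent += cost[best]
        remaining.remove(best)
    return chosen, covered, spent


# Assumed reading budget of 10 posts.
chosen, covered, spent = greedy_select(cascades_by_blog, posts_per_blog, budget=10)
print(chosen, sorted(covered), spent)
```

With this toy data and a budget of 10 posts, the sketch never selects the 40-post popular blog: the cheap summarizer plus two small blogs cover all six stories. That is the same flavour of result as the quotes above, though of course on made-up numbers.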
Anyway, I guess I should go back to my PhD writing and wait patiently till people who read the paper without skipping the math do something with it. So far I’m happy that the paper promises lots of interesting developments and that it also makes me feel less guilty about our alternative approach to vaccination by suggesting that “uniform immunization strategy corresponds to randomly placing sensors in a water network” (p.22), which is not optimal :)))
An archived version of this entry is available at http://blog.mathemagenic.com/2007/11/14.html#a1953.