August 28th 2008

Blogging PhD ideas chapter: missing piece of the discussion section

In case you are reviewing the chapter on blogging PhD ideas - below is the part missing in the discussion section of the draft (as a bonus you can see how the post from yesterday turned into something more academic :)

***

While a study of a single blogger is not representative of all knowledge workers who blog, the findings presented in this chapter correspond to personal accounts of other bloggers discussing uses of their weblogs for organising their own thinking (Doctorow, 2002; Halavais, 2006; Mortensen & Walker, 2002; Pollard, 2003), publications discussing how weblogs could be used that way (Edmonds, Blustein, & Turnbull, 2004; Paquet, 2002; Peña-López, 2007; Todoroki, Konishi, & Inoue, 2006) or how contextual factors shape blogging in an organisational environment (Walker, 2006). Studies of work-related blogging suggest that weblogs serve as a ‘trigger to elicit passion for knowledge’ (Kaiser, Müller-Seitz, Lopes, & Pina e Cunha, 2007) and are used as a reference archive to support working on a document (Carter, 2005) by knowledge workers in other settings; however, they do not provide an in-depth view of the activities behind those uses.

The literature on personal information management allows comparing the findings to existing research at a more granular level. The synergies between using a weblog to collect and organise ideas and the uses of those ideas in supporting specific tasks are similar to those described by Erickson (1996) in the case of a personal electronic notebook. The possibility of feedback in the case of a weblog provides an additional motivation to contribute; however, writing in public also imposes limitations on what can be written that do not exist in the case of a personal tool.

Although at first sight using a weblog as an online knowledge base calls for comparison with digital collections created by other tools, I find more parallels with the studies that look at information represented by the paper artefacts on desks and in personal archives (Bondarenko & Janssen, 2005; Kaye et al., 2006; Kidd, 1994; Whittaker & Hirschberg, 2001).

For example, the type of information included in my weblog and the role it plays in developing ideas echo the discussion of the role of paper on desks in supporting knowledge work in the study by Kidd (1994). According to this study, the spatial layout of papers in the office serves as a holding pattern for the ideas that knowledge workers “cannot yet categorise or even decide how they might use”, as a primitive language that reflects models of the world still being constructed, as contextual cues to recover the state of their thought after an interruption and as demonstrable output of the progress (Kidd, 1994, pp. 187-188).

Not being tied to specific tasks or bounded by the expectations and format of a bigger document, my weblog allows including dormant information and capturing ideas under construction. Flexible categorisation provides a way to replicate the spatial arrangement of documents on a desk: chronological archives, tags and links allow “piling” entries together and indicating relations between parts of emergent mental structures. While contextual cues around a weblog post do not support returning to an interrupted task in the way the layout of papers on a desk does, they play a similar role, helping to recover the state of mind at the moment of writing the post, which is useful when returning to an idea that has been “parked” for a while.

Finally, the public nature of a weblog gives others an idea of the work in progress, similar to the papers on one’s office desk. In that respect, a weblog bears more similarity to one’s office room than to one’s digital spaces: as a personal space that others can visit as guests, a weblog serves the social functions of sharing resources, building a legacy and impression management, similar to paper archives (Kaye et al., 2006).

While existing publications and feedback on this study from other bloggers suggest that more bloggers use their weblogs to organise and develop their thinking, more research is needed to explore frequencies of those uses and the conditions stimulating them. In that respect, the view of blogging as an experience of flow states (Kaiser et al., 2007) provides an intriguing starting point.

A particularly interesting research direction would be exploring connections between a task at hand and specific blogging episodes: how often and in what cases blogging is used to “park ideas” and when it directly contributes to one’s work on the task. Since those connections are too infrequent for observation and difficult to reconstruct from memory or from the content of a weblog post, the best results are likely to be acquired in a diary study (for example, by inviting a blogger to fill in a post-specific questionnaire immediately after publishing a post, as in Carter, 2005).

The connection between the functionalities of weblog technologies and their uses for personal information management needs further examination. The similarity between the roles of a weblog in supporting my work and those of paper collections in other studies indicates a need to explore the affordances of weblog technologies from a PIM perspective and the possibilities of learning from blogging when designing other tools. Finally, the potential for learning from information accumulated in one’s weblog calls for the development of tools, aimed at bloggers themselves, that allow exploring patterns in those traces (supporting what Pousman, Stasko, & Mateas, 2007, call casual information visualisation).

References

November 14th 2007

Getting more by reading less blogs: some thoughts on ‘Cost-Effective Outbreak Detection in Networks’

Matthew Hurst on the most important blogs for efficient readers:

A group of researchers at CMU have been considering a notion of blog importance based on how likely a set of blogs is to ensure that you will be informed of topics bursting in the blogosphere. By analogy, they consider a graph of water pipelines. Their paper, Cost-Effective Outbreak Detection in Networks (Leskovec, Krause, Guestrin, Faloutsos, VanBriesen, Glance), poses the problem:

Given a water distribution network, where should we place sensors to quickly detect contaminants? Or, which blogs should we read to avoid missing important stories? These seemingly different problems share common structure: Outbreak detection can be modeled as selecting nodes (sensor locations, blogs) in a network, in order to detect the spreading of a virus or information as quickly as possible.

As a result of this work, the authors have published some blog lists which answer a fundamentally important question in terms of weblog reading habits: Which weblogs should I read to be most up to date? The lists answering this question - generated by the approach described in their paper - come in a number of varieties to be found on the project’s page.

I scanned through the extended version of the paper (skipping most of the math :) and this is something I would love to see applied to niche blogging networks. For example, starting from a subset of weblogs that mention topic X or, better, those that participate in a discussion (cascade) that mentions topic X.

A few points relevant from the practical perspective - having a tool that helps a blogreader to make a selection of blogs to read (my expectations in that respect are pretty high given that Natalie Glance is working for Google now :)

1. “Costs” of reading. The authors played with optimising the number of blogs and the number of posts one reads. Assuming that reading fewer blog posts is more cost-effective, the algorithm shows that “the popular blogs might not be the most effective way to catch relevant information cascades” (p.23). Instead, it makes more sense to read “good summarizer blogs that may not be very popular, but which, by using few posts, catch most of the important stories propagating over the blogosphere” (p.15).

2. Predicting the future. From a reader's perspective one would like to have a recommendation of blogs that will cover the most interesting stories in the future. From what I understood, the algorithm does not work that well for making those predictions. The authors optimised the performance by including only big blogs (= at least one post per day), but I wonder if there are other alternatives.
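
To make point 1 concrete, here is a minimal sketch of cost-aware selection. It is a plain "most new stories per unit of reading cost" greedy loop, not the algorithm from the paper; the function name, the blog names, the story ids and the costs are all invented for illustration, and cost is assumed to be a positive number of posts one has to read.

```python
from typing import Dict, List, Set


def greedy_blog_selection(
    stories_by_blog: Dict[str, Set[str]],
    cost: Dict[str, float],
    budget: float,
) -> List[str]:
    """Pick blogs one by one, each time taking the affordable blog that
    adds the most not-yet-covered stories per unit of reading cost."""
    chosen: List[str] = []
    covered: Set[str] = set()
    spent = 0.0
    remaining = set(stories_by_blog)
    while remaining:
        best, best_ratio = None, 0.0
        for blog in sorted(remaining):  # sorted for deterministic ties
            if spent + cost[blog] > budget:
                continue  # reading this blog would exceed the budget
            gain = len(stories_by_blog[blog] - covered)
            ratio = gain / cost[blog]
            if ratio > best_ratio:
                best, best_ratio = blog, ratio
        if best is None:  # nothing affordable adds new stories
            break
        chosen.append(best)
        covered |= stories_by_blog[best]
        spent += cost[best]
        remaining.remove(best)
    return chosen


# Hypothetical toy data: a popular blog with many posts, a "summarizer"
# blog that catches more stories in few posts, and a small niche blog.
toy_stories = {
    "popular": {"s1", "s2", "s3"},
    "summarizer": {"s1", "s2", "s3", "s4"},
    "niche": {"s5"},
}
toy_cost = {"popular": 30, "summarizer": 5, "niche": 2}  # posts to read

selected = greedy_blog_selection(toy_stories, toy_cost, budget=10)
print(selected)  # ['summarizer', 'niche']
```

With these made-up numbers the popular blog is never picked: it costs more posts than the whole budget and adds no story the summarizer misses.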

Anyway, I guess I should go back to my PhD writing and wait patiently till people who read the paper without skipping the math do something with it. So far I’m happy that the paper promises lots of interesting developments and that it also makes me feel less guilty about our alternative approach to vaccination by suggesting that “uniform immunization strategy corresponds to randomly placing sensors in a water network” (p.22), which is not optimal :)))

Archived version of this entry is available at http://blog.mathemagenic.com/2007/11/14.html#a1953; comments are here.

April 11th 2006

Feed your blog to tOKo and see what comes out

Anjo is moving further in developing a blog-friendly version of tOKo (related to all our earlier work on weblog communities, conversations and topics):

A little bit of progress on the open source version of tOKo (and the like), and in particular making it suitable for bloggers.

The first problem is turning a (your?) blog into a corpus. tOKo is pretty flexible as to what a corpus looks like, but the process must be automated. Jack Vinson and Ton Zijlstra provided great help by converting their blogs to a Movable Type export file and making the result available. Therefore, tOKo now contains a “Create corpus from Movable Type” function. The nice thing is that several blogging platforms provide Movable Type (MT) export. For example, in TypePad (which I use) a MT file can be generated from the web interface. Moreover, an MT file contains all information, including comments and trackbacks.

I’m getting into research fun anticipation - getting hold of comments next to post text would be such a great thing for the analysis :)
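
A rough sketch of what turning such an export file into a corpus could look like. This is not tOKo code. It assumes the Movable Type export layout in which each entry ends with a line of eight hyphens, each section inside an entry ends with a line of five hyphens, the first section holds `KEY: value` metadata, and later sections start with a marker line such as `BODY:` or `COMMENT:`. The function names and the sample text are hypothetical.

```python
from typing import Dict, List

ENTRY_SEP = "--------"   # assumed end-of-entry marker
SECTION_SEP = "-----"    # assumed end-of-section marker

SAMPLE = """
TITLE: First post
AUTHOR: someone
DATE: 04/11/2006 10:00:00 AM
-----
BODY:
Hello corpus.
-----
COMMENT:
AUTHOR: reader
Nice post.
-----
--------
TITLE: Second post
-----
BODY:
More text.
-----
--------
"""


def _parse_entry(lines: List[str]) -> Dict:
    """Split one entry into metadata, body text and raw comment blocks."""
    sections, current = [], []
    for line in lines:
        if line.strip() == SECTION_SEP:
            sections.append(current)
            current = []
        else:
            current.append(line)
    sections.append(current)

    entry = {"meta": {}, "body": "", "comments": []}
    for index, section in enumerate(sections):
        while section and not section[0].strip():
            section = section[1:]  # drop leading blank lines
        if not section:
            continue
        head = section[0].strip()
        if head == "BODY:":
            entry["body"] = "\n".join(section[1:]).strip()
        elif head == "COMMENT:":
            # Kept raw: comment headers and comment text together.
            entry["comments"].append("\n".join(section[1:]).strip())
        elif index == 0:
            for line in section:
                key, sep, value = line.partition(":")
                if sep:
                    entry["meta"][key.strip()] = value.strip()
    return entry


def parse_mt_export(text: str) -> List[Dict]:
    """Return one dict per entry found in the export text."""
    entries, entry_lines = [], []
    for line in text.splitlines():
        if line.strip() == ENTRY_SEP:
            if any(l.strip() for l in entry_lines):
                entries.append(_parse_entry(entry_lines))
            entry_lines = []
        else:
            entry_lines.append(line)
    if any(l.strip() for l in entry_lines):
        entries.append(_parse_entry(entry_lines))
    return entries


corpus = parse_mt_export(SAMPLE)
print(len(corpus), corpus[0]["meta"]["TITLE"], len(corpus[0]["comments"]))
```

Keeping comments attached to their post is the point here: it is what makes the comment text available for analysis next to the post text.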

And if you want to help develop the tool, you can contribute your blog archives in Movable Type format (WPexport could be handy for WordPress users). This especially makes sense if you feel you belong to the KM bloggers community (paper) - or, as Anjo puts it:

If you have linked to Jack, Ton, Lilia or myself in the past, this would be particularly interesting (also if you can only export to Movable Type). The only disadvantage of making your weblog available is that I might ask you to alpha-test tOKo :-).

My email address is: anjo science uva nl (one at, two dots).

You can get a bit more insight into this work from Ton’s impressions on the work in progress and Anjo’s visualisations (1, 2, 3, 4).

Archived version of this entry is available at http://blog.mathemagenic.com/2006/04/11.html#a1761; comments are here.

September 29th 2005

KM bloggers community

[Image: KM community]

Usually Stephanie is the first to blog pictures like this one from our work on weblog communities, but this time I couldn’t resist :)

  • Light green is me
  • Blue - KM blogs
  • Red - educational blogs
  • Orange - internet research blog
  • Green - A-list

    All very subjective :)

    [Morning update] A bit more background: The data comes from 64 weblogs, spidered to extract full-text posts from 2004. This is a semi-snowball sample; all 64 are 1-2 degrees from my weblog. The posts of all 64 were processed to extract links.

    For this visualisation we used the number of posts from weblog A linking to weblog B in 2004 as a tie indicator (assuming that more posts linking to someone mean a stronger connection). It includes the 64 spidered weblogs plus weblogs that are linked by one (or more) of those 64 in at least 3 posts.
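
The tie indicator described above can be sketched roughly as follows. This is an illustration, not the code used for the picture: the function name and toy data are made up, self-links are ignored by assumption, and "linked in at least 3 posts" is read here as at least 3 posts from a single spidered weblog, which is only one possible reading.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def build_ties(
    posts: Iterable[Tuple[str, Set[str]]],
    spidered: Set[str],
    min_posts: int = 3,
) -> Tuple[Set[str], Dict[Tuple[str, str], int]]:
    """posts holds one (source weblog, linked weblogs) pair per post.
    Returns the weblogs to draw and the tie strength per (A, B) pair,
    where strength is the number of posts from A that link to B."""
    spidered = set(spidered)
    counts: Counter = Counter()
    for source, targets in posts:
        for target in set(targets):  # one post counts once per target
            if target != source:     # assumption: ignore self-links
                counts[(source, target)] += 1

    nodes = set(spidered)
    for (source, target), n in counts.items():
        # Outside weblogs are added only when linked often enough.
        if source in spidered and n >= min_posts:
            nodes.add(target)

    ties = {
        pair: n
        for pair, n in counts.items()
        if pair[0] in nodes and pair[1] in nodes
    }
    return nodes, ties


# Hypothetical toy data: weblogs "a" and "b" were spidered.
toy_posts = [
    ("a", {"b", "x"}),
    ("a", {"x"}),
    ("a", {"x"}),
    ("b", {"a", "y"}),
]
toy_nodes, toy_ties = build_ties(toy_posts, spidered={"a", "b"})
print(sorted(toy_nodes), toy_ties)
```

In the toy data weblog "x" makes it into the picture (three posts from "a" link to it), while "y" does not (a single post from "b").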

    [Joint work with Stephanie and Anjo (abstract, paper)].

    Archived version of this entry is available at http://blog.mathemagenic.com/2005/09/29.html#a1680; comments are here.

    August 30th 2005

    Experimenting with creating an ontology based on weblog content

    Anjo documents the experiment of creating a cooking ontology by running smart tools through the content of Chocolate and Zucchini:

    Curious to know what Sigmund would say :)

    Archived version of this entry is available at http://blog.mathemagenic.com/2005/08/30.html#a1650; comments are here.

    August 22nd 2005

    Link love: lists, clouds and action points

    I was thinking of commenting on the unfolding discussion on link love since BlogHer, but couldn’t find time to write it up properly (which for me required going through the fast-growing number of posts). Don’t think I’ll do it properly now, but given our work was referenced a couple of times I feel responsible enough to do it…

    I’m in the Feedster top 500 (as some friends nicely point out). So what?

    • I don’t have people knocking on my door asking me to speak at conferences or wanting to place ads in my weblog - being in the list doesn’t mean that you are in the inner circle (I suspect that A-list is not something defined by whatever top-X list anyway).
    • I do not see any personal value in being on this list or using it to find others. The only thing it brings is egosmiling - ha, I’m on the list - and me having some fun registering the fact. If I disappear from it tomorrow I’d smile again and go on.

    These are my personal indicators that lists of popular blogs do not work.

    A few things could work. Smart combinations of blog metrics, or better visualizations of conversation clouds because I guess we are more interested in finding the cloudmakers and connecting with them…

    [Images: Lilia Efimova (Blog posts 2004); visualization of the political blog network]

    I guess there is already some understanding in the community of what is needed. Probably something like those visualizations.

    Available for you and me. For our own weblogs or topics we are interested in, not only for those that researchers choose to study. Trusted and clickable.

    From what to how

    I’m not sure that the problem is in the lack of algorithms. At least those that come from research are published. I think it’s pretty much about the teasing data.

    It’s not enough to come up with a great formula. You have to test it - to see what comes out, to try it on different data sets, to implement it as a tool, to make tools open to the public, to make sure all of this scales…

    But it starts with the data. And the data is not public.

    I cannot speak for others, but I can talk about problems we have with the data needed for our research (which addresses some of the “link love” aspects). What we need to develop algorithms and tools is pretty simple: blog content in “full-text RSS quality” via APIs…

    We tried many of the current blog indexing tools: no luck (those that are pretty close to what we need, BlogPulse, Technorati and Bloglines, either consider the data they collect commercial or do not have APIs to access it). As a result, Anjo is working on a weblog spider instead of a community discovery algorithm.

    I know other researchers working on weblog spidering instead of working on algorithms to process and visualise weblog data. I wonder how many other people out there would play with the data if it were accessible without any threshold. I believe there are many.

    I was very sad to hear last week that upflux didn’t gain much support from players in the blog indexing market. I wonder if open access to weblog data is a “nice to have, but never real” dream. And I wonder if Mary’s effort will turn it into reality…

    Btw, are there any Technorati tags for this conversation?

    Archived version of this entry is available at http://blog.mathemagenic.com/2005/08/22.html#a1641; comments are here.

    August 17th 2005

    The Robots and media contagion

    On Monday morning I didn’t know that I would spend the evening in the company of The Robots (the guys behind 43 Things, 43 Places and All Consuming) and Lee LeFever, welcoming Cameron Marlow, who happened to be in Seattle, with beers and fun.

    Among other things we had a nice presentation of Cameron’s dissertation research (photos by Daniel Spils and Erik Benson), talked about blogs and dreams and all other 43 things…

    Cameron’s dissertation (The structural determinants of media contagion) should be online soon - there is a lot of good stuff in there… A few things to remember:

    • there seems to be a correlation between the frequency of updates and the number of incoming links for a weblog
    • static and dynamic links (blog homepage/permalinks, see earlier research) are not that different (on a 1 month sample) in terms of connectivity (re: mapping blog communities) - survey data on types of relations behind linking explains some of it
    • there is no S innovation diffusion curve in the blogosphere - if something picks up it happens exponentially
    • meme traces are visible in patches - how much do ideas travel through the backchannel?
    • survey response rate and travel through the blogosphere really picked up once Cameron added funny banners to it
    • upflux didn’t gain much support and has been discontinued - bad for blog research

    Archived version of this entry is available at http://blog.mathemagenic.com/2005/08/17.html#a1633; comments are here.

    August 15th 2005

    Papers of WWW2005 workshop on the weblogging ecosystem

    Papers from WWW 2005 2nd Annual Workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics (see also papers from the workshop in 2004):

    I believe that engaging with researchers is something to be seriously considered while thinking of blog metrics - hopefully I will have more time to write about it…

    Archived version of this entry is available at http://blog.mathemagenic.com/2005/08/15.html#a1632; comments are here.

    January 29th 2005

    BlogTrace

    Anjo shares details about BlogTrace, the weblog analysis tool we are working on (as you can see from Anjo’s post my main contribution is motivating the work and then going for a vacation :)))

    There are too many specific comments I have, so at this moment just an image representing BlogTrace architecture. Read Anjo’s post for more details.

    Archived version of this entry is available at http://blog.mathemagenic.com/2005/01/29.html#a1494; comments are here.

    January 26th 2005

    Visual settlements: on weblog visualisations

    [Image: Anjo Anjewierden (Blog posts 2004)]

    While I was travelling, Anjo did a great job of working out his visual settlement idea into an implementation (and I’m also a lucky one who can actually play with the software and not only enjoy images in his weblog :)

    First, Anjo’s explanations (the image on the right is a representation of Anjo’s weblog):

    Roughly the method to draw the pictures is as follows:

    • Size of a blob is determined by the number of words in the post. Bigger blob, more words (in fact: every pixel represents one word).
    • Colour of the blob is determined by whether there are links to others (grey), links from others (green) or no links (red). All with respect to a community of KM bloggers determined by Lilia and Stephanie
    • Position of the blob is determined by the chronological order (oldest posts are in the center) and by self-linking (if a post links back to one of the blogger’s own posts, it will appear close to the original post).

    My first questions are about things Anjo didn’t clarify:

    • is there any difference between squares and circles? circles and ovals?
    • what color is the blob if the post behind it has both links to others and links from others?
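
The size and colour rules quoted above can be written down as a small sketch. This is not Anjo's implementation: the `Post` class and function names are invented, the blob is assumed to be a circle whose area in pixels equals the word count, and the case of a post with links in both directions is returned as "unspecified" because the quoted rules do not cover it.

```python
import math
from dataclasses import dataclass


@dataclass
class Post:
    word_count: int
    links_to_others: bool    # post links to someone in the community
    links_from_others: bool  # someone in the community links to the post


def blob_radius(word_count: int) -> float:
    """Radius of a circle whose area is word_count pixels
    (one pixel per word; the circular shape is an assumption)."""
    return math.sqrt(word_count / math.pi)


def blob_colour(post: Post) -> str:
    """Colour by link type, following the quoted rules."""
    if post.links_to_others and post.links_from_others:
        return "unspecified"  # not covered by the quoted rules
    if post.links_to_others:
        return "grey"
    if post.links_from_others:
        return "green"
    return "red"


example = Post(word_count=400, links_to_others=True, links_from_others=False)
print(round(blob_radius(example.word_count), 1), blob_colour(example))
```

Position (chronological order plus self-linking) is left out, since the description gives no layout rule precise enough to code.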

    [Images: Lilia Efimova (Blog posts 2004); Alex Halavais (Blog posts 2004)]

    These are two other visualisations, of my own weblog and one of Alex Halavais.

    My weblog is more colored than the one of Alex. Does it mean that Alex doesn’t link or isn’t linked back? That he is not well connected with the community? Or (which I guess is the reason) that the community was mapped as a snowball sample starting from my weblog, so my “linking partners” are there, but not those of Alex? Of course, we are working on mapping the community properly, but it would still be nice to have some workaround…

    You can also see that Alex’s blog shows more of a “rays from the center” structure than mine - I guess as a result of me heavily linking to older posts, so posts are grouped, breaking the straight lines (the ray structure is even more visible on the visualisation of Robert Scoble’s blog). But what is behind those rays starting from the center? Are posts randomly assigned to a line or is there a logic behind it?

    I’m still thinking of what else and how I’d like to see visualised. You are welcome to share your ideas.

    And, if you need more inspiration, you may want to check BlogScapes by Brian Dennis, various visualisations of five years writings by Tom Coates, web-log continuum sparklines or knowledge flow sparklines

    I’m back to my usual “bad” practice: blogging when I have to work on a paper :)

    This post also appears on channel weblog research

    Archived version of this entry is available at http://blog.mathemagenic.com/2005/01/26.html#a1488; comments are here.
