Topics and terms (categorisations and text analysis) for weblog conversations
Anjo, in What is a topic?
The most mysterious term that I encountered a lot recently is topic. I have no idea how to define it and, neither seem the weblog research proposals that suggest finding the topic of a post is something worth doing. Being on holiday currently, and given it was raining and snowing outside, I tried to apply the notion of "topic finding" to weblog conversations (see also: here, and here).
Anjo goes on, providing an example of "unique" terms extracted from three weblog conversations (more details in the post). Although those terms give a good picture of what the conversations are about, they do not really answer the question of what the topic of each of them is.
Which makes me think of my own experiences around the issue...
One of the things we planned to do this year, but didn't get to do, was looking at personal categorisations. To be more specific, the idea was to compare the categories (~tags, ~topics) that a blogger assigns to her posts with the results of text analysis of those posts, to see if there is any correlation between the language used and the conceptual categories. [I still think it's an experiment worth doing, but I'm not sure I personally can devote serious time to it. Anyone interested?]
Thinking of my own weblog, I can imagine that for some of the topics (I call them topics ;) I use, the correlation should be present (e.g. posts related to an event are likely to be labelled with its name and to mention it in the text).
However, there are others, those where I assign a topic to organise my ideas on ill-structured themes (=I feel that those posts belong together, but I don't know why yet, or I don't have a good label for it). Examples of the second type are posts on life, knowledge mapping or transparency.
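A very rough sketch of how that comparison could start, in Python. It only checks how often the label of a category literally appears in the text of the posts filed under it, which is a much cruder measure than real text analysis. The function name, the data layout and the sample posts are all made up for illustration.

```python
from collections import defaultdict

def label_mention_rate(posts):
    """For each category, return the share of its posts whose text mentions the label.

    `posts` is a list of (text, categories) pairs. Both this layout and the
    case-insensitive substring match are assumptions made for the sketch.
    """
    tagged = defaultdict(int)     # number of posts carrying the category
    mentioned = defaultdict(int)  # of those, posts whose text contains the label
    for text, categories in posts:
        lowered = text.lower()
        for category in categories:
            tagged[category] += 1
            if category.lower() in lowered:
                mentioned[category] += 1
    return {category: mentioned[category] / tagged[category] for category in tagged}

# Invented sample posts: an event-like category and an ill-structured one.
posts = [
    ("Notes from the workshop in Vienna", ["workshop"]),
    ("Packing for the workshop next week", ["workshop"]),
    ("Why do I keep a public notebook at all?", ["transparency"]),
    ("On sharing drafts early", ["transparency"]),
]
rates = label_mention_rate(posts)
```

If the guess above is right, event-like categories would score close to 1 and the "belongs together, don't know why" categories close to 0.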
Which brings me to the reason I started to write this post. I think that topics are conceptual categories used to characterise a group of connected pieces (conversations with others, conversations with self, or something in between) and to give it a nametag. The common name makes sense: it makes it easier to remember that those pieces belong together, to retrieve them, and to communicate about them.
The problem is that conceptual categories are subjective. They depend on a person, a group or even groupthink (as with the pressure to use certain tags in order to appear in the right places in Technorati, and not because they make more sense than others). So I suspect that once we define the topic of a conversation, there will be someone who says that it's about something else (referring to Anjo's examples - it could be "not about Skype, but about presence").
That said, I still think that defining the topic of a conversation makes sense. Personally, I'd prefer to have a Sigmund picture (~frequent terms and relations between them) for a conversation, as some kind of ontological fingerprint of what the conversation is about. Alternatively, there are a number of ways to select one of the terms from the "unique term list" for a conversation:
- by further selecting the "least unique" from the subset (i.e. terms used by the highest number of participants of the conversation)
- by selecting terms that match categories some of the participants assign to their posts
- by selecting terms that match a predefined ontology/folksonomy/keyword list
- by selecting terms most of the participants are likely to agree on (don't ask me how to do that :)
- by selecting terms most closely resembling those of an external "customer" for the analysis or those that a non-participant is likely to understand
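The first two options in the list are simple enough to sketch in Python. This is only an illustration under my own assumptions: the function names, the whitespace tokenisation and the sample conversation are invented, and it says nothing about how the unique term list itself is produced.

```python
from collections import Counter

def terms_by_participant_count(unique_terms, posts_by_participant):
    """Rank unique terms by how many participants used them ("least unique" first).

    `posts_by_participant` maps a participant name to a list of post texts.
    Words are split on whitespace only, which is a deliberate simplification.
    """
    counts = Counter()
    for texts in posts_by_participant.values():
        words = set(" ".join(texts).lower().split())
        for term in unique_terms:
            if term.lower() in words:
                counts[term] += 1
    # Most widely shared terms first; ties are broken alphabetically.
    return sorted(unique_terms, key=lambda term: (-counts[term], term))

def terms_matching_categories(unique_terms, categories):
    """Keep the unique terms that also appear among participants' categories."""
    wanted = {category.lower() for category in categories}
    return [term for term in unique_terms if term.lower() in wanted]

# Invented conversation, loosely following the Skype/presence example.
unique_terms = ["skype", "presence", "voip"]
conversation = {
    "a": ["skype shows the presence of my contacts"],
    "b": ["presence matters more than skype itself"],
    "c": ["voip is cheap but presence is the point"],
}
ranked = terms_by_participant_count(unique_terms, conversation)
matched = terms_matching_categories(unique_terms, ["Presence", "weblogs"])
```

In this made-up case both routes point at "presence" rather than "skype", but there is no reason to expect them to agree in general.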
Or we just have to find a way of matching personal categorisations. Given where the tools are going, this shouldn't be that far off...