July 7th 2008

Comparing weblog text to the PhD dissertation via tagclouds

About a year ago I looked for Tools to find similarity between two texts (weblog and papers) - I wanted to find a relatively objective way to judge how much of my weblog writing ends up in the dissertation.

Among other things I experimented with generating and comparing tagclouds from texts that were supposed to correspond to each other. I tried several tools, but ended up with TagCrowd since it allowed using generic and custom-made lists of stop words.

As an experiment I used the text of five dissertation chapters (draft versions as of April 17, 2008) and the text of blog posts coded as corresponding to those chapters to generate a visualisation of the most frequent words in each case. After removing stop words (general English plus those from my own list, which I was stupid enough not to save), the 65 most frequent words are visualised.
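For readers who want to try something similar without TagCrowd, the counting step can be roughly sketched in Python. This is only an illustration, not what TagCrowd does internally: the stop word list is a tiny hypothetical stand-in (the lists actually used were not saved), and the `overlap` measure is an extra idea added here, not part of the experiment, which relied on people matching the pictures.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list; the real lists were not saved.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
              "that", "for", "on", "with", "as", "i", "my"}


def top_words(text, stop_words=STOP_WORDS, limit=65):
    """Return the `limit` most frequent words in `text`, ignoring stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(limit)


def overlap(text_a, text_b, limit=65):
    """Share of top words that two texts have in common (Jaccard index).

    An assumption added for illustration; the original experiment used
    human matching of tagcloud pictures instead.
    """
    a = {w for w, _ in top_words(text_a, limit=limit)}
    b = {w for w, _ in top_words(text_b, limit=limit)}
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Calling `top_words(chapter_text)` and `top_words(blog_text)` gives the word lists a tagcloud would be drawn from; `overlap` is one crude way to put a number on how similar they are.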

For example, the two tagclouds below are those from the blogposts related to the Microsoft study and the draft chapter with the results of it.
[Image: Tagcrowd: blogposts related to chapter 6 (Microsoft)]
[Image: Tagcrowd: current draft chapter 6 (Microsoft)]

In total I had 5 pairs of visualisations. I then mixed them and asked five people familiar with my research (supervisors and collaborators) and eight students (taking a class with Anjo) to find matching pairs. The results are below.

                                            Total pairs   Correctly matched pairs   Correctly matched pairs, %
Chapter 1. Introduction                     13            10                        77%
Chapter 2. Methodology                      13            11                        85%
Chapter 3. Ideas                            13            6                         46%
Chapter 4. Conversations                    13            10                        77%
Chapter 5. Microsoft                        13            9                         69%
Total                                       65            46                        71%
  by people familiar with the research      25            20                        80%
  by people not familiar with the research  40            26                        65%
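The percentages in the table can be rechecked with a few lines of Python. The counts are copied from the table; the variable and function names are just for illustration.

```python
# (total pairs, correctly matched pairs), copied from the table above
results = {
    "Introduction": (13, 10),
    "Methodology": (13, 11),
    "Ideas": (13, 6),
    "Conversations": (13, 10),
    "Microsoft": (13, 9),
}


def percent(correct, total):
    """Share of correctly matched pairs, rounded to a whole percent."""
    return round(100 * correct / total)


per_chapter = {name: percent(correct, total)
               for name, (total, correct) in results.items()}

total_pairs = sum(total for total, _ in results.values())
total_correct = sum(correct for _, correct in results.values())
overall = percent(total_correct, total_pairs)

# Five people familiar with the research and eight students, five pairs each
familiar = percent(20, 5 * 5)
not_familiar = percent(26, 8 * 5)
```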

Some comments:

  • I guess there is a connection between PhD chapters and blogposts :)
  • The high score for the methodology chapter is explained by its qualitative difference from the rest of the dissertation.
  • The low score for the Ideas chapter is explained by the fact that the coding of weblog entries in relation to chapters was done prior to writing it. As a result it included many “might be relevant” posts, while for other chapters the focus was clearer. In addition, the draft version of the chapter used to generate the visualisation was the first draft, while in other cases those were revised several times.

[Image: Tagcrowds: current state of the dissertation]

It was nice to see that although many of the visualisations looked similar (with blogging and weblog being big ;) it was actually possible to match the pairs. But the nicest thing was simply making all those pictures, laying them on the floor and thinking that I actually had some version of 5 chapters out of the 7 :)


July 2nd 2008

If most of the things I want to say in my PhD are already in my weblog, what’s the added value of the dissertation?

While working on the study of my personal blogging practices I went through my weblog archives. 1460 posts, more than half a million words (it was hard to believe when I saw the stats).

Reading old posts is an interesting experience, especially at a “convergence moment” when lots of old ideas find their place in the dissertation. At one point I was pretty frustrated, wondering on Twitter “if most of the things I want to say in my PhD are already in my weblog, what’s the added value of the dissertation?”

Well, writing a dissertation has an added value. This post is about it.

While a weblog provides a space to grow ideas, it’s also a mess of fragments. They are connected through links and tags, but in many cases the higher-level reasons why certain bits appear and how they are relevant to a bigger whole remain unarticulated, mainly because at the moment of writing it’s not clear how the fragments connect. Also, in many cases, the whole story is too long for a weblog post.

Connecting those fragments takes time, which is difficult to find between work and writing about new and fresh ideas. Usually I know vaguely about the connections; regular readers of the weblog probably have an idea too. For others, it’s just a bunch of interesting bits buried in the pile of half a million words.

It also takes extra work (e.g. a systematic data collection and analysis) to connect fragments in a story that provides stronger evidence than a collection of anecdotes.

Working on a dissertation provides a structure to address those issues: the need to connect fragments, push and discipline to collect evidence, time to work on converting all that into a bigger whole and a space to do it.

At this moment I smile reading my old post about not wanting to write a book - I’m pretty happy to have my dissertation as a legitimate excuse to turn “small pieces loosely joined” into a whole that does not easily fall apart. While reading weblog posts is still easier, I hope that reading the dissertation is more efficient for those interested in a bigger picture behind the fragments.

I still have my concerns about the long time it takes to write a book and lack of interactivity in the traditional process of doing so, but this is another story.

[Some related thoughts were also in a post by Jill about the added value of writing a book on things well covered in the weblog, but I couldn't find it again.]


January 19th 2008

Combining PhD writing and caring for a sick baby OR New take on flexible working hours

The moments when Alexander is sick are probably the most difficult in trying to combine motherhood and working on my PhD. The sleepless nights, when he wakes up every hour and needs something from me, are not only tough by themselves, but they also make writing the day after close to impossible, because my brain refuses to function.

Well, it seems that I might have discovered a solution: instead of struggling to write the day after a night like that, I might as well write at night (taking breaks to help Alexander when he wakes up) and recover during the day, when there are usually more people who can help.

Don’t know if it’s sustainable, but at least this night it works :)


November 29th 2007

Why does storytelling work?

While working on my methodology chapter I realised that my interest in using alternative writing styles (e.g. autoethnography) in reporting research is also supported by knowing, from my KM work, that storytelling is an effective way to share knowledge. Now the problem is that I was never seriously into storytelling research, so I don’t have any research-based arguments for that. Any pointers are very welcome!

From what I can recall it was something about the power of contextual cues in the story that trigger all kinds of connections in our brain.

Some randomly related things that I thought about:

See also: a collection on how storytelling communicates complex ideas by Steve Denning

Archived version of this entry is available at http://blog.mathemagenic.com/2007/11/29.html#a1958; comments are here.


September 23rd 2007

Fever and methodology

Those are the two reasons I haven’t been blogging much…

Alexander was ill for the first time (not counting teething and a runny nose). I knew that it would be scary, but it’s even scarier when you are in the middle of it. REALLY sleepless nights, a crying baby and us, worried about everything and not knowing how to help. Fortunately it’s over…

I’m working on the methodology chapter for my PhD, which is unforgiving. At times I do feel embarrassed about how much time and effort it takes before a reasonably good text is constructed. Unfortunately it’s not finished yet…

Archived version of this entry is available at http://blog.mathemagenic.com/2007/09/23.html#a1943; comments are here.


June 25th 2007

Affectionate writing reduces cholesterol

Came across today at Torill’s blog:

From the journal Human Communication Research, vol 33, number 2, April 2007, ‘Affectionate Writing Reduces Total Cholesterol: Two Randomized Controlled Trials’ by Kory Floyd, Alan C. Mikkelson, Colin Hesse and Perry M. Pauley.

This is also a good reason to write on research topics you care about :)

Archived version of this entry is available at http://blog.mathemagenic.com/2007/06/25.html#a1917; comments are here.


June 23rd 2007

On things that hide behind typical formats of reporting research

Another quote:

Agger (1989) has informed us that the typical article format in sociology is used to claim scientific validity. Techniques such as the citation of authority and the display of methodology convince the reader that they are partaking of an undistorted view of reality. [...] Merton (1968, 4) complained that sociologists do not inquire into “the ways in which scientists actually think, feel, and go about their work,” and as a result there is little public discourse concerning how social science is actually done. Moreover, Merton (1968, 4) believes that textbooks on research methods exacerbate the problem by teaching:

how scientists OUGHT [emphasis his] to think, feel, and act, but these tidy normative patterns, as everyone who has engaged in inquiry knows, do not reproduce the typically untidy, opportunistic adaptations that scientists make in the course of their inquiries.

He describes immaculate, bland, and typically impersonal sociological presentations that lack any accounting on the intuitive leaps, false starts, mistakes, loose ends, and happy accidents that comprise the investigative experiences. I further suggest that these presentations disguise the eminently social character of the production of knowledge, scientific or otherwise. By attempting to organize articles neatly into literature reviews, methods, findings, conclusions and so forth, all thinking is forced into a mold yielding an account of the research process that ignores, indeed counts as irrelevant, issues such as who the researcher is and what his or her motives are for researching the topic of interest. [pp.420-421]

Ronai, C. (1995) ‘Multiple Reflections of Child Sex Abuse: An Argument for a Layered Account’, Journal of Contemporary Ethnography, 23: 395-426.

Given that the quote on lack of inquiry into “the ways in which scientists actually think, feel, and go about their work” is from 1968, I guess I should check if there is any research on those things.

Also: I never really realised how the format of reporting research is indeed used to claim validity. Now I realise that in her discussion of quality criteria Ulrike Schultze brings in the format of writing explicitly as evidence of plausibility [check when at work!]. I never questioned it…

However, if you look into that, it looks suspicious - the difference in reporting style doesn’t really change what you did in your investigation. Or does it?

If it does not, then using the “right” format to claim quality is pretty much hiding behind the words.

If it does, then writing itself is an added-value activity, rather than “just” reporting. And then we are back to writing as a method.

Something to think about…

Archived version of this entry is available at http://blog.mathemagenic.com/2007/06/23.html#a1915; comments are here.


June 21st 2007

Time flies: 5 years, 5 months

Today it is five years since I started blogging. Time flies. Writing to a weblog gives me extra evidence of it - time becomes more tangible when you see it as a timestamp on a story that feels so recent. But having it there, written, also gives it depth - showing that the path between then and now has been long.

[Image: Alexander]

Today Alexander is 5 months old. Time flies, faster than I’d like it to. Sure, we wait for every new development (when will he start sitting by himself? crawling? talking?), but every time I hold him in my arms I want to make time run slower. Maybe I should write more - to stretch those moments, at least on the screen…

Archived version of this entry is available at http://blog.mathemagenic.com/2007/06/21.html#a1914; comments are here.


June 20th 2007

You either live, or write

One more on writing, from Gabriela (emphasis added, I just loved this nesting):

A lot of people have blogged about reboot - I gave up the idea because I wanted to focus on what was going on. A Romanian writer said once: “you either live, or write”, which might seem a bit odd to a blogger. We’re living while we’re writing - or is it vice versa? writing while we’re living? Anyhow, this time there were so many better bloggers around, that I felt like letting go!

Also: Real-time conference blogging: reporting vs. reflecting

Archived version of this entry is available at http://blog.mathemagenic.com/2007/06/20.html#a1913; comments are here.


June 20th 2007

Writing as a method of data analysis

Pretty much on what I tried to say in Mangrove effect: the value of making things explicit - but narrowed down to writing as a method of data analysis:

I use writing as a method of data analysis by using writing to think; that is, I wrote my way into particular spaces I could not have occupied by sorting data with a computer program or by analytic induction. This was rhizomatic work (Deleuze&Guattari, 1980/1987) in which I made accidental and fortuitous connections I could not foresee or control. My point here is that I did not limit data analysis to conventional practices of coding data and then sorting it into categories that I then grouped into themes that became section headings in an outline that organized and governed my writing in advance of writing. Thought happened in the writing. As I wrote, I watched word after word appear on the computer screen - ideas, theories, I had not thought before I wrote them. [p.970]

And another one, just because it takes to the extreme some of my feelings (=I’m more moderate about audit trails and data saturation :)

And it is thinking of writing in this way that breaks down the distinction in conventional qualitative inquiry between data collection and data analysis - one more assault to the structure. Both happen at once. [...] Data collection and data analysis cannot be separated when writing is a method of inquiry. And positivist concepts, such as audit trails and data saturation, become absurd and then irrelevant in postmodern qualitative inquiry in which writing is a field of play where anything can happen - and it does. [p.971]

Both quotes are from Richardson, L. & St. Pierre, E. A. (2005). Writing: A method of inquiry. In N. K. Denzin & Y. S. Lincoln (Eds.), The SAGE handbook of qualitative research (3rd ed., pp. 959-978). SAGE Publications.

Wikipedia entry on rhizome in philosophy: I don’t understand much, but the fact that Carl Jung used the word “to emphasize the invisible and underground nature of life” is intriguing.

Archived version of this entry is available at http://blog.mathemagenic.com/2007/06/20.html#a1912; comments are here.



  • Welcome!

    I have not been blogging for a while. Between working on the chapters of my PhD dissertation and being a happy mom there wasn't much time to fix blog bugs. Finally I managed: this is a brand new WordPress blog; old Radio archives live next to it [quotes in imported posts are broken, I'm slowly fixing that]. It will take a while to make it nice and beautiful, but at least now I have a space to write.