Tools to find similarity between two texts (weblog and papers)

I’m playing with the idea of comparing (parts of) my weblog with some of my published papers (and with the dissertation as a whole once it’s done). So far I’m interested in two things:

  • how much of the text is reused
  • how conceptually close two texts (weblog and a paper) are

Thought of a couple of ways to do so:

  • One way would be to use the various weblog analysis tools from Anjo. One difficulty there is figuring out how to find similarities between weblog text, which consists of relatively self-contained microcontent pieces, and linear academic papers, where each part builds upon what was said previously.
  • Another option would be to use plagiarism detection tools. I only wonder whether those can be configured to compare a target paper with a specific weblog, rather than with “everything published”.
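
To make the two questions a bit more concrete, here is a rough sketch in Python of what a home-made comparison could look like. It is not how Anjo’s tools or any particular plagiarism checker work; all function names and the n-gram size of 5 are my own assumptions for illustration. Text reuse is approximated by the share of the paper’s word 5-grams that also appear in the weblog, and conceptual closeness by the cosine similarity of plain word counts, which is probably too crude a proxy for “concepts” but gives a starting number.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(text, n=5):
    """Set of overlapping word n-grams ("shingles") in the text."""
    words = tokens(text)
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def reuse_ratio(paper, weblog, n=5):
    """Share of the paper's word n-grams that also occur in the weblog.

    Assumption: an n-gram shared by both texts counts as reused text.
    """
    paper_grams = shingles(paper, n)
    if not paper_grams:
        return 0.0
    return len(paper_grams & shingles(weblog, n)) / len(paper_grams)


def cosine_similarity(a, b):
    """Cosine of the angle between the word-count vectors of two texts.

    Assumption: shared vocabulary stands in for conceptual closeness.
    """
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With the paper and the concatenated weblog posts loaded as strings, `reuse_ratio(paper, weblog)` and `cosine_similarity(paper, weblog)` each return a number between 0 and 1. Because the comparison is restricted to these two texts, it sidesteps the “everything published” problem, though it does nothing about the microcontent versus linear paper difference.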

Any ideas?

