Saturday, March 21, 2009

A (Very) Simple Method for Automatic Creation of Hierarchical Tag Clouds

OK, I don't have time for an introduction, so here is the idea. I'll explain it using photos as an example. Note that I did not do a literature search (same excuse: no time), so anything I say here might already have been published.

Say you took a bunch of photos:
  1. a bird in Brisbane in 2009
  2. a tree in Brisbane in 2009
  3. a bird in the mountains in 2008
  4. a tree in the mountains in 2008
  5. a sunset in Brisbane in 2009
  6. a sunset in the mountains in 2009
Now, you would attach the following tags to the pictures:
  1. bird, city, 2009
  2. tree, city, 2009
  3. bird, countryside, 2008
  4. tree, countryside, 2008
  5. sunset, city, 2009
  6. sunset, countryside, 2009
A simple tag cloud would look like this:
2008 2009 bird city countryside sunset tree
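
Since everything that follows depends on how often each tag is used, here is a minimal Python sketch that counts the tags of the six photos and lists them alphabetically. Python and the names `PHOTOS`, `tag_counts` and `flat_cloud` are my own choices for illustration, not part of the idea itself.

```python
from collections import Counter

# The six example photos, each represented by its set of tags.
PHOTOS = [
    {"bird", "city", "2009"},
    {"tree", "city", "2009"},
    {"bird", "countryside", "2008"},
    {"tree", "countryside", "2008"},
    {"sunset", "city", "2009"},
    {"sunset", "countryside", "2009"},
]


def tag_counts(photos):
    """Count how many photos carry each tag."""
    return Counter(tag for tags in photos for tag in tags)


counts = tag_counts(PHOTOS)
flat_cloud = sorted(counts)  # plain alphabetical listing of all tags
print(" ".join(flat_cloud))
print(counts.most_common(1))  # the single most used tag and its count
```

A real tag cloud would typically also scale each tag's font size by its count; that part is left out here.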

This example is small, so displaying all tags in a single cloud is no problem. However, once you have to deal with thousands of tags, this approach is no longer feasible.

So, how can we handle such large tag clouds? My answer: hierarchical tag clouds. A hierarchical tag cloud consists of a kind of "root" cloud to which sub-clouds are attached, which in turn have their own sub-clouds, and so on. Also, we want the hierarchy to be built automatically, without any effort by the user beyond tagging.

Here is an algorithm for generating the root cloud automatically:
  • look for the most used tag
  • add it to the root tag cloud
  • look for the next most used tag that was not used together with any tag already in the cloud
  • add it to the cloud
  • and so on, until no such tag is left

Then, our root cloud would look like this:
2009 2008
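
The steps above can be sketched in a few lines of Python. This is only a sketch under some assumptions of mine: the function name `root_cloud` is made up, ties between equally frequent tags are broken alphabetically (the idea itself does not say how), and "not used together" is read as "shares no photo with any tag already in the cloud".

```python
from collections import Counter

# The six example photos, each represented by its set of tags.
PHOTOS = [
    {"bird", "city", "2009"},
    {"tree", "city", "2009"},
    {"bird", "countryside", "2008"},
    {"tree", "countryside", "2008"},
    {"sunset", "city", "2009"},
    {"sunset", "countryside", "2009"},
]


def root_cloud(photos):
    """Greedily pick frequent tags that never share a photo with a picked tag."""
    counts = Counter(tag for tags in photos for tag in tags)
    # Most used first; alphabetical order breaks ties (an assumption).
    candidates = sorted(counts, key=lambda tag: (-counts[tag], tag))
    cloud = []
    for tag in candidates:
        used_together = any(
            tag in tags and picked in tags
            for tags in photos
            for picked in cloud
        )
        if not used_together:
            cloud.append(tag)
    return cloud


print(root_cloud(PHOTOS))  # ['2009', '2008']
```

Note that city and countryside are more frequent than 2008, but both appear on photos tagged 2009, so they are skipped at the root level.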

Clicking on a tag in the cloud opens a sub-cloud, built by the same algorithm from only those tags that occur together with the clicked tag (the supertag). For instance, clicking on 2009 would open
city countryside

Clicking on 2008 opens
countryside

and so on...
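
Here is a rough Python sketch of how such a sub-cloud could be computed. The function `sub_cloud` and its `path` parameter (the tags clicked so far) are hypothetical names of mine. I also assume that the clicked tags themselves are left out of the sub-cloud, that co-occurrence is only checked among the photos carrying all clicked tags, and that ties are broken alphabetically.

```python
from collections import Counter

# The six example photos, each represented by its set of tags.
PHOTOS = [
    {"bird", "city", "2009"},
    {"tree", "city", "2009"},
    {"bird", "countryside", "2008"},
    {"tree", "countryside", "2008"},
    {"sunset", "city", "2009"},
    {"sunset", "countryside", "2009"},
]


def sub_cloud(photos, path):
    """Cloud shown after clicking the tags in `path`, one after another."""
    # Keep only the photos that carry every clicked tag.
    selected = [tags for tags in photos if set(path) <= tags]
    # Count the remaining tags, ignoring the clicked ones.
    counts = Counter(t for tags in selected for t in tags if t not in path)
    cloud = []
    for tag in sorted(counts, key=lambda t: (-counts[t], t)):
        used_together = any(
            tag in tags and picked in tags
            for tags in selected
            for picked in cloud
        )
        if not used_together:
            cloud.append(tag)
    return cloud


print(sub_cloud(PHOTOS, ["2009"]))  # ['city', 'countryside']
print(sub_cloud(PHOTOS, ["2008"]))  # ['countryside']
```

With an empty `path`, the same function yields the root cloud, which is what "the same algorithm" suggests.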

In total, the hierarchy would look like this:
+ 2009
++ city
+++ bird
+++ tree
+++ sunset
++ countryside
+++ sunset

+ 2008
++ countryside
+++ bird
+++ tree
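
The whole hierarchy can be produced by applying the same step recursively. The following Python sketch does that under my own assumptions: `build_cloud` and `hierarchy` are made-up names, clicked tags are excluded from their own sub-clouds, and ties are broken alphabetically, so tags with equal counts may come out in a different order than in the listing above.

```python
from collections import Counter

# The six example photos, each represented by its set of tags.
PHOTOS = [
    {"bird", "city", "2009"},
    {"tree", "city", "2009"},
    {"bird", "countryside", "2008"},
    {"tree", "countryside", "2008"},
    {"sunset", "city", "2009"},
    {"sunset", "countryside", "2009"},
]


def build_cloud(photos, path=()):
    """Cloud for the photos that carry every tag in `path`."""
    selected = [tags for tags in photos if set(path) <= tags]
    counts = Counter(t for tags in selected for t in tags if t not in path)
    cloud = []
    for tag in sorted(counts, key=lambda t: (-counts[t], t)):
        if not any(tag in tags and p in tags for tags in selected for p in cloud):
            cloud.append(tag)
    return cloud


def hierarchy(photos, path=()):
    """Return the tree as (depth, tag) pairs in depth-first order."""
    rows = []
    for tag in build_cloud(photos, path):
        rows.append((len(path) + 1, tag))
        # Recurse with the tag appended; it ends once no tags remain.
        rows.extend(hierarchy(photos, path + (tag,)))
    return rows


for depth, tag in hierarchy(PHOTOS):
    print("+" * depth + " " + tag)
```

The recursion always stops, because every level removes one more tag from the candidates.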

OK, this is really a very simple idea. It may not work very well for folksonomies, that is, tag clouds generated by many different users (as on del.icio.us).