Friday, 8 May 2009

Claudia Koreck - 's ewige Lem

alles ist so ruhig (bayer.: staat)
so gemütlich und warm
ich lieg in der Wiese
unter einem Baum
ich mach' meine Augen zu
dann schlafe ich ein
ich bin jetzt über den Wolken
und die Welt zieht vorbei

ich höre dein Lachen
ich kenne doch die Stimme
das wird immer lauter
du willst dass ich komme
wenn du mich abholst
wo bringst du mich hin?
sag' ist es schön dort?
und wartet wer auf mich?

und irgendwie, habe ich keine Angst mehr vor dir
ich seh alte Leute aus längst verganger Zeit und ich spüre
alles ist vergessen und alle Fehler vergeben
und ich glaube dass ich da bin - im ewigen leben

und irgendwie, habe ich keine Angst mehr vor dir
ich seh alte Leute aus längst verganger Zeit
und ich spüre alles ist vergessen und alle Fehler vergeben
und ich glaube dass ich da bin - im ewigen leben
und ich glaube dass ich da bin - im ewigen leben
und ich glaube dass ich da bin - im ewigen leben

Saturday, 21 March 2009

A (Very) Simple Method for Automatic Creation of Hierarchical Tag Clouds

OK, I don't have time for an introduction, so here is the idea. I'll explain it using photos as an example. Note that I did not do a literature search (same excuse: no time), so anything I say here might have been published already.

Say you took a bunch of photos:
  1. a bird in Brisbane in 2009
  2. a tree in Brisbane in 2009
  3. a bird in the mountains in 2008
  4. a tree in the mountains in 2008
  5. a sunset in Brisbane in 2009
  6. a sunset in the mountains in 2009
Now, you would attach the following tags to the pictures:
  1. bird, city, 2009
  2. tree, city, 2009
  3. bird, countryside, 2008
  4. tree, countryside, 2008
  5. sunset, city, 2009
  6. sunset, countryside, 2009
A simple tag cloud would look like this:
2008 2009 bird city countryside sunset tree

This example is small, so displaying all your tags in a single tag cloud is no problem. However, once you have to deal with thousands of tags, this approach is no longer feasible.

So, how can we handle these large tag clouds? My answer: hierarchical tag clouds. A hierarchical tag cloud consists of a kind of "root" cloud to which several sub-clouds are attached, which in turn have their own sub-clouds, and so on. Also, we want the hierarchy to be built automatically, without any further effort by the user (besides tagging).

Here is an algorithm for generating the root cloud automatically:
  • look for the most-used tag
  • add it to the root tag cloud
  • look for the next most-used tag that was never used together with a tag already in the cloud
  • add it to the cloud
  • and so on
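
The steps above can be sketched in a few lines of Python. This is only a minimal sketch under some assumptions of mine: "not used together with the previous one" is read as "not used together with any tag already in the cloud", ties between equally frequent tags are broken alphabetically, and the names PHOTOS and build_cloud are made up for illustration.

```python
from collections import Counter

# Tags of the six example photos from above.
PHOTOS = [
    {"bird", "city", "2009"},
    {"tree", "city", "2009"},
    {"bird", "countryside", "2008"},
    {"tree", "countryside", "2008"},
    {"sunset", "city", "2009"},
    {"sunset", "countryside", "2009"},
]


def build_cloud(items, exclude=frozenset()):
    """Greedily pick tags, most used first, skipping every tag that
    shares an item with a tag already in the cloud."""
    counts = Counter(tag for tags in items for tag in tags
                     if tag not in exclude)
    # Descending count; ties broken alphabetically (an assumption).
    ranked = sorted(counts, key=lambda tag: (-counts[tag], tag))
    cloud = []
    for tag in ranked:
        co_occurs = any(tag in tags and chosen in tags
                        for tags in items for chosen in cloud)
        if not co_occurs:
            cloud.append(tag)
    return cloud


print(build_cloud(PHOTOS))  # ['2009', '2008']
```

The exclude parameter is there so that the same function can later build a sub-cloud without listing the supertag itself again.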

Then, our root cloud would look like this:
2009 2008

Clicking on a tag in the cloud would run the same algorithm again and open a sub-cloud, this time using only the tags that occur together with the clicked tag (the supertag). For instance, clicking on 2009 would open
city countryside

Clicking on 2008 opens
countryside

and so on...

In total, the hierarchy would look like this:
+ 2009
++ city
+++ bird
+++ tree
+++ sunset
++ countryside
+++ sunset

+ 2008
++ countryside
+++ bird
+++ tree
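
To check the hierarchy above, the same greedy step can be applied recursively. Again, this is just a sketch under my own assumptions: ties are broken alphabetically (so siblings may come out in a different order than listed above), tags already on the path from the root are left out of the sub-clouds, and all function names are hypothetical.

```python
from collections import Counter

PHOTOS = [
    {"bird", "city", "2009"},
    {"tree", "city", "2009"},
    {"bird", "countryside", "2008"},
    {"tree", "countryside", "2008"},
    {"sunset", "city", "2009"},
    {"sunset", "countryside", "2009"},
]


def build_cloud(items, exclude):
    """One cloud: most-used tags first, no two tags sharing an item."""
    counts = Counter(tag for tags in items for tag in tags
                     if tag not in exclude)
    ranked = sorted(counts, key=lambda tag: (-counts[tag], tag))
    cloud = []
    for tag in ranked:
        if not any(tag in tags and chosen in tags
                   for tags in items for chosen in cloud):
            cloud.append(tag)
    return cloud


def build_hierarchy(items, path=()):
    """Nested dict {tag: sub-hierarchy}, built by re-running the
    cloud algorithm on the items that carry the clicked tag."""
    tree = {}
    for tag in build_cloud(items, set(path)):
        subset = [tags for tags in items if tag in tags]
        tree[tag] = build_hierarchy(subset, path + (tag,))
    return tree


def render(tree, depth=1):
    """Format the hierarchy with one '+' per level."""
    lines = []
    for tag, sub in tree.items():
        lines.append("+" * depth + " " + tag)
        lines.extend(render(sub, depth + 1))
    return lines


print("\n".join(render(build_hierarchy(PHOTOS))))
```

Note that a tag such as sunset can show up in more than one branch, because each sub-cloud is computed only from the photos below its supertag.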

OK, this is really a very simple idea. It may not work very well for folksonomies, that is, tag clouds generated by a whole bunch of users (as on del.icio.us).

Thursday, 8 January 2009

Dropbox With Personal Encryption

When Dropbox went into public beta at the end of last year, it received (and still receives) considerable press coverage, even in mainstream newspapers. Dropbox offers a simple solution to a non-trivial problem: syncing files across several computers and operating systems. It is free for up to 2 GB of storage and can be upgraded to 50 GB for about $10 per month.

There is one caveat, though: who wants to put potentially confidential data in the hands of an anonymous company? Dropbox uses encryption for both transmission and storage of data (on the Amazon S3 servers), but the keys are in the hands of Dropbox.

The remedy: personal encryption. In this post, I will compare two approaches:
  1. TrueCrypt (www.truecrypt.org) and
  2. EncFS (www.arg0.net/encfs).

TrueCrypt


TrueCrypt is a free, open source encryption program that creates container files, each of which holds a whole directory tree. Once mounted, a container appears to the operating system just like a normal hard disk drive. From the outside, no information about file sizes, filenames, directory structure etc. is visible, and it is not possible to tell how much data actually resides in the container. True plausible deniability is also possible by creating a so-called hidden volume, which is mounted depending on which password is entered.
In a nutshell: TrueCrypt is one of the most secure encryption programs available.

The good news: TrueCrypt works with Dropbox. Just put the container in your Dropbox folder and mount it whenever you want to access it. Since Dropbox uploads only the parts of a file that have actually changed, even larger volumes are no problem.

The bad news: there are some issues to take into account.
  • For very large containers, there is an annoying delay before the actual synchronization starts. My largest container was almost 10 GB, and even after only minor changes to the volume (like creating an empty folder), syncing took up to 5 minutes. This may be because Dropbox needs to figure out where changes have actually happened, so some checksums have to be transmitted and compared.
  • TrueCrypt puts an exclusive lock on the container, which means that Dropbox can only sync it once the container is dismounted. Now, imagine you forget to dismount your container on computer A, turn it off, and continue working on computer B. Since the changes on A were never uploaded to the Dropbox cloud, this will result in a conflict.
    In principle, Dropbox handles conflicts quite well: it creates a copy of the conflicted file in the Dropbox folder and leaves it up to the user to decide which file to keep or how to merge the data. However, for huge TrueCrypt containers this feature is a killer: you have to download the whole container to your hard disk before you can resolve the conflict. This can take days for a 10 GB volume...
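
To illustrate my guess about the checksums: the following sketch splits a file into fixed-size blocks, hashes each block, and compares the hashes to find the blocks that changed. This is not Dropbox's actual protocol (which I do not know); the block size, the hash function and the function names are assumptions chosen for illustration only.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # hypothetical block size, for illustration


def block_digests(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and hash each block."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old_digests, new_digests):
    """Indices of blocks that are new or whose digest differs."""
    return [i for i, digest in enumerate(new_digests)
            if i >= len(old_digests) or old_digests[i] != digest]


# Tiny demonstration with 4-byte blocks instead of megabytes.
old = b"aaaabbbbccccdddd"
new = b"aaaabbXbccccdddd"
print(changed_blocks(block_digests(old, 4), block_digests(new, 4)))  # [1]
```

Only block 1 would have to be uploaded here, but every block of the file still has to be read and hashed first. For a 10 GB container, that alone could explain a delay of several minutes.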


EncFS


EncFS is also a free, open source encryption program. In contrast to TrueCrypt, it encrypts each file individually, so there is no need for a huge container file. Encryption works on the fly, just as with TrueCrypt. Filenames and directory names are also encrypted.
Not being an encryption expert, I would still not consider EncFS to be as secure as TrueCrypt, simply because EncFS does reveal some information to the "outside": the complete directory structure and the file sizes. For example, this makes it easy to find out whether certain known software packages are stored.

Apart from that, it works much better with Dropbox than TrueCrypt does. Since there is no big container file, small changes do not cause a long delay. Additionally, the risk of conflicts is reduced dramatically, since files are small and can be uploaded quickly before they are edited on another computer. No exclusive lock is put on the files while the folder is mounted. There is still a risk of conflicts, but it is the same as for normal, unencrypted use of Dropbox.

Of course... this approach has one big caveat, too: EncFS is Linux only. For those who want to use it under Windows anyway, there is a more or less comfortable workaround:
  • install a virtual machine such as VirtualBox (which is freely available)
  • set up Linux + EncFS + a Samba server on this virtual machine (I used Ubuntu 8.10)
  • mount the EncFS directory publicly; in Ubuntu this might look similar to
    sudo encfs --public ~/Dropbox/encfs ~/encfs
Note that with Windows Vista it is somewhat difficult to connect to Samba shares, so be patient... sometimes it helps to use the IP address instead of the NetBIOS name. Also, if you have a Vista Home version, this link might help you out.


Resolving Conflicts with EncFS

Once this works, one further problem has to be considered: even though conflicts are no more likely with this approach than without encryption, they can still occur. For example, two people might work on the same file at the same time, or a huge file might be changed on another computer before its upload has finished.
Once a conflict occurs, it is more difficult to resolve than without encryption. Dropbox will create a renamed file and leave the conflict resolution up to you. The problem is that you won't see this renamed file in the decrypted folder, but only in the encrypted one. So you have to look into the encrypted folder to find the two files, which could be named something like "X7cBkyW" and "X7cBkyW.conflicted" (just an example). Then, if you want to see the contents of the conflicted file, you first have to rename the original to something else and then give the conflicted file the original name. After that, you can open the decrypted file. It can also be a bit difficult to find out which decrypted file corresponds to the pair... file size or directory structure can be helpful in this step.
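
The renaming step can be scripted. The following is a minimal sketch, assuming the two encrypted files have already been identified; the function name swap_files and the ".parked" suffix are made up, and the file names are just the example names from above, not what Dropbox really produces.

```python
import tempfile
from pathlib import Path


def swap_files(original: Path, conflicted: Path) -> None:
    """Exchange the names of two files in the encrypted folder, so the
    conflicted version shows up under the original (decryptable) name."""
    # Hypothetical temporary name; must not exist yet.
    parked = original.with_name(original.name + ".parked")
    if parked.exists():
        raise FileExistsError(parked)
    original.rename(parked)
    conflicted.rename(original)
    parked.rename(conflicted)


# Demonstration in a throwaway directory instead of the real Dropbox folder.
with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "X7cBkyW"
    b = Path(tmp) / "X7cBkyW.conflicted"
    a.write_text("version from computer A")
    b.write_text("version from computer B")
    swap_files(a, b)
    print(a.read_text())  # version from computer B
```

Calling swap_files a second time with the same arguments restores the original state, so you can look at both versions in turn in the decrypted folder.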


Conclusions

EncFS wins over TrueCrypt with respect to usability. You can benefit from Dropbox just as if you were not using file encryption at all.
It is possible to resolve conflicts, though far less conveniently than without encryption. Usually this should not be a problem, because Dropbox is designed to avoid conflicts by instantly syncing files to the cloud. Conflicts are most likely to occur once files are shared with other people, so in that case one should consider not using encryption at all.