Europeana Tech

Last week I spent three days in Vienna at the Europeana Tech conference. I have to admit that I did not see much of the conference itself, as I was heavily involved in the hackathon. Just to clarify, a hackathon is a strange event where you get a group of programmers into a room and provide them with an internet connection, data, food, and drink. Then you ask them to form groups, take the data that has been provided, and build something interesting/cool/useful/whatever, while consuming the food and drink.

This we duly did, with me teaming up with my two Paths colleagues from San Sebastian. The prototype we built focused on one of the research questions that I have been investigating recently, namely “how do you provide access to a large digital cultural heritage collection to users who do not know what they are looking for?” Before I go into what we did, a bit of context. The “data” part alluded to above was the Europeana search API. Europeana itself is a Europe-wide content aggregator/portal/they aren’t quite sure what they want to be in the future. Basically, they take metadata about cultural heritage artefacts from museums, archives, libraries, and galleries and provide a single, unified interface for searching them. This works, sort of, except when you do not know what you are looking for; then you are stuck.

The idea we had was to provide an alternative interface to Europeana, one that did not revolve around searching for something. We decided to use Wikipedia as the interface the user sees and to augment it with images taken from Europeana. Say you wanted to find images in Europeana about the Roman Empire. You copy the URL of the Wikipedia page (http://en.wikipedia.org/wiki/Roman_Empire) into our system. The system fetches the Wikipedia page and then, for each paragraph on the page, tries to find images that are linked to the content of that paragraph. The user is then shown the Wikipedia page together with the images and that way gets an overview of what data is available in Europeana. You can try it out here, which should clarify how it looks in practice.
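To make the per-paragraph idea a bit more concrete, here is a minimal sketch in Python. It is an illustration only, not the hackathon code: the keyword heuristic (most frequent non-stop-words), the function names `keywords` and `images_for_page`, and the `search` callable that stands in for a call to the Europeana search API are all assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real system needs far more.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "was", "is",
              "by", "for", "on", "with", "as", "it", "that"}


def keywords(paragraph, limit=3):
    """Return the most frequent non-stop-words of a paragraph (assumed heuristic)."""
    words = re.findall(r"\w+", paragraph.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(limit)]


def images_for_page(paragraphs, search, per_paragraph=2):
    """Pair each paragraph with a few images returned by `search`.

    `search` is a hypothetical callable: it takes a query string and returns
    a list of image URLs. In practice it would wrap the Europeana search API.
    """
    page = []
    for paragraph in paragraphs:
        query = " ".join(keywords(paragraph))
        # Paragraphs without any usable keywords get no images.
        images = search(query)[:per_paragraph] if query else []
        page.append((paragraph, images))
    return page
```

Passing `search` in as a parameter keeps the sketch independent of any particular API client, and makes it easy to try out with a stub function.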

In the process of developing this, we hit upon some issues with the Europeana search API, which sometimes did not work at all and sometimes returned broken data. This has an unfortunate influence on the quality of the results you see. We tried to work around that by simplifying the types of searches we issue in the background (which unfortunately leads to the occasional weird image) and also by doing some pre-processing. My two colleagues pre-processed a load of Europeana data to link it to Wikipedia using their own algorithm and you can see the results of this at the top, in the scrolling set of images.
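One way such a workaround could look is sketched below. This is an assumption-laden illustration, not what we actually ran: the strategy of dropping trailing search terms, the `search` callable wrapping the API, and the `image` field name are all hypothetical.

```python
def search_with_fallback(terms, search):
    """Try the full query first, then progressively simpler ones.

    `search` is a hypothetical callable wrapping the search API: it takes a
    query string, returns a list of record dicts, and may raise on failure.
    """
    for count in range(len(terms), 0, -1):
        query = " ".join(terms[:count])
        try:
            records = search(query)
        except Exception:
            continue  # the API call failed; retry with a simpler query
        # Discard broken records: anything that is not a dict or that has
        # no image URL (the "image" key is an assumed field name).
        usable = [r for r in records if isinstance(r, dict) and r.get("image")]
        if usable:
            return usable
    return []
```

The simpler the query, the more likely it is to return something, but also the more likely that the something is only loosely related to the paragraph.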

Overall it works quite well and it was good fun building it. What is really nice about it is that, as long as there is data in Europeana that uses the same words as the page, it doesn’t matter what language those words are in. It just works. Have a go yourself.

 

Posted in Technology, Work | 2 Comments

· comments

  1. written by desenfrenada · October 11, 2011 at 10:24 am  

    Kudos, that’s a nifty approach you implemented there.

    http://paths.sheffield.ac.uk/wikiana/wiki/Johannes_Gutenberg
    I like the image that has been found for “fonts” :)