Room3b – digital home . Laura & Mark

Europeana Tech

Last week I spent three days in Vienna at the Europeana Tech conference. I have to admit that I did not see much of the conference itself, as I was heavily involved in the Hackathon. Just to clarify, a Hackathon is a strange event where you get a group of programmers into a room and provide them with an internet connection, data, food, and drink. Then you ask them to form into groups and take the data that has been provided and build something interesting/cool/useful/whatever, while consuming the food and drink.

This we duly did, and I teamed up with my two Paths colleagues from San Sebastian. The prototype we built focused on one of the research questions that I have been investigating recently, namely “how do you provide access to a large digital cultural heritage collection to users who do not know what they are looking for”. Before I go into what we did, a bit of context. The “data” part alluded to above was the Europeana search API. Europeana itself is a Europe-wide content aggregator/portal/they aren’t quite sure what they want to be in the future. Basically they take metadata about cultural heritage artefacts from museums, archives, libraries, and galleries and provide a single, unified interface for searching them. This works, sort of, but if you do not know what you are looking for, you are stuck.

The idea we had was to provide an alternative interface to Europeana that did not revolve around searching for something. We decided to use Wikipedia as the interface the user sees and to augment it with images taken from Europeana. Say you wanted to find images in Europeana about the Roman Empire. You copy the URL of the Wikipedia page (http://en.wikipedia.org/wiki/Roman_Empire) into our system. The system fetches the Wikipedia page and then, for each paragraph on the page, tries to find images that are linked to the content of that paragraph. The user is then shown the Wikipedia page plus the images, and that way gets an overview of what data is available in Europeana. You can try it out here, which should clarify how it looks in practice.
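For the curious, the per-paragraph idea can be sketched roughly as follows. This is not our hackathon code, just a minimal illustration under stated assumptions: the names `ParagraphExtractor`, `paragraph_keywords` and `augment_page` are made up for this post, the keyword picking is a naive word count, and `search_images` is a placeholder for whatever function actually queries Europeana (the real API call is deliberately left out).

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Tiny illustrative stop-word list (an assumption); a real system needs far more.
STOP_WORDS = {"the", "and", "was", "for", "that", "with", "are", "from"}


class ParagraphExtractor(HTMLParser):
    """Collects the text of every <p> element in an HTML page."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_paragraph = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            # Join the text pieces and normalise whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)


def paragraph_keywords(text, limit=3):
    """Return the most frequent non-trivial words of a paragraph."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(limit)]


def augment_page(html, search_images, limit=3):
    """Pair each paragraph with images found for its keywords.

    `search_images` is a hypothetical callable taking a query string and
    returning a list of image references.
    """
    extractor = ParagraphExtractor()
    extractor.feed(html)
    results = []
    for paragraph in extractor.paragraphs:
        keywords = paragraph_keywords(paragraph, limit)
        images = search_images(" ".join(keywords)) if keywords else []
        results.append((paragraph, images))
    return results
```

Passing the search function in as a parameter keeps the sketch independent of any particular API, and it means the paragraph handling can be tried out with a fake search function before touching the network.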

In the process of developing this, we hit upon some issues with the Europeana search API, which sometimes did not work at all and sometimes returned broken data. This has an unfortunate influence on the quality of the results you see. We tried to work around that by simplifying the types of searches we issue in the background (which unfortunately leads to the occasional weird image) and also by doing some pre-processing. My two colleagues pre-processed a load of Europeana data to link it to Wikipedia using their own algorithm and you can see the results of this at the top, in the scrolling set of images.
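One way to picture the "simplify the search" workaround is a fallback loop that retries with fewer terms whenever a query fails or comes back empty. The sketch below is only an assumption about how such a workaround could look, not what we actually ran: `search_with_fallback` is a made-up name, `search` stands in for the real Europeana call, and `is_valid` is a hypothetical hook for rejecting broken data.

```python
def search_with_fallback(terms, search, is_valid=bool):
    """Try progressively simpler queries until one returns usable results.

    `terms` should be ordered from most to least important, because the
    last term is dropped first. Returns (query_used, results), or
    (None, []) if nothing worked.
    """
    remaining = list(terms)
    while remaining:
        query = " ".join(remaining)
        try:
            results = search(query)
        except Exception:
            # Broad catch on purpose in this sketch: any failed call is
            # treated the same as an empty answer.
            results = None
        if results and is_valid(results):
            return query, results
        remaining.pop()  # drop the least important term and retry
    return None, []
```

The price of this approach is the one mentioned above: the simpler the query gets, the more likely it is that the image it finds has little to do with the paragraph.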

Overall it works quite well and it was good fun building it. What is really nice about it is that as long as there is data in Europeana that uses the same words as the page, it doesn’t matter what language those words are in. It just works. Have a go yourself.

 

Post mortem

This is a brief post-mortem explaining the steps that led to the complete Room3b website being wiped out.

This website is hosted on a Virtual Private Server (VPS) and at the beginning of the new year I moved to a new provider, as I was not happy with the performance at the previous one. I had set up everything on the new VPS instance and migrated all the data. Then news came that we could move into the new flat in Birmingham, and the next few days were spent frantically getting everything ready and moving up to Birmingham. Unfortunately this meant that I did not have time to port over the backup scripts (some adaptation was necessary as the new provider had a different backup layout). In the meantime my contract with the old provider had run out, so the backups there were removed together with my old VPS instance.

On the 16th of January I noticed that I could not access my e-mail. I tried restarting the VPS instance, but that had no effect. It looked like all internet traffic was being dropped on the last hop from the data-centre router to my VPS instance. I sent the admin people an e-mail asking them to investigate. Unfortunately, what they found was that, due to either a bug or a configuration mistake, the disk systems of my VPS instance and of another VPS instance had been mapped to the same LVM volume. Effectively two file systems were trying to co-exist on a single drive, something that naturally can’t work. The admin staff tried to resurrect the two VPS instances, but unfortunately the last file system they could find and restore was the other VPS instance’s. All my data was lost, and because I hadn’t had time to port the backup scripts, I did not have any backups.

Moving to the new flat in Birmingham also meant that I only had sporadic internet access, but what time I had was used to slowly rebuild the Room3b server. It is now almost 100% back, with the exception of those database entries and image collections for which I did not have a local backup copy. I won’t be restoring those. We’re starting with a clean slate. Unfortunately that also means that any links you had to this website now either lead nowhere or point to something else, and you will have to update them.

So that’s basically what happened. The final few bits of data and service will be restored in the next few days and then everything will be back to normal.