Data Publishing

The GRADE project, on which I am a collaborative partner, is concerned with scoping geospatial repositories. The project has principally been tackling the legal and technical issues surrounding their establishment and, I think, has made some very good progress. Yet behind all this work you do actually need people to deposit data for inclusion in a repository. And there's the rub. At the moment we have data centres (a buzzword 5-10 years ago) and we are now seeing the increased establishment of institutional repositories. Yet what/where is the impetus for actually depositing data?? I suspect that this is partly subject specific. My impression is that subjects such as physics have a greater tendency to share data. In the geosciences it's usually a case of keeping what you have collected and only ever publishing the results, not the data itself. To be fair, this is beginning to change, with the research councils in the UK requiring the deposition of data from funded work. But how much research data actually results from research council funding? My impression is less than half (although if anyone has any figures, that would be interesting).

So we have a situation where there is a “top down” establishment of repositories, but no one is actually interested in using them. We have researchers collecting data (for research), but it is research publications that drive the agenda (NOT the data). I know that I see absolutely no reason why I should share primary data and, indeed, I like to discuss potential uses with people before sharing. Then of course we have the vested interests of the institutions that employ researchers. They are directly or indirectly funding much of this research and there is increased interest in “monitoring potential assets” (although quite to what extent institutions have a claim to IPR is another matter).

So where does that actually leave things?? Well, Mahendra Mahey (at the GRADE meeting this week) provided a summary of repository work in the UK and (briefly) summarised some points that Pete Burnhill (Director of EDINA) has been making along these lines. And that is that data should be published. As a community, academics need to be shown the positive aspects of data sharing and encouraged to see it as an opportunity to publish. Indeed one could argue that data publication should be seen as a valid publication route. And in the same way that journal articles are peer reviewed, so data should be too. This is a route that we have been toying with at the Journal of Maps. Several articles have data published with them (e.g. Stokes et al.). The data have been checked for appropriateness but not explicitly reviewed in the same manner as the articles. I am currently reviewing how useful this “service” is, with the potential to ask reviewers to comment on submitted data, as well as having a separate data reviewer. This actually raises a whole host of other questions concerning data preservation (as opposed to a repository) which I won’t comment on at this moment.

With the above comments, I think it is clear that I’m in favour of data publication, but I am inclined to think at the moment that the data should follow the research (hence the reason for publishing the data with the article at the Journal of Maps). The problem with separating data and content is that maintaining the explicit link between the two becomes more complex (just look at journals from the 19th century to see how effective immediacy is). Keeping the two together also makes the peer review process much simpler. That isn’t to say that data can’t be stored in a repository, but that, in the first instance, it might be better placed with the article. Indeed, I could see the research councils’ requirement for copies of publications and data deposition being taken a stage further, with research articles required to have data published with them. Clearly the emphasis is then shifted to the journals, many of which will not be well placed to deal with it. However, the whole research publication ethos is changing (e.g. open access) and it is time that journals became proactive. Indeed, with Wiley and Elsevier being so prominent (and supporting things like permanent electronic archives), it would only require these two organisations to support such an initiative for it to really take off. Whilst in principle it sounds a reasonable idea, there are many barriers, not least the sheer volume of some data sets within a web-based infrastructure where most journals struggle to offer more than a static PDF.

More on TOIDs

Following on from my earlier TOID blog, I was at EDINA (University of Edinburgh) where I was briefly chatting about the Mastermap trials. Getting on to the subject of TOIDs, I mentioned TOID searching and, I am happy to report, EDINA will be incorporating TOID searches into the Digimap interface (although not just yet). So at least someone is doing it, which is good to see. Interestingly, they are also working on exploiting the royalty free status of TOIDs. If you are interested in some processing that has been performed, you could essentially be given a macro that lists the TOIDs and the actions performed on them. It is this file that could be stored (in a repository for instance) and emailed out, and then you could legitimately obtain a licensed copy of the data and perform exactly the same operations. No need for the (intermediate) processed data set. Nice idea and I’ll be interested to see it in action.
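To make the idea a little more concrete, here is a minimal sketch of what such a "macro" might look like. I have no idea how EDINA actually plan to implement it, so everything here is an assumption for illustration: the JSON layout, the action names (`reclassify`, `scale_height`) and the feature attributes are all made up. The only real value is the TOID, which is the one from my earlier TOID post.

```python
import json

# Hypothetical actions: the names and parameters are assumptions for illustration.
def reclassify(feature, new_class):
    updated = dict(feature)
    updated["class"] = new_class
    return updated

def scale_height(feature, factor):
    updated = dict(feature)
    updated["height_m"] = feature["height_m"] * factor
    return updated

ACTIONS = {"reclassify": reclassify, "scale_height": scale_height}

def replay(recipe, features):
    """Apply each recorded step to the locally licensed feature with that TOID."""
    result = dict(features)
    for step in recipe:
        toid = step["toid"]
        if toid not in result:
            raise KeyError(f"TOID {toid} not in local data")
        action = ACTIONS[step["action"]]
        result[toid] = action(result[toid], **step["params"])
    return result

# The recipe holds only TOIDs and actions (no geometry or attributes),
# so it can be serialised, stored in a repository and emailed around.
recipe_text = json.dumps([
    {"toid": "1000041424855", "action": "reclassify",
     "params": {"new_class": "surveyed"}},
    {"toid": "1000041424855", "action": "scale_height",
     "params": {"factor": 2.0}},
])

# The recipient's own licensed copy of the data (made-up attribute values).
local_features = {"1000041424855": {"class": "building", "height_m": 5.0}}

processed = replay(json.loads(recipe_text), local_features)
```

The point of the design is that the shared file contains nothing but identifiers and instructions; the licensed data never leaves the recipient's machine, and the intermediate processed data set is rebuilt on demand.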

Palm Opera

Whilst I’m on the theme of playing with my Palm, I saw the recent release of Opera Mini for the Palm (and this interested me because WebPro, which shipped with my Palm, is not great). Opera for a long time was “the” alternative to Internet Explorer. Whilst it was a paid-for app, it was small and fast, and particularly suited to browsing over dial-up connections. It also was the first browser (I think) to introduce an MDI (multiple document interface) based around tabs. Anyway, the release of Firefox eclipsed many of the inroads Opera had been making. To start with Opera went advertising supported and then ultimately free. However, where Opera has been making big inroads is in the mobile market. Its ability to design small and powerful browsers is ideally suited to this area and it has versions of Opera Mini running on a variety of operating systems. PalmOS has, until recently, not been one of those supported, but the moderate success (particularly stateside) of the Treo smartphones has led to a release which was recently upgraded to version 2.

Opera Mini is actually a highly optimised (read: fast) Java application that runs on top of IBM's WebSphere Java runtime. You need to install this before Opera Mini. Fire it up and everything just works. Opera has done a good job of using screen space well and it successfully (via a proxy) loaded pretty much any website I threw at it (eBay, GMail, Google etc.). But the screen is small so don’t expect to get lots of productivity out of it. Like anything, it's great to have Office, VNC, email and web all on the Palm, but only when you are in a tight spot.

Google Maps for your Palm

Google Maps was recently released for the Palm platform so I wandered on over to download it. Whilst Palm OS as a platform for PDAs is dwindling (but has a fantastic software base), Palm has been moderately successful with its range of Treo smartphones. Hence a Palm OS version of Google Maps. And I was astounded at how truly powerful an application it is. The interface is simple and offers three main functions:

  1. Maps
  2. Satellite images
  3. Route planning

And what it does, it does very well. Zooming in is quick (although whilst you can pan you can’t interactively zoom) and refresh rates are fast. You can switch between map and satellite view. I found this strangely addictive and found the whole experience of interactively using 15cm digital aerial imagery “on demand” truly amazing. Finally, the route planning is simplicity itself and very effective at showing the route.

LaTeX: Abiword

LaTeX is a mark-up “language” that users learn in order to write LaTeX documents. If you have ever done any HTML by hand then it is similar. The markup accesses the background macros that control all the layout. Below is an example of a LaTeX document:


    \documentclass[12pt,a4paper]{article}
    \usepackage[pdftex]{graphicx}
    \usepackage{multicol}
    \begin{document}
    \section*{Notes on my new paper}
    Some notes simply typed in to the document. I can also add some \textbf{bold} and \textit{italic}.
    \end{document}

So nothing desperately exciting or difficult, although laying out graphics and tables can get quite fiddly. But of course remember that this is a typesetting program, not a DTP one.

If this is all a bit of a steep learning curve in the first instance then AbiWord is a good place to start. It is a cross-platform WYSIWYG word processor that is pretty good. It doesn’t have all the bells and whistles of Microsoft Word, but does do nearly everything you want, and pretty well. It's also open source (and if you want a portable version, pop over to the people at PortableApps). What makes AbiWord stand out a little more is (via plugins) its support for MS Word import (and others) and LaTeX export. It also has a pretty nifty equation editor that utilises an implementation of the LaTeX equation language. So you can import Word documents, save them as LaTeX files and run them through pdfTeX (the pdflatex command) to create LaTeX PDFs. All very neat.

Footnote: I am currently using the PortableApps version, but the download link was broken. It seems to only be available on the UK Mirror service so I had to do a manual search over there. After getting it, there was no LaTeX export or equation editor present and the AbiWord documentation was a little lacking. You actually need to go to the AbiWord download page where there are three further downloads: import/export plugins, tools plugins and equation fonts. These are small executables that automatically go online and download the right packages for you (and make sure you select the correct install folder). Once done you can select the equation editor from the Insert menu and get some pretty equations going!

LaTeX Musings

Before I get in to this blog, I should note that this will form one of several entries on LaTeX.

When we first started the Journal of Maps we decided to typeset the material ourselves and pondered for quite a while about which software to use. Whilst DTP software first sprang to mind (e.g. InDesign), these packages are not wholly designed for free flowing text, but rather short, styled, pieces. What we really wanted was software for typesetting and some hunting around the internet pointed me in the direction of LaTeX. This is a version of TeX which adds many macros to allow “ordinary” users to access the true typesetting power of TeX. TeX is nicely summarised by Wikipedia as “a typesetting system created by Donald Knuth. Together with the METAFONT language for font description and the Computer Modern typeface, it was designed with two main goals in mind: first, to allow anybody to produce high-quality books using a reasonable amount of effort, and, second, to provide a system that would give the exact same results on all computers, now and in the future.”

The source code is in the public domain and there are several versions available across many platforms. The output is excellent for typesetting and, because TeX will remain essentially unaltered, code from 1985 should work without problems in 2085. One only has to look at early DOS word processor programs, or indeed the BBC's Domesday Project, to see how quickly file formats can date. This shouldn’t be an issue with TeX.

So at JoM we had settled on the software to use. We then had to pick a distribution and platform, and design a template. To keep things short for the moment, I’ll just note that we use Windows XP and settled on LaTeX, which meant going for a distribution called MiKTeX. And this does everything we want it to!

Latte Heaven

Bit of a plug for one of my local coffee houses, Latte Heaven. Great coffee (using a blend of their own, roasted in Leighton Buzzard), a good menu of food and a nice place to chill out. The website is, well, not very good. It is template driven though and, amusingly, gives a link to the control panel for the site at the web hosts. Funny what people leave online.

Anyway, if you’re passing through Dunstable, give it a crack. You won’t be disappointed.

BBC News: The map gap

BBC News had an interesting article today entitled The map gap. It covers familiar cartographic ground on how we actually represent an ellipsoidal Earth on a flat piece of paper (nice quote from Steve Chilton: “If you peel an orange, you can’t lay it flat and there’s never an answer to that”); first year geographers, this is exactly why we have a lecture on map projections and coordinate systems! It then briefly introduces Google Earth (and its ilk), mapping websites and the whole idea of “user-data” (e.g. OpenStreetMap) and mash-ups. Whilst the article itself doesn’t break any new ground, it is interesting in that it places cartography at the centre of these developments and subtly (or unintentionally?) asks how the subject can respond to such rapid changes. And this is a good question; will cartography reduce to a niche subject as spatial data users continue to make visually poor (or wrong!) maps, or can it re-invent itself? It also demonstrates how mainstream spatial data, visualisation and re-use (mash-ups) have become.
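For anyone who hasn't sat through that lecture, a quick worked example of the orange peel problem. The choice of the Mercator projection is mine (the article doesn't single one out), and I am assuming a spherical Earth of radius $R$ rather than an ellipsoid to keep the algebra short. With longitude $\lambda$, latitude $\varphi$ and central meridian $\lambda_0$:

```latex
\begin{align*}
  x &= R\,(\lambda - \lambda_0), \\
  y &= R\,\ln\tan\!\left(\frac{\pi}{4} + \frac{\varphi}{2}\right), \\
  k(\varphi) &= \sec\varphi
    \quad\text{(local scale factor, the same in all directions)}, \\
  k(60^\circ) &= \frac{1}{\cos 60^\circ} = 2 .
\end{align*}
```

So at 60 degrees north every length on the map is drawn twice as long as it would be at the equator, and every area four times as large. Shapes are preserved locally, but sizes are not; other projections make the opposite trade, and none can avoid all distortion.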

Searching TOIDs?

Following on from my initial thoughts on Mastermap, I was having a ponder about the use of TOIDs. TOIDs essentially represent another addressing system for the UK, allowing identification at the object level. And of course, following on from my blog on AGI Tat, the OS are advertising TOIDs as royalty free. So whilst they can be used for much more than simply addressing, are there any searchable interfaces based on TOIDs (i.e. web based)? Anyone know??

P.S. I can be TOIDed at 1000041424855 (but does anyone know where that actually is!)

Mastermap Styling

With EDINA preparing OS Mastermap data for general distribution as part of Digimap, one of my colleagues has become an early adopter for trialling purposes. Whilst I have come across Mastermap in various guises, this was the first time I had seen the whole processing side up close. My colleague downloaded a couple of layers for a 5x5 km area which came to over 200MB. The first problem was loading it into ArcMap, which is one of the least standards-compliant GIS packages around (although this might well be changing in v9.2). A trip over to ESRI UK and a download of MapManager 9 allowed the conversion of the OS GML into a geodatabase. Mastermap then loaded fine, although with default ESRI symbolisation. After a lot of digging we finally came across a style file that symbolises the Mastermap data in the same fashion as the default OS styling (and pleasant enough it is too). It's worth noting that you can get a free OS Mastermap GML Viewer from Snowflake Software.

We then downloaded all layers (although no imagery) for a 5x5 km area and, 2 hours later, MapManager produced a 750MB geodatabase. Hmmm, some planning ahead methinks. This is clearly a big headache for EDINA (it took about 6 hours for the data request to be processed, rather than the 2-3 minutes for LandLine), hence the need for testing!