Academic programming and research

Saturday, 18 April, 2009

Paul Mather wrote an interesting editorial in the latest RSPSoc newsletter (2009, 32, 2) detailing recent comments proposing that photogrammetry is dead as a subject. The basis for this was that well-funded computer vision researchers have taken photogrammetric principles and are now driving research in this area forward. It raises the wider issue of whether researchers need programming skills to pursue their work, and Paul notes that “you cannot do real research if your research questions are limited to what a commercial software package will let you do.” He contends that commercial software only allows researchers to use “last year’s techniques.”

Now I don’t disagree that the capabilities of a package should not determine the work that you do, and that you inevitably need some kind of programming or scripting skill to do at least some bespoke work. Indeed, when you need to move into more complex processing, more generalised environments such as MATLAB or IDL are often used. That said, much environmental research requires fairly straightforward analysis; just because a technique is the latest or newest doesn’t mean it’s the best or most appropriate.

Where I have a much greater bone of contention is with the quality of the programming. I did some work a number of years ago, performing a PCA (in Imagine) on a set of images. The result was interesting, but I had to revisit the dataset about a year later and re-do the analysis (using a newer version of Imagine). Much to my surprise, the results were different. This raised the horrible prospect of algorithm modifications producing different results. And this is perhaps the biggest problem with commercial software: it is a black box and you have no real way of knowing how good the results actually are.

Microsoft Excel has long been hammered for producing inconsistent or incorrect results, yet it is routinely used for much academic statistical work (and heck, you can even play on the flight simulator!). SPSS is generally much better regarded, but again its algorithms are largely unknown; empirical testing is required to ascertain the quality of the output. Many of the routines in ArcGIS Workstation are better, being well documented and generally taken from academic research (although many remain 20 years old; fine if they do the job, but not so good if there are much better alternatives). An excellent example of how such work should progress is the R Project. Here we have open source statistical software with routines written in a generalised scripting environment. Routines are often submitted to statistical journals by researchers in the field and peer reviewed. That doesn’t make them faultless, but it gives you a far better chance of producing correct results.
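To go back to the PCA example, this is exactly the sort of calculation that benefits from being written down in the open: at its core it is just a band-to-band covariance (or correlation) matrix followed by an eigen-decomposition, and small choices such as covariance versus correlation, or how the bands are centred, will change the output and may well account for results shifting between software versions. Below is a minimal sketch in Python/NumPy of the standard covariance-based approach; it is illustrative only (the function and array names are mine) and I make no claim that it matches what Imagine does internally.

    import numpy as np

    def principal_components(bands):
        """PCA of a stack of image bands.

        bands: array of shape (n_bands, rows, cols).
        Returns (eigenvalues, eigenvectors, component images).
        """
        n_bands = bands.shape[0]
        # Flatten each band to a vector of pixel values and centre on its mean
        pixels = bands.reshape(n_bands, -1).astype(float)
        pixels -= pixels.mean(axis=1, keepdims=True)
        # Band-to-band covariance matrix and its eigen-decomposition
        cov = np.cov(pixels)
        eigvals, eigvecs = np.linalg.eigh(cov)
        # Order components by decreasing variance explained
        order = np.argsort(eigvals)[::-1]
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        # Project the centred pixels onto the eigenvectors
        components = (eigvecs.T @ pixels).reshape(bands.shape)
        return eigvals, eigvecs, components

    # Example with a made-up 4-band, 100 x 100 pixel image
    rng = np.random.default_rng(0)
    image = rng.random((4, 100, 100))
    eigvals, eigvecs, pcs = principal_components(image)
    print(eigvals)  # variance explained by each component

The point is not that everyone should roll their own PCA, but that with twenty-odd lines you can see, archive and re-run exactly what was done, which is precisely what a black-box package denies you.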

Paul proposes the development of relevant code libraries for remote sensing, partly because so much C and C++ code is now open source. Whilst this sounds good in principle, much software has suffered from the “bolt-on” approach where you take what you have and simply add to it: Blackboard is an appalling bolt-on product, along with the behemoth that Netscape Navigator became. Just because code is open source doesn’t mean it is any good or, indeed, correct. And developing graphical environments is much more time-consuming. But as researchers we do want to develop code to analyse or process data. So what is the best solution? Well, there probably isn’t one, but issues to consider include speed of development, availability of existing code, cost, code speed, quality of algorithms, archivability and long-term usability.

A nice example of weighing these issues comes from NERC FSF. They supply some Excel 2003 templates to process field spectra, and they work very well. Of course, with the release of Excel 2007 they stopped working and required re-development. They are also proprietary and there is an inherent cost in terms of Excel licensing. In stark contrast, the LaTeX base system was frozen in (I think) 1989. The software has been improved through plugins, but what it means is that typesetting code written in 1989 will still work in 2009. That is a fantastic achievement, particularly in comparison to Excel VBA scripts, where two years is about all you’re guaranteed. FSF have updated their templates but are also working on some MATLAB scripts, which strikes me as a better solution (I sketch the kind of calculation involved at the end of this post).

R appeals to me as an environment because of the peer review and generalised scripting, but it doesn’t yet offer the richness of image processing that MATLAB or, in particular, IDL offer. I’m far less familiar with either MATLAB or IDL, but do they offer the best option for an image processing code library? Or should we be pushing for an image processing environment along the lines of R? I know there are a variety of open source projects running, but I am not aware of their status in this respect. Such a code library would of course be an excellent resource and would save research projects from reprogramming the same thing over and over.
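On those spectra templates: I don’t know exactly what the FSF workbooks compute, but the core of most field spectroscopy processing is dividing each target measurement by the matching white-reference measurement (and correcting for the panel’s own calibrated reflectance). The sketch below, in Python with made-up names and numbers, is only meant to show how little of that needs to be tied to a spreadsheet format.

    import numpy as np

    def reflectance(target, reference, panel_calibration=None):
        """Convert raw field spectra to reflectance.

        target, reference: 1-D arrays of raw readings measured over the
        target and over the white reference panel at the same wavelengths.
        panel_calibration: optional per-wavelength reflectance of the panel
        itself (a perfect panel, i.e. 1.0, is assumed if omitted).
        """
        target = np.asarray(target, dtype=float)
        reference = np.asarray(reference, dtype=float)
        rho = target / reference
        if panel_calibration is not None:
            rho *= np.asarray(panel_calibration, dtype=float)
        return rho

    # Example: two made-up five-band spectra
    target = [120.0, 340.0, 560.0, 610.0, 580.0]
    reference = [400.0, 800.0, 900.0, 950.0, 920.0]
    print(reflectance(target, reference))

A plain script like this can be archived alongside the data and re-run years later, which is really the LaTeX point again.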
