The software engineer and scientist Martin Hammitzsch contributed to the development of early warning systems for natural hazards. Together with colleagues, he has organized software workshops for young scientists since 2015. We talk to him about the relevance of software in the geosciences, current developments, and challenges.
GFZ: What is the target group of your workshops on software writing?
Martin Hammitzsch: We address scientists who write software, especially young scientists such as master's students, PhD students, and postdocs, as well as experienced scientists who would like to broaden their horizons. In principle, we would like to address everyone who deals with software in one way or another, but especially those who are self-taught. It is important to us to teach craft skills and a better understanding of how to deal with software. In this way, scientists not only save time but also achieve better quality in their work.
GFZ: So you basically do not address IT experts but geoscientists who use software in their research?
Hammitzsch: Right. We would like to show what already exists so that not everyone has to start from scratch and teach themselves everything. In our basic workshop we explain the fundamentals of code. We would like to teach background knowledge on programming and to provide a platform for mutual exchange.
An important aspect of the courses is the topic of access, re-use, and reproducibility. If, for example, a PhD student leaves the institute, his or her department should be able to use the code that the student wrote. The writer of the code therefore needs to follow some rules. Software needs to be understood as part of the infrastructure, and as such it requires some support.
To get a paper published, you need to understand and follow the basic rules of scientific writing. This takes some training and knowledge of, for example, the standards of scientific writing within a scientific community or of the journal that you would like to publish in. The same is true for software.
GFZ: In the description of the courses you say that writing code in the geosciences needs to follow minimal requirements. Such as?
Hammitzsch: Here I am referring to specific standard tools and processes from software engineering, such as version control, data handling, and the use of automated processes. Minimal requirements also need to be met for structuring code, modularizing it, and avoiding duplication. Of a hundred findings from software engineering, I as a coding scientist may need to consider only about ten to improve my work. But I do need to know about those ten.
What also needs to be avoided is premature optimization. Many people believe that they need to build their code in a generic way and optimize it at a very early stage. I say it is better to focus on solving the problem first and to make the code more elegant later. Software cannot stand on its own, written in some cryptic code. A consistent naming scheme for variables and functions is very important. This cannot be taken for granted; you often see very odd names. Software also needs supplementary information in the form of transparent documentation.
This, of course, is some extra work. But it is worthwhile, especially if you write software within a team or if, for example, someone leaves the team. It is also worthwhile if you yourself would like to be able to return to your own code after half a year. Then you come to understand the importance of good structure and documentation.
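As an illustration of these points (not taken from the workshop material), a minimal Python sketch might look as follows. It shows descriptive names, short documentation, and one shared function instead of copied code. All function names and sample values are hypothetical.

```python
"""Illustrative sketch: documented, reusable functions instead of duplicated code."""


def celsius_to_kelvin(temperature_celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    return temperature_celsius + 273.15


def mean_temperature_kelvin(temperatures_celsius):
    """Return the mean of a list of Celsius readings, expressed in kelvin."""
    if not temperatures_celsius:
        raise ValueError("at least one reading is required")
    # The conversion lives in one place and is reused, not copied.
    kelvin_values = [celsius_to_kelvin(t) for t in temperatures_celsius]
    return sum(kelvin_values) / len(kelvin_values)


# Hypothetical sample data, used only to show that both data sets
# go through the same function.
borehole_readings_celsius = [10.0, 12.0, 14.0]
surface_readings_celsius = [0.0, 20.0]

print(mean_temperature_kelvin(borehole_readings_celsius))
print(mean_temperature_kelvin(surface_readings_celsius))
```

A colleague who inherits this code can tell from the names and docstrings what each function expects and returns, and a change to the conversion needs to be made in only one place.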
GFZ: How can you ensure that software remains accessible when its originators leave an institution?
Hammitzsch: The use of web-based platforms and tools for managing software projects and software development processes is essential. Examples of such platforms are GitHub, Bitbucket, and SourceForge. They support collaboration between two colleagues sharing an office as well as within large international teams. If used free of charge, however, the software on these platforms is openly accessible. Moreover, these web-based tools allow collaboration and exchange among developers as well as between developers and non-programming users.
As with written publications, it is also possible to work within a protected environment, for example by using GitLab as an institutional tool. For an institution it might make sense to introduce rules and guidelines for this, so that coding scientists have guidance regardless of the size of the software.
GFZ: For written publications, the digital object identifier (DOI) was introduced, which allows a publication to be referenced and attributed to its author. How does this work for software?
Hammitzsch: DOIs are also used for software. It is relatively well known that versions of software on GitHub can be published as a simple zip file via Zenodo, a platform for the publication, shared use, and long-term storage of data and software. The record also includes metadata such as who the authors are, what a specific part of the software is called, which version is available, and so forth.
The zip file, together with its metadata, is archived in a database and is given a number: the DOI. For long-term storage and citation this may be sufficient but, unfortunately, the software is then taken out of the context of its project, its ecosystem. A lot of information is lost when you apply solutions intended for publishing static PDFs and printed publications to software. Such solutions simply fail to do justice to the dynamic character of software.
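To make the idea of such a metadata record concrete, here is a minimal Python sketch, assuming only the kinds of fields mentioned above (authors, name, version). The class and field names are hypothetical and are not Zenodo's actual schema; the DOI itself is assigned by the archive and is therefore not part of the sketch.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SoftwareReleaseMetadata:
    """Hypothetical metadata that accompanies one archived software release."""

    title: str                      # name of the software or component
    version: str                    # version of this particular release
    authors: list = field(default_factory=list)

    def to_json(self):
        """Serialize the record so it can be stored next to the zip file."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values for illustration only.
record = SoftwareReleaseMetadata(
    title="example-model",
    version="1.2.0",
    authors=["A. Author", "B. Author"],
)
print(record.to_json())
```

The sketch also shows the limitation described above: the record captures one frozen version, not the history, issues, or discussions of the surrounding project.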
GFZ: How do people deal with this?
Hammitzsch: Often, people still publish in print. They simply write about their software in a scientific article, and everybody who uses the software is then asked to cite this publication. But this way there is no guarantee of transparency or reproducibility, because outdated publications are sometimes cited again and again. The software behind a publication, however, often evolves far beyond the state described there. There is no guarantee that older or newer versions of the software provide the functionality a user is looking for. There is still a big gap.
GFZ: Are the institutions on the right track to solve this problem?
Hammitzsch: I think that software is still treated improperly, given that it is an elementary and indispensable part of research today. Gaining knowledge often depends significantly on the software used. Scientific software and the related work need to be given due credit within research.
GFZ: What example of best practice in the use of software would you like to point out?
Hammitzsch: Currently, the GFZ section Hydrology is planning to introduce version control systems according to defined rules. In the future, version control will be mandatory in this section for software on which two or more developers work together, for software that is considered a tool for the whole working group, and for all new software projects, as well as for software-writing PhD students and staff members.
Furthermore, there are software projects like SeisComP3, originally developed for the GEOFON program to process seismic data. By now, even the UN uses it for monitoring the nuclear-test-ban treaty. Another example is EQUATOR II, a system that brings together earthquake parameters, context information, and the evaluations of experts. Both projects rest on a solid foundation, and their maintenance and re-use are guaranteed.
For me, exchanging best practices and learning from each other are of major importance. It is simply good to know which software is being developed where. Our workshops are also intended to provide a platform for this. Sometimes even knowing about a tiny script or a piece of source code can help someone a lot. Much more could be done in this regard to make use of the available potential.
GFZ: How do you as an IT expert work together with the scientists?
Hammitzsch: When I, as an IT expert with a background in software engineering, work together with scientists, I first of all need to understand where they want to go. What insights do they want to gain? This is an iterative process that needs a lot of communication. There is never only one solution but a number of them, with different complexities and different pros and cons, from which you need to choose the most suitable one.
Ideally, software experts are involved right from the beginning of a project to talk about requirements and feasibility. This is especially true when software plays an important role within a project, and here I am not talking about standard software solutions but about those that are designed for a specific application. As with data, it must be made clear how software is managed within a project.
GFZ: Do you see a tendency towards more digitalized processes in the geosciences, for which new software solutions need to be developed?
Hammitzsch: Definitely. More and more software is used, and I believe that research no longer works without software. Most of the time, in one way or another, you have to deal with the fact that data need to be recorded. And today, data are digital and are collected by software systems. After data collection, the data processing chain begins, which leads to new findings. Data cannot exist without a system, and today this is usually a software system.
GFZ: What opportunities does digitization offer in the geosciences?
Hammitzsch: Take early warning as an example. The best-known system is the German-Indonesian Tsunami Early Warning System GITEWS, which was implemented under the direction of the GFZ. But there are also lesser-known European projects, with the GFZ as a partner, for the Atlantic and Mediterranean regions. These systems mostly operate in a highly automated way.
A human operator in front of these systems could check and evaluate all incoming data 24/7. But this work can also be done by a software system. When the build-up of a tsunami is calculated from actual measured data, a message is automatically generated with the important information, for example estimated wave heights and arrival times at a shore. An operator now only needs to check the results of the simulation and does not need to do everything herself. Expert knowledge is still in demand, also as a corrective, but the software system does a lot of the work. The same principle can be applied to several other systems.
GFZ: What is your own background? How did you become a software expert in geosciences?
Hammitzsch: I started with professional training at Siemens as an Associate Software Engineer, with a strong focus on software development. Afterwards I studied Communication Systems in Berlin and later Software Systems Engineering in Potsdam at the Hasso Plattner Institute. During this time I also worked in industry, first as a student and later full-time, developing web-based applications, before I came to the GFZ in 2008 to develop and implement software systems in the field of early warning systems for natural hazards.
GFZ: When will the next software writing courses take place?
Hammitzsch: In the third week of May. Besides us, the Potsdam Institute for Climate Impact Research, the University of Potsdam, and colleagues from partner institutes and universities are involved, as well as colleagues from other countries via the Software Carpentry network. The idea is to provide fundamentals as well as institute-specific basics in hands-on lessons.
The interview was conducted by Ariane Kujau.