To do something with OpenStreetMap data, we have to download it first. This can be the entire data from planet.openstreetmap.org or a smaller extract from a provider like Geofabrik. If you’re doing this manually, it’s easy: a single call to wget does it, or you can download it from the browser. If you want to script it, it’s a bit harder. You have to handle error conditions, think about what can go wrong, and make sure everything can happen unattended. So, to make sure we can do this, we write a simple bash script.
The goal of the script is to download the OSM data to a known file name, and return 0 if successful, or 1 if an error occurred. Also, to keep track of what was downloaded, we’ll make two files describing what was downloaded and what state it’s in: state.txt and configuration.txt. These will be compatible with osmosis, the standard tool for updating OpenStreetMap data.
Before doing anything else, we specify that this is a bash script, and that if anything goes wrong, the script is supposed to exit.
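A minimal sketch of that opening, assuming bash lives at /bin/bash:

```bash
#!/bin/bash
# Abort the script as soon as any command exits with a non-zero status.
set -e
```

One caveat: set -e only looks at the last command of a pipeline, so a failing curl piped into another command won’t stop the script unless set -o pipefail is also given.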
Next, we put the information about what’s being downloaded, and where, into variables. It’s traditional to use the Geofabrik Liechtenstein extract for testing, but the same scripts will work with the planet.
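Something like the following would do. The variable names are hypothetical; the URLs are Geofabrik’s Liechtenstein extract and the md5 file published alongside it, and data.osm.pbf matches the file name used with osmium fileinfo later on.

```bash
#!/bin/bash
# Hypothetical variable names. The URLs point at Geofabrik's Liechtenstein
# extract and the md5 file published next to it.
PLANET_URL="https://download.geofabrik.de/europe/liechtenstein-latest.osm.pbf"
PLANET_MD5_URL="${PLANET_URL}.md5"
# Local name to save the data under, whatever was downloaded.
PLANET_FILE="data.osm.pbf"
```

For the planet, PLANET_URL would be https://planet.openstreetmap.org/pbf/planet-latest.osm.pbf, which also has a .md5 file next to it.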
We’ll be using curl to download the data, and every time we call it, we want to add the options -s and -L. Respectively, these make curl silent and cause it to follow redirects. Two files are needed: the data and its md5 sum. The md5 file looks something like 27f7... liechtenstein-latest.osm.pbf. The problem with this is that we’re saving the file as data.osm.pbf, so the file name in the md5 file doesn’t match. A bit of manipulation with cut fixes this.
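A sketch of the md5 step, wrapped in a hypothetical fetch_md5 function. It assumes the md5 file uses the usual md5sum layout (hash, two spaces, file name). The -f flag, which makes curl fail on HTTP errors, isn’t one of the options mentioned above; it’s added here so a 404 stops the script instead of saving an error page as the md5.

```bash
#!/bin/bash
set -e

# -s (silent) and -L (follow redirects) are the options described above.
# -f, which makes curl exit non-zero on HTTP errors such as 404, is an
# addition of this sketch.
CURL=(curl -s -L -f)

# Hypothetical helper: download an md5 file whose line looks like
# "27f7...  liechtenstein-latest.osm.pbf" and save it as "<target>.md5"
# with the file name column replaced by <target>.
fetch_md5() {
    local url=$1 target=$2 line
    # A plain assignment, so set -e sees curl's exit status.
    line=$("${CURL[@]}" "$url")
    # cut keeps only the first space-separated field: the hash itself.
    printf '%s  %s\n' "$(cut -d ' ' -f 1 <<< "$line")" "$target" > "${target}.md5"
}
```

Called as fetch_md5 https://download.geofabrik.de/europe/liechtenstein-latest.osm.pbf.md5 data.osm.pbf, it would write data.osm.pbf.md5 containing the upstream hash followed by our own file name.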
The reason for downloading the md5 first is that it’s small, so the data download starts almost immediately afterwards. This shortens the gap between the two downloads starting, making it less likely that the server will publish a new version in between.
The next step is easy: download the data and check that the download wasn’t corrupted. It helps to have a good connection here.
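A sketch of this step as another hypothetical helper, fetch_data. It assumes GNU md5sum, whose -c option reads the rewritten "hash  data.osm.pbf" line described above and exits non-zero if the checksum doesn’t match; set -e then stops the script. As before, -f is an addition to the -s and -L options from the text.

```bash
#!/bin/bash
set -e

# -s and -L as described in the text; -f is an addition of this sketch.
CURL=(curl -s -L -f)

# Hypothetical helper: download the data to <target>, then verify it
# against "<target>.md5". md5sum -c returns non-zero on a mismatch.
fetch_data() {
    local url=$1 target=$2
    "${CURL[@]}" -o "$target" "$url"
    md5sum -c "${target}.md5"
}
```

On success md5sum prints a line like "data.osm.pbf: OK"; on a corrupted download it reports FAILED and the script exits.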
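Libosmium is a popular library for manipulating OpenStreetMap data, and the osmium command built on it can show metadata from the header of the file. The command osmium fileinfo data.osm.pbf prints this metadata, which for this extract includes the osmosis replication properties: osmosis_replication_base_url, osmosis_replication_sequence_number and osmosis_replication_timestamp.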
The osmosis properties tell us where to go for updates to the data we downloaded. Even though we don’t need the updates for this task, it’s useful to store this in the state.txt and configuration.txt files mentioned above.
Rather than parsing osmium’s output, we can use its option to extract just one field. We use this to get the base URL and save it to configuration.txt.
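A sketch of the configuration.txt step, using a hypothetical write_configuration function. It assumes osmium-tool’s fileinfo command, whose -g (--get) option prints a single value, and the variable name header.option.osmosis_replication_base_url (osmium fileinfo --show-variables should list the names your version supports). The baseUrl= line follows the format osmosis uses in its replication configuration.txt.

```bash
#!/bin/bash
set -e

# Hypothetical helper: read the replication base URL from the PBF header
# and write it to an osmosis-style configuration.txt.
# Assumes osmium-tool with "fileinfo -g header.option.<name>" support.
write_configuration() {
    local pbf=$1 base_url
    base_url=$(osmium fileinfo -g header.option.osmosis_replication_base_url "$pbf")
    # Stop if the header does not carry a replication URL.
    [ -n "$base_url" ]
    # osmosis reads the replication directory from the baseUrl property.
    printf 'baseUrl=%s\n' "$base_url" > configuration.txt
}
```

For a Geofabrik extract this should produce a single line such as baseUrl=https://download.geofabrik.de/europe/liechtenstein-updates.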
Replication sequence numbers need to be represented as a three-tiered directory structure, for example 123/456/789. By taking the number, padding it to 9 characters with 0s, and doing some sed magic, we get this format, so sequence 2785 becomes 000/002/785. From there, it’s easy to download the state.txt file representing the state of the data that was downloaded.
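A sketch of the sequence-number step with two hypothetical helpers: sequence_to_path does the padding and sed work, and fetch_state reads the base URL and sequence number from the header (again assuming osmium fileinfo’s -g option and the header.option.* names) and downloads the matching state.txt. The -f flag is, as before, an addition to the options from the text.

```bash
#!/bin/bash
set -e

# -s and -L as described in the text; -f is an addition of this sketch.
CURL=(curl -s -L -f)

# Hypothetical helper: 2785 -> 000/002/785.
# printf pads to 9 digits; sed splits them into three groups of three.
sequence_to_path() {
    printf '%09d\n' "$1" | sed 's#^\(...\)\(...\)\(...\)$#\1/\2/\3#'
}

# Hypothetical helper: download the state.txt matching the PBF header.
# Assumes osmium-tool with "fileinfo -g header.option.<name>" support.
fetch_state() {
    local pbf=$1 base_url seq
    base_url=$(osmium fileinfo -g header.option.osmosis_replication_base_url "$pbf")
    seq=$(osmium fileinfo -g header.option.osmosis_replication_sequence_number "$pbf")
    "${CURL[@]}" -o state.txt "${base_url}/$(sequence_to_path "$seq").state.txt"
}
```

For example, sequence_to_path 2785 prints 000/002/785, so the state file for that sequence is fetched from the base URL followed by /000/002/785.state.txt.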
After all this has been run, we’ve got the data, its md5 file, and the state and configuration that correspond to the download.
Combining the code fragments, adding some comments, and cleaning up the files produces the final shell script.