Announcing pictogpx

I have finished a program that turns a set of geotagged photos into a .gpx trace. The program is called pictogpx and can be found on GitHub.

Usage is fairly simple: give it the name of the .gpx file to create and a list of images.

I use it to interpolate between geotagged images so I don’t need to manually tag each one. It could also be used to see how you travelled from a set of photos.
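The interpolation idea can be sketched roughly as follows. This is not pictogpx's actual code: the function name and the (timestamp, lat, lon) tuple layout are assumptions made for illustration, and the sketch simply interpolates linearly by capture time between the two nearest tagged photos.

```python
from bisect import bisect_left


def interpolate_position(tagged, when):
    """Linearly interpolate (lat, lon) at time `when`.

    `tagged` is a list of (timestamp_seconds, lat, lon) tuples sorted by
    timestamp. This layout is an assumption made for the example.
    """
    times = [t for t, _, _ in tagged]
    if not tagged or when < times[0] or when > times[-1]:
        return None  # outside the tagged range: refuse to extrapolate
    i = bisect_left(times, when)
    if times[i] == when:
        return tagged[i][1], tagged[i][2]
    (t0, lat0, lon0), (t1, lat1, lon1) = tagged[i - 1], tagged[i]
    f = (when - t0) / (t1 - t0)  # fraction of the way from photo 0 to photo 1
    return lat0 + f * (lat1 - lat0), lon0 + f * (lon1 - lon0)
```

An untagged photo taken halfway in time between two tagged ones ends up halfway between them in space, which is only reasonable if you moved at a roughly steady pace.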

Posted in Uncategorized | Comments Off

Going jogging

Today while mapping I came across an intersection that shows the different types of artefacts that can be found in Bing imagery.

Posted in gis, osm | Comments Off

Loading a pgsnapshot schema with a planet, take 2

A while back I ran my own jxapi server, running with postgresql 8.4. I recently upgraded my server, and in the process I decided to re-import the database into postgresql 9.1.

There are two ways to load the data into the database. The first is to use osmosis with --write-pgsql. The second is to use osmosis with --write-pgsql-dump and then load the resulting files into postgresql with COPY statements. The second is reportedly faster and allows for more tuning of postgresql. Last time I used --write-pgsql, but this time around I decided to use --write-pgsql-dump.

Getting the data

Even though you don’t need the data immediately, you want to start downloading your planet.osm right away. You can do this with a program like curl, or you can try the planet torrents. The torrents should be at least as fast as downloading directly if you use a client that supports webseeds, which most do. Most torrent clients also support throttling the bandwidth used, which is useful if you want to use your internet connection while downloading.

Installing osmosis

You want osmosis-0.39 or later, which can be found on bretth’s site. When I tried I was unable to download the -latest version and had to download -0.39 specifically.

Reading the planet with osmosis

If you finish your planet download early or intend to use the planet file for other purposes, you should recompress it from bzip2 to gzip, which, although less space efficient, is much faster to decompress. Do this with bunzip2 -c planet-xxxxxx.osm.bz2 | gzip -3 > planet-xxxxxx.osm.gz. This is fairly quick on a dual-core CPU since one core will decompress while the other compresses.
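If you would rather script the recompression, the same idea can be sketched in Python with the standard bz2 and gzip modules. The function name is made up for this example, and unlike the shell pipeline it runs in a single process, so it will not spread the work over two cores.

```python
import bz2
import gzip
import shutil


def recompress_bz2_to_gz(src_path, dst_path, level=3):
    """Stream a .bz2 file into a .gz file without loading it into memory."""
    with bz2.open(src_path, "rb") as src, \
            gzip.open(dst_path, "wb", compresslevel=level) as dst:
        # Copy in 1 MiB chunks; level=3 mirrors the `gzip -3` used above.
        shutil.copyfileobj(src, dst, 1024 * 1024)
```

It would be called as recompress_bz2_to_gz("planet-xxxxxx.osm.bz2", "planet-xxxxxx.osm.gz").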

Once you have osmosis and planet.osm, you want to start osmosis processing the planet file. The osmosis documentation is extensive. Before you start, you will want to allocate more RAM to osmosis. You do this by creating a ~/.osmosis file containing export JAVACMD_OPTIONS="-Xmx8G". You want to give osmosis as much RAM as you can spare, particularly if using InMemory node storage. I gave it 16 GB.

You may need to chmod +x bin/osmosis to continue.

Crafting osmosis command lines is something of a confusing art. The essential thing to remember is that order matters. You will end up with something like zcat osm/planet/planet-111109.osm.gz | ./osmosis-0.39/bin/osmosis --fast-read-xml file=- --log-progress interval=60 --write-pgsql-dump directory=pgsqldump enableBboxBuilder=no enableLinestringBuilder=yes nodeLocationStoreType=InMemory

Breaking this down into parts:

  • zcat osm/planet/planet-111109.osm.gz | ./osmosis-0.39/bin/osmosis: This reads the gzipped planet file and feeds it to osmosis. If you didn’t recompress to gzip, then use bzcat on the bzip.
  • --fast-read-xml file=-: This reads from the standard input (“-”) and produces an output stream which is then used by…
  • --log-progress interval=60: This takes the output stream from the --fast-read-xml task and prints progress reports every 60 seconds. It sends the output stream to the next task unchanged.
  • --write-pgsql-dump directory=pgsqldump enableBboxBuilder=no enableLinestringBuilder=yes nodeLocationStoreType=InMemory: This takes the output stream and converts it to pgsql dump files in the directory pgsqldump. Let’s take a look at the options:
    • directory=pgsqldump: The directory where the output will be saved. This needs to be local to the machine, as network latency will kill performance. As of late 2011 you need approximately 240 GB for these files.
    • enableBboxBuilder=no: The information generated by this setting is reportedly not used by jxapi, so we don’t need to generate it.
    • enableLinestringBuilder=yes: You can generate this on import to the database, but it makes the import more complicated. You have to do the appropriate analyze commands mid-import or it will take years to finish. Just set it to yes and have osmosis do the work at this stage.
    • nodeLocationStoreType=InMemory: How osmosis stores nodes. If you have enough memory, you want InMemory. If not, you want either TempFile or CompactTempFile. See the osmosis documentation for more information. When running with -Xmx12G, Java threw a garbage collection error towards the end of the nodes. I fixed it with export JAVACMD_OPTIONS="-Xmx16G -XX:-UseGCOverheadLimit" in my .osmosis file.
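Since the order of tasks matters, it can help to build the command as an ordered list before running it. The following is only a sketch: the function name and its defaults are assumptions for this example, and the task names and options are the ones from the breakdown above.

```python
import shlex


def osmosis_args(osmosis="./osmosis-0.39/bin/osmosis", interval=60,
                 directory="pgsqldump", store="InMemory"):
    """Return the osmosis argument list; tasks run in the order listed."""
    return [
        osmosis,
        # 1. read the planet XML from standard input
        "--fast-read-xml", "file=-",
        # 2. report progress and pass the stream on unchanged
        "--log-progress", f"interval={interval}",
        # 3. write the dump files, with each option attached to this task
        "--write-pgsql-dump", f"directory={directory}",
        "enableBboxBuilder=no", "enableLinestringBuilder=yes",
        f"nodeLocationStoreType={store}",
    ]


# Print the command so it can be checked before piping zcat into it.
print(" ".join(shlex.quote(a) for a in osmosis_args()))
```

Each task's options follow it directly, and each task consumes the stream produced by the one before it, which is why reordering the arguments changes what osmosis does.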

When you finally run the command, you want to use screen or another program to keep it running in the background. This took about 5 hours for me.

Installing postgresql

You need to install postgresql. This may vary between distributions, but on Ubuntu sudo apt-get install postgresql-9.1 postgresql-contrib-9.1 postgresql-9.1-postgis should get you everything you need.
If you don’t have them, you’ll also want to run sudo apt-get install screen openjdk-7-jre-headless to get the other programs you’ll need.

Optimizing the database

A pgsnapshot database differs from other databases in a few ways. Data loss on power outages can be tolerated since osmosis will just reprocess the diff file anyway, and it’s a big database. The postgresql wiki has a page on database tuning.

The important settings are shared_buffers, checkpoint_segments, and the write-ahead-log settings. There are also some settings you want to change specifically for the import.

I found the following settings worked for me with 16 GB of RAM:

shared_buffers = 2GB
work_mem = 128MB
maintenance_work_mem = 1GB
wal_level = minimal
synchronous_commit = off
checkpoint_segments = 64
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
default_statistics_target = 1000

The wal_level and synchronous_commit settings give the potential for data loss if a power outage occurs mid-update, but osmosis will re-run any diffs that were interrupted so this does not matter.

There are additional settings you can change for a faster import.

autovacuum = off
fsync = off

The first turns off auto-vacuum during the import and allows you to run a vacuum at the end. The second will introduce data corruption in case of a power outage and is dangerous. If you have a power outage while importing the data you will have to drop the data from the database and re-import, but it’s faster. Just remember to change these settings back after importing. fsync has no effect on query times once the data is loaded.

Also remember to restart postgresql with /etc/init.d/postgresql restart or equivalent.

You may get an error and need to increase the shared memory size. Edit /etc/sysctl.d/30-postgresql-shm.conf and run sudo sysctl -p /etc/sysctl.d/30-postgresql-shm.conf. I use kernel.shmmax=17179869184 and kernel.shmall=4194304 for a 16GB segment size.
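The two kernel values are related: shmmax is a size in bytes, while shmall is counted in memory pages. A small sketch of the arithmetic, assuming the common 4096-byte page size (check yours with getconf PAGE_SIZE); the function name is made up for this example.

```python
PAGE_SIZE = 4096  # bytes; an assumption, verify with `getconf PAGE_SIZE`


def shm_settings(segment_gib, page_size=PAGE_SIZE):
    """Return (kernel.shmmax, kernel.shmall) for a segment of the given size."""
    shmmax = segment_gib * 1024 ** 3  # maximum segment size, in bytes
    shmall = shmmax // page_size      # total shared memory, in pages
    return shmmax, shmall


print(shm_settings(16))  # (17179869184, 4194304)
```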

Setting up the database

The jxapi documentation contains some commands to run to set up the database. These are only valid on 8.4 and can screw up the database on 9.1. You want to run these commands instead:

sudo su - postgres

createdb xapi

createlang plpgsql xapi # may already be done by default

createuser xapi # you do want this to be a superuser

psql -d xapi -c "ALTER ROLE xapi PASSWORD 'xapi';"

psql -d xapi -f /usr/share/postgresql/9.1/contrib/postgis-1.5/postgis.sql

psql -d xapi -f /usr/share/postgresql/9.1/contrib/postgis-1.5/spatial_ref_sys.sql # these two paths might change from version to version

psql -d xapi -c "CREATE EXTENSION hstore" # this is the big change from 8.4: what was hstore-new is now hstore, and extensions are enabled differently

psql -d xapi -f ~/osmosis-0.39/script/pgsnapshot_schema_0.6.sql

psql -d xapi -f ~/osmosis-0.39/script/pgsnapshot_schema_0.6_linestring.sql # adjust these two paths to wherever you installed osmosis

psql -d xapi -c "CREATE INDEX idx_nodes_tags ON nodes USING GIN(tags);"

psql -d xapi -c "CREATE INDEX idx_ways_tags ON ways USING GIN(tags);"

psql -d xapi -c "CREATE INDEX idx_relations_tags ON relations USING GIN(tags);"

exit will take you back to the original user.

Setting up the import

The pgsnapshot_load_0.6.sql file requires some changes before use, so place a copy of it in the pgsqldump directory. Open it up and comment out the SELECT DropGeometryColumn, SELECT AddGeometryColumn, UPDATE ways SET bbox and SELECT MakeLine(c.geom) AS way_line statements. Also add \timing before the \copy statements to track how long it takes.

It is also possible to comment out the CLUSTER statements. CLUSTER places nearby nodes near each other on disk. This is useful if you are running bbox queries against your jxapi, but not very useful if you are filtering by tags.

The CREATE INDEX idx_ways_bbox ON ways USING gist (bbox); statement can be commented out as jxapi does not use this information.

Running the import

In the pgsqldump directory as the postgresql user, run psql -d xapi -f pgsnapshot_load_0.6.sql

If you placed pgsnapshot_load_0.6.sql in a different directory, you still need to run psql from the pgsqldump directory, not from the directory where the .sql file is. It will then run. Go and find something else to do; this will take some time.

users.txt:              0.67s
nodes.txt:              4h54m
ways.txt:               3h3m
way_nodes.txt:          50m
relations.txt:          1m14s
relation_members.txt:   29.1s

Creating indexes:

node primary key:                48m
way primary key:                 12m
way_nodes primary key:           55m
relations primary key:           14s
relation_members primary key:     9s
nodes geometry index:          14h8m
way_nodes node ID index:       1h29m
relation_members member index:   15s
ways bounding box index:         n/a
way linestring index:            52m
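To get a feel for where the time goes, the durations above can be added up. This is just arithmetic on the figures in the two lists, transcribed by hand; the helper name is made up for this example.

```python
import re


def to_seconds(text):
    """Convert a duration such as '14h8m' or '29.1s' to seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(float(n) * units[u]
               for n, u in re.findall(r"([\d.]+)\s*([hms])", text))


# Durations transcribed from the two lists above.
copy_times = ["0.67s", "4h54m", "3h3m", "50m", "1m14s", "29.1s"]
index_times = ["48m", "12m", "55m", "14s", "9s",
               "14h8m", "1h29m", "15s", "52m"]

copy_total = sum(map(to_seconds, copy_times))
index_total = sum(map(to_seconds, index_times))
print(round(copy_total / 3600, 1), round(index_total / 3600, 1))  # 8.8 18.4
```

Loading the data took a little under 9 hours and building the indexes a little over 18, with the nodes geometry index alone accounting for 14 of those.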

Next are the two CLUSTER statements. I aborted these after half a day. CLUSTER requires 1-2x the table size in free disk space, which I did not have. Based on disk usage I estimated it would take 2-3 days to CLUSTER. Reportedly the correct way to do this is to pre-sort the files before loading.

Do not forget the ANALYZE. Without it, queries will not finish in your lifetime. ANALYZE took me 20 minutes.

Setting up jxapi

Next you want to set up jxapi. This can be done concurrently with the other steps but you will be unable to test until the ANALYZE command is finished.

First of all, install tomcat7 with sudo apt-get install tomcat7. Then download the latest version of jxapi from GitHub and copy it as xapi.war to /var/lib/tomcat7/webapps.

You’re done! Enjoy your data, and don’t forget to turn fsync and autovacuum back on.

Posted in osm | Comments Off

A simpler shapefile conversion

After my last marathon shapefile conversion, I thought a simpler example might be useful as to how to write a translation. To do so, I’m going to use the Surrey parkNaturalAreasSHP shapefile. This shapefile contains data on forested areas in Surrey parks.

My first step was establishing which shapefile fields might be useful. To do so, I ran ogr2osm on the shapefile with no translation. I got back 12 fields: AGE_CLASS, AREA_TYPE, CRE_BYE, CRE_DATE, EQ_NO, GEODB_OID, GLOBALID, MOD_BY, MOD_DATE, OBJECTID, PARK_NAME and SHAPE_AREA. Of these, the potentially interesting ones are AGE_CLASS, AREA_TYPE and PARK_NAME.

The first of these, AGE_CLASS, has four possible values which JOSM reports as {=104, Mature=1100, N/A=6, Young=214}. This appears to be the age of the stand of trees. Unfortunately there’s no corresponding tagging for this information, but there might be some day. My preferred method for this type of information is to add it in the surrey:* namespace, in this case as surrey:wood_age=*.

The second, AREA_TYPE has five possible values, {Forest-coniferous=161, Forest-deciduous=542, Forest-mixed=610, Grassland=47, Shrubland=64}.

There is debate over how to tag forests. Without going into excessive detail, I am of the school of thought that forested areas in parks should be tagged as natural=wood. This is consistent with what I’ve seen most other local mappers do, and is Approach 2 on the wiki. After checking with the wiki for how to tag coniferous vs. deciduous forests, I map the three forest types to natural=wood and wood=coniferous/deciduous/mixed.

I now have to consider how to tag AREA_TYPE=Grassland. Two obvious options are landuse=grass and landuse=meadow. After looking at 2010 on-leaf and 2008 off-leaf imagery, I can conclude that the areas are unmanaged and contain a mix of plants and are best described as a meadow. It helps to be very familiar with your imagery to make a determination like this without a site survey.

The last value is AREA_TYPE=Shrubland. Here it could be natural=heath or natural=scrub. Of these, I lean towards natural=scrub based on the imagery. There’s no clear dividing line between the two, but I see enough bush coverage that I decide scrub is more appropriate.

The last of the fields is PARK_NAME. This could be used to determine the name of a park if I had no other way, but I have received permission to copy from Surrey’s online maps which include park names. In addition, the lot shapefile I converted previously has park names.

An important theme in both this translation and the property lot translation is to strip off excess fields that are not relevant to OSM. Too many imports want to import all the fields, leading to unwieldy tagging schemes that are intimidating to mappers.
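The tagging decisions above can be sketched as a translation function. This assumes the filterTags(attrs) hook that ogr2osm translation files of this era use, taking a dictionary of shapefile attributes and returning a dictionary of OSM tags. Lowercasing the age value and skipping empty or N/A ages are my own assumptions for the example, not something the data requires.

```python
def filterTags(attrs):
    """Translate Surrey parkNaturalAreas attributes into OSM tags."""
    if not attrs:
        return {}

    # AREA_TYPE values mapped to the tags chosen in the text above.
    area_types = {
        "Forest-coniferous": {"natural": "wood", "wood": "coniferous"},
        "Forest-deciduous": {"natural": "wood", "wood": "deciduous"},
        "Forest-mixed": {"natural": "wood", "wood": "mixed"},
        "Grassland": {"landuse": "meadow"},
        "Shrubland": {"natural": "scrub"},
    }

    tags = {}
    tags.update(area_types.get(attrs.get("AREA_TYPE", ""), {}))

    # Keep the stand age in the surrey:* namespace.
    # Assumption: empty and N/A values are dropped, others are lowercased.
    age = attrs.get("AGE_CLASS", "")
    if age and age != "N/A":
        tags["surrey:wood_age"] = age.lower()

    # Every other field, including PARK_NAME, is deliberately discarded.
    return tags
```

Because the function builds a fresh dictionary and only copies what it wants, any field not mentioned is stripped automatically.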

Posted in Uncategorized | Comments Off

Writing a shp to osm conversion translation dictionary

ogr2osm is an amazing tool for converting shapefiles to osm files, but to make use of it you have to write a translation dictionary. The quality of your translation file will determine the quality of your tagging, and how useful your import is. To illustrate this, I am using the example cadLotsSHP from the City of Surrey. This is data for a city of 92 617 lots. Although this data will not be imported into OSM, it is useful to write a conversion as selected lots can be imported for parks and such.

Posted in Uncategorized | Comments Off

Simplifying NHN Waterways

When the NHN waterways were imported in BC, they were over-digitized and had too many nodes in them for the accuracy they had. This was detrimental to editing since areas far away from the city which had not been mapped by hand had a node density equal to that of the city.

Lately I’ve been bulk simplifying these waterways, as well as fixing up incorrect tagging.

Posted in Uncategorized | Comments Off

Converting MrSID to GeoTIFF

Recently I got the City of Surrey 2010 orthophotos in MrSID format on a portable hard drive. MrSID is a highly compressed imagery format, in this case with a 20:1 compression ratio. Even with this high compression ratio, the orthophotos are still nearly 9 GB.

1. Get some software

The biggest downside to MrSID is that it is patented and requires specialized software only available from LizardTech to process. The program we want is MrSID Decode, available at no cost for Windows, Linux and Solaris. In this case we want the Linux release, which we download and extract.

After extracting, we go to linux64 and extract another tarball. We ultimately want to end up in linux64/GeoExpressCLUtils-8.0.0.3065/bin

We then add this directory to the path and LD_LIBRARY_PATH with setenv PATH `pwd`:$PATH; setenv LD_LIBRARY_PATH `pwd`

The second piece of software we need is gdal.

2. Convert to GeoTIFF

We navigate to the directory where the MrSID is saved and then run mrsidgeodecode on it. This has a few options, but the key one is -wf. Without this option, it will lose the georeferencing.

Combining all of this, we run mrsidgeodecode -wf -i Surrey_10cm_utm10_2010.sid -o Surrey_10cm_utm10_2010.tif and wait. This process takes a few hours, and ends up with a 166 GB GeoTIFF.

3. Tiling the GeoTIFF

To be usable, the GeoTIFF needs to be tiled so that a small area can be viewed without reading the complete file. To do this, we use gdal_translate -of GTiff -co "TILED=YES" Surrey_10cm_utm10_2010.tif Surrey_10cm_utm10_2010_tiled.tif

Once this has completed, we can delete the untiled GeoTIFF.

4. Adding overviews

Depending on what we’re doing, we might want to add overviews. This will improve performance when viewing the GeoTIFF zoomed out. We do this with gdaladdo -r gauss Surrey_10cm_utm10_2010_tiled.tif 2 4 8 16 32 64

Posted in Uncategorized | Comments Off

Rectangular buildings in JOSM

Many buildings are rectangular, particularly in industrial areas. In fact, when you look at aerial imagery of industrial areas, it often looks like a giant set of boxes. There are a few ways to draw these buildings, but these are a few of the methods I’ve found quickest. When you’re drawing buildings in large industrial areas, you want the quickest and most accurate way to do so.

Posted in osm | Comments Off

Golf, from the air

Rweait’s post on golf courses got me thinking about mapping some myself, but from aerial photos. To start with, I selected a golf course near the Serpentine River in Surrey, BC.

Mapnik image Copyright OpenStreetMap contributors, CC-BY-SA.

Opening up this area in JOSM reveals that the golf course was mapped as a multipolygon, essentially entirely by me. The source tags also tell me that the water was imported from Surrey GIS data and the other features are from 2010 aerial photos. Of course, I already knew this, but it’s good to check if anyone has come along and mapped more updated info.

Looking at the orthos, I can quickly see that all the water present is accurately mapped, and what exists agrees with the photos, with the exception of one stream. I also know from experience with this imagery that I do not need to apply an offset as it is already correctly aligned.

The one stream that stands out is an import from the NHN database, which often contains outdated ditch information in agricultural areas. What is likely is that this golf course was once farmland and contained a ditch that followed that alignment. Regardless, knowing how it got there and that my imagery is more accurate and up to date, I go ahead and remove it.

The first tracing to do is the paths. highway=track is what is currently used and seems sensible, as well as agreeing with a wiki proposal. The tag information tools wouldn’t be very helpful here; they would just tell me there are lots of tracks and paths world-wide.

Tracing this many tracks takes a while, but once it’s done the course is starting to take shape. This step added about 1300 nodes, including adding some buildings and parking lots in the normal way.

Looking at one of the simpler holes, the one in the north-east corner of the course, I can see that putting greens are the lightest color in the 2010 orthophotos, the long fairways are a medium color, and the tee areas are the darkest green. The bunkers are very obvious, being sand. But how to tag these? Rweait’s post, taginfo, and the wiki all agree on bunker, tee, green and fairway, with the additional value of golf=hole for the holes themselves.

Posted in osm, surrey, Uncategorized | Comments Off

Surrey Addresses: A retrospection

For the last little while I’ve been working on importing the addresses from the City of Surrey GIS data to OpenStreetMap, and thought I’d share my thoughts after having successfully completed the import. Because this import was in an area with no addresses previously in the database, it was in many ways as simple as possible, as I didn’t have to worry about colliding with existing data. Address data is also very well-suited for imports as no one really likes collecting this data, even if it is very important.

Legal

For many imports, this section would be a lengthy discussion of the license and compatibility, but not so here. Surrey has taken the forward-thinking step of releasing their GIS data under the Public Domain Dedication and License, an excellent license for municipal data. As rweait said on talk-ca, this is good not just for OSM, but good for the citizens of Surrey as it allows anyone to use the data for their project.

How Surrey Helped

Surrey’s GIS department was very helpful in preparing this import and in helping me understand their data, which was essentially a direct dump from their internal database. One of the changes I made after talking to them was to use the ADDRID field instead of the GLOBALID field, a change that should make future additions easier.

The Conversion

I elected to use ogr2osm to convert, using a virtual machine running Debian. I did this not because it was the easiest, but because it uses a Python function for tag translation. This let me turn road names like “OCEAN PARK RD” into “Ocean Park Road”.
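A name expansion of this kind might look roughly like the sketch below. The suffix table is a small hypothetical sample rather than the list used for the import, and the function name is made up for this example.

```python
# Hypothetical, incomplete suffix table for illustration only.
SUFFIXES = {
    "RD": "Road",
    "ST": "Street",
    "AVE": "Avenue",
    "DR": "Drive",
    "CRES": "Crescent",
}


def expand_road_name(name):
    """Title-case an upper-case road name and expand its final suffix."""
    words = name.split()
    if not words:
        return ""

    def fix_case(word):
        # Leave mixed tokens such as '152A' alone; only recase plain words.
        return word.capitalize() if word.isalpha() else word

    head = [fix_case(w) for w in words[:-1]]
    tail = SUFFIXES.get(words[-1].upper(), fix_case(words[-1]))
    return " ".join(head + [tail])


print(expand_road_name("OCEAN PARK RD"))  # Ocean Park Road
```

Only the last word is treated as a suffix, so a name like “ST ANDREWS RD” would not have its leading “ST” expanded to “Street”.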

Uploading nearly 100k nodes

I did the actual upload by splitting the .osm file into parts with JOSM and uploading with JOSM, with each part consisting of approximately 20k nodes, uploaded in pieces of 500. I had one error on the second upload which I had to revert and re-upload. It took over a day to get everything uploaded since I had to time my uploads for off-peak hours. If I were doing it over again, I’d use Upload.py or one of the other scripts.
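The splitting itself is simple to picture. The sketch below uses a list of integers as a stand-in for the address nodes and only shows the arithmetic of the split; it does not read .osm files or talk to the OSM API.

```python
def chunks(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


node_ids = list(range(100_000))          # stand-in for ~100k address nodes
parts = list(chunks(node_ids, 20_000))   # one part per upload session
pieces = [len(list(chunks(part, 500))) for part in parts]

print(len(parts), pieces[0])  # 5 40
```

At roughly 100k nodes that is five parts of 40 pieces each, so a failure part-way through only costs one part rather than the whole upload.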

Conclusions

Overall, the import was a success and saved collecting about a hundred thousand address nodes.

Posted in osm | Comments Off