Software like Tilemaker and Planetiler is great for generating a complete set of tiles, updated about once a day, but they can’t handle minutely updates. Most users are fine with daily or slower updates, but OSM.org users are different, and minutely updates are critical for them. All the current approaches to minutely map tiles involve loading the changes and regenerating any tiles whose data may have changed. I used osm2pgsql, the standard way to load OSM data for rendering, but the results should be applicable to other approaches, including different schemas.
Using the Shortbread schema from osm2pgsql-themepark, I loaded the data with osm2pgsql and ran updates. osm2pgsql can output a list of changed tiles (“expired tiles”), and I did this for zoom 1 to 14 for each update. Because I was running this on real data, sometimes a particularly large update took longer than 60 seconds to process, in which case the next run would combine multiple updates from OSM. Combining multiple updates reduces how much work the server has to do at the cost of less frequent updates. This has been well documented since 2012, but no one has looked at the impact on expired tiles from combining updates.
To do this testing I was using a Hetzner server with 2x1TB NVMe drives in RAID0, 64GB of RAM, and an Intel i7-8700 @ 3.2 GHz. osm2pgsql 1.10 was used, the latest version at the time. The version of themepark was equivalent to the latest version.
The updates were run for a week from 2023-12-30T08:24:00Z to 2024-01-06T20:31:45Z. There were some interruptions in the updates, but I did an update without expiring tiles after the interruptions so they wouldn’t impact the results.
To run the updates I used a simple shell script.
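A minimal sketch of such a loop, assuming osm2pgsql-replication is managing the updates; the flags, file names, and the `DRY_RUN` stub are my assumptions, not the original script:

```shell
#!/bin/sh
# Sketch of a minutely update loop. The osm2pgsql-replication invocation and
# file naming are assumptions; DRY_RUN=1 (the default here) stubs the real
# update so the loop logic can be exercised anywhere.
DRY_RUN=${DRY_RUN:-1}
RUNS=${RUNS:-2}

do_update() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would update, expiring z1-z14 tiles to $1" >> update.log
    else
        osm2pgsql-replication update --once \
            -- --expire-tiles=1-14 --expire-output="$1"
    fi
}

i=0
while [ "$i" -lt "$RUNS" ]; do
    start=$(date +%s)
    do_update "expire-$(date -u +%Y%m%dT%H%M%SZ).list"
    elapsed=$(( $(date +%s) - start ))
    # Wait out the rest of the minute. If a diff took over 60 seconds to
    # apply, the next run consumes a combined diff covering several minutes.
    if [ "$DRY_RUN" -eq 0 ] && [ "$elapsed" -lt 60 ]; then
        sleep $((60 - elapsed))
    fi
    i=$((i + 1))
done
```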
Normally I’d set up a systemd service and timer as described in the manual, but this setup was an unusual test where I didn’t want it to automatically restart.
I then used grep to count the number of expired tiles at each zoom in each file, creating lists for each zoom.
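The counting step can be sketched like this. The expiry-list layout of one `z/x/y` tile per line matches osm2pgsql’s expire output, but the file names are assumptions, and two tiny sample lists stand in for the real ones:

```shell
#!/bin/sh
# Sample expiry lists, one z/x/y tile per line, as osm2pgsql writes them.
printf '13/4000/2000\n14/8000/4000\n14/8000/4001\n' > expire-0001.list
printf '14/8000/4000\n' > expire-0002.list

# For each zoom, build one file with a per-update count of expired tiles.
for z in $(seq 1 14); do
    for f in expire-*.list; do
        grep -c "^$z/" "$f"
    done > "zoom-$z.counts"
done
```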
This let me use a crude script to get percentiles and the mean, and assemble them into a CSV.
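A crude sort/awk pass in the same spirit, shown here on synthetic data (`seq` stands in for a real zoom count file; the percentile choices match the tables below):

```shell
#!/bin/sh
# Mean and percentiles from a per-update tile-count list, one count per line.
seq 1 100 > counts.txt

sort -n counts.txt | awk '
    { v[NR] = $1; sum += $1 }
    END {
        split("0 1 5 25 50 75 95 99 100", pct, " ")
        line = sum / NR                   # mean first, then the percentiles
        for (i = 1; i <= 9; i++) {
            idx = int(pct[i] / 100 * (NR - 1)) + 1
            line = line "," v[idx]
        }
        print line
    }' > percentiles.csv
cat percentiles.csv
```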
A look at the percentiles for zoom 14 immediately reveals some outliers: a mean of 249 tiles, median of 113, p99 of 6854, and p100 of 101824. I was curious what was making this so large and found the p100 was with sequence number 5880335, which was also the largest diff. This diff was surrounded by normal-sized diffs, so it wasn’t a lot of data. The data consumed would have been the diff 005/880/336.
A bit of shell magic got me a list of changesets that did something other than add a node: osmium cat 005880336.osc.gz -f opl| egrep -v '^n[[:digit:]]+ v1' | cut -d ' ' -f 4 | sort | uniq | sed 's/c\(.*\)/\1/'
Looking at the changesets with achavi, 145229319 stood out as taking some time to load. Two of the nodes modified were information boards that were part of the Belarus-Ukraine and Belarus-Russia borders. Thus, this changeset changed the Russia, Ukraine, and Belarus polygons. As these are large polygons, only the tiles along the edge were considered dirty, but that is still a lot of tiles!
After validating that the results make sense, I got the following means and percentiles, which may be useful to others.
Tiles per minute, with updates every minute
zoom | mean | p0 | p1 | p5 | p25 | p50 | p75 | p95 | p99 | p100 |
---|---|---|---|---|---|---|---|---|---|---|
z1 | 3.3 | 1 | 2 | 2 | 3 | 3 | 4 | 4 | 4 | 4 |
z2 | 5.1 | 1 | 2.6 | 3 | 4 | 5 | 6 | 7 | 7 | 10 |
z3 | 9.1 | 1 | 4 | 5 | 8 | 9 | 11 | 13 | 15 | 24 |
z4 | 12.8 | 1 | 5 | 7 | 10 | 12 | 15 | 20 | 24 | 52 |
z5 | 17.1 | 1 | 5 | 8 | 13 | 17 | 20 | 28 | 35 | 114 |
z6 | 21.7 | 1 | 6 | 9 | 15 | 21 | 26 | 37 | 48 | 262 |
z7 | 25.6 | 1 | 6 | 9 | 17 | 24 | 31 | 46 | 63 | 591 |
z8 | 29.2 | 1 | 6 | 9 | 17 | 26 | 34 | 55 | 92 | 1299 |
z9 | 34.5 | 1 | 6 | 10 | 18 | 28 | 37 | 64 | 173 | 2699 |
z10 | 44.6 | 1 | 7 | 10 | 20 | 31 | 41 | 80 | 330 | 5588 |
z11 | 65.6 | 1 | 7 | 12 | 23 | 35 | 49 | 125 | 668 | 11639 |
z12 | 111 | 1 | 8 | 14 | 29 | 44 | 64 | 238 | 1409 | 24506 |
z13 | 215 | 1 | 10 | 18 | 40 | 64 | 102 | 527 | 3150 | 52824 |
z14 | 468 | 1 | 14 | 27 | 66 | 113 | 199 | 1224 | 7306 | 119801 |
Based on historical OpenStreetMap Carto data, the capacity of a rendering server is about 1 tile/s per hardware thread; current performance is somewhat slower. The new OSMF general purpose servers are mid-range servers with 80 threads, so they should be able to render about 4800 tiles per minute. This means that approximately 95% of the time the server will be able to complete re-rendering tiles within the 60 seconds between updates. A couple of times an hour it will be slower.
As mentioned earlier, when updates take over 60 seconds, multiple updates combine into one and reduce the amount of work to be done. I simulated this by merging every k files together. Continuing the theme of patched-together scripts, I did this with a shell script based on a StackExchange answer.
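The merging can be sketched as below; the file names are assumptions, and `sort -u` does the deduplication, since a tile expired by several diffs only needs rendering once:

```shell
#!/bin/sh
# Merge every k=2 expiry lists into one deduplicated list.
# Two small sample lists stand in for real osm2pgsql output.
printf '14/100/200\n14/100/201\n' > expire-0001.list
printf '14/100/200\n14/100/202\n' > expire-0002.list

k=2
i=0
batch=""
for f in expire-*.list; do
    batch="$batch $f"
    i=$((i + 1))
    if [ "$i" -eq "$k" ]; then
        # Tiles expired by more than one diff collapse to a single entry
        sort -u $batch > "merged-$f"
        batch=""
        i=0
    fi
done
```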
Running the results through the same process for percentiles generates numbers in tiles per update. But updates are only 1/k as frequent, so in terms of work done per unit time, all the numbers need to be divided by k. Here are the results for a few values of k.
k=2
zoom | mean | p0 | p1 | p5 | p25 | p50 | p75 | p95 | p99 | p100 |
---|---|---|---|---|---|---|---|---|---|---|
z1 | 1.7 | 0.5 | 1 | 1 | 1.5 | 1.5 | 2 | 2 | 2 | 2 |
z2 | 2.5 | 0.5 | 1 | 1.5 | 2 | 2.5 | 3 | 3.5 | 3.5 | 5 |
z3 | 4.5 | 0.5 | 2 | 2.5 | 4 | 4.5 | 5.5 | 6.5 | 7.5 | 12 |
z4 | 6.4 | 0.5 | 2.5 | 3.5 | 5 | 6 | 7.5 | 10 | 12.5 | 26 |
z5 | 8.6 | 0.5 | 2.5 | 4 | 6.5 | 8.5 | 10 | 14 | 17.5 | 51 |
z6 | 10.9 | 0.5 | 2.9 | 4.5 | 7.5 | 10.5 | 13 | 18.5 | 24.5 | 107 |
z7 | 13.0 | 0.5 | 3 | 4.5 | 8.5 | 12 | 15.5 | 23 | 32 | 239 |
z8 | 14.9 | 0.5 | 3 | 4.5 | 9 | 13 | 17 | 27 | 50 | 535 |
z9 | 17.8 | 0.5 | 3 | 5 | 9.5 | 14 | 18.5 | 32 | 97 | 1127 |
z10 | 24 | 0.5 | 3 | 5 | 10 | 15.5 | 20.5 | 41 | 192 | 2347 |
z11 | 36 | 0.5 | 3.5 | 6 | 11.5 | 17.5 | 24 | 65 | 395 | 4888 |
z12 | 64 | 0.5 | 4 | 7 | 14.5 | 22 | 32 | 120 | 844 | 10338 |
z13 | 120 | 0.5 | 5 | 9 | 20 | 32 | 50 | 265 | 1786 | 22379 |
z14 | 263 | 0.5 | 7 | 14 | 33 | 56 | 99 | 617 | 3988 | 50912 |
k=5
zoom | mean | p0 | p1 | p5 | p25 | p50 | p75 | p95 | p99 | p100 |
---|---|---|---|---|---|---|---|---|---|---|
z1 | 0.66 | 0.20 | 0.40 | 0.40 | 0.60 | 0.60 | 0.80 | 0.80 | 0.80 | 0.80 |
z2 | 1.01 | 0.20 | 0.40 | 0.60 | 0.80 | 1.00 | 1.20 | 1.40 | 1.40 | 2.00 |
z3 | 1.82 | 0.20 | 0.80 | 1.00 | 1.60 | 1.80 | 2.20 | 2.60 | 3.00 | 4.60 |
z4 | 2.54 | 0.20 | 1.00 | 1.40 | 2.00 | 2.40 | 3.00 | 4.00 | 4.80 | 8.00 |
z5 | 3.40 | 0.20 | 1.00 | 1.60 | 2.60 | 3.40 | 4.00 | 5.40 | 7.00 | 18.80 |
z6 | 4.31 | 0.20 | 1.02 | 1.80 | 3.20 | 4.20 | 5.20 | 7.40 | 9.80 | 42.60 |
z7 | 5.08 | 0.20 | 1.20 | 1.80 | 3.40 | 4.80 | 6.20 | 9.20 | 12.60 | 93.60 |
z8 | 5.78 | 0.20 | 1.20 | 1.80 | 3.40 | 5.20 | 6.80 | 11.00 | 18.93 | 206.20 |
z9 | 6.78 | 0.20 | 1.20 | 2.00 | 3.60 | 5.60 | 7.40 | 13.00 | 35.40 | 430.40 |
z10 | 8.73 | 0.20 | 1.40 | 2.00 | 4.00 | 6.20 | 8.20 | 16.40 | 67.48 | 895.20 |
z11 | 12.76 | 0.20 | 1.40 | 2.40 | 4.60 | 7.00 | 9.60 | 25.16 | 150.32 | 1865.40 |
z12 | 21.60 | 0.40 | 1.60 | 2.80 | 5.80 | 8.80 | 12.80 | 47.00 | 328.89 | 3932.40 |
z13 | 41.88 | 0.40 | 2.00 | 3.60 | 8.00 | 12.80 | 20.60 | 102.08 | 712.36 | 8486.80 |
z14 | 91.76 | 0.40 | 2.80 | 5.40 | 13.00 | 22.80 | 40.40 | 239.88 | 1597.66 | 19274.40 |
Finally, we can reproduce the Geofabrik graph of tiles per minute against update interval, and get approximately work ∝ interval^(-1.05), where interval is the number of minutes between updates. This means combining multiple updates is very effective at reducing load.
This has been a lot of numbers, which is useful for someone in my position, but what does this mean at a practical level?
Big updates happen sometimes, which will slow everything down. Even a powerful server will slow down when multiple large country borders need to be regenerated.
As update interval slows down, the tile server has less work to do and can catch up. Updates every 10 minutes involve approximately 5 times less work than minutely updates, so when a particularly large update happens, the server can easily catch up.
A lower-end server capable of 10 tiles/second can still update every 3 minutes or faster 95% of the time, 3-15 minutes 4% of the time, and only 1% of the time fall farther behind.
You probably don’t want to keep a minutely updated tileset running on your laptop.
With a bit of work, I manipulated the files to give me the usage from the 10 countries with the most usage, for the first four months of 2023.
Perhaps more interesting is looking at the usage for each country by the day of week.
]]>Tile storage is a difficult problem. For a tileset going to zoom 14, there are 358 million tiles, and for one going to zoom 15, there are 1.4 billion. Most tiles are small, with about 80% typically around 100 bytes, while the largest tiles might be about 1 megabyte.
Tilekiln’s storage must be able to handle these numbers, but also handle incremental minutely updates, and maintenance work like deleting tilesets. A nice to have would be the ability to distribute tilesets easily, but this is not essential.
PMTiles is a file format designed to store an entire tileset in one file. It consists of a directory, which lists offsets for where tiles are within the larger file. Using range requests, any tile can be retrieved in 3 requests in the worst case, while any caching at all will bring this to 2 requests, and typical caching can bring it close to one.
It features de-duplication, both for tiles that are byte-wise identical and for adjacent directory entries pointing at the same tile.
There is client-side support in some browser-based map display libraries, but most applications will require a server that handles conventional z/x/y URLs, serving from the PMTiles file. As a fairly new format, support from other applications is limited.
Updating the PMTiles archive in place is possible, because clients use ETags to detect when the archive has changed, invalidating the client-side cache. With minutely updates this means that, in the worst case, one request per client each minute requires the full 3 requests. In practice this doesn’t matter: for a large tileset it is impossible to rewrite the entire archive every minute, since writing out the complete file takes longer than that.
Like PMTiles, MBTiles is a single-file archive format. It was developed by Mapbox for users to generate tiles and upload them to Mapbox’s servers. Its format is a SQLite database with tables of tile indexes and tile data as binary blobs. Because it’s based on SQLite and has been around longer, support is widespread, with several programs able to generate and serve it. Browser-based support is limited, and it wasn’t designed with that in mind.
Minutely updates are theoretically possible, but in practice, not a good idea. SQLite databases do not work well with high volumes of concurrent reads and writes, generally requiring all work to go through one process. This requires coupling the generation and serving systems.
Because Tilekiln already requires PostgreSQL, it would be possible to store tiles in it, the same way that MBTiles does.
Instead of an archive format, it’s possible to store tiles on disk as files. This is the most well-established method, and the simplest. Tiles can be updated atomically, and serving tiles is just serving files from disk. The downside comes when managing millions or billions of tiny files. File systems are not designed for this, and bulk operations become very slow. In particular, it can take a day or longer to delete a tileset.
A popular approach is to store tiles in some form of object store, like S3. All the commercial object stores I’ve looked at perform badly with large numbers of small objects. While there are sometimes workarounds for this, their pricing structure generally makes it very expensive to store tiles this way.
Tapalcatl 2 is a system of using zip files to combine tiles, reducing the number of tiles that need to be stored. It is similar to how raster tiles are combined into metatiles, except that the vector tiles are pre-sliced within the zipfile and can contain multiple zooms.
In a typical configuration, there are zip files generated for tiles on zooms 0, 4, 8, and 12. Each zip file contains the “root” tile and then tiles from the next three zooms that lie within it. This means that a zip archive contains 85 tiles, all tiles within a small area. By combining tiles into one zip archive, this reduces the number of files on disk to 16.8 million files, a small enough number to be reasonably managed on disk.
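The arithmetic above is easy to reproduce:

```shell
#!/bin/sh
# Tiles per archive: the root tile plus the next three zooms under it.
per_archive=$((1 + 4 + 16 + 64))
echo "tiles per archive: $per_archive"

# The archive count is dominated by the deepest root zoom: 4^12 zip files.
z12_archives=$((4096 * 4096))
echo "z12 archives: $z12_archives"
```

This gives 85 tiles per archive and 16 777 216 archives at zoom 12, the roughly 16.8 million files mentioned above.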
The format hasn’t had a great deal of usage since it was developed, so support is limited to some server-side programs that take Tapalcatl archives and present tiles to the user. These programs are known to have some issues, like not supporting updates to remote Tapalcatl tilesets.
Updates are possible in two ways. The first is by taking an existing zip file, replacing the changed tiles within it, and generating a new zip file. The second is to completely regenerate all the tiles in the zip file, which is simpler, but involves more tile generation.
The two options which require further investigation are PostgreSQL and Tapalcatl 2. Both support updates, but come with downsides.
]]>With the switch to a commercial CDN, we’ve improved our logging significantly and now have the tools to collect and analyze logs. We log information on both the incoming request and our response to it.
We log enough information to see what sites and programs are using the map, and additional debugging information. Our logs can easily be analyzed with a hosted Presto system, which allows querying large amounts of data in logfiles.
I couldn’t do this talk without the ability to easily query this data and dive into the logs. So, let’s take a look at what the logs tell us for two weeks in May.
Although the standard layer is used around the world, most of the usage correlates to when people are awake in the US and Europe. It’s tricky to break this down in more detail because we don’t currently log timezones. We’ve added logging information which might make this easier in the future.
Based on UTC time, which is close to European standard time, weekdays average 30 000 incoming requests per second while weekends average 21 000. The peaks, visible on the graph, show a greater difference. This is because the load on weekends is spread out over more of the day.
On average over the month we serve 27 000 requests per second, and of these, about 7 000 are blocked.
Seven thousand requests per second is a lot of blocked requests. We block programs that send bad requests or don’t follow the tile usage policy. They get served:

- HTTP 400 Bad Request if the request is invalid,
- HTTP 403 Forbidden if the client is misconfigured,
- HTTP 418 I'm a teapot if it is pretending to be a different client, or
- HTTP 429 Too Many Requests if it has been automatically blocked for making excessive requests by scraping.

Before blocking we attempt to contact those responsible, but this doesn’t always work if they’re hiding who they are, and they frequently don’t respond.
HTTP 400 responses are for tiles that don’t exist and will never exist. A quarter of these are for zoom 20, which we’ve never served.
For the HTTP 403 blocked requests, most are not sending a user-agent, a required piece of information. The others are a mix of blocked apps and generic user-agents which don’t allow us to identify the app.
Fake requests get a HTTP 418 response, and they’re nearly all scrapers pretending to be browsers.
In July we added automatic blocking of IPs that were scraping the standard layer, responding with HTTP 429 to IPs requesting far too many tiles from the backend. This only catches scrapers, but a tiny 0.001% of users were causing 13% of the load, and 0.1% of QGIS users were causing 38% of QGIS load.
]]>The OpenStreetMap Standard Layer is the default layer on openstreetmap.org, taking up most of the front page. It’s run by the OpenStreetMap Foundation, and the Operations Working Group is responsible for the planning, organisation and budgeting of OSMF-run services like this one and the servers running it. There are other map layers on the front page like Cycle Map and Transport Map, and I encourage you to try them, but they’re not hosted or planned by us.
At a high level, this is the overview of the technology the OWG is responsible for. The standard layer is divided into millions of parts, each of which is called a tile, and we serve tiles.
OSM updates flow into a tile server, where they go into a database. When a tile is needed, a program called renderd makes and stores the tile, and something called mod_tile serves it over the web. We have multiple render servers for redundancy and capacity. We’re completely responsible for these, although some of them run on donated hardware.
In front of the tile server we have a content delivery network. This is a commercial service that caches files closer to the users, serving 90% of user requests. It is much faster and closer to the users, but knows nothing about maps. We’re only responsible for the configuration.
The difference between the tile store and the tile cache is in how they operate, and in size: the tile store is much larger and stores more tiles.
Only the cache misses from the CDN impose a load on our servers. When looking at improving performance of the standard layer, I tend to look at cache misses and how to reduce them.
The OWG has a tile usage policy that sets out what you can and cannot do with our tile layer. We are in principle happy for our map tiles to be used by external users for creative and unexpected uses, but our priority is providing a quickly updating map to improve the editing cycle. This is a big difference between the standard layer and most other commercially available map layers, which might update weekly or monthly.
We prohibit some activities like bulk-downloading tiles for a large area (“scraping”) because it puts an excessive load on our servers. This is because we render tiles on-demand, and someone scraping all the tiles in an area is downloading tiles they will never view.
]]>I wanted to look at the correlation with OSM.org views. I already had a full day’s worth of logs on tile.openstreetmap.org accesses, so I filtered them for requests from www.openstreetmap.org and got a per-country count. This is from December 29th, 2020. Ideally it would be from a complete week, and not a holiday, but this is the data I had downloaded.
The big outlier is Italy. It has more visits than I would expect, so I wonder if the holiday had an influence. Like before, the US is overrepresented in the results, Russia and Poland are underrepresented, and Germany is about average.
Like before, I made a graph of the smaller countries.
More small countries are above the average line - probably an influence of Italy being so low.
]]>There’s lots of data for activity on OSM by country, but for this I took the numbers from joost for how many “active contributors” there are according to the contributor fee waiver criteria.
For the larger countries, Russia is the most underrepresented country. This is not surprising, as they are underrepresented in other venues like the OSMF membership.
The US and UK are both slightly overrepresented in the survey, but less so than I would have expected based on other surveys and OSMF membership.
The smaller countries are all crowded, so I did a graph of just them.
As with other surveys, Japan is underrepresented. Indonesia, although underrepresented, is less so than I would have expected.
]]>I’ve put up a world-wide demo at https://pnorman.dev.openstreetmap.org/cartographic/mapbox-gl.html, using data from 2020-03-16, and you can view the code at https://github.com/pnorman/openstreetmap-cartographic.
Only zooms 0 to 8 have been implemented so far. I started at zoom 0 and am working my way down.
Admin boundaries are not implemented. OpenStreetMap Carto uses Mapnik-specific tricks to deduplicate the rendering of these. I know how I can do this, but it requires the changes I intend to make with the flex backend.
Landuse, vegetation, and other natural features are not rendered until zoom 7. This is the scale of OpenStreetMap Carto zoom 8, and these features first appear at zoom 5. There are numerous problems with unprocessed OpenStreetMap data at these scales. OpenStreetMap Carto gets a result that looks acceptable but is poor at conveying information by tweaking Mapnik image rasterizing options. I’m looking for better options here involving preprocessed data, but haven’t found any.
I’m still investigating how to best distribute sprites.
The technology choices are designed to be suitable for a replacement for tile.osm.org. This means minutely updates, high traffic, high reliability, and multiple servers. Tilekiln, the vector tile generator, supports all of these. It’s designed to better share the rendering results among multiple servers, a significant flaw with renderd + mod_tile and the standard filesystem storage. It uses PostGIS’s ST_AsMVT, which is very fast with PostGIS 3.0; my home system generates z0-z8 in under 40 minutes.
Often forgotten are the development requirements. The style needs to support multiple developers working on similar areas without git merge conflicts, while maintaining an easy development workflow. I’m still figuring this out. Mapbox GL styles are written in JSON, and most of the tools overwrite any formatting. This means there’s no way to add comments to lines of code. Comments are a requirement for a style like this, so I’m investigating minimal pre-processing options. The downside is that this will make it harder to use existing GUI editors like Fresco or Maputnik.
The goal of this project isn’t to do big cartography changes yet, but client-side rendering opens up new tools. The biggest immediate change is that zoom is continuous, no longer an integer or fixed value. This means parameters like sizes can smoothly change as you zoom in and out, specified by their start and end sizes instead of having to specify each zoom.
Have a look at https://github.com/pnorman/openstreetmap-cartographic and have a go at setting it up and generating your own map. If you have issues, open an issue or pull request. Or, because OpenStreetMap Cartographic uses Tilekiln, have a look at its issue list.
]]>
Not every user will want all the zooms, so I’m creating multiple tarballs, going from zoom 0 to zoom 6, 0 to 8, and 0 to 10. This duplicates data between the files, but makes them more useful since only one file needs downloading.
tar will pack all of the tiles into one file, and can optionally compress them. Compressing a single PNG won’t normally save space, but compressing a bunch of PNGs, many of which are identical, will.
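A sketch of the packaging, assuming the usual osm_tiles/&lt;z&gt;/&lt;x&gt;/&lt;y&gt;.png layout; the placeholder tiles just make the sketch self-contained:

```shell
#!/bin/sh
# Pack overlapping tarballs: z0-6, z0-8, and z0-10.
mkdir -p osm_tiles/0/0 osm_tiles/6/32
: > osm_tiles/0/0/0.png       # placeholder tiles for the demonstration
: > osm_tiles/6/32/21.png

for maxzoom in 6 8 10; do
    dirs=""
    for z in $(seq 0 "$maxzoom"); do
        if [ -d "osm_tiles/$z" ]; then dirs="$dirs osm_tiles/$z"; fi
    done
    # gzip exploits the redundancy between near-identical PNGs
    tar -czf "osm_tiles_z0-$maxzoom.tar.gz" $dirs
done
```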
To make use of all the cores of my CPU, I’m going to use find to locate the PNGs, then the program parallel to run optipng in parallel.
OptiPNG is a program that performs lossless optimization on PNGs. Because low-zoom tiles are more likely to be viewed and there are fewer of them, I’ll call the program with different options, doing more aggressive optimizations on low-zoom tiles. There’s no magic right answer for how much time to spend compressing, but I found these options reasonable; they save up to 50% space on some zooms.
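A sketch of that pass; the `-o` optimization levels are illustrative rather than the values I actually used, and it only runs over zoom directories that exist:

```shell
#!/bin/sh
# Optimize tiles zoom by zoom, trying harder on low zooms where each tile
# is viewed more often and there are far fewer of them.
passes=0
for z in $(seq 0 10); do
    [ -d "osm_tiles/$z" ] || continue
    if [ "$z" -le 6 ]; then
        opts=-o7          # aggressive trials on the few low-zoom tiles
    else
        opts=-o2          # quicker pass on the bulk of the tiles
    fi
    find "osm_tiles/$z" -name '*.png' | parallel optipng "$opts" -quiet {}
    passes=$((passes + 1))
done
echo "$passes" > zooms_optimized
```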
The space used can be measured with du -hsc --apparent-size osm_tiles/*. The --apparent-size option is essential, since many of the tiles are smaller than one disk block.
All of this is of course not required, but it helps a bit, and is an interesting experiment regardless.
]]>Seeding is done with the mapproxy-seed program, using the previous config files. The only option needed besides the config file locations is -c, which sets how many CPU threads to use. For the machine I’m using, 7 works best: fewer leaves some capacity idle, while too many threads starve PostgreSQL and the system of CPU time.
```shell
mapproxy/bin/mapproxy-seed -s seed.yaml -f mapproxy.yaml -c 7
```
How long this takes depends on to what zoom you’re seeding, and how powerful the server is. On my server it takes about four hours to seed to zoom 10.
There’s a lot of documentation on MapProxy configuration files, and example ones can be created with mapproxy/bin/mapproxy-util create -t base-config. The first file is mapproxy.yaml, which defines the layers to be rendered.
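A minimal sketch of the shape mapproxy.yaml takes for this setup. The layer name `osm` and the built-in `GLOBAL_WEBMERCATOR` grid match the test URL; the cache and source names, and everything else, are illustrative, and the real file sets more options:

```yaml
services:
  demo:
  tms:

layers:
  - name: osm
    title: OpenStreetMap Carto
    sources: [osm_cache]

caches:
  osm_cache:
    grids: [GLOBAL_WEBMERCATOR]
    sources: [osm_mapnik]
    cache:
      type: file

sources:
  osm_mapnik:
    type: mapnik
    mapfile: openstreetmap-carto/mapnik.xml
```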
This file can be tested with the command mapproxy/bin/mapproxy-util serve-develop mapproxy.yaml; the URL http://localhost:8080/tiles/1.0.0/osm/GLOBAL_WEBMERCATOR/0/0/0.png should then be a single tile covering the world.
The second file is seed.yaml.
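A sketch of what that seed.yaml can look like; the seed name is arbitrary, and `osm_cache` assumes the cache name from the mapproxy.yaml side:

```yaml
seeds:
  world:
    caches: [osm_cache]
    levels:
      from: 0
      to: 8
```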
This sets up a seeding area covering the entire world from zoom 0 to zoom 8. The seeding can be run with mapproxy/bin/mapproxy-seed -s seed.yaml -f mapproxy.yaml, and the -c option can be added to set parallelism. After this is done the tiles are generated; they just need to be packaged.
While I was working at the Wikimedia Foundation, I developed brighmed, a CartoCSS style using vector tiles. Wikimedia decided not to flip the switch to deploy the style, but it is open source, so I can use it elsewhere. With that decision made, I spent a day implementing most of it in Tangram.
What’s next?
I’ve got some missing features like service roads and some railway values to add, then I can look at new stuff like POIs. For that I’ll need to look at icons and where to fit them into colourspace.
There’s a bunch of label work that needs to be done. What I have is just a first pass: some things like motorway names have big issues, and ref tags still need rendering. Label quality is of course an unending quest, but I should be able to get some big gains without much work.
Richard is planning to do some work on writing a schema, and if it works, I’d like to adopt it. At the same time, I don’t want to tie myself to an external schema which may have different cartographic aims, so I’ll have to see how that works out. Looking at past OpenStreetMap Carto changes to project.mml, I found that what would be breaking schema changes for a vector tile project are less common than I thought, happening about once every 4-6 months. Most of the schema changes were compatible and could be handled by regenerating tiles in the background.
The website code is mainly in Ruby on Rails, and you need to know this before starting the project. JavaScript is a good idea, as one implementation route requires client-side JavaScript changes.
It may seem odd for the first step of a coding project to have nothing to do with coding, but it’s essential. You need to learn about OSM’s data model, architecture, and what it’s used for, and the fastest way to do this is by mapping. You’ll also be looking at how editing software interacts with the API. It doesn’t matter too much what you map, but I’d suggest around your university, a past job, or some other area you’re familiar with.
Matt Amos wrote a blog post on API changes which puts this project into a wider context. Most of the work there isn’t part of the GSOC project, but it helps understand why we want to do this project.
The API documentation covers all of the API calls, but the ones that are particularly important for the project are the read calls for elements, full versions for ways and relations, ways for node call, relations for element, read and download calls for changesets, and read note call.
The map call, and changeset model are also important concepts to understand.
Start JOSM with a console window open, and it will show all the API calls it makes. When you’ve done this, edit some more. Make sure to use the show object, show object history, download relation, and other tools that download data. Watch what API calls are made, compare them against the API documentation, and understand what it’s doing.
There are a few ways to get object information. The obvious one is the “browse” pages at https://www.openstreetmap.org/way/&lt;N&gt;, but others include the history view in JOSM and OSM Deep History. The first page doesn’t use the API and the second two do. The goal of this project is to make the first page use the API.
The next two steps are a form of homework and necessary for writing your proposal. Look at the browse page for node 5324545411. Write down what API calls are needed to get all the information on it. It should be possible to do it in a fixed number of API calls; in this case, four.
For some browse pages it’s not possible to get all the information in a fixed number of API calls. Take a look at way 471813907 and see what information is missing or would require recursive API calls. Part of the project will be proposing and implementing new API calls to fill the missing needs.
Some more background can be found in some emails from a year ago.
OpenStreetMap Carto generates a Mapnik XML stylesheet, which can be used by any software that includes Mapnik. Some of the common options are
None of these options is perfect for anything. For this particular use the requirements are
The options which meet this are:
MapProxy or accessing the Mapnik API directly are the best two options. It’s a lot easier to set up MapProxy than write new code, so that’s the option I’ll go with.
With MapProxy selected, we need to install it. Unfortunately, this requires installing Mapnik. Mapnik has a reputation of being difficult to compile, having an API that changes between versions when it shouldn’t, poor support for bindings for other languages, versioning problems, and generally being tricky to work with. This reputation is accurate.
If I were trying to install Mapnik on anything other than a Debian system it would be tricky, but I can use the excellent work of the Debian GIS team. All that’s needed is apt-get install libmapnik3.0 mapnik-utils python-mapnik, and the required software is there. In addition to Mapnik, the virtualenv package provides virtualenv, a program for creating isolated Python environments.
The install script is a simple two lines.
```shell
virtualenv --system-site-packages mapproxy
mapproxy/bin/pip install MapProxy==1.11.0
```
The first line creates a virtualenv named mapproxy that has access to the system Python packages, most importantly Mapnik. The second installs MapProxy 1.11 in it.
]]>Loading can easily be done on a single CPU server and the RAM needed is less than you want for caching later on.
Like before, the first step is setting some variables.
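The variables block was roughly of this shape; every name and path here is a hypothetical stand-in, not the original values:

```shell
#!/bin/sh
# Hypothetical values; adjust the paths and names for your own setup.
PLANET=planet-latest.osm.pbf        # input planet file
DBNAME=gis                          # database the style expects
FLAT_NODES=/store/flat.nodes        # node location file for --slim
CACHE=40000                         # osm2pgsql node cache, in MB
```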
Next, a database is needed. OpenStreetMap Carto documents what extensions are needed by it, so we just need to follow those directions.
```shell
createdb -E utf8 "${DBNAME}"
psql -d "${DBNAME}" -c 'CREATE EXTENSION postgis;'
psql -d "${DBNAME}" -c 'CREATE EXTENSION hstore;'
```
OpenStreetMap Carto needs data loaded with osm2pgsql, like most styles. The osm2pgsql options can be broken down into three groups: style settings, performance, and locations.
The style settings control how the data in the database is represented. These are given by the style. We don’t have to know what they mean, so we just have to use what OpenStreetMap Carto’s documentation says: -G --hstore --style openstreetmap-carto.style --tag-transform-script openstreetmap-carto.lua
The locations are where to get the OSM data, database names, and other information that relates to where to read and save everything.
Performance options are the only ones that require some judgement to set. Because this script is intended for the full planet, we use --slim --flat-nodes ${FLAT_NODES}, just like the osm2pgsql documentation suggests. Also, we know the database will not be updated with --append, so we can use the --drop option, which skips indexing the slim tables and drops them instead, saving time and space.
We also need to set how much memory is used to cache node positions. This should never be set so high that the server runs out of RAM, but there’s no gain to setting it higher than is needed to cache every node. A general rule of thumb is to set it to 75% of RAM size, in MB. With the size of the planet right now, I also know that it doesn’t need more than 40GB, but this is subject to change.
This results in the osm2pgsql command
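A sketch of such a command, combining the flags quoted above with assumed variable names (the post’s exact invocation may differ):

```shell
osm2pgsql -G --hstore \
    --style ${CARTO_DIR}/openstreetmap-carto.style \
    --tag-transform-script ${CARTO_DIR}/openstreetmap-carto.lua \
    --slim --drop \
    --flat-nodes ${FLAT_NODES} \
    --cache 40000 \
    --database ${DATABASE_NAME} \
    ${PLANET_FILE}
```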
On an SSD-based server with 64GB RAM, this should take 10-20 hours to process the planet. On a tuned server with NVMe drives, it can be under 5 hours.
Last is building some indexes the stylesheet relies on. Normally we could use the indexes.sql file that is part of OpenStreetMap Carto, but because this database isn’t going to be updated, the fillfactor option can be set to build more efficient indexes.
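Without the fillfactor change, the one-liner would simply be the following (paths assumed; the post’s version additionally modified the index definitions to set a fillfactor, a detail not reproduced here):

```shell
psql -d ${DATABASE_NAME} -f ${CARTO_DIR}/indexes.sql
```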
Rearranging the order of some commands and adding cleanup, we get a script that we can run.
[26-line load script not preserved]
Edit: Information about indexes added
For the script, we’re going to assume that the carto binary is in the PATH. Unfortunately, this requires installation, which requires npm, which itself needs to be installed. Given that nodejs and npm are a huge headache of versions, the easiest route I’ve found is to install nvm, then install nodejs 6 with nvm install 6. CartoCSS is then installed with npm install -g carto.
The shell script starts off with some variables from last time.
OpenStreetMap Carto is hosted on GitHub, which offers the ability to download a project as a zip file. This is the logical way to get it, but it isn’t usable from a script because the internal structure of the zip file isn’t easily predicted. Instead, we’ll clone it with git, only getting the specific revision needed.
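A sketch of such a clone (the pinned tag here is hypothetical; the original revision wasn’t preserved):

```shell
CARTO_VERSION="v4.0.0"   # hypothetical tag
git clone -c advice.detachedHead=false --depth 1 \
    --branch ${CARTO_VERSION} \
    https://github.com/gravitystorm/openstreetmap-carto.git
```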
Setting advice.detachedHead=false for this command avoids a warning about a detached HEAD, which is expected here.
OpenStreetMap Carto sets the database name to “gis”. There are various ways to override this for development, but in this case we want to override it in the generated XML file. Fortunately, the database name only appears once, as dbname: "gis" in project.mml. One way to override it would be to remove the line and rely on the libpq environment variables like PGDATABASE. Another is replacing “gis” with a different name. It’s not clear which is better, but I decided to go with replacing the name, using a patch which git applies.
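The post applied the change as a git patch; purely to illustrate the replacement, here is the same single-line edit done with sed on a stand-in project.mml (the real file contains much more):

```shell
DATABASE_NAME="osmcarto"
mkdir -p openstreetmap-carto
# Stand-in for the real project.mml, which contains dbname: "gis" exactly once
printf 'dbname: "gis"\n' > openstreetmap-carto/project.mml
# Replace the database name in place
sed -i 's/dbname: "gis"/dbname: "'"${DATABASE_NAME}"'"/' openstreetmap-carto/project.mml
cat openstreetmap-carto/project.mml
```

A patch fails loudly if the file has changed underneath it, which is one reason to prefer it over sed in a pinned checkout.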
With project.mml patched, it’s easy to generate the Mapnik XML, because CartoCSS was installed earlier.
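A sketch of that one line (the output filename is an assumption):

```shell
carto openstreetmap-carto/project.mml > style.xml
```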
Lastly, OpenStreetMap Carto needs some data files like coastlines. It comes with a script to download them, so we run it.
Taking all of this and re-arranging it, we end up with the following script.
[13-line script not preserved]
Downloading the data manually is easy with curl or wget, or you can download it from the browser. If you want to script it, it’s a bit harder: you have to worry about error conditions, what can go wrong, and making sure everything can happen unattended. So, to make sure we can do this, we write a simple bash script.
The goal of the script is to download the OSM data to a known file name and return 0 if successful, or 1 if an error occurred. Also, to keep track of what was downloaded, we’ll make two files with information on what was downloaded and what state it’s in: state.txt and configuration.txt. These will be compatible with osmosis, the standard tool for updating OpenStreetMap data.
Before doing anything else, we specify that this is a bash script, and that if anything goes wrong, the script is supposed to exit.
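A sketch of that header; set -e is the standard way to make bash exit as soon as anything goes wrong:

```shell
#!/bin/bash
# Exit immediately if any command fails
set -e
```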
Next, we put the information about what’s being downloaded, and where, into variables. It’s traditional to use the Geofabrik Liechtenstein extract for testing, but the same scripts will work with the planet.
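A sketch with assumed variable names (the Geofabrik URLs are real; the names are not from the original):

```shell
# Assumed names -- the originals weren't preserved
EXTRACT_URL="https://download.geofabrik.de/europe/liechtenstein-latest.osm.pbf"
MD5_URL="${EXTRACT_URL}.md5"
PLANET_FILE="data.osm.pbf"
echo "Will fetch ${EXTRACT_URL} to ${PLANET_FILE}"
```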
We’ll be using curl to download the data, and every time we call it, we want to add the options -s and -L. Respectively, these make curl silent and cause it to follow redirects. Two files are needed: the data and its MD5 sum. The md5 file looks something like 27f7... liechtenstein-latest.osm.pbf. The problem with this is that we’re saving the file as $PLANET_FILE, not liechtenstein-latest.osm.pbf. A bit of manipulation with cut fixes this.
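A runnable illustration of the filename fix (the hash here is hypothetical, and a local file stands in for the downloaded .md5):

```shell
PLANET_FILE="data.osm.pbf"
# What the server's .md5 file looks like (hypothetical hash):
echo "d41d8cd98f00b204e9800998ecf8427e  liechtenstein-latest.osm.pbf" > server.md5
# Keep the hash (field 1) and pair it with our local filename instead
echo "$(cut -d' ' -f1 server.md5)  ${PLANET_FILE}" > "${PLANET_FILE}.md5"
cat "${PLANET_FILE}.md5"
```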
The reason for downloading the MD5 file first is that it reduces the time between when the two downloads are initiated, making it less likely that the server will start uploading a new version in that window.
The next step is easy: downloading the planet and checking that the download wasn’t corrupted. It helps to have a good connection here.
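The real script would curl the planet here; in this runnable sketch a locally created file stands in for the download, and the md5sum --check call is the verification step the script relies on:

```shell
PLANET_FILE="data.osm.pbf"
# Stand-in for: curl -s -L -o "$PLANET_FILE" "$EXTRACT_URL"
echo "not really a planet" > "${PLANET_FILE}"
md5sum "${PLANET_FILE}" > "${PLANET_FILE}.md5"   # normally rewritten from the server's .md5
# Abort (non-zero exit) if the file doesn't match its checksum
md5sum --quiet --check "${PLANET_FILE}.md5" && echo "download verified"
```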
Libosmium is a popular library for manipulating OpenStreetMap data, and the osmium command can show metadata from the header of the file. The command osmium fileinfo data.osm.pbf tells us
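For a Geofabrik extract, the trimmed output has roughly this shape (all values here are illustrative, not from the original run):

```
File:
  Name: data.osm.pbf
  Format: PBF
Header:
  Bounding boxes:
    (9.47108,47.0477,9.63622,47.2713)
  Options:
    generator=osmium/1.5
    osmosis_replication_base_url=http://download.geofabrik.de/europe/liechtenstein-updates
    osmosis_replication_sequence_number=1291
    osmosis_replication_timestamp=2017-06-26T20:43:03Z
```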
The osmosis properties tell us where to go for updates to the data we downloaded. Despite not needing the updates for this task, it’s useful to store this in the state.txt and configuration.txt files mentioned above.
Rather than trying to parse osmium’s output, we can use its option to extract just one field. We use this to get the base URL and save it to configuration.txt.
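A sketch of those two lines; osmium’s -g option extracts a single header field, and baseUrl is the key osmosis expects in configuration.txt:

```shell
BASE_URL=$(osmium fileinfo -g 'header.option.osmosis_replication_base_url' "${PLANET_FILE}")
echo "baseUrl=${BASE_URL}" > configuration.txt
```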
Replication sequence numbers need to be represented as a three-tiered directory structure, for example 123/456/789. By taking the number, padding it to 9 characters with zeros, and doing some sed magic, we get this format. From there, it’s easy to download the state.txt file representing the state of the data that was downloaded.
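A runnable sketch of the padding and sed step (variable names are assumptions; the final curl is shown commented out since it needs a connection):

```shell
SEQUENCE=1291
# Pad to 9 digits, then break into a three-tiered path: 000/001/291
SEQUENCE_PATH=$(printf '%09d' "${SEQUENCE}" | sed 's!\(...\)\(...\)\(...\)!\1/\2/\3!')
echo "${SEQUENCE_PATH}"
# curl -s -L -o state.txt "${BASE_URL}/${SEQUENCE_PATH}.state.txt"
```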
After all this has been run, we’ve got the planet, its MD5 file, and the state and configuration that correspond to the download.
Combining the code fragments, adding some comments, and cleaning up the files results in this shell script.
[29-line download script not preserved]
Over the next few posts, I’m going to walk through, step by step, how to generate these files, starting with downloading OpenStreetMap data and ending with rendered tiles.
Language: nodejs
Layer definitions: Mapnik layer definitions in XML, typically preprocessed from YAML
Vector tile formats: Mapbox Vector Tiles
Data source support: PostGIS
Kartotherian, tessera, and other servers based on tilelive all rely on Node bindings to Mapnik to produce vector tiles. They all work with Mapnik layer definitions. This is a reasonably well understood language and consists primarily of a SQL statement for each layer. It’s reasonably flexible, and it’s possible to do proper code review, git conflict resolution, and the other processes you need with an open style.
Some servers can turn the Mapbox Vector Tiles into GeoJSON, but not all do. There are other minor differences, but they all have the same major advantages and disadvantages.
The biggest problem with these options is that you have to either use the exact same versions of everything as the Mapbox developers while hoping their changes work with your code, or lock down your versions to a set of known good versions and periodically update when you need new features, retesting all your code. Neither of these is practical for an open-source style which wants to involve others.
If you don’t do this, you’ll find parts of your server failing with different combinations of Mapnik and node-mapnik.
Language: Python
Layer definitions: SQL in jinja2 templates, YAML
Vector tile formats: Mapbox Vector Tiles, TopoJSON, and GeoJSON
Data source support: PostGIS
Tilezen tileserver was written by Mapzen to replace their TileStache-based vector tile generation. Having been written by developers who wrote previous vector tile servers, it combines ideas and functionality other options don’t have.
The datasource definitions are written in SQL + YAML, a common choice, but unlike other options, the SQL is in its own files, which are preprocessed by the jinja2 templating engine. This adds some complexity, but a great deal of power. Selecting different features by zoom level normally requires repetitive SQL and lengthy UNION ALL queries, but the preprocessing allows queries to be written more naturally.
Tileserver’s unique feature is the post-processing capabilities it offers. These allow vector tiles to be operated on after they leave the database, altering geometries, changing attributes, and combining geometries. Post-processing to reduce size is a necessary feature if targeting mobile devices on slower connections. Mapbox had been working on this in the open, but now that they no longer use node-mapnik it’s not clear how they do so. MapQuest had developed Avecado specifically to target this, but it was abandoned when they stopped doing their own map serving.
You don’t need any AWS services for a basic Tilezen tileserver deployment, but there might be some dependencies in the more advanced features needed to set up a full production environment.
Language: Go
Layer definitions: SQL in TOML
Vector tile formats: Mapbox Vector Tiles
Data source support: PostGIS
Tegola is a new server written in Go. It operates with multiple providers which supply layers to maps, allowing them to be assembled different ways. It looks like it has most of the features needed for vector tiles for a basemap, but might be missing a few needed for changing data as zoom changes.
SQL in TOML is similar to SQL in YAML for layer definitions, and like it, is reasonably flexible and makes it possible to do proper code review, git conflict resolution, and the other processes you need with an open style.
I haven’t had a chance to deploy it yet, so I’m not sure what difficulties there are.
Language: Rust
Layer definitions: SQL in TOML
Vector tile formats: Mapbox Vector Tiles
Data source support: PostGIS
t-rex is a new server written in Rust. Its unique feature is that it can auto-configure layers from PostGIS tables. It does have all the required features for selecting appropriate data in a basemap.
Its layer definitions are different from Tegola’s, but they are both SQL in TOML and share the same strengths.
Like Tegola, I haven’t had a chance to deploy it.
Language: Python
Layer definitions: SQL in JSON
Vector tile formats: Mapbox Vector Tiles, TopoJSON, GeoJSON, and Arc GeoServices JSON
Data source support: PostGIS
TileStache is a general-purpose tile server; Mapzen used to serve their Tilezen schema with a fork of it. They’ve since switched to the Tilezen tileserver, but the functionality they added has been merged back into TileStache. Unfortunately, the documentation hasn’t caught up yet, so there’s not much information about all of its functionality.
Deploying TileStache tends to be reasonable - particularly compared to node-mapnik - but SQL in JSON is a language that’s a problem for open projects with multiple authors, preventing proper code review and git conflict resolution.
Language: C++
Layer definitions: Lua
Vector tile formats: Mapbox Vector Tiles
Data source support: OSM PBF and shapefiles
Tilemaker is built around the idea of vector tiles without a serving stack. It does this by converting directly, in memory, from OSM PBF data to pre-generated vector tiles, which can then be served using Apache, an S3 bucket, or any other means of serving files from disk. This vastly simplifies deployment and reduces sources of downtime.
For serving a city or most countries this can be the ideal method, but the same strengths that make it good for this are a problem for processing the planet. It takes large amounts of RAM, can’t consume minutely changes, and has to create vector tiles for the entire PBF at once.
Tilemaker is also the only server to support directly using shapefiles for low zoom data and OSM for high zoom. Other options require loading into PostGIS and using SQL that selects the appropriate data based on zoom.
Language: Python
Layer definitions: osmfilter options
Vector tile formats: o5m
Data source support: OSM PBF and other raw OSM data
VectorTileCreator is part of KDE Marble and takes the unique approach of creating tiles of raw OSM data. It uses osmfilter’s language for filtering OSM data, but lacks the means to use other data sources, something most maps will need. Support for o5m vector tiles is also limited. Like Tilemaker, it runs from the command line and produces a set of vector tiles.
What you should use depends on your needs. First figure out what support you need for the full planet, updates, data sources, and output formats. If you need diff update support, then you need something that can create a single vector tile at a time, and Tilemaker won’t work. If you need TopoJSON support, node-mapnik won’t work.
Server | Full planet | Diff updates | Non-OSM data | GeoJSON | TopoJSON | Mapbox Vector Tiles |
---|---|---|---|---|---|---|
node-mapnik | Yes | Yes | Yes | Some | No | Yes |
Tilezen tileserver | Yes | Yes | Yes | Yes | Yes | Yes |
Tegola | Yes | Yes | Yes | No | No | Yes |
t-rex | Yes | Yes | Yes | No | No | Yes |
TileStache | Yes | Yes | Yes | Yes | No | Yes |
Tilemaker | No | No | Yes | No | No | Yes |
VectorTileCreator | Unknown | No | No | No | No | No |