Paul’s Blog

A blog without a good name

Minutely Updated Tile Volume: Technical Details

I’ve been looking at how many tiles are changed when updating OSM data in order to better guide resource estimations, and have completed some benchmarks. This is the technical post with details, I’ll be doing a high-level post later.

Software like Tilemaker and Planetiler is great for generating a complete set of tiles, updated about once a day, but they can’t handle minutely updates. Most users are fine with daily or slower updates, but OSM.org users are different, and minutely updates are critical for them. All the current minutely ways to generate map tiles involve loading the changes and regenerating tiles when data in them may have changed. I used osm2pgsql, the standard way to load OSM data for rendering, but the results should be applicable to other ways including different schemas.

Tilelog Country Data

I added functionality to tilelog to generate per-country usage information for the OSMF Standard Map Layer. The output of this is a CSV file, generated every day, which contains country code, number of unique IPs that day, tiles per second, and tiles per second that were a cache miss, all for each country code.

With a bit of work, I manipulated the files to give me the usage from the 10 countries with the most usage, for the first four months of 2023.

Tile usage per country by date

Perhaps more interesting is looking at the usage for each country by the day of week.

Tile usage per country by date

Tilekiln Tile Storage

I’m rewriting Tilekiln, tile generation software which leverages PostGIS to allow using established toolchains like osm2pgsql.

Tile storage is a difficult problem. For a tileset going to zoom 14, there are 358 million tiles, and for one going to zoom 15, there are 1.4 billion. Most tiles are smalled, with 80% being about 100 bytes typically, and the largest tiles might be about 1 megabyte.

Tilekiln’s storage must be able to handle these numbers, but also handle incremental minutely updates, and maintenance work like deleting tilesets. A nice to have would be the ability to distribute tilesets easily, but this is not essential.

Options

PMTiles

PMTiles is a file format designed to store an entire tileset in one file. It consists of a directory, which lists offsets for where tiles are within the larger file. Using range requests, any tile can be retrieved in 3 requests in the worst case, while any caching at all will bring this to 2 requests, and typical caching can bring it close to one.

It features de-duplication, both for tiles that are bytewise-indentical, as well as for adjacent offset listings pointing at the same tile.

There is client-side support for some map browser-based display libraries, but most applications will require a server returning conventional that handles conventional z/x/y URLs serving from the PMTiles file. As a fairly new format, support from other applications is limited.

Updating the PMTiles archive in place is possible, because the clients use etags to identify when the archive has changed, invalidating the client-side cache. This means with minutely updates, every one minute, one request from each client will be the worst case, requiring 3 requests. In practice, this doesn’t matter, because for a large tileset, it is impossible to rewrite the entire archive that frequently, as it will take longer than that to write out the complete file.

Pros

  • Generally most space efficient single-file tileset archive format
  • Easy to distribute
  • Can directly serve to some clients

Cons

  • Impossible to minutely update
  • Poor support for the archive format outside of specialized software and browser-based libraries

MBTiles

Like PMTiles, MBTiles is a single-file archive format. It was developed by Mapbox for users to generate tiles and upload them to Mapbox’s servers. It’s format is a SQLite database with tables consisting of tile indexes and tile data data as binary blobs. Because it’s based on SQLite, and has been around for longer, support is wide-spread, with several generation. Browser-based support is limited, and it wasn’t designed with that in mind.

Minutely updates are theoretically possible, but in practice, not a good idea. SQLite databases do not work well with high volumes of concurrent reads and writes, generally requiring all work to go through one process. This requires coupling the generation and serving systems.

Pros

  • Easy to distribute
  • Good support for non-browser clients

Cons

  • Poor minutely support
  • Not suitable for directly serving to browsers

PostgreSQL

Because Tilekiln already requires PostgreSQL, it would be possible to store tiles in it, the same way that MBTiles does.

Pros

  • Supports minutely updates
  • Uses software already required

Cons

  • Custom format
  • Impossible to distribute the archive

Tiles on disk

Instead of an archive format, it’s possible to store tiles on disk as files. This is the most well-established method, and simplest. Tiles can be updated atomically, and serving tiles is just serving files from disk. The downside comes to managing millions or billions of tiny files. File systems are not designed for this, and can have problems with

  • minimum file sizes,
  • inode usage,
  • inodes per directory, and
  • cleaning up tilesets.

In particular, it can take a day or longer to delete a tileset.

Pros

  • Supports minutely updates
  • Simple serving

Cons

  • Does not scale to planet-wide tilesets
  • No archive to distribute

Object stores

A popular approach to store tiles in some form of object store, like S3. All commercial object stores I’ve looked perform badly with large numbers of small objects. While there are sometimes work-arounds for this, their pricing structure generally makes it very expensive to store tiles this way.

Pros

  • Easy to serve out of
  • Supports minutely updates

Cons

  • Very expensive, or requires running your own object store
  • Slow

Tapalcatl 2

Tapalcatl 2 is a system of using zip files to combine tiles, reducing the number of tiles that need to be stored. It is similar to how raster tiles are combined into metatiles, except that the vector tiles are pre-sliced within the zipfile and can contain multiple zooms.

In a typical configuration, there are zip files generated for tiles on zooms 0, 4, 8, and 12. Each zip file contains the “root” tile and then tiles from the next three zooms that lie within it. This means that a zip archive contains 85 tiles, all tiles within a small area. By combining tiles into one zip archive, this reduces the number of files on disk to 16.8 million files, a small enough number to be reasonably managed on disk.

The format hasn’t had a great deal of usage since it was developed, so support is limited to some server-side programs that take tapalcatl archives and present tiles to the user. These server-side programs are known to have some issues, like not supporting updates to remote tapalcatl tilesets.

Updates are possible in two ways. The first is by taking an existing zip file, replacing the changed tiles within it, and generating a new zip file. The second is to completely regenerate all the tiles in the zip file, which is simpler, but involves more tile generation.

Pros

  • Supports minutely updates
  • Allows good decoupling of serving and generation

Cons

  • Limited client support
  • Minutely updates are more complicated

Recommendations

The two options which requires further investigation are PostgreSQL and Tapalcatl 2. Both support updates, but come with downsides.

OpenStreetMap Standard Layer: Requests

This blog post is a version of my recent SOTM 2021 presentation on the OpenStreetMap Standard Layer and who’s using it.

With the switch to a commercial CDN, we’ve improved our logging significantly and now have the tools to log and analyze logs. We log information on both the incoming request and our response to it.

OpenStreetMap Standard Layer: Introduction

This blog post is a version of my recent SOTM 2021 presentation on the OpenStreetMap Standard Layer and who’s using it.

The OpenStreetMap Standard Layer is the default layer on openstreetmap.org, using most of the front page. It’s run by the OpenStreetMap Foundation, and the Operations Working Group is responsible for the planning, organisation and budgeting of OSMF-run services like this one and servers running it. There are other map layers on the front page like Cycle Map and Transport Map, and I encourage you to try them, but they’re not hosted or planned by us.

OpenStreetMap Survey by Visits

In my last post I looked at survey responses by country and their correlation with mappers eligible for a fee waver as an active contributor.

I wanted to look at the correlation with OSM.org views. I already had a full day’s worth of logs on tile.openstreetmap.org accesses, so I filtered them for requests from www.openstreetmap.org and got a per-country count. This is from December 29th, 2020. Ideally it would be from a complete week, and not a holiday, but this is the data I had downloaded.

Preview image

The big outlier is Italy. It has more visits than I would expect, so I wonder if the holiday had an influence. Like before, the US is overrepresented in the results, Russia and Poland are underrepresented, and Germany is about average.

Like before, I made a graph of the smaller countries.

Preview image

More small countries are above the average line - probably an influence of Italy being so low.

OpenStreetMap Survey

The board has started releasing results from their 2021 survey. I’ve done some analysis on the response rates by country.

There’s lots of data for activity on OSM by country, but for this I took the numbers from joost for how many “active contributors” there are according to the contributor fee waver criteria.

Preview image

For the larger countries, Russia is the most underrepresented country. This is not surprising, as they are underrepresented in other venues like the OSMF membership.

The US and UK are both slightly overrepresented in the survey, but less so than I would have expected based on other surveys and OSMF membership.

The smaller countries are all crowded, so I did a graph of just them.

Preview image

As with other surveys, Japan is underrepresented. Indonesia, although underrepresented is less underrepresented than I would have expected.

OpenStreetMap Cartographic: A Client-side Rendered OpenStreetMap Carto

I’ve been working on a new project, OpenStreetMap Cartographic. This is a client-side rendering based on OpenStreetMap Carto. This is an ambitious project, as OpenStreetMap Carto is an extremely complex style which shows a large number of features. The technical choices I’m making are designed so the style is capable of handling the load of osm.org with minutely updates.

I’ve put up a world-wide demo at https://pnorman.dev.openstreetmap.org/cartographic/mapbox-gl.html, using data from 2020-03-16, and you can view the code at https://github.com/pnorman/openstreetmap-cartographic.

Creating Tarballs

With all the tiles generated and optimized, they just need to be packaged in a tarball. Before creating them, we want to create some files with metadata about what was used to generate the tiles. The commit of the stylesheet and the timestamp of the planet file can be extracted with a couple of commands.

Optimizing PNGs

With the tiles generated normally the next step would be to serve them, but because I’m planning to distribute them to others, I’m going to the unusual step of optimizing the PNGs. Optimizing PNGs can cut the file size in half, helping downstream users of the tiles I’m generating.