Paul’s Blog

A blog without a good name

Serving Vector Tiles

If you want to serve vector tiles, there are a few server options that have developed, each with different strengths and weaknesses.

Identifying Slow Rendering Queries

In my last post I took a sample of PostgreSQL queries and loaded them into a table to do some analysis. The next step is to

This log will include queries from osm2pgsql updating the database, but normally we’re only optimizing rendering queries. Mapnik starts all of these with SELECT ST_AsBinary("way") AS geom so we can make a view of just rendering queries with CREATE VIEW render_query_log AS SELECT * FROM query_log WHERE query LIKE 'SELECT ST_AsBinary("way") AS geom%';

There are some tools to turn PostgreSQL queries back into style layers and zoom, but none of them work well on modern complex style queries, so we need to figure out a different way to group queries for the same layers.

Mapnik has four tokens which can appear in a query.

  1. !bbox! is replaced the buffered bounding box of the area being rendered. This text ends up looking like ST_SetSRID('BOX3D(...

  2. !scale_denominator! is replaced by the scale in projected units per pixel. OpenStreetMap Carto doesn’t use this, but it’s often seen in vector tile styles to calculate the zoom level. This is a number, either an integer or floating point depending on zoom.

  3. !pixel_width! and !pixel_height! which are the width and height of a pixel in projected units. These are both numbers, either integers or floating points. They can be zero. In OpenStreetMap carto they are alway seen in the form way_area/NULLIF(!pixel_width!::real*!pixel_height!::real,0) or way_area > [some number]*!pixel_width!::real*!pixel_height!::real.

To get the common part of the query we can cut off the query text at the first occurance of ST_SetSRID or way_area. The former is only inserted by Mapnik and the latter only appears on lines with Mapnik tokens or as part of an `ORDER BY very late in the query.

Splitting the string by these tokens and taking the first part can solve this.

1
2
3
4
SELECT count(*), sum(run_length), (regexp_split_to_array(query, '(way_area|ST_SetSRID)'))[1] AS q
  FROM render_query_log
  GROUP BY q
  ORDER by sum DESC;

SELECT count(*), sum(run_length), query q FROM render_query_log GROUP BY q

regexp_split_to_array(‘select foo from a way_area b seasdfas’, ‘(way_area|ST_SetSRID)’)

Sampling Slow Postgres Queries

Part of optimizing a style or application using PostgreSQL is finding what queries are taking up the most time, and improving them. The first part of this is finding the slow queries.

A popular way to do this is using log_min_duration_statement to log all queries over a certain duration or to use log_statement to log all queries. These both have downsides.

Logging slow queries gives a misleading view on what is using the most resources. A query that takes 100 seconds to run but only runs once a day will use less resource than a 100ms query running every second, but the fast often repeated query might not appear in the logs. This is common with map rendering where low-zoom queries are slow but infrequently run.

Logging all queries is comprehensive but can have a noticeable performance impact. Every query causes a write to the server logs. This can be an unacceptable performance drag on a production server.

Both methods suffer from adjusting log file settings. This means adjusting the server configuration, having the setting apply to all databases, having to remove extraneous information from the logs, and generally being hard to parse. In a shared environment this can be impossible to get done. If you have statements with newlines the newlines make it into the logfile unescaped, making it hard to parse them with normal tools.

Fortunately, there is another method: sampling running queries.

Self-hosted Vector Tiles and Tangram

I’ve been experimenting with generating my own vector tiles and client-side rendering with Tangram in order to figure out how to best write a style in its language.

Tangram is a GL-based renderer written by Mapzen and normally used with their Tilezen vector tiles, but I’m interested in being able to make my own vector tiles with different cartographic choices.

Improving Rendering Speed With Reclustering

Part of normal database maintenance is to rebuild indexes and recluster tables, but this often gets ignored on rendering servers.

Antidotal reports report a speed increase of 25%–50% from reclustering, and this can be done without shutting down the rendering server. The overall plan is to create a new copy of the tables, build new indexes, then replace the old tables.

OpenStreetMap Active Users

Periodically people make the claim of over 2 million active users for OpenStreetMap, but what this mean? This is the total number of accounts, including those who never edited, those who left long ago, spammers, and actual active contributors.

The closest metric to a standard is active users over the last 30 days. Although we can’t get that number, we can look at the changeset dump and analyze it with ChangesetMD and some SQL.

New Drive Testing

I had to buy a new hard drive for my array recently, which meant verifying that it works before I put it into service.

I don’t do burn-in tests of drives. Drives have a bathtub curve for reliability, like most components, but I find that if a drive is failing, it will start exhibiting performance problems that a thorough testing reveals.

Running a sufficiently long burn-in is increasingly impractical. A burn-in would probably involve writing and reading everywhere on the disk multiple times, and disks have been getting bigger and bigger. Denser platters help with sequential speeds, but I’d estimate it would take several days to burn in a new drive.

OpenStreetMap Carto Complexity

I often refer to OpenStreetMap Carto as the largest most complex open multi-contributor map style, but what does that mean?

Broken down, it means

  • It’s the largest open stylesheet. If you measure in code size, features rendered, or complexity, nothing else is close;

  • It’s the largest multi-contributor map style that doesn’t have a company dictating what is worked on. This means we get merge conflicts. They got so bad we changed the technology we use to define layers to make them solvable; and

  • It’s the largest style using OpenStreetMap data. Some proprietary styles like OpenCycleMap, MapQuest Open, and Mapbox Streets are complex, but none of them render the range of features we do.

More OpenStreetMap Futures

Andy recently blogged the developer numbers from his OpenStreetMap Futures talk at SOTM US.

Wanting to play with the numbers myself, I took the osm100 code and added in additional projects. The original list of repos came from a list of “Core Software” from the Engineering Working Group, and since then some of the software has been replaced, and there’s other older software which used to be core, but isn’t.