Paul’s Blog

A blog without a good name

Improving Rendering Speed With Reclustering

Part of normal database maintenance is to rebuild indexes and recluster tables, but this often gets ignored on rendering servers.

Antidotal reports report a speed increase of 25%–50% from reclustering, and this can be done without shutting down the rendering server. The overall plan is to create a new copy of the tables, build new indexes, then replace the old tables.

After stopping updates, the first step is to create a new schema and set the search path to include it.

1
2
CREATE SCHEMA IF NOT EXISTS recluster;   
SET search_path TO recluster,"$user",public;

Next, copies of the rendering tables need to be made, with a spatially correlated order.

This can be done for the four rendering tables with

1
2
3
4
5
6
7
8
9
10
11
12
13
SET search_path TO recluster,"$user",public;
CREATE TABLE planet_osm_point AS
  SELECT * FROM public.planet_osm_point
    ORDER BY ST_GeoHash(ST_Transform(ST_Envelope(way),4326),10) COLLATE "C";
CREATE TABLE planet_osm_line AS
  SELECT * FROM public.planet_osm_line
    ORDER BY ST_GeoHash(ST_Transform(ST_Envelope(way),4326),10) COLLATE "C";
CREATE TABLE planet_osm_polygon AS
  SELECT * FROM public.planet_osm_polygon
    ORDER BY ST_GeoHash(ST_Transform(ST_Envelope(way),4326),10) COLLATE "C";
CREATE TABLE planet_osm_roads AS
  SELECT * FROM public.planet_osm_roads
    ORDER BY ST_GeoHash(ST_Transform(ST_Envelope(way),4326),10) COLLATE "C";

Next new indexes need to be made. OpenStreetMap Carto includes for most of this

1
2
scripts/indexes.py --osm2pgsql | \
  PGOPTIONS='-c search_path=recluster,"$user",public' psql -d gis

The osm_id indexes still need to be recreated, or updates will take forever

1
2
3
4
5
SET search_path TO recluster,"$user",public;
CREATE INDEX planet_osm_point_pkey ON planet_osm_point (osm_id);
CREATE INDEX planet_osm_line_pkey ON planet_osm_line (osm_id);
CREATE INDEX planet_osm_polygon_pkey ON planet_osm_polygon (osm_id);
CREATE INDEX planet_osm_roads_pkey ON planet_osm_roads (osm_id);

Now that there are new tables, it’s just a matter of putting them in place so they’ll be used.

It’s important to replace the tables in a transaction so any rendering going on at the same time won’t be interrupted.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
CREATE SCHEMA IF NOT EXISTS backup;
BEGIN;
ALTER TABLE public.planet_osm_point
  SET SCHEMA backup;
ALTER TABLE recluster.planet_osm_point
  SET SCHEMA public;
COMMIT;

BEGIN;
ALTER TABLE public.planet_osm_line
  SET SCHEMA backup;
ALTER TABLE recluster.planet_osm_line
  SET SCHEMA public;
COMMIT;

BEGIN;
ALTER TABLE public.planet_osm_polygon
  SET SCHEMA backup;
ALTER TABLE recluster.planet_osm_polygon
  SET SCHEMA public;
COMMIT;

BEGIN;
ALTER TABLE public.planet_osm_roads
  SET SCHEMA backup;
ALTER TABLE recluster.planet_osm_roads
  SET SCHEMA public;
COMMIT;

DROP SCHEMA recluster;

Now that the tables have been replaced, check that everything is rendering correctly. If it is, updates can be resumed and the old tables removed with

1
DROP SCHEMA backup CASCADE;

One change that could be made is to do each table on by one, or to do them all in parallel. If this makes sense depends on server load and capacity.