In my last post I looked at osm2pgsql style files and import and rendering speed. This was really looking at the width of rendering tables and how that impacted rendering speed. I found that eliminating a few unused columns decreased database size by 0.28% and increased rendering speed by 2.35%. The style file is not the only option that affects table width. Osm2pgsql has a series of options which allow you to store additional tags in an hstore column.
hstore
is a special data type in postgresql, designed for storing sets of arbitrary key/value pairs. OpenStreetMap tags are also key/value pairs, which makes this data type ideal for storing a complete list of tags.
TOAST tables
To make sense of benchmarking results, it’s necessary to understand TOAST
tables, EXTENDED
storage, and how PostgreSQL stores large values. In PostgreSQL tuples (rows) are stored on pages, each of which is 8 kB, and a tuple must fit into one page. Obviously, an alternate strategy is required for when a tuple exceeds 8 kB it needs to be stored using another method. The methods that are used in these cases are compression and storage in a special TOAST table. These can only be used on variable-length (varlena
) types, but text
, hstore
and geometry
are all variable-length, and most of the columns in an osm2pgsql rendering table are of these types.
When a row is wider than 2 kB (or whatever TOAST_TUPLE_THRESHOLD
is set to) PostgreSQL will try to compress field values and move them from the table to the TOAST table. Left behind in place of the data is an ID that points to a row in the TOAST table. When you want to access the TOAST’ed data PostgreSQL will automatically detect it is stored in the TOAST tables, retrieve the chunks from the TOAST tables by ID, and assemble them into the result, presenting what you expect without you ever realizing it was stored in a TOAST table.
The TOAST tables are so transparent to the user that I have not found a way of finding which values, if any, are stored in TOAST tables. Nonetheless, TOAST storage does matter when considering hstore performance because the hstore data may be stored in a TOAST table, changing size increase of rows.
osm2pgsql hstore options
Osm2pgsql has five hstore options
--hstore
or-k
adds any tags not already in a conventional column to a hstore column. With the standard stylesheet this would result in tags likehighway
appearing in a conventional column while tags not in the style likename:en
orlanes:forward
would appear only in the hstore column.--hstore-all
or-j
adds all tags to a hstore column, even if they’re already stored in a conventional column. With the standard stylesheet this would result in tags likehighway
appearing in conventional column and the hstore column while tags not in the style likename:en
orlanes:forward
would appear only in the hstore column.--hstore-column
or-z
, which adds an additional column for tags starting with a specified string, e.g.--hstore-column "name:'
produces a hstore column that contains all name:xx tags--hstore-match-only
modifies the above options and prevents objects from being added if they only have tags in the hstore column and no conventional tags.--hstore-add-index
adds an index to the hstore columns
Either --hstore
or --hstore-all
when combined with --hstore-match-only
should give the same rows, just with an additional unused column.
Results
As before, we’re using a modified osm2pgsql which doesn’t build indexes at import time. Sizes are from pg_column_size
and pg_table_size
.
The total number of rows was the same without hstore, with -k --match-only
and with -j --match-only
and 0.6% larger with -k
or -j
Import type | Rendering rate (MT/s) | Speed decrease | osm2pgsql time | Rendering tables size | Size increase | Hstore size |
---|---|---|---|---|---|---|
No hstore | 8.665 ± 0.017 | N/A | 14600 | 60.7 GB | N/A | N/A |
-k –match-only | 8.632 ± 0.022 | 0.38% | 14780 | 66.2 GB | 9% | 5.3 GB |
-j –match-only | 8.592 ± 0.010 | 0.84% | 14984 | 75.8 GB | 25% | 14.6 GB |
-k | 8.644 ± 0.018 | 0.24% | 14895 | 66.5 GB | 10% | 5.36 GB |
-j | 8.594 ± 0.013 | 0.82% | 15041 | 76.1 GB | 25% | 14.7 GB |
Conclusions
The rendering rate without hstore agrees with previous results, which validates the methods used.
It is immediately obvious that --match-only
has virtually no impact when used with the standard style because most objects have at least one tag used by default.style. This is not surprising because there is a strong feedback cycle between what is mapped and what is displayed on the main page. Additional tags are mainly used to provide additional information.
Looking at the size of the hstore columns, we can tell that most tags are present in the normal columns. There is little benefit to -j
over -k
as all -j
serves to do is duplicate all the data in normal columns to the hstore, at the cost of slower rendering, duplicated data and hstore columns three times the size.
Previous results found a rendering speed increase of 2.35% with a table size decrease 0.28%. This compares with the hstore results where a much larger size increase results in a minor speed decrease.
Summary
There are minimal speed decreases from a hstore column created with -k
/--hstore
, although there is a more significant size increase. Other hstore options like -j
/--hstore-all
and --match-only
are of minimal use except for specialized cases.