Paul’s Blog

A blog without a good name

ZFS Settings for Osm2pgsql

Continuing with the setup of my new server, I moved on to filesystem settings. Unlike previous servers, this server runs FreeBSD with a ZFS file system. ZFS has many features, but the important ones for osm2pgsql are transparent on-disk compression and an adjustable record size.
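For reference, both tunables are ordinary ZFS dataset properties. A minimal sketch of the setup, with hypothetical pool and dataset names:

    # Dataset holding the PostgreSQL tablespace (names are illustrative)
    zfs create tank/pgdata

    # The two tunables benchmarked below
    zfs set recordsize=8K tank/pgdata
    zfs set compression=lz4 tank/pgdata

    # Confirm the active values
    zfs get recordsize,compression tank/pgdata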

There is contradictory information available about both of these tunables. Gregory Smith’s PostgreSQL 9.0 High Performance suggests matching the recordsize to the PostgreSQL block size of 8K for scattered random IO, and using transparent compression for scans of data too small to be compressed by TOAST. On the other hand, some mailing list posts have suggested 128K for ETL workloads like osm2pgsql.
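For context, PostgreSQL’s block size is fixed at compile time and can be confirmed from a running server; assuming a database named gis:

    psql -d gis -c "SHOW block_size;"
    #  block_size
    # ------------
    #  8192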

I was unable to find any ZFS tuning suggestions specific to spatial databases, so I was operating completely in the dark, which called for benchmarking.

Using the previous PostgreSQL tuning, I ran a series of imports, adjusting the ZFS recordsize and compression.
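The exact invocation matters less than the ZFS settings, but for reference, a slim-mode import looks something like this (flags and file name are illustrative, not the exact command used):

    osm2pgsql --create --slim -d gis -C 24000 --number-processes 8 planet-latest.osm.pbf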

Results

Processing time

Compression   8K recordsize   128K recordsize
None          6.3h            8.4h
lz4           6.1h            10.8h

Processing time includes processing nodes, ways and relations, and is single-threaded. A faster SSD-only server with a faster CPU can complete this stage in 4 hours.

PostgreSQL index and cluster time

Compression   8K recordsize   128K recordsize
None          18.4h           21.8h
lz4           12.8h           30.4h

This time is dominated by the creation of a 100GB GIN index on a bigint[] column.
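The statement osm2pgsql issues for that index is approximately the following (table and column names from the slim-mode middle schema; shown for illustration):

    psql -d gis -c "CREATE INDEX planet_osm_ways_nodes ON planet_osm_ways USING gin (nodes) WITH (fastupdate = off);"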

Total import time

Compression   8K recordsize   128K recordsize
None          27.2h           32.7h
lz4           21.5h           43.7h

Not broken out separately is the pending ways stage, as it is largely unaffected by ZFS settings.

Record size

The one clear conclusion is that 8K recordsizes are faster than 128K recordsizes. This holds true if you further subdivide the import into individual tasks like processing ways or relations. I found no cases where 128K was faster.

Compression and a 128K recordsize are a particularly bad combination. I’m not familiar enough with the internals of ZFS to say for certain, but it’s possible that every time PostgreSQL writes an 8K block, ZFS is forced to fetch the full 128K record, decompress it, merge in the new 8K, and recompress it: a read-modify-write cycle that touches sixteen times the data actually being changed.

An examination of CPU usage supports this: the difference between compressed and uncompressed runs is bigger at 128K than at 8K, for both CPU utilization and total CPU time used.
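For anyone wanting to reproduce the CPU measurements, FreeBSD’s time(1) reports the resource usage numbers directly; a sketch, reusing the illustrative import command from above:

    # -l prints resource usage, including user and system CPU time
    /usr/bin/time -l osm2pgsql --create --slim -d gis planet-latest.osm.pbf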

Compression

Compression is more interesting: it’s a performance gain at an 8K recordsize but a loss at 128K. Since 8K is the all-around faster setting, it’s worth looking at in more detail.

Detailed time

Compression setting            off      lz4
Processing: node               1516     1514
Processing: way                5589     5938
Processing: relation           15853    14716
Processing: total              22958    22168
Pending ways                   8746     8889
PostgreSQL index and cluster   66330    46365
Total                          98058    77441

All times are in seconds.

Pending ways is unique in that it is the only purely write-oriented part of the import that is IO bound, with the load consisting of many COPY statements. It is also the only part that is slower on a compressed ZFS volume.
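Schematically, the write pattern is bulk COPY into the middle way table; a toy example with the legacy column layout (values are made up, shown purely to illustrate the load):

    printf '42\t{1001,1002,1003}\t{highway,residential}\n' \
      | psql -d gis -c "COPY planet_osm_ways (id, nodes, tags) FROM STDIN;"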

Total CPU usage

With compression, CPU utilization can be expected to be higher, since the import completes faster. On a sufficiently fast IO system, CPU could again become the limiting factor. The 8K lz4 import used 33.9 CPU-hours, while the 8K uncompressed import used 34.2 CPU-hours. On a modern system with multiple cores, CPU usage is unlikely to be a reason to reject lz4 compression. It may be an issue with slower methods like lzjb.
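To put those figures in utilization terms using the wall-clock totals above: 33.9 CPU-hours over a 21.5-hour import averages roughly 1.6 cores busy with lz4, against 34.2 CPU-hours over 27.2 hours, or about 1.3 cores, uncompressed.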

Conclusions

A recordsize of 8K is preferable to 128K for ZFS volumes holding a tablespace, under all conditions tested. For anything but the purely write-oriented portions of the import, lz4 compression offers speed advantages, and very significant ones for large GIN index creation.

Further work

The ZFS settings for the xlog were kept constant: a 128K recordsize, uncompressed. This may not be optimal, given the gains shown with an 8K recordsize on the tablespace volume.
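For completeness, the xlog dataset was held at those values across all runs; expressed as ZFS commands (dataset name illustrative):

    zfs create -o recordsize=128K -o compression=off tank/pg_xlog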

Large PostGIS geometries are stored compressed, using TOAST with MAIN storage. This compression sits on top of the lz4 compression done by ZFS and so results in duplicated work. Better performance might be obtained by switching all MAIN storage to PLAIN or EXTERNAL.
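Assuming the standard osm2pgsql output tables, where the geometry column is named way, that change would look roughly like the following, repeated for each table; note it only affects newly written rows:

    psql -d gis -c "ALTER TABLE planet_osm_polygon ALTER COLUMN way SET STORAGE EXTERNAL;"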