Improving file-saving performance was fairly easy. The octree write function was single-threaded, and profiling revealed that the bulk of the time was spent in LZ4 compression.
So I added data-parallel #Tasks using #enkiTS TaskSets, which compressed sections of the octree to memory. The main write loop launched several of these, then wrote out the results in order whilst setting off new TaskSets.
https://github.com/dougbinks/enkiTS
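
As a rough illustration of that write loop, here is a minimal sketch. It is not the actual code: it swaps enkiTS TaskSets for std::async and LZ4 for a toy run-length encoder so that it builds with only the standard library, and the names CompressSection, WriteSections and maxInFlight, as well as the size-prefixed output layout, are assumptions made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <future>
#include <ostream>
#include <sstream>   // std::ostringstream, handy for trying the writer in memory
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Stand-in for LZ4 (assumption): a trivial run-length encoder that emits
// (run length, value) pairs, with runs capped at 255.
Bytes CompressSection(const Bytes& in) {
    Bytes out;
    for (std::size_t i = 0; i < in.size();) {
        const std::uint8_t value = in[i];
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == value && run < 255) ++run;
        out.push_back(static_cast<std::uint8_t>(run));
        out.push_back(value);
        i += run;
    }
    return out;
}

// Keeps up to maxInFlight compressions running on other threads, and writes
// each result (a 32-bit size followed by the bytes) in the original section
// order. `sections` must outlive the call, since the tasks read from it.
void WriteSections(const std::vector<Bytes>& sections, std::ostream& out,
                   std::size_t maxInFlight) {
    assert(maxInFlight > 0);
    std::deque<std::future<Bytes>> inFlight;  // oldest task is at the front
    std::size_t next = 0;                     // next section to launch

    while (next < sections.size() || !inFlight.empty()) {
        // Top up the queue so new work starts while earlier results are written.
        while (next < sections.size() && inFlight.size() < maxInFlight) {
            const Bytes* section = &sections[next++];
            inFlight.push_back(std::async(std::launch::async, [section] {
                return CompressSection(*section);
            }));
        }

        // Block on the oldest task only, which preserves the output order.
        const Bytes compressed = inFlight.front().get();
        inFlight.pop_front();

        const std::uint32_t size = static_cast<std::uint32_t>(compressed.size());
        out.write(reinterpret_cast<const char*>(&size), sizeof(size));
        out.write(reinterpret_cast<const char*>(compressed.data()),
                  static_cast<std::streamsize>(compressed.size()));
    }
}
```

Waiting only on the oldest task is what keeps the file layout identical to the single-threaded version, whatever order the compressions finish in. The in-flight limit also bounds how many compressed sections sit in memory at once.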
2/N