Speed up backup process by downloading multiple Shard Groups in parallel #365

TwentyFiveSoftware · 2022-03-08T15:57:56Z

Currently, the backup process downloads one shard at a time from the Influx API and writes it to the file system. This is very slow on larger databases because it leaves most of the available IO capacity unused.

This PR introduces a pool of workers that download shards in parallel. The work is split at the shard group level: in the Influx OSS version a shard group holds only a single shard, so parallelizing within a shard group would gain nothing.

In my benchmark, run in a VM on my machine with limited IO capacity, the parallelized backup was already 2 to 3 times faster. The speedup is probably even larger on a more powerful system.

TwentyFiveSoftware · 2022-03-08T16:05:37Z

Closes #366

speed up backup process by splitting the shard download process per shard group across different parallel workers

239a8cc
