Speed up backup process by downloading multiple Shard Groups in parallel #365
Currently, the backup process downloads one shard at a time from the Influx API and stores it on the file system. This tends to be very slow on larger databases because it leaves most of the available I/O capacity unused, capacity that could speed up the process considerably.
This PR introduces a pool of workers that downloads several shards in parallel. The work is split at the shard-group level: in the Influx OSS version a shard group holds only a single shard, so parallelizing within a shard group would gain nothing.
In my benchmark, run in a VM on my machine with limited I/O capacity, the parallel backup is already 2 to 3 times faster, and the speedup is probably larger on a beefier system.
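
For readers unfamiliar with the pattern, here is a minimal sketch of a bounded worker pool that processes one shard group per task. It is not the code in this PR: `backup_shard_groups`, `fake_download`, and the `max_workers` parameter are hypothetical names chosen for illustration, and the download step is simulated with a short sleep instead of a real Influx API call.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def backup_shard_groups(
    shard_group_ids: Iterable[int],
    download: Callable[[int], T],
    max_workers: int = 4,
) -> Dict[int, T]:
    """Run `download` once per shard group, at most `max_workers` at a time.

    Returns a mapping of shard group ID to whatever `download` returned.
    If any download raises, the exception propagates to the caller after
    the already-submitted tasks have finished.
    """
    results: Dict[int, T] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per shard group; the pool limits how many run at once.
        futures = {pool.submit(download, gid): gid for gid in shard_group_ids}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


def fake_download(shard_group_id: int) -> str:
    """Hypothetical stand-in for fetching a shard group and writing it to disk."""
    time.sleep(0.01)  # simulate I/O-bound work
    return f"shard_group_{shard_group_id}.tar"


if __name__ == "__main__":
    files = backup_shard_groups(range(1, 9), fake_download, max_workers=4)
    for gid in sorted(files):
        print(gid, files[gid])
```

Threads suit this kind of workload because each task spends most of its time waiting on network and disk I/O. The worker count would need tuning against the I/O capacity of the target system.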