Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: Verifying we can safely reuse the context? #33

Open
neiljohari opened this issue Aug 26, 2024 · 3 comments
Open

Question: Verifying we can safely reuse the context? #33

neiljohari opened this issue Aug 26, 2024 · 3 comments
Labels

Comments

@neiljohari
Copy link

Hello, thanks for the great library!

We wanted to verify we aren't running into any UB here.

We are using an ObjectPool of compressors like in this comment #31 (comment) with a small tweak that we're using a custom PoolObjectPolicy and passing in a dictionary:

new DefaultObjectPool<Compressor>(
                new CustomPolicy(
                    new CompressionOptions(dictionaryBytes)));

We then call Wrap multiple times serially on the same compressor.

The question is whether there's any sort of reset on the context necessary? Following all the Wrap calls, this eventually calls:

ExternMethods.ZSTD_compress_usingCDict(this.cctx, dst, (UIntPtr) checked ((ulong) dst.Length), src, (UIntPtr) checked ((ulong) src.Length), this.Options.Cdict))).EnsureZstdSuccess()));

The docs seem to say you can reuse a ZSTD_CDict* with no issue in the bulk processing dictionary API + in the beginning it says we can reuse a context multiple times:

  When compressing many times,
  it is recommended to allocate a context just once,
  and re-use it for each successive compression operation.
  This will make workload friendlier for system's memory.
  Note : re-using context is just a speed / resource optimization.
         It doesn't change the compression ratio, which remains identical.
  Note 2 : In multi-threaded environments,
         use one different context per thread for parallel execution.
 
typedef struct ZSTD_CCtx_s ZSTD_CCtx;
ZSTD_CCtx* ZSTD_createCCtx(void);
size_t     ZSTD_freeCCtx(ZSTD_CCtx* cctx);  /* accept NULL pointer */

Just making sure this is an accurate understanding of how the lib is meant to be used?

@dscheg
Copy link
Member

dscheg commented Sep 2, 2024

You can reuse a context (i.e. Compressor/Decomressor instance) with a dictionary as well as context without a dictionary, while respecting the condition that only one thread accesses one context at a time. Actually, we use small amount of [ThreadStatic]Compressor instances with dict under fairly heavy loads in production for mass bulk processing, and it works well.

The only thing I think is important to note, even though the Zstd documentation says "re-using context is just a speed / resource optimization, it doesn't change the compression ratio, which remains identical", strictly speaking, the documentation doesn't explicitly say whether there might be side channels of information leakage between compressions/decompressions with single context. So if you are using compression of some sort of user data, I would recommend clarifying this point with the authors of Zstd. Or reuse the context only within signle user as a measure of defense-in-depth technique

@dscheg dscheg added the question label Sep 6, 2024
@neiljohari
Copy link
Author

You can reuse a context (i.e. Compressor/Decomressor instance) with a dictionary as well as context without a dictionary, while respecting the condition that only one thread accesses one context at a time. Actually, we use small amount of [ThreadStatic]Compressor instances with dict under fairly heavy loads in production for mass bulk processing, and it works well.

The only thing I think is important to note, even though the Zstd documentation says "re-using context is just a speed / resource optimization, it doesn't change the compression ratio, which remains identical", strictly speaking, the documentation doesn't explicitly say whether there might be side channels of information leakage between compressions/decompressions with single context. So if you are using compression of some sort of user data, I would recommend clarifying this point with the authors of Zstd. Or reuse the context only within single user as a measure of defense-in-depth technique

Thanks, we deployed this pooling approach and it seems to hold up well! 😄

Unfortunately I don't have time at the moment to extract out a minimal repro, but as a heads up we did run into an aggressive unmanaged memory leak when compressing with a dictionary following this approach every request (e.g. no pooling), and it happened at very low rps (1-5).

We aren't able to rule out fragmentation, and we tried to dig into the native code but weren't able to identify what could be causing a leak even with the suboptimal approach.

We weren't able to repro it on our dev machines even at much higher rps (native on macOS Apple Silicon), but we observed it running containerized on our Intel production machines. In the container we're building libzstd 1.5.6 from source and ensuring ZstdNet uses it.

Not sure if others have run into this / perhaps you have some theories on what could cause it. For now though the pooling approach works well to bound our memory usage and handle this more efficiently overall, but we were pretty stumped when trying to narrow down what could have caused it.

@dscheg
Copy link
Member

dscheg commented Sep 12, 2024

@neiljohari CompressionOptions class is thread-safe. Do you reuse instance of this class?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants