Question: Verifying we can safely reuse the context? #33

neiljohari · 2024-08-26T23:51:15Z

Hello, thanks for the great library!

We wanted to verify we aren't running into any UB here.

We are using an ObjectPool of compressors like in this comment #31 (comment) with a small tweak that we're using a custom PoolObjectPolicy and passing in a dictionary:

new DefaultObjectPool<Compressor>(
                new CustomPolicy(
                    new CompressionOptions(dictionaryBytes)));

We then call Wrap multiple times serially on the same compressor.

The question is whether there's any sort of reset on the context necessary? Following all the Wrap calls, this eventually calls:

ExternMethods.ZSTD_compress_usingCDict(this.cctx, dst, (UIntPtr) checked ((ulong) dst.Length), src, (UIntPtr) checked ((ulong) src.Length), this.Options.Cdict))).EnsureZstdSuccess()));

The docs seem to say you can reuse a ZSTD_CDict* with no issue in the bulk processing dictionary API + in the beginning it says we can reuse a context multiple times:

  When compressing many times,
  it is recommended to allocate a context just once,
  and re-use it for each successive compression operation.
  This will make workload friendlier for system's memory.
  Note : re-using context is just a speed / resource optimization.
         It doesn't change the compression ratio, which remains identical.
  Note 2 : In multi-threaded environments,
         use one different context per thread for parallel execution.
 
typedef struct ZSTD_CCtx_s ZSTD_CCtx;
ZSTD_CCtx* ZSTD_createCCtx(void);
size_t     ZSTD_freeCCtx(ZSTD_CCtx* cctx);  /* accept NULL pointer */

Just making sure this is an accurate understanding of how the lib is meant to be used?

The text was updated successfully, but these errors were encountered:

dscheg · 2024-09-02T16:42:31Z

You can reuse a context (i.e. Compressor/Decomressor instance) with a dictionary as well as context without a dictionary, while respecting the condition that only one thread accesses one context at a time. Actually, we use small amount of [ThreadStatic]Compressor instances with dict under fairly heavy loads in production for mass bulk processing, and it works well.

The only thing I think is important to note, even though the Zstd documentation says "re-using context is just a speed / resource optimization, it doesn't change the compression ratio, which remains identical", strictly speaking, the documentation doesn't explicitly say whether there might be side channels of information leakage between compressions/decompressions with single context. So if you are using compression of some sort of user data, I would recommend clarifying this point with the authors of Zstd. Or reuse the context only within signle user as a measure of defense-in-depth technique

neiljohari · 2024-09-11T23:05:32Z

You can reuse a context (i.e. Compressor/Decomressor instance) with a dictionary as well as context without a dictionary, while respecting the condition that only one thread accesses one context at a time. Actually, we use small amount of [ThreadStatic]Compressor instances with dict under fairly heavy loads in production for mass bulk processing, and it works well.

The only thing I think is important to note, even though the Zstd documentation says "re-using context is just a speed / resource optimization, it doesn't change the compression ratio, which remains identical", strictly speaking, the documentation doesn't explicitly say whether there might be side channels of information leakage between compressions/decompressions with single context. So if you are using compression of some sort of user data, I would recommend clarifying this point with the authors of Zstd. Or reuse the context only within single user as a measure of defense-in-depth technique

Thanks, we deployed this pooling approach and it seems to hold up well! 😄

Unfortunately I don't have time at the moment to extract out a minimal repro, but as a heads up we did run into an aggressive unmanaged memory leak when compressing with a dictionary following this approach every request (e.g. no pooling), and it happened at very low rps (1-5).

We aren't able to rule out fragmentation, and we tried to dig into the native code but weren't able to identify what could be causing a leak even with the suboptimal approach.

We weren't able to repro it on our dev machines even at much higher rps (native on macOS Apple Silicon), but we observed it running containerized on our Intel production machines. In the container we're building libzstd 1.5.6 from source and ensuring ZstdNet uses it.

Not sure if others have run into this / perhaps you have some theories on what could cause it. For now though the pooling approach works well to bound our memory usage and handle this more efficiently overall, but we were pretty stumped when trying to narrow down what could have caused it.

dscheg · 2024-09-12T06:58:36Z

@neiljohari CompressionOptions class is thread-safe. Do you reuse instance of this class?

dscheg added the question label Sep 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question: Verifying we can safely reuse the context? #33

Question: Verifying we can safely reuse the context? #33

neiljohari commented Aug 26, 2024

dscheg commented Sep 2, 2024 •

edited

Loading

neiljohari commented Sep 11, 2024

dscheg commented Sep 12, 2024

Question: Verifying we can safely reuse the context? #33

Question: Verifying we can safely reuse the context? #33

Comments

neiljohari commented Aug 26, 2024

dscheg commented Sep 2, 2024 • edited Loading

neiljohari commented Sep 11, 2024

dscheg commented Sep 12, 2024

dscheg commented Sep 2, 2024 •

edited

Loading