Various changes (hopefully improvements) #12

Open · ggriffiniii wants to merge 20 commits into master
Conversation

ggriffiniii

Here is a rather large change. It makes changes to the public-facing API as well as the internal implementation. The public API changes remove the hex_ prefix from function names and reduce what's exported by default. The only public functions exposed by default are now:

encode, encode_to_slice, decode, decode_to_slice, decode_to_slice_unchecked

Each of these now also takes a generic input of anything that is AsRef<[u8]>, to be more convenient for users encoding/decoding strings as well as byte buffers. The various fallback and specialized implementations are only exposed when compiled with the "bench" feature; the benchmarks now require that feature to run.
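As a rough sketch of how the reworked API reads in practice (function names are taken from the description above; exact signatures and error types are assumptions):

```rust
use faster_hex::{decode, encode, encode_to_slice};

fn main() {
    // Anything AsRef<[u8]> is accepted: &str, String, &[u8], Vec<u8>, ...
    let hex = encode("hello"); // allocating variant
    assert_eq!(hex, "68656c6c6f");

    let bytes = decode(&hex).unwrap(); // allocating, checked variant
    assert_eq!(bytes, b"hello");

    // The *_to_slice variants write into a caller-provided buffer instead.
    let mut buf = [0u8; 10];
    encode_to_slice("hello", &mut buf).unwrap();
    assert_eq!(&buf, b"68656c6c6f");
}
```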

When tested on my machine, an Intel(R) Xeon(R) W-2135 (Skylake), these changes result in the following benchmark results:

encode/faster_hex/2     time:   [21.387 ns 21.457 ns 21.609 ns]
                        thrpt:  [88.265 MiB/s 88.891 MiB/s 89.184 MiB/s]
                 change:
                        time:   [-22.640% -22.215% -21.653%] (p = 0.00 < 0.05)
                        thrpt:  [+27.638% +28.559% +29.265%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

encode/faster_hex/16    time:   [40.341 ns 40.623 ns 41.236 ns]
                        thrpt:  [370.03 MiB/s 375.62 MiB/s 378.25 MiB/s]
                 change:
                        time:   [-17.997% -16.616% -15.447%] (p = 0.00 < 0.05)
                        thrpt:  [+18.268% +19.928% +21.947%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

encode/faster_hex/32    time:   [40.945 ns 41.027 ns 41.178 ns]
                        thrpt:  [741.12 MiB/s 743.84 MiB/s 745.32 MiB/s]
                 change:
                        time:   [-16.032% -15.482% -14.929%] (p = 0.00 < 0.05)
                        thrpt:  [+17.549% +18.318% +19.093%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  5 (5.00%) high severe

encode/faster_hex/128   time:   [52.797 ns 52.941 ns 53.245 ns]
                        thrpt:  [2.2389 GiB/s 2.2517 GiB/s 2.2579 GiB/s]
                 change:
                        time:   [-13.436% -12.674% -12.073%] (p = 0.00 < 0.05)
                        thrpt:  [+13.730% +14.514% +15.521%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

encode/faster_hex/4096  time:   [270.41 ns 270.99 ns 271.79 ns]
                        thrpt:  [14.036 GiB/s 14.077 GiB/s 14.107 GiB/s]
                 change:
                        time:   [-16.646% -16.256% -15.877%] (p = 0.00 < 0.05)
                        thrpt:  [+18.873% +19.412% +19.971%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  8 (8.00%) high severe

decode/faster_hex/2     time:   [4.5752 ns 4.6051 ns 4.6524 ns]
                        thrpt:  [409.97 MiB/s 414.18 MiB/s 416.89 MiB/s]
                 change:
                        time:   [-39.299% -38.642% -37.981%] (p = 0.00 < 0.05)
                        thrpt:  [+61.241% +62.977% +64.741%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

decode/faster_hex/16    time:   [12.561 ns 12.574 ns 12.595 ns]
                        thrpt:  [1.1831 GiB/s 1.1851 GiB/s 1.1863 GiB/s]
                 change:
                        time:   [-11.740% -11.124% -10.172%] (p = 0.00 < 0.05)
                        thrpt:  [+11.324% +12.516% +13.301%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

decode/faster_hex/32    time:   [7.5830 ns 7.6135 ns 7.6552 ns]
                        thrpt:  [3.8931 GiB/s 3.9144 GiB/s 3.9302 GiB/s]
                 change:
                        time:   [-66.986% -66.902% -66.800%] (p = 0.00 < 0.05)
                        thrpt:  [+201.21% +202.13% +202.90%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

decode/faster_hex/128   time:   [9.6518 ns 9.6774 ns 9.7188 ns]
                        thrpt:  [12.266 GiB/s 12.318 GiB/s 12.351 GiB/s]
                 change:
                        time:   [-52.470% -51.723% -51.184%] (p = 0.00 < 0.05)
                        thrpt:  [+104.85% +107.14% +110.39%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

decode/faster_hex/4096  time:   [168.69 ns 169.12 ns 169.65 ns]
                        thrpt:  [22.485 GiB/s 22.556 GiB/s 22.613 GiB/s]
                 change:
                        time:   [-65.925% -65.422% -65.072%] (p = 0.00 < 0.05)
                        thrpt:  [+186.30% +189.20% +193.47%]
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  6 (6.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
  8 (8.00%) high severe

decode/faster_hex_unchecked/2
                        time:   [3.8828 ns 3.8839 ns 3.8851 ns]
                        thrpt:  [490.94 MiB/s 491.09 MiB/s 491.23 MiB/s]
                 change:
                        time:   [-12.488% -11.426% -10.756%] (p = 0.00 < 0.05)
                        thrpt:  [+12.052% +12.900% +14.271%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

decode/faster_hex_unchecked/16
                        time:   [10.755 ns 10.785 ns 10.845 ns]
                        thrpt:  [1.3740 GiB/s 1.3817 GiB/s 1.3856 GiB/s]
                 change:
                        time:   [-0.2600% +0.0841% +0.4912%] (p = 0.74 > 0.05)
                        thrpt:  [-0.4888% -0.0840% +0.2607%]
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

decode/faster_hex_unchecked/32
                        time:   [5.3663 ns 5.3761 ns 5.3883 ns]
                        thrpt:  [5.5310 GiB/s 5.5435 GiB/s 5.5536 GiB/s]
                 change:
                        time:   [-70.582% -70.458% -70.349%] (p = 0.00 < 0.05)
                        thrpt:  [+237.25% +238.50% +239.92%]
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  4 (4.00%) high mild
  12 (12.00%) high severe

decode/faster_hex_unchecked/128
                        time:   [6.4211 ns 6.4313 ns 6.4486 ns]
                        thrpt:  [18.486 GiB/s 18.536 GiB/s 18.565 GiB/s]
                 change:
                        time:   [-17.437% -16.443% -15.654%] (p = 0.00 < 0.05)
                        thrpt:  [+18.559% +19.679% +21.119%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

decode/faster_hex_unchecked/4096
                        time:   [101.75 ns 102.60 ns 103.91 ns]
                        thrpt:  [36.712 GiB/s 37.179 GiB/s 37.492 GiB/s]
                 change:
                        time:   [-33.690% -33.365% -32.935%] (p = 0.00 < 0.05)
                        thrpt:  [+49.110% +50.072% +50.806%]
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) low severe
  2 (2.00%) high mild
  6 (6.00%) high severe

The commit messages:

The output length only needs to be half the input length.

This uses Criterion's benchmark groups to compare the relative
performance between the different implementations at various byte
lengths. The Criterion reports now provide graphs that overlay the
different implementations so you can easily see the relative
performance.

Range patterns are deprecated; switch to inclusive ranges instead.

Re-export the deprecated function `hex_to` in a separate use statement
that allows deprecation warnings.

This isn't an error condition. A decoded empty slice is just an empty
slice.

This also removes functions from the public API. hex_decode and
hex_decode_unchecked are the only things exposed by default.
hex_check_sse, hex_check_fallback, and hex_decode_fallback are only
visible when compiled with the 'bench' feature. This means that benches
now need to specify --features=bench when running:

$ cargo bench --features=bench

This makes no changes to the actual implementation, and benchmarks
confirm that.

Rather than checking to ensure all bytes are within valid ranges, this
now checks to ensure no bytes are within invalid ranges. This is mostly
to avoid a compiler warning when comparing an i32 to 0xffff, because we
now compare the i32 to zero.

The performance is identical.
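A scalar sketch of the inverted check (illustrative only, not the PR's actual SIMD code): OR together "byte is invalid" flags across the input so the final comparison is against zero rather than an all-valid mask.

```rust
/// Returns true if every byte is an ASCII hex digit.
fn check_hex(input: &[u8]) -> bool {
    let mut invalid: i32 = 0;
    for &b in input {
        // Flag bytes that fall inside an invalid range...
        let bad = !matches!(b, b'0'..=b'9' | b'a'..=b'f' | b'A'..=b'F');
        invalid |= bad as i32;
    }
    // ...so the final test compares against zero, not 0xffff.
    invalid == 0
}
```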
…e pass over the input data.

This speeds up checked decoding substantially, while not changing the
performance of unchecked decoding.

On my machine this increases throughput when decoding 2 bytes of data by
20%, and when decoding 4096 bytes of data by 116%.

Use a lookup table to determine the offset to add to each byte. This
improves performance substantially:

decode/faster_hex/4096 thrpt increased 15%
decode/faster_hex_unchecked/4096 thrpt increased 54%
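A sketch of the lookup-table idea (the exact table layout here is an assumption, not necessarily the PR's): each entry holds the offset that, added with wrapping arithmetic, turns an ASCII hex digit into its 4-bit value, so the hot loop is one table load and one add per input byte.

```rust
/// Offsets such that b.wrapping_add(offsets[b as usize]) yields the
/// nibble value of a valid hex digit. Invalid bytes are rejected by the
/// separate validity check, so their entries don't matter here.
fn build_offsets() -> [u8; 256] {
    let mut t = [0u8; 256];
    for b in b'0'..=b'9' {
        t[b as usize] = 0u8.wrapping_sub(b'0'); // e.g. '7' + 0xd0 -> 0x07
    }
    for b in b'a'..=b'f' {
        t[b as usize] = 10u8.wrapping_sub(b'a'); // e.g. 'c' + 0xa9 -> 0x0c
    }
    for b in b'A'..=b'F' {
        t[b as usize] = 10u8.wrapping_sub(b'A');
    }
    t
}

fn main() {
    let offsets = build_offsets();
    let (hi, lo) = (b'f', b'f');
    let byte = hi.wrapping_add(offsets[hi as usize]) << 4
        | lo.wrapping_add(offsets[lo as usize]);
    assert_eq!(byte, 0xff);
}
```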
Compare the implementation against the hex crate.

The benches included unnecessary code that isn't available via the
public API and was only exercised as a side effect of the decode
benchmark.
@zhangsoledad
Member

Great work @ggriffiniii! I took a quick pass over it, but it's a big PR, so I need more time to do a proper review.

@zhangsoledad
Member

zhangsoledad commented Feb 19, 2020

I'd propose using AsMut to make the public API more generic, like this:

pub fn encode<S, D>(src: &S, dst: &mut D) -> Result<(), Error>
where
    S: AsRef<[u8]> + ?Sized,
    D: AsMut<[u8]> + ?Sized,
{ ... }

pub fn encode_to_string<I>(src: &I) -> String
where
    I: AsRef<[u8]> + ?Sized,
{ ... }

@zhangsoledad added the "major" (Breaking changes, bump major) label Feb 19, 2020
@ggriffiniii
Author

Yeah, this is a rather large change. By all means take as much time as you need to do a thorough review.

I'd propose using AsMut to make the public API more generic, like this:

pub fn encode<S, D>(src: &S, dst: &mut D) -> Result<(), Error>
where
    S: AsRef<[u8]> + ?Sized,
    D: AsMut<[u8]> + ?Sized,
{ ... }

pub fn encode_to_string<I>(src: &I) -> String
where
    I: AsRef<[u8]> + ?Sized,
{ ... }

It could be changed to AsMut easily enough, but it doesn't seem worth it to me. The big benefit of AsRef<[u8]> is being able to accept &str and &String as well as byte slices and Vecs. The only standard types that implement AsMut<[u8]> are byte slices, Vecs, and Box<[u8]>, which will already auto-deref to &mut [u8] without the explicit AsMut bound.
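To illustrate the auto-deref point (a minimal, hypothetical example): a parameter of type &mut [u8] already accepts Vec<u8> and Box<[u8]> through deref coercion, whereas it's the AsRef<[u8]> bound on the input side that admits &str and &String.

```rust
fn fill(dst: &mut [u8]) {
    for b in dst.iter_mut() {
        *b = 0;
    }
}

fn main() {
    let mut v = vec![1u8, 2, 3];
    fill(&mut v); // Vec<u8> deref-coerces to &mut [u8]

    let mut boxed: Box<[u8]> = vec![4u8, 5].into_boxed_slice();
    fill(&mut boxed); // Box<[u8]> deref-coerces too

    // A &[u8] parameter would NOT accept a &str directly, though;
    // that's what AsRef<[u8]> buys on the input side.
}
```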

I also favored encode and decode allocating, with encode_to_slice and decode_to_slice as the more verbose options for those seeking to reuse allocations. The latter is far less common, so making it slightly more verbose makes sense and also follows a convention similar to other encoding crates like hex and base64.

@ggriffiniii
Author

While on the topic of public function signatures, I thought it would be useful to add a couple of additional ideas I had. I opted not to include them in this change just to keep an already large change from growing even further, but here goes.

I'll outline the proposals for encoding to keep the focus on the ideas, but in each case there is a matching analog for decoding that should probably be included if you want to pursue it.

Provide a convenient return value for encode_to_slice

pub fn encode_to_slice<'i, 'o, I>(src: &'i I, dst: &'o mut [u8]) -> Result<&'o str, Error>
where
    I: AsRef<[u8]> + ?Sized;

The goal here would just be to provide a convenient &str of the encoded data. Users would be free to ignore the return value, and the function would behave identically to how it does now, but for many users getting back a &str (already validated as UTF-8) could be a nice convenience.
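A minimal sketch of how this could be implemented (hypothetical; the Error enum here is a stand-in for the crate's error type). Hex output is pure ASCII, so the written prefix of dst is always valid UTF-8:

```rust
#[derive(Debug)]
pub enum Error {
    InvalidLength, // stand-in variant for illustration
}

pub fn encode_to_slice<'o, I>(src: &I, dst: &'o mut [u8]) -> Result<&'o str, Error>
where
    I: AsRef<[u8]> + ?Sized,
{
    let src = src.as_ref();
    // Fail if the destination can't hold two hex digits per input byte.
    let out = dst.get_mut(..src.len() * 2).ok_or(Error::InvalidLength)?;
    const HEX: &[u8; 16] = b"0123456789abcdef";
    for (i, &b) in src.iter().enumerate() {
        out[2 * i] = HEX[(b >> 4) as usize];
        out[2 * i + 1] = HEX[(b & 0x0f) as usize];
    }
    // Every byte just written is ASCII, so this cannot fail.
    Ok(std::str::from_utf8(out).expect("hex output is ASCII"))
}
```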

Provide a simpler API for buffer reuse

pub fn encode_with_buffer<'i, 'o, I>(src: &'i I, dst: &'o mut Vec<u8>) -> &'o str
where
    I: AsRef<[u8]> + ?Sized;

or

pub fn encode_with_buffer<I, B>(src: &I, dst: B) -> String
where
    I: AsRef<[u8]> + ?Sized,
    B: Into<Vec<u8>>;

The goal with both of these options is to allow users to reuse buffers (allocations) when encoding multiple chunks of data. That's the biggest performance benefit offered by the to_slice functions, but these would allow the same performance characteristics without the potential errors, since the size of the output buffer would be managed internally. Both options would be simple wrappers around encode_to_slice.
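For instance, a sketch of the first variant as a thin wrapper over the encode_to_slice sketch above (names and details hypothetical):

```rust
pub fn encode_with_buffer<'o, I>(src: &I, dst: &'o mut Vec<u8>) -> &'o str
where
    I: AsRef<[u8]> + ?Sized,
{
    let src = src.as_ref();
    dst.clear();
    dst.resize(src.len() * 2, 0); // output size managed internally
    // Cannot fail: the buffer was just sized to exactly 2 * src.len().
    encode_to_slice(src, dst.as_mut_slice()).expect("buffer sized correctly")
}
```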

These could all be pursued as subsequent PRs, but we could have the discussion here while the topic of the public API is already open. The buffer-reuse addition would be a semver-compatible change that could be made at any time; the return-value change to encode_to_slice would not be semver compatible, so a decision on that should be made prior to making another release on crates.io.

@fanatid

fanatid commented Apr 15, 2020

I like the direction of the changes in this PR. The crate itself is great; I found it when I noticed how slow hex is. But documentation and a serde module are missing here. Newcomers may prefer hex because it's clear how to use it and it's ready to use with serde. Any plans for this? I'd like to work on some of that once this PR is merged.

@fanatid

fanatid commented Apr 15, 2020

Looks like proptest should be bumped (rustc 1.44.0-nightly (94d346360 2020-04-09)):

   Compiling proptest v0.8.7
error[E0495]: cannot infer an appropriate lifetime for lifetime parameter `'a` due to conflicting requirements
  --> /home/kirill/.cargo/registry/src/github.com-1ecc6299db9ec823/proptest-0.8.7/src/arbitrary/_core/iter.rs:44:23
   |
44 |         base.prop_map(Iterator::cloned).boxed()
   |                       ^^^^^^^^^^^^^^^^
   |
note: first, the lifetime cannot outlive the lifetime `'a` as defined on the impl at 35:6...
  --> /home/kirill/.cargo/registry/src/github.com-1ecc6299db9ec823/proptest-0.8.7/src/arbitrary/_core/iter.rs:35:6
   |
35 | impl<'a, T: 'static + Clone, A: fmt::Debug + 'static + Iterator<Item = &'a T>>
   |      ^^
note: ...so that the types are compatible
  --> /home/kirill/.cargo/registry/src/github.com-1ecc6299db9ec823/proptest-0.8.7/src/arbitrary/_core/iter.rs:44:23
   |
44 |         base.prop_map(Iterator::cloned).boxed()
   |                       ^^^^^^^^^^^^^^^^
   = note: expected  `&'a T`
              found  `&T`
   = note: but, the lifetime must be valid for the static lifetime...
note: ...so that the type `strategy::map::Map<S, fn(A) -> core::iter::Cloned<A> {<A as core::iter::Iterator>::cloned::<'_, T>}>` will meet its required lifetime bounds
  --> /home/kirill/.cargo/registry/src/github.com-1ecc6299db9ec823/proptest-0.8.7/src/arbitrary/_core/iter.rs:44:41
   |
44 |         base.prop_map(Iterator::cloned).boxed()
   |                                         ^^^^^

Updating proptest to 0.9.6 makes the errors go away.

@zhangsoledad added this to the version milestone May 10, 2021
@doitian
Member

doitian commented Oct 16, 2022

Hey y'all, any updates on this PR?

@zhangsoledad
Member

The sticking point for this PR has been that it wasn't handled in a timely way. We can accept the algorithm changes, but we don't want to break the original API outright; adding new APIs rather than directly overwriting the originals would be more acceptable.

@dsheets

dsheets commented Dec 1, 2023

Please break the API. You could have the de facto hex encoding crate, as hex seems unmaintained and the API will need to be improved eventually.
