Various changes (hopefully improvements) #12

Open · ggriffiniii wants to merge 20 commits into master
Conversation

ggriffiniii

Here is a rather large change. It makes changes to the public-facing API as well as the internal implementation. The public API changes remove the hex_ prefix from function names and reduce what's exported by default. The only public functions exposed by default are now:

encode, encode_to_slice, decode, decode_to_slice, decode_to_slice_unchecked

Each of these now also takes a generic input of anything that is AsRef<[u8]>, to be more convenient for users encoding/decoding strings as well as byte buffers. The various fallback and specialized implementations are only exposed when compiled with the "bench" feature; the benchmarks now require that feature to run.
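As a rough sketch of how the reworked API reads in practice (function names are taken from the description above; exact signatures and error types are assumptions):

```rust
use faster_hex::{decode, encode, encode_to_slice};

fn main() {
    // Anything AsRef<[u8]> is accepted: &str, String, &[u8], Vec<u8>, ...
    let hex = encode("hello"); // allocating variant
    assert_eq!(hex, "68656c6c6f");

    let bytes = decode(&hex).unwrap(); // allocating, checked variant
    assert_eq!(bytes, b"hello");

    // The *_to_slice variants write into a caller-provided buffer instead.
    let mut buf = [0u8; 10];
    encode_to_slice("hello", &mut buf).unwrap();
    assert_eq!(&buf, b"68656c6c6f");
}
```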

When tested on my machine, an Intel(R) Xeon(R) W-2135 (Skylake), these changes result in the following benchmark results:

encode/faster_hex/2     time:   [21.387 ns 21.457 ns 21.609 ns]
                        thrpt:  [88.265 MiB/s 88.891 MiB/s 89.184 MiB/s]
                 change:
                        time:   [-22.640% -22.215% -21.653%] (p = 0.00 < 0.05)
                        thrpt:  [+27.638% +28.559% +29.265%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

encode/faster_hex/16    time:   [40.341 ns 40.623 ns 41.236 ns]
                        thrpt:  [370.03 MiB/s 375.62 MiB/s 378.25 MiB/s]
                 change:
                        time:   [-17.997% -16.616% -15.447%] (p = 0.00 < 0.05)
                        thrpt:  [+18.268% +19.928% +21.947%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

encode/faster_hex/32    time:   [40.945 ns 41.027 ns 41.178 ns]
                        thrpt:  [741.12 MiB/s 743.84 MiB/s 745.32 MiB/s]
                 change:
                        time:   [-16.032% -15.482% -14.929%] (p = 0.00 < 0.05)
                        thrpt:  [+17.549% +18.318% +19.093%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  5 (5.00%) high severe

encode/faster_hex/128   time:   [52.797 ns 52.941 ns 53.245 ns]
                        thrpt:  [2.2389 GiB/s 2.2517 GiB/s 2.2579 GiB/s]
                 change:
                        time:   [-13.436% -12.674% -12.073%] (p = 0.00 < 0.05)
                        thrpt:  [+13.730% +14.514% +15.521%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

encode/faster_hex/4096  time:   [270.41 ns 270.99 ns 271.79 ns]
                        thrpt:  [14.036 GiB/s 14.077 GiB/s 14.107 GiB/s]
                 change:
                        time:   [-16.646% -16.256% -15.877%] (p = 0.00 < 0.05)
                        thrpt:  [+18.873% +19.412% +19.971%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  8 (8.00%) high severe

decode/faster_hex/2     time:   [4.5752 ns 4.6051 ns 4.6524 ns]
                        thrpt:  [409.97 MiB/s 414.18 MiB/s 416.89 MiB/s]
                 change:
                        time:   [-39.299% -38.642% -37.981%] (p = 0.00 < 0.05)
                        thrpt:  [+61.241% +62.977% +64.741%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

decode/faster_hex/16    time:   [12.561 ns 12.574 ns 12.595 ns]
                        thrpt:  [1.1831 GiB/s 1.1851 GiB/s 1.1863 GiB/s]
                 change:
                        time:   [-11.740% -11.124% -10.172%] (p = 0.00 < 0.05)
                        thrpt:  [+11.324% +12.516% +13.301%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

decode/faster_hex/32    time:   [7.5830 ns 7.6135 ns 7.6552 ns]
                        thrpt:  [3.8931 GiB/s 3.9144 GiB/s 3.9302 GiB/s]
                 change:
                        time:   [-66.986% -66.902% -66.800%] (p = 0.00 < 0.05)
                        thrpt:  [+201.21% +202.13% +202.90%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

decode/faster_hex/128   time:   [9.6518 ns 9.6774 ns 9.7188 ns]
                        thrpt:  [12.266 GiB/s 12.318 GiB/s 12.351 GiB/s]
                 change:
                        time:   [-52.470% -51.723% -51.184%] (p = 0.00 < 0.05)
                        thrpt:  [+104.85% +107.14% +110.39%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

decode/faster_hex/4096  time:   [168.69 ns 169.12 ns 169.65 ns]
                        thrpt:  [22.485 GiB/s 22.556 GiB/s 22.613 GiB/s]
                 change:
                        time:   [-65.925% -65.422% -65.072%] (p = 0.00 < 0.05)
                        thrpt:  [+186.30% +189.20% +193.47%]
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  6 (6.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
  8 (8.00%) high severe

decode/faster_hex_unchecked/2
                        time:   [3.8828 ns 3.8839 ns 3.8851 ns]
                        thrpt:  [490.94 MiB/s 491.09 MiB/s 491.23 MiB/s]
                 change:
                        time:   [-12.488% -11.426% -10.756%] (p = 0.00 < 0.05)
                        thrpt:  [+12.052% +12.900% +14.271%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

decode/faster_hex_unchecked/16
                        time:   [10.755 ns 10.785 ns 10.845 ns]
                        thrpt:  [1.3740 GiB/s 1.3817 GiB/s 1.3856 GiB/s]
                 change:
                        time:   [-0.2600% +0.0841% +0.4912%] (p = 0.74 > 0.05)
                        thrpt:  [-0.4888% -0.0840% +0.2607%]
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

decode/faster_hex_unchecked/32
                        time:   [5.3663 ns 5.3761 ns 5.3883 ns]
                        thrpt:  [5.5310 GiB/s 5.5435 GiB/s 5.5536 GiB/s]
                 change:
                        time:   [-70.582% -70.458% -70.349%] (p = 0.00 < 0.05)
                        thrpt:  [+237.25% +238.50% +239.92%]
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  4 (4.00%) high mild
  12 (12.00%) high severe

decode/faster_hex_unchecked/128
                        time:   [6.4211 ns 6.4313 ns 6.4486 ns]
                        thrpt:  [18.486 GiB/s 18.536 GiB/s 18.565 GiB/s]
                 change:
                        time:   [-17.437% -16.443% -15.654%] (p = 0.00 < 0.05)
                        thrpt:  [+18.559% +19.679% +21.119%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

decode/faster_hex_unchecked/4096
                        time:   [101.75 ns 102.60 ns 103.91 ns]
                        thrpt:  [36.712 GiB/s 37.179 GiB/s 37.492 GiB/s]
                 change:
                        time:   [-33.690% -33.365% -32.935%] (p = 0.00 < 0.05)
                        thrpt:  [+49.110% +50.072% +50.806%]
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) low severe
  2 (2.00%) high mild
  6 (6.00%) high severe

The commit messages:

The output length only needs to be half the input length.

This uses Criterion's benchmark groups to compare the relative
performance between the different implementations at various byte
lengths. The Criterion reports now provide graphs that overlay the
different implementations so you can easily see the relative
performance.

Range patterns are deprecated; switch to inclusive ranges instead.

Re-export the deprecated function `hex_to` in a separate use statement
that allows deprecation warnings.

This isn't an error condition. A decoded empty slice is just an empty
slice.

This also removes functions from the public API. hex_decode and
hex_decode_unchecked are the only things exposed by default.
hex_check_sse, hex_check_fallback, and hex_decode_fallback are only
visible when compiled with the 'bench' feature. This means that benches
now need to specify --features=bench when running:

$ cargo bench --features=bench

This makes no changes to the actual implementation, and benchmarks
confirm that.

Rather than checking to ensure all bytes are within valid ranges, this
now checks to ensure no bytes are within invalid ranges. This is mostly
to avoid a compiler warning when comparing an i32 to 0xffff, because we
now compare the i32 to zero.

The performance is identical.
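A scalar sketch of the inverted check (illustrative only, not the PR's actual SIMD code): OR together "byte is invalid" flags across the input so the final comparison is against zero rather than an all-valid mask.

```rust
/// Returns true if every byte is an ASCII hex digit.
fn check_hex(input: &[u8]) -> bool {
    let mut invalid: i32 = 0;
    for &b in input {
        // Flag bytes that fall inside an invalid range...
        let bad = !matches!(b, b'0'..=b'9' | b'a'..=b'f' | b'A'..=b'F');
        invalid |= bad as i32;
    }
    // ...so the final test compares against zero, not 0xffff.
    invalid == 0
}
```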
…e pass over the input data.

This speeds up checked decoding substantially, while not changing the
performance of unchecked decoding.

On my machine this increases throughput when decoding 2 bytes of data by
20%, and when decoding 4096 bytes of data by 116%.

Use a lookup table to determine the offset to add to each byte. This
improves performance substantially:

decode/faster_hex/4096 thrpt increased 15%
decode/faster_hex_unchecked/4096 thrpt increased 54%
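A sketch of the lookup-table idea (the exact table layout here is an assumption, not necessarily the PR's): each entry holds the offset that, added with wrapping arithmetic, turns an ASCII hex digit into its 4-bit value, so the hot loop is one table load and one add per input byte.

```rust
/// Offsets such that b.wrapping_add(offsets[b as usize]) yields the
/// nibble value of a valid hex digit. Invalid bytes are rejected by the
/// separate validity check, so their entries don't matter here.
fn build_offsets() -> [u8; 256] {
    let mut t = [0u8; 256];
    for b in b'0'..=b'9' {
        t[b as usize] = 0u8.wrapping_sub(b'0'); // e.g. '7' + 0xd0 -> 0x07
    }
    for b in b'a'..=b'f' {
        t[b as usize] = 10u8.wrapping_sub(b'a'); // e.g. 'c' + 0xa9 -> 0x0c
    }
    for b in b'A'..=b'F' {
        t[b as usize] = 10u8.wrapping_sub(b'A');
    }
    t
}

fn main() {
    let offsets = build_offsets();
    let (hi, lo) = (b'f', b'f');
    let byte = hi.wrapping_add(offsets[hi as usize]) << 4
        | lo.wrapping_add(offsets[lo as usize]);
    assert_eq!(byte, 0xff);
}
```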
Compare the implementation against the hex crate.

The benches included unnecessary code that isn't available via the
public API and was only exercised as a side effect of the decode
benchmark.
@zhangsoledad
Member

Great work @ggriffiniii! I took a quick pass over it, but it's a big PR, so I need more time to do a proper review.

@zhangsoledad
Member

zhangsoledad commented Feb 19, 2020

I'd propose using AsMut to make the public API more generic, like this:

pub fn encode<S, D>(src: &S, dst: &mut D) -> Result<(), Error>
where
    S: AsRef<[u8]> + ?Sized,
    D: AsMut<[u8]> + ?Sized,
{ ... }

pub fn encode_to_string<I>(src: &I) -> String
where
    I: AsRef<[u8]> + ?Sized,
{ ... }

@zhangsoledad added the "major" (Breaking changes, bump major) label Feb 19, 2020
@ggriffiniii
Author

Yeah, this is a rather large change. By all means take as much time as you need to do a thorough review.

I'd propose using AsMut to make the public API more generic, like this:

pub fn encode<S, D>(src: &S, dst: &mut D) -> Result<(), Error>
where
    S: AsRef<[u8]> + ?Sized,
    D: AsMut<[u8]> + ?Sized,
{ ... }

pub fn encode_to_string<I>(src: &I) -> String
where
    I: AsRef<[u8]> + ?Sized,
{ ... }

It could be changed to AsMut easily enough, but it doesn't seem worth it to me. The big benefit of AsRef<[u8]> is being able to accept &str and &String as well as byte slices and Vecs. The only standard types that implement AsMut<[u8]> are byte slices, Vecs, and Box<[u8]>, which will already auto-deref to &mut [u8] without the explicit AsMut bound.
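To illustrate the auto-deref point (a minimal, hypothetical example): a parameter of type &mut [u8] already accepts Vec<u8> and Box<[u8]> through deref coercion, whereas it's the AsRef<[u8]> bound on the input side that admits &str and &String.

```rust
fn fill(dst: &mut [u8]) {
    for b in dst.iter_mut() {
        *b = 0;
    }
}

fn main() {
    let mut v = vec![1u8, 2, 3];
    fill(&mut v); // Vec<u8> deref-coerces to &mut [u8]

    let mut boxed: Box<[u8]> = vec![4u8, 5].into_boxed_slice();
    fill(&mut boxed); // Box<[u8]> deref-coerces too

    // A &[u8] parameter would NOT accept a &str directly, though;
    // that's what AsRef<[u8]> buys on the input side.
}
```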

I also favored encode and decode allocating, with encode_to_slice and decode_to_slice as the more verbose options for those seeking to reuse allocations. The latter is far less common, so making it slightly more verbose makes sense and also follows a convention similar to other encoding crates like hex and base64.

@ggriffiniii
Author

While on the topic of public function signatures, I thought it would be useful to add a couple of additional ideas I had. I opted not to include them in this change just to keep an already large change from growing even further, but here goes.

I'll outline the proposals for encoding to keep the focus on the ideas, but in each case there is a matching analog for decoding that should probably be included if you want to pursue it.

Provide a convenient return value for encode_to_slice

pub fn encode_to_slice<'i, 'o, I>(src: &'i I, dst: &'o mut [u8]) -> Result<&'o str, Error>
where
    I: AsRef<[u8]> + ?Sized;

The goal here would just be to provide a convenient &str of the encoded data. Users would be free to ignore the return value, and the function would behave identically to how it does now, but for many users getting back a &str (already validated as UTF-8) could be a nice convenience.
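A minimal sketch of how this could be implemented (hypothetical; the Error enum here is a stand-in for the crate's error type). Hex output is pure ASCII, so the written prefix of dst is always valid UTF-8:

```rust
#[derive(Debug)]
pub enum Error {
    InvalidLength, // stand-in variant for illustration
}

pub fn encode_to_slice<'o, I>(src: &I, dst: &'o mut [u8]) -> Result<&'o str, Error>
where
    I: AsRef<[u8]> + ?Sized,
{
    let src = src.as_ref();
    // Fail if the destination can't hold two hex digits per input byte.
    let out = dst.get_mut(..src.len() * 2).ok_or(Error::InvalidLength)?;
    const HEX: &[u8; 16] = b"0123456789abcdef";
    for (i, &b) in src.iter().enumerate() {
        out[2 * i] = HEX[(b >> 4) as usize];
        out[2 * i + 1] = HEX[(b & 0x0f) as usize];
    }
    // Every byte just written is ASCII, so this cannot fail.
    Ok(std::str::from_utf8(out).expect("hex output is ASCII"))
}
```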

Provide a simpler API for buffer reuse

pub fn encode_with_buffer<'i, 'o, I>(src: &'i I, dst: &'o mut Vec<u8>) -> &'o str
where
    I: AsRef<[u8]> + ?Sized;

or

pub fn encode_with_buffer<I, B>(src: &I, dst: B) -> String
where
    I: AsRef<[u8]> + ?Sized,
    B: Into<Vec<u8>>;

The goal with both of these options is to allow users to reuse buffers (allocations) when encoding multiple chunks of data. That's the biggest performance benefit offered by the to_slice functions, but these would allow the same performance characteristics without the potential errors, since the size of the output buffer would be managed internally. Both options would be simple wrappers around encode_to_slice.
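For instance, a sketch of the first variant as a thin wrapper over the encode_to_slice sketch above (names and details hypothetical):

```rust
pub fn encode_with_buffer<'o, I>(src: &I, dst: &'o mut Vec<u8>) -> &'o str
where
    I: AsRef<[u8]> + ?Sized,
{
    let src = src.as_ref();
    dst.clear();
    dst.resize(src.len() * 2, 0); // output size managed internally
    // Cannot fail: the buffer was just sized to exactly 2 * src.len().
    encode_to_slice(src, dst.as_mut_slice()).expect("buffer sized correctly")
}
```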

These could all be pursued as subsequent PRs, but we could have the discussion here while the topic of the public API is already open. The buffer-reuse addition would be a semver-compatible change that could be made at any time; the return-value change to encode_to_slice would not be semver compatible, so a decision on that should be made prior to making another release on crates.io.

@fanatid

fanatid commented Apr 15, 2020

I like the direction of the changes in this PR. The crate itself is great; I found it when I noticed how slow hex is. But documentation and a serde module are missing here. Newcomers may prefer hex because it's clear how to use it and it's ready to use with serde. Any plans for this? I'd like to work on some of that once this PR is merged.

@fanatid

fanatid commented Apr 15, 2020

Looks like proptest should be bumped (rustc 1.44.0-nightly (94d346360 2020-04-09)):

   Compiling proptest v0.8.7
error[E0495]: cannot infer an appropriate lifetime for lifetime parameter `'a` due to conflicting requirements
  --> /home/kirill/.cargo/registry/src/github.com-1ecc6299db9ec823/proptest-0.8.7/src/arbitrary/_core/iter.rs:44:23
   |
44 |         base.prop_map(Iterator::cloned).boxed()
   |                       ^^^^^^^^^^^^^^^^
   |
note: first, the lifetime cannot outlive the lifetime `'a` as defined on the impl at 35:6...
  --> /home/kirill/.cargo/registry/src/github.com-1ecc6299db9ec823/proptest-0.8.7/src/arbitrary/_core/iter.rs:35:6
   |
35 | impl<'a, T: 'static + Clone, A: fmt::Debug + 'static + Iterator<Item = &'a T>>
   |      ^^
note: ...so that the types are compatible
  --> /home/kirill/.cargo/registry/src/github.com-1ecc6299db9ec823/proptest-0.8.7/src/arbitrary/_core/iter.rs:44:23
   |
44 |         base.prop_map(Iterator::cloned).boxed()
   |                       ^^^^^^^^^^^^^^^^
   = note: expected  `&'a T`
              found  `&T`
   = note: but, the lifetime must be valid for the static lifetime...
note: ...so that the type `strategy::map::Map<S, fn(A) -> core::iter::Cloned<A> {<A as core::iter::Iterator>::cloned::<'_, T>}>` will meet its required lifetime bounds
  --> /home/kirill/.cargo/registry/src/github.com-1ecc6299db9ec823/proptest-0.8.7/src/arbitrary/_core/iter.rs:44:41
   |
44 |         base.prop_map(Iterator::cloned).boxed()
   |                                         ^^^^^

Updating proptest to 0.9.6 makes the errors go away.

@zhangsoledad added this to the version milestone May 10, 2021
@doitian
Member

doitian commented Oct 16, 2022

Hey y'all, any updates on this PR?

@zhangsoledad
Member

The sticking point for this PR has been that it wasn't handled in a timely way. We can accept the algorithm changes, but we don't want to break the original API outright; adding new APIs rather than directly overwriting the originals would be more acceptable.

@dsheets

dsheets commented Dec 1, 2023

Please break the API. You could have the de facto hex encoding crate, as hex seems unmaintained and the API will need to be improved eventually.
