[pdq] ValueError when computing PDQ hash on some pngs #1659

Open
thedanielsun opened this issue Oct 8, 2024 · 1 comment
Labels: bug, pdq (items related to the pdq libraries or reference implementations), python-threatexchange (items related to the threatexchange python tool / library)

Comments

@thedanielsun
Contributor

I'm getting a rare, sporadic error when computing the PDQ hash for certain images.

Repro below:

$ threatexchange hash photo https://styles.redditmedia.com/t5_6k8ysb/styles/profileIcon_jj0se5l3qgrd1.png
Traceback (most recent call last):
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/bin/threatexchange", line 8, in <module>
    sys.exit(main())
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/cli/main.py", line 333, in main
    inner_main()
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/cli/main.py", line 326, in inner_main
    execute_command(settings, namespace)
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/cli/main.py", line 161, in execute_command
    command.execute(settings)
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/cli/hash_cmd.py", line 106, in execute
    hash_str = hasher.hash_from_file(file)
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/signal_type/signal_base.py", line 196, in hash_from_file
    return cls.hash_from_bytes(file.read_bytes())
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/signal_type/pdq/signal.py", line 79, in hash_from_bytes
    pdq_hash, quality = pdq_from_bytes(bytes_)
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/signal_type/pdq/pdq_hasher.py", line 32, in pdq_from_bytes
    return _pdq_from_numpy_array(np_array)
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/signal_type/pdq/pdq_hasher.py", line 36, in _pdq_from_numpy_array
    hash_vector, quality = pdqhash.compute(array)
  File "pdqhash/bindings.pyx", line 66, in pdqhash.bindings.compute
ValueError: Buffer dtype mismatch, expected 'char' but got 'int'

Verified that this occurs on threatexchange version 1.1.1 (though on Python 3.8).
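
One plausible reading of the last traceback line is that the decoded image reached `pdqhash.compute` as a NumPy array of a wider integer dtype (the 'int' in the message) rather than the byte-sized dtype the binding expects. That would be consistent with a PNG that stores more than 8 bits per sample, but this has not been confirmed for the image above. As a minimal sketch under that assumption, a guard along these lines could narrow the array before hashing. `to_uint8` is a hypothetical helper, not part of threatexchange or pdqhash, and the 16-bit sample range is an assumption:

```python
import numpy as np


def to_uint8(pixels: np.ndarray) -> np.ndarray:
    """Return pixel data as uint8, narrowing wider integer samples if needed.

    Hypothetical helper for illustration; not part of threatexchange or pdqhash.
    """
    if pixels.dtype == np.uint8:
        return pixels  # already byte-sized, nothing to do
    if not np.issubdtype(pixels.dtype, np.integer):
        raise TypeError(f"unsupported pixel dtype: {pixels.dtype}")
    # Assumption: wider integer arrays hold 16-bit samples (0..65535).
    # Widen first so the clip bounds are representable, then keep the high byte.
    wide = np.clip(pixels.astype(np.int64), 0, 65535)
    return (wide >> 8).astype(np.uint8)


# Stand-in for an image that decoded to 32-bit integers.
sample = np.array([[0, 256, 65535]], dtype=np.int32)
converted = to_uint8(sample)
print(converted.dtype, converted.tolist())
```

This only addresses the dtype. Whether the array also needs a particular shape or channel layout before hashing is a separate question that the traceback does not answer.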

Dcallies added the bug, pdq, and python-threatexchange labels Oct 9, 2024
@Dcallies
Contributor

Dcallies commented Oct 9, 2024

Thanks a ton for the bug report @thedanielsun! I'm a bit overwhelmed after the hackathon, but I'll aim to get a repro and then do some debugging. If you end up needing it fixed faster and solve it on your own, feel free to submit the fix as a comment or PR!

Dcallies changed the title from "[pdq] ValueError when computing PDQ hash" to "[pdq] ValueError when computing PDQ hash on some pngs" Oct 9, 2024