Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pytx] Add Tech Against Terrorism Hash API #1617

Merged
merged 12 commits into from
Sep 8, 2024

Conversation

Bruce-Pouncey-TAT
Copy link
Contributor

Summary

Hi

Bruce here from Tech Against Terrorism 👋

This PR contains Step 1 from this issue integrating the Tech Against Terrorism Hash List API into python-threatexchange

I used StopNSCII & NCMEC inside /client/exchanges/ as initial guidance.

Caveat
Our API returns a pre-signed URL to a JSON file for our hash list, instead of paginated results. I'm wondering how we want to handle this moving forward into part 2 (Once we're happy with part 1) ?

Test Plan

Manual
For manual testing you can use python shell to simply instantiate the TATHashListAPI with your TCAP username and password.

tat_api = TATHashListAPI(username=USER_NAME, password=PASSWORD)
response = tat_api.get_hash_list()

Automated
I have written unit tests techagainstterrorism/tests so pytest will do it.

@facebook-github-bot
Copy link
Contributor

Hi @Bruce-Pouncey-TAT!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@Dcallies
Copy link
Contributor

Dcallies commented Sep 4, 2024

Hey Bruce! Great to be working with you! I will help you get this landed and review it.

As for the details of the API, I did read your dos and though you picked a bit of a challenging implementation, but I think we can make it work! I'll give you some tips on the PR itself.

@Bruce-Pouncey-TAT
Copy link
Contributor Author

Hey David, likewise!
I will get the pipeline failures resolved in the coming days, I'm also waiting on the CLA confirmation to come through on my side.

@Dcallies Dcallies changed the title Issue 1610 [pytx] Add Tech Against Terrorism Hash API Sep 4, 2024
Copy link
Contributor

@Dcallies Dcallies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks pretty good!

Thank you for including your manual test steps in the PR summary!

The one blocking thing I have is that I think the interface will become easier if you standardize on throwing exceptions instead of catching them and return a new object.

Let me know if you are short on time or energy, I'm willing to do the labor to bring this in the rest of the way if you are short on either of those.

Comment on lines 14 to 15
# Maybe move to a common library someday
from threatexchange.exchanges.clients.fb_threatexchange.api import TimeoutHTTPAdapter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ignorable: If you want to move this to a common helper library under clients, now seems like a reasonable time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved 👍


@dataclass
class TATUser:
user: Any
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to narrow this anymore?

Copy link
Contributor Author

@Bruce-Pouncey-TAT Bruce-Pouncey-TAT Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our Actual User object can be quite large (possibly hundreds of permissions) and will vary in size based on who is using the interface. I can narrow this down to Dict[str, Any]


This API delivers a JSON file containing a comprehensive list of all hashed terrorist content within the TAT system.

The list is refreshed daily.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking: If you have public documentation, can you also add a link to it here?


except Exception as e:
logging.error("Error authenticating with TCAP: %s", e)
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking question: There's no recovery from this, right? Why not let the exception leak out?

response.raise_for_status()
return response.json()

def authenticate(self, username: str, password: str) -> t.Optional[str]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would personally rename this get_auth_token


endpoint = f"{TATEndpoint.hash_list.value}/{ideology}"

print("ENDPOINT", endpoint)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking: No stdout in libraries please

Comment on lines 164 to 167
except requests.exceptions.HTTPError as http_err:
return TATAPIErrorResponse(error=str(http_err))

except Exception as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking q: Why not let exceptions leak out? This is exceptional circumstances, and I think the other clients in this library except on error. Some libraries are very anti-exception, but not this one :P

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback!

Forgive me it's my first time creating/contributing to a package I have mostly worked on web apps where we catch and handle everything!

Today I have learned it's better practice in libraries to let the exception leak up the stack 🙇 awesome!


def get_hash_list(
self, ideology: str = TATIdeology._all.value
) -> t.Union[TATHashListResponse, TATAPIErrorResponse]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking: This is a challenging type to deal with on the caller side! The simplest solution is to just throw exceptions. Exceptions for for exceptional circumstances, and failing to fetch seems to match that.

@Bruce-Pouncey-TAT
Copy link
Contributor Author

Bruce-Pouncey-TAT commented Sep 5, 2024

Overall looks pretty good!

Thank you for including your manual test steps in the PR summary!

The one blocking thing I have is that I think the interface will become easier if you standardize on throwing exceptions instead of catching them and return a new object.

Let me know if you are short on time or energy, I'm willing to do the labor to bring this in the rest of the way if you are short on either of those.

Thanks!
I have resolved the comments with my best understanding of what you have requested. Also I have the time and energy to continue with anymore changes you suggest.

I have moved the TimeoutHTTPAdapter into a common utils file and updated where it has been used.

Idea 🤔
I have noticed across all clients we're using _get_session within a content manager
Maybe we could lift this up into a parent class ? PyTxClientAPIBase for the lack of a better definition or maybe introduce an Abstract Base Class for the places where client are using _get _post. Just a thought 😄

Copy link
Contributor

@Dcallies Dcallies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea 🤔
I have noticed across all clients we're using _get_session within a content manager
Maybe we could lift this up into a parent class ? PyTxClientAPIBase for the lack of a better definition or maybe introduce an Abstract Base Class for the places where client are using _get _post. Just a thought 😄

It's an interesting idea! Let's first get your SignalExchangeAPI implementation working, and we can come back to it.

You only have ONE thing left to fix - the lint error on running black!

@@ -0,0 +1,17 @@
from requests.adapters import HTTPAdapter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I git an internal nagger that forces me to add

# Copyright (c) Meta Platforms, Inc. and affiliates. to the tops of all files, though I can do it as a followup

@@ -0,0 +1,94 @@
import typing as t
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I git an internal nagger that forces me to add

# Copyright (c) Meta Platforms, Inc. and affiliates. to the tops of all files, though I can do it as a followup


except Exception as exception:
logging.error("Failed to get hash list: %s", exception)
raise Exception("Failed to fetch Hash List: %s", exception)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking (though I'm fine fixing it on a followup): Instead of wrapping this exception, rethrow the one on stack by just doing

raise

Ref: https://docs.python.org/3/tutorial/errors.html#enriching-exceptions-with-notes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was in the code so I made the change. Thanks for the reference!

Copy link
Contributor

@Dcallies Dcallies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@Dcallies Dcallies merged commit 1a9c554 into facebook:main Sep 8, 2024
6 checks passed
Bruce-Pouncey-TAT added a commit to Bruce-Pouncey-TAT/ThreatExchange that referenced this pull request Sep 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants