We hand-rolled a file storage for python-threatexchange even though the data is extremely simple key-value storage:
Key: the int or string returned by fetch()
Value: the dataclass in the value returned by fetch(). All the core ones are compatible with dacite, and so are JSON-serializable
The current implementation stores this as the JSON serialization of one massive dict, so every update requires a full in-memory merge. At larger dataset sizes, this becomes untenable.
Because the data partitions so easily, any string-to-string key-value store should work. We've discussed SQLite in the past, but it has the downside of requiring additional libraries.
dbm seems like it might be a straight upgrade over the current single-file approach. It's similarly flexible, but has the additional benefit that it may pick up an optimized implementation (on Unix), and even the fallback dbm.dumb implementation doesn't load the data into memory, only the key names.
https://docs.python.org/3/library/dbm.html
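As a rough illustration of the idea, here is a minimal sketch of per-key storage on top of dbm, assuming values are dataclasses that round-trip through JSON. `ExampleRecord`, `merge_records`, and `load_record` are hypothetical names invented for this sketch, not part of python-threatexchange, and a real version would presumably rebuild the dataclass with dacite rather than a plain constructor call.

```python
import dbm
import json
from dataclasses import asdict, dataclass


@dataclass
class ExampleRecord:
    # Hypothetical stand-in for a dataclass value returned by fetch()
    indicator: str
    tags: list


def _encode_key(key):
    # dbm stores keys and values as bytes. Note that the int 1 and the
    # string "1" map to the same key here; a real store would need to
    # decide how to keep them apart.
    return str(key).encode("utf-8")


def merge_records(db_path, updates):
    """Write each (key, record) pair on its own; no full in-memory merge."""
    with dbm.open(db_path, "c") as db:  # "c": create the database if missing
        for key, record in updates.items():
            db[_encode_key(key)] = json.dumps(asdict(record)).encode("utf-8")


def load_record(db_path, key):
    """Read a single record by key, or return None if it is absent."""
    with dbm.open(db_path, "r") as db:
        raw = db.get(_encode_key(key))
    if raw is None:
        return None
    return ExampleRecord(**json.loads(raw.decode("utf-8")))
```

With this shape, applying a fetch() delta only touches the keys in that delta, instead of deserializing and rewriting the whole dict. Which backend `dbm.open` picks depends on the platform, so the on-disk file names may differ between machines.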