Running harvester run during active harvest can cause harvest object resubmissions #446

Open
bonnland opened this issue May 21, 2021 · 2 comments

Comments

@bonnland
Contributor

bonnland commented May 21, 2021

@amercader

5ffe6d4

If harvester run is issued while a harvest job is still in progress, it seems possible for a record that is currently being processed to be resubmitted to the fetch queue and processed a second time. This might lead to session rollbacks that detach the SQLAlchemy instance.

At least, that is what I'm seeing on my site, where harvest jobs can take hours to finish and harvester run may be executed many times an hour while a harvest is underway.

See #445, which shows the error I'm seeing.

Maybe switching the order of database queries could fix this? Or is it possible to expire objects in the current session just before resubmitting jobs?

@frafra
Contributor

frafra commented Oct 21, 2021

Shouldn't there be a locking mechanism to prevent a harvest run from starting while another one is still running?
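
For illustration only, here is a minimal sketch of what such a lock could look like at the process level, using an advisory file lock from the Python standard library (Unix only). Nothing here comes from ckanext-harvest: the lock path, the `single_instance` helper, and the `run_harvester_command` placeholder in the usage note are all hypothetical. It would only keep two invocations of the wrapped command from overlapping; it would not, by itself, coordinate with fetch consumers that are already processing objects.

```python
import fcntl
import os
from contextlib import contextmanager

# Hypothetical lock location; any path writable by the cron user would do.
LOCK_PATH = "/tmp/harvester-run.lock"


@contextmanager
def single_instance(lock_path=LOCK_PATH):
    """Yield True if the lock was acquired, False if someone else holds it."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # Non-blocking exclusive lock: fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A cron wrapper could then do something like `with single_instance() as acquired:` and call its own (hypothetical) `run_harvester_command()` only when `acquired` is true, skipping the run otherwise.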

@bonnland
Contributor Author

bonnland commented Oct 21, 2021

I believe you could be correct. The problem I was seeing could have been a side effect of having the CKAN database on a remote machine, where aggressive firewall rules were closing the connection. CKAN harvesting does not appear to behave well when the client/server database connection is closed repeatedly, especially while a harvest is underway. I moved the database to the same machine as the CKAN application, and this problem of harvest object resubmission went away.

If someone else can confirm that proper harvesting behavior relies on a database connection remaining open indefinitely, I might suggest that CKAN administrators be made more aware of this.
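
As a point of comparison, SQLAlchemy itself has engine options intended for connections that get dropped by the network. The sketch below shows them on a standalone engine; it is not how CKAN builds its engine, the `make_resilient_engine` helper is hypothetical, and I have not verified that these options resolve the harvest behavior described above. Note that `pool_pre_ping` only checks a connection when it is taken from the pool, so it would not help if the connection is closed in the middle of a transaction.

```python
from sqlalchemy import create_engine, text


def make_resilient_engine(url):
    """Hypothetical helper: an engine that tolerates stale pooled connections."""
    return create_engine(
        url,
        pool_pre_ping=True,   # test each connection on checkout, reconnect if dead
        pool_recycle=1800,    # replace connections older than 30 minutes
    )


# In-memory SQLite is used here only so the sketch runs without a server.
engine = make_resilient_engine("sqlite://")
with engine.connect() as conn:
    value = conn.execute(text("SELECT 1")).scalar()
```

The recycle interval is an assumption; it would need to be shorter than whatever idle timeout the firewall enforces.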


2 participants