Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix use after free bug in actor heap finalisation that can lead to a segfault #4522

Merged

Conversation

dipinhora
Copy link
Contributor

@dipinhora dipinhora commented Oct 7, 2024

A long time ago, some rando decided to be clever and rework how objects with finalisers are stored/garbage collected in actor heaps (see: #1638). Unfortunately for us all, he wasn't nearly as clever as he thought and introduced a use after free bug that only occurs when finalisers have logic to reference other objects that might have already been freed and reused during the same garbage collection process. Luckily for us, a smarterer rando has come along to save the day.

This PR adds in tests to reproduce the bug and rework the actor heap garbage collection logic to make sure this issue can no longer occur by making sure that:

  • finalisers can no longer re-use memory that might be freed earlier in the garbage collection process
  • heap chunks are only freed/destroyed after all finalisers have been run to ensure all cross object references in finalisers remain valid at the time the finalisers are run

this is needed to reliably trigger heap finaliser bug and can be
reverted after the bug is fixed.
@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Oct 7, 2024
@dipinhora dipinhora changed the title Add tests for small/large dependent finalisers use-after-free access WIP: testing out a potential bug Oct 7, 2024
@dipinhora
Copy link
Contributor Author

This was supposed to be opened as a draft pr but i clicked the wrong button and i can't seem to figure out how to convert it to a draft now.. 8*/

@mfelsche mfelsche marked this pull request as draft October 7, 2024 15:27
@mfelsche
Copy link
Contributor

mfelsche commented Oct 7, 2024

There was a small grey text "convert to draft" under the reviewers section. I clicked it for you. (was curious how to do it myself)

@dipinhora
Copy link
Contributor Author

There was a small grey text "convert to draft" under the reviewers section. I clicked it for you. (was curious how to do it myself)

thanks.. much appreciated..

@dipinhora
Copy link
Contributor Author

no clue why PR / x86-64 Linux musl (pull_request) is unhappy but it seems unrelated to the code change since the code change only impacts actor heap gc and the failures are related to codegen..

@dipinhora
Copy link
Contributor Author

this is mostly ready in case anyone wants to take a peek..

it still needs things like a fancy description and release notes and any code cleanup/formatting changes before it is marked ready for review.. and any changes based on feedback.. and a decision around whether to revert the Temporarily zero out heap memory when destorying a chunk commit or not...

but the core bug (object finalisers can reference other objects whose memory might have already been freed/reused) has been reproduced and resolved on all architectures..

@dipinhora
Copy link
Contributor Author

the various PR / Validate * image builds (pull_request) job failures seem to be transient networking issues..

the PR / x86-64 Linux musl (pull_request) job failure seems unrelated to the code change in this PR since the code change only impacts actor heap gc and the errors are related to the compiler/libponyc which doesn't use actor heaps..

i don't seem to have the ability to restart any failed jobs so i would appreciate it if someone who does have the appropriate privileges can restart them..

@dipinhora dipinhora changed the title WIP: testing out a potential bug Fix use after free bug in actor heap finalisation that can lead to a segfault Oct 7, 2024
@dipinhora dipinhora marked this pull request as ready for review October 7, 2024 21:06
@dipinhora
Copy link
Contributor Author

given that the PR / x86-64 Linux musl (pull_request) job has been unhappy and failing since before any heap garbage collection logic changes (see: https://github.com/ponylang/ponyc/actions/runs/11217855902/job/31180337962 for when the new tests were added in the first commit on this PR before the zeroing on heap free change) and that it fails in different tests every run, i'm not going to spend any time on it (my guess/assumption is that musl changed behavior in some way) as it is unrelated to the heap garbage collection logic changes and bug fix in this PR..

please either keep retrying until it succeeds or mark it as success/ignore so the downstream jobs can run..

@SeanTAllen
Copy link
Member

So the regression test that failed is for #3615. The actual regression isn't an issue but we did get a segfault which is concerning.

@dipinhora
Copy link
Contributor Author

So the regression test that failed is for #3615. The actual regression isn't an issue but we did get a segfault which is concerning.

@SeanTAllen the PR / x86-64 Linux musl (pull_request) has segfaulted multiple times in multiple different tests both with and without the heap logic changes.. i agree that the segfaults are concerning and should be investigated and resolved, but as i stated in #4522 (comment), the issue is unrelated to the code changes in this PR based on the evidence available..

@SeanTAllen
Copy link
Member

I agree it seems unrelated. @dipinhora can you open an issue for the kabooms?

@SeanTAllen SeanTAllen added changelog - fixed Automatically add "Fixed" CHANGELOG entry on merge triggers release Major issue that when fixed, results in an "emergency" release labels Oct 8, 2024
@SeanTAllen SeanTAllen mentioned this pull request Oct 8, 2024
@dipinhora
Copy link
Contributor Author

I agree it seems unrelated. @dipinhora can you open an issue for the kabooms?

@SeanTAllen #4524 has been created

@SeanTAllen SeanTAllen merged commit f7a22aa into ponylang:main Oct 9, 2024
23 of 24 checks passed
@ponylang-main ponylang-main removed the discuss during sync Should be discussed during an upcoming sync label Oct 9, 2024
github-actions bot pushed a commit that referenced this pull request Oct 9, 2024
github-actions bot pushed a commit that referenced this pull request Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog - fixed Automatically add "Fixed" CHANGELOG entry on merge triggers release Major issue that when fixed, results in an "emergency" release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants