-
Notifications
You must be signed in to change notification settings - Fork 585
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor criu plugin baseline v1 #2295
base: criu-dev
Are you sure you want to change the base?
Refactor criu plugin baseline v1 #2295
Conversation
…vices Add a new compilation unit to host symbols and methods that will be needed to C&R DRM devices. Refactor code that indicates support for C&R and checkpoints KFD and DRM devices Signed-off-by: Ramesh Errabolu <[email protected]>
Refactor code used to Checkpoint DRM devices. Code is moved into amdgpu_plugin_drm.c file which hosts various methods to checkpoint and restore a workload. Signed-off-by: Ramesh Errabolu <[email protected]>
Fix the typo
Could someone advise me as to what is blocking? The 3 tests that are failing are doing so in "build" phase. It is not clear how that is connected to my change. What are the next steps |
You can just enter details section of each check and see errors by yourself. 1 and 3 are obviously introduced by your code, so you should fix them. 4 is definitely unrelated to your code as I saw it in other PRs, 2 is likely unrelated - I triggered a rerun for it. I also approved all checks for your PR, so you may expect some more fails. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## criu-dev #2295 +/- ##
============================================
+ Coverage 70.55% 70.62% +0.06%
============================================
Files 132 133 +1
Lines 33508 33312 -196
============================================
- Hits 23642 23525 -117
+ Misses 9866 9787 -79 ☔ View full report in Codecov by Sentry. |
@@ -1244,7 +1244,7 @@ static bool map_devices(struct tp_system *src_sys, struct tp_system *dest_sys, s | |||
return true; | |||
} else { | |||
/* We could not map remaining nodes in the list. Add dest node back | |||
* to list and try to map next dest ndoe in list to current src | |||
* to list and try to map next dest node in list to current src |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be able to apply this change in the previous commit (using git rebase
or git commit --amend
)?
https://github.com/checkpoint-restore/criu/blob/criu-dev/CONTRIBUTING.md#pull-request-guidelines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out the procedure to look at the details.
I am rather new to submitting to repo on github that goes through an approval.
I was getting an error when I fixed the error (typo) locally and tried to push it again.
The git pull was messing up my local repo branch and thus the commit online on github itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After modifying the commit you might need to use git push
with -f
to update your github branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given my headaches, will push a new branch which will contain resolution to the various comments.
return ret; | ||
} | ||
|
||
void print_kfd_bo_stat(int bo_cnt, struct kfd_criu_bo_bucket *bo_list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is not used anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add this as a helper method that could be used in debugging if needed. Is this not allowed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with adding unused debugging functions upstream is that other people are not going to use them and test if they still work when new changes are being introduced. This makes it more difficult to maintain the code base and creates confusion for new contributors working on the project.
A friendly reminder that this PR had no activity for 30 days. |
{ | ||
dev_file_cnt--; | ||
pr_info("\n"); | ||
pr_info("Sairam: %s(), Number of Checkpoints LEFT: %d\n", __func__, dev_file_cnt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove all prints with Sairam:
@@ -68,6 +68,13 @@ struct vma_metadata { | |||
|
|||
extern int fd_next; | |||
|
|||
// FD of KFD device used to checkpoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use consistent commenting style. I think rest of this file uses this style:
/**
* Comments
*/
|
||
/* Helper macros to Checkpoint and Restore a ROCm file */ | ||
#define HSAKMT_SHM_PATH "/dev/shm/hsakmt_shared_mem" | ||
#define HSAKMT_SHM "/hsakmt_shared_mem" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mis-alignment for the right-hand-sides here
return (dev_file_cnt == 0); | ||
} | ||
|
||
void decrement_checkpoint_count() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put this function in the other commit as it is used there
@@ -840,6 +840,9 @@ void topology_free(struct tp_system *sys) | |||
list_del(&p2pgroup->listm_system); | |||
xfree(p2pgroup); | |||
} | |||
|
|||
/* Update Topology as being freed */ | |||
sys->parsed = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think parsed is used anywhere else?
Overall, the general idea/concept of this refactor is fine. |
A friendly reminder that this PR had no activity for 30 days. |
No description provided.