Set memory pool through RMM #2866
base: main
Conversation
achirkin left a comment
Hi @viclafargue, could you please update the description of the PR to let the reviewers know what problem it addresses?
I suspect this is about perf bottlenecks related to locking the CUDA context in the multi-GPU setup. If so, please attach the benchmarks. I'm also a little concerned about the lifetime of the objects allocated with the per-device pools. Could there be a situation where an object outlives its memory pool and segfaults the program?
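To illustrate that concern, here is a minimal sketch (hypothetical usage, not code from this PR) of how an allocation can end up referencing a pool that has already been destroyed:

```cpp
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>

void set_short_lived_pool()
{
  static rmm::mr::cuda_memory_resource upstream;
  // The pool is a local; its lifetime ends when this function returns.
  rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> pool{
    &upstream, 1u << 28 /* 256 MiB initial size, arbitrary */};
  // From here on, allocations on this device are served by `pool`...
  rmm::mr::set_current_device_resource(&pool);
}  // ...but `pool` is destroyed here while still registered as the current resource.

// Any object allocated from the pool that is still alive now holds memory the
// pool has released, and its destructor will deallocate through a dangling
// resource pointer: the segfault scenario asked about above.
```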
Yes, that is a real issue. Once a user sets a memory pool (e.g.
jinsolp left a comment
Thanks Victor!
Yes, this is a tricky problem with RMM today. I am working to improve this for 26.02 or 26.04. This class of problems will be eliminated by RMM migrating to the CCCL 3.2 memory resource design, which introduces a new form of memory resource ownership. This issue comment (rapidsai/rmm#2011 (comment)) tracks adoption of the new memory resources, which is my primary active project right now. I am hopeful that 26.02 will entirely resolve this class of issues, though deprecation cycles may force it out to 26.04.
I'm fine either way!
This PR updates the `device_resources_snmg` utility so that the method in charge of setting up RMM pools on the GPUs actually works in a multi-GPU context. This method used to set a pool workspace on the device resource, to no effect. We now instead set the per-device memory resource of each GPU to a pool through RMM (a sketch of the RMM calls involved follows the benchmark figure below).

Benchmark of search throughput for 100-row queries on 2-GPU CAGRA indexes built with NN Descent:
