Fallback to pinned host memory when managed memory is not supported#3075
Fallback to pinned host memory when managed memory is not supported#3075zcbenz merged 1 commit intoml-explore:mainfrom
Conversation
| auto& d = device(i); | ||
| free_streams_.emplace_back(d); |
There was a problem hiding this comment.
I feel like I did this without accessing the MLX device intentionally. Maybe there was some initialization order thing that was causing problems.. I wish I had left a comment :/.
But then again maybe it's fixed now .. if the tests clear then they clear..
| if (supports_managed_memory()) { | ||
| CHECK_CUDA_ERROR(cudaMallocManaged(&data, size)); | ||
| } else { | ||
| CHECK_CUDA_ERROR(cudaMallocHost(&data, size)); | ||
| } |
There was a problem hiding this comment.
There are a few cases of if (supports_managed) do x else y fi
It might make sense to refactor to a unifiedMalloc and unifiedFree to keep the code a little more readable.
| if (d.memory_pools()) { | ||
| CHECK_CUDA_ERROR(cudaDeviceGetDefaultMemPool(&mem_pools_[i], i)); | ||
| } |
There was a problem hiding this comment.
What's the purpose of that check here? Some devices do not support memory pools?
There was a problem hiding this comment.
Yeah according to https://github.com/ml-explore/mlx/pull/2972/changes#diff-3e8aaaff4c1529bbcf6ea804df3793a6c354f2812ff63377dffec82b8ca4321d some devices do not have memory pools. Also I just realized that cudaMallocAsync should not be used when memory pools is not available, will make change.
awni
left a comment
There was a problem hiding this comment.
Looks great. Left some minor comments. Feel free to merge when ready!
c63799f to
bf8502c
Compare
…ported Extend the Windows managed memory check from ml-explore#3075 to also apply to WSL, as the underlying behavior is the same.
For 4090 the managed memory does not work on Windows (crashed when accessed on CPU) even though the API reports it being supported, so I'm just disabling managed memory for Windows unless there is hardware unified memory support.
The memory allocation code looks a little messy with all the conditions but I think it is fine for now.