
Fallback to pinned host memory when managed memory is not supported #3075

Merged
zcbenz merged 1 commit into ml-explore:main from zcbenz:move-to-unified on Jan 30, 2026

Conversation

@zcbenz (Collaborator) commented Jan 28, 2026

For the 4090, managed memory does not work on Windows (it crashes when accessed from the CPU) even though the API reports it as supported, so I'm disabling managed memory on Windows unless there is hardware unified memory support.

The memory allocation code looks a little messy with all the conditions, but I think it is fine for now.
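A minimal sketch of the kind of check described above (hypothetical code, not the merged MLX implementation): on Windows, only trust the driver's managed-memory report when the hardware itself can service pageable host/device access, since a discrete card like the 4090 reports support but faults on CPU access. The use of `cudaDevAttrPageableMemoryAccess` as the "hardware unified memory" proxy is an assumption for illustration.

```cpp
#include <cuda_runtime.h>

// Hypothetical helper: decide whether managed memory is safe to use.
bool managed_memory_usable(int device) {
  int managed = 0;
  cudaDeviceGetAttribute(&managed, cudaDevAttrManagedMemory, device);
#ifdef _WIN32
  // On Windows the attribute alone is not trustworthy: a discrete GPU
  // may report managed memory yet crash on host-side access. Require
  // hardware pageable-memory access (true unified memory) as well.
  int pageable = 0;
  cudaDeviceGetAttribute(
      &pageable, cudaDevAttrPageableMemoryAccess, device);
  return managed != 0 && pageable != 0;
#else
  return managed != 0;
#endif
}
```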

Comment thread mlx/backend/cuda/allocator.cpp Outdated
Comment on lines +140 to +141
auto& d = device(i);
free_streams_.emplace_back(d);
Member


I feel like I avoided accessing the MLX device here intentionally. Maybe there was some initialization-order issue that was causing problems... I wish I had left a comment :/

But then again, maybe it's fixed now... if the tests pass, they pass.

Collaborator Author


I think you ran into #3062 (comment).

Comment thread mlx/backend/cuda/allocator.cpp Outdated
Comment on lines +185 to +189
if (supports_managed_memory()) {
CHECK_CUDA_ERROR(cudaMallocManaged(&data, size));
} else {
CHECK_CUDA_ERROR(cudaMallocHost(&data, size));
}
Member


There are a few cases of `if (supports_managed_memory()) { x } else { y }`.

It might make sense to refactor these into a `unifiedMalloc` and `unifiedFree` to keep the code a little more readable.
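The suggested refactor could look roughly like this (a sketch using the names from the review comment, not the merged MLX code; the `supports_managed_memory` helper here is a stand-in for the allocator's own check):

```cpp
#include <cuda_runtime.h>

// Stand-in for the allocator's check; assumed for illustration.
inline bool supports_managed_memory() {
  int attr = 0;
  cudaDeviceGetAttribute(&attr, cudaDevAttrManagedMemory, 0);
  return attr != 0;
}

// One pair of helpers replaces the repeated branches at each call site.
inline cudaError_t unifiedMalloc(void** ptr, size_t size) {
  if (supports_managed_memory()) {
    return cudaMallocManaged(ptr, size);
  }
  // Fallback: pinned (page-locked) host memory, accessible from both
  // CPU and GPU, at the cost of PCIe-speed device access.
  return cudaMallocHost(ptr, size);
}

inline cudaError_t unifiedFree(void* ptr) {
  // cudaMallocManaged allocations are released with cudaFree;
  // cudaMallocHost allocations require cudaFreeHost.
  return supports_managed_memory() ? cudaFree(ptr) : cudaFreeHost(ptr);
}
```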

Comment on lines +142 to +144
if (d.memory_pools()) {
CHECK_CUDA_ERROR(cudaDeviceGetDefaultMemPool(&mem_pools_[i], i));
}
Member


What's the purpose of that check here? Some devices do not support memory pools?

Collaborator Author


Yeah, according to https://github.com/ml-explore/mlx/pull/2972/changes#diff-3e8aaaff4c1529bbcf6ea804df3793a6c354f2812ff63377dffec82b8ca4321d some devices do not have memory pools. Also, I just realized that cudaMallocAsync should not be used when memory pools are not available; I will make that change.
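The pool-aware allocation path described above can be sketched like this (hedged illustration, not the merged code): `cudaMallocAsync` draws from the device's default memory pool, so when `cudaDevAttrMemoryPoolsSupported` reports no pool support the code must fall back to plain `cudaMalloc`.

```cpp
#include <cuda_runtime.h>

// Sketch: choose stream-ordered allocation only when the device
// actually supports memory pools.
cudaError_t alloc_device(
    void** ptr, size_t size, int device, cudaStream_t stream) {
  int pools_supported = 0;
  cudaDeviceGetAttribute(
      &pools_supported, cudaDevAttrMemoryPoolsSupported, device);
  if (pools_supported) {
    // Stream-ordered allocation from the default mempool.
    return cudaMallocAsync(ptr, size, stream);
  }
  // No mempool support: ordinary synchronous device allocation.
  return cudaMalloc(ptr, size);
}
```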

Member

@awni awni left a comment


Looks great. Left some minor comments. Feel free to merge when ready!

@zcbenz zcbenz merged commit 212077f into ml-explore:main Jan 30, 2026
16 checks passed
@zcbenz zcbenz deleted the move-to-unified branch January 30, 2026 04:18
jessegross added a commit to jessegross/mlx that referenced this pull request Feb 3, 2026
…ported

Extend the Windows managed memory check from ml-explore#3075 to also apply to WSL,
as the underlying behavior is the same.