In this release, we add some APIs to enable users to control more properties of devices, connections and the async-rdma background framework.
We tested async-rdma on some hardware RDMA devices (previously only Soft-RoCE was tested) and found that the default configuration cannot meet all hardware requirements. Some hardware needs special configuration, so we have added more set APIs.
We provide support for listening for IBV async events. When an IBV async event occurs, it is recorded. We will add more handling beyond recording in the future.
- Add `IbvEventListener` to monitor ibv async events. Rename the original `event_xxx.rs` to `cq_event_xxx.rs` to distinguish cq events from ibv events.
- Add manual cq polling trigger. Add `PollingTrigger` trait to enable multiple implementations of trigger. Implement channel-based manual polling trigger and add related APIs.
- Add API `set_cc_evnet_timeout` to set the timeout value for the event listener to wait for the notification of the completion channel.
- Add API to set imm flag in wc. The value of the immediate data flag may differ between rdma devices. Users can change this flag with this API to adapt to their devices.
- Add experimental feature and recv with `fn` APIs.
- Add set APIs for attributes of queue pair.
- Add APIs to get the state of qp.
- Add `atomic_cas` API and tests.
- Add `ibv_connect` API for `RdmaBuilder` and `Rdma`. pub use `Gid` for `ibv_connect`. You can connect to the remote end with raw qp info no matter how you get it.
- Enable users to create `RemoteMr` manually.
- Increase default cq size. The cq may fill up before the next polling if the cq size is too small.
- Add qp init attributes check.
- Add `cq_size` and `max_poll_cqe` check.
- Adapt new `cm_connect` API logic.
- Replace `self` with `&self` in `ibv_connect`. This makes it easier to use without the need to clone.
- Refactor attributes structs of qp. Implement builder pattern for attributes structs by `derive Builder` to reduce hand-written code and make the structs clearer.
- Replace assertion with debug msg. The length of mr is not always equal to `wc.byte_len`.
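
The channel-based manual polling trigger mentioned in the list above can be pictured with a small std-only sketch. The trait shape, the method name `wait_for_poll`, and the helpers `ManualTrigger`, `manual_trigger` and `run_poller` are assumptions made for illustration; they are not the signatures async-rdma exposes.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical trait: decides when the next cq poll should happen.
trait PollingTrigger {
    /// Blocks until a poll is requested.
    /// Returns `false` once no further polls can ever be requested.
    fn wait_for_poll(&mut self) -> bool;
}

/// Manual trigger backed by a channel: every `()` received requests one poll.
struct ManualTrigger {
    rx: Receiver<()>,
}

impl PollingTrigger for ManualTrigger {
    fn wait_for_poll(&mut self) -> bool {
        // `recv` fails only after every sender has been dropped.
        self.rx.recv().is_ok()
    }
}

/// Creates the user-facing sender together with the trigger it drives.
fn manual_trigger() -> (Sender<()>, ManualTrigger) {
    let (tx, rx) = channel();
    (tx, ManualTrigger { rx })
}

/// Stand-in for the polling loop; returns how many polls were performed.
fn run_poller<T: PollingTrigger>(mut trigger: T) -> usize {
    let mut polls = 0;
    while trigger.wait_for_poll() {
        polls += 1; // a real loop would poll the completion queue here
    }
    polls
}

fn main() {
    let (tx, trigger) = manual_trigger();
    let poller = thread::spawn(move || run_poller(trigger));
    for _ in 0..3 {
        tx.send(()).unwrap(); // user code asks for one poll
    }
    drop(tx); // closing the channel stops the loop
    println!("polled {} times", poller.join().unwrap());
}
```

Keeping the trigger behind a trait lets a timer-based or event-based trigger be swapped in without touching the polling loop.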
In this release, we add some APIs to make it easier to establish connections and provide more configurable options.
We provide support for sending/receiving raw data and allocating raw memory region to enable users to implement the upper layer protocol or memory pool by themselves.
APIs for supporting multi-connection can be used to easily establish multiple connections with more than one remote end and reuse resources.
- Adapt RDMA CM APIs. Add `cm_connect` to establish connection with CM Server. Add `{send/recv}_raw` APIs to send/recv raw data without the help of agent.
- Add APIs to set attributes of queue pair like `set_max_{send/recv}_{sge/wr}`.
- Add APIs for `RdmaBuilder` to establish connections conveniently.
- Add APIs to alloc raw mr. Sometimes users want to set up a memory pool by themselves instead of using `Jemalloc` to manage mrs. So we define two strategies.
- Add APIs to alloc mrs with different accesses and protection domains from `Jemalloc`. `MrAllocator` will create a new arena when we alloc an mr with a non-default access or pd.
- Add access API for mrs to query access.
- Support multi-connection APIs. Add APIs for `Rdma` to create a new `Rdma` that has the same `mr_allocator` and `event_listener` as its parent.
- Add `RemoteMr` access control. Add `set_max_rmr_access` API to set the maximum permission on `RemoteMr` that the remote end can request.
- Use submodule to setup CI environment.
- Reorganize some attributes to avoid too many arguments.
- Update examples. Replace unsafe blocks and add more comments. Show more APIs.
- Disable `tcache` of `Jemalloc` by default. `tcache` is a feature of `Jemalloc` to speed up memory allocation. However, `Jemalloc` may alloc `MR` with the wrong `arena_index` from `tcache` when we create more than one `Jemalloc`-enabled `mr_allocator`. So we disable `tcache` by default.
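
The idea behind `RemoteMr` access control (a cap on the permissions a remote end may request) can be sketched as a bitmask subset check. The constants, their values, and the `MaxRmrAccess` type below are hypothetical and only illustrate the rule; they do not mirror the crate's access flags or the behavior of `set_max_rmr_access`.

```rust
/// Illustrative access bits; the values are arbitrary for this sketch.
const LOCAL_WRITE: u8 = 0b001;
const REMOTE_READ: u8 = 0b010;
const REMOTE_WRITE: u8 = 0b100;

/// Hypothetical holder of the maximum access a remote end may request.
struct MaxRmrAccess(u8);

impl MaxRmrAccess {
    /// Grants the request only if it asks for no bit outside the maximum.
    /// On rejection, the offending bits are returned.
    fn check(&self, requested: u8) -> Result<u8, u8> {
        let excess = requested & !self.0;
        if excess == 0 {
            Ok(requested)
        } else {
            Err(excess)
        }
    }
}

fn main() {
    let max = MaxRmrAccess(LOCAL_WRITE | REMOTE_READ);
    // Within the cap: granted as requested.
    println!("{:?}", max.check(REMOTE_READ));
    // Asks for REMOTE_WRITE, which is outside the cap: rejected.
    println!("{:?}", max.check(REMOTE_READ | REMOTE_WRITE));
}
```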
In this release, we adapted Jemalloc to manage RDMA memory region to improve memory
allocation efficiency.
Some safety and performance issues have been fixed, thanks to @Nugine's comments.
- `Jemalloc` was adapted to manage memory region. We inject custom extent hooks into `Jemalloc` to empower it to manage `RawMr`s. That avoids the overhead of repeatedly registering and deregistering memory regions. Use `BTreeMap` to record the relationship between `addr` and `RawMr`. When we alloc memory from `Jemalloc`, `mr_allocator` will look up the related `RawMr` by `addr`.
- Add timeout mechanism for `RemoteMr`. Requesting `RemoteMr` from the remote end without a timeout may cause remote OOM. So we add `RemoteMrManager` to manage `RemoteMr`s and free them after timeout.
- Avoid unnecessary overflow checking operations to improve performance.
- Optimize cq poll workflow. Polling a single `CQE` at a time is inefficient under high concurrency. Here we made the maximum number of `CQE`s to poll at a time configurable. Accordingly, `event_listener` should be able to wake up multiple tasks at a time.
- Redesign `LocalMr` and `event_listener` to ensure cancel safety. Let `event_listener` hold the `Arc`s of the `LocalMrInner`s that are being used by RDMA ops to ensure cancel safety. `LocalMr` was replaced with the new struct `LocalMrInner`, because every struct that can use APIs should hold an `Arc` of the mr's metadata, but the previous `LocalMr` can't hold itself. Add `RwLock` to avoid a potential race condition.
- Add zeroed `LocalMr` API. Uninitialized memory region is fast but not safe, so we add a zeroed API and mark the uninitialized memory alloc API as unsafe.
- Fix `Gid` impl that used to have an alignment mismatch bug.
- Fix the handling of return values of ibv APIs. Most ibv APIs return -1 or NULL on error, and if the call fails, errno is set to indicate the reason for the failure. But some places treated the return value as errno; we fixed this incorrect handling.
- Ensure cancel safety. Undefined behavior will happen if the future is dropped (cancelled) during the execution of RDMA operations. So we redesign `LocalMr` and `event_listener` to hold `Arc`s of memory regions until the operations are complete.
- Mark unsafe traits and functions. The safety of access traits and functions cannot be guaranteed by themselves. So we need to mark them as unsafe. And the unsafe traits are marked as `sealed` to avoid being unsafely implemented externally.
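
The cancel-safety fix described above relies on the listener keeping a reference-counted handle to every memory region that still has an operation in flight. A minimal std-only sketch of that idea follows; `MrInner`, `Listener`, `submit` and `complete` are made-up names for illustration and do not reproduce the crate's `LocalMrInner` or `event_listener`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Stand-in for the metadata and backing buffer of a local memory region.
struct MrInner {
    buf: Vec<u8>,
}

/// Hypothetical listener that keeps every in-flight region alive.
#[derive(Default)]
struct Listener {
    in_flight: Mutex<HashMap<u64, Arc<MrInner>>>,
}

impl Listener {
    /// Records that work request `wr_id` uses `mr`; the listener now co-owns it.
    fn submit(&self, wr_id: u64, mr: &Arc<MrInner>) {
        self.in_flight.lock().unwrap().insert(wr_id, Arc::clone(mr));
    }

    /// Called when the completion for `wr_id` arrives; releases the listener's handle.
    fn complete(&self, wr_id: u64) -> Option<Arc<MrInner>> {
        self.in_flight.lock().unwrap().remove(&wr_id)
    }
}

fn main() {
    let listener = Listener::default();
    let mr = Arc::new(MrInner { buf: vec![0; 64] });
    let probe = Arc::downgrade(&mr);

    listener.submit(1, &mr);
    drop(mr); // simulates the caller's future being cancelled mid-operation

    // The buffer is still alive because the listener holds an Arc to it.
    if let Some(alive) = probe.upgrade() {
        println!("{} bytes still alive after cancel", alive.buf.len());
    }

    drop(listener.complete(1)); // the operation finished, memory can go
    println!("freed after completion: {}", probe.upgrade().is_none());
}
```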
In this release, the code related to memory region has been reorganized.
This change makes the abstraction more explicit and easy to maintain later.
Some new APIs for RDMA operations with immediate data were added. We also fixed some bugs, which makes the lib more stable.
- Implement memory region slice. `LocalMrSlice` and `RemoteMrSlice` enable us to operate on part of a memory region. And we can only get one mutable slice or many immutable slices at the same time, just like any other type in Rust.
- Add imm data APIs.
  - send_with_imm
  - receive_with_imm
  - write_with_imm
  - receive_write_imm
- Redesign memory region. Use traits to describe three kinds of memory region abstraction. `RawMemoryRegion` records all information about the memory region registered locally. `LocalMr` records information for local use. `RemoteMr` records information for remote use.
- Discard preapplication of memory region strategy. The old preapplication strategy is not flexible enough, so it is currently abandoned. We will transform jemalloc to manage memory region and reuse its preapplication strategy in the next release.
- Change from dynamic generic to static generic.
- Implement multi-task receiver. Just one task working with one `ibv_recv_wr` cannot handle high-concurrency SEND requests. So more receiver tasks with more `ibv_recv_wr`s are spawned to handle highly concurrent requests in this release.
- Redesign memory region transfer APIs.
  - Refine the interface capability
    - Change from `send_mr` to `send_local_mr` and `send_remote_mr`.
    - Change from `receive_mr` to `receive_local_mr` and `receive_remote_mr`.
  - Take the ownership of the sent memory region. Prevent both ends from operating on the same memory region at the same time.
- Fix retry bug. When `rnr_retry` is 7, the sender will keep retrying until the system freezes. We can't get any useful information to debug it because this part of the work is performed by the kernel module and there is no error message. It depletes CPU resources, and the only thing we can observe is that the system freezes. The solution is to make `rnr_retry` < 7.
- Add timeout mechanism for remote request operations to prevent infinite wait.
- Make `Agent` hold `AgentThread` to prevent the senders' release.
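
The borrowing rule stated for memory region slices (one mutable slice or many immutable slices at a time) is the ordinary Rust aliasing rule. The toy type below shows it on a plain byte buffer; `ToyMr` and its methods are invented for this sketch and are not the crate's `LocalMr` or `LocalMrSlice` API.

```rust
use std::ops::Range;

/// Toy stand-in for a local memory region backed by a byte buffer.
struct ToyMr {
    buf: Vec<u8>,
}

impl ToyMr {
    fn new(len: usize) -> Self {
        Self { buf: vec![0; len] }
    }

    /// Shared slice of part of the region; `None` if the range is out of bounds.
    fn get(&self, range: Range<usize>) -> Option<&[u8]> {
        self.buf.get(range)
    }

    /// Exclusive slice of part of the region; `None` if the range is out of bounds.
    fn get_mut(&mut self, range: Range<usize>) -> Option<&mut [u8]> {
        self.buf.get_mut(range)
    }
}

fn main() {
    let mut mr = ToyMr::new(8);

    // Exactly one mutable slice may exist at a time.
    mr.get_mut(0..4).unwrap().copy_from_slice(&[1, 2, 3, 4]);

    // Any number of immutable slices may coexist.
    let head = mr.get(0..2).unwrap();
    let tail = mr.get(2..4).unwrap();
    println!("head = {:?}, tail = {:?}", head, tail);

    // Calling `mr.get_mut(..)` while `head` or `tail` is still used
    // would be rejected by the borrow checker at compile time.

    // Out-of-range requests are reported instead of panicking.
    println!("out of range: {:?}", mr.get(4..16));
}
```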
Async-rdma is a framework for
writing RDMA applications with high-level abstraction and asynchronous APIs.
Remote Direct Memory Access (RDMA) is direct access from the memory of one machine to the memory of another. It helps boost the performance of applications that need low latency and high throughput, as it supports kernel bypass and zero copy without involving the CPU.
However, writing RDMA applications with the low-level C library is laborious and error-prone. We want
easy-to-use APIs that hide the complexity of the underlying RDMA operations, so we developed
async-rdma. With the help of async-rdma, most RDMA operations can be completed by writing
only one line of code.
- Tools for establishing connections with rdma endpoints.
- High-level async APIs for data transmission between endpoints.
- High-level APIs for rdma memory region management.
- A framework working behind APIs for memory region management and executing rdma requests asynchronously.
We develop async-rdma with the aim of making it production-grade. It is still young and in the
experimental stage. We are adding more features and improving stability. We welcome everyone to try it, ask
questions and make suggestions.
- RDMA introduction video: https://www.youtube.com/watch?v=lu78_C-9jvA
- RDMA introduction doc: http://www.reports.ias.ac.in/report/12829/understanding-the-concepts-and-mechanisms-of-rdma