| System | Created | Feature | Fixed? | Testing | Others |
|---|---|---|---|---|---|
| HBase | 2019.04 | Log roll on slow sync | ✔️ | TestLogRolling.java | Explicitly mentioned 'gray failure'. |
| HBase | 2017.09 | Alert slow reads on a block | Stale | x | x |
| etcd | 2023.02 | Leader stuck in handling raft Ready due to slow fdatasync or high CPU | ✔️ | v3_lease_no_proxy_test.go | Still active |
| crdb | 2023.01 | Adapt charybdefs-based roachtest to allow for indefinite stalls. Also discussed in etcd-issues | ✔️ | commit history, disk_stall.go, pebble_test.go | x |
| crdb | 2022.05 | Initial discussion on handling of degraded storage devices. | 🚫 | x | Developers said "not clear to me what the best mechanism nor policy for this would be". Marked as stale 18 months later. |
| crdb | 2020.10 | Throw a warning on slow log writes; fatal on engine disk stall. | ✔️ | Developers found this to be incomplete and moved to the solutions listed above. | Developers intend to be "conservative": they respond only to disk stalls (>20s) and merely print a warning when the disk is slow (>10s). |
| cassandra | 2016.08 | Slow query detection. | ✔️ | stef1927/cql_test.py and riptano/cql_test.py | Tested by setting slow_query_log_timeout_in_ms to a small value (i.e., 10ms and 30ms) so that it is easily triggered (NOT INJECTING SLOW FAULTS). |
| kafka | - | Mitigating Kafka Broker ‘Gray’ Failures. | | | |
| kafka | - | SRECON'23: Improving Kafka Resilience - Gray Failures Mitigation. | | | |
Misc:
- CRDB actually uses charybdefs, but only to simulate and test disk stalls. See this ticket for how developers use charybdefs.
Generally, developers have access to tools for injecting slow faults and use them intentionally, but they target only the worst-case scenario, disk stalls; they do not iterate through different severity levels even though they are able to.
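
To make "iterating through severity levels" concrete, here is a minimal sketch of what such a sweep could look like. It reuses the two thresholds quoted in the crdb 2020.10 row (warn above 10s, fatal above 20s); the function names and the list of delays are hypothetical, and no real fault is injected. The code only classifies a simulated latency, so it is not how any of the systems above implement their checks.

```python
from typing import Dict, Iterable

# Thresholds taken from the crdb 2020.10 row of the table above.
WARN_AFTER_S = 10.0   # slow disk: only a warning is printed
FATAL_AFTER_S = 20.0  # disk stall: the process is terminated


def classify(latency_s: float) -> str:
    """Return the reaction a two-threshold policy has to one write latency."""
    if latency_s > FATAL_AFTER_S:
        return "fatal"
    if latency_s > WARN_AFTER_S:
        return "warn"
    return "ok"


def sweep(delays_s: Iterable[float]) -> Dict[float, str]:
    """Classify every simulated delay instead of only the worst case."""
    return {delay: classify(delay) for delay in delays_s}


if __name__ == "__main__":
    # Hypothetical severity levels, from a mild slowdown to a full stall.
    for delay, reaction in sweep([0.1, 1.0, 5.0, 15.0, 30.0]).items():
        print(f"injected delay {delay:>5.1f}s -> {reaction}")
```

A sweep like this shows that every delay up to 10s is treated exactly like a healthy disk, which is the gap a stall-only test would not reveal.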