Draft
Conversation
3af4977 to
6319e91
Compare
50f7865 to
e2cbef1
Compare
duckhawk
added a commit
that referenced
this pull request
Apr 20, 2026
This change implements the remaining review items from PR #157 and aligns the driver with the synced ADR. driver: - NodeStageVolume now fetches the PV BEFORE creating disk.img and fails with codes.Unavailable on a missing PV for a brand-new volume so kubelet retries cleanly. The reclaim and sparse markers (.reclaim, .sparse) are persisted on disk before the backing file is created, ensuring the orphan-cleanup goroutine and NodeExpandVolume can rely on them even when apiserver is unreachable. Re-stage of an existing volume is idempotent and back-fills missing markers for older volumes. - NodeStageVolume rolls back a freshly-created backing file when losetup, mkfs, mount or online resize fails, so retries do not reuse a half-formatted image. Pre-existing files are left intact. - NodeExpandVolume resolves the sparse flag from PV.VolumeAttributes with the on-disk .sparse marker as the fallback and fails closed with codes.FailedPrecondition when neither source is available. - CreateVolume no longer falls back to the controller's hostID when no topology is provided; HA controllers MUST receive topology via StorageClass.allowedTopologies (Immediate) or scheduler Preferred (WFFC). The rawFile.nodes filter is kept as defence-in-depth. - Removed obsolete EnsureDataDir call from controller startup; the data directory is created lazily on the node side. rawfile: - New SetSparse / GetSparse helpers persist the sparse flag in a .sparse marker via atomic writeMarker / readMarker helpers (also used for .reclaim). - createRawFile now Chmod()s the file after fallocate so umask does not weaken the intended 0600 mode (parity with the truncate branch). - FindLoopDevice / FindAllLoopDevices guard against an empty losetup output with ErrLoopDeviceNotFound; parsing is extracted into parseLosetupOutput for unit-testability. - Added unit tests covering: parseLosetupOutput edge cases, volumeLock mutual exclusion, distinct-volume parallelism and reference-counted entry release, .reclaim and .sparse marker round-trips, and DeleteVolume marker cleanup. driver/cleanup_test.go (new): - Covers listRawFilePVs (driver/type filtering), cleanupOrphanedVolume (Retain keeps file, Delete removes file, missing marker is conservative), processRawFilePVs (grace period, aged Delete vs aged Retain orphans, live Retain PV is left alone) and small helpers. webhooks/lscValidator: - validateRawFile now checks that every node in spec.rawFile.nodes exists in the cluster via a node Get loop. Local validation (empty / duplicate names) runs first to short-circuit. The existence check fails-open on transient apiserver errors so webhook outages do not block LSC edits, and is fully mocked from unit tests. internal: - Removed the unused RawFileReclaimPolicyKey constant. pkg/utils/func.go: - Re-grouped imports to satisfy gci ordering. Signed-off-by: v.oleynikov <vasily.oleynikov@flant.com>
abf605b to
60079da
Compare
Introduce RawFile as an alternative to LVM — volumes backed by regular files mounted as loop devices. Components: API/CRD, CSI driver (controller + node), module controller, scheduler extender, webhooks, Helm templates, documentation (EN/RU). Key implementation details: - Per-volume locking in rawfile.Manager (no global mutex contention) - PV finalizer management via Patch with conflict retry - Node filtering: LSC rawFile.nodes propagated through SC params and enforced in CreateVolume topology filtering - Robust DeleteVolume: checks PV attrs, local file, LLV existence before choosing RawFile vs LVM path - ValidateVolumeID for path traversal prevention - Sparse and pre-allocated file creation (fallocate syscall → binary → goAllocate fallback chain) - Cleanup goroutine for orphaned volumes and finalizer-protected PVs
Signed-off-by: v.oleynikov <vasily.oleynikov@flant.com>
- Pin RawFile PVs to a single node in CreateVolume. RawFile volumes are inherently node-local (the backing file lives on one filesystem), so AccessibleTopology MUST advertise exactly one node. Previously, multi-node topology could let the scheduler place a Pod on a different node, where NodeStageVolume would silently create a new empty file and orphan the original. - Propagate rawFile.nodes to StorageClass.allowedTopologies so the scheduler (WaitForFirstConsumer) and external-provisioner (Immediate) honor the constraint instead of relying on a post-hoc filter in CreateVolume. - Persist PV ReclaimPolicy on disk next to disk.img and refuse to delete orphaned files unless the persisted policy is "Delete". This restores the contract for ReclaimPolicy: Retain even after the PV API object disappears (e.g., manual cleanup, etcd loss). - Loosen CRD immutability for spec.rawFile: only `sparse` stays immutable (changing it would require recreating existing volumes); `nodes` is now mutable so operators can extend or shrink the eligible-node list. A new top-level rule still forbids switching the backend (lvm <-> rawFile). - Extend hasSCDiff to detect rawFile.sparse / rawFile.nodes drift between the LSC and its StorageClass; the SC is recreated when the eligible-node list (parameters + allowedTopologies) goes out of sync. - Rename volume-context key "rawfileSize" -> "local.csi.storage.deckhouse.io/rawfile-size" for consistency with the other RawFile keys. Signed-off-by: v.oleynikov <vasily.oleynikov@flant.com>
- Detach every loop device attached to a backing file. losetup permits multiple loop devices per file; the previous DetachLoopDevice only detached the first hit and silently left the others attached, which could keep the file busy and block volume deletion. New helper FindAllLoopDevices returns the full list; DetachLoopDevice now iterates and joins per-device errors. FindLoopDevice keeps its single-device signature for the (more common) "primary attachment" callers. - Broaden retry classification in addFinalizer/removeFinalizer. Previously only 409 Conflict was retried, so a transient 429/503/500/timeout from the apiserver during NodeStageVolume would surface as a hard error and leave the PV without our cleanup finalizer. The new patchFinalizerWithRetry helper retries on Conflict + ServerTimeout + Timeout + TooManyRequests + InternalError + ServiceUnavailable, with exponential backoff capped at 2s and a context-aware sleep so cancellation propagates promptly. - Replace the per-volume Get PV burst in processRawFilePVs with a single LIST per cycle, indexed by PV name and pre-filtered to RawFile PVs of this driver. Apiserver pressure drops from O(N_local_volumes) to O(1) per cycle; semantics for the orphan-cleanup grace period are preserved. Signed-off-by: v.oleynikov <vasily.oleynikov@flant.com>
This change implements the remaining review items from PR #157 and aligns the driver with the synced ADR. driver: - NodeStageVolume now fetches the PV BEFORE creating disk.img and fails with codes.Unavailable on a missing PV for a brand-new volume so kubelet retries cleanly. The reclaim and sparse markers (.reclaim, .sparse) are persisted on disk before the backing file is created, ensuring the orphan-cleanup goroutine and NodeExpandVolume can rely on them even when apiserver is unreachable. Re-stage of an existing volume is idempotent and back-fills missing markers for older volumes. - NodeStageVolume rolls back a freshly-created backing file when losetup, mkfs, mount or online resize fails, so retries do not reuse a half-formatted image. Pre-existing files are left intact. - NodeExpandVolume resolves the sparse flag from PV.VolumeAttributes with the on-disk .sparse marker as the fallback and fails closed with codes.FailedPrecondition when neither source is available. - CreateVolume no longer falls back to the controller's hostID when no topology is provided; HA controllers MUST receive topology via StorageClass.allowedTopologies (Immediate) or scheduler Preferred (WFFC). The rawFile.nodes filter is kept as defence-in-depth. - Removed obsolete EnsureDataDir call from controller startup; the data directory is created lazily on the node side. rawfile: - New SetSparse / GetSparse helpers persist the sparse flag in a .sparse marker via atomic writeMarker / readMarker helpers (also used for .reclaim). - createRawFile now Chmod()s the file after fallocate so umask does not weaken the intended 0600 mode (parity with the truncate branch). - FindLoopDevice / FindAllLoopDevices guard against an empty losetup output with ErrLoopDeviceNotFound; parsing is extracted into parseLosetupOutput for unit-testability. - Added unit tests covering: parseLosetupOutput edge cases, volumeLock mutual exclusion, distinct-volume parallelism and reference-counted entry release, .reclaim and .sparse marker round-trips, and DeleteVolume marker cleanup. driver/cleanup_test.go (new): - Covers listRawFilePVs (driver/type filtering), cleanupOrphanedVolume (Retain keeps file, Delete removes file, missing marker is conservative), processRawFilePVs (grace period, aged Delete vs aged Retain orphans, live Retain PV is left alone) and small helpers. webhooks/lscValidator: - validateRawFile now checks that every node in spec.rawFile.nodes exists in the cluster via a node Get loop. Local validation (empty / duplicate names) runs first to short-circuit. The existence check fails-open on transient apiserver errors so webhook outages do not block LSC edits, and is fully mocked from unit tests. internal: - Removed the unused RawFileReclaimPolicyKey constant. pkg/utils/func.go: - Re-grouped imports to satisfy gci ordering. Signed-off-by: v.oleynikov <vasily.oleynikov@flant.com>
The truncated header in cleanup_test.go and lscValidator_test.go was rejected by the license linter. Restore the full Apache 2.0 boilerplate to match the rest of the repository. Signed-off-by: v.oleynikov <vasily.oleynikov@flant.com>
- Pin TopologyKey constant on both sides of the cross-image boundary: add a literal-value test in controller/pkg/controller and a mirror in sds-local-volume-csi/internal, plus a comment on the CSI-side const so any rename forces both packages to be updated together. - Add Ginkgo specs covering the controller's RawFile contract: AllowedTopologies propagation from rawFile.nodes (with TopologyKey), empty-topologies behavior when nodes is unset, and SC drift detection / re-creation on rawFile.nodes mutation. - Unify Apache 2.0 copyright year on the four files added by this PR (2025 -> 2026) so all newly introduced files match the year of their initial commit. Signed-off-by: v.oleynikov <vasily.oleynikov@flant.com>
c234ebe to
0b9c2d7
Compare
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
This PR adds support for RawFile volumes — local file-based volumes mounted as loop devices. They provide a simpler alternative to LVM for local storage when LVM is not available or not desired.
Key changes:
New
rawfilepackage (images/sds-local-volume-csi/pkg/rawfile/) — handles raw-file/loop-device management:losetup)CSI driver updates (
images/sds-local-volume-csi/driver/):CreateVolume: validates topology and pins each volume to a single node taken fromaccessibility_requirements(no implicit fallback to the controller's host)NodeStageVolume: fetches the PV before file creation, persists.reclaimand.sparseon-disk markers, adds thestorage.deckhouse.io/rawfile-pv-protectionfinalizer forDeletepolicy, then creates the file and attaches the loop deviceNodeUnstageVolume/NodeExpandVolume: detach + clean unmount; online resizeDeleteVolume: defers actual deletion to the node-side cleanup goroutine (driven by on-disk markers + finalizer)LocalStorageClassCRD extension (api/v1alpha1/,crds/):rawFilespec withsparse(immutable) andnodes(mutable)XValidationfor mutual exclusivity withlvm, immutability of backend switching andrawFile.sparse, mutability ofrawFile.nodesController updates (
images/controller/):type=rawfile,rawfile-sparse,rawfile-nodes)rawFile.nodespropagated toStorageClass.AllowedTopologies(keytopology.sds-local-volume-csi/node) so both the scheduler (WFFC) and external-provisioner (Immediate) honor the constrainthasSCDiffforrawFile.sparseandrawFile.nodestriggers SC recreationWebhook validation (
images/webhooks/):Nodeobjects (fail-open on transient API errors)Cleanup mechanism (node-side):
.reclaim/.sparsemarkers stored next to each raw fileReclaimPolicy: Retainvia the marker)storage.deckhouse.io/rawfile-pv-protection)ModuleConfig (
openapi/config-values.yaml):rawFileDefaultDataDirparameter (default:/var/lib/sds-local-volume/rawfile)RAWFILE_DEFAULT_DATA_DIRand aBidirectional-propagatedhostPathmountWhy do we need it, and what problem does it solve?
Problem. Some environments cannot use LVM due to:
Solution. RawFile volumes provide:
rawFile.nodes)What is the expected result?
After applying these changes:
LocalStorageClasswith RawFile:volume node affinity conflicterrors (single-node pinning via WFFC + topology label).Deletereclaim — files are cleaned up by the node-side cleanup loop after PV finalizer is removed.Retainreclaim — files are preserved (.reclaim=Retainmarker).Checklist