Keep rules only see artifacts that already passed the Delete rule's AQL filter — "keep latest N" can't protect the latest N across all versions when combined with a Delete rule #183

@sdn2s

Summary

When a Keep* rule (e.g. KeepLatestNDockerImages) shares a policy with a filtering
Delete* rule (e.g. DeleteDockerImagesNotUsed), the keep rule does not protect the
latest N items across all versions — only the latest N among the items that already
survived the delete rule's AQL filter.

The common retention intent ("delete images not used for N days, but always keep the
latest M versions") is therefore not expressible in a single policy, and on many real
datasets the policy deletes nothing (or the wrong set).

Root cause

All rules contribute their aql_add_filter into a single AQL find combined with $and
(artifactory_cleanup/rules/base.py, _get_aql_find_filters, lines 238–249):

def _get_aql_find_filters(self) -> Dict:
    filters = []
    for rule in self.rules:
        filters = rule.aql_add_filter(filters)
    return {"$and": filters}        # every rule's filter AND-ed together

DeleteDockerImagesNotUsed.aql_add_filter (docker.py:125–165) adds a date constraint, so
only stale/unused images are fetched — recently-used versions never come back.

Keep* rules add nothing to the AQL; they only implement in-memory filter(), which runs on
the already-narrowed set (CleanupPolicy.filter, base.py:280–287):

def filter(self, artifacts: ArtifactsList) -> ArtifactsList:
    for rule in self.rules:
        artifacts = rule.filter(artifacts)   # `artifacts` = only the stale items from AQL
    return artifacts

KeepLatestNDockerImages.filter (docker.py:214–239) groups the received artifacts by path
and keeps the N newest digests of that subset — it has no knowledge of the truly newest
versions, which were filtered out earlier as "recently used".

Reproduction

- name: Delete docker images not used for 120 days, keep latest 5
  rules:
    - rule: DeleteDockerImagesNotUsed
      days: 120
    - rule: KeepLatestNDockerImages
      count: 5
    - rule: IncludeDockerImages
      masks: ["*"]

Take an image whose recent versions are still being downloaded and whose older versions are
not. The date filter drops the recently-used versions, so AQL returns only the stale ones.
KeepLatestNDockerImages then keeps the N newest of that stale subset. If the number of
stale versions is ≤ N, the policy deletes nothing, even when every one of those stale
versions lies outside the latest N overall.
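
The scenario can be modelled without Artifactory at all. The sketch below is a toy stand-in for the two stages, not the project's code: `Image`, `aql_not_used` and `keep_latest_n` are hypothetical names, and the `>=` comparison on days is an assumption about how the date constraint behaves.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Image:
    tag: str
    created_day: int          # larger = newer
    last_used_days_ago: int


def aql_not_used(images: List[Image], days: int) -> List[Image]:
    """Stand-in for the delete rule's date constraint: only stale images come back."""
    return [i for i in images if i.last_used_days_ago >= days]


def keep_latest_n(images: List[Image], count: int) -> List[Image]:
    """Stand-in for the keep rule: drop the N newest of whatever it receives."""
    newest_first = sorted(images, key=lambda i: i.created_day, reverse=True)
    return newest_first[count:]


# Ten versions of one image: v1..v5 are stale, v6..v10 were pulled 3 days ago.
images = [Image(f"v{n}", n, 200 if n <= 5 else 3) for n in range(1, 11)]

stale = aql_not_used(images, days=120)        # stage 1: the AQL result
to_delete = keep_latest_n(stale, count=5)     # stage 2: in-memory keep filter

print([i.tag for i in stale])       # ['v1', 'v2', 'v3', 'v4', 'v5']
print([i.tag for i in to_delete])   # []
```

None of v1..v5 is among the five newest versions overall, yet all five are kept because the keep stage never sees v6..v10.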

Expected vs actual

  • Expected: keep the latest N versions overall, and delete the ones that are both older
    than the threshold and not in that latest-N set.
  • Actual: "keep latest N" is computed only over the stale subset, so the newest stale
    versions are protected indefinitely and the intended deletions never happen.

The same applies to non-docker rules, e.g. DeleteOlderThan + KeepLatestNFiles in one policy.

Why config alone can't work around it

Splitting into two policies makes it worse: policies are independent and their deletions are
unioned, and a Keep* rule in one policy does not protect artifacts targeted by another.
So "delete (unused) AND NOT (in latest N of all)" is inexpressible.
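
In set terms, the desired result is a difference, while independent policies can only contribute a union. A minimal sketch with plain Python sets, assuming a keep-only policy deletes everything outside the latest N (that behavior and all names here are illustrative assumptions, not taken from the project):

```python
all_tags = [f"v{n}" for n in range(1, 11)]      # oldest to newest

stale = {"v1", "v2", "v3", "v6", "v7"}          # not used for N days
latest_n = set(all_tags[-5:])                   # v6..v10, newest five overall

# What the retention intent asks for: stale AND NOT in the latest N.
desired = stale - latest_n

# Two separate policies, each evaluated on its own.
policy_a = stale                                # delete rule only
policy_b = set(all_tags) - latest_n             # keep rule only (assumed behavior)
combined = policy_a | policy_b                  # deletions are unioned

print(sorted(desired))                          # ['v1', 'v2', 'v3']
print(sorted(combined - desired))               # ['v4', 'v5', 'v6', 'v7']
```

Under these assumptions the split deletes v6 and v7 (in the latest five) and v4 and v5 (still in use), none of which the intent allows.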

Proposed fix

Make "keep latest N" evaluate over all versions, not just the post-filter subset:

  1. (Preferred, backward-compatible) Add a new keep rule that runs its own AQL query for
    all tags of each image (independent of the policy's delete filter), computes the N newest,
    and removes them from the deletion set. There's precedent for a rule issuing its own AQL
    request (_collect_docker_size). Existing behavior stays intact.
  2. Otherwise, at least document this interaction, since changing the existing rule in place
    would break policies that rely on "keep newest among the stale subset".
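
The core of option 1 could look roughly like the sketch below. It only shows the set logic: `Version`, `latest_n_per_image` and `apply_keep_latest_overall` are hypothetical names, and `all_versions` stands for the result of the rule's own, separate AQL query, which is not shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Version(NamedTuple):
    image: str
    tag: str
    created: int  # any sortable timestamp; larger = newer


def latest_n_per_image(all_versions: Iterable[Version], count: int) -> Set[Tuple[str, str]]:
    """Return (image, tag) pairs for the N newest versions of each image."""
    by_image: Dict[str, List[Version]] = defaultdict(list)
    for v in all_versions:
        by_image[v.image].append(v)

    protected: Set[Tuple[str, str]] = set()
    for versions in by_image.values():
        versions.sort(key=lambda v: v.created, reverse=True)
        protected.update((v.image, v.tag) for v in versions[:count])
    return protected


def apply_keep_latest_overall(
    candidates: List[Version], all_versions: Iterable[Version], count: int
) -> List[Version]:
    """Remove from the deletion candidates anything in the overall latest N."""
    protected = latest_n_per_image(all_versions, count)
    return [v for v in candidates if (v.image, v.tag) not in protected]


# All ten versions, as the rule's own query would return them.
all_versions = [Version("app", f"v{n}", n) for n in range(1, 11)]
# Deletion candidates as produced by the policy's delete filter (stale only).
candidates = [v for v in all_versions if v.tag in {"v1", "v2", "v3", "v6", "v7"}]

result = apply_keep_latest_overall(candidates, all_versions, count=5)
print([v.tag for v in result])  # ['v1', 'v2', 'v3']
```

Because the protected set is computed from the unfiltered list, v6 and v7 survive even though they are stale, and v1..v3 are deleted as intended.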

Happy to submit a PR for option 1 if maintainers agree on the rule name/approach.

Environment

  • artifactory-cleanup: master (commit 4fff979)
  • Rules: DeleteDockerImagesNotUsed + KeepLatestNDockerImages (general: any filtering Delete
    rule + any Keep rule in the same policy)
