
snowzach/dockyard


Dockyard


A pull-through cache for OCI container images. Dockyard proxies registry requests, caches manifests and blobs in S3-compatible storage, and serves cached content on subsequent pulls. It supports multiple upstream registries with configurable authentication, including automatic credential helpers for AWS ECR.

How It Works

When a client pulls an image through Dockyard:

  1. Route -- The image name is matched against configured upstreams using longest-prefix matching
  2. Check upstream -- For tag references (e.g. latest), a HEAD request checks the current digest. Digest references (e.g. sha256:abc...) skip this step since they're immutable
  3. Serve or fetch -- If the cached digest matches upstream, serve from cache. Otherwise, fetch the full manifest, cache it, and serve
  4. Blobs -- Blob layers are content-addressable and globally deduplicated. On cache miss, the blob is simultaneously streamed to the client and uploaded to S3 (no buffering)
  5. Fallback -- If the upstream is unreachable, Dockyard serves stale cached content rather than failing
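The tag/digest decision in steps 2, 3 and 5 can be sketched in Go. This is a simplified illustration under stated assumptions, not Dockyard's code:

- Upstream, Cache, resolveManifest and the in-memory fakes are hypothetical names.
- The "sha256:" prefix check stands in for full digest parsing.
- Any upstream error is treated as "unreachable".

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Upstream and Cache are hypothetical interfaces, not Dockyard's types.
type Upstream interface {
	HeadDigest(repo, ref string) (string, error) // HEAD request for a tag
	GetManifest(repo, ref string) (digest string, body []byte, err error)
}

type Cache interface {
	TagDigest(repo, tag string) (string, bool) // read a tag pointer
	Manifest(repo, digest string) ([]byte, bool)
	Put(repo, ref, digest string, body []byte)
}

func isDigest(ref string) bool { return strings.HasPrefix(ref, "sha256:") }

// resolveManifest returns the manifest body and its source:
// "cache", "stale" (upstream unreachable) or "upstream".
func resolveManifest(up Upstream, c Cache, repo, ref string) ([]byte, string, error) {
	if isDigest(ref) {
		// Digests are immutable: a cache hit needs no upstream check.
		if body, ok := c.Manifest(repo, ref); ok {
			return body, "cache", nil
		}
	} else if cached, ok := c.TagDigest(repo, ref); ok {
		body, have := c.Manifest(repo, cached)
		current, err := up.HeadDigest(repo, ref)
		switch {
		case have && err != nil:
			return body, "stale", nil // fallback: serve stale content
		case have && current == cached:
			return body, "cache", nil // tag still points at the cached digest
		}
	}
	// Cache miss, or the tag moved upstream: fetch, store, and serve.
	digest, body, err := up.GetManifest(repo, ref)
	if err != nil {
		return nil, "", err
	}
	c.Put(repo, ref, digest, body)
	return body, "upstream", nil
}

// memCache and fakeUpstream are in-memory fakes for the demo.
type memCache struct{ tags, manifests map[string]string }

func (m memCache) TagDigest(repo, tag string) (string, bool) {
	d, ok := m.tags[repo+":"+tag]
	return d, ok
}

func (m memCache) Manifest(repo, digest string) ([]byte, bool) {
	b, ok := m.manifests[repo+"@"+digest]
	return []byte(b), ok
}

func (m memCache) Put(repo, ref, digest string, body []byte) {
	m.manifests[repo+"@"+digest] = string(body)
	if !isDigest(ref) {
		m.tags[repo+":"+ref] = digest // update the tag pointer
	}
}

type fakeUpstream struct {
	digest, body string
	down         bool
}

var errDown = errors.New("upstream unreachable")

func (f fakeUpstream) HeadDigest(repo, ref string) (string, error) {
	if f.down {
		return "", errDown
	}
	return f.digest, nil
}

func (f fakeUpstream) GetManifest(repo, ref string) (string, []byte, error) {
	if f.down {
		return "", nil, errDown
	}
	return f.digest, []byte(f.body), nil
}

func main() {
	c := memCache{tags: map[string]string{}, manifests: map[string]string{}}
	up := fakeUpstream{digest: "sha256:aaa", body: "v1"}
	for _, u := range []fakeUpstream{up, up, {down: true}} {
		_, src, _ := resolveManifest(u, c, "library/alpine", "latest")
		fmt.Println(src) // upstream, then cache, then stale
	}
}
```

Running it prints upstream, cache, stale:

- The first pull fills the cache.
- The second is confirmed by a matching HEAD digest.
- The third falls back to the cached copy because the fake upstream is down.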
docker pull dockyard:5000/ghcr.io/nginx/nginx:latest
                          └──┬──┘ └───────┬────────┘
                           prefix        repo

Dockyard matches ghcr.io to its configured upstream, strips the prefix, and proxies the request for nginx/nginx:latest to ghcr.io.

Quick Start

# Start Dockyard + MinIO
docker compose up --build -d

# Pull an image through the cache
docker pull localhost:5000/docker.io/library/alpine:latest

# MinIO console at http://localhost:9001 (minioadmin / minioadmin)

Configuration

Dockyard loads configuration in order of precedence (highest wins):

  1. Environment variables
  2. YAML config file (-c flag)
  3. Built-in defaults
dockyard server -c /etc/dockyard/config.yaml
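A minimal Go sketch of that precedence order follows. It assumes the three sources have already been flattened into key/value maps; the merge helper and the map representation are illustrative, not Dockyard's loader. Later layers simply overwrite earlier ones.

```go
package main

import "fmt"

// merge applies layers in order of increasing precedence: a key set in a
// later layer overrides the same key from an earlier one.
func merge(layers ...map[string]string) map[string]string {
	out := map[string]string{}
	for _, layer := range layers {
		for k, v := range layer {
			out[k] = v
		}
	}
	return out
}

func main() {
	defaults := map[string]string{"server.port": "5000", "logger.level": "info"}
	file := map[string]string{"logger.level": "debug"} // from the -c config file
	env := map[string]string{"server.port": "8080"}    // e.g. SERVER_PORT=8080
	cfg := merge(defaults, file, env)
	fmt.Println(cfg["server.port"], cfg["logger.level"]) // 8080 debug
}
```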

Full Config Reference

logger:
  level: "info"          # debug, info, warn, error
  encoding: "console"    # console or json
  color: true            # colored output (console encoding only)
  output: "stderr"       # stderr, stdout, or a file path

server:
  host: ""               # listen address (empty = all interfaces)
  port: "5000"           # listen port
  tls: false
  certfile: ""           # required if tls: true
  keyfile: ""            # required if tls: true
  log:
    enabled: true        # HTTP request logging
    level: "info"
    request_body: false  # log request bodies (debug)
    response_body: false # log response bodies (debug)
    ignore_paths: []     # paths to skip logging

storage:
  endpoint: "minio:9000"       # S3-compatible endpoint (no protocol)
  region: "us-east-1"
  bucket: "dockyard-cache"     # must exist
  prefix: "cache/"             # key prefix for all objects
  access_key: "minioadmin"
  secret_key: "minioadmin"
  use_ssl: false               # true for AWS S3
  force_path_style: true       # required for MinIO
  touch_threshold: "24h"       # see "Cache Expiration" below

upstream:
  check_timeout: "5s"       # timeout for upstream HEAD/GET requests (0 to disable)

upstreams:
  - prefix: "docker.io"
    registry: "registry-1.docker.io"
    auth:
      type: "anonymous"

  - prefix: "ghcr.io"
    registry: "ghcr.io"
    auth:
      type: "anonymous"

Example: MinIO (Minimal)

The included config.yaml ships with default upstreams for Docker Hub and GitHub Container Registry (anonymous auth). If that's all you need, just point Dockyard at your S3-compatible storage:

storage:
  endpoint: "minio:9000"
  bucket: "dockyard-cache"
  access_key: "minioadmin"
  secret_key: "minioadmin"
  use_ssl: false
  force_path_style: true

No upstreams block is needed -- Docker Hub and GHCR are configured by default. Pull images through the cache immediately:

docker pull localhost:5000/docker.io/library/alpine:latest
docker pull localhost:5000/ghcr.io/nginx/nginx-prometheus-exporter:latest

Example: AWS S3

When running on AWS (EC2, ECS, EKS), you can omit access_key and secret_key entirely. The AWS SDK automatically picks up credentials from the environment (instance profile, IRSA, ECS task role, etc.).

storage:
  region: "us-east-1"
  bucket: "my-dockyard-cache"
  prefix: "cache/"
  use_ssl: true
  # endpoint omitted — defaults to AWS S3
  # access_key/secret_key omitted — uses IAM role
  # force_path_style omitted — defaults to false (virtual-hosted style)
  touch_threshold: "24h"

upstreams:
  # ECR private registry — uses Docker keychain + ecr-login helper
  - prefix: "123456789.dkr.ecr.us-east-1.amazonaws.com"
    registry: "123456789.dkr.ecr.us-east-1.amazonaws.com"
    # auth omitted — keychain default handles ECR token refresh

  - prefix: "docker.io"
    registry: "registry-1.docker.io"
    auth:
      type: "anonymous"

  - prefix: "ghcr.io"
    registry: "ghcr.io"
    auth:
      type: "anonymous"

Example: Cloudflare R2

R2 is S3-compatible with free egress, making it a good fit for a cache. Use your R2 account ID to form the endpoint, and an API token with R2 read/write permissions for credentials.

storage:
  endpoint: "<ACCOUNT_ID>.r2.cloudflarestorage.com"
  region: "auto"
  bucket: "dockyard-cache"
  prefix: "cache/"
  access_key: "<R2_ACCESS_KEY_ID>"
  secret_key: "<R2_SECRET_ACCESS_KEY>"
  use_ssl: true
  force_path_style: true
  touch_threshold: "24h"

upstreams:
  - prefix: "docker.io"
    registry: "registry-1.docker.io"
    auth:
      type: "anonymous"

  - prefix: "ghcr.io"
    registry: "ghcr.io"
    auth:
      type: "anonymous"

Environment Variable Overrides

All scalar config keys can be overridden with environment variables. The variable name is the config key with dots replaced by underscores, in uppercase:

Config Key                 Environment Variable
logger.level               LOGGER_LEVEL
server.port                SERVER_PORT
storage.bucket             STORAGE_BUCKET
storage.access_key         STORAGE_ACCESS_KEY
storage.touch_threshold    STORAGE_TOUCH_THRESHOLD
upstream.check_timeout     UPSTREAM_CHECK_TIMEOUT

Note: Upstreams cannot be configured via environment variables; define them in the YAML config file. The env var override only matches keys that already exist in the config. The upstreams array is defined only in the config file, so there is no existing key for an environment variable to override.
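The naming rule and the "existing keys only" behavior described above can be sketched as follows. This is an illustration under assumptions: envName and applyEnv are hypothetical helpers, and the config is modeled as a flat map of scalar keys. An environment variable that does not correspond to a key already in the map (for example one that tries to describe an upstream) is simply never looked at.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// envName maps a config key to its environment variable name:
// dots become underscores and the result is uppercased.
func envName(key string) string {
	return strings.ToUpper(strings.ReplaceAll(key, ".", "_"))
}

// applyEnv overrides only keys that already exist in cfg; environment
// variables that do not correspond to a known key are ignored.
func applyEnv(cfg map[string]string, lookup func(string) (string, bool)) {
	for key := range cfg {
		if v, ok := lookup(envName(key)); ok {
			cfg[key] = v
		}
	}
}

func main() {
	cfg := map[string]string{"server.port": "5000", "storage.touch_threshold": "24h"}
	applyEnv(cfg, os.LookupEnv)
	fmt.Println(envName("storage.touch_threshold")) // STORAGE_TOUCH_THRESHOLD
	fmt.Println(cfg)
}
```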

The included config.yaml ships with default upstreams for Docker Hub and GitHub Container Registry (anonymous auth), which covers most public image pulls out of the box. Add additional upstreams to the config file as needed.

Environment Variable Expansion in Values

Config values can reference environment variables using ${VAR} syntax. This is useful for injecting secrets (from Kubernetes Secrets, Vault, etc.) into the config without hardcoding them:

upstreams:
  - prefix: "ghcr.io/my-org"
    registry: "ghcr.io"
    auth:
      type: "bearer"
      password: "${GHCR_TOKEN}"   # resolved from env var at startup

storage:
  access_key: "${S3_ACCESS_KEY}"
  secret_key: "${S3_SECRET_KEY}"

Behavior:

  • Only braced ${VAR} is expanded; bare $VAR is left untouched
  • If the env var is not set, the ${VAR} literal is kept as-is, which makes the misconfiguration visible in error messages
  • Expansion works for all string config values, including durations (e.g. "${CACHE_TTL}" with CACHE_TTL=5m yields "5m")
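The expansion rules listed above can be sketched with a regular expression. This is an illustrative reimplementation, not Dockyard's code, and the accepted variable-name characters are an assumption. Go's standard os.ExpandEnv would not follow these rules: it also expands bare $VAR and replaces unset variables with an empty string.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// bracedVar matches only the braced form ${NAME}; bare $NAME never matches.
var bracedVar = regexp.MustCompile(`\$\{([A-Za-z_][A-Za-z0-9_]*)\}`)

// expand replaces ${NAME} with the variable's value, leaving the literal
// ${NAME} in place when the variable is unset.
func expand(s string, lookup func(string) (string, bool)) string {
	return bracedVar.ReplaceAllStringFunc(s, func(m string) string {
		name := bracedVar.FindStringSubmatch(m)[1]
		if v, ok := lookup(name); ok {
			return v
		}
		return m
	})
}

func main() {
	os.Setenv("CACHE_TTL", "5m")
	fmt.Println(expand("${CACHE_TTL}", os.LookupEnv)) // 5m
	fmt.Println(expand("$CACHE_TTL", os.LookupEnv))   // $CACHE_TTL
}
```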

Kubernetes example — inject a GitHub PAT from a Secret:

apiVersion: v1
kind: Secret
metadata:
  name: ghcr-credentials
stringData:
  token: "ghp_ABC123..."
---
# Deployment excerpt: only the fields relevant to the secret are shown
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: dockyard
          env:
            - name: GHCR_TOKEN          # the name referenced as ${GHCR_TOKEN}
              valueFrom:
                secretKeyRef:
                  name: ghcr-credentials
                  key: token

Then reference ${GHCR_TOKEN} in the config YAML (which lives in a ConfigMap).

Authentication

Dockyard supports four auth modes per upstream. If no auth.type is set, it defaults to the Docker credential keychain.

Anonymous

For public registries. No credentials sent.

auth:
  type: "anonymous"

Basic Auth

HTTP Basic authentication with username and password.

auth:
  type: "basic"
  username: "myuser"
  password: "mytoken"

Bearer Token

A static bearer token (e.g. a GitHub PAT).

auth:
  type: "bearer"
  password: "ghp_ABC123..."

Docker Keychain (Default)

When auth.type is omitted or set to any value other than the above, Dockyard uses the Docker credential keychain (~/.docker/config.json). This delegates authentication to credential helpers, which is the recommended approach for registries that use short-lived tokens (like AWS ECR).

# Omit auth entirely -- keychain is the default
- prefix: "123456789.dkr.ecr.us-east-1.amazonaws.com"
  registry: "123456789.dkr.ecr.us-east-1.amazonaws.com"

The keychain falls back to anonymous when no matching credential helper is configured, so this is backward-compatible with public registries.
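The mode selection described above can be summarized as a switch on auth.type, sketched below. AuthMode and modeFor are hypothetical names, and exact, case-sensitive matching of the type string is an assumption. The actual keychain lookup and credential exchange are handled by credential helpers and the registry client, and are not shown.

```go
package main

import "fmt"

// AuthMode is a hypothetical enum of the four per-upstream auth modes.
type AuthMode string

const (
	Anonymous AuthMode = "anonymous"
	Basic     AuthMode = "basic"
	Bearer    AuthMode = "bearer"
	Keychain  AuthMode = "keychain"
)

// modeFor maps auth.type to a mode. Anything other than the three explicit
// values, including an empty (omitted) type, falls through to the keychain.
func modeFor(authType string) AuthMode {
	switch AuthMode(authType) {
	case Anonymous, Basic, Bearer:
		return AuthMode(authType)
	default:
		return Keychain
	}
}

func main() {
	for _, t := range []string{"anonymous", "bearer", "", "baerer"} {
		fmt.Printf("%q -> %s\n", t, modeFor(t))
	}
}
```

One consequence of this defaulting rule is that a misspelled type such as "baerer" silently selects the keychain instead of failing. That is worth checking when configured credentials appear to be ignored.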

AWS ECR Setup

The Dockyard Docker image includes docker-credential-ecr-login. To use it:

  1. Mount a config.json with the ECR credential helper configured (the Kubernetes example below mounts it at /home/nonroot/.docker/config.json):
{
  "credHelpers": {
    "123456789.dkr.ecr.us-east-1.amazonaws.com": "ecr-login",
    "public.ecr.aws": "ecr-login"
  }
}
  2. Provide AWS credentials via one of:

    • EKS Pod Identity / IRSA (recommended) -- no static credentials needed
    • Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION)
    • EC2 instance profile or ECS task role (picked up automatically on EC2/ECS)
  3. Omit auth in the upstream config so the keychain default handles it:

upstreams:
  - prefix: "123456789.dkr.ecr.us-east-1.amazonaws.com"
    registry: "123456789.dkr.ecr.us-east-1.amazonaws.com"

The credential helper automatically refreshes ECR tokens (which expire every 12 hours), so no restarts or config updates are needed.

Kubernetes Example

apiVersion: v1
kind: ConfigMap
metadata:
  name: docker-config
data:
  config.json: |
    { "credHelpers": { "123456789.dkr.ecr.us-east-1.amazonaws.com": "ecr-login" } }
---
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      serviceAccountName: dockyard  # with IRSA annotation for ECR access
      containers:
        - name: dockyard
          volumeMounts:
            - name: docker-config
              mountPath: /home/nonroot/.docker
              readOnly: true
      volumes:
        - name: docker-config
          configMap:
            name: docker-config

Cache Expiration

Dockyard itself does not delete cached content. Instead, it's designed to work with S3 lifecycle policies for automatic expiration.

Touch Threshold

The storage.touch_threshold setting (default: 24h) controls how Dockyard keeps actively-used content alive:

  • When serving a cached object, if its S3 LastModified timestamp is older than the threshold, Dockyard copies the object in place to reset its timestamp
  • This prevents S3 lifecycle rules from deleting content that is still being pulled
  • Objects that are never accessed will naturally age out via lifecycle policy
  • Set to 0 to disable touching (if you don't use S3 lifecycle rules)

Recommended S3 Lifecycle Policy

Configure an S3 lifecycle rule to expire objects older than your desired retention period. For example, expire objects whose LastModified time (which each touch resets) is more than 30 days old:

{
  "Rules": [
    {
      "ID": "expire-stale-cache",
      "Status": "Enabled",
      "Filter": { "Prefix": "cache/" },
      "Expiration": { "Days": 30 }
    }
  ]
}

With touch_threshold: "24h" and a 30-day lifecycle rule:

  • Objects pulled at least once every 29 days (the 30-day rule minus the 24h touch threshold) are kept indefinitely
  • Objects not pulled for 30 days are automatically deleted
  • The touch threshold prevents unnecessary S3 copy operations (at most once per 24h per object)

What Gets Cached

Content                Key Pattern                                       Deduplication
Manifests (by digest)  {prefix}/manifests/{name}/sha256/{digest}         Per-repository
Tag pointers           {prefix}/manifests/{name}/tags/{tag}              Per-repository
Blobs                  {prefix}/blobs/sha256/{first-two-chars}/{digest}  Global (across all repos)

Tag pointers are small files that contain only a digest string. When a tag is updated upstream, the pointer is updated on the next pull and the old manifest naturally expires via lifecycle.

Blobs are stored globally by digest, so identical layers shared across images are stored only once.
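The following sketch shows how the key patterns in the table above could be built. It makes several assumptions:

- The function names are hypothetical.
- {digest} is the hex part after "sha256:"; the {first-two-chars} segment only makes sense for hex.
- Segments are joined with path.Join, so a trailing slash on prefix (e.g. "cache/") does not produce a double slash. Whether Dockyard joins keys the same way is also an assumption.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// blobKey: {prefix}/blobs/sha256/{first-two-chars}/{digest}
// Blobs are keyed by digest only, so every repository shares one copy.
// Assumes a valid sha256 digest (at least two hex characters).
func blobKey(prefix, digest string) string {
	hex := strings.TrimPrefix(digest, "sha256:")
	return path.Join(prefix, "blobs", "sha256", hex[:2], hex)
}

// manifestKey: {prefix}/manifests/{name}/sha256/{digest} (per repository)
func manifestKey(prefix, name, digest string) string {
	return path.Join(prefix, "manifests", name, "sha256", strings.TrimPrefix(digest, "sha256:"))
}

// tagKey: {prefix}/manifests/{name}/tags/{tag}; the object holds a digest string.
func tagKey(prefix, name, tag string) string {
	return path.Join(prefix, "manifests", name, "tags", tag)
}

func main() {
	// Example digest (sha256 of the string "test"), for illustration only.
	d := "sha256:9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
	fmt.Println(blobKey("cache/", d))
	fmt.Println(manifestKey("cache/", "library/alpine", d))
	fmt.Println(tagKey("cache/", "library/alpine", "latest"))
}
```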

Upstream Routing

Upstreams are matched by longest prefix. The prefix is stripped from the image name to produce the repository path sent to the upstream registry.

upstreams:
  # More specific prefix matches first
  - prefix: "ghcr.io/my-org"
    registry: "ghcr.io"
    auth:
      type: "bearer"
      password: "ghp_orgtoken..."

  # Catch-all for other ghcr.io repos
  - prefix: "ghcr.io"
    registry: "ghcr.io"
    auth:
      type: "anonymous"

  - prefix: "docker.io"
    registry: "registry-1.docker.io"
    auth:
      type: "anonymous"

With this config:

  • ghcr.io/my-org/app:v1 matches the first upstream (uses bearer auth)
  • ghcr.io/other/tool:latest matches the second upstream (anonymous)
  • docker.io/library/nginx:latest matches the third upstream
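The routing rule can be sketched as follows. This is an illustration, not Dockyard's router, and it rests on three assumptions:

- Upstream and route are hypothetical names.
- route takes the repository name without the tag.
- A prefix must end at a path-segment boundary, so "ghcr.io/my-org" does not also capture "ghcr.io/my-organization". The text above does not state this rule.

```go
package main

import (
	"fmt"
	"strings"
)

type Upstream struct {
	Prefix   string
	Registry string
}

// route picks the upstream with the longest matching prefix and returns the
// repository path with that prefix stripped.
func route(upstreams []Upstream, name string) (Upstream, string, bool) {
	var best Upstream
	found := false
	for _, u := range upstreams {
		// The whole prefix must match and be followed by a "/".
		if !strings.HasPrefix(name, u.Prefix+"/") {
			continue
		}
		if !found || len(u.Prefix) > len(best.Prefix) {
			best, found = u, true
		}
	}
	if !found {
		return Upstream{}, "", false
	}
	return best, strings.TrimPrefix(name, best.Prefix+"/"), true
}

func main() {
	ups := []Upstream{
		{Prefix: "ghcr.io", Registry: "ghcr.io"},
		{Prefix: "ghcr.io/my-org", Registry: "ghcr.io"},
		{Prefix: "docker.io", Registry: "registry-1.docker.io"},
	}
	for _, name := range []string{"ghcr.io/my-org/app", "ghcr.io/other/tool", "docker.io/library/nginx"} {
		u, _, _ := route(ups, name)
		fmt.Printf("%-24s matches prefix %q\n", name, u.Prefix)
	}
	_, repo, _ := route(ups, "docker.io/library/nginx")
	fmt.Println(repo) // library/nginx
}
```

The less specific "ghcr.io" entry is deliberately listed first: with longest-prefix matching, the order of entries in the list does not affect which upstream wins.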

Deployment

Docker Compose (Development)

The included docker-compose.yml starts Dockyard with MinIO:

docker compose up --build -d

Services:

  • Dockyard on port 5000
  • MinIO S3 on port 9000, console on port 9001
  • minio-init creates the dockyard-cache bucket automatically

Docker

docker build -t dockyard .
docker run -p 5000:5000 -v "$(pwd)/config.yaml:/etc/dockyard/config.yaml:ro" \
  dockyard server -c /etc/dockyard/config.yaml

The Docker image supports multiarch builds (amd64/arm64):

docker buildx build --platform linux/amd64,linux/arm64 -t dockyard .

Using as a Registry Mirror

Configure your container runtime to pull through Dockyard. The image name must include the upstream prefix as configured:

# Direct pull
docker pull localhost:5000/docker.io/library/nginx:latest

# Same image as "docker pull nginx:latest" (docker.io/library/nginx), but served through the cache

For Kubernetes, configure containerd or CRI-O to use Dockyard as a mirror for specific registries.

Building

make          # Build binary to build/dockyard
make run      # Build and run with config.yaml
make test     # Run tests
make docker   # Build Docker image
make tidy     # go mod tidy
make clean    # Remove build artifacts

OCI Endpoints

Dockyard implements the OCI Distribution Spec (pull-only):

Method  Path                                Description
GET     /v2/                                API version check
GET     /v2/{name}/manifests/{reference}    Get manifest by tag or digest
HEAD    /v2/{name}/manifests/{reference}    Check manifest existence
GET     /v2/{name}/blobs/{digest}           Get blob layer
HEAD    /v2/{name}/blobs/{digest}           Check blob existence
GET     /version                            Dockyard version info

Push, delete, and catalog operations are not supported -- Dockyard is a read-only cache.
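Repository names in these paths may themselves contain slashes, for example docker.io/library/nginx. A handler therefore has to split a path such as /v2/{name}/manifests/{reference} from the right. A minimal sketch, with parsePath as a hypothetical helper rather than Dockyard's router:

```go
package main

import (
	"fmt"
	"strings"
)

// parsePath splits "/v2/{name}/{kind}/{ref}", where kind is "manifests" or
// "blobs". It searches from the right, so names containing slashes (or even
// a "manifests" segment) still parse.
func parsePath(p string) (name, kind, ref string, ok bool) {
	if !strings.HasPrefix(p, "/v2/") {
		return "", "", "", false
	}
	rest := strings.TrimPrefix(p, "/v2/")
	for _, k := range []string{"manifests", "blobs"} {
		sep := "/" + k + "/"
		i := strings.LastIndex(rest, sep)
		if i <= 0 {
			continue // kind not present, or empty name
		}
		name, ref = rest[:i], rest[i+len(sep):]
		if ref != "" && !strings.Contains(ref, "/") {
			return name, k, ref, true
		}
	}
	return "", "", "", false
}

func main() {
	fmt.Println(parsePath("/v2/docker.io/library/nginx/manifests/latest"))
	// docker.io/library/nginx manifests latest true
}
```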

About

Dockyard is a pull-through container cache backed by S3-compatible storage.
