Cache Architecture

Info

This page provides a technical overview of the Tuist cache service architecture. It is primarily intended for self-hosting users and contributors who need to understand the internal workings of the service. General users who only want to use the cache do not need to read this.

The Tuist cache service is a standalone service that provides Content Addressable Storage (CAS) for build artifacts and a key-value store for cache metadata.

Overview

The service uses a two-tier storage architecture plus local SQLite metadata:

  • Local disk: Primary storage for low-latency cache hits
  • S3: Durable storage that persists artifacts and allows recovery after eviction
  • SQLite: Local metadata for artifact access tracking, orphan cleanup, background jobs, and key-value cache data
```mermaid
flowchart LR
    CLI[Tuist CLI] --> NGINX[Nginx]
    NGINX --> APP[Cache service]
    NGINX -->|X-Accel-Redirect| DISK[(Local Disk)]
    APP --> S3[(S3)]
    APP -->|auth| SERVER[Tuist Server]
```

Components

Nginx

Nginx serves as the entry point and handles efficient file delivery using X-Accel-Redirect:

  • Downloads: The cache service validates authentication, then returns an X-Accel-Redirect header. Nginx serves the file directly from disk or proxies from S3.
  • Uploads: Nginx proxies requests to the cache service, which streams data to disk.
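The download side of this contract can be sketched as follows. This is a hypothetical illustration, not the service's actual code: after authentication succeeds, the application returns only a header, and Nginx performs the file I/O against an internal-only location (the `/internal_cas/` and `/internal_s3/` location names are illustrative).

```python
def download_response(artifact_id: str, on_disk: bool) -> dict[str, str]:
    """Return the headers the app hands back to Nginx after auth passes."""
    if on_disk:
        # Internal-only Nginx location mapped to the local disk root
        return {"X-Accel-Redirect": f"/internal_cas/{artifact_id}"}
    # Internal-only Nginx location that proxies the request to S3
    return {"X-Accel-Redirect": f"/internal_s3/{artifact_id}"}

headers = download_response("ABCD1234", on_disk=True)
```

The point of the pattern is that the application process never streams file bytes on downloads; it only decides where Nginx should fetch them from.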

Content Addressable Storage

Artifacts are stored on local disk in a sharded directory structure:

  • Path: {account}/{project}/cas/{shard1}/{shard2}/{artifact_id}
  • Sharding: The first four characters of the artifact ID form a two-level shard, two characters per level (e.g., artifact ID ABCD1234 is stored under AB/CD/ABCD1234)
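The path scheme above can be sketched in a few lines of Python (a minimal illustration; the function name is ours, not the service's):

```python
def cas_path(account: str, project: str, artifact_id: str) -> str:
    """Build the sharded on-disk CAS path for an artifact.

    The two shard levels are the first and second character pairs of
    the artifact ID, which spreads files across directories so no
    single directory grows unboundedly large.
    """
    shard1, shard2 = artifact_id[:2], artifact_id[2:4]
    return f"{account}/{project}/cas/{shard1}/{shard2}/{artifact_id}"

path = cas_path("acme", "app", "ABCD1234")
# path is "acme/app/cas/AB/CD/ABCD1234"
```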

SQLite Metadata

The cache service uses two SQLite databases:

  • Primary metadata DB: Stores cache_artifacts, orphan scan cursors, Oban jobs, and other service metadata.
  • Key-value DB: Stores key_value_entries and key_value_entry_hashes in a dedicated SQLite file.

The key-value store is split into its own database so it can use SQLite incremental auto-vacuum without affecting artifact metadata and orphan cleanup state.
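The auto-vacuum arrangement can be sketched with Python's stdlib `sqlite3`. This is an illustration of the SQLite mechanism, not the service's code; the file name, table shape, and page count are assumptions:

```python
import os
import sqlite3
import tempfile

# Stand-in for the dedicated KV database file
path = os.path.join(tempfile.mkdtemp(), "kv.sqlite3")
conn = sqlite3.connect(path)

# auto_vacuum must be set before any tables exist (otherwise a full
# VACUUM is needed to change it)
conn.execute("PRAGMA auto_vacuum = INCREMENTAL")
conn.execute("CREATE TABLE key_value_entries (key TEXT PRIMARY KEY, value BLOB)")
conn.commit()

# Later, a maintenance pass can return up to N freed pages to the OS
# without rewriting the whole file:
conn.execute("PRAGMA incremental_vacuum(100)")

mode = conn.execute("PRAGMA auto_vacuum").fetchone()[0]  # 2 == INCREMENTAL
```

Keeping the KV store in its own file means these vacuum passes never contend with writes to `cache_artifacts` or the orphan-scan state.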

S3 Integration

S3 provides durable storage:

  • Background uploads: After writing to disk, artifacts are queued for upload to S3 via a background worker that runs every minute
  • On-demand hydration: When a local artifact is missing, the request is served immediately via a presigned S3 URL while the artifact is queued for background download to local disk
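The hydration decision can be sketched as below. This is a simplified stand-in, not the service's code: `presigned_url` stands in for a real S3 presigner (e.g. boto3's `generate_presigned_url`), and the bucket name and queue are illustrative.

```python
import os
import tempfile
from collections import deque

hydration_queue: deque = deque()  # stands in for the background download queue

def presigned_url(artifact_id: str) -> str:
    # Hypothetical presigner stand-in; a real one would sign the URL
    return f"https://cache-bucket.s3.amazonaws.com/{artifact_id}?X-Amz-Signature=..."

def resolve(artifact_id: str, disk_root: str) -> tuple[str, str]:
    """Return (source, location): a disk path on a local hit, otherwise
    a presigned S3 URL, queueing the artifact for background hydration
    so future requests are served from disk."""
    path = os.path.join(disk_root, artifact_id)
    if os.path.exists(path):
        return ("disk", path)
    hydration_queue.append(artifact_id)
    return ("s3", presigned_url(artifact_id))

# An empty temp dir guarantees a local miss, so the S3 path is taken:
source, location = resolve("ABCD1234", tempfile.mkdtemp())
```

The key property is that a local miss never blocks the request on a disk write: the caller is redirected to S3 immediately, and the disk copy is repopulated asynchronously.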

Disk Eviction

The service manages disk space using multiple background processes:

  • CAS disk eviction uses LRU semantics backed by cache_artifacts
  • When disk usage exceeds 85%, the oldest artifacts are deleted until usage drops to 70%
  • Artifacts remain in S3 after local eviction
  • KV eviction removes old key-value entries by retention and can also shrink the dedicated KV database when it grows past its configured size budget
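The CAS watermark logic can be sketched as follows, assuming rows shaped like `(artifact_id, size, last_access)` drawn from `cache_artifacts` (the function and row shape are illustrative):

```python
HIGH_WATER = 0.85  # eviction starts above this usage ratio
LOW_WATER = 0.70   # eviction stops once usage drops to this ratio

def evict(rows, used: int, capacity: int) -> list[str]:
    """Return the IDs of artifacts to delete locally, oldest first.

    Nothing happens below the high watermark; above it, LRU victims
    are selected until usage falls to the low watermark. Evicted
    artifacts remain available in S3.
    """
    if used / capacity <= HIGH_WATER:
        return []
    evicted = []
    for artifact_id, size, _last_access in sorted(rows, key=lambda r: r[2]):
        if used / capacity <= LOW_WATER:
            break
        used -= size
        evicted.append(artifact_id)
    return evicted

rows = [("a", 30, 1), ("b", 30, 2), ("c", 30, 3)]
victims = evict(rows, used=90, capacity=100)
# At 90% usage, evicting the oldest artifact ("a") brings usage to 60%,
# which is under the 70% low watermark, so eviction stops there.
```

The two-watermark design avoids thrashing: a single byte over 85% does not trigger a tiny eviction on every request, because each pass frees space down to 70% in one go.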

Orphan Cleanup

The service also runs an orphan cleanup worker for disk artifacts:

  • It scans the storage tree for files that exist on disk but have no corresponding cache_artifacts row.
  • This can happen if a file is written to disk but the metadata write is lost before the SQLite buffer flush completes.
  • Files newer than a safety window are ignored to avoid racing with in-flight uploads.
  • If an orphan is deleted and later requested again, the next cache miss causes it to be uploaded again, so the system self-heals.
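The scan's core predicate can be sketched like this (a simplified illustration; the real worker walks the storage tree incrementally with a cursor, and the one-hour window here is an assumption, not the service's configured value):

```python
import time

SAFETY_WINDOW = 3600  # seconds; files younger than this are skipped

def find_orphans(disk_files: dict[str, float], known_ids: set[str],
                 now: float) -> list[str]:
    """Return files that are on disk, absent from cache_artifacts, and
    older than the safety window (mtimes passed in for testability).

    The age check avoids racing with in-flight uploads whose metadata
    row has not been written yet.
    """
    return [f for f, mtime in disk_files.items()
            if f not in known_ids and now - mtime > SAFETY_WINDOW]

now = time.time()
files = {
    "old_orphan": now - 7200,    # untracked and old: an orphan
    "fresh_upload": now - 10,    # untracked but recent: skipped
    "tracked": now - 7200,       # has a metadata row: not an orphan
}
orphans = find_orphans(files, known_ids={"tracked"}, now=now)
```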

Authentication

The cache delegates authentication to the Tuist server by calling the /api/projects endpoint and caching results (10 minutes for success, 3 seconds for failure).
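The asymmetric TTLs can be sketched with a simple in-memory cache (an illustration only; the real service presumably keys on more than a bare token, and these helper names are ours):

```python
import time
from typing import Optional

SUCCESS_TTL = 600  # cache successful auth checks for 10 minutes
FAILURE_TTL = 3    # cache failures for only 3 seconds

_auth_cache: dict[str, tuple[bool, float]] = {}

def cache_auth(token: str, ok: bool, now: float) -> None:
    """Store an auth result with a TTL that depends on the outcome."""
    ttl = SUCCESS_TTL if ok else FAILURE_TTL
    _auth_cache[token] = (ok, now + ttl)

def cached_auth(token: str, now: float) -> Optional[bool]:
    """Return the cached result, or None if missing or expired.

    None means the service must re-check against the Tuist server's
    /api/projects endpoint.
    """
    entry = _auth_cache.get(token)
    if entry is None or now > entry[1]:
        return None
    return entry[0]

t0 = time.time()
cache_auth("bad-token", False, t0)
```

The short failure TTL matters for usability: a token that was briefly rejected (e.g. during a server hiccup or right after provisioning) is retried within seconds, while valid tokens avoid a server round-trip for ten minutes.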

Request Flows

Download

```mermaid
sequenceDiagram
    participant CLI as Tuist CLI
    participant N as Nginx
    participant A as Cache service
    participant D as Disk
    participant S as S3

    CLI->>N: GET /api/cache/cas/:id
    N->>A: Proxy for auth
    A-->>N: X-Accel-Redirect
    alt On disk
        N->>D: Serve file
    else Not on disk
        N->>S: Proxy from S3
    end
    N-->>CLI: File bytes
```

Upload

```mermaid
sequenceDiagram
    participant CLI as Tuist CLI
    participant N as Nginx
    participant A as Cache service
    participant D as Disk
    participant S as S3

    CLI->>N: POST /api/cache/cas/:id
    N->>A: Proxy upload
    A->>D: Stream to disk
    A-->>CLI: 201 Created
    A->>S: Background upload
```

API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /up | GET | Health check |
| /metrics | GET | Prometheus metrics |
| /api/cache/cas/:id | GET | Download CAS artifact |
| /api/cache/cas/:id | POST | Upload CAS artifact |
| /api/cache/keyvalue/:cas_id | GET | Get key-value entry |
| /api/cache/keyvalue | PUT | Store key-value entry |
| /api/cache/module/:id | HEAD | Check if module artifact exists |
| /api/cache/module/:id | GET | Download module artifact |
| /api/cache/module/start | POST | Start multipart upload |
| /api/cache/module/part | POST | Upload part |
| /api/cache/module/complete | POST | Complete multipart upload |