# Cache Architecture {#cache-architecture}

> [!NOTE]
> This page provides a technical overview of the Tuist cache service architecture. It is primarily intended for **self-hosting users** and **contributors** who need to understand the internal workings of the service. General users who only want to use the cache do not need to read this.

The Tuist cache service is a standalone service that provides Content Addressable Storage (CAS) for build artifacts and a key-value store for cache metadata.

## Overview {#overview}

The service uses a two-tier storage architecture plus local SQLite metadata:

- **Local disk**: Primary storage for low-latency cache hits
- **S3**: Durable storage that persists artifacts and allows recovery after eviction
- **SQLite**: Local metadata for artifact access tracking, orphan cleanup, background jobs, and key-value cache data

```mermaid
flowchart LR
    CLI[Tuist CLI] --> NGINX[Nginx]
    NGINX --> APP[Cache service]
    NGINX -->|X-Accel-Redirect| DISK[(Local Disk)]
    APP --> S3[(S3)]
    APP -->|auth| SERVER[Tuist Server]
```

## Components {#components}

### Nginx {#nginx}

Nginx serves as the entry point and handles efficient file delivery using `X-Accel-Redirect`:

- **Downloads**: The cache service validates authentication, then returns an `X-Accel-Redirect` header. Nginx serves the file directly from disk or proxies from S3.
- **Uploads**: Nginx proxies requests to the cache service, which streams data to disk.
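A minimal sketch of the `X-Accel-Redirect` pattern (the location name and on-disk paths here are assumptions for illustration, not the actual deployment config):

```nginx
# Hypothetical internal location for CAS file delivery.
# The cache service validates auth, then responds with e.g.
#   X-Accel-Redirect: /internal/cas/AB/CD/ABCD1234
# and nginx serves the file itself. The location cannot be
# requested directly by clients.
location /internal/cas/ {
    internal;                          # only reachable via X-Accel-Redirect
    alias /var/lib/tuist-cache/cas/;   # assumed on-disk CAS root
}
```

This keeps large file transfers out of the application process: the service only makes the auth decision, while nginx handles the byte shuffling.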

### Content Addressable Storage {#cas}

Artifacts are stored on local disk in a sharded directory structure:

- **Path**: `{account}/{project}/cas/{shard1}/{shard2}/{artifact_id}`
- **Sharding**: First four characters of the artifact ID create a two-level shard (e.g., `ABCD1234` → `AB/CD/ABCD1234`)
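The path construction can be sketched as follows (the function name and signature are illustrative; the sharding rule itself is as described above):

```python
def cas_path(account: str, project: str, artifact_id: str) -> str:
    """Build the sharded on-disk path for a CAS artifact.

    The first four characters of the artifact ID form a two-level
    shard, which keeps any single directory from accumulating an
    unbounded number of files.
    """
    shard1, shard2 = artifact_id[:2], artifact_id[2:4]
    return f"{account}/{project}/cas/{shard1}/{shard2}/{artifact_id}"
```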

### SQLite Metadata {#sqlite}

The cache service uses two SQLite databases:

- **Primary metadata DB**: Stores `cache_artifacts`, orphan scan cursors, Oban jobs, and other service metadata.
- **Key-value DB**: Stores `key_value_entries` and `key_value_entry_hashes` in a dedicated SQLite file.

The key-value store is split into its own database so it can use SQLite incremental auto-vacuum without affecting artifact metadata and orphan cleanup state.
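One reason the split matters is that `auto_vacuum` is a per-database-file setting in SQLite. A sketch of the idea (file name and schema are assumptions; only the use of incremental auto-vacuum is from the description above):

```python
import sqlite3

# Sketch: a dedicated SQLite file for the key-value store, so that
# incremental auto-vacuum can reclaim freed pages without touching
# the primary metadata database.
conn = sqlite3.connect("key_value.sqlite3")
conn.execute("PRAGMA auto_vacuum = INCREMENTAL")
conn.execute("VACUUM")  # rebuilds the file so the setting takes effect
conn.execute("""
    CREATE TABLE IF NOT EXISTS key_value_entries (
        key TEXT PRIMARY KEY,
        value BLOB,
        inserted_at INTEGER
    )
""")
# Later, e.g. after an eviction pass, reclaim up to 1000 free pages:
conn.execute("PRAGMA incremental_vacuum(1000)")
conn.commit()
```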

### S3 Integration {#s3}

S3 provides durable storage:

- **Background uploads**: After writing to disk, artifacts are queued for upload to S3 via a background worker that runs every minute
- **On-demand hydration**: When a local artifact is missing, the request is served immediately via a presigned S3 URL while the artifact is queued for background download to local disk
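The miss-handling logic can be sketched like this. All names here (`serve_artifact`, `presign_s3_url`, `enqueue_hydration`) are hypothetical placeholders, and the sketch returns a plain redirect for simplicity, whereas the real service hands delivery off to nginx as described above:

```python
import os
from dataclasses import dataclass

@dataclass
class Response:
    status: int
    headers: dict

def serve_artifact(disk_path: str, s3_key: str,
                   presign_s3_url, enqueue_hydration) -> Response:
    """Serve from local disk when present; otherwise answer immediately
    with a presigned S3 URL and queue a background download so the
    artifact lands on local disk for future requests."""
    if os.path.exists(disk_path):
        # Fast path: nginx serves the file from the local CAS tree.
        return Response(200, {"X-Accel-Redirect": f"/internal/cas/{s3_key}"})
    enqueue_hydration(s3_key)        # background: S3 -> local disk
    url = presign_s3_url(s3_key)     # short-lived GET URL
    return Response(302, {"Location": url})
```

The key property is that a missing local copy never blocks the request: the client is served from S3 right away while hydration happens asynchronously.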

### Disk Eviction {#eviction}

The service manages disk space using multiple background processes:

- **CAS disk eviction** uses LRU semantics backed by the `cache_artifacts` table
- When disk usage exceeds 85%, the least-recently-accessed artifacts are deleted until usage drops to 70%
- Artifacts remain in S3 after local eviction
- **KV eviction** removes key-value entries that have outlived their retention period and can also shrink the dedicated KV database when it grows past its configured size budget
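The CAS eviction pass amounts to a watermark loop. The 85%/70% thresholds are from the description above; the data shapes and `delete_from_disk` callback are assumptions for the sketch:

```python
HIGH_WATERMARK = 0.85  # start evicting above this disk usage
LOW_WATERMARK = 0.70   # stop once usage drops to this

def evict(artifacts, disk_used, disk_total, delete_from_disk):
    """Evict least-recently-accessed artifacts until the low watermark
    is reached. `artifacts` is a list of (artifact_id, size_bytes,
    last_accessed_at) tuples, standing in for the cache_artifacts table.
    Returns the resulting disk usage in bytes."""
    if disk_used / disk_total <= HIGH_WATERMARK:
        return disk_used  # below the high watermark; nothing to do
    # Oldest access time first (LRU order).
    for artifact_id, size, _last_accessed in sorted(artifacts, key=lambda a: a[2]):
        delete_from_disk(artifact_id)  # the S3 copy is untouched
        disk_used -= size
        if disk_used / disk_total <= LOW_WATERMARK:
            break
    return disk_used
```

The gap between the two watermarks avoids thrashing: each pass frees a meaningful amount of space instead of evicting one artifact at a time right at the threshold.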

### Orphan Cleanup {#orphan-cleanup}

The service also runs an orphan cleanup worker for disk artifacts:

- It scans the storage tree for files that exist on disk but have no corresponding `cache_artifacts` row.
- This can happen if a file is written to disk but the metadata write is lost before the SQLite buffer flush completes.
- Files newer than a safety window are ignored to avoid racing with in-flight uploads.
- If an orphan is deleted and later requested again, the next cache miss causes it to be uploaded again, so the system self-heals.
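The scan described above can be sketched as a filesystem walk against the set of known artifact IDs. The safety-window length, function name, and full-tree walk are assumptions (the real worker scans incrementally with a cursor):

```python
import os
import time

SAFETY_WINDOW_SECONDS = 15 * 60  # assumed; skip files newer than this

def find_orphans(cas_root: str, known_ids: set) -> list:
    """Walk the CAS tree and return paths of files that have no
    corresponding cache_artifacts row and are older than the
    safety window."""
    now = time.time()
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(cas_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name in known_ids:
                continue  # metadata row exists; not an orphan
            if now - os.path.getmtime(path) < SAFETY_WINDOW_SECONDS:
                continue  # may be an in-flight upload; revisit later
            orphans.append(path)
    return orphans
```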

### Authentication {#authentication}

The cache delegates authentication to the Tuist server by calling the `/api/projects` endpoint and caching results (10 minutes for success, 3 seconds for failure).
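The asymmetric TTLs can be sketched with a small in-process cache. The 10-minute/3-second values are from the description above; `check_with_server` stands in for the actual call to the Tuist server and the cache shape is an assumption:

```python
import time

SUCCESS_TTL = 600  # 10 minutes: successful auth results are stable
FAILURE_TTL = 3    # 3 seconds: retry failures quickly (e.g. new tokens)

_auth_cache = {}  # token -> (ok, expires_at)

def authenticate(token: str, check_with_server, now=time.time) -> bool:
    """Return a cached auth decision when fresh; otherwise ask the
    Tuist server and cache the result with a TTL that depends on
    the outcome."""
    cached = _auth_cache.get(token)
    if cached is not None:
        ok, expires_at = cached
        if now() < expires_at:
            return ok
    ok = check_with_server(token)
    ttl = SUCCESS_TTL if ok else FAILURE_TTL
    _auth_cache[token] = (ok, now() + ttl)
    return ok
```

Caching failures only briefly means a token that was just provisioned becomes usable within seconds, while successful results avoid hammering the server on every artifact request.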

## Request Flows {#request-flows}

### Download {#download-flow}

```mermaid
sequenceDiagram
    participant CLI as Tuist CLI
    participant N as Nginx
    participant A as Cache service
    participant D as Disk
    participant S as S3

    CLI->>N: GET /api/cache/cas/:id
    N->>A: Proxy for auth
    A-->>N: X-Accel-Redirect
    alt On disk
        N->>D: Serve file
    else Not on disk
        N->>S: Proxy from S3
    end
    N-->>CLI: File bytes
```

### Upload {#upload-flow}

```mermaid
sequenceDiagram
    participant CLI as Tuist CLI
    participant N as Nginx
    participant A as Cache service
    participant D as Disk
    participant S as S3

    CLI->>N: POST /api/cache/cas/:id
    N->>A: Proxy upload
    A->>D: Stream to disk
    A-->>CLI: 201 Created
    A->>S: Background upload
```

## API Endpoints {#api-endpoints}

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/up` | GET | Health check |
| `/metrics` | GET | Prometheus metrics |
| `/api/cache/cas/:id` | GET | Download CAS artifact |
| `/api/cache/cas/:id` | POST | Upload CAS artifact |
| `/api/cache/keyvalue/:cas_id` | GET | Get key-value entry |
| `/api/cache/keyvalue` | PUT | Store key-value entry |
| `/api/cache/module/:id` | HEAD | Check if module artifact exists |
| `/api/cache/module/:id` | GET | Download module artifact |
| `/api/cache/module/start` | POST | Start multipart upload |
| `/api/cache/module/part` | POST | Upload part |
| `/api/cache/module/complete` | POST | Complete multipart upload |
