chunks - the content-addressed data plane

chunks are the atomic unit of storage in roost. every file in a version is a list of chunk digests; every chunk is an immutable blob of bytes keyed by its sha-256 hash. this doc is the end-to-end contract for chunking, hash format, storage layout, upload flow, download flow, cross-roost mount, referrer lookup, retry behavior, and error taxonomy.

Last updated: 2026-05-01
Status: normative for all roost v2 uploads, downloads, gc, and rollback paths.

related:

  • versions.md - version publish schema; chunks are embedded in version.files[].chunks[] as { hash, size }.
  • web/openapi.yaml - rendered route reference.
  • quickstart.md - a first smoke test of the public API, to run before you move on to roost publishing.

1. chunk shape

  • fixed size: every chunk is exactly 4 MiB (4,194,304 bytes) except the last chunk of each file, which may be 1..4,194,304 bytes.
  • algorithm: sha-256 only in v1. content-defined chunking (fastcdc / rabin) is explicitly deferred to v3.
  • hash encoding: lowercase hex, exactly 64 chars, matching ^[0-9a-f]{64}$.
  • wire format: every public chunk endpoint accepts and returns the bare 64-hex digest. do not add an algorithm prefix in request bodies, query strings, or path parameters.
  • version format: the same bare 64-hex digest is stored in version.files[].chunks[].hash.
  • zero-byte files have chunks: [] and size: 0. the empty chunk is never stored in r2.
  • ordering: chunks within a file are ordered by file offset. concatenation in order reproduces the file exactly; no gaps, no overlaps.
file bytes  --------------------------------------------------------------->
            |-- chunk[0] --|-- chunk[1] --|-- chunk[2] --|-- chunk[3] --|
            |  4,194,304   |  4,194,304   |  4,194,304   |   812,345    |
            |  64-hex hash |  64-hex hash |  64-hex hash |  64-hex hash |
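
the shape rules above can be sketched as two small checks. this is a minimal illustration, not part of the roost sdk; the names is_valid_digest and expected_chunk_sizes are hypothetical.

```python
import re

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB (4,194,304 bytes), per the fixed-size rule
DIGEST_RE = re.compile(r"[0-9a-f]{64}")  # bare lowercase hex, no algorithm prefix


def is_valid_digest(value: str) -> bool:
    """True only for a bare, lowercase, exactly-64-char hex digest."""
    return isinstance(value, str) and DIGEST_RE.fullmatch(value) is not None


def expected_chunk_sizes(file_size: int) -> list[int]:
    """Sizes of the chunks a file of `file_size` bytes splits into, in offset order."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full, rest = divmod(file_size, CHUNK_SIZE)
    # every chunk is full-size except possibly the last; a zero-byte file has none
    return [CHUNK_SIZE] * full + ([rest] if rest else [])
```

for the file in the diagram, expected_chunk_sizes(3 * CHUNK_SIZE + 812345) gives three full chunks followed by one 812,345-byte chunk.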

2. storage layout

chunks are stored in cloudflare r2 at a per-site, sharded, content-addressed key:

project-content/{siteId}/{hash[0:2]}/{hash}

segment         | source                                | purpose
project-content | fixed bucket prefix                   | separates chunks from version bodies
{siteId}        | owlette site that owns the chunk      | tenant isolation
{hash[0:2]}     | first two hex chars of the chunk hash | avoids hot prefixes in r2's keyspace
{hash}          | full 64-hex sha-256                   | the content address

example:

project-content/kiosk-fleet-01/4e/4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce

chunk paths never appear inside the version. the agent reconstructs them from siteId and hash. third-party clients normally do not construct them at all; they receive signed urls that embed the full path.
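
for illustration only, the key layout can be reconstructed from siteId and hash as below. chunk_key is a hypothetical helper name; most clients never need it because signed urls already embed the path.

```python
import re

DIGEST_RE = re.compile(r"[0-9a-f]{64}")


def chunk_key(site_id: str, digest: str) -> str:
    """Build the r2 object key for a chunk from its site and bare 64-hex digest."""
    if not site_id:
        raise ValueError("siteId is required")
    if DIGEST_RE.fullmatch(digest) is None:
        raise ValueError("digest must be lowercase 64-char hex")
    # shard on the first two hex characters of the digest
    return f"project-content/{site_id}/{digest[:2]}/{digest}"
```

calling chunk_key("kiosk-fleet-01", "4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce") reproduces the example key shown above.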

3. per-site isolation

chunks are scoped to the siteId they were uploaded under. two different sites cannot dedup against each other's content, even if they independently upload byte-identical files. this is enforced at the storage-key level ({siteId} is part of the r2 path) and at the api level.

the client never asks "does roost have this chunk?". it asks "does site X have this chunk?". every chunk-plane endpoint takes siteId in the body or query string. a missing or malformed siteId is validation_failed.

4. dedup flow (upload)

the chunker never ships bytes the server already has. the upload dance is a three-step round trip per batch:

  1. POST /api/chunks/check finds missing hashes for a site.
  2. POST /api/chunks/upload-urls mints signed r2 PUT urls for those missing hashes.
  3. the client uploads each chunk directly to r2.

step 1 - batch existence check

POST /api/chunks/check
content-type: application/json

{
  "siteId": "kiosk-fleet-01",
  "hashes": [
    "2e7d2c03a9507ae265ecf5b5356885a53393a2029d241394997265a1a25aefc6",
    "18ac3e7343f016890c510e93f935261169d9e3f565436429830faf0934f4f8e4",
    "4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce"
  ]
}
  • scope: site:<siteId>:write.
  • up to 1000 hashes per call; larger batches are rejected with validation_failed.
  • response lists only the digests missing from the site's cas namespace. everything else is already reusable.
{
  "missing": [
    "4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce"
  ]
}
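
a client with more than 1000 hashes has to split them across several check calls. the sketch below builds the request bodies without sending them; the helper names are hypothetical, and de-duplicating hashes before batching is a client-side assumption rather than something the api requires.

```python
from typing import Iterable, Iterator, Optional

MAX_HASHES_PER_CALL = 1000  # larger batches are rejected with validation_failed


def check_request_bodies(site_id: str, hashes: Iterable[str]) -> Iterator[dict]:
    """Yield POST /api/chunks/check bodies, each carrying at most 1000 hashes."""
    unique = list(dict.fromkeys(hashes))  # assumption: drop repeats, keep order
    for start in range(0, len(unique), MAX_HASHES_PER_CALL):
        yield {"siteId": site_id, "hashes": unique[start:start + MAX_HASHES_PER_CALL]}


def upload_urls_body(site_id: str, check_response: dict) -> Optional[dict]:
    """Turn a check response into the step 2 body, or None when nothing is missing."""
    missing = check_response.get("missing", [])
    if not missing:
        return None  # an empty hashes[] would be validation_failed, so skip the call
    return {"siteId": site_id, "hashes": list(missing)}
```

an empty input yields no bodies at all, which avoids sending an empty hashes[] array.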

step 2 - mint signed put urls for the missing set

POST /api/chunks/upload-urls
content-type: application/json

{
  "siteId": "kiosk-fleet-01",
  "hashes": [
    "4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce"
  ]
}
  • scope: site:<siteId>:write.
  • signed urls carry a 60-minute ttl. the expiry is returned as expiresAt in rfc 3339 utc form.
  • urls are per-hash. one request can mint many urls at once, capped at 1000 hashes.
  • this route does not implement a response cache. retry by requesting fresh signed urls for the hashes that still need uploading.
{
  "urls": {
    "4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce": "https://owlette-prod.r2.cloudflarestorage.com/project-content/kiosk-fleet-01/4e/4e0740856...?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=..."
  },
  "expiresAt": "2026-04-22T16:30:00.000Z"
}

step 3 - put the bytes directly to r2

PUT <signed url>
content-type: application/octet-stream

<exactly size bytes of chunk content>
  • bytes travel client to r2 with no roost intermediary.
  • retry policy is per-chunk. if the PUT fails, retry that single chunk. if the signed url has expired, re-mint only that hash via POST /api/chunks/upload-urls.
  • clients should upload chunks concurrently. 8-16 parallel PUTs is typical.
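
one way to combine the per-chunk retry rule with concurrent uploads is sketched below. the transport is injected so the sketch stays independent of any http library: put, remint, read_chunk, and the UrlExpired exception are all hypothetical names, and how a client detects an expired url is an assumption left to the put callable. backoff between attempts is omitted for brevity.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


class UrlExpired(Exception):
    """Raised by the injected `put` when the signed url is no longer valid."""


def upload_chunks(
    urls: Dict[str, str],                 # digest -> signed PUT url from step 2
    read_chunk: Callable[[str], bytes],   # digest -> chunk bytes
    put: Callable[[str, bytes], None],    # performs the PUT; raises on failure
    remint: Callable[[str], str],         # digest -> fresh signed url
    max_workers: int = 8,
    max_attempts: int = 3,
) -> Dict[str, bool]:
    """Upload every chunk concurrently; retry each chunk on its own."""

    def upload_one(digest: str) -> bool:
        url = urls[digest]
        data = read_chunk(digest)
        for _ in range(max_attempts):
            try:
                put(url, data)
                return True
            except UrlExpired:
                url = remint(digest)  # re-mint only this hash
            except OSError:
                continue              # transient failure: retry the whole chunk
        return False

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(upload_one, urls)
        return dict(zip(urls, results))
```

the returned mapping tells the caller which digests still need another round of check, upload-urls, and PUT.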

5. download flow

POST /api/chunks/download-urls
content-type: application/json

{
  "siteId": "kiosk-fleet-01",
  "hashes": [
    "2e7d2c03a9507ae265ecf5b5356885a53393a2029d241394997265a1a25aefc6",
    "18ac3e7343f016890c510e93f935261169d9e3f565436429830faf0934f4f8e4"
  ]
}

for small batches, use the GET form with repeated hash query parameters:

GET /api/chunks/download-urls?siteId=kiosk-fleet-01&hash=2e7d2c03a9507ae265ecf5b5356885a53393a2029d241394997265a1a25aefc6&hash=18ac3e7343f016890c510e93f935261169d9e3f565436429830faf0934f4f8e4
  • scope: agent token for the same site, or site:<siteId>:read.
  • signed urls carry a 15-minute ttl.
  • cross-site requests are rejected by the site auth and scope checks. there is no out-of-band way to resolve a digest to another tenant's storage key.
{
  "urls": {
    "2e7d2c03a9507ae265ecf5b5356885a53393a2029d241394997265a1a25aefc6": "https://owlette-prod.r2.cloudflarestorage.com/project-content/kiosk-fleet-01/2e/2e7d2c03...?X-Amz-Algorithm=AWS4-HMAC-SHA256&...",
    "18ac3e7343f016890c510e93f935261169d9e3f565436429830faf0934f4f8e4": "https://owlette-prod.r2.cloudflarestorage.com/project-content/kiosk-fleet-01/18/18ac3e73...?X-Amz-Algorithm=AWS4-HMAC-SHA256&..."
  },
  "expiresAt": "2026-04-22T15:45:00.000Z"
}

clients then issue GET against each signed url. the agent re-verifies each chunk's sha-256 as it writes to the local content store; a mismatch aborts the sync and the chunk is fetched again.
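
a third-party client can apply the same integrity check the agent does. the sketch below builds the GET form of the request path and verifies downloaded bytes against the digest they were requested by; both function names are hypothetical.

```python
import hashlib
from typing import Sequence
from urllib.parse import urlencode


def download_urls_query(site_id: str, hashes: Sequence[str]) -> str:
    """Build the GET form: one siteId plus a repeated `hash` query parameter."""
    params = [("siteId", site_id)] + [("hash", h) for h in hashes]
    return "/api/chunks/download-urls?" + urlencode(params)


def verify_chunk(expected_digest: str, data: bytes) -> bool:
    """True when the sha-256 of `data` matches the bare hex digest requested."""
    return hashlib.sha256(data).hexdigest() == expected_digest
```

a client would call verify_chunk before writing a chunk to its local store and fetch the chunk again when it returns False.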

6. cross-roost mount

POST /api/chunks/{digest}/mount records a cross-roost chunk reference without moving bytes. {digest} is the bare 64-hex chunk hash.

POST /api/chunks/2e7d2c03a9507ae265ecf5b5356885a53393a2029d241394997265a1a25aefc6/mount
content-type: application/json

{
  "siteId": "kiosk-fleet-01",
  "from": "roost_lobby_td",
  "to": "roost_lobby_td_v2"
}
  • scope: site:<siteId>:write.
  • siteId, from, and to can be supplied in the JSON body. query parameters with the same names are also accepted.
  • from and to must be different roost ids in the same site.
  • the chunk must already exist under project-content/{siteId}/... and both roost documents must exist.
  • bytes moved: zero. firestore records or updates sites/{siteId}/chunk_referrers/{digest}/entries/mount_{from}_{to}.
  • retry behavior: the same (digest, siteId, from, to) upserts the same referrer entry; it is retry-safe but not backed by a request replay cache.

successful mounts return 201:

{
  "digest": "2e7d2c03a9507ae265ecf5b5356885a53393a2029d241394997265a1a25aefc6",
  "siteId": "kiosk-fleet-01",
  "from": "roost_lobby_td",
  "to": "roost_lobby_td_v2",
  "mounted": true,
  "zeroByte": true
}
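
the retry safety of mount follows from the deterministic entry id. the sketch below derives the referrer document path from the pattern given above; it only illustrates the naming scheme, the function names are hypothetical, and nothing here talks to firestore.

```python
def mount_entry_id(from_roost: str, to_roost: str) -> str:
    """Deterministic referrer entry id for a mount: mount_{from}_{to}."""
    if from_roost == to_roost:
        raise ValueError("from and to must be different roost ids")
    return f"mount_{from_roost}_{to_roost}"


def mount_referrer_path(site_id: str, digest: str, from_roost: str, to_roost: str) -> str:
    """Document path that a mount records or updates for this chunk."""
    entry = mount_entry_id(from_roost, to_roost)
    return f"sites/{site_id}/chunk_referrers/{digest}/entries/{entry}"
```

because the same (digest, siteId, from, to) always maps to the same path, a repeated mount request updates one entry instead of creating a second one.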

7. referrer query

GET /api/chunks/{digest}/referrers returns recorded mount and version-publish referrers for a chunk.

GET /api/chunks/2e7d2c03a9507ae265ecf5b5356885a53393a2029d241394997265a1a25aefc6/referrers?siteId=kiosk-fleet-01&page_size=25
{
  "digest": "2e7d2c03a9507ae265ecf5b5356885a53393a2029d241394997265a1a25aefc6",
  "siteId": "kiosk-fleet-01",
  "referrers": [
    {
      "entryId": "mount_roost_lobby_td_roost_lobby_td_v2",
      "source": "mount",
      "roostId": null,
      "fromRoostId": "roost_lobby_td",
      "toRoostId": "roost_lobby_td_v2",
      "versionId": null,
      "versionNumber": null,
      "fileCount": null,
      "pathCount": null,
      "totalBytes": null,
      "referencedAt": "2026-04-22T15:30:00.000Z",
      "createdAt": null,
      "createdBy": null,
      "mountedAt": "2026-04-22T15:30:00.000Z",
      "mountedBy": "user_123"
    }
  ],
  "items": [
    {
      "entryId": "mount_roost_lobby_td_roost_lobby_td_v2",
      "source": "mount",
      "roostId": null,
      "fromRoostId": "roost_lobby_td",
      "toRoostId": "roost_lobby_td_v2",
      "versionId": null,
      "versionNumber": null,
      "fileCount": null,
      "pathCount": null,
      "totalBytes": null,
      "referencedAt": "2026-04-22T15:30:00.000Z",
      "createdAt": null,
      "createdBy": null,
      "mountedAt": "2026-04-22T15:30:00.000Z",
      "mountedBy": "user_123"
    }
  ],
  "next_page_token": "",
  "nextPageToken": ""
}
  • canonical pagination parameters are page_size and page_token; legacy aliases limit and cursor are also accepted.
  • default page size is 50 and max page size is 200.
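
paging through referrers can be sketched as a generator. the fetch_page callable is hypothetical and stands in for a GET with the given page_token; treating an empty next_page_token as the last page is an assumption based on the example response above.

```python
from typing import Callable, Iterator, Optional


def iter_referrers(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every referrer entry, following next_page_token until it is empty."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)  # first call passes None, i.e. no page_token
        yield from page.get("referrers", [])
        token = page.get("next_page_token") or None
        if token is None:
            return
```

the generator requests pages lazily, so a caller that stops early does not fetch the remaining pages.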

8. retry behavior

operation | retry behavior
POST /api/chunks/check | safe to retry; it recomputes missing hashes from r2 metadata.
POST /api/chunks/upload-urls | safe to retry by minting fresh signed urls. the route does not cache request replays.
PUT <signed r2 url> | safe to retry with the same bytes while the signed url is valid; otherwise mint a new url.
POST or GET /api/chunks/download-urls | safe to retry; it mints fresh signed download urls.
POST /api/chunks/{digest}/mount | retry-safe for the same (digest, siteId, from, to) because the referrer entry id is deterministic.

9. reference chunker - pseudocode

both snippets below stream-read a file in fixed 4-MiB blocks, compute sha-256 per block, and yield (hash, size, offset). neither loads the file into memory.

python (cpython 3.9+)

import hashlib
from pathlib import Path
from typing import Iterator

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB

def chunk_file(path: Path) -> Iterator[dict]:
    """
    stream-read `path` in 4 MiB blocks; yield {"hash": hex, "size": int, "offset": int}
    for each chunk. the last chunk may be 1..CHUNK_SIZE bytes. a zero-byte file yields
    no chunks.
    """
    offset = 0
    with path.open("rb") as f:
        while True:
            block = f.read(CHUNK_SIZE)
            if not block:
                return
            digest = hashlib.sha256(block).hexdigest()
            yield {"hash": digest, "size": len(block), "offset": offset}
            offset += len(block)


def file_to_version_entry(path: Path, rel_path: str) -> dict:
    chunks = list(chunk_file(path))
    return {
        "path": rel_path,
        "size": sum(c["size"] for c in chunks),
        "chunks": [{"hash": c["hash"], "size": c["size"]} for c in chunks],
    }

node (20+)

import { createReadStream } from 'node:fs';
import { createHash } from 'node:crypto';

const CHUNK_SIZE = 4 * 1024 * 1024; // 4 MiB

/**
 * stream-read `path` in 4 MiB blocks; yield { hash, size, offset } per chunk.
 * last chunk may be 1..CHUNK_SIZE bytes. zero-byte files yield nothing.
 */
export async function* chunkFile(path) {
  let offset = 0;
  let pending = Buffer.alloc(0);

  for await (const buf of createReadStream(path, { highWaterMark: CHUNK_SIZE })) {
    pending = pending.length === 0 ? buf : Buffer.concat([pending, buf]);

    while (pending.length >= CHUNK_SIZE) {
      const block = pending.subarray(0, CHUNK_SIZE);
      pending = pending.subarray(CHUNK_SIZE);
      const hash = createHash('sha256').update(block).digest('hex');
      yield { hash, size: block.length, offset };
      offset += block.length;
    }
  }

  if (pending.length > 0) {
    const hash = createHash('sha256').update(pending).digest('hex');
    yield { hash, size: pending.length, offset };
  }
}

export async function fileToVersionEntry(path, relPath) {
  const chunks = [];
  for await (const c of chunkFile(path)) {
    chunks.push({ hash: c.hash, size: c.size });
  }
  return {
    path: relPath,
    size: chunks.reduce((n, c) => n + c.size, 0),
    chunks,
  };
}

notes applicable to both implementations:

  • read buffers must be at least 4 MiB, or the reader must accumulate smaller reads until it has a full chunk.
  • the final chunk is whatever remains at eof.
  • the file must not change while it is being chunked; take a read lock or copy the file first, so that every hash and size describes the same snapshot.
  • the version's files[].size should equal sum(chunks[].size). current agents reject mismatches when parsing versions, but the publish API does not currently enforce this invariant.
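
since the publish API does not currently enforce the size invariant, a client may want to check its own entries before publishing. a minimal sketch, assuming entries shaped like the output of file_to_version_entry; entry_problems is a hypothetical name.

```python
CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB


def entry_problems(entry: dict) -> list[str]:
    """Return human-readable invariant violations for one files[] entry."""
    problems = []
    chunks = entry.get("chunks", [])
    total = sum(c["size"] for c in chunks)
    if entry.get("size") != total:
        problems.append(f"size {entry.get('size')} != sum of chunk sizes {total}")
    last = len(chunks) - 1
    for i, c in enumerate(chunks):
        if i < last and c["size"] != CHUNK_SIZE:
            # only the final chunk may be shorter than 4 MiB
            problems.append(f"chunk {i} is not a full {CHUNK_SIZE}-byte chunk")
        elif i == last and not 1 <= c["size"] <= CHUNK_SIZE:
            problems.append(f"final chunk size {c['size']} is outside 1..{CHUNK_SIZE}")
    return problems
```

an empty list means the entry is consistent; a zero-byte file with chunks: [] and size: 0 passes.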

10. error taxonomy

every error response follows rfc 7807 application/problem+json with standard extensions (code, docsUrl, requestId, and optional field-level errors). the code field is the stable contract; match on that, not on detail prose.

code | status | endpoints | meaning
validation_failed | 400 | all chunk routes | malformed bare digest, missing or invalid siteId, empty hashes[], more than 1000 hashes, invalid from / to, from equal to to, or invalid pagination.
not_found | 404 | auth/site guard, POST /api/chunks/{digest}/mount | site not found or not accessible; mount digest is not stored for the site; source or target roost does not exist.
precondition_failed | 412 | chunk-dependent publish/finalize paths | a version references chunk hashes that are not present in the site's r2 namespace. upload the missing chunks and retry the publish.

universal errors (unauthorized 401, token_expired 401, scope_insufficient 403, rate_limited 429, internal_error 500) are documented in the top-level api conventions and are not repeated here.

example validation_failed for a malformed digest:

{
  "type": "https://owlette.app/problems/validation-failed",
  "title": "validation failed",
  "status": 400,
  "detail": "field hashes contains malformed hash entries (must be lowercase 64-char hex sha-256)",
  "code": "validation_failed",
  "errors": {
    "hashes": [
      "malformed entries: bad-digest..."
    ]
  },
  "docsUrl": "https://owlette.app/docs/api/errors#validation_failed",
  "requestId": "req_01HW..."
}
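
matching on code rather than detail can be sketched as below. the action labels on the right-hand side are hypothetical client-side names, not values defined by the api.

```python
import json
from typing import Optional

# hypothetical client-side actions keyed by the stable `code` field
ACTIONS = {
    "validation_failed": "fix_request",
    "not_found": "check_site_or_roost",
    "precondition_failed": "upload_missing_chunks_then_retry",
}


def problem_code(body: str) -> Optional[str]:
    """Extract `code` from an application/problem+json body, or None if absent."""
    try:
        doc = json.loads(body)
    except ValueError:
        return None
    return doc.get("code") if isinstance(doc, dict) else None


def next_action(body: str) -> str:
    """Map a problem document to a client action, ignoring the detail prose."""
    return ACTIONS.get(problem_code(body), "surface_to_caller")
```

codes that are not in the table, including the universal errors, fall through to surface_to_caller in this sketch.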

11. operational notes

  • signed-url expiry is enforced by r2, not roost. clients should treat the expiresAt returned by the api as the authoritative expiry instead of deriving one from a locally recorded request time.
  • retries during the upload session should respect Retry-After on 429 and use exponential backoff on 5xx from r2.
  • partial uploads are not representable. r2 PUT is atomic per object; retry the whole chunk.
  • a chunk with zero referrers across all roosts in a site becomes eligible for deletion after the gc grace period. during that window, a mount can still resurrect it.
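
the retry guidance above can be sketched as a single delay function. the base delay and cap are illustrative values, not documented limits; only the delta-seconds form of Retry-After is handled, and jitter is left out to keep the sketch deterministic.

```python
from typing import Optional


def retry_delay(
    status: int,
    attempt: int,                      # 0 for the first retry
    retry_after: Optional[str] = None, # raw Retry-After header value, if any
    base: float = 0.5,                 # assumed base delay in seconds
    cap: float = 30.0,                 # assumed upper bound in seconds
) -> Optional[float]:
    """Seconds to wait before retrying, or None when the status is not retryable."""
    backoff = min(cap, base * (2 ** attempt))
    if status == 429:
        if retry_after is not None and retry_after.strip().isdigit():
            return float(retry_after.strip())  # respect the server's hint
        return backoff
    if 500 <= status <= 599:
        return backoff  # exponential backoff on 5xx
    return None
```

a production client would normally add random jitter to the backoff so that parallel uploaders do not retry in lockstep.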
