changelog
All notable changes to owlette are documented here. The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
[2.12.14] - 2026-07-03
fixed
- Agent pairing no longer opens a browser without operator action. Interactive installer pairing now prints the phrase and URL, starts polling immediately, and offers "press Enter to open the pairing page in your browser" as the opt-in path. The desktop app join-site dialog also waits for the user to click its existing open button instead of launching a browser automatically.
added
- --open-browser pairing option for operators who explicitly want the old immediate-open behavior during manual configure_site.py runs. --no-browser and OWLETTE_NO_BROWSER=1 still suppress any local browser affordance.
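The flag handling described in this release can be sketched roughly as follows. This is an illustration only, not the installer's code: the function name resolve_browser_mode and the three mode strings are hypothetical, and the rule that suppression wins when both --no-browser and --open-browser are given is an assumption drawn from the wording "still suppress any local browser affordance".

```python
import os
from typing import Mapping, Optional, Sequence


def resolve_browser_mode(argv: Sequence[str],
                         env: Optional[Mapping[str, str]] = None) -> str:
    """Return 'suppress', 'open-now', or 'offer' (hypothetical mode names)."""
    env = os.environ if env is None else env
    # --no-browser and OWLETTE_NO_BROWSER=1 suppress any local browser affordance.
    # Assumption: suppression wins even if --open-browser is also present.
    if "--no-browser" in argv or env.get("OWLETTE_NO_BROWSER") == "1":
        return "suppress"
    # --open-browser restores the old immediate-open behavior.
    if "--open-browser" in argv:
        return "open-now"
    # Default: print the phrase and URL, poll, and offer "press Enter".
    return "offer"
```

In every mode, polling is expected to start immediately; only the browser affordance differs.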
[2.12.13] - 2026-07-03
fixed
- The desktop app's "join site" button now actually pairs the machine. Previously it opened no browser and showed no pairing phrase - it just polled silently, so there was no way to know what to do next. It now shows the 3-word pairing phrase in a dialog (copied to your clipboard automatically, click it to copy again), provides an open button for owlette.app/add, and reports progress; the service restarts on its own once you authorize from any device. You can cancel at any time without waiting for the code to expire.
- Revoking a machine's token no longer risks disconnecting other machines that share its hostname. The revoke dialog now offers "revoke current token" (only the most-recently-used token for that hostname) alongside "revoke all for hostname" (the previous behavior, now clearly labelled). Machines cloned to the same hostname are no longer knocked offline when you revoke just one.
added
- The pairing phrase is copied to your clipboard during install. When the installer displays the 3-word phrase, it's already on your clipboard to paste straight into owlette.app/add - no retyping. Best-effort, so a busy clipboard never blocks pairing.
- The agent-tokens admin can be searched, filtered, and pruned. Search by machine id or agent, filter by version, flag hostname duplicates, and prune tokens that are provably expired. The collection previously grew without bound as tokens rotated hourly, which made it easy to revoke a token a live agent was actually using; rotation now cleans up its own dead predecessors at the source.
[2.12.12] - 2026-06-15
fixed
- Windows Defender no longer disables temperature monitoring on some machines. The driver Owlette uses to read CPU/GPU temperatures is written to disk at runtime, and on some Windows 11 machines Microsoft Defender quarantined that driver file (flagged as WinRing0) because the installer's antivirus exclusion didn't cover its actual location. The installer now excludes the driver file itself, so temperatures keep working; existing machines pick up the fix on their next update. Temperature reads are also now time-bounded, so a driver that's blocked at the OS level can never stall the agent's startup or heartbeat.
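One common way to time-bound a read that may block is to run it on a daemon thread and give up after a deadline. The sketch below shows that pattern; it is not the agent's implementation, and read_with_timeout, its parameters, and the 2-second default are hypothetical.

```python
import threading
from typing import Callable, List, Optional


def read_with_timeout(read_fn: Callable[[], float],
                      timeout_s: float = 2.0) -> Optional[float]:
    """Run read_fn on a daemon thread; return None if it is slow or fails."""
    result: List[float] = []

    def worker() -> None:
        try:
            result.append(read_fn())
        except Exception:
            pass  # treat a driver error the same as "no reading"

    # daemon=True so a permanently blocked read never prevents process exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)  # wait at most timeout_s, then move on
    return result[0] if result else None
```

A caller on the startup or heartbeat path would treat None as "temperature unavailable" and continue.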
[2.12.11] - 2026-06-15
fixed
- The display "events" tab now shows recent events. The dashboard query fetched an arbitrary slice of a machine's logs (ordered by document id, which is a random UUID) instead of the newest ones, so on a machine with many logs recent monitor add/remove/drift events silently never appeared while an old one stayed pinned. The feed now filters to display events and orders by time, so the latest events always show.
- Auto-restore no longer gets permanently paused when a monitor is turned off. Powering off (or sleeping / disconnecting) a monitor that's part of a stored layout made auto-restore repeatedly try to re-apply, fail because the monitor isn't present, and trip its circuit breaker after three attempts — leaving auto-restore paused until a manual reset even after the monitor came back. Auto-restore now skips the re-apply while an assigned monitor is absent and resumes on its own when it returns; a transient power-off no longer counts as a failure.
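The skip-while-absent rule can be pictured with a small state machine. This is a simplified sketch under stated assumptions, not the agent's code: the AutoRestore class, its tick method, and the string results are hypothetical; only the three-attempt breaker and the "absent monitor is not a failure" rule come from the entry above.

```python
from dataclasses import dataclass
from typing import Callable, Set

MAX_FAILURES = 3  # the breaker trips after three failed attempts


@dataclass
class AutoRestore:
    failures: int = 0
    paused: bool = False

    def tick(self, assigned: Set[str], present: Set[str],
             apply_layout: Callable[[], bool]) -> str:
        if self.paused:
            return "paused"  # stays paused until a manual reset
        # An assigned monitor is absent (powered off, asleep, unplugged):
        # skip the re-apply and do not count it as a failure.
        if not assigned <= present:
            return "skipped"
        if apply_layout():
            self.failures = 0
            return "applied"
        self.failures += 1
        if self.failures >= MAX_FAILURES:
            self.paused = True
        return "failed"
```

Because a skipped tick leaves the failure counter untouched, auto-restore resumes on the first tick after the monitor returns.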
[2.12.10] - 2026-06-14
fixed
- Agent pairing no longer stalls behind an unanswered prompt. During interactive install the agent used to block on an "open browser? [y/N]" prompt before it started polling for authorization, so approving from the dashboard or another device did nothing until that prompt was answered, and pairing could silently time out. The agent now opens the pairing page automatically and starts polling immediately; authorization from any device completes pairing on its own.
added
- --no-browser pairing option (and the OWLETTE_NO_BROWSER=1 env var) skips opening a local browser on kiosks, signage, media servers, and headless or RDP machines while still polling immediately; authorize from your phone or another computer.
[2.12.9] - 2026-06-09
fixed
- Distribution management is now capability-gated end-to-end. roost deploy / rollback / versions / resync / delete and chunk upload-url routes now require the distribution-manage capability; they were previously reachable by read-only site members.
- Soft-deleted users are blocked across the control plane. A shared active-user guard now rejects tombstoned users on control-plane and self-service credential routes, and the delete cascade clears site membership and revokes passkeys + MFA.
- Alert emails no longer send to soft-deleted users.
- Agent cancel_sync is honored during the sync setup window. Cancels that arrived before the active sync handler was ready were previously dropped.
- Agent command dispatch now keys rate limits per target. A dropped duplicate command is surfaced as failed instead of silently completed.
- Machine removal now revokes the agent refresh token so a removed machine cannot keep reporting with an existing refresh session.
- roost rollback now re-dispatches to machines instead of only flipping the version pointer.
- The one-click unsubscribe link disables all alert categories, not just health alerts.
- Firestore rules block cross-tenant autonomous Cortex conversation forgery.
[2.12.8] - 2026-06-07
added
- /for-ai page with /llms.txt, /for-ai.json, and structured data: an honest, machine-readable pitch to AI assistants evaluating owlette.
- tridant brand mark and restructured landing footer ("a tridant system").
- Cortex tier-3 tool-approval gate. Tier-3 tool calls now require explicit in-chat approval. /api/cortex migrated to the UIMessage protocol (convertToModelMessages); the per-site approval flag is default-on and forces the server-side path. Includes Cortex sidebar/UX fixes and a persistent chat layout.
- Scoped full-text search on the logs page, plus an animated filters panel.
- Date-scoped log clearing. Clear-logs deletion accepts a since/until range, with the date window computed in the display timezone.
- Themed date picker (shadcn calendar + popover input) and dark-mode theming of native form controls via color-scheme.
- Site admins can remove machines on their assigned sites (previously superadmin-only).
- Admin user management: last-seen column and deleted-user visibility.
- Any authenticated user can create API keys from account settings - the account dialog was wrongly pointed at the superadmin-only /api/account/api-keys; it now uses the user-scoped /api/keys with an explicit scope-preset selector.
- GET /api/users/deletions is now documented in the OpenAPI spec; new Firestore composite indexes back the log filter combinations and the deletions audit feed.
- CLI / SDK OIDC trusted-publishing workflows for @owlette/cli and the SDKs.
fixed
- False "machines offline" emails during scheduled reboots are fixed. The health-check cron is now reboot-aware: it suppresses offline alerts while a machine is inside an announced reboot/shutdown window (gated on the agent's rebooting/shuttingDown flag plus the scheduled instant within a bounded ±15-minute grace, so neither a clock-skewed far-future anchor nor a stale anchor left behind by a cancel can mute a real outage), and it debounces transient staleness (health.staleSince) so a single missed 120-second heartbeat never pages. Agent-side, the periodic heartbeat (_upload_metrics) no longer reports connection success when its Firestore write fails (which had left a machine online: true with a frozen lastHeartbeat and no reconnect), and cancelling a scheduled reboot/shutdown now also clears shutdownScheduledAt (it had kept the dashboard countdown pill alive).
- Dashboard "Missing or insufficient permissions" error (Sentry OWLETTE-WEB-3R). useUserManagement opened a realtime listener over the whole users collection (superadmin-only per firestore.rules) for every user, via ManageSitesDialog mounted on the dashboard/roosts/logs/deployments pages, so every non-superadmin tripped a permission-denied on load. The listener is now gated to superadmins (client-side; rules unchanged).
- Quieted expected permission-denied noise on the site-availability check (CreateSiteDialog) and Cortex chat-URL load, and blocked agent ID tokens from minting user API keys (POST /api/keys).
- Logs are ordered by timestamp across all filter combinations.
- roost: config-only republish honors content-addressed CAS on the no-op branch; a restated deploy config is applied on same-version republish.
- Cortex conversation rows are a11y-accessible (no nested buttons).
- Dark-mode button hover repaired and standardized on the secondary rollover.
- Metrics panel persistence restored (gated on a non-empty selection; no auto-restore on load); empty metrics-slide gap removed; inline sparklines read hourly metrics_history buckets; disk r/w throughput right-aligned in the machine list.
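The reboot-aware gating from the first fix in this list can be sketched as two small predicates. This is an illustration under assumptions, not the cron's code: the function names, the argument shapes, and the choice of one heartbeat interval as the debounce period are hypothetical; the ±15-minute grace and the 120-second heartbeat come from the entry.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

GRACE = timedelta(minutes=15)       # bounded +/-15-minute window
HEARTBEAT = timedelta(seconds=120)  # heartbeat interval named in the entry


def in_reboot_window(flagged: bool, scheduled_at: Optional[datetime],
                     now: datetime) -> bool:
    """True only if the agent announced a reboot AND now is near the scheduled instant."""
    if not flagged or scheduled_at is None:
        return False  # a stale anchor without the flag cannot mute alerts
    return abs(now - scheduled_at) <= GRACE  # far-future anchors fall outside


def should_alert_offline(stale_since: Optional[datetime], flagged: bool,
                         scheduled_at: Optional[datetime], now: datetime,
                         debounce: timedelta = HEARTBEAT) -> bool:
    if stale_since is None:
        return False  # machine is not stale
    if now - stale_since < debounce:
        return False  # transient: staleness was first seen too recently
    return not in_reboot_window(flagged, scheduled_at, now)
```

Requiring both the flag and a nearby scheduled instant is what keeps either signal alone from silencing a real outage.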
changed
- Logs page refactor — themed date pickers, date-scoped clear, aligned-column table.
- Sunken machine card/list surfaces with section enclosures; metrics/displays panels darkened with brighter content.
- CLI pre-publish hardening (6-wave review): request timeouts, idempotency-key surfacing on unconfirmed failures; owlette key removed (key management is dashboard-only).
- Removed the dead MachineListView wrapper and redundant per-instance button hover overrides.
infrastructure / docs
- /preflight pre-push gate + post-push-e2e watch hook; build-system skill expanded with the installer-release + version-bump flow.
- CI: bumped checkout/setup-node/setup-java/cache to Node 24 action majors; py-sdk publish now runs pytest first.
- Layperson video-tutorial series + Playwright video-capture harness.
[2.12.3] - 2026-05-19
fixed
- Agent screenshot capture restored - properly this time. 2.12.2 fixed the broken signed-URL flow by routing the capture_screenshot command back through the pre-refactor OwletteService._handle_capture_screenshot path (base64 upload via /api/agent/screenshot). That was working but architecturally wrong: Next.js was proxying multi-MB image bodies again, and screenshot_capture.py was effectively dead code despite being the api-sprint's intended replacement. 2.12.3 restores the signed-URL design end-to-end with the missing pieces actually built:
  - User-session capture moved into screenshot_capture.py. The Windows service runs as LocalSystem in Session 0; mss inside Session 0 captures a blank ~2 KB LocalSystem display. The pre-refactor working flow ran capture inside the active user's desktop session via CreateProcessAsUser. That mechanism (OwletteService.execute_in_user_session → session_exec.py) is now wired into screenshot_capture.capture_in_user_session() via dependency injection: the service passes its execute_in_user_session method to capture_and_upload, the user-session interpreter runs mss + PIL (JPEG quality 72, max-width 7680 px, PNG fallback if PIL is missing), and writes the bytes into the IPC output directory.
  - OwletteService.execute_in_user_session now returns outputDir in its result envelope so callers can read files directly without scanning ipc/results/ for the most-recent screenshot (which was the OLD pattern and is racy under concurrent captures).
  - New POST /api/sites/{siteId}/machines/{machineId}/screenshots/finalize endpoint. After the signed-URL PUT lands, the agent calls finalize with the storagePath + sizeKB. Server-side: verifies the object exists, pins content-type metadata, makes the file public-read, writes sites/{siteId}/machines/{machineId}.lastScreenshot = { url, timestamp, sizeKB } (the Firestore field the dashboard's ScreenshotDialog subscribes to in real-time), appends a screenshots/{docId} history doc, and prunes to the most-recent 20. Same lastScreenshot field + history pruning behavior as the legacy /api/agent/screenshot route, but for the signed-URL upload path.
  - Path → content-type alignment. Storage paths now use .jpg for JPEG bodies and .png for PNG fallback, so the URL doesn't lie about its content.
  - machine_commands._handle_capture_screenshot reverted from the 2.12.2 delegation to the proper capture_and_upload(executor=service.execute_in_user_session, ...) call. The temp delegation path in OwletteService is no longer invoked by the public API (kept in source for now to keep the diff focused; can be removed in a follow-up cleanup pass).

  Net result: multi-MB image bodies no longer transit Next.js, dashboard lastScreenshot updates immediately via Firestore real-time (no polling), and the signed-URL upload concept ships end-to-end as originally designed by the api-sprint (3027713).
- Login page hydration fixed (React #418). The passkey button rendered via {browserSupportsWebAuthn() && …} evaluated during render: the server (no window) omitted the button, the client included it, producing a hydration mismatch. It looked harmless (React recovers by discarding the SSR tree and re-rendering client-side) but it threw away /login's server HTML on every load, and in the Playwright E2E harness the in-flight re-render dropped the login click so the suite hung on /login. This is why E2E had been red since the 2.12.0 hardening pass: that pass switched the root layout to dynamic rendering (await headers() for the CSP nonce), which surfaced the latent mismatch as a thrown error in production builds. Fixed by gating the button on a mounted flag: server + first client render agree (no button), button revealed after hydration via useEffect. E2E is green again.
- Alert emails localize timestamps to the machine's IANA timezone. The alert-email renderer passed a Windows-style zone string ("Pacific Standard Time") to toLocaleString, which only accepts IANA names ("America/Los_Angeles"), so every alert email threw RangeError: Invalid time zone specified and silently failed to send. Now uses the machine's stored IANA timezone.
added
- owlette.app failover origin: Cloudflare Load Balancing (Railway primary, Vercel standby). Groundwork for surviving a single-provider outage (the failure mode that took prod down on 2026-05-19 when Google Cloud blocked Railway's account):
  - GET /api/health: unauthenticated readiness probe (shallow Firestore read, 2.5 s timeout, 200/503) so the load balancer can fail an origin out when it's up but can't reach its backend (the exact "provider lost cloud egress" mode).
  - infra/cloudflare/: Terraform module for the load balancer: Railway primary pool + Vercel fallback pool, /api/health monitor, and adaptive_routing { failover_across_pools = true } for an instant mid-request cross-pool handoff instead of waiting ~60 s for the next health-check cycle. Apply workflow + token scope + origin-hostname gotchas documented in .claude/skills/cf-load-balancing.md.
  - scripts/sync-env.mjs + scripts/env-manifest.json: manifest-driven env-var management across railway-dev / railway-prod / vercel-prod (status / check / diff / sync; values never printed; dry-run by default). Workflow + must-match secret rules in .claude/skills/env-management.md.
  - Vercel deploy wiring: web/vercel.json removed (the auto-generated experimentalServices config put the project in the wrong framework mode); .vercelignore + web/.npmrc added for a correct monorepo deploy.

  This is groundwork: the load balancer and Vercel deploy aren't live yet, so /api/health ships and sits inert until failover is provisioned.
[2.12.2] - 2026-05-19 [superseded by 2.12.3]
The screenshot fix in 2.12.2 used a temporary delegation to the pre-refactor base64 upload path. 2.12.3 replaces that with the proper signed-URL pipeline. The other 2.12.2 fixes (audit-export guard, TimezoneChip, CSP) shipped unchanged through 2.12.3.
fixed
- Agent screenshot capture restored (interim). Two regressions from the api-sprint refactor (3027713, 2026-04-26) silently broke capture for ~3 weeks:
  - /api/sites/{siteId}/machines/{machineId}/screenshots/upload-url 404'd every agent call with "site not found or no access". The new requireMachineAuthAndScope helper had no agent-token short-circuit, so agent IDs fell through to a users/{uid}.sites[] lookup, a doc agents don't have. Fixed by mirroring the agent-token branch from the sibling requireAgentOrSiteAuthAndScope (with a defense-in-depth check that the token's machine_id matches the URL path).
  - The new screenshot_capture.capture_and_upload flow ran mss directly in the service process. The agent service runs as LocalSystem in Session 0, so mss captured a blank ~2 KB LocalSystem display instead of the actual user desktop. The pre-refactor path ran capture inside the active user's session via CreateProcessAsUser and uploaded via /api/agent/screenshot (which also writes the lastScreenshot Firestore field the dashboard listens on); the new path skipped both. machine_commands._handle_capture_screenshot was temporarily routed back through that pre-refactor path; 2.12.3 supersedes with the proper signed-URL design.
- Audit-export Cloud Function deploys cleanly to prod. exportSecurityBoundaryAuditDevDaily referenced a dedicated service account that only exists in the dev project (security-boundary-audit-export@owlette-dev-3838a). Cloud Functions Gen 2 validated the SA at deploy time and would reject the prod deploy of all 24 functions in firebase deploy --only functions. The serviceAccount config is now conditionally attached only for the dev project, and the function body early-returns with a log message outside dev. (Provisioning the prod SA + bucket + IAM is a follow-up; this lets the rest of the release ship.)
- TimezoneChip shows abbreviation, not city name. tz.split('/').pop() rendered "Los Angeles" / "New York" / "Berlin", which read more like a city than a timezone. Intl.DateTimeFormat({ timeZoneName: 'short' }) is observance-aware and returns PDT/PST/EDT automatically. Zones without a stable abbreviation fall back to GMT±N. Chip font also reduced one step (the screenshot-dialog history sidebar where this is most visible was too prominent).
- Login page no longer fails to hydrate under the 2.12.0 CSP. The hardening pass set style-src and style-src-elem to 'self' 'nonce-...', but Next 16 emits inline <style> blocks during client-side hydration/navigation that the request-header nonce only covers for scripts. The browser blocked those styles, the page hit a React #418 hydration mismatch, and the login form became inert (Playwright E2E suite caught the regression on every push). Both directives now use 'self' 'unsafe-inline'; modern browsers ignore 'unsafe-inline' when a nonce is present, so the style nonce was dropped intentionally. Script injection remains nonce + strict-dynamic locked; style injection is materially lower risk.
infrastructure
- functions/.gitignore tightened. Now blocks all .env* overlays (e.g. .env.owlette-prod-90a12) except the committed .env.example template, preventing prod secrets from being accidentally tracked.
[2.12.1] - 2026-05-18
fixed
- Display layout no longer flags healthy monitors as "not connected" after RDP / virtual-display events. Two changes:
  - The edidHash payload no longer includes the Windows-reported monitor friendly name. Windows reports that string inconsistently across driver state transitions (RDP attach/detach, monitor sleep, EDID re-read fallback), so the same physical panel was receiving different hashes between snapshots, which surfaced as every stored monitor showing "⚠ not connected" after a remote session, along with spurious display_drift / display_monitor_added / display_monitor_removed events. The hash is now identity-only: manufacturer | productCode | device-path serial.
  - Indirect display drivers (Miracast, IddCx-based indirect displays, the RDP RdpIdd_IndirectDisplay / "Microsoft Remote Display Adapter", dummy-plug EDIDs like HDP-V104) are now skipped at CCD enumeration. These appear and disappear with remote sessions and previously polluted the topology signature, the drift counter, and the events feed every time someone attached. Only physical output technologies (DVI, HDMI, DP, USB-C tunnel, internal, etc.) are enumerated.

  Existing assigned layouts stored under the old hashing scheme are re-derived on read on both the agent and web sides: no manual migration required, no need to re-press "store" after upgrading.
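An identity-only hash of the kind described in this release can be sketched as below. The dictionary keys, the " | " joiner, and the choice of SHA-256 are assumptions for illustration; the source only states which fields feed the hash (manufacturer, productCode, device-path serial) and that the friendly name is excluded.

```python
import hashlib
from typing import Mapping

# Only stable identity fields take part; the friendly name is deliberately left out.
IDENTITY_FIELDS = ("manufacturer", "product_code", "serial")  # hypothetical key names


def edid_hash(monitor: Mapping[str, str]) -> str:
    payload = " | ".join(str(monitor[field]) for field in IDENTITY_FIELDS)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With this shape, two snapshots of the same panel that differ only in the Windows-reported name produce the same hash, so no drift event is raised.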
[2.12.0] - 2026-05-17
security
- Five unauthenticated Cloud Functions hardened: emitWebhook, recordAuditEvent, verifyAuditChain, preUploadCheck, recordUsageEvent all now require an x-internal-secret header matching CORTEX_INTERNAL_SECRET. Previously these onRequest functions defaulted to invoker: 'public' and could be called by anyone with the project ID, allowing webhook forgery (signed events to customer subscribers), audit-log injection, quota DoS, and billing-data poisoning. Shared helper functions/src/lib/requireInternalSecret.ts uses crypto.timingSafeEqual for constant-time comparison.
- MFA is now server-enforced. Previously MFA enforcement lived in browser sessionStorage + a client-side router.push('/verify-2fa') that could be bypassed by navigating directly to /dashboard. Iron-session SessionData now carries mfaRequired + mfaVerified flags baked at session-create time from users/{uid}.mfaEnrolled. The proxy gates protected paths when mfaRequired && !mfaVerified and redirects to /verify-2fa. Successful TOTP / backup-code / MFA-setup flips mfaVerified server-side. Pre-deploy sessions are upgraded lazily on first protected-page hit. New /api/mfa/disable route is the only way to disable MFA: it requires current TOTP or backup code, writes audit log, re-mints the session.
- Firestore rules tightened to block client-side privilege escalation. User-doc create rule now constrains sites[] to [], email to the auth token email, mfaEnrolled/passkeyEnrolled to false, requiresMfaSetup to true. User-doc update switched from value-equality checks to a diff().affectedKeys().hasOnly([allowlist]) policy - sensitive fields (role/email/sites/MFA state) must go through trusted server routes. canAccessSite() and isSiteOwner() now check users/{uid}.deletedAt so admin-soft-deleted users immediately lose dashboard access. Rules version bumped to 2.5.0.
- Three new rule blocks plug agent silent-failure paths: sites/{s}/machines/{m}/cortex/{docId} (local Cortex active-chat state), sites/{s}/machines/{m}/logs/{logId} (per-machine log shipping), and sites/{s}/cortex-events/{eventId} (autonomous-event log) now have explicit rules matching agentCanAccessMachine() / isAgent() && site_id. Previously these writes fell to the deny-all fallback and the entire local-Cortex feature was silently broken end-to-end.
- API key revocation now actually revokes. apiAuth.server.ts now checks revokedAt before retiresAt/expiresAt. Previously the admin "revoke key" button and the user-delete cascade's revocation were no-ops.
- Legacy unscoped API keys rejected by default. Pre-scope-system keys with empty/missing scopes arrays now fail authentication unless explicitly opted in via LEGACY_API_KEY_BYPASS_ENABLED=true + LEGACY_API_KEY_ALLOW_HASH_LIST, and even allowlisted legacy keys can't pass requireScope() (they resolve to scopes: []).
- Cross-machine token isolation on /api/agent/screenshot and /api/agent/alert. Routes now verify decodedToken.machine_id === machineId (not just site_id), preventing a compromised agent token for machine A from spoofing reports for machine B in the same site.
- Webhook delivery pipeline reconciled. Cloud Functions dispatcher was reading collectionGroup('webhook_subscriptions') while the web API was writing sites/{siteId}/webhooks/{id} - async roost-event webhook delivery emitted zero deliveries. Now both ends use the webhooks collection.
- Passkey login now requires user verification (userVerification: 'required', previously 'preferred'). Single-touch FIDO keys without PIN/biometric no longer suffice for full account access.
- Backup code single-use guaranteed. Backup-code verification in /api/mfa/verify-login now runs inside a Firestore transaction that re-reads the array before removing the matching code, preventing two concurrent same-code requests from both passing.
- Agent refresh tokens rotate on every refresh. /api/agent/auth/refresh now generates a new refresh token, marks the old one superseded with a 5-minute grace window, and returns the new token to the agent. Captured tokens no longer grant indefinite access. Agents receive the new refreshToken in the response and must persist it on every refresh.
- Roost kill switch is now wired into web API routes (17 /api/roosts/* + /api/chunks/* handlers). Flipping sites/{siteId}.roostEnabled=false now actually halts uploads, not just agent sync work.
- Account self-deletion no longer wipes shared sites. Previously the cascade hard-deleted every site in the user's sites[] array including sites where the user was just a member - wiping out the entire site for the owner. Now classified per-site: sole-owner sites are hard-deleted (existing behavior); shared owned sites refuse with 409 needs_successor; member sites have only arrayRemove(uid). Cascade also drains previously-missed subcollections (passkeys, api_keys, top-level api_keys/{hash} lookups, mfa_pending, agent_refresh_tokens, chats, Firebase Storage avatar) and revokes + deletes the Firebase Auth user server-side rather than relying on the client.
- Admin soft-delete also revokes Firebase Auth tokens. Previously deletedAt was written on the user doc but Firebase Auth was untouched; the user could keep using their session until the ID token expired. Now the admin cascade calls revokeRefreshTokens + updateUser({disabled: true}) immediately.
- x-owlette-security-version header bumped to 2 so dashboard tabs open at deploy time will surface a reload nudge.
- LLM cost endpoints rate-limited. /api/cortex/categorize and /api/cortex/provision-key are now wrapped in withRateLimit({strategy: 'user'}) so a captured session cookie can't burn through LLM credits.
- Unsafe role fallback removed. cortex-utils.server.ts previously defaulted unknown roles to 'admin' (site-scoped privileged); now defaults to 'member' (least-privileged).
- /api/agent/alert localhost fallback fixed. When NEXT_PUBLIC_BASE_URL is unset, autonomous-Cortex callbacks now fall back to https://owlette.app (matching the email-templates pattern) instead of http://localhost:${PORT}.
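The refresh-token rotation in this list (new token on every refresh, old token superseded with a 5-minute grace window) can be modelled with a small in-memory store. This is a sketch, not the server route: RefreshTokenStore and its methods are hypothetical, and the behavior of issuing another new token when a superseded token is re-presented inside the grace window is an assumption.

```python
import secrets
import time
from typing import Dict, Optional

GRACE_SECONDS = 5 * 60  # superseded tokens stay usable for a 5-minute grace window


class RefreshTokenStore:
    """In-memory stand-in for the real token store (hypothetical)."""

    def __init__(self) -> None:
        # token -> time it was superseded (None while it is the current token)
        self._superseded_at: Dict[str, Optional[float]] = {}

    def issue(self) -> str:
        token = secrets.token_urlsafe(32)
        self._superseded_at[token] = None
        return token

    def refresh(self, presented: str, now: Optional[float] = None) -> Optional[str]:
        """Return a new refresh token, or None if the presented one is rejected."""
        now = time.time() if now is None else now
        if presented not in self._superseded_at:
            return None  # unknown token
        superseded = self._superseded_at[presented]
        if superseded is not None and now - superseded > GRACE_SECONDS:
            return None  # old token and the grace window is over
        if superseded is None:
            self._superseded_at[presented] = now  # mark superseded, start grace
        return self.issue()
```

The grace window covers an agent that crashed before persisting the new token; after it closes, a captured old token is useless.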
changed
- Cloud Functions reconcilers now exported. reconcileDeploymentStatus and reconcileDistributionStatus were defined but missing from functions/src/index.ts. With the rules lockdown live, deployment/distribution status writes would have been silently dropped. Both are now exported and deploy.
- Scheduled-function batch sizes capped at 400. chunkGc.tombstoneStore.create and aggregateTelemetry.trimOlderThan previously committed all matched docs in a single batch, which fails past Firestore's 500-op limit. Now chunked.
- apiKeyExpire query bounded. Added .where('expiredMarkedAt', '==', null) filter so the daily scan doesn't grow linearly with historical expired-key count.
- Rate limiter setInterval no longer leaks Jest workers. web/lib/rateLimit.ts cleanup timer now calls .unref() so test processes exit cleanly.
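The batch cap above amounts to splitting the matched documents into groups of at most 400 and committing each group separately. A minimal sketch of the splitting step, with chunked as a hypothetical helper name:

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")
BATCH_CAP = 400  # stays under Firestore's 500-operation batch limit


def chunked(items: Sequence[T], size: int = BATCH_CAP) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Each yielded chunk would then be written and committed as its own batch, so no single commit exceeds the limit.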
added
- New composite Firestore index webhook_deliveries(state, nextAttemptAt) for the retry-queue pump.
- New composite Firestore index api_keys(expiresAt, expiredMarkedAt) for the bounded apiKeyExpire query.
- One-shot deprecation warnings for legacy API keys with empty scopes and for API keys passed via query string. Both log a keyHash prefix once per process so operators can identify which integrations need to migrate.
required env vars (new)
- CORTEX_INTERNAL_SECRET: must be set on Cloud Functions (in functions/.env) for the 5 internal-only HTTPS functions to operate. Without it, all 5 return 503.
- Optional LEGACY_API_KEY_BYPASS_ENABLED + LEGACY_API_KEY_ALLOW_HASH_LIST: emergency-only allowlist for legacy unscoped keys during the migration window.
migration notes
- Agents on 2.11.x and earlier: still compatible. Refresh tokens are issued under the new rotation scheme on next /api/agent/auth/refresh call; older tokens stay valid until first refresh.
- CLI clients on 1.0.0-rc and earlier: still compatible. No CLI behavior changes in 2.12.0.
- Pre-deploy iron-session cookies: lazily upgraded on first protected-page hit (one Firestore read per cookie). After the upgrade the cookie carries the new MFA flags. Cookies expire naturally within 7 days.
[2.11.3] - 2026-05-08
added
- New Cortex Tier 2 tools update_process, add_process, and delete_process bring the chat to feature parity with the GUI/web for process management. The tools accept the full set of process config fields (name, exe_path, file_path, cwd, priority, visibility, time_delay, time_to_init, relaunch_attempts, launch_mode, schedules, schedulePresetId) and execute server-side via the existing validated action functions - no agent relay, no command-queue latency.
- Agent now surfaces missing-executable failures as dashboard toast notifications with up to two suggested alternative paths (sibling versions discovered by walking up from the missing path). The toast offers a one-click "use path" action that opens the process edit dialog with the suggested path pre-filled.
- New canonical firebase_client.send_alert(event_type, data) method with retry-on-failure queueing (drains on reconnect, capped at 100 pending alerts). The previous send_process_alert and send_display_alert are now thin wrappers that delegate to it.
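The retry-on-failure queueing behind send_alert can be sketched with a bounded deque. This is an illustration, not firebase_client itself: the AlertSender class, the injected post callable, and the drain method are hypothetical, and dropping the oldest alert once 100 are pending is an assumption (the source only states the cap).

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple

MAX_PENDING = 100  # cap on queued alerts


class AlertSender:
    def __init__(self, post: Callable[[str, Dict], bool]) -> None:
        self._post = post  # returns True on success (hypothetical transport)
        # deque(maxlen=...) discards the oldest entry when full (assumed policy).
        self._pending: Deque[Tuple[str, Dict]] = deque(maxlen=MAX_PENDING)

    def send_alert(self, event_type: str, data: Dict) -> bool:
        if self._post(event_type, data):
            return True
        self._pending.append((event_type, data))  # keep it for the next reconnect
        return False

    def drain(self) -> int:
        """Call on reconnect: resend queued alerts in order, stop at the first failure."""
        sent = 0
        while self._pending:
            event_type, data = self._pending[0]
            if not self._post(event_type, data):
                break
            self._pending.popleft()
            sent += 1
        return sent
```

Peeking at the head before removing it means an alert is only dropped from the queue after the transport confirms delivery.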
changed
- /api/agent/alert route now accepts both the new generic {eventType, data} shape and the legacy flat process-alert shape, so older agent callers keep working unchanged while new event types (exe_missing, etc.) flow through cleanly.
fixed
- Missing-executable failures (e.g. when a process's configured exe_path no longer exists after an app upgrade) are now surfaced to the operator instead of silently failing every tick. Previously the agent only logged the error to its local log file, leaving operators with no signal that a managed process couldn't launch. Rate-limited so it fires once per failed-state transition, not every 5-second main-loop tick.
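Firing once per failed-state transition is an edge-triggered check: notify only when the state changes from healthy to missing. A minimal sketch, with TransitionNotifier and should_notify as hypothetical names:

```python
from typing import Dict


class TransitionNotifier:
    """Fire once when a process enters the missing-exe state, not on every tick."""

    def __init__(self) -> None:
        self._missing: Dict[str, bool] = {}  # last observed state per process

    def should_notify(self, process_id: str, exe_missing: bool) -> bool:
        was_missing = self._missing.get(process_id, False)
        self._missing[process_id] = exe_missing
        # Notify only on the healthy -> missing edge.
        return exe_missing and not was_missing
```

If the executable reappears and later goes missing again, that counts as a new transition and notifies once more.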
[2.11.2] - 2026-05-06
fixed
- Agent GUI process-details panel now stays in sync with external launch_mode changes. The dropdown and schedule label refresh immediately even while a text entry is focused; entry-field updates are deferred and retried after focus leaves so Firestore changes are no longer permanently dropped.
- GUI launch_mode changes no longer trigger a brief command-prompt flash. GPUtil.getGPUs() (called transitively from the GUI's post-toggle metrics push) now spawns nvidia-smi with CREATE_NO_WINDOW via a Windows-only monkey-patch in shared_utils._get_gputil().
- launch_mode transitions originating from the GUI now run the same runtime cleanup as web-originated changes. The service maintains an in-memory snapshot of last-applied modes and diffs it on every main-loop tick, applying transitions through a single _apply_launch_mode_transition helper. Previously the GUI flow bypassed the smart-transition path because it wrote disk before uploading to Firestore, leaving stuck killed/cooldown markers that prevented relaunch.
- off → scheduled transitions outside the schedule window now clear stale runtime markers on the transition itself, so a previously-killed process will launch when the schedule window next opens (was stuck before).
- Cortex set_launch_mode IPC handler called shared_utils.write_config(config) with the wrong signature, so launch-mode changes from Cortex were silently not persisted. Now uses save_config(config).
- Cortex chat tool calls for set_launch_mode (and other Tier 2 tools dispatched through executeExistingCommand) now forward all LLM-supplied parameters (mode, schedules, schedulePresetId, etc.) to the agent. Previously only process_name was forwarded, so the agent's command handler defaulted mode to off regardless of what the user asked for.
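The snapshot-and-diff approach from the launch_mode fix above can be sketched as follows. This is not the service's code: diff_launch_modes and the dictionary shape (process id to mode string) are hypothetical, and newly seen processes are assumed to produce no transition.

```python
from typing import Dict, List, Tuple

Transition = Tuple[str, str, str]  # (process_id, old_mode, new_mode)


def diff_launch_modes(snapshot: Dict[str, str],
                      current: Dict[str, str]) -> List[Transition]:
    """Compare last-applied modes with the current config, then update the snapshot."""
    transitions: List[Transition] = []
    for process_id, new_mode in current.items():
        old_mode = snapshot.get(process_id)
        # Assumption: a process seen for the first time is not a transition.
        if old_mode is not None and old_mode != new_mode:
            transitions.append((process_id, old_mode, new_mode))
    snapshot.clear()
    snapshot.update(current)
    return transitions
```

Because the diff looks only at resulting state, it behaves the same whether the change came from the GUI, the web dashboard, or Cortex; each returned tuple would be handed to one shared transition helper.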
changed
- Firebase set_launch_mode / toggle_autolaunch command handler now matches by process_id first with a process_name fallback, making it robust to processes with duplicate names.
added - post-2.11 updates
- Site-tier billing now gates beta sites as core or pro, defaulting beta sites to pro.
- Process control now includes restart_process end to end across the public API, CLI, and agent control surfaces.
- Agent GUI kill_process actions now require a confirmation dialog before dispatch.
changed - post-2.11 updates
- Landing and pricing pages were refreshed across the redesign, use-case, developer, proof, pricing, and GPU-label sections.
- roost and webhooks UI surfaces are now labeled as developer preview.
fixed - post-2.11 updates
- Metrics charts refresh their time window when a dashboard tab regains focus.
- Streaming API responses no longer enter the idempotency cache.
- Status page readiness checks are hardened against partial service readiness.
security - post-2.11 updates
- Remote installer requests now require a SHA-256 checksum before an agent accepts the installer.
- getUsageSummaryHttp now requires site-scoped auth before returning usage data.
- Site-member and project-distribution routes now enforce their capability gates before mutations.
- Chat routes now enforce conversation ownership before returning or mutating conversations.
added - security-boundary migration prep
Adds the production-readiness material for the security-boundary migration: W8.2 observability, W9.0 operator runbooks, customer communication draft, and scheduled audit-export plan.
admin impact: expected to be no user-visible change for admins and superadmins. Control-plane writes now stay server-mediated through scoped REST routes, capability checks, rate limits, and blocking audit writes; existing dashboard workflows should keep the same behavior.
member impact: milestone A intentionally keeps direct member control-plane command capability denied after rules lockdown. Members may continue read-only and explicitly allowed flows, but machine commands and other privileged control-plane writes remain admin/superadmin-only until the configurable policy work lands in milestone B.
operations: both enforcement switches are documented (capability_enforcement, rate_limit_enforcement), including when to flip them, the 4-hour incident window, audit implications, alert wiring, and re-enable checklist. The incident playbook covers capability denial spikes, Cortex system-bucket 429s, and account-deletion cascade recovery.
audit retention: security-boundary audit rows under sites/*/audit_log and platform rows under global/audit_log/entries now have a GCS managed-export schedule plan with retention longer than the Firestore hot-store window and a restore drill.
added — api-sprint: 30+ scoped REST endpoints across 6 capability tracks
Promotes the internal admin-gated capabilities (machine commands, processes, classic-installer deploys, agent-installer mgmt, cortex chat, user/member admin) to public, scoped, api-key-friendly REST endpoints. Closes 35 cli stubs in owlette-cli. New: full Node + Python SDK coverage. The roost data-plane (chunks/versions/deployments/keys/webhooks) was already public from the prior cycle; this release fills out the rest of the platform surface.
Versioning: this is additive — no breaking changes to existing callers, no removals. Sits in [Unreleased] until cut. The 3.0.0 major-bump moment is deferred to roost-public-api W8 (public launch) which will combine this surface + the v1-agent compat cutover noted in project_roost.md.
new public endpoints (~30) — every endpoint accepts Authorization: Bearer owk_* (api key) or session/ID-token; mutations require Idempotency-Key (24h replay window, body-hash mismatch → 422 idempotency_key_mismatch); errors are RFC 7807 problem+json with stable code strings; collections are cursor-paginated per AIP-158; every mutation emits a fire-and-forget audit-log event under one of seven MutationKind taxonomies.
- /api/sites/{siteId}/deployments/*: classic installer deploy CRUD + retry/cancel/uninstall/delete (7 verbs). Quota-enforced (max-targets-per-deploy=100, configurable via sites/{id}.deployQuota; 413 over_quota). Cancel actively purges queued commands so a stale entry can't be picked up. Uninstall requires site=<id>:admin.
- /api/installer/*: agent-installer binary management (list, latest, 3-step upload, set-latest, delete). Superadmin-only (gated by new requirePlatformAuthAndScope helper that composes Wave-0 primitives + a defense-in-depth role check). Soft-delete protects the current latest version and enforces min-active-versions ≥ 2.
- /api/sites/{siteId}/machines/{machineId}/commands/*: dispatch + status-poll for reboot_machine / shutdown_machine / capture_screenshot. Live-view streaming explicitly out of scope (deferred Wave-4 spike). Offline machine returns 409 machine_offline (not queued). Screenshot uses a 3-step CLI flow: dispatch → poll → download from 1-hour signed URL.
- /api/sites/{siteId}/machines/{machineId}/processes/*: full process CRUD + control verbs (kill / restart / start / stop / schedule). 10 verbs total. Race-safe via the new withProcessLock() helper: a Firestore transaction enforces duplicate-name rejection (409 duplicate_process_name) inside the txn boundary; lazy backfill of processId UUIDs on legacy rows. Schedule verb writes through the lock (no command queue); the other four control verbs queue commands.
- /api/cortex/conversations/*: canonical Cortex conversation API (5 verbs: create, list, send+stream, soft-delete, rename). Reuses Cortex's dual-path streaming engine (local agent vs server-side LLM) via the extracted cortexStream.server.ts helper. Older /api/chat/* routes remain compatibility aliases. Conversations are stored at chat_conversations/{id} with embedded messages capped at 200; overflow splits into chat_messages/{conversationId}/{messageId} subcollection.
- /api/users/*: platform user administration (7 verbs: list/get/promote/demote/assign-sites/remove-sites/delete). Superadmin-gated. Last-superadmin guard runs inside the demote transaction (409 last_superadmin). DELETE cascade with explicit failure modes: orphan-sites guard (409 orphan_sites if user owns sites and successorUid not provided); successor validation; api-key revocation across both subcollection + top-level lookup; background setImmediate command-cancel sweep.
- /api/sites/{siteId}/members/*: site membership (3 verbs: list/add/remove). Site-admin-gated. Add with role: 'admin' against a member-tier user returns roleHonored: false rather than silently promoting globally; explicit promotion goes through /api/users/{uid}/promote.
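The idempotency contract stated for these endpoints (24h replay window, body-hash mismatch → 422 idempotency_key_mismatch) can be illustrated with a small in-memory sketch. This is not the code in web/lib/idempotency.ts; the IdempotencyStore class, its handle method, and the response shape are hypothetical names chosen for the example.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

REPLAY_WINDOW_SECONDS = 24 * 60 * 60  # 24h replay window from the changelog


@dataclass
class _Entry:
    body_hash: str
    status: int
    response: dict
    stored_at: float


class IdempotencyStore:
    """Hypothetical in-memory stand-in for the server-side idempotency cache."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._entries: Dict[str, _Entry] = {}
        self._clock = clock

    def handle(
        self,
        key: str,
        raw_body: bytes,
        handler: Callable[[bytes], Tuple[int, dict]],
    ) -> Tuple[int, dict]:
        body_hash = hashlib.sha256(raw_body).hexdigest()
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry.stored_at < REPLAY_WINDOW_SECONDS:
            if entry.body_hash != body_hash:
                # Same key, different payload: reject instead of replaying.
                return 422, {"code": "idempotency_key_mismatch"}
            # Same key, same payload: replay the saved response.
            return entry.status, entry.response
        # First use of this key (or the window elapsed): run the handler once.
        status, response = handler(raw_body)
        self._entries[key] = _Entry(body_hash, status, response, now)
        return status, response
```

A retried request with the same key and body gets the stored response without running the handler again, which is the property the CLI and SDK auto-generated keys rely on.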
owlette CLI — @owlette/cli (binary owlette) prepared as v1.0.0-rc.0 with 6 promoted command groups: chat (5 verbs), user (7), deploy (classic installer; 7), installer (4), process (10), machine mutations (3 — reboot/shutdown/screenshot; live-view stays as the only C-tier stub). Plus whoami, version, site, quota, audit-log, machine reads from the prior cli wave. Every mutation auto-generates Idempotency-Key: cli-<noun>-<verb>-<uuid>; stable error codes (machine_offline, duplicate_process_name, over_quota, min_versions_violated, last_superadmin, orphan_sites, scope_insufficient) get human-readable hints; destructive verbs honor --yes.
Node + Python SDKs — extended to the 1.0.0 RC package line: @owlette/sdk@1.0.0-rc.1 and owlette-sdk==1.0.0rc0 (PEP 440 spelling). Both add 6 new resource modules (installerDeployments/installer_deployments, installer, processes, chat, users, members) plus extension of machines with command-dispatch + screenshot orchestration (queue → poll → download). Both auto-generate Idempotency-Key: sdk-<resource>-<verb>-<uuid> if not supplied. Streaming chat.send() parses the AI-SDK v3 line protocol (0: deltas, d: end markers, 3: errors) — Node exposes { deltas: AsyncIterable, complete: Promise }; Python yields async for delta in chat.send(...).
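As a rough illustration of the streaming line protocol mentioned above, the sketch below yields text deltas from an iterable of lines. It assumes each line has the shape prefix:JSON-payload, with the three prefixes listed in this entry (0 for deltas, d for the end marker, 3 for errors); the function and exception names are hypothetical and this is not the SDK's parser.

```python
import json
from typing import Iterable, Iterator


class StreamError(Exception):
    """Raised when the stream carries an error part (prefix 3)."""


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas until the end marker; raise on an error part."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        prefix, sep, payload = line.partition(":")
        if not sep:
            continue  # not a protocol line; ignored in this sketch
        if prefix == "0":
            yield json.loads(payload)  # assumed: payload is a JSON string
        elif prefix == "3":
            raise StreamError(json.loads(payload))
        elif prefix == "d":
            return  # end marker: stop consuming
        # Any other prefix is ignored in this sketch.
```

Usage: "".join(iter_deltas(response_lines)) reassembles the full reply, where response_lines stands for whatever line iterator the HTTP client provides.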
scope grammar extended — ApiKeyResource enum extended from 3 → 8 types: added chat, deploy, process, user, installer. New constants ALL_RESOURCES + SUPERADMIN_ONLY_RESOURCES exported from web/lib/apiKeyTypes.ts so route validators + the dashboard scope picker can't drift.
shared infrastructure — three new helpers landed in Wave 0 of the sprint and are now used everywhere:
- withIdempotency(request, ctx, rawBody, handler) in web/lib/idempotency.ts: collapses the 12-line check→handler→save pattern into 4 lines for every mutating route. The existing web/lib/idempotency.ts was extended in-place; no parallel helper file was created.
- emitMutation({kind, siteId, actor, targetId, attributes}) in web/lib/auditLogClient.ts: single parameterized helper covering all 7 mutation kinds (deployment_mutated, process_mutated, machine_command_dispatched, user_mutated, site_member_mutated, installer_mutated, chat_mutated). Fire-and-forget; never awaited.
- requireMachineAuthAndScope + requirePlatformAuthAndScope in web/app/api/_shared.ts: joined the existing requireSiteAuthAndScope / requireRoostAuthAndScope family. Single-line scope check at every route; no per-route boilerplate.
cortex auth — web/app/api/cortex/route.ts swapped from session-only requireSession() to resolveAuth() + requireScope('chat', siteId, 'write'). Dashboard callers (session/ID-token) bypass scope and continue to work unchanged; CLI / 3rd-party api-key callers must hold chat=<siteId>:write. SSE streaming context preserved.
site membership canonical — audited and locked: site membership lives only at users/{uid}.sites[]. The hypothesized inverse sites/{siteId}.members[] does not exist anywhere in the codebase; firestore.rules pins to the canonical model. New getUserSiteIds(uid) helper in web/lib/apiHelpers.server.ts so future callers don't reinvent the read pattern. Decision memo at dev/completed/api-sprint/reference/membership-decision.md.
OpenAPI spec — every new endpoint shipped with its spec entry alongside the route. Total: ~50 public scoped endpoints documented in web/openapi.yaml. Interactive reference at /docs/api (Scalar) and raw JSON at /api/openapi. Three pre-existing Problem → ProblemDetails ref typos fixed during the sprint-close verify.
testing — 1447/1447 web tests passing across 75 jest suites (was 1142 before the sprint, +305 new). 237/237 CLI tests passing across 29 suites (was 108, +129 new). 55 Node SDK tests + 57 Python SDK tests. 47 new Playwright e2e specs across 7 files in web/e2e/specs/api-sprint/. 6 new k6 scripts in load-tests/k6/ covering the highest-traffic new endpoints with smoke/sustained/spike scenarios and per-VU per-iteration unique idempotency keys.
deprecated
- Historical /api/admin/* docs are no longer the public contract. Use the scoped public equivalents (/api/sites/{s}/deployments/*, /api/installer/*, etc.); individual admin aliases may be internal, removed, or dashboard-only depending on the domain.
removed
Nothing. The api-sprint is purely additive — internal callers and existing public roost endpoints are unaffected.
[2.11.1] - 2026-05-05
fixed — display-helper IPC permission denied on non-admin console users
apply_display_topology, enumerate_display_modes, and the related revert / self-test paths failed with [Errno 13] Permission denied: 'C:\WINDOWS\TEMP\owlette_display_apply_*.req.json' on any machine where the active console user wasn't a local admin. After 3 retries the auto-restore breaker latched and stopped attempting drift correction.
root cause: the agent service (running as LocalSystem in Session 0) wrote helper request JSON via tempfile.gettempdir() — which resolves to C:\Windows\Temp for LocalSystem — then spawned the helper as the console user via CreateProcessAsUser. Default ACLs on C:\Windows\Temp deny standard-user reads of SYSTEM-owned files, so the helper couldn't open the request file the service had just written. Worked fine when the console user happened to be a local admin, which is why it didn't surface universally.
fix: every cross-session display-helper IPC file (request, response, stderr) now lives under %PROGRAMDATA%\Owlette\ipc\display\ via a new _ipc_tempdir() helper. The directory's DACL is set explicitly at first use — SYSTEM:Full + Administrators:Full + console-user-SID:Modify, with PROTECTED_DACL inheritance disabled so the installer's users-modify ACE on ProgramData\Owlette doesn't propagate down and let any interactive user tamper with IPC payloads between SYSTEM-write and helper-read.
Also covers the two enumeration paths (--enumerate-json, --enumerate-modes-json) that had the same bug, sweeps stale owlette_display_*.{req,out}.json and *.tmp files >1h old at first use, and adds a distinct DisplayErrorCode.IPC_FAILURE so request-read / response-write / stderr-redirect failures stop getting bucketed as generic bad_request (which was the reason this took 3 retries to surface as something operators could diagnose).
[2.11.0] - 2026-04-25
added — display alert routing (Feature B)
owlette now emits structured display events when monitor topology changes — drift, monitor added/removed/swapped, mosaic disabled, sync lost, apply succeeded/failed, auto-revert fired, apply-refused-mosaic. The new alert pipeline routes these through the existing email + webhook infrastructure with severity-aware delivery.
routing table — single source of truth at web/lib/alerts/displayEventRouting.ts. Severity decisions:
- email + webhook (critical): display_monitor_removed, display_apply_failed, display_auto_revert_fired, display_sync_lost
- webhook only (warning): display_drift, display_monitor_swapped, display_mosaic_disabled, display_apply_refused_mosaic
- dashboard only (info): display_monitor_added, display_apply_succeeded
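The severity decisions above amount to a lookup table. The sketch below restates them in Python for illustration only; the real table lives in web/lib/alerts/displayEventRouting.ts, and the dictionary and function names here are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

# Event id -> severity, copied from the routing list above.
SEVERITY_BY_EVENT: Dict[str, str] = {
    "display_monitor_removed": "critical",
    "display_apply_failed": "critical",
    "display_auto_revert_fired": "critical",
    "display_sync_lost": "critical",
    "display_drift": "warning",
    "display_monitor_swapped": "warning",
    "display_mosaic_disabled": "warning",
    "display_apply_refused_mosaic": "warning",
    "display_monitor_added": "info",
    "display_apply_succeeded": "info",
}

# Severity -> delivery channels, exactly as listed above.
CHANNELS_BY_SEVERITY: Dict[str, FrozenSet[str]] = {
    "critical": frozenset({"email", "webhook"}),
    "warning": frozenset({"webhook"}),
    "info": frozenset({"dashboard"}),
}


def route(event_type: str) -> Tuple[str, FrozenSet[str]]:
    """Return (severity, channels); raises KeyError for unknown event ids."""
    severity = SEVERITY_BY_EVENT[event_type]
    return severity, CHANNELS_BY_SEVERITY[severity]
```

Keeping severity and channels in two small tables means a new event id only needs one new row.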
critical-path bypass — display_monitor_removed and display_auto_revert_fired skip the 3-min digest cron and email inline so operators see them in seconds. Everything else queues to pending_display_alerts and ships via the new /api/cron/display-alerts cron (3 min interval, 2 min accumulation window — same cadence as pending_process_alerts).
operator-caused-drift suppression — agent stamps suppressAlert: true + correlatedApplyId on display events that fire within 90s of a successful apply. The /api/agent/alert route honors the flag: skips email, still fires webhook. Closes the recall-then-N-drift-emails avalanche.
rate limiting — 1 per (machineId, eventType) per hour for most events; display_drift gets a tighter 1 per 4h window because cable-flap drift can fire repeatedly on rack-mount installations.
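A minimal sketch of the per-(machineId, eventType) window described above, assuming a simple "last sent" timestamp per key. The AlertRateLimiter class and its in-memory storage are hypothetical; the window lengths come from this entry.

```python
import time
from typing import Callable, Dict, Tuple

DEFAULT_WINDOW_S = 60 * 60  # 1 alert per hour for most events
WINDOW_OVERRIDES_S = {"display_drift": 4 * 60 * 60}  # tighter 4h window


class AlertRateLimiter:
    """Hypothetical in-memory limiter keyed by (machine_id, event_type)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._last_sent: Dict[Tuple[str, str], float] = {}
        self._clock = clock

    def allow(self, machine_id: str, event_type: str) -> bool:
        window = WINDOW_OVERRIDES_S.get(event_type, DEFAULT_WINDOW_S)
        key = (machine_id, event_type)
        now = self._clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < window:
            return False  # still inside the window: suppress
        self._last_sent[key] = now
        return True
```

Because the key includes the machine id, a flapping cable on one rack does not consume the budget of another machine.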
preferences — new displayAlerts: boolean toggle in account settings (opt-out, defaults to true). 30-day migration banner on /admin/alerts directs existing operators to the new control. Webhook config UI exposes all 10 new event ids as opt-in subscriptions (existing webhooks NOT auto-subscribed).
dashboard surface — new events tab on the display panel renders the last 50 display events for the selected machine, severity-color-badged, with relative timestamps. Subscription opens only when the tab is visible.
added — auto-restore (Feature C)
opt-in per-machine toggle: when enabled, the agent automatically reapplies the stored layout on detected drift instead of waiting for a human to click recall.
state machine — drift detected → 30s topology-check tick → drift persists across 2 ticks (~60s) → fixability check (every drifted edidHash present in assigned) → cooldown clear → spawn worker thread → apply_topology(..., auto_restore=True) → emit display_auto_restore_fired audit. No watchdog (no operator to ack), no sentinel (next topology check re-fires from clean state if anything goes wrong).
circuit breaker — 3 consecutive auto-restore failures trip the breaker (circuitBreaker.tripped: true in the config doc). While tripped, _maybe_auto_restore short-circuits at gate 3 — no thread spawned, no further failures counted. Operator resets via the panel's banner (single click → circuitBreaker.tripped: false, failures: 0). Rate-limited responses (AUTO_RESTORE_RATE_LIMITED) and unfixable skips (AUTO_RESTORE_SKIPPED_UNFIXABLE) do NOT count toward the failure counter.
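The breaker behavior above can be sketched as a small state holder. This is an illustration, not the agent's implementation: the class, the "ok" outcome string, and the rule that a success resets the consecutive-failure counter are assumptions; the threshold and the two non-counting codes come from this entry.

```python
from dataclasses import dataclass
from typing import Optional

TRIP_THRESHOLD = 3
# Outcomes that must not count toward the failure counter.
NON_COUNTING = frozenset(
    {"AUTO_RESTORE_RATE_LIMITED", "AUTO_RESTORE_SKIPPED_UNFIXABLE"}
)


@dataclass
class CircuitBreaker:
    failures: int = 0
    tripped: bool = False
    last_error: Optional[str] = None

    def allows_attempt(self) -> bool:
        """While tripped, no auto-restore attempt should be started."""
        return not self.tripped

    def record(self, outcome: str) -> None:
        if outcome == "ok":  # assumed: success clears the consecutive count
            self.failures = 0
            return
        if outcome in NON_COUNTING:
            return
        self.failures += 1
        self.last_error = outcome
        if self.failures >= TRIP_THRESHOLD:
            self.tripped = True

    def reset(self) -> None:
        """Operator reset: tripped false, failures 0."""
        self.failures = 0
        self.tripped = False
```

The caller checks allows_attempt() before spawning the worker, so a tripped breaker produces neither a thread nor a new failure.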
fixability model — only display_drift triggers auto-restore. _monitor_added (new monitor in live, not in assigned) and _monitor_removed (assigned monitor unplugged) are unfixable by re-applying assigned and would loop infinitely if attempted. The unfixable-skip emits display_auto_restore_skipped_unfixable (info severity) so operators see why the auto-fix didn't fire.
dashboard surface — <Switch> toggle in the panel header (any site member with write access; not admin-gated). When armed, a small "auto" micro-label + green dot renders next to the recall button. When tripped, a red banner appears at the top of the panel body with the last error and a single-click reset. Fleet-view: small red dot next to existing drift indicator on both MachineCardView + MachineListView so operators can spot tripped machines without expanding any panel.
permissions — displays.autoRestore.enabled writable by any site member (matches existing config-doc write rule). Admin role is reserved for cross-site administration, not per-site feature toggles. circuitBreaker.tripped: false reset uses the same gate. Agent retains exclusive write on circuitBreaker.{failures, lastError, lastFailureAt, lastSuccessAt, trippedAt}.
added — remote-apply master kill switch + helper self-test (Wave 6)
Defence-in-depth gate over the apply path so a fresh agent doesn't auto-trust remote layout writes until an operator explicitly opts in.
kill switch — apply_topology now also reads displays.remoteApplyEnabled from the agent's local config. Anything other than literal True rejects the apply with remote apply disabled by config before any locks, audit events, or CCD calls. Distinct from displays.enabled (which gates the entire feature including drift detection); this flag scopes only the write path. Defaults False on fresh installs (generate_config_file); existing installs without the field also read as off.
self-test — test_display_apply command runs the apply helper in read-only mode (QueryDisplayConfig + SetDisplayConfig(SDC_VALIDATE) against the live config — never SDC_APPLY) so operators can verify the helper IPC plumbing (CreateProcessAsUser, env block, atomic response file) works end-to-end on a given machine before flipping the kill switch on. New _self_test_via_user_session service-side wrapper + _helper_self_test_to_json helper-mode entry. Surfaced in the dashboard panel only while remoteApplyEnabled is off.
migration
- New Firestore subcollection: pending_display_alerts/* (digest queue). Created on first display event. No backfill needed.
- New machine config field: displays.autoRestore.{enabled, enabledBy, enabledAt, circuitBreaker}. Default-absent = disabled, breaker reads as not-tripped via the typed default sentinel in useDisplayState.
- New machine config field: displays.remoteApplyEnabled: boolean. Default false on fresh installs and on existing installs with the field missing. Operators must explicitly enable per machine via Firestore (or the upcoming dashboard toggle) before remote apply / auto-restore writes will land. Existing operators relying on the v2.10 apply button must flip this to true after upgrading the agent.
- New user preference: displayAlerts: boolean. Missing field treated as true (opt-out semantics: existing users continue to receive display events until they explicitly disable).
- New cron required: /api/cron/display-alerts every 3 minutes via Railway cron (same cadence as /api/cron/process-alerts). Set the X-Cron-Secret header to the existing CRON_SECRET env var.
- No agent-side migration. v2.10+ agents emit display events through the existing log_event path; pre-2.10 agents simply don't emit them and the dashboard's events tab stays empty for those machines.
[2.10.0] - 2026-04-24
breaking — roost: manifest → version rename + per-roost version numbering
The release-engineering noun renamed end-to-end. What was a manifest (OCI/Docker borrow that confused TD-artist + signage operators) is now a version everywhere: API routes, SDK types, CLI commands, dashboard labels, Firestore sub-collection name, mediaType string, webhook events, error codes. Clean break — no backcompat shim, no redirect layer. v2.10.0 is the cutover release. Pre-2.10 agents cannot speak to a 2.10 web/api (wire protocol changed: manifest_id/manifest_url/folder_id in sync_pull payloads → version_id/version_url/roost_id); upgrade agents in lockstep with web.
rename surface (mechanical)
- API routes: /api/roosts/{id}/manifests/* → /api/roosts/{id}/versions/*. /manifest-url → /version-url. Old paths 404.
- Path param grammar: {manifestId} → {versionRef}. The versionRef resolver accepts six forms (see new capabilities below).
- Field names (camelCase): manifestId → versionId, currentManifestId → currentVersionId, previousManifestId → previousVersionId, targetManifestId → targetVersion (now string | number), manifestUrl → versionUrl, manifestMetadata → versionMetadata. Same renames in snake_case across Python SDK + agent (manifest_id → version_id, etc.).
- TypeScript types: Manifest* → Version* (ManifestSummary → VersionSummary, ManifestDetail → VersionDetail, etc.).
- SDK accessor: client.manifests → client.versions (Node + Python).
- CLI flags: --manifest <id> → --version <ref>; --against <manifestId> → --against <versionRef>; --to <manifestId> → --to <versionRef>.
- Webhook events: manifest.published → version.published. Roost-Event header values follow.
- Error codes: manifest_stale → version_stale, manifest_not_found → version_not_found, plus new version_ref_malformed (400) + version_content_immutable (400).
- mediaType: application/vnd.owlette.manifest.v1+json → application/vnd.owlette.version.v1+json.
- Firestore sub-collection: sites/{s}/roosts/{r}/manifests/{id} → sites/{s}/roosts/{r}/versions/{id}. Migration script at scripts/migrate-manifest-to-version.mjs (idempotent, supports --dry-run + --rollback, handles roost-doc field rename + versionNumber backfill).
- Agent wire protocol: sync_pull command payloads now use version_id / version_url / roost_id (was manifest_id / manifest_url / folder_id). Agent code at agent/src/sync_version.py (was sync_manifest.py).
- Firestore security rule: match /manifests/{manifestId} → match /versions/{versionId} under the roost sub-collection.
new capabilities introduced alongside the rename
- Auto-incrementing per-roost versionNumber: every push to a roost gets a 1-indexed integer (#1, #2, #3...) minted inside a Firestore transaction. Monotonic + gap-free even under concurrent publishes. Surfaced in API responses, dashboard list rows (v3 badge), and CLI output.
- Optional description field per version (≤500 chars, plaintext): commit-message-style "what changed?" annotation. Set on push via SDK options or CLI -m / --description <text>. Editable after publish via PATCH /api/roosts/{id}/versions/{ref}; version content (files, chunks) stays immutable and only description can change. Denormalized to roost doc as currentVersionDescription so the dashboard list renders without N+1 reads.
- Version-addressing resolver (web/lib/resolveVersion.ts): every {versionRef} path param + CLI --to / --against accepts six forms: plain integer (3), #3, v3/V3, stable id (vrs_*), or alias (current/previous/first). Server resolves; SDKs/CLIs forward raw input verbatim.
- New roost roost versions <roostId> CLI subcommand: lists all versions for a roost with #, id, description, createdAt columns. Cursor-paginated, honors --json.
- Dashboard UI: roost rows show current v{N} badge + description preview + relative timestamp. Expanding a row reveals chronological version history with per-row three-dot menu (rollback to this version, copy version id, view files, diff against current). New "+ new version" button inside each expanded panel opens the push modal with name/extract-path/targets locked + pre-populated, so adding a version to an existing roost is one flow distinct from creating a new project. New-roost modal now requires a non-empty name.
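The version-addressing forms listed above can be classified with a few pattern checks. The sketch below is a hypothetical Python rendering, not web/lib/resolveVersion.ts: the function name, the returned tuple shape, the character set allowed after vrs_, and case-sensitive aliases are all assumptions. It only classifies the reference; mapping it to a stored version is left to the server.

```python
import re
from typing import Tuple, Union

ALIASES = ("current", "previous", "first")
# Plain integer, #N, vN or VN; version numbers are 1-indexed.
_NUMBER_RE = re.compile(r"^(?:#|[vV])?([1-9][0-9]*)$")
# Stable ids start with vrs_; the allowed suffix characters are assumed.
_STABLE_ID_RE = re.compile(r"^vrs_[A-Za-z0-9_-]+$")


class VersionRefMalformed(ValueError):
    """Maps to the version_ref_malformed (400) error code."""

    code = "version_ref_malformed"


def parse_version_ref(ref: str) -> Tuple[str, Union[int, str]]:
    """Classify a raw versionRef as ("number", n), ("id", s) or ("alias", s)."""
    text = ref.strip()
    if text in ALIASES:
        return ("alias", text)
    match = _NUMBER_RE.match(text)
    if match:
        return ("number", int(match.group(1)))
    if _STABLE_ID_RE.match(text):
        return ("id", text)
    raise VersionRefMalformed(ref)
```

Because SDKs and CLIs forward the raw input verbatim, a classifier like this only needs to exist in one place on the server.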
migration tooling
- Firestore migration: node scripts/migrate-manifest-to-version.mjs --project dev --dry-run first, then run for real. Idempotent: skips already-migrated roosts. Backfills versionNumber (1, 2, 3...) sorted by createdAt. Renames roost-doc pointer fields. --rollback reverses via the per-run log file.
- Local agent state: SQLite columns renamed in sync-state.db schema. SCHEMA_VERSION was not bumped, so existing dev installs need the file deleted before the new agent boots, or upgrade will throw OperationalError on first INSERT. Acceptable per the 2.10.0 clean-cutover semantics; flagged for installer-side cleanup follow-up.
deferred follow-ups (post-rename)
- R2 physical bucket migration: object keys still live under project-manifests/{roostId}/{versionId} in R2. Code references the prefix string with a marker comment (web/lib/r2Client.server.ts) until a dedicated migration script copies+deletes from project-manifests/ → project-versions/. No wire impact, just a storage-layer rename.
- Browser-app manifest.json (PWA) is intentionally untouched; that's a Next.js metadata convention, not a roost concept.
verification
- 200 functions tests + 1,017 web tests + 28 node SDK tests + 101 CLI tests + 376 agent unit tests all green. Final repo-wide grep returns 17 hits, all in dev/active/roost-version-rename/reference/rename-sweep-allowlist.txt (R2 deferral, PWA reference, verb forms, intentional compat note, defensive test fixtures).
- Migration guide for any dev tester or early SDK user: dev/active/roost-version-rename/MIGRATION.md.
added — roost (project distribution v2)
A content-addressed sync layer replacing v1's single-URL ZIP model. Turns roost into the release-engineering layer: deploy via drag-drop or URL, atomic rollback via pointer flip, dedup at chunk granularity, real retry + resume across tab close.
storage + manifest model
- 4 MiB fixed-chunk content addressing with per-tenant prefix project-content/{siteId}/{hash[0:2]}/{hash} on Cloudflare R2, picked over S3/GCS for free egress, the only economic axis that matters at fleet fan-out.
- OCI Image Manifest v1.1 derivation (application/vnd.owlette.manifest.v1+json), immutable once written. A firestore pointer at sites/{siteId}/roosts/{roostId}.currentManifestId is the only mutable head.
- Schema spec + threat model + v1-to-v2 migration design at docs/internal/{manifest-format,threat-model,v1-v2-migration}.md.
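To make the content-addressing scheme concrete, here is a small sketch of fixed-size chunking and key derivation. The 4 MiB chunk size, SHA-256 hashing, and key layout come from this changelog; the function names are hypothetical, and the reader loop assumes stream.read returns full chunks except at end of file.

```python
import hashlib
import io
from typing import BinaryIO, Iterator, List, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB fixed chunks


def chunk_key(site_id: str, digest_hex: str) -> str:
    """Object key: project-content/{siteId}/{hash[0:2]}/{hash}."""
    return f"project-content/{site_id}/{digest_hex[0:2]}/{digest_hex}"


def iter_chunks(
    stream: BinaryIO, chunk_size: int = CHUNK_SIZE
) -> Iterator[Tuple[str, int]]:
    """Yield (sha256 hex digest, length) per chunk, one chunk in memory at a time."""
    while True:
        block = stream.read(chunk_size)
        if not block:
            return
        yield hashlib.sha256(block).hexdigest(), len(block)


def unique_chunk_keys(
    data: bytes, site_id: str, chunk_size: int = CHUNK_SIZE
) -> List[str]:
    """Keys that would need storing; identical chunks collapse to one key."""
    keys: List[str] = []
    for digest, _length in iter_chunks(io.BytesIO(data), chunk_size):
        key = chunk_key(site_id, digest)
        if key not in keys:
            keys.append(key)
    return keys
```

Since the key is derived from the chunk's hash, two files that share a chunk map to the same object, which is where the chunk-level dedup comes from.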
browser upload pipeline (dashboard)
- ProjectDistributionDialog rebuilt as a two-mode dialog: new deploy (configure a new deployment; source is a sub-choice within: url vs upload files) + history (past manifests + rollback). Shared fields (distribution name, extract path, verify files, target machines, preset bar) apply across both sources. Dialog is mobile-responsive at 375px.
- Off-main-thread chunker + SHA-256 hasher in a web worker (web/lib/chunking.ts); AbortSignal-aware between chunks, streamed-slice memory (O(1) per chunk regardless of file size).
- IndexedDB-backed upload queue (web/lib/uploadQueue.ts) with parallelism (default 4), exponential backoff + jitter, 10-attempt cap. Crashed tabs leave in_flight tasks that get demoted to pending on re-open, so close-and-re-drop resumes from wherever it stopped.
- Pre-upload confirmation (PreUploadSummary) showing file count, total size + dedup preview, est. upload time, per-target disk-free check (warns on unknown, blocks on insufficient + 20% margin), quota check (80% warning, exceed-cap error). Start button gated on blocking checks.
- Rollback confirmation dialog (RollbackConfirmDialog) with file-level diff (added/removed/changed), net byte delta, canary-vs-all-at-once strategy picker (canary default), problem+json error parsing.
- Filename sanitization (web/lib/sanitize.ts): NFC normalisation, strips C0/C1 control chars + zero-width/RTL-override invisibles (ZWSP, LRM/RLM, LRO/RLO, BOM, etc.), windows-canonical trailing-dot/space trim, codepoint-safe 255-char truncation, rejects NUL bytes, path separators, and . / .. / empty-after-clean names.
server-side — web API surface (roost routes)
- New routes at clean paths (no /v2/ prefix, a deliberate decision): POST /api/chunks/{check,upload-urls,download-urls}, GET/POST /api/roosts/{roostId}/manifests, POST /api/roosts/{roostId}/rollback. All 6 currently return 501 notImplementedYet stubs pending R2 wiring.
- RFC 7807 application/problem+json error envelope (web/lib/apiErrors.ts): stable problem-type URIs, per-occurrence requestId for trace correlation, field-level error detail for 400/422. Replaces the legacy {error: string} shape for roost routes.
- OpenAPI 3.1 spec extended (web/openapi.yaml) with roost tag, all 6 paths, ProblemDetails + OciManifest + ManifestSummary schemas, reusable Problem4xx + Problem501 responses, bearer-token scheme for firebase ID tokens. Live docs at /docs/api (Scalar renderer).
- Strict CI drift gate: .github/workflows/openapi-validate.yml runs on PR + push; any undocumented route under /api/chunks/* or /api/roosts/* hard-errors.
server-side — cloud functions
- onRoostWritten + onTargetStateWritten (functions/src/distributionFanout.ts): canary-first fan-out. 10%/floor-1/cap-50 canary cohort via stable FNV-1a hash of machineId + manifestId (deterministic across retries). Abort threshold evaluates against total-not-settled, so a rollout already past 25% failure aborts without waiting for stragglers. Cloudflare 2025-11-18 all-at-once lesson explicitly honored.
- verifyChunk (functions/src/chunkVerify.ts): SHA-256 verify on every R2 PUT via Cloudflare Worker webhook; planted-bytes get deleted + alerted.
- chunkGcNightly (functions/src/chunkGc.ts): two-phase mark-and-sweep with 30-day tombstone TTL. Resurrection guard: a chunk referenced again before TTL elapses has its tombstone cleared (never deleted). CHUNK_GC_MODE=dry-run default for the first production month.
- preUploadCheck + reconcileQuota (functions/src/quotaEnforce.ts): per-tenant storage quota. Admit reserves pending-bytes atomically (concurrent uploads can't both fit when sum > cap). Daily reconcile fires only newly-crossed 50%/80%/100% alarms (no refire at steady state). Pricing tiers: free 5 GB / starter $8 25 GB / pro $15 100 GB / enterprise BYO.
- Telemetry + cost attribution (functions/src/telemetry.ts): R2 pricing model ($0.015/GB-month storage, $4.50/M class-A, $0.36/M class-B, $0 egress), GB-day averaging on storage snapshots (not latest), OTLP-shaped structured logs for downstream collector.
- Append-only audit log (functions/src/auditLog.ts): SHA-256 hash-chained records; deletion + mutation both detectable. 7-year retention, BigQuery cold-storage sink.
- Webhook dispatcher (functions/src/webhookDispatch.ts): 7 event types (distribution.{queued,started,succeeded,failed}, chunk.uploaded, manifest.published, rollback.executed), HMAC-SHA256 signed (sha256=<hex>), retry queue with exponential backoff (5 s × 3 × 1 h cap × ±20% jitter, 10-attempt cap). classifyResponse maps 2xx=success, 408/425/429/5xx=retry, other-4xx=permanent.
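The webhook dispatcher's signing, response classification, and backoff rules can be sketched in a few lines. This is an illustration, not functions/src/webhookDispatch.ts: the function names are hypothetical, the backoff reading (5 s base, factor 3, 1 h cap, ±20% jitter) is one interpretation of the shorthand above, and treating every non-2xx, non-retryable status as permanent is an assumption.

```python
import hashlib
import hmac
import random
from typing import Callable

BASE_DELAY_S = 5.0
FACTOR = 3.0
MAX_DELAY_S = 3600.0  # 1 h cap
JITTER = 0.20  # plus or minus 20%
MAX_ATTEMPTS = 10


def sign(secret: bytes, body: bytes) -> str:
    """Signature value in the sha256=<hex> form."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, header: str) -> bool:
    """Receiver-side check using a constant-time comparison."""
    expected = sign(secret, body).encode("utf-8")
    return hmac.compare_digest(expected, header.encode("utf-8"))


def classify_response(status: int) -> str:
    if 200 <= status < 300:
        return "success"
    if status in (408, 425, 429) or 500 <= status < 600:
        return "retry"
    return "permanent"  # other 4xx (and, by assumption, anything else)


def retry_delay(attempt: int, rand: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retry number `attempt` (1-based)."""
    if not 1 <= attempt <= MAX_ATTEMPTS:
        raise ValueError("attempt out of range")
    base = min(BASE_DELAY_S * FACTOR ** (attempt - 1), MAX_DELAY_S)
    return base * (1 + JITTER * (2 * rand() - 1))
```

A receiver would recompute the signature over the raw request body and compare it with the delivered header value using verify before trusting the payload.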
agent sync pipeline (windows service)
- sync_commands / sync_manifest / sync_downloader / sync_state / sync_assembler / sync_scrub modules (agent/src/) implementing the end-to-end v2 pipeline. ~350 new unit tests covering fetch + diff + cache, range resume (Range: bytes=N-), verify failure, URL refresh, atomic assembly, drift detection.
- Destination allowlist (agent/src/destination_allowlist.py) - fail-closed: empty/missing rejects all writes. Realpath-based; rejects symlinks/junctions (via FILE_ATTRIBUTE_REPARSE_POINT check - catches the CVE-2022-21658 / CVE-2025-4330 class), Windows reserved device names (NUL/CON/PRN/COM1-9/LPT1-9, any case, with or without extension), alternate data streams, path traversal.
- Path-traversal + TOCTOU hardening (sync_assembler.py) - post-rename realpath check catches parent-dir symlink-swap between allowlist validation and the rename landing; suspect files get quarantine-deleted. Sibling-prefix regression (/foo/bar-extra must NOT satisfy root /foo/bar) covered via separator-suffixed prefix match.
- Explicit ACL on extracted files - SYSTEM + Administrators only, inheritance stripped. Fixed a 0x80000000 overflow on Python 3.9 + pywin32 that was silently no-op'ing the ACL hardening in prior builds.
- Long-path support (\\?\ prefix for >260-char paths), throttled progress reporting (every 5% or 30 s, not every chunk), real cancellation (flag checked between chunks, atomic rename completes if in flight, no corrupted files), locale + accented filename support (French/Spanish/German/Nordic accents, CJK, Arabic/Hebrew RTL, emoji with surrogate pairs, NFC/NFD).
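The separator-suffixed prefix match and the fail-closed rule from the allowlist entries above can be sketched like this. It is a minimal illustration with hypothetical names (`is_within_root`, `is_allowed`); the reparse-point, reserved-device-name, and alternate-data-stream checks that the agent also performs are not shown.

```python
import os
from typing import Iterable


def is_within_root(candidate: str, root: str) -> bool:
    """True only if candidate resolves to root itself or to a path beneath it."""
    root_real = os.path.normcase(os.path.realpath(root))
    cand_real = os.path.normcase(os.path.realpath(candidate))
    # Appending the separator stops /foo/bar-extra from matching root /foo/bar.
    root_prefix = root_real.rstrip(os.sep) + os.sep
    return cand_real == root_real or cand_real.startswith(root_prefix)


def is_allowed(candidate: str, allowlist: Iterable[str]) -> bool:
    """Fail-closed: an empty allowlist rejects every destination."""
    return any(is_within_root(candidate, root) for root in allowlist)
```

Because both sides go through `os.path.realpath`, a `..` segment is collapsed before the comparison rather than matched textually.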
security / ops / observability
- Per-site kill switch (sites/{siteId}.roostEnabled) - agent checks before every sync_pull; web routes gate via gateOrProceed() returning 503 problem+json. Fail-open on read error (a transient Firestore blip must not silently disable a customer). 30 s TTL on both sides → flip propagates within 60 s. Matching Python + TS implementations, field-name-pinned on both sides.
- No-token-logs lint gate (scripts/check-no-token-logs.mjs) - scans TS/JS/TSX/MJS + Python for log calls that interpolate auth-token identifiers, handles f-strings and template literals, 6 must-flag + 5 must-pass self-test fixtures. Runs via .github/workflows/no-token-logs.yml on PR + push. Plus an ESLint no-restricted-syntax rule for dev-time IDE feedback.
- SLSA Build Level 3 pipeline (.github/workflows/build-installer.yml, doc at docs/internal/slsa-build-l3.md) - hermetic Windows build with pinned Inno Setup 6.2.2 + Python 3.11, keyless sigstore signing via GitHub OIDC, reusable workflow pinned to slsa-framework/slsa-github-generator@v2.0.0 (not @main, prevents silent chain-of-trust degradation). Verify job runs slsa-verifier v2.6.0 against artifact + provenance before the release ships.
- k6 load-test suite (load-tests/k6/) with SLO targets enforced as thresholds - chunks/check p99 < 200 ms, upload-urls < 500 ms, download-urls < 400 ms, finalize-manifest < 800 ms, rollback < 400 ms. Base reliability gate: http_req_failed < 0.01. Includes a race scenario on finalize-manifest (20 VUs × same expectedCurrentManifestId → P0 CAS regression guard: exactly one 201, rest 412).
- Architecture doc (docs/architecture.md) extended with the roost section: storage layout, manifest format, browser upload pipeline mermaid, agent sync pipeline with per-module links, canary-first rollout algorithm, security floor, explicit restatement of the clean-cutover decision.
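The kill-switch behaviour in the first entry above (30 s TTL, fail-open on read error) can be illustrated with a small cache. The class name `KillSwitchCache` and the injected `read_flag` callable are hypothetical stand-ins for the real Firestore read, and caching the fail-open result for a full TTL is an assumption of this sketch.

```python
import time
from typing import Callable


class KillSwitchCache:
    """Caches a boolean flag for ttl_s seconds and fails open on read errors."""

    def __init__(self, read_flag: Callable[[], bool], ttl_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._read_flag = read_flag      # hypothetical: reads sites/{siteId}.roostEnabled
        self._ttl_s = ttl_s
        self._clock = clock
        self._value = True
        self._expires_at = float("-inf")  # forces a read on first use

    def is_enabled(self) -> bool:
        now = self._clock()
        if now < self._expires_at:
            return self._value
        try:
            self._value = bool(self._read_flag())
        except Exception:
            # Fail-open: a transient read error must not disable the site.
            self._value = True
        self._expires_at = now + self._ttl_s
        return self._value
```

A caller would check `is_enabled()` before each pull and skip the work when it returns False.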
added — playwright e2e suite
A Playwright end-to-end test suite running the full web dashboard against Firebase emulators (Auth :9099, Firestore :8080, Storage :9199). Covers 50+ specs across six phases:
- Phase A - emulator wiring + smoke (4 specs): Admin SDK branch routing, sentinel canary for firebase-admin.ts
- Phase B - auth flows (4 specs): login, logout, signup, HttpOnly session cookie round-trip via page.evaluate(fetch) pattern
- Phase C - account + settings (8 specs): profile update, passkey enrol/delete, preferences, password change with fixture-isolation (dedicated password-test-user prevents Firebase token revocation from poisoning other specs)
- Phase D - dispatch flows (22 specs): reboot, shutdown, kill-process, recall/store/clear display layouts, deployment create/progress/cancel/retry, roost create, rollback auth + validation
- Phase E - time-travel flows (10 specs): page.clock-driven specs for reboot countdown, cancel-lockout threshold, display-apply deadline auto-revert, heartbeat staleness → offline flip, heartbeat recovery via onSnapshot overwrite
- Phase F - CI + hardening: .github/workflows/e2e.yml with Temurin JDK 21, Playwright browser caching keyed on @playwright/test version, firebase emulators:exec wrapper, artifact upload on failure (14-day retention)
Infrastructure highlights: roleState() helper for pre-authenticated specs, stubCommand / completeCommand agent-stub helpers, seedMachine + seedBaseline deterministic seeding, per-spec page.clock.install({ time: Date.now() }) pattern (must precede page.goto for React's setInterval to bind to the fake clock). Full guide at web/e2e/README.md.
decisions locked (roost)
- No /api/v2/ URL prefix - the new routes ARE the API.
- No backwards compatibility with v1 agents - clean cutover, the v2.10.0 agent is a hard requirement. No dual-write window, no shadow-read, no project_url fallback. Operators re-roost on v2; existing v1 distributions end at cutover.
- No header-based versioning (no Accept: application/vnd.owlette.v2+json).
- v3-deferred (do NOT rebuild in v2): bidirectional sync, LAN swarm, Ed25519 manifest signing, public CLI + GitHub Action, FastCDC, chaos rack.
changed
- Three-role permission model - member / admin / superadmin replaces the two-tier user / admin scheme. Superadmins retain platform-wide god-mode (user management, installer uploads, access to every site regardless of assignment). The new admin tier is site-scoped: site admins get elevated rights only on the sites in their sites[] - they can edit site config, delete machines, and manage display layouts without holding any platform-level powers. Members keep standard site-scoped access.
- User-management page redesigned for the new model: role selector is now a three-option dropdown (with icons, colour, and inline descriptions of each role's capabilities); stats cards show per-role counts; admin rows display the specific sites each admin is responsible for (small pills, easy to scan). Self-demotion guard narrowed - superadmins are still blocked from self-demotion (platform-lockout risk), but admins can demote themselves since no cross-site powers are in play.
- Superadmin visual indicator - small red "superadmin" Crown pill appears next to the user avatar on every authenticated page when signed in as a superadmin. Signals god-mode so routine site ops don't accidentally use elevated access.
migration
- Deploy order is load-bearing: run node scripts/migrate-roles.mjs --env=<dev|prod> first, then firebase deploy --only firestore:rules, then the web deploy. The migration flips existing role: 'user' → 'member' and role: 'admin' → 'superadmin' idempotently; supports --dry-run. Existing admins become superadmins automatically (semantics preserved). The new site-scoped admin tier starts empty - superadmins promote members via the user-management page. Reversed deploy order would transiently lock current admins out of their sites until the migration runs.
- scripts/migrate-profiles.mjs - one-shot bootstrap script for the multi-device metrics schema (shipped at 2.8.1). Iterates every sites/*/machines/* doc and writes a best-effort hardware/profile subdoc from legacy singular metrics.cpu/disk/gpu/network.interfaces fields. Skips machines that already have a profile or are on schemaVersion 2. Idempotent; supports --env=<dev|prod>, --site=<id|all>, --dry-run, --force. Useful for offline/stale fleets that haven't upgraded to the 2.8.1+ agent yet - gives the dashboard something renderable until the agent overwrites the bootstrap on its next startup.
[2.9.0] - 2026-04-18
added
- Per-logical-volume disk IO monitoring - agent now collects per-drive read/write throughput, IOPS, and busy% via WMI's Win32_PerfFormattedData_PerfDisk_LogicalDisk. Each drive (C:, L:, etc.) reports its own rates instead of one system-wide aggregate; selecting a different drive in the dashboard shows that drive's specific IO. New Firestore field metrics.diskio is keyed by volume id (mirrors metrics.disks shape). History samples carry per-volume entries under sample.dios = [{i, rb, wb, bu}].
- Disk IO surfaces on machine cards and list rows - selected drive's read/write rates shown as stacked r <rate> (green) / w <rate> (orange) lines, hidden when idle. Auto-scales between B/s, KB/s, MB/s, GB/s. List-view disk cell widened to 160px with a 2-column layout (usage stats left, IO right).
- Disk IO chart series in the metrics detail panel - per-volume sub-toggles for read/write/busy% with volume-qualified labels (C: read, L: busy). Read/write render on the hidden axis (throughput); busy% shares the default 0-100% axis. Tooltip and stats grid both volume-qualified.
- Friendly GPU names in the detail panel - UUID stays as the chart-data key (stable, unique) while toggle labels, chart legend, stats grid, and tooltip all show "NVIDIA GeForce RTX 2080 Ti" via a lookup against the static profile.
changed
- Network cards now use arrow notation (↑ <rate> / ↓ <rate>) instead of TX / RX text on both card and list views - language-independent, more compact, and a cleaner visual rhythm with the new disk r/w letters.
- Metric cell clicks SWAP the detail panel instead of merging - clicking disk then GPU shows only GPU lines, not both. Click expansion still adds every per-device id (all disks, all GPUs, all NICs) so the panel shows all devices of the clicked type.
- Toggles can deselect everything - the "must keep at least one selected" guard is gone. Empty chart is a valid state; the stats grid hides cleanly.
- Stats grid headers lowercased (avg / max / min) for consistency with the rest of the panel's UI copy.
fixed
- MetricsDetailPanel chart now re-measures when the tab becomes visible - Recharts' ResponsiveContainer would occasionally hold a stale width while the tab was backgrounded (rAF + ResizeObserver throttling), then render the plot area offset to the right with blank space on the left. A synthetic window.resize event on visibilitychange → visible forces all charts to re-measure.
- Watchdog timeouts actually unblock the metrics loop - both _wmi_logical_disk_with_timeout (disk IO) and _disk_usage_with_timeout (disk partitions) used with ThreadPoolExecutor, whose __exit__ calls shutdown(wait=True), blocking forever on a hung WMI/network-mount worker. Switched both to manual lifecycle with shutdown(wait=False, cancel_futures=True) so a hung worker leaks instead of stalling.
- NSSM runner crash on first metrics tick - MockService in owlette_runner.py was missing four attributes that OwletteService.__init__ defines (_display_check_counter, _cached_display_hash, _shutting_down, _reboot_attempt_started_monotonic). The display topology check at the top of every loop iteration crashed with AttributeError, causing NSSM to thrash-restart. Added the missing initializations plus two more (_last_status_signature, _last_status_write_time) for the status-throttle path.
- NaN-guarded disk IO history extraction - cloud function now uses Number.isFinite() checks before pushing IO entries to a sample. Prevents a poisoned NaN field (rare but possible from upstream perf-counter glitches) from causing Firestore to reject the entire history write.
- Disk IO detail-panel UX rebuild - toggle list collapsed from a wall of C: read / C: write / C: busy% / HarddiskVolumeN ... labels (12+ buttons on a typical machine) to two camps with icons: <HardDrive> C: (storage / disk usage %) and <ArrowDownUp> C: (activity - when on, plots both read% and write% lines). HarddiskVolumeN raw partitions are filtered out at the agent so they never enter Firestore in the first place; existing entries fall out of the live doc on the next metrics upload (Firestore dot-notation replaces the field). Read/write rates are now plotted as % of max bandwidth on the same 0-100 axis as everything else - agent ships a hardware-class maxBps per volume (NVMe ≈ 3.5 GB/s, SATA SSD ≈ 550 MB/s, HDD ≈ 150 MB/s, etc., detected via MSFT_PhysicalDisk at first call) that ratchets up on observed peaks.
- Disk IO watchdog timeout 2s → 10s - the old 2s budget skipped the perflib LogicalDisk stalls that occur when the BITS service flips state during Windows Update / Delivery Optimization polling, causing ~12% of metrics ticks to report empty disk IO. Empirical: 2s and 5s both still skipped (3.6-3.7 timeouts/hr, perfectly spaced ~16 min apart matching SCM 7040 BITS demand↔auto cycles); the perflib provider lock during a BITS state change consistently exceeds 5s. 10s captures them - verified at zero timeouts over 23 min observation post-fix. The metrics loop runs in its own thread (not the main service loop) so a 10s WMI call doesn't stall anything else. (A persistent-worker variant of the WMI call was tried first but reproducibly triggered RPC_E_WRONG_THREAD on every call after the first - the Python wmi package binds proxies to the apartment that created them, and reusing a cached proxy from a long-lived thread is incompatible with how the package hands off to the COM runtime. The per-call pattern stays.)
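The manual executor lifecycle described in the watchdog fix above can be sketched as follows. `call_with_timeout` is a hypothetical helper, not the agent's `_wmi_logical_disk_with_timeout`; it shows only why `shutdown(wait=False, cancel_futures=True)` returns promptly where the `with` form would block.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(fn, timeout_s, default=None):
    """Run fn() on a worker thread; return default if it overruns timeout_s."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return default
    finally:
        # wait=False: do not join the worker, so a hung call cannot stall the caller.
        # cancel_futures=True (Python 3.9+): drop anything still queued.
        executor.shutdown(wait=False, cancel_futures=True)
```

A call such as `call_with_timeout(lambda: time.sleep(60), 10.0, default={})` returns `{}` after about ten seconds. The worker thread is not killed; it keeps running until the underlying call returns, which is the leak the changelog entry accepts in exchange for an unblocked loop.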
[2.8.1] - 2026-04-14
added
- Per-device metrics schema (v2) - metrics now key cpus/disks/gpus/nics by stable device id instead of the old singular cpu / disk / gpu fields. Each machine publishes a hardware/profile subcollection document (schemaVersion 1) describing its physical devices; heartbeats reference those ids and include a primary pick per kind (selected by busyness with 5% hysteresis so the display doesn't flicker). Multi-GPU rigs, multi-disk setups, and multi-NIC hosts now render every device instead of collapsing to the first one.
- Per-user device selection preferences - dashboard list and card views remember which device each user wants to see per machine, persisted to users/{uid}/devicePrefs/global in Firestore (no localStorage). The list view shows a column-header dropdown when any visible machine has >1 of a kind; each row falls back to its own primary when the selection isn't present locally.
- deviceResolvers utility - resolveDevice, shouldShowDeviceDropdown, and unionIds helpers with unit tests; shared by list view, card view, and the metrics detail panel.
- Metrics detail panel persistence - selected metric tabs (and selected NIC, when relevant) now persist per machine via userPreferences.graphTabs, so refreshing or switching sites no longer resets your view.
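The "primary pick with 5% hysteresis" from the schema entry above can be read as: keep the current primary unless another device is busier by more than the margin. The sketch below assumes the margin is in percentage points and uses a hypothetical `pick_primary` function; the agent's actual selection logic may differ.

```python
from typing import Mapping, Optional

HYSTERESIS_PCT = 5.0  # assumed to be percentage points of busyness


def pick_primary(busy_pct: Mapping[str, float], current: Optional[str]) -> Optional[str]:
    """Choose the primary device id, switching only on a clear lead."""
    if not busy_pct:
        return None
    busiest = max(busy_pct, key=lambda device_id: busy_pct[device_id])
    if current not in busy_pct:
        # No previous pick, or the device disappeared: take the busiest.
        return busiest
    if busy_pct[busiest] - busy_pct[current] > HYSTERESIS_PCT:
        return busiest
    return current
```

With `{"gpu0": 50, "gpu1": 53}` and `gpu0` as the current primary the pick stays on `gpu0`; at `{"gpu0": 50, "gpu1": 56}` it moves to `gpu1`.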
changed
- Agent hardware profile is built on-device (hardware_profile.py) with signature-hash change detection and a 5-minute rate-limit gate; only re-uploads when hardware actually changes. Gate stamps its timestamp before build_profile() so a persistent WMI/disk failure doesn't storm the heartbeat loop.
- shared_utils.get_system_metrics_with_config retained its legacy snake_case shape (cpu / memory / disk / gpu / network) for in-process consumers (mcp_tools, report_issue, tray GUI) while also carrying the camelCase keys the v2 uploader needs; skip_gpu is honored again to keep the tray GUI free of nvidia-smi console-window flashes.
- WMI calls from the metrics thread now initialize COM via pythoncom.CoInitialize() so per-socket CPU detection works on dual-socket workstations instead of silently falling through to the psutil fallback.
- Legacy singular metrics fields are deleted on v2 upload (metrics.cpu / metrics.disk / metrics.gpu → DELETE_FIELD) so doc size doesn't grow with both schemas side by side.
fixed
- Cloud metricsHistory function reads v2 per-device maps (via primary + first-entry) with v1 fallback, so sparklines and threshold alerts keep working across the rollout window instead of flatlining the moment a v2 agent reports.
- shimLegacyMachine no longer clobbers a real profile during the mixed-version window when a v2 agent has uploaded its profile doc but its next metrics write is still legacy-shaped.
- hardware_profile._mac_for return type tightened to Optional[str].
[2.8.0] - 2026-04-12
added
- Per-machine Cortex kill switch - operators can toggle Cortex off on a specific machine from the Cortex header (CortexPowerToggle). The agent reads the cortexEnabled flag before every poll and, when disabled, rejects pending messages with a clear error back to the web UI instead of executing tool calls. Firestore security rules were extended to allow dashboard writes to cortexEnabled without granting write access to agent-only fields (online, lastHeartbeat).
- Profile photo upload/remove - users can upload an avatar from account settings (AccountSettingsDialog). Photos are stored in Firebase Storage at users/{uid}/avatar.jpg and surfaced throughout the UI via the new UserAvatar component (Cortex chat bubbles, page header). New storage.rules grant each user read/write access to their own avatar path only.
- Copy-message button in Cortex chat - hover-revealed CopyButton on each chat bubble copies the full message text (cortex or user) to the clipboard.
- Cortex suggested-question additions and new docs/dashboard/timezones.md reference page.
changed
- Relicensed from AGPL-3.0 to FSL-1.1-Apache-2.0 (Functional Source License, Version 1.1, Apache 2.0 Future License). Self-hosting, internal use, non-commercial use, and professional services remain freely permitted; only competing commercial products or services are restricted. Each release automatically converts to Apache License 2.0 two years after it is made available. Copyright holder is The Experiential Company.
- Updated license references across web dashboard footers, landing page, terms page, OpenAPI spec, docs site footer, agent GUI, and repository README to reflect the new license.
- run_powershell allow-list removed - the first-token regex was security theater (a semicolon-chained Get-Date; Remove-Item bypassed it trivially) and caused constant false rejections on legitimate multi-statement scripts (foreach, if, try, $var = ...). Accountability now comes from the Firestore audit trail (cortex-events + site logs) and the [MCP-AUDIT] local log, which captures a 500-char preview of every script. run_command retains its binary allow-list.
- Tier 3 audit previews expanded from 100 → 500 chars for script-bearing tools (run_powershell, execute_script). One-line commands still truncate at 100. Multi-line PowerShell/Python bodies are now actually readable in the site log instead of showing only the first line.
- mcp_tool_call exempted from the per-type command rate limit. Cortex fires tool calls in parallel by design and is already authenticated + audit-logged per call; a 5-second throttle broke parallel queries. Other command types remain rate-limited.
- get_event_logs_filtered wrapped in try / catch - "no matching events" now returns [] cleanly instead of surfacing as a non-zero exit with empty stderr.
- sync-versions.js now updates the shields.io README version badge, the "Current" lines in docs/internal/version-management.md, and "Last Updated" date stamps automatically.
fixed
- _get_agent_health MCP tool was calling a HealthProbe() API that no longer existed; rewired to the current HealthProbe(config_path, api_base).run() shape and now returns status, error_code, error_message, checked_at, and checks alongside version/hostname/uptime.
[2.7.0] - 2026-04-11
added
- 14 new MCP tools for Cortex - purpose-built Tier 2 admin tools with validated parameters. Eliminates most fallbacks to execute_script for common sysadmin scenarios and produces cleaner chat UX (one green tool call instead of 2-4 red-then-green retries).
  - manage_process - kill / suspend / resume any OS process by name or glob pattern. Refuses to touch critical system processes (lsass, winlogon, csrss, etc.).
  - manage_windows_service - full services.msc parity: start / stop / restart / pause / continue / set_startup / set_recovery / get_details. set_recovery configures the auto-restart-on-crash safety net (first/second/subsequent failure actions, restart delay, reset counter). get_details returns status, startup type, binary path, dependencies, and recovery config in one call.
  - configure_gpu_tdr - set Windows GPU TDR (Timeout Detection and Recovery) registry values (TdrDelay, TdrDdiDelay). Critical for TouchDesigner/Unreal workloads with heavy shaders.
  - manage_windows_update - pause/resume + full scheduling: set_active_hours, set_scheduled_install, set_restart_deadline, set_feature_deferral, set_quality_deferral.
  - manage_notifications - suppress Windows toast notifications, enable Focus Assist (priority_only/alarms_only), disable notifications per-app. Essential for kiosks so Windows/Teams/Defender toasts don't appear on exhibit displays.
  - configure_power_plan - set power plan + disable sleep/hibernate/screen blanking. Required for every 24/7 unattended installation.
  - check_pending_reboot (Tier 1) - detect whether a reboot is pending and why (Windows Update, CBS, pending file renames, SCCM).
  - manage_scheduled_task - full taskschd.msc parity: list / enable / disable / delete / run_now / stop / create / get_details / get_history. create supports full trigger schema (boot/logon/once/daily/weekly/on_event/on_idle), run-as principals (SYSTEM/LOCAL_SERVICE/NETWORK_SERVICE), and settings (start_when_available, restart_count, execution_time_limit, multiple_instances, etc.).
  - network_reset - flush_dns / renew_ip / restart_adapter / reset_winsock.
  - registry_operation - allowlisted registry read / write / delete. Explicit allowlist of safe prefixes (Winlogon, GraphicsDrivers, WindowsUpdate, Notifications, Power, Services); SAM / SECURITY / Cryptography hives blocked.
  - clean_disk_space - clean temp / windows_temp / prefetch / recycle_bin / owlette_logs with age filter and dry-run mode.
  - get_event_logs_filtered - fast event log queries via Get-WinEvent -FilterHashtable. Orders of magnitude faster than the older get_event_logs when filtering by process, event ID, or time window.
  - manage_windows_feature - add / remove / list Windows Optional Features, Capabilities, or AppX packages. Removes OneDrive / Xbox Game Bar / Cortana / Teams consumer bloat during kiosk provisioning. Critical Windows features are blocklisted.
  - show_notification - display an on-screen toast or modal message (opposite of manage_notifications). Useful when a tech is physically nearby.
- Background reboot-pending auto-alert - agent checks for pending reboot every 15 min and emits a site event via the existing /api/agent/alert pipeline. Dashboard admins see "Reboot pending on [machine]" alerts; configured email/webhook alerts fire automatically. Idempotent - alerts once per pending-state transition.
changed
- Cortex CLAUDE.md - new "Prefer Tier 2 tools over Tier 3" section with a mapping table. Cortex will now reach for purpose-built tools first and fall back to execute_script only for novel tasks.
- All new Tier 2 tools emit structured [MCP-AUDIT] log entries for security monitoring.
- mcp_tools.py gained _CRITICAL_PROCESSES blocklist and _SAFE_REGISTRY_PREFIXES allowlist as hardcoded safety helpers.
security
- manage_process refuses to kill critical Windows processes (lsass, winlogon, csrss, services, etc.) and Owlette's own service - can't be bypassed via tool parameters.
- registry_operation has explicit allowlist + blocklist; hives like SAM and SECURITY are unreachable regardless of params.
- manage_windows_feature has a hardcoded blocklist of features Owlette itself depends on (NetFx4, WMI-, PowerShell, core networking).
- Scheduled task creation uses argument-list subprocess calls and PowerShell string escaping to prevent command injection via task names, programs, or arguments.
[2.6.6] - 2026-04-11
fixed
- Screenshot capture restored - the v2.6.5 run_python sandbox blocked internal screenshot callers (mss, PIL, os imports denied), breaking the dashboard screenshot panel, Cortex capture_screenshot tool, crash screenshots, and live view. execute_in_user_session now accepts a trusted=True flag for first-party callers that bypasses the sandbox; the LLM-facing run_python MCP tool remains sandboxed (trusted=False default).
[2.6.5] - 2026-04-11
security
- MCP tool hardening - run_command now uses shell=False with shlex.split(posix=False) to prevent shell injection via metacharacters (&&, |, ;). Allowlist still validates the first token.
- Python sandbox - run_python restricts __builtins__ (no eval, exec, compile, os, subprocess). Imports gated to safe stdlib modules only (math, json, re, datetime, etc.). open() and getattr are allowed for file I/O and introspection.
- File I/O path validation - read_file and write_file MCP tools validate paths against allowed directories (Owlette data, user profile, temp, configured process dirs). Blocks system directory access and path traversal.
- PowerShell audit logging - execute_script, run_command, run_python, read_file, write_file now emit [MCP-AUDIT] log entries for security monitoring.
- Token exchange race fix - registration code exchange uses two-phase pattern: validate → generate tokens → atomically mark used. Prevents burning codes on transient Firebase Auth failures.
- Token refresh race fix - refresh endpoint wrapped in Firestore transaction to prevent concurrent requests from creating inconsistent token state.
- Command field allowlist - commands/send API filters data fields per command type before writing to Firestore. Prevents field injection (e.g., overriding timestamp or status).
- Error sanitization - 60 API routes now use centralized apiError() helper that returns generic messages in production, hiding Firebase internals and stack traces.
- Firestore rules hardened - site creation requires owner == auth.uid, agent logs require machineId == token.machine_id, chat creation requires userId == auth.uid (with autonomous exception).
- HSTS header - Strict-Transport-Security: max-age=31536000; includeSubDomains added.
- Redirect validation - login and MFA pages validate redirect/return URLs (must be relative paths, blocks protocol-relative // redirects).
- Cortex nonce dedup - autonomous endpoint accepts optional nonce field to prevent replay attacks.
- Deployment checksum validation - API validates SHA-256 format when provided; agent already enforces verification.
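The run_command hardening in the first entry above (no shell, non-POSIX tokenisation, first-token allowlist) can be sketched as below. The `ALLOWED_BINARIES` contents and the function names are hypothetical; the agent's real allowlist is not given in this changelog.

```python
import shlex
import subprocess
from typing import List

# Hypothetical allowlist for illustration only.
ALLOWED_BINARIES = {"ping", "ipconfig", "hostname"}


def parse_command(command_line: str) -> List[str]:
    """Tokenise without a shell and validate the first token."""
    # posix=False keeps backslashes in Windows paths intact; quotes stay on tokens.
    argv = shlex.split(command_line, posix=False)
    if not argv:
        raise ValueError("empty command")
    binary = argv[0].strip('"').lower()
    if binary not in ALLOWED_BINARIES:
        raise PermissionError(f"binary not allowed: {argv[0]}")
    return argv


def run_command(command_line: str, timeout_s: float = 30.0) -> subprocess.CompletedProcess:
    """Execute the parsed argv directly; metacharacters reach the binary as plain arguments."""
    argv = parse_command(command_line)
    return subprocess.run(argv, shell=False, capture_output=True, text=True, timeout=timeout_s)
```

With `shell=False`, an input such as `ping -n 1 127.0.0.1 && whoami` hands `&&` and `whoami` to ping as literal arguments instead of starting a second command.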
changed
- Updated next 16.2.1 → 16.2.3 (Server Component DoS fix)
- Updated iron-session 8.0.3 → 8.0.4 (cookie out-of-bounds char fix)
- Updated cryptography 41.0.7 → ≥44.0.0 (hygiene; CVE-2023-50782 not exploitable in Fernet-only usage)
- -ExecutionPolicy Bypass retained on execute_script (required for Group Policy-hardened kiosks)
[2.6.4] - 2026-04-10
added
- Session state classifier - new session_state.py module persists shutdown intent and last-alive timestamps across reboots. On startup, the agent classifies how the previous session ended and emits a warning event to the dashboard for anomalous shutdowns:
  - external_reboot - operator or Windows Update restarted the machine outside Owlette
  - unexpected_reboot - no shutdown signal detected (BSOD, power loss, hard reset)
  - unexpected_service_restart - agent process crashed or was killed (NSSM auto-restart)
  - Silent for Owlette-initiated reboots/shutdowns, version upgrades, and first runs
- Tooltips on event log action and detail text - truncated entries now show full text on hover (shadcn Tooltip, matching dashboard style)
changed
- Reboot event log details cleaned up — human-readable info leads, entry UUID shortened to 8-char prefix for correlation (was full 36-char UUID)
[2.6.3] - 2026-04-09
fixed
- CRITICAL: scheduled reboot scheduler no longer fires entries that are more than 5 minutes late. A 5-minute "missed-fire grace window" silently skips any entry observed past its scheduled instant + 5 min, marks it as fired-for-the-day, and logs a scheduled_reboot_missed event to the dashboard. Previously, if the agent restarted (or a schedule entry was edited) AFTER the scheduled time, the agent would catastrophically fire the missed reboot the next time it observed the entry - hours late, with no warning, with no chance for the operator to cancel. This was the cause of an unintended reboot that destroyed days of in-progress rendering work on a dev machine
- Scheduled reboot scheduler no longer retries failed reboots. The previous 3-attempts-with-7-min-timeout retry loop is gone - a failed reboot is logged and dropped, never re-fired automatically
- Scheduled reboot scheduler now resolves entry times against the machine's local timezone, not the site timezone. A 14:00 entry on a Tokyo installation and a 14:00 entry on a New York installation in the same Owlette site now reboot at their respective local 14:00s, not synchronized to one shared timezone. The dashboard reboot-schedule editor was always timezone-agnostic - the agent was incorrectly applying the site timezone
- Tray icon "Exit" now actually stops the Owlette service. Previously it wrote a tmp/shutdown.flag file that the service detected and exited from, but NSSM's AppExit Default Restart immediately re-started the service, so Exit was a no-op the user couldn't see. Exit now triggers a UAC-elevated net stop OwletteService via the Service Control Manager, which is the only stop NSSM respects
- Latent firestore_rest_client.set_document(merge=True) bug fixed. When called without any SERVER_TIMESTAMP fields (e.g. every set_machine_flag() call), the function silently sent a PATCH without updateMask.fieldPaths, which the Firestore REST API treats as a full document REPLACEMENT - every field not in the request body was DELETED. This wiped lastHeartbeat, online, and metrics from the machine doc on every flag write, then the next _upload_metrics call (within ~1s) silently restored them. The bug had been latent in the codebase for an unknown length of time - it only became visible when the new atomic startup flag-clear in this same release made the wipe gap large enough for the dashboard pill to flicker offline. Now correctly sends updateMask when merge=True
- Dashboard MachineStatusPill heartbeat parser hardened to handle every shape Firebase JS SDK v12 can return for a Timestamp field - Timestamp instance via .toMillis(), plain {seconds, nanoseconds} from cache rehydration, legacy {_seconds, _nanoseconds}, JS Date, plain number, ISO string. Previously, only the strict Timestamp instance shape was recognised; cache rehydrations dropped silently to 0, which the staleness check then treated as "infinitely stale", flipping the dashboard pill offline
- Dashboard rebootMachine() and shutdownMachine() now write configChangeFlag: true alongside the optimistic countdown anchor. Without it, the firestore.rules allow update predicate rejected the write, so the optimistic countdown never appeared on the dashboard until the agent's own write came through (~5-10s later) - the perceived "the cancel button only appears right before the restart" bug
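The missed-fire grace window from the first entry in this list can be sketched as a small decision function. The names `decide` and `GRACE` are hypothetical, and the four outcome strings are illustrative labels rather than the agent's actual states.

```python
from datetime import datetime, timedelta

GRACE = timedelta(minutes=5)  # missed-fire grace window


def decide(scheduled_at: datetime, now: datetime, fired_today: bool) -> str:
    """Return 'skip', 'wait', 'fire', or 'missed' for one schedule entry."""
    if fired_today:
        return "skip"      # already handled today, including earlier misses
    if now < scheduled_at:
        return "wait"
    if now - scheduled_at <= GRACE:
        return "fire"
    # Too late: the caller marks the entry fired-for-the-day and logs
    # scheduled_reboot_missed instead of rebooting.
    return "missed"
```

An agent that restarts at 16:30 and then observes a 14:00 entry gets `missed` rather than `fire`, which is the behaviour change the entry describes.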
changed
- Reboot countdown anchors rebootScheduledAt and shutdownScheduledAt are now Unix-seconds NUMBERS representing the TARGET reboot time (when the OS will actually restart). Previously they were Firestore server timestamps representing the START of a fixed 30-second countdown. The new semantic supports the new agent-side announce → 5-second pre-roll → 60-second OS countdown sequence, and the dashboard pill renders the countdown the moment the listener fires (no second round trip required)
- Agent reboot scheduler state file moved from C:\ProgramData\Owlette\state\reboot_state.json to C:\ProgramData\Owlette\tmp\reboot_state.json, alongside the existing app_states.json and service_status.json. The unused state\ directory is removed entirely
- Tray Exit no longer writes tmp/shutdown.flag. The flag handler in the service main loop is also removed (it was dead code - see "fixed" above)
added
- New firebase_client.set_machine_flags(dict) helper for atomic multi-field writes to the machine doc. Used by the new reboot announce path so the dashboard sees rebootScheduledAt + rebooting + rebootSource + rebootCancellable + rebootEntryId in a single listener tick instead of multiple intermediate states. Raises on failure (unlike the silent-log set_machine_flag) so callers can react
[2.6.2] - 2026-04-08
added
- Live cancel-reboot countdown on the dashboard - the status pill becomes a red pulsing MM:SS timer the moment a reboot or shutdown starts, anchored to a server-side rebootScheduledAt / shutdownScheduledAt timestamp so all viewers stay in sync and the countdown survives page refreshes
- Hover the countdown pill to reveal a "cancel" affordance; clicking it sends cancel_reboot and the pill returns to "online" once the agent confirms
- Context menu adapts during a pending reboot/shutdown - the reboot/shutdown items are replaced with a single red "cancel reboot" / "cancel shutdown" item, keeping the discoverable cancel path intact for users who don't notice the pill
- Final-5-seconds safety: pill becomes non-clickable and shows "rebooting…" because Windows shutdown /a is unreliable in the final phase
- Scheduled (cron) reboots now write the same countdown anchor, so they get the same live timer + cancel UX as manual reboots
- New shared MachineStatusPill component used by both list and card views - eliminates the duplicated status badge JSX and the now-redundant standalone cancel button in card view
fixed
- Reboot/shutdown confirmation dialogs no longer claim "you can cancel during the countdown" without exposing any cancel UI — copy now reads "you'll have 30 seconds to cancel from the dashboard"
- rebooting and shuttingDown flags written by the agent are now actually read by the dashboard's Firestore listener — they were declared on the Machine interface but never propagated, so the existing "rebooting…" amber pill never displayed in production
[2.6.1] - 2026-04-05
added
- Batched process alert emails — crash alerts are queued for 2 minutes then grouped by site into a single digest email, preventing spam when multiple machines restart simultaneously
- New cron endpoint /api/cron/process-alerts drains the alert queue every 3 minutes and sends one digest email per site with a table of all affected machines/processes
- Digest email adapts: single alert uses the familiar single-process layout, multiple alerts show a grouped table
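The queue-then-group behaviour above might look like the following sketch. The alert field names (site_id, machine, queued_at) and the function name are hypothetical; only the 2-minute delay and the per-site grouping come from the entries.

```python
from collections import defaultdict
from typing import Dict, List

QUEUE_DELAY_SECONDS = 120  # alerts wait 2 minutes before they are eligible


def drain_ready_alerts(queue: List[dict], now: float) -> Dict[str, List[dict]]:
    """Remove alerts older than the delay from queue and group them by site."""
    ready = [a for a in queue if now - a["queued_at"] >= QUEUE_DELAY_SECONDS]
    # Keep younger alerts queued for a later run.
    queue[:] = [a for a in queue if now - a["queued_at"] < QUEUE_DELAY_SECONDS]
    digests: Dict[str, List[dict]] = defaultdict(list)
    for alert in ready:
        digests[alert["site_id"]].append(alert)
    return dict(digests)
```

Each key in the returned mapping corresponds to one digest email. A site with a single entry would use the single-process layout, and a site with several would use the grouped table.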
fixed
- Owlette's built-in process restart (kill-and-relaunch for hung processes) no longer triggers false "process crashed" emails — writes KILLED status before terminating
- Scheduled and dashboard-initiated machine reboots/shutdowns no longer trigger crash alert emails — agent suppresses alerts during the shutdown window
- Removed unused buildProcessAlertEmail from alert route (moved to cron digest)
[2.6.0] - 2026-04-04
added
- Sentry error monitoring for both web dashboard and agent — captures unhandled exceptions with full stack traces, machine identity tags (hostname, site_id, project_id), and user context
- Agent Sentry events include structured machine context (hostname, site, version) for quick triage
- Sentry tunnel route on web to bypass ad-blockers
- sentry config section in agent config.json (disabled by default, preserved during Firestore sync)
fixed
- Firestore config sync now preserves local-only keys (firebase, sentry) via LOCAL_ONLY_KEYS
- Sentry init added to NSSM runner (owlette_runner.py) — previously only in unused OwletteService.__init__
- Pre-existing stale test failures fixed: apiAuth NextRequest type, installer_utils terminate vs kill, shared_utils config/metrics API mismatches, removed deleted sanitize_process_name test
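The local-only key preservation mentioned in this list could work along these lines. This is a sketch under assumptions: the function name and the tuple form of LOCAL_ONLY_KEYS are hypothetical, and only the key names firebase and sentry come from the entry.

```python
from typing import Any, Dict

LOCAL_ONLY_KEYS = ("firebase", "sentry")  # key names taken from the entry above


def apply_remote_config(local: Dict[str, Any], remote: Dict[str, Any]) -> Dict[str, Any]:
    """Return the remote config with local-only sections carried over from local."""
    # Drop any local-only keys the remote copy happens to contain.
    merged = {k: v for k, v in remote.items() if k not in LOCAL_ONLY_KEYS}
    for key in LOCAL_ONLY_KEYS:
        if key in local:
            merged[key] = local[key]
    return merged
```

With this shape, a sync from Firestore can replace process settings freely while the machine's own firebase and sentry sections stay exactly as they were on disk.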
[2.5.9] - 2026-04-03
added
- Rich startup diagnostics logged on every service start: version banner (hostname, timezone, environment, Python version, Windows edition + build, install/data paths), system snapshot (CPU model + cores, RAM, disk free space, GPU name + VRAM, IP addresses), config summary (Firebase enabled, site ID, process count, Cortex status), startup phase timings (health probe, Firebase init, Firebase start), and a startup-complete summary block (version, total elapsed time, Firebase status, process count)
[2.5.8] - 2026-04-03
fixed
- requireAdmin middleware now accepts both API keys and Firebase ID tokens — previously only one auth method worked depending on route
- Firestore collection renamed from apiKeys to api_keys for naming consistency
- Cortex lowercase enforced across all user-facing text
- Phantom QR code references removed from device pairing flow
- Firestore timestamps standardised to serverTimestamp() across all writes
- Dead presence?.online fallback patterns removed from web reads
- Device code documents now deleted on consumption/expiry instead of being marked with a status field
[2.5.7] - 2026-04-02
added
- FAQ section on landing page
- Pricing section on landing page
fixed
- Update log event no longer writes version string into the level field
- Update log messages lowercased; version strings prefixed with v
- "Updating owlette" label shown in dashboard while agent update is in progress
- Landing page explore link spacing polished
[2.5.6] - 2026-04-02
added
- API and Webhooks promoted to top-level nav section in docs
fixed
- presence?.online dead fallback patterns removed from web dashboard reads
[2.5.5] - 2026-04-02
added
- Per-user alert emails — each user receives alerts for their own assigned machines
- Timezone labels in email alerts
- Centered unsubscribe link in alert emails
fixed
- Missing Firestore indexes added; claude-agent-sdk dependency pinned
[2.5.4] - 2026-04-01
added
- Process scheduling UX overhaul — redesigned schedule editor with overnight schedule support (e.g. 22:00–06:00 spanning midnight)
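An overnight window such as 22:00–06:00 needs a check that wraps past midnight. A minimal sketch of that check, with a hypothetical function name and a half-open interval chosen for illustration:

```python
from datetime import time


def in_window(now: time, start: time, end: time) -> bool:
    """True if now falls in [start, end), allowing windows that cross midnight."""
    if start <= end:
        return start <= now < end
    # Overnight window: active late in the evening OR early in the morning.
    return now >= start or now < end
```

For example, in_window(time(23, 0), time(22, 0), time(6, 0)) is true, while noon falls outside the same window.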
fixed
- Site members can now access machine controls without requiring admin role
[2.5.3] - 2026-04-01
fixed
- Auto-update now reliably replaces Python files — replaced deprecated WMIC call terminate with PowerShell Stop-Process -Force in the installer's InitializeSetup() phase. WMIC was silently failing to kill Python processes before file overwrite, causing Inno Setup to schedule locked DLLs for next-reboot replacement (which never comes during auto-update). Two-pass kill with verification ensures all handles are released before file copy begins.
[2.5.2] - 2026-03-31
changed
- Version bump
[2.5.1] - 2026-03-31
added
- Agent longevity hardening for 24/7 uptime — fixed resource leaks, unbounded queue growth, and error handling across all background threads
- Windows console event signal handling (CTRL_SHUTDOWN_EVENT etc.) in the agent runner
- Version number logged at service startup
fixed
- Dashboard machine online/offline status delay reduced from ~27s to ~4s
- Slow command worker thread now starts after self.running = True is set (starting it earlier caused the worker to exit immediately)
- Killed processes no longer trigger crash detection and relaunch
- Parallel install support for Cortex deploy_software — existing registry keys hidden from installer to prevent unintended removal of previous versions
- Cortex now requires explicit user confirmation before calling deploy_software
- install.bat service detection switched to registry query — nssm status returned non-zero for stopped services, causing upgrade installs to skip removal and fail on re-registration
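The worker-thread ordering fix in this list comes down to when the loop flag is set. The sketch below uses a hypothetical Service class, not the agent's actual one, to show why the flag has to be true before the thread starts.

```python
import queue
import threading


class Service:
    def __init__(self) -> None:
        self.running = False
        self.jobs: "queue.Queue[str]" = queue.Queue()
        self.done: list = []

    def _worker(self) -> None:
        # The loop condition is evaluated as soon as the thread starts, so
        # a thread started while running is still False exits at once.
        while self.running:
            try:
                self.done.append(self.jobs.get(timeout=0.05))
            except queue.Empty:
                continue

    def start(self) -> threading.Thread:
        self.running = True  # set the flag first...
        worker = threading.Thread(target=self._worker, daemon=True)
        worker.start()       # ...then start the thread
        return worker

    def stop(self, worker: threading.Thread) -> None:
        self.running = False
        worker.join(timeout=1)
```

Swapping the two lines in start() would reproduce the bug: the worker would usually see running as False on its first check and return before handling any job.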
performance
- Landing page Lighthouse score improved from 69 to 89
[2.4.4] - 2026-03-31
added
- Cortex deploy_software tool — AI-driven software deployment with full pipeline tracking (download, silent install, verify, Deployments page visibility). Server-side tool with user confirmation required
- Cortex get_system_presets tool — Retrieves admin-configured software presets (installer URLs, silent flags, verification paths) for use with deploy_software
- 12h/24h time format preference — User preference persisted to Firestore, applied across schedule editor, time pickers, and dashboard
- listen_to_document returns a wake_event for instant polling on Firestore writes
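The wake_event idea in the last item can be illustrated with threading.Event from the standard library. The loop below is a sketch, not the agent's implementation: poll_loop, its parameters, and the 5-second default are assumptions.

```python
import threading
from typing import Callable


def poll_loop(wake_event: threading.Event, stop_event: threading.Event,
              poll: Callable[[], None], interval: float = 5.0) -> None:
    """Call poll() every interval seconds, or sooner when wake_event is set."""
    while not stop_event.is_set():
        poll()
        # wait() returns as soon as the event is set, or after the timeout.
        wake_event.wait(timeout=interval)
        wake_event.clear()
```

A listener callback would call wake_event.set() when a document changes, so the loop reacts right away instead of sleeping out the rest of its interval.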
fixed
- deploy_software overrides /DIR flag to match the target version install path
- Bidirectional config sync between agent and Firestore
- Network metrics flickering on dashboard
- Process list not detecting running processes; added Deployments link
- Cortex sidebar categories sorted by recency, batch categorize fixed
- Schedule editor time picker opens upward when near bottom of viewport
- timeFormat preference not persisting due to missing equality check
[2.4.2] - 2026-03-27
added
- Feedback / Bug Reporting — Report bugs from the web dashboard and the agent GUI (system tray → "Report Issue"), with log attachments and direct Firestore submission
- Branded email templates — Dark-theme HTML emails with shared layout system (header, footer, logo, site name in all transactional emails)
- Landing page overhaul — New hero with eye ignition animation, Blade Runner easter egg, rotating word, typewriter text, interactive background, use case and value prop sections, demo mode
- Demo mode — Full dashboard preview with simulated data, no login required
- SEO overhaul — new OG image, lowercase brand voice, sitemap, robots.txt, proper metadata
- File path, arguments, and PID displayed on process rows in the dashboard
- Download link in landing page nav; /download redirect route
fixed
- Agent Bearer token auth in bug-report API route
- Email template: logo URL fallback, auto-link color, footer links, Gmail clipping
- Landing page layout: 4K centering, mobile accordion, hero vertical alignment, subheadline jitter
[2.4.1] - 2026-03-26
changed
- Agent pairing replaces browser OAuth — Agents now authenticate via device code flow. The installer displays a 3-word pairing phrase, auto-opens the pairing page in a browser, or users enter the phrase on the dashboard
- Installer publisher, URLs, and fallback version updated
fixed
- Installer pairing UX — auto-opens browser, improved colors, handles failure gracefully
- Rate limits increased; pause added on pairing failure
- Dashboard button and status badge hover states
removed
- Browser-based OAuth flow (replaced by device code pairing)
[2.4.0] - 2026-03-25
added
- Network Monitoring Dashboard — Per-NIC throughput charts with historical data (upload/download MB/s per adapter)
- Agent GPU Process Monitoring — Per-process VRAM usage via Windows Performance Counters (cross-vendor: NVIDIA, AMD, Intel)
- Cortex execute_script tool — Unrestricted PowerShell execution on the remote machine (Tier 3, requires confirmation)
- Screenshot Vision Analysis — Cortex can analyze captured screenshots and provide behavioral guidance
- Cortex Chat Improvements — Markdown rendering, conversation categorization, process context awareness
- Live View — Real-time screenshot stream modal in the dashboard
- Reboot Scheduling — Schedule recurring reboots with cron-style configuration
- Threshold Alerts — Configurable alerts when CPU, memory, or disk usage exceeds thresholds
- Webhook Platform Formatting — Slack, Teams, and Discord formatted payloads for webhook notifications
- OpenAPI Documentation — Auto-generated API docs via Scalar at /docs/api
- Admin Tools API — REST endpoints for all Cortex tools, usable by external integrations
- Logs Improvements — Infinite scroll, date range filters, auth re-render optimization
- React Markdown rendering with GFM support in Cortex chat
fixed
- Screenshot max width increased to 8K; lower JPEG quality for reduced payload size
- Cortex language and tone improvements
[2.3.1] - 2026-03-24
changed
- Version bump for documentation audit and accuracy pass
[2.3.0] - 2026-03-22
added
- Cortex AI Chat — AI-powered chat interface with 29 specialized tools across three tiers for machine management via natural language
  - Tier 1 (read-only): system info, process lists, logs, metrics, network, disk
  - Tier 2 (process management): restart, kill, start, set launch mode, screenshot
  - Tier 3 (privileged): run commands/scripts, read/write files, reboot/shutdown
  - Autonomous mode: AI auto-investigates process crashes with configurable directives
  - Escalation system: emails admins when Cortex can't resolve an issue
  - Per-user and per-site LLM key management (encrypted at rest)
- Passkey Authentication (WebAuthn) — Passwordless login using biometrics or device PIN
  - Discoverable credentials (no email needed to start login)
  - Passkey login skips 2FA entirely (passkey IS the second factor)
  - Clone detection via signature counter validation
  - Management UI: list, rename, and delete registered passkeys
- Webhook Notifications — Configurable webhooks for process events, machine status changes, and deployment updates
- Process Scheduling — Schedule processes to run during specific time windows with launch modes (off, always, scheduled)
  - Schedule presets for reuse across processes
  - Admin schedule management page
- Screenshot Capture — Remote desktop screenshots with multi-monitor support
- Health Probes — Agent-side health monitoring with configurable checks
- Server-Side Deployment Status — Firebase Cloud Functions for automatic deployment status tracking
  - Firestore trigger updates deployment status on command completion
  - Scheduled sweeper marks stale deployments as failed (15 min pending, 30 min active)
- Software Inventory — Agent reports installed software to Firestore
- Admin Webhook Management — Dashboard page for configuring site webhooks
- Admin Schedule Presets — Dashboard page for managing schedule presets
- MkDocs Documentation — Complete documentation rewrite with MkDocs Material theme
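The scheduled sweeper listed under Server-Side Deployment Status (15 min pending, 30 min active) follows a simple rule that can be sketched as below. The real sweeper runs as a Firebase Cloud Function, so this Python version is only an illustration, and the field names id, status, and updated_at are hypothetical.

```python
from typing import List

# Staleness limits from the entry above, in seconds.
STALE_AFTER_SECONDS = {"pending": 15 * 60, "active": 30 * 60}


def sweep(deployments: List[dict], now: float) -> List[str]:
    """Mark deployments stuck past their limit as failed; return their ids."""
    failed = []
    for dep in deployments:
        limit = STALE_AFTER_SECONDS.get(dep["status"])
        if limit is not None and now - dep["updated_at"] > limit:
            dep["status"] = "failed"
            failed.append(dep["id"])
    return failed
```

Deployments in any other status, such as completed ones, have no limit in the table and are left untouched.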
changed
- Agent monitoring loop interval reduced from 10s to 5s
- Deployment system now uses Firebase Cloud Functions for status aggregation
- Process launch mode replaces simple autolaunch toggle (set_launch_mode replaces toggle_autolaunch)
removed
- owlette_updater.py — self-update logic moved into main service command handler
[2.1.8] - 2026-03-15
added
- Remote reboot/shutdown commands with dashboard UI
- Process crash email alerts
- Installer setup logging enabled by default
fixed
- useSites hook fetches assigned sites individually instead of collection query
- Setup page fetches assigned sites individually instead of collection query
[2.0.49] - 2025-11-28
fixed
- VBS Cleanup Race Condition — VBS wrapper files for hidden process launches are now cleaned up in a background thread after 10s delay, preventing "file in use" errors
- Reduced Log Noise — GUI config change detection logs moved from INFO to DEBUG; verbose tray status logging removed
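The delayed VBS wrapper cleanup described in this list can be approximated with threading.Timer. This is a sketch with a hypothetical helper name. The 10-second default mirrors the entry, and the error handling is an assumption.

```python
import os
import threading


def schedule_cleanup(path: str, delay: float = 10.0) -> threading.Timer:
    """Delete path on a background timer thread after delay seconds."""

    def _remove() -> None:
        try:
            os.remove(path)
        except OSError:
            pass  # already gone or still locked; a real agent might retry

    timer = threading.Timer(delay, _remove)
    timer.daemon = True  # do not keep the process alive just for cleanup
    timer.start()
    return timer
```

Deferring the delete gives the launched process time to finish reading the wrapper file, which is what avoids the "file in use" error.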
[2.0.48] - 2025-11-26
fixed
- Windows Defender False Positive — Installer now adds Defender exclusion for owlette directory (WinRing0 driver flagged as VulnerableDriver:WinNT/Winring0). Exclusion removed on uninstall.
- Join Site Performance — Fixed extremely slow "Join Site" operation (2+ minutes to instant) by removing redundant Firebase client initialization from GUI
- Silent Browser Launch — Changed from webbrowser.open() to os.startfile() to prevent flashing command prompt windows during OAuth flow
[2.0.47] - 2025-11-24
fixed
- Token Encryption Key Stability — Changed encryption key derivation from uuid.getnode() (MAC address) to Windows MachineGuid. MAC address can return different values after reboot; MachineGuid is stable. Resolves "Agent not authenticated" errors after restart.
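Deriving a key from a stable machine identifier might look like the sketch below. The registry location of MachineGuid is standard on Windows, but the derivation itself (PBKDF2 with a fixed example salt) is an assumption for illustration. The changelog does not say which scheme owlette uses, and both function names are hypothetical.

```python
import base64
import hashlib


def read_machine_guid() -> str:
    """Read MachineGuid from the Windows registry (Windows only)."""
    import winreg  # imported lazily so this module loads on other platforms

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                        r"SOFTWARE\Microsoft\Cryptography") as key:
        value, _ = winreg.QueryValueEx(key, "MachineGuid")
    return value


def derive_key(machine_guid: str, salt: bytes = b"example-salt") -> bytes:
    """Derive a 32-byte key (urlsafe base64) from a stable machine identifier."""
    raw = hashlib.pbkdf2_hmac("sha256", machine_guid.encode("utf-8"), salt, 100_000)
    return base64.urlsafe_b64encode(raw)
```

The point of the change is that the input stays the same across reboots, so the derived key, and therefore the ability to decrypt the stored token, stays the same too.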
[2.0.46] - 2025-11-19
added
- Email Testing Page — New admin-only page at /admin/test-email for testing email notifications, templates, and delivery
- Update Panel Layout — Improved spacing and organization for machine update dialog
[2.0.44] - 2025-11-13
added
- Config & Logs Buttons — Quick access buttons in GUI footer to open config.json and logs folder
- Custom Messagebox — Improved text wrapping (92% width), compact layout, dark theme matching
- Firebase Reconnection Auto-Detection — Service detects when Firebase is re-enabled and automatically restarts the client
changed
- Unified Site Management — Join/Leave Site consolidated into single dynamic button
- Increased Deployment Timeout — Extended from 20 to 40 minutes for large installers
- Force Close on Uninstall — Inno Setup uninstalls now automatically close running applications
fixed
- Firebase Client Reference Errors — Fixed incorrect global variable usage in uninstall handler
- Firebase Client Not Restarting — Service now detects enable/disable transitions and reinitializes
[2.0.29] - 2025-11-12
fixed
- Config Version for New Installs — Fixed hardcoded version in configure_site.py; new installs now use correct CONFIG_VERSION constant
[2.0.28] - 2025-11-12
added
- Environment Configuration — New environment setting in config.json (production vs development)
fixed
- Tray Icon Launch Failure — Added get_python_exe_path() helper to locate bundled Python interpreter
- Incorrect Server URL — "Join Site" now correctly defaults to production URL
- Port Conflict on Retry — Enabled allow_reuse_address on OAuth callback server
[2.0.27] - 2025-11-11
fixed
- Software Inventory Sync Error — Fixed NameError in post-installation inventory sync
- Real-Time Deployment Status — Fixed deployment status staying on "downloading" until manual refresh; now transitions in real-time
[2.0.26] - 2025-11-11
added
- Process Launch via Task Scheduler — Complete rewrite using schtasks for service restart resilience
- Enhanced Event Logging — Agent lifecycle events, process crash detection, GUI kill tracking
fixed
- Agent Stopped Logging — Implemented restart flag mechanism for graceful shutdown event logging
- Process Crash False Positives — Crash events no longer logged for manually killed processes
[2.0.15] - 2025-11-11
added
- Event Logs Page — Dedicated page for monitoring process events with filtering, pagination, and color-coded severity
- Hidden Process Launch — VBScript wrapper for truly invisible console application launches
- Event Logging to Firestore — Automatic logging of process starts, kills, crashes, and command executions
[2.0.0] - 2025-01-31
major release — cloud-connected architecture
Version 2.0.0 transforms owlette from a standalone Windows process manager to a cloud-connected system.
added
- Next.js Web Dashboard — Remote monitoring and control from any browser
- Firebase/Firestore Backend — Real-time bidirectional data sync
- Remote Software Deployment — Silent installation across multiple machines
- Deployment Templates — Save and reuse installer configurations
- PID Recovery — Reconnect to existing processes after service restart
- Multi-Site Management — Organize machines by location
changed
- Monorepo Structure — Unified agent/ and web/ directories
- Firebase as Primary Backend — Gmail/Slack marked as legacy
- Configuration Schema — Updated to v1.3.0 with Firebase settings
deprecated
- Gmail and Slack notifications (replaced by web dashboard alerts)
[0.4.2b] - legacy
standalone architecture
The original owlette — a standalone Windows service with local configuration.
- Windows service process monitoring with auto-restart
- System tray icon and GUI configuration
- Gmail API and Slack API notifications
- Local JSON configuration
- Process priority and visibility control
migration guide: v0.4.2b to v2.0.0
| v0.4.2b feature | v2.0.0 equivalent |
|---|---|
| Gmail notifications | Web dashboard alerts + email via Resend |
| Slack notifications | Web dashboard alerts |
| Local GUI only | GUI + Web dashboard |
| Manual configuration | GUI or web-based config |
| Single machine | Multi-machine, multi-site |
| N/A | Remote deployment |
| N/A | Project distribution |
| N/A | Cortex AI chat |
troubleshooting
Cross-cutting troubleshooting guide for common issues across the entire owlette system.
self-hosting
owlette is available as a hosted service at owlette.app - no setup required. If you run your own instance, this section covers the Firebase, storage, web runtime, and agent pieces you need to operate the full stack.