Sessions and the scheduler

A session is a single run of an agent. Every time an agent starts — whether a user clicked “Run now” or the platform scheduler fired a cron tick — x1agent records a row in the sessions table and drives it through a small state machine until it completes or fails.

This page describes the sessions domain and the scheduler that feeds it. It deliberately stops at the database boundary: execution (the Kubernetes Job, the sidecar, the agent container) is covered in the Architecture Overview.

stateDiagram-v2
    [*] --> pending: trigger
    pending --> running: executor claims
    running --> complete
    running --> failed
    pending --> failed: cancel / expire
    complete --> [*]
    failed --> [*]

  • pending — the row exists and awaits an executor. This is the handoff state between the sessions domain and the execution layer.
  • running — an executor has claimed the row and is driving the Job forward.
  • complete — the agent exited cleanly.
  • failed — the agent exited non-zero, the Job timed out, or the session was cancelled before it started.

A session never moves backwards. Once it reaches a terminal state (complete or failed), completed_at is set and the row is immutable.
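
A minimal sketch of the same state machine as it might look in the domain layer (the type and function names here are illustrative, not the actual x1agent code):

// Status values and the forward-only transitions described above.
// Names are illustrative; only the four statuses come from this page.
type SessionStatus = 'pending' | 'running' | 'complete' | 'failed';

const ALLOWED: Record<SessionStatus, SessionStatus[]> = {
  pending: ['running', 'failed'],   // executor claims it, or cancel / expire
  running: ['complete', 'failed'],
  complete: [],                     // terminal: completed_at set, row immutable
  failed: [],                       // terminal
};

function canTransition(from: SessionStatus, to: SessionStatus): boolean {
  return ALLOWED[from].includes(to);
}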

Every session has a triggered_by discriminator:

triggered_by   triggered_by_user_id   Meaning
user           populated              Someone clicked Run now or hit the API.
scheduler      null                   The platform scheduler fired a cron tick.

Storing the distinction explicitly lets the UI show who fired the run and lets the scheduler reason about its own history without guessing.
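
In TypeScript terms the discriminator is a tagged union; a sketch using the field names from the table above (the user id type is assumed for illustration):

// Sketch only: the two trigger shapes a session row can take.
type SessionTrigger =
  | { triggered_by: 'user'; triggered_by_user_id: string }    // someone clicked Run now or hit the API
  | { triggered_by: 'scheduler'; triggered_by_user_id: null }; // a cron tick fired the run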

The scheduler is a single loop inside the API process. It ticks every 30 seconds and, for each active agent with a cron schedule, decides whether a new run is due.

sequenceDiagram
    participant T as Ticker (30s)
    participant S as Scheduler
    participant A as AgentRepo
    participant R as SessionRepo

    T->>S: tick(now)
    S->>A: listScheduled()
    loop per active+scheduled agent
        S->>R: lastSchedulerRunFor(agent)
        S->>S: nextDue = cron.after(lastRun ?? agent.createdAt)
        alt nextDue <= now
            S->>R: create(pending, triggered_by=scheduler, triggered_at=nextDue)
            Note over R: unique (agent_id, triggered_at)<br/>keeps duplicate ticks idempotent
        end
    end
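
Sketched as code, one tick might look like the following. The repository method names mirror the diagram; the surrounding types and signatures are assumptions made for illustration.

// Hypothetical sketch of a single scheduler tick. Method names follow the
// sequence diagram above; the interfaces themselves are assumed.
interface ScheduledAgent { id: string; createdAt: Date; schedule: string }

interface AgentRepo {
  listScheduled(): Promise<ScheduledAgent[]>;
}

interface SessionRepo {
  lastSchedulerRunFor(agentId: string): Promise<Date | null>;
  create(row: { agentId: string; triggeredBy: 'scheduler'; triggeredAt: Date }): Promise<void>;
}

// Next cron occurrence strictly after a given time (see the catch-up sketch below).
declare function nextAfter(schedule: string, after: Date): Date;

async function tick(now: Date, agents: AgentRepo, sessions: SessionRepo): Promise<void> {
  for (const agent of await agents.listScheduled()) {
    const lastRun = await sessions.lastSchedulerRunFor(agent.id);
    const nextDue = nextAfter(agent.schedule, lastRun ?? agent.createdAt);
    if (nextDue <= now) {
      // The unique (agent_id, triggered_at) index keeps duplicate ticks idempotent.
      await sessions.create({ agentId: agent.id, triggeredBy: 'scheduler', triggeredAt: nextDue });
    }
  }
}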

Three properties matter and are worth spelling out:

Idempotency. The unique index on (agent_id, triggered_at) means that two ticks which compute the same nextDue cannot both succeed. The second insert fails with a duplicate key error; the scheduler swallows it and moves on. This keeps a briefly flapping API pod from creating duplicate runs.
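
The swallow itself is small. A sketch, continuing the tick above and assuming node-postgres, which exposes the Postgres error code on the thrown error:

// Treat a unique_violation as "another tick or replica got there first".
try {
  await sessions.create({ agentId: agent.id, triggeredBy: 'scheduler', triggeredAt: nextDue });
} catch (err) {
  // 23505 is Postgres unique_violation; anything else is a real failure.
  if ((err as { code?: string }).code !== '23505') throw err;
}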

Catch-up, not replay. nextDue is computed once per tick from the last scheduler-triggered row. If the process was down for an hour, the next tick fires one run (the next one after now), not sixty. Missed runs are missed; we do not want a backlog of stale runs stampeding when the API comes back up.
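
A sketch of that computation, assuming cron-parser's parseExpression API (v4 style); the local @every form described below would be handled separately:

import parser from 'cron-parser';

// Next occurrence strictly after the last scheduler-triggered run, or after
// the agent's creation time if it has never run. Computed once per tick, so
// an hour of downtime yields one catch-up run rather than a backlog.
function nextAfter(schedule: string, after: Date): Date {
  const interval = parser.parseExpression(schedule, { currentDate: after });
  return interval.next().toDate();
}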

No leader election. The scheduler is safe to run from multiple API replicas because the unique index is the lock. Whichever replica inserts first wins; the rest get a duplicate-key error and continue. We do not need Redlock, leases, or FOR UPDATE SKIP LOCKED at this scale.

The scheduler accepts any expression that cron-parser accepts — 5-field cron, plus the named macros @hourly, @daily, @weekly, @monthly, @yearly. One extra local form is supported: @every <n>(m|h|d) for “every N minutes, hours, or days.” Validation happens in the domain layer; invalid schedules are rejected at agent-create time, not at tick time.
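
Validation can therefore be a thin wrapper. A sketch, assuming cron-parser's parseExpression plus a local regex for the @every form:

import parser from 'cron-parser';

// Called at agent-create time; invalid schedules never reach the ticker.
function isValidSchedule(schedule: string): boolean {
  // Local form: "@every <n>(m|h|d)".
  if (/^@every \d+[mhd]$/.test(schedule)) return true;
  // Everything else is delegated to cron-parser (5-field cron; depending on the
  // cron-parser version, the @hourly-style macros may need mapping to their
  // 5-field equivalents first).
  try {
    parser.parseExpression(schedule);
    return true;
  } catch {
    return false;
  }
}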

Sessions are addressed under an agent. The list is scoped to the agent; the trigger endpoint requires workspace admin.

GET /api/workspaces/:slug/agents/:agentId/sessions
→ { sessions: [...last 50 rows, newest first...] }
POST /api/workspaces/:slug/agents/:agentId/sessions
→ { session: {...pending row...} }
Creates a pending session with triggered_by=user.
POST /api/workspaces/:slug/agents/:agentId/sessions/:sessionId/cancel
→ { session: {...failed row...} }
Only valid while status=pending. Running sessions are cancelled through
the execution layer, not here.
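
For example, triggering a run from a script and then reading it back might look like this (the base URL, workspace slug, agent id, and auth header are placeholders):

// Hypothetical client usage; every identifier here is a placeholder.
const base = 'https://x1agent.example.com/api/workspaces/acme/agents/agent-123';
const headers = { Authorization: 'Bearer <token>' };

// Trigger a run (workspace admin required): returns a pending row with triggered_by=user.
const { session } = await fetch(`${base}/sessions`, { method: 'POST', headers }).then(r => r.json());

// List the last 50 sessions for the agent, newest first.
const { sessions } = await fetch(`${base}/sessions`, { headers }).then(r => r.json());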

The sessions domain owns four things: the session entity, the status state machine, the scheduler-tick logic, and the HTTP surface. It does not own the executor — that is a separate concern that will land with the Kubernetes Job watcher. Keeping scheduling and execution in separate packages means we can ship and test the scheduler against a real database today, and swap in the executor later without changing the scheduling contract.