Building an Autonomous Agent Runtime in Go

The first sign something was missing came when I watched an agent forget what it had decided three steps back. The tool call succeeded, the LLM responded, but the context had drifted. There was no reliable memory of what had already been tried.

That observation started Painless Agent: a self-hosted autonomous agent runtime built in Go, with PostgreSQL, pgvector, and Redis as the persistence layer.

Why existing frameworks were not enough

Most agent frameworks are conversation wrappers. They give you a loop, some tool definitions, and a system prompt. That works for demos. It does not hold up for long-running workflows where the agent needs to plan across sessions, recall prior decisions, and recover from partial failures.

The deeper problem is state. Frameworks that run entirely in memory lose everything when the process restarts. Agents that rely on an ever-growing appended conversation history become stale and expensive, because every call pays for the whole transcript. Neither model is durable enough for workflows that run for minutes or hours.

Architecture decisions

Go was a deliberate choice. Long-running workflows benefit from low-overhead goroutines, explicit error handling, and predictable memory behavior. Go's concurrency primitives (goroutines, channels, and context cancellation) map cleanly onto a task runtime that needs to schedule, execute, and monitor work across multiple sessions.

PostgreSQL handles durable task state and workflow records. pgvector handles semantic memory retrieval. Redis handles ephemeral caching, session state, and pub/sub for real-time updates. Goose manages schema migrations. Each component has a defined boundary and a single owner.

The agent runtime is organized around four layers: the task store, the memory system, the skill registry, and the LLM provider abstraction. Those layers communicate through explicit interfaces, not shared state.
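To make the layering concrete, here is a minimal sketch of what the four interfaces could look like. Every type and method name below is an illustrative assumption, not the project's actual API; the point is only that the runtime holds interfaces and never reaches into a concrete store.

```go
package main

import (
	"context"
	"errors"
)

// All names in this file are hypothetical; they only illustrate the layering.

// Task is a minimal record owned by the task store.
type Task struct {
	ID       string
	ParentID string // empty for a top-level objective
	Status   string
}

// TaskStore is the durable task layer (backed by PostgreSQL in this design).
type TaskStore interface {
	Create(ctx context.Context, t Task) error
	Get(ctx context.Context, id string) (Task, error)
	SetStatus(ctx context.Context, id, status string) error
}

// Memory is the recall layer (backed by Redis and pgvector in this design).
type Memory interface {
	Recall(ctx context.Context, query string, limit int) ([]string, error)
	Remember(ctx context.Context, note string) error
}

// Skill is one registered capability.
type Skill interface {
	Name() string
	Run(ctx context.Context, input string) (string, error)
}

// SkillRegistry resolves skills by name.
type SkillRegistry interface {
	Register(s Skill) error
	Lookup(name string) (Skill, bool)
}

// LLMProvider hides the vendor behind a single call.
type LLMProvider interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

// Runtime depends only on the four interfaces, never on concrete stores.
type Runtime struct {
	tasks  TaskStore
	memory Memory
	skills SkillRegistry
	llm    LLMProvider
}

// NewRuntime rejects missing layers up front, so a misconfigured runtime
// fails at startup rather than in the middle of a task.
func NewRuntime(t TaskStore, m Memory, s SkillRegistry, l LLMProvider) (*Runtime, error) {
	if t == nil || m == nil || s == nil || l == nil {
		return nil, errors.New("runtime: all four layers are required")
	}
	return &Runtime{tasks: t, memory: m, skills: s, llm: l}, nil
}
```

Because each layer is an interface, tests can pass in-memory fakes and production can pass database-backed implementations without the runtime changing.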

Memory and planning

Persistent memory is implemented as two complementary stores. Short-term memory lives in Redis with a TTL. Long-term memory is encoded as vector embeddings and stored in PostgreSQL through the pgvector extension. When the agent needs to recall prior context, it queries both stores and synthesizes a summary before the next LLM call.
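The recall step can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the two store interfaces, the Recall function, and the in-memory stand-ins are hypothetical, and the "synthesis" here is plain deduplication and concatenation rather than whatever summarization the real runtime performs.

```go
package main

import (
	"context"
	"strings"
)

// ShortTermStore is a hypothetical view of the Redis-backed session memory.
type ShortTermStore interface {
	Recent(ctx context.Context, sessionID string) ([]string, error)
}

// LongTermStore is a hypothetical view of the pgvector-backed memory.
type LongTermStore interface {
	Similar(ctx context.Context, query string, k int) ([]string, error)
}

// Recall queries both stores and merges the results into one context block,
// skipping notes that already appeared in the short-term results.
func Recall(ctx context.Context, st ShortTermStore, lt LongTermStore, sessionID, query string, k int) (string, error) {
	recent, err := st.Recent(ctx, sessionID)
	if err != nil {
		return "", err
	}
	similar, err := lt.Similar(ctx, query, k)
	if err != nil {
		return "", err
	}
	seen := make(map[string]bool)
	var b strings.Builder
	write := func(header string, items []string) {
		wroteHeader := false
		for _, it := range items {
			if seen[it] {
				continue
			}
			seen[it] = true
			if !wroteHeader {
				b.WriteString(header + "\n")
				wroteHeader = true
			}
			b.WriteString("- " + it + "\n")
		}
	}
	write("Recent context:", recent)
	write("Related notes:", similar)
	return b.String(), nil
}

// mapShortTerm and sliceLongTerm are in-memory stand-ins for the real stores.
type mapShortTerm map[string][]string

func (m mapShortTerm) Recent(_ context.Context, sessionID string) ([]string, error) {
	return m[sessionID], nil
}

type sliceLongTerm []string

// Similar ignores the query and returns the first k notes; a real store
// would rank by embedding distance.
func (s sliceLongTerm) Similar(_ context.Context, _ string, k int) ([]string, error) {
	if k < 0 {
		k = 0
	}
	if k > len(s) {
		k = len(s)
	}
	return s[:k], nil
}

// exampleRecall shows the call shape with one overlapping note.
func exampleRecall() (string, error) {
	short := mapShortTerm{"s1": {"tried API v1, got 404"}}
	long := sliceLongTerm{"API v2 requires an auth header", "tried API v1, got 404"}
	return Recall(context.Background(), short, long, "s1", "call the API", 2)
}
```

In exampleRecall the note that appears in both stores is emitted once, under the short-term heading, so the prompt does not pay for it twice.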

Task planning uses a structured decomposition loop. When the agent receives a high-level objective, it breaks it into subtasks, identifies the dependencies between them, and stores the plan in PostgreSQL. Each subtask carries a status, a retry count, and a parent reference. The scheduler picks up pending subtasks, executes them, and applies each state transition atomically.
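A minimal in-memory sketch of the scheduling rule described above: a subtask is only claimed when it is pending and everything it depends on is done, and the claim itself is atomic. The Plan type, its methods, and the DependsOn field are assumptions for illustration; the real runtime keeps this state in PostgreSQL, where a mutex would be replaced by a database transaction.

```go
package main

import (
	"errors"
	"sync"
)

type Status string

const (
	Pending Status = "pending"
	Running Status = "running"
	Done    Status = "done"
	Failed  Status = "failed"
)

// Subtask mirrors the fields named in the text, plus a hypothetical
// DependsOn list to express ordering.
type Subtask struct {
	ID        string
	ParentID  string
	DependsOn []string
	Status    Status
	Retries   int
}

// Plan holds subtasks in insertion order and guards them with a mutex.
type Plan struct {
	mu    sync.Mutex
	order []string
	tasks map[string]*Subtask
}

// NewPlan stores copies of the given subtasks, all starting as pending.
func NewPlan(subtasks ...Subtask) *Plan {
	p := &Plan{tasks: make(map[string]*Subtask)}
	for i := range subtasks {
		st := subtasks[i]
		st.Status = Pending
		p.tasks[st.ID] = &st
		p.order = append(p.order, st.ID)
	}
	return p
}

// Claim atomically picks the next pending subtask whose dependencies are
// all done and marks it running. It reports false when nothing is runnable.
func (p *Plan) Claim() (string, bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, id := range p.order {
		t := p.tasks[id]
		if t.Status != Pending || !p.depsDone(t) {
			continue
		}
		t.Status = Running
		return id, true
	}
	return "", false
}

// depsDone must be called with the mutex held.
func (p *Plan) depsDone(t *Subtask) bool {
	for _, dep := range t.DependsOn {
		d, ok := p.tasks[dep]
		if !ok || d.Status != Done {
			return false
		}
	}
	return true
}

var ErrNotRunning = errors.New("subtask is not running")

// Finish records the outcome of a running subtask. Any other starting
// state is rejected, so a result cannot be recorded twice.
func (p *Plan) Finish(id string, runErr error) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	t, ok := p.tasks[id]
	if !ok || t.Status != Running {
		return ErrNotRunning
	}
	if runErr != nil {
		t.Status = Failed
	} else {
		t.Status = Done
	}
	return nil
}
```

With a plan of "fetch" followed by "parse" (which depends on "fetch"), the first Claim returns "fetch", a second Claim returns nothing until Finish("fetch", nil) is called, and only then does "parse" become claimable.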

The reflection system runs after each task completes. It evaluates output quality, identifies what failed, and writes a short note back to long-term memory. That note shapes how the agent approaches similar tasks in future sessions without requiring the full history to be replayed.

Challenges

The hardest part was not the LLM calls. It was the state transitions.

An agent that decomposes a plan and then partially executes it needs to answer: what is safe to retry, what has side effects, and what should be abandoned. Most frameworks do not model this at all. Painless Agent uses a finite-state task model with explicit terminal states for success, failure, and abandonment. Retrying a failed subtask is a deliberate action, not a default.
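One way to express that model is a small transition table plus an explicit retry operation. The sketch below is illustrative rather than the project's actual code: the state names, the SideEffects flag, and the choice to model a retry as a fresh attempt (so that the failed record stays terminal) are all assumptions.

```go
package main

import "fmt"

type State string

const (
	Pending   State = "pending"
	Running   State = "running"
	Succeeded State = "succeeded"
	Failed    State = "failed"
	Abandoned State = "abandoned"
)

// allowed lists every legal move. Terminal states have no entry, so no
// ordinary transition can leave them.
var allowed = map[State][]State{
	Pending: {Running, Abandoned},
	Running: {Succeeded, Failed, Abandoned},
}

// Terminal reports whether s is one of the three end states.
func (s State) Terminal() bool {
	return s == Succeeded || s == Failed || s == Abandoned
}

// Task is a hypothetical task record. SideEffects marks work that is not
// safe to run twice (for example, sending an email).
type Task struct {
	ID          string
	State       State
	Attempt     int
	SideEffects bool
}

// Transition moves the task only along an edge in the allowed table.
func (t *Task) Transition(to State) error {
	for _, next := range allowed[t.State] {
		if next == to {
			t.State = to
			return nil
		}
	}
	return fmt.Errorf("task %s: illegal transition %s -> %s", t.ID, t.State, to)
}

// Retry is the deliberate action: it never happens implicitly. It leaves
// the failed record untouched and returns a new pending attempt.
func (t *Task) Retry(maxAttempts int) (*Task, error) {
	switch {
	case t.State != Failed:
		return nil, fmt.Errorf("task %s: only failed tasks can be retried", t.ID)
	case t.SideEffects:
		return nil, fmt.Errorf("task %s: has side effects, refusing to retry", t.ID)
	case t.Attempt+1 >= maxAttempts:
		return nil, fmt.Errorf("task %s: attempt limit %d reached", t.ID, maxAttempts)
	}
	return &Task{ID: t.ID, State: Pending, Attempt: t.Attempt + 1, SideEffects: t.SideEffects}, nil
}
```

Keeping the table as data makes the legal transitions easy to review in one place, and routing every retry through one function gives a single point to enforce the "what is safe to retry" decision.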

Sandboxed tool execution was the second difficult problem. When the agent can run code, read files, and call external APIs, the failure modes multiply. The current model uses an opt-in skill registry with declared permissions. Each skill specifies what it can affect, and the runtime validates the permission boundary before execution.
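The permission check can be illustrated with a small registry. This is a sketch under assumptions: the permission strings, the Skill struct, and the Execute signature are hypothetical names chosen for the example, and a real boundary would also need process-level isolation, which this code does not attempt.

```go
package main

import (
	"errors"
	"fmt"
)

// Permission names are hypothetical examples of declared capabilities.
type Permission string

const (
	PermFSRead  Permission = "fs:read"
	PermFSWrite Permission = "fs:write"
	PermNetwork Permission = "net:outbound"
)

// Skill declares up front which permissions it needs.
type Skill struct {
	Name     string
	Requires []Permission
	Run      func(input string) (string, error)
}

var ErrPermissionDenied = errors.New("permission denied")

// Registry is opt-in: only skills that were explicitly registered can run.
type Registry struct {
	skills map[string]Skill
}

func NewRegistry() *Registry {
	return &Registry{skills: make(map[string]Skill)}
}

// Register rejects incomplete or duplicate skills.
func (r *Registry) Register(s Skill) error {
	if s.Name == "" || s.Run == nil {
		return errors.New("skill needs a name and a Run function")
	}
	if _, exists := r.skills[s.Name]; exists {
		return fmt.Errorf("skill %q already registered", s.Name)
	}
	r.skills[s.Name] = s
	return nil
}

// Execute validates every declared permission against the granted set
// before the skill body is invoked.
func (r *Registry) Execute(name string, granted map[Permission]bool, input string) (string, error) {
	s, ok := r.skills[name]
	if !ok {
		return "", fmt.Errorf("unknown skill %q", name)
	}
	for _, p := range s.Requires {
		if !granted[p] {
			return "", fmt.Errorf("skill %q needs %q: %w", name, p, ErrPermissionDenied)
		}
	}
	return s.Run(input)
}
```

A skill that requires fs:write and is executed with only fs:read granted returns an error wrapping ErrPermissionDenied, and its Run function is never called.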

Streaming required less structural work but still needed care. SSE over long-running HTTP connections handles progress updates well, but clients need to handle reconnections. The backend buffers recent events in Redis so a reconnecting client can replay from its last known position without re-running the task.
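The replay idea can be sketched with an in-memory buffer standing in for Redis. The ReplayBuffer type and helper names are assumptions for illustration. The sketch also assumes the client reports its position through the standard SSE Last-Event-ID request header, which browsers' EventSource sends automatically on reconnect when events carry an id field.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// Event is one progress update with a monotonically increasing ID.
type Event struct {
	ID   uint64
	Data string
}

// ReplayBuffer keeps the most recent events so a reconnecting client can
// catch up. It is an in-memory stand-in for the Redis buffer.
type ReplayBuffer struct {
	mu       sync.Mutex
	capacity int
	nextID   uint64
	events   []Event
}

func NewReplayBuffer(capacity int) *ReplayBuffer {
	if capacity < 1 {
		capacity = 1
	}
	return &ReplayBuffer{capacity: capacity, nextID: 1}
}

// Append assigns the next ID and drops the oldest event when full.
func (b *ReplayBuffer) Append(data string) Event {
	b.mu.Lock()
	defer b.mu.Unlock()
	e := Event{ID: b.nextID, Data: data}
	b.nextID++
	if len(b.events) == b.capacity {
		copy(b.events, b.events[1:])
		b.events = b.events[:b.capacity-1]
	}
	b.events = append(b.events, e)
	return e
}

// Since returns the buffered events with an ID greater than lastID.
func (b *ReplayBuffer) Since(lastID uint64) []Event {
	b.mu.Lock()
	defer b.mu.Unlock()
	var out []Event
	for _, e := range b.events {
		if e.ID > lastID {
			out = append(out, e)
		}
	}
	return out
}

// ParseLastEventID reads the Last-Event-ID header value. An empty or
// malformed value yields 0, which means "replay everything buffered".
func ParseLastEventID(header string) uint64 {
	id, err := strconv.ParseUint(strings.TrimSpace(header), 10, 64)
	if err != nil {
		return 0
	}
	return id
}

// FormatSSE renders an event in the SSE wire format: an id line, one data
// line per line of payload, then a blank line to end the event.
func FormatSSE(e Event) string {
	var sb strings.Builder
	fmt.Fprintf(&sb, "id: %d\n", e.ID)
	for _, line := range strings.Split(e.Data, "\n") {
		sb.WriteString("data: " + line + "\n")
	}
	sb.WriteString("\n")
	return sb.String()
}
```

On reconnect, a handler would call Since(ParseLastEventID(r.Header.Get("Last-Event-ID"))), write each result with FormatSSE, and then continue with live events. If the client's position is older than the oldest buffered event, it silently misses the dropped events, so the buffer size bounds how long a disconnect can be tolerated.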

Where it is now

The core runtime is functional. Tasks are created, decomposed, scheduled, executed, and persisted. Memory retrieval works across sessions. The LLM provider abstraction supports OpenAI, Anthropic, and GitHub Copilot through the same interface. The skill system allows new capabilities to be registered without touching the core runtime logic.

The web dashboard and browser tooling are in progress. Containerized sandboxed execution is the next infrastructure piece.

What I would do differently

The database schema evolved too much in the first month. Earlier clarity about which state transitions were meaningful would have saved migration cycles. It is worth spending time on the task state model before writing the first table.

I also underestimated how much the provider abstraction would matter. Different models have different strengths for planning, execution, and reflection. Having the abstraction in place early made it straightforward to route task types to the right model without modifying the runtime.
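As a rough illustration of that routing, a small table keyed by task type is enough. The names below (TaskKind, Router, echoProvider) are hypothetical and chosen for the example; the only claim is that the runtime asks for a provider by task type and never names a vendor.

```go
package main

import (
	"context"
	"fmt"
)

// TaskKind names the three phases mentioned in the text.
type TaskKind string

const (
	KindPlanning   TaskKind = "planning"
	KindExecution  TaskKind = "execution"
	KindReflection TaskKind = "reflection"
)

// Provider is a hypothetical minimal LLM interface.
type Provider interface {
	Name() string
	Complete(ctx context.Context, prompt string) (string, error)
}

// Router maps task kinds to providers and falls back to a default.
type Router struct {
	routes   map[TaskKind]Provider
	fallback Provider
}

func NewRouter(fallback Provider) *Router {
	return &Router{routes: make(map[TaskKind]Provider), fallback: fallback}
}

// Route assigns a provider to one kind of task.
func (r *Router) Route(kind TaskKind, p Provider) {
	r.routes[kind] = p
}

// For returns the provider for kind, or the fallback if none is set.
func (r *Router) For(kind TaskKind) Provider {
	if p, ok := r.routes[kind]; ok {
		return p
	}
	return r.fallback
}

// echoProvider is a stand-in that makes the router testable without
// network calls.
type echoProvider struct{ name string }

func (e echoProvider) Name() string { return e.name }

func (e echoProvider) Complete(_ context.Context, prompt string) (string, error) {
	return fmt.Sprintf("[%s] %s", e.name, prompt), nil
}
```

Changing which model handles planning then becomes a one-line change to the routing table, with no edits to the code that calls For.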

The durable lesson

An autonomous agent is not a better chatbot. It is a runtime for structured work. Reliability comes from the same things that make any backend reliable: clear state, explicit transitions, recoverable failures, and persistent memory that outlasts the process.

The LLM is one component. The infrastructure around it is the product.