Hello World
I spent the last few months trying to get AI agents to build my personal website. Along the way, I tried four different agent harnesses, a multi-agent communication protocol, an AI-native issue tracker, and a custom workflow orchestration language. Most of it didn’t work the way I expected.
This post is about what that actually looked like — the dead ends, the tools that didn’t stick, and the one insight that changed how I think about all of it.

A caveat upfront: I’m not an expert in agentic development. I’m a product manager and developer who’s genuinely curious about these tools and patterns. I’m more interested in understanding how to structure workflows for complex objectives than in shipping any particular product. This site was the vehicle for that exploration.
Why This Site
I needed a home base. Somewhere to put my projects, start writing publicly, and make my professional background visible in one place.
Some context: I spent years deep in Bitcoin — studying the protocol, hosting Honolulu BitDevs for several years while living in Hawaii, and eventually working in Bitcoin fintech as a product manager. Bitcoin consumed most of my time, and during that period I was only passively working on small, private projects on the side. Before product management, I was a QA engineer by trade — so I know a lot about testing and a bit less about full application development. I never prioritized the time to build much that I was comfortable with open-sourcing. Plenty of half-baked projects live in private repos, and maybe I’ll circle back to them someday.
With the advancements in agentic development over the last couple of years, the barrier to entry for actually building things has dropped drastically. And it turns out agentic work caters well to my specialties. To excel, agents need clear functional requirements, acceptance criteria, and ways to validate their work — and my QA background is well-suited to providing exactly that. I probably couldn’t write all the code from scratch without a lot of time and effort. But I know how to articulate problems and reason about solutions, and that’s increasingly what matters.
So: a personal site. Portfolio, blog, professional background — all in one place, built with the agentic tools I’ve been experimenting with.
But I also saw an opportunity: building this site would be a testbed for agentic development workflows. I’d been experimenting with AI coding assistants for a while, and I wanted to push further — not just using them to autocomplete code, but treating them as genuine collaborators in a development process.
Let me be honest: this website isn’t some incredibly complex project. I could have banged out a passable site in a few detailed prompts. But I cared more about understanding the workflows themselves — the layers of task tracking, custom commands, orchestration scripts. Not because the site needed them, but because I wanted to know: how can things be structured to support more complex objectives when the time comes?
The site itself became the experiment.
Writing in Markdown, Publishing from Obsidian
I already maintain an extensive knowledge base in Obsidian — research notes, project documentation, clippings, ideas. I work in markdown constantly, partly because it’s the most agent-friendly format there is. AI assistants can read it, write it, and reason about it with zero friction. When I’m ideating or synthesizing research, I’m already doing it in markdown files that agents can operate on directly.
So when I found Vault CMS — an approach that lets you use an Obsidian vault as a CMS for an Astro site — the implication was obvious. My existing writing workflow was the publishing workflow. Wikilinks, callouts, embeds — all the Obsidian features I rely on, rendered on the web. And every piece of content is already in the format that agents work best with.
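To give a sense of how the vault becomes the content source, here is a minimal sketch of an Astro content collection pointed at a folder of vault notes. It assumes Astro 5's glob loader and an illustrative vault path; the actual Vault CMS setup is more involved than this.

```ts
// src/content.config.ts -- a minimal sketch, not the actual Vault CMS config.
// Assumes Astro 5's glob loader; "vault/posts" is an illustrative path.
import { defineCollection, z } from 'astro:content';
import { glob } from 'astro/loaders';

const posts = defineCollection({
  // Load every markdown note from a folder inside the Obsidian vault
  loader: glob({ pattern: '**/*.md', base: './vault/posts' }),
  // Frontmatter schema: a publish flag keeps drafts private in the vault
  schema: z.object({
    title: z.string(),
    publish: z.boolean().default(false),
    date: z.coerce.date().optional(),
  }),
});

export const collections = { posts };
```

The specifics of the schema don't matter much; the point is that the notes never leave markdown, so the files agents read and edit are the same files the site publishes.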
The gap between “thinking in private” and “writing in public” turned out to be smaller than I expected. The publish step is almost incidental. And the agentic assistance that was already part of my note-taking workflow comes along for free.
The Agentic Development Experiment
Here’s where it gets interesting. I didn’t just want to use one AI tool — I wanted to understand the landscape. Over the course of building this site, I tried several approaches to agentic development. Some worked. Most taught me something.
Claude Code as the Primary Harness
I settled on Claude Code as my main development tool, but not out of the box. The real value came from customization — building custom slash commands like /work (a full ticket lifecycle), /breakdown (decomposing features into trackable tasks), and /write (drafting blog posts with structured author callouts). I also wrote Open Prose workflow definitions — .prose files that define multi-agent roles (a beads manager, a codebase explorer, a planner, an implementer) and orchestrate them through a development cycle.
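For context, Claude Code custom commands are just markdown prompt files in a .claude/commands/ directory; the filename becomes the command name, and $ARGUMENTS stands in for whatever you pass the command. Here is an abridged, illustrative sketch of the kind of thing /work does, not the actual command file:

```markdown
<!-- .claude/commands/work.md (abridged, illustrative sketch) -->
Work the ticket given in $ARGUMENTS through a full lifecycle:

1. Load the referenced bead and restate its acceptance criteria.
2. Explore the relevant files before changing anything.
3. Implement the change, keeping the diff scoped to this one ticket.
4. Run the verification steps listed in the bead.
5. Summarize what changed and mark the bead closed.
```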
This kind of customization is where agentic tools start to feel genuinely powerful. You’re not just prompting — you’re designing workflows.
The Interoperability Trap
Early on, I tried to maintain interoperable workflows across multiple agent harnesses: Claude Code, Codex, and OpenCode. The idea was appealing — use the best tool for each job, share context and skills across them.
In practice, it was a mess. OpenCode’s Zen model inference was flexible, but API costs were steep. More fundamentally, aligning hooks, skills, and conventions across different tools introduced a lot of confusion. Each harness has its own patterns, and forcing them to share a workflow meant none of them worked particularly well.
My honest take: interoperability across agent harnesses is improving, but for most people right now, picking one harness and specializing in it is a more effective approach. Go deep rather than wide.
Multi-Agent Coordination (and Chaos)
I also experimented with MCP Agent Mail — a tool by Jeffrey Emanuel (Doodlestein) that lets multiple agents work in parallel on the same branch, communicating through a message-passing system with file reservations.
The concept is compelling. In practice, I found it too chaotic. Multiple parallel streams of work were hard to track. The agents themselves struggled to cooperate effectively — deciding who should work on what, avoiding conflicts, communicating status. I couldn’t keep up with the volume of parallel activity, and the coordination overhead ate into the productivity gains.
I ultimately switched to a hardened single-agent workflow, orchestrated through Open Prose scripts. One agent, focused, with structured review cycles. Less exciting, more effective.
Beads for Task Tracking
Beads is Steve Yegge’s AI-native issue tracker — distributed, git-backed, designed for agents to use directly. The core concept is genuinely good, and I’ve been using it throughout this project.
But keeping beads up to date has been a consistent challenge. Agents routinely ignore tracking requirements unless you aggressively enforce them. One agent skipping the process can invalidate task state and confuse every agent that comes after it — they can’t tell what’s been done and what hasn’t. The more complex your workflow architecture, the harder consistent enforcement becomes.
That said, Beads solves a real problem when it works. More on that in a moment.
The original Beads project has been expanding toward complex orchestration for Steve’s Gas Town project. If you’re interested in the core issue-tracking concept without the added complexity, check out Jeffrey Emanuel’s Rust port: beads-rust. It sticks closer to the original vision — focused, fast, and good at the fundamentals.
The Context Engineering Insight
If there’s one thing I’d want anyone experimenting with agentic development to take away from this post, it’s this: context engineering matters more than prompt engineering.
The term has been gaining traction, and for good reason. Several pieces have shaped my thinking here:
- Philipp Schmid’s “The New Skill in AI is Not Prompting, It’s Context Engineering” — reframing the core skill from crafting prompts to shaping the information environment around them
- Anthropic’s “Effective Context Engineering for AI Agents” — practical patterns from the people who build the models
- Boris Tane’s “Context Engineering is What Makes AI Magical” — why context is the real leverage point
- LangChain’s “How Agents Can Use Filesystems for Context Engineering” — grounding context in persistent, structured file systems
Here’s what this means in practice: agents hit a “dumb zone” once they’ve filled roughly 60% of their context window. Output quality degrades — they start losing track of earlier instructions, making inconsistent decisions, hallucinating details. This isn’t a prompting problem. It’s a context problem.
The implication is that you need to size tasks so an agent can complete them entirely within a fresh context window. Handing off half-completed work to a new session is extremely hard to automate well. The receiving agent doesn’t have the same mental model, doesn’t know what implicit decisions were made, and wastes tokens reconstructing context that the previous session had naturally.
Beads as the Atomic Unit
This is where Beads clicks. A well-written bead isn’t just a task description — it’s a self-contained context packet. It has everything an agent needs to pick up the work cold: what needs to be done, why, which files are involved, what the acceptance criteria are, and what related work exists.
The workflow becomes almost mechanical:
- Start a fresh session
- Run bd ready — see what’s unblocked
- Pick a task — the bead description has full context
- Do the work entirely within this session
- Review, iterate, address any issues
- Close the bead
- New session, repeat
No handoff problem. No context fragmentation. Each session is a clean, focused unit of work. The bead is the interface between sessions — it carries the context so the agent doesn’t have to.
This pattern also makes workflow enforcement more tractable. Instead of trying to get agents to follow complex multi-step processes across sessions, you’re asking them to do one thing: pick up a bead, complete it, close it. The complexity lives in the bead descriptions and the dependency graph, not in the agent’s runtime behavior.
To make this concrete, here’s what a well-written bead looks like in practice:
home-base-olf · Add custom Nostr social icon [P2 · OPEN]
Replace the placeholder icon for the Nostr social link in navigation.
Current state:
- Config (src/config.ts): Nostr link uses icon: 'hexagon-nodes' (generic)
- Icon system: src/components/Icon.astro uses FontAwesome brands + custom SVG paths
Task:
1. Find official Nostr SVG icon
2. Add as custom SVG path in Icon.astro icons map under name 'nostr'
3. Update src/config.ts social entry: icon: 'nostr'
4. Verify renders at 24px, matches other social icons
Verification:
- npm run dev → check header nav → distinct, recognizable icon
- Test dark mode and light mode
- Test mobile menu rendering
That’s a complete context packet. An agent starting a fresh session can read that bead, understand exactly what to do, do it, and close the bead — all without needing context from any previous session. That’s the goal.
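In terms of commands, the session loop around a bead like that one looks roughly like this. bd ready appears in the list above; the other subcommands are how I would sketch the Beads CLI, so treat the exact names as an assumption:

```sh
# One fresh session per bead. Illustrative only; exact bd subcommands may vary.
bd ready                 # see what's unblocked
bd show home-base-olf    # read the full context packet for the chosen bead
# ...do the work entirely within this session...
npm run dev              # run the verification steps from the bead
bd close home-base-olf   # close it out; the next session starts clean
```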
What I’ve Learned So Far
If I had to distill everything into a short list:
Go deep on one tool. I wasted weeks trying to maintain interoperable workflows. The compounding returns come from specializing — custom commands, custom skills, custom workflows built around one harness’s patterns.
Expect agents to break your process. They will ignore rules, skip steps, and forget instructions. Design your workflows to be resilient to that. Atomic tasks with self-contained context are more robust than complex multi-step processes that assume compliance.
Start with one agent. Multi-agent coordination is exciting but the overhead is real. A single focused agent with structured review cycles outperforms a swarm of agents that can’t cooperate.
Front-load planning, back-load review. I used to try to minimize token consumption — get it right in one shot. Now I’d rather spend heavily on both ends. Plan upfront with multi-perspective adversarial reviews to surface problems and ambiguities before building anything. Then review the implementation thoroughly against the functional requirements and acceptance criteria. The cost of drift — building something that doesn’t match what was specified — compounds fast across sessions.
Treat context as infrastructure. This is the meta-lesson. Your task descriptions, your file organization, your AGENTS.md files — these aren’t documentation. They’re the operating environment your agents run in. Engineer them accordingly.
What’s Next
This site is still being built. I’ve recently started experimenting with using OpenClaw agents to handle some of the ongoing development and content production — but that’s a story for another post.
For now, there’s a backlog of work to do: more content, more polish, more experiments. If you’re building with agents too, I’d love to hear what’s working for you — find me on Nostr or X.
The site is the experiment. The blog is the lab notebook. Let’s see where it goes.