LLM Toolchain - June 2026
My toolchain is constantly evolving, but as of June 2026 I’ve landed on the following:
- claude harness - latest model
- codex harness - latest model
- glm-5.2 via Ollama via claude harness
- Arcaneum - local full text and semantic search
- RDR skills
- kata-flight skills
- kata
- roborev
Coming soon:
- intrastate
Arcaneum is how I start every project: deep research against academic literature, adjacent open source projects, reference materials, and any relevant online documents. For academic research I keep a full corpus named PapersFast (~5k papers) that is fast to index into and search across. For a specific topic, I often create a focused corpus indexed with a better embedding model to help catch nuances across papers.
The RDR process has been evolving for a year. It's a spec-first process, and each stage is now wrapped by a skill. It is deliberately not fully automated: I still do all the user-facing design work and go into the specs myself to revise them.
kata-flight is new. Over the past month or so I’ve effectively created an ad-hoc state machine around issue tracking with kata. kata-flight formalizes that in a set of skills.
[Diagram: kata-flight skill flow. /kata-flight (batch orchestrator) runs /kata-scope-review (review gate) per wave and /kata-ship (single-kata ship) per kata. /kata-ship routes to /kata-resolve (fix in worktree) on "resolve" and to roborev refine / respond on "refine". The review gate labels design forks kind:rdr-seed and held or needs-human katas inbox:*; /kata-inbox (human inbox drain) sends READY katas back to /kata-flight and TO-SEED katas to kind:rdr-seed. Seeds go through /rdr-seed-triage and /rdr-seed (Stage 1) into the RDR design flow (Stages 1-7, external engine), then /rdr-implement-triage (Stage 8: build), which feeds /roborev-triage and /rdr-implement-land (land); flight bug children from landing return to /kata-flight. Supporting pieces: a setup step (bind repo), /kata-flight-doctor (health check), /kata-flow-ops (dashboard / reaper), /prompt-ship (which also refines via roborev), and the shared libraries worktree-ship-pipeline (read by /kata-ship and /prompt-ship) and lib-land-rdr (read by /rdr-implement-triage and /rdr-implement-land).]
It lets me ship well-researched bug fixes and small features. If a feature or bug crosses a certain threshold, though, it funnels the issue into the RDR process for more intentional design. A wrapper skill also drives an RDR implementation (once its design is marked final) through any roborev fixes that spin off during implementation.
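The kata-flight routing (the edges in the diagram above) can be read as a small state machine. Here is a minimal Python sketch, assuming each skill reports a plain string outcome matching the diagram's edge labels. The `ROUTES` table and `next_step` function are hypothetical illustrations, not part of the published skills.

```python
# Hypothetical transition table: (current step, outcome) -> next step.
# Step names and outcomes are copied from the diagram's node and edge labels.
ROUTES: dict[tuple[str, str], str] = {
    ("/kata-flight", "per wave"): "/kata-scope-review",
    ("/kata-flight", "per kata"): "/kata-ship",
    ("/kata-ship", "resolve"): "/kata-resolve",
    ("/kata-ship", "refine"): "roborev",
    ("/kata-scope-review", "design fork"): "kind:rdr-seed",
    ("/kata-scope-review", "held / needs human"): "inbox:*",
    ("/kata-inbox", "READY"): "/kata-flight",
    ("/kata-inbox", "TO-SEED"): "kind:rdr-seed",
}


def next_step(current: str, outcome: str) -> str:
    """Return the next step for an outcome; unknown pairs raise instead of guessing."""
    try:
        return ROUTES[(current, outcome)]
    except KeyError:
        raise ValueError(f"no transition from {current!r} on {outcome!r}") from None


if __name__ == "__main__":
    print(next_step("/kata-scope-review", "design fork"))  # prints kind:rdr-seed
```

Keeping the routes as data means an unexpected outcome fails loudly rather than leaving a model to improvise the next hop.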
Some projects use codex as the roborev reviewer; others use claude with glm-5.2 as the reviewer.
Note that I’m publishing these skills as a matter of transparency with no expectation of any adoption. It’s great to see what others are doing, so I’m sharing. I know everyone has their own bespoke workflows and skills, so hopefully others will publish their own flows.
I also use Arcaneum to index and search harness transcripts, looking for ways to improve the skills and prompts, and to draw on current research on prompt engineering. This is an ongoing process.
I’m not a “token maxer”, but I am tracking my usage publicly on https://tkmx.odio.dev/u/cwensel.
It would be great if there were a productivity metric showing that I get more capability for the same token usage. It would tell me whether my changes to the RDR and kata-flight skills and prompts were actually improving my token efficiency.
One major issue I’ve found is that LLMs are bad at state transitions. They are always slow, and they frequently choose the wrong next state, no matter how simple the decision criteria are.
Because of that I’ve started a new project named intrastate.
As of this moment it is nothing but a collection of final RDR instances. Tokens permitting, I’ll kick off the implementation before I drop out for summer vacation.
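To make the state-transition problem concrete, one common mitigation is to let code, not the model, decide transitions whose criteria are simple, and to treat any model-proposed transition as untrusted input. The Python sketch below is a generic illustration under that assumption. It is not intrastate's design, which its RDRs define. The states, the `Facts` fields, and the `decide` and `accept_proposal` functions are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical states for a single issue; not taken from kata or intrastate.
ALLOWED: dict[str, set[str]] = {
    "open": {"in_review", "needs_human"},
    "in_review": {"shipping", "needs_design", "needs_human"},
    "shipping": {"done", "in_review"},
    "needs_human": {"open"},
    "needs_design": set(),
    "done": set(),
}


@dataclass
class Facts:
    """Hypothetical signals a deterministic rule can check directly."""
    review_passed: bool
    design_fork: bool
    blocked_on_human: bool


def decide(state: str, facts: Facts) -> str:
    """Compute the next state from facts alone; no model call involved."""
    if state == "in_review":
        if facts.blocked_on_human:
            return "needs_human"
        if facts.design_fork:
            return "needs_design"
        if facts.review_passed:
            return "shipping"
    return state  # no rule fired: stay put rather than guess


def accept_proposal(state: str, proposed: str) -> str:
    """Accept a model-proposed transition only if the table allows it."""
    if proposed not in ALLOWED.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {proposed}")
    return proposed
```

The split keeps the simple decisions cheap and deterministic, and turns the model's output into a proposal that a set lookup can reject instead of a decision the workflow has to trust.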