Arcaneum via RDRs
A case study in spec-first LLM development
Arcaneum is a set of tools to index PDF documents, source code repositories, and markdown files. It can be installed as a Claude Code Marketplace plugin.
It is intended to create both semantic search and full-text search indexes for use by an LLM when rendering code or creating a feature specification. But so far I've had my hands full just getting the semantic search working reliably. This is a me problem, not a technology problem.
Prior to building Arcaneum, I started using what I call a specification-first, LLM-driven development model via RDRs. In my day job, I write a single RDR per feature. But for Arcaneum, I decided to write as many RDRs as possible before rendering a single line of code. Check out the RDR index.
I don't write Python, but Arcaneum is written in Python because I knew it had a rich set of supporting libraries. I also wanted to see how far I could get without reviewing code, just by using the app and adjusting it at an LLM level of abstraction. If I did read the code, I wouldn't have any opinions on it, so why bother.
The RDR model worked great for getting my thoughts in order and during subsequent development. My primary challenge has been the performance and memory management of the chunking and embedding pipeline. This is somewhat aggravated by my lack of reasonable tests. As a Java person, I complete every RDR with a suite of tests, but that wasn't something I pushed hard on with Arcaneum.
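To make the memory concern concrete, here's a minimal sketch of the kind of pipeline shape that keeps peak memory bounded: stream chunks file by file and embed in fixed-size batches. This is not Arcaneum's actual code; `chunk_file` and `embed_batch` are hypothetical stand-ins for whatever chunker and embedding model is in play.

```python
from pathlib import Path
from typing import Iterable, Iterator

BATCH_SIZE = 64  # tune to whatever fits the GPU / unified memory budget

def chunk_file(path: Path, size: int = 1200) -> Iterator[str]:
    """Hypothetical chunker: fixed-size character windows, one file at a time."""
    text = path.read_text(errors="ignore")
    for i in range(0, len(text), size):
        yield text[i:i + size]

def embed_batch(chunks: list[str]) -> list[list[float]]:
    """Placeholder for the real embedding model call (e.g. an encode() on a loaded model)."""
    return [[float(len(c))] for c in chunks]  # dummy one-dimensional vectors

def embed_corpus(paths: Iterable[Path]) -> Iterator[tuple[str, list[float]]]:
    """Stream chunks lazily and embed in fixed-size batches so the whole corpus is never in RAM."""
    batch: list[str] = []
    for path in paths:
        for chunk in chunk_file(path):
            batch.append(chunk)
            if len(batch) == BATCH_SIZE:
                yield from zip(batch, embed_batch(batch))
                batch = []
    if batch:  # flush the trailing partial batch
        yield from zip(batch, embed_batch(batch))
```

The point of the generator shape is that memory usage is governed by the batch size and the largest single file, not by the size of the repository being indexed.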
I use Arcaneum every day I write code and on the days I’m just doing research.
One of my primary flows is to clone a git repo, index it, and create a document analyzing the code base through different lenses. For example: how complete is the JWT support in a given framework? Or compare two frameworks' footprints in terms of the size of their transitive dependencies and the number of documented CVEs over time. I even 'diff' two versions of a standards document (this is not a replacement for your compliance team…).
I'm about to pick up my Clusterless project again and refresh parts of it. In preparation, I indexed the full aws-cdk repo last week; it took almost 30 hours using the jina-code embedding model, while most repos take a few minutes to an hour depending on the embedding model used. Every line of Clusterless and Tessellate to date has been human-crafted. I'm cautiously looking at using an LLM moving forward (using RDRs in strict mode).
It's important to note that semantic search isn't complete without a complementary full-text search interface. So far I've been mirroring my PDF documents (mostly standards documents) with markdown copies in a parallel directory (not yet true of my full academic paper corpus of 3k+ files), so my LLM can grep for lines to use as citations and the like.
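For the mirroring step, a small script along these lines is enough. The post doesn't say which converter is actually used; this sketch assumes pymupdf4llm as the PDF-to-markdown tool, and the directory names are hypothetical.

```python
from pathlib import Path
import pymupdf4llm  # example converter; any PDF-to-markdown tool would do

PDF_ROOT = Path("docs/pdf")       # hypothetical source tree of standards PDFs
MD_ROOT = Path("docs/markdown")   # parallel tree of markdown mirrors for grep

for pdf in PDF_ROOT.rglob("*.pdf"):
    md_path = MD_ROOT / pdf.relative_to(PDF_ROOT).with_suffix(".md")
    md_path.parent.mkdir(parents=True, exist_ok=True)
    md_path.write_text(pymupdf4llm.to_markdown(str(pdf)), encoding="utf-8")
```

Keeping the markdown tree's layout identical to the PDF tree makes it trivial for the LLM to map a grep hit back to the original document when citing it.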
Also, mactop is great for watching GPU utilization.