Curious how you can actually talk to your documents with an LLM? So was I, so I built some tools.
The technique is called Retrieval-Augmented Generation (RAG). Open source tools for it already exist: LM Studio, AnythingLLM, OpenWebUI. But most are black boxes; it’s not exactly clear how they work under the hood. I could have read their source, but I wanted to build something up from first principles.
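To make the idea concrete, here’s a toy sketch of the RAG loop. This is not my actual pipeline: the bag-of-words “embedding” and the two sample documents are stand-ins for a real embedding model and a real corpus.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real pipeline calls an embedding model here."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Stand-in corpus; in practice these would be chunks of WWDC transcripts.
documents = [
    "Swift concurrency lets you write async code with structured tasks.",
    "SwiftUI views are declarative descriptions of your interface.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank stored chunks by similarity to the query and return the top k."""
    ranked = sorted(documents, key=lambda d: cosine(embed(query), embed(d)), reverse=True)
    return ranked[:k]

query = "How does async code work in Swift?"
context = "\n".join(retrieve(query))
# The retrieved chunks get stuffed into the prompt; the LLM answers grounded in them.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```

That’s the whole trick: retrieve the most relevant chunks first, then let the model generate from them instead of from memory alone.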
So I built a set of tools to run my own RAG experiments using transcripts from Apple’s WWDC sessions.
This was a learning project, not a product. But along the way I realized something: the way you chunk and tag your documents makes a huge difference in retrieval quality.
Most RAG pipelines I’ve seen ignore context structure. I wanted to see what happens when you don’t.
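Here’s a hedged sketch of what structure-aware chunking might look like. The field names (session, speaker, time) are illustrative, not the schema my tools actually use: the point is that each chunk carries tags from the transcript’s structure instead of being an arbitrary slice of text.

```python
# Hypothetical transcript segments; real WWDC transcripts would be parsed into this shape.
transcript = [
    {"session": "WWDC23: Meet Swift Macros", "speaker": "Host", "time": "00:12",
     "text": "Macros let you generate code at compile time."},
    {"session": "WWDC23: Meet Swift Macros", "speaker": "Host", "time": "04:30",
     "text": "Attached macros extend the declarations they annotate."},
]

def chunk_with_tags(segments: list[dict]) -> list[dict]:
    """One chunk per transcript segment, tagged so retrieval can filter by
    session or timestamp rather than matching on raw text alone."""
    return [
        {
            "text": seg["text"],
            "tags": {"session": seg["session"], "speaker": seg["speaker"], "time": seg["time"]},
        }
        for seg in segments
    ]

for chunk in chunk_with_tags(transcript):
    print(chunk["tags"]["session"], "→", chunk["text"])
```

Compare that with splitting every N characters: a fixed-size splitter will happily cut a sentence in half and lose which session it came from, and the retriever pays for it.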
If you’re curious how to build your own RAG stack, I’d love feedback, forks, or ideas for where to take it next.