Curious how you can actually talk to your documents with an LLM? So was I, so I built some tools.

The technique is called Retrieval-Augmented Generation (RAG). Open source tools for this already exist: LM Studio, AnythingLLM, OpenWebUI. However, most are black boxes; it's not exactly clear how they work under the hood. I could have read the source, but I wanted to build something up from first principles.
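To make the idea concrete, here's a minimal sketch of the retrieve-then-generate loop. The toy bag-of-words retriever and the sample chunks are stand-ins so it runs on its own; a real pipeline would swap in an embedding model and an actual LLM call.

```python
# Minimal RAG loop: retrieve the most relevant chunks, then build a prompt.
# The "embedding" here is just word counts so the example is self-contained.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy embedding: word counts. Real RAG uses dense vectors from a model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "SwiftUI adds new animation APIs in this session.",
    "This talk covers memory debugging in Instruments.",
    "Core ML models can now be compressed on-device.",
]
question = "What's new in SwiftUI animation?"
context = "\n".join(retrieve(question, chunks))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # This is what you'd hand to the LLM.
```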

So I built a set of tools to run my own RAG experiments using transcripts from Apple WWDC conferences.

This was a learning project, not a product. But along the way I realized something: the way you chunk and tag your documents makes a huge difference.

Most RAG pipelines I’ve seen ignore context structure. I wanted to see what happens when you don’t.
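Here's a rough sketch of what structure-aware chunking could look like for transcripts. The "[MM:SS] Speaker:" format and the session label are assumptions for illustration, not the actual layout of the WWDC transcripts in the repo.

```python
# Structure-aware chunking sketch: split on speaker turns instead of a fixed
# character window, and tag every chunk with session metadata so retrieval
# results can be traced back to their source.
import re

def chunk_by_structure(transcript: str, session: str, max_chars: int = 500) -> list[dict]:
    """Split on speaker turns, then pack turns into chunks tagged with metadata."""
    turns = re.split(r"\n(?=\[\d{2}:\d{2}\])", transcript.strip())
    chunks, current = [], ""
    for turn in turns:
        if current and len(current) + len(turn) > max_chars:
            chunks.append({"session": session, "text": current.strip()})
            current = ""
        current += turn + "\n"
    if current.strip():
        chunks.append({"session": session, "text": current.strip()})
    return chunks

transcript = """\
[00:01] Host: Welcome to the session on SwiftUI animations.
[00:12] Host: Today we'll cover the new animation APIs.
[01:05] Engineer: Let's look at a code example."""

# "WWDC-Example" is a made-up session label, not a real session ID.
for c in chunk_by_structure(transcript, session="WWDC-Example"):
    print(c["session"], "->", c["text"][:40], "...")
```

Keeping chunk boundaries on speaker turns and carrying the session tag along is one way to preserve the context structure the surrounding text is talking about.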

If you’re curious how to build your own RAG stack, I’d love feedback, forks, or ideas for where to take it next.

github.com/JoeCotell…

Joe Cotellese @JoeCotellese