About
William Kennedy is a managing partner at Ardan Labs in Miami, Florida. Ardan is a group of passionate engineers, artists, and business professionals focused on building and delivering reliable, secure, and scalable solutions. Bill is the author of Go in Action and the Ultimate Go Notebook, and has written over 100 blog posts and articles about Go. He is also a founding member of GoBridge and the GDN, organizations working to increase Go adoption through diversity.
Sessions at Gophercamp 2026
Kronk: Hardware accelerated local inference
In this talk, Bill will introduce Kronk, a new SDK that lets you write AI-based apps without needing a model server. If you have Apple Metal (Mac), CUDA (NVIDIA), or Vulkan, Kronk can tap into that GPU power instead of grinding through the work on the CPU alone. To dogfood the SDK, Bill wrote a model server that is optimized to run your local AI workloads with performance in mind. During the talk, Bill will show how you can use agents like Cline and Kilo Code to run local agentic workloads that perform basic work.
Ultimate Private AI
This is a hands-on, full-day workshop where you'll go from zero to running open-source models directly inside your Go applications: no cloud APIs, no external servers, no data leaving your machine.
You'll start by loading a model and running your first inference with the Kronk SDK. Then you'll learn how to configure models for your hardware (GPU layers, KV cache placement, batch sizes, and context windows) so you get the best performance out of whatever machine you're running on. With the model tuned, you'll take control of its output through sampling parameters: temperature, top-k, top-p, repetition penalties, and grammar constraints that guarantee structured JSON responses.
Next you'll see how Kronk's caching systems, System Prompt Cache (SPC) and Incremental Message Cache (IMC), eliminate redundant computation and make multi-turn conversations fast. You'll watch a conversation go from full prefill on every request to only processing the newest message.
With the foundation solid, you'll build real applications: a Retrieval-Augmented Generation (RAG) pipeline that grounds model responses in your own documents using embeddings and vector search, and a natural-language-to-SQL system where the model generates database queries from plain English, with grammar constraints ensuring the output is always valid, executable SQL.
Each part builds on the last. By the end of the day, you won't just understand how private AI works; you'll have built applications that load models, cache intelligently, retrieve context, and generate code, all running locally on your own hardware.
