// Capabilities
Everything Kenzy does — and where it runs.
Each capability is a self-contained service. Run the whole set on one machine or spread it across the house. The common thread: it can all run on hardware you own.
01 · KENZY-NODE
Wake word, listening on-device
Every room node runs openWakeWord on each audio frame, locally. Nothing is streamed anywhere until the wake word actually fires — the mic isn't an open line to the cloud.
An optional Silero VAD gate suppresses false triggers on near-silence, so you can lower the threshold for better real-speech sensitivity without the assistant waking to a creaky floorboard.
- Two bundled models, or drop in a .tflite/.onnx you trained yourself
- VAD-gated detection to cut false wakes
- Wake-word interrupt — talk over playback to start a new request
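The VAD gate described above can be sketched in a few lines. This is an illustration, not Kenzy's actual code: `should_wake`, its arguments, and both threshold values are hypothetical. In a real node the two scores would come from openWakeWord and Silero VAD for the same audio frame.

```python
def should_wake(wake_score: float, speech_prob: float,
                wake_threshold: float = 0.4,
                vad_threshold: float = 0.5) -> bool:
    """Fire only when the wake-word score and the VAD both agree.

    All names and default thresholds here are assumptions for illustration.
    """
    # Near-silence or non-speech noise: ignore the wake-word score entirely.
    if speech_prob < vad_threshold:
        return False
    # Real speech is present, so a lower wake threshold is safe to use.
    return wake_score >= wake_threshold
```

Because the VAD check rejects non-speech frames first, the wake threshold can sit lower than it could without the gate.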
02 · KENZY-STT
Transcription on your machine
Speech-to-text runs on faster-whisper — locally, by default. Pick a model size to match your hardware, from tiny on a Pi to large-v3 on a GPU workstation.
Because it's a service behind a URL, you choose where it lives. Keep it on the LAN and your spoken words are transcribed without ever touching someone else's server.
- CPU or CUDA, with int8 / float16 compute options
- Model size is one line of config
- Swap in a cloud STT endpoint only if you want to
whisper:
  model: "base"          # tiny → large-v3
  device: "cpu"          # or "cuda"
  compute_type: "int8"
  language: "en"
03 · KENZY-LLM
Bring your own language model
This is the heart of the "100% local" story. kenzy-llm runs on LiteLLM, which talks to local runtimes — Ollama, LM Studio, vLLM — exactly the same way it talks to OpenAI or Anthropic.
So you decide the privacy/quality trade-off, per install. Run a small model entirely offline, or route to a frontier model in the cloud. Changing your mind is two lines of YAML.
- Local or cloud — same config, swap freely
- Per-room conversation memory with a short TTL
- Structured responses that carry a TTS voice style
# fully offline
model: "ollama/llama3.1"
base_url: "http://localhost:11434"
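Per-room memory with a short TTL could look something like the sketch below. It is a guess at the shape, not Kenzy's implementation: the `RoomMemory` class, its method names, and the 120-second default are all hypothetical.

```python
import time
from collections import defaultdict


class RoomMemory:
    """Keeps recent conversation turns per room and drops stale ones.

    Hypothetical helper for illustration; not part of Kenzy's API.
    """

    def __init__(self, ttl_seconds: float = 120.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry is easy to test
        self._turns = defaultdict(list)  # room -> [(timestamp, role, text)]

    def add(self, room: str, role: str, text: str) -> None:
        self._turns[room].append((self.clock(), role, text))

    def history(self, room: str) -> list[dict]:
        """Return unexpired turns for one room as chat-style messages."""
        cutoff = self.clock() - self.ttl
        fresh = [turn for turn in self._turns[room] if turn[0] >= cutoff]
        self._turns[room] = fresh  # forget anything older than the TTL
        return [{"role": role, "content": text} for _, role, text in fresh]
```

Keying by room keeps the kitchen's conversation out of the office's context, and the short TTL means a follow-up question works while an unrelated request an hour later starts clean.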
04 · SKILLS
Tool-calling skills, zero boilerplate
A skill is just an async Python function in skills/ with a @skill decorator. Kenzy reads its signature and docstring to build the tool schema automatically — the model calls it when it fits.
Weather, news, stocks, dice, Home Assistant control, and version info ship in the box. Adding your own is one file, no registration.
- Auto-generated tool schemas from type hints
- Per-skill config in llm.yaml, secrets from .env
- Disable any skill by name without deleting it
@skill
async def set_scene(name: str) -> str:
    """Activate a lighting scene by name."""
    return f"Scene {name} is on."
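To show how a schema can fall out of a signature and docstring, here is a minimal sketch using only the standard library. `build_tool_schema` and the type mapping are hypothetical. Kenzy's real `@skill` decorator may work differently, and the output format is a generic function-calling shape rather than a confirmed one.

```python
import inspect

# Hypothetical mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def build_tool_schema(func) -> dict:
    """Derive a function-calling style schema from a signature and docstring."""
    sig = inspect.signature(func)
    properties, required = {}, []
    for name, param in sig.parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the model must supply it
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


async def set_scene(name: str) -> str:
    """Activate a lighting scene by name."""
    return f"Scene {name} is on."


schema = build_tool_schema(set_scene)
```

Here `schema` names the tool `set_scene`, uses the docstring as its description, and lists `name` as a required string parameter.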
05 · FAST PATH
Instant commands, no round-trip
Some things shouldn't wait on a language model. "Turn on the lights" should just happen. Kenzy's deterministic fast path parses common commands locally and acts immediately — the LLM is the fallback, not the bottleneck.
It uses padacioso for intent parsing and rapidfuzz for device matching, with the model catching anything ambiguous. The lights are usually on before the confirmation finishes speaking.
- High-frequency commands answer in milliseconds
- Falls through to the LLM when it isn't sure
- Past-tense confirmations — the device already changed
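The parse-then-match flow can be sketched as below. To stay dependency-free, this example swaps in the standard library's `re` and `difflib` for padacioso and rapidfuzz, so it shows the idea rather than Kenzy's actual calls. The device list, the pattern, and the `fast_path` function are hypothetical.

```python
import difflib
import re

# Hypothetical device names and command pattern, for illustration only.
DEVICES = ["kitchen lights", "living room lamp", "garage door"]
PATTERN = re.compile(r"^turn (?P<state>on|off) (?:the )?(?P<device>.+)$")


def fast_path(utterance: str, cutoff: float = 0.75):
    """Return (device, state) on a confident match, else None.

    None means "not sure", which is the cue to fall through to the LLM.
    """
    match = PATTERN.match(utterance.lower().strip())
    if match is None:
        return None  # not a command shape we recognize
    candidates = difflib.get_close_matches(
        match["device"], DEVICES, n=1, cutoff=cutoff
    )
    if not candidates:
        return None  # device name too ambiguous to act on
    return candidates[0], match["state"]
```

With this sketch, "Turn on the kitchen light" resolves to the `kitchen lights` device even though the spoken name is slightly off, while a question about the weather returns `None` and goes to the model.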
06 · KENZY-SPEAKER
Knows who is speaking
SpeechBrain's ECAPA-TDNN model identifies enrolled speakers from a short voice profile. Enroll each person once with kenzy-enroll and Kenzy can tell the household apart — all locally.
That powers real guardrails: unlocking a door or opening a garage by voice can be restricted to a recognized person, refusing an unidentified voice outright.
- On-device speaker embeddings, no cloud
- Speaker-gated secure actions (locks, covers)
- Runs in parallel with transcription — no added latency
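Speaker gating usually comes down to comparing an embedding against enrolled profiles and refusing when nothing is close enough. The sketch below assumes cosine similarity and a fixed threshold. The function names, the 0.7 threshold, and the two-dimensional toy vectors are hypothetical; real ECAPA-TDNN embeddings are much longer.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(embedding, profiles: dict, threshold: float = 0.7):
    """Return the best-matching enrolled name, or None if nobody is close."""
    best_name, best_score = None, threshold
    for name, profile in profiles.items():
        score = cosine(embedding, profile)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def allow_secure_action(speaker, allowed: set) -> bool:
    """Secure actions need an identified speaker who is on the allow list."""
    return speaker is not None and speaker in allowed


# Toy enrolled profiles; the names are made up for the example.
profiles = {"alex": [1.0, 0.0], "sam": [0.0, 1.0]}
```

An unidentified voice yields `None` from `identify`, so `allow_secure_action` refuses it regardless of what the allow list contains.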
07 · KENZY-TTS
A voice of its own
Text-to-speech runs through OpenAI's TTS for polish, or Kokoro for a fully local PyTorch voice — your choice of where the audio is synthesized.
The LLM can attach a voice style to each reply, so Kenzy can sound calm, upbeat, or matter-of-fact to match the moment.
- Local Kokoro voices or hosted OpenAI voices
- Per-response voice styling
- Streamed back to the room as raw PCM
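Streaming raw PCM generally means cutting the audio into small fixed-duration frames and sending them as they become available. The sketch below shows only the framing step. The `pcm_chunks` helper and its defaults (24 kHz, 16-bit mono, 20 ms frames) are assumptions, not Kenzy's documented wire format.

```python
def pcm_chunks(pcm: bytes, sample_rate: int = 24000, frame_ms: int = 20,
               sample_width: int = 2, channels: int = 1):
    """Yield fixed-duration frames of raw PCM; the last one may be shorter.

    Defaults are illustrative assumptions, not a documented format.
    """
    samples_per_frame = sample_rate * frame_ms // 1000
    frame_bytes = samples_per_frame * sample_width * channels
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]
```

Small frames let a room node start playback as soon as the first chunk arrives instead of waiting for the whole reply to be synthesized.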
08 · KENZY-DEPLOY
Roll it out across the house
One command syncs source, manages virtualenvs, writes systemd units, and controls services across a fleet of Debian hosts over SSH. Put a node in every room without hand-configuring each one.
- init → install → upgrade workflow
- Pushes source, skills, and .env together
- Health checks across every host
$ kenzy-deploy init      # prep hosts
$ kenzy-deploy install   # first deploy
$ kenzy-deploy upgrade   # push updates
$ kenzy-deploy status    # health check
// See how it fits together
One pipeline, six moving parts.
Every feature above is a service with a clear job and a simple interface. The architecture page shows how they connect.