// Capabilities

Everything Kenzy does — and where it runs.

Each capability is a self-contained service. Run the whole set on one machine or spread it across the house. The common thread: it can all run on hardware you own.

01 · KENZY-NODE

Wake word, listening on-device

Every room node runs openWakeWord on each audio frame, locally. Nothing is streamed anywhere until the wake word actually fires — the mic isn't an open line to the cloud.

An optional Silero VAD gate suppresses false triggers on near-silence, so you can lower the threshold for better real-speech sensitivity without the assistant waking to a creaky floorboard.

  • Two bundled models, or drop in a .tflite/.onnx you trained yourself
  • VAD-gated detection to cut false wakes
  • Wake-word interrupt — talk over playback to start a new request

Loaded models

hey_kenzie ken_zee your_model.tflite

Detection

02 · KENZY-STT

Transcription on your machine

Speech-to-text runs on faster-whisper — locally, by default. Pick a model size to match your hardware, from tiny on a Pi to large-v3 on a GPU workstation.

Because it's a service behind a URL, you choose where it lives. Keep it on the LAN and your spoken words are transcribed without ever touching someone else's server.

  • CPU or CUDA, with int8 / float16 compute options
  • Model size is one line of config
  • Swap in a cloud STT endpoint only if you want to
  configs/stt.yaml
whisper:
  model: "base"      # tiny → large-v3
  device: "cpu"      # or "cuda"
  compute_type: "int8"
  language: "en"

03 · KENZY-LLM

Bring your own language model

This is the heart of the "100% local" story. kenzy-llm runs on LiteLLM, which talks to local runtimes — Ollama, LM Studio, vLLM — exactly the same way it talks to OpenAI or Anthropic.

So you decide the privacy/quality trade-off, per install. Run a small model entirely offline, or route to a frontier model in the cloud. Changing your mind is two lines of YAML.

  • Local or cloud — same config, swap freely
  • Per-room conversation memory with a short TTL
  • Structured responses that carry a TTS voice style
  configs/llm.yaml
# fully offline
model: "ollama/llama3.1"
base_url: "http://localhost:11434"
OllamaLM StudiovLLMOpenAIAnthropic

04 · SKILLS

Tool-calling skills, zero boilerplate

A skill is just an async Python function in skills/ with a @skill decorator. Kenzy reads its signature and docstring to build the tool schema automatically — the model calls it when it fits.

Weather, news, stocks, dice, Home Assistant control, and version info ship in the box. Adding your own is one file, no registration.

  • Auto-generated tool schemas from type hints
  • Per-skill config in llm.yaml, secrets from .env
  • Disable any skill by name without deleting it
  skills/my_skill.py
@skill
async def set_scene(name: str) -> str:
    """Activate a lighting scene by name."""
    return f"Scene {name} is on."

05 · FAST PATH

Instant commands, no round-trip

Some things shouldn't wait on a language model. "Turn on the lights" should just happen. Kenzy's deterministic fast path parses common commands locally and acts immediately — the LLM is the fallback, not the bottleneck.

It uses padacioso for intent parsing and rapidfuzz for device matching, with the model catching anything ambiguous. The lights are usually on before the confirmation finishes speaking.

  • High-frequency commands answer in milliseconds
  • Falls through to the LLM when it isn't sure
  • Past-tense confirmations — the device already changed
UTTERANCE"turn off the kitchen lights"
↓ local parse
FAST PATH · ~MSTurned off the kitchen lights.no model call

06 · KENZY-SPEAKER

Knows who is speaking

SpeechBrain's ECAPA-TDNN model identifies enrolled speakers from a short voice profile. Enroll each person once with kenzy-enroll and Kenzy can tell the household apart — all locally.

That powers real guardrails: unlocking a door or opening a garage by voice can be restricted to a recognized person, refusing an unidentified voice outright.

  • On-device speaker embeddings, no cloud
  • Speaker-gated secure actions (locks, covers)
  • Runs in parallel with transcription — no added latency

Enrolled profiles

johnnickiguest

Secure action

UNLOCK · UNKNOWN VOICERefusedrequires a recognized speaker

07 · KENZY-TTS

A voice of its own

Text-to-speech runs through OpenAI's TTS for polish, or Kokoro for a fully local PyTorch voice — your choice of where the audio is synthesized.

The LLM can attach a voice style to each reply, so Kenzy can sound calm, upbeat, or matter-of-fact to match the moment.

  • Local Kokoro voices or hosted OpenAI voices
  • Per-response voice styling
  • Streamed back to the room as raw PCM

Backends

Kokoro · localOpenAI · hosted

Streamed to room

08 · KENZY-DEPLOY

Roll it out across the house

One command syncs source, manages virtualenvs, writes systemd units, and controls services across a fleet of Debian hosts over SSH. Put a node in every room without hand-configuring each one.

  • init → install → upgrade workflow
  • Pushes source, skills, and .env together
  • Health checks across every host
  fleet rollout
$ kenzy-deploy init      # prep hosts
$ kenzy-deploy install   # first deploy
$ kenzy-deploy upgrade   # push updates
$ kenzy-deploy status    # health check

// See how it fits together

One pipeline, six moving parts.

Every feature above is a service with a clear job and a simple interface. The architecture page shows how they connect.