Welcome to Kenzy.Ai.
So here’s the deal: I’ll be rebuilding Karen from scratch (mostly). I’d like to go through all the core components and restructure them to provide better support for several important features. When complete, I expect Karen to work with:
- Microphones/Audio Input Devices
- Sound Boards/Audio Output Devices
- Cameras/Video Input Devices
- Controller Boards & Servos
- Sensors for all purposes (temperature, speed, location, weight, pressure, etc.)
It’ll take a while, but a recent experiment with some new tech toys made me realize there’s a whole world of interactive components I’ve been missing out on. Plus, it’s just fun to play with new stuff.
I still fully expect to keep Karen’s modular design, although I may couple a few of the components more tightly to reduce latency.
The path to version 1.0 consists of the following:
- A speaker daemon for performing text-to-speech
- A listener daemon for performing speech-to-text
- An optional wake-word routine for command filtering
- A watcher daemon for face detection and facial recognition
- An intent parser for translating spoken text into actionable commands
- A skill-based expansion architecture (a rough sketch follows this list)
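To make the intent parser and skill pieces a bit more concrete, here’s a minimal sketch of how a skill registry and dispatcher could fit together. Everything in it (the `Skill` base class, the `SkillRouter`, the `get_time` action) is a placeholder for illustration, not the final design.

```python
# A minimal sketch of the intent -> skill plumbing (all names are placeholders).
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Intent:
    """The intent parser's output: an action name plus any captured parameters."""
    action: str
    params: Dict[str, str] = field(default_factory=dict)


class Skill:
    """Base class for drop-in skills; subclasses declare which actions they handle."""
    actions: List[str] = []

    def handle(self, intent: Intent) -> str:
        raise NotImplementedError


class TimeSkill(Skill):
    """Example skill: respond to a 'what time is it' style command."""
    actions = ["get_time"]

    def handle(self, intent: Intent) -> str:
        from datetime import datetime
        return datetime.now().strftime("It is %I:%M %p.")


class SkillRouter:
    """Maps parsed intents to whichever registered skill claims the action."""

    def __init__(self):
        self._registry: Dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        for action in skill.actions:
            self._registry[action] = skill

    def dispatch(self, intent: Intent) -> str:
        skill = self._registry.get(intent.action)
        if skill is None:
            return "Sorry, I don't know how to do that yet."
        return skill.handle(intent)


if __name__ == "__main__":
    router = SkillRouter()
    router.register(TimeSkill())
    # In the real flow the listener and intent parser would produce this Intent;
    # here it is faked to show the dispatch path end to end.
    print(router.dispatch(Intent(action="get_time")))
```

The appeal of this shape is that new skills drop in as self-contained classes, which is what keeps the expansion architecture honest.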
These features must remain true to the following constraints:
- Run under the power of one (or more) Raspberry Pi devices
- Support 100% offline mode (no network connectivity)
- Use only 3rd-party libraries with business-friendly licenses (one candidate is sketched below)
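To make the offline and licensing constraints concrete, here is one way the listener daemon could do speech-to-text with Vosk, which runs fully offline on a Raspberry Pi and carries an Apache-2.0 license. The model path and the WAV-file input are stand-ins for whatever the microphone pipeline ends up feeding it.

```python
# A minimal offline speech-to-text sketch using Vosk (paths are placeholders).
import json
import wave

from vosk import KaldiRecognizer, Model

model = Model("model/vosk-model-small-en-us")   # small English model, downloaded ahead of time
wf = wave.open("command.wav", "rb")             # stand-in for live microphone audio
rec = KaldiRecognizer(model, wf.getframerate())

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        # A complete utterance was recognized; hand the text to the intent parser.
        print(json.loads(rec.Result())["text"])

print(json.loads(rec.FinalResult())["text"])    # whatever was left at the end of the stream
```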
My plan for a demo of version 1 is to have a self-contained Raspberry Pi (or a network of Raspberry Pi devices) that performs all of these functions. I also want to include a pan/tilt setup for the camera so that it can move and refocus to follow a face around the room. Ideally this will establish the pattern for the controller board interfaces, which will eventually lead to mobility.
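For the pan/tilt follow behavior, the rough idea is simple: find the largest face in the frame, measure how far it sits from center, and nudge the servos to reduce that error. The sketch below uses OpenCV’s stock Haar cascade and gpiozero’s AngularServo; the GPIO pins, the gain, and the sign of each correction are guesses that would need tuning against the actual rig.

```python
# A rough sketch of the pan/tilt follow loop (pins, gain, and signs are guesses).
import cv2
from gpiozero import AngularServo

# Haar cascade face detector ships with OpenCV; good enough for a first pass.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# Hypothetical wiring: pan servo on GPIO 17, tilt servo on GPIO 27.
pan = AngularServo(17, min_angle=-90, max_angle=90)
tilt = AngularServo(27, min_angle=-90, max_angle=90)
pan.angle, tilt.angle = 0, 0

GAIN = 0.05  # degrees of correction per pixel of error; needs real-world tuning

cap = cv2.VideoCapture(0)
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
        if len(faces) == 0:
            continue

        # Track the largest face and nudge the servos toward centering it.
        # The correction signs depend on how the servos are mounted.
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
        frame_h, frame_w = frame.shape[:2]
        err_x = (x + w / 2) - frame_w / 2
        err_y = (y + h / 2) - frame_h / 2
        pan.angle = max(-90, min(90, pan.angle - err_x * GAIN))
        tilt.angle = max(-90, min(90, tilt.angle + err_y * GAIN))
finally:
    cap.release()
```

A proper version would smooth the motion and hand the cropped face off to the recognition step, but this captures the control loop I’m after, and it doubles as a first pass at the controller board interface pattern.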