| ▲ | jmward01 2 days ago |
| There is a huge market segment waiting here. At least I think there is. Well, at least people like me want this. OK, tens of dollars can be made at least. It is just missing a critical tipping point. Basically, I want an Alexa-like device for the home backed by local inference and storage, with some standardized components identified:

- The interactive devices. All the Alexa/Google/Apple devices out there are this interface; also probably some TV input that stays local and that I can voice control, that kind of thing. It should have a good speaker and voice control. It probably should also do other things, like act as a WiFi range extender or be the router. That would actually be good: I would buy one for each room, so no need for crazy antennas if they are close, and they can create a true mesh network for me. But I digress.

- The home 'cloud' server that is storage and control. This is a cheap CPU, a little RAM, and potentially a lot of storage. It should hold the 'apps' for my home and be the one place I can back up everything about my network (including the network config!).

- The inference engines. That is where this kind of repo/device combo comes in. I buy it, it knows how to advertise its services in a standard way, and the controlling node connects it to the home devices. It would be great to just plug it in and go.

Of course, all of these could be combined, but conceptually I want to be able to swap and mix and match at these levels, so options and interoperability are what really matter. I know a lot of (all of) these pieces exist, but they don't work well together. There isn't a simple, standard 'buy this, turn it on, and pair it with your local network' kind of plug-and-play environment. My core requirements are really privacy, and that it starts taking over the unitaskers / plays well with other things. There is a reason I am buying all this local stuff: if you phone home or require me to set up an account with you, I probably don't want to buy your product.

I want to be able to say 'Freddy, set a timer for 10 mins' or 'Freddy, what is the number one tourist attraction in South Dakota' (Wall Drug, if you were wondering). |
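A minimal sketch of the 'advertise and pair' flow described above (all names hypothetical; a real version would presumably ride on mDNS/zeroconf or similar rather than an in-memory dict):

```python
import json

# Hypothetical registry living on the home 'cloud' server.
class HomeRegistry:
    def __init__(self):
        self.devices = {}

    def announce(self, payload: str):
        """A device broadcasts a JSON blob describing its services."""
        info = json.loads(payload)
        self.devices[info["id"]] = info
        return info["id"]

    def find(self, service: str):
        """The controlling node looks up every device offering a service."""
        return [d["id"] for d in self.devices.values()
                if service in d["services"]]

registry = HomeRegistry()

# An inference box plugs in and announces itself.
registry.announce(json.dumps({
    "id": "infer-01",
    "services": ["stt", "llm", "tts"],
}))

# A speaker/mic satellite announces itself too.
registry.announce(json.dumps({
    "id": "kitchen-satellite",
    "services": ["mic", "speaker"],
}))

print(registry.find("llm"))  # which boxes can run inference?
```

The point is only the shape of the contract: devices self-describe, the controller pairs them, and any layer can be swapped out.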
|
| ▲ | Normal_gaussian 2 days ago | parent | next [-] |
| No, there isn't a plug-and-play one yet, but I've had great success with Home Assistant and the Home Assistant Voice Preview Edition, and its goal is pretty much to get rid of Alexa. I'd imagine you'd have a bunch of cheap ones in the house that are all WiFi + mic + speakers, streaming back to your actual voice-processing box (which would cost a wee bit more, but also have local access to all the data it needs). You can see quite quickly that this becomes just another program running on a host, so if you use a slightly beefier machine and chuck a WiFi card in as well, you've got your WiFi extenders. |
| |
| ▲ | joshstrange 2 days ago | parent | next [-] | | > but I've had great success with Home Assistant and the Home Assistant Voice Preview edition

As compared to Alexa? I bought their preview hardware (and had a home-rolled ESP32 version before that, even) and things are getting closer; I can see the future where this works, but we aren't there today IMHO. HA Voice (the current hardware) does not do well enough in the mic or speaker [0] department when compared to the Echos. My Echo can hear me over just about anything and I can hear it back; the HA Voice hardware is too quiet, and the mic does not pick me up from the same distances or noise-pollution levels as the Echo. I _love_ my HA setup and run everything through it. I'd like nothing more than to trash all my Echos; I came close to ordering multiple of the preview devices but convinced myself to get just 1 to test (glad I did). Bottom line: I think HA Voice is the future (for me) but it's not ready yet; it doesn't compare to the Echos. I wish so much that my Sonos speakers could integrate with HA Voice since I already have those everywhere and I know they sound good. [0] I use Sonos for all my music/audio listening in my house, so I only care about the speaker for hearing it talk back to me; I don't need high-end audiophile speakers. | | |
| ▲ | Normal_gaussian 2 days ago | parent | next [-] | | I've not had any issues with the audio pickup, but it's in the living room rather than the kitchen. I have Alexas in most rooms. I don't play music through it, which I do through the Alexa. Tbh I think the mic and the speakers will be fine when the rest of the 'product' is sorted. I failed to mention I have Claude connected to it rather than their default assistant. To us, this just beats Alexa hands down. I have the default assistant on another wake word and Mistral on the last; they're about as good as Alexa, but I rarely use them. | | |
| ▲ | joshstrange a day ago | parent [-] | | Interesting, well I'm glad it's working well for you all. I tested with local, HA Cloud, and ChatGPT/Claude and that wasn't the sticking point; it was getting the hardware to hear me or for me to hear it. I will say, while it was too slow (today) with my local inference hardware (CPU, older computer and a little on my newer MBP), it was magical to talk to and hear back from HA all locally. I look forward to a future where I can do that at the same speed/quality as the cloud models. Yes, I know cloud models will continue to get better, but turning on/off my fans/lights/etc doesn't need the best model available, it just needs to be reliable and fast. I'm even fine with it "shelling out" to the cloud if I ask for something outside of the basics, though I doubt I'll care to do that. | | |
| ▲ | Normal_gaussian a day ago | parent [-] | | > Yes, I know cloud models will continue to get better, but turning on/off my fans/lights/etc doesn't need the best model available, it just needs to be reliable and fast. I'm even fine with it "shelling out" to the cloud if I ask for something outside of the basics, though I doubt I'll care to do that.

This is exactly how I feel. It's also why I like the multiple wake words: one for remote and one for local. One of the amazing things I've found with the LLM-powered voice assistants is being able to 'recover' from mistakes. E.g., when cooking and forgetting to set the next timer, I can recover by asking about another event, like when the last timer ended or when I turned off the bedroom light. It's annoying that you can't do that with Alexa. This 'complexity' doesn't need a huge or SOTA model to resolve! I also enjoy being able to ask for a song by half title and half description; my wife was trying to play Ghost by Au/Ra, which we just can't get the Alexa to do, and which I can't reasonably get my local LLMs to fail at. After your comment earlier I took the Preview Edition into the kitchen, where it did perform a lot worse with the multiple bits of white noise and odd room shape. |
|
| |
| ▲ | luma 2 days ago | parent | prev [-] | | I had the same experience; eBay suggests that I'll have a Jabra speakerphone in my mailbox tomorrow to try moving everything to a better audio setup. The software seems good, but the audio performance is miserable on the preview device: you essentially have to be talking directly at the microphone from not more than a few feet away for anything to recognize. Sadly, the Jabra (or any USB audio device) means I'll need to shift over to an rPi, which comes with its own lifecycle challenges. |
| |
| ▲ | mcny 2 days ago | parent | prev [-] | | And if it is plugged in to the wall, I'd be tempted to add a touch screen display and a camera just in case. But really my use case is as simple as:

1. Wake word, what time is it in ____
2. Wake word, how is the weather in ____
3. Wake word, will it rain/snow/?? in _____ today / tomorrow / ??
4. Wake word, what is ______
5. Wake word, when is the next new moon / full moon?
6. Wake word, when is sunrise / sunset?

And something similar like that | | |
| ▲ | sallveburrpi 2 days ago | parent [-] | | So you need a clock maybe?
Plus something like wttr.in | | |
| ▲ | mcny 16 hours ago | parent [-] | | Problem is, it should be accessible by voice for, like, a ninety-year-old person. |
|
|
|
|
| ▲ | estimator7292 10 hours ago | parent | prev | next [-] |
| Just last week I hacked my Echo Show to install a custom OS and hook it into HomeAssistant. Even gave it a custom wake word, she's Janet now. HA is pretty clunky and there's a lot of manual setup. But I have a voice assistant contained entirely within my local infrastructure. I'm even planning to wire it up to my local ollama server for actual AI inference behind it. So far it's exactly as crappy as Alexa, but only because I haven't waded deep enough into configuration. I'm okay with tools being crap when it's my fault instead of the tool being crap because it doesn't make Amazon enough money. |
| |
|
| ▲ | ragebol 2 days ago | parent | prev | next [-] |
| A bit like HomeAssistant Voice? https://www.home-assistant.io/voice-pe/ |
|
| ▲ | fuzzer371 2 days ago | parent | prev | next [-] |
| And there never will be. You know why? Because then the giant corporations can't suck up all your data and tailor advertisements to you. Why sell a good thing once when you can sell crappy, ad-ridden shovelware and a subscription service every month? |
| |
| ▲ | jmward01 2 days ago | parent [-] | | Open source is amazing for this. Honestly, I suspect this is much simpler than the Jellyfin ecosystem and other open source projects out there. Really, we are so close to this now; it is just missing a few things, like a good 'how to' that ties it all together and turns into the open-source repo that bundles things. |
|
|
| ▲ | protocolture 2 days ago | parent | prev | next [-] |
Keen for this also. Been having issues getting a smooth voice experience from HA to ChatGPT. I don't like the whole wake-word concept for the receiver either. I think there's work to be done on the whole stack. |
| |
| ▲ | fennecbutt 2 days ago | parent | next [-] | | What's wrong with the wakeword stuff? Great timing, as I was looking into it yesterday while thinking about writing my own set of agents to run house stuff. I don't want to spend loads of time on voice interaction, so HA's wakeword stuff would've been useful. If not, I'll bypass HA for voice and really only use HA via MCP. I can do fw dev for micros... but omg do I not want to spend the time looking thru a datasheet and getting something to run efficiently myself these days. | | |
| ▲ | protocolture a day ago | parent [-] | | You can use the vendor-supported wakewords, and they are generally pretty good. However: these are device specific. The devices I purchased for this purpose have very few vendor-supported wakewords, but even more prominently, refuse to integrate with HA. Possibly a firmware issue, but I have reloaded the firmware 30 times. I don't necessarily want to purchase something else for this purpose, which is where building a bespoke HA audio box becomes its own can of worms. But if you want a custom wake word, or more like a wake phrase, you go down a rabbit hole of training/cost/memory etc. that starts to get annoying fast. I kind of know I am being unreasonable. I don't want a device that just ships off everything it hears to an LLM, even a local one; that would suck. I just want a third way. Then there's other stuff. Like, HA has a hard time with providing context to an LLM, because it sends the whole conversation thus far off to the LLM for context. It can get really weird really quickly. This caused me a lot of issues with lights, for example. It would remember switching a light on, and if that was in the context, would refuse to switch it on a second time if it had turned off due to a rule or manual intervention. But if you don't send the context, you can't have deeper conversations. You can't ask subsequent questions, basically. | |
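One workaround for the stale-context problem, sketched here as hypothetical pseudocode (not an actual HA integration): strip past turns that assert device state out of the history, and inject the current state freshly read from HA instead, so the model never argues with reality but follow-up questions still work.

```python
# Hypothetical pre-filter for the conversation history sent to the LLM.
# 'is_state_claim' is a made-up flag marking turns that asserted device
# state; a real version would need its own way to detect those turns.

def prune_history(history, live_state):
    """history: list of {'role', 'content'} dicts.
    live_state: entity -> state, read from HA right before the request."""
    kept = [turn for turn in history
            if not turn.get("is_state_claim", False)]
    state_line = ", ".join(f"{k} is {v}"
                           for k, v in sorted(live_state.items()))
    kept.insert(0, {"role": "system",
                    "content": f"Current device state: {state_line}."})
    return kept

history = [
    {"role": "user", "content": "turn on the bedroom light"},
    {"role": "assistant", "content": "The bedroom light is now on.",
     "is_state_claim": True},  # stale after a rule turned the light off
    {"role": "user", "content": "turn on the bedroom light"},
]
pruned = prune_history(history, {"light.bedroom": "off"})
print(pruned[0]["content"])  # Current device state: light.bedroom is off.
```

This keeps the conversational thread (so subsequent questions still have context) while dropping the claims that go stale.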
| ▲ | estimator7292 10 hours ago | parent [-] | | On my new AMD laptop, it took about 90 minutes to run 50k training rounds on OpenWakeWord. It's not really a big burden. A tiny AI running locally is the third option you want. That's the only reasonable way to do configurable wake word detection |
|
| |
| ▲ | nickthegreek 2 days ago | parent | prev | next [-] | | you can use a physical button instead of wakeword. | | |
| ▲ | protocolture 2 days ago | parent [-] | | Doesn't suit my use case, sadly. | |
| ▲ | 0xdeadbeefbabe 2 days ago | parent [-] | | Back to the drawing board. What about a proximity sensor? | | |
| ▲ | protocolture a day ago | parent [-] | | I think what I want to do is have a dodgy local LLM that picks up on the context that the user is speaking to the LLM, and then enables it for 20 minutes or so. But even that's a bit of a wild tradeoff. |
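The state machine for that idea is tiny; the hard part is the classifier that decides "this was addressed to the assistant". A hypothetical sketch of just the window logic (the clock is injectable so it can be tested without waiting):

```python
import time

class ConversationWindow:
    """Hypothetical 'third way': once something decides the user was
    talking to the assistant, keep listening (no wake word) for a while."""

    def __init__(self, duration_s=20 * 60, clock=time.monotonic):
        self.duration_s = duration_s
        self.clock = clock
        self.open_until = 0.0  # window starts closed

    def mark_addressed(self):
        # e.g. a small local classifier flagged the last utterance
        # as directed at the assistant
        self.open_until = self.clock() + self.duration_s

    def is_open(self):
        return self.clock() < self.open_until

# Drive it with a fake clock to show the behavior.
fake_now = [0.0]
w = ConversationWindow(duration_s=1200, clock=lambda: fake_now[0])
assert not w.is_open()      # closed until addressed
w.mark_addressed()
assert w.is_open()          # open for the next 20 minutes
fake_now[0] = 1201.0
assert not w.is_open()      # closed again after the window expires
```

Everything heard while `is_open()` goes to the assistant; everything else is dropped on the floor, which is the privacy tradeoff being weighed above.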
|
|
| |
| ▲ | 6510 2 days ago | parent | prev [-] | | It should participate in all conversations, take initiative and experiment. | | |
| ▲ | sdenton4 2 days ago | parent [-] | | "Hey, hey, are you still asleep? Using spare cycles, I have designed an optimal recipe for mashed potatoes, as you mentioned ten days ago. I need you to go get some potatoes." | | |
| ▲ | terribleperson 2 days ago | parent | next [-] | | A local AI system that hears your conversations, identifies problems, and then uses spare cycles to devise solutions for them is actually an incredible idea. I'm never going to give a cloud system the kind of access it would need to do a really good job, but a local one I control? Absolutely. "Hey, are you still having trouble with [succinct summary of a problem it identified]?"
"Yes"
"I have a solution that meets your requirements as I understand them, and fits in your budget." | | |
| ▲ | PaulDavisThe1st a day ago | parent | next [-] | | > A local AI system that hears your conversations, identifies problems, and then uses spare cycles to devise solutions for them is actually an incredible idea. I call that Dreaming. (TM) | | |
| ▲ | BizarroLand a day ago | parent [-] | | If you could get an AI to listen to the conversations that happen in your sphere of influence and simply jot down the problems it identifies over the course of the day/week/month/year, that in itself would be an amazing tool. Doubly so if you could just talk and brainstorm while it's listening and condensing, so you can circle back later and see what raindrops formed from the brainstorm. Call that DayDreaming (TM) |
| |
| ▲ | darkwater 2 days ago | parent | prev [-] | | "Did you find how to make peace with $FRIEND_OF_SPOUSE after they came here last week and they were pretty mad at you because you should tell something to $SPOUSE ? I thought about it in my spare cycles and all psychologists agree that truth and trust are paramount in a healthy relationship" |
| |
| ▲ | FeepingCreature 2 days ago | parent | prev | next [-] | | I unironically want this. | | |
| ▲ | estimator7292 10 hours ago | parent | next [-] | | Agreed, this is hilarious. | |
| ▲ | 6510 2 days ago | parent | prev [-] | | I forget who, but someone on here a while back said he made a contraption that listens in and tries to determine the winner of each conversation. |
| |
| ▲ | 6510 2 days ago | parent | prev | next [-] | | I pondered the concept in the '90s. Initially I thought it should be an assistant, but with age came wisdom and now I think it should be a virtual drill instructor. "Rise and shine $insult $insult, the sun is up, the store is open, we will be getting some potatoes today, $insult $insult, it was all your idea, now apply yourself!" Bright lights flashing, loud music, the shower starts running. "Shower time, you have 7 minutes! $insult $insult" 4 minutes in, the coffee machine boots up. "You will be wearing the blue pants, top shelf on the left stack, the green shirt, 7th from the left. Faster faster! $insult $insult" | |
| ▲ | quietsegfault 2 days ago | parent | prev [-] | | This sounds a lot like gptars. I want a little gptars tearing around my house. https://youtube.com/shorts/e2t0RxX4b54 | | |
| ▲ | 6510 a day ago | parent [-] | | I forgot about him. Great project! Reminds me of a video from the 90's where some wizard put a camcorder and a giant antenna on a petrol powered rc car, an even bigger antenna on his house and controlled it from a 40's style sofa and a huge tube TV in his cramped garage. Over a mile range. Surrounded by enormous cars I think he was going 40-50 mph but with the screaming engine sound and the camera so low to the ground it looked like 500 mph. I'm still laughing, it looked like he was having all of the fun. | | |
| ▲ | quietsegfault 12 hours ago | parent [-] | | I've been meaning to put an FPV drone camera on one of my RC cars! It's very, very simple to do nowadays and requires none of the know-how you needed back in the day. |
|
|
|
|
|
|
| ▲ | mkul 2 days ago | parent | prev | next [-] |
| I've just started using it but I'd recommend https://github.com/steipete/clawdis, you need to set it up a bit but it's really cool to just be able to do things on the go by just texting an assistant. You can see all the different ways people are using it @clawdbot on twitter. |
| |
| ▲ | mr_mitm 2 days ago | parent | next [-] | | Can you give us some highlights on how this is helpful in your day-to-day life for those of us who aren't on twitter? | |
| ▲ | BizarroLand a day ago | parent | prev [-] | | Why does it require an online AI service? Why can it not work with ollama or some other locally hosted setup? |
|
|
| ▲ | colechristensen 2 days ago | parent | prev | next [-] |
| I've been working on this on and off for a couple of years now. The loop is definitely closing; I think it's possible at this point, but not yet easy. |
|
| ▲ | PunchyHamster 2 days ago | parent | prev | next [-] |
| There is, but that market doesn't sell subscriptions, and that is what the tech giants want to sell: a recurring flow of money that keeps flowing even if the product stagnates, because the effort to move to the competition is big. |
| |
| ▲ | Haaargio 2 days ago | parent [-] | | We are in a free market, with China still playing the open source game. The market is not ready for building this due to costs etc., not because the big companies block them or anything. And Nvidia is not selling subscriptions at all. |
|
|
| ▲ | sofixa 2 days ago | parent | prev | next [-] |
| It sounds like you want Home Assistant. You have all of the different components:

* you can use a number of things for the interactive devices (any touchscreen device, buttons, voice, etc.)

* have HA do the basic parsing (word-for-word matching), optionally plugging into something more complex (a cloud service like ChatGPT, or self-hosted Ollama or whatever) for more advanced parsing (logical parsing)

Every part of the ecosystem is interchangeable and very open. You can use a bunch of different devices, and a bunch of different LLMs to do the advanced parsing if you want it. HA can control pretty much everything with an API, and can itself be controlled by pretty much anything that can talk to an API. |
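The tiered approach is worth spelling out, since it's the reason local-first works: cheap exact matching handles the common commands instantly, and the expensive model only sees what falls through. A rough stdlib-only sketch of the pattern (the patterns and the fallback are made up, not HA's actual sentence syntax):

```python
import re

# Hypothetical sentence table: cheap, deterministic, word-for-word rules.
SENTENCES = {
    r"turn (on|off) the (\w+)": lambda m: f"{m.group(2)}.set({m.group(1)})",
}

def fallback_llm(text):
    # Stand-in for a call out to Ollama / a cloud model; only reached
    # when no cheap rule matched.
    return f"llm({text!r})"

def parse(text):
    for pattern, action in SENTENCES.items():
        m = re.fullmatch(pattern, text.lower())
        if m:
            return action(m)   # fast local path
    return fallback_llm(text)  # slow smart path

print(parse("Turn on the fan"))       # fan.set(on)
print(parse("make it cozy in here"))  # llm('make it cozy in here')
```

"Turn on the fan" never touches a model at all, which is why the basic commands stay fast and reliable even when the LLM is slow or offline.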
|
| ▲ | empiko 2 days ago | parent | prev | next [-] |
| The SOTA chatbots are getting more and more functionality that is not just LLM inference: they can search the web, process files, and integrate with other apps. I think that's why most people will consider local LLMs insufficient very soon. |
| |
| ▲ | woooooo 2 days ago | parent | next [-] | | But that's just software that also runs fine locally. A few tools with a local LLM can do it. | | |
| ▲ | empiko a day ago | parent [-] | | Well I don't see people running their web search locally, so I don't think they will run their own search+LLM. | | |
| |
| ▲ | BoxOfRain 2 days ago | parent | prev [-] | | Nah, I disagree; tool calling isn't that difficult. I've got my own Cats Effect-based model-orchestration project I'm working on, and while it's not 100% yet, I can do web browse, web search, memory search (this one is cool), and others on my own hardware. |
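The dispatch half of tool calling really is small. A hypothetical minimal sketch (tool names invented; the model's output is faked rather than coming from a real LLM): prompt a local model to emit JSON like `{"tool": ..., "args": ...}`, then route it to a plain function.

```python
import json

# Made-up local tools for illustration.
def web_search(query):
    return f"results for {query}"

def set_timer(minutes):
    return f"timer set for {minutes} min"

TOOLS = {"web_search": web_search, "set_timer": set_timer}

def dispatch(model_output: str):
    """Route a JSON tool call emitted by the model to a local function."""
    call = json.loads(model_output)
    fn = TOOLS[call["tool"]]   # KeyError here = model hallucinated a tool
    return fn(**call["args"])

# Pretend the local model answered with this:
out = dispatch('{"tool": "set_timer", "args": {"minutes": 10}}')
print(out)  # timer set for 10 min
```

The hard parts are prompting the model to emit valid JSON reliably and handling malformed calls, not the routing itself; that's the sense in which this runs fine locally.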
|
|
| ▲ | throwaway7783 2 days ago | parent | prev | next [-] |
| And toys |
|
| ▲ | zwnow 2 days ago | parent | prev [-] |
> Well, at least people like me want this.

Yeah, because dynamic digital price signs in shops, based on what data vendors have about you and what AI can extract from it, are such fun!
Total surveillance. More than what's already happening. Such fun! |