I love the idea and I would like to build something like this. But the few attempts I have made using Whisper locally have so far been underwhelming. Has anyone gotten results with small Whisper models that are good enough for a use case like this?
Yeah, I would definitely double-check your setup. At work we use Whisper to live-transcribe-and-translate all-hands meetings and it works exceptionally well.
+1 this. Whisper works insanely well. I've been using the medium model, as it has yet to mistranscribe anything noticeable, and it's very lightweight. I even converted it to a Core ML model so it runs accelerated on Apple silicon. It doesn't run *that* much faster than before, but it ran really fast to begin with. For anyone tinkering, I've had much success with whisper.cpp.
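For anyone wanting to double-check their setup, here's a minimal sketch of a local transcription call. It assumes the openai-whisper Python package (not whisper.cpp) and ffmpeg are installed; the `transcribe_file` helper and the `meeting.wav` path are made up for illustration, so treat it as a starting point rather than a reference setup.

```python
def transcribe_file(path, model_name="medium", task="transcribe", model=None):
    """Return the text Whisper produces for one audio file.

    task="translate" asks Whisper to output English instead of the
    spoken language. `model` lets you reuse an already loaded model
    (loading is the slow part) or pass in a stand-in while testing.
    """
    if model is None:
        # Assumption: `pip install openai-whisper`, with ffmpeg on PATH.
        import whisper
        model = whisper.load_model(model_name)
    result = model.transcribe(path, task=task)
    return result["text"].strip()


if __name__ == "__main__":
    import sys
    # Hypothetical default file name; pass your own recording as argv[1].
    audio = sys.argv[1] if len(sys.argv) > 1 else "meeting.wav"
    print(transcribe_file(audio))
```

If the output is poor with a small model, trying the same file with `model_name="medium"` is a quick way to tell a model-size problem from a microphone problem.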
I’ve noticed recently (maybe I missed an announcement) that Siri now functions locally for at least some commands. Try putting an Apple Watch in airplane mode and asking it to set a timer or reminder.
They are doing this, just at a mind-numbingly slow pace. They seem to add controls for brightness and power but don't make it clear what works when offline. It's not even worth trying because there's no guide or documentation on what commands would be available. You just have to go into airplane mode and try asking stuff. Awful UX
Does Apple even allow you to replace Siri with another assistant? For the longest time on Android, all non-Google assistants were crippled by not being able to listen in the background or use the assistant hardkey, gestures, or shortcuts. I'm not sure if the Google Assistant still has privileges others don't, but I wouldn't be surprised in the least.
Part of the problem is the wake word “hey siri” is actually handled by a separate coprocessor (AOP) with the model compiled into the firmware. While anything is technically possible, it isn’t as simple as just letting the Google app run in the background since the AP is asleep when any of these gestures happen. You could probably set up the action button on the side to open an assistant, but that’s going to be a less pleasant experience (app might not be open, etc).
Same with Android phones - a super-specific hardcoded phrase is much easier to fit within the power budgets required for an "always on" part of the device.
It's why a manufacturer (like Samsung) can change that sort of thing on their devices, but it's not realistically something an end user (or even an app) can customize in software. It's not some "arbitrary" limitation.
Back in 1992 or so the NeXT could distinguish (was it 16 or) 64 fixed, trained, phrases. Point being, it doesn’t take too much compute with a finite vocabulary.
I saw an article about this and downloaded the Perplexity app but I was unable to figure out if this was true? Do I need a paid tier? I just quickly worked through the free sign up and couldn't sort it out. The demo looked really slick. Is it worth pursuing?
Faithful year-and-a-half user of ChatGPT on my iPhone, which has made me loathe Siri for how dumb she is in every sense!
When will OpenAI (with the help of Microsoft) release a GPT phone to compete with the iPhone? I'm so tired of the boring iPhone! Give me a GPT phone where from my lock screen GPT does everything for me. Fingers crossed :) it's secretly in the works!
In earnest though, I'm certain we'll see a community replacement of Siri by end-of-year if the iPhone permissions model allows it or there's some workaround. IDK what the limitations are here but I'm eagerly awaiting the community to step in where Siri has failed.
Maybe I've just had a bad microphone.
But this guide gives me what I need to make that, I think, so a big thank you for this!
Haven't followed it through yet, but does this model run successfully on an iPhone?
My 9 year old ran a Qwen 0.6B model using ollama quite well, anything else was too slow to offer a good UX.
I was thinking there was a fourth grader out there deploying models when at that age I was still learning multiplication tables.
[0] https://llm.mlc.ai/docs/deploy/ios.html#bring-your-own-model
Details are listed below
https://machinelearning.apple.com/research/hey-siri
"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something." - https://news.ycombinator.com/newsguidelines.html
(Your comment would be fine without that first bit.)
- time of day
- calendar date
- weather
- set a timer
- simple math calculation
That’s 90% of the functionality right there.
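To make the list above concrete, here's a rough sketch of a keyword router for those five commands. Everything in it (the `route` function, the intent names, the phrasings it matches) is invented for illustration and is not how Siri or any real assistant works; weather is left as a stub because it needs a network call.

```python
import ast
import operator
import re
from datetime import datetime

# Arithmetic operators the math intent is allowed to use.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def _eval_math(expr):
    """Evaluate +, -, *, / on plain numbers without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval").body)


def route(text, now=None):
    """Map a transcribed utterance to an (intent, payload) pair."""
    now = now or datetime.now()
    t = text.lower().strip().rstrip("?.!")

    m = re.search(r"timer for (\d+) (second|minute|hour)s?", t)
    if m:
        unit = {"second": 1, "minute": 60, "hour": 3600}[m.group(2)]
        return ("timer", int(m.group(1)) * unit)  # payload is seconds
    if "what time" in t:
        return ("time", now.strftime("%H:%M"))
    if "date" in t or "what day" in t:
        return ("date", now.strftime("%Y-%m-%d"))
    if "weather" in t:
        return ("weather", None)  # stub: would need a network lookup
    m = re.fullmatch(r"what is ([\d\s.+\-*/()]+)", t)
    if m:
        try:
            return ("math", _eval_math(m.group(1).strip()))
        except (ValueError, SyntaxError, ZeroDivisionError):
            pass
    return ("unknown", None)


if __name__ == "__main__":
    for phrase in ["Set a timer for 5 minutes", "What time is it?",
                   "What is 12 * (3 + 4)?"]:
        print(phrase, "->", route(phrase))
```

Four of the five run fine with no network at all, which is the point: the hard part is the speech recognition in front of this, not the command handling.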
Oh you asked "what time is it in Miami?" Because I already cut you off after "what time is it".
This happens on a daily basis when I'm not talking right into my phone.