Started using this earlier this week. I built a backtesting benchmark tool to compare a mix of frontier and open-source models on a fairly heavy data analysis workflow I’d been running in the cloud.
The task is basically predicting pricing and costs.
Apple’s model came out on top—best accuracy in 6 out of 10 cases in the backtest. That surprised me.
It also looks like it might be fast enough to take over the whole job. If I ran this on Sonnet, we’re talking thousands per month. With DeepSeek, it’s more like hundreds.
So far, the other local models I’ve tried on my 64GB M4 Max Studio haven’t been viable - either far too slow or not accurate enough. That said, I haven’t tested a huge range yet.
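For anyone wanting to reproduce a "best accuracy in N out of 10 cases" style comparison, here is a minimal sketch of the tallying step. It assumes each model produces one numeric prediction per backtest case and that "best" means lowest absolute error; the function name, model names, and numbers are all placeholders, not the commenter's actual tool or data.

```python
from collections import Counter


def count_wins(actuals, predictions):
    """Count, per model, how many cases it wins (lowest absolute error).

    actuals: list of true values, one per backtest case.
    predictions: dict mapping model name -> list of predicted values.
    Ties go to whichever model appears first in the dict.
    """
    wins = Counter()
    for i, actual in enumerate(actuals):
        errors = {name: abs(preds[i] - actual) for name, preds in predictions.items()}
        best = min(errors, key=errors.get)
        wins[best] += 1
    return wins


# Placeholder numbers purely for illustration, not real benchmark results.
actuals = [100.0, 250.0, 80.0]
predictions = {
    "model_a": [98.0, 240.0, 90.0],
    "model_b": [110.0, 251.0, 85.0],
}
wins = count_wins(actuals, predictions)
```

A win count hides the size of the misses, so it is probably worth reporting mean absolute error alongside it.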
With the Claude bug, or so it is known, burning through tokens at record speed, I gave alternative models a try and they're mostly ... interchangeable. I don't know how easy switching and low brand loyalty and fast markets will play out. I hope that local LLMs will become very viable very soon.
Yeah I don’t think the models are meaningfully differentiated outside of very specific edge cases. I suspect this was the thinking behind OpenAI and Facebook and all trying to lean hard into presenting their chatbots as friends and romantic partners. If they can’t maintain a technical moat they can try to cultivate an emotional one.
I like the approach of running everything locally. I'm strongly of the opinion that the privacy angle for local models is going to keep getting stronger and more relevant. The more articles come out about accidents caused by people handing too much context to cloud models, the more self-reinforcing this will become.
local is best for privacy, but i personally think you don't need to go local.
anthropic, google, openai etc, decided that their consumer ai plans would not be private. partly to collect training data, the other half to employ moderators to review user activity for safety.
we trust that human moderators will not review and flag our icloud docs, onedrive or gmail, or aggregate such documents into training data for llms. it became the norm that an llm is somehow not private. it became a norm that you can't opt out of training, even on paid plans (see meta and google); or if you can opt out of training, you can't opt out of moderation.
cloud models with a zero retention privacy policy are private enough for almost everyone. the subscriptions, google search, and ai search engines are either 'buying' your digital life or covering themselves for legal reasons.
you can and should have private cloud services, and if legal agreement is not enough, cryptographic attestation is already used in compute, with AWS nitro enclaves and other providers.
I personally think everyone should default to using local resources. Cloud resources should only be used for expansion and be relatively bursty rather than the default.
For about two years I experimented with writing local apps using local LLMs, but I often had to blend in a commercial web search API to make my little experiments useful.
I pay $13/month for Proton’s Lumo+ private chat LLM that contains an excellent built-in web search tool. I use it for everything non-technical, even just simple searching for local businesses, etc.
As an enthusiastic reader of books like Privacy is Power and Surveillance Capitalism, it feels good to have a private tool that is ready at hand.
I’ve seen several projects like this that offer a network server with access to these Apple models. The danger is when they expose that, even on a loopback port, to every other application on your system, including the browser. Random webpages are now shipping with JavaScript that will post to that port. Same-origin restrictions will stop data flowing back to the webpage, but that doesn’t stop them from issuing commands to make changes.
Some such projects use CORS to allow read back as well. I haven’t read Apfel’s code yet, but I’m registering the experiment before performing it.
They offer it as an option but default it to false! This is still a --footgun option but it’s the least unsafe version I’ve seen yet! Well done, Apfel authors.
FWIW this was the status quo (webpage could ping arbitrary ports but not read data, even with CORS protections) - but it is changing.
This is partially in response to https://localmess.github.io/ where Meta and Yandex pixel JS in websites would ping a localhost server run by their Android apps as a workaround to third-party cookie limits.
So things are getting better! But there was a scarily long time where a rogue JS script could try to blindly poke at localhost servers with crafty payloads, hoping to find a common vulnerability and gain RCE or trigger exfiltration of data via other channels. I wouldn't be surprised if this had been used in the wild.
There is a CORS preflight check for POST requests that don't use form-encoding. It would be somewhat surprising if these weren't using JSON (though it wouldn't be that surprising if they were parsing submitted JSON instead of actually checking the MIME type, which would probably be bad anyway).
Isn't there a CORS preflight check for this? In most cases. I guess you could fashion an OG form to post form fields. But openai is probably a JSON body only.
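To make the preflight point concrete, here is a rough sketch of the rule browsers apply when deciding whether a cross-origin request is sent directly or needs an OPTIONS preflight first. It is a simplification of the Fetch spec's safelist (it treats any extra header you pass in as non-safelisted), and `needs_preflight` is a made-up helper name, not anything from a real library.

```python
# Methods and Content-Type values that do not, by themselves, trigger a preflight.
SIMPLE_METHODS = {"GET", "HEAD", "POST"}
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def needs_preflight(method, content_type=None, custom_headers=()):
    """Approximate whether a browser would preflight a cross-origin request."""
    if method.upper() not in SIMPLE_METHODS:
        return True
    # Simplification: any header listed here is assumed to be non-safelisted.
    if custom_headers:
        return True
    if content_type is None:
        return False
    # Ignore parameters such as "; charset=utf-8" and compare the bare type.
    essence = content_type.split(";", 1)[0].strip().lower()
    return essence not in SIMPLE_CONTENT_TYPES
```

So `needs_preflight("POST", "application/json")` is true, while a form post or a `text/plain` body goes straight through. That second case is the catch: a page can put JSON text in a `text/plain` body, and a server that parses the body without checking the header will happily accept it.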
The default scenario should be secure. If the local site sends permissive CORS headers bets may be off. I would need to check but https->http may be a blocker too even in that case. Unless the attack site is http.
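On the server side, the MIME type concern raised above comes down to a check like the following before the body is parsed at all. This is only a sketch under assumptions: `should_accept` and the allowlist parameter are hypothetical, and it says nothing about how Apfel or any other project actually handles requests. It relies on browsers attaching an `Origin` header to cross-origin POSTs, while command-line clients typically send none.

```python
def should_accept(headers, allowed_origins=frozenset()):
    """Decide whether a local API server should process a request.

    headers: dict of request header name -> value (any casing).
    allowed_origins: origins explicitly trusted to call the server from a browser.
    """
    normalized = {name.lower(): value for name, value in headers.items()}

    # Require a real JSON content type, which forces browsers to preflight.
    content_type = normalized.get("content-type", "")
    essence = content_type.split(";", 1)[0].strip().lower()
    if essence != "application/json":
        return False

    # A request carrying an Origin header came from a browser context;
    # reject it unless that origin was explicitly allowed.
    origin = normalized.get("origin")
    if origin is not None and origin not in allowed_origins:
        return False
    return True
```

With an empty allowlist this refuses every browser-originated call while still working for `curl` and scripts, which seems like the right default for a local tool.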
Local AIs are the future in times of limited resources. This could be the beginning of something big. I like that Apple opens up like this. Hopefully more to come.
I have been using Apple’s built-in system LLM model for the last 7 or 8 months. I like the feature that if it needs to, it occasionally uses a more powerful secure private cloud model. I also wrote my own app to wrap it.
Would really love to see a web api standard for on device llms. This could get us closer. Some in-browser language model usage could be very powerful. In the interim maybe a little protocol spec + a discovery protocol used with browser plugins, web apps could detect and interface with on-device llms making it universally available.
As an experiment I built a prototype chatbot app that uses the built-in LLM. It’s got a small context window, but is surprisingly capable and has tool-calling support. Without too much effort I was able to get it to fetch weather data, fetch and summarise emails, read and write reminders and calendar events.
Just discovered iOS shortcuts has a native action called “use model” that lets you use local, Apple cloud, or ChatGPT— before that I would have agreed with the author about being locked behind Siri (natively)
Just a small thing about the website: your examples shift all the elements below it on mobile when changing, making it jump randomly when trying to read.
dyld[71398]: Library not loaded: /System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels
Referenced from: <32818E2F-CB45-3506-A35B-AAF8BDDFFFCE> /opt/homebrew/Cellar/apfel/0.6.25/bin/apfel (built for macOS 26.0 which is newer than running OS)
Reason: tried: '/System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels' (no such file), '/System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels' (no such file, not in dyld cache)
And for those who did know that and want to know more, the shift from apple - apfel and water -> wasser happened during the High German consonant shift.
Does the local LLM have access to personal information from the Apple account associated with the logged-in user? Maybe through a RAG pipeline or similar? Just curious if there are any risks associated with exposing this in a way that could be exploited via CORS or through another rogue app querying it locally.
no. the on device foundationmodels framework that apfel uses does not have access to personal information from the apple account. the model is a bare language model with no built in personal data access.
apple does have an on device rag pipeline called the semantic index that feeds personal data like contacts emails calendar and photos into the model context but this is only available to apples own first party features like siri and system summaries.
it is not exposed through the foundationmodels api.
4,096 token context window is pretty limiting. That's roughly 3,000 words — fine for "summarize this paragraph" but not enough for anything that needs real context. Still, zero cost and fully local is hard to beat for quick throwaway tasks. Does it handle streaming or is it request-response only?
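The "4,096 tokens is roughly 3,000 words" rule of thumb can be turned into a quick pre-flight check before sending a prompt. This is a back-of-the-envelope sketch: the ratio is just the one implied by the figures above, not a real tokenizer, and the 512 tokens reserved for the reply is an arbitrary assumption.

```python
import math

CONTEXT_TOKENS = 4096
# Words per token implied by "4,096 tokens is roughly 3,000 words".
WORDS_PER_TOKEN = 3000 / 4096


def estimate_tokens(text):
    """Crude token estimate from a whitespace word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def fits_in_context(prompt, reserved_for_output=512):
    """True if the prompt plus room for a reply fits the context window."""
    return estimate_tokens(prompt) + reserved_for_output <= CONTEXT_TOKENS
```

Code, logs, and non-English text usually tokenize worse than plain English prose, so treat the estimate as optimistic.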
If you’re looking into small models for tiny local tasks, you should try Qwen coder 0.5B. It’s more of an experiment, but it can output decent functions given the right context instructions.
So… a prompt? I’m not on my laptop but I hooked it to cmp.nvim, gave it a short situational prompt, +- 10 lines, and started typing. Not anywhere near usable but with a little effort you can get something ok for repetitive tasks. Maybe something like spotting one specific code smell pattern. The advantage is the ridiculous T/s you get
the 2 hard limits of the Apple Intelligence Foundation Model, and therefore of apfel, are the 4k token context window and the super hard guardrails (the model prefers to tell you nothing before it tells you something wrong, e.g. ask it to describe a color)
parsing logfiles line by line, sure
parsing a whole logfile, well it must be tiny, and logfiles hardly ever are
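A middle ground between line-by-line and whole-file is to batch lines into chunks that stay under a token budget. A minimal sketch, assuming a rough 0.75 words-per-token estimate and a 3,000-token budget that leaves headroom in a 4k window; both numbers and the `chunk_lines` name are assumptions for illustration.

```python
import math


def chunk_lines(lines, max_tokens=3000, words_per_token=0.75):
    """Yield lists of lines whose estimated token total stays within max_tokens.

    A single line that exceeds the budget on its own is still yielded,
    alone, so the caller can decide whether to truncate or skip it.
    """
    chunk, used = [], 0
    for line in lines:
        cost = math.ceil(len(line.split()) / words_per_token)
        if chunk and used + cost > max_tokens:
            yield chunk
            chunk, used = [], 0
        chunk.append(line)
        used += cost
    if chunk:
        yield chunk
```

Each chunk can then go to the model as one prompt, for example `for batch in chunk_lines(open("app.log")): ...`, with the per-chunk answers merged afterwards.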
It’s a nice LLM because it seems fairly decent and it loads instantly and uses the CPU neural engine. The GPU is faster but when I run bigger LLMs on the GPU the normally very cool M series Mac becomes a lap roaster.
It’s a small LLM though. Seems decent but it’s also been safety trained to a somewhat comical degree. It will balk over safety at requests that are in fact quite banal.
This is pretty cool. My bet is that we'll have more LLMs running locally when it's possible, either through "better hardware as default" or some new tech that can run the models on commodity hardware (like Apple silicon / equivalent PC setup).
if you are happy with off-prem then the llm is ok too, if you need on-prem this is when you will need local.
The private thing is the prompt.
But also, a local LLM opens up the possibility of agentic workflows that don't have to touch the Internet.
This is partially in response to https://localmess.github.io/ where Meta and Yandex pixel JS in websites would ping a localhost server run by their Android apps as a workaround to third-party cookie limits.
Chrome 142 launched a permission dialog: https://developer.chrome.com/blog/local-network-access
Edge 140 followed suit: https://support.microsoft.com/en-us/topic/control-a-website-...
And Firefox is in progress as well, though I couldn't find a clear announcement about rollout status: https://fosdem.org/2026/schedule/event/QCSKWL-firefox-local-...
So things are getting better! But there was a scarily long time where a rogue JS script could try to blindly poke at localhost servers with crafty payloads, hoping to find a common vulnerability and gain RCE or trigger exfiltration of data via other channels. I wouldn't be surprised if this had been used in the wild.
Unfortunately, I found the small context window makes the utility pretty limited.
apfel -o json "Translate to German: apple" | jq .content
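The same call can be wrapped from a script. This sketch only uses what the command above shows: the `-o json` flag and a top-level `content` field (implied by the `jq .content` filter). Anything else about apfel's output is unknown here, and `ask_apfel` / `extract_content` are made-up helper names.

```python
import json
import subprocess


def extract_content(raw):
    """Pull the 'content' field out of apfel's JSON output.

    Assumes a top-level "content" key, as implied by `jq .content`.
    """
    return json.loads(raw)["content"]


def ask_apfel(prompt):
    """Run `apfel -o json <prompt>` and return the model's reply text."""
    result = subprocess.run(
        ["apfel", "-o", "json", prompt],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return extract_content(result.stdout)
```

Usage would be something like `ask_apfel("Translate to German: apple")`, which of course needs apfel installed and a macOS version that ships the FoundationModels framework.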
Already in Chrome as an origin trial: https://developer.chrome.com/docs/ai/prompt-api
https://en.wikipedia.org/wiki/High_German_consonant_shift
Can you share a working example?
tried to run openclaw with it in ultra token saving mode, totally did not work.
great for shell scripts though (my major use case now)
So you have to put up with the low contrast buggy UI to use that.
I too would love to try this for simple prompts but won’t be updating past Sequoia for the foreseeable future.
Imagine they baked Qwen 3.5 level stuff into the OS. Wow that’d be cool.
https://www.linkedin.com/posts/nathangathright_marco-arment-...
https://developer.apple.com/documentation/Updates/Foundation...
They released an official python SDK in March 2026:
https://github.com/apple/python-apple-fm-sdk
> $0 cost
No kidding.
Why not just link the GitHub: https://github.com/Arthur-Ficial/apfel
https://news.ycombinator.com/item?id=47624647