Forgive me, I’m no AI expert, so I can’t precisely relate the tokens-per-second figures to what an average Siri query would require, but I will say this:
Even in your article, only the largest model ran at 8 tokens per second; the others ran much faster, and none of them were optimized for a specific task, just benchmarked as-is.
Would it really be impossible for Apple to run a model optimized for expected mobile tasks, leveraging their own hardware more efficiently than we can, to meet their needs?
I imagine they cut out most worldly knowledge and use a lightweight model, which is why some requests still need to be handed off to ChatGPT or Apple’s servers. Wouldn’t that let them trim Siri down to perform well enough on phones for most requests? They also advertised launching AI on M1 and M2 chip devices, which are not M3 Max either…
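To make the tokens-per-second point concrete, here’s a rough back-of-envelope sketch. The ~50-token reply length and the decode speeds are my own assumptions for illustration, not numbers from the article or from Apple:

```python
# Back-of-envelope: how long a short assistant-style reply takes to
# generate at various decode speeds. All numbers are illustrative
# assumptions, not measurements of any real device or model.
REPLY_TOKENS = 50  # assumed length of a typical short Siri-style answer

def reply_latency(tokens_per_second: float, tokens: int = REPLY_TOKENS) -> float:
    """Seconds to generate a reply at the given decode rate."""
    return tokens / tokens_per_second

# Even the "slow" 8 tok/s figure finishes a short reply in a few seconds,
# and a task-optimized model running faster would feel near-instant.
for tps in (8, 20, 30):
    print(f"{tps:>2} tok/s -> {reply_latency(tps):.1f} s for a {REPLY_TOKENS}-token reply")
```

The point being: for short, task-scoped replies, even single-digit tokens per second may be usable, and anything faster is comfortable.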
Gas stations attached mechanic shops and then convenience stores even though you don’t spend a lot of time refueling.
Charging centers simply need to do the same, or restaurants and the like need to invest in charging stations.
If you go on a long trip and need to charge, why not spend that time also meeting your personal needs for food and a bathroom break? By the time you finish a meal at the attached full-service restaurant, both your car and your passengers will be fully refueled.
Even though I don’t personally own an EV, just a hybrid, that much became clear as soon as my local grocery store added a couple of charging spots. Only a couple, but the answer to long charging times is obviously to give people something else to do while they wait.