iPhone notification summaries were made with GPT-3.5, I believe (maybe even the -turbo version).
It doesn’t use reasoning, so with very short outputs it can produce wild variations: there aren’t many previous tokens to steer the LLM in the right direction in kv-space, which leaves you more at the whims of the temperature setting (randomly selecting the next token from the softmaxed list of scores the LLM outputs).
You can take those same messages and plug them into a good model and get much higher-quality results. But good models are expensive and Apple is, for some reason, going for the budget option.
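
To make the temperature point concrete, here is a rough sketch of what "sampling from a softmaxed list" looks like. The logits are made-up numbers for four hypothetical candidate tokens, and the function names are mine, not anything from a real model or from Apple's setup; it only shows how a higher temperature flattens the distribution so the pick gets more random.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Pick one token index at random, weighted by the softmax probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

if __name__ == "__main__":
    # Hypothetical scores for four candidate next tokens (illustrative only).
    logits = [2.0, 1.5, 0.5, 0.1]
    rng = random.Random(0)
    for t in (0.2, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        picks = [sample_token(logits, t, rng) for _ in range(10)]
        print(f"T={t}: probs={[round(p, 3) for p in probs]} picks={picks}")
```

At a low temperature nearly all the probability sits on the top-scoring token; at a higher one the alternatives get picked much more often, which is where the variation in a short output comes from.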
AFAIK some outputs are made with a really tiny/quantized local LLM too.
And yeah, even that aside, GPT-3.5 is really bad these days. It’s obsolete.