cross-posted from: https://feddit.online/c/technology/p/1229433/apertus-switzerland-government-release-a-fully-open-transparent-multilingual-language-l
"Apertus: a fully open, transparent, multilingual language model
EPFL, ETH Zurich and the Swiss National Supercomputing Centre (CSCS) released Apertus on 2 September, Switzerland’s first large-scale, open, multilingual language model — a milestone in generative AI for transparency and diversity.
Researchers from EPFL, ETH Zurich and CSCS have developed the large language model Apertus – it is one of the largest open LLMs and a basic technology on which others can build.
In brief: Researchers at EPFL, ETH Zurich and CSCS have developed Apertus, a fully open Large Language Model (LLM) – one of the largest of its kind. As a foundational technology, Apertus enables innovation and strengthens AI expertise across research, society and industry by allowing others to build upon it. Apertus is currently available through strategic partner Swisscom, the AI platform Hugging Face, and the Public AI network. …
The model is named Apertus – Latin for “open” – highlighting its distinctive feature: the entire development process, including its architecture, model weights, and training data and recipes, is openly accessible and fully documented.
AI researchers, professionals, and experienced enthusiasts can either access the model through the strategic partner Swisscom or download it from Hugging Face – a platform for AI models and applications – and deploy it for their own projects. Apertus is freely available in two sizes – featuring 8 billion and 70 billion parameters, the smaller model being more appropriate for individual usage. Both models are released under a permissive open-source license, allowing use in education and research as well as broad societal and commercial applications. …
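A quick back-of-the-envelope calculation makes the “smaller model for individual usage” point concrete. The sketch below only counts weights (parameters × bytes per weight); real usage adds KV cache and runtime overhead, so treat these as lower bounds:

```python
# Rough weight-only memory footprint of the two Apertus sizes
# at common precisions. Actual VRAM/RAM usage will be higher
# (KV cache, activations, runtime overhead).
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Bytes of weights in GB: params * bits / 8."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (8, 70):
    for name, bits in (("fp16", 16), ("q8", 8), ("q4", 4)):
        print(f"{params}B @ {name}: ~{weight_gb(params, bits):.0f} GB")
```

By this estimate the 8B model fits on a typical consumer GPU at 8-bit or 4-bit quantization, while the 70B model needs workstation- or server-class memory even when quantized.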
Trained on 15 trillion tokens across more than 1,000 languages – 40% of the data is non-English – Apertus includes many languages that have so far been underrepresented in LLMs, such as Swiss German, Romansh, and many others. …
Furthermore, for people outside of Switzerland, the Public AI Inference Utility will make Apertus accessible as part of a global movement for public AI. “Currently, Apertus is the leading public AI model: a model built by public institutions, for the public interest. It is our best proof yet that AI can be a form of public infrastructure like highways, water, or electricity,” says Joshua Tan, Lead Maintainer of the Public AI Inference Utility."
Did someone try it? What’s your experience?
I think I like it. But I’m having massive issues with it becoming very repetitive after a while. And I’m not sure if it’s the model or my sampler settings.
Found it not very compliant. I asked it about French politics and it fared worse than other 70B models I tested through OpenRouter.
Mistral 24B gave better answers.
I like the group behind it, though. EPFL and ETH are some of the best research institutes in Europe. Catching up with big corporate models is hard, and it is nice that they are giving it a shot. Don’t expect them to be there yet on the first try, though.
It’s a dense model so it’s unfortunately much slower than the newer MoE models.
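The speed gap has a simple explanation: token generation is roughly memory-bandwidth-bound, and a dense model must read all of its weights for every token, while an MoE only reads its active experts. A rough sketch — the active-parameter count for gpt-oss-120b (~5.1B) and the ~256 GB/s bandwidth figure for a Strix Halo-class machine are assumptions, not numbers from this thread:

```python
# Per-token generation speed estimate: bandwidth / bytes of weights
# read per token. Assumed figures: ~5.1B active params for gpt-oss-120b,
# ~256 GB/s memory bandwidth (Strix Halo-class APU). Both are rough.
def tokens_per_sec(active_params_b: float, bits: float, bandwidth_gbs: float) -> float:
    bytes_per_token = active_params_b * 1e9 * bits / 8
    return bandwidth_gbs * 1e9 / bytes_per_token

dense_70b = tokens_per_sec(70, 8, 256)   # dense: all 70B weights read each token
moe_120b = tokens_per_sec(5.1, 8, 256)   # MoE: only the active experts read
print(f"dense 70B @ q8: ~{dense_70b:.1f} tok/s")
print(f"MoE, 5.1B active @ q8: ~{moe_120b:.1f} tok/s")
```

Under these assumptions the MoE comes out roughly an order of magnitude faster per token, which matches the subjective experience reported below.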
I was also a little disappointed with its translation abilities, especially considering that it’s a Swiss model but it cannot properly do German <-> English translations.
I’m only tinkering with the 8B variant, so speed is alright. I hadn’t noticed yet. But yes, seems English to German leads to weird results. German to English seems to be fine, though. At least for the 2 texts I put in.
I tested the 70b model at Q8_K_XL quantization on Strix Halo and it’s not too slow to use for short queries but definitely much slower than something like gpt-oss-120b.
> English to German leads to weird results. German to English seems to be fine, though. At least for the 2 texts I put in.
Didn’t get to German to English before giving up; English to German was just too awful. Cases were constantly wrong, with very weird word choices and incorrect grammatical genders.
How are you running it? Would you be able to post your run arguments?
I’m running the 8B version with KoboldCPP and pretty much the default settings of the Min-P sampler. That tends to work very well with almost all other models and I’m not aware of any specific recommendations for Apertus…
We recommend setting temperature=0.8 and top_p=0.9 in the sampling parameters.
Try that. I believe those params are available in Kobold. If that doesn’t work, send me a sample of what you’re doing and I’ll try it out.
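Those settings can also be set per request via KoboldCPP’s HTTP API rather than in the UI. A minimal sketch of a generate request body — the field names follow the KoboldAI-style API that KoboldCPP exposes, and the `rep_pen` value is my own assumption for taming the repetition, so verify both against your KoboldCPP version:

```python
import json

# Apertus-recommended sampling settings (temperature=0.8, top_p=0.9)
# expressed as a KoboldCPP /api/v1/generate request body.
payload = {
    "prompt": "Translate into English: Der Zug hat Verspätung.",
    "max_length": 200,
    "temperature": 0.8,
    "top_p": 0.9,
    "rep_pen": 1.05,  # mild repetition penalty; assumption, tune to taste
}

body = json.dumps(payload)
print(body)

# To actually send it (requires a running KoboldCPP instance):
# import requests
# r = requests.post("http://localhost:5001/api/v1/generate", data=body,
#                   headers={"Content-Type": "application/json"})
```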