How good is LLM training data for a language spoken by fewer than 10 million people? Keep in mind that most of those people are probably multilingual (i.e. sorting text into one language or another by speaker is harder), and the language itself is similar to its neighbors. And then, again, there are the terms.
I cannot say, and I wouldn’t trust it unless a translator confirmed its validity.