Soo I’m working on a database that needs to support multiple languages (two for now, but who knows). I stumbled across this blog post that explains how to develop what it calls a “translation subschema” (haven’t seen it called like this anywhere else so I don’t know if it’s actually how you’d call it), which seems like a very nice way of dealing with things.
I’m not very experienced with DB stuff, so it took me a while to fully understand what it was doing, but now that (I think) I do, I’m wondering if I could just ignore the Languages
table, and just use a language
field in the tables TextContent
and Translations
, without loosing any functionality. (except of course having a table listing the available languages, which is not however something I’m interested in)
In my head everything would still work, I’d insert stuff with
INSERT INTO TextContent (OriginalText, OriginalLanguage)
VALUES ("Ciao", "it");
DECLARE TextContentId INT = SCOPE_IDENTITY();
INSERT INTO Translations (TextContentId, Language, Translation)
VALUES (@TextContentId, "it", "Ciao");
INSERT INTO Translations (TextContentId, Language, Translation)
VALUES (@TextContentId, "en", "Hello");
and given a TextContentId, i’d retrieve the correct translation with
SELECT Translation FROM Translations WHERE TextContentId = TCId AND Language = "en"
At this point, I’m thinking I could drop TextContent
too, and just have a Translations
table with TextContentId
, Language
, and Translation
, with (TextContentId
, Language
) as primary key.
Am I missing something? I’m trying to simplify this solution but I don’t want to risk making random errors.
Edit: translations on the DB are for user inserted text, which will also provide translations. The client then will only receive text it the correct language.
You might want to consider using country codes in your language identifier.
en-US en-UK en-CA
all have slight variations in dialect. I don’t speak other languages but I imagine it is important in many cases.
for reference, that’s usually a ISO 639-1 combined with ISO 3166-1 alpha 2
and if an exact locale match is not available, it makes more sense to return another language match than the default language fallback
I think IETF BCP 47 is the way to go for this (and is rougly a superset of what you’re saying). They also define a “more/less specific” selationship between tags so you can ask for “en-CA” and have the system decide “en” is still appropriate.
That’s probably a good call, thanks :)