Soo I’m working on a database that needs to support multiple languages (two for now, but who knows). I stumbled across this blog post that explains how to develop what it calls a “translation subschema” (haven’t seen it called like this anywhere else so I don’t know if it’s actually how you’d call it), which seems like a very nice way of dealing with things.

I’m not very experienced with DB stuff, so it took me a while to fully understand what it was doing, but now that (I think) I do, I’m wondering if I could just ignore the Languages table, and just use a language field in the tables TextContent and Translations, without loosing any functionality. (except of course having a table listing the available languages, which is not however something I’m interested in)

In my head everything would still work, I’d insert stuff with

INSERT INTO TextContent (OriginalText, OriginalLanguage)
VALUES ("Ciao", "it");

DECLARE TextContentId INT = SCOPE_IDENTITY();

INSERT INTO Translations (TextContentId, Language, Translation)
VALUES (@TextContentId, "it", "Ciao");
INSERT INTO Translations (TextContentId, Language, Translation)
VALUES (@TextContentId, "en", "Hello");

and given a TextContentId, i’d retrieve the correct translation with

SELECT Translation FROM Translations WHERE TextContentId = TCId AND Language = "en"

At this point, I’m thinking I could drop TextContent too, and just have a Translations table with TextContentId, Language, and Translation, with (TextContentId, Language) as primary key.

Am I missing something? I’m trying to simplify this solution but I don’t want to risk making random errors.

Edit: translations on the DB are for user inserted text, which will also provide translations. The client then will only receive text it the correct language.

  • wetbeardhairs@lemmy.dbzer0.com
    link
    fedilink
    arrow-up
    25
    ·
    1 day ago

    You might want to consider using country codes in your language identifier.

    en-US en-UK en-CA

    all have slight variations in dialect. I don’t speak other languages but I imagine it is important in many cases.

    • Eager Eagle@lemmy.world
      link
      fedilink
      English
      arrow-up
      16
      ·
      edit-2
      1 day ago

      for reference, that’s usually a ISO 639-1 combined with ISO 3166-1 alpha 2

      and if an exact locale match is not available, it makes more sense to return another language match than the default language fallback

      • czan@aussie.zone
        link
        fedilink
        arrow-up
        2
        ·
        18 hours ago

        I think IETF BCP 47 is the way to go for this (and is rougly a superset of what you’re saying). They also define a “more/less specific” selationship between tags so you can ask for “en-CA” and have the system decide “en” is still appropriate.