https://github.com/resemble-ai/chatterbox is pretty good and supports both TTS and voice cloning. The main disadvantage for me was that even when the cloning gives a consistent voice, the generated samples can come out with random accents.
https://huggingface.co/zai-org/GLM-TTS also seemed pretty promising, but I haven’t had time to test it yet.

If you can’t find a self-hostable service, you could try Obsidian, provided its Kanban plugin works well in the mobile client. It’s closed source, but all data is stored in Markdown files, so you could use a self-hosted git server for storage and synchronization between users.
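
To give an idea of what the git-based sync could look like, here is a rough sketch in Python. It assumes the vault folder is already a git repository with a remote pointing at your self-hosted server and an upstream branch configured. The function names `build_sync_steps` and `sync_vault` are made up for illustration and are not part of Obsidian or any plugin.

```python
import subprocess
from pathlib import Path


def build_sync_steps(has_local_changes: bool, message: str = "vault sync") -> list[list[str]]:
    """Return the git commands for one sync pass, in order (hypothetical helper)."""
    steps: list[list[str]] = []
    if has_local_changes:
        # Commit local edits first so the rebase below starts from a clean tree.
        steps.append(["git", "add", "--all"])
        steps.append(["git", "commit", "--message", message])
    # Fetch other users' changes and replay local commits on top of them.
    steps.append(["git", "pull", "--rebase"])
    steps.append(["git", "push"])
    return steps


def sync_vault(vault: Path, message: str = "vault sync") -> None:
    """Commit any local changes in `vault`, then pull and push (hypothetical helper)."""
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=vault, capture_output=True, text=True, check=True,
    )
    # Empty porcelain output means there is nothing to commit.
    has_local_changes = bool(status.stdout.strip())
    for step in build_sync_steps(has_local_changes, message):
        # check=True stops at the first failing step, e.g. a rebase conflict.
        subprocess.run(step, cwd=vault, check=True)
```

You would call something like `sync_vault(Path("/path/to/vault"))` on a timer or by hand. If two people edit the same board file at once, the rebase can stop on a conflict that has to be resolved manually, so this works best for small teams.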