• criss_cross@lemmy.world
      English
      arrow-up
      23
      ·
      1 day ago

      I’m sorry, but as an AI I cannot physically color you shocked. I can help you with AWS services and questions.

      • Shayeta@feddit.org
        English
        arrow-up
        3
        ·
        20 hours ago

        How do I set up event driven document ingestion from OneDrive located on an Azure tenant to Amazon DocumentDB? Ingestion must be near-realtime, durable, and have some form of DLQ.

        • Tja@programming.dev
          English
          arrow-up
          2
          ·
          7 hours ago

          DocumentDB is not for OneDrive documents (PDFs and such). It’s for “documents” as in serialized objects (JSON or BSON).

          • Shayeta@feddit.org
            English
            arrow-up
            1
            ·
            3 hours ago

            That’s even better, I can just jam something in before it and churn the documents through an embedding model, thanks!

        • Meowing Thing@lemmy.world
          English
          arrow-up
          2
          ·
          8 hours ago

          I think you could read OneDrive’s notifications for new files, parse them, and pipe them to DocumentDB via some microservice or Lambda, depending on the scale of your solution.
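A rough sketch of the "read OneDrive's notifications" part, assuming you go through Microsoft Graph change-notification subscriptions. The drive ID, webhook URL, and client-state secret are placeholders, and `build_subscription` and `handle_webhook` are hypothetical helper names. It only builds the subscription body and handles the webhook handshake; the actual HTTP calls and auth are left out.

```python
from datetime import datetime, timedelta, timezone

# Graph endpoint you would POST the subscription body to (with a bearer token).
GRAPH_SUBSCRIPTIONS_URL = "https://graph.microsoft.com/v1.0/subscriptions"


def build_subscription(drive_id, notification_url, client_state,
                       now=None, lifetime_hours=24):
    """Build the JSON body for a change-notification subscription on a drive root."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(hours=lifetime_hours)
    return {
        # Drive roots report changes as "updated"; new files show up this way.
        "changeType": "updated",
        "notificationUrl": notification_url,
        "resource": f"/drives/{drive_id}/root",
        # Subscriptions expire and have to be renewed before this timestamp.
        "expirationDateTime": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Shared secret echoed back in every notification so you can reject fakes.
        "clientState": client_state,
    }


def handle_webhook(query_params, body, expected_client_state):
    """Return (status, response_text, accepted_notifications) for one webhook call."""
    token = query_params.get("validationToken")
    if token is not None:
        # Validation handshake: echo the token back as plain text.
        return 200, token, []
    accepted = [
        n for n in body.get("value", [])
        if n.get("clientState") == expected_client_state
    ]
    # Acknowledge quickly and do the real work (queueing, fetching) elsewhere.
    return 202, "", accepted
```

As far as I know the notification only says that something changed, so the worker behind this would still need to ask Graph which files are new before parsing anything.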

        • criss_cross@lemmy.world
          English
          arrow-up
          7
          ·
          20 hours ago

          I see you mention Azure and will assume you’re doing a one-time migration.

          Start by moving everything from OneDrive to S3. As an AI I’m told that bitches love S3. From there you can subscribe to create events on buckets and add events to an SQS queue. Here you can enable a DLQ for failed events.
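A minimal sketch of the S3 to SQS wiring with a DLQ, written as plain dict builders so it runs without AWS credentials. The function names and ARNs are made up for illustration; the dict shapes are what you would hand to boto3's `create_queue(Attributes=...)` and `put_bucket_notification_configuration(NotificationConfiguration=...)`.

```python
import json


def redrive_policy(dlq_arn, max_receive_count=5):
    """Queue attributes that send a message to the DLQ after repeated failed receives."""
    return {
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": str(max_receive_count),
        })
    }


def s3_to_sqs_notification(queue_arn, prefix=""):
    """Bucket notification config that pushes object-created events to an SQS queue."""
    config = {
        "QueueArn": queue_arn,
        "Events": ["s3:ObjectCreated:*"],
    }
    if prefix:
        # Optional: only notify for keys under a given prefix.
        config["Filter"] = {
            "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
        }
    return {"QueueConfigurations": [config]}


# Hypothetical usage with boto3 (not executed here):
#   sqs.create_queue(QueueName="ingest", Attributes=redrive_policy(dlq_arn))
#   s3.put_bucket_notification_configuration(
#       Bucket="my-bucket",
#       NotificationConfiguration=s3_to_sqs_notification(queue_arn),
#   )
```

The queue also needs an access policy that lets `s3.amazonaws.com` send messages to it, otherwise S3 will refuse to save the notification configuration.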

          From there add a Lambda to listen for SQS events. You should enable provisioned concurrency for speed, for the ability for AWS to bill you more, and so that you can have a dandy of a time figuring out why an old version of your Lambda is still running even though you deployed the latest version, and why everything telling you that creating a new ID for the Lambda each time will fix it fucking lies.

          This Lambda will include code to read the source file and write it to DocumentDB. There may be an integration for this but this will be more resilient (and we can bill you more for it).
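Not CDK, but a hedged sketch of what that Lambda could look like in Python. The `ingest` callable is a stand-in for "read the object from S3 and write it to DocumentDB" (for example with boto3 and pymongo), and `make_handler` and `extract_objects` are names invented here. It assumes the SQS event source mapping has `ReportBatchItemFailures` enabled, so only the failed messages are retried and eventually land in the DLQ.

```python
import json
from urllib.parse import unquote_plus


def extract_objects(sqs_record):
    """Yield (bucket, key) pairs from one SQS record that wraps an S3 event."""
    body = json.loads(sqs_record["body"])
    # S3 test events have no "Records" key, so default to an empty list.
    for s3_record in body.get("Records", []):
        bucket = s3_record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3_record["s3"]["object"]["key"])
        yield bucket, key


def make_handler(ingest):
    """Build a Lambda handler around an ingest(bucket, key) callable."""
    def handler(event, context):
        failures = []
        for record in event.get("Records", []):
            try:
                for bucket, key in extract_objects(record):
                    ingest(bucket, key)
            except Exception:
                # Report only this message as failed; the rest of the batch is deleted.
                failures.append({"itemIdentifier": record["messageId"]})
        return {"batchItemFailures": failures}
    return handler
```

Keeping the DocumentDB write behind `ingest` means the parsing and failure reporting can be tested locally with a fake event before anything gets deployed.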

          Would you like to see sample CDK code? Tough shit because all I can do is assist with questions on AWS services.