As a data engineer, testing with production loads is critical for performance checking, as well as for finding the edge cases where your assumptions about what to expect in the data get curb-stomped and send you back to the drawing board to cry and think about what you’ve done.
You test in dev? You mean you don’t have a QA environment, or staging?
I test my own code/scripts in dev while I’m working on them. QA usually tests acceptance criteria in the test environment. And then staging is used for testing against production data, for performance and for identifying missed edge cases. Actually, we sometimes use dev and test interchangeably when multiple people are working on the same repo, so the lines are a little blurrier than that.
And then managers who don’t get this will try to shove policies down our throats about how “pre-prod systems should not have access to prod data”.
“Just obfuscate it.” Sure, for all 300 TB of it, from the 10 different sources that don’t really talk to each other and that we were already doing magic on just to be able to join them together? They should give us a bottle of hard liquor per month/project.
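Part of why “just obfuscate it” isn’t a one-liner: the masking has to be deterministic and identical across every source, or the joins you spent months building stop lining up. A rough sketch of just that one constraint, in Python, with a hypothetical key and salt:

```python
# Rough sketch, not an actual pipeline: masking a join key with a keyed hash.
# The "CUST-..." key format and the salt are hypothetical stand-ins.
import hashlib

def mask_key(value: str, salt: str = "per-project-secret") -> str:
    # Deterministic: the same raw key always maps to the same token,
    # so records from different sources still join after masking.
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]

# The same key masked in two different source extracts must stay joinable:
assert mask_key("CUST-0042") == mask_key("CUST-0042")
print(mask_key("CUST-0042"))
```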
Yeah, we finally set up a workflow where production data is available in a staging environment. This has saved a lot of trouble of the “well, it worked on my local where there were 100 records, but prod has 1037492 and it does not” variety.
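A minimal sketch of the kind of sanity check that workflow makes possible, assuming Python with SQLAlchemy and hypothetical connection strings and table names:

```python
# Minimal sketch (Python + SQLAlchemy); the connection URLs and the
# "orders" table are hypothetical stand-ins for whatever you actually run.
from sqlalchemy import create_engine, text

def row_count(conn_url: str, table: str) -> int:
    engine = create_engine(conn_url)
    with engine.connect() as conn:
        return conn.execute(text(f"SELECT COUNT(*) FROM {table}")).scalar_one()

local = row_count("postgresql://localhost/dev_db", "orders")         # e.g. 100
staging = row_count("postgresql://staging-host/stage_db", "orders")  # e.g. 1037492

# If local is a rounding error next to staging, local timings prove nothing.
if staging and local / staging < 0.01:
    print(f"local has {local} rows vs {staging} in staging: don't trust local timings")
```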
I once tanked a production service by assuming it could handle at least as much load as my laptop on residential sub-gigabit Internet could produce.
I was wrong by at least an order of magnitude.
Same. Early on as a new dev, I failed to performance-check my script (as did my QA tester) before it was released to production, and that was my first rollback ever. It was very unoptimized and incredibly slow under one of our highest-density data streams. I felt like an idiot for having been fine with its 1-2 second execution time in the dev environment.
I deal with this constantly. Profilers are your friend. I keep begging my team to test with the database dumps from production, but nope. Don’t feel bad about messing up, though. The number of fuck-ups I deal with in prod is exasperating. At least most of the things I break are a quick 5-minute fix and not weeks of rework.
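Something like the standard-library profiler is usually enough to find the hot spot. A minimal sketch, assuming a hypothetical transform_batch() you suspect is slow:

```python
# Minimal sketch: profile a suspect function with the standard library.
# transform_batch() and the fake rows are hypothetical placeholders.
import cProfile
import pstats

def transform_batch(rows):
    # Stand-in for the real transformation logic.
    return [dict(r, processed=True) for r in rows]

rows = [{"id": i} for i in range(100_000)]

with cProfile.Profile() as prof:
    transform_batch(rows)

# Show the ten most expensive calls by cumulative time.
pstats.Stats(prof).sort_stats("cumulative").print_stats(10)
```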
The hardest thing I have is explaining the concept of time to the team. Once you have done controls programming and seen how much happens in 50-100 ms, it sinks in. Your thing takes 500 ms? 1 second? They think this is acceptable on something that is dealing with fewer than 100 database records. 😭
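Putting an actual number on it is cheap, which makes that conversation easier. A minimal sketch with the standard library, assuming a hypothetical process_records() and a 100 ms budget:

```python
# Minimal sketch: put a number on the runtime instead of eyeballing it.
# process_records() and the 100 ms budget are hypothetical.
import time

def process_records(records):
    return [r for r in records if r is not None]  # stand-in for real work

records = list(range(100))  # "fewer than 100 database records"

start = time.perf_counter()
process_records(records)
elapsed_ms = (time.perf_counter() - start) * 1000

budget_ms = 100
print(f"{elapsed_ms:.2f} ms (budget {budget_ms} ms)")
```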
I learned about this in my first internship. Bless the senior dev who showed me the proper way to write actual SQL.
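One of the classic lessons there, assuming that’s the sort of thing the senior dev was teaching, is set-based SQL instead of row-by-row loops. A rough sketch, assuming psycopg and a hypothetical orders table:

```python
# Rough sketch of the set-based lesson (psycopg 3; the "orders" table,
# statuses, and DSN are hypothetical).
import psycopg

with psycopg.connect("dbname=dev_db") as conn, conn.cursor() as cur:
    # Row-by-row version: one round trip per row, falls over at prod volumes.
    # cur.execute("SELECT id FROM orders WHERE status = 'new'")
    # for (order_id,) in cur.fetchall():
    #     cur.execute("UPDATE orders SET status = 'queued' WHERE id = %s", (order_id,))

    # Set-based version: one statement, the database does the work.
    cur.execute("UPDATE orders SET status = 'queued' WHERE status = 'new'")
```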
17 years working with hospital patient data. I’m going to curl up in a corner and cry now…
Must be high-pressure work.
Heh, loads.