Hi everyone, first post!
Couple of questions:
1 - When you add or remove columns to your db and need to massage existing data how do you do it?
- I create rake tasks to do just this (with tests)
- Commit the above
- Push to Heroku
- Run the tasks remotely
If all goes well sweet. But what should I do with the rake tasks? Remove them as they have served their pupose and commit (I can always checkthem out from the earlier commit) or just leave them be?
2 - Lets say you made some changes to the db and only several hours later your you start seeing some exceptions. Bad data has gotten in and now bringing the app down. How do you triage this?
- just rollback to your db backup and kiss goodbye to any records your users made
- try to massage the data into good form (rake tasks again)
- or is there another best practice.
Are there any good books on this sort of stuff?
Hey @robodisco, welcome to the forum.
I use seed_migration to execute data changes. It allows you to create data migrations that can be hooked in to your migration task (they run after database migrations). The gem keeps track of what migrations have been run in a similar manner to database migrations, so if you have a team working on the app and they are writing migrations separately they all get executed once and only once when they are merged in. (It supports rollback just like database migrations.)
It’s not terribly different from doing that with a rake task, but I do like how the gem won’t run the migration twice. If you use rake tasks, I would remove them from your code base after use to ensure that they don’t accidentally get run a second time and damage your data.
As for your question about how to triage a data migration gone bad, I think the approach really depends on how much damage has been done. I don’t know how critical your users’ data is, but I would try to avoid just doing a blanket rollback if your app has been running for several hours.
The best strategy for avoiding a situation like this would be to run your migrations on a staging environment that runs on the web (not your local machine) just like your production environment, and then to subject that environment to some realistic use to see if it holds up.
At my job we do everything on a staging server and usually with a copy of production data for that server. So we will drop the entire production database and transfer it to the staging server just to ensure we maintain data integrity. This is really the only way to ensure you know how a script will work at 100 records and how it will work at 1 million records or even 1 billion records. if the database is too big then we usually try to create a partial of it or simulate enough data to make it realistic. Ideally we test it enough to ensure we don’t screw things up in production.
Never roll back the database unless you can avoid it. Data loss is incredibly touchy for most users and even in minor apps will lose you users and or customers. I would usually create another rake task to fix the issue at hand, the other route is to use migrations/seeds in some combination to handle migrating data for you. There are some gems that will act like migrations for data. The one mentioned by @geoffharcourt is a decent one.
Thanks @Jared_Smith and @geoffharcourt for your help (and sorry for late reply).
Good to hear you advice matches what my basic hunch was.
I do keep a staging server and i always do a quick testing session on it. As a single dev however I don’t have the luxury of a team of QA testers but still it prevents major blowups.
I didn’t know about the seed_migration gem and that sounds pretty useful. I’ll take it for a spin this week.