June 2016

kiramclean

Thanks for the awesome video! I was just wondering what is a good/safe way to get some production data out of a production database, if we don’t want a full backup of all the data?

June 2016

christoomey

Hey Kira, glad you enjoyed the video. Personally, I tend to pull the full Upcase production DB regularly using Parity (I sum up this approach in the Parity section of the Heroku Weekly Iteration). If this is an option, I highly recommend it to get a real picture. At a minimum, perhaps you could back up production → staging and then tinker there (after some local experimentation).

If that is not an option, you might consider building a script to generate the data. We have an example of this in the dev:prime task in Upcase, which uses FactoryGirl’s methods to aid in building structured data (with a few helper methods).
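
As a rough illustration of the generated-data approach, here is a minimal plain-Ruby sketch. It is not the Upcase dev:prime task and does not use FactoryGirl; the `generate_users` helper, the field names, and the plan names are all made up for the example. The one idea it tries to show is seeding the random number generator so every developer gets the same data set:

```ruby
# Hypothetical generator: builds `count` user records as plain hashes.
# A fixed seed makes the output reproducible across machines and runs.
def generate_users(count, seed: 42)
  rng = Random.new(seed)
  plans = %w[free pro team] # assumed plan names, for illustration only

  (1..count).map do |index|
    {
      id: index,
      name: "User #{index}",
      email: "user#{index}@example.com", # unique per record
      plan: plans.sample(random: rng)    # varied, but deterministic per seed
    }
  end
end

users = generate_users(5)
users.each { |user| puts user.inspect }
```

In a real Rails app the hashes would presumably be replaced with factory or model calls inside a rake task, but the seeding idea carries over.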

I’ve also heard of folks applying an automated anonymization script to work around compliance / privacy concerns. With this, you’d work from a copy of the production data set, but scramble any identifying data, for example replacing names with “Jane M Doe”. I don’t have any solid examples of this that I can point you to, but wanted to point it out as a third option.
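
For illustration only, a minimal Ruby sketch of the scrubbing idea, working on plain hashes rather than a real database. The column names (`name`, `email`, `plan`) and the `anonymize` helper are hypothetical and not taken from Upcase. It assumes emails must stay unique, so it derives them from the record id instead of using one fixed placeholder:

```ruby
# Hypothetical scrubber: returns a copy of the row with identifying
# fields replaced and all other fields left untouched.
def anonymize(row)
  row.merge(
    name: "Jane M Doe",
    email: "user-#{row.fetch(:id)}@example.com" # stays unique per record
  )
end

# Stand-in for rows copied from production; the values are invented.
rows = [
  { id: 1, name: "Ada Lovelace", email: "ada@real-domain.test", plan: "pro" },
  { id: 2, name: "Alan Turing", email: "alan@real-domain.test", plan: "free" }
]

scrubbed = rows.map { |row| anonymize(row) }
scrubbed.each { |row| puts row.inspect }
```

Non-identifying columns such as `plan` pass through unchanged, which is what keeps the scrubbed copy useful for realistic queries. A real script would also need to cover free-text fields, addresses, and anything else that could identify a person.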

June 2016

jkrmr

@christoomey @jferris
Baller episode, thanks for putting this together.

August 2016

gxespino

@christoomey, @jferris - When would you want to add a NoSQL db to the mix?

August 2016

jferris

I would add another database only when there’s a production issue that can’t be reasonably fixed in your primary database and that you’re sure will be fixed by introducing a new tool.

The cost of having two databases is higher than is generally appreciated.

I would suggest this litmus test: if you’re not sure whether or not to introduce another database, stick with Postgres.