Optimizing SQL Queries in Postgres - Upcase

June 2016

kiramclean

Thanks for the awesome video! I was just wondering what is a good/safe way to get some production data out of a production database, if we don’t want a full backup of all the data?

June 2016

Hey Kira, glad you enjoyed the video. Personally I tend to pull the full Upcase production DB regularly using Parity (I sum up this approach in the Parity section of the Heroku Weekly Iteration). If this is an option I highly recommend it to get a real picture. At a minimum, perhaps you could backup production → staging and then tinker there (after some local experimentation).

If that is not an option, you might consider building a script to generate the data. We have an example of this in the dev:prime task in Upcase, which uses FactoryGirl’s methods to aid in building structured data (with a few helper methods).

I’ve also heard of folks applying an automated anonymization script to work around compliance / privacy concerns. With this, you’d work from a copy of the production data set, but scramble any identifying data, for example replacing names with “Jane M Doe”. I don’t have any solid examples of this that I can point you to, but wanted to point it out as a third option.

June 2016

Transactions are taken for granted in day-to-day development, but ACID guarantees no longer exist when you’re sharing data between two different stores.
Taking backups, snapshots, and local copies of data is much more difficult with multiple databases.
If you’re using a secondary database as a cache, now you need to solve one of computer science’s hardest problems! (Cache invalidation)
If you’re using a schemaless database, you now need to enforce, optimize, and document your schema by yourself, without aid from the database.

I would suggest this litmus test: if you’re not sure whether or not to introduce another database, stick with Postgres.

kiramclean

christoomey

jkrmr

gxespino

jferris