I have recently rebuilt an app that was running on a MySQL-powered PHP CMS. It had a lot of production data: both DB records and the files & images associated with many of those records.
I have gotten a SQL dump and pulled out the requisite info into CSV files, and I have downloaded all the files & images to my local drive.
Ideally, what I would like to do is import the records in those CSV files into my local dev PostgreSQL install and have any files/images associated with each record uploaded to my S3 bucket via Carrierwave (BTW, all of this is set up in a vanilla way, i.e. I can create a record and upload files & images to S3 just fine).
The CSV files have tens of thousands of records.
What is the best approach to import these records locally, keeping the created_at and updated_at values accurate, i.e. per the CSV and not set to the datetime of the import?
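(For illustration: created_at and updated_at are set by ActiveRecord, not by Postgres, so an import that goes straight through Postgres keeps whatever timestamps are in the CSV. Below is a minimal sketch assuming a hypothetical posts table, a local myapp_development database, and a CSV whose header row matches the column names.)

```
# Sketch only: the table, columns, and database name are assumptions.
# \copy streams the client-side CSV into Postgres, and the timestamp
# columns are written exactly as they appear in the file.
psql -d myapp_development <<'SQL'
\copy posts (id, title, body, created_at, updated_at) from 'posts.csv' with csv header

-- If the CSV carries explicit ids, move the sequence past them so newly
-- created records don't collide with the imported ones.
select setval(pg_get_serial_sequence('posts', 'id'), (select max(id) from posts));
SQL
```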
Also, once I have implemented this locally and I have all of this "old data" imported and the S3 files uploaded to my dev bucket, how do I approach this for Heroku? Right now I am running the free package, but I can upgrade with no issues. Should I upgrade to a full-blown PostgreSQL service? Or will the normal $9/mo PGSQL upgrade work?
If you can't tackle the entire thing, then maybe just give me guidance on how to break this down into more manageable steps; it feels a bit overwhelming right now.
That sounds good for my local copy. What happens with Heroku? How do I then import this stuff to Heroku?
Also, if I don't use a Ruby script, how do I override the automatic setting of created_at and updated_at, or is that an AR feature and not a PG thing?
@marcamillion, you can connect to your Heroku database the same way you would connect to a local database. The Postgres page on Heroku shows you the connection parameters you'd need to connect via a GUI client, webapp, or command line utility.
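To make that concrete, here is a sketch with an assumed app name of myapp. heroku pg:psql drops you into psql against the app's attached database, and because \copy reads the file on the client side, the same local CSV can be streamed up to the Heroku database:

```
# "myapp" is an assumed app name; adjust to match yours.

# Open an interactive psql session against the app's primary database:
heroku pg:psql --app myapp

# Or grab the connection URL and point any client at it. \copy still reads
# posts.csv from the local machine and streams the rows to the remote DB:
psql "$(heroku config:get DATABASE_URL --app myapp)" \
  -c "\copy posts (id, title, body, created_at, updated_at) from 'posts.csv' with csv header"
```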
If you are going to move these records up to a Heroku instance, then my recommendation to use PG and avoid Ruby goes double. ActiveRecord has a lot of memory overhead for each record, and I have run into problems with import scripts for tens of thousands of records running out of memory on Heroku, even with attempts to batch the imports into groups, etc.
Thanks for the responses, guys. @jondkinney, I concur with @geoffharcourt. That tool looks very interesting!
So @geoffharcourt, for my Heroku DB, would you suggest I fork my live DB and then import these into that, and do it that way? What's the best way to do this with zero downtime? It would be awesome if I could do this on a fork of my production DB, and then once I am happy with it, just point the production app to the forked DB and use that as production, but I am not sure how possible/feasible that is with Heroku.
Testing against a fork of your production DB (maybe your staging DB?) is a great way to check your work, but if you do the import against a fork and your users are updating other data during the import, swapping in your fork for the original is going to wipe out those changes.
Note that if you're using the Dev plan at Heroku (the $9/mo plan), you can't do forks. Forks are reserved for production-tier instances.
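For what it's worth, once you are on a production-tier plan the fork-and-promote flow looks roughly like the sketch below. Plan names, commands, and flags have changed over time, so check the current heroku pg docs; myapp, the crane plan, and the HEROKU_POSTGRESQL_PINK attachment name are placeholders.

```
# Sketch only: app name, plan name, and the color attachment are assumptions;
# `heroku pg:info` shows the real attachment names for your app.

# Provision a new database forked from the current primary:
heroku addons:add heroku-postgresql:crane --fork DATABASE_URL --app myapp

# Wait until the fork has finished preparing:
heroku pg:wait --app myapp

# Import and verify against the fork, then promote it so the app's
# DATABASE_URL points at the new database:
heroku pg:promote HEROKU_POSTGRESQL_PINK --app myapp
```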
Yeah, I am using the dev plan, and I am aware re: the changes. The only writes we have in the DB are user registrations and post creations by moderators. So while I am importing, I can freeze all writes with no issue.
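One way to enforce that freeze for the duration of the import is Heroku's maintenance mode (again assuming an app named myapp):

```
# Maintenance mode serves a static page and blocks normal requests
# while the import runs; "myapp" is an assumed app name.
heroku maintenance:on --app myapp
# ... run the import against the database here ...
heroku maintenance:off --app myapp
```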
By production-tier instances, do you mean I need to upgrade from the free plan (or the 2-dyno setup), or do I need the standalone Postgres product they offer? I know they have that (Fully Managed Database as a Service - PostgreSQL | Heroku) as well as the regular tiers for their dynos.