Responsibly Handling a migration of data from PHP app to Rails App (PostgreSQL)

marcamillion · January 27, 2015, 12:21pm

I have recently rebuilt an app that was being run on a MySQL powered PHP CMS. It had a lot of production data, both DB records and files & images associated with many of those records.

I have gotten a SQL dump and pulled out the requisite info into CSV files, and I have downloaded all the files & images to my local drive.

Ideally, what I would like to do is to import those records in the CSV files into my local dev PostgreSQL install and have any files/images associated with that record uploaded to my S3 bucket via Carrierwave (BTW…all of this is setup in a vanilla way, i.e. I can create a record and upload files & images to S3 just fine).

The CSV files have tens of thousands of records.

What is the best approach to import these records locally (keeping created_at and updated_at records accurate - i.e. per the CSV and not per the datetime when it was imported)?

Also, once I have implemented this locally and I have all of this “old data” imported and S3 files uploaded to my dev bucket, how do I approach this for Heroku? Right now, I am running the free package, but I can upgrade with no issues. Should I upgrade to a fullblown PostgreSQL service? Or will the normal $9/mo PGSQL upgrade work?

If you can’t tackle the entire thing, then maybe just giving me guidance on how I can break this down into more manageable steps because it feels a bit overwhelming right now.

Thanks!

geoffharcourt · January 27, 2015, 3:01pm

I would do the data import with Postgres directly rather than trying to run the data through a Ruby script.

The PG docs on the COPY command are here:

PGAdmin also has a pretty decent import utility if you want to get some GUI sugar on top of the import process:

If your data set is really big, scripting it through Ruby may be time and/or memory constrained.

HTH.

marcamillion · January 27, 2015, 11:18pm

That sounds good for my local copy. What happens with Heroku? How do I then import these stuff to Heroku?

Also, if I don’t use a Ruby script, how do I override the automatic setting of “created_at” and “updated_at” or is that an AR feature and not a PG thing?

geoffharcourt · January 28, 2015, 3:45pm

@marcamillion, you can connect to your Heroku database the same way you would connect to a local database. The Postgres page on Heroku shows you the connection parameters you’d need to connect via a GUI client, webapp, or command line utility.

If you are going to move these records up to a Heroku instance then my recommendation to try to use PG and to avoid Ruby goes double. ActiveRecord has a lot of memory overhead for each record, and I have run into problems with import scripts for tens of thousands of records running out of memory on Heroku even with attempts to batch the imports into groups, etc.

jondkinney · January 28, 2015, 6:01pm

I’d suggest taking a look at http://pgloader.io/ I’ve used it in the past and it’s worked well.

geoffharcourt · January 28, 2015, 7:41pm

@jondkinney, that tool looks awesome. I like the reject logs it stores.

marcamillion · January 28, 2015, 8:07pm

Thanks for the responses guys. @jondkinney I concur with @geoffharcourt. That tool looks very interesting!

So @geoffharcourt for my Heroku DB, would you suggest I fork my live DB and then import these to that…and do it that way? What’s the best way to do this with zero-downtime? It would be awesome if I could do this on a fork of my production db, and then once I am happy with it, just point the production app to the forked DB and use that as production…but I am not sure how possible/feasible that is with Heroku.

Any thoughts?

geoffharcourt · January 28, 2015, 8:38pm

Testing against a fork of your production DB (maybe your staging DB?) is a great way to check your work, but if you do the import against a fork and your users are updating other data during the import, swapping in your fork for the original is going to wipe out those changes.

Note that if you’re using the Dev plan at Heroku (the $9/mo plan), you can’t do forks. Forks are reserved for production-tier instances.

marcamillion · January 28, 2015, 9:04pm

Yeh I am using the dev plan, and I am aware re: changes. The only writes we have in the DB are user registrations and creations of posts which moderators do. So while I am importing, I can freeze all writes with no issue.

By production-tier instances you mean I need to upgrade from the free (or 2-dynos) or do I need to get the standalone postgres product that they have? I know they have this - Fully Managed Database as a Service - PostgreSQL | Heroku and the regular tiers for their dynos.

Which did you mean?

geoffharcourt · January 28, 2015, 11:31pm

You don’t need to have multiple dynos to do the fork, just a Standard-0 or higher DB: Heroku Postgres - Add-ons - Heroku Elements

marcamillion · January 29, 2015, 12:07am

Ok Thanks!

Topic		Replies	Views
How to restore a Postgres backup that is stored on Amazon S3 SQL	4	3751	November 28, 2014
Best practices for managing databases in production Ruby on Rails	3	2363	February 9, 2015
How do you run migrations? Ruby on Rails	5	1682	January 21, 2017
When to upgrade heroku database? Ruby on Rails	8	1011	March 6, 2014
What data model tools do you use? Workflow	5	1348	July 22, 2014

Responsibly Handling a migration of data from PHP app to Rails App (PostgreSQL)

Related topics