Rake Task - data migration limit?

I wrote a temporary rake task based on tips from this excellent thoughtbot blog post:

Here’s what I wrote:

namespace :postables do
  desc 'Use existing post data to populate postables table'
  task create: :environment do
    posts = Post.with_deleted.all
    puts "Going to create #{posts.count} postable associations"

    ActiveRecord::Base.transaction do
      posts.each do |post|
        next if post.postable_id.to_i == 0
        post.postables.create!(postee_id: post.postable_id, postee_type: post.postable_type)
        print '.'
      end
    end

    puts 'All done!'
  end
end

The rake task works great in development and on a staging server. However, I’m concerned about running it in production. The task itself is pretty simple, but production has over 63,000 posts. Does anyone know if there is a limit to the number of records you can create in a single ActiveRecord transaction? Does this task look OK to run on 63,000 posts?

Thanks!!

There is no hard limit, but iterating over Post.all with each will load every post into memory at once, which could be problematic. Have a look at find_each (ActiveRecord::Batches), which loads records in batches of 1000 by default.
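
To make the batching idea concrete, here is a minimal plain-Ruby sketch of how records can be walked in primary-key order, one batch at a time. It is not ActiveRecord's implementation: build_rows, each_batch, and batch_sizes_for are hypothetical helpers, and an in-memory array stands in for the posts table.

```ruby
# Plain-Ruby sketch of batched iteration; no Rails or database required.
# All names here are hypothetical stand-ins, not ActiveRecord APIs.

# Builds an in-memory "table" of rows with sequential ids.
def build_rows(count)
  (1..count).map { |id| { id: id } }
end

# Yields rows in id order, at most batch_size rows per batch.
def each_batch(rows, batch_size: 1000)
  last_id = 0
  loop do
    # Same idea as: WHERE id > last_id ORDER BY id LIMIT batch_size
    batch = rows.select { |row| row[:id] > last_id }
                .sort_by { |row| row[:id] }
                .first(batch_size)
    break if batch.empty?

    yield batch
    last_id = batch.last[:id]
  end
end

# Collects the size of each batch so the slicing is easy to inspect.
def batch_sizes_for(rows, batch_size: 1000)
  sizes = []
  each_batch(rows, batch_size: batch_size) { |batch| sizes << batch.size }
  sizes
end

p batch_sizes_for(build_rows(2_500)) # prints [1000, 1000, 500]
```

Only one batch is held in the block at a time, which is the point of batching. With the default size of 1000, 63,000 posts would be processed in 63 batches.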

Also, if possible, pull down your full production database locally and try running the migration locally to see how it performs.

Thanks @andyw8!
So would you run all the batches within one ActiveRecord transaction? Here’s what I changed it to:

namespace :postables do
  desc 'Use existing post data to populate postables table'
  task create: :environment do
    puts "Going to create postable associations"

    ActiveRecord::Base.transaction do
      Post.with_deleted.find_each do |post|
        next if post.postable_id.to_i == 0
        post.postables.create!(postee_id: post.postable_id, postee_type: post.postable_type)
        print '.'
      end
    end

    puts 'All done!'
  end
end

Or I could use find_in_batches and wrap each batch of 1000 in its own ActiveRecord transaction, like this:

namespace :postables do
  desc 'Use existing post data to populate postables table'
  task create: :environment do
    puts 'Going to create postable associations'

    Post.with_deleted.find_in_batches do |posts|
      ActiveRecord::Base.transaction do
        posts.each do |post|
          next if post.postable_id.to_i == 0
          post.postables.create!(postee_id: post.postable_id, postee_type: post.postable_type)
          print '.'
        end
      end
    end

    puts 'All done!'
  end
end

What do you think is better?
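
One way to compare the two options is to look at what survives when a single create! raises partway through. The following is a plain-Ruby simulation under stated assumptions, not ActiveRecord code: FakeStore, create_row, run_single_transaction, and run_batched_transactions are hypothetical names, and "commit" just means appending to an array.

```ruby
# Plain-Ruby simulation of transaction scope; no database involved.
# FakeStore is a hypothetical stand-in for a table with commit/rollback.
class FakeStore
  attr_reader :committed

  def initialize
    @committed = []
  end

  # Stages writes and commits them only if the block finishes without raising.
  def transaction
    staged = []
    yield staged
    @committed.concat(staged)
  end
end

# Raises for failing_id to imitate create! failing on one post.
def create_row(staged, id, failing_id)
  raise ArgumentError, "invalid post #{id}" if id == failing_id

  staged << id
end

# One transaction around every record: a failure discards all writes.
def run_single_transaction(ids, failing_id)
  store = FakeStore.new
  begin
    store.transaction do |staged|
      ids.each { |id| create_row(staged, id, failing_id) }
    end
  rescue ArgumentError
    # Nothing was committed; the whole run rolled back.
  end
  store.committed
end

# One transaction per batch: batches before the failure stay committed.
def run_batched_transactions(ids, failing_id, batch_size)
  store = FakeStore.new
  begin
    ids.each_slice(batch_size) do |batch|
      store.transaction do |staged|
        batch.each { |id| create_row(staged, id, failing_id) }
      end
    end
  rescue ArgumentError
    # Only the failing batch rolled back; the run stops here.
  end
  store.committed
end

p run_single_transaction((1..10).to_a, 8)      # prints []
p run_batched_transactions((1..10).to_a, 8, 3) # prints [1, 2, 3, 4, 5, 6]
```

In this sketch the single transaction is all-or-nothing, while per-batch transactions leave the earlier batches in place. That means a rerun after a failure would need to skip posts that already have a postable.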

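I believe migrations are automatically wrapped in a transaction anyway, at least on databases that support transactional DDL. That applies to migration files in db/migrate, though, not to a standalone rake task like this one.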