Integrating Rails with the Scrapy Python Web Scraper

Justin_Gordon · March 29, 2014, 6:54am

Does anybody have any experience integrating a Rails app with a web scraper? The Python Scrapy project seems far more advanced than anything in Ruby, and I already have some legacy code for it, so it would make sense to keep it. The Rails app runs on Heroku, and the Scrapy app can run on AWS. So here are my questions:

Does it make sense to have the Scrapy app directly modify the Heroku Rails database? I can imagine using a REST API or maybe RabbitMQ to send the data, but what would be the gain?
Assuming that the Scrapy app will directly access the Heroku Rails database, would it make sense to let Rails handle all DB migrations? And if so, should the Scrapy Python code live in the same git repository, say under a directory called python/scrapy? It's a tiny amount of code.
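For comparison, here is a minimal sketch of the REST option, assuming a hypothetical `POST /api/scraped_items` endpoint on the Rails side that accepts a JSON list of items, plus a hypothetical API token. The class follows Scrapy's item-pipeline interface (`process_item(item, spider)` and `close_spider(spider)`) but needs only the standard library, and it batches items so the crawl makes one HTTP request per batch instead of one per item. The main gain over writing to the database directly is that Rails stays the only writer, so model validations and callbacks still run and the scraper never needs database credentials.

```python
import json
import urllib.request

# Hypothetical values: the Rails app would need a matching route and
# controller action that validates and saves each item.
API_URL = "https://your-app.herokuapp.com/api/scraped_items"
API_TOKEN = "change-me"


class RailsApiPipeline:
    """Scrapy-style item pipeline that batches items and POSTs them as JSON.

    Scrapy calls process_item() for every scraped item and close_spider()
    when the crawl finishes; the class itself does not import Scrapy.
    """

    def __init__(self, url=API_URL, token=API_TOKEN, batch_size=50):
        self.url = url
        self.token = token
        self.batch_size = batch_size
        self.buffer = []

    def build_request(self, items):
        # Rails parses a JSON body into params when Content-Type is application/json.
        body = json.dumps({"items": items}).encode("utf-8")
        return urllib.request.Request(
            self.url,
            data=body,
            method="POST",
            headers={
                "Content-Type": "application/json",
                # Assumed auth scheme: the format read by Rails'
                # authenticate_or_request_with_http_token.
                "Authorization": f'Token token="{self.token}"',
            },
        )

    def flush(self):
        # Nothing buffered: skip the network call entirely.
        if not self.buffer:
            return
        request = self.build_request(self.buffer)
        # urlopen raises HTTPError on 4xx/5xx, so a rejected batch is not silently lost.
        with urllib.request.urlopen(request, timeout=30) as response:
            response.read()
        self.buffer = []

    def process_item(self, item, spider):
        self.buffer.append(dict(item))
        if len(self.buffer) >= self.batch_size:
            self.flush()
        return item

    def close_spider(self, spider):
        # Send whatever is left over when the crawl ends.
        self.flush()
```

To enable it, you would register the class in the Scrapy project's `settings.py`, e.g. `ITEM_PIPELINES = {"myproject.pipelines.RailsApiPipeline": 300}` (the module path is hypothetical).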

Does my proposed “architecture” make sense?

geoffharcourt · March 31, 2014, 12:30am

I would attempt to do all of this in Ruby if you're going to have a Rails/Heroku/Postgres setup. Having a non-ActiveRecord piece of software modify an ActiveRecord-built database is asking for trouble.
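To make that risk concrete, here is a minimal sketch of what the direct-database route looks like from the Python side, assuming a hypothetical Rails model `Listing` (table `listings`) generated with `url` and `title` string columns plus `t.timestamps`. `sqlite3` stands in for Heroku Postgres so the example runs anywhere. The comments mark the ActiveRecord behaviors that a non-ActiveRecord writer has to replicate by hand or give up.

```python
import sqlite3
from datetime import datetime, timezone


class DirectDbPipeline:
    """Scrapy-style pipeline writing straight into a Rails-managed table.

    Assumes a hypothetical Rails model Listing, i.e. a "listings" table with
    url, title, created_at and updated_at columns created by a Rails migration.
    """

    def __init__(self, connection):
        self.conn = connection

    def process_item(self, item, spider):
        # By default ActiveRecord fills created_at/updated_at in Ruby, not via
        # column defaults, so this writer must set them itself (in UTC, like Rails).
        now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f")
        # Model validations and callbacks (e.g. a uniqueness validation on url)
        # do NOT run here; only database-level constraints are enforced.
        self.conn.execute(
            "INSERT INTO listings (url, title, created_at, updated_at) "
            "VALUES (?, ?, ?, ?)",
            (item["url"], item.get("title"), now, now),
        )
        self.conn.commit()
        return item
```

Against Heroku Postgres you would swap in a driver such as psycopg2, connect with the app's `DATABASE_URL`, and change the `?` placeholders to `%s`. Any column added or renamed by a later Rails migration also has to be mirrored in this SQL manually, which is the kind of drift the direct approach invites.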

Here’s a good blog post about building a quick scraper using Nokogiri: http://ruby.elevatedintel.com/blog/screen-scraping-with-a-saw-a-nokogiri-tutorial-with-examples/

I’ve done this a bunch with Nokogiri (and its predecessor, Hpricot), and it gets the job done well.

geoffharcourt · March 31, 2014, 1:36am

This project is a little rusty, but we’ve used it to scrape legacy sites when transitioning to a new CMS, making sure we’ve caught every page. It spiders the site for you, and its foundation is Nokogiri:
