Take a screenshot of a site from a URL

javiercarballo · September 11, 2013, 7:34pm

Hello!

I am working on an app where a user can post a URL, and I want to parse that URL the same way Facebook does: take a screen capture of the webpage, grab its title, and then present a thumbnail to the user.

I would like to do this from Ruby, planning to use Sidekiq for the parsing.

Thanks!

patrikbona · September 11, 2013, 9:04pm

And where is the question?

derekprior · September 12, 2013, 12:45am

There’s no screenshot taking place. Facebook makes a request to get the HTML of the page and parses the title from it. Then it pulls images from what it determines to be the main body of the article and lets the user choose one. You could simplify this by just picking the first image tag you find.
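A rough Ruby sketch of that "title plus first image" approach, working on HTML you have already fetched (for example with HTTParty). `LinkPreview` is a hypothetical name, and it sticks to regexes so it runs on the standard library alone; a real parser such as Nokogiri copes far better with messy markup. Relative `src` values are resolved against the page URL with `URI.join`.

```ruby
require "uri"
require "cgi/escape"

# Rough sketch (hypothetical module): pull a page title and the first <img> src
# out of raw HTML. Regex-based for self-containment; Nokogiri is sturdier.
module LinkPreview
  TITLE_RE = %r{<title[^>]*>(.*?)</title>}im
  IMG_SRC_RE = /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/i

  # Page title with HTML entities decoded and whitespace collapsed, or nil.
  def self.title(html)
    raw = html[TITLE_RE, 1]
    raw && CGI.unescapeHTML(raw).gsub(/\s+/, " ").strip
  end

  # First image URL, made absolute relative to page_url, or nil if there is none.
  def self.first_image(html, page_url)
    src = html[IMG_SRC_RE, 1]
    src && URI.join(page_url, CGI.unescapeHTML(src).strip).to_s
  end
end
```

For example, `LinkPreview.first_image(html, "http://example.com/post/1")` turns `src="/a.png"` into `http://example.com/a.png`.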

JoelQ · September 12, 2013, 2:22pm

If you do want to take a screenshot of the page, you can do so pretty easily using PhantomJS.

This is the example from their website showing how to do it:

// github_screenshot.js

var page = require('webpage').create();
page.open('http://github.com/', function () {
    page.render('github.png');
    phantom.exit();
});

Then from the command line:

phantomjs github_screenshot.js

For more info, see the wiki page on screen capture and the quick start guide.
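If you go the PhantomJS route from Ruby (for example inside the Sidekiq job), one option is to shell out to the `phantomjs` binary. This is a minimal sketch, assuming `phantomjs` is installed and on the `PATH`; `PhantomCapture` is a hypothetical helper, and its embedded script is a variant of the example above that reads the URL and output path from `require('system').args`. Passing an argument array to `Open3.capture2e` skips the shell, so a user-supplied URL is never interpreted by it.

```ruby
require "open3"
require "tempfile"

# Hypothetical helper: renders a URL to a PNG by shelling out to PhantomJS
# (assumed to be installed and on PATH as `phantomjs`).
module PhantomCapture
  # PhantomJS script; takes the URL and the output path as command-line arguments.
  SCRIPT = <<~JS
    var page = require('webpage').create();
    var args = require('system').args;
    page.viewportSize = { width: 1024, height: 768 };
    page.open(args[1], function (status) {
      if (status !== 'success') { phantom.exit(1); return; }
      page.render(args[2]);
      phantom.exit(0);
    });
  JS

  # Argument vector for the phantomjs process (no shell involved).
  def self.command(script_path, url, output_path)
    ["phantomjs", script_path, url, output_path]
  end

  # Writes the script to a temp file, runs PhantomJS, and returns true on exit status 0.
  # Raises Errno::ENOENT if the phantomjs binary cannot be found.
  def self.capture(url, output_path)
    Tempfile.create(["capture", ".js"]) do |f|
      f.write(SCRIPT)
      f.flush
      _output, status = Open3.capture2e(*command(f.path, url, output_path))
      status.success?
    end
  end
end
```

Usage: `PhantomCapture.capture("http://github.com/", "/tmp/github.png")` returns `true` when PhantomJS exits with status 0.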

pedrosmmoreira · September 13, 2013, 9:00am

@javiercarballo, I had this old link lying around in my bookmarks; it describes building that sort of parsing. It uses jQuery and PHP, but maybe it can help get you started: Parse a link like Facebook

andyw8 · September 13, 2013, 12:51pm

If you do want an actual screenshot, check out http://url2png.com/

javiercarballo · September 13, 2013, 1:03pm

Hey everybody!

Thank you so much for your help, this is how I ended up doing it: Using PhantomJS’s screen capture feature.

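class WebScreenCapture

  def initialize(url, file_name)
    upload_image_to_s3(url, file_name)
  end

  # Public URL of a capture previously uploaded under file_name
  def self.get(file_name)
    s3 = AWS::S3.new
    o = s3.buckets[ENV["AWS_BUCKET_WEB_CAPTURES"]].objects["#{file_name}.png"]
    o.public_url
  end

private

  def upload_image_to_s3(url, file_name)
    # Let HTTParty build the query string so the target URL is escaped
    # (its own ?/& characters would otherwise break the service's query string)
    response = HTTParty.get("http://screenshot.etf1.fr/", query: { url: url })
    # Raising lets Sidekiq retry the job instead of storing an error page
    raise "Screen capture failed (HTTP #{response.code}) for #{url}" unless response.code == 200

    s3 = AWS::S3.new
    obj = s3.buckets[ENV["AWS_BUCKET_WEB_CAPTURES"]].objects["#{file_name}.png"]
    # Upload the raw PNG bytes, not the HTTParty response object
    obj.write(response.body, acl: :public_read)
  end

end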

and then added a worker with Sidekiq for the parsing:

class ShoutParserWorker
  include Sidekiq::Worker

  def perform(shout_id)
    # find_by_id returns nil (instead of raising) if the shout was deleted meanwhile
    shout = TextShout.find_by_id(shout_id)
    return unless shout

    check_for_urls_and_take_a_screenshot(shout)
    tell_shout_to_rerender_in_the_playground(shout)
  end

private

  # Captures the first http(s) link in the shout body and stores the capture's
  # S3 key, the URL and the page title on the shout
  def check_for_urls_and_take_a_screenshot(shout)
    url = shout.body[%r{https?://\S+}]
    return unless url

    WebScreenCapture.new(url, "shout_#{shout.id}")
    shout.update_attributes(
      thumbnail_image_path: "shout_#{shout.id}",
      thumbnail_image_parsed_url: url,
      thumbnail_image_page_title: get_page_title_from(url)
    )
  end

  def get_page_title_from(url)
    response = HTTParty.get(url)
    # Parse the HTML body string, not the response wrapper
    Nokogiri::HTML(response.body).title
  end

  def tell_shout_to_rerender_in_the_playground(shout)
    activity = Activity.where(trackable_type: "Shout", trackable_id: shout.id).first
    return unless activity # nothing to re-render

    Pusher['pg_activities'].trigger('rerender_activity', {
      id: activity.id.to_s
    })
  end
end
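Picking links out of the shout body is the fiddliest part of the worker. Here is a small, self-contained sketch of one approach, assuming only plain `http://` and `https://` links need to be detected (`LinkFinder` is a hypothetical name): match from the scheme up to the next whitespace, quote or angle bracket, then trim sentence punctuation that tends to stick to the end of a link.

```ruby
# Hypothetical helper: finds http/https links in free text such as a shout body.
module LinkFinder
  # A URL runs from the scheme to the next whitespace, quote or angle bracket.
  URL_RE = %r{https?://[^\s<>"']+}i
  # Punctuation that usually ends the surrounding sentence rather than the URL.
  TRAILING_PUNCTUATION = /[.,;:!?)\]]+\z/

  # Returns every link in order of appearance (empty array if none).
  def self.urls(text)
    text.scan(URL_RE).map { |url| url.sub(TRAILING_PUNCTUATION, "") }
  end
end
```

The worker could then use `LinkFinder.urls(shout.body).first`. The trimming is a heuristic: a URL that genuinely ends in `)` or `.` loses that character.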

geoffharcourt · September 13, 2013, 1:24pm

@javiercarballo, Twitter and Facebook use either Twitter Cards or the Open Graph Protocol to build those snippets. Both are pretty easy to support (and I think Twitter will fall back to Open Graph tags if Twitter Card tags aren’t present). Each works by adding a few meta tags to the page’s <head> section.

Here’s a good write-up on Open Graph: http://ogp.me/

Btw, I wrote a quick gem for parsing Twitter Cards in Ruby. I haven’t done the same for Open Graph/Facebook, but it’s almost exactly the same code just with different attribute names.
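A rough sketch of reading those tags in Ruby with only the standard library, assuming the page HTML has already been fetched. `SocialMeta` is a hypothetical name, not the gem mentioned above; it collects `og:*` and `twitter:*` meta tags (Open Graph uses the `property` attribute, Twitter Cards conventionally use `name`) and prefers the Open Graph value when both exist. As with any regex-based HTML handling, Nokogiri would be more robust.

```ruby
require "cgi/escape"

# Rough sketch (hypothetical module): collect Open Graph and Twitter Card meta tags.
module SocialMeta
  META_TAG_RE = /<meta\b[^>]*>/i
  # name="value" or name='value' attribute pairs inside a single tag
  ATTR_RE = /([\w:-]+)\s*=\s*(["'])(.*?)\2/m

  # Returns e.g. { "og:title" => "...", "twitter:card" => "summary" }
  def self.extract(html)
    html.scan(META_TAG_RE).each_with_object({}) do |tag, found|
      attrs = tag.scan(ATTR_RE).map { |name, _quote, value| [name.downcase, CGI.unescapeHTML(value)] }.to_h
      key = attrs["property"] || attrs["name"]
      next unless key && attrs["content"] && key.start_with?("og:", "twitter:")
      found[key] ||= attrs["content"] # keep the first occurrence of each key
    end
  end

  # Prefers og:<field>, falling back to twitter:<field>; nil if neither is present.
  def self.field(meta, field)
    meta["og:#{field}"] || meta["twitter:#{field}"]
  end
end
```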
