
How to free up swap space on Heroku?


(Samnang Chhun) #1

I have one app that runs on Rails 4.1.6 and Ruby 2.1.3 on Heroku. After leaving it running for a while, I see a lot of swap space in use, and I don’t know why. I also get Error R14 (Memory quota exceeded).

Are you guys familiar with this situation, or do you know how to free up swap space?


(Simon Taranto) #2

We recently saw a similar issue. We did not see such a large amount of swap usage (yours looks like about 30% of the total), but we did see memory consumption increase continually until we hit the maximum allowed per dyno, leading to a forced restart. We are running Unicorn with Rails 4.1.5 and Ruby 2.1.3.

Some background on the problem:

Some ideas to get a better look into the problem:

Some steps to try:

We experimented with some different Ruby GC settings. The changes did adjust the way GC was happening, but we still had R14 errors, so we implemented unicorn worker killer and our R14s went away.
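For anyone who wants to try the same thing, the unicorn-worker-killer gem is wired in near the top of config.ru, before the app is run. The snippet below is only a sketch: the request and memory thresholds are illustrative assumptions and would need tuning against the dyno size and the number of Unicorn workers.

```ruby
# config.ru (fragment; requires the unicorn-worker-killer gem in the Gemfile)
require 'unicorn/worker_killer'

# Restart a worker after it has served a randomized number of requests
# between these two bounds (values here are assumptions, not recommendations).
use Unicorn::WorkerKiller::MaxRequests, 3072, 4096

# Restart a worker once its memory use passes a randomized limit
# between these two bounds, given in bytes (again, illustrative values).
use Unicorn::WorkerKiller::Oom, (192 * (1024**2)), (256 * (1024**2))
```

The randomized bounds are meant to keep all workers from restarting at the same moment.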


(Dan Croak) #3

On that thread, it looks like the patch was backported in Ruby 2.1.4:

http://svn.ruby-lang.org/repos/ruby/tags/v2_1_4/ChangeLog

(Search for “9607” on that page.)

I’m seeing similar memory issues on a few Ruby 2.1.4 apps.


(Simon Taranto) #4

“I’m seeing similar memory issues on a few Ruby 2.1.4 apps.”

Interesting. I missed that detail. I wonder if it would be possible to downgrade to 2.0.x and see if memory usage looks different.


(Geoff Harcourt) #5

We have had similar issues on our apps on every patch version of Ruby 2.1. I haven’t tried unicorn-worker-killer yet, but I have experienced R14 errors that persist until we restart.

We saw some minor improvement when we started using out-of-band garbage collection through this gem: https://github.com/tmm1/gctools, but it wasn’t enough to stop the R14s.


(Dan Croak) #6

Debugging locally with:

WEB_CONCURRENCY=1 foreman start

And:

watch ps -o rss= 21400

Where 21400 is the PID shown in the first Unicorn worker’s log line:

web.1 | I, [2014-11-04T15:45:34.630089 #21400]  INFO -- : worker=0 ready
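If you would rather collect the numbers than watch them scroll by, the same measurement can be sketched in Ruby. This is a minimal sketch, not something from the original setup: the helper names `rss_kb` and `sample_rss` are made up for illustration, and it assumes either a Linux `/proc` filesystem or a `ps` that accepts `-o rss= -p PID`.

```ruby
# Returns the resident set size of a process in kilobytes,
# or nil if the process cannot be found.
def rss_kb(pid)
  status = "/proc/#{pid}/status"
  if File.readable?(status)
    # Linux: the VmRSS line looks like "VmRSS:     12345 kB"
    line = File.foreach(status).find { |l| l.start_with?("VmRSS:") }
    return line ? line[/\d+/].to_i : nil
  end
  # Fallback: ask ps, as in the watch command above
  out = IO.popen(["ps", "-o", "rss=", "-p", pid.to_s], &:read).strip
  out.empty? ? nil : out.to_i
end

# Collects `count` RSS samples, `interval` seconds apart.
def sample_rss(pid, count: 5, interval: 1.0)
  samples = []
  count.times do |i|
    samples << rss_kb(pid)
    sleep(interval) if i < count - 1
  end
  samples
end
```

Calling `sample_rss(21400, count: 60, interval: 5)` would then give five minutes of readings for the worker, which makes a steady upward trend easier to spot than eyeballing the terminal.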

I ruled out the following potential solutions:

  • Downgrade to Ruby 2.0.
  • Switch from Unicorn to Puma.
  • ENV variables suggested in http://samsaffron.com/archive/2014/04/08/ruby-2-1-garbage-collection-ready-for-production and https://discussion.heroku.com/t/tuning-rgengc-2-1-on-heroku/359/13

@jferris did a visual read of the codebase looking for global state in our
application code:

  • global variables
  • class variables
  • singletons
  • per-process instance state

Not seeing any examples of those, he did a binary search through the gems,
commenting out one gem at a time, restarting Foreman, and changing the PID on
the watch command.
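The search itself was done by hand, but the halving idea can be sketched in a few lines. This is a hypothetical helper, not code from the project: it assumes exactly one gem is responsible, and the block stands in for the manual step of commenting out the given gems, restarting Foreman, and watching whether memory still grows.

```ruby
# Narrows a list of suspect gems down to the one responsible for the leak.
# The block receives the gems to disable and must return true if memory
# still grows with those gems commented out.
def find_leaky_gem(gems)
  suspects = gems.dup
  while suspects.size > 1
    disabled = suspects.first(suspects.size / 2)
    if yield(disabled)
      # Still leaking, so the culprit is among the gems left enabled.
      suspects -= disabled
    else
      # Leak gone, so the culprit is one of the gems we disabled.
      suspects = disabled
    end
  end
  suspects.first
end

# Made-up gem list; the block fakes the manual check for illustration.
gems = %w[unicorn pg rack-timeout newrelic_rpm sass-rails]
culprit = find_leaky_gem(gems) { |disabled| !disabled.include?("newrelic_rpm") }
puts culprit
```

Halving the suspect list each round needs far fewer restarts than disabling gems one at a time, as long as the app still boots with half of them commented out.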

New Relic was the only gem whose removal stopped memory from rising
indefinitely. Without it, memory settled around 130MB for the web process.

We’re trying out removing the New Relic gem tonight to see if we can improve this graph:


(Dan Croak) #7

Removing the New Relic gem did solve the issue for us in production:

I’m now testing on various versions of the gem using an isolated, plain Rails app with New Relic gem.

I’ve observed the memory leak on Ruby 2.1.4 with these versions of newrelic_rpm:

  • 3.9.6.257 (latest)
  • 3.9.5.251 (originally what we were using)
  • 3.9.4.245
  • 3.9.3.241
  • 3.9.1.236

I did not see the behavior on:

  • 3.8.1.221
  • 3.7.3.204

So, it seems like it was introduced in the 3.9.x series. I’m contacting New Relic now.


(Lenart Rudel) #8

Would you guys mind sharing where and how you log memory consumption? I thought the graphs were from New Relic, but I see they’re different…


(Geoff Harcourt) #9

@lenartr, that’s from Heroku’s dashboard in the Metrics section.


(Geoff Harcourt) #10

@Simon_Taranto, we added Unicorn Worker Killer to our app on staging today, and with the default settings out of the box it works wonderfully. Thanks very much for the recommendation!


(Simon Taranto) #11

Awesome, @geoffharcourt. Are y’all using New Relic gem as well?


(Geoff Harcourt) #12

We were, but I gave Skylight.io a try a couple months ago and was extremely impressed with it. We got some huge performance wins out of conditions that we diagnosed through it. I am fairly sure that it’s possible to do a lot of the same stuff with New Relic, but Skylight’s low resource usage was a nice plus for us given our prior issues with memory use on dynos.


(Dan Croak) #13

After controlling the experiment better, I was not able to isolate the issue to the 3.9.x series of the New Relic gem.

Talking with their support, the recommended solution is:

heroku config:set NEW_RELIC_AGGRESSIVE_KEEPALIVE=1 --remote production

It sounds like there may be a bad interaction between the New Relic Ruby agent and the generational garbage collector in Ruby 2.1 where the agent will re-establish a new SSL connection to New Relic servers each minute in order to submit data. This causes a handful of Ruby objects to be allocated and triggers lots of native memory allocations through malloc in the openssl library. These allocations aren’t accounted for by Ruby’s GC triggering logic, so unless something else triggers a GC, they can hang around too long.

Somewhat paradoxically, this issue affects idling applications (or apps with infrequent requests) more than applications under a steady load, because GC runs are triggered less frequently in an idling application.
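A rough way to see the idle-versus-busy difference for yourself is to count GC runs around a block of work. This is only a sketch: `gc_runs_during` is a made-up helper, and the string allocations are a stand-in for whatever a real request would allocate.

```ruby
# Counts how many GC runs (minor or major) happen while the block executes.
def gc_runs_during
  before = GC.count
  yield
  GC.count - before
end

# An idle stretch allocates almost nothing, so GC has little reason to run.
idle = gc_runs_during { sleep 0.2 }

# Simulated request work: allocating many Ruby objects makes GC kick in.
busy = gc_runs_during { 500_000.times.map { |i| "request #{i}" } }

puts "GC runs while idle: #{idle}"
puts "GC runs while busy: #{busy}"
```

Memory that native libraries grab through malloc is only released when one of those runs collects the Ruby objects holding it, which is why an idle process can sit on it for a long time.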

The environment variable above is a configuration setting that causes the agent to re-use a single SSL connection to New Relic servers. It might become the default in upcoming New Relic Ruby gem releases.
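To illustrate what re-using a connection means in practice, here is a minimal sketch with Ruby’s standard Net::HTTP. It is not the agent’s actual code: the class name `KeepAliveReporter`, the host, and the path are all hypothetical.

```ruby
require "net/http"

# Hypothetical reporter that holds on to one HTTPS connection instead of
# opening a fresh one (and a fresh SSL handshake) for every submission.
class KeepAliveReporter
  def initialize(host, port = 443)
    @http = Net::HTTP.new(host, port)
    @http.use_ssl = true
    # Allow the connection to sit idle between the once-a-minute submissions.
    @http.keep_alive_timeout = 120
  end

  # True once the underlying connection has been opened.
  def connected?
    @http.started?
  end

  # Opens the connection on first use; later calls go over the same
  # connection for as long as it stays open.
  def submit(path, body)
    @http.start unless @http.started?
    @http.post(path, body, "Content-Type" => "application/json")
  end
end
```

A caller would create one `KeepAliveReporter` per process and call `submit` on each reporting cycle, rather than building a new Net::HTTP object every minute.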

After setting the variable in one of our apps with this problem, while keeping New Relic in the app, we saw this result:

So far, so good!


(Samnang Chhun) #14

I’m trying to figure out where NEW_RELIC_AGGRESSIVE_KEEPALIVE is used by the newrelic_rpm gem. I searched https://github.com/newrelic/rpm/search?utf8=✓&q=NEW_RELIC_AGGRESSIVE_KEEPALIVE, but somehow I can’t find it.


(Samnang Chhun) #15

@geoffharcourt do you use gctools in combination with Unicorn Worker Killer?


(Geoff Harcourt) #16

@samnang, I dropped gctools because I didn’t find it was that effective for us (it may be more effective if you have 2x dynos or are working in a non-Heroku environment). Now that we’re using unicorn worker killer our RAM use is (mostly) staying below the Heroku R14 limit, so I’m holding out hope for Ruby 2.2 this December 25th before I spend much further time on memory issues.


(Dan Croak) #17

You can find the config option by searching for “aggressive_keepalive”. I believe the NEW_RELIC_AGGRESSIVE_KEEPALIVE variable may be specific to the Heroku add-on: the prefix is a tip-off for which add-on it belongs to, and the rest is the config option’s name as used by the service.

I believe an alternative to the ENV variable would be to set aggressive_keepalive: true in your config/newrelic.yml. I believe that will be the default in upcoming New Relic gem versions.


(Anthony Lee) #18

@croaky I see in the Upcase Gemfile that the Ruby version is 2.2.2 and the New Relic gem is at 3.8.1. Is this working well on Heroku, without memory leak issues like on 2.1? I was having memory issues with 2.2.2; I downgraded to Ruby 2.0 and memory got a lot better.