After controlling the experiment better, I was not able to isolate the issue to the 3.9.x series of the New Relic gem.
I talked with their support team, and the recommended solution is:
heroku config:set NEW_RELIC_AGGRESSIVE_KEEPALIVE=1 --remote production
It sounds like there may be a bad interaction between the New Relic Ruby agent and the generational garbage collector in Ruby 2.1. Each minute, the agent establishes a new SSL connection to New Relic servers in order to submit data. Each connection allocates only a handful of Ruby objects but triggers lots of native memory allocations through malloc in the OpenSSL library. These native allocations aren’t accounted for by Ruby’s GC triggering logic, so unless something else triggers a GC, the memory can hang around too long.
Somewhat paradoxically, this issue affects idling applications (or apps with infrequent requests) more than applications under steady load, because GC runs are triggered less frequently in an idling application.
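To see the idle-versus-busy difference for yourself, a rough sketch like the following counts GC runs around a block of work. The helper name `gc_runs_during` and the workload sizes are made up for illustration; the exact counts will vary by Ruby version and GC tuning.

```ruby
# Hypothetical helper (not part of any library): returns how many GC runs
# happened while the given block was executing.
def gc_runs_during
  before = GC.count
  yield
  GC.count - before
end

# An idle stretch allocates almost nothing, so it rarely triggers a GC.
idle_runs = gc_runs_during { sleep 0.1 }

# A stretch that allocates many Ruby objects typically triggers several.
busy_runs = gc_runs_during { 200_000.times.map { |i| "request-#{i}" } }

puts "GC runs while idle: #{idle_runs}"
puts "GC runs while busy: #{busy_runs}"
```

Memory allocated natively by OpenSSL does not show up in these counters at all, which is why an idle process can keep growing without Ruby ever deciding to collect.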
The environment variable above is a configuration setting that causes the agent to re-use a single SSL connection to New Relic servers. It might become the default in upcoming New Relic Ruby gem releases.
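For intuition, the two connection strategies can be sketched with Ruby's standard `Net::HTTP`. This is not the agent's actual code: the host `collector.example.com`, the `/submit` path, and the `KeepAliveSubmitter` class are all assumptions invented for the example.

```ruby
require "net/http"

# Hypothetical endpoint, for illustration only (not the real collector).
EXAMPLE_HOST = "collector.example.com"

# Strategy 1: open a fresh SSL connection for every submission.
# Each call pays for a new handshake and the native allocations behind it.
def submit_with_new_connection(payload)
  Net::HTTP.start(EXAMPLE_HOST, 443, use_ssl: true) do |http|
    http.post("/submit", payload)
  end # the connection is closed when the block returns
end

# Strategy 2: keep one connection open and re-use it across submissions.
class KeepAliveSubmitter
  attr_reader :http

  def initialize(host)
    @http = Net::HTTP.new(host, 443) # does not connect yet
    @http.use_ssl = true
    # Net::HTTP reconnects if the socket has been idle longer than this
    # (default: 2 seconds), so it must exceed the one-minute interval.
    @http.keep_alive_timeout = 120
  end

  def submit(payload)
    @http.start unless @http.started?
    @http.post("/submit", payload)
  end
end
```

With the second strategy the SSL handshake happens once rather than every minute, as long as the server also keeps the connection open.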
After setting the variable in one of our apps with this problem, while keeping New Relic in the app, we saw this result:
So far, so good!