It’s not 100% clear what you’re trying to do. Are you trying to allow some HTML but not other HTML? If so, have you considered supporting a different, controlled markup language that allows only what you want?
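To illustrate the controlled-markup idea, here is a minimal sketch using only the Ruby standard library. The markup rules (`**bold**` and `_italic_`) and the `render_limited_markup` helper are hypothetical, chosen just for this example. The key point is the order of operations: everything is HTML-escaped first, and only then are the few allowed constructs turned into tags, so raw HTML from the user never reaches the page as markup.

```ruby
require 'erb'

# Hypothetical mini-markup: only **bold** and _italic_ are supported.
# Step 1 escapes ALL HTML, so user-supplied tags become inert text.
# Step 2 converts the allowed markup into a fixed set of safe tags.
def render_limited_markup(text)
  escaped = ERB::Util.html_escape(text)
  escaped
    .gsub(/\*\*(.+?)\*\*/) { "<strong>#{Regexp.last_match(1)}</strong>" }
    .gsub(/_(.+?)_/) { "<em>#{Regexp.last_match(1)}</em>" }
end

puts render_limited_markup('**hi** <script>alert(1)</script> _there_')
# prints: <strong>hi</strong> &lt;script&gt;alert(1)&lt;/script&gt; <em>there</em>
```

This naive regex approach will, for example, italicize the text between underscores in snake_case identifiers, so treat it as an illustration of the escape-first principle rather than a real parser.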
Also, while I understand the desire to clean the data on the way into the database, is it feasible to simply treat the data as untrusted at the point where it’s actually used instead? Rails already does this for you in the view layer by default. This might not be something you want to push onto every consumer of the data your API collects, though.
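As a rough sketch of the "escape on use" approach outside Rails, the example below stores the input untouched and escapes it only when rendering, using Ruby's standard `ERB` and `ERB::Util.html_escape`. In a Rails view, `<%= %>` performs this escaping automatically; the explicit call here just makes the step visible. The `stored_comment` value and the template are made up for illustration.

```ruby
require 'erb'

# The stored value is kept exactly as the API received it.
stored_comment = %q{Nice post! <img src=x onerror="alert(1)">}

# Escape only at output time, similar to what Rails' view layer does for <%= %>.
template = ERB.new('<p><%= ERB::Util.html_escape(comment) %></p>')
html = template.result_with_hash(comment: stored_comment)

puts html
# prints: <p>Nice post! &lt;img src=x onerror=&quot;alert(1)&quot;&gt;</p>
```

The trade-off is the one mentioned above: every consumer of the data (other apps, API clients) has to remember to do this step themselves.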
It sounds like sanitize should do what you want. By default it removes all HTML elements, though the README pretty clearly states it can be configured with a set of approved tags (and attributes).
On past, non-Ruby projects, I’ve used AntiSamy, which is written by the Open Web Application Security Project (OWASP). It sounds similar to sanitize. There appears to be a fork of that project for Ruby (not maintained by OWASP). I’ve never used it myself, but you could check that out as well.