Proper abstraction for bulk updates from an external API

Let me know if this is too specific.

I have an AbstractConnection class which handles updating my ActiveRecord models with data retrieved from external APIs. I then have a class inheriting from AbstractConnection which handles all of the specifics of the API I’m using (as there will be others in the future) and returns the attributes in arrays/hashes.

One operation involves loading 10,000+ records. The API limits the number of records per request and provides a “next” URL that offsets the query, so you can load all the records in a series of requests. I’d prefer not to load an unknowably huge number of records into memory in my API-specific method, so I’d like to update each batch of records as soon as I get it. However, I worry about exposing too much of the API-specific implementation to the AbstractConnection.
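One way to keep memory bounded without leaking pagination details is to have the API-specific class expose batches through a block (or an Enumerator): the caller only sees “here is one batch at a time,” while the “next”-URL bookkeeping stays private. A minimal pure-Ruby sketch of that shape, with a hypothetical `FakeApiConnection` and a `fetch_page` stub standing in for real HTTP requests:

```ruby
# Sketch only: FakeApiConnection, PAGES, and fetch_page are made up
# to simulate a paginated API; a real implementation would follow the
# API's "next" URL instead of indexing into an array.
class FakeApiConnection
  PAGES = [
    { records: [1, 2, 3], next: 1 },
    { records: [4, 5],    next: 2 },
    { records: [6],       next: nil }
  ].freeze

  # Yields one batch of records at a time; only one page of data is
  # ever held in memory. Returns an Enumerator when no block is given.
  def each_batch
    return enum_for(:each_batch) unless block_given?

    page = 0
    loop do
      response = fetch_page(page)
      yield response[:records]
      page = response[:next]
      break if page.nil?
    end
  end

  private

  # Stands in for an HTTP request to the current or "next" URL.
  def fetch_page(page)
    PAGES[page]
  end
end

# The caller (e.g. your AbstractConnection) can update models per batch
# without knowing anything about offsets or "next" URLs:
#   connection.each_batch { |batch| update_models(batch) }
```

Because `each_batch` also returns an Enumerator, the consuming code can use `lazy`, `each_slice`, etc. on top of it without the API class changing.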

Anyone handled something like this before? Thanks!

Have you looked at find_in_batches yet? Not sure how the API gives you results, but you might be able to leverage it. There’s also a “find_each” method that lets you specify a start param. These might be worth investigating.
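For reference, find_in_batches applies to rows you read back out of your own database, but the core idea (process a big collection in fixed-size groups) is the same as Ruby’s `each_slice`. A tiny illustration, with the ids and batch size made up:

```ruby
# Core-Ruby analog of find_in_batches: walk a large collection in
# fixed-size groups instead of materializing it all at once.
ids = (1..10).to_a
batches = ids.each_slice(4).to_a
# batches is [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```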

I’d suggest a slightly different overall approach: write what makes the most sense for this connection, and generalize later.

It’s hard to generalize based on assumptions. It’s very easy to guess wrong in a way that creates way more work for you.

I’d write a concrete class that handles the fetching the best way you know how. Later, when you have to connect to another API, consider whether the new API is similar enough to warrant code reuse. By then you’ll have a better sense of what’s the same and what’s different, and the abstraction will probably be easier to create.
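Once a second API does show up, the extraction often falls out naturally: a small base class that only knows “there are batches,” with each subclass owning its own pagination. A hypothetical sketch of that eventual split (every name here is made up, and `persist` stands in for whatever `upsert_all`-style write you use):

```ruby
# Possible shape after generalizing: the base class drives the sync and
# persists each batch; subclasses supply the API-specific iteration.
class BaseSync
  # Template method: consume batches, persist each one immediately,
  # return how many records were written.
  def sync
    count = 0
    each_batch { |batch| count += persist(batch) }
    count
  end

  private

  # Stands in for Model.upsert_all(batch) or similar.
  def persist(batch)
    batch.size
  end
end

class OffsetApiSync < BaseSync
  # Simulated pages; a real subclass would issue HTTP requests and
  # follow the API's "next" URL.
  PAGES = [
    { records: %w[a b], next: 1 },
    { records: %w[c],   next: nil }
  ].freeze

  private

  # The "next"-offset bookkeeping never leaves the subclass.
  def each_batch
    url = 0
    until url.nil?
      page = PAGES[url]
      yield page[:records]
      url = page[:next]
    end
  end
end
```

The point of writing the concrete class first is that you discover where this seam actually is, instead of guessing at it up front.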