TLDR;
In the controller:
@customers = Customer.all.includes(:invoices).where(invoices: { status: "open" })
In the view:
<% @customers.each do |customer| %>
<%= customer.name %>
<% customer.invoices.each do |invoice| %>
<%= invoice.details %>
<% end %>
<% end %>
However, this can lead to potential problems (memory load) so read below for an explanation.
N+1 Problem
It is possible to make Method #2 efficent.
Currently, every time this code runs over customer.invoices
you will be generating another sql query (N+1). Lets look at your example, but leave off the open
scope for now
@customers.each do |customer|
customer.invoices.each do |invoice|
invoice.details
end
end
Each loop would generate a query (not efficient) and your log would look like this:
Invoice Load (0.2ms) SELECT "invoices".* FROM "invoices" WHERE "invouces"."customer_id" = $1 [["customer_id", 1]]
Invoice Load (0.2ms) SELECT "invoices".* FROM "invoices" WHERE "invouces"."customer_id" = $1 [["customer_id", 2]]
Invoice Load (0.2ms) SELECT "invoices".* FROM "invoices" WHERE "invouces"."customer_id" = $1 [["customer_id", 3]]
...
...
Using Includes
A common solution would be to add an includes
in your controller on the customer
association.
@customers = Customer.all.includes(:invoices)
Now you just generate two small, tight sql querires.
Customer Load (1.1ms) SELECT "customers".* FROM "customers"
.
Invoice Load (0.4ms) SELECT "invoices".* FROM "invoices" WHERE "invoices"."customer_id" IN (1, 2, 3, 4, 5, 6)
Each iteration of the loop won’t need to go back to the database because it has all of the objects it needs.
Here’s where things get tricky - you have included another scope onto invoices.
Returning back to the example, lets modify that third line and add the open
scope back in. Instead of using the in memory objects, each iteration of the loop has to go back and hit the database to filter out only the “open” invoices.
Invoice Load (0.2ms) SELECT "invoices".* FROM "invoices" WHERE "invouces"."customer_id" = $1 AND "invoice"."status" = 'open' [["customer_id", 2]]
Invoice Load (0.2ms) SELECT "invoices".* FROM "invoices" WHERE "invouces"."customer_id" = $1 AND "invoice"."status" = 'open' [["customer_id", 3]]
Using Includes with Conditions
Now it is possible to filter out eager loaded data. We can use a where
to find only the open invoices.
@customers = Customer.all.includes(:invoices).where(invoices: { status: "open" })
Unfortunately, Rails cannot figure out how to optimize and break apart this query, so it will Left Outer Join everything.…
Here’s output from a different set of models
User.includes(:memberships).where(memberships: { status: "active" })
SQL (1.0ms) SELECT "users"."id" AS t0_r0, "users"."avatar_url" AS t0_r1, "users"."email" AS t0_r2, "users"."first_name" AS t0_r3, "users"."last_name" AS t0_r4, "users"."oauth_provider" AS t0_r5, "users"."oauth_access_token" AS t0_r6, "users"."oauth_remember_token" AS t0_r7, "users"."oauth_uid" AS t0_r8, "users"."password_digest" AS t0_r9, "users"."password_reset_token" AS t0_r10, "users"."remember_token" AS t0_r11, "users"."last_digest_sent_at" AS t0_r12, "users"."password_reset_token_expires_at" AS t0_r13, "users"."created_at" AS t0_r14, "users"."updated_at" AS t0_r15, "users"."users_invited" AS t0_r16, "users"."phone_number" AS t0_r17, "users"."company" AS t0_r18, "memberships"."id" AS t1_r0, "memberships"."project_id" AS t1_r1, "memberships"."user_id" AS t1_r2, "memberships"."status" AS t1_r3, "memberships"."is_admin" AS t1_r4, "memberships"."last_viewed_events_at" AS t1_r5, "memberships"."created_at" AS t1_r6, "memberships"."updated_at" AS t1_r7 FROM "users" LEFT OUTER JOIN "memberships" ON "memberships"."user_id" = "users"."id" WHERE "memberships"."status" = 'active'
Quite a beast. You can however, drop the open
scope in your view since you have only loaded the invoices which match that scope.
Potential Issues
The query above, while large, it is still much more performant than dealing with N+1 problems. However, once the tables start to get large or your are returning a large amount of rows, the query like the one above can be potentially troublesome. The database returns something like this:
customer_id | name | invoice_id | title
---------|---------|------------|----------------------------
1 | malcolm | 1 | Some invoice title
1 | malcolm | 2 | Another invoice
2 | wash | 3 | Late invoice title
3 | zoe | 4 | Some invoice title
3 | zoe | 5 | Another title
3 | zoe | 6 | Invoice Title
We get some duplication - a customer has_many invoices so every time we load a unique invoice we are going to load ALL of the customer data along with it. This kinda sucks, but nothing too terrible.
Double the has_manys
Now, issues arise when you try and eager load data from more than one has_many.
@customers = Customer.all.includes(:invoices, :receipts).where(invoices: { status: "open" })
This generates two left outer joins. Whats going to happen is that the database is going to return a row for every unique combination of customer, invoice and receipt. So if we have 25 customers, each customer has 10 invoices (250 total) and 10 receipts (250 total). The result set that is returned is not 525 records, but something closer to 2500 records. Now Rails has to instantiate all of that into ActiveRecord objects.
When is it safe to add where conditions with an includes?
Only when conditions are used on the main model
Customer.all.includes(:invoices, :receipts).where("users.name" => "patkoperwas")
This will generate just 3 small queries.
Customer Load (19.4ms) SELECT `customers`.* FROM `customers` WHERE `users`.`name` = 'patkoperwas'
Invoice Load (14.7ms) SELECT `invoices`.* FROM `invoices` WHERE `invoices`.`customer_id` IN (1) ORDER BY invoices.created_at desc
Receipt Load (5.0ms) SELECT `receipts `.* FROM `receipts` WHERE `receipts`.`customer_id` IN (1) ORDER BY receipts.id DESC