Summary -> Detail Views/Queries

Is there a general pattern that is recommended for building a “summary detail” type view? Say for example, you have an invoicing app with a reporting view that needs to list headings for every customer with a detail of every open invoice for that customer under the heading.

Here’s how I’ve attempted to deal with it in the past (using the customer/invoice example), none of which has been very satisfactory:

  1. Method #1: Create a join query which joins the customer details with the invoice. In the view loop over the collection using an instance variable to keep track of the customer being looped over. Have a conditional in the loop code that compares the current iteration’s customer with the last interation’s customer. If they are different, add in a heading for the new customer.

  2. Method #2: Query for customers with an open invoice, then in the view iterate over the customers and nested the customer iteration, do another iteration over that customer’s open invoices, e.g.

    <% @customers.each do |customer| %>
    <%= customer.name %>

    <% customer.invoices.open.each do |invoice| %>
    <%= invoice.details %>
    <% end %>
    <% end %>

So, yeah. Neither one of these feels great. Method #1 seems like a hack and Method #2 seems like it would generate too many queries though its the most readable in the code.

Please, show me another way? :smile:

TLDR;


In the controller:

@customers = Customer.all.includes(:invoices).where(invoices: { status: "open" })

In the view:

<% @customers.each do |customer| %>
  <%= customer.name %>
  <% customer.invoices.each do |invoice| %>
    <%= invoice.details %>
  <% end %>
<% end %>

However, this can lead to potential problems (memory load) so read below for an explanation.

N+1 Problem


It is possible to make Method #2 efficent.
Currently, every time this code runs over customer.invoices you will be generating another sql query (N+1). Lets look at your example, but leave off the open scope for now

@customers.each do |customer|
  customer.invoices.each do |invoice|
    invoice.details
  end
end

Each loop would generate a query (not efficient) and your log would look like this:

Invoice Load (0.2ms)  SELECT "invoices".* FROM "invoices"  WHERE "invouces"."customer_id" = $1  [["customer_id", 1]]
Invoice Load (0.2ms)  SELECT "invoices".* FROM "invoices"  WHERE "invouces"."customer_id" = $1  [["customer_id", 2]]
Invoice Load (0.2ms)  SELECT "invoices".* FROM "invoices"  WHERE "invouces"."customer_id" = $1  [["customer_id", 3]]
...
...

Using Includes


A common solution would be to add an includes in your controller on the customer association.

@customers = Customer.all.includes(:invoices)

Now you just generate two small, tight sql querires.

Customer Load (1.1ms)  SELECT "customers".* FROM "customers"
.
Invoice Load (0.4ms)  SELECT "invoices".* FROM "invoices"  WHERE "invoices"."customer_id" IN (1, 2, 3, 4, 5, 6)

Each iteration of the loop won’t need to go back to the database because it has all of the objects it needs.

Here’s where things get tricky - you have included another scope onto invoices.

Returning back to the example, lets modify that third line and add the open scope back in. Instead of using the in memory objects, each iteration of the loop has to go back and hit the database to filter out only the “open” invoices.

Invoice Load (0.2ms)  SELECT "invoices".* FROM "invoices"  WHERE "invouces"."customer_id" = $1 AND "invoice"."status" = 'open' [["customer_id", 2]]
Invoice Load (0.2ms)  SELECT "invoices".* FROM "invoices"  WHERE "invouces"."customer_id" = $1 AND "invoice"."status" = 'open'  [["customer_id", 3]]

Using Includes with Conditions


Now it is possible to filter out eager loaded data. We can use a where to find only the open invoices.

@customers = Customer.all.includes(:invoices).where(invoices: { status: "open" })

Unfortunately, Rails cannot figure out how to optimize and break apart this query, so it will Left Outer Join everything.
Here’s output from a different set of models

User.includes(:memberships).where(memberships: { status: "active" })
 SQL (1.0ms)  SELECT "users"."id" AS t0_r0, "users"."avatar_url" AS t0_r1, "users"."email" AS t0_r2, "users"."first_name" AS t0_r3, "users"."last_name" AS t0_r4, "users"."oauth_provider" AS t0_r5, "users"."oauth_access_token" AS t0_r6, "users"."oauth_remember_token" AS t0_r7, "users"."oauth_uid" AS t0_r8, "users"."password_digest" AS t0_r9, "users"."password_reset_token" AS t0_r10, "users"."remember_token" AS t0_r11, "users"."last_digest_sent_at" AS t0_r12, "users"."password_reset_token_expires_at" AS t0_r13, "users"."created_at" AS t0_r14, "users"."updated_at" AS t0_r15, "users"."users_invited" AS t0_r16, "users"."phone_number" AS t0_r17, "users"."company" AS t0_r18, "memberships"."id" AS t1_r0, "memberships"."project_id" AS t1_r1, "memberships"."user_id" AS t1_r2, "memberships"."status" AS t1_r3, "memberships"."is_admin" AS t1_r4, "memberships"."last_viewed_events_at" AS t1_r5, "memberships"."created_at" AS t1_r6, "memberships"."updated_at" AS t1_r7 FROM "users" LEFT OUTER JOIN "memberships" ON "memberships"."user_id" = "users"."id" WHERE "memberships"."status" = 'active'

Quite a beast. You can however, drop the open scope in your view since you have only loaded the invoices which match that scope.

Potential Issues


The query above, while large, it is still much more performant than dealing with N+1 problems. However, once the tables start to get large or your are returning a large amount of rows, the query like the one above can be potentially troublesome. The database returns something like this:

   customer_id |   name  | invoice_id | title
   ---------|---------|------------|----------------------------
    1       | malcolm | 1          | Some invoice title
    1       | malcolm | 2          | Another invoice
    2       | wash    | 3          | Late invoice title
    3       | zoe     | 4          | Some invoice title
    3       | zoe     | 5          | Another title
    3       | zoe     | 6          | Invoice Title

We get some duplication - a customer has_many invoices so every time we load a unique invoice we are going to load ALL of the customer data along with it. This kinda sucks, but nothing too terrible.

Double the has_manys

Now, issues arise when you try and eager load data from more than one has_many.

@customers = Customer.all.includes(:invoices, :receipts).where(invoices: { status: "open" })

This generates two left outer joins. Whats going to happen is that the database is going to return a row for every unique combination of customer, invoice and receipt. So if we have 25 customers, each customer has 10 invoices (250 total) and 10 receipts (250 total). The result set that is returned is not 525 records, but something closer to 2500 records. Now Rails has to instantiate all of that into ActiveRecord objects.

When is it safe to add where conditions with an includes?

Only when conditions are used on the main model

Customer.all.includes(:invoices, :receipts).where("users.name" => "patkoperwas")

This will generate just 3 small queries.

 Customer Load (19.4ms)  SELECT `customers`.* FROM `customers` WHERE `users`.`name` = 'patkoperwas'
  Invoice Load (14.7ms)  SELECT `invoices`.* FROM `invoices` WHERE `invoices`.`customer_id` IN (1) ORDER BY invoices.created_at desc
  Receipt Load (5.0ms)  SELECT `receipts `.* FROM `receipts` WHERE `receipts`.`customer_id` IN (1) ORDER BY receipts.id DESC
1 Like