Advanced ActiveRecord Querying - has_many Associations

This topic is for the Querying has_many Associations exercise in the Advanced ActiveRecord Querying trail. Post any questions, comments, or pointers you have to share with other Upcase subscribers.

Hi there,
These exercises are awesome!

I was hoping someone could help me understand the distinct method.

Person.joins(:employees), returns the 4 employees (not managers). Person.joins(:employees).distinct returns the two managers and none of the employees. When I watched the Upcase video, I thought distinct would filter through the 4 employees and return the unique ones. Since all the employees had unique names, I expected the 4 employees to be returned, not the two managers. Could you help me understand how distinct works?

Hi Ashley,

From what you have written here, it sounds like maybe you understand the distinct method and are really getting thrown off by the has_many :employees vs. belongs_to :manager associations. With Person.joins(:employees).distinct, the query is asking for all People who have employees (managers) and adding distinct on will limit each Person to only being returned once, so the 2 managers come up. If you ran Person.joins(:manager).distinct, the query is requesting all people with a manager, with each Person being returned only once, so the 4 employees who are not managers come up.

Does that make sense? In other words, I think it might be the words chosen for the association that are causing the confusion. I did a double take on it myself at first too. Please correct me if Iā€™m wrong.

Happy Coding!
Nathan

@aginz it may help to understand what is going on behind the scenes in the SQL and look at the data returned:

TL; DR
You correctly understand .distinct but are confused by the way joins behaves. In particular, the surprising results are due to INNER JOINing the people table onto itself. Back to Basics: Writing SQL Queries looks at some of the gotchas of INNER JOIN

State of the database

Firstly, it we inspect all the users, we find that there are 6 unique users in the database:

Person.all.to_sql
# => SELECT "people".* FROM "people"
+----+-------------------------+---------+-------------+------------+--------+
| id | name                    | role_id | location_id | manager_id | salary |
+----+-------------------------+---------+-------------+------------+--------+
| 1  | manager-one             | 1       | 1           |            | 0      |
| 2  | manager-two             | 2       | 2           |            | 0      |
| 3  | employee-of-manager-one | 3       | 3           | 1          | 0      |
| 4  | employee-of-manager-one | 4       | 4           | 1          | 0      |
| 5  | employee-of-manager-two | 5       | 5           | 2          | 0      |
| 6  | employee-of-manager-two | 6       | 6           | 2          | 0      |
+----+-------------------------+---------+-------------+------------+--------+

DISTINCT

The SQL DISTICT keyword removes duplicate rows from the results (similar to Rubyā€™s Array#uniq)

Person.distinct.to_sql
# => SELECT DISTINCT "people".* FROM "people"

Since there are no duplicate rows, we get the same result (although in a different order):

+----+-------------------------+---------+-------------+------------+--------+
| id | name                    | role_id | location_id | manager_id | salary |
+----+-------------------------+---------+-------------+------------+--------+
| 4  | employee-of-manager-one | 4       | 4           | 1          | 0      |
| 3  | employee-of-manager-one | 3       | 3           | 1          | 0      |
| 5  | employee-of-manager-two | 5       | 5           | 2          | 0      |
| 1  | manager-one             | 1       | 1           |            | 0      |
| 6  | employee-of-manager-two | 6       | 6           | 2          | 0      |
| 2  | manager-two             | 2       | 2           |            | 0      |
+----+-------------------------+---------+-------------+------------+--------+

Joining

Although Person.joins(:employees) returns 4 results, they arenā€™t the employees. Instead, itā€™s the two managers duplicated twice.

Person.join(:employees).to_sql
# => SELECT "people".*
#    FROM "people"
#    INNER JOIN "people" "employees_people"
#    ON "employees_people"."manager_id" = "people"."id"
+----+-------------+---------+-------------+------------+--------+
| id | name        | role_id | location_id | manager_id | salary |
+----+-------------+---------+-------------+------------+--------+
| 1  | manager-one | 1       | 1           |            | 0      |
| 1  | manager-one | 1       | 1           |            | 0      |
| 2  | manager-two | 2       | 2           |            | 0      |
| 2  | manager-two | 2       | 2           |            | 0      |
+----+-------------+---------+-------------+------------+--------+

Calling .distinct on these results removes the duplicates and returns the two managers

Person.joins(:employees).distinct.to_sql
# => SELECT DISTINCT "people".*
#    FROM "people"
#    INNER JOIN "people" "employees_people"
#    ON "employees_people"."manager_id" = "people"."id"
+----+-------------+---------+-------------+------------+--------+
| id | name        | role_id | location_id | manager_id | salary |
+----+-------------+---------+-------------+------------+--------+
| 1  | manager-one | 1       | 1           |            | 0      |
| 2  | manager-two | 2       | 2           |            | 0      |
+----+-------------+---------+-------------+------------+--------+

Duplication?!! Managers?!!

So what happened there? Why did managers show up? Why were they duplicated? Both are effects of the way INNER JOIN works.

In Person.joins(:employees), we try to match all the people to their employees. Employee users donā€™t match anything so they donā€™t return anything (this is because joins uses an INNER join. Other join types keep results that didnā€™t match). A row is returned for each match. Since each manager has two employees, they get two rows each.

This is counter-intuitive at first but makes sense once youā€™ve played with INNER JOIN a bit. This example is particularly confusing because it is joining on itself so we donā€™t see the matched employees we joined on. Letā€™s add them to the result by adding some fields to the SELECT statement:

Person.
  joins(:employees).
  select(
    "people.*,
    employees_people.name AS employee_name,
    employees_people.id AS employee_id"
  ).
  to_sql
# => SELECT people.*,
#      employees_people.name AS employee_name,
#      employees_people.id AS employee_id
#    FROM "people"
#    INNER JOIN "people" "employees_people"
#    ON "employees_people"."manager_id" = "people"."id"
+----+-------------+---------+-------------+------------+--------+-------------------------+-------------+
| id | name        | role_id | location_id | manager_id | salary | employee_name           | employee_id |
+----+-------------+---------+-------------+------------+--------+-------------------------+-------------+
| 1  | manager-one | 1       | 1           |            | 0      | employee-of-manager-one | 3           |
| 1  | manager-one | 1       | 1           |            | 0      | employee-of-manager-one | 4           |
| 2  | manager-two | 2       | 2           |            | 0      | employee-of-manager-two | 5           |
| 2  | manager-two | 2       | 2           |            | 0      | employee-of-manager-two | 6           |
+----+-------------+---------+-------------+------------+--------+-------------------------+-------------+

It is now more obvious why users that matched employees were just managers and why they were ā€œduplicatedā€.

Revering the query

By now, youā€™ve probably guessed that you want to reverse the clauses in the join.

Results (with matched columns added):

Person.
  joins(:manager).
  select(
    "people.*,
    managers_people.name AS manager_name,
    managers_people.id AS manager_id"
  ).
  to_sql
#=> SELECT people.*,
#      managers_people.name AS manager_name,
#      managers_people.id AS manager_id
#    FROM "people"
#    INNER JOIN "people" "managers_people"
#    ON "managers_people"."id" = "people"."manager_id"
+----+-------------------------+---------+-------------+------------+--------+--------------+
| id | name                    | role_id | location_id | manager_id | salary | manager_name |
+----+-------------------------+---------+-------------+------------+--------+--------------+
| 3  | employee-of-manager-one | 3       | 3           | 1          | 0      | manager-one  |
| 4  | employee-of-manager-one | 4       | 4           | 1          | 0      | manager-one  |
| 5  | employee-of-manager-two | 5       | 5           | 2          | 0      | manager-two  |
| 6  | employee-of-manager-two | 6       | 6           | 2          | 0      | manager-two  |
+----+-------------------------+---------+-------------+------------+--------+--------------+

For more info on INNER JOIN, check out this article on the thoughtbot blog Back to Basics: Writing SQL Queries

3 Likes

Hi Nathan,
Thanks a bunch. This makes much more sense. JOINing a table onto itself caused some confusion. Check out Joelā€™s reply. It was also very helpful.

Ashley

Hello, all. I have a question about the first spec in this exercise: order_by_location_name

My solution is the same as the featured solution:

def self.order_by_location_name
  joins(:location).merge(Location.order(:name))
end

However, I found another solution that seems to work when I was reviewing someone elseā€™s code.

joins(:location).order('locations.name')

Is there a difference between these two approaches? The test passed with each one.

Thank you.

EDIT - Pasted the wrong LOC for my solution.

I know maybe this is not the right place to ask, but is there a way to view log (especially database query log) in the exercise repo.

It is helpfull to view whatā€™s going on at SQL level. e.g. Iā€™d like to view the data query details of this code:

class Person < ActiveRecord::Base
  belongs_to :location
  belongs_to :role
  belongs_to :manager, class_name: "Person", foreign_key: :manager_id
  has_many :employees, class_name: "Person", foreign_key: :manager_id

  ...

  def self.with_employees
   # What's the difference between joins(:employees) and joins(:manager)
    joins(:employees).distinct
  end
   
   ...
end

I know rails output rspec log info into log/test.log, but this git repo doenā€™t have rails, just some minimal setup, I did some search at Google, but canā€™t find a way to view the detailed log. So can anyone be kind to tell me how to?

I solved this, but I think in the wrong way and the tests just didnā€™t tell catch it. Here is my solution, and Iā€™m guessing my order_by_location_name method isnā€™t implemented correctly, but it lets me use the method chain I do in with_employees_order_by_location_name.

Can someone tell me why this would be bad?

  def self.order_by_location_name
    joins(:location).order(:name)
  end

  def self.with_employees
    joins(:employees).distinct
  end

  def self.with_employees_order_by_location_name
    with_employees.order_by_location_name
  end
1 Like

I did the same thing - the recommended solution is different, and Iā€™m not sure what the difference is. It would be nice if the tests were written to catch the differences and/or if it was clear what the difference was

1 Like

Since the first test is plopping the location name into a personā€™s name, then you can just order by the personā€™s name to get the desired affect.

def self.order_by_location_name
  order(:name)
end

The test seems to be indirectly grouping people by location through person.name. I think either assigning different names to the people objects or changing the expectation clause would rule out the order(:name) solution. I would opt for the latter so that the expectation aligns with the ā€œitā€ description.

Iā€™ve put in a pull request to change the testā€™s expectation clause.