← Back to Upcase

Advanced ActiveRecord Querying - has_many Associations


(Upcase ) #1

This topic is for the Querying has_many Associations exercise in the Advanced ActiveRecord Querying trail. Post any questions, comments, or pointers you have to share with other Upcase subscribers.


(Ashley Ginsberg) #5

Hi there,
These exercises are awesome!

I was hoping someone could help me understand the distinct method.

Person.joins(:employees), returns the 4 employees (not managers). Person.joins(:employees).distinct returns the two managers and none of the employees. When I watched the Upcase video, I thought distinct would filter through the 4 employees and return the unique ones. Since all the employees had unique names, I expected the 4 employees to be returned, not the two managers. Could you help me understand how distinct works?


(Nathan Elliott) #6

Hi Ashley,

From what you have written here, it sounds like maybe you understand the distinct method and are really getting thrown off by the has_many :employees vs. belongs_to :manager associations. With Person.joins(:employees).distinct, the query is asking for all People who have employees (managers) and adding distinct on will limit each Person to only being returned once, so the 2 managers come up. If you ran Person.joins(:manager).distinct, the query is requesting all people with a manager, with each Person being returned only once, so the 4 employees who are not managers come up.

Does that make sense? In other words, I think it might be the words chosen for the association that are causing the confusion. I did a double take on it myself at first too. Please correct me if I’m wrong.

Happy Coding!
Nathan


(Joel Quenneville) #7

@aginz it may help to understand what is going on behind the scenes in the SQL and look at the data returned:

TL; DR
You correctly understand .distinct but are confused by the way joins behaves. In particular, the surprising results are due to INNER JOINing the people table onto itself. https://robots.thoughtbot.com/back-to-basics-sql#inner-join looks at some of the gotchas of INNER JOIN

State of the database

Firstly, it we inspect all the users, we find that there are 6 unique users in the database:

Person.all.to_sql
# => SELECT "people".* FROM "people"
+----+-------------------------+---------+-------------+------------+--------+
| id | name                    | role_id | location_id | manager_id | salary |
+----+-------------------------+---------+-------------+------------+--------+
| 1  | manager-one             | 1       | 1           |            | 0      |
| 2  | manager-two             | 2       | 2           |            | 0      |
| 3  | employee-of-manager-one | 3       | 3           | 1          | 0      |
| 4  | employee-of-manager-one | 4       | 4           | 1          | 0      |
| 5  | employee-of-manager-two | 5       | 5           | 2          | 0      |
| 6  | employee-of-manager-two | 6       | 6           | 2          | 0      |
+----+-------------------------+---------+-------------+------------+--------+

DISTINCT

The SQL DISTICT keyword removes duplicate rows from the results (similar to Ruby’s Array#uniq)

Person.distinct.to_sql
# => SELECT DISTINCT "people".* FROM "people"

Since there are no duplicate rows, we get the same result (although in a different order):

+----+-------------------------+---------+-------------+------------+--------+
| id | name                    | role_id | location_id | manager_id | salary |
+----+-------------------------+---------+-------------+------------+--------+
| 4  | employee-of-manager-one | 4       | 4           | 1          | 0      |
| 3  | employee-of-manager-one | 3       | 3           | 1          | 0      |
| 5  | employee-of-manager-two | 5       | 5           | 2          | 0      |
| 1  | manager-one             | 1       | 1           |            | 0      |
| 6  | employee-of-manager-two | 6       | 6           | 2          | 0      |
| 2  | manager-two             | 2       | 2           |            | 0      |
+----+-------------------------+---------+-------------+------------+--------+

Joining

Although Person.joins(:employees) returns 4 results, they aren’t the employees. Instead, it’s the two managers duplicated twice.

Person.join(:employees).to_sql
# => SELECT "people".*
#    FROM "people"
#    INNER JOIN "people" "employees_people"
#    ON "employees_people"."manager_id" = "people"."id"
+----+-------------+---------+-------------+------------+--------+
| id | name        | role_id | location_id | manager_id | salary |
+----+-------------+---------+-------------+------------+--------+
| 1  | manager-one | 1       | 1           |            | 0      |
| 1  | manager-one | 1       | 1           |            | 0      |
| 2  | manager-two | 2       | 2           |            | 0      |
| 2  | manager-two | 2       | 2           |            | 0      |
+----+-------------+---------+-------------+------------+--------+

Calling .distinct on these results removes the duplicates and returns the two managers

Person.joins(:employees).distinct.to_sql
# => SELECT DISTINCT "people".*
#    FROM "people"
#    INNER JOIN "people" "employees_people"
#    ON "employees_people"."manager_id" = "people"."id"
+----+-------------+---------+-------------+------------+--------+
| id | name        | role_id | location_id | manager_id | salary |
+----+-------------+---------+-------------+------------+--------+
| 1  | manager-one | 1       | 1           |            | 0      |
| 2  | manager-two | 2       | 2           |            | 0      |
+----+-------------+---------+-------------+------------+--------+

Duplication?!! Managers?!!

So what happened there? Why did managers show up? Why were they duplicated? Both are effects of the way INNER JOIN works.

In Person.joins(:employees), we try to match all the people to their employees. Employee users don’t match anything so they don’t return anything (this is because joins uses an INNER join. Other join types keep results that didn’t match). A row is returned for each match. Since each manager has two employees, they get two rows each.

This is counter-intuitive at first but makes sense once you’ve played with INNER JOIN a bit. This example is particularly confusing because it is joining on itself so we don’t see the matched employees we joined on. Let’s add them to the result by adding some fields to the SELECT statement:

Person.
  joins(:employees).
  select(
    "people.*,
    employees_people.name AS employee_name,
    employees_people.id AS employee_id"
  ).
  to_sql
# => SELECT people.*,
#      employees_people.name AS employee_name,
#      employees_people.id AS employee_id
#    FROM "people"
#    INNER JOIN "people" "employees_people"
#    ON "employees_people"."manager_id" = "people"."id"
+----+-------------+---------+-------------+------------+--------+-------------------------+-------------+
| id | name        | role_id | location_id | manager_id | salary | employee_name           | employee_id |
+----+-------------+---------+-------------+------------+--------+-------------------------+-------------+
| 1  | manager-one | 1       | 1           |            | 0      | employee-of-manager-one | 3           |
| 1  | manager-one | 1       | 1           |            | 0      | employee-of-manager-one | 4           |
| 2  | manager-two | 2       | 2           |            | 0      | employee-of-manager-two | 5           |
| 2  | manager-two | 2       | 2           |            | 0      | employee-of-manager-two | 6           |
+----+-------------+---------+-------------+------------+--------+-------------------------+-------------+

It is now more obvious why users that matched employees were just managers and why they were “duplicated”.

Revering the query

By now, you’ve probably guessed that you want to reverse the clauses in the join.

Results (with matched columns added):

Person.
  joins(:manager).
  select(
    "people.*,
    managers_people.name AS manager_name,
    managers_people.id AS manager_id"
  ).
  to_sql
#=> SELECT people.*,
#      managers_people.name AS manager_name,
#      managers_people.id AS manager_id
#    FROM "people"
#    INNER JOIN "people" "managers_people"
#    ON "managers_people"."id" = "people"."manager_id"
+----+-------------------------+---------+-------------+------------+--------+--------------+
| id | name                    | role_id | location_id | manager_id | salary | manager_name |
+----+-------------------------+---------+-------------+------------+--------+--------------+
| 3  | employee-of-manager-one | 3       | 3           | 1          | 0      | manager-one  |
| 4  | employee-of-manager-one | 4       | 4           | 1          | 0      | manager-one  |
| 5  | employee-of-manager-two | 5       | 5           | 2          | 0      | manager-two  |
| 6  | employee-of-manager-two | 6       | 6           | 2          | 0      | manager-two  |
+----+-------------------------+---------+-------------+------------+--------+--------------+

For more info on INNER JOIN, check out this article on the thoughtbot blog https://robots.thoughtbot.com/back-to-basics-sql#inner-join


(Ashley Ginsberg) #8

Hi Nathan,
Thanks a bunch. This makes much more sense. JOINing a table onto itself caused some confusion. Check out Joel’s reply. It was also very helpful.

Ashley


(Hunter Stevens) #9

Hello, all. I have a question about the first spec in this exercise: order_by_location_name

My solution is the same as the featured solution:

def self.order_by_location_name
  joins(:location).merge(Location.order(:name))
end

However, I found another solution that seems to work when I was reviewing someone else’s code.

joins(:location).order('locations.name')

Is there a difference between these two approaches? The test passed with each one.

Thank you.

EDIT - Pasted the wrong LOC for my solution.


(Song Hui) #10

I know maybe this is not the right place to ask, but is there a way to view log (especially database query log) in the exercise repo.

It is helpfull to view what’s going on at SQL level. e.g. I’d like to view the data query details of this code:

class Person < ActiveRecord::Base
  belongs_to :location
  belongs_to :role
  belongs_to :manager, class_name: "Person", foreign_key: :manager_id
  has_many :employees, class_name: "Person", foreign_key: :manager_id

  ...

  def self.with_employees
   # What's the difference between joins(:employees) and joins(:manager)
    joins(:employees).distinct
  end
   
   ...
end

I know rails output rspec log info into log/test.log, but this git repo doen’t have rails, just some minimal setup, I did some search at Google, but can’t find a way to view the detailed log. So can anyone be kind to tell me how to?


(Dave Maurer) #11

I solved this, but I think in the wrong way and the tests just didn’t tell catch it. Here is my solution, and I’m guessing my order_by_location_name method isn’t implemented correctly, but it lets me use the method chain I do in with_employees_order_by_location_name.

Can someone tell me why this would be bad?

  def self.order_by_location_name
    joins(:location).order(:name)
  end

  def self.with_employees
    joins(:employees).distinct
  end

  def self.with_employees_order_by_location_name
    with_employees.order_by_location_name
  end