This topic is for the Querying has_many Associations exercise in the Advanced ActiveRecord Querying trail. Post any questions, comments, or pointers you have to share with other Upcase subscribers.
Hi there,
These exercises are awesome!
I was hoping someone could help me understand the distinct
method.
Person.joins(:employees)
, returns the 4 employees (not managers). Person.joins(:employees).distinct
returns the two managers and none of the employees. When I watched the Upcase video, I thought distinct would filter through the 4 employees and return the unique ones. Since all the employees had unique names, I expected the 4 employees to be returned, not the two managers. Could you help me understand how distinct works?
Hi Ashley,
From what you have written here, it sounds like maybe you understand the distinct method and are really getting thrown off by the has_many :employees vs. belongs_to :manager associations. With Person.joins(:employees).distinct, the query is asking for all People who have employees (managers) and adding distinct on will limit each Person to only being returned once, so the 2 managers come up. If you ran Person.joins(:manager).distinct, the query is requesting all people with a manager, with each Person being returned only once, so the 4 employees who are not managers come up.
Does that make sense? In other words, I think it might be the words chosen for the association that are causing the confusion. I did a double take on it myself at first too. Please correct me if Iām wrong.
Happy Coding!
Nathan
@aginz it may help to understand what is going on behind the scenes in the SQL and look at the data returned:
TL; DR
You correctly understand .distinct
but are confused by the way joins
behaves. In particular, the surprising results are due to INNER JOIN
ing the people table onto itself. Back to Basics: Writing SQL Queries looks at some of the gotchas of INNER JOIN
State of the database
Firstly, it we inspect all the users, we find that there are 6 unique users in the database:
Person.all.to_sql
# => SELECT "people".* FROM "people"
+----+-------------------------+---------+-------------+------------+--------+
| id | name | role_id | location_id | manager_id | salary |
+----+-------------------------+---------+-------------+------------+--------+
| 1 | manager-one | 1 | 1 | | 0 |
| 2 | manager-two | 2 | 2 | | 0 |
| 3 | employee-of-manager-one | 3 | 3 | 1 | 0 |
| 4 | employee-of-manager-one | 4 | 4 | 1 | 0 |
| 5 | employee-of-manager-two | 5 | 5 | 2 | 0 |
| 6 | employee-of-manager-two | 6 | 6 | 2 | 0 |
+----+-------------------------+---------+-------------+------------+--------+
DISTINCT
The SQL DISTICT
keyword removes duplicate rows from the results (similar to Rubyās Array#uniq
)
Person.distinct.to_sql
# => SELECT DISTINCT "people".* FROM "people"
Since there are no duplicate rows, we get the same result (although in a different order):
+----+-------------------------+---------+-------------+------------+--------+
| id | name | role_id | location_id | manager_id | salary |
+----+-------------------------+---------+-------------+------------+--------+
| 4 | employee-of-manager-one | 4 | 4 | 1 | 0 |
| 3 | employee-of-manager-one | 3 | 3 | 1 | 0 |
| 5 | employee-of-manager-two | 5 | 5 | 2 | 0 |
| 1 | manager-one | 1 | 1 | | 0 |
| 6 | employee-of-manager-two | 6 | 6 | 2 | 0 |
| 2 | manager-two | 2 | 2 | | 0 |
+----+-------------------------+---------+-------------+------------+--------+
Joining
Although Person.joins(:employees)
returns 4 results, they arenāt the employees. Instead, itās the two managers duplicated twice.
Person.join(:employees).to_sql
# => SELECT "people".*
# FROM "people"
# INNER JOIN "people" "employees_people"
# ON "employees_people"."manager_id" = "people"."id"
+----+-------------+---------+-------------+------------+--------+
| id | name | role_id | location_id | manager_id | salary |
+----+-------------+---------+-------------+------------+--------+
| 1 | manager-one | 1 | 1 | | 0 |
| 1 | manager-one | 1 | 1 | | 0 |
| 2 | manager-two | 2 | 2 | | 0 |
| 2 | manager-two | 2 | 2 | | 0 |
+----+-------------+---------+-------------+------------+--------+
Calling .distinct
on these results removes the duplicates and returns the two managers
Person.joins(:employees).distinct.to_sql
# => SELECT DISTINCT "people".*
# FROM "people"
# INNER JOIN "people" "employees_people"
# ON "employees_people"."manager_id" = "people"."id"
+----+-------------+---------+-------------+------------+--------+
| id | name | role_id | location_id | manager_id | salary |
+----+-------------+---------+-------------+------------+--------+
| 1 | manager-one | 1 | 1 | | 0 |
| 2 | manager-two | 2 | 2 | | 0 |
+----+-------------+---------+-------------+------------+--------+
Duplication?!! Managers?!!
So what happened there? Why did managers show up? Why were they duplicated? Both are effects of the way INNER JOIN
works.
In Person.joins(:employees)
, we try to match all the people to their employees. Employee users donāt match anything so they donāt return anything (this is because joins
uses an INNER
join. Other join types keep results that didnāt match). A row is returned for each match. Since each manager has two employees, they get two rows each.
This is counter-intuitive at first but makes sense once youāve played with INNER JOIN
a bit. This example is particularly confusing because it is joining on itself so we donāt see the matched employees we joined on. Letās add them to the result by adding some fields to the SELECT
statement:
Person.
joins(:employees).
select(
"people.*,
employees_people.name AS employee_name,
employees_people.id AS employee_id"
).
to_sql
# => SELECT people.*,
# employees_people.name AS employee_name,
# employees_people.id AS employee_id
# FROM "people"
# INNER JOIN "people" "employees_people"
# ON "employees_people"."manager_id" = "people"."id"
+----+-------------+---------+-------------+------------+--------+-------------------------+-------------+
| id | name | role_id | location_id | manager_id | salary | employee_name | employee_id |
+----+-------------+---------+-------------+------------+--------+-------------------------+-------------+
| 1 | manager-one | 1 | 1 | | 0 | employee-of-manager-one | 3 |
| 1 | manager-one | 1 | 1 | | 0 | employee-of-manager-one | 4 |
| 2 | manager-two | 2 | 2 | | 0 | employee-of-manager-two | 5 |
| 2 | manager-two | 2 | 2 | | 0 | employee-of-manager-two | 6 |
+----+-------------+---------+-------------+------------+--------+-------------------------+-------------+
It is now more obvious why users that matched employees were just managers and why they were āduplicatedā.
Revering the query
By now, youāve probably guessed that you want to reverse the clauses in the join.
Results (with matched columns added):
Person.
joins(:manager).
select(
"people.*,
managers_people.name AS manager_name,
managers_people.id AS manager_id"
).
to_sql
#=> SELECT people.*,
# managers_people.name AS manager_name,
# managers_people.id AS manager_id
# FROM "people"
# INNER JOIN "people" "managers_people"
# ON "managers_people"."id" = "people"."manager_id"
+----+-------------------------+---------+-------------+------------+--------+--------------+
| id | name | role_id | location_id | manager_id | salary | manager_name |
+----+-------------------------+---------+-------------+------------+--------+--------------+
| 3 | employee-of-manager-one | 3 | 3 | 1 | 0 | manager-one |
| 4 | employee-of-manager-one | 4 | 4 | 1 | 0 | manager-one |
| 5 | employee-of-manager-two | 5 | 5 | 2 | 0 | manager-two |
| 6 | employee-of-manager-two | 6 | 6 | 2 | 0 | manager-two |
+----+-------------------------+---------+-------------+------------+--------+--------------+
For more info on INNER JOIN
, check out this article on the thoughtbot blog Back to Basics: Writing SQL Queries
Hi Nathan,
Thanks a bunch. This makes much more sense. JOINing a table onto itself caused some confusion. Check out Joelās reply. It was also very helpful.
Ashley
Hello, all. I have a question about the first spec in this exercise: order_by_location_name
My solution is the same as the featured solution:
def self.order_by_location_name
joins(:location).merge(Location.order(:name))
end
However, I found another solution that seems to work when I was reviewing someone elseās code.
joins(:location).order('locations.name')
Is there a difference between these two approaches? The test passed with each one.
Thank you.
EDIT - Pasted the wrong LOC for my solution.
I know maybe this is not the right place to ask, but is there a way to view log (especially database query log) in the exercise repo.
It is helpfull to view whatās going on at SQL level. e.g. Iād like to view the data query details of this code:
class Person < ActiveRecord::Base
belongs_to :location
belongs_to :role
belongs_to :manager, class_name: "Person", foreign_key: :manager_id
has_many :employees, class_name: "Person", foreign_key: :manager_id
...
def self.with_employees
# What's the difference between joins(:employees) and joins(:manager)
joins(:employees).distinct
end
...
end
I know rails output rspec log info into log/test.log, but this git repo doenāt have rails, just some minimal setup, I did some search at Google, but canāt find a way to view the detailed log. So can anyone be kind to tell me how to?
I solved this, but I think in the wrong way and the tests just didnāt tell catch it. Here is my solution, and Iām guessing my order_by_location_name method isnāt implemented correctly, but it lets me use the method chain I do in with_employees_order_by_location_name.
Can someone tell me why this would be bad?
def self.order_by_location_name
joins(:location).order(:name)
end
def self.with_employees
joins(:employees).distinct
end
def self.with_employees_order_by_location_name
with_employees.order_by_location_name
end
I did the same thing - the recommended solution is different, and Iām not sure what the difference is. It would be nice if the tests were written to catch the differences and/or if it was clear what the difference was
Since the first test is plopping the location name into a personās name, then you can just order by the personās name to get the desired affect.
def self.order_by_location_name
order(:name)
end
The test seems to be indirectly grouping people by location through person.name
. I think either assigning different names to the people objects or changing the expectation clause would rule out the order(:name)
solution. I would opt for the latter so that the expectation aligns with the āitā description.
Iāve put in a pull request to change the testās expectation clause.