Left Join for Advanced Rubyists

by Wojciech Mikusek / Ruby On Rails Developer @ Exlabs
Wojciech Mikusek Tech Lead

You will learn: Left Join with filtering (Having, Count and Group By) in Ruby on Rails

Every now and then you will need to keep a track of user activities in the system. This can be approached in several possible ways – one possible solution is to store them as an activity log in the database. This alone can create some challenges and may require you to use less common SQL statements in the Ruby on Rails apps like LEFT JOIN, hence I’ve decided to shed some light on it.

Some time ago I was given the task of adding a filter on a blog admin panel page to display articles by a count of views (less, equal, more than a given value) in a given period of time. It looked like a standard every-day duty, but it turned out to be a good lesson on how LEFT JOIN really works.

First, let’s take a brief look at the database:


Pic. 1 – Simplified database schema

The article model has a polymorphic relation to the activity model (as activitable) which belongs to both user and activity type models. The activity type model has a unique “name” field holding values like started, viewed, finished and so on. The activity model has a timestamp field created_at, while the article model has some more fields with details which we want to display to the users as title and body.

The task looks quite simple but turned out to be a bit tricky. Typically, RoR developers use joins method to combine 2 tables together. However, there is a thing that needs to be remembered here – INNER JOIN selects only rows that have matching records in both tables. In our case, this would result in excluding articles that were not viewed at all. Luckily, there are other types of JOIN statements in SQL which we can use like LEFT JOIN. The difference between those JOINS shows the picture below:

INNER and LEFT JOIN comparison diagram

 

Pic. 2 – INNER and LEFT JOIN comparison diagram

ActiveRecord offers a handy left_joins method (alias of left_outer_joins) that will handle all the logic of linking articles to activities and a having method that will allow filtering activity counts. Moreover, as described in the documentation, I had to add a group method when using having. Params validations were skipped in the code gist.

Which produces this SQL query (with some filter parameters added):

Pic 3. Example of SQL result of left_joins method with auto-generated join logic

I was surprised when the above example did not work as expected. During tests, it turned out that when less than the selected value was chosen, articles with no activities disappeared, the same as if INNER JOIN was used instead of LEFT JOIN.

After some research (I recommend reading this article) I figured out what was really happening. If there is an article without any activity, as soon as you use “activities” table in you WHERE clause things go south. In a simplified explanation, PostgreSQL will attempt to “create” missing activity by filling its all attributes to NULL. This effectively will make the following SQL:

activities.activity_type_id = 133 

…to fail (activity_type_id for such ‘ghost’ record would be NULL and comparison with NULL in SQL is neither TRUE nor FALSE – see this for more details) and eliminate the article from a result set. 

Fortunately, there is an easy way out. I rewrote this code by moving all activities filtering inside the join statement. Sadly, there is no way to use the power of ActiveRecord here (like auto evaluating polymorphic relations), but still, the code looks readable enough.  

Pic 4. Example of SQL result of left_joins methods with custom join logic

To sum up, there are a few things you should remember from this case study:

  • LEFT JOIN may lead to some unexpected results when you want to filter joined table
  • Always write tests that check edge cases like no record in has many relations
  • Despite many handy methods, ActiveRecord allows you to write custom SQL queries when needed