Don't Spam Your Users: Batch Notifications in Rails

The wrong way to do email notifications

Meldium sends emails to our users for a few different interesting events. The most common event in our system, and the one that generates 90% of our emails, is when Alice shares an app with Bob:

[Screenshot: the email Bob receives when Alice shares an app with him]

We send one message for each of these sharing events by kicking off a Resque job in an after_commit hook. But this naïve implementation goes terribly wrong when a team is setting up Meldium for the first time: Alice might import a few dozen credentials from her spreadsheet of passwords and want to share many or all of them with Bob. The result is that Bob's inbox looks like this:

[Screenshot: Bob's inbox, flooded with one sharing email per shared app]

An actual customer sent us that screenshot (if you're reading this, we're sorry!). It was his first impression of Meldium, and it was a lousy one. If Alice's sharing happened within a single controller action or database transaction, we could add hooks to detect that several things were being shared and collapse them into a single message, and in some cases that works. In Meldium, however, it's more common for Alice to share a few credentials with Bob one at a time, over the course of about 10 to 15 minutes. So what we want is a batch notification system that can span multiple HTTP requests and multiple independent database transactions.

What we've built and deployed is a very simple workflow using resque, resque-scheduler, mailboxer, and Mailgun that lets our Rails app send asynchronous batch notifications to users based on a 'cool-down' period from the last event. I'll show you how it works here and provide some sample code so you can add a similar system to your Rails 3 app.

A simple batching design

Before I discuss the implementation, I want to talk briefly about a simple batching algorithm that needs very little state tracking. The idea is to wake up periodically, check whether any notification batches are ready to deliver, and then either deliver them or go back to sleep. At a high level, the algorithm looks like this:

every POLLING_PERIOD minutes:
  for each user:
    undelivered <- all the undelivered notifications for the user
    newest <- the maximum timestamp in undelivered

    if (now - newest) > COOLDOWN_PERIOD:
      deliver_as_batch undelivered
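To make the pseudocode concrete, here is a minimal plain-Ruby sketch of the basic algorithm. It is not Meldium's code: notifications are hypothetical [user_id, created_at] pairs with times in minutes, and batches_to_deliver is an illustrative helper that only selects the ready batches (marking them delivered is left to the caller).

```ruby
COOLDOWN_PERIOD = 4 # minutes

# Returns { user_id => [timestamps...] } for every user whose newest
# undelivered notification is more than `cooldown` minutes old at `now`.
def batches_to_deliver(undelivered, now, cooldown: COOLDOWN_PERIOD)
  undelivered
    .group_by { |user_id, _created_at| user_id }
    .transform_values { |rows| rows.map { |_user_id, created_at| created_at } }
    .select { |_user_id, timestamps| now - timestamps.max > cooldown }
end

undelivered = [
  [:bob, 0], [:bob, 2], [:bob, 3], # Bob's last share was at minute 3
  [:carol, 6]                      # Carol's only share was at minute 6
]

p batches_to_deliver(undelivered, 8)
```

At minute 8, Bob's newest notification is 5 minutes old, so his three notifications go out together; Carol's is only 2 minutes old, so she waits for a later poll.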

This algorithm effectively starts a timer of length COOLDOWN_PERIOD every time a notification is created, and resets that timer whenever another notification for the same user arrives in the meantime. Because the timer is only checked once per POLLING_PERIOD, a batch actually goes out at the first poll after the timer expires. There's one flaw in this algorithm that may or may not be critical for your application: old notifications can be starved and never sent if new notifications keep coming in, since the timer keeps getting reset. If your notifications are time-critical, you can use this modified algorithm to make sure they aren't starved for too long:

every POLLING_PERIOD minutes:
  for each user:
    undelivered <- all the undelivered notifications for the user
    oldest, newest <- the minimum and maximum timestamps in undelivered

    if (now - newest) > COOLDOWN_PERIOD:
      deliver_as_batch undelivered
    else if (now - oldest) > MAX_STALE_MESSAGE_PERIOD:
      deliver_as_batch undelivered
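The starvation problem and the fix are easy to see in a small simulation. This plain-Ruby sketch (again, not Meldium's code) assumes a hypothetical MAX_STALE_MESSAGE_PERIOD of 15 minutes and a user who receives a new share every 3 minutes for an hour, with a poll every minute; deliver_now? and simulate are illustrative helpers.

```ruby
COOLDOWN_PERIOD = 4           # minutes since the newest notification
MAX_STALE_MESSAGE_PERIOD = 15 # hypothetical value; tune for your app

# Decide whether one user's undelivered batch should go out at time `now`.
# Passing max_stale: nil disables the starvation guard.
def deliver_now?(timestamps, now, cooldown: COOLDOWN_PERIOD, max_stale: MAX_STALE_MESSAGE_PERIOD)
  return false if timestamps.empty?

  oldest, newest = timestamps.minmax
  return true if now - newest > cooldown               # cooled down: the normal case
  return true if max_stale && now - oldest > max_stale # starvation guard
  false
end

# A share arrives every 3 minutes and we poll every minute for an hour.
# Returns the minutes at which a batch is sent.
def simulate(max_stale:)
  pending = []
  sent_at = []
  (0..60).each do |minute|
    pending << minute if (minute % 3).zero? # a new share arrives
    if deliver_now?(pending, minute, max_stale: max_stale)
      sent_at << minute
      pending.clear
    end
  end
  sent_at
end

p simulate(max_stale: nil)                      # the cooldown never elapses
p simulate(max_stale: MAX_STALE_MESSAGE_PERIOD) # stale batches are forced out
```

Without the guard, the newest notification is never more than 2 minutes old at a poll, so nothing is ever sent; with it, the sketch sends batches at minutes 16, 34 and 52, which also shows the trade-off described next: several emails instead of one.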

The else if clause above ensures that messages won't remain in the batching state for too long, but it also means that users may receive multiple batches when notifications arrive frequently. The values of POLLING_PERIOD, COOLDOWN_PERIOD, and MAX_STALE_MESSAGE_PERIOD must all be tuned to your application. After looking at our historical sharing data, we set POLLING_PERIOD to 1 minute and COOLDOWN_PERIOD to 4 minutes, and we did not implement MAX_STALE_MESSAGE_PERIOD.
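With those values, it's worth working out when Bob's email actually arrives. The sketch below uses a hypothetical helper, delivery_delay, with times in minutes, and tries several phases of the polling clock relative to Alice's last share at minute 0.

```ruby
POLLING_PERIOD = 1.0  # minutes (our chosen value)
COOLDOWN_PERIOD = 4.0 # minutes (our chosen value)

# Polls happen at poll_offset, poll_offset + POLLING_PERIOD, ...
# Returns how long after the last share the batch is delivered.
def delivery_delay(last_share_at, poll_offset)
  poll = poll_offset
  poll += POLLING_PERIOD until poll - last_share_at > COOLDOWN_PERIOD
  poll - last_share_at
end

delays = [0.0, 0.25, 0.5, 0.75].map { |offset| delivery_delay(0.0, offset) }
p delays
```

Every delay falls between just over 4 minutes and 5 minutes. It also follows that any share arriving within 4 minutes of the previous one is guaranteed to join the same batch, because that batch can't go out until more than 4 minutes have passed since the previous share.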

From pseudocode to code

First, add the mailboxer gem to your Rails app; the basic steps are:

  1. Add gem 'mailboxer', git: 'https://github.com/ging/mailboxer.git' to Gemfile
  2. bundle install
  3. rails g mailboxer:install
  4. rake db:migrate

Mailboxer is a Rails engine with two major components: a set of ActiveRecord models for sending messages to users and tracking receipts and notifications, and a mailer that converts messages into emails. We're only going to use the models and write our own asynchronous batch mailer, so add a config/initializers/mailboxer.rb that turns off mailboxer's built-in email delivery (the uses_emails option in the generated initializer).

Now you need to add hooks to your application that create a notification for each relevant event. We decided to use mailboxer's Notification model instead of its Message model, since a Notification can carry an arbitrary attached ActiveRecord object, which we later use to build the email. In Meldium, a shared application is represented by a join table called app_shares, which has a sharer_id, a sharee_id, and an app_id; we modified AppShare's after_commit hook to create a sharing notification for the sharee instead of enqueueing an email.
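Outside of Rails, the important property of the new hook can be sketched in plain Ruby: it records a notification with the share attached and sends nothing. In the real app this is a notify call on the sharee from AppShare's after_commit hook, and mailboxer persists the row; here AppShare, Notification, NotificationLog and after_share_committed are hypothetical in-memory stand-ins, and times are plain minute counts.

```ruby
# Hypothetical stand-ins for the app_shares row and mailboxer's notifications.
AppShare = Struct.new(:sharer_id, :sharee_id, :app_id)
Notification = Struct.new(:recipient_id, :notified_object, :created_at, :read)

class NotificationLog
  def initialize
    @rows = []
  end

  def notify(recipient_id, notified_object, created_at)
    @rows << Notification.new(recipient_id, notified_object, created_at, false)
  end

  def unread_for(recipient_id)
    @rows.select { |n| n.recipient_id == recipient_id && !n.read }
  end
end

# Models the after_commit hook: record one unread notification for the sharee,
# attaching the share itself. No email is sent here; delivery happens later.
def after_share_committed(share, log, now)
  log.notify(share.sharee_id, share, now)
end

log = NotificationLog.new
[101, 102, 103].each_with_index do |app_id, minute|
  after_share_committed(AppShare.new(:alice, :bob, app_id), log, minute)
end
p log.unread_for(:bob).map { |n| n.notified_object.app_id }
```

Three shares leave Bob with three unread notifications and zero emails, which is exactly the state the batching jobs below pick up.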

You'll also need to make your User model messageable, by adding mailboxer's acts_as_messageable to the class, so that you can call User#notify.

With these changes, your app will now happily generate notifications every time a sharing event takes place. The next step is to actually batch and dispatch those notifications as emails. Since our front-end runs on Heroku, we decided to use resque-scheduler to implement the "every N minutes" loop from the pseudocode above. I followed this guide to set up resque-scheduler, omitting the auto-scaling pieces.

The actual batching system uses two jobs: ScheduleSendNotificationsJob, which runs every minute and quickly looks for batches that might be ready, and a second job that actually locks the notifications and performs the send.
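The selection that the first job makes can be sketched in plain Ruby. This is not the original ActiveRecord query: users and receipts are hypothetical structs, the enqueue step is a lambda standing in for Resque.enqueue, and the job deliberately doesn't check the cooldown, since it only looks for batches that might be ready.

```ruby
require 'set'

# Hypothetical records standing in for users and notification receipts.
User = Struct.new(:id, :confirmed)
Receipt = Struct.new(:user_id, :read)

# Finds distinct, confirmed users with at least one unread receipt and
# enqueues one per-user send job for each. Returns the enqueued user ids.
def schedule_send_notifications(users, receipts, enqueue)
  user_ids_with_unread = receipts.reject(&:read).map(&:user_id).to_set
  candidates = users.select { |u| u.confirmed && user_ids_with_unread.include?(u.id) }
  candidates.each { |u| enqueue.call(u.id) }
  candidates.map(&:id)
end

users = [User.new(1, true), User.new(2, false), User.new(3, true)]
receipts = [
  Receipt.new(1, false), Receipt.new(1, false), # user 1: two unread, confirmed
  Receipt.new(2, false),                        # user 2: unread, but unconfirmed
  Receipt.new(3, true)                          # user 3: everything already read
]
queued = []
schedule_send_notifications(users, receipts, ->(id) { queued << id })
p queued
```

User 1 is enqueued exactly once despite having two unread notifications; in SQL, that de-duplication is what the DISTINCT in the query is for.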

The User query in that first job deserves some unpacking: it identifies all of the distinct Users who have at least one unread notification and who have also been confirmed (we don't send notifications to users who haven't yet confirmed their email address and opted in to using Meldium). A very simple schedule drives this job once a minute.
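One way to express that schedule is a resque-scheduler entry with a cron expression for "every minute". The sketch keeps the YAML inline so it runs on its own; the entry name, queue name and description are hypothetical, and in an app you would load such a file and hand it to resque-scheduler via Resque.schedule=.

```ruby
require 'yaml'

# Hypothetical resque-scheduler entry: run ScheduleSendNotificationsJob every minute.
# "* * * * *" must be quoted, since a bare * starts a YAML alias.
SCHEDULE_YAML = <<~YAML
  schedule_send_notifications:
    cron: "* * * * *"
    class: ScheduleSendNotificationsJob
    queue: notifications
    description: "Find users whose notification batches may be ready"
YAML

schedule = YAML.safe_load(SCHEDULE_YAML)
# In the app, something like: Resque.schedule = schedule
p schedule
```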

The bulk of the work is done by SendNotificationsJob, which runs once for each user, checks whether that user's notification batch is ready to go, and if so, sends it.
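Here is a plain-Ruby sketch of that per-user decision, under stated assumptions: Receipt is a hypothetical struct with a per-user is_read flag, the mailer is any callable standing in for NotificationsMailer, and COOLDOWN_PERIOD is our 4 minutes expressed in seconds. It leaves out the database locking discussed below.

```ruby
COOLDOWN_PERIOD = 4 * 60 # seconds

# Hypothetical stand-in for a receipt: one per (user, notification).
Receipt = Struct.new(:user_id, :notification_id, :created_at, :is_read)

# If the user's newest unread notification is older than the cooldown, mark the
# whole batch read and hand it to the mailer in a single call.
def send_notifications(user_id, receipts, now, mailer)
  unread = receipts.select { |r| r.user_id == user_id && !r.is_read }
  return :nothing_to_send if unread.empty?
  return :still_cooling_down if now - unread.map(&:created_at).max <= COOLDOWN_PERIOD

  unread.each { |r| r.is_read = true } # claim the batch so it isn't sent twice
  mailer.call(user_id, unread.map(&:notification_id))
  :sent
end

t0 = Time.at(0)
receipts = [
  Receipt.new(:bob, 1, t0, false),
  Receipt.new(:bob, 2, t0 + 60, false),
  Receipt.new(:bob, 3, t0 + 120, false)
]
emails = []
mailer = ->(user_id, ids) { emails << [user_id, ids] }

p send_notifications(:bob, receipts, t0 + 180, mailer) # newest is 1 minute old
p send_notifications(:bob, receipts, t0 + 400, mailer) # newest is over 4 minutes old
p emails
```

Bob gets one email covering all three notifications, and a later run finds nothing left to send.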

Again, some database and Arel tricks here are worth explaining. The most important is the call to lock(true) on the receipts scope: in Postgres, this issues a SELECT ... FOR UPDATE query, which prevents any other transaction from locking or modifying those rows until the surrounding transaction commits or rolls back. We need this manual locking in case two of these jobs run for the same user at the same time. If you're running multiple Resque workers and your queue is busy enough, ScheduleSendNotificationsJob can run twice before SendNotificationsJob runs once, so the two Resque workers must exclude each other from operating on the same Notifications.
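The effect of that lock can be modelled in plain Ruby with a Mutex. This is only an analogy for Postgres row locks, and UserReceipts and claim_unread are hypothetical names: the point is that reading the unread receipts and marking them read happen inside one critical section, so when two workers run the job for the same user, one claims the whole batch and the other finds nothing to send.

```ruby
# Hypothetical receipt with a read flag.
Receipt = Struct.new(:notification_id, :is_read)

class UserReceipts
  def initialize(ids)
    @receipts = ids.map { |id| Receipt.new(id, false) }
    @lock = Mutex.new # plays the role of the Postgres row locks
  end

  # Read and mark-as-read in one critical section, so two workers can never
  # both claim the same notification.
  def claim_unread
    @lock.synchronize do
      unread = @receipts.reject(&:is_read)
      unread.each { |r| r.is_read = true }
      unread.map(&:notification_id)
    end
  end
end

receipts = UserReceipts.new([1, 2, 3])
batches = Queue.new

# Two "workers" run the send job for the same user at the same time.
workers = 2.times.map do
  Thread.new do
    claimed = receipts.claim_unread
    batches << claimed unless claimed.empty?
  end
end
workers.each(&:join)

sent = []
sent << batches.pop until batches.empty?
p sent # => [[1, 2, 3]]
```

Whichever worker acquires the lock first claims all three notifications; the other sees an empty batch and sends nothing.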

I've omitted the implementation of NotificationsMailer here, since it's mostly Meldium-specific logic for building a nice-looking email from multiple Notification objects. You'll probably want two different email styles, depending on whether you're sending one Notification or many in a batch. You may also need to handle multiple heterogeneous Notification types if you use this technique across several models. And of course, you'll need a transactional email service to deliver the actual emails; we use and recommend Mailgun.
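For the two email styles and mixed notification types, the decision logic might look like the sketch below. AppShare and TeamInvite are hypothetical payload structs (the second type is invented purely to show heterogeneous batches), and the subject strings are placeholders.

```ruby
# Hypothetical notification payloads of two different types.
AppShare = Struct.new(:sharer_name, :app_name)
TeamInvite = Struct.new(:inviter_name, :team_name)

# One style for a single notification, another for a batch.
def batch_subject(notified_objects)
  if notified_objects.size == 1
    single_subject(notified_objects.first)
  else
    "You have #{notified_objects.size} new notifications"
  end
end

def single_subject(obj)
  case obj
  when AppShare then "#{obj.sharer_name} shared #{obj.app_name} with you"
  when TeamInvite then "#{obj.inviter_name} invited you to #{obj.team_name}"
  else "You have a new notification"
  end
end

# Group a heterogeneous batch by type so each section of the email can be
# rendered with its own template.
def sections(notified_objects)
  notified_objects.group_by { |obj| obj.class.name }
end

batch = [AppShare.new("Alice", "GitHub"), AppShare.new("Alice", "Heroku"), TeamInvite.new("Carol", "Ops")]
puts batch_subject([batch.first])
puts batch_subject(batch)
p sections(batch).keys
```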

Is your app batching notifications? We'd love to hear how others are approaching this problem; there doesn't seem to be an off-the-shelf implementation of this pattern for Rails apps. Let us know what you think by joining the discussion on Hacker News.