What is Memoization?

When developing applications we often have methods that run slowly. Perhaps they need to query the database, or hit an external service, both of which can cause them to slow down. We could call the method every time we need that data and just accept the overhead, but if performance is a concern we have some options. For one, we can assign the data to a variable and re-use it, which would speed up the process. While a possible solution, manually managing that variable could quickly become tedious. But, what if instead, the method doing this “slow work” could just handle that variable for us? This would allow us to call the method in the same way, but have the method save and re-use the data. This is exactly what memoization does.

Put simply, memoization is saving a method’s return value so it does not have to be recomputed each time. As with all caching, you are effectively trading memory for time (i.e. you give up the memory required to store the value, but you save the time required to process the method).

How to Memoize a Value

Ruby provides a very clean idiom for memoizing values with the or-equals operator: ||=. This uses a logical OR (||) between left and right values, then assigns the result to the variable on the left. In action:

value ||= expensive_method(123)

#logically equivalent to:
value = (value || expensive_method(123))

How Does Memoization Work

To understand how this works you need to grasp two concepts: “falsey” values and lazy evaluation. We’ll start with truthy-falsey first.

Truthy and Falsey

Ruby (like almost all other languages) has built-in keywords for boolean true and false values. They work exactly as you’d expect:

if true
  #we always run this
end

if false
  # this will never run
end

However, Ruby (and many other languages) also has the concept of “truthy” and “falsey” values. This means values can be treated “as if” they were true or false. In Ruby only nil and false are falsey. All other values (including zero) are treated as true (note: other languages make different choices. For example C treats zero as false). Re-using our example from above, we could also write:

value = "abc123" # a string
if value
  # we always run this
end

value = nil
if value
  # this will never run
end

Lazy Evaluation

Lazy evaluation is a form of optimization that is very common in programming languages. It allows the program to skip operations that are not necessary.

The logical OR operator (||) returns true if either left or right-hand sides are true. This means that if the left-hand argument is true there is no point evaluating the right-hand side since we already know the result will be true. If we were to implement this ourselves we might end up with something like this:

def logical_or (lhs, rhs)
  return lhs if lhs
  
  rhs
end

If lhs and rhs were functions (e.g. lamdas) then you can see rhs will only execute if lhs is falsey.

Or-Equals

Combining these two concepts of truthy-falsey values and lazy evaluation shows us what the ||= operator is doing:

value #defaults to nil
value ||= "test"
value ||= "blah"
puts value
=> test

We start with value being nil because it was not initialized. Next, we encounter our first ||= operator. value is falsey at this stage so we evaluate the right-hand side ("test") and assign the result to value. Now we hit the second ||= operator, but this time value is truthy as it has the value "test". We skip the evaluation of the right-hand side and continue with value untouched.

Trouble managing your Github Pull Requests?

GitArborist was created to simplify Pull Request management on Github
Mark PR dependencies, automatic merging, scheduled merges, and more →

Deciding When to Use Memoization

When using memoization there are some questions we need to ask ourselves: How often is the value accessed? What causes it to change? How often does it change?

If the value is only accessed once then caching the value is not going to be very useful, the more often the value is accessed, the more benefit we can get from caching it.

When it comes to what causes it to change we need to look at what values are used in the method. Does it take arguments? If so the memoization probably needs to take this into account. Personally, I like using the memoist gem for this as it handles arguments for you.

Lastly, we need to consider how often the value changes. Are there instance variables that cause it to change? Do we need to clear the cached value when they change? Should the value be cached at the object level or the class level?

To answer these questions let’s look at a simple example and step through the decisions:

class ProfitLossReport
  def initialize(title, expenses, invoices)
    @expenses = expenses
    @invoices = invoices
    @title = title
  end
  
  def title
    "#{@title} #{Time.current}"
  end
  
  def cost
    @expenses.sum(:amount)
  end
  
  def revenue
    @invoices.sum(:amount)
  end
  
  def profit
    revenue - cost
  end
  
  def average_profit(months)
    profit / months.to_f
  end
end

Calling code is not shown here, but it’s a good guess that the title method is probably only called once, it also uses Time.current so memoizing it could mean the value instantly becomes stale.

The revenue and cost methods are hit several times even within this class. Given that they both require hitting the database, they would be prime candidates for memoizing if performance became an issue. Assuming we memoize these, then profit shouldn’t need to be memoized, otherwise, we’re just adding caching on top of caching for minimal gains.

Finally, we have average_profit. The value here relies on the argument so our memoizing has to take this into account. For a simple case like revenue we can just do this:

def revenue
  @revenue ||= @invoices.sum(:amount)
end

For average_profit though, we need a different value for each argument that is passed in. We could use memoist for this, but for the sake of clarity we’ll roll our own solution here:

def average_profit(months)
  @average_profit ||= {}
  @average_profit[months] ||= profit / months.to_f
end

Here we’re using a hash to keep track of our computed values. First we ensure @average_profit has been initialized, then we use the argument passed in is as the hash key.

Memoizing at the Class Level or Instance Level

Most of the time memoization is done at the instance level, meaning we use an instance variable to hold the computed value. This also means that whenever we create a new instance of the object it does not benefit from the “cached” value. Here’s a very simple illustration:

class MemoizedDemo
  def value
    @value ||= computed_value
  end
  
  def computed_value
    puts "Crunching Numbers"
    rand(100)
  end
end

Using this object we can see the results:

demo = MemoizedDemo.new
=> #<MemoizedDemo:0x00007f95e5d9d398>

demo.value
Crunching Numbers
=> 19

demo.value
=> 19

MemoizedDemo.new.value
Crunching Numbers
=> 93

We can change this by simply using a class-level variable (with @@) for our memoized value:

  def value
    @@value ||= computed_value
  end

The results then become:

demo = MemoizedDemo.new
=> #<MemoizedDemo:0x00007f95e5d9d398>
demo.value
Crunching Numbers
=> 60
demo.value
=> 60
MemoizedDemo.new.value
=> 60

You may not want class-level memoization often, but it is there as an option. However, if you need a value to be cached at this level it is probably worth looking at caching the value with an external store like Redis or memcached.

Common Memoization Use Cases in Ruby on Rails Applications

In Rails applications, the most common use-case I see for memoization is reducing database calls, particularly when a value is not going to change within a single request. “Finder” methods for looking up records in controllers are a good example of this kind of database call such as:

  def current_user
    @current_user ||= User.find(params[:user_id])
  end

Another common place is if you use some type of decorator/presenter/view-model type of architecture for rendering views. Methods in these objects often have good candidates for memoization because they only persist for the life of the request, the data is normally not mutated, and some methods are probably hit multiple times when rendering the views.

Memoization Gotchas

One of the biggest gotchas is memoizing things when it is not really necessary. Things like string interpolation can look like easy candidates for memoization, but in reality, they are unlikely to be causing any noticeable impact on your site’s performance (unless of course you are using exceptionally large strings or doing a very large amount of string manipulation), for example:

  def title
    # memoization here is not going to have much of an impact on our performance
    @title ||= "#{@object.published_at} - #{@object.title}"
  end

Another thing to watch for is our old friend cache invalidation, particularly if your memoized value depends on the state of the object. One way to help prevent this is to cache at the lowest level you can. Instead of caching a method computing a + b it may be better to cache the a and b methods individually.

  # Instead of this
  def profit
    # anyone else calling 'revenue' or 'losses' is not benefitting from the caching here
    # and what happens if the 'revenue' or 'losses' value changes, will we remember to update profit?
    @profit ||= (revenue - losses)
  end
 
  # try this
  def profit
    # no longer cached, but subtraction is a fast calculation
    revenue - losses
  end
  
  def revenue
    @revenue ||= Invoice.all.sum(:amount)
  end
  
  def losses
    @losses ||= Purchase.all.sum(:amount)
  end

The last gotcha is due to how lazy evaluation works - you will have to do something a bit more custom if you need to memoize a falsey value (i.e nil or false), as the ||= idiom will always execute the right-hand side if your saved value is falsey. In my experience, it’s not often you need to cache these values, but if you do, you may need to add a boolean flag to indicate it’s already been computed, or use another caching mechanism.

  def last_post
    # if the user has no posts, we will hit the database every time this method is called
    @last_post ||= Post.where(user: current_user).order_by(created_at: :desc).first
  end
  
  # As a simple workaround we could do something like:
  def last_post
    return @last_post if @last_post_checked
    
    @last_post_checked = true
    @last_post ||= Post.where(user: current_user).order_by(created_at: :desc).first
  end

When Memoization is Not Enough

Memoization can be a cheap and effective way to improve performance in parts of your application, but it’s not without its drawbacks. One big one is persistence; for common instance-level memoization, the value is only saved for that one particular object. This makes memoization great for saving values for the life of a web request, but doesn’t give you the full benefits of caching if you have values that would be the same for multiple requests and are being re-computed each time.

Class-level memoization could help with this, but it becomes more difficult to manage cache invalidation. Not to mention that if your server reboots those cached values will be lost, and they can’t be shared among multiple web servers.

In the next issue in this series on caching we’ll look at Rails’ solution to these problems - low-level caching. Allowing you to cache values to an external store that can be shared among servers, and manage cache invalidation with expiry timeouts and dynamic cache keys.