Performance Woes
I’ve been running GitArborist for a couple of months now, gradually adding features. Like any good production app, I have some logging and metrics set up to monitor and troubleshoot issues.
Over these last few months, I’ve seen recurring performance issues, with response times in the double-digit seconds in some cases. I made various fixes like you’d expect: adding database indexes, returning early where possible, etc. These helped, to be sure, but I never fully eliminated the occasional long response times, and I could never reproduce them myself.
While looking at the logs of one slow request, I saw a DB lookup take over 700ms for a single record, followed by a lookup on another table taking over 100ms. In both cases the field being queried has a unique index; the first lookup is on a string, the second on a bigint. I was shocked to see indexed fields, in a pretty small database, take so long to look up.
Heroku’s Postgres Offering
I’m currently using Heroku’s “Hobby Dev” postgres plan, which is free for up to 10,000 rows. Heroku does not say much about the performance of the plan, presumably because it is a shared box with no dedicated RAM/CPU resources, though they do note it has no in-memory cache and could be hit with unannounced maintenance windows or upgrades.
Now, GitArborist currently makes no money and has a small number of users, so 10,000 rows for free is a fine trade-off for me. What surprised me is that the lowest-tier postgres plan for a “production” application is Standard-0, priced at $50/month. Considering the dynos only cost me $7/month each, I’m shocked Heroku does not have any plans between the $9 Hobby-Basic plan (which has the same limitations described above, but with the row limit increased to 10,000,000) and $50.
This also raised two more questions in my mind: 1) How bad are the hobby level plans in real-world usage, and 2) what do some alternatives look like?
Contenders
To be clear, I do not intend this to be an apples-to-apples performance comparison, nor did I set out to make it statistically rigorous. I just wanted a quick-and-dirty setup to answer one question: If I want to spend < $50/month on postgres hosting, what performance is available to me?
I settled on three test benches, basically the options I would consider in the real world:
- Heroku’s Hobby-Dev plan (free)
- Dokku on a DigitalOcean Droplet ($15/month: 2GB RAM, 2 CPUs)
- DigitalOcean’s hosted Postgres plan ($15/month, plus a $5/month Droplet running Dokku to host the test app)
The Test Setup
The test setup is pretty simple: a fresh Rails 6 application (Ruby 2.6.5 and Rails 6.0.2.2) with redis/sidekiq and postgres. In the database, I have a simple table with one string column and one bigint column, both with a unique index on them. I seed the database with 10,000 of these records using random values:
used_ints = []
10_000.times do |i|
  print "." if i % 100 == 0
  puts i if i % 1000 == 0

  next_int = nil
  # find a unique integer
  loop do
    next_int = rand(10**10)
    break unless used_ints.include?(next_int)
  end
  used_ints << next_int

  DataItem.create!(
    name: SecureRandom.alphanumeric(12 + rand(120)),
    uniqint: next_int
  )
end
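For completeness, the table behind these records can be created with a migration along these lines. This is a sketch reconstructed from the seed script and the job shown later; the lookup-time columns and their float type are my assumption, the rest matches the names used above:

class CreateDataItems < ActiveRecord::Migration[6.0]
  def change
    create_table :data_items do |t|
      t.string :name
      t.bigint :uniqint
      # columns the performance job writes its measurements into
      # (float is an assumption; any numeric type would do)
      t.float :name_lookup_time
      t.float :int_lookup_time
      t.timestamps
    end

    # the unique indexes are the whole point of the test
    add_index :data_items, :name, unique: true
    add_index :data_items, :uniqint, unique: true
  end
end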
For machines using Dokku, I used the one-click setup in DigitalOcean and then ran `apt upgrade`, which brought it up to Dokku 0.20.4.
Test Execution
Once that’s done, I have a scheduled background job running every minute; each run performs 60 lookups of a string and an integer (assuming that, worst case, the two queries could take 1 second, 60 iterations still fit within the minute). I’m using ActiveSupport’s Notifications to monitor query time:
class PerformanceCheckJob < ApplicationJob
  def perform
    ActiveSupport::Notifications.subscribe('sql.active_record') do |name, start, finish, id, payload|
      event = ActiveSupport::Notifications::Event.new(name, start, finish, id, payload)
      @duration = event.duration # how long it took to run the sql command
    end

    60.times do
      do_perf_test
    end
  end
  private

  def do_perf_test
    named_search = DataItem.find_by(name: random_name)
    named_time = @duration

    int_search = DataItem.find_by(uniqint: random_integer)
    int_time = @duration

    named_search.update!(name_lookup_time: named_time)
    int_search.update!(int_lookup_time: int_time)
  end
...
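I haven’t shown the scheduling itself; with Sidekiq already in the stack, a cron-style entry like the one below runs the job every minute. This is just a sketch using the sidekiq-cron gem as an example, not necessarily the scheduler you’d pick:

# config/initializers/sidekiq_cron.rb
# One way to run PerformanceCheckJob once a minute (sidekiq-cron is an
# assumption here; any cron-style scheduler works just as well).
Sidekiq::Cron::Job.create(
  name:  'performance_check_every_minute',
  cron:  '* * * * *',
  class: 'PerformanceCheckJob'
)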
The only other interesting thing going on here is that I’m caching all the strings and integers the first time the job runs; this avoids that query being repeated and interfering with the results:
def random_name
  all_names.sample
end

def random_integer
  all_integers.sample
end

def all_names
  Rails.cache.fetch(:all_names) do
    DataItem.all.pluck(:name)
  end
end

def all_integers
  Rails.cache.fetch(:all_integers) do
    DataItem.all.pluck(:uniqint)
  end
end
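Note that Rails.cache.fetch only caches across job runs if a real cache store is configured; since Redis is already part of the setup, something like the following does the trick (the exact store is an assumption and doesn’t matter for the results):

# config/environments/production.rb
# Reuse the Redis instance Sidekiq already talks to as the Rails cache
# (an assumption; any persistent cache store would do).
config.cache_store = :redis_cache_store, { url: ENV["REDIS_URL"] }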
As mentioned earlier, this is just a quick-and-dirty test. There are numerous ways it could be improved; for one thing, each new lookup time overwrites the previous one stored on the same row, whereas for better results every measurement should be persisted.
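For example, a hypothetical LookupMeasurement table could record every single lookup instead. A sketch of that improvement (not what actually ran in this test):

# Hypothetical improvement: append every measurement to its own table
# instead of overwriting columns on DataItem. LookupMeasurement is assumed
# to have lookup_type (string) and duration_ms (float) columns.
def do_perf_test
  DataItem.find_by(name: random_name)
  LookupMeasurement.create!(lookup_type: "name", duration_ms: @duration)

  DataItem.find_by(uniqint: random_integer)
  LookupMeasurement.create!(lookup_type: "uniqint", duration_ms: @duration)
end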
Test Results
After running for over 48 hours, here are the results:
Unique Indexed Strings
Machine | Min | Max | Avg | 50th %ile | 95th %ile | 99th %ile |
---|---|---|---|---|---|---|
DO Postgres | 11.7ms | 114.7ms | 27.8ms | 23.6ms | 55.6ms | 72.5ms |
Dokku Postgres | 22.5ms | 199.1ms | 44.4ms | 38.6ms | 78.8ms | 98.9ms |
Heroku Hobby-Dev | 0.8ms | 3138.5ms | 23.5ms | 2.2ms | 68.3ms | 698.3ms |
Unique Indexed BigInt
Machine | Min | Max | Avg | 50th %ile | 95th %ile | 99th %ile |
---|---|---|---|---|---|---|
DO Postgres | 12.6ms | 119.5ms | 27.4ms | 23.1ms | 54.0ms | 71.7ms |
Dokku Postgres | 21.3ms | 192.5ms | 44.9ms | 39.2ms | 80.3ms | 101.1ms |
Heroku Hobby-Dev | 0.7ms | 1627.3ms | 11.7ms | 2.0ms | 22.3ms | 248.9ms |
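For reference, figures like these can be pulled straight out of Postgres with its ordered-set aggregates. A sketch of the kind of query involved (not necessarily the exact one I ran):

# Summary stats for the string lookups, straight from Postgres.
row = ActiveRecord::Base.connection.select_one(<<~SQL)
  SELECT min(name_lookup_time) AS min_ms,
         max(name_lookup_time) AS max_ms,
         avg(name_lookup_time) AS avg_ms,
         percentile_cont(0.50) WITHIN GROUP (ORDER BY name_lookup_time) AS p50_ms,
         percentile_cont(0.95) WITHIN GROUP (ORDER BY name_lookup_time) AS p95_ms,
         percentile_cont(0.99) WITHIN GROUP (ORDER BY name_lookup_time) AS p99_ms
  FROM data_items
  WHERE name_lookup_time IS NOT NULL
SQL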
Surprises
A lot of things in these results surprised me. For one, Heroku’s minimum lookup times are considerably lower than the DigitalOcean alternatives’ (though this only emerged around the third day of running; before that, DO’s minimums were around the 1-1.5ms mark, and I’m not sure what caused the increase).
You can see that the free Heroku plan does great on minimum and average lookup times, easily comparable with DigitalOcean’s managed option. The variability of Heroku’s plan is the killer, though, with 99th percentile times almost 10 times better on DigitalOcean for strings and over 3 times better for bigints. Keep in mind that the slow responses are probably clustered in time as well, so although it is only the 99th percentile, a lot of these likely happened around the same timeframe (i.e. when traffic spikes on Heroku’s shared servers).
I also observed a less uniform distribution of random numbers on Heroku, in that it took longer to fill up rows and never updated all 10,000 rows, though I have no explanation for this.
Recommendation
Going forward, I see three realistic options to choose from (for me, paying another $50/month is not one of them):
- Just put up with it, increase the threshold on slow response time alerts.
- Continue running dynos on Heroku, host database on DigitalOcean.
- Move all hosting over to DigitalOcean.
Fortunately for me, GitArborist mainly processes data coming from GitHub’s servers, not from users. This makes #1 a perfectly viable option for me where it might not be for other “side-project” developers. #2 is a compelling option because you get to stay in the Heroku ecosystem with the add-ons and services they provide; essentially, you’d be treating DO as a $15/month postgres add-on.
If I were building a user-facing application, #3 would certainly be the best bang-for-buck option. If you went with self-hosted Dokku, you’re paying $15 for a box that can be your web, worker, and postgres instance all in one; that’s only $1 more than the web and worker dynos alone would cost you on Heroku.
Conclusion
Hopefully I don’t sound too critical of Heroku here. I get it: they provide Ops-as-a-service and charge accordingly. Yet it cannot be denied that they also want to woo new developers with their simple git-push-to-deploy setup and free tier offerings.
In the end, I’d love to see Heroku come out with a performance-oriented postgres plan at the lower end; perhaps a $15/month plan with some limited-in-size-but-not-in-reliability configuration. I’m also curious what the use case is for their $9 Hobby-Basic plan, where someone wants 10 million rows but is OK with unreliable performance.