As a project evolves, covering its behaviour with cypress starts facing hurdles. When the number of tests grows, you want the whole test suite to finish faster, you have to fight flakiness, and you have to deal with optimization problems from an infrastructural point of view.

Here I want to focus on a problem that already has well-established solutions for unit tests but not for E2E tests - cleaning and seeding your test database so that each of your cypress tests starts from a clean state.

The set-up I’ll show you involves running cypress in parallel against a Rails application that uses multiple cypress/test databases, utilizing the horizontal sharding functionality introduced in Rails 6.1 and Postgres as the database engine. I’ll go over the decisions I made along the way to arrive at this set-up.

The set-up assumes a good understanding of how transactions, threads & processes work in the context of database connections within a Rails application, so before proceeding with the details of the actual set-up I want to go over some fundamentals.

Nested transactions in Rails

Since most database engines do not support nested transactions, when we place a transaction within another one, Rails implements this via transaction savepoints. A savepoint marks a state of the database within a transaction, and you can always roll back to that state. Hence, when you write nested transaction blocks in Rails, under the hood you actually get one SQL TRANSACTION with multiple SAVEPOINTs.

If you look at Rails core, when a transaction is begun, it creates either a RealTransaction or a SavepointTransaction instance depending on whether you have a regular or a nested transaction.

Let’s look at this example:

ActiveRecord::Base.transaction do
  Student.first.update(first_name: 'David')
  ActiveRecord::Base.transaction do
    Student.last.update(first_name: 'John')
    raise ActiveRecord::Rollback
  end
end

This might be confusing, but to our surprise both students’ names will be updated and the rollback will be ignored. According to the official docs on nested transactions, all database statements in the nested transaction block become part of the parent transaction - handled underneath by a single RealTransaction instance, with no savepoints.

Hence, raising ActiveRecord::Rollback to trigger a ROLLBACK will not revert the operations within the parent transaction block. ActiveRecord::Rollback is a special exception that the nested transaction block captures and does not re-raise; and since under the hood all database statements join the single SQL TRANSACTION created for the parent block (handled by RealTransaction), both updates get committed.
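For the example above, the SQL log roughly shows a single transaction and no savepoints (the ids are illustrative):

BEGIN
UPDATE "students" SET "first_name" = 'David' WHERE "students"."id" = 1
UPDATE "students" SET "first_name" = 'John' WHERE "students"."id" = 2
COMMIT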

In order to trigger a new transaction (think savepoint) for each nested block, there are two options:

# Option 1 - requires_new: true
ActiveRecord::Base.transaction do
  Student.first.update(first_name: 'David')
  ActiveRecord::Base.transaction(requires_new: true) do
    Student.last.update(first_name: 'John')
    raise ActiveRecord::Rollback
  end
end

requires_new: true forces a new savepoint for the nested transaction block.

or

# Option 2 - joinable: false
ActiveRecord::Base.transaction(joinable: false) do
  Student.first.update(first_name: 'David')
  ActiveRecord::Base.transaction do
    Student.last.update(first_name: 'John')
    raise ActiveRecord::Rollback
  end
end

joinable: false switches off the default behaviour where database statements in nested transaction blocks join the parent transaction.

In both cases only the first student’s name will be updated, as the nested transaction block results in a savepoint under the hood which is rolled back when ActiveRecord::Rollback is raised.
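With either option, the SQL log now shows a savepoint being created and rolled back (savepoint name and ids are illustrative):

BEGIN
UPDATE "students" SET "first_name" = 'David' WHERE "students"."id" = 1
SAVEPOINT active_record_1
UPDATE "students" SET "first_name" = 'John' WHERE "students"."id" = 2
ROLLBACK TO SAVEPOINT active_record_1
COMMIT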

New threads and workers mean more database connections

In Rails each new thread obtains its own database connection, so it’s best to have the limit of your database connection pool equal to the number of threads you’ve configured puma with. The default configuration for a new Rails project is set to 5 threads:

# puma.rb
max_threads_count = ENV.fetch("RAILS_MAX_THREADS") { 5 }
# workers ENV.fetch("WEB_CONCURRENCY") { 2 }

# database.yml
default: &default
  adapter: sqlite3
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  timeout: 5000

Apart from threads, in puma.rb you can also increase the number of workers. A worker is a separate OS process running its own instance of your Rails app, and in puma.rb it’s controlled by WEB_CONCURRENCY by default.

Each worker uses its own threads, hence the maximum number of database connections that can be opened is WEB_CONCURRENCY * RAILS_MAX_THREADS. If you set 2 workers and 5 threads, you must ensure that your database engine can support 10 connections.
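To make the arithmetic concrete, here’s a sketch of puma.rb with workers enabled (the numbers are illustrative):

# puma.rb
max_threads_count = ENV.fetch("RAILS_MAX_THREADS") { 5 }
threads max_threads_count, max_threads_count
workers ENV.fetch("WEB_CONCURRENCY") { 2 }

# Maximum database connections that can be opened:
# WEB_CONCURRENCY * RAILS_MAX_THREADS = 2 * 5 = 10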

Alright, knowing all that, we can proceed with the actual set-up.

The Problem

Flakiness

To give more context, in a Rails project we had the following dilemma. Using Cypress for E2E testing, as the project evolved and the number of tests increased, we started running cypress in parallel to cut the waiting time. Even though tests were written in a fairly well-isolated manner and tests in one parallel cypress process were independent of tests in another, we did experience some flakiness - manageable at the time.

Data cleaning

The other constraint we had was related to refreshing data created in a test. When two tests are executed sequentially, the data created by the first test should be cleaned up/reverted so that it doesn’t interfere with the second test. Over time it became obvious that this is hardly attainable when you have long scenarios performing many interactions. So as the number of tests increased, that factor also contributed to the growing flakiness.

We needed a data cleaning mechanism.

Since each of our parallel cypress processes was preceded by these two steps:

  1. Generate test seeds used as a base for cypress tests
  2. Generate cypress fixtures based on test seeds, so that fixtures mirror test data in the database

the first approach we came up with for cleaning up data was comparing timestamps: before each test we deleted all records created after the most recent created_at among all fixtures.

That was more or less a variation of what one would describe as a DELETE approach to restoring a database. It was a straightforward and quick solution which reduced flakiness, but only in the short run.
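As a minimal sketch of that cleanup - assuming a hypothetical latest_fixture_created_at helper that returns the newest created_at among the generated fixtures, and an explicit list of the models our tests touch:

# Hypothetical list of models whose records tests may create
MODELS_TO_CLEAN = [Student].freeze

# Delete everything created after the newest fixture record
def delete_records_created_after(timestamp)
  MODELS_TO_CLEAN.each do |model|
    model.where('created_at > ?', timestamp).delete_all
  end
end

delete_records_created_after(latest_fixture_created_at)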

Transactional approach

Over time, with flakiness becoming more frustrating, we felt the need for an equivalent of what DatabaseCleaner provides for rspec tests. That meant either a transaction or a truncation strategy.

The idea of keeping all the interactions a cypress test performs inside a transaction that we can easily roll back at the beginning of each test was appealing. While researching whether it could actually work in our parallel set-up, we stumbled upon the cypress-rails gem, which made us more confident as it relies on a custom transactional mechanism as its data cleaning strategy.

Here’s a huge constraint:

⚠️ transactions assume a single database connection, since database connections do not share transaction state by default.
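To see the constraint in action, here’s a minimal sketch with two raw connections, assuming the students table from earlier and Postgres’s default READ COMMITTED isolation:

pool = ActiveRecord::Base.connection_pool
conn_a = pool.checkout
conn_b = pool.checkout

conn_a.begin_transaction
conn_a.execute("INSERT INTO students (first_name) VALUES ('Ada')")

# conn_b is outside conn_a's transaction, so the new row is invisible to it
conn_b.select_value("SELECT COUNT(*) FROM students WHERE first_name = 'Ada'") # => 0

conn_a.rollback_transaction
[conn_a, conn_b].each { |c| pool.checkin(c) }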

This Rails MR provides an option for making all threads that obtain connections from the same database connection pool share the same connection via:

connection_pool.lock_thread = true

which is also what cypress-rails relies on:

def begin_transaction
  @connections = gather_connections
  @connections.each do |connection|
    connection.begin_transaction joinable: false, _lazy: false
    connection.pool.lock_thread = true
  end
  
  @connection_subscriber = ActiveSupport::Notifications.subscribe("!connection.active_record") { |_, _, _, _, payload|
  ...
      if connection && !@connections.include?(connection)
        connection.begin_transaction joinable: false, _lazy: false
        connection.pool.lock_thread = true
        @connections << connection
      end
  }
  ...
end

def gather_connections
  ...
  # pool.connection retrieves the connection for the current thread
  ActiveRecord::Base.connection_handler.connection_pool_list.map(&:connection)
end

So when cypress is run and tests start interacting with the Rails API, puma utilizes new threads, each of which tries to establish its own database connection - yet they all end up with the same shared connection.

Along with pool.lock_thread = true, a new transaction is started at cypress launch and whenever a new connection is requested (the .subscribe part).

As a test proceeds and more transactions are begun on the same connection, each new transaction becomes a nested one, and because they are started with joinable: false, under the hood Rails builds one bulky SQL TRANSACTION with multiple SAVEPOINTs which can later be rolled back.
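The counterpart clean-up then boils down to rolling all those connections back - a sketch modeled on how Rails’ own transactional test fixtures tear down, not the gem’s exact code:

def rollback_transactions
  @connections.each do |connection|
    # Revert everything done since the initial begin_transaction
    connection.rollback_transaction if connection.transaction_open?
    connection.pool.lock_thread = false
  end
end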

Transactional approach for cypress running in parallel

All good up to now, but can that be applied when running cypress in parallel groups?

A shared database connection obviously won’t be sufficient, as parallel groups should not share each other’s data. If you split cypress tests into 3 groups and run them in parallel, you’ll need 3 database connections - one dedicated to each group - and logic that associates requests coming from a group with its dedicated connection.

Instead of caching database connections by Thread.current (which is what .lock_thread = true does), you’d likely have to cache them by a unique identifier of the request, which can be the subdomain if you split test groups by subdomains, for example.
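Purely to illustrate that direction (we never built this), a hypothetical registry keyed by subdomain might look like the sketch below; Concurrent::Map comes with concurrent-ruby, which Rails already depends on:

# Hypothetical: requests of the same test group get the same connection,
# cached by subdomain instead of by Thread.current
CONNECTIONS_BY_SUBDOMAIN = Concurrent::Map.new

def connection_for(subdomain)
  CONNECTIONS_BY_SUBDOMAIN.compute_if_absent(subdomain) do
    ActiveRecord::Base.connection_pool.checkout
  end
end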

Such an implementation seemed convoluted, error-prone and difficult to maintain - a blocker for the whole transaction strategy - so we ended up not taking that route.

Even if we had built such an implementation, another hurdle would’ve been the puma workers. If you’re running cypress tests against an app with multiple puma workers handling requests underneath, the workers’ threads won’t see the same database state, as each worker process holds its own database connections.

While you’re not likely to run puma with multiple workers in a test environment, that might not necessarily be the case in a staging or some other pre-production environment.

Hence, such an implementation would’ve also been resource-dependent.

Truncate approach

Ruling out transactions, we’re left with cleaning up the database manually. Emptying all tables - essentially resetting the whole database - is easy to picture with one process executing all tests, but not so much when running tests in parallel. We can’t afford to reset the database at the beginning of a test in one parallel group while a test from another group is still running.

Here’s where the idea of multiple test databases serving cypress test groups emerged. Each cypress test group could use a dedicated test database in complete isolation from the other test groups and their databases. That way we can empty each test database at any time without worrying about the others, allowing true parallelism between test groups.

To link a cypress test group with its dedicated database, we’ll also run the groups on dedicated subdomains. Say you’re executing your tests against https://myawesomeapp.com in 3 parallel processes; you can set up:

  • https://cypress-db1.myawesomeapp.com
  • https://cypress-db2.myawesomeapp.com
  • https://cypress-db3.myawesomeapp.com

subdomains serving your rails app (regardless of whether it’s server-side rendered HTML or completely client-side JS) and run a test group against each. You can think of each of those as a tenant in a multi-tenant application where each tenant’s data is saved in its own database.

⚠️ This requires adapting your cypress configuration, rails configuration and infrastructure in charge of running cypress to such multiple subdomains set-up. Here we’ll focus on key points in the application’s code only.
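One concrete bit of that Rails configuration, as a sketch: since Rails 6, host authorization has to be told about the extra subdomains (the regexp assumes the domains above):

# config/environments/development.rb (or whichever environment cypress hits)
config.hosts << /cypress-db\d\.myawesomeapp\.com/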

Horizontal sharding

Since 6.1, Rails supports horizontal sharding - database functionality that allows you to have multiple databases (shards) with the same structure:

# database.yml
default: &default
  adapter: postgresql
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  timeout: 5000

development:
  primary:
    <<: *default
    database: 'development'
  cypress-db1:
    <<: *default
    database: 'cypress-db1'
  cypress-db2:
    <<: *default
    database: 'cypress-db2'
  cypress-db3:
    <<: *default
    database: 'cypress-db3'

The cypress-db1, cypress-db2 and cypress-db3 databases have the same structure as the primary.
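How the shards get that structure depends on your Rails version’s multi-database tooling; as a hypothetical sketch, a rake task could simply load the primary schema into each shard (using the config.x.cypress_shards list defined just below):

# lib/tasks/cypress_shards.rake - hypothetical
namespace :cypress_shards do
  desc 'Load the primary schema into every cypress shard'
  task prepare: :environment do
    Rails.application.config.x.cypress_shards.each do |shard|
      ActiveRecord::Base.connected_to(shard: shard.to_sym, role: :writing) do
        # ActiveRecord::Schema.define in schema.rb runs against the current connection
        load Rails.root.join('db', 'schema.rb').to_s
      end
    end
  end
end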

# config/environments/development.rb
config.x.cypress_shards = [
  'cypress-db1',
  'cypress-db2',
  'cypress-db3'
]

# application_record.rb
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  CYPRESS_SHARDS = Rails.application.config.x.cypress_shards.each_with_object({}) do |cypress_db, hash|
    hash[cypress_db.to_sym] = { writing: cypress_db.to_sym, reading: cypress_db.to_sym }
  end.freeze

  connects_to shards: {
    default: { writing: :primary, reading: :primary },
    **CYPRESS_SHARDS
  }
end

Okay, with that database.yml and the changes in application_record, we have enabled switching between shards in our application. Swapping happens via the connected_to method:

module Middlewares
  class CypressConnection
    def initialize(app)
      @app = app
    end

    def call(env)
      request = Rack::Request.new(env)
      subdomain = request.referer&.split('https://')&.second&.split('.')&.first
      is_cypress = Rails.application.config.x.cypress_shards.include?(subdomain)

      if is_cypress
        ActiveRecord::Base.connected_to(shard: subdomain.to_sym, role: :writing) do
          @app.call(env)
        end
      else
        @app.call(env)
      end
    end
  end
end

CypressConnection is a custom middleware responsible for connecting to the right cypress database based on the subdomain the request comes from.
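The middleware still has to be registered in the stack - a sketch, assuming you enable it only in environments that run cypress:

# config/application.rb (or a specific environment file)
config.middleware.use Middlewares::CypressConnection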

With the configuration up to this point, you can split your cypress tests into 3 groups and run the subdomain-isolated groups in parallel without worrying whether data created in one group will interfere with data created in another, as the groups have dedicated databases to save their data in.

But there’s still nothing in place to ensure the same database state before each test within a group. Let’s go to Part 2