In Part 1 we covered how a Rails application can become multi-tenant, with a separate database per tenant, so that Cypress can run in parallel groups in complete isolation from each other.
Data cleaning mechanism
If we zoom in on a single parallel test group, we need to ensure that each of its tests starts from a known database state (usually the initial data coming from seeds or fixtures), executes its scenario, and that before the next test runs the initial state is restored, guaranteeing that data created by one test won't interfere with the execution of another.
We need something equivalent to Postgres's pg_restore, encapsulated in a Rails DSL.
Before each parallel group is run we will:
- Seed the corresponding cypress database with initial data.
- Dump the database in the form of INSERT statements, so that the dump contains the initial data in a .sql file.
Then before each test scenario is run we will:
- Restore the cypress database using the .sql dump
Restoring the database includes emptying it first. One way of doing that is emptying its tables using TRUNCATE. If so, before executing the INSERT statements you may need to disable foreign key constraints and re-enable them afterwards, so that the order of the INSERTs does not lead to errors. That can be a tricky job, as it might involve additional privileges, workarounds and more raw SQL (a rough sketch of that approach follows below).
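For illustration only, here is a minimal sketch of what that TRUNCATE-based alternative could look like; the dump path is illustrative, and DISABLE TRIGGER ALL typically requires table-owner or superuser privileges, which is part of what makes it tricky:

conn   = ActiveRecord::Base.connection
tables = conn.tables - ['schema_migrations', 'ar_internal_metadata']

# empty the tables, resetting sequences and cascading over foreign keys
conn.execute("TRUNCATE #{tables.join(', ')} RESTART IDENTITY CASCADE")

# temporarily disable FK triggers so the INSERT order doesn't matter
tables.each { |t| conn.execute("ALTER TABLE #{t} DISABLE TRIGGER ALL") }
File.readlines('/tmp/dump.sql').each { |insert| conn.execute(insert) }
tables.each { |t| conn.execute("ALTER TABLE #{t} ENABLE TRIGGER ALL") }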
So the alternative I came up with is to append the INSERTs in structure.sql right after the table definitions and before the constraints are defined, and then simply reload that structure.sql. That way the INSERTs execute regardless of their order, without disabling any constraints, and restoring the database becomes simply reloading the structure.sql.
Let's look at the service that will help us manage that:
class CypressDumpManager
  EXCLUDE_TABLES = ['ar_internal_metadata', 'schema_migrations'].freeze

  class NonCypressShardError < StandardError
  end

  def initialize
    @connection = ActiveRecord::Base.connection
  end

  def generate_dump
    verify_cypress_shard

    create_inserts_file
    create_structure_sql_with_data
  end

  def restore_from_dump
    verify_cypress_shard

    @connection.execute('DROP SCHEMA public CASCADE; CREATE SCHEMA public;')
    @connection.execute(File.read(structure_with_data))
    @connection.reconnect!
    @connection.tables.each { |table| @connection.reset_pk_sequence!(table) }
    ActiveRecord::Base.connection.pool.disconnect!
  end

  private

  # to be defined later
  def create_inserts_file; end

  def create_structure_sql_with_data; end

  def transform_values(record); end

  def current_shard; end

  def inserts_file_name; end

  def structure_with_data; end

  def verify_cypress_shard; end
end
generate_dump calls two methods:
create_inserts_file - creates the dump file (.sql) containing INSERT statements:
....
`INSERT INTO {table1} (col1, col2) VALUES (val1, val2)`
`INSERT INTO {table2} (col1, col2, col3) VALUES (val1, val2, val3)`
...
create_structure_sql_with_data - creates a copy of structure.sql and appends the content of the dump file (generated by create_inserts_file) after the table definitions and right before any constraints or column alterations, which are indicated by ALTER TABLE ONLY. So we append our dump before the first match of that statement.
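The generated copy then ends up looking roughly like this (the students table and its columns are purely illustrative):

-- table definitions from structure.sql
CREATE TABLE public.students (
    id bigint NOT NULL,
    name character varying
);
...

-- appended dump generated by create_inserts_file
INSERT INTO public.students (id, name) VALUES (1, 'Ana');
...

-- constraints, starting with the first ALTER TABLE ONLY
ALTER TABLE ONLY public.students
    ADD CONSTRAINT students_pkey PRIMARY KEY (id);
...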
restore_from_dump - when called, it drops the public schema and reloads it using the new structure.sql, which also contains the initial data via INSERT statements.
Let's define the empty methods:
def create_inserts_file
  tables = @connection.tables - EXCLUDE_TABLES

  File.open(inserts_file_name, 'w+') do |file|
    tables.each do |table|
      model = table.singularize.classify.constantize
      columns = model.new.attributes.keys

      model.all.each do |record|
        values = transform_values(record)
        row = "INSERT INTO public.#{table} (#{columns.join(', ')}) VALUES (#{values});"
        file.puts(row)
      end
    end
  end
end

def create_structure_sql_with_data
  lines = File.readlines('db/structure.sql')
  first_match_detected = false

  File.open(structure_with_data, 'w+') do |file|
    lines.each do |line|
      if !first_match_detected && /^ALTER TABLE ONLY/.match?(line)
        File.readlines(inserts_file_name).each { |insert| file.puts insert }
        first_match_detected = true
      end

      file.puts line
    end
  end
end
To create the INSERT rows based on AR records, we use transform_values to make the column values insertable.
def transform_values(record)
  record.attributes.values.map do |value|
    case value
    when Time, Date, String
      @connection.quote(value)
    when NilClass
      'NULL'
    when ActiveSupport::HashWithIndifferentAccess, Hash
      "'#{value.to_json}'"
    when Array
      "'{#{value.join(',')}}'"
    else
      value
    end
  end.join(', ')
end
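For example, given a hypothetical Student record whose columns (in order) are id, name, tags (array) and meta (jsonb), transform_values would produce something like:

# purely illustrative; Student and its columns are assumptions for this example
record = Student.new(id: 1, name: 'Ana', tags: ['math', 'physics'], meta: { 'level' => 2 })

transform_values(record)
# => "1, 'Ana', '{math,physics}', '{\"level\":2}'"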
Let's define the rest of the empty methods:
def current_shard
  @current_shard ||= ActiveRecord::Base.current_shard
end

def inserts_file_name
  "/tmp/#{current_shard}_dump.sql"
end

def structure_with_data
  "/tmp/#{current_shard}_structure_with_data.sql"
end

def verify_cypress_shard
  unless ApplicationRecord::CYPRESS_SHARDS[current_shard]
    raise NonCypressShardError, "Cannot restore a non-cypress database: #{current_shard}"
  end
end
ActiveRecord::Base.current_shard gives us the database we're currently connected to, so before each restore we verify that we're actually working with a test database.
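CYPRESS_SHARDS is the lookup from Part 1 that knows which shards are cypress databases. A minimal sketch of something along these lines (the exact shape and shard names come from your Part 1 setup and are assumptions here):

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  # shard names are illustrative and must match the entries in database.yml
  CYPRESS_SHARDS = {
    cypress_db1: :cypress_db1,
    cypress_db2: :cypress_db2,
    cypress_db3: :cypress_db3
  }.freeze

  connects_to shards: {
    default:     { writing: :primary },
    cypress_db1: { writing: :cypress_db1 },
    cypress_db2: { writing: :cypress_db2 },
    cypress_db3: { writing: :cypress_db3 }
  }
end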
Both files generated per shard are saved in the /tmp folder of the server hosting the Rails app. That way, once the restore file is generated, the database it relates to can simply be restored via an endpoint call.
The set-up
With CypressDumpManager we can call:
CypressDumpManager.new.generate_dump
CypressDumpManager.new.restore_from_dump
to generate a database dump and later use it for restore.
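Both methods operate on whatever shard the current connection points to, so when calling them from a console or a script you would typically pin the connection first (the shard name below is illustrative):

ActiveRecord::Base.connected_to(shard: :cypress_db1, role: :writing) do
  CypressDumpManager.new.generate_dump
end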
To see how everything we covered in Part 1 and Part 2 is put into practice, let's assume the following context:
- Rails application in api mode as backend.
- React/Vue/Angular application as frontend.
- Cypress as E2E testing framework.
Now, in Rails let's:
- Configure database.yml with 3 cypress shards that will be used for testing (see the sketch after this list).
- Add the CypressConnection middleware so each request opens a connection to the right test database.
- Define a /cypress/seeds endpoint which seeds the initial testing data and then creates a dump:
# let's pretend YourSeedGenerator creates the initial data
YourSeedGenerator.generate
CypressDumpManager.new.generate_dump
- Define a /cypress/restore endpoint which restores the corresponding cypress database via:
CypressDumpManager.new.restore_from_dump
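As a rough sketch of that backend setup (shard names, database names and the controller below are illustrative; routing and namespacing are up to you), database.yml gains the three cypress shards:

# assumes the usual `default: &default` anchor at the top of database.yml
development:
  primary:
    <<: *default
    database: app_development
  cypress_db1:
    <<: *default
    database: app_cypress_db1
  cypress_db2:
    <<: *default
    database: app_cypress_db2
  cypress_db3:
    <<: *default
    database: app_cypress_db3

and both endpoints can be served by a minimal controller:

class CypressController < ApplicationController
  def seeds
    YourSeedGenerator.generate           # pretend seed generator from above
    CypressDumpManager.new.generate_dump
    head :ok
  end

  def restore
    CypressDumpManager.new.restore_from_dump
    head :ok
  end
end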
Let's configure Cypress to:
- Run each parallel process (test group) on a separate subdomain, and call /cypress/seeds before it starts.
- Call /cypress/restore before each test (in a beforeEach block).
With those steps you should have an example of a multi-db architecture that allows running Cypress in parallel, with the parallel processes isolated from each other in terms of database operations, and with a data cleaning mechanism that guarantees the same starting point for each test.
⚠️ Caveats
Such a set-up comes with some drawbacks:
- Working with threads becomes trickier. A new thread created via Thread.new opens a new database connection to the primary database by default. So if you want AR code inside Thread.new, you must ensure a connection to the right database is made (see the sketch after this list). Let's say a test from cypress-db2 is executing, so there's a connection to cypress-db2; then if you have something like:
Thread.new { Student.first.update name: 'John' }
Student.first will hit the primary database and not cypress-db2.
- If the Rails app communicates with third-party services, they might also need to be adapted to the multi-db setup. For example, if those services call internal endpoints, the calls may have to start including additional data in the headers or body of the requests to indicate which database should be used.
- If there are other processes alongside the Rails server, they may also need to be adapted, for example a background job processor.
- When sharding is used for optimizations like running cypress in parallel, increasing the number of parallel processes means adding databases manually.
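For the threads caveat above, one way to keep the AR calls inside the new thread on the right database is to capture the current shard and pin the thread's connection explicitly:

# capture the shard of the current request, then pin the new thread's
# connection to it before touching Active Record
shard = ActiveRecord::Base.current_shard

Thread.new do
  ActiveRecord::Base.connected_to(shard: shard, role: :writing) do
    Student.first.update(name: 'John') # now runs against the cypress shard
  end
end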
Moving to a multiple database setup should be approached carefully after weighing pros and cons with respect to the application needs and long-term development.
Really hoping this example will be helpful to you and inspire ideas for applying horizontal sharding in other use cases.