“The best time to write database-agnostic code is from day one. The second best time is before your migration.”
When we started building the Wigiwork API, we chose MySQL for its familiarity and ease of setup. As our requirements grew—AI vector embeddings, geospatial queries, graph routing—we realized PostgreSQL wasn’t just an alternative, it was a necessity.
This post documents our migration journey and shares the patterns that made it nearly painless.
Table of Contents
- Why We Migrated
- The Database-Agnostic Foundation
- The Migration Process
- PostgreSQL vs MySQL: A Technical Comparison
- Advanced PostgreSQL Features
- Multi-Schema Architecture
- Migration Tips and Lessons Learned
1. Why We Migrated
1.1 The Growing Feature Requirements
Our application started as a straightforward REST API. Over time, we needed:
| Requirement | MySQL Solution | PostgreSQL Solution |
|---|---|---|
| Vector similarity search (AI) | External service (Pinecone, Weaviate) | Native pgvector extension |
| Geospatial queries | Limited spatial functions | PostGIS (industry standard) |
| Graph routing | External graph DB | pgRouting extension |
| Full-text search | Basic FULLTEXT | Powerful tsvector/tsquery |
| JSON querying | JSON functions (limited) | JSONB with GIN indexes |
The math was simple: Three external services vs. one PostgreSQL instance.
1.2 The Hidden Costs of Multiple Databases
Running MySQL + Pinecone + a graph database meant:
- 3x infrastructure complexity — More services to monitor, scale, and pay for
- Data synchronization overhead — Keeping vector embeddings in sync with source data
- Network latency — Cross-service queries add milliseconds
- Operational burden — Different backup strategies, different failure modes
PostgreSQL with extensions gave us one database to rule them all.
1.3 Our Starting Point
Before migration, our codebase analysis revealed:
| Category | Count | Status |
|---|---|---|
| MySQL-specific SQL | 1 function | Required fix |
| JSON columns | 45 | Upgrade to JSONB |
| Check constraints | 11 | Already compatible |
| Raw SQL queries | 15+ | Already compatible |
| Configuration files | 4 | Required update |
98% of our code was already database-agnostic. This wasn’t luck—it was intentional architecture.
2. The Database-Agnostic Foundation
2.1 Why Database-Agnostic Code Matters
Writing database-agnostic code isn’t about planning to switch databases. It’s about:
- Portability — Your code works on any database Rails supports
- Testability — SQLite in CI, PostgreSQL in production
- Maintainability — Standard patterns are easier to understand
- Future-proofing — Requirements change; your code adapts
2.2 The Rules We Follow
Our AGENTS.md (development guidelines) includes this rule:
Rule #9: Database-agnostic queries. Use Arel and ActiveRecord abstractions instead of raw SQL or database-specific syntax. This ensures portability when migrating databases.
2.3 Patterns That Work Everywhere
✅ Use ActiveRecord Query Interface
# Good: Works on any database
User.where(status: :active)
.where("created_at > ?", 1.week.ago)
.order(created_at: :desc)
.limit(10)
✅ Use Arel for Complex Queries
# Good: Database-agnostic date ranges
class CreditTransaction < ApplicationRecord
  scope :by_requested_dates, ->(dates) {
    ranges = Array(dates).filter_map do |date|
      year, month = date.split("-").map(&:to_i)
      start_date = Time.zone.local(year, month, 1).beginning_of_day
      end_date = start_date.end_of_month.end_of_day

      arel_table[:created_at].between(start_date..end_date)
    end

    where(ranges.reduce(:or))
  }
end
✅ Use Standard SQL Functions
# Good: LOWER, COALESCE, CASE work everywhere
scope :search, ->(term) {
where("LOWER(name) LIKE :term OR LOWER(email) LIKE :term",
term: "%#{term.downcase}%")
}
# Good: COALESCE for fallbacks
order(Arel.sql("COALESCE(last_message_at, created_at) DESC"))
# Good: CASE for conditional ordering
order(Arel.sql("CASE WHEN status = 'urgent' THEN 0 ELSE 1 END"))
✅ Process Dates in Ruby
# Good: Fetch timestamps, format in Ruby
def self.date_buckets_for(user, field:)
  pluck(field)
    .filter_map { |timestamp| timestamp&.strftime("%Y-%m") }
    .uniq
    .first(24)
end
2.4 Patterns to Avoid
❌ MySQL-Specific Functions
# Bad: DATE_FORMAT is MySQL-only
month_sql = "DATE_FORMAT(created_at, '%Y-%m')"
# PostgreSQL equivalent: TO_CHAR(created_at, 'YYYY-MM')
# Better: Process in Ruby (see above)
❌ Database-Specific Syntax
# Bad: MySQL's IFNULL
where("IFNULL(deleted_at, NOW()) > ?", 1.day.ago)
# Good: Standard COALESCE
where("COALESCE(deleted_at, CURRENT_TIMESTAMP) > ?", 1.day.ago)
❌ String Concatenation Operators
# Bad: MySQL uses CONCAT(), PostgreSQL uses ||
select("CONCAT(first_name, ' ', last_name) AS full_name")
# Good: Use Ruby
def full_name
"#{first_name} #{last_name}"
end
3. The Migration Process
3.1 Pre-Migration Audit
Before touching any configuration, we audited the entire codebase:
# Find MySQL-specific SQL
grep -r "DATE_FORMAT\|GROUP_CONCAT\|IFNULL\|BINARY\s" app/
# Find raw SQL that might be problematic
grep -r "\.where.*Arel\.sql\|execute\|connection\." app/
# Check for MySQL gem references
grep -r "mysql" Gemfile config/
Our findings:
| Issue | Location | Fix |
|---|---|---|
| DATE_FORMAT | app/models/access_request.rb | Rewrote using Ruby |
| mysql2 gem | Gemfile | Replaced with pg |
| MySQL config | config/database.yml | Rewrote for PostgreSQL |
| Docker MySQL | docker-compose.dev.yml | Replaced with PostgreSQL |
3.2 The Critical Fix
The only code change required was in access_request.rb:
# Before: MySQL-specific
def self.date_buckets_for(user, field:)
  column = field.to_s
  qcol = connection.quote_column_name(column)
  month_sql = "DATE_FORMAT(#{qcol}, '%Y-%m')" # ❌ MySQL-only

  for_user(user)
    .group(Arel.sql(month_sql))
    .pluck(Arel.sql(month_sql))
end

# After: Database-agnostic Ruby
def self.date_buckets_for(user, field:)
  column = field.to_s
  return [] unless DATE_BUCKET_FIELDS.include?(column)

  for_user(user)
    .where.not(column => nil)
    .order(column => :desc)
    .limit(500)
    .pluck(column)
    .filter_map { |timestamp| timestamp&.strftime("%Y-%m") }
    .uniq
    .first(24)
    .filter_map do |month|
      start_at = Time.zone.parse("#{month}-01")
      next unless start_at

      {
        id: month,
        name: start_at.strftime("%B %Y"),
        from: start_at.beginning_of_month.beginning_of_day.iso8601,
        to: start_at.end_of_month.end_of_day.iso8601
      }
    end
end
3.3 Configuration Changes
Gemfile:
# Remove
gem 'mysql2', '~> 0.5.6'
# Add
gem 'pg', '~> 1.5'
config/database.yml:
default: &default
  adapter: postgresql
  encoding: unicode
  pool: <%= ENV.fetch("RAILS_MAX_THREADS", 5) %>
  username: <%= ENV.fetch("POSTGRES_USER", "postgres") %>
  password: <%= ENV.fetch("POSTGRES_PASSWORD", "") %>
  host: <%= ENV.fetch("POSTGRES_HOST", "127.0.0.1") %>
  port: <%= ENV.fetch("POSTGRES_PORT", 5432) %>

development:
  <<: *default
  database: wigiwork_development

test:
  <<: *default
  database: wigiwork_test

production:
  <<: *default
  database: <%= ENV["POSTGRES_DB"] %>
docker-compose.dev.yml:
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: password
      POSTGRES_DB: wigiwork_development
    ports:
      - "5433:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 20
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:
3.4 JSON to JSONB Migration
MySQL’s JSON type works, but PostgreSQL’s JSONB is superior:
# In migration files, change:
t.json :metadata
# To:
t.jsonb :metadata, default: {}
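For columns that already exist, changing the create_table definition isn't enough; convert them in place with an explicit cast. A minimal migration sketch (table and column names are illustrative):
class ConvertMetadataToJsonb < ActiveRecord::Migration[8.0]
  def up
    # USING tells PostgreSQL how to cast the existing json values
    change_column :products, :metadata, :jsonb, using: "metadata::jsonb"
    change_column_default :products, :metadata, from: nil, to: {}
  end

  def down
    change_column :products, :metadata, :json, using: "metadata::json"
    change_column_default :products, :metadata, from: {}, to: nil
  end
end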
Why JSONB?
| Feature | JSON | JSONB |
|---|---|---|
| Storage | Text (parsed each query) | Binary (pre-parsed) |
| Indexing | ❌ Not supported | ✅ GIN indexes |
| Operators | Basic | Rich (@>, ?, ?\|, ?&) |
| Query speed | Slower | Faster |
| Write speed | Faster | Slightly slower |
JSONB Query Examples:
# Find profiles with specific skill in JSONB array
Profile.where("skills @> ?", ['ruby'].to_json)
# Find users with specific setting
User.where("settings -> 'notifications' ->> 'email' = ?", 'true')
# Check if key exists
Product.where("metadata ? 'featured'")
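These operators only become index-assisted once the column has a GIN index; a short sketch using the illustrative tables from the queries above:
# GIN indexes let PostgreSQL answer @>, ? and related operators without a full scan
add_index :profiles, :skills, using: :gin
add_index :products, :metadata, using: :gin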
3.5 Running the Migration
# 1. Update gems
bundle install
# 2. Create databases
rails db:create
# 3. Run migrations
rails db:migrate
# 4. Verify schema
rails db:schema:dump
# 5. Run test suite
bundle exec rspec
# 6. Seed development data
rails db:seed
Total downtime: Zero (for development). Production required a maintenance window for data migration.
4. PostgreSQL vs MySQL: A Technical Comparison
4.1 Feature Comparison
| Feature | MySQL | PostgreSQL |
|---|---|---|
| JSON Support | JSON type, basic functions | JSONB with GIN indexes, rich operators |
| Full-Text Search | FULLTEXT indexes | tsvector/tsquery, ranking, dictionaries |
| Geospatial | Basic spatial | PostGIS (industry standard) |
| Vector Search | ❌ None | pgvector extension |
| Graph Queries | ❌ None | pgRouting, Apache AGE |
| Arrays | ❌ None | Native array types with operators |
| Range Types | ❌ None | int4range, daterange, etc. |
| Check Constraints | ✅ Supported | ✅ Supported |
| Partial Indexes | ❌ None | ✅ Supported |
| Expression Indexes | ❌ Limited | ✅ Full support |
| CTEs (WITH queries) | ✅ Basic | ✅ Recursive, materialized |
| Window Functions | ✅ Supported | ✅ More advanced |
| UPSERT | ON DUPLICATE KEY | ON CONFLICT (more flexible) |
| Transactions | ✅ InnoDB | ✅ MVCC, transactional DDL |
| Replication | ✅ Built-in | ✅ Streaming, logical |
4.2 Performance Characteristics
MySQL Strengths:
- Simple read-heavy workloads
- Large-scale web applications with basic queries
- When you need maximum reads/second on simple queries
PostgreSQL Strengths:
- Complex queries with many joins
- Write-heavy workloads (better MVCC)
- Data integrity (stricter by default)
- Advanced data types and indexing
4.3 Why PostgreSQL Won for Us
1. One Database for Everything
Instead of:
MySQL (primary data)
+ Pinecone (vector search)
+ Neo4j (graph queries)
+ Elasticsearch (full-text)
We have:
PostgreSQL
+ pgvector (vector search)
+ pgRouting (graph queries)
+ Built-in full-text search
2. Better Data Integrity
-- PostgreSQL: Strict by default
INSERT INTO users (email) VALUES ('not-an-email');
-- ERROR: violates check constraint
-- MySQL: Often silently truncates or converts
INSERT INTO users (name) VALUES ('this string is way too long for the column');
-- Silently truncated (depending on sql_mode)
3. Superior Indexing
# Partial index: Only index active users
add_index :users, :email, where: "status = 'active'"
# Expression index: Index lowercased emails
add_index :users, "LOWER(email)"
# GIN index on JSONB for fast JSON queries
add_index :profiles, :skills, using: :gin
4. Native Array Support
# PostgreSQL array columns
t.string :tags, array: true, default: []
# Query arrays
Profile.where("'ruby' = ANY(tags)")
Profile.where("tags @> ARRAY[?]", ['ruby', 'rails'])
5. Advanced PostgreSQL Features
5.1 pgvector: AI Vector Embeddings
Store and query vector embeddings directly in PostgreSQL:
# Enable extension (the neighbor gem is commonly paired with pgvector so Rails understands the vector type)
class EnablePgvector < ActiveRecord::Migration[8.0]
  def change
    enable_extension 'vector'
  end
end

# Add vector column
class AddEmbeddingToProfiles < ActiveRecord::Migration[8.0]
  def change
    add_column :profiles, :embedding, :vector, limit: 1536 # OpenAI embedding dimension
    add_index :profiles, :embedding, using: :ivfflat, opclass: :vector_cosine_ops
  end
end
# app/models/profile.rb
class Profile < ApplicationRecord
  def self.similar_to(embedding, limit: 10)
    # Build the vector literal and sanitize it rather than interpolating raw input
    vector = "[#{Array(embedding).join(',')}]"

    where.not(embedding: nil)
      .order(Arel.sql(sanitize_sql_array(["embedding <=> ?", vector])))
      .limit(limit)
  end
end

# Usage
similar_profiles = Profile.similar_to(openai_embedding, limit: 5)
5.2 PostGIS: Geospatial Queries
# Enable extension
enable_extension 'postgis'

# Add geography column (the :st_point type comes from the activerecord-postgis-adapter gem)
add_column :locations, :coordinates, :st_point, geographic: true
add_index :locations, :coordinates, using: :gist

# Find locations within 10km
Location.where(
  "ST_DWithin(coordinates, ST_MakePoint(?, ?)::geography, ?)",
  longitude, latitude, 10_000 # meters
)
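Wrapping the raw condition in a scope keeps the PostGIS SQL in one place; a sketch assuming the Location model and coordinates column above:
class Location < ApplicationRecord
  # Radius search in kilometres around a longitude/latitude pair
  scope :within_km, ->(longitude, latitude, km) {
    where(
      "ST_DWithin(coordinates, ST_MakePoint(?, ?)::geography, ?)",
      longitude, latitude, km * 1_000
    )
  }
end

# Usage
Location.within_km(13.4050, 52.5200, 10)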
5.3 pgRouting: Graph Algorithms
# Find shortest path between locations
# (cast the node ids to Integer so nothing unexpected is interpolated into the SQL)
execute <<-SQL
  SELECT * FROM pgr_dijkstra(
    'SELECT id, source, target, cost FROM edges',
    #{Integer(start_node)},
    #{Integer(end_node)},
    directed := true
  )
SQL
5.4 Full-Text Search
# Add tsvector column
add_column :articles, :searchable, :tsvector
add_index :articles, :searchable, using: :gin
# Keep it updated with a trigger
execute <<-SQL
  CREATE TRIGGER articles_searchable_update
  BEFORE INSERT OR UPDATE ON articles
  FOR EACH ROW EXECUTE FUNCTION
  tsvector_update_trigger(searchable, 'pg_catalog.english', title, body);
SQL

# Search with ranking (app/models/article.rb)
# Sanitize the term instead of interpolating it into the ORDER BY
def self.search_ranked(query)
  rank_sql = sanitize_sql_array(
    ["ts_rank(searchable, plainto_tsquery('english', ?)) DESC", query]
  )

  where("searchable @@ plainto_tsquery('english', ?)", query)
    .order(Arel.sql(rank_sql))
end
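On PostgreSQL 12+ you can skip the trigger entirely and let the database maintain the column as a stored generated column; a sketch assuming Rails 7+ virtual-column support:
# The column is recomputed automatically whenever title or body changes
add_column :articles, :searchable, :virtual,
  type: :tsvector,
  as: "to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))",
  stored: true
add_index :articles, :searchable, using: :gin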
6. Multi-Schema Architecture
6.1 What Are PostgreSQL Schemas?
PostgreSQL schemas are namespaces within a single database. Think of them as folders for your tables:
database: wigiwork_production
├── public (default schema)
│ ├── users
│ ├── profiles
│ └── companies
├── analytics
│ ├── events
│ ├── page_views
│ └── conversions
├── audit
│ ├── user_actions
│ └── api_requests
└── cache
├── cached_profiles
└── cached_searches
6.2 Benefits of Multi-Schema Architecture
1. Logical Separation Without Multiple Databases
# One connection pool, multiple schemas
class AnalyticsEvent < ApplicationRecord
  self.table_name = 'analytics.events'
end

class AuditLog < ApplicationRecord
  self.table_name = 'audit.user_actions'
end
2. Different Permissions Per Schema
-- Analytics team can only access analytics schema
GRANT USAGE ON SCHEMA analytics TO analytics_role;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO analytics_role;
-- No access to public schema
REVOKE ALL ON SCHEMA public FROM analytics_role;
3. Simplified Backup Strategies
# Backup only the audit schema
pg_dump -n audit wigiwork_production > audit_backup.sql
# Backup everything except cache
pg_dump -N cache wigiwork_production > production_backup.sql
4. Schema-Level Maintenance
-- Truncate all cache tables without affecting production
TRUNCATE TABLE cache.cached_profiles, cache.cached_searches;
-- Drop and recreate analytics schema
DROP SCHEMA analytics CASCADE;
CREATE SCHEMA analytics;
6.3 Setting Up Multi-Schema in Rails
Migration to create schemas:
class CreateAnalyticsSchema < ActiveRecord::Migration[8.0]
  def up
    execute "CREATE SCHEMA IF NOT EXISTS analytics"
    execute "CREATE SCHEMA IF NOT EXISTS audit"
    execute "CREATE SCHEMA IF NOT EXISTS cache"
  end

  def down
    execute "DROP SCHEMA IF EXISTS cache CASCADE"
    execute "DROP SCHEMA IF EXISTS audit CASCADE"
    execute "DROP SCHEMA IF EXISTS analytics CASCADE"
  end
end
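Tables are then created inside a schema by using the qualified name, which the PostgreSQL adapter understands; a sketch with an illustrative analytics.events table:
class CreateAnalyticsEvents < ActiveRecord::Migration[8.0]
  def change
    # A schema-qualified table name places the table in the analytics schema
    create_table "analytics.events" do |t|
      t.bigint :user_id, null: false
      t.string :name, null: false
      t.jsonb :properties, default: {}
      t.timestamps
    end
  end
end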
Models with schema prefixes:
# app/models/analytics/event.rb
module Analytics
  class Event < ApplicationRecord
    self.table_name = 'analytics.events'
  end
end

# app/models/audit/user_action.rb
module Audit
  class UserAction < ApplicationRecord
    self.table_name = 'audit.user_actions'
  end
end
Cross-schema queries work seamlessly:
# Join across schemas
User.joins("JOIN audit.user_actions ON audit.user_actions.user_id = users.id")
.where("audit.user_actions.action = ?", "login")
6.4 Schemas vs. Multiple Databases
| Aspect | Multiple Databases | Multiple Schemas |
|---|---|---|
| Connection pools | Separate per DB | Shared |
| Cross-queries | Complex/impossible | Native JOINs |
| Transactions | Distributed (2PC) | Single transaction |
| Backups | Per database | Per schema or together |
| Permissions | Database-level | Schema-level (granular) |
| Migrations | Separate per DB | Single migration |
| Complexity | Higher | Lower |
| Rails support | Multiple configs | Single config |
Our recommendation: Use schemas for logical separation within the same application. Use separate databases only for truly isolated services or multi-tenant SaaS.
6.5 When to Use Each Pattern
Use Multiple Schemas When:
- Separating concerns (analytics, audit, cache)
- Different retention policies per data type
- Granular permission control needed
- Data is related and needs cross-queries
Use Multiple Databases When:
- Complete data isolation required (multi-tenant)
- Different scaling requirements
- Regulatory compliance requires separation
- Microservices with independent lifecycles
7. Migration Tips and Lessons Learned
7.1 Before You Start
1. Audit your codebase thoroughly
# Find all raw SQL
grep -rn "execute\|\.sql\|Arel\.sql" app/ --include="*.rb"
# Find MySQL-specific functions
grep -rn "DATE_FORMAT\|IFNULL\|GROUP_CONCAT\|BINARY\|STRAIGHT_JOIN" app/
# Check which gems depend on mysql2
gem dependency mysql2 --reverse-dependencies
2. Run tests on PostgreSQL locally first
# config/database.yml (local and CI)
test:
  adapter: postgresql
  database: myapp_test
3. Document all discrepancies
Create a migration checklist (like our mysql_to_postgresql.md) tracking every issue found.
7.2 Data Migration Strategies
Option A: Fresh Start (Recommended for Development)
# On PostgreSQL
rails db:create db:migrate db:seed
Option B: pgloader (For Production Data)
# Install pgloader
brew install pgloader # or apt-get
# Create migration script
cat > migrate.load << 'EOF'
LOAD DATABASE
FROM mysql://root:password@localhost/myapp_production
INTO postgresql://postgres@localhost/myapp_production
WITH include drop, create tables, create indexes, reset sequences
SET work_mem to '16MB', maintenance_work_mem to '512 MB'
CAST type datetime to timestamptz using zero-dates-to-null,
type date to date using zero-dates-to-null
ALTER SCHEMA 'myapp_production' RENAME TO 'public';
EOF
# Run migration
pgloader migrate.load
Option C: Rails-Native Export/Import
# Export from MySQL
namespace :export do
task users: :environment do
File.open('users.json', 'w') do |f|
User.find_each { |u| f.puts u.to_json }
end
end
end
# Import to PostgreSQL
namespace :import do
task users: :environment do
File.readlines('users.json').each do |line|
User.create!(JSON.parse(line))
end
end
end
7.3 Common Gotchas
1. Auto-increment vs. SERIAL
MySQL uses AUTO_INCREMENT, PostgreSQL uses SERIAL or IDENTITY. Rails handles this automatically, but check sequences after migration:
-- Fix sequence if IDs are out of sync
SELECT setval('users_id_seq', (SELECT MAX(id) FROM users));
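The Rails PostgreSQL adapter can do the same per table, which is handy after a bulk import; a short sketch:
# Reset every table's primary-key sequence to match the imported data
conn = ActiveRecord::Base.connection
conn.tables.each { |table| conn.reset_pk_sequence!(table) }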
2. Boolean Handling
MySQL often stores booleans as TINYINT(1). PostgreSQL uses native BOOLEAN:
# May need explicit casting in raw SQL
where("active = true") # PostgreSQL
where("active = 1") # MySQL
3. Case Sensitivity
PostgreSQL string comparisons are case-sensitive by default:
# MySQL: case-insensitive by default
User.where(email: 'Test@Example.com') # Finds test@example.com
# PostgreSQL: case-sensitive
User.where("LOWER(email) = LOWER(?)", 'Test@Example.com')
# Or use citext extension for case-insensitive columns
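If you want MySQL-style behaviour back for specific columns, citext is the cleanest route; a sketch for an email column:
class MakeEmailsCaseInsensitive < ActiveRecord::Migration[8.0]
  def up
    enable_extension 'citext'
    change_column :users, :email, :citext
  end

  def down
    change_column :users, :email, :string
  end
end

# Plain equality now ignores case, as it did on MySQL
User.where(email: 'Test@Example.com')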
4. Group By Strictness
PostgreSQL requires all non-aggregated columns in GROUP BY:
# MySQL: Allows this (picks arbitrary value)
select("users.*, COUNT(orders.id)")
.joins(:orders)
.group("users.id")
# PostgreSQL: Must include all selected columns
select("users.id, users.email, COUNT(orders.id)")
.joins(:orders)
.group("users.id, users.email")
7.4 Post-Migration Checklist
# 1. Verify all tests pass
bundle exec rspec
# 2. Confirm the app is running on the PostgreSQL adapter
RAILS_ENV=test rails runner "puts ActiveRecord::Base.connection.adapter_name"
# 3. Regenerate schema
rails db:schema:dump
# 4. Verify schema.rb has PostgreSQL syntax
grep "create_table" db/schema.rb | head
# 5. Run security scan
bundle exec brakeman -q -w2
# 6. Regenerate API documentation
bundle exec rails rswag:specs:swaggerize
# 7. Performance baseline
rails runner "require 'benchmark'; puts Benchmark.measure { User.count }"
7.5 Lessons We Learned
1. Write database-agnostic code from day one
The single MySQL-specific function we had took 30 minutes to rewrite. If we’d had dozens, it would have been a multi-day effort.
2. Use JSONB defaults in migrations
# Always specify defaults for JSONB
t.jsonb :settings, default: {}
t.jsonb :tags, default: []
3. Leverage PostgreSQL-specific features gradually
Don’t rewrite everything immediately. Start with:
- JSONB for new features
- Partial indexes for performance
- Full-text search when needed
4. Monitor query performance post-migration
PostgreSQL’s query planner is different. Some queries may need new indexes:
# Add explain logging in development
ActiveRecord::Base.logger = Logger.new(STDOUT)
User.where(status: :active).explain
5. Document your extensions
# db/migrate/001_enable_extensions.rb
class EnableExtensions < ActiveRecord::Migration[8.0]
  def change
    # Core extensions we use
    enable_extension 'pgcrypto' # UUID generation
    enable_extension 'citext'   # Case-insensitive text

    # Feature-specific (enable when needed)
    # enable_extension 'vector'  # AI embeddings
    # enable_extension 'postgis' # Geospatial
  end
end
Conclusion
Migrating from MySQL to PostgreSQL doesn’t have to be painful. With database-agnostic code patterns and a systematic approach, our migration required:
- 1 code fix (DATE_FORMAT → Ruby)
- 4 configuration file updates
- 0 model changes (beyond JSON → JSONB)
- 0 downtime (for development)
The benefits far outweigh the effort:
| Before (MySQL + Services) | After (PostgreSQL) |
|---|---|
| 3+ external services | 1 database |
| Multiple connection pools | Single pool |
| Cross-service latency | Native queries |
| Complex sync logic | Referential integrity |
| Higher infrastructure cost | Lower cost |
PostgreSQL isn’t just a MySQL replacement—it’s a platform for building sophisticated applications. With extensions like pgvector, PostGIS, and pgRouting, you can consolidate your entire data layer into a single, powerful database.
Key Takeaways
- Write database-agnostic code — Use ActiveRecord/Arel, avoid raw SQL functions
- Audit before migrating — Find MySQL-specific code early
- Use JSONB over JSON — Better indexing, better queries
- Consider schemas over databases — Simpler architecture, better performance
- Leverage PostgreSQL features — Partial indexes, arrays, full-text search
- Plan for extensions — pgvector, PostGIS, pgRouting when needed
- Test thoroughly — Run your full suite on PostgreSQL before production
The best database is the one that grows with your needs. PostgreSQL does exactly that.
Resources
- PostgreSQL Official Docs: postgresql.org/docs
- pgvector Extension: github.com/pgvector/pgvector
- PostGIS Documentation: postgis.net/docs
- pgRouting Documentation: pgrouting.org
- Rails PostgreSQL Guide: guides.rubyonrails.org/active_record_postgresql.html
- pgloader: pgloader.io
This post documents our migration of the Wigiwork API from MySQL to PostgreSQL. The patterns described here enabled a migration with minimal code changes and zero regressions—proving that investment in database-agnostic code pays dividends when requirements evolve.