“Good seed data is the difference between a demo that impresses and a demo that embarrasses.”
Every growing Rails application faces the same challenge: how do you populate your database with realistic data for development, demos, testing, and stakeholder presentations? Simple db/seeds.rb files work for small projects, but as your application scales, so do your seeding requirements.
This post documents how we transformed a 3,600-line monolithic seed file into an enterprise-ready data generation system that creates 100,000+ full candidate profiles with all associations—serving AI training, QA stress testing, and frontend performance validation.
Table of Contents
- The Problem: Why Seed Data Matters
- Before: The Monolithic Nightmare
- The Refactoring Journey
- Feature 1: Modular Step Architecture
- Feature 2: Context Object for State Management
- Feature 3: Conditional Execution via ENV Flags
- Feature 4: Progress Reporting with Timing
- Feature 5: Post-Seed Data Validation
- Feature 6: 100k+ Full Profile Scaling
- Benefits and Use Cases
- Testing the Seed System
- Three Bonus Ideas Worth Implementing
- Conclusion
1. The Problem: Why Seed Data Matters
1.1 Who Uses Seed Data?
| Stakeholder | What They Need | Why It Matters |
|---|---|---|
| Frontend Developers | Realistic UI data | Test pagination, search, empty states |
| Backend Developers | Consistent test fixtures | Debug API responses locally |
| QA Team | Volume for stress testing | Find N+1 queries, memory leaks |
| AI/ML Team | Training data at scale | Validate models with realistic distributions |
| Product Managers | Demo-ready environments | Impress stakeholders, test features |
| CEO/Leadership | Production-like demos | Investor presentations, board meetings |
| New Engineers | Quick environment setup | Onboard in minutes, not hours |
1.2 The Reality Check
Most Rails projects start with something like this:
# db/seeds.rb
User.create!(email: 'admin@example.com', password: 'password')
Company.create!(name: 'Test Company')
# ... done
Then requirements grow:
- “We need 50 users with different roles”
- “Profiles need skills, work experiences, education”
- “QA needs 100,000 profiles for load testing”
- “AI team needs realistic job title distributions”
- “CEO wants the demo to look like a real platform”
Before you know it, your seed file is 3,600 lines of spaghetti.
1.3 Our Starting Point
| Metric | Value |
|---|---|
| Seed file size | 3,610 lines |
| Total methods | 59 |
| Instance variables | 8 (shared state) |
| Dependencies | Complex, undocumented |
| Test coverage | 0% |
| Documentation | None |
Pain points:
- ❌ Changes broke unrelated features
- ❌ No way to run individual steps
- ❌ No progress indication
- ❌ No validation of seeded data
- ❌ Impossible to scale to 100k records
- ❌ New engineers couldn’t understand the flow
2. Before: The Monolithic Nightmare
2.1 The Original Structure
# db/seeds/steps.rb - 3,610 lines!
module Seeds
module Steps
class << self
def run
seed_predefined_avatar_assets
seed_company
seed_job_titles
seed_admin_users
# ... 32 more method calls
print_summary
end
private
def seed_company
@company = Company.find_or_initialize_by(name: 'Bluegeko')
@company.assign_attributes(
# ... 50 lines of attributes
)
@company.save!
# ... 200 more lines for demo companies
end
def seed_admin_users
@admin_user = User.find_or_initialize_by(email: 'admin@bluegeko.com')
# ... 300 lines
end
# ... 56 more methods
end
end
end
2.2 The Problems
| Problem | Impact |
|---|---|
| Single 3,600-line file | IDE struggles, git conflicts |
| Implicit dependencies | Can’t run steps individually |
| Instance variable soup | @company used 28 times across methods |
| No progress feedback | “Is it stuck or working?” |
| No validation | Broken seeds discovered in production demos |
| No scalability | Creating 100k records = 100k individual inserts |
| Zero documentation | New engineers lost for days |
2.3 Instance Variable Reference Count
| Variable | References |
|---|---|
@company |
28 |
@admin_user |
18 |
@candidate_demo_user |
15 |
@access_request_user |
14 |
@standard_user |
8 |
@full_data_candidate_users |
5 |
@demo_job_posts |
5 |
@admin_profile |
2 |
These shared variables made refactoring terrifying—change one method, break five others.
3. The Refactoring Journey
3.1 Implementation Priority Matrix
| # | Feature | Effort | Priority | Status |
|---|---|---|---|---|
| 1 | Modular step architecture | High | Critical | ✅ Done |
| 3 | Conditional ENV flags | Low | High | ✅ Done |
| 4 | Context object | Medium | High | ✅ Done |
| 5 | Progress + timing | Medium | High | ✅ Done |
| 6 | Data validation | Low | High | ✅ Done |
| 7 | 100k batch processing | High | Very High | ✅ Done |
| 2 | Parallel execution | Medium-High | Medium | ⏳ Future |
3.2 Final Architecture
db/seeds/
├── seeds.rb # Entry point (29 lines)
├── support.rb # Helpers (387 lines)
├── reference_data.rb # Lookup tables (889 lines)
├── steps.rb # Orchestrator (217 lines)
├── context.rb # State management (233 lines)
├── timing.rb # Progress tracking (186 lines)
├── validation.rb # Data integrity (331 lines)
├── batch_profiles.rb # 100k scaling (825 lines)
├── README.md # Documentation (1,797 lines)
├── SEEDS_REFACTORING.md # Implementation notes (643 lines)
└── steps/ # 37 individual step files
├── 01_predefined_avatar_assets.rb
├── 02_company.rb
├── ...
└── 37_batch_profiles.rb
Result: 3,610 lines → 37 focused files averaging 80 lines each
4. Feature 1: Modular Step Architecture
4.1 The Approach: Singleton Class Reopening
Instead of a massive refactor, we used Ruby’s ability to reopen classes:
# db/seeds/steps/02_company.rb
module Seeds
module Steps
class << self
private
def seed_company
ctx.company = Company.find_or_initialize_by(name: "Bluegeko")
ctx.company.assign_attributes(company_attributes)
ctx.company.save!
log_seed "Company upserted: #{ctx.company.name}"
seed_demo_companies
end
def company_attributes
{
industry: "Technology",
website_url: "https://bluegeko.com",
# ... attributes
}
end
end
end
end
4.2 Orchestrator Pattern
The main steps.rb became a thin orchestrator:
# db/seeds/steps.rb
module Seeds
module Steps
OPTIONAL_STEPS = {
seed_reports: "SKIP_SEED_REPORTS",
seed_full_data_candidates: "SEED_FULL_CANDIDATES",
seed_batch_profiles: "CREATE_BATCH_PROFILES"
}.freeze
class << self
def run
initialize_context
Seeds::Support::Timing.start_run(all_steps.size)
# Execute steps in order
run_step :seed_predefined_avatar_assets
run_step :seed_company
run_step :seed_job_titles
# ... 34 more steps
run_step :seed_batch_profiles
# Validate and summarize
validate_seed_data
print_summary
Seeds::Support::Timing.print_summary
ctx
end
private
def run_step(step_name)
return if skip_step?(step_name)
Seeds::Support::Timing.track_step(step_name) do
send(step_name)
end
end
def skip_step?(step_name)
env_var = OPTIONAL_STEPS[step_name]
return false unless env_var
if env_var.start_with?("SKIP_")
ENV[env_var] == "true"
else
ENV[env_var] != "true"
end
end
end
end
end
4.3 Benefits of Modular Architecture
| Aspect | Before | After |
|---|---|---|
| File size | 3,610 lines | ~80 lines avg |
| Git conflicts | Constant | Rare |
| IDE navigation | Painful | Instant |
| Step isolation | Impossible | Easy |
| Testing | Untestable | Per-step specs |
| Onboarding | Days | Hours |
5. Feature 2: Context Object for State Management
5.1 The Problem: Instance Variable Chaos
# Before: 95 instance variable references across 37 steps
def seed_admin_users
@admin_user = User.create!(...)
end
def seed_job_posts
job = JobPost.create!(company: @company) # Where is @company set?
end
Issues:
- No IDE autocomplete
- No documentation of what’s available
- No validation that required data exists
- Testing requires complex setup
5.2 The Solution: Explicit Context Object
# db/seeds/context.rb
module Seeds
module Support
class Context
# Core entities
attr_accessor :company
attr_accessor :admin_user, :standard_user
attr_accessor :candidate_demo_user, :access_request_user
attr_accessor :admin_profile, :full_data_candidate_users
attr_accessor :demo_job_posts
# Client roles
attr_accessor :client_user, :client_owner, :client_manager
# Batch processing
attr_accessor :batch_profile_result
def initialize
@full_data_candidate_users = []
@demo_job_posts = []
end
# Convenience methods
def company?
!@company.nil?
end
def batch_profiles?
@batch_profile_result&.success? == true
end
def batch_profile_count
@batch_profile_result&.profiles_created || 0
end
# Dependency validation
def require!(*attrs)
missing = attrs.select { |attr| send(attr).nil? }
raise "Missing required context: #{missing.join(', ')}" if missing.any?
end
# Debugging helper
def summary
{
company: @company&.name,
admin_user: @admin_user&.email,
full_data_candidates: @full_data_candidate_users.size,
batch_profiles: batch_profile_count
}
end
end
end
end
5.3 Usage in Steps
# Access via ctx helper
def seed_job_posts
ctx.require!(:company, :admin_user) # Fail fast if deps missing
job = JobPost.create!(
company: ctx.company,
user: ctx.admin_user,
title: "Senior Engineer"
)
ctx.demo_job_posts << job
end
5.4 Why Context Object Matters
| Aspect | Instance Variables | Context Object |
|---|---|---|
| Discovery | grep @ across files |
ctx. autocomplete |
| Documentation | None | attr_accessor list |
| Validation | Runtime errors | ctx.require! |
| Testing | Mock everything | Inject test context |
| Debugging | puts everywhere | ctx.summary |
6. Feature 3: Conditional Execution via ENV Flags
6.1 Use Cases
| Use Case | ENV Flag | Behavior |
|---|---|---|
| Skip slow reports | SKIP_SEED_REPORTS=true |
Reports step skipped |
| Create 40 full candidates | SEED_FULL_CANDIDATES=true |
Rich candidate data |
| Create 100k profiles | CREATE_BATCH_PROFILES=true |
Batch processing enabled |
6.2 Implementation
# db/seeds/steps.rb
OPTIONAL_STEPS = {
# Skip steps (default: run)
seed_reports: "SKIP_SEED_REPORTS",
# Enable steps (default: skip)
seed_full_data_candidates: "SEED_FULL_CANDIDATES",
seed_full_data_candidates_job_board: "SEED_FULL_CANDIDATES",
seed_batch_profiles: "CREATE_BATCH_PROFILES"
}.freeze
def skip_step?(step_name)
env_var = OPTIONAL_STEPS[step_name]
return false unless env_var
if env_var.start_with?("SKIP_")
ENV[env_var] == "true" # Skip if true
else
ENV[env_var] != "true" # Skip unless true
end
end
6.3 Usage Examples
# Standard development seed (fast)
rails db:seed
# Full candidates for feature testing
SEED_FULL_CANDIDATES=true rails db:seed
# 100k profiles for QA stress testing
CREATE_BATCH_PROFILES=true rails db:seed
# Skip reports for faster iteration
SKIP_SEED_REPORTS=true rails db:seed
# Combined flags
SEED_FULL_CANDIDATES=true CREATE_BATCH_PROFILES=true rails db:seed
6.4 Benefits
| Scenario | Without Flags | With Flags |
|---|---|---|
| Daily development | ~5 min (full seed) | ~30 sec (minimal) |
| Feature testing | Edit code | Set ENV var |
| QA stress test | Impossible | One command |
| CI pipeline | Full seed always | Skip reports |
7. Feature 4: Progress Reporting with Timing
7.1 The Problem: Silent Seeds
$ rails db:seed
# ... silence for 30 seconds ...
# ... more silence ...
# Is it working? Is it stuck?
7.2 The Solution: Real-Time Progress
# db/seeds/timing.rb
module Seeds
module Support
module Timing
class << self
def start_run(total_steps)
@total_steps = total_steps
@current_step = 0
@step_timings = {}
@run_started_at = Time.current
end
def track_step(step_name)
@current_step += 1
started = Time.current
print_progress(step_name) unless quiet?
yield
duration = Time.current - started
@step_timings[step_name] = duration
end
def print_progress(step_name)
puts format(
"[%2d/%2d] %-40s",
@current_step, @total_steps,
step_name.to_s.gsub(/^seed_/, "").humanize
)
end
def print_summary
puts "\n#{'=' * 60}"
puts "SEED TIMING SUMMARY"
puts "=" * 60
@step_timings.sort_by { |_, v| -v }.first(10).each do |step, duration|
puts format(" %-40s %6.2fs", step, duration)
end
puts "-" * 60
puts format(" %-40s %6.2fs", "TOTAL", total_duration)
puts "=" * 60
end
end
end
end
end
7.3 Output Example
[ 1/37] Predefined avatar assets
[ 2/37] Company
[ 3/37] Job titles
[ 4/37] Admin users
...
[35/37] Summary
[36/37] Demo recommendation
============================================================
SEED TIMING SUMMARY
============================================================
seed_full_data_candidates 12.34s
seed_random_profiles 8.21s
seed_homepage_profiles 4.56s
seed_job_posts 3.12s
...
------------------------------------------------------------
TOTAL 45.67s
============================================================
7.4 Benefits
| Aspect | Silent Seeds | Progress + Timing |
|---|---|---|
| Feedback | None | Real-time progress |
| Debugging | Which step failed? | Clear step names |
| Optimization | Guess which is slow | Timing breakdown |
| CI logs | Useless | Actionable |
8. Feature 5: Post-Seed Data Validation
8.1 The Problem: Silent Failures
# Seeds "complete" but data is broken
def seed_admin_users
@admin_user = User.find_or_initialize_by(email: 'admin@bluegeko.com')
@admin_user.save # No bang! Silent failure
end
# Later, in production demo...
# "Why is there no admin user?!"
8.2 The Solution: Validation Module
# db/seeds/validation.rb
module Seeds
module Support
module Validation
CHECKS = {
# Integrity checks (errors)
orphaned_profiles: -> { Profile.left_joins(:user).where(users: { id: nil }) },
candidates_without_profiles: -> { User.candidate.left_joins(:profile).where(profiles: { id: nil }) },
orphaned_job_posts: -> { JobPost.left_joins(:company).where(companies: { id: nil }) },
# Minimum data checks (errors)
minimum_users: -> { User.count >= 5 },
minimum_profiles: -> { Profile.count >= 3 },
minimum_companies: -> { Company.count >= 1 },
minimum_job_posts: -> { JobPost.count >= 5 },
# Quality checks (warnings)
profiles_with_skills: -> { Profile.joins(:skills).distinct.count > 0 },
profiles_with_work_experience: -> { Profile.joins(:work_experiences).distinct.count > 0 },
users_with_login_methods: -> { User.joins(:login_methods).distinct.count > 0 }
}.freeze
class << self
def run
return skip_result if skip?
result = ValidationResult.new
run_integrity_checks(result)
run_minimum_checks(result)
run_quality_checks(result) unless strict?
print_results(result)
result
end
private
def run_integrity_checks(result)
%i[orphaned_profiles candidates_without_profiles orphaned_job_posts].each do |check|
count = CHECKS[check].call.count
if count > 0
result.add_error(check, "Found #{count} #{check.to_s.humanize.downcase}")
else
result.add_passed(check)
end
end
end
end
end
end
end
8.3 Validation Output
============================================================
POST-SEED VALIDATION
============================================================
Integrity Checks:
✓ No orphaned profiles
✓ No candidates without profiles
✓ No orphaned job posts
Minimum Data Checks:
✓ Users: 47 (minimum: 5)
✓ Profiles: 42 (minimum: 3)
✓ Companies: 5 (minimum: 1)
✓ Job Posts: 25 (minimum: 5)
Quality Checks:
✓ Profiles with skills: 38
✓ Profiles with work experience: 35
⚠ Users with login methods: 0 (warning)
------------------------------------------------------------
Result: PASSED (14 checks, 1 warning)
============================================================
8.4 ENV Controls
# Skip validation (CI speed)
SKIP_SEED_VALIDATION=true rails db:seed
# Strict mode (fail on warnings)
SEED_VALIDATION_STRICT=true rails db:seed
9. Feature 6: 100k+ Full Profile Scaling
9.1 The Challenge
Our AI team needed 100,000 realistic candidate profiles for model training. QA needed the same for stress testing. Creating records one-by-one would take hours.
Requirements:
- Full profiles with ALL associations
- Category-appropriate job titles
- Skills, work experiences, education
- Portfolio, contact info, avatars
- Realistic data distributions
- Conditional execution (don’t run in daily dev)
9.2 What Gets Created
Each batch profile is a complete candidate with:
| Association | Per Profile | Total (100k) |
|---|---|---|
| User + Login Security | 1 | 100,000 |
| Profile | 1 | 100,000 |
| Skills | 3-6 | ~450,000 |
| Work Experiences | 2-4 | ~300,000 |
| Education Items | 1-2 | ~150,000 |
| Languages | 1-3 | ~200,000 |
| Contact Infos | 2-4 | ~300,000 |
| Portfolio + Links | 3-9 | ~600,000 |
| Location Setting | 1 | 100,000 |
| Salary Expectation | 1 | 100,000 |
| Avatar Session + Images | 4-5 | ~450,000 |
| Total Records | ~20-30 | ~2,500,000+ |
9.3 Category-Specific Data
CATEGORY_OCCUPATIONS = {
"technology" => [
"Software Engineer", "Senior Software Engineer", "Staff Software Engineer",
"DevOps Engineer", "Data Scientist", "Machine Learning Engineer",
"Mobile Developer", "Cloud Architect", "Security Engineer"
],
"design" => [
"UX Designer", "Product Designer", "Creative Director",
"Interaction Designer", "Design Systems Lead"
],
"business" => [
"Project Manager", "Product Manager", "Business Analyst",
"Strategy Consultant", "Operations Manager"
],
# ... 9 categories total
}.freeze
CATEGORY_SKILLS = {
"technology" => %w[Ruby Python JavaScript TypeScript Docker Kubernetes AWS],
"design" => %w[Figma Sketch Photoshop Wireframing UserResearch],
"business" => %w[Excel Salesforce Tableau Leadership Strategy],
# ...
}.freeze
9.4 Implementation Highlights
# db/seeds/batch_profiles.rb
module Seeds
module Support
module BatchProfiles
DEFAULT_COUNT = 100_000
CHUNK_SIZE = 500
def run(count: target_count, on_progress: nil)
result = BatchResult.new(target_count: count)
count.times.each_slice(chunk_size) do |indices|
chunk_result = process_chunk(indices)
result.record_chunk(chunk_result)
on_progress&.call(result)
end
result.finish!
result
end
private
def process_chunk(indices)
indices.each do |index|
# Select random category
category = categories.sample
# Create user (ActiveRecord for Devise)
user = create_batch_user(index: index)
# Create profile with category-appropriate job title
profile = create_batch_profile(user: user, category: category)
# Create ALL associations
create_skills(profile, category.slug)
create_work_experiences(profile, category.slug)
create_education_items(profile)
create_languages(profile)
create_contact_infos(profile)
create_portfolio(profile)
create_location_setting(profile)
create_salary_expectation(profile)
create_avatar_data(profile)
end
end
end
end
end
9.5 Usage
# Create 100,000 full profiles (default)
CREATE_BATCH_PROFILES=true rails db:seed
# Create 10,000 for smaller tests
CREATE_BATCH_PROFILES=true BATCH_PROFILE_COUNT=10000 rails db:seed
# With verbose progress
CREATE_BATCH_PROFILES=true BATCH_VERBOSE=true rails db:seed
9.6 Output
======================================================================
[BATCH] Starting FULL candidate profile creation
======================================================================
Target count: 100,000
Chunk size: 500
Associations: Skills, Work Exp, Education, Languages, Contacts, etc.
======================================================================
[=======>--------------------] 25.0% | Chunk 50/200 | 25000 profiles | 625,000 assoc
======================================================================
[BATCH] Full candidate profile creation complete!
======================================================================
Users created: 100,000
Profiles created: 100,000
Associations created: 2,543,721
Total duration: 3h 12m
Rate: 8.7 profiles/second
======================================================================
9.7 Performance Characteristics
| Count | Time | Records Created |
|---|---|---|
| 1,000 | ~2-3 min | ~25,000 |
| 10,000 | ~20-30 min | ~250,000 |
| 100,000 | ~3-5 hours | ~2,500,000 |
10. Benefits and Use Cases
10.1 Use Case Matrix
| Stakeholder | Command | Result |
|---|---|---|
| Daily Dev | rails db:seed |
~30s, minimal data |
| Feature Testing | SEED_FULL_CANDIDATES=true rails db:seed |
~2 min, 40 rich profiles |
| QA Stress Test | CREATE_BATCH_PROFILES=true rails db:seed |
~3h, 100k profiles |
| AI Training | CREATE_BATCH_PROFILES=true BATCH_PROFILE_COUNT=500000 rails db:seed |
~15h, 500k profiles |
| CEO Demo | rails db:seed |
Production-like feel |
| New Engineer | rails db:seed |
Working environment in 30s |
10.2 Before vs After Comparison
| Aspect | Before | After |
|---|---|---|
| File organization | 1 file, 3,610 lines | 37 files, ~80 lines each |
| Test coverage | 0% | 400+ specs |
| Documentation | None | 1,797-line README |
| Progress feedback | Silent | Real-time with timing |
| Data validation | None | 14 automated checks |
| Scalability | ~100 profiles | 100,000+ profiles |
| Conditional execution | Edit code | ENV flags |
| Onboarding time | Days | Hours |
| Git conflicts | Daily | Rare |
10.3 ROI
Time saved per week:
- 5 engineers × 10 min/day waiting for seeds = 4+ hours
- 2 QA engineers × 1 hour/week debugging broken seeds = 2 hours
- 1 demo preparation × 2 hours = 2 hours
Total: ~8 hours/week = 1 FTE day
11. Testing the Seed System
11.1 Test Structure
spec/seeds/
├── support/
│ ├── helpers_spec.rb # Unit tests for helpers
│ ├── context_spec.rb # Context object tests
│ ├── timing_spec.rb # Timing module tests
│ ├── validation_spec.rb # Validation tests
│ └── batch_profiles_spec.rb # Batch processing tests
├── reference_data_spec.rb # Reference data tests
├── integration_spec.rb # Full seed run tests
└── steps/
├── shared_examples.rb # Reusable step examples
├── company_spec.rb # Step-specific tests
└── ...
11.2 Shared Examples for Steps
# spec/seeds/steps/shared_examples.rb
RSpec.shared_examples "a seed step that creates records" do |model_class, expected_count:|
it "creates #{expected_count} #{model_class.name.pluralize}" do
expect { run_step }.to change(model_class, :count).by(expected_count)
end
end
RSpec.shared_examples "a seed step that is idempotent" do
it "does not create duplicates on second run" do
run_step
expect { run_step }.not_to change(described_model, :count)
end
end
11.3 Test Coverage
| Component | Specs | Coverage |
|---|---|---|
| Helpers | 45 | 100% |
| Context | 32 | 100% |
| Timing | 28 | 100% |
| Validation | 31 | 100% |
| Batch Profiles | 49 | 100% |
| Reference Data | 89 | 100% |
| Individual Steps | 126 | Key paths |
| Total | 400+ | High |
12. Three Bonus Ideas Worth Implementing
12.1 Idea 1: Seed Data Snapshots
Problem: Recreating 100k profiles takes 3 hours every time.
Solution: Database snapshots for instant restore.
# Potential implementation
module Seeds
module Snapshots
def save(name)
`pg_dump wigiwork_development > snapshots/#{name}.sql`
end
def restore(name)
`psql wigiwork_development < snapshots/#{name}.sql`
end
end
end
# Usage
# After 3-hour batch: SEED_SNAPSHOT=save rails db:seed
# Next time: SEED_SNAPSHOT=restore:batch_100k rails db:seed (~30 sec)
Benefit: 3 hours → 30 seconds for QA environments.
12.2 Idea 2: Faker Seed Consistency
Problem: Random data changes every seed run, breaking visual regression tests.
Solution: Seed Faker’s random generator.
# db/seeds/support.rb
def with_consistent_faker
original_seed = Faker::Config.random
Faker::Config.random = Random.new(42)
yield
ensure
Faker::Config.random = original_seed
end
# Usage
with_consistent_faker do
seed_profiles # Same names, emails, etc. every time
end
Benefit: Consistent screenshots for visual regression testing.
12.3 Idea 3: Seed Data Metrics Dashboard
Problem: No visibility into what’s seeded without running queries.
Solution: Post-seed metrics report.
# db/seeds/metrics.rb
module Seeds
module Metrics
def report
{
users: {
total: User.count,
by_role: User.group(:role).count,
with_profiles: User.joins(:profile).count
},
profiles: {
total: Profile.count,
with_skills: Profile.joins(:skills).distinct.count,
avg_skills: Profile.joins(:skills).group(:id).count.values.sum.to_f / Profile.count,
by_category: Profile.joins(:category).group("categories.slug").count
},
companies: {
total: Company.count,
with_jobs: Company.joins(:job_posts).distinct.count
}
}
end
end
end
Output:
Users:
total: 100,047
by_role: { admin: 2, candidate: 100,040, client: 5 }
with_profiles: 100,042
Profiles:
total: 100,042
with_skills: 99,800
avg_skills: 4.2
by_category: { technology: 45,000, design: 15,000, ... }
Companies:
total: 5
with_jobs: 3
Benefit: Instant insight into seed data quality and distribution.
13. Conclusion
13.1 What We Achieved
| Metric | Before | After |
|---|---|---|
| Lines in main file | 3,610 | 217 |
| Number of files | 1 | 37 |
| Test coverage | 0% | 400+ specs |
| Documentation | 0 lines | 1,797 lines |
| Max profiles | ~100 | 100,000+ |
| Progress feedback | None | Real-time |
| Data validation | None | 14 checks |
| Conditional execution | None | 5 ENV flags |
13.2 Key Takeaways
- Modular beats monolithic — Split large files into focused modules
- Explicit state beats implicit — Context objects over instance variables
- Conditional execution is essential — ENV flags for different scenarios
- Visibility prevents pain — Progress reporting and timing
- Validation catches bugs — Automated checks before they reach demos
- Scale when needed — Batch processing for volume requirements
- Document everything — Future you (and new engineers) will thank you
13.3 The Real Win
The technical improvements are measurable, but the real win is confidence:
- Confident that seeds work (validation)
- Confident about progress (timing)
- Confident in isolation (modular files)
- Confident at scale (batch processing)
- Confident for demos (realistic data)
- Confident for new engineers (documentation)
Invest in your seed system. It’s the foundation everything else is built on.
Resources
- Rails Guides on Seeds: guides.rubyonrails.org/active_record_migrations.html#migrations-and-seed-data
- Faker Gem: github.com/faker-ruby/faker
- FactoryBot: github.com/thoughtbot/factory_bot
- Database Cleaner: github.com/DatabaseCleaner/database_cleaner
- Our Seed System Documentation: See
db/seeds/README.mdin your codebase
This post documents the transformation of the Wigiwork seed system from a 3,600-line monolith to an enterprise-grade data generation platform. The patterns described here—modular architecture, context objects, conditional execution, progress tracking, validation, and batch processing—are applicable to any Rails application that has outgrown simple seeds.