CockroachDB Schema Design Rule

This rule provides comprehensive guidance for designing efficient schemas in CockroachDB's distributed environment. It covers primary key selection, partitioning strategies, index optimization, and data distribution patterns that maximize performance while avoiding common pitfalls like hotspots and range splits.

## Implementation

### 1. Primary Key Design Patterns

#### Avoid Sequential Primary Keys

```sql
-- Bad: Sequential primary key causes hotspots
CREATE TABLE orders_bad (
    id SERIAL PRIMARY KEY,
    customer_id INT,
    order_date TIMESTAMP,
    total DECIMAL(10,2)
);

-- Good: UUID primary key for distributed writes
CREATE TABLE orders_good (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id INT,
    order_date TIMESTAMP,
    total DECIMAL(10,2)
);

-- Better: Composite primary key for natural distribution
CREATE TABLE orders_better (
    customer_id INT,
    order_date DATE,
    order_id UUID DEFAULT gen_random_uuid(),
    total DECIMAL(10,2),
    PRIMARY KEY (customer_id, order_date, order_id)
);
```

#### Time-Based Primary Keys

```sql
-- Good: Time-based UUID for chronological ordering
CREATE TABLE events (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_time TIMESTAMPTZ DEFAULT NOW(),
    event_type STRING,
    payload JSONB,
    -- Index for time-based queries
    INDEX idx_events_time (event_time)
);

-- Better: Combine time with a hash shard for distribution
CREATE TABLE events_distributed (
    shard_id INT DEFAULT (random() * 100)::INT,
    event_time TIMESTAMPTZ DEFAULT NOW(),
    event_id UUID DEFAULT gen_random_uuid(),
    event_type STRING,
    payload JSONB,
    PRIMARY KEY (shard_id, event_time, event_id)
);
```
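A composite key like the one above only pays off when queries constrain its leading columns; a quick illustration against the `orders_better` table (values are illustrative):

```sql
-- Fast: constrains the leading primary-key columns,
-- so the scan touches a single key range
SELECT order_id, total
FROM orders_better
WHERE customer_id = 123 AND order_date = '2024-01-15';

-- Slow: no prefix constraint on the primary key,
-- so this becomes a full table scan
SELECT order_id, total
FROM orders_better
WHERE total > 100.00;
```

If the second pattern is common, it needs its own secondary index rather than a primary-key redesign.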
### 2. Partitioning Strategies

#### Range Partitioning

```sql
-- Partition by date ranges
CREATE TABLE sales (
    sale_id UUID DEFAULT gen_random_uuid(),
    sale_date DATE,
    customer_id INT,
    amount DECIMAL(10,2),
    region STRING,
    PRIMARY KEY (sale_date, sale_id)
) PARTITION BY RANGE (sale_date) (
    PARTITION p_2024_q1 VALUES FROM ('2024-01-01') TO ('2024-04-01'),
    PARTITION p_2024_q2 VALUES FROM ('2024-04-01') TO ('2024-07-01'),
    PARTITION p_2024_q3 VALUES FROM ('2024-07-01') TO ('2024-10-01'),
    PARTITION p_2024_q4 VALUES FROM ('2024-10-01') TO ('2025-01-01')
);

-- Add future partitions: CockroachDB has no ADD PARTITION clause;
-- repartitioning restates the full partition set
ALTER TABLE sales PARTITION BY RANGE (sale_date) (
    PARTITION p_2024_q1 VALUES FROM ('2024-01-01') TO ('2024-04-01'),
    PARTITION p_2024_q2 VALUES FROM ('2024-04-01') TO ('2024-07-01'),
    PARTITION p_2024_q3 VALUES FROM ('2024-07-01') TO ('2024-10-01'),
    PARTITION p_2024_q4 VALUES FROM ('2024-10-01') TO ('2025-01-01'),
    PARTITION p_2025_q1 VALUES FROM ('2025-01-01') TO ('2025-04-01')
);
```

#### Hash Distribution

CockroachDB does not support `PARTITION BY HASH`. For even write distribution, use a hash-sharded index instead: CockroachDB maintains a hidden computed shard column for you.

```sql
-- Hash-sharded primary key for even distribution
-- (eight buckets play the role of eight hash partitions)
CREATE TABLE user_sessions (
    session_id UUID DEFAULT gen_random_uuid(),
    user_id INT,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    last_activity TIMESTAMPTZ,
    session_data JSONB,
    PRIMARY KEY (user_id, session_id) USING HASH WITH (bucket_count = 8)
);
```

#### List Partitioning

```sql
-- Partition by specific values (regions, categories)
CREATE TABLE products (
    product_id UUID DEFAULT gen_random_uuid(),
    category STRING,
    name STRING,
    price DECIMAL(10,2),
    region STRING,
    PRIMARY KEY (region, product_id)
) PARTITION BY LIST (region) (
    PARTITION p_us VALUES IN ('us-east', 'us-west'),
    PARTITION p_eu VALUES IN ('eu-west', 'eu-central'),
    PARTITION p_asia VALUES IN ('asia-pacific', 'asia-southeast')
);
```
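To verify how a table's partitions map onto localities, CockroachDB exposes `SHOW PARTITIONS`, and partitions are pinned to nodes through zone configurations. A sketch against the `sales` table above (the locality constraint value is illustrative and depends on your node `--locality` flags):

```sql
-- List partitions, their value ranges, and any zone configs
SHOW PARTITIONS FROM TABLE sales;

-- Pin a partition's replicas to a locality
ALTER PARTITION p_2024_q1 OF TABLE sales
    CONFIGURE ZONE USING constraints = '[+region=us-east1]';
```

Without a zone configuration, partitioning only splits ranges at partition boundaries; the placement benefit comes from the constraints.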
### 3. Index Optimization

#### Covering Indexes

```sql
-- Create covering indexes to avoid index joins back to the primary index
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    email STRING UNIQUE,
    name STRING,
    city STRING,
    country STRING,
    created_at TIMESTAMPTZ
);

-- Covering index for common query patterns
CREATE INDEX idx_customers_location_covering
    ON customers (country, city)
    STORING (name, email, created_at);

-- This query is served entirely from the covering index
SELECT name, email, created_at
FROM customers
WHERE country = 'USA' AND city = 'San Francisco';
```

#### Partial Indexes

```sql
-- Create partial indexes for filtered queries
CREATE TABLE orders (
    order_id UUID PRIMARY KEY,
    customer_id INT,
    status STRING,
    order_date TIMESTAMPTZ,
    total DECIMAL(10,2)
);

-- Partial index for active orders only
CREATE INDEX idx_orders_active
    ON orders (customer_id, order_date)
    WHERE status IN ('pending', 'processing');

-- Partial index for high-value orders
CREATE INDEX idx_orders_high_value
    ON orders (order_date DESC)
    WHERE total > 1000.00;
```

#### Expression Indexes

```sql
-- Index on computed expressions
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    email STRING,
    first_name STRING,
    last_name STRING,
    created_at TIMESTAMPTZ
);

-- Index on computed full name
CREATE INDEX idx_users_full_name
    ON users ((first_name || ' ' || last_name));

-- Index on an extracted date part; indexed expressions must be
-- immutable, so convert the TIMESTAMPTZ to a fixed zone first
CREATE INDEX idx_users_created_month
    ON users ((EXTRACT(MONTH FROM created_at AT TIME ZONE 'UTC')));
```
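Whether an index actually covers a query is easy to confirm from the plan: if `EXPLAIN` shows an index join back to the primary index, the `STORING` clause is missing a column the query selects. A quick check against the `customers` example above:

```sql
-- The plan should show a single scan of
-- customers@idx_customers_location_covering with no index-join step
EXPLAIN
SELECT name, email, created_at
FROM customers
WHERE country = 'USA' AND city = 'San Francisco';

-- Inspect which columns each index keys on and which it stores
SHOW INDEXES FROM customers;
```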
### 4. Data Distribution Patterns

#### Avoiding Hotspots

```sql
-- Bad: Creates hotspots on a single range
CREATE TABLE metrics_bad (
    id SERIAL PRIMARY KEY,
    timestamp TIMESTAMPTZ DEFAULT NOW(),
    metric_name STRING,
    value FLOAT,
    tags JSONB
);

-- Good: Distributed primary key
CREATE TABLE metrics_good (
    metric_name STRING,
    timestamp TIMESTAMPTZ,
    instance_id STRING,
    value FLOAT,
    tags JSONB,
    PRIMARY KEY (metric_name, timestamp, instance_id)
);

-- Better: Hash-distributed with a time component
CREATE TABLE metrics_better (
    shard_key INT DEFAULT (random() * 1000)::INT,
    metric_name STRING,
    timestamp TIMESTAMPTZ DEFAULT NOW(),
    instance_id STRING,
    value FLOAT,
    tags JSONB,
    PRIMARY KEY (shard_key, metric_name, timestamp, instance_id)
);
```

#### Multi-Region Schema Design

```sql
-- Configure the database for multi-region
ALTER DATABASE myapp SET PRIMARY REGION "us-east1";
ALTER DATABASE myapp ADD REGION "us-west1";
ALTER DATABASE myapp ADD REGION "eu-west1";

-- Regional table for user data; the locality column must have type
-- crdb_internal_region (here defaulted to the gateway node's region)
CREATE TABLE users_regional (
    user_id UUID PRIMARY KEY,
    email STRING UNIQUE,
    name STRING,
    region crdb_internal_region NOT NULL
        DEFAULT default_to_database_primary_region(gateway_region())::crdb_internal_region,
    created_at TIMESTAMPTZ
) LOCALITY REGIONAL BY ROW AS region;

-- Global table for reference data
CREATE TABLE countries (
    country_code STRING PRIMARY KEY,
    country_name STRING,
    currency STRING
) LOCALITY GLOBAL;

-- Regional-by-table for localized content
CREATE TABLE user_preferences (
    user_id UUID PRIMARY KEY,
    preferences JSONB,
    language STRING,
    timezone STRING
) LOCALITY REGIONAL BY TABLE IN "us-east1";
```
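Table localities pair with a database-level survival goal, which decides how replicas are spread across the configured regions. A sketch using the regions added above:

```sql
-- Survive the loss of an entire region (requires 3+ database regions)
ALTER DATABASE myapp SURVIVE REGION FAILURE;

-- Or the default: survive the loss of one availability zone
ALTER DATABASE myapp SURVIVE ZONE FAILURE;

-- Inspect the resulting region configuration
SHOW REGIONS FROM DATABASE myapp;
```

`SURVIVE REGION FAILURE` increases write latency for `REGIONAL` tables because quorum must span regions, so choose it deliberately rather than by default.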
### 5. JSON and Semi-Structured Data

#### JSONB Schema Design

```sql
-- Efficient JSONB column design
CREATE TABLE events (
    event_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_type STRING,
    event_data JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    -- Expression indexes on individual JSONB fields
    INDEX idx_events_type (event_type),
    INDEX idx_events_user_id ((event_data->>'user_id')),
    INDEX idx_events_timestamp ((event_data->>'timestamp'))
);

-- Inverted (GIN-style) index for general JSONB containment search
CREATE INVERTED INDEX idx_events_data ON events (event_data);

-- Computed columns for frequently accessed JSON fields
ALTER TABLE events ADD COLUMN user_id INT
    AS ((event_data->>'user_id')::INT) STORED;
CREATE INDEX idx_events_user_id_computed ON events (user_id);
```

#### Array and Nested Data

```sql
-- Design for array columns
CREATE TABLE products (
    product_id UUID PRIMARY KEY,
    name STRING,
    tags STRING[],
    categories INT[],
    metadata JSONB
);

-- Inverted indexes on array elements
CREATE INVERTED INDEX idx_products_tags ON products (tags);
CREATE INVERTED INDEX idx_products_categories ON products (categories);

-- Query patterns for arrays
SELECT * FROM products WHERE 'electronics' = ANY(tags);
SELECT * FROM products WHERE categories @> ARRAY[1, 2];
```
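The inverted index above accelerates containment queries, not arbitrary JSON predicates. A few patterns that can use it against the `events` table (field names and values are illustrative):

```sql
-- Containment: can use the inverted index on event_data
SELECT event_id FROM events
WHERE event_data @> '{"user_id": "42"}';

-- Containment on a nested object
SELECT event_id FROM events
WHERE event_data->'device' @> '{"os": "ios"}';

-- Scalar extraction: served by the expression index on
-- (event_data->>'user_id'), not the inverted index
SELECT event_id FROM events
WHERE event_data->>'user_id' = '42';
```

As a rule of thumb: inverted indexes for "does this document contain X", expression indexes or stored computed columns for equality and range filters on a known field.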
### 6. Temporal Data Patterns

#### Time-Series Data Design

```sql
-- Time-series table with range partitioning; the partition column
-- must be a prefix of the primary key, so timestamp leads here
-- (trade-off: a leading timestamp concentrates new writes, so
-- consider a hash-sharded key if ingest rate is high)
CREATE TABLE sensor_data (
    timestamp TIMESTAMPTZ,
    sensor_id INT,
    reading_type STRING,
    value DECIMAL(10,4),
    metadata JSONB,
    PRIMARY KEY (timestamp, sensor_id, reading_type)
) PARTITION BY RANGE (timestamp) (
    PARTITION p_2024_01 VALUES FROM ('2024-01-01') TO ('2024-02-01'),
    PARTITION p_2024_02 VALUES FROM ('2024-02-01') TO ('2024-03-01'),
    PARTITION p_2024_03 VALUES FROM ('2024-03-01') TO ('2024-04-01')
);

-- Row-level TTL for automatic data cleanup
ALTER TABLE sensor_data SET (ttl_expire_after = '90 days');
```

#### Change Data Capture Schema

```sql
-- Audit table design
CREATE TABLE audit_log (
    log_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    table_name STRING,
    operation STRING,
    record_id STRING,
    old_values JSONB,
    new_values JSONB,
    changed_by STRING,
    changed_at TIMESTAMPTZ DEFAULT NOW(),
    INDEX idx_audit_table_time (table_name, changed_at),
    INDEX idx_audit_record (record_id)
);

-- Trigger function for change tracking
-- (triggers require CockroachDB v24.3+, which added trigger support)
CREATE OR REPLACE FUNCTION audit_trigger_function() RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'UPDATE' THEN
        INSERT INTO audit_log (table_name, operation, record_id,
                               old_values, new_values, changed_by)
        VALUES (TG_TABLE_NAME, 'UPDATE', OLD.id::STRING,
                row_to_json(OLD), row_to_json(NEW), current_user);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
```

## Best Practices

### 1. Primary Key Selection

- **Avoid SERIAL/SEQUENCE**: Use UUID or composite keys for distributed writes
- **Consider query patterns**: Design primary keys around the most common access patterns
- **Balance distribution**: Ensure even data distribution across nodes
- **Include a time component**: For time-based queries, include the timestamp in the primary key
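The "balance distribution" point above can be checked directly: CockroachDB can show how a table's data splits into ranges and which nodes hold the leaseholders. A sketch against the `orders_good` table from section 1 (in older versions the primary index is named `primary`; newer versions use `orders_good_pkey`):

```sql
-- See range boundaries, sizes, and leaseholder placement
SHOW RANGES FROM TABLE orders_good;

-- Range distribution for a single index
SHOW RANGES FROM INDEX orders_good@orders_good_pkey;
```

A healthy table shows ranges and leaseholders spread across nodes; a pile-up of writes on the last range is the signature of a sequential key.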
### 2. Index Strategy

- **Create covering indexes**: Include frequently selected columns in the STORING clause
- **Use partial indexes**: Filter indexes for specific query conditions
- **Monitor index usage**: Regularly check for and remove unused indexes
- **Consider inverted indexes**: Use GIN-style inverted indexes for JSONB and array columns

### 3. Partitioning Guidelines

- **Partition large tables**: Use partitioning for tables > 100 GB
- **Align with query patterns**: Partition on columns used in WHERE clauses
- **Automate partition management**: Create scripts for adding/dropping partitions
- **Monitor partition sizes**: Keep partitions reasonably sized (< 64 GB)

### 4. Data Types and Constraints

- **Use appropriate data types**: Choose the smallest types that fit your data
- **Add constraints**: Use CHECK constraints for data validation
- **Consider collation**: Use appropriate collations for string comparisons
- **Use ENUM types**: For fixed sets of values

### 5. Multi-Region Considerations

- **Choose the appropriate locality**: REGIONAL BY ROW for user data, GLOBAL for reference data
- **Consider latency**: Place data close to users
- **Plan for consistency**: Understand consistency guarantees across regions
- **Monitor cross-region traffic**: Minimize expensive cross-region operations

## Common Issues

### 1. Hotspot Detection

**Problem**: A single node handles all writes.

**Solution**:

```sql
-- Check for range hotspots
SELECT range_id, start_key, end_key, lease_holder, replica_localities
FROM crdb_internal.ranges
WHERE table_name = 'your_table'
ORDER BY range_id;

-- Monitor query distribution across nodes
SELECT node_id,
       count(*) AS query_count,
       sum(service_lat) AS total_latency
FROM crdb_internal.node_statement_statistics
GROUP BY node_id
ORDER BY query_count DESC;
```
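Once a hotspot is identified, one mitigation short of redesigning the key is to pre-split and scatter the hot ranges manually. A sketch against the `metrics_good` table from section 4 (the split values are illustrative metric names; splits expire and should be treated as a stopgap):

```sql
-- Pre-split the table at chosen primary-key boundaries
ALTER TABLE metrics_good SPLIT AT VALUES ('cpu'), ('disk'), ('memory');

-- Ask the cluster to rebalance the resulting ranges across nodes
ALTER TABLE metrics_good SCATTER;
```

Manual splits help with predictable load spikes (bulk imports, product launches); for a sustained sequential-write hotspot, a hash-sharded or composite key remains the durable fix.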
### 2. Large Row Issues

**Problem**: Rows exceed the recommended size (64 KB).

**Solution**:

```sql
-- Check row sizes
SELECT table_name, avg_size, max_size
FROM crdb_internal.table_row_statistics
WHERE max_size > 65536
ORDER BY max_size DESC;

-- Normalize large columns into a separate table
CREATE TABLE documents (
    doc_id UUID PRIMARY KEY,
    title STRING,
    summary STRING,
    created_at TIMESTAMPTZ
);

CREATE TABLE document_content (
    doc_id UUID REFERENCES documents(doc_id),
    content_type STRING,
    content BYTES,
    PRIMARY KEY (doc_id, content_type)
);
```

### 3. Index Bloat

**Problem**: Unused or redundant indexes consume space and slow writes.

**Solution**:

```sql
-- Find unused indexes
SELECT schemaname, tablename, indexname, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_tup_read = 0 AND idx_tup_fetch = 0;

-- Check index sizes
SELECT schemaname, tablename, indexname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
```

### 4. Partition Pruning Issues

**Problem**: Queries scan all partitions.

**Solution**:

```sql
-- Verify partition pruning in the plan
EXPLAIN (VERBOSE)
SELECT * FROM sales
WHERE sale_date >= '2024-01-01' AND sale_date < '2024-02-01';

-- Good: the partition key appears in the WHERE clause
SELECT * FROM sales
WHERE sale_date = '2024-01-15'
  AND customer_id = 123;

-- Bad: no partition key, so every partition is scanned
SELECT * FROM sales
WHERE customer_id = 123;
```
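Note that `pg_stat_user_indexes` is a Postgres-compatibility view and may not be populated with real counters in CockroachDB. The native place to look for unused indexes is `crdb_internal.index_usage_statistics`; a sketch (column names per recent versions, so verify against your release):

```sql
-- Indexes never read since statistics were last reset
SELECT ti.descriptor_name AS table_name,
       ti.index_name,
       us.total_reads,
       us.last_read
FROM crdb_internal.index_usage_statistics AS us
JOIN crdb_internal.table_indexes AS ti
  ON us.table_id = ti.descriptor_id
 AND us.index_id = ti.index_id
WHERE us.total_reads = 0;
```

Recent versions also surface unused-index recommendations directly in the DB Console's Insights page, which is worth cross-checking before dropping anything.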
### 5. Foreign Key Performance

**Problem**: Foreign key constraints cause contention in high-throughput workloads.

**Solution**:

```sql
-- Consider removing foreign keys in high-throughput scenarios
ALTER TABLE orders DROP CONSTRAINT fk_orders_customer;

-- Use application-level validation instead
CREATE TABLE orders (
    order_id UUID PRIMARY KEY,
    customer_id INT,  -- No foreign key constraint
    order_date TIMESTAMPTZ,
    total DECIMAL(10,2)
);
```

Validation then moves into application code, for example in Go:

```go
// ValidateCustomerExists checks referential integrity in the
// application instead of relying on a foreign key constraint.
func ValidateCustomerExists(customerID int) error {
    var exists bool
    err := db.QueryRow(
        "SELECT EXISTS(SELECT 1 FROM customers WHERE id = $1)",
        customerID,
    ).Scan(&exists)
    if err != nil {
        return err
    }
    if !exists {
        return fmt.Errorf("customer %d does not exist", customerID)
    }
    return nil
}
```

## Schema Evolution

### Safe Schema Changes

```sql
-- Add column (safe)
ALTER TABLE users ADD COLUMN phone STRING;

-- Add index (safe; CockroachDB builds all indexes online,
-- so CONCURRENTLY is accepted as a no-op)
CREATE INDEX CONCURRENTLY idx_users_phone ON users (phone);

-- Add constraint (requires validating existing rows)
ALTER TABLE users ADD CONSTRAINT chk_phone_format
    CHECK (phone ~ '^[0-9+()-]+$');

-- Drop column (potentially unsafe)
ALTER TABLE users DROP COLUMN old_field;
```

### Migration Scripts

```sql
-- Example migration script
-- Note: CockroachDB restricts mixing schema changes and DML in one
-- explicit transaction; if this fails, run the steps as separate statements
BEGIN;

-- Create the new table structure
CREATE TABLE users_new (
    user_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email STRING UNIQUE,
    name STRING,
    phone STRING,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Copy data from the old table
INSERT INTO users_new (user_id, email, name, created_at)
SELECT user_id, email, name, created_at FROM users;

-- Rename tables
ALTER TABLE users RENAME TO users_old;
ALTER TABLE users_new RENAME TO users;

-- Update application code here

-- Drop the old table after verification
-- DROP TABLE users_old;

COMMIT;
```
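Schema changes in CockroachDB run as asynchronous background jobs, so a migration like the one above should be monitored rather than assumed complete when the statement returns. The jobs interface covers this:

```sql
-- Watch all jobs, including schema-change progress
SHOW JOBS;

-- Filter to running schema changes
SELECT job_id, description, status, fraction_completed
FROM [SHOW JOBS]
WHERE job_type = 'SCHEMA CHANGE' AND status = 'running';

-- A stuck or mistaken job can be controlled by id:
-- PAUSE JOB <job_id>; RESUME JOB <job_id>; CANCEL JOB <job_id>;
```

Waiting for `fraction_completed = 1` (or `status = 'succeeded'`) before cutting application traffic over is the safe sequencing for the rename step above.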

Created: 6/1/2025

