Use case · Data scientist

SQL query generation

Produce in a few minutes complex SQL queries (multiple joins, CTEs, analytical functions) that would take 30-60 minutes to write by hand.

Data scientists and analysts spend 30 to 50% of their time writing SQL: exploration, aggregations, joins, analytical windows. AI can produce in seconds queries that would take you 30-60 minutes to write and debug. The trap is that AI-generated SQL can be syntactically correct but semantically wrong (a wrong join, double counting, mishandled NULLs). This guide presents a rigorous workflow that keeps the productivity gain while catching the silent errors that skew business results.

Step-by-step workflow

Describe the schema to AI
Provide the tables involved with their main columns, types, and relations (PK/FK). Without a schema, the AI invents plausible but non-existent column names. Ideally, paste the DDL or a data dictionary.
Formulate the business question clearly
Not 'write me a SELECT', but 'for each customer active in the last 6 months, calculate cumulative revenue over the last 12 months and the growth percentage vs the prior 12 months'. The clearer the question, the better the query.
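Specify the target database and its dialect
Postgres, BigQuery, Snowflake, Redshift, MySQL: analytical functions and syntax differ between them (date truncation, for example, is written DATE_TRUNC(d, MONTH) in BigQuery but date_trunc('month', d) in Postgres). Naming the engine gets you a query written for your dialect.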
Test on a sample before production
Restrict the query with a filter such as WHERE date >= '2025-01-01' AND id < 1000 so it validates quickly. Compare the output with a case you already know (counted by hand) to check that the results are consistent.
Iterate to optimize
If the query is slow, ask the AI to suggest optimizations (missing indexes, materialized CTEs, JOIN order). Always check the execution plan (EXPLAIN) before going to production.
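The first and fourth steps (describing the schema, testing on a sample) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions: SQLite stands in for your real engine, and the customers/orders schema, its rows, and the "generated" query are all hypothetical. The sketch dumps the DDL to paste into the prompt without any rows, runs the query on a restricted sample, and checks it against a case counted by hand.

```python
import sqlite3

# Hypothetical toy schema standing in for your real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    order_date  TEXT NOT NULL,  -- ISO-8601 string, so text comparison orders dates correctly
    amount      REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO orders VALUES
    (10, 1, '2025-01-15', 100.0),
    (11, 1, '2025-02-03',  50.0),
    (12, 2, '2024-12-30',  80.0),
    (13, 2, '2025-03-01',  20.0);
""")


def schema_ddl(conn: sqlite3.Connection) -> str:
    """Step 1: gather the CREATE TABLE statements to paste into the prompt (structure only, no rows)."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return ";\n\n".join(sql for (sql,) in rows) + ";"


# Step 4: the AI-generated query (hypothetical), restricted to a small sample
# in the spirit of the WHERE date >= '2025-01-01' AND id < 1000 filter.
GENERATED_QUERY = """
SELECT c.id, COUNT(o.id) AS n_orders, SUM(o.amount) AS revenue
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.order_date >= '2025-01-01' AND c.id < 1000
GROUP BY c.id
ORDER BY c.id
"""
sample = conn.execute(GENERATED_QUERY).fetchall()

# Known case, counted by hand: Acme (id 1) has 2 orders in 2025, totalling 150.
assert sample[0] == (1, 2, 150.0), f"sample disagrees with the manual count: {sample[0]}"

print(schema_ddl(conn))
print(sample)  # [(1, 2, 150.0), (2, 1, 20.0)]
```

On Postgres, Snowflake, or BigQuery, the schema would come from information_schema or your own DDL files rather than sqlite_master, but the validation idea stays the same: a hand-checked case that the generated query must reproduce.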

Copyable prompts

3 tested prompts. Replace the bracketed variables [VARIABLE] with your own context.

Business SQL query generation

You are an SQL expert in [POSTGRES / BIGQUERY / SNOWFLAKE / REDSHIFT / MYSQL]. Here's my schema:

[TABLES + COLUMNS + RELATIONS]

Business question: [DETAILED QUESTION]

Write a query that:
- Uses the right JOINs (being explicit about INNER vs LEFT)
- Handles NULLs correctly
- Avoids double counting
- Uses named CTEs for readability
- Includes comments on non-trivial choices

Provide: (1) the query, (2) a 3-5 line explanation of your choices, (3) the expected result on a few sample rows for validation.
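"Avoids double counting" and "uses named CTEs" are related: double counting typically comes from joining two fact tables to the same dimension, which repeats every row of one fact once per row of the other. Below is a minimal sketch of the trap and the usual fix, pre-aggregating each fact in its own CTE before joining. It uses Python's sqlite3 module as a stand-in engine, and the customers/orders/tickets tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders  (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
CREATE TABLE tickets (id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Acme');
INSERT INTO orders  VALUES (1, 1, 100.0), (2, 1, 50.0);  -- true revenue: 150
INSERT INTO tickets VALUES (1, 1), (2, 1), (3, 1);       -- 3 support tickets
""")

# Plausible-looking but wrong: 2 orders x 3 tickets = 6 joined rows, so revenue
# is tripled and the ticket count is doubled.
NAIVE = """
SELECT c.id, SUM(o.amount) AS revenue, COUNT(t.id) AS n_tickets
FROM customers c
LEFT JOIN orders  o ON o.customer_id = c.id
LEFT JOIN tickets t ON t.customer_id = c.id
GROUP BY c.id
"""

# Fix: reduce each fact table to one row per customer in a named CTE, then join.
FIXED = """
WITH order_totals AS (
    SELECT customer_id, SUM(amount) AS revenue
    FROM orders
    GROUP BY customer_id
),
ticket_counts AS (
    SELECT customer_id, COUNT(*) AS n_tickets
    FROM tickets
    GROUP BY customer_id
)
SELECT c.id,
       COALESCE(ot.revenue, 0)   AS revenue,    -- customers without orders get 0, not NULL
       COALESCE(tc.n_tickets, 0) AS n_tickets
FROM customers c
LEFT JOIN order_totals  ot ON ot.customer_id = c.id
LEFT JOIN ticket_counts tc ON tc.customer_id = c.id
"""

naive = conn.execute(NAIVE).fetchall()
fixed = conn.execute(FIXED).fetchall()
print(naive)  # [(1, 450.0, 6)]
print(fixed)  # [(1, 150.0, 3)]
```

Both queries are syntactically valid, which is exactly why this error survives a quick glance; the expected-result-on-sample-rows part of the prompt is what exposes it.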

Slow query debug

This query runs in [DURATION] on [VOLUME] of data:

[QUERY]

Schema:
[TABLES + EXISTING INDEXES]

Execution plan:
[EXPLAIN ANALYZE OUTPUT]

Propose:
1. **Diagnosis**: where are the bottlenecks (full scan, bad JOIN order, missing index)?
2. **3 optimizations**, ordered by expected impact, with the modified query for each
3. **Indexes to create** if relevant (with CREATE INDEX syntax)
4. **Risks**: impact on writes, disk space, locks

Target: under [DESIRED SLA].
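To see what the "full scan vs missing index" diagnosis looks like in practice, here is a minimal sketch using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. It is an assumption-laden stand-in: SQLite's plan output is much terser than Postgres EXPLAIN ANALYZE and carries no timings, and the orders table, its synthetic rows, and the index name are hypothetical. The plan switches from a table scan to an index search once the index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
# Synthetic rows: 10,000 orders spread over 500 customers.
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 500, float(i)) for i in range(10_000)],
)

QUERY = "SELECT SUM(amount) FROM orders WHERE customer_id = 42"


def plan(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Return the human-readable 'detail' column (the last one) of each plan row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan(conn, QUERY)  # expect a full SCAN of orders
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn, QUERY)   # expect a SEARCH using idx_orders_customer

print(before)
print(after)
```

The exact wording of the plan varies between SQLite versions, so read it for the scan/search distinction rather than the literal text. The index also illustrates the "Risks" item in the prompt: every insert or update on orders now has to maintain it.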

Semantic error detection

Audit this SQL query for SEMANTIC errors (not just syntactic):

[QUERY]

Schema:
[TABLES + COLUMNS + APPROX CARDINALITIES]

Expected business question: [QUESTION]

Verify:
1. **JOIN cardinality**: double counting risk?
2. **NULL handling**: COUNT(col) vs COUNT(*), AVG silently skipping NULLs, etc.
3. **Filters**: WHERE vs ON in LEFT JOIN, condition order
4. **Aggregations**: consistent GROUP BY, HAVING vs WHERE
5. **Edge cases**: what happens if a dimension row has no matching fact? If it has several?

For each issue: (a) line concerned, (b) explanation, (c) correction.
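Two of the checks above, NULL handling and WHERE vs ON in a LEFT JOIN, are easy to demonstrate on a toy dataset. The sketch below uses Python's sqlite3 module as a stand-in engine; the tables, rows, and the "missing discount means 0" business rule are hypothetical and only there to make each pitfall visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                     status TEXT, discount REAL);
INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');  -- Initech has no orders
INSERT INTO orders VALUES
    (1, 1, 'paid',     10.0),
    (2, 1, 'refunded', NULL),
    (3, 2, 'refunded', NULL);
""")

# 1. NULL handling: COUNT(col) and AVG(col) silently skip NULLs.
count_star, count_discount, avg_discount = conn.execute(
    "SELECT COUNT(*), COUNT(discount), AVG(discount) FROM orders"
).fetchone()
# 3 rows, but only 1 non-NULL discount, so AVG = 10.0 (averaged over 1 row, not 3).

# If the business rule is "missing discount means 0" (an assumption here), say so explicitly.
(avg_as_zero,) = conn.execute(
    "SELECT AVG(COALESCE(discount, 0.0)) FROM orders"
).fetchone()  # 10 / 3

# 2. Filter in WHERE: the LEFT JOIN silently behaves like an INNER JOIN.
# Globex (only refunded orders) and Initech (no orders, so o.status is NULL) both vanish.
in_where = conn.execute("""
    SELECT c.id, COUNT(o.id) AS paid_orders
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.status = 'paid'
    GROUP BY c.id ORDER BY c.id
""").fetchall()

# Filter in ON: every customer is kept, with 0 when they have no paid order.
in_on = conn.execute("""
    SELECT c.id, COUNT(o.id) AS paid_orders
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id AND o.status = 'paid'
    GROUP BY c.id ORDER BY c.id
""").fetchall()

print(count_star, count_discount, avg_discount)  # 3 1 10.0
print(in_where)  # [(1, 1)]
print(in_on)     # [(1, 1), (2, 0), (3, 0)]
```

Neither version is wrong in itself; the right one depends on the business question, which is why the audit prompt asks for it explicitly.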

Top tools for this use case

Curated selection of the 3 best AI tools for SQL query generation.

Claude Opus 4.5

★ 4.9/5 · 92 reviews · 20 USD/month

Why for this use case: Excels at complex SQL and analytical functions. Grasps semantic subtleties better than competitors.


ChatGPT

★ 4.9/5 · 528 reviews · 20 USD/month

Why for this use case: Solid on all common dialects, and particularly good at converting queries from one database to another.


Cursor

★ 4.8/5 · 145 reviews · 20 USD/month

Why for this use case: If you work on versioned SQL files (dbt, migration scripts), Cursor gives the AI your project context.


Estimated ROI

Time saved

60-70% on writing complex queries

Quality gain

Semantic errors caught before production

Stack cost

Included in a Claude Pro / ChatGPT Plus subscription ($20-30/month)

Estimates based on 2026 benchmarks and user feedback. Actual ROI depends on your context.

Frequently asked questions

Does AI handle all SQL dialects well?

Postgres, MySQL, BigQuery, Snowflake, Redshift: very well. SQL Server (T-SQL): well, though proprietary syntax is sometimes missed. Oracle (PL/SQL): correct, but requires more verification. DuckDB, SQLite: good on standard SQL, sometimes confused by engine-specific extensions.

Can I generate SQL for sensitive data with ChatGPT?

SQL code itself isn't sensitive; the data is. So yes, you can generate SQL queries with any LLM as long as you don't send real client data. Only paste schemas and dummy examples into your prompts.

Can AI replace a DBA?

For writing everyday queries, optimization help, and documentation: largely. For database architecture, fine-grained DBMS tuning, high availability, backups, and security: no, the DBA remains essential. AI is an excellent SQL writer, not a DBA.

Should AI-generated queries be documented?

In a collaborative environment (dbt, Airflow, versioned scripts), yes: it is good practice to keep the prompt used in a comment alongside the query. This lets reviewers understand the logic and regenerate the query with improvements if needed.
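One way to keep the prompt next to the query is a comment header at the top of the versioned .sql file. The helper below is a minimal sketch: the header fields (model, date, reviewer, prompt) are a hypothetical convention, not a dbt or Airflow standard, and the function name is illustrative only. Every header line starts with "--", so the file stays valid SQL.

```python
from datetime import date


def with_provenance_header(query: str, prompt: str, model: str,
                           reviewer: str, generated_on: date) -> str:
    """Prefix a generated query with an SQL comment block recording how it was produced.

    The header layout is a hypothetical team convention; adapt the fields to yours.
    """
    header = [
        f"-- Generated with: {model} on {generated_on.isoformat()}",
        f"-- Reviewed by: {reviewer}",
        "-- Prompt:",
    ]
    # Comment out every prompt line (including blank ones) so the block stays valid SQL.
    header += [f"--   {line}" if line else "--" for line in prompt.splitlines()]
    return "\n".join(header) + "\n" + query.strip() + "\n"


sql = with_provenance_header(
    "SELECT 1;",
    "Business question: monthly revenue per customer\nDialect: Postgres",
    model="Claude Opus 4.5",
    reviewer="jdoe",  # hypothetical reviewer handle
    generated_on=date(2025, 1, 15),
)
print(sql)
```

Storing the reviewer alongside the model makes it explicit that a human validated the query, which matters more than the generation details themselves.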


Transparency: some links are affiliate links. No impact on our evaluations or prices.
