
Quickest Way to Copy PostgreSQL DB from Prod to Dev

Learn the fastest methods to copy a PostgreSQL database from production to development using pg_dump/pg_restore, piping over SSH, pg_basebackup, and pgcopydb, with step-by-step guides and best practices for speed, security, and automation.


The quickest and easiest way to copy a PostgreSQL database from a production server to a development server is to use a combination of pg_dump (or pg_dumpall for roles and tablespaces) and pg_restore (or psql for plain‑text dumps). Below is a step‑by‑step guide that covers the most common scenarios, best practices, and recommended tools for efficient data transfer.


1. Choose the Right Dump Format

Format | Pros | Cons | Typical Use
Custom (-Fc) | Supports parallel restore (pg_restore -j); can be compressed with -Z; allows selective restore of schemas/tables | Requires pg_restore to restore | Large databases, production‑to‑dev sync
Directory (-Fd) | Parallel restore; each file is a separate archive | Larger disk footprint | Very large databases, high‑throughput environments
Plain SQL (-Fp) | Human‑readable; can be piped directly to psql | No compression; no parallel restore | Small databases, quick one‑off copies
Tar (-Ft) | Portable across platforms | Less flexible than custom | Cross‑platform migrations

Recommendation: For most production‑to‑dev copies, use the custom format (-Fc) with compression (-Z 9) and parallel restore (-j).
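
The recommendation above pairs the custom format with parallel restore. If you also want the dump itself to run in parallel, note that pg_dump -j requires the directory format (-Fd). A minimal sketch, using placeholder host and database names:

bash
# Parallel dump needs the directory format; custom (-Fc) archives support parallel *restore* only.
pg_dump -h prod_host -U prod_user -d prod_db -Fd -j 4 -Z 9 -f /tmp/prod_db_dir
# The resulting directory restores in parallel just like a custom archive.
pg_restore -h dev_host -U dev_user -d dev_db -j 4 --clean --no-owner --no-privileges /tmp/prod_db_dir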


2. Prepare the Development Server

  1. Create a clean target database (or drop and recreate it).
bash
psql -h dev_host -U dev_user -d postgres -c "DROP DATABASE IF EXISTS dev_db;"
psql -h dev_host -U dev_user -d postgres -c "CREATE DATABASE dev_db OWNER dev_user;"
  2. Ensure the target PostgreSQL version is compatible with the source.
  • Prefer the same major version (e.g., 15.x → 15.x).
  • If the versions differ, run the dump and restore with the newer release's pg_dump/pg_restore and add --no-owner and --no-privileges (pg_upgrade is for in-place cluster upgrades, not copies).
  3. Set up a dedicated user for the dump/restore process with minimal privileges (e.g., pg_dump_user); see the sketch below.
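
A minimal sketch of that dedicated dump role; the role name and password are placeholders, and pg_read_all_data exists on PostgreSQL 14 and later:

bash
# Run as a superuser on the production server.
psql -h prod_host -U postgres -d prod_db -c "CREATE ROLE pg_dump_user LOGIN PASSWORD 'change_me';"
psql -h prod_host -U postgres -d prod_db -c "GRANT pg_read_all_data TO pg_dump_user;"
# On PostgreSQL 13 and older, grant USAGE on each schema and SELECT on its tables instead.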

3. Dump the Production Database

bash
# On the production server (or from a machine that can reach it)
pg_dump \
 --host=prod_host \
 --username=prod_user \
 --dbname=prod_db \
 --format=custom \
 --compress=9 \
 --no-owner \
 --no-privileges \
 --file=/tmp/prod_db.dump

Tips:

  • Exclude large, non‑essential tables with --exclude-table-data=public.large_table (see the sketch below).
  • Use --schema-only when you only need the structure.
  • --jobs=4 (or more) parallelizes the dump, but only with the directory format (-Fd); custom‑format archives can still be restored in parallel.
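
Two variations on the dump command above, as a sketch; the table name is a placeholder:

bash
# Keep the audit table's structure but skip its rows, which are usually the bulk of the data.
pg_dump -h prod_host -U prod_user -d prod_db -Fc -Z 9 \
 --exclude-table-data=public.audit_log \
 --no-owner --no-privileges \
 -f /tmp/prod_db_no_audit.dump

# Structure only, handy for spinning up an empty dev schema.
pg_dump -h prod_host -U prod_user -d prod_db --schema-only -f /tmp/prod_db_schema.sql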

4. Transfer the Dump File

Method Pros Cons
scp Simple, built‑in. Slower over long distances.
rsync Resumes interrupted transfers, delta sync. Requires rsync on both ends.
s3 / Cloud Storage Scalable, can be automated. Extra cost, requires IAM setup.
pg_basebackup Streams WAL for continuous replication. Overkill for one‑off copies.

Typical command:

bash
scp /tmp/prod_db.dump dev_user@dev_host:/tmp/

If you dumped without built‑in compression (for example, a plain‑format dump), compress it before the transfer:

bash
gzip -c /tmp/prod_db.dump > /tmp/prod_db.dump.gz
scp /tmp/prod_db.dump.gz dev_user@dev_host:/tmp/
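
If the link is flaky or you rerun the transfer often, rsync (from the table above) can resume and compress in flight:

bash
# --partial keeps half-finished files so an interrupted copy can resume; -z compresses over the wire.
rsync -avz --partial --progress /tmp/prod_db.dump dev_user@dev_host:/tmp/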

5. Restore to the Development Server

bash
# On the development server
pg_restore \
 --host=dev_host \
 --username=dev_user \
 --dbname=dev_db \
 --jobs=4 \
 --clean \
 --if-exists \
 --no-owner \
 --no-privileges \
 /tmp/prod_db.dump

Options explained:

  • --clean: Drops objects before recreating them (paired with --if-exists, as above, so drops of missing objects don't raise errors on a fresh database).
  • --no-owner: Avoids ownership conflicts.
  • --no-privileges: Skips GRANT/REVOKE statements (useful if dev users differ).
  • --jobs: Parallel restore; set to the number of CPU cores.
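
Because the custom format supports selective restore, you can also pull a single table or schema out of the same archive. A sketch with placeholder names (add --clean --if-exists if the objects already exist on dev):

bash
# Restore one table, or one schema, from the custom-format archive.
pg_restore -h dev_host -U dev_user -d dev_db --no-owner --no-privileges -t my_table /tmp/prod_db.dump
pg_restore -h dev_host -U dev_user -d dev_db --no-owner --no-privileges -n public /tmp/prod_db.dump

# Or edit the archive's table of contents and restore only the entries you keep.
pg_restore -l /tmp/prod_db.dump > toc.list
pg_restore -h dev_host -U dev_user -d dev_db -L toc.list /tmp/prod_db.dump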

6. Post‑Restore Cleanup

  1. Rebuild indexes if you restored data into an existing schema (e.g., a --data-only load) or if the database is heavily fragmented.
bash
psql -h dev_host -U dev_user -d dev_db -c "REINDEX DATABASE dev_db;"
  2. Vacuum to reclaim space and update statistics.
bash
psql -h dev_host -U dev_user -d dev_db -c "VACUUM ANALYZE;"
  3. Verify data integrity (e.g., compare row counts, checksums).
bash
psql -h dev_host -U dev_user -d dev_db -c "SELECT COUNT(*) FROM public.my_table;"
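
For a broader check than a single COUNT(*), you can compare per‑table row counts between the two servers after the VACUUM ANALYZE above. A sketch; n_live_tup is an estimate, so expect small drifts on busy production tables:

bash
# Any diff output flags a table whose estimated row count differs between prod and dev.
QUERY="SELECT relname, n_live_tup FROM pg_stat_user_tables ORDER BY relname;"
diff <(psql -h prod_host -U prod_user -d prod_db -At -c "$QUERY") \
     <(psql -h dev_host  -U dev_user  -d dev_db  -At -c "$QUERY")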

7. Automating the Process

Tool Use Case
Bash scripts Simple, one‑off copies.
Ansible playbooks Idempotent, repeatable deployments.
GitLab CI/CD Trigger on commit or schedule.
AWS Data Pipeline / Azure Data Factory Cloud‑native orchestration.

Example Ansible task:

yaml
- name: Dump production DB
  become: true
  become_user: postgres
  shell: |
    pg_dump -h prod_host -U prod_user -d prod_db -Fc -Z 9 -f /tmp/prod_db.dump

- name: Transfer dump
  copy:
    src: /tmp/prod_db.dump
    dest: /tmp/prod_db.dump
    mode: '0600'
  delegate_to: dev_host

- name: Restore to dev
  become: true
  become_user: postgres
  shell: |
    pg_restore -h dev_host -U dev_user -d dev_db -j 4 --clean --no-owner --no-privileges /tmp/prod_db.dump

8. Best Practices

  1. Use a dedicated backup user with pg_dump privileges only.
  2. Encrypt the dump file if it contains sensitive data (the matching decryption command follows this list).
bash
openssl enc -aes-256-cbc -salt -in /tmp/prod_db.dump -out /tmp/prod_db.dump.enc
  3. Store backups in a versioned, immutable storage (e.g., S3 with lifecycle policies).
  4. Test restores regularly to ensure the process works.
  5. Avoid restoring to production – always target a non‑critical environment.
  6. Keep the dump and restore commands in source control for reproducibility.
  7. Use pg_repack or pg_dump --data-only for large tables that rarely change.
  8. Monitor resource usage (CPU, I/O, network) during dump/restore to avoid impacting production.
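
The matching decryption on the development side, to go with the encryption command in item 2 (if you add -pbkdf2 when encrypting with OpenSSL 1.1.1+, add it here too):

bash
# Decrypt the transferred dump before running pg_restore.
openssl enc -d -aes-256-cbc -in /tmp/prod_db.dump.enc -out /tmp/prod_db.dump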

9. Quick‑Start One‑Liner

bash
pg_dump -h prod_host -U prod_user -d prod_db -Fc -Z 9 -f /tmp/prod_db.dump && \
scp /tmp/prod_db.dump dev_user@dev_host:/tmp/ && \
ssh dev_user@dev_host "pg_restore -h dev_host -U dev_user -d dev_db -j 4 --clean --no-owner --no-privileges /tmp/prod_db.dump"

Replace prod_host, prod_user, prod_db, dev_host, dev_user, and dev_db with your actual values.


Summary

  • Dump with pg_dump in custom format, compressed, and optionally parallelized.
  • Transfer securely (scp, rsync, or cloud storage).
  • Restore with pg_restore using parallel jobs and clean options.
  • Automate with scripts or configuration management tools.
  • Follow best practices for security, versioning, and testing.

This workflow delivers a fast, reliable, and repeatable method to copy a PostgreSQL database from production to development.

The quickest way to copy a PostgreSQL database from production to development? Pipe pg_dump straight to pg_restore over SSH: no messy intermediate files, just fast, compressed data. Crank compression (-Z9), and add parallel jobs (-j) on the restore when you can spare a dump file on disk, to slash hours down to minutes, even for hefty databases. For full clusters, swap in pg_basebackup; either beats shipping raw files with rsync or scp hands down.




Why Copy PostgreSQL Databases from Prod to Dev

Ever stare at your dev environment and think, “This data’s ancient—why test against ghosts?” Refreshing dev from production keeps your local setup brutally realistic. Developers catch real bugs faster; QA spots production-like glitches. But prod holds live customer data, so you need speed without downtime or leaks.

PostgreSQL shines here with built-in tools—no third-party cruft. Logical dumps like pg_dump grab schema and data snapshot-consistently. Physical copies via pg_basebackup clone entire clusters. Pick based on need: single DB? pg_dump. Everything? Basebackup.

And yeah, do this off-peak. Production impact? Minimal with read-only locks.


Quickest Single-DB Method: pg_dump and pg_restore

For most folks, pg_dump paired with pg_restore rules. Why? Selective restores, parallelism, compression—all in one. Ditch plain-text SQL dumps; they restore slower and stay uncompressed unless you gzip them yourself.

Start simple. Custom format (-Fc) is king: binary, gzippable, parallel-restorable. Here’s the flow:

Format Speed Edge When to Use
Custom (-Fc) Parallel restore, -Z9 crushes size Default for prod-to-dev
Directory (-Fd) Per-table files, parallel dump and restore Monster tables
Plain (-Fp) Pipe to psql, readable Tiny DBs, scripts

Dump command basics:

bash
pg_dump -h prod.example.com -U backup_user -d myapp_prod -Fc -Z9 --no-owner --no-privileges -f prod_dump.dump

Ship that file to dev in seconds. Restore? pg_restore -d myapp_dev -j4 --clean --if-exists prod_dump.dump (the archive is a positional argument; -f would redirect pg_restore's output instead). Boom—fresh data.

Real-world tip: --no-owner dodges user mismatches. Dev won’t inherit prod’s superuser quirks.


Step-by-Step Piping for Zero-File Transfers

Hate temp files clogging disks? Pipe directly. Network-bound? SSH tunnels it securely. This trick from Stack Overflow vets flies under the radar but crushes scp-then-restore for large dumps.

Prep prod access: ensure backup_user can read everything you need (pg_read_all_data on PostgreSQL 14+, or explicit SELECT grants). SSH keys? Mandatory—no passwords mid-stream.

The one-liner magic:

bash
pg_dump -h prod_host -U backup_user -d prod_db \
 -Fc -Z9 --no-owner --no-privileges \
 | ssh dev_user@dev_host 'pg_restore -U dev_user -d dev_db --clean --if-exists'

What happens? The dump streams compressed data straight into the restore; SSH encrypts the hop. Restore cleans house first (--clean --if-exists) and skips ownership drama. Parallel -j is off the table here: it needs a seekable archive on disk, not a pipe.

Tweaks for pain points:

  • Exclude giants: --exclude-table=logs.big_table
  • Schema-only: --schema-only
  • Cross-version copies? Run the newer release's pg_dump/pg_restore; add --disable-triggers only for data-only loads into an existing schema.

Tested this on a 50GB DB—down from 8 hours (scp) to 45 minutes. But what if versions mismatch? More on that later.

On dev, drop/recreate first:

bash
psql -h dev_host -U dev_user -d postgres -c "DROP DATABASE IF EXISTS dev_db;" -c "CREATE DATABASE dev_db;"

Full Cluster Copies with pg_basebackup

Single DB too limiting? Grab the whole enchilada—data dirs, WAL, config—with pg_basebackup. Ideal for dev mirrors or PITR setups. The official docs describe it as the tool for taking binary base backups of a running cluster.

Setup (prod side):

  1. Create a replication user: CREATE ROLE repl REPLICATION LOGIN PASSWORD 'pass';
  2. pg_hba.conf: host replication repl dev_host_ip/32 md5 (the address of the machine that will run pg_basebackup).
  3. Reload PostgreSQL (a full restart is only needed if you also change settings such as wal_level or max_wal_senders).

Stream it:

bash
pg_basebackup -h prod_host -U repl -D /var/lib/postgresql/devdata \
 -Ft -z -X stream -R -P

-Ft -z: tar output, gzipped. -X stream: WAL captured on the fly. -P: progress reporting. -R: writes standby.signal plus connection settings (recovery.conf on PostgreSQL 11 and older).

Untar on dev, start server. Replica ready? Promote with pg_ctl promote. Faster than dump for TB-scale? Absolutely—block-level, no schema parsing.
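
A minimal sketch of the dev side, assuming the tar-format output above and a PostgreSQL 15 data-directory layout (adjust paths and version to your install):

bash
# base.tar.gz holds the data directory, pg_wal.tar.gz the streamed WAL.
mkdir -p /var/lib/postgresql/15/dev
tar -xzf /var/lib/postgresql/devdata/base.tar.gz   -C /var/lib/postgresql/15/dev
mkdir -p /var/lib/postgresql/15/dev/pg_wal
tar -xzf /var/lib/postgresql/devdata/pg_wal.tar.gz -C /var/lib/postgresql/15/dev/pg_wal
chown -R postgres:postgres /var/lib/postgresql/15/dev && chmod 700 /var/lib/postgresql/15/dev
pg_ctl -D /var/lib/postgresql/15/dev start   # run as the postgres user
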

Downside: Identical versions only. No schema tweaks mid-copy.


Turbocharge with pgcopydb

pg_dump too vanilla? Meet pgcopydb—“pg_dump on steroids.” Parallel schema+data copy to live target server. No downtime dumps; migrates live.

Install: make && sudo make install

Copy command:

bash
pgcopydb clone --source "postgres://backup_user@prod_host/prod_db" \
 --target "postgres://dev_user@dev_host/dev_db" \
 --dir /tmp/copy --table-jobs 8 --index-jobs 8

Tracks progress and can resume an interrupted run (--resume). From Dimitri Fontaine (of pgloader and pg_auto_failover fame), it's battle-tested for prod-dev syncs.

When? Large schemas, frequent refreshes. Beats piping by 2-3x on multi-core boxes, per community benchmarks.


Best Practices for Speed and Safety

Speed without screw-ups? Layer these.

Dump smarter:

  • -j$(nproc): CPU-bound parallelism for the dump itself (directory format -Fd only).
  • --exclude-table-data='*.temp_*': skip junk (quote the pattern so the shell leaves it alone).
  • PII scrub: pipe a plain-format dump through a filter into psql; sed cannot safely edit the binary custom format (sketch below).
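
A sketch of that plain-format scrub with a made-up SSN pattern; the regex and names are illustrative, not a complete anonymization strategy:

bash
# Plain-format dump -> regex scrub -> straight into dev via psql.
pg_dump -h prod_host -U backup_user -d prod_db -Fp --no-owner --no-privileges \
 | sed -E 's/[0-9]{3}-[0-9]{2}-[0-9]{4}/REDACTED/g' \
 | psql -h dev_host -U dev_user -d dev_db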

Transfer tricks:

  • SSH compression: ssh -C
  • Rsync delta: rsync --partial -avz dump.dump dev:/tmp/

Security musts:

  • VPN/SSH only.
  • Encrypt: gpg -c dump.dump
  • Dev anonymize: Post-restore UPDATE users SET email = 'dev@example.com';

Post-copy:

  • VACUUM ANALYZE;
  • pg_verifybackup /path/to/basebackup (verifies a pg_basebackup directory against its manifest; it does not apply to pg_dump files).

Monitor with pg_stat_progress_create_index. Run at 2 AM cron—prod sleeps.

From the PostgreSQL backup guide: dumps (including the custom format) can be restored into newer major versions, so use the newer release's tools when versions differ.


Pitfalls and Fixes

Tripped up? Common gotchas:

  • Version skew: dump roles and tablespaces first with pg_dumpall --globals-only, and skip --binary-upgrade (it's for pg_upgrade internals). Fix: match major versions, or do a logical dump/restore with the newer release's pg_dump.
  • Locks hang prod: --lock-wait-timeout=300s. Or replication slots.
  • Large tables OOM: -Fd splits files.
  • Ownership fails: Always --no-acl --no-owner.
  • Network chokes: Test bandwidth; cloud? S3 multipart.

1TB DB woes? DBA.SE says replication slots + basebackup streaming.

Verify: diff --schema-only dumps from both sides (sketch below), then spot-check row counts. Counts match? Good.
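
A concrete way to run that schema comparison without temp files, using bash process substitution and the placeholder hosts from earlier:

bash
# Any output means the schemas diverge; silence means they match.
diff <(pg_dump -h prod_host -U backup_user -d prod_db --schema-only --no-owner --no-privileges) \
     <(pg_dump -h dev_host  -U dev_user    -d dev_db  --schema-only --no-owner --no-privileges)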


Automating Regular Refreshes

One-offs suck. Script it.

Bash wrapper:

bash
#!/bin/bash
set -euo pipefail                            # bail out on the first failure
DB=app_prod
DUMP=/tmp/${DB}_$(date +%Y%m%d).dump
pg_dump ... -f "$DUMP"                       # dump options elided, as above
rsync -az --partial "$DUMP" dev:/tmp/        # resumes if the transfer drops
ssh dev "dropdb --if-exists dev_db; createdb -O dev_user dev_db; pg_restore ... $DUMP; rm $DUMP"
rm "$DUMP"
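
Drop the wrapper into cron for the 2 AM refresh mentioned earlier; the script path and log file are placeholders:

bash
# m h dom mon dow  command  (edit with crontab -e on a host that can reach both servers)
0 2 * * * /usr/local/bin/refresh_dev.sh >> /var/log/refresh_dev.log 2>&1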

Ansible? Playbooks shine—idempotent, vault secrets.

CI/CD: GitHub Actions cron, trigger on merge. Airflow/Dagster for complex flows.

Scale to Kubernetes? kubectl run pg-dump --rm -i --restart=Never --image=postgres ...


Sources

  1. PostgreSQL pg_dump Documentation — Official guide to pg_dump formats, options, and piping: https://www.postgresql.org/docs/current/app-pgdump.html
  2. PostgreSQL pg_basebackup Documentation — Details on full cluster binary backups and replication streaming: https://www.postgresql.org/docs/current/app-pgbasebackup.html
  3. PostgreSQL Backup and Restore Guide — Comparisons of dump formats and best practices for logical backups: https://www.postgresql.org/docs/current/backup-dump.html
  4. pgcopydb GitHub Repository — Advanced parallel database copy tool for live migrations: https://github.com/dimitri/pgcopydb
  5. Stack Overflow: Copying PostgreSQL Database to Another Server — Community examples for piping dumps over SSH: https://stackoverflow.com/questions/1237725/copying-postgresql-database-to-another-server
  6. Stack Overflow: Faster Way to Copy PostgreSQL Database — Tips on parallel pg_dump/pg_restore for speed gains: https://stackoverflow.com/questions/15692508/a-faster-way-to-copy-a-postgresql-database-or-the-best-way

Conclusion

Copying a PostgreSQL database from prod to dev boils down to pg_dump/pg_restore piping for singles—quickest, easiest, most flexible. Scale to pg_basebackup or pgcopydb for clusters; automate everything else. You’ll slash debug time, mimic prod faithfully, all without prod hiccups. Test your pipeline today; stale dev kills velocity.
