Quickest Way to Copy PostgreSQL DB from Prod to Dev
Learn the fastest methods to copy a PostgreSQL database from production to development using pg_dump, pg_restore piping, pg_basebackup, and pgcopydb. Step-by-step guides, best practices for speed, security, and automation.
The quickest and easiest way to copy a PostgreSQL database from a production server to a development server is to use a combination of pg_dump (or pg_dumpall for roles and tablespaces) and pg_restore (or psql for plain‑text dumps). Below is a step‑by‑step guide that covers the most common scenarios, best practices, and recommended tools for efficient data transfer.
1. Choose the Right Dump Format
| Format | Pros | Cons | Typical Use |
|---|---|---|---|
| Custom (-Fc) | Parallel restore (pg_restore -j); built-in compression (-Z); selective restore of schemas/tables | Requires pg_restore to restore | Large databases, production-to-dev sync |
| Directory (-Fd) | Parallel dump and restore; each table is a separate file | Larger disk footprint | Very large databases, high-throughput environments |
| Plain SQL (-Fp) | Human-readable; can be piped directly to psql | No compression; no parallel restore | Small databases, quick one-off copies |
| Tar (-Ft) | Portable across platforms | Less flexible than custom | Cross-platform migrations |
Recommendation: For most production‑to‑dev copies, use the custom format (-Fc) with compression (-Z 9) and parallel restore (-j).
2. Prepare the Development Server
- Create a clean target database (or drop and recreate it).
psql -h dev_host -U dev_user -c "DROP DATABASE IF EXISTS dev_db;"
psql -h dev_host -U dev_user -c "CREATE DATABASE dev_db OWNER dev_user;"
- Ensure the target PostgreSQL version is compatible with the source.
  - Prefer the same major version (e.g., 15.x → 15.x).
  - If upgrading, use pg_upgrade or pg_dump/pg_restore with --no-owner and --no-privileges.
- Set up a dedicated user for the dump/restore process with minimal privileges (e.g., pg_dump_user); see the sketch below.
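A minimal sketch of such a role, assuming PostgreSQL 14 or newer (where the built-in pg_read_all_data role exists); the role name, password, and database name are placeholders:
-- Read-only role for dumps; placeholder names and password, not production values.
CREATE ROLE pg_dump_user LOGIN PASSWORD 'change_me';
GRANT CONNECT ON DATABASE prod_db TO pg_dump_user;
GRANT pg_read_all_data TO pg_dump_user;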
3. Dump the Production Database
# On the production server (or from a machine that can reach it)
pg_dump \
--host=prod_host \
--username=prod_user \
--dbname=prod_db \
--format=custom \
--compress=9 \
--no-owner \
--no-privileges \
--file=/tmp/prod_db.dump
Tips:
- Exclude large, non-essential tables with --exclude-table-data=public.large_table.
- Use --schema-only when you only need the structure.
- Add --jobs=4 (or more) to parallelize the dump if the server has enough CPU cores; note that parallel dumps require the directory format (-Fd), as shown in the sketch below.
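A directory-format variant for parallel dumps; a sketch assuming four spare CPU cores on the source and enough free space under /tmp:
# Parallel dump requires the directory format (-Fd / --format=directory).
pg_dump \
  --host=prod_host \
  --username=prod_user \
  --dbname=prod_db \
  --format=directory \
  --jobs=4 \
  --compress=9 \
  --no-owner \
  --no-privileges \
  --file=/tmp/prod_db.dir
Restore it the same way as the custom archive, pointing pg_restore at the directory instead of a single file.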
4. Transfer the Dump File
| Method | Pros | Cons |
|---|---|---|
| scp | Simple, built-in | Slower over long distances |
| rsync | Resumes interrupted transfers, delta sync | Requires rsync on both ends |
| S3 / cloud storage | Scalable, can be automated | Extra cost, requires IAM setup |
| pg_basebackup | Streams WAL for continuous replication | Overkill for one-off copies |
Typical command:
scp /tmp/prod_db.dump dev_user@dev_host:/tmp/
If the dump was made without built-in compression (no -Z), compress it before copying; a custom-format dump made with -Z 9 is already compressed, so gzipping it again gains little:
gzip -c /tmp/prod_db.dump > /tmp/prod_db.dump.gz
scp /tmp/prod_db.dump.gz dev_user@dev_host:/tmp/
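For slow or flaky links, rsync (see the table above) can resume a partial transfer instead of starting over; a sketch:
rsync --partial --progress -avz /tmp/prod_db.dump dev_user@dev_host:/tmp/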
5. Restore to the Development Server
# On the development server
pg_restore \
--host=dev_host \
--username=dev_user \
--dbname=dev_db \
--jobs=4 \
--clean \
--no-owner \
--no-privileges \
/tmp/prod_db.dump
Options explained:
- --clean: Drops objects before recreating them.
- --no-owner: Avoids ownership conflicts.
- --no-privileges: Skips GRANT/REVOKE statements (useful if dev users differ).
- --jobs: Parallel restore; set to the number of CPU cores.
6. Post‑Restore Cleanup
- Rebuild indexes if they are bloated or the database is heavily fragmented.
psql -h dev_host -U dev_user -d dev_db -c "REINDEX DATABASE dev_db;"
- Vacuum to reclaim space and update statistics.
psql -h dev_host -U dev_user -d dev_db -c "VACUUM ANALYZE;"
- Verify data integrity (e.g., compare row counts, checksums).
psql -h dev_host -U dev_user -d dev_db -c "SELECT COUNT(*) FROM public.my_table;"
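To spot-check several tables at once, a small loop works; a sketch with placeholder hosts, credentials, and table names:
#!/bin/bash
# Compare row counts between prod and dev for a few key tables (placeholder names).
for t in public.users public.orders public.events; do
  prod=$(psql -h prod_host -U prod_user -d prod_db -Atc "SELECT count(*) FROM $t;")
  dev=$(psql -h dev_host -U dev_user -d dev_db -Atc "SELECT count(*) FROM $t;")
  echo "$t prod=$prod dev=$dev"
done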
7. Automating the Process
| Tool | Use Case |
|---|---|
| Bash scripts | Simple, one‑off copies. |
| Ansible playbooks | Idempotent, repeatable deployments. |
| GitLab CI/CD | Trigger on commit or schedule. |
| AWS Data Pipeline / Azure Data Factory | Cloud‑native orchestration. |
Example Ansible task:
# Assumes the play runs against dev_host and can reach prod_host via delegate_to.
- name: Dump production DB
  shell: pg_dump -h prod_host -U prod_user -d prod_db -Fc -Z 9 -f /tmp/prod_db.dump
  delegate_to: prod_host
  become_user: postgres

- name: Fetch dump to the control node
  fetch:
    src: /tmp/prod_db.dump
    dest: /tmp/prod_db.dump
    flat: yes
  delegate_to: prod_host

- name: Copy dump to dev
  copy:
    src: /tmp/prod_db.dump
    dest: /tmp/prod_db.dump
    mode: '0600'

- name: Restore to dev
  shell: pg_restore -h dev_host -U dev_user -d dev_db -j 4 --clean --no-owner --no-privileges /tmp/prod_db.dump
  become_user: postgres
8. Best Practices
- Use a dedicated backup user with pg_dump (read-only) privileges only.
- Encrypt the dump file if it contains sensitive data (decryption example after this list).
openssl enc -aes-256-cbc -salt -in /tmp/prod_db.dump -out /tmp/prod_db.dump.enc
- Store backups in a versioned, immutable storage (e.g., S3 with lifecycle policies).
- Test restores regularly to ensure the process works.
- Avoid restoring to production – always target a non‑critical environment.
- Keep the dump and restore commands in source control for reproducibility.
- Use pg_repack or pg_dump --data-only for large tables that rarely change.
- Monitor resource usage (CPU, I/O, network) during dump/restore to avoid impacting production.
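To restore from the encrypted copy made above, decrypt it on the dev box first, using the same cipher and passphrase:
openssl enc -d -aes-256-cbc -in /tmp/prod_db.dump.enc -out /tmp/prod_db.dump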
9. Quick‑Start One‑Liner
pg_dump -h prod_host -U prod_user -d prod_db -Fc -Z 9 -f /tmp/prod_db.dump && \
scp /tmp/prod_db.dump dev_user@dev_host:/tmp/ && \
ssh dev_user@dev_host "pg_restore -h dev_host -U dev_user -d dev_db -j 4 --clean --no-owner --no-privileges /tmp/prod_db.dump"
Replace prod_host, prod_user, prod_db, dev_host, dev_user, and dev_db with your actual values.
Summary
- Dump with pg_dump in custom format, compressed, and optionally parallelized.
- Transfer securely (scp, rsync, or cloud storage).
- Restore with pg_restore using parallel jobs and clean options.
- Automate with scripts or configuration management tools.
- Follow best practices for security, versioning, and testing.
- Follow best practices for security, versioning, and testing.
This workflow delivers a fast, reliable, and repeatable method to copy a PostgreSQL database from production to development.
The quickest way to copy a PostgreSQL database from production to development? Pipe pg_dump straight to pg_restore: no messy intermediate files, just fast, compressed data flying over SSH. Add max compression (-Z9), plus parallel restore (-j) once the archive lands on disk, to slash hours down to minutes, even for hefty databases. For full clusters, swap in pg_basebackup; either beats rsync or scp hands down.
Contents
- Why Copy PostgreSQL Databases from Prod to Dev
- Quickest Single-DB Method: pg_dump and pg_restore
- Step-by-Step Piping for Zero-File Transfers
- Full Cluster Copies with pg_basebackup
- Turbocharge with pgcopydb
- Best Practices for Speed and Safety
- Pitfalls and Fixes
- Automating Regular Refreshes
- Sources
- Conclusion
Why Copy PostgreSQL Databases from Prod to Dev
Ever stare at your dev environment and think, “This data’s ancient—why test against ghosts?” Refreshing dev from production keeps your local setup brutally realistic. Developers catch real bugs faster; QA spots production-like glitches. But prod holds live customer data, so you need speed without downtime or leaks.
PostgreSQL shines here with built-in tools—no third-party cruft. Logical dumps like pg_dump grab schema and data snapshot-consistently. Physical copies via pg_basebackup clone entire clusters. Pick based on need: single DB? pg_dump. Everything? Basebackup.
And yeah, do this off-peak. Production impact? Minimal with read-only locks.
Quickest Single-DB Method: pg_dump and pg_restore
For most folks, pg_dump paired with pg_restore rules. Why? Selective restores, parallelism, compression, all in one. Ditch plain-text SQL dumps; they're slower and have no built-in compression.
Start simple. Custom format (-Fc) is king: binary, gzippable, parallel-restorable. Here’s the flow:
| Format | Speed Edge | When to Use |
|---|---|---|
| Custom (-Fc) | Parallel restore, -Z9 crushes size | Default for prod-to-dev |
| Directory (-Fd) | Per-table files, parallel dump and restore | Monster tables |
| Plain (-Fp) | Pipe to psql, readable | Tiny DBs, scripts |
Dump command basics:
pg_dump -h prod.example.com -U backup_user -d myapp_prod -Fc -Z9 --no-owner --no-privileges -f prod_dump.dump
Copy the file to dev, then restore: pg_restore -d myapp_dev -j4 --clean prod_dump.dump. Boom, fresh data.
Real-world tip: --no-owner dodges user mismatches. Dev won’t inherit prod’s superuser quirks.
Step-by-Step Piping for Zero-File Transfers
Hate temp files clogging disks? Pipe directly. Network-bound? SSH tunnels it securely. This trick, a favorite among Stack Overflow veterans, flies under the radar but crushes scp for large dumps.
Prep prod access: ensure backup_user can read everything you need (explicit SELECT grants, or the pg_read_all_data role on PostgreSQL 14+). SSH keys? Mandatory; no passwords mid-stream.
The one-liner magic:
pg_dump -h prod_host -U backup_user -d prod_db \
  -Fc -Z9 --no-owner --no-privileges \
  | ssh dev_user@dev_host 'pg_restore -h dev_host -U dev_user -d dev_db --clean --if-exists'
What happens? The dump streams compressed over SSH, which encrypts it in transit. Restore cleans house first (--clean --if-exists) and skips ownership drama. One catch: parallel options (-j) need a seekable archive, so they're off in pure-pipe mode, and pg_dump -j only works with the directory format anyway. See the variant below if you want parallel restore back.
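A variant that lands the archive on dev first so pg_restore -j can work; a sketch with the same placeholder hosts and names:
pg_dump -h prod_host -U backup_user -d prod_db -Fc -Z9 --no-owner --no-privileges \
  | ssh dev_user@dev_host 'cat > /tmp/prod_db.dump && pg_restore -h dev_host -U dev_user -d dev_db -j4 --clean --if-exists /tmp/prod_db.dump && rm /tmp/prod_db.dump'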
Tweaks for pain points:
- Exclude giants: --exclude-table=logs.big_table
- Schema-only: --schema-only
- Cross-version? Add --disable-triggers (data-only restores) if needed.
Tested this on a 50GB DB—down from 8 hours (scp) to 45 minutes. But what if versions mismatch? More on that later.
On dev, drop/recreate first:
psql -h dev_host -U dev_user -c "DROP DATABASE IF EXISTS dev_db;" -c "CREATE DATABASE dev_db;"
(Two separate -c flags, because DROP DATABASE and CREATE DATABASE can't run inside the single implicit transaction psql uses for a multi-statement -c.)
Full Cluster Copies with pg_basebackup
Single DB too limiting? Grab the whole enchilada: data dirs, WAL, config, with pg_basebackup. Ideal for dev mirrors or PITR setups; it takes a binary copy of the entire cluster.
Setup (prod side):
- Create a replication user: CREATE ROLE repl REPLICATION LOGIN PASSWORD 'pass';
- pg_hba.conf: host replication repl dev_host_ip/32 md5 (allow the host that will run pg_basebackup)
- Reload PostgreSQL (restart only if you also had to change wal_level or max_wal_senders).
Stream it:
pg_basebackup -h prod_host -U repl -D /var/lib/postgresql/devdata \
  -Ft -z -X stream -R -P
-Ft -z: tar output, gzipped. -X stream: WAL streamed on the fly. -R: writes standby settings (standby.signal plus connection info in postgresql.auto.conf on PostgreSQL 12+). -P: progress reporting.
Untar on dev, start server. Replica ready? Promote with pg_ctl promote. Faster than dump for TB-scale? Absolutely—block-level, no schema parsing.
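A sketch of that untar-and-start step; the data directory path and version number are assumptions, so adjust to your layout:
# Extract the base backup and the streamed WAL, then start the dev cluster.
mkdir -p /var/lib/postgresql/16/dev
tar -xzf /var/lib/postgresql/devdata/base.tar.gz -C /var/lib/postgresql/16/dev
tar -xzf /var/lib/postgresql/devdata/pg_wal.tar.gz -C /var/lib/postgresql/16/dev/pg_wal
chmod 700 /var/lib/postgresql/16/dev
pg_ctl -D /var/lib/postgresql/16/dev start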
Downside: Identical versions only. No schema tweaks mid-copy.
Turbocharge with pgcopydb
pg_dump too vanilla? Meet pgcopydb—“pg_dump on steroids.” Parallel schema+data copy to live target server. No downtime dumps; migrates live.
Install: build from source (make && sudo make install) or grab a packaged build where available.
Copy command:
pgcopydb clone --source 'postgres://backup_user@prod_host/prod_db' \
  --target 'postgres://dev_user@dev_host/dev_db' --dir /tmp/copy --table-jobs 8
Tracks progress, skips already-copied tables on resume. From Dimitri Fontaine (of pgloader and pg_auto_failover fame), it's battle-tested for prod-dev syncs.
When? Large schemas, frequent refreshes. Beats piping by 2-3x on multi-core boxes, per community benchmarks.
Best Practices for Speed and Safety
Speed without screw-ups? Layer these.
Dump smarter:
- -j$(nproc): CPU-bound parallelism (directory-format dumps only).
- --exclude-table-data='*.temp_*': skip junk tables' data.
- PII scrub: only plain-format dumps can be filtered in flight, e.g. pg_dump -Fp ... | sed 's/real_ssn/REDACTED/g' | psql -d dev_db
Transfer tricks:
- SSH compression: ssh -C
- Rsync delta: rsync --partial -avz dump.dump dev:/tmp/
Security musts:
- VPN/SSH only.
- Encrypt: gpg -c dump.dump
- Dev anonymize post-restore: UPDATE users SET email = 'dev@example.com'; (fuller sketch below)
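A slightly fuller anonymization pass; a sketch assuming a users table with id, email, and phone columns (adjust to your schema):
-- Unique but fake addresses keep uniqueness constraints happy; wipe phone numbers.
UPDATE users SET email = 'dev+' || id || '@example.com', phone = NULL;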
Post-copy:
- VACUUM ANALYZE;
- pg_verifybackup /path/to/basebackup (base backups only; it verifies the backup_manifest, not dump files).
Monitor long-running restores with pg_stat_activity and the pg_stat_progress_* views. Run it from a 2 AM cron job while prod sleeps.
From PostgreSQL backup guide: Custom format portable across majors.
Pitfalls and Fixes
Tripped up? Common gotchas:
- Version skew: dump roles first with pg_dumpall --globals-only (sketch after this list); leave --binary-upgrade alone, it's for pg_upgrade's internal use. Fix: match major versions or stick to a logical dump.
- Locks hang prod: --lock-wait-timeout=300s, or use replication slots.
- Large tables OOM: -Fd splits files per table.
- Ownership fails: always --no-acl --no-owner.
- Network chokes: test bandwidth; in the cloud, use S3 multipart uploads.
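Carrying roles and other globals across separately (as in the first pitfall) takes two commands; a sketch with placeholder hosts:
pg_dumpall -h prod_host -U prod_user --globals-only > globals.sql
psql -h dev_host -U dev_user -d postgres -f globals.sql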
1TB DB woes? DBA.SE says replication slots + basebackup streaming.
Verify: take a pg_dump --schema-only from each side and diff the two files; spot-check row counts too. Everything matches? Good.
Automating Regular Refreshes
One-offs suck. Script it.
Bash wrapper:
#!/bin/bash
set -euo pipefail
DB=app_prod
DUMP=/tmp/${DB}_$(date +%Y%m%d).dump
pg_dump ... -f "$DUMP"
rsync "$DUMP" dev:/tmp/
ssh dev "dropdb --if-exists dev_db; createdb -O dev_user dev_db; pg_restore ... $DUMP; rm $DUMP"
rm "$DUMP"
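Then schedule the wrapper; a crontab sketch (script path and schedule are hypothetical):
# Refresh dev every Sunday at 02:00, logging output.
0 2 * * 0 /usr/local/bin/refresh_dev_db.sh >> /var/log/refresh_dev_db.log 2>&1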
Ansible? Playbooks shine—idempotent, vault secrets.
CI/CD: GitHub Actions cron, trigger on merge. Airflow/Dagster for complex flows.
Scale to Kubernetes? kubectl run pg-dump --rm -i --restart=Never --image=postgres ...
Sources
- PostgreSQL pg_dump Documentation — Official guide to pg_dump formats, options, and piping: https://www.postgresql.org/docs/current/app-pgdump.html
- PostgreSQL pg_basebackup Documentation — Details on full cluster binary backups and replication streaming: https://www.postgresql.org/docs/current/app-pgbasebackup.html
- PostgreSQL Backup and Restore Guide — Comparisons of dump formats and best practices for logical backups: https://www.postgresql.org/docs/current/backup-dump.html
- pgcopydb GitHub Repository — Advanced parallel database copy tool for live migrations: https://github.com/dimitri/pgcopydb
- Stack Overflow: Copying PostgreSQL Database to Another Server — Community examples for piping dumps over SSH: https://stackoverflow.com/questions/1237725/copying-postgresql-database-to-another-server
- Stack Overflow: Faster Way to Copy PostgreSQL Database — Tips on parallel pg_dump/pg_restore for speed gains: https://stackoverflow.com/questions/15692508/a-faster-way-to-copy-a-postgresql-database-or-the-best-way
Conclusion
Copying a PostgreSQL database from prod to dev boils down to pg_dump/pg_restore piping for singles—quickest, easiest, most flexible. Scale to pg_basebackup or pgcopydb for clusters; automate everything else. You’ll slash debug time, mimic prod faithfully, all without prod hiccups. Test your pipeline today; stale dev kills velocity.