AWS Redshift ETL | Redshift to PostgreSQL data transfer Python script
AWS Python programmers
Redshift extract, transform, and load (ETL) Python script.
Redshift to PostgreSQL.
This Python script can be run either as an AWS Lambda function or as a standalone script.
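A minimal sketch of how one script could serve both as a Lambda function and as a standalone program. The `run_transfer` function and the `tables` event key are hypothetical names used only for illustration; `lambda_handler(event, context)` is the usual shape of a Python Lambda handler.

```python
import json
import sys


def run_transfer(tables):
    """Hypothetical placeholder for the real transfer logic."""
    return {"tables": list(tables), "status": "ok"}


def lambda_handler(event, context):
    """Lambda entry point: table names are read from the event payload."""
    return run_transfer(event.get("tables", []))


if __name__ == "__main__":
    # Standalone entry point: table names are read from the command line.
    print(json.dumps(run_transfer(sys.argv[1:])))
```

Both entry points call the same function, so the transfer logic does not need to know how it was started.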
Redshift data is first exported efficiently to S3 files. The export is fast, taking less than 10 minutes for 30 GB of data, and can scale to larger volumes depending on the Redshift configuration.
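The listing does not say how the export is performed. One common approach is Redshift's `UNLOAD` command, which writes query results to S3 in parallel. The sketch below only builds the SQL text; the function name, the chosen options, and the example bucket and role values are assumptions.

```python
import re

# Accept only plain "table" or "schema.table" names to avoid SQL injection.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def build_unload_sql(table, s3_prefix, iam_role):
    """Return an UNLOAD statement that exports one table to gzipped CSV files."""
    if not _IDENT.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    return (
        f"UNLOAD ('SELECT * FROM {table}') "
        f"TO '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV HEADER GZIP PARALLEL ON ALLOWOVERWRITE"
    )
```

For example, `build_unload_sql("public.orders", "s3://example-bucket/orders/", "arn:aws:iam::123456789012:role/example")` returns a statement that can be executed through any Redshift connection (the bucket and role here are placeholders).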
The data in the S3 bucket is then cleaned up and transformed by the Python script.
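The actual cleanup rules are not described in this listing. As an illustration only, the sketch below assumes the exported files are CSV and applies two simple rules: trim whitespace around every value and drop blank rows. The function name `clean_csv` is hypothetical.

```python
import csv
import io


def clean_csv(text):
    """Trim whitespace around every cell and drop rows that are entirely blank."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # skip empty lines and rows of blank cells
        writer.writerow([cell.strip() for cell in row])
    return out.getvalue()
```

Using the `csv` module rather than splitting on commas keeps quoted values that contain commas intact.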
AWS Glue can also be used at this step for specific use cases.
The S3 data is then loaded into the PostgreSQL database.
The PostgreSQL server can be an AWS RDS instance, an on-premises server, or hosted elsewhere.
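The loading method is not specified here. A common choice for bulk loads is PostgreSQL's `COPY ... FROM STDIN`, which the psycopg2 driver exposes through `cursor.copy_expert`. The sketch below assumes a psycopg2 connection and a CSV file object already downloaded from S3; `copy_sql` and `load_csv` are hypothetical helper names.

```python
def copy_sql(table, columns):
    """Build a COPY statement that reads CSV with a header row from STDIN."""
    cols = ", ".join(columns)
    return f"COPY {table} ({cols}) FROM STDIN WITH (FORMAT csv, HEADER true)"


def load_csv(conn, table, columns, fileobj):
    """Stream a CSV file object into a table over an open psycopg2 connection."""
    with conn.cursor() as cur:
        cur.copy_expert(copy_sql(table, columns), fileobj)
    conn.commit()
```

Because the function only needs a connection object, the same code works whether the connection points at RDS or at an on-premises server.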
Multithreaded operation of the Python script ensures fast data transfer and loading into PostgreSQL.
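One way to run per-table transfers concurrently is a thread pool from the standard library. This is a sketch under the assumption that each table can be transferred independently; `transfer_all` and the `transfer_one` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def transfer_all(tables, transfer_one, max_workers=4):
    """Run transfer_one(table) for every table in parallel threads.

    Returns a dict mapping each table to its result, or to the exception
    it raised, so one failing table does not stop the others.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(transfer_one, t): t for t in tables}
        for fut in as_completed(futures):
            table = futures[fut]
            try:
                results[table] = fut.result()
            except Exception as exc:
                results[table] = exc
    return results
```

Threads suit this workload because the time is spent waiting on network and database I/O rather than on Python computation.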
CloudWatch Events or cron jobs can be configured to start the Python script automatically.
Redshift table hash checking can be used to load only the database tables that have changed.
The script can transfer only the PostgreSQL or Redshift tables that have changed.
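The hashing scheme is not described in this listing. As one possible approach, the sketch below computes a SHA-256 digest over a table's rows (assumed to be read in a stable order) and compares it with the digest saved from the previous run. The function names are hypothetical.

```python
import hashlib


def table_hash(rows):
    """Return a SHA-256 hex digest of the rows, which must be in a stable order."""
    h = hashlib.sha256()
    for row in rows:
        h.update(repr(tuple(row)).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def changed_tables(current, previous):
    """List tables whose digest is new or differs from the previous run."""
    return sorted(t for t, digest in current.items() if previous.get(t) != digest)
```

Only the tables returned by `changed_tables` need to be exported and reloaded; the rest can be skipped.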
Very detailed custom S3 logging is provided in addition to verbose CloudWatch logs.
The Redshift data transfer Python script can restart from the point where it left off transferring data into PostgreSQL.
A Redshift to PostgreSQL data transfer can resume after an interruption.
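How the script records its progress is not stated here. A simple way to get resumable behavior is a checkpoint file that lists the tables already transferred. The sketch below assumes table-level granularity and a local JSON file; the function names and the checkpoint format are hypothetical.

```python
import json
import os


def load_done(path):
    """Return the set of tables recorded as finished, or an empty set."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return set(json.load(f))


def save_done(path, done):
    """Write the checkpoint to a temp file, then swap it in atomically."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, path)


def run_with_resume(tables, transfer_one, path):
    """Transfer each table not yet in the checkpoint, saving after each one."""
    done = load_done(path)
    for table in tables:
        if table in done:
            continue  # finished in an earlier run
        transfer_one(table)
        done.add(table)
        save_done(path, done)
    return done
```

If a run is interrupted, calling `run_with_resume` again with the same checkpoint path skips the finished tables and continues with the rest.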