polt

Polt is a tool designed to streamline the process of archiving data from Aurora MySQL tables. It offers efficient and safe data transfer to various destinations, including another table in the same database, AWS S3, or permanent deletion (aka sending data to /dev/null).

Prerequisites

MySQL 8.0

License

The primary license for polt is Apache 2.0; however, some files are licensed under MPL 2.0 (as indicated in their SPDX headers).

Local Development & Testing

See DEVELOPMENT.md for instructions on how to run tests with dbdeployer or docker.

2 modes

Polt moves data in 2 modes.

Stage
- Moves data from the source table to an internal staging table in the same database.
Archive
- Moves data from the staging table to the final destination.

Every archive operation runs the 2 modes serially to move the data to its final destination. Because archive mode works only on the staging table, it does not put any load on the original source table.
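The two modes can be pictured with a small in-memory sketch. It is only an illustration of the data flow described above, not Polt's implementation: the `Row` type, the `Old` flag (standing in for whatever condition selects rows to archive), and the `stage` and `archive` functions are all hypothetical names, and plain slices stand in for the MySQL tables and the destination.

```go
package main

import "fmt"

// Row is a hypothetical record; in Polt the rows live in a MySQL table.
type Row struct {
	ID  int
	Old bool // assumption: marks rows that are eligible for archiving
}

// stage moves eligible rows from the source into the staging slice and
// returns the rows that remain in the source. This is the only step that
// touches the source.
func stage(source []Row, staging *[]Row) []Row {
	kept := make([]Row, 0, len(source))
	for _, r := range source {
		if r.Old {
			*staging = append(*staging, r)
		} else {
			kept = append(kept, r)
		}
	}
	return kept
}

// archive drains the staging slice into the destination. It never reads
// the source, which mirrors why archive mode adds no load to the source table.
func archive(staging *[]Row, dest *[]Row) {
	*dest = append(*dest, *staging...)
	*staging = nil
}

func main() {
	source := []Row{{ID: 1, Old: true}, {ID: 2, Old: false}, {ID: 3, Old: true}}
	var staging, dest []Row

	// The two modes run serially: stage first, then archive.
	source = stage(source, &staging)
	archive(&staging, &dest)

	fmt.Println(len(source), len(staging), len(dest)) // prints "1 0 2"
}
```

Keeping the second step confined to the staging data is the design point: a slow or retried write to the destination does not hold anything open on the source.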

Archive Support

Currently, Polt supports table and S3 archive destinations. For S3, files are written in Parquet format using Snappy compression. For more details about the Parquet format, see "Parquet File Format" and "more explanation".

S3 archive requirements & limitations

The IAM role used for writing to S3 should have the following policy permissions:

    statement {
      effect = "Allow"
      actions = [
        "s3:PutObject",
      ]
      resources = [
        "arn:aws:s3:::{bucket}/*",
      ]
    }
    statement {
      effect = "Allow"
      actions = [
        "kms:GenerateDataKey",
      ]
      resources = [
        "{bucket_kms_key_arn}",
      ]
    }

- The S3 bucket destination path should be in the format {bucket}/{prefix} or s3://{bucket}/{prefix}. For example, my-bucket/prefix/.
- Currently, Polt only supports the S3 archive destination with Parquet format. There is no official support for writing directly to Iceberg tables in S3 yet.
- If the S3 archive job fails, it is resumable, but there might be some data duplication in the destination due to the nature of chunking and the checkpointing mechanism. Future versions may address this issue. Until then, you can refer to the example Glue (Spark) job "Remove Duplicates from Parquet Files" to remove duplicates from the Parquet files and insert the rows into an Iceberg table.
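To make the two accepted destination path formats concrete, here is a minimal sketch of how such a path could be split into a bucket and a prefix. `ParseS3Destination` is a hypothetical helper written for this illustration; it is not Polt's actual parser, and Polt may validate paths differently.

```go
package main

import (
	"fmt"
	"strings"
)

// ParseS3Destination is a hypothetical helper (not part of Polt) that accepts
// either "{bucket}/{prefix}" or "s3://{bucket}/{prefix}" and returns the parts.
func ParseS3Destination(dest string) (bucket, prefix string, err error) {
	// Drop the optional scheme so both formats are handled the same way.
	trimmed := strings.TrimPrefix(dest, "s3://")

	// Everything before the first "/" is the bucket; the rest is the prefix.
	bucket, prefix, _ = strings.Cut(trimmed, "/")
	if bucket == "" {
		return "", "", fmt.Errorf("s3 destination %q has no bucket", dest)
	}
	return bucket, prefix, nil
}

func main() {
	for _, d := range []string{"my-bucket/prefix/", "s3://my-bucket/prefix/"} {
		bucket, prefix, err := ParseS3Destination(d)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("bucket=%s prefix=%s\n", bucket, prefix)
	}
}
```

Both example inputs yield the bucket `my-bucket` and the prefix `prefix/`, so the two formats are interchangeable in this sketch.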

Dependencies

Relies on mysql-client-driver for connecting to MySQL databases.
Relies on Spirit for chunking the data to be archived and for other utilities, such as database connections and loading table metadata.
