feat: support s3 urls for input and output #32
Open
grusell wants to merge 2 commits into svt:master from
This commit adds support for using S3 URLs in the format s3://<BUCKET>/<KEY> for both input and output.
If an S3 URL is used as input, a presigned URL is created and used as input to ffmpeg. The duration of the presigned URLs can be controlled with the 'remote-files.s3.presignDurationSeconds' config property.
If an S3 URL is used for 'outputFolder', output will first be stored locally and then uploaded to S3 once transcoding is finished.
AWS credentials are read with DefaultCredentialsProvider, meaning they can be provided in a number of ways; see https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/auth/credentials/DefaultCredentialsProvider.html
Note that when using S3 URLs for input, the presigned URLs will be shown in the logs. If this is not desirable, setting logging.config (or env variable LOGGING_CONFIG) to 'classpath:logback-json-mask-s3-presign.xml' will use a log config that masks the presign query parameters.
By setting the env variable REMOTEFILES_S3_ANONYMOUSACCESS to true, S3 URLs will be accessed in anonymous mode, corresponding to using the '--no-sign-request' flag with the AWS CLI. Any configured S3 access key or secret key will be ignored. Multipart upload is disabled in this case, since the S3 SDK does not support multipart upload with anonymous access.
S3 multipart upload can be configured through the following configuration properties:
- remote-files.s3.multipart.minimumPartSize
- remote-files.s3.multipart.threshold
- remote-files.s3.multipart.apicallbuffersize
These properties correspond to the properties in software.amazon.awssdk.services.s3.multipart.MultipartConfiguration; see https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/multipart/MultipartConfiguration.html
Signed-off-by: Gustav Grusell <gustav.grusell@eyevinn.se>
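The commit message pairs the property logging.config with the environment variable LOGGING_CONFIG, and REMOTEFILES_S3_ANONYMOUSACCESS follows the same pattern. Assuming Encore binds its configuration through Spring Boot's relaxed binding (which that pairing suggests), the environment variable for any remote-files property can be derived mechanically. This sketch applies Spring Boot's documented rule for environment variables (dots become underscores, dashes are removed, the result is upper-cased); the class name EnvVarNames is hypothetical.

```java
import java.util.Locale;

/**
 * Hypothetical helper illustrating Spring Boot's relaxed-binding rule for
 * environment variables: replace dots with underscores, remove dashes,
 * convert to upper case. List-index forms such as "my.list[0]" are not handled.
 */
final class EnvVarNames {
    private EnvVarNames() {}

    static String fromProperty(String property) {
        return property
                .replace('.', '_')          // remote-files.s3.x -> remote-files_s3_x
                .replace("-", "")           // remote-files_s3_x -> remotefiles_s3_x
                .toUpperCase(Locale.ROOT);  // remotefiles_s3_x -> REMOTEFILES_S3_X
    }
}
```

For example, remote-files.s3.presignDurationSeconds maps to REMOTEFILES_S3_PRESIGNDURATIONSECONDS, and remote-files.s3.multipart.minimumPartSize maps to REMOTEFILES_S3_MULTIPART_MINIMUMPARTSIZE.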
Comment by grusell (PR author): Updated for encore 0.2.9.
S3 operations are now done with the AWS CRT-based S3 client. This should enable efficient multipart uploads without the need for detailed configuration in most cases.
Multipart upload configuration properties are nested under 'remote-files.s3.multipart'. The available properties are listed below; for a description of each, see https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/S3CrtAsyncClientBuilder.html
- minimumPartSize
- threshold
- targetThroughputGbps
- maxConcurrency
minimumPartSize and threshold are specified in bytes.
Signed-off-by: Gustav Grusell <gustav.grusell@eyevinn.se>
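Since minimumPartSize and threshold are given in bytes, a rough mental model of their effect is: objects smaller than threshold are uploaded in a single request, while larger ones are split into parts of at least minimumPartSize bytes, subject to S3's limit of 10,000 parts per multipart upload. The Java sketch below encodes that simplified model. It is not the CRT client's actual part-sizing logic, which may choose different part sizes, and MultipartPlan is a hypothetical name.

```java
/**
 * Hypothetical, simplified model of how a multipart threshold and a minimum
 * part size (both in bytes) determine the number of upload requests.
 * Not the CRT client's real algorithm; it only illustrates the arithmetic
 * and S3's limit of 10,000 parts per multipart upload.
 */
final class MultipartPlan {
    static final int MAX_PARTS = 10_000;

    private MultipartPlan() {}

    /** Returns 1 for a single plain upload, otherwise the number of parts. */
    static long partCount(long objectSize, long thresholdBytes, long minimumPartSizeBytes) {
        if (objectSize < thresholdBytes) {
            return 1; // below the threshold: one request, no multipart
        }
        long partSize = effectivePartSize(objectSize, minimumPartSizeBytes);
        return ceilDiv(objectSize, partSize);
    }

    /** Grows the part size when the minimum would exceed the 10,000-part limit. */
    static long effectivePartSize(long objectSize, long minimumPartSizeBytes) {
        return Math.max(minimumPartSizeBytes, ceilDiv(objectSize, MAX_PARTS));
    }

    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }
}
```

In this model, with an 8 MiB threshold and minimum part size, a 100 MiB output is uploaded in 13 parts, while a 200 GiB file would exceed the 10,000-part cap at 8 MiB per part, so the part size has to grow to about 20.5 MiB.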
Comment by grusell (PR author): The PR has been updated with the switch to the CRT-based S3 client, which seems to give better multipart upload performance without tuning configuration parameters.
Description
This commit adds support for using S3 URLs in the format s3://<BUCKET>/<KEY> for both input and output.
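Handling both input and output URLs starts with splitting s3://<BUCKET>/<KEY> into a bucket and an object key. The following Java sketch shows one way to do it; the record name S3Url is hypothetical, and keeping the key verbatim (no percent-decoding) is an assumption, chosen so that keys with spaces, which java.net.URI would reject, still parse.

```java
/**
 * Hypothetical helper: splits "s3://<BUCKET>/<KEY>" into bucket and key.
 * The key is kept verbatim (no percent-decoding).
 */
record S3Url(String bucket, String key) {
    private static final String SCHEME_PREFIX = "s3://";

    static S3Url parse(String url) {
        // Case-insensitive check for the "s3://" scheme prefix.
        if (!url.regionMatches(true, 0, SCHEME_PREFIX, 0, SCHEME_PREFIX.length())) {
            throw new IllegalArgumentException("Not an s3 URL: " + url);
        }
        String rest = url.substring(SCHEME_PREFIX.length());
        int slash = rest.indexOf('/');
        // Everything before the first '/' is the bucket, everything after is the key.
        String bucket = slash < 0 ? rest : rest.substring(0, slash);
        String key = slash < 0 ? "" : rest.substring(slash + 1);
        if (bucket.isEmpty()) {
            throw new IllegalArgumentException("Missing bucket in: " + url);
        }
        return new S3Url(bucket, key);
    }
}
```

For instance, S3Url.parse("s3://my-bucket/videos/input.mp4") yields bucket my-bucket and key videos/input.mp4.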
If an S3 URL is used as input, a presigned URL is created and used as input to ffmpeg. The duration of the presigned URLs can be controlled with the 'remote-files.s3.presignDurationSeconds' config property.
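A presigned URL is rejected once its validity period has passed, so presignDurationSeconds presumably needs to cover at least the time ffmpeg spends requesting the input. Presigned URLs signed with Signature Version 4 can be valid for at most seven days (604,800 seconds). The sketch below shows hypothetical validation of the configured value; in the AWS SDK for Java v2 such a Duration would typically be passed to GetObjectPresignRequest.builder().signatureDuration(...), but the PR does not say whether or how it validates the value, and PresignDuration is an illustrative name.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical validation of a presign duration configured in seconds.
 * SigV4 presigned URLs can be valid for at most 7 days (604,800 s).
 */
final class PresignDuration {
    static final Duration MAX = Duration.ofDays(7);

    private PresignDuration() {}

    static Duration fromSeconds(long seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("presign duration must be positive: " + seconds);
        }
        Duration d = Duration.ofSeconds(seconds);
        if (d.compareTo(MAX) > 0) {
            throw new IllegalArgumentException("presign duration exceeds 7 days: " + seconds);
        }
        return d;
    }

    /** Instant after which requests using the presigned URL are rejected. */
    static Instant expiresAt(Instant signedAt, Duration validity) {
        return signedAt.plus(validity);
    }
}
```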
If an S3 URL is used for 'outputFolder', output will first be stored locally and then uploaded to S3 once transcoding is finished.
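Because output is written locally first and uploaded afterwards, the upload step has to map every file in the local output folder to an object key under the S3 prefix given as outputFolder. The following Java sketch computes that mapping. OutputUploadPlan, Upload and the choice to mirror the local directory layout below the prefix are assumptions for illustration, and the actual upload call is deliberately left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical plan of which local files go to which S3 keys. */
final class OutputUploadPlan {
    record Upload(Path localFile, String key) {}

    private OutputUploadPlan() {}

    /** Key for one file: the prefix plus the file's path relative to localRoot. */
    static String keyFor(Path localRoot, Path file, String prefix) {
        Path relative = localRoot.relativize(file);
        // S3 keys always use '/', regardless of the local file separator.
        StringBuilder rel = new StringBuilder();
        for (Path part : relative) {
            if (rel.length() > 0) rel.append('/');
            rel.append(part.toString());
        }
        String p = prefix.isEmpty() || prefix.endsWith("/") ? prefix : prefix + "/";
        return p + rel;
    }

    /** Lists all regular files under localRoot (sorted) with their target keys. */
    static List<Upload> plan(Path localRoot, String prefix) throws IOException {
        List<Upload> uploads = new ArrayList<>();
        try (Stream<Path> files = Files.walk(localRoot)) {
            files.filter(Files::isRegularFile)
                 .sorted()
                 .forEach(f -> uploads.add(new Upload(f, keyFor(localRoot, f, prefix))));
        }
        return uploads;
    }
}
```

Mirroring the local layout keeps relative references inside the output (for example, segment paths in a playlist) valid after upload, which is one reason to prefer it over flattening.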
AWS credentials are read with DefaultCredentialsProvider, meaning they can be provided in a number of ways; see https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/auth/credentials/DefaultCredentialsProvider.html
Note that when using S3 URLs for input, the presigned URLs will be shown in the logs. If this is not desirable, setting logging.config (or env variable LOGGING_CONFIG) to 'classpath:logback-json-mask-s3-presign.xml' will use a log config that masks the presign query parameters.
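The contents of logback-json-mask-s3-presign.xml are not shown here, so exactly which parameters it masks is unknown. As an illustration of the idea, this Java sketch masks the values of the SigV4 query parameters that carry credentials or the signature (X-Amz-Credential, X-Amz-Signature, X-Amz-Security-Token) while leaving the rest of the URL readable. PresignMasker is a hypothetical name; in logback itself a similar regex could be applied with the %replace conversion word.

```java
import java.util.regex.Pattern;

/** Hypothetical masking of sensitive presigned-URL query parameters in log text. */
final class PresignMasker {
    // Captures "X-Amz-<Name>=" and matches its value up to the next '&',
    // whitespace or double quote (the latter covers JSON-encoded log lines).
    private static final Pattern SENSITIVE = Pattern.compile(
            "(X-Amz-(?:Credential|Signature|Security-Token)=)[^&\\s\"]+");

    private PresignMasker() {}

    static String mask(String logLine) {
        // Keep the parameter name (group 1), replace only the value.
        return SENSITIVE.matcher(logLine).replaceAll("$1****");
    }
}
```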
By setting the env variable REMOTEFILES_S3_ANONYMOUSACCESS to true, S3 URLs will be accessed in anonymous mode, corresponding to using the '--no-sign-request' flag with the AWS CLI. Any configured S3 access key or secret key will be ignored. Multipart upload is disabled in this case, since the S3 SDK does not support multipart upload with anonymous access.
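The rules for anonymous mode can be summarized as a small decision: if REMOTEFILES_S3_ANONYMOUSACCESS is true, configured keys are ignored and multipart upload is off; otherwise both behave as configured. The Java sketch below encodes only those stated rules. S3AccessMode and Effective are hypothetical names, not Encore's classes; in the SDK, unsigned requests are typically made with AnonymousCredentialsProvider, but the PR does not say how it is wired. Note that this sketch only recognizes "true" (case-insensitive), whereas Spring Boot's own boolean conversion also accepts values such as "yes" or "on".

```java
import java.util.Map;

/** Hypothetical resolution of the effective S3 access mode from the environment. */
final class S3AccessMode {
    record Effective(boolean anonymous, boolean useConfiguredKeys, boolean multipartEnabled) {}

    private S3AccessMode() {}

    static Effective resolve(Map<String, String> env, boolean keysConfigured, boolean multipartRequested) {
        boolean anonymous = Boolean.parseBoolean(
                env.getOrDefault("REMOTEFILES_S3_ANONYMOUSACCESS", "false"));
        if (anonymous) {
            // Like `aws s3 ... --no-sign-request`: no credentials, no multipart upload.
            return new Effective(true, false, false);
        }
        return new Effective(false, keysConfigured, multipartRequested);
    }
}
```

In an application this would be called with System.getenv() as the environment map.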
Note that support for chunked encoding with S3 input/output is not yet implemented.
Type of change
How Has This Been Tested?
Checklist: