S3 Presigned URL
Caution
This example is for reference only. It is not extensively tested, and it is not intended to be a fully-fledged Concourse resource for production pipelines. Copy and paste at your own risk.
This example showcases the InOnlyConcourseResource, and shows how to build a resource which fetches arbitrary data that is not stored by the external service. In this particular example we will consider a resource which generates a new presigned URL for an object in an S3 bucket.
Traditionally, when Concourse users wish to “run an external function” from a pipeline, they create an OutOnlyConcourseResource to do it. Classic examples include setting a build status (as shown in the Bitbucket Build Status example), or sending a message on Slack. Usually, the new version will only contain a small amount of placeholder information, such as the new build status. However, a get step is designed to “fetch” the information for further use in the pipeline, and this is only really possible when the new version represents a “state” which is stored with the external service. For example, the Git resource will push a new commit, and then emit the version corresponding to that commit so that, when the resource runs its implicit get step, the information is not overwritten.
However, in the case of presigned URLs, AWS does not store these anywhere such that they are accessible from the server. When they are created, the user is not given a UUID which allows them to “look up” the URL at a later date. Therefore, even if the put step of the resource created the URL and downloaded it to its resource directory, the get step would overwrite it with an empty folder. The two main solutions to this are:
1. Pass the URL in the newly created version.
2. Have the put step write the URL to a different resource directory.
Option 2 requires another resource to be already available, as it isn’t possible to request additional output directories like a task step. This is doable but fragile, and requires care to make sure that no important files in the other resource are overwritten. Option 1 seems cleaner, but the URL might be sensitive, and storing it as plaintext within the version is definitely not ideal. We could consider encrypting it and having the user pass some sort of key, but this complicates matters greatly. There is a better way.
The InOnlyConcourseResource is designed to run these “functions” in the get step, and to be triggered by a put step, like so:
- put: s3-presigned-url
  get_params:
    file_path: my-file
URL Version
Because the version itself isn’t important, the InOnlyConcourseResource actually uses a prebuilt version containing nothing but a timestamp of creation time:
@dataclass(unsafe_hash=True)
class DatetimeVersion(TypedVersion):
    """
    A placeholder version containing only the time at which it was created.
    """
    execution_date: datetime

    @classmethod
    def now(cls) -> DatetimeVersion:
        """Return the version corresponding to now."""
        return cls(datetime.now())
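To illustrate why a timestamp is enough for a placeholder version, here is a minimal standalone sketch using only the standard library. The PlaceholderVersion class is a hypothetical stand-in which does not inherit from TypedVersion, so it only demonstrates the equality and hashing behaviour that the dataclass decorator provides, not the behaviour of the library class itself.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(unsafe_hash=True)
class PlaceholderVersion:
    """Hypothetical stand-in for DatetimeVersion, without the TypedVersion base."""
    execution_date: datetime

    @classmethod
    def now(cls) -> "PlaceholderVersion":
        """Return the version corresponding to now."""
        return cls(datetime.now())


first = PlaceholderVersion(datetime(2024, 1, 1, 12, 0, 0))
same = PlaceholderVersion(datetime(2024, 1, 1, 12, 0, 0))
later = PlaceholderVersion(datetime(2024, 1, 1, 12, 0, 1))

# Versions with equal timestamps compare equal and hash identically...
assert first == same and hash(first) == hash(same)
# ...while a later timestamp produces a distinct version.
assert first != later

# Because the class is hashable, versions can be collected in a set.
unique_versions = {first, same, later}
```

The version carries no information about the URL itself, which is the point: nothing sensitive ends up in the version.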
URL Resource
We start by inheriting from InOnlyConcourseResource. Again, we don’t need to pass a version:
def __init__(self, bucket_name: str, region_name: str) -> None:
    """
    Initialise self.

    :param bucket_name: The name of your bucket.
    :param region_name: The name of the region in which your bucket resides.
    """
    super().__init__()
    self.bucket_name = bucket_name
    self.client = boto3.client("s3", region_name=region_name,
                               config=Config(signature_version="s3v4"))
All of the resource functionality comes from overriding the download_data() method:
def download_data(self, destination_dir: Path, build_metadata: BuildMetadata,
                  file_path: str, expires_in: dict[str, float],
                  file_name: str | None = None,
                  url_file: str = "url") -> dict[str, str]:
    params = {
        "Bucket": self.bucket_name,
        "Key": file_path,
    }
    if file_name is not None:
        # https://stackoverflow.com/a/2612795
        content_disposition = f"attachment; filename=\"{file_name}\""
        params["ResponseContentDisposition"] = content_disposition

    expiry_seconds = int(timedelta(**expires_in).total_seconds())
    url = self.client.generate_presigned_url(ClientMethod="get_object",
                                             Params=params,
                                             ExpiresIn=expiry_seconds)

    url_file_path = destination_dir / url_file
    url_file_path.write_text(url)

    return {}
This method takes a required file_path argument to indicate the object for which the URL should be generated. The expires_in parameter takes a mapping of keyword arguments for datetime.timedelta, which allows users to specify the expiration time more explicitly. Passing a file_name instructs the URL to name the downloaded file something specific, rather than the original name from within S3, and url_file sets the name of the file to which the URL is written (url by default). The URL itself is generated using the generate_presigned_url function. Finally, we don’t return a version, so we only need to concern ourselves with the Step Metadata, which in this case is an empty mapping.
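The two small transformations in download_data() can be tried in isolation. The following is a minimal sketch using only the standard library; the helper names expiry_in_seconds and content_disposition are hypothetical and are not part of the resource, they simply wrap the same expressions used in the method.

```python
from datetime import timedelta


def expiry_in_seconds(expires_in: dict[str, float]) -> int:
    """Convert a mapping of timedelta keyword arguments to whole seconds."""
    return int(timedelta(**expires_in).total_seconds())


def content_disposition(file_name: str) -> str:
    """Build the header value which asks the client to save under file_name."""
    return f'attachment; filename="{file_name}"'


print(expiry_in_seconds({"hours": 24}))               # 86400
print(expiry_in_seconds({"days": 1, "minutes": 30}))  # 88200
print(expiry_in_seconds({"minutes": 1.5}))            # 90
print(content_disposition("file.txt"))                # attachment; filename="file.txt"
```

Because the mapping is unpacked directly into timedelta, only the keywords which timedelta accepts (such as days, hours, minutes and seconds) are valid; any other key, such as years, raises a TypeError.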
This resource can then be invoked like so:
- put: s3-presigned-url
  get_params:
    file_path: folder/file.txt
    file_name: file.txt
    expires_in:
      hours: 24
Once the implicit get step is completed, the URL can then be loaded easily and used in subsequent steps:
- load_var: s3-url
  file: s3-presigned-url/url
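As an alternative to load_var, a later task which receives the resource directory as an input could read the file directly. The following is a rough sketch of that idea, not part of the resource: the helper read_presigned_url is hypothetical, and a temporary directory with a placeholder URL stands in for the real s3-presigned-url directory so that the snippet runs on its own.

```python
import tempfile
from pathlib import Path


def read_presigned_url(resource_dir: Path, url_file: str = "url") -> str:
    """Read the URL written by the get step (hypothetical helper)."""
    return (resource_dir / url_file).read_text().strip()


with tempfile.TemporaryDirectory() as temp_dir:
    # Stand-in for the resource directory populated by the implicit get step.
    resource_dir = Path(temp_dir)
    # Placeholder value only; a real presigned URL carries signature parameters.
    placeholder_url = "https://example.com/folder/file.txt?X-Amz-Expires=86400"
    (resource_dir / "url").write_text(placeholder_url)

    loaded_url = read_presigned_url(resource_dir)
```

If a custom url_file was passed in get_params, the same name would need to be passed to the helper.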
URL Conclusion
The final resource only requires 51 lines of code, and looks like this:
# (C) Crown Copyright GCHQ
from __future__ import annotations

from datetime import timedelta
from pathlib import Path

import boto3
from botocore.client import Config

from concoursetools import BuildMetadata
from concoursetools.additional import InOnlyConcourseResource


class S3SignedURLConcourseResource(InOnlyConcourseResource):
    """
    A Concourse resource type for generating pre-signed URLs for items in S3 buckets.
    """
    def __init__(self, bucket_name: str, region_name: str) -> None:
        """
        Initialise self.

        :param bucket_name: The name of your bucket.
        :param region_name: The name of the region in which your bucket resides.
        """
        super().__init__()
        self.bucket_name = bucket_name
        self.client = boto3.client("s3", region_name=region_name,
                                   config=Config(signature_version="s3v4"))

    def download_data(self, destination_dir: Path, build_metadata: BuildMetadata,
                      file_path: str, expires_in: dict[str, float],
                      file_name: str | None = None,
                      url_file: str = "url") -> dict[str, str]:
        params = {
            "Bucket": self.bucket_name,
            "Key": file_path,
        }
        if file_name is not None:
            # https://stackoverflow.com/a/2612795
            content_disposition = f"attachment; filename=\"{file_name}\""
            params["ResponseContentDisposition"] = content_disposition

        expiry_seconds = int(timedelta(**expires_in).total_seconds())
        url = self.client.generate_presigned_url(ClientMethod="get_object",
                                                 Params=params,
                                                 ExpiresIn=expiry_seconds)

        url_file_path = destination_dir / url_file
        url_file_path.write_text(url)

        return {}