AWS EFS Sync to S3 Using Lambda
About
This post documents all the steps required to synchronize AWS EFS with an S3 bucket using a lambda function. The flow of information is from S3 to EFS and not vice versa.
The approach: whenever a new file is uploaded to or deleted from the S3 bucket, S3 creates an event notification. This event triggers a Lambda function that has the EFS file system mounted to it. The Lambda function then synchronizes the files from S3 to EFS.
Environment Details
- Python = 3.9.x
Steps
Create an S3 bucket
Let’s first create an S3 bucket that will contain our data; this is the bucket we would like to keep in sync with EFS. I am naming the bucket mydata-202203. You may name it as you please. Choose a region of your choice and leave the rest of the settings as defaults.
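If you prefer to create the bucket with code instead of the console, a minimal boto3 sketch would be (assuming us-east-1; other regions also need a LocationConstraint):

import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# in us-east-1 no CreateBucketConfiguration is needed; for any other region pass
# CreateBucketConfiguration={"LocationConstraint": "<your-region>"}
s3.create_bucket(Bucket="mydata-202203")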
Create a Lambda function
Now create a Lambda function that will receive event notifications from the S3 bucket and sync files to EFS. I am naming it mydata-sync and our runtime will be Python 3.9. Keep the rest of the settings as default, and create the function.
Create S3 event notifications
From the bucket mydata-202203, go to Properties. Scroll down to Event notifications and click Create event notification. Give the event any name; I am calling it object-sync. From the provided event types select:
* s3:ObjectCreated:Put
* s3:ObjectRemoved:Delete
From the Destination section select Lambda Function, and from the list choose the Lambda function we created in the last section, mydata-sync. Click Save changes.
Test S3 notifications
Let’s now test whether the S3 event notifications are being received by our Lambda function. For this, update the Lambda function code so that it simply prints the event received. After updating the function, make sure to deploy it.
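A minimal handler like the following sketch is enough for this test:

import json

def lambda_handler(event, context):
    # print the incoming S3 event so it shows up in the CloudWatch logs
    print(event)
    return {"statusCode": 200, "body": json.dumps(event)}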
Now upload some files to our S3 bucket; this should trigger our Lambda function. For testing, I have uploaded an empty test1.txt file to our bucket. Once it is uploaded, I check the Lambda function logs to see if any event was received. For this, go to the Lambda function mydata-sync > Monitor > Logs > View logs in CloudWatch. In the CloudWatch console, view the latest log stream. Below is the event I received in the logs:
{'Records': [{'eventVersion': '2.1', 'eventSource': 'aws:s3', 'awsRegion': 'us-east-1', 'eventTime': '2022-03-28T16:08:00.896Z', 'eventName': 'ObjectCreated:Put', 'userIdentity': {'principalId': 'AWS:AIDA3VIXXJNKIVU6P5NY3'}, 'requestParameters': {'sourceIPAddress': '202.163.113.76'}, 'responseElements': {'x-amz-request-id': '39MD61ZS00SNK2RT', 'x-amz-id-2': 'U+zPUWOrfzTuVi7kbaBONLHoJXKqUICsVqyKBg4yPKYbUV7pQLGc4Z5A2fSIVvDFtSJHC6v29EDJoXhypWsj2wXanUu8YrLocr3z+yK1qoo='}, 's3': {'s3SchemaVersion': '1.0', 'configurationId': 'object-sync', 'bucket': {'name': 'mydata-202203', 'ownerIdentity': {'principalId': 'AYAQOSFZ1VPK'}, 'arn': 'arn:aws:s3:::mydata-202203'}, 'object': {'key': 'test1.txt', 'size': 0, 'eTag': 'd41d8cd98f00b204e9800998ecf8427e', 'sequencer': '006241DD60D67A4556'}}}]}
Let’s load this event into a dictionary and find some important parameters.
event = {'Records': [{'eventVersion': '2.1', 'eventSource': 'aws:s3', 'awsRegion': 'us-east-1', 'eventTime': '2022-03-28T16:08:00.896Z', 'eventName': 'ObjectCreated:Put', 'userIdentity': {'principalId': 'AWS:AIDA3VIXXJNKIVU6P5NY3'}, 'requestParameters': {'sourceIPAddress': '202.163.113.76'}, 'responseElements': {'x-amz-request-id': '39MD61ZS00SNK2RT', 'x-amz-id-2': 'U+zPUWOrfzTuVi7kbaBONLHoJXKqUICsVqyKBg4yPKYbUV7pQLGc4Z5A2fSIVvDFtSJHC6v29EDJoXhypWsj2wXanUu8YrLocr3z+yK1qoo='}, 's3': {'s3SchemaVersion': '1.0', 'configurationId': 'object-sync', 'bucket': {'name': 'mydata-202203', 'ownerIdentity': {'principalId': 'AYAQOSFZ1VPK'}, 'arn': 'arn:aws:s3:::mydata-202203'}, 'object': {'key': 'test1.txt', 'size': 0, 'eTag': 'd41d8cd98f00b204e9800998ecf8427e', 'sequencer': '006241DD60D67A4556'}}}]}
event
{'Records': [{'eventVersion': '2.1',
'eventSource': 'aws:s3',
'awsRegion': 'us-east-1',
'eventTime': '2022-03-28T16:08:00.896Z',
'eventName': 'ObjectCreated:Put',
'userIdentity': {'principalId': 'AWS:AIDA3VIXXJNKIVU6P5NY3'},
'requestParameters': {'sourceIPAddress': '202.163.113.76'},
'responseElements': {'x-amz-request-id': '39MD61ZS00SNK2RT',
'x-amz-id-2': 'U+zPUWOrfzTuVi7kbaBONLHoJXKqUICsVqyKBg4yPKYbUV7pQLGc4Z5A2fSIVvDFtSJHC6v29EDJoXhypWsj2wXanUu8YrLocr3z+yK1qoo='},
's3': {'s3SchemaVersion': '1.0',
'configurationId': 'object-sync',
'bucket': {'name': 'mydata-202203',
'ownerIdentity': {'principalId': 'AYAQOSFZ1VPK'},
'arn': 'arn:aws:s3:::mydata-202203'},
'object': {'key': 'test1.txt',
'size': 0,
'eTag': 'd41d8cd98f00b204e9800998ecf8427e',
'sequencer': '006241DD60D67A4556'}}}]}
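The fields we will need later are the event name, the bucket name, and the object key, which can be pulled out of the event like this:

event_name = event["Records"][0]["eventName"]                # 'ObjectCreated:Put'
bucket_name = event["Records"][0]["s3"]["bucket"]["name"]    # 'mydata-202203'
object_key = event["Records"][0]["s3"]["object"]["key"]      # 'test1.txt'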
Alright, we have confirmed that notifications from the S3 bucket are reaching our Lambda function, so let’s move on to the next section.
Create an EFS
From the EFS console, create a file system named mydata-efs. I am using the default VPC for this post. Use Regional availability settings and click Create. Once the file system is created, click on Access points and create an access point through which this EFS can be mounted in other services. For the access point, use the following settings:
* Name = mydata-ap
* Root directory path = /efs
* POSIX user
  * POSIX UID = 1000
  * Group ID = 1000
* Root directory creation permissions
  * Owner user ID = 1000
  * Owner group ID = 1000
  * POSIX permissions = 777
Click Create.
Here I have used the root directory path /efs, which means that access through this access point will be limited to the folder /efs. If you want to provide access to all folders, set the root path to /.
Note on EFS security group settings
In the last section, I used the default VPC security group (sg) while creating the EFS. The default sg allows traffic for all protocols and all ports, both inbound and outbound. But if you are using a custom security group, make sure that it has an inbound rule for:
* Type = NFS
* Protocol = TCP
* Port range = 2049
Otherwise, you will not be able to access EFS using NFS clients.
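If you would rather add this rule with code, a boto3 sketch would look like the following (the security group ID here is a placeholder for your EFS security group):

import boto3

ec2 = boto3.client("ec2")

# allow inbound NFS (TCP 2049) from resources in the same security group;
# sg-0123456789abcdef0 is a placeholder, use your EFS security group ID
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 2049,
        "ToPort": 2049,
        "UserIdGroupPairs": [{"GroupId": "sg-0123456789abcdef0"}],
    }],
)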
Mount EFS to Lambda Function
Mounting an EFS file system to the Lambda function requires some additional steps.
First, add permissions to the Lambda function. Go to the Lambda function > Configurations > Permissions > Execution role, and click on the execution role to open it in the IAM console. For the selected role, attach the additional policy AmazonElasticFileSystemFullAccess.
Second, add the Lambda function to the VPC in which the EFS was created. We created the EFS in the default VPC, so let’s add the Lambda function to it. For this, go to Lambda Configurations > VPC and click Edit. In the next pane select the default VPC, all subnets, and the default VPC security group, then click Save.
Now we can add the EFS to Lambda. Go to Lambda Configurations > File systems > Add file system. Select the file system mydata-efs, the associated access point mydata-ap, and set the local mount path to /mnt/efs. The local mount path is the directory from which we can access our EFS inside the Lambda environment. Click Save.
Check EFS mount point from Lambda
Let’s verify from Lambda that the EFS has been mounted and that we can access it. Update the Lambda code as in the sketch below and deploy it.
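A minimal sketch that produces the log output shown further below (/mnt/efs is the mount path we configured in the previous section):

import json
import os

MOUNT_PATH = "/mnt/efs"

def lambda_handler(event, context):
    # check that the EFS mount point exists inside the Lambda environment
    if os.path.exists(MOUNT_PATH):
        print(f"{MOUNT_PATH} exists")
        # list the contents of the mounted directory (empty for a fresh EFS)
        print(os.listdir(MOUNT_PATH))
    else:
        print(f"{MOUNT_PATH} does not exist")
    print(event)
    return {"statusCode": 200, "body": json.dumps(event)}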
Now test this code using the S3 Put test event. For this, go to Lambda Test > Create new event > Template (s3-put). The S3 Put test event is similar to the one we saw in the last section; we can use this template to simulate an event received from the S3 bucket. Once the test has executed successfully, check the log output.
START RequestId: 2e307a14-f373-46d5-b763-594d5f406ae6 Version: $LATEST
/mnt/efs exists
[]
{'Records': [{'eventVersion': '2.0', 'eventSource': 'aws:s3', 'awsRegion': 'us-east-1', 'eventTime': '1970-01-01T00:00:00.000Z', 'eventName': 'ObjectCreated:Put', 'userIdentity': {'principalId': 'EXAMPLE'}, 'requestParameters': {'sourceIPAddress': '127.0.0.1'}, 'responseElements': {'x-amz-request-id': 'EXAMPLE123456789', 'x-amz-id-2': 'EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH'}, 's3': {'s3SchemaVersion': '1.0', 'configurationId': 'testConfigRule', 'bucket': {'name': 'example-bucket', 'ownerIdentity': {'principalId': 'EXAMPLE'}, 'arn': 'arn:aws:s3:::example-bucket'}, 'object': {'key': 'test%2Fkey', 'size': 1024, 'eTag': '0123456789abcdef0123456789abcdef', 'sequencer': '0A1B2C3D4E5F678901'}}}]}
END RequestId: 2e307a14-f373-46d5-b763-594d5f406ae6
REPORT RequestId: 2e307a14-f373-46d5-b763-594d5f406ae6 Duration: 7.02 ms Billed Duration: 8 ms Memory Size: 128 MB Max Memory Used: 37 MB Init Duration: 93.81 ms
From the logs we can see that the mounted EFS directory /mnt/efs exists, but currently the folder is empty.
Configure VPC endpoint for S3
So far we have configured S3 notifications to trigger a Lambda function and mounted EFS to it. Our next step is to process the event received in Lambda and download the file from S3 to EFS. But since our Lambda function is now attached to a VPC, it cannot reach S3 directly: we still receive S3 event notifications, but when we try to connect to S3 to download a file we will get a timeout error. To fix this we will create a VPC endpoint for S3.
For this, go to the VPC console > Endpoints > Create endpoint, and set the following:
* Name = mydata-ep
* Service category = AWS services
* Services = com.amazonaws.us-east-1.s3 (Gateway)
* VPC = default
* Route table = default (main route table)
* Policy = Full access
Click Create endpoint.
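The same gateway endpoint can also be created with boto3; a sketch with placeholder VPC and route table IDs:

import boto3

ec2 = boto3.client("ec2")

# vpc-0123456789abcdef0 and rtb-0123456789abcdef0 are placeholders for your
# default VPC ID and its main route table ID
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
)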
Configure S3 permissions for Lambda
For Lambda to be able to connect to S3, we also need to give it the proper permissions. Go to Lambda > Configurations > Permissions > Execution role and click on the role name. From the IAM role console select Add permissions, and attach AmazonS3FullAccess.
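Attaching a managed policy can also be done with boto3 if you prefer; a sketch with a placeholder role name (the same pattern applies to the AmazonElasticFileSystemFullAccess policy we attached earlier):

import boto3

iam = boto3.client("iam")

# "mydata-sync-role" is a placeholder; use the Lambda function's actual execution role name
iam.attach_role_policy(
    RoleName="mydata-sync-role",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess",
)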
Process S3 event notifications
Our Lambda and EFS are ready, and we can now process S3 events. Update the Lambda code as below. It downloads newly created objects to EFS and deletes removed ones, keeping EFS in sync with the S3 bucket.
import json
import os
import urllib.parse

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    event_name = event["Records"][0]["eventName"]
    bucket_name = event["Records"][0]["s3"]["bucket"]["name"]
    # object keys in S3 event notifications are URL-encoded, so decode them first
    object_key = urllib.parse.unquote_plus(event["Records"][0]["s3"]["object"]["key"])
    efs_file_name = "/mnt/efs/" + object_key

    # S3 put: download the new object from S3 onto the mounted EFS
    if event_name == "ObjectCreated:Put":
        s3.download_file(bucket_name, object_key, efs_file_name)
        print(f"file downloaded: {efs_file_name}")

    # S3 delete: remove the file from EFS if it exists
    if event_name == "ObjectRemoved:Delete":
        if os.path.exists(efs_file_name):
            os.remove(efs_file_name)
            print(f"file deleted: {efs_file_name}")

    return {"statusCode": 200, "body": json.dumps(event)}
We can test this code using the S3 Put test event we used last time. Modify the bucket name and object key in the event as below.
{
"Records": [
{
"eventVersion": "2.0",
"eventSource": "aws:s3",
"awsRegion": "us-east-1",
"eventTime": "1970-01-01T00:00:00.000Z",
"eventName": "ObjectCreated:Put",
"userIdentity": {
"principalId": "EXAMPLE"
},
"requestParameters": {
"sourceIPAddress": "127.0.0.1"
},
"responseElements": {
"x-amz-request-id": "EXAMPLE123456789",
"x-amz-id-2": "EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH"
},
"s3": {
"s3SchemaVersion": "1.0",
"configurationId": "testConfigRule",
"bucket": {
"name": "mydata-202203",
"ownerIdentity": {
"principalId": "EXAMPLE"
},
"arn": "arn:aws:s3:::example-bucket"
},
"object": {
"key": "test1.txt",
"size": 1024,
"eTag": "0123456789abcdef0123456789abcdef",
"sequencer": "0A1B2C3D4E5F678901"
}
}
}
]
}
Click Test. From the output logs, we can see that our code was able to download the file from the S3 bucket and write it to EFS.
START RequestId: 7e9c0dc2-f970-426e-8372-e59b07f5536c Version: $LATEST
file downloaded: /mnt/efs/test1.txt
END RequestId: 7e9c0dc2-f970-426e-8372-e59b07f5536c
REPORT RequestId: 7e9c0dc2-f970-426e-8372-e59b07f5536c Duration: 370.00 ms Billed Duration: 371 ms Memory Size: 128 MB Max Memory Used: 72 MB Init Duration: 367.68 ms
Note that if you get any permission errors, they could be due to mount path issues. Double-check the access point root directory path and the Lambda local mount path.
Verify file on EFS
We can verify the files on EFS by mounting the file system directly on an EC2 machine and checking from there. So let’s do that.
Create an EC2 instance:
* AMI = Amazon Linux 2 AMI (HVM) - Kernel 5.10, SSD Volume Type
* Instance type = t2.micro (free tier)
* Instance details
  * Network = default VPC
  * Auto-assign Public IP = Enable
* Review and Launch > Launch > Proceed without a key pair
Once the instance is up and running, select it and connect using the EC2 Instance Connect option. Create a directory named efs using the command
mkdir efs
In a separate tab, open EFS and click on the file system we created. Click Attach. From “Mount via DNS”, copy the command for the NFS client and paste it into the EC2 terminal:
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-0c9526e2f48ece247.efs.us-east-1.amazonaws.com:/ efs
Once the file system is successfully mounted, verify that the file test1.txt exists in EFS. We can also delete the file from S3 and verify on EFS that it has been removed.
Summary
A summary of all the steps:
* Create an S3 bucket
* Create a Lambda function
* Create event notifications on the S3 bucket to trigger the Lambda function
* Create an EFS file system and its access point; check the security group settings for inbound NFS rules
* Add EFS and S3 permissions to the Lambda function
* Add the Lambda function to the VPC
* Create a VPC endpoint for S3
* Update the Lambda code to process event notifications
* Use EC2 to mount EFS and verify the files