AWS EFS Sync to S3 Using Lambda

aws
lambda
efs
s3
synchronization
A tutorial on synchronizing EFS with an S3 bucket using a Lambda function.
Published

March 28, 2022

About

This post documents all the steps required to synchronize AWS EFS with an S3 bucket using a lambda function. The flow of information is from S3 to EFS and not vice versa.

The approach: whenever a new file is uploaded to or deleted from the S3 bucket, S3 creates an event notification. This event triggers a Lambda function that has the EFS file system mounted to it. The Lambda function then synchronizes the files from S3 to EFS.

(Figure: the approach — an S3 event notification triggers the Lambda function, which syncs files to EFS)

Environment Details

  • Python = 3.9.x

Steps

Create an S3 bucket

Let’s first create an S3 bucket that will contain our data; this is the bucket we want to keep in sync with EFS. I am naming the bucket mydata-202203, but you may name it as you please. Choose a region and leave the rest of the settings as defaults.
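
If you prefer to script this step, a minimal boto3 sketch of the same bucket creation might look like the following (the bucket name and region match the ones used in this post; adjust both to your own setup):

import boto3

# assumes AWS credentials are already configured locally
s3 = boto3.client("s3", region_name="us-east-1")

# in us-east-1 no CreateBucketConfiguration is needed; for any other
# region, pass CreateBucketConfiguration={"LocationConstraint": region}
s3.create_bucket(Bucket="mydata-202203")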

Create a Lambda function

Now create a Lambda function that will receive event notifications from the S3 bucket and sync files to EFS. I am naming it mydata-sync, and our runtime will be Python 3.9. Keep the rest of the settings as defaults and create the function.
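
The function could also be created programmatically; below is only a hedged boto3 sketch, where the execution role ARN is a placeholder and function.zip is assumed to contain a lambda_function.py with a lambda_handler:

import boto3

lam = boto3.client("lambda", region_name="us-east-1")

# function.zip is assumed to contain lambda_function.py with a lambda_handler
with open("function.zip", "rb") as f:
    lam.create_function(
        FunctionName="mydata-sync",
        Runtime="python3.9",
        Handler="lambda_function.lambda_handler",
        Role="arn:aws:iam::123456789012:role/mydata-sync-role",  # placeholder role ARN
        Code={"ZipFile": f.read()},
    )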

Create S3 event notifications

From the bucket mydata-202203, go to Properties. Scroll down to Event notifications and click Create. Give the event any name; I am calling it object-sync. From the provided event types select:

  • s3:ObjectCreated:Put
  • s3:ObjectRemoved:Delete

From the Destination section select Lambda Function, and from the list choose the Lambda function we created in the last section, mydata-sync.

Click Save Changes
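
The same configuration can be scripted with boto3; a sketch, where the Lambda function ARN is a placeholder. Note that outside the console you also need to grant S3 permission to invoke the function (the console adds this permission for you):

import boto3

lambda_arn = "arn:aws:lambda:us-east-1:123456789012:function:mydata-sync"  # placeholder

# allow the bucket to invoke the function (the console normally does this)
boto3.client("lambda").add_permission(
    FunctionName="mydata-sync",
    StatementId="s3-invoke",
    Action="lambda:InvokeFunction",
    Principal="s3.amazonaws.com",
    SourceArn="arn:aws:s3:::mydata-202203",
)

# register the event notification on the bucket
boto3.client("s3").put_bucket_notification_configuration(
    Bucket="mydata-202203",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "Id": "object-sync",
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:Put", "s3:ObjectRemoved:Delete"],
            }
        ]
    },
)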

Test S3 notifications

Let’s now test whether S3 event notifications are being received by our Lambda function. For this, update the Lambda function code to simply print the received event. After updating the code, make sure to deploy it.

import json

def lambda_handler(event, context):
    print(event)
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

Now upload some files to our S3 bucket; this should trigger our Lambda function. For testing, I have uploaded an empty test1.txt file to our bucket. Once it is successfully uploaded, I check the Lambda function logs to see if any event was received. For this, go to the Lambda function mydata-sync > Monitor > Logs > View logs in CloudWatch. In the CloudWatch console, view the latest log stream. Below is the event I received in the logs.

{'Records': [{'eventVersion': '2.1', 'eventSource': 'aws:s3', 'awsRegion': 'us-east-1', 'eventTime': '2022-03-28T16:08:00.896Z', 'eventName': 'ObjectCreated:Put', 'userIdentity': {'principalId': 'AWS:AIDA3VIXXJNKIVU6P5NY3'}, 'requestParameters': {'sourceIPAddress': '202.163.113.76'}, 'responseElements': {'x-amz-request-id': '39MD61ZS00SNK2RT', 'x-amz-id-2': 'U+zPUWOrfzTuVi7kbaBONLHoJXKqUICsVqyKBg4yPKYbUV7pQLGc4Z5A2fSIVvDFtSJHC6v29EDJoXhypWsj2wXanUu8YrLocr3z+yK1qoo='}, 's3': {'s3SchemaVersion': '1.0', 'configurationId': 'object-sync', 'bucket': {'name': 'mydata-202203', 'ownerIdentity': {'principalId': 'AYAQOSFZ1VPK'}, 'arn': 'arn:aws:s3:::mydata-202203'}, 'object': {'key': 'test1.txt', 'size': 0, 'eTag': 'd41d8cd98f00b204e9800998ecf8427e', 'sequencer': '006241DD60D67A4556'}}}]}

Let’s load this event into a dictionary and extract some important parameters.

event = {'Records': [{'eventVersion': '2.1', 'eventSource': 'aws:s3', 'awsRegion': 'us-east-1', 'eventTime': '2022-03-28T16:08:00.896Z', 'eventName': 'ObjectCreated:Put', 'userIdentity': {'principalId': 'AWS:AIDA3VIXXJNKIVU6P5NY3'}, 'requestParameters': {'sourceIPAddress': '202.163.113.76'}, 'responseElements': {'x-amz-request-id': '39MD61ZS00SNK2RT', 'x-amz-id-2': 'U+zPUWOrfzTuVi7kbaBONLHoJXKqUICsVqyKBg4yPKYbUV7pQLGc4Z5A2fSIVvDFtSJHC6v29EDJoXhypWsj2wXanUu8YrLocr3z+yK1qoo='}, 's3': {'s3SchemaVersion': '1.0', 'configurationId': 'object-sync', 'bucket': {'name': 'mydata-202203', 'ownerIdentity': {'principalId': 'AYAQOSFZ1VPK'}, 'arn': 'arn:aws:s3:::mydata-202203'}, 'object': {'key': 'test1.txt', 'size': 0, 'eTag': 'd41d8cd98f00b204e9800998ecf8427e', 'sequencer': '006241DD60D67A4556'}}}]}
event
{'Records': [{'eventVersion': '2.1',
   'eventSource': 'aws:s3',
   'awsRegion': 'us-east-1',
   'eventTime': '2022-03-28T16:08:00.896Z',
   'eventName': 'ObjectCreated:Put',
   'userIdentity': {'principalId': 'AWS:AIDA3VIXXJNKIVU6P5NY3'},
   'requestParameters': {'sourceIPAddress': '202.163.113.76'},
   'responseElements': {'x-amz-request-id': '39MD61ZS00SNK2RT',
    'x-amz-id-2': 'U+zPUWOrfzTuVi7kbaBONLHoJXKqUICsVqyKBg4yPKYbUV7pQLGc4Z5A2fSIVvDFtSJHC6v29EDJoXhypWsj2wXanUu8YrLocr3z+yK1qoo='},
   's3': {'s3SchemaVersion': '1.0',
    'configurationId': 'object-sync',
    'bucket': {'name': 'mydata-202203',
     'ownerIdentity': {'principalId': 'AYAQOSFZ1VPK'},
     'arn': 'arn:aws:s3:::mydata-202203'},
    'object': {'key': 'test1.txt',
     'size': 0,
     'eTag': 'd41d8cd98f00b204e9800998ecf8427e',
     'sequencer': '006241DD60D67A4556'}}}]}
##
# event name
event['Records'][0]['eventName']
'ObjectCreated:Put'
##
# bucket name
event['Records'][0]['s3']['bucket']['name']
'mydata-202203'
##
# uploaded object key
event['Records'][0]['s3']['object']['key']
'test1.txt'

Alright, we have seen that we are receiving notifications from the S3 bucket, so let’s now move on to the next section.

Create an EFS

From the EFS console, create a file system named mydata-efs. I am using the default VPC for this post. Use the Regional availability setting and click Create. Once the file system is created, click on Access points and create an access point through which this EFS can be mounted from other services. For the access point use the following settings:

  • name = mydata-ap
  • root dir path = /efs
  • POSIX user
      • POSIX UID = 1000
      • Group ID = 1000
  • Root directory creation permissions
      • Owner user ID = 1000
      • Owner group ID = 1000
      • POSIX permissions = 777

Click Create.

Here I have used /efs as the root directory path, which means that access through this access point is limited to the folder /efs. If you want to provide access to all folders, set the root path to /.
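
For reference, the same access point could be created with boto3; this is only a sketch, where the file system ID is a placeholder for your own:

import boto3

efs = boto3.client("efs")

efs.create_access_point(
    FileSystemId="fs-0123456789abcdef0",  # placeholder: use your file system ID
    PosixUser={"Uid": 1000, "Gid": 1000},
    RootDirectory={
        "Path": "/efs",
        "CreationInfo": {"OwnerUid": 1000, "OwnerGid": 1000, "Permissions": "777"},
    },
    Tags=[{"Key": "Name", "Value": "mydata-ap"}],
)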

Note on EFS security group settings

In the last section, I used the default VPC security group (sg) while creating the EFS. The default sg allows traffic for all protocols and all ports, both inbound and outbound. But if you are using a custom security group, make sure that it has an inbound rule with:

  • Type = NFS
  • Protocol = TCP
  • Port range = 2049

Otherwise, you will not be able to access EFS using NFS clients.
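
If you do need to add that rule to a custom security group, a boto3 sketch would look like the following (both security group IDs are placeholders):

import boto3

ec2 = boto3.client("ec2")

# allow inbound NFS (TCP 2049) on the EFS security group from the
# security group used by the NFS clients (the Lambda function or EC2)
ec2.authorize_security_group_ingress(
    GroupId="sg-efs-placeholder",
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 2049,
            "ToPort": 2049,
            "UserIdGroupPairs": [{"GroupId": "sg-clients-placeholder"}],
        }
    ],
)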

Mount EFS to Lambda Function

Mounting an EFS to a Lambda function requires some additional steps.

First, add permissions to the Lambda function. From the Lambda function go to Configurations > Permissions > Execution role. Click on the execution role to open it in the IAM console. For the selected role, attach the additional policy AmazonElasticFileSystemFullAccess.

Second, add the Lambda function to the VPC in which the EFS was created. We created the EFS in the default VPC, so let’s add the Lambda to it. For this, from the Lambda Configurations > VPC, click Edit. In the pane that opens, select the default VPC, all subnets, and the default VPC security group, then click Save.

Now we can add the EFS to the Lambda. Go to Lambda Configurations > File systems > Add file system. Select the file system mydata-efs, the associated access point mydata-ap, and a local mount path of /mnt/efs. The local mount path is the directory from which we can access our EFS inside the Lambda environment. Click Save.
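
The VPC and file system settings from the last two steps can also be applied with boto3; this is only a sketch, with placeholder subnet, security group, and access point identifiers:

import boto3

lam = boto3.client("lambda")

# 1) put the function in the same VPC as the EFS (default VPC; placeholder IDs)
lam.update_function_configuration(
    FunctionName="mydata-sync",
    VpcConfig={
        "SubnetIds": ["subnet-aaaa1111", "subnet-bbbb2222"],
        "SecurityGroupIds": ["sg-default1234"],
    },
)
lam.get_waiter("function_updated").wait(FunctionName="mydata-sync")

# 2) attach the EFS access point (placeholder ARN) at /mnt/efs
lam.update_function_configuration(
    FunctionName="mydata-sync",
    FileSystemConfigs=[
        {
            "Arn": "arn:aws:elasticfilesystem:us-east-1:123456789012:access-point/fsap-0123456789abcdef0",
            "LocalMountPath": "/mnt/efs",
        }
    ],
)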

Check EFS mount point from Lambda

Let’s verify from the Lambda that the EFS has been mounted and that we can access it. Update the Lambda code as below and deploy it.

import json
import os

def lambda_handler(event, context):
    mount_path = '/mnt/efs'
    if os.path.exists(mount_path):
        print(f"{mount_path} exists")
        print(os.listdir('/mnt/efs'))
    
    print(event)

    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

Now test this code using an S3 Put test event. For this go to Lambda Test > Create new event > Template (s3-put). The ‘S3 Put’ test event is similar to the one we saw in the last section, so we can use this template to simulate the event received from the S3 bucket. Once the test has executed successfully, check the log output.

START RequestId: 2e307a14-f373-46d5-b763-594d5f406ae6 Version: $LATEST
/mnt/efs exists
[]
{'Records': [{'eventVersion': '2.0', 'eventSource': 'aws:s3', 'awsRegion': 'us-east-1', 'eventTime': '1970-01-01T00:00:00.000Z', 'eventName': 'ObjectCreated:Put', 'userIdentity': {'principalId': 'EXAMPLE'}, 'requestParameters': {'sourceIPAddress': '127.0.0.1'}, 'responseElements': {'x-amz-request-id': 'EXAMPLE123456789', 'x-amz-id-2': 'EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH'}, 's3': {'s3SchemaVersion': '1.0', 'configurationId': 'testConfigRule', 'bucket': {'name': 'example-bucket', 'ownerIdentity': {'principalId': 'EXAMPLE'}, 'arn': 'arn:aws:s3:::example-bucket'}, 'object': {'key': 'test%2Fkey', 'size': 1024, 'eTag': '0123456789abcdef0123456789abcdef', 'sequencer': '0A1B2C3D4E5F678901'}}}]}
END RequestId: 2e307a14-f373-46d5-b763-594d5f406ae6
REPORT RequestId: 2e307a14-f373-46d5-b763-594d5f406ae6  Duration: 7.02 ms   Billed Duration: 8 ms   Memory Size: 128 MB Max Memory Used: 37 MB  Init Duration: 93.81 ms 

From the logs we can see that the mounted EFS directory /mnt/efs exists, but the folder is currently empty.

Configure VPC endpoint for S3

So far we have configured S3 notifications to trigger a Lambda function and mounted the EFS to it. Our next step is to process the event received in the Lambda and download the file from S3 to EFS. But since our Lambda function is now attached to a VPC, it cannot reach S3 directly. We can still receive S3 event notifications, but when we try to connect to S3 to download a file we will get a timeout error. To fix this, we will create a VPC endpoint for S3.

For this, go to the VPC console > Endpoints > Create endpoint, and set the following:

  • name = mydata-ep
  • service category = AWS services
  • services = com.amazonaws.us-east-1.s3 (Gateway)
  • vpc = default
  • route table = default (main route table)
  • policy = full access

Click Create endpoint
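
A boto3 sketch of the same gateway endpoint (the VPC and route table IDs are placeholders):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",            # placeholder: the default VPC
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],  # placeholder: the main route table
    TagSpecifications=[
        {"ResourceType": "vpc-endpoint", "Tags": [{"Key": "Name", "Value": "mydata-ep"}]}
    ],
)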

Configure S3 permissions for Lambda

For the Lambda to be able to connect to S3, we also need to give it the proper permissions. For this, go to Lambda > Configurations > Permissions > Execution role and click on the role name. From the IAM role console select Add permissions, and then attach AmazonS3FullAccess.
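
Attaching a managed policy can also be done with boto3; a sketch, where the execution role name is a placeholder (the same call works for the AmazonElasticFileSystemFullAccess policy attached earlier):

import boto3

iam = boto3.client("iam")

iam.attach_role_policy(
    RoleName="mydata-sync-role",  # placeholder: the Lambda execution role name
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess",
)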

Process S3 event notifications

Our Lambda and EFS are ready, and we can now process S3 events. Update the Lambda code as below. It will download new objects to EFS and delete removed ones from EFS, keeping it in sync with the S3 bucket.

import json
import os
import urllib.parse

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):

    event_name = event["Records"][0]["eventName"]
    bucket_name = event["Records"][0]["s3"]["bucket"]["name"]
    # object keys in S3 event notifications are URL-encoded (e.g. spaces become '+')
    object_key = urllib.parse.unquote_plus(event["Records"][0]["s3"]["object"]["key"])

    efs_file_name = "/mnt/efs/" + object_key

    # S3 put: download the new object from S3 to EFS
    if event_name == "ObjectCreated:Put":
        s3.download_file(bucket_name, object_key, efs_file_name)
        print(f"file downloaded: {efs_file_name}")

    # S3 delete: remove the file from EFS if it exists
    if event_name == "ObjectRemoved:Delete":
        if os.path.exists(efs_file_name):
            os.remove(efs_file_name)
            print(f"file deleted: {efs_file_name}")

    return {"statusCode": 200, "body": json.dumps(event)}

We can test this code using the S3 Put test event we used last time; modify the bucket name and object key as below.

{
  "Records": [
    {
      "eventVersion": "2.0",
      "eventSource": "aws:s3",
      "awsRegion": "us-east-1",
      "eventTime": "1970-01-01T00:00:00.000Z",
      "eventName": "ObjectCreated:Put",
      "userIdentity": {
        "principalId": "EXAMPLE"
      },
      "requestParameters": {
        "sourceIPAddress": "127.0.0.1"
      },
      "responseElements": {
        "x-amz-request-id": "EXAMPLE123456789",
        "x-amz-id-2": "EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH"
      },
      "s3": {
        "s3SchemaVersion": "1.0",
        "configurationId": "testConfigRule",
        "bucket": {
          "name": "mydata-202203",
          "ownerIdentity": {
            "principalId": "EXAMPLE"
          },
          "arn": "arn:aws:s3:::example-bucket"
        },
        "object": {
          "key": "test1.txt",
          "size": 1024,
          "eTag": "0123456789abcdef0123456789abcdef",
          "sequencer": "0A1B2C3D4E5F678901"
        }
      }
    }
  ]
}

Click Test. From the output logs, we can see that our code was able to download the file from the S3 bucket and write it to EFS.

START RequestId: 7e9c0dc2-f970-426e-8372-e59b07f5536c Version: $LATEST
file downloaded: /mnt/efs/test1.txt
END RequestId: 7e9c0dc2-f970-426e-8372-e59b07f5536c
REPORT RequestId: 7e9c0dc2-f970-426e-8372-e59b07f5536c  Duration: 370.00 ms Billed Duration: 371 ms Memory Size: 128 MB Max Memory Used: 72 MB  Init Duration: 367.68 ms

Note that if you get any permission errors, they could be due to mount path issues. Double-check the access point root directory path and the Lambda local mount path.

Verify file on EFS

We can verify the files on EFS by mounting the file system on an EC2 instance and checking from there. So let’s do that.

Create an EC2 instance with the following settings:

  • AMI = Amazon Linux 2 AMI (HVM) - Kernel 5.10, SSD Volume Type
  • Instance type = t2.micro (free tier)
  • Instance details
      • Network = default VPC
      • Auto-assign Public IP = Enable
  • Review and Launch > Launch > Proceed without a key pair
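
For reference, a boto3 sketch of launching a similar instance in the default VPC (the AMI and subnet IDs are placeholders):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: Amazon Linux 2 AMI for your region
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
    NetworkInterfaces=[
        {
            "DeviceIndex": 0,
            "SubnetId": "subnet-0123456789abcdef0",  # placeholder: a default VPC subnet
            "AssociatePublicIpAddress": True,
        }
    ],
)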

Once the instance is up and running, click on it and connect using the EC2 Instance Connect option. Create a directory efs using the command

mkdir efs

In a separate tab, open EFS and click on the file system we created. Click Attach. From “Mount via DNS”, copy the command for the NFS client and paste it into the EC2 bash terminal.

sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-0c9526e2f48ece247.efs.us-east-1.amazonaws.com:/ efs

Once it is successfully mounted, verify that the file ‘test1.txt’ exists on the EFS. We can also delete the file from S3 and similarly verify on EFS that the file has been removed.

Summary

A summary of all the steps:

  • Create an S3 bucket
  • Create a Lambda function
  • Create event notifications on the S3 bucket to trigger the Lambda function
  • Create an EFS file system and its access point; check the security group settings for inbound NFS rules
  • Add EFS and S3 permissions to the Lambda
  • Add the Lambda to the VPC
  • Create a VPC endpoint for S3
  • Update the Lambda code to process event notifications
  • Use EC2 to mount the EFS and verify the files