Today I Learned

AWS Elastic Transcoder

September 04, 2019

Transcoding Videos on AWS

Over the past few months, my office has started moving applications from local servers to AWS. Most of the moves have gone well, but migrating our daily transcoding jobs felt a bit more challenging. The legacy setup was:

  1. The client saves mp4s to their FTP
  2. A PHP script downloads the files
  3. PHP invokes FFmpeg to transcode and save the files

Besides moving this to AWS, it was also a good time to test out Lambda serverless functions with AWS Elastic Transcoder. I had written several Lambda scripts in the past but had not touched the transcoding service, so it was time to dive in!

Project Outline

The project outline was as follows:

  1. FTP to S3
    • CloudWatch rule triggers 1st Lambda function each day
    • Lambda function moves files from client FTP to bucket-a
  2. S3 to Elastic Transcoder to S3
    • Trigger on bucket-a for 2nd Lambda function
    • 2nd Lambda invokes Elastic Transcoder
    • Elastic Transcoder saves files in bucket-b
  3. S3 to S3 copy and delete
    • Trigger on bucket-b for 3rd Lambda
    • 3rd Lambda copies files from bucket-b to bucket-c, then deletes them from bucket-b

To be honest, it feels a bit like a Rube Goldberg machine, but hey, everything new feels weird at first :-)
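Most of the glue between those steps is configuration rather than code. As a rough sketch, the daily schedule and the two bucket triggers could be wired up with boto3 along these lines; the function ARNs and rule name below are placeholders, and S3 also needs permission to invoke each function (via lambda add-permission), which is omitted here.

    import boto3

    events = boto3.client('events')
    s3 = boto3.client('s3')

    # Placeholder ARNs for the three Lambda functions
    ftp_sync_arn = "arn:aws:lambda:us-east-1:123456789012:function:ftp-to-s3"
    transcode_arn = "arn:aws:lambda:us-east-1:123456789012:function:submit-transcode-job"
    copy_arn = "arn:aws:lambda:us-east-1:123456789012:function:copy-and-clean"

    # Daily CloudWatch rule that fires the FTP sync function
    events.put_rule(Name="daily-ftp-sync", ScheduleExpression="rate(1 day)")
    events.put_targets(
        Rule="daily-ftp-sync",
        Targets=[{"Id": "ftp-sync", "Arn": ftp_sync_arn}],
    )

    # Object-created trigger on bucket-a for the transcoding function
    s3.put_bucket_notification_configuration(
        Bucket="bucket-a",
        NotificationConfiguration={
            "LambdaFunctionConfigurations": [
                {"LambdaFunctionArn": transcode_arn, "Events": ["s3:ObjectCreated:*"]}
            ]
        },
    )

    # Object-created trigger on bucket-b for the copy/delete function
    s3.put_bucket_notification_configuration(
        Bucket="bucket-b",
        NotificationConfiguration={
            "LambdaFunctionConfigurations": [
                {"LambdaFunctionArn": copy_arn, "Events": ["s3:ObjectCreated:*"]}
            ]
        },
    )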

Retrieving Files

The first step was to transfer client files from FTP to S3. This proved rather simple using the following Python script.

from __future__ import print_function
import boto3
import os
import ftplib

# AWS S3 Bucket name
bucket = "samplebucket"

# FTP Credentials
ip = "ftp.sample.com"
username = "sampleuser"
password = "samplepass"
remote_directory = "/"
ftp_archive_folder = "archive"  # FTP folder where processed files are moved (folder name assumed)

s3_client = boto3.resource('s3')

# This function will check if a given name/path is a folder to avoid downloading it
def is_ftp_folder(ftp, filename):
    try:
        res = ftp.sendcmd('MLST ' + filename)
        if 'type=dir;' in res:
            return True
        else:
            return False
    except:
        return False

def lambda_handler(event, context):
    # Connecting to FTP
    try:
        ftp = ftplib.FTP(ip)
        ftp.login(username, password)
    except ftplib.all_errors:
        print("Error connecting to FTP")
        return

    try:
        ftp.cwd(remote_directory)
    except ftplib.all_errors:
        print("Error changing to directory {}".format(remote_directory))
        return

    try:
        if not is_ftp_folder(ftp, ftp_archive_folder):
            print("Creating archive directory {}".format(ftp_archive_folder))
            ftp.mkd(ftp_archive_folder)
    except:
        print("Error creting {} directory".format(ftp_archive_folder))

    files = ftp.nlst()

    for file in files:
        if not is_ftp_folder(ftp, file):
            try:
                if os.path.isfile("/tmp/" + file):
                    print("File {} exists locally, skip".format(file))
                    try:
                        ftp.rename(file, ftp_archive_folder + "/" + file)
                    except:
                        print("Can not move file {} to archive folder".format(file))

                else:
                    print("Downloading {} ....".format(file))
                    ftp.retrbinary("RETR " + file, open("/tmp/" + file, 'wb').write)
                    try:
                        s3_client.meta.client.upload_file("/tmp/" + file, bucket, file)
                        print("File {} uploaded to S3".format(file))

                        try:
                            ftp.rename(file, ftp_archive_folder + "/" + file)
                        except:
                            print("Can not move file {} to archive folder".format(file))
                    except:
                        print("Error uploading file {} !".format(file))
            except:
                print("Error downloading file {}!".format(file))\

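Before handing the schedule over to CloudWatch, it's worth kicking the function off by hand to confirm the FTP copy behaves. A minimal way to do that with boto3, assuming a placeholder function name:

    import boto3

    lambda_client = boto3.client('lambda')

    # Manually trigger the FTP-to-S3 function (the name here is a placeholder)
    response = lambda_client.invoke(
        FunctionName="ftp-to-s3",
        InvocationType="RequestResponse",  # wait for it to finish so errors show up
    )
    print(response["StatusCode"])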
Transcoding

Once the files are in S3, they need to be reduced in size and converted to WebM. This is made possible via AWS Elastic Transcoder. After creating a pipeline in Elastic Transcoder, I used a trigger on the initial S3 bucket to invoke a second Lambda function. This function submitted the new files in S3 to Elastic Transcoder, and the transcoded files were then saved to bucket-b. Below is the Node.js I used to invoke Elastic Transcoder.

    const AWS = require('aws-sdk');
    var s3 = new AWS.S3({ apiVersion: '2012-09-25' });
    var eltr = new AWS.ElasticTranscoder({ apiVersion: '2012-09-25', region: 'us-east-1' });

    exports.handler = function(event, context) {
        console.log('Executing Elastic Transcoder Orchestrator');
        var bucket = event.Records[0].s3.bucket.name;
        console.log('passed bucket');
        var key = event.Records[0].s3.object.key;
        console.log('passed object key');

        var pipelineId = '123-abc';
        if (bucket !== 'samplebucket') {
            context.fail('Incorrect Video Input Bucket');
            return;
        }
        console.log('passed if');

        var srcKey = decodeURIComponent(key.replace(/\+/g, " ")); // the object key may contain spaces
        var newKey = srcKey.split('.')[0];
        var d = new Date();
        var hour = d.getHours();

        var params = {
            PipelineId: pipelineId,
            //OutputKeyPrefix: newKey + '/',
            Input: {
                Key: srcKey,
                FrameRate: 'auto',
                Resolution: 'auto',
                AspectRatio: 'auto',
                Interlaced: 'auto',
                Container: 'auto'
            },
            Outputs: [
                {
                    Key: newKey + hour + '.webm',
                    ThumbnailPattern: '',
                    PresetId: '1351620000001-100240' // WebM 720p
                }
            ]
        };

        console.log('Starting Job');
        eltr.createJob(params, function(err, data) {
            if (err) {
                console.log('problem in job');
                console.log(err);
                context.fail(err);
                return;
            }
            console.log(data);
            context.succeed('Job complete');
        });
    };
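The pipelineId above points at a pipeline that already existed in Elastic Transcoder. For reference, an equivalent pipeline reading from bucket-a and writing to bucket-b could also be created with boto3; the pipeline name and IAM role ARN below are placeholders, not the values we actually used.

    import boto3

    transcoder = boto3.client('elastictranscoder', region_name='us-east-1')

    # Create a pipeline that reads from bucket-a and writes to bucket-b
    # (the name and role ARN are placeholders)
    response = transcoder.create_pipeline(
        Name="daily-webm-pipeline",
        InputBucket="bucket-a",
        OutputBucket="bucket-b",
        Role="arn:aws:iam::123456789012:role/Elastic_Transcoder_Default_Role",
    )
    print(response["Pipeline"]["Id"])  # this Id is what the Lambda passes as PipelineId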

One More Move

When Elastic Transcoder saves files it does not have the ability to overwrite existing ones. If you're creating new and different content with new names each time, this isn't an issue. However, our content is just an updated version of the same file for the same HTML page, so it needs a consistent name. This means we need to clear out bucket-b before the transcoder saves new files there. To do this I used a trigger on bucket-b to invoke a 3rd Lambda function.

const aws = require('aws-sdk');
const s3 = new aws.S3();

// Define variables for the source and destination buckets
// (the object key is hard-coded here; it could also be read from the S3 event record)
var srcBucket = "bucket-b";
var destBucket = "bucket-c";
var sourceObject = "sample.webm";

exports.handler = (event, context, callback) => {

    // Copy the transcoded file over to the destination bucket first
    s3.copyObject({
        CopySource: srcBucket + '/' + sourceObject,
        Bucket: destBucket,
        Key: sourceObject
    }, function(copyErr, copyData) {
        if (copyErr) {
            console.log("Error: " + copyErr);
            callback(copyErr);
            return;
        }
        console.log('Copied OK');

        // Only delete from bucket-b once the copy has succeeded
        s3.deleteObject({ Bucket: srcBucket, Key: sourceObject }, function(err, data) {
            if (err) {
                console.log(err, err.stack);
                callback(err);
            } else {
                console.log('Deleted ' + sourceObject + ' from ' + srcBucket);
                callback(null, 'All done!');
            }
        });
    });
};

In The End

After completing this process it still feels like a Rube Goldberg machine. However, it allowed us to move another function from on premises to AWS and shed the risk of maintaining a physical machine. The next step is to convince the client to switch to SFTP for increased security.
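If that switch happens, the retrieval Lambda only needs its transport swapped: ftplib gives way to an SFTP client such as paramiko, which would have to be packaged with the function since it isn't in the Lambda runtime. A minimal sketch with placeholder credentials:

    import paramiko

    # Placeholder SFTP credentials
    host = "sftp.sample.com"
    username = "sampleuser"
    password = "samplepass"

    # Open an SFTP session and pull files down to Lambda's /tmp scratch space
    transport = paramiko.Transport((host, 22))
    transport.connect(username=username, password=password)
    sftp = paramiko.SFTPClient.from_transport(transport)

    for name in sftp.listdir("."):
        # directories would need to be skipped here, as in the ftplib version
        sftp.get(name, "/tmp/" + name)

    sftp.close()
    transport.close()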


Ryan Kovalchick

Written by Ryan Kovalchick, who lives and works in Allentown, Pennsylvania, creating, fixing and building.