How to Make a Janitor-Lambda Function to Clean Up Old Deployment Packages


What do janitors do? They clean up. Say hello to a new Lambda function that keeps your deployments nice and tidy by getting rid of the older junk hiding there.


When working with AWS Lambda, one of the things to keep in mind is that there’s a per-region limit of 75GB total size for all deployment packages. While that sounds like a lot at first glance, our small team of server engineers managed to rack up nearly 20GB of deployment packages in just over three months!
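
If you want to see how close you are to that limit, the Lambda GetAccountSettings API reports both your current code storage usage and the regional limit. Here’s a minimal sketch using the Node.js aws-sdk (a standalone check, not part of the janitor function):

'use strict';

const AWS    = require('aws-sdk');
const lambda = new AWS.Lambda({ apiVersion: '2015-03-31' });

// AccountUsage.TotalCodeSize is your current usage in bytes,
// AccountLimit.TotalCodeSize is the regional limit (75GB by default)
lambda.getAccountSettings().promise()
  .then(res => {
    let usedGB  = res.AccountUsage.TotalCodeSize / Math.pow(1024, 3);
    let limitGB = res.AccountLimit.TotalCodeSize / Math.pow(1024, 3);
    console.log(`code storage: ${usedGB.toFixed(2)}GB used of ${limitGB.toFixed(0)}GB`);
  });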

Whilst we have been mindful of deployment package sizes (because they affect cold start time) and have made heavy use of Serverless' built-in mechanism to exclude npm packages that aren't used by each function, deployment is so simple and fast that we end up doing A LOT OF DEPLOYMENTS.

Individually, most of our functions are sub-2MB, but many functions are deployed so often that in some cases there are more than 300 deployed versions! This is down to how the Serverless framework deploys functions: it publishes a new version each time. On its own, that’s not a problem, but unless you clean up the old deployment packages, you’ll eventually run into the 75GB limit.
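
To make that concrete, each deploy boils down to something like the following at the Lambda API level (a simplified sketch, not the framework’s actual implementation; the function name and zip file are hypothetical):

'use strict';

const co     = require('co');
const fs     = require('fs');
const AWS    = require('aws-sdk');
const lambda = new AWS.Lambda({ apiVersion: '2015-03-31' });

co(function* () {
  // upload the new deployment package to $LATEST
  yield lambda.updateFunctionCode({
    FunctionName: 'my-function',
    ZipFile: fs.readFileSync('my-function.zip')
  }).promise();

  // publish an immutable, numbered version that keeps its own copy of the
  // package -- this is what eats into the 75GB limit
  let published = yield lambda.publishVersion({ FunctionName: 'my-function' }).promise();

  // move the stage alias (e.g. 'dev') to the new version; whatever version it
  // pointed at before is now orphaned
  yield lambda.updateAlias({
    FunctionName: 'my-function',
    Name: 'dev',
    FunctionVersion: published.Version
  }).promise();
});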

Some readers might have heard of Netflix’s Janitor Monkey, which cleans up unused resources in your environment — instances, ASGs, EBS volumes, EBS snapshots, etc.

Taking a page out of Netflix’s book, we wrote a Lambda function that finds and deletes old versions of your functions that are not referenced by an alias. Remember, Serverless uses aliases to implement the concept of stages in Lambda, so a version that isn’t referenced by any alias is essentially orphaned.
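
For instance, you can see which version a stage alias currently points at with a quick lookup (a minimal sketch; the function and alias names are hypothetical):

'use strict';

const AWS    = require('aws-sdk');
const lambda = new AWS.Lambda({ apiVersion: '2015-03-31' });

// a stage like 'dev' is just an alias pointing at one published version;
// any version not referenced by some alias is a candidate for deletion
lambda.getAlias({ FunctionName: 'my-function', Name: 'dev' })
  .promise()
  .then(alias => console.log(`dev currently points at version ${alias.FunctionVersion}`));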

At the time of writing, we have just over 100 Lambda functions in our development environment and around 50 running in production. After deploying the janitor-lambda function, we have cut the code storage size down to 1.1GB, which includes only the current version of deployments for all our stages (we have four non-prod stages in this account).

Sidebar: if you’d like to hear more about our experience with Lambda so far and what we have been doing with it, check out the slides from my talk on the subject. I’d be happy to write them up in more detail when I have more free time.

Janitor-Lambda

Without further ado, here’s the bulk of our janitor function:

'use strict';

const _      = require('lodash');
const co     = require('co');
const AWS    = require('aws-sdk');
const lambda = new AWS.Lambda({ apiVersion: '2015-03-31' });

// kept at module level so the list of functions to clean carries over to
// subsequent invocations when the container is reused
let functions = [];

// page through all the functions in the region, following NextMarker
let listFunctions = co.wrap(function* () {
  console.log('listing all available functions');

  let loop = co.wrap(function* (marker, acc) {
    let params = {
      Marker: marker,
      MaxItems: 10
    };

    let res = yield lambda.listFunctions(params).promise();
    let functions = res.Functions.map(x => x.FunctionArn);
    let newAcc = acc.concat(functions);

    if (res.NextMarker) {
      return yield loop(res.NextMarker, newAcc);
    } else {
      // shuffle so repeated invocations don't always start with the same functions
      return _.shuffle(newAcc);
    }
  });

  return yield loop(undefined, []);
});

let listVersions = co.wrap(function* (funcArn) {
  console.log(`listing versions for function : ${funcArn}`);

  let loop = co.wrap(function* (marker, acc) {
    let params = {
      FunctionName: funcArn,
      Marker: marker,
      MaxItems: 20
    };

    let res = yield lambda.listVersionsByFunction(params).promise();
    // collect version numbers, ignoring $LATEST (it isn't a numbered version)
    let versions = res.Versions.map(x => x.Version).filter(x => x != "$LATEST");
    let newAcc = acc.concat(versions);

    if (res.NextMarker) {
      return yield loop(res.NextMarker, newAcc);
    } else {
      return newAcc;
    }
  });

  return yield loop(undefined, []);
});

let listAliasedVersions = co.wrap(function* (funcArn) {
  console.log(`listing aliases for function : ${funcArn}`);

  let loop = co.wrap(function* (marker, acc) {
    let params = {
      FunctionName: funcArn,
      Marker: marker,
      MaxItems: 20
    };

    let res = yield lambda.listAliases(params).promise();
    let versions = res.Aliases.map(x => x.FunctionVersion);
    let newAcc = acc.concat(versions);

    if (res.NextMarker) {
      return yield loop(res.NextMarker, newAcc);
    } else {
      return newAcc;
    }
  });

  return yield loop(undefined, []);
});

let deleteVersion = co.wrap(function* (funcArn, version) {
  console.log(`deleting [${funcArn}] version [${version}]`);

  let params = {
    FunctionName: funcArn,
    Qualifier: version
  };

  let res = yield lambda.deleteFunction(params).promise();
  console.log(res);
});

let cleanFunc = co.wrap(function* (funcArn) {
  console.log(`cleaning function: ${funcArn}`);
  let aliasedVersions = yield listAliasedVersions(funcArn);
  console.log('found aliased versions:\n', aliasedVersions);

  let versions = yield listVersions(funcArn);
  console.log('found versions:\n', versions);

  // delete every version that no alias references
  for (let version of versions) {
    if (!_.includes(aliasedVersions, version)) {
      yield deleteVersion(funcArn, version);
    }
  }
});

let clean = co.wrap(function* () {
  if (functions.length === 0) {
    functions = yield listFunctions();
  }

  // clone the functions that are left to do so that as we iterate with it we
  // can remove cleaned functions from 'functions'
  let toClean = functions.map(x => x);
  console.log(`${toClean.length} functions to clean:\n`, toClean);

  for (let func of toClean) {
    yield cleanFunc(func);
    _.pull(functions, func);
  }
});

module.exports.clean = clean;


Because AWS Lambda throttles the number of API calls you can make per minute, we had to store the list of functions in the module-level functions variable so that it carries over to subsequent invocations.

When we hit the (almost) inevitable throttle exception, the current invocation will end, and any functions that haven’t been completely cleaned will be cleaned the next time the function is invoked.
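
If you’d rather stop cleanly when the throttle hits instead of letting the error bubble up, you could wrap the per-function cleanup and check the error code. A hedged sketch (not part of our function; it assumes the aws-sdk surfaces the Lambda API’s throttling error as TooManyRequestsException on err.code):

'use strict';

const co = require('co');

// 'cleanFunc' is the function from the janitor snippet above, passed in as an
// argument here so the sketch stays self-contained
let tryCleanFunc = co.wrap(function* (cleanFunc, funcArn) {
  try {
    yield cleanFunc(funcArn);
    return true;                  // cleaned, safe to remove from the list
  } catch (err) {
    if (err.code === 'TooManyRequestsException') {
      console.log('throttled by the Lambda API, resuming on the next invocation');
      return false;               // leave this function in the list for next time
    }
    throw err;                    // anything else is a genuine error
  }
});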

Another thing to keep in mind is that when using a CloudWatch Event as the source of your function, Amazon will retry your function up to two more times upon failure. In this case, if the function is retried straight away, it’ll just get throttled again, which is why the handler logs and swallows any exceptions:

'use strict';

const clean = require('./lib').clean;

module.exports.handler = function(event, context, cb) {
  clean()
    .catch(err => console.log(err, err.stack)) // log and swallow errors so CloudWatch Events doesn't retry a throttled invocation
    .then(() => context.succeed());
};


I hope you have found this post useful. Let me know in the comments if you have any Lambda/Serverless related questions.


Topics: aws, lambda, serverless

Published at DZone with permission of Yan Cui, DZone MVB. See the original article here.

