AWS Lambda (part II): How to create zip from files

Hi,

Last time I talked about how to create a zip from files, and recently I received a mail from someone who wanted to know how I solved this issue. Later I realized that I had covered every part of the process but had not dug deep enough (at least technically) to explain how I managed to do it. Today I'll explain and illustrate technically how I did it, and I'll also share it on my GitHub so you can use it, modify it and improve it for your own usage.

zip-aws-lambda

As I said previously, you will have to manage streams.

  1. Define and list, in an array, a bunch of files (from one or several S3 buckets)
  2. Define an output S3 bucket
  3. Stream the sources, pipe them through a transform zip wrapper and pipe the output directly to the output (destination) S3 bucket

List your elements

In my case this was part of a bigger project, so my elements were given directly through an SNS event; I had to parse my events and then build my list. Nevertheless, whatever way you get your inputs, you can set them statically or dynamically. It does not matter as long as you get all the information you need.

I defined a kind of object called File. The model is not perfect, but it does the job for my purpose, and I invite you to improve this solution and make it fit your needs.

Basically I created a File object defined as:

const myFile = File('pictures.png', '/tmp', 'pictures')
console.log(myFile)
/*
{
  name: 'pictures',
  path: '/tmp/pictures.png',
  s3Params: [Object]
}
*/

If you check my s3Params wrapper, it has two main functions, put and get; they are designed to return the parameters AWS requires, depending on which operation you need to perform.

  1. If you want to define a local file, you bind your object to it so you can send it directly to an AWS S3 bucket as a stream, using the put method to get the minimum required parameters for the putObject method.

  2. If you want to define a remote file stored in a bucket, you will be able to fetch it afterwards using the get method's parameters.

This is my own wrapper; there is definitely nothing mandatory about using it, and you are free to define your own if you need a more customized solution or extra parameters for the getObject/putObject methods.
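To give you an idea, here is a minimal sketch of what such a File wrapper could look like. It is illustrative only: the real model in the repository differs, and the field names below are assumptions on my side.

const fs = require('fs')
const path = require('path')

// Illustrative File factory: derives the name from the file name, builds the
// local path and keeps the bucket so s3Params can produce AWS parameters.
const File = (fileName, localDir, bucket) => {
  const filePath = path.join(localDir, fileName)

  return {
    name: path.parse(fileName).name,
    path: filePath,
    s3Params: {
      // Minimum parameters for putObject: stream a local file to the bucket
      put: () => ({
        Bucket: bucket,
        Key: fileName,
        Body: fs.createReadStream(filePath)
      }),
      // Minimum parameters for getObject: fetch a remote file from the bucket
      get: () => ({
        Bucket: bucket,
        Key: fileName
      })
    }
  }
}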

Check the AWSJavaScriptSDK documentation for more information.

The advantage of my solution above is that you can create the array I mentioned earlier, the one in which you specify which files you want to zip. Furthermore, the bucket option attached to the File model brings flexibility, meaning you can build a zip stream from various buckets. In my solution, to keep things simple, I attached to my event a field called zipContent where you list your resources statically. Again, this is one solution, but you are free to do things dynamically and fetch your resources from events, a DynamoDB query or whatever. Once this is done, let's move on to the next part of the process.

Define an output bucket

I also provided this value directly in my test.json event. This could be dynamic, and you could even define multiple outputs by organizing your previous list accordingly. Indeed, the zip function is ready to take an array of arrays (a list of zipContent). The only thing you need to do is define the output for each.

e.g.:

const myFilesToZip = [
  {
    bucketOutput: 'pictures',
    zipContent: [
      'picture1.png',
      'picture2.png',
      // ...
    ]
  },
  {
    bucketOutput: 'videos',
    zipContent: [
      'video1.mov',
      'video2.mov',
      // ...
    ]
  }
]

Just make them match your pattern and you are good to go.
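For instance, once the list matches that shape, each entry simply becomes one zip in its own output bucket. The snippet below is purely illustrative: zipAndUpload is a hypothetical helper (sketched in the next section) and 'my-source-bucket' is a placeholder.

// Inside an async Lambda handler: produce one zip per entry of myFilesToZip.
// zipAndUpload and 'my-source-bucket' are placeholders for the example.
await Promise.all(
  myFilesToZip.map(({ bucketOutput, zipContent }) =>
    zipAndUpload(zipContent, 'my-source-bucket', bucketOutput, 'archive.zip')
  )
)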

Stream them all

All the magic happens here. I already talked about it: piping things together with Node.js is simple.

e.g.: let's consider we have two streams, A and B

const fs = require('fs')

const A = fs.createReadStream(fileA)
const B = fs.createWriteStream('/tmp/B.txt')
A.pipe(B)
// We piped our readStream A into the file /tmp/B.txt.

In my case I am using pump to pipe things together; it takes care of closing the streams so you do not have to handle that yourself. Apart from that, it is pretty much the same with streams and zip: you just have to put a zip wrapper between A and B, and it will transform the output into a zipped stream. As I said in my previous article, as far as I know there is no native way to use a remote S3 bucket as a writeStream; if I'm not mistaken, writeStream is designed for local files. However, s3-upload-stream did the trick and worked pretty well. With it you can turn an AWS S3 bucket into a writeStream and then pipe the output directly into it.
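To make this concrete, here is a rough sketch of the kind of pipeline I mean, assuming the AWS SDK v2, pump, s3-upload-stream and archiver as the zip wrapper (any zip transform stream would do). The zipAndUpload name and its parameters are made up for the example; my actual implementation in the repository differs.

const AWS = require('aws-sdk')
const archiver = require('archiver')
const pump = require('pump')
const s3UploadStream = require('s3-upload-stream')

const s3 = new AWS.S3()
const s3Stream = s3UploadStream(s3)

// Hypothetical helper: zip a list of keys from sourceBucket and stream the
// archive straight to bucketOutput under zipKey, without touching /tmp.
const zipAndUpload = (zipContent, sourceBucket, bucketOutput, zipKey) =>
  new Promise((resolve, reject) => {
    // Destination: the S3 bucket exposed as a writeStream
    const upload = s3Stream.upload({ Bucket: bucketOutput, Key: zipKey })
    upload.on('uploaded', resolve)

    // Zip wrapper: a transform stream placed between the sources and the upload
    const archive = archiver('zip')

    // pump pipes the zip wrapper into the S3 writeStream and takes care of
    // closing both streams if anything goes wrong
    pump(archive, upload, err => {
      if (err) reject(err)
    })

    // Sources: each S3 object read as a stream and appended to the archive
    zipContent.forEach(key => {
      const source = s3.getObject({ Bucket: sourceBucket, Key: key }).createReadStream()
      archive.append(source, { name: key })
    })

    archive.finalize()
  })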

I invite you to check out my sources to see how all of this works and to get an overview of how I dealt with this issue. Since I am working with TypeScript, you will have to compile it to make it work with Serverless.

TypeScript: yarn global add typescript, then install the dependencies and compile .ts -> .js.

yarn install
tsc *.ts

Run the lambda locally:
serverless invoke local -f zip -p event/test.json

I hope you enjoyed this article; don't hesitate to contact me if you have any problems or questions.

NB: I also realized that I'll finally add a comment section, in order to have more interaction with my readers. I wanted to use Disqus but it does not fit my needs; I'll check other options and add one as soon as possible.