AWS boto3 + S3 + Lambda auto add cache control

For AWS Lambda with python, we need to use boto3. So a different code is required.

from __future__ import print_function

import json
import urllib
import boto3
import email
import time
from datetime import datetime, timedelta

s3 = boto3.resource('s3')
one_month = 3600*24*30
one_year  = 3600*24*365

def lambda_handler(event, context):
    # Get the object from the event and show its content type
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    key_name = urllib.unquote_plus(event['Records'][0]['s3']['object']['key']).decode('utf8')
    
    print(key_name)
    cache_time = one_month # or one_year
    #get the object
    object = s3.Object(bucket_name, key_name)
    
    #get object details 
    response = object.get() 
    # check if 'CacheControl' exists
    if 'CacheControl' in response:
        print("exists")
        return
    modified = response
    modified['Body'] = response['Body'].read()
    modified.pop("VersionId", None)
    modified.pop("AcceptRanges", None)
    modified.pop("ETag", None)
    modified.pop("LastModified", None)
    modified.pop("ResponseMetadata", None)
    modified['Metadata'] = response['Metadata']
    modified['CacheControl'] = ('max-age=%d, public' % (cache_time))
    # put the same object with a new CacheControl
    response = object.put(**modified)
    return

Binary Indexed Tree or Fenwick Tree

PROBLEM TO BE SOLVED

Let’s say, there is a huge array ‘A’ of size N, N = 1000000.
Problem (query) is to find the sum of all the numbers in A from 100 till 2000 i.e. A[100] + A[101] + A[102] + A[103] + ….. + A[1998] + A[1999] + A[2000].
Well, a simple way is this

    
    int sum = 0;
    for(int i=100; i<=2000; i++) sum += A[i];
    print(sum)

This is O(n)
Let’s make these modifications on A:
A[999] = 23, A[2] = 23, A[3412] = 23, A[3] = 23, .. etc.
After these modifications we want to calculate the sum from A[100] till A[2000]. How will we do this?
For every single query, we have to loop from 100 to 2000 again and again. So if there are ‘m’ queries then the run time is O(m*n).
Problem becomes even more difficult if we need to calculate the sum from any index ‘i’ to index ‘j’ of A ,(i<j), i.e.

 A[i] + A[i+1] + A[i+2] + ... + A[j-1] + A[j]

Even if we create a new array Sum such that

Sum[ind] = Sum[ind-1] + A[ind]

we have to update this new array for every change made to A.
Can we do better? Can we avoid this O(n) for every change to A. Let’s see.

WHAT is “num & (-num)” ? Here ‘&’ is Logical AND

Say there is a integer num = 5. Binary representation of ‘5’ is ‘101’. But what is the binary representation of ‘-5’ ?

Number Binary representation
5 101
-5 ?

First, let’s understand how negative of a number is created.

Negative of NUM = ( Complement of the Num + 1)
Complement of the Num = Convert all 1's to 0 and 0's to 1
Let's calculate Binary of '-5'
Binary of '5' = '101'
So, Complement of 5 = Complement of '101' = '010' [convert 1 to 0 and 0 to 1]
So, as per rule above, Negative of 5 = ('010' + 1) = '011'
So, the binary form of '-5' = '011'
Num Num in Binary Binary representation of (-Num)
5 101 011
 7  111  001
 8  100  100
 12  1100  0100
What is '5 & (-5)'?
'101' & '011' = '001'
Num, -Num Num & -Num
101, 011 001
111,001 001
100,100 100
1100,0100 0100

As we can see all the “num & -num” result in a number with only the last ‘1’ in the original num. We will use this information soon.

Going from ind to “ind + (ind & -ind)”

Let’s say ind = 1, Let’s go from ind to ind + (ind & -ind) till we reach ‘8’ or bigger. Try to do with you hand once.
Below table shows this:

ind (Binary) (ind & -ind) (Binary) ind + (ind & -ind) (Binary)
1 (1) 1(1) 2(10)
2(10) 2(10) 4(100)
4(100) 4(100) 8(100)
while(ind<=8) ind = ind + (ind & -ind);

So, Starting from ind  = 1, ind changes like this: 1 -> 2 -> 4 -> 8
Starting from ind = 2, 2 -> 4 -> 8
Starting from ind = 3, 3 -> 4 -> 8
Starting from ind = 4, 4 -> 8
Starting from ind = 5, 5 -> 6 -> 8
Starting from ind = 6, 6 -> 8
Starting from ind = 7, 7 -> 8
Starting from int = 8, 8
while(ind >=0) ind = ind - (ind & -ind);

So, Starting from ind  = 1, ind changes like this: 1 -> 0
Starting from ind = 2, 2 -> 0
Starting from ind = 3, 3 -> 2 -> 0
Starting from ind = 4, 4 -> 0
Starting from ind = 5, 5 -> 4 -> 0
Starting from ind = 6, 6 -> 4 -> 0
Starting from ind = 7, 7 -> 5 -> 4 -> 0
Starting from int = 8, 8 -> 0

I recommend solving both above code by hand once.

Binary Indexed Tree (BIT)

Let's say, our array A is of size N=8, A[0...7]
We create a new array called BIT[0..8]
         0  1  2  3  4  5  6  7  8
BIT[] = {0, 0, 0, 0, 0, 0, 0, 0, 0}
A[]   = {0, 0, 0, 0, 0, 0, 0, 0, 0}

Now let’s run the following code:

A[0] = 3
updateBIT(0, A[0])

void updateBIT(int ind, int val){
    ind = ind + 1; // Increase ind by 1
    while(ind <= N){
        // increase the value of BIT[ind] by val
    	BIT[ind] = BIT[ind] + val;
        // go to next of ind
    	ind  = ind + (ind & -ind);
    }
}

Note that: updateBIT first change ind to ind+1. What is gonna happen?

We know, for ind = 1, ind changes like this: 1 -> 2 -> 4 -> 8
So, at the end of the above code BIT will look like this:
         0  1  2  3  4  5  6  7  8
BIT[] = {0, 3, 3, 0, 3, 0, 0, 0, 3}
A[]   = {0, 3, 0, 0, 0, 0, 0, 0, 0}

Now let’s run the following code:

A[1] = 2
updateBIT(1, A[1])

What is gonna happen? Again, ind = ind+1 before the while loop.

We know, for ind = 2, ind changes like this: 2 -> 4 -> 8
So, at the end of the above code BIT will look like this:
         0  1  2  3  4  5  6  7  8
BIT[] = {0, 3, 5, 0, 5, 0, 0, 0, 5}
A[]   = {0, 3, 2, 0, 0, 0, 0, 0, 0}

If we keep doing like this till ind=7. We will see something like this:

BIT[1] = A[0]
BIT[2] = A[0] + A[1]
BIT[3] = A[2]
BIT[4] = A[0] + A[1] + A[2] + A[3]
BIT[5] = A[4]
BIT[6] = A[4] + A[5]
BIT[7] = A[6]
BIT[8] = A[0] + A[1] + A[2] + A[3] + A[4] + A[5] + A[6] + A[7]

Querying on BIT

How shall we answer the query: Get the sum from A[2] till A[5] i.e.

A[2] + A[3] + A[4] + A[5] =??

Let’s define this function

int getSum(int ind){
	ind = ind + 1;
	int sum = 0;
	while(ind > 0){
		sum = sum + BIT[ind];
		ind = ind - (ind & -ind);
	}
	return sum;
}

Let’s run getSum with different ind.

ind = 0, sum = BIT[1] 
ind = 1, sum = BIT[2], from above table we know BIT[2] = A[0] + A[1] 
ind = 2, sum = BIT[3] + BIT[2] => A[2] + A[1] + A[0]
etc.

Note that here we are going in opposite direction because of ‘-‘ in

 ind = ind - (ind & -ind);
getSum(5) will run like this:
sum = BIT[6] + BIT[4] 
From above tables, sum = (A[4] + A[5]) + (A[0] + A[1] + A[2] + A[3])

So getSum(1) will run like this:
sum = BIT[2] 
From above tables, sum = (A[0] + A[1])

=> getSum(5) - getSum(1) = A[2] + A[3] + A[4] + A[5]
So to get sum from A[2] till A[5], we need 
getSum(5) - getSum(1)

In other words to get sum from A[i] till A[j], we need to calculate
getSum(j) - getSum(1-1)

That’s how we query BIT.

Runtime of updateBIT and getSum

For both updateBIT and getSum,average runtime is O(Log N). This is because maximum bits in a binary representation of a number N can be Log N.

Till Next time. CHAO!

AWS S3 add metadata cache control to existing keys

Code for adding cache-control to the all S3 key/object for a given bucket.

    import json
    import urllib
    from boto.s3.connection import S3Connection
    one_month = 3600*24*30
    one_year  = 3600*24*365
    cckey = 'cache-control'

    s3_connection = S3Connection()
    bucket_name = ''
    bucket = s3_connection.get_bucket(bucket_name, validate=False)

    for key in bucket:
        key_name = key.key
        if  key.size == 0: # continue on directories
            continue
        key = bucket.get_key(key_name)

        cache_time = one_month
        key.set_metadata(name=cckey, value = ('max-age=%d, public' % (cache_time)))
        key.set_metadata(name='content-type', value = key.content_type) # preserve the content-type
        
        key2 = key.copy(key.bucket.name, key.name, key.metadata, preserve_acl=True)
        #print(key2.__dict__)
        continue

Code for adding cache-control to a particular key for a given bucket.

    import json
    import urllib
    from boto.s3.connection import S3Connection
    one_month = 3600*24*30
    one_year  = 3600*24*365
    cckey = 'cache-control'

    s3_connection = S3Connection()
    bucket_name = ''
    key_name = ''
    bucket = s3_connection.get_bucket(bucket_name, validate=False)
    key = bucket.get_key(key_name) 
    key_name = key.key
    cache_time = one_month
    key.set_metadata(name=cckey, value = ('max-age=%d, public' % (cache_time)))
    key.set_metadata(name='content-type', value = key.content_type) # preserve the content-type
    
    key2 = key.copy(key.bucket.name, key.name, key.metadata, preserve_acl=True)