Lambda to delete indices older than ‘x’ days on Elasticsearch

This is a simple example of how to delete indices older than ‘x’ days.


import boto3
from requests_aws4auth import AWS4Auth
from elasticsearch import Elasticsearch, RequestsHttpConnection
import curator

host = 'XXXXXXXXXXXXXXXX.us-east-1.es.amazonaws.com' # Provide the elasticsearch endpoint
region = 'us-east-1' # Provide the region
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

# Lambda execution starts here.
def lambda_handler(event, context):

    # Build the Elasticsearch client.
    es = Elasticsearch(
        hosts=[{'host': host, 'port': 443}],
        http_auth=awsauth,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection
    )

    index_list = curator.IndexList(es)
    # Filter the indices for the pattern yyyy-mm-dd* with creation_date older than x days.
    # Source https://curator.readthedocs.io/en/latest/examples.html
    index_list.filter_by_age(source='creation_date', direction='older', timestring='%Y-%m-%d', unit='days', unit_count=7)

    print("Found %s indices to delete" % len(index_list.indices))

    if index_list.indices:
        curator.DeleteIndices(index_list).do_action()
        print('Indices deleted successfully')

This example needs the requests_aws4auth, elasticsearch and elasticsearch-curator modules installed. You can build these modules on a Linux machine; I used an EC2 instance running Amazon Linux.
# Install Dependencies
yum -y install python-pip zip
pip install virtualenv

# Create the virtual environment
mkdir -p /var/es-cleanup && cd /var/es-cleanup
virtualenv /var/es-cleanup
cd /var/es-cleanup && source bin/activate
pip install requests_aws4auth -t .
pip install elasticsearch -t .
pip install elasticsearch-curator -t .

# Copy the code to the current directory and set the file permission to execute mode
chmod 754 es-cleanup.py

# Package the lambda
zip -r /var/es-cleanup.zip *

# Send the package to the S3 bucket
aws s3 cp /var/es-cleanup.zip s3://BUCKET_NAME/
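
Once the package is in S3, you can create the Lambda function from it. Below is a minimal sketch using the AWS CLI; the account ID, role ARN, runtime and timeout are placeholders you would replace with your own values, and the handler follows the file name used above (if the import fails because of the hyphen, rename the file to es_cleanup.py and adjust the handler accordingly).

# Create the Lambda function from the package uploaded to S3 (placeholder values)
aws lambda create-function \
    --function-name es-cleanup \
    --runtime python3.9 \
    --role arn:aws:iam::ACCOUNT_ID:role/YOUR_LAMBDA_ROLE \
    --handler es-cleanup.lambda_handler \
    --timeout 300 \
    --code S3Bucket=BUCKET_NAME,S3Key=es-cleanup.zip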

Hope you enjoyed the post.

Cheers

Ramasankar Molleti

LinkedIn

How To Create Jupyter Notebook using Amazon SageMaker

In this blog post, I would like to show how to create a Jupyter Notebook and run various commands using Amazon SageMaker.

What is Jupyter Notebook?

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.  For more information you may refer here.

What is Amazon Sagemaker?

Amazon SageMaker is a fully-managed service that enables data scientists and developers to quickly and easily build, train, and deploy machine learning models at any scale. Amazon SageMaker includes modules that can be used together or independently to build, train, and deploy your machine learning models.  For more information you may refer here.

Let’s create a Jupyter notebook using SageMaker.

Step 1: Open Amazon SageMaker

Step 2: Click on Create notebook instance

Step 3: Create a notebook instance named "MyJupyterNotebook"

If you want to speed up throughput and decrease the latency of getting real-time inferences from deep learning models deployed as Amazon SageMaker hosted models, you can select Elastic Inference. In this example, I’m leaving this as None.

You will have to create an IAM role during instance creation, or use an existing IAM role ARN that has enough permissions for the notebook instance.

You can deploy the notebook into a dedicated VPC so that the instance is created in a private subnet for internal-only access. In this example I’m leaving this option as None so that I can access it over the internet.

You can configure a lifecycle configuration to run shell scripts upon instance creation.

You can specify the volume size of the instance; this depends on how much data you want to store on the notebook. I left it at the default.

You can select a Git repository while the notebook instance is being created, so that Jupyter starts in the specified repository. If you would like to encrypt the instance storage, you can use AWS KMS.

Tags: You can specify the name of the notebook and its purpose for easy identification.

Click on Create notebook instance; this will create the Jupyter notebook instance as below.


This will take a few minutes, depending on the size of the instance we chose.


Once the instance is created successfully, open the Jupyter notebook by clicking Open Jupyter under Actions.


There you go, you have successfully created a Jupyter notebook.

Let’s run some commands on the notebook.

Click on New at the top right corner and you will see the available kernels to run commands against. I will select conda_python3.


After you select conda_python3, a new notebook opens in a new window as below.


I’ve created an S3 bucket called “testrama2019”, and I will copy its contents to the local Jupyter notebook using AWS CLI commands. When you run terminal commands from a notebook cell, you need to prefix them with “!”.


Let’s execute the cell by clicking Run, or use Shift+Enter.


Looking at the results, I have a sample CSV file, which I will read after importing the pandas module.


That’s it. As shown above, I was able to read the contents of the CSV file using the pandas module. The cells I ran look roughly like the sketch below.
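
For reference, here is a minimal sketch of those notebook cells. The bucket name testrama2019 comes from the example above; the file name sample.csv is a hypothetical placeholder for whatever object your bucket actually contains.

# Cell 1: copy the object from S3 to the notebook's local storage
# (the "!" prefix runs a terminal command from the notebook)
!aws s3 cp s3://testrama2019/sample.csv .

# Cell 2: read the downloaded CSV with pandas
import pandas as pd

df = pd.read_csv('sample.csv')   # 'sample.csv' is a placeholder file name
df.head()                        # preview the first few rows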

Hope you enjoyed the post.

Cheers

Ramasankar Molleti

LinkedIn

Python Script – Create and Insert Data into a DynamoDB Table

In this example, I would like to demonstrate how to create an AWS DynamoDB table using Python.

I’m using a simple Employee table that contains Id, FirstName, LastName, Dept and Sal attributes. I’m going to create a partition key on Id and a sort key on Sal. I will use boto3 to call the DynamoDB service; for more information about boto3 you can refer here.


import boto3

# Create a table called Employee
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')

mytable = dynamodb.create_table(
    TableName='Employee',
    KeySchema=[
        {
            'KeyType': 'HASH',
            'AttributeName': 'Id'
        },
        {
            'KeyType': 'RANGE',
            'AttributeName': 'Sal'
        }
    ],
    AttributeDefinitions=[
        {
            'AttributeName': 'Id',
            'AttributeType': 'N'
        },
        {
            'AttributeName': 'Sal',
            'AttributeType': 'N'
        }
    ],
    ProvisionedThroughput={
        'ReadCapacityUnits': 2,
        'WriteCapacityUnits': 2
    }
)

# Wait until the table creation completes.
mytable.meta.client.get_waiter('table_exists').wait(TableName='Employee')
print('Table has been created, please continue to insert data.')

If you look at the attribute definitions while creating the table, I only specified the partition key and sort key and did not declare the other attributes (FirstName, LastName and Dept). With DynamoDB (a NoSQL database) you don’t need to define every attribute ahead of time; you only need to declare the hash (partition) and range (sort) key attributes.

This will create a table called Employee as below


As you can see, the Employee table was created with Id as the partition key and Sal as the sort key.

Let’s insert data into the table using the script below. You can use the put_item method to write items to DynamoDB; you can see the syntax here.


mytable.put_item(
    Item={
        'Id': 1,
        'FirstName': 'Ramasankar',
        'LastName': 'Molleti',
        'Dept': 'IT',
        'Sal': 5000
    }
)
mytable.put_item(
    Item={
        'Id': 1,
        'FirstName': 'Sourav',
        'LastName': 'Mukherjee',
        'Dept': 'IT',
        'Sal': 10000
    }
)
mytable.put_item(
    Item={
        'Id': 1,
        'FirstName': 'Praveen',
        'LastName': 'Kumar',
        'Dept': 'Finance',
        'Sal': 5000
    }
)
mytable.put_item(
    Item={
        'Id': 1,
        'FirstName': 'Suresh',
        'LastName': 'Kumar',
        'Dept': 'Finance',
        'Sal': 12000
    }
)

response = mytable.scan()

for i in response['Items']:
    print("added item:", i['Id'], ":", i['FirstName'], ":", i['LastName'], ":", i['Dept'], ":", i['Sal'])

Output:

added item: 1 : Praveen : Kumar : Finance : 5000
added item: 1 : Sourav : Mukherjee : IT : 10000
added item: 1 : Suresh : Kumar : Finance : 12000

Process finished with exit code 0


As you can see, the data has been inserted. Note that only three items appear even though we issued four put_item calls: the items for Ramasankar and Praveen share the same primary key (Id = 1, Sal = 5000), so the second write overwrote the first. That’s it for creating and inserting data into DynamoDB. The combined script is below; note that I set the provisioned read and write throughput to 2 instead of the default values.


import boto3

# Create a table called Employee
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')

mytable = dynamodb.create_table(
    TableName='Employee',
    KeySchema=[
        {
            'KeyType': 'HASH',
            'AttributeName': 'Id'
        },
        {
            'KeyType': 'RANGE',
            'AttributeName': 'Sal'
        }
    ],
    AttributeDefinitions=[
        {
            'AttributeName': 'Id',
            'AttributeType': 'N'
        },
        {
            'AttributeName': 'Sal',
            'AttributeType': 'N'
        }
    ],
    ProvisionedThroughput={
        'ReadCapacityUnits': 2,
        'WriteCapacityUnits': 2
    }
)

# Wait until the table exists.
mytable.meta.client.get_waiter('table_exists').wait(TableName='Employee')
print('Table is ready, please continue to insert data.')

# Insert the data into the DynamoDB table
mytable.put_item(
    Item={
        'Id': 1,
        'FirstName': 'Ramasankar',
        'LastName': 'Molleti',
        'Dept': 'IT',
        'Sal': 5000
    }
)
mytable.put_item(
    Item={
        'Id': 1,
        'FirstName': 'Sourav',
        'LastName': 'Mukherjee',
        'Dept': 'IT',
        'Sal': 10000
    }
)
mytable.put_item(
    Item={
        'Id': 1,
        'FirstName': 'Praveen',
        'LastName': 'Kumar',
        'Dept': 'Finance',
        'Sal': 5000
    }
)
mytable.put_item(
    Item={
        'Id': 1,
        'FirstName': 'Suresh',
        'LastName': 'Kumar',
        'Dept': 'Finance',
        'Sal': 12000
    }
)

response = mytable.scan()

for i in response['Items']:
    print("added item:", i['Id'], ":", i['FirstName'], ":", i['LastName'], ":", i['Dept'], ":", i['Sal'])
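
Once the table has data, you can also read it back with a key-based query rather than a full scan. Here is a minimal sketch, assuming the same Employee table and region as above; the key condition values are just illustrative.

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('Employee')

# Query all items with partition key Id = 1 whose sort key Sal is at least 10000
response = table.query(
    KeyConditionExpression=Key('Id').eq(1) & Key('Sal').gte(10000)
)

for item in response['Items']:
    print(item['FirstName'], item['LastName'], item['Dept'], item['Sal'])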

Hope you enjoyed the post.

Cheers

Ramasankar Molleti

LinkedIn

AWS S3 —> AWS Lambda —> SSM —> SQL Server Job

Scenario:

The application runs on Java and its database is PostgreSQL. Application users upload data, and this data also needs to be processed into an analytics server. There is no direct access to the analytics database server, which runs SQL Server in a different location. You need to architect a solution that processes the data automatically as soon as it is uploaded from the application. Also, application users may upload “N” files per day, but the process has to pick up only the latest one for that particular day.

My Solution:

  1. The application uploads the data in CSV format to an S3 bucket
  2. Create an AWS Lambda function to process the data to the analytics server
  3. Create an S3 event on the S3 bucket
  4. The Lambda function triggers a SQL Server Agent job that processes the CSV file on the analytics database server
  5. The SQL Agent job on the analytics server picks up the latest file in the S3 bucket and processes the data

Step 1: The application uploads the data in CSV format to an S3 bucket

In this example, I’ve created an S3 bucket (testcommandbucket).


Let’s upload the data. I’ve uploaded some sample files to demonstrate.


Step 2: Create an AWS Lambda function to process the data to the analytics server

I’ve created a Lambda function named uploaddata in Python 2.7, as below.



import boto3
import time
import sys
import os

ec2 = boto3.client('ec2')
s3 = boto3.client('s3')
ssm_client = boto3.client('ssm')

def lambda_handler(event, context):
    if event:
        print("Event: ", event)
        file_obj = event["Records"][0]
        filename = str(file_obj['s3']['object']['key'])
        print("filename: ", filename)
        if 'AnalyticData' in filename:
            def execute_ssm_command(client, commands, instance_ids):
                # Retry send_command with a back-off if SSM throttles the request
                throttling = "False"
                sleepTime = 1
                while throttling == "False":
                    try:
                        resp = client.send_command(
                            DocumentName="AWS-RunPowerShellScript",
                            Parameters={'commands': commands},
                            InstanceIds=instance_ids,
                        )
                        throttling = "True"
                    except Exception as e:
                        print(e)
                        print("throttling")
                        sleepTime = sleepTime * 20
                        time.sleep(sleepTime)

            instance_ids = ['i-02742a8']
            # Start the SQL Server Agent job on the analytics database server
            commands = ['sqlcmd -S AnalysisDBServer -Q "EXEC msdb.dbo.sp_start_job \'SqlJob_Process_Data\'"']
            execute_ssm_command(ssm_client, commands, instance_ids)
            #print(instance["Name"])
#lambda_handler(None,None)

In the above example, I’m sending an SSM command (a PowerShell script) to start a SQL Server Agent job. I’m also handling throttling in case the Lambda function is invoked by many concurrent events.

Note: You can run the above code on Python 2.7/3.6.

Step 3: Create an S3 event on the S3 bucket

Let’s create an S3 event to trigger the Lambda function when files get uploaded to the S3 bucket.


I’ve created the S3 event so that it invokes the Lambda function, filtered to objects ending in .csv. A sketch of an equivalent notification configuration follows.
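
If you prefer to set this up from the command line instead of the console, a minimal sketch is below. The bucket name matches the example above; the Lambda function ARN and account ID are placeholders, and you would also need to grant S3 permission to invoke the function (aws lambda add-permission), which the console otherwise does for you.

aws s3api put-bucket-notification-configuration \
    --bucket testcommandbucket \
    --notification-configuration '{
        "LambdaFunctionConfigurations": [
            {
                "Id": "csv-upload-trigger",
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:ACCOUNT_ID:function:uploaddata",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "suffix", "Value": ".csv"}
                        ]
                    }
                }
            }
        ]
    }'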

Step 4: The Lambda function triggers the SQL Server Agent job to process the CSV file on the analytics database server

I’ve uploaded the latest file to S3. You can see in the Lambda CloudWatch logs below that the function was triggered and processed the latest file, AnalyticData-2018-12-22.csv.


The Lambda function executes an SSM command against the database server EC2 instance to start the SQL Agent job. You can see from the SSM command logs that the job was triggered.

Step 5: The SQL Agent job on the analytics server picks up the latest file from the S3 bucket and processes the data

As a final check, let’s verify whether the SQL job was triggered.


You can see that the job was executed successfully.

What does the SQL job do?

5.1. Copies the latest file from the S3 bucket to the local server

5.2. Imports the CSV data into the analysis table using BULK/OPENROWSET

5.3. Moves the processed file to a processed folder

5.4. Triggers the SSRS subscription to email reports to stakeholders

Steps 5.1 to 5.4 are outside the scope of this article.

Hope you enjoyed the post!

Cheers

Ramasankar Molleti

LinkedIn

AWS Lambda to connect to PostgreSQL and execute a function/query using Python

It’s been a long time since I wrote a blog post. In this post I would like to show an example of a Lambda function that connects to a PostgreSQL database and executes a query.

  1. Build the PostgreSQL psycopg2 module in order to connect to the database. I’ve built a custom-compiled Python library to connect to a PostgreSQL 9.6 database; you may download the package from here.
  2. Write a sample Lambda function that updates the URL in the table by calling a database function (a variant that reads the credentials from environment variables is sketched after this list).
    import psycopg2

    def lambda_handler(event, context):
        conn_string = "dbname='test' port='5432' user='username' password='password' host='RDSEndpoint'"
        conn = psycopg2.connect(conn_string)
        cursor = conn.cursor()
        # Call the database function that updates the host value
        cursor.execute("select system_env_host('ramasankarmolleti.com')")
        conn.commit()
        cursor.close()
        conn.close()
        print("working")
  3. You may also download the Python code here.
  4. Now, let’s create a Lambda function.
  5. I’ve created a test_lambda_role IAM role to give the Lambda function access to the services it needs. Once you create the Lambda function, combine both step 1 and step 2 into one zip file and upload it to Lambda; you may download the combined zip file here.
  6. Next comes access to the database from the Lambda function. You need to create a security group that allows access to the PostgreSQL database from the Lambda; I’ve created one and attached it to the Lambda function.
  7. Now that the Lambda function is deployed, let’s run it for testing. Click on Test; this will prompt you to create a test event.
  8. After creating the test event you can run the Lambda function by clicking Test. Before that, make sure the function name is updated to the one you created; in this example the function name is UpdateHost_Python.
  9. Now, click on Test to run the Lambda function.
  10. Here you go, the Lambda function executed successfully.
  11. Let’s see the results in the database. You can see that the host was updated to ramasankarmolleti.com.
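
As referenced in step 2, here is a hedged variant of the same handler that reads the connection settings from Lambda environment variables instead of hardcoding credentials. The variable names (DB_NAME, DB_USER, DB_PASSWORD, DB_HOST, DB_PORT) are my own choice, not something from the original example.

import os
import psycopg2

def lambda_handler(event, context):
    # Connection settings come from the function's environment variables
    conn = psycopg2.connect(
        dbname=os.environ['DB_NAME'],
        user=os.environ['DB_USER'],
        password=os.environ['DB_PASSWORD'],
        host=os.environ['DB_HOST'],
        port=os.environ.get('DB_PORT', '5432')
    )
    try:
        with conn.cursor() as cursor:
            # Call the same database function as in the example above
            cursor.execute("select system_env_host('ramasankarmolleti.com')")
        conn.commit()
    finally:
        conn.close()
    print("working")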

Hope you enjoyed the post!

Cheers

Ramasankar Molleti

LinkedIn

Powershell Script to change all windows scheduled tasks credentials

One of my students asked me whether there is a simple script to update the credentials of all Windows scheduled tasks in one step. Here is a simple way to do it:

$taskname = Get-ScheduledTask -TaskPath '\' | Select-Object TaskName
foreach ($task in $taskname)
{
    Set-ScheduledTask -Password "password" -User "user" -TaskName $task.TaskName
}
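
If you’d rather not put the password in the script itself, a small variation (my own, not from the original post) prompts for the credential once and reuses it for every task:

# Prompt once for the account whose password changed
$cred = Get-Credential
Get-ScheduledTask -TaskPath '\' | ForEach-Object {
    Set-ScheduledTask -TaskName $_.TaskName `
        -User $cred.UserName `
        -Password $cred.GetNetworkCredential().Password
}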

Hope you enjoyed the post!

Cheers

Ramasankar Molleti

MSDN:LinkedIn:Twitter

How to Backup SQL Server Backups to Amazon S3 Storage

It’s been a long time since I posted on my blog. A common question I get from colleagues and friends is: “Is there a way to automatically copy backups taken on a SQL Server database server running on an EC2 instance to Amazon S3 storage?” Today, I would like to show how we can achieve this.

Requirement: I would like to keep one day of backups on disk, keep the remaining backups on Amazon S3 storage for 30 days, and archive them after that.

I’m assuming that you have a mechanism to take backups to Disk.

  • Attach an S3 policy to the IAM role of the database server’s EC2 instance; the instance needs read and write permissions. You may use the policy below.

 

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*",
                "s3:Put*"
            ],
            "Resource": [
                "arn:aws:s3:::Name_of_the_bucket/*"
            ]
        }
    ]
}
  • Once you attach the above policy to the IAM role on the EC2 instance, the instance gets access to the Amazon S3 bucket.
  • Next, install the AWS CLI on the EC2 instance. You may download it here.
  • Now, let’s begin the actual process. I’m assuming you already have backup jobs running on the database server on your schedule, writing the backups to disk. Add the PowerShell script below as a second job step, after the backups are written to disk.
$bucketname = 'S3_Bucket_Name/'            # S3 bucket name for copying backups
$backuppath = 'B:\MSSQL\Backups'           # Backup location on the disk
$Region = 'us-east-1'
$env:Path += ';C:\Program Files\Amazon\AWSCLI'
$cmd = "aws s3 sync $backuppath s3://$bucketname --region $Region"
Push-Location 'C:\Program Files\Amazon\AWSCLI\'
Get-Location
Invoke-Expression $cmd
Pop-Location
  • The above step will sync the backups to S3.


  • There we go: the backups are synced to S3. The advantage is that you can reduce the cost of disk storage while keeping backups highly available and durable; you don’t have to worry about disk capacity or maintaining redundancy at the disk level. Amazon S3 offers 99.999999999% durability and 99.99% availability.
  • Now, you can create a lifecycle policy to keep the backups in S3 for 30 days and then archive them to Glacier (much cheaper storage). You may follow the link here; a sample configuration is sketched below.
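
For reference, here is a minimal sketch of such a lifecycle configuration applied with the AWS CLI. The bucket name and rule ID are placeholders; adjust the prefix to wherever your backups land in the bucket.

aws s3api put-bucket-lifecycle-configuration \
    --bucket S3_Bucket_Name \
    --lifecycle-configuration '{
        "Rules": [
            {
                "ID": "archive-sql-backups",
                "Filter": {"Prefix": ""},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "GLACIER"}
                ]
            }
        ]
    }'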

Hope you enjoyed the post!

Cheers

Ramasankar Molleti

MSDN:LinkedIn:Twitter

How to kill all the sessions in a PostgreSQL database

The script below kills all sessions in a given database except your own.

SELECT pg_terminate_backend(pg_stat_activity.pid)
FROM pg_stat_activity
WHERE pg_stat_activity.datname = 'dbname' -- Database name
  AND pid <> pg_backend_pid();

If you are targeting a specific process ID, provide the pid value in the WHERE clause as below:

SELECT pg_terminate_backend(pg_stat_activity.pid)
FROM pg_stat_activity
WHERE pg_stat_activity.datname = 'dbname' -- Database name
  AND pid = processid; -- Provide the process id you would like to kill
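
Before terminating anything, it can help to see which sessions are actually connected. A minimal query against pg_stat_activity (the same catalog view used above) looks like this:

SELECT pid, usename, datname, state, query
FROM pg_stat_activity
WHERE datname = 'dbname'; -- Database name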

 

Hope you enjoyed the post!

Cheers

Ramasankar Molleti

MSDN:LinkedIn:Twitter

 

 

Powershell Script to change IIS AppPool

The snippet below switches every IIS application pool to the Classic managed pipeline mode and disables 32-bit applications on 64-bit Windows:

Import-Module WebAdministration
dir IIS:\AppPools | ForEach-Object {
    # The pipeline mode could not be changed through 'Set-ItemProperty', so call appcmd.exe instead
    &"$env:windir\system32\inetsrv\appcmd.exe" set apppool $_.Name /managedPipelineMode:"Classic"
    $poolName = 'IIS:\AppPools\' + $_.Name
    Set-ItemProperty -Path $poolName -Name enable32BitAppOnWin64 -Value $false
}
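
To confirm the change took effect, you can read the same properties back through the IIS provider (a quick verification I added, not part of the original script):

# List each app pool with its pipeline mode and 32-bit setting
dir IIS:\AppPools | Select-Object Name, managedPipelineMode, enable32BitAppOnWin64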

How to Migrate SQL Server Database running on EC2 instance to Amazon RDS Instance

Recently I worked on migrating SQL Server databases running on Amazon EC2 instances to Amazon RDS instances. On 27 July 2016 Amazon announced that you can move your databases from EC2 to RDS using the native backup and restore method. Let’s see how it works.

Steps:

  • Assume you have an RDS instance ready with default settings
  • Create a custom option group “NativeBackup” and grant it full access to the S3 bucket. You may use the IAM policy below (called RDSAccessToS3 here):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": "*"
        }
    ]
}
  • Right-click the custom option group, then click Add option –> select the SQLSERVER_BACKUP_RESTORE option –> Apply Immediately


  • Now you have a custom option group set up with the native backup and restore feature and the correct permissions to the S3 bucket. Add this group to the RDS instance: right-click the RDS instance –> click Modify –> navigate to Option group –> assign the NativeBackup group created in step 2 –> click Modify instance


  • Now, copy the backup from the EC2 instance to the S3 bucket. You may use the AWS CLI to do this:
aws s3 cp \\ec2backuplocations s3://Bucket_Name --profile "your_profile_name"
  • Once the backup has been copied to the S3 location, use the script below to restore the database from it:

    use master
    go

    exec msdb.dbo.rds_restore_database
        @restore_db_name='TestDB',
        @s3_arn_to_restore_from='arn:aws:s3:::Bucket_Name/testdb.bak';
  • The restore will begin; you may use the stored procedure below to check the status of the restore process:
    exec msdb.dbo.rds_task_status -- optionally pass @db_name='TestDB' to filter to one database
  • That’s all for the restore! Below are possible errors you may encounter while migrating to an RDS instance.

Error1:

Aborted the task because of a task failure or a concurrent RESTORE_DB request.
Database TestDB cannot be restored because there is already an existing database with the same family_guid on the instance.

Reason: You cannot restore to the RDS instance if a data or log file with the same file GUID already exists there. File GUIDs are unique to each database, but restoring another copy of the same database brings the same GUIDs with it.

You can use the sys.master_files view to see the file GUID details.

Another cause is that you have already restored this database and are trying to restore it again. RDS has the limitation that you cannot restore the same database files onto the same instance twice.

Resolution: Do not restore the same database onto the same RDS instance more than once.

Error2:

Msg 229, Level 14, State 5, Procedure rds_restore_database, Line 1 [Batch Start Line 45]
The EXECUTE permission was denied on the object ‘rds_restore_database’, database ‘msdb’, schema ‘dbo’.

Reason: You do not have permission to perform the restore

Resolution:

Make sure the login you used has EXECUTE permission on the stored procedure rds_restore_database. If you are using the master credentials and still get the same error, then full permission to the S3 bucket has not been granted or the IAM role for the S3 bucket is missing. Grant the access and retry.

Error3:

Msg 50000, Level 16, State 0, Procedure rds_restore_database, Line 73 [Batch Start Line 22]
A task has already been issued for database:TestDB  with task Id: 3, please try again later.

Reason: You issued the restore command multiple times for the same database.

Resolution: Do not issue the restore command twice for the same database. Cancel the duplicate restore tasks and re-issue the command once.

References:

https://aws.amazon.com/blogs/aws/amazon-rds-for-sql-server-support-for-native-backuprestore-to-amazon-s3/

https://www.brentozar.com/archive/2016/07/holy-cow-amazon-rds-sql-server-just-changed-everything/

Hope you enjoyed the post!

Cheers

Ramasankar Molleti

MSDN:LinkedIn:Twitter