AWS S3 -> AWS Lambda -> SSM -> SQL Server Job

Scenario:

The application runs on Java and its database is PostgreSQL. Application users upload data, and this data needs to be processed into an analytics server as well. There is no direct access to the analytics database server, which runs SQL Server at a different location. You need to architect a solution that processes this data automatically as soon as it is uploaded from the application. Also, application users may upload “N” files per day, but the process has to pick up only the latest one for that particular day.

My Solution:

  1. The application uploads the data in CSV format to an S3 bucket
  2. Create an AWS Lambda function to process the data to the analytics server
  3. Create an S3 event on the S3 bucket
  4. Lambda triggers the SQL Server Agent job and processes the CSV file into the analytics database server
  5. The SQL Agent job on the analytics server picks up the latest file from the S3 bucket and processes the data
Step 1: The application uploads the data in CSV format to an S3 bucket

In this example, I’ve created an S3 bucket (testcommandbucket).

1.PNG

Let’s upload the data. I’ve uploaded some sample files to demonstrate.

2.PNG
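For reference, this is roughly what the application side could do to push a daily export with boto3. This is a minimal sketch, not the application’s actual code: the bucket name testcommandbucket comes from this example, while the local path and the AnalyticData-YYYY-MM-DD.csv key pattern are assumptions based on the sample files shown above.

import datetime

import boto3

s3 = boto3.client('s3')

# Hypothetical daily export; the key pattern mirrors the sample files above.
today = datetime.date.today().isoformat()
key = 'AnalyticData-%s.csv' % today

s3.upload_file('/tmp/analytic_export.csv', 'testcommandbucket', key)
print('Uploaded s3://testcommandbucket/%s' % key)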

Step 2: Create an AWS Lambda function to process the data to the analytics server

I’ve created a Lambda function named uploaddata in Python 2.7, as below:

3.PNG


import time

import boto3

s3 = boto3.client('s3')
ssm_client = boto3.client('ssm')


def execute_ssm_command(client, commands, instance_ids, max_retries=5):
    # Send the command, backing off and retrying if the SSM API throttles
    # us (which can happen when concurrent events invoke this Lambda).
    sleep_time = 1
    for attempt in range(max_retries):
        try:
            return client.send_command(
                DocumentName="AWS-RunPowerShellScript",
                Parameters={'commands': commands},
                InstanceIds=instance_ids,
            )
        except Exception as e:
            print(e)
            print("throttling, retrying in %d second(s)" % sleep_time)
            time.sleep(sleep_time)
            sleep_time *= 20
    raise Exception("send_command failed after %d attempts" % max_retries)


def lambda_handler(event, context):
    if not event:
        return
    print("Event: ", event)
    file_obj = event["Records"][0]
    # Note: keys with spaces or special characters arrive URL-encoded
    # in S3 event notifications.
    filename = str(file_obj['s3']['object']['key'])
    print("filename: ", filename)
    if 'AnalyticData' in filename:
        instance_ids = ['i-02742a8']
        # Start the SQL Server Agent job on the analytics server via sqlcmd.
        commands = [
            'sqlcmd -S AnalysisDBServer '
            '-Q "EXEC msdb.dbo.sp_start_job \'SqlJob_Process_Data\'"'
        ]
        execute_ssm_command(ssm_client, commands, instance_ids)

In the above example, you can see that I’m sending an SSM command that runs a PowerShell script (sqlcmd) to start a SQL Server Agent job. I’m also handling throttling in case the Lambda is invoked by concurrent events.

Note: You can run the above code on Python 2.7 or 3.6.
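For a quick local sanity check, you can call the handler with a minimal fake S3 event containing only the fields the handler reads. This is a sketch; the bucket and key values are taken from this example:

# Minimal fake S3 event for local testing.
test_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "testcommandbucket"},
                "object": {"key": "AnalyticData-2018-12-22.csv"},
            }
        }
    ]
}

lambda_handler(test_event, None)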

Step 3: Create an S3 event on the S3 bucket

Let’s create an S3 event to trigger the Lambda when files get uploaded to the S3 bucket.

4.PNG

If you look at the above screenshot, I’ve created an S3 event that invokes the Lambda and filters the bucket objects by the *.csv suffix.
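If you’d rather script the notification than click through the console, something like the following boto3 call sets up the same trigger. This is a sketch; the Lambda ARN is a placeholder, and S3 must separately be granted lambda:InvokeFunction permission on the function:

import boto3

s3 = boto3.client('s3')

# Placeholder ARN; substitute your Lambda's actual ARN.
lambda_arn = 'arn:aws:lambda:us-east-1:123456789012:function:uploaddata'

s3.put_bucket_notification_configuration(
    Bucket='testcommandbucket',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [
            {
                'LambdaFunctionArn': lambda_arn,
                'Events': ['s3:ObjectCreated:*'],
                'Filter': {
                    'Key': {
                        'FilterRules': [
                            {'Name': 'suffix', 'Value': '.csv'}
                        ]
                    }
                },
            }
        ]
    },
)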

Step 4: Lambda triggers the SQL Server Agent job and processes the CSV file into the analytics database server

I’ve uploaded the latest file to S3. You can see in the Lambda CloudWatch logs below that the Lambda was triggered and processed the latest file, AnalyticData-2018-12-22.csv.

5.PNG

This Lambda executes an SSM command against the database server EC2 instance to start the SQL Agent job. You can see in the SSM command logs below that the job was triggered.
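The send_command response includes a CommandId; if you want the Lambda (or an operator) to confirm the job-start command actually succeeded on the instance, you can poll the invocation status. A minimal sketch (the wait_for_command helper is my own, not part of the article’s setup):

import time

import boto3

ssm_client = boto3.client('ssm')


def wait_for_command(command_id, instance_id, timeout=60):
    # Poll the SSM command invocation until it finishes or we give up.
    deadline = time.time() + timeout
    while time.time() < deadline:
        inv = ssm_client.get_command_invocation(
            CommandId=command_id,
            InstanceId=instance_id,
        )
        if inv['Status'] not in ('Pending', 'InProgress', 'Delayed'):
            return inv['Status'], inv.get('StandardOutputContent', '')
        time.sleep(2)
    return 'TimedOut', ''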

Step 5: The SQL Agent job on the analytics server picks up the latest file from the S3 bucket and processes the data

The final step is to verify whether the SQL job was triggered.

6.PNG

You can see that the job was executed successfully.

What does the SQL job do?

5.1. Copies the latest file from the S3 bucket to the local server

5.2. Imports the CSV data into the analysis table using the OPENROWSET(BULK ...) method

5.3. Moves the processed file to a processed folder

5.4. Triggers the SSRS subscription to email reports to stakeholders

Steps 5.1 to 5.4 are outside the scope of this article, though a rough sketch of the “latest file” logic from 5.1 follows below.
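Since “pick up the latest file for that particular day” is the core requirement, here is a minimal boto3 sketch of that selection. The bucket name comes from this example; the per-day AnalyticData-YYYY-MM-DD prefix is an assumption based on the sample files, and the sketch assumes fewer than 1,000 matching objects (paginate otherwise):

import datetime

import boto3

s3 = boto3.client('s3')


def latest_key_for_today(bucket='testcommandbucket'):
    # Assumption: daily files share the AnalyticData-YYYY-MM-DD prefix,
    # as in the samples above; pick the most recently modified one.
    prefix = 'AnalyticData-%s' % datetime.date.today().isoformat()
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    objs = resp.get('Contents', [])
    if not objs:
        return None
    return max(objs, key=lambda o: o['LastModified'])['Key']


print(latest_key_for_today())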

Hope you enjoyed the post!

Cheers

Ramasankar Molleti


