How Do I Schedule AWS EMR Serverless Jobs Using AWS Scheduler?
In this guide, we will demonstrate how to schedule AWS EMR Serverless jobs using AWS Scheduler (Amazon EventBridge Scheduler) with Pulumi. The goal is to automate EMR Serverless workloads by creating a schedule that starts a job run on an EMR Serverless application at specified intervals. We will cover creating the EMR Serverless application, the IAM permissions involved, and the schedule itself.
Key Points
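- AWS EMR Serverless Application: A serverless runtime for big data frameworks such as Apache Spark and Apache Hive. You submit job runs to the application, and EMR Serverless provisions and scales the workers for you.
- AWS Scheduler (Amazon EventBridge Scheduler): A service that invokes AWS targets on a one-time, rate, or cron schedule. Through a universal target, it can call AWS API actions directly, such as the EMR Serverless StartJobRun action.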
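AWS Scheduler reaches EMR Serverless through a universal target, whose ARN names an AWS SDK service and API action rather than a specific resource. The minimal sketch below builds that ARN string. The helper name `universalTargetArn` and its naming checks are illustrative assumptions, not part of any AWS or Pulumi API.

```typescript
// Builds an EventBridge Scheduler universal target ARN, which lets a schedule call an
// AWS API action directly: arn:aws:scheduler:::aws-sdk:<service>:<apiAction>
function universalTargetArn(service: string, apiAction: string): string {
  // Simplified checks (assumption): lowercase SDK service name, camelCase action name.
  if (!/^[a-z0-9]+$/.test(service)) {
    throw new Error(`unexpected service name: ${service}`);
  }
  if (!/^[a-z][A-Za-z0-9]*$/.test(apiAction)) {
    throw new Error(`unexpected API action: ${apiAction}`);
  }
  return `arn:aws:scheduler:::aws-sdk:${service}:${apiAction}`;
}

// The target used to start EMR Serverless job runs.
const emrStartJobRunArn = universalTargetArn("emrserverless", "startJobRun");
console.log(emrStartJobRunArn); // arn:aws:scheduler:::aws-sdk:emrserverless:startJobRun
```

Because the ARN identifies only the API action, the application to run on is named inside the target's input, not in the ARN.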
Steps
Create an EMR Serverless Application:
- Define the application with its type (Spark or Hive), EMR release label, and capacity settings.
Set Up AWS Scheduler:
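- Create a schedule with a cron expression that calls the EMR Serverless StartJobRun API on the application.
- Configure an IAM role that the scheduler assumes, with permission to start job runs and to pass the job execution role.
- Create a job execution role that EMR Serverless assumes while the job runs, with access to the job script and the log location.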
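EventBridge Scheduler cron expressions have six fields (minutes, hours, day-of-month, month, day-of-week, year) instead of the five used by Unix cron, and one of the two day fields must be `?`. The sketch below uses a hypothetical `parseSchedulerCron` helper to check only that shape before a value reaches `scheduleExpression`. It does not validate field ranges or special characters such as `L` and `W`.

```typescript
// Fields of an EventBridge Scheduler cron expression, in order.
interface CronFields {
  minutes: string;
  hours: string;
  dayOfMonth: string;
  month: string;
  dayOfWeek: string;
  year: string;
}

// Checks the overall shape of cron(minutes hours day-of-month month day-of-week year).
// Individual field values and ranges are not validated.
function parseSchedulerCron(expression: string): CronFields {
  const match = /^cron\((.*)\)$/.exec(expression.trim());
  if (!match) {
    throw new Error(`not a cron(...) expression: ${expression}`);
  }
  const fields = (match[1] ?? "").trim().split(/\s+/);
  if (fields.length !== 6) {
    throw new Error(`expected 6 fields, got ${fields.length}`);
  }
  const [minutes, hours, dayOfMonth, month, dayOfWeek, year] =
    fields as [string, string, string, string, string, string];
  // One of day-of-month and day-of-week must be "?", but not both.
  if ((dayOfMonth === "?") === (dayOfWeek === "?")) {
    throw new Error("exactly one of day-of-month and day-of-week must be '?'");
  }
  return { minutes, hours, dayOfMonth, month, dayOfWeek, year };
}

// The expression used in this guide: every day at 12:00 (UTC unless a time zone is set).
const daily = parseSchedulerCron("cron(0 12 * * ? *)");
console.log(daily.hours); // 12
```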
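The scheduler's role needs two permissions: `emr-serverless:StartJobRun` on the application, and `iam:PassRole` on the job execution role that the job runs as. As a sketch, the function below builds that policy document as a plain object. The function name, the placeholder ARNs, and scoping `StartJobRun` to the application ARN plus its `/jobruns/*` children are assumptions to adapt to your account.

```typescript
// Shapes for a minimal IAM policy document.
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string | string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Permissions for the scheduler's role: start job runs on one EMR Serverless
// application and pass the job execution role to EMR Serverless.
function schedulerPermissions(applicationArn: string, jobRoleArn: string): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: ["emr-serverless:StartJobRun"],
        Resource: [applicationArn, `${applicationArn}/jobruns/*`],
      },
      {
        Effect: "Allow",
        Action: ["iam:PassRole"],
        Resource: jobRoleArn,
      },
    ],
  };
}

// Placeholder ARNs (account ID, application ID, and role name are hypothetical).
const doc = schedulerPermissions(
  "arn:aws:emr-serverless:us-east-1:123456789012:/applications/00example",
  "arn:aws:iam::123456789012:role/emr-job-role",
);
console.log(JSON.stringify(doc, null, 2));
```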
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Create an EMR Serverless application that runs Spark jobs
const emrApp = new aws.emrserverless.Application("emrApp", {
    name: "my-emr-serverless-app",
    type: "SPARK", // Application type (SPARK or HIVE)
    releaseLabel: "emr-6.6.0", // EMR Serverless release label (emr-6.6.0 or later)
    // Upper bound on the total resources the application can use at once
    maximumCapacity: {
        cpu: "4 vCPU",
        memory: "16 GB",
    },
    // Pre-initialized workers that are kept warm for faster job starts
    initialCapacities: [{
        initialCapacityType: "DRIVER",
        initialCapacityConfig: {
            workerCount: 1,
            workerConfiguration: {
                cpu: "2 vCPU",
                memory: "8 GB",
            },
        },
    }],
});
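// Job execution role: EMR Serverless assumes this role while the job runs
const jobRole = new aws.iam.Role("emrJobRole", {
    assumeRolePolicy: aws.iam.assumeRolePolicyForPrincipal({ Service: "emr-serverless.amazonaws.com" }),
});

// Let the job read its script and write logs (replace my-bucket with your bucket)
const jobRolePolicy = new aws.iam.RolePolicy("emrJobRolePolicy", {
    role: jobRole.id,
    policy: JSON.stringify({
        Version: "2012-10-17",
        Statement: [{
            Effect: "Allow",
            Action: ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            Resource: ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"],
        }],
    }),
});

// IAM role that the scheduler assumes to call StartJobRun
const schedulerRole = new aws.iam.Role("schedulerRole", {
    assumeRolePolicy: aws.iam.assumeRolePolicyForPrincipal({ Service: "scheduler.amazonaws.com" }),
});

// Allow the scheduler to start job runs on this application and to pass the job role
const schedulerRolePolicy = new aws.iam.RolePolicy("schedulerRolePolicy", {
    role: schedulerRole.id,
    // pulumi.jsonStringify resolves Outputs such as emrApp.arn before serializing
    policy: pulumi.jsonStringify({
        Version: "2012-10-17",
        Statement: [
            {
                Effect: "Allow",
                Action: ["emr-serverless:StartJobRun"],
                Resource: [emrApp.arn, pulumi.interpolate`${emrApp.arn}/jobruns/*`],
            },
            {
                Effect: "Allow",
                Action: ["iam:PassRole"],
                Resource: jobRole.arn,
            },
        ],
    }),
});

// Create an AWS Scheduler schedule that starts a job run every day
const schedule = new aws.scheduler.Schedule("emrSchedule", {
    name: "my-emr-schedule",
    scheduleExpression: "cron(0 12 * * ? *)", // Every day at 12:00 UTC
    flexibleTimeWindow: {
        mode: "OFF",
    },
    target: {
        // Universal target that calls the EMR Serverless StartJobRun API
        arn: "arn:aws:scheduler:::aws-sdk:emrserverless:startJobRun",
        roleArn: schedulerRole.arn,
        // StartJobRun parameters as JSON; PascalCase names are assumed here, as in
        // Step Functions SDK integrations. JSON.stringify cannot resolve Outputs.
        input: pulumi.jsonStringify({
            ApplicationId: emrApp.id,
            ExecutionRoleArn: jobRole.arn, // Role the job runs as
            ClientToken: "<aws.scheduler.execution-id>", // Assumed per-invocation token
            Name: "my-emr-job",
            JobDriver: {
                SparkSubmit: {
                    EntryPoint: "s3://my-bucket/my-script.py", // Replace with your script location
                },
            },
            ConfigurationOverrides: {
                MonitoringConfiguration: {
                    S3MonitoringConfiguration: {
                        LogUri: "s3://my-bucket/logs/",
                    },
                },
            },
        }),
    },
});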
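The program pre-initializes one 2 vCPU / 8 GB driver under a 4 vCPU / 16 GB maximum. Assuming, as we read the EMR Serverless documentation, that pre-initialized workers count against the application's maximum capacity, a quick local check like the one below can catch settings that would not fit. The helper names are hypothetical, and the parser only accepts the `vCPU` and `GB` strings used in this guide.

```typescript
// Mirrors the capacity settings of aws.emrserverless.Application in the program above.
interface WorkerConfiguration {
  cpu: string;    // e.g. "2 vCPU"
  memory: string; // e.g. "8 GB"
}

interface InitialCapacity {
  initialCapacityType: string; // e.g. "DRIVER" or "EXECUTOR" for Spark
  initialCapacityConfig: {
    workerCount: number;
    workerConfiguration: WorkerConfiguration;
  };
}

// Parses strings such as "4 vCPU", "16 GB", or "16GB" into a number.
function parseUnits(value: string, unit: "vCPU" | "GB"): number {
  const match = new RegExp(`^\\s*(\\d+(?:\\.\\d+)?)\\s*${unit}\\s*$`, "i").exec(value);
  if (!match) {
    throw new Error(`cannot parse "${value}" as ${unit}`);
  }
  return Number(match[1]);
}

// True when the summed pre-initialized workers fit within the maximum capacity.
function fitsWithinMaximum(maximum: WorkerConfiguration, initial: InitialCapacity[]): boolean {
  let cpu = 0;
  let memory = 0;
  for (const { initialCapacityConfig } of initial) {
    const { workerCount, workerConfiguration } = initialCapacityConfig;
    cpu += workerCount * parseUnits(workerConfiguration.cpu, "vCPU");
    memory += workerCount * parseUnits(workerConfiguration.memory, "GB");
  }
  return cpu <= parseUnits(maximum.cpu, "vCPU") && memory <= parseUnits(maximum.memory, "GB");
}

// The guide's values: one 2 vCPU / 8 GB driver under a 4 vCPU / 16 GB maximum.
const ok = fitsWithinMaximum({ cpu: "4 vCPU", memory: "16 GB" }, [
  {
    initialCapacityType: "DRIVER",
    initialCapacityConfig: { workerCount: 1, workerConfiguration: { cpu: "2 vCPU", memory: "8 GB" } },
  },
]);
console.log(ok); // true
```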
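The schedule's input is the StartJobRun request body. The sketch below builds it as a typed object and then serializes it, so a missing required field becomes a compile-time error. It assumes PascalCase parameter names (the convention AWS Step Functions uses for the same SDK integration) and relies on the `<aws.scheduler.execution-id>` context attribute so that each invocation gets a distinct client token; verify both against the EventBridge Scheduler documentation. The function name, application ID, and role ARN are hypothetical placeholders.

```typescript
// StartJobRun parameters as sent in a Scheduler target's input.
// PascalCase keys are an assumption borrowed from Step Functions SDK integrations.
interface StartJobRunInput {
  ApplicationId: string;
  ExecutionRoleArn: string;
  ClientToken: string;
  Name?: string;
  JobDriver: {
    SparkSubmit: {
      EntryPoint: string;
      EntryPointArguments?: string[];
      SparkSubmitParameters?: string;
    };
  };
  ConfigurationOverrides?: {
    MonitoringConfiguration: {
      S3MonitoringConfiguration: { LogUri: string };
    };
  };
}

// Builds the JSON string for the schedule target's input property.
function buildStartJobRunInput(
  applicationId: string,
  executionRoleArn: string,
  entryPoint: string,
  logUri: string,
): string {
  const body: StartJobRunInput = {
    ApplicationId: applicationId,
    ExecutionRoleArn: executionRoleArn,
    // Assumed to be replaced by Scheduler at invocation time, so each run gets its own
    // idempotency token instead of one fixed value.
    ClientToken: "<aws.scheduler.execution-id>",
    Name: "my-emr-job",
    JobDriver: { SparkSubmit: { EntryPoint: entryPoint } },
    ConfigurationOverrides: {
      MonitoringConfiguration: { S3MonitoringConfiguration: { LogUri: logUri } },
    },
  };
  return JSON.stringify(body);
}

// Placeholder values: the application ID and role ARN are hypothetical.
const input = buildStartJobRunInput(
  "00example",
  "arn:aws:iam::123456789012:role/emr-job-role",
  "s3://my-bucket/my-script.py",
  "s3://my-bucket/logs/",
);
console.log(input);
```

The release label is deliberately absent: it is a property of the application, not a StartJobRun parameter.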
Summary
In this guide, we created an AWS EMR Serverless application and set up AWS Scheduler to start job runs on it at specified intervals. The cron expression defines exactly when each run starts, and the IAM role gives the scheduler permission to start the EMR Serverless job. With this setup, you can manage and automate recurring EMR Serverless jobs using AWS Scheduler and Pulumi.