How Do I Set Up AWS Glue With Parameters?

Setting Up AWS Glue with Parameters

Introduction

This guide walks you through setting up an AWS Glue job with parameters using Pulumi and TypeScript. Parameterizing a Glue job lets you reuse the same ETL (Extract, Transform, Load) script with different inputs instead of maintaining a separate script for each case. The guide covers creating a Glue database, a Glue crawler, and a Glue job, and adding parameters to the job definition.

Step-by-Step Setup Process

1. Define an AWS Glue Database: Start by creating a Glue database. The database is a logical container in the Glue Data Catalog for table metadata; it does not store the data itself.
2. Create an AWS Glue Crawler: Set up a crawler that scans a data store (here, an S3 path) and writes the schemas it discovers to the database as tables. This keeps the catalog in line with the data without manual intervention.
3. Set Up an AWS Glue Job: Create a Glue job that executes your ETL script. The job needs an IAM role that Glue can assume and the S3 location of the script.
4. Add Parameters to the Job Definition: Add custom arguments to the job's default arguments so the same script can run with different inputs. Argument names are prefixed with "--", as in "--parameter1" below; in a Python Glue script they can be read with getResolvedOptions from awsglue.utils.

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Glue Data Catalog database that holds the table metadata.
const example = new aws.glue.CatalogDatabase("example", {name: "example_database"});

// Crawler that scans the S3 path and records the discovered schemas in the database.
const exampleCrawler = new aws.glue.Crawler("example", {
    name: "example_crawler",
    // Placeholder role ARN; replace with a role that Glue can assume in your account.
    role: "arn:aws:iam::123456789012:role/service-role/AWSGlueServiceRole",
    databaseName: example.name,
    s3Targets: [{
        path: "s3://example-bucket/path/",
    }],
});

// ETL job that runs the script stored in S3.
const exampleJob = new aws.glue.Job("example", {
    name: "example_job",
    roleArn: "arn:aws:iam::123456789012:role/service-role/AWSGlueServiceRole",
    command: {
        scriptLocation: "s3://example-bucket/scripts/example-script.py",
        name: "glueetl",
    },
    // Default arguments passed to every run of the job.
    defaultArguments: {
        "--job-language": "python",
        "--TempDir": "s3://example-bucket/temp/",
        // Custom job parameter; the script reads it by name.
        "--parameter1": "value1",
    },
    maxRetries: 3,
    glueVersion: "2.0",
    numberOfWorkers: 10,
    workerType: "G.1X",
});

// Stack outputs for use by other tooling.
export const glueCrawlerName = exampleCrawler.name;
export const glueJobName = exampleJob.name;
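As the number of job parameters grows, it can help to build the argument map from plain names rather than typing the "--" prefix by hand. The sketch below is one way to do that. The helper name toGlueArguments is hypothetical (it is not part of Pulumi or AWS), and the values are the same placeholders used in the program above.

```typescript
// Hypothetical helper (not part of Pulumi or AWS): builds a Glue-style
// argument map from plain parameter names.
function toGlueArguments(params: Record<string, string>): Record<string, string> {
    const args: Record<string, string> = {};
    for (const key of Object.keys(params)) {
        // Glue argument names start with "--"; add the prefix only if it is missing.
        const name = key.indexOf("--") === 0 ? key : "--" + key;
        args[name] = params[key];
    }
    return args;
}

// Same placeholder values as the job definition above.
const exampleArguments = toGlueArguments({
    "job-language": "python",
    "TempDir": "s3://example-bucket/temp/",
    "parameter1": "value1",
});

console.log(exampleArguments);
```

The resulting object has the same shape as the defaultArguments map in the job definition, so it could be passed there in place of the literal.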

Summary

This configuration establishes an AWS Glue environment consisting of a database, a crawler, and a job. The job's default arguments include a custom parameter, so the same script can run with different inputs without being edited. This makes your ETL processes easier to reuse and maintain.
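To make "varied inputs" concrete, the sketch below models locally how the arguments for a single run could be resolved. It assumes that arguments supplied when a run is started take precedence over default arguments with the same name. It does not call AWS, and the helper name resolveRunArguments and the override value are hypothetical.

```typescript
type GlueArguments = Record<string, string>;

// Hypothetical helper: models, without calling AWS, the arguments one run would see.
// Assumption: per-run arguments win over defaults that share the same name.
function resolveRunArguments(defaults: GlueArguments, overrides: GlueArguments): GlueArguments {
    const resolved: GlueArguments = {};
    for (const key of Object.keys(defaults)) {
        resolved[key] = defaults[key];
    }
    for (const key of Object.keys(overrides)) {
        resolved[key] = overrides[key];
    }
    return resolved;
}

// Defaults as declared on the job definition.
const jobDefaults: GlueArguments = {
    "--job-language": "python",
    "--TempDir": "s3://example-bucket/temp/",
    "--parameter1": "value1",
};

// One run that supplies a different value for the custom parameter.
const runArguments = resolveRunArguments(jobDefaults, { "--parameter1": "value2" });

console.log(runArguments["--parameter1"]);
```

An override map like the one above is the kind of value that would go in the Arguments field when starting a job run, while the job definition itself stays unchanged.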

Key Points

AWS Glue Database: Holds the table metadata in the Glue Data Catalog.
AWS Glue Crawler: Automatically discovers schemas and updates the catalog.
AWS Glue Job: Runs the ETL script with the configured role, workers, and arguments.
Parameterization: Custom default arguments let one job definition serve different inputs.

By following these steps, you can effectively set up and manage your AWS Glue environment with parameterized jobs.
