How Do I Query Logs and Data in S3 Directly Using Amazon OpenSearch Service Zero-ETL?

Introduction

This guide shows how to provision, with Pulumi, the two resources used to query logs and data stored in Amazon S3 from Amazon OpenSearch Service without a separate Extract, Transform, Load (ETL) pipeline: an S3 bucket that holds the data and an OpenSearch Service domain that is permitted to read it. Querying the data where it already lives means there is no ingestion pipeline to build and operate, and no second copy of the data to keep in sync.

Step-by-Step Setup

To achieve direct querying of S3 data via Amazon OpenSearch Service, follow these steps:

  1. Create an Amazon S3 Bucket: This bucket will be used to store your logs and data.
  2. Set Up an Amazon OpenSearch Service Domain: Configure the domain to access the S3 bucket.

The following Pulumi program (TypeScript) defines both resources and their policies:

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

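// Create an S3 bucket to store logs and data.
// Bucket names are globally unique, so replace this placeholder with your own.
const myLogsBucket = new aws.s3.BucketV2("my_logs_bucket", {
    bucket: "my-logs-and-data-bucket",
});

// Attach the read policy as a separate resource; the inline `acl` and `policy`
// arguments on BucketV2 are deprecated. New buckets are private by default.
const myLogsBucketPolicy = new aws.s3.BucketPolicy("my_logs_bucket_policy", {
    bucket: myLogsBucket.id,
    policy: pulumi.jsonStringify({
        Version: "2012-10-17",
        Statement: [{
            Effect: "Allow",
            Principal: { Service: "opensearchservice.amazonaws.com" },
            Action: "s3:GetObject",
            // Every object in the bucket created above.
            Resource: pulumi.interpolate`${myLogsBucket.arn}/*`,
        }],
    }),
});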

// Create an OpenSearch service domain
const myOpensearchDomain = new aws.opensearch.Domain("my_opensearch_domain", {
    domainName: "my-awesome-opensearch-domain",
    clusterConfig: {
        instanceType: "t2.small.search",
    },
    ebsOptions: {
        ebsEnabled: true,
        volumeSize: 10,
    },
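    // Limit access to principals in your own account; a "*" principal would
    // allow any caller. Replace the placeholder account ID and Region below.
    accessPolicies: `  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": {
          "AWS": "arn:aws:iam::123456789012:root"
        },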
        "Action": "es:*",
        "Resource": "arn:aws:es:us-west-2:123456789012:domain/my-awesome-opensearch-domain/*"
      }
    ]
  }
`,
});

export const s3BucketName = myLogsBucket.bucket;
export const opensearchDomainEndpoint = myOpensearchDomain.endpoint;
export const opensearchDomainArn = myOpensearchDomain.arn;
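
The domain access policy above hard-codes a Region and an account ID, which are placeholders. One way to avoid editing JSON by hand is to generate the policy from parameters. The following is a minimal sketch, not part of the original program: `domainResourceArn` and `buildDomainAccessPolicy` are hypothetical helper names, and scoping the principal to the account root is an assumption about what access you want to grant.

```typescript
// Hypothetical helpers (not part of Pulumi or the AWS SDK) that build the
// domain access policy from parameters instead of a hard-coded JSON string.

interface PolicyStatement {
    Effect: "Allow" | "Deny";
    Principal: Record<string, string>;
    Action: string | string[];
    Resource: string;
}

interface PolicyDocument {
    Version: "2012-10-17";
    Statement: PolicyStatement[];
}

// ARN pattern that covers every sub-resource of an OpenSearch Service domain.
function domainResourceArn(region: string, accountId: string, domainName: string): string {
    return `arn:aws:es:${region}:${accountId}:domain/${domainName}/*`;
}

// Access policy that allows principals in a single AWS account (assumption:
// account-wide access is what you want; narrow it further for production).
function buildDomainAccessPolicy(region: string, accountId: string, domainName: string): PolicyDocument {
    return {
        Version: "2012-10-17",
        Statement: [{
            Effect: "Allow",
            Principal: { AWS: `arn:aws:iam::${accountId}:root` },
            Action: "es:*",
            Resource: domainResourceArn(region, accountId, domainName),
        }],
    };
}

// Example values are the placeholders used in this guide.
const accessPolicy = buildDomainAccessPolicy("us-west-2", "123456789012", "my-awesome-opensearch-domain");
console.log(JSON.stringify(accessPolicy, null, 2));
```

In the Pulumi program, the result of `JSON.stringify(accessPolicy)` could be passed as the domain's `accessPolicies` value in place of the template string.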

Key Points

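  • Zero-ETL Setup: OpenSearch Service reads the data where it is stored in S3, so no intermediate pipeline has to copy and transform it before it can be queried.
  • Efficient Data Access: Logs and data stay in S3 rather than being loaded into the domain first, which avoids maintaining a second copy.
  • Simplified Configuration: The program creates two resources, an S3 bucket and an OpenSearch domain, each with a policy. The bucket policy grants s3:GetObject to the OpenSearch Service principal, and the domain access policy controls who can call the domain.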
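
Because this setup depends on two policy documents, it can help to check them for overly broad grants before deploying. The sketch below is an illustration only: `isWildcardPrincipal` and `findOpenStatements` are hypothetical names, and the check covers just one case, an Allow statement whose principal is `*`. The sample policy is deliberately open to show what the check flags.

```typescript
// Hypothetical review helper: lists Allow statements that apply to any principal.
type Principal = string | Record<string, string | string[]>;

interface Statement {
    Effect: string;
    Principal?: Principal;
    Action: string | string[];
    Resource: string | string[];
}

// True when the principal is "*" or any of its entries is "*".
function isWildcardPrincipal(principal: Principal | undefined): boolean {
    if (principal === undefined) return false;
    if (typeof principal === "string") return principal === "*";
    return Object.keys(principal).some((key) => {
        const value = principal[key];
        return Array.isArray(value) ? value.indexOf("*") !== -1 : value === "*";
    });
}

// Parses a policy JSON string and returns the statements that allow everyone.
function findOpenStatements(policyJson: string): Statement[] {
    const policy = JSON.parse(policyJson) as { Statement: Statement | Statement[] };
    const statements = Array.isArray(policy.Statement) ? policy.Statement : [policy.Statement];
    return statements.filter((s) => s.Effect === "Allow" && isWildcardPrincipal(s.Principal));
}

// Deliberately open sample policy, using this guide's placeholder ARN.
const openPolicy = `{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "AWS": "*" },
    "Action": "es:*",
    "Resource": "arn:aws:es:us-west-2:123456789012:domain/my-awesome-opensearch-domain/*"
  }]
}`;

console.log(findOpenStatements(openPolicy).length); // prints 1
```

A statement flagged this way is not always wrong, but it is worth confirming that unrestricted access is intended.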

Summary

In this guide, we provisioned an S3 bucket and an Amazon OpenSearch Service domain with Pulumi and granted the OpenSearch Service principal read access to the objects in the bucket, which is the basis for querying S3 data without an ETL pipeline. The program exports the bucket name, the domain endpoint, and the domain ARN so that later steps can reference them.
