
Integrating Amazon Glue with ClickHouse and Spark

ClickHouse Supported

Amazon Glue is a fully managed, serverless data integration service provided by Amazon Web Services (AWS). It simplifies the process of discovering, preparing, and transforming data for analytics, machine learning, and application development.

Installation

To integrate your Glue code with ClickHouse, you can use our official Spark connector in Glue via one of the following:

  • Installing the ClickHouse Glue connector from the AWS Marketplace (recommended).
  • Manually adding the Spark Connector's jars to your Glue job.
  1. Subscribe to the Connector

    To access the connector in your account, subscribe to the ClickHouse AWS Glue Connector from AWS Marketplace.

  2. Grant Required Permissions

    Ensure your Glue job’s IAM role has the necessary permissions, as described in the minimum privileges guide.

  3. Activate the Connector & Create a Connection

    After subscribing, select the Glue version that matches your job requirements. In the Additional details section, under Usage instructions, click the link Open Glue Studio - Add ClickHouse connector. This opens the Glue connection creation page with the key fields pre-filled. Give the connection a name and click Create (you do not need to provide the ClickHouse connection details at this stage).

  4. Use in Glue Job

    In your Glue job, select the Job details tab and expand the Advanced properties section. Under Connections, select the connection you just created. The connector automatically injects the required JARs into the job runtime.
[Figure: Glue Notebook connections config]
Note

Make sure to select the connector version that matches your Glue job configuration:

  • Glue 4: Spark 3.3, Scala 2, Python 3
  • Glue 5: Spark 3.5, Scala 2, Python 3
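
For jobs defined programmatically rather than in the console, the same two choices (which connection to attach and which Glue version to run) appear as fields of the job definition. The following is a minimal sketch, assuming boto3 and its Glue `create_job` call. The job name, role ARN, script path, and connection name are hypothetical placeholders, and `build_job_definition` is an illustrative helper, not part of the connector.

```python
# Glue versions from the note above, mapped to their Spark versions.
SUPPORTED_GLUE_VERSIONS = {"4.0": "3.3", "5.0": "3.5"}


def build_job_definition(job_name, role_arn, script_location,
                         connection_name, glue_version="4.0"):
    """Build keyword arguments for boto3's Glue create_job call."""
    if glue_version not in SUPPORTED_GLUE_VERSIONS:
        raise ValueError(f"unsupported Glue version: {glue_version}")
    return {
        "Name": job_name,
        "Role": role_arn,
        "GlueVersion": glue_version,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        # Attaching the connection is what makes the connector JARs available.
        "Connections": {"Connections": [connection_name]},
    }


def create_job(definition):
    """Submit the definition to AWS (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("glue").create_job(**definition)


# Hypothetical values for illustration only.
job = build_job_definition(
    job_name="clickhouse-etl",
    role_arn="arn:aws:iam::123456789012:role/GlueClickHouseRole",
    script_location="s3://my-bucket/scripts/job.py",
    connection_name="clickhouse-connection",
    glue_version="5.0",
)
```

Keeping the definition builder separate from the AWS call makes it easy to check the Glue version and connection name before anything is submitted.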

Using AWS Secrets Manager for credentials

Rather than hardcoding your ClickHouse user and password in the job, store them in AWS Secrets Manager and reference the secret from your Glue connection or job script. At runtime, Glue fetches the secret and merges its key-value pairs into the connector's connection options.

Create the secret

In AWS Secrets Manager, create a secret of type "Other type of secret" and add key-value pairs whose keys match the connector's option names:

Key       Value
user      your ClickHouse username
password  your ClickHouse password

Any key you put in the secret is forwarded to the connector, so you can also store host, database, or any other option there if you'd like to keep them out of code.
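
If you prefer to create the secret from a script instead of the console, the sketch below shows one way to do it, assuming boto3 and its Secrets Manager `create_secret` call. The secret name, username, and password are hypothetical placeholders, and `build_secret_string` is an illustrative helper.

```python
import json


def build_secret_string(user, password, **extra_options):
    """Serialize connector options into the JSON body of the secret."""
    payload = {"user": user, "password": password}
    # Optional extras such as host or database are stored alongside.
    payload.update(extra_options)
    return json.dumps(payload)


def create_secret(name, secret_string):
    """Store the secret in AWS (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("secretsmanager")
    return client.create_secret(Name=name, SecretString=secret_string)


# Hypothetical values for illustration only.
secret_string = build_secret_string("glue_user", "change-me", database="default")
# create_secret("clickhouse/glue/credentials", secret_string)
```

The keys in the JSON body are the connector option names, so the stored pairs can be forwarded without any renaming.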

Reference the secret

There are two ways to wire the secret into a job.

Option 1: attach it to the Glue connection. When creating or editing the ClickHouse connection in Glue Studio, set the AWS secret field to the secret's name. Any job that uses this connection resolves the secret automatically — no code changes needed.

Option 2: pass secretId in connection options. Add secretId to the options map and drop the keys the secret provides:

clickhouse_options = {
    "className": "clickhouse",
    "secretId": "clickhouse/glue/credentials",
    "host": "<your-clickhouse-host>",
    "http_port": "<your-clickhouse-port>",
    "protocol": "https",
    "database": "default",
    "table": "example_table",
    "ssl": "true"
}

The secret's user and password keys are merged into these options at runtime, so you never need to read them in your script.
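
To make the merge step concrete, here is a rough model of what happens to the options map at runtime. It is a sketch for understanding only, not the connector's actual code: `resolve_options` is a hypothetical helper, and two details are assumptions, namely that `secretId` is dropped from the resolved map and that the secret's values win if a key appears in both places. The fetch function uses the boto3 `get_secret_value` call.

```python
import json


def resolve_options(options, fetch_secret):
    """Merge the secret's key-value pairs into the connection options."""
    resolved = {k: v for k, v in options.items() if k != "secretId"}
    secret_id = options.get("secretId")
    if secret_id is not None:
        # Assumption: the secret's values override same-named options.
        resolved.update(json.loads(fetch_secret(secret_id)))
    return resolved


def fetch_from_secrets_manager(secret_id):
    """Read the secret's JSON body (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the model runs without boto3

    client = boto3.client("secretsmanager")
    return client.get_secret_value(SecretId=secret_id)["SecretString"]


# Stand-in for Secrets Manager so the sketch runs offline.
fake_store = {
    "clickhouse/glue/credentials": json.dumps(
        {"user": "glue_user", "password": "change-me"}
    )
}

resolved = resolve_options(
    {
        "className": "clickhouse",
        "secretId": "clickhouse/glue/credentials",
        "host": "<your-clickhouse-host>",
    },
    fake_store.__getitem__,
)
```

After the merge, `resolved` holds `user` and `password` next to the options written in the script, which is why the script itself never has to handle the credentials.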

Examples

You can use the ClickHouse connector as either a source or a target in the Glue Studio visual editor. Simply drag the ClickHouse Spark Connector component onto the canvas and connect it to your data pipeline.
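
For script-based jobs, the connector can also be registered as a Spark catalog. The sketch below builds the catalog settings as a plain dictionary and applies them to a Spark session. The catalog class name and configuration keys are written from memory of the Spark connector's documented settings and should be treated as assumptions to verify against the Spark documentation for your connector version. `clickhouse_catalog_conf` and `apply_conf` are illustrative helpers, and the credentials shown are placeholders that would normally come from Secrets Manager.

```python
def clickhouse_catalog_conf(host, http_port, user, password,
                            database="default", catalog="clickhouse"):
    """Build Spark settings that register ClickHouse as a catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Assumed class name; older connector releases used a different one.
        prefix: "com.clickhouse.spark.ClickHouseCatalog",
        f"{prefix}.host": host,
        f"{prefix}.protocol": "https",
        f"{prefix}.http_port": str(http_port),
        f"{prefix}.user": user,
        f"{prefix}.password": password,
        f"{prefix}.database": database,
        f"{prefix}.option.ssl": "true",
    }


def apply_conf(spark, conf):
    """Apply each setting to an existing SparkSession."""
    for key, value in conf.items():
        spark.conf.set(key, value)


# Hypothetical values for illustration only.
conf = clickhouse_catalog_conf(
    "<your-clickhouse-host>", 8443, "glue_user", "change-me"
)
```

Inside a Glue job you would call `apply_conf(spark, conf)` on the job's Spark session and then query a table with something like `spark.sql("SELECT * FROM clickhouse.default.example_table")`, assuming that table exists.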

For more details, please visit our Spark documentation.