Connecting to Data Sources
Shaper can query any data source DuckDB supports - including any extensions.
Use Shaper to attach to your production Postgres database, query files on S3 or even query a Google Sheet directly.
To query remote data sources, you will need to configure secrets, attach databases and load extensions. You can do that either with tasks or by configuring init-sql.
Since DuckDB is also a database, you can also store data within Shaper.
- Shaper has an Ingest API to ingest data via an HTTP API.
- You can also use scheduled jobs to routinely load and transform data.
Querying Data
Section titled “Querying Data”The easiest way to query external data is through files.
DuckDB can read local files and it can also read files via HTTP or from S3 and other object storages directly.
Common file formats are supported out of the box:
- CSV
- JSON
- Parquet
- Text files
- Excel
You can find the complete list in the DuckDB documentation.
In addition to files, you can also attach to databases such as Postgres, MySQL and Iceberg data lakes. For more see below.
Secret Management
Section titled “Secret Management”Shaper uses DuckDB Secrets for managing credentials.
You can directly create persistent secrets by running SQL ask task in the Shaper UI:
- Navigate to New
- select Task
- write a
CREATE PERSISTENT SECRETstatement - Click Run, and don’t save the task after
For example, you can set up a DuckDB secret for S3 credentials like this:
CREATE PERSISTENT SECRET mys3 ( TYPE s3, KEY_ID 'myS3KeyId', SECRET 'myS3Secret', REGION 'myRegion', SCOPE 's3://myBucket');If you like to use environment variables to create secrets, use the init-sql functionality:
Specify SQL that is executed when the system is started using the --init-sql flag (or env var) or
use --init-sql-file to specify a file that contains SQL.
The init-sql code has access to environment variables as $VAR or ${VAR} in the code.
For example, you can set up a DuckDB secret for S3 credentials like this:
CREATE OR REPLACE SECRET mys3 ( TYPE s3, KEY_ID '${S3_KEY_ID}', SECRET '${S3_SECRET}', REGION 'my_region', SCOPE 's3://my_bucket');Connecting to Databases
Section titled “Connecting to Databases”DuckDB allows you to attach to remote databases.
There are extensions to attach to many common databases:
- Postgres
- SQLite
- MySQL
- Another DuckDB database file
- …
Attach to a database by using DuckDB’s ATTACH functionality together with secrets (see above).
Postgres
Section titled “Postgres”For example, you can create a persistent secret that looks like this:
CREATE PERSISTENT SECRET pg1 ( TYPE postgres, HOST '127.0.0.1', PORT 5432, DATABASE postgres, USER 'postgres', PASSWORD '');And then create and save a task that runs on startup like this:
SELECT 'init'::SCHEDULE;ATTACH '' AS postgres_db (TYPE postgres, secret pg1);For more connection options see the duckdb-extension docs.
Note that DuckDB caches your schema. There are multiple ways to refresh the schema. See this discussion.
DuckDB Extensions
Section titled “DuckDB Extensions”Make use of any DuckDB extension within Shaper.
There are extensions to do HTTP calls and query data sources from Snowflake, Big Query and Clickhouse to Arrow Flight and Google Sheets.
Learn about all existing core and community extensions in the DuckDB documentation.
If you run into any issues with extensions or have any questions, don’t hesitate to reach out.
DuckDB core extensions are installed and loaded automatically when they are first used.
Otherwise you can use INSTALL and LOAD in tasks or init-sql.
Change the directory extensions are installed in using the --duckdb-ext-dir flag.
Use UPDATE EXTENSIONS; if you like to ensure the latest version of every extension is used.