Loading Data
There are different strategies for data loading and you can combine them depending on your use cases.
The most important thing to understand is that Shaper is just a wrapper around DuckDB. Any data source DuckDB can query, Shaper can query as well. In addition to querying external data, you can store data within the DuckDB database itself. We extend DuckDB with functionality to make it all work seamlessly for building online and real-time analytics dashboards:
- Ingest data efficiently by storing it in NATS
- Auto-create tables for ingested data
- Securely store credentials for remote data sources
- Schedule automated tasks to load and transform data
Ingesting Data into Shaper
You can ingest data into Shaper’s database through the HTTP API or via NATS. Since DuckDB is not optimized for write operations, we first store data in NATS JetStream and then write it to DuckDB in batches using DuckDB’s Appender API. We create tables and add columns automatically based on the ingested data.
To ingest data you first need to create an API Key in the “Admin” settings of the Shaper UI.
Then you can write JSON data to the HTTP API or NATS directly:
Endpoint:

```
POST http://localhost:5454/api/data/:tablename
```

Authentication is done through a Bearer token in the Authorization header.
You can pass a single JSON object or an array of objects.
Example:
```sh
curl -X POST http://localhost:5454/api/data/my_table \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{"col1": "value1", "col2": 124}'
```

```sh
curl -X POST http://localhost:5454/api/data/my_table \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '[{"col1": "value1", "col2": 123}, {"col1": "value2", "col2": 456}]'
```

Shaper uses NATS internally, but by default NATS is not directly accessible.
You can make NATS reachable by specifying `nats-port`. To make sure NATS is secured, you also need to specify an admin token via `nats-token`. For example:

```sh
docker run --rm -it -p5454:5454 -p4222:4222 taleshape/shaper --nats-port 4222 --nats-token mytoken
```

You can skip `nats-token` for development, but in production you want to make sure it is set.

Now you can ingest data like this:

```sh
nats pub --user '<your-api-key>' shaper.ingest.my_table '{"col1": "value1", "col2": 124}'
```

You can also use any other NATS client. And you can also submit data using the JetStream API to get ACKs.
If you click on “New” in the sidebar now, you can run the following queries:
```sql
DESC my_table;
SELECT * FROM my_table;
```

| column_name | column_type | null | key | default | extra |
| ----------- | ----------- | ---- | --- | ------- | ----- |
| _id         | VARCHAR     | YES  |     |         |       |
| _ts         | TIMESTAMP   | YES  |     |         |       |
| col1        | VARCHAR     | YES  |     |         |       |
| col2        | DOUBLE      | YES  |     |         |       |
You can see that Shaper auto-creates columns with fitting data types, and we always add `_id` and `_ts` columns. You can override the default values by passing data for them in the JSON object. Shaper detects booleans and numbers in JSON. We also detect date and timestamp strings in various formats. If any data is a complex data type such as an array or object, we store it as a `JSON` column in DuckDB.
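For example, you could query the ingested table like this (a sketch: the time filter uses the auto-created `_ts` column, and `col3` stands in for a hypothetical nested field stored as `JSON`):

```sql
-- Rows ingested in the last hour, newest first (uses the auto-created _ts column)
SELECT _id, col1, col2
FROM my_table
WHERE _ts > now() - INTERVAL 1 HOUR
ORDER BY _ts DESC;

-- Extract a field from a hypothetical JSON column named col3
SELECT col3->>'$.some_key' AS some_key
FROM my_table;
```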
Querying External Data
The easiest way to query external data is through files.
DuckDB can read local files and it can use HTTP to read remote files.
We can also read files from S3 and other object stores.
Common file formats are supported out of the box:
- CSV
- JSON
- Parquet
- Text files
- Excel
You can find the complete list in the DuckDB documentation.
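As a minimal sketch (the paths, URLs, and bucket names below are placeholders), querying files is plain SQL:

```sql
-- Query a local CSV file directly
SELECT * FROM 'data/events.csv';

-- Read a remote Parquet file over HTTP
SELECT * FROM read_parquet('https://example.com/exports/events.parquet');

-- Read from S3 (requires credentials for the bucket)
SELECT * FROM read_parquet('s3://my-bucket/events/*.parquet');
```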
Credential Management
Connecting to Remote Databases
DuckDB allows you to attach to remote databases.
There are extensions to attach to many common databases:
- Another DuckDB database file
- Postgres
- SQLite
- MySQL
Attaching to databases is done with the `ATTACH` statement.
For example, you can connect to a local Postgres database:
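Here is a minimal sketch (the connection parameters and table name are placeholders to adapt to your setup):

```sql
-- Load the Postgres extension and attach the database
INSTALL postgres;
LOAD postgres;
ATTACH 'dbname=mydb user=postgres host=127.0.0.1' AS pg (TYPE postgres);

-- Query a table in the attached database
SELECT * FROM pg.public.some_table;
```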
You can learn more about `ATTACH` and `DETACH` in the DuckDB documentation.
DuckDB Extensions
You can make use of any DuckDB Extension within Shaper.
Learn about all existing core and community extensions in the DuckDB documentation.
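For example, installing and loading extensions is done in SQL. The extensions named here are just illustrations:

```sql
-- Core extensions install from the default repository
INSTALL spatial;
LOAD spatial;

-- Community extensions install from the community repository
INSTALL h3 FROM community;
LOAD h3;
```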
If you run into any issues with extensions or have any questions, don’t hesitate to reach out.
Tasks and Automation
Tasks can be scheduled to run on startup and rerun at scheduled intervals to take care of any data maintenance work needed:
- Attach remote databases
- Install extensions
- Import and update data from remote sources
- Create tables and views
You can think of tasks as a lightweight scripting tool similar to cron jobs.
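To illustrate the kind of SQL a task might run (the URL, table, and column names are placeholders, and the schedule itself is configured in Shaper rather than in the SQL):

```sql
-- Refresh a local table from a remote CSV export on every run
CREATE OR REPLACE TABLE orders AS
SELECT * FROM read_csv_auto('https://example.com/exports/orders.csv');

-- Maintain an aggregate view for dashboards (assumes order_date and amount columns)
CREATE OR REPLACE VIEW revenue_per_day AS
SELECT order_date, sum(amount) AS revenue
FROM orders
GROUP BY order_date;
```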