
Self-hosted Data Lake Platform with DuckDB, DuckLake and Shaper

Do you need to store and analyze large amounts of data without breaking the bank? Do you need to run in your own infrastructure instead of relying on the big cloud providers?

You can build a simple data lake platform that scales from a $5 VM to petabytes of data with these three open source tools: DuckDB, DuckLake, and Shaper.

(Architecture diagram)

A minimal data platform requires at least three components:

  1. Data Storage
  2. Data Ingestion and Transformation
  3. Data Analysis and Visualization

Let’s go over them one by one:

When you have only a small amount of data you can use DuckDB to directly query your production database such as Postgres, but as your data grows that gets slow and expensive.

One alternative is to replicate data to a data warehouse such as Snowflake, BigQuery, Redshift or even Clickhouse. But warehouses are complex to operate and expensive.

That’s where data lakes come in:

Building a data lake architecture means storing data as files in a cheap object storage such as S3.

Data is then stored in a format optimized for analytics such as Parquet files. And DuckDB can query those files really efficiently - fast enough for most use cases even with large amounts of data.
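As a sketch of what this looks like, DuckDB can read Parquet files straight from object storage. The bucket name and path below are placeholders, and this assumes the httpfs extension is installed and S3 credentials are configured:

```sql
-- Query Parquet files directly from object storage.
-- 's3://my-bucket/sessions/*.parquet' is a placeholder path.
INSTALL httpfs;
LOAD httpfs;

SELECT
  category,
  count(*) AS sessions
FROM read_parquet('s3://my-bucket/sessions/*.parquet')
GROUP BY category;
```

The glob pattern lets DuckDB scan many Parquet files as if they were one table.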

But Parquet files are read-only. To update data you need to write a new file. So you quickly end up with a large number of files to manage, and you need to rewrite files regularly to keep queries fast.

That’s why a data lake needs a metadata layer - a catalog of all the data.

Once you have a catalog, using a data lake feels just like using a database:

You create tables and insert, update, and delete data without thinking about Parquet files. The catalog rewrites and optimizes the underlying files for you as needed.
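For example, once a catalog is attached (as we do with DuckLake later in this post, under the name `ducklake`), ordinary DML just works. The table and column names here are illustrative:

```sql
-- Sketch: ordinary DML against a catalog-backed table.
-- Assumes a lake catalog attached as `ducklake` with a `sessions` table.
INSERT INTO ducklake.sessions (created_at, category)
VALUES (now(), 'web');

UPDATE ducklake.sessions
SET category = 'web'
WHERE category = 'website';

DELETE FROM ducklake.sessions
WHERE created_at < now() - INTERVAL '2 years';
```

Behind the scenes the catalog records these changes as new Parquet files plus metadata, so readers always see a consistent snapshot.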

The most popular implementation of such a catalog is Apache Iceberg. Cloud providers offer Iceberg as a service (for example AWS S3 Tables and Cloudflare R2 Data Catalog).

And DuckDB has great support for Iceberg.

If your infrastructure provider already supports Iceberg on top of its object storage and you don’t mind paying the premium for that, that’s a great option for a data lake.

However, to run your own data lake on top of any object storage, DuckLake is much simpler than running your own Iceberg implementation.

DuckLake doesn’t require any additional services to run. It works with any object storage and supports many databases for its catalog.

If you have an S3-compatible service and a Postgres database, you can start using DuckLake without any new infrastructure.

We can run a complete data lake platform in a simple docker-compose setup:

  • Minio for object storage
  • Postgres for the DuckLake catalog
  • Shaper to run DuckDB and provide a UI for querying and scheduling jobs
docker-compose.yml

```yaml
services:
  minio:
    image: minio/minio
    restart: unless-stopped
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    volumes:
      - minio_data:/data
    healthcheck:
      test: ["CMD", "mc", "ready", "local"]
      interval: 5s
      timeout: 5s
      retries: 10
  minio-init:
    image: minio/mc
    depends_on:
      minio:
        condition: service_healthy
    entrypoint: >
      /bin/sh -c "
      mc alias set local http://minio:9000 minioadmin minioadmin &&
      mc mb --ignore-existing local/ducklake &&
      echo 'Bucket ducklake ready.'
      "
    restart: "no"
  postgres:
    image: postgres:16
    restart: unless-stopped
    environment:
      POSTGRES_USER: ducklake
      POSTGRES_PASSWORD: ducklake
      POSTGRES_DB: ducklake
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ducklake -d ducklake"]
      interval: 5s
      timeout: 5s
      retries: 10
  shaper:
    image: taleshape/shaper
    restart: unless-stopped
    depends_on:
      postgres:
        condition: service_healthy
      minio-init:
        condition: service_completed_successfully
    ports:
      - "5454:5454" # UI → http://localhost:5454
    volumes:
      - ./init.sql:/init.sql:ro
    environment:
      SHAPER_INIT_SQL_FILE: /init.sql
      MINIO_KEY_ID: minioadmin
      MINIO_SECRET: minioadmin
      MINIO_ENDPOINT: minio:9000
      MINIO_BUCKET: ducklake
      PG_HOST: postgres
      PG_PORT: "5432"
      PG_DB: ducklake
      PG_USER: ducklake
      PG_PASSWORD: ducklake
volumes:
  minio_data:
  postgres_data:
  shaper_data:
```

We also need to configure Shaper to set up the secrets for DuckLake. We can do that in a separate init.sql file that Shaper executes on startup:

init.sql

```sql
CREATE OR REPLACE SECRET minio_secret (
  TYPE s3,
  KEY_ID '${MINIO_KEY_ID}',
  SECRET '${MINIO_SECRET}',
  REGION 'us-east-1',
  ENDPOINT '${MINIO_ENDPOINT}',
  URL_STYLE 'path',
  USE_SSL false
);

CREATE OR REPLACE SECRET pg_secret (
  TYPE postgres,
  HOST '${PG_HOST}',
  PORT ${PG_PORT},
  DATABASE '${PG_DB}',
  USER '${PG_USER}',
  PASSWORD '${PG_PASSWORD}'
);

CREATE SECRET ducklake_secret (
  TYPE ducklake,
  METADATA_PATH '',
  DATA_PATH 's3://${MINIO_BUCKET}/',
  METADATA_PARAMETERS MAP {'TYPE': 'postgres', 'SECRET': 'pg_secret'}
);

ATTACH 'ducklake:ducklake_secret' AS ducklake;
```

You can start the setup by running `docker compose up -d` in the directory where you created the docker-compose.yml and init.sql files.

We can use Shaper’s Task feature as a simple scheduling system to run SQL queries that ingest and transform data in our data lake:

  1. Open http://localhost:5454 in your browser.
  2. Click on New and select Task at the top.
  3. Copy and paste the following SQL:

     ```sql
     SELECT NULL::SCHEDULE;
     CREATE TABLE IF NOT EXISTS ducklake.sessions AS
     SELECT * FROM 'https://taleshape.com/sample-data/sessions.csv';
     ```
  4. Click on “Create” at the top right and give the task a name such as “Ingest Sessions Data”.
  5. Now click on “Run” in the top right corner.

Congrats, you’ve loaded your first data into DuckLake!

For real use cases you would schedule tasks to run automatically, and you would use MERGE INTO, for example, to update your data lake continuously.
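A sketch of what such a scheduled upsert task could look like. Recent DuckDB versions support MERGE INTO; the `session_id` and `category` columns here are hypothetical, assuming the source rows carry a unique session id:

```sql
-- Hypothetical hourly task: upsert new and changed rows into the lake.
SELECT (date_trunc('hour', now()) + INTERVAL '1h')::SCHEDULE;

MERGE INTO ducklake.sessions AS t
USING (
  SELECT * FROM 'https://taleshape.com/sample-data/sessions.csv'
) AS s
ON t.session_id = s.session_id
WHEN MATCHED THEN UPDATE SET category = s.category
WHEN NOT MATCHED THEN INSERT;
```

The catalog takes care of writing the resulting changes to object storage, so the task stays a plain SQL statement.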

And of course Shaper is not the only tool that can access DuckLake. For more complex setups you can introduce additional tools for your data pipelines, such as dbt and Dagster.

Once we have data in our data lake we can now query it.

Let’s build a first Shaper dashboard on top of DuckLake:

  1. In the Shaper UI click on New and select Dashboard at the top.
  2. Copy and paste the following SQL:

     ```sql
     SELECT count(*) AS "Total Sessions"
     FROM ducklake.sessions;

     SELECT 'Sessions per Week'::LABEL;
     SELECT
       date_trunc('week', created_at)::XAXIS,
       category::CATEGORY,
       count(*)::BARCHART_STACKED,
     FROM ducklake.sessions GROUP BY ALL ORDER BY ALL;
     ```
  3. Click on “Create” at the top right and give the dashboard a name such as “Sessions Dashboard”.

Congrats, you just built your first dashboard on top of your data lake!

You can learn more about building dashboards in Shaper in the Getting Started Guide.

You now have a minimal, but complete data lake platform that you can build interesting data products on top of.

But so far this is only running locally on your computer. It’s time to deploy the setup to a production infrastructure.

The cheapest way to get started is to simply run all services on a single server, almost identical to how we did above.

But you wouldn’t need a data lake if you have so little data that it fits on a single machine.

Instead, you want to make use of an existing object storage service which almost every hosting provider offers.

If you are looking for a more affordable hosting provider or prefer a European provider over the big American cloud providers, I recommend you look into Hetzner (referral to get free 20€).

You get a VM for 3.49€ per month and 1 TB of object storage for 4.99€ per month, hosted in Germany.

Just be aware that there are some gotchas with Hetzner’s object storage. To make sure your setup is properly secured, check out this detailed guide by my friend Floyd.


And if you are looking for fully-managed Shaper connected to your data lake and integrated into your systems, check out Taleshape’s Managed Hosting and Support Plans.

Analytics Dashboards as Code

Shaper has always been about building dashboards using SQL.

And it’s just natural to manage SQL as files just like any other code.

With the latest release of Shaper you can now deploy dashboards from files and live-preview dashboard changes.

You get to:

  • use your favorite editor - including its AI features
  • track dashboards in Git
  • collaborate with your team using pull requests
  • deploy dashboards via CI/CD

The workflow in action:


Getting started and sharing Shaper projects is now simpler than ever since everything is just files. Let’s give it a try:

  1. Clone this demo Git repository:

     ```shell
     git clone git@github.com:taleshape-com/demo-project.git
     ```
  2. Install Shaper in the project directory:

     ```shell
     make install
     ```
  3. Start a local Shaper server to serve the dashboards:

     ```shell
     make serve
     ```
  4. Deploy the two dashboards in the dashboards/ folder by running the deploy command in a second terminal:

     ```shell
     make deploy
     ```
  5. Live-preview changes by running the dev file watcher in the second terminal:

     ```shell
     make dev
     ```

    Now edit or create any SQL file in the dashboards/ folder and see the changes live in your browser.

Usually you will run Shaper on a server and only need shaper dev locally when developing dashboards. Once you have configured authentication for your Shaper instance, the dev command automatically prompts you to log in. And instead of running deploy manually, you can use the Shaper Github Action to deploy dashboards automatically in your CI/CD pipeline.

All details are in the documentation.

And as always - we are happy to hear about any feedback via socials or Github issues or discussions!

Getting Started Building a Data Platform

Ever wonder what a data platform is and if your company needs one?

If the idea of hiring a data team to build and manage an enterprise data platform feels overwhelming, you’re not alone.

Let me break down how you can get started from zero and build up data capabilities at your company, one step at a time.

Nowadays, every company is a data company. From marketing and sales to product usage and customer support, all aspects of your business generate data.

And that data is waiting to be activated. Turn it into reports to drive decisions. Build dashboards to support operations. Offer new product features and services to your customers that are directly driven by data and automation.

Getting started is much more a cultural challenge than a technical one.

Start small. Make sure you see first successes by doing the work manually without worrying about big investments in making it scalable.

But once you see concrete value and it becomes painfully clear that technology is holding you back, you know it’s time to build out your data capabilities.

Where do you go from here? Can you buy an off-the-shelf solution? Do you hire a data engineer? Do you need a dedicated data team or can your existing engineers handle it?

You can break down data infrastructure into four main layers:

  1. Ingestion: Connect data sources, extract data and load it into a central repository
  2. Storage: Store data in a structured format that is optimized for analytical workloads
  3. Transformation: Clean, enrich, and transform data to make it practical to work with and ensure consistent definitions of key metrics across the organization
  4. Analytics: Create dashboards, reports, and alerts to share insights internally and with your customers and partners

There are many different tools to address all of these layers.

Focus on solving concrete problems you are experiencing and add new tools only when they directly solve a problem.

Start with querying your data directly where it is. Introduce tools to load data into a central repository only when the complexity and volume make this impractical.

You don’t need to address all data sources at once. Focus on the ones creating problems. Accept manual workarounds when practical.

Your data tooling should be able to query data across different data sources. You don’t need to worry about ingestion if directly querying a Postgres database and a Google Sheet gets the job done.

Chances are you already store your data in a database such as PostgreSQL or MySQL. If you’re not having performance issues, there’s no need to introduce a separate database for analytical workloads.

Only if performance or cost becomes an issue should you start addressing it.

Storage is a critical component since it’s where the actual data lives. Data outlives applications built on top of it. Pick an established and open standard to store data.

Keep in mind that there is no one-size-fits-all solution. You might need multiple data stores optimized for different use cases. You’ll know what you’re looking for when you act on concrete problems instead of trying to find a solution for hypothetical future problems.

Start delivering value before adding a separate data transformation step. Introduce a dedicated data transformation layer when queries start taking too long, or metrics become unreliable and hard to maintain because the same logic is repeated in many places.

A few materialized views in your database can take you a long way.
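For instance, in PostgreSQL you can precompute a rollup once and refresh it on a schedule. The table and column names below are illustrative:

```sql
-- PostgreSQL: precompute a daily revenue rollup instead of repeating
-- the aggregation in every dashboard query (illustrative names).
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT
  date_trunc('day', created_at) AS day,
  sum(amount) AS revenue
FROM orders
GROUP BY 1;

-- Refresh periodically, e.g. from a cron job:
REFRESH MATERIALIZED VIEW daily_revenue;
```

Dashboards then query the small precomputed view instead of scanning the raw table on every load.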

You’ll know it’s time to look into real-time stream processing, data lineage and orchestration tools once you experience the issues that these tools are designed to solve.

Many software companies start out by building custom analytics features. As you use data to drive operations and user-facing functionality, building custom solutions for every new workflow and view on the data becomes slow and expensive.

Introduce a data visualization tool to quickly build analytics dashboards and reports. This is a great first step and enables a single data analyst to deliver a lot of value without introducing any other data infrastructure.

I built Shaper to help companies in exactly this situation.

Shaper is a simple interface on top of DuckDB that allows you to build analytics dashboards and automate data workflows using only SQL.

Thanks to DuckDB, it’s easy to query data across various sources ranging from databases to CSV files and Google Sheets.
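As a sketch of what cross-source querying looks like, DuckDB can attach a Postgres database and join it with a local CSV file in a single query. The connection string, file name, and column names here are placeholders:

```sql
-- Join live Postgres data with a CSV file in one DuckDB query.
INSTALL postgres;
LOAD postgres;

-- Placeholder connection string; READ_ONLY keeps the source safe.
ATTACH 'host=localhost dbname=app user=readonly' AS app (TYPE postgres, READ_ONLY);

SELECT
  u.plan,
  count(*) AS signups
FROM app.public.users AS u
JOIN 'signups.csv' AS s ON s.user_id = u.id
GROUP BY u.plan;
```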

You can go a long way before having to add more layers to your data stack.

Give it a try and let me know what you think.

Build Your Own Bluesky Analytics Dashboard

Are you using Bluesky and want to stay on top of what’s happening? Are you curious how you can use Shaper to pull data from APIs and build interactive dashboards, all in a single tool and using just SQL?

Let’s automate pulling posts data from the Bluesky API to track topics we are interested in, and then create a data dashboard that visualizes activity around these topics.

You will get a dashboard that looks like this:

(Dashboard screenshot)
  1. Let’s open a terminal, create a new directory, and change into it:

     ```shell
     mkdir bluesky-dashboard && cd bluesky-dashboard
     ```
  2. You will need credentials to authenticate with the Bluesky API.
     Create a Bluesky App Password and save it together with your handle as bluesky_credentials.json:

     ```shell
     echo '{ "identifier": "", "password": "" }' > bluesky_credentials.json
     ```
  3. Now let’s run Shaper. The easiest way is to run it via Docker or NPM:

     ```shell
     docker run --rm -it -p5454:5454 -v ./bluesky_credentials.json:/bluesky_credentials.json -v ./data:/data taleshape/shaper
     ```
  4. Open http://localhost:5454 in your browser and click on New.
    Now let’s create a Task to fetch posts from Bluesky and store them in a database table. Select Task in the dropdown at the top of the page and paste in the following SQL code:

    ```sql
    SELECT (date_trunc('hour', now()) + INTERVAL '1h')::SCHEDULE;

    INSTALL http_client FROM community;
    LOAD http_client;

    CREATE SCHEMA IF NOT EXISTS bsky;
    CREATE TABLE IF NOT EXISTS bsky.posts (
      topic VARCHAR,
      created_at TIMESTAMP,
      cid VARCHAR,
      author_handle VARCHAR,
      url VARCHAR,
      text VARCHAR,
      like_count INT,
      reply_count INT,
      quote_count INT,
      repost_count INT,
      loaded_at TIMESTAMP DEFAULT now(),
    );

    SET VARIABLE access_jwt = http_post(
      'https://bsky.social/xrpc/com.atproto.server.createSession',
      headers => MAP {
        'Content-Type': 'application/json',
        'Accept': 'application/json',
      },
      body => (SELECT c FROM './bluesky_credentials.json' c)
    ) ->> 'body' ->> 'accessJwt';

    WITH topics AS (
      SELECT col0 AS topic, col1 AS query_string FROM (
        VALUES
          ('DuckDB', 'duckdb'),
          ('Data Engineering', '"data-engineering" "data engineering" "dataengineering"'),
          ('#databs', '#databs'),
      )
    ),
    topics_with_ts AS (
      SELECT
        topic,
        query_string,
        coalesce(max(loaded_at), (now() - INTERVAL '30 days')::TIMESTAMP) AS last_loaded_at,
      FROM topics LEFT JOIN bsky.posts USING(topic)
      GROUP BY ALL
    ),
    json_posts AS (
      SELECT
        topic,
        (http_get(
          'https://bsky.social/xrpc/app.bsky.feed.searchPosts',
          headers => MAP {
            'Accept': 'application/json',
            'Authorization': concat('Bearer ', getvariable('access_jwt')),
          },
          params => MAP {
            'q': query_string,
            'limit': '100',
            'since': strftime(last_loaded_at, '%Y-%m-%dT%H:%M:%SZ'),
          }
        ) ->> 'body' -> '$.posts[*]').unnest() AS p
      FROM topics_with_ts
    )
    INSERT INTO bsky.posts BY NAME (
      SELECT
        topic,
        (p ->> '$.record.createdAt')::TIMESTAMP AS created_at,
        p ->> 'cid' AS cid,
        p ->> '$.author.handle' AS author_handle,
        concat('https://bsky.app/profile/', author_handle, '/post/', split_part(p ->> 'uri', '/', -1)) AS url,
        p ->> '$.record.text' AS text,
        (p -> 'likeCount')::INT AS like_count,
        (p -> 'replyCount')::INT AS reply_count,
        (p -> 'quoteCount')::INT AS quote_count,
        (p -> 'repostCount')::INT AS repost_count,
      FROM json_posts
    );
    ```
    The task is configured to run every hour and fetch new posts for the topics “DuckDB”, “Data Engineering”, and “#databs”.
    Replace the topics with your own topics.
    Then click Run to try out the task. If the task runs successfully, click Create and save it as Fetch Bluesky Posts.
  5. With the first data loaded, we can now create a dashboard to visualize the data.
    Click on New again and paste in the following SQL code:

    ```sql
    SELECT 'Bluesky Analytics'::SECTION;

    SELECT
      min(created_at)::DATE::DATEPICKER_FROM AS start_date,
      max(created_at)::DATE::DATEPICKER_TO AS end_date,
    FROM bsky.posts;

    SELECT 'Topics'::LABEL;
    SELECT distinct topic::DROPDOWN_MULTI AS topics FROM bsky.posts;

    CREATE TEMP VIEW posts AS (
      FROM bsky.posts
      WHERE topic in getvariable('topics')
        AND created_at BETWEEN getvariable('start_date')
        AND getvariable('end_date')
    );

    SELECT concat('bluesky_posts_', today())::DOWNLOAD_CSV AS CSV;
    SELECT * FROM posts;

    SELECT count(distinct cid) AS 'Total Posts Overall' FROM posts;

    SELECT
      count() AS 'Total Posts',
      topic AS Topic,
    FROM posts GROUP BY topic ORDER BY ALL DESC;

    SELECT 'Posts per Day'::LABEL;
    SELECT
      topic::CATEGORY,
      date_trunc('day', created_at)::XAXIS,
      count()::BARCHART_STACKED,
    FROM posts GROUP BY ALL ORDER BY ALL;

    SELECT ''::SECTION;

    SELECT 'Top Posters'::LABEL;
    FROM (
      SELECT
        count(distinct cid)::BARCHART AS "Total Posts",
        author_handle::YAXIS,
      FROM posts GROUP BY ALL ORDER BY ALL DESC LIMIT 10
    ) ORDER BY ALL;

    SELECT 'Likes By Time of Day'::LABEL;
    SELECT
      topic::CATEGORY,
      date_trunc('hour', created_at)::TIME::XAXIS,
      sum(like_count)::BARCHART_STACKED,
    FROM posts GROUP BY ALL ORDER BY ALL;
    ```
    Now click Create and save the dashboard as Bluesky Analytics.
  6. Click on View Dashboard in the top right corner to have a better look at the whole dashboard.

And you are done! Please reach out with questions; I would love to see what you build.

Shaper is open source and free to use. It’s simple to run on your own server, so you can easily share dashboards with others. Find out more on Github:

Turn Your DuckDB Projects Into Interactive Dashboards

DuckDB is awesome and it’s a great tool to explore and transform data. But DuckDB doesn’t help you visualize and share data with others.

That’s where Shaper comes in.

With Shaper you can build interactive dashboards completely in SQL.

Shaper is built on top of DuckDB and works with all your existing data and queries.

Running Shaper on your laptop is as easy as running a single command:

```shell
npx shaper
```

Then open http://localhost:5454/new in your browser:

New Dashboard view

And running Shaper on a server is just as simple.

You can then connect Shaper directly to your production data and share dashboards either with simple links or by embedding dashboards directly into your application.

It’s all open source and free to use. So why not give it a try?