Tasks and Scheduling

Tasks allow you to define arbitrary SQL scripts that run automatically at scheduled times. You can think of them as similar to cron jobs.

Tasks can also run on startup to configure DuckDB as needed.

Schedule tasks to:

Load, clean up, and transform data
Write data to remote sources
Get creative with DuckDB community extensions, for example by using the HTTP client extension to send notifications to Slack

Run tasks on startup to:

Install DuckDB extensions
Attach databases
Create views

How to Define Tasks

You can define tasks as files or via the UI.

Any file ending with .task.sql is considered a task.

To create a task via the UI, click “New” in the menu, then switch the type at the top from “Dashboard” to “Task”.

Unlike dashboards, which are restricted to read-only SQL, tasks allow any DuckDB SQL statement apart from SET and PRAGMA statements.

Schedule tasks

The first SQL statement of the task defines its schedule by returning a single value of the type SCHEDULE, which is either an INTERVAL or a TIMESTAMP.

Schedule a task that runs every 5 minutes:

SELECT INTERVAL '5 minutes'::SCHEDULE;

Run every day at 1:00 AM:

SELECT (today() + INTERVAL '25h')::SCHEDULE;

Run every week on Monday at 1:00 AM:

SELECT (date_trunc('week', now()) + INTERVAL '7days 1h')::SCHEDULE;
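As a minimal sketch of a complete scheduled task, the file below combines a schedule statement with a transformation step. The file name, the table name, and the CSV path with its columns are hypothetical and serve only as illustration.

```sql
-- Hypothetical file: refresh_daily_sales.task.sql

-- First statement: the schedule. Here the task runs every hour.
SELECT INTERVAL '1 hour'::SCHEDULE;

-- Remaining statements: the work the task performs on each run.
-- 'data/orders.csv' and its columns are assumed for illustration.
CREATE OR REPLACE TABLE daily_sales AS
SELECT
  order_date,
  sum(amount) AS total_amount
FROM read_csv('data/orders.csv')
GROUP BY order_date;
```

Using CREATE OR REPLACE keeps the task idempotent, so a repeated or retried run ends in the same state.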

Startup Tasks

To run a task on startup, use 'init'::SCHEDULE:

SELECT 'init'::SCHEDULE;

Multiple init-tasks run in alphabetical order, and top-level tasks run before tasks in folders.
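A startup task covering the typical uses listed earlier (installing an extension, attaching a database, creating a view) might look like the sketch below. The file name, the database path, and the table and column names are assumptions for illustration; only the 'init'::SCHEDULE statement comes from Shaper itself.

```sql
-- Hypothetical file: init.task.sql

-- First statement: mark this task as a startup task.
SELECT 'init'::SCHEDULE;

-- Install and load a DuckDB extension (httpfs is a core DuckDB extension).
INSTALL httpfs;
LOAD httpfs;

-- Attach an existing database file; the path is assumed for illustration.
ATTACH 'analytics.duckdb' AS analytics (READ_ONLY);

-- Create a view on top of the attached database.
-- The table analytics.orders and its created_at column are hypothetical.
CREATE OR REPLACE VIEW recent_orders AS
SELECT *
FROM analytics.orders
WHERE created_at >= now() - INTERVAL '7 days';
```

Because init-tasks run in alphabetical order, a file name that sorts early is one way to make sure such setup runs before other init-tasks that depend on it.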

Memory Mode

When the duckdb option is set to :memory:, a new database is initialized for every dashboard. In that case, not all init-tasks run: only the init-tasks in the same folder as the dashboard and in its parent folders are executed.

Non-scheduled Scripts

You can also run the SQL script directly via the UI by pressing the “Run” button. This is useful for testing and for scripts you only want to trigger manually.

If you do not want a task to run automatically, omit the ::SCHEDULE statement.
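For illustration, a manually triggered script could look like the following sketch. The file name and the staging_events table with its loaded_at column are hypothetical.

```sql
-- Hypothetical file: cleanup_staging.task.sql
-- There is no ::SCHEDULE statement, so this script is not run automatically
-- and is only executed when triggered via the "Run" button.

-- Remove staging rows older than 30 days (table and column are assumed).
DELETE FROM staging_events
WHERE loaded_at < now() - INTERVAL '30 days';
```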

Failed tasks and monitoring

If a task fails, it is retried automatically at its next scheduled run. If determining the next scheduled run time fails, however, the task is not retried automatically.

The time and status of the last task run are shown in the UI.

To make sure that tasks work correctly and run at the right times, you still need to monitor Shaper. For more on monitoring, see the Deploy Docs.

Tasks when running Shaper in a cluster of multiple nodes

Scheduled tasks and manually triggered tasks run on only one node in a cluster, whereas init-tasks run on all nodes.

When running Shaper in a cluster, don’t store data in Shaper’s built-in DuckDB.

Disable Tasks Functionality

You can disable all task functionality by setting the flag --no-tasks or the environment variable SHAPER_NO_TASKS=true. This also disables existing tasks but does not delete them, so they become available again if you later remove the flag.