Tasks and Scheduling

Tasks allow you to define arbitrary SQL scripts that can run automatically at scheduled times. You can think of them as similar to cron jobs.

Some things you can do with tasks:

Load data from remote sources
Transform data to simplify analytics queries
Prepare and cleanup old data to keep your system performant
Write data to remote sources
Get creative with DuckDB community extensions

How to Define Tasks

Click on “New” in the Menu and then switch the type at the top from “Dashboard” to “Task”.

Now you can write any SQL statements and the will be executed in order.

However, the first statement must be a statement that defines how the task is scheduled.

To schedule a task that runs every 5 minutes, use:

SELECT INTERVAL '5 minutes'::SCHEDULE;

Schedule a task to run every day at 1:00AM:

SELECT today() + INTERVAL '25h'::SCHEDULE;

Or schedule a task to run every week on Monday at 1:00AM:

SELECT date_trunc('week', now()) + INTERVAL '7days 1h'::SCHEDULE;

As you can see the query must return a single value of type SCHEDULE. And schedule can be an interval or a time stamp defining the next execution of the task.

This means you can use any logic to calculate the next execution time, including changing the time based on data that you query.

One-Off Scripts

As you might have noticed, you can also run the SQL script directly via the UI by pressing the “Run” button. This can be useful for testing and for some one-off scripts you only like to use manually once in a while.

If you do not want a task to run automatically you can set the schedule to NULL:

SELECT NULL::SCHEDULE;

Failed tasks and monitoring

If a task fails, it will be retried automatically the next time it is scheduled to run. If getting the next scheduled task run time fails, it will not be retried automatically.

The time and status of the last task run is shown in the UI.

But to make sure that tasks are working correctly and run at correct times, you need monitor Shaper. For more on monitoring see the Deploy Docs.

Tasks when running Shaper in a cluster of multiple nodes

When running Shaper in a cluster you have to decide if a task should run on all nodes or only a single node.

Tasks that load data into the system or cleanup data need to run on all nodes to keep data consistent across all nodes. And tasks that send data externally by writing data to remote systems or sending a notification must only run on a single node to avoid sending the same data multiple times.

By default tasks run only on a single node in the cluster. To run a task on all nodes, just replace SCHEDULE with SCHEDULE_ALL, for example:

SELECT INTERVAL '5 minutes'::SCHEDULE;

Disable Tasks Functionality

You can disable all tasks functionality by setting the flag --no-tasks or the environment variable SHAPER_NO_TASKS=true. This will also disable existing tasks. But it will not delete them. So if you later remove the flag, the tasks will be available again.