Your resource for web content, online publishing
and the distribution of digital products.
S M T W T F S
 
 
 
 
 
 
1
 
2
 
3
 
4
 
5
 
6
 
7
 
8
 
9
 
 
 
 
13
 
14
 
15
 
16
 
17
 
18
 
19
 
20
 
21
 
22
 
23
 
24
 
25
 
26
 
27
 
28
 
 

Ensure Page Smoothness With This DolphinScheduler Task Data Cleanup and Backup Strategy

DATE POSTED:December 13, 2024

As Apache DolphinScheduler runs for an extended period, the number of tasks continues to increase. The task data is primarily stored in the t_ds_task_instance and t_ds_process_instance tables in the database. The continuous growth of data in these two tables leads to system page lag.

\

\

Solution

To address the above issue, the measure taken is to regularly clean up data in the t_ds_process_instance and t_ds_task_instance tables that is more than one month old.

Data Backup

Before cleaning up the data, first back up the original table data to ensure data safety.

use dolphinscheduler; -- Create backup tables t_ds_process_instance_backup20241120 and t_ds_task_instance_backup20241120 CREATE TABLE t_ds_process_instance_backup20241120 LIKE t_ds_process_instance; CREATE TABLE t_ds_task_instance_backup20241120 LIKE t_ds_task_instance; -- Backup the original table data into the corresponding backup tables INSERT INTO t_ds_process_instance_backup20241120 SELECT * FROM t_ds_process_instance; INSERT INTO t_ds_task_instance_backup20241120 SELECT * FROM t_ds_task_instance; Check Backup Status

To ensure the backup operation is successful, you can check the data row count of the backup tables and the original tables.

-- Check the data row count of the backup tables SELECT COUNT(*) FROM t_ds_process_instance_backup20241120; SELECT COUNT(*) FROM t_ds_task_instance_backup20241120; -- Check the data row count of the original tables SELECT COUNT(*) FROM t_ds_process_instance; SELECT COUNT(*) FROM t_ds_task_instance; Data Cleanup

After the backup is completed and confirmed to be error-free, perform the cleanup operation to delete data before October 19, 2024, 23:59:59.

-- Delete data before October 19, 2024, 23:59:59 in the t_ds_task_instance table DELETE FROM t_ds_task_instance WHERE submit_time < '2024-10-19 23:59:59'; -- Delete data before October 19, 2024, 23:59:59 in the t_ds_process_instance table DELETE FROM t_ds_process_instance WHERE end_time < '2024-10-19 23:59:59';

\