Clean Log Data
HAP Server keeps some log data in MongoDB for a long time, so this data can grow to occupy a significant portion of the database storage space.
You can use the show dbs command in MongoDB to check the size of each database, and then use the command to calculate table size to identify tables that occupy a large amount of storage space.
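As a minimal sketch of the table-size check, the helper below lists the collections of one database with the largest first. It assumes the mongosh helpers db.getCollectionNames() and db.getCollection(name).stats() are available in your shell version; collectionSizes is a hypothetical name, not something shipped with HAP Server.

```javascript
// Sketch: report storage size per collection, largest first.
// `db` is the mongosh database handle, passed in explicitly so the
// function has no hidden dependency on the shell's globals.
function collectionSizes(db) {
  return db
    .getCollectionNames()
    .map((name) => {
      // stats().storageSize is reported in bytes
      const stats = db.getCollection(name).stats();
      return { name, storageMB: +(stats.storageSize / 1024 / 1024).toFixed(1) };
    })
    .sort((a, b) => b.storageMB - a.storageMB);
}

// In mongosh, after switching to a database (for example: use mdworkflow):
//   collectionSizes(db).slice(0, 10)
```

Running it once per database reported as large by show dbs should be enough to spot the tables worth cleaning.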
We provide a solution for cleaning log data: data in the relevant tables is physically deleted according to the rules you set. After the deletion is completed, log data for the corresponding time range is no longer displayed on the page. This applies, for example, to execution records of workflows, records of approval processes (approval processes are also within the scope of workflows), logs of row records in worksheets, and request logs in the integration center.
Whitelisted Tables for Cleaning:
Database | Table Name | Direct Drop | Table Purpose |
---|---|---|---|
mdworkflow | code_catch | Yes | Cached data generated during code block execution |
mdworkflow | hooks_catch | Yes | Temporary cache data for triggers |
mdworkflow | webhooks_catch | Yes | Cached data generated during Webhook execution |
mdworkflow | wf_instance | No | Associated data for main workflow execution history |
mdworkflow | wf_subInstanceActivity | No | Associated data for sub-workflow execution history |
mdworkflow | wf_subInstanceCallback | No | Associated data for sub-workflow execution history |
mdworkflow | app_multiple_catch | No | Data stored when "Direct access" is checked in the "Get Multiple Data" node |
mdworkflow | custom_apipackageapi_catch | No | Data returned from calling API integration interface |
mdworksheetlog | wslog* | Yes | Logs of row records for the corresponding month. The table name format is wslog+date (e.g., wslog202409) |
mdintegration | wf_instance | No | Integration center - request logs |
mdintegration | wf_instance_relation | No | Integration center - associated data for request logs |
mdintegration | webhooks_catch | No | Integration center - data corresponding to "View details" in request logs |
mdintegration | code_catch | No | Integration center - data corresponding to "View details" in request logs |
mdintegration | json_catch | No | Integration center - data corresponding to "View details" in request logs |
mdintegration | custom_parameter_catch | No | Integration center - data corresponding to "View details" in request logs |
mdservicedata | al_actionlog* | Yes | Application behavior logs for the corresponding month. The table name format is al_actionlog+date (e.g., al_actionlog202409) |
- Tables that can be dropped directly are best deleted with the db.collection.drop() command, because dropping a table immediately releases the storage space it occupies. For example, the following operation deletes the code_catch table in the mdworkflow database:

      use mdworkflow
      db.code_catch.drop()

- For tables that cannot be dropped directly, refer to the following steps to configure a cleanup task.
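For the monthly tables marked as directly droppable (wslog* and al_actionlog*), a small filter can help pick the months to drop. This is a sketch only: monthlyCollectionsBefore is a hypothetical helper, and it assumes the date suffix is always a six-digit YYYYMM value, as in the wslog202409 example.

```javascript
// Sketch: select monthly log collections (prefix + YYYYMM) older than a cutoff month.
function monthlyCollectionsBefore(names, prefix, cutoffYYYYMM) {
  // Escape regex metacharacters so the prefix is matched literally
  const escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp("^" + escaped + "(\\d{6})$");
  return names.filter((name) => {
    const match = pattern.exec(name);
    // Fixed-width YYYYMM strings compare in chronological order
    return match !== null && match[1] < cutoffYYYYMM;
  });
}

// In mongosh (review the list before dropping anything):
//   use mdworksheetlog
//   const old = monthlyCollectionsBefore(db.getCollectionNames(), "wslog", "202401");
//   old.forEach((name) => db.getCollection(name).drop());
```

The cutoff month itself is kept; only strictly earlier months are returned.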
Configure Data Cleanup Task
- Download the image (offline package download):

      docker pull registry.cn-hangzhou.aliyuncs.com/mdpublic/mingdaoyun-archivetools:1.0.3

- Create a config.json configuration file with the following example content:

      [
        {
          "id": "1",
          "text": "Description",
          "start": "2023-05-31 16:00:00",
          "end": "2023-06-30 16:00:00",
          "src": "mongodb://root:password@192.168.1.20:27017/mdworkflow?authSource=admin",
          "archive": "",
          "table": "wf_instance",
          "delete": true,
          "batchSize": 500,
          "retentionDays": 0
        },
        {
          "id": "2",
          "text": "Description",
          "start": "2023-05-31 16:00:00",
          "end": "2023-06-30 16:00:00",
          "src": "mongodb://root:password@192.168.1.30:27017/mdworkflow?authSource=admin",
          "archive": "",
          "table": "wf_subInstanceActivity",
          "delete": true,
          "batchSize": 500,
          "retentionDays": 0
        }
      ]

  - Based on the above configuration file, adjust or add configuration content to clean up the desired data tables.
  Parameter Description:

      "id": "Task identifier ID",
      "text": "Description",
      "start": "Start time of the data to archive, in UTC (ignored if retentionDays is greater than 0)",
      "end": "End time of the data to archive, in UTC (ignored if retentionDays is greater than 0)",
      "src": "Connection address of the source database",
      "archive": "Connection address of the target database (if empty, nothing is archived and data is only deleted according to the set rules)",
      "table": "Data table",
      "delete": "Fixed to true. After the archiving task completes and the record count is verified, the archived data is removed from the source database",
      "batchSize": "Number of records inserted and deleted in a single batch",
      "retentionDays": "Defaults to 0. If greater than 0, data older than X days is deleted and scheduled deletion mode is enabled; the dates specified in start and end are ignored, and the task runs every 24 hours by default"

- Start the archiving service by running the following command in the directory where the config.json file is located:

      docker run -d -it -v $(pwd)/config.json:/usr/local/MDArchiveTools/config.json -v /usr/share/zoneinfo/Etc/GMT-8:/etc/localtime registry.cn-hangzhou.aliyuncs.com/mdpublic/mingdaoyun-archivetools:1.0.3
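When several tables share the same time window, the config.json entries described above can be generated rather than copied by hand. The sketch below runs under Node.js; buildTasks and formatUtc are hypothetical helper names. It also assumes the local time zone is UTC+8, in which case a window of June 2023 in local time maps to the UTC values used in the example configuration.

```javascript
// Sketch: build config.json entries for several tables sharing one time window.

// Format a Date as "YYYY-MM-DD HH:mm:ss" in UTC, the form used by start and end.
function formatUtc(date) {
  return date.toISOString().slice(0, 19).replace("T", " ");
}

function buildTasks({ src, tables, start, end, batchSize = 500 }) {
  return tables.map((table, index) => ({
    id: String(index + 1),
    text: `Clean ${table}`,
    start: formatUtc(start),
    end: formatUtc(end),
    src,
    archive: "", // empty: delete only, no archiving
    table,
    delete: true, // fixed to true per the parameter description
    batchSize,
    retentionDays: 0, // 0: use the start/end window instead of scheduled mode
  }));
}

const tasks = buildTasks({
  src: "mongodb://root:password@192.168.1.20:27017/mdworkflow?authSource=admin",
  tables: ["wf_instance", "wf_subInstanceActivity"],
  // Local midnight in UTC+8 (assumed); converted to UTC by formatUtc
  start: new Date("2023-06-01T00:00:00+08:00"),
  end: new Date("2023-07-01T00:00:00+08:00"),
});

// Redirect this output to config.json
console.log(JSON.stringify(tasks, null, 2));
```

Writing the window with an explicit offset keeps the UTC conversion out of the hand-edited file, which is where off-by-eight-hours mistakes tend to creep in.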
Other:
- Resource Usage: While the program runs, it puts some load on the source database, the target database, and the host where the program runs. It is recommended to run it during off-peak business hours.
- Viewing Logs:
  - Running in the background (default): Use docker ps -a to find the container ID, then execute docker logs <container ID> to view the logs.
  - Running in the foreground: Remove the -d parameter, and the logs are output to the terminal in real time, making it easy to track progress.
- Scheduled Tasks:
  - Set execution interval: You can change the execution interval, in milliseconds, by customizing the ENV_ARCHIVE_INTERVAL variable. The default value is 86400000.
- Reclaim Disk Space: When data is deleted using the cleanup tool, the disk space occupied by the deleted data is not released immediately, but it is usually reused by the same table.
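Because ENV_ARCHIVE_INTERVAL is expressed in milliseconds, a small conversion avoids miscounting zeros. The following is a sketch with a hypothetical helper name, archiveIntervalMs; the default of 86400000 corresponds to 24 hours.

```javascript
// Sketch: convert an interval in hours to the millisecond value
// expected by ENV_ARCHIVE_INTERVAL.
function archiveIntervalMs(hours) {
  if (!Number.isFinite(hours) || hours <= 0) {
    throw new RangeError("hours must be a positive number");
  }
  return Math.round(hours * 60 * 60 * 1000);
}

// Example: run every 12 hours instead of the default 24
console.log(`ENV_ARCHIVE_INTERVAL=${archiveIntervalMs(12)}`); // ENV_ARCHIVE_INTERVAL=43200000
```

The resulting pair could then be supplied when starting the container, for example through the -e option of docker run, assuming the tool reads the variable from the container environment.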