
Clean Log Data

In HAP Server, some log data is retained in MongoDB for a long time. As it accumulates, this data can take up a significant portion of the database's storage space.

You can run the show dbs command in the MongoDB shell to check the size of each database. Then check the size of each table (collection), for example with db.collection.stats(), to identify the tables that occupy the most storage space.
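The per-table check can be scripted. Below is a minimal mongosh sketch. It assumes you are connected with mongosh and have switched to the database in question (for example, use mdworkflow), and collectionSizes is a hypothetical helper name. It reads each table's statistics through db.getCollection(name).stats() and sorts the tables by storageSize, the space allocated on disk.

```javascript
// Hypothetical helper: list the tables (collections) of a database,
// largest on-disk footprint first. The database handle is passed in so the
// function works with mongosh's global `db` or any object with the same methods.
function collectionSizes(database) {
  return database
    .getCollectionNames()
    .map((name) => {
      const stats = database.getCollection(name).stats();
      return {
        name,
        count: stats.count, // number of documents
        sizeMB: stats.size / 1024 / 1024, // uncompressed data size
        storageMB: stats.storageSize / 1024 / 1024, // space allocated on disk
      };
    })
    .sort((a, b) => b.storageMB - a.storageMB);
}

// Only runs inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  collectionSizes(db).forEach((c) =>
    print(`${c.name}\t${c.count} docs\t${c.storageMB.toFixed(1)} MB on disk`)
  );
}
```

Run it once for each database listed by show dbs. The tables at the top of the output are the ones to compare against the whitelist below.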

We provide a solution for cleaning log data: based on the rules you configure, data in the relevant tables is physically deleted. Once deletion is complete, log data for the affected period is no longer displayed on the page. This covers, for example, workflow execution records, approval process records (approval processes are also workflows), row record logs in worksheets, and request logs in the Integration Center.

Whitelisted Tables for Cleaning:

| Database | Table Name | Can Be Dropped Directly | Table Purpose |
|---|---|---|---|
| mdworkflow | code_catch | Yes | Cached data generated during code block execution |
| mdworkflow | hooks_catch | Yes | Temporary cache data for triggers |
| mdworkflow | webhooks_catch | Yes | Cached data generated during Webhook execution |
| mdworkflow | wf_instance | No | Associated data for main workflow execution history |
| mdworkflow | wf_subInstanceActivity | No | Associated data for sub-workflow execution history |
| mdworkflow | wf_subInstanceCallback | No | Associated data for sub-workflow execution history |
| mdworkflow | app_multiple_catch | No | Data stored when "Direct access" is checked in the "Get Multiple Data" node |
| mdworkflow | custom_apipackageapi_catch | No | Data returned from calls to API integration interfaces |
| mdworksheetlog | wslog* | Yes | Row record logs for the corresponding month. Table name format: wslog + date (e.g., wslog202409) |
| mdintegration | wf_instance | No | Integration Center - request logs |
| mdintegration | wf_instance_relation | No | Integration Center - associated data for request logs |
| mdintegration | webhooks_catch | No | Integration Center - data shown under "View details" in request logs |
| mdintegration | code_catch | No | Integration Center - data shown under "View details" in request logs |
| mdintegration | json_catch | No | Integration Center - data shown under "View details" in request logs |
| mdintegration | custom_parameter_catch | No | Integration Center - data shown under "View details" in request logs |
| mdservicedata | al_actionlog* | Yes | Application behavior logs for the corresponding month. Table name format: al_actionlog + date (e.g., al_actionlog202409) |
| mdservicedata | al_uselog | No | Logs for storage usage analysis |
  • For tables that can be dropped directly, we recommend deleting them with the db.collection.drop() command, because dropping a table immediately releases the storage space it occupies.

    For example, the following commands drop the code_catch table in the mdworkflow database:

    use mdworkflow
    db.code_catch.drop()
  • For tables that cannot be dropped directly, follow the steps below to configure a cleanup task.
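The wslog* and al_actionlog* tables are split by month, so dropping old months comes down to picking the table names whose date suffix falls before a cutoff. Below is a minimal mongosh sketch. It assumes the suffix is always six digits (YYYYMM), as in wslog202409. The helper name monthlyTablesBefore and the cutoff 202401 are hypothetical. The script only prints drop commands, so you can review the list before running anything.

```javascript
// Hypothetical helper: return the monthly tables (e.g. wslog202409) whose
// YYYYMM suffix is strictly earlier than `cutoff` (also YYYYMM, e.g. "202401").
function monthlyTablesBefore(names, prefix, cutoff) {
  const pattern = new RegExp(`^${prefix}(\\d{6})$`);
  return names
    .map((name) => ({ name, match: pattern.exec(name) }))
    // Plain string comparison is safe because YYYYMM is fixed-width.
    .filter(({ match }) => match !== null && match[1] < cutoff)
    .map(({ name }) => name)
    .sort();
}

// Inside mongosh (after `use mdworksheetlog`), print drop commands for review.
if (typeof db !== "undefined") {
  monthlyTablesBefore(db.getCollectionNames(), "wslog", "202401").forEach((name) =>
    print(`db.getCollection("${name}").drop()`)
  );
}
```

Paste the printed commands back into mongosh once you have confirmed the list. Using the prefix al_actionlog after use mdservicedata does the same for the monthly application behavior logs.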

Configure Data Cleanup Task

  1. Download the image (offline package download)

    docker pull registry.cn-hangzhou.aliyuncs.com/mdpublic/mingdaoyun-archivetools:1.0.4
  2. Create a config.json configuration file with the following example content:

    [
      {
        "id": "1",
        "text": "Description",
        "start": "2023-05-31 16:00:00",
        "end": "2023-06-30 16:00:00",
        "src": "mongodb://root:password@192.168.1.20:27017/mdworkflow?authSource=admin",
        "archive": "",
        "table": "wf_instance",
        "delete": true,
        "batchSize": 500,
        "retentionDays": 0
      },
      {
        "id": "2",
        "text": "Description",
        "start": "2023-05-31 16:00:00",
        "end": "2023-06-30 16:00:00",
        "src": "mongodb://root:password@192.168.1.30:27017/mdworkflow?authSource=admin",
        "archive": "",
        "table": "wf_subInstanceActivity",
        "delete": true,
        "batchSize": 500,
        "retentionDays": 0
      }
    ]
    • Adjust or add entries following the format above to clean the tables you need.
    • Note: Times in the configuration file are in Coordinated Universal Time (UTC). For example:
      • UTC: 2023-05-31 16:00:00
        • Converted to UTC+8 (East 8th Zone) time: 2023-06-01 00:00:00 (2023-05-31 16:00:00 + 8 hours)
      • UTC: 2023-06-30 16:00:00
        • Converted to UTC+8 (East 8th Zone) time: 2023-07-01 00:00:00 (2023-06-30 16:00:00 + 8 hours)

    Parameter Description:

    "id": "Task ID",
    "text": "Description",
    "start": "Start time of the data to archive or delete, in UTC. Data at or after this time is deleted. Ignored when retentionDays is greater than 0.",
    "end": "End time of the data to archive or delete, in UTC. Data before this time is deleted. Ignored when retentionDays is greater than 0.",
    "src": "Connection string of the source database",
    "archive": "Connection string of the target (archive) database. If empty, no archiving is performed and data is only deleted according to the configured rules.",
    "table": "Name of the table to clean",
    "delete": "Fixed to true. After the archiving task completes and the record count is verified, the archived data is deleted from the source database.",
    "batchSize": "Number of records archived and deleted per batch",
    "retentionDays": "Defaults to 0. If greater than 0, data older than this many days is deleted and scheduled deletion mode is enabled: start and end are ignored, and the task runs every 24 hours by default."
  3. Start the archiving service by running the following command in the directory containing config.json:

    docker run -d -it -v $(pwd)/config.json:/usr/local/MDArchiveTools/config.json  -v /usr/share/zoneinfo/Etc/GMT-8:/etc/localtime registry.cn-hangzhou.aliyuncs.com/mdpublic/mingdaoyun-archivetools:1.0.4

    Other:

    • Resource Usage: While running, the program puts some load on the source database, the target database, and the host it runs on. We recommend running it during off-peak business hours.

    • Viewing Logs:

      • Running in the background (default): Use docker ps -a to find the container ID, then run docker logs <container ID> to view the logs.

      • Running in the foreground: Remove the -d parameter to print logs to the terminal in real time, which makes it easy to track progress.

    • Scheduled Tasks:

      • Set the execution interval: Customize the ENV_ARCHIVE_INTERVAL variable to change the interval between runs, in milliseconds. The default is 86400000 (24 hours).
    • Reclaim Disk Space: Disk space occupied by data deleted with the cleanup tool is not released immediately. It is usually reused for new data in the same table.
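Because start and end in config.json are UTC while business dates are often thought of in UTC+8, it can help to generate the entries rather than convert times by hand. Below is a small Node.js sketch under these assumptions. The names utc8ToUtc, buildEntries, retentionCutoffUtc and DEFAULT_ARCHIVE_INTERVAL_MS are hypothetical. The field names and values follow the example config.json. retentionCutoffUtc assumes the tool deletes everything older than "now minus retentionDays days", which is how the parameter description reads but is not confirmed.

```javascript
// Sketch: build cleanup-task entries for config.json (hypothetical helpers).
const HOUR_MS = 60 * 60 * 1000;
const UTC8_OFFSET_MS = 8 * HOUR_MS; // UTC+8 (East 8th Zone)

// Default of ENV_ARCHIVE_INTERVAL: 24 hours in milliseconds (86400000).
const DEFAULT_ARCHIVE_INTERVAL_MS = 24 * HOUR_MS;

const pad = (n) => String(n).padStart(2, "0");

// Convert a UTC+8 wall-clock time "YYYY-MM-DD HH:mm:ss" to the UTC string the
// configuration file expects, e.g. "2023-06-01 00:00:00" -> "2023-05-31 16:00:00".
function utc8ToUtc(local) {
  const [datePart, timePart] = local.split(" ");
  const [y, mo, d] = datePart.split("-").map(Number);
  const [h, mi, s] = timePart.split(":").map(Number);
  // Read the fields as if they were UTC, then subtract the 8-hour offset.
  const t = new Date(Date.UTC(y, mo - 1, d, h, mi, s) - UTC8_OFFSET_MS);
  return (
    `${t.getUTCFullYear()}-${pad(t.getUTCMonth() + 1)}-${pad(t.getUTCDate())} ` +
    `${pad(t.getUTCHours())}:${pad(t.getUTCMinutes())}:${pad(t.getUTCSeconds())}`
  );
}

// One entry per table; the deleted window is [start, end) in UTC.
function buildEntries(src, tables, localStart, localEnd) {
  return tables.map((table, i) => ({
    id: String(i + 1),
    text: `Clean ${table}`,
    start: utc8ToUtc(localStart), // data at or after this time is deleted
    end: utc8ToUtc(localEnd), // data before this time is deleted
    src,
    archive: "", // empty: delete only, no archiving
    table,
    delete: true,
    batchSize: 500,
    retentionDays: 0, // 0: use start/end instead of scheduled mode
  }));
}

// Assumption: with retentionDays > 0, data older than this instant is deleted.
function retentionCutoffUtc(retentionDays, now = new Date()) {
  return new Date(now.getTime() - retentionDays * 24 * HOUR_MS);
}
```

For example, buildEntries("mongodb://root:password@192.168.1.20:27017/mdworkflow?authSource=admin", ["wf_instance", "wf_subInstanceActivity"], "2023-06-01 00:00:00", "2023-07-01 00:00:00") produces entries with start 2023-05-31 16:00:00 and end 2023-06-30 16:00:00, the same window as the example configuration. Write the result with JSON.stringify(entries, null, 2) to produce config.json.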
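If the freed space is needed elsewhere after a large cleanup, one option outside the cleanup tool is MongoDB's compact command, which asks the storage engine to release unused space in a table. Below is a minimal mongosh sketch. It assumes you are connected to the database that was cleaned (for example, use mdworkflow), and compactTables is a hypothetical helper name. compact's locking behavior and replica-set requirements differ between MongoDB versions, so check the documentation for your version and run it during off-peak hours.

```javascript
// Hypothetical helper: run MongoDB's compact command on each cleaned table and
// collect the outcome. A failure on one table does not stop the others.
function compactTables(database, tables) {
  return tables.map((table) => {
    try {
      const res = database.runCommand({ compact: table });
      return { table, ok: res.ok === 1, detail: JSON.stringify(res) };
    } catch (err) {
      // mongosh throws when the server rejects a command.
      return { table, ok: false, detail: String(err) };
    }
  });
}

// Only runs inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  compactTables(db, ["wf_instance", "wf_subInstanceActivity"]).forEach((r) =>
    print(`${r.table}: ok=${r.ok} ${r.detail}`)
  );
}
```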