How to independently deploy the data pipeline service

The data pipeline is an extension module of the HAP system that users can choose to enable separately (see "Enable data pipeline").

Quick deployment runs the data pipeline service on the same server as the HAP microservices, which places high demands on hardware resources. If a single server cannot meet these requirements, follow this article to deploy the data pipeline service independently on a new server (see "More details on server configuration").

Install Docker

To install Docker, follow the official installation instructions for your Linux distribution, or see the Docker installation section in the deployment examples.

Microservices Adjustment

The data pipeline service depends on the file storage and Kafka components. In standalone mode, the access points of these two components must therefore be published from the sc service.

If your HAP Server environment is in cluster mode, no adjustment is needed; configure the data pipeline service to connect directly to the file storage and Kafka components.

In standalone mode, edit the docker-compose.yaml file to add the environment variable and port mappings shown below.

app:
  environment:
    ENV_FLINK_URL: http://192.168.10.30:58081 # Added: address of the Flink data pipeline service; change it to the actual IP address

sc:
  ports:
    - 9000:9000
    - 9092:9092

docker-compose.yaml Configuration File Example

version: '3'

services:
  app:
    image: registry.cn-hangzhou.aliyuncs.com/mdpublic/mingdaoyun-community:6.0.1
    environment: &app-environment
      ENV_ADDRESS_MAIN: "https://hap.domain.com"
      ENV_APP_VERSION: "6.0.1"
      ENV_API_TOKEN: "******"
      ENV_FLINK_URL: http://192.168.10.30:58081 # Added: address of the Flink data pipeline service; change it to the actual IP address
    ports:
      - 8880:8880
    volumes:
      - ./volume/data/:/data/
      - ../data:/data/mingdao/data

  sc:
    image: registry.cn-hangzhou.aliyuncs.com/mdpublic/mingdaoyun-sc:3.0.0
    environment:
      <<: *app-environment
    volumes:
      - ./volume/data/:/data/
    ports:
      - 9000:9000 # Added
      - 9092:9092 # Added

After modifications, execute bash service.sh restartall in the manager directory to restart the microservices.
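
Before deploying the data pipeline service, it may help to confirm from the new server that the two published ports are reachable. The following is a minimal sketch, assuming bash with /dev/tcp support and the coreutils timeout command are available on that server; check_port is a hypothetical helper name, and the HAP server IP is passed as the first argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if a TCP connection to host:port
# can be opened within 3 seconds, non-zero otherwise.
check_port() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Run on the data pipeline server as: bash check_ports.sh <HAP server IP>
# (check_ports.sh is an assumed file name for this sketch)
if [ -n "${1:-}" ]; then
  for port in 9000 9092; do
    if check_port "$1" "$port"; then
      echo "port ${port} on $1 is reachable"
    else
      echo "port ${port} on $1 is NOT reachable"
    fi
  done
fi
```

If either port is reported as not reachable, check the port mappings of the sc service and any firewall between the two servers.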

Deploy Data Pipeline Service

  1. Initialize the swarm environment

    docker swarm init
  2. Create a directory

    mkdir -p /data/mingdao/script/volume/data
  3. Create a configuration file

    cat > /data/mingdao/script/flink.yaml <<EOF
    version: '3'
    services:
      flink:
        image: registry.cn-hangzhou.aliyuncs.com/mdpublic/mingdaoyun-flink:1.17.1.530
        entrypoint: ["/bin/bash"]
        command: ["/run.sh"]
        environment:
          ENV_FLINK_S3_ACCESSKEY: "mdstorage"
          ENV_FLINK_S3_SECRETKEY: "eBxExGQJNhGosgv5FQJiVNqH"
          ENV_FLINK_S3_SSL: "false"
          ENV_FLINK_S3_PATH_STYLE_ACCESS: "true"
          ENV_FLINK_S3_ENDPOINT: "sc:9000" # Host name: use "app" for versions before 5.1.0, "sc" for 5.1.0 and later
          ENV_FLINK_S3_BUCKET: "mdoc"
          ENV_FLINK_LOG_LEVEL: "INFO"
          ENV_FLINK_JOBMANAGER_MEMORY: "2000m"
          ENV_FLINK_TASKMANAGER_MEMORY: "10000m"
          ENV_FLINK_TASKMANAGER_SLOTS: "50"
          ENV_KAFKA_ENDPOINTS: "sc:9092" # Host name: use "app" for versions before 5.1.0, "sc" for 5.1.0 and later; if Kafka is an external component, use the actual IP of Kafka
        ports:
          - 58081:8081
        volumes:
          - ./volume/data/:/data/
        extra_hosts:
          - "sc:192.168.10.28" # Host resolution for the sc service (matches the "sc" host name used in ENV_FLINK_S3_ENDPOINT and ENV_KAFKA_ENDPOINTS); change it to the actual IP address
          #- "app:192.168.10.28" # Host resolution for the app service; use this line instead when the endpoints above use the "app" host name (versions before 5.1.0); change it to the actual IP address
    EOF
  4. Configure the startup script

    cat > /data/mingdao/script/startflink.sh <<-EOF
    docker stack deploy -c /data/mingdao/script/flink.yaml flink
    EOF
    chmod +x /data/mingdao/script/startflink.sh
  5. Start the data pipeline service

    bash /data/mingdao/script/startflink.sh
    • After startup, the data pipeline service container takes about 5 minutes to become fully ready.
    • Stop command: docker stack rm flink
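
Because startup takes several minutes, a polling loop can stand in for watching the clock. The sketch below assumes curl is installed and that the container serves the standard Flink REST API (including its /overview endpoint) on the published port 58081; retry is a hypothetical helper name, and 30 attempts at 10-second intervals are arbitrary values that cover roughly 5 minutes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to <attempts> times, sleeping
# <delay> seconds between tries. Returns 0 as soon as the command
# succeeds, 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Sleep only between attempts, not after the last one
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example, run on the data pipeline server after startflink.sh:
#   docker stack services flink   # shows the replica status of the stack's services
#   retry 30 10 curl -fsS -o /dev/null http://127.0.0.1:58081/overview \
#     && echo "Flink is ready" || echo "Flink did not come up in time"
```

If the check times out, the service logs (docker service logs flink_flink) are a reasonable next place to look.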

Other Considerations

The data pipeline service needs to create two directories, checkpoints and recovery, under the bucket of the file storage service to store relevant data.

If external object storage is enabled, the file storage will switch to S3 mode, causing issues with the data pipeline, as the data pipeline service currently does not support direct use of the S3 protocol for object storage.

Therefore, if external object storage is enabled, a new file storage service needs to be deployed for the data pipeline service.

Deploy File Storage Service

  1. Create file-flink.yaml

    version: '3'
    services:
      file-flink:
        image: registry.cn-hangzhou.aliyuncs.com/mdpublic/mingdaoyun-file:1.6.0
        volumes:
          - /usr/share/zoneinfo/Etc/GMT-8:/etc/localtime
          - ./volume/data/file-flink/volume:/data/storage
        environment:
          MINIO_ACCESS_KEY: storage
          MINIO_SECRET_KEY: ITwWPDGvSLxxxxxxM46XiSEmEdF4 # customize the authentication key
        command: ["./main", "server", "/data/storage/data"]
  2. Pull the image for the file service

    docker pull registry.cn-hangzhou.aliyuncs.com/mdpublic/mingdaoyun-file:1.6.0
  3. Create persistent storage directories for the file-flink service

    mkdir -p /data/mingdao/script/volume/data/file-flink/volume
  4. Start the file-flink file storage service

    docker stack deploy -c file-flink.yaml file-flink
  5. Enter the file-flink container to create the required bucket

    docker exec -it xxx bash
    • Replace xxx with the container id of file-flink
  6. Create buckets

    # mc command configuration
    mc config host add file-flink http://127.0.0.1:9000 storage ITwWPDGvSLxxxxxxM46XiSEmEdF4 # modify it to your custom authentication key

    # Create the required bucket: mdoc
    mc mb file-flink/mdoc
  7. Update the following variables in the data pipeline service configuration so that it connects to the file-flink service

    ENV_FLINK_S3_ACCESSKEY: "storage"
    ENV_FLINK_S3_SECRETKEY: "ITwWPDGvSLxxxxxxM46XiSEmEdF4" # modify it to your custom authentication key
    ENV_FLINK_S3_ENDPOINT: "192.168.10.30:9000" # replace with the actual IP of the file-flink service
    ENV_FLINK_S3_BUCKET: "mdoc"
  8. Restart the flink service

    docker stack rm flink
    sleep 30
    bash /data/mingdao/script/startflink.sh
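
The fixed sleep 30 gives Docker time to tear down the old containers before redeploying. As an alternative, the following sketch polls until no container of the stack remains. It assumes the containers carry the com.docker.stack.namespace label that docker stack deploy sets; wait_for_stack_gone is a hypothetical helper name, and the attempt count and delay are arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until no container labelled as part of <stack> is left.
# Returns 0 once none remain, 1 if containers are still present after <attempts> checks.
wait_for_stack_gone() {
  local stack="$1" attempts="$2" delay="$3"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -a includes stopped containers; -q prints only IDs, so empty output means none are left
    if [ -z "$(docker ps -aq --filter "label=com.docker.stack.namespace=${stack}")" ]; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Example restart sequence:
#   docker stack rm flink
#   wait_for_stack_gone flink 30 2 && bash /data/mingdao/script/startflink.sh
```

Filtering on the label rather than on a name substring avoids also matching containers of the file-flink stack on the same host.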