diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md new file mode 100644 index 0000000000..6235a6ce84 --- /dev/null +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -0,0 +1,32 @@ +## *Tips* +- *Thanks very much for contributing to Apache DolphinScheduler.* +- *Please review https://dolphinscheduler.apache.org/en-us/community/index.html before opening a pull request.* + +## What is the purpose of the pull request + +*(For example: This pull request adds checkstyle plugin.)* + +## Brief change log + +*(for example:)* + - *Add maven-checkstyle-plugin to root pom.xml* + +## Verify this pull request + +*(Please pick either of the following options)* + +This pull request is code cleanup without any test coverage. + +*(or)* + +This pull request is already covered by existing tests, such as *(please describe tests)*. + +(or) + +This change added tests and can be verified as follows: + +*(example:)* + + - *Added dolphinscheduler-dao tests for end-to-end.* + - *Added CronUtilsTest to verify the change.* + - *Manually verified the change by testing locally.* diff --git a/.github/workflows/ci_backend.yml b/.github/workflows/ci_backend.yml new file mode 100644 index 0000000000..e527c3c4a2 --- /dev/null +++ b/.github/workflows/ci_backend.yml @@ -0,0 +1,64 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +name: Backend + +on: + push: + paths: + - '.github/workflows/ci_backend.yml' + - 'package.xml' + - 'pom.xml' + - 'dolphinscheduler-alert/**' + - 'dolphinscheduler-api/**' + - 'dolphinscheduler-common/**' + - 'dolphinscheduler-dao/**' + - 'dolphinscheduler-rpc/**' + - 'dolphinscheduler-server/**' + pull_request: + paths: + - '.github/workflows/ci_backend.yml' + - 'package.xml' + - 'pom.xml' + - 'dolphinscheduler-alert/**' + - 'dolphinscheduler-api/**' + - 'dolphinscheduler-common/**' + - 'dolphinscheduler-dao/**' + - 'dolphinscheduler-rpc/**' + - 'dolphinscheduler-server/**' + +jobs: + Compile-check: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v1 + - name: Set up JDK 1.8 + uses: actions/setup-java@v1 + with: + java-version: 1.8 + - name: Compile + run: mvn -U -B -T 1C clean install -Prelease -Dmaven.compile.fork=true -Dmaven.test.skip=true + License-check: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v1 + - name: Set up JDK 1.8 + uses: actions/setup-java@v1 + with: + java-version: 1.8 + - name: Check + run: mvn -B apache-rat:check diff --git a/.github/workflows/ci_frontend.yml b/.github/workflows/ci_frontend.yml new file mode 100644 index 0000000000..fab75c6341 --- /dev/null +++ b/.github/workflows/ci_frontend.yml @@ -0,0 +1,58 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. 
See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +name: Frontend + +on: + push: + paths: + - '.github/workflows/ci_frontend.yml' + - 'dolphinscheduler-ui/**' + pull_request: + paths: + - '.github/workflows/ci_frontend.yml' + - 'dolphinscheduler-ui/**' + +jobs: + Compile-check: + runs-on: ${{ matrix.os }} + strategy: + matrix: + os: [ubuntu-latest, macos-latest] + steps: + - uses: actions/checkout@v1 + - name: Set up Node.js + uses: actions/setup-node@v1 + with: + version: 8 + - name: Compile + run: | + cd dolphinscheduler-ui + npm install node-sass --unsafe-perm + npm install + npm run build + + License-check: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v1 + - name: Set up JDK 1.8 + uses: actions/setup-java@v1 + with: + java-version: 1.8 + - name: Check + run: mvn -B apache-rat:check diff --git a/.gitignore b/.gitignore index 40078a2fa2..2ef5f5d1e6 100644 --- a/.gitignore +++ b/.gitignore @@ -35,112 +35,112 @@ config.gypi test/coverage /docs/zh_CN/介绍 /docs/zh_CN/贡献代码.md -/escheduler-common/src/main/resources/zookeeper.properties -escheduler-alert/logs/ -escheduler-alert/src/main/resources/alert.properties_bak -escheduler-alert/src/main/resources/logback.xml -escheduler-server/src/main/resources/logback.xml -escheduler-ui/dist/css/common.16ac5d9.css -escheduler-ui/dist/css/home/index.b444b91.css -escheduler-ui/dist/css/login/index.5866c64.css -escheduler-ui/dist/js/0.ac94e5d.js -escheduler-ui/dist/js/0.ac94e5d.js.map -escheduler-ui/dist/js/1.0b043a3.js -escheduler-ui/dist/js/1.0b043a3.js.map -escheduler-ui/dist/js/10.1bce3dc.js -escheduler-ui/dist/js/10.1bce3dc.js.map -escheduler-ui/dist/js/11.79f04d8.js -escheduler-ui/dist/js/11.79f04d8.js.map -escheduler-ui/dist/js/12.420daa5.js -escheduler-ui/dist/js/12.420daa5.js.map -escheduler-ui/dist/js/13.e5bae1c.js -escheduler-ui/dist/js/13.e5bae1c.js.map -escheduler-ui/dist/js/14.f2a0dca.js -escheduler-ui/dist/js/14.f2a0dca.js.map -escheduler-ui/dist/js/15.45373e8.js -escheduler-ui/dist/js/15.45373e8.js.map -escheduler-ui/dist/js/16.fecb0fc.js -escheduler-ui/dist/js/16.fecb0fc.js.map -escheduler-ui/dist/js/17.84be279.js -escheduler-ui/dist/js/17.84be279.js.map -escheduler-ui/dist/js/18.307ea70.js -escheduler-ui/dist/js/18.307ea70.js.map -escheduler-ui/dist/js/19.144db9c.js -escheduler-ui/dist/js/19.144db9c.js.map -escheduler-ui/dist/js/2.8b4ef29.js -escheduler-ui/dist/js/2.8b4ef29.js.map -escheduler-ui/dist/js/20.4c527e9.js -escheduler-ui/dist/js/20.4c527e9.js.map -escheduler-ui/dist/js/21.831b2a2.js -escheduler-ui/dist/js/21.831b2a2.js.map -escheduler-ui/dist/js/22.2b4bb2a.js -escheduler-ui/dist/js/22.2b4bb2a.js.map -escheduler-ui/dist/js/23.81467ef.js -escheduler-ui/dist/js/23.81467ef.js.map -escheduler-ui/dist/js/24.54a00e4.js -escheduler-ui/dist/js/24.54a00e4.js.map -escheduler-ui/dist/js/25.8d7bd36.js -escheduler-ui/dist/js/25.8d7bd36.js.map -escheduler-ui/dist/js/26.2ec5e78.js 
-escheduler-ui/dist/js/26.2ec5e78.js.map -escheduler-ui/dist/js/27.3ab48c2.js -escheduler-ui/dist/js/27.3ab48c2.js.map -escheduler-ui/dist/js/28.363088a.js -escheduler-ui/dist/js/28.363088a.js.map -escheduler-ui/dist/js/29.6c5853a.js -escheduler-ui/dist/js/29.6c5853a.js.map -escheduler-ui/dist/js/3.a0edb5b.js -escheduler-ui/dist/js/3.a0edb5b.js.map -escheduler-ui/dist/js/30.940fdd3.js -escheduler-ui/dist/js/30.940fdd3.js.map -escheduler-ui/dist/js/31.168a460.js -escheduler-ui/dist/js/31.168a460.js.map -escheduler-ui/dist/js/32.8df6594.js -escheduler-ui/dist/js/32.8df6594.js.map -escheduler-ui/dist/js/33.4480bbe.js -escheduler-ui/dist/js/33.4480bbe.js.map -escheduler-ui/dist/js/34.b407fe1.js -escheduler-ui/dist/js/34.b407fe1.js.map -escheduler-ui/dist/js/35.f340b0a.js -escheduler-ui/dist/js/35.f340b0a.js.map -escheduler-ui/dist/js/36.8880c2d.js -escheduler-ui/dist/js/36.8880c2d.js.map -escheduler-ui/dist/js/37.ea2a25d.js -escheduler-ui/dist/js/37.ea2a25d.js.map -escheduler-ui/dist/js/38.98a59ee.js -escheduler-ui/dist/js/38.98a59ee.js.map -escheduler-ui/dist/js/39.a5e958a.js -escheduler-ui/dist/js/39.a5e958a.js.map -escheduler-ui/dist/js/4.4ca44db.js -escheduler-ui/dist/js/4.4ca44db.js.map -escheduler-ui/dist/js/40.e187b1e.js -escheduler-ui/dist/js/40.e187b1e.js.map -escheduler-ui/dist/js/41.0e89182.js -escheduler-ui/dist/js/41.0e89182.js.map -escheduler-ui/dist/js/42.341047c.js -escheduler-ui/dist/js/42.341047c.js.map -escheduler-ui/dist/js/43.27b8228.js -escheduler-ui/dist/js/43.27b8228.js.map -escheduler-ui/dist/js/44.e8869bc.js -escheduler-ui/dist/js/44.e8869bc.js.map -escheduler-ui/dist/js/45.8d54901.js -escheduler-ui/dist/js/45.8d54901.js.map -escheduler-ui/dist/js/5.e1ed7f3.js -escheduler-ui/dist/js/5.e1ed7f3.js.map -escheduler-ui/dist/js/6.241ba07.js -escheduler-ui/dist/js/6.241ba07.js.map -escheduler-ui/dist/js/7.ab2e297.js -escheduler-ui/dist/js/7.ab2e297.js.map -escheduler-ui/dist/js/8.83ff814.js -escheduler-ui/dist/js/8.83ff814.js.map -escheduler-ui/dist/js/9.39cb29f.js -escheduler-ui/dist/js/9.39cb29f.js.map -escheduler-ui/dist/js/common.733e342.js -escheduler-ui/dist/js/common.733e342.js.map -escheduler-ui/dist/js/home/index.78a5d12.js -escheduler-ui/dist/js/home/index.78a5d12.js.map -escheduler-ui/dist/js/login/index.291b8e3.js -escheduler-ui/dist/js/login/index.291b8e3.js.map -escheduler-ui/dist/lib/external/ -escheduler-ui/src/js/conf/home/pages/projects/pages/taskInstance/index.vue -/escheduler-dao/src/main/resources/dao/data_source.properties +/dolphinscheduler-common/src/main/resources/zookeeper.properties +dolphinscheduler-alert/logs/ +dolphinscheduler-alert/src/main/resources/alert.properties_bak +dolphinscheduler-alert/src/main/resources/logback.xml +dolphinscheduler-server/src/main/resources/logback.xml +dolphinscheduler-ui/dist/css/common.16ac5d9.css +dolphinscheduler-ui/dist/css/home/index.b444b91.css +dolphinscheduler-ui/dist/css/login/index.5866c64.css +dolphinscheduler-ui/dist/js/0.ac94e5d.js +dolphinscheduler-ui/dist/js/0.ac94e5d.js.map +dolphinscheduler-ui/dist/js/1.0b043a3.js +dolphinscheduler-ui/dist/js/1.0b043a3.js.map +dolphinscheduler-ui/dist/js/10.1bce3dc.js +dolphinscheduler-ui/dist/js/10.1bce3dc.js.map +dolphinscheduler-ui/dist/js/11.79f04d8.js +dolphinscheduler-ui/dist/js/11.79f04d8.js.map +dolphinscheduler-ui/dist/js/12.420daa5.js +dolphinscheduler-ui/dist/js/12.420daa5.js.map +dolphinscheduler-ui/dist/js/13.e5bae1c.js +dolphinscheduler-ui/dist/js/13.e5bae1c.js.map +dolphinscheduler-ui/dist/js/14.f2a0dca.js 
+dolphinscheduler-ui/dist/js/14.f2a0dca.js.map +dolphinscheduler-ui/dist/js/15.45373e8.js +dolphinscheduler-ui/dist/js/15.45373e8.js.map +dolphinscheduler-ui/dist/js/16.fecb0fc.js +dolphinscheduler-ui/dist/js/16.fecb0fc.js.map +dolphinscheduler-ui/dist/js/17.84be279.js +dolphinscheduler-ui/dist/js/17.84be279.js.map +dolphinscheduler-ui/dist/js/18.307ea70.js +dolphinscheduler-ui/dist/js/18.307ea70.js.map +dolphinscheduler-ui/dist/js/19.144db9c.js +dolphinscheduler-ui/dist/js/19.144db9c.js.map +dolphinscheduler-ui/dist/js/2.8b4ef29.js +dolphinscheduler-ui/dist/js/2.8b4ef29.js.map +dolphinscheduler-ui/dist/js/20.4c527e9.js +dolphinscheduler-ui/dist/js/20.4c527e9.js.map +dolphinscheduler-ui/dist/js/21.831b2a2.js +dolphinscheduler-ui/dist/js/21.831b2a2.js.map +dolphinscheduler-ui/dist/js/22.2b4bb2a.js +dolphinscheduler-ui/dist/js/22.2b4bb2a.js.map +dolphinscheduler-ui/dist/js/23.81467ef.js +dolphinscheduler-ui/dist/js/23.81467ef.js.map +dolphinscheduler-ui/dist/js/24.54a00e4.js +dolphinscheduler-ui/dist/js/24.54a00e4.js.map +dolphinscheduler-ui/dist/js/25.8d7bd36.js +dolphinscheduler-ui/dist/js/25.8d7bd36.js.map +dolphinscheduler-ui/dist/js/26.2ec5e78.js +dolphinscheduler-ui/dist/js/26.2ec5e78.js.map +dolphinscheduler-ui/dist/js/27.3ab48c2.js +dolphinscheduler-ui/dist/js/27.3ab48c2.js.map +dolphinscheduler-ui/dist/js/28.363088a.js +dolphinscheduler-ui/dist/js/28.363088a.js.map +dolphinscheduler-ui/dist/js/29.6c5853a.js +dolphinscheduler-ui/dist/js/29.6c5853a.js.map +dolphinscheduler-ui/dist/js/3.a0edb5b.js +dolphinscheduler-ui/dist/js/3.a0edb5b.js.map +dolphinscheduler-ui/dist/js/30.940fdd3.js +dolphinscheduler-ui/dist/js/30.940fdd3.js.map +dolphinscheduler-ui/dist/js/31.168a460.js +dolphinscheduler-ui/dist/js/31.168a460.js.map +dolphinscheduler-ui/dist/js/32.8df6594.js +dolphinscheduler-ui/dist/js/32.8df6594.js.map +dolphinscheduler-ui/dist/js/33.4480bbe.js +dolphinscheduler-ui/dist/js/33.4480bbe.js.map +dolphinscheduler-ui/dist/js/34.b407fe1.js +dolphinscheduler-ui/dist/js/34.b407fe1.js.map +dolphinscheduler-ui/dist/js/35.f340b0a.js +dolphinscheduler-ui/dist/js/35.f340b0a.js.map +dolphinscheduler-ui/dist/js/36.8880c2d.js +dolphinscheduler-ui/dist/js/36.8880c2d.js.map +dolphinscheduler-ui/dist/js/37.ea2a25d.js +dolphinscheduler-ui/dist/js/37.ea2a25d.js.map +dolphinscheduler-ui/dist/js/38.98a59ee.js +dolphinscheduler-ui/dist/js/38.98a59ee.js.map +dolphinscheduler-ui/dist/js/39.a5e958a.js +dolphinscheduler-ui/dist/js/39.a5e958a.js.map +dolphinscheduler-ui/dist/js/4.4ca44db.js +dolphinscheduler-ui/dist/js/4.4ca44db.js.map +dolphinscheduler-ui/dist/js/40.e187b1e.js +dolphinscheduler-ui/dist/js/40.e187b1e.js.map +dolphinscheduler-ui/dist/js/41.0e89182.js +dolphinscheduler-ui/dist/js/41.0e89182.js.map +dolphinscheduler-ui/dist/js/42.341047c.js +dolphinscheduler-ui/dist/js/42.341047c.js.map +dolphinscheduler-ui/dist/js/43.27b8228.js +dolphinscheduler-ui/dist/js/43.27b8228.js.map +dolphinscheduler-ui/dist/js/44.e8869bc.js +dolphinscheduler-ui/dist/js/44.e8869bc.js.map +dolphinscheduler-ui/dist/js/45.8d54901.js +dolphinscheduler-ui/dist/js/45.8d54901.js.map +dolphinscheduler-ui/dist/js/5.e1ed7f3.js +dolphinscheduler-ui/dist/js/5.e1ed7f3.js.map +dolphinscheduler-ui/dist/js/6.241ba07.js +dolphinscheduler-ui/dist/js/6.241ba07.js.map +dolphinscheduler-ui/dist/js/7.ab2e297.js +dolphinscheduler-ui/dist/js/7.ab2e297.js.map +dolphinscheduler-ui/dist/js/8.83ff814.js +dolphinscheduler-ui/dist/js/8.83ff814.js.map +dolphinscheduler-ui/dist/js/9.39cb29f.js +dolphinscheduler-ui/dist/js/9.39cb29f.js.map 
+dolphinscheduler-ui/dist/js/common.733e342.js +dolphinscheduler-ui/dist/js/common.733e342.js.map +dolphinscheduler-ui/dist/js/home/index.78a5d12.js +dolphinscheduler-ui/dist/js/home/index.78a5d12.js.map +dolphinscheduler-ui/dist/js/login/index.291b8e3.js +dolphinscheduler-ui/dist/js/login/index.291b8e3.js.map +dolphinscheduler-ui/dist/lib/external/ +dolphinscheduler-ui/src/js/conf/home/pages/projects/pages/taskInstance/index.vue +/dolphinscheduler-dao/src/main/resources/dao/data_source.properties diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index be32e77143..8ed9aac897 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,4 +1,4 @@ -* First from the remote repository *https://github.com/analysys/EasyScheduler.git* fork code to your own repository +* First from the remote repository *https://github.com/apache/incubator-dolphinscheduler.git* fork code to your own repository * there are three branches in the remote repository currently: * master normal delivery branch @@ -7,17 +7,14 @@ * dev daily development branch The daily development branch, the newly submitted code can pull requests to this branch. - * branch-1.0.0 release version branch - Release version branch, there will be 2.0 ... and other version branches, the version - branch only changes the error, does not add new features. * Clone your own warehouse to your local - `git clone https://github.com/analysys/EasyScheduler.git` + `git clone https://github.com/apache/incubator-dolphinscheduler.git` * Add remote repository address, named upstream - `git remote add upstream https://github.com/analysys/EasyScheduler.git` + `git remote add upstream https://github.com/apache/incubator-dolphinscheduler.git` * View repository: @@ -63,71 +60,6 @@ git push --set-upstream origin dev1.0 * Next, the administrator is responsible for **merging** to complete the pull request ---- - -* 首先从远端仓库*https://github.com/analysys/EasyScheduler.git* fork一份代码到自己的仓库中 - -* 远端仓库中目前有三个分支: - * master 正常交付分支 - 发布稳定版本以后,将稳定版本分支的代码合并到master上。 - - * dev 日常开发分支 - 日常dev开发分支,新提交的代码都可以pull request到这个分支上。 - - * branch-1.0.0 发布版本分支 - 发布版本分支,后续会有2.0...等版本分支,版本分支只修改bug,不增加新功能。 - -* 把自己仓库clone到本地 - - `git clone https://github.com/analysys/EasyScheduler.git` - -* 添加远端仓库地址,命名为upstream - - ` git remote add upstream https://github.com/analysys/EasyScheduler.git ` - -* 查看仓库: - - ` git remote -v` - -> 此时会有两个仓库:origin(自己的仓库)和upstream(远端仓库) - -* 获取/更新远端仓库代码(已经是最新代码,就跳过) - - `git fetch upstream ` - - -* 同步远端仓库代码到本地仓库 - -``` - git checkout origin/dev - git merge --no-ff upstream/dev -``` - -如果远端分支有新加的分支`dev-1.0`,需要同步这个分支到本地仓库 - -``` -git checkout -b dev-1.0 upstream/dev-1.0 -git push --set-upstream origin dev1.0 -``` - -* 在本地修改代码以后,提交到自己仓库: - - `git commit -m 'test commit'` - `git push` - -* 将修改提交到远端仓库 - - * 在github页面,点击New pull request. -
- - * 选择修改完的本地分支和要合并过去的分支,Create pull request. -- -
- -* 接下来由管理员负责将**Merge**完成此次pull request diff --git a/DISCLAIMER b/DISCLAIMER new file mode 100644 index 0000000000..1c269cd696 --- /dev/null +++ b/DISCLAIMER @@ -0,0 +1,5 @@ +Apache DolphinScheduler (incubating) is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC. +Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, +communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. +While incubation status is not necessarily a reflection of the completeness or stability of the code, +it does indicate that the project has yet to be fully endorsed by the ASF. diff --git a/NOTICE b/NOTICE index 26802e12b6..72b5f0632c 100644 --- a/NOTICE +++ b/NOTICE @@ -1,7 +1,5 @@ -Easy Scheduler -Copyright 2019 The Analysys Foundation +Apache DolphinScheduler (incubating) +Copyright 2019 The Apache Software Foundation This product includes software developed at -The Analysys Foundation (https://www.analysys.cn/). - - +The Apache Software Foundation (http://www.apache.org/). diff --git a/README.md b/README.md index 6352bd5f10..b4a7e5c7cd 100644 --- a/README.md +++ b/README.md @@ -11,7 +11,7 @@ Dolphin Scheduler [![CN doc](https://img.shields.io/badge/文档-中文版-blue.svg)](README_zh_CN.md) -### Design features: +### Design features: A distributed and easy-to-expand visual DAG workflow scheduling system. Dedicated to solving the complex dependencies in data processing, making the scheduling system `out of the box` for data processing. Its main objectives are as follows: @@ -36,8 +36,8 @@ Its main objectives are as follows: Stability | Easy to use | Features | Scalability | -- | -- | -- | -- -Decentralized multi-master and multi-worker | Visualization process defines key information such as task status, task type, retry times, task running machine, visual variables and so on at a glance. | Support pause, recover operation | support custom task types -HA is supported by itself | All process definition operations are visualized, dragging tasks to draw DAGs, configuring data sources and resources. At the same time, for third-party systems, the api mode operation is provided. | Users on DolphinScheduler can achieve many-to-one or one-to-one mapping relationship through tenants and Hadoop users, which is very important for scheduling large data jobs. " | The scheduler uses distributed scheduling, and the overall scheduling capability will increase linearly with the scale of the cluster. Master and Worker support dynamic online and offline. +Decentralized multi-master and multi-worker | Visualization process defines key information such as task status, task type, retry times, task running machine, visual variables and so on at a glance. | Support pause, recover operation | support custom task types +HA is supported by itself | All process definition operations are visualized, dragging tasks to draw DAGs, configuring data sources and resources. At the same time, for third-party systems, the api mode operation is provided. | Users on DolphinScheduler can achieve many-to-one or one-to-one mapping relationship through tenants and Hadoop users, which is very important for scheduling large data jobs. " | The scheduler uses distributed scheduling, and the overall scheduling capability will increase linearly with the scale of the cluster. Master and Worker support dynamic online and offline. 
Overload processing: Task queue mechanism, the number of schedulable tasks on a single machine can be flexibly configured, when too many tasks will be cached in the task queue, will not cause machine jam. | One-click deployment | Supports traditional shell tasks, and also support big data platform task scheduling: MR, Spark, SQL (mysql, postgresql, hive, sparksql), Python, Procedure, Sub_Process | | @@ -58,11 +58,11 @@ Overload processing: Task queue mechanism, the number of schedulable tasks on a - Front-end deployment documentation -- [**User manual**](https://dolphinscheduler.apache.org/en-us/docs/user_doc/system-manual.html?_blank "System manual") +- [**User manual**](https://dolphinscheduler.apache.org/en-us/docs/user_doc/system-manual.html?_blank "System manual") -- [**Upgrade document**](https://dolphinscheduler.apache.org/en-us/docs/release/upgrade.html?_blank "Upgrade document") +- [**Upgrade document**](https://dolphinscheduler.apache.org/en-us/docs/release/upgrade.html?_blank "Upgrade document") -- Online Demo +- Online Demo More documentation please refer to [DolphinScheduler online documentation] @@ -74,6 +74,20 @@ Work plan of Dolphin Scheduler: [R&D plan](https://github.com/apache/incubator-d Welcome to participate in contributing code, please refer to the process of submitting the code: [[How to contribute code](https://github.com/apache/incubator-dolphinscheduler/issues/310)] +### How to Build + +```bash +mvn clean install -Prelease +``` + +Artifact: + +``` +dolphinscheduler-dist/dolphinscheduler-backend/target/apache-dolphinscheduler-incubating-${latest.release.version}-dolphinscheduler-backend-bin.tar.gz: Binary package of DolphinScheduler-Backend +dolphinscheduler-dist/dolphinscheduler-front/target/apache-dolphinscheduler-incubating-${latest.release.version}-dolphinscheduler-front-bin.tar.gz: Binary package of DolphinScheduler-UI +dolphinscheduler-dist/dolphinscheduler-src/target/apache-dolphinscheduler-incubating-${latest.release.version}-src.zip: Source code package of DolphinScheduler +``` + ### Thanks Dolphin Scheduler uses a lot of excellent open source projects, such as google guava, guice, grpc, netty, ali bonecp, quartz, and many open source projects of apache, etc. @@ -86,8 +100,8 @@ It is because of the shoulders of these open source projects that the birth of t ### License Please refer to [LICENSE](https://github.com/apache/incubator-dolphinscheduler/blob/dev/LICENSE) file. 
- - + + diff --git a/README_zh_CN.md b/README_zh_CN.md index f64fcda1f7..6bdf7be183 100644 --- a/README_zh_CN.md +++ b/README_zh_CN.md @@ -1,13 +1,13 @@ Dolphin Scheduler ============ [![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html) -[![Total Lines](https://tokei.rs/b1/github/analysys/EasyScheduler?category=lines)](https://github.com/analysys/EasyScheduler) +[![Total Lines](https://tokei.rs/b1/github/apache/Incubator-DolphinScheduler?category=lines)](https://github.com/apache/Incubator-DolphinScheduler) > Dolphin Scheduler for Big Data -[![Stargazers over time](https://starchart.cc/analysys/EasyScheduler.svg)](https://starchart.cc/analysys/EasyScheduler) +[![Stargazers over time](https://starchart.cc/apache/incubator-dolphinscheduler.svg)](https://starchart.cc/apache/incubator-dolphinscheduler) [![CN doc](https://img.shields.io/badge/文档-中文版-blue.svg)](README_zh_CN.md) [![EN doc](https://img.shields.io/badge/document-English-blue.svg)](README.md) @@ -45,11 +45,11 @@ Dolphin Scheduler - 前端部署文档 -- [**使用手册**](https://dolphinscheduler.apache.org/zh-cn/docs/user_doc/system-manual.html?_blank "系统使用手册") +- [**使用手册**](https://dolphinscheduler.apache.org/zh-cn/docs/user_doc/system-manual.html?_blank "系统使用手册") -- [**升级文档**](https://dolphinscheduler.apache.org/zh-cn/docs/release/upgrade.html?_blank "升级文档") +- [**升级文档**](https://dolphinscheduler.apache.org/zh-cn/docs/release/upgrade.html?_blank "升级文档") -- 我要体验 +- 我要体验 更多文档请参考 DolphinScheduler中文在线文档 @@ -63,11 +63,24 @@ DolphinScheduler的工作计划:> /etc/apt/sources.list - -RUN echo "mysql-server mysql-server/root_password password root" | debconf-set-selections -RUN echo "mysql-server mysql-server/root_password_again password root" | debconf-set-selections - +#5,install postgresql RUN apt-get update && \ - apt-get -y install mysql-server-5.7 && \ - mkdir -p /var/lib/mysql && \ - mkdir -p /var/run/mysqld && \ - mkdir -p /var/log/mysql && \ - chown -R mysql:mysql /var/lib/mysql && \ - chown -R mysql:mysql /var/run/mysqld && \ - chown -R mysql:mysql /var/log/mysql + apt-get install -y postgresql postgresql-contrib sudo && \ + sed -i 's/localhost/*/g' /etc/postgresql/10/main/postgresql.conf - -# UTF-8 and bind-address -RUN sed -i -e "$ a [client]\n\n[mysql]\n\n[mysqld]" /etc/mysql/my.cnf && \ - sed -i -e "s/\(\[client\]\)/\1\ndefault-character-set = utf8/g" /etc/mysql/my.cnf && \ - sed -i -e "s/\(\[mysql\]\)/\1\ndefault-character-set = utf8/g" /etc/mysql/my.cnf && \ - sed -i -e "s/\(\[mysqld\]\)/\1\ninit_connect='SET NAMES utf8'\ncharacter-set-server = utf8\ncollation-server=utf8_general_ci\nbind-address = 0.0.0.0/g" /etc/mysql/my.cnf - - -#9,安装nginx +#6,install nginx RUN apt-get update && \ apt-get install -y nginx && \ rm -rf /var/lib/apt/lists/* && \ echo "\ndaemon off;" >> /etc/nginx/nginx.conf && \ chown -R www-data:www-data /var/lib/nginx -#10,修改escheduler配置文件 -#后端配置 -RUN mkdir -p /opt/escheduler && \ - tar -zxvf /opt/easyscheduler_source/target/escheduler-${tar_version}.tar.gz -C /opt/escheduler && \ - rm -rf /opt/escheduler/conf -ADD ./conf/escheduler/conf /opt/escheduler/conf -#前端nginx配置 -ADD ./conf/nginx/default.conf /etc/nginx/conf.d - -#11,开放端口 -EXPOSE 2181 2888 3888 3306 80 12345 8888 - -#12,安装sudo,python,vim,ping和ssh +#7,install sudo,python,vim,ping and ssh command RUN apt-get update && \ apt-get -y install sudo && \ apt-get -y install python && \ @@ -132,15 +93,44 @@ RUN apt-get update && \ apt-get -y install python-pip && \ pip install kazoo -COPY ./startup.sh 
/root/startup.sh -#13,修改权限和设置软连 +#8,add dolphinscheduler source code to /opt/dolphinscheduler_source +ADD . /opt/dolphinscheduler_source + + +#9,backend compilation +RUN cd /opt/dolphinscheduler_source && \ + mvn clean package -Prelease -Dmaven.test.skip=true + +#10,frontend compilation +RUN chmod -R 777 /opt/dolphinscheduler_source/dolphinscheduler-ui && \ + cd /opt/dolphinscheduler_source/dolphinscheduler-ui && \ + rm -rf /opt/dolphinscheduler_source/dolphinscheduler-ui/node_modules && \ + npm install node-sass --unsafe-perm && \ + npm install && \ + npm run build + +#11,modify dolphinscheduler configuration file +#backend configuration +RUN tar -zxvf /opt/dolphinscheduler_source/dolphinscheduler-dist/dolphinscheduler-backend/target/apache-dolphinscheduler-incubating-${tar_version}-dolphinscheduler-backend-bin.tar.gz -C /opt && \ + mv /opt/apache-dolphinscheduler-incubating-${tar_version}-dolphinscheduler-backend-bin /opt/dolphinscheduler && \ + rm -rf /opt/dolphinscheduler/conf + +ADD ./dockerfile/conf/dolphinscheduler/conf /opt/dolphinscheduler/conf +#frontend nginx configuration +ADD ./dockerfile/conf/nginx/dolphinscheduler.conf /etc/nginx/conf.d + +#12,open port +EXPOSE 2181 2888 3888 3306 80 12345 8888 + +COPY ./dockerfile/startup.sh /root/startup.sh +#13,modify permissions and set soft links RUN chmod +x /root/startup.sh && \ - chmod +x /opt/escheduler/script/create_escheduler.sh && \ + chmod +x /opt/dolphinscheduler/script/create-dolphinscheduler.sh && \ chmod +x /opt/zookeeper/bin/zkServer.sh && \ - chmod +x /opt/escheduler/bin/escheduler-daemon.sh && \ + chmod +x /opt/dolphinscheduler/bin/dolphinscheduler-daemon.sh && \ rm -rf /bin/sh && \ ln -s /bin/bash /bin/sh && \ mkdir -p /tmp/xls -ENTRYPOINT ["/root/startup.sh"] +ENTRYPOINT ["/root/startup.sh"] \ No newline at end of file diff --git a/dockerfile/README.md b/dockerfile/README.md new file mode 100644 index 0000000000..33b58cacde --- /dev/null +++ b/dockerfile/README.md @@ -0,0 +1,11 @@ +## Build Image +``` + cd .. + docker build -t dolphinscheduler --build-arg version=1.1.0 --build-arg tar_version=1.1.0-SNAPSHOT -f dockerfile/Dockerfile . + docker run -p 12345:12345 -p 8888:8888 --rm --name dolphinscheduler -d dolphinscheduler +``` +* Visit the url: http://127.0.0.1:8888 +* UserName:admin Password:dolphinscheduler123 + +## Note +* MacOS: The memory of docker needs to be set to 4G, default 2G. Steps: Preferences -> Advanced -> adjust resources -> Apply & Restart diff --git a/dockerfile/conf/dolphinscheduler/conf/alert.properties b/dockerfile/conf/dolphinscheduler/conf/alert.properties new file mode 100644 index 0000000000..276ef3132a --- /dev/null +++ b/dockerfile/conf/dolphinscheduler/conf/alert.properties @@ -0,0 +1,50 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +#alert type is EMAIL/SMS +alert.type=EMAIL + +# mail server configuration +mail.protocol=SMTP +mail.server.host=smtp.126.com +mail.server.port= +mail.sender=dolphinscheduler@126.com +mail.user=dolphinscheduler@126.com +mail.passwd=escheduler123 + +# TLS +mail.smtp.starttls.enable=false +# SSL +mail.smtp.ssl.enable=true +mail.smtp.ssl.trust=smtp.126.com + +#xls file path,need create if not exist +xls.file.path=/tmp/xls + +# Enterprise WeChat configuration +enterprise.wechat.enable=false +enterprise.wechat.corp.id=xxxxxxx +enterprise.wechat.secret=xxxxxxx +enterprise.wechat.agent.id=xxxxxxx +enterprise.wechat.users=xxxxxxx +enterprise.wechat.token.url=https://qyapi.weixin.qq.com/cgi-bin/gettoken?corpid=$corpId&corpsecret=$secret +enterprise.wechat.push.url=https://qyapi.weixin.qq.com/cgi-bin/message/send?access_token=$token +enterprise.wechat.team.send.msg={\"toparty\":\"$toParty\",\"agentid\":\"$agentId\",\"msgtype\":\"text\",\"text\":{\"content\":\"$msg\"},\"safe\":\"0\"} +enterprise.wechat.user.send.msg={\"touser\":\"$toUser\",\"agentid\":\"$agentId\",\"msgtype\":\"markdown\",\"markdown\":{\"content\":\"$msg\"}} + + + diff --git a/dockerfile/conf/dolphinscheduler/conf/alert_logback.xml b/dockerfile/conf/dolphinscheduler/conf/alert_logback.xml new file mode 100644 index 0000000000..35e19865b9 --- /dev/null +++ b/dockerfile/conf/dolphinscheduler/conf/alert_logback.xml @@ -0,0 +1,49 @@ + + + + +- -
-A : Change **master.reserved.memory** in conf/master.properties to a smaller value, for example 0.1, or change **worker.reserved.memory** in conf/worker.properties to a smaller value, for example 0.1.
-
-## Q: The hive version is 1.1.0+cdh5.15.0, and the SQL hive task connection reports an error.
-A : Change the hive dependency version in the pom to match the cluster's hive version.
-(figure: dag example)
- - -**Process definition**: Visualization **DAG** by dragging task nodes and establishing associations of task nodes - -**Process instance**: A process instance is an instantiation of a process definition, which can be generated by manual startup or scheduling. The process definition runs once, a new process instance is generated - -**Task instance**: A task instance is the instantiation of a specific task node when a process instance runs, which indicates the specific task execution status - -**Task type**: Currently supports SHELL, SQL, SUB_PROCESS (sub-process), PROCEDURE, MR, SPARK, PYTHON, DEPENDENT (dependency), and plans to support dynamic plug-in extension, note: the sub-**SUB_PROCESS** is also A separate process definition that can be launched separately - -**Schedule mode** : The system supports timing schedule and manual schedule based on cron expressions. Command type support: start workflow, start execution from current node, resume fault-tolerant workflow, resume pause process, start execution from failed node, complement, timer, rerun, pause, stop, resume waiting thread. Where **recovers the fault-tolerant workflow** and **restores the waiting thread** The two command types are used by the scheduling internal control and cannot be called externally - -**Timed schedule**: The system uses **quartz** distributed scheduler and supports the generation of cron expression visualization - -**Dependency**: The system does not only support **DAG** Simple dependencies between predecessors and successor nodes, but also provides **task dependencies** nodes, support for custom task dependencies between processes** - -**Priority**: Supports the priority of process instances and task instances. If the process instance and task instance priority are not set, the default is first in, first out. - -**Mail Alert**: Support **SQL Task** Query Result Email Send, Process Instance Run Result Email Alert and Fault Tolerant Alert Notification - -**Failure policy**: For tasks running in parallel, if there are tasks that fail, two failure policy processing methods are provided. **Continue** means that the status of the task is run in parallel until the end of the process failure. **End** means that once a failed task is found, Kill also drops the running parallel task and the process ends. - -**Complement**: Complement historical data, support ** interval parallel and serial ** two complement methods - - - -### 2.System architecture - -#### 2.1 System Architecture Diagram -- -
-(figure: System Architecture Diagram)
- - - - -#### 2.2 Architectural description - -* **MasterServer** - - MasterServer adopts the distributed non-central design concept. MasterServer is mainly responsible for DAG task split, task submission monitoring, and monitoring the health status of other MasterServer and WorkerServer. - When the MasterServer service starts, it registers a temporary node with Zookeeper, and listens to the Zookeeper temporary node state change for fault tolerance processing. - - - - ##### The service mainly contains: - - - **Distributed Quartz** distributed scheduling component, mainly responsible for the start and stop operation of the scheduled task. When the quartz picks up the task, the master internally has a thread pool to be responsible for the subsequent operations of the task. - - - **MasterSchedulerThread** is a scan thread that periodically scans the **command** table in the database for different business operations based on different ** command types** - - - **MasterExecThread** is mainly responsible for DAG task segmentation, task submission monitoring, logic processing of various command types - - - **MasterTaskExecThread** is mainly responsible for task persistence - - - -* **WorkerServer** - - - WorkerServer also adopts a distributed, non-central design concept. WorkerServer is mainly responsible for task execution and providing log services. When the WorkerServer service starts, it registers the temporary node with Zookeeper and maintains the heartbeat. - - ##### This service contains: - - - **FetchTaskThread** is mainly responsible for continuously receiving tasks from **Task Queue** and calling **TaskScheduleThread** corresponding executors according to different task types. - - **LoggerServer** is an RPC service that provides functions such as log fragment viewing, refresh and download. - - - **ZooKeeper** - - The ZooKeeper service, the MasterServer and the WorkerServer nodes in the system all use the ZooKeeper for cluster management and fault tolerance. In addition, the system also performs event monitoring and distributed locking based on ZooKeeper. - We have also implemented queues based on Redis, but we hope that EasyScheduler relies on as few components as possible, so we finally removed the Redis implementation. - - - **Task Queue** - - The task queue operation is provided. Currently, the queue is also implemented based on Zookeeper. Since there is less information stored in the queue, there is no need to worry about too much data in the queue. In fact, we have over-measured a million-level data storage queue, which has no effect on system stability and performance. - - - **Alert** - - Provides alarm-related interfaces. The interfaces mainly include **Alarms**. The storage, query, and notification functions of the two types of alarm data. The notification function has two types: **mail notification** and **SNMP (not yet implemented)**. - - - **API** - - The API interface layer is mainly responsible for processing requests from the front-end UI layer. The service provides a RESTful api to provide request services externally. - Interfaces include workflow creation, definition, query, modification, release, offline, manual start, stop, pause, resume, start execution from this node, and more. - - - **UI** - - The front-end page of the system provides various visual operation interfaces of the system. For details, see the **[System User Manual] (System User Manual.md)** section. - - - -#### 2.3 Architectural Design Ideas - -##### I. 
Decentralized vs centralization - -###### Centralization Thought - -The centralized design concept is relatively simple. The nodes in the distributed cluster are divided into two roles according to their roles: - -- -
- -- The role of Master is mainly responsible for task distribution and supervising the health status of Slave. It can dynamically balance the task to Slave, so that the Slave node will not be "busy" or "free". -- The role of the Worker is mainly responsible for the execution of the task and maintains the heartbeat with the Master so that the Master can assign tasks to the Slave. - -Problems in the design of centralized : - -- Once the Master has a problem, the group has no leader and the entire cluster will crash. In order to solve this problem, most Master/Slave architecture modes adopt the design scheme of the master and backup masters, which can be hot standby or cold standby, automatic switching or manual switching, and more and more new systems are available. Automatically elects the ability to switch masters to improve system availability. -- Another problem is that if the Scheduler is on the Master, although it can support different tasks in one DAG running on different machines, it will generate overload of the Master. If the Scheduler is on the Slave, all tasks in a DAG can only be submitted on one machine. If there are more parallel tasks, the pressure on the Slave may be larger. - -###### Decentralization - - - - -- In the decentralized design, there is usually no Master/Slave concept, all roles are the same, the status is equal, the global Internet is a typical decentralized distributed system, networked arbitrary node equipment down machine , all will only affect a small range of features. -- The core design of decentralized design is that there is no "manager" that is different from other nodes in the entire distributed system, so there is no single point of failure problem. However, since there is no "manager" node, each node needs to communicate with other nodes to get the necessary machine information, and the unreliable line of distributed system communication greatly increases the difficulty of implementing the above functions. -- In fact, truly decentralized distributed systems are rare. Instead, dynamic centralized distributed systems are constantly emerging. Under this architecture, the managers in the cluster are dynamically selected, rather than preset, and when the cluster fails, the nodes of the cluster will spontaneously hold "meetings" to elect new "managers". Go to preside over the work. The most typical case is the Etcd implemented in ZooKeeper and Go. - -- Decentralization of EasyScheduler is the registration of Master/Worker to ZooKeeper. The Master Cluster and the Worker Cluster are not centered, and the Zookeeper distributed lock is used to elect one Master or Worker as the “manager” to perform the task. - -##### 二、Distributed lock practice - -EasyScheduler uses ZooKeeper distributed locks to implement only one Master to execute the Scheduler at the same time, or only one Worker to perform task submission. - -1. The core process algorithm for obtaining distributed locks is as follows - -- -
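One common way to express this mutual exclusion on ZooKeeper is Apache Curator's `InterProcessMutex`. The sketch below is illustrative only (the connection string and lock path are made-up values, not the project's actual constants): whichever Master acquires the lock runs the Scheduler, while the others block on `acquire()`.

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class SchedulerLockSketch {

    // Illustrative lock path that competing Masters contend for.
    private static final String MASTER_LOCK_PATH = "/dolphinscheduler/lock/masters";

    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        InterProcessMutex mutex = new InterProcessMutex(client, MASTER_LOCK_PATH);
        try {
            // Blocks until this Master holds the lock; only the holder runs the Scheduler.
            mutex.acquire();
            // ... scan the command table and submit DAGs here ...
        } finally {
            // Release so another Master can take over; an expired session has the same effect,
            // because the underlying lock node is ephemeral.
            mutex.release();
            client.close();
        }
    }
}
```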
- -2. Scheduler thread distributed lock implementation flow chart in EasyScheduler: - -- -
- -##### Third, the thread is insufficient loop waiting problem - -- If there is no subprocess in a DAG, if the number of data in the Command is greater than the threshold set by the thread pool, the direct process waits or fails. -- If a large number of sub-processes are nested in a large DAG, the following figure will result in a "dead" state: - -- -
- -In the above figure, MainFlowThread waits for SubFlowThread1 to end, SubFlowThread1 waits for SubFlowThread2 to end, SubFlowThread2 waits for SubFlowThread3 to end, and SubFlowThread3 waits for a new thread in the thread pool, then the entire DAG process cannot end, and thus the thread cannot be released. This forms the state of the child parent process loop waiting. At this point, the scheduling cluster will no longer be available unless a new Master is started to add threads to break such a "stuck." - -It seems a bit unsatisfactory to start a new Master to break the deadlock, so we proposed the following three options to reduce this risk: - -1. Calculate the sum of the threads of all Masters, and then calculate the number of threads required for each DAG, that is, pre-calculate before the DAG process is executed. Because it is a multi-master thread pool, the total number of threads is unlikely to be obtained in real time. -2. Judge the single master thread pool. If the thread pool is full, let the thread fail directly. -3. Add a Command type with insufficient resources. If the thread pool is insufficient, the main process will be suspended. This way, the thread pool has a new thread, which can make the process with insufficient resources hang up and wake up again. - -Note: The Master Scheduler thread is FIFO-enabled when it gets the Command. - -So we chose the third way to solve the problem of insufficient threads. - -##### IV. Fault Tolerant Design - -Fault tolerance is divided into service fault tolerance and task retry. Service fault tolerance is divided into two types: Master Fault Tolerance and Worker Fault Tolerance. - -###### 1. Downtime fault tolerance - -Service fault tolerance design relies on ZooKeeper's Watcher mechanism. The implementation principle is as follows: - -- -
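A minimal sketch of that Watcher-based detection using Curator's `PathChildrenCache` (the path and the handling code are illustrative, not the actual fault-tolerance implementation): each live server registers an ephemeral node, and a `CHILD_REMOVED` event means the server is gone and its work must be taken over.

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.cache.PathChildrenCache;
import org.apache.curator.framework.recipes.cache.PathChildrenCacheEvent;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class DeadServerWatcherSketch {

    // Illustrative path under which each live Worker keeps an ephemeral node.
    private static final String WORKERS_PATH = "/dolphinscheduler/nodes/workers";

    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        PathChildrenCache cache = new PathChildrenCache(client, WORKERS_PATH, true);
        cache.getListenable().addListener((c, event) -> {
            if (event.getType() == PathChildrenCacheEvent.Type.CHILD_REMOVED) {
                // The ephemeral node disappeared: the worker is down (or lost its session),
                // so its running task instances are marked for fault tolerance and resubmitted.
                System.out.println("take over tasks of: " + event.getData().getPath());
            }
        });
        cache.start();
        // In the real service the cache lives for the whole process lifetime.
    }
}
```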
- -The Master monitors the directories of other Masters and Workers. If the remove event is detected, the process instance is fault-tolerant or the task instance is fault-tolerant according to the specific business logic. - - - -- Master fault tolerance flow chart: - -- -
- -After the ZooKeeper Master is fault-tolerant, it is rescheduled by the Scheduler thread in EasyScheduler. It traverses the DAG to find the "Running" and "Submit Successful" tasks, and monitors the status of its task instance for the "Running" task. You need to determine whether the Task Queue already exists. If it exists, monitor the status of the task instance. If it does not exist, resubmit the task instance. - - - -- Worker fault tolerance flow chart: - -- -
- -Once the Master Scheduler thread finds the task instance as "need to be fault tolerant", it takes over the task and resubmits. - - Note: Because the "network jitter" may cause the node to lose the heartbeat of ZooKeeper in a short time, the node's remove event occurs. In this case, we use the easiest way, that is, once the node has timeout connection with ZooKeeper, it will directly stop the Master or Worker service. - -###### 2. Task failure retry - -Here we must first distinguish between the concept of task failure retry, process failure recovery, and process failure rerun: - -- Task failure Retry is task level, which is automatically performed by the scheduling system. For example, if a shell task sets the number of retries to 3 times, then the shell task will try to run up to 3 times after failing to run. -- Process failure recovery is process level, is done manually, recovery can only be performed from the failed node ** or ** from the current node ** -- Process failure rerun is also process level, is done manually, rerun is from the start node - - - -Next, let's talk about the topic, we divided the task nodes in the workflow into two types. - -- One is a business node, which corresponds to an actual script or processing statement, such as a Shell node, an MR node, a Spark node, a dependent node, and so on. -- There is also a logical node, which does not do the actual script or statement processing, but the logical processing of the entire process flow, such as sub-flow sections. - -Each ** service node** can configure the number of failed retries. When the task node fails, it will automatically retry until it succeeds or exceeds the configured number of retries. **Logical node** does not support failed retry. But the tasks in the logical nodes support retry. - -If there is a task failure in the workflow that reaches the maximum number of retries, the workflow will fail to stop, and the failed workflow can be manually rerun or process resumed. - - - -##### V. Task priority design - -In the early scheduling design, if there is no priority design and fair scheduling design, it will encounter the situation that the task submitted first may be completed simultaneously with the task submitted subsequently, but the priority of the process or task cannot be set. We have redesigned this, and we are currently designing it as follows: - -- According to ** different process instance priority ** prioritizes ** same process instance priority ** prioritizes ** task priority within the same process ** takes precedence over ** same process ** commit order from high Go to low for task processing. - - - The specific implementation is to resolve the priority according to the json of the task instance, and then save the ** process instance priority _ process instance id_task priority _ task id** information in the ZooKeeper task queue, when obtained from the task queue, Through string comparison, you can get the task that needs to be executed first. - - - The priority of the process definition is that some processes need to be processed before other processes. This can be configured at the start of the process or at the time of scheduled start. There are 5 levels, followed by HIGHEST, HIGH, MEDIUM, LOW, and LOWEST. As shown below - -- -
- - - The priority of the task is also divided into 5 levels, followed by HIGHEST, HIGH, MEDIUM, LOW, and LOWEST. As shown below - -- -
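The string ordering described above, a queue key of the form `{process instance priority}_{process instance id}_{task priority}_{task id}` compared lexicographically, can be illustrated with a small self-contained sketch. The numeric encoding of the five levels via `ordinal()` is an assumption made for this example (a lower value sorts first, i.e. higher priority).

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class TaskQueueKeySketch {

    // HIGHEST has the smallest ordinal, so plain string comparison picks it first.
    enum Priority { HIGHEST, HIGH, MEDIUM, LOW, LOWEST }

    // Key layout: {processInstancePriority}_{processInstanceId}_{taskPriority}_{taskId}
    static String taskKey(Priority processPriority, int processInstanceId,
                          Priority taskPriority, int taskId) {
        return processPriority.ordinal() + "_" + processInstanceId + "_"
                + taskPriority.ordinal() + "_" + taskId;
    }

    public static void main(String[] args) {
        SortedSet<String> queue = new TreeSet<>();
        queue.add(taskKey(Priority.MEDIUM, 100, Priority.LOW, 3));
        queue.add(taskKey(Priority.MEDIUM, 100, Priority.HIGH, 2));
        queue.add(taskKey(Priority.HIGHEST, 101, Priority.HIGH, 7));

        // The lexicographically smallest key is the next task to run:
        // highest process-instance priority first, then task priority within that process.
        // (Ids are kept short here so lexicographic order matches numeric order.)
        System.out.println("next task: " + queue.first());
    }
}
```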
- -##### VI. Logback and gRPC implement log access - -- Since the Web (UI) and Worker are not necessarily on the same machine, viewing the log is not as it is for querying local files. There are two options: - - Put the logs on the ES search engine - - Obtain remote log information through gRPC communication -- Considering the lightweightness of EasyScheduler as much as possible, gRPC was chosen to implement remote access log information. - -- -
- -- We use a custom Logback FileAppender and Filter function to generate a log file for each task instance. -- The main implementation of FileAppender is as follows: - -```java - /** - * task log appender - */ - Public class TaskLogAppender extends FileAppender- -
-* Create queue
-
- * Create tenant
-
- * Creating Ordinary Users
-
- * Create an alarm group
-
- * Log in with regular users
- > Click on the user name in the upper right corner to "exit" and re-use the normal user login.
-
- * Project Management - > Create Project - > Click on Project Name
-
- * Click Workflow Definition - > Create Workflow Definition - > Online Process Definition
-
- * Running Process Definition - > Click Workflow Instance - > Click Process Instance Name - > Double-click Task Node - > View Task Execution Log
- - diff --git a/docs/en_US/system-manual.md b/docs/en_US/system-manual.md deleted file mode 100644 index d571e1d66f..0000000000 --- a/docs/en_US/system-manual.md +++ /dev/null @@ -1,699 +0,0 @@ -# System Use Manual - -## Operational Guidelines - -### Create a project - - - Click "Project - > Create Project", enter project name, description, and click "Submit" to create a new project. - - Click on the project name to enter the project home page. -- -
- -> Project Home Page contains task status statistics, process status statistics. - - - Task State Statistics: It refers to the statistics of the number of tasks to be run, failed, running, completed and succeeded in a given time frame. - - Process State Statistics: It refers to the statistics of the number of waiting, failing, running, completing and succeeding process instances in a specified time range. - - Process Definition Statistics: The process definition created by the user and the process definition granted by the administrator to the user are counted. - - -### Creating Process definitions - - Go to the project home page, click "Process definitions" and enter the list page of process definition. - - Click "Create process" to create a new process definition. - - Drag the "SHELL" node to the canvas and add a shell task. - - Fill in the Node Name, Description, and Script fields. - - Selecting "task priority" will give priority to high-level tasks in the execution queue. Tasks with the same priority will be executed in the first-in-first-out order. - - Timeout alarm. Fill in "Overtime Time". When the task execution time exceeds the overtime, it can alarm and fail over time. - - Fill in "Custom Parameters" and refer to [Custom Parameters](#Custom Parameters) -- -
- - Increase the order of execution between nodes: click "line connection". As shown, task 1 and task 3 are executed in parallel. When task 1 is executed, task 2 and task 3 are executed simultaneously. - -- -
- - - Delete dependencies: Click on the arrow icon to "drag nodes and select items", select the connection line, click on the delete icon to delete dependencies between nodes. -- -
- - - Click "Save", enter the name of the process definition, the description of the process definition, and set the global parameters. - -- -
- - - For other types of nodes, refer to [task node types and parameter settings](#task node types and parameter settings) - -### Execution process definition - - **The process definition of the off-line state can be edited, but not run**, so the on-line workflow is the first step. - > Click on the Process definition, return to the list of process definitions, click on the icon "online", online process definition. - - > Before setting workflow offline, the timed tasks in timed management should be offline, so that the definition of workflow can be set offline successfully. - - - Click "Run" to execute the process. Description of operation parameters: - * Failure strategy:**When a task node fails to execute, other parallel task nodes need to execute the strategy**。”Continue "Representation: Other task nodes perform normally", "End" Representation: Terminate all ongoing tasks and terminate the entire process. - * Notification strategy:When the process is over, send process execution information notification mail according to the process status. - * Process priority: The priority of process running is divided into five levels:the highest, the high, the medium, the low, and the lowest . High-level processes are executed first in the execution queue, and processes with the same priority are executed first in first out order. - * Worker group: This process can only be executed in a specified machine group. Default, by default, can be executed on any worker. - * Notification group: When the process ends or fault tolerance occurs, process information is sent to all members of the notification group by mail. - * Recipient: Enter the mailbox and press Enter key to save. When the process ends and fault tolerance occurs, an alert message is sent to the recipient list. - * Cc: Enter the mailbox and press Enter key to save. When the process is over and fault-tolerant occurs, alarm messages are copied to the copier list. - -- -
- - * Complement: To implement the workflow definition of a specified date, you can select the time range of the complement (currently only support for continuous days), such as the data from May 1 to May 10, as shown in the figure: - -- -
- -> Complement execution mode includes serial execution and parallel execution. In serial mode, the complement will be executed sequentially from May 1 to May 10. In parallel mode, the tasks from May 1 to May 10 will be executed simultaneously. - -### Timing Process Definition - - Create Timing: "Process Definition - > Timing" - - Choose start-stop time, in the start-stop time range, regular normal work, beyond the scope, will not continue to produce timed workflow instances. - -- -
- - Add a timer to be executed once a day at 5:00 a.m., as shown below:
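For reference, "once a day at 5:00 a.m." corresponds to the standard Quartz cron expression `0 0 5 * * ?` (the timed schedule is quartz based). A quick, illustrative way to sanity-check such an expression:

```java
import java.util.Date;
import org.quartz.CronExpression;

public class TimerCronCheck {
    public static void main(String[] args) throws Exception {
        // Quartz cron fields: second minute hour day-of-month month day-of-week
        CronExpression dailyAtFive = new CronExpression("0 0 5 * * ?");
        System.out.println("next fire time: " + dailyAtFive.getNextValidTimeAfter(new Date()));
    }
}
```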
- - Bring the timer online: **a newly created timer is offline, and you need to click "Timing Management -> online" before it takes effect.**
-
-### View process instances
- > Click on "Process Instances" to view the list of process instances.
-
- > Click on the process name to see the status of task execution.
- - > Click on the task node, click "View Log" to view the task execution log. - -- -
- - > Click on the task instance node, click **View History** to view the list of task instances that the process instance runs. - -- -
- - - > Operations on workflow instances: - -- -
- - * Editor: You can edit the terminated process. When you save it after editing, you can choose whether to update the process definition or not. - * Rerun: A process that has been terminated can be re-executed. - * Recovery failure: For a failed process, a recovery failure operation can be performed, starting at the failed node. - * Stop: Stop the running process, the background will `kill` he worker process first, then `kill -9` operation. - * Pause:The running process can be **suspended**, the system state becomes **waiting to be executed**, waiting for the end of the task being executed, and suspending the next task to be executed. - * Restore pause: **The suspended process** can be restored and run directly from the suspended node - * Delete: Delete process instances and task instances under process instances - * Gantt diagram: The vertical axis of Gantt diagram is the topological ordering of task instances under a process instance, and the horizontal axis is the running time of task instances, as shown in the figure: -- -
- -### View task instances - > Click on "Task Instance" to enter the Task List page and query the performance of the task. - > - > - -- -
- - > Click "View Log" in the action column to view the log of task execution. - -- -
- -### Create data source - > Data Source Center supports MySQL, POSTGRESQL, HIVE and Spark data sources. - -#### Create and edit MySQL data source - - - Click on "Datasource - > Create Datasources" to create different types of datasources according to requirements. -- Datasource: Select MYSQL -- Datasource Name: Name of Input Datasource -- Description: Description of input datasources -- IP: Enter the IP to connect to MySQL -- Port: Enter the port to connect MySQL -- User name: Set the username to connect to MySQL -- Password: Set the password to connect to MySQL -- Database name: Enter the name of the database connecting MySQL -- Jdbc connection parameters: parameter settings for MySQL connections, filled in as JSON - -- -
- - > Click "Test Connect" to test whether the data source can be successfully connected. - > - > - -#### Create and edit POSTGRESQL data source - -- Datasource: Select POSTGRESQL -- Datasource Name: Name of Input Data Source -- Description: Description of input data sources -- IP: Enter IP to connect to POSTGRESQL -- Port: Input port to connect POSTGRESQL -- Username: Set the username to connect to POSTGRESQL -- Password: Set the password to connect to POSTGRESQL -- Database name: Enter the name of the database connecting to POSTGRESQL -- Jdbc connection parameters: parameter settings for POSTGRESQL connections, filled in as JSON - -- -
- -#### Create and edit HIVE data source - -1.Connect with HiveServer 2 - -- -
-
- - Datasource: Select HIVE
-- Datasource Name: Name of Input Datasource
-- Description: Description of input datasources
-- IP: Enter IP to connect to HIVE
-- Port: Input port to connect to HIVE
-- Username: Set the username to connect to HIVE
-- Password: Set the password to connect to HIVE
-- Database Name: Enter the name of the database connecting to HIVE
-- Jdbc connection parameters: parameter settings for HIVE connections, filled in as JSON
-
-2.Connect using HiveServer2 HA Zookeeper mode
-
-
- - -Note: If **kerberos** is turned on, you need to fill in **Principal** -- -
-
-
-
-#### Create and edit Spark data source
-
-
-
-- Datasource: Select Spark
-- Datasource Name: Name of Input Datasource
-- Description: Description of input datasources
-- IP: Enter the IP to connect to Spark
-- Port: Input port to connect Spark
-- Username: Set the username to connect to Spark
-- Password: Set the password to connect to Spark
-- Database name: Enter the name of the database connecting to Spark
-- Jdbc Connection Parameters: Parameter settings for Spark Connections, filled in as JSON
-
-
-
-Note: If **kerberos** is turned on, you need to fill in **Principal**
-
-
- -### Upload Resources - - Upload resource files and udf functions, all uploaded files and resources will be stored on hdfs, so the following configuration items are required: - -``` -conf/common/common.properties - -- hdfs.startup.state=true -conf/common/hadoop.properties - -- fs.defaultFS=hdfs://xxxx:8020 - -- yarn.resourcemanager.ha.rm.ids=192.168.xx.xx,192.168.xx.xx - -- yarn.application.status.address=http://xxxx:8088/ws/v1/cluster/apps/%s -``` - -#### File Manage - - > It is the management of various resource files, including creating basic txt/log/sh/conf files, uploading jar packages and other types of files, editing, downloading, deleting and other operations. - > - > - >- > - >
-
-  * Create file
- > File formats support the following types: txt, log, sh, conf, cfg, py, java, sql, xml, hql
-
-
- - * Upload Files - -> Upload Files: Click the Upload button to upload, drag the file to the upload area, and the file name will automatically complete the uploaded file name. - -- -
- - - * File View - -> For viewable file types, click on the file name to view file details - -- -
-
-  * Download files
-
-> You can download a file by clicking the download button in the top right corner of the file details page, or by clicking the download button in the file list.
-
-  * File rename
-
-
-
-#### Delete
-> File List -> Click the Delete button to delete the specified file
-
-#### Resource management
-  > The resource management and file management functions are similar. The difference is that resource management is for uploading UDF functions, while file management is for uploading user programs, scripts and configuration files.
-
-  * Upload UDF resources
-  > The same as uploading files.
-
-#### Function management
-
-  * Create UDF Functions
-  > Click "Create UDF Function", enter the parameters of the UDF function, select the UDF resource, and click "Submit" to create the UDF function.
-  >
-  > Currently only temporary UDF functions for HIVE are supported.
-  >
-  > - UDF function name: the name of the UDF function
-  > - Package Name: the full class path of the UDF function
-  > - Parameter: used to annotate the input parameters of the function
-  > - Database Name: reserved field for creating permanent UDF functions
-  > - UDF Resources: set the resource file corresponding to the created UDF
-  >
-
-
-
-## Security
-
-  - The Security module provides queue management, tenant management, user management, warning group management, worker group management, token management and other functions. It can also authorize resources, data sources, projects, etc.
-- Administrator login, default username/password: admin/escheduler123
-
-
-
-### Create queues
-
-
-
-  - Queues are used when executing spark, mapreduce and other programs that require the "queue" parameter.
-- "Security" -> "Queue Manage" -> "Create Queue"
-
-
-
-### Create Tenants
-  - The tenant corresponds to a Linux account, which is used by the worker server to submit jobs. If the user does not exist on Linux, the worker creates the account when executing the task.
-  - Tenant Code: **the tenant code is the Linux user; it must be unique and cannot be duplicated.**
-
-
-
-### Create Ordinary Users
-  - User types are **ordinary users** and **administrator users**.
-    * Administrators have **authorization and user management** privileges, but no privileges to **create projects or process definitions**.
-    * Ordinary users can **create projects and create, edit, and execute process definitions**.
-    * Note: **If the user switches tenants, all resources under the old tenant will be copied to the new tenant.**
-
-
-### Create alarm group
-  * The alarm group is a parameter set at start-up. After the process finishes, the status of the process and other information will be sent to the alarm group by mail.
-  * Create and edit alarm groups
-
-
-### Create Worker Group
-  - The worker group provides a mechanism for tasks to run on a specified worker. Administrators create worker groups, which can be specified in task nodes and run parameters. If the specified group is deleted or no group is specified, the task will run on any worker.
-- Multiple IP addresses within a worker group (**hostnames or aliases cannot be used**), separated by **commas**
-
-
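-For illustration, the address list of a worker group might be filled in like this (placeholder IPs):
-
-```
-192.168.xx.10,192.168.xx.11,192.168.xx.12
-```
-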
-
-### Token manage
-  - Because the back-end interfaces require a login check, token management provides a way to operate the system by calling the interfaces directly with a token.
-- Call example:
-
-```java
-    /**
-     * test token
-     */
-    public void doPOSTParam()throws Exception{
-        // create HttpClient
-        CloseableHttpClient httpclient = HttpClients.createDefault();
-
-        // create http post request
-        HttpPost httpPost = new HttpPost("http://127.0.0.1:12345/escheduler/projects/create");
-        httpPost.setHeader("token", "123");
-        // set parameters
-        List
-
- -- 2.Select the project button to authorize the project - -- -
- -### Monitor center - - Service management is mainly to monitor and display the health status and basic information of each service in the system. - -#### Master monitor - - Mainly related information about master. -- -
- -#### Worker monitor - - Mainly related information of worker. - -- -
- -#### Zookeeper monitor - - Mainly the configuration information of each worker and master in zookpeeper. - -- -
- -#### Mysql monitor - - Mainly the health status of mysql - -- -
- -## Task Node Type and Parameter Setting - -### Shell - - - The shell node, when the worker executes, generates a temporary shell script, which is executed by a Linux user with the same name as the tenant. -> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs/images/toolbar_SHELL.png) task node in the toolbar onto the palette and double-click the task node as follows: - -- -
-
-- Node name: The node name within a process definition must be unique
-- Run flag: Identifies whether the node can be scheduled normally; if the node does not need to be executed, you can turn on the "forbidden execution" switch.
-- Description: Describes the function of the node
-- Number of failed retries: The number of times a failed task is resubmitted; supports drop-down selection and manual entry
-- Failure retry interval: The interval between resubmissions of a failed task; supports drop-down selection and manual entry
-- Script: The SHELL program developed by the user
-- Resources: A list of resource files that need to be invoked in the script
-- Custom parameters: Local user-defined parameters of the SHELL task; they replace ${variable} occurrences in the script
-
-### SUB_PROCESS
-  - The sub-process node executes an external workflow definition as a task node.
-> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_SUB_PROCESS.png) task node in the toolbar onto the palette and double-click the task node as follows:
-
-
-
-- Node name: The node name within a process definition must be unique
-- Run flag: Identifies whether the node is scheduled normally
-- Description: Describes the function of the node
-- Sub-node: Selects the process definition of the sub-process; you can jump to the selected sub-process's process definition by entering the sub-node in the upper right corner.
-
-### DEPENDENT
-
-  - Dependent nodes are **dependency checking nodes**. For example, process A depends on the successful execution of process B yesterday, and the dependent node checks whether process B has a successful execution instance from yesterday.
-
-> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs/images/toolbar_DEPENDENT.png) task node in the toolbar onto the palette and double-click the task node as follows:
-
-
- - > Dependent nodes provide logical judgment functions, such as checking whether yesterday's B process was successful or whether the C process was successfully executed. - -- -
- - > For example, process A is a weekly task and process B and C are daily tasks. Task A requires that task B and C be successfully executed every day of the last week, as shown in the figure: - -- -
-
-  > If weekly task A also needs to have executed successfully itself last Tuesday:
-
-
- -### PROCEDURE - - The procedure is executed according to the selected data source. -> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs/images/toolbar_PROCEDURE.png) task node in the toolbar onto the palette and double-click the task node as follows: - -- -
- -- Datasource: The data source type of stored procedure supports MySQL and POSTGRESQL, and chooses the corresponding data source. -- Method: The method name of the stored procedure -- Custom parameters: Custom parameter types of stored procedures support IN and OUT, and data types support nine data types: VARCHAR, INTEGER, LONG, FLOAT, DOUBLE, DATE, TIME, TIMESTAMP and BOOLEAN. - -### SQL - - Execute non-query SQL functionality -- -
- - - Executing the query SQL function, you can choose to send mail in the form of tables and attachments to the designated recipients. -> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs/images/toolbar_SQL.png) task node in the toolbar onto the palette and double-click the task node as follows: - -- -
-
-- Datasource: Select the corresponding datasource
-- sql type: Supports query and non-query. A query is a select-type statement that returns a result set; you can choose among three mail notification templates: table, attachment, or table plus attachment. A non-query returns no result set and is for update, delete and insert operations.
-- sql parameter: The input parameter format is key1=value1;key2=value2...
-- sql statement: The SQL statement
-- UDF function: For HIVE-type data sources, you can reference UDF functions created in the resource center; other types of data sources do not support UDF functions for the time being.
-- Custom parameters: The custom parameter types and data types are the same as for the stored procedure task type. The difference is that the SQL task's custom parameters replace the ${variable} placeholders in the SQL statement, whereas stored procedure parameters set values for the method in the defined order.
-
-
-
-### SPARK
-
-  - Through the SPARK node, a SPARK program can be executed directly. For a Spark node, the worker submits the task in `spark-submit` mode.
-
-> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs/images/toolbar_SPARK.png) task node in the toolbar onto the palette and double-click the task node as follows:
->
->
-
-
-
-- Program Type: Supports JAVA, Scala and Python
-- Class of the main function: The full path of the Main Class, the entry point of the Spark program
-- Main jar package: The jar package of the Spark program
-- Deployment: Supports three modes: yarn-cluster, yarn-client and local
-- Driver cores: The number of Driver cores and the Driver memory can be set
-- Executor number: The number of Executors, the Executor memory and the number of Executor cores can be set
-- Command Line Parameters: Sets the input parameters of the Spark program and supports the replacement of custom parameter variables.
-- Other parameters: Supports the --jars, --files, --archives and --conf formats
-- Resource: If a resource file is referenced in other parameters, you need to select the specified resource.
-- Custom parameters: Local user-defined parameters of the SPARK task that replace the ${variable} contents in the script
-
-Note: JAVA and Scala are just used for identification and make no difference. If the Spark program is developed in Python, there is no class of the main function; everything else is the same.
-
-### MapReduce(MR)
-  - Using the MR node, MR programs can be executed directly. For an MR node, the worker submits the task using `hadoop jar`.
-
-
-> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs/images/toolbar_MR.png) task node in the toolbar onto the palette and double-click the task node as follows:
-
- 1. JAVA program
-
-
-
-- Class of the main function: The full path of the MR program's entry Main Class
-- Program Type: Select the JAVA language
-- Main jar package: The MR jar package
-- Command Line Parameters: Sets the input parameters of the MR program and supports the replacement of custom parameter variables
-- Other parameters: Supports the -D, -files, -libjars and -archives formats
-- Resource: If a resource file is referenced in other parameters, you need to select the specified resource.
-- Custom parameters: Local user-defined parameters of the MR task that replace the ${variable} contents in the script
-
-2. Python program
-
-
-
-- Program Type: Select the Python language
-- Main jar package: The Python jar package for running MR
-- Other parameters: Supports the -D, -mapper, -reducer, -input and -output formats, where user-defined parameters can be set, for example:
-- -mapper "mapper.py 1" -file mapper.py -reducer reducer.py -file reducer.py -input /journey/words.txt -output /journey/out/mr/${currentTimeMillis}
-- Here "mapper.py 1" after -mapper is two parameters: the first parameter is mapper.py, and the second parameter is 1.
-- Resource: If a resource file is referenced in other parameters, you need to select the specified resource.
-- Custom parameters: Local user-defined parameters of the MR task that replace the ${variable} contents in the script
-
-### Python
-  - With the Python node, Python scripts can be executed directly. For a Python node, the worker submits the task using `python **`.
-
-
-
-
-> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs/images/toolbar_PYTHON.png) task node in the toolbar onto the palette and double-click the task node as follows:
-
-
-
-- Script: The Python program developed by the user
-- Resource: A list of resource files that need to be invoked in the script
-- Custom parameters: Local user-defined parameters of the Python task that replace the ${variable} contents in the script
-
-### System parameter
-
-variable | meaning |
----|---|
-${system.biz.date} | The day before the scheduled time of the routine scheduling instance, in yyyyMMdd format; when supplementing data, this date + 1 |
-${system.biz.curdate} | The scheduled time of the routine scheduling instance, in yyyyMMdd format; when supplementing data, this date + 1 |
-${system.datetime} | The scheduled time of the routine scheduling instance, in yyyyMMddHHmmss format; when supplementing data, this date + 1 |
- -
- -> global_bizdate is a global parameter, referring to system parameters. - -- -
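-
-A minimal sketch of how these parameter references might chain together in a shell task (the names global_bizdate and local_param_bizdate come from the example discussed below; this is illustrative only):
-
-```
-# global parameter defined on the process definition (illustrative):
-#   global_bizdate = ${system.biz.date}
-# local parameter defined on the task node (illustrative):
-#   local_param_bizdate = ${global_bizdate}
-# inside the task's shell script, reference the local parameter:
-echo "business date is ${local_param_bizdate}"
-```
-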
- -> In tasks, local_param_bizdate refers to global parameters by ${global_bizdate} for scripts, the value of variable local_param_bizdate can be referenced by${local_param_bizdate}, or the value of local_param_bizdate can be set directly by JDBC. - - - diff --git a/docs/en_US/upgrade.md b/docs/en_US/upgrade.md deleted file mode 100644 index b5c743fd84..0000000000 --- a/docs/en_US/upgrade.md +++ /dev/null @@ -1,39 +0,0 @@ - -# EasyScheduler upgrade documentation - -## 1. Back up the previous version of the files and database - -## 2. Stop all services of escheduler - - `sh ./script/stop-all.sh` - -## 3. Download the new version of the installation package - -- [gitee](https://gitee.com/easyscheduler/EasyScheduler/attach_files), download the latest version of the front and back installation packages (backend referred to as escheduler-backend, front end referred to as escheduler-ui) -- The following upgrade operations need to be performed in the new version of the directory - -## 4. Database upgrade -- Modify the following properties in conf/dao/data_source.properties - -``` - spring.datasource.url - spring.datasource.username - spring.datasource.password -``` - -- Execute database upgrade script - -`sh ./script/upgrade-escheduler.sh` - -## 5. Backend service upgrade - -- Modify the content of the install.sh configuration and execute the upgrade script - - `sh install.sh` - -## 6. Frontend service upgrade - -- Overwrite the previous version of the dist directory -- Restart the nginx service - - `systemctl restart nginx` \ No newline at end of file diff --git a/docs/zh_CN/1.0.1-release.md b/docs/zh_CN/1.0.1-release.md deleted file mode 100644 index 1902fbb04a..0000000000 --- a/docs/zh_CN/1.0.1-release.md +++ /dev/null @@ -1,16 +0,0 @@ -Easy Scheduler Release 1.0.1 -=== -Easy Scheduler 1.0.2是1.x系列中的第二个版本。更新内容具体如下: - -- 1,outlook TSL 发邮件支持 -- 2,servlet 和 protobuf jar冲突解决 -- 3,创建租户同时建立linux用户 -- 4,重跑时间负数 -- 5,单机和集群都可以使用install.sh一键部署 -- 6,队列支持界面添加 -- 7,escheduler.t_escheduler_queue 增加了create_time和update_time字段 - - - - - diff --git a/docs/zh_CN/1.0.2-release.md b/docs/zh_CN/1.0.2-release.md deleted file mode 100644 index c3bacc29c9..0000000000 --- a/docs/zh_CN/1.0.2-release.md +++ /dev/null @@ -1,49 +0,0 @@ -Easy Scheduler Release 1.0.2 -=== -Easy Scheduler 1.0.2是1.x系列中的第三个版本。此版本增加了调度开放接口、worker分组(指定任务运行的机器组)、任务流程及服务监控以及对oracle、clickhouse等支持,具体如下: - -新特性: -=== -- [[EasyScheduler-79](https://github.com/analysys/EasyScheduler/issues/79)] 调度通过token方式对外开放接口,可以通过api进行操作 -- [[EasyScheduler-138](https://github.com/analysys/EasyScheduler/issues/138)] 可以指定任务运行的机器(组) -- [[EasyScheduler-139](https://github.com/analysys/EasyScheduler/issues/139)] 任务流程监控及Master、Worker、Zookeeper运行状态监控 -- [[EasyScheduler-140](https://github.com/analysys/EasyScheduler/issues/140)] 工作流定义—增加流程超时报警 -- [[EasyScheduler-134](https://github.com/analysys/EasyScheduler/issues/134)] 任务类型支持Oracle、CLICKHOUSE、SQLSERVER、IMPALA -- [[EasyScheduler-136](https://github.com/analysys/EasyScheduler/issues/136)] Sql任务节点可以独立选取抄送邮件用户 -- [[EasyScheduler-141](https://github.com/analysys/EasyScheduler/issues/141)] 用户管理—用户可以绑定队列,用户队列级别高于租户队列级别,如果用户队列为空,则寻找租户队列 - - - -增强: -=== -- [[EasyScheduler-154](https://github.com/analysys/EasyScheduler/issues/154)] 租户编码允许纯数字或者下划线这种的编码 - - -修复: -=== -- [[EasyScheduler-135](https://github.com/analysys/EasyScheduler/issues/135)] Python任务可以指定python版本 - -- [[EasyScheduler-125](https://github.com/analysys/EasyScheduler/issues/125)] 用户账号中手机号无法识别联通最新号码166开头 - -- 
[[EasyScheduler-178](https://github.com/analysys/EasyScheduler/issues/178)] 修复ProcessDao里细微的拼写错误 - -- [[EasyScheduler-129](https://github.com/analysys/EasyScheduler/issues/129)] 租户管理中,租户编码带下划线等特殊字符无法通过校验 - - -感谢: -=== -最后但最重要的是,没有以下伙伴的贡献就没有新版本的诞生: - -Baoqi , chubbyjiang , coreychen , chgxtony, cmdares , datuzi , dingchao, fanguanqun , 风清扬, gaojun416 , googlechorme, hyperknob , hujiang75277381 , huanzui , kinssun, ivivi727 ,jimmy, jiangzhx , kevin5210 , lidongdai , lshmouse , lenboo, lyf198972 , lgcareer , lzy305 , moranrr , millionfor , mazhong8808, programlief, qiaozhanwei , roy110 , swxchappy , sherlock111 , samz406 , swxchappy, qq389401879 , lzy305, vkingnew, William-GuoWei , woniulinux, yyl861, zhangxin1988, yangjiajun2014, yangqinlong, yangjiajun2014, zhzhenqin, zhangluck, zhanghaicheng1, zhuyizhizhi - -以及微信群里众多的热心伙伴!在此非常感谢! - - - - - - - - - - diff --git a/docs/zh_CN/1.0.3-release.md b/docs/zh_CN/1.0.3-release.md deleted file mode 100644 index d89e05dd90..0000000000 --- a/docs/zh_CN/1.0.3-release.md +++ /dev/null @@ -1,30 +0,0 @@ -Easy Scheduler Release 1.0.3 -=== -Easy Scheduler 1.0.3是1.x系列中的第四个版本。 - -增强: -=== -- [[EasyScheduler-482]](https://github.com/analysys/EasyScheduler/issues/482)sql任务中的邮件标题增加了对自定义变量的支持 -- [[EasyScheduler-483]](https://github.com/analysys/EasyScheduler/issues/483)sql任务中的发邮件失败,则此sql任务为失败 -- [[EasyScheduler-484]](https://github.com/analysys/EasyScheduler/issues/484)修改sql任务中自定义变量的替换规则,支持多个单引号和双引号的替换 -- [[EasyScheduler-485]](https://github.com/analysys/EasyScheduler/issues/485)创建资源文件时,增加对该资源文件是否在hdfs上已存在的验证 - -修复: -=== -- [[EasyScheduler-198]](https://github.com/analysys/EasyScheduler/issues/198) 流程定义列表根据定时状态和更新时间进行排序 -- [[EasyScheduler-419]](https://github.com/analysys/EasyScheduler/issues/419) 修复在线创建文件,hdfs文件未创建,却返回成功 -- [[EasyScheduler-481]](https://github.com/analysys/EasyScheduler/issues/481)修复job不存在定时无法下线的问题 -- [[EasyScheduler-425]](https://github.com/analysys/EasyScheduler/issues/425) kill任务时增加对其子进程的kill -- [[EasyScheduler-422]](https://github.com/analysys/EasyScheduler/issues/422) 修复更新资源文件时更新时间和大小未更新的问题 -- [[EasyScheduler-431]](https://github.com/analysys/EasyScheduler/issues/431) 修复删除租户时,如果未启动hdfs,则删除租户失败的问题 -- [[EasyScheduler-485]](https://github.com/analysys/EasyScheduler/issues/486) shell进程退出,yarn状态非终态等待判断 - -感谢: -=== -最后但最重要的是,没有以下伙伴的贡献就没有新版本的诞生: - -Baoqi, jimmy201602, samz406, petersear, millionfor, hyperknob, fanguanqun, yangqinlong, qq389401879, -feloxx, coding-now, hymzcn, nysyxxg, chgxtony - -以及微信群里众多的热心伙伴!在此非常感谢! 
- diff --git a/docs/zh_CN/1.0.4-release.md b/docs/zh_CN/1.0.4-release.md deleted file mode 100644 index 5df1027e08..0000000000 --- a/docs/zh_CN/1.0.4-release.md +++ /dev/null @@ -1,28 +0,0 @@ -Easy Scheduler Release 1.0.4 -=== -Easy Scheduler 1.0.4是1.x系列中的第五个版本。 - -**修复**: -- [[EasyScheduler-198]](https://github.com/analysys/EasyScheduler/issues/198) 流程定义列表根据定时状态和更新时间进行排序 -- [[EasyScheduler-419]](https://github.com/analysys/EasyScheduler/issues/419) 修复在线创建文件,hdfs文件未创建,却返回成功 -- [[EasyScheduler-481]](https://github.com/analysys/EasyScheduler/issues/481)修复job不存在定时无法下线的问题 -- [[EasyScheduler-425]](https://github.com/analysys/EasyScheduler/issues/425) kill任务时增加对其子进程的kill -- [[EasyScheduler-422]](https://github.com/analysys/EasyScheduler/issues/422) 修复更新资源文件时更新时间和大小未更新的问题 -- [[EasyScheduler-431]](https://github.com/analysys/EasyScheduler/issues/431) 修复删除租户时,如果未启动hdfs,则删除租户失败的问题 -- [[EasyScheduler-485]](https://github.com/analysys/EasyScheduler/issues/486) shell进程退出,yarn状态非终态等待判断 - -**增强**: -- [[EasyScheduler-482]](https://github.com/analysys/EasyScheduler/issues/482)sql任务中的邮件标题增加了对自定义变量的支持 -- [[EasyScheduler-483]](https://github.com/analysys/EasyScheduler/issues/483)sql任务中的发邮件失败,则此sql任务为失败 -- [[EasyScheduler-484]](https://github.com/analysys/EasyScheduler/issues/484)修改sql任务中自定义变量的替换规则,支持多个单引号和双引号的替换 -- [[EasyScheduler-485]](https://github.com/analysys/EasyScheduler/issues/485)创建资源文件时,增加对该资源文件是否在hdfs上已存在的验证 - - -感谢: -=== -最后但最重要的是,没有以下伙伴的贡献就没有新版本的诞生(排名不分先后): - -Baoqi, jimmy201602, samz406, petersear, millionfor, hyperknob, fanguanqun, yangqinlong, qq389401879, -feloxx, coding-now, hymzcn, nysyxxg, chgxtony, lfyee, Crossoverrr, gj-zhang, sunnyingit, xianhu, zhengqiangtan - -以及微信群/钉钉群里众多的热心伙伴!在此非常感谢! \ No newline at end of file diff --git a/docs/zh_CN/1.0.5-release.md b/docs/zh_CN/1.0.5-release.md deleted file mode 100644 index da86c6b207..0000000000 --- a/docs/zh_CN/1.0.5-release.md +++ /dev/null @@ -1,23 +0,0 @@ -Easy Scheduler Release 1.0.5 -=== -Easy Scheduler 1.0.5是1.x系列中的第六个版本。 - -增强: -=== -- [[EasyScheduler-597]](https://github.com/analysys/EasyScheduler/issues/597)child process cannot extend father's receivers and cc - -修复 -=== -- [[EasyScheduler-516]](https://github.com/analysys/EasyScheduler/issues/516)The task instance of MR cannot stop in some cases -- [[EasyScheduler-594]](https://github.com/analysys/EasyScheduler/issues/594)soft kill task 后 进程依旧存在(父进程 子进程) - - -感谢: -=== -最后但最重要的是,没有以下伙伴的贡献就没有新版本的诞生: - -Baoqi, jimmy201602, samz406, petersear, millionfor, hyperknob, fanguanqun, yangqinlong, qq389401879, feloxx, coding-now, hymzcn, nysyxxg, chgxtony, gj-zhang, xianhu, sunnyingit, -zhengqiangtan, chinashenkai - -以及微信群里众多的热心伙伴!在此非常感谢! 
- diff --git a/docs/zh_CN/1.1.0-release.md b/docs/zh_CN/1.1.0-release.md deleted file mode 100644 index b603180708..0000000000 --- a/docs/zh_CN/1.1.0-release.md +++ /dev/null @@ -1,63 +0,0 @@ -Easy Scheduler Release 1.1.0 -=== -Easy Scheduler 1.1.0是1.1.x系列中的第一个版本。 - -新特性: -=== -- [[EasyScheduler-391](https://github.com/analysys/EasyScheduler/issues/391)] run a process under a specified tenement user -- [[EasyScheduler-288](https://github.com/analysys/EasyScheduler/issues/288)] Feature/qiye_weixin -- [[EasyScheduler-189](https://github.com/analysys/EasyScheduler/issues/189)] Kerberos等安全支持 -- [[EasyScheduler-398](https://github.com/analysys/EasyScheduler/issues/398)]管理员,有租户(install.sh设置默认租户),可以创建资源、项目和数据源(限制有一个管理员) -- [[EasyScheduler-293](https://github.com/analysys/EasyScheduler/issues/293)]点击运行流程时候选择的参数,没有地方可查看,也没有保存 -- [[EasyScheduler-401](https://github.com/analysys/EasyScheduler/issues/401)]定时很容易定时每秒一次,定时完成以后可以在页面显示一下下次触发时间 -- [[EasyScheduler-493](https://github.com/analysys/EasyScheduler/pull/493)]add datasource kerberos auth and FAQ modify and add resource upload s3 - - -增强: -=== -- [[EasyScheduler-227](https://github.com/analysys/EasyScheduler/issues/227)] upgrade spring-boot to 2.1.x and spring to 5.x -- [[EasyScheduler-434](https://github.com/analysys/EasyScheduler/issues/434)] worker节点数量 zk和mysql中不一致 -- [[EasyScheduler-435](https://github.com/analysys/EasyScheduler/issues/435)]邮箱格式的验证 -- [[EasyScheduler-441](https://github.com/analysys/EasyScheduler/issues/441)] 禁止运行节点加入已完成节点检测 -- [[EasyScheduler-400](https://github.com/analysys/EasyScheduler/issues/400)] 首页页面,队列统计不和谐,命令统计无数据 -- [[EasyScheduler-395](https://github.com/analysys/EasyScheduler/issues/395)] 对于容错恢复的流程,状态不能为 **正在运行 -- [[EasyScheduler-529](https://github.com/analysys/EasyScheduler/issues/529)] optimize poll task from zookeeper -- [[EasyScheduler-242](https://github.com/analysys/EasyScheduler/issues/242)]worker-server节点获取任务性能问题 -- [[EasyScheduler-352](https://github.com/analysys/EasyScheduler/issues/352)]worker 分组, 队列消费问题 -- [[EasyScheduler-461](https://github.com/analysys/EasyScheduler/issues/461)]查看数据源参数,需要加密账号密码信息 -- [[EasyScheduler-396](https://github.com/analysys/EasyScheduler/issues/396)]Dockerfile优化,并关联Dockerfile和github实现自动打镜像 -- [[EasyScheduler-389](https://github.com/analysys/EasyScheduler/issues/389)]service monitor cannot find the change of master/worker -- [[EasyScheduler-511](https://github.com/analysys/EasyScheduler/issues/511)]support recovery process from stop/kill nodes. 
-- [[EasyScheduler-399](https://github.com/analysys/EasyScheduler/issues/399)]HadoopUtils指定用户操作,而不是 **部署用户 -- [[EasyScheduler-378](https://github.com/analysys/EasyScheduler/issues/378)]Mailbox regular match -- [[EasyScheduler-625](https://github.com/analysys/EasyScheduler/issues/625)]EasyScheduler call shell "task instance not set host" -- [[EasyScheduler-622](https://github.com/analysys/EasyScheduler/issues/622)]Front-end interface deployment k8s, background deployment big data cluster session error - -修复: -=== -- [[EasyScheduler-394](https://github.com/analysys/EasyScheduler/issues/394)] master&worker部署在同一台机器上时,如果重启master&worker服务,会导致之前调度的任务无法继续调度 -- [[EasyScheduler-469](https://github.com/analysys/EasyScheduler/issues/469)]Fix naming errors,monitor page -- [[EasyScheduler-392](https://github.com/analysys/EasyScheduler/issues/392)]Feature request: fix email regex check -- [[EasyScheduler-405](https://github.com/analysys/EasyScheduler/issues/405)]定时修改/添加页面,开始时间和结束时间不能相同 -- [[EasyScheduler-517](https://github.com/analysys/EasyScheduler/issues/517)]补数 - 子工作流 - 时间参数 -- [[EasyScheduler-532](https://github.com/analysys/EasyScheduler/issues/532)]python节点不执行的问题 -- [[EasyScheduler-543](https://github.com/analysys/EasyScheduler/issues/543)]optimize datasource connection params safety -- [[EasyScheduler-569](https://github.com/analysys/EasyScheduler/issues/569)]定时任务无法真正停止 -- [[EasyScheduler-463](https://github.com/analysys/EasyScheduler/issues/463)]邮箱验证不支持非常见后缀邮箱 -- [[EasyScheduler-650](https://github.com/analysys/EasyScheduler/issues/650)]Creating a hive data source without a principal will cause the connection to fail -- [[EasyScheduler-641](https://github.com/analysys/EasyScheduler/issues/641)]The cellphone is not supported for 199 telecom segment when create a user -- [[EasyScheduler-627](https://github.com/analysys/EasyScheduler/issues/627)]Different sql node task logs in parallel in the same workflow will be mixed -- [[EasyScheduler-655](https://github.com/analysys/EasyScheduler/issues/655)]when deploy a spark task,the tentant queue not empty,set with a empty queue name -- [[EasyScheduler-667](https://github.com/analysys/EasyScheduler/issues/667)]HivePreparedStatement can't print the actual SQL executed - - - - -感谢: -=== -最后但最重要的是,没有以下伙伴的贡献就没有新版本的诞生: - -Baoqi, jimmy201602, samz406, petersear, millionfor, hyperknob, fanguanqun, yangqinlong, qq389401879, chgxtony, Stanfan, lfyee, thisnew, hujiang75277381, sunnyingit, lgbo-ustc, ivivi, lzy305, JackIllkid, telltime, lipengbo2018, wuchunfu, telltime, chenyuan9028, zhangzhipeng621, thisnew, 307526982, crazycarry - -以及微信群里众多的热心伙伴!在此非常感谢! - diff --git a/docs/zh_CN/EasyScheduler-FAQ.md b/docs/zh_CN/EasyScheduler-FAQ.md deleted file mode 100644 index 360565a4ee..0000000000 --- a/docs/zh_CN/EasyScheduler-FAQ.md +++ /dev/null @@ -1,287 +0,0 @@ -## Q:EasyScheduler服务介绍及建议运行内存 - -A: EasyScheduler由5个服务组成,MasterServer、WorkerServer、ApiServer、AlertServer、LoggerServer和UI。 - -| 服务 | 说明 | -| ------------------------- | ------------------------------------------------------------ | -| MasterServer | 主要负责 **DAG** 的切分和任务状态的监控 | -| WorkerServer/LoggerServer | 主要负责任务的提交、执行和任务状态的更新。LoggerServer用于Rest Api通过 **RPC** 查看日志 | -| ApiServer | 提供Rest Api服务,供UI进行调用 | -| AlertServer | 提供告警服务 | -| UI | 前端页面展示 | - -注意:**由于服务比较多,建议单机部署最好是4核16G以上** - ---- - -## Q: 管理员为什么不能创建项目 - -A:管理员目前属于"**纯管理**", 没有租户,即没有linux上对应的用户,所以没有执行权限, **故没有所属的项目、资源及数据源**,所以没有创建权限。**但是有所有的查看权限**。如果需要创建项目等业务操作,**请使用管理员创建租户和普通用户,然后使用普通用户登录进行操作**。我们将会在1.1.0版本中将管理员的创建和执行权限放开,管理员将会有所有的权限 - ---- - -## Q:系统支持哪些邮箱? 
- -A:支持绝大多数邮箱,qq、163、126、139、outlook、aliyun等皆支持。支持**TLS和SSL**协议,可以在alert.properties中选择性配置 - ---- - -## Q:常用的系统变量时间参数有哪些,如何使用? - -A:请参考 https://analysys.github.io/easyscheduler_docs_cn/%E7%B3%BB%E7%BB%9F%E4%BD%BF%E7%94%A8%E6%89%8B%E5%86%8C.html#%E7%B3%BB%E7%BB%9F%E5%8F%82%E6%95%B0 - ---- - -## Q:pip install kazoo 这个安装报错。是必须安装的吗? - -A: 这个是python连接zookeeper需要使用到的,必须要安装 - ---- - -## Q: 怎么指定机器运行任务 - -A:使用 **管理员** 创建Worker分组,在 **流程定义启动** 的时候可**指定Worker分组**或者在**任务节点上指定Worker分组**。如果不指定,则使用Default,**Default默认是使用的集群里所有的Worker中随机选取一台来进行任务提交、执行** - ---- - -## Q:任务的优先级 - -A:我们同时 **支持流程和任务的优先级**。优先级我们有 **HIGHEST、HIGH、MEDIUM、LOW和LOWEST** 五种级别。**可以设置不同流程实例之间的优先级,也可以设置同一个流程实例中不同任务实例的优先级**。详细内容请参考任务优先级设计 https://analysys.github.io/easyscheduler_docs_cn/%E7%B3%BB%E7%BB%9F%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1.html#%E7%B3%BB%E7%BB%9F%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1 - ----- - -## Q:escheduler-grpc报错 - -A:在根目录下执行:mvn -U clean package assembly:assembly -Dmaven.test.skip=true , 然后刷新下整个项目 - ----- - -## Q:EasyScheduler支持windows上运行么 - -A: 理论上只有**Worker是需要在Linux上运行的**,其它的服务都是可以在windows上正常运行的。但是还是建议最好能在linux上部署使用 - ------ - -## Q:UI 在 linux 编译node-sass提示:Error:EACCESS:permission denied,mkdir xxxx - -A:单独安装 **npm install node-sass --unsafe-perm**,之后再 **npm install** - ---- - -## Q:UI 不能正常登陆访问 - -A: 1,如果是node启动的查看escheduler-ui下的.env API_BASE配置是否是Api Server服务地址 - - 2,如果是nginx启动的并且是通过 **install-escheduler-ui.sh** 安装的,查看 **/etc/nginx/conf.d/escheduler.conf** 中的proxy_pass配置是否是Api Server服务地址 - - 3,如果以上配置都是正确的,那么请查看Api Server服务是否是正常的,curl http://192.168.xx.xx:12345/escheduler/users/get-user-info,查看Api Server日志,如果提示 cn.escheduler.api.interceptor.LoginHandlerInterceptor:[76] - session info is null,则证明Api Server服务是正常的 - - 4,如果以上都没有问题,需要查看一下 **application.properties** 中的 **server.context-path 和 server.port 配置**是否正确 - ---- - -## Q: 流程定义手动启动或调度启动之后,没有流程实例生成 - -A: 1,首先通过**jps 查看MasterServer服务是否存在**,或者从服务监控直接查看zk中是否存在master服务 - - 2,如果存在master服务,查看 **命令状态统计** 或者 **t_escheduler_error_command** 中是否增加的新记录,如果增加了,**请查看 message 字段定位启动异常原因** - ---- - -## Q : 任务状态一直处于提交成功状态 - -A: 1,首先通过**jps 查看WorkerServer服务是否存在**,或者从服务监控直接查看zk中是否存在worker服务 - - 2,如果 **WorkerServer** 服务正常,需要 **查看MasterServer是否把task任务放到zk队列中** ,**需要查看MasterServer日志及zk队列中是否有任务阻塞** - - 3,如果以上都没有问题,需要定位是否指定了Worker分组,但是 **Worker分组的机器不是在线状态** - ---- - -## Q: 是否提供Docker镜像及Dockerfile - -A: 提供Docker镜像及Dockerfile。 - -Docker镜像地址:https://hub.docker.com/r/escheduler/escheduler_images - -Dockerfile地址:https://github.com/qiaozhanwei/escheduler_dockerfile/tree/master/docker_escheduler - ---- - -## Q : install.sh 中需要注意问题 - -A: 1,如果替换变量中包含特殊字符,**请用 \ 转移符进行转移** - - 2,installPath="/data1_1T/escheduler",**这个目录不能和当前要一键安装的install.sh目录是一样的** - - 3,deployUser="escheduler",**部署用户必须具有sudo权限**,因为worker是通过sudo -u 租户 sh xxx.command进行执行的 - - 4,monitorServerState="false",服务监控脚本是否启动,默认是不启动服务监控脚本的。**如果启动服务监控脚本,则每5分钟定时来监控master和worker的服务是否down机,如果down机则会自动重启** - - 5,hdfsStartupSate="false",是否开启HDFS资源上传功能。默认是不开启的,**如果不开启则资源中心是不能使用的**。如果开启,需要conf/common/hadoop/hadoop.properties中配置fs.defaultFS和yarn的相关配置,如果使用namenode HA,需要将core-site.xml和hdfs-site.xml复制到conf根目录下 - - 注意:**1.0.x版本是不会自动创建hdfs根目录的,需要自行创建,并且需要部署用户有hdfs的操作权限** - ---- - -## Q : 流程定义和流程实例下线异常 - -A : 对于 **1.0.4 以前的版本中**,修改escheduler-api cn.escheduler.api.quartz包下的代码即可 - -``` -public boolean deleteJob(String jobName, String jobGroupName) { - lock.writeLock().lock(); - try { - JobKey jobKey = new JobKey(jobName,jobGroupName); - if(scheduler.checkExists(jobKey)){ - logger.info("try to delete job, job name: {}, job group name: {},", jobName, jobGroupName); - 
return scheduler.deleteJob(jobKey); - }else { - return true; - } - - } catch (SchedulerException e) { - logger.error(String.format("delete job : %s failed",jobName), e); - } finally { - lock.writeLock().unlock(); - } - return false; - } -``` - ---- - -## Q : HDFS启动之前创建的租户,能正常使用资源中心吗 - -A: 不能。因为在未启动HDFS创建的租户,不会在HDFS中注册租户目录。所以上次资源会报错 - -## Q : 多Master和多Worker状态下,服务掉了,怎么容错 - -A: **注意:Master监控Master及Worker服务。** - - 1,如果Master服务掉了,其它的Master会接管挂掉的Master的流程,继续监控Worker task状态 - - 2,如果Worker服务掉,Master会监控到Worker服务掉了,如果存在Yarn任务,Kill Yarn任务之后走重试 - -具体请看容错设计:https://analysys.github.io/easyscheduler_docs_cn/%E7%B3%BB%E7%BB%9F%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1.html#%E7%B3%BB%E7%BB%9F%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1 - ---- - -## Q : 对于Master和Worker一台机器伪分布式下的容错 - -A : 1.0.3 版本只实现了Master启动流程容错,不走Worker容错。也就是说如果Worker挂掉的时候,没有Master存在。这流程将会出现问题。我们会在 **1.1.0** 版本中增加Master和Worker启动自容错,修复这个问题。如果想手动修改这个问题,需要针对 **跨重启正在运行流程** **并且已经掉的正在运行的Worker任务,需要修改为失败**,**同时跨重启正在运行流程设置为失败状态**。然后从失败节点进行流程恢复即可 - ---- - -## Q : 定时容易设置成每秒执行 - -A : 设置定时的时候需要注意,如果第一位(* * * * * ? *)设置成 \* ,则表示每秒执行。**我们将会在1.1.0版本中加入显示最近调度的时间列表** ,使用http://cron.qqe2.com/ 可以在线看近5次运行时间 - - - -## Q: 定时有有效时间范围吗 - -A:有的,**如果定时的起止时间是同一个时间,那么此定时将是无效的定时**。**如果起止时间的结束时间比当前的时间小,很有可能定时会被自动删除** - - - -## Q : 任务依赖有几种实现 - -A: 1,**DAG** 之间的任务依赖关系,是从 **入度为零** 进行DAG切分的 - - 2,有 **任务依赖节点** ,可以实现跨流程的任务或者流程依赖,具体请参考 依赖(DEPENDENT)节点:https://analysys.github.io/easyscheduler_docs_cn/%E7%B3%BB%E7%BB%9F%E4%BD%BF%E7%94%A8%E6%89%8B%E5%86%8C.html#%E4%BB%BB%E5%8A%A1%E8%8A%82%E7%82%B9%E7%B1%BB%E5%9E%8B%E5%92%8C%E5%8F%82%E6%95%B0%E8%AE%BE%E7%BD%AE - - 注意:**不支持跨项目的流程或任务依赖** - -## Q: 流程定义有几种启动方式 - -A: 1,在 **流程定义列表**,点击 **启动** 按钮 - - 2,**流程定义列表添加定时器**,调度启动流程定义 - - 3,流程定义 **查看或编辑** DAG 页面,任意 **任务节点右击** 启动流程定义 - - 4,可以对流程定义 DAG 编辑,设置某些任务的运行标志位 **禁止运行**,则在启动流程定义的时候,将该节点的连线将从DAG中去掉 - -## Q : Python任务设置Python版本 - -A: 1,对于1**.0.3之后的版本**只需要修改 conf/env/.escheduler_env.sh中的PYTHON_HOME - -``` -export PYTHON_HOME=/bin/python -``` - -注意:这了 **PYTHON_HOME** ,是python命令的绝对路径,而不是单纯的 PYTHON_HOME,还需要注意的是 export PATH 的时候,需要直接 - -``` -export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$PATH -``` - - 2,对 1.0.3 之前的版本,Python任务只能支持系统的Python版本,不支持指定Python版本 - -## Q: Worker Task 通过sudo -u 租户 sh xxx.command会产生子进程,在kill的时候,是否会杀掉 - -A: 我们会在1.0.4中增加kill任务同时,kill掉任务产生的各种所有子进程 - - - -## Q : EasyScheduler中的队列怎么用,用户队列和租户队列是什么意思 - -A : EasyScheduler 中的队列可以在用户或者租户上指定队列,**用户指定的队列优先级是高于租户队列的优先级的。**,例如:对MR任务指定队列,是通过 mapreduce.job.queuename 来指定队列的。 - -注意:MR在用以上方法指定队列的时候,传递参数请使用如下方式: - -``` - Configuration conf = new Configuration(); - GenericOptionsParser optionParser = new GenericOptionsParser(conf, args); - String[] remainingArgs = optionParser.getRemainingArgs(); -``` - - - -如果是Spark任务 --queue 方式指定队列 - - - -## Q : Master 或者 Worker报如下告警 - -- -
- - - -A : 修改conf下的 master.properties **master.reserved.memory** 的值为更小的值,比如说0.1 或者 - -worker.properties **worker.reserved.memory** 的值为更小的值,比如说0.1 - - - -## Q : hive版本是1.1.0+cdh5.15.0,SQL hive任务连接报错 - -- -
- - - -A : 将 hive pom - -``` -- -
- -* 创建队列 - -- -
- - * 创建租户 -- -
- - * 创建普通用户 -- -
- - * 创建告警组 - -- -
- - * 使用普通用户登录 - > 点击右上角用户名“退出”,重新使用普通用户登录。 - - * 项目管理->创建项目->点击项目名称 -- -
- - * 点击工作流定义->创建工作流定义->上线工作流定义 - -- -
- - * 运行工作流定义->点击工作流实例->点击工作流实例名称->双击任务节点->查看任务执行日志 - -- -
\ No newline at end of file diff --git a/docs/zh_CN/系统使用手册.md b/docs/zh_CN/系统使用手册.md deleted file mode 100644 index 348cc2b36a..0000000000 --- a/docs/zh_CN/系统使用手册.md +++ /dev/null @@ -1,675 +0,0 @@ -# 系统使用手册 - - -## 快速上手 - - > 请参照[快速上手](快速上手.md) - -## 操作指南 - -### 创建项目 - - - 点击“项目管理->创建项目”,输入项目名称,项目描述,点击“提交”,创建新的项目。 - - 点击项目名称,进入项目首页。 -- -
- -> 项目首页其中包含任务状态统计,流程状态统计、工作流定义统计 - - - 任务状态统计:是指在指定时间范围内,统计任务实例中的待运行、失败、运行中、完成、成功的个数 - - 流程状态统计:是指在指定时间范围内,统计工作流实例中的待运行、失败、运行中、完成、成功的个数 - - 工作流定义统计:是统计该用户创建的工作流定义及管理员授予该用户的工作流定义 - - -### 创建工作流定义 - - 进入项目首页,点击“工作流定义”,进入工作流定义列表页。 - - 点击“创建工作流”,创建新的工作流定义。 - - 拖拽“SHELL"节点到画布,新增一个Shell任务。 - - 填写”节点名称“,”描述“,”脚本“字段。 - - 选择“任务优先级”,级别高的任务在执行队列中会优先执行,相同优先级的任务按照先进先出的顺序执行。 - - 超时告警, 填写”超时时长“,当任务执行时间超过**超时时长**可以告警并且超时失败。 - - 填写"自定义参数",参考[自定义参数](#用户自定义参数) -- -
- - - 增加节点之间执行的先后顺序: 点击”线条连接“;如图示,任务1和任务3并行执行,当任务1执行完,任务2、3会同时执行。 - -- -
- - - 删除依赖关系: 点击箭头图标”拖动节点和选中项“,选中连接线,点击删除图标,删除节点间依赖关系。 -- -
- - - 点击”保存“,输入工作流定义名称,工作流定义描述,设置全局参数,参考[自定义参数](#用户自定义参数)。 - -- -
- - - 其他类型节点,请参考 [任务节点类型和参数设置](#任务节点类型和参数设置) - -### 执行工作流定义 - - **未上线状态的工作流定义可以编辑,但是不可以运行**,所以先上线工作流 - > 点击工作流定义,返回工作流定义列表,点击”上线“图标,上线工作流定义。 - - > 下线工作流定义的时候,要先将定时管理中的定时任务下线,这样才能成功下线工作流定义 - - - 点击”运行“,执行工作流。运行参数说明: - * 失败策略:**当某一个任务节点执行失败时,其他并行的任务节点需要执行的策略**。”继续“表示:其他任务节点正常执行,”结束“表示:终止所有正在执行的任务,并终止整个流程。 - * 通知策略:当流程结束,根据流程状态发送流程执行信息通知邮件。 - * 流程优先级:流程运行的优先级,分五个等级:最高(HIGHEST),高(HIGH),中(MEDIUM),低(LOW),最低(LOWEST)。级别高的流程在执行队列中会优先执行,相同优先级的流程按照先进先出的顺序执行。 - * worker分组: 这个流程只能在指定的机器组里执行。默认是Default,可以在任一worker上执行。 - * 通知组: 当流程结束,或者发生容错时,会发送流程信息邮件到通知组里所有成员。 - * 收件人:输入邮箱后按回车键保存。当流程结束、发生容错时,会发送告警邮件到收件人列表。 - * 抄送人:输入邮箱后按回车键保存。当流程结束、发生容错时,会抄送告警邮件到抄送人列表。 -- -
- - * 补数: 执行指定日期的工作流定义,可以选择补数时间范围(目前只支持针对连续的天进行补数),比如要补5月1号到5月10号的数据,如图示: -- -
- -> 补数执行模式有**串行执行、并行执行**,串行模式下,补数会从5月1号到5月10号依次执行;并行模式下,会同时执行5月1号到5月10号的任务。 - -### 定时工作流定义 - - 创建定时:"工作流定义->定时” - - 选择起止时间,在起止时间范围内,定时正常工作,超过范围,就不会再继续产生定时工作流实例了。 -- -
- - - 添加一个每天凌晨5点执行一次的定时,如图示: -- -
- - - 定时上线,**新创建的定时是下线状态,需要点击“定时管理->上线”,定时才能正常工作**。 - -### 查看工作流实例 - > 点击“工作流实例”,查看工作流实例列表。 - - > 点击工作流名称,查看任务执行状态。 - -- -
- - > 点击任务节点,点击“查看日志”,查看任务执行日志。 - -- -
- - > 点击任务实例节点,点击**查看历史**,可以查看该工作流实例运行的该任务实例列表 - -- -
- - - > 对工作流实例的操作: - -- -
- - * 编辑:可以对已经终止的流程进行编辑,编辑后保存的时候,可以选择是否更新到工作流定义。 - * 重跑:可以对已经终止的流程进行重新执行。 - * 恢复失败:针对失败的流程,可以执行恢复失败操作,从失败的节点开始执行。 - * 停止:对正在运行的流程进行**停止**操作,后台会先对worker进程`kill`,再执行`kill -9`操作 - * 暂停:可以对正在运行的流程进行**暂停**操作,系统状态变为**等待执行**,会等待正在执行的任务结束,暂停下一个要执行的任务。 - * 恢复暂停:可以对暂停的流程恢复,直接从**暂停的节点**开始运行 - * 删除:删除工作流实例及工作流实例下的任务实例 - * 甘特图:Gantt图纵轴是某个工作流实例下的任务实例的拓扑排序,横轴是任务实例的运行时间,如图示: -- -
- -### 查看任务实例 - > 点击“任务实例”,进入任务列表页,查询任务执行情况 - -- -
- - > 点击操作列中的“查看日志”,可以查看任务执行的日志情况。 - -- -
- -### 创建数据源 - > 数据源中心支持MySQL、POSTGRESQL、HIVE及Spark等数据源 - -#### 创建、编辑MySQL数据源 - - - 点击“数据源中心->创建数据源”,根据需求创建不同类型的数据源。 - - - 数据源:选择MYSQL - - 数据源名称:输入数据源的名称 - - 描述:输入数据源的描述 - - IP/主机名:输入连接MySQL的IP - - 端口:输入连接MySQL的端口 - - 用户名:设置连接MySQL的用户名 - - 密码:设置连接MySQL的密码 - - 数据库名:输入连接MySQL的数据库名称 - - Jdbc连接参数:用于MySQL连接的参数设置,以JSON形式填写 - -- -
- - > 点击“测试连接”,测试数据源是否可以连接成功。 - -#### 创建、编辑POSTGRESQL数据源 - -- 数据源:选择POSTGRESQL -- 数据源名称:输入数据源的名称 -- 描述:输入数据源的描述 -- IP/主机名:输入连接POSTGRESQL的IP -- 端口:输入连接POSTGRESQL的端口 -- 用户名:设置连接POSTGRESQL的用户名 -- 密码:设置连接POSTGRESQL的密码 -- 数据库名:输入连接POSTGRESQL的数据库名称 -- Jdbc连接参数:用于POSTGRESQL连接的参数设置,以JSON形式填写 - -- -
- -#### 创建、编辑HIVE数据源 - -1.使用HiveServer2方式连接 - -- -
- - - 数据源:选择HIVE - - 数据源名称:输入数据源的名称 - - 描述:输入数据源的描述 - - IP/主机名:输入连接HIVE的IP - - 端口:输入连接HIVE的端口 - - 用户名:设置连接HIVE的用户名 - - 密码:设置连接HIVE的密码 - - 数据库名:输入连接HIVE的数据库名称 - - Jdbc连接参数:用于HIVE连接的参数设置,以JSON形式填写 - -2.使用HiveServer2 HA Zookeeper方式连接 - -- -
- - -注意:如果开启了**kerberos**,则需要填写 **Principal** -- -
- - - - -#### 创建、编辑Spark数据源 - -- -
- -- 数据源:选择Spark -- 数据源名称:输入数据源的名称 -- 描述:输入数据源的描述 -- IP/主机名:输入连接Spark的IP -- 端口:输入连接Spark的端口 -- 用户名:设置连接Spark的用户名 -- 密码:设置连接Spark的密码 -- 数据库名:输入连接Spark的数据库名称 -- Jdbc连接参数:用于Spark连接的参数设置,以JSON形式填写 - - - -注意:如果开启了**kerberos**,则需要填写 **Principal** - -- -
- -### 上传资源 - - 上传资源文件和udf函数,所有上传的文件和资源都会被存储到hdfs上,所以需要以下配置项: - -``` -conf/common/common.properties - -- hdfs.startup.state=true -conf/common/hadoop.properties - -- fs.defaultFS=hdfs://xxxx:8020 - -- yarn.resourcemanager.ha.rm.ids=192.168.xx.xx,192.168.xx.xx - -- yarn.application.status.address=http://xxxx:8088/ws/v1/cluster/apps/%s -``` - -#### 文件管理 - - > 是对各种资源文件的管理,包括创建基本的txt/log/sh/conf等文件、上传jar包等各种类型文件,以及编辑、下载、删除等操作。 -- -
- - * 创建文件 - > 文件格式支持以下几种类型:txt、log、sh、conf、cfg、py、java、sql、xml、hql - -- -
- - * 上传文件 - -> 上传文件:点击上传按钮进行上传,将文件拖拽到上传区域,文件名会自动以上传的文件名称补全 - -- -
- - - * 文件查看 - -> 对可查看的文件类型,点击 文件名称 可以查看文件详情 - -- -
- - * 下载文件 - -> 可以在 文件详情 中点击右上角下载按钮下载文件,或者在文件列表后的下载按钮下载文件 - - * 文件重命名 - -- -
- -#### 删除 -> 文件列表->点击"删除"按钮,删除指定文件 - -#### 资源管理 - > 资源管理和文件管理功能类似,不同之处是资源管理是上传的UDF函数,文件管理上传的是用户程序,脚本及配置文件 - - * 上传udf资源 - > 和上传文件相同。 - -#### 函数管理 - - * 创建udf函数 - > 点击“创建UDF函数”,输入udf函数参数,选择udf资源,点击“提交”,创建udf函数。 - - > 目前只支持HIVE的临时UDF函数 - - - UDF函数名称:输入UDF函数时的名称 - - 包名类名:输入UDF函数的全路径 - - 参数:用来标注函数的输入参数 - - 数据库名:预留字段,用于创建永久UDF函数 - - UDF资源:设置创建的UDF对应的资源文件 - -- -
- -## 安全中心(权限系统) - - - 安全中心是只有管理员账户才有权限的功能,有队列管理、租户管理、用户管理、告警组管理、worker分组、令牌管理等功能,还可以对资源、数据源、项目等授权 - - 管理员登录,默认用户名密码:admin/escheduler123 - -### 创建队列 - - 队列是在执行spark、mapreduce等程序,需要用到“队列”参数时使用的。 - - “安全中心”->“队列管理”->“创建队列” -- -
- - -### 添加租户 - - 租户对应的是Linux的用户,用于worker提交作业所使用的用户。如果linux没有这个用户,worker会在执行脚本的时候创建这个用户。 - - 租户编码:**租户编码是Linux上的用户,唯一,不能重复** - -- -
- -### 创建普通用户 - - 用户分为**管理员用户**和**普通用户** - * 管理员有**授权和用户管理**等权限,没有**创建项目和工作流定义**的操作的权限 - * 普通用户可以**创建项目和对工作流定义的创建,编辑,执行**等操作。 - * 注意:**如果该用户切换了租户,则该用户所在租户下所有资源将复制到切换的新租户下** -- -
- -### 创建告警组 - * 告警组是在启动时设置的参数,在流程结束以后会将流程的状态和其他信息以邮件形式发送给告警组。 - - 新建、编辑告警组 - -- -
- -### 创建worker分组 - - worker分组,提供了一种让任务在指定的worker上运行的机制。管理员创建worker分组,在任务节点和运行参数中设置中可以指定该任务运行的worker分组,如果指定的分组被删除或者没有指定分组,则该任务会在任一worker上运行。 - - worker分组内多个ip地址(**不能写别名**),以**英文逗号**分隔 - -- -
- -### 令牌管理 - - 由于后端接口有登录检查,令牌管理,提供了一种可以通过调用接口的方式对系统进行各种操作。 - - 调用示例: - -```令牌调用示例 - /** - * test token - */ - public void doPOSTParam()throws Exception{ - // create HttpClient - CloseableHttpClient httpclient = HttpClients.createDefault(); - - // create http post request - HttpPost httpPost = new HttpPost("http://127.0.0.1:12345/escheduler/projects/create"); - httpPost.setHeader("token", "123"); - // set parameters - List- -
- -- 2.选中项目按钮,进行项目授权 - -- -
- - -## 监控中心 - -### 服务管理 - - 服务管理主要是对系统中的各个服务的健康状况和基本信息的监控和显示 - -#### master监控 - - 主要是master的相关信息。 -- -
- -#### worker监控 - - 主要是worker的相关信息。 - -- -
- -#### Zookeeper监控 - - 主要是zookpeeper中各个worker和master的相关配置信息。 - -- -
- -#### Mysql监控 - - 主要是mysql的健康状况 - -- -
- -## 任务节点类型和参数设置 - -### Shell节点 - - shell节点,在worker执行的时候,会生成一个临时shell脚本,使用租户同名的linux用户执行这个脚本。 -> 拖动工具栏中的![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_SHELL.png)任务节点到画板中,双击任务节点,如下图: - -- -
- -- 节点名称:一个工作流定义中的节点名称是唯一的 -- 运行标志:标识这个节点是否能正常调度,如果不需要执行,可以打开禁止执行开关。 -- 描述信息:描述该节点的功能 -- 失败重试次数:任务失败重新提交的次数,支持下拉和手填 -- 失败重试间隔:任务失败重新提交任务的时间间隔,支持下拉和手填 -- 脚本:用户开发的SHELL程序 -- 资源:是指脚本中需要调用的资源文件列表 -- 自定义参数:是SHELL局部的用户自定义参数,会替换脚本中以${变量}的内容 - -### 子流程节点 - - 子流程节点,就是把外部的某个工作流定义当做一个任务节点去执行。 -> 拖动工具栏中的![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_SUB_PROCESS.png)任务节点到画板中,双击任务节点,如下图: - -- -
- -- 节点名称:一个工作流定义中的节点名称是唯一的 -- 运行标志:标识这个节点是否能正常调度 -- 描述信息:描述该节点的功能 -- 子节点:是选择子流程的工作流定义,右上角进入该子节点可以跳转到所选子流程的工作流定义 - -### 依赖(DEPENDENT)节点 - - 依赖节点,就是**依赖检查节点**。比如A流程依赖昨天的B流程执行成功,依赖节点会去检查B流程在昨天是否有执行成功的实例。 - -> 拖动工具栏中的![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_DEPENDENT.png)任务节点到画板中,双击任务节点,如下图: - -- -
- - > 依赖节点提供了逻辑判断功能,比如检查昨天的B流程是否成功,或者C流程是否执行成功。 - -- -
- - > 例如,A流程为周报任务,B、C流程为天任务,A任务需要B、C任务在上周的每一天都执行成功,如图示: - -- -
- - > 假如,周报A同时还需要自身在上周二执行成功: - -- -
- -### 存储过程节点 - - 根据选择的数据源,执行存储过程。 -> 拖动工具栏中的![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_PROCEDURE.png)任务节点到画板中,双击任务节点,如下图: - -- -
- -- 数据源:存储过程的数据源类型支持MySQL和POSTGRESQL两种,选择对应的数据源 -- 方法:是存储过程的方法名称 -- 自定义参数:存储过程的自定义参数类型支持IN、OUT两种,数据类型支持VARCHAR、INTEGER、LONG、FLOAT、DOUBLE、DATE、TIME、TIMESTAMP、BOOLEAN九种数据类型 - -### SQL节点 - - 执行非查询SQL功能 -- -
- - - 执行查询SQL功能,可以选择通过表格和附件形式发送邮件到指定的收件人。 -> 拖动工具栏中的![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_SQL.png)任务节点到画板中,双击任务节点,如下图: - -- -
- -- 数据源:选择对应的数据源 -- sql类型:支持查询和非查询两种,查询是select类型的查询,是有结果集返回的,可以指定邮件通知为表格、附件或表格附件三种模板。非查询是没有结果集返回的,是针对update、delete、insert三种类型的操作 -- sql参数:输入参数格式为key1=value1;key2=value2… -- sql语句:SQL语句 -- UDF函数:对于HIVE类型的数据源,可以引用资源中心中创建的UDF函数,其他类型的数据源暂不支持UDF函数 -- 自定义参数:SQL任务类型,而存储过程是自定义参数顺序的给方法设置值自定义参数类型和数据类型同存储过程任务类型一样。区别在于SQL任务类型自定义参数会替换sql语句中${变量} - -### SPARK节点 - - 通过SPARK节点,可以直接直接执行SPARK程序,对于spark节点,worker会使用`spark-submit`方式提交任务 - -> 拖动工具栏中的![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_SPARK.png)任务节点到画板中,双击任务节点,如下图: - -- -
- -- 程序类型:支持JAVA、Scala和Python三种语言 -- 主函数的class:是Spark程序的入口Main Class的全路径 -- 主jar包:是Spark的jar包 -- 部署方式:支持yarn-cluster、yarn-client、和local三种模式 -- Driver内核数:可以设置Driver内核数及内存数 -- Executor数量:可以设置Executor数量、Executor内存数和Executor内核数 -- 命令行参数:是设置Spark程序的输入参数,支持自定义参数变量的替换。 -- 其他参数:支持 --jars、--files、--archives、--conf格式 -- 资源:如果其他参数中引用了资源文件,需要在资源中选择指定 -- 自定义参数:是MR局部的用户自定义参数,会替换脚本中以${变量}的内容 - - 注意:JAVA和Scala只是用来标识,没有区别,如果是Python开发的Spark则没有主函数的class,其他都是一样 - -### MapReduce(MR)节点 - - 使用MR节点,可以直接执行MR程序。对于mr节点,worker会使用`hadoop jar`方式提交任务 - - -> 拖动工具栏中的![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_MR.png)任务节点到画板中,双击任务节点,如下图: - - 1. JAVA程序 - -- -
- -- 主函数的class:是MR程序的入口Main Class的全路径 -- 程序类型:选择JAVA语言 -- 主jar包:是MR的jar包 -- 命令行参数:是设置MR程序的输入参数,支持自定义参数变量的替换 -- 其他参数:支持 –D、-files、-libjars、-archives格式 -- 资源: 如果其他参数中引用了资源文件,需要在资源中选择指定 -- 自定义参数:是MR局部的用户自定义参数,会替换脚本中以${变量}的内容 - -2. Python程序 - -- -
- -- 程序类型:选择Python语言 -- 主jar包:是运行MR的Python jar包 -- 其他参数:支持 –D、-mapper、-reducer、-input -output格式,这里可以设置用户自定义参数的输入,比如: -- -mapper "mapper.py 1" -file mapper.py -reducer reducer.py -file reducer.py –input /journey/words.txt -output /journey/out/mr/${currentTimeMillis} -- 其中 -mapper 后的 mapper.py 1是两个参数,第一个参数是mapper.py,第二个参数是1 -- 资源: 如果其他参数中引用了资源文件,需要在资源中选择指定 -- 自定义参数:是MR局部的用户自定义参数,会替换脚本中以${变量}的内容 - -### Python节点 - - 使用python节点,可以直接执行python脚本,对于python节点,worker会使用`python **`方式提交任务。 - - -> 拖动工具栏中的![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_PYTHON.png)任务节点到画板中,双击任务节点,如下图: - -- -
- -- 脚本:用户开发的Python程序 -- 资源:是指脚本中需要调用的资源文件列表 -- 自定义参数:是Python局部的用户自定义参数,会替换脚本中以${变量}的内容 - -### 系统参数 - -变量 | 含义 |
---|---|
${system.biz.date} | -日常调度实例定时的定时时间前一天,格式为 yyyyMMdd,补数据时,该日期 +1 | -
${system.biz.curdate} | -日常调度实例定时的定时时间,格式为 yyyyMMdd,补数据时,该日期 +1 | -
${system.datetime} | -日常调度实例定时的定时时间,格式为 yyyyMMddHHmmss,补数据时,该日期 +1 | -
- -
- -> global_bizdate为全局参数,引用的是系统参数。 - -- -
- -> 任务中local_param_bizdate通过${global_bizdate}来引用全局参数,对于脚本可以通过${local_param_bizdate}来引用变量local_param_bizdate的值,或通过JDBC直接将local_param_bizdate的值set进去 diff --git a/docs/zh_CN/系统架构设计.md b/docs/zh_CN/系统架构设计.md deleted file mode 100644 index 61e522b6cd..0000000000 --- a/docs/zh_CN/系统架构设计.md +++ /dev/null @@ -1,304 +0,0 @@ -## 系统架构设计 -在对调度系统架构说明之前,我们先来认识一下调度系统常用的名词 - -### 1.名词解释 -**DAG:** 全称Directed Acyclic Graph,简称DAG。工作流中的Task任务以有向无环图的形式组装起来,从入度为零的节点进行拓扑遍历,直到无后继节点为止。举例如下图: - -- -
- dag示例 -
- - -**流程定义**:通过拖拽任务节点并建立任务节点的关联所形成的可视化**DAG** - -**流程实例**:流程实例是流程定义的实例化,可以通过手动启动或定时调度生成,流程定义每运行一次,产生一个流程实例 - -**任务实例**:任务实例是流程定义中任务节点的实例化,标识着具体的任务执行状态 - -**任务类型**: 目前支持有SHELL、SQL、SUB_PROCESS(子流程)、PROCEDURE、MR、SPARK、PYTHON、DEPENDENT(依赖),同时计划支持动态插件扩展,注意:其中子 **SUB_PROCESS** 也是一个单独的流程定义,是可以单独启动执行的 - -**调度方式:** 系统支持基于cron表达式的定时调度和手动调度。命令类型支持:启动工作流、从当前节点开始执行、恢复被容错的工作流、恢复暂停流程、从失败节点开始执行、补数、定时、重跑、暂停、停止、恢复等待线程。其中 **恢复被容错的工作流** 和 **恢复等待线程** 两种命令类型是由调度内部控制使用,外部无法调用 - -**定时调度**:系统采用 **quartz** 分布式调度器,并同时支持cron表达式可视化的生成 - -**依赖**:系统不单单支持 **DAG** 简单的前驱和后继节点之间的依赖,同时还提供**任务依赖**节点,支持**流程间的自定义任务依赖** - -**优先级** :支持流程实例和任务实例的优先级,如果流程实例和任务实例的优先级不设置,则默认是先进先出 - -**邮件告警**:支持 **SQL任务** 查询结果邮件发送,流程实例运行结果邮件告警及容错告警通知 - -**失败策略**:对于并行运行的任务,如果有任务失败,提供两种失败策略处理方式,**继续**是指不管并行运行任务的状态,直到流程失败结束。**结束**是指一旦发现失败任务,则同时Kill掉正在运行的并行任务,流程失败结束 - -**补数**:补历史数据,支持**区间并行和串行**两种补数方式 - -### 2.系统架构 - -#### 2.1 系统架构图 -- -
- 系统架构图 -
- - -#### 2.2 架构说明 - -* **MasterServer** - - MasterServer采用分布式无中心设计理念,MasterServer主要负责 DAG 任务切分、任务提交监控,并同时监听其它MasterServer和WorkerServer的健康状态。 - MasterServer服务启动时向Zookeeper注册临时节点,通过监听Zookeeper临时节点变化来进行容错处理。 - - ##### 该服务内主要包含: - - - **Distributed Quartz**分布式调度组件,主要负责定时任务的启停操作,当quartz调起任务后,Master内部会有线程池具体负责处理任务的后续操作 - - - **MasterSchedulerThread**是一个扫描线程,定时扫描数据库中的 **command** 表,根据不同的**命令类型**进行不同的业务操作 - - - **MasterExecThread**主要是负责DAG任务切分、任务提交监控、各种不同命令类型的逻辑处理 - - - **MasterTaskExecThread**主要负责任务的持久化 - -* **WorkerServer** - - WorkerServer也采用分布式无中心设计理念,WorkerServer主要负责任务的执行和提供日志服务。WorkerServer服务启动时向Zookeeper注册临时节点,并维持心跳。 - ##### 该服务包含: - - **FetchTaskThread**主要负责不断从**Task Queue**中领取任务,并根据不同任务类型调用**TaskScheduleThread**对应执行器。 - - - **LoggerServer**是一个RPC服务,提供日志分片查看、刷新和下载等功能 - -* **ZooKeeper** - - ZooKeeper服务,系统中的MasterServer和WorkerServer节点都通过ZooKeeper来进行集群管理和容错。另外系统还基于ZooKeeper进行事件监听和分布式锁。 - 我们也曾经基于Redis实现过队列,不过我们希望EasyScheduler依赖到的组件尽量地少,所以最后还是去掉了Redis实现。 - -* **Task Queue** - - 提供任务队列的操作,目前队列也是基于Zookeeper来实现。由于队列中存的信息较少,不必担心队列里数据过多的情况,实际上我们压测过百万级数据存队列,对系统稳定性和性能没影响。 - -* **Alert** - - 提供告警相关接口,接口主要包括**告警**两种类型的告警数据的存储、查询和通知功能。其中通知功能又有**邮件通知**和**SNMP(暂未实现)**两种。 - -* **API** - - API接口层,主要负责处理前端UI层的请求。该服务统一提供RESTful api向外部提供请求服务。 - 接口包括工作流的创建、定义、查询、修改、发布、下线、手工启动、停止、暂停、恢复、从该节点开始执行等等。 - -* **UI** - - 系统的前端页面,提供系统的各种可视化操作界面,详见**[系统使用手册](系统使用手册.md)**部分。 - -#### 2.3 架构设计思想 - -##### 一、去中心化vs中心化 - -###### 中心化思想 - -中心化的设计理念比较简单,分布式集群中的节点按照角色分工,大体上分为两种角色: -- -
- -- Master的角色主要负责任务分发并监督Slave的健康状态,可以动态的将任务均衡到Slave上,以致Slave节点不至于“忙死”或”闲死”的状态。 -- Worker的角色主要负责任务的执行工作并维护和Master的心跳,以便Master可以分配任务给Slave。 - - - -中心化思想设计存在的问题: - -- 一旦Master出现了问题,则群龙无首,整个集群就会崩溃。为了解决这个问题,大多数Master/Slave架构模式都采用了主备Master的设计方案,可以是热备或者冷备,也可以是自动切换或手动切换,而且越来越多的新系统都开始具备自动选举切换Master的能力,以提升系统的可用性。 -- 另外一个问题是如果Scheduler在Master上,虽然可以支持一个DAG中不同的任务运行在不同的机器上,但是会产生Master的过负载。如果Scheduler在Slave上,则一个DAG中所有的任务都只能在某一台机器上进行作业提交,则并行任务比较多的时候,Slave的压力可能会比较大。 - - - -###### 去中心化 - - - -- 在去中心化设计里,通常没有Master/Slave的概念,所有的角色都是一样的,地位是平等的,全球互联网就是一个典型的去中心化的分布式系统,联网的任意节点设备down机,都只会影响很小范围的功能。 -- 去中心化设计的核心设计在于整个分布式系统中不存在一个区别于其他节点的”管理者”,因此不存在单点故障问题。但由于不存在” 管理者”节点所以每个节点都需要跟其他节点通信才得到必须要的机器信息,而分布式系统通信的不可靠行,则大大增加了上述功能的实现难度。 -- 实际上,真正去中心化的分布式系统并不多见。反而动态中心化分布式系统正在不断涌出。在这种架构下,集群中的管理者是被动态选择出来的,而不是预置的,并且集群在发生故障的时候,集群的节点会自发的举行"会议"来选举新的"管理者"去主持工作。最典型的案例就是ZooKeeper及Go语言实现的Etcd。 - - - -- EasyScheduler的去中心化是Master/Worker注册到Zookeeper中,实现Master集群和Worker集群无中心,并使用Zookeeper分布式锁来选举其中的一台Master或Worker为“管理者”来执行任务。 - -##### 二、分布式锁实践 - -EasyScheduler使用ZooKeeper分布式锁来实现同一时刻只有一台Master执行Scheduler,或者只有一台Worker执行任务的提交。 -1. 获取分布式锁的核心流程算法如下 -- -
- -2. EasyScheduler中Scheduler线程分布式锁实现流程图: -- -
- - -##### 三、线程不足循环等待问题 - -- 如果一个DAG中没有子流程,则如果Command中的数据条数大于线程池设置的阈值,则直接流程等待或失败。 -- 如果一个大的DAG中嵌套了很多子流程,如下图则会产生“死等”状态: - -- -
-上图中MainFlowThread等待SubFlowThread1结束,SubFlowThread1等待SubFlowThread2结束, SubFlowThread2等待SubFlowThread3结束,而SubFlowThread3等待线程池有新线程,则整个DAG流程不能结束,从而其中的线程也不能释放。这样就形成的子父流程循环等待的状态。此时除非启动新的Master来增加线程来打破这样的”僵局”,否则调度集群将不能再使用。 - -对于启动新Master来打破僵局,似乎有点差强人意,于是我们提出了以下三种方案来降低这种风险: - -1. 计算所有Master的线程总和,然后对每一个DAG需要计算其需要的线程数,也就是在DAG流程执行之前做预计算。因为是多Master线程池,所以总线程数不太可能实时获取。 -2. 对单Master线程池进行判断,如果线程池已经满了,则让线程直接失败。 -3. 增加一种资源不足的Command类型,如果线程池不足,则将主流程挂起。这样线程池就有了新的线程,可以让资源不足挂起的流程重新唤醒执行。 - -注意:Master Scheduler线程在获取Command的时候是FIFO的方式执行的。 - -于是我们选择了第三种方式来解决线程不足的问题。 - - -##### 四、容错设计 -容错分为服务宕机容错和任务重试,服务宕机容错又分为Master容错和Worker容错两种情况 - -###### 1. 宕机容错 - -服务容错设计依赖于ZooKeeper的Watcher机制,实现原理如图: - -- -
-其中Master监控其他Master和Worker的目录,如果监听到remove事件,则会根据具体的业务逻辑进行流程实例容错或者任务实例容错。 - - - -- Master容错流程图: - -- -
-ZooKeeper Master容错完成之后则重新由EasyScheduler中Scheduler线程调度,遍历 DAG 找到”正在运行”和“提交成功”的任务,对”正在运行”的任务监控其任务实例的状态,对”提交成功”的任务需要判断Task Queue中是否已经存在,如果存在则同样监控任务实例的状态,如果不存在则重新提交任务实例。 - - - -- Worker容错流程图: - -- -
- -Master Scheduler线程一旦发现任务实例为” 需要容错”状态,则接管任务并进行重新提交。 - - 注意:由于” 网络抖动”可能会使得节点短时间内失去和ZooKeeper的心跳,从而发生节点的remove事件。对于这种情况,我们使用最简单的方式,那就是节点一旦和ZooKeeper发生超时连接,则直接将Master或Worker服务停掉。 - -###### 2.任务失败重试 - -这里首先要区分任务失败重试、流程失败恢复、流程失败重跑的概念: - -- 任务失败重试是任务级别的,是调度系统自动进行的,比如一个Shell任务设置重试次数为3次,那么在Shell任务运行失败后会自己再最多尝试运行3次 -- 流程失败恢复是流程级别的,是手动进行的,恢复是从只能**从失败的节点开始执行**或**从当前节点开始执行** -- 流程失败重跑也是流程级别的,是手动进行的,重跑是从开始节点进行 - - - -接下来说正题,我们将工作流中的任务节点分了两种类型。 - -- 一种是业务节点,这种节点都对应一个实际的脚本或者处理语句,比如Shell节点,MR节点、Spark节点、依赖节点等。 - -- 还有一种是逻辑节点,这种节点不做实际的脚本或语句处理,只是整个流程流转的逻辑处理,比如子流程节等。 - -每一个**业务节点**都可以配置失败重试的次数,当该任务节点失败,会自动重试,直到成功或者超过配置的重试次数。**逻辑节点**不支持失败重试。但是逻辑节点里的任务支持重试。 - -如果工作流中有任务失败达到最大重试次数,工作流就会失败停止,失败的工作流可以手动进行重跑操作或者流程恢复操作 - - - -##### 五、任务优先级设计 -在早期调度设计中,如果没有优先级设计,采用公平调度设计的话,会遇到先行提交的任务可能会和后继提交的任务同时完成的情况,而不能做到设置流程或者任务的优先级,因此我们对此进行了重新设计,目前我们设计如下: - -- 按照**不同流程实例优先级**优先于**同一个流程实例优先级**优先于**同一流程内任务优先级**优先于**同一流程内任务**提交顺序依次从高到低进行任务处理。 - - 具体实现是根据任务实例的json解析优先级,然后把**流程实例优先级_流程实例id_任务优先级_任务id**信息保存在ZooKeeper任务队列中,当从任务队列获取的时候,通过字符串比较即可得出最需要优先执行的任务 - - - 其中流程定义的优先级是考虑到有些流程需要先于其他流程进行处理,这个可以在流程启动或者定时启动时配置,共有5级,依次为HIGHEST、HIGH、MEDIUM、LOW、LOWEST。如下图 -- -
- - - 任务的优先级也分为5级,依次为HIGHEST、HIGH、MEDIUM、LOW、LOWEST。如下图 -- -
- - -##### 六、Logback和gRPC实现日志访问 - -- 由于Web(UI)和Worker不一定在同一台机器上,所以查看日志不能像查询本地文件那样。有两种方案: - - 将日志放到ES搜索引擎上 - - 通过gRPC通信获取远程日志信息 - -- 介于考虑到尽可能的EasyScheduler的轻量级性,所以选择了gRPC实现远程访问日志信息。 - -- -
- - -- 我们使用自定义Logback的FileAppender和Filter功能,实现每个任务实例生成一个日志文件。 -- FileAppender主要实现如下: - - ```java - /** - * task log appender - */ - public class TaskLogAppender extends FileAppender