Commit Graph

84 Commits

Author SHA1 Message Date
LiuBodong
0f3b42925f [Fix][Monitor]Monitor UI not show DiskAvailable and MemoryUsage correctly (#11870) 2022-09-19 15:30:49 +08:00
Kengo Seki
52b79b017e [Improvement] Replace commons-lang 2 function invocations with commons-lang3 (#11810)
* [Improvement] Replace commons-lang 2 function invocations with commons-lang3
2022-09-19 14:48:17 +08:00
caishunfeng
f034a09d25
[Bug-11650][worker] #11650 fix SQL type task, stop task cause NPE (#11668) (#11958)
Co-authored-by: 冯剑 <35831367+fengjian1129@users.noreply.github.com>
2022-09-15 14:24:34 +08:00
kezhenxu94
277f137358
Add Kubernetes configmap reload to all components (#11730) 2022-09-02 12:03:19 +08:00
Wenjun Ruan
67e7f88d8b
Refactor heart beat task, use json to serialize/deserialize (#11702)
* Refactor heart beat task, use json to serialize/deserialize
2022-08-31 16:20:23 +08:00
Wenjun Ruan
03e1e6fe45
[Bug] [Worker] Optimize the getAppId method to avoid worker OOM when kill task (#11701)
* Fix kill job may cause worker oom
2022-08-31 15:25:01 +08:00
Wenjun Ruan
1b120e3a59
Refactor worker execute task process (#11540)
* Refactor worker execute task process
2022-08-26 13:33:51 +08:00
JinYong Li
3f2ca7bca3
[Fix-9980] [Server] fix heartBeatTaskCount bug (#11232)
* fix heartBeat bug

* modify class name

* fix conflict

Co-authored-by: JinyLeeChina <jiny.li@foxmail.com>
2022-08-23 11:30:13 +08:00
Wenjun Ruan
3516533017
Remove logger header in task log file (#11555) 2022-08-19 14:01:52 +08:00
Eric Gao
9330d6cfcd
[Doc][Security] Update instructions on worker groups (#11483)
* Update instructions on worker groups
2022-08-15 17:44:00 +08:00
Wenjun Ruan
7ff34c3947
[Feature-7024] Add waiting strategy to support master/worker can recover from registry lost (#11368)
* Add waiting strategy to support master/worker can recover from registry lost

* throw exception when zookeeper registry start failed due to interrupted
2022-08-13 09:52:03 +08:00
caishunfeng
0464123c2b
[Feature-11223] support stream task (#11350)
* add task execute type

* update task definition list paging

* update task instance list paging

* stream task start

* [Feature][UI] Some changes to execute task.
    * Set the connection edge to dashed line.
    * Add FLINK_STREAM task.

* add stream task

* flink savepoint and cancel

* fix query bug

* add stream task definition

* add task instance for stream task

* delete stream task definition state

* update api for stream task definition edit

* modify search for stream task instance

* add language

* delete task type search for stream task definition

* change task type search for stream task instance

* add jump button

* add savepoint

* add down log for stream task instance

* ui test

* stream task start

* run DAG

* [Fix][UI] Fix the stream task edges not to be dashed when filling back.

* [Feature][UI] Remove some fields for FLINK_STREAM.

* add start modal

* add dryRun column for stream task instance

* fix duration

* fix pon

* fix build error

* Add success tip

* add auto sync for stream task instance

* remove foreign key for task instance

* license header

* UT fix

* modify locales

* recover common config

* fix UT

* add doc

Co-authored-by: Amy <amywang0104@163.com>
Co-authored-by: devosend <devosend@gmail.com>
2022-08-10 21:44:43 +08:00
Wenjun Ruan
8774415197
Split ExecutionStatus to WorkflowExecutionStatus and TaskExecutionStatus (#11340) 2022-08-10 11:00:23 +08:00
Eric Gao
9ca1eb96c4
[Improvement][Metrics] Add metrics for alert server (#11240)
* [Improvement][Metrics] Add metrics for alert server (#11131)

* Update related docs of metrics

* Add grafana demo dashboards for alert server metrics

* Refactor metric classes with UtilityClass annotation

* Refactor meter names in camelCase for checkstyle
2022-08-03 15:42:06 +08:00
xuhhui
bfff3a7c5d
fix error (#11206) 2022-07-30 18:20:20 +08:00
zhuxt2015
3701a24d15
[Improvement][Task Log] Task status log print description instead of code (#11009)
* use execution status instead of status code
2022-07-22 13:34:31 +08:00
Wenjun Ruan
5e9c7dad23
Add dolphinscheduler-bom to manage the dependency version (#11025) 2022-07-20 10:37:31 +08:00
zhuxt2015
a74d7ef665
[hotfix][Worker] Remove service dependency from worker module (#11008)
* worker remove service dependency
2022-07-17 22:16:35 +08:00
Wenjun Ruan
083ab2b5c9
Remove dao in worker (#10994) 2022-07-15 20:07:18 +08:00
Wenjun Ruan
2be1d4bf0a
Fix worker cannot shutdown due to resource close failed or heart beat check failed (#10979)
* Use try-with-resources to close resources, and add a heartbeat error threshold so the worker can still shut down when the heartbeat check fails

* Move heartbeat error threshold to application.yml
2022-07-15 20:06:53 +08:00
Wenjun Ruan
cade66a9b6
[Fix-10827] Fix network error cause worker cannot send message to master (#10886)
* Fix network error cause worker cannot send message to master
2022-07-12 14:08:42 +08:00
Eric Gao
2f7281c2d2
[Feature][Metrics] Add resource download related metrics for workers (#10749)
* [Feature][Metrics] Add resource download related metrics for workers (#9324)

* [Feature][Metrics] Fix bugs and add grafana demos for worker resource download metrics (#9324)

* [Feature][Metrics] Add docs to resource related metrics (#9324)

* [Feature][Metrics] Use tags to indicate status in metrics (#9324)

* [Feature][Metrics] Fix demos, docs and remove redundant code (#9324)

* [Feature][Metrics] Remove .pnpm-debug.log (#9324)

* [Feature][Metrics] Fix style check (#9324)

* [Feature][Metrics] Replace KB with bytes for the unit of resource file size in metrics (#9324)

* [Feature][Metrics] Make code neat (#9324)
2022-07-12 11:44:34 +08:00
Wenjun Ruan
f639a2eed4
[Fix-10854] Fix database restart may lost task instance status (#10866)
* Fix database update error doesn't rollback the task instance status

* Fix database error may cause workflow dead with running status
2022-07-11 09:57:00 +08:00
Wenjun Ruan
426567348e
Remove quartz in service (#10748)
* Remove quartz in service
2022-07-06 15:43:55 +08:00
Wenjun Ruan
67d14fb7b3
[Fix-10785] Fix state event handle error will not retry (#10786)
* Fix state event handle error will not retry

* Use state event handler to deal with the event
2022-07-06 14:53:28 +08:00
WangJPLeo
8f621ff98b
[Optimization] Calculate global parameter and local parameter at master. (#10704)
* Global parameter and local parameter calculation external expansion.

* k8s task ut fix.

* TimePlaceholderUtils import DateUtils fix

* follow the review comments to fix.

* follow the review comments to fix.

* e2e rerun
2022-06-30 22:45:25 +08:00
Wenjun Ruan
35b25da863
Validate master/worker config (#10649) 2022-06-28 20:17:43 +08:00
Wenjun Ruan
66624c5c86
[Bug] [Master] Worker failover will cause task cannot be failover (#10631)
* fix worker failover may lose event
2022-06-28 16:08:35 +08:00
pinkhello
719a9d4532
[Improvement][Worker] fixed naming of rpc package (#10614) 2022-06-26 10:30:09 +08:00
xiangzihao
1111371c9a
add datasource health check to the healthcheck endpoint (#10588) 2022-06-24 13:29:49 +08:00
xiangzihao
0f38217b12
fix_10514 (#10568) 2022-06-23 16:15:08 +08:00
Wenjun Ruan
db595b3eff
Optimize master log, use MDC to inject workflow instance id and task instance id in log (#10516)
* Optimize master log, add workflow instance id and task instance id in log

* Use MDC to set the workflow info in log4j

* Add workflowInstanceId and taskInstanceId in MDC
2022-06-23 11:45:06 +08:00
Eric Gao
cc06eaaf54
[Improvement][Metrics] Apply micrometer naming convention to metrics (#10477)
* Apply micrometer naming convention to worker metrics
* Apply micrometer naming convention to all current metrics
* Fix remaining metrics names, update English docs and add Chinese docs
* Fix metrics names in grafana-demo dashboards
2022-06-21 14:27:06 +08:00
Wenjun Ruan
ad2646ff1f
Fix TaskProcessorFactory#getTaskProcessor get common processor is not thread safe (#10479)
* Fix TaskProcessorFactory#getTaskProcessor get common processor is not thread safe
2022-06-16 21:46:18 +08:00
Wenjun Ruan
78c5fcc6ac
Add mysql registry plugin (#10406)
* Add mysql registry plugin
2022-06-13 11:24:42 +08:00
Wenjun Ruan
e21d7b1551
[Feature][metrics] Add master, worker metrics (#10326)
* Add master metrics

* fix UT

* Add url to mysql profile

* Add worker metrics

* Update grafana config

* Add system metrics doc

* Add process failover counter

* Add metrics image

* Change jpg to png

* Add command insert metrics

* Fix UT

* Revert UT
2022-06-09 10:55:39 +08:00
Wenjun Ruan
2d3be6b36c
Add dolphinscheduler-scheduler module (#10360)
* Add dolphinscheduler-scheduler module
2022-06-04 16:39:33 +08:00
Wenjun Ruan
022e4886be
Remove quartz at WorkerServer (#10358)
* Remove quartz at WorkerServer

* move k8s and permission from dolphinscheduler-service to dolphinscheduler-api
2022-06-04 00:18:01 +08:00
kezhenxu94
d80cf21456
Clean up unused dependencies and packaging issues (#9944) 2022-05-31 15:22:41 +08:00
JinYong Li
49979c658e
[Fix-8828] [Master] Assign tasks to worker optimization (#9919)
* fix 9584

* master recall

* fix ut

* update logger

* update delay queue

* fix ut

* remove sleep

Co-authored-by: 进勇 <lijinyong@cai-inc.com>
Co-authored-by: JinyLeeChina <jiny.li@foxmail.com>
2022-05-31 11:49:54 +08:00
lugela
a0771541e5
[Fix-10181] Fix the logic of judging that the tenant does not exist (#10185)
* [Fix-10181] Fix the logic of judging that the tenant does not exist

Use the Linux `id` command to look up the user; it covers users in the /etc/passwd file as well as cached sssd users.
For example, `id test` prints:
1. If the user exists in the /etc/passwd file or LDAP: uid=1030(test) gid=1030(test) groups=1030(test)
2. If the user exists in neither the /etc/passwd file nor LDAP: id: test: no such user

Not yet tested on Windows and macOS.

* [Fix-10181] Fix the logic of judging that the tenant does not exist

Add the 'tenant-distributed-user' configuration item to the worker application.yaml to support distributed users. If it is false, the original logic remains unchanged.

For now, when the tenant is a distributed user, creating users in Linux is not allowed.

* [Fix-10181] Fix the logic of judging that the tenant does not exist

Add test method

* [Fix-10181] Fix the logic of judging that the tenant does not exist

Add parameter description to configuration.md

Co-authored-by: ouyangl <ouyangl@tebon.com.cn>
2022-05-26 14:58:07 +08:00
旺阳
aba5f8a40e
[improve] Change Mysql Driver (#10220) 2022-05-25 14:09:15 +08:00
旺阳
de5507fb19
[Fix-10103][k8s]Fix k8s Change DataSource Error (#10128) 2022-05-24 13:59:42 +08:00
Paul Zhang
8562f6a878
[Feature][Log]Add timezone information in log output (#9913) 2022-05-06 17:31:44 +08:00
LongJGun
778018dcfb
[Bug] [worker] fix CommandType TASK_EXECUTE_RUNNING_ACK don't consumed (#9849) (#9850) 2022-05-03 14:28:19 +08:00
Jiajie Zhong
de50f43de6
[common] Make dolphinscheduler_env.sh work when start server (#9726)
* [common] Make dolphinscheduler_env.sh work

* Change dist tarball `dolphinscheduler_env.sh` location
  from `bin/` to `conf/`, so that users can make all their
  configuration changes in one single directory,
  and we only need to add `$DOLPHINSCHEDULER_HOME/conf`
  when we start our server instead of adding both
  `$DOLPHINSCHEDULER_HOME/conf` and `$DOLPHINSCHEDULER_HOME/bin`
* Change the `start.sh`'s path of `dolphinscheduler_env.sh`
* Change the setting order of `dolphinscheduler_env.sh`
* `bin/env/dolphinscheduler_env.sh` will overwrite the `<server>/conf/dolphinscheduler_env.sh`
  when starting the server using `bin/dolphinsceduler_daemon.sh` or `bin/install.sh`
* Change the related docs
2022-04-25 15:35:43 +08:00
caishunfeng
5657cb9aec
[Bug-9719][Master] fix failover fail because task plugins has not been loaded (#9720) 2022-04-24 20:34:21 +08:00
WangJPLeo
996790ce9e
[Improvement-9609][Worker]The resource download method is selected according to the configurati… (#9636)
* The resource download method is selected according to the configuration and the service startup verification is added.

* common check CI fix

* Startup check changed to running check

* code smell

* Coordinate resources to increase test coverage.

* Split resource download method.

* Unit Test Coverage

Co-authored-by: WangJPLeo <wangjipeng@whaleops.com>
2022-04-22 11:45:49 +08:00
caishunfeng
239be31ab7
[Bug] cancel application when kill task (#9624)
* cancel application when kill task

* add warn log

* add cancel application
2022-04-20 22:46:15 +08:00
WangJPLeo
9964c4c1e1
[Fix-9593] Storage Management StorageOperate No Instance (#9594)
* Storage Management StorageOperate No Instance

* Add StorageOperateManager unit test

* Add license header

* Fix issues in SonarCloud code analysis

Co-authored-by: WangJPLeo <wangjipeng@whaleops.com>
2022-04-20 09:58:37 +08:00