See also #37205
Previously releasing growing segments could be triggered by two
conditions:
- Sealed Segment with same id is loaded
- Segment start position is before target checkpoint ts
Which has a worst case that the corresponding sealed segment is
compacted and the checkpoint is pinned by a growing l0 segment.
This PR introduces a new rule that: a growing segment could be released
if the segment id appeared in current target dropped segment id list.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry-pick from master
pr: #37178
Related to #37177
Previous PR #37160
Collection meta is not ref-ed when loading l0 segment in `RemoteLoad`
policy, which cause collection meta release when lots of l0 segment
released.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #35303#30404
This PR change return type of `DeleteCodec.Deserialize` from
`storage.DeleteData` to `DeltaData`, which
reduces the memory usage of interface header.
Also refine `storage.DeltaData` methods to make it easier to usage.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #36102
Previous PR #36107 add grpc inteceptor to observe rpc stats. Using same
strategy, this pr add gin middleware to observer restful v2 rpc stats.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #36887
`LoadDeltaLogs` API did not check memory usage. When system is under
high delete load pressure, this could result into OOM quit.
This PR add resource check for `LoadDeltaLogs` actions and separate
internal deltalog loading function with public one.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #36887
Previously using newly create pool per request shall cause goroutine
leakage. This PR change this behavior by using singleton delete pool.
This change could also provide better concurrency control over delete
memory usage.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Also remove conflit check when executing L0. The exclusive is already
guarenteed in scheduler
See also: #37140
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
Related to #37183
Utilize proxy metacache for `HasCollection` request, if collection
exists in metacache, it could be deducted that collection must exist in
system.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #37158
Return the GuaranteeTS so that the subsequent requests following the
correct TS.
BeginTS is the current timestamp when the task is created.
The GuaranteeTS is the one parsed based on both consistency level and
beginTS, in PreExecute of the task on Proxy.
The delegator will wait until GuaranteeTS is met.
In PostExecute of the task on Proxy, the TS of the first iterator
request will be returned to the SDK and add it to the subsequent
requests.
Hence, if the default consistency level is Eventually or Bounded, the
order of TS will be
> Guarantee TS < BeginTS
If it returns the BeginTS, the second request will need to catch up and
result in extra 200ms max of latency, which results in something like
| Call | Latency |
| --- | --- |
| first call on `Next()` | 30ms |
| second call on `Next()` | 210ms |
| third call on `Next()` | 10ms |
| fourth call on `Next()` | 11 ms |
| ... | ... |
where we expect
| Call | Latency |
| --- | --- |
| first call on `Next()` | 30ms |
| second call on `Next()` | 10ms |
| third call on `Next()` | 10ms |
| fourth call on `Next()` | 11 ms |
| ... | ... |
Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
Related to #37177
Previous PR #37160
Collection meta is not ref-ed when loading l0 segment in `RemoteLoad`
policy, which cause collection meta release when lots of l0 segment
released.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Timeout is a bad design for long running tasks, especially using a
static timeout config. We should monitor execution progress and fail the
task if the progress has been stale for a long time.
This pr is a small patch to stop DC from marking compaction tasks
timeout, while still waiting for DN to finish. The design is
self-conflicted. After this pr, mix and L0 compaction are no longer
controlled by DC timeout, but clustering is still under timeout control.
The compaction queue capacity grows larger for priority calc, hence
timeout compactions appears more often, and when timeout, the queuing
tasks will be timeout too, no compaction will success after.
See also: #37108, #37015
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
Related to #35303
Slice of `storage.PrimaryKey` will have extra interface cost for each
element, which may cause notable memory usage when delta row count
number is large.
This PR replaces PrimaryKey slice with PrimaryKeys interface saving the
extra interface cost.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #36621
1. Add API to access task runtime metrics, including:
- build index task
- compaction task
- import task
- balance (including load/release of segments/channels and some leader
tasks on querycoord)
- sync task
2. Add a debug model to the webpage by using debug=true or debug=false
in the URL query parameters to enable or disable debug mode.
Signed-off-by: jaime <yun.zhang@zilliz.com>
Related to #37112
Skip load logic used to work only when there is multiple segment load
info entires in load request. In continous delete case, delegator still
loads l0 segment, which occupies lot of memory.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
OSPP 2024 project:
https://summer-ospp.ac.cn/org/prodetail/247410235?list=org&navpage=org
Solutions:
- parser (planparserv2)
- add CallExpr in planparserv2/Plan.g4
- update parser_visitor and show_visitor
- grpc protobuf
- add CallExpr in plan.proto
- execution (`core/src/exec`)
- add `CallExpr` `ValueExpr` and `ColumnExpr` (both logical and
physical) for function call and function parameters
- function factory (`core/src/exec/expression/function`)
- create a global hashmap when starting milvus (see server.go)
- the global hashmap stores function signatures and their function
pointers, the CallExpr in execution engine can get the function pointer
by function signature.
- custom functions
- empty(string)
- starts_with(string, string)
- add cpp/go unittests and E2E tests
closes: #36559
Signed-off-by: Yinzuo Jiang <jiangyinzuo@foxmail.com>
issue: https://github.com/milvus-io/milvus/issues/37083
We use vector of string_view to save data temporally but real string
data will be released after record batch is deconstructed.
Change it to vector of string to avoid memory corruption.
---------
Signed-off-by: sunby <sunbingyi1992@gmail.com>