Commit Graph

145 Commits

Author SHA1 Message Date
congqixia
079276c6ff
fix: [2.4] Unify hook singleton implementation in proxy (#34888)
Cherry-pick from master
pr: #34887
Related to #34885

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-26 18:07:53 +08:00
wayblink
1e5c71d550
fix: [cherry-pick] fix dropped segment still visible after dropped by L2 single compaction (#35006)
bug: #35003

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-07-26 13:47:48 +08:00
cai.zhang
74adedf750
enhance: Optimized the GC logic to ensure that memory is released in time (#34950)
issue: #34703 

master pr: #34949

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-07-24 14:07:43 +08:00
Chun Han
ae1636c2be
fix: refine handling type for segment pruner(#34923) (#34926)
related: #34923
pr: https://github.com/milvus-io/milvus/pull/34925

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-07-24 12:05:44 +08:00
wayblink
8f3c126129
enhance: [cherry-pick] support l2 single compaction (#34929)
#34928
pr: #34935

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-07-24 11:47:50 +08:00
cai.zhang
4ed62e9dbb
enhance: [cherry-pick] Add integration test for clustering compaction (#34860)
issue: #34792 

master pr: #34881

Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2024-07-22 17:49:42 +08:00
wayblink
21973a600d
enhance: [cherry-pick] refine clustering compaction basic it (#34794)
issue: #34792
pr: #34793

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-07-21 20:05:41 +08:00
yihao.dai
07bc1b6717
enhance: Seal by total growing segments size (#34692) (#34779)
Seals the largest growing segment if the total size of growing segments
of each shard exceeds the size threshold(default 4GB). Introducing this
policy can help keep the size of growing segments within a suitable
level, alleviating the pressure on the delegator.

issue: https://github.com/milvus-io/milvus/issues/34554

pr: https://github.com/milvus-io/milvus/pull/34692

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-07-19 18:25:50 +08:00
XuanYang-cn
edefc3cbb5
enhance: [skip e2e]Enable compaction it test (#34526) (#34720)
pr: #34526

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-07-16 18:19:38 +08:00
smellthemoon
0fdb288de7
enhance: upsert support autoid(#30342) (#34633)
pr: #30342 
issue: #29258

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
2024-07-15 20:53:39 +08:00
wei liu
cf701a9bf0
enhance: Preserve fixed-size memory in delegator node for growing segment (#34600)
issue: #34595
pr: #34596
When consuming insert data on the delegator node, QueryCoord will move
out some sealed segments to manage its memory usage. After the growing
segment gets flushed, some sealed segments from other workers will be
moved back to the delegator node. To avoid the frequent movement of
segments, we estimate the maximum growing row count and preserve a
fixed-size memory in the delegator node.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-07-15 20:51:46 +08:00
wei liu
d3e94f9861
enhance: Use Blocked Bloom Filter instead of basic bloom fitler impl (#34377)
issue: #32995
pr: #33405
To speed up the construction and querying of Bloom filters, we chose a
blocked Bloom filter instead of a basic Bloom filter implementation.

WARN: This PR is compatible with old version bf impl, but if fall back
to old milvus version, it may causes bloom filter deserialize failed.

In single Bloom filter test cases with a capacity of 1,000,000 and a
false positive rate (FPR) of 0.001, the blocked Bloom filter is 5 times
faster than the basic Bloom filter in both querying and construction, at
the cost of a 30% increase in memory usage.

Block BF construct time {"time": "54.128131ms"}
Block BF size {"size": 3021578}
Block BF Test cost {"time": "55.407352ms"}
Basic BF construct time {"time": "210.262183ms"}
Basic BF size {"size": 2396308}
Basic BF Test cost {"time": "192.596229ms"}
In multi Bloom filter test cases with a capacity of 100,000, an FPR of
0.001, and 100 Bloom filters, we reuse the primary key locations for all
Bloom filters to avoid repeated hash computations. As a result, the
blocked Bloom filter is also 5 times faster than the basic Bloom filter
in querying.

Block BF TestLocation cost {"time": "529.97183ms"}
Basic BF TestLocation cost {"time": "3.197430181s"}

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-07-05 17:04:10 +08:00
wayblink
c62bf8a0b0
fix: [Cherry-pick]Pick major compaction fixs and optimizations (#34360)
This PR cherry-picks the following commits:

- fix: sync partitiion stats blocking balance task #33742
- fix: Fix meta prefix overlap bug #33830
- fix: Small fixs of major compaction #33929 
- fix: Fix memory buffer error & some renaming #33850
- fix: sync part stats task cannot be finished #34027 
- Add an option to enable/disable vector field clustering key #34097
- fix: fix error ignore in compactor #34169
- fix:load major compaction partial result #34052
- Use new stream segment reader in clustering compaction #34232

issue: #30633
pr: #33742 #33830 #33929 #33850 #34027 #34097 #34169 #34052 #34232

---------

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
Signed-off-by: wayblink <anyang.wang@zilliz.com>
Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: Chun Han <116052805+MrPresent-Han@users.noreply.github.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-07-03 09:53:37 +08:00
wayblink
99586066f5
feat: [cherry-pick] Major compaction (#34326)
This PR cherry-picks the following commits:
fix: speed up segment lookup via channel name in datacoord (#33530)
needed by the next commit
  feat: Major compaction (#33620)

issue: #30633
pr: #33620

---------

Signed-off-by: yiwangdr <yiwangdr@gmail.com>
Signed-off-by: wayblink <anyang.wang@zilliz.com>
Co-authored-by: yiwangdr <80064917+yiwangdr@users.noreply.github.com>
Co-authored-by: MrPresent-Han <chun.han@zilliz.com>
2024-07-02 18:29:01 +08:00
zhenshan.cao
14a11e379c
enhance: Refactor Compaction to enable persistence(#33265) (#34268)
pr : #33265 

issue #33586

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-07-01 19:32:07 +08:00
yihao.dai
b1e74dc7cb
enhance: [cherry-pick] Decouple compaction from shard (#34157)
This PR cherry-picks the following commits:

- Implement task limit control logic in datanode.
https://github.com/milvus-io/milvus/pull/32881
- Load bf from storage instead of memory during L0 compaction.
https://github.com/milvus-io/milvus/pull/32913
- Remove dependencies on shards (e.g. SyncSegments, injection).
https://github.com/milvus-io/milvus/pull/33138
- Rename Compaction interface to CompactionV2.
https://github.com/milvus-io/milvus/pull/33858
- Remove the unused residual compaction logic.
https://github.com/milvus-io/milvus/pull/33932

issue: https://github.com/milvus-io/milvus/issues/32809

pr: https://github.com/milvus-io/milvus/pull/32881,
https://github.com/milvus-io/milvus/pull/32913,
https://github.com/milvus-io/milvus/pull/33138,
https://github.com/milvus-io/milvus/pull/33858,
https://github.com/milvus-io/milvus/pull/33932

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-06-25 20:22:03 +08:00
wei liu
061a00c58f
enhance: Enable database level replica num and resource groups for loading collection (#33052) (#33981)
pr: #33052

issue: #30040

This PR introduce two database level props:
1. database.replica.number
2. database.resource_groups

User can set those two database props by AlterDatabase API, then can
load collection without specified replica_num and resource groups. then
it will use database level load param when try to load collections.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-06-21 16:56:02 +08:00
Cai Yudong
ebd0af14f4
enhance: Handle Float16Vector/BFloat16Vector numpy bulk insert as same as BinaryVector (#33760) (#33788)
pr: #33760
Issue: #22837

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-06-13 10:49:57 +08:00
yihao.dai
396f8608dd
fix: Fix multiple vector fields import (#33723) (#33724)
1. Fix dim mismatch with multi-vector fields and JSON import
2. Enhance: do not display file ID in GetImportResponse.

issue: https://github.com/milvus-io/milvus/issues/33681,
https://github.com/milvus-io/milvus/issues/33682

pr: https://github.com/milvus-io/milvus/pull/33723

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-06-10 21:55:55 +08:00
yihao.dai
ed1dee9e38
enhance: Support L0 import (#33514) (#33712)
issue: https://github.com/milvus-io/milvus/issues/33157

pr: https://github.com/milvus-io/milvus/pull/33514

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-06-08 11:17:52 +08:00
yihao.dai
8ff5d2793c
fix: Fill stats log id and check validity (#33477) (#33478)
1. Fill log ID of stats log from import
2. Add a check to validate the log ID before writing to meta

issue: https://github.com/milvus-io/milvus/issues/33476

pr: https://github.com/milvus-io/milvus/pull/33477

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-05-31 14:13:46 +08:00
Cai Yudong
68e2d532d8
enhance: Cherry-pick following SparseFloatVector bulk insert PRs to Milvus2.4 (#33391)
Cherry pick from master
pr: #33064 #33101 #33187 #33259 #33224
#33064 Support readable JSON file import for
Float16/BFloat16/SparseFloat
  #33101 Store SparseFloatVector into parquet as JSON string
  #33187 Fix SparseFloatVector data parse error for parquet
  #33259 Fix SparseFloatVector data parse error for json
  #33224 Optimize bulk insert unittest

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-05-30 10:31:45 +08:00
yihao.dai
ad4c1975bd
fix: Fix filtering by partition key fails for importing data (#33274) (#33277)
Before executing the import, partition IDs should be reordered according
to partition names. Otherwise, the data might be hashed to the wrong
partition during import. This PR corrects this error.

issue: https://github.com/milvus-io/milvus/issues/33237

pr: https://github.com/milvus-io/milvus/pull/33274

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-05-23 11:25:40 +08:00
Cai Yudong
4fc7915c70
enhance: unify data generation test APIs (#32955)
Issue: #22837

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-05-14 14:33:33 +08:00
SimFG
4031abd2fa
enhance: change default partition num to 16 when using partition key (#32950)
/kind improvement

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-05-13 14:19:31 +08:00
wei liu
e2332bdc17
enhance: Enable channel exclusive balance policy (#32911)
issue: #32910  
* split replica's node list to channels when create replicas
 * balance nodes among channels when node change happens
 * implement channel level balance, let balance happens in channel level

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-10 17:27:31 +08:00
Cai Yudong
dc89c6f810
enhance: remove duplicated data generation APIs for bulk insert test (#32889)
Issue: #22837

including following changes:
1. Add API CreateInsertData() and BuildArrayData() in
internal/util/testutil
2. Remove duplicated test APIs from importutilv2 unittest and bulk
insert integration test

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-05-10 15:27:31 +08:00
Cai Yudong
bcdbd1966e
feat: Support sparse float vector bulk insert for binlog/json/parquet (#32649)
Issue: #22837

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-05-07 18:43:30 +08:00
yihao.dai
53874ce245
fix: Fix cannot specify partition name in binlog import (#32730)
issue: https://github.com/milvus-io/milvus/issues/32807

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-05-07 17:19:30 +08:00
yiwangdr
b1eacb2ae8
feat: datacoord/node watch based on rpc (#32036)
issue: https://github.com/milvus-io/milvus/issues/25309

Signed-off-by: yiwangdr <yiwangdr@gmail.com>
2024-05-07 15:49:30 +08:00
yihao.dai
4de063ae14
fix: Make the dynamic column optional in parquet import (#32738)
issue: https://github.com/milvus-io/milvus/issues/32729

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-05-07 11:21:29 +08:00
Buqian Zheng
37a99ca23e
fix: remove flaky sparse integration test (#32767)
issue: https://github.com/milvus-io/milvus/issues/32766

this test is outdated, thus removing it instead of fixing it.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-05-06 19:19:29 +08:00
congqixia
ecd8e52b53
fix: Use default integration case timeout for TestBinlogImport (#32701)
See also #32700

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-04-29 19:07:27 +08:00
yihao.dai
1594122c0a
enhance: Make the dynamic field file optional during numpy import (#32596)
1. Make the dynamic field file optional during numpy import
2. Add integration importing test with dynamic
3. Disallow file of pk when autoID=true during numpy import

issue: https://github.com/milvus-io/milvus/issues/32542

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-04-28 19:39:25 +08:00
congqixia
8c4fc1e61c
enhance: Close singleton etcd client in integration teardown (#32664)
Found lots of `failed to updateTimeTick` with error `skip
ChannelTimeTickMsg from un-recognized session 1` The reason was etcd
client became singleton and used last root path in multiple cases are
run in one suite.

This PR add close singleton client invocation to fix this problem.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-04-28 18:17:26 +08:00
congqixia
0ff7a46e95
fix: [skip e2e] Disable compaction for balance integration test (#32603)
See also #31468

Balance test suite may assert segment number based on test setup.
However the compaction may reduce the number and cause test cases
unstable.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-04-25 16:55:23 +08:00
Buqian Zheng
8a1017a152
enhance: add helpers to parse sparse float vector in JSON (#32543)
issue: #29419

added helper functions to parse JSON representation of sparse float
vectors, will be used by both the restful server and the import utils.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-04-25 14:47:24 +08:00
Cai Yudong
16b8b7b35d
enhance: Add get_vector unittest for float16 & bfloat16 (#32153)
Issue: #22837

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-04-23 16:15:23 +08:00
Cai Yudong
5fc439c600
feat: Bulk insert support fp16/bf16 (#32157)
Issue: #22837

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-04-22 10:05:22 +08:00
smellthemoon
d1ef0a81ee
enhance: [skip e2e] change some long logs (#32309)
<img width="1042" alt="image"
src="https://github.com/milvus-io/milvus/assets/64083300/8daa9ab9-1988-4398-a92a-7d2dac2cd8cd">
change this log

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-04-17 10:25:19 +08:00
yihao.dai
558feed5ed
fix: Use pk from binlog during import (#32118)
During binlog import, even if the primary key's autoID is set to true,
the primary key from the binlog should be used instead of being
reassigned.

issue: https://github.com/milvus-io/milvus/discussions/31943,
https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-04-16 14:51:20 +08:00
congqixia
b87f41128b
fix: [skip e2e] Make channel balance test accept flushing segments (#32229)
See also #30973

Make the case stable since the segment state may be flushing when suite
tries to check segment state.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-04-15 11:27:18 +08:00
chyezh
48fe977a9d
enhance: declarative resource group api (#31930)
issue: #30647

- Add declarative resource group api

- Add config for resource group management

- Resource group recovery enhancement

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-04-15 08:13:19 +08:00
Patrick Weizhi Xu
52ae47c850
enhance: gather materialized view search info once per request (#31996)
issue: #29892 

This PR:
1. Move the process of gathering materialized search info to when the
search plan is created, before it goes to each segment, to avoid
repeated work and access the plan node under multi-threaded
circumstances.
2. Enforce the supported MV type to `VARCHAR`
3. Add integration test

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
2024-04-11 15:21:19 +08:00
yihao.dai
273df98e20
enhance: Add binlog import intergration test (#32112)
issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-04-11 10:31:18 +08:00
Buqian Zheng
2fdf1a6e76
feat: [Sparse Float Vector] added some integration tests (#31062)
add some integration tests for sparse float vector support

https://github.com/milvus-io/milvus/issues/29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-04-10 19:57:18 +08:00
SimFG
90bed1caf9
enhance: add the related data size for the read apis (#31816)
issue: #30436
origin pr: #30438
related pr: #31772

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-04-10 15:07:17 +08:00
wei liu
c5a9cae44e
enhance: [skip e2e]remove useless suspend/resume gc operation in integration test (#31954)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-04-09 19:55:17 +08:00
yiwangdr
1cd15d9322
test: support segment release in integration test (#31190)
issue: #29507

Notice that api_testonly.go files should be guarded by compiler tag
`test`, so that production build rules don't compile them and these APIs
don't get misused.

Signed-off-by: yiwangdr <yiwangdr@gmail.com>
2024-04-09 11:39:17 +08:00
congqixia
1c2ae59ece
fix: [skip e2e] Dedup available ports and retry for integration setup (#31902)
See also #31901

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-04-08 10:35:17 +08:00