doris.git
91 min ago[improvement](scan) remove concurrency limit if scan has predicate (#13021) master
Mingyu Chen [Wed, 28 Sep 2022 09:07:07 +0000 (17:07 +0800)] 
[improvement](scan) remove concurrency limit if scan has predicate (#13021)

If a scan node has a predicate, we cannot limit the concurrency of its scanners,
because we do not know how much data needs to be scanned.
Limiting the concurrency in that case can make the query very slow.

For example:
select * from tbl limit 1: the concurrency will be 1;
select * from tbl where k1=1 limit 1: the concurrency will not be limited.
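
The rule above can be sketched roughly as follows. This is only an illustration, not the code in the PR: the function name, `rows_per_scanner`, and `max_scanners` are hypothetical parameters chosen for the example.

```python
def scanner_concurrency(limit, has_predicate, rows_per_scanner, max_scanners):
    """Return how many scanners may run concurrently (hypothetical helper).

    limit: the LIMIT value of the query, or None if there is no LIMIT.
    has_predicate: True if the scan node carries a WHERE predicate.
    rows_per_scanner, max_scanners: assumed tuning knobs, not Doris configs.
    """
    if limit is None or has_predicate:
        # With a predicate we cannot know how many rows must be scanned
        # to produce `limit` matches, so the concurrency is not capped.
        return max_scanners
    # Without a predicate, just enough scanners to cover `limit` rows.
    needed = -(-limit // rows_per_scanner)  # ceiling division
    return max(1, min(max_scanners, needed))
```

With these assumptions, `limit 1` without a predicate yields one scanner, while the same query with `where k1=1` keeps the full concurrency.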

97 min ago[fix](planner) fix push down no grouping agg (#12983)
luozenglin [Wed, 28 Sep 2022 09:01:01 +0000 (17:01 +0800)] 
[fix](planner) fix push down no grouping agg (#12983)

The value column of the agg does not support the zone_map index. This fixes a null pointer caused by pushing the value column down to the zone map.

2 hours ago[improvement](memory) set TCMALLOC_HEAP_LIMIT_MB to control memory consumption of...
Yongqiang YANG [Wed, 28 Sep 2022 07:44:18 +0000 (15:44 +0800)] 
[improvement](memory) set TCMALLOC_HEAP_LIMIT_MB to control memory consumption of tcmalloc (#12981)

4 hours ago[DOC](datev2) Add documents for DateV2 (#12976)
Gabriel [Wed, 28 Sep 2022 06:36:26 +0000 (14:36 +0800)] 
[DOC](datev2) Add documents for DateV2 (#12976)

4 hours ago[optimization](array-type) optimize error prompts when sql parser report error (...
carlvinhust2012 [Wed, 28 Sep 2022 06:35:41 +0000 (14:35 +0800)] 
[optimization](array-type) optimize error prompts when sql parser report error (#12999)

Co-authored-by: hucheng01 <hucheng01@baidu.com>
6 hours ago[enhancement](memory) Jemalloc performance optimization and compatibility with MemTra...
Xinyi Zou [Wed, 28 Sep 2022 04:04:29 +0000 (12:04 +0800)] 
[enhancement](memory) Jemalloc performance optimization and compatibility with MemTracker #12496

6 hours ago[chore](regression-test) add default group(p0) for regression-test (#12977)
Jerry Hu [Wed, 28 Sep 2022 03:47:19 +0000 (11:47 +0800)] 
[chore](regression-test) add default group(p0) for regression-test (#12977)

7 hours ago[improvement](test) cache data from s3 to cacheDataPath (#13018)
Yongqiang YANG [Wed, 28 Sep 2022 02:43:55 +0000 (10:43 +0800)] 
[improvement](test) cache data from s3 to cacheDataPath (#13018)

Currently, regression data is stored in sf1DataPath, which can be local or remote.
For performance reasons, we use a local dir for the community pipeline; however, we then need to prepare data on every machine,
and this process is error-prone. So we transparently cache data from s3 locally, and thus only need to configure one data source.

8 hours ago[feature](Nereids) use one stage aggregation if available (#12849)
morrySnow [Wed, 28 Sep 2022 02:38:03 +0000 (10:38 +0800)] 
[feature](Nereids) use one stage aggregation if available (#12849)

Currently, we always disassemble an aggregation into two stages: local and global. However, in some cases one-stage aggregation is enough. One-stage aggregation has two advantages:
1. avoid unnecessary exchange.
2. have a chance to do colocate join on the top of aggregation.

This PR moves the AggregateDisassemble rule from the rewrite stage to the optimization stage, and chooses one-stage or two-stage aggregation according to cost.

8 hours ago[Improvement](sort) Reuse memory in sort node (#12921)
Gabriel [Wed, 28 Sep 2022 01:44:35 +0000 (09:44 +0800)] 
[Improvement](sort) Reuse memory in sort node (#12921)

9 hours ago[fix](join)report 'natural join is not supported' instead of getting wrong result...
starocean999 [Wed, 28 Sep 2022 01:08:56 +0000 (09:08 +0800)] 
[fix](join)report 'natural join is not supported' instead of getting wrong result (#13008)

* [fix](join)report 'natural join is not supported' instead of getting wrong result

* add regression test

9 hours ago[Bug](function) core dump on substr #13007
Pxl [Wed, 28 Sep 2022 00:54:49 +0000 (08:54 +0800)] 
[Bug](function) core dump on substr #13007

18 hours ago[chore](third-party) Fix compilation errors reported by clang-15 (#13016)
Adonis Ling [Tue, 27 Sep 2022 15:46:43 +0000 (23:46 +0800)] 
[chore](third-party) Fix compilation errors reported by clang-15 (#13016)

Add some compile flags to eliminate compilation errors reported by clang-15.

20 hours ago[enhancement](load) avoid duplicate reduce on same TabletsChannel #12975
zhannngchen [Tue, 27 Sep 2022 14:03:08 +0000 (22:03 +0800)] 
[enhancement](load) avoid duplicate reduce on same TabletsChannel #12975

In the policy changed by PR #12716, when the hard limit is reached, multiple threads may pick the same LoadChannel and call reduce_mem_usage on the same TabletsChannel. Although a lock and a condition variable prevent multiple threads from reducing memory usage concurrently, the threads can still do the same reduce work on that channel several times, one after another, even if its memory has just been reduced.
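
One possible way to skip such repeated work is a generation counter: a thread notes the generation before it waits, and skips the reduction if the generation has changed by the time it holds the lock. This is a sketch of that general technique only; the class and method names are hypothetical, and the PR may well solve the problem differently.

```python
import threading


class TabletsChannelSketch:
    """Hypothetical stand-in for a TabletsChannel, for illustration only."""

    def __init__(self):
        self._lock = threading.Lock()
        self._generation = 0   # bumped after every completed reduction
        self.reduce_count = 0  # counts how often the reduce work really ran

    def observe(self):
        """Read the current generation before waiting for the lock."""
        return self._generation

    def reduce_if_unchanged(self, seen_generation):
        """Reduce memory only if nobody reduced since `seen_generation`."""
        with self._lock:
            if self._generation != seen_generation:
                # Another thread already reduced this channel while we waited.
                return False
            self.reduce_count += 1  # stands in for the real flush work
            self._generation += 1
            return True
```

Two callers that observed the same generation then perform the reduction only once: the first call returns `True`, the second returns `False`.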

20 hours ago[feature-wip](new-scan) support more load situation (#12953)
Mingyu Chen [Tue, 27 Sep 2022 13:48:32 +0000 (21:48 +0800)] 
[feature-wip](new-scan) support more load situation (#12953)

20 hours agofix_md5sum_and_sm3sum (#13009)
yongjinhou [Tue, 27 Sep 2022 13:41:14 +0000 (21:41 +0800)] 
fix_md5sum_and_sm3sum (#13009)

21 hours ago[feature](Nereids) Eliminate outer join (#12985)
jakevin [Tue, 27 Sep 2022 13:09:25 +0000 (21:09 +0800)] 
[feature](Nereids) Eliminate outer join (#12985)

Eliminate an outer join if there is a non-null predicate on slots of the inner side of the outer join.

TODO:
1. use a constant variable to handle it (we can handle more cases like nullsafeEqual ......)
2. use constant folding to handle null values, which is more general and does not require writing long logical judgments
3. handle null safe equals (<=>)
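
The rewrite can be illustrated with a small sketch. The join type strings and the function name are hypothetical, and the handling of full outer joins follows the textbook form of this optimization rather than anything stated in this commit.

```python
def eliminate_outer_join(join_type, left_not_null, right_not_null):
    """Return the join type after outer join elimination (illustrative only).

    left_not_null / right_not_null: True if a predicate above the join
    rejects rows where that side's slots are NULL.
    """
    if join_type == "LEFT_OUTER" and right_not_null:
        # NULL-padded right rows are filtered out anyway.
        return "INNER"
    if join_type == "RIGHT_OUTER" and left_not_null:
        return "INNER"
    if join_type == "FULL_OUTER":
        # Assumption: the textbook rule, not confirmed by this PR.
        if left_not_null and right_not_null:
            return "INNER"
        if left_not_null:
            return "LEFT_OUTER"
        if right_not_null:
            return "RIGHT_OUTER"
    return join_type
```

For instance, `t1 LEFT OUTER JOIN t2 ... WHERE t2.k > 0` can run as an inner join, because every row with a NULL-padded `t2.k` fails the predicate.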

23 hours ago[feature](Nereids) Set pre-aggregation status for OLAP table scan (#12785)
Shuo Wang [Tue, 27 Sep 2022 11:12:15 +0000 (19:12 +0800)] 
[feature](Nereids) Set pre-aggregation status for OLAP table scan (#12785)

This is the second step for #12303.

The previous PR #12464 added the framework to select the rollup index for OLAP tables, but pre-aggregation is turned on by default.
This PR sets the pre-aggregation status for OLAP table scans.

The main steps are as below:
1. Select the rollup index when an aggregate is present; this is handled by the `SelectRollupWithAggregate` rule. Expressions in aggregate functions, grouping expressions, and pushdown predicates are used to check whether pre-aggregation should be turned off.
2. Selecting from an OLAP scan without an aggregate plan is handled by `SelectRollupWithoutAggregate`.

24 hours ago[Feature](serialize) move block_data_version to fe heart beat (#12667)
Pxl [Tue, 27 Sep 2022 10:25:54 +0000 (18:25 +0800)] 
[Feature](serialize) move block_data_version to fe heart beat (#12667)

Move block_data_version from be config to fe heart beat

25 hours ago[feature-wip](statistics) step6: statistics is available (#8864)
ElvinWei [Tue, 27 Sep 2022 09:24:14 +0000 (17:24 +0800)] 
[feature-wip](statistics) step6: statistics is available (#8864)

This pull request includes some implementations of the statistics (https://github.com/apache/incubator-doris/issues/6370).

Execute SQL statements such as `ANALYZE`, `SHOW ANALYZE`, and `SHOW TABLE/COLUMN STATS...` to collect statistics and query them.

The following are the changes in this PR:
1. Added the necessary test cases for statistics.
2. Statistics optimization. To ensure the validity of statistics, they can only be updated after a statistics task completes or manually via SQL; collected statistics must not be changed in any other way, so that they are not distorted.
3. Some code and comments have been adjusted to fix checkStyle problems.
4. Removed some code that was previously added because statistics were not available.
5. Added a configuration that indicates whether to enable statistics. The current statistics may not be stable, so they are not enabled by default (`enable_cbo_statistics=false`). Currently, this is mainly used for CBO tests.

Some simple examples of the statistics syntax (see PR #12766):
```SQL
-- enable statistics
SET enable_cbo_statistics=true;

-- collect statistics for all tables in the current database
ANALYZE;

-- collect all column statistics for table1
ANALYZE test.table1;

-- collect statistics for siteid of table1
ANALYZE test.table1(siteid);
ANALYZE test.table1(pv, citycode);

-- collect statistics for partition of table1
ANALYZE test.table1 PARTITION(p202208);
ANALYZE test.table1 PARTITIONS(p202208, p202209);

-- display table statistics
SHOW TABLE STATS test.table1;

-- display partition statistics of table1
SHOW TABLE STATS test.table1 PARTITION(p202208);

-- display column statistics of table1
SHOW COLUMN STATS test.table1;

-- display column statistics of partition
SHOW COLUMN STATS test.table1 PARTITION(p202208);

-- display the details of the statistics jobs
SHOW ANALYZE;
SHOW ANALYZE idxxxx;
```

25 hours ago[enhancement](test) add tpcds_sf1000 to p2 (#12695)
Yongqiang YANG [Tue, 27 Sep 2022 09:12:52 +0000 (17:12 +0800)] 
[enhancement](test) add tpcds_sf1000 to p2 (#12695)

25 hours ago[enhancement](test) add tpch_sf10 cases to p2 (#12698)
Yongqiang YANG [Tue, 27 Sep 2022 09:12:37 +0000 (17:12 +0800)] 
[enhancement](test) add tpch_sf10 cases to p2 (#12698)

26 hours ago[Enhancement](optimize) optimize for insert_indices_from (#12807)
Pxl [Tue, 27 Sep 2022 07:49:15 +0000 (15:49 +0800)] 
[Enhancement](optimize) optimize for insert_indices_from (#12807)

26 hours ago[test](join)add join case5 #12854
zy-kkk [Tue, 27 Sep 2022 07:48:36 +0000 (15:48 +0800)] 
[test](join)add join case5 #12854

26 hours ago[regression-test](join)add join case5 #12854
zy-kkk [Tue, 27 Sep 2022 07:47:36 +0000 (15:47 +0800)] 
[regression-test](join)add join case5 #12854

28 hours ago[typo](docs)Add bitmap_count doc And Adjustment function list (#12978)
Liqf [Tue, 27 Sep 2022 06:21:37 +0000 (14:21 +0800)] 
[typo](docs)Add bitmap_count doc And Adjustment function list (#12978)

28 hours ago[chore](build) Fix compilation errors reported by clang-15 (#13000)
Adonis Ling [Tue, 27 Sep 2022 06:04:44 +0000 (14:04 +0800)] 
[chore](build) Fix compilation errors reported by clang-15 (#13000)

Add a compile flag -Wno-unused-but-set-variable to build libGeo.a.

30 hours ago[function](bitmap) support bitmap_hash64 (#12992)
TengJianPing [Tue, 27 Sep 2022 04:16:02 +0000 (12:16 +0800)] 
[function](bitmap) support bitmap_hash64 (#12992)

30 hours ago[fix](projection)sort node's unmaterialized slots should be removed from resolvedTupl...
starocean999 [Tue, 27 Sep 2022 03:46:44 +0000 (11:46 +0800)] 
[fix](projection)sort node's unmaterialized slots should be removed from resolvedTupleExprs (#12963)

32 hours ago[chore](build) Support building from source on ubuntu-22.04 (aarch64) (#12813)
Adonis Ling [Tue, 27 Sep 2022 02:29:13 +0000 (10:29 +0800)] 
[chore](build) Support building from source on ubuntu-22.04 (aarch64) (#12813)

Support building from source on ubuntu-22.04

32 hours ago[feature-wip](unique-key-merge-on-write) fix thread safe issue in BetaRowsetWriter...
zhannngchen [Tue, 27 Sep 2022 02:28:18 +0000 (10:28 +0800)] 
[feature-wip](unique-key-merge-on-write) fix thread safe issue in BetaRowsetWriter (#12875)

32 hours ago[fix](like)prevent null pointer by unimplemented like_vec functions (#12910)
starocean999 [Tue, 27 Sep 2022 02:02:10 +0000 (10:02 +0800)] 
[fix](like)prevent null pointer by unimplemented like_vec functions (#12910)

* [fix](like)prevent null pointer by unimplemented like_vec functions

* fix pushed like predicate on dict encoded column bug

32 hours ago[fix](remote)fix bug for delete s3 dir and list s3 dir (#12918)
pengxiangyu [Tue, 27 Sep 2022 01:54:37 +0000 (09:54 +0800)] 
[fix](remote)fix bug for delete s3 dir and list s3 dir (#12918)

* fix bug for delete s3 dir and list s3 dir

33 hours ago[enhancement](workflow) Enable the shellcheck workflow to comment the PRs (#12633)
Adonis Ling [Tue, 27 Sep 2022 01:08:12 +0000 (09:08 +0800)] 
[enhancement](workflow) Enable the shellcheck workflow to comment the PRs (#12633)

> Due to the dangers inherent to automatic processing of PRs, GitHub’s standard pull_request workflow trigger by
default prevents write permissions and secrets access to the target repository. However, in some scenarios such
access is needed to properly process the PR. To this end the pull_request_target workflow trigger was introduced.

According to the article [Keeping your GitHub Actions and workflows secure](https://securitylab.github.com/research/github-actions-preventing-pwn-requests/), a workflow triggered by `pull_request`,
which is the trigger condition in `shellcheck.yml`, cannot comment on the PR because it lacks write permissions.

Although the `ShellCheck` workflow checks out the source, it does not build or test the source code. I think it is safe
to change the trigger condition from `pull_request` to `pull_request_target`, which gives the workflow write
permissions to comment on the PR.

33 hours ago[enhancement](memory) Trigger load channel flush based on process physical memory...
Xinyi Zou [Tue, 27 Sep 2022 01:07:38 +0000 (09:07 +0800)] 
[enhancement](memory) Trigger load channel flush based on process physical memory to avoid OOM #12960

When the physical memory of the process reaches 90% of the mem limit, trigger the load channel mgr to flush.
The default value of be.conf mem_limit is changed from 90% to 80%; stability is the priority.
Fix a deadlock between arena_locks in BufferPool::BufferAllocator::ScavengeBuffers and _lock in DebugString.
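
As a rough illustration of how the two percentages combine, the sketch below assumes that mem_limit is interpreted as a percentage of the machine's physical memory. The function name and parameters are hypothetical and are not taken from the BE code.

```python
def should_flush_load_channels(process_rss, physical_mem,
                               mem_limit_pct=80, trigger_pct=90):
    """Return True once the process uses trigger_pct of its mem limit.

    process_rss and physical_mem are in the same unit (e.g. bytes).
    mem_limit_pct mirrors the new be.conf default of 80%; trigger_pct
    is the 90% threshold from this commit. Both names are illustrative.
    """
    mem_limit = physical_mem * mem_limit_pct // 100
    threshold = mem_limit * trigger_pct // 100
    return process_rss >= threshold
```

Under these assumptions, a machine with 1000 units of physical memory has a mem limit of 800 and starts flushing load channels at 720.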

33 hours ago[regression-case](improve) improve regression test case (#12979)
TengJianPing [Tue, 27 Sep 2022 00:53:53 +0000 (08:53 +0800)] 
[regression-case](improve) improve regression test case (#12979)

33 hours ago[enhancement](AuditLoaderPlugin): add audit queue capacity configurat… (#12887)
wxy [Tue, 27 Sep 2022 00:50:30 +0000 (08:50 +0800)] 
[enhancement](AuditLoaderPlugin): add audit queue capacity configurat… (#12887)

33 hours ago[Bug](function) fix substr return null on row-based engine #12906
Pxl [Tue, 27 Sep 2022 00:47:32 +0000 (08:47 +0800)] 
[Bug](function) fix substr return null on row-based engine #12906

33 hours ago[fix](transfer_thread) fix the loss of notification. (#12988)
Xiaocc [Tue, 27 Sep 2022 00:44:02 +0000 (08:44 +0800)] 
[fix](transfer_thread) fix the loss of notification. (#12988)

43 hours ago[Chore](clang) fix some build fail on clang15 (#12882) branch-1.2-lts
Pxl [Mon, 26 Sep 2022 15:13:28 +0000 (23:13 +0800)] 
[Chore](clang) fix some build fail on clang15 (#12882)

remove unused variables

46 hours agofix doc typos (#12967)
zxealous [Mon, 26 Sep 2022 12:11:26 +0000 (20:11 +0800)] 
fix doc typos (#12967)

2 days ago[fix](column)fix get_shrinked_column misspell (#12961)
Shane [Mon, 26 Sep 2022 09:32:03 +0000 (17:32 +0800)] 
[fix](column)fix get_shrinked_column misspell  (#12961)

Fix misspell

2 days ago[feature](Nereids) constant expression folding (#12151)
shee [Mon, 26 Sep 2022 09:16:23 +0000 (17:16 +0800)] 
[feature](Nereids) constant expression folding (#12151)

2 days ago[refactor](fe-core src test catalog): refactor and replace use NIO #12818 (#12818)
DingGeGe [Mon, 26 Sep 2022 08:51:46 +0000 (16:51 +0800)] 
[refactor](fe-core src test catalog): refactor and replace use NIO #12818 (#12818)

2 days ago[function](hash) add support of murmur_hash3_64 (#12923)
TengJianPing [Mon, 26 Sep 2022 06:23:37 +0000 (14:23 +0800)] 
[function](hash) add support of murmur_hash3_64 (#12923)

2 days ago[fix](memtracker) Remove mem tracker record mem pool actual memory usage #12954
Xinyi Zou [Mon, 26 Sep 2022 04:54:06 +0000 (12:54 +0800)] 
[fix](memtracker) Remove mem tracker record mem pool actual memory usage #12954

To avoid inconsistent mem tracker consumption values across multiple queries/loads, and to account for the difference between the virtual memory that is allocated and the physical memory the process actually gains,
the memory allocated in PODArray and mempool is not recorded in the query/load mem tracker immediately, but is gradually recorded in the mem tracker as the memory is used.

However, mem pool allocates memory from the chunk allocator. If a chunk is being reused (used a second time or later), it may already occupy physical memory. The above mechanism then causes the load channel memory statistics to be lower than the actual value.

2 days agoOptimized materialized view documentation (#12798)
zy-kkk [Mon, 26 Sep 2022 04:25:20 +0000 (12:25 +0800)] 
Optimized materialized view documentation (#12798)

Optimized materialized view documentation

2 days agoSpark load import kerberos parameter modification (#12924)
caoliang-web [Mon, 26 Sep 2022 04:24:43 +0000 (12:24 +0800)] 
Spark load import kerberos parameter modification (#12924)

Spark load import kerberos parameter modification

2 days ago[feature](nereids) extract single table expression for push down (#12894)
minghong [Mon, 26 Sep 2022 03:19:37 +0000 (11:19 +0800)] 
[feature](nereids) extract single table expression for push down (#12894)

In TPC-H q7, we have an expression like
(n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY') or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')

This expression implies
(n1.n_name = 'FRANCE' or n1.n_name = 'GERMANY')
The implied expression is logically redundant, but it can reduce the number of tuples output by scan(n1) if nereids pushes it down.

This PR introduces a rule to extract such expressions.

NOTE:
1. We only extract expressions on a single table.
2. If the extracted expression cannot be pushed down, e.g. it is on the right table of a left outer join, we need another rule to remove all the useless expressions.
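
The extraction can be sketched on a toy representation where the predicate is an OR of AND-lists and each atom is a `(table, column, value)` equality. The representation and the function name are assumptions made for this example and are unrelated to the Nereids expression classes.

```python
def extract_single_table(disjuncts, table):
    """Extract the predicate on `table` implied by an OR of conjunctions.

    disjuncts: list of conjunctions; each conjunction is a list of
    (table, column, value) equality atoms (toy representation).
    Returns an OR of conjunctions that only mention `table`, or None
    if some branch does not constrain `table` at all.
    """
    extracted = []
    for conjunct in disjuncts:
        on_table = [atom for atom in conjunct if atom[0] == table]
        if not on_table:
            # This branch accepts any row of `table`, so nothing is implied.
            return None
        extracted.append(on_table)
    return extracted


# The TPC-H q7 predicate from the commit message.
q7 = [
    [("n1", "n_name", "FRANCE"), ("n2", "n_name", "GERMANY")],
    [("n1", "n_name", "GERMANY"), ("n2", "n_name", "FRANCE")],
]
n1_filter = extract_single_table(q7, "n1")
```

Here `n1_filter` holds the two `n1` atoms, which corresponds to `n1.n_name = 'FRANCE' or n1.n_name = 'GERMANY'`.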

2 days ago[fix](parquet) fix write error data as parquet format. (#12864)
luozenglin [Mon, 26 Sep 2022 02:41:17 +0000 (10:41 +0800)] 
[fix](parquet) fix write error data as parquet format. (#12864)

* [fix](parquet) fix write error data as parquet format.

Fix incorrect data conversion when writing tiny int and small int data
to parquet files in non-vectorized engine.

2 days ago[fix](log)Audit log status is incorrect (#12824)
jiafeng.zhang [Mon, 26 Sep 2022 01:57:52 +0000 (09:57 +0800)] 
[fix](log)Audit log status is incorrect (#12824)

Audit log status is incorrect

2 days ago[typo](docs)Optimized string and date function doc (#12949)
zy-kkk [Mon, 26 Sep 2022 01:26:12 +0000 (09:26 +0800)] 
[typo](docs)Optimized string and date function doc (#12949)

2 days ago[typo](docs)Optimized date function doc order and add partial function doc #12878
zy-kkk [Mon, 26 Sep 2022 01:25:11 +0000 (09:25 +0800)] 
[typo](docs)Optimized date function doc order and add partial function doc #12878

2 days ago[feature-wip](new-scan)Add new odbc scanner and new odbc scan node (#12899)
Tiewei Fang [Mon, 26 Sep 2022 01:24:25 +0000 (09:24 +0800)] 
[feature-wip](new-scan)Add new odbc scanner and new odbc scan node (#12899)

2 days ago[chore](config) increase minimum thread num of some thread pool (#12917)
Jerry Hu [Mon, 26 Sep 2022 01:00:18 +0000 (09:00 +0800)] 
[chore](config) increase minimum thread num of some thread pool (#12917)

A minimum thread num that is too small causes additional overhead for creating and recycling threads.

2 days ago[Enhancement](debugging) Add more debug info for clang build (#12845)
Adonis Ling [Mon, 26 Sep 2022 00:50:12 +0000 (08:50 +0800)] 
[Enhancement](debugging) Add more debug info for clang build (#12845)

2 days ago[enhancement](test) add brown cases to p2 (#12694)
Yongqiang YANG [Sun, 25 Sep 2022 15:46:45 +0000 (23:46 +0800)] 
[enhancement](test) add brown cases to p2 (#12694)

2 days ago[enhancement](test) add github events cases to p2 (#12696)
Yongqiang YANG [Sun, 25 Sep 2022 15:46:15 +0000 (23:46 +0800)] 
[enhancement](test) add github events cases to p2 (#12696)

2 days ago[feature-wip](parquet-reader) pre read page data in advance to avoid frequent seek...
Ashin Gau [Sun, 25 Sep 2022 13:21:06 +0000 (21:21 +0800)] 
[feature-wip](parquet-reader) pre read page data in advance to avoid frequent seek (#12898)

1. Fix the bug of file position in `HdfsFileReader`
2. Reserve enough buffer for `ColumnColumnReader` to read large continuous memory

2 days ago[Refactor](datev2) Update comments for datev2/datetimev2 (#12823)
Gabriel [Sun, 25 Sep 2022 10:43:32 +0000 (18:43 +0800)] 
[Refactor](datev2) Update comments for datev2/datetimev2 (#12823)

3 days ago[Improvement](dict) optimize dictionary column (#12852)
Gabriel [Sun, 25 Sep 2022 10:29:10 +0000 (18:29 +0800)] 
[Improvement](dict) optimize dictionary column (#12852)

3 days ago[Improvement](predicate) Replace for-loop by memcpy (#12867)
Gabriel [Sun, 25 Sep 2022 10:27:59 +0000 (18:27 +0800)] 
[Improvement](predicate) Replace for-loop by memcpy (#12867)

3 days ago[feature](JSON datatype)Support JSON datatype (#10322)
Shane [Sun, 25 Sep 2022 06:06:49 +0000 (14:06 +0800)] 
[feature](JSON datatype)Support JSON datatype (#10322)

Add the `JSON` datatype. The following features are implemented by this PR:
1. `CREATE` tables with `JSON` type columns
2. `INSERT` values containing `JSON` type values given as `String`, which are represented in a binary format (AKA `JSONB`) at BE
3. `SELECT` JSON columns

For the detailed design, refer to [DSIP-016: Support JSON type](https://cwiki.apache.org/confluence/display/DORIS/DSIP-016%3A+Support+JSON+type)

* add JSONB data storage format type

* fix JsonLiteral resolve bug

* add DataTypeJson case in data_type_factory

* add JSON syntax check in FE

* add operators for jsonb_document; comparison between JSON type values is not currently supported

* add ColumnJson and DataTypeJson

* add JsonField to store JsonValue

* add JsonValue to convert String JSON to BINARY JSON and JsonLiteral case for vliteral

* add push_json for MysqlResultWriter

* JSON column need no zone_map_index

* Revert "JSON column need no zone_map_index"

This reverts commit f71d1ce1ded9dbae44a5d58abcec338816b70d79.

* add JSON writer and reader, ignore zone-map for JSON column

* add json_to_string for DataTypeJson

* add olap_data_convertor for JSON type

* add some enum

* add OLAP_FIELD_TYPE_JSON type, FieldTypeTraits for it and corresponding cases or functions

* fix column_json offsets overflow bug, format code

* remove useless TODOs, add CmpType cases for JSON type

* add license header

* format license

* format be codes

* resolve rebase master conflicts

* fix bugs for CREATE and meta related code

* refactor JsonValue constructors, add fe JSON cases and fix some bugs, reformat codes

* modification be codes along code review advice

* fix rebase conflicts with master

* add unit test for json_value and column_json

* fix rebase error

* rename json to jsonb

* fix some data convert bugs, set Mysql type to JSON

3 days ago[fix](load) print detailed error message (#12938) opt_perf
zhannngchen [Sun, 25 Sep 2022 02:31:41 +0000 (10:31 +0800)] 
[fix](load) print detailed error message (#12938)

fix flush failure return message

4 days ago[fix](function)fix string split function buffer overflow (#12834)
starocean999 [Sat, 24 Sep 2022 09:32:00 +0000 (17:32 +0800)] 
[fix](function)fix string split function buffer overflow (#12834)

4 days ago[fix](new-scan)Fix new scanner load job bugs (#12903)
Jibing-Li [Sat, 24 Sep 2022 09:21:19 +0000 (17:21 +0800)] 
[fix](new-scan)Fix new scanner load job bugs (#12903)

Fix bugs:
1. FE needs to send the file format (e.g. parquet, orc ...) to BE while processing load jobs using the new scanner.
2. Try to get the parquet file column type from SchemaElement.type before getting it from the logical type and converted type.

4 days ago[Enhancement](load) Refine the load channel flush policy on mem limit (#12716)
zhannngchen [Sat, 24 Sep 2022 02:01:13 +0000 (10:01 +0800)] 
[Enhancement](load) Refine the load channel flush policy on mem limit (#12716)

1. Remove the single load channel mem limit; only use the load channel mgr mem limit
2. Change the default load channel mgr mem limit from 50% to 80%
3. Add a soft mem limit to the load channel mgr. When the soft limit is exceeded, other threads will not hang; only the current thread triggers a flush
4. When the load channel mgr mem limit is exceeded, find the load channel with the largest mem usage, then find its tablet channel with the largest mem usage, and try to flush 1/3 of the mem usage of this tablet channel.
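
The selection in step 4 can be sketched as follows. The nested-dict layout and the function name are hypothetical; the real LoadChannelMgr works on C++ objects, so this only mirrors the described policy.

```python
def pick_flush_target(load_channels):
    """Pick what to flush when the load channel mgr mem limit is exceeded.

    load_channels: {load_id: {tablet_channel_id: mem_bytes}} (toy layout).
    Returns (load_id, tablet_channel_id, bytes_to_flush).
    """
    # Load channel with the largest total memory usage.
    load_id = max(load_channels, key=lambda k: sum(load_channels[k].values()))
    tablets = load_channels[load_id]
    # Within it, the tablet channel with the largest memory usage.
    tablet_id = max(tablets, key=tablets.get)
    # Try to flush one third of that tablet channel's memory.
    return load_id, tablet_id, tablets[tablet_id] // 3
```

For `{"a": {"t1": 300, "t2": 100}, "b": {"t3": 350}}`, channel `a` is the largest overall (400), `t1` is its largest tablet channel, and 100 bytes would be flushed.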

4 days ago[bugfix](scanner) olap scanner compute is wrong (#12857)
yiguolei [Sat, 24 Sep 2022 01:59:59 +0000 (09:59 +0800)] 
[bugfix](scanner) olap scanner compute is wrong (#12857)

Co-authored-by: yiguolei <yiguolei@gmail.com>
4 days ago[Bug](bucket shuffle) fix error bucket shuffle join plan in two same table (#12930)
HappenLee [Sat, 24 Sep 2022 01:59:23 +0000 (09:59 +0800)] 
[Bug](bucket shuffle) fix error bucket shuffle join plan in two same table (#12930)

4 days agofix transfer to tracker (#12932)
Xinyi Zou [Sat, 24 Sep 2022 01:01:05 +0000 (09:01 +0800)] 
fix transfer to tracker (#12932)

~MemTrackerLimiter() repeatedly consumed _untracked_mem, resulting in an inaccurate process mem tracker.

4 days agobuild extension docs failed fix (#12915)
jiafeng.zhang [Fri, 23 Sep 2022 13:58:02 +0000 (21:58 +0800)] 
build extension docs failed fix (#12915)

build extension docs fix

4 days ago[fix](frontend) fix peerDependencies error (#12373)
Jeffrey [Fri, 23 Sep 2022 13:54:52 +0000 (21:54 +0800)] 
[fix](frontend) fix peerDependencies error (#12373)

`npm install` has a problem with peer dependencies in the latest versions of npm (v7+).
Use `npm install --legacy-peer-deps` to fix it.

Reference: https://blog.npmjs.org/post/626173315965468672/npm-v7-series-beta-release-and-semver-major

4 days ago[fix](streamload) set coord for streamLoad (#12744)
Yongqiang YANG [Fri, 23 Sep 2022 12:23:19 +0000 (20:23 +0800)] 
[fix](streamload) set coord for streamLoad (#12744)

When a stream load is canceled, status is reported to coord.

4 days ago[fix](Nereids): add stats in plan. (#12790)
jakevin [Fri, 23 Sep 2022 11:26:49 +0000 (19:26 +0800)] 
[fix](Nereids): add stats in plan. (#12790)

* [improve](Nereids): add stats for bestPlan and correct fix selectivity

4 days ago[feature-wip](parquet-reader) add parquet reader profile (#12797)
Ashin Gau [Fri, 23 Sep 2022 10:42:14 +0000 (18:42 +0800)] 
[feature-wip](parquet-reader) add parquet reader profile (#12797)

Add profile for parquet reader. New counters:
- ParquetFilteredGroups: Filtered row groups by `RowGroup` min-max statistics
- ParquetReadGroups: The number of row groups to read
- ParquetFilteredRowsByGroup: The number of filtered rows by `RowGroup` min-max statistics
- ParquetFilteredRowsByPage: The number of filtered rows by page min-max statistics
- ParquetFilteredBytes: The filtered bytes by `RowGroup` min-max statistics
- ParquetReadBytes: The total bytes in `ParquetReadGroups`, which may be further filtered if a page is skipped as a whole
## Result
```
┌──────────────────────────────────────────────────────┐
│[0: VFILE_SCAN_NODE]                                  │
│(Active: 1s29ms, non-child: 96.42)                    │
│  - Counters:                                         │
│      - BytesRead: 0.00                               │
│      - FileReadCalls: 1.826K (1826)                  │
│      - FileReadTime: 510.627ms                       │
│      - FileRemoteReadBytes: 65.23 MB                 │
│      - FileRemoteReadCalls: 1.146K (1146)            │
│      - FileRemoteReadRate: 128.29331970214844 MB/sec │
│      - FileRemoteReadTime: 508.469ms                 │
│      - NumDiskAccess: 0                              │
│      - NumScanners: 1                                │
│      - ParquetFilteredBytes: 0.00                    │
│      - ParquetFilteredGroups: 0                      │
│      - ParquetFilteredRowsByGroup: 0                 │
│      - ParquetFilteredRowsByPage: 6.600003M (6600003)│
│      - ParquetReadBytes: 2.13 GB                     │
│      - ParquetReadGroups: 20                         │
│      - PeakMemoryUsage: 0.00                         │
│      - PredicateFilteredRows: 3.399797M (3399797)    │
│      - PredicateFilteredTime: 133.302ms              │
│      - RowsRead: 3.399997M (3399997)                 │
│      - RowsReturned: 200                             │
│      - RowsReturnedRate: 194                         │
│      - TotalRawReadTime(*): 726.566ms                │
│      - TotalReadThroughput: 0.0 /sec                 │
│      - WaitScannerTime: 1s27ms                       │
└──────────────────────────────────────────────────────┘
```
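
To show what the group-level counters measure, here is a minimal sketch of min-max filtering over row groups. The dict layout and the function name are made up for the example and do not reflect the reader's actual data structures.

```python
def filter_row_groups(row_groups, lo, hi):
    """Skip row groups whose [min, max] cannot overlap the range [lo, hi].

    row_groups: list of {"min": ..., "max": ..., "rows": ...} (toy layout).
    Returns (kept_groups, filtered_groups, filtered_rows); the last two
    play the role of ParquetFilteredGroups and ParquetFilteredRowsByGroup.
    """
    kept, filtered_groups, filtered_rows = [], 0, 0
    for group in row_groups:
        if group["max"] < lo or group["min"] > hi:
            # No row in this group can satisfy the predicate.
            filtered_groups += 1
            filtered_rows += group["rows"]
        else:
            kept.append(group)
    return kept, filtered_groups, filtered_rows


groups = [
    {"min": 0, "max": 9, "rows": 100},
    {"min": 10, "max": 19, "rows": 50},
    {"min": 20, "max": 29, "rows": 70},
]
kept, skipped_groups, skipped_rows = filter_row_groups(groups, 12, 15)
```

With the predicate range 12 to 15, only the middle group is read; two groups and 170 rows are filtered by statistics alone.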

5 days ago[Opt](Vectorized) Support push down no grouping agg (#12803)
HappenLee [Fri, 23 Sep 2022 10:29:54 +0000 (18:29 +0800)] 
[Opt](Vectorized) Support push down no grouping agg (#12803)

Support push down no grouping agg

5 days ago[fix](streamload&sink) release and allocate memory in the same tracker (#12820)
Yongqiang YANG [Fri, 23 Sep 2022 09:51:44 +0000 (17:51 +0800)] 
[fix](streamload&sink) release and allocate memory in the same tracker (#12820)

1. HttpServer threads allocate byte buffers and put them into the streamload pipe, but the scanner thread releases them with the query tracker.
2. We can assume brpc allocates memory in a doris thread.

The above problems lead to wrong memtracker results.

5 days ago[feature](Nereids) enable bucket shuffle join on fragment without scan node (#12891)
morrySnow [Fri, 23 Sep 2022 07:01:50 +0000 (15:01 +0800)] 
[feature](Nereids) enable bucket shuffle join on fragment without scan node (#12891)

In the past, with the legacy planner, we could only do bucket shuffle join on a join node belonging to a fragment with at least one scan node.
However, bucket shuffle join should be possible on every join node whose left child's data distribution satisfies the join's requirements.
In nereids, we have data distribution info on each node, so we can enable bucket shuffle join on fragments without a scan node.

5 days ago[enhancement](Nereids) remove unnecessary ExchangeNode under AssertNumRowsNode (...
morrySnow [Fri, 23 Sep 2022 06:50:27 +0000 (14:50 +0800)] 
[enhancement](Nereids) remove unnecessary ExchangeNode under AssertNumRowsNode (#12841)

Currently, we always add an exchange under AssertNumRowsNode. However, if its child node is unpartitioned, there is no need to add an exchange at all.

5 days ago[fix](test) fix a test failure problem after merging (#12902)
ElvinWei [Fri, 23 Sep 2022 06:22:29 +0000 (14:22 +0800)] 
[fix](test) fix a test failure problem after merging (#12902)

5 days ago[Improvement](statistics) collect statistics in parallel and add test cases (#12839)
ElvinWei [Fri, 23 Sep 2022 03:59:53 +0000 (11:59 +0800)] 
[Improvement](statistics) collect statistics in parallel and add test cases (#12839)

This PR mainly improves some functions of the statistics module(#6370):

1. when collecting partition statistics, filter empty partitions in advance and do not generate statistical tasks.
2. the old statistical update method may have problems when updating statistics in parallel, which has been solved.
3. optimize internal-query.
4. add test cases related to statistics.
5. modify some comments as prompted by CheckStyle.

5 days ago[test](Nereids) add TPC-H Q2 as regression test case (#12840)
morrySnow [Fri, 23 Sep 2022 03:00:31 +0000 (11:00 +0800)] 
[test](Nereids) add TPC-H Q2 as regression test case (#12840)

5 days ago[feature-wip](MTMV) Support showing and dropping materialized view for multiple table...
Adonis Ling [Fri, 23 Sep 2022 02:36:40 +0000 (10:36 +0800)] 
[feature-wip](MTMV) Support showing and dropping materialized view for multiple tables (#12762)

Use cases:

mysql> CREATE TABLE t1 (pk INT, v1 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1');
Query OK, 0 rows affected (0.05 sec)

mysql> CREATE TABLE t2 (pk INT, v2 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1');
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE MATERIALIZED VIEW mv BUILD IMMEDIATE REFRESH COMPLETE KEY (mv_pk) DISTRIBUTED BY HASH (mv_pk) PROPERTIES ('replication_num' = '1') AS SELECT t1.pk as mv_pk FROM t1, t2 WHERE t1.pk = t2.pk;
Query OK, 0 rows affected (0.02 sec)

mysql> SHOW TABLES;
+---------------+
| Tables_in_dev |
+---------------+
| mv            |
| t1            |
| t2            |
+---------------+
3 rows in set (0.00 sec)

mysql> SHOW CREATE TABLE mv;
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Materialized View | Create Materialized View                                                                                                                                                                                                                                                                                                                                                                                        |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| mv                | CREATE MATERIALIZED VIEW `mv`
BUILD IMMEDIATE REFRESH COMPLETE ON DEMAND
KEY(`mv_pk`)
DISTRIBUTED BY HASH(`mv_pk`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2",
"disable_auto_compaction" = "false"
)
AS SELECT `t1`.`pk` AS `mv_pk` FROM `default_cluster:dev`.`t1` , `default_cluster:dev`.`t2` WHERE `t1`.`pk` = `t2`.`pk`; |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> DROP MATERIALIZED VIEW mv;
Query OK, 0 rows affected (0.01 sec)

5 days ago[typo](docs) Changing the Jump Address of SparkLoad in BrokerLoad (#12731)
yuanyuan8983 [Fri, 23 Sep 2022 01:15:17 +0000 (09:15 +0800)] 
[typo](docs) Changing the Jump Address of SparkLoad in BrokerLoad (#12731)

5 days ago[Refactor](parquet) refactor parquet write to uniform and consistent logic (#12730)
zhangstar333 [Fri, 23 Sep 2022 01:12:34 +0000 (09:12 +0800)] 
[Refactor](parquet) refactor parquet write to uniform and consistent logic (#12730)

5 days ago[regressiontest](test_with)add with_case test (#12814)
Liqf [Fri, 23 Sep 2022 01:10:33 +0000 (09:10 +0800)] 
[regressiontest](test_with)add with_case test (#12814)

5 days ago[Bug](view) Show create view support comment #12838
Stalary [Fri, 23 Sep 2022 01:09:44 +0000 (09:09 +0800)] 
[Bug](view) Show create view support comment #12838

6 days ago[chore](build) add option to disable -frecord-gcc-switches (#12846)
Zhengguo Yang [Thu, 22 Sep 2022 07:38:14 +0000 (15:38 +0800)] 
[chore](build) add option to disable -frecord-gcc-switches (#12846)

6 days ago[feature-wip](statistics) add statistics module related syntax (#12766)
ElvinWei [Thu, 22 Sep 2022 03:15:00 +0000 (11:15 +0800)] 
[feature-wip](statistics) add statistics module related syntax (#12766)

This pull request includes part of the implementation of statistics (#6370); it adds the syntax related to the statistics module. For now, the statistics-collection syntax does not actually collect statistics (collection will be enabled once the tests are stable).

- `ANALYZE` syntax(collect statistics)

```SQL
ANALYZE [[ db_name.tb_name ] [( column_name [, ...] )], ...] [PARTITIONS(...)] [ PROPERTIES(...) ]
```
> db_name.tb_name: collect table and column statistics from tb_name.
> column_name: collect column statistics from column_name.
> properties: properties of statistics jobs.

example:
```SQL
ANALYZE;  -- collect statistics for all tables in the current database
ANALYZE table1(pv, citycode);  -- collect pv and citycode statistics for table1
ANALYZE test.table2 PARTITIONS(partition1); -- collect statistics for partition1 of table2
```

- `SHOW ANALYZE` syntax(show statistics job info)

```SQL
SHOW ANALYZE
    [TABLE | ID]
    [
        WHERE
        [STATE = ["PENDING"|"SCHEDULING"|"RUNNING"|"FINISHED"|"FAILED"|"CANCELLED"]]
    ]
    [ORDER BY ...]
    [LIMIT limit][OFFSET offset];
```

- `SHOW TABLE STATS` syntax(show table statistics)

```SQL
SHOW TABLE STATS [ db_name.tb_name ]
```

- `SHOW COLUMN STATS` syntax(show column statistics)

```SQL
SHOW COLUMN STATS [ db_name.tb_name ]
```

6 days ago[feature-wip](statistics) collect statistics by sql task (#12765)
ElvinWei [Thu, 22 Sep 2022 03:13:35 +0000 (11:13 +0800)] 
[feature-wip](statistics) collect statistics by sql task (#12765)

This pull request includes part of the implementation of statistics (#6370); it implements sql-task, which collects statistics based on internal-query (#9983).

After the ANALYZE statement is parsed, statistical tasks are generated. These tasks include meta-task (get statistics from metadata) and sql-task (get statistics by SQL query). A sql-task obtains statistics such as row_count, the number of null values, and the maximum value by running SQL queries.

Statistical tasks also include a sampling sql-task, which will be implemented in the next PR.
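
To illustrate what a sql-task might run, here is a minimal sketch that builds a single query returning the three statistics named above. The commit message does not show the SQL that is actually issued, so the query shape and the helper name `build_column_stats_sql` are assumptions.

```python
def build_column_stats_sql(table: str, column: str) -> str:
    """Build one query returning row count, null count, and max for a column.

    Hypothetical helper: the real sql-task may issue different statements.
    """
    return (
        "SELECT COUNT(*) AS row_count, "
        f"SUM(CASE WHEN `{column}` IS NULL THEN 1 ELSE 0 END) AS null_count, "
        f"MAX(`{column}`) AS max_value "
        f"FROM {table}"
    )
```

For example, `build_column_stats_sql("test.table2", "pv")` yields a query over `test.table2` that aggregates the `pv` column.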

6 days ago[feature](http) refactor version info and add new http api for get version info ...
xueweizhang [Thu, 22 Sep 2022 02:53:04 +0000 (10:53 +0800)] 
[feature](http) refactor version info and add new http api for get version info (#12513)

Refactor version info and add new http api for get version info

6 days ago(brpc) donot use pooled brpc (#12754)
Yongqiang YANG [Thu, 22 Sep 2022 02:00:26 +0000 (10:00 +0800)] 
(brpc) donot use pooled brpc (#12754)

It seems that pooled brpc does not release ports in a timely manner.

6 days ago[enhancement](like)pass data to like function in block not in row (#12825)
starocean999 [Thu, 22 Sep 2022 01:59:30 +0000 (09:59 +0800)] 
[enhancement](like)pass data to like function in block not in row (#12825)

The LIKE predicate performs better when it processes data block by block rather than row by row. Currently, only NOT NULL columns are optimized; nullable columns will be handled later.

SELECT COUNT(*) FROM hits WHERE URL LIKE '%google%';
before: ~680ms
after: ~570ms
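
The difference between the two calling conventions can be sketched as below. This is a Python illustration only, covering just the `'%google%'` substring case on a non-nullable column; the function names are hypothetical, and the snippet is not meant to reproduce the timings above.

```python
def like_contains_row(value: str, needle: str) -> bool:
    # Row-at-a-time: the caller invokes this once for every row.
    return needle in value


def like_contains_block(column: list, needle: str) -> list:
    # Block-at-a-time: the caller passes the whole column once and gets
    # back one boolean per row, avoiding a function call per row.
    return [needle in value for value in column]
```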

6 days ago[bugfix](predicate column) data maybe wrong if not a single page (#12796)
yiguolei [Thu, 22 Sep 2022 01:55:31 +0000 (09:55 +0800)] 
[bugfix](predicate column) data maybe wrong if not a single page (#12796)

Co-authored-by: yiguolei <yiguolei@gmail.com>
6 days ago[bugfix](fe) Fix test_materialized_view_hll case npt. (#12829)
Lei Zhang [Thu, 22 Sep 2022 01:50:53 +0000 (09:50 +0800)] 
[bugfix](fe) Fix test_materialized_view_hll case npt. (#12829)

When light schema change is enabled, running the test_materialized_view_hll case throws a NullPointerException:
  java.lang.NullPointerException: null
      at org.apache.doris.analysis.SlotDescriptor.setColumn(SlotDescriptor.java:153)
      at org.apache.doris.planner.OlapScanNode.updateSlotUniqueId(OlapScanNode.java:399)

6 days ago[feature-wip](file-scanner)Get column type from parquet schema (#12833)
Jibing-Li [Thu, 22 Sep 2022 01:35:37 +0000 (09:35 +0800)] 
[feature-wip](file-scanner)Get column type from parquet schema (#12833)

Get schema from parquet reader.
The new VFileScanner needs to get the file schema (a map from column name to type) from the parquet file while processing a load job.
This PR sets the type information for parquet columns.

6 days ago[feature-wip](parquet-reader) refactor some arguments for parquet reader (#12771)
slothever [Thu, 22 Sep 2022 01:34:01 +0000 (09:34 +0800)] 
[feature-wip](parquet-reader) refactor some arguments for parquet reader (#12771)

Refactor some arguments for the parquet reader:
1. Add a new parquet context to wrap reader arguments.
2. Reduce the number of arguments passed in function calls.
Co-authored-by: jinzhe <jinzhe@selectdb.com>
6 days ago[fix](Nereids) anti join could not be reorder (#12827)
jakevin [Thu, 22 Sep 2022 01:19:12 +0000 (09:19 +0800)] 
[fix](Nereids) anti join could not be reorder (#12827)

6 days ago[fix](LOAD statement): fix bug for `toSql` func of LoadStmt. (#12648)
wxy [Thu, 22 Sep 2022 01:07:46 +0000 (09:07 +0800)] 
[fix](LOAD statement): fix bug for `toSql` func of LoadStmt. (#12648)

6 days ago[enhancement](Nereids) turn on all reorder rule that needed by zig-zag tree (#12767)
morrySnow [Wed, 21 Sep 2022 18:35:31 +0000 (02:35 +0800)] 
[enhancement](Nereids) turn on all reorder rule that needed by zig-zag tree (#12767)