Fix some bugs with Groupby and CacheTable #14

Open
wants to merge 4 commits into base: master

Conversation


@yanghua yanghua commented Apr 25, 2023

Why are the changes needed?

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run tests locally before making a pull request

yanghua and others added 4 commits April 14, 2023 20:13
### _Why are the changes needed?_

Support GetPrimaryKeys for Trino Fe, close apache#4320

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before making a pull request

Closes apache#4321 from Yikf/primaryKeys.

Closes apache#4320

3690a2c [Yikf] Support GetPrimaryKeys for Trino Fe

Authored-by: Yikf <[email protected]>
Signed-off-by: ulyssesyou <[email protected]>
…rce rowset

### _Why are the changes needed?_

With the RESTful API, for the query `select cast(null as int) c1`:

#### Before
the result is `0`.

#### After
the result is `null`.

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before making a pull request

Closes apache#4372 from turboFei/null_value.

Closes apache#4372

b15f1eb [fwang12] nit
5f16855 [fwang12] fix
45c60dd [fwang12] check is set

Authored-by: fwang12 <[email protected]>
Signed-off-by: fwang12 <[email protected]>
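
The gist of the fix, as a minimal sketch (assuming the Hive Thrift value classes Kyuubi already uses; the helper name is hypothetical): only read a primitive column value when Thrift reports it as set, and surface SQL NULL otherwise instead of the primitive default.

```scala
import org.apache.hive.service.rpc.thrift.TColumnValue

// Hypothetical helper: convert an INT cell of a Thrift row set for the REST API.
def intCellOrNull(cell: TColumnValue): Any = {
  if (cell.isSetI32Val && cell.getI32Val.isSetValue) {
    cell.getI32Val.getValue // a genuine integer, possibly 0
  } else {
    null // value not set => SQL NULL, no longer rendered as 0
  }
}
```
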
…` and `CacheTable`

close apache#4332
### _Why are the changes needed?_

For the case where the table name has been resolved and an `Expand` logical plan exists
```
InsertIntoHiveTable `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, false, false, [a, b]
+- Aggregate [a#0], [a#0, ansi_cast((count(if ((gid#9 = 1)) spark_catalog.default.t2.`b`#10 else null) * count(if ((gid#9 = 2)) spark_catalog.default.t2.`c`#11 else null)) as string) AS b#8]
   +- Aggregate [a#0, spark_catalog.default.t2.`b`#10, spark_catalog.default.t2.`c`#11, gid#9], [a#0, spark_catalog.default.t2.`b`#10, spark_catalog.default.t2.`c`#11, gid#9]
      +- Expand [ArrayBuffer(a#0, b#1, null, 1), ArrayBuffer(a#0, null, c#2, 2)], [a#0, spark_catalog.default.t2.`b`#10, spark_catalog.default.t2.`c`#11, gid#9]
         +- HiveTableRelation [`default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#0, b#1, c#2], Partition Cols: []]
```
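
To make the `Expand` case concrete, here is a rough illustrative sketch of how lineage for an `Expand` node can be derived from Catalyst's API (the helper name is made up; this is not the actual patch):

```scala
import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet}
import org.apache.spark.sql.catalyst.plans.logical.Expand

// The i-th output attribute of Expand is fed by the i-th expression of every
// projection row, so its lineage is the union of those expressions' references.
def expandLineage(p: Expand): Map[Attribute, AttributeSet] =
  p.output.zipWithIndex.map { case (attr, i) =>
    attr -> AttributeSet(p.projections.flatMap(row => row(i).references))
  }.toMap
```

In the plan above, the output column spark_catalog.default.t2.`b`#10 traces back to b#1, while the grouping id gid#9 (fed only by literals) maps to no table column.
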
For the case of `CacheTable` with a `window` function
```
InsertIntoHiveTable `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, true, false, [a, b]
+- Project [a#98, b#99]
   +- InMemoryRelation [a#98, b#99, rank#100], StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *(2) Filter (isnotnull(rank#4) AND (rank#4 = 1))
            +- Window [row_number() windowspecdefinition(a#9, b#10 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS rank#4], [a#9], [b#10 ASC NULLS FIRST]
               +- *(1) Sort [a#9 ASC NULLS FIRST, b#10 ASC NULLS FIRST], false, 0
                  +- Exchange hashpartitioning(a#9, 200), ENSURE_REQUIREMENTS, [id=apache#38]
                     +- Scan hive default.t2 [a#9, b#10], HiveTableRelation [`default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#9, b#10], Partition Cols: []]

```

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before making a pull request

Closes apache#4393 from iodone/kyuubi-4332.

Closes apache#4393

d2afdab [odone] fix cache table bug
443af79 [odone] fix some bugs with groupby

Authored-by: odone <[email protected]>
Signed-off-by: ulyssesyou <[email protected]>
```scala
            windowColumnsLineage.getOrElse(attr, AttributeSet(attr))))
        }
      }
      p.children.map(extractColumnsLineage(_, nextColumnsLineage)).reduce(mergeColumnsLineage)

    case p: Join =>
      p.joinType match {
        case LeftSemi | LeftAnti =>
```


This patch covers a series of changes to the Spark SQL column-lineage code. The main changes are:

  1. Imports the new ScalarSubquery module
  2. Wraps the result of Attribute.references into an AttributeSet
  3. Modifies the isNameWithQualifier function, which resolves attribute names that carry a qualifier
  4. Adds support for the Expand and Window operators

Beyond that, the code is reasonably readable, though there is some dead code that should be cleaned up. Earlier revisions may contain bugs that were already fixed or areas left to optimize, but that cannot be assessed in the context of the current code.
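
As a rough illustration of item 4 (a sketch under assumed shapes, not the patch itself): the output columns of a `Window` node can be traced back to the attributes referenced by each window expression, which is roughly what a `windowColumnsLineage` map captures.

```scala
import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeSet}
import org.apache.spark.sql.catalyst.plans.logical.Window

// Each aliased window expression becomes an output attribute whose lineage is
// the set of attributes referenced by its expression; other output columns
// simply pass through from the child.
def windowLineage(p: Window): Map[Attribute, AttributeSet] =
  p.windowExpressions.collect {
    case a: Alias => a.toAttribute -> a.references
  }.toMap
```

For the cached-window plan above, the rank column produced by row_number() would trace back to a#9 and b#10, the attributes of its window specification.
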

```
@@ -477,4 +486,4 @@ public static String getCanonicalHostName(String hostName) {
  public static boolean isKyuubiOperationHint(String hint) {
    return KYUUBI_OPERATION_HINT_PATTERN.matcher(hint).matches();
  }
}
}
\ No newline at end of file
```


This code mainly does the following:

  1. Introduces the org.apache.commons.lang3.StringUtils class.
  2. Changes how confStr is parsed, handling it with Pattern and Matcher.
  3. Checks that each conf entry is non-empty and contains an "=" sign, splits the parameter name and value on "=", and puts the pair into connParams.getHiveConfs().

Possible risks/bugs:

  1. Exceptions that may occur while parsing the URL are not caught; error handling should be added to report the failure and limit its impact on the rest of the code.

Possible improvements:

  1. Consider adding unit tests to verify correctness and to further exercise the exception-handling paths.

  2. Flesh out the method comments to make the code easier to read and understand.
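
For illustration only, a simplified sketch of the parsing behavior described in point 3 (the ';' delimiter and the helper name are assumptions, and it uses a plain split instead of the Pattern/Matcher approach in the actual change):

```scala
// Split a conf string such as "k1=v1;k2=v2" into key/value pairs,
// keeping only non-empty entries that actually contain an '='.
def parseConfs(confStr: String): Map[String, String] =
  confStr.split(";").iterator
    .filter(entry => entry.nonEmpty && entry.contains("="))
    .map { entry =>
      val Array(key, value) = entry.split("=", 2)
      key -> value
    }
    .toMap
```

For example, parseConfs("hive.server2.proxy.user=bob;spark.sql.shuffle.partitions=10") yields a two-entry map that could then be copied into connParams.getHiveConfs().
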

```
CAST: 'CAST';
AS: 'AS';
KEY_SEQ: 'KEY_SEQ';
PK_NAME: 'PK_NAME';

fragment SEARCH_STRING_ESCAPE: '\'' '\\' '\'';
```


This snippet only defines a few tokens, so there is no obvious risk of error. A couple of suggestions for improvement:

  1. Consider keeping these token definitions in a dedicated file, or together with the related implementation, which makes the code easier to organize and maintain.

  2. If some of these tokens duplicate constants defined elsewhere, make sure they carry the same meaning, to avoid confusion and unnecessary errors.

```
    CAST LEFT_PAREN NULL AS VARCHAR RIGHT_PAREN COLUMN_NAME COMMA
    CAST LEFT_PAREN NULL AS SMALLINT RIGHT_PAREN KEY_SEQ COMMA
    CAST LEFT_PAREN NULL AS VARCHAR RIGHT_PAREN PK_NAME
    WHERE FALSE #getPrimaryKeys
  | .*? #passThrough
  ;
```


This patch corresponds to a multi-line SQL statement in the grammar. A brief line-by-line summary:

The first line: statement appears to be the name of the previously defined grammar rule.

Lines two through five: these define a SQL statement template used to fetch all column names and related metadata for a table.

Lines six through twelve: another SQL statement template, used to fetch primary-key metadata; the query uses the constant NULL as a placeholder.

Line thirteen: .*? is a regular expression that matches any character zero or more times; this alternative is just a wildcard placeholder used to pass through any input that does not need special handling.

Overall, there are no significant defects or risks in this logic, but with more context we could give more accurate and valuable suggestions for improvement.
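
For context, a hypothetical sketch of what the #getPrimaryKeys alternative enables (the names below are assumptions, not Kyuubi's actual classes): because the matched Trino metadata query selects NULL-cast columns with WHERE FALSE, the frontend can answer it locally with an empty result that carries the standard JDBC getPrimaryKeys schema.

```scala
// Hypothetical illustration: the standard JDBC getPrimaryKeys result schema,
// answered with zero rows since the matched query ends in WHERE FALSE.
case class ColumnDesc(name: String, sqlType: String)

val primaryKeySchema: Seq[ColumnDesc] = Seq(
  ColumnDesc("TABLE_CAT", "VARCHAR"),
  ColumnDesc("TABLE_SCHEM", "VARCHAR"),
  ColumnDesc("TABLE_NAME", "VARCHAR"),
  ColumnDesc("COLUMN_NAME", "VARCHAR"),
  ColumnDesc("KEY_SEQ", "SMALLINT"),
  ColumnDesc("PK_NAME", "VARCHAR")
)

val primaryKeyRows: Seq[Seq[Any]] = Seq.empty // WHERE FALSE => no rows
```
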
