Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Window functions with kRange frames returns incorrect results #10093

Closed
pramodsatya opened this issue Jun 6, 2024 · 2 comments
Closed

Window functions with kRange frames returns incorrect results #10093

pramodsatya opened this issue Jun 6, 2024 · 2 comments
Assignees
Labels
bug Something isn't working triage Newly created issue that needs attention.

Comments

@pramodsatya
Copy link
Collaborator

pramodsatya commented Jun 6, 2024

Bug description

This issue was found while working on adding support for kRange frames in the window fuzzer. The results for below query in velox do not match with Presto. The velox results appear to be incorrect upon checking the output for row 3.

I20240606 10:28:16.273399 12461699 WindowFuzzer.cpp:342] ==============================> Started iteration 0 (seed: 1712238753)
I20240606 10:28:42.073552 12461699 AggregationFuzzerBase.cpp:448] Executing query plan: 
-- Window[partition by [p0, p1] order by [s0 ASC NULLS LAST] w0 := sum(ROW["c0"]) RANGE between UNBOUNDED PRECEDING and k1 PRECEDING] -> c0:DOUBLE, s0:BIGINT, p0:DOUBLE, p1:TINYINT, row_number:BIGINT, k1:BIGINT, w0:DOUBLE
  -- Values[10 rows in 1 vectors] -> c0:DOUBLE, s0:BIGINT, p0:DOUBLE, p1:TINYINT, row_number:BIGINT, k1:BIGINT

Velox output:
	0.687331699533388 | 116046945367747275 | 0.8334291900973767 | 16 | 2 | 2480437560073530297 | null
	0.8206623792648315 | 158974630665072867 | 0.8334291900973767 | 16 | 7 | 2523365245370855889 | null
	0.8206623792648315 | 2244875474122244932 | 0.8334291900973767 | 16 | 0 | 4609266088828027954 | null
	0.2248822799883783 | 3148654803738462351 | 0.8334291900973767 | 16 | 1 | 5513045418444245373 | null
	0.9166490170173347 | 3550897651387022177 | 0.8334291900973767 | 16 | 6 | 5915288266092805199 | null
	0.38860476203262806 | 5105548653791430694 | 0.8334291900973767 | 16 | 3 | 7469939268497213716 | null
	0.38860476203262806 | 5842560344486019605 | 0.8334291900973767 | 16 | 9 | 8206950959191802627 | null
	0.8206623792648315 | 6279807244011096880 | 0.8334291900973767 | 16 | 5 | 8644197858716879902 | null
	0.10044080135412514 | 6461315255634262331 | 0.8334291900973767 | 16 | 4 | 8825705870340045353 | null
	0.9166490170173347 | 7717907586427583831 | 0.8334291900973767 | 16 | 8 | -8364445872576184763 | null

Presto output:
	0.687331699533388 | 116046945367747275 | 0.8334291900973767 | 16 | 2 | 2480437560073530297 | null
	0.8206623792648315 | 158974630665072867 | 0.8334291900973767 | 16 | 7 | 2523365245370855889 | null
	0.8206623792648315 | 2244875474122244932 | 0.8334291900973767 | 16 | 0 | 4609266088828027954 | null
	0.2248822799883783 | 3148654803738462351 | 0.8334291900973767 | 16 | 1 | 5513045418444245373 | 1.5079940787982196     
	0.9166490170173347 | 3550897651387022177 | 0.8334291900973767 | 16 | 6 | 5915288266092805199 | 1.5079940787982196
	0.38860476203262806 | 5105548653791430694 | 0.8334291900973767 | 16 | 3 | 7469939268497213716 | 2.328656458063051
	0.38860476203262806 | 5842560344486019605 | 0.8334291900973767 | 16 | 9 | 8206950959191802627 | 2.5535387380514294
	0.8206623792648315 | 6279807244011096880 | 0.8334291900973767 | 16 | 5 | 8644197858716879902 | 3.470187755068764
	0.10044080135412514 | 6461315255634262331 | 0.8334291900973767 | 16 | 4 | 8825705870340045353 | 3.470187755068764
	0.9166490170173347 | 7717907586427583831 | 0.8334291900973767 | 16 | 8 | -8364445872576184763 | 3.858792517101392

The corresponding Presto query uses the offset k with value 2364390614705783022:

SELECT c0, s0, p0, p1, row_number, k1, sum(c0) OVER (PARTITION BY p0, p1 ORDER BY s0 ASC NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND 2364390614705783022 PRECEDING) FROM tmp

This issue is currently seen with empty frames (RANGE between UNBOUNDED PRECEDING and k1 PRECEDING) and could be because the check does not account for this case.

Repro: The commit can be used to repro the issue.

Related PR: #10006

cc: @aditi-pandit

System information

Velox System Info v0.0.2
Commit: 03d703f
CMake Version: 3.29.4
System: Darwin-23.5.0
Arch: arm64
C++ Compiler: /Library/Developer/CommandLineTools/usr/bin/c++
C++ Compiler Version: 15.0.0.15000309
C Compiler: /Library/Developer/CommandLineTools/usr/bin/cc
C Compiler Version: 15.0.0.15000309
CMake Prefix Path: /Library/Developer/CommandLineTools/SDKs/MacOSX13.3.sdk/usr;/opt/homebrew;/usr/local;/usr;/;/opt/homebrew/Cellar/cmake/3.29.4;/usr/local;/usr/X11R6;/usr/pkg;/opt;/sw;/opt/local

Relevant logs

No response

@pramodsatya pramodsatya added bug Something isn't working triage Newly created issue that needs attention. labels Jun 6, 2024
@aditi-pandit aditi-pandit self-assigned this Jun 6, 2024
@aditi-pandit
Copy link
Collaborator

@pramodsatya : I tried to repro this with the following code

TEST_F(AggregateWindowTest, fuzzerTestFailure) {
  auto c0 = makeFlatVector<int64_t>({1, 1, 1, 1, 1, 1, 1, 1, 1, 1});
  std::vector<int64_t> c1Vector = {
    116046945367747275, 158974630665072867, 2244875474122244932,
    3148654803738462351, 3550897651387022177, 5105548653791430694,
    5842560344486019605, 6279807244011096880, 6461315255634262331,
    7717907586427583831 };
  std::vector<int64_t> c1rangeVector;
  c1rangeVector.reserve(10);
  for (auto c1 : c1Vector) {
    std::cout << c1 - 2364390614705783022 << "\n";
    c1rangeVector.push_back(c1 - 2364390614705783022);
  }

  auto c1 = makeFlatVector<int64_t>(c1Vector);
  auto c2 = makeFlatVector<int64_t>(c1rangeVector);
  auto c3 = makeFlatVector<double>(
      { 0.687331699533388, 0.8206623792648315, 0.8206623792648315,
       0.2248822799883783, 0.9166490170173347, 0.38860476203262806,
       0.38860476203262806, 0.8206623792648315, 0.10044080135412514,
       0.9166490170173347 });
  auto input = makeRowVector({c0, c1, c2, c3});

  auto c4 = makeNullableFlatVector<double>(
      {std::nullopt, std::nullopt, std::nullopt, 1.5079940787982196,
       1.5079940787982196, 2.328656458063051, 2.5535387380514294,
       3.470187755068764, 3.470187755068764, 3.858792517101392});
  auto expected = makeRowVector({c0, c1, c2, c3, c4});
  WindowTestBase::testWindowFunction(
      {input},
      "sum(c3)",
      "partition by c0 order by c1 asc nulls last",
      "range between unbounded preceding and c2 preceding",
      expected);
}

This generates the k-range value column as

-2248343669338035747
-2205415984040710155
-119515140583538090
784264189032679329
1186507036681239155
2741158039085647672
3478169729780236583
3915416629305313858
4096924640928479309
5353516971721800809

and gets the correct results.

So I'm unsure of the values printed in the k1 column. Can you give more information if possible ?

@pramodsatya
Copy link
Collaborator Author

Looks like the k1 column was generated incorrectly and the issue is resolved now, thanks for pointing it out @aditi-pandit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage Newly created issue that needs attention.
Projects
None yet
Development

No branches or pull requests

2 participants