
[SUPPORT] - Data loss after 3 days following upgrade from Hudi 0.11.1 to 0.14.0 #11959

Open
RuyRoaV opened this issue Sep 18, 2024 · 3 comments
Labels

data-loss (loss of data only; use the data-consistency label for an inconsistent view)
priority:critical (production down; pipelines stalled; need help asap)

Comments


RuyRoaV commented Sep 18, 2024


Describe the problem you faced


We have a COW table that is updated via an UPSERT operation through a Glue job; the operations were initially performed on Hudi 0.11.1. The table is partitioned by year, month, and day.

Some days after upgrading to Hudi 0.14.0, we noticed that we had fewer rows in partitions starting from the upgrade date. Moreover, records for a given partition day were dropped with a delay of 3 days. We observed this behaviour when counting the records by partition using Glue or Athena.

On the other hand, we also have a Redshift Spectrum subscription built from this table, and when doing the row count check there, we could see the "correct" number of rows. However, we could also see duplicated data.

Furthermore, we upgraded 4 tables from Hudi 0.11.1 to Hudi 0.14.0, and this is the only table where we observed such behaviour.

To Reproduce

Steps to reproduce the behavior:

  1. Table in Hudi 0.11.1
  2. Upgrade to Hudi 0.14.0
  3. Wait 3 days to observe the data loss.

These are the write configurations we set:

[Screenshot: write configurations, 2024-06-21 13:12]
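Since the screenshot is not copy-pastable, here is a hypothetical reconstruction of what the write options for this kind of job typically look like, as a reference point for discussion. Every table name, field, and path below is a placeholder, not our actual configuration:

```python
# Hypothetical Hudi write options for an upsert into a COW table
# partitioned by year/month/day (all values are placeholders).
hudi_options = {
    "hoodie.table.name": "my_table",
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "id",           # assumed key
    "hoodie.datasource.write.precombine.field": "updated_at",  # assumed
    "hoodie.datasource.write.partitionpath.field": "year,month,day",
    "hoodie.datasource.write.hive_style_partitioning": "true",
}

# df is the incoming batch DataFrame produced earlier in the Glue job.
df.write.format("hudi").options(**hudi_options).mode("append").save(
    "s3://my-bucket/path/to/table"  # assumed base path
)
```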

Expected behavior

We should see the correct number of rows in Athena / Glue.

Could you please shed some light on why this could have happened?

Environment Description

  • Hudi version : 0.14.0

  • Spark version : 3.3.0 (Glue 4)

  • Hive version :

  • Hadoop version :

  • Storage (HDFS/S3/GCS..) : S3

  • Running on Docker? (yes/no) : No


danny0405 added the data-loss label on Sep 19, 2024

migeruj commented Sep 19, 2024

Hi! I'm another Hudi user like you; I'm not directly affiliated with the Hudi project.

Could you please format your write configurations as copyable JSON? That will make the issue easier to replicate. From what I can see, nothing stands out as a problem so far.

Also, are you using the hudi-aws-bundle for your Glue Job? There was a breaking change introduced in version 0.13.0, which might affect your setup, though I’m not sure if it applies in your case.

Check the breaking changes and behaviour changes in the 0.13.0 and 0.14.0 release notes:

0.14.0 Changes
0.13.0 Changes

Also check the known regressions: 0.14.0 and 0.14.1 have regressions related to duplicates with the ComplexKeyGenerator. Given that, consider using 0.13.0 until it is resolved.
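A quick way to tell whether that regression could apply is to check which key generator the table was written with; it is recorded in the table's `.hoodie/hoodie.properties` file. A minimal sketch, assuming an S3 bucket and table prefix (and that the property is present, which may depend on the Hudi version that created the table):

```python
import boto3

# Read the table's hoodie.properties from S3 (bucket and key are assumptions)
# and print any key-generator setting recorded there.
s3 = boto3.client("s3")
obj = s3.get_object(
    Bucket="my-bucket",
    Key="path/to/table/.hoodie/hoodie.properties",
)
for line in obj["Body"].read().decode("utf-8").splitlines():
    if "keygenerator" in line:
        print(line)  # e.g. hoodie.table.keygenerator.class=...ComplexKeyGenerator
```

If it reports ComplexKeyGenerator, ruling out the 0.14.x duplicates regression first would be worthwhile.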

If you’ve tried everything else, I recommend the following steps:

  1. Compare the checkpoints from before and after the Hudi upgrade to see whether any behaviour changed.
  2. Use the hudi-cli to check the commit history; this can help track down any issues with the data or commits (a Spark-based sketch follows below).
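If running hudi-cli is awkward from the Glue side, a rough Spark equivalent is to group by the `_hoodie_commit_time` metadata column that Hudi stores on every record; that shows which commits the surviving rows came from. A minimal sketch, with the table path assumed:

```python
# `spark` is the active SparkSession (ambient in Glue / spark-shell).
# Count surviving rows per commit; a commit after which older partitions
# shrink is a good place to start digging (base path is an assumption).
df = spark.read.format("hudi").load("s3://my-bucket/path/to/table")
(df.groupBy("_hoodie_commit_time")
   .count()
   .orderBy("_hoodie_commit_time")
   .show(100, truncate=False))
```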

danny0405 (Contributor) commented:

@ad1happy2go Do you have a chance to help reproduce this?

ad1happy2go added the priority:critical label on Sep 20, 2024
rangareddy commented:

Hi @RuyRoaV

I have a few questions before we identify the issue and suggest a solution:

  1. You mentioned that the row count matches the number of records in Redshift Spectrum. Could you please elaborate on how you tested and verified this count?
  2. How did you verify the record count by partition using Glue or Athena? More details would help us understand the issue and find a solution. You could also launch Spark and verify the count there (see the sketch after this list).
  3. You mentioned this issue affects a single table. Is there a difference in how this table was created compared to the others?
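For question 2, here is a minimal sketch of the Spark-side verification (the table path and partition column names are assumptions):

```python
# `spark` is the active SparkSession; adjust the base path to your table.
df = spark.read.format("hudi").load("s3://my-bucket/path/to/table")

# Rows per day partition: compare against the Athena/Glue and Spectrum counts.
df.groupBy("year", "month", "day").count().orderBy("year", "month", "day").show(50)

# Duplicate check: within a partition, _hoodie_record_key should be unique.
(df.groupBy("_hoodie_partition_path", "_hoodie_record_key")
   .count()
   .filter("count > 1")
   .show(20, truncate=False))
```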
