
crash: when reading big table of 1000000 rows #22

Open
yjhjstz opened this issue Jun 3, 2020 · 10 comments

yjhjstz commented Jun 3, 2020

I wrote some C++ test code:

    /* Read every row group of the parquet file into festate->table. */
    for (int i = 0; i < (int) festate->rowgroups.size(); i++) {
        elog(INFO, " group %d ", i);
        try {
            festate->reader
                ->RowGroup(i)
                ->ReadTable(festate->indices, &festate->table);
        } catch (const std::exception& e) {
            /* Report the row group we were actually reading. */
            elog(ERROR,
                 "parquet_fdw: failed to read row group %d: %s",
                 i, e.what());
        }
    }

It also crashed when i == 32768.
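
Worth noting: 32768 is INT16_MAX + 1, so a crash at exactly that index looks like the signature of a signed 16-bit counter wrapping somewhere in the stack. A minimal sketch of the effect (the row_group variable here is hypothetical, not code from parquet_fdw):

    #include <cstdint>
    #include <cstdio>

    int main() {
        int16_t row_group = 32767;   /* INT16_MAX */
        row_group++;                 /* wraps to -32768 on typical platforms */
        std::printf("row_group = %d\n", row_group);
        return 0;
    }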

zilder (Contributor) commented Jun 3, 2020

Hi @yjhjstz,

Can you provide a backtrace? How many row groups are there in your file? 32K sounds like a pretty unreasonable number of row groups. Is it possible for you to share your parquet file?

yjhjstz (Author) commented Jun 3, 2020

Sorry, I misused StreamWriter::SetMaxRowGroupSize(1000), following the example.
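
Note for anyone else who copies that example: as I understand the Arrow docs, StreamWriter::SetMaxRowGroupSize() takes a size in bytes, not a row count, so SetMaxRowGroupSize(1000) flushes a row group roughly every kilobyte and a million-row table ends up with tens of thousands of row groups. A minimal writer sketch with a saner limit; the one-column schema and file name are placeholders:

    #include <arrow/io/file.h>
    #include <parquet/exception.h>
    #include <parquet/schema.h>
    #include <parquet/stream_writer.h>

    int main() {
        /* Minimal one-column schema, for illustration only. */
        parquet::schema::NodeVector fields;
        fields.push_back(parquet::schema::PrimitiveNode::Make(
            "id", parquet::Repetition::REQUIRED, parquet::Type::INT32,
            parquet::ConvertedType::INT_32));
        auto schema = std::static_pointer_cast<parquet::schema::GroupNode>(
            parquet::schema::GroupNode::Make(
                "schema", parquet::Repetition::REQUIRED, fields));

        std::shared_ptr<arrow::io::FileOutputStream> outfile;
        PARQUET_ASSIGN_OR_THROW(
            outfile, arrow::io::FileOutputStream::Open("test.parquet"));

        parquet::StreamWriter os{
            parquet::ParquetFileWriter::Open(outfile, schema)};

        /* The limit is in bytes, not rows: a large value (e.g. 128 MB)
         * keeps the row group count small even for a million rows. */
        os.SetMaxRowGroupSize(128 * 1024 * 1024);

        for (int32_t i = 0; i < 1000000; ++i) {
            os << i << parquet::EndRow;
        }
        return 0;
    }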

zilder (Contributor) commented Jun 3, 2020

Can you please send the file that caused the crash anyway, or the code that produces it? I'm going to reproduce this issue and either fix the bug in parquet_fdw, if there is one, or add an extra check if there is a limitation in libarrow.

yjhjstz (Author) commented Jun 3, 2020

  1. git clone https://github.com/yjhjstz/parquet_fdw/tree/dev
  2. In psql, run the following:

create extension parquet_fdw;
create server parquet_srv foreign data wrapper parquet_fdw;

CREATE FOREIGN TABLE test (id int, c1 float4[]) SERVER parquet_srv OPTIONS (filename '/Users/jianghuayang/work/fdw/parquet_fdw/data/test.parquet', sorted 'id');

create or replace function gen_float4_arr(int) returns float4[] as $$
  select array_agg((random()*100)::float4) from generate_series(1,$1);
$$ language sql strict;

insert into test select id, gen_float4_arr(64) from generate_series(1,1000000) t(id);
select * from test;
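
A quick way to confirm the row-group explosion is to count the row groups in the generated file. A minimal diagnostic sketch against the parquet-cpp reader API (the path is the one from the repro above):

    #include <cstdio>
    #include <parquet/api/reader.h>

    int main() {
        /* Open the file produced by the repro and count its row groups. */
        std::unique_ptr<parquet::ParquetFileReader> reader =
            parquet::ParquetFileReader::OpenFile(
                "/Users/jianghuayang/work/fdw/parquet_fdw/data/test.parquet");
        std::printf("num row groups: %d\n",
                    reader->metadata()->num_row_groups());
        return 0;
    }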

yjhjstz (Author) commented Jun 4, 2020

By the way, you can reproduce it and also join in optimizing the insert routine.

sdressler commented

Is there any update on this issue? I ran into a SEGFAULT when running ANALYZE on a big table.

zilder (Contributor) commented Dec 2, 2020

Hi @sdressler,
can you send a backtrace?

sdressler commented

@zilder I can, but I eventually figured it out: the schema had mismatching types. I can still get you a backtrace if you want and if it helps make things more stable.

zilder (Contributor) commented Dec 2, 2020

Yes, that would be helpful. Could you also provide the schemas that you used in parquet and in postgres?

sdressler commented

I am going to open a new bug report.
