Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PostgreSQL inserts are slow #45

Open
wonder-sk opened this issue Apr 26, 2021 · 2 comments
Open

PostgreSQL inserts are slow #45

wonder-sk opened this issue Apr 26, 2021 · 2 comments
Labels
bug Something isn't working

Comments

@wonder-sk
Copy link
Contributor

With a remote PostgreSQL when doing --init-from-db, the initial copy from the main schema to base schema takes very long time.

@wonder-sk wonder-sk added the bug Something isn't working label Apr 26, 2021
@wonder-sk
Copy link
Contributor Author

A quick and dirty fix to batch updates... Speeds up init that took over an hour to ~2 minutes. Needs cleaning up, probably updates and deletes should be batched as well.

diff --git a/geodiff/src/drivers/postgresdriver.cpp b/geodiff/src/drivers/postgresdriver.cpp
index f07760b..a3a6753 100644
--- a/geodiff/src/drivers/postgresdriver.cpp
+++ b/geodiff/src/drivers/postgresdriver.cpp
@@ -620,6 +620,7 @@ static void handleUpdated( const std::string &schemaNameBase, const std::string
 {
   std::string sqlModified = sqlFindModified( schemaNameBase, schemaNameModified, tableName, tbl );
   PostgresResult res( execSql( conn, sqlModified ) );
+  printf("%s\n", sqlModified.data());
 
   int rows = res.rowCount();
   for ( int r = 0; r < rows; ++r )
@@ -804,6 +805,7 @@ void PostgresDriver::applyChangeset( ChangesetReader &reader )
 
   int conflictCount = 0;
   ChangesetEntry entry;
+  std::string insertSqls;
   while ( reader.nextEntry( entry ) )
   {
     std::string tableName = entry.table->name;
@@ -839,7 +841,11 @@ void PostgresDriver::applyChangeset( ChangesetReader &reader )
     if ( entry.op == ChangesetEntry::OpInsert )
     {
       std::string sql = sqlForInsert( mBaseSchema, tableName, tbl, entry.newValues );
+      insertSqls += sql + ";";
+
+      /*
       PostgresResult res( execSql( mConn, sql ) );
+
       if ( res.status() != PGRES_COMMAND_OK )
       {
         logApplyConflict( "insert_failed", entry );
@@ -858,7 +864,7 @@ void PostgresDriver::applyChangeset( ChangesetReader &reader )
           autoIncrementTablesToFix[tableName] = pkey;
         else
           autoIncrementTablesToFix[tableName] = std::max( autoIncrementTablesToFix[tableName], pkey );
-      }
+      }*/
     }
     else if ( entry.op == ChangesetEntry::OpUpdate )
     {
@@ -895,6 +901,19 @@ void PostgresDriver::applyChangeset( ChangesetReader &reader )
     }
     else
       throw GeoDiffException( "Unexpected operation" );
+
+    if ( insertSqls.size() >= 1000000 )
+    {
+      printf("partial exec\n");
+      PostgresResult res( execSql( mConn, insertSqls ) );
+      insertSqls.clear();
+    }
+  }
+
+  if ( !insertSqls.empty() )
+  {
+    printf("final exec\n");
+    PostgresResult res( execSql( mConn, insertSqls ) );
   }
 
   // at the end, update any SEQUENCE objects if needed

@saberraz
Copy link
Contributor

saberraz commented Aug 10, 2021

For me, with less than 200k records, it is taking 5-6 hours to initialise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants