Discussion:
Up to 50K inserts per second... smokin'!
(too old to reply)
DFS
2021-08-25 18:07:11 UTC
Permalink
Copying 30M rows (across 15 tables) from SQLite into a MariaDB store

Original python DB-API code did single inserts, got 5.2K per second overall.

Found some bulk insert ideas on stack exchange, finessed an
implementation and got over 50K inserts per second on some data!

"TABLEONE: 1.50 secs to post 75545 rows (50353 per sec)"

With the new code the avg for all 30M rows across all tables was around
16K per sec.

Worked on the code for some hours to save 30-45 minutes in data loading
time... but 50K inserts/sec is pretty darn good for my 11-year-old
i5-750 8GB RAM system.

Raise a glass!
Jerry Stuckle
2021-08-25 23:26:52 UTC
Permalink
Post by DFS
Copying 30M rows (across 15 tables) from SQLite into a MariaDB store
Original python DB-API code did single inserts, got 5.2K per second overall.
Found some bulk insert ideas on stack exchange, finessed an
implementation and got over 50K inserts per second on some data!
"TABLEONE: 1.50 secs to post 75545 rows (50353 per sec)"
With the new code the avg for all 30M rows across all tables was around
16K per sec.
Worked on the code for some hours to save 30-45 minutes in data loading
time... but 50K inserts/sec is pretty darn good for my 11-year-old
i5-750 8GB RAM system.
Raise a glass!
Very understandable. Each INSERT must be parsed and executed by the
RDBMS. Parsing a text SQL statement and determining how to execute it is
very time consuming, much more than actually doing the SQL statement.

That's why enterprise-level databases like IBM's DB2 have a means to
preprocess static SQL statements and build access plans to do the work.
The resultant code doesn't actually submit the SQL statement but
instead calls a function which executes the pre-built access plan.

You're inserting multiple rows with one SQL statement does the same
thing Your SQL statement only needs to be parse once and the means of
INSERTing the data only determined once. From there on it's a simple
manner of repeating the same code.

And BTW - with so many rows you'd probably get even better response by
DROPping any indices before the INSERT and recreating them after the INSERT.
--
==================
Remove the "x" from my email address
Jerry Stuckle
***@attglobal.net
==================
Axel Schwenke
2021-08-26 07:39:27 UTC
Permalink
Very understandable.  Each INSERT must be parsed and executed by the RDBMS.
Parsing a text SQL statement and determining how to execute it is very time
consuming, much more than actually doing the SQL statement.
That's why enterprise-level databases like IBM's DB2 have a means to
preprocess static SQL statements and build access plans to do the work.  The
resultant code doesn't actually submit the SQL statement but instead calls a
function which executes the pre-built access plan.
That's a typical Stuckle post. MySQL introduced prepared statements in
version 4.1. That was somewhere around 2005. Whopping 16 years in the past.
Yet still he tries to sell it as a feature specific to enterprise RDBMS...
The Natural Philosopher
2021-08-26 08:44:59 UTC
Permalink
Post by Axel Schwenke
Very understandable.  Each INSERT must be parsed and executed by the RDBMS.
Parsing a text SQL statement and determining how to execute it is very time
consuming, much more than actually doing the SQL statement.
That's why enterprise-level databases like IBM's DB2 have a means to
preprocess static SQL statements and build access plans to do the work.  The
resultant code doesn't actually submit the SQL statement but instead calls a
function which executes the pre-built access plan.
That's a typical Stuckle post. MySQL introduced prepared statements in
version 4.1. That was somewhere around 2005. Whopping 16 years in the past.
Yet still he tries to sell it as a feature specific to enterprise RDBMS...
I didn't realize stucklehead was still alive.
--
“A leader is best When people barely know he exists. Of a good leader,
who talks little,When his work is done, his aim fulfilled,They will say,
“We did this ourselves.”

― Lao Tzu, Tao Te Ching
Jerry Stuckle
2021-08-26 14:53:24 UTC
Permalink
Post by Axel Schwenke
Very understandable.  Each INSERT must be parsed and executed by the RDBMS.
Parsing a text SQL statement and determining how to execute it is very time
consuming, much more than actually doing the SQL statement.
That's why enterprise-level databases like IBM's DB2 have a means to
preprocess static SQL statements and build access plans to do the work.  The
resultant code doesn't actually submit the SQL statement but instead calls a
function which executes the pre-built access plan.
That's a typical Stuckle post. MySQL introduced prepared statements in
version 4.1. That was somewhere around 2005. Whopping 16 years in the past.
Yet still he tries to sell it as a feature specific to enterprise RDBMS...
MySQL's version of prepared statements is not at all like static SQL in
enterprise level databases. But you don't know the difference.

A short course. In DB2 the program is run through a preprocessor once.
The preprocessor looks for EXEC DB2 statements. When it finds one, it
analyzes te statement, builds an access plan and stores it in the
database. It then comments out the EXEC DB2 statement and builds native
language calls directly to the stored access plan to pass values and
execute the plan. The SQL code is commented out and doesn't even appear
in the compiled program.

This is NOT the same as prepared statements, which still must be
prepared every time the program hits the PREPARE statement. It is much
faster.

But you have no idea what a high performance databsae is. Compared to
DB2, MySQL is a toy
--
==================
Remove the "x" from my email address
Jerry Stuckle
***@attglobal.net
==================
DFS
2021-09-07 23:20:38 UTC
Permalink
Post by Jerry Stuckle
Post by DFS
Copying 30M rows (across 15 tables) from SQLite into a MariaDB store
And BTW - with so many rows you'd probably get even better response by
DROPping any indices before the INSERT and recreating them after the INSERT.
Created an Oracle Express 18c database and created 'skeleton' tables
with no PKs or FKs or indexes of any kind. Just table, column name,
datatype and NOT|NULL spec.

TBL 1: 14.88 secs to post 2799305 rows (188096 per sec)
TBL 2: 21.93 secs to post 3568564 rows (162707 per sec)
TBL 3: 110.49 secs to post 14898302 rows (134840 per sec)

Loading...