Discussion: load data infile somewhat flaky
DFS
2021-08-17 17:11:14 UTC
Finally got a big file (3 columns x 612K rows) of messy data loaded.

load data infile 'file.csv'
into table
fields terminated by ','
enclosed by '"';

I noticed it tells you an error occurred at a certain line, but the
offending data (a text field ending in \ or "\") 5 lines later was the
real issue.

On one run 5 lines didn't post, and it turns out they were the next 5
lines after a field ending in "\". Those 5 lines got concatenated with
the offending line, so I had one large clump of data in one row.

Moral: watch for fields ending in \ or "\".
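
One workaround - just a sketch, with a made-up table name 'mytable' since the post doesn't give one - is to tell load data not to treat backslash as an escape character at all, so a trailing \ can no longer swallow the closing quote:

load data infile 'file.csv'
into table mytable
fields terminated by ','
enclosed by '"'
escaped by ''    -- empty escape character: backslashes are taken literally
lines terminated by '\n';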
Johann Klammer
2021-08-18 08:10:26 UTC
Post by DFS
Finally got a big file (3 columns x 612K rows) of messy data loaded.
load data infile 'file.csv'
into table
fields terminated by ','
enclosed by '"';
I noticed it tells you an error occurred at a certain line, but the offending data (a text field ending in \ or "\") 5 lines later was the real issue.
On one run 5 lines didn't post, and it turns out they were the next 5 lines after a field ending in "\". Those 5 lines got concatenated with the offending line, so I had one large clump of data in one row.
Moral: watch for fields ending in \ or "\".
I seem to recall it also has problems with whitespace around the commata.
DFS
2021-08-18 13:43:20 UTC
Post by Johann Klammer
Post by DFS
Finally got a big file (3 columns x 612K rows) of messy data loaded.
load data infile 'file.csv'
into table
fields terminated by ','
enclosed by '"';
I noticed it tells you an error occurred at a certain line, but the offending data (a text field ending in \ or "\") 5 lines later was the real issue.
On one run 5 lines didn't post, and it turns out they were the next 5 lines after a field ending in "\". Those 5 lines got concatenated with the offending line, so I had one large clump of data in one row.
Moral: watch for fields ending in \ or "\".
I seem to recall it also has problems with whitespace around the commata.
Yes, I saw that too.

It was a fiasco. Wasted a fair amt of my time. A couple times it would
spend 10 minutes reading a file (Stage 1 of 1...) then tell me it bombed
on row 1 (which was valid data).
Near the end of my db cloning process (15 tables, 30M rows total, SQLite
to MariaDB) I wrote a little python program to copy the data.

The python code did - no joking - nearly 2,700 inserts/second into
MariaDB. I copied 1.5M rows (~4GB) in under 10 minutes.

'load data infile' isn't worthless - it worked decently fast on simple
data - but there are better ways of getting data into MariaDB.
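
The python script itself isn't shown here; the usual way a script reaches insert rates like that is to batch many rows into one statement inside a transaction. A rough sketch of the generated SQL, with made-up table and column names:

start transaction;
insert into mytable (id, col1, col2) values
  (1, 'alpha', 'first row'),
  (2, 'beta',  'second row'),
  (3, 'gamma', 'third row');  -- a few hundred rows per statement keeps round trips low
commit;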
Dr Eberhard Lisse
2021-08-19 10:33:47 UTC
I find that using tools like CSVQ helps with checking CSV data
integrity.

Turning off indexes and such during the load and generating them
afterwards speeds loading of data up significantly.
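
For instance - only a sketch, with an assumed table name 'mytable'; alter table ... disable keys mainly helps MyISAM/Aria tables, while the session variables are what matter for InnoDB:

set session unique_checks = 0;
set session foreign_key_checks = 0;
alter table mytable disable keys;   -- defer non-unique index maintenance

load data infile 'file.csv' into table mytable
fields terminated by ',' enclosed by '"';

alter table mytable enable keys;    -- rebuild the deferred indexes in one pass
set session foreign_key_checks = 1;
set session unique_checks = 1;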

However,

https://www.google.com/search?q=sqlite+to+mariadb

has 1.5 Million results :-)-O

I have used

https://pypi.org/project/sqlite3-to-mysql/

in the past successfully


el
Post by DFS
Post by Johann Klammer
Post by DFS
Finally got a big file (3 columns x 612K rows) of messy data loaded.
load data infile 'file.csv'
into table
fields terminated by ','
enclosed by '"';
I noticed it tells you an error occurred at a certain line, but the
offending data (a text field ending in \ or "\") 5 lines later was
the real issue.
On one run 5 lines didn't post, and it turns out they were the next
5 lines after a field ending in "\". Those 5 lines got concatenated
with the offending line, so I had one large clump of data in one
row.
Moral: watch for fields ending in \ or "\".
I seem to recall it also has problems with whitespace around the commata.
Yes, I saw that too.
It was a fiasco. Wasted a fair amt of my time. A couple times it
would spend 10 minutes reading a file (Stage 1 of 1...) then tell me
it bombed on row 1 (which was valid data). Near the end of my db
cloning process (15 tables, 30M rows total, SQLite to MariaDB) I wrote
a little python program to copy the data.
The python code did - no joking - nearly 2700+ inserts/second into
MariaDB. I copied 1.5M rows (~4GB) in under 10 minutes.
'load data infile' isn't worthless - it worked decently fast on simple
data - but there are better ways of getting data into MariaDB.
--
To email me replace 'nospam' with 'el'
DFS
2021-08-19 12:54:12 UTC
Post by Dr Eberhard Lisse
I find that using tools like CSVQ helps with checking CSV data
integrity.
The data came out of a SQLite db and was fairly clean, but encoding
issues meant the SQLite data had 'dupes':

this field has a question mark ? in it
this field has a question mark � in it
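
A quick way to hunt those rows down - a sketch, with assumed table/column names 'mytable'/'txt' - is to search for the UTF-8 bytes of U+FFFD (the replacement character):

-- EF BF BD is the UTF-8 encoding of U+FFFD
select txt, hex(txt)
from mytable
where hex(txt) like '%EFBFBD%';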
Post by Dr Eberhard Lisse
Turning off indexes and such during the load and generating them
afterwards speeds loading of data up significantly.
However,
     https://www.google.com/search?q=sqlite+to+mariadb
has 1.5 Million results :-)-O
www.google.com/search?q=mariadb+to+sqlite

has 1.4M results

SQLite has been very robust for this db of 30M rows. Easy to back up and
administer, never crashed, fast querying... but it's very slow to alter
big tables, and I wanted to see what all the fuss about MySQL/MariaDB
was about.

It (MariaDB) is a nice dbms, I must say. Especially for no cost/open
source.

I also looked at Oracle Express Edition, but it was limited to 10GB of
data, and my database with indexes is at least 15GB.
Post by Dr Eberhard Lisse
I have used
     https://pypi.org/project/sqlite3-to-mysql/
in the past successfully
Thanks.

I'll look at it. I wrote my own source to dest python code, and the
INSERT rate into MariaDB was fast, as high as 8600 per second on some
tables.

Overall: 30.6M rows copied in 5770 seconds = 5304 per second (on my
11-year-old i5-750 computer). That's still 96 minutes, so I wonder if
there's a better way.
Post by Dr Eberhard Lisse
el
Post by DFS
Post by Johann Klammer
Post by DFS
Finally got a big file (3 columns x 612K rows) of messy data loaded.
load data infile 'file.csv'
into table
fields terminated by ','
enclosed by '"';
I noticed it tells you an error occurred at a certain line, but the
offending data (a text field ending in \ or "\") 5 lines later was
the real issue.
On one run 5 lines didn't post, and it turns out they were the next
5 lines after a field ending in "\".  Those 5 lines got concatenated
with the offending line, so I had one large clump of data in one
row.
Moral: watch for fields ending in \ or "\".
I seem to recall it also has problems with whitespace around the commata.
Yes, I saw that too.
It was a fiasco.  Wasted a fair amt of my time.  A couple times it
would spend 10 minutes reading a file (Stage 1 of 1...)  then tell me
it bombed on row 1 (which was valid data). Near the end of my db
cloning process (15 tables, 30M rows total, SQLite to MariaDB) I wrote
a little python program to copy the data.
The python code did - no joking - nearly 2700+ inserts/second into
MariaDB. I copied 1.5M rows (~4GB) in under 10 minutes.
'load data infile' isn't worthless - it worked decently fast on simple
data - but there are better ways of getting data into MariaDB.
J.O. Aho
2021-08-19 13:23:21 UTC
Post by DFS
Post by Dr Eberhard Lisse
I find that using tools like CSVQ helps with checking CSV data
integrity.
The data came out of a SQLite db and was fairly clean, but encoding
issues meant the SQLite data had 'dupes':
this field has a question mark ? in it
this field has a question mark � in it
As long as it's the same charset as was used in the SQLite db, it shouldn't be
too much work to fix; it's a completely different thing if it's a mix.
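
If it really is one consistent charset, the cleanup is usually a straight conversion; if the bytes are right but the column is declared wrong, a BLOB round trip re-tags them without touching the data. A sketch with made-up table/column names:

-- declared charset is correct, you just want a different one:
alter table mytable convert to character set utf8mb4;

-- bytes are really UTF-8 but the column says otherwise: re-tag, don't convert
alter table mytable modify txt blob;
alter table mytable modify txt text character set utf8mb4;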
Post by DFS
Post by Dr Eberhard Lisse
Turning off indexes and such during the load and generating them
afterwards speeds loading of data up significantly.
However,
      https://www.google.com/search?q=sqlite+to+mariadb
has 1.5 Million results :-)-O
www.google.com/search?q=mariadb+to+sqlite
has 1.4M results
SQLite has been very robust for this db of 30M rows.  Easy to backup and
administer, never crashed, fast querying... but it's very slow to alter
big tables, and I wanted to see what all the fuss about MySQL/MariaDB
was about.
It (MariaDB) is a nice dbms, I must say.  Especially for no cost/open
source.
PostgreSQL is worth mentioning (I know I shouldn't say it here); it's also open
source and has some nice features missing in MariaDB/MySQL.
--
//Aho
Axel Schwenke
2021-08-19 14:49:00 UTC
Post by DFS
The data came out of a SQLite db and was fairly clean, but encoding issues
That makes no sense. If the data source has a different encoding (charset) than
the destination, it is the DBA's fault. He has to make sure that the destination
encoding allows all characters from the source.
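
On the import side, load data can at least be told what encoding the file is in, so the server converts into the column charset instead of guessing - a sketch with an assumed table name:

load data infile 'file.csv'
into table mytable
character set utf8mb4    -- encoding of the file itself
fields terminated by ','
enclosed by '"';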
Post by DFS
On one run 5 lines didn't post, and it turns out they were the next
5 lines after a field ending in "\".  Those 5 lines got concatenated
with the offending line, so I had one large clump of data in one
row.
Invalid data in exported files is also the DBA's fault. He must use proper
quoting if the data contains special characters like '\'.
The Natural Philosopher
2021-08-19 16:00:51 UTC
Post by DFS
and my database with indexes is at least 15GB.
...indices.... ^^^^^^^
--
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx
DFS
2021-08-19 16:31:53 UTC
Post by DFS
and my database with indexes is at least 15GB.
...indices....         ^^^^^^^
or indexes.
Jerry Stuckle
2021-08-20 17:58:51 UTC
Post by DFS
and my database with indexes is at least 15GB.
...indices....         ^^^^^^^
Both are acceptable. https://www.dictionary.com/browse/index
--
==================
Remove the "x" from my email address
Jerry Stuckle
***@attglobal.net
==================