Post by Dr Eberhard Lisse
I find that using tools like CSVQ helps with checking CSV data
integrity.
The data came out of a SQLite db and was fairly clean, but encoding
issues meant the SQLite data had 'dupes':
this field has a question mark ? in it
this field has a question mark � in it
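A quick way to spot those while the data is still in SQLite is to search
for the U+FFFD replacement character directly. Untested sketch - the
table and column names (messages, msgtext) are made up:

select rowid, msgtext
from messages
where msgtext like '%' || char(65533) || '%';   -- char(65533) is the � character

-- pair each mojibake row with its '?' twin
select a.rowid, b.rowid
from messages a
join messages b
  on replace(a.msgtext, char(65533), '?') = b.msgtext
 and a.rowid <> b.rowid;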
Post by Dr Eberhard Lisse
Turning off indexes and such during the load and generating them
afterwards speeds up the data load significantly.
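For reference, my understanding is that this looks roughly like the
following in MariaDB. Untested sketch - 'big_table' is a made-up name:

set session unique_checks = 0;
set session foreign_key_checks = 0;
alter table big_table disable keys;   -- defers non-unique indexes, MyISAM only;
                                      -- for InnoDB, drop secondary indexes and re-add them after
-- ... bulk load here ...
alter table big_table enable keys;
set session unique_checks = 1;
set session foreign_key_checks = 1;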
However,
https://www.google.com/search?q=sqlite+to+mariadb
has 1.5 million results :-)-O
www.google.com/search?q=mariadb+to+sqlite
has 1.4 million results
SQLite has been very robust for this db of 30M rows. Easy to back up
and administer, never crashed, fast querying... but it's very slow to
alter
big tables, and I wanted to see what all the fuss about MySQL/MariaDB
was about.
It (MariaDB) is a nice dbms, I must say, especially for a no-cost,
open-source product.
I also looked at Oracle Express Edition, but it was limited to 10GB of
data, and my database with indexes is at least 15GB.
Post by Dr Eberhard Lisse
I have used
https://pypi.org/project/sqlite3-to-mysql/
in the past successfully
Thanks.
I'll look at it. I wrote my own source-to-destination python code, and
the INSERT rate into MariaDB was fast, as high as 8,600 rows per second
on some tables.
Overall: 30.6M rows copied in 5770 seconds = 5,304 rows per second (on my
11-year-old i5-750 computer). That's still 96 minutes, so I wonder if
there's a better way.
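One thing I might try is batching rows into multi-row INSERTs inside a
single transaction instead of inserting one row at a time. Sketch only -
table and column names are made up:

start transaction;
insert into big_table (col1, col2, col3) values
  ('a1', 'b1', 'c1'),
  ('a2', 'b2', 'c2'),
  ('a3', 'b3', 'c3');   -- a few hundred to a few thousand rows per statement
-- ... repeat for each batch ...
commit;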
Post by Dr Eberhard Lisse
el
Post by DFS
Post by Johann Klammer
Post by DFS
Finally got a big file (3 columns x 612K rows) of messy data loaded.
load data infile 'file.csv'
into table
fields terminated by ','
enclosed by '"';
I noticed it tells you an error occurred at a certain line, but the
real issue was the offending data (a text field ending in \ or "\")
5 lines later.
On one run 5 lines didn't post, and it turns out they were the next
5 lines after a field ending in "\". Those 5 lines got concatenated
with the offending line, so I had one large clump of data in one
row.
Moral: watch for fields ending in \ or "\".
I seem to recall it also has problems with whitespace around the commata.
Yes, I saw that too.
It was a fiasco. Wasted a fair amount of my time. A couple of times it
would spend 10 minutes reading a file (Stage 1 of 1...) then tell me it
bombed on row 1 (which was valid data). Near the end of my db conversion
I wrote a little python program to copy the data.
The python code did - no joking - 2,700+ inserts/second into MariaDB.
I copied 1.5M rows (~4GB) in under 10 minutes.
'load data infile' isn't worthless - it worked decently fast on simple
data - but there are better ways of getting data into MariaDB.
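Coming back to the trailing-backslash problem quoted above: one thing
that is supposed to help is turning off the escape character so that \
is treated as ordinary data rather than escaping the closing quote or
line terminator. Untested sketch - 'big_table' is a made-up name:

load data infile 'file.csv'
into table big_table
fields terminated by ','
enclosed by '"'
escaped by ''             -- no escape character, so a trailing \ is just data
lines terminated by '\n';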