python - peewee bulk insert is very slow into a sqlite db
I'm trying to do a large-scale bulk insert into a SQLite database with peewee. I'm using atomic(), but the
performance is still terrible. I'm inserting rows in blocks of ~2500 rows, and due to SQL_MAX_VARIABLE_NUMBER I'm inserting 200 of them at a time. Here is the code:
    with helper.db.atomic():
        for i in range(0, len(expression_samples), step):
            gtd.GeneExpressionRead.insert_many(expression_samples[i:i + step]).execute()
And the list expression_samples is a list of dictionaries with the appropriate fields for the GeneExpressionRead
model. I've timed this loop, and it takes anywhere from 2-8 seconds to execute. I have millions of rows to insert, and the way I have this code written now it will likely take around 2 days to complete. Per this post, there are several pragmas that I have set in order to improve performance. That didn't change anything for me performance-wise. Lastly, per this test on the peewee GitHub page it should be possible to insert many rows very fast (~50,000 in 0.3364 seconds), but it also seems that the author used raw SQL code to get that performance. Has anyone been able to do such a high-performance insert using peewee methods?
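For comparison, a minimal raw-SQL sketch with the stdlib sqlite3 module (table and column names here are made up for illustration) shows the kind of throughput the linked test gets by batching every insert inside a single transaction:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")  # in-memory db just for this demo
conn.execute("CREATE TABLE reading (gene TEXT, sample TEXT, value REAL)")

# 50,000 dummy rows, roughly matching the benchmark's row count.
rows = [("g%d" % i, "s%d" % (i % 10), float(i)) for i in range(50000)]

start = time.time()
with conn:  # one transaction wrapping the entire bulk insert
    conn.executemany("INSERT INTO reading VALUES (?, ?, ?)", rows)
elapsed = time.time() - start

count = conn.execute("SELECT COUNT(*) FROM reading").fetchone()[0]
print("%d rows in %.3f s" % (count, elapsed))
```

The point is that executemany with one commit avoids the per-statement overhead; peewee's insert_many inside atomic() is aiming for the same effect, just with ORM overhead on top.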
Edit: I did not realize that the test on peewee's GitHub page was for MySQL
inserts. It may or may not apply to my situation.
Mobius was trying to be helpful in the comments, but there's a lot of misinformation in there.
- Peewee creates indexes for foreign keys when you create the table. This happens for all database engines currently supported.
- Turning on the foreign-key PRAGMA is going to slow things down; why would it be otherwise?
- For best performance, do not create any indexes on the table you are bulk-loading into. Load the data, then create the indexes. This is much less work for the database.
- As you noted, disabling auto-increment for the bulk load speeds things up.
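The index advice can be sketched with plain sqlite3 (hypothetical table and index names); the same idea applies if you let peewee create the table without secondary indexes and add them afterwards:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No indexes defined at load time.
conn.execute("CREATE TABLE reading (gene TEXT, value REAL)")

rows = [("g%d" % (i % 100), float(i)) for i in range(10000)]
with conn:
    conn.executemany("INSERT INTO reading VALUES (?, ?)", rows)

# Build the index after the bulk load: one sort pass over the data
# instead of an index update for every inserted row.
conn.execute("CREATE INDEX idx_reading_gene ON reading (gene)")
```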
Other information:
- Use PRAGMA journal_mode=wal;
- Use PRAGMA synchronous=0;
- Use PRAGMA locking_mode=exclusive;
Those settings are good for loading in a bunch of data. Check the SQLite docs for more info.
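A sketch of applying those pragmas before a bulk load, using stdlib sqlite3 (with peewee you can issue the same PRAGMA statements through the database connection; newer peewee versions also accept a pragmas argument to SqliteDatabase, but check your version):

```python
import os
import sqlite3
import tempfile

# WAL mode needs a real file, not :memory:, so use a throwaway db file.
path = os.path.join(tempfile.mkdtemp(), "bulk.db")
conn = sqlite3.connect(path)

# Pragmas from the answer above, set before loading data:
conn.execute("PRAGMA journal_mode=wal;")        # write-ahead logging
conn.execute("PRAGMA synchronous=0;")           # skip fsync on each commit
conn.execute("PRAGMA locking_mode=exclusive;")  # hold the lock for the whole session

conn.execute("CREATE TABLE reading (gene TEXT, value REAL)")
rows = [("g%d" % i, float(i)) for i in range(10000)]
with conn:
    conn.executemany("INSERT INTO reading VALUES (?, ?)", rows)
```

Remember that synchronous=0 trades durability for speed: a crash mid-load can corrupt the database, which is usually acceptable for a rebuildable bulk import.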