Due to various vagaries of fate, I’ve been spending quite a few hours extracting, transforming and loading administrative data for my university.
I’ve been using the Python openpyxl package to import data from Excel files. The process was turning out to be horribly slow, using about a second to process a small file.
Turns out that openpyxl
has a special read-only mode which
is very much faster that the general interface. All you have to
do is to pass an extra argument when opening a file, like
= openpyxl.load_workbook("myfile.xlsx",read_only=True) wb
This brought down the run time of my program from 20 minutes to 2 seconds.
The openpyxl
people should put up a notice in huge letters on the
front-page of their manual advertising this.