For a couple of days now, I'm having a problem with "some" Excel files. Reading them was extremely (more than 50 times) slow. Strange thing is, if I copy the data part of the sheet to another blank Excel sheet and process it instead, the problem disappears. First I thought the application that creates those files have some bug in them but realizing that if I remove some (large) rows from the bottom clears the problem, I tried to turn on and off options one by one to see if there will be any change. And voila: setting UsedRangeIncludesFormatting to TRUE fixed by problem.
Because I'm only reading data, I really don't want any formatting to skew my UsedRange.LastRow/LastColumn values. Also, I always assumed setting this property to FALSE will make things simpler thus faster. Boy was I wrong. This property is really a performance killer for some files.
This happens even if you read and cache UsedRange.LastRow/LastColumn values and used these in your loops. Whenever you access the cell value, strangely UsedRange is still get called (I think this is a bug - or there should be a setting to let library know we're handling range checking).
Anyway, I found out an easy to use workaround: Just set the property to its original value after caching UsedRange.LastRow/LastColumn values, like:
ws.UsedRangeIncludesFormatting = false;
var stopRow = ws.UsedRange.LastRow;
var stopCol = ws.UsedRange.LastColumn;
-- set the property to its original value
ws.UsedRangeIncludesFormatting = true;
-- Cell access is much much faster after this point
for (int row = 1; row <= stopRow; row++) {
for (int col = 1; col <= stopCol; col++) {
var value = ws[row, col].Value;
}
}
Hope it helps,
Regards.