We use cookies to give you the best experience on our website. If you continue to browse, then you agree to our privacy policy and cookie policy. Image for the cookie policy date

Performance problem with UsedRangeIncludesFormatting and a workaround

For a couple of days now, I'm having a problem with "some" Excel files. Reading them was extremely (more than 50 times) slow. Strange thing is, if I copy the data part of the sheet to another blank Excel sheet and process it instead, the problem disappears. First I thought the application that creates those files have some bug in them but realizing that if I remove some (large) rows from the bottom clears the problem, I tried to turn on and off options one by one to see if there will be any change. And voila: setting UsedRangeIncludesFormatting to TRUE fixed by problem.

Because I'm only reading data, I really don't want any formatting to skew my UsedRange.LastRow/LastColumn values. Also, I always assumed setting this property to FALSE will make things simpler thus faster. Boy was I wrong. This property is really a performance killer for some files.

This happens even if you read and cache UsedRange.LastRow/LastColumn values and used these in your loops. Whenever you access the cell value, strangely UsedRange is still get called (I think this is a bug - or there should be a setting to let library know we're handling range checking).

Anyway, I found out an easy to use workaround: Just set the property to its original value after caching UsedRange.LastRow/LastColumn values, like:

ws.UsedRangeIncludesFormatting = false;
var stopRow = ws.UsedRange.LastRow;
var stopCol = ws.UsedRange.LastColumn;
-- set the property to its original value
ws.UsedRangeIncludesFormatting = true;
-- Cell access is much much faster after this point

for (int row = 1; row <= stopRow; row++) {
  for (int col = 1; col <= stopCol; col++) {
    var value = ws[row, col].Value;
  }
}

Hope it helps,

Regards.


7 Replies

KD Kesavan D Syncfusion Team July 6, 2016 06:55 AM UTC

Hi Osman, 

  

Thank you for contacting Syncfusion support. 

  

We tried all the possible combination to replicate the scenario mentioned by you, but it has hard for us face the issue in performance when using UsedRangeIncludesFormatting as stated by you. 

 

We kindly request you to share the excel document on which you faced the issue with code snippet or a simple issue reproducing sample along with your machine specification and time difference in performance when using UsedRangeIncludesFormatting with True/False. Which will be helpful for us to recreate the issue in our side to give you a prompt solution at the earliest. 

  

Regards, 

Kesavan 




OS Osman Sinan Guven July 6, 2016 04:56 PM UTC

Dear Kesavan,

You can find the test project attached. I've used Windows 10 Pro, VS 2015 Community Edition and Syncfusion Community Edition v14.2450.0.26. 

Sample file is an Excel 2010 file with a 100K row, with 9 columns (dummy) data (which is shortened version of the file we got in production).

With stated workaround reading all cells took ~1 sec. I could not wait execution to finish running without the workaround. But if I limit the loop conditions manually (say 1000 rows), it's 11msec vs. 14000msec.

I hope it helps.

Regards,

Sinan

Attachment: XlsIOTest_e66b8bb3.zip


KD Kesavan D Syncfusion Team July 7, 2016 02:23 PM UTC

Hi Osman, 

Thanks for providing us the document with the performance benchmark. 

By default, XlsIO considers a cell as used, even if there exists some formatting alone. You can disable this behavior, and make XlsIO consider a cell as used, only when there exists data, by using the UsedRangeIncludesFormatting property. 

You can refer the below link for more details:  
http://help.syncfusion.com/file-formats/xlsio/worksheet-cells-manipulation#accessing-used-range-of-a-worksheet 

In your code, you set the property to TRUE before reading cell values, which will again check all the cells including empty cells without formatting, which causes delay. To get only values from cells as there is no need of setting UsedRangeIncludesFormatting to TRUE again. You can just use the below code to achieve the requirement. 

//if (resetUsedRangeIncludesFormatting == true) 
     //    ws.UsedRangeIncludesFormatting = true; 

Regards, 
Kesavan 



OS Osman Sinan Guven July 8, 2016 03:25 PM UTC

Dear Kesavan,

I'm not sure I follow your response. As per your request, I've prep'ed and send you the sample project with sample Excel file. Would you mind sharing timings on your side?

I've already stated without that workaround the process takes forever. I know setting UsedRangeIncludesFormatting to TRUE should not be necessary but it's the whole point of this topic; it just does.

Please run the sample project I've sent with the sample Excel file. Either something wrong with the library or the Excel file.

Regards,

Sinan


KD Kesavan D Syncfusion Team July 11, 2016 12:58 PM UTC

Hi Sinan, 

Sorry for the inconvenience caused. 

Please find the detailed response for your concerns. 

Queries 
Solution 
Would you mind sharing timings on your side? 
Please find the benchmark details in our side for limited rows(1000 rows) 
Scenario 
Time taken 
WithReset 

100 milliseconds 
WithoutReset  

340000 milliseconds 


I've already stated without that workaround the process takes forever. I know setting UsedRangeIncludesFormatting to TRUE should not be necessary but it's the whole point of this topic; it just does. 
The logic you are referring as workaround is the required while reading large amount of data. 
Either something wrong with the library or the Excel file. 
There is not an issue related to the library or the excel file. It is a usage issue of the property UsedRangeIncludesFormatting  

Scenario based explanation: 

Scenario 1(With Reset): 
When reading cell value after setting UsedRangeIncludesFormatting to TRUE, we will not check if the cell is empty but with formatting each time var value = ws[row, col].Value;” statement is called. So the execution time is fast. 

Scenario 2(Without Reset): 
When reading cell value after setting UsedRangeIncludesFormatting to FALSE, we will check if the cell is empty but with formatting each time var value = ws[row, col].Value;” statement is called. So the execution time is delayed by the condition check for empty with formatting. 

Please let us know if you have any further queries. 
  
Regards, 
Kesavan 



OS Osman Sinan Guven July 12, 2016 10:12 AM UTC

Dear Kesavan,

Thanks for the detailed response. I understand that setting UsedRangeIncludesFormatting to FALSE cause some delays but this much performance impact cannot be expected. If as you mention this is normal for large files, somewhere in documentation this should be explained. I spent a lot of time to find out this problem.

Even then, you should provide a "escape" property for us. With the help of UsedRange property, I know which rows & columns I can access and I don't need XlsIO to check if the cell is empty or not with or without formatting. In this case I should be able to disable XlsIO empty cell check and go on without playing with UsedRangeIncludesFormatting property.

Still, there is something strange. As I said in my original post, if I remove say half of the rows from the sheet, making it 50K rows, file is processed without a hitch. I don't know if there is some "limit" on the row count that triggers this behavior.

Anyway, thanks for the help.

Regards,

Sinan Guven


KD Kesavan D Syncfusion Team July 13, 2016 11:27 AM UTC

Hi Sinan, 

Thank you for sharing your comments. 

We have confirmed the problem and logged defect a report regarding this. A support incident to track the status of this defect has been created under your account. Please log on to our support website to check for further updates 
  
https://www.syncfusion.com/account/login?ReturnUrl=%2fsupport%2fdirecttrac%2fincidents  
  
Please let us know if you have any questions. 

Regards,  
Kesavan 


SIGN IN To post a reply.
Loader.
Up arrow icon