We use cookies to give you the best experience on our website. If you continue to browse, then you agree to our privacy policy and cookie policy. Image for the cookie policy date
close icon

XLS.IO: Performance when trying to deal with all/many possibilities...

I am trying to write a rather general Excel-Import using XLS.IO. General approach (I'm using Dyalog APL, so here is some pseudo-code):
dim = range.GetRectangles.(Height Width)
IF dim[1>dim[2
// more rows than columns, resolve colwise
rcV = range.Columns
FOR rc IN rcV
clV = rc.Cells
FOR cl IN clV
array = valueConverter cl
...and so on
and valueConverter(cell) then checks:
IF cell.HasNumber
return cell.Number
ELSEIF cell.HasString
return cell.Text
ELSEIF cell.HasFormulaNumberValue
return cell.FormulaNumberValue
ELSEIF cell.HasFormulaStringValue
return cell.FormulaStringValue
ELSEIF cell.IsBlank
return Null
ELSEIF cell.HasDateTime
return cell.Number
ELSEIF cell.HasBoolean
return cell.Number
ELSEIF cell.HasRichText
return cell.RichText
ELSE
return cell.Value2
(lots of ENDIFs)
This code seemed to be necessary - I created a test-sheet that exploited all the possibilities of data-formats and I am now trying to handle them appropriately - i.e. often there are cells with formulas in the sheets and I find it unacceptable to then pretend being unable to deal with the value.
Now, unfortunately, this intense cellwise-processing takes lots of time even on "modest sheets" with 25,000 rows, so I wonder if you have any recommendations how to improve this code?
Ultimately I only want to get Numbers or Text, but trying to support the various ways in which they might be represented.

7 Replies

AV Abirami Varadharajan Syncfusion Team September 27, 2017 12:21 PM UTC

Hi Michael, 
 
Thank you for contacting Syncfusion support. 
 
We are unable to reproduce issue from the shared information. We have prepared sample as per your scenario and shared for your reference which can be downloaded from the following link. 
 
Sample Link: http://www.syncfusion.com/downloads/support/directtrac/general/ze/XlsIO_Sample-2096210946.zip 
We request you to use IMigrantRange to improve performance and reduce the memory consumption. Kindly refer the below UG documentation link for the same.  
We suggest you to upgrade to our latest version (15.3.0.29) of Essential Studio which has more bug fixes and enhancements which is available for download under the following link. 
  
  
If the issue still persists after upgrading, please share us the issue reproducing sample along with the input Excel document  which will be helpful for us to investigate further on this.  
 
Regards, 
Abirami. 




PN Preethi Nesakkan Gnanadurai Syncfusion Team October 10, 2017 04:16 AM UTC


​Hello, 

thanks very much for these suggestions. Switching to iMigrantRange has really helped a lot and improced performance by a factor of approx. 300! :-) 
However, for the extreme case of a 1.5 Million-Row spreadsheet, I still takes more than 10 minutes...so I wonder if there is any way to optimize even further? 

You asked for source-code, so I'm enclosing my code - which is written in Dyalog APL. That might be bit too exotic, but I hope you'll see that is closely modelled after the sample you've sent, so if you find any potential improvements when running your sample against my file, these will probably be applicable for my code as well... ;-) 

Thanks very much 

Michael 

Attached I am sending the sample-file I am using for my tests 




AV Abirami Varadharajan Syncfusion Team October 11, 2017 12:55 PM UTC

Hi Michael, 
 
Thank you for updating us. 
 
We request you to make use of TRangeValueType enumeration to get the cell value based on cell type which consumes less time. We have shared the code snippet for the same for your reference. 
 
Note: We request you to include below namespace to your sample to avoid compilation issue.  
  
Code Example:  
Imports Syncfusion.Calculate 
Imports TRangeValueType = Syncfusion.XlsIO.Implementation.WorksheetImpl.TRangeValueType 
Imports Syncfusion.XlsIO.Implementation 
  
 
Code Snippet: 
 
                    'Instantiate the spreadsheet creation engine. 
                                   Dim excelEngine As New ExcelEngine() 
                                   'Instantiate the excel application object. 
                                   Dim application As IApplication = excelEngine.Excel 
 
                                   'The workbook is opened. 
                                   Dim workbook As IWorkbook = application.Workbooks.Open("opendata_donations1,4m.xlsx") 
 
                                   'The first worksheet object in the worksheets collection is accessed. 
                                   Dim worksheet As IWorksheet = workbook.Worksheets(0) 
                                   worksheet.EnableSheetCalculations() 
                                   Dim lastRow As Integer = worksheet.UsedRange.LastRow 
                                   Dim lastColumn As Integer = worksheet.UsedRange.LastColumn 
                                   Dim cell As IMigrantRange = worksheet.MigrantRange 
                                   Dim sheet As WorksheetImpl = TryCast(worksheet, WorksheetImpl) 
                                   For row As Integer = 1 To lastRow 
                                               For column As Integer = 1 To lastColumn 
                                                           Dim valType As TRangeValueType = sheet.GetCellType(row, column, False) 
                                                           Dim cellValue As Object 
                                                           Select Case valType 
                                                                       Case TRangeValueType.Blank 
                                                                                   cellValue = String.Empty 
                                                                                   Exit Select 
                                                                       Case TRangeValueType.[String] 
                                                                                   cellValue = sheet.GetText(row, column) 
                                                                                   Exit Select 
                                                                       Case TRangeValueType.Number 
                                                                                   cellValue = sheet.GetNumber(row, column) 
                                                                                   Exit Select 
                                                                       Case TRangeValueType.[Boolean] 
                                                                                   cellValue = sheet.GetBoolean(row, column).ToString().ToUpper() 
                                                                                   Exit Select 
                                                                       Case TRangeValueType.Formula 
                                                                                   Dim cellRef As String = RangeInfo.GetAlphaLabel(Column) + Row.ToString() 
                                                                                   sheet.CalcEngine.PullUpdatedValue(cellRef) 
                                                                                   Dim rangeType As TRangeValueType = sheet.GetCellType(row, column, True) 
                                                                                   Dim code As Integer = CInt(rangeType) 
                                                                                   Select Case code 
                                                                                              Case 9 
                                                                                                          'TRangeValueType.Error 
                                                                                                          cellValue = sheet.GetFormulaErrorValue(row, column) 
                                                                                                          Exit Select 
                                                                                              Case 10 
                                                                                                          'TRangeValueType.Boolean 
                                                                                                          cellValue = sheet.GetFormulaBoolValue(row, column).ToString().ToUpper() 
                                                                                                          Exit Select 
                                                                                              Case 12 
                                                                                                          'TRangeValueType.Number 
                                                                                                          cellValue = sheet.GetFormulaNumberValue(row, column) 
                                                                                                          Exit Select 
                                                                                              Case 24 
                                                                                                          'TRangeValueType.String 
                                                                                                          cellValue = sheet.GetFormulaStringValue(row, column) 
                                                                                                          Exit Select 
                                                                                              Case Else 
                                                                                                          ' returns for blank record 
                                                                                                          cellValue = String.Empty 
                                                                                                          Exit Select 
                                                                                   End Select 
                                                                                   Exit Select 
                                                                       Case Else 
                                                                                   cell.ResetRowColumn(row, column) 
                                                                                   cellValue = cell.Value2 
                                                                                   Exit Select 
                                                           End Select 
                                               Next 
                                   Next 
 
 
 
Kindly try this and let us know if this helps at your end. 
 
Regards, 
Abirami. 



PG Pon Geetha A J Syncfusion Team October 19, 2017 03:54 AM UTC

Thanks, 

these suggestions helped to gain a 
(I made it a bit fater by creating variabled with the various values of the RangeValueType-Collection which saved querxing the .net-Collection each time, in the APL-Environment that also summed up to a few seconds.) 
But for large cases (800,000 rows and more) I I still have runtimes of 10minutes and more.  But I assume there's no way to get more speed? 

Thanks 

Michael 



AV Abirami Varadharajan Syncfusion Team October 21, 2017 02:11 PM UTC

Hi Michael, 

Thank you for updating us. 

We are unable to reproduce the issue. The file you have shared has 1,048,576 rows which is processed in 3 minutes. We have shared the video screenshot for the same for your reference which can be downloaded from the following link. 


Kindly share us issue reproducing sample or code snippet which will be helpful for us to provide prompt solution at the earliest. 

Regards, 
Abirami. 



PG Pon Geetha A J Syncfusion Team November 2, 2017 04:06 AM UTC

Hi, 

just FYI: I discusses the matter with my "performance-friends" which are responsible for the framework that I use and they commented that there probably is not much I could do to improve this, as most of time might be spent in the code that deals with the .net-environment (after all, this spreadsheet would lead to at least 20+ mio calls). They also suggested I should look at using the option to associate data with a DataTatble - and retrieving the datatatble then is much faster. So the issue can be closed, I am happy now :-) 

Thanks very much 

Michael 




MC Mohan Chandran Syncfusion Team November 3, 2017 10:19 AM UTC

Hi Michael, 
  
We are glad that the issue is resolved at your end. Please let us know if you have any other queries. 
  
Regards, 
Mohan Chandran. 


Loader.
Live Chat Icon For mobile
Up arrow icon