A pivot table is used to display statistical summaries such as sums and averages of data. The data being processed to plot a pivot table in real-time is very huge, often approaching a million records.
Pivot tables are expected to be highly efficient in handling large amounts of data, so to meet the demands of processing a million records quickly, Syncfusion created the Pivot Table control for web platforms.
Though it is highly efficient, to make the best use of our Pivot Table, you need to follow a few best practices.
In this blog post, we will summarize the key features of the Pivot Table control, that contribute to its rendering performance when bound to a million records of raw data.
Rendering the Pivot Table control with one million data points
Thanks to the virtual scrolling feature, users can view large amounts of data in a web browser without any performance lag. With this feature, only information from the current page is appended to the DOM, and as the page is scrolled, data is retrieved dynamically to refresh the Pivot Table.
Basic features like drill down/up, filtering, sorting, aggregation, and calculated fields work effectively with large data sources.
As an additional consideration of the drill-down feature, only the visible information is used for calculation and merged with the pivot engine. This avoids recalculating values of all the members available in a hierarchy.
For example, consider a pivot table with eight columns, where a user expands a member in a row. In that case, only the visible columns will be used for calculation against the expanded row. Once the pivot engine updates, the pivot table will be refreshed automatically in the DOM.
Like the drill-down mechanism, other features like sorting, filtering, aggregation, and so on involve only the visible portion of data for calculations, instead of all the data, for a better performing Pivot Table control. The CPU and memory usage of the browser application is also maintained at an optimal level.
For reference, check out the sample showcasing Pivot Table performance on GitHub. Please find the difference in the performance when enabling and disabling the virtual scrolling feature in the following table,
|Aggregated Raw Data – Count||Row and Column – Count||Pivot Table rendering time|
|Without virtual scrolling||With virtual scrolling|
|1k||1k rows * 12 columns||6s||0.2s|
|5k||5k rows * 12 columns||30s (browser hangs)||0.4s|
High-performance filter dialog with many records
When the Pivot Table control is bound to a large data source, the member count may be high for certain fields. In such a scenario, opening the filter dialog will affect performance. So, to render a huge amount of members without any performance lag, a provision to limit the number of members to be displayed in the filter dialog has been provided. Also, users can filter the excess members with the help of a search option available in the filter dialog.
Check out a sample of the member filter dialog performance on GitHub. Please find the increase in the processing time with the increase in the members count in the following table,
|Members count||Time taken to view the dialog|
Refreshing the Pivot Table on demand
Like Microsoft Excel, our Pivot Table control also supports deferred layout updates. This allows end users to drag and drop fields between rows, columns, value axes, and filter axes to apply sorting and filtering inside the field list, resulting in changes to the pivot report but not the pivot table. Once all operations are performed, the pivot table will start to refresh the last modified report. This helps provide better performance in pivot table rendering.
To learn more about deferred layout updates, see our documentation.
How to handle date-type fields effectively
In the case of large data sources, if any fields are of date type, we recommend disabling sorting.
The reason is that raw data assigned to the pivot table will be of string type, and when sorting is applied to date values, the wrong results will be generated. So, during the generation of the pivot engine, we would parse the values and find their data type and format them.
For reference, check out the sample showing how to handle date-type fields on GitHub.
Improving Pivot Table performance when grouping by date
The date type can be formatted and displayed by its individual units, as year, quarter, month, day, hour, minute, or second. To do so, the input date value consumed in the source code should be segregated and reframed with necessary fields, which obviously results in performance constraints.
For example, if a field “StartDate” holds the value “15/AUG/2019 03:41 PM” in the raw data, and if the user wants to display the year and month alone, then date grouping will occur, like “StartDate_Year” = “2019” and “StartDate_Month” = “AUG”, and only then will the pivot engine preparation be initiated.
To overcome this, we recommend passing the data source with the date split into separate fields, which avoids the split-up process in the source code and directly starts preparing the pivot engine to display the pivot table.
For reference, check out the sample showing an alternate solution for date grouping in a large data source on GitHub. The performance improvement data while following this technique is provided in following table.
|Aggregated Raw Data – Count||Row and Column – Count||Time taken|
|Sorting raw data||Sorting data that is split into separate fields|
|100k||100k rows * 12 columns||11s||6s|
|200k||200k rows * 12 columns||21s||10s|
In this blog post, we have walked through the Pivot Table control’s performance-oriented features and suggestions. You can always download our free evaluation to see all our controls in action.
If you have any questions or require clarification about these features, please let us know in the comments below. You can also contact us through our support forum, Direct-Trac, or feedback portal. We are happy to assist you!