CHAPTER 1
Did you know you can handle millions of rows of data on your desktop? Build interactive charts and tables that analyse that data instantly? Create dashboards that bring together related data without needing a huge BI project? Explore your data in real time? Even plot it on a map?
Of course you can. You just do it in Excel.
No, seriously. There’s a surprisingly unknown tool in Excel called PowerPivot that delivers all of those capabilities in a format that is familiar to Excel users, simple to use and very powerful.
Microsoft first introduced PowerPivot to the world in beta as “Project Gemini” in 2009. It presented to desktop based data analysts a new way to analyse large amounts of data in memory in the familiar environment of Excel. It was formally introduced as a free add-in for Excel 2010, and made a standard component from 2013. It also found a home within Analysis Services – a core part of Microsoft’s SQL Server BI Platform - as a new way of modelling data for the enterprise.
PowerPivot is an In-Memory cube for the Desktop. What that means to you firstly is that your data is modelled with all the power of a cube in a full blown BI solution – i.e. that your data relationships are all managed for you and aggregations by different slicers is something that happens natively without additional effort on the part of the user. It opens the door to powerful, context aware calculations that enable analysis without constantly reworking formulas for different scenarios. The icing on the cake is that this all happens in your desktop’s memory – which is (currently) the fastest way to work with data on a computer.
The magic happens in the PowerPivot Data Model, an Excel like interface where you can work with millions of rows of data in seconds. This lives inside your Excel workbook.
There is a small limits on this in terms of memory. You can’t let the Excel workbook exceed 2GB in size, and the memory it consumes can’t exceed 4GB[1]. That’s still a huge workbook – and given how PowerPivot compresses data (more on this later) that means that you could be working with up to 30GB of raw data.
Cubes have traditionally been accessed using a language called MDX (for MultiDimensional Expressions). MDX is a difficult language to master and has often been one of the barriers to broader adoption of power cube technologies. The query language used by PowerPivot is DAX (for Data Analysis Expressions). The language is deliberately Excel Formula like to make it accessible to business users. For example, a simple addition of two columns looks like this:
TwoColumns:=[ColumnA]+[ColumnB]
The powerful IF statement looks like this:
IFColumn:=IF(A=1,B,C)
More significantly table level aggregations such as SUM look like this:
SUMMeasure:=SUM([Column])
Not too scary. Part of the appeal of DAX and PowerPivot technologies is that the language is much more accessible than its counterpart from the OLAP Modelled world, MDX. It’s often said the learning curves for the two languages look like this:
That is, DAX is easy to get started with but the advanced capabilities are quite hard to master. In contrast MDX is hard to get started, but once you have it under your belt it doesn’t get much harder. At the end of the book some resources for those looking deeper into DAX are listed for when you get to the far end of the learning curve and need some help.