SQLU SSAS Week: Why Do I Need a Cube & How Do I Get Started?

This week I’m pleased to be the professor at SQL University during SQL Server Analysis Services (SSAS) week. My assumption is that you know nothing about SSAS today and wonder if you ever need to. Today, before I get into the technical details of how to build a cube (which is the container of data that Analysis Services stores for you), I’ll start by describing the reasons you might need a cube and explain the preparatory steps that you should take before beginning cube construction. In later lessons this week, I’ll teach you how to build and deploy a simple cube (and what deployment means), and how you can make a cube accessible to other people.

You Might Need a Cube If…

Queries that summarize lots of data and join several tables run too slowly. Operational systems (sometimes called Online Transaction Processing or OLTP systems) are usually designed for optimal performance inserts, updates, and lookups. They often don’t respond well to queries that aggregate data across many months or years when millions or billions of records are involved.
Reporting against operational databases causes resource contention. When the database engine is grinding away at summarized reports, it responds more slowly to other requests – like important business transactions to insert. Everybody pays a price for these kinds of queries.
People spend more time at meetings trying to figure out why the numbers in their spreadsheets don’t match up than discussing whatever it was they were supposed to be discussing at the meeting. When everyone can get raw data from the OLTP system through their own ad hoc reports (usually a good thing), they may not consistently apply correct business rules to the data (which is definitely not a good thing). You’d be surprised (or maybe not) how many different ways people at the same company can calculate the same metric.
Commonly requested queries that compute growth analysis or calculate cumulative values are difficult to write. Sure, writing complex T-SQL builds character, but is that really how you want to spend your precious time?
People don’t know what data they want until they see it, so you wind up cycling through a lot of iterations of reports to give them what they need.

How Does Building a Cube Help?

SSAS stores data in a completely new way called OLAP (which stands for online analytical processing). OLAP storage is highly compressed and heavily indexed, so queries run fast, even with high volumes of data. And you can design a cube with aggregations, which is the secret sauce of OLAP. The query processing engine looks to see if a query can be answered from aggregations before it tries to retrieve data from the raw data in the cube. Aggregations are like pre-calculated summary tables that SSAS manages transparently after you specify the aggregation design.
SSAS stores data in a completely different place from your relational data. If the SSAS instance is on the same server as the database instance, you still have potential for resource contention at the hardware level, so ideally you use different servers in a production environment. But even if everything’s on the same server, your OLTP database activity is unaffected by your OLAP activity (except when you’re reprocessing the cube to load in fresh data, but that’s typically done after hours when people aren’t relying on OLTP for fast operations).
You can store business logic in the cube design, so that people don’t have to work out the calculations themselves. That way, no matter who asks for a given metric (presuming they don’t make changes to the value after running a query), everyone gets the same answer every time – when using the same parameters for the query, of course.
By using MDX – the query language for SSAS (which does take some time to learn) – queries for growth analysis , such as year-over-year, or cumulative values, such as year-to-date, are much more straightforward than their T-SQL counterparts. Of course, the trick is that you have to learn a new language, but once you get familiar with the basics, you’ll see that queries to support time-series analysis is much easier to write and the MDX queries often contain fewer lines of code.
People can use Excel pivot tables or other tools to explore SSAS data themselves. Tools in the Microsoft BI stack include Reporting Services, SharePoint dashboards, and PerformancePoint Services scorecards and dashboards. There are also third-party tools available to use with SSAS. Regardless of which tool they use, sometimes the exploration process takes people on detours that lead to new insights that they wouldn’t have discovered had they depended on someone like you to build reports for them.

Why Not Use PowerPivot?

You might have heard something about PowerPivot for Excel. Some people have gone so far as to claim that with PowerPivot, you never need to build a cube again. Well, that’s just not the case. PowerPivot is very good for personal BI, when you want to combine data from disparate sources in a way that’s very similar to SSAS. In fact, PowerPivot installs a special version of SSAS called VertiPaq mode – but it doesn’t behave identically to a standard SSAS instance. For one thing, it’s a single user instance. You can’t share your workbook with others unless they also have PowerPivot for Excel installed or unless you publish to a SharePoint farm that has PowerPivot for SharePoint installed and enabled. One shortcoming of PowerPivot right now is the lack of support of hierarchies, which in standard SSAS is useful for data navigation, certain types of queries (like time-series analysis), and query performance. And there are certain types of complex designs that standard SSAS can handle that PowerPivot cannot. The PowerPivot team published a blog post, “Comparing Analysis Services and PowerPivot,” which you can reference if you need help deciding which direction to go.

Getting Started: Software and Data

In the other lessons in this series, I’ll walk you through building an SSAS database using the AdventureWorksDW2008R2 database for SQL Server 2008 R2. You’ll need software installed, of course-both the database engine and SSAS. UPDATE: The hosted trial is no longer available. ~~You can download a trial version of SQL Server 2008 R2 if necessary.~~ If you only need to install SSAS, learn how at MSDN: How to Install Analysis Services by Using Setup.

And you’ll need data. You can download the sample databases for SQL Server 2008 R2 from Codeplex and follow the instructions there. If you prefer, you can use the SQL Server 2008 version instead. For the purposes of this series, there’s no significant difference between 2008 and 2008 R2.

Yet another option is to use the preconfigured SQL Server and SSAS instances at a free hosted trial of SQL Server 2008 R2. Then you won’t have to worry about installing anything, but you will have a 2-hour time limit.

Getting Started: A Theoretical Foundation

With the AdventureWorksDW2008R2 database on hand, you have everything you need to build a cube. The database restructures the data from AdventureWorks2008R2 as a star schema. But when you work with your own data, you’ll need to figure out how design a star schema that’s appropriate for your data. A good comprehensive resource for learning how to do this is The Data Warehouse Toolkit, Second Edition (Wiley, 2002) by Ralph Kimball and Margy Ross. However, for the next lesson, you don’t need to read an entire book. Instead, you can learn just enough to understand the terminology that I’ll be using by reading Designing the Star Schema Database, written by Craig Utley. There’s also a series at SQL University on Data Warehousing that you should review.

In real life, after designing your star schema and creating the physical tables, you’ll populate the tables using Integration Services (SSIS). You can learn more about SSIS at SQL University – just search for the SSIS classes on the homepage. For our lessons, the design and population of the star schema is already complete in AdventureWorksDW2008R2, therefore we’ll focus only on SSAS development this week.

So get your environment ready for the SSAS lessons by installing the software and sample databases (or sign up for the hosted trial), and spend some time learning the star schema lingo. In the next lesson, Dimension Design 101, we’re going to start building database objects!

2 Comments

Prithviraj Bhoite says:

July 3, 2011 at 3:48 pm

Thanks for this tutorial. It really helps in getting the basics right.

Back to basics: Why do you need OLAP cubes? | Paras Doshi - Blog says:

June 26, 2013 at 3:13 am

[…] Why do I need a Cube […]

Data Inspirations

Inspiring Intelligence from Information