#11: Manage your own surrogate keys.
In SQL Server it is common to use an INT or BIGINT set as IDENTITY to create unique, synthetic keys. The number is a sequence and a new value is generated when we execute an insert. There are some issues with this. Quite often we need this value in our Integration Services solution to do logging and efficient loads of the data warehouse (there will be a separate tip on this). This means that sometimes we need the value before an insert and sometimes after. You can obtain the last value generated by issuing a SCOPE_IDENTITY command but this will require an extra trip to the server per row flowing through your pipeline. Obtaining the value before an insert happens is not possible in a safe way. A better option is to generate the keys yourself through a script component. Google for “ssis surrogate key” and you will find a lot of examples.
#12: Excel should be your default front-end tool.
I know this is a little bit controversial. Some say Excel lacks the power of a “real” BI tool. Others say it writes inefficient queries. But hear me out. Firstly, if you look at where Microsoft is making investments in the BI stack, Excel is right up there at the top. Contrast that to what they are doing with PerformancePoint and Reporting Services and its pretty clear that Excel is the most future proof of the lot. Microsoft have added lot of BI features over the last couple of releases and continue to expand it through new add-ins such as data explorer and geoflow. Additionally, the integration with SharePoint gets tighter and tighter. The Excel web client of SharePoint 2013 is pretty on par with the fat Excel client when it comes to BI functionality. This means that you can push out the new features to users who have not yet upgraded to the newer versions of Excel. When it comes to the efficiency with which Excel queries SSAS a lot has become better. But being a general analysis tool it will never be able to optimize its queries as you would if you wrote them specifically for a report.Please note that I am saying “default” not “best”. Of course there are better, pure bred, Business Intelligence front-ends out there. Some of them even have superior integration with SSAS. But its hard to beat the cost-value ratio of Excel if you are already running a Microsoft shop. If you add in the fact that many managers and knowledge workers already do a lot of work in Excel and know the tool well the equation becomes even more attractive.
#13: Hug an infrastructure expert that knows BI workloads.
Like most IT solutions, Microsoft BI solutions are only as good as the hardware and server configurations they run on. Getting this right is very difficult and requires deep knowledge in operating systems, networks, physical hardware, security and the software that is going to run on these foundations. To make matters worse, BI solutions have workloads that often differ fundamentally from line of business applications in the way they access system resources and services. If you work with a person that knows both of these aspects you should give him or her a hug every day because they are a rare breed. Typically BI consultants know a lot about the characteristics of BI workloads but nothing about how to configure hardware and software to support these. Infrastructure consultants on the other hand know a lot about hardware and software but nothing about the specific ways BI solutions access these. Here are three examples: Integration Services is mainly memory constrained. It is very efficient at processing data as a stream as long as there is enough memory for it. The instant it runs out of memory and starts swapping to disk you will see a dramatic decrease in performance. So if you are doing heavy ETL, co-locating this with other memory hungry services on the same infrastructure is probably a bad idea. The other example is the way data is loaded and accessed in data warehouses. Unlike business systems that often do random data access (“Open the customer card for Henry James”) data warehouses are sequential. Batches of transactions are loaded into the warehouse and data is retrieved by reports / analysis services models in batches. This has a significant impact on how you should balance the hardware and configuration of your SQL Server database engine and differs fundamentally from how you handle workloads from business applications. The last example may sound extreme but is something I have encountered multiple times. When businesses outsource their infrastructure to a third party they give up some of the control and knowledge in exchange for an ability to “focus on their core business”. This is a good philosophy with real value. Unfortunately if you do not have anyone on the requesting side of this partnership that knows what to ask for when ordering infrastructure for your BI project what you get can be pretty far off from what you need. Recently a client of mine made such a request for a SQL Server based data warehouse server. The hosting partner followed their SLA protocol and supplied a high availability configuration with a mandatory full recovery model for all databases. You can imagine the exploding need for disk space for the transaction logs when loading batches of 20 million rows each night. As these examples illustrate, it is critical for a successful BI implementation to have people with infrastructure competency on your BI team that also understand how BI solutions differ from “traditional” business solutions and can apply the right infrastructure configurations.
#14: Use Team Foundation Server for your BI projects too.
A couple of years ago putting Microsoft BI projects under source control was a painful experience where the benefits drowned in a myriad of technical issues. This has improved a lot. Most BI artifacts now integrate well with TFS and BI teams can greatly benefit from all the functionality provided by the product such as source control, issue tracking and reporting. Especially for larger projects with multiple developers working against the same solution TFS is the way to go in order to be able to work effectively in parallel. As an added benefit you will sleep better at night knowing that you can roll back that dodgy check-in you performed a couple of hours ago. With that said there are still issues with the TFS integration. SSAS data source views are a constant worry as are server and database roles. But all of this (including workarounds) is pretty well documented online.
#15: Enforce your attribute relationships.
This is mostly related to SSAS dimensional but you should also keep it in mind when working with tabular. Attribute relationships define how attributes of a dimension relate to each other (roll up into each other). For example would products roll up into product subgroups which would again roll into product groups. This is a consequence of the denormalization process many data warehouse models go through where complex relationships are flattened out into wide dimension tables. These relationships should be definied in SSAS to boost general performance. The magic best-practice analyzer built into data tools makes sure you remember this with its blue squiggly lines. Usually it takes some trial and error before you get it right but in the end you are able to process your dimension without those duplicate attribute key errors. If you still don’t know what I am talking about look it up online such as here. So far so good. Problems start arising when these attribute relationships are not enforced in your data source, typically a data warehouse. Continuing with the example from earlier over time you might get the same product subgroup referencing different product groups (“parents”). This is not allowed and will cause a processing of the dimension to fail in SSAS (those pesky duplicate key errors). To handle this a bit more gracefully than simply leaving your cube(s) in an unprocessed state (with the angry phone calls this brings with it) you should enforce the relationship at the ETL level, in Integration Services. When loading a dimension you should reject / handle cases where these relationships are violated and notify someone that this happened. The process should make sure that the integrity of the model is maintained by assigning “violators” to a special member of the parent attribute that marks it as “suspect”. In this way your cubes can still be processed while highlighting data that needs attention.