Friday, December 2, 2016

Tableau - Implementation Challenges and Best Practices

Hi All,

I thought of sharing my learnings and experiences with Tableau so far.

This post describes some of the challenges you could face while implementing Tableau BI in your enterprise from an architectural standpoint.

If you are evaluating BI tools or planning to start an implementation, you will definitely benefit from this post. I will highlight some of the best practices that you can include in your list.

Tableau is flexible when it comes to playing with and analyzing your data. It gives you complete freedom to choose and connect to your data source, and quickly start building those nice Viz (reports, charts or dashboards).
You can do pretty much everything in SQL: join the data sources and put filters to restrict your data. If you are a data analyst, you can build some really compelling data visualizations or charts in a very short span of time.
Now you share those nice visualizations with your team or department and they too get very excited.

Till here it was all cool stuff. The challenges start from here.

1. I don't need data warehouse star schemas. REALLY!!!:
A data warehouse star schema contains facts and dimensions that give you enormous benefits in simplifying your implementation. You won't believe how much it can benefit you in terms of performance, scalability and maintenance.
Some may argue that Tableau doesn't need any kind of warehouse or these fact and dimension star schemas.
Well, if you are a really small enterprise then you may not need it. But if you have a good amount of data and various source systems and applications, then do not build your BI without a data warehouse. Or sometimes your organisation has a warehouse but, as a data analyst, you may be tempted NOT to use it.
Since Tableau does not have any centralized metadata layer, users are free to create their SQL the way they want. This freedom proves costly as a long-term strategy.
Developers build their SQL on top of OLTP or normalized data structures, and the result is highly complex SQL with a large number of joins and poor performance.
Very soon you will have hundreds of those complex SQL queries with lots of duplicate data/information, where one query differs only slightly from another. It's not easy to debug those complex queries to make any additions or alterations. Now you understand how difficult it would be to maintain them.
A star schema reduces those joins and makes your SQL very simple, and of course the performance is way better.
Tableau can pull the data in extract mode, which improves the performance to some extent, but do not just ignore the other benefits. If for some reason you need to switch your application to Live mode in future, you may need to completely redesign it. Such reasons could be a more frequent data refresh or implementing row level security, for which you need a Live connection for your Tableau application.
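
To make the point about fewer joins a bit more concrete, here is a minimal sketch in Python using an in-memory SQLite database. The table and column names (fact_sales, dim_product, dim_date) and the sample rows are made up for illustration and are not from any real system. The idea is only that a report against a star schema needs one join per dimension and nothing more.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales (product_key INTEGER, date_key INTEGER, amount REAL);

INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Toys');
INSERT INTO dim_date VALUES (20160101, 2016), (20170101, 2017);
INSERT INTO fact_sales VALUES
    (1, 20160101, 100.0),
    (1, 20170101, 50.0),
    (2, 20160101, 30.0);
""")

# The reporting query: one simple join per dimension, then aggregate the fact.
STAR_QUERY = """
SELECT p.category, d.year, SUM(f.amount) AS total_amount
FROM fact_sales f
JOIN dim_product p ON p.product_key = f.product_key
JOIN dim_date d ON d.date_key = f.date_key
GROUP BY p.category, d.year
ORDER BY p.category, d.year
"""

rows = conn.execute(STAR_QUERY).fetchall()
for category, year, total_amount in rows:
    print(category, year, total_amount)
```

The same SELECT could be pasted into Tableau as custom SQL. The equivalent query on a normalized OLTP model would typically need several more joins just to reach the category and the year.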

2. Temptation to put ALMOST everything in one Tableau Workbook:
When you start creating an application, you start with a small dataset providing answers to a few business questions. This is what Tableau is built for.

Slowly, when more and more people start looking at it, they start asking for more and more information. This is when we start adding new data sets, joins, logic and conditions, and our application starts growing from all angles.
It becomes more complex, performance goes down and it becomes difficult to scale.
If we take a break here and plan things, we can do it in a much better way.
Once we realize that our application is growing, we should go back to point no. 1 above and think of creating/extending the dimensional model.
You need to recreate your application using a dimensional model. If you think about this early, you will reduce the amount of rework you have to do.
The ideal design would be to do all the data analysis/discovery using your source system structures (assuming you do not have a warehouse or the required information is not present in the warehouse at all).
Utilize all the freedom Tableau provides here. But once you start thinking of making it available for mass consumption by enterprise users, design the required subject areas (facts and dimensions) or extend the existing ones.
Now build your application using these subject areas. Your application would be simple, fast, scalable and easy to maintain. Since the new SQL would use fewer joins, calculations and aggregations, it would be faster and easier to read.
You can now imagine the benefits. If you need more data elements or metrics, simply keep adding them to your subject areas.
This will enable you to extend or scale your application to a greater extent, BUT it does not mean you can still put ALMOST everything in one workbook.
There is definitely some more work here, but I am sure you would appreciate the benefits it brings in the long run.

3. I still want to put almost everything in one Workbook:
You may be wondering if I am against that. Well, I am not.
There are many instances where we need to display information side by side on our dashboards that comes from different subject areas or stars, but there are certain things we need to consider and remember.
Since Tableau does not have a semantic layer (aka Common Enterprise Reporting Layer), we need to add all the tables to that one workbook as data sources.
Here the grain of the data plays an important role. If the grain of the data is the same, then it can all fit in one data source/SQL.
But if the grains of two data sources are different and there is a need for interaction between them, then the real trouble starts.
When I say interaction between these two data sources, I mean that we need to pass common filters between them or show the data coming from both in one Viz/report.
When we need such an interaction, we need a join between these two data sources. Tableau allows joins across data sources and also data blending, BUT it may prove to be very costly in terms of performance and stability.
You would be surprised that even if the individual queries have sub-second response times, after applying the join the response time may be in minutes.
If your individual queries return a limited or small amount of data, it may work for you in some cases.
Better to ALWAYS test it out. Even Tableau experts suggest avoiding blending.

4. OK. What is the solution then:
I know it's frustrating when we talk only about limitations. Here it is also important to understand why there are such limitations when Tableau is such a nice tool.
Well, Tableau is a tool for data discovery. Go quickly grab your data and start visualizing it. But once we have built those nice dashboards, we need to make them available to the enterprise users. Tableau CAN DO certain things here, but it's not made for that. Now you are trying to make it do something that enterprise BI reporting tools such as Oracle OBIEE, Business Objects or Cognos are made for. These tools CAN do some data discovery, but not the way Tableau does. Similarly, Tableau CAN do some dashboarding, but not the way they do it.
I am not comparing Tableau with them here, since they are not comparable and have totally different use cases and technologies.

5. What else can I do?
All right. Here is the solution.
We need to design our Tableau workbooks and dashboards intelligently, keeping the limitations in mind.
Think of having a common landing page workbook with hyperlinks to all the other applications. Think of having some very common filters on your landing page, so that your first workbook has just the dimension data for those filters.
You can also think of making one or more of these filters mandatory, meaning users need to select a filter value in order to go to a specific workbook/dashboard.
This helps when your workbooks/dashboards have tons of data and you want to avoid showing all of it at once and slowing down your application.
Now you can build your simplified workbooks based on individual common subject areas and link them to your landing page.
Since Tableau allows you to pass filters between workbooks, you can pass the common filters from one workbook to another.
There may be certain cases when we want a dashboard/report with data from 2 different data sources, and in those cases you can consider blending. I know I said Tableau experts suggest avoiding it.
See if blending works fine for you; otherwise think of creating a physical table in the database that combines the two sets of data with different grains.
This table will have data at both grains, and an indicator column will tell which grain each row belongs to. You will find examples of this on the web, since this issue is not specific to Tableau but common to data warehousing.
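
Here is one way such a combined table could look, as a rough sketch in Python with an in-memory SQLite database. Everything in it is assumed for illustration: the table name sales_and_targets, the column names, the choice of daily sales versus monthly targets as the two grains, and the sample numbers. Each row fills only the measure that belongs to its grain, and the indicator column says which grain the row is at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One physical table holding two grains (hypothetical example):
--   'DAY' rows carry actual sales, 'MONTH' rows carry monthly targets.
CREATE TABLE sales_and_targets (
    grain         TEXT,   -- indicator column: 'DAY' or 'MONTH'
    month         TEXT,   -- common attribute shared by both grains
    day           TEXT,   -- NULL on month-grain rows
    sales_amount  REAL,   -- filled only on 'DAY' rows
    target_amount REAL    -- filled only on 'MONTH' rows
);

INSERT INTO sales_and_targets VALUES
    ('DAY',   '2016-11', '2016-11-01', 120.0, NULL),
    ('DAY',   '2016-11', '2016-11-02',  80.0, NULL),
    ('DAY',   '2016-12', '2016-12-01',  50.0, NULL),
    ('MONTH', '2016-11', NULL,          NULL, 250.0),
    ('MONTH', '2016-12', NULL,          NULL, 100.0);
""")

# Both measures come from a single table, so no join or blend is needed.
# The indicator column keeps each measure restricted to its own grain.
rows = conn.execute("""
SELECT month,
       SUM(CASE WHEN grain = 'DAY'   THEN sales_amount  END) AS sales,
       SUM(CASE WHEN grain = 'MONTH' THEN target_amount END) AS target
FROM sales_and_targets
GROUP BY month
ORDER BY month
""").fetchall()

for month, sales, target in rows:
    print(month, sales, target)
```

A filter on month now applies to both sets of data at once, which is the kind of interaction that otherwise needs a join or a blend. The price is a wider table with many NULLs, so it is still worth testing with your own data volumes.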

6. THAT'S IT!!!
Well, I guess so, until something else comes to my mind. Please post your comments and questions, and share your thoughts and experiences.

Thanks for reading!!!
Manohar Rana

Saturday, May 28, 2011

Qlikview is now a Leader


As I was expecting, Qlikview joined the Leaders quadrant in Gartner's Magic Quadrant for BI 2011.
Qlikview is cited as a self-contained BI platform, with its strengths being interactivity, great visualization, and end user friendliness and satisfaction. I am very happy about it.
But I am more focused on the challenges ahead. It will be interesting to see how Qlikview maintains that position and stands up to the competition.
The challenges cited by Gartner are:
1. Lack of expansive product strategy
2. Limited metadata management
3. Lack of broad (high volume) BI deployments
4. Lack of Microsoft office integration
5. Poor performance when data volume and number of users increase.

The findings are not new, and Qliktech surely needs to think seriously about these shortcomings.
I want to discuss the above points in more detail.

1. Lack of expansive product strategy: To compete with large vendors like Oracle, it becomes very important to have a competent product expansion strategy. Oracle has a very aggressive product strategy and a vision to integrate its various offerings like Oracle BI, Hyperion Essbase, Oracle Enterprise Performance Management and, more importantly, its pre-built analytic models popularly known as BI Applications. Qliktech has already taken one step in this direction by targeting application vendors like Salesforce, and can offer pre-built models for Salesforce customers, but this is not enough. Qliktech has to work aggressively on developing such pre-built models for other big applications. EPM is one area which is still untouched, and a lack of vision in this area can be disastrous and will simply throw Qliktech out of the competition. Vendors should now think not only of software but also of offering hardware configured for optimum performance. Oracle has its popular Oracle Exadata, its database pre-configured with hardware (originally HP's, later Sun's), and is promoting it aggressively.

2. Limited metadata management: Qlikview offers limited metadata management capabilities. The primary reason I see is that Qlikview is focused on small or much smaller than average deployments, where it did not see much relevance for metadata management. This can be dangerous for Qliktech as well as its clients: as the clients grow, they will start seeing the need for it and will have to make the investment they tried to save at the beginning. Even if Qliktech decides to build its capabilities in metadata management, the basic problem will be to start believing in OLAP dimensional modeling, which goes against its basic principles. Qliktech markets its product as a non-OLAP tool (which it actually is not) and treats the underlying data as a cloud in memory. Hence, when it sees the need for conformed dimensions to do cross-functional analysis, it may become a matter of choice rather than a matter of capability.

3. Lack of broad (high volume) BI deployments: For Qliktech, as mentioned above and as cited in Gartner's report, the major challenge will be to deploy large scale applications. As of now they have proved their capabilities in small or much smaller than average deployments, and I think that is what Qlikview was made for. One of Qliktech's selling points is that Qlikview does not require a data warehouse. Now this same selling point will stop them from moving ahead or proving their capabilities in average and large scale deployments.
For those who want to know why, please read one of my earlier posts here
This again will depend on reviewing their sales strategy and correcting their basic beliefs, which is not going to be an easy task. If they do not start using the terms data warehouse and OLAP, it will be difficult to maintain the Leaders position.

4. Lack of Microsoft Office integration: This is something I mentioned in one of my posts in 2008 (read here). It seems Qliktech is least bothered. Its current capabilities are very basic, limited to a simple export to MS Excel. If it does not develop such capabilities in coming releases, it will be hard for Gartner or Forrester to give Qliktech a place in their reports and compare Qlikview with Oracle or IBM. There are many more such features which I have mentioned in my earlier post. The ones that are important according to me are connectors for the proprietary QVD and QVW files, so that Qlikview models can be available to other applications, and SQL query generation to help developers in debugging, etc.

5. Poor performance when data volume and number of users increase: This is again linked to point number 3 above.

Feel free to post your comments or thoughts.

Till next time

Manohar Rana

Saturday, April 16, 2011

Enhance Business Intelligence Performance

Hi All,

In any Business Intelligence implementation, the key factor is performance. Performance always plays a key role in users accepting or liking the application.
We should do everything possible to enhance performance. Here are some tips, some of which are very generic and can be used in any BI implementation.
From a solution perspective:
1. Use of a data warehouse: Though a data warehouse is not compulsory for a BI implementation, we simply cannot think about a BI solution without one because of the advantages it offers in terms of performance and simplicity. This is especially important for small implementations, which sometimes neglect and underestimate the use of a data warehouse.
From a BI Tool Perspective:
1. For every tool it is important to reduce the size of the application by removing unnecessary objects.
2. Try to create different database connections for different sets of users based on priority.
3. Try to create a separate database connection for populating any session level variables.
4. Try to make the best use of the system cache. If the tool allows caching of query results, use it, and if possible pre-populate the cache with the most frequently used queries.
5. Minimise the calculations happening at the BI level by pre-calculating them in the data warehouse.
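
The caching tip in point 4 is really a feature of the BI tool itself, but the idea can be sketched outside any tool. The following is only an illustration in Python, assuming a small in-memory SQLite table named sales and a made-up helper run_report. It keeps query results in memory with functools.lru_cache and pre-populates the cache with a frequent query before users ask for it.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('EU', 10.0), ('EU', 15.0), ('US', 20.0);
""")

# Counts how often we really go to the database (for demonstration only).
stats = {"db_hits": 0}


@lru_cache(maxsize=128)
def run_report(sql):
    """Run a read-only report query; repeated calls are served from memory."""
    stats["db_hits"] += 1
    return tuple(conn.execute(sql).fetchall())


FREQUENT_REPORT = (
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
)

run_report(FREQUENT_REPORT)           # pre-populate the cache, e.g. after the nightly load
result = run_report(FREQUENT_REPORT)  # a user request: answered from the cache

print(result, stats["db_hits"])
```

A cache like this has to be emptied whenever the underlying data is refreshed (here with run_report.cache_clear()), otherwise users will keep seeing old numbers.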
From a database perspective:
1. The most important thing is to perform every possible calculation in the database. We very frequently neglect this, saying it is a small thing and can be calculated at the BI level. We should avoid this: if something is possible in ETL or the database, do it there, even if it costs you a few extra columns or tables.
2. If you can create perfect star models, there is nothing like it.
3. Try to use database techniques like partitioning and indexing to enhance the performance of database queries.
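
As a small illustration of points 1 and 3 above, here is a sketch in Python with an in-memory SQLite database. The names fact_sales, agg_sales_by_region and idx_sales_region are made up for this example. It pre-calculates a summary table once, the way an ETL step would, and adds an index on the column that reports filter on. SQLite has no table partitioning, so that part is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (region TEXT, sale_date TEXT, amount REAL);
INSERT INTO fact_sales VALUES
    ('EU', '2011-04-01', 10.0),
    ('EU', '2011-04-02', 15.0),
    ('US', '2011-04-01', 20.0);

-- Index the column that reports filter on most often.
CREATE INDEX idx_sales_region ON fact_sales (region);

-- Pre-calculated summary, built once in the ETL step instead of in every report.
CREATE TABLE agg_sales_by_region AS
SELECT region, SUM(amount) AS total_amount, COUNT(*) AS sale_count
FROM fact_sales
GROUP BY region;
""")

# The BI layer now reads ready-made totals instead of aggregating detail rows.
summary = dict(
    conn.execute("SELECT region, total_amount FROM agg_sales_by_region").fetchall()
)
print(summary)

# Ask SQLite how it would run a filtered query on the detail table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(amount) FROM fact_sales WHERE region = 'EU'"
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)
```

The exact wording of the query plan differs between SQLite versions, but it should mention idx_sales_region, which shows that the filter is answered through the index rather than by scanning the whole table.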

There may be several other tips and techniques which we can follow to improve performance and if you have any, please feel free to share.
Till next time.

Manohar Rana