Sunday, August 11, 2019

Clinical Intelligence Analytics - Trends by Study Attributes

In the last post, we looked at the growth trends of studies registered, initiated, completed and with posted results over the last 20 years. We looked at yearly trends, then drilled down to quarterly and monthly trends and compared the current year's growth with the previous year's.
In this post, we will see the growth trends of registered studies by study attributes such as study type, study phase, drug/device and the DMC flag. I have filtered out studies where the attributes were not specified.
Figure 2.4

This dashboard (Figure 2.4) is an extension of growth trends:
1. Study Submission Trend by Study Type -
The chart shows the trend of studies submitted for the interventional and observational types.
The share of interventional studies has decreased. In the early 2000s, around 90-93% of studies were interventional and 7-10% observational. In the last few years, the share of observational studies has increased to 21%.

2. Study Submission Trend by FDA Oversight -
The chart shows the trend of studies by FDA oversight, i.e., whether the study is for an FDA-regulated drug or device.
Until 2008, the share of studies for FDA-regulated devices was less than 10%; it has since increased to around 25%.

3. Study Submission Trend by Study Phase -
Phase 3 studies are important because sponsors apply for FDA approval after completing them.
Phase NA refers to studies that do not have a study phase; I suspect these are mainly device-related studies. Their share has increased significantly over the period, and I think it's because the share of device-related studies has increased, as we observed in the previous chart.
There is a slight drop in the share of phase 1 studies. The share of phase 2 studies (in green) has decreased from 43% in 2003 to 12% in 2018. Phase 3 studies have followed the same trend, with the share reducing from 25% in 2005 to just 7% in 2018. Phase 4 studies are post-marketing studies, and their share has reduced from 16% in 2005 to 7% in 2018.

4. Study Submission Trend by DMC -
The DMC flag tells whether the study has a Data Monitoring Committee appointed. Many studies did not mention whether they have a DMC. As I mentioned in the beginning, the chart represents only studies that have the data; the others were filtered out.
The share of DMC-appointed studies increased in the first few years, then dropped until 2008, after which it picked up again, maintained a 40% level until 2013, and fell to 34% in 2018.
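The share calculations behind these charts can be sketched in pandas. This is a minimal sketch with made-up data; the column names (`year`, `study_type`) are my own and differ from the actual ClinicalTrials.gov field names:

```python
import pandas as pd

# Toy registry extract; real ClinicalTrials.gov field names differ.
studies = pd.DataFrame({
    "year":       [2003, 2003, 2003, 2018, 2018, 2018, 2018],
    "study_type": ["Interventional", "Interventional", "Observational",
                   "Interventional", "Observational", "Observational", None],
})

# Filter out studies where the attribute was not specified, as in the dashboard.
known = studies.dropna(subset=["study_type"])

# Share of each study type within each registration year.
share = (known.groupby("year")["study_type"]
              .value_counts(normalize=True)
              .unstack(fill_value=0))
print(share)
```

The same groupby-and-normalize pattern works for any of the attributes (phase, FDA oversight, DMC flag) by swapping the column name.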
Feel free to share your thoughts.
See you soon in the next post.

Saturday, August 10, 2019

Clinical Intelligence Analytics - Growth Trends

Trial Growth Trends Dashboard

Growth Trends Dashboard provides insights into the growth trends of clinical trials over a period of time. This helps us gauge how the industry is growing. 
Below (Figure 2.1) is a snapshot of the dashboard without any data filters. Figure 2.2 is another representation of year-over-year growth. 2019 is the current year, so its decline should not be viewed as negative growth; we still have a few months remaining in 2019.
Figure 2.1
Figure 2.2

The dashboard has the same summary tiles on top, plus a new metric measuring the average number of days from registration to study initiation (enrollment).

Broadly, there are four dates that mark important milestones from study registration to the posting of results. The trends in this dashboard show these four milestones. The Growth Trends Dashboard can answer some of the following questions:
1. How have study registrations in the US grown over the last 20 years?
Registering the study is the first step, so it helps us understand how the industry is growing. A decline in study registrations would mean fewer successful drugs or devices reaching the market. With patented drugs losing patent protection, a declining product pipeline can negatively impact the growth of the industry. CROs (Contract Research Organisations) conduct clinical trials on behalf of sponsors, so a decline in study registrations can put the brakes on the growth of the CRO industry.
The study registration trend has increased consistently, with sudden jumps in 2005 and 2008. I read that the International Committee of Medical Journal Editors (ICMJE) began requiring trial registration as a condition of publication in September 2005, which explains the jump that year. The jump in 2008 could be related to the recession.
Beyond 2008, there has been consistent growth, sometimes in double digits. The outlook for the industry as a whole looks positive.

2. Has the growth in studies initiated been consistent?
Sponsors put a lot of effort into initiating a study. It's a commitment of money and effort, so study initiation is an important growth indicator. The average duration for a study to go from registration to initiation is 326 days.
The growth in the number of studies initiated over the last 20 years was also consistent until 2016. Growth has declined in the last two years, at around -6.5% YOY. Year 2019 may also end up following the declining trend. The rate of YOY growth has consistently come down from 42% in 2002 to -6.5% in 2018. If the trend continues, the coming years may prove difficult for the industry, and companies need to start looking at alternatives to increase the product pipeline.
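The year-over-year growth rates quoted here are just percentage changes over yearly counts. A minimal sketch with illustrative numbers (not the actual registry figures):

```python
import pandas as pd

# Hypothetical yearly counts of studies initiated (not the real figures).
initiated = pd.Series(
    [10000, 11500, 12000, 11220],
    index=[2015, 2016, 2017, 2018],
)

# Year-over-year growth in percent; a negative value means a decline.
yoy = initiated.pct_change() * 100
print(yoy.round(1))
```

Note that the last value, (11220 - 12000) / 12000, comes out to -6.5%, the same shape of decline described above.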
3. Is the number of studies completed growing consistently?
Completing a study is like clearing a big hurdle in drug development. Conducting a study is lengthy, costly, complex and risky, so a successful completion is a big milestone and a big relief to sponsors.
The growth in studies completed over the last 20 years was positive and consistent until last year, 2018, when growth turned negative. The rate of growth declined consistently after 2007, with a recovery in 2014. It came down to 1% in 2017 and then dropped to -8.5% in 2018. The current year, 2019, doesn't look to be making any recovery. With just a few months remaining, 2019 is still 64% below 2018.
4. Is the number of study results submitted by sponsors growing?
ClinicalTrials.gov launched the results database in September 2008, so sponsors started posting results after that.
The trend shows that posted results increased at a very high rate until 2014, recovered with 39% growth in 2017, but declined again by 20% in 2018. The current year, 2019, seems to be keeping pace, but it is yet to be seen whether it will end with positive growth.
If you notice, I compared the current year, 2019, with last year, 2018, to get an understanding of current-year performance. That gave me a fair idea, but I wanted to go one level down and see how the current year has performed on a quarterly and monthly basis compared to last year. I wanted to see if there is any trend of high activity in certain months, so I created a dashboard that provides some insights into current-year performance in Figure 2.3.
Figure 2.3

The four tabular reports on the top row compare the current year's months with the previous year's months. The Increase column is the difference and appears in green or red depending on whether the change is positive or negative. The bar charts on the bottom row compare the same metrics on a quarterly basis. Remember, August is the current month, so the third quarter will not have complete values.
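A table like the ones in the top row can be built by pivoting counts by year and month and taking the difference. A toy sketch (column names are my own, not the dashboard's):

```python
import pandas as pd

# Toy monthly submission counts for two years.
monthly = pd.DataFrame({
    "year":  [2018]*3 + [2019]*3,
    "month": ["Jan", "Feb", "Mar"]*2,
    "count": [100, 110, 105, 112, 108, 120],
})

# One row per month, one column per year.
table = monthly.pivot(index="month", columns="year", values="count")

# Increase column: positive (shown green) or negative (shown red) vs last year.
table["increase"] = table[2019] - table[2018]
print(table)
```

A quarterly version of the same comparison just needs the month mapped to its quarter before grouping.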
Let's take a look at all 4 metrics one by one and make some observations.
1. Study Submissions:
The current year has performed well so far, and the number of studies has maintained positive growth compared with the same period last year. Observe that last August had the highest number of studies submitted, so unless there is a trend of submitting studies in the last few days of the month, we can expect this August to do fairly well but perhaps not exceed it. The lead the current year built in the first two quarters may help it achieve positive growth in the number of studies submitted this year.
2. Studies Initiated:
So far, not a single month has shown positive growth, and the current year's months have underperformed by a large margin. There are no months in the remaining period with unusually high or low activity, so if the trend continues, we may see a large drop in 2019 study initiations compared to 2018. I will keep an eye on that in the coming months and share updates with readers here.
3. Studies Completed:
The monthly and quarterly numbers look disappointing, similar to study initiations. December has high numbers, possibly due to year-end activities, so we can expect the gap to narrow somewhat, but the overall outlook doesn't look promising. 2019 may end up with a significant drop compared to 2018.
4. Study postings:
The first two quarters have performed well, and the current quarter is also looking good. Hopefully, 2019 will see some positive growth.
On that positive note, see you next time.

Friday, August 9, 2019

Clinical Intelligence Analytics - Summary

Clinical Summary Dashboard

Summary Dashboard gives an overall picture of clinical trials in a summarized form at a high level. 
Below (Figure 1.1) is a picture of the dashboard without any data filters. If I make any selection or filter, I will call it out and mention it.
Figure 1.1

The Dashboard tries to answer some basic questions to start with, and then we start looking at things from different angles and dimensions. 
The summary Dashboard can answer some of the following questions:

1. How many studies have been registered so far in the US?
This is the total number of studies registered in the database. As of 09-Aug-2019, there are 313,345 studies registered in the US.

2. How many studies actually started?
A study is considered started when it enrolls its first patient. It's an important milestone in the entire clinical trial process. 291,364 out of 313,345 studies, around 93%, actually started by enrolling a patient.

3. How many studies actually completed?
Another important milestone is when a study completes and the patients who participated stop receiving the drug. Of the 291,364 studies that started, 167,511 were completed, which is 58%.

4. For how many studies did sponsors post results?
Once a study is completed, the data collected is analyzed and the results are posted. The results database was launched in 2008, whereas study registration began in 2000, so some studies may not have posted results. 38,127 studies have posted results out of 167,511 completed studies, which is 23%.

5. How many studies are currently recruiting patients?
A study recruits patients at facility locations called study sites. These locations can be in many countries. 54,968 studies are presently enrolling patients, including those enrolling by invitation only. This means 19% of the 291,364 studies that started are still recruiting patients.

6. How many patients were recruited in the past?
There were a few enrollment values, such as 99999999, that I converted to 0. About 490 million people from 210 countries have participated in these clinical trials across the globe. As per a recent United Nations estimate, the world population is 7.7 billion, so we can say that 6.4% of the world's population has participated in US-registered clinical trials. Other countries may have their own clinical trial registries.
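The sentinel-value cleanup mentioned here is a one-liner. A minimal sketch, assuming 99999999 is the only placeholder value (as it was in my extract):

```python
import pandas as pd

# Toy enrollment column with one sentinel value.
enrollment = pd.Series([250, 1200, 99999999, 560])

# Replace the 99999999 placeholder with 0 so it doesn't inflate the total.
cleaned = enrollment.replace(99999999, 0)
total = cleaned.sum()
print(total)  # 2010
```

Without the replacement, a single placeholder would dominate the global participant total.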

7. What is the average duration of study completion?
Duration is the difference between the start and completion dates of the study. 2.62 years is the average duration for a study to complete. Phase 3 trials are large-scale studies and take longer than phase 1 and 2, but we will look at that later. We can say that it takes around 8 years for a new drug to complete all the phases of clinical trials before the sponsor can apply to the FDA for approval. Add another year or two for pre-clinical testing of the drug on laboratory animals. Now I understand how lengthy, complex and risky the entire process of drug development is. I will not go into drug pricing strategies, but you get a sense of why those drugs are so unaffordable, sometimes even with insurance.
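The average duration is just the mean gap between each study's start and completion dates, converted to years. A sketch with two toy rows (the real extract has thousands):

```python
import pandas as pd

df = pd.DataFrame({
    "start_date":      pd.to_datetime(["2015-01-01", "2016-06-01"]),
    "completion_date": pd.to_datetime(["2017-01-01", "2019-06-01"]),
})

# Duration in years per study (365.25 accounts for leap years),
# then the average across studies.
years = (df["completion_date"] - df["start_date"]).dt.days / 365.25
print(round(years.mean(), 2))
```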

8. How many total sponsors registered the studies?
28,068 sponsors from both industry and non-industry (government agencies etc) have registered studies. We will see the ratio and proportions later.

9. What share of registered studies were interventional or observational?
Almost 80% of the studies are interventional studies where some kind of therapy is given to the participant.

10. What percentage of registered studies were sponsored by industry and non-industry sponsor?
Non-industry sponsors lead, with three-fourths of studies sponsored by them.

11. Which 10 sponsors registered the most studies?
GSK is the leader. Pfizer and AstraZeneca are other pharmaceutical sponsors following by a close margin. National Cancer Institute, a non-industry sponsor, is in second position.

12. What are the top medical conditions for which the studies were registered?
Surprised to see obesity at number 2. Asthma and depression are in the top 10. I never thought they were so prevalent, but I guess our lifestyle changes are responsible for their growth. Breast cancer is the most studied medical condition.

13. What is percentage share by the overall status of the study?
Only a small portion of the studies were withdrawn or suspended; however, 5.65% of studies were terminated, which could be a cause for concern.

14. What are the top 10 countries that recruited the patients?
The US has recruited about 18% of the total participants.
Taiwan at number 2 surprised me. I found it enrolled around 67 million participants in one observational study, yet Taiwan's total population is around 25 million. That enrollment number looks like an outlier to me; however, I did not change it. Cambodia too has a study that enrolled 15 million participants in an observational patient registry sponsored by a non-industry sponsor, the French National Institute.
Since observational studies enroll large numbers of participants, let's look at the top countries recruiting for interventional studies only (Figure 1.2). Almost 82 million participants have been enrolled so far, and the US once again leads the chart with a contribution of a little over 13%, with China closing the gap in second position.
Figure 1.2

You can always slice and dice the analysis to look at things from different perspectives in a dashboard. It's real fun to come up with your own questions and then try to find the answers yourself.

See you till next time.

Clinical Intelligence Analytics - Insights

A clinical trial studies a new drug or device before it is brought to market. The new therapy is tested on human subjects to evaluate its safety and efficacy.
Sponsors or investigators of certain clinical trials are required by U.S. law to register their trials on ClinicalTrials.gov and submit summary results to the website.
While working for a CRO as a Business Intelligence and Data Warehouse Engineer, I gained some basic knowledge of clinical research. I became so interested that I decided to study bio-sciences management and analytics subjects in my graduate MBA program. My curiosity and desire to learn more about bio-sciences did not end there.
I am passionate about creating insights out of data, and I always try to unravel the layers of my curiosity by diving deep into the data. So I decided to get the data from ClinicalTrials.gov and create Clinical Intelligence Analytics for myself.
The objective of creating this application is to share insights with the community so that they know more about what is happening in this space. I would be really happy if this application could be of any benefit to patients, physicians, sponsors and other partners in the ecosystem. There are a few online websites that help patients find recruiting studies. However, I did not find any easy way for patients to learn more about particular studies, sponsors or investigators from the past so that they can make informed decisions.
I used Talend software to get the data and Qlikview BI for data preparation and to create the analytic dashboards. I have created more than a dozen dashboards so far and am creating more as we go along.
I would appreciate it if you left a comment when you read the posts and find them useful.
Since I do not want to make the posts boring by putting a lot of information in one single post, I will be publishing a series of posts in the coming days and updating the links below in this post.

1. Clinical Summary Dashboard
2. Growth Trends Dashboard

Here are a few links that will help with basic terminology and basic information on ClinicalTrials.gov:
Common Terms
Trends Charts

Friday, April 6, 2018

Blockchain for Healthcare and Clinical Trials

Blockchain for Healthcare and Clinical Trials by Manohar Rana

This article is based on our third-place proof of concept presented at the Generation Blockchain Challenge.

          In general, healthcare and clinical trials are complex business environments, mainly due to their direct impact on human lives and the various regulations built around them. There are various stakeholders in the ecosystem, and the need to improve how these stakeholders collaborate and communicate with each other is ever increasing. Technological advancements have made significant improvements from time to time, but due to the slow adoption of technology in healthcare generally, there is great potential for newer technologies like blockchain to bring significant improvements to the overall systems.
          Healthcare organizations have made significant improvements through technological and process innovations that have benefitted the entire customer experience. The most important customer in the ecosystem is the patient, and the entire healthcare business is centered around this customer. The ultimate aim of the various players, such as physicians, clinics/hospitals, pharmacies and drug manufacturers (pharmaceutical companies), is to bring value to the patient and enhance the overall customer experience. Then there are regulatory bodies like the Food and Drug Administration (FDA) that oversee all these players and ensure that patients' rights are protected and that patients are not misused in any way. The patient is the end consumer of the benefits in the entire value chain.
          In clinical trials, on the other hand, drug manufacturers actually partner with human subjects, aka patients, to try their trial drugs on them before bringing the new drug to market. Some of the key players in the clinical trials process are the pharmaceutical company or drug manufacturer, the Contract Research Organisation (CRO) and the site investigators (physicians). The Institutional Review Board (IRB) acts as a regulatory body under the FDA. Since the other actors in the ecosystem are organizations with their own technological infrastructure, the subjects remain at the receiving end. They have a limited role to play in the entire process and are limited by the technological capabilities of others' systems. Regulatory requirements make organizations' business systems slow, complex and inflexible. Generally, both healthcare and clinical trial partners have a great need to collaborate and share information through these complex systems.
          Attempts are made from time to time to come up with centralized systems that can facilitate greater collaboration and quick information sharing, but such systems pose their own challenges around data ownership. Integrating data from different systems owned by different parties is a challenge. One alternative is to connect trusted parties that are known to each other on a common platform. Blockchain technology has the potential to play that role. It may be too early to predict what role blockchain will play, since not enough use cases have been tried. It is difficult to say whether blockchain can displace existing systems completely or will complement them for some time first. The objective here is not to speculate on whether blockchain is a replacement for traditional Clinical Trial Management Systems, but to explore small use cases that can actually bring value to the entire ecosystem.
Before we discuss how Blockchain can play an important role in clinical trials, it is important to understand the current challenges in the healthcare and clinical trials.

Few of the challenges in clinical trials are:

1. Subject Recruitment: Asking and convincing a healthy subject to try a new trial drug is a challenge. A healthy person may take that risk for monetary or personal reasons. Sponsors find it very difficult to identify and recruit ideal subjects. A lot of the time, the self-reported information provided by subjects cannot be authenticated, leading to issues like dual enrollment, false disclosures, higher screen failures, a potential risk of severe adverse events (SAEs), and lawsuits, which increase cost and degrade the quality of clinical trial data.

2. Conducting trials: Sponsors modify the inclusion and exclusion criteria in the study protocol after the study has started. At times the changes are genuine, but sometimes they are made to widen the inclusion criteria or narrow the exclusion criteria so that more subjects can be recruited easily.

3. Lack of trust and transparency.

4. Challenges in collaboration and communications.

Blockchain can establish and increase trust in clinical research because tampering with and manipulating research data on a blockchain is very difficult and easily traced. Self-reported data from subjects generally lacks trust, which ultimately impacts the quality and cost of the drug trial. There is a lack of trust in the way clinical research data is gathered, analyzed and reported. Trust is further decreased by unethical and unprofessional practices, such as altering or not reporting the inclusion and exclusion criteria in a protocol to suit the interests of drug manufacturers. Timestamped block transactions can be easily traced and verified, making the data less prone to manipulation and tampering. It is worth reading the article about blockchain-timestamped protocols here.
Blockchain can also increase transparency, collaboration and communication in clinical trials. There are many partners in the clinical research ecosystem, like pharmaceutical companies (sponsors), CROs, study investigators (physicians), hospitals, laboratories, insurance providers and patients, and there is a great need for all partners to collaborate and communicate effectively because human health is at stake. The challenge is that every partner has its own technology systems, which limits the ability to communicate effectively and efficiently. A lot of time and money is wasted in requesting, transferring and communicating information between different systems.
Blockchain brings all the trusted parties in the ecosystem to a common platform enabling them to see the clinical health records flowing through the system in real time and make timely decisions.
Moreover, once the identity of a subject is established on the blockchain network, blockchain also addresses the issue of a subject's dual enrollment in multiple studies at the same time, saving the subject from misuse and exploitation. It is otherwise very difficult to find out whether a subject has enrolled in other studies. Ed Miseta has highlighted the issue of dual enrollment in great detail in his article here.
From the sponsor's perspective, it saves a lot of effort otherwise wasted in subject recruitment due to higher screen failures.
Another important aspect of blockchain is that it enables the patient to play an important role as a participant. Currently, a subject is always at the receiving end of the value chain and has very limited or no access to his or her information. For example, once a patient's adverse event is reported to the physician, the patient has no idea how the case is followed up by the physician with other stakeholders. A blockchain system allows the patient to become an important participant in the whole ecosystem.

The inherent architecture and advantages of blockchain will make various processes and systems irrelevant and unnecessary, making the overall process of clinical research simple and cost-effective. The direct impact of this will be that it will help in bringing down the overall cost of bringing a new drug to the market, which ultimately will be passed on to the patients. More importantly, a subject would become a key participant in the clinical trial process and would be saved from misuse and exploitation.

Blockchain technology has the potential to bring disruptive changes in healthcare and clinical trials, that would make many of the current processes and businesses obsolete. It's in the best interest of the entire industry to explore the opportunities blockchain provides to remain sustainable in the longer run.

Thursday, December 1, 2016

Tableau - Implementation Challenges and Best Practices

Hi All,

I thought of sharing my learnings and experiences with Tableau so far.

This post describes some of the challenges you could face while implementing Tableau BI in your enterprise from an architectural standpoint.

If you are evaluating BI tools or planning to start an implementation, you will definitely benefit from this post. I will highlight some of the best practices that you can add to your list.

Tableau is flexible when it comes to playing with and analyzing your data. It gives you complete freedom to choose and connect to your data source, and quickly start building those nice Viz (reports or charts or dashboards).
You can do pretty much anything: join data sources in SQL and apply filters to restrict your data. If you are a data analyst, you can build some really compelling data visualizations or charts in a very short span of time.
Now you share those nice visualizations with your team or department, and they too get very excited.

Up to this point it's all cool stuff. The challenges start from here.

1. Do I really not need data warehouse star schemas?
A data warehouse star schema contains facts and dimensions that give you enormous benefits by simplifying your implementation. You won't believe how much it can help in terms of performance, scalability and maintenance.
Some may argue that Tableau doesn't need any kind of warehouse or these fact and dimensions star schemas.
Well, if you are a really small enterprise then you may not need it, but if you have a good amount of data and various source systems and applications, do not build your BI without a data warehouse. Sometimes your organisation has a warehouse but, as a data analyst, you may be tempted NOT to use it.
Since Tableau does not have a centralized metadata layer, users are free to create their SQL the way they want. This freedom proves costly as a long-term strategy.
Developers build their SQL on top of OLTP or normalized data structures, and the result is highly complex SQL with large numbers of joins and poor performance.
Very soon you will have hundreds of those complex SQL queries with lots of duplicated data/information, where one query may differ from another only slightly. It's not easy to debug those complex queries to make additions or alterations. Now you understand how difficult they would be to maintain.
A star schema reduces those joins and makes your SQL very simple, and of course the performance is way better.
Tableau can extract the data in extract mode, which improves performance to some extent, but do not ignore the other benefits. If for some reason you later need to run your application in Live mode, you may need to completely redesign it. Such reasons could be more frequent data refreshes or row-level security, both of which need a live connection for your Tableau application.
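To make the star-schema point concrete: a report query joins one fact table to a handful of small dimensions instead of chaining many normalized OLTP tables. A pandas sketch of the idea, with hypothetical table and column names:

```python
import pandas as pd

# Hypothetical star schema: one fact table, small denormalized dimensions.
fact_enrollment = pd.DataFrame({
    "study_id": [1, 1, 2],
    "site_id":  [10, 11, 10],
    "enrolled": [30, 20, 50],
})
dim_study = pd.DataFrame({
    "study_id": [1, 2],
    "phase":    ["Phase 2", "Phase 3"],
})

# One simple join to the dimension, then aggregate;
# no chain of normalized OLTP joins.
report = (fact_enrollment.merge(dim_study, on="study_id")
                         .groupby("phase")["enrolled"].sum())
print(report)
```

The SQL equivalent is the same shape: one join per dimension, one GROUP BY, which is exactly why star-schema queries stay simple and fast.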

2. Temptation to put ALMOST everything in one Tableau Workbook:
When you start creating an application, you start with a small dataset that answers a limited set of business questions. This is what Tableau is built for.

Slowly, as more and more people start looking at it, they start asking for more and more information. This is when we start adding new data sets, joins, transformations and conditions, and our application starts growing from all angles.
It becomes more complex, performance goes down and it becomes difficult to scale.
If we take a break here and plan things, we can do it in much better way.
Once we realize that our application is growing, go back to point 1 above and create or extend the dimensional model.
You need to recreate your application using a dimensional model. If you think about this early, you will reduce the amount of rework you have to do.
The ideal design would be to do all the data analysis/discovery using your source systems structures (assuming you do not have a warehouse or the required information is not present in a warehouse at all).
Utilize all the freedom Tableau provides here. But once you start thinking of making it available for mass consumption by enterprise users, design the required subject areas (Facts and dimensions) or extend the existing ones.
Now build your application using these subject areas. It will be simple, fast, scalable and easy to maintain. Since the new SQL uses fewer joins, calculations and aggregations, it will be fast and easier to read.
You can now imagine the benefits. If you need more data elements or metrics, simply keep adding them to your subject areas.
This will enable you to extend or scale your application to a great extent, BUT it does not mean you should put almost everything in one workbook.
Definitely there is some more work here but I am sure you would appreciate the benefits it would bring in the long run.

3. I Still want to put almost everything in one Workbook:
You may be wondering whether I am against that. Well, I am not.
There are many instances where we need to have information to be displayed on our dashboards side by side that may be coming from different subject areas or Stars but there are certain things we need to consider and remember.
Since Tableau does not have a semantic layer (aka a common enterprise reporting layer), we need to add all the tables to that one workbook as data sources.
Here the grain of the data plays an important role. If the grain of the data is the same, everything can fit in one data source/SQL.
But if the grain of the two data sources is different and they need to interact, the real trouble starts.
By interaction between two data sources, I mean that we need to pass common filters between them or show data from both in one Viz/report.
When we need such interaction, we need a join between the two data sources. Tableau allows joins across data sources, or you can perform blending, but it may prove very costly in terms of performance and stability.
You would be surprised that even if individual queries have sub second response time, after applying the join the response time may be in minutes.
If your individual queries have limited or small data, it may work for you in some cases.
Always test it out first. Even Tableau experts suggest avoiding blending.
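One way to sidestep cross-grain blending is to roll the finer-grained source up to the coarser grain before combining the two. A pandas sketch of the idea (in Tableau you would do this in the data preparation layer; names are my own):

```python
import pandas as pd

# Two sources at different grains: daily enrollments vs monthly targets.
daily = pd.DataFrame({
    "month":    ["2019-01", "2019-01", "2019-02"],
    "enrolled": [5, 7, 4],
})
targets = pd.DataFrame({
    "month":  ["2019-01", "2019-02"],
    "target": [10, 6],
})

# Roll the finer source up to the coarser grain,
# then a simple one-to-one join at the common grain.
monthly = daily.groupby("month", as_index=False)["enrolled"].sum()
combined = monthly.merge(targets, on="month")
print(combined)
```

Once both sides share the same grain, the join is cheap and the duplicate-counting problems that make blending risky simply disappear.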

4. OK, what is the solution then?
I know it's frustrating when we talk about limitations only. It is also important to understand why such limitations exist when Tableau is such a nice tool.
Well, Tableau is a tool for data discovery: quickly grab your data and start visualizing it. Maps are built in and require no configuration, unlike in many other tools. But once we have built those nice dashboards, we need to make them available to enterprise users. Tableau can do certain things here, but it's not made for that. Now you are trying to make it do something that enterprise BI reporting tools such as Oracle OBIEE, Business Objects or Cognos are made for. Those tools can do some data discovery, but not the way Tableau does; similarly, Tableau can do some dashboarding, but not the way they do it.
Here I am not comparing Tableau with them, since they are not comparable and have totally different use cases and technologies.

5. What else can I do?
All right. Here is the solution.
We need to design our Tableau workbooks and dashboards intelligently keeping in mind the limitations.
Think of having a common landing page workbook with hyperlinks to all the other applications. Think of having some very common filters on your landing page, so your first workbook has just the dimension data for those filters.
You can also think of making one or more of these filters mandatory, meaning users need to select a filter value in order to go to a specific workbook/dashboard.
This helps in cases where your workbooks/dashboards have tons of data and you want to avoid showing all of it and slowing down your application.
Now, you can build your simplified workbooks based on individual common subject areas and link them to your landing page.
Since Tableau allows passing filters between workbooks, you can pass the common filters from one workbook to another.
There may be certain cases when we want a dashboard/report with data from two different data sources, and in those cases you can consider blending. I know I said Tableau experts suggest avoiding it.
See if blending works fine for you; otherwise, think of creating a physical table in the database combining the two sets of data that have different grains.
This table will hold data at both grains, and an indicator column will tell which grain each row belongs to. You will find examples on the web for such cases, since this issue is not specific to Tableau but common to data warehousing.
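As a sketch of that combined table (a hypothetical example in pandas; the column names are mine): rows from both grains live in one physical table, and the grain indicator column lets each report filter down to exactly the rows at the grain it needs.

```python
import pandas as pd

# Two data sets at different grains.
daily = pd.DataFrame({"date": ["2019-08-01", "2019-08-02"], "sales": [100, 150]})
monthly = pd.DataFrame({"month": ["2019-08"], "budget": [5000]})

# Stack them into one physical table with a grain indicator column.
daily["grain"], monthly["grain"] = "DAY", "MONTH"
combined = pd.concat([daily, monthly], ignore_index=True, sort=False)

# Each report filters on the indicator to get rows at the grain it needs;
# columns belonging to the other grain are simply left empty (NaN).
day_rows = combined[combined["grain"] == "DAY"]
month_rows = combined[combined["grain"] == "MONTH"]
print(len(day_rows), len(month_rows))  # 2 1
```

In a database you would build the same thing with a UNION of the two queries plus a literal grain column, loaded by ETL.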

Well, I guess that is all until something else comes to mind. Please post your comments and questions, and share your thoughts and experiences.

Thanks for reading.
Manohar Rana

Saturday, May 28, 2011

Qlikview is now a Leader


As I was expecting, Qlikview joined the Leaders quadrant in Gartner's Magic Quadrant for BI 2011.
Qlikview is cited as a self-contained BI platform, its strengths being interactivity, great visualization, and end-user friendliness and satisfaction. I am very happy about it.
But I am more focused on the challenges ahead. It will be interesting to see how Qlikview maintains that position and stands up to the competition.
The challenges cited by Gartner are
1. Lack of expansive product strategy
2. Limited metadata management
3. Lack of broad (high volume) BI deployments
4. Lack of Microsoft office integration
5. Poor performance when data volume and the number of users increase.

The findings are not new and Qliktech surely needs to seriously think about these shortcomings.
I want to discuss further on the above points in detail.

1. Lack of expansive product strategy: To compete with large vendors like Oracle, it becomes very important to have a competent product expansion strategy. Oracle has a very aggressive product strategy and a vision to integrate its various offerings like Oracle BI, Hyperion Essbase, Oracle Enterprise Performance Management and, more importantly, its pre-built analytic models popularly known as BI Applications. Qliktech has already taken one step in this direction by targeting application vendors like Salesforce, and can offer pre-built models for Salesforce customers, but this is not enough. Qliktech has to work aggressively on developing such pre-built models for other big applications. EPM is one area which is still untouched, and a lack of vision here could be disastrous and simply throw Qliktech out of the competition. Vendors should now think not only of software but also of offering hardware configured for optimal and enhanced performance. Oracle has its popular Oracle Exadata, its database pre-configured on HP's hardware, and is aggressively promoting it.

2. Limited metadata management: Qlikview offers limited metadata management capabilities, and the primary reason I see is that because Qlikview is focused on small or much smaller than average deployments, it did not see much relevance in metadata management. This can be dangerous for them as well as their clients: as they grow, they will start seeing the need for it and will require the investment they tried to save at the beginning. Even if Qliktech decides to build its capabilities in metadata management, the basic problem will be to start believing in OLAP dimensional modeling, which goes against their basic principles. Qliktech markets its product as a non-OLAP tool, which it actually is not, and treats the underlying data as a cloud in memory. Hence, when it sees the need for conformed dimensions to do cross-functional analysis, it may become a matter of choice rather than a matter of capability.

3. Lack of broad (high volume) BI deployments: For Qliktech, as mentioned above and as cited in Gartner's report, the major challenge will be to deploy large-scale applications. So far they have proved their capabilities in small or much smaller than average deployments, and I think that is what Qlikview was made for. One of Qliktech's selling points is that Qlikview does not require a data warehouse. Now this same selling point will stop them from moving ahead or proving their capabilities in average and large-scale deployments.
For those who want to know why, please read one of my earlier posts here.
This again will depend on reviewing their sales strategy and making corrections to their basic beliefs, which is not going to be an easy task. If they do not start using the terms data warehouse and OLAP, it will be difficult to maintain the Leaders position.

4. Lack of Microsoft Office integration: This is something I mentioned in one of my posts back in 2008; read it here. It seems Qliktech is not bothered: its current capabilities are very basic, a simple export to MS Excel. If it does not develop such capabilities in coming releases, it will be hard for Gartner or Forrester to give Qliktech space in their reports and compare Qlikview with Oracle or IBM. There are many more such features mentioned in my earlier post. The ones that are important, in my view, are connectors for their proprietary QVD and QVW files, so that their models can be available to other applications, and SQL query generation to help developers with debugging.

5. Poor performance when data volume and the number of users increase: This is again linked to point 3 above.

Feel free to post your comments or thoughts.

Till next time

Manohar Rana

Saturday, April 16, 2011

Enhance Business Intelligence Performance

Hi All,

In any Business Intelligence implementation, the key factor is performance. Performance always plays a key role in users accepting and liking the application.
We should do everything possible to enhance performance, and here are some tips, some of which are very generic and can be used in any BI implementation.
From a solution perspective:
1. Use of a data warehouse: Though a data warehouse is not compulsory for a BI implementation, it is hard to imagine a BI solution without one, because of the advantages it offers in terms of performance and simplicity. This is especially important for small implementations, which sometimes neglect and underestimate the value of a data warehouse.
From a BI Tool Perspective:
1. For every tool it is important to reduce the size of the application by removing the unnecessary objects.
2. Try to create different database connections for different sets of users based on priority.
3. Try to create a separate database connection for populating any session-level variables.
4. Make the best use of the system cache. If the tool allows you to cache query results, use it, and if possible pre-populate the cache with the most frequent queries.
5. Minimise the calculations happening at the BI level by pre-calculating them in the data warehouse.
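Tip 4 can be sketched in Python with functools.lru_cache (a toy stand-in only: run_query here merely simulates an expensive database round trip, it is not any BI tool's real API):

```python
from functools import lru_cache

CALLS = 0  # counts how often the "database" is actually hit

@lru_cache(maxsize=128)
def run_query(sql: str):
    """Stand-in for an expensive database round trip."""
    global CALLS
    CALLS += 1
    return f"results for: {sql}"

# Pre-populate (warm) the cache with the most frequent queries at startup...
for q in ("SELECT * FROM sales", "SELECT * FROM hr"):
    run_query(q)

# ...so repeat requests are served from the cache, not the database.
run_query("SELECT * FROM sales")
run_query("SELECT * FROM sales")
print(CALLS)  # 2 -- only the two distinct queries ever reached the database
```

BI servers that cache query results apply the same principle: warm the cache with the dashboards users open first thing in the morning.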
From a database perspective:
1. The most important thing is to perform every possible calculation in the database. We frequently neglect this, saying it is a small thing and can be calculated at the BI level. We should avoid this: if something is possible in ETL or the database, do it there, even if it costs you a few extra columns or tables.
2. If you can create a perfect star model, nothing beats that.
3. Try to use the database techniques like Partitioning and indexing to enhance the performance of database queries.
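Tips 1 and 3 can be illustrated with a small sqlite3 sketch (schema, names and values are all made up, and sqlite has no partitioning, so only pre-calculation and indexing are shown): the derived value is computed once at load time, and an index supports the filter the BI layer issues most often.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Tip 1: store the derived value (net = gross - discount) at load time,
# instead of recomputing it in the BI layer on every report refresh.
cur.execute(
    "CREATE TABLE sales (order_id INTEGER, gross REAL, discount REAL, net REAL)"
)
cur.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(i, g, d, g - d) for i, (g, d) in enumerate([(100.0, 5.0), (200.0, 20.0)])],
)

# Tip 3: index the column your most frequent BI filter hits.
cur.execute("CREATE INDEX idx_sales_order ON sales (order_id)")

# The BI query now reads a stored column instead of evaluating an expression.
total_net = cur.execute("SELECT SUM(net) FROM sales").fetchone()[0]
print(total_net)  # 275.0
```

In a real warehouse the pre-calculation would live in the ETL job and the index (or partition) choice would follow the actual query workload.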

There may be several other tips and techniques which we can follow to improve performance and if you have any, please feel free to share.
Till next time.

Manohar Rana

Saturday, March 13, 2010

BI Project Implementation Life Cycle


Recently I went through Qliktech's website and read where it says, "And unlike traditional BI, QlikView delivers immediate value with payback measured in days or weeks rather than months, years, or not at all." It made me think for a while. What are they referring to? Do they mean Qlikview as a BI solution can be implemented in weeks? Are they talking only about small department-level deployments, or just the development phase of the BI project life cycle, or is it just a sales pitch? I am wondering how big a part a tool plays in a BI project implementation life cycle and how the tool affects the development phase.
Whether you use Qlikview or a traditional BI tool like OBIEE, about 70% of the project life cycle remains unaffected or untouched.
Then I thought about my recent Oracle BI project. If we had replaced OBIEE with Qlikview, would the whole project have been implemented in weeks instead of months or years? I don't think so.

Well, The topic is BI Project Implementation Life Cycle and not Qlikview.
I have worked on some small Qlikview project implementations, which generally span 2-4 months, and also large OBIEE implementations ranging from 9 months to close to a year.
My latest assignment was a large Oracle BI (OBIEE) Global Implementation Project.
I would like to share my knowledge and learnings from the current as well as previous projects. I will also discuss how large deployments differ from smaller ones, and the different phases involved in a BI project.
Big deployments generally include ETL as well as BI and take somewhere around 9 months to 2 years to implement. Small deployments using tools like Qlikview have a shorter life cycle, ranging from around 3 to 6 months, because they generally do not involve a data warehouse.
There are various reasons for not having a data warehouse:

1. Time and cost: For smaller organisations, IT budgets are generally small. Including a data warehouse in a BI initiative increases the project cost significantly. The organisation needs to purchase an additional license for a new database; even if it already has a database in place, that database cannot be used for various reasons and is not recommended anyway. The organisation also needs to purchase a license for an ETL tool and the required hardware. Beyond the software and hardware, cost is involved in hiring database and ETL resources.

2. Small data size: Small organisations do not have large data sets (even if they have multiple applications) and hence generally neglect the relevance and importance of having a data warehouse in place. Not only does it complement BI by providing faster response times for BI end users, it also shortens the BI development cycle by reducing the complexity of the logic. Using a data warehouse gives much simpler BI data models, which require less development time and maintenance.

3. Less complex data models: Small organisations generally run small customised business applications which do not have very complex OLTP data structures, so it becomes relatively easy to design BI data models, eliminating the need for a data warehouse solution.

4. Fewer users: Organisations with a smaller number of BI users (using the application at a given point in time) have the option of building their BI solution directly on top of their OLTP database. They can either allow BI users to query the OLTP database directly, which somewhat reduces performance for the business application users already connected, or they can use tools like Qlikview or Hyperion Essbase, which store the data in their proprietary file formats and allow a disconnected type of analysis. The latter is the much better option, as it provides much faster performance than querying an OLTP database (for reasons unknown to me, these proprietary data files are highly optimized for use by their respective applications) and does not impact the performance of the OLTP application.

Now let's discuss a typical BI project life cycle, which comprises the following phases:

1. Project Initiation and Analysis:
For any project to start, a business need is a must. For example, "replacing an old customized reporting system with a new reporting and analysis system which is more flexible, cost effective, fast and requires less maintenance", or "the need to consolidate data from disparate sources onto a common standardized platform for reporting and analysis", could be a business need. This business need is evaluated at a very high level: how critical is the need, and how well does it align with the strategic objectives of the organization? Formally, a project manager is identified and appointed, who investigates further and performs a first level of analysis.
It is then the project manager who creates a document with details like the business case, initial risk assessment, schedule and budget, stakeholder identification, etc., taking help from the available resources, and then formal approval is obtained.

2. Planning and Designing:
Requirements gathering is the most important and critical part of the whole project. A requirements document is created for this purpose, and all the requirements are documented in detail. The requirements are finalized, and then the project scope is defined. To fulfill these requirements, various tools are evaluated, and whichever is the best fit is selected. Then the team resources are identified, and all the logistics, hardware and software requirements are identified and procured.
Data models are designed and documented in a technical design document, which also specifies other technical details like features and functionalities of the design, security and authentication, best practices, repository variables, layers, database connections, connection pools, etc.
Prototypes are built to show the features and functionalities documented in the requirements document and are formally approved. Then, project kickoff.

3. Development:
Before starting the development of the data models and reports, the project scope is divided into smaller, more manageable modules or batches.
It is a good practice to divide it on the basis of functional areas and subject areas.
So let's suppose we decided to start with the functional area Human Resources, among other functional areas like Sales and Distribution, Finance, Inventory and Manufacturing, under which we plan to create different subject areas like Employee Compensation and Bonus, Learning and Development, Employee Movements and Transfers, etc. You may have one data model for all of HR, or separate data models for each subject area.
For smaller organizations with less complex OLTP data structures, it is possible and feasible to have a single data model for complete Human Resources.
For large and complex OLTP structures it is generally not possible, as the fact table would become extremely large both horizontally and vertically. This would give extremely slow performance, and from a maintenance perspective the time taken to load the fact table would be long and impractical.
Once we decide on our strategy for the development, we start with developing the data model as per the designs created in the previous phase i.e Planning and Designing.
The data model is generally developed in layers. Oracle OBIEE has three layers, Cognos allows you to build as many layers as you wish, and BO provides two layers (I am not very sure on this and would welcome comments).
In Qlikview, we can make it single-layered or two-layered by renaming the column names in the script.
For all practical purposes, up to three layers is a good idea, but you may agree or disagree; based on your maintenance requirements, you can decide on that.
OBIEE has three predefined layers, namely the Physical layer, the Business Model and Mapping layer, and the Presentation layer.
The Physical layer is where we simply make connections to the database and import the metadata for database objects like tables, columns, views, and primary and foreign key relationships. Here we do not make any changes such as renaming columns, which helps administrators and developers from a maintenance perspective.
Based on the available Physical layer objects, we create our OLAP models in the Business layer by defining dimensions, facts and hierarchies.
In the Presentation layer, we categorize the objects available in the Business layer's OLAP model by subject area, and rename them from the end users' perspective, into business terminology.
This whole process really helps developers understand and visualize the complete model, and saves a lot of time in debugging and day-to-day maintenance activities. This process-oriented approach is again an attempt to divide and rule, making our life a bit simpler.
Once you are done with your model, the next step is to start developing your reports and bringing in the identified resources for report development into the team.
The reports are divided among the team based on subject area, with the help of the report specifications in the technical design document created in the previous phase. The reports are generally designed by report developers.
While the report designers are engaged, the data model developers work on the next subject area. Initially the team size is small, and as the work grows, more people are added to the team.
The development also includes setting up object-level as well as data-level security, creating users and groups, and creating variables as per the technical design document.

4. Testing
Testing is one of the most critical, and sometimes the most time-consuming, phases of a BI project implementation.
There are 3 types of testing done on a project namely Unit Testing (UT), System Integration Testing (SIT) and User Acceptance Testing (UAT).
Unit testing is generally done by the developers: they test that the code or report they have developed works as per the requirements or specifications. It is performed in the development environment where the developers built the report. Developers prepare a test-case plan listing all the cases they would like to test, such as font size, color, spell check, prompts, filters, data values, etc.
Sometimes developers exchange reports with their team members to perform peer unit testing, and this is a good practice, as it is a little easier to spot mistakes in someone else's report than in your own.
Once Unit testing is complete, the code (data models and reports) are transferred to Test Environment.
The testing environment is generally similar to the production environment, but that is not always the case. Having the test environment the same as production allows us to anticipate the performance and behavior of the developed solution much better.
Once the code is transferred to the test environment, System Integration Testing is performed. SIT checks that all the individual pieces work collectively and integrate well to produce the desired output. This test is performed by IT team members or by identified testers from the client side. However, before they perform the test, a sampling-based dry run is carried out by the development team.
Once the testing team starts testing the application, they log every defect in a defect log sheet, with the details of the defect.
At this point it is recommended to appoint some dedicated members of the development team to fix the identified defects and update the defect log sheet. While this activity is going on, other team members are assigned the next set of development work and keep developing the next batch of reports. The same team could fix the defects by allocating a portion of their time and spending the rest on the next batch of reports, but this may bring imbalance or turbulence into the system, as it is very difficult to really work on two things simultaneously. Bug fixing involves a lot of coordination with the ETL team as well as the testers, and sometimes consumes more time than anticipated, which may ultimately impact the development activities. Having a dedicated team for bug-fixing activities is very useful and effective.
Once all the identified bugs are fixed, the business users are asked to perform the User Acceptance Test. The test cases are prepared by the business users, who check whether all the requirements are fulfilled and they are really getting what they want. Here the business users compare the results with those from their existing system and verify them.
One of the biggest challenges in SIT or UAT is that if any data inaccuracies are found, it becomes really difficult to find the root cause. The developer needs to check which version is correct. There may be internal logic, transformations or formulas applied in the existing application, and this analysis consumes a whole lot of time and requires a lot of coordination with the team supporting the existing application or system.

5. Implementation
After the code is tested and verified completely, it is transferred to the production environment and opened to the real end users. Before that, general awareness and training sessions are held for the end users on the new system. For some time the new system is put through a stabilization period (generally 15 days to 2 or 3 months, but it could be even more) during which, if a bug occurs, it is fixed by the same development team.
During this time the new system or application runs in parallel with the existing one, and once the stabilization period is over, the old system is replaced partially or completely.
Once the stabilization period is over and the system is stable, the support team is handed all the project-related knowledge and the development team is rolled off.
The implementation of the BI project is then complete.

Please feel free to share your comments.

Manohar Rana

Thursday, July 30, 2009

Hyperion Essbase Book


Some good news for people who want to learn Hyperion Essbase on their own.
There are not many books available on Essbase; the only one I know of is 'Look Smarter Than You Are with Hyperion Essbase' by Edward Roske.

Sarma and Joseph have written a 444 page book on Hyperion Essbase 9 with the title

'Oracle Essbase 9 Implementation Guide'.

The book covers installation, Essbase cube design and creation, data loading, the Excel add-in, MDX, MaxL, calculation scripts, reports, etc., and explains them with real-world examples in sufficient step-by-step detail, with the help of pictures, notes and hints.

The book is for students who are new to Essbase and multidimensional OLAP concepts, as well as for professionals already working with Essbase who want to refresh their knowledge.

For those who feel that Essbase is only for financial applications, this book will completely change their belief, as at one place the authors write:
"Oracle Essbase is widely known as a financial analytical tool. We want to change the mindset just a bit, right here and now. Oracle Essbase absolutely is a superior financial OLAP tool, but it is an equally superior OLAP tool for just about any type of data analysis."
As the authors have rich real-world implementation experience at a motor company, most of the examples are from that domain. I am sure readers will greatly benefit from these examples.

I am happy to see that Venkat has reviewed this book and feel proud to have worked with him in Oracle.

Though this is a great attempt to bring out such a nice book, there are a few points which I think should have been added to make it more useful and practical.

1. Essbase is not like other BI tools, where you can visualise the schema and how dimensions are joined or related to facts; in Essbase it is difficult to visualise that. For that reason, it would have been helpful if a small cube had been designed, with some small dimensions and facts, and the relationship shown with an OLAP schema diagram. I say this because I myself was not able to understand how the data being loaded into the Essbase cube related to each other, and it took me a long time to work that out. Please excuse me if I am the only one having this problem.

2. I do not think Essbase is complete without EIS (Essbase Integration Services) in version 9.3.1 or Essbase Studio in version 11.
Even though you can do almost all Essbase work without them, it would have been much better if even a small portion of the book had described either of them. As Essbase Studio is going to replace EIS, a small chapter on Essbase Studio is desirable.

You can download one complete chapter, 'Essbase Data and Design Considerations'. By downloading this chapter in PDF format you can also see the list of chapters and the table of contents.

All in all, a great attempt, and thanks to the people involved in bringing out a much-needed book on Essbase.

Manohar Rana