Monday, August 2, 2021

Grafana Cloud - Connect to PostgreSQL in Heroku

Providing the hostname, database, username and password alone was not enough; the connection only worked after some trial and error with the other settings.

Grafana is a wonderful open-source Business Intelligence tool that you can host on your own premises or subscribe to as a cloud offering. I had my Django app deployed on Heroku with a PostgreSQL database.

I tried to create a new connection from Grafana Cloud to the PostgreSQL database on Heroku with the default settings, but it threw an error 400, which is a bad request that the server rejects. The error message gave no further details.

The default setting for TLS/SSL mode is 'verify-full'. I had to change it to 'require' and the connection was successful. Below is a screenshot with all the settings.

grafana postgresql connection
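
If you prefer to derive the connection fields from code, here is a minimal sketch. It assumes your credentials come as a single postgres:// URL, which is how Heroku exposes them in the DATABASE_URL config var. The function name, the dictionary keys and the sample URL are made up for illustration; only the 'require' TLS/SSL mode comes from the settings described above.

```python
from urllib.parse import urlparse, unquote


def grafana_postgres_settings(database_url, ssl_mode="require"):
    """Split a postgres:// URL into the fields asked for by the connection form."""
    parts = urlparse(database_url)
    return {
        "host": f"{parts.hostname}:{parts.port or 5432}",
        "database": parts.path.lstrip("/"),
        "user": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
        # 'require' is the mode that worked in this post; the default
        # 'verify-full' is the one that failed with error 400.
        "tls_ssl_mode": ssl_mode,
    }


# Hypothetical URL, for illustration only.
example = grafana_postgres_settings(
    "postgres://app_user:s3cret@db.example.com:5432/app_db"
)
print(example)
```

Copy the printed values into the matching fields of the data source form and set the TLS/SSL mode by hand.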

If you are in the same situation, just give it a try. Feel free to post any questions or comments.


Thursday, March 5, 2020

Sponsor Key Performance Indicators - Collaboration

Sponsor Performance KPI - Collaboration

In part 1 of the sponsor KPIs, we learned that the number of studies completed and the number of studies for which results were posted are two important indicators of a sponsor's performance. In this part, we will look at other metrics related to lead times and collaboration, and dig deeper into why collaboration is important.

Clinical research is a capital-intensive business with very long lead times and a very low success rate, which makes it one of the riskier businesses. Clinical trials require high investments in financial as well as human capital. The lead time (the time taken to complete all 3 phases of study before the drug gets approval from the FDA) is in the range of 8-13 years. Historically, the average time to complete a study phase has been around 2.62 years. However, phase 1 trials are shorter, taking around 1.84 years, compared to phase 2 and phase 3 studies. The longer a study runs, the higher the associated costs. Hence, it is in the interest of sponsors and end consumers (patients) to reduce the lead time in order to reduce the prices of patented drugs.

To understand how the lead time can be reduced, we first need to understand the factors that affect the duration of a study. Sponsors and CROs have developed various methods to increase operational efficiency based on their long experience in designing and conducting studies. There is room for improvement there; however, I will not discuss it here.

The availability of sufficient patients (subjects) to conduct a trial is an important factor. The process of recruiting is time-consuming. Broadly, there are two ways sponsors find patients.

Firstly, sponsors advertise through various channels to find patients. Social media like Facebook, and other digital channels like blogs and websites, are increasingly used to advertise clinical trial recruitment. Sponsors/CROs most often pay some compensation for the patient's time and travel. This compensation varies from $500 to $9000.
The second method of recruiting patients is through physician networks. Sponsors/CROs have physicians in their network who maintain a database of their patients.
Of the many who show interest in participating in a trial, many fail the screening process and are ineligible to enroll.
Despite all their efforts, sponsors are often unable to enroll enough patients. Insufficient enrollment forces sponsors to close many sites (facilities such as a hospital or physician clinic).
Finally, insufficient sites result in discontinuation of the study.
The competition to recruit patients has increased, with many sponsors recruiting for the same patient profile. Sponsors also have different competencies. For example, commercial industry sponsors have better operational efficiency and a global network (the figure below shows that industry sponsors are doing more global studies), whereas non-industry sponsors like hospitals have a strong patient base that is limited to a local geographical area.
Sponsor Collaborators Vs Num of Countries

Therefore, it makes sense for sponsors to collaborate more with other sponsors and leverage their strengths for a win-win situation. 
For example, for a local hospital or research university conducting a study on diabetes in Raleigh, it makes sense to collaborate with other non-industry sponsors in other cities to increase the number of study sites, which increases diversity in the patient population and ultimately the quality of the entire study.
For the same local hospital, collaborating with industry sponsors that have a global network and expertise in conducting trials in different countries can not only expand its patient base but also let it draw on that expertise to run a global study with high operational efficiency.
There is a greater need for sponsors to collaborate instead of competing to recruit patients. They should come together and explore possibilities of combining studies or sharing detailed data they have obtained in previous trials.
Now that we know why collaboration is important for sponsors, let's see how industry and non-industry sponsors have been collaborating.
I have created an interactive sponsor dashboard for sponsors who have completed at least 10 studies, mainly to eliminate very small sponsors and to reduce the data. Hover over the data points to see details.
On the KPI-1 tab of the dashboard, you will see a chart that shows the collaboration, as in the figure below:
 Sponsor Collaboration

  
A few observations: only some of the bigger industry sponsors collaborate more with non-industry sponsors. Non-industry sponsors have collaborated more with other non-industry sponsors than with industry sponsors.
Till next time!       

Wednesday, December 18, 2019

Data Preparation using R

Data Analytics - Data Preparation using R

Data preparation is an important step in data science. Before we can start the analysis, we need to ingest the required data and then clean and transform it so that it becomes easier to analyze.
R is a powerful open-source software environment with reliable packages and libraries for data manipulation tasks.
I have downloaded the clinical trials data from the AACT website and used it to show the data preparation steps. You can reference the complete code published here in an R Markdown document:

Let's walk through the code and understand how to do it. You may want to open the link mentioned above in a separate tab, since I will not repeat the code here and will simply refer to the document.
You first need to download the data files from the download tab of the AACT website and unzip them into a folder on your local drive.
Installing the required packages is the first step. 'dplyr' is an important package for data manipulation, and 'lubridate' provides important functions for working with date columns.
Save the directory paths into variables. Notice that the '\' characters in the path are replaced with '/'.
The read.csv function is used to read the pipe-delimited data files into data frames. The read.table function can also be used, but read.csv worked better for me: I encountered errors reading records with read.table, whereas read.csv read the same data set without any issues.
The studies table has lots of columns that I did not need for the analysis, so I created a subset with the required columns only.
The subset.data.frame function lets you select the required columns using the 'select' parameter. You can also filter rows by passing a condition to the 'subset' parameter.
I needed to add new calculated columns to the studies subset. The mutate function helps to do that.
Another function I found helpful for wildcard searches is 'grepl'.
The summarise function is helpful for creating aggregations.
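For readers who follow the Python posts on this blog, the same sequence of steps can be sketched with Python's standard library alone. This is not the R code from the linked document, only a rough analogue of it. The column names and the sample rows are made up for illustration and are not taken from the AACT files.

```python
import csv
import io
from collections import Counter

# Made-up pipe-delimited sample standing in for a data file (hypothetical columns).
raw = io.StringIO(
    "nct_id|study_type|source|phase\n"
    "N1|Interventional|Example University|Phase 1\n"
    "N2|Observational|Example Pharma Inc|N/A\n"
    "N3|Interventional|Example University Hospital|Phase 2\n"
)

# Read the pipe-delimited data (the role read.csv plays in the R code).
rows = list(csv.DictReader(raw, delimiter="|"))

# Keep only the needed columns and rows (like subset with 'select' and 'subset').
keep = ("nct_id", "study_type", "source")
studies = [
    {col: row[col] for col in keep}
    for row in rows
    if row["study_type"] == "Interventional"
]

# Add a derived column (like mutate) using a substring match (like grepl).
for row in studies:
    row["is_university"] = "university" in row["source"].lower()

# Aggregate (like summarise): count studies per value of the derived flag.
counts = Counter(row["is_university"] for row in studies)
print(counts)
```

With a real file, you would replace the in-memory sample with open() on the unzipped file path.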
Hope you will find this useful.

Friday, December 6, 2019

Sponsor Key Performance Indicators - Part 1

Sponsor Performance KPI - Part 1

Sponsors are the key stakeholders in clinical trials. It is important to measure the performance of sponsors to understand the trends in the industry and market. Metrics such as studies registered, studies completed, study results posted, collaboration and study completion duration can be key performance indicators from a study conduct perspective. We can gain competitive intelligence about competitor sponsors or find opportunities to collaborate with potential partners. The possibilities are endless.
Let's see who the top Industry sponsors were in 2018, ranked by number of completed studies.

Industry sponsors completed 4,017 studies in 2018. Novartis leads the board with 114 studies completed in 2018, followed by GSK and Pfizer with 96 and 92 studies respectively. The chart also shows the number of results posted by those sponsors in 2018. There may be studies completed in 2018 for which the results are not posted yet, but I am not aware of any requirements for posting results. Another metric is the ratio of studies with posted results to completed studies. There should be a linear relationship, since more completed studies would mean more posted results. To verify that, let's create a scatter plot. There is a strong linear relationship, and the R-square value is pretty high. See the plot below with a fitted line. It may be interesting to look at sponsors with a very low ratio.
  

Now, let's take a look at what's going on with non-industry sponsors in the same year 2018.

NIH, Mayo Clinic and Duke University are the top 3 performers.
The numbers of completed studies are comparable, but the posted results and the ratio are very low compared to sponsors from the industry segment. The slope is 0.13, compared to 0.3 for industry sponsors. So, we see that industry sponsors are posting more results.

We can see that the industry sponsors (blue) have more posted results for the same levels of completed studies than the non-industry sponsors (red).
I am still trying to work out why non-industry sponsors post fewer results than industry sponsors.
Every study is required to submit its results, generally no later than 12 months after completion.
The clinicaltrials.gov site explains what is included in the results and also lists a few valid reasons why results may not be submitted:
https://clinicaltrials.gov/ct2/about-site/results#DisplayOfResults 

Good news!!! I have recreated the sponsor analytics using flexdashboard and plotly in R, so that you can interact with the charts instead of viewing static JPEG images. The charts are very interactive and you can view individual data points. However, I have only included sponsors that have completed at least 10 studies. There are a large number of sponsors below that threshold, and leaving them out reduces both the number of data points and the skewness they cause. I hope you enjoy it.
Link to Dashboard:
http://rpubs.com/kalehdoo/SponsorDashboard

Summary - There are a total of 2,234 sponsors who have completed at least 10 studies in the past, of which 647 (29%) are from Industry and the remaining 1,587 (71%) are from non-industry.
Non-industry sponsors are further classified into Academic and Hospital based on their names (this may not be 100% correct and you may notice some sponsors classified incorrectly).
Keep reading!!!


    

Thursday, December 5, 2019

Linear Regression using R and Python

R and Python

There are times when we want not only a descriptive analysis but also predictions based on past trends. We will look at techniques that we can use to predict the number of studies submitted or registered in future years.

We will see how we can use libraries like pandas, statsmodels and matplotlib in Python.
The Python code is available in my GitHub repository here:
https://github.com/kalehdoo/clintrials/blob/master/ctrials_1.py
I will also try to explain the steps and procedures to perform the analysis later.
Here is the final outcome:
Model Summary:

Studies submitted predicted:
For 2019 : 31,022 
For 2020 : 32,479

The data used for the regression is from 2005-2018.
There is still some time left in 2019, so I will come back next year to compare the 2019 actuals with the predicted numbers here. The actual count for 2019 so far is 19,990 (data until Aug 2019), which was posted in one of the previous posts here.
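
To make the idea concrete, here is a minimal sketch of a straight-line fit and a forecast in plain Python. It is not the code from the repository, which uses statsmodels. The yearly counts below are placeholders and not the real registration numbers, so the output does not reproduce the predictions above.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variance in y explained by the fitted line.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1 - ss_res / ss_tot


# Placeholder data, for illustration only.
years = [2015, 2016, 2017, 2018]
counts = [100, 120, 135, 160]

slope, intercept, r_squared = fit_line(years, counts)


def predict(year):
    """Extrapolate the fitted line to a future year."""
    return intercept + slope * year


print(f"slope={slope:.2f}, R-squared={r_squared:.3f}")
print(f"forecast for 2019: {predict(2019):.1f}")
```

The slope is the estimated yearly increase in studies, and a forecast is only as good as the assumption that the past linear trend continues.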

I have also shown how to do regression using R programming, and also how to interpret the results. The link below has complete code and the analytics:
http://rpubs.com/kalehdoo/sponsor_analytics


Source data extracted from: https://aact.ctti-clinicaltrials.org 


Friday, November 15, 2019

Clinical Study Activity Per Capita


Study Activity Dashboard

In this post, we will try to understand clinical study activity across the globe. We will gather some inputs like population, GDP and health spending as % of GDP. Then we will compare different countries by their involvement in clinical studies. The clinical study activity of a country is based on the clinical sites in that country for a particular study. It is important to keep in mind that those clinical sites or facilities may or may not have enrolled any participants. Also, the demographics data is for the year 2017, and we are considering all the clinical studies registered in the US registry clinicaltrials.gov as of Aug 2019. Keeping all that in mind, we will try to get a sense of overall study activity and compare it across countries. We will also look at the study activity at the region level. So, just sit back and relax.
Figure 1.1
Figure 1.1 shows study activity per 100K population of a country. Denmark tops the list with the highest number of studies per 100K population. This is not the complete list; I have displayed as many countries as I could fit in a picture. You will notice that countries with a very small population get a high activity per capita, while a few countries with very large populations get a low ratio.

In figure 1.2 above, the countries are categorized by geographic region. It also shows each region's percentage share of the population and of the number of studies. The dashboard lets you drill down on a region and see the details by country.
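The per-capita measure behind Figure 1.1 is simple arithmetic, and a small sketch may help. The country names and numbers below are placeholders and not the values from the dashboard.

```python
def studies_per_100k(study_count, population):
    """Number of studies per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return study_count * 100_000 / population


# Placeholder (studies, population) pairs, for illustration only.
countries = {
    "Country A": (500, 5_000_000),
    "Country B": (2_000, 200_000_000),
}

ranked = sorted(
    ((name, studies_per_100k(s, p)) for name, (s, p) in countries.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, rate in ranked:
    print(f"{name}: {rate:.2f} studies per 100K")
```

Note how the small country ranks first even though it has fewer studies in absolute terms, which is the effect described for Figure 1.1.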
We will look at the study activity based on the GDP and health spending as % of GDP of countries in the next post.
Keep thinking till next time. 

Wednesday, September 11, 2019

Clinical Intelligence Analytics - Sponsor Trends

Sponsor Trends Dashboard

In the last post here, we gained some insights into top performing sponsors and the overall trend. In this post, we will look deeper into how sponsor participation has changed in the last few years. The Sponsor Trends Dashboard (figure 6.1) can answer some interesting questions and show what's going on in the clinical trials industry.

Figure 6.1
Here, we will try to analyze the insights gained from the Sponsor Trends Dashboard in Figure 6.1 above.
1. Which types of study sponsors have the highest or lowest share in study registration, and how has that changed over the last few years?
The chart in Quadrant 2 tells us about the share of studies by different types of sponsors. Clearly, Universities have the highest share and have maintained a steep rise in the last few years. Their share has increased from 40% in 2005 to 50% in the last 2 years. These Universities could be public or private, funded or unfunded, but we do not have that information available as of now.
Hospitals have also performed well in registering studies. The share of studies registered by Hospitals has almost doubled since 2005. On the other hand, the share of Industry-sponsored studies has declined consistently, from 37% in 2008 to just 17% now. The share alone does not indicate whether a segment has really grown or declined. The share of one segment, Universities for example, could rise simply because other segments have declined. To see the trend, read the next question below.

2. Which type of sponsors have shown growth in study activity?
The chart in Quadrant 1 shows how the number of studies registered by different categories or types of sponsors has grown over the period.
The studies registered by Universities have grown at a continuous and rapid pace. Until 2008, the Industry segment and Universities were overlapping, but after that the Universities completely outpaced the Industry sponsors. The number of studies registered by Industry sponsors has remained almost flat, while Hospitals maintained steady growth until 2016, after which the growth is negligible.

3. How has the total number of sponsors changed over the past few years?
In the 2 charts in Quadrants 1 and 2, we looked at the trend of the studies registered by sponsors. In the chart in Quadrant 3, we look at the trend in the actual number of sponsors.
The overall trend shows that the total number of primary sponsors who registered studies has grown consistently. All 3 important categories, Universities, Industry sponsors and Hospitals, have grown consistently, each to almost 2 times its 2008 level.

4. Has the participation of sponsors from industry increased?
The chart in Quadrant 4 shows the participation trend over the period. Industry participation means that either the study is sponsored by a sponsor from industry or at least one of the collaborators is an industry sponsor.
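The definition above translates into a one-line rule. Here is a minimal sketch of it; the function name and the class labels ('Industry', 'Other', 'NIH') are assumptions for illustration and may not match the labels in the source data.

```python
def has_industry_participation(lead_sponsor_class, collaborator_classes):
    """True if the lead sponsor or any collaborator is from industry."""
    classes = [lead_sponsor_class, *collaborator_classes]
    return any(c.strip().lower() == "industry" for c in classes)


# Hypothetical studies, for illustration only.
print(has_industry_participation("Industry", []))
print(has_industry_participation("Other", ["NIH", "Industry"]))
print(has_industry_participation("Other", ["NIH"]))
```

The share plotted in the chart would then be the number of studies for which this rule is true, divided by all studies registered in that year.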
The chart shows that industry participation has declined to just 22%, from the upper 40s in 2008. Between 2005 and 2008, the share of studies with industry participation remained in the upper 40s, but it started declining consistently after 2008. However, it is not clear from this chart whether the studies with industry participation have decreased or the studies without industry participation have increased. To figure that out, let's take a look at the trend chart that shows the number of studies with or without industry participation over time.
Figure 6.2

The trend chart (figure 6.2) shows that the number of studies with industry participation has remained nearly constant, while the studies without any industry participation have increased from around 6K in 2005 to 24K in 2018, a fourfold increase.
As food for thought, how do you think the clinical research industry can increase collaboration for the larger benefit of the whole community?
Do post your questions and comments.
Keep thinking!

Source data extracted from: https://aact.ctti-clinicaltrials.org 

Saturday, September 7, 2019

Clinical Intelligence Analytics - Primary Sponsor

Primary Sponsor Dashboard

The Primary Sponsor dashboard (Figure 5.1) provides insights into the study activities of primary or lead study sponsors. The primary sponsor is an important stakeholder in clinical trials and has the primary responsibility for initiating, designing and conducting the study. In simple terms, we can say that the primary sponsor is the owner of the clinical study. The sponsor can be an individual, a company or an institution, and can be from industry (commercial), like pharmaceutical or biotech companies, or from the public non-industry (non-commercial) sector, like government or research institutions.
In this dashboard we will look at various aspects of the study activities of sponsors.
Figure 5.1

1. How many sponsors have registered clinical trials in the US? How many of them are from the Industry (Commercial)? How many studies have they registered? For how many studies have the sponsors from the commercial sector posted results?
The tiles at the top provide a few sponsor-related summary metrics to answer some of the questions mentioned above.
A total of 313,345 studies have been registered in the US by 28,068 lead or primary sponsors to date. Of the 28,068 sponsors, 8,717 are from the commercial sector with 81,169 studies registered, and the remaining 19,351 are from the non-commercial sector with 232,176 studies registered.
There are 4,545 sponsors with at least one study result posted and 1,680 of them are from Industry. 

2. What percentage of studies were registered by sponsors from industry as compared to the non-industry? 
About 26% of studies were sponsored by the industry or commercial sector. Small shares of 3.31% and 1.16% are contributed by NIH (National Institutes of Health) and the US government respectively. The raw data did not provide any further classification within the non-industry sector, so the data is transformed to work out whether a non-industry sponsor is a Hospital or a University/Institute/School. The result shows that Universities have been a major source of sponsored studies with almost 45% share, with Hospitals at 9%.
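A name-based classification like this can be sketched as a simple keyword match. The keywords below come from the categories named above, but the exact rules used for the dashboard are not shown in this post, so treat the function, the order of the checks and the sample names as assumptions.

```python
def classify_non_industry_sponsor(name):
    """Guess the sponsor type from keywords in its name."""
    lowered = name.lower()
    # Assumption: 'hospital' wins when a name matches both groups,
    # e.g. a university hospital is counted as a Hospital.
    if "hospital" in lowered:
        return "Hospital"
    if any(word in lowered for word in ("university", "institute", "school")):
        return "University"
    return "Other"


# Hypothetical sponsor names, for illustration only.
for sponsor in (
    "Example University Hospital",
    "Example School of Medicine",
    "Example Foundation",
):
    print(sponsor, "->", classify_non_industry_sponsor(sponsor))
```

A keyword match like this will misclassify some sponsors, so it is worth spot-checking the results.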

3. What's the growth in the number of studies sponsored by industry as compared to the non-industry sector over the last few years?
Except in 2008 and 2014, the number of studies registered by Industry sponsors has largely remained stable around the 5,500 mark. In contrast, the studies registered by non-industry sponsors have grown rapidly. Notice the size of the steps in the chart. Also notice the increasing gap between the two lines.

4. What percentage of registered studies have some participation from industry?
The pie chart named Industry Participation shows the share of studies where either the primary sponsor is from industry or at least one of the collaborators is from industry. About 33% of the studies have some participation from industry.
 
5. Who are the top performing industry sponsors?
The tabular chart shows the primary sponsors from the industry sector ranked by the number of studies registered. The chart also displays metrics like the number of countries in which they have recruited patients, the number of recruiting facilities, studies in completed state, studies where recruitment has not yet started, studies that are currently recruiting and studies where results are posted. The success ratio compares the completed studies with the studies that were registered minus those that have not yet started or are in progress.
GlaxoSmithKline (GSK) is the top performer with 3,351 studies and a 91% success ratio, followed by Pfizer and Novartis. Pfizer has recruited patients in 105 countries. Sanofi has recruited patients in 107 countries.
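One way to read the success ratio is as completed studies divided by the studies that have had a chance to finish. The sketch below follows that reading; the exact formula in the dashboard may differ, and the numbers are made up.

```python
def success_ratio(registered, not_started, in_progress, completed):
    """Completed studies as a share of studies that are no longer pending."""
    eligible = registered - not_started - in_progress
    if eligible <= 0:
        return None  # nothing has had a chance to complete yet
    return completed / eligible


# Made-up counts, for illustration only.
ratio = success_ratio(registered=100, not_started=10, in_progress=40, completed=45)
print(f"{ratio:.0%}")
```

Studies that are neither completed nor pending, such as terminated or withdrawn ones, are what pull this ratio below 100%.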

6. Who are the top performing sponsors from non-commercial sector?
National Institutes of Health Clinical Center, National Cancer Institute and M.D. Anderson Cancer Center are the top 3 performers. The non-industry sponsors have recruited patients in fewer countries than the top sponsors from the commercial sector.
The dashboards can answer many other questions by simply slicing and dicing the data by different dimensions.
If you have any questions that you want to share, please post them in the comments and I will try to address them.
Till next time.  

Wednesday, August 28, 2019

Clinical Intelligence Analytics - Study Trends

Study Trends

The Study Trends dashboard (Figure 4.1) gives us insights into the trends in recruiting country, average registration to enrollment duration and average study duration over the past 2 decades.
Figure 4.1

The data legends are shared between the charts on the first row. Similarly, the legend is the same for the 2 charts on the bottom row.
The data is aggregated at the study level, and a study is classified as International (both US and non-US), US-only or non-US only based on the countries of patient recruitment. The analysis excludes studies that do not have any recruiting country information, which could be for various reasons: recruitment might not have started, or the study may not have any enrollments yet.
An international study means that it has recruited patients in the US and at least one other country. 
A non-US only study means that the study has recruited the patients in countries other than the US and no patient was enrolled in the US.
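These three definitions can be sketched as a small function. The country label 'United States' and the returned category names are assumptions for illustration.

```python
def classify_by_recruiting_countries(countries, home="United States"):
    """Classify a study by the set of countries where it recruited patients."""
    unique = {c.strip() for c in countries if c and c.strip()}
    if not unique:
        return None  # no recruiting country information: excluded
    if home not in unique:
        return "Non-US only"
    if len(unique) == 1:
        return "US only"
    return "International"


# Hypothetical country lists, for illustration only.
print(classify_by_recruiting_countries(["United States"]))
print(classify_by_recruiting_countries(["United States", "Canada"]))
print(classify_by_recruiting_countries(["Canada", "France"]))
print(classify_by_recruiting_countries([]))
```

Studies for which the function returns None are the ones left out of the analysis.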
In the 2 charts on the bottom row, the numbers with a minus sign are actually positive; they appear as negative only because those bars are drawn on the opposite side of the axis. This display is a side effect and not intended. The vertical lines mark the averages to show the distance from them.

The dashboard can provide insights into the following trends:
1. What's the share of studies by recruiting country?
55% of the studies registered in the US have recruited the patients only from countries other than the US. The share of studies that recruited the patients only inside the US is 39%. International studies are just 6%.


2. What is the trend in the study registration by their location of recruitment?
Now that we know something about the share of the studies based on where the patients were recruited, let's take a look at how it has changed over the last 2 decades. The share of international studies has fallen to less than 5% now, from 15% in the early 2000s. The share of US-only studies has also declined sharply, from the lower 90% range in the early 2000s to the 30-35% level now. The share of non-US studies has increased consistently, from 44% in 2005 to 66% in 2019.
Remember that the share of the studies registered in past years can change, because some studies are still recruiting or may recruit in the future. Hence, the current trend is a snapshot.

3. How has the average time taken from study registration to enrollment changed over the last few years?
The chart shows the trend in the time taken from study registration to patient enrollment or study initiation. The chart also compares, side by side, the duration for studies sponsored by industry and non-industry sponsors.
There are many studies that were registered retrospectively, meaning the studies were registered after they had already started (the first patient was already enrolled). Such retrospective studies were excluded from the analysis. The registration to enrollment duration is calculated in days for the prospective studies only.
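The calculation described above can be sketched with Python's datetime module. The dates are made up, and treating a study that starts on its registration day as prospective is my assumption.

```python
from datetime import date


def registration_to_enrollment_days(registered_on, started_on):
    """Days from registration to study start, or None if retrospective."""
    if started_on < registered_on:
        return None  # registered after the study had already started
    return (started_on - registered_on).days


# Made-up (registration date, start date) pairs, for illustration only.
studies = [
    (date(2019, 1, 1), date(2019, 4, 11)),   # prospective
    (date(2019, 3, 1), date(2019, 3, 31)),   # prospective
    (date(2019, 5, 1), date(2019, 1, 15)),   # retrospective, excluded
]

durations = [
    days
    for days in (registration_to_enrollment_days(r, s) for r, s in studies)
    if days is not None
]
average_days = sum(durations) / len(durations)
print(durations, average_days)
```

Grouping the durations by registration year and sponsor type before averaging would give the yearly lines shown in the chart.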
It appears that non-industry sponsored studies usually took longer, sometimes 1.5-3 times as long, to begin after they were registered. The trend for non-industry sponsored studies follows a parabolic curve. The average registration to enrollment duration for non-industry sponsored studies is 125 days, which is higher than the overall average of 107 days. For the most part, the yearly trend is close to the average line, except in 2010, which had an average of 145 days, the highest in recent years.
On the other hand, the industry sponsored studies are initiated quickly, and the study initiation duration has improved slightly overall. The average for industry sponsored studies is around 80 days, which is well below the overall average of 107 days. The yearly figure for the last few years has been consistently near the average line.

4. How has the average study duration changed over the last few years?
The non-industry sponsored studies take longer to complete than industry sponsored studies. The average study completion duration for non-industry sponsors is close to 3.5 years, which is considerably higher than the overall average of 2.6 years. On the other side, the average for industry sponsors is a little above 2 years. The good news is that both types of sponsors have made a significant improvement in the past 15 years to bring down the average completion duration, possibly signalling great improvements in overall operational efficiency in study conduct.
On that positive note, see you next time.

Friday, August 23, 2019

Clinical Intelligence Analytics - Study

Study Dashboard


In the study dashboard (Figure 3.1), we will look at certain aspects of studies at an aggregated level as well as at the study level.
Figure 3.1

The study dashboard will try to answer the following questions:

1. What is the average study completion duration for sponsors from Industry and non-Industry?

For all types of studies (All), the sponsors from Industry completed their studies in about 1.9 years. In comparison, sponsors from non-industry took almost 3 years to complete a study.
The observational studies (Obs) took longer to complete. The Industry sponsors, with an average of 2.3 years, performed better than the non-industry sponsors, with an average of 3.2 years.
For interventional studies (Int), the average study took 1.8 years for Industry sponsors as compared to 2.9 years for non-industry sponsors.
We may further want to look at the study duration by the phase of the study. Phase 3 studies are large scale and complex in nature and hence should take longer to complete than phase 1 and phase 2 studies. Let's see what we find. Only interventional studies go through the drug development phases. If you take a look at the Avg Study Duration by Phase chart, Phase 2 studies took the longest among all the study phases, with an average of 3.3 years. Phase 3 studies took an average of 2.9 years to complete, whereas phase 4 studies took 2.4 years. Early phase 1 studies took longer than phase 1 studies.

2. What is the share of sponsors from industry and non-industry in interventional or observational studies?
Almost 80% of studies were interventional. Non-industry sponsors account for 56 of those 80 percentage points, which is 70% of all interventional studies. Industry sponsors have a greater share in interventional studies than in observational studies.

3. What percentage of studies were completed between 0 and 3 years or between 8 and 10 years?
40% of the studies were completed in 1 to 3 years. Around 29% of studies were completed in less than 1 year.

4. Which studies took the longest to complete?
There are 44 studies (0.02%) that took more than 30 years to complete. The study that took the longest was sponsored by Johnson & Johnson to evaluate the efficacy of oral Levofloxacin in the treatment of chronic Bronchitis. According to the data, this study took 63 years to complete, starting in 1931 and completing in 1994, and enrolled 367 patients.

5. At the study level, for how many medical conditions is a particular study conducted?
See the tabular report to view the enrollment numbers, the medical conditions and the number of study sites and countries of subject recruitment.

6. In how many countries and facilities did a study recruit patients?
See the tabular report.
The dashboard will show the description of the selected study.