
Thursday, March 5, 2020

Sponsor Key Performance Indicators - Collaboration

Sponsor Performance KPI - Collaboration

In part 1 of the sponsor KPIs, we learned that the number of studies completed and the number of studies with posted results are two important indicators of a sponsor's performance. In this part of the series, we will look at other metrics related to lead times and collaboration, and dig deeper into why collaboration is important.

Clinical research is a capital-intensive business with very long lead times and a very low success rate, which makes it one of the riskier businesses. Clinical trials require heavy investment in both financial and human capital. The lead time (the time taken to complete all 3 study phases before the drug gets FDA approval) is in the range of 8-13 years. Historically, the average time to complete a study phase has been around 2.62 years. Phase 1 trials are shorter than phase 2 and phase 3 studies, taking around 1.84 years. The longer a study runs, the higher its costs. Hence, it is in the interest of both sponsors and end consumers (patients) to reduce the lead time, which in turn can lower the prices of patented drugs.

To understand how the lead time can be reduced, we first need to understand the factors that affect the duration of a study. Drawing on long experience in designing and conducting studies, sponsors and CROs have developed various methods to increase operational efficiency. There is room for improvement there as well, but I will not discuss it here.

The availability of enough patients (subjects) to conduct a trial is an important factor, and recruiting them is time-consuming. Broadly, sponsors find patients in two ways.

First, sponsors advertise through various channels. Social media such as Facebook, and other digital channels such as blogs and websites, are increasingly used to advertise for clinical trial recruitment. Sponsors/CROs usually compensate patients for their time and travel, and this compensation varies from $500 to $9,000.
The second way to recruit patients is through physician networks. Sponsors/CROs have physicians in their networks who maintain databases of their patients.
Of the many people who show interest in participating in a trial, many fail the screening process and are ineligible to enroll.
Despite all these efforts, sponsors are often unable to enroll enough patients. Insufficient enrollment forces sponsors to close many sites (facilities such as a hospital or a physician's clinic).
Finally, too few sites result in the study being discontinued.
Competition for patients has increased, as many sponsors recruit for the same patient profile. Sponsors also have different competencies. For example, commercial industry sponsors have better operational efficiency and a global network (the figure below shows that industry sponsors run more global studies), whereas non-industry sponsors such as hospitals have a strong patient base that is limited to a local geographic area.
Sponsor Collaborators Vs Num of Countries

Therefore, it makes sense for sponsors to collaborate more with other sponsors and leverage their strengths for a win-win situation. 
For example, a local hospital or research university conducting a diabetes study in Raleigh could collaborate with non-industry sponsors in other cities. This would increase the number of study sites and the diversity of the patient population, which ultimately improves the quality of the whole study.
The same hospital could also collaborate with industry sponsors that have a global network and experience running trials in different countries. This would not only expand its patient base but also let it draw on their expertise in running a global study with high operational efficiency.
Sponsors have a greater need to collaborate than to compete for patients. They should come together and explore combining studies or sharing the detailed data they have obtained in previous trials.
Now that we know why collaboration is important for sponsors, let's see how industry and non-industry sponsors have been collaborating.
I have created an interactive sponsor dashboard covering sponsors who have completed at least 10 studies, mainly to exclude very small sponsors and to reduce the data volume. Hover over the data points to see details.
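The dashboard itself is not reproduced here, but the "at least 10 completed studies" filter is simple to sketch. The snippet below is a minimal Python illustration with made-up data. It assumes each study has been reduced to a hypothetical (sponsor name, overall status) pair and that completed studies carry the status "Completed". It is not the code behind the dashboard.

```python
from collections import Counter

def sponsors_with_min_completed(records, min_completed=10):
    """Return {sponsor: completed_count} for sponsors with at least min_completed completed studies.

    records is an iterable of hypothetical (sponsor_name, overall_status) pairs.
    """
    completed = Counter(name for name, status in records if status == "Completed")
    return {name: n for name, n in completed.items() if n >= min_completed}

# Made-up sample: Sponsor A has 12 completed studies, Sponsor B only 3.
studies = (
    [("Sponsor A", "Completed")] * 12
    + [("Sponsor A", "Terminated")] * 2
    + [("Sponsor B", "Completed")] * 3
)
print(sponsors_with_min_completed(studies))  # {'Sponsor A': 12}
```

Counting only completed studies before applying the threshold means that terminated or ongoing studies do not push a small sponsor over the cut-off.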
On the KPI-1 tab of the dashboard, you will see a chart showing collaboration, as in the figure below:
 Sponsor Collaboration

  
A few observations: only some of the larger industry sponsors collaborate substantially with non-industry sponsors, and non-industry sponsors have collaborated more with other non-industry sponsors than with industry sponsors.
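To produce counts like these yourself, one approach is to pair each study's lead sponsor with its collaborators and tally the (lead class, collaborator class) combinations. This is a minimal sketch with made-up rows. The field layout (study id, agency class, role) and the labels "Industry", "Other", "lead" and "collaborator" are assumptions loosely modeled on a sponsors table. It is not a description of how the dashboard was built.

```python
from collections import Counter

# Made-up sponsor rows: (study_id, agency_class, role). Labels are assumed, not taken from real data.
sponsor_rows = [
    ("S1", "Industry", "lead"),
    ("S1", "Other", "collaborator"),
    ("S2", "Other", "lead"),
    ("S2", "Other", "collaborator"),
    ("S2", "Industry", "collaborator"),
    ("S3", "Other", "lead"),
    ("S3", "Other", "collaborator"),
]

def collaboration_matrix(rows):
    """Count (lead class, collaborator class) pairs across all studies."""
    leads = {}
    collaborators = {}
    for study_id, agency_class, role in rows:
        if role == "lead":
            leads[study_id] = agency_class
        else:
            collaborators.setdefault(study_id, []).append(agency_class)
    pairs = Counter()
    for study_id, lead_class in leads.items():
        # Studies without collaborators contribute nothing to the matrix.
        for collab_class in collaborators.get(study_id, []):
            pairs[(lead_class, collab_class)] += 1
    return pairs

matrix = collaboration_matrix(sponsor_rows)
print(matrix[("Other", "Other")])  # 2
```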
Till next time!       

Wednesday, December 18, 2019

Data Preparation using R

Data Analytics - Data Preparation using R

Data preparation is an important step in data science. Before we can start the analysis, we need to ingest the required data, then clean and transform it so that it is easier to analyze.
R is a powerful open-source language with reliable packages and libraries for data manipulation tasks.
I downloaded the clinical trials data from the AACT website and used it to demonstrate the data preparation steps. You can find the complete code in an R Markdown document published here:

Let's walk through the code. You may want to open the link above in a separate tab, since I will not repeat the code here and will simply refer to the document.
First, download the data files from the Download tab of the AACT website and unzip them into a local folder.
The next step is installing the required packages. 'dplyr' is a key package for data manipulation, and 'lubridate' provides useful functions for working with date columns.
Save the directory paths into variables. Notice that the backslashes ('\') in the path are replaced with forward slashes ('/'), because R treats '\' as an escape character in strings.
The read.csv function is used to read the pipe-delimited data files into data frames (with the separator set to '|', since read.csv expects commas by default). The read.table function can also be used, but read.csv worked better for me: I encountered errors reading some records with read.table, while read.csv read the same data set without any issues.
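For readers following along in Python, reading a pipe-delimited file works the same way: you tell the parser the separator explicitly. In R that is `read.csv(file, sep = "|")`. The sketch below uses Python's standard csv module on a small in-memory sample. The column names and values are hypothetical and stand in for one of the unzipped files.

```python
import csv
import io

# In-memory stand-in for an unzipped pipe-delimited file (hypothetical columns and values).
# With a real file you would use: open(path, newline="", encoding="utf-8")
sample = io.StringIO(
    "nct_id|overall_status|phase\n"
    "NCT0000001|Completed|Phase 2\n"
    "NCT0000002|Recruiting|Phase 1\n"
)

# DictReader uses the first line as the header; delimiter="|" plays the role of sep = "|" in R.
rows = list(csv.DictReader(sample, delimiter="|"))
print(rows[0]["overall_status"])  # Completed
```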
The studies table has many columns that I did not need for the analysis, so I created a subset containing only the required columns.
The subset.data.frame function lets you pick the required columns with the 'select' parameter. If you also want a subset of rows, pass a filter condition in the 'subset' parameter.
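In R, a call such as `subset(studies, subset = overall_status == "Completed", select = c(nct_id, phase))` combines both parameters. The sketch below mimics that in plain Python on made-up rows. The helper name `subset_rows` and the column names are hypothetical.

```python
# Made-up study rows (hypothetical columns).
studies = [
    {"nct_id": "NCT0000001", "overall_status": "Completed", "phase": "Phase 2", "acronym": "ABC"},
    {"nct_id": "NCT0000002", "overall_status": "Recruiting", "phase": "Phase 1", "acronym": ""},
]

def subset_rows(rows, select, condition=lambda row: True):
    """Keep rows where condition(row) is true, and only the columns listed in select."""
    return [{col: row[col] for col in select} for row in rows if condition(row)]

completed = subset_rows(
    studies,
    select=["nct_id", "phase"],                               # like 'select'
    condition=lambda row: row["overall_status"] == "Completed",  # like 'subset'
)
print(completed)  # [{'nct_id': 'NCT0000001', 'phase': 'Phase 2'}]
```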
Next, I needed to add new calculated columns to the studies subset. The mutate function (from dplyr) does this.
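As an illustration of a derived column, the sketch below adds a study duration in years computed from two date columns. This is the kind of calculation you might do in R with mutate and lubridate. The column names `start_date` and `completion_date`, the ISO date format, and the choice of 365.25 days per year are all assumptions for this example, not the columns created in the R Markdown document.

```python
from datetime import date

# Made-up rows with ISO-format date strings; an empty string means the date is missing.
studies = [
    {"nct_id": "NCT0000001", "start_date": "2015-01-15", "completion_date": "2017-08-31"},
    {"nct_id": "NCT0000002", "start_date": "2018-03-01", "completion_date": ""},
]

def add_duration_years(rows):
    """Return copies of rows with a derived 'duration_years' column (None if a date is missing)."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are left unchanged
        if row["start_date"] and row["completion_date"]:
            start = date.fromisoformat(row["start_date"])
            end = date.fromisoformat(row["completion_date"])
            new_row["duration_years"] = round((end - start).days / 365.25, 2)
        else:
            new_row["duration_years"] = None
        out.append(new_row)
    return out

result = add_duration_years(studies)
print(result[0]["duration_years"])  # 2.63
```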
Another function I found helpful for wildcard (pattern) searches is grepl, which returns TRUE or FALSE for each element depending on whether it matches the pattern.
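For comparison, R's `grepl("diabetes", x, ignore.case = TRUE)` returns one logical value per element of `x`. A rough Python equivalent uses `re.search` with a case-insensitive flag. The condition strings below are made up.

```python
import re

# Made-up condition names to search.
conditions = ["Type 2 Diabetes", "Breast Cancer", "diabetes mellitus, type 1", "Asthma"]

# One boolean per element, like grepl(..., ignore.case = TRUE) in R.
matches = [bool(re.search("diabetes", c, flags=re.IGNORECASE)) for c in conditions]
print(matches)  # [True, False, True, False]
```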
The summarise function (from dplyr) is helpful for creating aggregations.
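In dplyr, summarise is usually combined with group_by, for example `group_by(phase) %>% summarise(n_studies = n(), mean_enrollment = mean(enrollment))`. The sketch below performs the same kind of grouped aggregation in plain Python. The helper name `summarise_by` and the sample rows are hypothetical.

```python
from collections import defaultdict

# Made-up rows: one study per row with its phase and enrollment.
rows = [
    {"phase": "Phase 1", "enrollment": 40},
    {"phase": "Phase 2", "enrollment": 120},
    {"phase": "Phase 1", "enrollment": 20},
    {"phase": "Phase 2", "enrollment": 200},
]

def summarise_by(rows, key, value):
    """Group rows by `key` and return the count and mean of `value` for each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        k: {"n_studies": len(v), "mean_value": sum(v) / len(v)}
        for k, v in groups.items()
    }

summary = summarise_by(rows, "phase", "enrollment")
print(summary["Phase 1"])  # {'n_studies': 2, 'mean_value': 30.0}
```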
I hope you find this useful.

Thursday, December 5, 2019

Linear Regression using R and Python

R and Python

Sometimes we want not only a descriptive analysis but also predictions based on past trends. We will look at techniques we can use to predict the number of studies submitted (registered) in future years.

We will see how to use Python libraries such as pandas, statsmodels, and matplotlib.
The Python code is available in my GitHub repository:
https://github.com/kalehdoo/clintrials/blob/master/ctrials_1.py
I will try to explain the steps and procedures of the analysis in a later post.
Here is the final outcome:
Model Summary:

Studies submitted predicted:
For 2019 : 31,022 
For 2020 : 32,479

The data used for the regression is from 2005-2018.
2019 is not over yet, so I will come back next year to compare the 2019 actuals with the numbers predicted here. The actual count for 2019 so far is 19,990 (data until Aug 2019), as reported in one of my previous posts.
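To show what the regression step does, here is a minimal sketch of simple linear regression: yearly counts regressed on year, fitted by ordinary least squares in pure Python. I am assuming a single-predictor model (year). For such a model, statsmodels' OLS with an intercept would give the same coefficients. The numbers below are toy values chosen to lie on a straight line. They are not the registration counts behind the predictions above.

```python
# Toy yearly counts (NOT real registration data), chosen to lie exactly on a line.
years = [2015, 2016, 2017, 2018]
counts = [100, 110, 120, 130]

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(years, counts)
prediction_2019 = intercept + slope * 2019  # extrapolate one year past the data
print(round(slope, 6), round(prediction_2019, 6))  # 10.0 140.0
```

With real data the points will not fall exactly on a line, which is why a library such as statsmodels is useful: its model summary reports R-squared, standard errors, and p-values alongside the coefficients.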

I have also shown how to do the regression in R and how to interpret the results. The link below has the complete code and analytics:
http://rpubs.com/kalehdoo/sponsor_analytics


Source data extracted from: https://aact.ctti-clinicaltrials.org