
Thursday, November 10, 2022

Docker Container Setup for AWS Glue Dev Environment

I was setting up a local development environment so that I could run AWS Glue jobs locally before running them on AWS.

Amazon already publishes a Docker image on Docker Hub. Below are my notes on configuring it on a Mac.

Docker Hub:

https://hub.docker.com/r/amazon/aws-glue-libs/tags


1. Pull the Docker image locally:


docker pull amazon/aws-glue-libs:glue_libs_3.0.0_image_01


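2. One-time step: create the container for Jupyter (with rw privileges). The rw mode on the ~/.aws mount is needed so that aws configure, run inside the container, can modify the AWS credentials file. Also mount a local folder, which can be a Git repository, to /home/jupyter/jupyter_default_dir. Note that the command below uses the glue_libs_1.0.0_image_01 tag and that image's start script (/home/jupyter/jupyter_start.sh). If this tag differs from the one pulled in step 1, docker run pulls it first.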


docker run -itd -p 8888:8888 -p 4040:4040 -v ~/.aws:/root/.aws:rw -v /Users/manoharrana/Documents/GitHub/sandbox/awsGlueDev:/home/jupyter/jupyter_default_dir --name glue_jupyter amazon/aws-glue-libs:glue_libs_1.0.0_image_01 /home/jupyter/jupyter_start.sh


3. Now you can run docker commands to see the details of the container we just created.

Open a terminal and start running docker commands:

List all containers (running or stopped):

docker ps -a

List only running containers:

docker ps


To open a bash shell inside the container and run commands there, for example aws configure:

docker exec -it glue_jupyter bash

aws configure
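If you want to check the container from a script instead of reading the docker ps table, docker ps can print one JSON object per container with --format '{{json .}}'. Below is a small Python sketch that picks a container's status out of that output. The helper names container_status and is_running are hypothetical, and the sketch relies only on the Names and Status fields of that JSON.

```python
import json


def container_status(ps_json_output, name):
    """Return the Status string for container `name`, or None if it is absent.

    `ps_json_output` is the text printed by:
        docker ps -a --format '{{json .}}'
    (one JSON object per line).
    """
    for line in ps_json_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        info = json.loads(line)
        # "Names" can hold several comma-separated names
        if name in info.get("Names", "").split(","):
            return info.get("Status")
    return None


def is_running(status):
    """docker ps reports running containers with a Status like 'Up 12 minutes'."""
    return status is not None and status.startswith("Up")
```

For example, pass it the stdout of subprocess.run(["docker", "ps", "-a", "--format", "{{json .}}"], capture_output=True, text=True).stdout and look up "glue_jupyter". A running container's status starts with "Up" and a stopped one's with "Exited".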


4. Start the container by ID or name in interactive mode. Use the name you gave the container in step 2:

The -ai flags (attach and interactive) are important: a plain docker start did not start Spark for me. The container can also be started from Docker Desktop, but when I start it from there, the Spark engine doesn't start. Hence, use docker start -ai as shown below. Run it in a terminal and keep that terminal open. If the container is already running, stop it first and then start it this way. Spark takes about 8-10 minutes to come up:


docker start -ai ebbdebcded48

OR

docker start -ai glue_jupyter


5. Open Jupyter:

http://localhost:8888/


Zeppelin (only if you have also created a separate container named glue_zeppelin):

docker exec -it glue_zeppelin bash


Spark UI:

http://localhost:4040/
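Because Spark can take several minutes to come up, you can poll the ports from the host instead of refreshing the browser. The following is a minimal Python sketch using only the standard library. The function name wait_for_http and the timeout and interval defaults are my own assumptions, not part of the Glue image. It treats any HTTP response, including an error status such as 403, as "the server is listening", and treats connection errors as "not up yet". The Spark UI on port 4040 is served by a running Spark application, so it may only answer once a Spark session has started.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout_s=900, interval_s=15,
                  opener=urllib.request.urlopen,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll `url` until the server answers or `timeout_s` seconds pass.

    Returns True as soon as any HTTP response comes back, False on timeout.
    `opener`, `sleep` and `clock` are injectable so the loop can be tested.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            with opener(url, timeout=5):
                return True  # a normal response: the server is up
        except urllib.error.HTTPError:
            return True  # an HTTP error status still means the server answered
        except OSError:
            pass  # connection refused/reset or timeout: not up yet
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

For example, wait_for_http("http://localhost:8888/") returns True once Jupyter answers, or False after about 15 minutes with the default settings.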


6. (Optional) Set up AWS credentials:

Open a shell in the container:

docker exec -it glue_jupyter bash


Start the AWS configuration:

aws configure


Enter your values at the aws configure prompts:

AWS Access Key ID [None]: 'Your Access Key ID'

AWS Secret Access Key [None]: 'Your Secret Key'

Default region name [None]: us-east-2

Default output format [None]: json
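Because step 2 mounts ~/.aws into the container read-write, aws configure stores the access keys in ~/.aws/credentials and the region and output format in ~/.aws/config, both under a [default] section. Those files persist on the Mac. The sketch below uses Python's standard configparser to confirm that the default profile is set without printing the secret. The function name read_default_profile is hypothetical.

```python
import configparser
from pathlib import Path


def read_default_profile(aws_dir="~/.aws"):
    """Summarize the [default] profile written by `aws configure`.

    Reports whether both keys are present (never the key values themselves)
    plus the region and output format, or None for anything missing.
    """
    aws_dir = Path(aws_dir).expanduser()
    creds = configparser.ConfigParser()
    creds.read(aws_dir / "credentials")  # missing files are silently skipped
    config = configparser.ConfigParser()
    config.read(aws_dir / "config")
    has_keys = (creds.has_option("default", "aws_access_key_id")
                and creds.has_option("default", "aws_secret_access_key"))
    return {
        "has_keys": has_keys,
        "region": config.get("default", "region", fallback=None),
        "output": config.get("default", "output", fallback=None),
    }
```

With the values above, read_default_profile() should report has_keys as True, region as us-east-2 and output as json.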


7. To stop the Docker container:

docker stop <container-id>

OR

docker stop glue_jupyter


8. To start the container again:


docker start -ai glue_jupyter


A few screenshots:





Monday, August 2, 2021

Grafana Cloud - Connect to PostgreSQL in Heroku

Providing the host name, database, user name and password was just not enough; it only worked after some trial and error with the other settings.

Grafana is wonderful open-source business intelligence software that you can either host on your own premises or use through its cloud offering. I had my Django app deployed on Heroku with a PostgreSQL database.

I tried to create a new connection in Grafana Cloud to the Heroku PostgreSQL database with the default settings, but it kept returning HTTP error 400 (Bad Request). The error message gave no further details.

The default setting for TLS/SSL mode is 'verify-full'. I had to change it to 'require' and the connection was successful. Below is a screenshot with all the settings.

Screenshot: Grafana PostgreSQL connection settings
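Heroku exposes the database connection as a single DATABASE_URL config var of the form postgres://USER:PASSWORD@HOST:PORT/DBNAME, which you can read with heroku config:get DATABASE_URL. The Python sketch below splits it into the Host, Database, User and Password fields that Grafana's PostgreSQL data source asks for, and sets the TLS/SSL mode to require. The function name is hypothetical, and the sketch assumes the user name and password contain no percent-encoded characters. For context, in libpq terms require encrypts the connection, while verify-full additionally checks the server certificate against a trusted CA and the host name.

```python
from urllib.parse import urlparse


def grafana_fields_from_database_url(database_url):
    """Split a Heroku-style DATABASE_URL into Grafana PostgreSQL data source fields.

    Assumes the form postgres://USER:PASSWORD@HOST:PORT/DBNAME with no
    percent-encoded characters in the user name or password.
    """
    u = urlparse(database_url)
    if u.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unexpected scheme: {u.scheme!r}")
    return {
        "host": f"{u.hostname}:{u.port or 5432}",  # Grafana's Host field takes host:port
        "database": u.path.lstrip("/"),
        "user": u.username,
        "password": u.password,
        "sslmode": "require",  # what worked here instead of the default verify-full
    }
```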

If you are in the same situation, just give it a try. Feel free to post any questions or comments.


Thursday, March 5, 2020

Sponsor Key Performance Indicators - Collaboration

Sponsor Performance KPI - Collaboration

In part 1 of the sponsor KPIs, we learned that the number of studies completed and the number of studies with posted results are two important indicators of a sponsor's performance. In this part, we will look at other metrics related to lead times and collaboration, and try to dig deeper into why collaboration is important.

Clinical research is a capital-intensive business with very long lead times and a very low success rate, which makes it one of the riskier businesses. Clinical trials require high investments in financial capital as well as in human capital. The lead time (the time taken to complete all 3 phases of study before the drug gets FDA approval) is in the range of 8-13 years. Historically, the average time to complete a study phase has been around 2.62 years. However, phase 1 trials are shorter, taking around 1.84 years, compared to phase 2 and phase 3 studies. The longer a study runs, the higher its costs. Hence, it is in the interest of both sponsors and end consumers (patients) to reduce the lead time, which in turn helps reduce the prices of patented drugs.

In order to understand how the lead time can be reduced, we first need to understand the factors that affect the duration of a study. Sponsors and CROs have developed various methods to increase operational efficiency based on their long experience in designing and conducting studies. There is room for improvement there as well, but I will not discuss it here.

The availability of sufficient patients (subjects) to conduct a trial is an important factor, and recruiting them is time-consuming. Broadly, there are two ways sponsors find patients.

Firstly, sponsors advertise through various channels to find patients. Social media such as Facebook, and other digital channels like blogs and websites, are increasingly used to advertise for clinical trial recruitment. Sponsors/CROs usually pay some compensation for a patient's time and travel, ranging from $500 to $9000.
The second method of recruiting patients is through physician networks: sponsors/CROs have physicians in their network who have databases of their patients.
Of the many patients who show interest in participating in a trial, many fail the screening process and become ineligible to enroll.
Despite all their efforts, sponsors are often not able to enroll enough patients. Insufficient enrollment forces sponsors to close many sites (facilities such as hospitals or physician clinics).
Finally, insufficient sites result in discontinuation of the study.
The competition to recruit patients has increased, with many sponsors recruiting for the same patient profile. The sponsors also have different competencies. For example, commercial industry sponsors have better operational efficiency and a global network (the figure below shows that industry sponsors run more global studies), whereas non-industry sponsors such as hospitals have a strong patient base that is limited to a local geographic area.
Figure: Sponsor Collaborators vs. Number of Countries
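As a rough illustration of how a comparison like "industry sponsors run more global studies" could be computed, here is a Python sketch. For each lead-sponsor class, it takes the share of studies run in more than one country. The record layout (lead_class, countries) is a hypothetical simplification, not the actual schema of the data behind the dashboard.

```python
from collections import defaultdict


def global_share_by_class(studies):
    """Fraction of studies run in more than one country, per lead-sponsor class.

    Each study is a dict with hypothetical keys:
      "lead_class": e.g. "industry" or "non-industry"
      "countries":  list of country codes where the study has sites
    """
    totals = defaultdict(int)
    multi = defaultdict(int)
    for s in studies:
        totals[s["lead_class"]] += 1
        if len(set(s["countries"])) > 1:  # duplicates don't make a study global
            multi[s["lead_class"]] += 1
    return {cls: multi[cls] / totals[cls] for cls in totals}
```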

Therefore, it makes sense for sponsors to collaborate more with other sponsors and leverage their strengths for a win-win situation. 
For example, for a local hospital or research university conducting a diabetes study in Raleigh, it makes sense to collaborate with non-industry sponsors in other cities. This increases the number of study sites and the diversity of the patient population, which ultimately improves the quality of the entire study.
For the same local hospital, collaborating with industry sponsors that have a global network and expertise in conducting trials in different countries can not only expand its patient base but also let it leverage their expertise in conducting global studies with high operational efficiency.
There is a greater need for sponsors to collaborate instead of competing to recruit patients. They should come together and explore possibilities such as combining studies or sharing detailed data obtained in previous trials.
Now that we know why collaboration is important for sponsors, let's see how industry and non-industry sponsors have been collaborating.
I have created an interactive sponsor dashboard for sponsors who have completed at least 10 studies, mainly to eliminate very small sponsors and to reduce the volume of data. Hover over the data points to see details.
On the KPI-1 tab of the dashboard, you will see a chart showing collaboration, as in the figure below:
Figure: Sponsor Collaboration

A few observations: only some of the bigger industry sponsors collaborate substantially with non-industry sponsors, and non-industry sponsors have collaborated more with other non-industry sponsors than with industry sponsors.
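The filtering and counting behind a chart like this can be sketched in Python. The sketch keeps only lead sponsors with at least 10 completed studies, then tallies each (lead sponsor class, collaborator class) pair. The field names (lead_sponsor, lead_class, status, collaborator_classes) and the choice to count collaborations across all of an eligible sponsor's studies are my assumptions, not the dashboard's actual implementation.

```python
from collections import Counter


def collaboration_counts(studies, min_completed=10):
    """Count (lead class, collaborator class) pairs for sufficiently large sponsors.

    Each study is a dict with hypothetical keys:
      "lead_sponsor", "lead_class", "status", "collaborator_classes" (list)
    Only lead sponsors with at least `min_completed` completed studies are kept.
    """
    completed = Counter(s["lead_sponsor"] for s in studies
                        if s["status"] == "Completed")
    eligible = {name for name, n in completed.items() if n >= min_completed}
    pairs = Counter()
    for s in studies:
        if s["lead_sponsor"] not in eligible:
            continue  # drop very small sponsors, as the dashboard does
        for collab_class in s["collaborator_classes"]:
            pairs[(s["lead_class"], collab_class)] += 1
    return pairs
```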
Till next time!