Thursday, November 10, 2022

Docker Container Setup for AWS Glue Dev Environment

I wanted to configure a local development environment so that I could run AWS Glue jobs locally before running them on the AWS cloud.

Amazon already publishes a Docker image on Docker Hub. Below are my notes on configuring it on a Mac.

Docker Hub:

https://hub.docker.com/r/amazon/aws-glue-libs/tags


1. Pull Docker Image locally:


docker pull amazon/aws-glue-libs:glue_libs_1.0.0_image_01
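
If you repeat this setup often, it can help to skip the pull when the image is already on the machine. A minimal sketch, assuming a hypothetical helper named ensure_image (it is not a Docker command); it relies only on docker image inspect and docker pull:

```shell
# ensure_image is a hypothetical helper name, not part of Docker.
# It pulls the image only when `docker image inspect` cannot find it locally.
ensure_image() {
  image="$1"
  if docker image inspect "$image" >/dev/null 2>&1; then
    echo "already present: $image"
  else
    docker pull "$image"
  fi
}

# Usage (pass the same tag as in the pull command above):
# ensure_image amazon/aws-glue-libs:<tag>
```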


2. One-time step: create the container for Jupyter (with rw privileges). The rw mode on the ~/.aws mount is needed so that aws configure, run inside the container, can modify the AWS credentials file. Also mount a local folder, which can be a Git repository, as the Jupyter working directory.


docker run -itd -p 8888:8888 -p 4040:4040 -v ~/.aws:/root/.aws:rw -v /Users/manoharrana/Documents/GitHub/sandbox/awsGlueDev:/home/jupyter/jupyter_default_dir --name glue_jupyter amazon/aws-glue-libs:glue_libs_1.0.0_image_01 /home/jupyter/jupyter_start.sh
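
The long command above is easier to adapt when the host folder and the image tag are variables. A sketch under assumptions: create_glue_jupyter and the GLUE_IMAGE variable are hypothetical names introduced here; the flags, ports, and container paths are the same ones used in the command above.

```shell
# Hypothetical wrapper around the docker run command above.
# $1 = host folder to mount as the Jupyter working directory.
# Ports: 8888 is Jupyter, 4040 is the Spark UI.
create_glue_jupyter() {
  work_dir="$1"
  # GLUE_IMAGE is an assumed override; the default is the tag used above.
  image="${GLUE_IMAGE:-amazon/aws-glue-libs:glue_libs_1.0.0_image_01}"
  docker run -itd \
    -p 8888:8888 \
    -p 4040:4040 \
    -v "$HOME/.aws:/root/.aws:rw" \
    -v "$work_dir:/home/jupyter/jupyter_default_dir" \
    --name glue_jupyter \
    "$image" \
    /home/jupyter/jupyter_start.sh
}

# Usage:
# create_glue_jupyter "$HOME/Documents/GitHub/sandbox/awsGlueDev"
```

Quoting the mount arguments keeps the command working when the host path contains spaces.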


3. Now you can run docker commands to see the details of the container we just created.

Open a terminal and start running docker commands:

List all containers (started or stopped):

docker ps -a

List only running containers:

docker ps
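
To check a single container instead of scanning the whole list, docker inspect can print just its state. A small sketch; container_state is a hypothetical helper name, and "missing" is a label chosen here for the case where inspect fails:

```shell
# Hypothetical helper: print the state of a named container.
container_state() {
  name="$1"
  # .State.Status is e.g. "running" or "exited"; inspect fails if the name is unknown.
  state=$(docker inspect -f '{{.State.Status}}' "$name" 2>/dev/null) || state="missing"
  echo "$state"
}

# Usage:
# container_state glue_jupyter
```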


To open a bash shell inside the Jupyter container and run commands there, for example aws configure:

docker exec -it glue_jupyter bash

aws configure


4. Start a container by ID or name in interactive mode. Use the name given when creating the container in step 2:

The -ai flags are important: a plain docker start does not bring up Spark. The container can also be started from Docker Desktop, but when I start it that way the Spark engine doesn't start. So start it with docker start -ai (attached, interactive) as shown below; if the container is already running, stop it first and then start it again. Run the command in a terminal and keep that terminal open. Spark takes almost 8-10 minutes to come up:


docker start -ai ebbdebcded48

OR

docker start -ai glue_jupyter
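
Since startup can take several minutes, a polling loop in a second terminal saves repeated browser refreshes. A sketch under assumptions: wait_for_url is a hypothetical helper, and the 900-second timeout and 10-second interval are defaults picked here, not values from the image. It only shows that the port answers HTTP requests, which is not proof that Spark itself is ready.

```shell
# Hypothetical helper: poll a URL until it answers or a timeout (seconds) passes.
wait_for_url() {
  url="$1"
  timeout="${2:-900}"   # assumed default, above the 8-10 minute startup
  interval="${3:-10}"   # assumed pause between attempts
  waited=0
  # -f makes curl fail on HTTP errors (status 400 and above); -o discards the body.
  until curl -fsS -o /dev/null "$url" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s: $url"
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "up after ${waited}s: $url"
}

# Usage, from a second terminal while docker start -ai runs in the first:
# wait_for_url http://localhost:8888/
```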


5. Open Jupyter:

http://localhost:8888/


Zeppelin (shell into a container named glue_zeppelin, which is not created in these notes):

docker exec -it glue_zeppelin bash


Spark UI:

http://localhost:4040/
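
Port 4040 is served by the Spark driver, so it only answers while a Spark application is running. A quick command-line check, as a sketch: spark_ui_up is a hypothetical helper name, while /api/v1/applications is the endpoint of Spark's monitoring REST API.

```shell
# Hypothetical helper: succeed only if the Spark UI's REST endpoint answers.
spark_ui_up() {
  base="${1:-http://localhost:4040}"
  # Prints JSON describing the running application when Spark is up.
  curl -fsS "$base/api/v1/applications" 2>/dev/null
}

# Usage:
# if spark_ui_up >/dev/null; then echo "Spark is running"; else echo "no Spark application yet"; fi
```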


6. (Optional) Set up AWS credentials:

Enter shell:

docker exec -it glue_jupyter bash


Start the AWS configuration:

aws configure


Enter the aws configure values:

AWS Access Key ID [None]: 'Your Access Key ID'

AWS Secret Access Key [None]: 'Your Secret Key'

Default region name [None]: us-east-2

Default output format [None]: json
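
Because ~/.aws is mounted read-write at /root/.aws, the keys entered inside the container should end up in ~/.aws/credentials on the Mac. A rough check from the host, as a sketch: has_default_profile is a hypothetical helper, and it only looks for the [default] header and the two key names anywhere in the file, not for valid keys.

```shell
# Hypothetical helper: check that a credentials file contains a [default]
# header and the two key entries that `aws configure` writes.
has_default_profile() {
  file="${1:-$HOME/.aws/credentials}"
  [ -f "$file" ] &&
    grep -q '^\[default\]' "$file" &&
    grep -q '^aws_access_key_id' "$file" &&
    grep -q '^aws_secret_access_key' "$file"
}

# Usage:
# if has_default_profile; then echo "credentials found"; else echo "run aws configure first"; fi
```

Running aws sts get-caller-identity inside the container is another way to confirm that the keys actually work.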


7. To stop the docker container:

docker stop idofcontainer

OR

docker stop glue_jupyter


8. To start a container again:


docker start -ai glue_jupyter
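
The stop-then-start sequence from steps 7 and 8 can be wrapped in one function. A minimal sketch; restart_glue is a hypothetical helper name, and the default container name is the one used in these notes:

```shell
# Hypothetical helper: stop the container if it is running, then start it
# attached (-a) and interactive (-i), as step 4 recommends.
restart_glue() {
  name="${1:-glue_jupyter}"
  # Ignore the error from stop when the container is not running.
  docker stop "$name" >/dev/null 2>&1 || true
  docker start -ai "$name"
}

# Usage (keep the terminal open afterwards):
# restart_glue
```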

