Thursday, November 10, 2022

Docker Container Setup for AWS Glue Dev Environment

I wanted a local development environment so that I can run AWS Glue jobs locally before running them on AWS.

Amazon already publishes a Docker image on Docker Hub. Below are my notes on configuring it on a Mac.

Docker Hub:

https://hub.docker.com/r/amazon/aws-glue-libs/tags


1. Pull the Docker image locally. Use the same tag that the docker run command in step 2 references:


docker pull amazon/aws-glue-libs:glue_libs_1.0.0_image_01


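2. One-time step: create the container for Jupyter (with rw privileges). The rw mode is needed so that aws configure, run inside the container, can modify the AWS credentials file. Also mount a local folder, which can be Git-enabled. Replace the host path in the second -v option with your own folder: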


docker run -itd -p 8888:8888 -p 4040:4040 -v ~/.aws:/root/.aws:rw -v /Users/manoharrana/Documents/GitHub/sandbox/awsGlueDev:/home/jupyter/jupyter_default_dir --name glue_jupyter amazon/aws-glue-libs:glue_libs_1.0.0_image_01 /home/jupyter/jupyter_start.sh
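
The command in step 2 hard-codes one machine's folder, so a small wrapper can make the parts that vary explicit. This is only a sketch: the function name glue_run_cmd and its two parameters are hypothetical, and it only prints the command so that it can be checked before it is pasted into a terminal. Paths containing spaces would need quoting when pasted.

```shell
#!/bin/sh
# Hypothetical helper (not part of the image or of Docker): print the step 2
# command for a given host folder. Nothing is executed; the line is only echoed.
glue_run_cmd() {
    host_dir="$1"                # local (Git-enabled) folder to mount
    name="${2:-glue_jupyter}"    # container name used throughout these notes
    printf '%s ' \
        docker run -itd \
        -p 8888:8888 -p 4040:4040 \
        -v "$HOME/.aws:/root/.aws:rw" \
        -v "$host_dir:/home/jupyter/jupyter_default_dir" \
        --name "$name" \
        amazon/aws-glue-libs:glue_libs_1.0.0_image_01 \
        /home/jupyter/jupyter_start.sh
    printf '\n'
}
```

For example, glue_run_cmd "$HOME/Documents/GitHub/sandbox/awsGlueDev" prints the same command as above with the folder filled in.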


3. Now you can run docker commands to see the details of the container we just created.

Open a terminal and run:

List all containers(started or stopped):

docker ps -a

List only running containers:

docker ps


To open a bash shell inside the Jupyter container, for example to run aws configure:

docker exec -it glue_jupyter bash

aws configure
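
For scripting, the listing commands in step 3 can be narrowed to a yes/no check on a single container. The sketch below assumes the container name glue_jupyter from step 2; the function name glue_is_running is hypothetical. It relies on docker inspect with a Go template, which prints true or false for the container's running state.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) only if the container is running.
glue_is_running() {
    container="${1:-glue_jupyter}"
    # docker inspect fails if the container does not exist; hide that error
    # and treat it the same as "not running".
    state="$(docker inspect -f '{{.State.Running}}' "$container" 2>/dev/null)"
    [ "$state" = "true" ]
}
```

A typical use would be: glue_is_running && docker exec -it glue_jupyter bash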


4. Start the container by ID or name in interactive mode. Use the name given when creating the container in step 2.

The -ai flags are important: a plain docker start does not bring Spark up. The container can also be started from Docker Desktop, but when I start it that way the Spark engine doesn't start. So stop the container if it is running, then start it with docker start -ai in a terminal, as shown below, and keep that terminal open. Spark takes almost 8-10 minutes to come up:


docker start -ai ebbdebcded48

OR

docker start -ai glue_jupyter
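
Because startup can take several minutes, it may help to poll from a second terminal instead of refreshing the browser. This is a minimal sketch: the function name wait_for_url is hypothetical, it assumes curl is installed on the Mac, and it only shows that the URL answers over HTTP. Whether Jupyter answering on port 8888 also means Spark is ready is an assumption; the Spark UI on port 4040 can be polled the same way.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
    url="$1"
    attempts="${2:-60}"    # 60 tries x 10 s = 10 minutes, in line with the note above
    delay="${3:-10}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -s silences progress output; -f makes HTTP errors (4xx/5xx) count as failure
        if curl -sf -o /dev/null "$url"; then
            echo "up after $i attempt(s): $url"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "gave up waiting for $url" >&2
    return 1
}
```

For example: wait_for_url http://localhost:8888/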


5. Open Jupyter:

http://localhost:8888/


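Zeppelin shell (applies only if you also have a container named glue_zeppelin; the steps above create only glue_jupyter):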

docker exec -it glue_zeppelin bash


Spark UI:

http://localhost:4040/


6. (Optional) Setup AWS credentials:

Enter shell:

docker exec -it glue_jupyter bash


Start the AWS configuration:

aws configure


Enter your values at the aws configure prompts:

AWS Access Key ID [None]: 'Your Access Key ID'

AWS Secret Access Key [None]: 'Your Secret Key'

Default region name [None]: us-east-2

Default output format [None]: json
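
After entering the values, one way to confirm that the credentials work is to ask AWS which identity they belong to. The sketch below assumes the container name glue_jupyter from step 2 and uses the AWS CLI inside the container, as these notes already do for aws configure; the function name glue_whoami is hypothetical. On success, aws sts get-caller-identity prints a small JSON document with UserId, Account and Arn; with wrong keys it prints an error instead.

```shell
#!/bin/sh
# Hypothetical helper: show the AWS identity behind the stored credentials.
# Runs the AWS CLI inside the container rather than on the host.
glue_whoami() {
    container="${1:-glue_jupyter}"
    docker exec "$container" aws sts get-caller-identity --output json
}
```

Since ~/.aws is mounted into the container in step 2, the same credentials file is also visible on the host under ~/.aws.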


7. To stop the docker container:

docker stop idofcontainer

OR

docker stop glue_jupyter


8. To start a container again:


docker start -ai glue_jupyter


A few screenshots: