I was configuring a local development environment so I could test AWS Glue jobs locally before running them on the AWS cloud.
Amazon already provides a Docker image for this on Docker Hub. Below are my notes for setting it up on a Mac.
Docker Hub:
https://hub.docker.com/r/amazon/aws-glue-libs/tags
1. Pull the Docker image locally (use the same tag as the docker run command in step 2):
docker pull amazon/aws-glue-libs:glue_libs_1.0.0_image_01
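If you rerun this setup often, it can help to pull only when the image is missing. Below is a minimal sketch for bash or zsh. ensure_glue_image is a hypothetical helper name, and the DOCKER variable is an override I added so the commands can be dry-run (for example DOCKER=echo). It relies on docker image inspect exiting with a non-zero status when the image is not present locally.

```shell
# Hypothetical helper: pull a Docker image only if it is not already cached locally.
# DOCKER defaults to "docker"; set DOCKER=echo to print commands instead of running them.
ensure_glue_image() {
  local image="$1"
  if [ -z "$image" ]; then
    echo "usage: ensure_glue_image <image:tag>" >&2
    return 2
  fi
  # docker image inspect exits non-zero when the image is not present locally
  if "${DOCKER:-docker}" image inspect "$image" >/dev/null 2>&1; then
    echo "Image already present: $image"
  else
    "${DOCKER:-docker}" pull "$image"
  fi
}
```

Usage: ensure_glue_image amazon/aws-glue-libs:glue_libs_1.0.0_image_01 (pass the same tag that the step 2 docker run command uses).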
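2. Create the Jupyter container (one time only, with read-write access): The rw mode on the ~/.aws mount is needed so that aws configure can modify the AWS credentials file from inside the container. Also mount a local folder (which can be a Git repository) as the Jupyter default directory, so your notebooks are saved on the host.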
docker run -itd -p 8888:8888 -p 4040:4040 -v ~/.aws:/root/.aws:rw -v /Users/manoharrana/Documents/GitHub/sandbox/awsGlueDev:/home/jupyter/jupyter_default_dir --name glue_jupyter amazon/aws-glue-libs:glue_libs_1.0.0_image_01 /home/jupyter/jupyter_start.sh
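The command above hardcodes one user's workspace path. A sketch of the same docker run as a reusable function, with the workspace folder and image tag as parameters, might look like this. run_glue_jupyter and the DOCKER override are hypothetical names; the ports, mounts, container name, and start script are copied from the command above. $HOME is used instead of ~ because the shell does not expand ~ inside double quotes.

```shell
# Hypothetical wrapper around the step 2 docker run command.
# Arguments: 1) local workspace folder (required), 2) image tag (optional).
# DOCKER defaults to "docker"; set DOCKER=echo for a dry run.
run_glue_jupyter() {
  local workspace="$1"
  local image="${2:-amazon/aws-glue-libs:glue_libs_1.0.0_image_01}"
  if [ -z "$workspace" ]; then
    echo "usage: run_glue_jupyter <workspace-dir> [image]" >&2
    return 2
  fi
  # -itd: interactive, with a TTY, detached; 8888 is Jupyter, 4040 is the Spark UI
  "${DOCKER:-docker}" run -itd \
    -p 8888:8888 -p 4040:4040 \
    -v "$HOME/.aws:/root/.aws:rw" \
    -v "$workspace:/home/jupyter/jupyter_default_dir" \
    --name glue_jupyter \
    "$image" /home/jupyter/jupyter_start.sh
}
```

Usage: run_glue_jupyter "$HOME/Documents/GitHub/sandbox/awsGlueDev". Because the container name is fixed, a second run fails with a name conflict until the existing glue_jupyter container is removed, which is why this is a one-time step.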
3. You can now run docker commands to inspect the container you just created.
Open a terminal and run:
List all containers (running or stopped):
docker ps -a
List only running containers:
docker ps
To open a bash shell inside the container, for example to run aws configure:
docker exec -it glue_jupyter bash
aws configure
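To check just this container instead of listing every container, docker ps can filter by name and print selected columns with a Go template. A minimal sketch; glue_status and the DOCKER override are hypothetical names. Note that the name filter also matches partial names, so glue_jupyter would also match a container called glue_jupyter2.

```shell
# Hypothetical helper: show the name and status of containers whose name matches.
# The name filter matches substrings, so "glue" would also match glue_jupyter.
# DOCKER defaults to "docker"; set DOCKER=echo for a dry run.
glue_status() {
  "${DOCKER:-docker}" ps -a \
    --filter "name=${1:-glue_jupyter}" \
    --format '{{.Names}} {{.Status}}'
}
```

Running glue_status prints one line per matching container with its name and current status; glue_status glue_zeppelin checks a different container.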
4. Start the container by ID or name in interactive mode, using the name you gave it in step 2:
Adding -ai is important (-a attaches the container's output, -i attaches STDIN). In my experience, a plain docker start, or starting the container from Docker Desktop, does not start the Spark engine. So stop the container first, then start it with docker start -ai as shown below. Run it in a terminal and keep that terminal open. Spark took about 8-10 minutes to come up:
docker start -ai ebbdebcded48
OR
docker start -ai glue_jupyter
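Because docker start -ai keeps the terminal attached and startup takes several minutes, a second terminal can poll until a web endpoint answers instead of refreshing the browser by hand. A minimal sketch, assuming curl is installed (it ships with macOS) and that a service counts as up once it returns any HTTP response; wait_for_url is a hypothetical name. Polling the Jupyter port 8888 from step 2 is an assumption about what "ready" means, so adjust the URL to whatever you are actually waiting for.

```shell
# Hypothetical helper: poll a URL until it answers or a timeout expires.
# Arguments: 1) URL, 2) timeout in seconds (default 900), 3) poll interval (default 15).
# Any HTTP response counts as "up"; connection errors are retried.
wait_for_url() {
  local url="$1" timeout="${2:-900}" interval="${3:-15}"
  local deadline=$(( $(date +%s) + timeout ))
  # curl exits non-zero on connection failures and when --max-time is exceeded
  until curl --silent --output /dev/null --max-time 5 "$url"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "Timed out after ${timeout}s waiting for $url" >&2
      return 1
    fi
    sleep "$interval"
  done
  echo "$url is up"
}
```

Usage, in a second terminal: wait_for_url http://localhost:8888. The default 15-minute timeout leaves headroom over the 8-10 minutes I observed.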
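5. Open Jupyter in a browser at http://localhost:8888 (the port mapped in step 2):
Zeppelin (only if you also created a separate container named glue_zeppelin; the steps above create only glue_jupyter):
docker exec -it glue_zeppelin bash
Spark UI: http://localhost:4040 (also mapped in step 2)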
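On a Mac, the built-in open command launches a URL in the default browser, so both pages can be opened from the terminal. A sketch that assumes the port mappings from step 2 (8888 for Jupyter, 4040 for the Spark UI); open_glue_uis and the OPENER override are hypothetical names. The Spark UI typically only responds while a Spark application is running in the container.

```shell
# Hypothetical helper: open the Jupyter and Spark UI pages in the default browser.
# Uses the macOS "open" command by default; set OPENER=echo to just print the URLs,
# or OPENER=xdg-open on a Linux desktop.
open_glue_uis() {
  local url
  for url in http://localhost:8888 http://localhost:4040; do
    "${OPENER:-open}" "$url"
  done
}
```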
6. (Optional) Set up AWS credentials:
Open a shell in the container:
docker exec -it glue_jupyter bash
Start the AWS CLI configuration:
aws configure
# Enter these values when prompted:
AWS Access Key ID [None]: 'Your Access Key ID'
AWS Secret Access Key [None]: 'Your Secret Key'
Default region name [None]: us-east-2
Default output format [None]: json
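The region and output format can also be set without the interactive prompts by running aws configure set through docker exec. A sketch, assuming the AWS CLI is on the PATH inside the container (the interactive step above relies on that too); configure_glue_aws and the DOCKER override are hypothetical names. I left the access keys out on purpose: passing secrets as command-line arguments stores them in your shell history, so keep entering those interactively. Because step 2 mounts ~/.aws read-write at /root/.aws, the settings should also show up in the host's ~/.aws/config, assuming the CLI in the container runs as root.

```shell
# Hypothetical helper: set non-secret AWS CLI defaults inside the container.
# Arguments: 1) container name (default glue_jupyter), 2) region (default us-east-2).
# DOCKER defaults to "docker"; set DOCKER=echo for a dry run.
configure_glue_aws() {
  local container="${1:-glue_jupyter}" region="${2:-us-east-2}"
  # Each step only runs if the previous one succeeded; the last one prints
  # the resulting configuration with the keys masked.
  "${DOCKER:-docker}" exec "$container" aws configure set region "$region" &&
    "${DOCKER:-docker}" exec "$container" aws configure set output json &&
    "${DOCKER:-docker}" exec "$container" aws configure list
}
```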
7. To stop the Docker container:
docker stop <container-id>
OR
docker stop glue_jupyter
8. To start the container again:
docker start -ai glue_jupyter
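Step 4 notes that Spark only came up after stopping the container and starting it again with -ai. A sketch that wraps both commands; restart_glue_jupyter and the DOCKER override are hypothetical names. The start only runs if the stop succeeded, and the terminal stays attached afterwards, as in step 4.

```shell
# Hypothetical helper: stop the container, then start it attached (-a) and interactive (-i).
# Argument: container name or ID (default glue_jupyter).
# DOCKER defaults to "docker"; set DOCKER=echo for a dry run.
restart_glue_jupyter() {
  local name="${1:-glue_jupyter}"
  "${DOCKER:-docker}" stop "$name" && "${DOCKER:-docker}" start -ai "$name"
}
```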