mikha asked:

How to download code via git inside a Docker container?

I have a sample Dockerfile that I want to use to build containers for my data science project (sample below). So far this works, but the code I want to run is in GitLab, and I would like to download that code and run it in this container. I am looking for examples of how to do this and have a few questions around it.


1. On my local machine, SSH keys are set up with my GitLab account, so for testing purposes I should be able to pull down the code. But after I download the code, do I just put it in the working directory? And if there are other artifacts or data that need to be downloaded as well, where should I keep them? (I have put a rough sketch of what I have in mind after the Dockerfile below.)


2. Also, testing locally, my code reads a CSV file. Where should I put the CSV file? I know Docker has temporary storage that we can set up; I assume the way to go is to set up a volume and then just put the files in a mounted directory, or something similar.


3. GitLab over SSH will work on my local machine, but if I want to run this in AWS or some other cloud environment, that instance/machine won't have SSH keys set up with my GitLab account. How are these things usually done?





```

FROM python:3.6-alpine

WORKDIR /var/www/

# pull code from git (see question above)

# SOFTWARE PACKAGES
#   * dumb-init: minimal init system for containers
#   * musl: standard C library
#   * libc6-compat: compatibility libraries for glibc
#   * linux-headers: commonly needed, and an unusual package name from Alpine
#   * build-base: basic development packages (gcc, make, ...)
#   * bash: /bin/bash
#   * git: to ease cloning of repos
#   * ca-certificates: for SSL verification during pip installs
#   * tcl: scripting language
#   * libssl1.0: SSL shared libraries
ENV SOFTWARE_PACKAGES="\
    dumb-init \
    musl \
    libc6-compat \
    linux-headers \
    build-base \
    bash \
    git \
    ca-certificates \
    tcl \
    libssl1.0 \
    "

# PYTHON DATA SCIENCE PACKAGES
ENV PYTHON_PACKAGES="\
    numpy \
    matplotlib \
    pandas \
    "

RUN apk add --no-cache $SOFTWARE_PACKAGES \
    && ln -s /usr/include/locale.h /usr/include/xlocale.h \
    && python3 -m ensurepip \
    && pip install --no-cache-dir $PYTHON_PACKAGES \
    && rm -rf /var/cache/apk/*

CMD ["python3"]

```
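
For questions 1 and 3, here is a rough sketch of what I have in mind: cloning over HTTPS with a GitLab deploy token passed in as build arguments, instead of relying on my personal SSH keys. The user/token names, the group/project path and the clone location are placeholders, and I am not sure this is the recommended approach.

```
FROM python:3.6-alpine
WORKDIR /var/www/

# pull code from git during the build, using a GitLab deploy token
# passed in as build arguments (the user name, project path and clone
# location below are placeholders)
ARG GITLAB_USER=my-deploy-token-user
ARG GITLAB_TOKEN
RUN apk add --no-cache git ca-certificates \
    && git clone "https://${GITLAB_USER}:${GITLAB_TOKEN}@gitlab.com/mygroup/myproject.git" /var/www/myproject
```

and build it with something like:

```
docker build --build-arg GITLAB_TOKEN=<deploy-token> -t myproject .
```

From what I have read, build arguments end up in the image history, so for anything sensitive a BuildKit build secret would apparently be safer, but the idea is the same.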

ASKER CERTIFIED SOLUTION
skullnobrains
mikha (ASKER)

@skullnobrains - thanks. One follow-up question: as I am reading the docs, would it make sense to have the code in an attached volume, so that we don't have to build the image every time we update the code?
skullnobrains

That is totally feasible, but a Docker container would typically contain your code.

Stating that need suggests, IMHO, that you probably do not want or need to use Docker at all.

It might prove simpler to craft a script that just clones and runs the software, or restarts whichever daemon is involved, whenever a new version is available.

Docker is not meant to store data inside containers or to run code from outside the container. If you want to do either of those things, you should probably reconsider how you approach the technology and change the way you craft the containers, or switch to actual VMs.
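
As a rough illustration of the difference (the image name, script name and the /data mount point are placeholders, not taken from the question):

```
# code baked into the image: every code change means a rebuild
docker build -t myproject .
docker run --rm myproject python3 /var/www/myproject/main.py

# code and the csv data bind-mounted from the host at run time:
# no rebuild needed, but the image no longer carries the code it runs
docker run --rm \
    -v "$(pwd)/myproject:/var/www/myproject" \
    -v "$(pwd)/data:/data" \
    myproject python3 /var/www/myproject/main.py
```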
mikha (ASKER)

@skullnobrains - thank you very much for your insight. One more, if you don't mind. I'm installing pandas in my container and I plan to use one particular version of it all the time, so it would make sense to somehow cache it (or something conceptually similar) rather than reaching out to the internet and downloading it every time we build. I'm sure Docker has a way to do this; it would be good to know your thoughts on this as well. I will close this question; you have been very helpful, thanks.
skullnobrains

I know nothing about pandas, but one of the following should do:

1: The Docker way: create an image "panda" in your Docker repository containing only pandas, and build your other containers starting with "FROM panda" (a rough sketch follows after this list).

2: Store the pandas package locally before you build, possibly wrapping the build with some custom script that checks for a new pandas version.

3: Use a caching proxy.

#1 is probably the most Docker-ish way. You can build a different base image for each pandas version and name them panda:version.

If you use FROM panda, Docker will pick panda:latest and resolve that adequately.
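
A minimal sketch of option #1, assuming a made-up pandas version and base image name:

```
# Dockerfile.panda: base image holding the pinned pandas version,
# built and tagged once per version, e.g.
#   docker build -f Dockerfile.panda -t panda:0.25.3 .
FROM python:3.6-alpine
RUN apk add --no-cache build-base linux-headers \
    && pip install --no-cache-dir pandas==0.25.3
```

The project Dockerfile then starts from that image, so pandas is not downloaded again when only the project image is rebuilt:

```
# project Dockerfile: pandas comes from the already-built base layer
FROM panda:0.25.3
WORKDIR /var/www/
CMD ["python3"]
```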