Asked by mikha

Installing Python packages in a Dockerfile?

I have a sample Dockerfile that I want to use to build containers for my data science project, but I have a few questions about it.

1. Why do we need to install software packages, including the gcc compilers, etc.? What is included in the Alpine base image, and do we need all these software packages on top of Alpine to install Python?

2. All the Python packages needed, like numpy, matplotlib, and pandas, are installed at once. Should I try to create layered images, so that each Python package is its own layer, instead of building all the packages at once? What would be the correct way to do this?

3. In the last statement below, I see the install of $PYTHON_PACKAGES before the software packages (see below). In what order do these get executed? I would think the SOFTWARE_PACKAGES needed to be installed before the Python packages.


```
    && pip install --no-cache-dir $PYTHON_PACKAGES \
  ...
    && apk add --no-cache --virtual build-dependencies $SOFTWARE_PACKAGES \
```


```

FROM alpine:latest


WORKDIR /var/www/


# SOFTWARE PACKAGES
#   * musl: standard C library
#   * libc6-compat: compatibility libraries for glibc
#   * linux-headers: commonly needed, and an unusual package name from Alpine.
#   * build-base: used so we include the basic development packages (gcc)
#   * bash: -- /bin/bash
#   * git: to ease up clones of repos
#   * ca-certificates: for SSL verification during Pip and easy_install
#   * libgfortran: Fortran shared library
#   * libgcc: contains shared code that would be inefficient to duplicate every time as well as auxiliary helper routines and runtime support
#   * libstdc++: The GNU Standard C++ Library.
#   * openblas: open source implementation of the BLAS(Basic Linear Algebra Subprograms) API with many hand-crafted optimizations for specific processor types
#   * tcl: scripting language
#   * tk: GUI toolkit for the Tcl scripting language
#   * libssl1.0: SSL shared libraries


ENV SOFTWARE_PACKAGES="\
    dumb-init \
    musl \
    libc6-compat \
    linux-headers \
    build-base \
    bash \
    git \
    ca-certificates \
    libgfortran \
    libgcc \
    libstdc++ \
    openblas \
    tcl \
    tk \
    libssl1.0 \
    "

# PYTHON DATA SCIENCE PACKAGES
ENV PYTHON_PACKAGES="\
    numpy \
    matplotlib \
    pandas \
    "

RUN apk add --no-cache --virtual build-dependencies python3 \
    && apk add --virtual build-runtime \
    build-base python3-dev openblas-dev freetype-dev pkgconfig gfortran \
    && ln -s /usr/include/locale.h /usr/include/xlocale.h \
    && python3 -m ensurepip \
    && rm -r /usr/lib/python*/ensurepip \
    && pip3 install --upgrade pip setuptools \
    && ln -sf /usr/bin/python3 /usr/bin/python \
    && ln -sf pip3 /usr/bin/pip \
    && rm -r /root/.cache \
    && pip install --no-cache-dir $PYTHON_PACKAGES \
    && apk del build-runtime \
    && apk add --no-cache --virtual build-dependencies $SOFTWARE_PACKAGES   \
    && rm -rf /var/cache/apk/*

CMD ["python3"]

```
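
On the ordering question: the commands in a single RUN instruction are joined by `&&`, so the shell runs them left to right and stops at the first one that fails, and the whole RUN instruction produces one image layer. The following is a minimal sketch of one possible reordering, not a tested or definitive file. It assumes the same Alpine release the question targets and carries the pip bootstrap commands over from the question unchanged. The virtual group name `build-deps`, the trimmed runtime list, and the added `freetype` runtime package are illustrative assumptions. It also assumes that python3 should be installed outside any virtual group: in the file above, python3 is added under the name build-dependencies and that name is reused in the last `apk add`, which may replace the earlier group and could remove python3 again.

```dockerfile
FROM alpine:latest

WORKDIR /var/www/

# Runtime packages that must remain in the final image (a trimmed subset of
# the question's list plus python3 and freetype; adjust to the project).
ENV SOFTWARE_PACKAGES="\
    python3 \
    bash \
    ca-certificates \
    libgfortran \
    libstdc++ \
    openblas \
    freetype \
    "

# Python packages installed with pip.
ENV PYTHON_PACKAGES="\
    numpy \
    matplotlib \
    pandas \
    "

# Step 1: runtime packages, added without --virtual so they are kept.
# Step 2: compilers and headers in their own virtual group
#         (build-deps is a hypothetical name chosen for this sketch).
# Step 3: pip bootstrap and install, carried over from the question.
# Step 4: remove only the build-deps group, leaving python3 in place.
RUN apk add --no-cache $SOFTWARE_PACKAGES \
    && apk add --no-cache --virtual build-deps \
        build-base python3-dev openblas-dev freetype-dev pkgconfig gfortran \
    && python3 -m ensurepip \
    && pip3 install --upgrade pip setuptools \
    && pip3 install --no-cache-dir $PYTHON_PACKAGES \
    && apk del build-deps

CMD ["python3"]
```

Keeping the install and the `apk del` in the same RUN instruction means the compilers never end up in a committed layer. Splitting each Python package into its own RUN would create more layers, but the build tools would then have to stay installed across those layers or be reinstalled each time.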

ASKER CERTIFIED SOLUTION
David Johnson, CD

mikha (ASKER)

@David - thanks. Do you know if this actually installs Python in the image I'm building? The reason I'm asking is that when I try to run my Python script using this as an image, it errors out.
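
One way to find out whether the interpreter and the packages survive the build is to add a check as the final build step, just before `CMD`. The fragment below is a sketch under the assumption that the three packages from PYTHON_PACKAGES are the ones the script needs. If python3 or any of the imports is missing, the command exits with a non-zero status and the build stops at this step, which points at the install steps rather than at the script.

```dockerfile
# Fails the build if the interpreter or any listed package is missing.
RUN python3 --version \
    && python3 -c "import numpy, matplotlib, pandas"
```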