Installing Python packages in a Dockerfile?
I have a sample Dockerfile that I want to use to build containers for my data science project, but I have a few questions about it.
1. Why do we need to install software packages, including the gcc compiler, etc.? What is included in the Alpine base image, and do we need all of these packages on top of Alpine to install Python?
2. All the Python packages needed, such as numpy, matplotlib, and pandas, are installed at once. Should I instead create separate image layers, so that each Python package gets its own layer rather than all packages being installed in one step? What would be the correct way to do this?
3. In the last statement below, I see install $PYTHON_PACKAGES before the software packages (see below). In what order do these get executed? I would think the SOFTWARE_PACKAGES need to be installed before the Python packages.
```
&& pip install --no-cache-dir $PYTHON_PACKAGES \
...
&& apk add --no-cache --virtual build-dependencies $SOFTWARE_PACKAGES \
```
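As far as I understand, commands joined with `&&` in a shell run strictly left to right, and the chain stops at the first command that fails, so the order written in the `RUN` line should be the order of execution. A minimal shell sketch of that behaviour is below; the `step` helper and the step names are made up for illustration and are not part of my Dockerfile.

```shell
#!/bin/sh
# Hypothetical helper: appends its argument to a log so we can see the order.
log=""
step() { log="$log$1 "; }

# Same shape as the RUN line: three commands joined with &&.
step "pip-install" && step "apk-del" && step "apk-add"
first_log="$log"
echo "$first_log"

# A failing command stops the chain, so the last step is never reached.
log=""
step "first" && false && step "never-runs"
echo "$log"
```

If that is right, then in my Dockerfile the pip install really does run before the `apk add ... $SOFTWARE_PACKAGES` step.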
```
FROM alpine:latest
WORKDIR /var/www/
# SOFTWARE PACKAGES
# * musl: standard C library
# * libc6-compat: compatibility libraries for glibc
# * linux-headers: commonly needed, and an unusual package name from Alpine.
# * build-base: used so we include the basic development packages (gcc)
# * bash: -- /bin/bash
# * git: to ease up clones of repos
# * ca-certificates: for SSL verification during Pip and easy_install
# * libgfortran: Fortran shared library
# * libgcc: contains shared code that would be inefficient to duplicate every time as well as auxiliary helper routines and runtime support
# * libstdc++: The GNU Standard C++ Library.
# * openblas: open source implementation of the BLAS(Basic Linear Algebra Subprograms) API with many hand-crafted optimizations for specific processor types
# * tcl: scripting language
# * tk: GUI toolkit for the Tcl scripting language
# * libssl1.0: SSL shared libraries
ENV SOFTWARE_PACKAGES="\
dumb-init \
musl \
libc6-compat \
linux-headers \
build-base \
bash \
git \
ca-certificates \
libgfortran \
libgcc \
libstdc++ \
openblas \
tcl \
tk \
libssl1.0 \
"
# PYTHON DATA SCIENCE PACKAGES
ENV PYTHON_PACKAGES="\
numpy \
matplotlib \
pandas \
"
RUN apk add --no-cache --virtual build-dependencies python3 \
    && apk add --virtual build-runtime \
       build-base python3-dev openblas-dev freetype-dev pkgconfig gfortran \
    && ln -s /usr/include/locale.h /usr/include/xlocale.h \
    && python3 -m ensurepip \
    && rm -r /usr/lib/python*/ensurepip \
    && pip3 install --upgrade pip setuptools \
    && ln -sf /usr/bin/python3 /usr/bin/python \
    && ln -sf pip3 /usr/bin/pip \
    && rm -r /root/.cache \
    && pip install --no-cache-dir $PYTHON_PACKAGES \
    && apk del build-runtime \
    && apk add --no-cache --virtual build-dependencies $SOFTWARE_PACKAGES \
    && rm -rf /var/cache/apk/*
CMD ["python3"]
```
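To make question 2 concrete, this is roughly what I mean by one layer per Python package. It is only a sketch that I have not built: it assumes an earlier `RUN` step has already installed Python, pip, and the build dependencies.

```dockerfile
# Hypothetical per-package variant of the single pip install step.
# Assumes an earlier RUN already installed python3, pip, and the build tools.
# Each RUN instruction produces its own image layer, so editing one line
# only invalidates the build cache from that line onward.
RUN pip install --no-cache-dir numpy
RUN pip install --no-cache-dir matplotlib
RUN pip install --no-cache-dir pandas
```

One thing I am unsure about with this variant: the `apk del build-runtime` cleanup would then have to run in a later layer, and as far as I know files deleted in a later layer still take up space in the earlier layers.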