
Our team is currently setting up CI/CD for our website. We also recently adopted a monorepo structure, since it makes our dependencies much easier to manage and gives a better overview. Testing etc. is ready for CI, but I'm now onto deployment. I would like to create Docker images of only the packages that need them.

Things I considered:

1) Pull the full monorepo into the Docker image. However, running yarn install there results in a total project size of about 700MB, mainly due to our React Native app, which shouldn't even have a Docker image. This would also mean a long image pull time every time we deploy a new release. (See the sketch of this naive approach after the list.)

2) Bundle my projects in some way. Our frontend already has a working bundling setup, so that should be OK. But I just tried to add webpack to our Express API and ended up with an error inside my bundle due to this issue: https://github.com/mapbox/node-pre-gyp/issues/308

3) I tried running yarn install only inside the needed project, but because yarn workspaces hoist and install dependencies for the whole workspace, this still installs the node_modules for all my projects.

4) Use the npm package pkg. This produces a single executable file targeting a specific system and Node version. This DOES work, but I'm not sure how well it will handle errors and crashes.

5) Copy the project out of the workspace and run yarn install on it there. The issue is that the benefit of yarn workspaces (implicitly linked dependencies) is as good as gone; I would have to add my other workspace dependencies explicitly. One possibility is referencing them by a certain commit hash, which I'm going to test right now. (EDIT: it seems you can't reference a subdirectory as a yarn package.)

6) ???
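
For illustration, the naive approach from option 1 looks roughly like this (a minimal sketch; the services/api entry point is a made-up placeholder):

# Naive approach: copy the whole monorepo and install everything.
# This is what produces the ~700MB image, React Native app included.
FROM node:8.12.0-alpine
WORKDIR /project
COPY . .
RUN yarn install --production
CMD ["node", "services/api/src/index.js"]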

I'd like to know if I'm missing an option for installing only the node_modules a certain project needs, so I can keep my Docker images small.

  • Have you found a solution to this? I am working on a similar project.
    – Peter
    Commented Sep 5, 2018 at 16:42
  • This is not going to be a problem if you publish your packages to npm: you should not depend directly on the package on disk during deployment, but on the one published to the registry. The automatic linking yarn does should only be used during development. If you keep this in mind, you will have no problems with a normal deployment where you just copy the service directory into the Docker image and install the deps there. Commented Dec 4, 2019 at 15:09

4 Answers


I've worked on a project with a structure similar to yours; it looked like this:

project
├── package.json
├── packages
│   ├── package1
│   │   ├── package.json
│   │   └── src
│   ├── package2
│   │   ├── package.json
│   │   └── src
│   └── package3
│       ├── package.json
│       └── src
├── services
│   ├── service1
│   │   ├── Dockerfile
│   │   ├── package.json
│   │   └── src
│   └── service2
│       ├── Dockerfile
│       ├── package.json
│       └── src
└── yarn.lock

The services/ folder contains one service per sub-folder. Every service is written in Node.js and has its own package.json and Dockerfile. They are typically web servers or REST APIs based on Express.

The packages/ folder contains all the packages that are not services, typically internal libraries.

A service can depend on one or more packages, but not on another service. A package can depend on another package, but not on a service.

The main package.json (the one in the project root folder) only contains some devDependencies, such as eslint, the test runner, etc.

An individual Dockerfile looks like this, assuming service1 depends on both package1 & package3:

FROM node:8.12.0-alpine AS base

WORKDIR /project

FROM base AS dependencies

# We only copy the dependencies we need
COPY packages/package1 packages/package1
COPY packages/package3 packages/package3

COPY services/service1 services/service1

# The global package.json only contains build dependencies
COPY package.json .

COPY yarn.lock .

RUN yarn install --production --pure-lockfile --non-interactive --cache-folder ./ycache; rm -rf ./ycache

The actual Dockerfiles I used were more complicated, as they had to build the sub-packages, run the tests etc. But you should get the idea with this sample.
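
Note that the COPY paths above are relative to the build context, so the image has to be built from the project root rather than from the service folder, for example (the -t tag is just illustrative):

docker build -f ./services/service1/Dockerfile -t service1 .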

As you can see, the trick is to copy only the packages a specific service needs. The yarn.lock file contains a list of package@version entries with the exact resolved versions and dependencies. Copying it without all the sub-packages is not a problem: yarn will use the versions resolved there when installing the dependencies of the packages that are included.

In your case the react-native project will never be part of any Dockerfile, since none of the services depend on it, which saves a lot of space.

For the sake of conciseness I omitted a lot of details in this answer; feel free to ask for clarification in the comments if something isn't clear.

  • How does COPY packages/package1 packages/package1 work if the Dockerfile is located inside the service1 directory? Isn't it COPY ../../packages/package1 packages/package1? Commented Nov 20, 2018 at 11:56
  • It's because I was using a build command such as docker build -f ./services/service1/Dockerfile . which sets the context to the current directory (the project root in this case) while using service1's Dockerfile. Commented Nov 20, 2018 at 22:59
  • I really wish there was a way to not have to copy the packages in and just let webpack handle installing the dependencies. Is this possible? Commented Feb 12, 2019 at 16:28
  • The downside of this approach is that you have to define your dependencies twice: once in your service's package.json and once in your Dockerfile.
    – Nepoxx
    Commented Nov 4, 2020 at 20:06
  • You can auto-generate parts of the Dockerfiles in a pre-commit hook/CI step with info from the package.json files. Commented Jan 30, 2021 at 20:47

After a lot of trial and error I've found that careful use of the .dockerignore file is a great way to control your final image. This works great in a monorepo to exclude the "other" packages.

For each package, we have a similarly named dockerignore file that replaces the live .dockerignore file just before the build,

e.g., cp admin.dockerignore .dockerignore

Below is an example of admin.dockerignore. Note the * at the top of the file, which means "ignore everything". The ! prefix means "don't ignore", i.e., retain. The combination means "ignore everything except the files specified".

*
# Build specific keep
!packages/admin

# Common Keep
!*.json
!yarn.lock
!.yarnrc
!packages/common

**/.circleci
**/.editorconfig
**/.dockerignore
**/.git
**/.DS_Store
**/.vscode
**/node_modules
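
A minimal sketch of the swap-and-build step, run from the repo root (the image name is a made-up placeholder):

# Swap in the package-specific ignore file, then build.
cp admin.dockerignore .dockerignore
docker build -t myorg/admin .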

I have a project setup very similar to Anthony Garcia-Labiad's and managed to get it all up and running with skaffold, which allows me to specify the build context and the Dockerfile, something like this:

apiVersion: skaffold/v2beta22
kind: Config
metadata:
  name: project
deploy:
  kubectl:
    manifests:
      - infra/k8s/*
build:
  local:
    push: false
  artifacts:
    - image: project/service1
      context: services
      sync:
        manual:
          - src: "services/service1/src/**/*.(ts|js)"
            dest: "./services/service1"
          - src: "packages/package1/**/*.(ts|js)"
            dest: "./packages/package1"
      docker:
        dockerfile: "services/service1/Dockerfile"
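
With a config like this in place, everything can be built and deployed from the repo root:

skaffold run    # one-off build and deploy
skaffold dev    # build, deploy, and hot-sync the paths listed under sync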

We recently moved our backend services into a monorepo and this was one of a few points we had to solve. Yarn doesn't have anything that would help us in this regard, so we had to look elsewhere.

First we tried @zeit/ncc. There were some issues, but eventually we managed to get the final builds. It produces one big file that includes all your code and all your dependencies' code. It looked great: I had to copy only a few files (js, source maps, static assets) to the Docker image, and the images were much, much smaller and the app worked. BUT the runtime memory consumption grew a lot: instead of ~70MB, the running container consumed ~250MB. Not sure if we did something wrong, but I haven't found any solution and there's only one issue mentioning this. I guess Node.js parses and loads all the code from the bundle even though most of it is never used.
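
For reference, ncc is invoked roughly like this (the entry file path is an assumption):

npx ncc build src/index.js -o dist   # emits a single bundled index.js into dist/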

All we needed was to separate each package's production dependencies so we could build a slim Docker image. It turned out not to be so simple, but we found a tool after all.

We're now using fleggal/monopack. It bundles our code with Webpack and transpiles it with Babel, so it also produces a one-file bundle, but that bundle doesn't contain all the dependencies, just our code. The bundling step isn't something we really needed, but we don't mind it being there. For us the important part is that monopack copies only the package's production dependency tree to the dist/bundled node_modules. That's exactly what we needed. Docker images are now 100MB-150MB instead of 700MB.
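
From memory, the build step looks something like the following; treat the exact CLI as an assumption and check the monopack README:

npx monopack build main.js   # bundles main.js and copies its production deps into the output dir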

There's one easier way: if you have only a few really big npm modules in your node_modules, you can use nohoist in your root package.json. That way yarn keeps these modules in the package's local node_modules and they don't have to be copied into the Docker images of all the other services.

e.g.:

"nohoist": [
  "**/puppeteer",
  "**/puppeteer/**",
  "**/aws-sdk",
  "**/aws-sdk/**"
]
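
For context, nohoist lives under the workspaces key of the root package.json; a minimal sketch (the workspace globs are assumptions matching a typical layout):

{
  "private": true,
  "workspaces": {
    "packages": ["packages/*", "services/*"],
    "nohoist": [
      "**/puppeteer",
      "**/puppeteer/**",
      "**/aws-sdk",
      "**/aws-sdk/**"
    ]
  }
}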
