Reckonsys creates and maintains multiple environments (QA, beta, production, etc.) for each of our clients on the cloud provider of their choice (AWS, Google Cloud, DigitalOcean, etc.). So it is imperative that we standardize everything from planning and implementation to testing and deployment (as it is for any other company like us). In this blog post, we attempt to explain how we standardize deployments.
GitHub: https://github.com/reckonsys/bigga
Bigga (Community Edition) is a generic Docker Compose boilerplate to deploy your microservices (MIT licensed). It is optimized for Python-based projects because we build most of our products with Python, but it can be used to deploy any other language as well (of course, you will have to make some changes for that to happen).
There are multiple alternatives out there: bare metal, Fabric, Ansible, Chef, Puppet, Docker's very own Swarm, Kubernetes, and Mesos/Marathon. So why choose plain Docker Compose?
Bare metal was not an option: it is very basic provisioning/deployment, and notoriously hard to scale and maintain. In fact, anything that was not container-based was not an option; we needed our tool of choice to treat containers as first-class citizens. So naturally, Fabric, Ansible, Chef, and Puppet were out of the question (I understand that with some plugins most of these tools can support containerization, but containers are still not first-class citizens there). Skillset is another criterion: the more well-known a tool is, the more preferable it is, because it makes hiring the right talent easier. Apart from this, there are also business requirements. We need to run our QA, dev, and beta environments on small single instances (4 GB RAM, 2 vCPUs), which helps us cut costs for our clients, and the same configuration should work on clusters with very minimal changes. So mammoths like Kubernetes and Marathon were out of the equation, because they often require bigger instances just to run simple things. Initially, we wrote reckonsys/infra, which is based on Fabric, to deploy our applications, but we had to abandon it for the reasons mentioned above.
And naturally, it came down to using Docker Compose. It was the most familiar tool on the market, which means easy hiring. It runs on a single instance (of course, Swarm can span multiple instances, but that is not the focus here). We maintain an Enterprise Edition of Bigga (private repository) that has support for Kubernetes. This allows us to deploy multi-instance production environments and small, single-instance QA/dev/beta environments with the same configuration (with very minimal changes, of course). So Docker Compose seemed to fit the bill, and that is what Bigga is written on. The deployment steps are the same regardless of the cloud provider.
It is to be noted that this blog post is not a Docker tutorial; you can find plenty of fantastic resources for free on the web. This blog post assumes you already know the basics of Docker, Docker Compose, and Docker Machine.
Bigga is a superset of all the tools we use here at Reckonsys. We fork the Enterprise Edition of the Bigga repository into our client's GitLab / GitHub / Gogs account, remove the services that are not required, make a few changes to match the client's requirements, and deploy using the forked repo. It is imperative that the forked repo is named exactly bigga, because we have a bunch of invoke tasks that make this assumption. If you rename your bigga folder to something else and use our scripts for maintenance purposes, things might go wrong. And please don't blame us if they do.
Here are a few of the things that you get out of the box once you fork bigga:
The main file of the repo is docker-compose.yml. Let us break down the services in the docker-compose file one by one.
docker-compose.yml:

```yaml
version: '3'

volumes:
  elastic_data: {}
  mongo_data: {}
  neo4j_data: {}
  postgres_data: {}
  rabbitmq_data: {}

services:
  datadog:
    image: datadog/agent:latest
    env_file: .env
    environment:
      - DOCKER_CONTENT_TRUST=1
    links:
      - redis
    ports:
      - 8126
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /proc/:/host/proc/:ro
      - /sys/fs/cgroup:/host/sys/fs/cgroup:ro

  elasticsearch:
    env_file: .env
    image: docker.elastic.co/elasticsearch/elasticsearch
    # environment:
    #   - bootstrap.memory_lock=true
    #   - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - elastic_data:/usr/share/elasticsearch/data
    ports:
      - 9200

  mongo:
    env_file: .env
    restart: always
    image: mongo:latest
    ports:
      - 27017
    volumes:
      - mongo_data:/data/db

  neo4j:
    env_file: .env
    restart: always
    image: neo4j:latest
    ports:
      - 7687
      # - 7474
    volumes:
      - neo4j_data:/data
    # Uncomment port and below lines to make n4j browser accessible
    # labels:
    #   - traefik.port=7474
    #   - traefik.enable=true
    #   - traefik.backend.domain=${NEO4J_BROWSER_HOST}
    #   - traefik.frontend.rule=Host:${NEO4J_BROWSER_HOST}

  postgres:
    env_file: .env
    restart: always
    image: postgres:latest
    ports:
      - 5432
    volumes:
      - postgres_data:/var/lib/postgresql/data/

  rabbitmq:
    env_file: .env
    restart: always
    image: rabbitmq:latest
    ports:
      - 5672
    volumes:
      - rabbitmq_data:/data

  redis:
    env_file: .env
    restart: always
    image: redis:latest
    ports:
      - 6379
```
These services are as simple as pulling the images and running them. Running Datadog, Elasticsearch, MongoDB, Neo4j, Postgres, RabbitMQ, and Redis is very straightforward: just pull the image, expose the required ports, mount volumes as required, add env vars to the .env file, and that is it. You are good to consume these services by their container names and ports.
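As a minimal sketch of that pattern (the memcached service below is a hypothetical addition for illustration; it is not part of Bigga), adding another off-the-shelf service would look like this:

```yaml
services:
  memcached:                    # hypothetical pull-and-run service
    env_file: .env              # shared environment variables
    restart: always
    image: memcached:latest     # no custom build, just pull the upstream image
    ports:
      - 11211                   # expose the service port to other containers
# other containers on the same Compose network can now reach it as memcached:11211
```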
A note on PostgreSQL: the PostgreSQL container is only used for dev, QA, or beta instances. It is not recommended for production, as we do not take care of backups and scaling. It is recommended to use RDS, Cloud SQL, or another managed service for your database. The same goes for MongoDB and Elasticsearch. We do tend to keep Elasticsearch/Solr in a container, though, because the indexes can be rebuilt on demand.
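In practice that means a production compose file would drop (or simply not link) the postgres service and hand the backend a managed-database URL through the environment. A sketch, where the DATABASE_URL variable name and the RDS hostname are illustrative placeholders, not something Bigga ships:

```yaml
services:
  backend:
    env_file: .env
    environment:
      # point at a managed database (e.g. RDS) instead of the postgres container;
      # the variable name and hostname below are placeholders
      - DATABASE_URL=postgres://app_user:app_password@mydb.abc123.us-east-2.rds.amazonaws.com:5432/app_db
    # no `links: - postgres` here, and the postgres service itself can be
    # removed from docker-compose.yml for production
```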
One might wonder why we use both RabbitMQ and Redis. Both have pros and cons. RabbitMQ is very reliable for task distribution; it is our Celery broker. If the server crashes, the queued tasks automatically resume after restart because RabbitMQ always writes to disk. Redis, on the other hand, is used for sending notifications; RedisManager makes it possible to scale our socket servers. You might argue that Redis also has persistence, which can be enabled and could replace RabbitMQ, but that comes at the cost of performance. Persistence is not a first-class citizen in Redis, and it tends to throw IO lock errors, which we have witnessed in the past.
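To make the split concrete, here is a sketch of how a worker could be pointed at both services. The CELERY_BROKER_URL and REDIS_URL names are our illustration, not something Bigga mandates; wire them into your Celery and socket settings however your app expects:

```yaml
services:
  worker:
    env_file: .env
    links:
      - rabbitmq
      - redis
    environment:
      # RabbitMQ as the Celery broker: queued tasks survive a crash because
      # RabbitMQ writes them to disk
      - CELERY_BROKER_URL=amqp://guest:guest@rabbitmq:5672//
      # Redis backs the socket.io message queue / notifications
      - REDIS_URL=redis://redis:6379/0
```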
All the other services require a custom build.
docker-compose.yml:

```yaml
version: '3'

volumes:
  solr_data: {}

services:
  solr:
    env_file: .env
    build: solr
    volumes:
      - solr_data:/var/solr
    ports:
      - 8983
```

solr/Dockerfile:

```dockerfile
FROM solr:6.6.6
RUN precreate-core core1
RUN rm -rf /opt/solr/server/solr/mycores/core1/conf/managed-schema
RUN rm -rf /opt/solr/server/solr/configsets/basic_configs/conf/managed-schema
COPY managed-schema /opt/solr/server/solr/configsets/basic_configs/conf/
COPY managed-schema /opt/solr/server/solr/mycores/core1/conf/
COPY schema.xml /opt/solr/server/solr/mycores/core1/conf/
```

solr/managed-schema:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Solr managed schema - automatically generated - DO NOT EDIT -->
<schema name="haystack-schema" version="1.6">
  <!-- Schema Definition -->
</schema>
```

solr/schema.xml:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<schema>
  <!-- Schema Definition -->
</schema>
```
As you can notice in the Solr configuration, it does not simply pull an image and run it; it builds from a solr directory. This directory contains three files: Dockerfile, managed-schema, and schema.xml. The Dockerfile pulls the Solr image, pre-creates a core (core1), and copies managed-schema and schema.xml into their respective paths in the image. The files managed-schema and schema.xml are generated by our backend services and need to be exported into the solr directory in bigga.
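Once built, the core is reachable from the backend containers by service name. For instance, a search-connection URL could be passed in through the environment (SOLR_URL is an illustrative placeholder, not a Bigga convention):

```yaml
services:
  backend:
    links:
      - solr
    environment:
      # core1 is the core pre-created in the solr Dockerfile above
      - SOLR_URL=http://solr:8983/solr/core1
```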
docker-compose.yml:

```yaml
version: '3'

volumes:
  traefik_acme: {}

services:
  traefik:
    env_file: .env
    build: traefik
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - traefik_acme:/etc/traefik/acme/  # Defined in traefik/traefik.toml
    ports:
      - 0.0.0.0:80:80
      - 0.0.0.0:443:443
```

traefik/Dockerfile:

```dockerfile
FROM traefik:alpine
RUN mkdir -p /etc/traefik/acme
RUN touch /etc/traefik/acme/acme.json
RUN chmod 600 /etc/traefik/acme/acme.json
COPY traefik.toml /etc/traefik
# Custom SSL Certificate
# COPY /path/to/ssl.crt /certs/ssl.crt
# COPY /path/to/ssl.key /certs/ssl.key
```

traefik/traefik.toml:

```toml
logLevel = "INFO"
defaultEntryPoints = ["https", "http"]

[retry]

[docker]
exposedByDefault = false

[Global]
debug = true

[log]
level = "DEBUG"

[accessLog]
# format = "json"

[api]
# entryPoint = "traefik"
# rule = "Host(`traefik.domain.com`)"
dashboard = false

[ping]

# Entrypoints, http and https
[entryPoints]
# http should be redirected to https
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
# https is the default
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
# Custom SSL certificate
# [[entryPoints.https.tls.certificates]]
# certFile = "/certs/ssl.crt"
# keyFile = "/certs/ssl.key"

[acme]
email = "dhilip@reckonsys.com"
storage = "/etc/traefik/acme/acme.json"
entryPoint = "https"
acmeLogging = true
onHostRule = true
# caServer = "https://acme-staging-v02.api.letsencrypt.org/directory"
# Default: "https://acme-v02.api.letsencrypt.org/directory"

[acme.dnsChallenge]
provider = "route53"
```
Traefik is a replacement for Nginx / Apache httpd: a cloud-native edge router. It is a reverse proxy and load balancer that makes deploying microservices easy. Traefik integrates with your existing infrastructure components and configures itself automatically and dynamically.
Traefik is designed to be as simple as possible to operate, but capable of handling large, highly-complex deployments across a wide range of environments and protocols in public, private, and hybrid clouds. It also comes with a powerful set of middlewares that enhance its capabilities to include load balancing, API gateway, orchestrator ingress, as well as east-west service communication and more.
When you use Nginx/httpd, configuring it to route traffic to your containers is a bit tricky: every time you add or remove containers that expect ingress traffic, you have to reconfigure and re-deploy your Nginx/httpd service as well. All of that can be avoided with Traefik. You just apply labels to your containers, and Traefik automatically sets up routing and load balancing for you dynamically. On top of that, it provides a dashboard and metrics out of the box.
Just like Solr, the Traefik service builds from the traefik directory, which has two files: Dockerfile and traefik.toml. The Dockerfile simply copies traefik.toml into its required path and touches acme.json (used to store the contents of SSL certificates). traefik.toml has two main sections, [entryPoints] and [acme]. There are two entry points: http, which simply redirects to https, and https itself. The [acme] section is used for generating (and renewing) Let's Encrypt certificates, so configuring SSL certificates is taken care of automatically. ACME supports several DNS providers: just place the appropriate env vars in your .env file, update your provider in traefik.toml, and that is all. You can forget about configuring SSL certificates altogether; Traefik will generate (and renew) SSL certificates for your domains automatically (it parses the list of domains from the labels you have assigned to the other containers).
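For the route53 provider, for instance, the DNS challenge only needs AWS credentials that can edit the relevant hosted zone to be visible to the traefik container. A sketch, assuming the standard AWS environment variable names (check the Traefik/ACME documentation for your own provider's variables):

```yaml
services:
  traefik:
    env_file: .env   # the actual values live in .env
    environment:
      # credentials used by the ACME DNS challenge to create verification
      # records in Route 53 (standard AWS variable names, assumed here)
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
      - AWS_REGION=${AWS_REGION}
```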
.dockerignore:

```
.git
.venv
media
upload
node_modules
dist
```

docker-compose.yml:

```yaml
version: '3'

services:
  worker: &worker
    env_file: .env
    restart: always
    build: ./backend  # Change this to your backend path. e.g. ../my/backend/path
    command: celery -A myapp.tasks worker --loglevel=info
    links:
      - mongo:mongo
      - rabbitmq:rabbitmq
      - postgres:postgres
      - redis:redis

  beat:
    <<: *worker
    command: celery -A config.celery_app beat --loglevel=info

  backend:
    <<: *worker
    command: gunicorn autoapp:app -b :${BACKEND_PORT}
    expose:
      - ${BACKEND_PORT}
    labels:
      - traefik.enable=true
      - traefik.backend.domain=${BACKEND_DOMAIN}
      - traefik.frontend.rule=Host:${BACKEND_DOMAIN}

  socket:
    <<: *worker
    command: python sio_server.py ${SOCKET_PORT}
    expose:
      - ${SOCKET_PORT}
    labels:
      - traefik.enable=true
      - traefik.backend.domain=${SOCKET_DOMAIN}
      - traefik.frontend.rule=Host:${SOCKET_DOMAIN}
```

backend/Dockerfile:

```dockerfile
# FROM reckonsys/python:latest
FROM python:3.7-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
```
Now to the main part of our application. These are the services that make up the backend of our application: API, socket, workers, and CRON jobs. In most of our products, all these services live in the same directory (usually the Django project directory), which means they share the same build process. To keep things DRY, we first build the worker and then use the same worker configuration as a mixin for the other services.
Worker Service: First we build our worker service, which acts as the base configuration for building all the other services. We tell Docker which directory to build (we change the path to the relative path where the Django project resides). There is a Dockerfile in that path: it pulls the latest Python slim image, copies requirements.txt, runs pip install on all the requirements, and then copies the repo into the image being built. Some projects require additional packages, which can be installed before running pip install. The worker links services like PostgreSQL, MongoDB, Redis, and RabbitMQ. This is the base configuration that all the other backend services rely on. Finally, there is the command to run the worker, which in most of our cases is `celery -A myapp.tasks worker --loglevel=info`.
Beat Service: Imports the worker mixin; only the command changes, to `celery -A config.celery_app beat --loglevel=info`. We use beat to run all our CRON jobs, because it gives us more control over the process and we can easily configure error emails if something goes wrong with a CRON job.
Backend Service: Imports the worker mixin. The command to start the API is `gunicorn autoapp:app -b :${BACKEND_PORT}`. On top of that, we expose $BACKEND_PORT, which signals to our traefik service that this particular port is open to it. We also apply three labels: the first tells traefik that ingress is enabled for this container, the second asks traefik to route requests for the domain ${BACKEND_DOMAIN} to this container, and the last tells traefik to generate SSL certificates for ${BACKEND_DOMAIN}. This automatically routes, load balances, and takes care of SSL.
Socket Service: This is almost the same as the backend service; only the command, port, and domain are changed.
All the env vars are read from the `.env` file.
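Extending this pattern is mostly copy-paste. As an illustration (the admin service and its port/domain variables below are hypothetical, not part of Bigga), a new routed service added to the same docker-compose.yml only needs the worker mixin, a command, an exposed port, and the three Traefik labels:

```yaml
services:
  admin:                                   # hypothetical extra service
    <<: *worker                            # reuse the worker build/links/env
    command: gunicorn adminapp:app -b :${ADMIN_PORT}
    expose:
      - ${ADMIN_PORT}                      # ADMIN_PORT would be added to .env
    labels:
      - traefik.enable=true                        # opt in to Traefik routing
      - traefik.backend.domain=${ADMIN_DOMAIN}     # domain to serve (from .env)
      - traefik.frontend.rule=Host:${ADMIN_DOMAIN} # route this host here
```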
docker-compose.yml:

```yaml
version: '3'

services:
  # This is ONLY for apps that need to support SSR.
  # If you don't use SSR, follow S3_FRONTEND_DEPLOYMENT.md to deploy directly on S3
  ssrfrontend:
    env_file: .env
    restart: always
    build: ./ssrfrontend  # Change this to your frontend path. e.g. ../my/frontend/path
    command: PORT=${SSRFRONTEND_PORT} npm run serve:ssr
    expose:
      - ${SSRFRONTEND_PORT}
    labels:
      - traefik.enable=true
      - traefik.backend.domain=${SSRFRONTEND_DOMAIN}
      - traefik.frontend.rule=Host:${SSRFRONTEND_DOMAIN}
```

ssrfrontend/Dockerfile:

```dockerfile
FROM node:10.16.3-jessie-slim
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build:ssr
```
There are times when SSR (Server-Side Rendering) is required for a product (for instance, for SEO optimizations), and this service takes care of it. Its configuration is almost the same as the other backend configurations, except that it uses a Node image instead of a Python image, and the command differs from product to product.
For most of our products, SSR is not a requirement. So we adopt an S3-ACM-CloudFront deployment approach most of the time (meaning running in containers is not a requirement). We simply sync the built assets to S3, have ACM configured to generate the SSL certificate, and serve all assets through CloudFront, which proxies all requests to the S3 bucket. For this purpose, we have written an invoke task that configures S3, ACM, and CloudFront. Think of it as a homebrewed Netlify alternative. We simply run the `inv init-s3-cf-app -r us-east-2 -e beta -d app.example.com` command, which sets up everything for us. The manual steps are documented in detail over here.
We have documented the detailed installation procedure in our GitHub repo's README.md. Links to additional docs, troubleshooting guides, and migration notes are also available there. Feedback is more than welcome, and PRs are amazing.