Edwinology, a web dev blog: postgresql

Single container approach

As we discussed in the first post of this Docker series, when you define and run containers, all the software required to run your Magnolia server is already installed and configured, so launching a Magnolia app is just a matter of building your image once and then running it as many times as you wish with the required params.

Multi-container approach

Running containers helps you with the logistics of deploying, configuring and managing different versions of the same software stack. But when you have an app that consists of multiple containers that depend on each other, taking care of the loading order and the specific configs needed for each one could be tough and messy.

In the case of Magnolia, you need at least two different containers running, one for the author instance and one for the public instance. But if you also want to run Magnolia on top of a DB, then you need two additional containers, one for each instance's DB. All those containers share credentials, networks and volumes, and have a specific loading order, i.e. the DB has to be started before the web app server.

This is where docker-compose can help you to have everything declared in one place and easily reproducible whenever you need a new setup for your app.

Docker Compose

Docker Compose is a separate tool that gets installed along with Docker Desktop (on Linux it may need to be installed separately). It helps you start up multiple Docker containers at the same time and automatically connects them together, taking care of networking, health checks and volume management. In practice it works like the docker CLI, but lets you issue many commands at once from a single declarative file.

In docker compose you have a YAML file where you define how you would like your multi-container application to be structured. This YAML file will then be used to automate the launch of the containers as defined.

Let's create a docker-compose.yaml file step by step, reproducing the Magnolia setup from the previous post: one author and one public instance, each attached to its own Postgres DB.

Services

In order to define the configuration and params needed to run the containers, you need to provide a services element. Let's take a look at the postgres container as the first service:

version: '3.7'

services: 
  mgnlauthor-postgres:
    image: postgres:12
    restart: unless-stopped
    healthcheck: 
      test: pg_isready -U magnolia || exit 1
      interval: 10s
      timeout: 5s
      retries: 5
    volumes:
      - "~/docker/pgdata-author:/var/lib/postgresql/data"
    networks: 
      - mgnlnet
    environment:
      POSTGRES_USER: "magnolia"
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?password empty}
      POSTGRES_DB: "magnolia"
      PGDATA: "/var/lib/postgresql/data/pgdata"
  mgnlpublic-postgres: ...

Reviewing the options provided:

version: the compose file format version used by this definition (3.7).
mgnlauthor-postgres: the name of the service, which is also the hostname of the container in the network. This will be used by the Magnolia author instance to connect to this DB.
image: the name of the image to be pulled from the docker registry.
restart: the restart policy of the container. With unless-stopped, docker always restarts the container unless it was stopped explicitly.
healthcheck: a test command to be run in order to check the container status. Exit code 0 means healthy and exit code 1 means unhealthy. The interval, timeout and number of retries of the test can be configured as sub-options. For PostgreSQL we use the pg_isready tool.
volumes: a host folder mounted inside the container (a bind mount).
networks: the network where this container is going to be registered as mgnlauthor-postgres.
environment: all the environment variables needed by the container to run. In this case the database credentials and the PGDATA folder.

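Note: POSTGRES_PASSWORD is read from a compose variable, which can be defined in a .env file in the same folder as the docker-compose.yaml file (or exported in the shell). Because of the :? syntax, compose fails with the given error message if the variable is unset or empty.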
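
As a minimal sketch of such a .env file, the snippet below writes one from the shell. The helper name write_env_file is hypothetical, and it assumes that both compose variables used in this post (POSTGRES_PASSWORD here and DB_PASSWORD in the Magnolia services further down) should carry the same value:

```shell
# Hypothetical helper: writes a compose .env file holding the DB password.
# $1 = password, $2 = target file (assumed default: ./.env)
write_env_file() {
  password="$1"
  target="${2:-.env}"
  (
    # Subshell so the stricter umask does not leak into the caller.
    umask 077
    printf 'POSTGRES_PASSWORD=%s\nDB_PASSWORD=%s\n' "$password" "$password" > "$target"
  )
}
```

Calling write_env_file "mysecretpassword" in the folder of the docker-compose.yaml file would be enough for compose to resolve both variables. It is probably a good idea to keep the resulting file out of version control.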

Now let's continue with the magnolia author container:

  mgnlauthor:
    build:
      context: ./
      args: 
        MGNL_AUTHOR: "true"
    image: ebguilbert/magnolia-cms-postgres:6.2.1-author
    restart: unless-stopped
    depends_on:
      - mgnlauthor-postgres
    volumes:
      - mgnl:/opt/magnolia
    networks: 
      - mgnlnet
    ports: 
      - 8080:8080
    environment: 
      DB_ADDRESS: "mgnlauthor-postgres"    
      DB_PORT: "5432"
      DB_SCHEMA: "magnolia"
      DB_USERNAME: "magnolia"
      DB_PASSWORD: ${DB_PASSWORD:?DB_PASSWORD must be set!}   
    healthcheck:
      test: curl -f http://localhost:8080/.rest/status || exit 1
      interval: 1m
      timeout: 10s
      retries: 5      
  mgnlpublic: ...

The options are very similar to the postgres service, but we have a couple worth mentioning:

build: This option lets you build your own local image if you don't have one published in a public docker registry. You can define the context where to look for the Dockerfile and provide the build args.
depends_on: This is a very important option since it lets you control the order in which the containers are started. In this case we want the DB to be started before the Magnolia author container. Note that in this form it only orders the start-up; it does not wait for the DB to become healthy.
volumes: A named volume, which is managed by docker compose. More on this in the Volumes section.
ports: publishes the container port 8080 on the host port 8080.
environment: Credentials as env variables. Note that DB_PASSWORD is a separate compose variable and must hold the same value as the POSTGRES_PASSWORD used in the postgres service.
healthcheck: The test command for Magnolia is a REST endpoint we can invoke with curl. The interval is 1 min and the number of retries is 5, since the REST endpoint might not be available right after start-up and the healthcheck might need to run more than once (up to 5 times) before succeeding.
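
Since the list form of depends_on only orders container start-up, a small script can poll the status reported by the healthcheck before you start using the author instance. This is only a sketch: the function name wait_healthy and its default attempts and delay are hypothetical, and it assumes the containers carry the com.docker.compose.service label that compose adds to them:

```shell
# Hypothetical helper: waits until a compose service reports "healthy".
# $1 = service name
# $2 = max attempts (assumed default: 30)
# $3 = seconds between attempts (assumed default: 10)
wait_healthy() {
  service="$1"
  attempts="${2:-30}"
  delay="${3:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Find the container that compose created for this service.
    id="$(docker ps -q --filter "label=com.docker.compose.service=$service" | head -n 1)"
    if [ -n "$id" ]; then
      # The health status is one of: starting, healthy, unhealthy.
      status="$(docker inspect --format '{{.State.Health.Status}}' "$id" 2>/dev/null || true)"
      if [ "$status" = "healthy" ]; then
        return 0
      fi
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  # Gave up: the service never became healthy within the allowed attempts.
  return 1
}
```

For example, wait_healthy mgnlauthor && echo "author is ready" could be run right after bringing the setup up.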

Networks

Docker compose handles the creation and deletion of networks every time you start up or shut down the setup defined in the compose file.

For magnolia we only need one network:

networks: 
  mgnlnet:
    name: mgnlnet

Volumes

Docker compose also handles the creation of named volumes and, optionally, their removal if needed.

We want one named volume for each magnolia container (author and public):

volumes: 
  mgnl:
    name: mgnl
  mgnlp1:
    name: mgnlp1 

docker-compose.yaml

The whole file including all services, networks and volumes for magnolia author and public instances would look like this:

version: '3.7'

services: 
  mgnlauthor-postgres:
    image: postgres:12
    restart: unless-stopped
    healthcheck: 
      test: pg_isready -U magnolia || exit 1
      interval: 10s
      timeout: 5s
      retries: 5
    volumes:
      - "~/docker/pgdata-author:/var/lib/postgresql/data"
    networks: 
      - mgnlnet
    environment:
      POSTGRES_USER: "magnolia"
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?password empty}
      POSTGRES_DB: "magnolia"
      PGDATA: "/var/lib/postgresql/data/pgdata"
  mgnlpublic-postgres:
    image: postgres:12
    restart: unless-stopped
    healthcheck: 
      test: pg_isready -U magnolia || exit 1
      interval: 10s
      timeout: 5s
      retries: 5
    volumes:
      - "~/docker/pgdata-public:/var/lib/postgresql/data"
    networks: 
      - mgnlnet
    environment:
      POSTGRES_USER: "magnolia"
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?password empty}
      POSTGRES_DB: "magnolia"
      PGDATA: "/var/lib/postgresql/data/pgdata"
  mgnlauthor:
    build:
      context: ./
      args: 
        MGNL_AUTHOR: "true"
    image: ebguilbert/magnolia-cms-postgres:6.2.1-author
    restart: unless-stopped
    depends_on:
      - mgnlauthor-postgres
    volumes:
      - mgnl:/opt/magnolia
    networks: 
      - mgnlnet
    ports: 
      - 8080:8080
    environment: 
      DB_ADDRESS: "mgnlauthor-postgres"    
      DB_PORT: "5432"
      DB_SCHEMA: "magnolia"
      DB_USERNAME: "magnolia"
      DB_PASSWORD: ${DB_PASSWORD:?DB_PASSWORD must be set!}   
    healthcheck:
      test: curl -f http://localhost:8080/.rest/status || exit 1
      interval: 1m
      timeout: 10s
      retries: 5      
  mgnlpublic:
    build:
      context: ./
      args: 
        MGNL_AUTHOR: "false"
    image: ebguilbert/magnolia-cms-postgres:6.2.1-public
    restart: unless-stopped
    depends_on:
      - mgnlpublic-postgres
    volumes:
      - mgnlp1:/opt/magnolia
    networks: 
      - mgnlnet
    ports: 
      - 8090:8080
    environment: 
      DB_ADDRESS: "mgnlpublic-postgres"    
      DB_PORT: "5432"
      DB_SCHEMA: "magnolia"
      DB_USERNAME: "magnolia"
      DB_PASSWORD: ${DB_PASSWORD:?DB_PASSWORD must be set!}   
    healthcheck:
      test: curl -f http://localhost:8080/.rest/status || exit 1
      interval: 1m
      timeout: 10s
      retries: 5

networks: 
  mgnlnet:
    name: mgnlnet

volumes: 
  mgnl:
    name: mgnl
  mgnlp1:
    name: mgnlp1   

As you can see the file is self-explanatory and clearly defines what services are needed, the dependency order and what is needed by each service.

Docker compose up and down

One of the docker compose features I like the most is the possibility to build and run everything at once with one single command:

docker-compose -f "docker-compose.yaml" up -d

The above runs in detached mode, so the containers' output is not streamed to the terminal.

By default, up only builds the local images if they don't exist yet. If you want to rebuild them before starting the containers, you can add --build:

docker-compose -f "docker-compose.yaml" up -d --build

To check the status of your containers you can always use the ps command:

docker-compose ps

And finally you can shut everything down, including the removal of containers and networks with the following command:

docker-compose -f "docker-compose.yaml" down

Docker Swarm or Kubernetes

The next and final step is the orchestration of these containers, managing the auto-scaling and recovery of containers in clusters of nodes. For this, other tools like Docker Swarm or Kubernetes are needed.

The good news is that a compose file can largely be reused with Docker Swarm, so just a few more steps are needed. For Kubernetes, the docker-compose file can't be reused "as is", but the structure and concepts are very similar, so it shouldn't take much effort.

Edit: There's a tool called kompose which translates docker-compose files into kubernetes-compatible files ready to be used by Kubernetes :)

Deploying without a DB

This post is a follow-up to a previous post explaining how to deploy Magnolia CMS as a docker container using Debian slim, OpenJDK and Tomcat.

That is a very lightweight and simple setup, since you only need to worry about one container per Magnolia instance. However, the data storage is file system based, which might be fine for your public/disposable instances but is definitely not a good choice for the author instance.

Deploying with a DB

The author acts as the master of contents where all versions created are stored. It needs a more robust storage with features like data integrity, concurrency, performance, disaster recovery and so on, just what an RDBMS can offer.

Magnolia is officially compatible with MySQL, Oracle and PostgreSQL, so we can pick any of the official docker images they offer. For this post we will use Postgres.

Why Postgres?

Well, it is open source, which implies many benefits so this would leave Oracle out.

MySQL is the most popular open source database out there, and probably the most widely used with Magnolia. Postgres, on the other hand, is "The World's Most Advanced Open Source Relational Database" according to their website.

Since in our case with docker deployments we want to store everything in the DB, including Magnolia's datastore, we will make extensive use of BLOBs, both for reading and for storing. MySQL is historically known for its poor performance in this kind of scenario, and even in the latest versions it is still something to take care of, especially when using the InnoDB storage engine and its buffer pool. So, in short, for performance reasons, we'll pick Postgres.

Let's take a look at how to run the official postgres docker image:

docker run --rm -d \
    --name mgnlauthor-postgres \
    --network mgnlnet \
    -e POSTGRES_USER=magnolia \
    -e POSTGRES_PASSWORD=mysecretpassword \
    -e POSTGRES_DB=magnolia \
    -e PGDATA=/var/lib/postgresql/data/pgdata \
    -v /Users/ebguilbert/docker/pgdata-author:/var/lib/postgresql/data postgres:12

Reviewing the options provided:

Network mgnlnet is going to be used by magnolia containers (mgnlauthor and mgnlpublic) where they can contact the database as mgnlauthor-postgres.
Database credentials are provided as environment variables. These credentials are going to be used later on by the magnolia containers.
The PGDATA folder is linked to a local folder on the host. This is required to preserve the data stored by the DB even after the container is stopped or recreated.

Since we are going to have at least two instances of Magnolia running, let's start a second Postgres container for the public instance, changing the name and the local data volume:

docker run --rm -d \
    --name mgnlpublic-postgres \
    --network mgnlnet \
    -e POSTGRES_USER=magnolia \
    -e POSTGRES_PASSWORD=mysecretpassword \
    -e POSTGRES_DB=magnolia \
    -e PGDATA=/var/lib/postgresql/data/pgdata \
    -v /Users/ebguilbert/docker/pgdata-public:/var/lib/postgresql/data postgres:12

Notice we are using the same network created in the previous blog post. If you haven't created it yet, you need to do it before running the image:

docker network create --subnet=192.168.42.0/24 mgnlnet
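
If you are not sure whether the network already exists, the creation can be guarded with a check. This is a small sketch under that assumption; the function name ensure_network is hypothetical:

```shell
# Hypothetical helper: creates the network only if it does not exist yet.
# $1 = network name, $2 = subnet in CIDR notation
ensure_network() {
  name="$1"
  subnet="$2"
  # "docker network inspect" exits with a non-zero code for unknown networks.
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "network $name already exists"
  else
    docker network create --subnet="$subnet" "$name"
  fi
}
```

It would be called as ensure_network mgnlnet 192.168.42.0/24 before running any of the containers.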

Magnolia-Postgres Image

In order to have Magnolia configured with a database we will need to enhance the Dockerfile we used in the previous post, with the postgres JDBC lib and DB connection params for Tomcat to use. Also we will need to copy our own war file containing custom DB-Magnolia config (which we'll cover in detail in the next section):

# Tomcat debian-slim image (any official image would do)
FROM ebguilbert/tomcat-slim:9

LABEL maintainer="Edwin Guilbert"

# ENV variables for Magnolia
ENV MGNL_VERSION 6.2.1
ENV MGNL_APP_DIR /opt/magnolia
ENV MGNL_REPOSITORIES_DIR ${MGNL_APP_DIR}/repositories
ENV MGNL_LOGS_DIR ${MGNL_APP_DIR}/logs
ENV MGNL_RESOURCES_DIR ${MGNL_APP_DIR}/light-modules
ENV JDBC_VERSION=postgresql-42.2.12

# ARGS
ARG MGNL_AUTHOR=true
ARG MGNL_WAR_PATH=docker-bundle/docker-bundle-webapp/target/docker-bundle-webapp-6.2.1.war
ARG MGNL_HEAP=2048M
ARG MGNL_ENV=tomcat/setenv.sh
ARG JDBC_URL=https://jdbc.postgresql.org/download

# JVM PARAMS
ENV CATALINA_OPTS -Xms64M -Xmx${MGNL_HEAP} -Djava.awt.headless=true \
    -Dmagnolia.bootstrap.authorInstance=${MGNL_AUTHOR} \
    -Dmagnolia.repositories.home=${MGNL_REPOSITORIES_DIR} \
    -Dmagnolia.author.key.location=${MGNL_APP_DIR}/magnolia-activation-keypair.properties \
    -Dmagnolia.logs.dir=${MGNL_LOGS_DIR} \
    -Dmagnolia.resources.dir=${MGNL_RESOURCES_DIR} \
    -Dmagnolia.update.auto=true

# VOLUME for Magnolia
VOLUME [ "${MGNL_APP_DIR}" ]

# JDBC lib
RUN wget -q ${JDBC_URL}/${JDBC_VERSION}.jar -O $CATALINA_HOME/lib/${JDBC_VERSION}.jar

# Database runtime config
# - DB_ADDRESS
# - DB_PORT
# - DB_SCHEMA
# - DB_USERNAME
# - DB_PASSWORD
COPY ${MGNL_ENV} $CATALINA_HOME/bin/setenv.sh

# MGNL war
COPY ${MGNL_WAR_PATH} ${DEPLOYMENT_DIR}/ROOT.war

The Dockerfile is self-explanatory but it's worth pointing out the differences from the original version:

JDBC_VERSION is an environment variable containing the name of the jar file to be added to Tomcat libs.
JDBC_URL is an argument variable containing the URL used to download the JDBC jar from.
MGNL_WAR_PATH is an argument variable containing the path of the custom war to be copied/deployed to Tomcat. Note this variable replaces the MGNL_WAR in the original version.
MGNL_ENV is an argument variable containing the path of a setenv.sh file which is going to configure database credentials as env variables in Tomcat.
The last three instructions download the JDBC jar and copy the setenv.sh and war files into Tomcat.

Let's build the new image for author and public instances from the Dockerfile:

docker build -t ebguilbert/magnolia-cms-postgres:6.2.1-author --build-arg MGNL_AUTHOR=true .

docker build -t ebguilbert/magnolia-cms-postgres:6.2.1-public --build-arg MGNL_AUTHOR=false .

Note: If you check out the project from git and try to build the image with the src folders present, it will take a long time, since docker sends the whole build context (all the subfolders) to the daemon even though just the compiled war file is needed. So it is strongly recommended to delete the src and target folders and only preserve the compiled war file before building the image.
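
An alternative to deleting folders is a .dockerignore file, which keeps everything except the listed files out of the build context. The sketch below writes one. The helper name write_dockerignore is hypothetical, and the paths are assumed from the default MGNL_WAR_PATH and MGNL_ENV values in the Dockerfile above, so it is worth verifying the result against your own project layout:

```shell
# Hypothetical helper: writes a .dockerignore into the given project folder.
# $1 = project folder (assumed default: current folder)
write_dockerignore() {
  dir="${1:-.}"
  cat > "$dir/.dockerignore" <<'EOF'
# Ignore everything by default...
*
# ...except the files the Dockerfile copies into the image.
!tomcat/setenv.sh
!docker-bundle/docker-bundle-webapp/target/*.war
EOF
}
```

After running write_dockerignore in the project root, the docker build commands above can stay unchanged.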

Magnolia Persistence Manager and Datastore

Magnolia uses JCR to store all its contents through a Persistence Manager, which handles the persistent storage of content nodes and properties, with the exception of large binary values, which are handled by a Datastore.

Why should I care?

Since we are using PostgreSQL to store the contents, we need to configure a postgres persistence manager for Magnolia, providing the credentials needed to connect to the database and configuring a datastore to store the large binary values.

It's important to note that we want to store everything in the DB, so this includes all the components of the persistence manager, like the datastore and the filesystem for versions and cache.

Why everything in the DB?

Since we are using Docker the idea is to have self-contained containers, so the storage should be handled by the postgres container and the server/app should be handled by the magnolia container, so they could be replaced and moved freely.

This persistence manager is configured by an xml file (jackrabbit-bundle-postgres-search.xml), which Magnolia usually loads from the folder "WEB-INF/config/repo-conf/".

Let's take a look at the relevant sections of the file:

Datasource

  <DataSources>
    <DataSource name="magnolia">
      <param name="driver" value="org.postgresql.Driver" />
      <param name="url" value="jdbc:postgresql://${db.address}:${db.port}/${db.schema}" />
      <param name="user" value="${db.username}" />
      <param name="password" value="${db.password}" />
      <param name="databaseType" value="postgresql"/>
    </DataSource>
  </DataSources>

This is where the DB credentials are set. All the needed params are configured by environment variables that we will pass as arguments to the running Magnolia container (we'll see how in the next section). This is why a setenv.sh file is needed.
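
The actual setenv.sh lives in the git project. As a rough sketch of the idea, and assuming the ${db.*} placeholders are resolved from JVM system properties, such a file could translate the container's environment variables into -D options. The function name build_db_opts and the fallback port are hypothetical:

```shell
# Hypothetical sketch: maps the DB_* environment variables of the container
# to the db.* properties referenced in the DataSource configuration.
# Assumption: DB_PORT falls back to 5432 (the PostgreSQL default) when unset.
build_db_opts() {
  printf '%s' "-Ddb.address=${DB_ADDRESS} -Ddb.port=${DB_PORT:-5432} -Ddb.schema=${DB_SCHEMA} -Ddb.username=${DB_USERNAME} -Ddb.password=${DB_PASSWORD}"
}
```

Inside setenv.sh, which Tomcat's catalina.sh sources on start-up, the result could be appended with CATALINA_OPTS="$CATALINA_OPTS $(build_db_opts)". Values containing spaces would need extra care with this approach.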

Filesystem

  <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
    <param name="dataSourceName" value="magnolia"/>
    <param name="schemaObjectPrefix" value="fs_"/>
  </FileSystem>

This is an interface that acts as a file system abstraction for storing the global repository state. Since we want to store everything in the DB, we are using a db-filesystem.

This db-filesystem configuration is also used for the workspace filesystem and the versioning filesystem for things like search indexes and versions.

Datastore

  <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
    <param name="dataSourceName" value="magnolia"/>
    <param name="schemaObjectPrefix" value="ds_"/>
  </DataStore>

Normally all node and property data is stored by the persistence manager, but for large binaries the datastore is used. The datastore usually lives in the server's local filesystem, but since we want to store everything in the DB, we are using a db-datastore.

SearchIndex

This section was added recently due to a confusion with the idea of everything in the DB. The search index is actually stored in the filesystem (the main idea of an index is to prevent querying the DB). This means JCR still stores files in the instance filesystem:

# ls ${MGNL_REPOSITORIES_DIR}/magnolia/workspaces/website/ 
index  workspace.xml

As you can see only the index entries and index configuration file (per workspace) are stored in the filesystem.

Magnolia-Postgres Container

Based on the image we just built with the postgres persistence manager configured, let's run the image in the same network as the database (postgres container):

docker run --rm -d -p 8080:8080/tcp --mount source=mgnlauthor,target=/opt/magnolia \
       --network mgnlnet --name mgnlauthor \
       -e DB_ADDRESS=mgnlauthor-postgres \
       -e DB_PORT=5432 \
       -e DB_SCHEMA=magnolia \
       -e DB_USERNAME=magnolia \
       -e DB_PASSWORD=mysecretpassword \
       ebguilbert/magnolia-cms-postgres:6.2.1-author

Looking at the params provided, we can see the database credentials being configured dynamically at run time.

Let's run a public container with the credentials of the mgnlpublic-postgres db container:

docker run --rm -d -p 8090:8080/tcp --mount source=mgnlpublic,target=/opt/magnolia \
       --network mgnlnet --name mgnlpublic \
       -e DB_ADDRESS=mgnlpublic-postgres \
       -e DB_PORT=5432 \
       -e DB_SCHEMA=magnolia \
       -e DB_USERNAME=magnolia \
       -e DB_PASSWORD=mysecretpassword \
       ebguilbert/magnolia-cms-postgres:6.2.1-public

Note: We have used the same network and volumes created in the previous blog post. If you haven't created the volumes yet, you need to do it before running the images:

docker volume create mgnlauthor

docker volume create mgnlpublic

Multi-container docker Magnolia app

As you can see, the whole setup for Magnolia in docker with a DB involves many configurations and quite a few containers for the different databases and web server apps. Things like keeping the DB password secret and container health checks (to relaunch public instances automatically) can be managed automatically by docker tools like Docker Compose. These improvements will be covered in the following post.

As a general note, all the files involved in this post, including the source project for the custom Magnolia war configured for PostgreSQL are available in this git project.

Posts about Web Development, Java, Magnolia CMS and beyond

5/31/2020

Trying out Docker Compose with Magnolia and DB containers