Trying out Docker, Magnolia and Postgres ~ Edwinology, a web dev blog

Deploying without a DB

This post is a follow-up of a previous post explaining how to deploy Magnolia CMS as a docker container using Debian slim, OpenJDK and Tomcat.

Although this is a very light weight and simple setup since you only need to worry about one container per magnolia instance, the data storage is file system based which might be fine for your public/disposable instances but its definitely not a good choice for the author instance.

Deploying with a DB

The author acts as the master of contents where all versions created are stored. It needs a more robust storage with features like data integrity, concurrency, performance, disaster recovery and so on, just what a RDBMS can offer.

Magnolia is officially compatible with MySQL, Oracle and PostgreSQL, so we can pick any of the official docker images they offer. For this post we will use Postgres.

Why Postgres?

Well, it is open source, which implies many benefits so this would leave Oracle out.

Although MySQL is the most popular open source database out there, and probably the most widely used with Magnolia. Postgres on the other hand is "The World's Most Advanced Open Source Relational Database" according to their website.

Since in our case with docker deployments, we want to store everything in the DB, including Magnolia's datastore, this means that we will make use of BLOBs extensively, for reading and also for storing. MySQL is historically known for its lack of performance with this kind of escenario, and even in latest versions is still something to take care of, specially when using InnoDB storage engine and its buffer pool. So, in short, for performance reasons, we'll pick Postgres.

Lets take a look on how to run the official postgres docker image:

docker run --rm -d \
    --name mgnlauthor-postgres \
    --network mgnlnet \
    -e POSTGRES_USER=magnolia \
    -e POSTGRES_PASSWORD=mysecretpassword \
    -e POSTGRES_DB=magnolia \
    -e PGDATA=/var/lib/postgresql/data/pgdata \
    -v /Users/ebguilbert/docker/pgdata-author:/var/lib/postgresql/data postgres:12

Reviewing the options provided:

Network mgnlnet is going to be used by magnolia containers (mgnlauthor and mgnlpublic) where they can contact the database as mgnlauthor-postgres.
Database credentials are provided as environment variables. These credentials are going to be used later on by the magnolia containers.
The PGDATA folder is linked to a local folder on the host. This is required to preserve the data stored by the DB even after the container is stopped o recreated.

Since we are going to have at least two instances of Magnolia running, lets start a second Postgres container for the public instance, changing the name, credentials and data local volume:

docker run --rm -d \
    --name mgnlpublic-postgres \
    --network mgnlnet \
    -e POSTGRES_USER=magnolia \
    -e POSTGRES_PASSWORD=mysecretpassword \
    -e POSTGRES_DB=magnolia \
    -e PGDATA=/var/lib/postgresql/data/pgdata \
    -v /Users/ebguilbert/docker/pgdata-public:/var/lib/postgresql/data postgres:12

Notice we are using the same network created in the previous blog post. If you haven't created it yet, you need to do it before running the image:

docker network create --subnet=192.168.42.0/24 mgnlnet

Magnolia-Postgres Image

In order to have Magnolia configured with a database we will need to enhance the Dockerfile we used in the previous post, with the postgres JDBC lib and DB connection params for Tomcat to use. Also we will need to copy our own war file containing custom DB-Magnolia config (which we'll cover in detail in the next section):

# Tomcat debian-slim image (any official image would do)
FROM ebguilbert/tomcat-slim:9

LABEL maintainer="Edwin Guilbert"

# ENV variables for Magnolia
ENV MGNL_VERSION 6.2.1
ENV MGNL_APP_DIR /opt/magnolia
ENV MGNL_REPOSITORIES_DIR ${MGNL_APP_DIR}/repositories
ENV MGNL_LOGS_DIR ${MGNL_APP_DIR}/logs
ENV MGNL_RESOURCES_DIR ${MGNL_APP_DIR}/light-modules
ENV JDBC_VERSION=postgresql-42.2.12

# ARGS
ARG MGNL_AUTHOR=true
ARG MGNL_WAR_PATH=docker-bundle/docker-bundle-webapp/target/docker-bundle-webapp-6.2.1.war
ARG MGNL_HEAP=2048M
ARG MGNL_ENV=tomcat/setenv.sh
ARG JDBC_URL=https://jdbc.postgresql.org/download

# JVM PARAMS
ENV CATALINA_OPTS -Xms64M -Xmx${MGNL_HEAP} -Djava.awt.headless=true \
    -Dmagnolia.bootstrap.authorInstance=${MGNL_AUTHOR} \
    -Dmagnolia.repositories.home=${MGNL_REPOSITORIES_DIR} \
    -Dmagnolia.author.key.location=${MGNL_APP_DIR}/magnolia-activation-keypair.properties \
    -Dmagnolia.logs.dir=${MGNL_LOGS_DIR} \
    -Dmagnolia.resources.dir=${MGNL_RESOURCES_DIR} \
    -Dmagnolia.update.auto=true

# VOLUME for Magnolia
VOLUME [ "${MGNL_APP_DIR}" ]

# JDBC lib
RUN wget -q ${JDBC_URL}/${JDBC_VERSION}.jar -O $CATALINA_HOME/lib/${JDBC_VERSION}.jar

# Database runtime config
# - DB_ADDRESS
# - DB_PORT
# - DB_SCHEMA
# - DB_USERNAME
# - DB_PASSWORD
COPY ${MGNL_ENV} $CATALINA_HOME/bin/setenv.sh

# MGNL war
COPY ${MGNL_WAR_PATH} ${DEPLOYMENT_DIR}/ROOT.war

The dockerfile is self-explanatory but its worth pointing out the differences from the original version:

JDBC_VERSION is an environment variable containing the name of the jar file to be added to Tomcat libs.
JDBC_URL is an argument variable containing the URL used to download the JDBC jar from.
MGNL_WAR_PATH is an argument variable containing the path of the custom war to be copied/deployed to Tomcat. Note this variable replaces the MGNL_WAR in the original version.
MGNL_ENV is an argument variable containing the path of a setenv.sh file which is going to configure database credentials as env variables in Tomcat.
The last three lines downloads the JDBC jar, copies the setenv.sh and war file into Tomcat.

Lets build the new image for author and public instances from the Dockerfile:

docker build -t ebguilbert/magnolia-cms-postgres:6.2.1-author --build-arg MGNL_AUTHOR=true .

docker build -t ebguilbert/magnolia-cms-postgres:6.2.1-public --build-arg MGNL_AUTHOR=false .

Note: If you checkout the project from git and try to compile with src folders present, it will take a long time since docker will "load" all the subfolders even though just the compiled war file is needed. So, it is strongly recommended to delete the src and target folders and only preserve the compiled war file before building the image.

Magnolia Persistent Manager and Datastore

Magnolia uses JCR to store all its contents through a Persistent Manager handling the persistent storage of content nodes and properties, with the exception of large binary values which are handled by a Datastore.

Why should I care?

Since we are using PostgreSQL to store the contents, we need to configure a postgres persistent manager for Magnolia, providing the credentials needed to connect to the database and configuring a datastore to store the large binary values.

It's important to notice that we want to store everything in the DB, so this will include all the components of the persistent manager like the datastore and the filesystem for versions and cache.

Why everything in the DB?

Since we are using Docker the idea is to have self-contained containers, so the storage should be handled by the postgres container and the server/app should be handled by the magnolia container, so they could be replaced and moved freely.

This persistent manager is configured by an xml file (jackrabbit-bundle-postgres-search.xml) which is loaded by Magnolia usually from the folder "WEB-INF/config/repo-conf/"

Let's take a look at the relevant sections of the file:

Datasource

  <DataSources>
    <DataSource name="magnolia">
      <param name="driver" value="org.postgresql.Driver" />
      <param name="url" value="jdbc:postgresql://${db.address}:${db.port}/${db.schema}" />
      <param name="user" value="${db.username}" />
      <param name="password" value="${db.password}" />
      <param name="databaseType" value="postgresql"/>
    </DataSource>
  </DataSources>

This is where the DB credentials are set. All the needed params are configured by environment variables that we will pass as arguments to the running Magnolia container (we'll see how in the next section). This is why a setenv.sh file is needed.

Filesystem

  <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
    <param name="dataSourceName" value="magnolia"/>
    <param name="schemaObjectPrefix" value="fs_"/>
  </FileSystem>

This is an interface that acts as a file system abstraction for storing the global repository state. Since we want to store everything in the DB, we are using a db-filesystem.

This db-filesystem configuration is also used for the workspace filesystem and the versioning filesystem for things like search indexes and versions.

Datastore

  <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
    <param name="dataSourceName" value="magnolia"/>
    <param name="schemaObjectPrefix" value="ds_"/>
  </DataStore>

Normally all node and property data is stored in a persistence manager, but for large binaries the datastore is used. This is usually stored in the local-server storage but since we want to store everything in the DB, we are using a db-datastore.

SearchIndex

This section was added recently due to a confusion with the idea of everything in the DB. The search index is actually stored in the filesystem (the main idea of an index is to prevent querying the DB). This means JCR still stores files in the instance filesystem:

# ls ${MGNL_REPOSITORIES_DIR}/magnolia/workspaces/website/ 
index  workspace.xml

As you can see only the index entries and index configuration file (per workspace) are stored in the filesystem.

Magnolia-Postgres Container

Based on the image we just built with the postgres persistent manager configured, let's run the image in the same network as the database (postgres container):

docker run --rm -d -p 8080:8080/tcp --mount source=mgnlauthor,target=/opt/magnolia \
       --network mgnlnet --name mgnlauthor \
       -e DB_ADDRESS=mgnlauthor-postgres \
       -e DB_PORT=5432 \
       -e DB_SCHEMA=magnolia \
       -e DB_USERNAME=magnolia \
       -e DB_PASSWORD=mysecretpassword \
       ebguilbert/magnolia-cms-postgres:6.2.1-author

Looking at the params provided we can see the database credentials being configured dynamically at running time.

Let's run a public container with the credentials of the mgnlpostgres-public db container:

docker run --rm -d -p 8090:8080/tcp --mount source=mgnlpublic,target=/opt/magnolia \
       --network mgnlnet --name mgnlpublic \
       -e DB_ADDRESS=mgnlpublic-postgres \
       -e DB_PORT=5432 \
       -e DB_SCHEMA=magnolia \
       -e DB_USERNAME=magnolia \
       -e DB_PASSWORD=mysecretpassword \
       ebguilbert/magnolia-cms-postgres:6.2.1-public

Note: We have used the same network and volumes created in the previous blog post. If you haven't created the volumes yet, you need to do it before running the images:

docker volume create mgnlauthor

docker volume create mgnlpublic

Multi-container docker Magnolia app

As you can see the whole setup for Magnolia in docker with a DB involves many configurations and quite some containers for the different databases and webserver apps. There are things like db password secrecy and container health-check (to relaunch public instances automatically) that could be automatically managed by docker tools like Docker Compose. But these improvements will be covered by the following post.

As a general note, all the files involved in this post, including the source project for the custom Magnolia war configured for PostgreSQL are available in this git project.

2 comments:

Christian MenzelJune 9, 2020 at 11:35 AM
Hi Edwin, I like your "everything in the DB" approach, but when I tried to setup a Magnolia Instance with your jackrabbit-bundle-postgres-search.xml from your git project, I noticed that the index was created in the local filesystem (${rep.home}/workspaces) instead of the database.
After I added a FileSystem element to the SearchIndex element, 'index' tables where created in the DB but not populated, instead the local filesystem was used again.
Can you verify that in your setup the index is actually created in the DB?
Cheers
Chris

Edwinology, a web dev blog

Posts about Web Development, Java, Magnolia CMS and beyond

5/24/2020