Posts about Web Development, Java, Magnolia CMS and beyond

7/14/2020

Redirect A/B Testing with Google Optimize

Posted by Edwin Guilbert
It has been a while since my last post on A/B testing. Since then, a lot of things have changed, especially with the Google Analytics API and how Google wants us to create experiments on their platform.

To cut to the chase, Google Experiments has been shut down in favor of Google Optimize. This has a direct impact on the way we were integrating A/B testing with Magnolia, since we were using the Analytics Management API to programmatically create and modify experiments, and this is no longer possible with Optimize.

But not everything is lost: we can still take advantage of Magnolia Personalization and page variants to manually (not automagically) create and manage redirect experiments with Google Optimize.

Let's review what A/B testing is for, from my previous post:

"In A/B testing you can improve your conversions by comparing different versions of the same page. The idea is to change key elements that might improve your conversion rate. You usually have an original page (the so-called control page) and a variation of it. Users will get one of these versions at random, and after a period of time you can compare which version did better according to a goal (or conversion)."

... and how it can be implemented in Magnolia with Google Optimize, using the same example as before.

We are going to work with the travel demo website, specifically with the "about" page, which contains a video. The goal here is to get this video played more often, so the metric we are going to use is the play event on this DOM element.
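To see why the play event is a useful metric, here is a toy comparison of conversion rates in shell. All numbers are invented for illustration; in practice Google Optimize computes this (plus statistical significance) for you:

```shell
# Invented numbers: video plays vs. visitors for each page version
control_plays=48;  control_visits=1000
variant_plays=74;  variant_visits=1000

awk -v cp="$control_plays" -v cv="$control_visits" \
    -v vp="$variant_plays" -v vv="$variant_visits" \
    'BEGIN { printf "control: %.1f%%  variant: %.1f%%\n", 100*cp/cv, 100*vp/vv }'
# → control: 4.8%  variant: 7.4%
```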

The original page looks like this:


And the variation will have a wider video, removing the left-side links which might distract visitors and prevent them from playing the video (our goal in this case).


To create a variation of a page, you just have to select "Add page variant" in the pages app of Magnolia.

          
Open the variant created and update the video component:


After publishing the variant, you need to connect these pages to Google Analytics using a JavaScript snippet for tracking page views. You can embed this snippet with Marketing Tags in Magnolia (don't forget to include it in all your pages):


After publishing the snippet (or marketing tag in Magnolia), you need to send an event every time a user plays the video included in the pages we want to test, so we need to create another marketing tag for that:



<script>
// Assumes jQuery and the Google Analytics tracking snippet (analytics.js)
// are already loaded on the page
$(".video-wrapper video").on('play', function () {
  ga('send', 'event', 'video', 'play');
});
</script>

After publishing the snippet you need to actually record the number of video play events as conversions or "goals" in Google Analytics:




After the goal is configured, you might want to test it in the real time tab of Google Analytics, so every time you play a video in your page, an event gets registered as a conversion:


Additionally, in order to use Optimize, you need to install it as a JavaScript snippet, including your container ID, similar to what you did with Google Analytics:



At this point you finally have everything prepared to start A/B testing on the page variants, measuring video plays as a goal with Google Optimize:

"A redirect test is a type of A/B test that allows you to test different web pages against each other. A redirect test contains different URLs for each variant. Redirect tests are useful when you want to test multiple different landing pages, or a complete redesign of a page."

So let's create a redirect experiment with the following steps:

  • Create an experiment of type "redirect test". Give it a name and provide the public URL of the page you want to test, in this case: http://yourserver/travel/about.html
  • Add the Magnolia page variant as a variant of the test. The trick here is to use the internal URL of the page variant, in this case: http://yourserver/travel/about/variants/variant-0


  • Add as an objective the goal previously configured in Google Analytics, in this case, every time a video is played:

  • Finally, start the experiment! (Don't worry about the installation check; we are handling it with the marketing tags.)
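The trick in the second step is really just a URL rewrite. Sketched in shell (the variant-0 node name comes from the example above; it depends on how your variant is named in Magnolia):

```shell
# Derive Magnolia's internal variant URL from the public page URL
page_url="http://yourserver/travel/about.html"
variant_url="${page_url%.html}/variants/variant-0"
echo "$variant_url"
# → http://yourserver/travel/about/variants/variant-0
```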


And that's all! You can always check how your test is going in the reporting tab of the Optimize dashboard. Remember you have to wait at least one day to see results.



After you are done testing you can always pick the winner and publish it as the final page in Magnolia :)

5/31/2020

Trying out Docker Compose with Magnolia and DB containers

Posted by Edwin Guilbert

Single container approach

As we discussed in the first post of this Docker series, when you define and run containers, all the software required to run your Magnolia server is already installed and configured. Launching a Magnolia app is just a matter of building your image once and then running it as many times as you wish with the required params.

Multi-container approach

Running containers helps with the logistics of deploying, configuring and managing different versions of the same software stack. But when you have an app that consists of multiple containers that depend on each other, taking care of the loading order and the specific configs needed for each one can be tough and messy.

In the case of Magnolia, you need at least two different containers running: one for the author instance and one for the public instance. But if you also want to run Magnolia on top of a DB, you need two additional containers, one for each instance's DB. All those instances share credentials, networks and volumes, and have a specific loading order, i.e. the DB has to be started before the web app server.

This is where docker-compose can help you to have everything declared in one place and easily reproducible whenever you need a new setup for your app.

Docker Compose


Docker Compose is a separate tool that gets installed along with Docker. It helps you start multiple Docker containers at the same time and automatically connects them with networking, health checks and volume management. It works much like the docker CLI, but lets you manage a whole multi-container setup with single commands.

In docker compose you have a YAML file where you define how you would like your multi-container application to be structured. This YAML file will then be used to automate the launch of the containers as defined. 

Let's create a docker-compose.yaml file step by step, reproducing the Magnolia setup from the previous post: one author and one public instance, each attached to its own Postgres DB.

Services

In order to define the configuration and params needed to run the containers, you need to provide a services element. Let's take a look at the postgres container as the first service:

version: '3.7'

services:
  mgnlauthor-postgres:
    image: postgres:12
    restart: unless-stopped
    healthcheck:
      test: pg_isready -U magnolia || exit 1
      interval: 10s
      timeout: 5s
      retries: 5
    volumes:
      - "~/docker/pgdata-author:/var/lib/postgresql/data"
    networks:
      - mgnlnet
    environment:
      POSTGRES_USER: "magnolia"
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?password empty}
      POSTGRES_DB: "magnolia"
      PGDATA: "/var/lib/postgresql/data/pgdata"
  mgnlpublic-postgres: ...

Reviewing the options provided:
  • version: the service (yaml) definition format is compatible with docker compose version 3.7.
  • mgnlauthor-postgres: the name of the container in the network. This will be used by the magnolia author instance to connect to this DB.
  • image: the name of the image to be pulled from the docker registry.
  • restart: the restart policy of the container. In this case we want docker to restart it if it gets killed somehow.
  • healthcheck: a test command to be run in order to check the container status. exit code 0 is healthy, exit code 1 unhealthy/failed. The interval, timeout and retry number of the test can be configured as sub-options. For PostgreSQL we used this tool.
  • volumes: a host mounted volume to be used inside the container.
  • networks: the network where this container is going to be registered as mgnlauthor-postgres.
  • environment: All the environment variables needed by the container to run. In this case the database credentials and PGDATA folder.
Note: POSTGRES_PASSWORD is provided by a compose env variable, which must be defined in a .env file in the same folder as the docker-compose.yaml file.
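For example, a minimal .env file next to docker-compose.yaml might look like this (the values are placeholders; pick your own secrets):

```
# .env — read automatically by docker-compose from the same folder
POSTGRES_PASSWORD=changeme
DB_PASSWORD=changeme
```

With the ${POSTGRES_PASSWORD:?password empty} syntax, compose aborts with the message after :? if the variable is unset, instead of starting the DB with an empty password.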
Now let's continue with the magnolia author container:

  mgnlauthor:
    build:
      context: ./
      args:
        MGNL_AUTHOR: "true"
    image: ebguilbert/magnolia-cms-postgres:6.2.1-author
    restart: unless-stopped
    depends_on:
      - mgnlauthor-postgres
    volumes:
      - mgnl:/opt/magnolia
    networks:
      - mgnlnet
    ports:
      - 8080:8080
    environment:
      DB_ADDRESS: "mgnlauthor-postgres"
      DB_PORT: "5432"
      DB_SCHEMA: "magnolia"
      DB_USERNAME: "magnolia"
      DB_PASSWORD: ${DB_PASSWORD:?DB_PASSWORD must be set!}
    healthcheck:
      test: curl -f http://localhost:8080/.rest/status || exit 1
      interval: 1m
      timeout: 10s
      retries: 5
  mgnlpublic: ...

The options are very similar to the postgres service, but a couple are worth mentioning:
  • build: This option lets you build your own local image if you don't have one registered in a public docker registry. You can define the context where to look for the Dockerfile and provide the build args.
  • depends_on: This is a very important option since it controls the order in which the containers are started. In this case we want the DB to be started before the magnolia author container.
  • volumes: A named volume, which is managed by docker compose, more on this in the next section.
  • ports: exposes container port 8080 on host port 8080.
  • environment: Credentials as env variables. Note the password is the same compose env variable used in the postgres service.
  • healthcheck: The test command for magnolia is a REST endpoint we can invoke with curl. The interval is 1 minute and the retries are 5, since the REST endpoint might not be available right away and the health check may need to run several times (up to 5).
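For reference, the same health check could be baked into the image itself with Dockerfile's HEALTHCHECK instruction; this is a sketch of the equivalent, not part of the original setup:

```
# Equivalent health check declared in the Dockerfile instead of compose
HEALTHCHECK --interval=1m --timeout=10s --retries=5 \
  CMD curl -f http://localhost:8080/.rest/status || exit 1
```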

Networks

Docker Compose handles the creation and deletion of networks every time you start up or shut down the setup defined in the compose file.

For magnolia we only need one network:

networks:
  mgnlnet:
    name: mgnlnet

Volumes

Docker Compose also handles the creation of named volumes and, optionally, the pruning of volumes if needed.

We want one named volume for each magnolia container (author and public):

volumes:
  mgnl:
    name: mgnl
  mgnlp1:
    name: mgnlp1

docker-compose.yaml

The whole file including all services, networks and volumes for magnolia author and public instances would look like this:

version: '3.7'

services:
  mgnlauthor-postgres:
    image: postgres:12
    restart: unless-stopped
    healthcheck:
      test: pg_isready -U magnolia || exit 1
      interval: 10s
      timeout: 5s
      retries: 5
    volumes:
      - "~/docker/pgdata-author:/var/lib/postgresql/data"
    networks:
      - mgnlnet
    environment:
      POSTGRES_USER: "magnolia"
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?password empty}
      POSTGRES_DB: "magnolia"
      PGDATA: "/var/lib/postgresql/data/pgdata"
  mgnlpublic-postgres:
    image: postgres:12
    restart: unless-stopped
    healthcheck:
      test: pg_isready -U magnolia || exit 1
      interval: 10s
      timeout: 5s
      retries: 5
    volumes:
      - "~/docker/pgdata-public:/var/lib/postgresql/data"
    networks:
      - mgnlnet
    environment:
      POSTGRES_USER: "magnolia"
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?password empty}
      POSTGRES_DB: "magnolia"
      PGDATA: "/var/lib/postgresql/data/pgdata"
  mgnlauthor:
    build:
      context: ./
      args:
        MGNL_AUTHOR: "true"
    image: ebguilbert/magnolia-cms-postgres:6.2.1-author
    restart: unless-stopped
    depends_on:
      - mgnlauthor-postgres
    volumes:
      - mgnl:/opt/magnolia
    networks:
      - mgnlnet
    ports:
      - 8080:8080
    environment:
      DB_ADDRESS: "mgnlauthor-postgres"
      DB_PORT: "5432"
      DB_SCHEMA: "magnolia"
      DB_USERNAME: "magnolia"
      DB_PASSWORD: ${DB_PASSWORD:?DB_PASSWORD must be set!}
    healthcheck:
      test: curl -f http://localhost:8080/.rest/status || exit 1
      interval: 1m
      timeout: 10s
      retries: 5
  mgnlpublic:
    build:
      context: ./
      args:
        MGNL_AUTHOR: "false"
    image: ebguilbert/magnolia-cms-postgres:6.2.1-public
    restart: unless-stopped
    depends_on:
      - mgnlpublic-postgres
    volumes:
      - mgnlp1:/opt/magnolia
    networks:
      - mgnlnet
    ports:
      - 8090:8080
    environment:
      DB_ADDRESS: "mgnlpublic-postgres"
      DB_PORT: "5432"
      DB_SCHEMA: "magnolia"
      DB_USERNAME: "magnolia"
      DB_PASSWORD: ${DB_PASSWORD:?DB_PASSWORD must be set!}
    healthcheck:
      test: curl -f http://localhost:8080/.rest/status || exit 1
      interval: 1m
      timeout: 10s
      retries: 5

networks:
  mgnlnet:
    name: mgnlnet

volumes:
  mgnl:
    name: mgnl
  mgnlp1:
    name: mgnlp1

As you can see, the file is self-explanatory and clearly defines which services are needed, their dependency order and what each service requires.

Docker compose up and down

One of the Docker Compose features I like the most is the possibility to build and run everything at once with a single command:

docker-compose -f "docker-compose.yaml" up -d

The above runs in detached mode, so the containers' output is not streamed to the terminal.

If you also want to build the local images (if they weren't built before), you can add --build:

docker-compose -f "docker-compose.yaml" up -d --build

To check the status of your containers you can always use the ps command:

docker-compose ps

And finally you can shut everything down, including the removal of containers and networks with the following command:

docker-compose -f "docker-compose.yaml" down
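After docker-compose up -d you may want to wait until a service actually reports healthy before using it. Here is a small helper sketch: the docker inspect format string is real docker CLI syntax, but the wait_healthy function and its usage are my own illustration:

```shell
# Poll a status command until it prints "healthy", up to N retries.
# Usage: wait_healthy <retries> <command...>
wait_healthy() {
  retries=$1; shift
  i=0
  while [ "$i" -lt "$retries" ]; do
    status=$("$@")
    if [ "$status" = "healthy" ]; then
      echo "healthy"
      return 0
    fi
    i=$((i + 1))
    sleep "${WAIT_INTERVAL:-5}"
  done
  echo "timed out" >&2
  return 1
}

# e.g.: wait_healthy 30 docker inspect --format '{{.State.Health.Status}}' mgnlauthor
```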

Docker Swarm or Kubernetes

The next and final step is the orchestration of these containers, managing the auto-scaling and recovery of containers in clusters of nodes. For this, other tools like Docker Swarm or Kubernetes are needed. 

The good news is that Docker Compose is fully compatible with Docker Swarm, so only a few more steps are needed. For Kubernetes, the docker-compose file can't be reused as-is, but the structure and concepts are very similar, so it shouldn't take much effort.

Edit: There's a tool called kompose which translates docker-compose files into files ready to be used by Kubernetes :)


5/24/2020

Trying out Docker, Magnolia and Postgres

Posted by Edwin Guilbert

Deploying without a DB

This post is a follow-up of a previous post explaining how to deploy Magnolia CMS as a docker container using Debian slim, OpenJDK and Tomcat.

Although this is a very lightweight and simple setup, since you only need to worry about one container per Magnolia instance, the data storage is file-system based, which might be fine for your public/disposable instances but is definitely not a good choice for the author instance.

Deploying with a DB

The author acts as the master of contents, where all created versions are stored. It needs more robust storage with features like data integrity, concurrency, performance and disaster recovery, which is just what an RDBMS can offer.

Magnolia is officially compatible with MySQL, Oracle and PostgreSQL, so we can pick any of the official docker images they offer. For this post we will use Postgres.

Why Postgres? 


Well, it is open source, which brings many benefits, so this leaves Oracle out.

MySQL is the most popular open source database out there, and probably the most widely used with Magnolia. Postgres, on the other hand, is "The World's Most Advanced Open Source Relational Database" according to its website.

In our case with docker deployments, we want to store everything in the DB, including Magnolia's datastore, which means we will make extensive use of BLOBs, both for reading and for storing. MySQL is historically known for its lack of performance in this kind of scenario, and even in the latest versions it is still something to watch out for, especially when using the InnoDB storage engine and its buffer pool. So, in short, for performance reasons we'll pick Postgres.

Let's take a look at how to run the official postgres docker image:
docker run --rm -d \
    --name mgnlauthor-postgres \
    --network mgnlnet \
    -e POSTGRES_USER=magnolia \
    -e POSTGRES_PASSWORD=mysecretpassword \
    -e POSTGRES_DB=magnolia \
    -e PGDATA=/var/lib/postgresql/data/pgdata \
    -v /Users/ebguilbert/docker/pgdata-author:/var/lib/postgresql/data postgres:12
Reviewing the options provided:
  • The network mgnlnet is going to be used by the magnolia containers (mgnlauthor and mgnlpublic), which can reach the database as mgnlauthor-postgres.
  • Database credentials are provided as environment variables. These credentials are going to be used later on by the magnolia containers.
  • The PGDATA folder is linked to a local folder on the host. This is required to preserve the data stored by the DB even after the container is stopped or recreated.
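One suggested precaution (my addition, not a step from the original post): create the host folders backing PGDATA up front, so their ownership is yours rather than whatever docker assigns when it creates them:

```shell
# Host folders matching the -v flags of the two postgres containers
mkdir -p "$HOME/docker/pgdata-author" "$HOME/docker/pgdata-public"
```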
Since we are going to have at least two Magnolia instances running, let's start a second Postgres container for the public instance, changing the name, credentials and local data volume:

docker run --rm -d \
    --name mgnlpublic-postgres \
    --network mgnlnet \
    -e POSTGRES_USER=magnolia \
    -e POSTGRES_PASSWORD=mysecretpassword \
    -e POSTGRES_DB=magnolia \
    -e PGDATA=/var/lib/postgresql/data/pgdata \
    -v /Users/ebguilbert/docker/pgdata-public:/var/lib/postgresql/data postgres:12

Notice we are using the same network created in the previous blog post. If you haven't created it yet, you need to do it before running the image:

docker network create --subnet=192.168.42.0/24 mgnlnet


Magnolia-Postgres Image

In order to have Magnolia configured with a database, we need to enhance the Dockerfile from the previous post with the postgres JDBC lib and the DB connection params for Tomcat. We also need to copy our own war file containing the custom DB-Magnolia config (covered in detail in the next section):

# Tomcat debian-slim image (any official image would do)
FROM ebguilbert/tomcat-slim:9

LABEL maintainer="Edwin Guilbert"

# ENV variables for Magnolia
ENV MGNL_VERSION 6.2.1
ENV MGNL_APP_DIR /opt/magnolia
ENV MGNL_REPOSITORIES_DIR ${MGNL_APP_DIR}/repositories
ENV MGNL_LOGS_DIR ${MGNL_APP_DIR}/logs
ENV MGNL_RESOURCES_DIR ${MGNL_APP_DIR}/light-modules
ENV JDBC_VERSION=postgresql-42.2.12

# ARGS
ARG MGNL_AUTHOR=true
ARG MGNL_WAR_PATH=docker-bundle/docker-bundle-webapp/target/docker-bundle-webapp-6.2.1.war
ARG MGNL_HEAP=2048M
ARG MGNL_ENV=tomcat/setenv.sh
ARG JDBC_URL=https://jdbc.postgresql.org/download

# JVM PARAMS
ENV CATALINA_OPTS -Xms64M -Xmx${MGNL_HEAP} -Djava.awt.headless=true \
-Dmagnolia.bootstrap.authorInstance=${MGNL_AUTHOR} \
-Dmagnolia.repositories.home=${MGNL_REPOSITORIES_DIR} \
-Dmagnolia.author.key.location=${MGNL_APP_DIR}/magnolia-activation-keypair.properties \
-Dmagnolia.logs.dir=${MGNL_LOGS_DIR} \
-Dmagnolia.resources.dir=${MGNL_RESOURCES_DIR} \
-Dmagnolia.update.auto=true

# VOLUME for Magnolia
VOLUME [ "${MGNL_APP_DIR}" ]

# JDBC lib
RUN wget -q ${JDBC_URL}/${JDBC_VERSION}.jar -O $CATALINA_HOME/lib/${JDBC_VERSION}.jar

# Database runtime config
# - DB_ADDRESS
# - DB_PORT
# - DB_SCHEMA
# - DB_USERNAME
# - DB_PASSWORD
COPY ${MGNL_ENV} $CATALINA_HOME/bin/setenv.sh

# MGNL war
COPY ${MGNL_WAR_PATH} ${DEPLOYMENT_DIR}/ROOT.war

The dockerfile is self-explanatory, but it's worth pointing out the differences from the original version:
  • JDBC_VERSION is an environment variable containing the name of the jar file to be added to Tomcat libs.
  • JDBC_URL is an argument variable containing the URL used to download the JDBC jar from.
  • MGNL_WAR_PATH is an argument variable containing the path of the custom war to be copied/deployed to Tomcat. Note this variable replaces the MGNL_WAR in the original version.
  • MGNL_ENV is an argument variable containing the path of a setenv.sh file which is going to configure database credentials as env variables in Tomcat.
  • The last three lines download the JDBC jar and copy the setenv.sh and war file into Tomcat.
Let's build the new image for the author and public instances from the Dockerfile:

docker build -t ebguilbert/magnolia-cms-postgres:6.2.1-author --build-arg MGNL_AUTHOR=true .

docker build -t ebguilbert/magnolia-cms-postgres:6.2.1-public --build-arg MGNL_AUTHOR=false .

Note: If you check out the project from git and try to build with the src folders present, it will take a long time, since docker will send all the subfolders as build context even though only the compiled war file is needed. So it is strongly recommended to delete the src and target folders, keeping only the compiled war file, before building the image.
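An alternative to deleting folders by hand (a suggestion of mine, not from the original post) is a .dockerignore file next to the Dockerfile, so docker never sends those folders as build context in the first place:

```
# .dockerignore — keep sources and git history out of the build context
.git
**/src
```

Note the war file itself lives under a target folder, so target can't simply be excluded wholesale without re-include (!) rules.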

Magnolia Persistence Manager and Datastore

Magnolia uses JCR to store all its contents through a persistence manager that handles the persistent storage of content nodes and properties, with the exception of large binary values, which are handled by a datastore.

Why should I care?

Since we are using PostgreSQL to store the contents, we need to configure a postgres persistence manager for Magnolia, providing the credentials needed to connect to the database, and configure a datastore to store the large binary values.

It's important to note that we want to store everything in the DB, so this includes all the components of the persistence manager, like the datastore and the filesystem for versions and cache.

Why everything in the DB?

Since we are using Docker the idea is to have self-contained containers, so the storage should be handled by the postgres container and the server/app should be handled by the magnolia container, so they could be replaced and moved freely.

This persistence manager is configured by an XML file (jackrabbit-bundle-postgres-search.xml), which Magnolia usually loads from the folder "WEB-INF/config/repo-conf/".

Let's take a look at the relevant sections of the file:

Datasource

<DataSources>
  <DataSource name="magnolia">
    <param name="driver" value="org.postgresql.Driver" />
    <param name="url" value="jdbc:postgresql://${db.address}:${db.port}/${db.schema}" />
    <param name="user" value="${db.username}" />
    <param name="password" value="${db.password}" />
    <param name="databaseType" value="postgresql"/>
  </DataSource>
</DataSources>

This is where the DB credentials are set. All the needed params are configured by environment variables that we will pass as arguments to the running Magnolia container (we'll see how in the next section). This is why a setenv.sh file is needed.
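A minimal sketch of what that setenv.sh could look like, assuming the ${db.*} placeholders in the repository XML are resolved from JVM system properties (the exact resolution mechanism depends on your Magnolia configuration, so treat this as an illustration):

```shell
#!/bin/sh
# Hypothetical tomcat/setenv.sh: forward the container's DB_* env vars
# to the JVM as db.* system properties
CATALINA_OPTS="$CATALINA_OPTS \
 -Ddb.address=$DB_ADDRESS \
 -Ddb.port=$DB_PORT \
 -Ddb.schema=$DB_SCHEMA \
 -Ddb.username=$DB_USERNAME \
 -Ddb.password=$DB_PASSWORD"
export CATALINA_OPTS
```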

Filesystem

<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
  <param name="dataSourceName" value="magnolia"/>
  <param name="schemaObjectPrefix" value="fs_"/>
</FileSystem>

This is an interface that acts as a file system abstraction for storing the global repository state. Since we want to store everything in the DB, we are using a db-filesystem.

This db-filesystem configuration is also used for the workspace filesystem and the versioning filesystem for things like search indexes and versions.

Datastore

<DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
  <param name="dataSourceName" value="magnolia"/>
  <param name="schemaObjectPrefix" value="ds_"/>
</DataStore>

Normally all node and property data is stored in a persistence manager, but for large binaries the datastore is used. This is usually stored in the local-server storage but since we want to store everything in the DB, we are using a db-datastore.

SearchIndex

This section was added recently due to some confusion about the idea of storing everything in the DB. The search index is actually stored on the filesystem (the main idea of an index is to avoid querying the DB). This means JCR still stores files in the instance's filesystem:

# ls ${MGNL_REPOSITORIES_DIR}/magnolia/workspaces/website/
index workspace.xml

As you can see, only the index entries and the index configuration file (per workspace) are stored on the filesystem.

Magnolia-Postgres Container

Based on the image we just built, with the postgres persistence manager configured, let's run it in the same network as the database (the postgres container):

docker run --rm -d -p 8080:8080/tcp --mount source=mgnlauthor,target=/opt/magnolia \
--network mgnlnet --name mgnlauthor \
-e DB_ADDRESS=mgnlauthor-postgres \
-e DB_PORT=5432 \
-e DB_SCHEMA=magnolia \
-e DB_USERNAME=magnolia \
-e DB_PASSWORD=mysecretpassword \
ebguilbert/magnolia-cms-postgres:6.2.1-author

Looking at the params provided, we can see the database credentials being configured dynamically at run time.

Let's run a public container with the credentials of the mgnlpublic-postgres db container:

docker run --rm -d -p 8090:8080/tcp --mount source=mgnlpublic,target=/opt/magnolia \
--network mgnlnet --name mgnlpublic \
-e DB_ADDRESS=mgnlpublic-postgres \
-e DB_PORT=5432 \
-e DB_SCHEMA=magnolia \
-e DB_USERNAME=magnolia \
-e DB_PASSWORD=mysecretpassword \
ebguilbert/magnolia-cms-postgres:6.2.1-public

Note: We have used the same network and volumes created in the previous blog post. If you haven't created the volumes yet, you need to do it before running the images:
 
docker volume create mgnlauthor

docker volume create mgnlpublic


Multi-container docker Magnolia app

As you can see, the whole setup for Magnolia in docker with a DB involves many configurations and quite a few containers for the different databases and web apps. Things like DB password secrecy and container health checks (to relaunch public instances automatically) could be managed automatically by docker tools like Docker Compose. These improvements are covered in the following post.

As a general note, all the files involved in this post, including the source project for the custom Magnolia war configured for PostgreSQL are available in this git project.

5/03/2020

Trying out Docker with Magnolia CMS

Posted by Edwin Guilbert

Deploying old fashion way

When developing web apps, sooner or later you need to handle the deployment of your app. A web server is needed to run it, so you have to install one, which in some cases requires deep knowledge of the server you go for.

In the case of java web apps, like Magnolia CMS, there are hundreds of options to choose from. In this post we will use Tomcat to deploy Magnolia.

Why Tomcat?

Tomcat is a good candidate since it's very well adopted by the Java community and offers a good compromise between feature set and complexity (it doesn't support EARs, for example).

The idea of downloading, installing and configuring Tomcat over and over again every time you want to deploy and run your app can be cumbersome and tiring, especially when you also have to install a database and/or want to take advantage of cloud platforms and services.

Deploying container way


This is where containers like Docker can be helpful, since the only thing you need to do is run a Tomcat image, copy your WAR file there, and you are ready to go. The OS, libraries, tools, JDK, etc. have already been handled and set up for you. On top of that, you can move your environment to any cloud platform and it will run just as smoothly as in your on-premise/local one.

In the case of Magnolia, there are a couple of images already available out there. None of them is official, and they are somewhat outdated. In this post I will try to build yet another one, from scratch, so anyone can do the same with their own flavours of OS, server and deployment configuration.

Base image

The first step is to choose a base image with tomcat already installed and ready to deploy your web app. There are many images available in Docker Hub, including the official ones.

Just for fun (and also to feel a bit more in control of the environment) let's pick an OpenJDK image based on Debian slim version. 

Why the slim version? 

The idea of docker containers is to provide the minimal resources needed for a service in particular, so for Magnolia this would mean the minimum required to run Tomcat and deploy Magnolia on it. Debian has been experimenting with "providing a slimmer base (removing some extra files that are normally not necessary within containers, such as man pages and documentation)"

Why Debian?

Actually, my first attempt was to use an Alpine Linux image with OpenJDK, but I stumbled upon some issues with glibc libraries needed by Magnolia. Apparently there is a workaround in case someone wants to experiment with it :)

As for Debian, it is widely used in docker images and in general, so it's a safe bet.

Base Dockerfile

In docker, you need a dockerfile in order to build an image and run containers from it. The dockerfile describes everything needed to create and run a container that provides a service.

I created a dockerfile based on OpenJDK 8, since it's the minimum required by Magnolia, and then just downloaded Tomcat, installed it and exposed port 8080. You can check the image on Docker Hub under the name ebguilbert/tomcat-slim.

Note: I updated the dockerfile to use OpenJDK 11, but you can always check out the previous commit.

Magnolia Container


Defining the image

Once we have our base image with the OS, JDK and Tomcat already set up, the only thing needed is to copy a Magnolia war file, and Tomcat will automatically deploy it.

This might seem like a simple step but in fact there are a couple of things you want to take care of when creating your own Magnolia Docker image:

The base image

Taken from the previous section, let's use ebguilbert/tomcat-slim:9 (the tag 9 stands for the Tomcat 9.0.54 build):

FROM ebguilbert/tomcat-slim:9

Env variables


Environment variables are usually a good practice since they keep things organised in your dockerfile so anyone can check and modify relevant configuration at a glance. It's important to note that these variables will also be present in the running OS of the container so anyone can check them out by inspecting the image or by shell commands.

The first two variables are the version and the server to download the war from. These variables are only relevant for using a standard bundle, probably just for light development. In case you are building your own war, they are not needed anymore.

ENV MGNL_VERSION 6.2
ENV MGNL_SERVER https://files.magnolia-cms.com

Note: The version should be updated to the latest one, i.e. 6.2.x; the URL (files.magnolia-cms.com) is only accessible for the latest version.

The following variables are very important, first the app main configuration folder, then the JCR repositories folder, logs folder and lastly the light-modules folder.

ENV MGNL_APP_DIR /opt/magnolia
ENV MGNL_REPOSITORIES_DIR ${MGNL_APP_DIR}/repositories
ENV MGNL_LOGS_DIR ${MGNL_APP_DIR}/logs
ENV MGNL_RESOURCES_DIR ${MGNL_APP_DIR}/light-modules

Arguments

Env variables are baked into the image and can't be changed outside the dockerfile. But if you define variables as arguments, you can change them at build time, which means the same dockerfile can be used for different configurations/builds.

Let's define the type of instance, the name of the war to be downloaded and the heap memory as arguments. We will see in the next section how we could use them to build different images.

ARG MGNL_AUTHOR=true
ARG MGNL_WAR=magnolia-dx-core-webapp
ARG MGNL_HEAP=2048M

JVM args

Since we are building a Java environment, it is very relevant to specify the heap memory that Tomcat should use to run our app. In this case we are using 2GB (minimum required by Magnolia). The other arguments are just the env/arg variables described previously.


ENV CATALINA_OPTS -Xms64M -Xmx${MGNL_HEAP} -Djava.awt.headless=true \
-Dmagnolia.bootstrap.authorInstance=${MGNL_AUTHOR} \
-Dmagnolia.repositories.home=${MGNL_REPOSITORIES_DIR} \
-Dmagnolia.author.key.location=${MGNL_APP_DIR}/magnolia-activation-keypair.properties \
-Dmagnolia.logs.dir=${MGNL_LOGS_DIR} \
-Dmagnolia.resources.dir=${MGNL_RESOURCES_DIR} \
-Dmagnolia.update.auto=true

Volume

Volumes are a way to share files between the host and the container. Without volumes, whatever you store in your container will be lost when you stop it. In the case of Magnolia we need to persist in the host machine the app folder since it will contain configurations like private/public keys, indexes, logs and the file-based contents.

VOLUME [ "${MGNL_APP_DIR}" ]

The war file to deploy

The final step is to download the WAR file into Tomcat's webapps folder so it's deployed when the container starts.

RUN wget -q ${MGNL_SERVER}/${MGNL_WAR}-${MGNL_VERSION}.war -O ${DEPLOYMENT_DIR}/ROOT.war

As mentioned before, I am using a standard bundle downloaded from Magnolia's servers. If you are providing your own WAR file, change the instruction to something like:

COPY my-application.war $DEPLOYMENT_DIR

It's also worth noting that if you want to use the community edition bundle, you will need to change the MGNL_SERVER and MGNL_WAR variables.


Note: If the certificate for https://files.magnolia-cms.com is invalid, wget will fail. To work around that, add the --no-check-certificate parameter to the wget command.


Building the image

In Magnolia you usually need at least one author and one public instance, so let's build an image for each:

docker build -t ebguilbert/magnolia-cms:6.2-author --build-arg MGNL_AUTHOR=true .

docker build -t ebguilbert/magnolia-cms:6.2-public --build-arg MGNL_AUTHOR=false .
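Because an ENV declaration that references an ARG is resolved at build time, each image ends up with the instance type baked into CATALINA_OPTS. A quick way to verify this (a sketch, assuming the images were built with the commands above):

```shell
# Confirm the MGNL_AUTHOR build argument was baked into CATALINA_OPTS
docker image inspect ebguilbert/magnolia-cms:6.2-author \
  --format '{{ range .Config.Env }}{{ println . }}{{ end }}' \
  | grep 'authorInstance'
# The author image should show -Dmagnolia.bootstrap.authorInstance=true,
# the public image (6.2-public) should show ...authorInstance=false
```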

Running the image

Managing volumes

Before running the image we need to create a volume on the host for each instance/container. In Magnolia you usually need at least one author and one public instance, so let's create one volume for each:

docker volume create mgnlauthor

docker volume create mgnlpublic
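Named volumes live under Docker's own storage area on the host. If you ever need to locate the persisted Magnolia files (keys, indexes, logs), you can ask Docker where a volume is mounted — a sketch, assuming the volumes created above:

```shell
# Show where the named volume's data lives on the host
docker volume inspect mgnlauthor --format '{{ .Mountpoint }}'
# On a typical Linux install this prints something like:
#   /var/lib/docker/volumes/mgnlauthor/_data
```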

Managing network

A network is needed because the author instance must communicate with the public instances to publish content. If you don't specify a network when running the images, the containers land in the default bridge network, where each is assigned a dynamic IP and they can only reach each other by IP. That is not desirable, since we can't know in advance which IP each container will get (short of running docker inspect on it).

For easy connection we will address the containers by name instead. To do that we create our own user-defined network, providing an IP range so the containers are isolated from others:

docker network create --subnet=192.168.42.0/24 mgnlnet
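You can verify the subnet and, once containers are running, see which ones are attached to the network — a sketch using docker network inspect with Go templates:

```shell
# Verify the subnet assigned to the user-defined network
docker network inspect mgnlnet --format '{{ (index .IPAM.Config 0).Subnet }}'

# Once containers are running, list their names and IPs on this network
docker network inspect mgnlnet \
  --format '{{ range .Containers }}{{ .Name }} {{ .IPv4Address }}{{ println }}{{ end }}'
```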

Running containers

We are finally ready to run our author and public containers, using our volumes and network:

docker run --rm -d -p 8080:8080/tcp --mount source=mgnlauthor,target=/opt/magnolia \
--network mgnlnet --name mgnlauthor ebguilbert/magnolia-cms:6.2-author


docker run --rm -d -p 8090:8080/tcp --mount source=mgnlpublic,target=/opt/magnolia \
--network mgnlnet --name mgnlpublic ebguilbert/magnolia-cms:6.2-public

Let's take a moment to review the options provided:
  • Internally both containers expose port 8080; on the host, the author is mapped to 8080 and the public to 8090. You can map any host port you like as the first part of the -p option.
  • The volumes on the host will be linked to /opt/magnolia in the container.
  • Network mgnlnet is used by both containers where the author is called mgnlauthor and public is mgnlpublic.
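Once both containers are up, you can check from the host that each instance answers on its mapped port (Magnolia may take a minute or two to bootstrap on first start) — a sketch:

```shell
# Tail the author's log until Magnolia finishes bootstrapping
docker logs -f mgnlauthor

# From another terminal, check both instances answer on their mapped ports
curl -sI http://localhost:8080/ | head -n 1   # author
curl -sI http://localhost:8090/ | head -n 1   # public
```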

Publishing content

The final step for a working Magnolia installation is to register the public instance as a receiver on the author instance. We can use the container name on our network for that. In the author instance, configure the receiver in the publishing-core module:


* publishing-core
    * config
        * receivers
            * mgnlpublic
                * url: http://mgnlpublic:8080
Notice we are using the container name as the host and the container port (8080), not the host port.
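Before testing publication, you can confirm that the author container actually reaches the public instance under its network name — a sketch run from inside the author container, assuming wget is available in the image (it was used during the build):

```shell
# Resolve and fetch the public instance from inside the author container
docker exec mgnlauthor wget -q -O /dev/null http://mgnlpublic:8080/ \
  && echo "public instance reachable"
```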

Final Dockerfile

FROM ebguilbert/tomcat-slim:9

LABEL maintainer="Edwin Guilbert"

# ENV variables for Magnolia
ENV MGNL_VERSION 6.2
ENV MGNL_SERVER https://files.magnolia-cms.com
ENV MGNL_APP_DIR /opt/magnolia
ENV MGNL_REPOSITORIES_DIR ${MGNL_APP_DIR}/repositories
ENV MGNL_LOGS_DIR ${MGNL_APP_DIR}/logs
ENV MGNL_RESOURCES_DIR ${MGNL_APP_DIR}/light-modules

# ARGS
ARG MGNL_AUTHOR=true
ARG MGNL_WAR=magnolia-dx-core-webapp
ARG MGNL_HEAP=2048M

# JVM PARAMS
ENV CATALINA_OPTS -Xms64M -Xmx${MGNL_HEAP} -Djava.awt.headless=true \
-Dmagnolia.bootstrap.authorInstance=${MGNL_AUTHOR} \
-Dmagnolia.repositories.home=${MGNL_REPOSITORIES_DIR} \
-Dmagnolia.author.key.location=${MGNL_APP_DIR}/magnolia-activation-keypair.properties \
-Dmagnolia.logs.dir=${MGNL_LOGS_DIR} \
-Dmagnolia.resources.dir=${MGNL_RESOURCES_DIR} \
-Dmagnolia.update.auto=true

# VOLUME for Magnolia
VOLUME [ "${MGNL_APP_DIR}" ]

RUN wget -q ${MGNL_SERVER}/${MGNL_WAR}-${MGNL_VERSION}.war -O ${DEPLOYMENT_DIR}/ROOT.war

Note: The version of Magnolia should be updated to the latest release, e.g. 6.2.x; the URL (files.magnolia-cms.com) only serves the latest version. Also, if the URL's certificate is invalid, try adding --no-check-certificate to wget.

Database

Since this post has already grown quite long, let's cover best practices for configuring a dockerized database in the next post.