Posts about Web Development, Java, Magnolia CMS and beyond

7/29/2016

Customize Lucene index for searching texts with accents (or any other special character)

7/29/2016 Posted by Edwin Guilbert
Applications that require full-text indexing and searching capabilities frequently use Lucene as their content retrieval library.

In content management systems it is usually used for searching content across websites. Magnolia CMS uses JCR for storing and managing all its data. JCR is implemented by Apache Jackrabbit, which by default uses Lucene for full-text indexing and searching.

The problem is that by default Lucene uses an English analyser, which means that when indexing it will not take into account special characters from other languages.

What does this mean?

It means that if you have content with the word "competición", the index will understand that "Competición", "COMPETICIÓN" and "competición" are all the same word, but if you search for "competicion" (notice there is no accent here) it will return no results.
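The mismatch is easy to reproduce outside Lucene. The following sketch uses only the Java standard library (`java.text.Normalizer`); the class name `AccentFoldingSketch` and the `fold` helper are made up for illustration and are not part of Lucene or Jackrabbit. It shows that ignoring case is not enough for "competicion" to match "Competición", and that removing accents before comparing makes both forms equal:

```java
import java.text.Normalizer;
import java.util.Locale;

// Illustration only: this is NOT Lucene code, just the idea of accent-insensitive matching.
class AccentFoldingSketch {

    // Decompose accented characters (NFD), drop the combining marks, then lowercase.
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "")
                .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String indexed = "Competici\u00f3n"; // "Competición", escaped to avoid encoding issues
        String query = "competicion";

        // Ignoring case alone does not help, because the accented and plain "o" differ.
        System.out.println(indexed.equalsIgnoreCase(query));   // prints false

        // After folding, both sides reduce to the same term.
        System.out.println(fold(indexed).equals(fold(query))); // prints true
    }
}
```

A language-aware analyser has to apply the same kind of normalisation both when indexing and when searching, otherwise the terms still would not match.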

How does Lucene solve this problem?

Lucene has something called analyzers, which contain the filters applied when indexing and searching. There are filters for lower/upper case words, stopwords (common words that are ignored) and stems (ways to detect the semantic root of words).
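To make the idea of a filter chain more concrete, here is a minimal sketch in plain Java. It does not use the Lucene API: the class `AnalyzerChainSketch`, its four-word stopword list and the "strip a trailing s" stemming rule are all assumptions chosen to keep the example short, and real analysers are far more elaborate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: a toy analysis chain, not Lucene's implementation.
class AnalyzerChainSketch {

    // Hypothetical, tiny stopword list; real analysers ship much longer ones.
    static final Set<String> STOPWORDS = Set.of("the", "a", "of", "and");

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("\\W+")) {           // 1. split into tokens
            String term = token.toLowerCase(Locale.ROOT);    // 2. lowercase filter
            if (term.isEmpty() || STOPWORDS.contains(term)) {
                continue;                                    // 3. stopword filter
            }
            if (term.length() > 3 && term.endsWith("s")) {
                term = term.substring(0, term.length() - 1); // 4. very crude stemming
            }
            terms.add(term);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Indexed text and query end up sharing the terms "rule" and "competition".
        System.out.println(analyze("The Rules of the Competitions")); // prints [rule, competition]
        System.out.println(analyze("competition rule"));              // prints [competition, rule]
    }
}
```

Because the same chain runs on the indexed text and on the query, differences in case, stopwords and plural forms no longer prevent a match.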

You can create your own analyser with more filters if you want to fine tune your search results. However, Lucene comes with a set of common analysers for specific languages.


In the case I mentioned above we want Spanish words to be analysed correctly, so we use the Spanish analyser, which in the case of Magnolia will be the one included in Lucene 3.6.0 (the version used in Jackrabbit 2.8.0).

org.apache.lucene.analysis.es.SpanishAnalyzer

We have to include the following library in Magnolia (which doesn't come by default):

<dependency>
 <groupId>org.apache.lucene</groupId>
 <artifactId>lucene-analyzers</artifactId>
 <version>3.6.0</version>
</dependency>

Then we have to configure this analyser in the Lucene configuration XML, which in Magnolia is located in:

WEB-INF/config/default/repositories.xml

The section you need to change is:

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex"> 

with the param:

<param name="analyzer" value="org.apache.lucene.analysis.es.SpanishAnalyzer"/> 

The whole section will look like this:

    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
      <param name="path" value="${wsp.home}/index"/>
      <!-- SearchIndex will get the indexing configuration from the classpath, if not found in the workspace home -->
      <param name="indexingConfiguration" value="/info/magnolia/jackrabbit/indexing_configuration.xml"/>
      <param name="analyzer" value="org.apache.lucene.analysis.es.SpanishAnalyzer"/>
      <param name="useCompoundFile" value="true" />
      <param name="minMergeDocs" value="100" />
      <param name="volatileIdleTime" value="3" />
      <param name="maxMergeDocs" value="100000" />
      <param name="mergeFactor" value="10" />
      <param name="maxFieldLength" value="10000" />
      <param name="bufferSize" value="10" />
      <param name="cacheSize" value="1000" />
      <param name="forceConsistencyCheck" value="false" />
      <param name="autoRepair" value="true" />
      <param name="queryClass" value="org.apache.jackrabbit.core.query.QueryImpl" />
      <param name="respectDocumentOrder" value="true" />
      <param name="resultFetchSize" value="100" />
      <param name="extractorPoolSize" value="3" />
      <param name="extractorTimeout" value="100" />
      <param name="extractorBackLogSize" value="100" />

    </SearchIndex>

You need to redeploy your webapp so that the indexes are recreated.

7/22/2016

A/B Testing webpages and google analytics experiments

7/22/2016 Posted by Edwin Guilbert
A/B testing has been a trend among web content editors and marketers in the last few years. It is no secret that high-performing webpages lead to more page visits, lower bounce rates and more conversions, which usually means more revenue.

However, before performing any comparison tests we need to decide what we want to achieve with a specific page or piece of content, and how to measure it.

We can measure how the webpage is performing through metrics, some examples are:
  • Calls to action (clicking on a button, playing a video).
  • Product purchases.
  • E-mail signups.
From these metrics you can calculate conversion rates, which state what share of visitors have turned into customers in a given time frame. Examples of conversions could be visitors buying products on an e-commerce website or readers clicking on advertisements on a blog.
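As a quick worked example, the conversion rate is simply the number of conversions divided by the number of visitors in the same period. The sketch below is illustrative only: the class name `ConversionRateSketch` and the figures (50 purchases out of 2000 visitors) are made up.

```java
// Illustration only: the numbers below are invented, not measured data.
class ConversionRateSketch {

    // Conversion rate as a percentage of visitors in the same time frame.
    static double conversionRate(long conversions, long visitors) {
        if (visitors <= 0) {
            return 0.0; // no traffic, so no meaningful rate
        }
        return 100.0 * conversions / visitors;
    }

    public static void main(String[] args) {
        // 50 purchases out of 2000 visitors in the chosen period.
        System.out.println(conversionRate(50, 2000) + " %"); // prints 2.5 %
    }
}
```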

You may be wondering: what does A/B testing have to do with any of this? Well, the answer is simple: with A/B testing you can improve your conversions by comparing different versions of the same page. The idea is to change key elements that might improve your conversion rate. You usually have an original page (the so-called control page) and a variation of it. Users get one of these versions at random, and after a period of time you can compare which version did better according to a goal (or conversion).
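The mechanics can be sketched in a few lines of Java. This is a simplified illustration under stated assumptions, not how Google Analytics Experiments or Magnolia actually distribute traffic: the class `AbSplitSketch`, the hash-based 50/50 split and the visitor ids are hypothetical, and the comparison ignores statistical significance.

```java
// Illustration only: a naive 50/50 split and a plain rate comparison.
class AbSplitSketch {

    enum Version { CONTROL, VARIATION }

    // Stable pseudo-random split: the same visitor id always gets the same version.
    static Version assign(String visitorId) {
        return Math.floorMod(visitorId.hashCode(), 2) == 0
                ? Version.CONTROL
                : Version.VARIATION;
    }

    // After the test period, pick the version with the higher conversion rate.
    static Version better(long controlConversions, long controlVisitors,
                          long variationConversions, long variationVisitors) {
        double controlRate = (double) controlConversions / controlVisitors;
        double variationRate = (double) variationConversions / variationVisitors;
        return variationRate > controlRate ? Version.VARIATION : Version.CONTROL;
    }

    public static void main(String[] args) {
        // A returning visitor keeps seeing the same version.
        System.out.println(assign("visitor-42") == assign("visitor-42")); // prints true
        // 30 of 1000 control visitors converted versus 45 of 1000 variation visitors.
        System.out.println(better(30, 1000, 45, 1000)); // prints VARIATION
    }
}
```

In practice the analytics tool decides when a difference is large enough to declare a winner, so a comparison like `better` is only a first approximation.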

But we are not here just to explain how A/B testing works; we actually want to do some A/B testing on a real web page with a concrete CMS and analytics software. So we will use Magnolia CMS for managing web pages and Google Analytics Experiments for recording and analysing statistics.

In Magnolia you can create variations of a page and edit them independently (this is only available in Enterprise editions).

We are going to work with the travel demo website, specifically with the "about" page, which contains a video. The goal here is to get this video played more often, so the metric we are going to use is the play event on this element of the DOM.

The original page looks like this:


The variation will have a wider video and no left-side links, which might distract visitors and keep them from playing the video (our goal in this case).


To create a variation of a page, you just have to select "Add page variant" in the pages app of Magnolia.


Open the variant created and update the video component:


After publishing the variant, you need to connect these pages to Google Analytics using a JavaScript snippet for tracking page views. You can embed this snippet with Marketing Tags in Magnolia (don't forget to include it in all your pages):


After publishing the snippet (or marketing tag in Magnolia) you need to send an event every time a user plays the video included in the pages we want to test, so we need to create another marketing tag for that:


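<script>
// Wait for the DOM so the video element exists before binding (assumes jQuery is loaded).
$(function () {
  // Send a Google Analytics event (category "video", action "play") each time the video starts.
  $(".video-wrapper video").on('play', function () {
    ga('send', 'event', 'video', 'play');
  });
});
</script>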

After publishing the snippet you need to actually record the number of play video events as conversions or "goals" in Google Analytics:




After the goal is configured, you might want to test it in the real time tab of Google Analytics, so every time you play a video in your page, an event gets registered as a conversion:



At this point you finally have everything ready to start A/B testing the page variants, using video plays as the goal. You can automatically create experiments (as Google calls them) in Magnolia using the A/B testing module:


As shown in the screenshot below, you can define the name of the experiment, the page and variants involved, the duration and the goal for the test. The goal is dynamically retrieved from the configured Google Analytics account:


The module will create a JavaScript snippet as a marketing tag, which you need to publish so that Google can direct users to one variant or the other:


After publishing the snippet you can already start the experiment in Google Analytics through the A/B testing module:


And that's all, folks. You can always check your statistics in the Behaviour tab of Google Analytics:


I hope you enjoyed my first post and found it useful. I will write a second part explaining how to do A/B testing manually without using the A/B testing and Personalisation modules (both need the Enterprise version of Magnolia).