Indexes and metadata

Introduction

Indexing is the act of making object data searchable. Plone stores the available indexes in the database. You can create them through the web and inspect existing indexes on the Indexes tab of portal_catalog.

Indexes and metadata

portal_catalog keeps a copy of a subset of each object's field values and makes them searchable.

  • Indexes make content searchable: Indexes are stored values which are used to match queries. Indexed values may be preprocessed to make matching possible. For example, full-text indexes run incoming text through splitters and similar filters to generate quickly searchable data.
  • Metadata make content summarizable: Metadata, also known as columns, are stored values which can be displayed to the user with the search hit. They usually copy the field value as is.

A metadata column can exist without a corresponding index, and vice versa.
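To illustrate the split between indexes and metadata, here is a deliberately simplified, pure-Python model of a catalog. It is not how ZCatalog is implemented: the ToyCatalog class and its methods are hypothetical. Indexes map values to record ids and are used only for matching, while metadata is a per-record copy of selected values that is returned with each hit.

```python
class ToyCatalog:
    """Hypothetical, simplified catalog. This is NOT the ZCatalog API."""

    def __init__(self, index_names, metadata_names):
        # Indexes: index name -> {value -> set of record ids}; used for matching
        self.indexes = {name: {} for name in index_names}
        # Metadata: record id -> {column -> value}; returned with each hit
        self.metadata_names = list(metadata_names)
        self.metadata = {}

    def catalog_object(self, rid, obj):
        # Fill each index from the object's value, if it has one
        for name, index in self.indexes.items():
            if name in obj:
                index.setdefault(obj[name], set()).add(rid)
        # Metadata is a plain copy of selected values, independent of indexes
        self.metadata[rid] = {name: obj.get(name) for name in self.metadata_names}

    def search(self, **query):
        # Intersect the record ids matched by every queried index;
        # querying a name that is not an index raises KeyError
        result = None
        for name, value in query.items():
            rids = self.indexes[name].get(value, set())
            result = rids if result is None else result & rids
        # Search hits carry only metadata, never the index contents
        return [self.metadata[rid] for rid in sorted(result or ())]


catalog = ToyCatalog(index_names=["portal_type"], metadata_names=["Title"])
catalog.catalog_object(1, {"portal_type": "Document", "Title": "Hello"})
catalog.catalog_object(2, {"portal_type": "News Item", "Title": "News"})

# portal_type is only an index: it can be queried but is not in the hit.
# Title is only a metadata column: it is displayed but cannot be queried.
print(catalog.search(portal_type="Document"))  # [{'Title': 'Hello'}]
```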

Viewing indexes and indexed data

Indexed data

You can view indexed data through the portal_catalog tool in the ZMI:

  • Click portal_catalog in the portal root
  • Click Catalog tab
  • Click any object

Indexes and metadata columns

Available indexes are stored in the database, not in Python code. To see which indexes your site has:

  • Click portal_catalog in the portal root
  • Click the Indexes and Metadata tabs

Creating an index

To perform queries on custom data, you first need to add a corresponding index to portal_catalog.

For example, if your content type has a field or method:

class MyContent(...):

    def getMyCustomValue(self):
        return 111

You can add a new index which will index the value of this field, so you can make queries based on it later.

  • Type: FieldIndex
  • Id: getMyCustomValue
  • Indexed attributes: getMyCustomValue

You can use Archetypes accessor methods directly as indexed attributes. You do not need to add a new method just to get something indexed, unless you want to use a different method name for a particular reason. If you want to create an index for a content type you do not control yourself, or if you want to apply custom logic in your indexer, see Custom index methods below.

The type of index you need depends on what kind of queries you need to run on the data. For example, direct value matching, ranged date queries and full-text search each need a different kind of index.

Objects are reindexed when they are edited. If you add a new index, you need to run Rebuild catalog to copy the existing values from content objects into the new index.
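Why a rebuild is needed can be modelled with a small pure-Python toy. RebuildableCatalog and its methods are hypothetical stand-ins, not the ZCatalog API: a newly added index starts empty, objects are only (re)indexed when they are edited, and only a rebuild walks all existing objects again.

```python
class RebuildableCatalog:
    """Hypothetical, simplified catalog. This is NOT the ZCatalog API."""

    def __init__(self):
        self.objects = {}  # record id -> content object (a dict here)
        self.indexes = {}  # index name -> {value -> set of record ids}

    def add_index(self, name):
        # A newly added index starts empty; existing objects are not scanned
        self.indexes[name] = {}

    def index_object(self, rid, obj):
        # Called when an object is edited: refresh its entries in every index
        self.objects[rid] = obj
        for name, index in self.indexes.items():
            for rids in index.values():
                rids.discard(rid)  # drop entries for the old value
            if name in obj:
                index.setdefault(obj[name], set()).add(rid)

    def rebuild(self):
        # Like "Rebuild catalog": clear every index and reindex all objects
        for index in self.indexes.values():
            index.clear()
        for rid, obj in list(self.objects.items()):
            self.index_object(rid, obj)

    def search(self, name, value):
        return sorted(self.indexes[name].get(value, set()))


catalog = RebuildableCatalog()
catalog.index_object(1, {"getMyCustomValue": 111})  # indexed before the index existed
catalog.add_index("getMyCustomValue")
print(catalog.search("getMyCustomValue", 111))  # [] (the new index is empty)
catalog.rebuild()
print(catalog.search("getMyCustomValue", 111))  # [1]
```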

Adding an index through Zope management interface

  • Go to the ZMI

  • Click portal_catalog

  • Click Indexes tab

  • In the top right corner, there is a drop-down menu for adding new indexes. Choose the index type you need.

  • Add a method to your content class code with the same name as the indexed attribute (e.g. getMyCustomValue)

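  • After this you can query portal_catalog. Reading the value from a result brain, as below, also requires getMyCustomValue to be added as a metadata column on the Metadata tab: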

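    from Products.CMFCore.utils import getToolByName

    catalog = getToolByName(context, "portal_catalog")
    my_brains = catalog(getMyCustomValue=111)
    for brain in my_brains:
        # Brains expose metadata columns as attributes
        print(brain.getMyCustomValue)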

Adding index using add-on product installer

This approach is repeatable: the index is created every time the add-on product is installed. It is more cumbersome to set up, however.
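A minimal sketch of the GenericSetup catalog.xml an add-on could ship for the getMyCustomValue example, assuming the add-on has a default profile. The XML is held in a Python string only so it can be checked with the standard library; in an add-on it would be the profile file itself (the path profiles/default/catalog.xml is an assumption). The helper functions are illustrative, not part of GenericSetup.

```python
import xml.etree.ElementTree as ET

# Contents of the add-on's catalog.xml: one FieldIndex plus a metadata column
CATALOG_XML = """<?xml version="1.0"?>
<object name="portal_catalog">
  <index name="getMyCustomValue" meta_type="FieldIndex">
    <indexed_attr value="getMyCustomValue"/>
  </index>
  <column value="getMyCustomValue"/>
</object>
"""


def declared_indexes(xml_text):
    """Return (name, meta_type) pairs for every <index> element."""
    root = ET.fromstring(xml_text)
    return [(el.get("name"), el.get("meta_type")) for el in root.findall("index")]


def declared_columns(xml_text):
    """Return the names of every metadata <column> element."""
    root = ET.fromstring(xml_text)
    return [el.get("value") for el in root.findall("column")]


print(declared_indexes(CATALOG_XML))  # [('getMyCustomValue', 'FieldIndex')]
print(declared_columns(CATALOG_XML))  # ['getMyCustomValue']
```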

Custom index methods

plone.indexer provides a way to create custom indexing functions. These functions can be retrofitted to content types you do not directly control, so you do not need to modify their class code.

Note

This method is available since Plone 3.3.

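import Missing

from plone.indexer.decorator import indexer
from zope.component import getUtility

# IConvergenceSupport and IConvergenceMediaFilter are interfaces defined by
# your own add-on; import them from wherever your package declares them.

# The indexer decorator only runs this function for objects that provide
# the given marker interface, so no providedBy() check is needed here
@indexer(IConvergenceSupport)
def getContentMedias(object):
    """ Provide indexing hooks for portal_catalog """

    schema = object.Schema()

    if "contentMedias" not in schema:
        # Missing.Value must be returned if the indexing
        # cannot be completed for the object
        return Missing.Value

    media_filter = getUtility(IConvergenceMediaFilter)
    return media_filter.getContentMedia(object)

# Register the function as a named adapter in ZCML; the adapter name is the
# attribute the catalog asks for (the module path here is an example):
# <adapter name="getContentMedias" factory=".indexers.getContentMedias" />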
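The dispatch idea behind the decorator (run the custom function only for objects providing a marker, otherwise fall back to the plain attribute) can be sketched in pure Python. This is a simplified model, not plone.indexer's implementation: toy_indexer, indexed_value, VideoPage and PlainPage are hypothetical, a plain base class stands in for a zope.interface marker, and the index name is passed to the decorator directly, whereas with plone.indexer it comes from the ZCML adapter name.

```python
class IConvergenceSupport(object):
    """Stand-in for a zope.interface marker interface (hypothetical)."""


# index name -> list of (marker, function); a toy registry, NOT plone.indexer's
_indexers = {}


def toy_indexer(marker, name):
    """Hypothetical decorator: register fn for objects providing `marker`."""
    def register(fn):
        _indexers.setdefault(name, []).append((marker, fn))
        return fn
    return register


def indexed_value(obj, name):
    # Run a registered indexer only if the object provides its marker...
    for marker, fn in _indexers.get(name, []):
        if isinstance(obj, marker):
            return fn(obj)
    # ...otherwise fall back to the plain attribute (AttributeError if none)
    value = getattr(obj, name)
    return value() if callable(value) else value


@toy_indexer(IConvergenceSupport, "getContentMedias")
def getContentMedias(obj):
    return sorted(obj.medias)


class VideoPage(IConvergenceSupport):
    medias = ["video", "audio"]


class PlainPage(object):
    pass


print(indexed_value(VideoPage(), "getContentMedias"))  # ['audio', 'video']
```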

Index types

Most of the portal_catalog index types used by Plone are defined by the Zope 2 product PluginIndexes; ZCTextIndex and ExtendedPathIndex come from separate products of the same names.

  • FieldIndex stores values as is
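  • DateIndex and DateRangeIndex store dates (Zope 2 DateTime objects) in searchable format. DateIndex supports exact and ranged queries on a single date; DateRangeIndex stores a start and an end date and finds objects whose range contains a given date.
  • KeywordIndex allows keyword-style look-ups (the query term is matched against all values of a stored list)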
  • ZCTextIndex is used for full text indexing
  • ExtendedPathIndex is used for indexing content object locations.
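The difference between FieldIndex and KeywordIndex matching can be shown with two toy inverted indexes in pure Python. These functions are hypothetical simplifications, not the PluginIndexes code: a field-style index keys on the whole stored value, while a keyword-style index keys on each item of a stored sequence.

```python
from collections import defaultdict


def build_field_index(objects, attr):
    """FieldIndex-like: map each whole value to the ids that have it."""
    index = defaultdict(set)
    for oid, obj in objects.items():
        index[obj[attr]].add(oid)
    return index


def build_keyword_index(objects, attr):
    """KeywordIndex-like: map each item of a sequence value to the ids containing it."""
    index = defaultdict(set)
    for oid, obj in objects.items():
        for keyword in obj[attr]:
            index[keyword].add(oid)
    return index


# Hypothetical content: Subject holds a tuple of keywords per object
objects = {
    1: {"Subject": ("plone", "python")},
    2: {"Subject": ("python",)},
}

field = build_field_index(objects, "Subject")
keyword = build_keyword_index(objects, "Subject")

print(sorted(field.get(("python",), set())))  # [2]: only an exact whole-value match
print(sorted(field.get("python", set())))     # []: a single term never matches
print(sorted(keyword.get("python", set())))   # [1, 2]: any item may match
```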

Default Plone indexes and metadata columns

Some interesting indexes

  • start and end: Calendar event timestamps, used to build the calendar portlet
  • sortable_title: Title provided for sorting
  • portal_type: Content type as it appears in portal_types
  • Type: Translated, human readable, type of the content
  • path: Where the object is (getPhysicalPath accessor method).
  • object_provides: The interfaces and marker interfaces the object provides. A KeywordIndex of full dotted interface names.
  • is_default_page: Whether the object is the default page of its folder. There is no is_default_page attribute on the object itself; the value comes from an indexer method in CMFPlone/CatalogTool.py, registered through plone.indexer, which calls ptool.isDefaultPage(obj).

Some interesting columns

  • getRemoteURL: Where to go when the object is clicked
  • getIcon: Which content type icon is used for this object in the navigation
  • exclude_from_nav: If True, the object does not appear in the sitemap or navigation tree

Indexing an object

Warning

Unit test warning: Plone usually reindexes modified objects at the end of each request (each transaction). If you modify an object yourself, for example in a unit test, you are responsible for notifying the related catalogs about the new object data.

An object is indexed by calling its reindexObject() method, which is defined in the ICatalogAware interface.

Plone calls reindexObject() if

  • The object is modified by the user using the standard edit forms

You must call reindexObject() if you

  • Directly call object field mutators
  • Otherwise directly change object data

reindexObject() takes an optional argument idxs listing the indexes to update. If idxs is not given, all indexes are updated, even those whose values did not change.

Example:

object.setTitle("Foobar")

# Call reindexObject() so that portal_catalog reflects the changed data.
# Without it, the new title would not show up in the navigation, because
# the navigation tree and folder listings pull the object title from the catalog.
# Only the listed indexes are updated; indexes derived from the title,
# such as sortable_title, stay stale unless they are listed too.

object.reindexObject(idxs=["Title"])

Also, if you modify security-related parameters (permissions), you need to call reindexObjectSecurity().
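The effect of passing idxs can be modelled with a pure-Python toy. SelectiveCatalog and Page are hypothetical and not the CMF or ZCatalog API, and sortable_title is simplified to a lower-cased title. The sketch shows that reindexing with idxs=["Title"] refreshes only that index, while a call without idxs recomputes everything.

```python
class SelectiveCatalog:
    """Hypothetical, simplified catalog. This is NOT the CMF/ZCatalog API."""

    def __init__(self, index_names):
        # index name -> {record id -> indexed value}
        self.indexes = {name: {} for name in index_names}

    def reindex_object(self, rid, obj, idxs=None):
        # With idxs, only the listed indexes are recomputed; without it,
        # every index is recomputed whether or not its value changed
        names = idxs if idxs else list(self.indexes)
        for name in names:
            self.indexes[name][rid] = getattr(obj, name)()


class Page(object):
    """Hypothetical content object with two title-derived indexed methods."""

    def __init__(self, title):
        self.title = title

    def Title(self):
        return self.title

    def sortable_title(self):
        return self.title.lower()  # simplification of the real sortable_title


catalog = SelectiveCatalog(["Title", "sortable_title"])
page = Page("Hello")
catalog.reindex_object(1, page)

page.title = "Foobar"
catalog.reindex_object(1, page, idxs=["Title"])
print(catalog.indexes["Title"][1])           # Foobar
print(catalog.indexes["sortable_title"][1])  # hello (stale: not listed in idxs)
```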

TextIndexNG3

TextIndexNG3 is an advanced text indexing solution for Zope.

Please read the TextIndexNG3 README.txt for how to add support for custom fields. Besides declaring the TextIndexNG3 index in GenericSetup XML, you need to provide a custom indexing adapter.

First, add the TextIndexNG3 index to catalog.xml. Example:

<index name="getYourFieldName" meta_type="TextIndexNG3">

  <field value="getYourFieldName"/>

  <autoexpand value="off"/>
  <autoexpand_limit value="4"/>
  <dedicated_storage value="False"/>
  <default_encoding value="utf-8"/>
  <index_unknown_languages value="True"/>
  <language value="en"/>
  <lexicon value="txng.lexicons.default"/>
  <query_parser value="txng.parsers.en"/>
  <ranking value="True"/>
  <splitter value="txng.splitters.simple"/>
  <splitter_additional_chars value="_-"/>
  <splitter_casefolding value="True"/>
  <storage value="txng.storages.term_frequencies"/>
  <use_normalizer value="False"/>
  <use_stemmer value="False"/>
  <use_stopwords value="False"/>
</index>

Then create an adapter which adds TextIndexNG3 indexing support for your custom fields. Example:

import logging

from Products.TextIndexNG3.adapters.cmf_adapters import CMFContentAdapter
from zope.component import adapts

logger = logging.getLogger("Plone")

class TextIndexNG3SearchAdapter(CMFContentAdapter):
    """ Adapter which provides custom field specific index information for TextIndexNG3
    """

    # Replace IDescriptionBase with your own content marker interface
    # (import it from wherever your add-on defines it)
    adapts(IDescriptionBase)

    def indexableContent(self, fields):
        """ Produce TextIndexNG3 indexing information for the object

        Traceback::

              ZCatalog.py(536)catalog_object()
            -> update_metadata=update_metadata)
              Catalog.py(360)catalogObject()
            -> blah = x.index_object(index, object, threshold)
              Products/TextIndexNG3/TextIndexNG3.py(91)index_object()
            -> result = self.index.index_object(obj, docid)
              Products/TextIndexNG3/src/textindexng/index.py(114)index_object()
            -> default_language=self.languages[0])
              Products/TextIndexNG3/src/textindexng/content.py(99)extract_content()
            -> icc = adapter.indexableContent(fields)
            > indexableContent()

        """
        logger.debug("Indexing %s", self.context)

        # Use superclass to construct generic field adapters (id, title, description, SearchableText)
        icc = CMFContentAdapter.indexableContent(self, fields)

        # These fields have their own TextIndexNG3 indexes which
        # are queried separately from SearchableText
        accessors = [ "getClassifications", "getOtherNames" ]

        for accessor in accessors:

            try:
                method = getattr(self.context, accessor)
            except AttributeError:
                logger.warning("Declared indexing for unsupported accessor: %s", accessor)
                continue

            value = method()

            # We might have a value which is not a real string,
            # but must be first stringified
            try:
                value = unicode(value)
            except UnicodeDecodeError:
                # The value holds non-ASCII bytes that the default codec
                # cannot decode; log the traceback and skip this field
                logger.exception("Failed to index field: %s", accessor)
                continue

            # Convert value to text format (utf-8) expected
            # by the indexer
            text = self._c(value)

            icc.addContent(accessor, text, self.language)

        return icc

Finally, register the adapter in your ZCML:

<adapter factory=".customcontent.TextIndexNG3SearchAdapter"/>