Elasticsearch Publishing

The Web UI Search Screen offers a modern faceted search experience. This requires STEP data to be published to an Elasticsearch cluster that is accessible from the STEP application servers. For the end user, the Search Screen provides search results that can be refined and modified on-the-fly.

Note: Elasticsearch can display product, classification, and/or asset data only, based on configuration.

Important: Before starting the configuration outlined in this topic, contact your Stibo Systems account manager or partner manager for assistance. Activation and configuration for the faceted Search screen, Elasticsearch, and corresponding components / functionality should not be done without the assistance of Stibo Systems.

Important:  

  • To ensure current STEP data is displayed on the Web UI faceted Search Screen, verify that the event processor indicated by the active Elasticsearch configuration is enabled, is set to Read Events, and is running on a reasonable schedule.

  • While reindexing is still in progress, the Search Screen shows data from the previously existing index.

  • The amount of data being published directly impacts the amount of time required for the events to be processed. When publishing large amounts of data, it is recommended to schedule the process during user down time, such as overnight or on the weekend.

  • Calculated attributes have a noticeable negative effect on publishing performance. It is recommended to avoid publishing calculated attributes and to instead use a business action to set the values prior to publishing. When it is not possible to avoid calculated attributes completely, publish no more than five (5) calculated attributes to reduce the performance impact.

Elasticsearch Indexes and Shards

Before you begin publishing to Elasticsearch, it is important to understand what comprises an Elasticsearch index. At the highest level, an Elasticsearch cluster is a group of one or more node instances that are connected together. Data in Elasticsearch is organized into indexes, each of which is divided into shards (at least one). Shards are required in order to distribute tasks, search, and index data across nodes in the cluster. Adding more nodes and shards allows for efficient management of larger amounts of data.

Shards come in two varieties: primary and replica. Primary shards hold the Elasticsearch data itself, and if you start another Elasticsearch instance in the same cluster, having a greater number of primary shards allows for more coverage across that cluster. Replica shards are copies of their primary shard counterparts, and are used to increase search performance and for fail-over. A replica shard is never allocated to the same node as its primary shard counterpart.
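
The allocation rule above can be illustrated with a small sketch. The function name `allocated_copies` is hypothetical, and the model covers only the rule that copies of the same shard never share a node; every other allocation factor is ignored.

```python
from typing import Tuple


def allocated_copies(nodes: int, primaries: int, replicas: int) -> Tuple[int, int]:
    """Return (assigned, unassigned) shard copies for a single index.

    Simplified model: each shard has 1 primary plus `replicas` copies,
    and no two copies of the same shard may live on the same node, so
    at most `nodes` copies of each shard can be assigned.
    """
    if nodes < 1 or primaries < 1 or replicas < 0:
        raise ValueError("nodes and primaries must be >= 1, replicas >= 0")
    copies_per_shard = 1 + replicas
    assigned_per_shard = min(copies_per_shard, nodes)
    assigned = primaries * assigned_per_shard
    unassigned = primaries * (copies_per_shard - assigned_per_shard)
    return assigned, unassigned


# One node, 2 primaries, 1 replica each: the replicas have nowhere to go.
print(allocated_copies(nodes=1, primaries=2, replicas=1))  # (2, 2)
# A second node gives every replica a home.
print(allocated_copies(nodes=2, primaries=2, replicas=1))  # (4, 0)
```

Under this model, enabling replicas on a single-node cluster does not add fail-over, because the replica copies cannot be placed.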

Note: By default, a cluster has one node, and each index consists of two primary shards. Replicas are disabled.

Important: There is a limit of 1000 shards per node.

Elasticsearch Event Processor Troubleshooting

Users may encounter a 'Maximum shards open' error. In this scenario, the execution log on the event processor includes an error message indicating that the maximum number of open shards has been reached.

Each index in Elasticsearch represents STEP data for a specific context and workspace combination, and is divided into one or more shards to protect against hardware failures. A single Elasticsearch configuration in workbench creates at least one index for each workspace / context pair in the Elasticsearch database. For example, if you wish to search in both the Main and Approved workspaces, and have a system with 20 contexts, each time an Elasticsearch configuration is published or reindexed, 40 indexes are created. If you set 2 primary shards and 1 replica per primary shard, 4 shards will represent a single index, resulting in 160 shards.
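
The arithmetic in this example can be reproduced for other setups. The following is a minimal sketch; the function name `shards_per_publish` and its parameter names are illustrative and not part of STEP or Elasticsearch.

```python
def shards_per_publish(workspaces, contexts, primaries, replicas_per_primary):
    """Return (indexes, shards) created by one publish or reindex.

    One index is assumed per workspace / context pair; each index has
    `primaries` primary shards plus `replicas_per_primary` copies of each.
    """
    indexes = workspaces * contexts
    shards_per_index = primaries * (1 + replicas_per_primary)
    return indexes, indexes * shards_per_index


# Main and Approved workspaces, 20 contexts, 2 primaries, 1 replica each.
indexes, shards = shards_per_publish(2, 20, 2, 1)
print(indexes, shards)  # 40 160
```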

For easy identification and management of indexes in the Index Management tab, each index is named with the same prefix as defined in the Elasticsearch configuration.

Creating a new Elasticsearch configuration and then publishing to Elasticsearch creates 2 indexes for each context (one for the main workspace and one for the approved workspace). For example, on a system with 20 contexts, each time an Elasticsearch configuration is published or reindexed, 40 indexes are created.

All indexes consume shards, so removing unused indexes increases the number of available shards. To manage the number of shards in the system, it is recommended that you take the following actions in the order listed:

  1. Use the Index Management tab to delete unneeded indexes.

  2. Add additional nodes to the cluster.

  3. Consider optimizing the number of shards via the "NumberOfReplicas" and "NumberOfShards" properties in sharedconfig.properties. The data must be reindexed after changing these properties.
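
To see how these actions affect the available shards, the per-node limit can be turned into a simple capacity check. This is a rough sketch, assuming the cluster-wide capacity is the number of nodes multiplied by the 1000-shard limit stated above; `can_publish` and the shard figures are hypothetical.

```python
def can_publish(nodes, shards_in_use, shards_needed, limit_per_node=1000):
    """Return True if `shards_needed` more shards fit under the limit.

    Assumes cluster capacity = nodes * limit_per_node.
    """
    capacity = nodes * limit_per_node
    return shards_in_use + shards_needed <= capacity


# One node already holding 960 shards; the next publish needs 160 more.
print(can_publish(nodes=1, shards_in_use=960, shards_needed=160))  # False
# Action 1: delete unneeded indexes, freeing 160 shards.
print(can_publish(nodes=1, shards_in_use=800, shards_needed=160))  # True
# Action 2: add a second node instead.
print(can_publish(nodes=2, shards_in_use=960, shards_needed=160))  # True
```

Lowering the number of shards or replicas per index (action 3) reduces `shards_needed` for every later publish, at the cost of a reindex.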

Monitoring and Managing Shards

A list of all indexes for a particular Elasticsearch configuration can be found in the Index Management tab of the configuration. Every time a configuration is indexed, the table in this tab will update, which provides the shard count, document count, and total storage space usage of each index.
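
Comparable figures can also be read from the cluster itself through Elasticsearch's `_cat/indices` API. The sketch below is not part of STEP: it assumes the cluster is reachable from where the script runs and requires no authentication, and the base URL and index prefix are placeholders to replace with your own values. The Index Management tab remains the place to manage indexes.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Columns requested from _cat/indices; JSON output returns them as strings.
COLUMNS = "index,pri,rep,docs.count,store.size"


def cat_indices_url(base_url, prefix):
    """Build the _cat/indices URL for all indexes starting with `prefix`."""
    return f"{base_url.rstrip('/')}/_cat/indices/{quote(prefix)}*?format=json&h={COLUMNS}"


def summarize(rows):
    """Convert _cat/indices rows into shard and document counts per index."""
    summary = []
    for row in rows:
        primaries = int(row["pri"])
        replicas = int(row["rep"])
        summary.append({
            "index": row["index"],
            "shards": primaries * (1 + replicas),
            "documents": int(row.get("docs.count") or 0),
            "size": row.get("store.size"),
        })
    return summary


def fetch_index_summary(base_url, prefix):
    """Query the cluster and summarize every index matching the prefix."""
    with urlopen(cat_indices_url(base_url, prefix)) as response:
        return summarize(json.load(response))


# Example call (placeholder URL and prefix):
# for entry in fetch_index_summary("http://localhost:9200", "myprefix"):
#     print(entry)
```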

Individual indexes can be deleted by clicking the 'X' at the end of the row or you can click 'Remove All' to delete every index in the configuration.

This tab is also available on the setup group level, allowing the user to manage the indexes of all Elasticsearch configurations from one screen.

Reindex the Elasticsearch Database

It may be necessary to periodically update indexes. Whenever a change is made to the configuration, the system will prompt you to reindex the data.

Additionally, if you create an entirely new Elasticsearch configuration when one or more already exist on the system, or make any other kind of change to the data that would require reindexing (e.g. add / remove a business condition or change object types), you will be prompted to reindex after finishing the wizard.

While reindexing the database, Elasticsearch is still available to the end user. Any searches made during this time will use the previous index until the reindexing process is complete.
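
The behavior described here can be pictured with a toy model. This is only an illustration of what the end user observes, not a description of how STEP implements reindexing; the class and index names are made up.

```python
class SearchIndexState:
    """Toy model: searches keep using the active index during a reindex."""

    def __init__(self, active):
        self.active = active      # index currently served to searches
        self.building = None      # index being built by a reindex, if any

    def start_reindex(self, new_index):
        self.building = new_index

    def index_for_search(self):
        # Searches never see the index that is still being built.
        return self.active

    def complete_reindex(self):
        if self.building is None:
            raise RuntimeError("no reindex in progress")
        self.active, self.building = self.building, None

    def cancel_reindex(self):
        # Cancelling leaves the previous index in place.
        self.building = None


state = SearchIndexState("index-old")
state.start_reindex("index-new")
print(state.index_for_search())  # index-old
state.complete_reindex()
print(state.index_for_search())  # index-new
```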

Reindex with Existing Configuration

  1. If reindexing is required for an existing Elasticsearch Configuration, navigate to the Reindexing tab on the configuration and under the Reindexing status flipper, click 'Reindex'. You can also right-click the configuration node from System Setup and select 'Reindex' from the context menu.

  2. In the Reindexing window that appears, select 'Now' and then click 'Reindex now' if you want to begin the process immediately. If you want to schedule the reindexing for later, select 'Later' and set a time for the process to initiate, then click 'Schedule reindexing'.

  3. Once reindexing has been initiated, a process generating events begins. You can cancel or reschedule the event process any time by navigating to the process beneath the 'Generating events processes' flipper and clicking either Cancel or Reschedule.

The actual reindexing of the data begins once the first event is read by the event processor, which can be monitored by navigating to the process beneath the 'Reindexing status' flipper. Click 'Cancel' if you want to stop the reindex process.

Note: If an additional Elasticsearch configuration has been newly created and you did not reindex upon completing the wizard, reindexing can be manually initiated by following the above steps.

Monitoring Reindex Processes

As described above, you can initiate a reindex process from the Reindexing tab on an Elasticsearch configuration. This screen also provides a list of current and previous reindex events.

Reindex Process Performance

If the Event Processor configured on the Elasticsearch configuration is disabled, or its Event Mode is set to 'Efficient', a warning will appear.

To address this potential issue, click 'Go to Event Processor' and follow the warning's suggestion.

Republish to the Elasticsearch Database

An alternative way to publish data to Elasticsearch is to use the event processor republish option:

  1. Generate events for the STEP data required in the Search Screen using the republish action as defined in the Event-Based OIEP Forward, Rewind, Purge, and Republish topic of the Data Exchange documentation. The republish background process generates events for the configured products, classifications, and/or assets.
  2. Publish STEP data to the Elasticsearch database by invoking the event processor running the Elasticsearch Configuration as defined in the Running an Event Processor topic of the System Setup documentation. The event processor background process creates indexes and publishes products, classifications, and/or assets to the Elasticsearch database.

Important: It is highly recommended to use the reindex solution rather than republishing.