Elasticsearch Index Publishing

The Web UI Search Screen offers a modern faceted search experience. This requires STEP data to be published to an Elasticsearch cluster that is accessible from the STEP application server. For the end user, the Search Screen provides search results that can be refined and modified as needed.

Note: Elasticsearch is required to use the search functionality in the Instrument UI.

Important:  

  • Before starting the configuration outlined in this topic, contact Stibo Systems to activate Elasticsearch.

  • To ensure current STEP data is displayed on the Web UI faceted Search Screen, verify that the event processor indicated by the active Elasticsearch configuration is enabled and is set to Read Events.

  • Before the Elasticsearch event processor can publish incremental updates to Elasticsearch, the Elasticsearch configuration must be initialized by running reindex. Refer to the Reindex the Elasticsearch Database section below.

  • Calculated attributes have a noticeable negative effect on publishing performance. It is recommended to avoid publishing calculated attributes and to instead use a business action to set the values prior to publishing. When it is not possible to avoid calculated attributes completely, publish no more than five (5) calculated attributes to reduce the performance impact.

  • Only one reindexing process can run at a time. If the user tries to start a new reindex process, the message 'Reindexing is already in progress' will display.
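The single-reindex rule in the last bullet can be pictured as a non-blocking lock. The sketch below is only an illustration of that behavior, not how STEP implements it; the class `ReindexGuard` and its methods are hypothetical names, and only the rejection message comes from the text above.

```python
import threading


class ReindexGuard:
    """Hypothetical helper: allows at most one reindex run at a time."""

    def __init__(self):
        self._lock = threading.Lock()

    def start(self):
        # Non-blocking acquire: a second caller is rejected, not queued.
        if not self._lock.acquire(blocking=False):
            return "Reindexing is already in progress"
        return "Reindexing started"

    def finish(self):
        # Free the slot so a later reindex can start.
        self._lock.release()
```

A second call to `start()` before `finish()` returns the 'already in progress' message instead of starting another run.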

How it Works

After configuring Elasticsearch using the steps defined in the Elasticsearch Quick Start topic and reindexing, the Reindexing Controlling background process gathers and prepares the data specified by the object type for extraction and begins the indexing process on the Elasticsearch server. It uses a parallel framework that delegates the extraction, transformation, and loading (ETL) processes to multiple Elasticsearch Indexing background processes that run concurrently.

The Elasticsearch Event Processor polls for new incremental updates every 10 seconds and creates up to 5 Elasticsearch Indexing background processes that may run concurrently.
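The polling behavior described above can be sketched as a loop that checks for events at a fixed interval and hands each batch to a pool capped at five workers. This is a simplified model for illustration only: `run_poll_cycles`, `fetch_events`, and `index_batch` are hypothetical names, and the actual STEP scheduling is handled by the background process framework, not by a thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor

POLL_INTERVAL_SECONDS = 10   # polling interval stated in the text above
MAX_INDEXING_PROCESSES = 5   # concurrency cap stated in the text above


def run_poll_cycles(fetch_events, index_batch, cycles, sleep=time.sleep):
    """Poll `cycles` times and index every batch found on at most 5 workers.

    fetch_events: hypothetical callable returning a list of event batches.
    index_batch:  hypothetical callable that indexes one batch.
    """
    with ThreadPoolExecutor(max_workers=MAX_INDEXING_PROCESSES) as pool:
        futures = []
        for cycle in range(cycles):
            # Each batch found in this poll becomes one indexing task.
            for batch in fetch_events():
                futures.append(pool.submit(index_batch, batch))
            # Wait for the next poll, except after the final cycle.
            if cycle < cycles - 1:
                sleep(POLL_INTERVAL_SECONDS)
        # Collect results in submission order.
        return [future.result() for future in futures]
```

Passing `sleep` as a parameter keeps the sketch testable without waiting for real 10-second intervals.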

Elasticsearch Event Processor

The Elasticsearch event processor is only used to process incremental update events. This allows updates to the data to occur while reindexing is happening. How the Elasticsearch event processor is managed and maintained is described below.

Elasticsearch Event Processor Performance

For the best Elasticsearch performance, configure the following:

  1. On the event processor associated with the active Elasticsearch configuration:

    • Enabled parameter is 'Yes'

    • Event Mode parameter is 'Standard' (Setting to 'Duplicate' may slow down performance.)

    • Queue Status parameter is 'Read Events'

  2. On the Elasticsearch Configuration Type object, click the Reindexing tab and verify the last indexing attempt was successful. If there are any failed attempts, review the Execution Report and resolve any issues identified.
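The parameter values in step 1 can be kept as a small checklist. The sketch below compares a set of values against those recommendations; it assumes the values have been copied by hand from the event processor into a dictionary, since no STEP export format or API is implied here, and `find_deviations` is a hypothetical helper name.

```python
# Recommended values taken from step 1 above.
RECOMMENDED = {
    "Enabled": "Yes",
    "Event Mode": "Standard",
    "Queue Status": "Read Events",
}


def find_deviations(settings):
    """Return {parameter: (actual, recommended)} for every mismatch.

    `settings` is a hand-filled dict of parameter names to values;
    a missing parameter is reported with an actual value of None.
    """
    return {
        name: (settings.get(name), wanted)
        for name, wanted in RECOMMENDED.items()
        if settings.get(name) != wanted
    }
```

An empty result means all three parameters match the recommended values.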

Monitoring Elasticsearch Event Processor

The Elasticsearch event processor is used to process incremental update events. It polls for new events every 10 seconds and when an event is found, an Elasticsearch Indexing background process is created.

  1. To view Elasticsearch Indexing background processes, on the Event Processor tab for the Elasticsearch event processor, navigate to the 'Current Background Process Log' section.

  2. Choose a method to review the Elasticsearch Indexing background processes and the related execution report:

    • Click an Elasticsearch Indexing background process link to view an execution report for the selected background process.

    • Navigate to the Background Processes tab for the event processor. In the 'Id' column, hover over and click the Id link of the event processor background process. This opens the execution report for the Elasticsearch processor, where all of the Elasticsearch Indexing background processes can be viewed.

    • Review an Elasticsearch Indexing background process Execution Report and click a background process link to view the details.

    • Navigate to the BG Processes tab in the STEP Workbench. Refer to the Elasticsearch Indexing Background Processes section for additional information.

Reindex the Elasticsearch Database

Reindexing is necessary in the following situations:

  • After any change to an existing Elasticsearch configuration

  • After creating a new Elasticsearch configuration

  • After any change to the data (for example, adding or removing a business condition)

  • Before the Elasticsearch event processor can publish incremental updates to Elasticsearch

  • When configuring the objects that should trigger the Elasticsearch processor. Refer to the Elasticsearch Indexer Processing Plugin Parameters and Triggers for more information.

While reindexing the database, Elasticsearch is still available to the end user.

Reindexing an Elasticsearch Configuration

Indexing is required both for a new Elasticsearch configuration and for an existing Elasticsearch configuration that has been modified.

Note: If an additional Elasticsearch Configuration has been newly created but was not reindexed upon completing the wizard, initiate a manual reindex by following these steps.

  1. Choose a method to start reindexing:

    • Navigate to the Reindexing tab on the configuration and under the Reindexing Process table, click the 'Reindex' link.

    • Alternatively, right-click the configuration node from System Setup and select 'Reindex' from the context menu.

      Note: After the initial Elasticsearch configuration on a new STEP system, the following message may display: 'Elasticsearch is initializing. Try again in 5 minutes. If this problem persists for more than 1 hour, contact Stibo Systems Support'. The reindex attempt triggers the Elasticsearch server to turn on. Try again after 5 minutes, and contact Stibo Systems Support if the message persists.

  2. In the Reindexing window, set the desired Start:

    • Select 'Now' and then click 'Reindex now' to begin the process immediately.

    • Select 'Later' to schedule the reindexing for later, and set a time for the process to initiate, then click 'Schedule reindexing'.

    Note: To help balance the strain on the system, it is recommended to schedule reindexing for a later time if many background processes are currently running.

    Once reindexing has been initiated, the reindexing controlling background process is created.

  3. Navigate to the background process beneath the 'Reindexing Process' section to manage the background process as required.

Monitoring Reindex Processes

The Reindexing tab on an Elasticsearch configuration provides a list of current and previous reindex events. Details of the background processes are also available from the BG Processes tab in the STEP Workbench. Refer to the Elasticsearch Indexing Background Processes section for additional information.

  1. To view the details of the current and previous reindexing processes, select the background process to monitor by hovering over and clicking the ID link in either the 'Reindexing Process' or 'Reindexing Process History' sections. The background processes found on this screen are the controlling background processes.

    In the 'Indexing Subprocesses' section, the individual reindexing background subprocesses for the controlling reindexing background process are displayed. For details on the subprocess, hover over and select the ID link to display the Background Processes tab for the subprocess.

  2. On the Background Process tab for the selected background process, review the Properties and the Execution Report. The progress of reindexing can be tracked with the progress bar located under the 'Progress' parameter. The execution report is a full summary of what occurred during reindexing.

Republishing Smaller Sets of Objects to Elasticsearch

If a subset of objects can be identified by search criteria, it is possible to reindex only those objects by republishing events on those specific objects to the Elasticsearch event processor instead of rebuilding the entire index.

The following methods can be used to republish a smaller set of objects:

  • Bulk update using a Send Republish event operation. Refer to the Send Republish Event Operation topic in the Bulk Updates documentation for more information.

  • Republishing a collection. Refer to the Maintaining Collections topic in the Getting Started documentation for more information.

Unlike the Reindex process, these methods cannot be canceled.

Important: For reindexing the majority of the indexed objects, it is highly recommended to use the Reindex option rather than republishing.
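The choice between republishing and a full reindex can be expressed as a simple rule of thumb. The sketch below reads 'the majority' as more than half of the indexed objects; that threshold is an interpretation for illustration, not a documented limit, and `choose_publish_method` is a hypothetical helper name.

```python
def choose_publish_method(objects_to_refresh, objects_in_index):
    """Suggest 'reindex' when more than half of the index needs refreshing.

    The more-than-half threshold is an assumption based on the word
    'majority' in the recommendation above.
    """
    if objects_in_index <= 0:
        raise ValueError("objects_in_index must be positive")
    # Integer comparison avoids floating point rounding at the boundary.
    if objects_to_refresh * 2 > objects_in_index:
        return "reindex"
    return "republish"
```

For example, refreshing 600 of 1,000 indexed objects suggests a reindex, while refreshing 10 of 1,000 suggests republishing.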

Republish to the Elasticsearch Database

An alternative to reindexing for publishing data to Elasticsearch is to use the event processor republish option:

  1. Generate events for the STEP data required in the Web UI Search Screen using the republish action as defined in the Event-Based OIEP Event Actions topic in the Data Exchange documentation. The republish background process generates events for the configured products, assets, classifications, and/or entities.

  2. Publish STEP data to the Elasticsearch database by invoking the event processor running the Elasticsearch Configuration as defined in the Running an Event Processor topic in the System Setup documentation. The event processor background process creates indexes and publishes products, assets, classifications, and/or entities to the Elasticsearch database.

Elasticsearch Indexing Background Processes Scheduling

The number of concurrent running Elasticsearch Indexing background processes is controlled by the background process execution management. Elasticsearch Indexing background processes are created with priority 'Medium'. For more information, refer to the BGP Execution Management topic.

Legacy Background Process Queue Configuration

Priority of background processes does not apply to Legacy Background Process Queue Management (Multiple Queues).

To maximize the throughput of the Elasticsearch Index publishing process, the queue size should be set to 5. For additional information, refer to the Default Configuration for Legacy BGP Queues topic.

Important: Changing the queue size of the Legacy Background Process Queues may impact other background processing queues running on the STEP system. If you choose to implement the legacy Multiple Queues BGP execution option, perform thorough testing in lower systems of similar sizing before running in production.

Elasticsearch Indexing Background Processes

The background processes for both the controlling reindex background process and its subprocesses can be viewed from the BG Processes tab. The Elasticsearch node contains the reindex subprocesses and the event processor background processes, while the Re-index node contains the controlling reindex background processes.

Navigate to the BG Processes tab in the STEP Workbench and expand either the Elasticsearch node or the Re-index node. Select any of the process phases to review the progress of the background processes.

Note: Additional information about the BG Processes tab can be found in the BG Processes Tab topic in the Getting Started documentation.

Elasticsearch Server Indexing Capacity (On-Premises Installations)

For on-premises installations, note that each Elasticsearch Indexing background process (a maximum of 5 run concurrently) makes concurrent batch requests to the Elasticsearch server. The size of each batch request, as well as the actual number of concurrent batch requests, is dynamically adjusted so that no more than half of the Elasticsearch server ‘Index pressure memory’ is consumed. This ensures that the indexing pressure from STEP does not exhaust the Elasticsearch server.
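To make the 'no more than half' rule concrete, the arithmetic can be sketched as follows. This is a simplified model and not the adjustment algorithm STEP actually uses; the function name, its parameters, and the even split of the budget across requests are all assumptions made for illustration.

```python
def plan_batch_size(limit_bytes, in_flight_bytes, avg_doc_bytes, concurrent_requests):
    """Return how many documents each batch request may carry.

    limit_bytes:         indexing pressure memory limit of the server (assumed known)
    in_flight_bytes:     bytes already held by requests in progress
    avg_doc_bytes:       estimated size of one document
    concurrent_requests: number of batch requests sent at the same time
    """
    if avg_doc_bytes <= 0:
        raise ValueError("avg_doc_bytes must be positive")
    # Only half of the limit may be consumed, minus what is already in use.
    budget = limit_bytes // 2 - in_flight_bytes
    if budget <= 0 or concurrent_requests <= 0:
        return 0
    # Assumption: the remaining budget is split evenly across requests.
    per_request_bytes = budget // concurrent_requests
    return per_request_bytes // avg_doc_bytes
```

With a 100 MB limit, 10 MB in flight, 5 concurrent requests, and 4 KB documents, the budget is 40 MB, or 8 MB per request, which allows 2,048 documents per batch.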

Elasticsearch Troubleshooting

Some of the most common errors encountered when troubleshooting Elasticsearch are listed below.

Failure Handling

Elasticsearch Outages

If the Elasticsearch server is unavailable, the Elasticsearch event processor will display a ‘Failed (retrying)’ state. The Elasticsearch indexing background processes will continue to retry. By default, the status of the Elasticsearch indexing background processes switches between a ‘Running’ and ‘Waiting’ state every 30 seconds. The ‘Waiting’ state will show as ‘Suspended’ in the logs and the ‘Running’ state will continue to try and run in a ‘Failed (retrying)’ state.

Once the issue with the Elasticsearch server has been resolved, the status will show as ‘Running’ and indexing will resume where it left off.
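The retry-and-resume behavior during an outage can be modeled as a loop that keeps its position while waiting. The sketch below is an illustration only: `index_with_retry` and `send` are hypothetical names, the state labels are borrowed from the text above purely as markers, and a Python `ConnectionError` stands in for whatever failure STEP detects.

```python
import time

RETRY_INTERVAL_SECONDS = 30  # default switch interval stated in the text above


def index_with_retry(documents, send, sleep=time.sleep):
    """Send documents in order; after an outage, resume at the same document.

    send: hypothetical callable that raises ConnectionError while the
          server is unavailable.
    Returns the sequence of state labels the loop passed through.
    """
    states = []
    position = 0
    while position < len(documents):
        try:
            send(documents[position])
        except ConnectionError:
            # Server unavailable: record the failure, wait, then retry
            # the same document so nothing is skipped or repeated.
            states.append("Failed (retrying)")
            states.append("Waiting")
            sleep(RETRY_INTERVAL_SECONDS)
            continue
        states.append("Running")
        position += 1
    return states
```

Because `position` only advances after a successful send, indexing continues from where it stopped once the server is reachable again.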

Error in Background Process

If the Elasticsearch server is not configured correctly (for example, an incompatible schema or an incompatible Elasticsearch server version), or if an internal STEP error occurs, a 'Failed' status displays in the Execution Report. Once the error has been resolved, the controlling Reindexing background process can be resumed by clicking the Resume button. Indexing background processes generated by the event processor reattempt at the next incremental update.

Cancellation of Controlling Reindexing Background Process

Canceling a reindex event causes both controlling and subprocess background processes to be aborted with no possibility to resume. The Cancel button will only be available if a cancellation is not already in progress.

  1. From the Elasticsearch configuration Reindexing tab under ‘Reindexing Process’, click the Cancel button.

  2. Choose an option:

    • To stop the cancellation process, click the 'Do not cancel reindexing' button.

    • To proceed with the cancellation, click the 'Cancel reindexing' button.

      All background processes that are running or waiting are moved to the ‘aborted’ state. The Execution Report will look similar to the below:

      Controlling Indexing background process:

      Indexing Subprocess: