Elasticsearch Index Publishing
The Web UI Search Screen offers a modern faceted search experience. This requires STEP data to be published to an Elasticsearch cluster that is accessible from the STEP application servers. For the end user, the Search Screen provides search results that can be refined and modified as needed.
Before you begin publishing to Elasticsearch, it is important to understand what comprises an Elasticsearch index. For more information, refer to the Elasticsearch Index Management topic.
Note: Elasticsearch needs to be set up to use the search functionality in the Instrument UI.
Note: Elasticsearch can display product, entity, classification, and/or asset data only, based on configuration.
Important: Before starting the configuration outlined in this topic, contact your Stibo Systems account manager to activate Elasticsearch.
Important:
- To ensure current STEP data is displayed on the Web UI faceted Search Screen, verify that the event processor indicated by the active Elasticsearch configuration is enabled and is set to Read Events.
- If reindexing is still in progress, you will find data related to the previously existing index.
- The amount of data being published directly impacts the amount of time required for the events to be processed. When publishing large amounts of data, it is recommended to schedule the process during user down time, such as overnight or on the weekend.
- Calculated attributes have a noticeable negative effect on publishing performance. It is recommended to avoid publishing calculated attributes and to instead use a business action to set the values prior to publishing. When it is not possible to avoid calculated attributes completely, publish no more than five (5) calculated attributes to reduce the performance impact.
- It is not possible to start a new reindexing process while there is already one in progress. If the user tries to start a new process, the message 'Reindexing is already in progress' will display.
How It Works
After configuring Elasticsearch using the steps defined in the Elasticsearch Quick Start topic, the Elasticsearch event processor gathers and prepares the data specified by the object type for extraction and begins the indexing process to the Elasticsearch server. It uses a parallel framework that delegates the extraction, transformation, and loading (ETL) processes to multiple Elasticsearch Indexing background processes that run concurrently.
The Elasticsearch Event Processor polls for new events every 10 seconds and creates up to 5 Elasticsearch Indexing background processes that may run concurrently.
The number of concurrently running Elasticsearch Indexing background processes is controlled by the background process execution management. Elasticsearch Indexing background processes are created with priority ‘Medium’. For more information, refer to the BG Processes Execution Management topic.
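The polling and concurrency limits described above can be pictured with a minimal sketch. It is illustrative only and is not STEP code: the function name, the notion of "pending work items", and the constants are assumptions made for the example. In a real system, the number of processes that actually run is also governed by the background process execution management.

```python
# Illustrative sketch only; these names are hypothetical and are not STEP APIs.
POLL_INTERVAL_SECONDS = 10    # polling interval stated in this topic
MAX_INDEXING_PROCESSES = 5    # concurrency cap stated in this topic


def processes_to_start(pending_work_items: int, running_processes: int,
                       max_processes: int = MAX_INDEXING_PROCESSES) -> int:
    """Return how many new indexing processes a single poll may create."""
    if pending_work_items < 0 or running_processes < 0:
        raise ValueError("counts must be non-negative")
    # Slots left under the cap; never negative, even if the cap was lowered.
    free_slots = max(0, max_processes - running_processes)
    return min(pending_work_items, free_slots)


if __name__ == "__main__":
    # Three example polls: (pending work items, processes still running)
    for pending, running in [(8, 0), (8, 3), (2, 5)]:
        started = processes_to_start(pending, running)
        print(f"pending={pending} running={running} -> start {started}")
```

The sketch treats each poll as a pure calculation so that the cap is easy to reason about: a poll can never start more processes than there are free slots under the limit of 5.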
Legacy Background Process Queue Configuration
Priority of background processes does not apply to Legacy Background Process Queue Management.
To maximize the throughput of the Elasticsearch Index publishing process, the queue size should be set to 5. For additional information, refer to the Default Configuration for Legacy Background Process Queues topic.
Important: Changing the queue size of the Legacy Background Process Queues may impact other background processing queues running on the STEP system. If you choose to implement, perform thorough testing in lower systems of similar sizing before running in production.
Elasticsearch server indexing capacity (On-Premises Installations)
For on-premises installations, note that each Elasticsearch Indexing background process (of which at most 5 run at once) makes concurrent batch requests to the Elasticsearch server. Both the size of each batch request and the number of concurrent batch requests are dynamically adjusted so that no more than half of the Elasticsearch server ‘Index pressure memory’ is consumed. This ensures that the indexing pressure from STEP does not exhaust the Elasticsearch server.
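As a rough worked example of the "no more than half" rule, the following sketch computes how many batch requests of a given size could be in flight at once. It is a simplified assumption of how such a budget might be calculated, not the actual STEP algorithm; the function names and the example sizes are hypothetical.

```python
# Illustrative sketch only; not the actual STEP throttling algorithm.
MIB = 1024 * 1024


def in_flight_budget_bytes(index_pressure_limit_bytes: int) -> int:
    """Half of the server's index pressure memory, per the rule above."""
    if index_pressure_limit_bytes < 0:
        raise ValueError("limit must be non-negative")
    return index_pressure_limit_bytes // 2


def max_concurrent_requests(index_pressure_limit_bytes: int,
                            batch_size_bytes: int) -> int:
    """Return how many batches of this size fit inside the budget."""
    if batch_size_bytes <= 0:
        raise ValueError("batch size must be positive")
    return in_flight_budget_bytes(index_pressure_limit_bytes) // batch_size_bytes


if __name__ == "__main__":
    limit = 100 * MIB  # hypothetical index pressure memory limit
    for batch in (10 * MIB, 20 * MIB, 60 * MIB):
        print(batch // MIB, "MiB batches ->",
              max_concurrent_requests(limit, batch), "concurrent requests")
```

With a hypothetical limit of 100 MiB, the budget is 50 MiB, so a batch larger than 50 MiB yields zero concurrent requests. In that case the batch size would have to shrink, which matches the description that both the batch size and the number of requests are adjusted.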
Reindex the Elasticsearch Database
It may be necessary to periodically update indexes. Whenever a change is made to the configuration, the system will prompt you to reindex the data.
Additionally, if you create an entirely new Elasticsearch configuration when one or more already exist on the system, or make any other kind of change to the data that would require reindexing (e.g. add / remove a business condition or change object types), you will be prompted to reindex after finishing the wizard.
While reindexing the database, Elasticsearch is still available to the end user. Any searches made during this time will use the previous index until the reindexing process is complete.
Reindexing an Elasticsearch Configuration
- Whether an initial indexing is required for a brand new Elasticsearch configuration or reindexing is required for an existing Elasticsearch Configuration, navigate to the Reindexing tab on the configuration. At the bottom left, under the Reindexing status, click 'Reindex'.
If reindexing is required for an existing Elasticsearch Configuration, right-click the configuration node from System Setup and select 'Reindex' from the context menu.
Note: After the initial Elasticsearch configuration for every new STEP system, the following message may display: 'Elasticsearch is initializing. Try again in 5 minutes. If this problem persists for more than 1 hour, contact Stibo Support'. This triggers the Elasticsearch server to turn on. The user should try again after 5 minutes and contact Stibo Systems Support if the message persists.
- In the Reindexing window, select 'Now' and then click 'Reindex now' to begin the process immediately.
To schedule the reindexing for later, select 'Later' and set a time for the process to initiate, then click 'Schedule reindexing'.
- Once reindexing has been initiated, a process generating events begins. You can cancel or reschedule the event process at any time by navigating to the process beneath the 'Generating events processes' flipper and clicking either Cancel or Reschedule. For more information on using the Cancel option, refer to the Cancellation of Event Processor Background Process section below.
The actual reindexing of the data begins once the first event is read by the event processor, which can be monitored by navigating to the process beneath the 'Reindexing status' flipper. Click 'Cancel' if you want to stop the reindex process.
Note: If an additional Elasticsearch configuration has been newly created and you did not reindex upon completing the wizard, reindexing can be manually initiated by following the above steps.
Monitoring Reindex Processes
As described above, you can initiate a reindex process from the Reindexing tab on an Elasticsearch configuration. This screen also provides a list of current and previous reindex events.
- To view the current and previous reindex processes, select the background process you would like to monitor by clicking the BGP id under the 'Generating event process' column. The background processes found on this screen are the parent background processes.
- You will be redirected to the Background Process tab for the selected background process, where the Properties and the Execution report can be reviewed. The execution report is a full summary of what occurred during reindexing.
Monitoring Elasticsearch Event Processor
- After invoking the Elasticsearch event processor, multiple Elasticsearch indexing background processes are created to process the events. You can view these background processes by navigating to 'Current Background Process Log' under the Event Processor tab for the event processor.
- Click any of the Elasticsearch indexing background processes listed to view an execution report for the selected background process.
- Alternatively, navigate to the Background Processes tab for the event processor and click the Id of the event processor background process from the 'Id' column. This will navigate to the execution report for the Elasticsearch processor which will allow you to view all of the Elasticsearch indexing background processes.
- The Elasticsearch indexing background processes are listed under Execution Report. Click one of the background processes to view the details.
- A third way to view the details of the background processes is to navigate to the BG Processes menu in the STEP Workbench.
- Navigate to the Elasticsearch node and select any of the process phases under Elasticsearch to review the progress of the background processes.
Note: Additional information about the BG Processes menu can be found in the BG Processes Tab topic under the Getting Started documentation.
Reindex Process Performance
If the Event Processor configured on the Elasticsearch configuration is disabled, or if its Event Mode is set as 'Deduplicate' in the event processor, a warning dialog displays with the appropriate message(s).
Prior to reindexing, recommended practice is to click 'Go to Event Processor' and resolve the issue(s) identified in the message.
Republish to the Elasticsearch Database
An alternative solution to publish data to Elasticsearch is to use the event processor republish option:
- Generate events for the STEP data required in the Search Screen using the republish action as defined in the Event-Based OIEP Event Actions topic in the Data Exchange documentation. The republish background process generates events for the configured products, classifications, and/or assets.
- Publish STEP data to the Elasticsearch database by invoking the event processor running the Elasticsearch Configuration as defined in the Running an Event Processor topic in the System Setup documentation. The event processor background process creates indexes and publishes products, classifications, and/or assets to the Elasticsearch database.
Republishing Smaller Sets of Objects to Elasticsearch
In some situations, it is necessary to reindex only a smaller subset of objects that can be identified by search criteria. Instead of rebuilding the entire index, republish events for those specific objects to the Elasticsearch event processor.
The following methods are examples for how to republish a smaller set of objects:
- Bulk update using a send republish event operation. Refer to the Send Republish Event Operation topic in the Bulk Updates documentation for more information.
- Republishing a collection. Refer to the Maintaining Collections topic in the Getting Started documentation for more information.
Unlike the reindex process, these methods cannot be canceled.
Important: For reindexing the majority of the indexed objects, it is highly recommended to use the reindex solution rather than republishing.
Elasticsearch Event Processor Troubleshooting
Failure Handling
Resuming Unfinished Work
Elasticsearch indexing takes time. To accommodate unexpected errors in the middle of processing, the Elasticsearch event processor can be ‘Stopped’, which puts the waiting background processes in a ‘Suspended’ state, and can later restart where it left off. Some examples of why the Elasticsearch event processor would need to be paused are:
- A user needs to stop and start the event processor
- Additional resources are added to the STEP system and a restart is required
- A user needs to shut down the STEP system for any reason
- The STEP system has crashed
Once the Elasticsearch event processor resumes, a message in the execution report confirms the resumption of each Elasticsearch indexing background process that had previously been suspended.
Elasticsearch Outages
If the Elasticsearch server is unavailable, there is not enough memory, or there are too many shards, the Elasticsearch event processor will go into a ‘Failed (retrying)’ state. The event processor background processes as well as the Elasticsearch indexing background processes will continue to retry. The status of the Elasticsearch indexing background processes will switch between a ‘Running’ and a ‘Waiting’ state every 30 seconds by default. The ‘Waiting’ state will show as ‘suspended’ in the logs, and the ‘Running’ state will continue to try to run in a ‘Failed (retrying)’ state.
Once the issue with the Elasticsearch server has been resolved, the event processor status will show as ‘Running’ and indexing will resume where it left off. For additional information on the event processor statuses and their meanings, refer to the Running an Event Processor topic.
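The retry behavior during an outage can be sketched as a simple loop that alternates between an attempt (‘Running’) and a pause (‘Waiting’). This is a minimal illustration under stated assumptions, not STEP code: the function and parameter names are hypothetical, and the `max_attempts` cap exists only so that the example terminates, whereas this topic describes the processes as continuing to retry.

```python
import time
from typing import Callable, List


def run_with_retry(attempt: Callable[[], bool],
                   wait_seconds: float = 30.0,   # default interval stated in this topic
                   max_attempts: int = 10,       # hypothetical cap, for the sketch only
                   sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Alternate between 'Running' and 'Waiting' until an attempt succeeds."""
    states: List[str] = []
    for _ in range(max_attempts):
        states.append("Running")
        if attempt():
            return states          # server reachable again; indexing resumes
        states.append("Waiting")   # shown as 'suspended' in the logs
        sleep(wait_seconds)
    return states


if __name__ == "__main__":
    # Simulated outage: two failed attempts, then the server is back.
    outcomes = iter([False, False, True])
    history = run_with_retry(lambda: next(outcomes), sleep=lambda s: None)
    print(" -> ".join(history))
```

Passing `sleep` as a parameter keeps the sketch testable without real 30-second pauses.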
Error in Background Process
If the Elasticsearch server is not configured correctly, if Elasticsearch processing errors occur (for example, due to an incompatible schema or an incompatible version of the Elasticsearch server), or if internal STEP errors occur, a failed status displays for both the Elasticsearch indexing background processes and the event processor background process.
Cancellation of Event Processor Background Process
Canceling a reindex event for the event processor aborts all events and background processes, with no possibility to resume. Cancel only when necessary.
- Select Cancel from the Elasticsearch configuration Reindexing tab under ‘Reindexing status’.
- Select 'Cancel reindexing'.
- After canceling reindexing, all background processes that are running or waiting are moved to the ‘aborted’ state, and something similar to the following displays.
Event processor log:
Execution Report: