Healthcheck Test Index
Healthchecks help users identify and resolve configuration and data issues that can negatively affect system performance.
Healthchecks are executed or skipped depending on the database in use and on whether in-memory is enabled on the STEP system, so not all healthchecks run on every system. Where available, healthchecks can be reviewed and run from the following locations:
- In the Admin Portal on the Healthcheck tab, users can run tests and review detected problems as needed. For more information on the Admin Portal Healthcheck tab, refer to the Healthcheck topic here.
- For on-premise systems, healthcheck information is stored on the application server at [STEPHOME]/diag/healthcheck (for example, opt/stibo/step/diag/healthcheck). This information is automatically included when sending a diagnostics package to Stibo Systems Support.
- From the Start Page, the Performance Analysis link displays 12 weeks of results from all scheduled healthchecks. Some unscheduled healthchecks are long-running or are only useful for Stibo Systems Support, and so they are not available in the Performance Analysis tools.
Healthcheck Tests
The following tables include all available Configuration, Data Error, and Performance healthchecks.
Note: Not all healthchecks are applicable for all STEP systems. On your system, only the healthchecks that are valid are displayed in the Admin Portal, on the application server, and in the Performance Analysis tools.
- The 'Automated Fix' column indicates whether a script is available to resolve the reported issue. To apply a scripted fix, contact Stibo Systems Support for assistance. If no automated fix is available, manually update the reported data or configuration.
- The 'Runs on Schedule' column indicates whether the test is run on the schedule defined in the sharedconfig.properties file on the application server, as described in the Performance Analysis topic here.
Performance Healthcheck Name | Severity | Description | Automated Fix | Runs on Schedule |
---|---|---|---|---|
Business Rule Execution Time Too Long | High | Performance can be impacted when business rules run too long. Business rules may run too long when too many operations are combined into one rule, when they access too many objects or objects with too many revisions, or when configurations call external services with a slow response time. This is not always a problem. For example, if an IEP using the Business Rule Based Message Processor runs for a long time when processing batches, performance is not necessarily impacted, provided that large transactions are not being written to the database. Most often, business rules taking longer than one minute require examination of the rule itself for performance improvements or of the objects involved for data management. By default, business rules that run longer than five (5) minutes are reported as a healthcheck warning, and business rules that run longer than 15 minutes are stopped and generate an error. | No | Yes |
Change Log Entries Per Node | Low | When certain objects in STEP are modified, a change is written to the change log of the object. You can set up event queues in STEP to monitor these events so that information about the change can be exported, for example, via an integration endpoint. An event is placed on each interested event queue. Every time you modify an object that generates an event, STEP tries to limit the number of log entries for that object. If more than 20,000 changes are logged for the object, it attempts to delete the old events. However, the attempt only succeeds if there are no events for the change. If you have more than 20,000 change log entries for an object, determine why the events are not being processed. | Yes, in most cases | No |
Change Log Total Size | Critical | Checks if the change log has grown too large. This can cause Oracle to perform poorly. The maximum number of rows allowed in the table is 100,000,000. | Yes | No |
Check for Common Web UI Configuration Errors | Medium | Checks Web UI for some of the most common configuration errors that can cause performance problems. | No | No |
Data cleanup tools | High | To maintain a system that performs well, regularly scheduled data clean-up is highly recommended. This healthcheck detects problems with the configuration of a scheduled background process to empty the recycle bin and/or an event processor to purge old revisions. Configure 'Schedule Empty Recycle Bin' (here), running at least monthly, and including all contexts in use by your system, to purge items in the Tree recycle bin. Enable an event processor of type 'Revision Management' scheduled to run frequently, with Purge Across Workspaces set to Yes, and Number to Keep set to not much more than 100, to reduce the number of unnecessary revisions. | No | Yes |
Hard Evicts | High | A hard evict is a forced attempt to remove persistent objects from their cache. Hard evicts can happen when a task is holding many persistent objects for a long time without committing the transaction. In such a case, a hard evict may be executed to make room for caching other persistent objects. This can negatively affect performance since the cache of persistent objects may become less effective. A cause of hard evicts could be one or more business rules accessing too many objects. | No | Yes |
Large Commit | High | For Cassandra systems, large commits decrease system performance and increase the risk of concurrency problems. Very large commits may fail. | No | Yes |
Leaked Changelog Rows | High | Checks if there are leaked rows in the change log table. If there are many leaked data rows, performance will be negatively affected. | Yes | Yes |
Optimistic Lock Recovery | High | Reports that optimistic locking errors were detected when flushing to the data store. This indicates that some objects were concurrently modified in another transaction, or a constraint error occurred. This can negatively impact performance. Repeated occurrences of this may cause the transaction to eventually fail. Resolve this by avoiding and/or minimizing concurrent modifications of the same data. | Yes | Yes |
Too Many Associated Objects | High | When there are too many associated objects, degradation of performance is possible because the amount of data exceeds the threshold for caching the given relation. This could be due to too many children, references, referenced by values, or multi-valued attributes. | No | Yes |
Too Many Attributes Linked (Directly Not Via Inheritance) to a Product/Classification | Medium | Finds all products / classifications that are directly linked (not inherited) to more than 1,000 valid attributes. More than 1,000 links can cause performance issues when opening the References Editor in workbench. | No | Yes |
Too Many Background Processes for an Integration Endpoint | High | Checks if there are too many background processes for an integration endpoint. Too many BGPs for an IEP can degrade performance. Clean-up of old BGP files and folders is required to resolve this issue. | No | Yes |
Too Many Manually Sorted Attribute Groups | Medium | Checks that no manually sorted attribute group has more than 10,000 children. Only the front revisions are considered and children in all workspaces are counted. | No | Yes |
Too Many Manually Sorted Products and Classifications | Medium | Checks that no manually sorted product or classification has more than 10,000 children. Only the front revisions are considered and children in all workspaces are counted. | No | Yes |
Too Many Qualifier Relations | Low | Finds all qualifiers that are used in too many pseudo qualifiers. Performance problems can result from having a large number of pseudo qualifiers if a real qualifier is linked to a large number of pseudo qualifiers because, by default, the application cache only caches 10,000. Refer to the property: Install.DataCache.MaxRelationSize=10000. This plugin cannot remove the duplicates, but another plugin can remove the unused pseudo qualifiers. | No | No |
Too Many Revisions for a Node | High | Checks if there are too many revisions for an object. More than 10,000 revisions can cause performance issues because the amount of data exceeds the threshold for caching. | No | Yes |
Too Many Valid Values for List of Values | Medium | Checks that no list of values has more than 5,000 valid values. Large lists of values (LOVs) make it difficult to find, search, select, and filter on values. | No | No |
Too Many Values for a Node | Medium | Checks if there are nodes with too many values, which can cause performance issues. | No | Yes |
Too Many Workspace Relations | Low | Finds all workspaces that are used in too many pseudo workspaces. If, for example, a node is visible in the Main, Approved, and Staging workspaces, a pseudo workspace representing these three workspaces is created. Performance problems can result from having a large number of pseudo workspaces if a real workspace is linked to a large number of pseudo workspaces. The application cache, by default, only caches 10,000 pseudo workspaces. Refer to the property: Install.DataCache.MaxRelationSize=10000. While this plugin cannot remove the duplicates, another plugin can remove the unused pseudo workspaces. | No | No |
Unused Pseudo Qualifiers | Low | Finds all pseudo qualifiers that are not used. Performance problems can result from having a large number of pseudo qualifiers if a real qualifier is linked to a large number of pseudo qualifiers. The application cache, by default, only caches 10,000. Refer to the property: Install.DataCache.MaxRelationSize=10000. Missing qualifiers are only reported when at least 5,000 unused qualifiers exist. | Yes | No |
Unused Pseudo Workspaces | Low | Finds all pseudo workspaces that are not used. If, for example, a node is visible in the Main, Approved, and Staging workspaces, a pseudo workspace representing these three workspaces is created. If you create new workspaces, many new pseudo workspaces can be created to represent the possible combinations. The result is a large number of pseudo workspaces, even though many of these combinations may never be used. Performance problems can result from having a large number of pseudo workspaces if a real workspace is linked to a large number of pseudo workspaces. The application cache, by default, only caches 10,000 of these. Refer to the property: Install.DataCache.MaxRelationSize=10000. | Yes | No |
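
The numeric limits quoted in the Performance table above can be collected into a small lookup, which may help when reviewing exported counts outside of STEP. The following is only an illustrative sketch in plain JavaScript: the key names, the function names, and the 'ok' and 'review' labels are hypothetical and are not STEP identifiers or part of any Stibo Systems API. Only the numbers are taken from the table.

```javascript
// Limits quoted in the Performance table. The key names are hypothetical
// labels chosen for this sketch; they are not STEP property names.
const LIMITS = {
  changeLogEntriesPerNode: 20000,
  changeLogTotalRows: 100000000,
  directAttributeLinks: 1000,
  manuallySortedChildren: 10000,
  revisionsPerNode: 10000,
  validValuesPerLov: 5000,
};

// Returns the names of all metrics whose measured value is above its limit.
// Metrics that are missing from the input are ignored.
function findExceededLimits(measured) {
  return Object.keys(LIMITS).filter(
    (name) => typeof measured[name] === "number" && measured[name] > LIMITS[name]
  );
}

// Default business rule runtime limits from the table, in minutes.
const REVIEW_MINUTES = 1;   // rules above this usually deserve a closer look
const WARNING_MINUTES = 5;  // reported as a healthcheck warning
const ERROR_MINUTES = 15;   // rule is stopped and an error is generated

// Maps a business rule runtime (in minutes) to a label for this sketch.
function classifyRuleRuntime(minutes) {
  if (minutes > ERROR_MINUTES) return "error";
  if (minutes > WARNING_MINUTES) return "warning";
  if (minutes > REVIEW_MINUTES) return "review";
  return "ok";
}
```

For example, `findExceededLimits({ revisionsPerNode: 12000 })` flags only the revision count, and a rule that ran for seven minutes falls in the "warning" band. All comparisons are strict ("more than"), matching the wording in the table.
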
Configuration Healthcheck Name | Severity | Description | Automated Fix | Runs on Schedule |
---|---|---|---|---|
Hidden Oracle Parameters With Non-Default Values | Medium | Lists hidden Oracle parameters with a changed default value. The default value of a hidden parameter should only be changed when recommended by Oracle or Stibo Systems Support. | No | No |
JavaScript Catch Without Rethrow | High | Identifies business rules that do not correctly handle exceptions in try-catch statements. When catching an exception in JavaScript business rules using try-catch, only checked exceptions that have been declared in the Stibo Systems Scripting API are safe to catch without a rethrow of the same or another exception. All runtime exceptions should be rethrown. For some runtime exceptions, this is strictly enforced: even if the business rule completes successfully, the framework rethrows the exception when the rethrow is omitted in JavaScript. This protects against possible database inconsistencies that occur when the rethrow is omitted. If an API method partially completed a change when the exception occurred, the database transaction needs to roll back by letting the exception fall through the execution scope of the transaction. When this healthcheck reports issues, the system has detected one or more missing rethrows, and the reported business rules need to be revised to include a rethrow of the same or another exception. | No | Yes |
Non-Compacted Attributes | Medium | Identifies attributes that are not using the compact storage model. Compacted attributes (excluding LOVs) take up less storage space than non-compacted attributes, which reduces I/O during reads and writes and improves the response time of the system. Additionally, for customers migrating to Stibo Systems SaaS (Cassandra), compacting attributes is a prerequisite, and the migration may take multiple days to complete. When issues are detected in this healthcheck, review the attributes reported and start the migration to compact soft values, or convert the attributes to LOVs, where feasible (many usages and few distinct values). Refer to the Attribute Value Migration topic (here) for prerequisites and technical migration details. | No | Yes |
Reflection usage in business rules | High | This healthcheck detects the use of reflection in business rules when they are executed and a warning with the text 'Attempted to call reflection API...' is written to the step.0.log. A future STEP release will block the use of reflection. For any necessary methods or functions that are not covered in the Public JavaScript API, enter an enhancement request in the Stibo Systems Service Portal for review and approval. Also, plan to rewrite the business rules identified by this healthcheck when preparing for a future upgrade. | No | Yes |
Residual Events for a Queue | Medium | Identifies event queues with events not being processed. When a queue-based event processor or outbound integration endpoint is set to 'Read events' but is not enabled, or is stopped in error, or is enabled without a schedule being run, large numbers of events can build up, which can negatively affect performance. If this issue is detected, inspect the objects in the report and make configuration changes to either consume the events or discard them. | No | Yes |
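
The 'JavaScript Catch Without Rethrow' row describes a pattern that is easier to see in code. The following is a minimal sketch in plain JavaScript and does not use the Stibo Systems Scripting API: `DeclaredApiError`, `updateValue`, `swallowEverything`, and `rethrowUnexpected` are hypothetical names invented for illustration. `DeclaredApiError` stands in for a checked exception declared by an API method, and `TypeError` stands in for a runtime exception.

```javascript
// Hypothetical stand-in for a checked exception declared by an API method.
class DeclaredApiError extends Error {}

// Hypothetical stand-in for an API call that can fail in two different ways.
function updateValue(node, value) {
  if (value === null) {
    // Plays the role of a runtime exception in this sketch.
    throw new TypeError("value must not be null");
  }
  if (value === "") {
    // Plays the role of a declared, checked exception in this sketch.
    throw new DeclaredApiError("empty value rejected");
  }
  node.value = value;
}

// Pattern of the kind the healthcheck reports: every exception is swallowed,
// so a partially completed change would never trigger a rollback.
function swallowEverything(node, value, log) {
  try {
    updateValue(node, value);
  } catch (e) {
    log.push("ignored: " + e.message);
  }
}

// Revised pattern: handle only the declared exception and rethrow the rest,
// letting the exception leave the scope of the surrounding transaction.
function rethrowUnexpected(node, value, log) {
  try {
    updateValue(node, value);
  } catch (e) {
    if (e instanceof DeclaredApiError) {
      log.push("handled: " + e.message);
      return;
    }
    log.push("rethrowing: " + e.message);
    throw e;
  }
}
```

In this sketch, calling `rethrowUnexpected(node, null, log)` logs the failure and then propagates the `TypeError` to the caller, whereas `swallowEverything` hides it. Which exceptions are actually safe to catch in a business rule depends on what the Scripting API declares, so treat the `instanceof` check here as a placeholder for that decision.
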