BG Processes Failsafe
STEP has a mechanism to prevent repeated restarts of background processes (BGPs) that were running on an application server that crashed. This mechanism prevents continuous loops of server crashes caused by BGPs.
If a BGP is believed to be a potential cause of an application server crash, STEP will put the background process into a 'failed' state, where it will not be restarted automatically. Additionally, the BGP is put into a 'quarantined' state which requires user intervention.
Note: Always shut down an application server properly with the official tools. Simply killing the application server JVM for STEP or cutting power will be interpreted as a crash and may trigger this mechanism. Shutting down a STEP application server with the official tools will not have this effect.
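The behavior described above can be pictured with a small model. This is only an illustrative sketch, not STEP code or a STEP API: the `BackgroundProcess` class, the `handle_server_stop` function, and the `clean_shutdown` flag are hypothetical names. It assumes that every BGP running on the crashed server is treated as a potential cause, which the description above does not confirm.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    RUNNING = "running"
    WAITING = "waiting"
    FAILED = "failed"


@dataclass
class BackgroundProcess:
    # Hypothetical model of a BGP; field names are assumptions.
    bgp_id: str
    server: str
    state: State = State.RUNNING
    quarantined: bool = False


def handle_server_stop(processes, server, clean_shutdown):
    """Return the BGPs quarantined because `server` stopped.

    A shutdown with the official tools (clean_shutdown=True) is not
    treated as a crash, so nothing is quarantined. What happens to
    running BGPs in that case is not modeled here.
    """
    if clean_shutdown:
        return []
    affected = []
    for bgp in processes:
        # Assumption: only BGPs running on the crashed server are suspects.
        if bgp.server == server and bgp.state is State.RUNNING:
            bgp.state = State.FAILED   # will not be restarted automatically
            bgp.quarantined = True     # requires user intervention
            affected.append(bgp)
    return affected
```

In this model, a killed JVM or a power cut corresponds to calling `handle_server_stop(..., clean_shutdown=False)`, which is why an improper shutdown can quarantine processes that did nothing wrong.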
To determine what action to take, analyze why the process was quarantined. Not every quarantined background process is the cause of a crash. In some cases, the process was simply in the wrong place at the wrong time, which is unfortunately difficult to detect automatically.
If the issue can be found and fixed, the quarantined background process can be restarted from the workbench. Alternatively, if it is determined that the quarantined background process should not be started again, it can be deleted.
Integration Endpoints
Integration endpoints will halt if any of their associated background processes are quarantined. This is done to avoid starting further background processes that could cause the same problem to repeat. The execution reports of both the integration endpoint and the background process will indicate the connection between the two processes. The integration endpoint will need to be manually restarted in the workbench. It is strongly recommended to first deal with any other quarantined background processes of the integration endpoint. Resuming an integration endpoint will not resume any associated quarantined background processes.
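The relationship between an endpoint and its background processes can be sketched as follows. This is a simplified illustration, not STEP code: the `Endpoint` and `Bgp` classes and their methods are hypothetical. It shows only two points from the paragraph above: an endpoint halts when any associated BGP is quarantined, and resuming the endpoint leaves those BGPs quarantined.

```python
from dataclasses import dataclass, field


@dataclass
class Bgp:
    # Hypothetical stand-in for a background process.
    bgp_id: str
    quarantined: bool = False


@dataclass
class Endpoint:
    # Hypothetical stand-in for an integration endpoint.
    name: str
    processes: list = field(default_factory=list)
    halted: bool = False

    def check(self):
        """Halt the endpoint if any associated BGP is quarantined."""
        if any(p.quarantined for p in self.processes):
            self.halted = True
        return self.halted

    def resume(self):
        """Manually resume the endpoint.

        Quarantined BGPs are deliberately left untouched; they are
        returned so the caller can see what still needs attention.
        """
        self.halted = False
        return [p for p in self.processes if p.quarantined]
```

A non-empty list returned by `resume()` in this model corresponds to the recommendation above: handle the remaining quarantined background processes before restarting the endpoint.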
Important: It is vital that integration endpoints are monitored so they can be restarted after such an incident. Sensors for integration endpoints are already available for this purpose.
Possible Values for Quarantined State
The Quarantine Status property will only be displayed if a background process is or has ever been in quarantine. Otherwise, the property line will not show on the BGP tab. If a background process has been quarantined, there are three possible values that will be displayed:
Quarantined: When the background process is currently in quarantine. In normal cases, the background process is in a 'failed' state when this happens.
Restart acknowledged: A quarantined background process has been restarted, but has not yet been executed, e.g., because the restart just happened or the process is waiting in its execution queue. Normally, the process is in a 'waiting' state.
Previously Quarantined: A background process was in quarantine, but is no longer quarantined. The background process can now be in any of the possible BGP states.
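The three values can be read as stages in a simple lifecycle. The sketch below is an interpretation of the descriptions above, not STEP code: the enum, the function names, and the exact trigger for each transition (in particular, that the start of execution moves a process to Previously Quarantined) are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class QuarantineStatus(Enum):
    QUARANTINED = "Quarantined"
    RESTART_ACKNOWLEDGED = "Restart acknowledged"
    PREVIOUSLY_QUARANTINED = "Previously Quarantined"


def on_restart(status: Optional[QuarantineStatus]) -> Optional[QuarantineStatus]:
    """A user restarts the process; only a quarantined process changes."""
    if status is QuarantineStatus.QUARANTINED:
        return QuarantineStatus.RESTART_ACKNOWLEDGED
    return status


def on_execution_started(status: Optional[QuarantineStatus]) -> Optional[QuarantineStatus]:
    """Assumed trigger: the restarted process leaves its queue and runs."""
    if status is QuarantineStatus.RESTART_ACKNOWLEDGED:
        return QuarantineStatus.PREVIOUSLY_QUARANTINED
    return status


def property_line(status: Optional[QuarantineStatus]) -> Optional[str]:
    """Return the property text, or None if the BGP was never quarantined."""
    if status is None:
        return None
    return f"Quarantine Status: {status.value}"
```

A status of `None` models a background process that has never been quarantined, for which no Quarantine Status line is shown.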
Searching for Quarantined Background Processes
Quarantined background processes can be found using drill down search in the workbench:
- Select the Search tab on the left.
- In the criterion drop down list, change the criterion to Background Process Search.
- For the Quarantined parameter, select Quarantined.
- Press the Search button to start the search.