Sensors for External Monitoring

On the Admin Portal Monitoring tab, the Sensors for external monitoring link provides access to the sensors described below. For more information on the functionality, refer to the Monitoring topic.

Sensor Name

Baseline

Sensor Description

AuditMessagingSensor-KafkaStatus

N

'Warning' if a connection cannot be made to Kafka or if one (1) or more topics are not configured in Kafka.

AuditMessagingSensor-ZookeeperStatus

N

'Critical' if no ZooKeeper connections are up.

'Warning' if one (1) or more ZooKeeper connections are down but at least one (1) is up.
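The two rules above can be sketched as a small function. This is an illustrative sketch only, not the sensor's implementation: the function name and the list-of-booleans input are hypothetical, and the 'OK' result for the all-up case is an assumption.

```python
def zookeeper_status(connections_up):
    """Map per-connection up/down flags (hypothetical input) to a status."""
    if not any(connections_up):
        return "Critical"  # no connection is up
    if not all(connections_up):
        return "Warning"   # at least one is down, at least one is up
    return "OK"            # assumed result when every connection is up


print(zookeeper_status([True, False, True]))
```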

AutoScaleSensor-Has-Ready-Work

Y

Available only when OneQueueBGPScheduler is enabled. Refer to the BGP One Queue topic.

'Yes' if pending items could run if workers were available:

  • BGP Queue contains waiting processes

  • the Merged Golden Record IIEP has more messages to process (the IIEP is in strict mode with a Message Processor that supports parallel processing)

AutoScaleSensor-Workers-In-Cluster

Y

Returns the aggregated number of workers in action for BGPs in the cluster.

AutoScaleSensor-Workers-On-Current-Node

Y

Returns the number of workers in action for BGPs on the current node.

BackgroundProcessSensor-Number-Of-Active-Background-Processes

Y

Reports the average number of active background processes during the last three (3) periods specified by configuration properties:

  • Healthcheck.SystemMetrics.ShortIntervalTime
  • Healthcheck.SystemMetrics.MediumIntervalTime
  • Healthcheck.SystemMetrics.LongIntervalTime
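Several sensors in this table report averages over the short, medium, and long intervals. The sketch below shows one way such per-interval averages could be computed from timestamped samples. It assumes one average per interval, measured back from the current time. The interval lengths, the sample format, and the function name are hypothetical and are not taken from the product.

```python
def window_averages(samples, now, intervals):
    """Average the sample values that fall inside each trailing interval.

    samples:   list of (timestamp_seconds, value) pairs (hypothetical format)
    intervals: dict of interval name -> length in seconds (hypothetical values)
    """
    result = {}
    for name, length in intervals.items():
        recent = [value for ts, value in samples if now - length <= ts <= now]
        # None signals that no samples were collected in the interval
        result[name] = sum(recent) / len(recent) if recent else None
    return result


# Hypothetical interval lengths, in seconds
intervals = {"short": 60, "medium": 300, "long": 900}
samples = [(0, 4), (700, 2), (880, 6), (890, 8)]
averages = window_averages(samples, now=900, intervals=intervals)
print(averages["short"], averages["long"])
```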

BackgroundProcessSensor-Number-Of-Queued-Background-Processes

Y

Reports the average number of queued background processes during the last three (3) periods specified by configuration properties:

  • Healthcheck.SystemMetrics.ShortIntervalTime
  • Healthcheck.SystemMetrics.MediumIntervalTime
  • Healthcheck.SystemMetrics.LongIntervalTime

CpuLoadSensor-JvmCpuAverageLoad

Y

Reports the average CPU load during the last three (3) periods specified by configuration properties:

  • Healthcheck.SystemMetrics.ShortIntervalTime
  • Healthcheck.SystemMetrics.MediumIntervalTime
  • Healthcheck.SystemMetrics.LongIntervalTime

'Critical' if data cannot be retrieved.

Database-checkparam

Y

Checks character-sets, sorting, optimizer-parameters, recovery and integrity, block sizes, cache-sizes, etc.

'Critical' if Oracle DB parameters have been incorrectly configured.

Database-dbping

Y

'Critical' if the DB SQL Ping response time exceeds 150 ms.

'Warning' if DB SQL Ping response time exceeds 50 ms.
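The thresholds for this sensor can be illustrated as follows. This is a sketch of the threshold comparison only; the function name is hypothetical, the 'OK' result below the warning threshold is an assumption, and how the ping itself is measured is not shown.

```python
def classify_ping(response_ms, warning_ms=50.0, critical_ms=150.0):
    """Classify a response time against the documented thresholds."""
    if response_ms > critical_ms:
        return "Critical"
    if response_ms > warning_ms:
        return "Warning"
    return "OK"  # assumed status when neither threshold is exceeded


print(classify_ping(72.5))
```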

Database-dbpings

Y

'Critical' if the DB SQL Ping response time for multiple pings exceeds 150 ms.

'Warning' if the DB SQL Ping response time for multiple pings exceeds 50 ms.

DnbSensor-connection-timeout

N

'Critical' if a connection timeout has occurred.

DnbSensor-dnb-service-error

N

'Critical' if a D&B service error has occurred.

DnbSensor-license-invalid

N

'Critical' if the D&B license is invalid.

DnbSensor-unknown-service-error

N

'Critical' if an unknown service error has occurred.

ElasticsearchMonitoringSensor-Elasticsearch

N

'Critical' if no connection to Elasticsearch.

EventProcessorStatus-[ID]

Y

One (1) sensor will be present per event processor.

'Critical' if event processor has failed due to errors.

'Warning' if event processor is running but has errors.

EventQueueSensor-[ID]

Y

One (1) sensor will be present per event queue.

'Critical' if the event consumer background process has failed, been aborted, or if the number of queued events exceeds the number specified with configuration property:

  • Monitor.EventQueue.NoOfUnreadEvents.Critical

'Warning' if there is no active event consumer or if the number of queued events exceeds the number specified with the configuration property:

  • Monitor.EventQueue.NoOfUnreadEvents.Warning
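The conditions for this sensor combine the consumer state with two queue-size limits. A minimal sketch of that decision is shown below, assuming the limits are read from the two configuration properties named above. The function name and the consumer state strings ("running", "inactive", "failed", "aborted") are hypothetical.

```python
def event_queue_status(consumer_state, queued_events, warning_limit, critical_limit):
    """Combine consumer state and queue size into one status.

    warning_limit / critical_limit stand in for the values of
    Monitor.EventQueue.NoOfUnreadEvents.Warning and .Critical.
    """
    if consumer_state in ("failed", "aborted") or queued_events > critical_limit:
        return "Critical"
    if consumer_state == "inactive" or queued_events > warning_limit:
        return "Warning"
    return "OK"  # assumed status when no condition applies


print(event_queue_status("running", 150, warning_limit=100, critical_limit=1000))
```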

FullGCSensor-PercentOfTimeInFullGCinLast2minutes

Y

'Critical' if more than 90 percent of the past two (2) minutes was spent in full garbage collection (GC).

'Warning' if more than 50 percent of the past two (2) minutes was spent in full GC.

FullGCSensor-PercentOfTimeInFullGCinLast10minutes

Y

'Critical' if more than 90 percent of the past 10 minutes was spent in full garbage collection (GC).

'Warning' if more than 50 percent of the past 10 minutes was spent in full GC.

FullGCSensor-PercentOfTimeInFullGCinLast60minutes

Y

'Critical' if more than 90 percent of the past 60 minutes was spent in full garbage collection (GC).

'Warning' if more than 50 percent of the past 60 minutes was spent in full GC.
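The three FullGCSensor rows apply the same percentages to different time windows. The sketch below shows one way the percentage of a window spent in full GC could be derived from a list of GC pauses and then classified. The pause format, function names, and the clipping of pauses to the window are assumptions for illustration; the product may compute this differently.

```python
def full_gc_percent(pauses, now, window_s):
    """Percentage of the trailing window covered by full GC pauses.

    pauses: list of (start_s, duration_s) pairs (hypothetical format).
    Pauses that straddle the window boundary are clipped to the window.
    """
    window_start = now - window_s
    busy = 0.0
    for start, duration in pauses:
        overlap = min(start + duration, now) - max(start, window_start)
        if overlap > 0:
            busy += overlap
    return 100.0 * busy / window_s


def gc_status(percent):
    """Apply the documented 90 / 50 percent thresholds."""
    if percent > 90:
        return "Critical"
    if percent > 50:
        return "Warning"
    return "OK"  # assumed status below the warning threshold


# Two-minute window with 70 seconds of full GC in total
percent = full_gc_percent([(0, 30), (60, 40)], now=120, window_s=120)
print(gc_status(percent))
```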

GatewayIntegrationEndpointStatus-[ID]

Y

One (1) sensor will be present per gateway integration endpoint.

'Critical' if errors were reported within the last 24 hours.

'Warning' if errors were reported.

GCOverheadSensor-FullGCOverheadLoad

Y

Reports the percentage of time in full garbage collection (GC) during the last three (3) periods specified by configuration properties:

  • Healthcheck.SystemMetrics.ShortIntervalTime
  • Healthcheck.SystemMetrics.MediumIntervalTime
  • Healthcheck.SystemMetrics.LongIntervalTime

GraphQLHealthcheckSensor-GraphQLv2Healthcheck

Y

'OK' when GraphQLv2 has successfully started.

HipaaQueueSensor-delivery

N

Used when HIPAA logging is enabled.

'Critical' if STEP has been unable to deliver log messages during the last number of seconds specified with configuration property:

  • HipaaQueueSensorDroppedThresholdSec

HipaaQueueSensor-dropped

N

Used when HIPAA logging is enabled.

'Critical' if STEP has dropped log messages for insertion into the HIPAA log during the last number of seconds specified with configuration property:

  • HipaaQueueSensorDroppedThresholdSec

HipaaQueueSensor-waiting

N

Used when HIPAA logging is enabled.

'Warning' if STEP has waited for insertion into the HIPAA log during the last number of seconds specified by the configuration property:

  • HipaaQueueSensorWaitThresholdSec

Http-local

Y

HTTP connection to local host.

Always responds with 'OK' if monitoring sensor can be called.

Http-remote

Y

HTTP connection to other nodes in cluster.

'Critical' if one (1) or more nodes in the cluster cannot be reached.

HttpNumberOfRequestsPerSecondSensor-Number-Of-Http-Requests-Per-Second

Y

Reports the average number of HTTP requests per second during the last three (3) periods specified by configuration properties:

  • Healthcheck.SystemMetrics.ShortIntervalTime
  • Healthcheck.SystemMetrics.MediumIntervalTime
  • Healthcheck.SystemMetrics.LongIntervalTime

HttpResponseTimeSensor-Avg-Http-Response-Time

Y

Reports the average request response time during the last three (3) periods specified by configuration properties:

  • Healthcheck.SystemMetrics.ShortIntervalTime
  • Healthcheck.SystemMetrics.MediumIntervalTime
  • Healthcheck.SystemMetrics.LongIntervalTime

ImageCacheCleanupSensor-[pipeline]

Y

One (1) sensor will be present per cached image pipeline.

'Warning' if used cache size exceeds the max cache size specified for the pipeline by configuration property:

  • ImageCache.Size.[pipeline]

InboundIntegrationEndpointStatus-[ID]

Y

One (1) sensor will be present per inbound integration endpoint (IIEP).

'Critical' if IIEP has failed due to errors.

'Warning' if IIEP is running but has errors.

InMemorySensor-Status

N

Used when the In-Memory component is installed.

Reports if In-Memory is enabled.

JVM-heapsize

Y

'Critical' or 'Warning' if the heapsize is too small or too large compared to available physical memory.

JVM-hugepages

Y

'Warning' if hugepages are not configured and the OS supports them.

KafkaStreamingReceiverStatusSensor-[ID]

 

'Critical' if IIEP has been stopped due to failure.

'OK' if the IIEP is enabled and running or disabled.

LicenseSensor-all-expires

Y

'Critical' if no valid license.

'Warning' if all licenses will expire within the number of days specified with configuration property (default is 21):

  • Admin.LicenseWillExpire.SensorWarning

LicenseSensor-some-expires

Y

'Critical' if no valid license.

'Warning' if one (1) or more licenses will expire within the number of days specified with configuration property (default is 21):

  • Admin.LicenseWillExpire.SensorWarning
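The two license sensors differ only in whether all licenses or at least one license must be close to expiry. The sketch below illustrates that difference. The function name, the `mode` parameter, and the treatment of a license as valid through its expiry date are assumptions; `warning_days` stands in for the value of Admin.LicenseWillExpire.SensorWarning.

```python
from datetime import date, timedelta


def license_status(expiry_dates, today, warning_days=21, mode="some"):
    """mode="all" mimics LicenseSensor-all-expires, "some" the other sensor."""
    valid = [d for d in expiry_dates if d >= today]  # assumed validity rule
    if not valid:
        return "Critical"  # no valid license
    cutoff = today + timedelta(days=warning_days)
    expiring = [d <= cutoff for d in valid]
    triggered = all(expiring) if mode == "all" else any(expiring)
    return "Warning" if triggered else "OK"


today = date(2024, 1, 1)
licenses = [date(2024, 1, 10), date(2025, 1, 1)]
print(license_status(licenses, today, mode="some"))
print(license_status(licenses, today, mode="all"))
```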

MemorySensor-LowMemory

Y

The threshold for the maximum percentage of time spent on GC can be modified by Stibo Systems.

'Critical' if too much time is spent on garbage collection (GC).

NumberOfThreadsSensor-NumberOfThreads

Y

Reports the number of threads during the last three (3) periods specified by configuration properties:

  • Healthcheck.SystemMetrics.ShortIntervalTime
  • Healthcheck.SystemMetrics.MediumIntervalTime
  • Healthcheck.SystemMetrics.LongIntervalTime

OffHeapMemorySensor-Free

N

Off heap memory status.

'Critical' if less than 10 percent free off heap memory.

'Warning' if less than 12 percent free off heap memory.

OracleAlertLog-last-day

Y

Monitors the Oracle alert log.

'Critical' if one (1) of the ORA codes specified in the following configuration property has been reported within the last day:

  • Oracle.Sensor.AlertLog.Critical

'Warning' if one (1) of the ORA codes specified in the following configuration property has been reported within the last day:

  • Oracle.Sensor.AlertLog.Warning

OracleAlertLog-last-hour

Y

Monitors the Oracle alert log.

'Critical' if one (1) of the ORA codes specified in the following configuration property has been reported within the last hour:

  • Oracle.Sensor.AlertLog.Critical

'Warning' if one (1) of the ORA codes specified in the following configuration property has been reported within the last hour:

  • Oracle.Sensor.AlertLog.Warning

OutboundIntegrationEndpointStatus-[ID]

Y

One (1) sensor will be present per outbound integration endpoint (OIEP).

'Critical' if OIEP has failed due to errors.

'Warning' if OIEP is running but has errors.

PolicyEvaluationSensor-CheckActivePoliciesLatestScore

N

'Critical' if Customer MDM monitoring policies were not evaluated correctly.

PolicyManagerSensor-CheckLatestDataQualityBGP

N

'Critical' if the latest data quality checker background process contained errors.

RestApiHealthcheckSensor-RestApiv2Healthcheck

Y

'OK' when RestAPIv2 has successfully started.

ScheduledBackgroundProcessSensor-[ID]

Y

One (1) sensor will be present per scheduled background process.

'Critical' if the last scheduled background process has failed or been aborted.

ScheduledJobs-GatherStatsDomain

Y

'Critical' if Oracle itself is running stats and changing execution-plans.

ScheduledJobs-GatherStatsRunning

Y

'Critical' if the Stibo Systems Oracle stats gathering background job is not correctly scheduled.

ScheduledJobs-GatherStatsStatus

Y

'Critical' if the Stibo Systems Oracle stats gathering background job is stopped or has failed.

'Warning' if the Stibo Systems Oracle stats logs have not recently been updated.

Security-ClusterTrafficFilterEnabled

Y

'OK' when the cluster traffic filter is enabled.

Security-EnableCSRFProtectionForGetService

Y

Cross-site request forgery (CSRF) protection prevents one-click attacks and/or session riding.

'Warning' if disabled.

Security-EnableIpBlocker

Y

The configuration property allows an IP to be blocked after a specific number of invalid login attempts. This prevents brute force login attempts by guessing passwords.

Note: Do not enable IP-based security unless the network infrastructure supports it.

Reports 'OK' and the value for configuration property:

  • Security.EnableIpBlocker

Security-FrameBreakerEnabled

Y

The frame breaker rejects embedding of the system in external pages to prevent unauthorized use of the system through external sources.

'Warning' if embedding is allowed.

Security-HttpServletWrapperEnabled

Y

Non-printable ASCII characters are purged from HTTP headers to prevent HTTP header injection.

'Warning' if disabled.

Security-IpSessionMapperEnabled

Y

Prevents session hijacking based on client IP address by making a session only valid from the IP that initiated it. Users can still run different sessions from different IP addresses.

Note: Do not enable IP-based security unless the network infrastructure supports it.

Reports 'OK' and the value for configuration property:

  • Security.IpSessionMapper.Enabled

Security-LocalTrafficFilterEnabled

Y

'OK' when the local traffic filter is enabled.

Security-LogIpRoute

Y

Reports 'OK' and the value for configuration property:

  • Security.Log.IpRoute

Security-PasswordEncryptionEnabled

Y

User passwords stored in the database are encrypted to prevent extraction of user passwords in clear text.

'Warning' if disabled.

Security-PasswordEncryptionStopSTEPStartupOnMissingJCE

Y

Reports on the configuration property:

  • Security.PasswordEncryption.StopSTEPStartupOnMissingJCE

'Warning' if disabled.

Security-RemoteDeserializationCheckEnabled

Y

Reports on the configuration property:

  • Security.RemoteDeserializationCheck.Enabled

'Warning' if disabled.

Security-ResponseHeaderIncludeHttpStrictTransportSecurity

Y

HTTP Strict Transport Security (HSTS) ensures that all HTTP communication is done via SSL (HTTPS). This prevents attacks such as 'downgrade attacks' and 'cookie hijacking.'

'Warning' if disabled.

Security-SHA512PasswordEncryption

Y

All systems should use SHA512 password encryption to provide the greatest security.

'Warning' if disabled.

Security-SystemSSL

Y

SSL enables encrypted HTTP traffic (HTTPS).

'Warning' if disabled.

Security-UserWithIdenticalUsernameAndPassword

Y

Database should not contain any user with the same text for username and password.

'Warning' if the username is the same as the password for one (1) or more users.

Security.EnableCSRFProtectionForGetService.Exclude

Y

Refer to the Security-EnableCSRFProtectionForGetService row. Customers can keep CSRF protection enabled with that property and then disable it for a specific named service by web path.

Sidecar-[sidecar]-[ID]

Y

'Critical' if sidecar is offline or not active.

Sidecar-[sidecar]-[ID]-version

Y

'Critical' if the sidecar version is outdated.

StepFileSystemMonitoring-imagepipelinenative

Y

Time it takes to stat the directory identified by the configuration property:

  • ImagePipeline.Native

'Critical' if greater than 150 ms.

'Warning' if greater than 50 ms.

StepFileSystemMonitoring-installbackgroundprocessarea

Y

Time it takes to stat the directory identified by the configuration property:

  • Install.BackgroundProcessArea

'Critical' if greater than 150 ms.

'Warning' if greater than 50 ms.

StepFileSystemMonitoring-installhotfolderroot

Y

Time it takes to stat the directory identified by the configuration property:

  • Install.HotfolderRoot

'Critical' if greater than 150 ms.

'Warning' if greater than 50 ms.

StepFileSystemMonitoring-installimagecache

Y

Time it takes to stat the directory identified by the configuration property:

  • Install.ImageCache

'Critical' if greater than 150 ms.

'Warning' if greater than 50 ms.

StepFileSystemMonitoring-installincidentreportfolder

Y

Time it takes to stat the directory identified by the configuration property:

  • Install.IncidentReportFolder

'Critical' if greater than 150 ms.

'Warning' if greater than 50 ms.

StepFileSystemMonitoring-installprocessarea

Y

Time it takes to stat the directory identified by the configuration property:

  • Install.ProcessArea

'Critical' if greater than 150 ms.

'Warning' if greater than 50 ms.
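The StepFileSystemMonitoring sensors time a stat call on a configured directory. A rough equivalent can be sketched with the Python standard library, for example to check a mounted directory by hand. The function names are hypothetical, and the path passed in stands for the directory named by the relevant configuration property.

```python
import os
import time


def stat_time_ms(path):
    """Return the time, in milliseconds, that one stat call on path takes."""
    start = time.perf_counter()
    os.stat(path)
    return (time.perf_counter() - start) * 1000.0


def classify_stat(elapsed_ms):
    """Apply the documented 150 ms / 50 ms thresholds."""
    if elapsed_ms > 150:
        return "Critical"
    if elapsed_ms > 50:
        return "Warning"
    return "OK"  # assumed status below the warning threshold


elapsed = stat_time_ms(".")  # "." is a placeholder for the configured directory
print(classify_stat(elapsed))
```

A single measurement can be noisy on network file systems, so repeating the call a few times before drawing conclusions is advisable.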

STEPflow-wfmping

Y

Reports on legacy workflows. No longer used.

TrafficLight-cluster

Y

Combined sensor status for the cluster.

Change how the results of individual sensors affect the traffic light using the configuration properties:

  • Admin.TrafficLight.Downgrade
  • Admin.TrafficLight.Ignore

TrafficLight-local

Y

Combined sensor status for the local server.

Change how the results of individual sensors affect the traffic light using the configuration properties:

  • Admin.TrafficLight.Downgrade
  • Admin.TrafficLight.Ignore
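A combined status of this kind can be pictured as the worst status among the individual sensors. The exact semantics of the two properties are not described in this topic, so the sketch below rests on two explicit assumptions: an ignored sensor is left out of the result entirely, and a downgraded sensor contributes 'Warning' instead of 'Critical'. The function name and input format are hypothetical.

```python
SEVERITY = {"OK": 0, "Warning": 1, "Critical": 2}


def traffic_light(results, ignore=(), downgrade=()):
    """Return the worst status in results (dict of sensor name -> status).

    ignore / downgrade are sets of sensor names; their meaning here is an
    assumption, not the documented behavior of Admin.TrafficLight.*.
    """
    worst = "OK"
    for name, status in results.items():
        if name in ignore:
            continue  # assumed: ignored sensors do not count
        if name in downgrade and status == "Critical":
            status = "Warning"  # assumed: Critical is reduced to Warning
        if SEVERITY[status] > SEVERITY[worst]:
            worst = status
    return worst


results = {"Database-dbping": "Critical", "JVM-hugepages": "Warning"}
print(traffic_light(results, downgrade={"Database-dbping"}))
```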

UserLimit-licensed

Y

'Critical' if the number of allowed users has been exceeded.

'Warning' if the number of user accounts is close to exceeding the number of allowed users. Warning threshold is controlled with the configuration property:

  • Admin.UserLimit.Warning

Webservices-api

Y

'Critical' if the core SOAP service is not available.

Webservices-dtp

Y

'Critical' if the DTP SOAP service is not available.

WebstartHealthcheckSensor-WebstartHealthcheck

Y

'OK' when WebStart has successfully started.