Monitoring Design Services



Network Infrastructure

Network equipment is the foundation of any technical infrastructure, so it is critical that it functions optimally at all times. YJT takes a comprehensive approach to monitoring the network and includes the following categories in its designs:

  • Ping & network testing - Up/down, response time, port response
  • Interface metrics - State, traffic volume, errors, discards
  • Hardware health - PSU state, fan functionality, temperatures, card & module states
  • Performance - CPU, memory, buffers
  • Logs - Errors and other information
  • Routing table state
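
As an illustration of how interface error and discard counters can be turned into rates, here is a minimal Python sketch. It assumes the device reports cumulative counters that wrap at a fixed width, as SNMP Counter32 values do; the function names `counter_delta` and `rate_per_second` are hypothetical and are not part of any YJT tooling.

```python
def counter_delta(previous: int, current: int, bits: int = 32) -> int:
    """Difference between two cumulative counter samples.

    Assumes the counter wrapped at most once between the two samples.
    A device reboot (counter reset to zero) cannot be told apart from
    a wrap by this function alone.
    """
    modulus = 1 << bits
    return (current - previous) % modulus


def rate_per_second(previous: int, current: int, interval_s: float,
                    bits: int = 32) -> float:
    """Average events per second between two polls taken interval_s apart."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return counter_delta(previous, current, bits) / interval_s
```

Polling an error counter every 60 seconds and passing consecutive samples to `rate_per_second` would give a value that can be trended or compared against a threshold.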

Server & Workstation

The cost of downtime for servers running business-critical applications can hardly be overstated. Server hardware and operating systems are complex and present many opportunities for failure. YJT designs solutions that monitor all critical metrics so that downtime can be prevented and the effect of any failures that still occur can be mitigated.

Most metrics can be monitored via a small-footprint, real-time agent, or via agentless polling. They can apply to servers or workstations as needed. These are the broad categories we most often cover:

  • Power consumption and state (via UPS devices and/or via PSU SNMP)
  • Environmental conditions (temperature, humidity) via dedicated device or existing hardware
  • Server network connectivity
  • Hardware status (temperature, drive state, memory module state, PSU state, fan state)
  • General health (system logs, resource usage of CPU, memory, disk/volume, paging/swap)
  • Network interface card statistics
  • Cluster status
  • Additional performance metrics as needed (context-switch rate, DPC rate, open file handles, NFS mount status, etc.)
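
Resource-usage checks such as disk/volume utilization usually come down to comparing a measured value against warning and critical thresholds. The Python sketch below shows one possible shape for that comparison. The 80% and 90% thresholds and the names `classify` and `disk_used_percent` are assumptions chosen for illustration, not values taken from YJT's designs.

```python
import shutil


def classify(value: float, warn: float, crit: float) -> str:
    """Map a metric value to 'ok', 'warning', or 'critical'."""
    if value >= crit:
        return "critical"
    if value >= warn:
        return "warning"
    return "ok"


def disk_used_percent(path: str = ".") -> float:
    """Percentage of the volume containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


# Example: evaluate the current volume against hypothetical thresholds.
percent = disk_used_percent(".")
state = classify(percent, warn=80.0, crit=90.0)
```

The same `classify` function could be reused for CPU, memory, or swap percentages, so that every metric shares one definition of what each alert level means.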

Storage Fabric & Virtualization Environment

While server monitoring will often catch and prevent many issues related to storage subsystems and virtualized host state, YJT has found that monitoring additional metrics from the storage fabric or virtualization hosts directly can also be extremely valuable.

Storage:

  • Array performance
  • Array utilization
  • State/health of drives
  • SAN switch performance, throughput, and error rate

Virtualization:

  • Host throughput and load average
  • Overall host resource consumption (CPU, memory, datastore)
  • Host cluster/farm state

Application Behavior & Performance

Third-party applications can be critical to many infrastructures, and YJT has experience in monitoring a wide variety of them. Proprietary applications often pose additional challenges when monitoring their health and performance, and YJT has developed a diverse set of methods to pull data from many disparate sources into a central monitoring system to ensure maximum uptime and health for applications. Some methods include:

  • Process state and resource usage (CPU, memory, I/O, thread count, instance count)
  • Service state and resource usage
  • Network connectivity (response on given ports)
  • Command line output
  • Log file output:
    • Error strings (regex match)
    • Expected string not found
    • Parsing of numeric data for trending or threshold alerting
  • Message analysis:
    • Custom integration with messaging subsystem (29West/Informatica, TIBCO, etc.)
    • Analysis of message contents, size, rate
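
The three log file checks listed above (error strings, a missing expected string, and numeric parsing) can be sketched in a few lines of Python. Everything specific here is an assumption made for the example: the `ERROR`/`FATAL` pattern, the `heartbeat ok` marker, the `latency_ms=` field, and the name `scan_log` are hypothetical and would differ for each application.

```python
import re
from typing import Iterable, List, NamedTuple

# Hypothetical patterns; a real application defines its own log format.
ERROR_RE = re.compile(r"\b(?:ERROR|FATAL)\b")
LATENCY_RE = re.compile(r"latency_ms=(\d+(?:\.\d+)?)")


class LogScan(NamedTuple):
    error_lines: List[str]     # lines matching the error regex
    expected_seen: bool        # False means "expected string not found"
    latencies_ms: List[float]  # numeric values for trending or thresholds


def scan_log(lines: Iterable[str], expected: str = "heartbeat ok") -> LogScan:
    """Scan log lines once, collecting all three kinds of result."""
    error_lines: List[str] = []
    latencies: List[float] = []
    expected_seen = False
    for line in lines:
        if ERROR_RE.search(line):
            error_lines.append(line.rstrip("\n"))
        if expected in line:
            expected_seen = True
        match = LATENCY_RE.search(line)
        if match:
            latencies.append(float(match.group(1)))
    return LogScan(error_lines, expected_seen, latencies)
```

Passing an open file object to `scan_log` works because files iterate line by line. An alert would fire if `error_lines` is non-empty or `expected_seen` is false, while `latencies_ms` would feed a trend graph or a threshold check.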

Third-party applications that YJT has experience monitoring include Exchange, Sendmail, SQL Server, MySQL, Oracle, BlackBerry Enterprise Services, SharePoint, IIS, Apache, Tomcat, SVN, Bamboo, TT Gateway, Advent Geneva, Activ Financial, and many others.


Website, Email, & External Connectivity

It is important to check availability from the outside world as well as from inside the network. YJT monitors the following metrics via a separate, external system:

  • Internet connectivity to physical sites
  • Failover status of redundant gateways
  • Status of BGP routes
  • Email delivery time and status
  • Website availability, response, and health
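
A website availability and response-time probe of the kind an external system might run can be sketched with the Python standard library. This is an illustration under stated assumptions, not a description of YJT's external monitoring system: the names `SiteCheck` and `check_site` are hypothetical, and the choice to ignore proxy settings is a design assumption.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import NamedTuple, Optional


class SiteCheck(NamedTuple):
    available: bool
    status: Optional[int]         # HTTP status, if the server answered
    response_ms: Optional[float]  # time to receive the response headers


# Assumption: the probe connects directly and ignores proxy environment
# variables, so it measures the path to the site itself.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check_site(url: str, timeout: float = 5.0) -> SiteCheck:
    """Fetch `url` once and report availability, status, and timing."""
    start = time.monotonic()
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx or 5xx).
        elapsed_ms = (time.monotonic() - start) * 1000.0
        exc.close()
        return SiteCheck(False, exc.code, elapsed_ms)
    except (OSError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, or a malformed reply.
        return SiteCheck(False, None, None)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return SiteCheck(True, status, elapsed_ms)
```

Separating "no answer" (`status` is `None`) from "answered with an error" lets an alert distinguish a connectivity problem from an unhealthy site.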