AWS EC2 Monitoring

EC2 Monitoring
Status Checks
Status monitoring helps you quickly determine whether EC2 has detected any problems that might prevent instances from running applications.
EC2 performs automated checks on every running instance to identify hardware and software problems.
Status checks are performed every minute, and each returns a pass or fail status.
If all checks pass, then the overall status is OK.
If any of the checks fails, the overall status will be Impaired.
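The pass/fail rule above can be sketched in a few lines. This is only an illustration: the function name `overall_status` and the check names are hypothetical, and the labels mirror the wording used in these notes rather than the exact strings any API returns.

```python
from typing import Dict


def overall_status(check_results: Dict[str, bool]) -> str:
    """Combine individual status checks into one overall status.

    check_results maps a check name to True (passed) or False (failed).
    The overall status is "OK" only when every check passed.
    """
    return "OK" if all(check_results.values()) else "Impaired"


# Hypothetical check names, used only for illustration.
print(overall_status({"system": True, "instance": True}))
print(overall_status({"system": True, "instance": False}))
```

A single failed check is enough to mark the instance as Impaired, which is why the sketch uses `all()` rather than a majority or a count.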
Status checks are part of EC2 and cannot be disabled or removed.
Status check data augments the information EC2 already provides about each instance’s intended state (such as pending, running, and stopping), as well as the utilization metrics that CloudWatch monitors (CPU utilization, network traffic, and disk activity).
Alarms that trigger on the results of status checks can be created or deleted as needed.

System Status Checks
System status checks monitor the AWS systems required to use the instance to ensure that they are working properly.
They detect problems that require AWS involvement to repair.
System status check failures can be caused by:
Loss of network connectivity
System power loss
Software issues on the physical host
Hardware issues on the physical host
If a system status check fails, you can:
Check the Personal Health Dashboard for any critical maintenance that AWS has scheduled on the instance’s host
Wait for AWS to resolve the issue
Resolve it yourself by stopping and starting the instance, or by terminating and replacing it

Instance Status Checks
Instance status checks monitor the software and network configuration of each individual instance.
They detect problems that require your involvement to repair.
Instance status check failures can be caused by:
Failed system status checks
Misconfigured networking or startup configuration
Exhausted memory
Corrupted file systems
Incompatible kernel
Failed instance status checks can be fixed by rebooting the instance or by making changes to the operating system.

CloudWatch Monitoring
CloudWatch monitors EC2 instances by collecting raw data from EC2 and processing it into readable, near real-time metrics.
Statistics are retained so that historical information can be accessed to better understand how the application or service is performing. CloudWatch keeps metric data for up to 15 months at progressively coarser granularity, rather than the two weeks it originally retained.
By default, basic monitoring is enabled and EC2 sends metric data to CloudWatch automatically in 5-minute intervals.
Detailed monitoring can be enabled on an EC2 instance, which sends data to CloudWatch every 1 minute.
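As a minimal sketch of switching an instance from basic to detailed monitoring, the helper below wraps the EC2 `MonitorInstances` call as exposed by a boto3 EC2 client. The helper name `enable_detailed_monitoring` is an assumption made for illustration, and the client is passed in so the function can be exercised without AWS credentials.

```python
from typing import Any, Dict, List


def enable_detailed_monitoring(ec2_client: Any, instance_ids: List[str]) -> Dict[str, str]:
    """Request detailed (1-minute) monitoring for the given instances.

    ec2_client is expected to behave like boto3.client("ec2").
    Returns a mapping of instance ID to the reported monitoring state.
    """
    response = ec2_client.monitor_instances(InstanceIds=instance_ids)
    return {
        item["InstanceId"]: item["Monitoring"]["State"]
        for item in response.get("InstanceMonitorings", [])
    }
```

With boto3 installed and credentials configured, this could be called as `enable_detailed_monitoring(boto3.client("ec2"), ["i-0123456789abcdef0"])`, where the instance ID is a placeholder. Detailed monitoring is billed separately, so it is usually enabled only where 1-minute data is needed.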

Aggregating Statistics Across Instances/ASG/AMI ID
Aggregate statistics are available for instances that have detailed monitoring enabled (at an additional charge), which provides data in 1-minute intervals.
Instances that use basic monitoring are not included in the aggregates.
CloudWatch does not aggregate data across Regions, so metrics are completely separate between Regions.
If no dimension is specified, CloudWatch returns statistics for all dimensions in the AWS/EC2 namespace.
This technique for retrieving all dimensions across an AWS namespace does not work for custom namespaces that you publish to CloudWatch.
Available statistics are Sum, Average, Minimum, Maximum, and SampleCount (data samples).
To retrieve statistics using custom namespaces, you must specify the entire set of dimensions associated with each data point.
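The effect of dimensions on aggregation can be sketched by building the request for CloudWatch's `GetMetricStatistics` call. The helper name `build_stats_request` and the group name in the usage note are hypothetical; the parameter names follow the boto3 `get_metric_statistics` keyword arguments.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional, Tuple


def build_stats_request(
    metric_name: str,
    dimension: Optional[Tuple[str, str]] = None,
    hours: int = 1,
    period: int = 60,
    end_time: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for a CloudWatch get_metric_statistics call.

    With dimension=None the request carries no Dimensions, which for the
    AWS/EC2 namespace asks for statistics aggregated across all dimensions.
    Passing e.g. ("AutoScalingGroupName", "<group>") or ("ImageId", "<ami>")
    narrows the aggregate to that group or AMI.
    """
    end = end_time or datetime.now(timezone.utc)
    request: Dict[str, Any] = {
        "Namespace": "AWS/EC2",
        "MetricName": metric_name,
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,  # seconds; 60 matches detailed monitoring
        "Statistics": ["Average", "Maximum"],
    }
    if dimension is not None:
        name, value = dimension
        request["Dimensions"] = [{"Name": name, "Value": value}]
    return request
```

The resulting dictionary could be passed as `boto3.client("cloudwatch").get_metric_statistics(**request)`. For a custom namespace, the `Namespace` value would change and the full set of dimensions would have to be supplied, as noted above.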
CloudWatch alarms can be created to monitor any of the EC2 instance metrics.
An alarm can be configured to send a notification, for example by email, when a metric crosses a specified threshold.
Alarm actions can also automatically stop, terminate, reboot, or recover EC2 instances.
Recovery applies when an instance becomes impaired due to an underlying hardware problem that requires AWS involvement to repair.
Stopping or terminating instances automatically can help save costs.
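An alarm that triggers one of these automatic actions could be defined roughly as follows. This is a sketch under stated assumptions: the helper name `build_status_alarm`, the alarm naming scheme, and the chosen period and evaluation count are illustrative. The keys follow the boto3 `put_metric_alarm` keyword arguments, and the action ARNs use the `arn:aws:automate:<region>:ec2:<action>` form.

```python
from typing import Any, Dict

# EC2 actions that a CloudWatch alarm can trigger.
EC2_ALARM_ACTIONS = ("stop", "terminate", "reboot", "recover")


def build_status_alarm(instance_id: str, region: str, action: str = "recover") -> Dict[str, Any]:
    """Build keyword arguments for a CloudWatch put_metric_alarm call.

    The alarm watches the system status check of one instance and runs
    the chosen EC2 action once the check has failed for two consecutive
    1-minute periods (an illustrative choice, not a recommendation).
    """
    if action not in EC2_ALARM_ACTIONS:
        raise ValueError(f"unsupported EC2 alarm action: {action!r}")
    return {
        "AlarmName": f"{action}-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:{action}"],
    }
```

The dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**alarm)`. For cost saving, a similar alarm would more typically watch a utilization metric such as `CPUUtilization` with a stop or terminate action; an email notification would instead list an SNS topic ARN under `AlarmActions`.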