
What Is AWS CloudWatch?

Why Observability Matters

When something breaks in production, you need to answer three questions fast:

  1. Is it broken? — metrics and alarms tell you before users do
  2. What happened? — logs tell you the exact error and when it occurred
  3. Why did it happen? — traces and correlated logs help with root cause

CloudWatch handles the first two out of the box for AWS services. Everything your Lambda function logs, every EC2 CPU spike, every RDS connection error — it all flows into CloudWatch.

Logs

Log Groups

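A log group is a container for log streams that share the same retention and access settings, typically one per application or resource. Most are created for you the first time a service or agent writes logs, although some services (RDS and API Gateway, for example) only do so once you enable logging: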

  • Lambda: /aws/lambda/<function-name>
  • EC2 (via agent): any name you configure, for example /var/log/nginx/access.log
  • RDS: /aws/rds/instance/<db-id>/error
  • API Gateway: API-Gateway-Execution-Logs_<id>/<stage>

Log Streams

Within a log group, each log stream holds events from one source. For Lambda, each execution environment (a running container for your function) gets its own stream. For EC2, the agent's default is one stream per instance, named after the instance ID.

Reading Lambda Logs

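Every console.log() in your Lambda handler automatically appears in CloudWatch, provided the function's execution role can write to CloudWatch Logs (the AWSLambdaBasicExecutionRole managed policy grants this). After invoking a function, go to: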

Lambda Console → Monitor → View logs in CloudWatch

Or directly in the CloudWatch console:

CloudWatch → Log groups → /aws/lambda/my-function → <latest stream>
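Because each console.log() call becomes a log event, writing one JSON object per line tends to make the logs easier to search later. The following is a minimal sketch, not a prescribed pattern: the formatLogLine helper, the handler's event shape, and the field names (level, message, orderId) are all hypothetical.

```typescript
// Hypothetical helper: serialize one log entry as a single JSON line.
type LogLevel = "INFO" | "WARN" | "ERROR";

function formatLogLine(
  level: LogLevel,
  message: string,
  context: Record<string, unknown> = {},
): string {
  // Spread context first so that level and message cannot be overwritten by it.
  return JSON.stringify({ ...context, level, message });
}

// Illustrative handler; the orderId field on the event is an assumption.
async function handler(orderEvent: { orderId: string }): Promise<{ ok: boolean }> {
  console.log(formatLogLine("INFO", "order received", { orderId: orderEvent.orderId }));
  return { ok: true };
}
```

CloudWatch Logs Insights discovers the fields of JSON log events automatically, so a line written this way can later be filtered with a condition such as level = "ERROR".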

Each log event includes a timestamp and the message. Lambda also automatically logs:

START RequestId: abc-123 Version: $LATEST
END RequestId: abc-123
REPORT RequestId: abc-123 Duration: 45.23 ms Billed Duration: 46 ms Memory Size: 128 MB Max Memory Used: 67 MB
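If you export or tail these lines outside the console, the REPORT line can be turned into structured data. The sketch below assumes the field layout shown above; parseReportLine and the LambdaReport interface are hypothetical names, and real REPORT lines may carry extra fields (such as Init Duration) that this pattern ignores.

```typescript
interface LambdaReport {
  requestId: string;
  durationMs: number;
  billedDurationMs: number;
  memorySizeMb: number;
  maxMemoryUsedMb: number;
}

// Parse a REPORT line in the layout shown above; returns null if it does not match.
function parseReportLine(line: string): LambdaReport | null {
  // \s+ accepts both spaces and the tabs Lambda places between fields.
  const pattern =
    /^REPORT RequestId: (\S+)\s+Duration: ([\d.]+) ms\s+Billed Duration: (\d+) ms\s+Memory Size: (\d+) MB\s+Max Memory Used: (\d+) MB/;
  const match = pattern.exec(line);
  if (match === null) return null;
  const [, requestId = "", duration = "0", billed = "0", memorySize = "0", maxMemory = "0"] = match;
  return {
    requestId,
    durationMs: Number(duration),
    billedDurationMs: Number(billed),
    memorySizeMb: Number(memorySize),
    maxMemoryUsedMb: Number(maxMemory),
  };
}
```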

Metrics

Metrics are numeric time-series data points. CloudWatch automatically collects:

Service | Key Metrics
Lambda | Invocations, Errors, Duration, Throttles, ConcurrentExecutions
EC2 | CPUUtilization, NetworkIn/Out, DiskReadOps
RDS | CPUUtilization, DatabaseConnections, FreeStorageSpace, ReadLatency
ALB | RequestCount, HTTPCode_Target_5XX_Count, TargetResponseTime

You can also publish custom metrics from your application:

import { CloudWatchClient, PutMetricDataCommand } from "@aws-sdk/client-cloudwatch";

// Create a client for the region where the metric should be stored.
const cw = new CloudWatchClient({ region: "us-east-1" });

// Publish one data point for the custom metric MyApp/OrdersProcessed.
await cw.send(new PutMetricDataCommand({
  Namespace: "MyApp",
  MetricData: [{
    MetricName: "OrdersProcessed",
    Value: 1,
    Unit: "Count",
  }],
}));
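Custom metrics are often more useful when tagged with dimensions, so the same metric can be split by environment or service. The sketch below builds one entry for the MetricData array used above; buildDatum and the dimension names (Environment, Service) are hypothetical, and the Unit type is narrowed to two values purely for illustration.

```typescript
interface MetricDatum {
  MetricName: string;
  Value: number;
  Unit: "Count" | "Milliseconds";
  Timestamp: Date;
  Dimensions: { Name: string; Value: string }[];
}

// Build one data point, converting a plain key/value map into CloudWatch's
// Name/Value dimension pairs.
function buildDatum(
  metricName: string,
  value: number,
  unit: MetricDatum["Unit"],
  dimensions: Record<string, string>,
  timestamp: Date = new Date(),
): MetricDatum {
  return {
    MetricName: metricName,
    Value: value,
    Unit: unit,
    Timestamp: timestamp,
    Dimensions: Object.entries(dimensions).map(([key, dimensionValue]) => ({
      Name: key,
      Value: dimensionValue,
    })),
  };
}

// Example: one processed order in a hypothetical "prod" environment.
const orderDatum = buildDatum("OrdersProcessed", 1, "Count", {
  Environment: "prod",
  Service: "checkout",
});
```

Keep in mind that CloudWatch treats every unique combination of dimension values as a separate metric, so high-cardinality values such as user IDs are usually a poor fit for dimensions.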

Alarms

An alarm monitors a metric and triggers an action when it crosses a threshold.

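Example: alert when a Lambda function records 10 or more errors within 5 minutes. (The Errors metric is a count; alerting on an error rate such as 5% requires metric math that divides Errors by Invocations.)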

To create an alarm:

  1. CloudWatch → Alarms → Create Alarm
  2. Select metric: Lambda → By Function Name → Errors
  3. Set statistic to Sum, period to 5 minutes, and threshold to >= 10 errors
  4. Set action: Send notification to SNS topic → email
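The same alarm can be defined in code rather than through the console. The sketch below only builds the request parameters for the steps above; buildLambdaErrorAlarm, the alarm naming scheme, and the choice of TreatMissingData are assumptions, and the SNS topic ARN is whatever topic you have already created.

```typescript
interface ErrorAlarmInput {
  AlarmName: string;
  Namespace: string;
  MetricName: string;
  Dimensions: { Name: string; Value: string }[];
  Statistic: string;
  Period: number;
  EvaluationPeriods: number;
  Threshold: number;
  ComparisonOperator: string;
  TreatMissingData: string;
  AlarmActions: string[];
}

// Build parameters for an alarm that fires at >= 10 errors in one 5-minute period.
function buildLambdaErrorAlarm(functionName: string, snsTopicArn: string): ErrorAlarmInput {
  return {
    AlarmName: `${functionName}-errors`, // hypothetical naming scheme
    Namespace: "AWS/Lambda",
    MetricName: "Errors",
    Dimensions: [{ Name: "FunctionName", Value: functionName }],
    Statistic: "Sum", // total errors within each period
    Period: 300, // seconds
    EvaluationPeriods: 1,
    Threshold: 10,
    ComparisonOperator: "GreaterThanOrEqualToThreshold",
    // Assumption: periods without any data points should not trigger the alarm.
    TreatMissingData: "notBreaching",
    AlarmActions: [snsTopicArn],
  };
}
```

An object of this shape can be passed to PutMetricAlarmCommand from @aws-sdk/client-cloudwatch and sent with the same client used for custom metrics.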

Common alarms to set up for any production app:

Alarm | Metric | Threshold
High Lambda errors | Lambda Errors | > 5 in 5 min
High EC2 CPU | EC2 CPUUtilization | > 80% for 10 min
Low RDS disk space | RDS FreeStorageSpace | < 2 GB
5xx errors on ALB | HTTPCode_Target_5XX_Count | > 10 in 1 min
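One way to keep such a baseline consistent across environments is to hold it as data. The sketch below encodes the four alarms with their CloudWatch namespaces; the AlarmSpec shape, the statistics, and the period split for "80% for 10 min" (two 5-minute periods) are assumptions. FreeStorageSpace is reported in bytes, so the 2 GB threshold is converted, here treating 1 GB as 1024^3 bytes.

```typescript
interface AlarmSpec {
  name: string;
  namespace: string;
  metricName: string;
  statistic: "Sum" | "Average" | "Minimum";
  periodSeconds: number;
  evaluationPeriods: number;
  comparison: ">" | "<";
  threshold: number;
}

const BYTES_PER_GB = 1024 * 1024 * 1024; // assumption: binary gigabytes

const commonAlarms: AlarmSpec[] = [
  { name: "High Lambda errors", namespace: "AWS/Lambda", metricName: "Errors",
    statistic: "Sum", periodSeconds: 300, evaluationPeriods: 1, comparison: ">", threshold: 5 },
  { name: "High EC2 CPU", namespace: "AWS/EC2", metricName: "CPUUtilization",
    statistic: "Average", periodSeconds: 300, evaluationPeriods: 2, comparison: ">", threshold: 80 },
  { name: "Low RDS disk space", namespace: "AWS/RDS", metricName: "FreeStorageSpace",
    statistic: "Minimum", periodSeconds: 300, evaluationPeriods: 1, comparison: "<", threshold: 2 * BYTES_PER_GB },
  { name: "5xx errors on ALB", namespace: "AWS/ApplicationELB", metricName: "HTTPCode_Target_5XX_Count",
    statistic: "Sum", periodSeconds: 60, evaluationPeriods: 1, comparison: ">", threshold: 10 },
];

// Check a single aggregated value against a spec's threshold.
function isBreaching(spec: AlarmSpec, value: number): boolean {
  return spec.comparison === ">" ? value > spec.threshold : value < spec.threshold;
}
```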

CloudWatch Logs Insights

Logs Insights lets you query log groups with a pipe-based query language, which is much faster than searching raw log streams.

Find the 20 most recent errors (set the time range to the last hour in the query editor's time picker):

fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20
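The time range is not part of the query text; when a query is run through the API it is passed separately as start and end times in epoch seconds. The sketch below builds such a request for the query above; buildRecentErrorsQuery and the StartQueryInput interface name are hypothetical, while the field names follow the StartQuery API.

```typescript
interface StartQueryInput {
  logGroupName: string;
  queryString: string;
  startTime: number; // epoch seconds
  endTime: number; // epoch seconds
}

// Build a request that searches the hour ending at `now` for ERROR messages.
function buildRecentErrorsQuery(logGroupName: string, now: Date = new Date()): StartQueryInput {
  const endTime = Math.floor(now.getTime() / 1000);
  return {
    logGroupName,
    queryString: [
      "fields @timestamp, @message",
      "| filter @message like /ERROR/",
      "| sort @timestamp desc",
      "| limit 20",
    ].join("\n"),
    startTime: endTime - 3600, // one hour earlier
    endTime,
  };
}
```

An object of this shape can be sent with StartQueryCommand from @aws-sdk/client-cloudwatch-logs; the query runs asynchronously, so results are then polled with GetQueryResultsCommand until the status is Complete.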

Find the slowest Lambda invocations:

filter @type = "REPORT"
| fields @requestId, @duration
| sort @duration desc
| limit 10

Count errors per minute:

filter @message like /ERROR/
| stats count() by bin(1m)
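To make clear what this aggregation computes, here is a rough local equivalent: keep the events whose message matches, round each timestamp down to the start of its minute, and count per bucket. This is only an illustration of the semantics, not how CloudWatch implements it; countPerMinute and the LogEvent shape are hypothetical.

```typescript
interface LogEvent {
  timestamp: number; // epoch milliseconds
  message: string;
}

const MINUTE_MS = 60_000;

// Count matching events per one-minute bucket, keyed by the bucket's start time
// in epoch milliseconds.
function countPerMinute(events: LogEvent[], pattern: RegExp): Map<number, number> {
  const counts = new Map<number, number>();
  for (const logEvent of events) {
    // String.prototype.search ignores the regex's lastIndex, so a /g flag is harmless.
    if (logEvent.message.search(pattern) === -1) continue;
    const bucketStart = Math.floor(logEvent.timestamp / MINUTE_MS) * MINUTE_MS;
    counts.set(bucketStart, (counts.get(bucketStart) ?? 0) + 1);
  }
  return counts;
}
```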

CloudWatch Agent for EC2

EC2 doesn't automatically send application logs to CloudWatch. You need to install the CloudWatch agent:

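# On Ubuntu, AWS distributes the agent as a .deb download (amd64 shown; use the arm64 path on Graviton)
wget https://amazoncloudwatch-agent.s3.amazonaws.com/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
sudo dpkg -i -E ./amazon-cloudwatch-agent.deb
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
# Load the configuration the wizard saved and start the agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json
sudo systemctl enable amazon-cloudwatch-agent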

The wizard asks which log files to ship (e.g., /var/log/nginx/access.log, /home/ubuntu/app/logs/app.log) and what metrics to collect. The EC2 instance needs an IAM role with CloudWatchAgentServerPolicy attached.
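The wizard's answers end up in a JSON configuration file, which can also be generated instead of typed in interactively. The sketch below produces only the logs section of that file for the two example paths; buildAgentLogsConfig and the log group names (nginx-access, my-app) are hypothetical, and a full configuration would normally include a metrics section as well.

```typescript
interface LogFileSpec {
  path: string;
  logGroup: string;
}

// Build the "logs" section of a CloudWatch agent configuration.
function buildAgentLogsConfig(files: LogFileSpec[]) {
  return {
    logs: {
      logs_collected: {
        files: {
          collect_list: files.map((file) => ({
            file_path: file.path,
            log_group_name: file.logGroup,
            // The agent replaces {instance_id} with the EC2 instance ID.
            log_stream_name: "{instance_id}",
          })),
        },
      },
    },
  };
}

const agentConfigJson = JSON.stringify(
  buildAgentLogsConfig([
    { path: "/var/log/nginx/access.log", logGroup: "nginx-access" },
    { path: "/home/ubuntu/app/logs/app.log", logGroup: "my-app" },
  ]),
  null,
  2,
);
```

Writing agentConfigJson to a file and loading it with the agent's fetch-config action would take the place of running the wizard.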