Application Monitoring & Visualization
Direct Prometheus integration with Cloud Run can be awkward because of its serverless architecture. Google Cloud offers Managed Service for Prometheus (GMP) with a sidecar for streamlined metric collection, but for maximum simplicity and the deepest Google Cloud integration, the native monitoring tools are the most direct path.
Collecting Application Metrics (via Cloud Monitoring)
Option A: Structured Logging & Log-based Metrics (Most Direct)
- Output structured (JSON) logs directly to `stdout` in your Streamlit app.
- Cloud Run automatically forwards `stdout`/`stderr` to Cloud Logging; JSON strings printed to `stdout` are parsed as `jsonPayload`. This is the simplest mechanism.
Sample code snippet:
```python
import json
import time  # For realistic duration

# Example: In your Streamlit application code
def some_function_processing_a_task(task_type, success, duration_ms_value):
    log_data = {
        "message": f"Processed task: {task_type}",
        "task_type": task_type,
        "duration_ms": duration_ms_value,
        "success": success,
        # Severity is often inferred by Cloud Logging (stdout=INFO, stderr=ERROR).
        # Explicitly add "severity": "INFO" or "ERROR" if precise control is needed
        # for filtering or log-based metrics derived from severity.
    }
    # Print JSON to stdout; Cloud Run forwards this to Cloud Logging.
    # Cloud Logging will parse this into jsonPayload.
    print(json.dumps(log_data))

# Example usage:
# start_time = time.monotonic()
# # ... processing logic ...
# success_status = True  # or False
# duration = (time.monotonic() - start_time) * 1000
# some_function_processing_a_task("schedule_meeting", success_status, int(duration))
```
In Cloud Monitoring, create log-based metrics:
- Navigate: Google Cloud Console → Logging → Log-based Metrics → Create Metric.
- Choose Metric Type: Counter (for occurrences) or Distribution (for values like latency).
- Define Filter to isolate relevant logs, e.g.:

```
resource.type="cloud_run_revision"
resource.labels.service_name="meeting-workflow-agent"
jsonPayload.task_type="schedule_meeting"
```

- Specify Field Name (for Distribution, e.g., `jsonPayload.duration_ms`), units (e.g., ms), and labels for drill-down. Counter metrics can also be created programmatically, as sketched below.
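Counter-type log-based metrics do not have to be clicked together in the console. Below is a minimal sketch using the google-cloud-logging client library, assuming the calling identity has `roles/logging.configWriter`; the metric name and filter are illustrative, and Distribution metrics still need the console (or the Logging API directly), since this helper only defines Counter metrics.

```python
# A minimal sketch: create a Counter log-based metric from code.
# Metric name and filter are illustrative.
from google.cloud import logging as cloud_logging

client = cloud_logging.Client(project="adk-learning-journey")

log_filter = (
    'resource.type="cloud_run_revision" '
    'resource.labels.service_name="meeting-workflow-agent" '
    'jsonPayload.task_type="schedule_meeting" '
    'jsonPayload.success=true'
)

# Counter log-based metric: counts log entries matching the filter.
metric = client.metric(
    "successful_meeting_tasks",  # hypothetical metric name
    filter_=log_filter,
    description="Count of successfully processed schedule_meeting tasks",
)

if not metric.exists():
    metric.create()
```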
Option B: Using Cloud Monitoring API (Maximum Flexibility, More Code)
- Add `google-cloud-monitoring` to your `requirements.txt`.
- Utilize `google.cloud.monitoring_v3` to programmatically write custom metrics. This offers granular control.
Sample code:
```python
from google.cloud import monitoring_v3
import os
import sys
import time

project_id = "adk-learning-journey"  # Ensure this is configured or dynamically fetched
client = monitoring_v3.MetricServiceClient()
project_name = f"projects/{project_id}"

def write_custom_metric(metric_type, value, labels=None):
    series = monitoring_v3.types.TimeSeries()
    series.metric.type = f"custom.googleapis.com/{metric_type}"  # Standard prefix for custom metrics
    series.resource.type = "cloud_run_revision"
    # These labels are crucial for associating the metric with the correct Cloud Run resource.
    # K_SERVICE, K_REVISION, and K_CONFIGURATION are set automatically by Cloud Run.
    series.resource.labels["service_name"] = os.environ.get("K_SERVICE", "unknown")
    series.resource.labels["revision_name"] = os.environ.get("K_REVISION", "unknown")
    series.resource.labels["configuration_name"] = os.environ.get("K_CONFIGURATION", "unknown")
    # Note: the cloud_run_revision resource type also expects a "location" label;
    # the region is not exposed as an environment variable, so it must come from
    # the metadata server or your own configuration.
    if labels:
        for k, v in labels.items():
            series.metric.labels[k] = str(v)  # Metric labels must be strings

    point = monitoring_v3.types.Point()
    # Ensure value is the appropriate type (int64_value, double_value, etc.)
    if isinstance(value, int):
        point.value.int64_value = value
    elif isinstance(value, float):
        point.value.double_value = value
    else:
        # Handle other types or raise an error (e.g. Distribution for complex values).
        # For simplicity, this example assumes int or float.
        point.value.int64_value = int(value)

    now_ts = time.time()
    point.interval.end_time.seconds = int(now_ts)
    point.interval.end_time.nanos = int((now_ts - point.interval.end_time.seconds) * 10**9)
    series.points.append(point)

    # Call the API to create the time series
    try:
        client.create_time_series(name=project_name, time_series=[series])
    except Exception as e:
        # Implement proper error handling/logging for production
        print(f"Error writing custom metric {metric_type}: {e}", file=sys.stderr)

# Example usage:
# write_custom_metric("streamlit/successful_meetings", 1, {"agent_type": "manager"})
# write_custom_metric("streamlit/processing_latency_ms", 150.5, {"task_category": "report_generation"})
```
Set Up Grafana
- Deploy Grafana:
  - Option 1 (Recommended for GKE/GCE): Deploy via Google Cloud Marketplace if operating within these environments.
  - Option 2: Install Grafana in your designated environment.
- Connect Grafana to Google Cloud Monitoring:
  - Access Grafana.
  - Navigate: Configuration → Data Sources → Add data source.
  - Select "Google Cloud Monitoring".
  - Authentication:
    - If Grafana runs on GCE/GKE, leverage the attached service account (grant it `roles/monitoring.viewer`).
    - Otherwise, create a service account, assign `roles/monitoring.viewer`, download its JSON key, and upload it to Grafana (a quick permission check is sketched after this list).
  - Set Default Project: `adk-learning-journey`.
  - Save & Test. A successful test validates the connection.
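Before uploading the JSON key to Grafana, it can be worth confirming that the key really grants read access to Cloud Monitoring. Below is a minimal sketch with a hypothetical key file name; listing metric descriptors only requires `roles/monitoring.viewer`, so a failure here predicts a failing "Save & Test" in Grafana.

```python
# Minimal permission check for the Grafana service account key (illustrative paths/IDs).
from google.cloud import monitoring_v3

project_id = "adk-learning-journey"
key_path = "grafana-monitoring-viewer.json"  # hypothetical key file downloaded earlier

client = monitoring_v3.MetricServiceClient.from_service_account_file(key_path)

# Listing metric descriptors only needs monitoring.viewer; if this call fails,
# Grafana's "Save & Test" will fail with the same credentials.
descriptors = client.list_metric_descriptors(
    request={
        "name": f"projects/{project_id}",
        "filter": 'metric.type = starts_with("custom.googleapis.com/")',
    }
)
for d in list(descriptors)[:10]:
    print(d.type)
```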
Create Grafana Dashboards
- Create → Dashboard → Add new panel.
- Query Configuration:
  - Select the "Google Cloud Monitoring" data source.
  - Service: Choose "Cloud Run" for standard metrics or "Custom Metrics" for your defined metrics.
  - Metric: Select your log-based metric, standard Cloud Run metrics (e.g., Request Count, Latency, Instance Count), or your custom metric name (e.g., `custom.googleapis.com/streamlit/request_count`).
  - Utilize the query editor to filter and aggregate data (e.g., by `resource.labels.service_name`, `resource.labels.revision_name`, or custom metric labels). To confirm data is flowing before building panels, see the query sketch after this list.
- Visualization: Choose the optimal chart type for the data.
- Alerting: Configure alert rules in Grafana for critical metrics to ensure operational awareness.
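When a panel shows no data, it helps to verify that the metric actually has recent points before debugging Grafana itself. Below is a minimal sketch assuming the illustrative custom metric name from Option B; it pulls the last hour of time series through the Cloud Monitoring API.

```python
# Sanity-check that a metric has recent data before debugging Grafana panels.
# The metric name is illustrative (from the Option B usage example above).
import time
from google.cloud import monitoring_v3

project_id = "adk-learning-journey"
client = monitoring_v3.MetricServiceClient()

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {
        "end_time": {"seconds": now},
        "start_time": {"seconds": now - 3600},  # last hour
    }
)

results = client.list_time_series(
    request={
        "name": f"projects/{project_id}",
        "filter": 'metric.type = "custom.googleapis.com/streamlit/successful_meetings"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    print(series.resource.labels, len(series.points), "points")
```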
Cost Control & Optimization
- Monitor Costs: Regularly audit Google Cloud Billing reports. Filter by service (Cloud Run, Secret Manager, Logging, Monitoring) for granular insight.
- Set Budget Alerts: Implement budget alerts in Billing for proactive cost management.
- Optimize Cloud Run Settings:
  - Tune CPU, memory, and min/max instances based on observed performance data from Grafana/Cloud Monitoring.
  - Employ `--min-instances 0` for services with intermittent traffic to minimize idle costs.
- Logging & Monitoring Costs:
  - Be aware: Logging ingestion, storage, and Monitoring (especially custom metrics & API calls) have associated costs.
  - Reduce log ingestion costs by setting appropriate application log levels at the source (see the level-gating sketch below). Log-based metric filters refine insights from ingested logs; they do not reduce the initial volume of logs stored.
  - For custom metrics, govern write frequency to balance granularity with cost.
- Secret Manager Costs: Costs are driven by the number of secrets and access frequency. Reading secrets primarily at application startup minimizes these operational costs (a caching sketch follows below).
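To reduce ingestion at the source, structured logs can be gated behind a configurable level so debug-level output never reaches stdout in production. Below is a minimal sketch using Python's standard logging module; the logger name, `LOG_LEVEL` variable, and field layout are illustrative.

```python
# Level-gated structured logging: DEBUG entries are dropped in production,
# so they are never ingested (or billed) by Cloud Logging. Names are illustrative.
import json
import logging
import os
import sys

class JsonFormatter(logging.Formatter):
    def format(self, record):
        payload = {
            "message": record.getMessage(),
            "severity": record.levelname,  # Cloud Logging maps this field to log severity
        }
        # Attach any structured fields passed via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("meeting-workflow-agent")
logger.addHandler(handler)
logger.setLevel(os.environ.get("LOG_LEVEL", "INFO"))  # set DEBUG only when troubleshooting

# Usage:
# logger.info("Processed task", extra={"fields": {"task_type": "schedule_meeting", "duration_ms": 150}})
# logger.debug("verbose internals")  # emitted only when LOG_LEVEL=DEBUG
```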
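For Secret Manager, reading each secret once per process and caching the value keeps access counts, and therefore cost, low. Below is a minimal sketch assuming the google-cloud-secret-manager library and an illustrative secret name; in a Streamlit app, `functools.lru_cache` (or `st.cache_resource`) gives the same effect.

```python
# Fetch a secret once per process and reuse it; repeated calls hit the cache,
# not the Secret Manager API. Secret and project names are illustrative.
from functools import lru_cache
from google.cloud import secretmanager

@lru_cache(maxsize=None)
def get_secret(secret_id: str, project_id: str = "adk-learning-journey", version: str = "latest") -> str:
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/{version}"
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("UTF-8")

# Usage at startup (subsequent calls are free cache hits):
# api_key = get_secret("calendar-api-key")
```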