Forem: quietpulse

Rails Scheduled Job Monitoring: How to Catch Missed Jobs Before They Break Production

quietpulse — Mon, 11 May 2026 06:13:47 +0000

Rails scheduled job monitoring is easy to forget because scheduled work usually lives in the background. Your web app is up, requests are fine, the database is responding, and dashboards look green. Meanwhile, a nightly billing sync, cleanup task, email digest, or data import may have stopped running three days ago.

That is the dangerous part: scheduled jobs often fail quietly.

A Rails app can look completely healthy while important recurring work is missing. Users may not notice right away. You may not notice right away. Then suddenly invoices are wrong, trial expirations did not happen, reports are stale, or a queue is full of old data.

This guide covers how Rails scheduled jobs fail, why logs are not enough, and how to use heartbeat monitoring to catch missed executions before they become production incidents.

The problem

Rails apps often rely on scheduled background work for things that are not directly tied to a web request.

Common examples include:

sending daily or weekly email digests
charging subscriptions
syncing data from third-party APIs
expiring trials or temporary records
cleaning old sessions, uploads, or audit logs
generating reports
enqueueing recurring jobs
refreshing cached data
retrying failed external operations

These tasks may be implemented with different tools:

plain cron
whenever
sidekiq-cron
sidekiq-scheduler
good_job
solid_queue
delayed_job
Heroku Scheduler
Kubernetes CronJobs
systemd timers
custom Rake tasks

The implementation changes, but the monitoring problem stays the same.

A scheduled job can fail in several ways:

it never starts
it starts but crashes
it hangs forever
it runs on the wrong schedule
it runs on one environment but not another
it queues work but workers are down
it silently skips important records
it completes locally but fails in production

The most frustrating failure mode is the missing run. Nothing explodes. No exception is raised. No user request fails. The scheduled job simply does not happen.

That is exactly the kind of issue normal Rails monitoring often misses.

Why it happens

Rails scheduled job failures usually come from small operational details rather than dramatic bugs.

One common cause is a broken cron environment. Cron does not load the same shell profile as your interactive terminal. Environment variables may be missing. Ruby, Bundler, or Rails paths may be different. A command that works perfectly over SSH may fail when cron runs it.

For example:

bundle exec rails runner "Billing::SyncJob.perform_now"

might work in your shell, while cron fails because it cannot find bundle, does not have RAILS_ENV=production, or runs from the wrong directory.

Another common issue is deployment drift. A scheduled task may be configured on an old server, a staging box, or a container that no longer exists. After an infrastructure migration, the app is still online, but the scheduler was never recreated.

Queue-backed scheduling adds another layer. With Sidekiq, GoodJob, Solid Queue, or Delayed Job, there are two separate things to monitor:

Did the scheduler enqueue the job?
Did a worker actually execute it?

If the scheduler runs but workers are stopped, jobs pile up. If workers run but the scheduler is broken, nothing gets enqueued. Looking at only one side can give you a false sense of safety.

Rails deployments also make scheduled work easy to accidentally duplicate or disable. You may have multiple app servers, multiple containers, or multiple release directories. If every instance runs the scheduler, the job may execute many times. If none of them run it, the job disappears completely.

There are also application-level causes:

feature flags disable part of the job
a database query becomes too slow and times out
an API token expires
a lock never releases
a migration changes a column the job depends on
the job rescues exceptions too broadly
a retry loop hides the real failure

In all of these cases, the Rails app can still serve web traffic normally.

That is why Rails scheduled job monitoring needs to focus on the scheduled work itself, not just the app process.

Why it's dangerous

Silent scheduled job failures can be expensive because they often affect delayed, accumulated, or business-critical work.

If a cleanup job stops running, the impact may start small. A few old records remain. Disk usage grows a little. Queries become slightly slower. Then, weeks later, storage fills up or a table becomes painfully large.

If a billing job stops running, the damage is more direct. Customers may not be charged, invoices may not be sent, subscription states may drift, or payment retries may never happen.

If a sync job stops running, your app may show stale data. Users may make decisions based on old information. Support tickets appear, but the root cause is not obvious.

If an email digest job stops running, engagement drops quietly. Nobody gets paged. The app is up. But an important product loop is broken.

The same pattern appears across many Rails systems:

failed nightly reports
missed customer notifications
stuck import pipelines
stale search indexes
broken cache refreshes
abandoned trial expiration tasks
missed webhook retry jobs
incomplete analytics rollups

Traditional monitoring often does not catch these failures.

Uptime checks only confirm that an HTTP endpoint responds. Error tracking catches exceptions only if the job raises and reports them. Logs help only if someone searches them or has log-based alerts configured correctly. Queue dashboards show queue state, but not always whether a recurring job was expected and missed.

The dangerous question is not just “did something fail?”

It is also:

Did the job run when it was supposed to?

That is the core question Rails scheduled job monitoring should answer.

How to detect it

The simplest reliable pattern is heartbeat monitoring.

A heartbeat is a small signal sent by your scheduled job when it runs successfully. An external monitor expects that signal on a schedule. If the signal does not arrive within the expected time window, it alerts you.

Instead of only watching for errors, you watch for proof of success.

For example, if a Rails job should run every night at 02:00, the monitor expects one successful ping every 24 hours. With a 30-minute grace period, a missing ping at 02:30 means something is wrong:

cron did not run
the scheduler is misconfigured
the Rails command crashed
the job hung before completion
the worker never processed it
the server was down
the deploy broke the task

The monitor does not need to know which failure happened first. It knows the important outcome: the scheduled job did not complete successfully on time.

That is the key advantage.

For Rails apps, a heartbeat should usually be sent at the end of the job, after the important work is complete. This avoids false success signals.

Bad pattern:

class NightlyBillingJob < ApplicationJob
  def perform
    ping_monitor
    Billing::RunNightlySync.call
  end
end

If billing fails after the ping, the monitor still sees success.

Better pattern:

class NightlyBillingJob < ApplicationJob
  def perform
    Billing::RunNightlySync.call
    ping_monitor
  end
end

Now the heartbeat means the job actually reached the end.

For jobs with multiple critical steps, you can ping only after all required steps finish. If the job partially completes and then fails, the missing heartbeat tells you something needs attention.
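As a minimal sketch of that idea, the helper below runs a list of steps and pings only when every one of them finishes. The name `run_with_heartbeat` and the use of plain callables for the steps and the ping are assumptions made so the example runs without Rails; they are not part of any library.

```ruby
# Hypothetical helper: run each step in order and ping only if all succeed.
# `steps` and `ping` are plain callables so the sketch runs without Rails.
def run_with_heartbeat(steps, ping:)
  steps.each(&:call) # any exception here propagates and skips the ping
  ping.call
  true
end

pings = 0
ping = -> { pings += 1 }

# First run: both steps succeed, so one heartbeat is sent.
run_with_heartbeat([-> { :charged }, -> { :emailed }], ping: ping)

begin
  # Second run: the last step fails, so no heartbeat is sent.
  run_with_heartbeat([-> { :charged }, -> { raise "email provider down" }], ping: ping)
rescue RuntimeError
  # A real job would let this propagate to retries and error tracking.
end

puts pings # prints 1
```

Only the fully successful run produces a heartbeat, so a partial run looks the same to the monitor as a run that never started.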

This is different from logging. Logs describe what happened inside your system. Heartbeats prove that an expected scheduled outcome happened from the outside.

A good Rails scheduled job monitoring setup usually tracks:

expected frequency
grace period
last successful run
missed runs
alert channel
job identity
which environment is monitored (usually production only)

The grace period matters. If a job runs every hour, you may allow 10 or 15 extra minutes before alerting. If a nightly job usually takes 20 minutes, do not alert after 2 minutes. Monitor the real expected completion window.
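To make the arithmetic concrete, here is a rough sketch of the check a monitor might perform. The `overdue?` helper and its parameter names are hypothetical and do not describe how any particular monitoring service works; it only shows how the interval and grace period combine into a deadline.

```ruby
# Hypothetical monitor-side check: a run is overdue once the expected
# interval plus the grace period has passed since the last successful ping.
def overdue?(last_ping_at:, interval_seconds:, grace_seconds:, now: Time.now.utc)
  deadline = last_ping_at + interval_seconds + grace_seconds
  now > deadline
end

# A nightly job that last pinged at 02:20 UTC, with a 30-minute grace period.
last_ping = Time.utc(2026, 5, 10, 2, 20)
day = 24 * 60 * 60
grace = 30 * 60

# The deadline is 02:50 UTC the next day.
early = overdue?(last_ping_at: last_ping, interval_seconds: day,
                 grace_seconds: grace, now: Time.utc(2026, 5, 11, 2, 40))
late = overdue?(last_ping_at: last_ping, interval_seconds: day,
                grace_seconds: grace, now: Time.utc(2026, 5, 11, 3, 0))

puts early # prints false
puts late  # prints true
```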

Simple solution (with example)

Here is a simple Rails example using a heartbeat ping at the end of a scheduled job.

Imagine you have a job that runs every night and syncs subscription states:

# app/jobs/nightly_subscription_sync_job.rb
require "net/http"

class NightlySubscriptionSyncJob < ApplicationJob
  queue_as :default

  def perform
    SubscriptionSync.run!
    ping_monitor
  end

  private

  # Sends the success heartbeat. Failures are logged, never raised,
  # so a monitoring outage cannot fail a sync that already succeeded.
  def ping_monitor
    return unless Rails.env.production?

    uri = URI("https://quietpulse.xyz/ping/YOUR_TOKEN")

    # Bound both the connection and the read so a slow monitor cannot stall the job.
    Net::HTTP.start(uri.host, uri.port, use_ssl: true, open_timeout: 5, read_timeout: 5) do |http|
      request = Net::HTTP::Get.new(uri)
      http.request(request)
    end
  rescue StandardError => e
    Rails.logger.warn("Heartbeat ping failed: #{e.class}: #{e.message}")
  end
end

The important details:

the ping happens after SubscriptionSync.run!
it only runs in production
it has a short timeout
ping failure is logged but does not break the job
the URL uses a simple success ping endpoint
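Because the same ping logic tends to be repeated across jobs and Rake tasks, one option is to pull it into a small helper. The sketch below is an assumption, not a QuietPulse client library: the `Heartbeat` module and its `ping` method are hypothetical names, and only Ruby's standard `net/http` is used. It also checks the response status, which the job above does not.

```ruby
require "net/http"
require "uri"

# Hypothetical helper module; the name and interface are illustrative only.
module Heartbeat
  # Sends a GET to the ping URL with short timeouts. Returns true on a 2xx
  # response and false on any other response, bad URL, or network error.
  def self.ping(url, open_timeout: 5, read_timeout: 5)
    uri = URI(url)
    response = Net::HTTP.start(uri.host, uri.port,
                               use_ssl: uri.scheme == "https",
                               open_timeout: open_timeout,
                               read_timeout: read_timeout) do |http|
      http.get(uri.request_uri)
    end
    response.is_a?(Net::HTTPSuccess)
  rescue StandardError
    # Never raise from the ping: the job's own work has already finished.
    false
  end
end
```

A job could then call `Heartbeat.ping(url)` as its last line and log a warning when it returns false.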

You can schedule this job with whichever tool your Rails app already uses.

With sidekiq-cron, the schedule might look like this:

nightly_subscription_sync:
  cron: "0 2 * * *"
  class: "NightlySubscriptionSyncJob"
  queue: default

With whenever, you may schedule a Rails runner or Rake task:

# config/schedule.rb
set :environment, "production"

every 1.day, at: "2:00 am" do
  runner "NightlySubscriptionSyncJob.perform_later"
end

With a Rake task:

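# lib/tasks/subscriptions.rake
require "net/http"

namespace :subscriptions do
  desc "Run nightly subscription sync"
  task nightly_sync: :environment do
    SubscriptionSync.run!

    if Rails.env.production?
      begin
        uri = URI("https://quietpulse.xyz/ping/YOUR_TOKEN")
        # Short timeouts so a slow monitor cannot stall the task.
        Net::HTTP.start(uri.host, uri.port, use_ssl: true, open_timeout: 5, read_timeout: 5) do |http|
          http.get(uri.request_uri)
        end
      rescue StandardError => e
        # A failed ping is logged but must not fail a sync that already succeeded.
        Rails.logger.warn("Heartbeat ping failed: #{e.class}: #{e.message}")
      end
    end
  end
end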

Then cron could run:

cd /var/www/myapp/current && RAILS_ENV=production bundle exec rake subscriptions:nightly_sync

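A more defensive shell version makes sure the heartbeat only fires after the Rails task succeeds. If you ping from the shell like this, remove the ping from the Rake task itself so each run sends a single heartbeat: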

cd /var/www/myapp/current &&
RAILS_ENV=production bundle exec rake subscriptions:nightly_sync &&
curl -fsS --max-time 10 https://quietpulse.xyz/ping/YOUR_TOKEN

The && matters. It means the ping only runs if the previous command exits successfully.

If you use a heartbeat monitoring tool like QuietPulse, you create a check with the expected interval, add the generated ping URL to the end of your job, and receive an alert if the job misses its window. You can build something similar yourself, but using a small external monitor is usually simpler and more reliable than having the app monitor its own missing work.

The main idea is not tool-specific: every important scheduled job should produce an external success signal.

Common mistakes

1. Pinging at the start of the job

This is the most common mistake.

If you ping at the start, you only prove that the job began. You do not prove that it completed.

For short, simple jobs, that may feel good enough. But for billing, syncs, reports, imports, and cleanup tasks, completion matters much more than startup.

Ping after the critical work finishes.

2. Monitoring only the queue

Queue dashboards are useful, but they are not the same as scheduled job monitoring.

A queue may look healthy while a recurring job is never enqueued. Or the scheduler may enqueue the job successfully while workers are stuck. You need to monitor the expected completion of the scheduled task, not just the presence of a worker process.

3. Using one heartbeat for many jobs

One generic “daily jobs ran” heartbeat is tempting, but it hides which job failed.

If you have separate billing, cleanup, report, and sync jobs, give important jobs their own checks. That way, the alert tells you exactly what is missing.
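One way to keep per-job checks manageable is a small lookup from job name to ping URL. This is only a sketch: the job names, the environment variable names, and the `heartbeat_url_for` helper are all made up for illustration.

```ruby
# Hypothetical registry: one heartbeat URL per important job, read from
# environment variables whose names are invented for this example.
HEARTBEAT_ENV_KEYS = {
  "NightlyBillingJob" => "BILLING_HEARTBEAT_URL",
  "NightlyCleanupJob" => "CLEANUP_HEARTBEAT_URL",
  "WeeklyReportJob"   => "REPORT_HEARTBEAT_URL"
}.freeze

# Returns the ping URL for a job, or nil when the job has no check
# registered or its environment variable is not set.
def heartbeat_url_for(job_name, env: ENV)
  key = HEARTBEAT_ENV_KEYS[job_name]
  key && env[key]
end

# Usage with an explicit hash instead of the real environment:
sample_env = { "BILLING_HEARTBEAT_URL" => "https://example.test/ping/billing" }
puts heartbeat_url_for("NightlyBillingJob", env: sample_env)
```

Keeping the URLs in configuration rather than in code also makes it easy to leave them unset outside production.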

4. Ignoring time zones

Rails, cron, Sidekiq, Kubernetes, and hosting platforms may use different time zones.

A job scheduled for “2 AM” may not mean what you think it means. Daylight saving time can also surprise you.

Use UTC where possible, document the expected schedule, and set heartbeat grace periods based on real execution times.
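A quick illustration of why the wall-clock time is ambiguous: the same "02:00" maps to different UTC instants depending on the server's offset. The offsets below are illustrative values (US Eastern in summer and winter), not something taken from a specific deployment.

```ruby
# The same local "02:00" converted to UTC under two different offsets.
summer = Time.new(2026, 7, 1, 2, 0, 0, "-04:00").utc
winter = Time.new(2026, 12, 1, 2, 0, 0, "-05:00").utc

puts summer.hour # prints 6
puts winter.hour # prints 7
```

A monitor configured in UTC would see this job's completion time shift by an hour across the year, which is one reason to size the grace period from real execution times.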

5. Swallowing exceptions too broadly

Some Rails jobs rescue everything to avoid retry storms:

rescue StandardError
  nil
end

That pattern can hide real failures. If the job also sends a heartbeat after the rescue, monitoring becomes misleading.

Log exceptions clearly, report them to error tracking, and only send the heartbeat after the required work actually succeeded.
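A sketch of that shape, under the assumption that the work, the error reporter, and the ping are passed in as callables (the `perform_sync` name and its parameters are hypothetical, chosen so the example runs without Rails or an error-tracking gem):

```ruby
# Hypothetical job body: `work`, `report`, and `ping` stand in for the real
# sync, the error tracker call, and the heartbeat request.
def perform_sync(work:, report:, ping:)
  begin
    work.call
  rescue StandardError => e
    report.call(e) # surface the failure instead of returning nil
    raise          # re-raise so retries and error tracking still see it
  end
  ping.call # reached only when the work raised nothing
end

reported = []
pinged = false

begin
  perform_sync(
    work: -> { raise ArgumentError, "bad record" },
    report: ->(e) { reported << e.message },
    ping: -> { pinged = true }
  )
rescue ArgumentError
  # The caller (or the queue's retry logic) still sees the original error.
end

puts reported.inspect # prints ["bad record"]
puts pinged           # prints false
```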

Alternative approaches

Heartbeat monitoring is the most direct way to detect missed scheduled jobs, but it works best alongside other signals.

Logs are still useful. Rails logs can show job start times, durations, record counts, API failures, and SQL issues. Structured logs make debugging much easier after an alert fires.

Error tracking is also important. Tools like Sentry, Honeybadger, Rollbar, or AppSignal can catch exceptions inside jobs. They answer a different question: “Did the job crash with an error?” Heartbeats answer: “Did the job complete on time?”

Queue monitoring helps too. For Sidekiq, GoodJob, Solid Queue, or Delayed Job, you should watch queue latency, retries, dead jobs, and worker availability. If a scheduled job misses its heartbeat, queue metrics often help explain why.

Database checks can catch business-level symptoms. For example:

no invoices created in 24 hours
no imports completed today
no reports generated this week
no webhook retries processed recently

These checks are powerful, but they are usually more custom. A heartbeat is easier to add first.

Uptime checks are useful for the Rails web app itself, but they are not enough for scheduled work. Your homepage or health endpoint can return 200 OK while every recurring job is broken.

The best setup is layered:

uptime monitoring for the web app
error tracking for exceptions
queue monitoring for background workers
logs for debugging
heartbeat monitoring for scheduled job completion
business checks for critical outcomes

Each signal catches a different class of failure.

FAQ

What is Rails scheduled job monitoring?

Rails scheduled job monitoring means tracking whether recurring Rails tasks run successfully on their expected schedule. These tasks may be cron jobs, Rake tasks, Active Job jobs, Sidekiq jobs, GoodJob jobs, or scheduler-triggered background work.

The goal is to detect missed, failed, delayed, or silently broken jobs before they cause production problems.

How do I monitor Rails cron jobs?

The simplest approach is to send a heartbeat ping at the end of each important cron job. An external monitor expects that ping based on the job schedule and alerts you if it does not arrive.

For example, if a Rails Rake task runs every night, add a success ping after the task completes. If cron fails, Rails crashes, or the job hangs, the ping will be missing.

Is Sidekiq monitoring enough for scheduled jobs?

Sidekiq monitoring is useful, but it is not always enough. It can show retries, dead jobs, queue latency, and worker status. But scheduled job monitoring should also confirm that each expected recurring job completed on time.

A Sidekiq dashboard may not alert you when a scheduler stops enqueueing a job entirely. Heartbeat monitoring closes that gap.

Should I ping before or after a Rails job runs?

Usually after.

A heartbeat should represent successful completion, not just startup. If you ping before the job runs and then the job fails halfway through, your monitor will show a false success.

Ping only after the critical work finishes.

What Rails jobs should have heartbeat monitoring?

Start with jobs where a missed run would hurt users, revenue, data quality, or operations.

Good candidates include billing syncs, subscription updates, imports, exports, email digests, cleanup tasks, report generation, webhook retries, search indexing, and analytics rollups.

Not every tiny maintenance task needs its own alert, but important scheduled jobs should be visible.

Conclusion

Rails scheduled job monitoring is about proving that important background work actually happened.

Your Rails app can be online while scheduled jobs are broken. Cron can miss runs. Schedulers can stop. Workers can fail. Environment variables can disappear. Jobs can hang or silently skip work.

Logs, error tracking, and queue dashboards all help, but they do not fully answer the most important question:

Did this scheduled job complete when expected?

Heartbeat monitoring gives you that answer. Add a success ping at the end of each critical Rails scheduled job, set the expected interval, and alert when the signal goes missing.

That small pattern can save you from discovering a broken billing sync, stale report, or missing cleanup task days too late.

Originally published at https://quietpulse.xyz/blog/rails-scheduled-job-monitoring

Django Management Command Monitoring: How to Catch Missed Commands Before They Break Production

quietpulse — Sun, 10 May 2026 06:14:57 +0000

Django management command monitoring is easy to overlook.

A command works when you run it manually:

python manage.py sync_invoices

So you put it in cron, Celery beat, systemd, Kubernetes, or a platform scheduler.

Then one day it stops running.

The app is still online. Uptime checks are green. But invoices are missing, reminder emails are not sent, reports are stale, and nobody notices until the data is already wrong.

The problem

Django management commands often run outside the normal request/response path.

They are commonly used for:

billing reconciliation
scheduled emails
CRM or payment provider syncs
CSV imports
cleanup jobs
search index rebuilds
report generation
expired trial handling

These jobs usually run through something outside Django:

0 2 * * * cd /srv/app && /srv/app/.venv/bin/python manage.py sync_invoices

That creates a monitoring gap.

Your web app can be healthy while the scheduled command quietly fails.

Why it happens

Management commands are application code, but they are usually launched by infrastructure.

That means failures can happen before your Django app has a good chance to report them.

Common causes include:

cron not running
disabled systemd timers
stopped Celery beat processes
missing environment variables
wrong virtualenv paths
changed working directories
expired database credentials
stuck external API calls
commands hanging forever
commands exiting successfully while processing nothing

A command may work perfectly in your shell but fail under cron because cron has a minimal environment.

For example:

python manage.py cleanup_expired_trials

may work manually, while cron does not know which python to use or which Django settings module should be loaded.

Why it's dangerous

Missed Django management commands rarely look like immediate outages.

They look like slow operational damage.

A missed billing job means invoices are not generated.

A missed email job means users are not notified.

A missed cleanup job means old data piles up until queries slow down.

A missed sync job means your local database and external system drift apart.

The painful part is that these failures are often discovered late. By then, you may need to figure out:

which records were missed
whether the command can be safely replayed
whether duplicate emails or invoices might be created
how long the job was broken
whether reports from previous days can be trusted

That is why scheduled work needs a direct completion signal.

How to detect it

The simplest approach is to monitor completion.

Not just server uptime.

Not just whether cron exists.

Not just whether logs were written.

Completion.

The command should send a heartbeat ping after the important work succeeds. If the ping does not arrive within the expected time window, you get an alert.

The flow is:

Create a heartbeat check for the command.
Configure the expected schedule.
Run the Django command normally.
Send a ping only after the command succeeds.
Alert if the ping is missing or late.

If the scheduler does not run, no ping arrives.

If Django crashes, no ping arrives.

If the command hangs, no ping arrives.

If the server is down during the schedule window, no ping arrives.

That makes heartbeat monitoring a good fit for Django management command monitoring.

Simple solution

Start with a normal management command:

# billing/management/commands/sync_invoices.py

from django.core.management.base import BaseCommand
from billing.services import sync_invoices


class Command(BaseCommand):
    help = "Sync invoices from the payment provider"

    def handle(self, *args, **options):
        synced_count = sync_invoices()
        self.stdout.write(
            self.style.SUCCESS(f"Synced {synced_count} invoices")
        )

Then schedule it:

0 2 * * * cd /srv/app && /srv/app/.venv/bin/python manage.py sync_invoices

To monitor successful completion, add a heartbeat ping after the command:

0 2 * * * cd /srv/app && /srv/app/.venv/bin/python manage.py sync_invoices && curl -fsS https://quietpulse.xyz/ping/YOUR_TOKEN

The && is important.

It means the ping only runs if the Django command exits successfully.

For production, add logging and a timeout:

0 2 * * * cd /srv/app && timeout 30m /srv/app/.venv/bin/python manage.py sync_invoices >> /var/log/sync_invoices.log 2>&1 && curl -fsS https://quietpulse.xyz/ping/YOUR_TOKEN

This catches cases where:

the command never starts
the command fails
the command hangs
the final completion signal is missing

You can also send the ping from inside Python:

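import logging

import requests

from django.conf import settings
from django.core.management.base import BaseCommand
from billing.services import sync_invoices

logger = logging.getLogger(__name__)


class Command(BaseCommand):
    help = "Sync invoices from the payment provider"

    def handle(self, *args, **options):
        synced_count = sync_invoices()

        # Ping only after the sync succeeded. A failed ping is logged rather
        # than raised, so a monitoring outage does not fail a finished sync.
        try:
            response = requests.get(settings.SYNC_INVOICES_HEARTBEAT_URL, timeout=10)
            response.raise_for_status()
        except requests.RequestException:
            logger.warning("Heartbeat ping failed", exc_info=True)

        self.stdout.write(
            self.style.SUCCESS(f"Synced {synced_count} invoices")
        )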

If you do this, send the ping after the critical work completes, not before it starts.

Common mistakes

1. Sending the heartbeat at the start

This only proves the command started.

It does not prove the work completed.

2. Using `;` instead of `&&`

Avoid this:

0 2 * * * python manage.py sync_invoices; curl -fsS https://quietpulse.xyz/ping/YOUR_TOKEN

The ping may run even if the command fails.

Use this:

0 2 * * * python manage.py sync_invoices && curl -fsS https://quietpulse.xyz/ping/YOUR_TOKEN

3. Relying only on logs

Logs are useful after you know something went wrong.

They are not always good at telling you that a scheduled command never ran.

4. Monitoring only the scheduler

Knowing that cron or Celery beat is alive does not prove a specific Django command completed successfully.

The scheduler can be running while one command fails every day.

5. Reusing one monitor for every command

Important commands should have separate checks.

If invoice sync fails, the alert should say invoice sync failed — not “some backend job might be broken.”

Alternative approaches

Heartbeat monitoring works best when combined with other signals.

Logs

Good command logs should include:

start time
finish time
duration
processed count
skipped count
external API failures
exceptions

Logs help explain failures, but they still need detection and alerting around them.

Error tracking

Error tracking tools are great when a Django command raises an exception.

But they may not catch:

cron never starting
server downtime during the schedule
killed processes
hung commands
commands that exit successfully but process nothing

Scheduler dashboards

Celery, Kubernetes, and platform schedulers may show job history.

That helps, but the signal is tied to the scheduler.

A heartbeat ping is portable because it travels with the command.

Database audit tables

For critical workflows, writing run metadata to the database can be useful:

command name
started at
finished at
status
processed count
error message

This gives you history, but you still need alerting when a run is missing.

FAQ

What is Django management command monitoring?

It means tracking whether scheduled Django management commands run and complete successfully. A common pattern is to send a heartbeat ping after the command succeeds and alert if the ping is missing.

How do I monitor a Django management command in cron?

Run the command normally, then send a heartbeat ping only after success:

0 2 * * * cd /srv/app && /srv/app/.venv/bin/python manage.py my_command && curl -fsS https://quietpulse.xyz/ping/YOUR_TOKEN

Should the heartbeat ping happen before or after the command?

Usually after. A ping before the command proves it started. A ping after the command proves it completed.

Is cron enough for Django scheduled tasks?

Cron can run the task, but it does not reliably tell you when the task was missed, failed, or hung. For production, combine cron with logging, timeouts, and heartbeat monitoring.

Does this work with Celery beat or systemd timers?

Yes. The same idea works with cron, Celery beat, systemd timers, Kubernetes CronJobs, GitHub Actions, and platform schedulers.

Conclusion

Django management commands often handle important production work quietly in the background.

That is exactly why they need monitoring.

If a command syncs data, sends emails, generates invoices, or updates reports, you should know when it stops completing on schedule.

Logs and error tracking help explain failures. Heartbeat monitoring catches the missing completion signal before stale data turns into an incident.

Originally published at https://quietpulse.xyz/blog/django-management-command-monitoring

Firebase Scheduled Functions Monitoring: How to Catch Missed Runs Before They Break Production

quietpulse — Sat, 09 May 2026 07:37:55 +0000

Firebase scheduled functions monitoring matters because scheduled backend work is easy to forget until it quietly stops doing its job.

A Cloud Function might clean old records every night, sync subscription status from a payment provider, send reminder notifications, refresh search indexes, or export analytics data. When that function runs correctly, nobody thinks about it. When it stops running, the app may still look healthy from the outside.

The website is up. The API responds. Users can log in.

But the scheduled work is missing.

That is the dangerous part: Firebase scheduled functions often fail in places that normal uptime monitoring cannot see.

The problem

A Firebase scheduled function is backed by Cloud Scheduler behind the scenes. First-generation functions are triggered through a Pub/Sub topic, while second-generation functions (the v2 API used in the example below) are invoked by Cloud Scheduler over HTTP.

A typical job might look like this:

const { onSchedule } = require("firebase-functions/v2/scheduler");

exports.cleanupExpiredSessions = onSchedule("every 24 hours", async () => {
  await cleanupExpiredSessions();
});

It looks simple. Once deployed, you expect it to run forever.

But production systems are not that clean.

Scheduled functions can stop working because of deployment mistakes, billing issues, IAM changes, runtime errors, dependency failures, quota problems, region mismatches, or configuration drift. Sometimes the function does run, but exits early before doing the important work. Sometimes it starts failing every night and nobody notices because the rest of the app keeps responding normally.

The core problem is this:

Your app can be up while your scheduled work is broken.

That means uptime checks alone are not enough.

Why it happens

Firebase scheduled functions rely on several moving parts:

Cloud Scheduler
Cloud Functions
Pub/Sub or scheduler triggers
IAM permissions
runtime configuration
external APIs
database access
billing and quota limits

If any of those pieces changes, your scheduled task can fail.

Common causes include:

the function was renamed or removed during deployment
the schedule exists in one region while the function is deployed in another
an environment variable is missing in production
the service account lost permission to invoke the function
Firestore or Realtime Database rules changed
a third-party API started returning errors
the job times out on larger data sets
Firebase billing or Google Cloud quotas block execution
logs are noisy enough that nobody sees the failure

There is also a more subtle failure mode: partial success.

For example, a scheduled function might begin processing users, update the first 500 records, hit an exception, and stop. From a high level, you may see that the function ran. But the job did not actually complete the work it was responsible for.

That is why Firebase scheduled functions monitoring should focus on completion, not just invocation.

Why it's dangerous

Missed scheduled functions can create slow, silent damage.

A broken cleanup job might leave expired sessions, temporary files, or stale documents in your database. A missed billing sync might fail to downgrade unpaid accounts. A failed notification job might leave users waiting for reminders that never arrive. A broken analytics export might create missing reports for several days before anyone notices.

These failures are dangerous because they often do not produce an obvious incident right away.

Instead, they accumulate.

Examples:

trial users are not converted or expired correctly
stale Firestore documents keep growing storage costs
email or push notifications stop being sent
cache refresh jobs leave users seeing old data
daily reports are missing
webhook retry queues are never drained
database maintenance tasks silently stop
subscription state becomes inconsistent

By the time someone notices, the fix is no longer just “restart the job.”

You may need to backfill data, repair inconsistent records, explain missing notifications, or manually replay failed work.

That is why waiting for user complaints is a bad monitoring strategy for scheduled functions.

How to detect it

The simplest reliable pattern is heartbeat monitoring.

Instead of only checking whether the app is online, you check whether the scheduled function completed when expected.

The idea is straightforward:

Create a heartbeat check for the job.
Give the job a deadline, such as “must complete every 24 hours.”
At the end of the function, send a ping to the heartbeat URL.
If the ping does not arrive on time, alert someone.

This detects the thing you actually care about: whether the scheduled function finished successfully.

For Firebase scheduled functions monitoring, completion-based pings are usually better than start-based pings. A ping at the beginning only proves the function started. It does not prove the work finished.

A good signal should happen after the important work completes.

For example:

exports.dailyBillingSync = onSchedule("every 24 hours", async () => {
  await syncBillingState();
  await sendHeartbeatPing();
});

If syncBillingState() fails, the heartbeat is not sent.

That means the missing heartbeat becomes a useful alert.

Simple solution with example

Here is a practical Firebase scheduled function example using a heartbeat ping.

const { onSchedule } = require("firebase-functions/v2/scheduler");

const HEARTBEAT_URL = process.env.QUIETPULSE_HEARTBEAT_URL;

async function pingHeartbeat() {
  if (!HEARTBEAT_URL) {
    throw new Error("Missing QUIETPULSE_HEARTBEAT_URL");
  }

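  const response = await fetch(HEARTBEAT_URL, {
    method: "GET",
    // Abort a slow ping so it cannot use up the function's own timeout.
    signal: AbortSignal.timeout(10000),
  });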

  if (!response.ok) {
    throw new Error(`Heartbeat ping failed: ${response.status}`);
  }
}

async function syncBillingState() {
  // Example business logic:
  // - fetch active subscriptions from your payment provider
  // - update Firestore user records
  // - expire unpaid accounts
  // - write audit logs
}

exports.dailyBillingSync = onSchedule(
  {
    schedule: "every 24 hours",
    timeZone: "UTC",
    timeoutSeconds: 300,
    memory: "512MiB",
  },
  async () => {
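    // Critical work first: if this throws, the ping below is never reached.
    await syncBillingState();

    // Completion signal, sent only after the sync finished without error.
    await pingHeartbeat();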
  }
);

Your environment variable would contain a heartbeat URL like:

https://quietpulse.xyz/ping/{token}

The important detail is placement.

Put the heartbeat ping after the critical work, not before it.

If the scheduled function crashes, times out, or exits before finishing, the ping will not be sent. Your monitoring system can then alert you that the expected completion signal is missing.

You can also use finally, but be careful. If you always ping inside finally, you may report success even when the job failed. For scheduled jobs, that is usually the wrong signal.

This is risky:

exports.dailyJob = onSchedule("every 24 hours", async () => {
  try {
    await doImportantWork();
  } finally {
    await pingHeartbeat();
  }
});

That sends a heartbeat even after failure.

This is usually better:

exports.dailyJob = onSchedule("every 24 hours", async () => {
  await doImportantWork();
  await pingHeartbeat();
});

Now the ping means “the job completed successfully,” not just “the job ran.”
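The difference between the two patterns is easy to check with a job that always fails. This is a minimal sketch, not production code: withFinally, pingAfterSuccess, and countPings are hypothetical helper names, and the ping is a simple counter instead of a real HTTP call.

```javascript
// Risky pattern: the ping runs even when the work throws.
async function withFinally(work, ping) {
  try {
    await work();
  } finally {
    await ping();
  }
}

// Safer pattern: the ping is reached only if the work did not throw.
async function pingAfterSuccess(work, ping) {
  await work();
  await ping();
}

// Runs a pattern against a job that always fails and counts the pings sent.
async function countPings(pattern) {
  let pings = 0;
  const failingWork = async () => {
    throw new Error("sync failed");
  };
  const ping = async () => {
    pings += 1;
  };
  // Swallow the job error here so the demo can report the ping count.
  await pattern(failingWork, ping).catch(() => {});
  return pings;
}

countPings(withFinally).then((n) => console.log("finally:", n)); // finally: 1
countPings(pingAfterSuccess).then((n) => console.log("after success:", n)); // after success: 0
```

With the failing job, the finally version still sends one ping, so the monitor stays green. The sequential version sends none, which is the signal you want.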

Instead of building the alerting layer yourself, you can use a simple heartbeat monitoring tool like QuietPulse. Create a check, copy the ping URL, call it after your Firebase scheduled function completes, and get alerted if the ping is late. The point is not the tool itself; the important part is having an external completion signal.

Common mistakes

1. Only checking Firebase logs

Logs are useful when you already know something is wrong.

They are not enough to tell you that a job never ran.

If a scheduled function is not invoked, there may be no application log from that function at all. You might need to inspect Cloud Scheduler logs, Pub/Sub delivery (for 1st gen scheduled functions, which are triggered through a Pub/Sub topic), function logs, IAM errors, and deployment history.

That is a lot to rely on during an incident.

2. Pinging before the work finishes

A heartbeat at the start of the function proves invocation, not completion.

For scheduled functions, completion is usually what matters. If the job starts and then fails halfway through, an early ping can hide the failure.

Put the ping after the important work succeeds.

3. Using one heartbeat for many jobs

It is tempting to reuse one heartbeat URL for every scheduled function.

Avoid that.

A billing sync, cleanup job, report exporter, and notification sender should each have their own check. Otherwise, one healthy job can mask another broken one.

Use separate heartbeat checks for separate responsibilities.
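A small configuration check can catch accidental reuse before it reaches production. The sketch below assumes you keep a plain object that maps job names to heartbeat URLs. The function name findSharedHeartbeats, the job names, and the tokens are all made up for illustration.

```javascript
// Returns every heartbeat URL that is assigned to more than one job.
function findSharedHeartbeats(jobUrls) {
  const jobsByUrl = new Map();
  for (const [job, url] of Object.entries(jobUrls)) {
    const jobs = jobsByUrl.get(url) ?? [];
    jobs.push(job);
    jobsByUrl.set(url, jobs);
  }
  return [...jobsByUrl.entries()]
    .filter(([, jobs]) => jobs.length > 1)
    .map(([url, jobs]) => ({ url, jobs }));
}

// Hypothetical configuration with placeholder tokens.
const heartbeatConfig = {
  billingSync: "https://quietpulse.xyz/ping/token-a",
  cleanup: "https://quietpulse.xyz/ping/token-a", // reused by mistake
  reportExport: "https://quietpulse.xyz/ping/token-b",
};

console.log(findSharedHeartbeats(heartbeatConfig));
```

Here billingSync and cleanup share one URL, so the check reports them together. You could run something like this in a unit test or at deploy time.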

4. Ignoring time zones

Firebase schedules can run with a configured time zone. Your product logic may assume local time, while your monitoring window assumes UTC.

That mismatch can create false alerts or hide real delays.

Be explicit about time zones in both the scheduled function and monitoring configuration.
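To see why the mismatch matters, it helps to look at the same local time in UTC across a daylight saving change. The sketch below assumes a job that should run at 02:00 in America/New_York. The helper name hourInZone and the dates are chosen only for illustration.

```javascript
// Returns the hour (0-23) that a given instant has in a given IANA time zone.
function hourInZone(instant, timeZone) {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour: "2-digit",
    hourCycle: "h23",
  }).formatToParts(instant);
  return Number(parts.find((part) => part.type === "hour").value);
}

// Two instants that are both 02:00 local time in New York.
const winterRun = new Date("2026-01-15T07:00:00Z"); // EST, UTC-5
const summerRun = new Date("2026-07-15T06:00:00Z"); // EDT, UTC-4

console.log(hourInZone(winterRun, "America/New_York")); // 2
console.log(hourInZone(summerRun, "America/New_York")); // 2
console.log(hourInZone(winterRun, "UTC")); // 7
console.log(hourInZone(summerRun, "UTC")); // 6
```

The local schedule stays at 02:00, but the UTC time moves by an hour between winter and summer. A monitoring window pinned to a fixed UTC time needs enough grace to absorb that shift, or it should use the same time zone as the schedule.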

5. Not testing failure cases

Do not only test the happy path.

Test what happens when:

the function throws an error
an external API times out
the heartbeat URL is missing
the job takes longer than expected
the scheduled function is disabled
the deployment removes or renames the function

Monitoring that has never seen a failure is often monitoring you cannot trust.
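One of these cases, an external API timing out, can be rehearsed locally with a small wrapper. This is a sketch under assumptions: withTimeout is a hypothetical helper, and the slow call is simulated with a timer instead of a real API request.

```javascript
// Rejects if the wrapped promise does not settle within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Simulated slow dependency: settles after 200 ms, but the limit is 20 ms.
const slowCall = new Promise((resolve) => setTimeout(resolve, 200));

withTimeout(slowCall, 20, "billing API")
  .then(() => console.log("completed"))
  .catch((err) => console.log(err.message)); // billing API timed out after 20 ms
```

Wrapping external calls this way turns a stalled dependency into a thrown error. In a job that pings only after success, that error means no heartbeat is sent, and you can confirm the alert actually fires.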

Alternative approaches

Heartbeat monitoring is not the only option. It is just one of the clearest ways to detect missed scheduled work.

Other useful signals include:

Firebase and Google Cloud logs

Cloud Logging can show function errors, execution duration, and scheduler delivery events. This is useful for debugging.

The downside is that logs are often reactive. Someone still needs to notice the failure, query the right logs, and understand what should have happened.

Error tracking

Tools like Sentry can catch exceptions inside scheduled functions.

That helps when the function runs and throws.

But error tracking may not catch missed invocations. If the function never starts, there may be no exception inside your application code.

Cloud Scheduler monitoring

You can monitor Cloud Scheduler execution attempts and failures.

This helps detect trigger-level issues, but it may not prove business-level completion. The scheduler can successfully invoke a function that later fails internally.

Database audit records

Some teams write a job_runs document to Firestore for each scheduled task.

That can be very useful:

await db.collection("job_runs").add({
  job: "dailyBillingSync",
  status: "success",
  finishedAt: new Date().toISOString(),
});

This gives you a history of runs.

But you still need something to watch that history and alert you when a run is missing.
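The watching part can be a small function over the run history. The sketch below assumes documents shaped like the example above (job, status, finishedAt) that have already been loaded into an array. findMissingJobs and the sample data are hypothetical.

```javascript
// Returns the jobs with no successful run newer than `maxAgeMs` before `now`.
function findMissingJobs(runs, expectedJobs, maxAgeMs, now) {
  const lastSuccess = new Map();
  for (const run of runs) {
    if (run.status !== "success") continue;
    const finished = Date.parse(run.finishedAt);
    // Keep only the most recent successful finish time per job.
    if (finished > (lastSuccess.get(run.job) ?? -Infinity)) {
      lastSuccess.set(run.job, finished);
    }
  }
  return expectedJobs.filter((job) => {
    const finished = lastSuccess.get(job);
    return finished === undefined || now - finished > maxAgeMs;
  });
}

// Hypothetical run history.
const runs = [
  { job: "dailyBillingSync", status: "success", finishedAt: "2026-05-10T02:00:00.000Z" },
  { job: "cleanup", status: "failed", finishedAt: "2026-05-10T03:00:00.000Z" },
];

const now = Date.parse("2026-05-10T12:00:00.000Z");
const maxAgeMs = 26 * 60 * 60 * 1000; // 24 hours plus 2 hours of grace

console.log(findMissingJobs(runs, ["dailyBillingSync", "cleanup"], maxAgeMs, now)); // [ 'cleanup' ]
```

Something still has to run this check on a schedule and send the alert. If that watcher lives in the same project as the jobs, it can fail for the same reasons they do.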

Custom dashboards

You can build your own dashboard showing the last successful run of each job.

That works well if you have time to maintain it. For small teams and indie projects, an external heartbeat check is often simpler and less fragile.

FAQ

What is Firebase scheduled functions monitoring?

Firebase scheduled functions monitoring is the process of checking whether scheduled Cloud Functions run and complete on time. It helps detect missed executions, runtime failures, delays, and silent scheduled job problems before they affect users or data.

Are Firebase logs enough to monitor scheduled functions?

Firebase logs are helpful for debugging, but they are not enough by themselves. Logs can show errors after you look for them, but they may not proactively alert you when a scheduled function never runs or never completes.

Should I ping a heartbeat at the start or end of a scheduled function?

For most production jobs, ping at the end. A start ping only proves that the function began. An end ping proves that the important work completed successfully.

Can Firebase scheduled functions fail silently?

Yes. They can fail because of permissions, deployment changes, missing environment variables, timeouts, quota issues, external API failures, or scheduler configuration problems. Some failures may not be obvious from normal uptime checks.

How often should I monitor a Firebase scheduled function?

Match the monitoring window to the schedule. If a function runs every hour, alert if it does not complete within a reasonable grace period after that hour. If it runs daily, use a daily check with enough grace time for normal delays.

Conclusion

Firebase scheduled functions are great for background work, but they can fail quietly.

The app may stay online while billing syncs, cleanup tasks, reports, notifications, or maintenance jobs stop running.

Good Firebase scheduled functions monitoring focuses on completion. Add a heartbeat ping after the important work finishes, give each job its own check, and alert when the expected signal does not arrive.

That simple pattern catches the failures that uptime checks, dashboards, and logs often miss.

Originally published at https://quietpulse.xyz/blog/firebase-scheduled-functions-monitoring

Supabase Scheduled Functions Monitoring: How to Catch Missed Runs Before They Break Production

quietpulse — Fri, 08 May 2026 06:11:06 +0000

Supabase scheduled functions monitoring matters because scheduled backend work can fail quietly while your app still looks completely healthy.

Your frontend loads. Your API responds. The database is online. Auth works. But the scheduled function that cleans old rows, syncs billing data, sends reminders, refreshes materialized views, or calls an external API may have stopped running hours ago.

That is the dangerous part about scheduled work in serverless and database-backed apps: the failure is often invisible until some downstream symptom appears.

A scheduled function is not “up” in the same way a web endpoint is up. It either ran when expected and completed the important work, or it did not.

Supabase gives developers a powerful stack for building quickly, including Edge Functions, Postgres, cron-like scheduling patterns, and database automation. But once scheduled tasks become part of production, you need monitoring that answers a very specific question:

Did this scheduled function actually run and finish successfully?

The problem

Scheduled backend work is easy to add and easy to forget.

In a Supabase app, you might have scheduled work that:

deletes expired sessions or temporary records
sends daily or weekly email digests
refreshes reporting tables
syncs subscription status from a payment provider
calls an external API on a schedule
checks usage limits
exports analytics
cleans up old files in storage
recalculates account metrics
triggers notifications
verifies backup or data consistency jobs

Some of this work may live in Supabase Edge Functions. Some may be driven by Postgres cron extensions, database triggers, external schedulers, GitHub Actions, Vercel Cron, or another service that calls a Supabase function endpoint.

The exact implementation varies, but the operational risk is the same:

If the scheduled job does not run, your main app can still look fine.

Traditional uptime monitoring might check your homepage or API route. That is useful, but it does not tell you whether yesterday’s cleanup ran, whether today’s digest was sent, or whether the scheduled sync completed.

This creates a silent failure.

The system is not fully down. There may be no obvious error on the public surface. But an important background process is missing.

Why it happens

Supabase scheduled functions can fail silently for several reasons.

One common cause is scheduler drift. The schedule may be configured outside the main application code, or it may depend on infrastructure that someone rarely checks. A cron expression can be wrong, disabled, duplicated, or moved to a different environment. A staging schedule may accidentally replace production behavior, or production may stop receiving triggers after a deployment change.

Another cause is function deployment drift. Edge Functions are code, and code changes. A function can be renamed, removed, redeployed with different environment variables, or changed in a way that breaks scheduled execution while manual tests still pass.

For example, a scheduled function might depend on an environment variable:

const apiKey = Deno.env.get("EXTERNAL_API_KEY");

if (!apiKey) {
  throw new Error("Missing EXTERNAL_API_KEY");
}

If that secret is missing in production, the function may fail every time the scheduler invokes it.
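Checking all required secrets at once makes this failure easier to diagnose. The sketch below is written as plain JavaScript over an environment object, for example process.env in Node or the result of Deno.env.toObject() in an Edge Function. The helper name missingSecrets and the sample values are illustrative.

```javascript
// Returns the names of required secrets that are absent or blank.
function missingSecrets(requiredNames, env) {
  return requiredNames.filter((name) => {
    const value = env[name];
    return typeof value !== "string" || value.trim() === "";
  });
}

// Hypothetical environment: one secret is set, one is blank, one is absent.
const env = {
  SUPABASE_URL: "https://example.supabase.co",
  EXTERNAL_API_KEY: "   ",
};

const missing = missingSecrets(
  ["SUPABASE_URL", "EXTERNAL_API_KEY", "CLEANUP_HEARTBEAT_URL"],
  env
);

console.log(missing); // [ 'EXTERNAL_API_KEY', 'CLEANUP_HEARTBEAT_URL' ]
```

Failing early with the full list of missing names gives a clearer log line than discovering them one at a time across several scheduled runs.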

Database permissions can also be a problem. Scheduled work often touches tables, service-role operations, storage buckets, or external APIs. A permission change that is safe for normal user requests can still break a privileged background job.

External dependencies are another source of failure. A scheduled function might call Stripe, Resend, Slack, OpenAI, a webhook endpoint, or an internal API. If that dependency times out, rate limits requests, changes response shape, or rejects credentials, the scheduled job can fail even though Supabase itself is healthy.

Timeouts and partial completion are especially tricky. A function might start successfully, process half the records, then time out. Logs may contain the failure, but unless someone checks them or receives an alert, the job can keep failing silently.

Finally, many teams confuse logs with monitoring. Supabase logs are useful when investigating a known issue. But logs do not automatically prove that a scheduled job ran on time and completed successfully.

Monitoring should detect the missing success signal before a human goes digging through logs.

Why it's dangerous

Missed scheduled functions are dangerous because they usually manage work that users do not directly trigger.

A failed scheduled function can cause:

stale reports or dashboards
expired data staying in the database
email digests not being sent
payment status not syncing
usage limits not updating
notification queues backing up
old files accumulating in storage
cleanup tasks never running
third-party integrations falling behind
delayed or incorrect customer-facing data

These failures often compound.

If a daily cleanup misses one run, maybe nothing obvious happens. If it misses seven runs, tables grow unexpectedly, queries slow down, storage costs rise, and users start seeing outdated data.

If a billing sync stops, customers may keep access after cancellation, lose access after payment, or receive confusing account states.

If a reporting refresh fails, business dashboards quietly become wrong. That can lead to bad decisions because the data looks normal, just stale.

The painful part is that the first visible symptom usually appears far away from the root cause.

Someone might report:

“Why didn’t I get the digest?”
“Why is this dashboard stale?”
“Why is this user still marked active?”
“Why is storage growing so fast?”
“Why did this webhook not update the account?”

Then you have to reconstruct what happened:

Did the scheduler fire?
Did Supabase receive the request?
Did the Edge Function start?
Did it have the right secrets?
Did it finish?
Did it fail halfway through?
Did it retry?
Did anyone get alerted?

That is exactly the kind of uncertainty good scheduled function monitoring should remove.

How to detect it

The most reliable way to detect missed scheduled functions is to make the function send a success signal after the important work completes.

This is heartbeat monitoring.

Instead of asking, “Is my Supabase project online?”, heartbeat monitoring asks:

“Did this specific scheduled function report success inside the expected time window?”

The pattern is simple:

Create a heartbeat check for the expected schedule.
Run your scheduled Supabase function normally.
Send a heartbeat ping only after the critical work succeeds.
Alert if the ping does not arrive on time.

This catches the failure mode that logs and uptime checks often miss: absence.

If the scheduler never fires, no ping arrives.

If the function crashes before completion, no ping arrives.

If a secret is missing, no ping arrives.

If an external API fails and the job exits early, no ping arrives.

If the job hangs or times out before reaching the final step, no ping arrives.

That makes the monitoring signal much more meaningful than checking whether a generic endpoint returns 200 OK.

The heartbeat should represent successful completion, not just startup.

For example, if your function syncs subscription status, the ping should happen after the sync finishes. If your function sends a daily digest, the ping should happen after the digest job completes. If your function refreshes reporting tables, the ping should happen after the refresh succeeds.

A startup ping only proves that the function began. A completion ping proves the work finished.

Simple solution with example

Here is a simplified Supabase Edge Function example.

Imagine you have a scheduled function that cleans up expired rows once per day.

// supabase/functions/cleanup-expired-records/index.ts

import { serve } from "https://deno.land/std@0.224.0/http/server.ts";
import { createClient } from "https://esm.sh/@supabase/supabase-js@2";

serve(async () => {
  const supabaseUrl = Deno.env.get("SUPABASE_URL");
  const serviceRoleKey = Deno.env.get("SUPABASE_SERVICE_ROLE_KEY");
  const heartbeatUrl = Deno.env.get("CLEANUP_HEARTBEAT_URL");

  if (!supabaseUrl || !serviceRoleKey) {
    return new Response("Missing Supabase configuration", { status: 500 });
  }

  const supabase = createClient(supabaseUrl, serviceRoleKey);

  const { error } = await supabase
    .from("temporary_records")
    .delete()
    .lt("expires_at", new Date().toISOString());

  if (error) {
    console.error("Cleanup failed:", error);
    return new Response("Cleanup failed", { status: 500 });
  }

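  // Ping only after the delete succeeded. If the URL is not configured, the
  // ping is skipped, so the heartbeat check will eventually report a missed run.
  if (heartbeatUrl) {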
    const ping = await fetch(heartbeatUrl);

    if (!ping.ok) {
      console.error("Heartbeat ping failed:", ping.status);
      return new Response("Heartbeat failed", { status: 500 });
    }
  }

  return new Response("Cleanup completed", { status: 200 });
});

Then set the heartbeat URL as an environment variable:

CLEANUP_HEARTBEAT_URL=https://quietpulse.xyz/ping/{token}

The exact scheduler can vary. You might call this Edge Function from an external cron service, from GitHub Actions, from another platform scheduler, or from a database-driven scheduling setup.

The important part is not which scheduler triggers the function.

The important part is that the function reports success only after the critical work completes.

If the scheduled function runs every day at 02:00 UTC, configure the heartbeat check to expect one ping per day, with a reasonable grace period.

For example:

expected interval: 24 hours
grace period: 30–60 minutes
alert channel: Telegram, webhook, or another notification route

That way, if the function does not complete by the expected time, you get an alert.
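The deadline arithmetic behind that alert is simple. The sketch below shows how a check like this might be evaluated in principle. It is not a description of how QuietPulse or any other tool works internally, and checkStatus is a hypothetical name.

```javascript
const MINUTE = 60 * 1000;
const HOUR = 60 * MINUTE;

// A check is late once more than interval + grace has passed since the last ping.
function checkStatus(lastPingAt, intervalMs, graceMs, now) {
  const deadline = lastPingAt.getTime() + intervalMs + graceMs;
  return now.getTime() > deadline ? "late" : "ok";
}

// Last ping at 02:05 UTC, daily interval, 45 minutes of grace.
// The next ping is therefore due by 02:50 UTC the following day.
const lastPing = new Date("2026-05-10T02:05:00Z");

console.log(checkStatus(lastPing, 24 * HOUR, 45 * MINUTE, new Date("2026-05-11T02:30:00Z"))); // ok
console.log(checkStatus(lastPing, 24 * HOUR, 45 * MINUTE, new Date("2026-05-11T03:00:00Z"))); // late
```

A longer grace period means fewer false alerts from slow runs but a later warning when a run is truly missing. Base it on how long the job normally takes.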

Instead of discovering the issue from stale data later, you know shortly after the scheduled work fails to report success.

Instead of building this alerting flow yourself, you can use a heartbeat monitoring tool like QuietPulse. Create a check, copy the ping URL, and call it at the end of your scheduled function. If the ping does not arrive on time, QuietPulse can notify you through Telegram or webhooks.

Common mistakes

1. Pinging at the start of the function

A heartbeat ping at the beginning only proves that the function started.

That can create false confidence.

If the function starts, deletes nothing, fails on an external API call, times out, or crashes halfway through, the monitor still sees a successful ping.

For scheduled functions, the heartbeat should usually be the final step after the important work completes.

2. Monitoring only the public app

A homepage uptime check is useful, but it does not monitor background work.

Your Supabase app can be online while scheduled functions fail for days.

Use uptime checks for request/response availability. Use heartbeat checks for scheduled work.

They answer different questions.

3. Relying only on logs

Logs are valuable for debugging, but they are weak as the first line of detection.

If nobody is watching the logs, a failure can sit there unnoticed.

A heartbeat check gives you an explicit missing-success alert. Logs then help you investigate why the success signal did not arrive.

4. Using one heartbeat for multiple jobs

If you have several scheduled functions, do not hide them behind one generic monitor.

A cleanup job, billing sync, digest sender, and reporting refresh should usually have separate checks.

Separate checks make alerts actionable. You immediately know which scheduled function missed its expected run.

5. Ignoring partial failures

A function can return success even when part of the work failed.

For example, a digest job might send 900 emails out of 1,000 and silently skip the rest. A sync might process one page of API results and fail before the next page.

Make sure your function treats important partial failures as failures. Only ping the heartbeat after the job meets your real success criteria.
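One way to enforce that is to collect per-item results and refuse to ping unless every item succeeded. This is a sketch with assumed names: sendAll, runDigest, and the sendOne and ping callbacks are hypothetical, and your own success criteria may tolerate some failures.

```javascript
// Sends to every recipient and reports which ones failed.
// Assumes sendOne is an async function that rejects on failure.
async function sendAll(recipients, sendOne) {
  const results = await Promise.allSettled(recipients.map((to) => sendOne(to)));
  const failed = recipients.filter((_, i) => results[i].status === "rejected");
  return { sent: recipients.length - failed.length, failed };
}

// Pings the heartbeat only when no send failed; otherwise throws.
async function runDigest(recipients, sendOne, ping) {
  const summary = await sendAll(recipients, sendOne);
  if (summary.failed.length > 0) {
    throw new Error(
      `Digest incomplete: ${summary.failed.length} of ${recipients.length} sends failed`
    );
  }
  await ping();
  return summary;
}
```

Promise.allSettled lets the job attempt every recipient instead of stopping at the first error. The thrown error then keeps the heartbeat from being sent, so 900 out of 1,000 is reported as a failure, not a success.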

Alternative approaches

Heartbeat monitoring is the cleanest way to detect missed scheduled functions, but it is not the only useful signal.

Supabase logs

Supabase logs are important for debugging. They can show function invocations, errors, stack traces, and timing information.

Use them to answer “what happened?”

But logs are less reliable for answering “did the expected scheduled function finish on time?” unless you build alerting around them.

Database audit tables

Some teams create a job_runs table and insert a row for each scheduled execution.

For example:

create table job_runs (
  id uuid primary key default gen_random_uuid(),
  job_name text not null,
  status text not null,
  started_at timestamptz not null default now(),
  finished_at timestamptz,
  error_message text
);

This can be very useful, especially for internal dashboards and debugging history.

But you still need something to check that table and alert when a run is missing or failed. Otherwise, it becomes another place where failures are recorded but not noticed.

External scheduler alerts

Some schedulers provide failure notifications. That helps when the scheduler fires and receives a failing response.

But scheduler alerts may not catch every important case. They might not know whether the function completed all internal work correctly. They also may not alert when the function returns 200 OK too early.

Heartbeat monitoring works well alongside scheduler alerts because it focuses on completion of the actual work.

Application metrics

Metrics are useful when scheduled functions affect measurable values: rows processed, emails sent, API records synced, duration, error count, and so on.

If you already have metrics infrastructure, instrumenting scheduled functions is a good idea.

But for many small teams and indie projects, a simple heartbeat is faster to set up and catches the most important failure mode: the job did not complete.

FAQ

What is Supabase scheduled functions monitoring?

Supabase scheduled functions monitoring is the practice of tracking whether scheduled backend work in a Supabase app runs and completes successfully. This can include Edge Functions, database jobs, cleanup tasks, sync jobs, reporting refreshes, and other recurring automation.

How do I know if a Supabase scheduled function did not run?

The most direct way is to use a heartbeat check. Add a ping at the end of the scheduled function after the important work succeeds. If the ping does not arrive within the expected time window, the function probably did not run or did not complete successfully.

Are Supabase logs enough for scheduled function monitoring?

Supabase logs are useful for debugging, but they are not always enough for monitoring. Logs can show errors after you know there is a problem. Heartbeat monitoring alerts you when the expected success signal is missing.

Should I ping before or after the scheduled function work?

Usually after. A heartbeat ping should represent successful completion. If you ping at the beginning, the function can still fail halfway through while the monitor thinks everything is fine.

Can I monitor multiple Supabase scheduled functions with one heartbeat?

You can, but it is usually better to create one heartbeat check per important scheduled function. Separate checks make alerts clearer and help you quickly identify which job missed its run.

Conclusion

Scheduled functions are easy to trust because they usually run quietly in the background.

That quietness is also the risk.

A Supabase app can look healthy while a cleanup job, billing sync, digest sender, or reporting refresh silently stops running. Uptime checks and logs help, but they do not always prove that the scheduled work completed on time.

For production scheduled work, add a completion signal.

Use heartbeat monitoring to confirm that each important Supabase scheduled function runs when expected and finishes successfully. If the heartbeat does not arrive, alert early, investigate quickly, and fix the issue before stale data or missed automation turns into a real incident.

Originally published at https://quietpulse.xyz/blog/supabase-scheduled-functions-monitoring

GitLab Scheduled Pipeline Monitoring: How to Catch Missed CI/CD Runs Before They Break Production

quietpulse — Thu, 07 May 2026 06:22:08 +0000

GitLab scheduled pipeline monitoring matters because scheduled CI/CD jobs can fail quietly while the rest of your system looks healthy.

Your application is up. Your GitLab project is reachable. Recent commits build successfully. But the scheduled pipeline that runs nightly tests, refreshes staging data, checks dependencies, builds reports, syncs artifacts, or runs cleanup jobs may have stopped hours or days ago.

That is the uncomfortable part about scheduled pipelines: they often run outside the normal developer flow. Nobody is sitting there waiting for them. When they fail silently, the first visible symptom may be stale data, missed checks, broken deployments, or a production issue that should have been caught earlier.

GitLab scheduled pipelines are useful, but they still need monitoring that answers one simple question:

Did the scheduled pipeline actually run and complete successfully?

The problem

A GitLab scheduled pipeline is not the same thing as a pipeline triggered by a commit or merge request.

Commit pipelines are visible because they happen during active development. Someone pushes code, reviews a merge request, and sees whether the pipeline passed or failed.

Scheduled pipelines are different. They run in the background on a timer.

For example, a team might use GitLab pipeline schedules to:

run nightly end-to-end tests
rebuild static assets or documentation
refresh a staging database
check dependency updates
scan containers for vulnerabilities
generate reports
run cleanup scripts
sync data between systems
trigger periodic deployments
validate backups or exports

If one of these scheduled pipelines stops running, normal uptime monitoring will not catch it. Your web app may still return 200 OK. Your API may still respond. GitLab may still be available. But the specific piece of scheduled work is missing.

That creates a silent failure.

The system is not completely down, so broad monitoring stays green. But an important recurring job did not happen.

Why it happens

GitLab scheduled pipelines can fail silently for several reasons.

The first cause is schedule configuration drift. A pipeline schedule may be disabled, edited, pointed at the wrong branch, or configured with a cron expression that does not mean what the team thinks it means. Time zones can also be confusing, especially when teams expect local business time but the schedule is evaluated differently.

The second cause is CI configuration drift. A scheduled pipeline depends on .gitlab-ci.yml. A refactor can rename a job, change rules, remove a stage, or accidentally make a scheduled job stop matching the schedule source.

For example:

nightly_tests:
  stage: test
  script:
    - npm ci
    - npm run test:e2e
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'

This is clear enough when it works. But if someone changes rules globally, updates stages, removes a variable, or changes the default branch, this job may no longer run as expected.

The third cause is expired or missing credentials. Scheduled pipelines often use tokens, deploy keys, API credentials, registry access, cloud credentials, or environment variables. A normal build might still work while the scheduled job fails because it needs a different secret.

The fourth cause is dependency failure. A scheduled pipeline might call an external API, database, package registry, object storage bucket, internal service, or deployment endpoint. If that dependency fails, the pipeline may fail, hang, or exit early.

The fifth cause is false confidence from GitLab status alone. A failed scheduled pipeline might be visible somewhere in GitLab, but visibility is not the same as alerting. If nobody checks the schedule page or pipeline history, the failure can sit there unnoticed.

Why it's dangerous

Missed scheduled pipelines are dangerous because they usually protect work that is not checked by normal request/response monitoring.

A failed or missed scheduled pipeline can mean:

nightly tests stop catching regressions
dependency checks stop running
stale artifacts are served
vulnerability scans are skipped
staging data becomes outdated
reports are not generated
cleanup jobs never run
backups are not verified
scheduled deployments do not happen
compliance or audit checks are missed

The risk is not always immediate. That is exactly what makes it easy to ignore.

If your production API goes down, someone notices quickly. If a nightly scheduled pipeline fails three nights in a row, you might only discover it when a release breaks, a customer reports stale data, or a security scan that should have run never produced results.

By then, debugging becomes harder.

You have to answer:

Did GitLab trigger the schedule?
Did the pipeline start?
Did the expected jobs run?
Did a rule skip them?
Did a secret expire?
Did a dependency fail?
Did the pipeline succeed but skip the important step?
Did anyone get notified?

GitLab pipeline history is useful for investigation. But monitoring should tell you there is a problem before you need to investigate.

How to detect it

The most reliable way to detect a missing scheduled pipeline is to make the pipeline send a success signal after the important work finishes.

This is heartbeat monitoring.

Instead of asking, “Is GitLab up?”, heartbeat monitoring asks, “Did this specific scheduled pipeline report success inside the expected time window?”

For GitLab scheduled pipeline monitoring, the pattern looks like this:

Create a heartbeat check for the expected schedule.
Run the scheduled GitLab pipeline normally.
Put the heartbeat ping at the end of the job that proves success.
If the ping does not arrive on time, send an alert.

This catches the failure mode that normal uptime checks miss: absence.

A heartbeat monitor does not need to understand every detail of your pipeline. It only needs to know whether the completion signal arrived when expected.

For example:

a nightly test pipeline should ping once per night
an hourly sync pipeline should ping once per hour
a weekly vulnerability scan should ping once per week
a daily report pipeline should ping after the report is generated
a backup verification pipeline should ping after verification succeeds

The important detail is placement.

Send the heartbeat after the meaningful work completes, not at the start of the pipeline. If you ping first and the job fails later, your monitor will think the scheduled pipeline is healthy when it is not.

A good heartbeat means:

“The scheduled pipeline ran and reached the success point.”

Simple solution

Here is a simple GitLab CI job that runs only for scheduled pipelines and sends a heartbeat after the work succeeds.

stages:
  - test
  - notify

nightly_tests:
  stage: test
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  script:
    - npm ci
    - npm run test:e2e

scheduled_pipeline_heartbeat:
  stage: notify
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  needs:
    - nightly_tests
  script:
    - curl --fail --silent --show-error "$QUIETPULSE_PING_URL"

The environment variable would contain a ping URL like:

https://quietpulse.xyz/ping/YOUR_TOKEN

This is intentionally simple.

The scheduled work runs first. If nightly_tests fails, the heartbeat job does not run. If the scheduled pipeline never starts, no heartbeat arrives. If the pipeline is disabled, no heartbeat arrives. If a rule skips the job, no heartbeat arrives.

That absence is the signal.

For a more realistic pipeline, you might have multiple jobs before the heartbeat:

stages:
  - prepare
  - test
  - scan
  - notify

prepare_staging_data:
  stage: prepare
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  script:
    - ./scripts/refresh-staging-data.sh

nightly_e2e_tests:
  stage: test
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  needs:
    - prepare_staging_data
  script:
    - npm ci
    - npm run test:e2e

dependency_scan:
  stage: scan
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  script:
    - ./scripts/check-dependencies.sh

scheduled_success_ping:
  stage: notify
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  needs:
    - nightly_e2e_tests
    - dependency_scan
  script:
    - curl --fail --silent --show-error "$QUIETPULSE_PING_URL"

In this setup, the heartbeat is only sent after the important scheduled work succeeds.
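A single curl call can fail because of a brief network problem, which would produce a false alert even though the work succeeded. curl has a --retry option for this. If you prefer to ping from a Node script called by the job, a retry loop might look like the sketch below. pingWithRetry and its options are hypothetical names, and the defaults are arbitrary.

```javascript
// Tries the ping up to `attempts` times and returns the attempt that succeeded.
// Throws the last error if every attempt fails.
async function pingWithRetry(
  url,
  { attempts = 3, delayMs = 1000, fetchImpl = fetch } = {}
) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      const response = await fetchImpl(url);
      if (response.ok) return attempt;
      lastError = new Error(`Heartbeat ping failed: ${response.status}`);
    } catch (err) {
      lastError = err; // network-level failure, such as DNS or connection reset
    }
    if (attempt < attempts) {
      // Wait before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

The fetchImpl parameter exists only so the function can be tested with a fake. In a pipeline you would call pingWithRetry(process.env.QUIETPULSE_PING_URL) and let an uncaught error fail the job.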

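You can store the ping URL as a protected CI/CD variable in GitLab. Protected variables are only exposed to pipelines that run on protected branches or tags, so make sure the branch your schedule targets is protected. Otherwise the variable is empty and the ping fails: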

QUIETPULSE_PING_URL=https://quietpulse.xyz/ping/YOUR_TOKEN

Then your pipeline can reference it without hardcoding the URL in the repository.

If you have multiple scheduled pipelines, use separate heartbeat checks.

For example:

NIGHTLY_TESTS_PING_URL
DAILY_REPORT_PING_URL
WEEKLY_SCAN_PING_URL
HOURLY_SYNC_PING_URL

Do not reuse one heartbeat URL for unrelated schedules. A weekly scan and an hourly sync have different expectations. Sharing a monitor between them makes alerts confusing and can hide failures.

Instead of building alerting around GitLab schedule history yourself, you can use a simple heartbeat monitoring tool like QuietPulse. Create one check per scheduled pipeline, put the ping URL at the end of the successful job, and get alerted if the signal does not arrive on time. The important part is not the tool name; it is monitoring the actual completion signal.

Common mistakes

1. Pinging at the start of the pipeline

This is the most common mistake.

If your first job sends the heartbeat and then the important work fails, your monitoring is lying to you.

Bad pattern:

scheduled_start_ping:
  script:
    - curl "$QUIETPULSE_PING_URL"
    - ./run-important-job.sh

Better pattern:

scheduled_job:
  script:
    - ./run-important-job.sh
    - curl --fail --silent --show-error "$QUIETPULSE_PING_URL"

The ping should mean success, not “the pipeline started.”

2. Monitoring only GitLab availability

GitLab being reachable does not mean your scheduled pipeline ran.

A status page or uptime check can tell you whether GitLab is generally available. It cannot tell you whether your specific project schedule fired, whether the correct jobs ran, or whether your nightly task finished successfully.

Scheduled work needs job-level monitoring.

3. Relying on someone to check pipeline history

Pipeline history is useful, but it is passive.

If the workflow depends on a human remembering to open GitLab and inspect yesterday’s scheduled run, the monitoring system is really just hope with a dashboard.

Dashboards are for investigation. Alerts are for detection.

4. Using one monitor for many different jobs

It is tempting to create one generic “GitLab scheduled jobs” monitor.

That becomes messy quickly.

If an alert fires, which job failed? The nightly tests? The weekly scan? The report builder? The cleanup script?

Use separate heartbeat checks for separate responsibilities.

5. Ignoring skipped jobs

A GitLab pipeline can “succeed” while the job you cared about was skipped because of rules, only, except, branch filters, variables, or a config change.

For scheduled pipeline monitoring, make sure the heartbeat depends on the actual jobs that prove the scheduled task succeeded.

If the important job is skipped but the heartbeat still runs, your monitor will miss the problem.

Alternative approaches

Heartbeat monitoring is usually the simplest way to detect missed scheduled pipelines, but it is not the only signal you can use.

GitLab pipeline notifications

GitLab can notify users about failed pipelines. This is useful, especially for failures that GitLab clearly detects.

The limitation is that notification settings can be noisy, personal, or easy to ignore. They also may not cover the case where the expected scheduled pipeline never runs or the important job is skipped.

GitLab API checks

You can build a script that calls the GitLab API and checks the latest scheduled pipeline status.

This gives you more control. For example, you can query the last pipeline for a schedule and alert if it is too old or failed.

The tradeoff is complexity. You now need another scheduled job to check the scheduled job, plus authentication, API handling, retries, and alert routing.
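
As a rough sketch of the decision logic behind such a check, assuming you have already fetched the most recent pipeline for a schedule from the GitLab API as JSON, something like the following could work. The function name, the `maxAgeMs` threshold, and the choice to rely on the `status` and `updated_at` fields are assumptions for illustration, not a prescribed integration.

```javascript
// Hypothetical helper: decide whether the latest scheduled pipeline looks healthy.
// `pipeline` is assumed to be a GitLab pipeline object with `status` and
// `updated_at` fields; `maxAgeMs` is a threshold you choose yourself.
function checkScheduledPipeline(pipeline, maxAgeMs, now = Date.now()) {
  if (!pipeline) {
    return { healthy: false, reason: "no scheduled pipeline found" };
  }
  if (pipeline.status !== "success") {
    return { healthy: false, reason: `latest status is ${pipeline.status}` };
  }
  // An unparseable timestamp yields NaN, which is treated as unhealthy.
  const ageMs = now - Date.parse(pipeline.updated_at);
  if (Number.isNaN(ageMs) || ageMs > maxAgeMs) {
    return { healthy: false, reason: "latest successful pipeline is too old" };
  }
  return { healthy: true, reason: "ok" };
}
```

This sketch treats any status other than success as unhealthy, including a pipeline that is still running, so the check is best run after the scheduled pipeline is expected to have finished. It also still needs its own scheduler, authentication, and alert routing, which is the complexity described above.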

Logs and artifacts

Logs and artifacts are excellent for debugging.

They can show why a scheduled pipeline failed, which command broke, and what output was produced.

But logs are not enough for detection. A log file sitting in GitLab does not help if nobody knows they need to look at it.

Uptime checks

Uptime checks are good for public HTTP endpoints.

They are not enough for scheduled pipelines.

A website can be up while a scheduled pipeline is missing. Your production API can respond successfully while nightly tests have not run in three days.

Use uptime monitoring for availability. Use heartbeat monitoring for scheduled work.

GitLab status badges

Pipeline badges are useful visual indicators in README files or dashboards.

But they are not alerts. They also usually show the latest pipeline status, which might not represent a specific scheduled workflow.

A badge can be green while a separate scheduled job is missing.

FAQ

What is GitLab scheduled pipeline monitoring?

GitLab scheduled pipeline monitoring is the practice of checking that scheduled GitLab CI/CD pipelines run and complete successfully on time. It helps detect missed, failed, skipped, or silently broken scheduled jobs before they cause production issues.

How do I know if a GitLab scheduled pipeline did not run?

The safest approach is to use a completion heartbeat. Put a ping at the end of the scheduled job or final success stage. If the expected ping does not arrive within the schedule window, the pipeline either did not run, failed before completion, or skipped the success path.

Can GitLab notify me when a scheduled pipeline fails?

Yes, GitLab can send pipeline notifications, and those are useful. But they may not catch every silent failure mode, especially when a schedule is disabled, jobs are skipped, or alerts depend on individual notification preferences. A heartbeat check gives you an external signal that the scheduled work completed.

Should I monitor every GitLab scheduled pipeline separately?

Usually, yes. Each scheduled pipeline with a different responsibility or schedule should have its own monitor. This makes alerts easier to understand and prevents one successful job from hiding another failed or missed job.

Where should I put the heartbeat ping in GitLab CI?

Put the heartbeat ping after the important work succeeds. If you have multiple stages, place it in a final stage that depends on the jobs that must complete successfully. Do not put it at the beginning of the pipeline.

Conclusion

GitLab scheduled pipelines are great for recurring CI/CD work, but they are easy to forget because they run in the background.

A passing website uptime check does not prove that nightly tests ran. A green GitLab project does not prove that a scheduled cleanup, report, scan, or sync completed successfully.

The practical fix is simple: make each important scheduled pipeline send a heartbeat after it finishes. If that signal does not arrive on time, alert someone.

That turns a silent failure into a visible one — before it becomes a production surprise.

Originally published at https://quietpulse.xyz/blog/gitlab-scheduled-pipeline-monitoring

Cloudflare Workers Cron Monitoring: How to Catch Missed Triggers Before They Break Production

quietpulse — Wed, 06 May 2026 06:14:56 +0000

Cloudflare Workers Cron Monitoring matters because scheduled edge jobs can fail quietly while the rest of your app looks healthy.

Your website can be up. Your API can return 200 OK. The Worker can be deployed. But the Cron Trigger that refreshes cached data, syncs records, sends reports, or cleans old state may have stopped completing successfully hours ago.

That is the monitoring gap with cron-like systems: normal uptime checks tell you whether a public endpoint responds. They do not tell you whether scheduled background work actually happened.

The problem

Cloudflare Workers Cron Triggers are commonly used for small but important recurring tasks:

refreshing cached data
syncing from third-party APIs
generating reports
cleaning expired records
updating search indexes
sending webhook retries
warming edge data before traffic arrives

Many of these jobs do not have a public URL. Cloudflare invokes the Worker on a schedule, the code runs, and the result is visible only through logs, metrics, or downstream state.

If the job stops running or fails halfway through, your normal uptime monitor may stay green.

That is a silent failure.

The system is not fully down, but something important stopped happening.

Why it happens

A scheduled Cloudflare Worker can fail for several practical reasons.

Configuration can be wrong. The cron expression may not match the intended schedule. The trigger may exist in staging but not production. A deployment may accidentally remove or change the scheduled handler.

Runtime code can fail. The Worker may throw while calling an API, parsing JSON, writing to KV, D1, R2, or an external database.

Dependencies can fail. Third-party APIs can return errors, rate limits, malformed responses, or slow timeouts.

Jobs can also partially succeed. A Worker may process some records, skip others, log an error, and exit in a way nobody notices until stale data shows up.

A simple scheduled Worker might look like this:

export default {
  async scheduled(controller, env, ctx) {
    await refreshCache(env);
  },
};

That code may be fine. But nothing in it tells you that refreshCache() completed successfully every time it was expected to run.

Why it's dangerous

Missed Cron Triggers usually break business logic, not basic availability.

A failed scheduled job can mean:

stale data remains visible
reports are not generated
usage is not synced
cleanup tasks do not run
exports are missing
old records pile up
customers see outdated information

The delay is what makes it painful.

If a public API goes down, someone notices quickly. If an hourly scheduled Worker fails silently, the first symptom may appear much later. By then you are digging through logs and trying to reconstruct what happened.

Logs help with investigation. They do not always help with detection.

How to detect it

The simplest detection pattern is heartbeat monitoring.

Instead of asking, “Is my website up?”, heartbeat monitoring asks:

Did this specific scheduled job finish successfully within the expected time window?

For Cloudflare Workers Cron Monitoring, the flow is:

1. Create a heartbeat check for the job schedule.
2. Run the scheduled Worker normally.
3. Send a heartbeat ping after the job completes successfully.
4. Alert if the ping does not arrive on time.

The key detail is that the ping should happen at the end, not the beginning.

A heartbeat at the start only proves the Worker began running. It does not prove the work finished.

Simple solution

Here is a basic Cloudflare Worker scheduled handler with a completion heartbeat:

export default {
  async scheduled(controller, env, ctx) {
    await runScheduledJob(env);

    await fetch(env.QUIETPULSE_PING_URL);
  },
};

async function runScheduledJob(env) {
  const response = await fetch("https://api.example.com/data");

  if (!response.ok) {
    throw new Error(`API request failed: ${response.status}`);
  }

  const data = await response.json();

  await saveDataSomewhere(env, data);
}

async function saveDataSomewhere(env, data) {
  // Write to KV, R2, D1, an external API, or another storage system.
}

The heartbeat URL can be stored as an environment variable:

https://quietpulse.xyz/ping/YOUR_TOKEN

The job runs first. The heartbeat is sent only after the useful work completes.

If the Worker throws before that point, the ping is not sent. The missing ping becomes the alert signal.

A slightly more explicit version:

export default {
  async scheduled(controller, env, ctx) {
    try {
      await refreshImportantData(env);
      await fetch(env.QUIETPULSE_PING_URL);
    } catch (error) {
      console.error("Scheduled Worker failed", error);
      throw error;
    }
  },
};

async function refreshImportantData(env) {
  const response = await fetch("https://api.example.com/latest");

  if (!response.ok) {
    throw new Error(`Upstream API failed with ${response.status}`);
  }

  const payload = await response.json();

  // Store or process the payload here.
}
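
One detail worth noting: `fetch` resolves even when the server answers with a non-2xx status, so a rejected ping (a mistyped token, for instance) would pass unnoticed in the handlers above. A minimal sketch of a stricter ping helper follows. The name `sendHeartbeat`, the injectable `fetchImpl` parameter, and the five-second default timeout are assumptions for illustration.

```javascript
// Hypothetical helper: ping a heartbeat URL and treat non-2xx responses as failures.
// `fetchImpl` is injectable so the helper can be exercised without a network.
async function sendHeartbeat(pingUrl, fetchImpl = fetch, timeoutMs = 5000) {
  if (!pingUrl) {
    throw new Error("Heartbeat URL is not configured");
  }
  // AbortSignal.timeout cancels the request if the monitor is slow to respond.
  const response = await fetchImpl(pingUrl, {
    signal: AbortSignal.timeout(timeoutMs),
  });
  if (!response.ok) {
    throw new Error(`Heartbeat ping failed with status ${response.status}`);
  }
  return response.status;
}
```

Inside a scheduled handler this would be called as `await sendHeartbeat(env.QUIETPULSE_PING_URL)` after the real work. Whether a failed ping should fail the whole run or only be logged is a design choice; throwing makes the problem visible in the Worker's error metrics as well as in the monitor.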

If one Worker handles multiple Cron Triggers, use separate heartbeat checks:

export default {
  async scheduled(controller, env, ctx) {
    switch (controller.cron) {
      case "0 * * * *":
        await hourlySync(env);
        await fetch(env.HOURLY_SYNC_PING_URL);
        break;

      case "0 2 * * *":
        await dailyCleanup(env);
        await fetch(env.DAILY_CLEANUP_PING_URL);
        break;

      default:
        console.log(`No handler for cron: ${controller.cron}`);
    }
  },
};

Separate checks make alerts more useful. “Hourly sync missed a run” is better than “some scheduled Worker may have failed.”

Instead of building all the heartbeat timing, grace periods, and alert delivery yourself, you can use a simple heartbeat monitoring tool like QuietPulse. Create a check, copy the ping URL, and call it after your Cloudflare Worker Cron Trigger finishes successfully. If the expected ping is missing, QuietPulse can notify you through the alert channels you configured.

Common mistakes

1. Monitoring only the website

A public uptime monitor does not prove that a scheduled Worker ran. Use uptime checks for public URLs and heartbeat checks for scheduled jobs.

2. Pinging before the work is done

If you send the heartbeat at the start, the monitor can show success even when the job fails later.

Send the ping after successful completion.

3. Swallowing errors and still pinging

Avoid this pattern:

try {
  await runJob();
} catch (error) {
  console.error(error);
}

await fetch(env.QUIETPULSE_PING_URL);

The job failed, but the heartbeat still says success.
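
A safer shape, sketched here with hypothetical `job` and `ping` callbacks that you would supply, logs the error and rethrows it so the ping is only reached on success:

```javascript
// Hypothetical wrapper: `job` and `ping` are async functions you provide.
async function runWithHeartbeat(job, ping) {
  try {
    await job();
  } catch (error) {
    // Log for debugging, then rethrow so the failure stays visible
    // and the heartbeat below is never reached.
    console.error("Scheduled job failed", error);
    throw error;
  }
  // Only runs when job() resolved without throwing.
  await ping();
}
```

In a Worker this could be used as `await runWithHeartbeat(() => runJob(env), () => fetch(env.QUIETPULSE_PING_URL))`.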

4. Sharing one monitor across unrelated jobs

Different schedules should usually have different heartbeat checks. It makes alerts easier to understand and act on.

5. Forgetting time zones

Be careful with cron expressions and expected run times. Cloudflare evaluates Cron Triggers in UTC, so document whether each schedule is meant to line up with UTC or with a business timezone, and convert accordingly.

Alternative approaches

Logs

Cloudflare logs are useful for debugging after an alert. They are less useful as the only way to notice a missed run.

Dashboard metrics

Metrics can show invocations and errors, but they may not map directly to “this business job completed successfully every hour.”

Downstream state checks

You can monitor the output of the job, such as a timestamp in storage or a recently updated file. This is powerful but often more custom than a heartbeat ping.

Status endpoint

Some teams expose an endpoint that reports the last successful run time. An external monitor checks whether that timestamp is fresh. This works well, but for simple jobs a heartbeat ping is usually less code.
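
A minimal sketch of such an endpoint's core logic might look like this. It assumes the job stores a millisecond timestamp after each successful run; the function name, the `maxAgeMs` parameter, and the choice of a 503 status for stale data are illustrative assumptions.

```javascript
// Hypothetical status handler: `lastSuccessAt` is a timestamp in milliseconds
// that the job stored after its last successful run (null if it never ran).
function buildStatusResponse(lastSuccessAt, maxAgeMs, now = Date.now()) {
  const fresh = lastSuccessAt !== null && now - lastSuccessAt <= maxAgeMs;
  const body = JSON.stringify({ fresh, lastSuccessAt });
  // A 503 lets a plain uptime monitor treat a stale job as "down".
  return new Response(body, {
    status: fresh ? 200 : 503,
    headers: { "content-type": "application/json" },
  });
}
```

The Worker's `fetch` handler would read the stored timestamp (from KV or D1, for example) and return this response, so an ordinary uptime check on that URL doubles as a freshness check.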

FAQ

What is Cloudflare Workers Cron Monitoring?

Cloudflare Workers Cron Monitoring means checking whether scheduled Cloudflare Worker jobs run and complete successfully. Heartbeat monitoring is a common way to do this.

Can uptime monitoring detect missed Cloudflare Cron Triggers?

Not reliably. Uptime monitoring checks public endpoints. A Cron Trigger can fail while the rest of your app stays online.

Where should the heartbeat ping go?

After the scheduled work finishes successfully. If the job fails, the success heartbeat should not be sent.

Should every Cron Trigger have its own heartbeat?

Usually yes. Separate heartbeat checks make alerts clearer and easier to debug.

Are logs enough?

Logs are helpful for investigation, but they are not always enough for alerting. A heartbeat check detects the missing successful run directly.

Conclusion

Cloudflare Workers Cron Triggers are great for lightweight scheduled work, but they still need monitoring.

If a job matters, make it report successful completion. Send a heartbeat after the work finishes, alert when the heartbeat is missing, and treat scheduled jobs as production systems — not background magic.

Originally published at https://quietpulse.xyz/blog/cloudflare-workers-cron-monitoring

Vercel Cron Monitoring: How to Catch Missed Executions Before They Break Production

quietpulse — Tue, 05 May 2026 06:18:52 +0000

Vercel Cron monitoring matters because scheduled serverless work is easy to forget once it “usually works.” You add a cron job to rebuild cached data, sync billing state, send reports, clean up expired records, or call an internal API every hour. It runs fine during testing. The deployment looks healthy. The website stays online.

Then one day the scheduled work silently stops.

No page goes down. No uptime monitor turns red. Users may not notice immediately. But your database starts drifting, stale records pile up, notifications stop sending, or an external integration falls behind. By the time someone spots the problem, the failure has already become operational debt.

This is the awkward part of scheduled serverless work: the absence of a run is itself the failure. If nobody is watching for that absence, Vercel Cron Jobs can fail quietly.

This guide explains why Vercel Cron Jobs can be missed or broken, why logs alone are not enough, and how to monitor them with heartbeat checks so you know when an expected execution does not happen.

The problem

Vercel Cron Jobs let you schedule HTTP requests to routes in your Vercel project. That makes them a convenient way to trigger small recurring jobs without running your own server.

Common examples include:

refreshing cached API data
syncing subscription or payment status
sending daily email digests
cleaning up expired sessions or tokens
rebuilding search indexes
pulling data from a third-party API
checking whether external workflows are still healthy

The setup is usually simple. You define a schedule in vercel.json, point it at an API route, deploy, and Vercel calls that route on schedule.

For example:

{
  "crons": [
    {
      "path": "/api/cron/sync-customers",
      "schedule": "0 * * * *"
    }
  ]
}

That looks clean, but there is a monitoring gap.

Your app can be online while the cron job is not doing useful work. The route can return a response while the real sync failed halfway through. The job can time out, hit a third-party rate limit, throw an exception, or stop being called after a config change.

Traditional uptime monitoring checks whether a URL responds. Vercel Cron monitoring needs to answer a different question:

Did the scheduled job actually run successfully when it was supposed to?

If the answer is no, you need to know quickly.

Why it happens

Vercel Cron Jobs are reliable enough for many scheduled tasks, but they still live inside a real production system. That means they can break for boring, ordinary reasons.

A cron route might fail because of application code:

an unhandled exception
a changed database schema
a missing environment variable
an expired API token
a timeout during a slow external request
a deployment that changed route behavior
a bad assumption about time zones or dates

It can also fail because of platform or configuration issues:

the cron path was renamed
the route was deleted or moved
the project was redeployed with an invalid vercel.json
the schedule was changed accidentally
the function exceeds execution limits
the job depends on a third-party service that is unavailable

There is also a subtle category: partial success.

Imagine a cron route that syncs invoices from a billing provider. It starts correctly, fetches the first page, updates a few records, then crashes before processing the rest. Depending on how the handler is written, the response might still look successful or the failure might only appear in logs.
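
One way to keep partial success from looking like success is to collect per-record failures and throw at the end, so the handler never reaches its success path. The sketch below assumes a hypothetical `handleRecord` function and an in-memory `records` array; real jobs would substitute their own pagination and persistence.

```javascript
// Hypothetical batch runner: `records` and `handleRecord` stand in for your
// own data and per-record logic.
async function processAllOrThrow(records, handleRecord) {
  const failures = [];
  for (const record of records) {
    try {
      await handleRecord(record);
    } catch (error) {
      // Keep going so one bad record does not block the rest.
      failures.push({ record, message: String(error) });
    }
  }
  if (failures.length > 0) {
    // Throwing here means the caller never reaches its heartbeat ping.
    throw new Error(`${failures.length} of ${records.length} records failed`);
  }
  return records.length;
}
```

Whether a single failed record should fail the whole run depends on the job. For billing syncs it usually should; for best-effort cache warming a threshold may be more appropriate.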

Another common problem is assuming that “no alert” means “everything ran.” For scheduled jobs, no alert often just means nothing is checking whether the job happened.

That is why Vercel Cron monitoring should not only look for route errors. It should detect missing successful executions.

Why it's dangerous

Missed cron executions rarely look dramatic at first. That is what makes them dangerous.

If a public page goes down, someone notices. If a checkout flow breaks, customers complain. If a server crashes, metrics spike.

But if a scheduled background task does not run, the damage is often delayed.

A missed customer sync can leave billing state stale. A missed cleanup job can slowly fill a database table. A missed reporting job can make dashboards inaccurate. A missed notification job can break user trust without creating an obvious infrastructure incident.

The risk is higher with serverless cron jobs because the system is distributed:

the scheduler lives on the platform
the handler lives in your app
dependencies may live in external APIs
logs may be spread across deployments
retries may not match your business expectations

You need a signal that represents the thing you actually care about: successful completion.

Not “the app is up.”

Not “the route exists.”

Not “there are logs somewhere.”

The useful signal is:

This scheduled job finished its expected work within the expected time window.

If that signal does not arrive, you should get an alert.

How to detect it

The most practical way to monitor Vercel Cron Jobs is heartbeat monitoring.

A heartbeat is a small HTTP request your job sends after it completes successfully. An external monitor expects that request on a schedule. If the heartbeat does not arrive on time, the monitor alerts you.

The key detail is where you place the heartbeat.

Do not ping at the very beginning of the cron handler. If you do that, the monitor only knows the job started. It does not know whether the important work finished.

Instead, send the heartbeat after the successful part of the job:

1. Vercel triggers your cron route.
2. Your handler performs the scheduled work.
3. The work completes successfully.
4. The handler sends a heartbeat ping.
5. The monitor resets the expected window.
6. If no ping arrives next time, you get alerted.

This creates a much better Vercel Cron monitoring signal.

For example, if a job runs every hour, you might configure the monitor to expect a ping every 60 minutes with a grace period of 10–15 minutes. If no ping arrives within that window, either the scheduled execution did not complete successfully or the ping itself did not reach the monitor.

This catches problems like:

the cron route was not called
the handler crashed before completion
the job timed out
the deployment broke the route
an external API caused the job to fail
the code returned early before doing the real work

Heartbeat monitoring is especially useful because it detects silence. Logs and errors are helpful when something runs and fails loudly. Heartbeats catch the case where the expected success signal never arrives.

Simple solution

Here is a simple Vercel Cron Job handler with a heartbeat ping after successful work.

Example with a Next.js App Router route:

// app/api/cron/sync-customers/route.ts

export const dynamic = 'force-dynamic';

async function syncCustomers() {
  // Replace this with your real scheduled work.
  // For example: fetch customers from Stripe, update your database,
  // refresh cached records, or call an internal service.
  console.log('Syncing customers...');

  await new Promise((resolve) => setTimeout(resolve, 500));

  console.log('Customer sync finished');
}

async function sendHeartbeat() {
  const response = await fetch('https://quietpulse.xyz/ping/YOUR_TOKEN', {
    method: 'GET',
    cache: 'no-store',
  });

  if (!response.ok) {
    console.warn('Heartbeat ping failed', response.status);
  }
}

export async function GET(request: Request) {
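  const authHeader = request.headers.get('authorization');
  const cronSecret = process.env.CRON_SECRET;

  // Reject when the secret is unset so "Bearer undefined" can never match.
  if (!cronSecret || authHeader !== `Bearer ${cronSecret}`) {
    return new Response('Unauthorized', { status: 401 });
  }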

  try {
    await syncCustomers();

    // Send the heartbeat only after the scheduled work succeeds.
    await sendHeartbeat();

    return Response.json({ ok: true });
  } catch (error) {
    console.error('Cron job failed', error);

    return Response.json(
      { ok: false, error: 'Cron job failed' },
      { status: 500 }
    );
  }
}

And the matching vercel.json:

{
  "crons": [
    {
      "path": "/api/cron/sync-customers",
      "schedule": "0 * * * *"
    }
  ]
}

In this pattern, the heartbeat is not a replacement for logs or error tracking. It is a separate completion signal.

If the job succeeds, the monitor receives the ping. If the job does not run, crashes, times out, or fails before completion, the ping never arrives. That missing ping becomes the alert.

You can build a heartbeat monitor yourself, but that means implementing scheduling windows, grace periods, and alert delivery from scratch. It is usually easier to use a heartbeat monitoring tool like QuietPulse. Create a monitored job, copy the ping URL, place it after successful completion, and configure alerts through Telegram or webhooks.

The important part is not the specific tool. The important part is that your Vercel Cron monitoring should watch for successful completion, not just route availability.

Common mistakes

1. Pinging before the work starts

This is the most common mistake.

If your cron handler sends the heartbeat at the top of the function, the monitor only knows that the route started. The real job may still fail afterward.

Bad pattern:

export async function GET() {
  await fetch('https://quietpulse.xyz/ping/YOUR_TOKEN');

  await syncCustomers();

  return Response.json({ ok: true });
}

Better pattern:

export async function GET() {
  await syncCustomers();

  await fetch('https://quietpulse.xyz/ping/YOUR_TOKEN');

  return Response.json({ ok: true });
}

The heartbeat should represent success, not just execution.

2. Treating Vercel logs as monitoring

Logs are useful when you already know something went wrong. They are not enough for missed execution detection.

If nobody checks the logs, they do not alert you. If the job never runs, there may be no useful application log at all. And if the failure is hidden inside partial work, the logs might not make the problem obvious.

Use logs for debugging. Use heartbeats for detection.

3. Ignoring function time limits

Cron jobs often start small and grow over time. A job that once took five seconds may eventually take forty seconds, then several minutes.

If your function approaches platform limits, it may fail before sending the heartbeat. That is good in the sense that monitoring catches it, but you should also treat duration growth as a design warning.

Long-running jobs may need batching, pagination, queues, or a different execution environment.
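
As one possible shape for batching, the sketch below processes items until a time budget is used up and returns whatever is left for a later run. The function name, the `budgetMs` parameter, and the injectable `now` clock are assumptions for illustration; pick a budget comfortably below your function's actual limit.

```javascript
// Hypothetical batching helper: processes items until the time budget runs out
// and returns the items that still need work. `now` is injectable for testing.
async function processWithinBudget(items, handleItem, budgetMs, now = Date.now) {
  const deadline = now() + budgetMs;
  let index = 0;
  // The deadline is checked before each item, so an item that has started
  // is always allowed to finish.
  while (index < items.length && now() < deadline) {
    await handleItem(items[index]);
    index += 1;
  }
  return items.slice(index);
}
```

A handler could send the heartbeat only when the returned array is empty, and otherwise persist the remainder for the next run. If leftover work is normal for your job, a missing heartbeat then signals a backlog rather than a crash, which may or may not be what you want to alert on.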

4. Not protecting the cron route

A Vercel Cron route is still an HTTP endpoint. If it triggers real production work, protect it.

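Use a secret header or token check so random requests cannot trigger the job manually. When a CRON_SECRET environment variable is set on the project, Vercel sends it as a bearer token in the Authorization header of cron requests, and your handler should reject any request that does not carry it.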

A simple bearer token check is often enough for small projects.

5. Using the wrong schedule window

If your cron runs every hour, do not alert at exactly 60 minutes unless you are comfortable with occasional noise. Real systems have small delays.

Use a grace period. For an hourly job, expecting a heartbeat every 60 minutes with a 10–15 minute grace period is often reasonable. For daily jobs, a larger grace period may make sense.

The goal is to catch real misses without creating alert fatigue.
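
The arithmetic behind that window is simple. As a sketch of what a monitor does on its side (the function and parameter names here are hypothetical, not QuietPulse's API):

```javascript
// Hypothetical monitor-side check: a heartbeat is overdue once the interval
// plus the grace period has passed since the last ping.
// `lastPingAt` and `now` are timestamps in milliseconds.
function isHeartbeatOverdue(lastPingAt, intervalMinutes, graceMinutes, now = Date.now()) {
  const deadline = lastPingAt + (intervalMinutes + graceMinutes) * 60 * 1000;
  return now > deadline;
}
```

With an hourly job and a 15-minute grace period, a ping that is 70 minutes old is still within the window, while one that is 76 minutes old triggers an alert.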

Alternative approaches

Heartbeat monitoring is usually the cleanest signal for missed Vercel Cron Jobs, but it is not the only useful monitoring layer.

Vercel logs

Vercel logs help you debug what happened inside a function. They can show errors, response status, runtime output, and timing information.

They are good for investigation, but weaker for proactive detection. Logs answer “what happened?” after you look. Heartbeats answer “did the expected success happen?” automatically.

Error tracking

Tools like Sentry or similar error trackers are useful when your cron handler throws an exception.

But missed executions do not always throw exceptions. If the route is not called, the schedule is wrong, or the function exits early without raising an error, error tracking may stay silent.

Use error tracking for exceptions. Use heartbeat monitoring for missing success.

Uptime checks

You can point an uptime monitor at the cron route, but that can be risky.

A cron route often performs side effects. Calling it from an uptime monitor might trigger real work at the wrong time. If you create a separate health endpoint, that only tells you the app is reachable, not that the scheduled job completed.

Uptime checks are great for public endpoints. They are not enough for scheduled background work.

Database markers

Some teams store a last_success_at timestamp in the database and check it from an admin dashboard.

This can work well, especially for internal systems. But you still need something to alert when the timestamp gets too old. Otherwise it becomes another value that nobody checks until after an incident.

A heartbeat monitor is basically this idea turned into an external alerting mechanism.

FAQ

How do I monitor Vercel Cron Jobs?

The most practical approach is to send a heartbeat ping after your cron handler completes successfully. Configure an external monitor to expect that ping on the same schedule as your Vercel Cron Job. If the ping does not arrive within the expected window, you get alerted.

Is Vercel Cron monitoring different from uptime monitoring?

Yes. Uptime monitoring checks whether an endpoint responds. Vercel Cron monitoring checks whether scheduled work completed successfully. Your app can be online while a cron job is missed, broken, or failing silently.

Where should I put the heartbeat ping in a Vercel Cron Job?

Place the heartbeat ping after the important scheduled work succeeds. Do not put it at the beginning of the handler. A heartbeat should mean “the job completed,” not merely “the route started.”

What schedule should I use for heartbeat alerts?

Match the heartbeat schedule to the cron schedule, then add a grace period. For example, if the cron runs every hour, you might alert after 70–75 minutes without a heartbeat. The right grace period depends on how much delay is acceptable.

Can Vercel logs catch missed cron executions?

Logs help debug failures, but they are not reliable missed-run detection by themselves. If a cron job never runs, there may be no useful application log. Heartbeat monitoring is better for detecting absence.

Conclusion

Vercel Cron Jobs are a convenient way to run scheduled serverless work, but they still need monitoring.

The dangerous failures are not always loud. Sometimes the job simply does not run, exits early, times out, or fails before completing the important work. Your app may stay online while the scheduled task quietly stops doing its job.

Good Vercel Cron monitoring should focus on successful completion. Add a heartbeat ping after the cron handler finishes its real work, configure an expected schedule and grace period, and alert when the ping goes missing.

That simple signal turns silent missed executions into visible, actionable failures.

Originally published at https://quietpulse.xyz/blog/vercel-cron-monitoring

Zapier Monitoring: How to Catch Silent Automation Failures

quietpulse — Mon, 04 May 2026 06:11:39 +0000

Zapier monitoring sounds simple until an important Zap quietly stops doing its job.

Maybe a lead should be copied from a form into your CRM. Maybe an invoice should trigger a Slack message. Maybe a paid signup should create a user record, tag the customer, and notify your team. When everything works, nobody thinks about it.

The problem is that automation failures are often silent. A Zap can be turned off, skipped because of a changed field, blocked by an expired token, or delayed long enough that nobody notices until the downstream mess is already real.

This guide explains how to monitor Zapier Zaps in a practical way, what usually breaks, and how heartbeat monitoring can help you detect missing automation runs before users or customers find the problem.

The problem

Zapier is great at connecting tools quickly. That is also why it often becomes part of production workflows without being treated like production infrastructure.

A typical Zap might do something like this:

New form submission in Typeform
Create contact in HubSpot
Add row to Google Sheets
Send Slack notification
Add subscriber to Mailchimp
Trigger an internal webhook

On paper, this is simple. In reality, the Zap may be responsible for sales, support, onboarding, reporting, billing operations, or customer communication.

The dangerous part is not always a visible error. The dangerous part is missing work.

For example:

A customer fills out a form, but no CRM contact is created.
A payment happens, but the onboarding message is never sent.
A support escalation is created, but nobody gets notified.
A daily sync should run every morning, but stops for three days.
A webhook step silently fails because the receiving app changed its schema.

If nobody is watching for expected Zap activity, the workflow can look “fine” from the outside while important business operations are stuck.

That is the core Zapier monitoring problem: you need to know not only when a Zap errors, but also when expected automation work does not happen.

Why it happens

Zapier workflows can fail or stop for many normal reasons.

The most common ones are not dramatic. They are small operational issues that accumulate quietly.

One common cause is account authentication. A connected app token expires, permissions change, or someone removes access from the external service. The Zap may stop at the affected step until the account is reconnected.

Another common cause is input shape changes. If a form field is renamed, a CRM property is removed, or a webhook payload changes, later Zap steps may no longer receive the data they expect.

Filters and paths are another source of confusion. A Zap can trigger correctly but skip the important action because a filter condition no longer matches. From a monitoring perspective, that is tricky: the Zap technically ran, but the business outcome did not happen.

Rate limits can also create partial failures. A busy workflow may hit API limits in Google Sheets, Slack, HubSpot, Airtable, or another connected app. Some steps may retry, delay, or fail depending on the integration.

Scheduled Zaps have their own problems. A daily or hourly automation can be disabled, delayed, or misconfigured. If it runs at 06:00 every morning and stops, there may be no obvious signal unless you explicitly check for the run.

Human changes matter too. Someone can edit a Zap, turn it off during debugging, change a filter, remove a step, or switch accounts. The change may be reasonable at the time, but the workflow can stay broken longer than expected.

This is why Zapier monitoring needs to focus on the actual expected signal: did the automation complete the work it was supposed to complete within the expected time window?

Why it's dangerous

Silent Zap failures are dangerous because they usually sit between systems.

When a backend job fails, you may see an error log. When a website goes down, an uptime monitor catches it. But when a Zap misses a business action, the symptom often appears somewhere else later.

A missed CRM sync becomes a sales follow-up problem.

A missed Slack notification becomes a support response problem.

A missed spreadsheet update becomes a reporting problem.

A missed webhook delivery becomes a customer onboarding problem.

A missed daily automation becomes a pile of stale data.

These failures are especially painful for small teams and indie products because Zapier often fills gaps between tools. It is not “just automation.” It is glue code, except the code lives in a visual workflow builder.

The risk is higher when Zaps are used for:

Lead capture
Customer onboarding
Payment and billing operations
Support routing
Internal alerts
Daily reports
Data synchronization
No-code backend workflows
Webhook-based integrations

The incident can also be hard to reconstruct. Zapier task history helps, but only after someone knows what to look for. If you discover the issue days later, you may need to replay data manually, deduplicate records, contact customers, or rebuild state across several tools.

Good Zapier monitoring reduces the detection time. It does not make every integration perfect, but it gives you a fast signal when expected automation stops happening.

How to detect it

The simplest monitoring model is this:

If a Zap is expected to run regularly, it should emit a signal when it successfully reaches the important point.

That signal is usually called a heartbeat.

A heartbeat is just a small HTTP request that says, “this workflow reached this point.” If the heartbeat does not arrive within the expected interval, your monitor alerts you.

This is different from only checking Zapier task history.

Task history tells you what happened inside Zapier. Heartbeat monitoring tells you whether the expected external signal arrived on time.

For scheduled Zaps, this is very straightforward:

A Zap should run every hour.
Add a webhook step near the end of the Zap.
The webhook calls a heartbeat URL.
If the heartbeat is missing for more than, for example, 75 minutes, alert someone.
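
The deadline rule in the steps above can be sketched as a small shell function. This is only an illustration of the kind of check a heartbeat monitor performs on its side, not how any particular service implements it. The function name `heartbeat_status` and its arguments are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: decide whether a heartbeat is overdue.
# Arguments (all in seconds; names are illustrative):
#   $1 last_ping  - Unix timestamp of the last received ping
#   $2 now        - current Unix timestamp
#   $3 interval   - expected time between pings (3600 for an hourly Zap)
#   $4 grace      - extra tolerance for platform delays (900 = 15 minutes)
heartbeat_status() {
  local last_ping="$1" now="$2" interval="$3" grace="$4"
  local deadline=$((last_ping + interval + grace))
  if [ "$now" -gt "$deadline" ]; then
    echo "overdue"
  else
    echo "ok"
  fi
}

# Hourly Zap with a 15-minute grace period (75 minutes in total).
heartbeat_status 0 4400 3600 900   # 73 min 20 s since the last ping
heartbeat_status 0 4600 3600 900   # 76 min 40 s since the last ping
```

The first call prints ok and the second prints overdue, because the deadline is 4500 seconds after the last ping. Keeping the interval and the grace period as separate values makes it easier to tune the tolerance without changing the schedule.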

For event-driven Zaps, the pattern depends on expected volume.

If a Zap should run many times per day, you can monitor for activity gaps. For example, if your lead capture Zap normally runs every few hours during business days, a full day without any signal may be suspicious.

If the Zap handles critical but irregular events, you can monitor a companion scheduled check instead. For example, a scheduled Zap can query whether new records are being processed and ping a heartbeat when the check completes.

The key is to monitor completion, not just start.

A heartbeat at the beginning of a Zap proves only that the Zap started. A heartbeat near the end proves that the important steps completed before the signal was sent.

For Zapier workflows, a good heartbeat step is usually placed after:

The CRM record is created
The notification is sent
The spreadsheet row is written
The webhook succeeds
The data sync finishes
The final important action is complete

This gives you a practical signal for “the automation actually did the thing.”

Simple solution (with example)

Zapier can call external URLs using Webhooks by Zapier.

A simple monitoring setup looks like this:

Create a monitor for the Zap.
Copy the heartbeat URL.
Add a Webhooks by Zapier step near the end of the Zap.
Configure it to make a GET request to the heartbeat URL.
Set the expected schedule or grace period in your monitoring tool.
Alert if the heartbeat does not arrive on time.

Example heartbeat URL:

https://quietpulse.xyz/ping/{token}

In Zapier, add an action step:

App: Webhooks by Zapier
Event: GET
URL: https://quietpulse.xyz/ping/{token}
Payload Type: leave default unless needed
Headers: usually not required

Place this step after the critical work.

For example, imagine a Zap that handles new paid signups:

Trigger: new successful payment
Create customer in CRM
Add customer to onboarding list
Send Slack notification
Call heartbeat URL

The heartbeat should be last because that is the signal that the important automation path completed.

If the Zap does not run, the heartbeat is missing.

If the Zap fails before the final step, the heartbeat is missing.

If someone turns the Zap off, the heartbeat is missing.

If an app authorization breaks, the heartbeat is missing.

That missing signal is what creates the alert.

Instead of building this yourself, you can use a simple heartbeat monitoring tool like QuietPulse. Create a monitored job, copy the ping URL, and add it as a final Webhooks by Zapier step. If the expected ping does not arrive, QuietPulse can alert you before the broken automation quietly damages the rest of your workflow.

For scheduled Zaps, choose an interval slightly longer than the expected schedule. If the Zap runs hourly, a 75- or 90-minute threshold is often safer than exactly 60 minutes because automation platforms can have delays.

For daily Zaps, add a reasonable grace period too. If a Zap should run at 06:00, alerting at 06:01 may create noise. Alerting after 07:00 or 08:00 may be more practical depending on the workflow.

Common mistakes

1. Monitoring only the trigger

A Zap trigger firing does not mean the workflow completed.

If you ping at the start, the monitor may stay green even when later steps fail. Put the heartbeat after the important action, not before it.

2. Treating Zapier errors as the only failure mode

Zapier task errors are useful, but they do not cover every business failure.

A Zap can skip work because of filters, paths, changed data, or logic that no longer matches reality. Monitor the expected outcome, not just platform errors.

3. Using no grace period

Automation platforms can be delayed.

If a scheduled Zap runs every hour, do not alert the second the hour passes. Use a grace period that reflects real-world delays while still catching problems quickly.

4. Forgetting about low-volume workflows

Some Zaps do not run often, but they are still critical.

For irregular workflows, consider a scheduled audit Zap that checks whether source and destination systems are in sync, then sends a heartbeat when the audit completes.
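
The audit idea can be sketched outside Zapier as a simple count comparison. This is a minimal sketch under assumptions: the function name `audit_in_sync`, its arguments, and the example numbers are hypothetical, and how you obtain the two counts (an API call, an export, a database query) depends on your tools and is not shown.

```shell
#!/usr/bin/env bash
# Hypothetical audit check: compare how many records the source produced
# with how many reached the destination.
#   $1 source_count       - records created in the source system
#   $2 destination_count  - records that arrived in the destination
#   $3 max_missing        - how many missing records you tolerate (often 0)
audit_in_sync() {
  local source_count="$1" destination_count="$2" max_missing="$3"
  local missing=$((source_count - destination_count))
  [ "$missing" -le "$max_missing" ]
}

# Only a passing audit should be followed by the heartbeat request.
if audit_in_sync 42 42 0; then
  echo "in sync: send heartbeat"
else
  echo "out of sync: skip heartbeat so the monitor alerts"
fi
```

Skipping the heartbeat on a failed audit means the same missing-ping alert covers both "the audit did not run" and "the audit found a gap".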

5. Not documenting ownership

When a Zap fails, who fixes it?

Many no-code automations are created by one person and later become team infrastructure. Keep a short note with the owner, expected schedule, connected apps, and what the heartbeat means.

Alternative approaches

Heartbeat monitoring is useful, but it is not the only signal.

Zapier task history is still important. It helps you inspect failed tasks, replay data, and debug specific steps. The limitation is that someone has to look there or rely on Zapier's built-in notifications.

Zapier built-in alerts can catch some platform-level failures. They are a good baseline, especially for broken app connections or task errors. But they may not tell you that expected business work is missing.

Destination-system checks are another option. For example, you can check whether a CRM received new leads, whether a spreadsheet has fresh rows, or whether Slack messages were sent. This can be powerful, but it usually requires more custom logic.

Logs can help if your Zap calls an internal service. If you own the receiving API, log every incoming Zapier request and monitor error rates. This is useful for webhook-heavy workflows, but less useful for purely no-code flows between third-party apps.

Manual review is sometimes enough for low-risk workflows. For example, a weekly personal productivity automation may not need alerting. But if the Zap affects customers, revenue, support, or production data, manual review is usually too slow.

A practical setup often combines several layers:

Zapier built-in error notifications
Task history for debugging
Heartbeat monitoring for missing runs
Destination checks for critical data syncs
Clear ownership and documentation

That gives you both fast detection and enough context to fix the problem.

FAQ

What is Zapier monitoring?

Zapier monitoring means tracking whether your Zaps are running and completing the work they are supposed to do. Good monitoring does not only look for task errors. It also detects missing runs, skipped workflows, delayed automations, and broken downstream actions.

How do I know if a Zapier Zap stopped running?

For scheduled Zaps, add a heartbeat ping near the end of the workflow and alert when the ping does not arrive on time. You can also check Zapier task history, connected app errors, and whether the destination system received the expected data.

Can Zapier send a heartbeat ping?

Yes. You can use Webhooks by Zapier to send a GET request to a heartbeat URL such as https://quietpulse.xyz/ping/{token}. Put that step after the critical work so the ping means the Zap completed successfully.

Is Zapier task history enough for monitoring?

Zapier task history is useful for debugging, but it is not always enough for proactive monitoring. It helps explain what happened after you look, but heartbeat monitoring can alert you when an expected Zap run or completion signal is missing.

Where should I place the heartbeat step in a Zap?

Place the heartbeat step near the end of the Zap, after the most important action. If you ping at the beginning, your monitor may stay green even when later steps fail.

Conclusion

Zapier automations are often more important than they look.

If a Zap moves leads, customers, payments, support tickets, reports, or internal alerts, it deserves monitoring like any other production workflow.

The most reliable pattern is simple: define what “successful completion” means, send a heartbeat when the Zap reaches that point, and alert when the heartbeat is missing.

That turns silent automation failures into visible, fixable problems.

Originally published at https://quietpulse.xyz/blog/zapier-monitoring

Make Scenario Monitoring: How to Catch Silent Automation Failures

quietpulse — Sun, 03 May 2026 07:38:39 +0000

Make scenario monitoring is easy to overlook until an automation silently stops running.

A Make.com scenario might sync leads, update a CRM, send reports, copy invoices, or notify a team when something important happens. When it works, it feels invisible. When it breaks quietly, the damage can build up for hours or days.

The key is to monitor for missing successful runs, not only visible errors.

The problem

Make.com scenarios often become business-critical glue between tools.

They might:

copy form submissions into a CRM
sync orders into a spreadsheet
send Slack alerts
update Airtable or Notion
trigger onboarding emails
generate daily reports

The problem is that many automation failures are quiet.

A scenario can be disabled, a schedule can be wrong, an app connection can expire, or an upstream webhook can stop sending events. Sometimes the scenario runs, but a filter or router path prevents useful work from happening.

If nobody checks, the failure can remain hidden.

Why it happens

Make scenarios depend on many moving parts:

connected app credentials
third-party APIs
schedules and timezones
webhook payloads
filters and routers
account limits and quotas
human configuration changes

Any of these can change after the scenario was originally built.

A CRM token expires. A Google Sheets column is renamed. A teammate pauses a scenario for testing. A SaaS API starts returning rate limits. A webhook sender changes its payload shape.

The automation platform may still be online, but your specific workflow is no longer doing its job.

Why it's dangerous

Silent automation failures are dangerous because they rarely look urgent at first.

Your website is still up. Your dashboard may still be green. Nobody sees a crash screen.

But the work is not happening.

That can mean:

missed leads
stale customer records
incomplete finance reports
delayed onboarding
missing support notifications
bad data in downstream systems

The longer the failure stays hidden, the more manual cleanup it creates.

For small teams, this is especially painful because Make scenarios often replace custom backend jobs. They may be no-code workflows, but they still handle production responsibilities.

How to detect it

The most practical way to detect silent failures is heartbeat monitoring.

A heartbeat is a small signal sent when a job or workflow reaches an important successful point. If the signal arrives on time, the workflow probably ran. If it does not arrive, something needs attention.

For Make scenario monitoring, add the heartbeat near the end of the scenario, after the important work completes.

For example:

after leads are copied into the CRM
after a report is generated
after invoices are synced
after a Slack notification is sent
after a batch of records is processed

This turns silence into something you can alert on.

If the scenario is disabled, the heartbeat stops. If the schedule is wrong, the heartbeat is late. If an earlier module fails, the heartbeat is never sent.

Simple solution (with example)

Add an HTTP request module at the end of the Make scenario.

Example heartbeat URL:

https://quietpulse.xyz/ping/YOUR_TOKEN_HERE

A simple scenario might look like this:

Scheduler trigger
  → Search new rows in Google Sheets
  → Create or update contacts in CRM
  → Send Slack summary
  → HTTP request: GET https://quietpulse.xyz/ping/YOUR_TOKEN_HERE

Outside Make, the same ping would look like this:

curl -fsS https://quietpulse.xyz/ping/YOUR_TOKEN_HERE

In Make's HTTP module, use:

Method: GET
URL: https://quietpulse.xyz/ping/YOUR_TOKEN_HERE
Body: empty
Headers: usually none required

Place the heartbeat after the work you actually care about.

If the scenario runs every hour, alert after something like 90 minutes without a ping. If it runs daily at 02:00, alert if no ping arrives by 03:00 or 04:00. The grace period prevents noisy alerts from normal delays.
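
The arithmetic behind a daily alert window is simple enough to sketch in shell. The helper name `alert_deadline` and its arguments are assumptions for illustration. A monitoring tool would normally do this calculation for you once you enter the schedule and grace period.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "scheduled time + grace period" into the
# clock time after which a missing ping should trigger an alert.
#   $1 scheduled  - daily run time as HH:MM (24-hour clock)
#   $2 grace_min  - grace period in minutes
alert_deadline() {
  local scheduled="$1" grace_min="$2"
  local hours="${scheduled%%:*}" minutes="${scheduled##*:}"
  # Strip one leading zero so "08" is not parsed as an invalid octal number.
  hours="${hours#0}"
  minutes="${minutes#0}"
  # Wrap around midnight with a modulo over the minutes in a day.
  local total=$(( (hours * 60 + minutes + grace_min) % 1440 ))
  printf '%02d:%02d\n' "$((total / 60))" "$((total % 60))"
}

alert_deadline 02:00 60    # daily 02:00 run, alert if nothing by 03:00
alert_deadline 02:00 120   # a more relaxed window: 04:00
```

The two calls print 03:00 and 04:00, matching the example windows in the paragraph above.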

For scenarios with routers, consider separate heartbeats for separate important paths.

Webhook trigger
  → Router
    → New customer path
      → Create onboarding tasks
      → Ping onboarding heartbeat
    → Refund path
      → Update finance sheet
      → Ping refund heartbeat

That gives you more precise alerts when only one branch breaks.
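
Outside Make, the same one-monitor-per-branch idea could be sketched as a lookup from branch name to heartbeat URL. The branch names, the function name `heartbeat_url_for_branch`, and the token placeholders are all hypothetical. Each real monitor would have its own token.

```shell
#!/usr/bin/env bash
# Hypothetical mapping from a router branch to its own heartbeat URL.
# Branch names and token placeholders are illustrative only.
heartbeat_url_for_branch() {
  case "$1" in
    onboarding) echo "https://quietpulse.xyz/ping/ONBOARDING_TOKEN" ;;
    refund)     echo "https://quietpulse.xyz/ping/REFUND_TOKEN" ;;
    *)          echo "unknown branch: $1" >&2; return 1 ;;
  esac
}

heartbeat_url_for_branch onboarding
heartbeat_url_for_branch refund
```

Failing loudly on an unknown branch is a deliberate choice in this sketch: a new path that nobody wired to a monitor should be noticed rather than silently left unmonitored.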

Common mistakes

1. Putting the heartbeat at the beginning

If the heartbeat runs right after the trigger, it only proves the scenario started. It does not prove the important work completed.

Put it near the end.

2. Relying only on Make history

Scenario history is useful for debugging, but it only helps once someone looks. It records executions that happened, so a run that never started leaves nothing there to notice.

3. Using no grace period

Schedules are not always exact. APIs can be slow and scenarios can take longer than usual.

Use a practical alert window instead of alerting immediately after the expected time.

4. Treating every branch as one workflow

If a scenario has multiple router paths, one path can break while another still works.

Monitor critical branches separately when needed.

5. Sending heartbeats when no useful work happened

For some automations, a successful run is not enough. If a lead sync processes zero leads because a filter broke, you may want to send the heartbeat only after useful data has been processed.

Alternative approaches

Heartbeat monitoring works best alongside other signals.

Make execution history

Great for debugging failed modules, input bundles, output bundles, and error details. Less ideal as the only proactive monitor.

Built-in error notifications

Useful for visible scenario errors, but not always enough for disabled scenarios, missed schedules, or logical failures.

Logs in destination systems

A CRM, database, or spreadsheet may show when data was last updated. This can help confirm results, but it is often harder to centralize.

Uptime monitoring

Good for checking whether a website or API is reachable. Not enough to prove a Make scenario processed records or sent a report.

Result-based checks

For critical workflows, you can monitor the destination directly: did today's report exist, did new records arrive, did a timestamp update? This is precise, but usually takes more setup.

A strong setup combines:

Make history for debugging
built-in alerts for visible errors
heartbeat monitoring for missing runs
result checks for critical data correctness

FAQ

What is Make scenario monitoring?

Make scenario monitoring means tracking whether Make.com scenarios run successfully and on time. It includes checking errors, execution history, schedules, and heartbeat signals.

How can I detect if a Make scenario stopped running?

Add a heartbeat ping near the end of the scenario and alert when the ping is missing. If the scenario is disabled, delayed, or fails before completion, the heartbeat will not arrive.

Is Make's built-in error history enough?

It is useful, but it is not always enough. History helps debug executions that happened. Heartbeat monitoring also catches expected executions that did not happen.

Where should the heartbeat go?

Place it after the critical work succeeds: after syncing records, sending a report, updating a destination system, or completing a key branch.

Does this work for webhook scenarios?

Yes. A heartbeat can confirm that a webhook scenario processed an event successfully, not just that the scenario exists.

Conclusion

Make.com scenarios can quietly become production infrastructure.

If a scenario matters, monitor it like any other scheduled job or background process. Add a heartbeat after the critical work, choose a reasonable alert window, and make missing runs visible before they become business problems.

Originally published at https://quietpulse.xyz/blog/make-scenario-monitoring

Systemd Timer Monitoring: How to Detect Failed or Missed Timers

quietpulse — Sat, 02 May 2026 08:46:57 +0000

Systemd timer monitoring matters when you use Linux timers for real production work: backups, imports, billing tasks, report generation, cleanup scripts, queue maintenance, certificate renewal, and dozens of other scheduled jobs that nobody wants to babysit.

Systemd timers are often cleaner than cron. They integrate with systemctl, log through journald, support dependencies, and can run missed jobs after boot. But they still have one uncomfortable weakness: a timer can stop doing useful work while the server itself looks perfectly healthy.

The machine is up. SSH works. Your app responds. The timer unit exists.

And yet the job did not run.

That is the gap systemd timer monitoring should close.

The problem

A systemd timer is usually made of two units:

# /etc/systemd/system/example-backup.timer
[Unit]
Description=Run example backup daily

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target

And the service it triggers:

# /etc/systemd/system/example-backup.service
[Unit]
Description=Example daily backup

[Service]
Type=oneshot
ExecStart=/usr/local/bin/example-backup.sh

You enable it:

systemctl enable --now example-backup.timer

Then you check it:

systemctl list-timers

Everything looks fine.

The problem is that “timer exists” does not mean “the work is being completed successfully.”

A timer can be active while the service fails. A service can exit successfully while the script skipped the important part. A job can hang forever. A server can be off during the scheduled window. A deployment can replace the script path. Permissions can change. Environment variables can disappear.

If nobody checks the actual execution signal, these failures can stay silent for days.

Why it happens

Systemd timers are reliable, but they are not magic. They schedule execution. They do not automatically prove that the business task succeeded.

Common failure modes include:

The .timer unit is enabled, but the .service unit fails.
The service exits with code 0, but the script did not complete meaningful work.
The job depends on network access before the network is ready.
The script works manually but fails under systemd’s limited environment.
The timer was disabled during maintenance and never re-enabled.
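The server was off at the scheduled time, and the timer did not catch up after boot because Persistent=true was missing.
A long-running service is still active when the next run is due, so systemd does not start a second instance and that run is effectively skipped.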
Logs rotate or disappear before anyone checks them.
A package update changes permissions, paths, or runtime behavior.

A classic example is a backup script:

#!/usr/bin/env bash
set -euo pipefail

pg_dump "$DATABASE_URL" > /backups/app.sql
aws s3 cp /backups/app.sql s3://example-backups/app.sql

This may work perfectly from your shell.

But when systemd runs it, $DATABASE_URL may not exist. The AWS credentials may not be loaded. The script may not have permission to write to /backups. DNS may fail for a few minutes after boot.

You will probably see the failure in journald if you look:

journalctl -u example-backup.service

But the whole point of monitoring is not needing to remember to look.

Why it’s dangerous

Missed systemd timers are dangerous because they usually affect work that happens behind the scenes.

Users do not immediately notice that:

backups stopped running
reports were not generated
invoices were not sent
expired sessions were not cleaned up
data syncs stopped
temporary files are filling the disk
webhooks are not being retried
usage counters are stale
SSL renewal hooks did not run

The app can look healthy while important background work is broken.

This is why uptime monitoring is not enough. An uptime check tells you that an HTTP endpoint responded. It does not tell you that last night’s backup finished. It does not tell you that a timer ran at 03:00. It does not tell you that your cleanup job is stuck waiting on a locked file.

For small teams and side projects, this can be especially painful. You may not have a full observability stack. You may not check servers every morning. You may only discover the issue when something has already gone wrong.

A missed timer is rarely dramatic at first. It is quiet.

That is what makes it risky.

How to detect it

Good systemd timer monitoring should answer a simple question:

Did the expected job complete within the expected time window?

There are a few signals you can use.

First, systemd itself can show scheduled timers:

systemctl list-timers --all

This tells you the next run, last run, and associated unit.

Second, you can inspect service status:

systemctl status example-backup.service

Third, you can check logs:

journalctl -u example-backup.service --since "24 hours ago"

These are useful debugging tools.

But they are mostly pull-based. You have to remember to check them.

For production monitoring, you usually want push-based detection. The job should emit a small success signal after it completes. If that signal does not arrive on time, your monitoring system alerts you.

That is heartbeat monitoring.

The timer runs the service. The service runs the script. At the end of a successful run, the script sends a heartbeat ping.

If the ping arrives, the job completed.

If the ping does not arrive by the expected deadline, something is wrong:

the timer did not fire
the service failed
the script crashed
the server was down
the network was unavailable
the job hung before completion

Heartbeat monitoring does not replace logs. It answers a different question: “Did the scheduled work happen?”

Simple solution

Let’s say you have a daily backup job triggered by a systemd timer.

Your service calls this script:

#!/usr/bin/env bash
set -euo pipefail

BACKUP_FILE="/var/backups/app-$(date +%F).sql"

pg_dump "$DATABASE_URL" > "$BACKUP_FILE"
gzip "$BACKUP_FILE"
aws s3 cp "$BACKUP_FILE.gz" "s3://example-backups/"

curl -fsS "https://quietpulse.xyz/ping/YOUR_TOKEN"

The important part is that the ping happens only after the meaningful work succeeds.

Do not ping at the start. Do not ping before the upload. Do not ping before the database dump completes.

Ping after success.
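
The final ping itself can be made a little more tolerant of short network problems. The sketch below wraps the curl call from the example above in a function. The function name `send_heartbeat`, the 10-second timeout, and the retry count are assumptions, not values recommended by any particular service.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the success ping. The timeout and retry
# values are assumptions chosen for illustration.
send_heartbeat() {
  local url="$1"
  # --max-time bounds each attempt; --retry repeats transient failures.
  if curl -fsS --max-time 10 --retry 3 -o /dev/null "$url"; then
    echo "heartbeat sent"
  else
    # The backup itself succeeded, so report the ping problem without
    # turning it into a backup failure. The monitor will still alert
    # because the ping never arrived.
    echo "warning: heartbeat ping failed" >&2
  fi
}

# Call this as the last line of the backup script:
# send_heartbeat "https://quietpulse.xyz/ping/YOUR_TOKEN"
```

Whether a failed ping should also fail the systemd unit is a judgment call. This sketch keeps the unit green and relies on the missing ping to raise the alert. Calling curl directly under set -e, as in the script above, marks the unit as failed instead.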

Your service file might look like this:

[Unit]
Description=Daily application backup

[Service]
Type=oneshot
EnvironmentFile=/etc/example-backup.env
ExecStart=/usr/local/bin/example-backup.sh

Your timer:

[Unit]
Description=Run daily application backup

[Timer]
OnCalendar=03:00
Persistent=true
Unit=example-backup.service

[Install]
WantedBy=timers.target

Then enable it:

systemctl daemon-reload
systemctl enable --now example-backup.timer

Check that systemd knows about it:

systemctl list-timers example-backup.timer

With heartbeat monitoring, you configure the expected interval externally. For example, if the backup runs every day at 03:00, you might expect one ping every 24 hours with a small grace period.

If no ping arrives, you get alerted.

Instead of building that alerting logic yourself, you can use a simple heartbeat monitoring tool like QuietPulse. Create a monitor, copy the ping URL, and call it from the end of your systemd-triggered script. The important idea is still the same: alert on missing success signals, not just server uptime.

A better pattern for scripts

For more robust scripts, use a trap so failures are easier to debug locally, but keep the success ping at the end.

Example:

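#!/usr/bin/env bash
set -euo pipefail

log() {
  echo "[$(date --iso-8601=seconds)] $*"
}

# Log where the script stopped if any command fails.
trap 'log "Backup failed near line $LINENO"' ERR

log "Starting backup"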

BACKUP_FILE="/var/backups/app-$(date +%F).sql"

pg_dump "$DATABASE_URL" > "$BACKUP_FILE"
gzip "$BACKUP_FILE"
aws s3 cp "$BACKUP_FILE.gz" "s3://example-backups/"

log "Backup completed successfully"

curl -fsS "https://quietpulse.xyz/ping/YOUR_TOKEN"

log "Heartbeat sent"

This gives you two layers:

journald logs for investigation
heartbeat monitoring for missed execution detection

If the script fails before the final curl, the heartbeat does not fire. That is exactly what you want.

Common mistakes

1. Monitoring only the timer unit

Checking that a timer is enabled is not enough.

systemctl is-enabled example-backup.timer

This only tells you that systemd is configured to schedule it. It does not prove successful execution.

You need to monitor completion, not configuration.

2. Sending the heartbeat too early

A common mistake is placing the ping at the top of the script:

curl -fsS "https://quietpulse.xyz/ping/YOUR_TOKEN"

pg_dump "$DATABASE_URL" > backup.sql

This creates a false success signal. The monitor records a ping even if the actual job fails immediately afterward.

The ping should be the last step after the important work completes.

3. Ignoring the systemd environment

Systemd services do not run with the same environment as your interactive shell.

This often breaks scripts that depend on:

shell profile files
local PATH changes
exported secrets
user-specific credentials
working directories

Use explicit paths, EnvironmentFile=, and clear permissions.
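
One way to surface environment problems early is a preflight check at the top of the script. This is a minimal sketch. The helper names `require_env` and `require_commands` are hypothetical, and the variable and command names in the usage comment are taken from the backup example in this article.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers for a systemd-run script: fail early,
# with a clear message in journald, when the environment is incomplete.

# Succeeds only if every named variable is present in the environment.
require_env() {
  local name rc=0
  for name in "$@"; do
    if ! printenv "$name" >/dev/null; then
      echo "missing environment variable: $name" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Succeeds only if every named command resolves through the current PATH.
require_commands() {
  local name rc=0
  for name in "$@"; do
    if ! command -v "$name" >/dev/null 2>&1; then
      echo "missing command: $name" >&2
      rc=1
    fi
  done
  return "$rc"
}

# At the top of the backup script, before any real work:
# require_env DATABASE_URL
# require_commands pg_dump gzip aws curl
```

Because the checks run under the same environment systemd provides, a missing EnvironmentFile= entry or a shortened PATH shows up as one readable log line instead of a confusing failure halfway through the job.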

4. Forgetting `Persistent=true`

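If a server is off during a scheduled time, Persistent=true tells systemd to run the missed job once the timer becomes active again, typically right after boot. The setting applies to OnCalendar= timers.

Without it, a run that was due while the machine was off is simply skipped.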

For daily maintenance jobs, backups, and syncs, this setting is often worth enabling.

5. Not setting timeouts

A oneshot service can hang indefinitely if a command waits forever, because systemd disables the start timeout for Type=oneshot units by default.

Use systemd options like:

[Service]
Type=oneshot
TimeoutStartSec=30min

A hung job can be just as bad as a missed one.
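
Alongside TimeoutStartSec=, an individual slow command can be bounded inside the script itself. The sketch below uses the timeout command from GNU coreutils. The wrapper name `run_with_deadline` and the 20-minute limit in the usage comment are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical complement to TimeoutStartSec=: bound one slow command
# inside the script using `timeout` from GNU coreutils.
#   $1     - time limit understood by timeout (for example 30, 90s, 20m)
#   $2...  - the command to run and its arguments
run_with_deadline() {
  local limit="$1"
  shift
  local rc=0
  timeout "$limit" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    # GNU timeout exits with 124 when it had to stop the command.
    echo "timed out after $limit: $*" >&2
  fi
  return "$rc"
}

# Example inside the backup script (the 20m limit is an assumption):
# run_with_deadline 20m aws s3 cp "$BACKUP_FILE.gz" "s3://example-backups/"
```

A per-command limit gives a more specific log line than the unit-level timeout, which only reports that the whole service took too long. Either way the script stops before the final ping, so the heartbeat monitor still alerts.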

Alternative approaches

Heartbeat monitoring is usually the simplest way to detect missed timers, but it is not the only useful signal.

Journald logs

You can inspect logs with:

journalctl -u example-backup.service --since today

This is excellent for debugging.

But logs are passive. They help after you know something is wrong.

Systemd status checks

You can check failed units:

systemctl --failed

Or inspect one service:

systemctl status example-backup.service

This helps catch hard service failures.

But it may not catch a script that exits successfully while doing incomplete work.

Metrics and dashboards

If you already use Prometheus, Grafana, or another monitoring stack, you can export timer metrics and alert on them.

This is powerful, but it may be too much for a small VPS, indie app, or simple background job.

Email from scripts

Some scripts send email on failure. This can work, but it depends on mail delivery, spam filtering, and correct error handling.

Also, failure-only alerts do not catch every missed run. If the script never starts, it never gets the chance to send the email.

Uptime checks

Uptime checks are still useful for web apps.

They just do not answer the systemd timer question. Your website can be up while your daily job is broken.

Use uptime checks for endpoints. Use heartbeat checks for scheduled work.

FAQ

What is systemd timer monitoring?

Systemd timer monitoring is the practice of checking whether scheduled systemd timer jobs actually run and complete successfully. It usually combines systemd status, logs, and heartbeat checks that alert when an expected job does not report success.

How do I know if a systemd timer failed?

You can start with:

systemctl list-timers --all
systemctl status your-service.service
journalctl -u your-service.service

For proactive detection, add a heartbeat ping at the end of the job and alert when the ping is missing.

Are systemd timers better than cron?

Systemd timers are often better for Linux services because they integrate with unit dependencies, journald, boot behavior, and systemctl. Cron is simpler and widely known. Both still need monitoring if the scheduled work matters.

Can uptime monitoring detect missed systemd timers?

No, not reliably. Uptime monitoring checks whether a service or endpoint responds. A missed systemd timer can happen while the server and application are still online.

Where should I put the heartbeat ping?

Put the heartbeat ping at the end of the script, after the important work has completed successfully. If you ping at the beginning, you may hide failures that happen later.

Conclusion

Systemd timers are a strong replacement for many cron jobs, but they still need monitoring.

Do not stop at “the timer is enabled.” Monitor whether the job actually completed.

Use systemd logs and status for debugging. Use heartbeat monitoring to catch missed or failed execution automatically. For backups, syncs, reports, cleanup scripts, and other scheduled production work, that small success ping can be the difference between a quiet failure and an early alert.

Originally published at https://quietpulse.xyz/blog/systemd-timer-monitoring

Kubernetes CronJob Monitoring: How to Catch Missed Runs Before They Break Production

quietpulse — Fri, 01 May 2026 07:30:50 +0000

Kubernetes CronJob monitoring sounds simple until the first scheduled job silently does not run.

Your cluster is healthy. The pods look fine. The app is serving traffic. Prometheus is green. Then somebody asks why yesterday’s invoices were not generated, why cleanup did not happen, or why a customer export is missing.

The problem is that Kubernetes can tell you a lot about pods and workloads, but a scheduled job is different: it matters that it ran at the right time, completed successfully, and keeps doing that every time.

This guide explains what actually breaks with Kubernetes CronJobs, why missed runs are easy to miss, and how to monitor them with heartbeat checks.

The problem

A Kubernetes CronJob is a scheduled workload. You define a schedule, Kubernetes creates Jobs, and those Jobs create Pods.

For example:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-invoice-sync
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: sync
              image: example/invoice-sync:latest
              command: ["node", "sync-invoices.js"]

This looks clean. But in production, several things can go wrong:

The CronJob never creates a Job.
The Job starts but the Pod fails.
The Pod hangs forever.
The job runs too late.
Multiple runs overlap.
The job succeeds from Kubernetes’ point of view but does not finish the business task.
The schedule is suspended and nobody notices.

Kubernetes usually exposes these as separate signals: CronJob status, Job status, Pod events, logs, and metrics. That is useful, but it also means there is no single obvious signal that says:

“This scheduled task did not complete when expected.”

That is the core monitoring gap.

Why it happens

Kubernetes CronJobs depend on several moving parts.

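First, the CronJob controller must notice that a schedule is due and create a Job. If the controller is delayed, the cluster is under pressure, or the CronJob configuration works against you (for example, a tight startingDeadlineSeconds, or concurrencyPolicy: Forbid while the previous run is still active), the Job may be late or skipped.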

Second, the Job must create a Pod. That can fail because of image pull errors, missing secrets, resource limits, node pressure, admission policies, or broken service accounts.

Third, the Pod must actually run the task. This is where application-level failures appear: bad credentials, API rate limits, database locks, schema changes, network timeouts, or logic bugs.

Finally, the task must complete the real business operation. A script can exit with code 0 even if it processed zero records because a query changed or an upstream API returned an unexpected empty response.

Kubernetes is good at managing containers. It is not automatically aware of your business expectation:

“This billing sync must finish once every night.”

That expectation needs to be monitored directly.

Why it's dangerous

Missed CronJobs are dangerous because they often fail quietly.

A web server failure is visible quickly. Users complain. Error rates spike. Uptime checks fail.

A missed scheduled task can sit unnoticed for hours or days.

Examples:

A billing job does not run, so invoices are never created.
A cleanup job stops, so storage usage grows until something breaks.
A data import misses one night, so dashboards show stale numbers.
A reminder job silently fails, so customers do not receive notifications.
A reconciliation task skips a run, so financial state drifts.
A backup verification job stops running, so nobody knows backups are broken.

The worst part is that many CronJob failures do not look urgent at the infrastructure level. The cluster can be perfectly healthy while the scheduled business process is failing.

That is why Kubernetes CronJob monitoring should focus on expected completion, not just pod health.

How to detect it

The most reliable way to detect missed CronJobs is to monitor the job from the outside.

Instead of only asking Kubernetes “did a pod exist?”, ask:

“Did this scheduled task finish within the expected time window?”

That is what heartbeat monitoring does.

The pattern is simple:

Create a unique heartbeat URL for the scheduled task.
At the end of the CronJob, call that URL.
Configure the monitor to expect one ping per schedule interval.
If the ping does not arrive on time, send an alert.

For example, if a CronJob runs every night at 02:00 and normally finishes by 02:10, you might expect a heartbeat once every 24 hours with a grace period.

This detects:

The CronJob did not start.
The Job failed before the end.
The Pod crashed.
The script hung.
The schedule was suspended.
The task completed too late.
Kubernetes created objects but the real work never finished.

This is different from log monitoring or pod monitoring. It checks the outcome that matters: the job reached the point where it can say “I completed.”

Simple solution with example

A simple pattern is to send the heartbeat only after the task succeeds.

For a shell-based Kubernetes CronJob, that might look like this:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      backoffLimit: 2
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: report
              # Hypothetical image: it must contain both the report script and curl.
              image: example/nightly-report:latest
              command:
                - /bin/sh
                - -c
                - |
                  set -e

                  echo "Running nightly report..."

                  # Replace this with your real command.
                  /app/generate-nightly-report.sh

                  curl -fsS --max-time 10 https://quietpulse.xyz/ping/YOUR_TOKEN

The important detail is the order.

The heartbeat happens after the actual work. If the report command fails, set -e stops the script and the ping never happens. That means the monitor will alert.

For a Node.js job:

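async function main() {
  // generateReport() stands in for your real job logic.
  await generateReport();

  const response = await fetch("https://quietpulse.xyz/ping/YOUR_TOKEN", {
    method: "GET",
    signal: AbortSignal.timeout(10000),
  });

  // fetch() resolves on HTTP error statuses, so check the response explicitly.
  if (!response.ok) {
    throw new Error(`Heartbeat ping failed with status ${response.status}`);
  }
}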

main().catch((error) => {
  console.error(error);
  process.exit(1);
});

For a Python job:

import requests

def main():
    generate_report()

    requests.get(
        "https://quietpulse.xyz/ping/YOUR_TOKEN",
        timeout=10,
    ).raise_for_status()

if __name__ == "__main__":
    main()

You can build this yourself with a small service that stores last-seen timestamps and sends alerts. Or you can use a heartbeat monitoring tool like QuietPulse, create a monitor for the CronJob, and ping its URL when the job finishes.

The key idea is not the tool. The key idea is that every important scheduled task should prove it completed.
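
If you do build the monitoring side yourself, the core is small. Here is a minimal in-memory sketch of the "store last-seen timestamps and find overdue jobs" idea. The names createHeartbeatMonitor, register, recordPing, and findOverdue are illustrative assumptions, not an existing API, and a real service would persist this state and send the alerts.

```javascript
// Minimal in-memory heartbeat monitor (illustrative only).
// createHeartbeatMonitor, register, recordPing and findOverdue are hypothetical names.
function createHeartbeatMonitor() {
  const monitors = new Map(); // id -> { intervalMs, graceMs, lastSeenAt }

  return {
    // Register a job with its expected interval and grace period.
    register(id, intervalMs, graceMs, now = Date.now()) {
      // Treat registration time as the first "seen" time so a brand-new
      // monitor is not reported as overdue immediately.
      monitors.set(id, { intervalMs, graceMs, lastSeenAt: now });
    },

    // Called by the ping endpoint when a job reports completion.
    recordPing(id, now = Date.now()) {
      const monitor = monitors.get(id);
      if (!monitor) return false; // unknown monitor id
      monitor.lastSeenAt = now;
      return true;
    },

    // Return the ids whose last ping is older than interval + grace.
    findOverdue(now = Date.now()) {
      const overdue = [];
      for (const [id, m] of monitors) {
        if (now - m.lastSeenAt > m.intervalMs + m.graceMs) overdue.push(id);
      }
      return overdue;
    },
  };
}

const HOUR = 60 * 60 * 1000;
const monitor = createHeartbeatMonitor();
monitor.register('nightly-report', 24 * HOUR, 1 * HOUR, 0);
monitor.recordPing('nightly-report', 2 * HOUR);
console.log(monitor.findOverdue(26 * HOUR)); // still inside interval + grace
console.log(monitor.findOverdue(28 * HOUR)); // past the deadline, so it is listed
```

Something still has to call findOverdue on its own schedule and notify a human, and that checker should run outside the system it is watching.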

Common mistakes

1. Pinging at the start of the job

A start ping proves the job started. It does not prove the job completed.

If the task hangs halfway through, crashes after processing some records, or fails during the final API call, a start ping gives a false sense of safety.

For most CronJobs, send the heartbeat at the end.

2. Only watching pod status

Pod status is useful, but it is not enough.

A pod can exist and still fail the real task. A container can exit successfully while processing no data. A Job can be retried and eventually disappear from history.

Infrastructure status should support CronJob monitoring, not replace it.

3. Ignoring execution time

A job that normally finishes in 3 minutes but suddenly takes 2 hours may already be broken.

Track duration when possible. At minimum, configure heartbeat grace periods based on realistic runtime, not just the schedule.
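
One lightweight way to track duration is to wrap the job and compare its runtime with what you consider normal. This is a sketch under assumptions: runTimed and maxExpectedMs are hypothetical names, and the 10-minute threshold is only an example value.

```javascript
// Run a task and report how long it took (illustrative sketch).
// runTimed and maxExpectedMs are hypothetical names, not part of any library.
async function runTimed(task, maxExpectedMs, now = () => Date.now()) {
  const startedAt = now();
  await task(); // a thrown error propagates, so no duration is reported
  const durationMs = now() - startedAt;
  return { durationMs, slow: durationMs > maxExpectedMs };
}

// Example: flag a run that takes longer than 10 minutes.
// generateReport is a placeholder for your real job function.
async function main(generateReport) {
  const result = await runTimed(generateReport, 10 * 60 * 1000);
  if (result.slow) {
    console.warn(`Job finished but took ${result.durationMs} ms`);
  }
  return result;
}
```

A slow-but-successful run can still send its heartbeat. The duration is an extra signal, not a replacement for the completion ping.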

4. Allowing overlapping runs by accident

If a CronJob runs every 10 minutes but sometimes takes 20 minutes, overlapping executions can create duplicates, locks, or inconsistent data.

Use concurrencyPolicy: Forbid when overlap is unsafe:

spec:
  concurrencyPolicy: Forbid

Then monitor for missed completions so skipped or delayed work does not stay invisible.

5. Keeping too little job history

Kubernetes lets you control how many successful and failed Jobs are retained.

If history limits are too low, useful debugging context disappears quickly:

successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 5

Heartbeat alerts tell you something is wrong. Job and pod history help you investigate why.

Alternative approaches

Heartbeat monitoring is usually the cleanest way to detect missed CronJobs, but it should not be your only signal.

Kubernetes events

Kubernetes events can show scheduling problems, failed pod creation, image pull errors, and resource issues.

They are useful for debugging, but they are noisy and not always retained long enough.

Logs

Logs help explain what happened inside the job.

They are less reliable for detecting jobs that never started. If there is no run, there may be no log line to search for.

Metrics

Prometheus and kube-state-metrics can expose useful signals about CronJobs, Jobs, and Pods.

This can work well if your team already has a strong Kubernetes monitoring setup. But it still requires careful alert rules around expected schedule, last successful completion, and delay tolerance.

Uptime checks

Uptime monitoring checks whether a service responds.

That is not the same as checking whether a scheduled job completed. Your app can be online while the nightly reconciliation job has not run in three days.

Application-level checks

For some jobs, the best signal is a business metric: “new report generated”, “backup verified”, “records imported”, or “emails sent”.

These are excellent when available. Heartbeat monitoring is often the simplest baseline, and business metrics can add extra confidence.

FAQ

What is Kubernetes CronJob monitoring?

Kubernetes CronJob monitoring is the practice of checking whether scheduled Kubernetes Jobs run and complete as expected. Good monitoring detects missed runs, failed pods, delayed execution, hangs, and broken business tasks.

How do I know if a Kubernetes CronJob did not run?

You can inspect CronJob, Job, and Pod status with kubectl, but the most reliable production signal is an external heartbeat. If the expected heartbeat does not arrive after the scheduled run, the CronJob likely failed, missed its schedule, or did not complete.

Is pod monitoring enough for Kubernetes CronJobs?

No. Pod monitoring helps, but it does not fully prove that the scheduled task completed its business work. A pod can start and still fail internally, hang, process no records, or exit successfully with bad results.

Should the heartbeat happen at the start or end of the CronJob?

Usually at the end. A heartbeat at the end proves that the job reached its completion point. A heartbeat at the start only proves that execution began.

What grace period should I use for a CronJob monitor?

Use the normal schedule plus expected runtime and a small buffer. If a job runs every hour and usually finishes in 5 minutes, a 10–15 minute grace period may be reasonable. For long jobs, base the grace period on real historical runtime.
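
As a rough illustration of basing the grace period on historical runtime, the sketch below takes past runtimes in minutes and adds a buffer. The helper name suggestGraceMinutes, the 95th percentile, and the 5-minute default buffer are all assumptions chosen for the example, not rules from any monitoring tool.

```javascript
// Suggest a grace period from past runtimes (illustrative heuristic).
// suggestGraceMinutes is a hypothetical helper; the percentile and buffer
// are assumptions you should tune for your own jobs.
function suggestGraceMinutes(pastRuntimesMinutes, bufferMinutes = 5) {
  if (pastRuntimesMinutes.length === 0) {
    throw new Error('Need at least one historical runtime');
  }
  const sorted = [...pastRuntimesMinutes].sort((a, b) => a - b);
  // Nearest-rank 95th percentile; with few samples this is simply the slowest run.
  const rank = Math.ceil((95 * sorted.length) / 100);
  const p95 = sorted[rank - 1];
  return Math.ceil(p95 + bufferMinutes);
}

// An hourly job that usually finishes in about 5 minutes.
console.log(suggestGraceMinutes([4, 5, 5, 6, 5, 7, 5, 4, 6, 5])); // 12
```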

Conclusion

Kubernetes CronJobs are easy to create, but missed runs are easy to overlook.

The safest monitoring pattern is simple: make each important CronJob send a heartbeat after successful completion, then alert when that heartbeat does not arrive on time.

Kubernetes can tell you what happened to pods. Heartbeat monitoring tells you whether the scheduled task actually completed.

For production CronJobs, that difference matters.

Originally published at https://quietpulse.xyz/blog/kubernetes-cronjob-monitoring

Node.js Cron Job Monitoring Best Practices for Catching Silent Failures

quietpulse — Thu, 30 Apr 2026 06:22:33 +0000

Node.js cron job monitoring becomes important the first time a scheduled task quietly stops doing its job.

Your API can be healthy. Your frontend can load. Your uptime monitor can stay green. Meanwhile, a billing sync, cleanup task, report generator, or import job may have stopped running days ago.

That is the tricky part about cron-style work: the failure is often not visible from the outside.

The problem

Node.js scheduled jobs usually run outside the normal request cycle.

They might handle:

daily email digests
payment retries
database cleanup
cache refreshes
scheduled notifications
data imports
report generation
third-party API syncs

When one of these breaks, there may be no customer-facing error at first. The job is simply missing.

That missing work can become stale data, failed billing, unprocessed records, or support tickets later.

Why it happens

Node.js cron jobs can break in obvious and non-obvious ways.

A simple job might look like this:

cron.schedule('0 * * * *', async () => {
  await syncCustomers();
});

This can fail because syncCustomers() throws. But scheduled jobs can also fail because:

the worker process crashed
the scheduler was not started after deploy
environment variables changed
the cron expression is wrong
the job hangs on an external API
database queries never return
the job overlaps with itself
multiple app instances run the same task
a server timezone changed
errors are caught and only logged

A common mistake is forgetting proper async handling:

cron.schedule('*/15 * * * *', () => {
  syncInventory(); // missing await / error handling
});

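Because the promise returned by syncInventory() is neither awaited nor handled, a rejection never reaches the job's own error handling. Depending on the Node.js version and configuration, it either shows up only as an unhandled rejection warning or terminates the process, and both are easy to miss in production.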

Why it's dangerous

Missed scheduled jobs rarely create one neat incident.

They create slow damage.

A sync that fails once may not matter. A sync that fails for three days can create stale data, missing records, broken reports, or customer confusion.

The longer the issue continues, the more painful recovery becomes:

more data needs reprocessing
duplicate work becomes more likely
logs may rotate away
manual fixes become risky
customers may notice first

Uptime monitoring does not solve this. It tells you whether an endpoint responds. It does not tell you whether your scheduled jobs actually completed.

How to detect it

The core monitoring question is simple:

Did the job send a success signal within the expected time window?

This is usually called heartbeat monitoring.

The pattern is:

The scheduled job runs.
It completes the important work.
It sends a heartbeat ping.
A monitor expects that ping on schedule.
If the ping does not arrive, someone gets alerted.

For example:

a 15-minute job should check in every 15–20 minutes
an hourly job should check in every 60–70 minutes
a daily job should check in every 24–26 hours

This catches problems like missed runs, crashed workers, bad deploys, disabled schedulers, and jobs that hang before completion.

Simple solution

Here is a basic example using node-cron.

npm install node-cron

import cron from 'node-cron';

async function runJob() {
  console.log('Starting customer sync');

  await syncCustomers();

  await fetch('https://quietpulse.xyz/ping/{token}');

  console.log('Customer sync completed');
}

cron.schedule('0 * * * *', async () => {
  try {
    await runJob();
  } catch (error) {
    console.error('Customer sync failed:', error);
    // exitCode only applies when the process eventually exits; the scheduler keeps running.
    process.exitCode = 1;
  }
});

The key detail: send the heartbeat after the work succeeds.

Do not do this:

await fetch('https://quietpulse.xyz/ping/{token}');
await syncCustomers();

If the sync fails after the ping, your monitor will think the job succeeded.

Node.js 18 and later ship a global fetch. On older versions, install a small HTTP client such as undici:

npm install undici

import { fetch } from 'undici';

await fetch('https://quietpulse.xyz/ping/{token}');

You can also add a timeout:

async function sendHeartbeat() {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 5000);

  try {
    await fetch('https://quietpulse.xyz/ping/{token}', {
      signal: controller.signal,
    });
  } finally {
    clearTimeout(timeout);
  }
}

Then call it after the job finishes:

async function runJob() {
  await syncCustomers();
  await sendHeartbeat();
}
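
A single dropped request would otherwise look like a missed run, so you may want the ping itself to retry. This is a minimal sketch, not a required part of the pattern: sendHeartbeatWithRetry, its option names, and the injectable fetchImpl are illustrative assumptions. It needs a Node.js version with a global fetch and AbortSignal.timeout.

```javascript
// Retry the heartbeat a few times so one dropped request does not trigger
// a false "missed run" alert. sendHeartbeatWithRetry is a hypothetical helper.
async function sendHeartbeatWithRetry(url, options = {}) {
  const {
    attempts = 3,
    timeoutMs = 5000,
    delayMs = 1000,
    fetchImpl = fetch, // injectable for tests; defaults to the global fetch
  } = options;

  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      const response = await fetchImpl(url, {
        signal: AbortSignal.timeout(timeoutMs),
      });
      if (response.ok) return attempt; // number of attempts it took
      lastError = new Error(`Heartbeat returned HTTP ${response.status}`);
    } catch (error) {
      lastError = error; // network error or timeout
    }
    if (attempt < attempts) {
      // Wait a little before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}

// Usage inside the job, after the real work has succeeded:
//   await syncCustomers();
//   await sendHeartbeatWithRetry('https://quietpulse.xyz/ping/{token}');
```

Keep the retry budget short. If the ping still fails after a few attempts, letting the monitor alert is the correct outcome.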

Instead of building the monitoring side yourself, you can use a heartbeat monitoring service. The important part is the pattern: each successful job run should create an external signal, and missing signals should trigger alerts.

Common mistakes

1. Pinging too early

If you send a heartbeat before the real work, failures after that point are hidden.

Send the heartbeat after successful completion.

2. Relying only on process uptime

A process can be running while the scheduled task is broken.

PM2, Docker, systemd, or Kubernetes can tell you whether a process exists. They cannot always tell you whether a specific job completed.

3. Ignoring long runtimes

A job that usually takes 20 seconds but now takes 30 minutes may be failing in a slower way.

Long runtimes can cause overlap, stale data, and queue buildup.
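
If overlap is the main risk, a small in-process guard can skip a tick while the previous run is still going. This is a sketch with a hypothetical helper name (withOverlapGuard), and it only protects a single Node.js process, not several app instances.

```javascript
// Skip a run when the previous one is still in progress (single process only).
// withOverlapGuard is a hypothetical helper, not part of node-cron.
function withOverlapGuard(task, onSkip = () => {}) {
  let running = false;
  return async function guardedTask() {
    if (running) {
      onSkip(); // e.g. log or count the skipped run
      return false;
    }
    running = true;
    try {
      await task();
      return true;
    } finally {
      running = false; // always release, even when the task throws
    }
  };
}

// Usage with a scheduler callback (error handling omitted for brevity):
//   const guardedSync = withOverlapGuard(syncCustomers, () => console.warn('Skipped overlapping run'));
//   cron.schedule('*/15 * * * *', () => guardedSync().catch(console.error));
```

A skipped run sends no heartbeat, so repeated skips will still surface as a missing ping.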

4. Running jobs on every app instance

If your app runs on multiple servers and each one starts the scheduler, the same job may run multiple times.

Use a dedicated worker, external scheduler, or distributed lock when needed.
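
To make the distributed lock idea concrete, here is a sketch that uses an in-memory Map as a stand-in for a shared store. Everything in it is an assumption for illustration: createLockStore, tryAcquire, release, and the key format are hypothetical, and in production the "set if absent, with expiry" step must be done atomically by a system all instances share, such as a database or Redis.

```javascript
// In-memory stand-in for a shared lock store (illustrative only).
// A real deployment needs a store that every app instance can reach.
function createLockStore() {
  const locks = new Map(); // key -> { owner, expiresAt }

  return {
    // Acquire the lock if it is free or expired; return true on success.
    tryAcquire(key, owner, ttlMs, now = Date.now()) {
      const current = locks.get(key);
      if (current && current.expiresAt > now) return false;
      locks.set(key, { owner, expiresAt: now + ttlMs });
      return true;
    },

    // Release only if this owner still holds the lock.
    release(key, owner) {
      const current = locks.get(key);
      if (current && current.owner === owner) locks.delete(key);
    },
  };
}

// Two app instances reach the same scheduled tick; only one should run.
const store = createLockStore();
const tick = 'customer-sync:2026-04-30T10:00';
console.log(store.tryAcquire(tick, 'instance-a', 60000, 0)); // true
console.log(store.tryAcquire(tick, 'instance-b', 60000, 5)); // false
```

The expiry matters: if the winning instance crashes, the lock frees itself instead of blocking every future run.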

5. Swallowing errors

Logging errors is useful, but it is not the same as alerting.

try {
  await syncCustomers();
} catch (error) {
  console.error(error);
}

If nobody reads the logs, this is still a silent failure.
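
A safer shape is to log, notify, and then rethrow so the failure stays visible. In this sketch, runAndReport and notifyFailure are hypothetical names, and notifyFailure stands for whatever alerting channel you already use.

```javascript
// Log the error, tell someone, and rethrow so the run still counts as failed.
// runAndReport and notifyFailure are hypothetical names for illustration.
async function runAndReport(jobName, task, notifyFailure) {
  try {
    await task();
  } catch (error) {
    console.error(`${jobName} failed:`, error);
    try {
      await notifyFailure(jobName, error);
    } catch (notifyError) {
      // Do not let a broken alert channel mask the original error.
      console.error(`Could not report ${jobName} failure:`, notifyError);
    }
    throw error; // no success heartbeat should follow this
  }
}

// Usage: the heartbeat is only reached when the task did not throw.
//   await runAndReport('customer-sync', syncCustomers, notifyFailure);
//   await sendHeartbeat();
```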

Alternative approaches

Logs

Logs are useful for debugging what happened. They are weaker at detecting something that never happened.

If the job never ran, there may be no log line.

Error tracking

Error tracking tools can catch thrown exceptions and rejected promises.

They help when a job starts and fails loudly. They do not catch every missed run, disabled scheduler, or stuck process.

Uptime checks

Uptime checks are great for websites and APIs.

They do not confirm that a background job completed.

Queue dashboards

If your scheduled job creates queue work, queue metrics can help. Watch queue depth, retries, failed jobs, and processing latency.

But queue metrics may not catch the scheduler failing to enqueue work in the first place.

Database timestamps

You can store last_success_at in your database.

This works, but you still need something that checks whether the timestamp is too old and sends an alert.

FAQ

What is Node.js cron job monitoring?

It is the practice of checking whether scheduled Node.js tasks run successfully when expected. This includes jobs for syncs, cleanup, billing, reports, imports, and other background work.

How do I detect if a Node.js cron job stopped running?

Send a heartbeat after each successful run. If the heartbeat does not arrive within the expected interval, alert someone.

Are logs enough for Node.js scheduled jobs?

No. Logs help with debugging, but they do not reliably detect missed runs. If the job never starts, logs may not show anything useful.

Should cron jobs run inside the main Node.js app?

For small apps, it can work. For production systems, a dedicated worker, external scheduler, or distributed lock is usually safer.

Conclusion

Node.js cron job monitoring is about detecting missing work, not just errors.

A scheduled job can stop running while the rest of your app looks healthy. Add a heartbeat after successful completion, alert when it goes missing, and you will catch silent failures much earlier.

Originally published at https://quietpulse.xyz/blog/node-js-cron-job-monitoring-best-practices
