<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Vincent Abolarin</title>
    <description>The latest articles on Forem by Vincent Abolarin (@vincentabolarin).</description>
    <link>https://forem.com/vincentabolarin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3837739%2F3fdd5c49-e1f2-4158-8d24-1e5918fe0a76.jpg</url>
      <title>Forem: Vincent Abolarin</title>
      <link>https://forem.com/vincentabolarin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/vincentabolarin"/>
    <language>en</language>
    <item>
      <title>Your Cron Job has Hung, but You May Never Realize It</title>
      <dc:creator>Vincent Abolarin</dc:creator>
      <pubDate>Fri, 08 May 2026 12:25:31 +0000</pubDate>
      <link>https://forem.com/vincentabolarin/your-cron-job-has-hung-but-you-may-never-realize-it-45f5</link>
      <guid>https://forem.com/vincentabolarin/your-cron-job-has-hung-but-you-may-never-realize-it-45f5</guid>
      <description>&lt;p&gt;A crashed cron job is easy to catch. It exits with a non-zero code. Your monitoring fires. You get an alert.&lt;/p&gt;

&lt;p&gt;A hung cron job is different. It starts. It keeps running. It never crashes and never finishes. It holds a database connection, locks a file, consumes memory, and blocks every subsequent execution. Days later, you notice your data is three days stale and your server is running out of memory.&lt;/p&gt;

&lt;p&gt;No alert fired. No exit code was ever logged. The job was "running" the entire time.&lt;/p&gt;

&lt;p&gt;This is a hung job, and it's one of the most damaging failure modes in scheduled task infrastructure because it's entirely invisible to standard monitoring.&lt;/p&gt;




&lt;h2&gt;
  
  
  What causes cron jobs to hang
&lt;/h2&gt;

&lt;p&gt;Hung jobs almost always trace back to one of five causes:&lt;/p&gt;

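&lt;p&gt;&lt;strong&gt;Database deadlocks or long-running queries.&lt;/strong&gt; A query acquires a lock and then waits for another lock that's held by a different process. Most databases detect and abort a true deadlock, but a plain lock wait behind a long-running or idle transaction can last as long as that transaction does unless a lock or statement timeout is configured. Meanwhile the job sits in a waiting state, holding its own lock and blocking other operations.&lt;/p&gt;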

&lt;p&gt;&lt;strong&gt;Network timeouts without timeout configuration.&lt;/strong&gt; An HTTP request to an external API, a message queue connection, a remote file transfer — if no timeout is explicitly set, the default in most runtimes is to wait indefinitely. A server that stops responding mid-response will leave your job waiting forever.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infinite loops from unexpected input.&lt;/strong&gt; A loop that processes records one by one, where a malformed record causes the loop to re-process the same item, or where a queue never empties because new items are added as fast as they're consumed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory pressure causing extreme slowdown.&lt;/strong&gt; A job that processes large datasets and runs out of heap memory doesn't always crash — it can enter a state of constant garbage collection where it's technically running but making no forward progress.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;File system issues.&lt;/strong&gt; A job that writes to a full disk, tries to acquire a file lock held by a crashed previous instance, or waits for input from a pipe that nothing is writing to.&lt;/p&gt;
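&lt;p&gt;To make the infinite-loop cause concrete, here is a minimal sketch of a queue-draining loop with a hard cap on processed items. The &lt;code&gt;drainWithCap&lt;/code&gt; helper and its parameters are hypothetical names used for illustration, not part of any SDK.&lt;/p&gt;

```typescript
// Hypothetical guard: stop draining a queue after a fixed number of items,
// so a queue that refills as fast as it drains cannot keep the job alive forever.
function drainWithCap(
  queue: string[],
  maxItems: number,
  handle: (item: string) => void,
): number {
  let processed = 0;
  while (queue.length > 0) {
    if (processed >= maxItems) {
      // Fail loudly instead of looping forever.
      throw new Error(`Stopped after ${maxItems} items; queue still has ${queue.length}`);
    }
    handle(queue.shift() as string);
    processed += 1;
  }
  return processed;
}

const seen: string[] = [];
const count = drainWithCap(["a", "b", "c"], 10, (item) => {
  seen.push(item);
});
console.log(count); // 3
```

&lt;p&gt;Throwing when the cap is hit turns a potential hang into an ordinary failure that exits with an error.&lt;/p&gt;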




&lt;h2&gt;
  
  
  Why standard cron job monitoring doesn't catch hung jobs
&lt;/h2&gt;

&lt;p&gt;Standard heartbeat monitoring — where your job pings a URL when it completes — cannot detect hung jobs by design.&lt;/p&gt;

&lt;p&gt;The ping only fires when the job finishes. A hung job never finishes, so the ping never fires. From the monitor's perspective, the job simply hasn't completed yet. It has no way to know whether the job is still actively working or has been frozen for six hours.&lt;/p&gt;

&lt;p&gt;To detect hung jobs, you need two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A &lt;strong&gt;start ping&lt;/strong&gt; — so the monitor knows when the job began&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;maximum duration threshold&lt;/strong&gt; — so the monitor knows when the job has been running too long&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Only when both are present can an external service detect the difference between "job is still working" and "job has been stuck for four hours".&lt;/p&gt;

&lt;p&gt;This is why Crontify uses a start/success/fail ping model rather than a single completion heartbeat.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Crontify detects hung jobs
&lt;/h2&gt;

&lt;p&gt;When your job calls &lt;code&gt;start()&lt;/code&gt;, Crontify creates a run record with the current timestamp. The run is in a &lt;code&gt;running&lt;/code&gt; state.&lt;/p&gt;

&lt;p&gt;Every minute, Crontify's scheduler checks all runs in &lt;code&gt;running&lt;/code&gt; state. For each one, it calculates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;seconds_running&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;startedAt&lt;/span&gt;
&lt;span class="n"&gt;hung_threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gracePeriod&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;hung_job_timeout_multiplier&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;seconds_running&lt;/code&gt; exceeds the threshold, the run is marked as &lt;code&gt;hung&lt;/code&gt; and an alert fires. The default multiplier is 2 — so a monitor with a 30-minute grace period triggers a hung alert after 60 minutes of continuous running.&lt;/p&gt;

&lt;p&gt;This detection is entirely external to your process. It fires even if your job is completely frozen, even if your process is consuming 100% CPU in a tight loop, even if the event loop is blocked.&lt;/p&gt;
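&lt;p&gt;As a rough sketch of the threshold check described above (not Crontify's actual implementation; the &lt;code&gt;isHung&lt;/code&gt; helper and its parameter names are assumptions for illustration), the comparison could look like this in TypeScript:&lt;/p&gt;

```typescript
// Hypothetical helper: mirrors the formula above, not Crontify's real code.
function isHung(
  startedAtMs: number,
  nowMs: number,
  gracePeriodSeconds: number,
  multiplier: number = 2, // default multiplier described in the article
): boolean {
  const secondsRunning = (nowMs - startedAtMs) / 1000;
  const hungThreshold = gracePeriodSeconds * multiplier;
  return secondsRunning > hungThreshold;
}

// A 30-minute grace period gives a 60-minute hung threshold.
const startedAt = Date.UTC(2026, 0, 1, 0, 0, 0);
const after59Min = startedAt + 59 * 60 * 1000;
const after61Min = startedAt + 61 * 60 * 1000;
console.log(isHung(startedAt, after59Min, 30 * 60)); // false
console.log(isHung(startedAt, after61Min, 30 * 60)); // true
```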




&lt;h2&gt;
  
  
  Adding hung job detection to your cron jobs
&lt;/h2&gt;

&lt;p&gt;Install the SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @crontify/sdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The minimal instrumentation to enable hung job detection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CrontifyMonitor&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@crontify/sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;monitor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CrontifyMonitor&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CRONTIFY_API_KEY&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;monitorId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;your-monitor-id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// wrap() calls start() at the beginning automatically&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wrap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;processLargeDataset&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The &lt;code&gt;start()&lt;/code&gt; ping is sent when &lt;code&gt;wrap()&lt;/code&gt; is called. If &lt;code&gt;processLargeDataset()&lt;/code&gt; never resolves, Crontify detects the hung state after the threshold expires and sends an alert.&lt;/p&gt;

&lt;p&gt;For manual control:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
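  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;processLargeDataset&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;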
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;success&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;records_processed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fail&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;log&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stack&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="c1"&gt;// If this code never reaches success() or fail(), &lt;/span&gt;
&lt;span class="c1"&gt;// Crontify detects the hung state externally&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Setting an appropriate maximum duration threshold
&lt;/h2&gt;

&lt;p&gt;The hung job threshold is derived from the grace period you set for each monitor. Set it based on the longest reasonable runtime for your job — not the average.&lt;/p&gt;

&lt;p&gt;A job that normally takes 5 minutes but can legitimately take 20 minutes under heavy load should have a grace period of at least 25–30 minutes. If it's still running after 50–60 minutes (2× the grace period), something is wrong.&lt;/p&gt;

&lt;p&gt;Some rules of thumb:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Database backup jobs:&lt;/strong&gt; 2–3× the average backup duration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API sync jobs:&lt;/strong&gt; Set an explicit timeout on every HTTP request (e.g. 30 seconds), then set the monitor grace period to &lt;code&gt;(number of records × 30 seconds) + buffer&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data processing jobs:&lt;/strong&gt; Measure runtime over a representative sample of executions (10 is a bare minimum), then set the grace period to 2–3× the p95 duration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Email dispatch jobs:&lt;/strong&gt; Usually fast (under 5 minutes); a 15-minute grace period is generous&lt;/li&gt;
&lt;/ul&gt;
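&lt;p&gt;The API sync rule of thumb above can be worked through as a small calculation. This is only a sketch: the &lt;code&gt;apiSyncGracePeriodSeconds&lt;/code&gt; helper and the 5-minute default buffer are assumptions chosen for illustration.&lt;/p&gt;

```typescript
// Hypothetical helper implementing "(number of records x per-request timeout) + buffer".
function apiSyncGracePeriodSeconds(
  recordCount: number,
  requestTimeoutSeconds: number = 30,
  bufferSeconds: number = 300, // assumed 5-minute buffer; tune for your job
): number {
  return recordCount * requestTimeoutSeconds + bufferSeconds;
}

const graceSeconds = apiSyncGracePeriodSeconds(100);
console.log(graceSeconds); // 3300 seconds, i.e. 55 minutes
console.log(graceSeconds * 2); // 6600 seconds: hung threshold with the default 2x multiplier
```

&lt;p&gt;This is a worst-case bound that assumes every request runs to its full timeout, so the resulting grace period is deliberately generous.&lt;/p&gt;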




&lt;h2&gt;
  
  
  Preventing hung jobs in the first place
&lt;/h2&gt;

&lt;p&gt;Hung job detection tells you when it happens. These patterns reduce how often it happens:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Set explicit timeouts on all network calls:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Never do this in a cron job&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Always do this&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AbortSignal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;// 30 second timeout&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use database query timeouts:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// PostgreSQL statement timeout (per connection)&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;$executeRaw&lt;/span&gt;&lt;span class="s2"&gt;`SET statement_timeout = '60s'`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

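&lt;span class="c1"&gt;// Or per-query: SET LOCAL only takes effect inside a transaction,&lt;/span&gt;
&lt;span class="c1"&gt;// and a raw query can carry only one statement&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$transaction&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;$executeRaw&lt;/span&gt;&lt;span class="s2"&gt;`SET LOCAL statement_timeout = '30s'`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;$queryRaw&lt;/span&gt;&lt;span class="s2"&gt;`SELECT * FROM large_table WHERE condition = true`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;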
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Add a process-level timeout as a last resort:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Kill the entire process if the job takes more than 10 minutes&lt;/span&gt;
&lt;span class="c1"&gt;// Only appropriate for jobs running in isolated processes&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;TIMEOUT_MS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;timeout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Job exceeded maximum duration, exiting&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nx"&gt;TIMEOUT_MS&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unref&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Don't prevent normal exit&lt;/span&gt;

&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runJob&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;clearTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



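&lt;p&gt;Note that &lt;code&gt;process.exit()&lt;/code&gt; ends the process immediately rather than throwing, so a surrounding &lt;code&gt;catch&lt;/code&gt; block will not run. If you want the run reported as failed rather than waiting for the hung threshold, send the fail ping (for example with &lt;code&gt;monitor.fail()&lt;/code&gt;) before exiting. A failed run is correct here and preferable to a hung run.&lt;/p&gt;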




&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What is the difference between a hung job and a missed run?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A missed run never started — no start ping arrived within the grace period after the scheduled time. A hung job started but never finished — a start ping arrived, but no success or fail ping followed within the maximum duration threshold. Both require external monitoring to detect, but they represent different root causes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can a job be both hung and missed?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, in sequence. If a job hangs indefinitely, it may still be technically "running" when the next scheduled execution is due. If the next instance detects the previous run is still active, it may refuse to start (depending on your configuration), resulting in what appears as a missed run. Crontify detects the hung state and fires an overlap alert if a new instance starts while the previous one hasn't finished.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I test that hung job detection is working?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create a test monitor, then instrument a test job that calls &lt;code&gt;start()&lt;/code&gt; and sleeps indefinitely (or simply never calls &lt;code&gt;success()&lt;/code&gt; or &lt;code&gt;fail()&lt;/code&gt;). You should receive an alert within one detection cycle after your threshold expires.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Test hung job detection — never call this in production&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{});&lt;/span&gt; &lt;span class="c1"&gt;// never resolves&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Start monitoring for free
&lt;/h2&gt;

&lt;p&gt;Crontify is free for up to 5 monitors — no credit card required.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://crontify.com" rel="noopener noreferrer"&gt;crontify.com&lt;/a&gt; — SDK on npm as &lt;code&gt;@crontify/sdk&lt;/code&gt;.&lt;/p&gt;

</description>
      <category>cron</category>
      <category>cronjob</category>
      <category>devops</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>What Is AI, Really? A Plain-English Explanation</title>
      <dc:creator>Vincent Abolarin</dc:creator>
      <pubDate>Mon, 13 Apr 2026 19:53:40 +0000</pubDate>
      <link>https://forem.com/vincentabolarin/what-is-ai-really-a-plain-english-explanation-ecg</link>
      <guid>https://forem.com/vincentabolarin/what-is-ai-really-a-plain-english-explanation-ecg</guid>
      <description>&lt;p&gt;You’ve heard the word everywhere. On the news, in your workplace, from your kids. Artificial intelligence — AI — seems to be the answer to everything right now. But what actually is it? And do you need to care?&lt;/p&gt;

&lt;p&gt;The honest answer: yes, you probably do. But it’s nowhere near as complicated as the tech world makes it sound. This guide cuts through all the jargon and tells you what AI actually is, what it can do for you right now, and how to start using it without a single line of code or a computer science degree.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s Start With What AI Is Not
&lt;/h2&gt;

&lt;p&gt;Before explaining what AI is, it helps to clear up what it isn’t — because most people’s mental image is completely wrong.&lt;/p&gt;

&lt;p&gt;AI is not a robot that thinks like a human. It’s not HAL 9000 from 2001: A Space Odyssey. It doesn’t have feelings, motivations, or plans to take over the world. It’s also not magic, and it’s definitely not infallible.&lt;/p&gt;

&lt;p&gt;AI is, at its core, a very powerful pattern-matching system. It has been trained on enormous amounts of text, images, code, and data — and it has learned to recognise patterns in all of that information well enough to generate useful responses to your questions.&lt;/p&gt;

&lt;p&gt;Think of it like this: if you read every cookbook ever written, you’d get very good at suggesting recipes. You wouldn’t understand food the way a chef does, but you’d be remarkably useful in a kitchen. That’s roughly what modern AI does — at scale, and very fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  So What Is “Generative AI” Exactly?
&lt;/h2&gt;

&lt;p&gt;You’ll often hear the phrase generative AI used alongside names like ChatGPT, Claude, and Gemini. Generative AI refers specifically to AI that creates things — text, images, code, audio — rather than just analysing or sorting existing data.&lt;/p&gt;

&lt;p&gt;When you type a question into ChatGPT or Claude and get a paragraph back, that’s generative AI at work. It has generated a response based on the patterns it learned during training. It hasn’t looked up your answer in a database the way Google does — it has essentially composed a response on the fly.&lt;/p&gt;

&lt;p&gt;This is what makes it so flexible and, frankly, so impressive. You can ask it to write an email, summarise a document, explain a legal clause, plan a holiday, or debug a spreadsheet formula — and it will give you a coherent, useful answer each time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Main AI Tools You’ll Hear About
&lt;/h2&gt;

&lt;p&gt;Right now, there are four names you’ll encounter most often:&lt;/p&gt;

&lt;p&gt;ChatGPT (made by OpenAI) is the one that started the mainstream AI wave in late 2022. It’s the most widely used, holding around 60% of the AI chatbot market as of early 2026, and it’s a solid all-rounder. It handles writing, research, answering questions, and even generating images. Most people start here because it’s the name they’ve heard most.&lt;/p&gt;

&lt;p&gt;Claude (made by Anthropic) has been quietly gaining ground. When you give it specific instructions — like “keep it under 200 words, conversational tone, no corporate jargon” — Claude actually follows them, which regular users quickly notice. It’s particularly strong for reading and analysing long documents.&lt;/p&gt;

&lt;p&gt;Gemini (made by Google) is worth knowing about if you already live inside Google’s world — Gmail, Docs, Drive, Calendar. Its tight integration with Google services is its main selling point, meaning it can reach into your actual emails and documents rather than working in isolation.&lt;/p&gt;

&lt;p&gt;Perplexity is worth paying attention to for everyday questions. Unlike the others, it searches the web in real time and shows you its sources — making it the most useful option when you need current, factual information you can verify.&lt;/p&gt;

&lt;p&gt;All four have free tiers. You don’t need to pay anything to get started.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Can AI Actually Do For a Normal Person?
&lt;/h2&gt;

&lt;p&gt;Here’s where it gets practical. Forget the tech demos and the business case studies. For a regular person going about their day, AI can:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Save you time on writing.&lt;/strong&gt; Drafting emails, writing a complaint letter, putting together a CV, composing a wedding speech — AI can produce a solid first draft in seconds that you then tweak into your own voice. You don’t have to stare at a blank page anymore.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explain complicated things simply.&lt;/strong&gt; Got a confusing insurance document? A legal letter you don’t understand? Paste it into ChatGPT or Claude and ask it to explain it like you’re twelve. It will.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Help you think things through.&lt;/strong&gt; Planning a home renovation, deciding whether to switch jobs, trying to figure out a budget — AI is surprisingly good as a thinking partner. It won’t make decisions for you, but it will lay out considerations you might not have thought of.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Answer questions without a twenty-tab Google session.&lt;/strong&gt; Instead of opening a dozen websites and piecing together an answer, you can ask AI a direct question and get a direct answer. For straightforward factual questions, this alone saves enormous amounts of time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Handle repetitive tasks.&lt;/strong&gt; Summarising meeting notes, formatting a list, turning bullet points into a paragraph — the small tedious jobs that eat your day are exactly what AI does fastest.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AI Cannot Do (And This Matters)
&lt;/h2&gt;

&lt;p&gt;Knowing the limits is just as important as knowing the capabilities — possibly more so.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI makes things up.&lt;/strong&gt; This is the most important thing to understand. When an AI doesn’t know something, it often doesn’t say “I don’t know.” Instead, it can generate a plausible-sounding answer anyway. This is called a hallucination, and it happens with all AI tools, even the best ones. Always verify important facts, especially anything medical, legal, or financial.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI’s knowledge has a cut-off date.&lt;/strong&gt; Most AI models were trained on data up to a certain point in time. Ask it about last week’s news and you may get an outdated or fabricated answer. Perplexity is the exception here, since it searches the web in real time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI doesn’t understand context the way humans do.&lt;/strong&gt; It doesn’t know your situation, your history, or the nuance behind your question unless you tell it. The more context you give, the better the output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI is not a therapist, a doctor, or a lawyer.&lt;/strong&gt; It can give you general information, but it cannot replace professional advice for anything serious.&lt;/p&gt;

&lt;h2&gt;
  
  
  Do You Need to Be Technical to Use It?
&lt;/h2&gt;

&lt;p&gt;Not at all. If you can type a sentence, you can use AI. The interface for every major AI tool looks like a chat window — you type, it responds. There’s nothing to install for the basic versions, no account configuration required beyond signing up, and no commands to memorise.&lt;/p&gt;

&lt;p&gt;The one skill worth developing is knowing how to ask good questions — what the AI world calls “prompting.” The better you describe what you want, the better the result you get. But even without any prompting knowledge, you’ll get useful output immediately. You can build up the skill gradually as you go.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Should You Start?
&lt;/h2&gt;

&lt;p&gt;If you’ve never used an AI tool before, here’s the simplest possible starting point:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://claude.ai/" rel="noopener noreferrer"&gt;claude.ai&lt;/a&gt; or &lt;a href="https://chatgpt.com/" rel="noopener noreferrer"&gt;chatgpt.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Create a free account&lt;/li&gt;
&lt;li&gt;Type something you’d normally Google — a question, a task, something you’ve been meaning to write&lt;/li&gt;
&lt;li&gt;See what comes back&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That’s it. You don’t need a plan, a course, or a tutorial. Just start a conversation.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>explainlikeimfive</category>
      <category>chatgpt</category>
    </item>
    <item>
      <title>How to add cron job monitoring without changing your infrastructure</title>
      <dc:creator>Vincent Abolarin</dc:creator>
      <pubDate>Sun, 12 Apr 2026 20:15:43 +0000</pubDate>
      <link>https://forem.com/vincentabolarin/how-to-add-cron-job-monitoring-without-changing-your-infrastructure-khi</link>
      <guid>https://forem.com/vincentabolarin/how-to-add-cron-job-monitoring-without-changing-your-infrastructure-khi</guid>
      <description>&lt;p&gt;Most cron job monitoring advice involves significant infrastructure changes: configure a centralised logging stack, set up log forwarding, deploy a sidecar, integrate with your existing observability platform, write a custom alerting rule.&lt;/p&gt;

&lt;p&gt;These are reasonable options for large teams with dedicated infrastructure engineers. For everyone else — a solo developer, a small team, a startup moving fast — the overhead of the setup is what prevents monitoring from happening at all. The result is production jobs running unmonitored for months until something visibly breaks.&lt;/p&gt;

&lt;p&gt;There is a simpler model. Your jobs report their own status over HTTP. No new infrastructure. No log pipelines. No sidecar processes. Just two HTTP calls per job run: one when it starts and one when it succeeds or fails.&lt;/p&gt;




&lt;h2&gt;
  
  
  How the dead man's switch model works
&lt;/h2&gt;

&lt;p&gt;Traditional monitoring is pull-based: a monitoring system checks whether something is alive. Dead man's switch monitoring is push-based: the thing being monitored announces its own health.&lt;/p&gt;

&lt;p&gt;For cron jobs specifically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your job sends an HTTP POST to a monitoring endpoint when it starts.&lt;/li&gt;
&lt;li&gt;Your job sends an HTTP POST when it completes successfully.&lt;/li&gt;
&lt;li&gt;If no start ping arrives within the expected window, an alert fires. If a start ping arrives but no success ping follows within the maximum duration, an alert fires.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's the entire model. The monitoring service holds the expected schedule and grace periods. Your job holds nothing except the API key and the monitor ID.&lt;/p&gt;

&lt;p&gt;No changes to your cron infrastructure. Your jobs run wherever they already run — a VPS, a container, a serverless function, a Raspberry Pi. The monitoring is entirely external.&lt;/p&gt;
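&lt;p&gt;To illustrate the model above, here is a minimal sketch of how a monitoring service might evaluate one scheduled run from the pings it has received. It is not Crontify's implementation: &lt;code&gt;evaluateRun&lt;/code&gt;, &lt;code&gt;RunObservation&lt;/code&gt;, &lt;code&gt;startWindowMs&lt;/code&gt; and &lt;code&gt;maxDurationMs&lt;/code&gt; are hypothetical names, and all times are epoch milliseconds.&lt;/p&gt;

```typescript
// Hypothetical monitor-side evaluation of one scheduled run.
type RunStatus = "ok" | "pending" | "running" | "missed" | "hung";

interface RunObservation {
  scheduledAt: number;
  startPingAt?: number;
  successPingAt?: number;
}

function evaluateRun(
  run: RunObservation,
  now: number,
  startWindowMs: number,
  maxDurationMs: number,
): RunStatus {
  if (run.startPingAt === undefined) {
    // No start ping yet: missed once the expected window has passed.
    return now - run.scheduledAt > startWindowMs ? "missed" : "pending";
  }
  if (run.successPingAt !== undefined) return "ok";
  // Started but not finished: hung once the maximum duration has passed.
  return now - run.startPingAt > maxDurationMs ? "hung" : "running";
}

const minute = 60_000;
const missed = evaluateRun({ scheduledAt: 0 }, 10 * minute, 5 * minute, 30 * minute);
const hung = evaluateRun({ scheduledAt: 0, startPingAt: minute }, 40 * minute, 5 * minute, 30 * minute);
const ok = evaluateRun(
  { scheduledAt: 0, startPingAt: minute, successPingAt: 3 * minute },
  40 * minute,
  5 * minute,
  30 * minute,
);
console.log(missed, hung, ok); // missed hung ok
```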




&lt;h2&gt;
  
  
  The integration is three lines
&lt;/h2&gt;

&lt;p&gt;In Node.js with the &lt;code&gt;@crontify/sdk&lt;/code&gt; package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CrontifyMonitor&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@crontify/sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;monitor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CrontifyMonitor&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CRONTIFY_API_KEY&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;monitorId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;your-monitor-id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wrap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;yourExistingJobFunction&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;wrap()&lt;/code&gt; sends the start ping before calling your function, and the success or fail ping when it returns or throws. Your existing job function is unchanged.&lt;/p&gt;

&lt;p&gt;For jobs that are not in Node.js, the same pings (&lt;code&gt;start&lt;/code&gt;, &lt;code&gt;success&lt;/code&gt;, and &lt;code&gt;fail&lt;/code&gt;) are plain HTTP POST requests with an &lt;code&gt;X-API-Key&lt;/code&gt; header:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start ping&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://api.crontify.com/api/v1/ping/your-monitor-id/start &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"X-API-Key: ck_live_your_key"&lt;/span&gt;

&lt;span class="c"&gt;# Your job runs here&lt;/span&gt;

&lt;span class="c"&gt;# Success ping&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://api.crontify.com/api/v1/ping/your-monitor-id/success &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"X-API-Key: ck_live_your_key"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a shell script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt;

&lt;span class="nv"&gt;MONITOR_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-monitor-id"&lt;/span&gt;
&lt;span class="nv"&gt;API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CRONTIFY_API_KEY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://api.crontify.com/api/v1/ping"&lt;/span&gt;

ping&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  curl &lt;span class="nt"&gt;-fsS&lt;/span&gt; &lt;span class="nt"&gt;--max-time&lt;/span&gt; 10 &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BASE_URL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;MONITOR_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;1&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"X-API-Key: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;API_KEY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null 2&amp;gt;&amp;amp;1 &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;
  &lt;span class="c"&gt;# || true prevents monitoring failures from killing the job&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

ping start

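&lt;span class="c"&gt;# Your existing job logic. Running it inside an if keeps set -e from exiting before the fail ping.&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; /usr/local/bin/your-backup-script.sh&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
  ping success
&lt;span class="k"&gt;else&lt;/span&gt;
  ping fail
  &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi&lt;/span&gt;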
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;monitor_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.crontify.com/api/v1/ping/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;monitor_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-API-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CRONTIFY_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
            &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;pass&lt;/span&gt;  &lt;span class="c1"&gt;# monitoring failure should never block the job
&lt;/span&gt;
&lt;span class="n"&gt;monitor_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-monitor-id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nf"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;monitor_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;your_existing_job&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;monitor_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;monitor_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
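&lt;p&gt;If you have several Python jobs, the start/success/fail sequence can be factored into a decorator that behaves like &lt;code&gt;wrap()&lt;/code&gt;. This is a sketch, not part of any Crontify package: &lt;code&gt;monitored&lt;/code&gt; and &lt;code&gt;send&lt;/code&gt; are hypothetical names, and &lt;code&gt;send&lt;/code&gt; is assumed to swallow its own errors the way &lt;code&gt;ping()&lt;/code&gt; does above. The usage at the bottom passes a list's &lt;code&gt;append&lt;/code&gt; as a stand-in sender so the example runs without network access.&lt;/p&gt;

```python
import functools
from typing import Callable


def monitored(send: Callable[[str], None]):
    """Wrap a job so it reports start, then success or fail, via send(event)."""
    def decorator(job):
        @functools.wraps(job)
        def wrapper(*args, **kwargs):
            send("start")
            try:
                result = job(*args, **kwargs)
            except Exception:
                send("fail")
                raise  # re-raise so the caller (and cron) still sees the failure
            send("success")
            return result
        return wrapper
    return decorator


# Usage with a stand-in sender that only records the events it is given.
events = []


@monitored(events.append)
def nightly_sync():
    return 42


nightly_sync()
```

&lt;p&gt;In a real job you would pass something like &lt;code&gt;lambda event: ping(event, monitor_id)&lt;/code&gt; instead of &lt;code&gt;events.append&lt;/code&gt;.&lt;/p&gt;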






&lt;h2&gt;
  
  
  What you get without touching your infrastructure
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Missed run detection.&lt;/strong&gt; &lt;a href="https://crontify.com" rel="noopener noreferrer"&gt;Crontify&lt;/a&gt; parses your cron expression and knows when each run is expected. If no start ping arrives within your configured grace period, an alert fires. This catches missed runs caused by server reboots, cron daemon failures, and deleted crontabs — all situations where your existing logging would produce no evidence at all.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hung job detection.&lt;/strong&gt; If a start ping arrives but no success or fail ping follows within your maximum duration threshold, the run is marked as hung and an alert fires. This catches deadlocks, infinite loops, and network hangs that keep a process alive but not productive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Run history.&lt;/strong&gt; Every run is recorded with start time, end time, duration, and status. You get a dashboard showing run frequency, success rate, and duration trends over time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recovery alerts.&lt;/strong&gt; When a monitor that was in a failing state receives a healthy ping, a recovery notification is sent automatically. You know when incidents are resolved, not just when they start.&lt;/p&gt;
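&lt;p&gt;The recovery behaviour can be pictured as a small state transition check. The sketch below is an assumption about how such logic might look, not Crontify's implementation; the state names and the choice to alert on every failing result are illustrative.&lt;/p&gt;

```python
from typing import Optional

# States treated as failing in this sketch (hypothetical labels).
FAILING = {"fail", "missed", "hung"}


def notification_for(previous: Optional[str], current: str) -> Optional[str]:
    """Return which notification to send when a monitor moves between states."""
    if current in FAILING:
        return "alert"
    if previous in FAILING:
        return "recovery"  # first healthy result after a failing one
    return None  # healthy to healthy (or first ever healthy run): nothing to send
```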




&lt;h2&gt;
  
  
  Adding failure context without a logging pipeline
&lt;/h2&gt;

&lt;p&gt;When a job fails, attaching the error context directly to the fail ping delivers it to your Slack or email alert without needing to configure log forwarding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runJob&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;success&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
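  &lt;span class="c1"&gt;// err is unknown under strict TypeScript, so normalise it before reading its fields&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fail&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;log&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// up to 10,000 characters&lt;/span&gt;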
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first 500 characters appear inline in the Slack alert. The full log is stored in the run detail and accessible from the dashboard. This is not a replacement for a structured logging stack in large teams, but for a solo developer who doesn't have one, it means full failure context in the notification rather than a task to SSH into a server and find the log file.&lt;/p&gt;
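&lt;p&gt;For a Python job, the same failure context can be assembled from the exception's traceback. This is a sketch under stated assumptions: the &lt;code&gt;message&lt;/code&gt; and &lt;code&gt;log&lt;/code&gt; field names mirror the TypeScript example, but whether the HTTP fail endpoint accepts this exact body is not confirmed here, and truncating on the client side (keeping the tail) is a design choice of this example, not documented behaviour. &lt;code&gt;build_fail_payload&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
import traceback

MAX_LOG_CHARS = 10_000  # log size limit mentioned in the example above


def build_fail_payload(exc: BaseException) -> dict:
    """Build a JSON-serialisable body for a fail ping from an exception."""
    log = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    # Keep the tail: in a Python traceback the raised exception is on the last line.
    return {"message": str(exc), "log": log[-MAX_LOG_CHARS:]}


# Usage: capture a failure and build the payload for it.
try:
    raise RuntimeError("disk full")
except RuntimeError as err:
    payload = build_fail_payload(err)
```

&lt;p&gt;The resulting dictionary could then be sent as the JSON body of the fail request, for example with the &lt;code&gt;json=&lt;/code&gt; argument of &lt;code&gt;requests.post&lt;/code&gt;.&lt;/p&gt;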




&lt;h2&gt;
  
  
  The cost of not monitoring
&lt;/h2&gt;

&lt;p&gt;The question is not whether monitoring is worth having. The question is whether its benefit justifies the setup cost.&lt;/p&gt;

&lt;p&gt;For infrastructure-heavy approaches, the answer is often no — until after the first incident that costs more than the setup would have. For the dead man's switch model with HTTP pings, the setup cost is under an hour for a first job. Once you have the pattern in one job, adding it to subsequent jobs is a few minutes each.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://crontify.com" rel="noopener noreferrer"&gt;Crontify&lt;/a&gt; is free for up to 5 monitors — no credit card required.&lt;/p&gt;

</description>
      <category>softwareengineering</category>
      <category>monitoring</category>
      <category>howto</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>How to add monitoring to gocron scheduled jobs in Go</title>
      <dc:creator>Vincent Abolarin</dc:creator>
      <pubDate>Sat, 04 Apr 2026 15:39:49 +0000</pubDate>
      <link>https://forem.com/vincentabolarin/how-to-add-monitoring-to-gocron-scheduled-jobs-in-go-80c</link>
      <guid>https://forem.com/vincentabolarin/how-to-add-monitoring-to-gocron-scheduled-jobs-in-go-80c</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/go-co-op/gocron" rel="noopener noreferrer"&gt;gocron&lt;/a&gt; is one of the most widely used job scheduling libraries in Go. It handles the hard parts of scheduling (cron expressions, concurrency control, timezone awareness, singleton modes) and gets out of your way.&lt;/p&gt;

&lt;p&gt;What it doesn't do is tell you when something goes wrong.&lt;/p&gt;

&lt;p&gt;The gocron v2 documentation lists a &lt;code&gt;Monitor&lt;/code&gt; interface and a &lt;code&gt;MonitorStatus&lt;/code&gt; interface for collecting metrics from job execution. Both entries note the same thing: &lt;em&gt;"There are currently no open source implementations of the Monitor interface available."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That gap is what this post addresses. Here's how to instrument your gocron jobs with external monitoring so you're alerted the moment a job fails, hangs indefinitely, or stops running on schedule.&lt;/p&gt;




&lt;h2&gt;
  
  
  What gocron monitoring gives you natively
&lt;/h2&gt;

&lt;p&gt;gocron v2 exposes two scheduler-level monitoring hooks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;Monitor&lt;/code&gt;&lt;/strong&gt; collects metrics per job execution. You implement the interface and attach it to the scheduler. gocron calls your implementation before and after each job run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;MonitorStatus&lt;/code&gt;&lt;/strong&gt; extends &lt;code&gt;Monitor&lt;/code&gt; with error and status tracking per execution.&lt;/p&gt;

&lt;p&gt;Both are useful for local observability — feeding metrics into Prometheus, for example. But they don't solve the most important monitoring problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Missed runs&lt;/strong&gt;: if gocron itself crashes or your process dies, nothing fires an alert because the scheduler is gone.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hung jobs&lt;/strong&gt;: gocron can detect that a job is taking longer than expected, but it doesn't alert anyone externally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Silent failures&lt;/strong&gt;: a job that completes without error but processes zero records looks identical to a successful run.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;External monitoring solves all three. The approach: your jobs send start and completion pings to an external service. If a ping doesn't arrive when expected, you get alerted — even if your entire Go process is down.&lt;/p&gt;




&lt;h2&gt;
  
  
  The basic instrumentation pattern
&lt;/h2&gt;

&lt;p&gt;For simple gocron jobs, add HTTP pings around the work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;
    &lt;span class="s"&gt;"log"&lt;/span&gt;
    &lt;span class="s"&gt;"net/http"&lt;/span&gt;
    &lt;span class="s"&gt;"time"&lt;/span&gt;

    &lt;span class="s"&gt;"github.com/go-co-op/gocron/v2"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;apiKey&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ck_live_your_api_key"&lt;/span&gt;
    &lt;span class="n"&gt;monitorID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"your-monitor-id"&lt;/span&gt;
    &lt;span class="n"&gt;baseURL&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://api.crontify.com/api/v1/ping"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%s/%s/%s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;monitorID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"POST"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ping %s: failed to build request: %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Header&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"X-API-Key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Timeout&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Do&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ping %s: request failed: %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;syncRecords&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// your job logic&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;gocron&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewScheduler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewJob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;gocron&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CronJob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"0 2 * * *"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;gocron&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewTask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"start"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;syncRecords&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="c"&gt;// Optionally POST error details to the fail endpoint&lt;/span&gt;
                &lt;span class="n"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"fail"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"syncRecords failed: %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="n"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"success"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you missed run detection (if the start ping doesn't arrive within your grace period) and failed job detection (if &lt;code&gt;fail&lt;/code&gt; is pinged instead of &lt;code&gt;success&lt;/code&gt;).&lt;/p&gt;




&lt;h2&gt;
  
  
  Adding hung job detection
&lt;/h2&gt;

&lt;p&gt;Hung job detection requires a start ping followed by a success or fail ping within a maximum duration. That's already provided by the pattern above — Crontify's scheduler checks for runs that started but never completed.&lt;/p&gt;

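&lt;p&gt;You can also add an application-level timeout using Go's &lt;code&gt;context&lt;/code&gt; package (the snippet needs &lt;code&gt;"context"&lt;/code&gt; in its import list) so the task stops waiting for a job that runs too long and reports it as failed:&lt;br&gt;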
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;gocron&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewTask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Minute&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"start"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;done&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;done&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;syncRecords&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;

    &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;done&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"fail"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"job failed: %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"success"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"fail"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"job exceeded 30 minute timeout"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}),&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;syncRecords()&lt;/code&gt; runs beyond 30 minutes, the context cancels, a fail ping is sent, and Crontify records the run as failed. The external hung job detection provides a second safety net — if your process itself hangs and never sends any ping, the monitor will alert after the threshold regardless.&lt;/p&gt;




&lt;h2&gt;
  
  
  Attaching metadata for silent failure detection
&lt;/h2&gt;

&lt;p&gt;Go jobs often process records, sync data, or run aggregations. Attaching the count of what was actually processed lets you define alert rules on the output — firing an alert when a job succeeds but processes nothing.&lt;/p&gt;

&lt;p&gt;Crontify's &lt;code&gt;success&lt;/code&gt; ping endpoint accepts a JSON body, so the metadata can travel with the ping:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
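&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// New imports for this snippet; fmt, log, net/http and time are also required.&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;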
    &lt;span class="s"&gt;"bytes"&lt;/span&gt;
    &lt;span class="s"&gt;"encoding/json"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;SuccessPayload&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Meta&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt; &lt;span class="s"&gt;`json:"meta"`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;pingSuccess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
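    &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Marshal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SuccessPayload&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Meta&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"pingSuccess: encoding meta failed: %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;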

    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%s/%s/success"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;monitorID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
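    &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"POST"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"pingSuccess: building request failed: %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;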
    &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Header&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"X-API-Key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Header&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Content-Type"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"application/json"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Timeout&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Do&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"pingSuccess failed: %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
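    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c"&gt;// A non-2xx status means the ping was not accepted, so surface it in the logs.&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusCode&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="m"&gt;300&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"pingSuccess rejected: %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;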
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// In your job:&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;syncRecords&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"fail"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;pingSuccess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s"&gt;"rows_synced"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"duration_ms"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DurationMs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"errors_skipped"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In &lt;a href="https://crontify.com" rel="noopener noreferrer"&gt;Crontify&lt;/a&gt;, you can then define a rule: &lt;code&gt;rows_synced eq 0&lt;/code&gt; → fire alert. The run is still logged as a success, but you get an immediate notification that something upstream is broken.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrapping it up as a reusable helper
&lt;/h2&gt;

&lt;p&gt;If you have multiple gocron jobs to instrument, a small wrapper avoids repetition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;CrontifyJob&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;MonitorID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;APIKey&lt;/span&gt;    &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;BaseURL&lt;/span&gt;   &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;CrontifyJob&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Wrap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"start"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"fail"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"job %s failed: %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MonitorID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;meta&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pingSuccess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"success"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;nightly&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;CrontifyJob&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;MonitorID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"mon_abc123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;APIKey&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"CRONTIFY_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;BaseURL&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="s"&gt;"https://api.crontify.com/api/v1/ping"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewJob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;gocron&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CronJob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"0 2 * * *"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;gocron&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewTask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nightly&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Wrap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;syncRecords&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"rows_synced"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Count&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
    &lt;span class="p"&gt;})),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What you get
&lt;/h2&gt;

&lt;p&gt;After instrumenting your gocron jobs with external monitoring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Missed runs&lt;/strong&gt; alert you when a job doesn't start within its grace period — catches process crashes, server reboots, and OOM kills that take down the entire scheduler.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hung jobs&lt;/strong&gt; alert you when a job starts but never finishes — catches deadlocks, infinite loops, and database locks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Silent failures&lt;/strong&gt; alert you when a job completes but produces no output. This catches empty upstream responses, queries whose conditions match nothing, and zero-record syncs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recovery alerts&lt;/strong&gt; fire automatically when a previously failing job returns to healthy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://crontify.com" rel="noopener noreferrer"&gt;Crontify&lt;/a&gt; is free for up to 5 monitors. No credit card required.&lt;/p&gt;

</description>
      <category>gocron</category>
      <category>go</category>
      <category>devops</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>The cron job that always succeeded and never worked</title>
      <dc:creator>Vincent Abolarin</dc:creator>
      <pubDate>Sat, 21 Mar 2026 23:44:19 +0000</pubDate>
      <link>https://forem.com/vincentabolarin/the-cron-job-that-always-succeeded-and-never-worked-d5n</link>
      <guid>https://forem.com/vincentabolarin/the-cron-job-that-always-succeeded-and-never-worked-d5n</guid>
      <description>&lt;p&gt;Cron job monitoring usually answers one question: did the job run?&lt;/p&gt;

&lt;p&gt;That's the wrong question.&lt;/p&gt;

&lt;p&gt;Your job ran last night. Exited 0. No exceptions in the logs. Your uptime dashboard is green. And somewhere in your database, a table that should have 50,000 new rows from last night's sync has zero.&lt;/p&gt;

&lt;p&gt;Not an error. Not a crash. A silence.&lt;/p&gt;

&lt;p&gt;The job ran. It just didn't do anything. Maybe the upstream API returned an empty response. Maybe a config change stopped the data flowing. Maybe a schema migration broke a query in a way that returns zero rows instead of throwing. The job saw nothing wrong. So it reported nothing wrong.&lt;/p&gt;

&lt;p&gt;You find out when a customer emails you.&lt;/p&gt;

&lt;p&gt;This is a &lt;strong&gt;silent failure&lt;/strong&gt; — and it is the hardest class of production failure to catch because the entire failure chain reports success. Your cron scheduler: success. Your job runner: success. Your cron job monitoring tool: success. Meanwhile your data is stale, your pipeline is broken, and nobody knows.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why cron job monitoring misses silent failures
&lt;/h2&gt;

&lt;p&gt;The standard model for cron job monitoring is heartbeat monitoring. Your job pings a URL when it runs. If the ping doesn't arrive within a grace period, you get an alert. Simple and effective for missed runs.&lt;/p&gt;

&lt;p&gt;Some tools extend this to start/finish pings, which adds hung job detection — jobs that start but never complete. Valuable. But both approaches are still asking only one question: &lt;em&gt;did the job execute?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Silent failures require asking a different question: &lt;em&gt;did the job accomplish anything?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That requires data from inside the job. Data that only your code can provide. No external monitoring tool can know that your nightly sync processed zero records unless you tell it — which is exactly what &lt;a href="https://crontify.com" rel="noopener noreferrer"&gt;Crontify&lt;/a&gt;'s alert rules are designed for.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to detect silent failures with alert rules
&lt;/h2&gt;

&lt;p&gt;Crontify introduces alert rules on job output metadata.&lt;/p&gt;

&lt;p&gt;When your job calls &lt;code&gt;success()&lt;/code&gt;, you attach a metadata object describing what the run actually did:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CrontifyMonitor&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@crontify/sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;monitor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CrontifyMonitor&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CRONTIFY_API_KEY&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;monitorId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;your-monitor-id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;success&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;rows_processed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;records_synced&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;api_calls_made&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You then define alert rules against that metadata in the dashboard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;rows_processed eq 0&lt;/code&gt; → fire alert&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;records_synced lt 100&lt;/code&gt; → fire alert&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;api_calls_made gt 1000&lt;/code&gt; → fire alert&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The run is logged as a success. Your historical success rate is untouched. But when a rule fires, Crontify sends an immediate alert to your configured channels — Slack, email, Discord, or webhook.&lt;/p&gt;

&lt;p&gt;You can stack multiple rules per monitor and mix operators (&lt;code&gt;eq&lt;/code&gt;, &lt;code&gt;lt&lt;/code&gt;, &lt;code&gt;gt&lt;/code&gt;, &lt;code&gt;ne&lt;/code&gt;). This is silent failure detection built into the monitoring layer, where it belongs — not bolted onto your job's business logic.&lt;/p&gt;
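&lt;p&gt;To get a feel for how such rules behave before configuring them, you can model them locally. The sketch below is not Crontify's implementation. It is a small stand-in that assumes numeric metadata only, treats a missing field as an error, and uses a hypothetical &lt;code&gt;Rule&lt;/code&gt; type with the four operators named above.&lt;/p&gt;

```go
package main

import "fmt"

// Rule is a hypothetical local model of one alert rule.
type Rule struct {
	Field    string
	Operator string // "eq", "ne", "lt" or "gt"
	Value    float64
}

// Fires reports whether the rule matches the metadata of a single run.
func (r Rule) Fires(meta map[string]float64) (bool, error) {
	got, ok := meta[r.Field]
	if !ok {
		return false, fmt.Errorf("meta has no field %q", r.Field)
	}
	switch r.Operator {
	case "eq":
		return got == r.Value, nil
	case "ne":
		return got != r.Value, nil
	case "lt":
		return r.Value > got, nil // got is below the threshold
	case "gt":
		return got > r.Value, nil
	default:
		return false, fmt.Errorf("unknown operator %q", r.Operator)
	}
}
```

&lt;p&gt;With the metadata from the example above (&lt;code&gt;rows_processed: 0&lt;/code&gt;, &lt;code&gt;records_synced: 0&lt;/code&gt;, &lt;code&gt;api_calls_made: 14&lt;/code&gt;), the first two rules in the list fire and the third does not.&lt;/p&gt;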




&lt;h2&gt;
  
  
  The three failure types every cron job monitor should catch
&lt;/h2&gt;

&lt;p&gt;Silent failures are the hardest to catch. The other three are the ones every developer knows they need:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Missed runs.&lt;/strong&gt; Your job didn't start at all. Crontify parses your cron expression and knows when each run is expected. If no start ping arrives within your configured grace period, an alert fires. Works with any cron syntax including complex multi-part expressions and timezone-aware schedules.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hung jobs.&lt;/strong&gt; Your job started but hasn't finished within your threshold. Catches deadlocks, stuck database queries, and infinite loops that don't throw — the class of failure that keeps a process alive and doing nothing forever. Without start/finish pings, a standard heartbeat monitor can't distinguish "job finished in 2 seconds" from "job has been running for 6 hours".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failed jobs.&lt;/strong&gt; Your job explicitly called &lt;code&gt;fail()&lt;/code&gt;, or the SDK caught an unhandled exception and reported it automatically.&lt;/p&gt;




&lt;h2&gt;
  
  
  Attaching log output to failed runs
&lt;/h2&gt;

&lt;p&gt;When a cron job fails, you need context to diagnose it quickly. Crontify lets you attach the full log output directly to the failure ping:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runJob&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;success&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
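  &lt;span class="c1"&gt;// In strict TypeScript the caught value is `unknown`, so narrow it first.&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fail&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;log&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stack&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;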
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The log can be up to 10,000 characters. It is stored separately and delivered directly in the Slack or email alert. No switching to your logging infrastructure at 2am. No searching through CloudWatch or Datadog to find what broke. The context is in the notification.&lt;/p&gt;
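&lt;p&gt;Stack traces and captured output can easily run past 10,000 characters, so it can help to trim the log before sending it. How Crontify handles longer logs, and whether the limit counts bytes or characters, is not covered here. The sketch below assumes characters and keeps the end of the log, on the assumption that the most recent lines usually hold the error. &lt;code&gt;tailLog&lt;/code&gt; and &lt;code&gt;maxLogChars&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```go
package main

// maxLogChars mirrors the 10,000 character limit described above.
const maxLogChars = 10000

// tailLog returns at most limit characters from the end of log.
// It works on runes so multi-byte characters are never split.
func tailLog(log string, limit int) string {
	runes := []rune(log)
	switch {
	case limit >= len(runes):
		return log
	case limit > 0:
		return string(runes[len(runes)-limit:])
	default:
		return ""
	}
}
```

&lt;p&gt;A call such as &lt;code&gt;tailLog(output, maxLogChars)&lt;/code&gt; right before the fail ping keeps the payload within the limit without changing short logs.&lt;/p&gt;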




&lt;h2&gt;
  
  
  Instrumenting your jobs in under a minute
&lt;/h2&gt;

&lt;p&gt;The SDK is zero-dependency TypeScript, published to npm as &lt;a href="https://www.npmjs.com/package/@crontify/sdk" rel="noopener noreferrer"&gt;&lt;code&gt;@crontify/sdk&lt;/code&gt;&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @crontify/sdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;wrap()&lt;/code&gt; handles start, success, and fail pings automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CrontifyMonitor&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@crontify/sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;monitor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CrontifyMonitor&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CRONTIFY_API_KEY&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;monitorId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;your-monitor-id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wrap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;syncDatabase&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;rows_processed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rowCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;duration_ms&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;durationMs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the function throws, &lt;code&gt;fail()&lt;/code&gt; is called automatically with the error message. If it completes, &lt;code&gt;success()&lt;/code&gt; is called with whatever metadata you return. If it never completes within your hung job threshold, Crontify flags the run as hung on its next detection cycle.&lt;/p&gt;

&lt;p&gt;If you manage multiple monitors in the same process, &lt;code&gt;CrontifyClient&lt;/code&gt; caches instances by ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CrontifyClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@crontify/sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;crontify&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CrontifyClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CRONTIFY_API_KEY&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;crontify&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;mon_abc123&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;wrap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;syncUsers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;crontify&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;mon_def456&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;wrap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;syncOrders&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not using TypeScript? The ping endpoints are plain HTTP: &lt;code&gt;/api/v1/ping/{id}/start&lt;/code&gt;, &lt;code&gt;/api/v1/ping/{id}/success&lt;/code&gt;, and &lt;code&gt;/api/v1/ping/{id}/fail&lt;/code&gt;. Any language that can make an HTTP request works, including Python, Go, Bash, PHP, and Ruby.&lt;/p&gt;
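
&lt;p&gt;To make the request shape concrete, here is a minimal sketch of the three pings done by hand, without the SDK. It is written in TypeScript only to match the other examples, and the same three requests translate directly to any other language. Several things are assumptions rather than documented behaviour: the host &lt;code&gt;https://crontify.example&lt;/code&gt; is a placeholder, the helper names &lt;code&gt;pingUrl&lt;/code&gt; and &lt;code&gt;runWithPings&lt;/code&gt; are hypothetical, the requests are sent as bare POSTs with no body or auth header, and &lt;code&gt;/success&lt;/code&gt; and &lt;code&gt;/fail&lt;/code&gt; are assumed to sit under the same &lt;code&gt;/api/v1/ping/{id}&lt;/code&gt; prefix as &lt;code&gt;/start&lt;/code&gt;.&lt;/p&gt;

```typescript
// Hypothetical host: replace with the one shown in your Crontify dashboard.
const BASE_URL = 'https://crontify.example';

type PingEvent = 'start' | 'success' | 'fail';

// Injectable transport so the flow can be exercised without a network.
type Send = (url: string) => unknown;

// Builds a ping URL following the path pattern quoted in the article.
function pingUrl(monitorId: string, event: PingEvent, baseUrl: string = BASE_URL) {
  return `${baseUrl}/api/v1/ping/${encodeURIComponent(monitorId)}/${event}`;
}

// Default transport: a bare HTTP POST via the global fetch (Node 18+).
// Assumption: no request body or auth header is sent in this sketch.
const postWithFetch: Send = (url) => fetch(url, { method: 'POST' });

// Runs a job between a start ping and a success or fail ping.
async function runWithPings(monitorId: string, job: () => unknown, send: Send = postWithFetch) {
  await send(pingUrl(monitorId, 'start'));
  try {
    await job();
  } catch (err) {
    await send(pingUrl(monitorId, 'fail'));
    throw err; // keep the original failure visible to the caller
  }
  await send(pingUrl(monitorId, 'success'));
}
```

&lt;p&gt;Calling &lt;code&gt;runWithPings('mon_abc123', syncUsers)&lt;/code&gt; would send a start ping, run the job, then send either a success or a fail ping. If the job hangs, neither finish ping is ever sent, which is exactly the gap a hung job threshold is meant to catch.&lt;/p&gt;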




&lt;h2&gt;
  
  
  Recovery alerts
&lt;/h2&gt;

&lt;p&gt;Cron job monitoring isn't just about knowing when things break — it's about knowing when they're fixed.&lt;/p&gt;

&lt;p&gt;When a monitor that was in a failing state receives a healthy ping, Crontify automatically sends a recovery notification to the same channels. You know when an incident starts. You know when it's over. You're not left refreshing the dashboard to find out if the fix worked.&lt;/p&gt;
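
&lt;p&gt;The rule described above is an edge trigger: a recovery notification fires on the transition from failing to healthy, not on every healthy ping. A minimal sketch of that rule follows. The type and function names are hypothetical and this is not Crontify's actual implementation, only an illustration of the transition.&lt;/p&gt;

```typescript
type MonitorState = 'healthy' | 'failing';
type FinishPing = 'success' | 'fail';

interface PingOutcome {
  next: MonitorState; // state after the ping is applied
  recovered: boolean; // true only on the failing -> healthy edge
}

// Applies a finish ping to the current state and reports whether
// a recovery notification would be due.
function applyPing(current: MonitorState, ping: FinishPing): PingOutcome {
  const next: MonitorState = ping === 'success' ? 'healthy' : 'failing';
  const recovered = current === 'failing' ? next === 'healthy' : false;
  return { next, recovered };
}
```

&lt;p&gt;A monitor that is already healthy and receives another success ping yields &lt;code&gt;recovered: false&lt;/code&gt;, so repeated healthy runs do not produce repeated notifications in this sketch.&lt;/p&gt;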




&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What is a silent failure in a cron job?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A silent failure is when a cron job completes without errors — exits 0, no exceptions — but fails to accomplish its intended purpose. For example, a sync job that processes zero records, an email job that sends to an empty list, or a cleanup job that deletes nothing because its filter condition is wrong. Standard monitoring treats these as successes because the job technically ran.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I monitor cron jobs in Node.js?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Install &lt;code&gt;@crontify/sdk&lt;/code&gt; from npm, create a monitor in the Crontify dashboard, and wrap your job function with &lt;code&gt;monitor.wrap()&lt;/code&gt;. The SDK handles start, success, and failure pings automatically. Full instrumentation takes under 60 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is the difference between a missed run and a hung job?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A missed run means the job never started: no start ping arrived within the grace period after the scheduled time. A hung job means the job started but never finished: a start ping arrived, but no success or fail ping followed within the maximum duration threshold. Detecting a hung job requires a start/finish ping architecture; a simple heartbeat monitor can only catch missed runs.&lt;/p&gt;
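
&lt;p&gt;The two definitions reduce to two timestamp comparisons. The sketch below shows them side by side. The field and function names are hypothetical, times are epoch milliseconds, and this is an illustration of the logic rather than Crontify's detection code.&lt;/p&gt;

```typescript
type RunStatus = 'pending' | 'missed' | 'hung' | 'finished';

interface RunObservation {
  scheduledAt: number;       // when the run was due
  startedAt: number | null;  // start ping time, if one arrived
  finishedAt: number | null; // success or fail ping time, if one arrived
}

// Classifies a run as seen at time `now`.
function classifyRun(run: RunObservation, now: number, graceMs: number, maxDurationMs: number): RunStatus {
  if (run.startedAt === null) {
    // No start ping: missed once the grace period has elapsed.
    return now - run.scheduledAt > graceMs ? 'missed' : 'pending';
  }
  if (run.finishedAt === null) {
    // Started but no finish ping: hung once the maximum duration has elapsed.
    return now - run.startedAt > maxDurationMs ? 'hung' : 'pending';
  }
  return 'finished';
}
```

&lt;p&gt;The second branch is only reachable when a start ping has been recorded, which is why a monitor that receives a single ping per run has nothing to measure a hung job against.&lt;/p&gt;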

&lt;p&gt;&lt;strong&gt;Does cron job monitoring work with languages other than JavaScript?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. Any language that can make an HTTP POST request works with Crontify's ping API. The &lt;code&gt;@crontify/sdk&lt;/code&gt; npm package is the easiest path for Node.js and TypeScript projects, but Python, Go, Ruby, PHP, Bash, and any other runtime can ping the three HTTP endpoints directly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Start monitoring for free
&lt;/h2&gt;

&lt;p&gt;Crontify is free to get started — 5 monitors, no credit card required.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://crontify.com" rel="noopener noreferrer"&gt;crontify.com&lt;/a&gt; — SDK on npm as &lt;code&gt;@crontify/sdk&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If silent failures are a gap in your monitoring, this is what Crontify was built to close.&lt;/p&gt;

</description>
      <category>cron</category>
      <category>devops</category>
      <category>node</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
