<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: N A S</title>
    <description>The latest articles on Forem by N A S (@droncogone).</description>
    <link>https://forem.com/droncogone</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F450195%2Fd03279a3-0fc7-48ec-99da-5ba2cd4c9892.jpg</url>
      <title>Forem: N A S</title>
      <link>https://forem.com/droncogone</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/droncogone"/>
    <language>en</language>
    <item>
      <title>A SOFTWARE POSTMORTEM</title>
      <dc:creator>N A S</dc:creator>
      <pubDate>Thu, 16 Jun 2022 11:59:18 +0000</pubDate>
      <link>https://forem.com/droncogone/a-software-postmortem-k3b</link>
      <guid>https://forem.com/droncogone/a-software-postmortem-k3b</guid>
      <description>&lt;p&gt;All software systems experience downtime or failure at some point in their lifecycle. Once an issue is mitigated, the on-call engineer should write a concise yet detailed &lt;a href="https://en.wikipedia.org/wiki/Postmortem_documentation"&gt;post-mortem&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
Below is an example post-mortem for a service outage in hospital management software.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Issue Summary&lt;/li&gt;
&lt;li&gt;Timeline&lt;/li&gt;
&lt;li&gt;Root cause and Resolution&lt;/li&gt;
&lt;li&gt;
Corrective and Preventive measures
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Issue Summary
&lt;/h2&gt;

&lt;p&gt;The service was down between 8:45 pm WAT and 9:23 pm WAT on January 5th, 2022.&lt;br&gt;&lt;br&gt;
The outage mainly affected the pharmacy interface: healthcare personnel could not make prescription requests to the pharmacy. All personnel were affected.&lt;br&gt;&lt;br&gt;
The cause was traced to a bug introduced by the API update of December 2021. The bug was triggered by onboarding drugs whose names contain non-ASCII characters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Timeline
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;8:45 pm WAT&lt;/strong&gt; - The issue was noticed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8:50 pm WAT&lt;/strong&gt; - The pharmacist on call complained to the IT front desk while trying to enter newly acquired drugs into the pharmacy store.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8:55 pm WAT&lt;/strong&gt; - Personnel assumed a network-related error and refreshed the browser page multiple times. Other aspects of the service were up, but the issue persisted.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;9:02 pm WAT&lt;/strong&gt; - The incident was reported to the IT front desk officer, who promptly called the IT officer on call (Terence Waller).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;9:10 pm WAT&lt;/strong&gt; - Since other microservices were not affected and the issue was specific to pharmacy-related requests, debug mode was turned on, which revealed the location of the error.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;9:20 pm WAT&lt;/strong&gt; - The incident was mitigated by reverting the API to the previous version and restarting the pharmacy microservice. The rollback allows for a more rigorous review of the recent update and for fixing other potential errors yet to surface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;9:23 pm WAT&lt;/strong&gt; - Service was restored.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Root cause and Resolution
&lt;/h2&gt;

&lt;p&gt;As stated above, the root cause of the downtime was a bug in the pharmacy API introduced by the update from version &lt;code&gt;0.2.6&lt;/code&gt; to &lt;code&gt;0.3.0&lt;/code&gt;. The bug is on line 254 of &lt;code&gt;/api/views/pharmacy/drugs.py&lt;/code&gt;, in the view function &lt;code&gt;add_drugs(details)&lt;/code&gt;: the call to &lt;code&gt;open&lt;/code&gt; that writes a temporary drug file did not pass a character encoding. Without an explicit encoding, Python's &lt;code&gt;open&lt;/code&gt; falls back to a platform-dependent default, which may be unable to represent non-ASCII characters.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UWeg1Q9T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/scjeufmmq9f14jqrz3op.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UWeg1Q9T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/scjeufmmq9f14jqrz3op.png" alt="Bug Image" width="754" height="124"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The issue was resolved by starting the backup service running API version &lt;code&gt;0.2.6&lt;/code&gt; and isolating the current version, so that the team can investigate it more extensively once it assembles.&lt;/p&gt;

&lt;h2&gt;
  
  
  Corrective and Preventive measures
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Set &lt;code&gt;debug=True&lt;/code&gt; in the Flask server start-up command.&lt;/li&gt;
&lt;li&gt;Restart the server.&lt;/li&gt;
&lt;li&gt;Replicate the issue by sending a drug request with a non-ASCII character in the name.&lt;/li&gt;
&lt;li&gt;Print the debug output and save the traceback to an &lt;code&gt;error.txt&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;Extract the server access and error logs.&lt;/li&gt;
&lt;li&gt;Shut down the server running the current API version.&lt;/li&gt;
&lt;li&gt;Start up the backup server running API version &lt;code&gt;0.2.6&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Repeat the same request.&lt;/li&gt;
&lt;/ul&gt;
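
&lt;p&gt;The replication step above can also be approximated locally without the server. The sketch below assumes the failure was a &lt;code&gt;UnicodeEncodeError&lt;/code&gt; raised while writing the file, and it forces an ASCII encoding to stand in for whatever default the production host used. The function name &lt;code&gt;reproduce&lt;/code&gt; and the sample drug names are hypothetical.&lt;/p&gt;

```python
import os
import tempfile
import traceback


def reproduce(name, encoding="ascii"):
    """Write name to a scratch file; return the traceback text on failure, else None.

    The "ascii" default is an assumption standing in for the host's default encoding.
    """
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "drug.tmp")
        try:
            with open(path, "w", encoding=encoding) as handle:
                handle.write(name)
        except UnicodeEncodeError:
            # Capture the traceback as text so it can be saved for the report.
            return traceback.format_exc()
    return None


if __name__ == "__main__":
    report = reproduce("Naproxène")
    if report is not None:
        # Mirrors the "save the traceback to error.txt" step.
        with open("error.txt", "w", encoding="utf-8") as handle:
            handle.write(report)
```

&lt;p&gt;Calling &lt;code&gt;reproduce&lt;/code&gt; with &lt;code&gt;encoding="utf-8"&lt;/code&gt; on the same name returns &lt;code&gt;None&lt;/code&gt;, which is a quick way to confirm that an explicit encoding removes the failure.&lt;/p&gt;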

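&lt;p&gt;The code review process needs to improve. Adding more automated tests that cover edge cases, such as non-ASCII input, will help catch such errors in the future. Running the codebase under stricter checks is another way to mitigate such issues; for example, Python 3.10 and later can emit an &lt;code&gt;EncodingWarning&lt;/code&gt; when &lt;code&gt;open&lt;/code&gt; is called without an explicit encoding.&lt;/p&gt;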
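
&lt;p&gt;As one possible shape for such an automated test, here is a small regression test using the standard library's &lt;code&gt;unittest&lt;/code&gt;. The function &lt;code&gt;save_drug_name&lt;/code&gt; is a hypothetical stand-in for the code that writes the temporary drug file, and the names in the list are samples chosen only because they contain non-ASCII characters.&lt;/p&gt;

```python
import os
import tempfile
import unittest


def save_drug_name(name, path):
    """Hypothetical stand-in for the code that writes the temporary drug file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(name)


class NonAsciiDrugNameTest(unittest.TestCase):
    # Sample names chosen only to exercise non-ASCII characters.
    NAMES = ["Naproxène", "Paracétamol", "Kétoprofène"]

    def test_names_round_trip(self):
        with tempfile.TemporaryDirectory() as directory:
            path = os.path.join(directory, "drug.txt")
            for name in self.NAMES:
                # subTest reports each failing name separately.
                with self.subTest(name=name):
                    save_drug_name(name, path)
                    with open(path, encoding="utf-8") as handle:
                        self.assertEqual(handle.read(), name)
```

&lt;p&gt;Saved in a test module, this can be run with &lt;code&gt;python -m unittest&lt;/code&gt;. In a real project the test would import the actual write function instead of defining a stand-in.&lt;/p&gt;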

&lt;p&gt;Thank you for reading. I hope you learned a thing or two. Catch you later. Till then, peace!&lt;/p&gt;

</description>
      <category>devops</category>
      <category>writing</category>
      <category>python</category>
      <category>backend</category>
    </item>
  </channel>
</rss>
