<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: #benaryorg</title>
    <description>The latest articles on Forem by #benaryorg (@benaryorg).</description>
    <link>https://forem.com/benaryorg</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1873%2F03f8a348-f4da-4583-a81c-539a8057a9ab.png</url>
      <title>Forem: #benaryorg</title>
      <link>https://forem.com/benaryorg</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/benaryorg"/>
    <language>en</language>
    <item>
      <title>Handling your personal data online.</title>
      <dc:creator>#benaryorg</dc:creator>
      <pubDate>Sun, 10 Sep 2017 01:01:01 +0000</pubDate>
      <link>https://forem.com/benaryorg/handling-your-personal-data-online</link>
      <guid>https://forem.com/benaryorg/handling-your-personal-data-online</guid>
      <description>

&lt;p&gt;How are you dealing with your online identity?&lt;br&gt;
I tend to keep my private life and my online persona somewhat separated.&lt;br&gt;
This includes but is not limited to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keeping my real name offline&lt;/li&gt;
&lt;li&gt;not posting pictures of me online&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How do you handle that?&lt;br&gt;
Have you encountered any problems regarding that?&lt;br&gt;
Do you have any recommendations?&lt;/p&gt;


</description>
      <category>discuss</category>
      <category>personaldata</category>
      <category>career</category>
      <category>privacy</category>
    </item>
    <item>
      <title>I'm an ops person. Ask me anything!</title>
      <dc:creator>#benaryorg</dc:creator>
      <pubDate>Sun, 10 Sep 2017 00:55:59 +0000</pubDate>
      <link>https://forem.com/benaryorg/im-an-ops-person-ask-me-anything</link>
      <guid>https://forem.com/benaryorg/im-an-ops-person-ask-me-anything</guid>
      <description>&lt;p&gt;I'm an ops person a year into business, ask me anything.&lt;br&gt;
Ask me technical questions, ask me to design a theoretical system, ask me things about my career…&lt;/p&gt;

&lt;p&gt;This will be my permanent AMA for anything that's technical.&lt;/p&gt;

</description>
      <category>ama</category>
      <category>ops</category>
      <category>programming</category>
      <category>sysadmin</category>
    </item>
    <item>
      <title>Keeping Track of your Skills</title>
      <dc:creator>#benaryorg</dc:creator>
      <pubDate>Sun, 10 Sep 2017 00:47:36 +0000</pubDate>
      <link>https://forem.com/benaryorg/keeping-track-of-your-skills</link>
      <guid>https://forem.com/benaryorg/keeping-track-of-your-skills</guid>
      <description>&lt;p&gt;Hi, I'm an ops person.&lt;br&gt;
I've been a developer.&lt;br&gt;
Done distributed systems and System Engineering.&lt;br&gt;
I did things with Haskell, Rust, C, Perl, Shell (lots of shell), GNU/Linux, {Free,Open}BSD, TLS, x509 and whatnot.&lt;/p&gt;

&lt;p&gt;It's been some time since I started and sometimes I lose track of what I – hey, there's Python, JS, HTML/CSS missing in that list above – actually learned over the years.&lt;br&gt;
There are still moments when I notice that I actually know what a &lt;code&gt;TIME_WAIT&lt;/code&gt; on Linux is and why it's there, and also what the difference between an abstract class and an interface is in Java.&lt;/p&gt;

&lt;p&gt;So now my final question: &lt;strong&gt;Do you keep track of all that, and if so, how?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Note: I know that there are certain advantages to not keeping track (e.g. not accidentally claiming to know tech when your knowledge is hopelessly out of date).&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>skills</category>
      <category>career</category>
    </item>
    <item>
      <title>Lines of Code don't matter.</title>
      <dc:creator>#benaryorg</dc:creator>
      <pubDate>Sat, 02 Sep 2017 00:52:08 +0000</pubDate>
      <link>https://forem.com/benaryorg/lines-of-code-dont-matter</link>
      <guid>https://forem.com/benaryorg/lines-of-code-dont-matter</guid>
      <description>&lt;p&gt;We all learned long ago that LOC (Lines Of Code) are a terrible unit of measurement.&lt;br&gt;
Well, at least most of us learned that.&lt;/p&gt;

&lt;p&gt;When I sat down this Friday to work on some internal magic to get some text from your console to a dashboard (easier said than done; I've found CouchDB to be the tool of choice), I ended up doubting my productivity.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Result
&lt;/h1&gt;

&lt;p&gt;At the end I had two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;three hours in our time tracking, might have been quite a bit more though&lt;/li&gt;
&lt;li&gt;104 (one hundred and four) lines of beautiful shell code in our GitLab&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's even less than a line of code per minute.&lt;br&gt;
Hell, that even contained a block of code that's been duplicated five times with a different variable name&lt;br&gt;
(it's a non-trivial case to DRY that part).&lt;/p&gt;

&lt;p&gt;Granted, we did port the whole thing from a terrible &lt;code&gt;vim file;pandoc file | curl home-grown-nodejs-daemon&lt;/code&gt; to a cleaner database solution with revisions and stuff, but the discussion part was just about an hour or so.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Script
&lt;/h2&gt;

&lt;p&gt;So what does the script do?&lt;br&gt;
Basically it&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;uses &lt;code&gt;getopts&lt;/code&gt; to get some variables filled&lt;/li&gt;
&lt;li&gt;reads missing variables from &lt;em&gt;stdin&lt;/em&gt; if the &lt;em&gt;tty&lt;/em&gt; is interactive&lt;/li&gt;
&lt;li&gt;fails if mandatory variables are missing&lt;/li&gt;
&lt;li&gt;downloads the current document&lt;/li&gt;
&lt;li&gt;merges the current document with the new entry&lt;/li&gt;
&lt;li&gt;pushes that back to CouchDB wrapped with the correct revision&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Seems like an easy task, right?&lt;/p&gt;
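
&lt;p&gt;A minimal sketch of the first three steps in the list above. It is not the script described here: the option &lt;code&gt;-n&lt;/code&gt;, the function &lt;code&gt;get_name&lt;/code&gt; and the variable &lt;code&gt;name&lt;/code&gt; are made up for illustration, and the CouchDB download, merge and push are left out.&lt;/p&gt;

```shell
# Hypothetical sketch, not the original script: -n NAME is an invented mandatory option.
get_name() {
  name=''
  OPTIND=1
  # step 1: fill the variable from the command line
  while getopts 'n:' opt "$@"; do
    case "$opt" in
      n) name="$OPTARG" ;;
      *) return 2 ;;
    esac
  done
  # step 2: read the missing value from stdin, but only if stdin is a terminal
  if [ -z "$name" ]; then
    if [ -t 0 ]; then
      printf 'name: '
      read -r name
    fi
  fi
  # step 3: fail if the mandatory value is still missing
  if [ -z "$name" ]; then
    return 1
  fi
  printf '%s\n' "$name"
}

get_name -n example
```

&lt;p&gt;Even this stripped-down version needs a dozen lines for a single variable, and it has not escaped anything yet.&lt;/p&gt;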

&lt;h1&gt;
  
  
  Why Lines of Code are a bad measurement.
&lt;/h1&gt;

&lt;p&gt;I've put &lt;strong&gt;a lot&lt;/strong&gt; of effort into making the script as robust as possible.&lt;br&gt;
If at some point you enter something like a literal &lt;code&gt;my"name\":{}\x123&lt;/code&gt; it will be stored the very same way in the database.&lt;br&gt;
Everyone who has ever dealt with shell scripts will know that it's a hell of an effort to not fail at this.&lt;br&gt;
There is your shell which has escaping.&lt;br&gt;
There is the JSON merging which needs the string input to be escaped.&lt;br&gt;
There is the curl which could possibly fail.&lt;br&gt;
There is so much that could go wrong.&lt;/p&gt;

&lt;p&gt;It took five lines of (maybe too) tightly packed shell that use a variable, read it if not set, but only if the tty is interactive, fail otherwise, escape it (properly, not only a simple &lt;code&gt;s/"/\"/g&lt;/code&gt;) and store it in a new read-only variable.&lt;br&gt;
This works for all inputs, including special characters that need special escaping in JSON (think: "binary" characters, multibyte, hell even emoji).&lt;/p&gt;

&lt;p&gt;That's five lines.&lt;br&gt;
You'd have trouble packing that so tightly even in programming languages that don't need super special escaping.&lt;/p&gt;

&lt;p&gt;There is no meaning to the number of lines of code, because it's an artificial number that can be changed at will (blank lines, moving lines together, comments, …).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But further, there are tasks which seem thoroughly trivial but end up being a lot of work. Sometimes they even turn out to actually &lt;em&gt;be&lt;/em&gt; plain simple, but that might not be obvious at first. There is often a solution to a simple problem that is so elegant and plain that you simply don't see it.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>loc</category>
      <category>productivity</category>
      <category>development</category>
      <category>programming</category>
    </item>
    <item>
      <title>How To Build A RegEx</title>
      <dc:creator>#benaryorg</dc:creator>
      <pubDate>Sun, 16 Jul 2017 14:51:39 +0000</pubDate>
      <link>https://forem.com/benaryorg/how-to-build-a-regex</link>
      <guid>https://forem.com/benaryorg/how-to-build-a-regex</guid>
      <description>&lt;h1&gt;
  
  
  Updates
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;I might at some point update this post, but over at &lt;a href="https://benaryorg.github.io"&gt;my own blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  How to build a RegEx
&lt;/h1&gt;

&lt;p&gt;I see people abusing regexes just about every day.&lt;br&gt;
If you're really good at regexes then you will certainly feel some sort of pain&lt;br&gt;
as soon as you see &lt;code&gt;.*&lt;/code&gt; or similar constructs in inappropriate places.&lt;/p&gt;

&lt;p&gt;So here is my guide to doing it the right way.&lt;/p&gt;
&lt;h2&gt;
  
  
  Notes
&lt;/h2&gt;

&lt;p&gt;I'm going to use PCRE all over the place.&lt;br&gt;
My preferred syntax is &lt;code&gt;m{something}&lt;/code&gt; and &lt;code&gt;s{a}{b}g&lt;/code&gt; so I'll stick with those.&lt;/p&gt;

&lt;p&gt;You can try all examples using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# for searches, just start typing, quit using ^D&lt;/span&gt;
perl &lt;span class="nt"&gt;-ne&lt;/span&gt; &lt;span class="s1"&gt;'m{a(.)c} &amp;amp;&amp;amp; print "$1\n"'&lt;/span&gt;
&lt;span class="c"&gt;# for replacements&lt;/span&gt;
perl &lt;span class="nt"&gt;-pe&lt;/span&gt; &lt;span class="s1"&gt;'s{a.c}{abc}g'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why not use &lt;code&gt;.*&lt;/code&gt;?
&lt;/h2&gt;

&lt;p&gt;You're looking for a needle in a haystack.&lt;br&gt;
A practical example: you look for your favourite plushie in your room.&lt;/p&gt;

&lt;p&gt;A nice regex for that would be &lt;code&gt;m{\bplushie\b}&lt;/code&gt;.&lt;br&gt;
It looks for your plushie as one word, meaning that it will match only if&lt;br&gt;
there is a word boundary on each side.&lt;br&gt;
See also &lt;a href="http://www.regular-expressions.info/wordboundaries.html"&gt;word&lt;br&gt;
boundaries&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;What I see people do, which is completely ridiculous, is&lt;br&gt;
&lt;code&gt;m{^.*\bplushie\b.*$}&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Let me explain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They match the beginning of the line even though they don't need it.&lt;/li&gt;
&lt;li&gt;They match the end of the line.&lt;/li&gt;
&lt;li&gt;They let their parser go through all of the characters, even though they
already found what they were looking for.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are looking for an occurrence that does not need to be anywhere&lt;br&gt;
specific, then why are you looking at everything else?&lt;br&gt;
You don't walk into your room and start looking from one side sequentially to&lt;br&gt;
the other.&lt;br&gt;
What you do is look in your room, see the plushie sitting on the bed and take&lt;br&gt;
it.&lt;/p&gt;
&lt;h2&gt;
  
  
  Complex Examples
&lt;/h2&gt;

&lt;p&gt;Let's choose &lt;em&gt;fail2ban&lt;/em&gt; as an example.&lt;br&gt;
We want to block every IP sending more than 1000 HTTP requests per five&lt;br&gt;
seconds.&lt;br&gt;
I'll ignore how to configure &lt;em&gt;fail2ban&lt;/em&gt; as it's not relevant to the regex&lt;br&gt;
thing.&lt;/p&gt;

&lt;p&gt;First try, without even looking at the format of the logs: &lt;code&gt;m{^.*&amp;lt;HOST&amp;gt;.*$}&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;You probably messed up very hard here.&lt;br&gt;
To explain why you've messed up so hard, let's look at one line of logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;10.0.0.1 - - [16/Jul/2017:15:38:54 +0200] "GET /robots.txt HTTP/1.0" 404 319 "-" "Mozilla/5.0 (compatible; AhrefsBot/5.2; +http://ahrefs.com/robot/)" "-"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I took that line out of my server, the &lt;em&gt;10.0.0.1&lt;/em&gt; is the part that we want&lt;br&gt;
(please ignore that it's an internal IP).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;m{.*}&lt;/code&gt; does greedy matching, so the above regex will be very easy to break.&lt;br&gt;
Just put an IP in the User Agent.&lt;br&gt;
The User Agent contains spaces as you might have noticed, which is fine as the&lt;br&gt;
string is quoted (and &lt;em&gt;nginx&lt;/em&gt; does some escaping on the contained string).&lt;br&gt;
Now what'll happen if I put an IP address in the User Agent?&lt;br&gt;
Right, due to greedy matching the rightmost &lt;code&gt;&amp;lt;HOST&amp;gt;&lt;/code&gt; will match.&lt;br&gt;
This will of course result in some serious problems.&lt;/p&gt;

&lt;p&gt;As a malicious hacker I could:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Circumvent &lt;em&gt;fail2ban&lt;/em&gt; altogether by putting &lt;code&gt;127.0.0.1&lt;/code&gt; into my User Agent.
That will effectively turn off &lt;em&gt;fail2ban&lt;/em&gt; for my requests as long as
&lt;code&gt;127.0.0.1&lt;/code&gt; is whitelisted. If that IP is not whitelisted, you've got
much worse problems (assuming &lt;em&gt;fail2ban&lt;/em&gt; uses &lt;em&gt;iptables&lt;/em&gt;, this will break at
least half of your server's software, think of accessing &lt;em&gt;MySQL&lt;/em&gt; on
&lt;em&gt;localhost&lt;/em&gt;).&lt;/li&gt;
&lt;li&gt;Block arbitrary IPs from accessing your website in much the same way.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So how are we going to construct a Regex that will match &lt;strong&gt;only&lt;/strong&gt; what we're&lt;br&gt;
looking for here?&lt;/p&gt;

&lt;p&gt;Well, the IP is right at the start, so we'll take &lt;code&gt;m{^&amp;lt;HOST&amp;gt;.*$}&lt;/code&gt; right?&lt;/p&gt;

&lt;p&gt;Technically right, but I wouldn't write it that way.&lt;br&gt;
That regex would again match everything after the IP, but &lt;em&gt;we really don't care&lt;br&gt;
about that&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;What we should do is a simple &lt;code&gt;m{^&amp;lt;HOST&amp;gt;}&lt;/code&gt;.&lt;br&gt;
This works as intended and it only looks at the first few characters.&lt;br&gt;
If you want to make sure that it's followed by a space, go ahead, please.&lt;/p&gt;

&lt;p&gt;So we end up with a fool-proof regex: &lt;code&gt;m{^&amp;lt;HOST&amp;gt;\s}&lt;/code&gt;.&lt;br&gt;
This will for sure fulfill all our needs.&lt;/p&gt;

&lt;p&gt;To be honest, this example is kind of easy as the IP is right at the start.&lt;br&gt;
Let's assume some other format so we can work out a more general way for this&lt;br&gt;
to work.&lt;/p&gt;
&lt;h3&gt;
  
  
  Copy, Paste, Replace
&lt;/h3&gt;

&lt;p&gt;We are going to perform CPR on a line of text.&lt;br&gt;
As said above, this time we want to get something that is &lt;em&gt;not&lt;/em&gt; right at the&lt;br&gt;
start.&lt;br&gt;
We'll look at the following line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[16/Jul/2017:16:27:41 +0200] openbsd.cloud.bsocat.net - - 66.133.109.36 "GET /.well-known/acme-challenge/tTRnUGY9gZEVz2llGWqn1m3mHznMDOFH3zCXsgelh7w HTTP/1.1" 200 87
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a slightly modified OpenBSD httpd log-format.&lt;br&gt;
By &lt;em&gt;slightly&lt;/em&gt; I mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;date and time moved to the front&lt;/li&gt;
&lt;li&gt;host moved a bit backwards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now let's do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# copy the line, verbatim
[16/Jul/2017:16:27:41 +0200] openbsd.cloud.bsocat.net - - 66.133.109.36 "GET /.well-known/acme-challenge/tTRnUGY9gZEVz2llGWqn1m3mHznMDOFH3zCXsgelh7w HTTP/1.1" 200 87

# remove everything after the needle (except for the delimiter), we don't need it
[16/Jul/2017:16:27:41 +0200] openbsd.cloud.bsocat.net - - 66.133.109.36\s

# add needed meta-characters (start of line)
^[16/Jul/2017:16:27:41 +0200] openbsd.cloud.bsocat.net - - 66.133.109.36\s

# escape all the characters that need escaping
^\[16/Jul/2017:16:27:41 \+0200\] openbsd\.cloud\.bsocat\.net - - 66\.133\.109\.36\s

# replace the host
^\[16/Jul/2017:16:27:41 \+0200\] openbsd\.cloud\.bsocat\.net - - &amp;lt;HOST&amp;gt;\s

# replace everything that is not static with its possible values
# this requires a lot of in depth knowledge about the log format
# let's do this the easy way and just replace using dots for the date
^\[../.../....:..:..:.. .....\] openbsd\.cloud\.bsocat\.net - - &amp;lt;HOST&amp;gt;\s

# for the other fields we just specify them to "not contain spaces" as these
# are used for delimiting, so they will not occur in the fields
^\[../.../....:..:..:.. .....\] [^\s]+ [^\s]+ [^\s]+ &amp;lt;HOST&amp;gt;\s

# fail2ban needs spaces escaped I think
^\[../.../....:..:..:..\s.....\]\s[^\s]+\s[^\s]+\s[^\s]+\s&amp;lt;HOST&amp;gt;\s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How do we check?
&lt;/h3&gt;

&lt;p&gt;If it's &lt;em&gt;fail2ban&lt;/em&gt;, just run that.&lt;br&gt;
For everything else:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;perl &lt;span class="nt"&gt;-ne&lt;/span&gt; &lt;span class="s1"&gt;'m{^\[../.../....:..:..:..\s.....\]\s[^\s]+\s[^\s]+\s[^\s]+\s([^\s]+)\s} &amp;amp;&amp;amp; print "$1\n"'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This should output only the IP we were looking for.&lt;/p&gt;

</description>
      <category>regex</category>
    </item>
    <item>
      <title>My Way of (Purely Functional) Programming</title>
      <dc:creator>#benaryorg</dc:creator>
      <pubDate>Wed, 15 Mar 2017 18:27:49 +0000</pubDate>
      <link>https://forem.com/benaryorg/my-way-of-programming</link>
      <guid>https://forem.com/benaryorg/my-way-of-programming</guid>
      <description>

</description>
      <category>atob</category>
      <category>haskell</category>
      <category>fp</category>
      <category>data</category>
    </item>
  </channel>
</rss>
