<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Martin Gouws</title>
    <description>The latest articles on Forem by Martin Gouws (@theocoria).</description>
    <link>https://forem.com/theocoria</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F49828%2F1e24c72d-7c2e-407f-9f6f-9f706b9120e8.jpeg</url>
      <title>Forem: Martin Gouws</title>
      <link>https://forem.com/theocoria</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/theocoria"/>
    <language>en</language>
    <item>
      <title>My Thoughts on ETL and ETL testing</title>
      <dc:creator>Martin Gouws</dc:creator>
      <pubDate>Tue, 09 Jan 2018 03:59:34 +0000</pubDate>
      <link>https://forem.com/theocoria/my-thoughts-on-etl-and-etl-testing-2cjd</link>
      <guid>https://forem.com/theocoria/my-thoughts-on-etl-and-etl-testing-2cjd</guid>
      <description>

&lt;p&gt;&lt;strong&gt;Background&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In my previous position, I was entrusted with the fun task of handling the ETL processes for a client. For days on end, my existence revolved around extracting, transforming and loading data from one source to another. The repetition of doing this every day, as well as the monotonous process of mass data extrapolation, left me exhausted, somewhat frustrated and, quite frankly, bored out of my skull. I had to learn how to implement the ETL process from scratch, and in those beginning phases I had very few resources available to implement automated testing.&lt;/p&gt;

&lt;p&gt;As with many things, after the first iteration of building and manually testing my ETL process, I thought and hoped it was perfect. However, this was not the case. The most common problems a tester can encounter were exactly what I found: some records didn’t get loaded, some were malformed or truncated, some were duplicated, and others had invalid types or values or were transformed incorrectly, just to name a few issues. Identifying these problems took forever, as the fault could lie either in the data or in the ETL process itself. A process had to be built to resolve these issues and then tested again before deploying to production. This was a major interruption to data flows and required extensive rewrites, refactors and redeployment of the entire ETL infrastructure for every change.&lt;/p&gt;
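
&lt;p&gt;Two of the problems above, records that were never loaded and records that were duplicated, can be caught with a simple key reconciliation between source and target. What follows is only a minimal sketch in Python, not the process I actually used: the function name &lt;code&gt;reconcile&lt;/code&gt;, the &lt;code&gt;id&lt;/code&gt; key and the sample rows are all made up for illustration.&lt;/p&gt;

```python
from collections import Counter


def reconcile(source_rows, target_rows, key="id"):
    """Compare record keys between the extracted source and the loaded target."""
    source_keys = Counter(row[key] for row in source_rows)
    target_keys = Counter(row[key] for row in target_rows)
    return {
        # Keys that occur more often in the source than in the target: not loaded.
        "missing": sorted(source_keys - target_keys),
        # Keys that occur more often in the target than in the source:
        # duplicated or unexpected records.
        "extra": sorted(target_keys - source_keys),
    }


# Hypothetical sample data: record 2 was dropped and record 3 was loaded twice.
source = [{"id": 1}, {"id": 2}, {"id": 3}]
target = [{"id": 1}, {"id": 3}, {"id": 3}]
print(reconcile(source, target))  # {'missing': [2], 'extra': [3]}
```

&lt;p&gt;A check like this says nothing about truncated or incorrectly transformed values; those need field-level comparisons against the expected output.&lt;/p&gt;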

&lt;p&gt;&lt;strong&gt;ETL testing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The reasons for running automated or manual tests in ETL processes are to ensure consistency and integrity, prevent regression, and detect faults (find more details on &lt;a href="https://www.alooma.com/blog/etl-testing-the-future-is-here"&gt;ETL testing&lt;/a&gt; here). Even with an automated data pipeline and automated ETL tools you still have to perform testing and validation, so the result is not complete end-to-end automation. In my experience there was a cycle in the testing process that I had to follow manually. To prepare the infrastructure environment and data for my ETL process, code had to be run again to determine the quality of the data pipeline, and manual debugging followed after that. Only when my suspicions were confirmed and the &lt;a href="https://developer.ibm.com/recipes/tutorials/a-stepbystep-guide-to-testing-your-data-pipelines/"&gt;input and output&lt;/a&gt; matched was the ETL process considered complete and moved into production. This was done under two scopes: &lt;/p&gt;

&lt;p&gt;1) &lt;em&gt;Non-functional testing&lt;/em&gt;, which covers performance tuning, load, and fault tolerance for “dirty data”, and &lt;br&gt;
2) &lt;em&gt;Functional testing&lt;/em&gt;, needed for data preparation and problem resolution. This includes, for example, unit or component tests, integration tests, and end-to-end tests.&lt;/p&gt;

&lt;p&gt;The ETL cycle of testing can be time-consuming. Each stage of testing requires a different strategy or type of test, and it also depends on the client requirements or organization standards. I was not able to use automated ETL testing, although I admittedly did not know enough about it at that point. To start, it would probably have taken me another 18 months to figure out how to use it and get all the data ready. It simply would not have been cost- or time-effective.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ETL relevance today&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Looking back, I can hardly believe how accustomed I am today to having real-time data at my fingertips. However, this is not yet the case with traditional ETL processing. We are &lt;a href="https://blog.panoply.io/etl-vs-elt-the-difference-is-in-the-how"&gt;still some way&lt;/a&gt; from delivering a similar “real-time” data-driven solution without ETL processes. The idea of what the ETL process represents in its essence is very relevant today and will continue to be. The more a company scales and the bigger the data it can acquire, the greater the need for better ways to aggregate information and run the right transformations. The end goal of ETL is, ultimately, to enable a company to make the best data-driven business decisions, and it does that by extracting the required information for analysis and use. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concluding thoughts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditional ETL processing and testing are time-consuming tasks, but the data acquired is invaluable. The tools and paradigms used by the current ETL market leaders, on the other hand, should be questioned, seeing as the way they enforce the implementation is archaic in nature. This approach should be rethought, re-evaluated and subjected to the same kind of rigorous overhaul of practices that, for example, &lt;a href="https://due.com/blog/blockchain-to-change-accounting-forever/"&gt;blockchain technology&lt;/a&gt; brought to the way we think about ledgers. Maybe then I would be more willing to undertake such a mammoth task again.&lt;/p&gt;


</description>
      <category>etl</category>
      <category>etltesting</category>
    </item>
    <item>
      <title>What is Data Enrichment</title>
      <dc:creator>Martin Gouws</dc:creator>
      <pubDate>Fri, 22 Dec 2017 07:09:50 +0000</pubDate>
      <link>https://forem.com/theocoria/what-is-data-enrichment-5ejb</link>
      <guid>https://forem.com/theocoria/what-is-data-enrichment-5ejb</guid>
      <description>

&lt;h4&gt;
  
  
  Data Enrichment: the What, Why and How
&lt;/h4&gt;

&lt;p&gt;I remember that many years ago, while working on a project for Anglo American, one of my responsibilities was to write extensive, complex SQL queries spanning hundreds of thousands of records, ranging over almost 10 years of data. This, in turn, was wrapped by a RESTful API and consumed by a bunch of graph-intensive frontends. That was the high-level view of the requirements; the details were much more morbid and the data was incomplete for the most part, and, as they say, the rest is history. The point is that I spent the next 9 months aggregating data from other sources, extrapolating some pieces out of thin air and packing it all into ETL (Extract, Transform and Load) processes to fix the existing data and make sure new data remained spotless for years to come. Only later was I able to define what I was doing as Data Enrichment.&lt;/p&gt;

&lt;h4&gt;
  
  
  What it is and why it is used
&lt;/h4&gt;

&lt;p&gt;Before we go any further, let’s first define Data Enrichment. Techopedia defines &lt;a href="https://www.techopedia.com/definition/28037/data-enrichment"&gt;data enrichment&lt;/a&gt; as “a general term that refers to processes used to enhance, refine or otherwise improve raw data”. This is quite a broad and slightly ambiguous definition; however, it does give us the gist of it, which is to make data better in every possible way. &lt;/p&gt;

&lt;p&gt;As for why we would want to do this, it really depends on who is asking. If you were to explain this to a product owner or stakeholder, you would focus on the value that it adds to the business as well as to the product and the overall bottom line. To the developer, you would argue that this is the single most effective way to put a smile on your boss’s face. It is furthermore a way to make sure that the data you are visualizing is correct, makes sense and is easy to work with, since there are fewer gaps that need to be accounted for and fewer edge cases to worry about. You would really have to go through a lot of trouble, or be speaking to someone with no clue about how the digital world works, to mess up a motivation for why data enrichment is a positive thing.&lt;/p&gt;

&lt;h4&gt;
  
  
  How to go about data enrichment
&lt;/h4&gt;

&lt;p&gt;This is where the tire meets the road. There is data that needs enrichment, but where do I start? I think a good place to start would be to consider manual vs automated processes first.&lt;/p&gt;

&lt;p&gt;The manual way is the oldest method of doing data enrichment. Currently, it is also without equal at handling the most intricate edge cases in your data. The human mind and eyes are experts at spotting fictitious data when the data set is understood, and can, for instance, categorize an image based on its content much more easily than a computer (&lt;a href="https://changelog.com/podcast/219"&gt;for now&lt;/a&gt; at least). The use cases where manual data enrichment is needed are endless and will undoubtedly present themselves quite clearly.&lt;/p&gt;

&lt;p&gt;The automated way is pretty old as well (integration with third-party sources and services started happening as soon as third-party sources and services became a thing). Since its inception, the possibilities have increased almost as rapidly as Moore’s Law itself. Today we have an endless array of approaches, implementations, and integrations to choose from. They range from algorithms designed to fix spelling mistakes in your data, adding simple data sets, doing data integration, filling in the missing pieces for conventional data, and algorithmic and statistical analysis, to machine learning constructs like TensorFlow and Hadoop clusters circling around your data lakes. Each of these methods deserves its own book (not to mention a blog post), and trying to master them all would keep any developer extremely busy.&lt;/p&gt;

&lt;h4&gt;
  
  
  Tools to help you do data enrichment
&lt;/h4&gt;

&lt;p&gt;In my mind data enrichment tools can be grouped into three categories, namely: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ETL&lt;/li&gt;
&lt;li&gt;Adding or completing&lt;/li&gt;
&lt;li&gt;Big Data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;1. ETL&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The first category of data enrichment tools is ETL. Although it is a necessary step towards data warehousing, it can exist as a standalone solution for various data enrichment needs. Regardless of the technical definition, the idea of ETL is to take data, do something with it, and store it again. Typically when marshaling data for the first time, we discover patterns and repeatable processes, and we can then use that knowledge to write algorithms that determine what to do with the data. Another case is where different data sources need to be integrated on a lower level. &lt;/p&gt;
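
&lt;p&gt;To make “take data, do something with it, and store it again” concrete, here is a minimal sketch in Python. Everything in it is assumed for illustration: the CSV input, the &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;email&lt;/code&gt; columns, and the cleanup rules are hypothetical, and a real job would write to a database or file instead of returning a string.&lt;/p&gt;

```python
import csv
import io
import json


def extract(csv_text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: trim whitespace, lowercase emails, drop rows without an email."""
    cleaned = []
    for row in rows:
        email = (row.get("email") or "").strip().lower()
        if not email:
            continue  # an unusable record; a real job might log or quarantine it
        cleaned.append({"name": (row.get("name") or "").strip(), "email": email})
    return cleaned


def load(rows):
    """Load: serialize to JSON lines as a stand-in for writing to a data store."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


# Hypothetical input: one messy record and one record with no email.
raw = "name,email\n Ada ,ADA@Example.com\nBob,\n"
print(load(transform(extract(raw))))
```

&lt;p&gt;Keeping the three steps as separate functions makes each one easy to test on its own, which is exactly where I struggled when everything was tangled together.&lt;/p&gt;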

&lt;p&gt;If your needs are enterprise-grade, then consider tools like SQL Server Integration Services or IBM InfoSphere DataStage. If your needs are of the small to medium startup variety, I’d suggest skipping the heavy frameworks and going with a roll-your-own (RYO) approach using a simple Node.js script and AWS Lambda combination or something similar.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Adding information or filling in the gaps&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The second category is the tools that help you add to a simple data set or fill in the gaps. This is almost the most obvious way of doing data enrichment. You have data, it’s good data, but you can benefit from enhancing it. The rest almost speaks for itself. If you have user emails but no phone numbers, you can use tools like &lt;a href="https://www.lusha.co/"&gt;Lusha&lt;/a&gt; or &lt;a href="https://www.leadgenius.com/"&gt;LeadGenius&lt;/a&gt; to add this information. This is just one example. There are so many ways that you can enrich data by using third-party services, from extensive car information to aerial data to related map data based on location. The options are endless.&lt;/p&gt;
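
&lt;p&gt;The gap-filling idea can be sketched in a few lines of Python. Note that this is not the API of any of the services mentioned above: the &lt;code&gt;enrich&lt;/code&gt; function and the &lt;code&gt;phone_directory&lt;/code&gt; lookup are hypothetical stand-ins for whatever third-party source you end up using.&lt;/p&gt;

```python
def enrich(users, phone_directory):
    """Fill in missing phone numbers from a lookup keyed by email."""
    enriched = []
    for user in users:
        record = dict(user)  # copy so the input rows stay untouched
        if not record.get("phone"):
            # Only fill the gap; never overwrite a value that already exists.
            record["phone"] = phone_directory.get(record["email"])
        enriched.append(record)
    return enriched


# Hypothetical data: the lookup stands in for a third-party enrichment source.
users = [
    {"email": "a@example.com", "phone": None},
    {"email": "b@example.com", "phone": "555-0100"},
]
phone_directory = {"a@example.com": "555-0199", "b@example.com": "555-0000"}
print(enrich(users, phone_directory))
```

&lt;p&gt;The one design choice worth keeping, whatever the source, is that enrichment only fills gaps and leaves existing values alone.&lt;/p&gt;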

&lt;p&gt;&lt;strong&gt;3. Big Data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Last are the big data tools. This, I admit, is the category where my own experience with actual big data falters completely, not to mention the tools that you can use to enrich this data. As mentioned previously, the potential of machine learning constructs is fascinating and could change data enrichment forever by the time it matures. I researched this quite a lot, and one service I found that seems to tap into this market is &lt;a href="https://www.datanyze.com/"&gt;Datanyze&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Conclusion
&lt;/h4&gt;

&lt;p&gt;I think it is safe to say that Data Enrichment is a necessary process if your data is lackluster, and that it might also be needed simply to grow a product to its next level. This is a wide overview, and hopefully it makes you think about the multiple possibilities and applications of Data Enrichment.&lt;/p&gt;


</description>
      <category>dataenrichment</category>
      <category>etl</category>
    </item>
    <item>
      <title>Why Developers Need Marketing Tools</title>
      <dc:creator>Martin Gouws</dc:creator>
      <pubDate>Mon, 18 Dec 2017 12:16:18 +0000</pubDate>
      <link>https://forem.com/theocoria/why-developers-need-marketing-tools-539</link>
      <guid>https://forem.com/theocoria/why-developers-need-marketing-tools-539</guid>
      <description>

&lt;h4&gt;
  
  
  The Illusion and Disillusion
&lt;/h4&gt;

&lt;p&gt;Imagine you woke up this morning with an idea to rule all ideas. The idea for a SaaS product that no one else has ever thought of before. You get so excited to start coding that you don’t even think about much else. People are going to love this. Fast forward a few sleepless weeks and all the work is done, pretty and shiny and ready to be deployed. The only step left is to get people to see your brilliance. The right people. Fortunately, that is really easy, and you just add this to the head tag:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;script src="https://www.instantMarketingToolForYourWebApp.com/resources/init.js"&amp;gt;&amp;lt;/script&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Now that that’s done, you just press the button and wait for the cash to start rolling in. Best marketing tool ever.&lt;/p&gt;

&lt;p&gt;Not entirely. I regret to point out that it is much more complicated than that.&lt;/p&gt;

&lt;p&gt;As software developers, we have this inherent desire to come up with a new idea, something that has never been done before, the ever-elusive unicorn, and when that idea comes, we will know exactly how to build it, down to the finest grain of complexity. It will take time and blood and sweat and tears, but when it is done it will immortalize us as one of the Greats. After all, we have spent our entire careers working toward this point. &lt;/p&gt;

&lt;p&gt;Ultimately, a lot of work is needed to make an idea a reality, not just the lines of code. We tend to get so caught up in getting our work done that we don’t leave much time or space to open our minds to the possibility of doing more than that. You might think that your big idea will manifest itself one day, and that until then it’s best to continue growing your experience and mastering your skills. When will this be? What if there was another path? Upon realizing this, you, unfortunately, don’t have much else planned out. At this point you either think a) this idea is so amazing, it will instantly be discovered or b) that no one will ever see how amazing it is.&lt;/p&gt;

&lt;p&gt;The one thing so many of us neglect, to the point of almost complete disregard, is an industry that seems to have been around forever. Marketing. You know what it is. And although you didn’t study engineering to become a marketer, it wouldn’t hurt to wrap your mind around this concept that could save you a lot of time and also heartbreak. Before you dive in, a good place to start is this wiki that includes an overview of &lt;a href="http://salesmarketingstack.com/"&gt;marketing tools and technologies&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Five Principles
&lt;/h4&gt;

&lt;p&gt;Here follow my 5 principles to cultivate a healthy understanding of marketing. These principles are intended to inspire further reading and research of your own, and above all else, an awareness and altered thinking about your next great idea. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1) Market research&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The ideas you come up with might be great to you, but not needed or wanted by others. Market research will tell you which is the case and help you make sure you aren’t giving people something they don’t want. Test the water, use social media to ask questions, follow questions and even answer a few yourself. Learn more about which technologies companies are using to market their products and how these technologies relate to each other. There are various ways and platforms to engage with like-minded people; find one that works for you and leverage it to validate your ideas before you start building anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2) Engage with an audience&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It’s a stereotype to say that all developers are introverts; however, it is mostly true. Luckily for us, we can engage through the internet and that is one less thing for us to get anxious about. You’ve got this. Give your opinion on what you know, share your knowledge and show the world (or a following) who you are. &lt;/p&gt;

&lt;p&gt;Nobody will find your next big idea if you stay in your chair and stare at your screen. Engage with someone. Don’t be afraid to build some sort of online community presence. Tweet, post, blog or vlog about anything in line with your experience in order to prepare your audience (or the world for that matter) for whatever you will eventually come up with. The topic of audience building is massive and exhaustively covered by others, so please do further reading on this topic. It is also worth mentioning that many marketing tools (like Narrow for instance) specialize in growing your audience and other tools (like Hootsuite) help you manage many different social media accounts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3) Get people interested in your own personal brand&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Give people a taste of your work. Start a page with signup opportunities so that people can get notified when you have something new and exciting to say (find a great article about landing pages &lt;a href="https://www.toptal.com/designers/landing-page-designers/effective-landing-page-design"&gt;here&lt;/a&gt;). Don’t hold back for that one next big thing. Small snowballs turn into gigantic ones when they keep rolling. Be a snowball. Everything you work on is connected to the next (I based this on the idea from this great &lt;a href="http://www.fullstackradio.com/42"&gt;podcast&lt;/a&gt;). You might never get to see your biggest work if you don’t do the smaller ones in between. Who knows, one of those might line up to be your unicorn starter. Perceptions change, views shift, interests continually develop, and it is important to take your audience along on this journey. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4) Always communicate your true self&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Stay in touch and communicate your true self to your audience. The biggest part of your success is other people, and once you have people looking at you, you can get up on your own stage and tell them what they want to hear from you. Being true to yourself and your ideas goes a long way, and people respond well to someone like this. If you have to fake it to make it you will probably falter at some point, so rather just stick to who you are and present that to the world.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5) Learn how to leverage applicable parts of a marketing stack&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A full-fledged marketing stack is a beast and something that you as a lone developer will most likely never invest in. There are, however, some parts that you can leverage and do really well in order to market your service or application. I think one thing that we sometimes miss is considering SEO from the very beginning. Admittedly I’ve always forgotten about this, and I found &lt;a href="https://moz.com/blog/seo-cheat-sheet"&gt;this article&lt;/a&gt; really helpful. Something that ties in with SEO directly but is also often forgotten is &lt;a href="https://wuhcag.com/web-content-accessibility-guidelines/"&gt;accessibility&lt;/a&gt;. This makes sure that you reach as many people as possible. Another very powerful tool that lies at the base of most marketing stacks is, of course, Google Analytics. The resources for learning how to leverage Google Analytics properly are endless (I found &lt;a href="https://www.syscomminternational.com/tracking-traffic-website-going-change-business-strategies/"&gt;this&lt;/a&gt; to be a useful initial review). We all know how important usage data is, and this will give you an endless stream of data to evolve and improve your application for maximum impact.&lt;/p&gt;

&lt;h4&gt;
  
  
  Conclusion
&lt;/h4&gt;

&lt;p&gt;Maybe one day someone will write the plugin/service that I joked about at the beginning of this article. Until that becomes a reality, however, it will take hard work and altered thinking to align yourself for the possibility of greatness.&lt;/p&gt;


</description>
      <category>marketing</category>
      <category>careerownership</category>
      <category>career</category>
    </item>
  </channel>
</rss>
