<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Bill Schneider</title>
    <description>The latest articles on Forem by Bill Schneider (@wrschneider).</description>
    <link>https://forem.com/wrschneider</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1828%2FIMG_8800_400x400_1_.JPG</url>
      <title>Forem: Bill Schneider</title>
      <link>https://forem.com/wrschneider</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/wrschneider"/>
    <language>en</language>
    <item>
      <title>Readability with break statements</title>
      <dc:creator>Bill Schneider</dc:creator>
      <pubDate>Tue, 06 Nov 2018 19:52:06 +0000</pubDate>
      <link>https://forem.com/wrschneider/readability-with-break-statements-9b0</link>
      <guid>https://forem.com/wrschneider/readability-with-break-statements-9b0</guid>
      <description>

&lt;p&gt;There is a widespread belief that &lt;code&gt;break&lt;/code&gt; and &lt;code&gt;continue&lt;/code&gt; statements are bad programming practice.  &lt;a href="https://softwareengineering.stackexchange.com/questions/58237/are-break-and-continue-bad-programming-practices"&gt;See this Software Engineering Stack Exchange thread for a discussion.&lt;/a&gt;  They can make code less readable when they obscure the intent.  In some cases, though, I believe they are actually better than the alternatives and can improve readability.&lt;/p&gt;

&lt;p&gt;Here's a recent example from working with AWS in Python.&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ssm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ssm"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ssm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;send_command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;command_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"Command"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;"CommandId"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;invocation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ssm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_command_invocation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="n"&gt;command_id&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;invocation&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Status'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'InProgress'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;break&lt;/span&gt;
  &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



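&lt;p&gt;This code sends a command through AWS Systems Manager (SSM), then polls the status of the command invocation in a loop, exiting once the status is no longer &lt;code&gt;Pending&lt;/code&gt; or &lt;code&gt;InProgress&lt;/code&gt;.  It also pauses for five seconds before each retry.&lt;/p&gt;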

&lt;p&gt;Note that the only way to terminate this loop is the &lt;code&gt;break&lt;/code&gt; statement.&lt;/p&gt;
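&lt;p&gt;For readers who want to try the &lt;code&gt;while True&lt;/code&gt; / &lt;code&gt;break&lt;/code&gt; pattern without an AWS account, here is a minimal runnable sketch.  It assumes the command targets a single instance, because boto3's &lt;code&gt;get_command_invocation&lt;/code&gt; takes both &lt;code&gt;CommandId&lt;/code&gt; and &lt;code&gt;InstanceId&lt;/code&gt;.  &lt;code&gt;wait_for_command&lt;/code&gt; and the &lt;code&gt;FakeSSM&lt;/code&gt; stub are hypothetical names; in real use you would pass the client from &lt;code&gt;boto3.client("ssm")&lt;/code&gt; instead of the stub.&lt;/p&gt;

```python
import time

# Statuses that mean the command is still running (taken from the loop above).
PENDING_STATUSES = ("Pending", "InProgress")


def wait_for_command(ssm, command_id, instance_id, poll_seconds=5.0):
    """Poll until the invocation leaves a pending state, then return it."""
    while True:
        invocation = ssm.get_command_invocation(
            CommandId=command_id, InstanceId=instance_id
        )
        if invocation["Status"] not in PENDING_STATUSES:
            break  # the only way out of the loop
        time.sleep(poll_seconds)
    return invocation


class FakeSSM:
    """Hypothetical stand-in for boto3.client("ssm") that replays scripted statuses."""

    def __init__(self, statuses):
        self.statuses = list(statuses)
        self.calls = 0

    def get_command_invocation(self, CommandId, InstanceId):
        self.calls += 1
        # Advance through the script, repeating the final status once it is reached.
        status = self.statuses.pop(0) if len(self.statuses) > 1 else self.statuses[0]
        return {"CommandId": CommandId, "InstanceId": InstanceId, "Status": status}


fake = FakeSSM(["Pending", "InProgress", "Success"])
result = wait_for_command(fake, "example-command-id", "i-0123456789abcdef0", poll_seconds=0)
print(result["Status"], fake.calls)  # Success 3
```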

&lt;p&gt;I thought of a few alternatives that avoid a &lt;code&gt;break&lt;/code&gt;, but I actually like them less.&lt;/p&gt;

&lt;p&gt;One option is to bootstrap an initial request outside the loop, so you can check the status directly in the &lt;code&gt;while&lt;/code&gt; condition:&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;invocation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ssm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_command_invocation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="n"&gt;command_id&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;invocation&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Status'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'InProgress'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
  &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;invocation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ssm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_command_invocation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="n"&gt;command_id&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;I don't like this as much because the &lt;code&gt;get_command_invocation&lt;/code&gt; call is duplicated, once before the loop and once inside it.&lt;/p&gt;

&lt;p&gt;Another way to address this is by using a flag to indicate completion:&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;done&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;done&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;invocation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ssm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_command_invocation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="n"&gt;command_id&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;done&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;invocation&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Status'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'InProgress'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;No duplication here, but the flag variable is extra clutter, and is not really that much more readable than the &lt;code&gt;break&lt;/code&gt; version, because it's not obvious from the &lt;code&gt;while&lt;/code&gt; statement itself what condition will cause the loop to terminate.  Also, even if the loop completes on the first iteration, you still have to &lt;code&gt;sleep&lt;/code&gt; before exiting the loop -- unless you add a &lt;code&gt;break&lt;/code&gt;, in which case the flag is useless.&lt;/p&gt;
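&lt;p&gt;To make the last point concrete, this sketch counts how many times each version sleeps.  A scripted list of statuses and a recording function stand in for SSM and &lt;code&gt;time.sleep&lt;/code&gt; (both hypothetical stand-ins, so nothing actually waits).&lt;/p&gt;

```python
PENDING = ("Pending", "InProgress")


def poll_with_break(get_status, sleep):
    while True:
        status = get_status()
        if status not in PENDING:
            break
        sleep(5)
    return status


def poll_with_flag(get_status, sleep):
    done = False
    while not done:
        status = get_status()
        done = status not in PENDING
        sleep(5)  # runs even on the iteration that sets done = True
    return status


def count_sleeps(poll, statuses):
    """Run a polling function against scripted statuses; return how often it slept."""
    remaining = list(statuses)  # copy, so the caller's script is not consumed
    sleeps = []
    poll(lambda: remaining.pop(0), sleeps.append)
    return len(sleeps)


for script in (["Success"], ["Pending", "Success"]):
    print(script, count_sleeps(poll_with_break, script), count_sleeps(poll_with_flag, script))
# ['Success'] 0 1
# ['Pending', 'Success'] 1 2
```

&lt;p&gt;The flag version always pays one extra poll interval after the command has already finished.&lt;/p&gt;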

&lt;p&gt;Compared to the other options, the original version with the &lt;code&gt;while True / break&lt;/code&gt; feels like the least bad, even if that conclusion is unintuitive.&lt;/p&gt;


</description>
      <category>awspython</category>
    </item>
    <item>
      <title>Spring Boot listener for AWS SQS with Spring Cloud</title>
      <dc:creator>Bill Schneider</dc:creator>
      <pubDate>Tue, 18 Sep 2018 01:36:14 +0000</pubDate>
      <link>https://forem.com/wrschneider/-spring-boot-listener-for-aws-sqs-with-spring-cloud-5cjk</link>
      <guid>https://forem.com/wrschneider/-spring-boot-listener-for-aws-sqs-with-spring-cloud-5cjk</guid>
      <description>&lt;p&gt;&lt;em&gt;This originally appeared on &lt;a href="http://localhost:4000/2018/09/17/spring-boot-sqs.html"&gt;my personal blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I was surprised how little code I needed to get a Spring Boot application listening to an Amazon SQS queue.&lt;/p&gt;

&lt;p&gt;I put a &lt;a href="https://gist.github.com/wrschneider/42407cc2ea70799362cc5b044ebcfabb"&gt;gist on GitHub&lt;/a&gt; to illustrate.&lt;/p&gt;

&lt;p&gt;The key is that when you have the right dependencies in your Maven POM, all you have to do is annotate your listener method with &lt;code&gt;@SqsListener&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@SqsListener("your-queue-name")
public void listen(DataObject message) {
    LOG.info("!!!! received message {} {}", message.getFoo(), message.getBar());
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dependency on &lt;code&gt;spring-cloud-starter-aws&lt;/code&gt; takes care of initializing everything and scanning for annotated methods.&lt;/p&gt;

&lt;h4&gt;Command line arguments and authentication&lt;/h4&gt;

&lt;p&gt;You specify AWS credentials and region through Spring Boot properties.  I passed these as command line arguments in Eclipse, because I was debugging locally rather than on AWS:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--cloud.aws.region.static=us-east-1&lt;/code&gt; sets the region to US East 1 (Northern Virginia).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--cloud.aws.credentials.useDefaultAwsCredentialsChain=true&lt;/code&gt; tells Spring Boot to use the AWS SDK's &lt;code&gt;DefaultAWSCredentialsProviderChain&lt;/code&gt;,
which looks for credentials in several places, including environment variables and the &lt;code&gt;~/.aws/credentials&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;I also set the &lt;code&gt;AWS_PROFILE&lt;/code&gt; environment variable so the default credentials chain would find my credentials under the correct profile.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If I were running in AWS itself, the EC2 instance metadata could have determined the region automatically, and also provided credentials via the instance profile.&lt;/p&gt;

&lt;h4&gt;Testing via AWS console&lt;/h4&gt;

&lt;p&gt;I used the AWS console to send test messages.  The main gotcha: if you are sending JSON messages and relying on Spring to deserialize them into your objects via &lt;code&gt;@JsonProperty&lt;/code&gt; annotations, you need to set the message attribute (header) &lt;code&gt;contentType&lt;/code&gt; to &lt;code&gt;application/json&lt;/code&gt;.  Otherwise, the conversion fails with an unhelpful error message like "Cannot convert from [java.lang.String] to .... for GenericMessage ...." that gives no indication of why it failed.&lt;/p&gt;
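&lt;p&gt;If you would rather send test messages from a script than from the console, you can set the same &lt;code&gt;contentType&lt;/code&gt; attribute through the &lt;code&gt;MessageAttributes&lt;/code&gt; parameter of boto3's SQS &lt;code&gt;send_message&lt;/code&gt;.  This is a minimal sketch.  The queue URL, the &lt;code&gt;build_test_message&lt;/code&gt; helper, and the &lt;code&gt;foo&lt;/code&gt;/&lt;code&gt;bar&lt;/code&gt; JSON field names are assumptions chosen to mirror the listener example above, so adjust them to your own queue and &lt;code&gt;@JsonProperty&lt;/code&gt; names.&lt;/p&gt;

```python
import json


def build_test_message(queue_url, payload):
    """Return keyword arguments for boto3's SQS send_message with a JSON body.

    The contentType message attribute tells Spring's message conversion that the
    body is JSON, which the listener's @JsonProperty mapping relies on.
    """
    return {
        "QueueUrl": queue_url,
        "MessageBody": json.dumps(payload),
        "MessageAttributes": {
            "contentType": {"DataType": "String", "StringValue": "application/json"},
        },
    }


# Hypothetical queue URL and field names; match them to your queue and DataObject.
kwargs = build_test_message(
    "https://sqs.us-east-1.amazonaws.com/123456789012/your-queue-name",
    {"foo": "hello", "bar": "world"},
)
print(kwargs["MessageAttributes"])

# To actually send it (needs boto3 installed and AWS credentials configured):
#   import boto3
#   boto3.client("sqs", region_name="us-east-1").send_message(**kwargs)
```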

&lt;p&gt;Alternatively, you can &lt;a href="http://cloud.spring.io/spring-cloud-static/spring-cloud-aws/2.0.0.RELEASE/multi/multi__messaging.html#_consuming_aws_event_messages_with_amazon_sqs"&gt;reconfigure the default Spring messaging classes&lt;/a&gt; to ignore the &lt;code&gt;contentType&lt;/code&gt; header.&lt;/p&gt;

</description>
      <category>aws</category>
    </item>
    <item>
      <title>Spark UDFs to migrate from other SQL dialects</title>
      <dc:creator>Bill Schneider</dc:creator>
      <pubDate>Mon, 08 Jan 2018 15:20:24 +0000</pubDate>
      <link>https://forem.com/wrschneider/spark-udfs-to-migrate-from-other-sql-dialects-1mg6</link>
      <guid>https://forem.com/wrschneider/spark-udfs-to-migrate-from-other-sql-dialects-1mg6</guid>
      <description>&lt;p&gt;&lt;em&gt;This article originally appeared &lt;a href="http://wrschneider.github.io/2017/12/05/spark-udf-for-sql.html" rel="noopener noreferrer"&gt;on my blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I found it helpful to create Spark UDFs to make it easier to migrate logic in SQL from another database like SQL Server.&lt;/p&gt;

&lt;p&gt;SQL Server defines several string functions, such as &lt;code&gt;LEN&lt;/code&gt;, &lt;code&gt;REPLACE&lt;/code&gt;, and &lt;code&gt;CHARINDEX&lt;/code&gt;, that are not available in Spark by default.  Fortunately, these are easy to implement in Spark with UDFs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spark.udf.register("len", (s: String) =&amp;gt; s.length())
spark.udf.register("replace", (orig: String, toReplace: String, replaceString: String) =&amp;gt; orig.replace(toReplace, replaceString))
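// CHARINDEX positions are 1-based (0 = not found); indexOf is 0-based (-1 = not found),
// and indexOf treats a negative fromIndex as 0, which covers a startPos of 0.
spark.udf.register("charindex", (substring: String, str: String, startPos: Int) =&amp;gt; str.indexOf(substring, startPos - 1) + 1)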
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are all thin wrappers around standard Java/Scala string methods.  They are now available for use in Spark SQL queries:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select len('foo bar') as len_test,
     replace('foo bar baz', 'bar', 'quux') as replace_test,
     charindex('.', '1.2.3', 3) as idx_should_be_4,
     charindex('.', '1.2.3', 0) as idx_should_be_2,
     charindex('@', '1.2.3', 0) as idx_should_be_0  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
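&lt;p&gt;The off-by-one bookkeeping in &lt;code&gt;charindex&lt;/code&gt; is the easiest part to get wrong, so here is a plain-Python model of the same arithmetic (not Spark code), checked against the expected values in the query above.  One difference to note: Python's &lt;code&gt;str.find&lt;/code&gt; treats a negative start as an offset from the end of the string, whereas Java's &lt;code&gt;indexOf&lt;/code&gt; treats it as 0, so the model clamps the start position explicitly.&lt;/p&gt;

```python
def charindex(substring: str, s: str, start_pos: int) -> int:
    """Model of the Scala UDF: 1-based result, 0 when not found.

    start_pos is 1-based as in SQL Server; values of 0 or below search from the start.
    """
    # Python's find treats negative starts as offsets from the end, so clamp at 0.
    return s.find(substring, max(start_pos - 1, 0)) + 1


def sql_len(s: str) -> int:
    return len(s)


def sql_replace(orig: str, to_replace: str, replacement: str) -> str:
    return orig.replace(to_replace, replacement)


results = {
    "len_test": sql_len("foo bar"),
    "replace_test": sql_replace("foo bar baz", "bar", "quux"),
    "idx_should_be_4": charindex(".", "1.2.3", 3),
    "idx_should_be_2": charindex(".", "1.2.3", 0),
    "idx_should_be_0": charindex("@", "1.2.3", 0),
}
print(results)
```

&lt;p&gt;The model also surfaces a semantic gap: SQL Server's &lt;code&gt;LEN&lt;/code&gt; excludes trailing spaces, while &lt;code&gt;s.length()&lt;/code&gt; (like Python's &lt;code&gt;len&lt;/code&gt;) counts them.  Queries that depend on trailing blanks may need a trimmed variant.&lt;/p&gt;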



&lt;p&gt;This makes it a little easier to copy-paste queries from SQL Server to Spark, if the syntax is otherwise standard.&lt;/p&gt;

&lt;p&gt;The one downside is that Spark UDFs are functions, not methods, and as such &lt;a href="https://stackoverflow.com/questions/25234682/in-scala-can-you-make-an-anonymous-function-have-a-default-argument" rel="noopener noreferrer"&gt;do not allow for default argument values&lt;/a&gt;.  So you would have to explicitly add a &lt;code&gt;0&lt;/code&gt; as the third argument for &lt;code&gt;CHARINDEX&lt;/code&gt; wherever it's missing.&lt;/p&gt;

</description>
      <category>spark</category>
      <category>scala</category>
    </item>
    <item>
      <title>EC2 proxy to RDS for a static IP address</title>
      <dc:creator>Bill Schneider</dc:creator>
      <pubDate>Fri, 05 Jan 2018 18:58:42 +0000</pubDate>
      <link>https://forem.com/wrschneider/ec2-proxy-to-rds-for-a-static-ip-address-11i8</link>
      <guid>https://forem.com/wrschneider/ec2-proxy-to-rds-for-a-static-ip-address-11i8</guid>
      <description>&lt;p&gt;&lt;em&gt;This post originally appeared &lt;a href="http://wrschneider.github.io/2017/12/18/rds-ec2-proxy.html"&gt;on my blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;RDS instances in AWS do not get a static IP address.  This is usually a good thing, not a problem: it preserves availability while the physical RDS host shifts around for resizing or fails over to a different availability zone (AZ).  In either case, clients connect to RDS by hostname, and AWS updates the DNS record behind that hostname to point at the IP address of the currently active host.&lt;/p&gt;

&lt;p&gt;The only time this creates a challenge is when you want to connect to RDS from a private/corporate network and have to update firewall or VPN tunnel configuration to allow connections to RDS.  If this isn't an issue for you, you can stop reading here.&lt;/p&gt;

&lt;p&gt;The problem is firewall rules, VPN tunnels, and NAT rules all work on IP addresses, not hostnames.  You can't configure your firewall to unblock traffic to an RDS instance if you don't know its IP address.  &lt;/p&gt;

&lt;p&gt;The workaround I found was to put an EC2 server in front of RDS as a TCP proxy.  You can give a static IP address to an EC2 instance with the &lt;code&gt;PrivateIpAddress&lt;/code&gt; property of &lt;code&gt;AWS::EC2::Instance&lt;/code&gt;, or with an &lt;code&gt;AWS::EC2::EIPAssociation&lt;/code&gt; resource  for a static publicly-routed IP.  Then you use that EC2 instance to forward traffic on to the RDS instance by hostname.  The EC2 instance's IP address then becomes the database's static IP for firewall purposes.&lt;/p&gt;

&lt;p&gt;There are many different ways to forward traffic from EC2 to RDS.  You can pick whichever one suits you best:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;socat&lt;/code&gt;: e.g., &lt;code&gt;socat TCP-LISTEN:[port],fork,reuseaddr TCP:[hostname]:[port]&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pros: simple and convenient, easy to install with &lt;code&gt;yum install socat&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;Cons: not widely known; forks a process per connection so not good for high volume&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;

&lt;p&gt;&lt;code&gt;haproxy&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pros: Robust, scalable&lt;/li&gt;
&lt;li&gt;Cons: Amazon Linux packages &lt;a href="https://stackoverflow.com/questions/37520737/get-or-install-haproxy-1-6-on-amazon-linux-only-comes-with-1-5-in-epel"&gt;do not include the latest version with runtime DNS resolution&lt;/a&gt;; it must be built from source, and even then requires some contortions to resolve hostnames at runtime.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;

&lt;p&gt;&lt;code&gt;nginx&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pros: you might already be using it&lt;/li&gt;
&lt;li&gt;Cons: overkill for a port forwarder&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;

&lt;p&gt;&lt;code&gt;ssh&lt;/code&gt; port forwarding&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pros: widely understood, ssh/sshd already installed by default&lt;/li&gt;
&lt;li&gt;Cons: requires establishing and authenticating an SSH connection, which is overkill when you &lt;em&gt;only&lt;/em&gt;   want port forwarding&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;
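&lt;p&gt;To show what these tools do at the TCP level, here is a minimal Python &lt;code&gt;asyncio&lt;/code&gt; forwarder that behaves roughly like the &lt;code&gt;socat&lt;/code&gt; command above.  It is a sketch, not a production proxy: it has no connection limits, timeouts, or logging, and the port numbers and RDS hostname in the final comment are hypothetical.  The property that matters for RDS is that it resolves the target hostname again for every new client connection, so a changed DNS answer after a failover is picked up by the next connection.&lt;/p&gt;

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes in one direction until EOF, then half-close the destination."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
        if writer.can_write_eof():
            writer.write_eof()
    except OSError:
        pass  # peer reset or went away; the handler closes both sides below


def make_handler(target_host: str, target_port: int):
    async def handle(client_reader, client_writer):
        # open_connection resolves target_host again for every client connection,
        # so a changed DNS answer (e.g. after an RDS failover) is used next time.
        try:
            up_reader, up_writer = await asyncio.open_connection(target_host, target_port)
        except OSError:
            client_writer.close()
            return
        try:
            await asyncio.gather(pipe(client_reader, up_writer), pipe(up_reader, client_writer))
        finally:
            up_writer.close()
            client_writer.close()

    return handle


async def serve(listen_port: int, target_host: str, target_port: int) -> None:
    server = await asyncio.start_server(
        make_handler(target_host, target_port), host="0.0.0.0", port=listen_port
    )
    async with server:
        await server.serve_forever()


async def demo_roundtrip(payload: bytes) -> bytes:
    """Local check: a one-shot echo server behind the proxy, both on loopback."""

    async def echo(reader, writer):
        writer.write(await reader.read(65536))
        await writer.drain()
        writer.close()

    echo_server = await asyncio.start_server(echo, "127.0.0.1", 0)
    echo_port = echo_server.sockets[0].getsockname()[1]
    proxy = await asyncio.start_server(make_handler("127.0.0.1", echo_port), "127.0.0.1", 0)
    proxy_port = proxy.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", proxy_port)
    writer.write(payload)
    await writer.drain()
    reply = await reader.read()  # reads until the proxy passes on the echo server's EOF
    writer.close()
    proxy.close()
    echo_server.close()
    return reply


# Hypothetical deployment on the EC2 instance, forwarding to a PostgreSQL RDS endpoint:
# asyncio.run(serve(5432, "mydb.example.us-east-1.rds.amazonaws.com", 5432))
```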

&lt;p&gt;&lt;em&gt;On security group setup&lt;/em&gt;: Put the EC2 instance and RDS instance in two different security groups, and then have those security groups refer to each other.  This is a perfect use case for CloudFormation's &lt;code&gt;AWS::EC2::SecurityGroupIngress&lt;/code&gt; and &lt;code&gt;AWS::EC2::SecurityGroupEgress&lt;/code&gt; resources ("typically to allow security groups to reference each other").  Since you don't know the RDS instance's IP address, you refer to the RDS instance's security group instead.  The EC2 security group gets an egress rule to the RDS security group, and the RDS security group gets a matching ingress rule from the EC2 security group.&lt;/p&gt;

&lt;p&gt;It's a good idea to otherwise lock down the EC2 security group.  The EC2 instance should only allow outbound access to the target RDS instance, and DNS for hostname resolution.  The RDS instance should only allow inbound access through the EC2 proxy.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Other CloudFormation tips&lt;/em&gt;: Keep the EC2 instance in a stack by itself so it can be rebuilt independently.  You can make cross-stack references to the RDS instance from the EC2 stack.  Also, you can put a script in the &lt;code&gt;UserData&lt;/code&gt; property in EC2 to inject the RDS hostname (from the cross-stack reference) into Upstart config files (&lt;code&gt;/etc/init/your-proxy-service.conf&lt;/code&gt;) so your proxy service will start automatically on boot and refer to the correct hostname.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
    </item>
    <item>
      <title>Learning Scala for Spark, and the apply method</title>
      <dc:creator>Bill Schneider</dc:creator>
      <pubDate>Wed, 27 Dec 2017 14:55:36 +0000</pubDate>
      <link>https://forem.com/wrschneider/learning-scala-for-spark-and-the-apply-method-gjf</link>
      <guid>https://forem.com/wrschneider/learning-scala-for-spark-and-the-apply-method-gjf</guid>
      <description>&lt;p&gt;&lt;em&gt;This article originally appeared &lt;a href="http://wrschneider.github.io/2017/12/12/scala-spark-apply.html"&gt;on my blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Sometimes in Spark you will see code like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;val df1 = ...
val df2 = ...
val df3 = df1.join(df2, df1("col") === df2("col"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is a little odd at first to call &lt;code&gt;DataFrame&lt;/code&gt; objects as if they were functions.&lt;/p&gt;

&lt;p&gt;What's going on here?  &lt;/p&gt;

&lt;p&gt;In Scala, any object that defines an &lt;code&gt;apply&lt;/code&gt; method can be invoked with function-call syntax: &lt;code&gt;obj(foo)&lt;/code&gt; is shorthand for &lt;code&gt;obj.apply(foo)&lt;/code&gt;.  DataFrame's &lt;code&gt;apply&lt;/code&gt; method is the same as &lt;code&gt;col&lt;/code&gt;, so &lt;code&gt;df("col")&lt;/code&gt; is equivalent to &lt;code&gt;df.col("col")&lt;/code&gt;.&lt;/p&gt;
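&lt;p&gt;If you are coming from Python, the closest analogue to Scala's &lt;code&gt;apply&lt;/code&gt; is the &lt;code&gt;__call__&lt;/code&gt; method: &lt;code&gt;obj(foo)&lt;/code&gt; runs &lt;code&gt;obj.__call__(foo)&lt;/code&gt;.  The toy &lt;code&gt;Frame&lt;/code&gt; and &lt;code&gt;Column&lt;/code&gt; classes below are hypothetical and only mimic the shape of Spark's API.  As with &lt;code&gt;DataFrame.apply&lt;/code&gt;, the call syntax simply delegates to &lt;code&gt;col&lt;/code&gt;.&lt;/p&gt;

```python
class Column:
    """Hypothetical stand-in for a Spark Column: just remembers its name."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Column({self.name!r})"


class Frame:
    """Hypothetical stand-in for a DataFrame; not Spark's API."""

    def __init__(self, columns):
        self.columns = list(columns)

    def col(self, name):
        if name not in self.columns:
            raise KeyError(f"no column named {name!r}")
        return Column(name)

    def __call__(self, name):
        # Python's counterpart to Scala's apply: frame("x") runs frame.__call__("x").
        return self.col(name)


df1 = Frame(["col", "other"])
print(df1("col"), df1.col("col"))  # Column('col') Column('col')
```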

&lt;p&gt;This is also related to why you can create instances of case classes without &lt;code&gt;new&lt;/code&gt;: the compiler generates a companion object with the same name as the case class, and that companion object has an &lt;code&gt;apply&lt;/code&gt; method that returns &lt;code&gt;new ClassName(...)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Personally, I haven't learned to like Scala's &lt;code&gt;apply&lt;/code&gt; feature, because it's not always obvious what &lt;code&gt;obj(foo)&lt;/code&gt; is supposed to do.  But in this case, it makes sense to have shortcuts like that when I'm thinking of Scala as a DSL for Spark.&lt;/p&gt;

</description>
      <category>spark</category>
      <category>scala</category>
    </item>
    <item>
      <title>Readability analogy in music</title>
      <dc:creator>Bill Schneider</dc:creator>
      <pubDate>Sat, 16 Dec 2017 12:33:01 +0000</pubDate>
      <link>https://forem.com/wrschneider/readability-analogy-in-music-48</link>
      <guid>https://forem.com/wrschneider/readability-analogy-in-music-48</guid>
      <description>&lt;p&gt;&lt;em&gt;This article originally appeared &lt;a href="http://wrschneider.github.io/2017/12/11/readability-music.html"&gt;on my blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In music, you can often write the same note two different ways; for example, B-flat and A-sharp correspond to the same key on a piano keyboard.  Which spelling you use depends on the surrounding context.  A chord C/E/G/B-flat is a C dominant 7th and resolves to an F chord.  The same chord written as C/E/G/A-sharp is an augmented 6th and resolves to B major.  So the way the chord is written tells you something about where it's going next.&lt;/p&gt;

&lt;p&gt;The other day, I saw music with an augmented 6th chord written as a dominant 7th, and I found it confusing to look at a sequence of notes like A-natural, A-flat, A-natural.  Given the first two notes in that sequence, you would usually expect the third note to be G.&lt;/p&gt;

&lt;p&gt;So what does this have to do with code?&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Readability matters.&lt;/em&gt;  &lt;/p&gt;

&lt;p&gt;With code, you can often get the same end result multiple ways.  It's important for your code to look like what it does, so anyone reading it will be able to understand it.  Since we spend far more time reading code than writing it (Uncle Bob puts the ratio at well over 10 to 1, which means roughly 90% of our time or more), focusing on readability will improve productivity.&lt;/p&gt;

&lt;p&gt;Poorly named methods are the equivalent of that A-flat that should have been written as a G-sharp: it will sound (or work) the same, but you're making the reader work harder than they should have to.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>readability</category>
    </item>
    <item>
      <title>Learning Scala for Spark, or, what's up with that triple equals?</title>
      <dc:creator>Bill Schneider</dc:creator>
      <pubDate>Mon, 11 Dec 2017 22:48:34 +0000</pubDate>
      <link>https://forem.com/wrschneider/learning-scala-for-spark-or-whats-up-with-that-triple-equals-2m5</link>
      <guid>https://forem.com/wrschneider/learning-scala-for-spark-or-whats-up-with-that-triple-equals-2m5</guid>
      <description>&lt;p&gt;&lt;em&gt;This article originally appeared &lt;a href="http://wrschneider.github.io/2017/09/24/spark-triple-equals.html"&gt;on my blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I began to learn Scala specifically to work with Spark.  The sheer number of language features in Scala can be overwhelming, so I find it useful to learn Scala features one by one, in the context of specific use cases.  In a sense, I'm treating Scala like a DSL for writing Spark jobs.&lt;/p&gt;

&lt;p&gt;Let's pick apart a simple fragment of Spark-Scala code: &lt;code&gt;dataFrame.filter($"age" === 21)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;There are a few things going on here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;$"age"&lt;/code&gt; creates a Spark &lt;code&gt;Column&lt;/code&gt; object referencing the column named &lt;code&gt;age&lt;/code&gt; within in a dataframe.  The &lt;code&gt;$&lt;/code&gt; operator is defined in an implicit class &lt;a href="https://spark.apache.org/docs/2.1.1/api/java/index.html?org/apache/spark/sql/SQLImplicits.StringToColumn.html"&gt;&lt;code&gt;StringToColumn&lt;/code&gt;&lt;/a&gt;.  Implicit classes are a similar concept to C# extension methods or mixins in other dynamic languages.  The &lt;code&gt;$&lt;/code&gt; operator is like a method added on to the &lt;code&gt;StringContext&lt;/code&gt; class.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scala's standard library does not define a triple equals operator &lt;code&gt;===&lt;/code&gt;.  Libraries such as ScalaTest and Cats use it for type-safe equality, loosely analogous to JavaScript's strict equality.  Spark defines its own &lt;code&gt;===&lt;/code&gt; as a method on &lt;code&gt;Column&lt;/code&gt;.  It returns a new &lt;code&gt;Column&lt;/code&gt; expression that compares the &lt;code&gt;Column&lt;/code&gt; on the left with the value on the right, and that expression evaluates to a boolean for each row.  Because &lt;a href="https://stackoverflow.com/questions/7681183/how-can-i-define-a-custom-equality-operation-that-will-be-used-by-immutable-set"&gt;double-equals (&lt;code&gt;==&lt;/code&gt;) cannot be overridden&lt;/a&gt; (it is final in Scala and always returns a &lt;code&gt;Boolean&lt;/code&gt;), Spark &lt;em&gt;must&lt;/em&gt; use a different operator such as triple equals.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;dataFrame.filter&lt;/code&gt; method takes an argument of &lt;code&gt;Column&lt;/code&gt;, which defines the comparison to apply to the rows in the &lt;code&gt;DataFrame&lt;/code&gt;.  Only rows that match the condition will be included in the resulting &lt;code&gt;DataFrame&lt;/code&gt;.  &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that the actual comparison is not performed when the above line of code executes!  Spark methods like &lt;code&gt;filter&lt;/code&gt; and &lt;code&gt;select&lt;/code&gt;, along with the &lt;code&gt;Column&lt;/code&gt; objects passed to them, are lazy.  You can think of a DataFrame as a query builder, where each call builds up a plan for what Spark will do later, when an action like &lt;code&gt;show&lt;/code&gt; or &lt;code&gt;write&lt;/code&gt; is invoked.  It's similar in concept to &lt;code&gt;IQueryable&lt;/code&gt; in LINQ, where &lt;code&gt;foo.Where(row =&amp;gt; row.Age == 21)&lt;/code&gt; builds up a plan and an expression tree that is later translated to SQL when rows must be fetched, e.g., when &lt;code&gt;ToList()&lt;/code&gt; is called.&lt;/p&gt;
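&lt;p&gt;Python makes an interesting contrast.  It lets a class override &lt;code&gt;==&lt;/code&gt; via &lt;code&gt;__eq__&lt;/code&gt; and return any object, which is why PySpark can write &lt;code&gt;df.age == 21&lt;/code&gt; where Scala needs &lt;code&gt;===&lt;/code&gt;.  The toy classes below (hypothetical, not Spark's implementation) show both ideas from this post in miniature.  The comparison operator builds an expression instead of returning a boolean, and &lt;code&gt;filter&lt;/code&gt; only records that expression in a plan until &lt;code&gt;collect&lt;/code&gt; runs it.&lt;/p&gt;

```python
class Expr:
    """A deferred expression: a function that maps a row to a value, plus a label."""

    def __init__(self, fn, text):
        self.fn = fn
        self.text = text

    def eval(self, row):
        return self.fn(row)


class Col(Expr):
    def __init__(self, name):
        super().__init__(lambda row: row[name], name)

    def __eq__(self, other):
        # Returns a new expression rather than a bool, like Spark's Column.===
        return Expr(lambda row: self.eval(row) == other, f"({self.text} = {other!r})")


class LazyFrame:
    def __init__(self, rows, plan=()):
        self.rows = rows
        self.plan = tuple(plan)

    def filter(self, condition):
        # No rows are touched here; the condition is only appended to the plan.
        return LazyFrame(self.rows, self.plan + (condition,))

    def collect(self):
        # The "action": only now are the planned conditions evaluated.
        result = list(self.rows)
        for condition in self.plan:
            result = [row for row in result if condition.eval(row)]
        return result


people = LazyFrame([{"name": "ann", "age": 21}, {"name": "bob", "age": 30}])
only_21 = people.filter(Col("age") == 21)
print([c.text for c in only_21.plan])  # ['(age = 21)']
print(only_21.collect())               # [{'name': 'ann', 'age': 21}]
```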

</description>
      <category>spark</category>
      <category>scala</category>
    </item>
    <item>
      <title>Measuring AWS Redshift Query Compile Latency</title>
      <dc:creator>Bill Schneider</dc:creator>
      <pubDate>Mon, 18 Sep 2017 15:14:45 +0000</pubDate>
      <link>https://forem.com/wrschneider/measuring-aws-redshift-query-compile-latency</link>
      <guid>https://forem.com/wrschneider/measuring-aws-redshift-query-compile-latency</guid>
      <description>

&lt;p&gt;&lt;em&gt;This article originally &lt;a href="http://wrschneider.github.io/2017/06/02/redshift-compile-latency.html"&gt;appeared on my blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;AWS is transparent that Redshift's distributed architecture entails a &lt;a href="http://docs.aws.amazon.com/redshift/latest/dg/c-query-performance.html"&gt;fixed cost every time a new query is issued&lt;/a&gt;.  The documentation says the impact "might be especially noticeable when you run one-off (ad hoc) queries."&lt;/p&gt;

&lt;p&gt;I went deeper to try to quantify exactly what "noticeable" means.&lt;/p&gt;

&lt;p&gt;To isolate the impacts of data cache hits/misses from query compilation, I ran a bunch of queries on empty tables so there is no data to load or cache. Each query was slightly modified to trigger a recompilation, by changing the columns or aggregate functions.&lt;/p&gt;

&lt;p&gt;I found that the compile latency scales with the complexity of the query.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple query: usually 1 to 1.5 seconds, with an outlier around 3 seconds.  Example of a simple query:&lt;/li&gt;
&lt;/ul&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select sum(a1) from foo where a2 = 1;
select sum(a2) from foo where a3 = 1;
-- etc.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;More complex query with more conditions, and group-by: usually around 2-3 seconds.  Example of a query in this category:&lt;/li&gt;
&lt;/ul&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select a8, a9, sum(a1), sum(a2)
from foo
where foo.a3 &amp;gt; 10 and foo.a4 &amp;lt; foo.a5
group by a8, a9;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Even more complex, with joins and group-by: averaging around 5 seconds, ranging from 3 to 7 seconds.  Example query:&lt;/li&gt;
&lt;/ul&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select s, s2, count(a6), sum(a7)
from foo
join bar on bar.a = foo.a6
join baz on baz.b = foo.a7
where foo.a3 = 1 and baz.s2 is not null
group by s, s2;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
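&lt;p&gt;The post does not show the timing harness, so here is a hedged Python sketch of one way to reproduce the experiment.  It generates distinct variants of the simple query, changing the summed and filtered columns as described above, and times each one through any DB-API connection.  The table &lt;code&gt;foo&lt;/code&gt; with columns &lt;code&gt;a1&lt;/code&gt; through &lt;code&gt;a9&lt;/code&gt; and the helper names are assumptions.  The demo at the end uses an in-memory SQLite database only so the code runs anywhere; SQLite has no comparable compile step, so its timings say nothing about Redshift.&lt;/p&gt;

```python
import sqlite3
import time


def simple_query_variants(n_columns=9):
    """Distinct 'simple' queries: each sums one column and filters on another.

    Changing the column names changes the query text, which (as the post
    describes) is enough to make Redshift compile a new plan for each query.
    """
    queries = []
    for agg_col in range(1, n_columns + 1):
        filter_col = agg_col % n_columns + 1  # next column, wrapping the last back to a1
        queries.append(f"select sum(a{agg_col}) from foo where a{filter_col} = 1")
    return queries


def time_queries(conn, queries):
    """Execute each query once on a DB-API connection; return wall-clock seconds per query."""
    cur = conn.cursor()
    timings = []
    for sql in queries:
        start = time.perf_counter()
        cur.execute(sql)
        cur.fetchall()  # include fetching the (single-row) result
        timings.append(time.perf_counter() - start)
    return timings


# Demo only: SQLite stands in for Redshift so the harness runs locally.
demo = sqlite3.connect(":memory:")
demo.execute("create table foo (" + ", ".join(f"a{i} integer" for i in range(1, 10)) + ")")
demo_queries = simple_query_variants()
demo_timings = time_queries(demo, demo_queries)
print([round(t, 4) for t in demo_timings])
```

&lt;p&gt;Against Redshift you would pass a real connection instead, for example one from &lt;code&gt;psycopg2.connect(host=..., port=5439, dbname=..., user=..., password=...)&lt;/code&gt;.  Because each variant has different SQL text, result caching should not short-circuit these measurements.  Running &lt;code&gt;SET enable_result_cache_for_session TO off&lt;/code&gt; first removes any doubt.&lt;/p&gt;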




</description>
      <category>awsredshiftbigdata</category>
    </item>
    <item>
      <title>What does agile development have in common with amateur theater?</title>
      <dc:creator>Bill Schneider</dc:creator>
      <pubDate>Mon, 18 Sep 2017 00:52:37 +0000</pubDate>
      <link>https://forem.com/wrschneider/what-does-agile-development-have-in-common-with-amateur-theater</link>
      <guid>https://forem.com/wrschneider/what-does-agile-development-have-in-common-with-amateur-theater</guid>
      <description>

&lt;p&gt;&lt;em&gt;This article originally appeared &lt;a href="http://wrschneider.github.io/2017/09/16/musical-theater-and-agile-development.html"&gt;on my blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Working in an agile development environment, I noticed some parallels to my experiences with student theater several decades ago.&lt;/p&gt;

&lt;p&gt;In both cases, &lt;em&gt;you never have enough time to get your production / release perfect&lt;/em&gt;.  In theater, your dates are fixed in advance, and you work within that constraint.  Your production has to be "releasable" in the sense that you are expected to perform a whole show start-to-finish, and you have to accept that some fine-tuning just won't get done.  You have some flexibility on how elaborate you make your staging, scenery, costumes, etc., and you do the best you can with the resources and the time you have.  In software, you commit to a release schedule, and you scope your releases to what you can get done within that schedule.  It's better to drop features than to delay the release.&lt;/p&gt;

&lt;p&gt;Another common concept is &lt;em&gt;progressive refinement&lt;/em&gt;.  The idea is to build the big picture in broad strokes first, then come back to fill in details. (Think about how progressive JPEGs look while downloading.)  The first thing you do when you start rehearsals is read through the whole script, start to finish.  No matter what, everyone has to know their lines and music.  Then you add staging a bit at a time.  Early in the rehearsal period you could already put on a minimalist performance: you wouldn't have full staging or sets, but it would be &lt;em&gt;something&lt;/em&gt;.  In software, this is like defining the full product vision, then building out enough critical features to release an early MVP.&lt;/p&gt;

&lt;p&gt;On the people side, theater productions tend to have distinct roles.  Producers are responsible for publicity and ticket sales, directors and music directors are responsible for making sure performers know where to stand and how to sound, etc.  These roughly correspond to software team roles like product manager, development manager, and technical lead -- there is a separation of responsibility between commercial success (producer/product manager) and the day-to-day management of rehearsals/development (directors/managers).  In an amateur/student group people usually wear multiple hats, similar to an agile or startup environment--everyone is a stakeholder in overall success and pitches in where needed--but there are clear affinities.    &lt;/p&gt;

&lt;p&gt;Finally, on people management: it is important to have the right people on the team.  In both cases, teams that are excited to work together will feed off each other and can outperform the sum of their parts.  On the flip side, someone who looks great on paper might not be a good fit for your organization, and will often disappoint.  In both cases, I learned this the hard way.&lt;/p&gt;


</description>
      <category>agileteamsculture</category>
    </item>
    <item>
      <title>Opinions on truthiness across languages</title>
      <dc:creator>Bill Schneider</dc:creator>
      <pubDate>Thu, 20 Jul 2017 13:48:22 +0000</pubDate>
      <link>https://forem.com/wrschneider/opinions-on-truthiness-across-languages</link>
      <guid>https://forem.com/wrschneider/opinions-on-truthiness-across-languages</guid>
      <description>&lt;p&gt;&lt;em&gt;A version of this article originally appeared on &lt;a href="http://wrschneider.github.io"&gt;my GitHub pages blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Different languages have different opinions about what to treat as "truthy" or "falsy" when using a non-boolean object as an expression inside an &lt;code&gt;if&lt;/code&gt; statement.&lt;/p&gt;

&lt;p&gt;I looked at Python, Groovy, Javascript and Ruby to compare their differences.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Null is always falsy&lt;/li&gt;
&lt;li&gt;Zero and empty strings are falsy, except in Ruby&lt;/li&gt;
&lt;li&gt;Empty collections (set/list/dict) are falsy in Python and Groovy but not Javascript or Ruby&lt;/li&gt;
&lt;/ul&gt;
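&lt;p&gt;To make these three rules concrete, here is a small Python sketch.  The helpers &lt;code&gt;python_truthy&lt;/code&gt;, &lt;code&gt;ruby_truthy&lt;/code&gt; and &lt;code&gt;js_truthy&lt;/code&gt; are hypothetical names used only for illustration; the Ruby and Javascript versions merely model those languages' rules for a few stand-in values (with &lt;code&gt;None&lt;/code&gt; playing the role of null/nil), so treat them as an approximation rather than a faithful port.  Groovy is left out because it matches Python on these cases.&lt;/p&gt;

```python
import math

# Hypothetical helpers modeling each language's truthiness rules for a few
# Python stand-in values.  None plays the role of null/nil; this is a rough
# approximation, not a full model of Ruby or JavaScript semantics.

def python_truthy(value):
    # The same test an if statement applies to a non-boolean value.
    return bool(value)

def ruby_truthy(value):
    # Ruby: everything except nil and false is truthy.
    return value is not None and value is not False

def js_truthy(value):
    # JavaScript: null, false, 0, NaN and "" are among the falsy values;
    # arrays and objects are always truthy, even when empty.
    if value is None or value is False:
        return False
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value != 0 and not math.isnan(value)
    if isinstance(value, str):
        return value != ""
    return True

samples = [None, 0, "", [], {}, "0", [0]]
for value in samples:
    print(
        f"{value!r:6} python={python_truthy(value)!s:5} "
        f"ruby={ruby_truthy(value)!s:5} js={js_truthy(value)!s:5}"
    )
```

&lt;p&gt;Printed side by side, the languages agree on &lt;code&gt;None&lt;/code&gt;, &lt;code&gt;"0"&lt;/code&gt; and &lt;code&gt;[0]&lt;/code&gt;, and disagree on &lt;code&gt;0&lt;/code&gt;, &lt;code&gt;""&lt;/code&gt;, &lt;code&gt;[]&lt;/code&gt; and &lt;code&gt;{}&lt;/code&gt;.&lt;/p&gt;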

&lt;p&gt;My observations and personal opinions on language design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python treats zero, empty strings and collections all as 'falsy'.  Personally, I find this the most intuitive convention.

&lt;ul&gt;
&lt;li&gt;Treating zero and null as falsy has historical precedent in C, where false and null pointers are both conventionally represented as zero in a register or memory location.&lt;/li&gt;
&lt;li&gt;Treating empty strings and collections as falsy is a nice convenience, given the number of times I've written conditionals like &lt;code&gt;if (foo != null and !foo.empty())&lt;/code&gt;.  It's rare that I actually need to distinguish between null and empty in a conditional.  So it's nice that &lt;code&gt;if foo:&lt;/code&gt; handles the common case, and I can write &lt;code&gt;if foo is not None:&lt;/code&gt; when I really do want to single out null.
&lt;/li&gt;
&lt;li&gt;Treating the empty string like null also feels familiar from my Oracle experience, since Oracle stores an empty string as NULL.  It's also consistent with the treatment of an empty collection.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Groovy is inspired by Python and adopts similar conventions for truthiness. &lt;/li&gt;
&lt;li&gt;Ruby takes a different opinion that all values are truthy except &lt;code&gt;nil&lt;/code&gt; (and &lt;code&gt;false&lt;/code&gt;, of course).  While it's not my personal preference, it's defensible and self-consistent.&lt;/li&gt;
&lt;li&gt;Javascript can reliably be expected to deliver a WTF.  Javascript treats zero and empty strings as falsy, but empty collections as truthy.
To me, it's hard to understand why strings and collections ought to behave differently; the Python behavior makes much more sense.   But wait, it gets even better: check out this &lt;a href="http://stackoverflow.com/questions/5491605/empty-arrays-seem-to-equal-true-and-false-at-the-same-time"&gt;link on StackOverflow&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
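&lt;p&gt;In Python, the pattern from the observations above (a bare &lt;code&gt;if&lt;/code&gt; for the common case, and an explicit &lt;code&gt;is not None&lt;/code&gt; check only when null and empty really need to be told apart) might look like the following sketch.  The function &lt;code&gt;describe&lt;/code&gt; is a hypothetical example, not code from a real project.&lt;/p&gt;

```python
def describe(items):
    """Hypothetical helper: treat None and empty alike unless the difference matters."""
    if items:
        # Common case: one truthiness test rules out both None and an empty list.
        return f"{len(items)} item(s)"
    if items is not None:
        # Rare case: the value exists but is empty, and we care about that.
        return "empty"
    return "missing"


print(describe(["x", "y"]))  # 2 item(s)
print(describe([]))          # empty
print(describe(None))        # missing
```

&lt;p&gt;If a caller only cares whether there is anything to process, the first branch alone is enough; the &lt;code&gt;is not None&lt;/code&gt; branch is the extra line you pay for only when the distinction matters.&lt;/p&gt;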

</description>
      <category>javascript</category>
      <category>python</category>
      <category>ruby</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Balancing early and later project risks</title>
      <dc:creator>Bill Schneider</dc:creator>
      <pubDate>Mon, 16 Jan 2017 12:44:21 +0000</pubDate>
      <link>https://forem.com/wrschneider/balancing-early-and-later-project-risks</link>
      <guid>https://forem.com/wrschneider/balancing-early-and-later-project-risks</guid>
      <description>&lt;p&gt;One of the things I liked about this post on &lt;a href="https://hackernoon.com/senior-engineers-reduce-risk-5ab2adc13c97#.9brfu91rj"&gt;"Senior Engineers Reduce Risk"&lt;/a&gt; is how it called out two different kinds of project risks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Early in a project lifecycle, the biggest risk is building the wrong thing&lt;/li&gt;
&lt;li&gt;  Later in the project lifecycle, once you know you're building the right thing, the “-ilities” (scalability, maintainability, etc.) become bigger risks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The author's point is that senior engineers need to help identify and mitigate these risks.&lt;/p&gt;

&lt;p&gt;One additional responsibility of a senior engineer, in my opinion, is to understand the tradeoffs between these kinds of risks, and how to balance those tradeoffs.  This is tricky because, to paraphrase Yogi Berra, predictions are hard, especially about your future user load or revenue.  You can think of this in terms of type 1/type 2 errors (false positives/false negatives) in hypothesis testing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Type 1 error: premature optimization/generalization/etc.  You spend time scaling something that doesn't sell, or designing a generic platform that only gets used once.
&lt;/li&gt;
&lt;li&gt;  Type 2 error: technical debt.  By the time you realize you have a scaling problem it's too late, and your users end up unhappy.  Or, your lack of CI processes and tests slows down future releases.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The type 1 vs. type 2 metaphor assumes you have constrained resources - an engineering hour spent on scaling is an hour not spent on prototyping to get feedback from users.  So reducing one kind of risk will increase the other kind of risk and vice versa.&lt;/p&gt;

&lt;p&gt;Given that both kinds of error are bad, what do you do?  You have to balance the possible outcomes from these risks, and prioritize based on what's more important to you, and this is context-dependent.  A senior engineer should know how to reach out and communicate with business stakeholders to figure out the right balance, telling a good story about the risks that may not be immediately evident to non-technical team members.  A senior engineer will have lived through both kinds of errors and can draw from their past experience in their storytelling.&lt;/p&gt;

&lt;p&gt;My own personal opinion: after living through projects with both type 1 and type 2 errors, I would rather risk type 2 than type 1 most of the time.  37signals sums this up with the mantra &lt;a href="https://gettingreal.37signals.com/ch04_Its_a_Problem_When_Its_a_Problem.php"&gt;"It's a problem when it's a problem"&lt;/a&gt;.  The catch is you have to be disciplined enough to identify and communicate future risks, &lt;em&gt;and&lt;/em&gt; have a plan to address them if and when they become issues.  It can be OK to defer scaling if and only if it is a deliberate, conscious tradeoff to prioritize something else, so there are no surprises later.&lt;/p&gt;

&lt;p&gt;This is also why "debt" is a good metaphor.  In personal finance, some kinds of debt are good because they help reach a strategic goal: buying a house, getting an education, starting a business.  Other kinds of debt are bad: racking up credit card balances without a plan to pay them off.  Similarly, deferring some "-ilities" in pursuit of a higher priority business goal can be a good thing, while ignoring them outright is bad.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
