<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Lars Kamp</title>
    <description>The latest articles on Forem by Lars Kamp (@scapecast).</description>
    <link>https://forem.com/scapecast</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F759134%2Fcc914261-7ab8-42f1-a02c-b781463ae1e1.jpeg</url>
      <title>Forem: Lars Kamp</title>
      <link>https://forem.com/scapecast</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/scapecast"/>
    <language>en</language>
    <item>
      <title>How to integrate cloud data into your analytics stack with Cloud2SQL</title>
      <dc:creator>Lars Kamp</dc:creator>
      <pubDate>Wed, 18 Jan 2023 04:49:08 +0000</pubDate>
      <link>https://forem.com/scapecast/how-to-integrate-cloud-data-into-your-analytics-stack-with-cloud2sql-5bil</link>
      <guid>https://forem.com/scapecast/how-to-integrate-cloud-data-into-your-analytics-stack-with-cloud2sql-5bil</guid>
      <description>&lt;p&gt;Cloud infrastructure can be really complex, and finding resources that match certain criteria can be like searching for the proverbial needle in the haystack. &lt;/p&gt;

&lt;p&gt;Why would you want to search for a specific resource? For example, to find:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;vulnerabilities&lt;/li&gt;
&lt;li&gt;misconfigurations&lt;/li&gt;
&lt;li&gt;dependencies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Usually, that requires direct access to your cloud environment, and that privilege is reserved for a select few.&lt;/p&gt;

&lt;p&gt;But what if you could just query your cloud infrastructure with SQL? For example, this query would return the name of an AWS ELB and the name of the VPC it is connected to.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT aws_elb.name, aws_vpc.name
 FROM aws_elb
 INNER JOIN link_aws_vpc_aws_elb ON aws_elb._id = link_aws_vpc_aws_elb.to_id
 INNER JOIN aws_vpc ON aws_vpc._id = link_aws_vpc_aws_elb.from_id
 LIMIT 1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
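&lt;p&gt;To see how the link table drives this JOIN, here is a minimal, self-contained sketch using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module. It assumes only the columns the query itself uses (&lt;code&gt;_id&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;from_id&lt;/code&gt;, &lt;code&gt;to_id&lt;/code&gt;). The real tables Cloud2SQL creates have many more fields, and the rows below are made up for illustration.&lt;/p&gt;

```python
import sqlite3

# Toy, in-memory stand-in for tables Cloud2SQL would create. Only the column
# names used in the query (_id, name, from_id, to_id) come from the post; the
# rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE aws_vpc (_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE aws_elb (_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE link_aws_vpc_aws_elb (from_id TEXT, to_id TEXT);

    INSERT INTO aws_vpc VALUES ('vpc-1', 'prod-vpc');
    INSERT INTO aws_elb VALUES ('elb-1', 'web-lb');
    -- from_id holds the VPC _id, to_id holds the ELB _id
    INSERT INTO link_aws_vpc_aws_elb VALUES ('vpc-1', 'elb-1');
    """
)

# The query from the post, with aliases so the two "name" columns are distinct.
query = """
    SELECT aws_elb.name AS elb_name, aws_vpc.name AS vpc_name
    FROM aws_elb
    INNER JOIN link_aws_vpc_aws_elb ON aws_elb._id = link_aws_vpc_aws_elb.to_id
    INNER JOIN aws_vpc ON aws_vpc._id = link_aws_vpc_aws_elb.from_id
    LIMIT 1
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('web-lb', 'prod-vpc')]
```

&lt;p&gt;The &lt;code&gt;AS&lt;/code&gt; aliases are optional; without them, both result columns are simply called &lt;code&gt;name&lt;/code&gt;.&lt;/p&gt;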



&lt;p&gt;With raw cloud data in hand, you could use your existing analytics toolchain (think Airflow, dbt, Metabase) to transform and visualize it. By extracting resource metadata and loading it into a separate SQL database for analysis, say a cloud warehouse like Snowflake, we widen the circle of people who can ask smart questions about your cloud.&lt;/p&gt;

&lt;p&gt;The first step is to build the "EL" part of "ELT" (extract, load, transform). Extracting data from clouds like AWS and GCP can be cumbersome: the APIs are fragmented and use different data models, even within a single cloud.&lt;/p&gt;

&lt;p&gt;And that's why the vast majority of infrastructure and security engineers fall back to an existing tool with a UI. That tool acquires and stores data in a proprietary format, in yet another infrastructure data silo, behind yet another dashboard. And then it charges an arm and a leg for it.&lt;/p&gt;

&lt;p&gt;We think that data integration for infrastructure engineers should be a commodity. Cloud data should be easily accessible and separated from the underlying infrastructure, so that data professionals like analytics engineers can work with it. &lt;/p&gt;

&lt;p&gt;That's why we've created &lt;a href="https://github.com/someengineering/cloud2sql#readme" rel="noopener noreferrer"&gt;Cloud2SQL&lt;/a&gt;. Cloud2SQL is an open source tool that extracts resource metadata from your cloud (currently with support for AWS, GCP and DigitalOcean) and syncs it to a destination database. Cloud2SQL flattens that data into tables, complete with foreign keys and link tables.&lt;/p&gt;
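&lt;p&gt;The post doesn't spell out Cloud2SQL's flattening rules, so the following is only a sketch of the general idea under stated assumptions: nested keys become underscore-joined column names, and lists are stored as JSON text. The &lt;code&gt;flatten_resource&lt;/code&gt; helper and the sample ELB record are hypothetical, not Cloud2SQL's implementation.&lt;/p&gt;

```python
import json


def flatten_resource(resource, prefix=""):
    """Flatten a nested dict into one level, e.g. for a single table row.

    Hypothetical helper, not Cloud2SQL code: nested keys are joined with "_",
    and lists are stored as JSON text so they fit into one column.
    """
    row = {}
    for key, value in resource.items():
        column = prefix + key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the column-name prefix.
            row.update(flatten_resource(value, column + "_"))
        elif isinstance(value, list):
            row[column] = json.dumps(value)
        else:
            row[column] = value
    return row


# Hypothetical ELB record with one nested object and one list.
elb = {
    "id": "elb-1",
    "name": "web-lb",
    "listener": {"port": 443, "protocol": "HTTPS"},
    "security_groups": ["sg-1"],
}
print(flatten_resource(elb))
```

&lt;p&gt;For the sample record, this yields the columns &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;listener_port&lt;/code&gt;, &lt;code&gt;listener_protocol&lt;/code&gt;, and &lt;code&gt;security_groups&lt;/code&gt;.&lt;/p&gt;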

&lt;p&gt;The image below shows what the fields in a table for an AWS ELB look like (screenshot taken from Metabase).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgei2w5qpwkvj737t3fmb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgei2w5qpwkvj737t3fmb.png" alt="Metabase Dashboard with the fields for an AWS ELB table" width="800" height="781"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A link table is a special type of table that allows you to easily find relationships between different resources.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqs0rovgbc1n89qy6nlw7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqs0rovgbc1n89qy6nlw7.png" alt="Link Table for AWS ELB" width="800" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Dependencies matter in the cloud: you need to understand how your resources are connected. The relationships among your resources are first-class citizens, just as important as the resources themselves. Link tables capture those dependencies.&lt;/p&gt;

&lt;p&gt;Each link table name starts with the prefix &lt;em&gt;link_&lt;/em&gt;, followed by the two resource kind names. For example, a link table connecting an AWS VPC to an AWS ELB would be named &lt;em&gt;link_aws_vpc_aws_elb&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Link tables have only two fields, &lt;em&gt;from_id&lt;/em&gt; and &lt;em&gt;to_id&lt;/em&gt;, which you can easily JOIN on.&lt;/p&gt;
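&lt;p&gt;The naming convention and the two-column layout are simple enough to capture in a few lines. This sketch uses SQLite through Python's &lt;code&gt;sqlite3&lt;/code&gt; module; the helper names &lt;code&gt;link_table_name&lt;/code&gt; and &lt;code&gt;create_link_table&lt;/code&gt; are hypothetical, and Cloud2SQL's actual DDL (column types, indexes, constraints) may differ.&lt;/p&gt;

```python
import sqlite3


def link_table_name(from_kind, to_kind):
    """Name a link table: the "link_" prefix followed by both resource kinds."""
    return f"link_{from_kind}_{to_kind}"


def create_link_table(conn, from_kind, to_kind):
    """Create a link table with the two columns from_id and to_id.

    Kind names are interpolated into the SQL, so they must come from a
    trusted list, never from user input.
    """
    name = link_table_name(from_kind, to_kind)
    conn.execute(f"CREATE TABLE {name} (from_id TEXT, to_id TEXT)")
    return name


conn = sqlite3.connect(":memory:")
name = create_link_table(conn, "aws_vpc", "aws_elb")
# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute(f"PRAGMA table_info({name})")]
print(name)     # link_aws_vpc_aws_elb
print(columns)  # ['from_id', 'to_id']
```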

&lt;p&gt;By using link tables, you can find dependent resources without needing to know the details of each resource's API or how the resources are related. Much like when working with a graph, you can also find resources based on the state of another resource.&lt;/p&gt;
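&lt;p&gt;Because every link table has the same two columns, you can stack them into one edge list and walk it like a graph. The sketch below assumes SQLite, which supports recursive common table expressions (&lt;code&gt;WITH RECURSIVE&lt;/code&gt;). The second link table (&lt;code&gt;link_aws_elb_aws_ec2_instance&lt;/code&gt;), the &lt;code&gt;all_links&lt;/code&gt; view, and all IDs are hypothetical toy data.&lt;/p&gt;

```python
import sqlite3

# Hypothetical toy data: two link tables chaining a VPC to an ELB, and the
# ELB to two EC2 instances.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE link_aws_vpc_aws_elb (from_id TEXT, to_id TEXT);
    CREATE TABLE link_aws_elb_aws_ec2_instance (from_id TEXT, to_id TEXT);
    INSERT INTO link_aws_vpc_aws_elb VALUES ('vpc-1', 'elb-1');
    INSERT INTO link_aws_elb_aws_ec2_instance VALUES ('elb-1', 'i-1'), ('elb-1', 'i-2');

    -- Every link table has the same two columns, so they stack into one edge list.
    CREATE VIEW all_links AS
        SELECT from_id, to_id FROM link_aws_vpc_aws_elb
        UNION ALL
        SELECT from_id, to_id FROM link_aws_elb_aws_ec2_instance;
    """
)

# Start at one resource and repeatedly follow from_id to to_id.
query = """
    WITH RECURSIVE reachable(id) AS (
        SELECT ?
        UNION
        SELECT all_links.to_id
        FROM all_links JOIN reachable ON all_links.from_id = reachable.id
    )
    SELECT id FROM reachable WHERE id != ? ORDER BY id
"""
dependents = [row[0] for row in conn.execute(query, ("vpc-1", "vpc-1"))]
print(dependents)  # ['elb-1', 'i-1', 'i-2']
```

&lt;p&gt;Using &lt;code&gt;UNION&lt;/code&gt; rather than &lt;code&gt;UNION ALL&lt;/code&gt; in the recursive step discards rows that were already produced, so the walk stops even if the links form a cycle.&lt;/p&gt;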

&lt;p&gt;All of this allows users who are familiar with SQL to easily work with the data collected by Cloud2SQL, using the analytics toolchain and apps they already know. You can now build your own transformations and write your own queries, tailored to your business. We think of it as the first step to replacing your "XOps" tools, which only give you 80% of the data you need to start with.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe38tgr72v2upzghg4vy9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe38tgr72v2upzghg4vy9.png" alt="Metabase dashboard" width="800" height="684"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cloud2SQL already supports SQLite, MySQL, MariaDB, PostgreSQL, and Snowflake, as well as Parquet files (a columnar storage format).&lt;/p&gt;

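&lt;p&gt;To install Cloud2SQL, all you need is Python 3.9 or newer. Create a new virtual environment, activate it, and install the &lt;code&gt;cloud2sql[all]&lt;/code&gt; package. Leave out &lt;code&gt;--user&lt;/code&gt;: pip refuses user installs inside a virtual environment. Quoting the package name keeps shells like zsh from treating the brackets as a glob pattern.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;$ python3 -m venv venv&lt;/code&gt;&lt;br&gt;
&lt;code&gt;$ source venv/bin/activate&lt;/code&gt;&lt;br&gt;
&lt;code&gt;$ pip install "cloud2sql[all]"&lt;/code&gt;&lt;/p&gt;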

&lt;p&gt;See the full installation instructions for Cloud2SQL &lt;a href="https://resoto.com/blog/2023/01/17/installing-cloud2sql" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the next few posts, we'll publish example queries and transformations. To stay informed, star and follow the &lt;a href="https://github.com/someengineering/cloud2sql#readme" rel="noopener noreferrer"&gt;GitHub repo for Cloud2SQL&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>sql</category>
      <category>productivity</category>
      <category>database</category>
    </item>
    <item>
      <title>How to search your AWS accounts for any resource</title>
      <dc:creator>Lars Kamp</dc:creator>
      <pubDate>Tue, 23 Aug 2022 08:31:34 +0000</pubDate>
      <link>https://forem.com/scapecast/how-to-search-your-aws-accounts-for-any-resource-2a40</link>
      <guid>https://forem.com/scapecast/how-to-search-your-aws-accounts-for-any-resource-2a40</guid>
      <description>&lt;p&gt;Retrieving information about resources you have deployed in your Amazon Web Services (AWS) infrastructure means tediously navigating the AWS Management Console or using the AWS Command Line Interface. &lt;/p&gt;

&lt;p&gt;This approach works well in a single account setup. Yet the best practice proposed by AWS is to set up a multi-account environment to separate your workloads and users. As the number of accounts grows, navigating your infrastructure and finding resources via the Console or the CLI becomes increasingly difficult.&lt;/p&gt;

&lt;p&gt;The number of resources in these accounts just keeps growing. Developers create resources using tools such as Terraform, CDK, or CloudFormation… or sometimes even the console or CLI. &lt;/p&gt;

&lt;p&gt;It's not just the resources themselves, it's also the relationships between your resources that are relevant: an EBS volume is mounted to an EC2 instance running in a VPC and reachable via an ALB load balancer, for example. &lt;/p&gt;

&lt;p&gt;So how can you see everything that is running in your cloud, including the dependencies between your resources?&lt;/p&gt;

&lt;h2&gt;Graph-based Search&lt;/h2&gt;

&lt;p&gt;We created &lt;a href="https://resoto.com"&gt;Resoto&lt;/a&gt; to let users effortlessly search resources and automate workflows. Resoto gathers data about your infrastructure and builds a directed acyclic graph, where resources are vertices and their relationships (dependencies) are edges.&lt;/p&gt;
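&lt;p&gt;As a rough illustration of that data model, the sketch below stores a few hypothetical resources, modeled on the EBS/EC2/VPC/ALB example earlier in this post, as a dependency graph. It then uses Python's standard &lt;code&gt;graphlib&lt;/code&gt; module (available since Python 3.9) to confirm the graph has no cycles. This is not how Resoto stores its graph; the resource names and the "depends on" direction are assumptions.&lt;/p&gt;

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical resources: each key maps to the set of resources it depends on
# (its predecessors in the graph).
depends_on = {
    "ec2_instance": {"vpc"},
    "ebs_volume": {"ec2_instance"},
    "alb": {"ec2_instance"},
}


def is_acyclic(graph):
    """Return True if the dependency graph has no cycles."""
    try:
        # static_order() raises CycleError when the graph contains a cycle.
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


# Predecessors always come first in a topological order.
order = list(TopologicalSorter(depends_on).static_order())
print(order[:2])                             # ['vpc', 'ec2_instance']
print(is_acyclic(depends_on))                # True
print(is_acyclic({"a": {"b"}, "b": {"a"}}))  # False
```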

&lt;p&gt;This graph is what makes Resoto so powerful. But we also needed a way to allow users to query this data.&lt;/p&gt;

&lt;p&gt;Graph data is not relational, so SQL was not a good fit. And existing graph query languages like Cypher, Gremlin, or GSQL have steep learning curves and are unnecessarily complex for this use case.&lt;/p&gt;

&lt;p&gt;And so we developed our own search syntax tailored specifically to Resoto. The Resoto Shell allows you to interact with your Resoto installation. In particular, it provides a search command.&lt;/p&gt;

&lt;h3&gt;Example: Search for EC2 instances&lt;/h3&gt;

&lt;p&gt;Let's try searching for all available EC2 instances. &lt;code&gt;is()&lt;/code&gt; will match a specific or abstract type in a polymorphic fashion, checking all types and subtypes of the provided type. &lt;/p&gt;

&lt;p&gt;The &lt;code&gt;instance_cores&lt;/code&gt; filter will limit results to only those instances with more than two cores. The query below will automagically search your entire infrastructure, regardless of account or region!&lt;/p&gt;

&lt;p&gt;&lt;code&gt;search is(aws_ec2_instance) and instance_cores &amp;gt; 2&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;And here is the (abbreviated) result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;id=i-a..., name=crmsec, age=2y2M, account=dev, region=us-east-1
​id=i-0..., name=airgap, age=2M, account=staging, region=eu-central-1
​id=i-0..., name=flixer, age=1M3w, account=sales, region=us-west-2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
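&lt;p&gt;To make the polymorphic matching of &lt;code&gt;is()&lt;/code&gt; concrete, here is a small Python sketch. It assumes a hypothetical kind hierarchy in which &lt;code&gt;aws_ec2_instance&lt;/code&gt; and &lt;code&gt;gcp_instance&lt;/code&gt; are both subtypes of an abstract &lt;code&gt;instance&lt;/code&gt; kind. It only mimics the filter semantics; it is not Resoto's query engine, and the resource records are invented.&lt;/p&gt;

```python
import operator

# Hypothetical kind hierarchy (child kind to parent kind). Resoto's real model
# is much larger; these entries only illustrate the idea.
PARENT_KIND = {
    "aws_ec2_instance": "instance",
    "gcp_instance": "instance",
    "instance": "resource",
}


def is_kind(kind, wanted):
    """True if kind is wanted or a (transitive) subtype of it."""
    while kind is not None:
        if kind == wanted:
            return True
        kind = PARENT_KIND.get(kind)
    return False


resources = [
    {"kind": "aws_ec2_instance", "name": "crmsec", "instance_cores": 4},
    {"kind": "aws_ec2_instance", "name": "small", "instance_cores": 2},
    {"kind": "gcp_instance", "name": "gce-vm", "instance_cores": 8},
]

# Kind filter plus core filter, mirroring the search above.
# operator.gt(a, b) is the greater-than comparison.
ec2_big = [
    r["name"]
    for r in resources
    if is_kind(r["kind"], "aws_ec2_instance") and operator.gt(r["instance_cores"], 2)
]
# Matching on the abstract kind also returns subtypes from other clouds.
all_instances = [r["name"] for r in resources if is_kind(r["kind"], "instance")]
print(ec2_big)        # ['crmsec']
print(all_instances)  # ['crmsec', 'small', 'gce-vm']
```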



&lt;p&gt;The query found three instances in three accounts and three regions. The default output is a condensed list view, but it is also possible to get all collected properties of any resource using the &lt;code&gt;dump&lt;/code&gt; command:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;search is(aws_ec2_instance) and instance_cores &amp;gt; 2 limit 1 | dump&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;For EC2, these properties include the number of cores, the memory, the instance type, and the instance status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;reported:
​  kind: aws_ec2_instance
​  id: i-a...
​  tags:
​    aws:cloudformation:stack-name: lk-build-server
​    aws:cloudformation:stack-id: arn:aws:cloudformation:...
​    owner: team-proto
​  name: LKbuild
​  instance_cores: 4
​  instance_memory: 16
​  instance_type: t3.xlarge
​  instance_status: stopped
​  age: 1y10M
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can refine and group the results. Let's group our instances by &lt;code&gt;instance_type&lt;/code&gt; using the &lt;code&gt;count&lt;/code&gt; command:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;search is(aws_ec2_instance) and instance_cores &amp;gt; 2 | count instance_type&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;t3.2xlarge: 1
​m5.4xlarge: 15
​total matched: 16
​total unmatched: 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The search returns sixteen EC2 instances: fifteen m5.4xlarge and one t3.2xlarge.&lt;/p&gt;
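&lt;p&gt;The grouping itself is a plain group-and-count. The Python sketch below reproduces the output format shown above with &lt;code&gt;collections.Counter&lt;/code&gt;. The &lt;code&gt;count_by&lt;/code&gt; helper is hypothetical, and treating resources that lack the property as "unmatched" is an assumption about what that line reports.&lt;/p&gt;

```python
from collections import Counter


def count_by(resources, prop):
    """Group resources by one property and report totals.

    Assumption for this sketch: resources without the property count as
    "unmatched".
    """
    counts = Counter(r[prop] for r in resources if prop in r)
    unmatched = sum(1 for r in resources if prop not in r)
    # Smallest groups first, as in the output shown above.
    lines = [f"{value}: {n}" for value, n in sorted(counts.items(), key=lambda kv: kv[1])]
    lines.append(f"total matched: {sum(counts.values())}")
    lines.append(f"total unmatched: {unmatched}")
    return "\n".join(lines)


instances = [{"instance_type": "m5.4xlarge"}] * 15 + [{"instance_type": "t3.2xlarge"}]
print(count_by(instances, "instance_type"))
```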

&lt;h2&gt;Using graph search to understand dependencies&lt;/h2&gt;

&lt;p&gt;Now, let's say we want to find all ELB load balancers attached to the EC2 instances returned above. We first need to understand Resoto's graph data structure to tackle this problem.&lt;/p&gt;

&lt;p&gt;When Resoto collects data on your cloud infrastructure, it creates an edge between an ELB and an EC2 instance if the ELB balances that instance's traffic. In the image below, you can see how the graph captures the entire set of dependencies in the account:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cTcw-2Mj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2ya20xj9ejwgdb638dqo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cTcw-2Mj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2ya20xj9ejwgdb638dqo.png" alt="Image description" width="880" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;search is(aws_ec2_instance) and instance_cores &amp;gt; 2 --&amp;gt; is(aws_elb)&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;name=a5..., age=1y1M, account=sales, region=eu-central-1
​name=a3..., age=6M2w, account=staging, region=us-west-2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--&amp;gt;&lt;/code&gt; arrow takes all matching EC2 instances and walks the graph "outbound," moving exactly one step. The resources one step away are not limited to ELB load balancers, so we filter the list again with &lt;code&gt;is(aws_elb)&lt;/code&gt; to return only ELBs.&lt;/p&gt;
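&lt;p&gt;A one-step outbound walk followed by a kind filter can be sketched in a few lines of Python. The graph below is hypothetical toy data: edges point from an EC2 instance to the ELB that balances its traffic (the direction this query walks), plus an unrelated alarm node that shows why the trailing &lt;code&gt;is(aws_elb)&lt;/code&gt; filter is needed. Resoto's own implementation differs; this only illustrates the semantics.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical toy graph: edges point from an EC2 instance to its neighbours.
kinds = {"i-1": "aws_ec2_instance", "elb-1": "aws_elb", "alarm-1": "aws_cloudwatch_alarm"}
edges = [("i-1", "elb-1"), ("i-1", "alarm-1")]

# Index edges by source so "outbound" neighbours are a dict lookup.
outbound = defaultdict(list)
for src, dst in edges:
    outbound[src].append(dst)


def walk_outbound(start_ids, wanted_kind=None):
    """Take exactly one step along outgoing edges; optionally filter by kind."""
    result = []
    for node in start_ids:
        for neighbour in outbound[node]:
            if wanted_kind is not None and kinds[neighbour] != wanted_kind:
                continue  # plays the role of the is(...) filter after the arrow
            if neighbour not in result:
                result.append(neighbour)
    return result


print(walk_outbound(["i-1"]))             # ['elb-1', 'alarm-1']
print(walk_outbound(["i-1"], "aws_elb"))  # ['elb-1']
```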

&lt;p&gt;It's also possible to reverse the last query to output all EC2 instances behind an ELB:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;search is(aws_elb) &amp;lt;-- is(aws_ec2_instance) and instance_cores &amp;gt; 2&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;id=i-0..., name=airgap, age=2M, account=staging, region=eu-central-1
​id=i-0..., name=flixer, age=1M3w, account=sales, region=us-west-2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The arrow is now mirrored and traverses the graph "inbound," walking edges in the opposite direction.&lt;/p&gt;
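&lt;p&gt;Walking inbound just means indexing edges by their target instead of their source. This Python sketch assumes edges point from EC2 instance to ELB and uses hypothetical IDs and core counts. It starts from every ELB, steps back to EC2 instances, and then applies the core filter, as the query above does. It illustrates the semantics only, not Resoto's implementation.&lt;/p&gt;

```python
import operator
from collections import defaultdict

# Hypothetical toy graph: edges point from EC2 instance to ELB.
kinds = {"i-1": "aws_ec2_instance", "i-2": "aws_ec2_instance", "elb-1": "aws_elb"}
cores = {"i-1": 4, "i-2": 2}
edges = [("i-1", "elb-1"), ("i-2", "elb-1")]

# Index edges by target, so they can be walked backwards ("inbound").
inbound = defaultdict(list)
for src, dst in edges:
    inbound[dst].append(src)


def walk_inbound(start_ids, wanted_kind):
    """Take exactly one step against edge direction, keeping only wanted_kind."""
    result = []
    for node in start_ids:
        for neighbour in inbound[node]:
            if kinds[neighbour] == wanted_kind and neighbour not in result:
                result.append(neighbour)
    return result


# Start from every ELB, step inbound to EC2 instances, then filter on cores.
elbs = [node for node, kind in kinds.items() if kind == "aws_elb"]
behind_elb = walk_inbound(elbs, "aws_ec2_instance")
big = [i for i in behind_elb if operator.gt(cores[i], 2)]
print(behind_elb)  # ['i-1', 'i-2']
print(big)         # ['i-1']
```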

&lt;h2&gt;Use cases&lt;/h2&gt;

&lt;p&gt;Graph-based search becomes useful when you're solving problems that require understanding how resources are connected to each other.&lt;/p&gt;

&lt;p&gt;An example is the "blast radius" of a resource. &lt;/p&gt;

&lt;p&gt;When you're looking at cleaning up an EC2 instance, what other resources are you taking down as well? Or, the other way around, what resources do you need to clean up first before you can clean up an unused EC2 instance?&lt;/p&gt;
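&lt;p&gt;One way to sketch a blast-radius check is a breadth-first traversal that collects every resource transitively reachable from the one you plan to remove. The graph below is hypothetical: an edge from A to B means "B depends on A", and the resource IDs are made up. In Resoto you would express this as a graph search instead; this Python version only illustrates the traversal.&lt;/p&gt;

```python
from collections import deque

# Hypothetical dependency graph: each resource maps to the resources that
# depend on it, so removing a key affects everything listed under it.
dependents = {
    "vpc-1": ["subnet-1"],
    "subnet-1": ["i-1"],
    "i-1": ["vol-1", "elb-1"],
    "vol-1": [],
    "elb-1": [],
}


def blast_radius(start):
    """Return every resource transitively reachable from start, breadth-first."""
    seen = {start}
    queue = deque([start])
    affected = []
    while queue:
        node = queue.popleft()
        for nxt in dependents.get(node, []):
            if nxt not in seen:  # guard against visiting a resource twice
                seen.add(nxt)
                affected.append(nxt)
                queue.append(nxt)
    return affected


print(blast_radius("i-1"))    # ['vol-1', 'elb-1']
print(blast_radius("vpc-1"))  # ['subnet-1', 'i-1', 'vol-1', 'elb-1']
```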

&lt;h2&gt;Try it yourself!&lt;/h2&gt;

&lt;p&gt;These examples only scratch the surface of Resoto's search syntax. While this post is AWS-specific, we also support GCP and DigitalOcean.&lt;/p&gt;

&lt;p&gt;Resoto is open source, self-hosted and free to use! Check out &lt;a href="https://resoto.com/docs"&gt;our documentation&lt;/a&gt; and give Resoto a spin! &lt;/p&gt;




&lt;p&gt;This post was originally published by my colleague Matthias Veit at &lt;a href="https://resoto.com/blog/2022/02/04/resoto-search-101"&gt;https://resoto.com/blog/2022/02/04/resoto-search-101&lt;/a&gt; on February 4, 2022.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Cloud Inventory: The High Interest Credit Card of Technical Debt</title>
      <dc:creator>Lars Kamp</dc:creator>
      <pubDate>Sat, 06 Aug 2022 10:35:59 +0000</pubDate>
      <link>https://forem.com/scapecast/cloud-inventory-the-high-interest-credit-card-of-technical-debt-423c</link>
      <guid>https://forem.com/scapecast/cloud-inventory-the-high-interest-credit-card-of-technical-debt-423c</guid>
      <description>&lt;p&gt;Does the title sound familiar? Probably yes, because ... 👇&lt;/p&gt;

&lt;p&gt;... it was the title of a 2014 Google paper, except the paper was about Machine Learning.&lt;/p&gt;

&lt;p&gt;But the title also easily applies to today's cloud-native infrastructure.&lt;/p&gt;

&lt;p&gt;👀 Today, developers face pressure to ship new products and services.&lt;/p&gt;

&lt;p&gt;As a result, they run lots of experiments. They adopt new cloud services that drive innovation, and they deploy faster through infrastructure-as-code and CI/CD pipelines.&lt;/p&gt;

&lt;p&gt;😱 The flip side of that innovation is that companies now have an inventory problem.&lt;/p&gt;

&lt;p&gt;👉 It's much easier to deploy new resources than to figure out which ones are running and why. The result is an ever-growing number of resources running in your cloud.&lt;/p&gt;

&lt;p&gt;It's a new type of technical debt, where you lose track of the assets running in your infrastructure and how they relate to your business.&lt;/p&gt;

&lt;p&gt;Ward Cunningham coined the technical debt metaphor in 1992.&lt;/p&gt;

&lt;p&gt;💸 Just like financial debt, inventory debt compounds.&lt;/p&gt;

&lt;p&gt;💰 With modern cloud-native infrastructure, it's remarkably easy to incur massive recurring cloud spend without understanding what you're actually spending it on.&lt;/p&gt;

&lt;p&gt;A Cloud Asset Inventory has a lot of the answers. It's also a forward-looking tool that allows platform teams to stay in control while giving developers liberal permissions.&lt;/p&gt;

&lt;p&gt;In the post "&lt;a href="https://resoto.com/blog/2022/07/28/what-is-cloud-asset-inventory"&gt;What is Cloud Asset Inventory?&lt;/a&gt;", I summarize the challenges that come with adopting cloud-native infrastructure, and explain how a cloud asset inventory is a strategic tool to&lt;/p&gt;

&lt;p&gt;📉 pay off your inventory debt,&lt;/p&gt;

&lt;p&gt;🚀 increase development velocity, and&lt;/p&gt;

&lt;p&gt;📈 grow infrastructure's contribution to profitability.&lt;/p&gt;

&lt;p&gt;💪 ✌️ 👌&lt;/p&gt;

&lt;p&gt;#cloud #infrastructure #shiftleft&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>terraform</category>
      <category>cloudnative</category>
    </item>
  </channel>
</rss>
