Forem: Michael Guarino

Understanding Deprecated Kubernetes APIs and Their Significance

Michael Guarino — Mon, 02 Oct 2023 19:57:11 +0000

If you’re running an outdated version of the Kubernetes API, you’re putting your application at risk of substantial downtime. You may even experience a sitewide outage like the one experienced by Reddit.

Even if an upgrade doesn't result in an outage, subtle differences in Kubernetes APIs can cause frustration and wasted effort investigating underlying problems.

This was evident when we upgraded to Kubernetes 1.24, which introduced an issue with default service account secrets. As a result, that version was incompatible with some versions of the Kubernetes Terraform provider. Although it didn't cause a complete outage, identifying the root cause cost us a significant amount of time.

The Kubernetes API serves as the interface to interact with a Kubernetes cluster. It allows users to query and manipulate various Kubernetes objects like pods, namespaces, and deployments. These APIs can be accessed through tools such as kubectl, via the REST API directly, or by using client libraries.

In this guide, we will explore the significance of deprecating Kubernetes APIs and how Plural Continuous Deployment (CD) can provide valuable insights for identifying these deprecations.

Deprecating and Removing Kubernetes APIs

Kubernetes is a dynamic system driven by APIs, which evolve with each new release. A crucial aspect of any API-driven system is having a well-defined deprecation policy. This policy informs users about APIs that are slated for removal or modification. Kubernetes follows this principle and periodically refines or upgrades its APIs. Consequently, older APIs are marked as deprecated and eventually phased out.

In this context, deprecation implies identifying an API component for eventual removal. While it functions currently, it is scheduled to be eliminated in an upcoming version. Further details on how Kubernetes manages API deprecation can be found in the deprecation policy documentation.

Why the Concern about Deprecated APIs?

When configuring an application, the user specifies the API version of the Kubernetes object to be employed. Whether it's a straightforward Kubernetes YAML manifest or a Helm chart, the apiVersion field designates the API version of the Kubernetes object. This underscores the importance for users or maintainers to be aware of deprecated Kubernetes API versions and the Kubernetes release in which they are set to be removed.

Additionally, during a Kubernetes cluster upgrade, encountering deprecated APIs is a possibility if the upgraded version does not support them. For instance, if resources in your cluster utilize an outdated API version, your application relying on that resource may cease to function if the deprecated API has been eliminated in the new cluster version.
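The failure mode described above can be caught before an upgrade. The following is a minimal sketch, not an official tool: it assumes manifests are plain YAML with unquoted top-level `apiVersion` and `kind` keys, and the `REMOVED_IN` table is a small illustrative sample that should be checked against the official deprecation guide. The names `REMOVED_IN` and `find_removed_apis` are hypothetical.

```python
import re

# Illustrative, deliberately incomplete table (an assumption of this sketch):
# (apiVersion, kind) -> first Kubernetes 1.x minor release that no longer serves it.
REMOVED_IN = {
    ("extensions/v1beta1", "Ingress"): 22,
    ("networking.k8s.io/v1beta1", "Ingress"): 22,
    ("batch/v1beta1", "CronJob"): 25,
    ("policy/v1beta1", "PodSecurityPolicy"): 25,
}


def find_removed_apis(manifest_text, target_minor):
    """Return (apiVersion, kind, removed_in) for each YAML document that uses
    an API version removed at or before the target 1.x release."""
    hits = []
    for doc in re.split(r"^---\s*$", manifest_text, flags=re.MULTILINE):
        # Only top-level (unindented) keys are considered.
        api = re.search(r"^apiVersion:\s*(\S+)", doc, flags=re.MULTILINE)
        kind = re.search(r"^kind:\s*(\S+)", doc, flags=re.MULTILINE)
        if not (api and kind):
            continue
        key = (api.group(1), kind.group(1))
        removed = REMOVED_IN.get(key)
        if removed is not None and removed <= target_minor:
            hits.append((key[0], key[1], removed))
    return hits


SAMPLE = """\
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: web
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
"""

print(find_removed_apis(SAMPLE, target_minor=22))
```

Running the check against the version you plan to upgrade to, rather than the one you run today, is what surfaces the problem early.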

An illustrative example is the apiVersion extensions/v1beta1 of the Ingress resource, which was removed in Kubernetes v1.22. Attempting to use such a removed API version in your configuration results in an error message like this one:

Error: UPGRADE FAILED: current release manifest contains removed kubernetes api(s) for this kubernetes version and it is therefore unable to build the kubernetes objects for performing the diff. error from kubernetes: unable to recognize "": no matches for kind "Ingress" in version "extensions/v1beta1"

Where and How Kubernetes APIs are Utilized

To specify a particular API version in your configuration, refer to the sample below, sourced from Kubernetes documentation:

  apiVersion: apps/v1  # <-- API version of the Kubernetes object
  kind: Deployment
  metadata:
    name: nginx-deployment
    labels:
      app: nginx
  spec:
    replicas: 3

You can also review all supported API groups along with their versions through official documentation or by using the kubectl command-line tool's api-versions command:

$ kubectl api-versions
admissionregistration.k8s.io/v1
admissionregistration.k8s.io/v1beta1
apiextensions.k8s.io/v1
apiextensions.k8s.io/v1beta1
apiregistration.k8s.io/v1
apiregistration.k8s.io/v1beta1
apps/v1
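A listing like this is easier to reason about once it is grouped. The sketch below is an illustration under assumptions, not part of kubectl: it takes the command's text output as a string, groups versions by API group, and flags alpha and beta versions, which are the ones most likely to be deprecated later. The function names are hypothetical.

```python
from collections import defaultdict


def group_api_versions(output):
    """Map each API group to its list of served versions.
    Lines without a slash (for example "v1") are keyed as "core"."""
    groups = defaultdict(list)
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith(("kubectl", "$")):
            continue  # skip blank lines and an echoed command line
        group, _, version = line.rpartition("/")
        groups[group or "core"].append(version)
    return dict(groups)


def pre_release_versions(groups):
    """Return sorted "group/version" strings whose version is alpha or beta."""
    return sorted(
        f"{group}/{version}"
        for group, versions in groups.items()
        for version in versions
        if "alpha" in version or "beta" in version
    )


OUTPUT = """\
admissionregistration.k8s.io/v1
admissionregistration.k8s.io/v1beta1
apiextensions.k8s.io/v1
apiextensions.k8s.io/v1beta1
apiregistration.k8s.io/v1
apiregistration.k8s.io/v1beta1
apps/v1
"""

groups = group_api_versions(OUTPUT)
print(pre_release_versions(groups))
```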

Challenges in Identifying Deprecated APIs in Your Cluster

Although Kubernetes provides official documentation to examine deprecated or removed APIs, identifying the resources in your cluster that utilize these APIs can be quite challenging.

On top of that, Kubernetes abides by a stringent API versioning protocol, resulting in multiple deprecations of v1beta1 and v2beta1 APIs across several releases. Their policy states that Beta API versions are mandated to receive support for a minimum of 9 months or 3 releases (whichever is longer) after deprecation, after which they may be subject to removal.
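The "9 months or 3 releases, whichever is longer" rule can be turned into simple arithmetic. This is a rough sketch: `months_per_release` is an assumed average release cadence rather than an official value, and the result is only the earliest release the policy would allow, not a prediction of when a given API will actually be removed.

```python
import math


def earliest_removal_minor(deprecated_minor, months_per_release=4.0,
                           min_months=9, min_releases=3):
    """Earliest 1.x minor release in which a deprecated beta API could stop
    being served under the whichever-is-longer rule."""
    # Number of releases needed to cover the minimum support period in months.
    releases_for_months = math.ceil(min_months / months_per_release)
    return deprecated_minor + max(min_releases, releases_for_months)


# A beta API deprecated in 1.23, assuming one release roughly every 4 months.
print(earliest_removal_minor(23))
```

With a cadence of about four months, nine months and three releases amount to the same window, so the release count is the binding constraint.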

In cases where APIs that have been deprecated are still actively employed by workloads, tools, or other components interfacing with the cluster, disruptions may occur. Hence, it is imperative for users and administrators to conduct a thorough assessment of their cluster to identify any APIs in use slated for removal, and subsequently migrate the affected components to leverage the appropriate new API version.

Merely listing your Kubernetes resources using kubectl commands may yield inaccurate API version information, as explained in this issue.

[app/v1 show as extensions/v1beta1 when kubectl get xxx -oyaml · Issue #58131 · kubernetes/kubernetes](https://github.com/kubernetes/kubernetes/issues/58131?ref=plural.sh#issuecomment-356823588)

To tackle this problem, you can employ a tool like Plural CD.

What is Plural CD?

Plural CD is an end-to-end solution for managing Kubernetes clusters and application deployment. Plural offers users a managed Cluster API provisioner to consistently set up managed and custom Kubernetes control planes across top infrastructure providers.

Additionally, Plural provides a robust deployment pipeline system, empowering users to effortlessly deploy their services to these clusters. Plural acts as a Single Pane of Glass for managing application deployment across environments.

With Plural CD, you can detect deprecated Kubernetes APIs used in your code repositories and Helm releases, minimizing the effect deprecated APIs can have on your ecosystem.

How Plural CD will operate under the hood.

Features:

Rapidly create new Kubernetes environments across any cloud without ever having to write code
Managed, zero-downtime upgrades with Cluster API reconciliation loops, so you don’t have to worry about sloppy and fragile Terraform rollouts
Dynamically add and remove nodes to your cluster node topology as you like
Use scaffolds to create functional GitOps deployments in a flash
First-class support for cdk8s.io to provide a robust Kubernetes authoring experience with unit testability and package management
Integrated secret management
A single, scalable user interface where your org can deploy and monitor everything fast.

Plural CD to detect deprecated Kubernetes APIs

Previously, identifying deprecated versions of the Kubernetes API was a laborious and error-prone task, requiring manual inspection of all manifests. This process becomes even more cumbersome and unfeasible when multiple teams deploy to a cluster without a centralized manifest repository.
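Part of that manual inspection can be scripted. The sketch below is only an illustration and makes no claim about how Plural CD works internally: it walks a directory tree and tallies every top-level `apiVersion` it finds in `.yaml` and `.yml` files. Templated Helm charts would need to be rendered first, and the function name is hypothetical.

```python
import os
import re
import tempfile
from collections import Counter

API_VERSION = re.compile(r"^apiVersion:\s*(\S+)", re.MULTILINE)


def inventory_api_versions(root):
    """Count top-level apiVersion values in every .yaml/.yml file under root."""
    counts = Counter()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith((".yaml", ".yml")):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8") as fh:
                    counts.update(API_VERSION.findall(fh.read()))
    return counts


# Tiny demo against a throwaway directory standing in for a manifest repo.
with tempfile.TemporaryDirectory() as repo:
    with open(os.path.join(repo, "ingress.yaml"), "w", encoding="utf-8") as fh:
        fh.write("apiVersion: extensions/v1beta1\nkind: Ingress\n")
    with open(os.path.join(repo, "deploy.yml"), "w", encoding="utf-8") as fh:
        fh.write("apiVersion: apps/v1\nkind: Deployment\n")
    demo_counts = inventory_api_versions(repo)

print(dict(demo_counts))
```

An inventory like this only covers manifests that live in the scanned repository, which is exactly the limitation when several teams deploy from different places.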

Plural CD offers a comprehensive analysis of source code deployed to your Kubernetes clusters by scanning existing Git repositories. It provides an overview of the entire footprint, ensuring you stay up-to-date with the latest API versions. If any Kubernetes API is deprecated, Plural CD notifies you promptly, making it easy to troubleshoot and update configurations as needed.

How Plural CD will notify you of any API upgrades that need to happen

Behind on Kubernetes upgrades and wary of API deprecations? Reach out to us to learn more about Plural CD.

What you need to know about Self-Hosting Large Language Models (LLMs)

Michael Guarino — Tue, 26 Sep 2023 14:33:20 +0000

Since its arrival in November 2022, ChatGPT has revolutionized the way we all work by leveraging generative artificial intelligence (AI) to streamline tasks, produce content, and provide swift and error-free recommendations. By harnessing the power of this groundbreaking technology, companies and individuals can amplify efficiency and precision while reducing reliance on human intervention.

At the core of ChatGPT and other AI algorithms lie Large Language Models (LLMs), renowned for their remarkable capacity to generate human-like written content. One prominent application of LLMs is in the realm of website chatbots utilized by companies.

By feeding customer and product data into LLMs and continually refining the training, these chatbots can deliver instantaneous responses, personalized recommendations, and unfettered access to information. Furthermore, their round-the-clock availability empowers websites to provide continuous customer support and engagement, unencumbered by constraints of staff availability.

While LLMs are undeniably beneficial for organizations, enabling them to operate more efficiently, there is also a significant concern regarding the utilization of cloud-based services like OpenAI and ChatGPT for LLMs. With sensitive data being entrusted to these cloud-based platforms, companies can potentially lose control over their data security.

Simply put, they relinquish ownership of their data. In these privacy-conscious times, companies in regulated industries are expected to adhere to the highest standards when it comes to handling customer data and other sensitive information.

In heavily regulated industries like healthcare and finance, companies need to have the ability to self-host some open-source LLM models to regain control of their own privacy. Here is what you need to know about self-hosting LLMs and how you can easily do so with Plural.

Before you decide to self-host

In the past year, the discussion surrounding LLMs has evolved, transitioning from "Should we utilize LLMs?" to "Should we opt for a self-hosted solution or rely on a proprietary off-the-shelf alternative?"

Like many engineering questions, the answer to this one is not straightforward. While we are strong proponents of self-hosting infrastructure – we even self-host our AI chatbot for compliance reasons – we also rely on our Plural platform, leveraging the expertise of our team, to ensure our solution is top-notch.

We often urge our customers to answer the questions below before self-hosting LLMs.

Where would you want to host LLMs?
Do you have a client-server architecture in mind? Or something that runs on edge devices, such as a phone?

It also depends on your use case:

What will the LLMs be used for in your organization?
Do you work in a regulated industry and need to own your proprietary data?
Does it need to be in your product in a short period?
Do you have engineering resources and expertise available to build a solution from scratch?

If you require compliance as a crucial feature for your LLM and have the necessary engineering expertise to self-host, you'll find an abundance of tools and frameworks available. By combining these various components, you can build your solution from the ground up, tailored to your specific needs.

If your aim is to quickly implement an off-the-shelf model for a RAG-LLM application, which only requires proprietary context, consider using a solution at a higher abstraction level such as OpenLLM, TGI, or vLLM.
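To make "proprietary context" concrete, here is a toy sketch of the retrieval step in a RAG application. It is an assumption-laden illustration: a bag-of-words cosine similarity stands in for a real embedding model and vector database, and the resulting prompt would be sent to whichever model server you choose. No serving API is called here, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Assemble a prompt that grounds the model in the retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


DOCS = [
    "Refunds are processed within 5 business days.",
    "Our support desk is open on weekdays from 9 to 5.",
]

print(build_prompt("How long do refunds take?", DOCS))
```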

Why Self-Host LLMs?

Although there are various advantages to self-hosting LLMs, three key benefits stand out prominently.

Greater security, privacy, and compliance: This is ultimately the main reason companies opt to self-host LLMs. OpenAI’s Terms of Use even state: “We may use Content from Services other than our API (“Non-API Content”) to help develop and improve our Services.”

OpenAI Terms of Use neglect a user's privacy.

Anything you or your employees upload into ChatGPT may be included in future training data. And, despite attempts to anonymize the data, it ultimately contributes to the model's knowledge. Unsurprisingly, there is even a conversation happening in the space as to whether or not ChatGPT's use of data is even legal, but that’s a topic for a different day. What we do know is that many privacy-conscious companies have already begun to prohibit employees from using ChatGPT.

Customization: By self-hosting LLMs, you can scale alongside your use case. Organizations that rely heavily on LLMs might reach a point where it becomes economical to self-host. A telltale sign is when you begin to hit rate limits on public API endpoints and performance is ultimately affected. Ideally, you can build it all yourself: train a model and create a model server for your chosen ML framework or model runtime (e.g., TensorFlow, PyTorch, JAX). Most likely, though, you would leverage a distributed ML framework like Ray.
Avoid vendor lock-in: When choosing between open-source and proprietary solutions, a crucial question to address is your comfort with cloud vendor lock-in. Major cloud providers offer their own managed ML services, allowing you to host an LLM model server. However, migrating between them can be challenging, and depending on your specific use case, it may result in higher long-term expenses compared to open-source alternatives.
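The point at which self-hosting "becomes economical" can be estimated with a simple break-even calculation. This is a back-of-the-envelope sketch: the function name is hypothetical, the inputs are whatever your own quotes and measurements say, and the numbers in the example are made up purely to show the arithmetic.

```python
def breakeven_tokens_per_month(fixed_monthly_cost, api_price_per_1k,
                               self_hosted_price_per_1k):
    """Monthly token volume above which self-hosting is cheaper than a metered API.

    fixed_monthly_cost: infrastructure and operations cost of self-hosting.
    api_price_per_1k / self_hosted_price_per_1k: marginal cost per 1,000 tokens.
    Returns None if self-hosting has no per-token saving to recover the fixed cost.
    """
    saving_per_1k = api_price_per_1k - self_hosted_price_per_1k
    if saving_per_1k <= 0:
        return None
    return fixed_monthly_cost / saving_per_1k * 1000


# Hypothetical numbers in arbitrary currency units.
print(breakeven_tokens_per_month(1000, 3.0, 1.0))
```

The calculation ignores rate limits and engineering time, both of which the list above treats as equally important signals.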

Building an LLM stack to self-host

When building an LLM stack, the first hurdle you'll encounter is finding the ideal stack that caters to your specific requirements. Given the multitude of available options, the decision-making process can be overwhelming. Once you've narrowed down your choices, creating and deploying a small application on a local host becomes a relatively straightforward task.

However, scaling said application presents an entirely separate challenge, which requires a certain level of expertise and time. For that, you’ll want to leverage some of the open-source cloud-native platforms and tools we outlined above. It might make sense to use Ray in some cases, as it gives you an end-to-end platform to process data and to train, tune, and serve your ML applications beyond LLMs.

OpenLLM is more geared towards simplicity and operates at a higher abstraction level than Ray. If your end goal is to host a RAG LLM app using LangChain and/or LlamaIndex, OpenLLM in conjunction with Yatai can probably get you there quickest. Keep in mind that if you go that route, you’ll likely compromise on flexibility compared to Ray.

For a typical RAG LLM app, you want to set up a data stack alongside the model serving component where you orchestrate periodic or event-driven updates to your data as well as all the related data-mangling, creating embeddings, fine-tuning the models, etc.
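One common way to keep periodic or event-driven updates cheap is to re-embed only what changed. The sketch below assumes you store a content hash per document at indexing time. The function names and the id-to-text layout are hypothetical, and the actual embedding and vector-store calls are left out.

```python
import hashlib


def content_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_embedding_updates(documents, indexed_hashes):
    """Compare current documents (id -> text) with the hashes recorded at the
    last indexing run (id -> sha256 hex digest).

    Returns (to_embed, to_delete): ids that are new or changed, and ids that
    no longer exist in the source."""
    to_embed = sorted(
        doc_id
        for doc_id, text in documents.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    )
    to_delete = sorted(set(indexed_hashes) - set(documents))
    return to_embed, to_delete


current = {"a": "hello", "b": "changed text", "c": "brand new"}
previous = {
    "a": content_hash("hello"),
    "b": content_hash("old text"),
    "d": content_hash("removed"),
}

print(plan_embedding_updates(current, previous))
```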

The Plural marketplace offers various data stack apps that can perfectly suit your needs. Additionally, our marketplace provides document-store/retrieval optimized databases, such as Elastic or Weaviate, which can be used as vector databases. Furthermore, during operations, monitoring and telemetry play a crucial role. For instance, a Grafana dashboard for your self-hosted LLM app could prove to be immensely valuable.

If you choose to go a different route, you can elect to use a proprietary managed service or SaaS solution, which doesn’t come without overhead either, as it requires additional domain-specific knowledge. Operating and maintaining those platforms on Kubernetes is the main overhead you’ll have.

Plural to self-host LLMs

If you were to choose a solution like Plural you can focus on building your applications and not worry about the day-2 operations that come with maintaining those applications. If you are still debating between ML tooling, it could be beneficial to spin up an example architecture using Plural.

Our platform can bridge the gap between the “localhost” and “hello-world” examples in these frameworks and scalable, production-ready apps, because you don’t lose time figuring out how to self-host model-hosting platforms like Ray and Yatai.

Plural is a solution that aims to provide a balance between self-hosting infrastructure applications within your own cloud account, seamless upgrades, and scaling.

To learn more about how Plural works and how we are helping organizations deploy secure and scalable machine learning infrastructure on Kubernetes, reach out to our team to schedule a demo.

If you would like to test out Plural, sign up for a free open-source account and get started today.

Data Engineering Glossary

Michael Guarino — Fri, 27 Jan 2023 20:12:47 +0000

With the growing importance of data-powered decision-making, data engineering is becoming critical to organizations in just about every industry.

This glossary is designed to be a resource for those looking to learn about the field, hire data engineers, or brush up on the terminology. It’s also intended to help you understand the fundamentals of data engineering and its growing importance in today's data-driven world.

What is data engineering?

At its core, data engineering is all about designing, building, and maintaining the infrastructure and systems that support the collection, storage, and processing of large amounts of data. This includes creating and maintaining data pipelines, data warehousing, and data storage systems. It also includes creating and maintaining data quality and governance processes and ensuring the security and accessibility of data.

Data engineering is a critical part of any organization that relies on data to make decisions. It provides the foundation for data-driven decision-making, machine learning, analytics, and reporting. It is a highly technical field that requires knowledge of programming, databases, data warehousing, and cloud computing.

Data engineers work closely with data scientists and analysts to understand their data needs and help them access and use data effectively. They are responsible for making sure that the data is accurate, complete, and accessible to the people who need it.

Check out our project on GitHub

Glossary of Data Engineering terms

Big Data

Big data refers to extremely large and complex sets of data that are difficult or impossible to process using traditional methods. Big data can come from a variety of sources such as social media, sensor networks, and online transactions. It can include structured data (such as numbers or dates) as well as unstructured data (such as images or video).

The 3 Vs (Volume, Variety, and Velocity) are often used to describe the characteristics of big data. Volume refers to the sheer size of the data, variety refers to the different types of data, and velocity refers to the speed at which the data is generated. Big data requires specialized technologies and methods to process, store, and analyze it.

Business Intelligence

Business Intelligence (BI) is the process of collecting, analyzing, and presenting data to support decision-making and strategic planning within an organization. This can include data from internal systems, as well as external sources like market research and competitor analysis.

BI includes a variety of tools and techniques such as data visualization, reporting, data mining, and OLAP (Online Analytical Processing) to help organizations make sense of their data. The goal of BI is to provide organizations with a complete and accurate picture of their performance, customers, and market to make informed decisions.

Data Analyst

A data analyst is a professional who is responsible for collecting, cleaning, analyzing, and interpreting large sets of data. They use statistical methods, data visualization techniques, and other tools to gain insights and knowledge from data. They use this information to support decision-making and problem-solving within an organization. Data analysts work closely with data scientists, business analysts, and other stakeholders to understand their data needs and help them access and use the data effectively.

Data Architecture

Data architecture refers to the overall design and organization of data within a system or team. It includes the data models, data flow, and storage systems used to manage and access the data. It also includes the processes and policies that are set up to ensure data quality, security, and accessibility. The goal of data architecture is to make sure data is properly structured and stored in a way that supports the needs of the organization and users of the data.

Data Compliance

Data compliance refers to the adherence to regulations and guidelines that govern the collection, storage, use, and disposal of data. It notably includes the protection of sensitive data, such as personal information, and ensuring that such data is handled, stored, and disposed of in accordance with requirements. Data compliance is a critical aspect of data governance and it helps organizations to mitigate risks and protect sensitive information.

Data Exploration

Data exploration is the process of analyzing and understanding a dataset. It includes visualizing data, identifying patterns and relationships, and finding outliers and anomalies. The goal of data exploration is to gain insights and knowledge about the data that can be used to make informed decisions. It is often an iterative process and is useful for building an understanding of the data before getting into more structured approaches like data analysis, machine learning, or statistical modeling.

Data Enrichment

Data enrichment is the process of adding additional data to a dataset to make it more valuable. This can include adding external data, such as weather data or geographic data, to a dataset to gain new insights. It can also include adding derived variables and features, such as calculated fields or aggregated data, to the dataset. Data enrichment is a common step when preparing data for machine learning and statistical modeling, as it can lead to better model performance.
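As a small illustration of both kinds of enrichment, the sketch below adds one external field (weather looked up by date) and one derived field (an order total) to each record. The field names and data are hypothetical.

```python
def enrich_orders(orders, weather_by_date):
    """Return new order dicts with an external field (weather) and a
    derived field (total). The input records are not modified."""
    enriched = []
    for order in orders:
        row = dict(order)
        # External data joined on the order date.
        row["weather"] = weather_by_date.get(order["date"], "unknown")
        # Derived variable computed from existing fields.
        row["total"] = order["quantity"] * order["unit_price"]
        enriched.append(row)
    return enriched


orders = [
    {"date": "2023-01-27", "quantity": 3, "unit_price": 2.5},
    {"date": "2023-01-28", "quantity": 1, "unit_price": 4.0},
]
weather = {"2023-01-27": "snow"}

print(enrich_orders(orders, weather))
```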

Data Governance

Data governance is the set of policies, standards, and procedures that an organization uses to manage, protect, and ensure the quality of its data. It includes the management of data policies, procedures, standards, and metrics to ensure data is accurate, complete, consistent, and accessible. It also involves the monitoring of data compliance concerning regulations.

Data Ingestion

Data ingestion is the process of bringing data into a system for storage and processing. This includes the collection, extraction, and loading of data from various sources such as databases, files, or the Internet. It is the first step in the data pipeline and it's critical for data quality and accuracy. The data ingestion process can be done with various tools such as ETL (Extract, Transform, Load) processes, data integration platforms, or custom scripts.

Data Integration

Data integration is the process of combining data from multiple sources into a single, unified dataset. This can include combining data from different databases, applications, or file formats. Data integration is a critical step in creating a single source of truth, and it can be done with various tools such as ETL (Extract, Transform, Load) processes, data integration platforms, or custom scripts.

Data Lake

A data lake is a centralized repository that allows organizations to store all their structured and unstructured data at any scale. It is a way to store raw data in its original format and allows for easy data discovery and access through a self-service model. Data lakes are designed to handle large volumes of data, and they are often implemented on distributed storage systems such as Hadoop or cloud storage platforms.

Data Lakehouse

A data lakehouse is a combination of a data lake and a data warehouse. It is a unified, hybrid data platform that enables organizations to store, manage, and analyze both structured and unstructured data in a single repository. The data lakehouse architecture is a new approach that combines the scalability and flexibility of a data lake with the performance and governance of a data warehouse, enabling organizations to get insights faster and make data-driven decisions more effectively.

Data Mart

A data mart is a subset of a data warehouse that is focused on a specific business function or department. It is a repository of data that is tailored to the specific needs of a particular business unit. Data marts are designed to provide a specific set of data to a specific set of users.

Data Mesh

A data mesh is an architectural pattern that aims to decouple data services from application services by breaking down monolithic data systems into small, decentralized, and autonomous data services. This allows for greater flexibility, scalability, and resilience in how data is managed and accessed within an organization. In a data mesh architecture, each data service is responsible for a specific domain or subset of data, and they are loosely coupled to allow for independent development, deployment, and scaling.

Data Mining

Data mining is the process of discovering patterns and knowledge from large sets of data. It involves the use of various techniques such as statistical analysis, machine learning, and artificial intelligence to extract insights from data. Data mining can be applied to a wide range of fields including business, medicine, and science, and it can be used to predict future trends, identify customer behavior, and detect fraud.

Data Observability

Barr Moses, CEO and co-founder of Monte Carlo Data, coined the term data observability in 2019. According to Barr, data observability is an organization's ability to fully understand the health of the data in its systems. By applying DevOps best practices to data pipelines, data observability aims to eliminate data downtime.

Data Orchestration

The main purpose of a data orchestration tool is to ensure jobs are executed on a schedule only after a set of dependencies is satisfied. Data orchestration tools will do most of the grunt work for you, like managing connection secrets and plumbing job inputs and outputs. An advantage of using data orchestration tools is that they provide a nice user interface to help engineers visualize all work flowing through the system.
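The core idea, running a job only after its dependencies are satisfied, is a topological ordering of a dependency graph. The sketch below uses Python's standard-library `graphlib` (Python 3.9+) and is not how any particular orchestrator is implemented. Scheduling, retries, and secrets handling are left out, and `run_pipeline` is a hypothetical name.

```python
from graphlib import TopologicalSorter


def run_pipeline(jobs, dependencies):
    """Run each job once, only after the jobs it depends on.

    jobs: name -> callable taking no arguments.
    dependencies: name -> set of job names that must finish first
                  (every job should appear as a key).
    Returns the order in which jobs were executed."""
    order = []
    for name in TopologicalSorter(dependencies).static_order():
        jobs[name]()
        order.append(name)
    return order


log = []
jobs = {
    "extract": lambda: log.append("extracted"),
    "transform": lambda: log.append("transformed"),
    "load": lambda: log.append("loaded"),
}
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

executed = run_pipeline(jobs, dependencies)
print(executed)
```

A cycle in `dependencies` makes `static_order()` raise `graphlib.CycleError`, which is the behavior you want from anything that schedules jobs.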

[Architecture Review: Dagster vs. Airflow](https://www.plural.sh/blog/dagster-vs-airflow/)

We dive into the weeds to figure out what separates Dagster from Airflow at the architectural level.

Data Modeling

Data modeling is the process of creating a conceptual representation of data and the relationships between data elements. It is useful for designing and implementing infrastructure such as databases and data warehouses that can store and manage data effectively.

Data Pipeline

A data pipeline is a set of processes that move data from one system or stage to another, typically involving extracting data from one or more sources, transforming it to fit the needs of downstream consumers, and loading it into a target system or data store. Data pipelines can be used to automate the flow of data between systems, to ensure data is processed consistently and efficiently, and to support real-time processing and analytics.

Data Preparation

Data preparation is the process of cleaning, transforming, and normalizing data to make it ready for analysis or modeling. This can include tasks such as removing missing or duplicate data, handling outliers, and converting data into a consistent format. Data preparation is a critical step in the data science process, as it can greatly impact the quality and accuracy of the final analysis or model.

Data Science

Data science is an interdisciplinary field that involves using scientific methods, processes, algorithms, and systems to extract insights and knowledge from data. It includes various steps such as data exploration, data modeling, and data visualization. Data science can be applied to a wide range of fields, including business, healthcare, and science. It is a combination of many techniques and skills such as statistics, machine learning, data visualization, data engineering, and domain knowledge.

Data Quality

Data quality refers to the degree to which data is accurate, complete, consistent, and reliable. It is an important aspect of data management, as poor data quality can lead to incorrect or unreliable insights and poor decision-making. Data quality can be managed through a variety of techniques such as data validation, data cleansing, and data governance. Ensuring data quality is a continuous process that should be accounted for throughout data ingestion, transformation, and analysis.
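Data validation, one of the techniques mentioned above, can start as a handful of explicit rules. The sketch below checks two of the listed dimensions, completeness (required fields) and consistency (unique keys). The rule set and function name are hypothetical and far smaller than a real data quality framework.

```python
def validate_rows(rows, required, unique_key):
    """Return a list of (row_index, problem) tuples for missing required
    fields and duplicate key values."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        for field in required:
            if row.get(field) in (None, ""):
                problems.append((i, f"missing {field}"))
        key = row.get(unique_key)
        if key is not None:
            if key in seen:
                problems.append((i, f"duplicate {unique_key}"))
            seen.add(key)
    return problems


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 1, "email": "b@example.com"},
]

print(validate_rows(rows, required=["email"], unique_key="id"))
```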

Data Source

A data source is a location or system where data is stored or generated. It can be a database, file, or external system such as a website or sensor network. Data sources can provide structured or unstructured data and can be used for a variety of purposes such as business intelligence, data warehousing, and machine learning.

Data Stack

A data stack refers to the collection of technologies and tools that are used to manage and analyze data within an organization. It often includes databases, data warehousing, data pipelines, data visualization, and machine learning. The data stack can vary depending on the specific needs and requirements of an organization, but it is typically designed to support the collection, storage, processing, and analysis of large amounts of data.

[Plural | Deploying Data Stack on Kubernetes](https://www.plural.sh/plural-stacks/data)

Use Plural to deploy and manage the data stack on your own cloud.

Data Warehouse

A data warehouse is a large, centralized repository of data that is specifically designed to support business intelligence and reporting. Data is extracted from various sources, transformed to fit a common data model, and loaded into the warehouse for analysis. Data warehouses are optimized for reading and querying large amounts of data, and they often include features such as indexing, partitioning, and aggregations to support efficient querying.

Data Wrangling

Data wrangling is the process of cleaning, transforming, and normalizing data to make it ready for analysis or modeling. This can include tasks such as removing missing or duplicate data, handling outliers, and converting data into a consistent format. Data wrangling can be time-consuming and labor-intensive, but is an important step in the data science process, as it can greatly impact the quality and accuracy of the final analysis or model.

Deduplication

Deduplication is the process of identifying and removing duplicate records from a dataset. Deduplication can be performed on various fields such as name, address, or email, and it can be done using various techniques such as hashing, string matching, and machine learning. Deduplication is an important step in data preparation, as duplicate records can lead to inaccurate analysis and decision-making.
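A minimal sketch of the hashing approach: normalize the fields that identify a record, hash them, and keep the first record seen for each hash. It assumes exact matches after trimming and lowercasing. Fuzzy string matching and ML-based matching are out of scope, and the function names are hypothetical.

```python
import hashlib


def record_key(record, fields):
    """Hash of the normalized (trimmed, lowercased) values of the chosen fields."""
    normalized = "|".join(str(record.get(f, "")).strip().lower() for f in fields)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(records, fields):
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record, fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


records = [
    {"name": "Ann", "email": "Ann@Example.com "},
    {"name": "Ann B.", "email": "ann@example.com"},
    {"name": "Bob", "email": "bob@example.com"},
]

print(deduplicate(records, fields=["email"]))
```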

ELT

ELT stands for Extract, Load, Transform. It is a process where data is first extracted from various sources, loaded into a target system, and then transformed to fit the needs of downstream consumers. This is different from the traditional ETL process (Extract, Transform, Load), where the data is transformed before being loaded into the target system. ELT allows for more efficient processing of large volumes of data, as it can take advantage of the processing power of modern data warehousing and big data platforms.

ETL

ETL stands for Extract, Transform, Load. It is a process for moving data from one or more sources into a target system, such as a data warehouse, for further analysis and reporting. The process consists of three main steps: extracting data from various sources, transforming the data to fit a common data model, and loading the data into the target system. ETL processes are often automated and scheduled to run regularly to ensure that the target system is up-to-date.
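The difference between ETL and ELT comes down to where the transform step runs. The sketch below uses an in-memory SQLite database as a stand-in for a warehouse, which is an assumption made for the sake of a runnable example. The table names and sample data are hypothetical.

```python
import sqlite3

# Raw source rows: amounts arrive as untidy strings.
RAW = [("alice", " 10.50 "), ("bob", "3"), ("alice", "4.5")]


def etl(conn):
    """ETL: clean the values in application code, then load the result."""
    conn.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL)")
    cleaned = [(name, float(amount.strip())) for name, amount in RAW]
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)


def elt(conn):
    """ELT: load the raw strings as-is, then transform inside the database."""
    conn.execute("CREATE TABLE sales_raw (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", RAW)
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT customer, CAST(TRIM(amount) AS REAL) AS amount FROM sales_raw"
    )


def totals(conn, table):
    query = (
        f"SELECT customer, SUM(amount) FROM {table} "
        "GROUP BY customer ORDER BY customer"
    )
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
print(totals(conn, "sales_etl"))
print(totals(conn, "sales_elt"))
```

Both paths end with the same totals. With ELT the raw table stays available for later re-transformation, at the cost of doing the work inside the target system.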

Machine Learning

Machine learning (ML) is a subfield of artificial intelligence that allows systems to learn from data and improve their performance without being explicitly programmed. Machine learning algorithms can be used to classify, cluster, or predict outcomes based on data. There are various types of machine learning algorithms, such as supervised learning, unsupervised learning, and reinforcement learning. The goal of machine learning is to create models that can make predictions or make decisions based on historical data.

Reverse ETL

Reverse ETL is the process of moving data from a target system, such as a data warehouse, to one or more sources. It is the opposite of the traditional ETL process where data is extracted from the sources and loaded into the target system. Reverse ETL is used when data needs to be propagated back to the source systems after it has been transformed, consolidated, or processed in the target system. This is often important for data governance and data compliance reasons.
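A minimal sketch of that flow, assuming a SQLite database standing in for the warehouse and a made-up `FakeCrm` class standing in for an operational system's API client (neither is a real product API): an aggregate computed in the warehouse is pushed back out, one record at a time.

```python
import sqlite3


class FakeCrm:
    """Stand-in for an operational tool's API client (hypothetical)."""

    def __init__(self):
        self.contacts = {}

    def upsert_contact(self, email, fields):
        self.contacts.setdefault(email, {}).update(fields)


def sync_lifetime_value(warehouse, crm):
    """Read an aggregate computed in the warehouse and push it to the CRM."""
    rows = warehouse.execute(
        "SELECT email, SUM(amount) FROM orders GROUP BY email"
    ).fetchall()
    for email, total in rows:
        crm.upsert_contact(email, {"lifetime_value": total})
    return len(rows)


warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (email TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ada@example.com", 10.0), ("ada@example.com", 5.0), ("grace@example.com", 7.5)],
)
crm = FakeCrm()
synced = sync_lifetime_value(warehouse, crm)
```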

Plural for Data Engineers

Recently, we have noticed a trend among data teams choosing open-source tools for their data stacks when either building or re-evaluating their existing infrastructure. And, with the current state of the market, it makes sense to continuously evaluate your stack to ensure that you are keeping costs down.

Open-source toolkits are growing in popularity among data teams due to their low cost, high flexibility, and helpful developer communities. Data teams typically deploy those toolkits on Kubernetes.

However, the biggest struggle data teams face when using open-source technology is managing, deploying, and integrating the tools themselves in their own cloud.

Plural aims to make deploying open-source applications on the cloud a breeze for organizations of all sizes. In under ten minutes, you can deploy a product-ready open-source data infrastructure stack on a Kubernetes cluster with Plural.

To learn more about how Plural works and how we are helping engineering teams across the world deploy open-source applications in a cloud production environment, reach out to our team to schedule a demo.

Ready to effortlessly deploy and operate open-source applications in minutes? Get started with Plural today.

Join us on our Discord channel for questions, discussions, and to meet the rest of the community.

Architecture Review: Dagster vs. Airflow

Michael Guarino — Wed, 11 Jan 2023 00:22:33 +0000

In a previous article, I argued that the tech industry needs to double down on open-source to work through the tech downturn that will be needlessly extended by companies bleeding themselves dry relying on overpriced commercial software.

The open-source ecosystem is sprawling and is full of potentially cumbersome solutions. Throughout the last year, my team and I have packaged together dozens of open-source tools and have learned the ins and outs of these tools.

Based on our learnings, we are sharing our perspective on open-source tools and which ones we have found to be the best.

We plan on making this a recurring series where we compare tools against each other, and hope to aid engineering teams in their decision-making process.

In this edition, I will compare two popular solutions for data orchestration: Airflow and Dagster.

What is Data Orchestration?

Airflow and Dagster, along with similar tools like Flyte, Luigi, Argo Workflows, and Prefect, are bucketed under a set of systems you could utilize for the orchestration of jobs.

What is Airflow?

Airflow has been the industry standard for data orchestration since Airbnb began the project in 2014. Since then the project took off, and there are almost certainly millions of lines of DAG code scattered across git repos throughout global engineering organizations.

In its time, Airflow was simply an amazing technology and was better than any alternative. However, eight years later it's beginning to show its age. Airflow's monolithic architecture, rampant legacy code, and aging interface have engineering teams ditching the once-popular technology for other alternatives.

What is Dagster?

Dagster is a recently created open-source project targeted at the same problem space as Airflow, but it is built with a modern cloud-native design in mind. For open-source software (OSS), it has a slick interface, a modular architecture, and a decent SDK in which to write DAGs.

To be upfront, we are big fans of Dagster and use it as our default orchestrator for our model data stack. But let's go into more detail about these tools and how they compare against each other.

Airflow is a legacy tool, but it is here to stay.

To be blunt, Airflow is simply an old technology. Before continuing to use it in production, it is worth considering some of its specific flaws.

As mentioned earlier, the clearest defects of Airflow are its monolithic architecture, legacy server code, and outdated interface.

Airflow’s architecture is pretty simple, consisting of a web tier, a scheduler tier, and a worker tier. The web tier accepts CRUD requests and serves the interface. The scheduler polls its DB for jobs that are ready to execute and then sends them to the worker tier, which is either a pool of Celery workers or a dedicated K8s pod per job. DAG code is loaded into all processes and executed directly as Python function calls as work is dispatched.

_Image courtesy of Airflow._
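The scheduler-to-worker flow described above can be sketched as a toy model. This is not Airflow's code or API. It is a simplified stand-in with made-up names (`task_states`, `DAG_CODE`, `scheduler_tick`, `worker_drain`) that shows the key property: the worker imports the DAG code and runs it in its own process.

```python
from collections import deque

# Toy stand-in for the metadata DB: task name -> state.
task_states = {"extract": "ready", "transform": "queued", "load": "queued"}
worker_queue = deque()


def extract():
    return "rows"


# In this architecture DAG code is imported into the worker process itself,
# so every task shares the worker's Python environment.
DAG_CODE = {"extract": extract}


def scheduler_tick():
    """Poll the 'DB' for ready tasks and hand them to the worker tier."""
    for name, state in task_states.items():
        if state == "ready":
            task_states[name] = "running"
            worker_queue.append(name)


def worker_drain():
    """Execute dispatched tasks as plain in-process function calls."""
    results = {}
    while worker_queue:
        name = worker_queue.popleft()
        results[name] = DAG_CODE[name]()
        task_states[name] = "success"
    return results


scheduler_tick()
results = worker_drain()
```

Because `DAG_CODE[name]()` is an ordinary function call, whatever the task imports has to be installed in the worker's environment, which is where the dependency trouble starts.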

While this is a perfectly workable approach for small, one-team use cases, it does not scale. As your organization adopts Airflow, you end up with a severe dependency management problem, and it is more common than you might think. The DAG code needs to be loaded in-process, and multiple teams will likely contribute DAGs to the same Airflow cluster with divergent Python dependencies. That makes it likely that you physically cannot run the DAGs for your entire organization on the same cluster. To run them all, you will need to split the cluster up to accommodate the incompatible pip dependencies.
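A quick way to see the problem is to compare the pins each team would need in a shared environment. The sketch below uses hypothetical team names and version pins. It only understands exact `package==version` lines, which is enough to show the kind of conflict that forces a cluster split.

```python
def find_conflicts(team_requirements):
    """Return packages that different teams pin to different versions."""
    pins = {}
    for team, requirements in team_requirements.items():
        for line in requirements:
            package, _, version = line.partition("==")
            by_version = pins.setdefault(package.strip().lower(), {})
            by_version.setdefault(version.strip(), []).append(team)
    return {pkg: versions for pkg, versions in pins.items() if len(versions) > 1}


# Hypothetical teams and pins, for illustration only.
teams = {
    "analytics": ["pandas==1.5.3", "requests==2.31.0"],
    "ml": ["pandas==2.1.0", "requests==2.31.0"],
}
conflicts = find_conflicts(teams)
```

Here `conflicts` contains only `pandas`, since both teams agree on `requests`. In a single shared worker environment only one of the two `pandas` versions can be installed, so one team's DAGs cannot run as written.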

In fact, pip is a very flaky dependency manager. It’s fairly common for a pip install to upgrade the installed version of Airflow itself, causing a database migration and bringing your cluster into an unknown state that can only be reconciled manually. This is only exacerbated by the second problem: bad legacy code.

Airflow’s migration system is alembic, which is fine for simple flask deployments but is not fit for a repeatedly deployed OSS tool.

I’ve seen numerous cases where Alembic migrations get out of sync because an incorrect version number was persisted (probably from phantom pip upgrades). Migrations are less of a concern with most DB ORMs in other languages, since they handle them in a much more robust and intelligent fashion. This is not true of Airflow.

Additionally, Airflow’s authentication system uses a legacy package called authlib, which is not a huge issue if you just want username/password auth. However, if you want to do something more interesting like setting up OIDC, you will need to spend a few hours looking at terrible, legacy Python code before ultimately realizing you need to subclass a specific Python object to implement an OAuth handler.

It also has unusual, phantom conventions for how users are registered in the Airflow database that can bite you if any auth providers are misimplemented.

Finally, while aesthetics are in the realm of de gustibus, non disputandum est, Airflow’s user interface is considerably out of date. In the world of OSS, this is somewhat expected, but there certainly are better user experiences out there among competing job orchestrators.

That said, we promised to explain why Airflow is here to stay, and there’s a simple answer: there’s a massive amount of existing Airflow code already built.

At most this can be classified as tech debt from the above observations, and the upfront cost of rewriting all that code is rarely worth it if you can just baby your cluster. There are ways to move off if you truly wanted to, and I’d be interested in someone building an API-compatible scheduler to drop in and replace airflow as well, but for now, the path dependencies need to be respected in a lot of codebases.

Airflow is going to be around for a while, which is why we’ve invested a lot of effort in supporting it on Plural. We still want Airflow users to have a simple operational experience with their clusters.

Why Dagster is better than Airflow

The huge innovation Dagster has introduced is leveraging containerization to entirely solve the Airflow monolith dependency issue at the architectural level. It splits up an architecture similar to Airflow’s by moving the scheduling tier into a gRPC-compliant microservice that can accept any number of “user deployments.” These deployments register job types with the scheduler and web server, and the jobs are then spawned as isolated Docker containers within a K8s job, which keeps each deployment’s dependencies and source code in isolated units.

_Image courtesy of Dagster._
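To illustrate the idea (and only the idea: this is not Dagster's API), here is a toy model in which each team's deployment carries its own container image, and the scheduler keeps a registry of job names without ever importing user code. The class names, image URLs, and job names are all invented for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class UserDeployment:
    """A team's code location: its own image, and so its own dependencies."""
    name: str
    image: str
    jobs: list


@dataclass
class Scheduler:
    """Knows job names and where they live, but never imports user code."""
    registry: dict = field(default_factory=dict)

    def register(self, deployment):
        for job in deployment.jobs:
            self.registry[job] = deployment

    def launch(self, job):
        deployment = self.registry[job]
        # Describe an isolated run using the owning team's image.
        return {"job": job, "image": deployment.image, "isolated": True}


scheduler = Scheduler()
scheduler.register(
    UserDeployment("analytics", "registry.example.com/analytics:1", ["daily_report"])
)
scheduler.register(UserDeployment("ml", "registry.example.com/ml:7", ["train_model"]))
run = scheduler.launch("train_model")
```

The point of the design is visible in `launch`: the run is described in terms of the owning team's image, so the two teams can pin conflicting libraries without affecting each other or the scheduler.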

This enables any number of teams to share the same scheduler without the worry of trampling on each other's code, simplifying the operational profile of your setup. This also removes the risk of pip upgrades interfering with Dagster’s core source code and all the database migration headaches that can cause.

On the aesthetic side, Dagster benefits from being built in the 2020s and has a sleek, modern interface, with nice timeline visualization for running jobs alongside more familiar graph visualization.

Like many new OSS projects, Dagster has its warts. The most notable is that its web interface does not come with authentication at all in the OSS version. Plural helps there by using our OAuth proxy infrastructure to inject sidecars that provide authentication with OpenID Connect, but you could also host it on a private network to provide a measure of security. That said, mature projects with web-facing components really should support authentication as table stakes, so its absence is a bit disappointing.

This is more of a niche concern, but I also think they should build an operator for provisioning user code deployments for a running Dagster instance. Currently, their creation is wrapped in a Helm chart, which can in theory be deployed independently.

However, in most realistic cases this will involve all dev teams writing code for Dagster having to submit PRs to a single repo managing the installation of that helm chart, instead of creating deployments in the namespaces or git repos in which they naturally work. Using a CRD to instantiate these would be a natural evolution to the more decentralized operating model the product seems to be built for.

Wrapping Up

Every engineering org will have its own tradeoffs to make in adopting any software, and our preferences will not necessarily be the winning consideration everywhere. Hopefully, we have helped people either learn about a new tool or realize some issues with their assumed favorite before getting too locked in.

Check out our project on Github

Open-Core Companies Are Not Incentivized To Make Their Projects Good

Michael Guarino — Thu, 20 Oct 2022 04:00:50 +0000

Image courtesy of Bartosz Prusinowski from Unsplash.

SaaS companies built on open-source projects (also known as open-core companies) are not incentivized for their underlying project to be good, but rather to be _good enough_.

Few individuals will trust a SaaS product built on a dubious underlying project, so I’m not claiming that these projects are bad. Rather, they are intentionally incomplete.

Many of these open-core companies (which I will hereafter refer to as “unicorns”) have spent time building out their underlying project to help their communities and increase adoption.

However, there comes a point where it is simply not profitable or cost-effective for the SaaS business model to spend time building out features for maintaining the open-source software (OSS) version. The OSS version merely serves as a proof point for guiding a purchaser down the funnel.

Throughout this article, I’ll defend this claim with a few main points, offer some counterarguments, and close with a discussion on the dire need for an update to OSS monetization.

Cannibalization

Photo by Victor Rodriguez / Unsplash

One of the common fears unicorns have is that the open-source offering will cannibalize sales of their cloud service tool. This fear is not baseless.

Consider the case of HuggingFace, a wildly popular AI tool with over 70k stars on GitHub. They are universally loved by the community, have an amazing core project, and have massive adoption. You would think that with these accolades (not to mention their $2B valuation) they would be generating massive revenue.

That’s where you’d likely be wrong. According to a recent Forbes article, HuggingFace brought in less than $10 million in 2021. While that number is nothing to sneeze at, and I’m sure revenues are only improving for 2022, it hardly stands up to the titanic OSS adoption and valuation. With such a huge valuation and growing expectations for revenue, it does bring up two important questions.

What are the root causes?
How does such an adopted project not pull in more cash?

The answer to both of these questions lies in the fact that the OSS project is too effective at what it does. No matter how socially beneficial it is to donate to and support companies that genuinely care for their OSS projects, consumers and companies will rarely do so, as our economic systems don’t incentivize that behavior. While they offer valuable paid services, most users come to HuggingFace for the pre-trained models, which are available out of the box for free.

The HuggingFace community is a genuinely cool place and I am rooting for the project to be successful and profitable. However, companies looking to follow in their footsteps are facing a massive challenge.

Community is a nontrivial endeavor

So even if there is potential cannibalization, thriving communities like HuggingFace will eventually lead to successful businesses… right? While most unicorns will naively believe this to be the case, replicating that success is not so straightforward.

While I am confident that HuggingFace will see success in the long run, many products trying to build a community around their open-core SaaS will likely end up creating a glorified support forum or a place for self-promotion. Additionally, the unicorn will struggle to unify the user bases of its SaaS and open-source offerings, which will appeal to vastly different audiences.

Communities like HuggingFace and dbt provide a lot of genuine value outside of just support and have many dedicated individuals working exclusively on the open-source offering.

For example, dbt has:

An awesome conference in Coalesce that people are legitimately excited about every year
Great meetups and culture around them
A transparent and honest focus on diversity, equity, and inclusion
Great free online courses and learning materials

In this exceedingly thoughtful response post to a callout of dbt’s community, CEO Tristan Handy mentions that in May, they had 10 full-time employees dedicated to dbt Core and 8 full-time employees dedicated to community work. This sort of top-down organizational dedication is impressive and notably rare in smaller organizations.

All said, it takes a lot of dedication to form a healthy and vibrant community. If the unicorn doesn’t allocate time and money to building out these resources for people in an honest way, its community will not resemble any of the success stories that came before it.

A product manager’s nightmare

Photo by Kamil Feczko / Unsplash

So where does that leave our unicorn? Their hosted and OSS projects are diametrically opposed, and the community they fostered is either a support forum, a self-promotion frenzy, or both.

Well, it gets worse.

Every theoretical improvement to the underlying project is limited by the time of their engineers. The unicorn has to constantly make complex decisions on whether or not to contribute every new feature to the open-source project.

Essentially, each feature needs to be inspected to see if it will bring profit as a proprietary offering. Simple improvements that can only be pushed to open-source require dedicated engineers or diverting time that could be spent on proprietary features.

Contribute too few features and the OSS community will weaken and adoption will slow to a halt. And, if you contribute too many features the OSS offering will start the cannibalization process of your hosted service.

This product management problem can be represented by an optimization function described with the following parameters, usually ordered by this priority:

Fix major bugs for the hosted product before anything else.
Fix major bugs for the OSS offering as soon as possible.
Create differentiated features for the hosted product.
Create features for the OSS product that can be absorbed into the hosted product.
Improve the self-hosting experience for OSS products if there’s any spare time (which there usually isn’t).

Not all companies abide by the priority list; to be clear, I am not suggesting that all SaaS companies operate this way. Many will put their OSS product first to gain favor from the community. This is a noble feat but does not guarantee a reward.

The point is, the journey of profit maximization will cause the unicorn to eventually pit themselves against their open-source community, whether they like it or not. This became the norm because venture capital realized that open-source communities are powerful marketing engines for a new startup to break into crowded markets.

The question that follows is: how do we fix this incentive structure?

Counterexamples

Before we go into suggesting what a healthy monetization scheme for the OSS ecosystem would be, I wanted to acknowledge that there are some healthy counterexamples to the points that I have made.

Docker, Elastic, and MongoDB are great examples of succeeding with the hosted open-core SaaS. However, they are potentially large exceptions to this rule. The main differentiator here is that the global scale at which the community adopted their projects made sure that even if they only captured a small percentage of their users, that slice would still bring in significant revenue.

This is the reason why I believe that HuggingFace will eventually be successful. While everyone should aspire to this, the chance of achieving this level of adoption is low and is usually an exception. On the journey toward this level of adoption, companies will need a reliable and sustainable business model to stay afloat.

Fixing the incentive structure

Photo by Danny Howe / Unsplash

To fix the incentive structure, we need to attack the two halves of the problem. One half discusses what we can do about companies that actually benefit from being open-core and the other side discusses companies that don’t.

Let’s start with the first one.

Open-source support agreements are healthy

In the name of endless hypergrowth, unicorns and venture capital realized that support agreements aren’t exponentially scalable. They require lots of hands-on attention, have high customer acquisition costs, and are not always worth the business that they bring in (not all revenue is created equal).

On the other hand, cloud spending is comparably cheap to acquire, requires less hands-on attention, and can scale to your infrastructure and marketing budget. This is often referred to as product-led growth.

However, open-source companies need to accept that the majority of their big deals and logos come from support contracts, not cloud spending. Even if the landscape has changed since the old days of Red Hat’s support model revolution, the support contract system will always retain a mutualistic relationship with supporting the open-source project. The better the OSS project is, the fewer issues that pop up and require hands-on attention. Additionally, companies with support contracts can request features that will usually get pushed upstream for the greater community of users.

One may suggest: “If things are running perfectly, won’t customers reduce their required engagement or remove the support plan?” Generally, no. The cost of keeping experts around is usually far lower than a SaaS bill and new features will always need to be built.

Hypergrowth is cool but is not necessary to create a sustainable and effective business. Sometimes, insane aspirations will even lead to wildly unprofitable companies in the short term.

Your hosted SaaS does not need to be open-core

I have exclusively worked for companies that are based on open-source projects because I have a deep love for open-source communities and the ecosystems they foster. However, I have seen lots of founders opt in to open-core purely for the potential sales funnel that they can create with their open-source “community.”

Additionally, unicorns think that open-source automatically means cool and trendy. This is wildly untrue; there is nothing less cool than an open-source project that is difficult or impossible to self-host. There are only so many benefits to be received from being able to stare at the source code, most of which are already captured through security compliance.

If you are creating a hosted product, seriously consider whether you can both sustain a business and bring legitimate value to a community before you consider making it open-core. Closed-source is not automatically uncool. What is cool is being genuine and honest about the way that you intend to monetize your business.

While we don’t have a SaaS product, one may ask, “won’t your open-source project fall into the same trap?”, which is a fair question.

In turn, we’ve used this as an opportunity to detail our monetization plan here.

If you have any further questions about how we plan to monetize, head over to our community Discord and ask us questions in the #feedback channel. We’ll be glad to help clarify any doubts.

7 Kubernetes Best Practices

Michael Guarino — Mon, 17 Oct 2022 23:08:44 +0000

Photo by Aaron Kato / Unsplash

As more and more applications are built on the cloud, Kubernetes has rapidly become one of the most popular open-source technologies. Containerization is now standard practice in cloud-native development, and as a cluster orchestration tool, Kubernetes (K8s) simplifies the definition and operation of your containerized infrastructure.

K8s is extremely powerful, and once set up, it will unlock many benefits for your team. K8s can make your infrastructure easier to develop and scale. It can improve your product’s ability to withstand and recover from failures. It can simplify organizational challenges when it comes to maintaining infrastructure.

On the other hand, Kubernetes itself is complex and has a steep learning curve. If you’re looking for information about whether Kubernetes is right for your team, check out our in-depth exploration of the pros and cons of K8s.

To help you reduce the complexity and avoid failures, we’ve identified the seven best practices you can apply to your job when using Kubernetes. These will be helpful whether your team has recently decided to start its Kubernetes journey or you’ve already traveled far down the path.

Upgrade your Kubernetes version

Running the latest version of Kubernetes will ensure you have access to new features, bug fixes, and security patches. Outdated versions of Kubernetes may open up vulnerabilities in your system, and you may find it harder to receive support for older versions. Unless you have a reason to maintain an old version, you should upgrade to the latest stable release–available from Kubernetes here.

Use namespaces

Namespaces allow you to organize your cluster and set up logical divisions between domains or functions. Once your namespaces are set up, you can define policies and access rules for each. Namespaces simplify container management and reduce the risk of large-scale failure.

To minimize risk, you should practice the principle of least privilege: everyone should have only the permissions required to perform their function. For instance, to increase security and prevent accidents, you should ensure that developers only have access to namespaces related to their work. That way, if someone’s account were compromised, malicious agents couldn’t wreak havoc all across your infrastructure. Similarly, a developer couldn’t accidentally overwrite parts of your system that reside in other namespaces.

For a deeper dive into namespaces and how to set them up, you can read through the K8s docs.

Set up role-based access control (RBAC)

To make implementing the principle of least privilege easier, use role-based access control.

RBAC makes it simpler to manage the permissions and access granted to your users or service accounts, reducing the possibility of mistakes. Kubernetes allows you to set up roles within namespaces or across the cluster and define their permissions. Accounts can then be linked to those roles, ensuring they only have the permissions granted by those roles.

In the example above, to restrict developers to particular namespaces, developers would be given roles in just those namespaces. The permissions granted by these roles would allow them to deploy their code but not to change the broader operations of the cluster.

Regular maintenance of your organization's roles and permissions is important to ensure only proper access is being granted. Roles make this maintenance much simpler and safer than managing permissions individually for every account.
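As a sketch of what this looks like in practice, the snippet below builds a namespaced Role and a RoleBinding as plain Python dictionaries and prints them as JSON, which kubectl accepts. The namespace `payments`, the user `jane`, and the exact verbs are assumptions chosen for the example. Adjust them to whatever "deploy code but not change the cluster" means in your organization.

```python
import json


def developer_role(namespace):
    """Role limited to managing Deployments and reading Pods in one namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "developer", "namespace": namespace},
        "rules": [
            {
                "apiGroups": ["apps"],
                "resources": ["deployments"],
                "verbs": ["get", "list", "watch", "create", "update", "patch"],
            },
            {
                "apiGroups": [""],  # "" is the core API group
                "resources": ["pods", "pods/log"],
                "verbs": ["get", "list", "watch"],
            },
        ],
    }


def bind_role(namespace, user):
    """RoleBinding that grants the Role above to a single user."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"developer-{user}", "namespace": namespace},
        "subjects": [
            {"kind": "User", "name": user, "apiGroup": "rbac.authorization.k8s.io"}
        ],
        "roleRef": {
            "kind": "Role",
            "name": "developer",
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }


manifests = [developer_role("payments"), bind_role("payments", "jane")]
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Saved as a script, the output can be piped into `kubectl apply -f -`. Because both objects are namespaced, the user gains nothing outside `payments`.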

Organize your cluster with labels

Labels in Kubernetes allow you to attach key-value pairs to K8s objects. As an orchestration platform, Kubernetes lets you define objects to maintain an abstraction layer around your individual clusters and their states. As your infrastructure grows, you’ll end up with a growing number of objects as well. Labels make it easier to manage these objects. At a base level, you can use them to define and track metadata. For instance, you could use a label to track who a pod’s owner is.

Even better, you can query labels and manage objects in bulk using selectors. For instance, if you include a label that tracks which environment each pod is designated for, you could query all of your QA objects through the command line with a command like this:

$ kubectl get pods -l "environment=QA" --show-labels

More information on labels and examples of how to set up selectors can be found in the Kubernetes docs.
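To show what an equality-based selector does conceptually, here is a small Python sketch that filters in-memory objects the way `-l "environment=QA"` filters pods. It handles only comma-separated `key=value` requirements. Real Kubernetes selectors also support `!=` and set-based expressions, which this sketch leaves out. The pod names and labels are made up.

```python
def parse_selector(selector):
    """Parse a comma-separated list of key=value requirements."""
    requirements = {}
    for part in selector.split(","):
        key, _, value = part.partition("=")
        requirements[key.strip()] = value.strip()
    return requirements


def select(objects, selector):
    """Return objects whose labels satisfy every requirement (logical AND)."""
    wanted = parse_selector(selector)
    return [
        obj
        for obj in objects
        if all(obj["labels"].get(key) == value for key, value in wanted.items())
    ]


# Hypothetical pods and labels.
pods = [
    {"name": "api-1", "labels": {"environment": "QA", "owner": "team-a"}},
    {"name": "api-2", "labels": {"environment": "production", "owner": "team-a"}},
    {"name": "web-1", "labels": {"environment": "QA", "owner": "team-b"}},
]
```

`select(pods, "environment=QA")` returns `api-1` and `web-1`, and adding a second requirement such as `owner=team-b` narrows the result further, since requirements are combined with AND.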

Use a Git-based workflow

Kubernetes deployment can be complicated, so having an automated workflow will reduce hassles and errors. Setting up a Git-based workflow and a CI/CD pipeline allows you to maintain a single source of truth for your deployment; with automated deployment, your system will automatically reflect what’s in your repo. GitOps is a commonly used framework for organizing and supporting your workflow.

If any issues arise during your system's operation, a Git-based workflow will also make it simpler to roll back or redeploy.

Set up automated monitoring

Monitoring your cluster is critical for identifying issues and controlling resource usage. Issues with your cluster can worsen your product’s performance, increase your operational costs, and in the worst case, cause outages. Monitoring will allow you to identify these problems more quickly and understand their causes.

You should set up automated monitoring to help you make sense of Kubernetes alerts. Tools like Prometheus and Grafana will help you pull insights out of your data, allowing you to focus on what’s most important and make more informed decisions about your operations.

Set up network policies

Though it may seem safe enough to allow your containers to communicate with any other service behind your firewall, this can pose a risk if malicious actors gain entry into your system.

By default, your containers should deny any traffic unless it is from specifically allowed sources. The K8s docs provide further information on how to set up network policies.
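A minimal sketch of that deny-by-default posture: one NetworkPolicy that selects every pod in a namespace and allows no ingress, plus one that opens a single path. The manifests are built as Python dictionaries and printed as JSON. The namespace, the `app` labels, and port 8080 are assumptions for the example. Keep in mind that policies are only enforced if your cluster's network plugin supports them.

```python
import json


def default_deny_ingress(namespace):
    """Select every pod in the namespace and allow no inbound traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }


def allow_from(namespace, app, client_app, port):
    """Allow pods labeled app=<client_app> to reach app=<app> on one TCP port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"allow-{client_app}-to-{app}", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": app}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": {"app": client_app}}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


policies = [default_deny_ingress("payments"), allow_from("payments", "api", "web", 8080)]
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": policies}, indent=2))
```

Network policies are additive, so the second policy carves an exception out of the first rather than overriding it.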

Continue refining your practices

As your team gets more comfortable with Kubernetes, continue to refine how you incorporate these best practices. You should also develop your own. Every team has different needs, but setting up standards for your organization can reduce unnecessary complexity and prevent mistakes.

And here’s one last tip: if you’re deploying an open-source application on the cloud, try using Plural.

We’ve helped engineering teams across the world manage their applications on Kubernetes. Check out our GitHub and follow along with our documentation to get up and running.

Join us on our Discord channel for questions, discussions, and to meet the rest of the community.

How to Run a Minecraft Server on a Kubernetes Cluster

Michael Guarino — Tue, 27 Sep 2022 20:59:11 +0000

When we initially set out on our journey to empower people to host their own applications, we focused on core infrastructure and useful tooling. While that is still our primary focus and our most common use case, we did consider there are plenty of things that developers may want to self-host, such as video games with server-side components.

While it may not be the most cost-effective way to do this if it lives by itself on a Kubernetes cluster, a Minecraft server can be a fun addition to a project or company’s Plural stack.

Why would you want to self-host your own Minecraft server?

Before we dive into the technical details of how this works, it’s worth explaining the advantages of hosting your own Minecraft server.

Custom rules: Hosting your own server allows you to customize your game experience, as opposed to running the stock game on your local machine.
Mods: By self-hosting, you can add mods to your server, allowing for various quality-of-life improvements, increased gameplay depth, and new experiences. Check out some of those mods on CurseForge here.
Role-based access control (RBAC): While playing on a personal server, you have more robust control over user access and can set role-based access control.

And, while hosting your own Minecraft server is not a new concept, there are a few hurdles you’ll face when you get started using Kubernetes to deploy a Minecraft server.

Minecraft on Kubernetes

Photo by Nina Rivas / Unsplash

The first major hurdle to using a Kubernetes cluster to self-host your Minecraft server is figuring out how to dockerize a Minecraft server process. Fortunately, there’s some pretty amazing prior art from itzg that fully dockerizes a Minecraft server and includes some pretty common modsets, along with the ability to choose your server version and configure other important settings to the game.

Honestly, the difficulty of building that image far exceeds the difficulty of configuring Kubernetes to run it, and itzg deserves a lot of love for getting it working.

Minecraft is a relatively simple system when you break it open. It contains a single node that requires a persistent disk for storing game state and configuration files, and it needs some sort of network interface for Minecraft game clients to connect to the server.

The client-server protocol is built on Transmission Control Protocol (TCP), although there are ways in which it leverages User Datagram Protocol (UDP) as well. It’s fairly easy to convert this to a Kubernetes setup, and it illustrates some nice Kubernetes core concepts.

1. A StatefulSet for the Minecraft node. This simplifies binding a volume to the pod and technically gives a stable network address, although that’s not strictly necessary for these purposes. You can use a Deployment here, but I usually find StatefulSets more straightforward for stateful workloads.
2. A LoadBalancer service mapping to the pods of the StatefulSet in (1). Kubernetes ingress only supports Hypertext Transfer Protocol (HTTP) out of the box, so this is the only k8s-native solution. In theory, you can also use more advanced networking solutions like Istio, but unless you have prior experience with it, it might be best to steer clear of this solution.
3. An external DNS annotation. You will also need an external Domain Name System (DNS) annotation on the service to bind a DNS name to the attached load balancer, which you can see here. Note: If you are using Plural, you can also take advantage of our DNS service to set up the hostname, so there is no need to procure a domain.
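To make the three pieces above concrete, here is a minimal sketch that builds the StatefulSet and LoadBalancer Service as plain Python dictionaries and prints them as JSON, which kubectl accepts alongside YAML. The resource names, labels, storage size, and hostname are placeholders we made up for illustration, and the external-dns annotation key assumes the external-dns project is the DNS integration in use; the image name, the EULA variable, the /data mount, and port 25565 reflect our understanding of the itzg image and Minecraft defaults rather than anything stated in this post.

```python
import json

# Assumptions (not from the post): names, labels, storage size, and hostname
# are placeholders. itzg/minecraft-server is assumed to be the image referred
# to above, and 25565 is the default Minecraft Java Edition port.
APP_LABELS = {"app": "minecraft"}
GAME_PORT = 25565


def minecraft_statefulset(storage="10Gi"):
    """Single-replica StatefulSet with a persistent volume mounted at /data."""
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": "minecraft"},
        "spec": {
            # serviceName normally points at a headless Service; reusing the
            # LoadBalancer Service name here is a simplification.
            "serviceName": "minecraft",
            "replicas": 1,
            "selector": {"matchLabels": APP_LABELS},
            "template": {
                "metadata": {"labels": APP_LABELS},
                "spec": {
                    "containers": [{
                        "name": "minecraft",
                        "image": "itzg/minecraft-server",
                        "env": [{"name": "EULA", "value": "TRUE"}],
                        "ports": [{"containerPort": GAME_PORT, "protocol": "TCP"}],
                        "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                    }],
                },
            },
            # One PersistentVolumeClaim is created per replica from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


def minecraft_service(hostname):
    """LoadBalancer Service exposing the game port, annotated with a DNS name."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": "minecraft",
            # Assumed annotation key, as used by the external-dns project.
            "annotations": {"external-dns.alpha.kubernetes.io/hostname": hostname},
        },
        "spec": {
            "type": "LoadBalancer",
            "selector": APP_LABELS,
            "ports": [{"name": "game", "port": GAME_PORT,
                       "targetPort": GAME_PORT, "protocol": "TCP"}],
        },
    }


if __name__ == "__main__":
    bundle = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [minecraft_statefulset(), minecraft_service("minecraft.example.com")],
    }
    print(json.dumps(bundle, indent=2))
```

Saving the output to a file and passing it to kubectl apply -f should be enough to try this on a test cluster, though a real deployment would also want resource requests and a pinned image tag.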

It is worth calling out one minor complication that arises from Minecraft's reliance on TCP for client-server communication. For example, Amazon Web Services (AWS) doesn’t support TCP naturally with its bare-bones Elastic Load Balancing (ELB) or Application Load Balancer (ALB). However, you can get TCP support by installing the aws-load-balancer-controller, which is also pretty easy to configure with a few annotations:

service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing

service.beta.kubernetes.io/aws-load-balancer-backend-protocol: TCP

service.beta.kubernetes.io/aws-load-balancer-type: external

service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

Most other clouds have TCP-compatible load balancers by default, but it’s still worth watching out for this issue. Also, other games will typically rely on TCP rather than HTTP for communication with their clients, as they were often developed with Local Area Network (LAN) play in mind.
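As a rough illustration of where the AWS annotations listed earlier end up, the sketch below merges them into the metadata of an existing Service manifest held as a Python dictionary. The function name and the sample Service are hypothetical; only the four annotation keys and values come from this post.

```python
import copy

# The four annotations quoted in the post, keyed exactly as written there.
AWS_NLB_ANNOTATIONS = {
    "service.beta.kubernetes.io/aws-load-balancer-scheme": "internet-facing",
    "service.beta.kubernetes.io/aws-load-balancer-backend-protocol": "TCP",
    "service.beta.kubernetes.io/aws-load-balancer-type": "external",
    "service.beta.kubernetes.io/aws-load-balancer-nlb-target-type": "ip",
}


def with_aws_nlb_annotations(service):
    """Return a copy of a Service manifest with the annotations merged in.

    Existing annotations are kept; the input manifest is not modified.
    """
    patched = copy.deepcopy(service)
    annotations = patched.setdefault("metadata", {}).setdefault("annotations", {})
    annotations.update(AWS_NLB_ANNOTATIONS)
    return patched


if __name__ == "__main__":
    # Hypothetical Service used only to show the merge.
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "minecraft", "annotations": {"team": "games"}},
        "spec": {"type": "LoadBalancer"},
    }
    for key, value in sorted(with_aws_nlb_annotations(service)["metadata"]["annotations"].items()):
        print(f"{key}: {value}")
```

Keeping the cloud-specific annotations in a separate step like this means the same base Service can be reused on clouds that need no extra configuration.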

Things to keep in mind when deploying Minecraft on Kubernetes

Kubernetes is a fairly large system that typically has a high hurdle cost to run. On AWS, an EKS control plane alone costs around $70 per month (at $0.10 per hour). We don’t expect most hobbyists to be willing to spend that sort of money for a standalone Minecraft server. That said, it can still be a useful setup in two scenarios:

You want a cluster to be available for you and your friends to run a set of games, such as Minecraft, Terraria, Valheim, or CS:GO. In that case, you might be unable to pack them all onto a single box, and using something like k8s to manage the cluster could keep things relatively healthy.
You use a lightweight k8s distribution like k3s on a low-cost machine. In this scenario, you could theoretically also use docker-compose, but k8s would be a bit more robust and could support clustering if needed.

It’s ultimately a matter of personal preference, but for Kubernetes nerds like us, we think it can be a fun way to play with Kubernetes and some of your favorite games at the same time.

How to Install Minecraft on Plural

As for installing Minecraft on Plural, you can do this by running plural bundle install minecraft minecraft-aws in a preexisting Plural installation on AWS.

For Azure or GCP, use the minecraft-azure or minecraft-gcp bundle names, respectively.

After a plural build and plural deploy, your server will be up and running.

If you don’t have a Plural installation yet, check out our quickstart guide here to get up and running.

A few things to watch out for:

Make sure you’re updated to the latest version of the Java Minecraft launcher.
To connect to the server, you’ll connect directly to the address where it’s hosted on Plural, e.g. minecraft.org_name.app.plural.sh.
Make sure to add a password to your server and add permissions before sharing the address, as anyone will be able to join if they know the address.
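Before sharing the address, it can help to confirm that the server actually accepts TCP connections. The sketch below is one way to do that with the Python standard library; it assumes the server listens on 25565, the default Minecraft Java Edition port, and the helper names are ours. It only checks that a TCP connection opens and does not speak the Minecraft protocol.

```python
import socket

DEFAULT_PORT = 25565  # default Minecraft Java Edition port (assumed here)


def parse_server_address(address):
    """Split 'host' or 'host:port' into (host, port), defaulting the port.

    IPv6 literals are not handled in this sketch.
    """
    address = address.strip()
    host, sep, port = address.rpartition(":")
    if not sep:
        return address, DEFAULT_PORT
    return host, int(port)


def is_reachable(address, timeout=3.0):
    """Return True if a TCP connection to the server can be opened."""
    host, port = parse_server_address(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder hostname following the pattern mentioned above.
    print(parse_server_address("minecraft.org_name.app.plural.sh"))
```

Calling is_reachable with your own hostname from a machine outside the cluster gives a quick signal that the load balancer and DNS record are both in place.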

If you like what we are doing and want to contribute to our project, head over to our GitHub to learn more about Plural.

Getting Started with Kubescape on Plural

Michael Guarino — Mon, 12 Sep 2022 12:31:06 +0000

It’s no secret how popular Kubernetes (K8s), the open-source container orchestration solution, is for deploying cloud-native technologies. A 2021 survey from the Cloud Native Computing Foundation found that 5.6 million developers currently use Kubernetes, up four percent from 2020. While the rise in adoption of Kubernetes is exciting, it does raise a big concern for most DevOps teams:

“How do we ensure our Kubernetes clusters are secure?”

Security is critical for containerized applications that operate on a shared infrastructure. As organizations continue to scale their deployments on Kubernetes, the risk of misconfiguring a Kubernetes cluster only increases. In fact, Gartner estimates that through 2025, 99% of cloud breaches will have a root cause associated with customer misconfigurations or mistakes.

To help combat this massive problem in the Kubernetes landscape, we added Kubescape to our open-source marketplace. Kubescape is now available as a direct install with Plural, a free open-source Kubernetes DevOps platform that allows you to deploy Kubernetes clusters and open-source applications. This provides engineers with immediate access to risk analytics, compliance checks, and vulnerability scanning.

What is Kubescape?

Kubescape is a Kubernetes open-source platform that provides a multi-cloud K8s single pane of glass, including risk analysis, security compliance, RBAC (role-based access control) visualizer, and image vulnerabilities scanning.

The open-source platform works by scanning K8s clusters, Kubernetes manifest files (YAML files and Helm charts), code repositories, container registries, and images. After scanning, Kubescape will detect misconfigurations according to multiple frameworks (such as NSA-CISA and MITRE ATT&CK). During those scans, Kubescape will find software vulnerabilities and show RBAC violations at the early stages of the CI/CD pipeline.
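To give a feel for what a misconfiguration check looks like, here is a deliberately small sketch that inspects a manifest (already parsed into a Python dictionary) for a few common issues. This is not how Kubescape is implemented and the three rules are our own picks for illustration; real frameworks cover far more controls.

```python
def find_misconfigurations(manifest):
    """Return human-readable findings for a Pod or a pod-templated workload."""
    spec = manifest.get("spec", {})
    # Deployments, StatefulSets, etc. nest the pod spec under spec.template.spec;
    # a bare Pod keeps it directly under spec.
    pod_spec = spec.get("template", {}).get("spec", spec)
    findings = []
    if pod_spec.get("hostNetwork"):
        findings.append("pod uses the host network")
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        security = container.get("securityContext") or {}
        if security.get("privileged"):
            findings.append(f"container {name} runs privileged")
        # Flag anything that does not explicitly disable privilege escalation.
        if security.get("allowPrivilegeEscalation", True):
            findings.append(f"container {name} allows privilege escalation")
        if "limits" not in (container.get("resources") or {}):
            findings.append(f"container {name} has no resource limits")
    return findings


if __name__ == "__main__":
    # Hypothetical Deployment used only to exercise the checks.
    deployment = {
        "kind": "Deployment",
        "spec": {"template": {"spec": {"containers": [
            {"name": "web", "securityContext": {"privileged": True}},
        ]}}},
    }
    for finding in find_misconfigurations(deployment):
        print(finding)
```

Running checks like these against manifests in a pull request is the kind of early feedback the scanning step is meant to provide.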

Best of all, you can install Kubescape on Plural with little to no management experience necessary. Here’s what you need to know:

Prerequisites

Before getting started with this tutorial you will need your cloud provider’s CLI installed and configured. For more information on this step please refer to our cloud provider guide and follow the provider-specific instructions.

Create an Account with Plural

If you haven’t done so already, create a free account on our web application. This is only to track your application installations so we can automatically upgrade the applications on your behalf. You will not be asked to provide any infrastructure credentials or sensitive information.

Install the Plural CLI and dependencies

Next, install the Plural CLI and dependencies. Plural’s CLI can be pulled down via curl, brew, or as a prebaked Docker image. Since we’re on a Mac, we’ll use brew. If you’re not on a Mac, you can use one of these other options to pull the CLI down.

brew install pluralsh/plural/plural

Brew will install Plural, alongside terraform, helm, and kubectl for you. If you have already installed any of those dependencies yourself previously, you can add --without-helm, --without-terraform, or --without-kubectl.

Set up a Repository for Configuration

Once Plural is installed in your CLI, you’ll need to set up a Git repository to store your Plural configuration. This will contain the Helm charts, Terraform config, and Kubernetes manifests that Plural will autogenerate for you.

For this step, you have two options to get up and running.

Run plural init in any directory to let Plural initiate an OAuth workflow to create a Git repo for you
Create a Git repo manually, clone it down, and run plural init inside it

Running plural init will start a configuration wizard to configure your Git repo and cloud provider for use with Plural. You're now ready to install Kubescape on your Plural repo.

Installing Kubescape

To find the Kubescape bundle name for your cloud provider, run:

plural bundle list kubescape

Now, to add it to your workspace, run the install command. If you're on AWS, this is what the command would look like:

plural bundle install kubescape kubescape-aws

Plural's Kubescape distribution has support for AWS, GCP, and Azure, so feel free to pick whichever best fits your infrastructure.

The CLI will prompt you to choose whether you want to use Plural OIDC. OIDC allows you to log in to the applications you host on Plural with your login to app.plural.sh, acting as an SSO provider.

To generate the configuration and deploy your infrastructure, run:

plural build

plural deploy --commit "deploying kubescape"

Note: Deployments will generally take 10-20 minutes, based on your cloud provider.

Installing the Plural Console

To make management of your installation as simple as possible, we recommend installing the Plural Console. The console provides tools to manage resource scaling, automated upgrades, dashboards tailored to your Kubescape installation, and log aggregation. This can be done using the exact same process as above, using AWS as an example:

plural bundle install console console-aws

plural build

plural deploy --commit "deploying the console too"
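If you would rather script these steps than type them, the sketch below wraps the same commands in a small runner that stops at the first failure. The command lines are copied from this guide (using AWS as the example), while the runner itself, its dry-run default, and the function name are our own additions. Note that plural bundle install asks questions interactively, so a real run still needs a terminal.

```python
import shlex
import subprocess

# Commands copied from the steps above, in the order the guide runs them.
STEPS = [
    ["plural", "bundle", "install", "kubescape", "kubescape-aws"],
    ["plural", "build"],
    ["plural", "deploy", "--commit", "deploying kubescape"],
    ["plural", "bundle", "install", "console", "console-aws"],
    ["plural", "build"],
    ["plural", "deploy", "--commit", "deploying the console too"],
]


def run_steps(steps, dry_run=True):
    """Run each command in order and return the command lines handled.

    With dry_run=True (the default) nothing is executed; commands are only
    printed. Otherwise the first failing command raises CalledProcessError
    and the remaining steps are skipped.
    """
    handled = []
    for command in steps:
        line = shlex.join(command)
        print(("[dry-run] " if dry_run else "$ ") + line)
        if not dry_run:
            subprocess.run(command, check=True)
        handled.append(line)
    return handled


if __name__ == "__main__":
    run_steps(STEPS)  # prints the plan without executing anything
```

Defaulting to a dry run is a deliberate choice here, since each deploy can take 10-20 minutes and creates real cloud resources.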

Accessing your Kubescape installation

Now, head over to kubescape.YOUR_SUBDOMAIN.onplural.sh to access the Kubescape UI. If you set up a different subdomain for Kubescape during installation, make sure to use that instead.

Accessing your Plural Console

To monitor and manage your Kubescape installation, head over to the Plural Console at console.YOUR_SUBDOMAIN.onplural.sh.

Moving forward with Kubescape and Plural

If you have any issues with installing Kubescape on Plural, feel free to join our Discord community so a member of our team or community can help you out.

If you'd like to request any new features for our Kubescape installation, feel free to open an issue here.

The Pros and Cons of Kubernetes

Michael Guarino — Wed, 07 Sep 2022 19:30:28 +0000

I recently talked to a Head of Engineering at a 700-person e-commerce company who just started her new role. While hiring and planning for the fourth quarter are high on her ever-growing list of things to do, she is also investigating reducing her organization's management overhead and cloud spend.

Due to this, she is considering adopting Kubernetes in production environments for her organization. At this point, some may be surprised at “Kubernetes” and “cost-savings” existing in the same sentence, but she and many other engineering leaders are seriously evaluating Kubernetes for this use case. A 2021 survey from CNCF reported that 96% of organizations are either using or evaluating Kubernetes as a means to reduce cloud spending.

With the adoption of Kubernetes on the rise at most organizations, it’s evident that it’s an extremely popular and useful open-source project. But, as is the case with most projects, there are advantages and disadvantages that you should take into consideration.

Over the last year, I have spoken with dozens of engineering leaders to learn more about their experiences with the popular container orchestration software.

pluralsh / plural

Deploy open source software on kubernetes in record time. 🚀

The fastest way to build great infrastructure

Plural empowers you to build and maintain cloud-native and production-ready open source infrastructure on Kubernetes

🚀🔨☁️

✨ Features

Plural will deploy open source applications on Kubernetes in your cloud using common standards like Helm and Terraform.

The Plural platform provides the following:

Dependency management between Terraform/Helm modules, with dependency-aware deployment and upgrades.
Authenticated docker registry and chartmuseum proxy per repository.
Secret encryption using AES-256 (so you can keep the entire workflow in git).

In addition, Plural also handles:

Issuing the certificates.
Configuring a DNS service to register fully-qualified domains under onplural.sh to eliminate the hassle of DNS registration for users.
Being an OIDC provider to enable zero touch login security for all Plural applications.

We think it's pretty cool! 😎 Some other nice things:

☁️ Build and manage open cloud-native architectures

The plural platform ingests all deployment artifacts needed to deploy…

View on GitHub

Here is what I have learned.

What is Kubernetes Used For?

Kubernetes is an open-source industry standard for delivering containerized applications with an inherent microservices architecture. Photo by Etienne Girardet / Unsplash

Before we dive into the pros and cons of using Kubernetes, we should briefly explain why Kubernetes is so popular nowadays.

Kubernetes, also known as K8s, recently turned eight years old. In those eight years, it has changed the modern engineering landscape. Kubernetes is an open-source industry standard for delivering containerized applications with an inherent microservices architecture.

Before Kubernetes, everything was bespoke. Engineers managed distributed systems using an assortment of one-off cloud consoles, bash scripts, and Python scripts. If you have any experience wrangling distributed systems in the past decade, you likely have encountered these exact quirks and frustrations that come along with doing so.

Kubernetes brought standardization to distributed systems – something that was desperately needed. Developers finally had a way to describe deployment logic, configuration management, networking, RBAC policies, and ACL rules that was interchangeable across an on-prem setup or any cloud provider.

Thanks to Kubernetes, you can declaratively specify what your infrastructure should look like within a set of YAML files. You can then package those files up and then move them around between different clusters, providing some desperately needed portability.
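The declarative idea is easier to see with a toy example. In the sketch below, the desired state is just a mapping from workload name to replica count, and a planning function works out what would need to change to make the observed state match. This is a simplified illustration of the concept, not Kubernetes' actual controller logic, and all names and numbers are made up.

```python
def plan_changes(desired, observed):
    """List the actions needed to move the observed state to the desired one.

    Both arguments map a workload name to its replica count.
    """
    actions = []
    for name, replicas in sorted(desired.items()):
        current = observed.get(name)
        if current is None:
            actions.append(f"create {name} with {replicas} replicas")
        elif current != replicas:
            actions.append(f"scale {name} from {current} to {replicas}")
    # Anything running that is no longer declared gets removed.
    for name in sorted(set(observed) - set(desired)):
        actions.append(f"delete {name}")
    return actions


if __name__ == "__main__":
    desired = {"api": 3, "worker": 2}  # what the YAML files declare
    observed = {"api": 1, "cron": 1}   # what is currently running
    for action in plan_changes(desired, observed):
        print(action)
```

Because the desired state is plain data, the same declaration can be applied to a different cluster and the planner simply computes a different set of actions, which is where the portability comes from.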

When not to use Kubernetes

Sometimes it doesn't make sense to use Kubernetes. Here are two main reasons why you should avoid it. Photo by Ehimetalor Akhere Unuabona / Unsplash

The most obvious argument against using Kubernetes is that organizations often don’t need the high availability that Kubernetes provides and don’t have multiple containers to deploy. While these are valid, they are generally surface-level concerns.

Kubernetes has a steep learning curve

The most common systemic argument against using Kubernetes is that it’s overly complex to get up and running. For starters, the learning curve is steep. There is also an overwhelming amount of content covering a variety of topics, often leaving developers puzzled as to where to start when learning Kubernetes.

My Kubernetes journey started at PlanetScale, where I was working on early versions of their database as a service that deployed on Kubernetes. I found myself constantly spinning up clusters, getting familiar with common debugging workflows, and testing locally with minikube and kind. I carried the pager for our production clusters and learned how to debug issues from Kubernetes experts. In addition to Ops, I had to directly interface with the K8s API when developing, which provided some useful context and depth to my understanding of the technology.

Where does this leave the aspiring student of Kubernetes? Everyone learns in different ways, but I’ve found that there are few substitutes for hands-on experience here. This is a genuine concern, and a lack of access to real experience is sometimes (but not always) a deal breaker.

The bright side is that every engineering leader I spoke with emphasized that you are going to only use about 10 resources within the Kubernetes API consistently. Over time you’ll slowly learn how all those work and fit together. Here are some resources to explore if you aren’t sure where to begin:

At the end of the day, if you are unable to find experienced engineers that understand running Kubernetes in production, setting it up and managing it yourself is not recommended.

Deploying and managing Kubernetes clusters is complex

Another common argument engineering leaders made against the adoption of Kubernetes is that smaller engineering teams have a hard time deploying and managing Kubernetes clusters. In fact, they found that over the years most of the organizations they have seen that do end up adopting Kubernetes have entire DevOps teams dedicated to dealing with the complexity that comes from doing so. Smaller engineering teams do not have this luxury, and the experience shortage we talked about previously becomes increasingly relevant here.

For Kubernetes to not be a hassle for your engineering team, you should at least have a team dedicated to managing clusters, or at a bare minimum, a dedicated engineer with years of experience with Kubernetes.

However, with budgets decreasing for companies worldwide, the opportunity cost for hiring a person or team to manage Kubernetes may be prohibitively high, leading many to stick to the more comfortable world of VMs.

Why you should use Kubernetes

Kubernetes does offer an array of benefits to organizations. Photo by Devin H / Unsplash

For starters, Kubernetes is open source, has a strong community, and has attracted excellent contributions from vendors in the cloud ecosystem. On top of that, Kubernetes allows you to run your application on multiple cloud providers or a combination of on-premises and cloud, thus allowing you to avoid vendor lock-in.

Every engineering leader I spoke with agreed that Kubernetes is an extremely powerful tool, and developers at companies of all sizes can immediately reap the benefits of using it for their projects.

Kubernetes has built-in self-healing for all of your running containers and ships with readiness and liveness checks. When containers go down or are in a bad state, things often return to the status quo automatically or with plug-and-play debugging workflows.
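As a sketch of what those checks look like in practice, the helper below builds a container spec with HTTP liveness and readiness probes as a Python dictionary. The probe field names follow the Kubernetes container spec, while the health path, port, timing values, and image name are assumptions chosen for illustration.

```python
import json


def http_probe(path, port, **timing):
    """Build an HTTP GET probe; timing fields are passed through unchanged."""
    return {"httpGet": {"path": path, "port": port}, **timing}


def container_with_probes(name, image, port, health_path="/healthz"):
    """Container spec with liveness and readiness probes on the same endpoint."""
    return {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # Liveness: the kubelet restarts the container after repeated failures.
        "livenessProbe": http_probe(
            health_path, port,
            initialDelaySeconds=10, periodSeconds=10, failureThreshold=3,
        ),
        # Readiness: the pod receives Service traffic only while this passes.
        "readinessProbe": http_probe(health_path, port, periodSeconds=5),
    }


if __name__ == "__main__":
    # Hypothetical image name and port.
    print(json.dumps(container_with_probes("web", "example/web:1.0", 8080), indent=2))
```

In a real service the two probes often point at different endpoints, since being alive and being ready to serve traffic are not always the same condition.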

If you are looking to save money while running infrastructure at scale, Kubernetes can make sense for your organization. Kubernetes has auto-scaling capabilities, allowing organizations to effectively scale up and down the number of resources they are using in real time.
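For a sense of how scaling decisions are made, the Horizontal Pod Autoscaler documentation describes the core calculation as the current replica count multiplied by the ratio of the current metric to the target metric, rounded up. The sketch below implements that formula with a tolerance band and min/max bounds; the default bounds and the sample numbers are our assumptions, and the real autoscaler has additional behavior (stabilization windows, handling of missing metrics) that is left out.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count suggested by ceil(current_replicas * current / target).

    No change is suggested while the ratio stays within the tolerance band.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # 3 pods averaging 200m CPU against a 100m target.
    print(desired_replicas(3, 200, 100))
    # 4 pods averaging 50m CPU against a 100m target.
    print(desired_replicas(4, 50, 100))
```

The upper bound matters for the cost argument: it caps how far a traffic spike or a misbehaving metric can grow the bill.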

Additionally, if you are moving from VMs to containers, you will reap the lower maintenance costs of containers, as consistent deployments and portability will reduce friction within your organization.

Due to the advent of containerization, the plug-and-play nature of open-source software is more powerful than it ever has been. Adding software to your Kubernetes cluster can be as simple as copying some configuration and running a few commands in your terminal. Open-source alternatives to managed software can net you meaningful cost savings in the long run and will receive networking benefits from being collocated.

Moving toward running OSS software on Kubernetes from paying for disparate managed services is a lucrative opportunity to save costs and may be one of the biggest benefits of using Kubernetes as your business begins to mature.

Is Kubernetes worth it?

Ultimately, we believe that Kubernetes is oftentimes worth the investment (in terms of engineering resources) for most organizations.

If you have the right engineers, enough time, and resources to effectively run and upkeep Kubernetes, then your organization is likely at a point where Kubernetes makes sense. I understand that these are not trivial prerequisites, but if you can afford to hire a larger engineering team you are likely at a point where your users heavily depend on your product to be operating at peak performance constantly.

However, if you are looking to deploy your application into production and are short on either time or engineering resources, it might be better to steer clear of Kubernetes for the time being.

Especially if you are not familiar with Kubernetes and are short on time, it likely is a bad idea to hack away at it and misconfigure a cluster exposing sensitive information that could lead to data exfiltration and other hacking attempts.

However, if you’re considering deploying open-source applications onto Kubernetes, it has never been easier to do so than with Plural.

It requires minimal understanding of Kubernetes to deploy and manage your resources, which is unique for the ecosystem.

If you like what we’re doing here, head over to our GitHub and check out our code, or better yet, try your hand at adding an application to our catalog.

This post was co-written by Abhi Vaidyanatha, our Head of Community.

Bringing a good OSS experience to Kubernetes DevOps

Michael Guarino — Wed, 24 Aug 2022 14:29:51 +0000

This blog post was originally featured on the CNCF blog.

Open-source is at a crossroads at the moment. 97% of data stacks contain open-source code. However, deploying and managing open source applications is tedious and time-consuming.

Take an open source product like Apache Airflow for example. The popular workflow management system has over 26k stars on GitHub and is used by over 14k companies. Currently, the consensus among developers is to deploy their own Airflow instance themselves on Kubernetes.

Sounds simple, right?

Configuring and managing Airflow on Kubernetes can be difficult, especially if you are short on engineering resources or have never deployed Airflow on Kubernetes. Airflow is a complicated stateful application that depends on a SQL database and a Redis cache. Each of these components is technically difficult to operate, and companies that ultimately end up using Airflow usually dedicate an engineer to scaling it efficiently while ensuring it is always up and running.

It doesn’t have to be this challenging to get going with open-source applications on Kubernetes.

In this post, we’ll discuss:

Why use Kubernetes to deploy your data stack?
Three constraints organizations face when deploying open-source applications on Kubernetes
Solving the three DevOps constraints

Why Use Kubernetes to Deploy Your Data Stack?

Deploying your data stack on Kubernetes is the current consensus for DevOps professionals. Photo by Austin Chan / Unsplash

Over the past few years, I have seen a dramatic increase in companies opting to deploy their data stacks with Kubernetes. Throughout my engineering career, which includes tours at Amazon and Facebook, a lot of what I did was scale open-source applications in a completely self-managed way.

Obviously, at these organizations, we had access to greater engineering resources which allowed us to shift this process completely in-house. We didn’t need to use a managed service in the cloud, or a cloud offering from a software vendor to scale out open-source applications.

When I was first introduced to Kubernetes, it became pretty obvious to me it had the potential to automate a lot of the knowledge we had in-house at these organizations and deliver those benefits to the wider engineering community.

While I could go on and on regarding the benefits of using Kubernetes to deploy your data stack, I do think it’s important to highlight the three main benefits I have seen from customers that run their data stack on Kubernetes.

Cost savings: In my opinion, the biggest benefit is the cost that organizations save, especially when they begin to mature as a business. Larger organizations that are running infrastructure at scale can realistically reduce their cloud bill by upwards of a million dollars per year. I have also seen incredible cost savings, especially when compared against the managed service layer. Buying a managed service solution often comes with a 40 percent markup to compute. With large-scale batch processing jobs, that cost adds up quickly.
Simpler security model: Everything is a hardened network, with no worries about privacy and compliance issues. Being able to have strong compliance and privacy around product analytic suites is especially helpful when companies begin to talk about GDPR and CCPA environments that are challenging to enforce across the board at scale.
Operationally simpler to scale out to multiple solutions: In most cases, developers likely have to run multiple technologies to solve their use case. For something like a data stack, you can consider Airbyte with other solutions such as Superset, Airflow, and Presto. But that is challenging and time-consuming to do effectively. If they are committed to running all those applications together, a self-hosted Kubernetes model becomes quite powerful because it unifies all applications in a single environment. Developers can create a unified management experience with a web UI on top. The other benefit is that the application upgrade process is unified. Upgrading a tool like Kubeflow involves a lot of dependencies at the Kubernetes level. Those dependencies might clash with your deployment of an application like Airflow, which requires something similar to K9 or Istio under the hood. With a unified platform, this process is simplified: developers can easily run a diff check between the dependencies of all the applications and validate that they will be upgradable at any given time.
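The dependency diff check mentioned in the last point can be pictured with a small sketch. Here each application declares the versions of the shared cluster components it expects, and the function reports any component that two applications pin differently. The application names, component names, and versions are all hypothetical, and this is an illustration of the idea rather than how any particular platform implements it.

```python
from collections import defaultdict


def find_conflicts(app_requirements):
    """Map each shared dependency to the apps that pin different versions of it.

    app_requirements maps an app name to {dependency: version}.
    Dependencies on which every app agrees are left out of the result.
    """
    pins = defaultdict(dict)
    for app, requirements in app_requirements.items():
        for dependency, version in requirements.items():
            pins[dependency][app] = version
    return {
        dependency: by_app
        for dependency, by_app in pins.items()
        if len(set(by_app.values())) > 1
    }


if __name__ == "__main__":
    # Hypothetical apps and versions, for illustration only.
    requirements = {
        "app-a": {"istio": "1.14", "cert-manager": "1.9"},
        "app-b": {"istio": "1.12", "cert-manager": "1.9"},
        "app-c": {"cert-manager": "1.9"},
    }
    for dependency, by_app in find_conflicts(requirements).items():
        print(dependency, by_app)
```

An empty result before an upgrade is the signal that the whole stack can move together; a non-empty one points at exactly which applications need attention first.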

Three Constraints Organizations Face When Deploying Open-Source Applications on Kubernetes

Developers deploying open-source applications on Kubernetes usually face these three common constraints. Photo by Mike Szczepanski / Unsplash

Prior to founding Plural, I began to investigate the Kubernetes deployments of popular open-source applications. During my research, it became clear that there was a wide gap between a solid Kubernetes deployment and a fully hosted offering from either a cloud provider or a legacy software vendor. So, while Kubernetes has a lot of technical potential, there is still a huge gap that is not generally commercially viable to the average developer.

After talking with hundreds of developers over the last two years, I have seen these three common constraints among organizations looking to deploy open-source applications on Kubernetes.

Applications need to be tailored to each specific cloud. It has become pretty clear to me that solving for cloud customizability is as complex as a code management problem. Each cloud has its own services, APIs, and conventions which leads to developers navigating multiple sets of documentation depending on the cloud provider.
Running applications directly on Kubernetes is deeply customizable. This usually impedes a functional out-of-the-box experience, which most legacy tools and cloud providers are unable to accomplish. And solving a complicated problem like this in-house is a hassle, especially if you are already short on engineering resources.
Application lifecycle management is challenging. Similar to solving for cloud customizability, application lifecycle management is an extremely complex process. In my experience, I have found that some applications are easy to install but a pain to manage, which leads to technical debt.

Currently, legacy constraints prevent all the major cloud providers (Amazon, Google, Microsoft) from doing this right since they need to protect their core cloud business. The core technology needed to solve this is there but is not readily available and offered in a holistic manner to the open market.

Solving the three DevOps constraints

While the constraints I just mentioned can be solved with enough DevOps firepower, they can present genuine roadblocks to teams without the specialized experience required to run OSS applications on Kubernetes. With our open source project, Plural, we have specifically solved each of these constraints.

How Plural solves for the three common DevOps constraints. Image courtesy of CNCF.

Plural is a free, open-source, unified application deployment platform that dramatically simplifies running open-source applications on Kubernetes. Plural aims to make applications production-ready from day 0 and offers over 60 packages ready to deploy from our marketplace, allowing you to truly build the open-source stack of your choice.

With Plural, you also can:

Install pre-packaged open-source applications with one command
Add authentication/SSO to your open-source apps
Deploy and manage deployments of Kubernetes with minimal prior experience
Utilize a GitOps workflow with a batteries-included transparent secret encryption

If you are interested in learning more about Plural, everything we built is open-source, so feel free to check it out on our GitHub. If there is an open-source application you want available on the Plural marketplace, we encourage you to contribute it to the Plural ecosystem by following our documentation which outlines how to do so. Most of all, we are dedicated to making Kubernetes infrastructure easy to deploy for everyone by fostering a community of self-hosted engineers powered by open source.

Why You Shouldn't Overlook Day 2 Kubernetes

Michael Guarino — Tue, 23 Aug 2022 12:22:28 +0000

Day 2 Kubernetes can be challenging. Here's why you shouldn't overlook its implications. Photo by Alexandr Bormotin / Unsplash.

Deciding to implement Kubernetes (Day 0) and then getting your first deployment up and running (Day 1) is hard enough. But then there’s everything that comes after, commonly known as Day 2 Kubernetes. Many organizations overlook this stage, which is fraught with challenges and problems.

Once the initial excitement wears off, Day 2 is the make-or-break moment when your team needs to figure out how to manage and maintain Kubernetes for the long term. Otherwise, as you add features to your app and grow the complexity of your deployment, costs can and will pile up in the form of expensive outages, integration headaches, and lost developer velocity.

I have spent the past year talking to dozens of best-in-class DevOps teams about how to overcome some common operational challenges engineering teams face when wrangling Kubernetes.

Here is what I learned:

Why solving Day 2 Kubernetes is crucial

Day 2 Kubernetes can often feel like a puzzle for most engineering teams. Photo by Markus Winkler / Unsplash

Day 2 Kubernetes covers DevOps processes—like monitoring, testing, runbooks, and alerting—that maintain the performance and reliability of your clusters. Often, these operations aren’t given careful thought in the initial push to deploy Kubernetes as quickly as possible. After all, there’s an extensive amount of terminology and concepts to learn in order to break into Kubernetes and just figure out the basics, like how to convert a Docker Compose file into a production K8s service.

However, while figuring out your initial deployment, it’s important to also think ahead to Day 2 and beyond. As with any open-source technology, choosing to self-host Kubernetes rather than a managed solution can provide huge cost savings and flexibility, but it comes with risks.

If your Kubernetes clusters are not well managed, monitored, or understood, your engineers can end up spending a significant amount of time root-causing and fixing failures. Security breaches or governance issues could lead to PR or compliance disasters. You could run up cloud costs as a result of misconfigurations. And overall, morale can take a hit as engineers spend more time writing Helm charts than they spend working on product features.

What problems do organizations face with Day 2 Kubernetes?

While it varies by organization, you can break down Kubernetes Day 2 problems into the below five areas. Photo by Rob Wicks / Unsplash

The problems that engineering organizations encounter when managing K8s tend to break down into these five areas:

Learning curve & knowledge transfer

Whether you’re using Kubernetes for just your data stack or converting your entire monolithic system into distributed microservices, you want to avoid a situation where just one or two engineers are responsible for maintaining your solution. However, there’s a steep learning curve and an overwhelming amount of material out there about K8s.

Furthermore, not only do you have to master the core Kubernetes API, you also have to master the toolchains to manage K8s. With so many options out there for different tools (Helm or Kustomize? Terraform or Ansible?), your solution will often end up being very specialized, which makes it painful to onboard new engineers and easy to lose knowledge that lives with only a few engineers in the org.

Visibility

In most cases, especially if you use AWS, you won’t have a dashboard built-in for Kubernetes. To understand what all your resources are, you’ll need to use the command-line interface (kubectl)—and while some people are very comfortable with this, most aren’t and need the benefit of a visual interface.

Third-party app integrations

Often, the problems you’ll face with Day 2 Kubernetes aren’t technically Kubernetes problems. Rather, it’s the operational idiosyncrasies of how other applications interact with K8s that will give you headaches. For example, if you want to deploy Airflow on Kubernetes, you might not know how to scale the database underneath it or how to scale the workers, which metrics to visualize, or what CPU/memory tradeoffs to make.

This operational knowledge is unique to each application and has to be learned from scratch every time there’s a new open-source tool you want to use on Kubernetes. Any misconfigurations could result in a higher cloud bill than you really need to spend.

Monitoring, alerting, and disaster recovery

While you can get some logging built in with K8s, in Day 2 it’s essential to set up your logs to connect to a central system (or set of tools) that you use for observability and alerting. Logging a dynamic, distributed system like Kubernetes is complicated. You’ll want to monitor multiple layers (e.g. node and cluster levels), each with its own lifecycle and different kinds of logs.

Along with logging, alerting and disaster recovery strategies are a must for Day 2 Kubernetes. Again, teams can run into problems here because of the distributed nature of the system. It may not always be clear who the owner is for each service, so the person on-call might have no idea what to do or even who to contact in the case of an outage.

Security and governance

Kubernetes can be beneficial from a security perspective. If you have a consolidated networking layer using K8s, you don’t have to worry about exposing more data than you need to, and you can run an extra-secure layer on top of potentially less-secure third-party apps.

However, the way you store secrets and check for vulnerabilities will need to be adapted to work for Kubernetes, which can be especially challenging if you’re new to managing a distributed system. Furthermore, you’ll need to set up new access controls that follow your company’s best practices around governance and compliance.

What a solution to Kubernetes Day 2 looks like

A Kubernetes Day 2 Solution has to cover at a minimum the below six components. Photo by Antonio Janeski / Unsplash

In my experience, a solution to Day 2 Kubernetes needs to have the following components at a minimum:

Dashboarding: A visual interface for managing your resources, for people who don’t want to use the command line.
Integration testing suite: When you push a new version of a package to production, you want some way to automatically deploy it to test clusters and run health checks to make sure that everything is working perfectly.
Access controls: It should be easy to set up access controls for your cluster from a central location, and audit trails should be baked in.
Observability and alerting: If anything goes wrong, you need to be able to root-cause the issue quickly and alert the right people.
Runbooks for disaster recovery: When there’s an issue, you need runbooks so that anyone on-call can quickly implement a fix. Which leads to the final point…
Automation: Too often, teams end up reinventing the wheel when managing Kubernetes. When you want to deploy anything on K8s, you should be able to quickly find all the dashboards you need, all the hooks for scaling, and interactive runbooks that make the process repeatable.

Many companies try to string these components together from different fragmented DevOps solutions. However, to have a really effective solution, you need the whole suite to work together. When an alert fires, it should hook up to a runbook and point you to the fix. All your operations should be automated, and the knowledge around these operations should be accessible and available to everyone on the team, not just a few engineers.
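
To make the alert-to-runbook idea concrete, here is a minimal sketch of a routing table that ties each alert to an owner and a runbook. The alert names, team names, and runbook paths are all hypothetical, and a real setup would more likely live in your alerting tool's configuration than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    owner: str    # team or person to page
    runbook: str  # where the fix is documented


# Hypothetical alert names, owners, and runbook paths (illustration only).
ROUTES = {
    "PostgresDiskAlmostFull": Route("data-platform", "runbooks/postgres-resize-disk.md"),
    "AirflowWorkersSaturated": Route("data-eng", "runbooks/airflow-scale-workers.md"),
}

# Used when nobody has claimed an alert yet, so on-call always has a starting point.
FALLBACK = Route("platform-oncall", "runbooks/general-triage.md")


def route_alert(alert_name: str) -> Route:
    """Return the owner and runbook for an alert, or the fallback route."""
    return ROUTES.get(alert_name, FALLBACK)


route = route_alert("PostgresDiskAlmostFull")
print(f"page {route.owner}, open {route.runbook}")
```

The fallback route is the important design choice here: an alert with no known owner still lands somewhere with a documented first step.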

To learn more about how Plural works and how we are helping engineering teams across the world deploy open-source applications in a cloud production environment, check out our GitHub to get started today.

Join us on our Discord channel for questions, discussions, and to meet the rest of the community.

Kubernetes StatefulSets are Broken

Michael Guarino — Thu, 11 Aug 2022 18:03:00 +0000

Don't get me wrong; we are strong supporters of Kubernetes. It is a critical piece of our architecture and provides massive value when wielded correctly. But, Kubernetes was originally intended to act as a container orchestration platform for stateless workloads, not stateful applications.

Over the past few years, the Kubernetes community has done a great job evolving the project to support stateful workloads by creating StatefulSets, which are Kubernetes' answer to storage-centric workloads.

StatefulSets run the gamut from databases, queues, and object store to janky old web applications that need to modify a local filesystem for whatever reason. They provide developers with a set of pretty powerful guarantees:

Consistent network identity for each pod: This allows you to easily configure the DNS address of the pod in your application. It works great for database connection strings or configuring complicated Kafka clients. We also use it at times for setting up Erlang’s mesh network.
Persistent volume automation: Whenever a pod is restarted, even if it is rescheduled onto a different node, the persistent volume is reattached to the node the pod is placed on. This is somewhat limited by the capabilities of the CSI (Container Storage Interface) driver you’re using. For instance, on AWS this only works within the same availability zone (AZ), since EBS volumes are AZ-linked.
Sequential rolling updates: StatefulSet updates are designed to be rolling and consistent. The controller always updates pods in the same order, which can help preserve systems that have delicate coordination protocols.
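
As a small illustration of the network identity guarantee, the sketch below builds the stable DNS names that StatefulSet pods get behind a headless Service. The StatefulSet name, Service name, namespace, and port are made up for the example, and the default cluster domain cluster.local is assumed.

```python
def statefulset_pod_dns(name, service, namespace, replicas, cluster_domain="cluster.local"):
    """Return the stable DNS name of every pod in a StatefulSet.

    Pods are named <statefulset>-<ordinal> and are resolvable through the
    governing headless Service as <pod>.<service>.<namespace>.svc.<domain>.
    """
    return [
        f"{name}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


# Hypothetical names: a 3-replica "kafka" StatefulSet in the "data" namespace.
hosts = statefulset_pod_dns("kafka", "kafka-headless", "data", replicas=3)

# Because the names never change across restarts, they can be baked into
# client configuration such as a bootstrap server list.
bootstrap_servers = ",".join(f"{host}:9092" for host in hosts)
print(bootstrap_servers)
```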

These guarantees cover a ton of the operations needed to run a stateful workload. In particular, they almost completely handle the availability portion. Given that EBS uptime and redundancy guarantees are extremely strong, the StatefulSet’s rescheduling automation almost trivially guarantees you a high-availability service. However, some caveats do apply (e.g., you need room in your cluster and must not botch the AZ setup).

Kubernetes has a ton of promise in this area, and in theory, could certainly evolve into a platform to easily run stateful workloads alongside the stateless ones most developers use it for.

What’s Missing From the Kubernetes StatefulSet?

So why do we think StatefulSets are broken? Well, if you run through the operational needs of a stateful workload in your head, there’s one key component that you might notice is missing:

What do you do when you need to resize the underlying disk?

The dataset behind a typical database grows at a pretty constant positive rate. Unless you support horizontal scaling and partitioning, you’ll need to add headroom to the disk as that dataset grows. This is where Kubernetes falls flat on its face.
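
A quick back-of-the-envelope calculation shows why this comes up sooner than people expect. The sketch below assumes linear growth and a hypothetical 80% resize threshold; the numbers are invented for illustration.

```python
def days_until_resize(capacity_gib, used_gib, growth_gib_per_day, threshold_pct=80):
    """Whole days until usage reaches threshold_pct of the disk, assuming linear growth."""
    if growth_gib_per_day <= 0:
        raise ValueError("growth_gib_per_day must be positive")
    threshold_gib = capacity_gib * threshold_pct / 100
    remaining_gib = threshold_gib - used_gib
    if remaining_gib <= 0:
        return 0  # already at or past the threshold
    return int(remaining_gib // growth_gib_per_day)


# Hypothetical volume: 100 GiB disk, 50 GiB used, growing 2 GiB per day.
print(days_until_resize(capacity_gib=100, used_gib=50, growth_gib_per_day=2))
```

With those made-up numbers, the volume crosses the threshold in about two weeks, so resizing is a routine operation rather than a rare one.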

Currently, the StatefulSet controller has no built-in support for volume resizing. This is despite the fact that almost all CSI implementations have native support for volume resizing the controller could hook into. There is a workaround, but it’s almost ludicrously roundabout:

Delete the StatefulSet while orphaning pods to avoid downtime with: kubectl delete sts <name> --cascade=orphan
Manually edit the persistent volume claim (PVC) for each pod to request the new storage size
Manually edit the StatefulSet volume claim template with the new storage size and add a dummy pod annotation to force a rolling update
Recreate the StatefulSet with that new spec, which allows the controller to reclaim control of the orphaned pods and begin the rolling update that triggers the CSI driver to apply the volume resize
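
The steps above can be sketched as a small dry-run planner that prints the kubectl commands and produces the updated manifest, without touching a cluster. This is not the Plural operator's implementation. It assumes the size is changed on each pod's PersistentVolumeClaim (named <claim template>-<statefulset>-<ordinal>), that the StorageClass has allowVolumeExpansion enabled, and it uses a made-up annotation key and output file name.

```python
import copy
import json


def plan_resize(sts, new_size, namespace="default"):
    """Return (kubectl commands, updated manifest) for the orphan-and-recreate resize."""
    name = sts["metadata"]["name"]
    replicas = sts["spec"].get("replicas", 1)

    # Step 1: delete the StatefulSet object but leave its pods running.
    commands = [f"kubectl delete sts {name} -n {namespace} --cascade=orphan"]

    # Step 2: request the new size on every pod's PersistentVolumeClaim.
    patch = json.dumps({"spec": {"resources": {"requests": {"storage": new_size}}}})
    for template in sts["spec"]["volumeClaimTemplates"]:
        claim = template["metadata"]["name"]
        for ordinal in range(replicas):
            pvc = f"{claim}-{name}-{ordinal}"
            commands.append(f"kubectl patch pvc {pvc} -n {namespace} -p '{patch}'")

    # Step 3: update the claim template and add a dummy pod annotation so the
    # recreated StatefulSet performs a rolling update.
    updated = copy.deepcopy(sts)
    for template in updated["spec"]["volumeClaimTemplates"]:
        template["spec"]["resources"]["requests"]["storage"] = new_size
    pod_meta = updated["spec"].setdefault("template", {}).setdefault("metadata", {})
    # Hypothetical annotation key; any changed pod annotation forces the rollout.
    pod_meta.setdefault("annotations", {})["example.com/resized-to"] = new_size

    # Step 4: recreate the StatefulSet from the updated manifest (hypothetical file name).
    commands.append(f"kubectl apply -n {namespace} -f updated-statefulset.yaml")
    return commands, updated


# Hypothetical two-replica StatefulSet, trimmed to the fields the planner reads.
example = {
    "metadata": {"name": "db"},
    "spec": {
        "replicas": 2,
        "template": {"metadata": {"labels": {"app": "db"}}},
        "volumeClaimTemplates": [
            {"metadata": {"name": "data"},
             "spec": {"resources": {"requests": {"storage": "10Gi"}}}},
        ],
    },
}

commands, updated = plan_resize(example, "20Gi")
for command in commands:
    print(command)
```

Even as a dry run, the plan needs one patch per replica per claim template, which is why doing this by hand during an incident is so error-prone.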

We actually automated this entire process as part of the Plural operator. We knew we’d need to build storage resize automation so that stateful applications running with Plural could be operated by non-Kubernetes experts. In reality it’s a nontrivial amount of logic, and if someone were asked to do it by hand in a high-pressure scenario, the chances of failure would be incredibly high.

Okay, so there’s a pretty noteworthy flaw in Kubernetes StatefulSets, but there is a workaround even if it’s somewhat janky.

That shouldn’t be too bad, right?

But it gets worse!

The situation gets downright painful when you realize the impact of this limitation and that a lot of the Kubernetes operators have been built to manage stateful workloads.

A pretty good example is the Prometheus operator, which is a great project for both provisioning Prometheus databases and allowing a CRD-based workflow for configuring metrics, scrapers, and alerts.

The problem arises because the built-in controller for the operator has no logic to manage StatefulSet resize, but it does have the logic to recreate its underlying StatefulSet if it sees an event that triggered its deletion. This means that you effectively have no way to use the above workaround, since the moment you do a cascade orphan delete, the operator will recreate the StatefulSet against the old spec and prevent proper resize. The only solution is to delete the entire CRD or find a tweak that can fool the operator into not reconciling the object (sometimes scale to zero will do this).
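
A toy model makes the race easier to see. It is not the Prometheus operator's code, only a sketch of a generic reconcile loop that rebuilds the StatefulSet from the custom resource whenever the two differ, with plain dictionaries standing in for cluster state.

```python
def reconcile(cluster, custom_resource):
    """Make the cluster's StatefulSet match what the custom resource describes."""
    desired = {"storage": custom_resource["storage"]}
    if cluster.get("statefulset") != desired:
        # Missing or drifted: the operator recreates it from the custom resource.
        cluster["statefulset"] = dict(desired)
    return cluster


# The custom resource still describes the old 10Gi volume.
custom_resource = {"storage": "10Gi"}
cluster = {"statefulset": {"storage": "10Gi"}}

# Step 1 of the manual workaround: orphan-delete the StatefulSet.
del cluster["statefulset"]

# The operator reacts to the deletion event before a human can recreate the
# StatefulSet with the larger size, so the old spec comes straight back.
reconcile(cluster, custom_resource)
print(cluster["statefulset"])
```

In this model the only way to win is to stop the reconcile loop from running, which matches the options described above.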

Regardless, as a result of this flaw, there is effectively no way to resize a Prometheus instance with the operator without either significant downtime or data loss. Considering how robust the automation in StatefulSets is in all other cases, it’s pretty shocking that this is still a potential failure mode.

Our Head of Community, Abhi, also hit this interplay between operators and StatefulSet volume resizes while implementing volume resizing in the open-source Vitess operator.

“Considering the natural complexity of a Vitess deployment, you can infer that disk resizing is proportionally complicated. Vitess is a database sharding system that sits on top of MySQL, meaning that volume resizing had to be both partitioning-aware and shard-aware. We had to manually write our own shard-safe rolling restarts, create a cascade condition that worked with the parent-child structure of Vitess custom resources, and address every conceivable failure condition to prevent downtime. Shoutout to notable Kubernetes contributor enisoc for designing this feature.”

Other widely used and notable database operators, like Zalando's Postgres operator, effectively reimplement the same procedure we implemented in the Plural operator in their own codebase. This causes a ton of wasted developer cycles on a problem that should only have to be fixed once.

The Potential of Kubernetes

In general, we are extremely bullish on the potential for Kubernetes to make the operations of virtually any workload almost trivial, and a huge part of our mission at Plural is to make that a possibility.

That said, we also need to be clear-eyed about gaps that still remain in the Kubernetes ecosystem, so we can either work around them or close them upstream. I think it’s pretty clear this is a significant gap, and if prioritized, this could be fixed pretty easily in a future release of Kubernetes.

If you thought this was interesting, check out what we’re doing with Kubernetes here. Thanks for reading!

Forem: Michael Guarino

Understanding Deprecated Kubernetes APIs and Their Significance

Deprecating and Removing Kubernetes APIs

Why the Concern about Deprecated APIs?

Where and How Kubernetes APIs are Utilized

Challenges in Identifying Deprecated APIs in Your Cluster

What is Plural CD?

Plural CD to detect deprecated Kubernetes APIs

What you need to know about Self-Hosting Large Language Models (LLMs)

Before you decide to self-host

Why Self-Host LLMs?

Popular Solutions to host LLMs

OpenLLM via Yatai

Features that stand out

Why run Yatai on Plural

Ray Serve Via Ray Cluster

Features that stand out

Why run Ray on Plural

Hugging Face’s TGI

Features that stand out

Why run Hugging Face LLM on Plural

Building a LLM stack to self-host

Plural to self-host LLMs

Data Engineering Glossary

What is data engineering?

Glossary of Data Engineering terms

Big Data

Business Intelligence

Data Analyst

Data Architecture

Data Compliance

Data Exploration

Data Enrichment

Data Governance

Data Ingestion

Data Integration

Data Lake

Data Lakehouse

Data Mart

Data Mesh

Data Mining

Data Observability

Data Orchestration

Data Modeling

Data Pipeline

Data Preparation

Data Science

Data Quality

Data Source

Data Stack

Data Warehouse

Data Wrangling

Deduplication

ELT

ETL

Machine Learning

Reverse ETL

Plural for Data Engineers

Architecture Review: Dagster vs. Airflow

What is Data Orchestration?

What is Airflow?

What is Dagster?

Airflow is a legacy tool, but it is here to stay.

Why Dagster is better than Airflow

Wrapping Up

Open-Core Companies Are Not Incentivized To Make Their Projects Good

Cannibalization

Community is a nontrivial endeavor

A product manager’s nightmare

Counterexamples

Fixing the incentive structure

Open-source support agreements are healthy

Your hosted SaaS does not need to be open-core

7 Kubernetes Best Practices

Upgrade your Kubernetes version

Use namespaces

Set up role-based access control (RBAC)

Organize your cluster with labels

Use a Git-based workflow

Set up automated monitoring