<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Brian Fletcher</title>
    <description>The latest articles on Forem by Brian Fletcher (@brian_fletcher_bdacb7417b).</description>
    <link>https://forem.com/brian_fletcher_bdacb7417b</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2051734%2Ff941be91-beb3-49d4-a918-f624e0abad9e.png</url>
      <title>Forem: Brian Fletcher</title>
      <link>https://forem.com/brian_fletcher_bdacb7417b</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/brian_fletcher_bdacb7417b"/>
    <language>en</language>
    <item>
      <title>Improving Backstage performance (by up to 48x)</title>
      <dc:creator>Brian Fletcher</dc:creator>
      <pubDate>Fri, 04 Oct 2024 07:57:19 +0000</pubDate>
      <link>https://forem.com/roadie/improving-backstage-performance-by-up-to-48x-12d9</link>
      <guid>https://forem.com/roadie/improving-backstage-performance-by-up-to-48x-12d9</guid>
      <description>&lt;p&gt;Backstage is an excellent framework for building an internal developer portal. It provides all of the fundamental building blocks to improve developer experience in an organization.&lt;/p&gt;

&lt;p&gt;Core to Backstage is its Catalog of entities. The Catalog provides a database of software components, resources, libraries, and other kinds of software items. It provides client code and an API backend to retrieve items in the catalog, along with the software interfaces required to populate the entity catalog. Its model is flexible, customizable, and powerful.&lt;/p&gt;

&lt;p&gt;However, with great power comes great responsibility. Without experience, it's easy to develop anti-patterns in Backstage catalog usage. These anti-patterns can then turn into major performance issues at scale, which in turn erode trust in, and usage of, the product as a whole.&lt;/p&gt;

&lt;p&gt;At Roadie, we provide an out-of-the-box version of Backstage for our customers. We have come across many of the ways in which non-optimal Catalog client usage can affect the performance of the application as a whole. We have seen these performance issues result in lagging page loads and, in extreme cases, failed page loads in Backstage.&lt;/p&gt;

&lt;p&gt;By applying the patterns explained in this post, you could see a huge improvement in Catalog response time. In some cases, you may even see Catalog queries perform 48x faster!&lt;/p&gt;

&lt;h1&gt;
  
  
  Architecture of the Entity catalog
&lt;/h1&gt;

&lt;p&gt;The entity catalog in Backstage is made up of three components: a Catalog client, the Catalog backend, and the Catalog database. When Backstage starts up for the first time, it will have a Catalog database and a Catalog backend. When you visit the Backstage application in your browser, it will make use of the Catalog client to retrieve data from the Catalog backend. The Catalog backend in turn retrieves the requested catalog items from the Catalog database.&lt;/p&gt;

&lt;h1&gt;
  
  
  Using the Backstage Catalog Client
&lt;/h1&gt;

&lt;p&gt;Soon after deploying Backstage in an organization, users will want to customize it.&lt;/p&gt;

&lt;p&gt;Frequent customizations we come across include loading entities into the Catalog from an in-house platform or visualizing data from an internal system in the Backstage UI. Customization is normal and is a sign that Backstage is adding value for teams.&lt;/p&gt;

&lt;p&gt;When developers write extensions to Backstage, it's likely they will come across the need to interact with the Catalog. There are two ways they can do this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;via a Frontend Backstage extension&lt;/li&gt;
&lt;li&gt;via a Backend Backstage extension&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To make use of the Catalog client in a frontend Backstage extension, you are likely to be using the &lt;code&gt;useApi&lt;/code&gt; hook, along with the &lt;code&gt;useAsync&lt;/code&gt; hook.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;catalogApiRef&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@backstage/plugin-catalog-react&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;useApi&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@backstage/core-plugin-api&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;stringifyEntityRef&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@backstage/catalog-model&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;useAsync&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;react-use/lib/useAsync&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;CustomReactComponent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;catalogApi&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;catalogApiRef&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;entities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;catalogApi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getEntities&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;catalogApi&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&amp;lt;&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stringifyEntityRef&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;/&amp;gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a backend Backstage extension, you are likely to be constructing the Catalog client using the discovery client. The discovery client is a helper that allows plugins to discover the API location of other plugins.&lt;/p&gt;

&lt;p&gt;Generally, if you are writing a backend plugin, like a new REST API or a Catalog processor, you will have access to the discovery client. Depending on your particular situation, you may have access to the discovery client in a different way.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CatalogClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@backstage/catalog-client&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DiscoveryApi&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@backstage/core-plugin-api&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;getAllEntities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;discovery&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DiscoveryApi&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;catalogApi&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CatalogClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;discoveryApi&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;discovery&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;catalogApi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getEntities&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will notice that once you have an instance of the CatalogApi, it is used in the same way in either the frontend or a backend extension.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;catalogClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getEntities&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check &lt;a href="https://backstage.io/docs/reference/catalog-client.catalogapi/" rel="noopener noreferrer"&gt;the Backstage docs&lt;/a&gt; for a more comprehensive explanation of the full Catalog interface.&lt;/p&gt;

&lt;h1&gt;
  
  
  How big can a Backstage Catalog get?
&lt;/h1&gt;

&lt;p&gt;When thinking about Catalog size, it’s useful to think about two things: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How big is each individual entity?&lt;/li&gt;
&lt;li&gt;How many entities do you have?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  A typical Entity
&lt;/h3&gt;

&lt;p&gt;A typical Backstage Entity looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backstage.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Component&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;artist-web&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;The place to be, for great artists&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;example.com/custom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;custom_label_value&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;example.com/service-discovery&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;artistweb&lt;/span&gt;
    &lt;span class="na"&gt;circleci.com/project-slug&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github/example-org/artist-website&lt;/span&gt;
  &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;java&lt;/span&gt;
  &lt;span class="na"&gt;links&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://admin.example-org.com&lt;/span&gt;
      &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Admin Dashboard&lt;/span&gt;
      &lt;span class="na"&gt;icon&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dashboard&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;admin-dashboard&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;website&lt;/span&gt;
  &lt;span class="na"&gt;lifecycle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
  &lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;artist-relations-team&lt;/span&gt;
  &lt;span class="na"&gt;system&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public-websites&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It describes a website called &lt;code&gt;artist-web&lt;/code&gt;. It has a few basic Backstage entity properties, and some annotations, tags, and links. It's encoded in YAML here, but in Backstage's database, it is stored as plain text in JSON format.&lt;/p&gt;

&lt;p&gt;Uncompressed, this entity definition is about half a kilobyte. Therefore, a Catalog containing about 20,000 similarly sized entities would add up to about 10 megabytes of data uncompressed. That's a pretty big chunk of data to be sending over the wire.&lt;/p&gt;

&lt;p&gt;However, we haven't seen anything yet…&lt;/p&gt;

&lt;p&gt;The Backstage Catalog model defines an API Kind. These are used to document the endpoints that services make available in Backstage. API entities often contain an embedded OpenAPI doc.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backstage.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;API&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;artist-api&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Retrieve artist details&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openapi&lt;/span&gt;
  &lt;span class="na"&gt;lifecycle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
  &lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;artist-relations-team&lt;/span&gt;
  &lt;span class="na"&gt;system&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;artist-engagement-portal&lt;/span&gt;
  &lt;span class="c1"&gt;# The embedded OpenAPI spec is in the definition&lt;/span&gt;
  &lt;span class="na"&gt;definition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;openapi: "3.0.0"&lt;/span&gt;
    &lt;span class="s"&gt;info:&lt;/span&gt;
      &lt;span class="s"&gt;version: 1.0.0&lt;/span&gt;
      &lt;span class="s"&gt;title: Artist API&lt;/span&gt;
      &lt;span class="s"&gt;license:&lt;/span&gt;
        &lt;span class="s"&gt;name: MIT&lt;/span&gt;
    &lt;span class="s"&gt;servers:&lt;/span&gt;
      &lt;span class="s"&gt;- url: http://artist.spotify.net/v1&lt;/span&gt;
    &lt;span class="s"&gt;paths:&lt;/span&gt;
      &lt;span class="s"&gt;/artists:&lt;/span&gt;
        &lt;span class="s"&gt;get:&lt;/span&gt;
          &lt;span class="s"&gt;summary: List all artists&lt;/span&gt;
    &lt;span class="s"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At Roadie, we have seen multiple customers with API kind entities in their Catalog with embedded OpenAPI docs as large as 1 megabyte in size. It's easy for even a small-sized engineering organization to have 50 such APIs documented in Backstage. Unoptimized, that's 50MB+ of data that's being transferred every time we query the full Catalog.&lt;/p&gt;

&lt;p&gt;Catalog size matters because careless use of the Catalog APIs can trigger huge database queries and API responses. The result is unnecessary network traffic and time wasted transferring, encoding, and decoding that data.&lt;/p&gt;
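&lt;p&gt;As a rough sketch of the arithmetic above, the estimate below multiplies entity counts by average entity sizes, using the figures quoted in this section. The helper name, the &lt;code&gt;EntityGroup&lt;/code&gt; shape, and the decimal definition of a megabyte (1,000,000 bytes) are assumptions made for illustration; none of this is part of Backstage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;// Hypothetical helper for back-of-the-envelope payload estimates.
interface EntityGroup {
  count: number;        // number of entities in the group
  averageBytes: number; // assumed average uncompressed size per entity
}

// Sums count * averageBytes over all groups and converts to megabytes,
// assuming 1 MB = 1,000,000 bytes.
function estimatePayloadMegabytes(groups: EntityGroup[]): number {
  let totalBytes = 0;
  for (const group of groups) {
    totalBytes += group.count * group.averageBytes;
  }
  return totalBytes / 1_000_000;
}

// Figures taken from the text: 20,000 entities of about half a kilobyte,
// plus 50 API entities with embedded OpenAPI docs of about 1 megabyte each.
const typicalEntities: EntityGroup = { count: 20_000, averageBytes: 500 };
const largeApiEntities: EntityGroup = { count: 50, averageBytes: 1_000_000 };

console.log(estimatePayloadMegabytes([typicalEntities])); // 10
console.log(estimatePayloadMegabytes([typicalEntities, largeApiEntities])); // 60
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Even with these rounded inputs, a handful of large API entities accounts for most of the estimated payload.&lt;/p&gt;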

&lt;h1&gt;
  
  
  How to Make Good Use of the Entity Catalog
&lt;/h1&gt;

&lt;p&gt;With all this said, we wanted to run through some good practices that are going to help with improving the Backstage experience. The examples we use in the tables below are measured in the browser on a production Catalog that has 14k entities.&lt;/p&gt;

&lt;p&gt;Unoptimized, retrieving the full Catalog takes 2.16 seconds and transfers 59.5 MB of data. That's our starting point; each technique below improves on those numbers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Only query the entity fields that you need
&lt;/h2&gt;

&lt;p&gt;By default, when retrieving entities from the Backstage Catalog, Backstage will return the whole entity for each item listed. As mentioned above, an entity might be as large as 1 megabyte. As such, limiting fields requested to the ones that are strictly required can help a lot. For example, you might have code like the following that is requesting every entity in the Catalog and then converting the result into an array of entity references.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;catalogClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getEntities&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stringifyEntityRef&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you look under the hood, you'll find that the &lt;code&gt;stringifyEntityRef&lt;/code&gt; function only makes use of the kind, name, and namespace. As such, we can cut down on the amount of data transferred across the network by limiting the fields that are requested.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;catalogClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getEntities&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;kind&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;metadata.name&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;metadata.namespace&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;})).&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stringifyEntityRef&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Operation&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Time&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Size&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;catalogClient.getEntities()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2.16 seconds&lt;/td&gt;
&lt;td&gt;59.5 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;catalogClient.getEntities({ fields: ['kind', 'metadata.name', 'metadata.namespace'] })&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;767 milliseconds&lt;/td&gt;
&lt;td&gt;1.2 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
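&lt;p&gt;To see why those three fields are enough, here is a simplified stand-in for &lt;code&gt;stringifyEntityRef&lt;/code&gt;. It is not the library's implementation. It assumes the usual &lt;code&gt;kind:namespace/name&lt;/code&gt; reference format, lowercased, with a missing namespace falling back to &lt;code&gt;default&lt;/code&gt;. The &lt;code&gt;MinimalEntity&lt;/code&gt; type and &lt;code&gt;toEntityRef&lt;/code&gt; name are made up for this sketch.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;// Hypothetical type: only the fields requested via the fields option.
interface MinimalEntity {
  kind: string;
  metadata: { name: string; namespace?: string };
}

// Simplified stand-in for stringifyEntityRef, for illustration only.
function toEntityRef(entity: MinimalEntity): string {
  const namespace = entity.metadata.namespace ?? 'default';
  return `${entity.kind}:${namespace}/${entity.metadata.name}`.toLowerCase();
}

// An entity trimmed down to the requested fields still yields a full reference.
const trimmed: MinimalEntity = { kind: 'Component', metadata: { name: 'artist-web' } };
console.log(toEntityRef(trimmed)); // component:default/artist-web
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;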

&lt;h2&gt;
  
  
  Make use of the filter option to retrieve only entities that you need
&lt;/h2&gt;

&lt;p&gt;We have seen a pattern develop whereby the Catalog client is used to retrieve all of the entities in the Catalog, and then the list is filtered client side.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;catalogClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getEntities&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;kind&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Group&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is more efficient to send a filter to the catalog client so that filtering is done either in the Backstage backend or the Backstage database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;catalogClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getEntities&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Group&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;})).&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Operation&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Time&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Size&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;catalogClient.getEntities()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2.16 seconds&lt;/td&gt;
&lt;td&gt;59.5 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;catalogClient.getEntities({ filter: { kind: ['Component'] } })&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;69 milliseconds&lt;/td&gt;
&lt;td&gt;2.5 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
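&lt;p&gt;The &lt;code&gt;filter&lt;/code&gt; and &lt;code&gt;fields&lt;/code&gt; options can also be sent in the same request. The sketch below only builds such a request object. The &lt;code&gt;EntityRefRequest&lt;/code&gt; type and the &lt;code&gt;entityRefRequest&lt;/code&gt; helper are hypothetical names, and we did not measure this combination in the tables in this post.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;// Hypothetical request shape covering only the two options used in this post.
interface EntityRefRequest {
  filter: { kind: string[] };
  fields: string[];
}

// Builds one request that filters by kind on the server side and asks for
// only the fields needed to build an entity reference.
function entityRefRequest(kinds: string[]): EntityRefRequest {
  return {
    filter: { kind: kinds },
    fields: ['kind', 'metadata.name', 'metadata.namespace'],
  };
}

const groupRequest = entityRefRequest(['Group']);
console.log(JSON.stringify(groupRequest));

// Intended use, with a real Catalog client in scope:
// (await catalogClient.getEntities(groupRequest)).items.map(stringifyEntityRef);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;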

&lt;h2&gt;
  
  
  Avoid retrieving all entities in order to count entities
&lt;/h2&gt;

&lt;p&gt;A common pattern we see in Backstage is for developers to download the whole contents of the Catalog in order to count the entities in that Catalog. The following code will cause Backstage to query the database for every entity, then the client will need to decode the JSON it retrieves in order to count the number of entities.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;catalogApi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getEntities&lt;/span&gt;&lt;span class="p"&gt;({})).&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is a far more performant way to do this, using the query API. The following request asks for at most 1 entity and asks for only the &lt;code&gt;uid&lt;/code&gt; field of that entity. The query API always returns the total count of entities matching the query, so it gives us what we need without downloading the whole Catalog to the client.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;catalogClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;queryEntities&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;metadata.uid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})).&lt;/span&gt;&lt;span class="nx"&gt;totalItems&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The change suggested here is going to save work for the Backstage database, the Backstage backend, and the Backstage frontend.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Operation&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Time&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Size&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;catalogClient.getEntities()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2.16 seconds&lt;/td&gt;
&lt;td&gt;59.5 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;catalogClient.queryEntities({ fields: ['metadata.uid'], limit: 1 })&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;45 milliseconds&lt;/td&gt;
&lt;td&gt;0.5 KB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Enable Gzip Compression
&lt;/h2&gt;

&lt;p&gt;When not using the Catalog Client, we recommend using &lt;code&gt;gzip&lt;/code&gt; encoding to reduce the amount of data transferred. This is crucial because responses to direct requests against the Backstage APIs can be massive. Enabling compression significantly decreases the data volume sent to the client. You can achieve this by including the &lt;code&gt;Accept-Encoding: gzip&lt;/code&gt; header with your client requests.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Operation&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Size&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;curl https://backstage-server/api/catalog/entities&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;59.5 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;curl https://backstage-server/api/catalog/entities -H 'Accept-Encoding: gzip'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;6.7 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
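&lt;p&gt;Entity JSON tends to compress well because the same keys and values repeat across entities. The sketch below illustrates this locally with Node's built-in &lt;code&gt;zlib&lt;/code&gt; module on synthetic, entity-like records. The records are invented for the example and are not the production Catalog measured above, so the sizes it prints will differ from the table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;import { gzipSync } from 'node:zlib';

// Synthetic, entity-like records (an assumption for illustration only).
const entities = Array.from({ length: 1000 }, function (_unused, index) {
  return {
    apiVersion: 'backstage.io/v1alpha1',
    kind: 'Component',
    metadata: { name: `service-${index}`, tags: ['java'] },
    spec: { type: 'website', lifecycle: 'production', owner: 'artist-relations-team' },
  };
});

// Compare the size of the JSON payload before and after gzip compression.
const json = JSON.stringify(entities);
const rawBytes = Buffer.byteLength(json);
const gzippedBytes = gzipSync(json).length;

console.log(`raw: ${rawBytes} bytes, gzipped: ${gzippedBytes} bytes`);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;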

&lt;h2&gt;
  
  
  Keep Backstage up to date
&lt;/h2&gt;

&lt;p&gt;Finally, and perhaps most importantly, keep Backstage up to date. If you are using Roadie, you are already using a very recent version of Backstage. However, if you are managing Backstage yourself, you may have fallen behind. Backstage releases new versions at least once a month, and these versions often contain very valuable performance improvements to the Catalog.&lt;/p&gt;

&lt;p&gt;For example, version 1.6.7 of the Catalog client library included an optimization. Previously, the Catalog client would sort all entities before returning them to the caller. This is a helpful convenience until there are thousands of entities to sort, and callers often do not need a sorted list at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  Collaborate on the OSS Backstage core project
&lt;/h2&gt;

&lt;p&gt;As part of research for this document, we spoke with the core maintainers of Backstage, and there are some great ideas about how to continue to improve the performance. For example, it has been discussed that by default, the getEntities function should be replaced by an iterator object. That iterator would be used to page over the list of entities rather than retrieving the whole list.&lt;/p&gt;

&lt;p&gt;As such, keeping up to date with Backstage releases will allow you to benefit from these performance improvements.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This article illustrates some of the performance gains that can be achieved, and your mileage may vary. We have not delved into the implications of these changes for backend memory and database query performance. However, we can say that these changes can greatly improve those too. It's difficult to quantify, but at Roadie we were seeing huge memory spikes and large garbage collections in Backstage whenever the whole Catalog was queried. This is possibly due to the physical sizes of the entity Catalogs and the serialization and deserialization that occurs between the client, backend, and database.&lt;/p&gt;

&lt;p&gt;We have shown that making use of some good patterns can result in much improved load times for users. We have shown some examples where timings are reduced from multiple seconds to sub-second. We have also shown that the sizes sent across the wire can be greatly reduced from multiple megabytes to tens of kilobytes.&lt;/p&gt;

&lt;p&gt;A well-managed and optimized internal developer portal can make your software engineers more efficient and empower them with the information they need. When load times are reduced from multiple seconds to under one second, developers enjoy a fast, responsive experience that means they’re more likely to use Backstage and find what they need.&lt;/p&gt;

</description>
      <category>backstage</category>
      <category>developerportal</category>
      <category>platformengineering</category>
      <category>performance</category>
    </item>
    <item>
      <title>Scaling Backstage</title>
      <dc:creator>Brian Fletcher</dc:creator>
      <pubDate>Fri, 27 Sep 2024 09:00:29 +0000</pubDate>
      <link>https://forem.com/roadie/scaling-backstage-2ka3</link>
      <guid>https://forem.com/roadie/scaling-backstage-2ka3</guid>
      <description>&lt;p&gt;There are multiple challenges that arise when the volume of data in the Backstage grows to 1,000s and 10,000s of entities, ranging from performance to ease of use. We’ll explore these in this article as well as suggesting possible ways around them for your own Backstage deployments.&lt;/p&gt;

&lt;p&gt;The Backstage developer portal is an excellent tool for platform teams, as well as engineers, to keep a handle on their software, &lt;a href="https://roadie.io/blog/backstage-gets-quality-and-compliance-scorecards-with-roadie/" rel="noopener noreferrer"&gt;maintain compliance statuses&lt;/a&gt; and &lt;a href="https://roadie.io/docs/scaffolder/writing-templates/" rel="noopener noreferrer"&gt;spin up new services&lt;/a&gt;. Unfortunately, open source Backstage is known for being difficult to set up and cumbersome to maintain.&lt;/p&gt;

&lt;p&gt;This pain is often made worse when the catalog within a Backstage instance gets large. Below are a few high-level pointers that we have come across during our journey to support bigger engineering organizations. We’ve scaled our installation to support organizations with multiple tens of thousands of entities over the last year. We’ll dig deeper into individual topics, based on interest, in further articles.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;Let us know about your optimization problems or questions in either Roadie or Backstage Discord channels!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Handling the Catalog data
&lt;/h2&gt;

&lt;p&gt;We have written more extensively about catalog performance and how to improve that &lt;a href="https://roadie.io/blog/improving-backstage-performance/" rel="noopener noreferrer"&gt;in a separate blog post.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When developing on top of Backstage, you are always building on the foundation of solid catalog data. This usually makes the CatalogAPI the most used API on both the backend and the frontend of the application. It may be that the entities in the system grow large (looking at you, API specs) or that there is just a large quantity of them (looking at you, &lt;a href="https://roadie.io/docs/integrations/aws-resources/" rel="noopener noreferrer"&gt;automatically ingested AWS resources&lt;/a&gt;). It is therefore important to retrieve only the fields that are actually displayed to the user and to limit the number of entities being fetched.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://backstage.io/docs/reference/catalog-client.catalogclient/" rel="noopener noreferrer"&gt;default CatalogClient&lt;/a&gt; has the option to retrieve only relevant &lt;em&gt;fields&lt;/em&gt; through the API. Do use it. Also make sure to use pagination where possible and hit the correct endpoints with your catalog client. Retrieving less data is always going to be cheaper than retrieving more data. It makes a big difference if you can avoid JSON parsing and stringifying the biggest API docs in the world multiple times along with the rest of the catalog data, or running inefficient string manipulations just for sorting purposes.&lt;/p&gt;
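
&lt;p&gt;As a rough illustration of asking for less data, the sketch below builds a catalog request URL that selects a couple of fields and caps the page size. The helper name &lt;code&gt;buildEntitiesUrl&lt;/code&gt; and the &lt;code&gt;https://backstage-server&lt;/code&gt; host are placeholders, and the &lt;code&gt;filter&lt;/code&gt;, &lt;code&gt;fields&lt;/code&gt; and &lt;code&gt;limit&lt;/code&gt; query parameters reflect our understanding of the catalog REST API, so check them against the Backstage version you run.&lt;/p&gt;

```typescript
// Hypothetical helper: builds a catalog request URL that asks for a subset
// of fields and a bounded number of entities. The query parameter names are
// an assumption to verify against your Backstage version.
function buildEntitiesUrl(
  baseUrl: string,
  kind: string,
  fields: string[],
  limit: number,
): string {
  const url = new URL('/api/catalog/entities', baseUrl);
  url.searchParams.set('filter', `kind=${kind}`);
  url.searchParams.set('fields', fields.join(','));
  url.searchParams.set('limit', String(limit));
  return url.toString();
}

// Usage: only the name and owner of at most 50 components are requested.
const exampleUrl = buildEntitiesUrl(
  'https://backstage-server',
  'component',
  ['metadata.name', 'spec.owner'],
  50,
);
console.log(exampleUrl);
```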

&lt;p&gt;In some cases we at Roadie have needed to introduce our own endpoints and catalog queries to improve performance in larger catalogs. These endpoints can be as simple as pointed subset queries run directly against the database table to identify only the needed entities, or they can return partial entity data in preformatted response shapes. Being able to run use-case specific queries for relevant data, and to take advantage of the better performance usually available in the database layer, makes a big difference at times.&lt;/p&gt;

&lt;h2&gt;
  
  
  Processors vs. Providers
&lt;/h2&gt;

&lt;p&gt;In the early days of Backstage, the approach to ingesting entities into the catalog was to use &lt;em&gt;Processors&lt;/em&gt; to retrieve data from third party sources. This is still a remnant within the product, and is (unfortunately) still used by some integrations. The main purpose of processors nowadays is to enhance the entities, but the same caveats on their usage still apply.&lt;/p&gt;

&lt;p&gt;The CNCF maintainers of the project introduced &lt;em&gt;Providers&lt;/em&gt; to Backstage at a later stage. These providers allow more maneuverability to schedule and modify the payloads that you are sending to the catalog. &lt;a href="https://backstage.io/blog/2023/01/31/incremental-entity-provider/" rel="noopener noreferrer"&gt;Being able to chunk the ingested entities into smaller buckets&lt;/a&gt;, being able to schedule the intervals with more (or less) granularity, and having better visibility into the internals of the catalog are big benefits when tweaking catalog ingestion to work optimally.&lt;/p&gt;

&lt;p&gt;In many cases the problem may still remain, though. Providers may need to &lt;em&gt;emit&lt;/em&gt; locations or other intermediate data before it is finally stored in the system as a full entity. In those cases the entity may need to go through the processing pipeline again.&lt;/p&gt;

&lt;p&gt;When you encounter issues that may be related to this approach, make sure that your processors are nimble beasts and are definitely &lt;strong&gt;not&lt;/strong&gt; blocking the event loop. Even a few milliseconds make a difference here, because the system is both processing a lot of data and doing it multiple times. User experience may also suffer when processing times get large or when entities that users expect to see immediately are stuck behind a large processing queue.&lt;/p&gt;
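
&lt;p&gt;One common way to keep long-running work from blocking the Node.js event loop is to process items in small chunks and yield between them. The sketch below shows the general technique only; it is not how Backstage processors are implemented, and &lt;code&gt;yieldToEventLoop&lt;/code&gt; and &lt;code&gt;processInChunks&lt;/code&gt; are hypothetical helper names.&lt;/p&gt;

```typescript
// Resolve on a later turn of the event loop so that pending requests and
// timers get a chance to run between chunks.
function yieldToEventLoop() {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Hypothetical helper: applies `work` to every item, pausing after each chunk.
async function processInChunks(
  items: string[],
  chunkSize: number,
  work: (item: string) => string,
) {
  if (!(chunkSize > 0)) {
    throw new Error('chunkSize must be a positive number');
  }
  const results: string[] = [];
  for (let start = 0; items.length > start; start += chunkSize) {
    for (const item of items.slice(start, start + chunkSize)) {
      results.push(work(item));
    }
    // Hand control back before starting the next chunk.
    await yieldToEventLoop();
  }
  return results;
}

// Usage: upper-case five names, two at a time.
processInChunks(['a', 'b', 'c', 'd', 'e'], 2, (name) => name.toUpperCase())
  .then((processed) => console.log(processed.join(',')));
```

&lt;p&gt;Smaller chunks keep the application more responsive at the cost of a longer total processing time, so the chunk size is a tuning knob rather than a fixed value.&lt;/p&gt;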

&lt;p&gt;At Roadie we have taken an even more performant alternative approach for a few specific entities that we natively support and ingest. We want more fine-grained control to serve our larger users better, so we have created an alternative, self-contained processing module to handle the processing for some specific use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scaffolder
&lt;/h2&gt;

&lt;p&gt;By default the Scaffolder within a Backstage project runs in the same process as the rest of the application. This is by design, but there is an escape hatch that can be used to externalize this from the codebase. Backstage is built as a modular monolith and in theory has the possibility to be spread out into multiple services. &lt;/p&gt;

&lt;p&gt;There is a fair amount of work involved in achieving that, but the payoff is usually there. The decision to make here is which tradeoffs your company is willing to accept. Is it OK that only a single scaffolder run is manageable at one time? Does it matter if the rest of the Backstage application begins to show signs of slowness when other processes are running?&lt;/p&gt;

&lt;p&gt;The Scaffolder, and larger Tech Insights installations, take a lot of CPU cycles from the underlying hardware which may negatively interfere with the user experience. Blocking the event loop is the &lt;em&gt;big no-no&lt;/em&gt; in the Node.js world when it comes to performance. If you are running heavy tasks within the same process that you are using to serve your users, you may encounter bad times. If bad times appear, consider externalizing some of the chunkier pieces of your instance. These may include the Scaffolder, Tech Insights, Search indexing, Cost Insights and the catalog processing loop.&lt;/p&gt;

&lt;p&gt;Roadie has extracted the larger, more resource-hungry processes into ephemeral standalone processes to avoid eating up the event loop cycles of the main application. In our case these run in AWS, where we host your Roadie instances, as ECS tasks or Lambda functions, depending on the use case. Now that the backend system is fully released in the Backstage project, it should be much easier to spin out supporting services into their own processes and leave the catalog alone to do what it does best: showing entities to users in a performant manner.&lt;/p&gt;

&lt;h2&gt;
  
  
  Perceived Frontend Performance
&lt;/h2&gt;

&lt;p&gt;Of course, performance is relevant to users only if they are able to &lt;em&gt;feel&lt;/em&gt; it. In standard Backstage installations, that happens in the frontend layer of the application. Does your catalog load fast? Do you have a ton of frontend plugins installed, making your bundle sizes big? Do you need to rebuild your tech docs every time you navigate to the docs page?&lt;/p&gt;

&lt;p&gt;For the frontend resources, there are multiple well-known performance tricks that can be included in the build process and hosting solutions that you are using to serve your frontend app. In the end, the Backstage frontend is a single-page application with all the known benefits and caveats. All the data displayed needs to be retrieved from somewhere before it enters the JavaScript runtime to be rendered on the screen.&lt;/p&gt;

&lt;p&gt;In most cases, getting the data we want to display means API calls. For some data that is OK, like getting cheap values from fast endpoints, but for other data the roundtrip to the server is not worth it. You can embed relevant information in the &lt;code&gt;index.html&lt;/code&gt; that is either served from the backend (in newer Backstage installations) or pre-built during the deployment process. You can also use localStorage to your advantage. In fact, Backstage does use it for some of its data, but unfortunately not necessarily for caching purposes.&lt;/p&gt;

&lt;h2&gt;
  
  
  YAMLs and scaling maintainability
&lt;/h2&gt;

&lt;p&gt;The canonical approach recommended by the CNCF open source maintainers of Backstage is to use catalog manifest files, usually called &lt;code&gt;catalog-info.yaml&lt;/code&gt;, within code repositories to store entity data for the catalog. In a large number of cases this is the wrong approach. You may be able to keep domain and system entities up to date easily, since they are few in number and change rarely. For other kinds and types of entities, we have seen with many of our customers that keeping those YAML files up to date across an engineering organization is difficult.&lt;/p&gt;

&lt;p&gt;In many cases a better approach is to automate at least the initial ingestion of entity information from a more robust source of truth. In the end, trusting humans to update a random file in their repository just for the sake of updating it seems unlikely to succeed 100% of the time.&lt;/p&gt;

&lt;p&gt;In a modern engineering organization there are multiple good sources to use as the canonical starting point for your entity data. For users and groups you have your Oktas or Azure Entra IDs. For repositories you have your GitHub APIs. For components, APIs and resources you have your running instances, your exposed OpenAPI endpoints and your K8s or cloud provider APIs.&lt;/p&gt;

&lt;p&gt;Ingesting the relevant data automatically from these sources allows you to trust that the information is up to date and correctly mirrors the software that is actually being developed within your organization and what is running in your environments.&lt;/p&gt;

&lt;p&gt;That is, in the end, the purpose of a developer portal: a mapping of the software that you are providing to your customers, not a mapping of software that you have at some point written into a well-formatted text file.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rate limits
&lt;/h2&gt;

&lt;p&gt;There are downsides to automating your catalog ingestion as well. Backstage relies heavily on integrations with third party APIs, and this has implications for how up to date it can keep the catalog information. Being so reliant on other services, and wanting to be the single pane of glass that displays their information, means that you need to be aware of the limitations of this system.&lt;/p&gt;

&lt;p&gt;Backstage by default is a pull-based system which contacts third parties using API tokens or other authentication information and retrieves relevant data. When this happens through plugins, it usually isn’t a massive issue, since the actual concurrent user count is relatively small. Even for the bigger clients we don’t usually see high 3 digit morning rushes. The just-in-time nature of retrieving data from third parties at runtime and displaying it on the familiar frontend thus works well.&lt;/p&gt;

&lt;p&gt;On the other hand, Backstage also stores data internally: data that it gathers automatically from third parties and uses to generate insights or enhance entities. These &lt;em&gt;processing loops&lt;/em&gt; usually run on a schedule and try to slurp in as much as they can. This is where rate limits become a problem.&lt;/p&gt;

&lt;p&gt;Monitoring rate limits against different systems is extremely important and helps you identify when you are getting close to making the downstream service angry. Backstage offers a good set of monitoring primitives to expose metrics from your providers. You can, for example, &lt;a href="https://backstage.io/docs/tutorials/setup-opentelemetry" rel="noopener noreferrer"&gt;set up open-telemetry&lt;/a&gt; to gather the information you need. You can expose rate limit information either by querying it periodically in a loop or by embedding it directly in your fetch client implementations. The former approach gives you the ability to manually tweak your calling schedules to accommodate your integrations, while the latter may give you the ability to automatically slow down the calling loop to satisfy the limits.&lt;/p&gt;
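
&lt;p&gt;As a sketch of the second option, a fetch client could translate the rate limit information it sees into a delay before its next call. The &lt;code&gt;RateLimitInfo&lt;/code&gt; shape and &lt;code&gt;delayBeforeNextCallMs&lt;/code&gt; function below are hypothetical. Real services report this differently (GitHub, for example, uses the &lt;code&gt;x-ratelimit-remaining&lt;/code&gt; and &lt;code&gt;x-ratelimit-reset&lt;/code&gt; response headers), so treat this as one possible pacing strategy rather than a recommended implementation.&lt;/p&gt;

```typescript
// Hypothetical shape for rate limit data read from a response.
type RateLimitInfo = {
  remaining: number; // requests left in the current window
  resetAtMs: number; // epoch milliseconds when the window resets
};

// Spread the remaining request budget evenly over the time left until reset.
function delayBeforeNextCallMs(info: RateLimitInfo, nowMs: number): number {
  const msUntilReset = Math.max(0, info.resetAtMs - nowMs);
  if (info.remaining > 0) {
    return Math.ceil(msUntilReset / info.remaining);
  }
  // Budget exhausted: wait for the window to reset.
  return msUntilReset;
}

// Usage: 10 requests left and 60 seconds until the window resets.
const pacingDelayMs = delayBeforeNextCallMs(
  { remaining: 10, resetAtMs: 60_000 },
  0,
);
console.log(pacingDelayMs);
```

&lt;p&gt;Pacing evenly trades some ingestion speed for predictability. A loop that instead bursts until the budget runs out finishes sooner when limits are generous, but stalls completely once they are hit.&lt;/p&gt;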

&lt;p&gt;Let us know if you have encountered any other aspects or approaches that have helped you scale your Backstage instance and work effectively within your organization.&lt;/p&gt;

</description>
      <category>backstage</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
