<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: OSS Insight</title>
    <description>The latest articles on Forem by OSS Insight (@ossinsight).</description>
    <link>https://forem.com/ossinsight</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F864228%2F812cea51-bb48-427f-b7a5-7200c5a7c418.png</url>
      <title>Forem: OSS Insight</title>
      <link>https://forem.com/ossinsight</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ossinsight"/>
    <language>en</language>
    <item>
      <title>The Unsung Heroes of Open Source: The Dedicated Maintainers Behind Lesser-Known Projects</title>
      <dc:creator>OSS Insight</dc:creator>
      <pubDate>Sun, 05 Mar 2023 06:53:08 +0000</pubDate>
      <link>https://forem.com/ossinsight/the-unsung-heroes-of-open-source-the-dedicated-maintainers-behind-lesser-known-projects-11ij</link>
      <guid>https://forem.com/ossinsight/the-unsung-heroes-of-open-source-the-dedicated-maintainers-behind-lesser-known-projects-11ij</guid>
      <description>&lt;p&gt;A few days ago, I read &lt;a href="https://github.com/zloirock/core-js/blob/master/docs/2023-02-14-so-whats-next.md"&gt;a blog post by the author of Core-js&lt;/a&gt;. To be honest, it was my first time hearing about Core-js. As someone who has written some front-end code and has been keeping up with open source projects, I feel a bit ashamed.&lt;/p&gt;

&lt;p&gt;However, there are many open source projects that are widely used but not well-known. In this blog post, I will take a closer look at a few of these unsung heroes of the open source world. I do not want to give them a business model or financial advice in this article. This largely depends on the author's personal experience and values. I just want to raise more awareness about these open source projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core-js
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub repo: &lt;a href="https://github.com/zloirock/core-js"&gt;https://github.com/zloirock/core-js&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://github.com/zloirock/core-js"&gt;Core-js&lt;/a&gt; is a modular standard library for JavaScript. It provides polyfills for many ECMAScript features, as well as some additional features that are not included in the standard library. It's used by many popular JavaScript libraries and frameworks, including React, Vue.js, and Angular.&lt;/p&gt;

&lt;p&gt;Core-js has been downloaded more than 2.5 billion times from the npm package registry, making it one of the most widely used JavaScript libraries in the world. Despite its widespread use, the project does not receive much attention, and its star growth is very slow.&lt;/p&gt;

&lt;p&gt;Core-js is maintained by &lt;a href="https://github.com/zloirock"&gt;Denis Pushkarev&lt;/a&gt;, who started the project as a hobby in 2012 and open-sourced it in 2014.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TJAOk-W5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ofpiher7obaknnl1tqf9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TJAOk-W5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ofpiher7obaknnl1tqf9.png" alt="Image description" width="880" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;a href="https://ossinsight.io/explore/?id=b21fdc02-c9dc-4dcf-abc4-f74110e784dc"&gt;&lt;em&gt;Core-js' top contributors&lt;/em&gt;&lt;/a&gt;&lt;/center&gt;

&lt;p&gt;Based on the distribution of contributions to the project, it seems that Denis has provided more than 95% of the project's code. And as he said in the &lt;a href="https://github.com/zloirock/core-js/blob/master/docs/2023-02-14-so-whats-next.md"&gt;blog post&lt;/a&gt; I read, the project occupies almost all of his time—more than a full working day.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--aDB76jKK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/feazaca1tp33o4mb8lcz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--aDB76jKK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/feazaca1tp33o4mb8lcz.png" alt="Image description" width="880" height="503"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
    &lt;a href="https://ossinsight.io/analyze/zloirock"&gt;&lt;em&gt;Denis' contribution time distribution&lt;/em&gt;&lt;/a&gt;
&lt;/center&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MXPot8rg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xnf201k552mee7clfc5y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MXPot8rg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xnf201k552mee7clfc5y.png" alt="Image description" width="880" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
    &lt;a href="https://ossinsight.io/analyze/zloirock"&gt;&lt;em&gt;Denis' contribution time distribution&lt;/em&gt;&lt;/a&gt;
&lt;/center&gt;

&lt;p&gt;On February 14th, Denis's blog post brought significant attention to the Core-js project. He has since opened multiple donation channels, including &lt;a href="https://opencollective.com/core-js"&gt;Open Collective&lt;/a&gt;, &lt;a href="https://www.patreon.com/zloirock"&gt;Patreon&lt;/a&gt;, and &lt;a href="https://boosty.to/zloirock"&gt;boosty&lt;/a&gt;. He is actively exploring ways to ensure that Core-js can be maintained in the long term.&lt;/p&gt;

&lt;h2&gt;
  
  
  cURL
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub repo: &lt;a href="https://github.com/curl/curl"&gt;https://github.com/curl/curl&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://github.com/curl/curl"&gt;cURL&lt;/a&gt; is a command-line tool and library for transferring data over a wide range of network protocols, including HTTP, FTP, SMTP, and many others. It is used by millions of developers to download and upload files, test APIs, and automate tasks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--fE38V42U--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vxczcinra5em6qdgxsy9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fE38V42U--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vxczcinra5em6qdgxsy9.png" alt="Image description" width="880" height="612"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
    &lt;a href="https://ossinsight.io/explore/?id=12cbe933-d371-435c-83e6-3fe8d46abf76"&gt;&lt;em&gt;cURL's top contributors&lt;/em&gt;&lt;/a&gt;
&lt;/center&gt;

&lt;p&gt;cURL is maintained primarily by Daniel Stenberg, who started working on the project in 1998. Fortunately, new contributors occasionally join, as mentioned in this &lt;a href="https://twitter.com/bagder/status/1628421123586109440"&gt;tweet&lt;/a&gt;. This allows Daniel to maintain a more normal schedule and a full-time job, and even &lt;a href="https://twitter.com/bagder/status/1546857830866722817"&gt;leave work early on Wednesdays to play floorball&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--f2HGFth_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/unl00j6n89xc96pi47oz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--f2HGFth_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/unl00j6n89xc96pi47oz.png" alt="Image description" width="880" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
    &lt;a href="https://ossinsight.io/analyze/bagder"&gt;&lt;em&gt;Daniel's contribution time distribution&lt;/em&gt;&lt;/a&gt;
&lt;/center&gt;

&lt;p&gt;cURL has received sponsorship from various &lt;a href="https://curl.se/sponsors.html"&gt;organizations&lt;/a&gt; and &lt;a href="https://github.com/sponsors/curl#sponsors"&gt;individuals&lt;/a&gt;, including wolfSSL. wolfSSL employs &lt;a href="https://daniel.haxx.se/"&gt;Daniel&lt;/a&gt; and allows him to spend paid work hours on cURL.&lt;/p&gt;

&lt;h2&gt;
  
  
  ImageMagick
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub repo: &lt;a href="https://github.com/ImageMagick/ImageMagick"&gt;https://github.com/ImageMagick/ImageMagick&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ImageMagick is a free and open-source software suite for displaying, converting, and editing raster image and vector image files. ImageMagick is used by millions of websites and applications to manipulate and display images, including popular content management systems like WordPress and Drupal.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7MaQ8Eop--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vu0ayt9gxz8mot5yqhn3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7MaQ8Eop--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vu0ayt9gxz8mot5yqhn3.png" alt="Image description" width="880" height="502"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
    &lt;a href="https://ossinsight.io/explore/?id=eda659c3-8dfb-41b6-a79e-6ad126797ac1"&gt;&lt;em&gt;ImageMagick's top contributors&lt;/em&gt;&lt;/a&gt;
&lt;/center&gt;

&lt;p&gt;ImageMagick is maintained by a small group of developers, including its founder, &lt;a href="https://github.com/urban-warrior"&gt;John Cristy&lt;/a&gt;. Cristy started the project at DuPont in 1987 and released it in 1990. It is said that John Cristy has a full-time job and only maintains the project in his spare time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vkn7Rm5t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/90w7n56pq3wdn7kw7af2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vkn7Rm5t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/90w7n56pq3wdn7kw7af2.png" alt="Image description" width="880" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
    &lt;a href="https://ossinsight.io/analyze/ImageMagick/ImageMagick#contributors"&gt;&lt;em&gt;ImageMagick's top contributors last month&lt;/em&gt;&lt;/a&gt;
&lt;/center&gt;

&lt;p&gt;&lt;a href="https://ossinsight.io/analyze/dlemstra"&gt;Dirk Lemstra&lt;/a&gt; is another primary maintainer of ImageMagick, currently working as a consultant for a company and maintaining the project in his spare time.&lt;/p&gt;

&lt;p&gt;Currently, the project is sustained by the support of &lt;a href="https://imagemagick.org/script/support.php"&gt;various organizations and individuals&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  MyCLI
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub repo: &lt;a href="https://github.com/dbcli/mycli"&gt;https://github.com/dbcli/mycli&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MyCLI is a command line interface for MySQL, MariaDB, and Percona with auto-completion and syntax highlighting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bgOCbKFk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/y0czjy2nmezchen4dldf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bgOCbKFk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/y0czjy2nmezchen4dldf.png" alt="Image description" width="880" height="618"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
    &lt;a href="https://ossinsight.io/explore/?id=2ec42ec5-b54b-4103-a01a-90fa72af5137"&gt;&lt;em&gt;MyCLI's top contributors&lt;/em&gt;&lt;/a&gt;
&lt;/center&gt;

&lt;p&gt;The project is maintained by its creator, Amjith Ramanujam, and by contributors from the open source community. Based on the distribution of contributions, a relatively stable community of contributors has formed around MyCLI. Moreover, there are some &lt;a href="https://www.mycli.net/sponsors"&gt;organizations and individuals sponsoring this project&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WG9nfNCw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zv9bubsofmfo49b9tqhg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WG9nfNCw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zv9bubsofmfo49b9tqhg.png" alt="Image description" width="880" height="395"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
    &lt;a href="https://ossinsight.io/analyze/dbcli/mycli#overview"&gt;&lt;em&gt;MyCLI's commit history&lt;/em&gt;&lt;/a&gt;
&lt;/center&gt;

&lt;p&gt;However, with the popularity of cloud databases, such projects have fallen behind the times, so the updates for the project have been very slow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Homebrew
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub repo: &lt;a href="https://github.com/Homebrew/brew"&gt;https://github.com/Homebrew/brew&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Homebrew is a popular package manager for macOS that allows users to easily install and manage a wide variety of software packages. Homebrew is a nonprofit project run entirely by unpaid volunteer developers, with the lead maintainer being Mike McQuaid.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JT8dlRy5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/z0sn97ahif4xdd2x9u67.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JT8dlRy5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/z0sn97ahif4xdd2x9u67.png" alt="Image description" width="880" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
    &lt;a href="https://ossinsight.io/explore/?id=791c5932-ceba-44b0-8a34-d5328a177783"&gt;&lt;em&gt;Homebrew's top contributors&lt;/em&gt;&lt;/a&gt;
&lt;/center&gt;

&lt;p&gt;&lt;a href="https://github.com/MikeMcQuaid"&gt;McQuaid&lt;/a&gt; has been involved with the Homebrew project since its inception and has been the lead maintainer since 2012, while working full-time at GitHub as a principal engineer.&lt;/p&gt;

&lt;p&gt;Homebrew’s financial operations are managed by the &lt;a href="https://opencollective.com/opensource"&gt;Open Source Collective&lt;/a&gt;, and the project accepts donations through &lt;a href="https://github.com/sponsors/Homebrew"&gt;GitHub Sponsors&lt;/a&gt;, &lt;a href="https://opencollective.com/homebrew"&gt;Open Collective&lt;/a&gt;, or &lt;a href="https://www.patreon.com/homebrew"&gt;Patreon&lt;/a&gt;. Homebrew also sponsors some projects itself, including cURL, mentioned earlier.&lt;/p&gt;

&lt;h2&gt;
  
  
  Apache Log4j
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub repo: &lt;a href="https://github.com/apache/logging-log4j2"&gt;https://github.com/apache/logging-log4j2&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Apache Log4j is a powerful logging framework for Java that allows developers to log messages from their applications with fine-grained control over where and how those messages are recorded. This library has been widely adopted by Java developers and is used by many popular Java-based applications, including Apache Kafka and Apache Spark.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Z7rW281r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/w9rjse7h5idywr6fecsx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Z7rW281r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/w9rjse7h5idywr6fecsx.png" alt="Image description" width="880" height="268"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
    &lt;a href="https://ossinsight.io/analyze/apache/logging-log4j2#overview"&gt;&lt;em&gt;Apache Log4j's star history&lt;/em&gt;&lt;/a&gt;
&lt;/center&gt;

&lt;p&gt;Interestingly, the project did not receive much attention until November 2021, when a security vulnerability was reported. This incident doubled its star count and gained attention from the industry.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2mBerE6T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qsgimkaldyzj0qr83xit.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2mBerE6T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qsgimkaldyzj0qr83xit.png" alt="Image description" width="880" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
    &lt;a href="https://ossinsight.io/explore/?id=63ea8bae-8155-41d2-864c-795978524a60"&gt;&lt;em&gt;Apache Log4j's top contributors&lt;/em&gt;&lt;/a&gt;
&lt;/center&gt;

&lt;p&gt;&lt;a href="https://github.com/rgoers"&gt;Ralph Goers&lt;/a&gt; is the original author of Log4j 2. He worked on the initial design and development of Log4j 2, which was released in 2014. He now works at Nextiva as a Fellow Architect. The current core maintainer of logging-log4j2 is &lt;a href="https://github.com/garydgregory"&gt;Gary Gregory&lt;/a&gt;, who is a member of the Apache Software Foundation and has been working on the project for over a decade.&lt;/p&gt;

&lt;p&gt;Because the Log4j 2 project is under the Apache Foundation, the maintainers can focus more on project maintenance without worrying about financial issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenSSL
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub repo: &lt;a href="https://github.com/openssl/openssl"&gt;https://github.com/openssl/openssl&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OpenSSL is an open source library that provides cryptographic functions for many different applications, including web servers, email clients, and virtual private networks. OpenSSL is used by millions of websites and applications to secure communications over the internet, including popular web servers like Apache and Nginx, as well as popular programming languages like Python and Ruby.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mW9svt-t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jt8y7808tr5mim5wu2k0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mW9svt-t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jt8y7808tr5mim5wu2k0.png" alt="Image description" width="880" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
    &lt;a href="https://ossinsight.io/explore/?id=143a8231-bd81-42c7-bb52-d1e85c61c6a9"&gt;&lt;em&gt;OpenSSL's top contributors&lt;/em&gt;&lt;/a&gt;
&lt;/center&gt;

&lt;p&gt;The project is developed by a distributed team, mostly consisting of volunteers, with some project-funded resources. The team is led by &lt;a href="https://github.com/mattcaswell"&gt;Matt Caswell&lt;/a&gt;, who has been working on OpenSSL since 2010 and became one of the maintainers in 2013.&lt;/p&gt;

&lt;p&gt;Apart from volunteer developers, OpenSSL also depends on financial support from the community, which can be given in various forms. These include &lt;a href="https://www.openssl.org/support/contracts.html"&gt;a support contract&lt;/a&gt;, &lt;a href="https://www.openssl.org/support/acks.html"&gt;a sponsorship donation&lt;/a&gt;, or a smaller donation via &lt;a href="https://github.com/sponsors/openssl"&gt;GitHub Sponsors&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Maintaining an open source project is no easy feat. It's a labor of love, built by passionate developers who sacrifice their time to create something that makes a difference. As users, we owe them our gratitude for the tools and technologies they provide. As Mike McQuaid wrote in his blog post &lt;a href="https://mikemcquaid.com/open-source-maintainers-owe-you-nothing/"&gt;Open Source Maintainers Owe You Nothing&lt;/a&gt;, "Remember when filing an issue, opening a pull request, or making a comment on a project, to be grateful that people spend their free time to build software you get to use for free."&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>github</category>
      <category>programming</category>
      <category>computerscience</category>
    </item>
    <item>
      <title>How We Reduced Online Serving Latency from 1.11s to 123.6ms on a Distributed SQL Database</title>
      <dc:creator>OSS Insight</dc:creator>
      <pubDate>Tue, 22 Nov 2022 04:11:43 +0000</pubDate>
      <link>https://forem.com/ossinsight/how-we-reduced-online-serving-latency-from-111s-to-1236ms-on-a-distributed-sql-database-3ki6</link>
      <guid>https://forem.com/ossinsight/how-we-reduced-online-serving-latency-from-111s-to-1236ms-on-a-distributed-sql-database-3ki6</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;TL;DR:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post tells how a website on a distributed database &lt;strong&gt;reduced online serving latency from 1.11 s to 417.7 ms, and then to 123.6 ms&lt;/strong&gt;. We found that some lessons learned on MySQL could be applied throughout the optimization process. But when we optimize a &lt;strong&gt;distributed database,&lt;/strong&gt; we need to consider more.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://ossinsight.io/" rel="noopener noreferrer"&gt;OSS Insight&lt;/a&gt; website displays the data changes of GitHub events in real time. It's powered by &lt;a href="https://www.pingcap.com/tidb-cloud/" rel="noopener noreferrer"&gt;TiDB Cloud&lt;/a&gt;, a MySQL-compatible distributed SQL database for elastic scale and real-time analytics.&lt;/p&gt;

&lt;p&gt;Recently, to save costs, we tried to use lower-specification machines without affecting query efficiency and user experience. But our website and query response slowed down.&lt;/p&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3n3jtf1z69oyosm7e0ep.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3n3jtf1z69oyosm7e0ep.png" alt="Image description" width="800" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;em&gt;The repository analysis page was loading, loading, and loading&lt;/em&gt;&lt;/center&gt;



&lt;p&gt;How could we solve these problems on a distributed database? Could we use the methodology we learned on MySQL?&lt;/p&gt;

&lt;h2&gt;
  
  
  Analyzing the SQL execution plan
&lt;/h2&gt;

&lt;p&gt;To identify slow SQL statements, we used TiDB Cloud's Diagnosis page to sort SQL queries by their average latency.&lt;/p&gt;

&lt;p&gt;For example, after the API server received a request, it executed the following SQL statement to obtain the number of issues in the &lt;a href="https://ossinsight.io/analyze/microsoft/vscode" rel="noopener noreferrer"&gt;vscode repository&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;github_events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="n"&gt;repo_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;41881900&lt;/span&gt;     &lt;span class="c1"&gt;-- vscode&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'IssuesEvent'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, if the open source repository is large, this query may take several seconds or more to execute.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; to troubleshoot query performance problems
&lt;/h3&gt;

&lt;p&gt;In MySQL, when we troubleshoot query performance problems, we usually use the &lt;code&gt;EXPLAIN ANALYZE &amp;lt;sql&amp;gt;&lt;/code&gt; statement to view the SQL statement's execution plan. We can use the execution plan to locate the problem. The same works for TiDB.&lt;/p&gt;

&lt;p&gt;We executed the &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; statement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;github_events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="n"&gt;repo_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;41881900&lt;/span&gt;     &lt;span class="c1"&gt;-- vscode&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'IssuesEvent'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result showed that the query took 1.11 seconds to execute.&lt;/p&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcjy1ts6icaih672aio44.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcjy1ts6icaih672aio44.png" alt="Image description" width="800" height="104"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;em&gt;The query result&lt;/em&gt;&lt;/center&gt;



&lt;p&gt;You can see that TiDB's &lt;a href="https://docs.pingcap.com/tidb/stable/explain-overview" rel="noopener noreferrer"&gt;&lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt;&lt;/a&gt; statement execution result was completely different from MySQL's. TiDB's execution plan gave us a clearer understanding of how this SQL statement was executed.&lt;/p&gt;

&lt;p&gt;The execution plan shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This SQL statement was split into several subtasks. Some were on the &lt;code&gt;root&lt;/code&gt; node, and others were on the &lt;a href="https://docs.pingcap.com/tidb/dev/tikv-overview#tikv-overview" rel="noopener noreferrer"&gt;&lt;code&gt;tikv&lt;/code&gt;&lt;/a&gt; node.&lt;/li&gt;
&lt;li&gt;The query fetched data from the &lt;code&gt;issue_event&lt;/code&gt; partition of the table (shown as &lt;code&gt;partition:issue_event&lt;/code&gt; in the plan).&lt;/li&gt;
&lt;li&gt;This query did a range scan through the index &lt;code&gt;index_github_events_on_repo_id(repo_id)&lt;/code&gt;. This let the query &lt;strong&gt;narrow down the data scan quickly&lt;/strong&gt;. &lt;strong&gt;This process only took&lt;/strong&gt; &lt;strong&gt;59 ms.&lt;/strong&gt; It was the sum of the execution times of multiple concurrent tasks.&lt;/li&gt;
&lt;li&gt;Besides &lt;code&gt;IndexRangeScan&lt;/code&gt;, &lt;strong&gt;the query also used &lt;code&gt;TableRowIDScan&lt;/code&gt;&lt;/strong&gt;. &lt;strong&gt;This scan took&lt;/strong&gt; &lt;strong&gt;4.69 s&lt;/strong&gt;, the sum of execution times for multiple concurrent subtasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From the execution times above, we determined that the query performance bottleneck was in the &lt;code&gt;TableRowIDScan&lt;/code&gt; step.&lt;/p&gt;

&lt;p&gt;We reran the &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; statement and found that the query was faster the second time. Why?&lt;/p&gt;

&lt;h3&gt;
  
  
  Why did &lt;code&gt;TableRowIDScan&lt;/code&gt; take so long?
&lt;/h3&gt;

&lt;p&gt;To find the reason why &lt;code&gt;TableRowIDScan&lt;/code&gt; took so long, we need basic knowledge of TiDB's underlying storage.&lt;/p&gt;

&lt;p&gt;In TiDB, a table's data entries and indexes are stored on TiKV nodes in key-value pairs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For an index, the key is the combination of the index value and the &lt;code&gt;row_id&lt;/code&gt; (for a non-clustered index) or the primary key (for a clustered index). The &lt;code&gt;row_id&lt;/code&gt; or primary key indicates where the data is stored.&lt;/li&gt;
&lt;li&gt;For a data entry, the key is the combination of the table ID and the &lt;code&gt;row_id&lt;/code&gt; or primary key. The value holds the column data for that row.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This graph shows how &lt;code&gt;IndexLookup&lt;/code&gt; is executed in the execution plan:&lt;/p&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb6vfrnyh4th43nzg5bw9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb6vfrnyh4th43nzg5bw9.png" alt="Image description" width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;em&gt;This is the logical structure, not the physical storage structure.&lt;/em&gt;&lt;/center&gt;



&lt;p&gt;In the query above, TiDB uses the query condition &lt;code&gt;repo_id=41881900&lt;/code&gt; to select the &lt;code&gt;row_id&lt;/code&gt; of every row related to the repository from the secondary index &lt;code&gt;index_github_events_on_repo_id&lt;/code&gt;. The query needs the &lt;code&gt;number&lt;/code&gt; column data, but the secondary index doesn't provide it. Therefore, TiDB must execute &lt;code&gt;IndexLookup&lt;/code&gt; to find the corresponding row in the table based on the obtained &lt;code&gt;row_id&lt;/code&gt; (the &lt;code&gt;TableRowIDScan&lt;/code&gt; step).&lt;/p&gt;

&lt;p&gt;The rows are probably scattered in different data blocks and stored on the hard disk. This causes TiDB to perform a large number of I/O operations to read data from different data blocks or even different machine nodes.&lt;/p&gt;
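
&lt;p&gt;To make the contrast concrete, it may help to compare against a query that the secondary index can answer by itself. The statement below is a sketch for illustration, not a query from our API: it filters only on &lt;code&gt;repo_id&lt;/code&gt; and counts rows, so it needs no column outside the index. Depending on what the optimizer chooses, we would expect its plan to contain an index scan but no &lt;code&gt;TableRowIDScan&lt;/code&gt; step.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Illustrative comparison only: every column this query touches (repo_id)
-- is already stored in index_github_events_on_repo_id, so no row lookup
-- in the table should be required.
EXPLAIN ANALYZE SELECT
    COUNT(*)
FROM github_events
WHERE
    repo_id = 41881900;    -- vscode
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;As soon as the query also asks for &lt;code&gt;number&lt;/code&gt;, which the index does not store, each matching &lt;code&gt;row_id&lt;/code&gt; has to be looked up in the table.&lt;/p&gt;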

&lt;h3&gt;
  
  
  Why was &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; faster the second time?
&lt;/h3&gt;

&lt;p&gt;In &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt;'s execution result, we saw that the "execution info" column corresponding to the &lt;code&gt;TableRowIDScan&lt;/code&gt; step contained this information:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;block: {cache_hit_count: 2755559, read_count: 179510, read_byte: 4.07 GB}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We thought this had something to do with TiKV. TiKV read a very large number of data blocks from the disk. Because the data blocks read from the &lt;em&gt;disk&lt;/em&gt; were cached in &lt;em&gt;memory&lt;/em&gt; in the first execution, 2.75 million data blocks could be read directly from &lt;em&gt;memory&lt;/em&gt; instead of being retrieved from the hard disk. This made the &lt;code&gt;TableRowIDScan&lt;/code&gt; step much faster, and the query was faster overall.&lt;/p&gt;

&lt;p&gt;However, we believed that user queries were random. For example, a user might look up data from a &lt;code&gt;vscode&lt;/code&gt; repository and then go to a &lt;code&gt;kubernetes&lt;/code&gt; repository. TiKV's memory couldn't cache all the data blocks in all the drives. Therefore, this did not solve our problem, but it reminded us that when we analyze SQL execution efficiency, we need to exclude cache effects.&lt;/p&gt;
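
&lt;p&gt;One way to gauge how much a run benefited from the cache is to turn the block statistics into a ratio. The sketch below parses the string quoted earlier and treats cache hits divided by cache hits plus disk reads as the hit ratio. That interpretation and the helper name &lt;code&gt;block_cache_hit_ratio&lt;/code&gt; are assumptions made for illustration.&lt;/p&gt;

```python
import re

# The block statistics quoted above, copied from the "execution info" column.
execution_info = (
    "block: {cache_hit_count: 2755559, read_count: 179510, read_byte: 4.07 GB}"
)


def block_cache_hit_ratio(info):
    """Return hits / (hits + reads) for one block statistics string."""
    hits = int(re.search(r"cache_hit_count: (\d+)", info).group(1))
    reads = int(re.search(r"read_count: (\d+)", info).group(1))
    return hits / (hits + reads)


ratio = block_cache_hit_ratio(execution_info)
print(f"{ratio:.1%} of block accesses were served from the cache")
```

&lt;p&gt;A high ratio on a repeated run suggests that the measured time reflects a warm cache rather than the cost a first-time visitor would see.&lt;/p&gt;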

&lt;h2&gt;
  
  
  Using a covering index to avoid executing &lt;code&gt;TableRowIDScan&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Could we avoid executing &lt;code&gt;TableRowIDScan&lt;/code&gt; in &lt;code&gt;IndexLookup&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;In MySQL, a covering index lets the database skip the table lookup after index filtering, because the index itself contains every column the query needs. We wanted to apply this to OSS Insight. In our TiDB database, we tried to create a composite index to achieve index coverage.&lt;/p&gt;

&lt;p&gt;When we created a composite index with multiple columns, we needed to pay attention to the column order. Our goals were to allow a composite index to be used by as many queries as possible, to help these queries narrow the scope of data scans as quickly as possible, and to provide as many fields as possible in the query. When we created a composite index we followed this order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Columns that had high differentiation and could be used as equivalence conditions for the &lt;code&gt;WHERE&lt;/code&gt; statement, like &lt;code&gt;repo_id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Columns that didn't have high differentiation but could be used as equivalence conditions for the &lt;code&gt;WHERE&lt;/code&gt; statement, like &lt;code&gt;type&lt;/code&gt; and &lt;code&gt;action&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Columns that could be used as range query conditions for the &lt;code&gt;WHERE&lt;/code&gt; statement, like &lt;code&gt;created_at&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Redundant columns that were not used as filter conditions but were used in the query, such as &lt;code&gt;number&lt;/code&gt; and &lt;code&gt;push_size&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We used the &lt;code&gt;CREATE INDEX&lt;/code&gt; statement to create a composite index in the database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE INDEX index_github_events_on_repo_id_type_number ON github_events(repo_id, type, number);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After we created the index and ran the SQL statement again, the query was significantly faster. We viewed the execution plan through &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; and found that the execution plan became simpler. The &lt;code&gt;IndexLookup&lt;/code&gt; and &lt;code&gt;TableRowIDScan&lt;/code&gt; steps were gone. &lt;strong&gt;The query took only 417.7 ms&lt;/strong&gt;.&lt;/p&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy19l1hs17h25d57bm45o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy19l1hs17h25d57bm45o.png" alt="Image description" width="800" height="71"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;em&gt;The result of the EXPLAIN query. This query cost 417.7 ms&lt;/em&gt;&lt;/center&gt;



&lt;p&gt;So we knew that our query could get all the data it needed by doing an &lt;code&gt;IndexRangeScan&lt;/code&gt; on the new index. This composite index included the &lt;code&gt;number&lt;/code&gt; field, so TiDB did not need to perform &lt;code&gt;IndexLookup&lt;/code&gt; to get data from the table. This reduced a lot of I/O operations.&lt;/p&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwsaxm2ai74syrgetivb6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwsaxm2ai74syrgetivb6.png" alt="Image description" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;em&gt;IndexRangeScan in the non-clustered table&lt;/em&gt;&lt;/center&gt;
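
&lt;p&gt;The same covering-index effect can be reproduced on a laptop. The sketch below uses SQLite through Python's &lt;code&gt;sqlite3&lt;/code&gt; module as a stand-in, so the plan wording differs from TiDB's operators. The table is reduced to the columns the query touches, and the index names and the &lt;code&gt;query_plan&lt;/code&gt; helper are hypothetical.&lt;/p&gt;

```python
import sqlite3

QUERY = (
    "SELECT COUNT(DISTINCT number) FROM github_events "
    "WHERE repo_id = 41881900 AND type = 'IssuesEvent'"
)


def query_plan(create_index_sql):
    """Build an empty table with one index and return SQLite's plan for QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE github_events ("
        "id INTEGER PRIMARY KEY, repo_id INTEGER, type TEXT, number INTEGER)"
    )
    conn.execute(create_index_sql)
    # Each plan row is (id, parent, notused, detail); keep only the detail text.
    details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
    conn.close()
    return " | ".join(details)


# Index on repo_id only: the engine still has to visit the table rows.
single = query_plan("CREATE INDEX idx_repo_id ON github_events(repo_id)")
# Composite index that also stores type and number: the index alone is enough.
composite = query_plan(
    "CREATE INDEX idx_repo_id_type_number ON github_events(repo_id, type, number)"
)
print(single)
print(composite)
```

&lt;p&gt;With the composite index, SQLite reports a covering index in its plan, which plays the same role as the missing &lt;code&gt;IndexLookup&lt;/code&gt; step in the TiDB plan above.&lt;/p&gt;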



&lt;h2&gt;
  
  
  Pushing down computing to further reduce query latency
&lt;/h2&gt;

&lt;p&gt;For a query that needed to obtain 270,000 rows of data, 417.7 ms was quite a short execution time. But could we improve the time even more?&lt;/p&gt;

&lt;p&gt;We thought this relied on TiDB's architecture that separates computing and storage layers. This is different from MySQL.&lt;/p&gt;

&lt;p&gt;In TiDB:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;tidb-server&lt;/code&gt; node computes data. It corresponds to root in the execution plan.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;tikv-server&lt;/code&gt; node stores the data. It corresponds to &lt;code&gt;cop[tikv]&lt;/code&gt; in the execution plan.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Generally, an SQL statement is split into multiple steps to execute with the cooperation of computing and storage nodes.&lt;/p&gt;

&lt;p&gt;When we executed the SQL statement in this article, TiDB obtained the data of the &lt;code&gt;github_events&lt;/code&gt; table from &lt;code&gt;tikv-server&lt;/code&gt; and performed the aggregate calculation of the COUNT function on &lt;code&gt;tidb-server&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;github_events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="n"&gt;repo_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;41881900&lt;/span&gt;     &lt;span class="c1"&gt;-- vscode&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'IssuesEvent'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The execution plan indicated that when TiDB was performing &lt;code&gt;IndexReader&lt;/code&gt;, &lt;code&gt;tidb-server&lt;/code&gt; needed to read 270,000 rows of data from &lt;code&gt;tikv-server&lt;/code&gt; through the network. This was time-consuming.&lt;/p&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu6g92hq8z9t7cvoc9c3q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu6g92hq8z9t7cvoc9c3q.png" alt="Image description" width="800" height="70"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;em&gt;tidb-server read 270,000 rows of data from tikv-server&lt;/em&gt;&lt;/center&gt;



&lt;p&gt;How could we avoid such a large network transmission? Although the query needed to obtain a large amount of data, the final calculation result was only a number. Could we complete the &lt;code&gt;COUNT&lt;/code&gt; aggregation calculation on &lt;code&gt;tikv-server&lt;/code&gt; and return the result only to &lt;code&gt;tidb-server&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;TiDB had implemented this idea through the &lt;a href="https://docs.pingcap.com/tidb/dev/tikv-overview#tikv-coprocessor" rel="noopener noreferrer"&gt;coprocessor&lt;/a&gt; on &lt;code&gt;tikv-server&lt;/code&gt;. This optimization process is called computing pushdown.&lt;/p&gt;

&lt;p&gt;The execution plan indicated that our SQL query did not do this. Why? We checked the TiDB documentation and learned that:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Usually, aggregate functions with the &lt;code&gt;DISTINCT&lt;/code&gt; option are executed in the TiDB layer in a single-threaded execution model.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This meant that our SQL statement couldn't use computing pushdown.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;github_events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="n"&gt;repo_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;41881900&lt;/span&gt;     &lt;span class="c1"&gt;-- vscode&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'IssuesEvent'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Therefore, we removed the &lt;code&gt;DISTINCT&lt;/code&gt; keyword.&lt;/p&gt;

&lt;p&gt;In the &lt;code&gt;github_events&lt;/code&gt; table, each issue generated only one event with the &lt;code&gt;IssuesEvent&lt;/code&gt; type and the &lt;code&gt;opened&lt;/code&gt; action. We could therefore get the total number of unique issues by adding the condition &lt;code&gt;action = 'opened'&lt;/code&gt;. This way, we didn't need the &lt;code&gt;DISTINCT&lt;/code&gt; keyword for deduplication.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT
    COUNT(number)
FROM github_events
WHERE
    repo_id = 41881900     -- vscode
    AND type = 'IssuesEvent'
    AND action = 'opened';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
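
&lt;p&gt;The equivalence of the two queries can be checked on a small sample. The sketch below runs both statements against an in-memory SQLite table through Python's &lt;code&gt;sqlite3&lt;/code&gt; module. The rows are made up for illustration, and the check only holds under the assumption stated above: every issue has exactly one &lt;code&gt;opened&lt;/code&gt; event.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE github_events ("
    "repo_id INTEGER, type TEXT, action TEXT, number INTEGER)"
)
# Made-up sample: issues 1 and 2 of one repository,
# each opened once and then touched by a later event.
conn.executemany(
    "INSERT INTO github_events VALUES (?, ?, ?, ?)",
    [
        (41881900, "IssuesEvent", "opened", 1),
        (41881900, "IssuesEvent", "closed", 1),
        (41881900, "IssuesEvent", "opened", 2),
        (41881900, "IssuesEvent", "reopened", 2),
        (41881900, "PushEvent", None, None),
    ],
)

# Original form: deduplicate issue numbers across all issue events.
with_distinct = conn.execute(
    "SELECT COUNT(DISTINCT number) FROM github_events "
    "WHERE repo_id = 41881900 AND type = 'IssuesEvent'"
).fetchone()[0]

# Rewritten form: count only the single 'opened' event of each issue.
without_distinct = conn.execute(
    "SELECT COUNT(number) FROM github_events "
    "WHERE repo_id = 41881900 AND type = 'IssuesEvent' AND action = 'opened'"
).fetchone()[0]

print(with_distinct, without_distinct)
```

&lt;p&gt;If an issue could produce more than one &lt;code&gt;opened&lt;/code&gt; event, the second query would over-count, so the rewrite depends on that property of the data.&lt;/p&gt;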



&lt;p&gt;The composite index we created lacked the &lt;code&gt;action&lt;/code&gt; column. This caused the query index coverage to fail. So we created a new composite index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE INDEX index_github_events_on_repo_id_type_action_number ON github_events(repo_id, type, action, number);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After we created the index, we checked the execution plan of the modified SQL statement through the &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; statement. We found that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Because we added a new filter &lt;code&gt;action='opened'&lt;/code&gt;, the number of rows to scan had decreased from 270,000 to 140,000.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tikv-server&lt;/code&gt; executed the &lt;code&gt;StreamAgg&lt;/code&gt; operator, which was the aggregate calculation of the &lt;code&gt;COUNT&lt;/code&gt; function. This indicated that the calculation had been pushed down to the TiKV coprocessor for execution.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tidb-server&lt;/code&gt; only needed to obtain two rows of data from &lt;code&gt;tikv-server&lt;/code&gt; through the network. This greatly reduced the amount of data transmitted.&lt;/li&gt;
&lt;li&gt;The query only took 123.6 ms.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-------------------------+---------+---------+-----------+-------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------+-----------+------+

| id                      | estRows | actRows | task      | access object                                                                                                           | execution info                                                                                                                                                                                                                                                                                                                                                           | operator info                                                                             | memory    | disk |

+-------------------------+---------+---------+-----------+-------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------+-----------+------+

| StreamAgg_28            | 1.00    | 1       | root      |                                                                                                                         | time:123.6ms, loops:2                                                                                                                                                                                                                                                                                                                                                    | funcs:count(Column#43)-&amp;gt;Column#34                                                         | 388 Bytes | N/A  |

| └─IndexReader_29        | 1.00    | 2       | root      | partition:issues_event                                                                                                  | time:123.6ms, loops:2, cop_task: {num: 2, max: 123.5ms, min: 1.5ms, avg: 62.5ms, p95: 123.5ms, max_proc_keys: 131360, p95_proc_keys: 131360, tot_proc: 115ms, tot_wait: 1ms, rpc_num: 2, rpc_time: 125ms, copr_cache_hit_ratio: 0.50, distsql_concurrency: 15}                                                                                                           | index:StreamAgg_11                                                                        | 590 Bytes | N/A  |

|   └─StreamAgg_11        | 1.00    | 2       | cop[tikv] |                                                                                                                         | tikv_task:{proc max:116ms, min:8ms, avg: 62ms, p80:116ms, p95:116ms, iters:139, tasks:2}, scan_detail: {total_process_keys: 131360, total_process_keys_size: 23603556, total_keys: 131564, get_snapshot_time: 1ms, rocksdb: {delete_skipped_count: 320, key_skipped_count: 131883, block: {cache_hit_count: 307, read_count: 1, read_byte: 63.9 KB, read_time: 60.2µs}}} | funcs:count(gharchive_dev.github_events.number)-&amp;gt;Column#43                                | N/A       | N/A  |

|     └─IndexRangeScan_15 | 7.00    | 141179  | cop[tikv] | table:github_events, index:index_ge_on_repo_id_type_action_created_at_number(repo_id, type, action, created_at, number) | tikv_task:{proc max:116ms, min:8ms, avg: 62ms, p80:116ms, p95:116ms, iters:139, tasks:2}                                                                                                                                                                                                                                                                                 | range:[41881900 "IssuesEvent" "opened",41881900 "IssuesEvent" "opened"], keep order:false | N/A       | N/A  |

+-------------------------+---------+---------+-----------+-------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------+-----------+------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
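
&lt;p&gt;The difference between the two plans comes down to where the counting happens. The toy model below is a sketch, not TiDB code: two Python lists stand in for two storage-side tasks, the row counts are arbitrary, and both function names are hypothetical.&lt;/p&gt;

```python
# Toy model: two lists stand in for two storage-side tasks; sizes are arbitrary.
regions = [
    [{"number": n} for n in range(1, 6)],  # 5 matching index entries
    [{"number": n} for n in range(6, 9)],  # 3 matching index entries
]


def count_without_pushdown(regions):
    """Ship every row to the computing node, then count there."""
    shipped = [row for region in regions for row in region]
    # Returns (result, rows sent over the network).
    return len(shipped), len(shipped)


def count_with_pushdown(regions):
    """Each storage task counts locally and ships one partial count."""
    partial_counts = [len(region) for region in regions]
    # The computing node only sums the partial counts it receives.
    return sum(partial_counts), len(partial_counts)


print(count_without_pushdown(regions))
print(count_with_pushdown(regions))
```

&lt;p&gt;Both functions return the same count, but the second one sends one value per task instead of one per row, which mirrors the drop to two rows received by &lt;code&gt;tidb-server&lt;/code&gt; in the plan above.&lt;/p&gt;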



&lt;h2&gt;
  
  
  Applying what we learned to other queries
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Through our analysis and optimizations, the query latency was significantly reduced:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1.11 s → 417.7 ms → 123.6 ms&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We applied what we learned to other queries and created the following composite indexes in the &lt;code&gt;github_events&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;index_ge_on_repo_id_type_action_pr_merged_created_at_add_del

index_ge_on_repo_id_type_action_created_at_number_pdsize_psize

index_ge_on_repo_id_type_action_created_at_actor_login

index_ge_on_creator_id_type_action_merged_created_at_add_del

index_ge_on_actor_id_type_action_created_at_repo_id_commits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These composite indexes covered more than 20 analytical queries in repository analysis and personal analysis pages on the OSS Insight website. This improved our website's overall loading speed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Some lessons we learned on MySQL can be applied throughout the optimization process.&lt;/strong&gt; But we need to consider more when we optimize query performance in a &lt;strong&gt;distributed database&lt;/strong&gt;. We also recommend you read &lt;a href="https://docs.pingcap.com/tidb/stable/performance-tuning-overview" rel="noopener noreferrer"&gt;Performance Tuning&lt;/a&gt; in the TiDB documentation. This will give you a more professional and comprehensive guide to performance optimization.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.pingcap.com/tidb/dev/tidb-computing" rel="noopener noreferrer"&gt;TiDB Computing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.pingcap.com/tidb/stable/tidb-storage" rel="noopener noreferrer"&gt;TiDB Storage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.pingcap.com/tidb/stable/agg-distinct-optimization" rel="noopener noreferrer"&gt;Distinct Optimization&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>javascript</category>
    </item>
    <item>
      <title>Open Source Highlights: Trends and Insights from GitHub 2022</title>
      <dc:creator>OSS Insight</dc:creator>
      <pubDate>Thu, 10 Nov 2022 09:23:33 +0000</pubDate>
      <link>https://forem.com/ossinsight/open-source-highlights-trends-and-insights-from-github-2022-1ajc</link>
      <guid>https://forem.com/ossinsight/open-source-highlights-trends-and-insights-from-github-2022-1ajc</guid>
      <description>&lt;p&gt;We analyzed more than 5,000,000,000 rows of GitHub event data and got the results here. In this &lt;a href="https://ossinsight.io/2022/"&gt;report&lt;/a&gt;, you'll get interesting findings about open source software on GitHub in 2022, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Top languages in the open source world over the past four years&lt;/li&gt;
&lt;li&gt;Geographic distribution of developer behavior&lt;/li&gt;
&lt;li&gt;Developer behavior distribution on weekdays and weekends&lt;/li&gt;
&lt;li&gt;Popular open source topics&lt;/li&gt;
&lt;li&gt;The most popular repositories in 2022&lt;/li&gt;
&lt;li&gt;The most active repositories over the past four years&lt;/li&gt;
&lt;li&gt;Who gave the most stars in 2022&lt;/li&gt;
&lt;li&gt;The most active developers since 2011&lt;/li&gt;
&lt;li&gt;Appendix&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Top languages in the open source world over the past four years
&lt;/h2&gt;

&lt;p&gt;This chart ranks programming languages yearly from 2019 to 2022 based on the ratio of new repositories using these languages to all new repositories.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LW1bRX9l--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vwoxjxwmttobhae5uuym.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LW1bRX9l--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vwoxjxwmttobhae5uuym.jpg" alt="Image description" width="880" height="1271"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Insights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python surpassed Java and moved to #3 in 2021.&lt;/li&gt;
&lt;li&gt;TypeScript rose from #10 to #6, and SCSS rose from #39 to #19. The rise of SCSS shows that open source projects that value front-end expressiveness are gradually gaining popularity.&lt;/li&gt;
&lt;li&gt;The two languages Ruby and R dropped a lot in ranking over the years.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Rankings of back-end programming languages
&lt;/h3&gt;

&lt;p&gt;The programming languages used in a pull request reflect which languages developers used. To find out the most popular back-end programming languages, we queried the distribution of programming languages by new pull requests from 2019 to 2022 and took the top 10 for each year.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cecf2xVN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l63bplds5ng1t0x8uw8z.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cecf2xVN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l63bplds5ng1t0x8uw8z.jpg" alt="Image description" width="880" height="1335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The chart data indicates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python and Java rank #1 and #2 respectively. In 2021, Go overtook Ruby to rank #3.&lt;/li&gt;
&lt;li&gt;Rust has been trending upward for several years, ranking #9 in 2022.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Geographic distribution of developer behavior
&lt;/h2&gt;

&lt;p&gt;We queried the number of various events that occurred throughout the world from January 1 to September 30, 2022 and identified the top 10 countries by the number of events triggered by developers in these countries. The chart displays the proportion of each event type by country or region.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5VhAZY0---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/j00djt4rpkmd0h7iz4mc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5VhAZY0---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/j00djt4rpkmd0h7iz4mc.jpg" alt="Image description" width="880" height="1188"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The chart shows that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The events triggered in the top 10 countries account for about 23.27% of all GitHub events. However, the number of developers from these countries is only 10%.&lt;/li&gt;
&lt;li&gt;US developers are most likely to review code, with a PullRequestReviewEvent share of 6.15%.&lt;/li&gt;
&lt;li&gt;Korean developers prefer pushing directly to repositories (PushEvent).&lt;/li&gt;
&lt;li&gt;Japanese developers are most likely to submit code via pull requests, with a PullRequestEvent share of 10%.&lt;/li&gt;
&lt;li&gt;German developers like to open issues and comment, with IssueEvent and CommentEvent accounting for 4.18% and 12.66% respectively.&lt;/li&gt;
&lt;li&gt;Chinese developers like to star repositories, with 17.23% for WatchEvent and 2.7% for ForkEvent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In 2022, 17,062,081 developers had behavioral events, and 2,923,523 of them had the Location field set, so the sampling rate is 17.13%.&lt;/li&gt;
&lt;li&gt;GitHub identifies 15 types of events. We only show commonly used types. Comment Event includes CommitCommentEvent, IssueCommentEvent, and PullRequestReviewCommentEvent. Others includes MemberEvent, CreateEvent, ReleaseEvent, GollumEvent, and PublicEvent.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Developer behavior distribution on weekdays and weekends
&lt;/h2&gt;

&lt;p&gt;We queried the distribution of each event type over the seven days of the week.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lkB034S5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sccq0ijru9u3dbve2oik.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lkB034S5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sccq0ijru9u3dbve2oik.png" alt="Image description" width="880" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Insights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developers are most active on weekdays, with 77.73% of events occurring on weekdays.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The distribution of specific events
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2BAnnGMN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6jmr2vz9bz0yfali5quf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2BAnnGMN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6jmr2vz9bz0yfali5quf.jpg" alt="Image description" width="880" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Insights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pull Request Event, Pull Request Review Event, and Issues Event all have the highest percentage on Tuesdays, while the lowest percentage is on the weekends.&lt;/li&gt;
&lt;li&gt;The amounts of Push Event, Watch Event, and Fork Event activity are similar on weekdays and weekends, while Pull Request Review Event activity differs the most. Watch Events and Fork Events are more personal behaviors, Pull Request Review Events are more work-related behaviors, and Push Events are used more in personal projects.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Popular open source topics
&lt;/h2&gt;

&lt;p&gt;Each year, technology introduces new buzzwords. Can we gain insight into technical trends through the open source repositories behind these buzzwords? We investigated five technical areas: Low Code, Web3, GitHub Actions, Database, and AI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Activity levels of popular topics
&lt;/h3&gt;

&lt;p&gt;We queried the number of open source repositories associated with each technical area, as well as the percentage of active repositories in 2022.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Z1Biti7j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cyct5xqy3xrgbw0ghjqq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Z1Biti7j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cyct5xqy3xrgbw0ghjqq.jpg" alt="Image description" width="880" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This figure shows that open source repositories in the Low Code topic are the most active, with 76.3% being active in 2022, followed by Web3 with 63.85%.&lt;/p&gt;

&lt;h3&gt;
  
  
  Popular topics over the years
&lt;/h3&gt;

&lt;p&gt;We queried the following items for each technical area from 2015 to 2022:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The annual increment of repositories&lt;/li&gt;
&lt;li&gt;The annual increment of collaborative events&lt;/li&gt;
&lt;li&gt;The number of developers participating in collaborative events&lt;/li&gt;
&lt;li&gt;The annual increment of stars&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then, we calculated the growth rate for each year which can reflect new entrants, developer engagement in this technical field, and the industry's interest in this area. For 2022, we compare its first nine months with the first nine months of 2021.&lt;/p&gt;
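
&lt;p&gt;A growth rate of this kind is commonly computed as the change relative to the previous period. The sketch below assumes that standard year-over-year formula. The report does not spell out its exact calculation, and the counts are made up for illustration.&lt;/p&gt;

```python
def growth_rate(previous, current):
    """Year-over-year growth in percent, relative to the previous period."""
    if previous == 0:
        raise ValueError("growth rate is undefined when the previous period is zero")
    return (current - previous) / previous * 100


# Made-up counts for illustration; they are not taken from the report.
new_repos = {2019: 200, 2020: 500, 2021: 450}
for year in (2020, 2021):
    rate = growth_rate(new_repos[year - 1], new_repos[year])
    print(year, f"{rate:.2f}%")
```

&lt;p&gt;A negative value means the metric shrank compared with the previous period, and a value above 100% means it more than doubled.&lt;/p&gt;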

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PafgDUR4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dogh8izzxjqamq9pigyt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PafgDUR4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dogh8izzxjqamq9pigyt.jpg" alt="Image description" width="880" height="639"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that 2020 is the peak period of project development, with a 313.43% increase in new repositories and a 157.06% increase in developer collaborative events. The industry's interest increased most significantly in 2021, reaching 184.82%. In 2022, the year-on-year growth data shows that the number of new repositories decreased (-26.21%), but developer engagement and industry interest are still rising.&lt;/p&gt;

&lt;p&gt;* Time range of 2022: 01.01-09.30, excluding bot events and forking repositories&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--C8th_Rez--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/q29zoyg1yrnginozasgx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--C8th_Rez--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/q29zoyg1yrnginozasgx.jpg" alt="Image description" width="880" height="635"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Whether measured by new repositories, developer engagement, or industry interest, the Web3 ecosystem has grown rapidly in recent years. The growth rate of new repositories peaked at 322.65% in 2021.&lt;/p&gt;

&lt;p&gt;* Time range of 2022: 01.01-09.30, excluding bot events and forked repositories&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0KveZFI3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kgfe1aub6372krp8q1ss.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0KveZFI3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kgfe1aub6372krp8q1ss.jpg" alt="Image description" width="880" height="653"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The annual increase of GitHub Actions repositories has been declining, but developer engagement and the industry's interest are still increasing slightly.&lt;/p&gt;

&lt;p&gt;* Time range of 2022: 01.01-09.30, excluding bot events and forked repositories&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--h_MkE_XQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hh7q4070knsdh8swc1jn.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--h_MkE_XQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hh7q4070knsdh8swc1jn.jpg" alt="Image description" width="880" height="638"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Databases are infrastructure software with a high barrier to entry. Compared with projects in other fields, database projects show a stable growth rate.&lt;/p&gt;

&lt;p&gt;* Time range of 2022: 01.01-09.30, excluding bot events and forked repositories&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RO03fSKS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/w886ml641c7md27d85i2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RO03fSKS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/w886ml641c7md27d85i2.jpg" alt="Image description" width="880" height="656"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After two years of high growth in 2016 and 2017, the growth of open source projects in AI has gradually slowed.&lt;/p&gt;

&lt;p&gt;* Time range of 2022: 01.01-09.30, excluding bot events and forked repositories&lt;/p&gt;

&lt;h2&gt;
  
  
  The most popular repositories in 2022
&lt;/h2&gt;

&lt;p&gt;The number of stars is the most visible indication of the popularity of open source projects. We looked at the 50 projects that received the most stars from January 1 to September 30, 2022. The chart below shows the results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XKuCUEm3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/g4otk0x1irty1yjclto0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XKuCUEm3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/g4otk0x1irty1yjclto0.png" alt="Image description" width="880" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;* Time range: 2022.01.01-2022.09.30, excluding bot events&lt;/p&gt;

&lt;h2&gt;
  
  
  The most active repositories over the past four years
&lt;/h2&gt;

&lt;p&gt;Here we looked up the 20 most active repositories of each year from 2019 to 2022 and counted how many times each repository made the list. Repositories are ranked by the number of developers participating in collaborative events.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;strong&gt;Repository Name&lt;/strong&gt;
   &lt;/td&gt;
   &lt;td&gt;
&lt;strong&gt;Count&lt;/strong&gt;
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;a href="https://ossinsight.io/analyze/microsoft/vscode"&gt;microsoft/vscode&lt;/a&gt;
   &lt;/td&gt;
   &lt;td&gt;4
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;a href="https://ossinsight.io/analyze/flutter/flutter"&gt;flutter/flutter&lt;/a&gt;
   &lt;/td&gt;
   &lt;td&gt;4
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;a href="https://ossinsight.io/analyze/MicrosoftDocs/azure-docs"&gt;MicrosoftDocs/azure-docs&lt;/a&gt;
   &lt;/td&gt;
   &lt;td&gt;4
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;a href="https://ossinsight.io/analyze/firstcontributions/first-contributions"&gt;firstcontributions/first-contributions&lt;/a&gt;
   &lt;/td&gt;
   &lt;td&gt;4
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;a href="https://ossinsight.io/analyze/Facebook/react-native"&gt;Facebook/react-native&lt;/a&gt;
   &lt;/td&gt;
   &lt;td&gt;4
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;a href="https://ossinsight.io/analyze/pytorch/pytorch"&gt;pytorch/pytorch&lt;/a&gt;
   &lt;/td&gt;
   &lt;td&gt;4
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;a href="https://ossinsight.io/analyze/microsoft/TypeScript"&gt;microsoft/TypeScript&lt;/a&gt;
   &lt;/td&gt;
   &lt;td&gt;4
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;a href="https://ossinsight.io/analyze/tensorflow/tensorflow"&gt;tensorflow/tensorflow&lt;/a&gt;
   &lt;/td&gt;
   &lt;td&gt;3
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;a href="https://ossinsight.io/analyze/kubernetes/kubernetes"&gt;kubernetes/kubernetes&lt;/a&gt;
   &lt;/td&gt;
   &lt;td&gt;3
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;a href="https://ossinsight.io/analyze/DefinitelyTyped/DefinitelyTyped"&gt;DefinitelyTyped/DefinitelyTyped&lt;/a&gt;
   &lt;/td&gt;
   &lt;td&gt;3
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;a href="https://ossinsight.io/analyze/golang/go"&gt;golang/go&lt;/a&gt;
   &lt;/td&gt;
   &lt;td&gt;3
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;a href="https://ossinsight.io/analyze/google/it-cert-automation-practice"&gt;google/it-cert-automation-practice&lt;/a&gt;
   &lt;/td&gt;
   &lt;td&gt;3
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;a href="https://ossinsight.io/analyze/home-assistant/core"&gt;home-assistant/core&lt;/a&gt;
   &lt;/td&gt;
   &lt;td&gt;3
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;a href="https://ossinsight.io/analyze/microsoft/PowerToys"&gt;microsoft/PowerToys&lt;/a&gt;
   &lt;/td&gt;
   &lt;td&gt;3
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;a href="https://ossinsight.io/analyze/microsoft/WSL"&gt;microsoft/WSL&lt;/a&gt;
   &lt;/td&gt;
   &lt;td&gt;3
   &lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Insights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Microsoft has the most repositories on the list, with five.&lt;/li&gt;
&lt;li&gt;tensorflow/tensorflow and kubernetes/kubernetes both dropped out of the top 20 in 2022 after three consecutive years on the list (2019 to 2021).&lt;/li&gt;
&lt;li&gt;&lt;p&gt;New to the 2022 list are archway-network/testnets, element-fi/elf-council-frontend, solana-labs/token-list, education/GitHubGraduation-2022, taozhiyu/TyProAction, NixOS/nixpkgs, rust-lang/rust.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Time range: 2022.01.01-2022.09.30, excluding bot events&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Who gave the most stars in 2022
&lt;/h2&gt;

&lt;p&gt;We queried the developers who gave the most stars in 2022, took the top 20, and filtered out accounts of suspected bots. If a developer's number of star events divided by the number of starred repositories is equal to or greater than 2, we suspect this user to be a bot.&lt;/p&gt;
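&lt;p&gt;The bot heuristic described above can be sketched as a simple ratio check. The function and parameter names are hypothetical; only the threshold of 2 comes from the text.&lt;/p&gt;

```python
def is_suspected_bot(star_events: int, starred_repos: int, threshold: float = 2.0) -> bool:
    """Flag an account whose star events per distinct starred repository reach the threshold."""
    if starred_repos == 0:
        # No starred repositories means there is nothing to judge.
        return False
    return star_events / starred_repos >= threshold


# Hypothetical accounts: (star events, distinct starred repositories).
print(is_suspected_bot(500, 480))  # stars almost every repository once
print(is_suspected_bot(900, 300))  # stars the same repositories repeatedly
```

&lt;p&gt;A ratio of 2 or more means the account starred the same repositories again and again on average, which is unusual behavior for a human.&lt;/p&gt;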

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UX0jkChq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vw730d44uxf2a5zcocm2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UX0jkChq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vw730d44uxf2a5zcocm2.png" alt="Image description" width="880" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As of September 30, 2022, the developer who starred the most repositories had starred a total of 37,228 repositories, an average of 136 repositories per day.&lt;/p&gt;

&lt;p&gt;* Time range: 2022.01.01-2022.09.30, excluding bot events&lt;/p&gt;

&lt;h2&gt;
  
  
  The most active developers since 2011
&lt;/h2&gt;

&lt;p&gt;We queried the top 20 most active developers per year since 2011. This time we didn't filter out bot events.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ngXl4r9G--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rqqf9bon8cw1z1lyvtgk.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ngXl4r9G--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rqqf9bon8cw1z1lyvtgk.jpg" alt="Image description" width="880" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We found that the percentage of bots keeps growing. Bots started to overtake humans in 2013 and reached over 95% in 2022.&lt;/p&gt;

&lt;h2&gt;
  
  
  Appendix
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Term description
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub events&lt;/strong&gt;: GitHub events are triggered by user actions, like starring a repository or pushing code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time range&lt;/strong&gt;: In this report, the data collection range of 2022 is from January 1, 2022 to September 30, 2022. When comparing data of 2022 with another year, we use year-on-year analysis, that is, we compare the same January-to-September period of both years.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bot events&lt;/strong&gt;: Bot-triggered events account for a growing percentage of GitHub events. However, these events are not the focus of this report. We filtered out most of the bot-initiated events by matching regular expressions.&lt;/li&gt;
&lt;/ul&gt;
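&lt;p&gt;Filtering bot accounts by regular expression might look like the sketch below. The report does not publish its actual expressions, so the pattern here is purely an assumption based on common naming habits of bot accounts, and the function name is hypothetical.&lt;/p&gt;

```python
import re

# Hypothetical pattern: the expressions actually used in the report are not published.
BOT_LOGIN = re.compile(r"(\[bot\]$|-bot$|^bot-|-robot$)", re.IGNORECASE)


def is_bot_login(login: str) -> bool:
    """Return True if the account name looks like a bot under the assumed pattern."""
    return BOT_LOGIN.search(login) is not None


for login in ["dependabot[bot]", "k8s-ci-robot", "octocat"]:
    print(login, is_bot_login(login))
```

&lt;p&gt;Name matching of this kind catches only bots that follow a naming convention, which is consistent with the report saying that most, not all, bot events were filtered out.&lt;/p&gt;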

&lt;h3&gt;
  
  
  How we classify technical fields by topics
&lt;/h3&gt;

&lt;p&gt;We classify repositories by exact matching and fuzzy matching on their topics. Exact matching means that one of the repository's topics is identical to the keyword. Fuzzy matching means that one of the repository's topics contains the keyword.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;strong&gt;Topic&lt;/strong&gt;
   &lt;/td&gt;
   &lt;td&gt;
&lt;strong&gt;Exact matching&lt;/strong&gt;
   &lt;/td&gt;
   &lt;td&gt;
&lt;strong&gt;Fuzzy matching&lt;/strong&gt;
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;strong&gt;GitHub Actions&lt;/strong&gt;
   &lt;/td&gt;
   &lt;td&gt;actions
   &lt;/td&gt;
   &lt;td&gt;github-action, gh-action
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;strong&gt;Low Code&lt;/strong&gt;
   &lt;/td&gt;
   &lt;td&gt;
   &lt;/td&gt;
   &lt;td&gt;low-code, lowcode, nocode, no-code
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;strong&gt;Web3&lt;/strong&gt;
   &lt;/td&gt;
   &lt;td&gt;
   &lt;/td&gt;
   &lt;td&gt;web3
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;strong&gt;Database&lt;/strong&gt;
   &lt;/td&gt;
   &lt;td&gt;db
   &lt;/td&gt;
   &lt;td&gt;database, databases
&lt;br&gt;
nosql, newsql, sql
&lt;br&gt;
mongodb, neo4j
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;strong&gt;AI&lt;/strong&gt;
   &lt;/td&gt;
   &lt;td&gt;ai, aiops, aiot
   &lt;/td&gt;
   &lt;td&gt;artificial-intelligence, machine-intelligence
&lt;br&gt;
computer-vision, image-processing, opencv, computervision, imageprocessing
&lt;br&gt;
voice-recognition, speech-recognition, voicerecognition, speechrecognition, speech-processing
&lt;br&gt;
machinelearning, machine-learning
&lt;br&gt;
deeplearning, deep-learning
&lt;br&gt;
transferlearning, transfer-learning
&lt;br&gt;
mlops
&lt;br&gt;
text-to-speech, tts, speech-synthesis, voice-synthesis
&lt;br&gt;
robot, robotics
&lt;br&gt;
sentiment-analysis
&lt;br&gt;
natural-language-processing, nlp
&lt;br&gt;
language-model, text-classification, question-answering, knowledge-graph, knowledge-base
&lt;br&gt;
gan, gans, generative-adversarial-network, generative-adversarial-networks
&lt;br&gt;
neural-network, neuralnetwork, neuralnetworks, neural-network, dnn
&lt;br&gt;
tensorflow
&lt;br&gt;
PyTorch
&lt;br&gt;
huggingface
&lt;br&gt;
transformers
&lt;br&gt;
seq2seq, sequence-to-sequence
&lt;br&gt;
data-analysis, data-science
&lt;br&gt;
object-detection, objectdetection
&lt;br&gt;
data-augmentation
&lt;br&gt;
classification
&lt;br&gt;
action-recognition
   &lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;
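&lt;p&gt;The two matching rules can be sketched as follows. This is an illustration only: the keywords are copied from the table above (the AI row is omitted for brevity), while the data structure and function name are hypothetical and not the report's actual implementation.&lt;/p&gt;

```python
from typing import Iterable, Set

# Field name mapped to (exact keywords, fuzzy keywords), copied from the table above.
FIELD_RULES = {
    "GitHub Actions": ({"actions"}, {"github-action", "gh-action"}),
    "Low Code": (set(), {"low-code", "lowcode", "nocode", "no-code"}),
    "Web3": (set(), {"web3"}),
    "Database": (
        {"db"},
        {"database", "databases", "nosql", "newsql", "sql", "mongodb", "neo4j"},
    ),
}


def classify(topics: Iterable[str]) -> Set[str]:
    """Return every technical field whose keywords match the repository topics."""
    normalized = [topic.lower() for topic in topics]
    fields = set()
    for field, (exact, fuzzy) in FIELD_RULES.items():
        # Exact matching: a topic is identical to a keyword.
        exact_hit = any(topic in exact for topic in normalized)
        # Fuzzy matching: a topic contains a keyword as a substring.
        fuzzy_hit = any(word in topic for topic in normalized for word in fuzzy)
        if exact_hit or fuzzy_hit:
            fields.add(field)
    return fields


print(classify(["github-actions", "ci"]))
print(classify(["mysql", "orm"]))
```

&lt;p&gt;Short keywords such as "db" are matched exactly because substring matching on them would pull in many unrelated topics.&lt;/p&gt;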

</description>
      <category>opensource</category>
      <category>github</category>
      <category>javascript</category>
      <category>programming</category>
    </item>
    <item>
      <title>GitHub Events Are Booming! Are Bots the Reason?</title>
      <dc:creator>OSS Insight</dc:creator>
      <pubDate>Wed, 03 Aug 2022 09:58:31 +0000</pubDate>
      <link>https://forem.com/ossinsight/github-events-are-booming-are-bots-the-reason-3kj3</link>
      <guid>https://forem.com/ossinsight/github-events-are-booming-are-bots-the-reason-3kj3</guid>
      <description>&lt;p&gt;The &lt;a href="https://ossinsight.io/"&gt;OSS Insight&lt;/a&gt; website displays the data changes of GitHub events in real time. GitHub events are activities triggered by user actions on GitHub, for example, commenting and forking a repository. &lt;strong&gt;In nearly seven weeks, GitHub events increased by about 150 million, from 4.7 billion to 4.85 billion.&lt;/strong&gt; GitHub events are booming!&lt;/p&gt;

&lt;p&gt;This post takes a close look at GitHub event trends, why GitHub events are surging, and whether GitHub's architecture can handle the increasing load.&lt;/p&gt;

&lt;h2&gt;
  
  
  Historical data analysis
&lt;/h2&gt;

&lt;p&gt;The OSS Insight database includes all the GitHub events since 2011. When we plot the number of events by year, we can see that since 2018 they have been increasing rapidly.&lt;/p&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YxHThgXV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/37a8mvdfph46garlap7j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YxHThgXV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/37a8mvdfph46garlap7j.png" alt="Image description" width="880" height="522"&gt;&lt;/a&gt; &lt;/p&gt;


&lt;center&gt;&lt;em&gt;GitHub event trending&lt;/em&gt;&lt;/center&gt;



&lt;p&gt;The figure below shows how long it took GitHub to accumulate each additional billion events.&lt;/p&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hlucuzWt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9h9oiy2at82zr9yoo0xr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hlucuzWt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9h9oiy2at82zr9yoo0xr.png" alt="Image description" width="880" height="487"&gt;&lt;/a&gt; &lt;/p&gt;


&lt;center&gt;&lt;em&gt;The time to reach a billion GitHub events&lt;/em&gt;&lt;/center&gt;



&lt;p&gt;&lt;strong&gt;It's taking less and less time for GitHub to generate 1 billion events. It took more than 6 years for the first billion events and only 13 months for the last billion!&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The secret behind the exponential growth of GitHub events
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.blog/2018-10-16-future-of-software/"&gt;GitHub Actions&lt;/a&gt; was released in October 2018. Since August 2019, it has &lt;a href="https://github.blog/2019-08-08-github-actions-now-supports-ci-cd/"&gt;supported continuous integration and continuous delivery (CI/CD)&lt;/a&gt;, and it has been free for open source projects. Therefore, projects hosted on GitHub can automate their own development workflows, and a large number of automation-related bot applications have appeared on GitHub Marketplace. Could GitHub events' data growth be related to these?&lt;/p&gt;

&lt;p&gt;To find the answer, we divided the events into data from humans and data from bots and plotted them with the following histogram. The blue columns represent the human data, and the yellow columns represent the bot data.&lt;/p&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uT0rZCtr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8zsoq71fg4edrvh6yfjf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uT0rZCtr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8zsoq71fg4edrvh6yfjf.png" alt="Image description" width="880" height="526"&gt;&lt;/a&gt; &lt;/p&gt;


&lt;center&gt;&lt;em&gt;Bot events vs. human events&lt;/em&gt;&lt;/center&gt;



&lt;p&gt;As you can see, the proportion of GitHub bot events has increased each year. In 2015, they were only 1.23% of all events. In early July of this year, they reached 13.2%. To show the data changes of bot events more clearly, we made the following line chart.&lt;/p&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VgJz_QWC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ji7fzj5hn4dzdxq04pk4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VgJz_QWC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ji7fzj5hn4dzdxq04pk4.png" alt="Image description" width="880" height="462"&gt;&lt;/a&gt; &lt;/p&gt;


&lt;center&gt;&lt;em&gt;Bot event trending&lt;/em&gt;&lt;/center&gt;



&lt;p&gt;This figure shows that since 2019, bot events have been growing faster than before. As &lt;a href="https://github.com/Mini256"&gt;Mini256&lt;/a&gt;, a TiDB community contributor, said in &lt;a href="https://ossinsight.io/blog/say-thanks-to-github-robots"&gt;Love, Code, and Robot — Explore robots in the world of code&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For now, rough statistics find that there are more than 95,620 bots on GitHub. The number doesn't seem like so much, but wait...&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;These 95 thousand bot accounts generated 603 million events. These events account for 12.82% of all public events on GitHub&lt;/strong&gt;, and these GitHub robots have served over 18 million open source repositories.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Bots are playing an increasingly important role on GitHub. Many projects are handing over automated work to bots. We expect that GitHub events will grow faster in the future.&lt;/p&gt;

&lt;h2&gt;
  
  
  When will GitHub reach 10 billion events?
&lt;/h2&gt;

&lt;p&gt;How many GitHub events will there be by the end of 2022? To make a prediction, we fitted curves to GitHub's historical data.&lt;/p&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--fVQ34Itu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kdxu05robkk2ale9yano.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fVQ34Itu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kdxu05robkk2ale9yano.png" alt="Image description" width="880" height="351"&gt;&lt;/a&gt; &lt;/p&gt;


&lt;center&gt;&lt;em&gt;Human event fit (left) vs. bot event fit (right)&lt;/em&gt;&lt;/center&gt;



&lt;p&gt;It's estimated that by the end of 2022, GitHub events will reach 5.36 billion.&lt;/p&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EV2JFPmv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6w8xhacdqmgug7w4gatb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EV2JFPmv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6w8xhacdqmgug7w4gatb.png" alt="Image description" width="880" height="693"&gt;&lt;/a&gt; &lt;/p&gt;


&lt;center&gt;&lt;em&gt;GitHub event prediction&lt;/em&gt;&lt;/center&gt;



&lt;p&gt;According to this prediction, GitHub events will exceed 10 billion in February 2025.&lt;/p&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--I5a6VARC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/63qbg6u0g0oa15pf0of6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--I5a6VARC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/63qbg6u0g0oa15pf0of6.png" alt="Image description" width="880" height="671"&gt;&lt;/a&gt; &lt;/p&gt;


&lt;center&gt;&lt;em&gt;GitHub events will exceed 10 billion in 2025&lt;/em&gt;&lt;/center&gt;



&lt;h2&gt;
  
  
  Can MySQL sharding support such a huge amount of data?
&lt;/h2&gt;

&lt;p&gt;GitHub uses MySQL as the main storage for all non-Git repository data. The rapid growth of data volume poses a great challenge to GitHub's high availability. In March 2022, GitHub had 3 service disruptions, each lasting 2-5 hours. &lt;a href="https://github.blog/2022-03-23-an-update-on-recent-service-disruptions/"&gt;The official investigation report&lt;/a&gt; shows the MySQL database caused the outages. During peak load periods, the load on the GitHub mysql1 database (the main database cluster in GitHub) increased. As a result, database access reached the maximum number of connections. This affected the performance of many GitHub services and features.&lt;/p&gt;

&lt;p&gt;In fact, over the past few years GitHub has optimized its databases. For example, it added clusters to support platform growth and partitioned the main database. But these improvements did not fundamentally solve the problem. In the near future, GitHub events will exceed 5 billion, or even 10 billion. Can MySQL sharding support such a data surge?&lt;/p&gt;

&lt;h2&gt;
  
  
  Data sources
&lt;/h2&gt;

&lt;p&gt;All the analysis data in this article comes from &lt;a href="https://ossinsight.io/"&gt;OSS Insight&lt;/a&gt;, a tool based on &lt;a href="https://en.pingcap.com/tidb-cloud/?utm_source=ossinsight&amp;amp;utm_medium=referral"&gt;TiDB&lt;/a&gt; to analyze and gain insights into GitHub events data.&lt;/p&gt;

&lt;p&gt;You can use it to easily get insights about developers and repositories based on billions of GitHub events. You can also get the latest and historical rankings and trends in technical fields.&lt;/p&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TiqSz-9U--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l1jo2fmu9bd98g4967rv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TiqSz-9U--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l1jo2fmu9bd98g4967rv.png" alt="Image description" width="880" height="356"&gt;&lt;/a&gt; &lt;/p&gt;


&lt;center&gt;&lt;em&gt;The OSS Insight website&lt;/em&gt;&lt;/center&gt;



</description>
      <category>github</category>
      <category>bot</category>
      <category>opensource</category>
      <category>database</category>
    </item>
    <item>
      <title>Deep Insights into Web Frameworks</title>
      <dc:creator>OSS Insight</dc:creator>
      <pubDate>Thu, 07 Jul 2022 05:46:30 +0000</pubDate>
      <link>https://forem.com/ossinsight/deep-insights-into-web-frameworks-3g43</link>
      <guid>https://forem.com/ossinsight/deep-insights-into-web-frameworks-3g43</guid>
      <description>&lt;p&gt;In this chapter, we will share with you some of the top Web Framework repos (WF repos) on GitHub in 2021 measured by different metrics including the number of stars, PRs, contributors, countries, regions and so on.&lt;/p&gt;

&lt;p&gt;Note:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You can move your cursor onto any of the repository bars/lines on the chart and get the exact number.&lt;/li&gt;
&lt;li&gt;The SQL commands above each chart are what we use on our TiDB Cloud to get the analytical results. Try those SQL commands by yourselves on TiDB Cloud with this 10-minute tutorial.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Star history of top Web Framework repos since 2011
&lt;/h2&gt;

&lt;p&gt;The number of stars is often used as a measure of a GitHub repository's popularity. We sort all web framework repositories on GitHub by their total number of stars since 2011. To visualize the results more intuitively, we show the top 10 web framework repositories in an interactive line chart.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rEaSphvF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9fezwl02vn07p37ylfav.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rEaSphvF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9fezwl02vn07p37ylfav.png" alt="Image description" width="880" height="416"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qXMedzKI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/j9wwwm3f29riun5n46g2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qXMedzKI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/j9wwwm3f29riun5n46g2.png" alt="Image description" width="880" height="416"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 10 most starred Web Framework repos in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--scpbP4qJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vvh62rb7vh7omuqpjhr6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--scpbP4qJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vvh62rb7vh7omuqpjhr6.png" alt="Image description" width="880" height="633"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 10 Web Framework repos with the most PRs in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--T2Q9bR7X--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ondk2ckejyslx2r87ef4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--T2Q9bR7X--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ondk2ckejyslx2r87ef4.png" alt="Image description" width="880" height="647"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 20 Web Framework repos with the highest YoY growth rate of stars in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--C-YxwuUh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9do4dl66fc173998coml.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--C-YxwuUh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9do4dl66fc173998coml.png" alt="Image description" width="880" height="698"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 10 Web Framework repos with the lowest YoY growth rate of stars in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BRT5UbzM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6zo2wqnqvava83yyslbt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BRT5UbzM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6zo2wqnqvava83yyslbt.png" alt="Image description" width="880" height="696"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 10 most used programming languages in Web Framework repos in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tDoVvvEB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rxyxempejxv4xtgu9h2q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tDoVvvEB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rxyxempejxv4xtgu9h2q.png" alt="Image description" width="880" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 10 countries/regions contributing the most to Web Framework repos in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9V6dEhBC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lqfzyvsfvazlakwnjrud.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9V6dEhBC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lqfzyvsfvazlakwnjrud.png" alt="Image description" width="880" height="491"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Rankings of Web Framework repos measured by Z-score in 2021
&lt;/h2&gt;

&lt;p&gt;Each of the analytical results displayed above is based on a single metric: stars, PRs, or contributors. Now, we will use the Z-score method to rank the WF repos on GitHub.&lt;/p&gt;

&lt;p&gt;This is the comprehensive ranking calculated by z-score:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zUBUW95C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/99xt2dxc88pgqspfryb7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zUBUW95C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/99xt2dxc88pgqspfryb7.png" alt="Image description" width="880" height="893"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;For more content and the specific SQL queries, visit the official website:&lt;/em&gt;&lt;br&gt;
&lt;a href="https://ossinsight.io/blog/deep-insight-into-web-framework-2021"&gt;https://ossinsight.io/blog/deep-insight-into-web-framework-2021&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>opensource</category>
      <category>github</category>
      <category>programming</category>
    </item>
    <item>
      <title>Deep Insights into Programming Languages</title>
      <dc:creator>OSS Insight</dc:creator>
      <pubDate>Mon, 04 Jul 2022 02:17:16 +0000</pubDate>
      <link>https://forem.com/ossinsight/deep-insights-into-programming-languages-4323</link>
      <guid>https://forem.com/ossinsight/deep-insights-into-programming-languages-4323</guid>
      <description>&lt;p&gt;In this chapter, we will share with you some of the top programming language repos (PL repos) on GitHub in 2021 measured by different metrics including the number of stars, PRs, contributors, countries, regions and so on.&lt;/p&gt;

&lt;p&gt;Note:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You can move your cursor onto any of the repository bars/lines on the chart and get the exact number.&lt;/li&gt;
&lt;li&gt;The SQL commands above each chart are what we use on TiDB Cloud to get the analytical results. Try those SQL commands yourself on TiDB Cloud with this 10-minute tutorial.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Star history of top PL repos since 2011
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EIdmwIhQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/r8kfgo25qpu4a2iv9r3m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EIdmwIhQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/r8kfgo25qpu4a2iv9r3m.png" alt="Image description" width="880" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 10 most starred PL repos in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---OO-xvNy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/r7y9ogjfled3y95ym40d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---OO-xvNy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/r7y9ogjfled3y95ym40d.png" alt="Image description" width="880" height="625"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 10 PL repos with the most PRs in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4cJ_2l06--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7hrp3eewhdqvdta8qhs1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4cJ_2l06--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7hrp3eewhdqvdta8qhs1.png" alt="Image description" width="880" height="629"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 9 PL repos with the highest YoY growth rate of stars in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4dnSMABp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mts6523xnnu30z7bbjw9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4dnSMABp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mts6523xnnu30z7bbjw9.png" alt="Image description" width="880" height="697"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 10 PL repos with the lowest YoY growth rate of stars in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---fuGkzcQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/61de9o28bbqaar0genz5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---fuGkzcQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/61de9o28bbqaar0genz5.png" alt="Image description" width="880" height="689"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top countries or regions contributing to OSS programming languages
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--P0c4AdPo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x8rili03sv4x1vuz069f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--P0c4AdPo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x8rili03sv4x1vuz069f.png" alt="Image description" width="880" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The rankings of PL repos measured by Z-score in 2021
&lt;/h2&gt;

&lt;p&gt;Each of the results above is based on a single metric: stars, PRs, or contributors. Now, we will use the Z-score method to rank PL repos on GitHub.&lt;/p&gt;

&lt;p&gt;This is the comprehensive ranking calculated by Z-score:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--y-TqZ4aq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cmqc4646tqyxk6qtxki2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--y-TqZ4aq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cmqc4646tqyxk6qtxki2.png" alt="Image description" width="880" height="905"&gt;&lt;/a&gt;&lt;/p&gt;
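
&lt;p&gt;As a rough illustration of the idea, the sketch below standardizes each metric as (value - mean) / standard deviation and ranks repos by the sum of the three scores. The summing rule, the function names, and the sample numbers are assumptions made for this example; the exact formula behind the chart may weight the metrics differently.&lt;/p&gt;

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return (x - mean) / std for each value; all zeros if std is 0."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def rank_by_z_score(repos, metrics=("stars", "prs", "contributors")):
    """Rank repos by the sum of their per-metric Z-scores, highest first."""
    names = list(repos)
    totals = {name: 0.0 for name in names}
    for metric in metrics:
        scores = z_scores([repos[name][metric] for name in names])
        for name, score in zip(names, scores):
            totals[name] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up numbers for three hypothetical repos (not taken from the chart).
sample = {
    "lang-a": {"stars": 9000, "prs": 1200, "contributors": 300},
    "lang-b": {"stars": 4000, "prs": 2500, "contributors": 450},
    "lang-c": {"stars": 1000, "prs": 200, "contributors": 40},
}
ranking = rank_by_z_score(sample)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

&lt;p&gt;Because every metric is put on the same scale first, a repo with fewer stars can still rank first if it is well above average in PRs and contributors.&lt;/p&gt;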

&lt;p&gt;&lt;em&gt;For more content and the specific SQL, visit the official website.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://ossinsight.io/blog/deep-insight-into-programming-languages-2021"&gt;https://ossinsight.io/blog/deep-insight-into-programming-languages-2021&lt;/a&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>github</category>
      <category>opensource</category>
      <category>database</category>
    </item>
    <item>
      <title>Love, Code, and Robot — Explore robots in the world of code</title>
      <dc:creator>OSS Insight</dc:creator>
      <pubDate>Fri, 01 Jul 2022 03:28:55 +0000</pubDate>
      <link>https://forem.com/ossinsight/love-code-and-robot-explore-robots-in-the-world-of-code-1ah8</link>
      <guid>https://forem.com/ossinsight/love-code-and-robot-explore-robots-in-the-world-of-code-1ah8</guid>
      <description>&lt;p&gt;When it comes to GitHub, we often see fake GitHub users who are always enthusiastic and active, giving timely feedback to project maintainers and contributors, and helping developers with tasks that can be automated. Yes, the next thing I want to discuss is something about GitHub bots.&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;In the OSSInsight project, we have developed a number of metrics to provide insight into open source projects. When developing some open source project metrics, we always consider excluding bot-generated actions or events from the metric calculation.&lt;/p&gt;

&lt;p&gt;However, we can't ignore the contribution of robots in the domain of open source, and it's important to shift our thinking and look at what bots are doing on GitHub.&lt;/p&gt;

&lt;p&gt;GitHub bots help developers do a lot of work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Issue triage and management. (For example: stale[bot], todo[bot])&lt;/li&gt;
&lt;li&gt;Code review, security audits, and quality inspection. (For example: snyk-bot)&lt;/li&gt;
&lt;li&gt;Format checking, such as ensuring that license agreements are signed or that commit messages are semantic. (For example: CLAassistant)&lt;/li&gt;
&lt;li&gt;Integration with third-party systems, including Jira, Slack, Jenkins and so on.&lt;/li&gt;
&lt;li&gt;Acting as an agent that helps contributors perform operations that require permissions on the repository. (For example: k8s-ci-bot, ti-chi-bot)&lt;/li&gt;
&lt;/ul&gt;
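
&lt;p&gt;The logins listed above hint at how bot events might be excluded from a metric. The sketch below is only a naming heuristic built from those examples: the pattern, the hand-kept allow list, and the actor_login field name are assumptions for illustration, not the rules OSS Insight actually applies.&lt;/p&gt;

```python
import re

# Hypothetical heuristic: logins ending in "[bot]", "-bot" or "_bot".
BOT_LOGIN = re.compile(r"(\[bot\]|[-_]bot)$")

# Bots whose login matches no pattern have to be listed by hand.
KNOWN_BOTS = {"claassistant"}


def is_bot(login):
    """Guess whether a GitHub login belongs to a bot account."""
    name = login.lower()
    return name in KNOWN_BOTS or BOT_LOGIN.search(name) is not None


def human_events(events):
    """Keep only events whose actor does not look like a bot."""
    return [e for e in events if not is_bot(e["actor_login"])]


events = [
    {"actor_login": "stale[bot]", "type": "IssuesEvent"},
    {"actor_login": "octocat", "type": "PullRequestEvent"},
    {"actor_login": "k8s-ci-bot", "type": "IssueCommentEvent"},
]
print(human_events(events))
```

&lt;p&gt;A name-based rule like this will miss bots with ordinary-looking logins, which is why a hand-maintained list is usually needed alongside it.&lt;/p&gt;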

&lt;h2&gt;
  
  
  History trends
&lt;/h2&gt;

&lt;p&gt;Looking at the historical data, we see that the number of GitHub bots has grown significantly faster since 2019 (on average, 20,000 new bots are created each year).&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--I1qFZZBw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yrpsx77byktxarenxrze.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--I1qFZZBw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yrpsx77byktxarenxrze.png" alt="Image description" width="880" height="490"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I looked into what happened in 2019 and found that GitHub invested heavily in its software development infrastructure (including bots) that year.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;On May 23, 2019, GitHub announced that it had acquired Dependabot (aka dependabot[bot]).&lt;/li&gt;
&lt;li&gt;On June 17, 2019, GitHub announced that it had acquired Pull Panda.&lt;/li&gt;
&lt;li&gt;On September 18, 2019, GitHub announced that it had acquired Semmle (the team that built lgtm-com[bot]).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That year, we human beings were amazed to discover that bots could find problems, submit PRs, wait for CI to test the code, complete merges, and comment on PRs on their own, without any human involvement. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OzVvGX_9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/a44qaudw33s8xg68iw39.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OzVvGX_9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/a44qaudw33s8xg68iw39.png" alt="Image description" width="880" height="808"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Rough statistics show that there are currently more than 95,620 bots on GitHub. That number doesn't seem like much, but wait...&lt;/p&gt;

&lt;p&gt;These 95 thousand bot accounts generated 603 million events, which account for 12.82% of all public events on GitHub. &lt;/p&gt;

&lt;p&gt;And these GitHub robots have served over 18 million open source repositories.&lt;/p&gt;
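
&lt;p&gt;To put these figures in perspective, the short calculation below derives two numbers that the statistics above imply: the total count of public events and the average number of events per bot account. It uses only the three figures quoted in this section; the variable names are made up for the example.&lt;/p&gt;

```python
# Figures quoted in this section.
BOT_ACCOUNTS = 95_620
BOT_EVENTS = 603_000_000
BOT_SHARE = 0.1282  # 12.82% of all public events

# If 603 million events are 12.82% of the total, the total follows by division.
implied_total_events = BOT_EVENTS / BOT_SHARE
events_per_bot = BOT_EVENTS / BOT_ACCOUNTS

print(f"implied total public events: {implied_total_events / 1e9:.2f} billion")
print(f"average events per bot account: {events_per_bot:,.0f}")
```

&lt;p&gt;The implied total is roughly 4.7 billion events, and the average works out to about 6,300 events per bot account.&lt;/p&gt;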

&lt;h2&gt;
  
  
  Case studies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Dependabot[bot]
&lt;/h3&gt;

&lt;p&gt;dependabot[bot] is a hard-working bot responsible for helping open source projects keep their dependencies up to date.&lt;/p&gt;

&lt;p&gt;By analyzing the times of dependabot's push commits, we found that it likes to start its busy week at 8:00 on Mondays (GMT).&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0-jkHESM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zxw3zz0sm0tuy7xujbbs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0-jkHESM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zxw3zz0sm0tuy7xujbbs.png" alt="Image description" width="880" height="442"&gt;&lt;/a&gt;&lt;/p&gt;
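
&lt;p&gt;A pattern like this can be found by bucketing push timestamps by weekday and hour. The sketch below assumes ISO 8601 timestamps in UTC, which is how GitHub events report created_at; the function name and sample values are invented for the example.&lt;/p&gt;

```python
from collections import Counter
from datetime import datetime, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def busiest_slot(timestamps):
    """Return ((weekday, hour), count) for the most common UTC slot."""
    slots = Counter()
    for ts in timestamps:
        # fromisoformat does not accept a trailing "Z" on older Pythons.
        moment = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        moment = moment.astimezone(timezone.utc)
        slots[(WEEKDAYS[moment.weekday()], moment.hour)] += 1
    return slots.most_common(1)[0]


# Invented sample: two pushes on Monday mornings, one on a Wednesday.
sample_pushes = [
    "2022-01-03T08:05:00Z",
    "2022-01-10T08:40:00Z",
    "2022-01-05T14:00:00Z",
]
print(busiest_slot(sample_pushes))
```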

&lt;p&gt;It is commendable that, after a series of log4j security vulnerabilities came to light, it promptly helped many Java repositories update the dependency to a secure version.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stale Bots
&lt;/h2&gt;

&lt;p&gt;Stale bots are a controversial class of robots. They are responsible for reminding maintainers to keep moving long-stale issues forward.&lt;/p&gt;

&lt;p&gt;Bad practice&lt;br&gt;
A Gatsby user:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I used to open GitHub issues to Gatsby to report bugs. Almost nothing was ever fixed and every few weeks I had to manually clickety-click to keep the issues alive because of the stale bot. Guess what I do now? I don't report bugs to Gatsby, and I recommend against using Gatsby in newer projects.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Best practice&lt;br&gt;
A NixOS user:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;IMO NixOS has the right stalebot settings 0. It was discussed thouroughly in the RFC, as to choose the right information text and other actions by the bot. For example, the bot will only mark the issue/PR as stale and will never close the issue or lock it. Issues are only ever closed by humans. The information text they came up with is quite a bit longer than the ansible one 1. I think this is a very important point when adding such a bot, otherwise the user will be left helpless.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To verify the above statements, we ran the following SQL query:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3qmUvYJ2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5p1y8am6n5mjkccqfyjr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3qmUvYJ2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5p1y8am6n5mjkccqfyjr.png" alt="Image description" width="880" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The following query shows that many issues in the gatsbyjs/gatsby repository have been closed by stale bots.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--oZ8k3A5J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ytpm3t0hq2bkgwjwgg00.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--oZ8k3A5J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ytpm3t0hq2bkgwjwgg00.png" alt="Image description" width="880" height="382"&gt;&lt;/a&gt;&lt;/p&gt;
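
&lt;p&gt;The same check can be expressed outside SQL. The sketch below computes, per repository, the share of closed issues that were closed by a stale bot. The event field names, the default bot login set, and the sample repos are assumptions for illustration and do not reproduce the query in the screenshots.&lt;/p&gt;

```python
from collections import defaultdict


def bot_closure_share(events, bot_logins=frozenset({"stale[bot]"})):
    """Per repo, fraction of issue-closing events performed by the given bots."""
    closed = defaultdict(int)
    closed_by_bot = defaultdict(int)
    for e in events:
        if e["type"] == "IssuesEvent" and e["action"] == "closed":
            closed[e["repo"]] += 1
            if e["actor_login"] in bot_logins:
                closed_by_bot[e["repo"]] += 1
    return {repo: closed_by_bot[repo] / total for repo, total in closed.items()}


# Hypothetical events for two made-up repositories.
sample_events = [
    {"repo": "example/site", "type": "IssuesEvent", "action": "closed", "actor_login": "stale[bot]"},
    {"repo": "example/site", "type": "IssuesEvent", "action": "closed", "actor_login": "stale[bot]"},
    {"repo": "example/site", "type": "IssuesEvent", "action": "closed", "actor_login": "alice"},
    {"repo": "example/distro", "type": "IssuesEvent", "action": "closed", "actor_login": "bob"},
    {"repo": "example/distro", "type": "IssuesEvent", "action": "opened", "actor_login": "carol"},
]
print(bot_closure_share(sample_events))
```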

&lt;p&gt;I think it is necessary to distinguish between what should be done by robots and what must be done with human involvement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Weird bots
&lt;/h2&gt;

&lt;p&gt;There are some weird bots on GitHub that don't help people work and learn on GitHub, but rather act as data movers.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qElt2Vn8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4ti4ez52jvkdyhybzt1r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qElt2Vn8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4ti4ez52jvkdyhybzt1r.png" alt="Image description" width="880" height="902"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some of them will use GitHub as a free place to archive their data, for example, speedtracker-bot, newstools.&lt;/li&gt;
&lt;li&gt;Some of them will periodically upload a timestamp to the code repository as a commit, for example, keihin00174.&lt;/li&gt;
&lt;li&gt;Some are even crazier and you can't even access their profile pages because the number of events generated is so large that GitHub's database can't process them quickly, for example, mhutchinson-witness, direwolf-github.
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dyWHvQ9q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/is95hss8wf3aw1y6gagk.png" alt="Image description" width="880" height="496"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;For more content and the specific SQL, visit the official website.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://ossinsight.io/blog/say-thanks-to-github-robots"&gt;https://ossinsight.io/blog/say-thanks-to-github-robots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>github</category>
      <category>opensource</category>
      <category>programming</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Build a Better GitHub Insight Tool in a Week? A True Story</title>
      <dc:creator>OSS Insight</dc:creator>
      <pubDate>Mon, 27 Jun 2022 05:31:01 +0000</pubDate>
      <link>https://forem.com/ossinsight/build-a-better-github-insight-tool-in-a-week-a-true-story-14en</link>
      <guid>https://forem.com/ossinsight/build-a-better-github-insight-tool-in-a-week-a-true-story-14en</guid>
      <description>&lt;p&gt;In early January 2022, Max, our CEO, a big fan of open-source, asked if my team could build a small tool to help us understand all the open-source projects on GitHub; and, that if everything worked well, we should open the API to help open source developers to build better insights. In fact, GitHub continuously publishes the public events in its open-source world through the open API. (Thank you and well done! Github). We can certainly learn a lot from the data!&lt;/p&gt;

&lt;p&gt;I was excited about this project until Max said: “You’ve only got one week.” Well, the boss is the boss! Although time was tight and we were faced with multiple headaches, I decided to take up this challenge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Headache 1: we need both historical and real-time data.
&lt;/h2&gt;

&lt;p&gt;After some quick research, we found GH Archive, an open-source project that collects and archives all GitHub data from 2011 and updates it hourly. By the way, a lot of open-source analytical tools such as CNCF's Devstats rely on GH Archive, too.&lt;/p&gt;

&lt;p&gt;Thanks to GH Archive, we found the data source.&lt;/p&gt;

&lt;p&gt;But there's another problem: hourly data is good, but not good enough. We wanted our data to be updated in real time—or at least near real time. We decided to directly use the GitHub event API, which collects all events that have occurred within the past hour.&lt;/p&gt;

&lt;p&gt;By combining the data from GH Archive and the GitHub event API, we get streaming, real-time event updates.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--57CZZa50--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3ze3ktxbono73vm3t88n.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--57CZZa50--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3ze3ktxbono73vm3t88n.gif" alt="Image description" width="582" height="451"&gt;&lt;/a&gt;&lt;/p&gt;
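
&lt;p&gt;One way to picture the combination: load the hourly archive files for history, poll the event API for the most recent events, and de-duplicate by event ID where the two overlap. The sketch below shows only the URL construction and the merge step. The URL pattern reflects the GH Archive download layout as we understand it, and the function names are hypothetical; this is not the loader OSS Insight runs.&lt;/p&gt;

```python
from datetime import datetime, timezone


def gharchive_url(hour):
    """URL of the GH Archive dump for one UTC hour (the hour is not zero-padded)."""
    return f"https://data.gharchive.org/{hour:%Y-%m-%d}-{hour.hour}.json.gz"


def merge_events(archived, live):
    """Union of both sources, de-duplicated by event id, ordered by creation time."""
    by_id = {}
    for event in list(archived) + list(live):
        # Keep the first copy seen; later duplicates of the same id are dropped.
        by_id.setdefault(event["id"], event)
    return sorted(by_id.values(), key=lambda e: e["created_at"])


# Invented events: id "2" appears in both the archive and the live feed.
archived = [
    {"id": "1", "created_at": "2022-01-03T08:00:00Z"},
    {"id": "2", "created_at": "2022-01-03T08:59:00Z"},
]
live = [
    {"id": "2", "created_at": "2022-01-03T08:59:00Z"},
    {"id": "3", "created_at": "2022-01-03T09:01:00Z"},
]
merged = merge_events(archived, live)
print(gharchive_url(datetime(2022, 1, 3, 8, tzinfo=timezone.utc)))
print([e["id"] for e in merged])
```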

&lt;h2&gt;
  
  
  Headache 2: the data is huge!
&lt;/h2&gt;

&lt;p&gt;After we decompressed all the data from GH Archive, we found there were more than 4.6 billion rows of GitHub events. That’s a lot of data! We also noticed that about 300,000 rows were generated and updated each hour.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WKasTBKs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0p25ugmc993s4nb6t7za.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WKasTBKs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0p25ugmc993s4nb6t7za.png" alt="Image description" width="880" height="202"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Choosing a database would be tricky here. Our goal is to build an application that provides real-time data insights based on a continuously growing dataset. So, scalability is a must. NoSQL databases can provide good scalability, but then the question is how to handle complex analytical queries. Unfortunately, NoSQL databases are not good at that.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9Lg6YOqG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4no8nl4cton0hop0j4dw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9Lg6YOqG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4no8nl4cton0hop0j4dw.jpg" alt="Image description" width="880" height="1359"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another option is to use an OLAP database such as ClickHouse. ClickHouse can handle the analytical workload very well, but it is not designed for serving online traffic. If we chose it, we would need another database for the online traffic.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zW1OGb2M--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5llql0rj0a9752dt2suw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zW1OGb2M--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5llql0rj0a9752dt2suw.jpg" alt="Image description" width="880" height="1359"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What about sharding the database and then building an extract, transform, load (ETL) pipeline to synchronize the new events to a data warehouse? This sounds workable.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LXL7kBFj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3du48ofhuo7xspd4r8sw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LXL7kBFj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3du48ofhuo7xspd4r8sw.png" alt="Image description" width="880" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;According to our product manager's (PM’s) plan, we needed to do some repo-specific or user-specific analysis. Although the total data volume was huge, the number of events was not too large for a single project or user. This meant using the secondary indexes in RDBMS would be a good idea. But, if we decided to use the above architecture, we had to be careful in selecting the database sharding key. For example, if we use user_id as the sharding key, then queries based on repo_id will be very tricky.&lt;/p&gt;
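
&lt;p&gt;The sharding-key problem can be made concrete with a toy model. In the sketch below, rows are placed on a shard by taking the shard key modulo the shard count; a lookup on the shard key touches one shard, while a lookup on any other column has to fan out to all of them. The function names and the modulo placement rule are assumptions for illustration.&lt;/p&gt;

```python
def shard_for(key, shard_count):
    """Toy placement rule: shard number is the key modulo the shard count."""
    return key % shard_count


def shards_to_query(filter_column, value, shard_key, shard_count):
    """Shards a lookup must touch: one if filtering on the shard key, else all."""
    if filter_column == shard_key:
        return [shard_for(value, shard_count)]
    # The value could live on any shard, so every shard has to be scanned.
    return list(range(shard_count))


# With user_id as the shard key and 8 shards:
print(shards_to_query("user_id", 42, "user_id", 8))  # a single shard
print(shards_to_query("repo_id", 42, "user_id", 8))  # all 8 shards
```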

&lt;p&gt;Another requirement from the PM was that our insight tool should provide an open API, which meant we would have unpredictable concurrent traffic from the outside world.&lt;/p&gt;

&lt;p&gt;Since we're not experts on Kafka and data warehouses, mastering and building such an infrastructure in just one week was a very difficult task for us.&lt;/p&gt;

&lt;p&gt;The choice is obvious now, and don't forget PingCAP is a database company! TiDB seems a perfect fit for this, and it's a good chance to eat our own dog food. So, why not use TiDB? :)&lt;/p&gt;

&lt;p&gt;If we use TiDB, can we get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQL support, including complex &amp;amp; flexible queries? ☑️&lt;/li&gt;
&lt;li&gt;Scalability? ☑️&lt;/li&gt;
&lt;li&gt;Secondary index support for fast lookup? ☑️&lt;/li&gt;
&lt;li&gt;Capability for online serving? ☑️&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Wow! It seems we've got a winner!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bYDnHcHb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3e2dw1a36o34z4oen2kc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bYDnHcHb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3e2dw1a36o34z4oen2kc.png" alt="Image description" width="880" height="239"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We think TiDB is a great choice of database for an application like OSS Insight. Plus, its simplified technology stack means a faster go-to-market and faster delivery of my boss' assignment.&lt;/p&gt;

&lt;p&gt;After we used TiDB, we got a simplified architecture as shown below.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sqqnMcpn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bk63v2goi0kbg3ppgb5u.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sqqnMcpn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bk63v2goi0kbg3ppgb5u.jpg" alt="Image description" width="880" height="622"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Headache 3: We have a "pushy" PM!
&lt;/h2&gt;

&lt;p&gt;Just as the subtitle indicates, we have a very “pushy” PM, which is not always a bad thing. :) His demands kept extending, from the single project analysis at the very beginning to the comparison and ranking of multiple repositories, and to other multidimensional analysis such as the geographical distribution of stargazers and contributors. What’s more pressing was that the deadlines stayed unchanged!!!&lt;/p&gt;

&lt;p&gt;We had to keep a balance between the growing demands and the tight deadlines.&lt;/p&gt;

&lt;p&gt;To save time, we built our website using Docusaurus, a scalable open source static site generator built with React, rather than building a site from scratch. We also used Apache ECharts, a powerful charting library, to turn analytical results into good-looking and easy-to-understand charts.&lt;/p&gt;

&lt;p&gt;We chose TiDB as the database to support our website, and it perfectly supports SQL. This way, our back-end engineers could write SQL commands to handle complex and flexible analytical queries with ease and efficiency. Then, our front-end engineers would just need to display those SQL execution results in the form of good-looking charts.&lt;/p&gt;

&lt;p&gt;Finally, we made it. We prototyped our tool in just one week, and named it OSS Insight, short for open source software insights. We continued to fine-tune it, and it was officially released on May 3.&lt;/p&gt;

&lt;h2&gt;
  
  
  How we deal with analytical queries with SQL
&lt;/h2&gt;

&lt;p&gt;Let's use one example to show you how we deal with complex analytical queries.&lt;/p&gt;

&lt;h3&gt;
  
  
  Analyze a GitHub collection: JavaScript frameworks
&lt;/h3&gt;

&lt;p&gt;OSS Insight can analyze popular GitHub collections by many metrics including the number of stars, issues, and contributors. Let’s identify which JavaScript framework has the most issue creators. This is an analytical query that includes aggregation and ranking. To get the result, we only need to execute one SQL statement:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--st0zFw05--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8aa5zomnnw9xcloeqraj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--st0zFw05--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8aa5zomnnw9xcloeqraj.png" alt="Image description" width="880" height="591"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the statement above, the collections and collection_items tables store the data of all GitHub repository collections in various areas. Each table has 30 rows. To get the ranking of issue creators, we need to join the repository ID in the collection_items table with the 4.6-billion-row github_events table, as shown below.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---0fd57cE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4ixchvbq3nisrkxp5acs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---0fd57cE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4ixchvbq3nisrkxp5acs.png" alt="Image description" width="880" height="535"&gt;&lt;/a&gt;&lt;/p&gt;
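
&lt;p&gt;For readers who cannot read the screenshots, here is a small stand-in for that kind of query. It runs against an in-memory SQLite database rather than TiDB, and everything except the collection_items and github_events table names is assumed: the column names, the sample rows, and the exact filter conditions are illustrative only.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE collection_items (collection_id INTEGER, repo_id INTEGER, repo_name TEXT);
CREATE TABLE github_events (repo_id INTEGER, type TEXT, action TEXT, actor_id INTEGER);
""")
conn.executemany("INSERT INTO collection_items VALUES (?, ?, ?)", [
    (1, 10, "example/framework-a"),
    (1, 20, "example/framework-b"),
])
conn.executemany("INSERT INTO github_events VALUES (?, ?, ?, ?)", [
    (10, "IssuesEvent", "opened", 1),
    (10, "IssuesEvent", "opened", 2),
    (10, "IssuesEvent", "opened", 2),  # same creator again, counted once
    (10, "WatchEvent", "started", 3),  # not an issue, ignored
    (20, "IssuesEvent", "opened", 1),
])

# Join the small collection table to the large events table,
# then count distinct issue creators per repository and rank them.
QUERY = """
SELECT ci.repo_name, COUNT(DISTINCT ge.actor_id) AS issue_creators
FROM collection_items AS ci
JOIN github_events AS ge ON ge.repo_id = ci.repo_id
WHERE ci.collection_id = ?
  AND ge.type = 'IssuesEvent'
  AND ge.action = 'opened'
GROUP BY ci.repo_name
ORDER BY issue_creators DESC
"""
rows = conn.execute(QUERY, (1,)).fetchall()
for repo_name, issue_creators in rows:
    print(repo_name, issue_creators)
```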

&lt;p&gt;Next, let's look at the execution plan. TiDB is compatible with MySQL syntax, so its execution plan looks very similar to that of MySQL.&lt;/p&gt;

&lt;p&gt;In the figure below, notice the parts in red boxes. The data in the table collection_items is read through distributed[row], which means this data is processed by TiDB’s row storage engine, TiKV. The data in the table github_events is read through distributed[column], which means this data is processed by TiDB’s columnar storage engine, TiFlash. TiDB uses both row and columnar storage engines to execute the same SQL statement. This is so convenient for OSS Insight because it doesn’t have to split the query into two statements.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pBKpLq-x--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6ko4jvhya4a2407xmhkk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pBKpLq-x--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6ko4jvhya4a2407xmhkk.png" alt="Image description" width="880" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;TiDB returns the following result:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HrFDnS5t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x15zbejtwj2jd62fekg6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HrFDnS5t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x15zbejtwj2jd62fekg6.png" alt="Image description" width="880" height="593"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, we just need to render the result as a chart with Apache ECharts, as shown below.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0jjZA6-b--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x848qaggabkcu5yztost.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0jjZA6-b--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x848qaggabkcu5yztost.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;/p&gt;
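
&lt;p&gt;The hand-off from SQL result to chart is mostly a reshaping step. The sketch below turns (name, count) rows into a bar-chart option object in the shape ECharts expects (xAxis, yAxis, series) and serializes it as JSON for the front end. The helper name and sample rows are made up, and the real site may build its options differently.&lt;/p&gt;

```python
import json


def bar_chart_option(rows, title):
    """Build an ECharts-style bar chart option from (name, count) rows."""
    names = [name for name, _ in rows]
    counts = [count for _, count in rows]
    return {
        "title": {"text": title},
        "tooltip": {},
        "xAxis": {"type": "category", "data": names},
        "yAxis": {"type": "value"},
        "series": [{"type": "bar", "data": counts}],
    }


# Invented query result, already sorted by the database.
result_rows = [("example/framework-a", 2), ("example/framework-b", 1)]
option = bar_chart_option(result_rows, "Issue creators")
option_json = json.dumps(option)
print(option_json)
```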

&lt;p&gt;Note: You can click the REQUEST INFO on the upper right side of each chart to get the SQL command for each result.&lt;/p&gt;

&lt;h2&gt;
  
  
  Feedback: People love it!
&lt;/h2&gt;

&lt;p&gt;Since we released OSS Insight on May 3, we have received loud applause on social media, via emails and private messages, from many developers, engineers, researchers, and people who are passionate about the open source community in various companies and industries.&lt;/p&gt;

&lt;p&gt;I am more than excited and grateful that so many people find OSS Insight interesting, helpful, and valuable. I am also proud that my team made such a wonderful product in such a short time. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qhaElKxH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mpkfeea6li9bfhb4zucx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qhaElKxH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mpkfeea6li9bfhb4zucx.png" alt="Image description" width="880" height="1291"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bwrXhVES--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/76wo0q6sxpq3ox0ojnqk.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bwrXhVES--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/76wo0q6sxpq3ox0ojnqk.jpg" alt="Image description" width="880" height="1164"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons learned
&lt;/h2&gt;

&lt;p&gt;Looking back at how we built this website, we learned several valuable lessons.&lt;/p&gt;

&lt;p&gt;First, quick doesn’t mean dirty, as long as we make the right choices. Building an insight tool in just one week is tricky, but thanks to wonderful, ready-made open source projects such as TiDB, Docusaurus, and ECharts, we built it efficiently without compromising quality.&lt;/p&gt;

&lt;p&gt;Second, it’s crucial to select the right database, especially one that supports SQL. TiDB is a distributed SQL database with great scalability that can handle both transactional and real-time analytical workloads. With its help, we can process billions of rows of data with ease and use SQL commands to execute complicated real-time queries. Furthermore, using TiDB means we can leverage its resources to go to market faster and get feedback promptly.&lt;/p&gt;

&lt;p&gt;If you like our project or are interested in joining us, you’re welcome to submit your PRs to our GitHub repository. You can also follow us on Twitter for the latest information.  &lt;/p&gt;

&lt;p&gt;&lt;em&gt;For more content and the specific SQL, visit the official website.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://ossinsight.io/blog/why-we-choose-tidb-to-support-ossinsight"&gt;https://ossinsight.io/blog/why-we-choose-tidb-to-support-ossinsight&lt;/a&gt;&lt;/p&gt;

</description>
      <category>github</category>
      <category>database</category>
      <category>beginners</category>
      <category>programming</category>
    </item>
    <item>
      <title>Deep Insight Into Open Source Databases</title>
      <dc:creator>OSS Insight</dc:creator>
      <pubDate>Tue, 21 Jun 2022 00:37:16 +0000</pubDate>
      <link>https://forem.com/ossinsight/deep-insight-into-open-source-databases-5dcd</link>
      <guid>https://forem.com/ossinsight/deep-insight-into-open-source-databases-5dcd</guid>
      <description>&lt;p&gt;On this page, we share deep insights into open source databases, such as database popularity, database contributors, coding vitality, community feedback, and so on.&lt;/p&gt;

&lt;p&gt;We’ll also share the SQL commands that generate these analytical results above each chart, so you can run them yourself on TiDB Cloud by following this 10-minute tutorial.&lt;/p&gt;

&lt;h2&gt;
  
  
  Database Popularity
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The popularity trend in the past ten years
&lt;/h3&gt;

&lt;p&gt;The chart below displays the cumulative number of stars each open source database had gained by each year, and its star growth trend over the past ten years.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--s2wiVs7c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gyq1mh47cjf0qet6g21h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--s2wiVs7c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gyq1mh47cjf0qet6g21h.png" alt="Image description" width="880" height="434"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/f95516c3-0e20-40ee-bb2d-7d396d1bdeac"&gt;https://api.ossinsight.io/share/f95516c3-0e20-40ee-bb2d-7d396d1bdeac&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Which databases experienced a popularity boom in 2021?
&lt;/h3&gt;

&lt;p&gt;The chart below displays the top 10 open source databases with the highest year-over-year growth rate of stars in 2021. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Kux6rvli--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/v62mb03kqugk77hks760.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Kux6rvli--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/v62mb03kqugk77hks760.png" alt="Image description" width="880" height="688"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/a317f2d7-e43a-4486-b1df-e4db84ab47a3"&gt;https://api.ossinsight.io/share/a317f2d7-e43a-4486-b1df-e4db84ab47a3&lt;/a&gt;&lt;/p&gt;
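
&lt;p&gt;As a rough illustration of the metric behind this ranking, here is a minimal Python sketch. It assumes the year-over-year growth rate is the change in stars gained between two consecutive years divided by the earlier year's figure; the exact formula used for the chart is not given here, and the repository names and numbers are made up.&lt;/p&gt;

```python
def yoy_growth_rate(previous_year, current_year):
    """Relative change between two yearly star counts.

    Returns None when the previous year is zero, because the
    rate is undefined in that case.
    """
    if previous_year == 0:
        return None
    return (current_year - previous_year) / previous_year


# Hypothetical (stars gained in 2020, stars gained in 2021) per repository.
stars_by_repo = {
    "db-a": (1000, 1500),
    "db-b": (200, 800),
    "db-c": (5000, 5100),
}

# Compute the rate for every repository, skipping undefined rates.
growth = {}
for name, (prev, cur) in stars_by_repo.items():
    rate = yoy_growth_rate(prev, cur)
    if rate is not None:
        growth[name] = rate

# Highest growth first.
ranked = sorted(growth, key=growth.get, reverse=True)
print(ranked)
```

&lt;p&gt;Note that a small repository can top such a ranking with relatively few new stars, which is why the growth-rate charts and the absolute star charts can look very different.&lt;/p&gt;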

&lt;h3&gt;
  
  
  Which databases barely gained influence in 2021?
&lt;/h3&gt;

&lt;p&gt;The chart below displays the 10 open source databases with the lowest year-over-year growth rate of stars in 2021.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---i7hGGwC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l1bkwnjttlga1m0k5sdz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---i7hGGwC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l1bkwnjttlga1m0k5sdz.png" alt="Image description" width="880" height="709"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/19ceab26-534d-4512-8be7-bbadb5015d52"&gt;https://api.ossinsight.io/share/19ceab26-534d-4512-8be7-bbadb5015d52&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Which databases were the new favorites in 2021?
&lt;/h3&gt;

&lt;p&gt;The chart below displays the top open source databases that gained the most stars in 2021. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cRzMF5Q7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kvlcvptbejjlo7ezv2k8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cRzMF5Q7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kvlcvptbejjlo7ezv2k8.png" alt="Image description" width="880" height="640"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/297bd1d9-4a81-4cd3-ada1-88a6bd5c07c9"&gt;https://api.ossinsight.io/share/297bd1d9-4a81-4cd3-ada1-88a6bd5c07c9&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Which countries &amp;amp; regions favor databases the most?
&lt;/h3&gt;

&lt;p&gt;The map below shows the geographical distribution of database stargazers. The larger and darker a spot on the map, the more database stargazers are located there. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--omEvyAHp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3hm0nwbot0cdy0vcm5cu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--omEvyAHp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3hm0nwbot0cdy0vcm5cu.png" alt="Image description" width="880" height="508"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/a6b4597b-468a-4ac8-9759-771d0aa9cdd9"&gt;https://api.ossinsight.io/share/a6b4597b-468a-4ac8-9759-771d0aa9cdd9&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Which companies like databases the most?
&lt;/h3&gt;

&lt;p&gt;The pie chart below shows which companies database stargazers work for and how many stargazers each company employs. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BoJjHWQf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4alqvst8gqv12c504gsz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BoJjHWQf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4alqvst8gqv12c504gsz.png" alt="Image description" width="880" height="505"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/b9526907-27b4-4528-802a-0162af55f133"&gt;https://api.ossinsight.io/share/b9526907-27b4-4528-802a-0162af55f133&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Database contributors
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Which countries &amp;amp; regions led the database contributions in 2021?
&lt;/h3&gt;

&lt;p&gt;The map below shows the geographical distribution of developers who pushed commits, resolved issues, or submitted pull requests to open source databases in 2021. The larger and darker a spot on the map, the more database contributors were located there. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mOK0xSHA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/f2jd31x8tpbfuobi10qf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mOK0xSHA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/f2jd31x8tpbfuobi10qf.png" alt="Image description" width="880" height="488"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/7d1bdd66-a39f-4556-80ad-6eab2e66b966"&gt;https://api.ossinsight.io/share/7d1bdd66-a39f-4556-80ad-6eab2e66b966&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  When did developers contribute?
&lt;/h3&gt;

&lt;p&gt;The heat map below shows the number of push events for each day and hour (UTC). Each colored box indicates the number of push events in that slot: the lighter the color, the fewer push events; the darker the color, the more push events. From this heat map, you can see when contributors are busiest and roughly infer which countries or regions have the most contributors.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iulwsjZm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7lnnpqht02vfkpr25cek.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iulwsjZm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7lnnpqht02vfkpr25cek.png" alt="Image description" width="880" height="456"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/d47c6f1c-39a4-4cf5-9adf-c714bb497e4d"&gt;https://api.ossinsight.io/share/d47c6f1c-39a4-4cf5-9adf-c714bb497e4d&lt;/a&gt;&lt;/p&gt;
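
&lt;p&gt;The bucketing behind a heat map like this can be sketched in a few lines of Python. This is only an illustration, not the query used for the chart: it assumes each push event carries a timezone-aware timestamp, and the sample events are made up.&lt;/p&gt;

```python
from collections import Counter
from datetime import datetime, timezone


def push_heatmap(timestamps):
    """Count events per (weekday, hour) slot in UTC.

    weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    Timestamps are assumed to be timezone-aware.
    """
    buckets = Counter()
    for ts in timestamps:
        utc = ts.astimezone(timezone.utc)
        buckets[(utc.weekday(), utc.hour)] += 1
    return buckets


# Hypothetical push events.
events = [
    datetime(2021, 3, 1, 9, 15, tzinfo=timezone.utc),   # Monday, 09:xx
    datetime(2021, 3, 1, 9, 45, tzinfo=timezone.utc),   # Monday, 09:xx
    datetime(2021, 3, 2, 22, 0, tzinfo=timezone.utc),   # Tuesday, 22:xx
]

heat = push_heatmap(events)
print(heat[(0, 9)], heat[(1, 22)])
```

&lt;p&gt;Each key is one cell of the heat map, and the count determines how dark that cell is drawn.&lt;/p&gt;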

&lt;h2&gt;
  
  
  Database coding vitality
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Contribution trend in the past ten years
&lt;/h3&gt;

&lt;p&gt;The chart below displays the accumulated number of commits pushed by contributors to open source database repositories each year, and their growth trend during the past ten years.&lt;/p&gt;

&lt;p&gt;Coming soon&lt;/p&gt;

&lt;h3&gt;
  
  
  Which databases were most actively maintained and updated in the past ten years?
&lt;/h3&gt;

&lt;p&gt;The chart below displays the top 10 open source databases that received the most pull requests in the past ten years.&lt;/p&gt;

&lt;p&gt;Coming soon&lt;/p&gt;

&lt;h3&gt;
  
  
  Which databases were most actively maintained and updated in 2021?
&lt;/h3&gt;

&lt;p&gt;The chart below displays the top 10 open source databases that received the most pull requests in 2021 alone.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hmt6AxBG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/57xjjsu3p9t2qqsbtexs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hmt6AxBG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/57xjjsu3p9t2qqsbtexs.png" alt="Image description" width="880" height="635"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/9bad45fd-4182-4b41-b1a8-f18d6859cbed"&gt;https://api.ossinsight.io/share/9bad45fd-4182-4b41-b1a8-f18d6859cbed&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Database user feedback
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Which databases have the widest feedback sources?
&lt;/h3&gt;

&lt;p&gt;The chart below displays the number of issue creators of leading open source databases each year and their growth trend during the past ten years. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cTuPfRrO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/o5pov86q13g3eoikpm3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cTuPfRrO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/o5pov86q13g3eoikpm3e.png" alt="Image description" width="880" height="896"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/c1358f96-2b49-4a53-9459-cc71a1d7a3dc"&gt;https://api.ossinsight.io/share/c1358f96-2b49-4a53-9459-cc71a1d7a3dc&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Which databases gave the fastest first response in 2021?
&lt;/h3&gt;

&lt;p&gt;The bar chart below shows the median time each open source database needs to make its first response to an issue. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Ylxdn5ET--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hsgfvs5r10ygdgtyiuer.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Ylxdn5ET--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hsgfvs5r10ygdgtyiuer.png" alt="Image description" width="880" height="880"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/85d8865a-d60f-45e3-8ee5-9a2f5b15ad38"&gt;https://api.ossinsight.io/share/85d8865a-d60f-45e3-8ee5-9a2f5b15ad38&lt;/a&gt;&lt;/p&gt;
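
&lt;p&gt;To make the metric concrete, here is a small Python sketch of a median first-response time. It is an illustration under assumptions rather than the query behind the chart: each issue is represented as a hypothetical pair of timestamps (when it was opened and when it first received a response), and the sample data is made up.&lt;/p&gt;

```python
from datetime import datetime
from statistics import median


def median_first_response_hours(issues):
    """issues is a list of (opened_at, first_response_at) datetime pairs."""
    waits = [
        (responded - opened).total_seconds() / 3600
        for opened, responded in issues
    ]
    return median(waits)


# Hypothetical issues: waits of 1 hour, 5 hours, and 48 hours.
sample_issues = [
    (datetime(2021, 5, 1, 8, 0), datetime(2021, 5, 1, 9, 0)),
    (datetime(2021, 5, 2, 8, 0), datetime(2021, 5, 2, 13, 0)),
    (datetime(2021, 5, 3, 8, 0), datetime(2021, 5, 5, 8, 0)),
]

hours = median_first_response_hours(sample_issues)
print(hours)
```

&lt;p&gt;The median is a sensible choice for this kind of data because a few issues that wait for weeks would pull an average far away from the typical experience.&lt;/p&gt;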

&lt;h3&gt;
  
  
  Which databases were the most efficient in feedback resolution in 2021?
&lt;/h3&gt;

&lt;p&gt;The bar chart below shows the median time each open source database needs to close an issue. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jBfgWGVj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vngg3n0a0x7h5har36qj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jBfgWGVj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vngg3n0a0x7h5har36qj.png" alt="Image description" width="880" height="881"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/5a1cec8a-2e80-4427-8fba-6e64fa0cdc5d"&gt;https://api.ossinsight.io/share/5a1cec8a-2e80-4427-8fba-6e64fa0cdc5d&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Who gave the feedback in 2021?
&lt;/h3&gt;

&lt;p&gt;The map below shows the geographical distribution of developers who submitted issues to open source databases. The larger and darker a spot on the map, the more issue openers were located there. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pz9F7TEE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ys05vluzhq0z8hpikbnj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pz9F7TEE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ys05vluzhq0z8hpikbnj.png" alt="Image description" width="880" height="506"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/0a7edfbb-ff66-4b50-a61b-36169b2412dc"&gt;https://api.ossinsight.io/share/0a7edfbb-ff66-4b50-a61b-36169b2412dc&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Community Robustness
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Contributor growth trend in the past ten years
&lt;/h3&gt;

&lt;p&gt;The chart below shows the cumulative number of contributors each leading open source database had attracted by each year, and its growth trend over the past ten years.&lt;/p&gt;

&lt;p&gt;Coming soon&lt;/p&gt;

&lt;h3&gt;
  
  
  Which databases have the most heavy contributors?
&lt;/h3&gt;

&lt;p&gt;The chart below displays the number of heavy contributors (who submitted more than 100 pull requests), medium contributors (who submitted more than 10 but fewer than 100 pull requests), and light contributors (who submitted fewer than 10 pull requests) of leading open source databases. The chart also ranks these databases based on their number of heavy contributors.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pSnmy4NL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4zdaixcwvfk2ijcq112c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pSnmy4NL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4zdaixcwvfk2ijcq112c.png" alt="Image description" width="880" height="860"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/d30c2953-1c0f-4a87-a3f5-27b0b59b61c8"&gt;https://api.ossinsight.io/share/d30c2953-1c0f-4a87-a3f5-27b0b59b61c8&lt;/a&gt;&lt;/p&gt;
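
&lt;p&gt;The three tiers described above can be sketched in Python as follows. The text does not say which tier a contributor with exactly 10 or exactly 100 pull requests belongs to, so this sketch assumes that exactly 10 counts as medium and exactly 100 counts as heavy. The user names and counts are made up.&lt;/p&gt;

```python
from bisect import bisect_right
from collections import Counter

TIERS = ("light", "medium", "heavy")
# Assumed cut-offs: exactly 10 PRs counts as medium, exactly 100 as heavy.
BOUNDARIES = (10, 100)


def contributor_tier(pr_count):
    """Map a pull request count to its tier name."""
    return TIERS[bisect_right(BOUNDARIES, pr_count)]


def tier_counts(pr_counts_by_user):
    """Count how many contributors fall into each tier."""
    return Counter(contributor_tier(n) for n in pr_counts_by_user.values())


# Hypothetical pull request counts for one repository.
sample = {"alice": 250, "bob": 42, "carol": 3, "dave": 10}
counts = tier_counts(sample)
print(counts)
```

&lt;p&gt;Running the same count per repository and sorting by the heavy tier would give a ranking of the kind shown in the chart.&lt;/p&gt;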

&lt;h3&gt;
  
  
  Which databases are heavily contributed to?
&lt;/h3&gt;

&lt;p&gt;The chart below displays the number of pull requests submitted by heavy contributors, medium contributors, and light contributors. The chart also ranks these databases based on the number of pull requests submitted by heavy contributors. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DGDAjqmT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dr4dry2vtyy3re7jpn3o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DGDAjqmT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dr4dry2vtyy3re7jpn3o.png" alt="Image description" width="880" height="926"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/050511ea-3aa5-468f-be07-2a48568147b9"&gt;https://api.ossinsight.io/share/050511ea-3aa5-468f-be07-2a48568147b9&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Database programming languages
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Which languages were most favored in the past ten years?
&lt;/h3&gt;

&lt;p&gt;The chart below shows the top programming languages used in open source databases in the past ten years and how many databases used them each year.&lt;/p&gt;

&lt;p&gt;Coming soon&lt;/p&gt;

&lt;h3&gt;
  
  
  Which languages were most favored in 2021?
&lt;/h3&gt;

&lt;p&gt;The chart below shows the top programming languages used in pull requests submitted to open source databases in 2021. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gt6G2IfS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qp3wlzhju088dzhlgavj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gt6G2IfS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qp3wlzhju088dzhlgavj.png" alt="Image description" width="880" height="510"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/57c5049f-d2f5-4046-b06c-e71f48807963"&gt;https://api.ossinsight.io/share/57c5049f-d2f5-4046-b06c-e71f48807963&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;For more content and the specific SQL, visit the official website.&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://ossinsight.io/blog/deep-insight-into-open-source-databases"&gt;https://ossinsight.io/blog/deep-insight-into-open-source-databases&lt;/a&gt; &lt;/p&gt;

</description>
      <category>opensource</category>
      <category>database</category>
      <category>github</category>
      <category>sql</category>
    </item>
    <item>
      <title>Deep Insights into Low-code Development Tools</title>
      <dc:creator>OSS Insight</dc:creator>
      <pubDate>Wed, 15 Jun 2022 10:13:55 +0000</pubDate>
      <link>https://forem.com/ossinsight/deep-insights-into-low-code-development-tools-5h8j</link>
      <guid>https://forem.com/ossinsight/deep-insights-into-low-code-development-tools-5h8j</guid>
      <description>&lt;p&gt;In this chapter, we will share with you some of the top low-code development tools repos (LCDT repos) on GitHub in 2021 measured by different metrics including the number of stars, PRs, contributors, countries, regions and so on.&lt;/p&gt;

&lt;p&gt;Note:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You can move your cursor onto any of the repository bars/lines on the chart and get the exact number.&lt;/li&gt;
&lt;li&gt;The SQL commands above each chart are what we use on our TiDB Cloud to get the analytical results. Try those SQL commands yourself on TiDB Cloud with this 10-minute tutorial.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Star history of top LCDT repos since 2011
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--n4ar2ket--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qt3zasptyu7hcmse4sex.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--n4ar2ket--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qt3zasptyu7hcmse4sex.png" alt="Image description" width="880" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 10 most starred LCDT repos in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Bsz7LJYe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/y4g55x13jpdiht0cjo8c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Bsz7LJYe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/y4g55x13jpdiht0cjo8c.png" alt="Image description" width="880" height="644"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 10 LCDT repos with the most PRs in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pqyilWID--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/m4osvytytn74ttgat5mb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pqyilWID--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/m4osvytytn74ttgat5mb.png" alt="Image description" width="880" height="627"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 10 LCDT repos with the highest YoY growth rate in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--228kAFS0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nevfp0hbpxy5lh8ocasc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--228kAFS0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nevfp0hbpxy5lh8ocasc.png" alt="Image description" width="880" height="701"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 10 LCDT repos with the lowest YoY growth rate in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dDEmUPyB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/772dwujswjld00wxd0r2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dDEmUPyB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/772dwujswjld00wxd0r2.png" alt="Image description" width="880" height="705"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 7 most used programming languages in LCDT repos in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6i5RFWX0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nks39m2jx8t1ip5bkll3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6i5RFWX0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nks39m2jx8t1ip5bkll3.png" alt="Image description" width="880" height="515"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 20 countries/regions contributing the most to LCDT repos in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mKXDeD9V--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hc8idksfegoiamz2a4vs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mKXDeD9V--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hc8idksfegoiamz2a4vs.png" alt="Image description" width="880" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The rankings of LCDT repos measured by Z-score in 2021
&lt;/h2&gt;

&lt;p&gt;Each of the analytical results displayed above is based on a single metric: stars, PRs, or contributors. Now, we will use the Z-score method to rank the LCDT repos on GitHub.&lt;/p&gt;

&lt;p&gt;This is the comprehensive ranking calculated by Z-score:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kSTTHFzN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bq33fqzuysddh1fjqsl8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kSTTHFzN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bq33fqzuysddh1fjqsl8.png" alt="Image description" width="880" height="824"&gt;&lt;/a&gt;&lt;/p&gt;
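
&lt;p&gt;For readers unfamiliar with the method, here is a minimal Python sketch of a Z-score ranking. A Z-score expresses a value as its distance from the mean in units of standard deviation, which puts metrics of very different scales on a common footing. The way the three metrics are combined is not spelled out here, so this sketch assumes the per-metric Z-scores are simply added with equal weight; the repository names and numbers are made up.&lt;/p&gt;

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return (x - mean) / population standard deviation for each value."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so no repo stands out on this metric.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def rank_by_z_score(repos):
    """repos maps a repo name to a (stars, prs, contributors) tuple."""
    names = list(repos)
    # Transpose rows into one column of values per metric.
    columns = list(zip(*(repos[name] for name in names)))
    per_metric = [z_scores(list(col)) for col in columns]
    # Assumption: equal-weight sum of the three Z-scores.
    totals = {
        name: sum(metric[i] for metric in per_metric)
        for i, name in enumerate(names)
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical (stars, PRs, contributors) per repository.
sample = {
    "repo-a": (1200, 300, 40),
    "repo-b": (800, 500, 60),
    "repo-c": (100, 20, 5),
}
ranking = rank_by_z_score(sample)
print(ranking)
```

&lt;p&gt;In this made-up sample, repo-a has the most stars, but repo-b ranks first overall because it leads on both PRs and contributors.&lt;/p&gt;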

&lt;p&gt;&lt;a href="https://ossinsight.io/blog/deep-insight-into-lowcode-development-tools-2021"&gt;https://ossinsight.io/blog/deep-insight-into-lowcode-development-tools-2021&lt;/a&gt; &lt;/p&gt;

</description>
      <category>lowcode</category>
      <category>opensource</category>
      <category>github</category>
      <category>programming</category>
    </item>
    <item>
      <title>Deep Insights into JavaScript Frameworks</title>
      <dc:creator>OSS Insight</dc:creator>
      <pubDate>Mon, 13 Jun 2022 03:24:39 +0000</pubDate>
      <link>https://forem.com/ossinsight/deep-insights-into-javascript-frameworks-30d2</link>
      <guid>https://forem.com/ossinsight/deep-insights-into-javascript-frameworks-30d2</guid>
      <description>&lt;p&gt;In this chapter, we will share with you some of the top JavaScript Framework repos (JSF repos) on GitHub in 2021 measured by different metrics including the number of stars, PRs, contributors, countries, regions and so on.&lt;/p&gt;

&lt;p&gt;Note:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You can move your cursor onto any of the repository bars/lines on the chart and get the exact number.&lt;/li&gt;
&lt;li&gt;The SQL commands above each chart are what we use on our TiDB Cloud to get the analytical results. Try those SQL commands by yourselves on TiDB Cloud with this 10-minute tutorial.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Star history of top JavaScript Framework repos since 2011
&lt;/h2&gt;

&lt;p&gt;The number of stars is often treated as a measure of how popular a GitHub repository is. We sort all JavaScript framework repositories on GitHub by their total number of stars since 2011. To visualize the results more intuitively, we show the top 10 JavaScript framework repositories in an interactive line chart.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Jb-I5qLA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/r4j5mj830zumlksr45es.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Jb-I5qLA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/r4j5mj830zumlksr45es.png" alt="Image description" width="880" height="448"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/0375d838-7f7f-4649-90e7-6ec78ab5fb48"&gt;https://api.ossinsight.io/share/0375d838-7f7f-4649-90e7-6ec78ab5fb48&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 10 most starred JSF repos in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hFbnthmN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9rtkp9oqcv2utszz5gwb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hFbnthmN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9rtkp9oqcv2utszz5gwb.png" alt="Image description" width="880" height="624"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/739d4930-57b0-4960-b577-39b535d25839"&gt;https://api.ossinsight.io/share/739d4930-57b0-4960-b577-39b535d25839&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 10 JSF repos with the most PRs in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5a4vhrc7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7qnomqwj8qvs0zyanvn0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5a4vhrc7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7qnomqwj8qvs0zyanvn0.png" alt="Image description" width="880" height="641"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/87287eb6-b909-4c82-872a-bdb32358e6d7"&gt;https://api.ossinsight.io/share/87287eb6-b909-4c82-872a-bdb32358e6d7&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 10 JSF repos with the highest YoY growth rate of stars in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QWp1woDS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1ulwdabodioi5oiyhzxh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QWp1woDS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1ulwdabodioi5oiyhzxh.png" alt="Image description" width="880" height="726"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/9cbfbdf6-66db-42dd-9b5a-0b6257b827d7"&gt;https://api.ossinsight.io/share/9cbfbdf6-66db-42dd-9b5a-0b6257b827d7&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 10 JSF repos with the lowest YoY growth rate of stars in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0UWzC3Ji--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/stwmmtxkqhcbat9upb7g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0UWzC3Ji--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/stwmmtxkqhcbat9upb7g.png" alt="Image description" width="880" height="695"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/c60ad913-11f1-42d9-bf1c-d52312e36890"&gt;https://api.ossinsight.io/share/c60ad913-11f1-42d9-bf1c-d52312e36890&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 10 most used programming languages in JSF repos in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ZffzbWKD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/47hpp5a4irzgqjc9s433.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ZffzbWKD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/47hpp5a4irzgqjc9s433.png" alt="Image description" width="880" height="521"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/9d4b3dbe-e9f4-44f9-b4a1-a9bd2280df48"&gt;https://api.ossinsight.io/share/9d4b3dbe-e9f4-44f9-b4a1-a9bd2280df48&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 10 countries/regions contributing the most to JSF repos in 2021
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TudivwRW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vu7o6or8qmkfvu31wkg1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TudivwRW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vu7o6or8qmkfvu31wkg1.png" alt="Image description" width="880" height="496"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/741e94ce-8b84-4003-8491-9a2537c10922"&gt;https://api.ossinsight.io/share/741e94ce-8b84-4003-8491-9a2537c10922&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Rankings of JSF repos measured by Z-score in 2021
&lt;/h2&gt;

&lt;p&gt;Each of the analytical results displayed above is based on a single metric: stars, PRs, or contributors. Now we will use the Z-score method to rank the JSF repos on GitHub.&lt;/p&gt;

&lt;p&gt;This is the comprehensive ranking calculated by Z-score:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0Yr3AS_t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uv57bhvktcu9mpgws8ps.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0Yr3AS_t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uv57bhvktcu9mpgws8ps.png" alt="Image description" width="880" height="832"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://api.ossinsight.io/share/64c7ac0a-1a4b-4fcc-85b7-da523097ac3e"&gt;https://api.ossinsight.io/share/64c7ac0a-1a4b-4fcc-85b7-da523097ac3e&lt;/a&gt;&lt;/p&gt;
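
&lt;p&gt;For readers unfamiliar with the method, here is a minimal sketch of a Z-score ranking. It assumes the combined score is the plain sum of each metric's Z-score, which may differ from the exact weighting behind the chart above. The repo names and numbers are made up for illustration, and the function names are hypothetical.&lt;/p&gt;

```python
from statistics import mean, pstdev

# Made-up metrics for three hypothetical repos: (stars, PRs, contributors).
metrics = {
    "repo-a": (100, 10, 5),
    "repo-b": (200, 20, 10),
    "repo-c": (300, 60, 30),
}


def z_scores(values):
    """Return (x - mean) / standard deviation for each value."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # all repos are equal on this metric
    return [(x - mu) / sigma for x in values]


def rank_by_z_score(metrics):
    """Sum the per-metric Z-scores of each repo and sort from high to low."""
    names = list(metrics)
    totals = {name: 0.0 for name in names}
    # Normalise each metric separately so that large raw numbers (stars)
    # do not drown out small ones (contributors), then add the scores up.
    for column in zip(*metrics.values()):
        for name, z in zip(names, z_scores(column)):
            totals[name] += z
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for name, score in rank_by_z_score(metrics):
    print(name, round(score, 2))
```

&lt;p&gt;Because every metric is rescaled to mean 0 and standard deviation 1 before the scores are added, a repo has to stand out relative to its peers on each metric to rank highly.&lt;/p&gt;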

&lt;p&gt;&lt;a href="https://ossinsight.io/blog/deep-insight-into-js-framework-2021"&gt;https://ossinsight.io/blog/deep-insight-into-js-framework-2021&lt;/a&gt; &lt;/p&gt;

</description>
      <category>javascript</category>
      <category>github</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Data Preparation for Analytics</title>
      <dc:creator>OSS Insight</dc:creator>
      <pubDate>Wed, 25 May 2022 00:42:10 +0000</pubDate>
      <link>https://forem.com/ossinsight/data-preparation-for-analytics-495l</link>
      <guid>https://forem.com/ossinsight/data-preparation-for-analytics-495l</guid>
      <description>&lt;p&gt;All the data we use here on this website sources from GH Archive, a non-profit project that records and archives all GitHub events data since 2011. The total data volume archived by GH Archive can be up to 4 billion rows. We download the json file on GH Archive and convert it into csv format via Script, and finally load it into the TiDB cluster in parallel through TiDB-Lightning.&lt;/p&gt;

&lt;p&gt;In this post, we will explain step by step how we conduct this process.&lt;/p&gt;
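
&lt;p&gt;As a rough illustration of the conversion step, the sketch below flattens GH Archive style events (one JSON object per line) into CSV rows. It is not the actual conversion script: the function names, the chosen subset of columns, and the sample event are all assumptions made for the example.&lt;/p&gt;

```python
import csv
import io
import json

# Hypothetical subset of the github_events columns; the real table has more.
COLUMNS = ["id", "type", "created_at", "repo_id", "repo_name", "actor_id", "actor_login"]


def event_to_row(event):
    """Flatten one nested event dict into a flat dict keyed by column name."""
    repo = event.get("repo") or {}
    actor = event.get("actor") or {}
    return {
        "id": event.get("id"),
        "type": event.get("type"),
        # Turn an ISO timestamp such as 2021-01-01T00:00:00Z into the
        # "YYYY-MM-DD HH:MM:SS" form expected by a DATETIME column.
        "created_at": (event.get("created_at") or "").replace("T", " ").rstrip("Z"),
        "repo_id": repo.get("id"),
        "repo_name": repo.get("name"),
        "actor_id": actor.get("id"),
        "actor_login": actor.get("login"),
    }


def events_to_csv(json_lines, out):
    """Write a header plus one CSV row per non-empty JSON line; return the row count."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()  # corresponds to header = true in [mydumper.csv]
    count = 0
    for line in json_lines:
        if not line.strip():
            continue
        writer.writerow(event_to_row(json.loads(line)))
        count += 1
    return count


# Made-up sample event, not real GH Archive data.
sample = json.dumps({
    "id": "1",
    "type": "WatchEvent",
    "created_at": "2021-01-01T00:00:00Z",
    "repo": {"id": 1001, "name": "example-org/example-repo"},
    "actor": {"id": 42, "login": "octocat"},
})
buffer = io.StringIO()
rows_written = events_to_csv([sample], buffer)
print(buffer.getvalue())
```

&lt;p&gt;In a real pipeline the input would be a decompressed GH Archive file and the output one of the CSV files listed below, but the flattening logic stays the same.&lt;/p&gt;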

&lt;p&gt;Prepare the data in CSV format for TiDB Lightning:&lt;/p&gt;

&lt;p&gt;├── gharchive_dev.github_events.000000000000.csv&lt;br&gt;
├── gharchive_dev.github_events.000000000001.csv&lt;br&gt;
├── gharchive_dev.github_events.000000000002.csv&lt;br&gt;
├── gharchive_dev.github_events.000000000003.csv&lt;br&gt;
├── gharchive_dev.github_events.000000000004.csv&lt;br&gt;
├── gharchive_dev.github_events.000000000005.csv&lt;br&gt;
├── gharchive_dev.github_events.000000000006.csv&lt;br&gt;
├── gharchive_dev.github_events.000000000007.csv&lt;br&gt;
├── gharchive_dev.github_events.000000000008.csv&lt;br&gt;
├── gharchive_dev.github_events.000000000009.csv&lt;br&gt;
├── gharchive_dev.github_events.000000000010.csv&lt;br&gt;
├── gharchive_dev.github_events.000000000011.csv&lt;br&gt;
├── gharchive_dev.github_events.000000000012.csv&lt;br&gt;
├── gharchive_dev.github_events.000000000013.csv&lt;/p&gt;

&lt;p&gt;Configure TiDB Lightning as follows:&lt;/p&gt;

&lt;p&gt;cat tidb-lightning.toml&lt;br&gt;
[mydumper.csv]&lt;br&gt;
separator = ','&lt;br&gt;
delimiter = '"'&lt;br&gt;
header = true&lt;br&gt;
not-null = false&lt;br&gt;
backslash-escape = true&lt;br&gt;
trim-last-separator = false&lt;/p&gt;

&lt;p&gt;[tikv-importer]&lt;br&gt;
 backend = "local"&lt;br&gt;
 sorted-kv-dir = "/kvdir/"&lt;/p&gt;

&lt;p&gt;disk-quota = "1.5TiB"&lt;/p&gt;

&lt;p&gt;[mydumper]&lt;br&gt;
data-source-dir = "/csv_dir/"&lt;br&gt;
strict-format = false&lt;br&gt;
no-schema = true&lt;/p&gt;

&lt;p&gt;[tidb]&lt;br&gt;
host = "xxx"&lt;br&gt;
port = 3306&lt;br&gt;
user = "github_events"&lt;br&gt;
password = "******"&lt;/p&gt;

&lt;p&gt;[lightning]&lt;br&gt;
check-requirements = false&lt;br&gt;
region-concurrency = 32&lt;br&gt;
meta-schema-name = "gharchive_meta"&lt;/p&gt;

&lt;p&gt;Load the data into the TiDB cluster:&lt;/p&gt;

&lt;p&gt;nohup tidb-lightning -config ./tidb-lightning.toml &amp;gt; nohup.out&lt;/p&gt;

&lt;p&gt;Convert the unstructured JSON files provided by GH Archive into structured data:&lt;/p&gt;

&lt;p&gt;gharchive_dev&amp;gt; desc github_events;&lt;br&gt;
+--------------------+--------------+------+-----+---------+-------+&lt;br&gt;
| Field              | Type         | Null | Key | Default | Extra |&lt;br&gt;
+--------------------+--------------+------+-----+---------+-------+&lt;br&gt;
| id                 | bigint(20)   | YES  | MUL |   |       |&lt;br&gt;
| type               | varchar(255) | YES  | MUL |   |       |&lt;br&gt;
| created_at         | datetime     | YES  | MUL |   |       |&lt;br&gt;
| repo_id            | bigint(20)   | YES  | MUL |   |       |&lt;br&gt;
| repo_name          | varchar(255) | YES  | MUL |   |       |&lt;br&gt;
| actor_id           | bigint(20)   | YES  | MUL |   |       |&lt;br&gt;
| actor_login        | varchar(255) | YES  | MUL |   |       |&lt;br&gt;
| actor_location     | varchar(255) | YES  |     |   |       |&lt;br&gt;
| language           | varchar(255) | YES  | MUL |   |       |&lt;br&gt;
| additions          | bigint(20)   | YES  | MUL |   |       |&lt;br&gt;
| deletions          | bigint(20)   | YES  | MUL |   |       |&lt;br&gt;
| action             | varchar(255) | YES  | MUL |   |       |&lt;br&gt;
| number             | int(11)      | YES  |     |   |       |&lt;br&gt;
| commit_id          | varchar(255) | YES  | MUL |   |       |&lt;br&gt;
| comment_id         | bigint(20)   | YES  | MUL |   |       |&lt;br&gt;
| org_login          | varchar(255) | YES  | MUL |   |       |&lt;br&gt;
| org_id             | bigint(20)   | YES  | MUL |   |       |&lt;br&gt;
| state              | varchar(255) | YES  |     |   |       |&lt;br&gt;
| closed_at          | datetime     | YES  | MUL |   |       |&lt;br&gt;
| comments           | int(11)      | YES  | MUL |   |       |&lt;br&gt;
| pr_merged_at       | datetime     | YES  | MUL |   |       |&lt;br&gt;
| pr_merged          | tinyint(1)   | YES  |     |   |       |&lt;br&gt;
| pr_changed_files   | int(11)      | YES  | MUL |   |       |&lt;br&gt;
| pr_review_comments | int(11)      | YES  | MUL |   |       |&lt;br&gt;
| pr_or_issue_id     | bigint(20)   | YES  | MUL |   |       |&lt;br&gt;
| event_day          | date         | YES  | MUL |   |       |&lt;br&gt;
| event_month        | date         | YES  | MUL |   |       |&lt;br&gt;
| author_association | varchar(255) | YES  |     |   |       |&lt;br&gt;
| event_year         | int(11)      | YES  | MUL |   |       |&lt;br&gt;
| push_size          | int(11)      | YES  |     |   |       |&lt;br&gt;
| push_distinct_size | int(11)      | YES  |     |   |       |&lt;br&gt;
+--------------------+--------------+------+-----+---------+-------+&lt;/p&gt;

&lt;p&gt;With structured data at hand, we can run further analysis with TiDB Cloud by executing SQL queries. For example, the query below outputs the top 10 most starred JavaScript framework repos in 2021:&lt;/p&gt;

&lt;p&gt;  SELECT js.name, count(*) as stars &lt;br&gt;
    FROM github_events &lt;br&gt;
         JOIN js_framework_repos js ON js.id = github_events.repo_id &lt;br&gt;
   WHERE type = 'WatchEvent' and event_year = 2021 &lt;br&gt;
GROUP BY 1 &lt;br&gt;
ORDER BY 2 DESC&lt;br&gt;
   LIMIT 10;&lt;br&gt;
+-------------------+-------+&lt;br&gt;
| name              | stars |&lt;br&gt;
+-------------------+-------+&lt;br&gt;
| facebook/react    | 22830 |&lt;br&gt;
| sveltejs/svelte   | 18573 |&lt;br&gt;
| vuejs/vue         | 18015 |&lt;br&gt;
| angular/angular   | 11037 |&lt;br&gt;
| alpinejs/alpine   | 6993  |&lt;br&gt;
| preactjs/preact   | 2965  |&lt;br&gt;
| hotwired/stimulus | 1355  |&lt;br&gt;
| marko-js/marko    | 1006  |&lt;br&gt;
| neomjs/neo        | 826   |&lt;br&gt;
| tastejs/todomvc   | 813   |&lt;br&gt;
+-------------------+-------+&lt;/p&gt;

&lt;p&gt;We have analyzed GitHub projects in the areas of databases, JavaScript frameworks, programming languages, web frameworks, and low-code development tools, and provided insights for 2021, real-time insights, and custom insights. If the repository you care about is not included here, you're welcome to submit a PR. If you want to gain insights into other areas, you can try TiDB Cloud yourself with this 10-minute tutorial.&lt;/p&gt;
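
&lt;p&gt;To experiment with the star-count query without a TiDB cluster, the sketch below runs the same SQL against an in-memory SQLite database. SQLite only stands in for TiDB here, the two tables carry just the columns the query needs, and every row is made-up sample data.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE github_events (repo_id INTEGER, type TEXT, event_year INTEGER);
    CREATE TABLE js_framework_repos (id INTEGER, name TEXT);
""")

# Made-up sample rows, not real GH Archive data.
conn.executemany(
    "INSERT INTO js_framework_repos VALUES (?, ?)",
    [(1, "example/alpha"), (2, "example/beta")],
)
conn.executemany(
    "INSERT INTO github_events VALUES (?, ?, ?)",
    [
        (1, "WatchEvent", 2021),
        (1, "WatchEvent", 2021),
        (2, "WatchEvent", 2021),
        (2, "WatchEvent", 2020),  # wrong year, filtered out
        (1, "PushEvent", 2021),   # not a star, filtered out
    ],
)

# Same query as in the article: a WatchEvent is recorded when a user stars a repo.
top_starred = conn.execute("""
      SELECT js.name, count(*) AS stars
        FROM github_events
             JOIN js_framework_repos js ON js.id = github_events.repo_id
       WHERE type = 'WatchEvent' AND event_year = 2021
    GROUP BY 1
    ORDER BY 2 DESC
       LIMIT 10
""").fetchall()

for name, stars in top_starred:
    print(name, stars)
```

&lt;p&gt;Swapping js_framework_repos for another repo list table changes the area being ranked while the rest of the query stays the same.&lt;/p&gt;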

&lt;p&gt;Below are the areas of GitHub projects we have analyzed.&lt;/p&gt;

&lt;p&gt;gharchive_dev&amp;gt; show tables;&lt;br&gt;
+-----------------------------+&lt;br&gt;
| Tables_in_gharchive_dev     |&lt;br&gt;
+-----------------------------+&lt;br&gt;
| cn_repos                    |&lt;br&gt;
| css_framework_repos         |&lt;br&gt;
| db_repos                    |&lt;br&gt;
| github_events               |&lt;br&gt;
| js_framework_repos          |&lt;br&gt;
| nocode_repos                |&lt;br&gt;
| programming_language_repos  |&lt;br&gt;
| static_site_generator_repos |&lt;br&gt;
| web_framework_repos         |&lt;br&gt;
+-----------------------------+&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pingcap-ossinsight-build-pr-271.surge.sh/blog/how-it-works"&gt;https://pingcap-ossinsight-build-pr-271.surge.sh/blog/how-it-works&lt;/a&gt;&lt;/p&gt;

</description>
      <category>github</category>
      <category>opensource</category>
      <category>database</category>
      <category>discuss</category>
    </item>
  </channel>
</rss>
