<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Hilton Fernandes</title>
    <description>The latest articles on Forem by Hilton Fernandes (@hilton_fernandes_eaac26ab).</description>
    <link>https://forem.com/hilton_fernandes_eaac26ab</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3451189%2F8daac153-76e1-4d43-b3cd-143771a97d38.jpg</url>
      <title>Forem: Hilton Fernandes</title>
      <link>https://forem.com/hilton_fernandes_eaac26ab</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/hilton_fernandes_eaac26ab"/>
    <language>en</language>
    <item>
      <title>Leveling with cluster analysis in Python: basic Python concepts</title>
      <dc:creator>Hilton Fernandes</dc:creator>
      <pubDate>Thu, 06 Nov 2025 21:36:06 +0000</pubDate>
      <link>https://forem.com/hilton_fernandes_eaac26ab/leveling-with-cluster-analysis-in-python-basic-python-concepts-1n6</link>
      <guid>https://forem.com/hilton_fernandes_eaac26ab/leveling-with-cluster-analysis-in-python-basic-python-concepts-1n6</guid>
      <description>&lt;p&gt;This is the 2nd in a series of 5 short articles that present a simple idea about time series and its implementation in Python. The purpose here is to present both a time series problem and how it can be solved with simple Python code.&lt;/p&gt;

&lt;p&gt;Only very basic knowledge of Python and time series is needed, as most concepts will be explained with care and with references to longer tutorials.&lt;/p&gt;

&lt;h2&gt;
  
  
  Roadmap
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://dev.to/hilton_fernandes_eaac26ab/leveling-with-cluster-analysis-in-python-400p"&gt;1st article of the series&lt;/a&gt; presented the basic concepts of this series. This one, the 2nd, presents the basic Python concepts and techniques to be used in the solution. The 3rd one will present a solution implemented in Python. The 4th article will add a sinusoidal decomposition of the data after the filtering of the solution. And the 5th and last one will use all the elements to address a real problem in cryptocurrencies.&lt;/p&gt;
&lt;h2&gt;
  
  
  Some simple Python ideas
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Libraries, modules and submodules
&lt;/h3&gt;

&lt;p&gt;Isaac Newton, who created a large part of modern physics and mathematics, once said that &lt;a href="https://en.wikiquote.org/wiki/Isaac_Newton#Quotes" rel="noopener noreferrer"&gt;he could see further because he stood on the shoulders of giants&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This concept is behind most software: programs do not create everything from scratch, they reuse a large part of what was already created, mainly in the form of &lt;a href="https://en.wikipedia.org/wiki/Library_(computing)" rel="noopener noreferrer"&gt;software libraries&lt;/a&gt;. Python is very good at this, and here is the part of the code of this series that loads some libraries. In Python, a library is usually made available as a &lt;em&gt;module&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt; 

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.cluster&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;KMeans&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The word &lt;code&gt;import&lt;/code&gt; in the code above expresses the idea that the current code brings in code from somewhere else. Another important point is that programmers usually prefer to write less. So, in the 1st line, the library &lt;code&gt;numpy&lt;/code&gt; is given the shorter name &lt;code&gt;np&lt;/code&gt;. In the 2nd line, the module &lt;code&gt;matplotlib.pyplot&lt;/code&gt; is given the name &lt;code&gt;plt&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;And in the last line of the code, another shortening is presented: instead of renaming a module to a shorter form, Python lets a programmer pick only what is needed. In this case, only &lt;code&gt;KMeans&lt;/code&gt; is picked from &lt;code&gt;sklearn.cluster&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;A less important point is that a dot selects a part of a module; that is: a &lt;em&gt;submodule&lt;/em&gt;. In the code shown, &lt;code&gt;matplotlib.pyplot&lt;/code&gt; means the submodule &lt;code&gt;pyplot&lt;/code&gt; of the module &lt;code&gt;matplotlib&lt;/code&gt;. And, of course, &lt;code&gt;sklearn.cluster&lt;/code&gt; is the submodule &lt;code&gt;cluster&lt;/code&gt; of the module &lt;code&gt;sklearn&lt;/code&gt;. Submodules improve the organization of larger modules, as they divide them into specialized parts.&lt;/p&gt;
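
&lt;p&gt;The three forms of &lt;code&gt;import&lt;/code&gt; described above can be tried without installing anything. The following is only an illustrative sketch: the modules &lt;code&gt;math&lt;/code&gt; and &lt;code&gt;os.path&lt;/code&gt; from Python's standard library, and the file name &lt;code&gt;data/a.csv&lt;/code&gt;, are stand-ins chosen for this example and are not part of the code of this series.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math as m               # a whole module, given a shorter name
import os.path                 # the submodule "path" of the module "os"
from os.path import basename   # only one name picked from a submodule

print(m.sqrt(16.0))                     # 4.0
print(os.path.basename("data/a.csv"))   # a.csv
print(basename("data/a.csv"))           # a.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The last two lines call the same function: the only difference is how much has to be typed to reach it.&lt;/p&gt;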

&lt;h3&gt;
  
  
  Pieces of information in scalar variables
&lt;/h3&gt;

&lt;p&gt;The next part of the code deals with storing the information that will be processed. Here is a list of &lt;em&gt;scalar variables&lt;/em&gt;, that is, variables that each hold a single value:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;coeff&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.25&lt;/span&gt;

&lt;span class="n"&gt;base&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
&lt;span class="n"&gt;ladder&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;
&lt;span class="n"&gt;n_points&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;101&lt;/span&gt;
&lt;span class="n"&gt;n_half&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;n_points&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

&lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n_points&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are two types of information in this code fragment: &lt;code&gt;int&lt;/code&gt; and &lt;code&gt;float&lt;/code&gt;. The type &lt;code&gt;int&lt;/code&gt; holds only integer values: in this case, the count of points (&lt;code&gt;n_points&lt;/code&gt;) and half of that count (&lt;code&gt;n_half&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The double slash (&lt;code&gt;//&lt;/code&gt;) in the line for &lt;code&gt;n_half&lt;/code&gt; makes sure that the division of &lt;code&gt;n_points&lt;/code&gt; by &lt;code&gt;2&lt;/code&gt; generates a number of type &lt;code&gt;int&lt;/code&gt;, and not a &lt;code&gt;float&lt;/code&gt;. That is: it will hold &lt;code&gt;50&lt;/code&gt;, and not &lt;code&gt;50.5&lt;/code&gt;, as the ordinary division (&lt;code&gt;/&lt;/code&gt;) of &lt;code&gt;101&lt;/code&gt; by &lt;code&gt;2&lt;/code&gt; would produce.&lt;/p&gt;
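
&lt;p&gt;The difference between the two kinds of division can be checked directly. This is a minimal sketch that reuses the name &lt;code&gt;n_points&lt;/code&gt; from the article; the name &lt;code&gt;ratio&lt;/code&gt; is introduced only for this example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;n_points = 101

n_half = n_points // 2   # floor division: the result stays an int
ratio = n_points / 2     # ordinary division: the result is always a float

print(n_half, type(n_half).__name__)   # 50 int
print(ratio, type(ratio).__name__)     # 50.5 float
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;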

&lt;p&gt;The type &lt;code&gt;float&lt;/code&gt; holds numbers with a decimal part. For instance, &lt;code&gt;coeff&lt;/code&gt; holds a coefficient, in this case &lt;code&gt;0.25&lt;/code&gt;. Since the problem here is to represent a discontinuity, one set of values will stay close to the &lt;code&gt;base&lt;/code&gt; level, while the other values will be raised above it by the amount stored in &lt;code&gt;ladder&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;And finally, &lt;code&gt;delta&lt;/code&gt; holds the step that will be used as a clock tick in our time series.&lt;/p&gt;

&lt;h3&gt;
  
  
  A data generator
&lt;/h3&gt;

&lt;p&gt;A time series is usually a series of data collected over time, for instance the mean wage in each year. To avoid the need to get real data, this code generates its own data by means of a &lt;a href="https://en.wikipedia.org/wiki/Pseudorandom_number_generator" rel="noopener noreferrer"&gt;pseudorandom number generator&lt;/a&gt;, aka PRNG. In a few words, a PRNG is a mathematical algorithm that generates a sequence of numbers without any apparent pattern; that is: they look random. It's called &lt;em&gt;pseudorandom&lt;/em&gt; because, when such an algorithm is fed with the same &lt;em&gt;seed&lt;/em&gt;, it always generates the same sequence of numbers.&lt;/p&gt;

&lt;p&gt;In this case it is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;rng&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;default_rng&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this code, &lt;code&gt;rng&lt;/code&gt; is the name of the number generator, and it's created by calling the &lt;em&gt;function&lt;/em&gt; (a reusable piece of code) &lt;code&gt;default_rng&lt;/code&gt; with the parameter &lt;code&gt;42&lt;/code&gt;, which is the seed. This function is in the submodule &lt;code&gt;np.random&lt;/code&gt;.&lt;/p&gt;
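
&lt;p&gt;The effect of the seed can be observed with a small experiment. The sketch below creates three generators; the names &lt;code&gt;rng_a&lt;/code&gt;, &lt;code&gt;rng_b&lt;/code&gt; and &lt;code&gt;rng_c&lt;/code&gt; and the seed &lt;code&gt;7&lt;/code&gt; are arbitrary choices made for this illustration. The method &lt;code&gt;uniform&lt;/code&gt; draws numbers from an interval.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)   # same seed as rng_a
rng_c = np.random.default_rng(7)    # a different, arbitrarily chosen seed

a = rng_a.uniform(-0.5, 0.5, size=5)
b = rng_b.uniform(-0.5, 0.5, size=5)
c = rng_c.uniform(-0.5, 0.5, size=5)

print(np.array_equal(a, b))   # True: same seed, same sequence
print(np.array_equal(a, c))   # compares against the sequence of the other seed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A fixed seed is what makes the charts of this series reproducible from one run to the next.&lt;/p&gt;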

&lt;h3&gt;
  
  
  Pieces of information in arrays
&lt;/h3&gt;

&lt;p&gt;Since a time series contains many data points, a single &lt;em&gt;scalar variable&lt;/em&gt; can't hold it. The module &lt;code&gt;numpy&lt;/code&gt; offers &lt;em&gt;arrays&lt;/em&gt;, of type &lt;code&gt;ndarray&lt;/code&gt;. In an array, the elements are identified by an &lt;em&gt;index&lt;/em&gt;, which is analogous to the apartment number in a building.&lt;/p&gt;

&lt;p&gt;The following code fragment creates the arrays needed for this problem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;linspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_points&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;n_points&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n_half&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;ladder&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the 1st line, the array &lt;code&gt;x&lt;/code&gt; receives &lt;code&gt;n_points&lt;/code&gt; (that is, &lt;code&gt;101&lt;/code&gt;) evenly spaced numbers from &lt;code&gt;0.0&lt;/code&gt; to &lt;code&gt;1.0&lt;/code&gt;, in increments of &lt;code&gt;0.01&lt;/code&gt;; that is: &lt;code&gt;0.0&lt;/code&gt;, &lt;code&gt;0.01&lt;/code&gt;, &lt;code&gt;0.02&lt;/code&gt;, ..., &lt;code&gt;0.99&lt;/code&gt;, &lt;code&gt;1.0&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In the 2nd line, the array &lt;code&gt;y&lt;/code&gt; receives &lt;code&gt;n_points&lt;/code&gt; numbers chosen &lt;em&gt;pseudorandomly&lt;/em&gt; between &lt;code&gt;-0.5&lt;/code&gt; and &lt;code&gt;0.5&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In the 3rd line, the 2nd half of &lt;code&gt;y&lt;/code&gt; (from index &lt;code&gt;n_half&lt;/code&gt; to the end) receives an increment of &lt;code&gt;ladder&lt;/code&gt;. That creates the discontinuity to be solved in the remaining articles.&lt;/p&gt;
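
&lt;p&gt;The slicing and the in-place increment are easier to follow on a small array. The sketch below is a simplification made for this illustration: it uses only &lt;code&gt;11&lt;/code&gt; points and fills &lt;code&gt;y&lt;/code&gt; with zeros instead of pseudorandom numbers, so that the jump stands out. The name &lt;code&gt;step&lt;/code&gt; is also introduced only here.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

n_points = 11            # fewer points than in the article, to keep the output short
n_half = n_points // 2   # 5

# retstep=True also returns the spacing between consecutive values
x, step = np.linspace(0.0, 1.0, n_points, retstep=True)
y = np.zeros(n_points)   # zeros instead of random numbers (assumption of this sketch)

y[n_half:] += 1.0        # raise every element from index n_half to the end

print(step)   # the spacing, equal to 1.0 / (n_points - 1)
print(y)      # [0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1.]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Since &lt;code&gt;n_points&lt;/code&gt; is odd, the raised part has one element more than the part left at the base level.&lt;/p&gt;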

&lt;h2&gt;
  
  
  Plotting the data
&lt;/h2&gt;

&lt;p&gt;Please consider the following code fragment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;visible&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 1st line is a generic one that can also be used to create &lt;em&gt;plots&lt;/em&gt; (aka charts) much more complicated than the one used here, for instance several plots in the same image.&lt;/p&gt;

&lt;p&gt;The 2nd line plots the elements of the two arrays, connecting consecutive points with line segments.&lt;/p&gt;

&lt;p&gt;The 3rd and 4th lines of the code label the &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; axes.&lt;/p&gt;

&lt;p&gt;The 5th line of the code creates a grid to ease the visualization of the data. And the 6th and final line causes the chart to be shown on the screen.&lt;/p&gt;

&lt;p&gt;The final result is like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1hv5vkfi32omlyezfty.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1hv5vkfi32omlyezfty.jpeg" alt="A time series with a discontinuity" width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>tutorial</category>
      <category>python</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Leveling with cluster analysis in Python: general concepts</title>
      <dc:creator>Hilton Fernandes</dc:creator>
      <pubDate>Thu, 30 Oct 2025 18:52:11 +0000</pubDate>
      <link>https://forem.com/hilton_fernandes_eaac26ab/leveling-with-cluster-analysis-in-python-400p</link>
      <guid>https://forem.com/hilton_fernandes_eaac26ab/leveling-with-cluster-analysis-in-python-400p</guid>
      <description>&lt;p&gt;Financial markets have discontinuities: sometimes a price jumps up or down in a time so short that the jump would count as a true discontinuity even if the time measured by our clocks were continuous like the real line.&lt;/p&gt;

&lt;p&gt;Those discontinuities create problems for many forms of mathematical modelling, since their models are based upon continuous functions. For instance, many price oscillations look like periodic functions, but when a discontinuity is found, any harmonic analysis becomes troublesome.&lt;/p&gt;

&lt;p&gt;Actually, a trend can also be troublesome for the fitting of periodic functions to financial data. But in this case, fitting a polynomial of low degree to the data can filter out the trend, and then a series of periodic functions can be fitted to the residuals, that is, the difference between the fitted polynomial and the original data.&lt;/p&gt;

&lt;p&gt;The purpose of this series of articles is to present a simple method to eliminate jumps from the observed data. Of course, when reconstructing the fitted data, the discontinuity will be added back.&lt;/p&gt;

&lt;p&gt;Only very basic knowledge of Python and time series is needed, as most concepts will be explained with care and with references to longer tutorials.&lt;/p&gt;

&lt;h2&gt;
  
  
  Roadmap
&lt;/h2&gt;

&lt;p&gt;This one, the 1st of 5 short articles, will introduce the general concepts for the solution, the &lt;a href="https://dev.to/hilton_fernandes_eaac26ab/leveling-with-cluster-analysis-in-python-basic-python-concepts-1n6"&gt;2nd one&lt;/a&gt; will present basic Python concepts and techniques to be used in the solution, the 3rd one will present a solution implemented in Python, the 4th article will add a sinusoidal decomposition of the data after the filtering of the solution, and the 5th and last one will use all the elements to address a real problem in cryptocurrencies.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cluster analysis as a means to group similar data
&lt;/h2&gt;

&lt;p&gt;Cluster analysis is a well-known technique for grouping data elements based on their similarities. In a metric space, similarity means smaller distances. There are several ways to devise the groups or clusters of data, and one of the simplest is called &lt;a href="https://en.wikipedia.org/wiki/K-means_clustering" rel="noopener noreferrer"&gt;k-means clustering&lt;/a&gt;. In very few words, it represents each cluster by a point called a &lt;em&gt;centroid&lt;/em&gt;, which is the mean of the coordinates of the points in the cluster, and it assigns each point to the cluster with the nearest centroid. Throughout these articles we shall use only k-means clustering.&lt;/p&gt;
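
&lt;p&gt;For readers who already know some Python, the idea of k-means can be sketched in a few lines. This is a minimal one-dimensional illustration, not the code of this series: the function &lt;code&gt;kmeans_1d&lt;/code&gt;, the sample values and the starting centroids are all hypothetical, and a real analysis would normally rely on a library implementation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def kmeans_1d(values, centroids, n_iter=10):
    """Plain k-means on 1-D data (hypothetical helper for illustration)."""
    values = np.asarray(values, dtype=float)
    centroids = np.asarray(centroids, dtype=float).copy()
    for _ in range(n_iter):
        # Step 1: assign each value to the nearest centroid.
        distances = np.abs(values[:, None] - centroids[None, :])
        labels = distances.argmin(axis=1)
        # Step 2: move each centroid to the mean of its assigned values.
        for k in range(len(centroids)):
            members = values[labels == k]
            if members.size:
                centroids[k] = members.mean()
    return labels, centroids

values = [0.1, -0.2, 0.0, 0.3, 1.1, 0.9, 1.2, 0.8]
labels, centroids = kmeans_1d(values, centroids=[0.0, 1.0])

print(labels)      # [0 0 0 0 1 1 1 1]
print(centroids)   # close to 0.05 and 1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The two steps inside the loop are repeated until the centroids stop moving; here a fixed number of repetitions is used for simplicity.&lt;/p&gt;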

&lt;p&gt;The following image is a typical two-dimensional representation of two groups.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7hwo0a10z279kdqhxm5j.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7hwo0a10z279kdqhxm5j.jpg" alt=" " width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The points are in blue, and the centroids are in red.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cluster analysis in a curve
&lt;/h2&gt;

&lt;p&gt;Since k-means clustering is based upon the distances between points, an interesting effect happens when the points are connected in a curve, because then they are much closer to each other than points dispersed in a cloud, like in the previous image.&lt;/p&gt;

&lt;p&gt;Please consider the following image that shows a time series with a discontinuity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vx61npnjasazb722kxj.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vx61npnjasazb722kxj.jpeg" alt="A time series with a discontinuity" width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When a k-means cluster analysis is applied to it, the centroids of the clusters are shown in red.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4paveeq32asnqw1ua83p.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4paveeq32asnqw1ua83p.jpeg" alt="A time series with clusters in the discontinuity" width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's easy to see that the two groups are at different levels, as shown in the following image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffe1jhj3xf2wrkktuzksx.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffe1jhj3xf2wrkktuzksx.jpeg" alt="Levels in the time series with discontinuity" width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Group 2 is around the green line, while Group 1 is around the red line.&lt;/p&gt;

&lt;p&gt;Then, to eliminate the discontinuity, it's enough to lower Group 1 to the level of Group 2. That is: to subtract from the &lt;code&gt;y&lt;/code&gt; coordinate of the points of Group 1 the difference between the levels of the two groups.&lt;/p&gt;

&lt;p&gt;That can be shown in the following image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8c8sq9sxiucspp46j4yx.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8c8sq9sxiucspp46j4yx.jpeg" alt="Unification of the two clusters" width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now no difference in level can be seen between the two groups of points.&lt;/p&gt;
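
&lt;p&gt;The leveling step described above amounts to a subtraction of means. The sketch below is only an illustration under stated assumptions: the toy values in &lt;code&gt;y&lt;/code&gt; are invented, and the array &lt;code&gt;group&lt;/code&gt; is assumed to be already known, whereas in the method of this series it would come from the cluster analysis.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

# Invented toy series: 4 points near level 0, then 4 points near level 1.
y = np.array([0.1, -0.1, 0.05, -0.05, 1.1, 0.9, 1.05, 0.95])
# Assumed group of each point: 2 is the lower group, 1 is the upper group.
group = np.array([2, 2, 2, 2, 1, 1, 1, 1])

level_1 = y[group == 1].mean()   # level of the upper group
level_2 = y[group == 2].mean()   # level of the lower group
jump = level_1 - level_2         # height of the discontinuity

leveled = y.copy()               # keep the original data untouched
leveled[group == 1] -= jump      # lower Group 1 to the level of Group 2

print(jump)
print(leveled)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping the value of &lt;code&gt;jump&lt;/code&gt; is what allows the discontinuity to be added back later, when the fitted data is reconstructed.&lt;/p&gt;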

&lt;h2&gt;
  
  
  Next step
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://dev.to/hilton_fernandes_eaac26ab/leveling-with-cluster-analysis-in-python-basic-python-concepts-1n6"&gt;next article&lt;/a&gt; in this series will introduce the basic Python concepts needed to create the 1st of the time series images presented here, and also used in the other articles.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>python</category>
      <category>tutorial</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
