<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Bruno Clemente</title>
    <description>The latest articles on Forem by Bruno Clemente (@killertux).</description>
    <link>https://forem.com/killertux</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F353171%2F325dc260-ab2c-4af3-b9d8-ea5379aa8449.jpeg</url>
      <title>Forem: Bruno Clemente</title>
      <link>https://forem.com/killertux</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/killertux"/>
    <language>en</language>
    <item>
      <title>Stream: a library to lazily process data with PHP.</title>
      <dc:creator>Bruno Clemente</dc:creator>
      <pubDate>Wed, 02 Feb 2022 12:28:56 +0000</pubDate>
      <link>https://forem.com/killertux/stream-a-library-to-lazily-process-data-with-php-583h</link>
      <guid>https://forem.com/killertux/stream-a-library-to-lazily-process-data-with-php-583h</guid>
      <description>&lt;p&gt;First of all, when we talk about streams in the PHP universe, the first thing that probably comes to mind is PHP's streamable resources. That is not what I will talk about today. &lt;a href="https://github.com/ebanx/stream"&gt;Stream&lt;/a&gt; is a PHP library that helps us define a pipeline of data transformations ergonomically and lazily. It is named "stream" because it was heavily inspired by the &lt;a href="https://hexdocs.pm/elixir/Stream.html"&gt;Elixir Stream&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So, let's take a look at it. First, add it to your project with Composer: &lt;code&gt;composer require ebanx/stream&lt;/code&gt;. Then, we can create a very simple example to illustrate how it works.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;EBANX\Stream\Stream&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Stream&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;rangeInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$value&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$value&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;collect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nb"&gt;print_r&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output of this example should be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Array 
( 
   [0] =&amp;gt; 4
   [1] =&amp;gt; 6
   [2] =&amp;gt; 8
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ok, now we need to understand what is happening here. The first thing that we do is create a Stream. There are four methods that create a stream: &lt;code&gt;Stream::of&lt;/code&gt;, &lt;code&gt;Stream::ofKeyValueMap&lt;/code&gt;, &lt;code&gt;Stream::rangeInt&lt;/code&gt;, and &lt;code&gt;Stream::rangeFloat&lt;/code&gt;. The last two create a stream from a numeric range. Note that the range is lazy, which means that each element is only created when you consume it. We can create the following stream without using a crazy amount of memory: &lt;code&gt;Stream::rangeInt(PHP_INT_MIN, PHP_INT_MAX)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The other two creators build a stream from an iterable. &lt;code&gt;Stream::of&lt;/code&gt; ignores any key that each element might have. &lt;code&gt;Stream::ofKeyValueMap&lt;/code&gt; transforms each element into a two-element array (a tuple) where the first item is the key and the second one is the value.&lt;/p&gt;

&lt;p&gt;Now, in the example, two transformation methods follow the stream creation. The first one is &lt;code&gt;map&lt;/code&gt;, which receives a callable. The callable is called for each element being processed, and the value it returns is passed on to the next step of the stream chain. So, in our example, we add 3 to each element of the stream.&lt;/p&gt;

&lt;p&gt;The next transformation method is the &lt;code&gt;filter&lt;/code&gt;. Here we also pass a callable that will receive as a parameter each element of the stream. But this callable needs to return a boolean. If it returns true, the element will follow to the next step of the stream. If it returns false, the element will not continue and it will be filtered out. So in this case, we are only keeping the even numbers.&lt;/p&gt;

&lt;p&gt;The last method in the chain is &lt;code&gt;collect&lt;/code&gt;. This is the only eager method that we have seen so far. It consumes the entire stream by collecting each element into an array and, in the end, returns this array. After calling this method, the original stream goes into an invalid state and you cannot work with it anymore. All methods that consume the stream have comments that point that out.&lt;/p&gt;

&lt;h2&gt;
  
  
  A little more focus on the Lazy part.
&lt;/h2&gt;

&lt;p&gt;The Stream tries to be as lazy as it can. Because of that, the following code will not output anything.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;EBANX\Stream\Stream&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Stream&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;of&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Original Element &lt;/span&gt;&lt;span class="nv"&gt;$value&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$value&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"After first map &lt;/span&gt;&lt;span class="nv"&gt;$value&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$value&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This happens because map is lazy. If no one tries to use/consume the stream, we will never apply the map. So, let's try again by adding a 'sum' to the end.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;EBANX\Stream\Stream&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Stream&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;of&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Original Element &lt;/span&gt;&lt;span class="nv"&gt;$value&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$value&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"After first map &lt;/span&gt;&lt;span class="nv"&gt;$value&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$value&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"The sum is &lt;/span&gt;&lt;span class="nv"&gt;$sum&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This should output the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Original Element 1
After first map 5
Original Element 2
After first map 10
The sum is 7
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the order. We don't apply the first map to all elements and then apply the second one. We apply all transformations to each element as soon as they are consumed. This allows us to execute fewer transformations in case we are using something like &lt;code&gt;take&lt;/code&gt;, where we can take only N elements of the stream.&lt;/p&gt;
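
&lt;p&gt;To make that saving concrete, here is a minimal sketch built on plain PHP generators. It is not how the library is implemented; &lt;code&gt;lazyMap&lt;/code&gt; and &lt;code&gt;lazyTake&lt;/code&gt; are hypothetical helpers written only for this illustration. The counter shows that the mapping callable runs only for the elements that are actually taken.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;// Hypothetical helpers for illustration; they are not part of the Stream library.
function lazyMap(iterable $items, callable $fn): Generator {
    foreach ($items as $item) {
        yield $fn($item); // runs only when the consumer asks for a value
    }
}

function lazyTake(iterable $items, int $limit): Generator {
    if ($limit &amp;lt;= 0) {
        return;
    }
    foreach ($items as $item) {
        yield $item;
        if (--$limit === 0) {
            return; // stop pulling elements from the previous step
        }
    }
}

$calls = 0;
$pipeline = lazyTake(
    lazyMap(range(1, 1000), function (int $value) use (&amp;amp;$calls): int {
        $calls++;
        return $value * 5;
    }),
    2
);

$result = iterator_to_array($pipeline, false);
// $result is [5, 10] and $calls is 2: the other 998 elements were never mapped.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;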

&lt;h2&gt;
  
  
  The reduce.
&lt;/h2&gt;

&lt;p&gt;There are a lot of different transformations and ways to consume your streams. But I want to focus a little bit on the &lt;code&gt;reduce&lt;/code&gt;. The reduce is an eager method so it consumes the stream. It receives an accumulator initial value and a callable. Then, for each element of the stream the callable will receive the current value of the accumulator and the current element. The return of the callable will be the new value of the accumulator. After the last iteration, it returns whatever is in the accumulator.&lt;/p&gt;
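
&lt;p&gt;The loop described above can be sketched in plain PHP. &lt;code&gt;reduceIterable&lt;/code&gt; is a hypothetical stand-alone function used only for illustration; in the library, &lt;code&gt;reduce&lt;/code&gt; is a method on the stream and its internals may differ.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;// Hypothetical helper: same contract as described above, written as a free function.
function reduceIterable(iterable $items, $initial, callable $callback) {
    $accumulator = $initial;
    foreach ($items as $item) {
        // The callable's return value becomes the new accumulator.
        $accumulator = $callback($accumulator, $item);
    }
    return $accumulator;
}

$product = reduceIterable([1, 2, 3, 4], 1, function (int $acc, int $value): int {
    return $acc * $value;
});
// $product is 24
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;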

&lt;p&gt;The powerful thing about the reduce is that all other methods that consume the stream could be written using it. The only reason we added the other methods is ease of use. For example, here we are implementing &lt;code&gt;sum&lt;/code&gt; using &lt;code&gt;reduce&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Stream&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;rangeInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$acc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$acc&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nv"&gt;$value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Arrays in the parameters.
&lt;/h2&gt;

&lt;p&gt;Let's try to implement the &lt;code&gt;collect&lt;/code&gt; method using the reduce:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Stream&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;rangeInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;([],&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$acc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$acc&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$acc&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you execute it, you will see that it works as expected. But what about performance? Let's create a range of 100000 elements and compare reduce with collect. These are the results on my machine:&lt;/p&gt;

&lt;p&gt;Using collect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;real    0m0.128s
user    0m0.076s
sys 0m0.032s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using reduce:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;real    1m29.543s
user    0m46.033s
sys 0m43.108s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wow, that is a BIG difference. Why does it take so much longer with reduce? Well, that is because of how PHP handles arrays.&lt;/p&gt;

&lt;p&gt;PHP passes arrays to functions by value. This means that every time we change an array that we received as a parameter, PHP first copies the entire array to a new memory location. In our example, we are doing this 100k times, and every time the array is a little bit longer.&lt;/p&gt;
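
&lt;p&gt;A small stand-alone example, independent of the library, shows the by-value behavior: the function modifies its own copy and the caller's array stays untouched. The function name &lt;code&gt;appendByValue&lt;/code&gt; is made up for this illustration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;// Hypothetical helper: receives the array by value, like the reduce callable above.
function appendByValue(array $items, int $value): array {
    $items[] = $value; // this write makes PHP copy the array for the function
    return $items;
}

$original = [1, 2, 3];
$extended = appendByValue($original, 4);
// $original is still [1, 2, 3]; $extended is [1, 2, 3, 4]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;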

&lt;p&gt;A way around this issue is to receive the array as a reference. Here is the same example using a reference, followed by the time it took to process the same number of elements.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;EBANX\Stream\Stream&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Stream&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;rangeInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;([],&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nv"&gt;$acc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$acc&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$acc&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;real    0m0.136s
user    0m0.085s
sys 0m0.033s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  To finish it all up
&lt;/h2&gt;

&lt;p&gt;Stream is a very useful and addictive library. You can get more details about all of its methods by taking a look at our &lt;a href="https://github.com/ebanx/stream/blob/master/test/StreamTest.php"&gt;unit tests&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It is under an MIT license so feel free to go wild with it. Cheers.&lt;/p&gt;

</description>
      <category>php</category>
    </item>
    <item>
      <title>Flaws of PHP Iterators</title>
      <dc:creator>Bruno Clemente</dc:creator>
      <pubDate>Thu, 23 Jul 2020 01:05:12 +0000</pubDate>
      <link>https://forem.com/killertux/the-flaws-of-php-iterators-1afd</link>
      <guid>https://forem.com/killertux/the-flaws-of-php-iterators-1afd</guid>
      <description>&lt;p&gt;First of all, I need to point out that as it normally is when we talk about programming and design, this is my personal opinion. But, I will try to explain my position the best that I can.&lt;/p&gt;

&lt;p&gt;So, what is an Iterator? As the &lt;a href="https://en.wikipedia.org/wiki/Iterator"&gt;wiki page&lt;/a&gt; says, an iterator is an object that allows us to traverse a container. Another definition that I found on the internet is that an iterator is an abstraction of a pointer to an element of a sequence.&lt;/p&gt;

&lt;p&gt;Imagine that you have a container with some elements. Maybe a PHP array. An iterator will point to an element of that array and also provide ways so you can get a pointer to the next element of that array.&lt;/p&gt;

&lt;p&gt;It is worth mentioning that iterators can also be used with containers that do not really exist in memory. For example, you could create an iterator in which each element is the next odd number. This iterator points into an imaginary container that holds the infinite sequence of all odd numbers. Iterators are also used for lazy loading where, for example, you iterate over data that lives on a server but only load it at each iteration.&lt;/p&gt;
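
&lt;p&gt;As a rough sketch of that idea, a PHP generator (one of the ways to obtain an iterator in PHP) can produce odd numbers on demand. The function name &lt;code&gt;oddNumbers&lt;/code&gt; is hypothetical, and the consuming loop has to stop by itself because the sequence never ends.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;// Hypothetical generator: yields 1, 3, 5, ... without ever building an array.
function oddNumbers(): Generator {
    for ($n = 1; ; $n += 2) {
        yield $n;
    }
}

foreach (oddNumbers() as $odd) {
    if ($odd &amp;gt; 9) {
        break; // the caller decides when to stop
    }
    echo $odd, "\n"; // prints 1, 3, 5, 7 and 9, one per line
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;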

&lt;p&gt;Iterators appear in many languages like Java, C++, JavaScript, and Rust. But the journey for iterators in PHP started in PHP 4. That is when the foreach loop was added and you could use it with arrays with this syntax:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'first'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'second'&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$array&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$key&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="mf"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the next major version, iterators were added using two interfaces: Iterator and IteratorAggregate. I will not focus on the IteratorAggregate right now because it does not represent an iterator in itself but actually something that we can get an iterator from. &lt;/p&gt;

&lt;p&gt;The main focus here will be the Iterator interface. So, let's take a look into its methods:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;Iterator&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Traversable&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* Methods */&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;current&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="n"&gt;void&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;mixed&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;key&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="n"&gt;void&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;scalar&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;next&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="n"&gt;void&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;void&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;rewind&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="n"&gt;void&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;void&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;valid&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="n"&gt;void&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bool&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you implement this interface, your object can be iterated either by manually calling the next, current, and valid methods or by using the same foreach loop that you used for arrays. Inside a foreach, the following methods will be called:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cd"&gt;/** First Iteration */&lt;/span&gt;
&lt;span class="nb"&gt;rewind&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nf"&gt;valid&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nb"&gt;current&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nb"&gt;key&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="cd"&gt;/** Remaining Iterations until valid() returns false */&lt;/span&gt;
&lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nf"&gt;valid&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nb"&gt;current&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nb"&gt;key&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
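
&lt;p&gt;Written out by hand, that sequence looks roughly like the loop below. It uses PHP's built-in &lt;code&gt;ArrayIterator&lt;/code&gt; as a stand-in for any &lt;code&gt;Iterator&lt;/code&gt; implementation; the variable names are only illustrative.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;$iterator = new ArrayIterator(['first' =&amp;gt; 1, 'second' =&amp;gt; 2]);

$iterator-&amp;gt;rewind();                  // once, before the first iteration
while ($iterator-&amp;gt;valid()) {          // checked before every iteration
    $value = $iterator-&amp;gt;current();
    $key = $iterator-&amp;gt;key();
    echo "$key: $value\n";
    $iterator-&amp;gt;next();                // advance before the next valid() check
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;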



&lt;p&gt;And, to demonstrate, here is an implementation of an iterator that reads a file line by line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;FileIterator&lt;/span&gt; &lt;span class="k"&gt;implements&lt;/span&gt; &lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="nc"&gt;Iterator&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nv"&gt;$resource&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nv"&gt;$read_lines&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nv"&gt;$current_line&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$filename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;resource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;fopen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'r'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;read_lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;current_line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;rewind&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nb"&gt;rewind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;read_lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;fetchLine&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;next&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;fetchLine&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;read_lines&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;current_line&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;read_lines&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;valid&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nb"&gt;feof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;fetchLine&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;current_line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;rtrim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;fgets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="kc"&gt;PHP_EOL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that we know what iterators are, where we use them, and how they work in PHP, I want to point out three design flaws of PHP iterators. I will list them from what I think is the least serious to the most serious.&lt;/p&gt;

&lt;h3&gt;
  
  
  The key() method
&lt;/h3&gt;

&lt;p&gt;The first design flaw is the &lt;code&gt;key()&lt;/code&gt; method. My problem with it is that, because it is part of the interface, everyone who wants to create an iterator needs to implement some kind of key for each iteration.&lt;/p&gt;

&lt;p&gt;Sometimes this is useful. For example, in an iterator over a database table, we can use the table's primary key as the key.&lt;/p&gt;

&lt;p&gt;But imagine another iterator that iterates over a list of prime numbers. What would we use as a key? Look at the FileIterator example. There, we just increment a local property and use it as the key, which is a very common approach to this problem, even though we may never actually need it.&lt;/p&gt;

&lt;p&gt;I imagine that &lt;code&gt;key()&lt;/code&gt; exists because of &lt;code&gt;foreach&lt;/code&gt;. As I said earlier, &lt;code&gt;foreach&lt;/code&gt; started out as syntactic sugar for iterating over an array. Since every element of an array has an index, it makes sense to also offer an easy way to access the index of each element during the iteration. When the iterator interface was added, the PHP core team decided to make &lt;code&gt;foreach&lt;/code&gt; work with iterators too, and, to avoid changing the existing syntax, they added a key to the iterator.&lt;/p&gt;

&lt;h3&gt;
  
  
  The current() and valid() methods
&lt;/h3&gt;

&lt;p&gt;The problem with the &lt;code&gt;current()&lt;/code&gt; and &lt;code&gt;valid()&lt;/code&gt; methods is their horrible temporal coupling. We always need to call &lt;code&gt;valid()&lt;/code&gt; to check that the iterator is in a valid state before calling &lt;code&gt;current()&lt;/code&gt;. Calling &lt;code&gt;current()&lt;/code&gt; without that check is undefined behavior.&lt;/p&gt;
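
&lt;p&gt;To make that call order concrete, here is a minimal sketch of roughly what &lt;code&gt;foreach&lt;/code&gt; does for us, spelled out by hand. It uses SPL's &lt;code&gt;\ArrayIterator&lt;/code&gt; only as a convenient stand-in; it is not part of the FileIterator example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
// Roughly what foreach does for us, spelled out by hand.
$iterator = new ArrayIterator(['a', 'b', 'c']);

$iterator-&amp;gt;rewind();              // must be called first
while ($iterator-&amp;gt;valid()) {      // must be checked before current()
    echo $iterator-&amp;gt;key(), ' =&amp;gt; ', $iterator-&amp;gt;current(), PHP_EOL;
    $iterator-&amp;gt;next();            // advance, then loop back to valid()
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Nothing in the interface stops a caller from skipping one of these steps or calling them in a different order, which is exactly the coupling described above.&lt;/p&gt;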

&lt;p&gt;A slightly better approach would be to get rid of the &lt;code&gt;valid()&lt;/code&gt; method and add to the &lt;code&gt;current()&lt;/code&gt; method a way to signal that the iteration is over. An even better approach would be to also remove the &lt;code&gt;current()&lt;/code&gt; method and change the &lt;code&gt;next()&lt;/code&gt; method to return the value, as is done in other languages like JavaScript, Rust, and Python.&lt;/p&gt;

&lt;p&gt;Another annoying thing about having separate &lt;code&gt;valid()&lt;/code&gt; and &lt;code&gt;current()&lt;/code&gt; methods is that it is quite common to have to store extra data inside the iterator. We can see this in the FileIterator, where we store each line read when &lt;code&gt;next()&lt;/code&gt; is called so that &lt;code&gt;current()&lt;/code&gt; can return it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The rewind() method
&lt;/h3&gt;

&lt;p&gt;Now it is time for what I think is the worst design flaw: the &lt;code&gt;rewind()&lt;/code&gt; method. It resets the iterator so we can iterate over it again. The problem is that not all iterable things can or should be rewindable. Probably the correct approach would have been to have another interface, something like &lt;code&gt;\RewindableIterator&lt;/code&gt;, that extends the iterator interface with the rewind method.&lt;/p&gt;

&lt;p&gt;You might ask: which iterator is not rewindable? Well, a good example can be found in the &lt;code&gt;\Generator&lt;/code&gt; class. As stated in the &lt;a href="https://www.php.net/manual/en/language.generators.comparison.php"&gt;PHP manual&lt;/a&gt;, generators are forward-only iterators. This means that we cannot rewind a generator after we have started iterating over it. And what happens if we try to do so? We receive a horrible exception, which is a very nasty violation of the Liskov Substitution Principle.&lt;/p&gt;
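
&lt;p&gt;A small sketch of that failure, assuming a made-up &lt;code&gt;numbers()&lt;/code&gt; generator function: once the generator has advanced past its first &lt;code&gt;yield&lt;/code&gt;, calling &lt;code&gt;rewind()&lt;/code&gt; throws instead of resetting it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
// numbers() is a hypothetical generator used only for illustration.
function numbers(): Generator {
    yield 1;
    yield 2;
    yield 3;
}

$generator = numbers();
foreach ($generator as $number) {
    if ($number === 2) {
        break; // the generator is now past its first yield
    }
}

try {
    $generator-&amp;gt;rewind(); // not allowed anymore for a generator in this state
} catch (Exception $e) {
    echo get_class($e), ': ', $e-&amp;gt;getMessage(), PHP_EOL;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Any code that accepts an &lt;code&gt;\Iterator&lt;/code&gt; and calls &lt;code&gt;rewind()&lt;/code&gt; on it has to be prepared for this, even though the interface itself gives no hint of it.&lt;/p&gt;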

&lt;p&gt;Another thing that really grinds my gears is that we always need to call &lt;code&gt;rewind()&lt;/code&gt; before we start to iterate over an iterator. Calling any other method of an iterator without calling &lt;code&gt;rewind()&lt;/code&gt; first is also undefined behavior. Now, imagine that we have a method that receives an iterator. If we check whether the iterator is valid and it is not, we don't know whether that is because the iteration is over or because no one has called &lt;code&gt;rewind()&lt;/code&gt; on it yet. This gets worse because we can't call &lt;code&gt;rewind()&lt;/code&gt; on an already started generator, as doing so would throw an exception.&lt;/p&gt;

&lt;h3&gt;
  
  
  Better approaches
&lt;/h3&gt;

&lt;p&gt;Looking at how other languages handle this, I personally like two approaches.&lt;/p&gt;

&lt;p&gt;The first one comes from Rust. There, the iterator has only one required method: &lt;code&gt;next()&lt;/code&gt;. It returns an Option that can be either Some(value) or None. When None is returned, it means that the iteration is over. I played a little bit with this approach in this &lt;a href="https://github.com/killertux/riterator"&gt;project&lt;/a&gt;, where I basically rewrote the Rust iterator in PHP. I also added some adapters to better integrate it with the current PHP approach.&lt;/p&gt;
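
&lt;p&gt;As a rough idea of how that shape could look in PHP, here is a minimal sketch. The names &lt;code&gt;None&lt;/code&gt;, &lt;code&gt;NextOnlyIterator&lt;/code&gt;, and &lt;code&gt;Countdown&lt;/code&gt; are made up for this illustration and are not necessarily how the linked project does it; the code assumes PHP 8.0 or newer.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
// Hypothetical marker object meaning "the iteration is over".
final class None {}

// Hypothetical interface: a single method, no valid(), current(), key() or rewind().
interface NextOnlyIterator {
    /** Returns the next value, or a None instance when the iteration is over. */
    public function next(): mixed;
}

final class Countdown implements NextOnlyIterator {
    public function __construct(private int $from) {}

    public function next(): mixed {
        if ($this-&amp;gt;from &amp;lt;= 0) {
            return new None();
        }
        return $this-&amp;gt;from--; // return the current number, then decrement
    }
}

$countdown = new Countdown(3);
while (true) {
    $value = $countdown-&amp;gt;next();
    if ($value instanceof None) {
        break; // one call tells us both "is there a value?" and "what is it?"
    }
    echo $value, PHP_EOL;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With this shape there is no call order to get wrong, and the iterator does not need to keep the last value around just so a separate &lt;code&gt;current()&lt;/code&gt; can return it.&lt;/p&gt;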

&lt;p&gt;Another solution is the Python way. A Python iterator also advances through a single next method (&lt;code&gt;__next__()&lt;/code&gt;, usually called through the built-in &lt;code&gt;next()&lt;/code&gt;). What differs from Rust is that an exception (&lt;code&gt;StopIteration&lt;/code&gt;) is raised when we call next on a completed iterator. I personally have my problems with this because we are using an exception to control the flow of the program. But in a controlled way, it might be a good approach.&lt;/p&gt;

&lt;h3&gt;
  
  
  Summing up
&lt;/h3&gt;

&lt;p&gt;As stated above, I think that the PHP iterator has some major design flaws that make working with it quite annoying. In this article I tried to explain why, and I showed some other approaches. Nevertheless, I have to thank the PHP core team because, overall, I think that the PHP language has evolved a lot over the last few years.&lt;/p&gt;

</description>
      <category>php</category>
      <category>iterators</category>
      <category>design</category>
    </item>
    <item>
      <title>Playing with Rust to solve data-intensive problem</title>
      <dc:creator>Bruno Clemente</dc:creator>
      <pubDate>Tue, 31 Mar 2020 12:33:11 +0000</pubDate>
      <link>https://forem.com/killertux/playing-with-rust-to-solve-data-intensive-problem-2f14</link>
      <guid>https://forem.com/killertux/playing-with-rust-to-solve-data-intensive-problem-2f14</guid>
      <description>&lt;p&gt;As programmers, we need to be able to choose the correct tools to solve the problems that we face every day. One of those choices is which programming language to use. Sometimes a scripting language like Python, PHP, or JS will be best. At other times, we will need the robustness of a compiled language like Java, C++, or Rust. In this post, I will talk about my adventure in changing the stack to help me solve a problem.&lt;/p&gt;

&lt;p&gt;I work as a software engineer at EBANX. Here, most of our backend is written in PHP. Even though there is a lot of criticism about it out there, PHP is a great language that is scaling pretty well to meet our needs. But the lack of good multithreading can be a negative point when you want to do some very data- or CPU-intensive tasks.&lt;/p&gt;

&lt;p&gt;So, here at EBANX, we have a very critical operation that happens every day and it needs to be finished by noon. Since it is very important for our business, we are constantly trying to make it faster without sacrificing the business rules. One of the steps of this process was taking over 30 minutes, so I decided to play with it to see if I could get it to be faster.&lt;/p&gt;

&lt;p&gt;Now, this step is very data-intensive. Imagine that we have a payment (transaction) with a unique identifier, a date, a customer identifier, and an amount. We need to check that the sum of the amounts of all payments with the same customer identifier in the same month (month/year) does not surpass a specific threshold. To make matters worse, every day we may have more than a million transactions from a variety of customers and dates that need to be validated against the entire history of the company.&lt;/p&gt;

&lt;p&gt;In the past, we tried to solve this using our relational database. But as the company grew, it became too slow, so we changed to a different approach. Every day, we write a file with all payments that entered during that day's operation, each with a month-year identifier, the payment amount, and a customer identifier. Now we open all files from the company history and create a compound key from the month-year and the customer identifier. Then we add up all amounts from payments that match the keys from our current process and raise an alert if a total goes over our limit. All of this is done using PHP.&lt;/p&gt;
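
&lt;p&gt;The core of that idea can be sketched in a few lines of PHP. This is a simplified illustration, not the production code: the rows, the &lt;code&gt;|&lt;/code&gt; separator, and the limit are all made up, everything lives in memory instead of being read from files, and it sums every key instead of only the keys of the current process.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
// Hypothetical rows: [month-year, customer identifier, amount].
$payments = [
    ['2020-03', 'customer-1', 60.0],
    ['2020-03', 'customer-1', 50.0],
    ['2020-03', 'customer-2', 30.0],
    ['2020-02', 'customer-1', 20.0],
];
$limit = 100.0; // hypothetical threshold

// Accumulate one total per (month-year, customer) compound key.
$totals = [];
foreach ($payments as [$monthYear, $customerId, $amount]) {
    $key = $monthYear . '|' . $customerId;
    $totals[$key] = ($totals[$key] ?? 0.0) + $amount;
}

// Report every key whose total goes over the limit.
foreach ($totals as $key =&amp;gt; $total) {
    if ($total &amp;gt; $limit) {
        echo "Over the limit: $key ($total)", PHP_EOL;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;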

&lt;p&gt;So, my first decision was that I would not change this method. Even though we could think of ways of pre-caching some of those calculations, I wanted to keep the same approach so that I could simply swap the existing code with mine. Also, I decided to try writing it in Rust, since it is a compiled language focused on performance and safety.&lt;/p&gt;

&lt;p&gt;To have a baseline, I started by running the current code on my own machine to see how long it took. I ran it 5 times and it took 9.3 minutes on average. That was way faster than in our production environment, and I think it is due to my SSD being faster than whatever storage device is used in production.&lt;/p&gt;

&lt;p&gt;My next step was to rewrite all the current code from PHP to Rust. I wrote it as a standalone program that our current PHP code can call, waiting for the result by reading its standard output.&lt;br&gt;
I used some cool crates like &lt;a href="https://crates.io/crates/flate2"&gt;flate2&lt;/a&gt;, which lets you stream over a gzip-compressed file. Also, I wrote integration tests using &lt;a href="https://crates.io/crates/assert_cmd"&gt;assert_cmd&lt;/a&gt; to call my program and assert that it outputs the expected payments.&lt;/p&gt;

&lt;p&gt;Once it was finished, I did the same thing that I had done with the original PHP code: I ran it 5 times and took the average execution time. And it was... 9.1 minutes. Only 2.1% faster. Quite a disappointment, right? It makes sense, though: since the task does not perform a lot of heavy computation, the PHP interpreter was not a big bottleneck. But I did not give up. My next step was to use some multithreading.&lt;/p&gt;

&lt;p&gt;If you have never used Rust, it tries to be a fast language like C or C++, but with a lot of guarantees that help you as a programmer not to mess up. It does not make multithreading easy, but it makes it safer.&lt;/p&gt;

&lt;p&gt;I was using a HashMap to accumulate my values, where the key was a tuple with the month-year of the payment as the first element and the customer identifier as the second. For each key, I was accumulating the total amount. To add multithreading I could go two ways. The first would be to create a separate HashMap for each thread and then merge them at the end. This would avoid concurrency problems and having one thread wait for another to stop modifying the data. The other would be to share the same HashMap across all threads and handle the clashes. I decided to go with the second option using &lt;a href="https://crates.io/crates/dashmap"&gt;DashMap&lt;/a&gt;, a Rust crate that offers a concurrent hash map implementation that tries to minimize the number of locks. It has some pretty good benchmarks, as you can see in the following image taken from the official crate documentation. One factor that made me prefer this approach is that, in my use case, two threads should rarely be processing payments from the same customer in the same month at the same time, so clashes should be rare.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--X4RcEDzx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gag5v9nazw1x7jk3j6pn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--X4RcEDzx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gag5v9nazw1x7jk3j6pn.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another thing that I needed to think about is that, since the company has grown a lot, the newer files are way bigger than the old ones. So, one thread might get all the small files and finish quickly, while another gets stuck with all the big ones and takes a long time. To try to minimize this, I shuffle all files before dividing them among the threads. I can do this safely since the order of the files does not affect the result.&lt;/p&gt;
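
&lt;p&gt;The real implementation is in Rust, but the idea is language-independent, so here is a small PHP sketch of it. The file names and the worker count are invented, and dealing the shuffled files out round-robin is just one simple way to divide them, assumed here for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
// Hypothetical file list, oldest (smallest) to newest (biggest).
$files = ['2015.gz', '2016.gz', '2017.gz', '2018.gz', '2019.gz', '2020.gz'];
$workers = 3; // hypothetical number of threads

// Shuffle so that big and small files are spread randomly.
shuffle($files);

// Deal the shuffled files out round-robin, one bucket per worker.
$buckets = array_fill(0, $workers, []);
foreach ($files as $index =&amp;gt; $file) {
    $buckets[$index % $workers][] = $file;
}

foreach ($buckets as $worker =&amp;gt; $bucket) {
    echo "worker $worker: ", implode(', ', $bucket), PHP_EOL;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Shuffling does not guarantee a perfectly even split, it only makes it unlikely that one worker ends up with all the big files.&lt;/p&gt;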

&lt;p&gt;After it was done, it was time to test it. Going from 2 threads up to 8 (the number of virtual cores on my machine), I ran it 5 times and took the average. Below is the result of all tests. The average execution time for 8 threads was 2.6 minutes, 72% faster. Yay, real improvement :). I have not yet tested it on one of our production machines to see how it will perform there.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Yw8dUdO7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/yy2jc84pdxehr56nohel.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Yw8dUdO7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/yy2jc84pdxehr56nohel.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;My next step will be to rewrite everything to use async I/O and see whether this makes it even faster. After that, I will try changing the approach and doing some caching to try to get it from minutes to seconds.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>php</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
