<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Vladislav Zenin</title>
    <description>The latest articles on Forem by Vladislav Zenin (@vladzen13).</description>
    <link>https://forem.com/vladzen13</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F985400%2Feb4fd5b2-5db4-45c1-b41e-c64cab849173.png</url>
      <title>Forem: Vladislav Zenin</title>
      <link>https://forem.com/vladzen13</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/vladzen13"/>
    <language>en</language>
    <item>
      <title>Mastering Python Standard Library: infinite iterators of itertools</title>
      <dc:creator>Vladislav Zenin</dc:creator>
      <pubDate>Thu, 15 Dec 2022 23:30:00 +0000</pubDate>
      <link>https://forem.com/vladzen13/mastering-python-standard-library-infinite-iterators-of-itertools-5em7</link>
      <guid>https://forem.com/vladzen13/mastering-python-standard-library-infinite-iterators-of-itertools-5em7</guid>
      <description>&lt;p&gt;Let's continue our little exploration of the &lt;code&gt;itertools&lt;/code&gt; module.&lt;/p&gt;

&lt;p&gt;Today we'll have a look at 3 infinite iterator constructors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from itertools import count, cycle, repeat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  itertools.count
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;itertools.count&lt;/code&gt; is like &lt;code&gt;range&lt;/code&gt; without a stop value: lazy and endless.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;By the way, if you have never heard of &lt;strong&gt;laziness&lt;/strong&gt; &lt;em&gt;(well, I'm sure we have all heard of it and, moreover, practice it every day)&lt;/em&gt; - then you really should check it out, &lt;a href="https://en.wikipedia.org/wiki/Lazy_evaluation"&gt;at least briefly&lt;/a&gt;. Someday we will walk the path of David Beazley and his legendary 147-page "Generator Tricks For Systems Programmers", but not today. Today is for the basics.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Well, &lt;code&gt;count&lt;/code&gt; is super easy: it just counts up to infinity. &lt;em&gt;Or down to minus infinity, if the step is negative.&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def my_count(start=0, step=1):
    x = start
    while True:
        yield x
        x += step
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it.&lt;/p&gt;

&lt;p&gt;But there is a caveat. It never stops, so you can't &lt;em&gt;"consume"&lt;/em&gt; it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To consume an iterable is to read all of it at once, for example to store it in a list.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Well, actually, you can try, but this line will keep allocating memory until the machine &lt;strong&gt;freezes or the process is killed&lt;/strong&gt;. Ctrl+C may not help once the system starts swapping, and you may end up doing a hard reset. I did warn you ;)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;list(count())  # do not run this: it never finishes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, how am I supposed to work with it, if I can't call list/set/sum/etc. on it?&lt;/p&gt;

&lt;p&gt;First of all, you can iterate over it (and break out - when time comes):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for i in count(start=10, step=-1):
    print(i, end=", ")
    if i&amp;lt;=0: break

# 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0,
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Second, some programs never leave their endless loop, because they wait for something to happen: workers waiting for incoming tasks, HTTP servers waiting for incoming requests, etc. But we shall skip this case. For now.&lt;/p&gt;

&lt;p&gt;Finally, you can combine an infinite iterator with other lazy iterators: &lt;code&gt;map&lt;/code&gt;, &lt;code&gt;zip&lt;/code&gt;, &lt;code&gt;islice&lt;/code&gt;, &lt;code&gt;accumulate&lt;/code&gt;, etc.&lt;/p&gt;

&lt;p&gt;When iterators like &lt;code&gt;zip&lt;/code&gt; or &lt;code&gt;map&lt;/code&gt; iterate over multiple iterables at once, they &lt;strong&gt;finish as soon as any of the iterables finishes&lt;/strong&gt;. That gives us an &lt;em&gt;exit&lt;/em&gt; from an infinite iterator.&lt;/p&gt;

&lt;p&gt;Here is an example from &lt;code&gt;itertools.repeat&lt;/code&gt; docs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;list(map(pow, range(10), repeat(2)))
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our machine stays alive, although technically we "consume an infinite repeat with list". The point is that &lt;code&gt;range&lt;/code&gt; is finite, and &lt;code&gt;map&lt;/code&gt; finishes together with it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Infinite iterator rejects its infinity - just to finish together with some finite collection...&lt;br&gt;
Wow! Some serious &lt;a href="https://en.wikipedia.org/wiki/Highlander_(film)"&gt;Highlander&lt;/a&gt; &amp;amp; &lt;a href="https://www.youtube.com/watch?v=_Jtpf8N5IDE"&gt;Queen&lt;/a&gt; vibe around here ...&lt;/p&gt;
&lt;/blockquote&gt;
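
&lt;p&gt;The same exit works with &lt;code&gt;islice&lt;/code&gt; and &lt;code&gt;zip&lt;/code&gt;. Here is a small sketch; the names &lt;code&gt;first_squares&lt;/code&gt; and &lt;code&gt;numbered&lt;/code&gt; are made up for illustration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from itertools import count, islice

# islice stops after 5 items, so the endless count is never consumed whole
first_squares = [n * n for n in islice(count(1), 5)]
print(first_squares)  # [1, 4, 9, 16, 25]

# zip stops as soon as the shortest input (the list) runs out
numbered = list(zip(count(1), ["a", "b", "c"]))
print(numbered)  # [(1, 'a'), (2, 'b'), (3, 'c')]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;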

&lt;h2&gt;
  
  
  itertools.repeat
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;itertools.repeat&lt;/code&gt; is even easier than &lt;code&gt;itertools.count&lt;/code&gt;. It doesn't even count, it simply repeats the same value forever. There is also a form with a fixed number of repeats.&lt;/p&gt;

&lt;p&gt;According to &lt;code&gt;itertools&lt;/code&gt; docs, &lt;code&gt;itertools.repeat&lt;/code&gt; is roughly equivalent to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def repeat(object, times=None):
    # repeat(10, 3) --&amp;gt; 10 10 10
    if times is None:
        while True:
            yield object
    else:
        for i in range(times):
            yield object
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For "fixed" form and since python generator statements are also lazy, &lt;code&gt;itertools.repeat(42, 10)&lt;/code&gt; can be simplified as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;( 42 for _ in range(10) )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the infinite form we can't use &lt;code&gt;range&lt;/code&gt;, but one can notice that, for a numeric value, &lt;code&gt;itertools.repeat(x)&lt;/code&gt; yields the same values as &lt;code&gt;itertools.count(x, 0)&lt;/code&gt;, i.e. &lt;code&gt;count&lt;/code&gt; with step=0.&lt;/p&gt;

&lt;p&gt;I guess &lt;code&gt;repeat&lt;/code&gt; and &lt;code&gt;count&lt;/code&gt; add a little readability to your code, and they might also be considerably faster than generator expressions. However, it is not that easy to test the performance of iterators &lt;em&gt;(especially infinite ones :) )&lt;/em&gt;, since &lt;em&gt;they exhaust&lt;/em&gt;, and a performance test means running the same thing many times and comparing.&lt;/p&gt;

&lt;p&gt;Still, let us try:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In [49]: i1 = lambda: ( 42 for _ in range(100000) )

In [50]: i2 = lambda: repeat(42, 100000)

In [51]: %timeit sum(i1())
3.49 ms ± 36.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [52]: %timeit sum(i2())
333 µs ± 1.27 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;itertools.repeat&lt;/code&gt; seems to be &lt;strong&gt;10 times faster!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By the way, do you think that a performance test with a &lt;em&gt;"&lt;a href="https://en.wikipedia.org/wiki/Factory_method_pattern"&gt;lambda-style factory&lt;/a&gt;"&lt;/em&gt; is valid and the comparison is fair?&lt;/p&gt;
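
&lt;p&gt;One way to check it outside IPython is the standard &lt;code&gt;timeit&lt;/code&gt; module with plain functions. This is only a sketch: the helper names &lt;code&gt;sum_genexpr&lt;/code&gt; and &lt;code&gt;sum_repeat&lt;/code&gt; and the run count of 20 are arbitrary choices, and the timings will differ from machine to machine. Each call builds a fresh iterator, so nothing is exhausted between runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import timeit
from itertools import repeat

N = 100_000

def sum_genexpr():
    # a new generator expression is created on every call
    return sum(42 for _ in range(N))

def sum_repeat():
    # a new repeat object is created on every call
    return sum(repeat(42, N))

# timeit accepts a callable and runs it the requested number of times
t_gen = timeit.timeit(sum_genexpr, number=20)
t_rep = timeit.timeit(sum_repeat, number=20)
print(f"genexpr: {t_gen:.4f}s, repeat: {t_rep:.4f}s")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;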

&lt;h3&gt;
  
  
  Wait, what do you mean by "exhaust"?
&lt;/h3&gt;

&lt;p&gt;If you are &lt;em&gt;confused&lt;/em&gt; by "exhaust" in the previous section - then I'll show you only this ...&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In [3]: i = ( x for x in range(10) )

In [4]: sum(i)
Out[4]: 45

In [5]: sum(i)
Out[5]: 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;... and strongly encourage you to dive into &lt;a href="https://docs.python.org/3/howto/functional.html"&gt;Python Functional Programming HowTo&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  itertools.cycle
&lt;/h2&gt;

&lt;p&gt;An endless cycle over an iterable. Almost as simple as that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
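&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# cycle('ABCD') --&amp;gt; A B C D A B C D ...

def my_cycle(iterable):
    saved = []  # remember elements, so one-shot iterators can be replayed
    for element in iterable:
        yield element
        saved.append(element)
    while saved:  # an empty input ends the cycle instead of spinning forever
        yield from saved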
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
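
&lt;p&gt;One detail worth knowing: the real &lt;code&gt;itertools.cycle&lt;/code&gt; keeps a copy of the elements it has seen, so it also works with one-shot iterators. A tiny sketch (the names &lt;code&gt;one_shot&lt;/code&gt; and &lt;code&gt;laps&lt;/code&gt; are just for illustration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from itertools import cycle, islice

one_shot = iter("AB")  # an iterator can be walked only once

# cycle remembers what it has seen, so the second lap still works
laps = list(islice(cycle(one_shot), 5))
print(laps)  # ['A', 'B', 'A', 'B', 'A']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;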



&lt;p&gt;Despite its simplicity, it is &lt;strong&gt;very&lt;/strong&gt; convenient.&lt;/p&gt;

&lt;p&gt;I really love to rotate proxies/useragents/etc with &lt;code&gt;itertools.cycle&lt;/code&gt; for regular parsing/scraping of web pages.  &lt;/p&gt;

&lt;p&gt;For instance, you can define some "global" iterators:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PROXY_CYCLE = itertools.cycle(proxy_list)
UA_CYCLE = itertools.cycle(ua_list)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And each time you need to make a new request, you just ask "global" iterators for new proxy/ua values with &lt;code&gt;next&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;proxy = next(PROXY_CYCLE)
ua = next(UA_CYCLE)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It turns out to be a kind of distributed iteration: different parts of the program pull values at different times, but the iterator is shared. Iterator as a service, huh.&lt;/p&gt;

&lt;p&gt;It's like we defined a class &lt;code&gt;ProxyManager&lt;/code&gt; with a method &lt;code&gt;ProxyManager.get&lt;/code&gt;, which handles proxy rotation and selection. But instead of a &lt;code&gt;class&lt;/code&gt; we have &lt;code&gt;itertools.cycle&lt;/code&gt;, instead of &lt;code&gt;get&lt;/code&gt; we have &lt;code&gt;next&lt;/code&gt;, and instead of 10 lines of code - only 1. So do we really need to define a class? :)&lt;/p&gt;
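
&lt;p&gt;Put together, the idea might look like the sketch below. The proxy and user-agent values and the helper &lt;code&gt;next_request_settings&lt;/code&gt; are hypothetical, they only show how the shared cycles hand out values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from itertools import cycle

# Hypothetical values, for illustration only
proxy_list = ["proxy-a:8080", "proxy-b:8080"]
ua_list = ["agent-1", "agent-2", "agent-3"]

PROXY_CYCLE = cycle(proxy_list)
UA_CYCLE = cycle(ua_list)

def next_request_settings():
    """Return the next proxy / user-agent pair from the shared cycles."""
    return {"proxy": next(PROXY_CYCLE), "ua": next(UA_CYCLE)}

# Three "requests": the 2 proxies wrap around, the 3 user agents do not yet
picked = [next_request_settings() for _ in range(3)]
print([p["proxy"] for p in picked])  # ['proxy-a:8080', 'proxy-b:8080', 'proxy-a:8080']
print([p["ua"] for p in picked])     # ['agent-1', 'agent-2', 'agent-3']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;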

&lt;h2&gt;
  
  
  That's all, folks!
&lt;/h2&gt;

&lt;p&gt;Thank you for reading, hope you enjoyed! Consider subscribing - &lt;strong&gt;we shall go deeper&lt;/strong&gt; :) &lt;/p&gt;

&lt;h3&gt;
  
  
  Anything else to read?
&lt;/h3&gt;

&lt;p&gt;Always.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.python.org/3/howto/functional.html"&gt;Python Functional Programming HowTo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.dabeaz.com/generators/Generators.pdf"&gt;For bravehearts&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.python.org/3/library/itertools.html"&gt;Of course, itertools module docs&lt;/a&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>tutorial</category>
      <category>python</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Mastering Python Standard Library: itertools.chain</title>
      <dc:creator>Vladislav Zenin</dc:creator>
      <pubDate>Sat, 10 Dec 2022 15:30:00 +0000</pubDate>
      <link>https://forem.com/vladzen13/mastering-python-standard-library-itertoolschain-1e07</link>
      <guid>https://forem.com/vladzen13/mastering-python-standard-library-itertoolschain-1e07</guid>
      <description>&lt;p&gt;Imagine you need to iterate over several iterables, one after another.&lt;/p&gt;

&lt;p&gt;For example, you have two lists: l1 and l2.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In [2]: l1 = list(range(5))
In [3]: l2 = list(range(10))

In [4]: l1
Out[4]: [0, 1, 2, 3, 4]

In [5]: l2
Out[5]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the easiest way to do so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for i in l1+l2: print(i, end=", ")
# 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, it may not be the best one. The &lt;code&gt;l1+l2&lt;/code&gt; expression is a list concatenation, and it gives you a &lt;em&gt;new list&lt;/em&gt; with &lt;code&gt;len(l1+l2) == len(l1) + len(l2)&lt;/code&gt;. If you are sure that both lists are rather small, then it's kinda okay. &lt;/p&gt;

&lt;p&gt;But let us assume each of them takes 1GB of RAM. In the worst case your program will peak at about 4GB, twice the size of the input lists. (Strictly speaking, concatenation copies references rather than the objects themselves, so the real overhead depends on what the lists hold.) And what if you don't have much RAM? Maybe your code runs in AWS Lambda, etc.&lt;/p&gt;

&lt;p&gt;Actually, we want to do something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def gen(l1, l2):
    yield from l1
    yield from l2

for i in gen(l1,l2): print(i, end=", ")
# 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No new lists, no copies, no memory overhead. Just iterate over the first list and then iterate over the second one. &lt;/p&gt;

&lt;p&gt;That &lt;code&gt;gen&lt;/code&gt; iterator is already written for you. It is called &lt;code&gt;itertools.chain&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import itertools

for i in itertools.chain(l1,l2): print(i, end=", ")
# 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
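
&lt;p&gt;To get a rough feeling for the difference, one can compare the size of the concatenated list with the size of the &lt;code&gt;chain&lt;/code&gt; object. A sketch only: &lt;code&gt;sys.getsizeof&lt;/code&gt; reports the container itself, not the elements, and the exact numbers vary between Python versions. The list lengths here are arbitrary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sys
from itertools import chain

l1 = list(range(1000))
l2 = list(range(2000))

concatenated = l1 + l2    # a brand-new list with 3000 references
chained = chain(l1, l2)   # a small iterator that only points at l1 and l2

# Sizes are in bytes and vary between Python versions
print(sys.getsizeof(concatenated), sys.getsizeof(chained))
print(sum(chained) == sum(concatenated))  # True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;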



&lt;p&gt;By the way, there is another form of &lt;code&gt;itertools.chain&lt;/code&gt;: &lt;code&gt;itertools.chain.from_iterable&lt;/code&gt;. It does the same job, but takes a single iterable of iterables instead of unpacked arguments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for i in itertools.chain.from_iterable([l1, l2]): print(i, end=", ")
# 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, in general:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# this is itertools.chain
def my_chain(*collections):
    for collection in collections:
        yield from collection

# this is itertools.chain.from_iterable
def my_chain_from_iterable(collections):
    for collection in collections:
        yield from collection
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
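
&lt;p&gt;One place where the two forms behave differently is a lazy or endless outer iterable: &lt;code&gt;chain.from_iterable&lt;/code&gt; reads it step by step, while the "*" form has to unpack all of it first. A small sketch, where the generator &lt;code&gt;endless_rows&lt;/code&gt; is a made-up example source:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from itertools import chain, count, islice

def endless_rows():
    # Hypothetical endless source: yields [0, 0], [1, 1], [2, 2], ...
    for n in count():
        yield [n, n]

# chain(*endless_rows()) would never return: unpacking needs the whole generator.
flat = chain.from_iterable(endless_rows())
first_five = list(islice(flat, 5))
print(first_five)  # [0, 0, 1, 1, 2]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;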



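&lt;p&gt;Why are there 2 chains with one tiny "*" difference? One practical reason I can see: &lt;code&gt;chain.from_iterable&lt;/code&gt; reads the outer iterable lazily, while &lt;code&gt;chain(*collections)&lt;/code&gt; has to unpack all of it up front. Beyond that, who am I to judge the authors of the itertools module, &lt;em&gt;they are true gods&lt;/em&gt;.&lt;/p&gt;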

&lt;p&gt;But I do know that &lt;a href="https://en.wikipedia.org/wiki/Occam%27s_razor"&gt;"entities should not be multiplied beyond necessity"&lt;/a&gt;. And this thought brings us back to our unnecessary extra list creation issue.&lt;/p&gt;

&lt;h2&gt;
  
  
  So what’s the point?
&lt;/h2&gt;

&lt;p&gt;Well, use chain! Learn the &lt;code&gt;itertools&lt;/code&gt; module. Think about performance. &lt;em&gt;Save memory&lt;/em&gt;: in a production environment it is actually limited and &lt;a href="https://cloud.google.com/functions/pricing#compute_time"&gt;not really cheap&lt;/a&gt;! &lt;/p&gt;

&lt;h3&gt;
  
  
  Anything else to read?
&lt;/h3&gt;

&lt;p&gt;Sure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.python.org/3/library/index.html"&gt;Whole lotta docs&lt;/a&gt; - Master the power of standard library!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.python.org/3/library/itertools.html"&gt;Itertools module docs&lt;/a&gt; - chain is not the only one, there are plenty more&lt;/p&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Occam%27s_razor"&gt;Occam's Razor&lt;/a&gt; - really, read it&lt;/p&gt;

</description>
      <category>python</category>
      <category>beginners</category>
      <category>performance</category>
      <category>programming</category>
    </item>
    <item>
      <title>Tricky Unpacking In Python</title>
      <dc:creator>Vladislav Zenin</dc:creator>
      <pubDate>Wed, 07 Dec 2022 08:03:51 +0000</pubDate>
      <link>https://forem.com/vladzen13/tricky-unpacking-in-python-3m6a</link>
      <guid>https://forem.com/vladzen13/tricky-unpacking-in-python-3m6a</guid>
      <description>&lt;p&gt;Imagine you iterate through a collection that contains other collections.&lt;/p&gt;

&lt;p&gt;For example, a list of lists:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In [32]: L = [ [i] for i in range(10) ]

In [33]: L
Out[33]: [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One obvious way to iterate over inner values is to use indexing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In [24]: [ i[0] for i in L ]
Out[24]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Well, there is another way to do so. Almost so :)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In [24]: [ i for i, in L ]
Out[24]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In fact, it is single-element unpacking. It works because in Python &lt;strong&gt;commas&lt;/strong&gt; "construct" tuples, not &lt;em&gt;parentheses&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In [29]: 5,
Out[29]: (5,)

In [30]: (5)
Out[30]: 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
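
&lt;p&gt;The same trick works in a plain assignment. A short sketch; the names &lt;code&gt;row&lt;/code&gt;, &lt;code&gt;value&lt;/code&gt; and &lt;code&gt;same&lt;/code&gt; are just for illustration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;row = [42]

value, = row      # trailing comma: unpack exactly one element
[same] = row      # list-style target, same effect and easier to spot
print(value, same)  # 42 42
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;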



&lt;h2&gt;
  
  
  Are there any differences?
&lt;/h2&gt;

&lt;p&gt;Yep.&lt;/p&gt;

&lt;p&gt;This unpacking seems to be a bit faster than reading by index: about 10% in this test.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In [24]: L = [ [i] for i in range(1000) ]

In [25]: %timeit [ i for i, in L ]
19.7 µs ± 31.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [26]: %timeit [ i[0] for i in L ]
22.1 µs ± 150 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is also a difference in behavior.&lt;/p&gt;

&lt;p&gt;If one of the inner lists is empty, both statements fail, but with different exceptions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In [30]: [ i[0] for i in L+[[]] ]
# IndexError: list index out of range

In [31]: [ i for i, in L+[[]] ]
# ValueError: not enough values to unpack (expected 1, got 0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, if we have more than 1 element in any of inner lists, then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unpacking will fail with &lt;code&gt;ValueError: too many values to unpack (expected 1)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;and reading by index will &lt;em&gt;silently&lt;/em&gt; return first elements of lists&lt;/li&gt;
&lt;/ul&gt;
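
&lt;p&gt;Both behaviors from the list above can be reproduced with a tiny input. A sketch with a made-up &lt;code&gt;rows&lt;/code&gt; list; the exact wording of the error message may differ slightly between Python versions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rows = [[1], [2, 3]]

by_index = [row[0] for row in rows]   # silently drops the 3
print(by_index)  # [1, 2]

message = None
try:
    by_unpacking = [x for x, in rows]  # insists on exactly one element per row
except ValueError as error:
    message = str(error)
print(message)  # the "too many values to unpack" error text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;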

&lt;p&gt;&lt;a href="https://peps.python.org/pep-0020/" rel="noopener noreferrer"&gt;"Explicit is better than implicit"&lt;/a&gt; - they say, huh?&lt;/p&gt;

&lt;p&gt;Hope you enjoyed! :)&lt;/p&gt;

</description>
      <category>watercooler</category>
    </item>
    <item>
      <title>A neat trick: compressing CSV files on the fly in pandas</title>
      <dc:creator>Vladislav Zenin</dc:creator>
      <pubDate>Tue, 06 Dec 2022 16:12:18 +0000</pubDate>
      <link>https://forem.com/vladzen13/prikolnyi-triuk-szhatiie-csv-failov-na-lietu-v-pandas-1oo4</link>
      <guid>https://forem.com/vladzen13/prikolnyi-triuk-szhatiie-csv-failov-na-lietu-v-pandas-1oo4</guid>
      <description>&lt;p&gt;pandas is a great tool for working with data in Python, and CSV is the de facto standard format for storing data in data science (and in many other places).&lt;/p&gt;

&lt;p&gt;However, CSV files can take up &lt;em&gt;a looot&lt;/em&gt; of space. If you save intermediate data or regularly export data from a DBMS, the number of these files can also grow quickly.&lt;/p&gt;

&lt;p&gt;If you often have to move files over the network between different environments - servers, a workstation, Google Colab, Kaggle - the process can turn into a real headache. Big files take a long time to transfer, disk space on these services runs out quickly, and they start asking you to upgrade your account and raise the limits.&lt;/p&gt;

&lt;p&gt;But there is a solution, and a surprisingly simple and convenient one! &lt;/p&gt;

&lt;h4&gt;
  
  
  So, we have a relatively large CSV file.
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;user@d14 /tmp # ls -lh data.csv
-rw-r--r-- 1 datascience datascience 226M Dec  5 16:07 data.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's open our 226MB file in pandas:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas

df = pandas.read_csv('data.csv', index_col=0)

df.info()

# &amp;lt;class 'pandas.core.frame.DataFrame'&amp;gt;
# Int64Index: 42367 entries, 0 to 42429
# Columns: 240 entries
# dtypes: bool(4), float64(178), int64(25), object(33)
# memory usage: 76.8+ MB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, the data is quite varied: lots of ints and floats, plus some strings. The strings range from short ones to sizeable JSON objects of several kilobytes.&lt;/p&gt;

&lt;p&gt;Now let's look at the documentation: &lt;code&gt;pandas.read_csv?&lt;/code&gt; / &lt;a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html"&gt;pandas.pydata.org&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;compression : str or dict, default 'infer'&lt;br&gt;
    If str, represents compression mode. If dict, value at 'method' is&lt;br&gt;
    the compression mode. Compression mode may be any of the following&lt;br&gt;
    possible values: {'infer', 'gzip', 'bz2', 'zip', 'xz', None}. If&lt;br&gt;
    compression mode is 'infer' and &lt;code&gt;path_or_buf&lt;/code&gt; is path-like, then&lt;br&gt;
    detect compression mode from the following extensions: '.gz',&lt;br&gt;
    '.bz2', '.zip' or '.xz'. (otherwise no compression). &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In other words, CSV files can be compressed and decompressed on the fly, and &lt;strong&gt;all it takes is a file name with the right extension&lt;/strong&gt; ('.gz', '.bz2', '.zip' or '.xz'). You don't even need to turn on a flag: &lt;strong&gt;this is the default behavior&lt;/strong&gt;.&lt;/p&gt;
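
&lt;p&gt;A minimal round-trip sketch, assuming pandas is installed. The tiny DataFrame and the file names &lt;code&gt;demo.csv.gz&lt;/code&gt; and &lt;code&gt;demo.data&lt;/code&gt; are made up for illustration. It shows the inferred mode next to the explicit &lt;code&gt;compression&lt;/code&gt; argument, which helps when a file name lacks a telling extension:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import tempfile

import pandas

df = pandas.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

with tempfile.TemporaryDirectory() as tmp:
    # Compression is inferred from the ".gz" extension
    path = os.path.join(tmp, "demo.csv.gz")
    df.to_csv(path)
    restored = pandas.read_csv(path, index_col=0)

    # The same thing spelled out, for a file without a telling extension
    odd_path = os.path.join(tmp, "demo.data")
    df.to_csv(odd_path, compression="gzip")
    restored_explicit = pandas.read_csv(odd_path, index_col=0, compression="gzip")

print(restored["a"].tolist(), restored_explicit["b"].tolist())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;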

&lt;p&gt;Let's try!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;exts = '', '.gz', '.bz2', '.zip', '.xz'

for ext in exts: df.to_csv(f'test_compression.csv{ext}')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yes, compression took some time. Let's look at the result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;user@d14 /tmp # ls -lh test_compression.csv*
-rw-r--r-- 1 user user 223M Dec  6 09:28 test_compression.csv
-rw-r--r-- 1 user user  38M Dec  6 09:29 test_compression.csv.bz2
-rw-r--r-- 1 user user  47M Dec  6 09:29 test_compression.csv.gz
-rw-r--r-- 1 user user  29M Dec  6 09:30 test_compression.csv.xz
-rw-r--r-- 1 user user  48M Dec  6 09:29 test_compression.csv.zip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Wow!&lt;/strong&gt; The xz file is about 7.7 times smaller (223M down to 29M)! Think how much traffic, upload/download time, nerves and disk space that can save!&lt;/p&gt;

&lt;p&gt;Of course, reading is just as simple as saving:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df = pandas.read_csv('test_compression.csv.xz', index_col=0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
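
&lt;p&gt;A &lt;code&gt;.csv.xz&lt;/code&gt; file is just CSV text inside an xz stream, so it can be produced and read without pandas as well. A small sketch using only the standard library; the file name &lt;code&gt;demo.csv.xz&lt;/code&gt; and the sample rows are made up for illustration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import csv
import lzma
import os
import tempfile

rows = [["id", "name"], ["1", "alice"], ["2", "bob"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv.xz")
    # "wt" / "rt" open the compressed file in text mode
    with lzma.open(path, "wt", newline="") as handle:
        csv.writer(handle).writerows(rows)
    with lzma.open(path, "rt", newline="") as handle:
        restored = list(csv.reader(handle))

print(restored == rows)  # True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;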



&lt;h2&gt;
  
  
  What about load time?
&lt;/h2&gt;

&lt;p&gt;There must be a catch! Maybe every read takes half an hour? Let's check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;%timeit pandas.read_csv('test_compression.csv', index_col=0)
1.58 s ± 2.59 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit pandas.read_csv('test_compression.csv.bz2', index_col=0)
6.16 s ± 5.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit pandas.read_csv('test_compression.csv.gz', index_col=0)
2.18 s ± 4.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit pandas.read_csv('test_compression.csv.xz', index_col=0)
3.14 s ± 6.34 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit pandas.read_csv('test_compression.csv.zip', index_col=0)
2.16 s ± 3.71 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No half-hour wait :)&lt;/p&gt;

&lt;p&gt;It looks like you can simply append &lt;code&gt;.xz&lt;/code&gt; to the names of your CSV files: reading gets about twice as slow (3.14 s instead of 1.58 s here), but the file gets more than 7 times smaller. A good default habit.&lt;/p&gt;

&lt;p&gt;The best way not to miss new posts is to subscribe to the &lt;a href="https://t.me/traceback_ru"&gt;Telegram channel&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>python</category>
      <category>beginners</category>
      <category>tutorial</category>
      <category>linux</category>
    </item>
  </channel>
</rss>
