<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: CoderHXL</title>
    <description>The latest articles on Forem by CoderHXL (@coderhxl).</description>
    <link>https://forem.com/coderhxl</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1048025%2F81e6a303-2dc2-4eac-9633-d1877e2f181d.jpg</url>
      <title>Forem: CoderHXL</title>
      <link>https://forem.com/coderhxl</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/coderhxl"/>
    <language>en</language>
    <item>
      <title>Traditional crawler or AI-assisted crawler? How to choose?</title>
      <dc:creator>CoderHXL</dc:creator>
      <pubDate>Tue, 23 Apr 2024 06:33:52 +0000</pubDate>
      <link>https://forem.com/coderhxl/traditional-crawler-or-ai-assisted-crawler-how-to-choose-46cc</link>
      <guid>https://forem.com/coderhxl/traditional-crawler-or-ai-assisted-crawler-how-to-choose-46cc</guid>
      <description>&lt;h2&gt;
  
  
  Foreword
&lt;/h2&gt;

&lt;p&gt;In the field of data fetching, traditional crawlers and AI-assisted crawlers have their own advantages. The traditional crawler crawls data based on rules, which is suitable for websites with stable structure and clear rules. However, with the frequent changes and complications of the website structure, traditional crawlers have gradually exposed their limitations. In contrast, AI-assisted crawlers use artificial intelligence technology to intelligently analyze web pages and adapt to changes, showing higher flexibility and accuracy. So, in the face of different grasping needs, how should we choose? This article will deeply discuss the characteristics and advantages and disadvantages of traditional crawlers and AI-assisted crawlers, so as to provide you with decision-making reference.&lt;br&gt;
What are the traditional crawlers and AI-assisted crawlers respectively&lt;/p&gt;
&lt;h3&gt;
  
  
  Traditional crawler
&lt;/h3&gt;

&lt;p&gt;Traditional crawlers mainly rely on fixed rules or patterns to grab web data. They typically locate and extract the required information by identifying specific elements in a web page, such as class names, tags, or structures. However, the limitations of this approach are obvious. Once a site is updated to change the original class name, label, or structure, traditional crawlers will fail because they cannot recognize the new elements, resulting in data fetching failures or errors.&lt;/p&gt;
&lt;h3&gt;
  
  
  AI-assisted crawler
&lt;/h3&gt;

&lt;p&gt;Ai-assisted crawlers intelligently analyze and understand the content of web pages to more accurately locate and extract the required information. Through technologies such as natural language processing, they are able to understand the semantic information of the web page, so that they can more precisely locate the required data, and even after the website is updated, AI-assisted crawlers can continue to effectively crawl data.&lt;/p&gt;
&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;

&lt;p&gt;The crawler uses &lt;a href="https://github.com/coder-hxl/x-crawl"&gt;x-crawl&lt;/a&gt;. The crawled websites are all real. To avoid disputes, &lt;a href="https://www.example.com"&gt;https://www.example.com&lt;/a&gt; is used instead.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traditional crawler to obtain movie information of movie rankings through specific elements in the web page&lt;/li&gt;
&lt;li&gt;Crawler + AI, the crawler is paired with AI to obtain movie information from the movie rankings&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Traditional crawler
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Traditional crawler, obtains movie information of movie rankings through specific elements in the webpage&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createCrawl&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-crawl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;//Create a crawler application&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;crawlApp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createCrawl&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;// crawlPage is used to crawl pages&lt;/span&gt;
&lt;span class="nx"&gt;crawlApp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;crawlPage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.example.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;

   &lt;span class="c1"&gt;// Wait for the element to appear on the page&lt;/span&gt;
   &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitForSelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#wrapper #content .article&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filmHandleList&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="nf"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#wrapper #content .article table&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pendingTask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
   &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filmHandle&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;filmHandleList&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
     &lt;span class="c1"&gt;// Cover link (picture)&lt;/span&gt;
     &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;picturePending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;filmHandle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;td img&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;src&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
     &lt;span class="c1"&gt;//Movie name(name)&lt;/span&gt;
     &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;namePending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;filmHandle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;td:nth-child(2) a&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; / &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
     &lt;span class="p"&gt;)&lt;/span&gt;
     &lt;span class="c1"&gt;// Introduction (info)&lt;/span&gt;
     &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;infoPending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;filmHandle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;td:nth-child(2) .pl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt;
     &lt;span class="p"&gt;)&lt;/span&gt;
     &lt;span class="c1"&gt;// score&lt;/span&gt;
     &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scorePending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;filmHandle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;td:nth-child(2) .star .rating_nums&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt;
     &lt;span class="p"&gt;)&lt;/span&gt;
     &lt;span class="c1"&gt;// Number of comments (commentsNumber)&lt;/span&gt;
     &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;commentsNumberPending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;filmHandle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;td:nth-child(2) .star .pl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\(&lt;/span&gt;&lt;span class="sr"&gt;|&lt;/span&gt;&lt;span class="se"&gt;\)&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
     &lt;span class="p"&gt;)&lt;/span&gt;

     &lt;span class="nx"&gt;pendingTask&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
       &lt;span class="nx"&gt;namePending&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="nx"&gt;picturePending&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="nx"&gt;infoPending&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="nx"&gt;scorePending&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="nx"&gt;commentsNumberPending&lt;/span&gt;
     &lt;span class="p"&gt;])&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;

   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filmInfoResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
   &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
   &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;pendingTask&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
     &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
       &lt;span class="c1"&gt;// filmInfo is a movie information object, the order is determined before&lt;/span&gt;
       &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filmInfo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
         &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;picture&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;info&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;score&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;commentsNumber&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
       &lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;pre&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
         &lt;span class="nx"&gt;pre&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
         &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;pre&lt;/span&gt;
       &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;

       &lt;span class="c1"&gt;//Save each movie information&lt;/span&gt;
       &lt;span class="nx"&gt;filmInfoResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filmInfo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

       &lt;span class="c1"&gt;//Last processing&lt;/span&gt;
       &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pendingTask&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
         &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

         &lt;span class="c1"&gt;// Arrange, decide whether it is more or less based on the quantity&lt;/span&gt;
         &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filmResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
           &lt;span class="na"&gt;element&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;filmInfoResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;filmInfoResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;multiple&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;single&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
         &lt;span class="p"&gt;}&lt;/span&gt;

         &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filmResult&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="p"&gt;}&lt;/span&gt;
     &lt;span class="p"&gt;})&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  AI assisted crawler
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Crawler + AI, let crawler and AI obtain movie information from movie rankings&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createCrawl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;createCrawlOpenAI&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-crawl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;//Create a crawler application&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;crawlApp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createCrawl&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;//Create AI application&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;crawlOpenAIApp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createCrawlOpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
   &lt;span class="na"&gt;clientOptions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
   &lt;span class="na"&gt;defaultModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;chatModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gpt-4-turbo-preview&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// crawlPage is used to crawl pages&lt;/span&gt;
&lt;span class="nx"&gt;crawlApp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;crawlPage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://movie.douban.com/chart&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;

   &lt;span class="c1"&gt;// Wait for the element to appear on the page and get the HTML&lt;/span&gt;
   &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitForSelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#wrapper #content .article&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;targetHTML&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
     &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#wrapper #content .article&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;outerHTML&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

   &lt;span class="c1"&gt;// Let AI obtain movie information (the more detailed the description, the better)&lt;/span&gt;
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filmResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;crawlOpenAIApp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parseElements&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
     &lt;span class="nx"&gt;targetHTML&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;`This is a list of movies. You need to get the movie name (name), cover link (picture), introduction (info), rating (score), and number of comments (commentsNumber). Use bracketed words as attribute names`&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filmResult&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Results of two examples
&lt;/h4&gt;

&lt;p&gt;Movie information ultimately displayed by two examples&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"elements"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Old Fox"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"picture"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2900908599.webp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2023-10-27 (Tokyo International Film Festival) / 2023-11-24 (Taiwan, China) / Bai Runyin / Liu Guanting / Chen Muyi / Liu Yier / Kadowaki Mai / Huang Jianwei / Wen Shenghao / Ban Tiexiang / Yang Liyin / Fu Mengbai/Gao Yingxuan/Zhuang Yizeng/Zhang Zaixing/Xu Bowei/Guan Qing/Zhong Yao/You Jiaxuan/Zheng Yangen/Dai Yazhi/Jiang Ren/Xiao Hongwen..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"8.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"commentsNumber"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"29211 people commented"&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Robot Dream"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"picture"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2899644068.webp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2023-05-20(Cannes Film Festival) / 2023-12-06(Spain) / 2024(Mainland China) / Ivan Labanda/Albert Trevor Segarra/ Rafa Calvo/José Garcia Toss/José Luis Mediavilla/Garcia Molina/Esther Sollance/Spain/France/Pablo· Berger..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"9.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"commentsNumber"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"64650 people commented"&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Under the sun"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"picture"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2904961420.webp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2023-06-11 (Shanghai International Film Festival) / 2023-11-02 (Hong Kong, China) / 2024-04-12 (Mainland China) / David Jiang / Yu Xiangning / Lin Baoyi / Liang Zhongheng / Chen Zhanwen / Zhou Hanning / Liang Yongting/Gong Cien/Bao Peiru/Zhu Baiqian/Zhu Baikang/Xu Yuexiang/Hu Feng/Bao Qijing/Gao Hanwen/Peng Xingying/Luo Haoming/Tan Yuying..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"8.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"commentsNumber"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"36540 people commented"&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Poor thing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"picture"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2897662939.webp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2023-09-01 (Venice Film Festival) / 2023-12-08 (USA) / Emma Stone / Mark Ruffalo / Willem Dafoe / Rami Yusuf / Christopher Abbott/Susie Bemba/Jerrod Carmichael/Katherine Hunter/Vicki Pepperdine/Margaret Qualley/Hannah Schigula/Jack Patton... ."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"7.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"commentsNumber"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"130113 people commented"&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Perfect Day"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"picture"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2898894527.webp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2023-05-25 (Cannes Film Festival) / 2023-12-21 (Germany) / 2023-12-22 (Japan) / Yakusho Koji / Emoto Tokio / Nakano Arisa / Yamada Aoi / Aso Yumi/Sayuri Ishikawa/Tomokazu Miura/Mini Tanaka/Hiroto Oshita/Inuko Inuyama/Motomi Makiguchi/Tan Nagai/Ken Naoko/Moro Shioka/Ken Moriyu/Iru Katagiri/Xinguto Serizawa..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"8.3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"commentsNumber"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"33562 people commented"&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"New Dragon Killing Formation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"picture"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2905374090.webp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-03-08 (South by Southwest Film Festival) / 2024-03-21 (USA Network) / Jake Gyllenhaal / Conor McGregor / Jessica Williams / Billy Magnussen/Daniela Melchior/Jimisola Ekumero/Lucas Gage/Travis Van Winkle/Darren Barnett/Joe Quem de Almeida..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"6.3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"commentsNumber"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"9980 people commented"&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Seoul Spring"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"picture"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2905204009.webp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2023-11-22 (South Korea) / Hwang Jung-min / Jung Woo-sung / Lee Sung-min / Park Hae-jun / Kim Sung-joon / Park Hoon / An Se-ho / Jung Yun-ha / Jung Hae-in / Nam Yun-ho / Jeon Soo-ji / South Korea / Kim Sung-soo / 141 Minutes/Seoul Spring/Drama/Sung-su Kim/Korean"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"8.8"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"commentsNumber"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"171858 people commented"&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Goldfinger"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"picture"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2901830629.webp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2023-12-30 (Mainland China) / Tony Leung / Andy Lau / Charlene Choi / Simon Yam / Alex Fong / Chan Ka Lok / Bai Zhi / Jiang Haowen / Pacific Insurance / Chin Ka Lok / Anita Yuen / Chow Jiayi / Sam Jiaqi / Li Jingjun / Ng Siu Hin / Ke Weilin / Feng Yongxian / Du Yaoyu/Li Jiancheng/Gu Yongfeng/Hong Kong, China/Mainland China/Chuang Wenqiang..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"6.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"commentsNumber"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"135956 people commented"&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"American novel"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"picture"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2902166424.webp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2023-09-08(Toronto International Film Festival) / 2023-12-15(USA) / Jeffrey Wright/ Tracey Ellis Ross/ John Ortiz/ Issa· Ray/Sterling K. Brown/Erica Alexander/Leslie Goseth/Adam Brody/Keith David/Myra Lucretia Taylor/Raymond Anthony Thomas..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"7.7"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"commentsNumber"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"26223 people commented"&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"interest area"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"picture"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2899514583.webp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2023-05-19 (Cannes Film Festival) / 2023-12-15 (USA) / Christian Fouridower / Sandra Wheeler / John Carter House / Ralph Herbert Erfurt/Freya Kreutzkamm/Max Baker/Imorgan Kugger/Stephanie Petrowicz/Ralph Zillman/Mary Rosa Tityan. .."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"7.4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"commentsNumber"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"24875 people commented"&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"multiple"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Compare
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Steps required for traditional crawlers to extract information
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pendingTask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filmHandle&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;filmHandleList&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;picturePending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;filmHandle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;td img&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;src&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;namePending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;filmHandle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
     &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;td:nth-child(2) a&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; / &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;infoPending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;filmHandle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
     &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;td:nth-child(2) .pl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scorePending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;filmHandle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
     &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;td:nth-child(2) .star .rating_nums&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;commentsNumberPending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;filmHandle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
     &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;td:nth-child(2) .star .pl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\(&lt;/span&gt;&lt;span class="sr"&gt;|&lt;/span&gt;&lt;span class="se"&gt;\)&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="nx"&gt;pendingTask&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
     &lt;span class="nx"&gt;namePending&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="nx"&gt;picturePending&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="nx"&gt;infoPending&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="nx"&gt;scorePending&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="nx"&gt;commentsNumberPending&lt;/span&gt;
   &lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filmInfoResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;pendingTask&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
     &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filmInfo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
       &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;picture&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;info&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;score&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;commentsNumber&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;reduce&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;
       &lt;span class="nx"&gt;any&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
       &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;pre&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
         &lt;span class="nx"&gt;pre&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
         &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;pre&lt;/span&gt;
       &lt;span class="p"&gt;},&lt;/span&gt;
       &lt;span class="p"&gt;{})&lt;/span&gt;

     &lt;span class="nx"&gt;filmInfoResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filmInfo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

     &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pendingTask&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
       &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filmResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
         &lt;span class="na"&gt;element&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;filmInfoResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;filmInfoResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;multiple&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;single&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
       &lt;span class="p"&gt;}&lt;/span&gt;
     &lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Relying on fixed class names and structures, the process is still relatively cumbersome.&lt;/p&gt;

&lt;h4&gt;
  
  
  Steps required for AI-assisted crawlers to extract information
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filmResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;crawlOpenAIApp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parseElements&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="nx"&gt;targetHTML&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="s2"&gt;`This is a list of movies. You need to get the movie name (name), cover link (picture), introduction (info), rating (score), and number of comments (commentsNumber). Use bracketed words as attribute names`&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A matter of words.&lt;/p&gt;




&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Traditional crawlers&lt;/code&gt; need to rely on &lt;code&gt;fixed class names&lt;/code&gt; and &lt;code&gt;various cumbersome operations&lt;/code&gt; to obtain data. If the website is updated frequently, changes in class names or structures after the website is updated may cause the traditional crawler crawling strategy to fail. It is necessary to re-obtain the latest class name and update various operations to crawl data.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AI-assisted crawler&lt;/code&gt; only needs &lt;code&gt;a paragraph&lt;/code&gt; to obtain the required data more efficiently, intelligently and conveniently. You can even pass the entire HTML to AI to help us operate it. Since the website content is more complex, the location to be retrieved needs to be described more accurately, and a large amount of Tokens will be consumed. However, even if subsequent updates to the website cause the class name or structure to change, it can still be crawled normally. to data, because we can no longer rely on fixed class names or structures to locate and extract the required information, but let AI understand and parse the semantic information of web pages, thereby extracting the required data more efficiently, intelligently and conveniently.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;If more content is required, then the traditional crawler will take more steps, while the AI-assisted crawler can do it by just adding a few sentences, and there is no need to worry about whether the class name and structure of the website will change after the website is updated.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Summarize
&lt;/h2&gt;

&lt;p&gt;Traditional crawlers mainly rely on preset rules or patterns to crawl web page data, and they perform well on websites with stable structures and clear rules. However, with the rapid development of network technology and frequent updates of website structure, traditional crawlers are facing more and more challenges. Once the website structure changes, traditional crawlers usually need to readjust the rules, which may even lead to crawling failure, which greatly reduces its efficiency and accuracy.&lt;/p&gt;

&lt;p&gt;In contrast, AI-assisted crawlers combine artificial intelligence technology to intelligently analyze the structure and semantics of web pages and adapt to changes in the website. Through technologies such as machine learning and natural language processing, AI-assisted crawlers can identify and learn features in web pages to more accurately locate and extract the required data. This allows AI-assisted crawlers to maintain efficient crawling capabilities when facing complex and changing website structures.&lt;/p&gt;

&lt;p&gt;In general, traditional crawlers and AI-assisted crawlers each have their own applicable scenarios. For websites with stable structure and clear rules, traditional crawlers may be a more economical and direct choice. However, for websites with complex structures and frequent updates, AI-assisted crawlers have shown greater flexibility and accuracy advantages. When choosing, we need to comprehensively consider factors such as specific crawling needs, website characteristics, and resource investment.&lt;/p&gt;

&lt;h2&gt;
  
  
  resource
&lt;/h2&gt;

&lt;p&gt;The crawlers used in the examples in this article are all from x-crawl. Whether it is a traditional crawler or an AI-assisted crawler, it can satisfy you, and it also has many useful functions.&lt;/p&gt;

&lt;h3&gt;
  
  
  x-crawl
&lt;/h3&gt;

&lt;p&gt;x-crawl is a flexible Node.js AI-assisted crawler library. Flexible usage and powerful AI assistance functions make crawler work more efficient, intelligent and convenient.&lt;/p&gt;

&lt;p&gt;It consists of two parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Crawler: It consists of crawler API and various functions, which can work normally even without relying on AI.&lt;/li&gt;
&lt;li&gt;AI: Currently based on the large AI model provided by OpenAI, AI simplifies many tedious operations.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;🤖 AI Assistance&lt;/strong&gt; - Powerful AI assistance function makes crawler work more efficient, intelligent and convenient.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🖋️ Flexible writing&lt;/strong&gt; - A single crawling API is suitable for multiple configurations, and each configuration method has its own advantages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;⚙️Multiple uses&lt;/strong&gt; - Supports crawling dynamic pages, static pages, interface data and file data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;⚒️ Control page&lt;/strong&gt; - Crawling dynamic pages supports automated operations, keyboard input, event operations, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;👀 Device Fingerprint&lt;/strong&gt; - Zero configuration or custom configuration to avoid fingerprint recognition to identify and track us from different locations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🔥 Asynchronous Synchronization&lt;/strong&gt; - Asynchronous or synchronous crawling mode can be performed without switching the crawling API.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;⏱️ Interval crawling&lt;/strong&gt; - no interval, fixed interval and random interval, determine whether to crawl with high concurrency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🔄 Failed retry&lt;/strong&gt; - Customize the number of retries to avoid crawling failures due to temporary problems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;➡️ Rotation proxy&lt;/strong&gt; - Automatic proxy rotation with retries on failure, custom error times and HTTP status codes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🚀 Priority Queue&lt;/strong&gt; - According to the priority of a single crawling target, it can be crawled ahead of other targets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🧾Crawling information&lt;/strong&gt; - Controllable crawling information will output colored string information in the terminal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🦾 TypeScript&lt;/strong&gt; - Own types and implement complete types through generics.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;If you think x-crawl is helpful to you, or if you like x-crawl, you can give &lt;a href="https://github.com/coder-hxl/x-crawl"&gt;x-crawl repository&lt;/a&gt; a point on GitHub A star. Your support is the driving force for our continuous improvement! thank you for your support!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;x-crawl GitHub: &lt;a href="https://github.com/coder-hxl/x-crawl"&gt;https://github.com/coder-hxl/x-crawl&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;x-crawl documentation: &lt;a href="https://coder-hxl.github.io/x-crawl/"&gt;https://coder-hxl.github.io/x-crawl/&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>javascript</category>
      <category>crawler</category>
      <category>node</category>
    </item>
    <item>
      <title>AI+Node.js x-crawl crawler: Why are traditional crawlers no longer the first choice for data crawling?</title>
      <dc:creator>CoderHXL</dc:creator>
      <pubDate>Wed, 17 Apr 2024 04:59:13 +0000</pubDate>
      <link>https://forem.com/coderhxl/ainodejs-x-crawl-crawler-why-are-traditional-crawlers-no-longer-the-first-choice-for-data-crawling-18a</link>
      <guid>https://forem.com/coderhxl/ainodejs-x-crawl-crawler-why-are-traditional-crawlers-no-longer-the-first-choice-for-data-crawling-18a</guid>
      <description>&lt;h1&gt;
  
  
  AI and Node.js crawler combination
&lt;/h1&gt;

&lt;p&gt;When AI is paired with Node.js crawlers, this combination makes data collection smarter and more efficient. AI can help Node.js crawler to achieve more accurate target positioning. Traditional crawlers often rely on fixed rules or templates to capture data, but this way is often powerless in the face of complex and changeable web structure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why do we need AI-assisted crawlers
&lt;/h2&gt;

&lt;p&gt;With the rapid development of network technology, website updates become more and more frequent, and the change of class name or structure often brings no small challenge to the crawlers that rely on these elements. In this context, crawlers combined with AI technology have become a powerful weapon to deal with this challenge.&lt;/p&gt;

&lt;p&gt;First, a change in class name or structure after a website is updated can render traditional crawling strategies ineffective. This is because crawlers often rely on fixed class names or structures to locate and extract the information they need. Once these elements change, crawlers may not be able to accurately find the required data, which affects the effectiveness and accuracy of data fetching.&lt;/p&gt;

&lt;p&gt;However, crawlers that incorporate AI technology are better able to cope with this change. AI can also understand and parse the semantic information of web pages through technologies such as natural language processing, so as to extract the required data more accurately.&lt;/p&gt;

&lt;p&gt;In summary, crawlers combined with AI technology can better cope with the problem of class names or structural changes after website updates.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is x-crawl?
&lt;/h2&gt;

&lt;p&gt;x-crawl is a flexible Node.js AI-assisted crawler library. Flexible use and powerful AI-assisted functions make the crawler work more efficient, intelligent and convenient.&lt;/p&gt;

&lt;p&gt;It consists of two parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Crawler: Composed of crawler API and various functions, it can work properly even without relying on AI.&lt;/li&gt;
&lt;li&gt;AI: Currently based on the large AI model provided by OpenAI, AI simplifies many tedious operations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;x-crawl GitHub: &lt;a href="https://github.com/coder-hxl/x-crawl" rel="noopener noreferrer"&gt;https://github.com/coder-hxl/x-crawl&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;x-crawl Documentation: &lt;a href="https://coder-hxl.github.io/x-crawl/cn/" rel="noopener noreferrer"&gt;https://coder-hxl.github.io/x-crawl/cn/&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Features
&lt;/h2&gt;

&lt;p&gt;🤖 AI Assistance - Powerful AI assistance function makes crawler work more efficient, intelligent and convenient.&lt;br&gt;
🖋️ Flexible writing - A single crawling API is suitable for multiple configurations, and each configuration method has its own advantages.&lt;br&gt;
⚙️Multiple uses - Supports crawling dynamic pages, static pages, interface data and file data.&lt;br&gt;
⚒️ Control page - Crawling dynamic pages supports automated operations, keyboard input, event operations, etc.&lt;br&gt;
👀 Device Fingerprinting - Zero configuration or custom configuration to avoid fingerprint recognition to identify and track us from different locations.&lt;br&gt;
🔥 Asynchronous Sync - Asynchronous or synchronous crawling mode without switching crawling API.&lt;br&gt;
⏱️ Interval crawling - no interval, fixed interval and random interval, determine whether to crawl with high concurrency.&lt;br&gt;
🔄 Failed Retry - Customize the number of retries to avoid crawling failures due to temporary problems.&lt;br&gt;
➡️ Rotation proxy - Automatic proxy rotation with failed retries, custom error times and HTTP status codes.&lt;br&gt;
🚀 Priority Queue - Based on the priority of a single crawl target, it can be crawled ahead of other targets.&lt;br&gt;
🧾 Crawl information - Controllable crawl information, which will output colored string information in the terminal.&lt;br&gt;
🦾 TypeScript - Own types and implement complete types through generics.&lt;/p&gt;
&lt;h2&gt;
  
  
  Example of AI and x-crawl crawler combination
&lt;/h2&gt;

&lt;p&gt;The combination of crawler and AI allows the crawler and AI to obtain pictures of high-rated vacation rentals according to our instructions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createCrawl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;createCrawlOpenAI&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-crawl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;//Create a crawler application&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;crawlApp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createCrawl&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;maxRetry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;intervalTime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;max&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;min&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;//Create AI application&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;crawlOpenAIApp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createCrawlOpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;clientOptions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;defaultModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;chatModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gpt-4-turbo-preview&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// crawlPage is used to crawl pages&lt;/span&gt;
&lt;span class="nx"&gt;crawlApp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;crawlPage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.airbnb.cn/s/select_homes&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;

  &lt;span class="c1"&gt;// Wait for the element to appear on the page and get the HTML&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;targetSelector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[data-tracking-id="TOP_REVIEWED_LISTINGS"]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitForSelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;targetSelector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;highlyHTML&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;targetSelector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerHTML&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="c1"&gt;// Let the AI get the image link and de-duplicate it (the more detailed the description, the better)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;srcResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;crawlOpenAIApp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parseElements&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;highlyHTML&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;`Get the image link, don't source it inside, and de-duplicate it`&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

  &lt;span class="c1"&gt;// crawlFile is used to crawl file resources&lt;/span&gt;
  &lt;span class="nx"&gt;crawlApp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;crawlFile&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;targets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;srcResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;elements&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;src&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;storeDirs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./upload&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Even if the subsequent update of the website causes the class name or structure to change, it can climb to the data normally, because we no longer rely on the fixed class name or structure to locate and extract the required information, but let the AI understand and parse the semantic information of the web page, so as to extract the required data more efficiently, intelligently and conveniently.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You can even pass the entire HTML to AI to help us operate, because the website content is more complex you also need to more accurately describe the location to take, and will consume a lot of Tokens.&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Procedure:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcoder-hxl.github.io%2Fx-crawl%2Fexample.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcoder-hxl.github.io%2Fx-crawl%2Fexample.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to see the HTML the AI needs to process or see the srcResult (img url) returned by the AI after parsing the HTML according to our instructions:&lt;br&gt;
&lt;a href="https://coder-hxl.github.io/x-crawl/guide/#example" rel="noopener noreferrer"&gt;here at the bottom&lt;/a&gt;, because there are too many HTML fragments inconvenient to view at the bottom of this example, you can go to see.&lt;/p&gt;
&lt;h3&gt;
  
  
  AI Intelligent on-demand analysis elements
&lt;/h3&gt;

&lt;p&gt;No need to manually analyze the HTML page structure to extract the required element attributes or values. Now you just need to input the HTML code into AI and tell AI which elements you want to get information about, and AI will automatically analyze the page structure and extract the corresponding element attributes or values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createXCrawlOpenAI&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-crawl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;xCrawlOpenAIApp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createXCrawlOpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;clientOptions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Your API Key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;HTMLContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
   &amp;lt;div class="scroll-list"&amp;gt;
     &amp;lt;div class="list-item"&amp;gt;Women's hooded sweatshirt&amp;lt;/div&amp;gt;
     &amp;lt;div class="list-item"&amp;gt;Men's sweatshirts&amp;lt;/div&amp;gt;
     &amp;lt;div class="list-item"&amp;gt;Women's sweatshirt&amp;lt;/div&amp;gt;
     &amp;lt;div class="list-item"&amp;gt;Men's hooded sweatshirt&amp;lt;/div&amp;gt;
   &amp;lt;/div&amp;gt;
   &amp;lt;div class="scroll-list"&amp;gt;
     &amp;lt;div class="list-item"&amp;gt;Men's pure cotton short sleeves&amp;lt;/div&amp;gt;
     &amp;lt;div class="list-item"&amp;gt;Men's pure cotton short sleeves&amp;lt;/div&amp;gt;
     &amp;lt;div class="list-item"&amp;gt;Women's pure cotton short sleeves&amp;lt;/div&amp;gt;
     &amp;lt;div class="list-item"&amp;gt;Men's ice silk short sleeves&amp;lt;/div&amp;gt;
     &amp;lt;div class="list-item"&amp;gt;Men's round neck short sleeves&amp;lt;/div&amp;gt;
   &amp;lt;/div&amp;gt;
`&lt;/span&gt;

&lt;span class="nx"&gt;xCrawlOpenAIApp&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parseElements&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;HTMLContent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`Take all men's clothing and remove duplicates`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="cm"&gt;/*
      res:
      {
        elements: [
          { content: "Men's hooded sweatshirt" },
          { content: "Men's sweatshirts" },
          { content: "Men's pure cotton short sleeves" },
          { content: "Men's ice silk short sleeves" },
          { content: "Men's round neck short sleeves" }
        ],
        type: 'multiple'
      }
    */&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;You can even pass the entire HTML to AI to help us operate, because the website content is more complex you also need to more accurately describe the location to take, and will consume a lot of Tokens.&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Intelligent generation of element selectors
&lt;/h3&gt;

&lt;p&gt;It can help us quickly locate specific elements on the page. Just enter the HTML code into AI and tell AI which elements you want to get selectors for, and AI will automatically generate appropriate selectors for you based on the page structure, greatly simplifying the tedious process of determining selectors.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createXCrawlOpenAI&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-crawl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;xCrawlOpenAIApp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createXCrawlOpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;clientOptions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Your API Key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;HTMLContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
   &amp;lt;div class="scroll-list"&amp;gt;
     &amp;lt;div class="list-item"&amp;gt;Women's hooded sweatshirt&amp;lt;/div&amp;gt;
     &amp;lt;div class="list-item"&amp;gt;Men's sweatshirts&amp;lt;/div&amp;gt;
     &amp;lt;div class="list-item"&amp;gt;Women's sweatshirt&amp;lt;/div&amp;gt;
     &amp;lt;div class="list-item"&amp;gt;Men's hooded sweatshirt&amp;lt;/div&amp;gt;
   &amp;lt;/div&amp;gt;
   &amp;lt;div class="scroll-list"&amp;gt;
     &amp;lt;div class="list-item"&amp;gt;Men's pure cotton short sleeves&amp;lt;/div&amp;gt;
     &amp;lt;div class="list-item"&amp;gt;Men's pure cotton short sleeves&amp;lt;/div&amp;gt;
     &amp;lt;div class="list-item"&amp;gt;Women's pure cotton short sleeves&amp;lt;/div&amp;gt;
     &amp;lt;div class="list-item"&amp;gt;Men's ice silk short sleeves&amp;lt;/div&amp;gt;
     &amp;lt;div class="list-item"&amp;gt;Men's round neck short sleeves&amp;lt;/div&amp;gt;
   &amp;lt;/div&amp;gt;
`&lt;/span&gt;

&lt;span class="nx"&gt;xCrawlOpenAIApp&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getElementSelectors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;HTMLContent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`all Women's wear`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="cm"&gt;/*
      res:
      {
        selectors: '.scroll-list:nth-child(2) .list-item:nth-child(3)',
        type: 'multiple'
      }
    */&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;You can even pass the entire HTML to AI to help us operate, because the website content is more complex you also need to more accurately describe the location to take, and will consume a lot of Tokens.&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Intelligent reply to crawler questions
&lt;/h3&gt;

&lt;p&gt;Can provide you with intelligent answers and suggestions. Whether it is about crawling strategies, anti-crawling techniques or data processing, you can ask AI questions, and AI will provide you with professional answers and suggestions based on its powerful learning and reasoning capabilities to help you complete your tasks better. Reptile task.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createXCrawlOpenAI&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-crawl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;xCrawlOpenAIApp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createXCrawlOpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;clientOptions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Your API Key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="nx"&gt;xCrawlOpenAIApp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;help&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;What is x-crawl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="cm"&gt;/*
    res:
    x-crawl is a flexible Node.js AI-assisted web crawling library. It offers powerful AI-assisted features that make web crawling more efficient, intelligent, and convenient. You can find more information and the source code on x-crawl's GitHub page: https://github.com/coder-hxl/x-crawl.
   */&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="nx"&gt;xCrawlOpenAIApp&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;help&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Three major things to note about crawlers&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="cm"&gt;/*
      res:
      There are several important aspects to consider when working with crawlers:

      1. **Robots.txt:** It's important to respect the rules set in a website's robots.txt file. This file specifies which parts of a website can be crawled by search engines and other bots. Not following these rules can lead to your crawler being blocked or even legal issues.

      2. **Crawl Delay:** It's a good practice to implement a crawl delay between your requests to a website. This helps to reduce the load on the server and also shows respect for the server resources.

      3. **User-Agent:** Always set a descriptive User-Agent header for your crawler. This helps websites identify your crawler and allows them to contact you if there are any issues. Using a generic or misleading User-Agent can also lead to your crawler being blocked.

      By keeping these points in mind, you can ensure that your crawler operates efficiently and ethically.
   */&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In the latest version of x-crawl, we have introduced powerful AI-assisted features to make crawler work more efficient, intelligent and convenient. This innovative feature is mainly reflected in the following aspects:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Intelligent on-demand analysis elements&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditional crawler work often requires manual analysis of the HTML page structure to extract the desired element attributes or values. Now, with the AI assistance of x-crawl, you can easily implement intelligent on-demand analysis of elements. Just tell the AI which elements you want to get information about, and the AI will automatically analyze the page structure and extract the corresponding element attributes or values.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Intelligent generation element selector&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Selectors are an integral part of the crawler's work, helping us quickly locate specific elements on the page. Now, x-crawl's AI assistant can intelligently generate element selectors for you. Simply input the HTML code into the AI, and the AI will automatically generate the appropriate selector for you based on the page structure, greatly simplifying the tedious process of determining the selector.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Intelligent Reply to crawler problems&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the crawling work, we will inevitably encounter various problems and challenges. And x-crawl's AI assistance can provide you with intelligent answers and suggestions. Whether it is about crawling strategy, anti-crawling skills or data processing, you can ask AI questions, AI will provide you with professional answers and suggestions based on its strong learning and reasoning ability to help you better complete the crawling task.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. User-defined AI function&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In order to meet the individual needs of different users, x-crawl also provides the ability to customize the AI. This means that you can customize and optimize the AI according to your needs, making it better suited to your crawler work. Whether it's adjusting the AI's analysis strategy, optimizing the generation algorithm of the selector, or adding new functional modules, you can do it with simple operations to make the AI more in line with your usage habits and workflow.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;x-crawl GitHub: &lt;a href="https://github.com/coder-hxl/x-crawl" rel="noopener noreferrer"&gt;https://github.com/coder-hxl/x-crawl&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;x-crawl Documentation: &lt;a href="https://coder-hxl.github.io/x-crawl" rel="noopener noreferrer"&gt;https://coder-hxl.github.io/x-crawl&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you find x-crawl helpful, or if you like x-crawl, you can star &lt;a href="https://github.com/coder-hxl/x-crawl" rel="noopener noreferrer"&gt;x-crawl repository&lt;/a&gt; on GitHub. Your support is our motivation for continuous improvement! Thank you for your support!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>javascript</category>
      <category>node</category>
      <category>crawler</category>
    </item>
    <item>
      <title>Recommend a flexible Node.js multi-functional crawler library —— x-crawl</title>
      <dc:creator>CoderHXL</dc:creator>
      <pubDate>Wed, 20 Mar 2024 09:03:40 +0000</pubDate>
      <link>https://forem.com/coderhxl/recommend-a-flexible-nodejs-multi-functional-crawler-library-9dj</link>
      <guid>https://forem.com/coderhxl/recommend-a-flexible-nodejs-multi-functional-crawler-library-9dj</guid>
      <description>&lt;h1&gt;
  
  
  x-crawl
&lt;/h1&gt;

&lt;p&gt;x-crawl is a flexible Node.js multifunctional crawler library. Flexible usage and numerous functions can help you quickly, safely, and stably crawl pages, interfaces, and files.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you also like x-crawl, you can give the &lt;a href="https://github.com/coder-hxl/x-crawl"&gt;x-crawl repository&lt;/a&gt; a star on GitHub to support it. Thank you for your support!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;🔥 Asynchronous Synchronous&lt;/strong&gt; - Just change the mode property to toggle asynchronous or synchronous crawling mode.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;⚙️Multiple uses&lt;/strong&gt; - Supports crawling dynamic pages, static pages, interface data, files and polling operations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;⚒️ Control page&lt;/strong&gt; - Crawling dynamic pages supports automated operations, keyboard input, event operations, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🖋️ Flexible writing style&lt;/strong&gt; - The same crawling API can be adapted to multiple configurations, and each configuration method is very unique.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;⏱️ Interval Crawling&lt;/strong&gt; - No interval, fixed interval and random interval to generate or avoid high concurrent crawling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🔄 Failed Retry&lt;/strong&gt; - Avoid crawling failure due to short-term problems, and customize the number of retries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;➡️ Proxy Rotation&lt;/strong&gt; - Auto-rotate proxies with failure retry, custom error times and HTTP status codes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;👀 Device Fingerprinting&lt;/strong&gt; - Zero configuration or custom configuration, avoid fingerprinting to identify and track us from different locations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🚀 Priority Queue&lt;/strong&gt; - According to the priority of a single crawling target, it can be crawled ahead of other targets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🧾 crawl log&lt;/strong&gt; - Logs the crawl and uses colored string reminders at the terminal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🦾 TypeScript&lt;/strong&gt; - Own types, implement complete types through generics.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;

&lt;p&gt;Take the automatic acquisition of some photos of experiences and homes around the world every day as an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// 1. Import module ES/CJS&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;xCrawl&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-crawl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;// 2. Create a crawler instance&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;myXCrawl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;xCrawl&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;maxRetry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;intervalTime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;max&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;min&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// 3. Set the crawling task&lt;/span&gt;
&lt;span class="cm"&gt;/*
  Call the startPolling API to start the polling function,
  and the callback function will be called every other day
*/&lt;/span&gt;
&lt;span class="nx"&gt;myXCrawl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startPolling&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;d&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;stopPolling&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Call the crawlPage API to crawl the page&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pageResults&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;myXCrawl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;crawlPage&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;targets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.airbnb.cn/s/*/experiences&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.airbnb.cn/s/plus_homes&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;viewport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1920&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;height&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1080&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="c1"&gt;// Obtain the image URL by traversing the crawled page results&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;imgUrls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;pageResults&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;elSelector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.i9cqrtb&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.c4mnd7m&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

    &lt;span class="c1"&gt;// wait for the page element to appear&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitForSelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;elSelector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// Get the URL of the page image&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="nf"&gt;$eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;elSelector&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; picture img`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;imgEls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
      &lt;span class="nx"&gt;imgEls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;src&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nx"&gt;imgUrls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="nx"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;// close the page&lt;/span&gt;
    &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Call crawlFile API to crawl pictures&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;myXCrawl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;crawlFile&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;targets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;imgUrls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;storeDirs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./upload&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;running result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eEMkDFhb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/run-example.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eEMkDFhb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/run-example.gif" width="800" height="321"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Please do not crawl randomly, you can check the &lt;strong&gt;robots.txt&lt;/strong&gt; protocol before crawling. The class name of the website may change, this is just to demonstrate how to use x-crawl.&lt;/p&gt;

&lt;h2&gt;
  
  
  More
&lt;/h2&gt;

&lt;p&gt;More content can be viewed: &lt;a href="https://github.com/coder-hxl/x-crawl"&gt;https://github.com/coder-hxl/x-crawl&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>node</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Use x-crawl to help you fetch data</title>
      <dc:creator>CoderHXL</dc:creator>
      <pubDate>Mon, 10 Apr 2023 01:43:36 +0000</pubDate>
      <link>https://forem.com/coderhxl/use-x-crawl-to-help-you-fetch-data-550i</link>
      <guid>https://forem.com/coderhxl/use-x-crawl-to-help-you-fetch-data-550i</guid>
      <description>&lt;h1&gt;
  
  
  x-crawl
&lt;/h1&gt;

&lt;p&gt;x-crawl is a flexible Node.js multifunctional crawler library. Flexible usage and numerous functions can help you quickly, safely, and stably crawl pages, interfaces, and files.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you also like x-crawl, you can give &lt;a href="https://github.com/coder-hxl/x-crawl" rel="noopener noreferrer"&gt;x-crawl repository&lt;/a&gt; a star to support it, thank you for your support!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;🔥 Asynchronous Synchronous&lt;/strong&gt; - Just change the mode property to toggle asynchronous or synchronous crawling mode.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;⚙️ Multiple purposes&lt;/strong&gt; - It can crawl pages, crawl interfaces, crawl files and poll crawls to meet the needs of various scenarios.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🖋️ Flexible writing style&lt;/strong&gt; - The same crawling API can be adapted to multiple configurations, and each configuration method is very unique.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;⏱️ Interval Crawling&lt;/strong&gt; - No interval, fixed interval and random interval to generate or avoid high concurrent crawling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🔄 Failed Retry&lt;/strong&gt; - Avoid crawling failure due to short-term problems, and customize the number of retries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;➡️ Proxy Rotation&lt;/strong&gt; - Auto-rotate proxies with failure retry, custom error times and HTTP status codes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;👀 Device Fingerprinting&lt;/strong&gt; - Zero configuration or custom configuration, avoid fingerprinting to identify and track us from different locations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🚀 Priority Queue&lt;/strong&gt; - According to the priority of a single crawling target, it can be crawled ahead of other targets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;☁️ Crawl SPA&lt;/strong&gt; - Crawl SPA (Single Page Application) to generate pre-rendered content (aka "SSR" (Server Side Rendering)).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;⚒️ Control Page&lt;/strong&gt; - You can submit form, keyboard input, event operation, generate screenshots of the page, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🧾 Capture Record&lt;/strong&gt; - Capture and record crawling, and use colored strings to remind in the terminal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🦾 TypeScript&lt;/strong&gt; - Own types, implement complete types through generics.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;

&lt;p&gt;Take the automatic acquisition of some photos of experiences and homes around the world every day as an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// 1.Import module ES/CJS&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;xCrawl&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-crawl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;// 2.Create a crawler instance&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;myXCrawl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;xCrawl&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="na"&gt;maxRetry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="na"&gt;intervalTime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;max&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;min&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt; &lt;span class="p"&gt;}})&lt;/span&gt;

&lt;span class="c1"&gt;// 3.Set the crawling task&lt;/span&gt;
&lt;span class="cm"&gt;/*
  Call the startPolling API to start the polling function,
  and the callback function will be called every other day
*/&lt;/span&gt;
&lt;span class="nx"&gt;myXCrawl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startPolling&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;d&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;stopPolling&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Call crawlPage API to crawl Page&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;myXCrawl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;crawlPage&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;targets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.airbnb.cn/s/experiences&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.airbnb.cn/s/plus_homes&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;viewport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1920&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;height&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1080&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="c1"&gt;// Store the image URL to targets&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;targets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;elSelectorMap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;._fig15y&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;._aov0j6&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;

    &lt;span class="c1"&gt;// Wait for the page to load&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;// Gets the URL of the page image&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="nf"&gt;$eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;elSelectorMap&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;&lt;span class="s2"&gt; img`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;imgEls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;imgEls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;src&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nx"&gt;targets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="nx"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// Close page&lt;/span&gt;
    &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Call the crawlFile API to crawl pictures&lt;/span&gt;
  &lt;span class="nx"&gt;myXCrawl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;crawlFile&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;targets&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;storeDir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./upload&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;running result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fcoder-hxl%2Fx-crawl%2Fmain%2Fassets%2Fen%2Fcrawler.png%3Fv7" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fcoder-hxl%2Fx-crawl%2Fmain%2Fassets%2Fen%2Fcrawler.png%3Fv7"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fcoder-hxl%2Fx-crawl%2Fmain%2Fassets%2Fen%2Fcrawler-result.png%3Fv7" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fcoder-hxl%2Fx-crawl%2Fmain%2Fassets%2Fen%2Fcrawler-result.png%3Fv7"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Do not crawl at will, you can check the &lt;strong&gt;robots.txt&lt;/strong&gt; protocol before crawling. This is just to demonstrate how to use x-crawl.&lt;/p&gt;

&lt;h2&gt;
  
  
  More
&lt;/h2&gt;

&lt;p&gt;For more detailed documentation, please check: &lt;a href="https://github.com/coder-hxl/x-crawl" rel="noopener noreferrer"&gt;https://github.com/coder-hxl/x-crawl&lt;/a&gt;&lt;/p&gt;

</description>
      <category>crawl</category>
      <category>node</category>
      <category>javascript</category>
      <category>typescript</category>
    </item>
  </channel>
</rss>
