<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Syed Sirajul Islam Anik</title>
    <description>The latest articles on Forem by Syed Sirajul Islam Anik (@ssianik).</description>
    <link>https://forem.com/ssianik</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F195264%2F87ad29ad-4974-445d-9c09-102bf00b900a.jpeg</url>
      <title>Forem: Syed Sirajul Islam Anik</title>
      <link>https://forem.com/ssianik</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ssianik"/>
    <language>en</language>
    <item>
      <title>Elasticsearch Sample Data Generator</title>
      <dc:creator>Syed Sirajul Islam Anik</dc:creator>
      <pubDate>Wed, 24 Mar 2021 01:44:59 +0000</pubDate>
      <link>https://forem.com/ssianik/elasticsearch-sample-data-generator-208k</link>
      <guid>https://forem.com/ssianik/elasticsearch-sample-data-generator-208k</guid>
      <description>&lt;p&gt;Recently, I have been trying to learn Elasticsearch once again. I say "once again" because I have wanted to learn it since late 2016, and in that time I have tried several times and, as always, failed to stick with it. Just like every other time, I am motivated this time as well 😉&lt;/p&gt;

&lt;h2&gt;
  
  
  Motive
&lt;/h2&gt;

&lt;p&gt;To learn Elasticsearch, you need lots of data to run the queries you want. I searched a few places for a usable dump, but I couldn't find one that suited me, and I wasn't familiar with the kinds of data in the dumps I did find online. So, I decided to make a generator of my own. Since I had already used &lt;a href="https://laravel.com/docs/master/artisan" rel="noopener noreferrer"&gt;Artisan Console&lt;/a&gt; and &lt;a href="https://github.com/fzaninotto/Faker" rel="noopener noreferrer"&gt;fzaninotto/Faker&lt;/a&gt;, I built a generator that anyone can run from their terminal to generate a dump the way they want.&lt;/p&gt;

&lt;h2&gt;
  
  
  The repository
&lt;/h2&gt;

&lt;p&gt;This is the repository that you can use to generate the dump.&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/ssi-anik" rel="noopener noreferrer"&gt;
        ssi-anik
      &lt;/a&gt; / &lt;a href="https://github.com/ssi-anik/elasticsearch-sample-data-generator" rel="noopener noreferrer"&gt;
        elasticsearch-sample-data-generator
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Sample data generator and writes in file to upload to Elasticsearch for bulk upload
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;elasticsearch-sample-data-generator&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;The purpose of the project is to generate a dump for Elasticsearch Bulk API.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Requirements&lt;/h2&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Either your local machine should have &lt;code&gt;composer&lt;/code&gt; or &lt;code&gt;docker&lt;/code&gt; installed to get it working. And the local PHP version should be &lt;code&gt;&amp;gt;=7.3&lt;/code&gt; and &lt;code&gt;&amp;lt;8.0&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Installation&lt;/h2&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Clone the repository.&lt;/li&gt;
&lt;li&gt;If you have &lt;code&gt;composer&lt;/code&gt; installed on your local machine and your machine satisfies the requirements, then run &lt;code&gt;composer install&lt;/code&gt; to install the project dependencies.&lt;/li&gt;
&lt;li&gt;If you don't know &lt;code&gt;php&lt;/code&gt; or the local &lt;code&gt;php&lt;/code&gt; requirement is not satisfied on your machine, then uncomment the &lt;code&gt;COPY . /app&lt;/code&gt; and &lt;code&gt;RUN composer install&lt;/code&gt; lines in &lt;code&gt;Dockerfile&lt;/code&gt;. So, they'll look like the following.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight highlight-source-dockerfile notranslate position-relative overflow-auto js-code-highlight"&gt;
&lt;pre&gt;&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; It'll copy the project in the PHP container.&lt;/span&gt;
&lt;span class="pl-k"&gt;COPY&lt;/span&gt; . /app

&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; It'll install the project dependencies.&lt;/span&gt;
&lt;span class="pl-k"&gt;RUN&lt;/span&gt; composer install&lt;/pre&gt;

&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Run &lt;code&gt;cp docker-compose.yml.example docker-compose.yml&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Make changes in your &lt;code&gt;docker-compose.yml&lt;/code&gt; file. If you don't need the &lt;code&gt;elasticsearch&lt;/code&gt; &amp;amp; &lt;code&gt;kibana&lt;/code&gt;, remove those services.&lt;/li&gt;
&lt;li&gt;If you made the…&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/ssi-anik/elasticsearch-sample-data-generator" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;h2&gt;
  
  
  Installation [without docker]
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Clone the repository. &lt;/li&gt;
&lt;li&gt;If your machine has PHP version &lt;code&gt;&amp;gt;=7.3&lt;/code&gt; and &lt;code&gt;&amp;lt;8.0&lt;/code&gt; and composer installed, run &lt;code&gt;composer install&lt;/code&gt; from the root of the repository to install the project dependencies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's all.&lt;/p&gt;
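
&lt;p&gt;If you're not sure whether your local PHP falls in that version range, you can script the check. This is a minimal sketch and is not part of the repository. &lt;code&gt;php_version_supported&lt;/code&gt; is a hypothetical helper name. It relies on PHP 7.4 being the last 7.x release, so the supported range comes down to 7.3.x and 7.4.x.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper (not part of the repository): succeeds only for
# PHP 7.3.x or 7.4.x, since 7.4 was the last 7.x release.
php_version_supported() {
  case "$1" in
    7.3|7.3.*|7.4|7.4.*) return 0 ;;
    *) return 1 ;;
  esac
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;php_version_supported "$(php -r 'echo PHP_VERSION;')" || echo "Use the docker-based installation instead."&lt;/code&gt; prints the message only when the local PHP is outside the supported range.&lt;/p&gt;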

&lt;h2&gt;
  
  
  Installation [with docker]
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Clone the repository.&lt;/li&gt;
&lt;li&gt;Uncomment the line &lt;code&gt;COPY . /app&lt;/code&gt; in the &lt;code&gt;Dockerfile&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Uncomment the line &lt;code&gt;RUN composer install&lt;/code&gt; in the &lt;code&gt;Dockerfile&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Copy the &lt;code&gt;docker-compose.yml.example&lt;/code&gt; to &lt;code&gt;docker-compose.yml&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Comment out the line &lt;code&gt;.:/app&lt;/code&gt; under &lt;code&gt;services.php.volumes&lt;/code&gt; in your &lt;code&gt;docker-compose.yml&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Uncomment the line &lt;code&gt;./dumps:/app/dumps&lt;/code&gt; under &lt;code&gt;services.php.volumes&lt;/code&gt; in your &lt;code&gt;docker-compose.yml&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If you don't need the elasticsearch and kibana services, delete them.&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;docker-compose up -d --build&lt;/code&gt; to build and start the containers.&lt;/li&gt;
&lt;li&gt;To exec into the PHP service, run &lt;code&gt;docker-compose exec php bash&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's all for the docker-based installation. If you're comfortable with Docker, you can also tweak this setup by going through the &lt;code&gt;Dockerfile&lt;/code&gt; and the &lt;code&gt;docker-compose.yml&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;

&lt;p&gt;The repository contains an executable, &lt;code&gt;elasticsearch-dump&lt;/code&gt;, at its root. We use it to run commands and generate dumps.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;./elasticsearch-dump generate&lt;/code&gt; is the base command. Let's have a look at the available arguments and options.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./elasticsearch-dump generate &lt;span class="nt"&gt;--help&lt;/span&gt;

Description:
  Generate dump &lt;span class="k"&gt;for &lt;/span&gt;elasticsearch bulk API upload

Usage:
  generate &lt;span class="o"&gt;[&lt;/span&gt;options] &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nt"&gt;--&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &amp;lt;fields&amp;gt;

Arguments:
  fields               Enter the fields definition &lt;span class="o"&gt;(&lt;/span&gt;required&lt;span class="o"&gt;)&lt;/span&gt;

Options:
  &lt;span class="nt"&gt;--file&lt;/span&gt;&lt;span class="o"&gt;[=&lt;/span&gt;FILE]        Enter the file name &lt;span class="o"&gt;[&lt;/span&gt;default: &lt;span class="s2"&gt;"dumps/dump.json"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="nt"&gt;--entries&lt;/span&gt;&lt;span class="o"&gt;[=&lt;/span&gt;ENTRIES]  Enter the number of entries &lt;span class="o"&gt;[&lt;/span&gt;default: &lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="nt"&gt;--action&lt;/span&gt;&lt;span class="o"&gt;[=&lt;/span&gt;ACTION]    Enter the action name &lt;span class="o"&gt;[&lt;/span&gt;index or create] &lt;span class="o"&gt;[&lt;/span&gt;default: &lt;span class="s2"&gt;"index"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="nt"&gt;--index&lt;/span&gt;&lt;span class="o"&gt;[=&lt;/span&gt;INDEX]      Enter the index name &lt;span class="o"&gt;[&lt;/span&gt;default: &lt;span class="s2"&gt;"my-index"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="nt"&gt;--id&lt;/span&gt;&lt;span class="o"&gt;[=&lt;/span&gt;ID]            Enter the sequence start value &lt;span class="o"&gt;[&lt;/span&gt;default: &lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="nt"&gt;--append&lt;/span&gt;             Append to existing file
  &lt;span class="nt"&gt;--force&lt;/span&gt;              Does not ask &lt;span class="k"&gt;for &lt;/span&gt;confirmation
  &lt;span class="nt"&gt;--uuid&lt;/span&gt;               UUID based ID generation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Options
&lt;/h2&gt;

&lt;p&gt;Before we look at the required argument, let's explore the options. A few options expect values, and a few are boolean flags. All of them are optional, and passing an option overrides its default value.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--file&lt;/code&gt; - Default is &lt;code&gt;dumps/dump.json&lt;/code&gt;. The file where the dump is saved. You can pass a relative or an absolute path. If the path starts with &lt;code&gt;/&lt;/code&gt;, it is used as an absolute path. Otherwise, the dump is always written to the &lt;code&gt;dumps&lt;/code&gt; directory, and only the file name is taken from the value.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--entries&lt;/code&gt; - Default is &lt;code&gt;1&lt;/code&gt;. The number of entries you want to generate.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--action&lt;/code&gt; - Default is &lt;code&gt;index&lt;/code&gt;. The type of the bulk action, either &lt;code&gt;index&lt;/code&gt; or &lt;code&gt;create&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--index&lt;/code&gt; - Default is &lt;code&gt;my-index&lt;/code&gt;. The name of the index where you'll put these values.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--id&lt;/code&gt; - Default is &lt;code&gt;1&lt;/code&gt;. The starting value of the ID sequence. Only numeric sequences are supported.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--append&lt;/code&gt; - A boolean flag. If present, the dump is appended to the existing file. If the file doesn't exist, it is created and the dump is written to it.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--force&lt;/code&gt; - A boolean flag. By default, the command will ask you for confirmation. By providing this flag, you can bypass the confirmation.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--uuid&lt;/code&gt; - A boolean flag. If passed, &lt;code&gt;--id&lt;/code&gt; is ignored and UUID-based IDs are generated instead.&lt;/li&gt;
&lt;/ul&gt;
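
&lt;p&gt;The following rough shell sketch makes the options more concrete. It illustrates what they control and is not the generator's PHP code. &lt;code&gt;resolve_dump_path&lt;/code&gt; and &lt;code&gt;bulk_lines&lt;/code&gt; are hypothetical names. The path rule assumes that "only the file name" means any directory part of the value is dropped, and the documents are placeholders rather than Faker data. In an Elasticsearch Bulk API body, each entry is an action/metadata line (here &lt;code&gt;index&lt;/code&gt; or &lt;code&gt;create&lt;/code&gt;, with &lt;code&gt;_index&lt;/code&gt; and &lt;code&gt;_id&lt;/code&gt;) followed by the document on the next line. The generator's exact output may differ in detail.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical sketch of the --file rule: absolute paths are kept as-is,
# anything else is reduced to its file name and placed under dumps/.
resolve_dump_path() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;
    *) printf 'dumps/%s\n' "$(basename "$1")" ;;
  esac
}

# Hypothetical sketch of the Bulk API lines that --action, --index, --id and
# --entries describe: one action/metadata line plus one document per entry.
bulk_lines() {
  action="$1"   # index or create
  index="$2"    # target index name
  id="$3"       # first value of the numeric ID sequence
  entries="$4"  # number of documents to emit
  i=0
  while [ "$i" -lt "$entries" ]; do
    printf '{"%s":{"_index":"%s","_id":"%s"}}\n' "$action" "$index" "$id"
    printf '{"seq":%s}\n' "$id"   # placeholder document body
    id=$((id + 1))
    i=$((i + 1))
  done
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the real tool, you set the same values through the documented options, for example &lt;code&gt;./elasticsearch-dump generate --action create --index people --id 5 --entries 2 --file people.json --force "name"&lt;/code&gt;.&lt;/p&gt;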

&lt;h2&gt;
  
  
  Arguments
&lt;/h2&gt;

&lt;p&gt;The command generates data using &lt;a href="https://github.com/fzaninotto/Faker" rel="noopener noreferrer"&gt;PHP's Faker library&lt;/a&gt;. We have to pass the fields that we want to fill with fake data.&lt;/p&gt;

&lt;p&gt;Suppose we want to generate &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;address&lt;/code&gt; fields. Fields are separated with a pipe &lt;code&gt;|&lt;/code&gt;, so the command looks like the following.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./elasticsearch-dump generate &lt;span class="nt"&gt;--entries&lt;/span&gt; 10 &lt;span class="s2"&gt;"name|address"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the &lt;strong&gt;&lt;code&gt;name&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;address&lt;/code&gt;&lt;/strong&gt; fields are resolved to Faker's &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;address&lt;/code&gt; properties. If a key in the generated objects should use a different Faker property, separate the key and the property with a colon &lt;code&gt;:&lt;/code&gt;. So, to fill the &lt;code&gt;name&lt;/code&gt; field with a &lt;code&gt;firstName&lt;/code&gt; and the &lt;code&gt;address&lt;/code&gt; field with a &lt;code&gt;streetAddress&lt;/code&gt;, we can use the following.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./elasticsearch-dump generate &lt;span class="nt"&gt;--entries&lt;/span&gt; 10 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"name:firstName|address:streetAddress"&lt;/span&gt;

&lt;span class="c"&gt;# Generates&lt;/span&gt;
&lt;span class="c"&gt;# {"name":"Roosevelt","address":"45647 Judy Isle"}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the object's &lt;code&gt;name&lt;/code&gt; key holds a &lt;code&gt;firstName&lt;/code&gt; value, and its &lt;code&gt;address&lt;/code&gt; key holds a &lt;code&gt;streetAddress&lt;/code&gt; value. &lt;code&gt;firstName&lt;/code&gt; and &lt;code&gt;streetAddress&lt;/code&gt; are the Faker properties that get resolved.&lt;/p&gt;
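
&lt;p&gt;A small shell sketch can picture the pipe-and-colon syntax. It only illustrates the rules described above and is not the project's parser. &lt;code&gt;split_fields&lt;/code&gt; is a hypothetical helper. It assumes each field is split at its first colon, so a field without a colon uses the same name for the key and the Faker property. It also assumes no argument contains a literal pipe.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: prints "key = provider" for each pipe-separated field.
split_fields() {
  printf '%s\n' "$1" | tr '|' '\n' | while IFS= read -r field; do
    key="${field%%:*}"               # text before the first colon
    case "$field" in
      *:*) provider="${field#*:}" ;; # text after the first colon
      *) provider="$field" ;;        # no colon: key and property match
    esac
    printf '%s = %s\n' "$key" "$provider"
  done
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;split_fields "name:firstName|address"&lt;/code&gt; prints &lt;code&gt;name = firstName&lt;/code&gt; followed by &lt;code&gt;address = address&lt;/code&gt;.&lt;/p&gt;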

&lt;p&gt;If a Faker formatter needs arguments, you can call it as a method and pass the arguments in parentheses.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./elasticsearch-dump generate &lt;span class="nt"&gt;--entries&lt;/span&gt; 10 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"name:firstName|id:numerify('ID-####')|amount:numberBetween(1000, 9000)"&lt;/span&gt;

&lt;span class="c"&gt;# Generates&lt;/span&gt;
&lt;span class="c"&gt;# {"name":"Lourdes","id":"ID-4912","amount":1004}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
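
&lt;p&gt;A provider written as a method call carries a Faker method name and its arguments. The following shell sketch shows that split. &lt;code&gt;split_call&lt;/code&gt; is a hypothetical helper, not the project's code. It assumes the arguments are everything between the first opening parenthesis and the final closing one.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: splits "method(args)" into the method name and the raw
# argument list; a plain property such as "firstName" has no arguments.
split_call() {
  provider="$1"
  case "$provider" in
    *\(*\))
      method=${provider%%\(*}   # text before the first "("
      args=${provider#*\(}      # text after the first "("
      args=${args%\)}           # drop the trailing ")"
      ;;
    *)
      method=$provider
      args=
      ;;
  esac
  printf 'method=%s args=%s\n' "$method" "$args"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;split_call 'numberBetween(1000, 9000)'&lt;/code&gt; prints &lt;code&gt;method=numberBetween args=1000, 9000&lt;/code&gt;. This sketch does not split the argument list further at commas.&lt;/p&gt;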



&lt;h2&gt;
  
  
  Object nesting
&lt;/h2&gt;

&lt;p&gt;When passing your fields to the command's argument, you can nest objects using dot notation.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./elasticsearch-dump generate &lt;span class="nt"&gt;--entries&lt;/span&gt; 10 &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="s2"&gt;"student.name:firstName|student.age:numberBetween(20, 27)|id:numerify('ID-####')"&lt;/span&gt;

&lt;span class="c"&gt;# Generates&lt;/span&gt;
&lt;span class="c"&gt;# {"student":{"name":"Chandler","age":20},"id":"ID-4386"}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check the JSON. The &lt;code&gt;student&lt;/code&gt; object contains &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;age&lt;/code&gt;, while the &lt;code&gt;id&lt;/code&gt; field sits outside the &lt;code&gt;student&lt;/code&gt; object.&lt;/p&gt;
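
&lt;p&gt;Dot notation maps each segment of a key to one level of nesting. The following shell sketch illustrates that mapping for a single key. It is not the project's code, and &lt;code&gt;nest&lt;/code&gt; is a hypothetical helper. The generator also merges sibling keys such as &lt;code&gt;student.name&lt;/code&gt; and &lt;code&gt;student.age&lt;/code&gt; into one object, which this sketch leaves out.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: wraps a JSON value in one object per dot-separated
# segment of the key, working from the last segment outwards.
nest() {
  path="$1"   # dotted key, e.g. student.name
  json="$2"   # JSON value, e.g. '"Chandler"'
  while :; do
    last="${path##*.}"                          # right-most segment
    json=$(printf '{"%s":%s}' "$last" "$json")  # wrap the value once
    case "$path" in
      *.*) path="${path%.*}" ;;                 # drop that segment, continue
      *) break ;;                               # outermost key reached
    esac
  done
  printf '%s\n' "$json"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;nest student.name '"Chandler"'&lt;/code&gt; prints &lt;code&gt;{"student":{"name":"Chandler"}}&lt;/code&gt;.&lt;/p&gt;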

&lt;h2&gt;
  
  
  Extending the faker functionality
&lt;/h2&gt;

&lt;p&gt;If Faker doesn't provide the kind of data you want, you can extend it by providing an array of values in the project's &lt;a href="https://github.com/ssi-anik/elasticsearch-sample-data-generator/blob/master/config/source.php" rel="noopener noreferrer"&gt;&lt;code&gt;config/source.php&lt;/code&gt;&lt;/a&gt; file. The file already contains &lt;code&gt;designation&lt;/code&gt; as an example. You can call a custom provider using the &lt;code&gt;custom('key')&lt;/code&gt; format.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./elasticsearch-dump generate &lt;span class="s2"&gt;"name|designation:custom('designation')"&lt;/span&gt;

&lt;span class="c"&gt;# Generates&lt;/span&gt;
&lt;span class="c"&gt;# {"name":"Annabelle Balistreri","designation":"HR Managers"}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In our case, that's &lt;code&gt;custom('designation')&lt;/code&gt;, where &lt;code&gt;designation&lt;/code&gt; is the key in the &lt;code&gt;config/source.php&lt;/code&gt; file.&lt;/p&gt;
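
&lt;p&gt;Conceptually, a custom provider is a list of values to draw from. The post doesn't spell out how a value is selected. Assuming &lt;code&gt;custom('key')&lt;/code&gt; returns one randomly chosen entry of the array stored under that key, the idea looks like this in bash. &lt;code&gt;pick_random&lt;/code&gt; and the sample values are hypothetical. The real list lives in &lt;code&gt;config/source.php&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: prints one of its arguments chosen at random.
# Uses bash's $RANDOM, so run it with bash rather than a plain POSIX sh.
pick_random() {
  [ "$#" -gt 0 ] || return 1   # nothing to choose from
  shift $(( RANDOM % $# ))     # drop a random number of leading values
  printf '%s\n' "$1"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;pick_random "HR Manager" "Software Engineer"&lt;/code&gt; prints one of the two values.&lt;/p&gt;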




&lt;p&gt;I hope this helps you generate lots of data.&lt;/p&gt;

&lt;p&gt;Happy coding. ❤️&lt;/p&gt;

</description>
      <category>elasticsearch</category>
      <category>bulk</category>
      <category>data</category>
      <category>dump</category>
    </item>
  </channel>
</rss>
