<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Rohit Sah</title>
    <description>The latest articles on Forem by Rohit Sah (@sahrohit).</description>
    <link>https://forem.com/sahrohit</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1131304%2F4caeb306-51f8-47b0-af9b-e2d7e4455f96.jpeg</url>
      <title>Forem: Rohit Sah</title>
      <link>https://forem.com/sahrohit</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sahrohit"/>
    <language>en</language>
    <item>
      <title>Per Segment Plane Sweep Line Segment Intersection on the GPU Paper Summary</title>
      <dc:creator>Rohit Sah</dc:creator>
      <pubDate>Fri, 27 Jun 2025 18:06:51 +0000</pubDate>
      <link>https://forem.com/sahrohit/per-segment-plane-sweep-line-segment-intersection-on-the-gpu-paper-summary-13fd</link>
      <guid>https://forem.com/sahrohit/per-segment-plane-sweep-line-segment-intersection-on-the-gpu-paper-summary-13fd</guid>
      <description>&lt;p&gt;This blog is originally post on &lt;a href="https://blog.sahrohit.com.np/posts/plane-sweep-parallel" rel="noopener noreferrer"&gt;https://blog.sahrohit.com.np/posts/plane-sweep-parallel&lt;/a&gt; and distributed across platforms for more reach. The original paper of this paper summary is available &lt;a href="https://doi.org/10.1145/3557915.3561003" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The paper presents two algorithms: the Hardware Materialized Active List (HMAL) algorithm and the Per Segment Sweep Line (PSSL) algorithm. HMAL is limited by its synchronization requirement and by the number of threads allowed in a single GPU thread block. PSSL addresses these drawbacks of HMAL by using dynamic parallelism, and it is fast, memory efficient, and easier to implement than competing approaches.&lt;/p&gt;

&lt;p&gt;HMAL uses the processing cores themselves as a hardware representation of the active list data structure, which contains the line segments currently encountered by the sweep line. Its drawback is the synchronization requirement: the algorithm is limited by the maximum number of threads allowed in a single GPU thread block (currently 1024). The second algorithm in the paper, the Per Segment Sweep Line (PSSL) algorithm, addresses this by eliminating the need for synchronization primitives in the GPU kernels, and thus the constraint of confining all threads to a single thread block. It achieves these improvements by taking advantage of dynamic parallelism. PSSL is easy to implement, fast, and memory efficient compared to competing approaches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Work
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Serial Plane Sweep Algorithm
&lt;/h3&gt;

&lt;p&gt;When the sweep line encounters the left endpoint of a line segment, the segment is added to the active list; when its right endpoint is encountered, the segment is removed from the active list.&lt;/p&gt;
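To make the active-list idea concrete, here is a minimal serial sketch (not the paper's code). It is simplified in one way: an entering segment is tested against every currently active segment rather than only its sorted neighbors, and only proper crossings are counted.

```cpp
#include <algorithm>
#include <vector>

struct Seg { double x1, y1, x2, y2; };  // assumed stored with x1 <= x2

static double cross(double ax, double ay, double bx, double by) {
    return ax * by - ay * bx;
}

// Orientation-based test for a proper (strict) crossing of two segments.
static bool intersects(const Seg& a, const Seg& b) {
    double d1 = cross(b.x2 - b.x1, b.y2 - b.y1, a.x1 - b.x1, a.y1 - b.y1);
    double d2 = cross(b.x2 - b.x1, b.y2 - b.y1, a.x2 - b.x1, a.y2 - b.y1);
    double d3 = cross(a.x2 - a.x1, a.y2 - a.y1, b.x1 - a.x1, b.y1 - a.y1);
    double d4 = cross(a.x2 - a.x1, a.y2 - a.y1, b.x2 - a.x1, b.y2 - a.y1);
    return ((d1 > 0) != (d2 > 0)) && ((d3 > 0) != (d4 > 0));
}

// Sweep left to right: a segment enters the active list at its left
// endpoint and leaves at its right endpoint; each entering segment is
// tested against the segments currently active.
int countIntersections(const std::vector<Seg>& segs) {
    struct Event { double x; bool left; int id; };
    std::vector<Event> events;
    for (int i = 0; i < (int)segs.size(); ++i) {
        events.push_back({segs[i].x1, true, i});
        events.push_back({segs[i].x2, false, i});
    }
    // At equal x, process left endpoints before right endpoints.
    std::sort(events.begin(), events.end(), [](const Event& a, const Event& b) {
        return a.x < b.x || (a.x == b.x && a.left && !b.left);
    });

    std::vector<int> active;
    int count = 0;
    for (const Event& e : events) {
        if (e.left) {
            for (int j : active)
                if (intersects(segs[e.id], segs[j])) ++count;
            active.push_back(e.id);
        } else {
            active.erase(std::find(active.begin(), active.end(), e.id));
        }
    }
    return count;
}
```

The neighbors-only refinement (the actual Bentley-Ottmann approach) would keep the active list ordered by y at the sweep position; the sketch keeps only the enter/leave discipline.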

&lt;h3&gt;
  
  
  Grid based Algorithm
&lt;/h3&gt;

&lt;p&gt;The algorithm treats each grid cell as a separate unit of work for a processing unit, testing each line segment for intersections against all other line segments in the same grid cell. Polygons with a high line segment count in a relatively small area (high density) result in grid cells requiring large numbers of comparisons, leading to unbalanced workloads across processing units.&lt;/p&gt;
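The binning step behind this approach can be sketched as follows (an illustrative sketch, not the paper's code; the cell size and bounding-box binning are assumptions):

```cpp
#include <algorithm>
#include <cmath>
#include <map>
#include <utility>
#include <vector>

struct Seg { double x1, y1, x2, y2; };

// Bin each segment into every grid cell its bounding box overlaps.
// Each cell then becomes an independent unit of work whose segments
// are tested pairwise for intersections.
std::map<std::pair<int,int>, std::vector<int>>
binSegments(const std::vector<Seg>& segs, double cell) {
    std::map<std::pair<int,int>, std::vector<int>> grid;
    for (int i = 0; i < (int)segs.size(); ++i) {
        int cx0 = (int)std::floor(std::min(segs[i].x1, segs[i].x2) / cell);
        int cx1 = (int)std::floor(std::max(segs[i].x1, segs[i].x2) / cell);
        int cy0 = (int)std::floor(std::min(segs[i].y1, segs[i].y2) / cell);
        int cy1 = (int)std::floor(std::max(segs[i].y1, segs[i].y2) / cell);
        for (int cx = cx0; cx <= cx1; ++cx)
            for (int cy = cy0; cy <= cy1; ++cy)
                grid[{cx, cy}].push_back(i);  // segment index joins the cell
    }
    return grid;
}
```

A dense cluster of segments lands in one cell's vector, which is exactly the workload-imbalance problem described above.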

&lt;h3&gt;
  
  
  Two-level Grid based Algorithm
&lt;/h3&gt;

&lt;p&gt;In this algorithm, a grid cell is subdivided into sub-grids whenever the number of line segments it contains exceeds a user-specified threshold.&lt;/p&gt;

&lt;h3&gt;
  
  
  Neighbor finding Reduction Variation of Parallel Sweep Line Algorithm
&lt;/h3&gt;

&lt;p&gt;This algorithm generates events for each endpoint of a line segment and each found intersection point. The initial endpoint events are processed in parallel first. The intersections that are found during the processing of the original endpoints generate new events.&lt;/p&gt;

&lt;h3&gt;
  
  
  Line Segment Filtering PSCMBR
&lt;/h3&gt;

&lt;p&gt;PSCMBR is a combination of PolySketch (used to find a Minimum Bounding Rectangle (MBR) around a predetermined number of line segments, called a tile) and CMBR (Common Minimum Bounding Rectangle). PolySketch filters for overlapping tiles, and the CMBR of the tiles refines the search for candidate line segments that intersect.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;These are five existing types of line segment intersection algorithms. Their results and time complexities vary, and they are used for benchmarking and comparison against the proposed algorithms.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Preliminaries
&lt;/h2&gt;

&lt;p&gt;The input is a list of line segments forming complex regions, which the algorithm sorts along the X-axis. If the segments do not form complex regions (self-intersecting regions or regions without a closed loop), the algorithm will still run to completion. The line segments are generated using a Python script. They are not generated in sorted order, because sorting is part of the algorithm and its time complexity is included in the final analysis.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The input data is not sorted and time taken to sort is also considered a part of the algorithm.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Hardware Materialized Active List
&lt;/h2&gt;

&lt;p&gt;The main difference lies in how the active list is handled. In traditional sweep line algorithms, when the left endpoint of a segment is encountered it is entered into the active list, and when the right endpoint is encountered it is removed from the active list.&lt;/p&gt;

&lt;p&gt;Since the list is sorted, a newly inserted line segment only needs to be checked against its neighboring segments in the active list for intersections.&lt;/p&gt;

&lt;p&gt;An imaginary sweep line moves across the plane from left to right. As the sweep line encounters new line segments, they are added to the hardware active list and processed for intersections and topology.&lt;/p&gt;

&lt;p&gt;When the left endpoint of a segment is encountered, a thread is assigned to it. The newly added segment is then checked for intersection against the segments held by all existing threads in parallel. The active list is thus materialized directly in hardware, which removes the logarithmic lookup time of a central tree-based active list. Care must be taken that the number of active line segments does not exceed the number of available CUDA processor cores.&lt;/p&gt;

&lt;p&gt;If the x value of a newly encountered left endpoint is larger than the x value of the right endpoint of a segment in the hardware active list, that segment's thread notices and frees itself, automatically removing the segment from the hardware active list.&lt;/p&gt;

&lt;p&gt;At the end of each loop iteration, the processing threads must be synchronized to ensure that all threads are testing the current line segment for intersections.&lt;/p&gt;

&lt;p&gt;The implementation is limited to a maximum active list size of 1024 due to the block-level synchronization. For N line segments, the active list size is roughly sqrt(N), so the algorithm can process about 1M total line segments across two polygons.&lt;/p&gt;
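As a rough CPU emulation of this mechanism (not the paper's CUDA implementation), the active list can be modeled as an array of slots standing in for threads: an entering segment claims a slot, is tested against all occupied slots in parallel, and a slot frees itself once the sweep passes its segment's right endpoint. Names and structure here are illustrative.

```cpp
#include <algorithm>
#include <vector>

struct Seg { double x1, y1, x2, y2; };  // assumed stored with x1 <= x2

static double cross(double ax, double ay, double bx, double by) {
    return ax * by - ay * bx;
}

static bool intersects(const Seg& a, const Seg& b) {
    double d1 = cross(b.x2 - b.x1, b.y2 - b.y1, a.x1 - b.x1, a.y1 - b.y1);
    double d2 = cross(b.x2 - b.x1, b.y2 - b.y1, a.x2 - b.x1, a.y2 - b.y1);
    double d3 = cross(a.x2 - a.x1, a.y2 - a.y1, b.x1 - a.x1, b.y1 - a.y1);
    double d4 = cross(a.x2 - a.x1, a.y2 - a.y1, b.x2 - a.x1, b.y2 - a.y1);
    return ((d1 > 0) != (d2 > 0)) && ((d3 > 0) != (d4 > 0));
}

// Slots stand in for the CUDA threads that materialize the active list;
// -1 marks a free slot, mirroring the 1024-thread block limit.
int hmalEmulation(std::vector<Seg> segs, int maxSlots = 1024) {
    std::sort(segs.begin(), segs.end(),
              [](const Seg& a, const Seg& b) { return a.x1 < b.x1; });
    std::vector<int> slot(maxSlots, -1);
    int count = 0;
    for (int s = 0; s < (int)segs.size(); ++s) {
        double x = segs[s].x1;
        int found = 0;
        // One iteration of the loop: every occupied slot self-frees if the
        // sweep has passed it, otherwise tests the incoming segment.
        #pragma omp parallel for reduction(+:found)
        for (int t = 0; t < maxSlots; ++t) {
            if (slot[t] < 0) continue;
            if (segs[slot[t]].x2 < x) { slot[t] = -1; continue; }  // self-free
            if (intersects(segs[s], segs[slot[t]])) ++found;
        }
        count += found;
        for (int t = 0; t < maxSlots; ++t)  // incoming segment claims a slot
            if (slot[t] < 0) { slot[t] = s; break; }
    }
    return count;
}
```

The serial loop over segments plays the role of the sweep line; the barrier at the end of the parallel loop corresponds to the block-level synchronization that caps the active list at 1024.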

&lt;blockquote&gt;
&lt;p&gt;The active list is actually made of threads: when the left endpoint of a line segment is encountered, it is assigned a thread, which checks for intersections against all items of the active list on their respective threads and stores the results. When the right endpoint of the line segment is encountered, the thread is cleared. This ensures that as the sweep line moves, intersections are only checked within the active list. The limitation of this approach is the maximum size of the active list: due to the block-level synchronization it can be at most 1024, so the algorithm can process roughly 1M line segments of two polygons at once.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Per Segment Sweep Line Algorithm
&lt;/h2&gt;

&lt;p&gt;This algorithm tests line segments as if each were encountered by its own per-segment sweep line. The intersection tests themselves are deferred; the pairing of line segments is computed first.&lt;/p&gt;

&lt;p&gt;Once the pairing of the line segments has been calculated, separate dynamic threads are launched to perform the intersection tests. Since pairing and intersection testing happen separately, the algorithm is not constrained to a single thread block as HMAL was.&lt;/p&gt;

&lt;p&gt;The endpoints are sorted by their x-value, y-value, segment ID, and left-right flag using a standard parallel sort. The left-right flag indicates whether the endpoint is the left endpoint (value 1) or the right endpoint (value 0) of its line segment.&lt;/p&gt;
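The endpoint records and their sort keys can be sketched like this (illustrative only; `std::sort` stands in for the parallel sort used on the GPU, and segments are assumed stored with x1 &lt;= x2):

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

struct Seg { double x1, y1, x2, y2; };

// One record per segment endpoint: x, y, segment ID, and the left-right
// flag (1 = left endpoint, 0 = right), mirroring the sort keys above.
struct Endpoint { double x, y; int segId, isLeft; };

std::vector<Endpoint> sortedEndpoints(const std::vector<Seg>& segs) {
    std::vector<Endpoint> pts;
    for (int i = 0; i < (int)segs.size(); ++i) {
        pts.push_back({segs[i].x1, segs[i].y1, i, 1});  // left endpoint
        pts.push_back({segs[i].x2, segs[i].y2, i, 0});  // right endpoint
    }
    // Lexicographic sort on (x, y, segId, left-right flag).
    std::sort(pts.begin(), pts.end(), [](const Endpoint& a, const Endpoint& b) {
        return std::tie(a.x, a.y, a.segId, a.isLeft) <
               std::tie(b.x, b.y, b.segId, b.isLeft);
    });
    return pts;
}
```

After this sort, scanning the endpoint array left to right is equivalent to advancing the sweep line, which is what the TCV computation in the next step walks over.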

&lt;p&gt;Test Count Values (TCVs), the number of intersection tests for each line segment, are then calculated.&lt;/p&gt;

&lt;p&gt;Threads are launched based on the worklist created in the previous step. Possible intersections are then calculated based on the left-most endpoint, which prevents repeated calculation of the same intersection point. Intersections of the types in Fig. (a, b, c) are calculated for line segment A. The Fig. (d) type is calculated under line segment B. The Fig. (e) type is also calculated by line segment B, because the rightmost endpoint of line segment A falls within line segment B. The Fig. (f) type is likewise calculated by line segment B, because A and C do not fully cross segment B.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzrycgb7kka64em8nw05d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzrycgb7kka64em8nw05d.png" alt="Types of Intersections" width="800" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the intersection points are found, the line segment will be broken down and the interior above and interior below data tags will be updated on each line segment.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;First the TCVs are calculated by sorting each line by features and also number of intersections are calculated. Then worker threads are launched based on the number of intersections. Then based on the type of intersection the intersections are calculated in order to prevent the repeated calculations. Then the line segments are broken at intersection points and the data is updated.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Algorithm Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  HMAL Time Complexity
&lt;/h3&gt;

&lt;p&gt;Sorting of Line Segments: O (&lt;em&gt;N log N&lt;/em&gt;)&lt;/p&gt;

&lt;p&gt;Detecting Intersections: O(&lt;em&gt;N&lt;/em&gt;)&lt;/p&gt;

&lt;p&gt;Sorting the intersections: O(&lt;em&gt;k log k&lt;/em&gt;)&lt;/p&gt;

&lt;p&gt;Breaking of Line Segment: O(&lt;em&gt;k&lt;/em&gt;)&lt;/p&gt;

&lt;p&gt;Total: O(&lt;em&gt;N log N&lt;/em&gt; + &lt;em&gt;k log k&lt;/em&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  PSSL Time Complexity
&lt;/h3&gt;

&lt;p&gt;Creation of stencils and determination of intersection tests: O(&lt;em&gt;N&lt;/em&gt;)&lt;/p&gt;

&lt;p&gt;PSSL Algorithm: O(&lt;em&gt;N&lt;/em&gt;)&lt;/p&gt;

&lt;p&gt;Breaking of Line Segment: O(&lt;em&gt;k log k&lt;/em&gt;)&lt;/p&gt;

&lt;p&gt;Update Topology Flags: O(&lt;em&gt;k&lt;/em&gt;)&lt;/p&gt;

&lt;p&gt;Total: O(&lt;em&gt;N log N&lt;/em&gt; + &lt;em&gt;k log k&lt;/em&gt;).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The time complexity of both algorithms is similar, at O(&lt;em&gt;N log N&lt;/em&gt; + &lt;em&gt;k log k&lt;/em&gt;). The main difference is that HMAL is limited by the maximum CUDA thread block size, whereas PSSL is not.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Experimental Results
&lt;/h2&gt;

&lt;p&gt;Each test is performed 5 times and the average is taken. The PS-ACC algorithm was not tested because its source code was not available (despite being quadratic, PS-ACC reportedly performs well).&lt;/p&gt;

&lt;p&gt;By far the best performance is given by PSSL-CPU, because the datasets are small (10K to 80K lines). PSSL-GPU lags behind by an order of magnitude due to data transfer to/from the GPU and kernel launch overheads. This suggests that PSSL-CPU could be used in database applications like PostGIS.&lt;/p&gt;

&lt;p&gt;The authors used their own implementation of the serial sweep line (SSL) algorithm in order to address its known sensitivity to numerical precision and rounding errors. On the Parks/Counties dataset, SSL enters an infinite loop due to arithmetic precision errors.&lt;/p&gt;

&lt;p&gt;PSSL-GPU scales linearly or sublinearly because the maximum size of the active list never exceeds the number of processing cores.&lt;/p&gt;

&lt;p&gt;PSSL did not struggle on either the CPU or the GPU, but it is sensitive to the density of intersections.&lt;/p&gt;

&lt;p&gt;PSCMBR uses geometric overlay rather than joins to find polygon intersections. It cannot be directly compared with PSSL, since PSSL performs a join operation while PSCMBR performs polygon overlay. In the results, PSCMBR performs better on high-density datasets, where PSSL struggles.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Each test is run 5 times and the average time is reported. The algorithms are tested on both generated and real-world datasets. The HMAL algorithm could not run on all tests due to its block synchronization requirement. PSSL performed better across the tests, with some struggle on dense datasets. The PSSL algorithm is also compared with PSCMBR.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
    </item>
    <item>
      <title>Parallelizing Sorting Algorithms using OpenMP</title>
      <dc:creator>Rohit Sah</dc:creator>
      <pubDate>Sun, 13 Oct 2024 23:00:06 +0000</pubDate>
      <link>https://forem.com/sahrohit/parallelizing-sorting-algorithms-using-openmp-1hec</link>
      <guid>https://forem.com/sahrohit/parallelizing-sorting-algorithms-using-openmp-1hec</guid>
      <description>&lt;p&gt;This blog is originally post on &lt;a href="https://blog.sahrohit.com.np/posts/parallel-sorting-algorithm" rel="noopener noreferrer"&gt;https://blog.sahrohit.com.np/posts/parallel-sorting-algorithm&lt;/a&gt; and distributed across platforms for more reach.&lt;/p&gt;

&lt;p&gt;This blog investigates parallel programming using OpenMP, particularly in the context of sorting algorithms. The goal is to compare the performance of serial and parallel implementations of three well-known sorting algorithms (Bubble Sort, Quick Sort, and Merge Sort) on randomly generated numbers, providing insight into how parallel programming impacts computational efficiency.&lt;/p&gt;

&lt;p&gt;The blog involves creating both serial (S) and parallel (P) implementations for each of the three sorting algorithms, alongside a reference implementation using STL’s std::sort to benchmark the results. The performance of these algorithms is evaluated by measuring the execution times across varying input sizes, and the effects of hyper-threading are also studied.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hardware Overview
&lt;/h3&gt;

&lt;p&gt;The hardware setup for the systems used in this blog includes two servers. The first, radish, has 4 cores with 2 threads per core, resulting in 8 virtual CPUs (vCPUs). The second, jalapeno, has 8 cores with 2 threads per core, providing 16 vCPUs; hyper-threading is enabled on both. These systems provide a range of hardware configurations for testing the parallel efficiency of sorting algorithms with OpenMP.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sorting Algorithm Overview
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Bubble Sort (O(n^2))
&lt;/h3&gt;

&lt;p&gt;Bubble Sort is a simple comparison-based sorting algorithm with quadratic time complexity that works by comparing each item with its adjacent item and swapping to sort. Parallelizing Bubble Sort is challenging due to its inherent sequential nature.&lt;/p&gt;

&lt;p&gt;The parallel version of this algorithm alternates between two phases to ensure all elements are correctly positioned. Each thread operates on distinct portions of the data during these phases, improving efficiency by reducing the number of redundant comparisons. The shared state of the array is maintained consistently across threads by using synchronization techniques, such as reduction, to ensure proper sorting without conflicts or race conditions. This approach enables the system to leverage available computational resources fully, speeding up the sorting process, especially for large datasets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;bubbleSortParallel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;sorted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;sorted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="cp"&gt;#pragma omp parallel private(tmp)
&lt;/span&gt;        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// Even phase&lt;/span&gt;
            &lt;span class="cp"&gt;#pragma omp for reduction(&amp;amp;&amp;amp;:sorted)
&lt;/span&gt;            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
                    &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
                    &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                    &lt;span class="n"&gt;sorted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="c1"&gt;// Odd phase&lt;/span&gt;
            &lt;span class="cp"&gt;#pragma omp for reduction(&amp;amp;&amp;amp;:sorted)
&lt;/span&gt;            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
                    &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
                    &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                    &lt;span class="n"&gt;sorted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Quick Sort (O(n log n))
&lt;/h3&gt;

&lt;p&gt;Quick Sort is a widely used, efficient divide-and-conquer algorithm. By recursively partitioning the array, it achieves O(n log n) performance in the average case. Quick Sort can be parallelized by assigning different partitions to different threads.&lt;/p&gt;

&lt;p&gt;In the parallel version of Quick Sort, the divide-and-conquer approach is further optimized by assigning different array partitions to separate threads. After partitioning the array around a pivot element, the left and right subarrays are handled independently. This independence allows for parallel execution, where multiple threads can simultaneously sort different parts of the array, greatly reducing the overall sorting time. By controlling the depth of recursion, tasks are conditionally created to avoid excessive parallelization overhead, ensuring that computational resources are utilized efficiently. As a result, parallel Quick Sort can handle large datasets more effectively than its sequential counterpart.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;quickSortParallel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;low&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;high&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;low&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;high&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;pi&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;low&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;high&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="cp"&gt;#pragma omp task shared(arr) if(depth &amp;lt; 3)
&lt;/span&gt;        &lt;span class="n"&gt;quickSortParallel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;low&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pi&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="cp"&gt;#pragma omp task shared(arr) if(depth &amp;lt; 3)
&lt;/span&gt;        &lt;span class="n"&gt;quickSortParallel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pi&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;high&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="cp"&gt;#pragma omp taskwait
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the source code, parallelization is achieved through the #pragma omp task directives, which create parallel tasks for sorting the left and right subarrays after partitioning. Specifically, after calculating the pivot index (pi), two separate tasks are spawned: one for sorting the subarray to the left of the pivot (quickSortParallel(arr, low, pi - 1, depth + 1)) and another for the subarray to the right of the pivot (quickSortParallel(arr, pi + 1, high, depth + 1)). These tasks are executed concurrently by different threads.&lt;/p&gt;

&lt;p&gt;The condition if(depth &amp;lt; 3) ensures that task creation is limited to the first few recursive levels, which prevents excessive overhead from creating too many tasks in deep recursion. The #pragma omp taskwait directive ensures that the program waits for both sorting tasks to complete before moving forward, maintaining correctness in the overall sorting process. This structured approach efficiently parallelizes the Quick Sort algorithm while balancing task creation and resource usage.&lt;/p&gt;
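One detail the snippet leaves implicit: OpenMP tasks only execute concurrently when the recursion is seeded from inside a parallel region, typically with a single thread creating the first tasks. A minimal driver under that assumption (the `sortInParallel` wrapper and this `partition` are illustrative, not the article's exact code) looks like:

```cpp
#include <algorithm>
#include <vector>

// Lomuto partition around the last element, as commonly paired with
// this style of parallel Quick Sort.
int partition(int arr[], int low, int high) {
    int pivot = arr[high], i = low - 1;
    for (int j = low; j < high; ++j)
        if (arr[j] < pivot) std::swap(arr[++i], arr[j]);
    std::swap(arr[i + 1], arr[high]);
    return i + 1;
}

void quickSortParallel(int arr[], int low, int high, int depth = 0) {
    if (low < high) {
        int pi = partition(arr, low, high);

        #pragma omp task shared(arr) if(depth < 3)
        quickSortParallel(arr, low, pi - 1, depth + 1);

        #pragma omp task shared(arr) if(depth < 3)
        quickSortParallel(arr, pi + 1, high, depth + 1);

        #pragma omp taskwait
    }
}

// Tasks need an enclosing parallel region; one thread seeds the task
// tree and the team's idle threads pick up the spawned tasks.
void sortInParallel(std::vector<int>& v) {
    #pragma omp parallel
    {
        #pragma omp single
        quickSortParallel(v.data(), 0, (int)v.size() - 1);
    }
}
```

Without the `parallel`/`single` pair, the task pragmas are effectively no-ops and the sort runs serially, which is a common pitfall when benchmarking task-based OpenMP code.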

&lt;h3&gt;
  
  
  Merge Sort (O(n log n))
&lt;/h3&gt;

&lt;p&gt;Merge Sort is another divide-and-conquer algorithm, with guaranteed O(n log n) performance in the worst case. It is particularly suitable for parallelization since different subarrays can be sorted concurrently, making it an ideal candidate for OpenMP.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;mergeSortParallel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="cp"&gt;#pragma omp task shared(arr) if(depth &amp;lt; 3)
&lt;/span&gt;        &lt;span class="n"&gt;mergeSortParallel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="cp"&gt;#pragma omp task shared(arr) if(depth &amp;lt; 3)
&lt;/span&gt;        &lt;span class="n"&gt;mergeSortParallel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="cp"&gt;#pragma omp taskwait
&lt;/span&gt;        &lt;span class="n"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the parallel version of Merge Sort, the array is recursively divided into smaller subarrays that can be sorted independently. The key opportunity for parallelization lies in the recursive calls: the two halves can be sorted simultaneously in separate threads, while the merge of each pair of sorted halves remains a sequential step. Running the independent subarray sorts in parallel significantly reduces the overall runtime, especially for large datasets. The recursion depth is capped to prevent excessive overhead, ensuring that only the upper levels of recursion are parallelized, where the benefits of parallelism are most significant.&lt;/p&gt;

&lt;p&gt;In the source code, parallelization is implemented with the #pragma omp task directive, which creates parallel tasks for sorting the left and right halves of the array (mergeSortParallel(arr, l, m, depth + 1) and mergeSortParallel(arr, m + 1, r, depth + 1)). The if(depth &amp;lt; 3) clause restricts task creation to the upper levels of recursion, balancing parallelism against task-creation overhead. The #pragma omp taskwait directive ensures that both sorting tasks complete before the merge step, which is essential for correctness. Note that for the tasks to actually run in parallel, the initial call to mergeSortParallel must be made from inside a #pragma omp parallel region (typically under #pragma omp single), so that a thread team exists to execute the tasks.&lt;/p&gt;
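The merge step itself runs sequentially inside each task. A minimal sketch of the two-pointer merge that mergeSortParallel relies on (the repository's exact implementation may differ):

```cpp
#include <vector>

// Merge the sorted halves arr[l..m] and arr[m+1..r] back into arr.
// Each task operates on a disjoint index range, so no locking is needed.
void merge(int arr[], int l, int m, int r) {
    std::vector<int> tmp;
    tmp.reserve(r - l + 1);
    int i = l, j = m + 1;
    while (i <= m && j <= r)
        tmp.push_back(arr[i] <= arr[j] ? arr[i++] : arr[j++]);
    while (i <= m) tmp.push_back(arr[i++]);   // drain the left half
    while (j <= r) tmp.push_back(arr[j++]);   // drain the right half
    for (int k = 0; k < (int)tmp.size(); ++k)
        arr[l + k] = tmp[k];
}
```

Using `arr[i] <= arr[j]` (rather than `<`) keeps the merge stable when the halves contain equal keys.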

&lt;h2&gt;
  
  
  Methodology
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Experimental Setup
&lt;/h3&gt;

&lt;p&gt;To evaluate performance, each program generates a random array of integers whose size was varied from small to large (2,500 to 500,000 integers). The array size and random seed were supplied as command-line arguments. Each sorting algorithm, in both its serial and parallel (16-thread) versions, was compiled into a separate executable, and its execution time was measured.&lt;/p&gt;

&lt;p&gt;Each executable took two command-line arguments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cpp &lt;span class="o"&gt;[&lt;/span&gt;executable name] &lt;span class="o"&gt;[&lt;/span&gt;number of random integers to generate] &lt;span class="o"&gt;[&lt;/span&gt;seed value &lt;span class="k"&gt;for &lt;/span&gt;random number generation]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, a &lt;a href="https://github.com/sahrohit/parallel-sorting/blob/main/run.sh" rel="noopener noreferrer"&gt;bash script&lt;/a&gt; was written to build and run all of the executables and write each runtime into a single CSV file.&lt;/p&gt;
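Inside each executable, the two arguments might be consumed along these lines (a sketch; makeInput is an illustrative name, not necessarily the repository's):

```cpp
#include <cstdlib>
#include <vector>

// Build the input array from the two CLI arguments: size and seed.
// Seeding the generator makes runs reproducible, so every executable
// in a benchmark round sorts an identical array.
std::vector<int> makeInput(int size, unsigned seed) {
    std::srand(seed);
    std::vector<int> v(size);
    for (int i = 0; i < size; ++i) v[i] = std::rand();
    return v;
}
```

In main this would be called as makeInput(std::atoi(argv[1]), std::atoi(argv[2])).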

&lt;p&gt;Parallelization was achieved using OpenMP, with each sorting algorithm parallelized according to its structure. Execution times were measured with the C++ chrono library. Performance was compared across three variants of each algorithm: serial, parallel, and STL’s std::sort as a reference.&lt;/p&gt;
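The chrono-based measurement pattern looks roughly like this (a sketch; time_sort is an illustrative name):

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Time one sort call and return elapsed wall-clock seconds,
// the value each executable writes out for the CSV.
double time_sort(std::vector<int>& v) {
    auto start = std::chrono::steady_clock::now();
    std::sort(v.begin(), v.end());      // the algorithm under test
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

steady_clock is the safer choice for interval measurement, since it is guaranteed to be monotonic.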

&lt;h3&gt;
  
  
  Metrics for Comparison
&lt;/h3&gt;

&lt;p&gt;The primary metric used for comparison was execution time in seconds. Tests were performed across a range of input sizes to observe how performance scales with increasing data. The results were plotted in graphs to visualize the difference between serial, parallel, and reference implementations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bubble Sort Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Graph 1: Bubble Sort (bbs, bbp, reference)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmrjsksg5k2yc2ofscsdd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmrjsksg5k2yc2ofscsdd.png" alt="Bubble Sort Runtime over Input Size" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F461oz82zofj18glz9ztg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F461oz82zofj18glz9ztg.png" alt="Bubble Sort Runtime (log) over Input Size" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These graphs compare the execution times of three sorting implementations (the C++ STL sort, the parallel bubble sort, and the serial bubble sort) across a range of input sizes. The parallel version is considerably slower on smaller arrays because of the overhead of thread management, synchronization, and cache coherency. For larger inputs, however, its execution time improves considerably relative to the serial version: CPU cores are utilized more effectively, memory access patterns are more efficient, and hardware prefetching is better exploited, even though both bubble sort implementations remain O(n²).&lt;/p&gt;

&lt;h3&gt;
  
  
  Explanation of Graph
&lt;/h3&gt;

&lt;p&gt;For smaller datasets, the overhead associated with managing threads in the parallel version (bbp) leads to slower performance compared to the serial version (bbs). However, as the dataset size increases, the parallel implementation starts to outperform the serial one by distributing the workload across multiple cores. The reference STL sort algorithm consistently surpasses both bbs and bbp due to its sophisticated, optimized design, which includes considerations for memory access patterns, CPU pipelining, and more advanced sorting algorithms with better time complexity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Analysis
&lt;/h3&gt;

&lt;p&gt;The weaker performance of parallel bubble sort on smaller arrays stems from the considerable overhead involved in thread creation and synchronization. Additionally, bubble sort’s inherent memory access patterns introduce frequent synchronization barriers due to data dependencies. However, on larger datasets, the increased workload enables parallelism to be more effective, resulting in better performance for bbp compared to its serial counterpart. Nevertheless, both bubble sort variants remain slower than the STL sort, which leverages advanced algorithms and optimizations tailored for modern CPU architectures.&lt;/p&gt;
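A common way to parallelize bubble sort, and a plausible shape for bbp, is odd-even transposition sort: alternating phases compare disjoint pairs, so each phase can be split across threads with a simple parallel for. A sketch under that assumption (the repository's actual bbp may differ):

```cpp
#include <utility>

// Odd-even transposition sort: each phase touches disjoint index
// pairs, so the inner-loop iterations are race-free and can run in
// parallel. n phases guarantee a fully sorted array.
void bubbleSortParallel(int arr[], int n) {
    for (int phase = 0; phase < n; ++phase) {
        int first = phase % 2;  // even phase: (0,1),(2,3)...; odd: (1,2),(3,4)...
        #pragma omp parallel for
        for (int i = first; i < n - 1; i += 2)
            if (arr[i] > arr[i + 1])
                std::swap(arr[i], arr[i + 1]);
    }
}
```

Compile with -fopenmp; without it the pragma is ignored and the sort still runs correctly, just serially.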

&lt;h2&gt;
  
  
  Quick Sort Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Graph 2: Quick Sort (qss, qsp, reference)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft17koipnhg4w6xtpcfvc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft17koipnhg4w6xtpcfvc.png" alt="Quick Sort Runtime over Input Size" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnommb8d4eyhp5zkwm5y3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnommb8d4eyhp5zkwm5y3.png" alt="Quick Sort Runtime (log) over Input Size" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The graphs compare the execution times of three different quick sort implementations: serial (qss), parallel (qsp with 16 threads), and the reference (STL sort). Surprisingly, both the serial and parallel implementations outperform the std::sort, which can be attributed to key differences in the algorithmic approach and optimizations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Explanation of Graph
&lt;/h3&gt;

&lt;p&gt;For smaller datasets, the serial Quick Sort (qss) performs comparably to the parallel Quick Sort (qsp). As input sizes increase, the parallel version takes advantage of multi-core processing, drastically reducing execution times compared to the serial version. Interestingly, despite the std::sort being highly optimized, both qss and qsp demonstrate faster runtimes. This outcome suggests that the overhead introduced by std::sort's additional optimizations becomes detrimental, especially for the data distributions tested here.&lt;/p&gt;

&lt;h3&gt;
  
  
  Analysis
&lt;/h3&gt;

&lt;p&gt;The slower performance of std::sort in this comparison is likely due to the overhead of its more complex optimizations, such as the 3-partition quicksort and the switch to insertion sort for small partitions. While these optimizations are designed for handling worst-case scenarios and duplicate-heavy datasets, they can add unnecessary overhead when such conditions are not present. In contrast, the simpler Lomuto partition scheme used in qss and qsp proves more efficient for the input sizes and data distributions tested here, leading to better performance.&lt;/p&gt;
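The Lomuto scheme mentioned above, combined with the same depth-limited task pattern as the merge sort example, gives a quicksort sketch along these lines (names and the depth cutoff are assumptions; the repository's qss/qsp may differ):

```cpp
#include <utility>

// Lomuto partition: take the last element as the pivot and sweep
// left to right, growing the "< pivot" prefix.
int partition(int arr[], int lo, int hi) {
    int pivot = arr[hi];
    int i = lo - 1;
    for (int j = lo; j < hi; ++j)
        if (arr[j] < pivot) std::swap(arr[++i], arr[j]);
    std::swap(arr[i + 1], arr[hi]);
    return i + 1;
}

// After partitioning, the two halves are fully independent, so they
// can run as OpenMP tasks just like the merge sort halves.
void quickSortParallel(int arr[], int lo, int hi, int depth = 0) {
    if (lo < hi) {
        int p = partition(arr, lo, hi);

        #pragma omp task shared(arr) if(depth < 3)
        quickSortParallel(arr, lo, p - 1, depth + 1);

        #pragma omp task shared(arr) if(depth < 3)
        quickSortParallel(arr, p + 1, hi, depth + 1);

        #pragma omp taskwait
    }
}
```

Unlike merge sort, the partition happens before the tasks are spawned, so the taskwait is only needed to keep the parent from returning while its subtree is still being sorted.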

&lt;h2&gt;
  
  
  Merge Sort Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Graph 3: Merge Sort (mss, msp, reference)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvz9wysd4jkd4rogeih7l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvz9wysd4jkd4rogeih7l.png" alt="Merge Sort Runtime over Input Size" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4vdqpaf08o3o3awx9ws7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4vdqpaf08o3o3awx9ws7.png" alt="Merge Sort Runtime (log) over Input Size" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The third graph presents the performance of Merge Sort, comparing the serial version (mss), the parallel version (msp), and the reference (STL sort). Merge Sort is well-suited for parallelization, making it reasonable to expect the parallel version (msp) to outperform the serial counterpart (mss) when tested across dataset sizes. The graph illustrates how the parallel version takes advantage of multi-threading, particularly for larger inputs, while the reference STL sort only performs slightly better than serial merge sort.&lt;/p&gt;

&lt;h3&gt;
  
  
  Explanation of Graph
&lt;/h3&gt;

&lt;p&gt;The parallel Merge Sort (msp) consistently outperforms the serial version (mss) for larger input sizes. By utilizing 16 threads, msp efficiently splits the array and merges the sections concurrently, significantly improving execution time compared to the serial version (mss). However, for smaller datasets, the overhead of managing multiple threads diminishes msp's advantage. On the other hand, STL sort, while not parallelized, leverages highly efficient algorithms, including a 3-partition quicksort, insertion sort, and heap sort fallback, allowing it to perform slightly better than the serial Merge Sort (mss), particularly on small datasets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Analysis
&lt;/h3&gt;

&lt;p&gt;Merge Sort is highly suited for parallelization, as its divide-and-conquer approach naturally allows independent sections of an array to be merged simultaneously. The parallel Merge Sort (msp) significantly reduces execution time for larger datasets by distributing the workload across 16 threads. While the STL sort benefits from optimized memory management and CPU cache utilization, it cannot leverage parallelism, making it slower than msp on large datasets. However, it still manages to outperform the serial Merge Sort (mss) in small data scenarios due to its efficient hybrid algorithms that avoid the worst-case time complexities typical of quicksort.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hyper-threading Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Hyper-threading Results
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu80r5cdyscr92fuod165.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu80r5cdyscr92fuod165.png" alt="Bubble Sort Hyperthreaded" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4tjbfde58oihyx23fhq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4tjbfde58oihyx23fhq.png" alt="Bubble Sort Hyperthreaded (log)" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbh7a5kqntgyx6hfo42ps.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbh7a5kqntgyx6hfo42ps.png" alt="Quick Sort Hyperthreaded" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmnq7qg5e4rb0zqghn7jz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmnq7qg5e4rb0zqghn7jz.png" alt="Quick Sort Hyperthreaded (log)" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fik9m7dli2f5ebo64w2n4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fik9m7dli2f5ebo64w2n4.png" alt="Merge Sort Hyperthreaded" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38euypgmjise98cr8vh5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38euypgmjise98cr8vh5.png" alt="Merge Sort Hyperthreaded (log)" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The hyper-threading experiment was conducted on all of the above algorithms. With hyper-threading enabled, execution time was measured for varying input sizes. The graphs show that runtimes decrease in every case: all three algorithms perform better with hyper-threading.&lt;/p&gt;

&lt;h3&gt;
  
  
  Explanation of Results
&lt;/h3&gt;

&lt;p&gt;Hyper-threading shows a small performance improvement across input sizes, particularly when the workload is large enough to saturate the logical cores. For smaller datasets, hyper-threading brings minimal benefit, as the threads do not fully utilize the available resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Analysis
&lt;/h3&gt;

&lt;p&gt;Hyper-threading can improve performance by allowing more threads to run simultaneously, but this is only beneficial when the workload is large enough to keep all logical cores busy. When the workload is too small, the overhead of hyper-threading can actually degrade performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The results demonstrate that while parallel implementations of sorting algorithms can provide performance improvements, the benefits are heavily dependent on the size of the dataset and the specific algorithm. For smaller datasets, the overhead of managing threads often negates the benefits of parallelism. However, for larger datasets, parallel versions outperform their serial counterparts. The analysis also shows that hyper-threading can provide performance boosts, but only under certain conditions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Source Code
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/sahrohit/parallel-sorting.git" rel="noopener noreferrer"&gt;https://github.com/sahrohit/parallel-sorting.git&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/sahrohit/parallel-sorting/blob/main/combined.csv" rel="noopener noreferrer"&gt;https://github.com/sahrohit/parallel-sorting/blob/main/combined.csv&lt;/a&gt;&lt;/p&gt;

</description>
      <category>performance</category>
      <category>algorithms</category>
      <category>cpp</category>
      <category>compiling</category>
    </item>
    <item>
      <title>Automated Latex Resume with GitHub Action</title>
      <dc:creator>Rohit Sah</dc:creator>
      <pubDate>Sun, 12 May 2024 17:49:44 +0000</pubDate>
      <link>https://forem.com/sahrohit/automated-latex-resume-with-github-action-e0p</link>
      <guid>https://forem.com/sahrohit/automated-latex-resume-with-github-action-e0p</guid>
      <description>&lt;p&gt;This blog is originally post on &lt;a href="https://blog.sahrohit.com.np/posts/automated-latex-resume" rel="noopener noreferrer"&gt;https://blog.sahrohit.com.np/posts/automated-latex-resume&lt;/a&gt; and distributed across platforms for more reach.&lt;/p&gt;

&lt;p&gt;Resumes are a crucial tool for developers. They are often the first impression you make on a potential employer, so it's important to have a well-formatted and ATS-friendly resume. However, keeping your resume updated across multiple platforms can be a hassle. This blog post explores how to use Latex and GitHub Actions to automate the resume update process.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Finding ATS-friendly resume templates is easy. The challenge lies in keeping your resume updated across various platforms, including your portfolio, GitHub README, and downloadable links. Manually updating each location is time-consuming and error-prone, and quickly leads to a versioning hell of resume copies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjf277ighzdgxfatfal9z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjf277ighzdgxfatfal9z.png" alt="Versioning Hell of my Resume’s" width="711" height="640"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Versioning hell of my resumes&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution Attempts
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Version Control with Git
&lt;/h3&gt;

&lt;p&gt;I've always wanted to use Git to manage versions of my resume, but dealing with .docx files in Git isn't very enjoyable. My breakthrough with Git version control came when I switched to creating my resume in LaTeX. Not only is LaTeX ATS-friendly, but it's also easy to maintain since I can track changes in my Git repository. LaTeX does have a learning curve, but you can start editing in minutes by referring to guides on platforms like &lt;a href="https://www.overleaf.com" rel="noopener noreferrer"&gt;Overleaf&lt;/a&gt;, one of the best online LaTeX editors. This solved one of my problems, but updating links everywhere was still an issue.&lt;/p&gt;

&lt;p&gt;Resume.tex - &lt;a href="https://github.com/sahrohit/sahrohit/blob/master/resume.tex" rel="noopener noreferrer"&gt;https://github.com/sahrohit/sahrohit/blob/master/resume.tex&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Overleaf Link -  &lt;a href="https://www.overleaf.com/read/qxxqjqgxcfzn#5da12c" rel="noopener noreferrer"&gt;https://www.overleaf.com/read/qxxqjqgxcfzn#5da12c&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  GitHub Actions for Automation
&lt;/h3&gt;

&lt;p&gt;I decided to create a separate GitHub repository to manage my resume, but realized that the profile repository (the one named after my username, whose README shows on my profile) would be the best place to keep it. I used GitHub Actions to automate the build process, generating the PDF and committing it to the same repository. This setup serves two purposes: it verifies that the LaTeX file builds successfully, and it keeps the PDF of my resume in my profile repository. The action runs only when the resume.tex file changes, committing the generated PDF back to the same repository.&lt;/p&gt;

&lt;p&gt;Additionally, the generated PDF is uploaded as an artifact and later downloaded to be committed to the portfolio repository. Currently, my portfolio is built with Next.js and includes the resume in the public folder, making it downloadable via the website.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build and Commit Resume&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;master&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;resume.tex&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;compile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Compile resume PDF&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Check out Resume Repository&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sahrohit/sahrohit&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run the build process with Docker&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;addnab/docker-run-action@v3&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;thomasweise/docker-texlive-full:latest&lt;/span&gt;
          &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;-v ${{ github.workspace }}:/data&lt;/span&gt;
          &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
            &lt;span class="s"&gt;cd data&lt;/span&gt;
            &lt;span class="s"&gt;pdflatex resume.tex&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Upload PDF artifact&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/upload-artifact@v3&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;resume-pdf&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;resume.pdf&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Commit and push changes&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;stefanzweifel/git-auto-commit-action@v5&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;commit_message&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Updated Resume PDF&lt;/span&gt;

  &lt;span class="na"&gt;commit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Commit resume to Portfolio&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;compile&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Check out portfolio repo&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sahrohit/portfolio&lt;/span&gt;
          &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.ACCESS_TOKEN }}&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Download PDF artifact&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/download-artifact@v3&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;resume-pdf&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Move PDF to public directory&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;mv resume.pdf public/"Rohit's-Resume.pdf"&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Commit and push changes&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;stefanzweifel/git-auto-commit-action@v5&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;commit_message&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Updated Resume PDF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Upon changes to the &lt;a href="https://github.com/sahrohit/sahrohit/blob/master/resume.tex" rel="noopener noreferrer"&gt;resume.tex&lt;/a&gt; file in the &lt;a href="https://github.com/sahrohit/sahrohit" rel="noopener noreferrer"&gt;sahrohit/sahrohit&lt;/a&gt; repository, the workflow generates a PDF and commits it to the same repository. It then commits the updated resume to my portfolio (&lt;a href="https://github.com/sahrohit/portfolio" rel="noopener noreferrer"&gt;sahrohit/portfolio&lt;/a&gt;). Since &lt;a href="https://github.com/sahrohit/portfolio" rel="noopener noreferrer"&gt;sahrohit/portfolio&lt;/a&gt; is private, you'll need an ACCESS_TOKEN in the project’s secrets to authorize the commit.&lt;/p&gt;

&lt;p&gt;At the end of the day, what it does is save me a few manual commits and updates. But more than that, it gives me the satisfaction of updating the resume.tex file and initiating the entire process with a single commit. It's fulfilling to see how the resume updates across multiple places. It's these small satisfactions that keep us going and looking forward to another day.&lt;/p&gt;

</description>
      <category>githubactions</category>
      <category>automation</category>
      <category>latex</category>
      <category>resume</category>
    </item>
  </channel>
</rss>
