<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Alfiya Fatima</title>
    <description>The latest articles on Forem by Alfiya Fatima (@alfiyafatima09).</description>
    <link>https://forem.com/alfiyafatima09</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1330812%2F89894d7e-9ae7-4040-afb2-b498ae9fd119.jpg</url>
      <title>Forem: Alfiya Fatima</title>
      <link>https://forem.com/alfiyafatima09</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/alfiyafatima09"/>
    <language>en</language>
    <item>
      <title>From DNA to DSA : How I Went From Studying Genomes To Understanding Algorithms!</title>
      <dc:creator>Alfiya Fatima</dc:creator>
      <pubDate>Mon, 30 Dec 2024 19:33:26 +0000</pubDate>
      <link>https://forem.com/alfiyafatima09/from-dna-to-dsa-how-i-went-from-studying-genomes-to-understanding-algorithms-238c</link>
      <guid>https://forem.com/alfiyafatima09/from-dna-to-dsa-how-i-went-from-studying-genomes-to-understanding-algorithms-238c</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I am Alfiya Fatima, a third year engineering student and a Full-Stack Developer from Bengaluru.&lt;br&gt;
Let's Connect on &lt;a href="https://www.linkedin.com/in/alfiyafatima09/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxktp8ssgvcizpvb6dvtw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxktp8ssgvcizpvb6dvtw.jpg" alt="Profile Pic" width="800" height="799"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;alfiyafatima09&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Let's Begin :)
&lt;/h2&gt;

&lt;p&gt;In 12th grade, I recall my lecturer telling, &lt;strong&gt;"The DNA in your body stretches 120 billion miles—twice the solar system's diameter!"&lt;/strong&gt; Her words sparked my curiosity about decoding DNA, though I never imagined I’d someday use algorithms to analyze genomes. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How did I find myself here, exploring the world of Data Structures and Algorithms?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I was fortunate to attend the &lt;strong&gt;ACM India Winter School 2024 on Data Structures and Algorithms for Strings&lt;/strong&gt; at Amrita Vishwa Vidyapeetham, Coimbatore. It was a remarkable experience, surrounded by brilliant peers and guided by top professors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/venkatesh-raman-8865b3b/" rel="noopener noreferrer"&gt;Prof. Venkatesh Raman&lt;/a&gt; (IMSc Chennai)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/sharma-thankachan-6304a0199/" rel="noopener noreferrer"&gt;Prof. Sharma Thankachan&lt;/a&gt; (North Carolina State University)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/chirgjain/" rel="noopener noreferrer"&gt;Prof. Chirag Jain&lt;/a&gt; (IISc Bangalore).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With my friend &lt;a href="https://www.linkedin.com/in/skysingh04/" rel="noopener noreferrer"&gt;Akash&lt;/a&gt;, I joined students from diverse backgrounds-undergraduates to PhDs-representing institutes across India. The professors’ ability to &lt;strong&gt;simplify complex concepts&lt;/strong&gt; and spark curiosity turned each session into an &lt;strong&gt;exciting journey&lt;/strong&gt; through the string algorithms.&lt;/p&gt;

&lt;p&gt;This opportunity started with a suggestion from my senior, &lt;a href="https://www.linkedin.com/in/ashupdsce/" rel="noopener noreferrer"&gt;Ashutosh Pandey&lt;/a&gt; , who encouraged us to be a part of this.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Did We Learn?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;String Matching Algorithms&lt;/li&gt;
&lt;li&gt;FFT-Based Algorithms&lt;/li&gt;
&lt;li&gt;Suffix Trees and Arrays&lt;/li&gt;
&lt;li&gt;Burrows-Wheeler Transform &amp;amp; FM Index&lt;/li&gt;
&lt;li&gt;Document Retrieval &amp;amp; Range Minimum Queries&lt;/li&gt;
&lt;li&gt;Applications in Computational Biology&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Glimpse about the classes!
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Have you ever encountered pattern matching?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern matching&lt;/strong&gt; is the process of finding a specific sequence or pattern within a larger set of data, like searching for a word in a text or a pattern in a dataset.&lt;/p&gt;

&lt;p&gt;It’s a fundamental concept with wide applications, from powering &lt;strong&gt;search engines&lt;/strong&gt; to &lt;strong&gt;solving complex problems in computational biology&lt;/strong&gt;. Whether it’s finding relevant information on the web or identifying gene sequences in vast genomic datasets, pattern matching is key.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;But what makes this so intriguing? How can we efficiently solve various challenges, such as finding all occurrences of a pattern in a text, identifying repeated patterns for genome assembly, enabling data compression, or detecting similarities in large datasets?&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;To answer these broader questions, let us first narrow down the problem: How do we find a specific pattern in a given text efficiently?&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Naive Approach
&lt;/h2&gt;

&lt;p&gt;The simplest way to find a pattern in a text is to &lt;strong&gt;compare it with every possible substring of the text&lt;/strong&gt;. This is exactly what the Naive Algorithm does.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How It Works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The algorithm compares the pattern with each substring of the text.&lt;/li&gt;
&lt;li&gt;If it finds a match, great! If not, it moves on to the next one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
Text: "&lt;code&gt;banana&lt;/code&gt;"&lt;br&gt;
Pattern: "&lt;code&gt;ana&lt;/code&gt;"&lt;/p&gt;

&lt;p&gt;The algorithm compares the pattern "ana" with each possible substring of the text:&lt;br&gt;
Compare "&lt;code&gt;ana&lt;/code&gt;" with "&lt;code&gt;ban&lt;/code&gt;" → No match.&lt;br&gt;
Compare "&lt;code&gt;ana&lt;/code&gt;" with "&lt;code&gt;ana&lt;/code&gt;" → Match found!&lt;br&gt;
Compare "&lt;code&gt;ana&lt;/code&gt;" with "&lt;code&gt;nan&lt;/code&gt;" → No match.&lt;br&gt;
Compare "&lt;code&gt;ana&lt;/code&gt;" with "&lt;code&gt;ana&lt;/code&gt;" → Match found!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is it efficient?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The time complexity of the &lt;strong&gt;Naive Algorithm is O(n × m)&lt;/strong&gt;, where:&lt;br&gt;
n is the length of the text,&lt;br&gt;
m is the length of the pattern.&lt;/p&gt;

&lt;p&gt;Now, imagine you have a genomic sequence with 10 million base pairs (n = 10^7) and a pattern of 10 bases (m = 10). How many comparisons would the Naive Algorithm perform? &lt;strong&gt;The answer: 100 million comparisons.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyuiv188gto1om01zdc30.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyuiv188gto1om01zdc30.jpg" alt="Complex" width="600" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What If There’s a Smarter Way to Match Patterns?
&lt;/h2&gt;

&lt;p&gt;Have you ever wondered if there's a way to &lt;em&gt;avoid checking positions that you already know won’t work?&lt;/em&gt; This is where the &lt;strong&gt;Knuth-Morris-Pratt&lt;/strong&gt; (KMP) Algorithm comes in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does KMP help?&lt;/strong&gt;&lt;br&gt;
KMP preprocesses the pattern to create a &lt;strong&gt;prefix table&lt;/strong&gt;, which stores information about how much of the pattern matches at different points. This helps skip unnecessary comparisons.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
Text: "&lt;code&gt;banana&lt;/code&gt;"&lt;br&gt;
Pattern: "&lt;code&gt;ana&lt;/code&gt;"&lt;br&gt;
Prefix Table: [0, 0, 1]&lt;/p&gt;

&lt;p&gt;The table helps KMP skip checks and match the pattern more efficiently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does it improve performance?&lt;/strong&gt;&lt;br&gt;
Yes! The time complexity of &lt;strong&gt;KMP is O(n + m)&lt;/strong&gt;, much faster than the Naive approach (O(n × m)). &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But is it enough for large genomic data?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7rdllrwgqtd3h0zl49kj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7rdllrwgqtd3h0zl49kj.jpg" alt="Not yet!" width="204" height="192"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Is There Even More Efficient Way?
&lt;/h2&gt;

&lt;p&gt;Ok read this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Every Substring Is a Prefix of Some Suffix of a Text T&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mr335crmnams1yu3jr2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mr335crmnams1yu3jr2.jpg" alt="Wait What Moment" width="220" height="220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At first, this may seem like a simple idea, but in pattern matching algorithms, it's a powerful insight. When searching for a pattern in a text, &lt;strong&gt;every substring is a prefix of some suffix&lt;/strong&gt;. Recognizing this allows us to optimize pattern matching, skipping unnecessary parts of the data and improving efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a Suffix Tree?
&lt;/h2&gt;

&lt;p&gt;A Suffix Tree is a compressed tree structure that represents all the suffixes of a string.&lt;/p&gt;

&lt;p&gt;For example, for the string "banana", the suffixes would be:&lt;/p&gt;

&lt;p&gt;"&lt;code&gt;banana&lt;/code&gt;"&lt;br&gt;
"&lt;code&gt;anana&lt;/code&gt;"&lt;br&gt;
"&lt;code&gt;nana&lt;/code&gt;"&lt;br&gt;
"&lt;code&gt;ana&lt;/code&gt;"&lt;br&gt;
"&lt;code&gt;na&lt;/code&gt;"&lt;br&gt;
"&lt;code&gt;a&lt;/code&gt;"&lt;/p&gt;

&lt;p&gt;Each of these &lt;strong&gt;suffixes is represented as a node in the tree&lt;/strong&gt;, and the tree is compressed, meaning that common prefixes among the suffixes are shared to save space.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpa0mij1tpqnkzco2pcnl.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpa0mij1tpqnkzco2pcnl.jpg" alt="Suffix Trees" width="800" height="1103"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Is This Important?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Efficient Pattern Matching&lt;/strong&gt;: Once the suffix tree is built in O(n) time, searching for a pattern like “ana” takes just &lt;strong&gt;O(m) time&lt;/strong&gt;, making it much faster than methods like the Naive or KMP algorithms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multiple Patterns&lt;/strong&gt;: The tree allows you to search for multiple patterns at once and find the longest repeated substring in &lt;strong&gt;O(n) time.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How Does It Scale for Large Genomes?&lt;/strong&gt;&lt;br&gt;
For large genomes, like one with 10 million base pairs, building the tree takes O(n) time, and searching for a 10-base pattern takes only &lt;strong&gt;O(m) time&lt;/strong&gt;, ensuring highly efficient searches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Didn’t you find it fascinating how we managed to reduce the complexity?&lt;/em&gt;&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;It was one of my &lt;strong&gt;favorite parts in the class&lt;/strong&gt;! Seeing how optimizing algorithms leads to massive performance improvements was truly rewarding and made the experience even more enjoyable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion:
&lt;/h2&gt;

&lt;p&gt;While some concepts were challenging, I &lt;strong&gt;embraced the uncertainty&lt;/strong&gt;, trusting that with time, everything would fall into place. The journey through these algorithms wasn’t always easy, but &lt;strong&gt;each step forward deepened my understanding&lt;/strong&gt;. I’m confident that as I continue to explore, the connections will become even clearer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Heartfelt Gratitude
&lt;/h2&gt;

&lt;p&gt;I am truly thankful to ACM India, Amrita Vishwa Vidyapeetham, and the amazing mentors who made this program possible. Their commitment to fostering an environment of learning, collaboration, and innovation has been a great source of inspiration.&lt;/p&gt;

&lt;p&gt;A special thank you to &lt;strong&gt;Dr. K. Somasundaram&lt;/strong&gt; for his humility and constant support throughout the winter school. He made sure we were comfortable and made the most of the experience.&lt;/p&gt;

&lt;p&gt;The connections I made here is something I will always cherish. Collaborating, sharing ideas, and learning from each other made this journey even more memorable.&lt;/p&gt;

&lt;p&gt;As the program came to a close, we received certificates and shared a final group photo to mark the end of this enriching experience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pez2gyrmf3edufxk3hy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pez2gyrmf3edufxk3hy.jpg" alt="Group Picture" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>beginners</category>
      <category>complexity</category>
    </item>
  </channel>
</rss>
