<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Naveen Jayaraj</title>
    <description>The latest articles on Forem by Naveen Jayaraj (@naveen_jayaraj).</description>
    <link>https://forem.com/naveen_jayaraj</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3579206%2F392a817c-1329-4c1d-a2ad-d0cb3a24152e.png</url>
      <title>Forem: Naveen Jayaraj</title>
      <link>https://forem.com/naveen_jayaraj</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/naveen_jayaraj"/>
    <language>en</language>
    <item>
      <title>Reimagining Essay Evaluation: Graphs, RoBERTa, and the Art of Fair Scoring</title>
      <dc:creator>Naveen Jayaraj</dc:creator>
      <pubDate>Wed, 22 Oct 2025 21:52:31 +0000</pubDate>
      <link>https://forem.com/naveen_jayaraj/reimagining-essay-evaluation-graphs-roberta-and-the-art-of-fair-scoring-4n0m</link>
      <guid>https://forem.com/naveen_jayaraj/reimagining-essay-evaluation-graphs-roberta-and-the-art-of-fair-scoring-4n0m</guid>
      <description>&lt;p&gt;&lt;em&gt;Modern way to evaluate a paper&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Author:&lt;/strong&gt; Naveen Jayaraj&lt;br&gt;&lt;br&gt;
📅 &lt;em&gt;October 21, 2025&lt;/em&gt; · ⏱ &lt;em&gt;5 min read&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;Table of Contents&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;The Challenge: Essays Are More Than Just Word Soup&lt;/li&gt;
&lt;li&gt;The Monarch-AES Approach&lt;/li&gt;
&lt;li&gt;Conquering the Challenges&lt;/li&gt;
&lt;li&gt;The Optimization Experiment: Monarch Butterfly Optimization&lt;/li&gt;
&lt;li&gt;Results and Evaluation&lt;/li&gt;
&lt;li&gt;Lessons and Reflections&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;The Challenge: Essays Are More Than Just Word Soup&lt;/h2&gt;

&lt;p&gt;Automated Essay Scoring (AES) is an intriguing &lt;strong&gt;Natural Language Processing (NLP)&lt;/strong&gt; challenge: scoring essays in a way that is fair, consistent, and effective.&lt;br&gt;&lt;br&gt;
The problem is not trivial. Essays are not just bags of words; they carry &lt;strong&gt;argument, structure, and coherence&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This understanding led to the creation of &lt;strong&gt;Monarch-AES&lt;/strong&gt; (also called &lt;em&gt;GraphBertAES&lt;/em&gt;), a hybrid AES model that combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The semantic power of &lt;strong&gt;RoBERTa&lt;/strong&gt;,
&lt;/li&gt;
&lt;li&gt;The structure-aware reasoning ability of &lt;strong&gt;Graph Attention Networks (GAT)&lt;/strong&gt;, and
&lt;/li&gt;
&lt;li&gt;The optimization capability of &lt;strong&gt;Monarch Butterfly Optimization (MBO)&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditional transformer-based AES models like BERT and RoBERTa perform well in understanding &lt;em&gt;what&lt;/em&gt; an essay says, but not &lt;em&gt;how&lt;/em&gt; it is structured.&lt;br&gt;&lt;br&gt;
Good writing is as much about &lt;strong&gt;organization and coherence&lt;/strong&gt; as it is about content.&lt;/p&gt;

&lt;p&gt;This inspired the question:  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Can a model understand both what an essay says and how it says it?&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;The Monarch-AES Approach&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Monarch-AES architecture&lt;/strong&gt; was designed to evaluate both &lt;strong&gt;semantics&lt;/strong&gt; (meaning) and &lt;strong&gt;structure&lt;/strong&gt; (organization) of essays.&lt;/p&gt;

&lt;h3&gt;🧠 Semantic Modeling&lt;/h3&gt;

&lt;p&gt;RoBERTa processed entire essays to generate deep contextual embeddings using the &lt;code&gt;[CLS]&lt;/code&gt; token — representing the essay’s overall meaning.&lt;/p&gt;

&lt;h3&gt;🔗 Structural Modeling&lt;/h3&gt;

&lt;p&gt;Each essay was represented as a &lt;strong&gt;graph&lt;/strong&gt;, with sentences as nodes and two kinds of edges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sequential edges&lt;/strong&gt; → between consecutive sentences to model narrative flow
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic edges&lt;/strong&gt; → between semantically similar sentences to model thematic cohesion
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This setup allowed the &lt;strong&gt;Graph Attention Network (GAT)&lt;/strong&gt; to capture how ideas relate and transition — essentially, how well-structured the essay is.&lt;/p&gt;
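&lt;p&gt;As a rough sketch of this two-edge construction (the similarity threshold and the source of the sentence embeddings are illustrative assumptions, not the project's exact settings):&lt;/p&gt;

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense sentence vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_essay_graph(sent_embeddings, sim_threshold=0.9):
    """Return (sequential_edges, semantic_edges) over sentence indices."""
    n = len(sent_embeddings)
    # Sequential edges link consecutive sentences (narrative flow).
    sequential = [(i, i + 1) for i in range(n - 1)]
    # Semantic edges link non-adjacent, highly similar sentences
    # (thematic cohesion).
    semantic = [
        (i, j)
        for i in range(n)
        for j in range(i + 2, n)
        if cosine(sent_embeddings[i], sent_embeddings[j]) > sim_threshold
    ]
    return sequential, semantic
```

&lt;p&gt;In the full model these index pairs would become the &lt;code&gt;edge_index&lt;/code&gt; tensor consumed by a GAT layer such as PyTorch Geometric's &lt;code&gt;GATConv&lt;/code&gt;.&lt;/p&gt;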

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6f0frrhy3u7dakpxpvt3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6f0frrhy3u7dakpxpvt3.png" alt="Flow chart of the project" width="800" height="1190"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Overall pipeline of Monarch-AES: semantic and structural layers working together.&lt;/em&gt;&lt;/p&gt;




&lt;h3&gt;🧩 Combined Representation&lt;/h3&gt;

&lt;p&gt;The final essay representation was created by &lt;strong&gt;concatenating&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RoBERTa’s semantic embedding
&lt;/li&gt;
&lt;li&gt;GAT’s structural embedding
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This &lt;strong&gt;fusion of meaning and structure&lt;/strong&gt; was then passed through a regression layer to produce the essay’s predicted score.&lt;/p&gt;
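&lt;p&gt;A minimal sketch of the fusion-and-regression step, with toy dimensions and hand-set weights standing in for the learned layer:&lt;/p&gt;

```python
def fuse_and_score(semantic_vec, structural_vec, weights, bias):
    # Concatenate the semantic (RoBERTa-style) and structural (GAT-style)
    # vectors, then apply a single linear regression layer.
    fused = list(semantic_vec) + list(structural_vec)
    assert len(fused) == len(weights)
    return sum(w * x for w, x in zip(weights, fused)) + bias
```

&lt;p&gt;In the trained model the weights come from backpropagation; here they only illustrate the data flow from fused representation to predicted score.&lt;/p&gt;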

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fletnbl1bbikut82onyrw.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fletnbl1bbikut82onyrw.jpeg" alt="GATConv model architecture" width="800" height="289"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Graph Attention Network (GAT) used for structural coherence modeling.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsos5h0exetkimutst35l.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsos5h0exetkimutst35l.jpg" alt="RoBERTa architecture diagram" width="700" height="433"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;RoBERTa model architecture providing contextual semantic representation.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;Conquering the Challenges&lt;/h2&gt;

&lt;p&gt;Building Monarch-AES came with several hurdles — both technical and conceptual.&lt;/p&gt;

&lt;h3&gt;⚙️ Data Handling&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The model was trained and evaluated on the &lt;strong&gt;ASAP 2.0 dataset&lt;/strong&gt;, a benchmark for AES tasks.
&lt;/li&gt;
&lt;li&gt;To focus on consistency, a single essay prompt (“The Venus Prompt”) was chosen.
&lt;/li&gt;
&lt;li&gt;Class imbalance was tackled using &lt;strong&gt;PyTorch’s WeightedRandomSampler&lt;/strong&gt; for balanced training.&lt;/li&gt;
&lt;/ul&gt;
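&lt;p&gt;One common way to derive the per-sample weights that &lt;code&gt;WeightedRandomSampler&lt;/code&gt; expects is inverse class frequency; a sketch, since the exact weighting scheme is not stated above:&lt;/p&gt;

```python
from collections import Counter

def inverse_frequency_weights(labels):
    # Weight each sample by 1 / (count of its score band), so rare
    # bands are drawn about as often as common ones during training.
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]
```

&lt;p&gt;These weights feed directly into &lt;code&gt;torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)&lt;/code&gt;.&lt;/p&gt;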

&lt;h3&gt;🔄 Graph Construction&lt;/h3&gt;

&lt;p&gt;Creating sentence-level graphs dynamically during training was initially slow.&lt;br&gt;&lt;br&gt;
The fix? &lt;strong&gt;Precompute all graphs&lt;/strong&gt; before training and store them as objects, which significantly sped up each epoch.&lt;/p&gt;
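&lt;p&gt;The caching idea in miniature (names are illustrative):&lt;/p&gt;

```python
def precompute_graphs(essays, build_graph):
    """Build each essay's graph exactly once, before training starts."""
    return {essay_id: build_graph(text) for essay_id, text in essays.items()}

def train_epochs(graph_cache, n_epochs):
    # Every epoch is now a dictionary lookup per essay; no graph is rebuilt.
    for _ in range(n_epochs):
        for essay_id in graph_cache:
            _ = graph_cache[essay_id]
```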

&lt;h3&gt;🧪 Debugging &amp;amp; Stability&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Loading RoBERTa weights through &lt;code&gt;BertModel&lt;/code&gt; caused compatibility warnings; migrating fully to &lt;code&gt;RobertaModel&lt;/code&gt; resolved them.
&lt;/li&gt;
&lt;li&gt;Routine errors (missing imports, issues with &lt;code&gt;train_test_split&lt;/code&gt;) reinforced the value of clean, reproducible code practices.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;The Optimization Experiment: Monarch Butterfly Optimization&lt;/h2&gt;

&lt;p&gt;While standard training used &lt;strong&gt;AdamW&lt;/strong&gt;, Monarch-AES also underwent an experimental phase using &lt;strong&gt;Monarch Butterfly Optimization (MBO)&lt;/strong&gt; — a &lt;strong&gt;metaheuristic inspired by butterfly migration&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Unlike gradient descent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MBO evolves a &lt;em&gt;population of solutions&lt;/em&gt; across generations.
&lt;/li&gt;
&lt;li&gt;It balances exploration and exploitation using &lt;strong&gt;Lévy flights&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;🌿 Why MBO?&lt;/h3&gt;

&lt;p&gt;MBO can escape &lt;strong&gt;local minima&lt;/strong&gt; where gradient-based optimizers such as AdamW may stall, giving broader parameter exploration in complex, high-dimensional search spaces.&lt;/p&gt;

&lt;p&gt;The experimental MBO setup required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Removing backpropagation&lt;/li&gt;
&lt;li&gt;Ranking model candidates by &lt;strong&gt;fitness (loss)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Iteratively evolving them&lt;/li&gt;
&lt;/ul&gt;
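&lt;p&gt;The steps above can be sketched on a toy objective (a sphere function stands in for the model's loss; the migration and Lévy-flight details follow the general MBO recipe rather than the project's exact implementation):&lt;/p&gt;

```python
import random

def levy_step(beta=1.5):
    # Crude heavy-tailed step in the spirit of a Levy flight:
    # mostly small moves, with occasional long jumps.
    u, v = random.gauss(0, 1), random.gauss(0, 1)
    return u / (abs(v) ** (1.0 / beta) + 1e-12)

def mbo_minimize(fitness, dim, pop_size=20, generations=50, seed=0):
    random.seed(seed)
    pop = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)             # rank candidates by fitness (loss)
        elite = pop[: pop_size // 2]      # subpopulation 1: the fitter half
        children = []
        for ind in pop[pop_size // 2 :]:  # subpopulation 2: migrate / mutate
            parent = random.choice(elite)
            child = [
                p if 0.5 > random.random() else x + 0.1 * levy_step()
                for p, x in zip(parent, ind)
            ]
            children.append(child)
        pop = elite + children            # elitism: the best candidate survives
    return min(pop, key=fitness)
```

&lt;p&gt;For Monarch-AES each "individual" would be a full model parameter vector and the fitness its validation loss, which is exactly what makes the approach computationally heavier than AdamW.&lt;/p&gt;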

&lt;p&gt;Although computationally heavier, MBO showed that &lt;strong&gt;nature-inspired algorithms&lt;/strong&gt; can successfully tune deep models in novel ways.&lt;/p&gt;




&lt;h2&gt;Results and Evaluation&lt;/h2&gt;

&lt;p&gt;Model performance was measured using key AES metrics:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;QWK&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Quadratic Weighted Kappa&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.834&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MSE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mean Squared Error&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.198&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MAE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mean Absolute Error&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.256&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These results indicate &lt;strong&gt;high agreement with human raters&lt;/strong&gt; (a QWK above 0.8 is generally considered strong) and low average deviation from the human-assigned scores.&lt;/p&gt;
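&lt;p&gt;For reference, Quadratic Weighted Kappa is one minus the ratio of weighted observed to weighted expected disagreement, with a penalty that grows quadratically with the rating gap; a compact sketch:&lt;/p&gt;

```python
def quadratic_weighted_kappa(actual, predicted, min_r, max_r):
    """QWK between two integer rating lists on the scale [min_r, max_r]."""
    n = max_r - min_r + 1  # number of rating levels (assumed at least 2)
    obs = [[0.0] * n for _ in range(n)]
    for a, p in zip(actual, predicted):
        obs[a - min_r][p - min_r] += 1
    total = len(actual)
    hist_a = [sum(row) for row in obs]
    hist_p = [sum(obs[i][j] for i in range(n)) for j in range(n)]
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = ((i - j) ** 2) / ((n - 1) ** 2)       # quadratic penalty
            expected = hist_a[i] * hist_p[j] / total  # chance agreement
            num += w * obs[i][j]
            den += w * expected
    return 1.0 - num / den
```

&lt;p&gt;Perfect agreement gives 1.0 and chance-level agreement gives 0.0, so the reported 0.834 sits well into the high-agreement range.&lt;/p&gt;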

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzav7nce7q51rg2klptg0.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzav7nce7q51rg2klptg0.jpeg" alt="Confusion matrix" width="780" height="703"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Confusion matrix showing strong alignment between predicted and actual scores.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwpx7ezrlvxmtk2oceqkt.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwpx7ezrlvxmtk2oceqkt.jpeg" alt="Actual vs predicted graph" width="624" height="703"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Predicted vs Actual score distribution — illustrating strong model reliability.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Visual analyses such as &lt;strong&gt;loss curves&lt;/strong&gt;, &lt;strong&gt;scatter plots&lt;/strong&gt;, and &lt;strong&gt;confusion matrices&lt;/strong&gt; demonstrated that Monarch-AES consistently &lt;strong&gt;outperformed transformer-only baselines&lt;/strong&gt;, achieving more &lt;em&gt;human-like evaluation&lt;/em&gt; through the blend of semantics and structure.&lt;/p&gt;




&lt;h2&gt;Lessons and Reflections&lt;/h2&gt;

&lt;p&gt;The Monarch-AES project yielded several insights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid architectures&lt;/strong&gt; combining transformers and GNNs lead to richer, more interpretable representations.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graphical essay modeling&lt;/strong&gt; revealed how thematic links and sentence transitions influence perceived writing quality.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metaheuristic optimizers&lt;/strong&gt; like MBO can explore complex search landscapes in ways gradient-based optimizers cannot, though at a higher computational cost.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficient preprocessing&lt;/strong&gt; and &lt;strong&gt;clean pipelines&lt;/strong&gt; greatly improved scalability and reproducibility.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Monarch-AES was built on a fundamental belief:  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Meaning and structure must go hand in hand — in writing and in AI.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Essays are not mere token sequences but &lt;strong&gt;structured arguments&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
By combining &lt;strong&gt;RoBERTa’s semantic power&lt;/strong&gt; with &lt;strong&gt;GAT’s structural insight&lt;/strong&gt;, the system evaluated essays like a human — understanding &lt;em&gt;what&lt;/em&gt; is said and &lt;em&gt;how&lt;/em&gt; it’s said.&lt;/p&gt;

&lt;p&gt;This work underscored the importance of &lt;strong&gt;hybrid intelligence&lt;/strong&gt; — blending architectures and ideas for deeper understanding.&lt;br&gt;&lt;br&gt;
The future of NLP lies not in choosing between semantics and structure but in &lt;strong&gt;integrating them into one unified system.&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;Tags&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;essay scoring system&lt;/code&gt; · &lt;code&gt;monarch butterfly optimization&lt;/code&gt; · &lt;code&gt;AI&lt;/code&gt; · &lt;code&gt;Machine Learning&lt;/code&gt; · &lt;code&gt;RoBERTa&lt;/code&gt; · &lt;code&gt;GNN&lt;/code&gt;&lt;/p&gt;




&lt;p&gt;© 2025 &lt;strong&gt;Naveen's Technical Revelations&lt;/strong&gt;  &lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>gnn</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
