<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Nicholas Synovic</title>
    <description>The latest articles on Forem by Nicholas Synovic (@nicholassynovic).</description>
    <link>https://forem.com/nicholassynovic</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F730471%2F2a348b4d-cfa8-45b8-8224-2aab58e918cd.jpg</url>
      <title>Forem: Nicholas Synovic</title>
      <link>https://forem.com/nicholassynovic</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/nicholassynovic"/>
    <language>en</language>
    <item>
      <title>Use Your Tokens Before You Lose Your Tokens</title>
      <dc:creator>Nicholas Synovic</dc:creator>
      <pubDate>Wed, 04 Mar 2026 03:37:10 +0000</pubDate>
      <link>https://forem.com/nicholassynovic/use-your-tokens-before-you-lose-your-tokens-kdp</link>
      <guid>https://forem.com/nicholassynovic/use-your-tokens-before-you-lose-your-tokens-kdp</guid>
      <description>&lt;p&gt;If you have the privilege of a GitHub Copilot Education license or a workplace-wide Google AI Plus subscription, I have one primary piece of advice: burn through your credits. These institutional offerings provide a unique “sandbox” where you can fail for free. My recommendation for mastering these agents is to start small but think critically.&lt;/p&gt;

&lt;p&gt;Begin with a project you already know inside and out. Take a well-documented method and ask the agent to document it from scratch. Does it capture the nuance? Does it understand the “why” behind the logic? Now, expand the scope: provide the agent with the entire class or module and repeat the task. Observe how the quality of the output shifts as you provide more project context. This exercise isn’t just about documentation; it’s about learning the “contextual threshold” of the model you are using.&lt;/p&gt;

&lt;p&gt;Once you understand the agent’s baseline, move into active validation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tooling Audit: Ask the agent to identify all the explicit and implicit configuration options within your codebase. See if it can find the “ghosts” in your architecture.&lt;/li&gt;
&lt;li&gt;Security &amp;amp; Memory Loops: Ask the agent to generate a memory-safe implementation of a function, then validate that code against a tool like &lt;code&gt;valgrind&lt;/code&gt;. If it fails, pass the valgrind error log back into the agent. Watching an agent respond to a debugger’s output is the best way to understand its ability to “reason” through technical constraints.&lt;/li&gt;
&lt;li&gt;Planning vs. Execution: Use the Plan Mode to have the agent tackle a specific GitHub Issue. Evaluate it not just on the code it writes, but on the logic of the steps it proposes.&lt;/li&gt;
&lt;/ul&gt;
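
&lt;p&gt;As a concrete sketch of that second loop (assuming &lt;code&gt;gcc&lt;/code&gt; and &lt;code&gt;valgrind&lt;/code&gt; are installed; &lt;code&gt;demo.c&lt;/code&gt; is a hypothetical stand-in for whatever file the agent generates for you):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# demo.c stands in for the agent-generated implementation
echo 'int main(void) { return 0; }' | tee demo.c

# Compile with debug symbols so valgrind can report file/line locations
gcc -g -O0 -o demo demo.c

# --error-exitcode=1 makes failures visible to scripts;
# --log-file captures the report you can paste back into the agent
valgrind --leak-check=full --error-exitcode=1 --log-file=valgrind.log ./demo
cat valgrind.log
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;If valgrind reports errors, feed &lt;code&gt;valgrind.log&lt;/code&gt; back to the agent as the next prompt and repeat until the report comes back clean.&lt;/p&gt;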

&lt;p&gt;We are at a unique juncture where LLMs trained on code are only going to become more pervasive and more capable. Use the opportunity your institution has provided to become a leader in understanding what these agents can—and cannot—do. Identify the patterns that lead to failure and the strategies that lead to success.&lt;/p&gt;

&lt;p&gt;It is a tall order to stay ahead of this curve, but as students, scientists, and engineers, we are built for this challenge. Burn the tokens, make the mistakes, and break the models now. These agents are here to stay, and the best time to learn their limitations is while someone else is picking up the tab.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is a section of a larger blog post I made on my website. Feel free to read the full post for free &lt;a href="https://nicholassynovic.github.io/blog_posts/2026-03-03.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;. Thanks!&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>beginners</category>
      <category>learning</category>
    </item>
    <item>
      <title>Sensible Chuckle: The First `git commit` Message Of The Git Version Control System</title>
      <dc:creator>Nicholas Synovic</dc:creator>
      <pubDate>Wed, 14 May 2025 15:01:38 +0000</pubDate>
      <link>https://forem.com/nicholassynovic/sensible-chuckle-the-first-git-commit-message-of-the-git-version-control-project-3aef</link>
      <guid>https://forem.com/nicholassynovic/sensible-chuckle-the-first-git-commit-message-of-the-git-version-control-project-3aef</guid>
      <description>&lt;p&gt;As part of a side project, I was interested in exploring the first &lt;code&gt;git commit&lt;/code&gt; message of the Git Version Control System project.&lt;/p&gt;

&lt;p&gt;It was made by Linus Torvalds on April 7, 2005, and it reads: &lt;/p&gt;

&lt;p&gt;&lt;code&gt;Initial revision of "git", the information manager from hell&lt;/code&gt;&lt;/p&gt;
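
&lt;p&gt;If you'd like to look this up yourself, the same commands work in any clone of &lt;a href="https://github.com/git/git" rel="noopener noreferrer"&gt;git.git&lt;/a&gt;. The sketch below demonstrates them on a throwaway repository so it runs anywhere:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Create a throwaway repository with two commits
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "first commit"
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "second commit"

# Hashes of any parentless "root" commits (git.git has several,
# since gitk and git-gui were merged in as subtrees)
git rev-list --max-parents=0 HEAD

# Subject line of the oldest commit; in a clone of git.git this
# should print Linus' message above
git log --reverse --format=%s | head -n 1
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;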

</description>
      <category>git</category>
      <category>humor</category>
      <category>watercooler</category>
    </item>
    <item>
      <title>KiSSES: Keep Static Site Examples Simple</title>
      <dc:creator>Nicholas Synovic</dc:creator>
      <pubDate>Tue, 04 Mar 2025 20:21:32 +0000</pubDate>
      <link>https://forem.com/nicholassynovic/kisses-keep-static-site-examples-simple-mh3</link>
      <guid>https://forem.com/nicholassynovic/kisses-keep-static-site-examples-simple-mh3</guid>
      <description>&lt;p&gt;I don't know about you, but every time that I check out a static site generator's example GitHub page, I'm both over and underwhelmed at the same time. On one hand, the Github page often has great technical details and depth to allow me to leverage and extend the example to fit my needs. On the other hand, feature's such as GitHub Action integration, or deploying to GitHub pages is often left to the engineer to figure out. And in some cases, the example site is not longer in line with current revisions of the tool!&lt;/p&gt;

&lt;p&gt;And look, I know that every project is different, and that your preferred static site generator probably has better documentation and examples than what I've seen. But in the projects I have seen, GitHub Pages deployment and recommended repository layouts are sidelined in favor of technical documentation. &lt;/p&gt;

&lt;p&gt;Is this good or bad? I don't know. Am I too inexperienced to work within these constraints? Maybe. But I can't be the only engineer to have faced these issues. And for projects aimed at rapidly creating websites from lightweight markup documents (e.g., Markdown, reStructuredText), I'd expect features such as starter or template GitHub repositories to be more common.&lt;/p&gt;

&lt;p&gt;Because of my frustrations, I've released two example GitHub repositories for two popular static site generators: &lt;a href="https://www.mkdocs.org/" rel="noopener noreferrer"&gt;MkDocs&lt;/a&gt; and &lt;a href="https://www.sphinx-doc.org/en/master/index.html" rel="noopener noreferrer"&gt;Sphinx&lt;/a&gt;. Each repository focuses on a minimal project that builds into a Read the Docs-theme-compatible website and provides supporting tooling for formatting the underlying markup language. Each also provides the tooling needed to deploy to GitHub Pages, both from the command line and via GitHub Actions (both powered by the &lt;a href="https://pypi.org/project/ghp-import/" rel="noopener noreferrer"&gt;&lt;code&gt;ghp-import&lt;/code&gt;&lt;/a&gt; project).&lt;/p&gt;

&lt;p&gt;Now, I understand that my examples are not going to be complete for everyone, so I'd like to open my issue boards to the community to suggest how to improve them. I think it's a real shame that better examples of minimal static sites don't exist, and projects like mine address that low-hanging fruit.&lt;/p&gt;

&lt;p&gt;MkDocs example site: &lt;a href="https://github.com/NicholasSynovic/example_mkdocs" rel="noopener noreferrer"&gt;https://github.com/NicholasSynovic/example_mkdocs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sphinx example site: &lt;a href="https://github.com/NicholasSynovic/example_sphinx" rel="noopener noreferrer"&gt;https://github.com/NicholasSynovic/example_sphinx&lt;/a&gt; &lt;/p&gt;
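
&lt;p&gt;For reference, the command-line deployment path that both repositories use looks roughly like this. This is a sketch using a fresh MkDocs project (the &lt;code&gt;demo-site&lt;/code&gt; name is hypothetical; the exact commands live in each repository):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Scaffold and build a demo MkDocs project (output goes to ./site)
pip install mkdocs ghp-import
mkdocs new demo-site
cd demo-site
mkdocs build

# Inside a repository with an 'origin' remote, publish ./site to gh-pages:
# -n adds a .nojekyll file, -p pushes the branch to origin
# ghp-import -n -p site
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;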

</description>
      <category>webdev</category>
      <category>watercooler</category>
      <category>github</category>
    </item>
    <item>
      <title>DeepSeek w/ Ollama + Open WebUI</title>
      <dc:creator>Nicholas Synovic</dc:creator>
      <pubDate>Tue, 28 Jan 2025 14:57:41 +0000</pubDate>
      <link>https://forem.com/nicholassynovic/deepseek-w-ollama-open-webui-1jih</link>
      <guid>https://forem.com/nicholassynovic/deepseek-w-ollama-open-webui-1jih</guid>
      <description>&lt;h2&gt;
  
  
  DeepSeek R1 Exists
&lt;/h2&gt;

&lt;p&gt;It's the latest exciting open-source LLM and, to my knowledge, the first open-source &lt;em&gt;reasoning&lt;/em&gt; model. While I'm unfamiliar with the intricacies of reasoning models, the gist is that these LLMs "think through" the problem before responding. In other words, as part of the output you get from your prompt, you also get the chain of thought that supports the reasoning behind the model's output. This provides context as to why the model generated its final output. &lt;/p&gt;

&lt;p&gt;To be clear, I wouldn't call these models self-explaining; at the end of the day, LLMs are still considered black boxes that generate text based on statistical and mathematical computations. Just because DeepSeek "thinks through" a problem does not mean that it is truly sentient, accurate, or correct. There is still a need for human-in-the-loop (i.e., human reviewer) style usage when leveraging these models.&lt;/p&gt;

&lt;p&gt;With the context and clarification out of the way, how can you leverage DeepSeek R1 locally? And more broadly, how do you do so with any open-source LLM?&lt;/p&gt;

&lt;h2&gt;
  
  
  Ollama
&lt;/h2&gt;

&lt;p&gt;You leverage Ollama, an open-source &lt;em&gt;inference engine&lt;/em&gt; that is designed to work with &lt;em&gt;quantized LLMs&lt;/em&gt; via the &lt;em&gt;GGUF file format&lt;/em&gt; or hosted on the &lt;em&gt;Ollama Model Hub&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz6kpwb841fh9lm2s3wpq.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz6kpwb841fh9lm2s3wpq.gif" alt="https://media3.giphy.com/media/v1.Y2lkPTc5MGI3NjExOHJuYXpkeHpvN2N1YXFqbmphZTJmaHJpcm1uM2JoNzJ2d3dtdzVzZyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/3WmWdBzqveXaE/giphy.gif" width="480" height="321"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;p&gt;In short:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An inference engine is a utility that runs machine and deep learning models efficiently by optimizing the model's underlying computational graph.

&lt;ul&gt;
&lt;li&gt;The computational graph is similar to a program's call graph (the order in which instructions are executed), but for mathematical operations&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Quantized LLMs are large language models whose computations use a reduced number of bits, representing values as lower-precision floating-point numbers or integers.

&lt;ul&gt;
&lt;li&gt;Deep learning models are often trained using 32-bit (or higher) floating-point representations to capture the nuances of the data. Reducing the bit width or changing the representation (e.g., floating point to integer) often improves the model's throughput (measured in tokens per second) at the cost of some accuracy.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;The GGUF file format is not important for this discussion, but you can learn more about it &lt;a href="https://github.com/ggerganov/ggml/blob/master/docs/gguf.md" rel="noopener noreferrer"&gt;here&lt;/a&gt;.
&lt;/li&gt;

&lt;li&gt;The &lt;a href="https://ollama.com/search" rel="noopener noreferrer"&gt;Ollama Model Hub&lt;/a&gt; hosts quantized LLMs ready for downstream consumption via the &lt;a href="https://github.com/ollama/ollama" rel="noopener noreferrer"&gt;&lt;code&gt;ollama&lt;/code&gt; command line utility&lt;/a&gt;.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Ollama provides a very simple interface to get started with using LLMs locally. Alternatives do exist (e.g., &lt;a href="https://vllm.ai" rel="noopener noreferrer"&gt;&lt;code&gt;vllm&lt;/code&gt;&lt;/a&gt;), but the tooling surrounding Ollama is extensive and well-documented, so it is my preferred choice for running LLMs locally.&lt;/p&gt;

&lt;p&gt;As Ollama is a command-line utility, it can be difficult to leverage features such as document and image reasoning, web search, retrieval-augmented generation (RAG), and multi-modal data analysis without developing your own interface. This is where GUIs such as Open WebUI fill the gap. &lt;/p&gt;

&lt;h2&gt;
  
  
  Open WebUI
&lt;/h2&gt;

&lt;p&gt;Open WebUI is a self-hostable application that communicates with Ollama via Ollama's HTTP REST API. It provides a ChatGPT-like interface that I find familiar while exposing existing ChatGPT features such as image generation, document reasoning, RAG, and web search. It also supports new features, like the ability to chain multiple models together: provide one model with a prompt, then automatically pass that model's response into a second or third LLM for post-processing! I think it's a neat project and an exemplar of the Ollama ecosystem. You can find more information about it &lt;a href="https://github.com/open-webui/open-webui" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting It All Together
&lt;/h2&gt;

&lt;p&gt;Having gone through all of this now, how can we install these tools?&lt;/p&gt;

&lt;p&gt;If you are on an M-series Mac, you should install Ollama locally and ignore all references to the Ollama Docker installation hereafter. This is because Ollama via Docker does not support GPU acceleration on M-series Macs, but the compiled binary does. You can read about it &lt;a href="https://ollama.com/blog/ollama-is-now-available-as-an-official-docker-image" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For everyone else, I recommend installing Ollama and Open WebUI with Docker Compose using the following YAML file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3.8'&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ai&lt;/span&gt;

&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ollama&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama/ollama:0.5.7&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama-network&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;11434:11434"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama:/root/.ollama&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;devices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia&lt;/span&gt;
              &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gpu&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="na"&gt;open-webui&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;open-webui&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/open-webui/open-webui:0.5.7&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;
    &lt;span class="na"&gt;extra_hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;host.docker.internal:host-gateway"&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama-network&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3000:8080"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;open-webui:/app/backend/data&lt;/span&gt;

&lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ollama-network&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;external&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ollama&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;open-webui&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copy this to a &lt;code&gt;docker-compose.yml&lt;/code&gt; file and then run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose &lt;span class="nt"&gt;--file&lt;/span&gt; ./docker-compose.yml create
docker compose &lt;span class="nt"&gt;--file&lt;/span&gt; ./docker-compose.yml start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This installs Ollama at its latest version (as of writing) with NVIDIA GPU acceleration support across all GPUs. It also installs the latest version of Open WebUI (as of writing). The Ollama HTTP REST API is exposed on port 11434 and Open WebUI is exposed on port 3000.&lt;/p&gt;

&lt;p&gt;If you don't have NVIDIA GPU support for Docker, are using a different GPU vendor, or intend to run this on CPU, see this post from &lt;a href="https://ollama.com/blog/ollama-is-now-available-as-an-official-docker-image" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Once installed, run the following command to pull DeepSeek R1 from Ollama's Model Hub:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose &lt;span class="nt"&gt;--file&lt;/span&gt; ./docker-compose.yml &lt;span class="nb"&gt;exec &lt;/span&gt;ollama ollama pull deepseek-r1:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then refresh your browser's connection to Open WebUI (via &lt;a href="http://localhost:3000" rel="noopener noreferrer"&gt;http://localhost:3000&lt;/a&gt;) and you should be able to start using DeepSeek R1 locally!&lt;/p&gt;
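
&lt;p&gt;You can also sanity-check the backend directly. This is a quick sketch, assuming the Compose stack above is running on the same machine and exposing Ollama's REST API on its default port:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Lists the models Ollama has pulled locally, as JSON;
# deepseek-r1:7b should appear after the pull above
curl -s http://localhost:11434/api/tags
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;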

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu17kp8sbp9l3qttjsdxr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu17kp8sbp9l3qttjsdxr.png" alt="DeepSeek R1 7B running on my system through Open WebUI with Ollama as the backend inference server" width="800" height="1022"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: All computers are different and unique snowflakes. For context, I have deployed this on a system running Pop!_OS with an NVIDIA 3060 GPU. While I've done my best to make the deployment of Ollama, Open WebUI, and DeepSeek R1 repeatable and reproducible, your system might need additional tinkering to work right. Please see both the Ollama and Open WebUI documentation and GitHub Issue boards for support.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
      <category>docker</category>
      <category>opensource</category>
      <category>watercooler</category>
    </item>
    <item>
      <title>Polyglot: Lua (Part 1)</title>
      <dc:creator>Nicholas Synovic</dc:creator>
      <pubDate>Mon, 13 Jan 2025 00:06:28 +0000</pubDate>
      <link>https://forem.com/nicholassynovic/polyglot-lua-part-1-opk</link>
      <guid>https://forem.com/nicholassynovic/polyglot-lua-part-1-opk</guid>
      <description>&lt;p&gt;In my previous post, I talked about the reasons why I want to learn more programming language, the Lua programming language, and the developer tooling for Lua. Now it's time to actually code in Lua!&lt;/p&gt;

&lt;p&gt;For this post, I'll be completing several basic Rosetta Code tasks. Nothing crazy, but enough to get me familiar with the language and its syntax. As Lua has a fairly minimal and straightforward syntax, I'll post the code snippets and output here, but I won't explain the implementation. For the complete source code, you can see my GitHub repository &lt;a href="https://github.com/NicholasSynovic/example_lua" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  GitHub Template
&lt;/h2&gt;

&lt;p&gt;I created a GitHub Template to bootstrap my Lua projects going forward. You can find it &lt;a href="https://github.com/NicholasSynovic/template_lua" rel="noopener noreferrer"&gt;here&lt;/a&gt;. As I find tooling to improve my Lua experience, I'll update the template.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rosetta Code Problems
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://rosettacode.org/wiki/Arithmetic/Integer" rel="noopener noreferrer"&gt;Integer Arithmetic&lt;/a&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Outcome: Taught me how to take in user input and declare functions&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;difference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;product&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;int_quotient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="c1"&gt;-- Rounds to negative infinity&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;remainder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;exponentiation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;^&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nb"&gt;io.write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"First number: "&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;-- Use with io.read for single line input&lt;/span&gt;
    &lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;io.read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"n"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;-- Captures user input&lt;/span&gt;

    &lt;span class="nb"&gt;io.write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Second number: "&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;io.read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"n"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"==="&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Sum: "&lt;/span&gt; &lt;span class="o"&gt;..&lt;/span&gt; &lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="c1"&gt;-- ".." syntax used to concatenate&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Difference: "&lt;/span&gt; &lt;span class="o"&gt;..&lt;/span&gt; &lt;span class="n"&gt;difference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Product: "&lt;/span&gt; &lt;span class="o"&gt;..&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s2"&gt;"Integer Quotient (rounds to negative infinity): "&lt;/span&gt; &lt;span class="o"&gt;..&lt;/span&gt; &lt;span class="n"&gt;int_quotient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Remainder"&lt;/span&gt; &lt;span class="o"&gt;..&lt;/span&gt; &lt;span class="n"&gt;remainder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Exponentiation: "&lt;/span&gt; &lt;span class="o"&gt;..&lt;/span&gt; &lt;span class="n"&gt;exponentiation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;a href="https://rosettacode.org/wiki/Compare_length_of_two_strings" rel="noopener noreferrer"&gt;String Length Comparison&lt;/a&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Outcome: Learned that all objects (including arrays) are tables, how to sort tables, and how to index over them with a &lt;code&gt;for&lt;/code&gt; loop&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nb"&gt;io.write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"First string: "&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;io.read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"l"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nb"&gt;io.write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Second string: "&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;io.read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"l"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nb"&gt;io.write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Third string: "&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;io.read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"l"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"==="&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;strings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="c1"&gt;-- Loads strings into an array (implemented as a table)&lt;/span&gt;

    &lt;span class="nb"&gt;table.sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;#&lt;/span&gt;&lt;span class="n"&gt;foo&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;#&lt;/span&gt;&lt;span class="n"&gt;bar&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;-- Sort array based on string length&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;ipairs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
        &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;#&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;-- Print string size then string content&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;Lua wasn't that hard to get a basic grasp of. Granted, I did not cover aspects such as loops, control flow, or binary operations, but reading the &lt;a href="https://www.lua.org/manual/5.4/" rel="noopener noreferrer"&gt;manual&lt;/a&gt; and &lt;a href="https://www.lua.org/pil/contents.html" rel="noopener noreferrer"&gt;book&lt;/a&gt; provided enough context for me to grasp the core concepts.&lt;/p&gt;

&lt;p&gt;I'd like to thank the Rosetta Code community for their problems and solutions. Without them, it would have been far more difficult for me to understand these core language features.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>beginners</category>
      <category>lua</category>
      <category>learning</category>
    </item>
    <item>
      <title>Polyglot: Lua (Part 0)</title>
      <dc:creator>Nicholas Synovic</dc:creator>
      <pubDate>Sun, 12 Jan 2025 21:39:05 +0000</pubDate>
      <link>https://forem.com/nicholassynovic/polyglot-lua-part-0-3ppg</link>
      <guid>https://forem.com/nicholassynovic/polyglot-lua-part-0-3ppg</guid>
      <description>&lt;p&gt;I've been interested in expanding my toolkit of programming languages for some time now. I would currently say that I am proficient in Java, C, and C++ and have expertise in Python. But this clearly isn't the full range of programming languages or experiences out there. For example, I have very little knowledge of functional or embedded languages.&lt;/p&gt;

&lt;p&gt;To encourage me to write more posts, I'm going to start documenting my experience learning different programming languages and the projects that I write with them. To start this series, I will begin with the Lua scripting language.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Lua?
&lt;/h2&gt;

&lt;p&gt;Lua is an &lt;a href="https://www.lua.org/about.html" rel="noopener noreferrer"&gt;"efficient, lightweight, embeddable scripting language"&lt;/a&gt; in active development since 1993. It claims to be fast, but most importantly the interpreter is very small: only 552 KB for the latest (5.4.7) binary.&lt;/p&gt;

&lt;p&gt;Personally, this doesn't matter a whole lot to me. Binary size and speed matter less than whether I can glean a new technique or experience from using the language. But I also don't want to waste time learning a dead language. So every language that I learn needs to meet the following criteria:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Must have a package manager&lt;/li&gt;
&lt;li&gt;Must have a way to test code&lt;/li&gt;
&lt;li&gt;Must have development tooling (e.g., LSP support, code formatting, linting)&lt;/li&gt;
&lt;li&gt;(Optional) Should support static typing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lua supports most of this, primarily through community packages. &lt;a href="https://luarocks.org/" rel="noopener noreferrer"&gt;&lt;code&gt;luarocks&lt;/code&gt;&lt;/a&gt; is the Lua package manager. Lua does not ship with a unit testing framework by default, but the community seems to have settled on &lt;a href="https://luarocks.org/modules/bluebird75/luaunit" rel="noopener noreferrer"&gt;&lt;code&gt;luaunit&lt;/code&gt;&lt;/a&gt; as the de facto testing library. LSP and linting support are provided by the &lt;a href="https://luals.github.io/" rel="noopener noreferrer"&gt;&lt;code&gt;lua-language-server&lt;/code&gt;&lt;/a&gt;, and code formatting is handled by &lt;a href="https://github.com/JohnnyMorganz/StyLua" rel="noopener noreferrer"&gt;&lt;code&gt;stylua&lt;/code&gt;&lt;/a&gt;. However, I can't find tooling similar to Python's &lt;a href="https://github.com/PyCQA/bandit" rel="noopener noreferrer"&gt;&lt;code&gt;bandit&lt;/code&gt;&lt;/a&gt; for performing security audits. I believe this to be an open area of Lua library development.&lt;/p&gt;

&lt;p&gt;Lua does not support static typing. But given Lua's minimal set of keywords and language features, the community has built alternative interpreters and languages that compile to Lua and add static typing. &lt;a href="https://github.com/andremm/typedlua" rel="noopener noreferrer"&gt;&lt;code&gt;typedlua&lt;/code&gt;&lt;/a&gt; seemed promising, as it layers a type system on top of Lua (like TypeScript), but it hasn't received a commit in five years. &lt;a href="https://github.com/dibyendumajumdar/ravi" rel="noopener noreferrer"&gt;&lt;code&gt;ravi&lt;/code&gt;&lt;/a&gt; also seemed promising, but it leverages a modified Lua VM, which breaks compatibility with some Lua libraries. I would prefer the TypeScript-like approach, as it does not break compatibility with existing libraries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Learning Lua
&lt;/h2&gt;

&lt;p&gt;... Will have to wait for the next post. Compiling all of my sources for this post took longer than expected. As a sneak peek, I intend to release a GitHub Lua template following my &lt;a href="https://github.com/NicholasSynovic?tab=repositories&amp;amp;q=template&amp;amp;type=&amp;amp;language=&amp;amp;sort=" rel="noopener noreferrer"&gt;other templates&lt;/a&gt;, and another repo focused on solving code kata from &lt;a href="https://rosettacode.org/wiki/Rosetta_Code" rel="noopener noreferrer"&gt;Rosetta Code&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>beginners</category>
      <category>lua</category>
      <category>learning</category>
    </item>
    <item>
      <title>GitHub Templates on templates on templates</title>
      <dc:creator>Nicholas Synovic</dc:creator>
      <pubDate>Thu, 02 Jan 2025 02:20:30 +0000</pubDate>
      <link>https://forem.com/nicholassynovic/github-templates-on-templates-on-templates-84k</link>
      <guid>https://forem.com/nicholassynovic/github-templates-on-templates-on-templates-84k</guid>
      <description>&lt;p&gt;Did you know that you can create a new repository from an already existing repository on GitHub? This allows you to inherit both the history and contents of the repository. But what if you want the contents?&lt;/p&gt;

&lt;p&gt;GitHub allows you to create template repositories: repositories whose contents, but not histories, are inherited when creating a new repository from them. This simple feature is compelling for bootstrapping new projects. It allows you to define a generic repository with all your config files ready to go, rather than copying and committing them after instantiation. Furthermore, depending on how you architect your templates, you can have templates that inherit from other templates.&lt;/p&gt;

&lt;p&gt;I've found this particularly useful when creating &lt;em&gt;per language&lt;/em&gt; templates. I have a generic repository that contains my GitHub-specific files, generic tooling config files, and other supporting documents. Each programming-language template then inherits from that generic template. Finally, each project inherits from the language template most relevant to it.&lt;/p&gt;

&lt;p&gt;I have found this to be an extreme time saver in my day-to-day work and personal projects. For an example template repository, you can see &lt;a href="https://github.com/NicholasSynovic/template_base" rel="noopener noreferrer"&gt;my generic template&lt;/a&gt; and &lt;a href="https://github.com/NicholasSynovic/template_python" rel="noopener noreferrer"&gt;my Python template&lt;/a&gt; repositories.&lt;/p&gt;

</description>
      <category>github</category>
    </item>
    <item>
      <title>Introducing acolor: A small utility to print ANSI color codes</title>
      <dc:creator>Nicholas Synovic</dc:creator>
      <pubDate>Tue, 31 Dec 2024 17:26:23 +0000</pubDate>
      <link>https://forem.com/nicholassynovic/introducing-acolor-a-small-utility-to-print-ansi-color-codes-1b03</link>
      <guid>https://forem.com/nicholassynovic/introducing-acolor-a-small-utility-to-print-ansi-color-codes-1b03</guid>
      <description>&lt;p&gt;In my previous post, I wrote about a tool I wanted to create to print ANSI color codes to the console. I currently need a this as I am "prettifying" my shell prompt at the moment and figured it would just be faster to leverage this tool over Googling the necessary shell codes.&lt;/p&gt;

&lt;p&gt;So I created &lt;code&gt;acolor&lt;/code&gt;, an open-source Python utility built on top of &lt;code&gt;colorist&lt;/code&gt; to provide a convenient way to output ANSI color codes to the terminal. Currently, only named color codes are supported (e.g., red, green, blue). Hex, HSL, VGA, and RGB color codes are not yet supported, but &lt;code&gt;acolor&lt;/code&gt; can easily be extended to include them.&lt;/p&gt;

&lt;p&gt;You can view the source code &lt;a href="https://github.com/NicholasSynovic/acolor" rel="noopener noreferrer"&gt;here&lt;/a&gt;. You can install it with &lt;code&gt;pipx&lt;/code&gt; via:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pipx install git+https://github.com/NicholasSynovic/acolor&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Here are the current command line options of the application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;acolor &lt;span class="nt"&gt;--help&lt;/span&gt;

Usage: acolor &lt;span class="o"&gt;[&lt;/span&gt;OPTIONS]

Options:
  &lt;span class="nt"&gt;-c&lt;/span&gt;, &lt;span class="nt"&gt;--color&lt;/span&gt; TEXT  Color name to generate ANSI code
  &lt;span class="nt"&gt;-r&lt;/span&gt;, &lt;span class="nt"&gt;--reset&lt;/span&gt;       Print ANSI reset code
  &lt;span class="nt"&gt;--help&lt;/span&gt;            Show this message and exit.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's an example usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;acolor &lt;span class="nt"&gt;--color&lt;/span&gt; red
&lt;span class="s1"&gt;'\x1b[31m'&lt;/span&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;acolor &lt;span class="nt"&gt;--reset&lt;/span&gt;
&lt;span class="s1"&gt;'\x1b[0m'&lt;/span&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;acolor &lt;span class="nt"&gt;--color&lt;/span&gt; &lt;span class="nb"&gt;test
test &lt;/span&gt;is not a valid color: dict_keys&lt;span class="o"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'BLACK'&lt;/span&gt;, &lt;span class="s1"&gt;'RED'&lt;/span&gt;, &lt;span class="s1"&gt;'GREEN'&lt;/span&gt;, &lt;span class="s1"&gt;'YELLOW'&lt;/span&gt;, &lt;span class="s1"&gt;'BLUE'&lt;/span&gt;, &lt;span class="s1"&gt;'MAGENTA'&lt;/span&gt;, &lt;span class="s1"&gt;'CYAN'&lt;/span&gt;, &lt;span class="s1"&gt;'WHITE'&lt;/span&gt;&lt;span class="o"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
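&lt;p&gt;Since my goal here is prompt prettification, the emitted codes can go straight into shell output. A minimal sketch with the codes hard-coded from the output above (no &lt;code&gt;acolor&lt;/code&gt; required):&lt;/p&gt;

```shell
# Capture the same ANSI codes acolor prints for red and reset.
RED="$(printf '\033[31m')"
RESET="$(printf '\033[0m')"

# Wrap text in the color code and reset afterwards.
printf '%sThis text is red%s\n' "$RED" "$RESET"
```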



</description>
      <category>linux</category>
      <category>python</category>
      <category>tooling</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Install Tailscale With Ansible</title>
      <dc:creator>Nicholas Synovic</dc:creator>
      <pubDate>Sat, 28 Dec 2024 22:59:47 +0000</pubDate>
      <link>https://forem.com/nicholassynovic/install-tailscale-with-ansible-3962</link>
      <guid>https://forem.com/nicholassynovic/install-tailscale-with-ansible-3962</guid>
      <description>&lt;p&gt;I recently found out about &lt;a href="https://tailscale.com/" rel="noopener noreferrer"&gt;Tailscale&lt;/a&gt; from the &lt;a href="https://www.youtube.com/watch?v=UyczOQTx5Gg" rel="noopener noreferrer"&gt;Level1Tech's interview with its founder&lt;/a&gt;. After trying it out, I can say that I am more than satisfied with its performance, ease of use, and ability to network all of my devices together across different intranets. &lt;/p&gt;

&lt;p&gt;As someone who prefers to configure their computer using infrastructure-as-code (IaC) practices, I decided to write an Ansible play for installing Tailscale. The following is the play that I created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install Tailscale&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myhosts&lt;/span&gt;
  &lt;span class="na"&gt;become&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Download Tailscale GPG Key&lt;/span&gt;
      &lt;span class="na"&gt;ansible.builtin.uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;dest&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/usr/share/keyrings/tailscale-archive-keyring.gpg&lt;/span&gt;
        &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://pkgs.tailscale.com/stable/ubuntu/jammy.noarmor.gpg&lt;/span&gt; 

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Add Tailscale repository&lt;/span&gt;
      &lt;span class="na"&gt;ansible.builtin.uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;dest&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/apt/sources.list.d/tailscale.list&lt;/span&gt;
        &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://pkgs.tailscale.com/stable/ubuntu/jammy.tailscale-keyring.list&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install Tailscale&lt;/span&gt;
      &lt;span class="na"&gt;ansible.builtin.apt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tailscale&lt;/span&gt;
        &lt;span class="na"&gt;update_cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;state&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;present&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This play is my attempt at a direct translation from the &lt;a href="https://tailscale.com/download/linux" rel="noopener noreferrer"&gt;Tailscale download instructions&lt;/a&gt;. For those who are more familiar with Ansible, let me know how I can improve upon this play.&lt;/p&gt;

&lt;p&gt;Thanks!&lt;/p&gt;

</description>
      <category>linux</category>
      <category>ansible</category>
      <category>network</category>
      <category>watercooler</category>
    </item>
    <item>
      <title>Back To Basics: git</title>
      <dc:creator>Nicholas Synovic</dc:creator>
      <pubDate>Sat, 28 Dec 2024 01:44:45 +0000</pubDate>
      <link>https://forem.com/nicholassynovic/back-to-basics-git-478m</link>
      <guid>https://forem.com/nicholassynovic/back-to-basics-git-478m</guid>
      <description>&lt;h2&gt;
  
  
  Not to brag, but...
&lt;/h2&gt;

&lt;p&gt;I can use &lt;code&gt;git&lt;/code&gt; (like everyone else). I've been using &lt;code&gt;git&lt;/code&gt; since ~2016, and it's been my primary VCS tooling throughout university. I've also &lt;a href="https://arxiv.org/abs/2207.11767" rel="noopener noreferrer"&gt;published research&lt;/a&gt; that leverages &lt;code&gt;git&lt;/code&gt; and GitHub to derive project insights. I've been very fond of the technology, but I have come to realize that I'm not adequately leveraging either tool, and thus hindering my progress.&lt;/p&gt;

&lt;p&gt;So for today's post, I want to optimize my &lt;code&gt;git&lt;/code&gt; config to maximize my productivity when using the tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mr. Worldwide
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;git&lt;/code&gt; can be configured globally, system-wide, or on a per-project basis. When configured globally, it affects every project for that particular user. For me, this is the preferred option, as I'm often working on solo projects.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;git&lt;/code&gt; documentation for &lt;code&gt;git config&lt;/code&gt; can be viewed &lt;a href="https://git-scm.com/docs/git-config#Documentation/git-config.txt-alias" rel="noopener noreferrer"&gt;here&lt;/a&gt;. If you are following along, this is the file stored at &lt;code&gt;~/.gitconfig&lt;/code&gt;. Each option can be configured with &lt;code&gt;git config --global --add KEY VALUE&lt;/code&gt;, but I'll be displaying the output from the file itself. To start, we'll configure &lt;code&gt;git blame&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  "It's Your Fault!"
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;git blame&lt;/code&gt; reports who contributed each line in a given file. This is particularly useful when identifying who contributed a specific feature, introduced a bug, or maliciously tampered with a file. There isn't much to configure here, but I will turn on repeated-line coloring (for repeated lines contributed in a commit), use the UNIX epoch as the date format, and report author email addresses instead of names.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[blame]
    coloring = repeatedLines
    date = unix
    showEmail = true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
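&lt;p&gt;The same options can be set from the command line rather than by editing the file. A sketch that uses a throwaway &lt;code&gt;HOME&lt;/code&gt; so the real &lt;code&gt;~/.gitconfig&lt;/code&gt; is untouched (drop the override to apply the settings for real):&lt;/p&gt;

```shell
# Point HOME at a scratch directory so --global writes go to a temp file.
export HOME="$(mktemp -d)"

git config --global blame.coloring repeatedLines
git config --global blame.date unix
git config --global blame.showEmail true

# Confirm the settings landed (keys are printed lowercased).
git config --global --list | grep '^blame'
```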



&lt;h2&gt;
  
  
  Color Makes It Cooler
&lt;/h2&gt;

&lt;p&gt;I typically work in terminals that support ANSI color codes, so anytime that I can add a splash of color to my development experience is pleasant. I've made &lt;code&gt;git&lt;/code&gt; output most of its UI in color if possible using the &lt;code&gt;ui.color&lt;/code&gt; config option set to  &lt;code&gt;auto&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[color]
    ui = auto
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  All My Ducks In A Column
&lt;/h2&gt;

&lt;p&gt;Some of &lt;code&gt;git&lt;/code&gt;'s commands can format their output in columns. The documentation doesn't spell out exactly which commands support this, but listing commands such as &lt;code&gt;git branch&lt;/code&gt; and &lt;code&gt;git tag&lt;/code&gt; do, and since I like standardized output I'm going to set it to always be on.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[column]
    ui = always
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
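&lt;p&gt;A quick way to see the effect before committing to the setting is a one-off &lt;code&gt;-c&lt;/code&gt; override in a scratch repository; a sketch (the scratch repo and branch names are made up for the demo):&lt;/p&gt;

```shell
# Build a scratch repo with a few branches to list.
cd "$(mktemp -d)"
git init -q .
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "init"
git branch alpha && git branch beta && git branch gamma

# column.ui=always columnizes listing commands such as git branch.
git -c column.ui=always branch
```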



&lt;h2&gt;
  
  
  Signing Off
&lt;/h2&gt;

&lt;p&gt;I wrote a Dev.to post on why &lt;a href="https://dev.to/nicholassynovic/why-sign-commits-1nlb"&gt;you should sign your commits with GPG&lt;/a&gt;, and I still stand by that post today. While tedious to set up and maintain across workstations, it does provide a layer of collaborator authentication.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[commit]
    gpgSign = true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
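&lt;p&gt;One caveat worth knowing: with signing forced on globally, a machine without your key can still commit by overriding the option for a single invocation. A sketch in a scratch repository:&lt;/p&gt;

```shell
# Scratch repo with commit.gpgSign enabled locally.
cd "$(mktemp -d)"
git init -q .
git config user.name "Demo" && git config user.email "demo@example.com"
git config commit.gpgSign true

# --no-gpg-sign skips signing for just this commit (no key required).
git commit -q --allow-empty --no-gpg-sign -m "unsigned commit"
git log --oneline
```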



&lt;h2&gt;
  
  
  Speed Demon
&lt;/h2&gt;

&lt;p&gt;Some of the work that I do involves assessing the quality of software repositories longitudinally. Thus, I'm often checking out many commits sequentially in a &lt;code&gt;git&lt;/code&gt; repository. Therefore, when I heard about the &lt;code&gt;core.fsmonitor&lt;/code&gt; config option, I was ecstatic. This option, "can speed up Git commands that need to refresh the Git index (e.g. git status) in a working directory with many files. The built-in monitor eliminates the need to install and maintain an external third-party tool" (&lt;a href="https://git-scm.com/docs/git-config#Documentation/git-config.txt-corefsmonitor" rel="noopener noreferrer"&gt;Source&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;In my testing, checking out 500 commits sequentially from the &lt;a href="https://github.com/numpy/numpy" rel="noopener noreferrer"&gt;&lt;code&gt;numpy&lt;/code&gt; repository&lt;/a&gt; took 13.8 seconds on average (across 10 runs) with this feature disabled, and 11.2 seconds on average with it enabled. Not an astounding difference, but if &lt;code&gt;core.fsmonitor&lt;/code&gt; can save me 2.6 seconds per 500 commits, then on a project with 37,775 commits that adds up to a projected savings of roughly 196 seconds, or about 3 minutes and 16 seconds! More testing is needed to confirm that this saving scales linearly, but for now I will keep it on and use version 1 of the hook.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[core]
        fsmonitor = true
        fsmonitorHookVersion = 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
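&lt;p&gt;The projection is straightforward linear scaling of the measured difference; a quick sketch of the arithmetic:&lt;/p&gt;

```shell
# 13.8 s - 11.2 s saved per 500 checkouts, scaled to numpy's 37,775 commits.
awk 'BEGIN { printf "%.1f seconds\n", (13.8 - 11.2) / 500 * 37775 }'
# prints "196.4 seconds"
```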



&lt;h2&gt;
  
  
  Core Defaults
&lt;/h2&gt;

&lt;p&gt;In addition to the &lt;code&gt;fsmonitor&lt;/code&gt; config, I also leverage &lt;code&gt;nvim&lt;/code&gt; and &lt;code&gt;less&lt;/code&gt; as my editor and pager of choice.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[core]
        fsmonitor = true
        fsmonitorHookVersion = 1
        editor = nvim
        pager = less
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Optimizing Nodes And Edges
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;git&lt;/code&gt; can store a graph of how every commit relates to one another in a commit-graph file, which speeds up commit-walking operations such as &lt;code&gt;git log --graph&lt;/code&gt;. However, this file normally has to be written manually with &lt;code&gt;git commit-graph write&lt;/code&gt;. We can automate some of this by writing the commit graph every time &lt;code&gt;git fetch&lt;/code&gt; is called.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[fetch]
    writeCommitGraph = true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Don't Forget About The User!
&lt;/h2&gt;

&lt;p&gt;Finally, I'll configure my name and email for &lt;code&gt;git&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[user]
    name = Nicholas M. Synovic
    email = ***
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;I know that I've skipped over many different configuration options that &lt;code&gt;git&lt;/code&gt; has to offer. So consider this post and my config a jumping off point that you can extend.&lt;/p&gt;

&lt;p&gt;My full config is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[blame]
        coloring = repeatedLines
        date = unix
        showEmail = true
[color]
        ui = auto
[column]
        ui = always
[commit]
        gpgSign = true
[core]
        fsmonitor = true
        fsmonitorHookVersion = 1
        editor = nvim
        pager = less
[fetch]
        writeCommitGraph = true
[user]
        name = Nicholas M. Synovic
        email = ***
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>git</category>
      <category>github</category>
      <category>tooling</category>
      <category>linux</category>
    </item>
    <item>
      <title>Submitting GPU jobs to Slurm @ Loyola University Chicago</title>
      <dc:creator>Nicholas Synovic</dc:creator>
      <pubDate>Sun, 08 Dec 2024 01:06:03 +0000</pubDate>
      <link>https://forem.com/nicholassynovic/submitting-gpu-jobs-to-slurm-loyola-university-chicago-41pd</link>
      <guid>https://forem.com/nicholassynovic/submitting-gpu-jobs-to-slurm-loyola-university-chicago-41pd</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Slurm logo taken from [0]&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Context
&lt;/h2&gt;

&lt;p&gt;The Computer Science department at Loyola University Chicago [1] utilizes a high-performance computing cluster to support both research and teaching initiatives, with particular interest in running HPC and AI applications efficiently and effectively. As our department's needs have grown and changed over time, it has become clear that we require a more structured approach to allocating computational resources to individual projects. Our current method of running scripts as background processes, thereby sharing resources across all users, has limitations and does not scale effectively for ongoing projects. Specifically, our reliance on shared resources often results in computational bottlenecks, as multiple concurrent jobs compete for limited system resources.&lt;/p&gt;

&lt;p&gt;To effectively manage the execution of jobs, resource allocation, and job order within our department, we are exploring the use of a job scheduler [2] as a solution. Specifically, Slurm [3] is currently under consideration due to its ability to meet our computational needs. However, given that not all members of the department have experience with job scheduling, and no formal training process currently exists, this blog post aims to serve as a brief, informal introduction to the technology and its applications, providing a foundation for readers to pursue further research into the applications of job schedulers and their benefits.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Slurm?
&lt;/h2&gt;

&lt;p&gt;Slurm is an open-source job scheduler software package that enables efficient management of workload execution on shared computing resources, such as cluster computers [3]. A job scheduler like Slurm manages the order in which programs or applications (referred to as "jobs") are executed on these resources. By queuing jobs and allocating access to computational resources on a managed basis (e.g., first-in-first-out, last-in-first-out, or when specific hardware becomes available), Slurm ensures that each job has exclusive access to the required resources, preventing conflicts between multiple concurrent processes.&lt;/p&gt;

&lt;p&gt;More information about Slurm can be found here [3]. The user guide and documentation for Slurm can be found here [4]. Slurm's source code is available on GitHub here [5].  &lt;/p&gt;

&lt;h2&gt;
  
  
  Problem
&lt;/h2&gt;

&lt;p&gt;Our department aims to integrate AI methods into our research and teaching programs, with a current focus on batch inferencing, training, and fine-tuning large language models (LLMs). To achieve this, we require access to significant GPU resources. However, our current setup limits individual users from fully utilizing the available GPUs for these computationally intensive tasks, as multiple users are often competing for simultaneous access to the same or related resources.&lt;/p&gt;

&lt;p&gt;Our cluster computer has the necessary hardware and software infrastructure to execute AI and HPC codes efficiently. However, due to shared resource allocation among multiple users, these codes often take longer than expected to complete. In some cases, they may even stall or be terminated by the system, as it prioritizes freeing resources for other users over allowing a single task to run for an extended period.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution
&lt;/h2&gt;

&lt;p&gt;With Slurm, we can utilize a set of fully available computational resources to schedule jobs efficiently. Users can configure their jobs to take advantage of specific resources and allocate a specified number of each resource as needed. Additionally, if a job does not require exclusive access to system capabilities, Slurm enables parallel execution by running multiple jobs simultaneously on separate hardware units.&lt;/p&gt;

&lt;p&gt;The rest of this post is a tutorial that provides a step-by-step guide on how to submit jobs to Slurm. We'll use a real-world example: training a simple convolutional neural network (CNN) on the MNIST dataset using TensorFlow [6] and Keras [7] in Python.&lt;/p&gt;

&lt;p&gt;While our focus is on using Slurm, it's essential to write your code with concurrency and parallelism in mind. This means designing your program to take advantage of multiple computational resources simultaneously. If your code isn't optimized for concurrent execution, scaling its performance will be challenging. We assume prior knowledge of writing high-performance computing (HPC) codes and focus on using Slurm to manage and execute them efficiently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: The code used in this tutorial is not optimized for running on multiple GPUs by default and will not scale with additional resources. To scale it, you'll need to extend the code to support a multi-GPU, distributed training strategy as outlined in [8].&lt;/p&gt;

&lt;h3&gt;
  
  
  Tutorial
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;This tutorial will guide you through submitting jobs to Slurm in a series of easy-to-follow steps. Important notes and considerations will be highlighted in block quotes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Connect to the cluster computer.&lt;/li&gt;
&lt;li&gt;Clone your code from GitHub to a directory on the cluster computer.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;You are using &lt;code&gt;git&lt;/code&gt; and GitHub to keep track of versions, right?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Configure, build, and test your software.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Here is where the tutorial begins. I will be using code provided by the TensorFlow team for training a CNN model on the MNIST dataset [9].&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Create a &lt;code&gt;bash&lt;/code&gt; script called &lt;code&gt;job.bash&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;touch job.bash&lt;/code&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You can name this file whatever you want, but it must be a &lt;code&gt;bash&lt;/code&gt; script.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Add the following code to &lt;code&gt;job.bash&lt;/code&gt;:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="c"&gt;#SBATCH --gres=gpu:1&lt;/span&gt;

module load python/3.10

srun python train.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Here's what the code is doing line-by-line:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;#!/bin/bash&lt;/code&gt;: shebang to inform the operating system what interpreter to use&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;#SBATCH --gres=gpu:1&lt;/code&gt;: This defines an &lt;code&gt;sbatch&lt;/code&gt; directive to set Slurm to use a general resource (&lt;code&gt;--gres&lt;/code&gt;) of a single GPU (&lt;code&gt;gpu:1&lt;/code&gt;). If multiple GPUs are required, you would replace 1 with the number of GPUs needed.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;module load python/3.10&lt;/code&gt;: Configure the user environment to use &lt;code&gt;python3.10&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;srun python train.py&lt;/code&gt;: Launch the training command (&lt;code&gt;python train.py&lt;/code&gt;) as a job step within the resources allocated by the aforementioned directives (see 2).&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;
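&lt;p&gt;Beyond &lt;code&gt;--gres&lt;/code&gt;, &lt;code&gt;sbatch&lt;/code&gt; accepts many other directives for shaping a job. The version of &lt;code&gt;job.bash&lt;/code&gt; below adds a few common ones; the values are illustrative placeholders, so check your cluster's limits and the &lt;code&gt;sbatch&lt;/code&gt; documentation [10] before reusing them.&lt;/p&gt;

```shell
#!/bin/bash

#SBATCH --job-name=mnist-cnn   # readable name in squeue output
#SBATCH --gres=gpu:1           # one GPU, as before
#SBATCH --cpus-per-task=4      # CPU cores for the input pipeline
#SBATCH --mem=16G              # RAM for the job
#SBATCH --time=01:00:00        # wall-clock limit (HH:MM:SS)
#SBATCH --output=slurm-%j.out  # %j expands to the job ID

module load python/3.10

srun python train.py
```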

&lt;ul&gt;
&lt;li&gt;Run &lt;code&gt;sbatch job.bash&lt;/code&gt; to queue the job.&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;squeue&lt;/code&gt; to see the queued Slurm jobs.&lt;/li&gt;
&lt;li&gt;Wait for the job to execute. A &lt;code&gt;slurm-&amp;lt;JOB_NUMBER&amp;gt;.out&lt;/code&gt; file will be created with any standard output or error piped into it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;And that's it! You now have a basic understanding of how to use Slurm for running GPU-related jobs. For a comprehensive guide on directives, configuration options, and more advanced usage, please refer to the official Slurm documentation [10-12].&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;[0] &lt;a href="https://slurm.schedmd.com/slurm_logo.png" rel="noopener noreferrer"&gt;https://slurm.schedmd.com/slurm_logo.png&lt;/a&gt;&lt;br&gt;
[1] &lt;a href="https://www.luc.edu/cs/" rel="noopener noreferrer"&gt;https://www.luc.edu/cs/&lt;/a&gt;&lt;br&gt;
[2] &lt;a href="https://en.wikipedia.org/wiki/Job_scheduler" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/Job_scheduler&lt;/a&gt;&lt;br&gt;
[3] &lt;a href="https://www.schedmd.com/slurm" rel="noopener noreferrer"&gt;https://www.schedmd.com/slurm&lt;/a&gt;&lt;br&gt;
[4] &lt;a href="https://slurm.schedmd.com/documentation.html" rel="noopener noreferrer"&gt;https://slurm.schedmd.com/documentation.html&lt;/a&gt;&lt;br&gt;
[5] &lt;a href="https://github.com/SchedMD/slurm" rel="noopener noreferrer"&gt;https://github.com/SchedMD/slurm&lt;/a&gt;&lt;br&gt;
[6] &lt;a href="https://www.tensorflow.org/" rel="noopener noreferrer"&gt;https://www.tensorflow.org/&lt;/a&gt;&lt;br&gt;
[7] &lt;a href="https://keras.io/" rel="noopener noreferrer"&gt;https://keras.io/&lt;/a&gt;&lt;br&gt;
[8] &lt;a href="https://www.tensorflow.org/guide/distributed_training" rel="noopener noreferrer"&gt;https://www.tensorflow.org/guide/distributed_training&lt;/a&gt;&lt;br&gt;
[9] &lt;a href="https://github.com/keras-team/keras-io/blob/master/examples/vision/mnist_convnet.py" rel="noopener noreferrer"&gt;https://github.com/keras-team/keras-io/blob/master/examples/vision/mnist_convnet.py&lt;/a&gt;&lt;br&gt;
[10] &lt;a href="https://slurm.schedmd.com/sbatch.html" rel="noopener noreferrer"&gt;https://slurm.schedmd.com/sbatch.html&lt;/a&gt;&lt;br&gt;
[11] &lt;a href="https://slurm.schedmd.com/srun.html" rel="noopener noreferrer"&gt;https://slurm.schedmd.com/srun.html&lt;/a&gt;&lt;br&gt;
[12] &lt;a href="https://slurm.schedmd.com/squeue.html" rel="noopener noreferrer"&gt;https://slurm.schedmd.com/squeue.html&lt;/a&gt;&lt;/p&gt;

</description>
      <category>slurm</category>
      <category>backend</category>
      <category>hpc</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Creating an arXiv DB</title>
      <dc:creator>Nicholas Synovic</dc:creator>
      <pubDate>Sun, 01 Sep 2024 00:27:10 +0000</pubDate>
      <link>https://forem.com/nicholassynovic/creating-an-arxiv-db-940</link>
      <guid>https://forem.com/nicholassynovic/creating-an-arxiv-db-940</guid>
      <description>&lt;p&gt;As a Ph.D. student studying Deep Learning (DL) from the perspective of a Software Engineer, I rely upon academic resources to learn about DL models, techniques, and methods. &lt;a href="https://arxiv.org" rel="noopener noreferrer"&gt;arXiv&lt;/a&gt; is arguably the largest host of the latest academic (but not peer-reviewed) DL manuscripts.  &lt;/p&gt;

&lt;p&gt;However, as the service relies upon community donations, it comes with limitations. One of them is that only the last week of manuscripts is browsable at any given time; the rest are only searchable.&lt;/p&gt;

&lt;p&gt;As someone who checks the service often for the latest information, it can become irritating when I'm casually browsing the site, find an interesting manuscript, forget (for one reason or another) to bookmark it, and then can't find the paper later because I can't nail down the exact keywords to search for it. Additionally, I'd like to leverage the data on the site for other projects, like testing retrieval augmented generation (RAG) techniques for finding information from manuscripts.&lt;/p&gt;

&lt;p&gt;To support users like me, the arXiv team releases the metadata of all papers submitted to the platform weekly on &lt;a href="https://www.kaggle.com/datasets/Cornell-University/arxiv" rel="noopener noreferrer"&gt;Kaggle&lt;/a&gt; as JSON. So for today's blog post, let's convert the JSON file into a queryable SQLite3 database!&lt;/p&gt;

&lt;h2&gt;
  
  
  Project Setup
&lt;/h2&gt;

&lt;p&gt;I'll use Python 3.10 and bash for this project, primarily for the &lt;a href="https://pypi.org/project/pandas/" rel="noopener noreferrer"&gt;&lt;code&gt;pandas&lt;/code&gt; library&lt;/a&gt;. &lt;code&gt;pandas&lt;/code&gt; provides convenient &lt;code&gt;read_json&lt;/code&gt; and &lt;code&gt;to_sql&lt;/code&gt; methods for reading JSON files and writing to SQL databases, respectively.&lt;/p&gt;
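&lt;p&gt;Before touching the real dataset, here is a minimal, self-contained sketch (with made-up records) of that round trip: &lt;code&gt;read_json&lt;/code&gt; pulls JSON Lines into a DataFrame, and &lt;code&gt;to_sql&lt;/code&gt; writes it to a SQLite3 database.&lt;/p&gt;

```python
# Round-trip sketch: JSON Lines -> DataFrame -> SQLite3 table.
# The two records below are fabricated examples, not real arXiv data.
import sqlite3
from io import StringIO

import pandas

# Stand-in for a tiny slice of the arXiv metadata file
jsonl = StringIO(
    '{"id": "hep-th/9901001", "title": "Paper A"}\n'
    '{"id": "hep-th/9901002", "title": "Paper B"}\n'
)

df = pandas.read_json(path_or_buf=jsonl, lines=True)

# A DBAPI2 connection works directly with to_sql for SQLite3
with sqlite3.connect(":memory:") as conn:
    df.to_sql(name="documents", con=conn, index=False)
    rows = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]

print(rows)  # 2
```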

&lt;p&gt;To start, I created a GitHub repository based on &lt;a href="https://github.com/NicholasSynovic/template_python" rel="noopener noreferrer"&gt;my Python template repo&lt;/a&gt;. You can find all the project code &lt;a href="https://github.com/NicholasSynovic/tool_arXiv-db" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting And Cleaning The Data
&lt;/h2&gt;

&lt;p&gt;As the arXiv Dataset is hosted on Kaggle, we can use their &lt;a href="https://pypi.org/project/kaggle/" rel="noopener noreferrer"&gt;&lt;code&gt;kaggle&lt;/code&gt;&lt;/a&gt; Python library to download and unzip the data. Wrapping this as a bash script, we get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

kaggle datasets download &lt;span class="nt"&gt;--unzip&lt;/span&gt; Cornell-University/arxiv &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="nv"&gt;$1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where the &lt;code&gt;--unzip&lt;/code&gt; argument decompresses the data, and the &lt;code&gt;-p&lt;/code&gt; argument specifies a path to download the data. We can improve this further by leveraging &lt;a href="https://github.com/nk412/optparse" rel="noopener noreferrer"&gt;&lt;code&gt;optparse&lt;/code&gt;&lt;/a&gt; to provide command-line arguments for our script. All said and done, we have a download script that looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="nb"&gt;source &lt;/span&gt;optparse.bash
&lt;span class="c"&gt;# Name the variable DATA_PATH: "variable=PATH" would clobber the shell's $PATH&lt;/span&gt;
optparse.define &lt;span class="nv"&gt;short&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;p &lt;span class="nv"&gt;long&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;path &lt;span class="nv"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Directory to store dataset"&lt;/span&gt; &lt;span class="nv"&gt;variable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;DATA_PATH &lt;span class="nv"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"."&lt;/span&gt;
&lt;span class="nb"&gt;source&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt; optparse.build &lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nv"&gt;ABS_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;realpath&lt;/span&gt; &lt;span class="nv"&gt;$DATA_PATH&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

kaggle datasets download &lt;span class="nt"&gt;--unzip&lt;/span&gt; Cornell-University/arxiv &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="nv"&gt;$ABS_PATH&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that we have the data in JSON format, we can further optimize it by converting it into &lt;a href="https://jsonlines.org/" rel="noopener noreferrer"&gt;JSON Lines&lt;/a&gt; (JL) format. JL stores JSON data with one object per line, which removes the top-level array of objects in which the arXiv Dataset is stored. With the data in JL format, pandas can read the file in chunks, reducing memory overhead by loading only portions of the data into memory at a time.&lt;/p&gt;
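&lt;p&gt;To make the difference concrete, here is what two made-up records look like in each format, sketched in Python:&lt;/p&gt;

```python
# Standard JSON stores the records as one top-level array; JSON Lines
# stores one compact object per line. (Records are fabricated examples.)
import json

records = [
    {"id": "hep-th/9901001", "title": "Paper A"},
    {"id": "hep-th/9901002", "title": "Paper B"},
]

# Standard JSON: a single array holding every object
as_json = json.dumps(records)

# JSON Lines: each object serialized on its own line
as_jsonl = "\n".join(json.dumps(record) for record in records)

print(as_jsonl)
```

&lt;p&gt;A reader of the JL form can process one line at a time instead of parsing the entire array up front, which is exactly what lets &lt;code&gt;pandas&lt;/code&gt; chunk the file.&lt;/p&gt;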

&lt;p&gt;We can leverage &lt;a href="https://github.com/jqlang/jq" rel="noopener noreferrer"&gt;&lt;code&gt;jq&lt;/code&gt;&lt;/a&gt; to do the conversion with this script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="nb"&gt;source &lt;/span&gt;optparse.bash

optparse.define &lt;span class="nv"&gt;short&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;i &lt;span class="nv"&gt;long&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;input &lt;span class="nv"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Input JSON file"&lt;/span&gt; &lt;span class="nv"&gt;variable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;inputPath
optparse.define &lt;span class="nv"&gt;short&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;o &lt;span class="nv"&gt;long&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;output &lt;span class="nv"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Output JSON Lines file"&lt;/span&gt; &lt;span class="nv"&gt;variable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;outputPath

&lt;span class="nb"&gt;source&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt; optparse.build &lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="nv"&gt;$inputPath&lt;/span&gt; &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"No input provided."&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi

if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="nv"&gt;$outputPath&lt;/span&gt; &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"No output provided."&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nv"&gt;absInputPath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;realpath&lt;/span&gt; &lt;span class="nv"&gt;$inputPath&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;absOutputPath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;realpath&lt;/span&gt; &lt;span class="nv"&gt;$outputPath&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

jq &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nv"&gt;$absInputPath&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$absOutputPath&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With that, our data is finally in a format where we can start loading it into a database!&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating The Database
&lt;/h2&gt;

&lt;p&gt;We will define our database schema using &lt;a href="https://www.sqlalchemy.org/" rel="noopener noreferrer"&gt;SQLAlchemy&lt;/a&gt;. To start, we will store a subset of the information in a single table called &lt;code&gt;documents&lt;/code&gt;; this lets us verify that our database configuration is correct while avoiding nested data for now. Creating a SQLite3 database with SQLAlchemy is fairly simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;MetaData&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;PrimaryKeyConstraint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqlite:///&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MetaData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MetaData&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;documentTable&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createTables&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;createTables&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;documentTable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;submitter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;comments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;journal-ref&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doi&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report-no&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;categories&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;license&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;abstract&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nc"&gt;PrimaryKeyConstraint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;checkfirst&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running this code and checking the database schema we see that the table and columns have been created successfully:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff6ofhguzzn00nf2ei1lq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff6ofhguzzn00nf2ei1lq.png" width="586" height="235"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We will extend this later by adding tables and relationships between nested values and the documents table.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inserting Data Into The Database
&lt;/h2&gt;

&lt;p&gt;With Pandas, we can read the data in from the JL file as chunks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Iterator&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;readJSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunksize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Iterator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;path_or_buf&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;chunksize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chunksize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ujson&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This &lt;code&gt;Iterator[DataFrame]&lt;/code&gt; object lazily reads the file into memory one chunk at a time, and we can consume it with a &lt;code&gt;for&lt;/code&gt; loop.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;loadData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dfs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Iterator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;  &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dfs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;quit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As our current database schema doesn't include every field captured in the JSON objects, we need to select from our DataFrame only the columns the schema does capture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;getDocuments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;documentsDF&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;submitter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;comments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;journal-ref&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doi&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report-no&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;categories&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;license&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;abstract&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;documentsDF&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;documentsDF&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;documentsDF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we can load the document data into the database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;loadData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dfs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Iterator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dfs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;documentsDF&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getDocuments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;documentsDF&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;documentTable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;con&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;if_exists&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;append&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;quit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Checking our testing database, we can see that the first set of documents was loaded correctly:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5biwn7c5jxkklplrjrof.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5biwn7c5jxkklplrjrof.png" width="800" height="62"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, when we try to import the entire dataset into the database, we get a &lt;code&gt;sqlalchemy.exc.IntegrityError&lt;/code&gt; because some primary keys are duplicated in the JL file. Rather than handling this while converting the data, we can extend our &lt;code&gt;DB&lt;/code&gt; class to write DataFrames to a table, filtering out duplicate rows should an error arise:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy.exc&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;IntegrityError&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="bp"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;toSQL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tableName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tableName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;con&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;if_exists&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;append&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;IntegrityError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;param&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;param&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;isin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

            &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tableName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;con&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;if_exists&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;append&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If an error occurs, the rows whose primary keys were reported by the &lt;code&gt;IntegrityError&lt;/code&gt; are filtered out of the DataFrame, and a second attempt is made to insert the remaining rows into the database. Additionally, the method now returns the number of rows committed to the database.&lt;/p&gt;
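&lt;p&gt;As a minimal sketch of that filtering step (with hypothetical ids and data), dropping the offending primary keys before retrying the insert looks like this:&lt;/p&gt;

```python
# Minimal sketch of the duplicate-filtering step in DB.toSQL, using
# hypothetical data: drop the rows whose primary keys the database
# rejected, so the remainder is safe to reinsert.
import pandas

df = pandas.DataFrame(
    {"id": ["a1", "a2", "a3", "a4"], "title": ["w", "x", "y", "z"]}
)

# Pretend the IntegrityError reported "a2" and "a4" as duplicates.
duplicate_ids = ["a2", "a4"]

# Keep only the rows whose id is NOT in the offending list.
filtered = df[~df["id"].isin(values=duplicate_ids)]

print(filtered["id"].tolist())  # → ['a1', 'a3']
```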

&lt;p&gt;Our updated &lt;code&gt;loadData&lt;/code&gt; function now looks like this (with a &lt;code&gt;Spinner&lt;/code&gt; from the &lt;a href="https://pypi.org/project/progress/" rel="noopener noreferrer"&gt;&lt;code&gt;progress&lt;/code&gt;&lt;/a&gt; library to report progress):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;loadData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dfs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Iterator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;Spinner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Loading data into &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;... &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;spinner&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dfs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;documentsDF&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getDocuments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toSQL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tableName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;documentTable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;documentsDF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;spinner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Wrapping Up&lt;/h2&gt;

&lt;p&gt;Now that the basic structure of the application has been created, all that's left is to add the other tables.&lt;/p&gt;

&lt;p&gt;For example, we can create a table called &lt;code&gt;authors&lt;/code&gt; to store each author of a document:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;authorTable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;author&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nc"&gt;PrimaryKeyConstraint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nc"&gt;ForeignKeyConstraint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;refcolumns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents.id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can then extract just the authors from the DataFrame with this function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;getAuthors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;idIncrement&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;authorsDF&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authors_parsed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="n"&gt;authorsDF&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;authorsDF&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;column&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authors_parsed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ignore_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;authorsDF&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;author&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;authorsDF&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authors_parsed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;authorsDF&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;authorsDF&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authors_parsed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;authorsDF&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;idIncrement&lt;/span&gt;
    &lt;span class="n"&gt;authorsDF&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;authorsDF&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;authorsDF&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;authorsDF&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;authorsDF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we can use the &lt;code&gt;DB.toSQL()&lt;/code&gt; method to write it to the database.&lt;/p&gt;
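&lt;p&gt;One detail worth noting is the &lt;code&gt;idIncrement&lt;/code&gt; parameter: because the data arrives in chunks, each chunk's author ids must be offset by a running total so they stay unique across the whole table. A sketch of that bookkeeping, with illustrative data and a simplified helper, might look like this:&lt;/p&gt;

```python
# Hedged sketch of keeping ids unique across chunks, mirroring the
# idIncrement parameter of getAuthors. Data and helper are illustrative.
import pandas

def number_rows(df: pandas.DataFrame, id_increment: int = 0) -> pandas.DataFrame:
    # The same index-offset trick as getAuthors: shift the index by the
    # running total, then surface it as an "id" column.
    df = df.reset_index(drop=True)
    df.index += id_increment
    return df.reset_index().rename(columns={"index": "id"})

chunk1 = pandas.DataFrame({"author": ["Knuth, D.", "Lamport, L."]})
chunk2 = pandas.DataFrame({"author": ["Ritchie, D."]})

id_increment: int = 0
frames = []
for chunk in (chunk1, chunk2):
    numbered = number_rows(chunk, id_increment)
    id_increment += numbered.shape[0]  # advance the offset by rows numbered
    frames.append(numbered)

result = pandas.concat(frames)
print(result["id"].tolist())  # → [0, 1, 2]
```

&lt;p&gt;In the real pipeline, the running total could advance by the row count that &lt;code&gt;DB.toSQL()&lt;/code&gt; returns, since the number of rows committed may be smaller than the chunk after duplicates are dropped.&lt;/p&gt;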

&lt;p&gt;The final database schema is as follows (generated with &lt;a href="https://www.schemacrawler.com/" rel="noopener noreferrer"&gt;SchemaCrawler&lt;/a&gt;):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Felvuus7uhhg49tsov9hh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Felvuus7uhhg49tsov9hh.png" alt=" " width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As the schema shows, storing this data involves additional complexity. The &lt;code&gt;versions&lt;/code&gt; table undergoes similar transformations.&lt;/p&gt;

&lt;p&gt;If you are interested in how the &lt;code&gt;versions&lt;/code&gt; table is created, or want to try this tool yourself, please visit the &lt;a href="https://github.com/NicholasSynovic/tool_arXiv-db" rel="noopener noreferrer"&gt;GitHub project page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks for taking the time to read this post. I hope to be posting more in the future. &lt;/p&gt;

</description>
      <category>arxiv</category>
      <category>database</category>
      <category>softwareengineering</category>
      <category>computerscience</category>
    </item>
  </channel>
</rss>
