<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Yaniv </title>
    <description>The latest articles on Forem by Yaniv  (@yaniv2809).</description>
    <link>https://forem.com/yaniv2809</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3872105%2Ff4b2afaa-aa64-4769-8240-0374345b6dde.png</url>
      <title>Forem: Yaniv </title>
      <link>https://forem.com/yaniv2809</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/yaniv2809"/>
    <language>en</language>
    <item>
      <title>How I Built a 12-Step CI/CD Pipeline That Spins Up MySQL, Flask, and Playwright From Scratch</title>
      <dc:creator>Yaniv </dc:creator>
      <pubDate>Sat, 11 Apr 2026 14:44:47 +0000</pubDate>
      <link>https://forem.com/yaniv2809/how-i-built-a-12-step-cicd-pipeline-that-spins-up-mysql-flask-and-playwright-from-scratch-912</link>
      <guid>https://forem.com/yaniv2809/how-i-built-a-12-step-cicd-pipeline-that-spins-up-mysql-flask-and-playwright-from-scratch-912</guid>
      <description>&lt;p&gt;Setting up CI for a web app is straightforward. Setting up CI for a test automation framework that needs a real database, two backend servers, and a headless browser — all starting from zero on every run — is a different problem.&lt;/p&gt;

&lt;p&gt;This is how I built a GitHub Actions pipeline that provisions the entire infrastructure, runs 37 tests across 3 layers, and deploys a historical test report — every time I push to main.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: "Works On My Machine" Doesn't Scale
&lt;/h2&gt;

&lt;p&gt;Locally, my test framework needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MySQL 8.0 with a specific schema loaded&lt;/li&gt;
&lt;li&gt;A JSON Server running on port 3000&lt;/li&gt;
&lt;li&gt;A Flask API server running on port 5000&lt;/li&gt;
&lt;li&gt;Playwright with Chromium installed&lt;/li&gt;
&lt;li&gt;Environment variables for DB credentials and API keys&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Running &lt;code&gt;pytest&lt;/code&gt; locally assumes all of this is already set up. In CI, nothing exists. Every run starts from a fresh Ubuntu runner.&lt;/p&gt;

&lt;p&gt;The challenge isn't running the tests — it's building the world they need to run in.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 12 Steps
&lt;/h2&gt;

&lt;p&gt;Here's the full pipeline, and why each step exists:&lt;/p&gt;

&lt;h3&gt;
  
  
  Steps 1-3: Foundation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1. Checkout Code&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2. Set up Python &lt;/span&gt;&lt;span class="m"&gt;3.13&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-python@v5&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;python-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3.13'&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3. Set up Node.js &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@v4&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;node-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;18'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Python for the test framework. Node.js for JSON Server. Nothing surprising here, but note the explicit version pinning — &lt;code&gt;3.13&lt;/code&gt;, not &lt;code&gt;3.x&lt;/code&gt;. CI flakiness often starts with "we let the runner pick the version."&lt;/p&gt;

&lt;h3&gt;
  
  
  Steps 4-5: Dependencies
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;4. Install Python Dependencies&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;python -m pip install --upgrade pip&lt;/span&gt;
    &lt;span class="s"&gt;pip install -r requirements.txt&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5. Install Playwright Browsers&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;playwright install chromium --with-deps&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;--with-deps&lt;/code&gt; is critical. Without it, Playwright installs the browser binary but not the OS-level libraries it needs (libgbm, libasound, etc.). The test run will fail with a cryptic error about missing shared objects.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Database Schema
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mysql&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysql:8.0&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;MYSQL_ROOT_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root_password&lt;/span&gt;
      &lt;span class="na"&gt;MYSQL_DATABASE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;expense_test_db&lt;/span&gt;
      &lt;span class="na"&gt;MYSQL_USER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test_user&lt;/span&gt;
      &lt;span class="na"&gt;MYSQL_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test_password&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;3306:3306&lt;/span&gt;
    &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;-&lt;/span&gt;
      &lt;span class="s"&gt;--health-cmd="mysqladmin ping"&lt;/span&gt;
      &lt;span class="s"&gt;--health-interval=10s&lt;/span&gt;
      &lt;span class="s"&gt;--health-timeout=5s&lt;/span&gt;
      &lt;span class="s"&gt;--health-retries=5&lt;/span&gt;

&lt;span class="c1"&gt;# In steps:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;6. Initialize MySQL Schema&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;sudo apt-get install -y mysql-client&lt;/span&gt;
    &lt;span class="s"&gt;mysql -h 127.0.0.1 -u test_user -ptest_password expense_test_db &amp;lt; data/init_mysql.sql&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The MySQL service container starts alongside the job. The &lt;code&gt;health-cmd&lt;/code&gt; ensures the container is actually ready before we try to load the schema. Without health checks, you'll hit "Connection refused" errors roughly 30% of the time.&lt;/p&gt;

&lt;p&gt;The schema itself is minimal — one table with a CHECK constraint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;expenses&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="n"&gt;AUTO_INCREMENT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;expense_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="nb"&gt;DOUBLE&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That CHECK constraint is actually a test target — one of my E2E tests validates that negative amounts get rejected at the DB level even though the UI accepts them.&lt;/p&gt;
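&lt;p&gt;A minimal sketch of that DB-level test, using SQLite in place of MySQL since both engines enforce CHECK constraints (MySQL does so from 8.0.16). The table mirrors the schema above; the helper name is my own:&lt;/p&gt;

```python
import sqlite3

# SQLite stands in for MySQL here; the CHECK behavior under test is comparable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE expenses ("
    "  id INTEGER PRIMARY KEY AUTOINCREMENT,"
    "  expense_name VARCHAR(255),"
    "  amount DOUBLE CHECK (amount >= 0),"
    "  date VARCHAR(50),"
    "  category VARCHAR(100))"
)

def insert_expense(name, amount):
    """Attempt an INSERT; return True on success, False if the DB rejects the row."""
    try:
        conn.execute(
            "INSERT INTO expenses (expense_name, amount) VALUES (?, ?)",
            (name, amount),
        )
        return True
    except sqlite3.IntegrityError:
        # CHECK (amount >= 0) fired: the row was never stored.
        return False

assert insert_expense("coffee", 4.50)       # valid amount is stored
assert not insert_expense("refund", -50)    # negative amount rejected at the DB level
```

&lt;p&gt;The point is that the rejection happens below the UI and API, which is exactly why it needs its own test.&lt;/p&gt;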

&lt;h3&gt;
  
  
  Steps 7-9: Server Orchestration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;7. Install &amp;amp; Start JSON Server&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;npm install -g json-server&lt;/span&gt;
    &lt;span class="s"&gt;json-server --watch json-server/db.json --port 3000 &amp;amp;&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;8. Start Flask Server&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;DB_TYPE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysql&lt;/span&gt;
    &lt;span class="na"&gt;MYSQL_HOST&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;127.0.0.1&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;python server/app.py &amp;amp;&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;9. Wait for Servers to be Ready&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;curl --retry 10 --retry-delay 2 --retry-connrefused http://localhost:3000/expenses&lt;/span&gt;
    &lt;span class="s"&gt;curl --retry 10 --retry-delay 2 --retry-connrefused http://localhost:5000/expenses&lt;/span&gt;
    &lt;span class="s"&gt;echo "All servers are up and running!"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;&amp;amp;&lt;/code&gt; at the end of each server command runs it in the background. Step 9 is the safety net — it polls both servers with retry logic until they respond, or fails after roughly 20 seconds (10 attempts, 2 seconds apart).&lt;/p&gt;

&lt;p&gt;This is a pattern I see skipped in a lot of CI setups. People start a server and immediately run tests, then wonder why they get intermittent connection errors. Always add a readiness check.&lt;/p&gt;
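&lt;p&gt;The same readiness check can be written in pure Python if you'd rather not shell out to &lt;code&gt;curl&lt;/code&gt;. A minimal stdlib sketch (the function name and retry defaults are my own, mirroring the curl flags above):&lt;/p&gt;

```python
import time
import urllib.error
import urllib.request

def wait_until_ready(url, retries=10, delay=2.0):
    """Poll a URL until it answers with HTTP 200, like the curl retry loop.

    Returns the attempt number that succeeded; raises RuntimeError if the
    server never comes up within `retries` attempts.
    """
    for attempt in range(1, retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return attempt
        except (urllib.error.URLError, OSError):
            pass  # connection refused or timed out; keep polling
        time.sleep(delay)
    raise RuntimeError(f"{url} not ready after {retries} attempts")
```

&lt;p&gt;Drop a call like this at the top of your test session (or in a pytest fixture) and the intermittent connection errors disappear.&lt;/p&gt;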

&lt;h3&gt;
  
  
  Step 10: The Actual Tests
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10. Run Tests (exclude Mobile)&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;GROQ_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GROQ_API_KEY }}&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;pytest -m "not mobile" --alluredir=allure-results --ai-analysis&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;-m "not mobile"&lt;/code&gt; excludes tests that require a physical Android device. The remaining 37 tests cover Web (Playwright), API, Database, and cross-layer E2E.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;--ai-analysis&lt;/code&gt; triggers an optional Groq LLM call on test failures to classify the root cause. The API key is stored as a GitHub Secret.&lt;/p&gt;

&lt;h3&gt;
  
  
  Steps 11-12: Reporting
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;11. Generate Allure Report&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;simple-elf/allure-report-action@master&lt;/span&gt;
  &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always()&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;allure_results&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allure-results&lt;/span&gt;
    &lt;span class="na"&gt;allure_history&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allure-history&lt;/span&gt;
    &lt;span class="na"&gt;keep_reports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;12. Deploy Allure Report to GitHub Pages&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;peaceiris/actions-gh-pages@v3&lt;/span&gt;
  &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always()&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;github_token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GITHUB_TOKEN }}&lt;/span&gt;
    &lt;span class="na"&gt;publish_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allure-history&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;if: always()&lt;/code&gt; is key — the report is generated even when tests fail. Without this, failures produce no report, which is exactly when you need one most.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;keep_reports: 20&lt;/code&gt; maintains the last 20 runs, so you get historical trend analysis in Allure — pass rates over time, flakiness detection, duration trends.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned Building This
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Health checks prevent 80% of CI flakiness.&lt;/strong&gt; The MySQL health-cmd and the curl retry loops in step 9 eliminated almost all intermittent failures. Before adding them, roughly 1 in 4 runs failed due to timing issues.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. &lt;code&gt;if: always()&lt;/code&gt; on reporting steps is non-negotiable.&lt;/strong&gt; The whole point of CI reports is to understand failures. If the report step only runs on success, it's useless.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Service containers beat &lt;code&gt;docker-compose&lt;/code&gt; in GitHub Actions.&lt;/strong&gt; I originally tried running &lt;code&gt;docker-compose up&lt;/code&gt; inside the workflow. It works, but it's slower and harder to debug. Native service containers integrate better with the runner's networking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Pin your versions.&lt;/strong&gt; Python 3.13, Node 18, MySQL 8.0 — not &lt;code&gt;latest&lt;/code&gt;, not &lt;code&gt;3.x&lt;/code&gt;. A version bump in a dependency should be a deliberate commit, not a surprise in CI.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Docker Alternative
&lt;/h2&gt;

&lt;p&gt;For local development, the same test suite runs via Docker Compose with a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose up &lt;span class="nt"&gt;--build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This uses a custom entrypoint script that replicates the CI steps — waits for MySQL, starts both servers, runs pytest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;[1/5] Waiting for MySQL...        ✓
[2/5] Starting JSON Server...     ✓
[3/5] Starting Flask Server...    ✓
[4/5] Waiting for servers...      ✓
[5/5] Running tests...
========================= 34 passed, 3 xfailed =========================
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same tests, same infrastructure, same results — whether it runs in GitHub Actions or on a developer's laptop.&lt;/p&gt;
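&lt;p&gt;The entrypoint's wait-then-start-then-test flow can be sketched in Python. This is a hypothetical sketch, not the actual &lt;code&gt;docker-entrypoint.sh&lt;/code&gt;; the hostnames, paths, and commands in the &lt;code&gt;__main__&lt;/code&gt; block are assumptions based on the steps listed above:&lt;/p&gt;

```python
import socket
import subprocess
import sys
import time

def wait_for_port(host, port, timeout=60.0):
    """Block until a TCP port accepts connections (how an entrypoint can
    wait for MySQL before loading the schema). Raises TimeoutError on failure."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=2):
                return
        except OSError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"{host}:{port} did not open within {timeout}s")
            time.sleep(1)

if __name__ == "__main__":
    # Hypothetical service names and paths; the real logic lives in docker-entrypoint.sh.
    wait_for_port("mysql", 3306)                                       # [1/5]
    subprocess.Popen(["json-server", "--watch", "json-server/db.json",
                      "--port", "3000"])                               # [2/5]
    subprocess.Popen([sys.executable, "server/app.py"])                # [3/5]
    wait_for_port("localhost", 3000)                                   # [4/5]
    wait_for_port("localhost", 5000)
    sys.exit(subprocess.call(["pytest", "-m", "not mobile"]))          # [5/5]
```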

&lt;h2&gt;
  
  
  Full Source
&lt;/h2&gt;

&lt;p&gt;The complete workflow file and Docker setup are in the repo:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Yaniv2809/Financial-Integrity-Ecosystem" rel="noopener noreferrer"&gt;Financial-Integrity-Ecosystem&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The CI config is at &lt;code&gt;.github/workflows/ci.yml&lt;/code&gt;. The Docker setup is in &lt;code&gt;Dockerfile&lt;/code&gt;, &lt;code&gt;docker-compose.yml&lt;/code&gt;, and &lt;code&gt;docker-entrypoint.sh&lt;/code&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part 2 of a series on building a multi-layer test automation framework. &lt;a href="https://dev.to/yaniv2809/how-i-used-set-theory-to-catch-bugs-that-unit-tests-miss-5d8a"&gt;Part 1&lt;/a&gt; covered using Set Theory for cross-layer data integrity validation.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Yaniv Metuku — QA Automation Engineer&lt;/em&gt;&lt;/p&gt;

</description>
      <category>testing</category>
      <category>python</category>
      <category>devops</category>
      <category>github</category>
    </item>
    <item>
      <title>How I Used Set Theory to Catch Bugs That Unit Tests Miss</title>
      <dc:creator>Yaniv </dc:creator>
      <pubDate>Fri, 10 Apr 2026 16:40:02 +0000</pubDate>
      <link>https://forem.com/yaniv2809/how-i-used-set-theory-to-catch-bugs-that-unit-tests-miss-5d8a</link>
      <guid>https://forem.com/yaniv2809/how-i-used-set-theory-to-catch-bugs-that-unit-tests-miss-5d8a</guid>
      <description>&lt;p&gt;Most test automation tutorials teach you to test layers in isolation: UI tests check buttons, API tests check status codes, DB tests check records. But the bugs that actually cost money in production? They live &lt;strong&gt;between&lt;/strong&gt; the layers.&lt;/p&gt;

&lt;p&gt;I learned this the hard way while building a test automation framework for a financial expense tracker. This post is about one specific technique — &lt;strong&gt;Set Theory validation&lt;/strong&gt; — that catches data integrity bugs that no single-layer test will ever find.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Everything Passes, But Data Is Wrong
&lt;/h2&gt;

&lt;p&gt;Imagine this scenario:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A user creates an expense for $100 through the Web UI&lt;/li&gt;
&lt;li&gt;The UI shows a success message ✅&lt;/li&gt;
&lt;li&gt;The API returns &lt;code&gt;201 Created&lt;/code&gt; ✅&lt;/li&gt;
&lt;li&gt;The database has... $0. Or two records. Or nothing.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every individual layer test passes. The UI test confirms the success message appeared. The API test confirms the status code. But nobody verified that the &lt;strong&gt;actual data&lt;/strong&gt; made it through the entire pipeline correctly.&lt;/p&gt;

&lt;p&gt;In financial applications, this is not a cosmetic bug — it's a &lt;strong&gt;silent data inconsistency&lt;/strong&gt; that can go unnoticed until an audit.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Approach: Database State as a Mathematical Set
&lt;/h2&gt;

&lt;p&gt;Instead of checking "does a record exist?", I treat the entire database table as a mathematical set, and use &lt;strong&gt;set difference&lt;/strong&gt; to prove exactly what changed.&lt;/p&gt;

&lt;p&gt;Here's the concept:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Step 1: Capture DB state BEFORE the action
&lt;/span&gt;&lt;span class="n"&gt;old_set&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;each&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;expenses&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;old_sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;expenses&lt;/span&gt;

&lt;span class="c1"&gt;# Step 2: Perform the action (create expense via UI or API)
&lt;/span&gt;
&lt;span class="c1"&gt;# Step 3: Capture DB state AFTER the action
&lt;/span&gt;&lt;span class="n"&gt;new_set&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;each&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;expenses&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;new_sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;expenses&lt;/span&gt;

&lt;span class="c1"&gt;# Step 4: Validate using set difference
&lt;/span&gt;&lt;span class="n"&gt;isolated_record&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_set&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;old_set&lt;/span&gt;

&lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;isolated_record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;          &lt;span class="c1"&gt;# Exactly ONE new record
&lt;/span&gt;&lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;new_sum&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;old_sum&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;expected_amount&lt;/span&gt;  &lt;span class="c1"&gt;# Amount is correct
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is powerful because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It's operation-independent.&lt;/strong&gt; Whether the expense was created via UI, API, or direct SQL — the validation is the same.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It catches duplicates.&lt;/strong&gt; If a bug causes two records to be inserted, &lt;code&gt;len(isolated_record)&lt;/code&gt; will be 2.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It catches phantom data.&lt;/strong&gt; If some other process modified the table during the test, the set difference will include unexpected records.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It catches amount drift.&lt;/strong&gt; If &lt;code&gt;$100&lt;/code&gt; was entered but &lt;code&gt;$99.99&lt;/code&gt; was stored (floating point issues, rounding bugs), the sum check catches it.&lt;/li&gt;
&lt;/ul&gt;
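&lt;p&gt;The failure modes above can be demonstrated with plain Python sets, no database required. Here's a tiny self-contained sketch of the duplicate-insert case (the row values are made up):&lt;/p&gt;

```python
# Rows modeled as (id, name, amount) tuples, as in the concept above.
old_set = {(1, "rent", 1200.0), (2, "food", 300.0)}
old_sum = sum(amount for _, _, amount in old_set)

# A buggy create path inserts the same expense twice.
new_set = old_set | {(3, "coffee", 4.5), (4, "coffee", 4.5)}
new_sum = sum(amount for _, _, amount in new_set)

diff = new_set - old_set
assert len(diff) == 2              # duplicate caught: expected 1 new record, got 2
assert new_sum - old_sum == 9.0    # the sum check catches it too (4.5 * 2)
```

&lt;p&gt;A single-record existence check would have passed here; the set difference is what makes the duplicate visible.&lt;/p&gt;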

&lt;h2&gt;
  
  
  Real Implementation
&lt;/h2&gt;

&lt;p&gt;In my framework, the pattern looks like this across the stack:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DB helper that captures state:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
&lt;span class="nd"&gt;@allure.step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DB: Get all expenses as set&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_all_expenses_as_set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT id, expense_name, amount FROM expenses&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;

&lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
&lt;span class="nd"&gt;@allure.step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DB: Get sum of amounts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_sum_of_amounts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT COALESCE(SUM(amount), 0) FROM expenses&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchone&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cross-layer E2E test that uses it:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_api_create_reflects_in_db&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Capture pre-state
&lt;/span&gt;    &lt;span class="n"&gt;old_set&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DBActions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_all_expenses_as_set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;old_sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DBActions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_sum_of_amounts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Act: Create expense via API
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;APIActions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;APIVerification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;verify_status_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;201&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Capture post-state
&lt;/span&gt;    &lt;span class="n"&gt;new_set&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DBActions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_all_expenses_as_set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;new_sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DBActions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_sum_of_amounts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Validate integrity
&lt;/span&gt;    &lt;span class="n"&gt;diff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_set&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;old_set&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Expected 1 new record, got &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;new_sum&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;old_sum&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;expected_amount&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same pattern applies to &lt;strong&gt;update&lt;/strong&gt; (new_set - old_set contains the modified record, and old_set - new_set its original version) and &lt;strong&gt;delete&lt;/strong&gt; (old_set - new_set contains the removed record).&lt;/p&gt;
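&lt;p&gt;As a minimal, self-contained sketch of the delete case, plain Python sets stand in for the framework's DBActions snapshots (the row tuples here are invented for illustration):&lt;br&gt;
&lt;/p&gt;

```python
# Minimal sketch of the delete validation: plain tuples stand in for
# DB rows, and plain sets for the DBActions snapshot helpers.
old_set = {("Rent", 1200.0), ("Groceries", 85.5), ("Gym", 40.0)}

# ... the "Gym" expense is deleted through the API here ...
new_set = {("Rent", 1200.0), ("Groceries", 85.5)}

# For delete, the direction of the difference flips:
# old_set - new_set yields exactly the removed record.
removed = old_set - new_set
assert removed == {("Gym", 40.0)}, f"Expected one removed record, got {removed}"

# And the sum invariant shrinks by exactly the deleted amount.
old_sum = sum(amount for _, amount in old_set)
new_sum = sum(amount for _, amount in new_set)
assert old_sum - new_sum == 40.0
```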

&lt;h2&gt;
  
  
  Where This Caught Real Bugs
&lt;/h2&gt;

&lt;p&gt;While building this, the Set Theory approach caught two issues that single-layer tests completely missed:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. MySQL CHECK constraint silently rejecting negative amounts.&lt;/strong&gt;&lt;br&gt;
The UI happily accepted &lt;code&gt;-50&lt;/code&gt; as an expense amount. The API returned &lt;code&gt;201&lt;/code&gt;. But MySQL's &lt;code&gt;CHECK (amount &amp;gt;= 0)&lt;/code&gt; constraint blocked the INSERT — so the record never existed in the DB. The set difference was empty when it should have contained one record. Without the cross-layer test, this would have looked like a perfectly passing test suite.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. VARCHAR(255) overflow truncation.&lt;/strong&gt;&lt;br&gt;
A 300-character expense name was entered through the UI. The API accepted it. MySQL truncated it to 255 characters silently. The set difference caught the mismatch because the stored record didn't match the expected data.&lt;/p&gt;
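&lt;p&gt;Both bugs surface the same way: the database snapshot disagrees with what the passing UI and API layers implied. A minimal sketch of the two detection conditions, with invented row data standing in for real snapshots:&lt;br&gt;
&lt;/p&gt;

```python
# Hypothetical snapshots illustrating both silent failures above.

# Case 1: CHECK (amount >= 0) blocked the INSERT, but the API said 201.
old_set = {("Rent", 1200.0)}
new_set = {("Rent", 1200.0)}  # DB unchanged: the -50 row never landed
diff = new_set - old_set
assert len(diff) == 0  # the cross-layer test expected 1, so it fails loudly

# Case 2: VARCHAR(255) silently truncated a 300-character name.
expected_name = "x" * 300
stored_name = expected_name[:255]       # what MySQL actually kept
assert stored_name != expected_name     # stored record != expected data: caught
```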
&lt;h2&gt;
  
  
  The Full Picture
&lt;/h2&gt;

&lt;p&gt;This technique is one piece of a larger framework I built with 53 tests across 4 layers (Web/Playwright, API/Flask, Mobile/Appium, Database/MySQL). The cross-layer E2E tests that use Set Theory are a small percentage of the total test count, but they catch the highest-risk bugs.&lt;/p&gt;

&lt;p&gt;The architecture enforces strict separation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Tests → Workflows → Actions/Verifications → Page Objects + Data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every layer has one job. Tests never call raw UI or API actions directly — they go through workflows that compose actions into business flows. This keeps the Set Theory validation reusable across different test scenarios.&lt;/p&gt;
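&lt;p&gt;A rough sketch of that layering in Python, with in-memory fakes in place of HTTP and MySQL (the class and method names here are illustrative, not the repo's actual API):&lt;br&gt;
&lt;/p&gt;

```python
# Illustrative layering sketch; names are hypothetical, not the repo's API.
# In-memory fakes stand in for the HTTP session and the MySQL cursor.

class FakeResponse:
    status_code = 201

class FakeSession:
    def __init__(self, db):
        self.db = db
    def post(self, path, json):
        self.db.add((json["name"], json["amount"]))  # backend writes to DB
        return FakeResponse()

class APIActions:
    """Lowest layer: one raw operation per method."""
    @staticmethod
    def post_expense(session, payload):
        return session.post("/expenses", json=payload)

class DBActions:
    @staticmethod
    def get_all_expenses_as_set(db):
        return set(db)  # snapshot copy, so later writes don't mutate it

class ExpenseWorkflow:
    """Middle layer: composes actions into one business flow."""
    @staticmethod
    def create_and_snapshot(session, db, payload):
        before = DBActions.get_all_expenses_as_set(db)
        response = APIActions.post_expense(session, payload)
        after = DBActions.get_all_expenses_as_set(db)
        return response, before, after

# Top layer: the test only talks to the workflow, never raw actions.
db = {("Rent", 1200.0)}
response, before, after = ExpenseWorkflow.create_and_snapshot(
    FakeSession(db), db, {"name": "Coffee", "amount": 4.5}
)
assert response.status_code == 201
assert after - before == {("Coffee", 4.5)}
```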

&lt;h2&gt;
  
  
  When You Should (and Shouldn't) Use This
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use it when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your application handles financial data, inventory, or any domain where data accuracy matters more than UI polish&lt;/li&gt;
&lt;li&gt;Data flows through multiple systems (frontend → backend → database → reporting)&lt;/li&gt;
&lt;li&gt;You've had production bugs where "the UI said X but the DB had Y"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Don't use it when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're testing a static website or content-only app&lt;/li&gt;
&lt;li&gt;The DB is behind a well-tested ORM with strong constraints and you trust the abstraction&lt;/li&gt;
&lt;li&gt;Test execution time is critical and you can't afford the extra DB queries&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The full framework is open source:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Yaniv2809/Financial-Integrity-Ecosystem" rel="noopener noreferrer"&gt;Financial-Integrity-Ecosystem&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can run the entire suite with a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Yaniv2809/Financial-Integrity-Ecosystem.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Financial-Integrity-Ecosystem
docker-compose up &lt;span class="nt"&gt;--build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This spins up MySQL, Flask, JSON Server, and Playwright, then runs all 37 non-mobile tests automatically.&lt;/p&gt;

&lt;p&gt;The cross-layer E2E tests are in &lt;code&gt;tests/api/test_e2e_api_db_expense.py&lt;/code&gt; and &lt;code&gt;tests/test_e2e_web_api_db.py&lt;/code&gt; if you want to see the set theory pattern in action.&lt;/p&gt;




&lt;p&gt;I recently shared this project on r/QualityAssurance and got valuable feedback that led to several improvements, including adding a testing philosophy section that explains the business risk behind each layer. If you have feedback or have used similar patterns in production, I'd genuinely like to hear about it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Yaniv Metuku — QA Automation Engineer&lt;/em&gt;&lt;/p&gt;

</description>
      <category>testing</category>
      <category>python</category>
      <category>automation</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
