<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: KILLALLSKYWALKER</title>
    <description>The latest articles on Forem by KILLALLSKYWALKER (@killallskywalker).</description>
    <link>https://forem.com/killallskywalker</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F264303%2Fb9e293b8-ccc8-4648-a1f1-2e28211152b3.png</url>
      <title>Forem: KILLALLSKYWALKER</title>
      <link>https://forem.com/killallskywalker</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/killallskywalker"/>
    <language>en</language>
    <item>
      <title>Refactor the Terraform Script — Restoring Balance</title>
      <dc:creator>KILLALLSKYWALKER</dc:creator>
      <pubDate>Tue, 13 Jan 2026 23:07:17 +0000</pubDate>
      <link>https://forem.com/killallskywalker/refactor-the-terraform-script-restoring-balance-4hko</link>
      <guid>https://forem.com/killallskywalker/refactor-the-terraform-script-restoring-balance-4hko</guid>
      <description>&lt;p&gt;Previously when i use terraform and localstack , i avoid to using the tflocal wrapper due to one 100% same way when running in real environment . However i notice that thats a lot of unnecessary thing that we need to add and add more maintenance . &lt;/p&gt;

&lt;p&gt;For example, here is my previous provider configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provider "aws" {
  profile = "localstack"
  region  = "us-east-1"

  s3_use_path_style           = true
  skip_credentials_validation = true
  skip_metadata_api_check     = true

  endpoints {
    s3             = "http://s3.localhost.localstack.cloud:4566"
  }

  default_tags {
    tags = {
      Environment = "tutorial"
      Project     = "terraform-configure-providers"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We need to specify a service endpoint as well as a few flags like &lt;code&gt;skip_credentials_validation&lt;/code&gt;, none of which are needed in a real environment.&lt;/p&gt;

&lt;h1&gt;
  
  
  Tflocal
&lt;/h1&gt;

&lt;p&gt;LocalStack provides a wrapper, &lt;code&gt;tflocal&lt;/code&gt;, to run Terraform against LocalStack. You can read more about it &lt;a href="https://docs.localstack.cloud/aws/integrations/infrastructure-as-code/terraform/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
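
&lt;p&gt;Installing the wrapper is a single pip package (per the LocalStack docs; this assumes you have Python and pip available):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install terraform-local
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;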

&lt;p&gt;Once everything is set up, you can clean up your &lt;code&gt;.tf&lt;/code&gt; files. The only difference is the command you use to run Terraform.&lt;/p&gt;

&lt;p&gt;To initialize:&lt;br&gt;
&lt;code&gt;tflocal init&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;To apply:&lt;br&gt;
&lt;code&gt;tflocal apply&lt;/code&gt;&lt;/p&gt;
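
&lt;p&gt;With tflocal wiring up the endpoints at runtime, the provider block can shrink to something like this (a sketch based on the earlier example; tflocal injects the LocalStack endpoints and test credentials for you):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      Environment = "tutorial"
      Project     = "terraform-configure-providers"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;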

&lt;h1&gt;
  
  
  Closing
&lt;/h1&gt;

&lt;p&gt;Luckily, I realized this early and moved away from the dark side. By using tflocal, things become much simpler. The Terraform files stay clean and close to real AWS usage, without extra LocalStack settings inside the code.&lt;/p&gt;

&lt;p&gt;LocalStack details are handled when running Terraform, not when writing it. This keeps the setup easy to understand, easier to maintain, and closer to how things should work in the real environment.&lt;/p&gt;

</description>
      <category>localstack</category>
      <category>terraform</category>
    </item>
    <item>
      <title>Help me, Localstack. You're my only hope.</title>
      <dc:creator>KILLALLSKYWALKER</dc:creator>
      <pubDate>Tue, 13 Jan 2026 22:53:30 +0000</pubDate>
      <link>https://forem.com/killallskywalker/help-me-localstack-youre-my-only-hope-24ba</link>
      <guid>https://forem.com/killallskywalker/help-me-localstack-youre-my-only-hope-24ba</guid>
      <description></description>
      <category>localstack</category>
      <category>aws</category>
      <category>terraform</category>
    </item>
    <item>
      <title>Help me, Localstack. You're my only hope.</title>
      <dc:creator>KILLALLSKYWALKER</dc:creator>
      <pubDate>Sun, 11 Jan 2026 23:01:52 +0000</pubDate>
      <link>https://forem.com/killallskywalker/help-me-localstack-youre-my-only-hope-54pk</link>
      <guid>https://forem.com/killallskywalker/help-me-localstack-youre-my-only-hope-54pk</guid>
      <description>&lt;p&gt;Last year, I learned Terraform with a group on Discord. The person who taught us was very knowledgeable and explained things clearly. It was an introduction to Terraform, but it gave a strong foundation.&lt;/p&gt;

&lt;p&gt;At that time, we learned how to set up an AWS EC2 instance inside a specific VPC and subnet. We used the latest Amazon Linux 2023 AMI and stored the Terraform state in an S3 bucket. This was a very good example to understand how Terraform helps manage and create infrastructure in a structured and repeatable way.&lt;/p&gt;

&lt;p&gt;However, I am a normal person. If I do not practice what I learn, I will slowly forget it. In my current daily job, I do not work with AWS at all. Because of that, I need a way to keep practicing so my knowledge stays fresh and I can continue improving this skill.&lt;/p&gt;

&lt;p&gt;To practice properly, I would normally need an AWS account. But using a real AWS account means I must be very careful to avoid unexpected charges, even though there is a free tier.&lt;/p&gt;

&lt;p&gt;This is where LocalStack helps a lot. LocalStack allows us to emulate AWS services locally on our own machine. By using Terraform together with LocalStack, I can safely practice AWS infrastructure without worrying about costs. I believe this is one of the safest and best ways to learn and improve Terraform skills for AWS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;Below is the simplest setup to start using Terraform with LocalStack.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Install LocalStack
&lt;/h3&gt;

&lt;p&gt;Install LocalStack based on your operating system. Please refer to the official documentation below and follow the instructions for your platform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.localstack.cloud/aws/getting-started/installation/" rel="noopener noreferrer"&gt;https://docs.localstack.cloud/aws/getting-started/installation/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Verify LocalStack Installation
&lt;/h3&gt;

&lt;p&gt;After installation, make sure LocalStack is running correctly.&lt;/p&gt;

&lt;p&gt;Start LocalStack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;localstack start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In another terminal, verify it is working:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl http://localhost:4566/_localstack/health
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If LocalStack is running, you should see a JSON response showing all available services. This means LocalStack is working correctly.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Set Up AWS CLI Profile (Using Original AWS CLI)
&lt;/h3&gt;

&lt;p&gt;Since we are not using &lt;code&gt;awslocal&lt;/code&gt; and will use the original AWS CLI, we still need to configure an AWS profile as usual. LocalStack does not validate credentials, so dummy values are fine.&lt;/p&gt;

&lt;p&gt;Create a new AWS profile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws configure --profile localstack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use the following example values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS Access Key ID: &lt;code&gt;test&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;AWS Secret Access Key: &lt;code&gt;test&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Default region name: &lt;code&gt;us-east-1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Default output format: &lt;code&gt;json&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After that, you &lt;strong&gt;must&lt;/strong&gt; update the AWS config file to point to the LocalStack endpoint.&lt;/p&gt;

&lt;p&gt;Open &lt;code&gt;~/.aws/config&lt;/code&gt; and add the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[profile localstack]
region = us-east-1
output = json
endpoint_url = http://localhost:4566
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells the AWS CLI to send all requests for this profile to LocalStack instead of real AWS.&lt;/p&gt;

&lt;p&gt;This profile will be used by Terraform and AWS CLI when working with LocalStack.&lt;/p&gt;
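
&lt;p&gt;As a quick sanity check (assuming LocalStack is running on port 4566), you can list buckets with the new profile; an empty result with no credential errors means the profile is wired up correctly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws s3 ls --profile localstack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;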

&lt;h3&gt;
  
  
  4. Ensure AWS CLI Version Supports &lt;code&gt;endpoint_url&lt;/code&gt; in Config
&lt;/h3&gt;

&lt;p&gt;Make sure you are using &lt;strong&gt;AWS CLI v2.13.0 or later&lt;/strong&gt;. Versions older than this may not fully support the &lt;code&gt;endpoint_url&lt;/code&gt; value inside the AWS config file.&lt;/p&gt;

&lt;p&gt;Check your AWS CLI version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws --version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see output similar to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws-cli/2.13.0 Python/3.x
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your version is lower than &lt;strong&gt;2.13.0&lt;/strong&gt;, please upgrade AWS CLI before continuing.&lt;/p&gt;

&lt;p&gt;Once this is done, your AWS CLI and profile are ready to work with LocalStack and Terraform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create an S3 Bucket with Terraform
&lt;/h2&gt;

&lt;p&gt;After the setup is complete, we can start using Terraform with LocalStack. In this example, we will create an S3 bucket.&lt;/p&gt;

&lt;h3&gt;
  
  
  Terraform Files
&lt;/h3&gt;

&lt;p&gt;You will need two files:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;terraform.tf&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~&amp;gt; 5.92"
    }
  }
  required_version = "&amp;gt;= 1.2"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;main.tf&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provider "aws" {
  profile = "localstack"
  region  = "us-east-1"

  s3_use_path_style           = true
  skip_credentials_validation = true
  skip_metadata_api_check     = true

  endpoints {
    s3 = "http://s3.localhost.localstack.cloud:4566"
  }

  default_tags {
    tags = {
      Environment = "tutorial"
      Project     = "localstack-terraform"
    }
  }
}

resource "aws_s3_bucket" "example" {
  bucket_prefix = "localstack-terraform"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Explanation of &lt;code&gt;main.tf&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Below is a simple explanation for each part of &lt;code&gt;main.tf&lt;/code&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Provider Block
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provider "aws" {
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells Terraform that we are using the AWS provider.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;profile = "localstack"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This uses the AWS CLI profile named &lt;code&gt;localstack&lt;/code&gt;, which we configured earlier.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;region = "us-east-1"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sets the AWS region. LocalStack requires a region value even though it runs locally.&lt;/p&gt;

&lt;h4&gt;
  
  
  S3 Specific Settings
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;s3_use_path_style = true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This forces Terraform to use path-style S3 URLs. LocalStack requires this to work correctly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;skip_credentials_validation = true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This skips AWS credential validation because LocalStack does not check real credentials.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;skip_metadata_api_check = true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This disables calls to the EC2 metadata service, which is not needed when running locally.&lt;/p&gt;

&lt;h4&gt;
  
  
  Endpoints Configuration
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;endpoints {
  s3 = "http://s3.localhost.localstack.cloud:4566"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells Terraform to send all S3 requests to LocalStack instead of real AWS S3.&lt;/p&gt;

&lt;h4&gt;
  
  
  Default Tags
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;default_tags {
  tags = {
    Environment = "tutorial"
    Project     = "localstack-terraform"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These tags will be automatically added to all AWS resources created by this provider.&lt;/p&gt;

&lt;h4&gt;
  
  
  S3 Bucket Resource
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_s3_bucket" "example" {
  bucket_prefix = "localstack-terraform"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates an S3 bucket. Terraform will generate a unique bucket name that starts with &lt;code&gt;localstack-terraform&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;At this point, you can run &lt;code&gt;terraform init&lt;/code&gt; and &lt;code&gt;terraform apply&lt;/code&gt; to create the S3 bucket in LocalStack.&lt;/p&gt;
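
&lt;p&gt;After &lt;code&gt;terraform apply&lt;/code&gt; finishes, you can confirm the bucket was created in LocalStack with the AWS CLI profile from earlier (the name will start with &lt;code&gt;localstack-terraform&lt;/code&gt; followed by a generated suffix):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws s3 ls --profile localstack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;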

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;This example is only an introduction to show how LocalStack can help you test and try AWS services locally without using a real AWS account.&lt;/p&gt;

&lt;p&gt;However, there is an important catch. When using Terraform this way, you need to manually configure service endpoints, provider flags, and other LocalStack-specific settings. Over time, this adds unnecessary configuration and maintenance to your Terraform code.&lt;/p&gt;

&lt;p&gt;To avoid this, LocalStack provides &lt;strong&gt;tflocal&lt;/strong&gt;, a wrapper designed to run Terraform directly against LocalStack. It automatically handles service endpoints and reduces the need for extra configuration. In real projects, using &lt;code&gt;tflocal&lt;/code&gt; is usually a cleaner and easier approach when working with LocalStack.&lt;/p&gt;

</description>
      <category>localstack</category>
      <category>aws</category>
      <category>terraform</category>
    </item>
    <item>
      <title>May the Blocks Be With You: Parallel Processing in Mage AI</title>
      <dc:creator>KILLALLSKYWALKER</dc:creator>
      <pubDate>Mon, 22 Sep 2025 16:00:00 +0000</pubDate>
      <link>https://forem.com/killallskywalker/may-the-blocks-be-with-you-parallel-processing-in-mage-ai-1fnf</link>
      <guid>https://forem.com/killallskywalker/may-the-blocks-be-with-you-parallel-processing-in-mage-ai-1fnf</guid>
      <description>&lt;p&gt;In tourism, many people still get cheated by scam companies. This happens a lot with umrah packages, tourist guides, and travel agencies. Why? Because it is not easy to check if a company is legal or not.&lt;/p&gt;

&lt;p&gt;The government has official websites with lists of banned, blacklisted, or registered names. There is a search function, but the problem is the data is split into many different lists. For example, one list for tourist guides, one list for umrah, one list for travel agencies. You must choose the right list first, and then search. Also, each list uses pagination. That means you still need to click page by page, which is slow and not friendly.&lt;/p&gt;

&lt;p&gt;I started to think: what if we made one simple website where people just type a keyword, and it shows whether the name exists in any of the lists? This way, travelers can quickly check if a company is real or a scam. By the way, I’m doing this for fun; since I can’t go anywhere during the school holidays (the roads are all jammed), I’m just spending my time on a little project.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Challenge
&lt;/h2&gt;

&lt;p&gt;The most tedious part of this is actually gathering all the related data. Copying and pasting by hand is possible and easy, but it is too much work; it would turn a fun project into a depressing one, haha. So why not use Mage AI, since I already used it for a previous data project?&lt;/p&gt;

&lt;p&gt;At first, I created a normal block with a loop. It worked, but it was too slow because it went step by step through every page (fine for lists with only a few pages, painful otherwise). Then I realized: why not try a dynamic block? With dynamic blocks, I can run many requests at the same time with parallel processing. Much faster, much smarter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mage AI Dynamic Blocks
&lt;/h2&gt;

&lt;p&gt;Here is where Mage AI helps. Mage AI has dynamic blocks, and with this feature we can scrape many pages in parallel, which is faster and easier. To learn more about Mage AI dynamic blocks, &lt;a href="https://docs.mage.ai/guides/blocks/dynamic-blocks" rel="noopener noreferrer"&gt;go here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This is how it works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate a list of URLs, including the pagination parameter, using a loader block. Keep in mind a dynamic block must return a list of two lists of dictionaries.&lt;/li&gt;
&lt;li&gt;Scrape each page from the URLs stored in the dictionaries, then reduce the data into one set.&lt;/li&gt;
&lt;li&gt;Export the data to the destination.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;First step&lt;/strong&gt;&lt;br&gt;
Create a loader block and make sure you set it as dynamic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fct8ui2kxm5h6b5csbwci.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fct8ui2kxm5h6b5csbwci.jpg" alt="Enable Dynamic Block" width="776" height="628"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once it is set as dynamic, you can write the loader below. Its purpose is to collect all the target URLs that we want to scrape.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from typing import Dict, List
import requests
from bs4 import BeautifulSoup

@data_loader
def load_data(*args, **kwargs) -&amp;gt; List[List[Dict]]:
    """
    This loader prepares tasks for scraping multiple MOTAC pages.
    Each entry in 'urls' becomes a separate block run if used with dynamic blocks.
    """
    url = "https://the-targeted-url"

    response = requests.get(url, timeout=20)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    record_tag = soup.select_one("li.uk-disabled span")
    jumlah_rekod = None

    if record_tag:
        text = record_tag.get_text(strip=True)  
        jumlah_rekod = int(text.split(":")[-1].strip())  

    urls = []
    for offset in range(0, jumlah_rekod, 20):  
        if offset == 0:
            urls.append(url)
        else:
            urls.append(f"{url}?s=&amp;amp;n=&amp;amp;v={offset}")

    tasks = []
    metadata = []

    for idx, url in enumerate(urls, start=1):
        tasks.append(dict(id=idx, url=url))
        metadata.append(dict(block_uuid=f"scrape_page_{idx}"))

    return [
        tasks,
        metadata
    ]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Second Step&lt;/strong&gt;&lt;br&gt;
Create a transformer. This transformer scrapes each page and extracts all the data from it. It is automatically set as dynamic when the first block is dynamic. The only thing we need to do is reduce the output. We reduce because we want to export in one step, so we don't spawn an extra export block per page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fezm820c87rqaxcrp6vnk.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fezm820c87rqaxcrp6vnk.jpg" alt="Reduce Output" width="640" height="694"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests
from bs4 import BeautifulSoup

@transformer
def scrape_page(row, *args, **kwargs):
    url = row["url"]
    response = requests.get(url, timeout=30)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")

    results = []
    table = soup.find("table")

    if table:
        headers = [th.get_text(strip=True) for th in table.find_all("th")]
        for tr in table.find_all("tr")[1:]:  # skip header row
            cells = [td.get_text(strip=True) for td in tr.find_all("td")]
            if cells:
                results.append(dict(zip(headers, cells)))

    return {
        "page_id": row["id"],
        "url": url,
        "records": results,
    }

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Third Step&lt;/strong&gt;&lt;br&gt;
Add another block to clean up the data format, column names, and so on before exporting.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer
if 'test' not in globals():
    from mage_ai.data_preparation.decorators import test

import pandas as pd

@transformer
def transform(data, *args, **kwargs):
    df = data

    if '#' in df.columns:
        df = df.drop(columns=['#'])

    df = df.rename(columns={
        'Nama': 'nama',
        'No. TG': 'no_tg',
        'Tempoh Sah': 'tempoh_sah',
        'Tarikh Batal': 'tarikh_batal',
        'Seksyen': 'seksyen'
    })

    for col in ['tempoh_sah', 'tarikh_batal']:
        if col in df.columns:
            df[col] = pd.to_datetime(df[col], format='%d/%m/%y', errors='coerce')

    return df

@test
def test_output(output, *args) -&amp;gt; None:
    # Ensure required columns exist
    required_cols = ['nama', 'no_tg', 'tempoh_sah', 'tarikh_batal', 'seksyen']
    for col in required_cols:
        assert col in output.columns, f'Missing column: {col}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fourth Step&lt;/strong&gt;&lt;br&gt;
For now, since this is just a holiday project, I will simply export with a full load. No worries; if I am in the mood, I will write a better approach for this :)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from mage_ai.settings.repo import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.postgres import Postgres
from pandas import DataFrame
from os import path

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data_to_postgres(df: DataFrame, **kwargs) -&amp;gt; None:
    """
    Template for exporting data to a PostgreSQL database.
    Specify your configuration settings in 'io_config.yaml'.

    Docs: https://docs.mage.ai/design/data-loading#postgresql
    """
    schema_name = 'public'  
    table_name = 'pemandu_pelancong'  
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    with Postgres.with_config(ConfigFileLoader(config_path, config_profile)) as loader:
        loader.export(
            df,
            schema_name,
            table_name,
            index=False,  
            if_exists='replace',  
        )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why Use Dynamic Blocks for Scraping?
&lt;/h2&gt;

&lt;p&gt;Dynamic blocks are powerful because they make scraping large datasets much faster. Instead of one request after another, you can run many requests at the same time. For websites with hundreds of pages, this saves a lot of time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But there are also things to keep in mind&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Respect rate limits: Some websites may block you if you send too many requests at once&lt;/li&gt;
&lt;li&gt;Error handling: Always add retries in case some requests fail&lt;/li&gt;
&lt;li&gt;Data consistency: Make sure to clean and validate data before saving&lt;/li&gt;
&lt;li&gt;Ethics and legality: Always check if scraping the website is allowed&lt;/li&gt;
&lt;/ol&gt;
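
&lt;p&gt;For point 2, a minimal way to add retries to the &lt;code&gt;requests&lt;/code&gt; calls in the transformer is a shared session with urllib3's &lt;code&gt;Retry&lt;/code&gt; (a sketch; the retry count and backoff values here are arbitrary, tune them for the site you scrape):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 3 times on common transient errors, with exponential backoff.
retries = Retry(total=3, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])

session = requests.Session()
session.mount("http://", HTTPAdapter(max_retries=retries))
session.mount("https://", HTTPAdapter(max_retries=retries))

# Then use session.get(url, timeout=30) in place of requests.get(...) in the blocks above.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;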

&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;This little holiday project showed me how useful Mage AI’s dynamic blocks can be. With just a few blocks, I turned a slow and boring manual process into a fast, automated pipeline. The scraped data can now be used to build a simple search directory, helping people quickly check if a company is real or a scam.&lt;/p&gt;

&lt;p&gt;Dynamic blocks are not only fun, they’re practical, powerful, and a great tool for anyone working with pagination or large API calls.&lt;/p&gt;

&lt;p&gt;So remember: when you face hundreds of pages, don’t suffer like Anakin. Let the blocks be with you.&lt;/p&gt;

</description>
      <category>scraping</category>
      <category>mageai</category>
      <category>dynamicblock</category>
    </item>
    <item>
      <title>The Rise of Visual Testing: Pest in Laravel</title>
      <dc:creator>KILLALLSKYWALKER</dc:creator>
      <pubDate>Mon, 15 Sep 2025 16:00:00 +0000</pubDate>
      <link>https://forem.com/killallskywalker/the-rise-of-visual-testing-pest-in-laravel-p0b</link>
      <guid>https://forem.com/killallskywalker/the-rise-of-visual-testing-pest-in-laravel-p0b</guid>
      <description>&lt;p&gt;Previously we use tools like Diffy or Percy to do visual regression testing . These tools very great , but we depending on another service , extra cost and also a bit of context switching since everything had to be set up outside of our main test suite.&lt;/p&gt;

&lt;p&gt;I have seen a lot about Pest, but in my Laravel projects I still used PHPUnit. What really caught my attention is Pest 4's new visual regression testing feature. Since I had been relying on Diffy and Percy before, the idea of running these tests directly in my PHP test suite sounded too good to ignore.&lt;/p&gt;

&lt;p&gt;So I gave it a try, and it really is easy! Another thing that stood out is how Pest writes tests. Instead of the heavy PHPUnit class-and-method structure, Pest feels more like RSpec (from Ruby) or Jest (from JavaScript). Since I actively use both RSpec and Jest in my day-to-day work, the syntax felt very natural, almost like I was still in those environments, but now directly in PHP.&lt;/p&gt;

&lt;h2&gt;
  
  
  Visual Regression Testing with Pest 4
&lt;/h2&gt;

&lt;p&gt;Since this test is visual, I changed my usual approach a little: instead of changing the text &lt;code&gt;Anakin Skywalker&lt;/code&gt; to &lt;code&gt;Darth Vader&lt;/code&gt;, I replaced the image anakin.jpg with a different image under the same name.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn3dk56zhs6bckuwpuhnm.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn3dk56zhs6bckuwpuhnm.jpg" alt="Anakin Skywalker" width="800" height="147"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The test is as simple as this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;it('displays anakin skywalker as a jedi knight', function () {
    $page = visit(['/']);

    $page-&amp;gt;assertScreenshotMatches();
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first time we run this, Pest saves a baseline screenshot.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1johyby7e1nuuroo24lt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1johyby7e1nuuroo24lt.png" alt="Before we change the image" width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As long as no difference is detected in the screenshot, the test passes. However, when you change the image, the test fails.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffdjf7v8crpw5iyrj85lr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffdjf7v8crpw5iyrj85lr.jpg" alt="Fail Test" width="800" height="63"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’re sure the change is intentional, update the snapshot with:&lt;br&gt;
&lt;code&gt;./vendor/bin/pest --update-snapshots&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;If the change is not expected, re-run the test with:&lt;br&gt;
&lt;code&gt;./vendor/bin/pest --diff&lt;/code&gt;&lt;br&gt;
to review the differences.&lt;/p&gt;

&lt;p&gt;This is an example of the output when you run &lt;code&gt;--diff&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9kxusetx4nh8oa0azg3t.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9kxusetx4nh8oa0azg3t.jpg" alt="Diff" width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the change after updating the snapshot:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvjqbdkzzd0q6fmpuo2j.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvjqbdkzzd0q6fmpuo2j.jpg" alt="Updated image" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Visual regression testing used to require external tools like Diffy or Percy. Those tools are good, but &lt;a href="https://pestphp.com/docs/pest-v4-is-here-now-with-browser-testing#content-visual-regression-testing" rel="noopener noreferrer"&gt;Pest 4&lt;/a&gt; gives us a new option. Now we can run visual tests inside our PHP test suite. No more jumping between services, paying extra money, or losing focus. Everything stays in one place with our tests.&lt;/p&gt;

&lt;p&gt;The best thing for me is how simple it works. Just write a test, run it, and Pest saves the snapshots. The syntax is also easy, closer to RSpec and Jest than heavy PHPUnit classes. For me, it feels lighter and more fun.&lt;/p&gt;

&lt;p&gt;Because of this, I think Pest 4 is really worth trying, especially if you still use PHPUnit. And if you already use Pest, then this new visual regression testing can be the extra feature you did not know you needed.&lt;/p&gt;

</description>
      <category>laravel</category>
      <category>visualtesting</category>
    </item>
    <item>
      <title>May the Auth Be With You: Securing Mage AI</title>
      <dc:creator>KILLALLSKYWALKER</dc:creator>
      <pubDate>Sun, 14 Sep 2025 16:00:00 +0000</pubDate>
      <link>https://forem.com/killallskywalker/may-the-auth-be-with-you-securing-mage-ai-3ole</link>
      <guid>https://forem.com/killallskywalker/may-the-auth-be-with-you-securing-mage-ai-3ole</guid>
      <description>&lt;p&gt;Previously i talk how easier to get start with Mage AI and ready it for prod . By default Mage AI can be access without any authentication . This makes it quick and convenient to get started during development, but it also means that anyone who can access the instance has full control. For production environments, this is not secure. In this article, we go through how to enable and configure user authentication in Mage AI to properly secure your pipelines . &lt;/p&gt;

&lt;p&gt;It is still easy. All you need to do is make sure your environment contains this variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;REQUIRE_USER_AUTHENTICATION=1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By default, the owner user will be set as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Email    : admin@admin.com
Password : admin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can change these defaults by providing the following environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DEFAULT_OWNER_EMAIL
DEFAULT_OWNER_PASSWORD
DEFAULT_OWNER_USERNAME
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But here is the catch: at the moment there is no official password reset for the open source version. So it gets a little tricky when you forget your owner password, since without the owner you cannot reset other users' passwords either.&lt;/p&gt;

&lt;p&gt;You can generate a new hash and salt using this script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import bcrypt


def generate_salt() -&amp;gt; bytes:
    # bcrypt.gensalt returns the salt as bytes
    return bcrypt.gensalt(rounds=14)


def create_bcrypt_hash(password: str, salt: bytes) -&amp;gt; str:
    password_bytes = password.encode()
    password_hash_bytes = bcrypt.hashpw(password_bytes, salt)
    return password_hash_bytes.decode()


password = "MyNewSecret123!"

password_salt = generate_salt()
password_hash = create_bcrypt_hash(password, password_salt)

print(password_salt.decode(), password_hash)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you have these values, you can update the user's password hash and salt directly in the database. Of course, only the database admin should be able to do this :) &lt;/p&gt;

</description>
      <category>mageai</category>
      <category>etl</category>
    </item>
    <item>
      <title>The Chosen One : Mage AI</title>
      <dc:creator>KILLALLSKYWALKER</dc:creator>
      <pubDate>Fri, 12 Sep 2025 16:00:00 +0000</pubDate>
      <link>https://forem.com/killallskywalker/the-chosen-one-mage-ai-1n1c</link>
      <guid>https://forem.com/killallskywalker/the-chosen-one-mage-ai-1n1c</guid>
      <description>&lt;p&gt;As i write previously how i found Mage AI , i will try to explain more the reason why it the chosen one , at least for me . &lt;/p&gt;

&lt;p&gt;You know there is a debate about who is actually the chosen one, Anakin or Luke, but of course it comes back to how you see it. Same with tools: the right tool for the right job.&lt;/p&gt;

&lt;h2&gt;
  
  
  The chosen one
&lt;/h2&gt;

&lt;p&gt;When I tried Mage AI, from the very first run I already felt it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Really simple to start: with one docker command, or with docker compose, you can already use it end to end&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Interactive development: it uses notebook-style blocks, so it is really easy to test and get fast feedback&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Production ready: scheduling, monitoring, built-in retries, notifications, and a lot of connectors/plugins (of course some tweaks are needed, but really minimal)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Team friendly: really easy for beginners to pick up (when I built the pipeline, I was the only person on the team who knew the tool, but after one sharing session, bringing the other devs on board was really easy)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of struggling to learn the tool, I could build the pipeline right away.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Simple It Actually Is
&lt;/h2&gt;

&lt;p&gt;This is just a demo to show why it is simple, interactive, production ready, and team friendly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -it -p 6789:6789 -v $(pwd):/home/src mageai/mageai /app/run_app.sh mage start demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once it is running, you can go to &lt;code&gt;localhost:6789&lt;/code&gt; and you will see this dashboard:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fluol1jnx5v8oo6pet5mu.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fluol1jnx5v8oo6pet5mu.jpg" alt="Mage AI Dashboard" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To add your first pipeline, click the pipelines menu.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl0ullh6r5jyj6gnxgoc1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl0ullh6r5jyj6gnxgoc1.jpg" alt="Pipelines" width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you are on the pipelines page, click the new button and fill in the information like below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftkh3l5xpqz5xtvsy46ls.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftkh3l5xpqz5xtvsy46ls.jpg" alt="First pipeline" width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once done, you can see your pipeline detail page, where you have the option to add blocks like data loader, transformer, data exporter, and others. For now we just need three blocks: data loader, transformer, and data exporter. Once we set up these three, we already have a complete set of components covering the ETL flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Extract (using the loader block)&lt;/li&gt;
&lt;li&gt;Transform (using the transformer block)&lt;/li&gt;
&lt;li&gt;Load (using the exporter block)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Actually, you can chain more blocks for more complex and dynamic flows, but for this demo these three are sufficient.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqqfvyruq2gvhr6rwel7s.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqqfvyruq2gvhr6rwel7s.jpg" alt="Complete Block Setup" width="800" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Loader Block&lt;/strong&gt;&lt;br&gt;
Ignore the &lt;code&gt;@test&lt;/code&gt; decorator for now; it is not covered in this demo. This is where we load the data from the SWAPI API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import io
import pandas as pd
import requests
if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader
if 'test' not in globals():
    from mage_ai.data_preparation.decorators import test


@data_loader
def load_data_from_api(*args, **kwargs):
    """
    Load Star Wars planets data from SWAPI
    """
    url = 'https://swapi.info/api/planets'
    response = requests.get(url)

    # Check if request was successful
    if response.status_code != 200:
        raise Exception(f"API request failed with status code: {response.status_code}")

    data = response.json()

    df = pd.DataFrame(data)

    return df


@test
def test_output(output, *args) -&amp;gt; None:
    """
    Template code for testing the output of the block.
    """
    assert output is not None, 'The output is undefined'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Transformer Block&lt;/strong&gt;&lt;br&gt;
You can directly use the template provided by the block for transformation. Don't worry, you can still add your own methods for custom transformations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from mage_ai.data_cleaner.transformer_actions.base import BaseAction
from mage_ai.data_cleaner.transformer_actions.constants import ActionType, Axis
from mage_ai.data_cleaner.transformer_actions.utils import build_transformer_action
from pandas import DataFrame

if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer
if 'test' not in globals():
    from mage_ai.data_preparation.decorators import test


@transformer
def execute_transformer_action(df: DataFrame, *args, **kwargs) -&amp;gt; DataFrame:
    """
    Execute Transformer Action: ActionType.REMOVE

    Docs: https://docs.mage.ai/guides/transformer-blocks#remove-columns
    """
    action = build_transformer_action(
        df,
        action_type=ActionType.REMOVE,
        arguments=['residents','films','url'],  # Specify columns to remove
        axis=Axis.COLUMN,
    )

    return BaseAction(action).execute(df)


@test
def test_output(output, *args) -&amp;gt; None:
    """
    Template code for testing the output of the block.
    """
    assert output is not None, 'The output is undefined'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
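For intuition, the REMOVE action above is roughly equivalent to a plain pandas `drop`. A minimal sketch on a toy frame shaped like the SWAPI planets data (the columns here are a hypothetical subset):

```python
import pandas as pd

# Toy frame shaped like the SWAPI planets response (hypothetical subset of columns)
df = pd.DataFrame({
    "name": ["Tatooine", "Alderaan"],
    "climate": ["arid", "temperate"],
    "residents": [[], []],
    "films": [[], []],
    "url": ["https://swapi.info/api/planets/1", "https://swapi.info/api/planets/2"],
})

# Plain-pandas equivalent of the REMOVE transformer action: drop the listed columns
cleaned = df.drop(columns=["residents", "films", "url"])
print(list(cleaned.columns))  # → ['name', 'climate']
```

The transformer-action helper just wraps this kind of operation in a declarative form that Mage can display and re-run per block.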



&lt;p&gt;&lt;strong&gt;Export Block&lt;/strong&gt;&lt;br&gt;
The demo only exports to CSV. In a real case we would export to the real destination.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from mage_ai.io.file import FileIO
from pandas import DataFrame

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data_to_file(df: DataFrame, **kwargs) -&amp;gt; None:
    """
    Template for exporting data to filesystem.

    Docs: https://docs.mage.ai/design/data-loading#fileio
    """
    filepath = 'star_wars.csv'
    FileIO().export(df, filepath)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you have all the blocks and the pipeline is complete. What is left is to set the schedule for when you want it to be triggered. Once you set that, it is good to go :) So can you see how simple and easy it is to run an end-to-end pipeline with Mage AI? The chosen one, Mage AI.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>The Data Awakens: My First Pipeline with Mage AI</title>
      <dc:creator>KILLALLSKYWALKER</dc:creator>
      <pubDate>Thu, 11 Sep 2025 16:00:00 +0000</pubDate>
      <link>https://forem.com/killallskywalker/the-data-awakens-my-first-pipeline-with-mage-ai-1333</link>
      <guid>https://forem.com/killallskywalker/the-data-awakens-my-first-pipeline-with-mage-ai-1333</guid>
      <description>&lt;p&gt;Most of my work is doing backend development and front end development , i had zero experience with ETL . Most of the the time is build an api , web app , debugging application  error , google how to center div and etc etc . &lt;/p&gt;

&lt;p&gt;Data engineering? For me it was something I thought was reserved for a different kind of developer, a specialist who lived in SQL, optimized queries for breakfast, and wrangled giant data clusters.&lt;/p&gt;

&lt;p&gt;Since my current role was not too busy, one day I volunteered to take on the task of building our first data pipeline.&lt;/p&gt;

&lt;p&gt;At first, I had no idea what I was getting myself into. I didn’t even know what tools people normally used. Was I supposed to write a lot of cron jobs? Build custom scripts? Install some massive framework like Airflow that looked intimidating just from the docs?&lt;/p&gt;

&lt;p&gt;When I first saw Airflow, I wondered: did I make the wrong choice by volunteering? Haha. Considering the project timeline, I knew there was a lot I would need to learn to use Airflow, especially since this needed to be in production on a short timeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mage AI
&lt;/h2&gt;

&lt;p&gt;Of course, there were other tools like Luigi, Prefect, and Dagster. I gave each of them a quick spin, just a simple “hello world” test. My reasoning was simple: if even the hello world felt complicated, how could I possibly feel confident using that tool for a real project with a tight deadline? On top of that, I also had to consider team adaptability, since no one on the team was familiar with ETL either.&lt;/p&gt;

&lt;p&gt;That’s where &lt;a href="https://www.mage.ai/" rel="noopener noreferrer"&gt;Mage AI&lt;/a&gt; stood out. From the very first try, it felt really easy and straightforward.&lt;/p&gt;

&lt;p&gt;If you want to try it, just run this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -it -p 6789:6789 -v $(pwd):/home/src mageai/mageai /app/run_app.sh mage start my-first-etl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once it is running, you can access the Mage AI dashboard and start building your first pipeline. It is really easy to start and play around with, especially for a first-timer like me exploring ETL tools.&lt;/p&gt;

&lt;p&gt;Of course, this is only a tool. I also spent a lot of time understanding the fundamentals of ETL: data modeling, data loading strategies, transformation, and orchestration &amp;amp; scheduling (which Mage AI handles). I enrolled in the &lt;a href="https://kotaksakti.com/" rel="noopener noreferrer"&gt;Kotak Sakti Bootcamp&lt;/a&gt;, which had a big impact on me and helped strengthen my understanding of what a data engineer needs.&lt;/p&gt;

&lt;p&gt;I will try to share more of my experience working with Mage AI in the future.&lt;/p&gt;

</description>
      <category>mageai</category>
    </item>
    <item>
      <title>The Dark Side : Manual Rules</title>
      <dc:creator>KILLALLSKYWALKER</dc:creator>
      <pubDate>Wed, 10 Sep 2025 16:00:00 +0000</pubDate>
      <link>https://forem.com/killallskywalker/the-dark-side-manual-rules-4bcc</link>
      <guid>https://forem.com/killallskywalker/the-dark-side-manual-rules-4bcc</guid>
      <description>&lt;p&gt;Previously i share with you the usage of &lt;a href="https://postgresql-anonymizer.readthedocs.io/en/stable/" rel="noopener noreferrer"&gt;PostgreSQL Anonymizer&lt;/a&gt; . &lt;/p&gt;

&lt;p&gt;At that time, the way we added the rules was manual. When the schema was small, this approach worked fine: few fields, few changes, still easy to maintain.&lt;/p&gt;

&lt;p&gt;But in reality, database schemas never stay static. Day by day, new fields get added or modified. This introduces a serious risk: newly added columns might expose raw PII if we forget to declare anonymization rules, and maintaining a growing set of SECURITY LABELs quickly becomes error-prone and hard to track.&lt;/p&gt;

&lt;h2&gt;
  
  
  Automating the Anonymization Rules
&lt;/h2&gt;

&lt;p&gt;The easiest way would be to just use this, but it will not work properly in practice, because remember that we may have constraints on our tables. So this is not an option we can use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ALTER DATABASE postgres SET anon.privacy_by_default = true;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So you still need to write your own rules, and this rule set keeps growing day by day.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SECURITY LABEL FOR anon ON COLUMN candidates.last_name IS 'MASKED WITH FUNCTION anon.dummy_last_name()';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When I say manual, I mean we tracked these rules in a repo whenever we added or changed something. Changes were tracked in git, but here is the catch: since this was a separate repo, the team tended to ignore it and it became disconnected. So the rules were sometimes really outdated too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution
&lt;/h2&gt;

&lt;p&gt;Since the project uses Laravel, and as you know Laravel has solid migrations, I used Laravel migrations to add or remove rules whenever we added or removed a column. That way, every dev working on the job portal knows what was added or removed and what needs to stay in sync with the anonymizer rules. This is an example of how I did it at the time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    /**
     * Run the migrations.
     */
    public function up(): void
    {
        Schema::table('users', function (Blueprint $table) {
            $table-&amp;gt;string('identification_number')-&amp;gt;nullable()-&amp;gt;after('email');
        });

        DB::statement("SECURITY LABEL FOR anon ON COLUMN users.identification_number IS 'MASKED WITH FUNCTION anon.partial(identification_number,2,$$******$$,2)'");
    }

    /**
     * Reverse the migrations.
     */
    public function down(): void
    {
        DB::statement("SECURITY LABEL FOR anon ON COLUMN users.identification_number IS NULL");

        Schema::table('users', function (Blueprint $table) {
            $table-&amp;gt;dropColumn('identification_number');
        });
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this approach we keep track of everything in the migrations, and if someone misses a rule, we can catch it in PR review. We don't need a separate rules repo anymore. I know this is not the sexiest solution, but it can be made cleaner with a helper or trait that writes the security label statements.&lt;/p&gt;
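For intuition, the `anon.partial(identification_number,2,$$******$$,2)` rule in the migration keeps the first and last two characters and replaces the middle with the padding. A rough Python illustration of that behavior (this is not the extension's actual code, just a sketch of the semantics):

```python
def partial_mask(value: str, keep_start: int, pad: str, keep_end: int) -> str:
    # Keep the first/last characters, replace the middle with the padding,
    # mirroring what anon.partial(value, keep_start, pad, keep_end) produces
    return value[:keep_start] + pad + value[-keep_end:]

print(partial_mask("900415101234", 2, "******", 2))  # → 90******34
```

This is why partial masking works well for identifiers: the shape stays recognizable for debugging while the sensitive middle is hidden.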

</description>
      <category>postgres</category>
      <category>postgresqlanonymizer</category>
      <category>laravel</category>
    </item>
    <item>
      <title>The Phantom Records: Hiding Sensitive Data for Debugging</title>
      <dc:creator>KILLALLSKYWALKER</dc:creator>
      <pubDate>Tue, 09 Sep 2025 16:00:00 +0000</pubDate>
      <link>https://forem.com/killallskywalker/the-phantom-records-hiding-sensitive-data-for-debugging-g80</link>
      <guid>https://forem.com/killallskywalker/the-phantom-records-hiding-sensitive-data-for-debugging-g80</guid>
      <description>&lt;p&gt;When I was debugging a patch for our job portal, I found out the only way to be sure it worked was to run it against production like data.At that time our fake seeders also not solid and don't have kind of data that messy and edge cased that real user can create . &lt;/p&gt;

&lt;p&gt;But I also couldn’t just pull raw production into my laptop. That database was full of personal information. If any of it slipped into logs or screenshots, it would be a really big problem . &lt;/p&gt;

&lt;p&gt;Thanks &lt;a href="https://postgresql-anonymizer.readthedocs.io/en/stable/" rel="noopener noreferrer"&gt;PostgreSQL Anonymizer&lt;/a&gt; , you make my life easier at that time and till today :) &lt;/p&gt;

&lt;h2&gt;
  
  
  What is PostgreSQL Anonymizer
&lt;/h2&gt;

&lt;p&gt;An extension to mask or replace personally identifiable information (PII) or commercially sensitive data in a Postgres database using a declarative approach to anonymization.&lt;/p&gt;

&lt;p&gt;You just need to ensure the masking rules implemented directly inside the database schema.&lt;/p&gt;

&lt;h2&gt;
  
  
  Masking Method
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Anonymous Dumps: simply export the masked data into an SQL file&lt;/li&gt;
&lt;li&gt;Static Masking: remove the PII according to the rules&lt;/li&gt;
&lt;li&gt;Dynamic Masking: hide PII only for the masked users&lt;/li&gt;
&lt;li&gt;Masking Views: build dedicated views for the masked users&lt;/li&gt;
&lt;li&gt;Masking Data Wrappers: apply masking rules on external data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can read more about this in the documentation &lt;a href="https://postgresql-anonymizer.readthedocs.io/en/stable/" rel="noopener noreferrer"&gt;PostgreSQL Anonymizer&lt;/a&gt; .&lt;/p&gt;
&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;First, run PostgreSQL with this. For the demo we just use this image to keep things simple, but in a real environment you can follow the steps in the docs based on how you host your PostgreSQL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -d -e POSTGRES_PASSWORD=password -p 6543:5432 registry.gitlab.com/dalibo/postgresql_anonymizer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the container is up, run this to create a table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE public.candidates (
    id BIGSERIAL PRIMARY KEY,
    first_name VARCHAR(100) NOT NULL,
    last_name VARCHAR(100) NOT NULL,
    email VARCHAR(150) UNIQUE NOT NULL,
    phone_number VARCHAR(20),
    date_of_birth DATE,
    national_id VARCHAR(50),   
    address TEXT,
    city VARCHAR(100),
    state VARCHAR(100),
    postal_code VARCHAR(20),
    country VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then insert some data containing PII:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO public.candidates 
(first_name, last_name, email, phone_number, date_of_birth, national_id, address, city, state, postal_code, country) 
VALUES
('Aisha', 'Rahman', 'aisha.rahman@example.com', '+60123456789', '1990-04-15', '900415-10-1234', '12 Jalan Bukit Bintang', 'Kuala Lumpur', 'Wilayah Persekutuan', '55100', 'Malaysia'),
('John', 'Tan', 'john.tan@example.com', '+6598765432', '1985-11-22', '851122-08-5678', '55 Orchard Road', 'Singapore', 'Singapore', '238880', 'Singapore'),
('Mei', 'Chen', 'mei.chen@example.com', '+60179876543', '1993-06-09', '930609-14-1122', '8 Taman Sutera', 'Johor Bahru', 'Johor', '80250', 'Malaysia'),
('Arjun', 'Patel', 'arjun.patel@example.com', '+919812345678', '1992-02-28', 'AADHAR1234567890', '22 MG Road', 'Bengaluru', 'Karnataka', '560001', 'India'),
('Sarah', 'Lim', 'sarah.lim@example.com', '+60123459876', '1995-09-05', '950905-05-3344', '19 Jalan Gasing', 'Petaling Jaya', 'Selangor', '46000', 'Malaysia');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Okay, now you have data to test with. Next, let's enable the extension and create a masked user for anonymization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ALTER DATABASE postgres SET session_preload_libraries TO 'anon';

CREATE EXTENSION IF NOT EXISTS anon;

SELECT anon.init();

CREATE ROLE anonymize_user LOGIN PASSWORD 'password';
ALTER ROLE anonymize_user SET anon.transparent_dynamic_masking = True;
SECURITY LABEL FOR anon ON ROLE anonymize_user IS 'MASKED';

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can set the masking rules. I just add one as a sample; you can add more depending on what you want to mask.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SECURITY LABEL FOR anon ON COLUMN candidates.last_name IS 'MASKED WITH FUNCTION anon.dummy_last_name()';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, for the final step, just use pg_dump:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pg_dump postgres --user anonymize_user --no-security-labels --exclude-extension="anon" --file=postgres_anonymized.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you will get a dump with masked data based on your rules. On top of that, if you connect as anonymize_user, you will see the masked data instead of the real data. There are many more ways you can use this extension. Give it a try. I will write about another way I used it at my previous work in the next article.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>postresqlanonymization</category>
    </item>
    <item>
      <title>Attack of the Clones: From Endless Notifications to Daily Summaries in Laravel</title>
      <dc:creator>KILLALLSKYWALKER</dc:creator>
      <pubDate>Mon, 08 Sep 2025 16:00:00 +0000</pubDate>
      <link>https://forem.com/killallskywalker/attack-of-the-clones-from-endless-notifications-to-daily-summaries-in-laravel-2k8d</link>
      <guid>https://forem.com/killallskywalker/attack-of-the-clones-from-endless-notifications-to-daily-summaries-in-laravel-2k8d</guid>
      <description>&lt;p&gt;In our job portal, every time a candidate applied for a job, we sent an email to the recruiter right away.&lt;/p&gt;

&lt;p&gt;At first, this was fine. Recruiters got updates quickly. But when more people started applying, recruiters with popular jobs were getting dozens or even hundreds of emails in one day.&lt;/p&gt;

&lt;p&gt;Their inbox became full of “clone” emails. Many recruiters felt annoyed and started to ignore the notifications.&lt;/p&gt;

&lt;p&gt;We needed a better way.&lt;/p&gt;

&lt;p&gt;The answer? Send one daily summary email instead of many small ones.&lt;/p&gt;

&lt;h2&gt;
  
  
  Too Many Emails
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Notification::route('mail', $recruiter-&amp;gt;email)
-&amp;gt;notify(new ApplicationNotification($application));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sends one email per application. With 100 applicants → 100 emails. Inbox spam. Recruiters will not be happy.&lt;/p&gt;

&lt;h2&gt;
  
  
  New Way : Daily Summary Notification
&lt;/h2&gt;

&lt;p&gt;Instead of sending one email each time, we gather all applications for a recruiter during the day and send one notification in the evening.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;use App\Models\Recruiter;
use App\Models\Application;
use App\Notifications\ApplicationNotification;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;

class SendDailyApplicationSummary implements ShouldQueue
{
    use Dispatchable, Queueable;

    public function handle()
    {
        Recruiter::chunk(100, function ($recruiters) {
            foreach ($recruiters as $recruiter) {
                $applications = Application::where('recruiter_id', $recruiter-&amp;gt;id)
                    -&amp;gt;whereDate('created_at', today())
                    -&amp;gt;get();

                if ($applications-&amp;gt;isNotEmpty()) {
                    Notification::route('mail', $recruiter-&amp;gt;email)
                        -&amp;gt;notify(new ApplicationNotification($applications));
                }
            }
        });
    }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Of course, we need to make sure our &lt;code&gt;ApplicationNotification&lt;/code&gt; is also updated with logic to summarize the day's applications.&lt;/p&gt;
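The batching idea itself is language-agnostic. A minimal Python sketch of the grouping step, with hypothetical data standing in for the day's applications (this is an illustration of the pattern, not the Laravel code):

```python
from collections import defaultdict

# Hypothetical applications received today: (recruiter_email, candidate_name)
applications = [
    ("recruiter@acme.test", "Aisha"),
    ("recruiter@acme.test", "John"),
    ("hr@globex.test", "Mei"),
]

# Group per recruiter so each one gets a single daily summary
summaries = defaultdict(list)
for recruiter, candidate in applications:
    summaries[recruiter].append(candidate)

for recruiter, candidates in sorted(summaries.items()):
    print(f"{recruiter}: {len(candidates)} new application(s)")
```

The Laravel job above does the same grouping with a `where('recruiter_id', ...)` query per recruiter, then hands the whole collection to one notification.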

&lt;p&gt;Once we have this, we just set a schedule for it using the Laravel scheduler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class Kernel extends ConsoleKernel
{
    /**
     * Define the application's command schedule.
     */
    protected function schedule(Schedule $schedule): void
    {
      $schedule-&amp;gt;job(new SendDailyApplicationSummary)-&amp;gt;dailyAt('18:00');
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now every recruiter will receive one single summary email instead of multiple emails for the applications they received that day.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;See how simple it is with Laravel’s Notification, Scheduler, and Queues. We can send application updates properly as a daily summary, without spamming recruiters.&lt;br&gt;
This way, recruiters stay happy and are not annoyed by our job portal's notifications.&lt;/p&gt;

</description>
      <category>laravel</category>
    </item>
    <item>
      <title>The Staging Strikes Back: Safer Emails in Laravel with Mailpit</title>
      <dc:creator>KILLALLSKYWALKER</dc:creator>
      <pubDate>Sun, 07 Sep 2025 16:00:00 +0000</pubDate>
      <link>https://forem.com/killallskywalker/the-staging-strikes-back-safer-emails-in-laravel-with-mailpit-2bp5</link>
      <guid>https://forem.com/killallskywalker/the-staging-strikes-back-safer-emails-in-laravel-with-mailpit-2bp5</guid>
      <description>&lt;p&gt;When building a job portal , there's so much email transaction that will be send to user . As an example like candidate receive the job status , company view the profile , company receive notification when applicant apply and so many more . &lt;/p&gt;

&lt;p&gt;So our testing needs to cover the emails being sent. When there is a small feature or a small flow to test, QA can just create and use their own dummy emails, but when there are too many tests, QA runs out of emails. And it's not only about that: what happens if QA uses a wrong email that actually belongs to a real user outside, and the email goes out to the wrong recipient?&lt;/p&gt;

&lt;p&gt;Why not use the log driver? That is easy for developers but not for QA. How about Mailtrap? Mailtrap is great and it works, but it has a cost, and we don't want our data to leave our environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mailpit , A New Hope
&lt;/h2&gt;

&lt;p&gt;Mailpit is an open-source SMTP server plus web UI that catches all emails sent from our Laravel app. Instead of leaving your staging environment, every email lands in Mailpit’s inbox at &lt;a href="http://localhost:8025" rel="noopener noreferrer"&gt;http://localhost:8025&lt;/a&gt; or whatever port you set (in our case we use our own domain). Of course, our staging environment is protected.&lt;/p&gt;

&lt;p&gt;You can set it up by following its documentation: &lt;a href="https://mailpit.axllent.org/" rel="noopener noreferrer"&gt;Mailpit&lt;/a&gt;&lt;/p&gt;
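For reference, pointing a Laravel app at a local Mailpit instance is usually just a few mail settings in `.env`. This sketch assumes Mailpit's default SMTP port (1025) and a local host; adjust the host and port to however you deploy it:

```ini
MAIL_MAILER=smtp
MAIL_HOST=127.0.0.1
MAIL_PORT=1025
MAIL_USERNAME=null
MAIL_PASSWORD=null
MAIL_ENCRYPTION=null
```

With this in place, everything Laravel sends lands in the Mailpit UI (port 8025 by default) instead of going out to real recipients.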

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;After we adopted Mailpit, we made QA's life easier: no more worrying about test emails reaching unintended users. On top of that, we can also preview emails in different modes, all in one place.&lt;/p&gt;

&lt;p&gt;Sorry there is no full walkthrough for this; maybe later I can create a simple tutorial for it :) &lt;/p&gt;

</description>
      <category>laravel</category>
      <category>mailpit</category>
    </item>
  </channel>
</rss>
