<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Naveena</title>
    <description>The latest articles on Forem by Naveena (@naveena_davay).</description>
    <link>https://forem.com/naveena_davay</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3696869%2Fd46b9dbb-e237-45ab-9064-cfe9870686ce.png</url>
      <title>Forem: Naveena</title>
      <link>https://forem.com/naveena_davay</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/naveena_davay"/>
    <language>en</language>
    <item>
      <title>From CMDB to Clarity: Mapping Products to Applications Without Losing Your Mind</title>
      <dc:creator>Naveena</dc:creator>
      <pubDate>Mon, 06 Apr 2026 20:01:53 +0000</pubDate>
      <link>https://forem.com/naveena_davay/from-cmdb-to-clarity-mapping-products-to-applications-without-losing-your-mind-3efl</link>
      <guid>https://forem.com/naveena_davay/from-cmdb-to-clarity-mapping-products-to-applications-without-losing-your-mind-3efl</guid>
      <description>&lt;p&gt;When working in large enterprises, especially in healthcare or financial systems, knowing what you own isn’t enough.&lt;br&gt;
You also need to know:&lt;br&gt;
&lt;em&gt;What business product does this application actually support?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This article walks through how to move from a messy CMDB in ServiceNow to a usable product-to-application mapping—so you can make safer decisions, reduce technical debt, and stop guessing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;The Problem&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
You’re asked a simple question:&lt;br&gt;
“Can we retire this application?”&lt;br&gt;
You check your CMDB:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;App exists ✅&lt;/li&gt;
&lt;li&gt;Owner exists (maybe outdated) ✅&lt;/li&gt;
&lt;li&gt;Infrastructure mapped ✅&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But…&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which product uses it? ❌&lt;/li&gt;
&lt;li&gt;Is it still critical? ❌&lt;/li&gt;
&lt;li&gt;What breaks if you remove it? ❌&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Meanwhile, your architecture looks like this:&lt;br&gt;
App A → DB1 &lt;br&gt;
App B → DB2 &lt;br&gt;
App C → ??? &lt;br&gt;
No product context. No clarity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;The Approach&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Define Product Mapping&lt;br&gt;
Start by explicitly mapping applications to business products.&lt;br&gt;
CREATE TABLE product_app_map (&lt;br&gt;
   product_name STRING,&lt;br&gt;
   application_name STRING&lt;br&gt;
);&lt;br&gt;
Even a simple mapping table changes everything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Backfill Using What You Already Know&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Pull data from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CMDB (applications, owners)&lt;/li&gt;
&lt;li&gt;Business docs (products)&lt;/li&gt;
&lt;li&gt;Logs / usage patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;INSERT INTO product_app_map&lt;br&gt;
SELECT&lt;br&gt;
   'Claims Processing' AS product_name,&lt;br&gt;
   app_name&lt;br&gt;
FROM cmdb_applications&lt;br&gt;
WHERE tag = 'claims';&lt;/p&gt;

&lt;p&gt;It won’t be perfect—but it’s a start.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Handle “Unknowns” Explicitly&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
You will find apps like this:&lt;br&gt;
App X → Owner: Unknown → Usage: Unknown&lt;br&gt;
Don’t ignore them. Flag them.&lt;/p&gt;

&lt;p&gt;SELECT application_name&lt;br&gt;
FROM product_app_map&lt;br&gt;
WHERE product_name IS NULL;&lt;br&gt;
&lt;em&gt;These are your hidden technical debt hotspots.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Validate with Real Usage Data&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Augment mapping using logs or query patterns:&lt;br&gt;
def infer_product(app_logs):&lt;br&gt;
   if "claims" in app_logs:&lt;br&gt;
       return "Claims Processing"&lt;br&gt;
   elif "billing" in app_logs:&lt;br&gt;
       return "Billing"&lt;br&gt;
   else:&lt;br&gt;
       return "Unknown"&lt;br&gt;
Not perfect—but better than tribal knowledge.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Enforce It in Governance&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
Make mapping mandatory:&lt;br&gt;
if (!application.product) {&lt;br&gt;
 throw new Error("Application not mapped to product");&lt;br&gt;
}&lt;br&gt;
No mapping → no deployment, no change approval.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Why This Works&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
You shift from assets → context&lt;br&gt;
Unknown systems become visible risks&lt;br&gt;
Impact analysis becomes trivial&lt;br&gt;
Instead of:&lt;br&gt;
“I think this app is used somewhere…”&lt;br&gt;
You get:&lt;br&gt;
SELECT product_name&lt;br&gt;
FROM product_app_map&lt;br&gt;
WHERE application_name = 'App A';&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;The Payoff&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
Safer decommissioning&lt;br&gt;
Faster incident resolution&lt;br&gt;
Clear ownership and accountability&lt;br&gt;
Real visibility into technical debt&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Final Thought&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Most enterprises don’t lack data.&lt;br&gt;
They lack meaningful relationships between that data.&lt;br&gt;
Your CMDB knows everything you have.&lt;br&gt;
It just doesn’t know why you have it.&lt;/p&gt;

</description>
      <category>assetmanagement</category>
      <category>datagovernance</category>
      <category>architecture</category>
      <category>servicenow</category>
    </item>
    <item>
      <title>🚨 Why Your SQL Window Functions Betray You in Cloud SSMS vs Snowflake</title>
      <dc:creator>Naveena</dc:creator>
      <pubDate>Sat, 21 Feb 2026 18:16:20 +0000</pubDate>
      <link>https://forem.com/naveena_davay/why-your-sql-window-functions-betray-you-in-cloud-ssms-vs-snowflake-147l</link>
      <guid>https://forem.com/naveena_davay/why-your-sql-window-functions-betray-you-in-cloud-ssms-vs-snowflake-147l</guid>
      <description>&lt;p&gt;You run the same query.&lt;br&gt;
 Same data.&lt;br&gt;
 Same logic.&lt;br&gt;
But your numbers don’t match.&lt;br&gt;
Welcome to the sneaky world of window functions — where defaults quietly change your results between Microsoft SQL Server (Cloud SSMS) and Snowflake.&lt;br&gt;
Let’s break down the drama.&lt;/p&gt;

&lt;p&gt;🎭 The Silent Villain: Default Window Frames&lt;br&gt;
Here’s a classic trap:&lt;br&gt;
LAST_VALUE(amount) OVER (&lt;br&gt;
   PARTITION BY customer_id&lt;br&gt;
   ORDER BY order_date&lt;br&gt;
)&lt;br&gt;
In Snowflake, the default frame is:&lt;br&gt;
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW&lt;br&gt;
Translation?&lt;br&gt;
 LAST_VALUE() returns the current row’s value, not the actual last value in the partition.&lt;br&gt;
In SQL Server, the default frame is the same, so the trap travels with you; if your numbers ever looked right, your test data just happened to hide it.&lt;br&gt;
💥 Result: mismatched reports, confused stakeholders, late-night debugging.&lt;br&gt;
Pro Tip:&lt;br&gt;
 Always define your frame explicitly:&lt;br&gt;
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING&lt;br&gt;
If it matters, don’t leave it to defaults.&lt;/p&gt;
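&lt;p&gt;The trap is easy to reproduce in any engine with SQL-standard window semantics. A minimal sketch using SQLite as a neutral stand-in (the table and column names are illustrative, not from either production system):&lt;/p&gt;

```python
# Reproduce the LAST_VALUE default-frame trap. SQLite is only a stand-in
# engine here; its default frame is the same SQL-standard
# RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (customer_id INT, order_date TEXT, amount INT);
    INSERT INTO orders VALUES
        (1, '2024-01-01', 100),
        (1, '2024-01-02', 200),
        (1, '2024-01-03', 300);
""")

# Default frame: LAST_VALUE just echoes the current row's value.
default_frame = [r[0] for r in con.execute("""
    SELECT LAST_VALUE(amount) OVER (
        PARTITION BY customer_id ORDER BY order_date)
    FROM orders ORDER BY order_date
""")]

# Explicit frame: LAST_VALUE is the real last value in the partition.
explicit_frame = [r[0] for r in con.execute("""
    SELECT LAST_VALUE(amount) OVER (
        PARTITION BY customer_id ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    FROM orders ORDER BY order_date
""")]

print(default_frame)   # [100, 200, 300]
print(explicit_frame)  # [300, 300, 300]
```

&lt;p&gt;Same function, same data, two different answers, and only the frame changed.&lt;/p&gt;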

&lt;p&gt;🧨 NULLs: The Chaos Agents&lt;br&gt;
Sorting rules differ:&lt;br&gt;
Snowflake → treats NULL as the highest value, so NULLs sort last in ascending order&lt;br&gt;
SQL Server → treats NULL as the lowest value, so NULLs sort first&lt;/p&gt;

&lt;p&gt;Now imagine using:&lt;br&gt;
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY bonus)&lt;br&gt;
If bonus has NULLs?&lt;br&gt;
Your row numbers shift.&lt;br&gt;
Your rankings change.&lt;br&gt;
Your dashboard breaks.&lt;/p&gt;
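&lt;p&gt;A portable fix is to sort on an explicit NULL flag, which behaves the same on every engine instead of depending on NULLS FIRST/LAST support. A sketch with SQLite (names illustrative), which happens to sort NULLs first in ascending order, like SQL Server:&lt;/p&gt;

```python
# Show how NULLs move ROW_NUMBER() around, and a portable fix:
# ordering by (bonus IS NULL) first pins NULLs to the end on any engine.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE emp (dept TEXT, name TEXT, bonus INT);
    INSERT INTO emp VALUES
        ('eng', 'ann', 100),
        ('eng', 'bob', NULL),
        ('eng', 'cid', 50);
""")

# Engine default (NULL lowest here): bob, the NULL, ranks first.
default_order = [r[0] for r in con.execute("""
    SELECT name FROM (
        SELECT name,
               ROW_NUMBER() OVER (PARTITION BY dept ORDER BY bonus) AS rn
        FROM emp)
    ORDER BY rn
""")]

# Explicit NULL handling: real bonuses first, NULLs pinned last.
explicit_order = [r[0] for r in con.execute("""
    SELECT name FROM (
        SELECT name,
               ROW_NUMBER() OVER (
                   PARTITION BY dept ORDER BY (bonus IS NULL), bonus) AS rn
        FROM emp)
    ORDER BY rn
""")]

print(default_order)   # ['bob', 'cid', 'ann']
print(explicit_order)  # ['cid', 'ann', 'bob']
```

&lt;p&gt;On an engine with the opposite default, the first query flips and the second one doesn't. That's the point.&lt;/p&gt;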

&lt;p&gt;🎲 Ties = Non-Deterministic Madness&lt;br&gt;
This one burns teams during migrations.&lt;br&gt;
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary)&lt;br&gt;
If two employees earn the same salary:&lt;br&gt;
SQL Server might return one order.&lt;br&gt;
Snowflake (distributed engine) might return another.&lt;br&gt;
Re-run it? You might get a different result again.&lt;/p&gt;

&lt;p&gt;Because neither engine guarantees deterministic ordering unless you make it deterministic.&lt;br&gt;
🔥 Add a tie-breaker:&lt;br&gt;
ORDER BY salary, employee_id&lt;br&gt;
Always.&lt;/p&gt;
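&lt;p&gt;The tie-breaker fix is easy to verify: once a unique column joins the ORDER BY, the ranking is fully determined and re-runs always agree. A sketch with illustrative names:&lt;/p&gt;

```python
# Two employees tie on salary; adding employee_id to the ORDER BY makes
# the ROW_NUMBER() assignment fully deterministic across re-runs.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE emp (employee_id INT, dept TEXT, salary INT);
    INSERT INTO emp VALUES
        (7, 'eng', 90),
        (3, 'eng', 90),
        (5, 'eng', 80);
""")

query = """
    SELECT employee_id,
           ROW_NUMBER() OVER (
               PARTITION BY dept ORDER BY salary, employee_id) AS rn
    FROM emp ORDER BY rn
"""
run1 = con.execute(query).fetchall()
run2 = con.execute(query).fetchall()

print(run1)          # [(5, 1), (3, 2), (7, 3)] -- tie broken by id
print(run1 == run2)  # True: stable on every run
```

&lt;p&gt;Without employee_id in the ORDER BY, nothing in the SQL standard forces either engine to put 3 before 7.&lt;/p&gt;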

&lt;p&gt;🧬 Collation &amp;amp; Case Sensitivity&lt;br&gt;
SQL Server respects database-level collation.&lt;br&gt;
 Snowflake handles text comparisons differently.&lt;br&gt;
If you're partitioning by strings, grouping may not match exactly after migration.&lt;br&gt;
Subtle. Painful. Real.&lt;/p&gt;

&lt;p&gt;☁️ Engine vs. Cloud Architecture&lt;br&gt;
SQL Server executes in a traditional engine model.&lt;br&gt;
Snowflake distributes computation across clusters.&lt;br&gt;
Distributed systems expose sloppy ordering assumptions fast.&lt;br&gt;
What “worked fine” before?&lt;br&gt;
 Was probably relying on physical storage order.&lt;br&gt;
Snowflake doesn’t care about your assumptions.&lt;/p&gt;

&lt;p&gt;🛡️ The Migration Survival Checklist&lt;br&gt;
If you want consistent results:&lt;br&gt;
✅ Explicit ROWS BETWEEN&lt;br&gt;
 ✅ Deterministic ORDER BY&lt;br&gt;
 ✅ Explicit NULL handling&lt;br&gt;
 ✅ Test ties&lt;br&gt;
 ✅ Test edge cases&lt;br&gt;
 ✅ Never trust defaults&lt;/p&gt;

&lt;p&gt;🎯 Bottom Line&lt;br&gt;
Window functions aren’t broken.&lt;br&gt;
Your assumptions are.&lt;br&gt;
When moving from SQL Server to Snowflake, make everything explicit.&lt;br&gt;
Because in analytics…&lt;br&gt;
“Almost the same result” is not the same result.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>snowflake</category>
      <category>cloud</category>
      <category>analytics</category>
    </item>
    <item>
      <title>From Parquet to Snowflake: Query Smart, Load Fast</title>
      <dc:creator>Naveena</dc:creator>
      <pubDate>Tue, 06 Jan 2026 18:05:05 +0000</pubDate>
      <link>https://forem.com/naveena_davay/from-parquet-to-snowflake-query-smart-load-fast-1gm6</link>
      <guid>https://forem.com/naveena_davay/from-parquet-to-snowflake-query-smart-load-fast-1gm6</guid>
      <description>&lt;p&gt;When working with large volumes of financial data, querying efficiently and loading the results into a data warehouse like Snowflake is crucial. This article walks through how an analyst can handle millions of records stored as Parquet files in AWS S3 and export processed data to Snowflake.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;&lt;br&gt;
The task is to generate daily metrics (like total transaction volume, active customers, and average balances) from 3 TB of Parquet data. The data is partitioned by transaction_date in S3, but older partitions have inconsistent column names. The results must then be loaded into Snowflake for further analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Approach&lt;/strong&gt;&lt;br&gt;
Efficiently Query the Data&lt;br&gt;
 Instead of scanning the entire dataset, you only read the last 30 days of data by using partition pruning. This saves both time and cost.&lt;/p&gt;

&lt;p&gt;Handle Schema Evolution&lt;br&gt;
 As the schema has changed over time (e.g., different column names for balance), you use SQL functions like COALESCE to handle missing or differently named columns, ensuring consistency.&lt;/p&gt;

&lt;p&gt;Aggregate Metrics&lt;br&gt;
 You aggregate data by region to calculate total transaction volume, count active customers, and find the average account balance.&lt;/p&gt;

&lt;p&gt;Load Data into Snowflake&lt;br&gt;
 After processing the data, you use Snowflake’s COPY INTO method for efficient, large-scale ingestion, moving the results from a CSV file into your warehouse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why This Works&lt;/strong&gt;&lt;br&gt;
Partition pruning ensures that only the relevant data is queried, making it fast and cost-efficient.&lt;/p&gt;

&lt;p&gt;Schema handling with functions like COALESCE allows for seamless integration across different data partitions.&lt;/p&gt;

&lt;p&gt;Snowflake’s optimized loading mechanisms allow for fast and reliable data transfer.&lt;/p&gt;

&lt;p&gt;This approach makes working with large, partitioned datasets in cloud storage manageable, while ensuring efficient data processing and loading into Snowflake.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution in Python (DuckDB + SQL)&lt;/strong&gt;&lt;br&gt;
Read the last 30 days of Parquet using partition pruning:&lt;/p&gt;

&lt;p&gt;import duckdb&lt;br&gt;
import datetime&lt;/p&gt;

&lt;p&gt;end = datetime.date.today()&lt;br&gt;
start = end - datetime.timedelta(days=30)&lt;/p&gt;

&lt;p&gt;con = duckdb.connect()&lt;br&gt;
df = con.execute(f"""&lt;br&gt;
    SELECT *&lt;br&gt;
    FROM read_parquet('s3://bank-lake/transactions/*/*.parquet', hive_partitioning=true)&lt;br&gt;
    WHERE transaction_date BETWEEN DATE '{start}' AND DATE '{end}'&lt;br&gt;
""").fetchdf()&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Aggregate metrics, handling schema differences:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;result = con.execute("""&lt;br&gt;
    SELECT&lt;br&gt;
        region,&lt;br&gt;
        SUM(transaction_amount) AS total_tx,&lt;br&gt;
        COUNT(DISTINCT customer_id) AS active_customers,&lt;br&gt;
        AVG(COALESCE(account_balance, acct_balance)) AS avg_balance&lt;br&gt;
    FROM df&lt;br&gt;
    GROUP BY region&lt;br&gt;
""").fetchdf()&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Load the results into Snowflake:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;import snowflake.connector&lt;/p&gt;

&lt;p&gt;result.to_csv("daily.csv", index=False)&lt;/p&gt;

&lt;p&gt;conn = snowflake.connector.connect(&lt;br&gt;
    user='YOUR_USER',&lt;br&gt;
    password='YOUR_PASSWORD',&lt;br&gt;
    account='YOUR_ACCOUNT'&lt;br&gt;
)&lt;br&gt;
cur = conn.cursor()&lt;br&gt;
# The connector runs one statement per execute() call&lt;br&gt;
cur.execute("PUT file://daily.csv @%DAILY_REGION_METRICS")&lt;br&gt;
cur.execute("""&lt;br&gt;
    COPY INTO DAILY_REGION_METRICS&lt;br&gt;
    FROM @%DAILY_REGION_METRICS&lt;br&gt;
    FILE_FORMAT=(TYPE=CSV SKIP_HEADER=1 FIELD_OPTIONALLY_ENCLOSED_BY='"')&lt;br&gt;
""")&lt;/p&gt;

</description>
      <category>snowsql</category>
      <category>awsdatalake</category>
      <category>analytics</category>
      <category>snowflake</category>
    </item>
  </channel>
</rss>
