<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Kay Wilson</title>
    <description>The latest articles on Forem by Kay Wilson (@kayuni3).</description>
    <link>https://forem.com/kayuni3</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1040609%2F8368a6c4-d9e3-40e7-aa73-aae0483863ea.png</url>
      <title>Forem: Kay Wilson</title>
      <link>https://forem.com/kayuni3</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/kayuni3"/>
    <language>en</language>
    <item>
      <title>Apples to Oranges or Oranges to Nectarines? (Aspiring Solution Architect)</title>
      <dc:creator>Kay Wilson</dc:creator>
      <pubDate>Tue, 11 Apr 2023 19:40:31 +0000</pubDate>
      <link>https://forem.com/kayuni3/apples-to-oranges-or-oranges-to-nectarines-1i1c</link>
      <guid>https://forem.com/kayuni3/apples-to-oranges-or-oranges-to-nectarines-1i1c</guid>
      <description>&lt;h2&gt;
  
  
  Comparing Solution and Naval Architecture
&lt;/h2&gt;

&lt;p&gt;For the past four years of my professional career, I have worked as a Naval Architect. Most people have no idea what naval architecture is, and contrary to what the name may imply, I do not do any sketching or drawing at all. Surprisingly, &lt;strong&gt;Solution Architects and Naval Architects have more in common than just their job &lt;em&gt;titles&lt;/em&gt;. We have similar job &lt;em&gt;duties&lt;/em&gt;!&lt;/strong&gt; We just work in different industries. &lt;/p&gt;

&lt;p&gt;Simply put, I design technical solutions for &lt;em&gt;naval ships&lt;/em&gt;. These solutions are based on data that I use to calculate/model the expected behavior of ships and their structure using stress analysis and buoyancy theories, technical requirements and specifications, and data provided to me by the "customer". I review customer provided designs, proposals, and any other submitted documents, create a model, and then either provide or approve a solution to a given problem. &lt;/p&gt;

&lt;p&gt;Note: we call projects "work items" or "availabilities", so these terms are used interchangeably from here on. &lt;/p&gt;

&lt;h2&gt;
  
  
  Data Analysis &amp;amp; Designing Technical Solutions
&lt;/h2&gt;

&lt;p&gt;Much of the data used in these models is as old as the Navy itself. Long ago, naval architects conducted and recorded various experiments on the behavior of naval ships under various stresses such as wind, flooding of tanks, earthquakes and hurricanes, and explosions. Using this historical data and currently reported conditions, I am able to model the expected outcome of stress, corrosion, stability, etc. to then assess and manage the risk of structural failure or casualty. Here are two examples: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Among these experiments were "inclining experiments" that modeled the behavior of buoyancy or stability under stress loads aboard the ship. Long story short, heavy weights were moved to various locations and the buoyancy was observed, recorded, and eventually turned into charts and their respective formulas. Less than a decade ago, naval architects in my office had to endure the grueling process of reading through these many, many charts and making additional tables and charts, all by hand, to model the expected behavior of buoyancy due to the removal and addition of weights. Now all of these charts and data have been uploaded into a lovely interactive database (even though I have had to do these calculations by hand as almost a rite of passage). I simply input the new data as columns and rows and out comes our new stability model, complete with a couple of visuals. I am then able to either approve the requested solution or present an appropriate alternative.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Let's say there is a hole on a deck that needs to be repaired, but based on the work items already scheduled, there is no time to complete the repair. Taking data such as the size and location of the hole and the material type and thickness, I can estimate the corrosion and failure rate and predict when the deck will fail due to stress. Different locations on the ship and different materials have been shown to exhibit varying degrees of stress or deterioration. Based on this information, I can recommend the best method of repair and its required completion date.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
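&lt;p&gt;&lt;em&gt;To make the second example concrete, the remaining-life estimate boils down to simple arithmetic. Here is a toy sketch assuming a constant (linear) corrosion rate with made-up numbers, not the actual Navy method:&lt;/em&gt;&lt;/p&gt;

```python
# Toy linear-corrosion model (hypothetical values, not the actual Navy method).
# Predict when a plate thins past its minimum allowable thickness.

def years_until_failure(current_mm, minimum_mm, corrosion_mm_per_year):
    """Remaining service life of a plate under a constant corrosion rate."""
    if corrosion_mm_per_year <= 0:
        raise ValueError("corrosion rate must be positive")
    return (current_mm - minimum_mm) / corrosion_mm_per_year

# A 12 mm deck plate with a 9 mm minimum, corroding 0.25 mm per year:
print(years_until_failure(12.0, 9.0, 0.25))  # 12.0 years
```

In practice the rate varies by location and material, which is exactly why the tech manuals and models matter.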

&lt;p&gt;There are many other components that I manage that all have their own tech manuals and formulas. There is never a dull day in the office, fortunately!&lt;/p&gt;

&lt;h2&gt;
  
  
  Project Estimation and Risk Analysis
&lt;/h2&gt;

&lt;p&gt;While researching Solution Architecture, I came across a project estimation technique/concept that seems similar to a method that I use in my current role: T-shirt sizing. &lt;/p&gt;

&lt;p&gt;In my current position, we have scheduled future projects called availabilities with pre-assigned tasks outlined in a "work item". There are smaller or larger tasks that are assigned to shorter or longer scheduled availabilities respectively in order to remain on or close to schedule. Shorter availabilities are much more frequent and longer availabilities can be spaced out by years. &lt;/p&gt;

&lt;p&gt;However, once I have assessed the risk of failure or casualty, a more complex or larger task labeled as "critical" can be deemed a priority and reassigned to a shorter or sooner availability regardless, while tasks assessed as "low" can be reassigned to longer availabilities. &lt;/p&gt;

&lt;p&gt;Contractors are also prepaid for work before an availability starts and any additional tasks or "growth work" added have higher costs and additional fees. Only tasks deemed critical are added as growth work, and I must use my best engineering judgement to determine what is necessary and urgent using my models as support.&lt;/p&gt;

&lt;p&gt;I am of course not an expert on T-shirt sizing but it seems similar to this method!&lt;/p&gt;

&lt;h2&gt;
  
  
  Effective Communication and Cross-functional Teams
&lt;/h2&gt;

&lt;p&gt;As an SME, I am in constant communication with various people in many different positions and areas of expertise. Once we have created a model, assessed the problem, and designed a solution, the last step is to communicate this solution to the rest of the project team. &lt;/p&gt;

&lt;p&gt;For a docking evolution (which is the start or end of an availability), there are divers, dock masters, enlisted and commissioned sailors, rope handlers, crane operators, etc. that are all working on getting the ship into or out of drydock. A drydock is literally a dry dock. The ship is lifted out of the water onto a platform.&lt;/p&gt;

&lt;p&gt;It may seem simple, but it takes days of prep, and getting the ship into the drydock and completing an evolution can take almost 28 straight hours (thankfully there is overtime, which makes 28 hours feel a wee bit like an 8 hour shift). During this time, we have to be in constant communication with all personnel to track, record, and report every stage of this process, as we are the official government oversight and lead POC and are also expected to find a solution to any and every problem that may arise, whether it be who to call or what technical manual to read. Naval ships cost hundreds of millions of dollars to build, so I try my hardest to not break them! &lt;strong&gt;Below is what a docking evolution looks like.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/G3kdqwsLCBI"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;For structural repairs or alterations, we often perform onsite inspections and meet with different personnel and project managers to get the complete picture of the issue before recommending a solution. Having strong communication skills in order to understand the task at hand and effectively communicate all requirements and solutions is definitely a must and is a great strength of mine!  &lt;/p&gt;

&lt;p&gt;Even though I enjoy what I currently do, I would love the opportunity to transition into a role in Solution Architecture and am confident that the skills (or powers) that I've harnessed will come in handy in a Solution Architect or similar role! &lt;/p&gt;

</description>
      <category>solutionarchitect</category>
      <category>cloud</category>
    </item>
    <item>
      <title>The Perfect Duo? (Testing Azure and Terraform Capabilities)</title>
      <dc:creator>Kay Wilson</dc:creator>
      <pubDate>Tue, 11 Apr 2023 15:19:07 +0000</pubDate>
      <link>https://forem.com/kayuni3/the-perfect-duo-testing-azure-and-terraform-capabilities-1880</link>
      <guid>https://forem.com/kayuni3/the-perfect-duo-testing-azure-and-terraform-capabilities-1880</guid>
<description>&lt;p&gt;What better way to learn a new skill than to practice? Terraform is easily one of the best IaC tools. Using it, I've created and deployed Azure resources like Kubernetes clusters, Active Directory and Security groups, CI/CD solutions using Azure DevOps, and Virtual Machine instances. It has been super easy to learn!&lt;/p&gt;

&lt;p&gt;Terraform is similar to Azure Resource Manager (ARM) templates, which I am familiar with and have completed a couple of projects using. However, Terraform is useful because it allows you to configure infrastructure across multiple cloud platforms. Most companies don't use just one cloud provider, as one provider might have a resource that is better than the other's and vice versa. For example, Azure Active Directory is arguably the best solution for identity and access management, while AWS can be a better solution for IaaS. This is where Terraform comes in handy!&lt;/p&gt;

&lt;h2&gt;
  
  
  General Steps
&lt;/h2&gt;

&lt;p&gt;Creating resources and solutions using Terraform is pretty straightforward. Each solution was deployed using these steps.&lt;/p&gt;

&lt;p&gt;1. Create the respective providers, main, variables, and outputs files, and the service principal credentials.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;providers: specifies the cloud provider used &lt;/li&gt;
&lt;li&gt;main: specifies the parameters for my resources
&lt;/li&gt;
&lt;li&gt;variables: specifies the variables for our parameters like the resource group location. These are parameter values that start with &lt;em&gt;var&lt;/em&gt; in my main file. &lt;/li&gt;
&lt;li&gt;outputs: specifies output variables &lt;/li&gt;
&lt;li&gt;service principal credentials: specifies my service principal app id and password as I choose to use a least privilege model to protect my subscriptions and its resources&lt;/li&gt;
&lt;/ul&gt;
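&lt;p&gt;&lt;em&gt;For illustration, a minimal providers file for the azurerm provider might look like this (the version constraint and variable names are placeholders, not my actual files):&lt;/em&gt;&lt;/p&gt;

```hcl
# providers.tf sketch (version and variable names are placeholders)
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
  }
}

provider "azurerm" {
  features {}

  # Service principal credentials, kept out of source control as variables
  client_id       = var.client_id
  client_secret   = var.client_secret
  tenant_id       = var.tenant_id
  subscription_id = var.subscription_id
}
```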

&lt;p&gt;2. Using Azure Cloud Shell, I created and applied my Terraform execution plan.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform plan -out main.tfplan
terraform apply main.tfplan

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;3. Test the results! &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: the code blocks are only samples of what was used, and the resources have additional parameters that are included in my Terraform files. You can find the complete code in my GitHub repo.&lt;/em&gt; &lt;/p&gt;

&lt;h2&gt;
  
  
  End to End Governance (Azure DevOps and Terraform)
&lt;/h2&gt;

&lt;p&gt;In this project, I designed an end-to-end governance solution for a scenario in which a construction company called Kay Inc is working on the construction of a new hospital. &lt;/p&gt;

&lt;p&gt;Kay Inc has various departments: Structural Design, Project Management, and IT (or superadmins). Each department has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a team that develops solutions to make some aspects of the department more efficient. For example, a function application that automatically uploads published drawings from the design team to a Cosmos DB to be used and tracked by the project management dept. &lt;/li&gt;
&lt;li&gt;an admin team that creates and manages management groups, subscriptions, RBAC, Policies, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Active Directory and IAM:&lt;/strong&gt; I created three Active Directory groups for each department: a group for the entire department, a group for the developer team (with DevOps and ARM Contributor roles), and a group for the admin team (with ARM Owner and DevOps Project Administrator roles). The IT department only has one AD group with "superadmin" privileges.&lt;/p&gt;

&lt;p&gt;The construction project is broken down into five project groups: one for each department, one for the entire company, and a collaborative space. &lt;/p&gt;

&lt;h3&gt;
  
  
  To implement my solution, I...
&lt;/h3&gt;

&lt;p&gt;I created a DevOps Organization and PAT to configure the environment, then created the DevOps projects, AD group assignments, and service connections for the company. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Configuring the AD groups&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "azuread_group" "groups" {
  for_each                = var.groups
  display_name            = "kayinc-${each.value}-${local.suffix}"
  prevent_duplicate_names = true
  security_enabled        = true
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Configuring the DevOps projects&lt;/strong&gt; and enabling their necessary features&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "azuredevops_project" "team_projects" {
  for_each        = var.projects
  name            = each.value.name
  description     = each.value.description
  visibility      = "private"
  version_control = "Git"

  features = {
    repositories = "enabled"
    pipelines    = "enabled"
    artifacts    = "disabled"
    boards       = "disabled"
    testplans    = "disabled"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Configuring Security Group Assignments&lt;/strong&gt; for each project with their assigned permissions and dependencies&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;module "ado_team_permissions" {
  for_each       = var.projects
  source         = "./modules/azure-devops-permissions"
  ado_project_id = azuredevops_project.team_projects["${each.value.team}"].id
  team_aad_id    = azuread_group.groups["${each.value.team}_devs"].id   # Receives 'Contributor' Permissions
  admin_aad_id   = azuread_group.groups["${each.value.team}_admins"].id # Receives 'Project Administrator' Permissions

  depends_on = [
    azuread_group.groups,
    azuredevops_project.team_projects
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Creating a Kubernetes cluster using AKS
&lt;/h2&gt;

&lt;p&gt;In this task, I created an AKS cluster and configured Container Insights to manage my Kubernetes environment using Terraform. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Create an AKS cluster&lt;/strong&gt; specifying node pool parameters including the number of nodes&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "azurerm_kubernetes_cluster" "k8s" {
  location            = azurerm_resource_group.rg.location
  name                = var.cluster_name
  resource_group_name = azurerm_resource_group.rg.name
  dns_prefix          = var.dns_prefix
  tags                = {
    Environment = "Development"
  }

  default_node_pool {
    name       = "agentpool"
    vm_size    = "Standard_D2_v2"
    node_count = var.agent_count
  }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Configure Container Insights&lt;/strong&gt; to monitor the health and performance of my cluster&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "azurerm_log_analytics_solution" "test" {
  location              = azurerm_log_analytics_workspace.test.location
  resource_group_name   = azurerm_resource_group.rg.name
  solution_name         = "ContainerInsights"
  workspace_name        = azurerm_log_analytics_workspace.test.name
  workspace_resource_id = azurerm_log_analytics_workspace.test.id

  plan {
    product   = "OMSGallery/ContainerInsights"
    publisher = "Microsoft"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Virtual Machine Scale Sets from a custom Packer image
&lt;/h2&gt;

&lt;p&gt;Azure Virtual Machine Scale Sets automate the deployment of Virtual Machine instances based on specified parameters to decrease latency and increase availability. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Configuring the VM image&lt;/strong&gt; by creating a resource group and a service principal, then building the Packer template file into a Packer image.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;az group create -n kayPackerimages -l eastus

az ad sp create-for-rbac --role Contributor --scopes /subscriptions/kayssub1234 --query "{client_id: appId, client_secret: password, tenant_id: tenant }"

packer build ubuntu.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Packer template is a JSON file that includes my Azure credentials, OS type, image SKU and location, etc. &lt;/p&gt;
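&lt;p&gt;&lt;em&gt;A heavily trimmed sketch of such a template (all values are placeholders; the azure-arm builder accepts many more options than shown here):&lt;/em&gt;&lt;/p&gt;

```json
{
  "builders": [{
    "type": "azure-arm",
    "client_id": "{{user `client_id`}}",
    "client_secret": "{{user `client_secret`}}",
    "tenant_id": "{{user `tenant_id`}}",
    "subscription_id": "{{user `subscription_id`}}",
    "managed_image_resource_group_name": "kayPackerimages",
    "managed_image_name": "myPackerImage",
    "os_type": "Linux",
    "image_publisher": "Canonical",
    "image_offer": "UbuntuServer",
    "image_sku": "18.04-LTS",
    "location": "East US",
    "vm_size": "Standard_DS2_v2"
  }]
}
```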

&lt;p&gt;&lt;strong&gt;2. Create a Virtual Network, subnet, and load balancer&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "azurerm_virtual_network" "vmss" {
  name                = "vmss-vnet"
  address_space       = ["10.0.0.0/16"]
  location            = var.location
  resource_group_name = azurerm_resource_group.vmss.name
  tags = var.tags
}

resource "azurerm_subnet" "vmss" {
  name                 = "vmss-subnet"
  resource_group_name  = azurerm_resource_group.vmss.name
  virtual_network_name = azurerm_virtual_network.vmss.name
  address_prefixes       = ["10.0.2.0/24"]
}

resource "azurerm_lb" "vmss" {
  name                = "vmss-lb"
  location            = var.location
  resource_group_name = azurerm_resource_group.vmss.name

  frontend_ip_configuration {
    name                 = "PublicIPAddress"
    public_ip_address_id = azurerm_public_ip.vmss.id
  }

  tags = var.tags
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Create my Virtual Machine Scale Set&lt;/strong&gt; using the Packer image&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data "azurerm_image" "image" {
  name                = var.packer_image_name
  resource_group_name = data.azurerm_resource_group.image.name
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
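&lt;p&gt;&lt;em&gt;The scale set itself then consumes that image. A trimmed sketch using the azurerm_linux_virtual_machine_scale_set resource (values are illustrative, not my exact configuration):&lt;/em&gt;&lt;/p&gt;

```hcl
# Trimmed sketch (illustrative values): a scale set built from the Packer image
resource "azurerm_linux_virtual_machine_scale_set" "vmss" {
  name                = "vmscaleset"
  resource_group_name = azurerm_resource_group.vmss.name
  location            = var.location
  sku                 = "Standard_DS1_v2"
  instances           = 2
  admin_username      = var.admin_user
  source_image_id     = data.azurerm_image.image.id

  admin_ssh_key {
    username   = var.admin_user
    public_key = file("~/.ssh/id_rsa.pub")
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  network_interface {
    name    = "vmss-nic"
    primary = true

    ip_configuration {
      name      = "internal"
      primary   = true
      subnet_id = azurerm_subnet.vmss.id
    }
  }
}
```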



&lt;p&gt;Of course I did not start with these more complicated tasks. I first practiced configuring simple solutions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an Azure Cosmos DB&lt;/li&gt;
&lt;li&gt;Azure Virtual Machines &lt;/li&gt;
&lt;li&gt;resource groups&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>terraform</category>
      <category>azure</category>
      <category>devops</category>
    </item>
    <item>
      <title>See you soon? (Predictive Modeling using Machine Learning and Data Analysis)</title>
      <dc:creator>Kay Wilson</dc:creator>
      <pubDate>Tue, 04 Apr 2023 16:35:03 +0000</pubDate>
      <link>https://forem.com/kayuni3/see-you-soon-predictive-modeling-using-machine-learning-and-data-analysis-4hmf</link>
      <guid>https://forem.com/kayuni3/see-you-soon-predictive-modeling-using-machine-learning-and-data-analysis-4hmf</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Machine Learning and AI have always perplexed me. I always figured they were based on past data and prediction somehow but never understood exactly how. This project offered the perfect opportunity to learn something new and gain hands-on practice using Power BI, which helped me pass Microsoft's Power BI Data Analyst certification exam. &lt;/p&gt;

&lt;p&gt;In this project, I cleansed a dataset and created visuals and machine learning models to predict the readmission of diabetic patients. Microsoft offers many data analysis tools. Of these tools, I used Databricks with Apache Spark and Power BI. The data set was provided by VCU's Center for Clinical and Translational Research. &lt;/p&gt;

&lt;h2&gt;
  
  
  Databricks vs Power BI
&lt;/h2&gt;

&lt;p&gt;Databricks and Power BI differ in their respective user-friendliness. Databricks uses programming languages for both data analysis and machine learning. I opted to use Python and PySpark (Apache Spark for Python). Power BI, however, is little-to-no code and allows you to perform data analysis via a very user-friendly GUI that looks similar to Excel. Essentially, Databricks is for developers and Power BI is for everyone else.&lt;/p&gt;

&lt;p&gt;Both Databricks and Power BI allow SQL queries. Within Databricks, SQL queries are what allow you to create visuals. Within Power BI, DAX and SQL can both be used for queries but are not required to create visuals. You can simply specify the data you want to use in the field pane. You can even create or import custom visuals, including animated ones, using the Python and R languages.&lt;/p&gt;
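&lt;p&gt;&lt;em&gt;To show the shape of such a query, here is the same GROUP BY/AVG pattern run with Python's built-in sqlite3 on made-up rows (illustrative only; the real queries ran in Databricks SQL or DAX):&lt;/em&gt;&lt;/p&gt;

```python
import sqlite3

# Illustrative only: the GROUP BY / AVG shape used for the visuals,
# run against an in-memory SQLite table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (race TEXT, readmission_score INTEGER)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [("Caucasian", 2), ("Caucasian", 1), ("Asian", 0), ("Asian", 1)],
)
rows = conn.execute(
    "SELECT race, AVG(readmission_score) FROM patients GROUP BY race ORDER BY race"
).fetchall()
print(rows)  # [('Asian', 0.5), ('Caucasian', 1.5)]
```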

&lt;h2&gt;
  
  
  Preparing Data
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Importing Data
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Databricks:&lt;/strong&gt; I created a blob storage account and a container in Azure and then uploaded the dataset file in CSV format as a blob. I then mounted the storage account to my Databricks notebook, imported my CSV blob, and created a dataframe. One mistake I made was that in order to use my blob storage account in Databricks, I needed to enable Data Lake Gen 2 on the storage account. This gave me a headache in the beginning because I kept receiving an error message when attempting to mount and could not figure out why it would not upload my blob.&lt;/p&gt;

&lt;p&gt;Mounting my blob storage account:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dbutils.fs.mount(
  source = "wasbs://mycontainer@myblobstorageaccount.blob.core.windows.net",
  mount_point = "/mnt/mymountpoint",
  extra_configs = {"fs.azure.account.key.myblobstorageaccount.blob.core.windows.net": "myaccountkey123"})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Creating a dataframe from my blob:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df = (spark.read
  .format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("wasbs://mycontainer@myblobstorageaccount.blob.core.windows.net/diabetic_data.csv")
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Power BI:&lt;/strong&gt; The CSV or flat file was imported locally to Power BI Desktop. &lt;/p&gt;

&lt;h3&gt;
  
  
  Cleansing Data
&lt;/h3&gt;

&lt;p&gt;Only a subset of the raw data set was needed. I cleaned the data using several criteria. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Columns deemed potential indicators of diabetes readmissions were used for analysis. These were race, whether the patient was on any diabetes medications, previous number of readmissions, number of diagnoses, and time spent in the hospital. &lt;/li&gt;
&lt;li&gt;I dropped rows with certain discharge disposition IDs to exclude the patients that are no longer with us. &lt;/li&gt;
&lt;li&gt;Patient number was included as a partition key. &lt;/li&gt;
&lt;li&gt;I created an additional column called "Readmission Score" with the integer data type to be used in certain calculations.&lt;/li&gt;
&lt;li&gt;Certain columns that could be used as potential indicators were deleted if there were too many rows with missing values (i.e., medical specialty).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Databricks:&lt;/strong&gt; I created a new or cleansed dataframe with only the needed columns from my raw dataframe using code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cleansed_df = df.select("patient_nbr", "race", "admission_type_id", "discharge_disposition_id", "number_diagnoses", "readmitted", "num_medications", "diabetesMed", "gender", "age")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Power BI:&lt;/strong&gt; During the initial import of my data, the Transform Data option was used to only include the necessary columns and rows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Modeling Data
&lt;/h2&gt;

&lt;p&gt;After cleansing, tables were created to determine the correlation, if any, between the data. &lt;/p&gt;

&lt;p&gt;Factors analyzed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;number of diagnoses &lt;/li&gt;
&lt;li&gt;number of medications&lt;/li&gt;
&lt;li&gt;race &lt;/li&gt;
&lt;li&gt;sex &lt;/li&gt;
&lt;li&gt;admission type&lt;/li&gt;
&lt;li&gt;age&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Choosing my aggregate function
&lt;/h3&gt;

&lt;p&gt;There were a couple of functions that I used in my queries, such as count, average, kurtosis, and skewness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Average:&lt;/strong&gt; used in finding the "readmission score" of a group&lt;br&gt;
&lt;strong&gt;Count:&lt;/strong&gt; used in showing the distribution of data&lt;br&gt;
&lt;strong&gt;Kurtosis:&lt;/strong&gt; readmission data showed a platykurtic distribution with a kurtosis of about -1.7. This means the distribution was flatter than normal. &lt;br&gt;
&lt;strong&gt;Skewness:&lt;/strong&gt; The readmissions data had a skewness score of 0.38. This shows that the distribution of my data is fairly symmetrical. &lt;/p&gt;
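&lt;p&gt;&lt;em&gt;As a quick illustration of what skewness and kurtosis measure, here is a pure-Python version of the moment formulas on a made-up sample (not the actual query code):&lt;/em&gt;&lt;/p&gt;

```python
# Pure-Python skewness and excess kurtosis (moment formulas), just to
# illustrate what these aggregate functions measure; sample values are made up.
def moments(xs):
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # negative => platykurtic (flat)
    return skewness, excess_kurtosis

# An evenly spread set of scores is symmetric (skew 0) and flat (negative kurtosis):
skew, kurt = moments([0, 0, 1, 1, 2, 2])
print(round(skew, 2), round(kurt, 2))  # 0.0 -1.5
```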
&lt;h3&gt;
  
  
  Visualization
&lt;/h3&gt;

&lt;p&gt;Graphs and charts were created to visualize some of the relationships between data. It is much easier to read a chart than to just look at a table. The type of visual used was dependent on the parameter used. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bar graphs for average and count functions &lt;/li&gt;
&lt;li&gt;Pie charts for percentages&lt;/li&gt;
&lt;li&gt;Cards for single values&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Power BI has a neat key influencer visual that analyzes your data for you to determine the factors influencing a certain metric; in my case, readmission scores. &lt;/p&gt;
&lt;h2&gt;
  
  
  Analysis Results
&lt;/h2&gt;

&lt;p&gt;Once I created my measures and visuals, I looked at my data to see what the results were. Here is a breakdown of what I saw. Using Power BI, I was able to create a nice report from my visuals, which you can view here. &lt;/p&gt;
&lt;h5&gt;
  
  
  Race and Gender
&lt;/h5&gt;

&lt;p&gt;This was an obvious choice of data to look at for trends. I used count and average to examine the readmission history based on the readmission score. I was able to use count to see the number of patients of a specific demographic that had a specific score and find the average readmission score based on a specific demographic. &lt;br&gt;
This is what I saw:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Caucasians and Blacks had a higher readmission score than other races. &lt;/li&gt;
&lt;li&gt;Asians had the lowest readmission score of any race&lt;/li&gt;
&lt;li&gt;Women had a higher readmission score than Men&lt;/li&gt;
&lt;li&gt;Asian women had an even higher readmission score than Asian men when compared with women and men of other races &lt;/li&gt;
&lt;/ul&gt;
&lt;h5&gt;
  
  
  Age
&lt;/h5&gt;

&lt;p&gt;Age really did not show a correlation, which was surprising to me. I expected to see that older patients would have a higher readmission rate. The average patient age was 71 no matter the readmission score. The age data was also skewed, making it less useful. &lt;/p&gt;
&lt;h5&gt;
  
  
  Number of Diagnoses
&lt;/h5&gt;

&lt;p&gt;I was very confident that the number of diagnoses a patient had was going to greatly impact their readmissions. I expected to see that the higher the number of diagnoses, the higher the readmission score. The data showed that there was little to no correlation. &lt;/p&gt;
&lt;h5&gt;
  
  
  Number of medications
&lt;/h5&gt;

&lt;p&gt;I expected to see that the higher the number of medications, the lower the readmission score, as those patients would be managing their health at home. The data showed there was little to no correlation. Patients with no readmissions had on average only one more medication than those who had been readmitted. &lt;/p&gt;
&lt;h5&gt;
  
  
  Admission Type
&lt;/h5&gt;

&lt;p&gt;I was most surprised to see that Admission Type showed the largest influence on readmissions. Patients with an admission type ID of 6 had the highest readmission score, and the most common admission type was type 1. &lt;/p&gt;
&lt;h2&gt;
  
  
  Machine Learning (ML)
&lt;/h2&gt;
&lt;h4&gt;
  
  
  Power BI
&lt;/h4&gt;

&lt;p&gt;Power BI has a built-in no-code ML feature called AI Insights. To use this feature, I created a dataflow in my Power BI workspace and added a machine learning model with the selected data that I wanted to include. My ML model was then applied to my table. &lt;/p&gt;
&lt;h4&gt;
  
  
  Databricks
&lt;/h4&gt;

&lt;p&gt;I used AutoML in Python syntax to create a machine learning model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from databricks import automl

summary = automl.classify(train_df, target_col="readmitted", timeout_minutes=15)

model_uri = summary.best_trial.model_path

import mlflow

# Prepare the test dataset (the target column is "readmitted")
test_pdf = test_df.toPandas()
y_test = test_pdf["readmitted"]
X_test = test_pdf.drop(["readmitted"], axis=1)
X_test.head()

# Run inference using the best model
model = mlflow.pyfunc.load_model(model_uri)
predictions = model.predict(X_test)
test_pdf["readmitted_predicted"] = predictions

import sklearn.metrics

model = mlflow.sklearn.load_model(model_uri)
sklearn.metrics.ConfusionMatrixDisplay.from_estimator(model, X_test, y_test)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>azure</category>
      <category>machinelearning</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
