<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Adolfo Estevez</title>
    <description>The latest articles on Forem by Adolfo Estevez (@aestevezjimenez).</description>
    <link>https://forem.com/aestevezjimenez</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F356605%2F9f4eb813-5a86-42b4-a0d1-cfd60dc8410a.jpg</url>
      <title>Forem: Adolfo Estevez</title>
      <link>https://forem.com/aestevezjimenez</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/aestevezjimenez"/>
    <language>en</language>
    <item>
      <title>GCP Professional Data Engineer Guide - September 2020</title>
      <dc:creator>Adolfo Estevez</dc:creator>
      <pubDate>Tue, 06 Oct 2020 07:01:11 +0000</pubDate>
      <link>https://forem.com/aestevezjimenez/gcp-professional-data-engineer-guide-september-2020-7lp</link>
      <guid>https://forem.com/aestevezjimenez/gcp-professional-data-engineer-guide-september-2020-7lp</guid>
      <description>&lt;p&gt;I have recently recalled &lt;strong&gt;my first experience with GCP&lt;/strong&gt;. It was in London, shortly before the 2012 Olympics, in an &lt;strong&gt;online gaming project&lt;/strong&gt;, initially thought for &lt;em&gt;AWS&lt;/em&gt;, that was migrated to App Engine -  &lt;em&gt;&lt;strong&gt;PAAS platform that would evolve to the current GCP&lt;/strong&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My initial impression was good,&lt;/strong&gt; although the platform &lt;strong&gt;imposed a number of development limitations&lt;/strong&gt;, which would later be reduced with the release of &lt;strong&gt;&lt;em&gt;App Engine Flexible&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Coinciding with the launch of &lt;em&gt;&lt;strong&gt;TensorFlow&lt;/strong&gt;&lt;/em&gt; as an open-source framework in 2015, I was lucky enough to &lt;strong&gt;attend a workshop on neural networks&lt;/strong&gt; - given by one of the AI scientists from &lt;em&gt;Google Seattle&lt;/em&gt; - where I had my second experience with the platform. I was very surprised by the &lt;strong&gt;simplicity of configuration and deployment&lt;/strong&gt;, the &lt;strong&gt;NoOps concept and a Machine Learning / AI offering&lt;/strong&gt; without competition at the time.&lt;/p&gt;

&lt;p&gt;&lt;iframe src="https://player.vimeo.com/video/132700334" width="710" height="399"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Do Androids Dream of Electric Sheep? Philip K. Dick would have "hallucinated" with the electric dreams of neural networks - powered by TensorFlow.&lt;/p&gt;

&lt;h2&gt;Exam&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;structure of the exam&lt;/strong&gt; is the usual one in GCP exams: &lt;em&gt;&lt;strong&gt;2 hours and 50 questions&lt;/strong&gt;&lt;/em&gt;, in a format oriented toward scenario-type questions, mixing &lt;strong&gt;questions of high difficulty&lt;/strong&gt; with simpler ones of &lt;strong&gt;medium-low difficulty&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In general, &lt;strong&gt;to choose the correct answer&lt;/strong&gt;, you &lt;em&gt;&lt;strong&gt;have to apply both technical and business criteria&lt;/strong&gt;&lt;/em&gt;. You therefore need deep knowledge of the services from a technological point of view, as well as the skill and experience to apply business criteria contextually, depending on the question, type of environment, sector, application, etc.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmetanube.files.wordpress.com%2F2020%2F08%2Fbd-data-lake.png%3Fw%3D698" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmetanube.files.wordpress.com%2F2020%2F08%2Fbd-data-lake.png%3Fw%3D698" alt=""&gt;&lt;/a&gt;Image #1, Data Lake, the ubiquitous architecture - Image owned by GCP&lt;/p&gt;

&lt;p&gt;We can group the relevant services according to the states (and substates) of the data cycle:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Management, Storage, Transformation and Analysis.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Ingestion Batch&lt;/em&gt;&lt;/strong&gt; / &lt;strong&gt;&lt;em&gt;Data Lake&lt;/em&gt;&lt;/strong&gt;: Cloud Storage.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Ingestion&lt;/strong&gt;&lt;/em&gt;&lt;strong&gt;&lt;em&gt; Streaming&lt;/em&gt;&lt;/strong&gt;: Kafka, Pub/Sub, Computing Services, Cloud IoT Core.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Migrations:&lt;/strong&gt; &lt;/em&gt;Transfer Appliance, Transfer Service, Interconnect, gsutil.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Transformations&lt;/strong&gt;&lt;/em&gt;: Dataflow, Dataproc, Cloud Dataprep, Hadoop, Apache Beam.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Computing: &lt;/em&gt;&lt;/strong&gt;Kubernetes Engine, Compute Instances, Cloud Functions, App Engine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Storage&lt;/em&gt;&lt;/strong&gt;: Cloud SQL, Cloud Spanner, Datastore / Firebase, BigQuery, BigTable, HBase, MongoDB, Cassandra.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Cache&lt;/em&gt;&lt;/strong&gt;: Cloud Memorystore, Redis.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Analysis / Data Operations:&lt;/strong&gt; &lt;/em&gt;BigQuery, Cloud Datalab, Data Studio, DataPrep, Cloud Composer, Apache Airflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Machine Learning:&lt;/em&gt;&lt;/strong&gt; AI Platform, BigQuery ML, Cloud AutoML, TensorFlow, Cloud Text-to-Speech API, Cloud Speech-to-Text, Cloud Vision API, Cloud Video Intelligence, Translation, Recommendations API, Cloud Inference API, Natural Language, Dialogflow, Spark MLlib.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;IoT&lt;/em&gt;&lt;/strong&gt;: Cloud IoT Core, Cloud IoT Edge. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Security &amp;amp; encryption&lt;/em&gt;&lt;/strong&gt;: IAM, Roles, Encryption, KMS, Data Loss Prevention API, Compliance ... &lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Operations:&lt;/strong&gt;&lt;/em&gt; Kubeflow, AI Platform, Cloud Deployment Manager ... &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Monitoring:&lt;/em&gt;&lt;/strong&gt; Stackdriver Logging, Stackdriver Monitoring.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Optimization:&lt;/em&gt;&lt;/strong&gt; Cost control, Autoscaling, Preemptible instances ...&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Pre-requisites and recommendations&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;At this level of certification, the questions do not, in general, refer to a single topic&lt;/strong&gt;. That is, a question from the Analytics domain may require more or less advanced knowledge of &lt;em&gt;Computing, Security, Networking or DevOps&lt;/em&gt; to solve it successfully. I'd recommend holding the &lt;em&gt;GCP Associate Cloud Engineer&lt;/em&gt; certification or having equivalent knowledge.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;GCP experience at the architectural level.&lt;/em&gt;&lt;/strong&gt; The exam focuses, in part, on solution architecture, the design and deployment of data pipelines, and the selection of technologies to solve business problems - and, to a lesser extent, on development. I'd recommend studying as many &lt;a href="https://cloudblog.withgoogle.com/products/application-development/13-popular-application-architectures-for-google-cloud/amp/" rel="noopener noreferrer"&gt;reference architectures&lt;/a&gt; as possible, such as the ones I show in this guide.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;GCP experience at the development level.&lt;/em&gt;&lt;/strong&gt; Although no explicit programming questions appeared in my question set, or in the mock test, the exam requires technical knowledge of services and APIs: &lt;em&gt;SQL, Python, REST, algorithms, MapReduce, Spark, Apache Beam (Dataflow)&lt;/em&gt; …&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;GCP experience at the security level.&lt;/strong&gt;&lt;/em&gt; A domain that appears transversally in all certifications - I'd recommend knowledge at the level of &lt;em&gt;Associate Cloud Engineer&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;GCP experience at the networking level.&lt;/em&gt;&lt;/strong&gt; Another domain that appears transversally - I'd recommend knowledge at the level of Associate Cloud Engineer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Knowledge of Data Analytics.&lt;/em&gt;&lt;/strong&gt; It's a no-brainer, but some domain knowledge is essential. Otherwise, I'd recommend studying books like &lt;em&gt;“Data Analytics with Hadoop”&lt;/em&gt; or taking courses like the specialized program &lt;em&gt;&lt;a href="https://www.coursera.org/specializations/gcp-data-machine-learning" rel="noopener noreferrer"&gt;Data Engineering, Big Data and ML on Google Cloud on Coursera&lt;/a&gt;.&lt;/em&gt; Likewise, practicing with labs or pet projects is essential to gain some hands-on experience.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Knowledge of the Hadoop - Spark ecosystem&lt;/em&gt;&lt;/strong&gt;. Connected with the previous point. High-level knowledge of the ecosystem is necessary: &lt;em&gt;MapReduce, Spark, Hive, HDFS, Pig&lt;/em&gt; …&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Knowledge of Machine Learning and IoT.&lt;/em&gt;&lt;/strong&gt; Advanced knowledge of &lt;em&gt;Data Science and Machine Learning&lt;/em&gt; is essential, on top of specific knowledge of GCP products. There are questions exclusively about this domain - at the level of certifications like &lt;em&gt;AWS Machine Learning&lt;/em&gt; or higher. IoT appears on the exam in a lighter form, but it is essential to know the reference architecture and services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;DevOps experience.&lt;/em&gt;&lt;/strong&gt; Concepts such as CI/CD and infrastructure or configuration as code are of great importance today, and this is reflected in the exam, although they do not carry great specific weight.&lt;/li&gt;
&lt;/ul&gt;
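&lt;p&gt;To make the MapReduce model mentioned above concrete, here is a minimal, framework-free sketch in Python - the classic word count. It only illustrates the map / shuffle / reduce phases; the function names are my own and no Hadoop is involved.&lt;/p&gt;

```python
from collections import defaultdict
from itertools import chain

# Minimal MapReduce-style word count (model illustration only, no Hadoop):
# map emits (key, 1) pairs, shuffle groups pairs by key, reduce sums per key.

def map_phase(document):
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data on GCP", "data pipelines on GCP"]
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
counts = reduce_phase(shuffle_phase(mapped))
print(counts["data"])  # 2
print(counts["gcp"])   # 2
```

The same three-phase shape scales from this toy to a cluster: only the partitioning of the shuffle and the distribution of workers change.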



&lt;h3&gt;Standard questions&lt;/h3&gt;

&lt;p&gt;A question representative of the difficulty level of the &lt;a href="https://cloud.google.com/certification/practice-exam/data-engineer" rel="noopener noreferrer"&gt;exam&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmetanube.files.wordpress.com%2F2020%2F08%2Fcaptura-de-pantalla-2020-08-29-a-las-16.19.35.png%3Fw%3D1024" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmetanube.files.wordpress.com%2F2020%2F08%2Fcaptura-de-pantalla-2020-08-29-a-las-16.19.35.png%3Fw%3D1024" alt=""&gt;&lt;/a&gt;Image property of GCP&lt;/p&gt;

&lt;p&gt;A practical migration-scenario question that includes cloud services and the &lt;em&gt;Hadoop ecosystem&lt;/em&gt;, as well as concepts from the &lt;em&gt;Analytics domain&lt;/em&gt;.&lt;/p&gt;

&lt;h4&gt;&lt;strong&gt;Services to study in detail&lt;/strong&gt;&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmetanube.files.wordpress.com%2F2020%2F08%2Fretail-rt-inventory.png%3Fw%3D996" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmetanube.files.wordpress.com%2F2020%2F08%2Fretail-rt-inventory.png%3Fw%3D996" alt=""&gt;&lt;/a&gt;Image #2 - property of GCP&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Cloud Storage&lt;/em&gt;&lt;/strong&gt; - Core service that appears consistently in all certifications and is central to &lt;em&gt;&lt;strong&gt;Data Lake&lt;/strong&gt;&lt;/em&gt; systems. I'd recommend studying it in detail at the architectural level - see Image 1 -, its configuration according to data temperature, and its role as an integration / storage element between the different services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;BigQuery&lt;/em&gt;&lt;/strong&gt; - Core service in the &lt;em&gt;GCP Analytics&lt;/em&gt; domain as a BI and storage element. Extremely important in the exam, so it has to be studied in detail: architecture, configuration, backups, export / import, streaming, batch, security, partitioning, sharding, projects, datasets, views, integration with other services, cost, queries and SQL optimization (legacy and standard) at the level of tables, keys …&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Pub/Sub&lt;/em&gt;&lt;/strong&gt; - Core service as an ingestion and integration element. Its in-depth study is highly recommended: use cases, architecture, configuration, API, security and integration with other services (e.g. &lt;em&gt;Dataflow, Cloud Storage&lt;/em&gt;). It is the cloud-native counterpart of Kafka.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Dataflow&lt;/em&gt;&lt;/strong&gt; - Core service in the &lt;em&gt;GCP Analytics domain&lt;/em&gt; as a processing and transformation element. It is an implementation of &lt;em&gt;Apache Beam&lt;/em&gt;, which you need to know at a high level, along with pipeline design. Use cases, architecture, configuration, API and integration with other services.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Dataproc&lt;/strong&gt;&lt;/em&gt; - Core service in the &lt;em&gt;GCP Analytics domain&lt;/em&gt; as a processing and transformation element. It is a Hadoop-based service and, therefore, the go-to service for migrating Hadoop workloads to the cloud. In this case, knowledge is required not only of Dataproc but also of the native services - &lt;em&gt;Spark, HDFS, HBase, Pig&lt;/em&gt; … - plus use cases, architecture, configuration, import / export, reliability, optimization, cost, API and integration with other services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Cloud SQL, Cloud Spanner&lt;/em&gt;&lt;/strong&gt; - Cloud native relational databases. Use cases, architecture, configuration, security, performance, reliability, cost and optimization: clusters, transactionality, disaster recovery, backups, export / import, SQL performance and optimization, tables, queries, keys and debugging. Integration with other services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Cloud Bigtable&lt;/em&gt;&lt;/strong&gt; - Low-latency managed &lt;em&gt;NoSQL&lt;/em&gt; database, suitable for time series, IoT … and ideal to replace an on-premises &lt;em&gt;HBase&lt;/em&gt; installation. Use cases, architecture, configuration, security, performance, reliability and optimization: clusters, CAP, backups, export / import, partitioning, and the performance and optimization of tables, queries and keys. Integration with other services. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Machine Learning&lt;/em&gt;&lt;/strong&gt; - One of the strengths of the certification is the domain "&lt;em&gt;Operationalizing machine learning models&lt;/em&gt;". It is much denser and more complex than it may seem at first, since it covers not only the operation and knowledge of the relevant GCP services, but also Data Science fundamentals: algorithm selection, optimization, metrics … The difficulty of the questions varies, but it is comparable to that of specific certifications such as &lt;em&gt;AWS Certified Machine Learning - Specialty&lt;/em&gt;. Most important services: &lt;em&gt;BigQuery ML, Cloud Vision API, Cloud Video Intelligence, Cloud AutoML, TensorFlow, Dialogflow, GPUs, TPUs&lt;/em&gt; …&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Security&lt;/em&gt;&lt;/strong&gt; - Security is a transversal concern across all domains and appears consistently in all certifications. In this case, it appears as an independent technical topic, a crosscutting concern or a business requirement: &lt;em&gt;KMS, IAM, Policies, Roles, Encryption, Data Loss Prevention API …&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;
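&lt;p&gt;To illustrate the fan-out behaviour that makes Pub/Sub so useful as an integration element, here is a toy in-memory sketch of the pattern. It is not the google-cloud-pubsub API - the class and topic names are invented for the example - but it shows why several consumers, say a Dataflow pipeline and an archiver, each receive their own copy of a published message.&lt;/p&gt;

```python
from collections import defaultdict

# Toy in-memory model of the Pub/Sub pattern (illustration only, NOT the
# google-cloud-pubsub client API): publishers push messages to a topic and
# every subscription attached to that topic receives its own copy (fan-out).

class Broker:
    def __init__(self):
        self.topics = defaultdict(list)  # topic name -> list of subscription queues

    def subscribe(self, topic):
        queue = []
        self.topics[topic].append(queue)
        return queue

    def publish(self, topic, message):
        for queue in self.topics[topic]:
            queue.append(message)  # one copy per subscription

broker = Broker()
dataflow_sub = broker.subscribe("clickstream")  # e.g. a streaming pipeline
storage_sub = broker.subscribe("clickstream")   # e.g. an archiving consumer
broker.publish("clickstream", {"user": "u1", "page": "/home"})

print(len(dataflow_sub), len(storage_sub))  # 1 1
```

The real service adds the parts worth studying for the exam: durable storage, acknowledgements and redelivery, push vs. pull delivery, and at-least-once semantics.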
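&lt;p&gt;One Bigtable design point worth practicing is row-key construction for time series: keys are stored in lexicographic order, so leading with a raw timestamp concentrates writes on a single node. A sketch of promoting the device id and reversing the timestamp so the newest readings sort first - the separator and id format are my own choices for the illustration, not a GCP prescription:&lt;/p&gt;

```python
# Bigtable-style row-key design for time series (illustrative naming, not
# the client API). Leading with the device id spreads writes across the
# key space; a reversed timestamp makes the most recent reading sort first
# within each device's range.

MAX_TS = 10**13  # upper bound for millisecond timestamps in this sketch

def row_key(device_id, timestamp_ms):
    reversed_ts = MAX_TS - timestamp_ms
    return f"{device_id}#{reversed_ts:013d}"

keys = sorted(
    row_key("sensor-042", ts) for ts in (1_600_000_000_000, 1_600_000_060_000)
)
# The later reading sorts first within the sensor's key range:
print(keys[0] == row_key("sensor-042", 1_600_000_060_000))  # True
```

The same trade-off appears in exam scenarios: sequential keys give fast range scans but hotspot writes, so the key is designed around the dominant query pattern.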

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmetanube.files.wordpress.com%2F2020%2F09%2F12_2fk8rre.max-1100x1100-1.png%3Fw%3D1024" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmetanube.files.wordpress.com%2F2020%2F09%2F12_2fk8rre.max-1100x1100-1.png%3Fw%3D1024" alt=""&gt;&lt;/a&gt;Image #3, IoT Reference Architecture - owned by GCP&lt;/p&gt;



&lt;h4&gt;&lt;strong&gt;Very important services to consider&lt;/strong&gt;&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Networking&lt;/em&gt;&lt;/strong&gt; - Cross-domain topic that can appear as separate technical questions, crosscutting concerns, or business requirements: &lt;em&gt;VPC, Dedicated Interconnect, Multi-Region / Zone, Hybrid connectivity, Firewall rules, Load Balancing, Network Security, Container Networking, API Access (private / public) …&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Hadoop -&lt;/em&gt;&lt;/strong&gt; The exam covers ecosystems and third-party services like &lt;em&gt;Hadoop, Spark, HDFS, Hive, Pig&lt;/em&gt; … use cases, architecture, functionality, integration and migration to GCP.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Apache Kafka&lt;/em&gt;&lt;/strong&gt; - Alternative service to &lt;em&gt;Pub / Sub&lt;/em&gt;, so it is advisable to study it at a high level: use cases, operational characteristics, configuration, migration and integration with GCP - plugins, connectors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;IoT&lt;/em&gt;&lt;/strong&gt; - It can appear in various questions at the architectural level: use cases, reference architecture and integration with other services.&lt;em&gt; IoT core, Edge Computing.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Datastore / Firebase&lt;/em&gt;&lt;/strong&gt; - Document database. Use cases, configuration, performance, entity model, keys and index optimization, transactions, backups, export / import and integration with other services. It doesn't carry as much weight as the other data repositories.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Cloud Memorystore / Redis&lt;/em&gt;&lt;/strong&gt; - Structured data cache repository. Use cases, architecture, configuration, performance, reliability and optimization: clusters, backups, export / import and integration with other services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Cloud Dataprep &lt;/em&gt;&lt;/strong&gt;- Use cases, console and general operation, supported formats, and Dataflow integration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Cloud Stackdriver&lt;/em&gt;&lt;/strong&gt; - Use cases, monitoring and logging, both at the system and application level: &lt;em&gt;Cloud Stackdriver Logging, Cloud Stackdriver Monitoring, Stackdriver Agent and plugins.&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;&lt;strong&gt;Other services&lt;/strong&gt;&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;MongoDB, Cassandra&lt;/em&gt;&lt;/strong&gt; - &lt;em&gt;NoSQL&lt;/em&gt; databases that can appear in different scenarios. Use cases, architecture and integration with other services.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Cloud Composer &lt;/em&gt;&lt;/strong&gt;- Use cases, general operation and web console, configuration of diagram types, supported formats, import / export, integration with other services, connectors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Cloud Data Studio&lt;/em&gt;&lt;/strong&gt; - Use cases, configuration, networking, security, general operation and environment, and integration with other services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Cloud Datalab -&lt;/em&gt;&lt;/strong&gt; Use cases, general operation and web console, types of diagrams, supported formats, import / export and integration with other services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Kubernetes  Engine - &lt;/em&gt;&lt;/strong&gt;Use cases, architecture, clustering and integration with other services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Kubeflow - &lt;/em&gt;&lt;/strong&gt;Use cases, architecture, environment configuration, Kubernetes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Apache Airflow&lt;/em&gt;&lt;/strong&gt;  - Use cases, architecture and general operation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Cloud Functions &lt;/em&gt;&lt;/strong&gt; -&lt;strong&gt;&lt;em&gt; &lt;/em&gt;&lt;/strong&gt;Use cases, architecture, configuration and integration with other services - such as &lt;em&gt;Cloud Storage and Pub / Sub, in Push / Pull mode.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Compute Engine &lt;/em&gt;&lt;/strong&gt;- Use cases, architecture, configuration, high availability, reliability and integration with other services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;App Engine -&lt;/em&gt;&lt;/strong&gt; Use cases, architecture and integration with other services.&lt;/li&gt;
&lt;/ul&gt;



&lt;h2&gt;Bibliography &amp;amp; essential resources&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Google provides&lt;/strong&gt; a large number of &lt;strong&gt;resources for the preparation of this certification&lt;/strong&gt;, in the form of courses, an official guide, documentation and mock exams. These resources are &lt;strong&gt;highly recommended&lt;/strong&gt; and, in some cases, I would say essential.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.coursera.org/learn/preparing-cloud-professional-data-engineer-exam" rel="noopener noreferrer"&gt;Certification Preparation Course&lt;/a&gt;, contained in the &lt;a href="https://www.coursera.org/specializations/gcp-data-machine-learning" rel="noopener noreferrer"&gt;Data Engineering Specialized Program&lt;/a&gt;, includes an extra exam, lots of additional tips and materials and labs - using the external Qwik Labs tool.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://cloud.google.com/certification/guides/data-engineer" rel="noopener noreferrer"&gt;GCP Certification Guide&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;a href="//cloud.google.com/docs#section-9"&gt;Google Doc&lt;/a&gt;&lt;/em&gt;&lt;a href="//cloud.google.com/docs#section-9"&gt;&lt;em&gt;s&lt;/em&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://cloud.google.com/certification/practice-exam/data-engineer" rel="noopener noreferrer"&gt;Practice exam&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;a href="https://www.coursera.org/learn/preparing-cloud-professional-data-engineer-exam" rel="noopener noreferrer"&gt;Readiness course&lt;/a&gt; &lt;/em&gt;– &lt;em&gt;highly recommended, includes an additional practice test.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://www.coursera.org/specializations/gcp-data-machine-learning" rel="noopener noreferrer"&gt;Data Engineering, Big Data and ML on Google Cloud&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloudblog.withgoogle.com/products/application-development/13-popular-application-architectures-for-google-cloud/amp/" rel="noopener noreferrer"&gt;&lt;em&gt;Thirteen GCP Reference Architectures&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmetanube.files.wordpress.com%2F2020%2F08%2F20200827_090027.jpg%3Fw%3D1024" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmetanube.files.wordpress.com%2F2020%2F08%2F20200827_090027.jpg%3Fw%3D1024" alt=""&gt;&lt;/a&gt;&lt;br&gt;Bibliography (selection) that I have used for the preparation of the certification&lt;/p&gt;

&lt;p&gt;As I have previously indicated, &lt;strong&gt;I find the Google courses on Coursera to be excellent&lt;/strong&gt;, as they combine a series of &lt;strong&gt;short videos, reading material, labs, and test questions&lt;/strong&gt;, creating &lt;strong&gt;a very dynamic experience&lt;/strong&gt;. In any case, they &lt;strong&gt;should only be considered a starting point&lt;/strong&gt;: you then need to go deeper into each of the domains - according to your experience - using, for instance, &lt;em&gt;the excellent GCP documentation&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;But you should not limit yourself to online courses. I can't hide the fact that I love books in general, and IT books in particular. In fact, I have a huge collection of books dating back to the 80s, which at some point I will donate to a local Cervantina bookstore.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Books provide&lt;/strong&gt; a deeper, more engaging &lt;strong&gt;experience than videos&lt;/strong&gt;, which can get a bit monotonous if they are too long - as well as being a much more passive experience, like watching TV. The ideal is a &lt;strong&gt;combination of audiovisual and written media&lt;/strong&gt;, &lt;strong&gt;creating your own learning path&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;Laboratories&lt;/h3&gt;

&lt;ul id="block-0cc6ede6-3159-4db7-82fc-d6154af6ef29"&gt;
&lt;li&gt;&lt;a href="https://www.qwiklabs.com/quests/43" rel="noopener noreferrer"&gt;&lt;em&gt;Data Science on GCP Quest&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.qwiklabs.com/quests/25" rel="noopener noreferrer"&gt;&lt;em&gt;Data Engineering Quest&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmetanube.files.wordpress.com%2F2020%2F08%2F6_5xbiker.0999056719991134.max-1100x1100-1.png%3Fw%3D1024" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmetanube.files.wordpress.com%2F2020%2F08%2F6_5xbiker.0999056719991134.max-1100x1100-1.png%3Fw%3D1024" alt=""&gt;&lt;/a&gt;Image #4  - Data Lake based upon Cloud Storage - owned by GCP&lt;/p&gt;

&lt;p&gt;Part of the job as a &lt;strong&gt;&lt;em&gt;Data Engineer&lt;/em&gt;&lt;/strong&gt; consists of creating, integrating, deploying and maintaining &lt;strong&gt;data pipelines&lt;/strong&gt;, both in batch and streaming mode.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.qwiklabs.com/quests/25" rel="noopener noreferrer"&gt;Data Engineering Quest&lt;/a&gt; contains several labs that introduce &lt;strong&gt;the creation of different data transformation,  IoT, and Machine Learning pipelines&lt;/strong&gt;, so I find them excellent exercises - and not just for certification.&lt;/p&gt;

&lt;h2&gt;Is it worth it?&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The certification level is advanced&lt;/strong&gt; and, in general, it should not be the first cloud certification you obtain. It &lt;strong&gt;covers a large amount of material and many domains&lt;/strong&gt;, so tackling it without a certain level of prior knowledge &lt;strong&gt;can be quite a complex task.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If we compare it with the counterpart certification on the AWS platform, it covers almost twice as much material, mainly due to the &lt;strong&gt;inclusion of questions on the Machine Learning / Data Science domain&lt;/strong&gt; - which AWS removed and moved into a certification of its own. It is, therefore, like taking two certifications in one.&lt;/p&gt;

&lt;p&gt;Is it worth it? Of course, but not as a first certification - depending on your prior experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Certifications&lt;/strong&gt; are a good way not only to &lt;strong&gt;validate knowledge externally&lt;/strong&gt;, but also to &lt;strong&gt;collect up-to-date information,&lt;/strong&gt; &lt;strong&gt;validate good practices and consolidate knowledge&lt;/strong&gt; with (almost) real practical cases.&lt;/p&gt;

&lt;p&gt;Good luck to you all!&lt;/p&gt;

</description>
      <category>googlecloud</category>
      <category>career</category>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>An AWS Summer: EFS &amp; Lambda + Serverless Framework</title>
      <dc:creator>Adolfo Estevez</dc:creator>
      <pubDate>Mon, 05 Oct 2020 09:56:18 +0000</pubDate>
      <link>https://forem.com/aestevezjimenez/an-aws-summer-efs-lambda-serverless-framework-489</link>
      <guid>https://forem.com/aestevezjimenez/an-aws-summer-efs-lambda-serverless-framework-489</guid>
      <description>&lt;p&gt;The autumn equinox has just passed, which is a perfect moment to look back, and review some of the features released in this last summer by AWS - in no particular order, just because I think they are cool - and useful :)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Mb237jWt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://upload.wikimedia.org/wikipedia/commons/8/8b/North_season.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Mb237jWt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://upload.wikimedia.org/wikipedia/commons/8/8b/North_season.jpg" alt="" width="771" height="415"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;h2&gt;Serverless challenges &lt;/h2&gt;

&lt;p&gt;If you've been developing serverless applications for a while, you have surely faced a few challenges, apart from the &lt;em&gt;old cold start thing&lt;/em&gt; - which has been solved to a great extent by the &lt;em&gt;Provisioned Concurrency&lt;/em&gt; feature. &lt;/p&gt;

&lt;p&gt;For instance, let's say you need to load large rule files consumed by a Lambda function that implements a rules engine, or you need to keep data files produced dynamically by the function between invocations. Lambda provides some local ephemeral space - 512 MB in /tmp - that you may use, but it's small and ephemeral, so it's not useful for those kinds of scenarios.&lt;/p&gt;

&lt;p&gt;Other solutions come to mind - storing in databases: &lt;em&gt;RDS, DynamoDB, S3&lt;/em&gt; ... but they come at a high price in development effort, performance and cost. What would happen if we had peaks of several hundred - or thousand - requests per second, loading big files at startup and writing files to a data store concurrently? &lt;/p&gt;

&lt;p&gt;Well, at the very least, we could take a big performance hit, depending on the size of the files, the latency of retrieving them at startup plus the cold start of the Lambdas - &lt;em&gt;enter Provisioned Concurrency&lt;/em&gt; - plus the latency of storing the intermediate files in the data stores - storing and retrieving from S3 is not the same as from DynamoDB.&lt;/p&gt;

&lt;p&gt;So, no alternative? Well, we are in luck, as AWS released &lt;a href="https://aws.amazon.com/es/efs/"&gt;EFS&lt;/a&gt; support for Lambda in June!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ZMdN3Zdc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/09/image.png%3Fw%3D1024" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ZMdN3Zdc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/09/image.png%3Fw%3D1024" alt="" class="wp-image-537" width="743" height="216"&gt;&lt;/a&gt;Image property of AWS&lt;/p&gt;

&lt;p&gt;Amazon &lt;a href="https://aws.amazon.com/es/efs/"&gt;EFS&lt;/a&gt; is widely known, so I'm not going to delve deeply into the service; suffice it to say that &lt;em&gt;Amazon&lt;/em&gt; &lt;em&gt;Elastic File System&lt;/em&gt; provides an NFS file system that scales on demand, with high throughput and low latency. It's very useful when shared storage and parallel access from multiple services are needed.&lt;/p&gt;
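&lt;p&gt;With an EFS file system mounted, a function can keep state between invocations using ordinary file I/O. A minimal sketch - the mount path, file name and the mount_path parameter are illustrative additions for local testing, not part of the Lambda API:&lt;/p&gt;

```python
import json
import os

# Sketch of a Lambda handler that keeps a counter on an EFS mount between
# invocations. In a real function, an EFS access point is attached under
# /mnt/... via the function's file-system configuration; the mount_path
# parameter here only exists so the sketch can be exercised locally.

def handler(event, context=None, mount_path="/mnt/state"):
    counter_file = os.path.join(mount_path, "invocations.json")
    count = 0
    if os.path.exists(counter_file):
        with open(counter_file) as f:
            count = json.load(f)["count"]
    count += 1
    with open(counter_file, "w") as f:
        json.dump({"count": count}, f)  # survives warm AND cold starts
    return {"count": count}
```

Unlike /tmp, the file lives on shared, durable storage, so every concurrent execution environment sees the same data.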



&lt;h3&gt;Configuration &amp;amp; Considerations&lt;/h3&gt;

&lt;p&gt;"With power comes responsibility", or in our case with powerful features come some configuration constraints. EFS runs in different subnets within a VPC, which means that our Lambda functions have to run within a VPC as well. That comes with a price: IP directioning, possible performance hit, loss of connection to AWS global services, therefore a &lt;em&gt;NAT Gateway or Private Links / Gateway&lt;/em&gt; might need to be used, depending on the use case.&lt;/p&gt;

&lt;p&gt;That constraint was vastly improved last year when &lt;a href="https://aws.amazon.com/blogs/compute/announcing-improved-vpc-networking-for-aws-lambda-functions/"&gt;Hyperplane ENI for Lambda&lt;/a&gt; was released, so that just a few ENIs - and therefore a few IPs - are enough to handle a large number of Lambda invocations, decoupling function scaling from ENI provisioning.&lt;/p&gt;

&lt;h4&gt;Configuration - Serverless Framework&lt;/h4&gt;

&lt;p&gt;The configuration of a Lambda function running within a VPC can be fairly simple - if it only needs to access VPC resources - as shown in the image below, under the vpc key:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--v2QwQleM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/09/image-5.png%3Fw%3D938" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--v2QwQleM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/09/image-5.png%3Fw%3D938" alt="" class="wp-image-548" width="569" height="381"&gt;&lt;/a&gt;Serverless framework YAML - Image  MNube.org&lt;/p&gt;

&lt;p&gt;A security group is needed for the Lambda function, along with the IDs of the subnet(s) where the ENI(s) will be placed, and permissions to &lt;em&gt;create, delete, and describe network interfaces&lt;/em&gt;.&lt;/p&gt;
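&lt;p&gt;As a sketch - all IDs below are hypothetical placeholders, not taken from the project - that part of the &lt;em&gt;serverless.yml&lt;/em&gt; could look like this:&lt;/p&gt;

```yaml
# serverless.yml (fragment) - security group and subnet IDs are placeholders
provider:
  name: aws
  runtime: python3.8
  vpc:
    securityGroupIds:
      - sg-0123456789abcdef0          # security group for the Lambda function
    subnetIds:
      - subnet-0123456789abcdef0      # subnets where the ENIs will be placed
      - subnet-0fedcba9876543210
  iamRoleStatements:                  # permissions to manage network interfaces
    - Effect: Allow
      Action:
        - ec2:CreateNetworkInterface
        - ec2:DeleteNetworkInterface
        - ec2:DescribeNetworkInterfaces
      Resource: "*"
```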

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YNcIHRn4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/10/vpc-1.png%3Fw%3D1024" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YNcIHRn4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/10/vpc-1.png%3Fw%3D1024" alt="" class="wp-image-590" width="1008" height="288"&gt;&lt;/a&gt;VPC Lambda - Image MNube.org&lt;/p&gt;



&lt;p&gt;The Lambda function is running within our VPC now, &lt;em&gt;with an ENI placed in each selected subnet&lt;/em&gt;, but in order to access the EFS instance a few permissions will need to be granted:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5_bQZQpA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/09/image-9.png%3Fw%3D868" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5_bQZQpA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/09/image-9.png%3Fw%3D868" alt="" class="wp-image-558" width="538" height="357"&gt;&lt;/a&gt;Role permissións EFS, Lambda - Image MNube.org&lt;/p&gt;

&lt;p&gt;Now the EFS file system can be created within the VPC. To do that, the console, CloudFormation, Serverless, the AWS CLI, the AWS SDK, etc. could be used.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MmtZBAq2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/10/vpc-2.png%3Fw%3D1024" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MmtZBAq2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/10/vpc-2.png%3Fw%3D1024" alt="" class="wp-image-592"&gt;&lt;/a&gt;EFS instance - Image MNube.org&lt;/p&gt;

&lt;p&gt;After creating the instance, an &lt;em&gt;access point&lt;/em&gt; needs to be provided to allow applications access. This is a new resource type: &lt;em&gt;"AWS::EFS::AccessPoint"&lt;/em&gt;. It can be created from the console or through a CloudFormation file - we will need to supply the EFS ID: ${self.provider}.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9DhtPnhq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/09/image-4.png%3Fw%3D992" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9DhtPnhq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/09/image-4.png%3Fw%3D992" alt="" class="wp-image-545" width="571" height="353"&gt;&lt;/a&gt;Serverless framework YAML - Image MNube.org&lt;/p&gt;
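&lt;p&gt;A minimal sketch of that resource in the &lt;em&gt;resources&lt;/em&gt; section - the file system ID, POSIX user and root directory here are illustrative assumptions:&lt;/p&gt;

```yaml
# serverless.yml resources section (fragment) - IDs and paths are placeholders
resources:
  Resources:
    EfsAccessPoint:
      Type: AWS::EFS::AccessPoint
      Properties:
        FileSystemId: fs-0123456789abcdef0   # the EFS instance created above
        PosixUser:
          Uid: "1000"
          Gid: "1000"
        RootDirectory:
          Path: /lambda                      # directory exposed through the access point
          CreationInfo:
            OwnerUid: "1000"
            OwnerGid: "1000"
            Permissions: "755"
```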

&lt;p&gt;Finally, we link the file system to the Lambda function, providing the ARN of the EFS file system, the ARN of the access point, and the local mount path - as shown in the image below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XJ3Tu9ok--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/10/vpc-3-1.png%3Fw%3D1024" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XJ3Tu9ok--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/10/vpc-3-1.png%3Fw%3D1024" alt="" class="wp-image-595"&gt;&lt;/a&gt;Image MNube.org&lt;/p&gt;
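&lt;p&gt;In &lt;em&gt;serverless.yml&lt;/em&gt; terms, the link might be expressed as follows - the function name, account ID and access point ARN are placeholders:&lt;/p&gt;

```yaml
# serverless.yml (fragment) - ARN and account ID are placeholders
functions:
  listRules:
    handler: handler.handler
    fileSystemConfig:
      localMountPath: /mnt/efs    # path the function code will see
      arn: arn:aws:elasticfilesystem:eu-west-1:123456789012:access-point/fsap-0123456789abcdef0
```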

&lt;p&gt;The EFS instance is ready to be accessed by the Lambda function :)&lt;/p&gt;



&lt;h3&gt;Solution&lt;/h3&gt;

&lt;p&gt;I have used the &lt;em&gt;&lt;strong&gt;&lt;a href="https://www.serverless.com/open-source/"&gt;Serverless framework&lt;/a&gt;&lt;/strong&gt;&lt;/em&gt; to produce the solution - although &lt;strong&gt;&lt;em&gt;AWS SAM with Cloud9&lt;/em&gt;&lt;/strong&gt;, the official alternative, could have been used instead. I have quite a lot of experience with Serverless, having introduced it to a few companies - &lt;em&gt;including &lt;a href="https://www.everis.com/global/en"&gt;Everis&lt;/a&gt;&lt;/em&gt; - with great success.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bsfaTLaT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/10/aws-lambda-efs-support.png%3Fw%3D1024" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bsfaTLaT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/10/aws-lambda-efs-support.png%3Fw%3D1024" alt="" class="wp-image-605"&gt;&lt;/a&gt;Architecture - MNube.org&lt;/p&gt;

&lt;p&gt;Let's create - or transfer -  a rules file that can be accessed from the Lambda function :)&lt;/p&gt;

&lt;p&gt;Different services could be used to transfer the files, like &lt;em&gt;AWS DataSync&lt;/em&gt;, an &lt;em&gt;EC2&lt;/em&gt; instance, or even creating the files from code. Files transferred from an EC2 instance are accessible from the Lambda functions, so we'll use this method.&lt;/p&gt;

&lt;p&gt;After the EC2 instance has been created - &lt;em&gt;a t2.micro is enough&lt;/em&gt; - in one of the subnets of the VPC that has access to the &lt;em&gt;EFS ENIs&lt;/em&gt;, a directory will be needed - &lt;strong&gt;/&lt;em&gt;efs&lt;/em&gt;&lt;/strong&gt;. That directory has no link to the EFS instance yet, so we'll need to mount the file system on it.&lt;/p&gt;

&lt;p&gt;One way to do it is using the &lt;em&gt;EFS tools&lt;/em&gt;:&lt;/p&gt;

&lt;pre class="wp-block-code"&gt;&lt;code&gt;                     sudo yum install -y amazon-efs-utils&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;An &lt;em&gt;access point&lt;/em&gt; was created previously that we can use to mount the directory. It's easy to get the command line needed from the web console. Just go to the &lt;em&gt;Amazon EFS &amp;gt; Access Point &amp;gt; id&lt;/em&gt; link, and press the &lt;em&gt;&lt;strong&gt;Attach&lt;/strong&gt;&lt;/em&gt; button:&lt;/p&gt;
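&lt;p&gt;The command the console produces follows this general shape - the file system and access point IDs here are placeholders, not real values:&lt;/p&gt;

```shell
# Mount the EFS file system on /efs through the access point, using TLS
sudo mkdir -p /efs
sudo mount -t efs -o tls,accesspoint=fsap-0123456789abcdef0 fs-0123456789abcdef0:/ /efs
```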

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0VMlB64C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/10/vpc-4.png%3Fw%3D1024" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0VMlB64C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/10/vpc-4.png%3Fw%3D1024" alt="" class="wp-image-597"&gt;&lt;/a&gt;EFS Mount - Image MNube.org&lt;/p&gt;

&lt;p&gt;After mounting the directory - &lt;em&gt;in green&lt;/em&gt; - the files can be transferred to the &lt;em&gt;&lt;strong&gt;/efs&lt;/strong&gt;&lt;/em&gt; directory:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--afIXZ9_6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/10/vpc-5.png%3Fw%3D1024" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--afIXZ9_6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/10/vpc-5.png%3Fw%3D1024" alt="" class="wp-image-599"&gt;&lt;/a&gt;Mounting and creating files - Image MNube.org&lt;/p&gt;

&lt;p&gt;At this point, the Lambda function should be able to access the directory. I have coded a minimal Lambda function that lists the files contained in the directory:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ltVc4lmW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/10/image-7.png%3Fw%3D970" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ltVc4lmW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/10/image-7.png%3Fw%3D970" alt="" class="wp-image-587" width="536" height="387"&gt;&lt;/a&gt;Lambda function - Image MNube.org&lt;/p&gt;
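&lt;p&gt;A minimal handler along those lines - the mount path and function layout are assumptions of mine and must match the &lt;em&gt;localMountPath&lt;/em&gt; configured for the function:&lt;/p&gt;

```python
import json
import os

# Hypothetical mount path - must match the localMountPath configured for the function
EFS_PATH = "/mnt/efs"

def handler(event, context, path=EFS_PATH):
    # List the files stored in the mounted EFS directory
    files = sorted(os.listdir(path))
    return {
        "statusCode": 200,
        "body": json.dumps({"files": files}),
    }
```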

&lt;p&gt;The solution is now ready to be deployed. Keep in mind that I have only shown parts of the &lt;em&gt;serverless.yml&lt;/em&gt;, equivalent to the CloudFormation file you might use to provision the infrastructure - I will leave that to you as an exercise.&lt;/p&gt;

&lt;pre class="wp-block-syntaxhighlighter-code"&gt;                serverless deploy --stage dev --region eu-west-1&lt;/pre&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LcC4sdTR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/10/vpc-6.png%3Fw%3D1024" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LcC4sdTR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/10/vpc-6.png%3Fw%3D1024" alt="" class="wp-image-600"&gt;&lt;/a&gt;Serverless Stack - Image MNube.org&lt;/p&gt;

&lt;p&gt;A URL is provided by the framework, as I created an API Gateway endpoint that invokes the Lambda function:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2o7R_Enk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/10/image-6.png%3Fw%3D1024" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2o7R_Enk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/10/image-6.png%3Fw%3D1024" alt="" class="wp-image-586" width="1100" height="261"&gt;&lt;/a&gt;Cloudwatch Logs - Image from MNUBE.org&lt;/p&gt;

&lt;p&gt;I have captured the request trace from the CloudWatch Logs, where we can see the files in /efs - &lt;em&gt;&lt;strong&gt;test.txt and rules.txt&lt;/strong&gt;&lt;/em&gt; - and the low latency of the request.&lt;/p&gt;



&lt;h3&gt;Other Use Cases&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Loading big libraries that Lambda layers can't handle.&lt;/li&gt;
&lt;li&gt;Files that are updated regularly.&lt;/li&gt;
&lt;li&gt;Files that need locks for concurrent access. &lt;/li&gt;
&lt;li&gt;Access to big files - zip / unzip.&lt;/li&gt;
&lt;li&gt;Using different computing architectures - &lt;em&gt;EC2, ECS&lt;/em&gt; -  to process the same files.&lt;/li&gt;
&lt;/ul&gt;



</description>
      <category>aws</category>
      <category>serverless</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>AWS Data Analytics Certification, is it worth it?</title>
      <dc:creator>Adolfo Estevez</dc:creator>
      <pubDate>Sun, 17 May 2020 08:13:48 +0000</pubDate>
      <link>https://forem.com/aestevezjimenez/aws-data-analytics-certification-is-it-worth-it-2887</link>
      <guid>https://forem.com/aestevezjimenez/aws-data-analytics-certification-is-it-worth-it-2887</guid>
      <description>&lt;p&gt;On April 13, the journey of the new AWS Data Analytics Specialty certification officially began - prior to the beta phase in December 2019 / January 2020. It coincided in time with the AWS Database Specialty Beta, which forced me to choose between the two. Finally, I decided on taking the Databases Specialty, as I had recently tested from AWS Big Data.&lt;/p&gt;

&lt;p&gt;The “Beta exam” experience is very different from the “standard” one: &lt;strong&gt;85 questions and 4 hours long&lt;/strong&gt; - that is, &lt;strong&gt;20 more questions and one more hour&lt;/strong&gt; - a really intense experience. I recommend taking a 5-minute break - they are allowed in the test centers - since after the third hour it is very difficult to stay focused.&lt;/p&gt;

&lt;p&gt;The certification is the new version of the AWS Big Data Specialty, an exam that will be withdrawn in June 2020. I will not go into much depth on the differences; suffice it to say that &lt;strong&gt;the Machine Learning domain has been eliminated&lt;/strong&gt;, while the remaining domains have been expanded and updated in depth. But beware: Machine Learning and IoT still appear integrated into the other domains, so you need to know them at an architectural level, at the very least.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mclQ3trp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/05/kinesisanalytics_lp_iot.2d8e10d5cf377dad4453aedb6ccbd8f2efee612a.png%3Fw%3D760" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mclQ3trp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/05/kinesisanalytics_lp_iot.2d8e10d5cf377dad4453aedb6ccbd8f2efee612a.png%3Fw%3D760" alt="" class="wp-image-103" width="760" height="266"&gt;&lt;/a&gt;Image from aws.amazon.com&lt;/p&gt;

&lt;h2&gt;Prerequisites and recommendations&lt;/h2&gt;

&lt;p&gt;I will not repeat the information that is already available &lt;a href="https://aws.amazon.com/training/path-data-analytics/"&gt;on the AWS website&lt;/a&gt;; instead, I am going to give my personal recommendations and observations, as I consider the Learning Path that AWS suggests to be somewhat light for the current level of the exam.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;AWS experience at the architectural level&lt;/strong&gt;. &lt;/em&gt;The exam is largely focused on advanced solution architecture - the 5 pillars - and to a lesser extent on development, which is present mainly in services such as Kinesis and Glue. I recommend holding the &lt;em&gt;AWS Architect Solutions Pro&lt;/em&gt; certification, or alternatively the &lt;em&gt;AWS Architect Associate + AWS Security Specialty&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Advanced AWS security experience.&lt;/em&gt;&lt;/strong&gt; It is a full domain of the exam, but it also appears - &lt;em&gt;cross-domain&lt;/em&gt; - in many questions. If you hold the &lt;em&gt;AWS Architect Solutions Pro&lt;/em&gt;, general security knowledge may be sufficient - without the service-by-service depth of the specific certification. Otherwise, the &lt;em&gt;AWS Security Specialty&lt;/em&gt; is a good option, or equivalent knowledge of certain services - which I will indicate later on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Analytics knowledge&lt;/em&gt;&lt;/strong&gt;. If you lack it, I'd recommend studying books such as “&lt;em&gt;Data Analytics with Hadoop” - O'Reilly 2016&lt;/em&gt;, or taking the courses indicated in the &lt;em&gt;AWS Learning Path&lt;/em&gt;. Likewise, do labs or pet projects to gain some practical experience.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Knowledge of the Hadoop ecosystem&lt;/strong&gt;&lt;/em&gt;. Connected to the previous point. High-level, architectural knowledge of the ecosystem is a must: Hive, Presto, Pig, …&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Knowledge of Machine Learning and IoT - AWS ecosystem&lt;/em&gt;&lt;/strong&gt;. &lt;em&gt;SageMaker and core IoT&lt;/em&gt; services at the architectural level.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--fKkc00Z6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/05/learning-paths_data-analytics.aff5d6194bc5842c971873f069d13b615302f893.png%3Fw%3D760" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fKkc00Z6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/05/learning-paths_data-analytics.aff5d6194bc5842c971873f069d13b615302f893.png%3Fw%3D760" alt="" class="wp-image-48"&gt;&lt;/a&gt;&lt;br&gt;Image from aws.amazon.com&lt;/p&gt;

&lt;h2&gt;Exam&lt;/h2&gt;

&lt;p&gt;The questions follow the style of other certifications such as the &lt;em&gt;AWS Pro Architect or the Security or Database Specialties&lt;/em&gt;. They are all “scenario based”, and most of them are long and complex. You are not going to find many simple questions. Certainly, between 5% and 10% of “easy” questions appeared, but all in a “&lt;em&gt;scenario&lt;/em&gt;” format.&lt;/p&gt;

&lt;p&gt;Let's look at an example taken from the &lt;a href="https://d1.awsstatic.com/training-and-certification/docs-data-analytics-specialty/AWS-Certified-Data-Analytics-Specialty_Sample-Questions.pdf"&gt;AWS sample questions:&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--83Dm_ymM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/05/captura-de-pantalla-2020-05-09-a-las-11.46.07.png%3Fw%3D1024" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--83Dm_ymM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/05/captura-de-pantalla-2020-05-09-a-las-11.46.07.png%3Fw%3D1024" alt="" class="wp-image-52"&gt;&lt;/a&gt;Image from aws.amazon.com&lt;br&gt;&lt;/p&gt;

&lt;p&gt;I'd classify this question as &lt;em&gt;&lt;strong&gt;"intermediate"&lt;/strong&gt;&lt;/em&gt; in difficulty. If you have taken the &lt;em&gt;Architect PRO&lt;/em&gt;, or a specialty such as &lt;em&gt;Security or Big Data&lt;/em&gt;, you will know what I am talking about. Certainly, the level of the questions is much higher and deeper than in the previous version of the exam.&lt;/p&gt;

&lt;p&gt;I'd recommend taking the new specialty directly, as the old one contains questions about already deprecated services - or outdated information.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Services to know in depth&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--asFyb_9d--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/05/product-page-diagram_kinesis-data-analytics-real-time-log-analytics.d577a64060cc1e594c3c5c4a66feb1cc6e26a397.png%3Fw%3D760" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--asFyb_9d--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://metanube.files.wordpress.com/2020/05/product-page-diagram_kinesis-data-analytics-real-time-log-analytics.d577a64060cc1e594c3c5c4a66feb1cc6e26a397.png%3Fw%3D760" alt="" class="wp-image-110"&gt;&lt;/a&gt;Image from aws.amazon.com&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;AWS Kinesis&lt;/strong&gt;&lt;/em&gt; - in its three modalities,&lt;em&gt; Data Streams, Firehose and Analytics&lt;/em&gt;. Architecture, dimensioning, configuration, integration with other services, security, troubleshooting, metrics, optimization and development. Questions of various levels, some of them very complex and of great depth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;AWS Glue&lt;/em&gt;&lt;/strong&gt; - in depth for ETL and data discovery - an integral part of the exam. Questions of different levels - I did not find them to be the most difficult.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;AWS Redshift - &lt;/em&gt;&lt;/strong&gt;architecture, design, dimensioning, integration, security, ETL, backups … a large number of questions, some of them very complex.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;AWS EMR / Spark&lt;/em&gt;&lt;/strong&gt; - architecture, sizing, configuration, performance, integration with other services, security, integration with the Hadoop ecosystem - very important, though not as much as the previous three services. Very complex questions that require advanced, transversal knowledge of all domains and the Hadoop ecosystem: &lt;em&gt;Hive, HBase, Presto, Sqoop, Pig&lt;/em&gt; …&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Security&lt;/strong&gt;&lt;/em&gt; - KMS encryption, AWS CloudHSM, federation, Active Directory, IAM, policies, roles, etc. … in general and for each service in particular. Questions that cut across the other domains, of high difficulty.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Very important services to consider&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;AWS S3&lt;/em&gt;&lt;/strong&gt; - the core storage service (storage, security, rules) and new features like &lt;em&gt;AWS S3 Select&lt;/em&gt;. It appears consistently across all certifications, which is why I'd assume it's known in depth except for the new features.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;AWS Athena&lt;/em&gt;&lt;/strong&gt; - architecture, configuration, integration, performance, use cases. It appears consistently and as an alternative to other services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;AWS Managed Kafka&lt;/em&gt;&lt;/strong&gt; - alternative to Kinesis, architecture, configuration, dimensioning, performance, integration, use cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;AWS Quicksight&lt;/em&gt;&lt;/strong&gt; - subscription formats, service features, different ways of viewing, use cases. Alternative to other services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;AWS Elasticsearch and Kibana (ELK)&lt;/em&gt;&lt;/strong&gt; - architecture, configuration, dimensioning, performance, integration, use cases. An alternative to other services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;AWS Lambda&lt;/em&gt;&lt;/strong&gt; - architecture, integration, use cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;AWS StepFunctions&lt;/em&gt;&lt;/strong&gt; - architecture, integration, use cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;AWS DMS&lt;/em&gt;&lt;/strong&gt;  - architecture, integration, use cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;AWS DataPipeline&lt;/em&gt;&lt;/strong&gt; - architecture, integration, use cases.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Ga8hecYp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://metanube.files.wordpress.com/2020/05/near_real_time_streaming_1.gif%3Fw%3D657" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Ga8hecYp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://metanube.files.wordpress.com/2020/05/near_real_time_streaming_1.gif%3Fw%3D657" alt="" class="wp-image-115"&gt;&lt;/a&gt;Image from aws.amazon.com&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Other services&lt;/em&gt; &lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;AWS Networking&lt;/em&gt;&lt;/strong&gt; - basic network architectures and knowledge: VPC, security groups, Direct Connect, VPN, Regions, Zones … network configuration of each particular service.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;AWS DynamoDB, ElasticCache&lt;/em&gt;&lt;/strong&gt;  - architecture, integration, use case knowledge. These services, which appeared very prominently in the previous version of the exam, have much less weight in the current one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;AWS CloudWatch, Events, Log&lt;/em&gt;&lt;/strong&gt; - architecture, configuration, integration, use case knowledge.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;AWS RDS and Aurora&lt;/em&gt;&lt;/strong&gt; - architecture, configuration, integration, use case knowledge.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;EC2, Autoscaling&lt;/em&gt;&lt;/strong&gt; - knowledge of architecture, integration, use cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;SQS, SNS&lt;/em&gt;&lt;/strong&gt; - knowledge of architecture, integration, use cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;AWS Cloudformation&lt;/em&gt;&lt;/strong&gt; - knowledge of architecture, use cases, devops.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;SageMaker and AWS IoT Core&lt;/em&gt;&lt;/strong&gt; - knowledge of architecture, integration, use cases.&lt;/li&gt;
&lt;/ul&gt;



&lt;h2&gt;Essential Resources&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/es/training/path-data-analytics/"&gt;AWS Certification Website.&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://d1.awsstatic.com/training-and-certification/docs-data-analytics-specialty/AWS-Certified-Data-Analytics-Specialty_Sample-Questions.pdf"&gt;Example questions&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.aws.training/Details/eLearning?id=46612"&gt;Readiness Course&lt;/a&gt; - a must, packed with information and resources - including a 20 question test.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/es/blogs/big-data/big-data-analytics-options-on-aws-updated-white-paper/"&gt;AWS Whitepapers&lt;/a&gt; - Big Data Analytics Options on AWS.&lt;/li&gt;
&lt;li&gt;AWS FAQs for every service - especially Kinesis, Glue, Redshift, EMR.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/es/blogs/big-data/"&gt;AWS Big Data Blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Practice Exam - a must, quite challenging and very representative of the actual exam.&lt;/li&gt;
&lt;/ul&gt;



&lt;h2&gt;Is it worth it, then?&lt;/h2&gt;

&lt;p&gt;Let's see :)&lt;/p&gt;

&lt;p&gt;&lt;em&gt;AWS Data Analytics Specialty&lt;/em&gt; is a complex and difficult certification; it is expensive (300 euros) and requires a very significant investment of time - even for someone with experience in analytics and AWS. Therefore, it is not a decision to be taken lightly.&lt;/p&gt;

&lt;p&gt;In my personal case, I found it very worthwhile, since I have been working on several projects of that kind - &lt;em&gt;fast data, IoT&lt;/em&gt; - on AWS in recent times - apart from it being the only certification I needed to complete the full set of thirteen - if &lt;em&gt;Big Data&lt;/em&gt; is included.&lt;/p&gt;

&lt;p&gt;Certifications are a good way not only to &lt;strong&gt;&lt;em&gt;validate knowledge externally&lt;/em&gt;&lt;/strong&gt;, but also to &lt;strong&gt;&lt;em&gt;collect updated information&lt;/em&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;em&gt;validate good practices&lt;/em&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;em&gt;consolidate knowledge&lt;/em&gt;&lt;/strong&gt; with real (or almost real) practical cases.&lt;/p&gt;

&lt;p&gt;For those interested in the analytics field, or with professional experience in it, who want to make the leap to the cloud, my recommendation is to first obtain an AWS Architect certification - preferably PRO - and optionally the Security Specialty or equivalent knowledge, at least in the services that I have mentioned above.&lt;/p&gt;

&lt;p&gt;For those who already have AWS certifications but no professional experience in the specific field, it may be a good way to start, but it will not be an easy or short path. I recommend doing labs or pet projects, in order to gain the practical experience necessary to pass the exam.&lt;/p&gt;

&lt;p&gt;So is it worth it? &lt;strong&gt;Absolutely&lt;/strong&gt;, but not as a first certification. It is especially aimed at people with advanced knowledge of AWS architecture who want to delve deeper into analytics in the cloud.&lt;/p&gt;

&lt;p&gt;Good luck to you all!&lt;/p&gt;




</description>
      <category>aws</category>
      <category>architecture</category>
      <category>career</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>The Reference Architecture Disappointment</title>
      <dc:creator>Adolfo Estevez</dc:creator>
      <pubDate>Tue, 12 May 2020 05:13:57 +0000</pubDate>
      <link>https://forem.com/aestevezjimenez/the-reference-architecture-disappointment-476m</link>
      <guid>https://forem.com/aestevezjimenez/the-reference-architecture-disappointment-476m</guid>
      <description>&lt;p&gt;There is a "phenomenon" that I have experienced through my career that I like to call the "Reference Architecture Disappointment". &lt;/p&gt;

&lt;p&gt;It's a similar effect to what some people experience when they go to the doctor's office with several symptoms, only to find out that they may have a common cold. No frenzy at the hospital, no frantic consultations, no House MD TV scenes. Just paracetamol, water and rest!&lt;/p&gt;

&lt;p&gt;So many years of medical school just to prescribe that? 
Well, yes. The MD was able to recognize a common cold among dozens of illnesses with the same set of symptoms, and prescribed the simplest and best treatment. The question is, would you be able to do it?&lt;/p&gt;

&lt;p&gt;The same thing happens when a Solutions Architect deals with a set of requirements. The "Architect" will select the best architecture that solves the business problem, in the simplest and most efficient manner possible. That means - sometimes - using the "Reference Architecture" for that particular problem, with the necessary changes.&lt;/p&gt;

&lt;p&gt;Those architectures emerge from practical experience and encompass patterns and best practices. Usually, reinventing the wheel is just not a good idea. 

Keep it simple and Rock On!&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>career</category>
      <category>productivity</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
