Forem: Muhammad Tabaza

Never Used GraphQL? Fear Not!

Muhammad Tabaza — Tue, 29 Dec 2020 11:53:54 +0000

This article is a practical introduction to GraphQL that enables you to get started building beautiful APIs with Pragmalang.

What Is GraphQL?

GraphQL is a query language specification that Facebook released publicly in 2015. The language is designed to make retrieving data in front-end applications easier for developers. Some of its advantages are:

Type-Safe Communication: You always know the type/shape of the data you're receiving, and the type of data the server expects from you.
Standard Documentation: All GraphQL APIs must follow a certain standard when it comes to documentation. Many tools know how to interpret this documentation by introspecting the API.
Tooling: Since GraphQL is a standard, many tools are designed to work with all GraphQL APIs, such as tools that generate TypeScript definitions based on the GraphQL schema to be used in your Web app.
No Underfetching/Overfetching Of Data: You select exactly the data that you need, and you can perform multiple operations within a single query.

GraphQL is actually two languages smashed into one: a schema definition language, and a query language.

Schema Definition Language

The schema definition language describes the types of data that the API operates on, and what the API is capable of doing with those types. For example:

# This is a type definition
type Character {
  name: String! # '!' means "required" or "not nullable"
  friends: [Character] # '[]' means "array". Types can be recursive
  homeWorld: Planet! # Types can relate to each other
}

# Another type definition
type Planet {
  name: String!
  climate: String!
}

# A special type definition that specifies
# the queries that a client can perform
type Query {
  getCharacter(name: String!): Character
  #            ^^^^^^^^^^^^^   ^^^^^^^^^
  #           Query Arguments  Query Return Type
  # Note that the return type is `Character`, not `Character!`
}

Comments in schema definitions are typically used for documentation. Comments in the example above were used for clarification purposes.

GraphQL APIs have three types of operations that a client can perform:

Queries: Simple retrieval of data.
Mutations: Changes to existing data, or addition of new data.
Subscriptions: Retrieval of data over time.

When developing a GraphQL API, you typically need to specify the queries, mutations, and subscriptions that your server can handle. In the example above, the server is only capable of performing a single query that takes the name of a character, and returns that character. But here's the catch: you need to specify how the query is handled. Using traditional GraphQL frameworks, you would need to define what are known as resolvers to specify how each field of a type is fetched from the database.
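
To make the idea of resolvers more concrete, here is a minimal sketch in plain Python. It does not use any GraphQL framework: the dictionaries stand in for a database, and the function names (resolve_get_character, resolve_friends, resolve_home_world) and the sample data are made up for illustration. Real frameworks have their own ways of registering resolvers.

```python
# Hypothetical in-memory "database"; a real API would query a database instead.
PLANETS = {"Earth": {"name": "Earth", "climate": "Temperate"}}
CHARACTERS = {
    "Rick Sanchez": {"name": "Rick Sanchez", "friends": [], "home_world": "Earth"},
    "Morty Smith": {"name": "Morty Smith", "friends": ["Rick Sanchez"], "home_world": "Earth"},
}

def resolve_get_character(name):
    """Resolver for Query.getCharacter. Returns None when nothing matches,
    which is why the return type is `Character` and not `Character!`."""
    return CHARACTERS.get(name)

def resolve_friends(character):
    """Resolver for Character.friends: looks up each friend by name."""
    return [CHARACTERS[friend_name] for friend_name in character["friends"]]

def resolve_home_world(character):
    """Resolver for Character.homeWorld: follows the relation to Planet."""
    return PLANETS[character["home_world"]]

rick = resolve_get_character("Rick Sanchez")
print(resolve_home_world(rick)["name"])
```

Each function answers one question ("how do I fetch this field?"), and a GraphQL server calls only the resolvers for the fields that a client actually asks for.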

Query Language

Once you're done writing a GraphQL API, you can use one of many GraphQL clients to send operations to your server. Most GraphQL APIs come with a GraphQL playground that you can use to write queries, and click a button to send them to the server, and display the JSON result. But when you're building a graphical application, you would want to use a client library from your code. An example query would be:

query {
  rick:getCharacter(name: "Rick Sanchez") { # We gave the operation a 'rick' alias
    name # Rick's name, just to make sure
    friends {
      name # We only want their names
    }
    homeWorld {
      name # We only want the name
    }
  }
}

This query - if Rick exists in the database - would return:

{
  "rick": {
    "name": "Rick Sanchez",
    "friends": [],
    "homeWorld": {
      "name": "Earth"
    }
  }
}

Rick's friends array is empty because he has no friends.
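
From application code, a query like this is usually sent to the server as JSON in an HTTP POST request. The sketch below only builds the request using Python's standard library. The endpoint URL is a placeholder and build_request is a name made up for this example, so adjust both to your own server.

```python
import json
import urllib.request

# Placeholder endpoint, not a real server
GRAPHQL_URL = "http://localhost:3030/graphql"

QUERY = """
query {
  rick: getCharacter(name: "Rick Sanchez") {
    name
    friends { name }
    homeWorld { name }
  }
}
"""

def build_request(query, url=GRAPHQL_URL):
    """Wrap the query text in a JSON body, as GraphQL servers commonly expect."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_request(QUERY)
# Sending it would be: urllib.request.urlopen(request)
print(request.get_method(), request.full_url)
```

Note that GraphQL servers wrap the result in a top-level "data" key in the actual HTTP response, so a client would read the object above from response["data"].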

Where Does Pragma Come In?

When you're developing a GraphQL API, you need to specify the schema with all the required operations and types, and then implement resolvers for these operations. Of course, you would need to set up the database first.

Pragma only requires that you define a Pragma schema. That's it! It generates a database, a GraphQL schema, and all the operations you would need. Plus, with very simple syntax, you can extend the functionality of these operations using serverless functions that you import as if you were using a normal function from a library. Here's an example Pragma schema:

config { projectName = "character_api" }

@1 model Character {
  @1 name: String @primary
  @2 friends: [Character]
  @3 homeWorld: Planet
}

@2 model Planet {
  @1 name: String
  @2 climate: String
}

Note the small differences between the Pragma schema and the GraphQL schema. In Pragma, the equivalent of a GraphQL type is a model. Models and model fields need to have unique indices (e.g. @1, @5), which enables Pragma to perform most database migrations automatically based on changes to the schema.

You can explore the GraphQL API generated by Pragma by running the above schema (see Install Pragma), or you can check out another example at The Generated API section of the documentation.

If you have any questions or feedback, join our Discord channel, or leave a comment!

Go start building that new idea you had last week! It would only take a few minutes :)

Pragma: A Language for Building GraphQL APIs In No Time

Muhammad Tabaza — Thu, 29 Oct 2020 16:53:22 +0000

We're very excited to announce the first release of Pragma: An open-source domain-specific language for building GraphQL APIs by defining data models, and their associated validation/transformation, and authorization logic. Pragma takes your data model definitions, and automatically generates a fully functioning GraphQL API that you can use right away.

Motivation

Building a GraphQL API isn't a simple task. Writing a small API to create, read, update, and delete data in a database can take many hours, and lots of knowledge of the GraphQL framework and the language you're using.

Pragma aims to simplify this process by being incredibly easy to learn, fast to work in, trivial to set up, and very easy to maintain.

What Does Pragma Offer?

Pragma offers a way to very quickly build incredibly powerful, and extensible APIs. It supports use of serverless functions written in many languages for data validation and transformation, and also in user authorization, which is built into the language. These languages include JavaScript, Python, Go, Swift, Rust, Ruby, PHP, Java, Scala, and Ballerina.

How Can I Use It?

You can visit the documentation and read the Getting Started section to install Pragma, and follow a tutorial where you get to build a basic Todo application.

How Can I Contribute?

You can help by opening GitHub issues for any bugs you come across, or opening a pull request to improve documentation. You can read the contributing section in the README to learn how to start hacking on Pragma itself. Any help is greatly appreciated.

How Do I Stay In Touch?

You can follow Pragma on Twitter @pragmalang, and here on DEV. You can also join our Discord server for a chat. We'd love to talk to you guys and learn from your experiences.

We truly hope you enjoy the development experience we're creating as much as we enjoy working on it! Happy hacking everyone!

Parsing The World with Rust and POM

Muhammad Tabaza — Wed, 14 Aug 2019 12:09:16 +0000

As programmers, we spend a lot of time dealing with strings of text. Very often, we receive text as input from users, or we read text files and try to understand their content.

In many cases, we use regular expressions to see if the text matches a certain pattern, or to extract some information from the string. For example, if you receive a username as form input, and you want to make sure that it doesn't contain any spaces, you can use a regex like "^\S+$" to match a string with one or more non-white space characters (^ denotes the beginning of the string, \S denotes a non-white space character, + denotes repetition for one or more times, and $ denotes the end of the string). You might even write a function like:

fn has_whitespaces(s: String) -> bool {
  s.contains(' ')
}

Can you spot the bug in the code?
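
For comparison, the regex check described above can be sketched in Python. The function name is_valid_username is made up for this example. re.fullmatch plays the role of ^ and $ by requiring the whole string to match.

```python
import re

def is_valid_username(username):
    """True when the username is one or more non-whitespace characters."""
    return re.fullmatch(r"\S+", username) is not None

print(is_valid_username("rick"))           # no whitespace at all
print(is_valid_username("rick sanchez"))   # contains a space
print(is_valid_username("rick\tsanchez"))  # contains a tab
```

The third call is worth a second look: \S rejects every kind of whitespace, not just the space character.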

But what if you're trying to parse the contents of more complex strings? Say, a JSON string, a CSV (Comma-Separated Values) file, or even a program? If you don't use existing library functions, you're going to have a hard time using regular expressions and string methods.

In this article, we'll be exploring the POM library, which offers a really cool interface for intuitively defining and combining parsers. Using this library, you can conveniently define the entire grammar of a formal language (perhaps your own new programming language?)

POM is a parser combinator library based on PEG (Parsing Expression Grammar), which is a way of defining a formal language in terms of symbols, string expressions, and rules.

For example, if we were to define a rule for parsing a variable definition such as let x = 1; we would say that let is a sequence of symbols; =, ;, and spaces are symbols; and x and 1 are string expressions. A variable definition consists of the let sequence, followed by a space, followed by a valid identifier (x, for example), followed by zero or more spaces, followed by the = symbol, followed by zero or more spaces, followed by a valid expression (1 in this example), followed by zero or more spaces and the ; symbol. A bit verbose, isn't it? Such are the definitions of formal languages.

Thankfully, POM provides a very declarative way of defining these rules. It allows us to define different rules for parsing each small piece of text, and then combine them using arithmetic and logical operators to form more complex rules. These operators are called parser combinators.
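
If parser combinators are new to you, the idea can be sketched in a few lines of Python. This is not how POM is implemented; it is only an illustration, and the names is_a, both, and many are chosen here to loosely mirror POM's is_a, +, and repeat(0..). A parser is just a function that takes the input and a position, and returns a value with the new position, or None when it doesn't match.

```python
def is_a(predicate):
    """Parser that matches one character satisfying the predicate."""
    def parse(text, pos):
        if pos < len(text) and predicate(text[pos]):
            return text[pos], pos + 1
        return None
    return parse

def both(left, right):
    """Run two parsers in sequence and keep both results as a tuple."""
    def parse(text, pos):
        first = left(text, pos)
        if first is None:
            return None
        left_value, pos = first
        second = right(text, pos)
        if second is None:
            return None
        right_value, pos = second
        return (left_value, right_value), pos
    return parse

def many(parser):
    """Apply a parser zero or more times and collect the results."""
    def parse(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            if result is None:
                return values, pos
            value, pos = result
            values.append(value)
    return parse

# One letter followed by zero or more letters or digits.
# Note: str.isalpha and str.isalnum also accept non-ASCII letters.
identifier = both(is_a(str.isalpha), many(is_a(str.isalnum)))
print(identifier("v1 = 42", 0))
```

Small parsers go in, a bigger parser comes out, and the combined parser can itself be combined further.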

POM defines a type called: Parser, which is used to encode parsing rules. A parser can be constructed using many of the pre-defined parsers that the library provides, such as the sym and seq functions. The above example of a variable definition can be expressed in POM as:

use pom::char_class::*;
use pom::parser::*;

fn variable_def<'a>() -> Parser<'a, u8, (String, u32)> {
  let valid_id = (is_a(alpha) + is_a(alphanum).repeat(0..))
    .map(|(first, rest)| format!("{}{}", first as char, String::from_utf8(rest).unwrap()));
  let valid_expr = one_of(b"0123456789")
    .repeat(1..10)
    .convert(String::from_utf8)
    .convert(|s| s.parse::<u32>());
  seq(b"let") * sym(b' ') * valid_id - sym(b' ') - sym(b'=') 
    - sym(b' ').repeat(0..) + valid_expr - sym(b';')
}

Let's break down the code:
1 - We import all the contents of pom::char_class, which exports
predicates such as alpha, and alphanum. We use these predicates to
test the type of input characters.
pom::parser exports seq, and sym among other functions that return
Parsers.

use pom::char_class::*;
use pom::parser::*;

2 - The variable_def function takes no arguments, and returns a Parser
with the lifetime 'a that accepts u8s (character bytes) as input, and
outputs a tuple of (String, u32). The tuple encodes the variable's
identifier, and its value (we limit valid values to non-negative
integers of at most 9 digits for simplicity's sake).

fn variable_def<'a>() -> Parser<'a, u8, (String, u32)>

3 - A valid variable identifier consists of at least one alphabetic character, followed by zero or more alphanumeric characters. We use the is_a parser to test that the first character of the variable id is_a(alpha), and that the rest of the id is_a(alphanum).
Notice how the + operator is used to combine the two parts of the variable identifier. It returns a tuple containing the results of both operands, which we then destructure in the map method call. map is used to transform the result of a parser, and return a new parser. In this case, (is_a(alpha) + is_a(alphanum).repeat(0..)) returns a Parser<'a, u8, (u8, Vec<u8>)> (the first character and the rest of the characters), which we then transform into a Parser<'a, u8, String> (the whole identifier) by concatenating the first character and the rest of the characters using format!.

let valid_id = (is_a(alpha) + is_a(alphanum).repeat(0..))
    .map(|(first, rest)| format!("{}{}", first as char, String::from_utf8(rest).unwrap()));

4 - A valid expression is defined as 1 to 9 digits converted to a String, which is then parsed as a u32. convert is used here instead of map because String::from_utf8 returns a Result, so the parser wouldn't match the expression if the result of String::from_utf8 is Err.

let valid_expr = one_of(b"0123456789")
    .repeat(1..10)
    .convert(String::from_utf8)
    .convert(|s| s.parse::<u32>());
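
The 9-digit limit is what keeps the second convert from failing on input that already matched: the largest 9-digit number is 999,999,999, which is below the maximum u32 value of 4,294,967,295. The same rule can be roughly sketched in Python (parse_value is a made-up name, and this is not POM code):

```python
import re

U32_MAX = 2**32 - 1  # 4,294,967,295

def parse_value(text):
    """Return the integer for a string of 1 to 9 ASCII digits, else None."""
    if re.fullmatch(r"[0-9]{1,9}", text) is None:
        return None
    return int(text)

# Every value that passes the digit check fits in a u32
assert 999_999_999 <= U32_MAX

print(parse_value("42"))
print(parse_value("1234567890"))  # 10 digits, so it is rejected
```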

5 - The returned parser is a combination of the symbols mentioned above mixed with some spaces, and a valid identifier and expression. The * operator combines two parsers and returns the result of the right-hand parser. So Parser<'a, u8, T> * Parser<'a, u8, U> = Parser<'a, u8, U>. We also see the - operator in use, which combines two parsers and returns a parser with the value of the left-hand parser. These operators follow the same precedence rules as normal arithmetic operators in Rust. Note that, to keep the example short, this parser is stricter than the rule described earlier: it expects exactly one space between the identifier and the = symbol, and no spaces before the ; symbol.

seq(b"let") * sym(b' ') * valid_id - sym(b' ') - sym(b'=') 
    - sym(b' ').repeat(0..) + valid_expr - sym(b';')

Take a look at this handy table of parser combinators.

Now to test our parser:

#[test]
fn test_variable_parser() {
  let valid_variable_byte_string = b"let v1 = 42;";
  let parsed_variable = variable_def().parse(valid_variable_byte_string);
  assert_eq!(parsed_variable, Ok(("v1".to_string(), 42)));

  let invalid_variable_byte_string = b"let nosemicolon = 42";
  let parsed_variable = variable_def().parse(invalid_variable_byte_string);
  assert_eq!(parsed_variable, Err(pom::Error::Incomplete));

  let invalid_variable_byte_string = b"let morethan9digits = 1234567890;";
  let parsed_variable = variable_def().parse(invalid_variable_byte_string);
  assert_eq!(
    parsed_variable,
    Err(pom::Error::Mismatch {
      message: "expect: 59, found: 48".to_string(),
      position: 31
    })
  );
}

Using POM to parse a variable definition of a positive integer value might seem overkill, but try to imagine using it to parse an entire programming language.

I hope you found this article helpful. Now enjoy parsing every string in your way.

Special thanks to Junfeng Liu and all contributors for creating an amazing library. You deserve a cookie 🍪

Machine Learning: From Zero to Slightly Less Confused

Muhammad Tabaza — Sat, 25 May 2019 13:28:08 +0000

When I started my Computer Science studies three years ago, Machine Learning seemed like one of those tools that only brilliant scientists and Mathematicians could understand (let alone use to solve day-to-day problems). Whenever I heard the words "Machine Learning", I imagined a high tower with dark clouds above it, and a dragon guarding it. I think the main reason for this irrational fear is that the field is an intersection of so many disciplines that I had no idea about (e.g. Statistics, Probability, Computer Science, Linear Algebra, Calculus, and even Game theory).

I know it's not just me. It's no wonder people are afraid of Machine Learning: people don't like Math! Yet understanding some of the very basic Mathematics behind Machine Learning will not only give you a good sense of how it works, it will also get you far as a Machine Learning practitioner. And who knows, maybe you'll grow to like the Math, like me.

In this article, I'll attempt to give you a better understanding of what Machine Learning really is, and hopefully get rid of any fear of the subject you've been building up. Getting started solving real world problems using Machine Learning can be much easier than many are led to believe.

Machine Learning (ML) is the science of making machines perform specific tasks, without explicitly writing the algorithm for performing the tasks. Another definition would be making the machine learn how to perform some task from experience, taking into account some performance measure (how well it performs the task).

Let's consider these two popular problems:

Given some features of a breast tumor (i.e. its area and smoothness), predict whether the tumor is malignant or benign.
Given the median monthly income of a city block in California, predict the median house price in that block.

Problem 1: Tumor Classification

Let's see. We are using two features of a tumor to determine whether it is malignant or benign. How can we go about solving this problem?

Well, we can try to come up with some logic to decide the class of the tumor. Maybe something like:

def tumor_class(tumor):
  area = tumor[0]
  smoothness = tumor[1]
  if area < 110 and smoothness < 0.07:
    return 'Malignant'
  elif area > 110 and smoothness < 0.07:
    return 'Benign'
  elif area < 110 and smoothness > 0.07:
    return 'Malignant'
  else:
    return 'Benign'

You can find and experiment with all the code on Google Colab.

But how can we know these threshold values (110 and 0.07)? How accurate is this algorithm? What if we had to use more than two features to predict the tumor's class? What if a tumor could belong to one of three or four classes? The program would become way too difficult for a human to write or read.

Let's say we have a table of 569 breast tumors that has three columns: the area, the smoothness, and the class (type) of tumors. Each row of the table is an example of an observed tumor. The table looks like this:

Area	Smoothness	Class
594.2	0.12480	1.0
1007.0	0.10010	0.0
611.2	0.08458	1.0
...	...	...

A row of the table can be called an example, an instance, or a tuple. A column of the table can be called a feature.
In ML, the feature we want to predict is often called the target, or label.

Never mind the measurement of the area and smoothness, but pay attention to the Class column. In scikit-learn's copy of this dataset, class 0 represents "Malignant", and class 1 represents "Benign".

Alright, now that we have some data, we can plot it and see if that'll help us:

The X axis represents the area of the tumor, while the Y axis represents its smoothness. Each data point (tumor) is colored orange if it's malignant, or green if it's benign.

Notice how the two classes are roughly separated. Maybe we can draw a line that (roughly) separates the two classes (any tumor under the line is malignant, and any above the line is benign):

But what about the tumors that are misclassified? There are green points under the line, and orange points above it. If drawing a straight line is all we'll do, then we need to modify the line's equation in order to minimize the error.

Any straight line has the form: y = ax + b. Which means we can keep modifying a and b until the number of misclassified tumors is at its minimum. This is called the training process. We are using our data (experience) to learn the task of predicting tumor classes, with regard to how often we misclassify tumors.
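
Here is a deliberately naive sketch of that idea in Python: try many values of a and b, count the misclassified points for each, and keep the best pair. The data points are made up for illustration, and label 1 simply means "this point should fall under the line". Real libraries don't search like this; they typically minimize a smooth error function using much more efficient optimization methods.

```python
# Made-up toy data: (x, y, label). Label 1 = should be under the line, 0 = above.
points = [
    (1.0, 0.5, 1), (2.0, 1.0, 1), (2.5, 1.5, 1),
    (3.0, 4.5, 0), (1.5, 3.0, 0), (0.5, 2.0, 0),
]

def misclassified(a, b, points):
    """Count the points that fall on the wrong side of y = a*x + b."""
    errors = 0
    for x, y, label in points:
        predicted = 1 if y < a * x + b else 0
        if predicted != label:
            errors += 1
    return errors

# Candidate weights: a from -2 to 2 and b from -2 to 4, in steps of 0.5
candidates = [(a / 2, b / 2) for a in range(-4, 5) for b in range(-4, 9)]

# "Training": keep the candidate with the fewest misclassified points
best_a, best_b = min(candidates, key=lambda w: misclassified(w[0], w[1], points))
print(best_a, best_b, misclassified(best_a, best_b, points))
```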

a and b are called weights. a and x can be vectors depending on the number of features we're using to predict y. In our case, the line's equation can be written as y = a[1]*x[1] + a[2]*x[2] + b, where a[1] is the weight of the first feature (x[1], the area), and a[2] is the weight of the second feature (x[2], the smoothness).

The goal of the training process is to learn a function of the training features that predicts the target. Concretely, the function learned from training on our tumor data is a function that takes two arguments (area and smoothness), and returns the class of the tumor (0 or 1). This function is called the model.

Once the model is trained, we can start making predictions on new (previously unseen) breast tumors.

This entire process can be done in 13 lines of simple Python code:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

cancer_data = load_breast_cancer()

# Despite its name, LogisticRegression is actually a classification model
classifier = LogisticRegression(solver='lbfgs', max_iter=5000)
# Columns 3 and 4 are the mean area and the mean smoothness
classifier.fit(cancer_data.data[:,[3, 4]], cancer_data.target)

def tumor_type(tumors):
  predictions = classifier.predict(tumors)
  # In this dataset, target 0 is malignant and target 1 is benign
  print(['Benign' if label == 1 else 'Malignant' for label in predictions])

tumor_type([
    [50, 0.06],
    [1500, 0.1], # Prints out: 
    [200, 0.04]  # ['Benign', 'Malignant', 'Benign']
])

This example uses scikit-learn, a very popular Python ML library. But you're not limited to scikit-learn or Python. You can do ML in any language you like. R and MATLAB are pretty popular choices.

In ML, the problems where your goal is to predict a discrete label (e.g. spam/not spam, male/female, or malignant/benign) are called classification problems. Our tumor classification problem is more specifically a binary classification problem (the output is one of only two classes).

Since we used a line to separate the two classes and predict the class of any new tumor, our model is called a linear model.
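
To show what "linear" means in practice, here is a sketch of how a trained linear classifier turns features into a class. The weights and bias below are invented for the example; real values come out of the training process. Logistic regression passes the weighted sum through the sigmoid function to get a probability, and picks class 1 when that probability is at least 0.5.

```python
import math

# Invented weights (one per feature: area, smoothness) and bias
WEIGHTS = [-0.01, -5.0]
BIAS = 7.0

def score(features):
    """The weighted sum a[1]*x[1] + a[2]*x[2] + b."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(features):
    """Return class 1 when the estimated probability is at least 0.5."""
    return 1 if sigmoid(score(features)) >= 0.5 else 0

print(predict([50, 0.06]), predict([1500, 0.1]))
```

The set of points where the score is exactly zero is a straight line, and that line is the boundary between the two classes.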

Now let's look at a regression problem.

Problem 2: Predicting House Prices

Suppose that you have a dataset that contains 17,000 records of houses in California. Given the median monthly income of a city block, you are tasked with predicting the median house value in that block.

Let's start by plotting the data that we have:

The X axis represents the median block income in thousands, and the Y axis represents the median house price of the block (in U.S. dollars).

Notice how we can roughly represent the relation between the income and price as a straight line:

What we can do now is modify our line's equation to get the most accurate result possible.

Again, we can do all of this with a few lines of Python code:

from sklearn.linear_model import LinearRegression
import pandas as pd

house_data = pd.read_csv('sample_data/california_housing_train.csv')
house_target = house_data['median_house_value']
house_data = house_data['median_income'].to_numpy().reshape(-1, 1)

regressor = LinearRegression().fit(house_data, house_target)

def house_price(incomes):
  print(regressor.predict([[i] for i in incomes]).tolist())

house_price([2, 7, 8]) 
# Prints out: [127385.28173581685, 338641.43861720286, 380892.66999348]
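
LinearRegression fits the line using ordinary least squares. With a single feature, the best a and b can be computed directly, which the following sketch does in plain Python. The fit_line name and the four data points are made up for illustration (the points were chosen to lie exactly on y = 2x + 1).

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    a = covariance / variance   # the slope
    b = mean_y - a * mean_x     # the intercept
    return a, b

# Made-up data lying exactly on y = 2x + 1
incomes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]

a, b = fit_line(incomes, prices)
print(a, b)
```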

Now you might be saying "A straight line doesn't fit this data!", and I would agree with you. There are many things we can do to improve the performance of this model, like getting rid of some of the outliers in the data:

Outliers can skew the training process. We could also look for a feature that better relates to the price, or use multiple features of the houses, which turns the line into a plane (or a hyperplane in more dimensions). We can also scale the features to speed up the training process. We can even use a different kind of model.

Many steps can be taken before even starting to train a model that will immensely improve its performance (e.g. feature engineering and preprocessing). One might even decide that they don't have the right data for their purposes, so they start collecting it.
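
As a small taste of preprocessing, the sketch below removes outliers and then scales a feature so that it has a mean of 0 and a standard deviation of 1. The function names, the cutoff of 1.5 standard deviations, and the sample values are all arbitrary choices made for this example, not a recommendation for real data.

```python
from statistics import mean, pstdev

def drop_outliers(values, k=1.5):
    """Keep only the values within k standard deviations of the mean."""
    center, spread = mean(values), pstdev(values)
    return [v for v in values if abs(v - center) <= k * spread]

def standardize(values):
    """Rescale the values to have a mean of 0 and a standard deviation of 1."""
    center, spread = mean(values), pstdev(values)
    return [(v - center) / spread for v in values]

# Made-up incomes with one obvious outlier
incomes = [2.0, 3.5, 4.0, 5.0, 15.0]

cleaned = drop_outliers(incomes)
scaled = standardize(cleaned)
print(cleaned)
print(scaled)
```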

This problem is an example of a regression problem, which is a problem where the result of the prediction is a value belonging to a continuous range of values (e.g. price in Dollars, age in years, or distance in meters).

The two problems we looked at are examples of supervised ML problems, which are essentially the problems where the data used is labeled, meaning the target feature's values are known in the training data (e.g. our tumor data was labeled malignant/benign, and our house data was labeled with the price). What would we do if our data isn't labeled?

I hope you're starting to see the big picture. ML is wide and deep, and it can get very difficult. But the basics are just that: basics.

If I've managed to spark your interest in the subject, then I'd like to point you to a few places where you can learn much more:

Stanford's Machine Learning course on Coursera, which is also available on YouTube
Khan Academy for all the basic Math
Google's Machine Learning Crash Course
O'Reilly: Data Science from Scratch
O'Reilly: Introduction to Machine Learning with Python
Google Colaboratory: a fully hosted Jupyter environment (you don't need to install or set up anything, just do it all here)

I found these resources very helpful. Pick and choose whichever feels comfortable for you.

I hope you found this article helpful, and I would love to read your opinions in the comments.