<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Timilehin Okunola</title>
    <description>The latest articles on Forem by Timilehin Okunola (@oktimmy).</description>
    <link>https://forem.com/oktimmy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1400936%2Fd9e1924b-00d6-49c3-a11a-218fde087f56.jpg</url>
      <title>Forem: Timilehin Okunola</title>
      <link>https://forem.com/oktimmy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/oktimmy"/>
    <language>en</language>
    <item>
      <title>How To Scrape Web Applications Using Puppeteer</title>
      <dc:creator>Timilehin Okunola</dc:creator>
      <pubDate>Fri, 12 Jul 2024 19:43:20 +0000</pubDate>
      <link>https://forem.com/oktimmy/how-to-scrape-web-applications-using-puppeteer-pnk</link>
      <guid>https://forem.com/oktimmy/how-to-scrape-web-applications-using-puppeteer-pnk</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Website scraping opens up a wide range of possibilities for extracting data from websites for purposes such as analysis, content monitoring, web archiving and preservation, and research. Web scraping is an automated task, and Puppeteer, a popular Node.js library for controlling headless Chrome/Chromium, is a powerful tool for it.&lt;/p&gt;

&lt;p&gt;Scraping multiple web pages simultaneously can be difficult, so we will also use the puppeteer-cluster package.&lt;/p&gt;

&lt;p&gt;In this tutorial, we will use Puppeteer to scrape books.toscrape.com, a website built specifically for scraping practice. We will use the puppeteer-cluster package to scrape the details of the first 100 books on the site.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisite
&lt;/h3&gt;

&lt;p&gt;To follow along with this tutorial, you need to have the following installed on your PC:&lt;br&gt;
Node.js &amp;gt;= version 16&lt;br&gt;
npm&lt;br&gt;
A code editor&lt;/p&gt;

&lt;p&gt;You also need to have a basic knowledge of JavaScript.&lt;/p&gt;
&lt;h4&gt;
  
  
  Set Up Puppeteer
&lt;/h4&gt;

&lt;p&gt;Install the package Puppeteer by running the command below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install puppeteer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, create a file called index.js.&lt;br&gt;
Then paste the code below into index.js to set up Puppeteer and take a screenshot of the website's first page.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
const puppeteer = require("puppeteer");

(async () =&amp;gt; {
  const browser = await puppeteer.launch({ protocolTimeout: 600000 });
  const page = await browser.newPage();

  await page.goto(`https://books.toscrape.com/index.html`, {
    timeout: 60000,
  });

  // Set screen size
  await page.setViewport({ width: 1080, height: 1024 });
  await page.screenshot({ path: "homepage.png" });

  await browser.close();
})();


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now run the command below in your terminal to see the result.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;node index
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the code executes, you will see that a new image file called homepage.png has been created in the project's root folder. It contains a screenshot of the website's first landing page.&lt;/p&gt;

&lt;p&gt;Now, let us scrape the website properly.&lt;/p&gt;

&lt;h3&gt;
  
  
  How To Grab Selectors From a Website
&lt;/h3&gt;

&lt;p&gt;To scrape the website, you must grab selectors pointing to each element you want to scrape data from. &lt;/p&gt;

&lt;p&gt;To do this, &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open your browser&lt;/li&gt;
&lt;li&gt;Navigate to the webpage from which you want to scrape data; for this tutorial, we will visit the Books to Scrape website.&lt;/li&gt;
&lt;li&gt;Right-click on the item you wish to scrape and click Inspect, as shown below.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq22a5m0qcal7pnj30kii.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq22a5m0qcal7pnj30kii.png" alt="Image describing how to select attributes of an item on a webpage" width="800" height="375"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This opens the developer tools, displaying the web page's HTML source document with the inspected element highlighted.&lt;/li&gt;
&lt;li&gt;In the dev tools, right-click on the element from which you wish to scrape data. This opens a context menu.&lt;/li&gt;
&lt;li&gt;Hover over the Copy option, and a submenu pops up beside it. Select Copy selector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37biz4qlhpf89c716tir.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37biz4qlhpf89c716tir.png" alt="Image describing how to select attributes of an item on a webpage" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This copies the exact path to the element. However, you can edit the path based on your understanding of the page’s HTML document.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scrape The First Book On the Page
&lt;/h3&gt;

&lt;p&gt;Grab the selector for the first book's div to scrape the first book. Then, grab the element's content using the &lt;em&gt;$eval&lt;/em&gt; method. This method expects two arguments: the element's path selector and a callback function in which you define the property you need.&lt;/p&gt;

&lt;p&gt;Below is a demo of implementing the &lt;em&gt;$eval&lt;/em&gt; method.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const firstBook = await page.$eval(
  "#default &amp;gt; div &amp;gt; div &amp;gt; div &amp;gt; div &amp;gt; section &amp;gt; div:nth-child(2) &amp;gt; ol &amp;gt; li:nth-child(1) &amp;gt; article",
  (e) =&amp;gt; e.innerHTML
);

console.log(firstBook);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add this snippet to the demo we wrote earlier, just before the &lt;code&gt;browser.close&lt;/code&gt; call. When you run the scraper in the terminal, the HTML within the article element is printed to the console.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scrape Multiple Books
&lt;/h3&gt;

&lt;p&gt;Using the &lt;em&gt;$$eval&lt;/em&gt; method, it is possible to scrape multiple elements, such as the items of an ol or ul list. The &lt;em&gt;$$eval&lt;/em&gt; method expects two arguments: the selector of the parent element containing the listed items and a callback function that maps over the array of matched elements and grabs the specified data from each one. The method returns an array of the extracted data.&lt;/p&gt;

&lt;p&gt;Below is a demo of how to do that with the books on the first page of the Books to Scrape website.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const booksArray = await page.$$eval(
  "#default &amp;gt; div &amp;gt; div &amp;gt; div &amp;gt; div &amp;gt; section &amp;gt; div:nth-child(2) &amp;gt; ol&amp;gt; li",
  (elements) =&amp;gt;
    elements.map((el) =&amp;gt; {
      const bookTitle = el.querySelector("h3&amp;gt; a").getAttribute("title");
      return bookTitle;
    })
);

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scrape Data From The First 100 Books on the website
&lt;/h3&gt;

&lt;p&gt;In this section, we will scrape the first 100 books on this website. This website has 50 pages, and each page contains 20 books. This means we will be scraping through the first 5 pages of the website. &lt;/p&gt;

&lt;p&gt;To do this, replace the contents of index.js with the code below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const puppeteer = require("puppeteer");

(async () =&amp;gt; {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  let flattenedArray;
  const bookDataArray = [];
  for (let index = 1; index &amp;lt;= 5; index++) {
    if (index === 1) {
      // Navigate the page to a URL
      await page.goto(`https://books.toscrape.com/index.html`, {
        timeout: 60000,
      });

      //Take screenshot of each page
      await page.screenshot({ path: `images/page-${index}.png` });
    } else {
      // Navigate the page to a URL
      await page.goto(
        `https://books.toscrape.com/catalogue/page-${index}.html`,
        {
          timeout: 60000,
        }
      );

      await page.screenshot({ path: `images/page-${index}.png` });
    }

    const booksArray = await page.$$eval(
      "#default &amp;gt; div &amp;gt; div &amp;gt; div &amp;gt; div &amp;gt; section &amp;gt; div:nth-child(2) &amp;gt; ol&amp;gt; li",
      (elements) =&amp;gt;
        elements.map((el, i) =&amp;gt; {
          const bookTitle = el.querySelector("h3&amp;gt; a").getAttribute("title");
          const bookPrice = el.querySelector("p.price_color").innerText;
          const imageLink = el.querySelector("img").getAttribute("src");
          const inStock = el.querySelector("p.availability").innerText;

          const bookDetailsLink = el
            .querySelector("h3&amp;gt; a")
            .getAttribute("href");

          const data = {
            i,
            title: `${bookTitle}`,
            detailsLink: `${bookDetailsLink}`,
            price: `${bookPrice}`,
            image: `https://books.toscrape.com/${imageLink}`,
            availability: `${inStock}`,
          };

          return data;
        })
    );

    //Add an index number to each book detail.
    const updatedBookNoInDataArray = booksArray.map((e) =&amp;gt; {
      return {
        ...e,
        i: index == 1 ? e.i + 1 : (index - 1) * 20 + e.i + 1,
      };
    });

    bookDataArray.push(updatedBookNoInDataArray);

    //Flatten out the array here
    flattenedArray = [].concat(...bookDataArray);

  }

  await browser.close(); 
})();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above code snippet, we first declared a &lt;em&gt;flattenedArray&lt;/em&gt; and a &lt;em&gt;bookDataArray&lt;/em&gt; to store the data we scrape. The &lt;em&gt;bookDataArray&lt;/em&gt; will contain an array of arrays, which we then flatten into the &lt;em&gt;flattenedArray&lt;/em&gt; variable.&lt;/p&gt;

&lt;p&gt;We then loop over the first 5 pages, changing the page number in the URL on each iteration. The first page lives at index.html, so we handle it separately; every subsequent page follows the catalogue/page-&lt;em&gt;N&lt;/em&gt;.html pattern.&lt;/p&gt;

&lt;p&gt;Then, on each page, we use the &lt;em&gt;$$eval&lt;/em&gt; function to grab the array of books. For each book item, we get the following data: the title, the price, the link to the cover image, the link to the description page, and the availability of the book. &lt;/p&gt;

&lt;p&gt;So, each page returns an array of 20 items; at the end of each iteration, the &lt;em&gt;booksArray&lt;/em&gt; contains 20 items. We then map over the &lt;em&gt;booksArray&lt;/em&gt; to give each item a sequential index based on the page it was scraped from.&lt;/p&gt;

&lt;p&gt;Each &lt;em&gt;booksArray&lt;/em&gt; is then pushed into the &lt;em&gt;bookDataArray&lt;/em&gt;, which ends up holding five arrays of 20 book items each. Finally, we flatten this into a single array, the &lt;em&gt;flattenedArray&lt;/em&gt;. &lt;/p&gt;
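&lt;p&gt;The indexing and flattening logic can be exercised on its own in plain Node.js with fake page data; the sketch below is only illustrative and involves no scraping.&lt;/p&gt;

```javascript
// Standalone sketch of the scraper's index-and-flatten step, using fake
// page data instead of scraped books (no Puppeteer required).
const pages = [1, 2, 3, 4, 5];
const bookDataArray = [];

pages.forEach((index) => {
  // Pretend each page yielded 20 items whose `i` runs 0..19,
  // just as $$eval's map callback produces.
  const booksArray = Array.from({ length: 20 }, (_, i) => ({ i }));

  // Same formula as the scraper: page 1 keeps i + 1,
  // later pages are offset by 20 per preceding page.
  const updated = booksArray.map((e) => ({
    ...e,
    i: index === 1 ? e.i + 1 : (index - 1) * 20 + e.i + 1,
  }));

  bookDataArray.push(updated);
});

// Flatten the five 20-item arrays into one 100-item array.
const flattenedArray = [].concat(...bookDataArray);

console.log(flattenedArray.length); // 100
console.log(flattenedArray[0].i, flattenedArray[99].i); // 1 100
```

&lt;p&gt;Running it prints 100 and the first and last indices, 1 and 100, confirming the numbering stays sequential across pages.&lt;/p&gt;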

&lt;p&gt;If you log the &lt;em&gt;flattenedArray&lt;/em&gt; to the console and run the script, you should have a single array of 100 items logged to the console. Each item would be an object with the following keys: i, &lt;em&gt;title&lt;/em&gt;, &lt;em&gt;detailsLink&lt;/em&gt;, &lt;em&gt;price&lt;/em&gt;, &lt;em&gt;image&lt;/em&gt;, and &lt;em&gt;availability&lt;/em&gt;. You would also notice the index of each object starts at 1 and ends at 100.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scrape The Book Description Data For Each of The 100 Books
&lt;/h2&gt;

&lt;p&gt;In this section, we will scrape the book description data for each of the 100 books using the details link. To do this, we will be using another puppeteer package called &lt;code&gt;puppeteer-cluster&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To get started, install the package by running the command below in your terminal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install puppeteer-cluster
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, import the package into your index file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const { Cluster } = require("puppeteer-cluster");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, at the bottom of the script, before the &lt;em&gt;browser.close&lt;/em&gt; call, declare a new array to store the data we will scrape from each book details page.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;//some code

const addedData = [];
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Initialize the cluster instance by pasting the code below in the script.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_PAGE,
    maxConcurrency: 100,
    timeout: 10000000,
  });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code snippet above shows how we set the concurrency to &lt;strong&gt;CONCURRENCY_PAGE&lt;/strong&gt;. This means that each worker in the cluster will have its own separate Puppeteer instance with a single page. This allows parallel task execution on different web pages.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;maxConcurrency&lt;/em&gt; is set to 100. This means that we will have a maximum of 100 workers running simultaneously. We set it to 100 because we intend to work with just 100 different pages.&lt;/p&gt;

&lt;p&gt;The timeout option sets the timeout duration for tasks the cluster workers execute. This defines the maximum amount of time a worker has to complete a task before it is considered timed out and potentially restarted. The value is specified in milliseconds (ms). Here, it is set to 10,000,000 ms, which is very high: 10,000 seconds, or roughly 2.8 hours.&lt;/p&gt;

&lt;p&gt;Next, declare a callback function that acts as an event listener. This function handles errors that may occur when the cluster is executing each of our pages and logs the error message to the console.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;//Catch any error that occurs when you scrape a particular page and log it to the console.
 cluster.on("taskerror", (err, data) =&amp;gt; {
    console.log(`Error Crawling ${data}: ${err.message}`);
  });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, write the function you need the scraper to execute on each page by pasting the code below into the main scraper script.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;//Describe what you want the scraper to do on each page here
  await cluster.task(async ({ page, data: url }) =&amp;gt; {
    await page.goto(url, { timeout: 100000 });
    const details = await page.$eval("#content_inner &amp;gt; article &amp;gt; p", (el) =&amp;gt; {
      if (el === undefined) {
        return "";
      } else {
        return el.innerText;
      }
    });

    const tax = await page.$eval(
      "#content_inner &amp;gt; article &amp;gt; table &amp;gt; tbody &amp;gt; tr:nth-child(5) &amp;gt; td",
      (el) =&amp;gt; {
        if (el === undefined) {
          return "";
        } else {
          return el.innerText;
        }
      }
    );
    const noOfleftInStock = await page.$eval(
      "#content_inner &amp;gt; article &amp;gt; table &amp;gt; tbody &amp;gt; tr:nth-child(6) &amp;gt; td",
      (el) =&amp;gt; {
        if (el === undefined) {
          return "";
        } else {
          return el.innerText;
        }
      }
    );
    addedData.push({ details, noOfleftInStock, tax });
  });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside the task, we defensively check whether the targeted element exists and return an empty string if it does not. (In practice, if a selector matches nothing, &lt;em&gt;$eval&lt;/em&gt; rejects, and the error is logged by the &lt;em&gt;taskerror&lt;/em&gt; listener declared earlier.) The task then pushes an object containing the book's details, the number left in stock, and the tax.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for (const url of flattenedArray) {
    if (url.detailsLink.startsWith("catalogue/")) {
      await cluster.queue(`https://books.toscrape.com/${url.detailsLink}`);
    } else {
      await cluster.queue(
        `https://books.toscrape.com/catalogue/${url.detailsLink}`
      );
    }
  }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, we loop over each book item in the flattened array. If the details link already begins with “catalogue/”, we queue the URL formed by concatenating it with the root URL; if it does not, we insert the &lt;em&gt;catalogue&lt;/em&gt; path segment between the root URL and the link. This is because the &lt;em&gt;catalogue&lt;/em&gt; path is required in the URL to reach the book details page.&lt;/p&gt;
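&lt;p&gt;This branching can be tested in isolation. Below is a minimal sketch with hypothetical &lt;em&gt;detailsLink&lt;/em&gt; values; the helper name &lt;em&gt;buildBookUrl&lt;/em&gt; is ours, not part of the scraper.&lt;/p&gt;

```javascript
// Minimal sketch of the URL-building branch, using hypothetical
// detailsLink values rather than real scraped data.
const root = "https://books.toscrape.com";

function buildBookUrl(detailsLink) {
  // Links scraped from the home page already include the catalogue/ prefix;
  // links from catalogue pages are relative to catalogue/ and lack it.
  if (detailsLink.startsWith("catalogue/")) {
    return `${root}/${detailsLink}`;
  }
  return `${root}/catalogue/${detailsLink}`;
}

console.log(buildBookUrl("catalogue/some-book_1/index.html"));
// https://books.toscrape.com/catalogue/some-book_1/index.html
console.log(buildBookUrl("some-book_1/index.html"));
// https://books.toscrape.com/catalogue/some-book_1/index.html
```

&lt;p&gt;Both shapes of link resolve to the same details-page URL, which is exactly what the queueing loop relies on.&lt;/p&gt;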

&lt;p&gt;Next, add the lines below to the code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;await cluster.idle();
await cluster.close();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;em&gt;&lt;strong&gt;idle&lt;/strong&gt;&lt;/em&gt; method instructs the cluster to wait for all currently running and queued tasks to finish. This ensures that all scraping activities initiated by the &lt;em&gt;&lt;strong&gt;queue&lt;/strong&gt;&lt;/em&gt; method are completed before proceeding.&lt;/p&gt;

&lt;p&gt;The close method terminates the cluster entirely, gracefully shutting down all browser instances associated with the cluster workers and releasing any resources allocated to the cluster.&lt;br&gt;
Then, we merge the data retrieved from each page into our flattened array using the code snippet below. (Note that cluster tasks can finish in any order, so matching results to books by array index is fragile; in a production scraper, you would key the scraped details by URL instead.)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const finalbookDataArray = flattenedArray.map((e, i) =&amp;gt; {
    return {
      ...e,
      bookDescription: addedData[i].details,
      tax: addedData[i].tax,
      noOfleftInStock: addedData[i].noOfleftInStock,
    };
  });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, let us write all the scraped data to a JSON file using Node's built-in &lt;em&gt;fs&lt;/em&gt; module, as shown below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;//Import the package at the top of the file
const fs = require("fs");

const bookDataArrayJson = JSON.stringify(finalbookDataArray, null, 2);
  fs.writeFileSync("scraped-data.json", bookDataArrayJson);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final code should look like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const puppeteer = require("puppeteer");
const { Cluster } = require("puppeteer-cluster");
const fs = require("fs");

(async () =&amp;gt; {
  const browser = await puppeteer.launch({ protocolTimeout: 600000 });

  const page = await browser.newPage();

  let flattenedArray;
  const bookDataArray = [];
  for (let index = 1; index &amp;lt;= 5; index++) {
    if (index === 1) {
      // Navigate the page to a URL
      await page.goto(`https://books.toscrape.com/index.html`, {
        timeout: 60000,
      });

      await page.screenshot({ path: `images/page-${index}.png` });
    } else {
      // Navigate the page to a URL
      await page.goto(
        `https://books.toscrape.com/catalogue/page-${index}.html`,
        {
          timeout: 60000,
        }
      );

      await page.screenshot({ path: `images/page-${index}.png` });
    }

    const booksArray = await page.$$eval(
      "#default &amp;gt; div &amp;gt; div &amp;gt; div &amp;gt; div &amp;gt; section &amp;gt; div:nth-child(2) &amp;gt; ol&amp;gt; li",
      (elements) =&amp;gt;
        elements.map((el, i) =&amp;gt; {
          const bookTitle = el.querySelector("h3&amp;gt; a").getAttribute("title");
          const bookPrice = el.querySelector("p.price_color").innerText;
          const imageLink = el.querySelector("img").getAttribute("src");
          const inStock = el.querySelector("p.availability").innerText;

          const bookDetailsLink = el
            .querySelector("h3&amp;gt; a")
            .getAttribute("href");

          const data = {
            i,
            title: `${bookTitle}`,
            detailsLink: `${bookDetailsLink}`,
            price: `${bookPrice}`,
            image: `https://books.toscrape.com/${imageLink}`,
            availability: `${inStock}`,
          };

          return data;
        })
    );

    //Add an index number to each book detail.
    const updatedBookNoInDataArray = booksArray.map((e) =&amp;gt; {
      return {
        ...e,
        i: index == 1 ? e.i + 1 : (index - 1) * 20 + e.i + 1,
      };
    });

    bookDataArray.push(updatedBookNoInDataArray);

    //Flatten out the array here
    flattenedArray = [].concat(...bookDataArray);
  }

  const addedData = [];

  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_PAGE,
    maxConcurrency: 100,
    timeout: 10000000,
  });

  //Catch any error that occurs when you scrape a particular page and log it to the console.
  cluster.on("taskerror", (err, data) =&amp;gt; {
    console.log(`Error Crawling ${data}: ${err.message}`);
  });

  //Describe what you want the scraper to do on each page here
  await cluster.task(async ({ page, data: url }) =&amp;gt; {
    await page.goto(url, { timeout: 100000 });
    const details = await page.$eval("#content_inner &amp;gt; article &amp;gt; p", (el) =&amp;gt; {
      if (el === undefined) {
        return "";
      } else {
        return el.innerText;
      }
    });

    const tax = await page.$eval(
      "#content_inner &amp;gt; article &amp;gt; table &amp;gt; tbody &amp;gt; tr:nth-child(5) &amp;gt; td",
      (el) =&amp;gt; {
        if (el === undefined) {
          return "";
        } else {
          return el.innerText;
        }
      }
    );
    const noOfleftInStock = await page.$eval(
      "#content_inner &amp;gt; article &amp;gt; table &amp;gt; tbody &amp;gt; tr:nth-child(6) &amp;gt; td",
      (el) =&amp;gt; {
        if (el === undefined) {
          return "";
        } else {
          return el.innerText;
        }
      }
    );

    // console.log({details, noOfleftInStock, tax})
    addedData.push({ details, noOfleftInStock, tax });
  });

  for (const url of flattenedArray) {
    if (url.detailsLink.startsWith("catalogue/")) {
      await cluster.queue(`https://books.toscrape.com/${url.detailsLink}`);
    } else {
      await cluster.queue(
        `https://books.toscrape.com/catalogue/${url.detailsLink}`
      );
    }
  }

  await cluster.idle();
  await cluster.close();

  const finalbookDataArray = flattenedArray.map((e, i) =&amp;gt; {
    return {
      ...e,
      bookDescription: addedData[i].details,
      tax: addedData[i].tax,
      noOfleftInStock: addedData[i].noOfleftInStock,
    };
  });

  const bookDataArrayJson = JSON.stringify(finalbookDataArray, null, 2);
  fs.writeFileSync("scraped-data.json", bookDataArrayJson);

  await browser.close();
})();

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now create a folder named &lt;strong&gt;images&lt;/strong&gt; (the screenshot paths point there) and execute the scraper by running the command below in the terminal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;node index
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the scraper finishes executing, you should have 5 images in your images folder and a file named &lt;strong&gt;&lt;em&gt;scraped-data.json&lt;/em&gt;&lt;/strong&gt; containing the scraped data as JSON.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;So far in this tutorial, we have learned how to scrape data from a website using Puppeteer and how to scrape multiple pages at once using the puppeteer-cluster package. You can get the full code in my repo &lt;a href="https://github.com/ok-timmy/Puppeeteer-Tutorial" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can sharpen your skills by scraping sites such as e-commerce and real estate websites. You can also use puppeteer-cluster to build a scraper that compares data across two or more websites.&lt;/p&gt;

&lt;p&gt;To learn more, check out the Puppeteer documentation and the puppeteer-cluster package's documentation.&lt;/p&gt;

&lt;p&gt;In the next article in this Puppeteer series, I will discuss how to use Puppeteer for integration testing in web applications.&lt;br&gt;
Till then, you can connect with me on &lt;a href="https://github.com/ok-timmy" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://x.com/Ok_Timmy" rel="noopener noreferrer"&gt;X&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Popular Libraries For Building Type-safe Web Application APIs</title>
      <dc:creator>Timilehin Okunola</dc:creator>
      <pubDate>Sun, 07 Apr 2024 16:37:26 +0000</pubDate>
      <link>https://forem.com/oktimmy/popular-libraries-for-building-type-safe-web-application-apis-1cf3</link>
      <guid>https://forem.com/oktimmy/popular-libraries-for-building-type-safe-web-application-apis-1cf3</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Type-safe APIs are becoming increasingly popular in web development. This is because they bring several benefits to both the developer and the application. &lt;/p&gt;

&lt;p&gt;In Web applications, a type-safe API describes an API where the request and response data are explicitly defined using types. Some benefits of using a type-safe API in applications include increased data security and integrity, improved performance, and enhanced developer experience. &lt;/p&gt;

&lt;p&gt;Using type-safe APIs greatly benefits developers because it makes bugs easier to find, improves code completion and error checking, and significantly reduces runtime errors.&lt;/p&gt;

&lt;p&gt;In this article, we shall review a few libraries that come in handy for building type-safe web application APIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Yup
&lt;/h2&gt;

&lt;p&gt;Yup is a popular open-source library for schema-based validation. It is a schema builder for runtime value parsing and validation, making it easy to define the objects' expected structure and data types.&lt;/p&gt;

&lt;p&gt;Here are some key functionalities of Yup:&lt;br&gt;
&lt;strong&gt;Concise schema definition&lt;/strong&gt;: Yup provides an intuitive, JSON-like syntax for defining object schemas. It lets you declare required and optional fields and apply validation rules such as length constraints, patterns, and custom validation logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Validation&lt;/strong&gt;: Yup validates JavaScript objects against defined schemas, catching invalid data early. It handles both synchronous and asynchronous validations, giving you flexibility. The built-in async validation support makes it easy to model server-side and client-side validation equally well. It also provides clear and concise messages that make debugging seamless.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Powerful TypeScript support&lt;/strong&gt;: Yup can infer static types from a schema or ensure a schema correctly implements an existing type.&lt;br&gt;
&lt;strong&gt;Extensibility&lt;/strong&gt;: Yup also allows developers to add their own type-safe methods and schemas.&lt;/p&gt;

&lt;p&gt;You can check out the package’s documentation &lt;a href="https://yup-docs.vercel.app/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Joi
&lt;/h2&gt;

&lt;p&gt;Joi is a powerful library for building type-safe web application APIs in JavaScript and Node.js. It is one of the oldest libraries in this space, with its first version released in 2013. Originally released as an open-source library, it is now maintained by Automattic, the company behind WordPress.com.&lt;br&gt;
Here are a few notable features of Joi:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema Definition&lt;/strong&gt;: Joi uses an expressively rich schema language to define data structures and validation rules. It supports various data types, nested arrays, objects, and complex validation logic. Developers can customise the schema with options like conditional validation, custom error messages, and language localisation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data validation&lt;/strong&gt;: It catches invalid data early in development, preventing errors and improving data integrity. It makes debugging easy by providing detailed error messages that pinpoint the exact location and nature of validation failures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance&lt;/strong&gt;: Joi is highly optimised for efficiency, especially with large datasets and complex validation rules. It leverages asynchronous validation and caching mechanisms for improved performance in large datasets. It also supports promises and async/await for non-blocking validation.&lt;/p&gt;

&lt;p&gt;You can check out their documentation &lt;a href="https://joi.dev/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Zod
&lt;/h2&gt;

&lt;p&gt;Zod is most popular within TypeScript projects. It is a widely used library for type-safe schema validation that focuses on offering a developer-friendly experience through the following features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Type Safety&lt;/strong&gt;: Zod offers TypeScript-first schema validation with static type inference. It leverages TypeScript types to define schemas, ensuring data is consistent with the specified structure, and integrates seamlessly with TypeScript, improving type inference and automatic type checking. This static type safety catches potential errors at compile time rather than at runtime, unlike many other packages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance&lt;/strong&gt;: Zod executes efficiently by avoiding unnecessary code execution through caching and other optimisation techniques. It suits applications with complex schemas, large datasets or high throughputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Immutable Schemas&lt;/strong&gt;: Zod schemas are immutable; methods that modify a schema return a new schema instance rather than changing the existing one, ensuring data integrity and predictability.&lt;/p&gt;

&lt;p&gt;You can check out their documentation &lt;a href="https://zod.dev/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Superstruct
&lt;/h2&gt;

&lt;p&gt;This library is relatively new but highly regarded. Like Zod, it is a type-safe schema validation library, particularly at home in TypeScript. Its type annotation API was inspired by TypeScript, GraphQL, Go, and Flow, which makes the API easy to understand. It was first released in 2019 and is fully open-source. It offers the following features:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strong type safety&lt;/strong&gt;: It leverages TypeScript types directly to define schemas, offering strong data integrity and consistency. It provides static type-checking during development and catches invalid data at runtime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Composability&lt;/strong&gt;: It allows complex schemas to be built by composing smaller, reusable schema units. These small components can be shared and reused across different parts of the codebase, which improves code organisation and maintainability in large-scale projects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Immutable schemas&lt;/strong&gt;: Similar to Zod, Superstruct allows changes to the schema without altering the data. This helps improve the application's scalability and predictability.&lt;/p&gt;

&lt;p&gt;You can check out Superstruct documentation &lt;a href="https://docs.superstructjs.org/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  tRPC
&lt;/h2&gt;

&lt;p&gt;Compared to the other libraries, tRPC is relatively new. It is geared primarily towards full-stack TypeScript development. A beautiful thing about tRPC is that it has no build or compile steps, meaning there is no code generation or runtime bloat. It also has zero dependencies, so it is a lightweight package. Some of the features tRPC offers include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automatic Typesafety&lt;/strong&gt;: tRPC was made to ease building end-to-end typesafe APIs. When a change is made to a schema on the server, tRPC warns you of potential errors on the client before you even save the file. This leads to more reliable, robust APIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Server-side code organisation&lt;/strong&gt;: tRPC encourages organising server-side code around well-defined functions. This promotes readability and maintainability in large codebases, makes the server-side code easy to scale, and makes testing the API logic seamless. tRPC also provides middleware that handles errors, logging, and other common tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automatic type inference&lt;/strong&gt;: Rather than generating code, tRPC infers client-side types directly from the server's router definition, reducing repetitive boilerplate and giving developers confidence in their application’s endpoints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Framework Agnostic&lt;/strong&gt;: tRPC is compatible with all JavaScript frameworks and runtimes and is easy to add to existing projects. It provides adapters for popular JavaScript libraries and frameworks such as React, Next.js, SolidJS, Svelte, Express, Fastify, and AWS Lambda.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Function-Based API Definition&lt;/strong&gt;: This is one of tRPC's outstanding features. It defines endpoints using JavaScript functions instead of the HTTP methods used in traditional REST APIs. This approach streamlines API development by reducing boilerplate code and promoting code organisation, and it allows more flexibility in API structure than rigid REST conventions.&lt;/p&gt;

&lt;p&gt;tRPC is an open-source project whose active community contributes to its rapidly growing popularity. You can check out their documentation &lt;a href="https://trpc.io/docs/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ts-rest
&lt;/h2&gt;

&lt;p&gt;Ts-rest is another excellent library that eases the process of building type-safe APIs in TypeScript. It offers a simple way to define a contract for API routes, which the application can implement and consume without stress or extra code generation. Here are some of ts-rest's key features:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Type safety&lt;/strong&gt;: Ts-rest offers strong type safety through TypeScript interfaces and request and response structure annotations. This improves code quality and catches potential errors early.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;REST-Based API definition&lt;/strong&gt;: Unlike tRPC, ts-rest uses the conventional REST approach for defining HTTP methods like GET, POST, PUT, and DELETE. This familiarity is usually beneficial for developers with experience working with REST APIs. Ts-rest supports various content types (JSON, form data, multipart) and provides common task middleware.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full Optional OpenAPI integration&lt;/strong&gt;: Ts-rest can automatically generate OpenAPI schema and documentation based on the API definitions. This allows developers to document API structure and data contracts for client-side applications and third-party tools. Also, this generated OpenAPI schema can be used to generate client-side Typescript code automatically to interact with the API. This shortens development time, makes client-side application development easier, and allows developers to focus on the application's features.&lt;/p&gt;

&lt;p&gt;Ts-rest is an open-source project, and it has a fast-growing community. You can check out their documentation &lt;a href="https://ts-rest.com/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ajv
&lt;/h2&gt;

&lt;p&gt;Ajv is specifically designed to validate JSON schema. This means it compares the application JSON data against defined schemas. Ajv offers the following features:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JSON Schema Language Support&lt;/strong&gt;: This is its key feature. Ajv supports JSON Schema drafts 04, 06, 07, 2019-09, and 2020-12, while offering the different flexibilities each requires. It provides a rich expression language for defining complex validation rules and conditions through keywords such as &lt;code&gt;required&lt;/code&gt;, &lt;code&gt;type&lt;/code&gt;, &lt;code&gt;pattern&lt;/code&gt;, and &lt;code&gt;additionalProperties&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance&lt;/strong&gt;: Ajv works efficiently with large data sets. It utilises caching mechanisms and asynchronous validation for improved performance. According to the JSON-schema, jsck, z-schema, and themis benchmarks, Ajv is currently the fastest and most standard-compliant JSON validator. &lt;/p&gt;

&lt;p&gt;Ajv seamlessly integrates with various web frameworks like Expressjs and Koa.js.&lt;/p&gt;

&lt;p&gt;Ajv’s documentation is available &lt;a href="https://ajv.js.org/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TypeBox
&lt;/h2&gt;

&lt;p&gt;TypeBox is a JSON Schema type builder with a runtime validator for TypeScript. Like Ajv, it validates JSON data against schemas; unlike Ajv, it lets you define the expected structure of JSON data using familiar TypeScript types and then validate incoming data against those types at runtime.&lt;/p&gt;

&lt;p&gt;Here are some key features of TypeBox:&lt;br&gt;
&lt;strong&gt;JSON Schema Definition&lt;/strong&gt;: TypeBox allows developers to use familiar TypeScript types to define the expected JSON data structure. The key difference between Ajv and TypeBox is that TypeBox uses TypeScript types for schema definition, providing type inference and IDE integration, while Ajv uses JSON Schema directly for defining validations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Composability&lt;/strong&gt;: TypeBox allows developers to build complex schemas from simpler and smaller ones for nested structures and diverse data representations. It also handles arrays, objects, and nested objects easily.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clear Types Documentation&lt;/strong&gt;: The typescript types are declared to act as a self-documenting schema, improving code readability. They also allow developers to add metadata like descriptions or examples to enhance schema documentation.&lt;/p&gt;

&lt;p&gt;The documentation can be found &lt;a href="https://github.com/sinclairzx81/typebox"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Class-Validator
&lt;/h2&gt;

&lt;p&gt;Although this library is primarily for validating class instances in TypeScript and was not specifically designed for APIs, it can be handy in web application development, including API development.&lt;br&gt;
The key feature of the class-validator library that is useful in developing typesafe applications is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Declarative validation&lt;/strong&gt;: This feature allows developers to define validation rules on class properties using decorators like &lt;code&gt;@IsString()&lt;/code&gt;, &lt;code&gt;@IsNumber()&lt;/code&gt;, &lt;code&gt;@IsEmail()&lt;/code&gt; and &lt;code&gt;@MinLength(8)&lt;/code&gt;. This approach greatly improves code quality by enhancing readability, reducing boilerplate code and promoting a clear mapping between data models and validation rules.&lt;/p&gt;

&lt;p&gt;You can check out its documentation &lt;a href="https://github.com/typestack/class-validator"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  feTS
&lt;/h2&gt;

&lt;p&gt;feTS is a new library for building typesafe full-stack applications. It makes building and consuming REST APIs easy without compromising type safety in client-server communication. The feTS client and server can be used together to build full-stack applications, or each can be used independently, allowing developers to adapt it to their project’s specific needs.&lt;/p&gt;

&lt;p&gt;Aside from providing type-safety, it also provides other features, which include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JSON Schema Route Description&lt;/strong&gt;: feTS utilises the JSON schema specifications for route description. This enables integration with any tool within the JSON Schema ecosystem. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAPI integration&lt;/strong&gt;: feTS utilises the OpenAPI specification for universal tool compatibility. It also generates an OpenAPI documentation UI with Swagger UI on the server out of the box.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Versatile Server&lt;/strong&gt;: The feTS server provides a superfast HTTP server that can be deployed and run anywhere with the power of the dependency @whatwg-node/server. Some servers where feTS can be deployed include the AWS Lambda, Azure Function, Bun, Node.js, Deno, Google Cloud Functions, Next.js, and μWebSockets.&lt;/p&gt;

&lt;p&gt;You can learn more from the documentation &lt;a href="https://github.com/ardatan/feTS"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There are still a few more libraries in development. An example is &lt;a href="https://github.com/betwixt-labs/tempo"&gt;Tempo&lt;/a&gt; by betwixt-labs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;So far, this article has covered some useful libraries for developing typesafe APIs. However, which one to use on your next project depends on your application's specific needs and peculiarities. For instance, if your application heavily depends on JSON schemas, your go-to library might be Ajv. If you want type-safety on both client and server, you might consider working with tRPC or ts-rest.&lt;/p&gt;

&lt;p&gt;If you made it this far, I say thank you for reading through. I hope you found this article helpful. If you did, kindly like and share it with your friends and colleagues.&lt;/p&gt;

&lt;p&gt;I would love to connect with you on &lt;a href="https://twitter.com/Ok_Timmy"&gt;Twitter&lt;/a&gt; | &lt;a href="https://www.linkedin.com/in/okunola-timilehin-israel/"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://www.github.com/ok-timmy"&gt;Github&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>programming</category>
      <category>fullstack</category>
    </item>
    <item>
      <title>How To Use Superstruct in an Express.js Application</title>
      <dc:creator>Timilehin Okunola</dc:creator>
      <pubDate>Mon, 01 Apr 2024 16:13:07 +0000</pubDate>
      <link>https://forem.com/oktimmy/how-to-use-superstruct-in-an-expressjs-application-2kj2</link>
      <guid>https://forem.com/oktimmy/how-to-use-superstruct-in-an-expressjs-application-2kj2</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Data types are an important part of any programming language. Yet poor data input handling can inject chaos into an application. Hence, data validation is an extremely important aspect of any web application. &lt;/p&gt;

&lt;p&gt;Data input validation within the Javascript ecosystem may be daunting. This is where Superstruct comes in. Superstruct is a Javascript library designed to simplify and strengthen data validation.&lt;/p&gt;

&lt;p&gt;Superstruct leverages a declarative schema-based approach, allowing you to define your data's structure and constraints precisely. Beyond basic validation, it offers transformation and refinement capabilities. &lt;/p&gt;

&lt;p&gt;It is designed for validating data at runtime, so it throws (or returns) detailed runtime errors that the developer can handle or surface to the application's end users. This is especially useful in web applications that use REST or GraphQL APIs.&lt;/p&gt;

&lt;p&gt;In this tutorial, we will create a basic REST API for a blog application using Superstruct for data validation. For simplicity, we won’t use any database connection. We will then go on to test our endpoints using the Thunder Client extension in VS Code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Build The Server
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To follow along with this tutorial, you need to have the following installed on your computer:&lt;br&gt;
Node.js &lt;br&gt;
NPM &lt;br&gt;
A code editor. We will be using VS Code for this tutorial.&lt;/p&gt;

&lt;p&gt;Also, you should have a basic knowledge of JavaScript and Node.js/Express.&lt;/p&gt;

&lt;p&gt;First, create a folder on your computer and open this folder in your code editor. &lt;br&gt;
Open your terminal and initialize a new node project by running the command below in your terminal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm init -y
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then install Express and the superstruct in your project by running the commands below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install express

npm install --save superstruct
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, create an index.js file and paste the following code to create the server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const express = require("express");

const app = express();
app.use(express.json());

app.get("/", (req, res) =&amp;gt; {
    return res.status(200).json({
      message: "Hello World"
    })
  });

  app.listen(5000, () =&amp;gt; {
    console.log("Note app listening on port 5000");
  });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have a basic server set up now. We can start the server by running the command below&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;node index
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create The Controllers
&lt;/h3&gt;

&lt;p&gt;Create a folder named controllers. Inside it, create two new files named AuthController.js and PostController.js.&lt;/p&gt;

&lt;p&gt;Paste the code below into the AuthController.js file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const { object, string, number, assert } = require("superstruct");

const UserDetails = object({
  id: number(),
  firstName: string(),
  lastName: string(),
  userName: string(),
});

//Create User
const createUser = async (req, res) =&amp;gt; {
  try {
    assert(req.body, UserDetails, “User data is invalid”);
    return res.status(200).json({message:"User Signed Up successfully"})
  } catch (error) {
    const { path, failures } = error;
    // Handle errors based on path and error messages
  const errorMessage = failures()
    console.log("Validation failed:", path, errorMessage);
    return res.status(400).json({Error: ` ${JSON.stringify(errorMessage)}`})
  }
};

const LoginDetails = object({
    userName: string()
})
// Signin User
const loginUser = async (req, res) =&amp;gt; {
try {
    assert(req.body, LoginDetails, “User data is invalid”)
    return res.status(200).json({message:"User Signed Up successfully"})
} catch (error) {
    const { path, failures } = error;
    // Handle errors based on path and error messages
  const errorMessage = failures()
    console.log("Validation failed:", path, errorMessage);
    return res.status(400).json({Error: ` ${JSON.stringify(errorMessage)}`})
}
};

module.exports = {
  createUser,
  loginUser,
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above code snippet, we first imported the object, string, number, and assert methods from the superstruct module.&lt;br&gt;
Next, we declared our user’s details struct and created the createUser function. In the createUser function, we used the assert method to check whether the object passed in the body complies with the types we specified in the UserDetails struct. We also passed in an error explanation to be used should the assertion fail.&lt;/p&gt;

&lt;p&gt;This is declared within a try-catch block to catch any error in the checking process. Superstruct makes use of the StructError class to define errors. &lt;br&gt;
Below is an example of what the StructError looks like.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;StructError {
  value: false,
  key: 'email',
  type: 'string',
  refinement: undefined,
  path: ['email'],
  branch: [{ id: 1, name: 'Alex', email: false }, false],
  explanation: "An invalid email was passed",
  failures: [Function]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The failures method returns an array of failure objects, each describing one validation error. If only one error occurred, it returns an array containing a single failure describing that error.&lt;/p&gt;

&lt;p&gt;In the catch block, we destructure the path property and the failures method from the thrown error. We then call the failures method to get the array of failures, log the path and the error messages to the console, and send a 400 response status with the JSON format of the error messages in the response body.&lt;/p&gt;

&lt;p&gt;In the loginUser function, we replicated what we did with the createUser function but declared a new struct called LoginDetails containing just the userName. &lt;/p&gt;

&lt;p&gt;To create our post controller, paste the code below into the PostController.js file&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const { object, string, number, assert, enums, optional } = require("superstruct");

const PostDetails = object({
    id: number(),
    title: string(),
    content: string(),
    category: optional(enums(["Sports", "Technology", "Business", "Religion"])),
  });

//
const createPost = async(req, res) =&amp;gt; {
try {
    assert(req.body, PostDetails, “Post data is invalid”);
    return res.status(200).json({message:"Post Created Up successfully"})
} catch (error) {
    const { path, failures } = error;
    // Handle errors based on path and error messages
  const errorMessage = failures()
    console.log("Validation failed:", path, errorMessage);
    return res.status(400).json({Error: ` ${JSON.stringify(errorMessage)}`})
}
}

module.exports = {
  createPost
};

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above snippet, we did something similar to what we did in the AuthController. However, we also used the optional and enums methods here.&lt;/p&gt;

&lt;p&gt;We have successfully created the controllers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create The Routes
&lt;/h3&gt;

&lt;p&gt;Now, let us create our routes. First, create a new folder named routes. In the routes folder, create two files named auth.js and post.js.&lt;br&gt;
 Paste the code below in the auth.js file to create the auth routes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const router = require("express").Router();
const { createUser, loginUser } = require("../controllers/AuthController");

//REGISTER
router.post("/register", createUser);

//LOGIN
router.post("/login", loginUser);

module.exports = router;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To create the post route, paste the code below into the post.js file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const router = require("express").Router();
const { createPost } = require("../controllers/PostController");

//CREATE POST
router.post("/create", createPost);

module.exports = router;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, let us import our routes into the index.js file.&lt;/p&gt;

&lt;p&gt;Import the routes as shown below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const AuthRoute = require("./Routes/auth");
const PostRoute = require("./Routes/post");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, register the routes by pasting the snippet below above the app.listen function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;app.use("/api/auth", AuthRoute);
app.use("/api/post", PostRoute)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have successfully created three typesafe POST routes: "/api/auth/register", "/api/auth/login" and "/api/post/create".&lt;/p&gt;

&lt;p&gt;Now, let us test our app. &lt;/p&gt;

&lt;h3&gt;
  
  
  Test The Application
&lt;/h3&gt;

&lt;p&gt;We will be using the VS Code extension Thunder Client to test our APIs. First, restart the server by stopping the running process and running the command below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;node index
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let us begin by testing the “api/auth/register” route.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02ylrchwhm2sl3vt0xvv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02ylrchwhm2sl3vt0xvv.png" alt="Image showing the result of testing the register route" width="800" height="271"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the above image, we tested the register route with the expected data types and got a response status of 200. &lt;br&gt;
Now, let us alter one of the datatypes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi3athudyejolco7t506a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi3athudyejolco7t506a.png" alt="Image showing the result of calling the register route wrongly" width="800" height="259"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's look at what we have in the console to get a clearer view of the error message.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2bhhcavvqrokev0oyzs0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2bhhcavvqrokev0oyzs0.png" alt="Image showing console message of testing register route wrongly" width="800" height="307"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This shows us the path where the invalid datatype was passed. The errorMessage returns an instance of the StructError class.&lt;/p&gt;

&lt;p&gt;Let us test the login route.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ky74gu0i574w5lfj37z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ky74gu0i574w5lfj37z.png" alt="Image showing the result of testing login route right" width="800" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we pass the expected data type, we should get a status code 200.&lt;/p&gt;

&lt;p&gt;Let us test this route by passing in fields not declared in the LoginDetails object.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4fjd0e96qfh3i4k2etyu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4fjd0e96qfh3i4k2etyu.png" alt="Image showing result of testing login route wrongly" width="800" height="261"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The console has an array of StructError, with each instance of the StructError giving the details of each error.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frvmugiy6z004vv3zljs6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frvmugiy6z004vv3zljs6.png" alt="Image showing the console result of testing the login route wrongly" width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, let us test the create post route.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv4be7u3eqd8rwt9krxw3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv4be7u3eqd8rwt9krxw3.png" alt="Image showing the result of testing the create post route rightly" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice how we did not get an error for not passing in the category field. This is because we wrapped the field in the optional method.&lt;/p&gt;

&lt;p&gt;Let's try adding a category that is not part of the categories declared in the category enum.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy1zmjjd7naodmqz7fnf3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy1zmjjd7naodmqz7fnf3.png" alt="Image showing the result of testing create post route wrongly" width="800" height="279"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the console, we have the error message as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flvxdb4d0bb0kinicwd9m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flvxdb4d0bb0kinicwd9m.png" alt="Image showing console result of testing the create post route wrongly" width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;So far, in this tutorial, we have learned how to implement superstruct for basic type-checking in our Express JS server. Type-checking adds an extra layer of security and integrity to our application. Superstruct’s concise error messages also make debugging exercises extremely easy as they identify the precise location of each error with specific or custom error messages.&lt;/p&gt;

&lt;p&gt;Moving forward, you can integrate this package into your applications to handle type-checking with just a few lines of code. &lt;/p&gt;

&lt;p&gt;I would love to connect with you on &lt;a href="https://twitter.com/Ok_Timmy"&gt;Twitter&lt;/a&gt; | &lt;a href="https://www.linkedin.com/in/okunola-timilehin-israel"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://www.github.com/ok-timmy"&gt;Github &lt;/a&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>webdev</category>
      <category>node</category>
      <category>express</category>
    </item>
  </channel>
</rss>
