<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Dan Keefe</title>
    <description>The latest articles on Forem by Dan Keefe (@peritract).</description>
    <link>https://forem.com/peritract</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F268996%2Fff735251-389e-4c86-b1f2-fcbb9f5630c3.png</url>
      <title>Forem: Dan Keefe</title>
      <link>https://forem.com/peritract</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/peritract"/>
    <language>en</language>
    <item>
      <title>Harry Potter and the Accessing of APIs</title>
      <dc:creator>Dan Keefe</dc:creator>
      <pubDate>Sun, 31 May 2020 10:10:38 +0000</pubDate>
      <link>https://forem.com/peritract/harry-potter-and-the-accessing-of-apis-1i0i</link>
      <guid>https://forem.com/peritract/harry-potter-and-the-accessing-of-apis-1i0i</guid>
      <description>&lt;p&gt;&lt;em&gt;Header image by &lt;a href="https://unsplash.com/@art_maltsev?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText"&gt;Artem Maltsev&lt;/a&gt; on &lt;a href="https://unsplash.com/s/photos/harry-potter?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText"&gt;Unsplash&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;APIs are everywhere. They're a core element of modern software/services, and an incredibly powerful tool for developers. Learning how to use and access APIs unlocks an incredible number of possibilities with code.&lt;/p&gt;

&lt;p&gt;This post explains what an API is and how to connect to one using Python, aiming to give an accessible introduction to a topic that can be bewildering to explore. It's designed for people who already have some experience with Python and are now looking to expand their skillset. Both for my own amusement, and because it's helpful to have an example to make concepts concrete, the post is structured around the &lt;a href="https://www.potterapi.com/"&gt;Harry Potter API&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;❗In the time since I wrote this tutorial, the Harry Potter API has been deactivated, a casualty of ongoing controversy. This means that the code examples will no longer run, but I'm leaving the article itself up in the hope that the explanations themselves are still of benefit to someone.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;This tutorial was written using Jupyter notebooks &amp;amp; Python 3.7.5; things might behave slightly differently if you're in a different IDE or using different versions of the language.&lt;/p&gt;

&lt;p&gt;You can find a complete copy of the code for this tutorial &lt;a href="https://github.com/peritract/tutorials"&gt;on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is an API?
&lt;/h2&gt;

&lt;p&gt;An API (&lt;strong&gt;A&lt;/strong&gt;pplication &lt;strong&gt;P&lt;/strong&gt;rogramming &lt;strong&gt;I&lt;/strong&gt;nterface) is a service that provides data when asked for it. There are more specific and complex definitions, but that one is sufficient most of the time. APIs are designed to allow different machines and programs to speak to each other through code; there doesn't need to be a human in between. Most modern APIs use the &lt;a href="https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol"&gt;HTTP(S)&lt;/a&gt; protocol to communicate.&lt;/p&gt;

&lt;p&gt;APIs are everywhere - you can get &lt;a href="https://openweathermap.org/api"&gt;weather APIs&lt;/a&gt;, which allow you to get data on the weather. There are &lt;a href="http://api.cfl.ca/"&gt;Canadian football APIs&lt;/a&gt;, which provide data on Canadian football. There are APIs that will give you &lt;a href="http://www.loveaas.com/"&gt;love&lt;/a&gt; or &lt;a href="https://foaas.com/"&gt;hate&lt;/a&gt; or &lt;a href="https://placekitten.com/"&gt;placeholder images of kittens&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Many APIs charge for access, but plenty are either completely free or offer several usage tiers, so that you can experiment for free and only pay if - like many new tech companies - you want to build an entire business on top of existing APIs.&lt;/p&gt;

&lt;p&gt;Everyone who uses the internet interacts with APIs every day - any time you see an interactive map, or see a list of products on a website, an API is probably being used in the background. Many modern organisations and companies are built on top of APIs provided by other organisations, and may provide APIs themselves. In short, APIs are everywhere.&lt;/p&gt;

&lt;p&gt;By connecting to different APIs, you can dramatically increase the power and scope of the software that you build; you don't need to independently map the world, or obsessively track the weather: using APIs, you can connect to services that are already doing that, building your awesome idea on top of existing structures.&lt;/p&gt;

&lt;h2&gt;
  
  
  Accessing an API
&lt;/h2&gt;

&lt;p&gt;There are two main pieces of information you need when attempting to use an API:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Where to find the API (the &lt;strong&gt;endpoint&lt;/strong&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How to make the request&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  API endpoints
&lt;/h3&gt;

&lt;p&gt;In order to request data from an API, you need to know where to send the request. An address that an API provides for people to make requests is called an &lt;strong&gt;endpoint&lt;/strong&gt;. Some APIs have just one endpoint, responding to only one type of request. More usually, an API will have several different endpoints, each one allowing you to request different information.&lt;/p&gt;

&lt;p&gt;The Harry Potter API has a base address - &lt;code&gt;https://www.potterapi.com/v1/&lt;/code&gt; - and then several endpoints that extend from there. If you want, for example, to get a random Hogwarts house, you can use the sorting hat endpoint - &lt;code&gt;sortinghat&lt;/code&gt; - to make that request. The full address for the sorting hat endpoint is &lt;code&gt;https://www.potterapi.com/v1/sortinghat&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Because APIs communicate using HTTPS, the endpoint is a valid web address - or &lt;a href="https://www.lifewire.com/what-is-a-url-2626035"&gt;URL&lt;/a&gt; - that you can access. Visiting that address will show you a randomly-chosen Hogwarts house.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.potterapi.com/v1/sortinghat"&gt;Access the Harry Potter API sorting hat endpoint.&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  API calls
&lt;/h3&gt;

&lt;p&gt;In its simplest form, an API request is just the address of an API endpoint. Making an API call is when you access - or "hit" - an endpoint with a request.&lt;/p&gt;

&lt;p&gt;The link above lets you hit the &lt;code&gt;sortinghat&lt;/code&gt; endpoint manually. From a human point of view, you click the link and visit another site. What's actually happening is that - when you click the link - your browser makes a request to the API endpoint, which gives back the data in &lt;a href="https://www.json.org/json-en.html"&gt;JSON format&lt;/a&gt;, which your browser then displays for you. Most of the time, we don't need to think about the requests going back and forth across the internet, but it's helpful to be aware of them when talking about APIs.&lt;/p&gt;

&lt;p&gt;Not all requests are so simple. Some API endpoints listen out for extra information in requests, and return different data depending on the &lt;strong&gt;parameters&lt;/strong&gt; you provide. Some endpoints require &lt;strong&gt;authentication&lt;/strong&gt;, and will only return data to requests which contain a secret API key. We'll look at both of those further on.&lt;/p&gt;

&lt;h3&gt;
  
  
  API documentation
&lt;/h3&gt;

&lt;p&gt;Although many APIs work in very similar ways, you'll always need to do a bit of research to work out exactly what the endpoints for a particular API are, or how requests should be formatted. Luckily, most APIs come with detailed documentation, including example requests and the data they would return.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.potterapi.com/"&gt;Harry Potter API's documentation&lt;/a&gt; explains what each endpoint is for, and how to use it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Accessing an API using Python
&lt;/h2&gt;

&lt;p&gt;While it is possible to just visit API endpoints as a human user, it's not really what they're for. APIs are designed to be accessed using code, from within programs. We'll look now at how to use Python to access the &lt;code&gt;sortinghat&lt;/code&gt; API endpoint.&lt;/p&gt;

&lt;h3&gt;
  
  
  Importing libraries
&lt;/h3&gt;

&lt;p&gt;We only need one library to access the Harry Potter API. &lt;code&gt;requests&lt;/code&gt; allows us to make HTTPS requests through Python.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;requests&lt;/span&gt;  &lt;span class="c1"&gt;# Make calls to web API endpoints
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Creating the URL
&lt;/h3&gt;

&lt;p&gt;The next step is to craft the URL - the actual address to request data from.&lt;/p&gt;

&lt;p&gt;Although we could just store the URL as one string for this request, it's both good practice, and useful for later, to first create the different bits of the URL and then connect them together. This makes it easier to edit this URL and create new ones in the future.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Create the URL components
&lt;/span&gt;
&lt;span class="n"&gt;base_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://www.potterapi.com/v1/"&lt;/span&gt;

&lt;span class="n"&gt;endpoint_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"sortinghat"&lt;/span&gt;

&lt;span class="c1"&gt;# Join the pieces together
&lt;/span&gt;
&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base_url&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;endpoint_url&lt;/span&gt;

&lt;span class="c1"&gt;# View the url
&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://www.potterapi.com/v1/sortinghat"&gt;https://www.potterapi.com/v1/sortinghat&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
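
&lt;p&gt;Plain string concatenation works here because &lt;code&gt;base_url&lt;/code&gt; already ends in a slash. As a hedged alternative, the sketch below joins the pieces with &lt;code&gt;urljoin&lt;/code&gt; from Python's standard library, so a missing or doubled slash doesn't produce a broken address. &lt;code&gt;build_url&lt;/code&gt; is a hypothetical helper name, not part of the original tutorial code.&lt;/p&gt;

```python
from urllib.parse import urljoin


def build_url(base_url, endpoint):
    """Join a base address and an endpoint name into one URL.

    build_url is a hypothetical helper, introduced only for illustration.
    """
    # Make sure there is exactly one slash between the two pieces
    return urljoin(base_url.rstrip("/") + "/", endpoint.lstrip("/"))


# Works whether or not the base address ends in a slash
print(build_url("https://www.potterapi.com/v1", "sortinghat"))
print(build_url("https://www.potterapi.com/v1/", "/sortinghat"))
```

&lt;p&gt;Both calls should print the same sorting hat address shown above.&lt;/p&gt;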

&lt;h3&gt;
  
  
  Making the request
&lt;/h3&gt;

&lt;p&gt;Once the URL has been created, we can use the &lt;code&gt;.get()&lt;/code&gt; function in the &lt;code&gt;requests&lt;/code&gt; library to ask the API for the data. There are other types of requests that we could use, but mostly, when dealing with APIs, you'll use GET requests: the type of HTTP request that asks for information.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Make a request - accio data
&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In response to our request, we get (appropriately enough) a &lt;code&gt;response&lt;/code&gt; object. This not only contains our data, but also key information about the request and how it was received.&lt;/p&gt;

&lt;p&gt;All &lt;code&gt;response&lt;/code&gt; objects have an HTTP &lt;strong&gt;status code&lt;/strong&gt;. This is a 3-digit number that tells you if the request was successful and, if it wasn't successful, what went wrong. You are probably already familiar with some status codes, such as &lt;strong&gt;404&lt;/strong&gt;: the status code for when a requested resource could not be found. There are many &lt;a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Status"&gt;different codes&lt;/a&gt;, each one with a different meaning.&lt;/p&gt;

&lt;p&gt;The status code for a successful request with no problems is &lt;strong&gt;200&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Check the status code of the response object
&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;200&lt;/p&gt;
&lt;/blockquote&gt;
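
&lt;p&gt;If you want a readable description of a status code rather than a bare number, Python's standard library can look it up for you. This is a minimal sketch; &lt;code&gt;describe_status&lt;/code&gt; and &lt;code&gt;is_success&lt;/code&gt; are hypothetical helper names, and the codes are typed in by hand rather than taken from a live response.&lt;/p&gt;

```python
from http import HTTPStatus


def describe_status(code):
    """Return the code together with its standard reason phrase."""
    return f"{code} {HTTPStatus(code).phrase}"


def is_success(code):
    """Treat any 2xx status code as a success."""
    return code // 100 == 2


# Hand-typed codes standing in for response.status_code
for code in (200, 404):
    print(describe_status(code), "- success:", is_success(code))
```

&lt;p&gt;With a real &lt;code&gt;response&lt;/code&gt; object, you would pass &lt;code&gt;response.status_code&lt;/code&gt; to these helpers. The &lt;code&gt;requests&lt;/code&gt; library also offers &lt;code&gt;response.raise_for_status()&lt;/code&gt;, which raises an exception for error codes.&lt;/p&gt;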

&lt;h3&gt;
  
  
  Getting the data
&lt;/h3&gt;

&lt;p&gt;Lastly, we need to actually extract the data from the &lt;code&gt;response&lt;/code&gt; object. As already mentioned, the API returns data in &lt;a href="https://www.json.org/json-en.html"&gt;JSON format&lt;/a&gt;. JSON stands for &lt;strong&gt;J&lt;/strong&gt;ava&lt;strong&gt;S&lt;/strong&gt;cript &lt;strong&gt;O&lt;/strong&gt;bject &lt;strong&gt;N&lt;/strong&gt;otation, and it is one of the most popular data formats in the world. It's a relatively lightweight format, it's human-readable, and it's easy to work with using most programming languages.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;response&lt;/code&gt; object has a built-in method, &lt;code&gt;.json()&lt;/code&gt;, that extracts the data from JSON format and returns it as the most appropriate Python data structure. In our current case, that's just a &lt;code&gt;str&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Access the data
&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# View the data
&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Slytherin&lt;/p&gt;
&lt;/blockquote&gt;
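
&lt;p&gt;Since the API is no longer online, here is a small offline sketch of the same conversion using Python's built-in &lt;code&gt;json&lt;/code&gt; module. The JSON text is typed in by hand to mimic what the API returned, so treat it as an illustration rather than real API output.&lt;/p&gt;

```python
import json

# Hand-written JSON text standing in for the body of an API response
house = json.loads('"Slytherin"')
spells = json.loads('[{"spell": "Aberto", "type": "Charm", "effect": "opens objects"}]')

# JSON strings become str, arrays become list, and objects become dict
print(type(house).__name__)
print(type(spells).__name__)
print(type(spells[0]).__name__)
```

&lt;p&gt;This is the kind of conversion that &lt;code&gt;.json()&lt;/code&gt; performs on the body of a response.&lt;/p&gt;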

&lt;h2&gt;
  
  
  Authentication
&lt;/h2&gt;

&lt;p&gt;Many APIs require you to provide user credentials to access their data. This allows them to manage traffic, control server costs, charge for access, and understand how the API is actually being used.&lt;/p&gt;

&lt;p&gt;As a general rule, these credentials take the form of an &lt;strong&gt;API key&lt;/strong&gt; - a long string of letters and numbers that identifies a request as coming from a particular user. When making the request, you attach your API key.&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting an API key
&lt;/h3&gt;

&lt;p&gt;The Harry Potter API doesn't require authorisation to access the &lt;code&gt;sortinghat&lt;/code&gt; endpoint, but it is required for any of the other endpoints. You can get a free key from the API by &lt;a href="https://www.potterapi.com/login/#signup"&gt;creating an account&lt;/a&gt; with a valid email.&lt;/p&gt;

&lt;p&gt;Once you've created an account, you'll be given your unique key.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;API keys should be kept private and secure - don't share your keys with anyone.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In the cell below, I've replaced my actual key with a placeholder string.&lt;/strong&gt;&lt;/p&gt;






&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Store the API key as a variable.
&lt;/span&gt;
&lt;span class="n"&gt;HP_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"XXXXXXXXXXXXXXXXXX"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
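
&lt;p&gt;One common way to keep a key out of your code entirely is to read it from an environment variable. The sketch below assumes a variable named &lt;code&gt;HP_API_KEY&lt;/code&gt;; both that name and the &lt;code&gt;load_api_key&lt;/code&gt; helper are assumptions made for illustration.&lt;/p&gt;

```python
import os


def load_api_key(name="HP_API_KEY"):
    """Read an API key from an environment variable.

    The variable name is an assumption; use whatever name suits your setup.
    """
    key = os.environ.get(name)
    if not key:
        raise RuntimeError("Set the " + name + " environment variable first")
    return key


# For demonstration only: fall back to the placeholder if nothing is set
os.environ.setdefault("HP_API_KEY", "XXXXXXXXXXXXXXXXXX")

HP_API_KEY = load_api_key()
```

&lt;p&gt;With this approach the real key never appears in a notebook or repository that you might share.&lt;/p&gt;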



&lt;h3&gt;
  
  
  Making a request with an API key
&lt;/h3&gt;

&lt;p&gt;We can use the API key to make requests to a different endpoint - the &lt;code&gt;spells&lt;/code&gt; endpoint.&lt;/p&gt;

&lt;p&gt;The first step here is similar to our earlier API call: we construct the URL from a base component and an endpoint component.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Construct the required URL pieces
# base_url already exists
&lt;/span&gt;
&lt;span class="n"&gt;endpoint_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"spells"&lt;/span&gt;

&lt;span class="c1"&gt;# Construct the URL
&lt;/span&gt;
&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base_url&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;endpoint_url&lt;/span&gt;

&lt;span class="c1"&gt;# View the URL
&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://www.potterapi.com/v1/spells"&gt;https://www.potterapi.com/v1/spells&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you attempt to visit the URL at the moment though, you'll get this response:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;"error": "Must pass API key for request"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;We need to add the API key onto the request in order to authenticate it. However, it's not as simple as just sticking it onto the end - the API will think it's part of the actual URL and we'll end up requesting data from an endpoint that doesn't exist. Instead, we need to add it as a &lt;strong&gt;parameter&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Parameters
&lt;/h3&gt;

&lt;p&gt;Parameters - also called "query parameters" - are the extra pieces of information that the API is listening for. Each parameter has a name and a value, and they're attached onto the URL with a special syntax so that the API knows how to interpret them.&lt;/p&gt;

&lt;p&gt;Later on, we'll use several different parameters at once, but for this call, we just need one: the &lt;strong&gt;key&lt;/strong&gt; parameter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Add the parameter onto the URL
&lt;/span&gt;
&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"?key="&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;HP_API_KEY&lt;/span&gt;

&lt;span class="c1"&gt;# Display the URL
&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://www.potterapi.com/v1/spells?key=XXXXXXXXXXXXXXXXXX"&gt;https://www.potterapi.com/v1/spells?key=XXXXXXXXXXXXXXXXXX&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Because I've replaced my API key with a placeholder, the link above won't work. With a real API key though, it would give you the data.&lt;/p&gt;

&lt;p&gt;The "?" tells the API to stop reading the URL as an address from that point on, and start looking for parameters. Next comes the name of the parameter - &lt;code&gt;key&lt;/code&gt; - followed by an equals sign and then the actual value. In this case, I've replaced the real value with a fake one, for security reasons.&lt;/p&gt;
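
&lt;p&gt;To see that split for yourself, the standard library can take a URL apart in roughly the way described. This is an illustrative sketch only; it shows how Python divides the URL, not how the Harry Potter API's server was actually implemented.&lt;/p&gt;

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.potterapi.com/v1/spells?key=XXXXXXXXXXXXXXXXXX"

# Separate the address part from the query string that follows the "?"
parts = urlsplit(url)
print(parts.netloc)
print(parts.path)

# Turn the query string into a dictionary of parameter names and values
params = parse_qs(parts.query)
print(params)
```

&lt;p&gt;Note that &lt;code&gt;parse_qs&lt;/code&gt; stores each value in a list, because a parameter name is allowed to appear more than once in a URL.&lt;/p&gt;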

&lt;p&gt;When the API receives this request, it will start by identifying the address part of the URL, and directing it towards the right endpoint. Then it will extract any parameters it finds, matching the names of parameters it will accept to the names in the URL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Send the request
&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Check the response code
&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A response of 200 means that the server has accepted the API request and that the API key is valid.&lt;/p&gt;
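
&lt;p&gt;In a longer program, it can help to wrap the request and the status check into one function, so that bad responses are caught before you try to read the data. The sketch below is one possible shape for that. &lt;code&gt;fetch_json&lt;/code&gt; and &lt;code&gt;FakeResponse&lt;/code&gt; are hypothetical names, and the fake response stands in for the real API because the service is no longer available.&lt;/p&gt;

```python
def fetch_json(url, get):
    """Request a URL and return its data, or raise if the status is not 200.

    `get` is any function that behaves like requests.get.
    """
    response = get(url)
    if response.status_code != 200:
        raise RuntimeError(f"Request failed with status {response.status_code}")
    return response.json()


class FakeResponse:
    """Hypothetical stand-in for a response object, used to run offline."""

    status_code = 200

    def json(self):
        return "Slytherin"


# In real use you would pass requests.get instead of the lambda
result = fetch_json("https://www.potterapi.com/v1/sortinghat", lambda url: FakeResponse())
print(result)
```

&lt;p&gt;Passing the request function in as an argument is a design choice that keeps the helper easy to test without touching the network.&lt;/p&gt;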

&lt;h2&gt;
  
  
  Extracting the data
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;spells&lt;/code&gt; endpoint is a bit more complex than &lt;code&gt;sortinghat&lt;/code&gt; and returns more data. In order to extract meaningful information, we'll have to go through a few more steps.&lt;/p&gt;

&lt;p&gt;The start is the same - we can access the data using &lt;code&gt;.json()&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Access the data
&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Check the type of the data
&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&amp;lt;class 'list'&amp;gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The response this time has given us a list - we'll need to loop through it to get at the spell details.&lt;/p&gt;

&lt;p&gt;Let's start by looking at just the first item in the list.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Check the type of the first list item
&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&amp;lt;class 'dict'&amp;gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Print out the first item
&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;{'_id': '5b74ebd5fb6fc0739646754c', 'spell': 'Aberto', 'type': 'Charm', 'effect': 'opens objects'}&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Each item in the list is a dictionary of &lt;code&gt;key:value&lt;/code&gt; pairs. Now that we understand the structure of the data, we can actually access the data we requested.&lt;/p&gt;

&lt;p&gt;Because &lt;code&gt;.json()&lt;/code&gt; converts everything into Python objects, anything that you would normally do with Python is an option. With just a little more code, we can get the total number of spells:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Print the number of spells
&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;151&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We could also extract the spell names from the list.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Loop through the first five items of the list, printing out the name of each spell.

for item in data[:5]:
    print(item["spell"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Aberto&lt;br&gt;
Accio&lt;br&gt;
Age Line&lt;br&gt;
Aguamenti&lt;br&gt;
Alarte Ascendare&lt;/p&gt;
&lt;/blockquote&gt;
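
&lt;p&gt;The same loop can be written more compactly as a list comprehension. The sketch below runs on a short hand-typed sample rather than the full API data; only the first entry's effect comes from the output shown earlier, and the other entries deliberately leave that field out to show how &lt;code&gt;.get()&lt;/code&gt; copes with a missing key.&lt;/p&gt;

```python
# Hand-typed sample standing in for the list returned by the spells endpoint
sample = [
    {"spell": "Aberto", "effect": "opens objects"},
    {"spell": "Accio"},
    {"spell": "Age Line"},
]

# Collect every spell name into a new list
names = [item["spell"] for item in sample]
print(", ".join(names))

# dict.get returns a fallback instead of raising an error for a missing key
effects = [item.get("effect", "unknown") for item in sample]
print(effects)
```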

&lt;p&gt;And - in a slightly more complex example - we can count spells by type.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Count up each type of spell
&lt;/span&gt;&lt;span class="n"&gt;spell_counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;spell_counts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;spell_counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;spell_counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;spell_counts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;":"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;spell_counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Charm: 40&lt;br&gt;
Enchantment: 1&lt;br&gt;
Spell: 92&lt;br&gt;
Hex: 1&lt;br&gt;
Curse: 15&lt;br&gt;
Jinx: 2&lt;/p&gt;
&lt;/blockquote&gt;
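
&lt;p&gt;Counting things is common enough that Python's standard library has a tool for it. As a hedged alternative to the loop above, this sketch uses &lt;code&gt;collections.Counter&lt;/code&gt; on a tiny made-up sample; the placeholder entries are invented for illustration and are not real API data.&lt;/p&gt;

```python
from collections import Counter

# Made-up sample with the same shape as the spells data
sample = [
    {"spell": "Aberto", "type": "Charm"},
    {"spell": "Placeholder one", "type": "Charm"},
    {"spell": "Placeholder two", "type": "Curse"},
]

# Counter tallies how many times each type appears
spell_counts = Counter(item["type"] for item in sample)

# most_common() lists the types from most to least frequent
for spell_type, count in spell_counts.most_common():
    print(spell_type + ":", count)
```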

&lt;p&gt;The API calls get the data into your program; you can then do whatever you want with it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Parameters
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;key&lt;/code&gt; is a required parameter for all the endpoints except &lt;code&gt;sortinghat&lt;/code&gt;, but it's not the only one available. By consulting the &lt;a href="https://www.potterapi.com/"&gt;documentation&lt;/a&gt;, you can learn which API endpoints accept which query parameters.&lt;/p&gt;

&lt;p&gt;You can use parameters to filter the data, returning only a subset of the available data. To explore this, we'll use the &lt;code&gt;characters&lt;/code&gt; endpoint, which accepts several different parameters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Accessing characters
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Construct the url
&lt;/span&gt;
&lt;span class="n"&gt;endpoint_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"characters"&lt;/span&gt;

&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base_url&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;endpoint_url&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"?key="&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;HP_API_KEY&lt;/span&gt;

&lt;span class="c1"&gt;# Request all character data
&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Check the response status
&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;200&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Extract the data
&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Count the number of characters
&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;195&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Adding more parameters
&lt;/h3&gt;

&lt;p&gt;In order to add more parameters, filtering the data, we add them onto the end of the URL. "&amp;amp;" is used to connect the different parameters together.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Add another parameter onto the url
&lt;/span&gt;
&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"&amp;amp;deathEater=True"&lt;/span&gt;

&lt;span class="c1"&gt;# Request character data on characters who are Death Eaters
&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Check the response status
&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;200&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Extract the data
&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Count the Death Eaters
&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;24&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can combine parameters in any way you want, filtering the data to whatever degree you need. The query below, for example, requests information on all the pure-blood wizards who work at the Ministry of Magic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Craft the URL
&lt;/span&gt;
&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base_url&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;endpoint_url&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; \
      &lt;span class="s"&gt;"?key="&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;HP_API_KEY&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; \
      &lt;span class="s"&gt;"&amp;amp;bloodStatus=pure-blood&amp;amp;ministryOfMagic=True"&lt;/span&gt;

&lt;span class="c1"&gt;# Hit the endpoint
&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Count the wizards
&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;6&lt;/p&gt;
&lt;/blockquote&gt;
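
&lt;p&gt;As a hedged aside, the same query string can be assembled from a dictionary rather than by hand. The sketch below uses &lt;code&gt;urlencode&lt;/code&gt; from Python's standard library; &lt;code&gt;build_url&lt;/code&gt; is a hypothetical helper, and the base URL, endpoint and key are placeholder values standing in for the ones defined earlier in this post.&lt;/p&gt;

```python
from urllib.parse import urlencode

# Placeholder values (assumptions); the real ones are defined earlier in the post
base_url = "https://example.com/v1/"
endpoint_url = "characters"
HP_API_KEY = "your-api-key"


def build_url(base, endpoint, params):
    """Join the base URL, the endpoint and an encoded query string."""
    return base + endpoint + "?" + urlencode(params)


# Each key/value pair becomes one "name=value" parameter in the query string
params = {
    "key": HP_API_KEY,
    "bloodStatus": "pure-blood",
    "ministryOfMagic": True,
}

url = build_url(base_url, endpoint_url, params)
print(url)
```

&lt;p&gt;The &lt;code&gt;requests&lt;/code&gt; library can also do this encoding itself: &lt;code&gt;requests.get(base_url + endpoint_url, params=params)&lt;/code&gt; sends the same parameters without any manual string-building.&lt;/p&gt;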

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;There's an awful lot more complexity to APIs that is worth exploring; hopefully this post has made some of the key ideas clear and given you a springboard from which to investigate further.&lt;/p&gt;

&lt;p&gt;One of the best ways to build your skills &amp;amp; understanding is to find an API you're interested in and just start playing around with it. Different APIs will have their own rules and documentation, but the broad principles are very similar: hit an endpoint to make a request, include parameters to be more specific. As APIs want you to use them, the documentation is normally quite clear and accessible. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/public-apis/public-apis"&gt;This GitHub repository&lt;/a&gt; holds a large list of publicly-accessible APIs for you to play with. Go explore &amp;amp; experiment, and if you have any questions or find something interesting, please do &lt;a href="https://github.com/peritract"&gt;let me know&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pokeapi.co/"&gt;A world of dreams and adventure awaits you&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>python</category>
      <category>tutorial</category>
      <category>api</category>
      <category>requests</category>
    </item>
    <item>
      <title>Dungeons &amp; Data</title>
      <dc:creator>Dan Keefe</dc:creator>
      <pubDate>Thu, 14 May 2020 11:55:23 +0000</pubDate>
      <link>https://forem.com/peritract/dungeons-data-2675</link>
      <guid>https://forem.com/peritract/dungeons-data-2675</guid>
      <description>&lt;p&gt;I recently came across a Dungeons and Dragons (D&amp;amp;D) &lt;a href="https://www.dnd5eapi.co/" rel="noopener noreferrer"&gt;API&lt;/a&gt;, designed to support people playing D&amp;amp;D by giving them a quick way to check spell details etc. Not being a D&amp;amp;D player myself, but being interested in both fantasy monsters and data mining, I thought it would be an interesting data source to play around with.&lt;/p&gt;

&lt;p&gt;To that end, I extracted as much information about monsters as the API would give me, and started sifting through the data to see if anything interesting jumped out.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Dungeons_%26_Dragons" rel="noopener noreferrer"&gt;Dungeons &amp;amp; Dragons&lt;/a&gt; is a tabletop roleplaying game published by &lt;a href="https://company.wizards.com" rel="noopener noreferrer"&gt;Wizards of the Coast&lt;/a&gt;. Arguably/probably, it's the most well-known and influential one ever, casting a long shadow over much of the fantasy genre for the past several decades. Of course, older fantasy such as &lt;em&gt;The Lord of the Rings&lt;/em&gt; and &lt;em&gt;Beyond the Fields we Know&lt;/em&gt; cast an even longer shadow over D&amp;amp;D itself, but it's still very important within its scope.&lt;/p&gt;

&lt;p&gt;It's a collaborative roleplaying game, with some players taking on the roles of adventurers - with all sorts of different classes/abilities - and one player taking the role of "dungeon master" (DM), creating the world and animating the monsters.&lt;/p&gt;

&lt;p&gt;D&amp;amp;D is famous for, amongst other things, the number and variety of its various monsters. Monsters like the &lt;a href="https://en.wikipedia.org/wiki/Beholder_(Dungeons_%26_Dragons)" rel="noopener noreferrer"&gt;beholder&lt;/a&gt; and the &lt;a href="https://en.wikipedia.org/wiki/Owlbear" rel="noopener noreferrer"&gt;owlbear&lt;/a&gt; originate from D&amp;amp;D, but have taken on a wider cultural significance. Other D&amp;amp;D monsters were adapted from earlier legendary/folklore sources, but their portrayal within the game has had a definite effect on the way they are &lt;a href="https://en.wikipedia.org/wiki/Lich" rel="noopener noreferrer"&gt;widely thought of&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As the game's name suggests, much of D&amp;amp;D revolves around noble heroes delving deep into dungeons and fighting monstrous creatures. This post describes my delving into monster data in a way that is both similar to, and much less brave than, those adventurers' expeditions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Disclaimer
&lt;/h2&gt;

&lt;p&gt;I don't know much about the specifics of D&amp;amp;D; I have a general cultural knowledge of how it works, etc., but I might say something dumb at any point because I don't know the rules; please view me as &lt;a href="https://www.buttersafe.com/2008/10/23/the-detour/" rel="noopener noreferrer"&gt;a well-intentioned but naive tourist&lt;/a&gt;, and forgive any irritating ignorance.&lt;/p&gt;

&lt;p&gt;This post is data-mining D&amp;amp;D from an outside perspective, and some of the things I find interesting might be less so to you if you have prior knowledge. I'm not claiming any definitive analysis, or special authority here, but it's fair to say that charging in without requisite prior knowledge is a &lt;a href="https://www.youtube.com/watch?v=mLyOj_QD4a4" rel="noopener noreferrer"&gt;time-honoured dungeon-crawling tradition&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Aims
&lt;/h2&gt;

&lt;p&gt;My analysis was driven primarily by curiosity, rather than any set aim, but there were a couple of questions I was interested in answering:&lt;/p&gt;

&lt;h3&gt;
  
  
  Are there neglected monster niches?
&lt;/h3&gt;

&lt;p&gt;People create "homebrew" D&amp;amp;D content all of the time, developing new quests and classes and monsters to put in their games. If there are particular kinds or styles of monster that are relatively uncommon, it could be useful to identify them, so that people creating new content can target the most untrodden ground.&lt;/p&gt;

&lt;h3&gt;
  
  
  Are the raw stats (such as strength and charisma) strongly linked to challenge rating (a monster's canonical difficulty to overcome)?
&lt;/h3&gt;

&lt;p&gt;Challenge rating (CR) seems - from an outside perspective - to be a bit of an arbitrary figure. Surely so much of the game's difficulty will depend on player choice, how intelligently the DM plays the undead wizard, etc. I'd like to look at how closely linked the CR of a monster is to its other numeric stats. My initial assumption was that there would be a noticeable trend, but that there would be a lot of noise caused by abilities and such that affected the monster's difficulty alongside its raw scores.&lt;/p&gt;

&lt;h2&gt;
  
  
  Motivation
&lt;/h2&gt;

&lt;p&gt;I was motivated to carry out this analysis by four factors, listed in ascending order of importance:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I'm currently doing the &lt;a href="https://www.udacity.com/course/data-scientist-nanodegree--nd025" rel="noopener noreferrer"&gt;Udacity Data Scientist Nanodegree&lt;/a&gt;, and projects are required.&lt;/li&gt;
&lt;li&gt;There's a &lt;a href="https://www.dmsguild.com/" rel="noopener noreferrer"&gt;very active&lt;/a&gt; modding/homebrew community for D&amp;amp;D, and this sort of analysis might conceivably be useful to someone looking to create new monsters, etc.&lt;/li&gt;
&lt;li&gt;It was an opportunity to practice various data-related techniques.&lt;/li&gt;
&lt;li&gt;I found the API and thought it would be fun to play with.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;All the code written during this project can be found on &lt;a href="https://github.com/Peritract/data-projects/tree/master/dungeons-and-data" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. I wrote it using Python 3, inside a Jupyter notebook. The linked repository contains a full list of libraries used and commented code explaining what I did to generate the various visualisations explained below.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Source
&lt;/h2&gt;

&lt;p&gt;The data for this project was pulled from the &lt;a href="https://www.dnd5eapi.co/" rel="noopener noreferrer"&gt;Dungeons and Dragons 5th edition API&lt;/a&gt;. It's a really great API to work with - the documentation is very clear, and there are no fiddly authorisation problems.&lt;/p&gt;

&lt;p&gt;I made a total of 323 calls to the API - one to get a full list of available monsters, and 322 to pull full details for each of those listed monsters. I stored the whole thing in a Pandas dataframe, and then was good to go. Creating the full dataframe took around a minute.&lt;/p&gt;
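
&lt;p&gt;The one-plus-many call pattern described above could look roughly like the sketch below. It is not the code from the linked repository: &lt;code&gt;fetch_json&lt;/code&gt; and &lt;code&gt;fetch_all_monsters&lt;/code&gt; are hypothetical helpers, and the &lt;code&gt;/api/monsters&lt;/code&gt; path, the &lt;code&gt;results&lt;/code&gt; key and the per-monster &lt;code&gt;url&lt;/code&gt; field are assumptions about the API's response layout that should be checked against its documentation.&lt;/p&gt;

```python
import json
from urllib.request import urlopen

# Assumed API root; check the documentation before relying on it
API_ROOT = "https://www.dnd5eapi.co"


def fetch_json(path):
    """Request one API path and decode the JSON body."""
    with urlopen(API_ROOT + path) as response:
        return json.load(response)


def fetch_all_monsters(fetch=fetch_json):
    """Make one call for the monster list, then one call per listed monster."""
    listing = fetch("/api/monsters")
    # Assumption: each listed entry carries a relative "url" to its full record
    return [fetch(entry["url"]) for entry in listing["results"]]
```

&lt;p&gt;Passing the resulting list of dictionaries to &lt;code&gt;pandas.DataFrame&lt;/code&gt; gives one row per monster. The &lt;code&gt;fetch&lt;/code&gt; argument is only there so that the function can be tried out with a stand-in instead of real network calls.&lt;/p&gt;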

&lt;p&gt;I'd like to stress that this analysis is only really scratching the surface of what can be done with this data; this is very much the &lt;a href="https://www.penny-arcade.com/comic/2008/11/14/" rel="noopener noreferrer"&gt;kill n common mobs&lt;/a&gt; of data-mining D&amp;amp;D. It could absolutely be taken further and used to do even more exciting things, probably by someone who is both a better data scientist than me and actually a player of the game.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data cleaning &amp;amp; processing
&lt;/h2&gt;

&lt;p&gt;One of my absolute favourite things about APIs is that - if they're well-designed and reasonably maintained - almost no data cleaning is required. I dropped a few columns that were mostly empty (very few monsters have legendary actions, for example), but otherwise the data was automatically in a fit state. It was lovely.&lt;/p&gt;

&lt;p&gt;In terms of processing the data, I dropped several more columns that were very granular, such as the actions column containing specific details of the various actions each monster was capable of. I wanted to keep the analysis relatively high-level, so I kept only those columns that could be reduced to a single figure/detail for each separate monster.&lt;/p&gt;

&lt;p&gt;In order to extract key details - such as which monsters could swim - I had to do a small amount of data manipulation to pull data out of &lt;code&gt;dict&lt;/code&gt;s into a usable format, but mostly this was a very easy dataset to work with.&lt;/p&gt;
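
&lt;p&gt;To make that manipulation concrete, here is a minimal sketch of turning a nested dictionary into flat true/false columns. It assumes each monster record has a &lt;code&gt;speed&lt;/code&gt; dictionary keyed by movement mode; &lt;code&gt;movement_flags&lt;/code&gt; and the example record are hypothetical, not taken from the project code.&lt;/p&gt;

```python
def movement_flags(monster, modes=("walk", "fly", "swim", "burrow")):
    """Return one True/False flag per movement mode for a single monster."""
    # A missing or empty "speed" entry is treated as no movement modes at all
    speed = monster.get("speed") or {}
    return {"can_" + mode: mode in speed for mode in modes}


# Hypothetical record, shaped the way the API is assumed to return it
monster = {"name": "Example Eel", "speed": {"walk": "10 ft.", "swim": "40 ft."}}

flags = movement_flags(monster)
print(flags)
```

&lt;p&gt;Applying a function like this to every record produces columns that can be counted or plotted directly.&lt;/p&gt;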

&lt;h2&gt;
  
  
  Data exploration
&lt;/h2&gt;

&lt;p&gt;In total, the API provided me with data on 322 different monsters. For each one, I had information about its name, size, type (zombies are the "undead" type, for example), moral leanings (more on that later) and various numeric stats, ranging from how strong it was to how difficult it would be to defeat.&lt;/p&gt;

&lt;h3&gt;
  
  
  Type and Size
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F5e4uzbdqwqo6mg82cqs3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F5e4uzbdqwqo6mg82cqs3.png" alt="Monster types"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I started just by counting up the monsters for each type. "Beast" was by far the most common type of monster, followed by "dragon". Dragon being one of the most popular types makes sense - given the name of the game - but I was disappointed to discover that that's mostly because there are a whole bunch of different sub-types of dragon (red vs. black vs. green, and so on) and that each type appears several times at various ages. This makes dragons seem more dominant in the data than I feel is strictly fair; I get that it is important to have separate stat blocks for a creature that grows so dramatically in power, but it still seems like all the many and varied humanoids get a raw deal.&lt;/p&gt;

&lt;p&gt;I did spot an almost immediate imbalance though, linking back to the first question I wanted to answer. There are 23 separate "fiend" type monsters (basically demons), but only six "celestials" (basically angels). This violates the time-honoured principle of "as above, so below", and suggests that, if you are a dungeon master looking to brew up some new monsters, you're a lot more likely to come up with an original heavenly concept than you are a hellish one.&lt;/p&gt;
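
&lt;p&gt;Counting monsters per type needs very little code. The sketch below uses &lt;code&gt;collections.Counter&lt;/code&gt; on a small made-up sample; the records and the &lt;code&gt;type&lt;/code&gt; key are illustrative assumptions, not real API output.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical sample; the real data has one record per monster
monsters = [
    {"name": "Example Imp", "type": "fiend"},
    {"name": "Example Wolf", "type": "beast"},
    {"name": "Example Bear", "type": "beast"},
    {"name": "Example Angel", "type": "celestial"},
]

type_counts = Counter(monster["type"] for monster in monsters)

# most_common() sorts from the most to the least frequent type
for monster_type, count in type_counts.most_common():
    print(monster_type, count)
```

&lt;p&gt;With the data already in a Pandas dataframe, &lt;code&gt;df["type"].value_counts()&lt;/code&gt; gives the same tallies.&lt;/p&gt;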

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F4ehy8frmxn9sx2ampg8d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F4ehy8frmxn9sx2ampg8d.png" alt="Monster counts by size"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next thing I looked at was monster size category. This seemed to show a reasonable spread: the majority of monsters were human-sized (medium) or one step larger, which makes sense both in terms of drama and providing sufficient challenge to players. A small number of extremely large monsters exist in the "gargantuan" category to provide challenge and terror.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stat distributions
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fc1z5zp2410yedynfakka.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fc1z5zp2410yedynfakka.png" alt="Distribution across monster stats"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I plotted the distribution of each of the main numeric stats across the whole set of monsters. This, again, was in aid of seeing if there were any obvious under-represented monsters. There was a lot of variation present, showing that for every stat, there are monsters at both ends of the scale. A small number of monsters had truly gargantuan (see what I did there) amounts of hit points, but the vast majority of monsters fell into a smaller range.&lt;/p&gt;

&lt;p&gt;One interesting thing did jump out to me here though - when you compare strength and intelligence, the distributions are very different. Discounting hit points because it clearly scales by other rules, you can see that the peak distribution for strength is around 16 (later than any other stat), while intelligence peaks close to zero (earlier than any other). I can see an ecological justification for it - many of the monsters are "beasts", and cannot be expected to have great minds - but it does suggest that the majority of monsters rely on physical, rather than intellectual, prowess. Wisdom, the other cerebral stat, has a similar distribution to intelligence.&lt;/p&gt;

&lt;p&gt;"Dumb brute" seems to be a more common monster archetype than "scheming mastermind", and I suggest that - if anyone is looking to create new, original monsters - smart ones would be more easily differentiated. Plus, they're more fun, in my opinion; out of the three kidnappers in &lt;em&gt;The Princess Bride&lt;/em&gt;, Vizzini is the most tense confrontation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Senses and movement
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F7isa83f23c8r182lairv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F7isa83f23c8r182lairv.png" alt="Sense and movement"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Well over half of monsters can see in the dark. A respectable number have "blindsight" - they perceive the world through other senses, such as echolocation. A very small number have "truesight", which lets them see through illusions to unvarnished reality. This seems like a reasonable progression to me, with not too many monsters having an advantage in the dark, and the more exotic abilities being more rare.&lt;/p&gt;

&lt;p&gt;I do think that "tremorsense" deserves more related monsters though; detecting vibrations is something that a lot of real animals, such as moles, do, and some of the most &lt;a href="https://www.imdb.com/title/tt0100814/" rel="noopener noreferrer"&gt;iconic movie monsters&lt;/a&gt; ever have used it to great effect. &lt;/p&gt;

&lt;p&gt;The movement types suggest a similar neglected niche; while almost everything can walk, and flying and swimming have reasonable monster counts, very few monsters burrow. I'd like to champion the cause of blind things gnawing through the earth, because I find them extremely terrifying and also because they seem like they'd present new &amp;amp; exciting challenges compared to yet another goblinoid clan.&lt;/p&gt;

&lt;h3&gt;
  
  
  Alignment
&lt;/h3&gt;

&lt;p&gt;D&amp;amp;D has a moral alignment system based on two axes, one between good &amp;amp; evil, the other between law &amp;amp; chaos. Canonically, characters &amp;amp; monsters are given a two-word alignment description, such as "Chaotic evil" or "Lawful neutral".&lt;/p&gt;

&lt;p&gt;I know enough to know that there's a whole discourse around this in terms of how consistent/meaningful alignment is, how strictly it should be considered, and whether or not it's just a bad idea for the game/society generally. I have no intention of getting embroiled in that.&lt;/p&gt;

&lt;p&gt;However, I was curious about the representation of each alignment amongst monsters. It turns out, somewhat disappointingly, that a lot of monsters - 128 out of 322 - are "unaligned": they have no moral stance at all. A further few have awkward alignments, such as the Cloud Giant, which is "neutral good (50%) or neutral evil (50%)". After removing all the unaligned and awkward though, I was able to plot the monster counts for each of the nine normal alignments.&lt;/p&gt;
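
&lt;p&gt;One way to do that filtering is to keep only values that exactly match one of the nine standard alignments, which drops "unaligned" and the awkward mixed cases in a single step. The sketch below is an illustration under assumptions: the spellings in &lt;code&gt;STANDARD_ALIGNMENTS&lt;/code&gt; (including plain "neutral" for the middle category) are guesses at how the data labels them, and &lt;code&gt;count_standard_alignments&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
from collections import Counter

# Assumed spellings of the nine standard alignments
STANDARD_ALIGNMENTS = {
    "lawful good", "neutral good", "chaotic good",
    "lawful neutral", "neutral", "chaotic neutral",
    "lawful evil", "neutral evil", "chaotic evil",
}


def count_standard_alignments(alignments):
    """Count only the values that are one of the nine standard alignments."""
    return Counter(a for a in alignments if a in STANDARD_ALIGNMENTS)


# Hypothetical sample, including the unaligned and awkward cases
sample = [
    "chaotic evil",
    "unaligned",
    "lawful good",
    "neutral good (50%) or neutral evil (50%)",
    "chaotic evil",
]

counts = count_standard_alignments(sample)
print(counts)
```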

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fbhaghvyirbjrkhr9url6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fbhaghvyirbjrkhr9url6.png" alt="Monster counts by alignment"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Unsurprisingly, the most common alignments for monsters were evil ones, most commonly chaotic evil. Neutral alignments on either axis were quite unpopular, though "True Neutral" - the middle category - had quite a few occupants. Off-hand, I'm going to blithely assert that that's probably because it's a bit of a catch-all category. If there is a neglected niche here, it's monsters that aren't especially monstrous, ones that lean towards good or sit on the fence a lot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Neglected monster niches
&lt;/h2&gt;

&lt;p&gt;My first aim was to examine neglected monster niches - particular types of monster that might be less than evenly-represented in the bestiary. Based on my exploration so far, I venture to suggest that there are some monster niches that are, if not under-occupied, at least less-occupied than others. D&amp;amp;D will always have a disproportionate number of dragons, but if you're looking for new monsters, there are some areas that would be more original than others.&lt;/p&gt;

&lt;p&gt;By my reckoning, the most original and exciting monster of all would be a well-intentioned burrowing celestial mastermind. If you're looking for further inspiration, allow me to point you towards the &lt;a href="https://factanimal.com/star-nosed-mole/" rel="noopener noreferrer"&gt;most celestial of all moles&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stats and challenge rating
&lt;/h2&gt;

&lt;p&gt;My other aim was to see how closely related monster stats are to challenge rating. To explore this, I calculated the correlations between CR and each other stat.&lt;/p&gt;

&lt;p&gt;It's worth mentioning at this point that this analysis rests on shaky ground, mathematically speaking; while CR looks like a number, it's actually a set of ordered and inconsistently-spaced categories. For that reason, plus the limited data available, I chose not to do any more complicated modelling, because it wouldn't be very valid. Correlations suffer from the same issue, but can still be indicative as long as the above caveat is borne in mind; the correlations used here shouldn't be taken as precise metrics, but solely as a rough way of comparing two variables.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stat&lt;/th&gt;
&lt;th&gt;CR correlation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Strength&lt;/td&gt;
&lt;td&gt;0.72&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dexterity&lt;/td&gt;
&lt;td&gt;-0.02&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Constitution&lt;/td&gt;
&lt;td&gt;0.86&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intelligence&lt;/td&gt;
&lt;td&gt;0.64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wisdom&lt;/td&gt;
&lt;td&gt;0.55&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Charisma&lt;/td&gt;
&lt;td&gt;0.69&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hit points&lt;/td&gt;
&lt;td&gt;0.94&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Armour class&lt;/td&gt;
&lt;td&gt;0.76&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
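
&lt;p&gt;For readers who want to reproduce figures like those in the table, here is a minimal sketch of a correlation calculation. It assumes the Pearson coefficient, which the post does not confirm was the measure used; &lt;code&gt;pearson&lt;/code&gt; is a hypothetical helper and the numbers are invented for illustration, not drawn from the monster data.&lt;/p&gt;

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean (unnormalised covariance)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: raises ZeroDivisionError if either sequence is constant
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical values, not taken from the monster data
challenge_rating = [0.25, 1, 2, 5, 10]
hit_points = [7, 22, 45, 110, 200]

print(round(pearson(challenge_rating, hit_points), 2))
```

&lt;p&gt;In Pandas, &lt;code&gt;DataFrame.corr()&lt;/code&gt; computes Pearson correlations by default, and passing &lt;code&gt;method="spearman"&lt;/code&gt; switches to a rank-based measure, which may sit more comfortably with an ordered but unevenly-spaced scale like CR.&lt;/p&gt;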

&lt;p&gt;The majority of the stats showed a clear positive correlation; this is to be expected, as the point of those stats is to quantify a creature's attributes, and a creature that is stronger/faster/smarter than another will present a more significant challenge.&lt;/p&gt;

&lt;p&gt;Hit points, followed by constitution (which is linked to hit points and affects their growth) were the most highly correlated. This suggests that one way of increasing a monster's CR is just to bump its health up a bit so it takes longer to kill and has more time to fight back.&lt;/p&gt;

&lt;p&gt;The more physical stats - strength and constitution - are more correlated with CR than the cerebral ones, which, again, makes sense; I'm more terrified of intelligent enemies over the long term, but when wrestling in a cave, I'd be more confident grappling a wizard than a cave troll.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F4xskjpxnkk8ne2oo03k2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F4xskjpxnkk8ne2oo03k2.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The most interesting correlation, though, was with dexterity. Unlike the other stats, which plotted against CR in broadly similar neat lines, dexterity was all over the place. There were several high-CR monsters that had relatively poor scores for dexterity, and a couple of otherwise pitiful creatures (by CR) with high dexterity; the most anomalous is the will-o'-the-wisp, with a CR of 2 but a dexterity score of 28, significantly higher than any other monster.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.dandwiki.com/wiki/5e_SRD:dexterity" rel="noopener noreferrer"&gt;D&amp;amp;D wiki&lt;/a&gt; defines dexterity as measuring "agility, reflexes, and balance", but it seems to be used as a proxy for size in some cases, with dexterity being assumed to belong more with dainty creatures than hefty ones. To some extent, this makes sense, but it does feel slightly out of the spirit of the stat. The highest dexterity score amongst gargantuan creatures is 14, but the majority of the behemoths can fly, demonstrating that they aren't simply lumbering beasts. The kraken has many weaving tentacles (the description in the &lt;a href="https://www.dandwiki.com/wiki/5e_SRD:kraken" rel="noopener noreferrer"&gt;wiki&lt;/a&gt; uses verbs such as "twining" when describing it) which again matches my layman's understanding of dexterity, but is not reflected in the stat.&lt;/p&gt;

&lt;p&gt;I'm aware that I sound rather defensive of the monsters here (I have always had a soft spot for krakens; I am unsure why), and that this is probably one of those things that would make more intuitive sense if I was more familiar with the game in practice. I think possibly the stat is thinking of whole body dexterity (how easily can you fling yourself out of the path of an arrow), whereas I'm picturing the kraken's fine control of a single tentacle: the difference is one of nimbleness vs. precision.&lt;/p&gt;

&lt;p&gt;I do still think it's interesting that dexterity does not match the pattern of the other stats, suggesting that this stat is - either perceived to be or actually - less important than the others. This is definitely something that I lack the necessary domain knowledge to untangle further; if any reader can shed more light on this, I'd be very interested.&lt;/p&gt;

&lt;p&gt;I was, overall, a little disappointed by how high the correlations were between stats and CR. I expected it to be noticeable, but not quite to the extent that it was (particularly with hit points). Perhaps naively, I wanted to find that actually the scores were less important than the fine details of the monster.&lt;/p&gt;

&lt;p&gt;Still, there is definitely sufficient variation amongst monsters and stats to show that all those abilities and attributes do matter somewhere, albeit not to the extent that I wished. It must always be borne in mind, however, that CR itself is a crude and highly-subjective metric, heavily dependent on &lt;a href="https://1d4chan.org/wiki/Tucker%27s_Kobolds" rel="noopener noreferrer"&gt;how the monster is played&lt;/a&gt;, rather than how it is written.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;I don't have anything earth-shattering to wrap up with here; no big conclusions or stunning reversals. I got to explore an unfamiliar API and mine data that is a lot more interesting than most datasets floating around on the web.&lt;/p&gt;

&lt;p&gt;I'd like to experiment further with the API in the future, perhaps trying to generate new spells based on the descriptions of existing ones; unfortunately (because generating monsters would be the most fun of all), the descriptions of monsters are not available through the API.&lt;/p&gt;

&lt;p&gt;I enjoyed playing around with this data, and I found sufficient things to amuse me in it; it is my fervent hope that anyone who read this far down the post found it at least a fraction as interesting to read as it was to write.&lt;/p&gt;

&lt;p&gt;If anyone has suggestions for further avenues of analysis, I'd love to hear them. If anyone would like to clarify or explain anything that confused me (like the dexterity correlation), I'd be extremely interested in that as well. Please do get in touch.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>python</category>
      <category>games</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Wordclouds in Python</title>
      <dc:creator>Dan Keefe</dc:creator>
      <pubDate>Sun, 05 Apr 2020 15:06:07 +0000</pubDate>
      <link>https://forem.com/peritract/wordclouds-in-python-562e</link>
      <guid>https://forem.com/peritract/wordclouds-in-python-562e</guid>
      <description>&lt;p&gt;Wordclouds are a quick, engaging way to visualise text data. In Python, the simplest and most effective way to generate wordclouds is through the use of the &lt;a href="http://amueller.github.io/word_cloud/"&gt;Wordcloud&lt;/a&gt; library. In this tutorial, I'll explain how to generate wordclouds using the Wordcloud library, showing how to customise and improve your visualisations.&lt;/p&gt;

&lt;p&gt;This tutorial was written using &lt;strong&gt;Jupyter notebooks&lt;/strong&gt;, &lt;strong&gt;Python 3.7.5&lt;/strong&gt; and &lt;strong&gt;Wordcloud 1.6.0&lt;/strong&gt;; things might behave slightly differently if you're in a different IDE or using different versions of the language/library.&lt;/p&gt;

&lt;p&gt;You can find a complete copy of the code for this tutorial on &lt;a href="https://github.com/Peritract/tutorials"&gt;GitHub&lt;/a&gt;, along with the text data and images used throughout.&lt;/p&gt;

&lt;h2&gt;
  
  
  Disclaimer
&lt;/h2&gt;

&lt;p&gt;Every statistician I have ever met requires me to inform you that wordclouds are not useful for analysis because they are simplistic and often misleading; I recommend &lt;a href="https://www.niemanlab.org/2011/10/word-clouds-considered-harmful/"&gt;this excellent article&lt;/a&gt; which goes into more detail on the problems with the form.&lt;/p&gt;

&lt;p&gt;The criticisms of wordclouds are absolutely valid, but that's not to say that wordclouds are pointless. While I wouldn't recommend them as a method of analysis or information extraction, they're a useful tool for presentation - people seem to find them fun and engaging, and they're a good thing to have on initial pages of presentations, etc. As long as you focus on using them for aesthetic, rather than analytic, reasons, they have a place.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data source
&lt;/h2&gt;

&lt;p&gt;In order to demonstrate the possibilities of wordclouds, some text data is required. For the purposes of this tutorial, I've chosen to use the text of &lt;em&gt;Addie's Husband&lt;/em&gt; or &lt;em&gt;Through Clouds to Sunshine&lt;/em&gt;, a novel by &lt;a href="https://www.jstor.org/stable/2911375?seq=1"&gt;Mrs. Gordon Smythies&lt;/a&gt;, one of the most popular and most forgotten of Victorian novelists. &lt;/p&gt;

&lt;p&gt;You can find a copy of &lt;em&gt;Addie's Husband&lt;/em&gt; on &lt;a href="https://www.amazon.co.uk/Addies-Husband-Mrs-Gordon-Smythies-ebook/dp/B014LRV9YS"&gt;Amazon&lt;/a&gt; or on &lt;a href="https://www.gutenberg.org/files/49806/49806-h/49806-h.htm"&gt;Project Gutenberg&lt;/a&gt;. It's the affecting story of Adelaide, a young woman with bad lungs but a good heart. I accessed the text from Project Gutenberg, using the excellent &lt;a href="https://pypi.org/project/Gutenberg/"&gt;Gutenberg Python library&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In order that this tutorial can focus almost entirely on wordclouds, rather than text cleaning, I've already processed the novel's full text into a more consistent form. To be specific, the following steps have been carried out:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Removal of all metadata, including chapter headings, leaving just the prose&lt;/li&gt;
&lt;li&gt;Conversion of all text to lowercase&lt;/li&gt;
&lt;li&gt;Removal of all punctuation, special characters, and numeric values&lt;/li&gt;
&lt;li&gt;Removal of &lt;a href="https://en.wikipedia.org/wiki/Stop_words"&gt;stopwords&lt;/a&gt; and all words with fewer than 3 letters&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://en.wikipedia.org/wiki/Lemmatisation"&gt;Lemmatisation&lt;/a&gt; (converting each word to its dictionary form)&lt;/li&gt;
&lt;li&gt;Removal of all proper nouns I found on a cursory search&lt;/li&gt;
&lt;/ol&gt;
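
&lt;p&gt;For anyone who wants to try something similar on their own text, here is a minimal sketch of steps 2 to 4 using only the standard library. The names &lt;code&gt;STOPWORDS&lt;/code&gt; and &lt;code&gt;clean_text&lt;/code&gt; are hypothetical, the stopword list is deliberately tiny, and lemmatisation and proper-noun removal are left out; it is an illustration, not the code that produced the file used in this tutorial.&lt;/p&gt;

```python
import re

# A deliberately tiny, hypothetical stopword list; real lists are much longer
STOPWORDS = {"the", "and", "was", "her", "his", "with", "that", "for"}


def clean_text(raw):
    """Lowercase the text, then keep only longer words that are not stopwords."""
    # Find runs of three or more letters; punctuation, digits,
    # and shorter words are skipped along the way
    tokens = re.findall(r"[a-z]{3,}", raw.lower())
    kept = [token for token in tokens if token not in STOPWORDS]
    return " ".join(kept)


sample = "The soldier, the sailor and 2 tailors met Addie's husband."
print(clean_text(sample))  # soldier sailor tailors met addie husband
```

&lt;p&gt;Note that this crude approach splits &lt;em&gt;Addie's&lt;/em&gt; at the apostrophe and keeps only "addie"; a real cleaning pipeline would probably want to handle contractions and possessives more carefully.&lt;/p&gt;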

&lt;p&gt;The end result of this process is a text file - &lt;code&gt;adelaide.txt&lt;/code&gt; - containing a standardised and simplified form of &lt;em&gt;Addie's Husband&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Required libraries
&lt;/h2&gt;

&lt;p&gt;In order to work with the Wordcloud library effectively, you require several imports. You need the Wordcloud library itself, obviously, but it also helps to have libraries to deal with text processing and images.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;  &lt;span class="c1"&gt;# Count the frequency of distinct strings
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;wordcloud&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;WordCloud&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ImageColorGenerator&lt;/span&gt;  &lt;span class="c1"&gt;# Generate wordclouds
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;  &lt;span class="c1"&gt;# Load images from files
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;  &lt;span class="c1"&gt;# Convert images to numbers
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Loading and preparing the data
&lt;/h2&gt;

&lt;p&gt;As already mentioned, the data for this tutorial is stored in a &lt;code&gt;.txt&lt;/code&gt; file. The first step is simply to load the &lt;a href="https://github.com/Peritract/tutorials/blob/master/wordclouds/resources/adelaide.txt"&gt;file&lt;/a&gt;'s data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Load the data from a file
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"./resources/adelaide.txt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# View the first 200 characters of the text
&lt;/span&gt;
&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;'soldier sailor tinker tailor policeman plowboy gentleman lift lovely head dear marry gentleman miss absorbed enjoyment ruddy ribstone pippin turn blooming freckle face speaker answer pleasantly though'&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Our current text is perfectly suitable for wordcloud generation - given raw text, the Wordcloud library will automatically process it and generate a wordcloud.&lt;/p&gt;

&lt;p&gt;However, text can also be provided in the form of a frequency dictionary, in which the &lt;code&gt;key:value&lt;/code&gt; pairs have the form &lt;code&gt;word:frequency&lt;/code&gt;. I find this conceptually neater, as it allows you more explicit control over the processing steps before generating the cloud.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Split the text into a list of individual words
&lt;/span&gt;
&lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;" "&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Count the frequency of each word
&lt;/span&gt;
&lt;span class="n"&gt;word_counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Display the count for a single word.
&lt;/span&gt;
&lt;span class="n"&gt;word_counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"love"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;133&lt;/p&gt;
&lt;/blockquote&gt;
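
&lt;p&gt;A &lt;code&gt;Counter&lt;/code&gt; also makes it easy to check which words will dominate the cloud before generating anything. The sketch below uses a short made-up string (&lt;code&gt;sample_text&lt;/code&gt; and &lt;code&gt;sample_counts&lt;/code&gt; are hypothetical names) rather than the novel, so the numbers are purely illustrative.&lt;/p&gt;

```python
from collections import Counter

# A short stand-in for the cleaned novel text (hypothetical sample)
sample_text = "love heart love dear heart love"

# Calling split() with no argument ignores repeated spaces and line breaks,
# so no empty strings end up being counted as words
sample_counts = Counter(sample_text.split())

# List the two most frequent words and their counts
print(sample_counts.most_common(2))  # [('love', 3), ('heart', 2)]
```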

&lt;h2&gt;
  
  
  Basic wordclouds
&lt;/h2&gt;

&lt;p&gt;In order to generate and display the most basic of wordclouds, very little is required. You need a &lt;code&gt;WordCloud&lt;/code&gt; object, and then to call &lt;code&gt;.generate()&lt;/code&gt; on it, passing a string as an argument.&lt;/p&gt;

&lt;p&gt;Finally, to display the cloud, call &lt;code&gt;.to_image()&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;fog_machine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;WordCloud&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Create a wordcloud generator 
&lt;/span&gt;
&lt;span class="n"&gt;fog_machine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Generate the cloud using raw text
&lt;/span&gt;
&lt;span class="n"&gt;fog_machine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_image&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Og8MQWh6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/qtycut8tlk4me5ce8k4r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Og8MQWh6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/qtycut8tlk4me5ce8k4r.png" alt="Basic cloud image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our basic wordcloud is rather small, and will require some customisation to improve.&lt;/p&gt;

&lt;p&gt;In the code below, I've generated a slightly better one, by using the following parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;width&lt;/code&gt; and &lt;code&gt;height&lt;/code&gt; to increase the size of the cloud's canvas area, and thus image quality&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;min_font_size&lt;/code&gt; to ensure that no words are too small to read&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;background_color&lt;/code&gt; to demonstrate that it doesn't have to be black if you don't want it to&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;colormap&lt;/code&gt; to choose a specific colour palette. Any valid &lt;a href="https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html"&gt;Matplotlib colormap&lt;/a&gt; name is acceptable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I've also chosen to generate the cloud using the frequency dictionary this time, not the raw text; that doesn't really have an effect on the style, it's just to demonstrate how you would do it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# Create a wordcloud generator with some better defaults
&lt;/span&gt;
&lt;span class="n"&gt;fog_machine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;WordCloud&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;min_font_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;background_color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"#333333"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;colormap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"spring"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Generate the cloud using a frequency dictionary
&lt;/span&gt;
&lt;span class="n"&gt;fog_machine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate_from_frequencies&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word_counts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;fog_machine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_image&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Display the cloud
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dYyGG2yo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/bd34sq3s1qixkv96f3ca.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dYyGG2yo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/bd34sq3s1qixkv96f3ca.png" alt="Better cloud image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's not the best thing ever - I don't really have much of a talent for visual design - but I think it's clearly an improvement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shaped wordclouds
&lt;/h2&gt;

&lt;p&gt;Once you can create wordclouds with whatever colours you like, the next step is to shape the wordcloud, changing it from a boring rectangle into something appropriate to the content.&lt;/p&gt;

&lt;p&gt;This is done by using an image as a &lt;a href="https://www.colorexpertsbd.com/blog/what-is-image-masking"&gt;mask&lt;/a&gt;; you provide the &lt;code&gt;WordCloud&lt;/code&gt; object a numerical representation of an image with a &lt;strong&gt;white&lt;/strong&gt; background, and the wordcloud will only draw words in positions where the image is not white.&lt;/p&gt;

&lt;p&gt;As &lt;em&gt;Addie's Husband&lt;/em&gt; is a romance, it seems appropriate that the mask we use should be a heart. The image I'm using was sourced from &lt;a href="https://templatetrove.com/"&gt;Template Trove&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"./resources/heart.jpg"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Load the image from a file
&lt;/span&gt;
&lt;span class="n"&gt;image&lt;/span&gt;  &lt;span class="c1"&gt;# Display the image
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sKgAlMeY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/9fzqwxykz0ha9emyrjin.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sKgAlMeY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/9fzqwxykz0ha9emyrjin.jpg" alt="heart image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the image is loaded in, we need to convert it into a numeric form, in which each pixel is represented as an array of three integers; the &lt;code&gt;WordCloud&lt;/code&gt; object will then only draw words on top of pixels in the image which are not equal to &lt;code&gt;[255, 255, 255]&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Convert the image to a numeric representation (a 3D array)
&lt;/span&gt;
&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Display the top left pixel of the mask, which is white
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;array([255, 255, 255], dtype=uint8)&lt;/p&gt;
&lt;/blockquote&gt;
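
&lt;p&gt;The rule described above can be illustrated without any image library at all. In this sketch, &lt;code&gt;tiny_image&lt;/code&gt; is a made-up 2x3 grid of pixels and &lt;code&gt;drawable&lt;/code&gt; is a hypothetical helper; it mirrors the behaviour described here and is not the library's internal code.&lt;/p&gt;

```python
WHITE = (255, 255, 255)

# A tiny hypothetical 2x3 "image": each pixel is a (red, green, blue) tuple
tiny_image = [
    [(255, 255, 255), (200, 0, 0), (255, 255, 255)],
    [(200, 0, 0), (200, 0, 0), (254, 255, 255)],
]


def drawable(pixel):
    """Return True if words may be placed on this pixel (it is not pure white)."""
    return tuple(pixel[:3]) != WHITE


# Print the grid, marking drawable pixels with "#" and masked pixels with "."
for row in tiny_image:
    print("".join("#" if drawable(pixel) else "." for pixel in row))
```

&lt;p&gt;The almost-white pixel in the bottom right still counts as drawable, because only pure white is masked out. That is worth keeping in mind with JPEG images, where compression can leave near-white speckles in what looks like a white background.&lt;/p&gt;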

&lt;p&gt;After creating the mask, you create the wordcloud almost exactly as before, just passing in the &lt;code&gt;mask&lt;/code&gt; parameter.&lt;/p&gt;

&lt;p&gt;One key point to note is that masked wordclouds ignore the &lt;code&gt;width&lt;/code&gt; and &lt;code&gt;height&lt;/code&gt; parameters, instead sizing themselves based on the dimensions of the mask, so there is no need to include them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# Create a wordcloud generator with a mask
&lt;/span&gt;
&lt;span class="n"&gt;fog_machine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;WordCloud&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;min_font_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;colormap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"Reds"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  

&lt;span class="c1"&gt;# Generate the cloud using a frequency dictionary
&lt;/span&gt;
&lt;span class="n"&gt;fog_machine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate_from_frequencies&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word_counts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;fog_machine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_image&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Display the cloud
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--oFdZLDlM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/695a5x0gv15obb61o10j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--oFdZLDlM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/695a5x0gv15obb61o10j.png" alt="Heart cloud image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Shaping wordclouds like this works best with simple images, as otherwise the core shape is not clear. By default, it also only works with images that have a fully white background - transparent backgrounds, for example, are treated as non-masked, and the wordcloud occupies the full rectangle again.&lt;/p&gt;

&lt;p&gt;It is possible to convert any image so that it has the correct structure for wordclouds, but doing so is rather beyond the scope of this tutorial; we're focused on wordclouds today, not image processing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shaped &amp;amp; Coloured wordclouds
&lt;/h2&gt;

&lt;p&gt;We can both shape and colour a wordcloud based on an image, so that the words not only form the rough shape but are also each coloured to match the image at the same position. For this task, we'll use another heart, this one rainbow-coloured, and sourced from &lt;a href="https://pngfuel.com"&gt;PNGfuel&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It's important to note here that this is a relatively crude fitting - a word that spans across several colour changes on the image will still only be in one colour, and you're not going to get lines as crisp and clear as the image itself. Again, this works best with simple images, ideally with clearly contrasting colours.&lt;/p&gt;

&lt;p&gt;The first step is to load in an image, as before.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"./resources/rainbow_heart.jpeg"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Load the image from a file
&lt;/span&gt;
&lt;span class="n"&gt;mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Convert the image to a numeric representation
&lt;/span&gt;
&lt;span class="n"&gt;image&lt;/span&gt;  &lt;span class="c1"&gt;# Display the image
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--a3MdnpgD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/8ie9gyfbflz3358gcxpf.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--a3MdnpgD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/8ie9gyfbflz3358gcxpf.jpeg" alt="Rainbow heart image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you've created the mask, you can then use it as an input to the &lt;code&gt;ImageColorGenerator&lt;/code&gt; class, which is also part of the Wordcloud library. This generates colours based on an image, and can then be used to colour words to match.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;image_colours&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ImageColorGenerator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can then pass the &lt;code&gt;ImageColorGenerator&lt;/code&gt; to a &lt;code&gt;WordCloud&lt;/code&gt; object using the &lt;code&gt;color_func&lt;/code&gt; parameter.&lt;/p&gt;

&lt;p&gt;In order to ensure that our wordcloud roughly matches the colours of the image, I've set the &lt;code&gt;max_words&lt;/code&gt; argument to 2000. This means that the &lt;code&gt;WordCloud&lt;/code&gt; object will use more words than the default of 200, which will result in smaller words and - hopefully - clearer bands of colour.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# Create a wordcloud generator with a mask
&lt;/span&gt;
&lt;span class="n"&gt;fog_machine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;WordCloud&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;max_words&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;color_func&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_colours&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Generate the cloud using a frequency dictionary
&lt;/span&gt;
&lt;span class="n"&gt;fog_machine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate_from_frequencies&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word_counts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;fog_machine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_image&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Display the cloud
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LfMOXZQA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/fzp0w6ko02ki2mbpacc3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LfMOXZQA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/fzp0w6ko02ki2mbpacc3.png" alt="Rainbow heart cloud image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Custom colour functions
&lt;/h2&gt;

&lt;p&gt;It is sometimes useful to have full control over the colours of words, so that you can highlight particular words or groups of words; you might, for example, wish to show positive words in one colour and negative words in another.&lt;/p&gt;

&lt;p&gt;We can define a custom colour function to do this, passing it to the &lt;code&gt;color_func&lt;/code&gt; parameter just as for mask colours. In the code below, I've defined a very simple one - return gold for words with the letter "o" in, grey for every other word - but you can customise this function to do whatever you want.&lt;/p&gt;

&lt;p&gt;When the &lt;code&gt;WordCloud&lt;/code&gt; object calls the function, it passes it a lot of information; this means that - when defining your custom function - it's better to define the parameters as &lt;code&gt;*args, **kwargs&lt;/code&gt;, as then you can pick and choose which arguments you care about later. The word itself is always passed as the first argument, or &lt;code&gt;args[0]&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The custom function
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;custom_colours&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# First argument is the word itself
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="s"&gt;"o"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"#FFD700"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"#CCCCCC"&lt;/span&gt;

&lt;span class="c1"&gt;# Create a wordcloud generator with a custom color_func
&lt;/span&gt;
&lt;span class="n"&gt;fog_machine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;WordCloud&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;min_font_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;color_func&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;custom_colours&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Generate the cloud using a frequency dictionary
&lt;/span&gt;
&lt;span class="n"&gt;fog_machine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate_from_frequencies&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word_counts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;fog_machine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_image&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Display the cloud
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EVOuV1BP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/4ujjso9ai1zl6vercdz1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EVOuV1BP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/4ujjso9ai1zl6vercdz1.png" alt="Custom colour cloud image"&gt;&lt;/a&gt;&lt;/p&gt;
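
&lt;p&gt;The positive/negative idea mentioned at the start of this section could be sketched along the same lines. The word lists and the name &lt;code&gt;sentiment_colours&lt;/code&gt; are hypothetical, chosen only for illustration, and the function assumes the word arrives as its first argument with everything else passed by keyword.&lt;/p&gt;

```python
# Hypothetical word lists; in practice these might come from a sentiment lexicon
POSITIVE_WORDS = {"love", "happy", "dear", "lovely"}
NEGATIVE_WORDS = {"sad", "cruel", "death", "pain"}


def sentiment_colours(word, **kwargs):
    """Colour positive words green, negative words red, everything else grey."""
    if word in POSITIVE_WORDS:
        return "#2E8B57"
    if word in NEGATIVE_WORDS:
        return "#B22222"
    return "#CCCCCC"


# The function can be checked on its own before handing it to a WordCloud
for example in ["love", "pain", "soldier"]:
    print(example, sentiment_colours(example, font_size=20))
```

&lt;p&gt;To use it, pass &lt;code&gt;color_func=sentiment_colours&lt;/code&gt; when creating the &lt;code&gt;WordCloud&lt;/code&gt; object, exactly as with &lt;code&gt;custom_colours&lt;/code&gt;.&lt;/p&gt;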

&lt;h2&gt;
  
  
  Saving wordclouds
&lt;/h2&gt;

&lt;p&gt;You can save a wordcloud to a file with a single line of code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;fog_machine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"wordcloud.png"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The header image
&lt;/h2&gt;

&lt;p&gt;The header image for this tutorial was generated using Python and the techniques we've so far discussed. The code to create that wordcloud is below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# One more wordcloud for Addie - the book does have a happy ending.
# The image is sourced from freepik.com.

image = Image.open("./resources/romance.jpeg")  # Load the image from a file

mask = np.array(image)  # Convert the image to a numeric representation

image_colors = ImageColorGenerator(mask)

# Create a wordcloud generator with a mask &amp;amp; colour function

fog_machine = WordCloud(mask=mask,
                        background_color="white",
                        max_font_size=28,
                        max_words=6000,
                        color_func=image_colors)

# Generate the cloud using a frequency dictionary

fog_machine.generate_from_frequencies(word_counts)

fog_machine.to_image()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--h20QaLBu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/o8bf3yigjveabv1zbx1l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--h20QaLBu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/o8bf3yigjveabv1zbx1l.png" alt="header cloud image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;The final wordcloud we've created is a lot more complex than the previous ones; with a max of 6000 words (almost as many as there are unique words in &lt;em&gt;Addie's Husband&lt;/em&gt;), it takes a lot longer to generate, and the text is noticeably smaller. However, the end result is (in my opinion) worth the effort. Though the words themselves are now somewhat hard to read (always an issue with big wordclouds), it's an image that expresses one of the main ideas in &lt;em&gt;Addie's Husband&lt;/em&gt;, using words from &lt;em&gt;Addie's Husband&lt;/em&gt;, and I think that's neat.&lt;/p&gt;

&lt;p&gt;The Wordcloud library is stuffed with extra tweakable parameters, and there's a lot more you can do to refine things than is covered in this tutorial. However, hopefully this gives some idea of the possibilities and the code to move towards them. The &lt;a href="http://amueller.github.io/word_cloud/auto_examples/index.html"&gt;module documentation&lt;/a&gt; has many more examples.&lt;/p&gt;

&lt;p&gt;There are a couple of caveats with wordclouds that should always be borne in mind. As already mentioned, it's easy to make confusing or misleading wordclouds, because the focus on frequency ignores any information from the text that was contained in more than one word; "not happy" and "happy" are both going to make the word "happy" appear larger.&lt;/p&gt;

&lt;p&gt;Even if your wordclouds manage not to be misleading, that still doesn't make them meaningful; again, the focus on frequency ignores context and significance. Even with stopwords removed, it's clear from the wordclouds generated in this tutorial that not all words are equally informative about what's actually happening in &lt;em&gt;Addie's Husband&lt;/em&gt;. The word "say" is frequent, but would be equally so in almost any long narrative text. Arguably, the elaborate style of Victorian prose means that you'll get more frequent-but-not-significant words than a comparable modern text, but unless you do heavy cleaning, wordclouds will always contain some functional words that get in the way a bit. &lt;/p&gt;
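
&lt;p&gt;One light-touch form of that extra cleaning is to drop a hand-picked set of uninformative words from the frequency dictionary before generating the cloud. In the sketch below, the counts and the &lt;code&gt;UNINFORMATIVE&lt;/code&gt; set are made up for illustration; which words belong in such a set is a judgment call for each text.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical counts standing in for the novel's real frequencies
sample_counts = Counter({"say": 50, "love": 30, "look": 25, "heart": 12})

# Hypothetical set of frequent-but-uninformative words, chosen by inspection
UNINFORMATIVE = {"say", "look"}

# Keep only the words that are not in the uninformative set
filtered_counts = Counter({
    word: count
    for word, count in sample_counts.items()
    if word not in UNINFORMATIVE
})

print(filtered_counts.most_common())  # [('love', 30), ('heart', 12)]
```

&lt;p&gt;A filtered dictionary like this can be passed to &lt;code&gt;generate_from_frequencies()&lt;/code&gt; in place of &lt;code&gt;word_counts&lt;/code&gt;.&lt;/p&gt;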

&lt;p&gt;With the above said though, I still think wordclouds are a useful thing to be able to generate; as long as you bear the caveats in mind, and don't use them too heavily as actual analysis tools, they're an accessible and appealing visualisation that people respond to well.&lt;/p&gt;

&lt;p&gt;You can find a complete copy of the code for this tutorial on &lt;a href="https://github.com/Peritract/tutorials"&gt;Github&lt;/a&gt;, along with the text data and images used throughout.&lt;/p&gt;

&lt;p&gt;If you create any wordclouds using this tutorial, I'd love to see them. Send them to me via the links in my author bio, or tweet them to me at &lt;a href="https://twitter.com/peritract"&gt;@peritract&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>python</category>
      <category>wordcloud</category>
      <category>tutorial</category>
      <category>visualisation</category>
    </item>
  </channel>
</rss>
