Forem: Abhishek Biswal

Classification of Phrases/Quotes using Chat GPT API and Firebase Cloud Functions

Abhishek Biswal — Mon, 13 Mar 2023 15:55:20 +0000

Generating topics for a given quote can be challenging. With recent advances in AI and natural language processing, it is now practical to generate relevant topics with reasonable accuracy. In this article, we'll go through how to use the ChatGPT API to assign topics to a given quote and expose that logic as an HTTP-triggered Firebase Cloud Function.

What is the ChatGPT API?
The ChatGPT API (OpenAI's Chat Completions endpoint) provides access to OpenAI's GPT (Generative Pre-trained Transformer) language models, such as gpt-3.5-turbo. GPT is a machine learning model trained to generate human-like text from a given prompt. The API lets developers integrate these models into their applications and generate text based on user input.
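Later in this article, the SDK call is an HTTPS POST to OpenAI's Chat Completions endpoint. The sketch below builds that request (URL, headers and JSON body) but does not send it, so you can see what the API receives. `buildChatRequest` is a hypothetical helper name. The key is read from `OPENAI_API_KEY`, as in the configuration step, and the body carries only a subset of the parameters used later.

```javascript
// Build (but do not send) a request to OpenAI's Chat Completions endpoint.
// buildChatRequest is a hypothetical helper used only for illustration.
function buildChatRequest(userPrompt, apiKey = process.env.OPENAI_API_KEY) {
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is not set");
  }
  return {
    url: "https://api.openai.com/v1/chat/completions",
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      // A subset of the fields passed to createChatCompletion later on.
      body: JSON.stringify({
        model: "gpt-3.5-turbo",
        messages: [
          { role: "system", content: "You are a helpful assistant." },
          { role: "user", content: userPrompt },
        ],
      }),
    },
  };
}

// With Node.js 18+ you could send it with the global fetch:
// const { url, options } = buildChatRequest("Hello");
// const body = await (await fetch(url, options)).json();
```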

What are Firebase Cloud Functions?
Cloud Functions for Firebase is a serverless framework that lets developers run backend code in response to events, such as HTTPS requests, and scales automatically with traffic. Firebase functions are written in JavaScript or TypeScript for Node.js (Python support arrived later). The underlying Google Cloud Functions platform offers additional runtimes such as Go, Java, and .NET. In this article, we will use Firebase Cloud Functions with Node.js to host our code.

Now that we have an understanding of the technologies we will be using, let's take a look at the process.

The first step in creating a tag/category is to create a list of topics from which to choose. In our example, we defined a list of default topics as an array of objects with an id and name field. Below is an example:

const defaultTags = [
  { id: "1", name: "a" },
  { id: "2", name: "b" },
  { id: "3", name: "c" },
  { id: "4", name: "d" },
  { id: "5", name: "x" },
  { id: "6", name: "z" },
];

Next, we need to set up the configuration for the OpenAI API using our API key.

// openai Node.js SDK v3
const { Configuration, OpenAIApi } = require("openai");

const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
});

Now we can define the function that generates the topics. We will expose it as an HTTP-triggered Firebase Cloud Function so that it can be called through an API endpoint.

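async function generateTopics(req, res) {
  const data = req.body; // expects a JSON body like { "content": "<quote>" }
  const tagNames = defaultTags.map(tag => tag.name);

  // Initialize the OpenAI client
  const openai = new OpenAIApi(configuration);

  // ...the steps below (prompt, API call, parsing, response) go here
}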

Next, create a prompt:

const userPrompt = "Assign multiple topics as an array from the topic list given below to the following quote:\nQuote - " + data.content + "\n\ntopics = [" + tagNames.join(", ") + "]\n\n Output should be in the format `topics = ['']`";
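The prompt is assembled by string concatenation. Moving it into a small pure function lets you unit-test it without calling the API. This sketch reproduces the prompt text above. `buildPrompt` is a hypothetical name. It also trims the quote and rejects an empty one, which the original code does not do.

```javascript
// Hypothetical helper that reproduces the article's prompt text.
function buildPrompt(quote, tagNames) {
  const text = String(quote ?? "").trim();
  if (!text) {
    throw new Error("A non-empty quote is required");
  }
  return (
    "Assign multiple topics as an array from the topic list given below to the following quote:\n" +
    "Quote - " + text + "\n\n" +
    "topics = [" + tagNames.join(", ") + "]\n\n" +
    " Output should be in the format `topics = ['']`"
  );
}

const prompt = buildPrompt("Stay hungry, stay foolish.", ["a", "b", "c"]);
console.log(prompt);
```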

Then send the prompt to the Chat Completions endpoint and wait for the response:

const response = await openai.createChatCompletion({
  model: "gpt-3.5-turbo",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: userPrompt },
  ],
  temperature: 0.8,
  max_tokens: 150,
  top_p: 1,
  frequency_penalty: 0,
  presence_penalty: 0.6,
});

Here we use the gpt-3.5-turbo model. With the v3 Node.js SDK, the parsed response body is available as response.data.

Next, we parse the response to extract the message. The response body has the following structure:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "topics = ['a', 'b', 'c']"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}

We need the message content, which holds the topics:

const message = response.data.choices[0].message;
const content = message.content; // already a plain string, so JSON.stringify isn't needed
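If the API ever returns an empty `choices` array, `choices[0]` is undefined and the line above throws a `TypeError`. A slightly more defensive extraction might look like the sketch below. `extractContent` is a hypothetical helper. It expects the response body, which the v3 Node.js SDK exposes as `response.data`.

```javascript
// Hypothetical helper: pull the assistant's text out of a Chat Completions body.
function extractContent(body) {
  const choice = body && Array.isArray(body.choices) ? body.choices[0] : undefined;
  const content = choice && choice.message ? choice.message.content : undefined;
  if (typeof content !== "string") {
    throw new Error("Unexpected Chat Completions response shape");
  }
  return content;
}

// Shape taken from the sample response above.
const sample = {
  choices: [
    {
      index: 0,
      message: { role: "assistant", content: "topics = ['a', 'b', 'c']" },
      finish_reason: "stop",
    },
  ],
};
console.log(extractContent(sample)); // topics = ['a', 'b', 'c']
```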

Then we can use a regular expression to parse the topics from the result:

const topicList = content.match(/\[([^[\]]*)\]/)[1]; // text inside the first [...]
const generatedTopics = topicList.match(/'[^']*'/g).map(topic => topic.slice(1, -1));
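The regular expression above only recognizes single-quoted names, and it throws if the reply contains no brackets. Models do not always follow the requested format exactly. A more forgiving parser could accept single or double quotes and return an empty array instead of throwing. This is a sketch under that assumption, and `parseTopics` is a hypothetical name.

```javascript
// Hypothetical, more forgiving version of the regex step above.
// Accepts topics = ['a', 'b'] as well as topics = ["a", "b"].
function parseTopics(content) {
  const list = content.match(/\[([^[\]]*)\]/); // text inside the first [...]
  if (!list) {
    return [];
  }
  const quoted = list[1].match(/'[^']*'|"[^"]*"/g) || [];
  // Strip the surrounding quotes and any stray whitespace; drop empty names.
  return quoted.map((topic) => topic.slice(1, -1).trim()).filter(Boolean);
}

console.log(parseTopics("topics = ['a', 'b', 'c']")); // [ 'a', 'b', 'c' ]
console.log(parseTopics('topics = ["x", "z"]'));      // [ 'x', 'z' ]
console.log(parseTopics("No topics found"));          // []
```

Names containing an apostrophe (for example "women's rights") would still break the single-quote pattern. If your topic list has such names, asking the model for JSON output is likely the safer route.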

After this, we can use filter and map to look up the matching ids in the default topic list:

const selectedTagObjects = defaultTags
  .filter(tag => generatedTopics.includes(tag.name))
  .map(({ id, name }) => ({ id, name }));
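`filter` plus `includes` is fine for a handful of tags. The model may echo a topic with different capitalization or surrounding spaces, and the list may grow. In that case, a lookup `Map` keyed by a normalized name can be used instead. The normalization (trim plus lowercase) is an assumption about how tolerant the matching should be. `mapTopicsToTags` is a hypothetical helper.

```javascript
// Hypothetical helper: map model-generated topic names back to known tags.
function normalize(name) {
  return name.trim().toLowerCase();
}

function mapTopicsToTags(generatedTopics, tags) {
  const byName = new Map(tags.map((tag) => [normalize(tag.name), tag]));
  const selected = new Map(); // keyed by id so duplicates collapse
  for (const topic of generatedTopics) {
    const tag = byName.get(normalize(topic));
    if (tag) {
      selected.set(tag.id, { id: tag.id, name: tag.name });
    }
  }
  return [...selected.values()];
}

const defaultTags = [
  { id: "1", name: "a" },
  { id: "2", name: "b" },
  { id: "5", name: "x" },
];
// "Q" is not in the list, so it is ignored; " A " matches "a".
console.log(mapTopicsToTags([" A ", "x", "Q", "a"], defaultTags));
// [ { id: '1', name: 'a' }, { id: '5', name: 'x' } ]
```

Unlike the filter version, this returns tags in the order the model listed them, not in the order of the default list.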

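Finally, we return the result. The response is wrapped in the cors middleware so that browsers on other origins can call the endpoint:

// At the top of the file: const cors = require("cors")({ origin: true });
const result = {
  tags: selectedTagObjects,
};
cors(req, res, () => {
  res.status(200).json({ result });
});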
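Putting the pieces together, the sketch below assembles the steps into one request handler. To keep it runnable without network access or the Firebase and OpenAI packages, the model call is passed in as a `complete(prompt)` function, and `req`/`res` are plain stand-ins. `createGenerateTopicsHandler`, `complete`, `fakeResponse` and the 405/400/502 status choices are assumptions, not part of the original code. The prompt, parsing and tag-lookup steps mirror the article's code.

```javascript
// Sketch: the article's steps as one handler, with the OpenAI call injected.
const defaultTags = [
  { id: "1", name: "a" }, { id: "2", name: "b" }, { id: "3", name: "c" },
  { id: "4", name: "d" }, { id: "5", name: "x" }, { id: "6", name: "z" },
];

function createGenerateTopicsHandler(complete, tags = defaultTags) {
  const tagNames = tags.map((tag) => tag.name);
  return async function generateTopics(req, res) {
    if (req.method !== "POST") {
      return res.status(405).json({ error: "Use POST" });
    }
    const quote = req.body && req.body.content;
    if (typeof quote !== "string" || !quote.trim()) {
      return res.status(400).json({ error: "Body must include a 'content' string" });
    }
    const userPrompt =
      "Assign multiple topics as an array from the topic list given below to the following quote:\nQuote - " +
      quote + "\n\ntopics = [" + tagNames.join(", ") + "]\n\n Output should be in the format `topics = ['']`";
    let content;
    try {
      content = await complete(userPrompt); // e.g. wraps openai.createChatCompletion
    } catch (err) {
      console.error("OpenAI call failed:", err.message);
      return res.status(502).json({ error: "Topic generation failed" });
    }
    // Same parsing as the article, but an unparseable reply yields no topics.
    const list = content.match(/\[([^[\]]*)\]/);
    const generatedTopics = list
      ? (list[1].match(/'[^']*'/g) || []).map((t) => t.slice(1, -1))
      : [];
    const selectedTagObjects = tags
      .filter((tag) => generatedTopics.includes(tag.name))
      .map(({ id, name }) => ({ id, name }));
    return res.status(200).json({ result: { tags: selectedTagObjects } });
  };
}

// Minimal stand-in for an Express-style response object.
function fakeResponse() {
  return {
    statusCode: 0,
    payload: undefined,
    status(code) { this.statusCode = code; return this; },
    json(payload) { this.payload = payload; return this; },
  };
}
```

In a deployed function, `complete` would wrap `openai.createChatCompletion` and return `response.data.choices[0].message.content`. The handler would then be exported with `functions.https.onRequest(...)`. If browsers call it directly, wrap the whole handler in `cors(req, res, () => handler(req, res))` so preflight OPTIONS requests are answered before the model is ever called.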

In conclusion, the ChatGPT API can be a useful tool for generating topics from a quote in applications such as content tagging and topic identification. The snippets above show how to use it inside a Firebase Cloud Function that exposes an endpoint. Send a POST request whose JSON body holds the quote in a content field, and you get back the matching topics from your predefined list, along with their ids. This can be further improved and integrated with other systems to build a more comprehensive solution for content analysis and classification.

How to prevent your files from being indexed while your application uses S3 for storage and CloudFront for distribution?

Abhishek Biswal — Fri, 26 Mar 2021 05:53:39 +0000

When would you face such a scenario?

When hosting your web application on AWS with S3 and a CloudFront distribution, you sometimes need to keep certain files (direct download links) out of search engine results. Files such as PDFs, Docs, MP3s, videos, spreadsheets, and PPTs can get indexed with their direct download links in the SERP (search engine results page). This isn't desirable: users download the file straight from the SERP without visiting your application or site, which reduces site visits. To stop these files from being indexed, or to get already indexed files removed, they need to be served with the HTTP response header X-Robots-Tag: noindex.

Common mistakes-

There are plenty of articles and guides claiming that if you block search engine bots from accessing those files or folders (for example, with robots.txt), the files simply won't be indexed.

Well, that's not true. Bots that respect the block will not crawl those folders. However, if they discover the file links from other sources (e.g., pages that link to those files), the URLs can still be indexed and shown in the SERP. Worse, a bot that is blocked from fetching a file never sees a noindex header on it. We have run into this issue with several clients, which is why I decided to write a comprehensive guide on this topic.

Complications-

There is no direct way to attach an arbitrary HTTP response header to files stored in an S3 bucket. Apart from a few system headers (such as Content-Type and Cache-Control), S3 only lets you add user-defined metadata, whose names must start with the prefix x-amz-meta-. S3 returns this metadata as response headers, and CloudFront serves the files with them. If we check those files with any HTTP header testing tool (e.g., https://securityheaders.com/), we can see these user-defined headers on the files. Suppose we want to add the header X-Robots-Tag: noindex to the files in the S3 bucket. We would have to add it as x-amz-meta-X-Robots-Tag: noindex, and crawlers won't recognize that header.

Proposed solution-

We can use a Lambda@Edge function to edit those origin response headers when the files are requested through the CloudFront URL (or the custom domain you've connected to your CloudFront distribution). The function removes the x-amz-meta- prefix from the name of the user-defined header. Crawlers then receive X-Robots-Tag: noindex as an HTTP header when they access those files, and they follow its directive.

Execution mechanism-

CloudFront dispatches four events [Viewer Request, Viewer Response, Origin Request, Origin Response].

Lambda@Edge, a feature of AWS Lambda, lets you run functions on these four events to customize the content CloudFront delivers. Lambda@Edge scales automatically and runs in AWS locations close to the viewer.

In our case, we use the Origin Response event with Lambda@Edge to modify the header.

(Origin Response: triggered after CloudFront receives a response from the origin and before it caches the object, so the modified headers are cached along with the file.)

Steps at a glance-

Create a Lambda function in Node.js or Python, the runtimes Lambda@Edge supports (in our example, we used Python 3.7). Lambda@Edge functions must be created in the US East (N. Virginia) region, us-east-1.
Select the default service role, then add a CloudFront trigger for your distribution with the Origin Response event.
Add the user-defined metadata x-amz-meta-X-Robots-Tag: noindex to those files in the S3 bucket.
Write the code that strips x-amz-meta- from x-amz-meta-X-Robots-Tag, so the header is served as X-Robots-Tag.
Create an invalidation in your CloudFront distribution, so cached copies are fetched from S3 again and pass through the function.
Check the files with any HTTP header testing tool; you should find the X-Robots-Tag: noindex header on them.
When crawlers get this response header, they know these files shouldn't be indexed. Files that were already indexed are eventually removed from the SERP once they are recrawled.
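For the header check, any inspection tool works. If you prefer a script, the sketch below sends a HEAD request with the global `fetch` available in Node.js 18+ and reports whether `X-Robots-Tag` contains `noindex`. `checkNoindex`, `hasNoindex` and the example URL are hypothetical.

```javascript
// Returns true if an X-Robots-Tag header value contains the noindex rule.
// Values can hold several comma-separated rules, e.g. "noindex, nofollow",
// and a rule may carry a user-agent prefix such as "googlebot: noindex".
function hasNoindex(headerValue) {
  if (!headerValue) return false;
  return headerValue
    .split(",")
    .map((rule) => rule.split(":").pop().trim().toLowerCase())
    .includes("noindex");
}

// Hypothetical checker; requires Node.js 18+ for the global fetch.
async function checkNoindex(url) {
  const res = await fetch(url, { method: "HEAD" });
  const value = res.headers.get("x-robots-tag"); // lookup is case-insensitive
  return { url, status: res.status, xRobotsTag: value, noindex: hasNoindex(value) };
}

// Example (not run here):
// checkNoindex("https://files.example.com/docs/report.pdf").then(console.log);
```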

This is one of the few definitive ways to prevent your files from being indexed.