Yuki Ogawa for AWS Community Builders

Posted on May 11

Image Generation Chat App using Amazon Nova Canvas

#aws #novacanvas

Introduction

The Amazon Nova series announced at AWS re:Invent 2024 includes Nova Canvas, which can generate images. Previously, to generate images from Amazon Bedrock, the options were limited to the Amazon Titan series or the Stable Diffusion series, but now Nova Canvas has been added as a new choice.

In this article, we will explore what Nova Canvas can do and how to create an image generation chatbot.

Features of Nova Canvas

Nova Canvas is a generative AI model that creates new images from text or image prompts. The features of Nova Canvas are as follows:

| Feature | Description |
| --- | --- |
| Reference images | Supply reference images to guide image generation |
| Color palette | Specify the color scheme ("color palette") of an image using text input |
| Image editing | Replace objects or backgrounds in input images using text prompts |
| Background removal | Remove the background while leaving the subject of the image unchanged |
| Safety, responsible AI, and indemnification | Includes traceability, content moderation, and watermarking, backed by indemnification |

https://aws.amazon.com/ai/generative-ai/nova/creative/

Nova Canvas vs Titan

Below is a comparison between the Nova series and the Titan series.
The Nova series has been enhanced to handle longer content and more complex documents overall.

| Feature | Titan models | Nova models |
| --- | --- | --- |
| Optimal use case | General text generation, image tasks, embeddings | Long content and complex document processing |
| Use case scenarios | When image generation or embeddings are needed | When processing very long documents |
| Strengths | Versatility across text, images, and embeddings | Larger context window (up to 300K tokens) |
| Cost | Relatively low | High |
| Application integration | Broad integration possibilities | Often optimized for specific use cases |
| Low latency for standard tasks | Yes | No |
| Optimized for production workloads | Yes | No |
| Standard context length | Yes | No |
| Suited when fast response time is required | Yes | No |

On-Demand Pricing for Nova Canvas

Below are the on-demand prices for Nova Canvas and Titan Image Generator v2 in the us-east-1 (N. Virginia) region.
Per image, Nova Canvas costs several times as much as Titan (for example, USD 0.04 for Nova Canvas versus USD 0.008 to 0.01 for Titan at standard quality), so choose the model that fits your use case.

Titan Image Generator v1 is also available; at the time of writing, its price in this region was the same as v2.

| Model | Image resolution | Price per image, standard quality | Price per image, premium quality |
| --- | --- | --- | --- |
| Amazon Nova Canvas | Up to 1024 x 1024 | USD 0.04 | USD 0.06 |
| Amazon Nova Canvas | Up to 2048 x 2048 | USD 0.06 | USD 0.08 |
| Amazon Titan Image Generator v2 | Smaller than 512 x 512 | USD 0.008 | USD 0.01 |
| Amazon Titan Image Generator v2 | Larger than 1024 x 1024 | USD 0.01 | USD 0.012 |

https://aws.amazon.com/bedrock/pricing/
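To estimate what a batch of images would cost, the per-image prices can simply be multiplied out. The sketch below hard-codes the prices from the table above (us-east-1, at the time of writing, so they may have changed); `PRICES` and `batch_cost` are hypothetical names, and `Decimal` is used to avoid floating-point rounding in currency amounts.

```python
from decimal import Decimal

# Per-image prices in USD from the pricing table (us-east-1, at the time of writing).
# Key: (model, resolution tier) -> {quality: price per image}
PRICES = {
    ("nova-canvas", "up to 1024x1024"): {"standard": Decimal("0.04"), "premium": Decimal("0.06")},
    ("nova-canvas", "up to 2048x2048"): {"standard": Decimal("0.06"), "premium": Decimal("0.08")},
    ("titan-v2", "smaller than 512x512"): {"standard": Decimal("0.008"), "premium": Decimal("0.01")},
    ("titan-v2", "larger than 1024x1024"): {"standard": Decimal("0.01"), "premium": Decimal("0.012")},
}


def batch_cost(model, tier, quality, num_images):
    """Return the total cost in USD of generating num_images images."""
    return PRICES[(model, tier)][quality] * num_images


# Example: 100 standard-quality images with each model
nova = batch_cost("nova-canvas", "up to 1024x1024", "standard", 100)
titan = batch_cost("titan-v2", "larger than 1024x1024", "standard", 100)
print(f"Nova Canvas: ${nova}, Titan v2: ${titan}, ratio: {nova / titan}")
```

For 100 standard-quality images, this works out to USD 4.00 for Nova Canvas versus USD 1.00 for Titan v2 with the rows chosen here.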

Nova Canvas Chat App

Now, let's create a chatbot app using Nova Canvas!
Here is the technology stack we will use.

| Item | Content | Remarks |
| --- | --- | --- |
| Client side | Streamlit | Python web framework |
| Server side | Cloud9 | Cloud IDE on AWS; it is no longer available to new customers, so use VS Code Server or the Amazon SageMaker Studio Code Editor instead |
| Language | Python | Version 3.9 or higher |

In this article, we will build a simple chatbot that creates icons from instructions written in any language.
The style of the generated images is controlled only by the system prompt and the negative prompt, so adjust those prompts to match the images you want to create.
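Since the image style comes only from the system prompt and the negative prompt, it may help to see where both end up in the Nova Canvas request. The following is a sketch that builds the same `TEXT_IMAGE` request body the app sends, plus a few input checks. The limits it checks (prompts of up to 1,024 characters, sides that are multiples of 16 between 320 and 4,096 pixels, 1 to 5 images, a maximum seed of 858,993,459) are my reading of the Nova Canvas documentation and should be verified against the current docs; `build_text_image_request` is a hypothetical helper name.

```python
import json
import random

# Assumed Nova Canvas limits (from the docs as I read them; verify before relying on them)
MAX_PROMPT_CHARS = 1024
MAX_IMAGES = 5
MAX_SEED = 858993459


def build_text_image_request(system_prompt, keywords, negative_text, size, num_images, seed=None):
    """Return a JSON request body for a square TEXT_IMAGE generation with Nova Canvas."""
    text = system_prompt + keywords  # style terms first, then the extracted keywords
    if not 1 <= len(text) <= MAX_PROMPT_CHARS:
        raise ValueError(f"prompt must be 1-{MAX_PROMPT_CHARS} characters, got {len(text)}")
    if size % 16 != 0 or not 320 <= size <= 4096:
        raise ValueError("size must be a multiple of 16 between 320 and 4096")
    if not 1 <= num_images <= MAX_IMAGES:
        raise ValueError(f"num_images must be between 1 and {MAX_IMAGES}")

    params = {"text": text}
    negative_text = negative_text.strip()
    if len(negative_text) > MAX_PROMPT_CHARS:
        raise ValueError(f"negative prompt must be at most {MAX_PROMPT_CHARS} characters")
    if negative_text:  # omit the field entirely rather than sending an empty string
        params["negativeText"] = negative_text

    body = {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": params,
        "imageGenerationConfig": {
            "seed": random.randint(0, MAX_SEED) if seed is None else seed,
            "quality": "standard",
            "height": size,
            "width": size,
            "cfgScale": 10,
            "numberOfImages": num_images,
        },
    }
    return json.dumps(body)


req = build_text_image_request(
    "minimalist icon, simple, flat design, 2D, ", "todo list, checkmark", "3D, color, photo", 320, 3, seed=42
)
print(req)
```

The resulting string can be passed as `body` to the Bedrock Runtime client's `invoke_model`, as the app does with `modelId=IMG_MODEL_ID`.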

1. Set Up the Development Environment

First, let's set up the development environment.
Install Python 3.9 or higher and create a development directory.

Create a requirements.txt file with the following contents, then install the packages with `pip install -r requirements.txt`.

```text
# requirements.txt
boto3==1.38.6
streamlit==1.45.0
```

Newer versions will likely work as well, but check them before proceeding.

2. Create the Chat App

System Architecture

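The app works as follows. Initially, I thought it would be enough to call Nova Canvas from Streamlit via boto3, but to apply the prompting tips for image generation described in the next section, I added a step that extracts keywords from the chat instruction. The user's message is first sent to a text model (Claude 3 Haiku), which translates it into English and extracts keywords; the keywords, prefixed with the system prompt, are then sent to Nova Canvas together with the negative prompt, and the generated images are saved to the output folder and shown in the chat.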
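The two-step flow (keyword extraction with a text model, then image generation with Nova Canvas) can be sketched without AWS by passing the model calls in as functions. This is a simplified illustration, not the app's code: `run_pipeline`, `fake_extract`, and `fake_generate` are hypothetical, and in the app the two steps are `textract_from_input` (Claude 3 Haiku via the Converse API) and `generate_image` (Nova Canvas via `invoke_model`).

```python
from typing import Callable, Dict, List


def run_pipeline(
    user_text: str,
    extract_keywords: Callable[[str], str],
    generate_images: Callable[[str], List[str]],
) -> Dict:
    """Chat input -> English keywords (text model) -> image files (Nova Canvas) -> assistant message."""
    keywords = extract_keywords(user_text)   # step 1: translate and extract keywords
    image_paths = generate_images(keywords)  # step 2: generate and save images
    content = []
    for path in image_paths:
        content.append({"text": f"Image created! filepath: {path}"})
        content.append({"image": path})
    return {"role": "assistant", "content": content}


# Stubs standing in for the Bedrock calls, so the flow can run without AWS
def fake_extract(text: str) -> str:
    return "cute cat, girl, play, joyful"


def fake_generate(keywords: str) -> List[str]:
    return ["output/nova_canvas_1.png", "output/nova_canvas_2.png"]


print(run_pipeline("かわいい猫と女の子が楽しく遊んでいる", fake_extract, fake_generate))
```

Injecting the two steps keeps the chat logic testable; in the real app, `generate_images` would wrap `generate_image` with the sidebar settings.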

Tips for Image Generation with Generative AI

Give Instructions with Words Instead of Sentences

Image generation models interpret a prompt within a limited context length, so it tends to work better to list the important words separated by commas than to write full sentences.
Nova Canvas accepts longer prompts than Titan, but writing prompts in a form the model can parse easily still makes it more likely that you get the image you envision.

Example: Surprised child, colorful playground, anime style
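A keyword-style prompt like this example can be assembled from a list of terms. This is a minimal sketch; `to_keyword_prompt` is a hypothetical helper that trims whitespace, drops empty and duplicate terms (compared case-insensitively), and joins the rest with commas.

```python
def to_keyword_prompt(terms):
    """Join terms into a comma-separated prompt, dropping blanks and duplicates."""
    seen = set()
    cleaned = []
    for term in terms:
        term = term.strip()
        key = term.lower()
        if term and key not in seen:
            seen.add(key)
            cleaned.append(term)
    return ", ".join(cleaned)


print(to_keyword_prompt(["Surprised child", " colorful playground", "", "anime style", "surprised child"]))
# Surprised child, colorful playground, anime style
```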

Bring Important Words to the Front

Place the most important elements of what you want to draw at the front of the prompt. Since the main theme here is simple icon generation, the style defined in the system prompt is placed at the beginning of the prompt, ahead of the keywords extracted from the chat.
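The same ordering rule can also be applied within the keywords themselves. This sketch uses a hypothetical `prioritize` helper that moves the terms you mark as important to the front; it is not part of the app, which only places the system prompt ahead of the extracted keywords.

```python
def prioritize(terms, important):
    """Move the important terms to the front, keeping the original order within each group."""
    important_set = {t.lower() for t in important}
    # sorted() is stable; False (important) sorts before True (everything else)
    return sorted(terms, key=lambda t: t.lower() not in important_set)


terms = ["colorful playground", "anime style", "surprised child"]
print(", ".join(prioritize(terms, ["surprised child"])))
# surprised child, colorful playground, anime style
```

Relying on the stability of `sorted` keeps the remaining terms in the order they were written.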

Utilize Negative Prompts

Negative prompts play an important role in image generation. Rather than writing negations such as "not" or "no" in the main prompt, describe anything you do not want in the negative prompt.
If the main prompt contains a negation such as "no text", the model may latch onto the word "text" and include exactly what you wanted to exclude.

Since this script generates simple icons, "3D", "color", and "photo" are included in the default negative prompt.
Also, asking for an icon sometimes produces a set of several icons in one image, so "multiple images" is included in the negative prompt as well.
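If users type negations into the chat anyway, one option is to move them into the negative prompt before calling the model. This is a sketch under assumptions: `split_negations` is a hypothetical helper that is not part of the app, and it only recognizes comma-separated terms starting with "no ", "not ", or "without ".

```python
NEGATION_PREFIXES = ("no ", "not ", "without ")


def split_negations(prompt, default_negative="3D, color, photo, multiple images"):
    """Move terms like 'no text' from the prompt into the negative prompt.

    Returns (positive_prompt, negative_prompt), both comma-separated.
    """
    positive = []
    negative = [t.strip() for t in default_negative.split(",") if t.strip()]
    for term in prompt.split(","):
        term = term.strip()
        if not term:
            continue
        lowered = term.lower()
        for prefix in NEGATION_PREFIXES:
            if lowered.startswith(prefix):
                negative.append(term[len(prefix):].strip())  # keep only the unwanted thing
                break
        else:  # no negation prefix matched
            positive.append(term)
    return ", ".join(positive), ", ".join(negative)


pos, neg = split_negations("todo list icon, checkmark, no text, without shadow")
print(pos)  # todo list icon, checkmark
print(neg)  # 3D, color, photo, multiple images, text, shadow
```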

Give Instructions in English

Image generation models usually produce results closer to what you intend when instructed in English rather than in other languages.
In this chat app, the chat input is translated into English before the keywords are extracted, so instructions in Japanese and other languages work as well.
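The text model replies with free-form text, so it can be worth normalizing that reply before it is sent to Nova Canvas. This is a hedged sketch: `parse_keywords` is a hypothetical helper that removes any XML-like tags the model might echo from the examples, splits on commas and newlines, lowercases, drops duplicates, and keeps at most nine terms to match the "fewer than 10 words" instruction in the system prompt.

```python
import re


def parse_keywords(reply, max_words=9):
    """Normalize a comma-separated keyword reply from the text model."""
    text = re.sub(r"<[^>]+>", " ", reply)  # drop echoed tags such as <exampleOutput>
    keywords = []
    for term in re.split(r"[,\n]", text):
        term = term.strip().lower()
        if term and term not in keywords:
            keywords.append(term)
    return ", ".join(keywords[:max_words])


print(parse_keywords("<exampleOutput>Cute cat, girl,\nplay, joyful, cute cat</exampleOutput>"))
# cute cat, girl, play, joyful
```

Lowercasing is a simplification; drop it if your keywords include proper nouns whose case matters.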

Nova Canvas Chat App Code

Here is the code for the Nova Canvas chat application. Save it as nova_canvas_chat.py in your development directory.

```python
import base64
import json
import os
import random

import boto3
import streamlit as st

REGION = "us-east-1"
IMG_MODEL_ID = "amazon.nova-canvas-v1:0"
TXT_MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"


@st.cache_resource
def get_bedrock_client():
    """Create the Bedrock Runtime client once and reuse it across Streamlit reruns."""
    return boto3.client(service_name="bedrock-runtime", region_name=REGION)


def generate_image(native_message, image_size, image_num, system_prompt, ng_text):
    """Generate images with Nova Canvas and save them to the local output folder."""
    # Put the style from the system prompt first, then the extracted keywords
    message = system_prompt + native_message
    print(f"textToImageParams: {message}")

    # random.randint includes both bounds; 858,993,459 is the documented maximum seed
    seed = random.randint(0, 858993459)
    native_request = {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {
            "text": message,
            "negativeText": ng_text,
        },
        "imageGenerationConfig": {
            "seed": seed,
            "quality": "standard",
            "height": image_size,
            "width": image_size,
            "cfgScale": 10,
            "numberOfImages": image_num,
        },
    }
    request = json.dumps(native_request)

    bedrock_client = get_bedrock_client()
    response = bedrock_client.invoke_model(modelId=IMG_MODEL_ID, body=request)
    model_response = json.loads(response["body"].read())

    output_dir = "output"
    os.makedirs(output_dir, exist_ok=True)

    image_path_list = []
    # Nova Canvas returns each image as a Base64-encoded string
    for base64_image_data in model_response["images"]:
        # Pick the next unused file name: nova_canvas_1.png, nova_canvas_2.png, ...
        i = 1
        while os.path.exists(os.path.join(output_dir, f"nova_canvas_{i}.png")):
            i += 1
        image_path = os.path.join(output_dir, f"nova_canvas_{i}.png")

        with open(image_path, "wb") as file:
            file.write(base64.b64decode(base64_image_data))

        print(f"The generated image has been saved to {image_path}")
        image_path_list.append(image_path)

    return image_path_list


def textract_from_input(message):
    """Translate the chat instruction into English and extract key words.

    This step uses a text model (Claude 3 Haiku here), not Nova Canvas.
    """
    bedrock_client = get_bedrock_client()
    system_prompt = (
        "Translate the input into English if it is in another language, "
        "then extract key English words from the translated sentence. "
        "Extract fewer than 10 meaningful English words, separated by commas. "
        "Do not include prepositions or articles. "
        # Example with Japanese input ("a cute cat and a girl are playing happily")
        "<exampleInput>かわいい猫と女の子が楽しく遊んでいる</exampleInput>"
        "<exampleOutput>cute cat, girl, play, joyful</exampleOutput>"
    )

    messages = [{"role": "user", "content": [{"text": message}]}]
    system_prompts = [{"text": system_prompt}]

    inference_config = {"temperature": 0.5}
    additional_model_fields = {"top_k": 200}

    response = bedrock_client.converse(
        modelId=TXT_MODEL_ID,
        messages=messages,
        system=system_prompts,
        inferenceConfig=inference_config,
        additionalModelRequestFields=additional_model_fields,
    )
    words = response["output"]["message"]["content"][0]["text"]
    return words


def display_history(messages):
    """Display the chat history stored in the session state."""
    for message in messages:
        display_img_content(message)


def display_img_content(message):
    """Display a chat message, which may contain text and multiple images from Nova Canvas."""
    contents = message["content"]
    print(f"message contents: {contents}")
    with st.chat_message(message["role"]):
        for content in contents:
            if "text" in content:
                st.write(content["text"])
            else:
                st.image(content["image"])


def sidebar():
    """Display the sidebar and return the image settings."""
    with st.sidebar:
        st.title("Image Settings")

        # Image size in pixels (square images)
        image_size = st.slider(
            "Image Size",
            min_value=320,
            max_value=960,
            step=64,
            value=320,
        )

        # Number of images per request
        image_number = st.slider(
            "Number of Images",
            min_value=1,
            max_value=5,
            step=1,
            value=1,
        )

        # System prompt: style terms placed at the front of every prompt
        system_prompt = st.text_area(
            "System Prompt",
            value="minimalist icon, simple, flat design, dual-line design, 1.5px stroke weight line, "
            "solid background, monochromatic, 2D, ",
            height=200,
        )

        # Negative prompt: things that should not appear in the image
        negative_text = st.text_area(
            "Negative Prompt",
            value="3D, color, photo, multiple images",
            height=100,
        )

    return image_size, image_number, system_prompt, negative_text


def main():
    """Main process"""
    st.title("Simple Icon Generator by Nova Canvas")
    img_size, img_num, system_prompt, negative_text = sidebar()

    # Keep the chat history across Streamlit reruns
    if "messages" not in st.session_state:
        st.session_state.messages = []

    display_history(st.session_state.messages)

    if prompt := st.chat_input("What's up?"):
        input_msg = {"role": "user", "content": [{"text": prompt}]}
        display_img_content(input_msg)
        st.session_state.messages.append(input_msg)
        print(f"st.session_state.messages: {st.session_state.messages}")

        # Translate the instruction into English and extract keywords
        textract_resp = textract_from_input(prompt)
        print(f"textract: {textract_resp}")

        # Generate images from the keywords
        resp_img_path_list = generate_image(
            textract_resp, img_size, img_num, system_prompt, negative_text
        )

        # Build the assistant message: a caption and the image for each file
        resp_img_contents = []
        for resp_img_path in resp_img_path_list:
            resp_img_contents.append({"text": f"Image created! filepath: {resp_img_path}"})
            resp_img_contents.append({"image": resp_img_path})
        resp_msg = {"role": "assistant", "content": resp_img_contents}
        display_img_content(resp_msg)
        st.session_state.messages.append(resp_msg)


if __name__ == "__main__":
    main()
```

3. Running the Application

Let's run the application! Start the application with the following streamlit command.

```shell
streamlit run nova_canvas_chat.py --server.port 8080
```

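In Cloud9, open Preview -> Preview Running Application from the toolbar to display the app (the Cloud9 preview serves port 8080, which is why the command sets --server.port 8080). In other environments, open port 8080 in your browser: http://localhost:8080 for a local VS Code setup, or through the environment's port forwarding or proxy for VS Code Server and the SageMaker Studio Code Editor.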

If the app's start screen titled "Simple Icon Generator by Nova Canvas" appears, the setup succeeded.

Now, let's instruct the chat to create the icon image you want. This time, I instructed it to "Create an icon for a ToDo list that can manage what I want to do." Since I want multiple suggestions, I set the number of images to 3 from the sidebar.

It created a nice ToDo list icon! The screenshot shows only two of them, but three images were generated, so you can choose the icon that best matches what you had in mind.

The created images are stored in the output folder within the project folder you created.

Extend the Chat App

If you want to offer this application to many users, you can deploy it as a container on Amazon ECS. In that case, saving images to the server's local disk is inconvenient, so use an S3 bucket as the storage destination; when the app runs on EC2 or in containers, file storage like S3 is much easier to manage than local storage on the server.

Additionally, naming image files with a creation timestamp or a UUID, instead of the sequential nova_canvas_{i}.png names, keeps file names from colliding when several people use the app at the same time.
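For the S3 and unique-name ideas, a sketch like the following could replace the local save in `generate_image`. Assumptions: `make_image_key` and `save_image_to_s3` are hypothetical helpers, the `nova-canvas/` key prefix and the bucket name are placeholders, and the S3 client is passed in (for example `boto3.client("s3")`) so the code relies only on the standard `put_object` call.

```python
import base64
import uuid
from datetime import datetime, timezone


def make_image_key(prefix="nova-canvas/"):
    """Build a collision-resistant object key from a UTC timestamp and a random UUID."""
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{prefix}{timestamp}_{uuid.uuid4().hex}.png"


def save_image_to_s3(s3_client, bucket, base64_image_data, prefix="nova-canvas/"):
    """Decode one Base64 image returned by Nova Canvas, upload it to S3, and return its key."""
    key = make_image_key(prefix)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=base64.b64decode(base64_image_data),
        ContentType="image/png",
    )
    return key


# Usage (placeholder bucket name):
# import boto3
# s3 = boto3.client("s3", region_name="us-east-1")
# key = save_image_to_s3(s3, "my-nova-canvas-images", model_response["images"][0])
```

To show an uploaded image in the chat, you could pass the decoded bytes to `st.image` directly, or create a temporary link with the S3 client's `generate_presigned_url`.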

You could also manage users with Amazon Cognito.

Summary

We created an image generation chat application using Amazon Nova Canvas.

Since this was nearly my first time generating images, it took a while, starting with the basics of prompt engineering, but the joy of getting an image exactly as intended is great!

DEV Community