Introduction
The Amazon Nova series announced at AWS re:Invent 2024 includes Nova Canvas, a model that generates images. Until then, image generation on Amazon Bedrock was limited to the Amazon Titan series or the Stable Diffusion series; Nova Canvas is now available as a new choice.
In this article, we will explore what Nova Canvas can do and how to create an image generation chatbot.
Features of Nova Canvas
Nova Canvas is a generative AI model that creates new images from text or image prompts. The features of Nova Canvas are as follows:
Feature | Description |
---|---|
Provision of Reference Images | Can provide reference images useful for generating images or videos |
Determination of Color Palette | Determines the color scheme or "color palette" of an image using text input |
Image Editing | Allows replacing objects or backgrounds in input images using text prompts |
Background Removal | Easily removes backgrounds, leaving the subject of the image unchanged |
Safety, Responsible AI, Indemnity | Includes traceability, content moderation, and watermarking, along with indemnity coverage |
https://aws.amazon.com/ai/generative-ai/nova/creative/
Nova Canvas vs Titan
Below is a comparison between the Nova series and the Titan series as model families, not only their image models.
Overall, the Nova series has been enhanced to handle longer content and more complex documents.
Feature | Titan Model | Nova Model |
---|---|---|
Optimal Use Case | General text generation, image tasks, embeddings | Long content and complex document processing |
Use Case Scenarios | When image generation or embeddings are needed | When processing very long documents |
Strengths | Versatility in text, image, and embeddings | Larger context window (up to 300K tokens) |
Cost | Relatively low cost | High cost |
Application Integration | Broad integration possibilities | Often optimized for specific use cases |
Low Latency for Standard Tasks | Yes | No |
Optimized for Production Workloads | Yes | No |
Standard Context Length | Yes | No |
Fast Response Time Required | Yes | No |
On-Demand Pricing for Nova Canvas
Below are the usage fees for Nova Canvas and Titan Image Generator in the us-east-1 (Northern Virginia) region.
Depending on the image size, Nova Canvas costs several times more than Titan, so choose between them according to the use case.
Titan Image Generator also has a v1, but its price in the above region was the same as v2 at the time of writing.
Model | Image Resolution | Price per Image Generated at Standard Quality | Price per Image Generated at Premium Quality |
---|---|---|---|
Amazon Nova Canvas | Up to 1024 x 1024 | USD 0.04 | USD 0.06 |
Amazon Nova Canvas | Up to 2048 x 2048 | USD 0.06 | USD 0.08 |
Amazon Titan Image Generator v2 | Smaller than 512 x 512 | USD 0.008 | USD 0.01 |
Amazon Titan Image Generator v2 | Larger than 1024 x 1024 | USD 0.01 | USD 0.012 |
https://aws.amazon.com/bedrock/pricing/
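To make the price difference concrete, here is a minimal sketch that estimates the cost of a batch of images from the table above. The prices are copied from that table; the short model and tier names used as dictionary keys, and the `estimate_cost` helper, are made up for this illustration and are not part of any AWS API.

```python
# Per-image on-demand prices in USD, copied from the table above (us-east-1).
# The key names (model, tier, quality) are hypothetical labels for this sketch.
PRICE_PER_IMAGE_USD = {
    ("nova-canvas", "up-to-1024", "standard"): 0.04,
    ("nova-canvas", "up-to-1024", "premium"): 0.06,
    ("nova-canvas", "up-to-2048", "standard"): 0.06,
    ("nova-canvas", "up-to-2048", "premium"): 0.08,
    ("titan-v2", "smaller-than-512", "standard"): 0.008,
    ("titan-v2", "smaller-than-512", "premium"): 0.01,
    ("titan-v2", "larger-than-1024", "standard"): 0.01,
    ("titan-v2", "larger-than-1024", "premium"): 0.012,
}


def estimate_cost(model, tier, quality, image_count):
    """Return the estimated cost in USD for generating image_count images."""
    unit_price = PRICE_PER_IMAGE_USD[(model, tier, quality)]
    # Round to avoid floating-point noise in the displayed amount
    return round(unit_price * image_count, 4)


nova_cost = estimate_cost("nova-canvas", "up-to-1024", "standard", 100)
titan_cost = estimate_cost("titan-v2", "larger-than-1024", "standard", 100)
print(f"Nova Canvas: USD {nova_cost:.2f} / Titan v2: USD {titan_cost:.2f}")
```

Prices change over time, so check the pricing page before relying on these numbers.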
Nova Canvas Chat App
Now, let's create a chatbot app using Nova Canvas!
Here is the technology stack we will use.
Item | Content | Remarks |
---|---|---|
Client Side | Streamlit | Web framework implemented in Python |
Server Side | Cloud9 | Cloud IDE on AWS; it is no longer open to new customers, so alternatively use VSCode Server or the Amazon SageMaker Studio code editor. |
Language | Python | Use version 3.9 or higher |
This time we will create a simple chatbot that generates icons from instructions written in any language.
The look of the generated images is controlled only by the system prompt and the negative prompt, so change these prompts according to the image you want to create.
1. Set Up the Development Environment
First, let's set up the development environment.
Install Python version 3.9 or higher and create a development directory.
Install the following packages with `pip install -r requirements.txt`.
```requirements.txt
boto3==1.38.6
streamlit==1.45.0
```
The latest versions also seem to work, but please check before proceeding.
2. Create the Chat App
System Architecture
Here is the architecture of the chat app we will create. Initially, I thought it would be enough to call Nova Canvas from Streamlit via boto3. However, to apply the prompting tips for image generation described below, I added a step that extracts keywords from the instruction sentences typed into the chat.
Tips for Image Generation via Gen-AI
Give Instructions with Words Instead of Sentences
Generative AI models process a prompt in segments of limited length, so it is better to give instructions as important words separated by commas rather than as full sentences.
Nova Canvas does have the advantage of understanding longer contexts than Titan, but a description style that is easy for the model to interpret makes it more likely that you get the image you envision.
- Example: Surprised child, colorful playground, anime style
Bring Important Words to the Front
Place the most important expressions at the front of the prompt. Since the main theme this time is simple icon generation, the content defined as the system prompt is placed at the front of the prompt sentence.
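The two tips above (comma-separated words, important words first) can be sketched as a small helper. This is only an illustration: `build_prompt` is a hypothetical function, and it differs slightly from the app code in this article, which simply concatenates the system prompt string and the extracted keywords.

```python
def build_prompt(style_keywords, subject_keywords):
    """Join keywords with commas, keeping the style keywords at the front.

    Empty entries and case-insensitive duplicates are dropped so that the
    most important words are not pushed back by repeated ones.
    """
    ordered = []
    seen = set()
    for word in list(style_keywords) + list(subject_keywords):
        cleaned = word.strip()
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            ordered.append(cleaned)
    return ", ".join(ordered)


# Style keywords (from the system prompt) come first, then the subject keywords
style = "minimalist icon, simple, flat design, 2D, ".split(",")
subject = "todo list, checklist, task, simple".split(",")
prompt = build_prompt(style, subject)
print(prompt)
```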
Utilize Negative Prompts
Negative prompts are important in image generation. In particular, describe the things you do not want to include in the negative prompt instead of using negative expressions like `not` or `no` in the normal prompt.
If negative expressions are included in the prompt, the generative AI model may generate an image that includes those words.
In this script, since we want to generate simple icons, we have added `3D, color, photo` to the default negative prompts.
Also, since instructing icon generation may create multiple icon sets in one image, `multiple images` is also included in the negative prompts.
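One way to apply this tip automatically is to move keywords that start with a negation into the negative prompt. The sketch below is an assumption-laden illustration, not part of the app in this article: `split_negations` is a hypothetical helper, and it only recognizes keywords that begin with `no`, `not`, or `without`.

```python
import re

# Leading negation words that this sketch treats as "do not draw this"
_NEGATION = re.compile(r"^(?:no|not|without)\s+", re.IGNORECASE)


def split_negations(keywords, default_negatives=()):
    """Split keywords into (prompt text, negative prompt text).

    A keyword such as "no shadow" is removed from the prompt and "shadow"
    is appended to the negative prompt instead.
    """
    positives = []
    negatives = list(default_negatives)
    for keyword in keywords:
        cleaned = keyword.strip()
        if not cleaned:
            continue
        stripped = _NEGATION.sub("", cleaned)
        if stripped != cleaned:
            # The keyword started with a negation: move it to the negatives
            if stripped not in negatives:
                negatives.append(stripped)
        else:
            positives.append(cleaned)
    return ", ".join(positives), ", ".join(negatives)


text, negative_text = split_negations(
    ["todo list", "no shadow", "checklist", "without text"],
    default_negatives=["3D", "color", "photo", "multiple images"],
)
# These two strings correspond to "text" and "negativeText" in textToImageParams
print(text)
print(negative_text)
```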
Give Instructions in English
Image generation models usually generate images closer to your vision when instructed in English rather than other languages.
In this chat app, the chat content is translated into English before extracting words, so it is also compatible with Japanese.
Nova Canvas Chat App Code
Here is the code for the Nova Canvas chat application.
```python
import base64
import json
import os
import random

import boto3
import streamlit as st

REGION = "us-east-1"
IMG_MODEL_ID = "amazon.nova-canvas-v1:0"
TXT_MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"
OUTPUT_DIR = "output"
# Upper bound of the seed range for Nova Canvas; random.randint is inclusive
MAX_SEED = 858993459


@st.cache_resource
def get_bedrock_client():
    """Create the Bedrock Runtime client once and reuse it across reruns"""
    return boto3.client(service_name="bedrock-runtime", region_name=REGION)


def generate_image(native_message, image_size, image_num, system_prompt, ng_text):
    """Generate images"""
    # The system prompt (style keywords) goes in front of the extracted words
    message = system_prompt + native_message
    print(f"textToImageParams: {message}")
    seed = random.randint(0, MAX_SEED)
    text_to_image_params = {"text": message}
    if ng_text.strip():
        # Send the negative prompt only when the sidebar field is not empty
        text_to_image_params["negativeText"] = ng_text
    native_request = {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": text_to_image_params,
        "imageGenerationConfig": {
            "seed": seed,
            "quality": "standard",
            "height": image_size,
            "width": image_size,
            "cfgScale": 10,
            "numberOfImages": image_num,
        },
    }
    request = json.dumps(native_request)
    bedrock_client = get_bedrock_client()
    response = bedrock_client.invoke_model(modelId=IMG_MODEL_ID, body=request)
    model_response = json.loads(response["body"].read())

    os.makedirs(OUTPUT_DIR, exist_ok=True)
    image_path_list = []
    for base64_image_data in model_response["images"]:
        # Save image to local folder under the first unused file name
        i = 1
        while os.path.exists(os.path.join(OUTPUT_DIR, f"nova_canvas_{i}.png")):
            i += 1
        image_data = base64.b64decode(base64_image_data)
        image_path = os.path.join(OUTPUT_DIR, f"nova_canvas_{i}.png")
        with open(image_path, "wb") as file:
            file.write(image_data)
        print(f"The generated image has been saved to {image_path}")
        image_path_list.append(image_path)
    return image_path_list


def textract_from_input(message):
    """Translate the chat instruction to English and extract key words.

    This process uses a text foundation model such as Haiku or Sonnet,
    not Nova Canvas.
    """
    bedrock_client = get_bedrock_client()
    system_prompt = (
        "Translate other languages to English and extract key English words "
        "from the translated sentence. "
        "Extract fewer than 10 meaningful English words, separated by commas. "
        "Do not include prepositions or articles. "
        # Example of a Japanese instruction and the expected keyword output
        "<exampleInput>かわいい猫と女の子が楽しく遊んでいる</exampleInput>"
        "<exampleOutput>cute cat, girl, play, joyful</exampleOutput>"
    )
    user_message = {
        "role": "user",
        "content": [{"text": message}],
    }
    messages = [user_message]
    system_prompts = [{"text": system_prompt}]
    inference_config = {"temperature": 0.5}
    additional_model_fields = {"top_k": 200}
    response = bedrock_client.converse(
        modelId=TXT_MODEL_ID,
        messages=messages,
        system=system_prompts,
        inferenceConfig=inference_config,
        additionalModelRequestFields=additional_model_fields,
    )
    words = response["output"]["message"]["content"][0]["text"]
    return words


def display_history(messages):
    """Display chat history"""
    for message in messages:
        display_img_content(message)


def display_img_content(message):
    """Display message and images

    Handles multiple images generated by Nova
    """
    contents = message["content"]
    print(f"message contents: {contents}")
    with st.chat_message(message["role"]):
        for content in contents:
            if content.get("text") is not None:
                st.write(content["text"])
            else:
                st.image(content["image"])


def sidebar():
    """Display sidebar"""
    st.sidebar.title("Image Settings")
    # Image size (used for both width and height)
    image_size = st.sidebar.slider(
        "Image Size",
        min_value=320,
        max_value=960,
        step=64,
        value=320,
    )
    # Number of images
    image_number = st.sidebar.slider(
        "Number of Images",
        min_value=1,
        max_value=5,
        step=1,
        value=1,
    )
    # System prompt
    system_prompt = st.sidebar.text_area(
        "System Prompt",
        value="minimalist icon, simple, flat design, dual-line design, "
        "1.5px stroke weight line, solid background, monochromatic, 2D, ",
        height=200,
    )
    # Negative prompt
    negative_text = st.sidebar.text_area(
        "Negative Prompt",
        value="3D, color, photo, multiple images",
        height=100,
    )
    return image_size, image_number, system_prompt, negative_text


def main():
    """Main process"""
    st.title("Simple Icon Generator by Nova Canvas")
    img_size, img_num, system_prompt, negative_text = sidebar()
    if "messages" not in st.session_state:
        st.session_state.messages = []
    display_history(st.session_state.messages)
    if prompt := st.chat_input("What's up?"):
        input_msg = {"role": "user", "content": [{"text": prompt}]}
        display_img_content(input_msg)
        st.session_state.messages.append(input_msg)
        print(f"st.session_state.messages: {st.session_state.messages}")
        # Translate the input and extract keywords
        print(f"input_msg: {input_msg}")
        textract_resp = textract_from_input(input_msg["content"][0]["text"])
        print(f"texttract: {textract_resp}")
        # Generate images
        resp_img_path_list = generate_image(
            textract_resp, img_size, img_num, system_prompt, negative_text
        )
        resp_img_contents = []
        for resp_img_path in resp_img_path_list:
            resp_img_contents.append(
                {"text": f"Image created! filepath: {resp_img_path}"}
            )
            resp_img_contents.append({"image": resp_img_path})
        resp_msg = {
            "role": "assistant",
            "content": resp_img_contents,
        }
        display_img_content(resp_msg)
        st.session_state.messages.append(resp_msg)


if __name__ == "__main__":
    main()
```
3. Running the Application
Let's run the application! Start the application with the following `streamlit` command.
streamlit run nova_canvas_chat.py --server.port 8080
In Cloud9, you can display the application by opening Preview -> Preview Running Application from the toolbar. The same should apply to local environments like VSCode, VSCode Server, and SageMaker Studio code editor.
If you see a startup screen like this, you have succeeded.
Now, let's instruct the chat to create the icon image you want. This time, I instructed it to "Create an icon for a ToDo list that can manage what I want to do."
Since I want multiple suggestions, I set the number of images to 3 from the sidebar.
It created a nice ToDo list icon! Although the screenshot shows only two, three images were generated, so you can choose the icon that best matches what you had in mind.
The created images are stored in the `output` folder within the project folder you created.
Extend the Chat App
If you want to distribute this application to many users, deploying it on an ECS container can achieve that. Also, since saving locally is inconvenient, let's use an S3 bucket as the storage destination.
Especially when distributing the application on EC2 or containers, it is easier to manage using file storage like S3 rather than local storage on the server.
Additionally, by naming the image files with creation timestamps or UUIDs, the chat application can function properly even when used by multiple people.
It might also be good to manage users with Cognito.
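As a minimal sketch of the S3 idea above, the helper below builds an object key from a UTC timestamp plus a UUID and uploads the image bytes. The function names, the `nova-canvas` prefix, and the key layout are assumptions made for this illustration. `upload_image` expects an S3 client such as the one returned by `boto3.client("s3")` and uses only its `put_object` call.

```python
import uuid
from datetime import datetime, timezone


def make_image_key(prefix="nova-canvas", now=None):
    """Build a collision-resistant object key: <prefix>/<UTC timestamp>-<uuid>.png"""
    now = now or datetime.now(timezone.utc)
    return f"{prefix}/{now:%Y%m%dT%H%M%SZ}-{uuid.uuid4().hex}.png"


def upload_image(s3_client, bucket, image_bytes, key=None):
    """Upload decoded PNG bytes to S3 and return the object key that was used.

    s3_client is expected to behave like boto3.client("s3").
    """
    key = key or make_image_key()
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=image_bytes,
        ContentType="image/png",
    )
    return key


# Two users generating images in the same second still get different keys
print(make_image_key())
print(make_image_key())
```

In the app, `generate_image` could call `upload_image(boto3.client("s3"), bucket_name, image_data)` instead of writing to the local `output` folder, where `bucket_name` is a bucket you own.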
Summary
We created an image generation chat application using Amazon Nova Canvas.
Since it was almost my first time generating images, it took some time starting from prompt engineering basics, but the joy of generating an image as intended is great!