Forem: Allen Firstenberg

Through the Blue Frames: UX From Google Glass to Gemini

Allen Firstenberg — Fri, 15 May 2026 15:00:00 +0000

It's a busy few weeks for me.
Three weeks ago I attended a GDE Summit and Google Cloud Next.
Two weeks ago, I was one of those honored as a Voice AI 100 at the 10th Project Voice conference.
Next week I will be attending my eleventh Google I/O - my tenth as a GDE.

More on all of these shortly and how they illustrate my past 15 years as a developer.

But first - I think about what last week marked. Because 13 years ago last week I walked into Google's offices on the top floor of Chelsea Market in NYC and unboxed my first pair of Google Glass. (The first blue framed Glass in NYC.) And that event has shaped me and how I think about the role of personal computing to this day.

Although I was already a GDE at that time, I became the first Glass GDE. I made dozens of presentations to groups about how to develop for Glass and how we needed to think about developing for Glass. I got over my fears of speaking, leaned into the experience of working with people, started wearing my trademark blue shirts, and met some amazing folks in art and engineering. I wrote a book about it, too.

Most of all, I began to digest what a post-smartphone interface would be like. When I spoke at Augmented World Expo NYC in 2014 about Google Glass, I saw lots of demonstrations of goggles and AR popping off phone screens, and I didn't think that was it.

Instead, the message I tried to advocate was that the AR and VR worlds had much to learn from our experiences developing for Glass: concepts such as "there when you need it, out of the way when you don't". I also said that the future of Glass had much to learn from AR and VR as well. Not in what they were showing, but rather that our devices needed to be more contextual and understand the environment we were working in. Head-mounted wearables had a unique feature no other device did - they could "see" the same perspective we did without any action on our part.

At Google I/O in 2016, the first at Shoreline Amphitheatre, a reporter saw I was wearing Glass and asked me what I thought about the keynote earlier that day. He expected me to talk about the new augmented reality platform that Google had announced. But I wasn't interested in that. I saw what I realized was truly the next generation of the Google Glass interface - Google Assistant and the Google Home.

Google Assistant, and the Voice First interfaces I was now helping people understand, started refining the message that I delivered at AWE a couple of years earlier. Voice agents needed context to work, but they mostly remained silent partners until we asked them something. On Google Home devices, they were mostly passive, ubiquitous presences in the world we lived in.

The interface was also new. My message at the time was that, since digital computers were first available, we had to teach people how to use them - what holes to punch, what keys to click, how to use a mouse, or what swiping gestures were necessary on our phone.

For the first time, devices like Google Assistant and Alexa were turning that around. Now we were teaching computers how to understand us. They weren't perfect, and there were still many lessons we needed to figure out, such as discovery and monetization, but the interfaces were taking bold new steps in trying to figure out these answers.

Personally and professionally, this was a time when I continued to expand and grow. I didn't just do presentations, I collaborated on a weekly podcast, participated in the frequent Voice Lunch discussions, and held weekly office hours. When Glass was discontinued, it started my move into wearables in general, and then into becoming a Google Assistant GDE.

But as Google lost interest in Assistant, and Amazon struggled with the future of Alexa, I knew it was time for me to find the next generation of the future of interfaces. As I started to explore the world of LLMs, I realized that these were taking many of the concepts we had in voice and starting to bring them to everyone and to far more modalities than voice alone.

I became, briefly, an AI GDE as conversational interfaces started to take off. It was clear to me that the agents we were beginning to talk about were the evolution of the agents we were talking about in the voice world. And it was no surprise that we were talking about "context windows" and how important context was in these LLMs being able to work with our queries.

It was also clear that, while text was the default modality for these conversations, that was just the stepping stone. Voice was a clear next step. Incorporating images was an obvious next step. Perhaps we had learned some of the lessons I was advocating for?

I was hopeful. At I/O 2022, a whole 10 years after Glass launched, Google was talking about using AI to "bridge the physical and digital worlds" to use the context of what you could see in front of you to help with your search queries.

"If only," I thought, "they had some... glasses.. or something to make that easier."

We saw the first tease of that at I/O in 2024 in a demonstration of Project Astra, where glasses were able to answer questions about the context they were "seeing". At I/O 2025, it went two steps further - we were told this technology would be part of the forthcoming Android XR, and we could try on and test a prototype!

But there were many unanswered questions. Most importantly in my mind - how would developers tap into this interface? Glass and Assistant were notable because they were platforms, allowing developers to use the new interface that was available. Would Android XR let us seamlessly ask Gemini a question and get it answered by our app, all through voice? Or would it force a clunky "launch" and change in interaction model? Do we have a discovery model? Can we monetize our apps to pay for their development? Had we learned the lessons yet?

My conferences these weeks tell the tale of my quest to answer that final question. The GDE summit let me connect with developers across different fields, cross-pollinating ideas, and reminding me of this journey I started nearly 15 years ago. Cloud Next reminds me of the underlying workhorse that AI, LLMs, and agents are bringing to the table. Project Voice reminds me of the people who were delivering that next generation interface to millions of households and the small role I played in it.

And I/O?

That reminds me of the future. The next step.

Next week we will see if Google has truly learned the lessons from Glass and Assistant and AI. We'll see if they let us do ambient and ubiquitous computing in a whole new way. We'll learn, I hope, when these devices will be available for everyone. And, perhaps most importantly, we'll learn if they'll come with blue frames.

We'll hear and see next week. And I'll give voice to my thoughts then.

Acknowledgements

Along this journey, I've walked alongside many amazing people. Some pointed me in new directions. Some collaborated in shared understanding. Any list I give would be entirely inadequate, and likely missing a few who should be included, but I wanted to try to mention some. Google and the Google Developer Expert program, as a whole, who have provided great opportunities to attend many of these conferences. Jonathan Beri, who invited me to my first I/O in 2012. Jen Tong and Timothy Jordan, my mentors during the Glass years. Jessica Earley-Cha, one of my mentors during the Assistant years. Jason Salas, my co-author. Mark Tucker, my podcast co-host. Gerwin Sturm, Steven Gray, Linda Lawton, Denis Valasek, Noble Ackerson, and Mike Wolfson, my fellow GDEs who helped me explore these new worlds. And, most of all, my family - my parents who started me on this path with computers decades ago, and my child who keeps me grounded every day.

What's Coming with LangChainJS and Gemini?

Allen Firstenberg — Thu, 08 Jan 2026 22:53:08 +0000

The past few months have been huge for both Gemini and LangChainJS! I've been busy trying to keep up with this (and a lot more), but as the year comes to a close, I wanted to take a moment and let folks know about the exciting developments going on at the junction of the two and let you know what's coming "real soon now!"

LangChainJS finally hit its 1.0 milestone, and with it came a host of new features. At the same time, the API has stabilized, so we know what we're working with and what to expect going forward.

Gemini also hit a milestone with Gemini 3 coming out and Gemini 2 being shut down in a few months. There have also been some fantastic multimodal models in the past few months (can we say Nano Banana enough?), and the LangChainJS Gemini libraries have barely been able to keep up with some of these developments.

I want to take a quick look at how we got here with the LangChainJS Gemini libraries to understand where we're going. But if you're impatient and just want to see what's coming, skip ahead a section. I don't think you'll be disappointed.

How we got (gestures) here

Currently, there are a dizzying number of packages available to use Gemini with LangChainJS:

@langchain/google-genai was based on the previous version of Google's Generative AI package. It was designed to work just with the AI Studio API (often confusingly called the Gemini API) and not with Vertex AI's Gemini API. It has not been maintained in roughly a year and the library it uses is not designed to work with modern versions of the Gemini model.
@langchain/google-gauth was a REST-based library, used if you were running in a Google hosted environment or another node-like system to access either the AI Studio API or the Vertex AI API.
@langchain/google-webauth was similar to @langchain/google-gauth, but was designed to work in environments where there was no access to a file system.
@langchain/google-vertexai and @langchain/google-vertexai-web were similar to the above, but defaulted to using Vertex AI, although they could also use the AI Studio API.

There was also another package, @langchain/google-common which was the package that all the REST-based versions relied on to do the actual work.

As the person maintaining the REST-based packages, I always saw this as somewhat frustrating. The original goal was to have just one package. It was meant to use REST, since the libraries at the time (late 2023 and early 2024) each supported only one of the Vertex AI or AI Studio APIs, not both. I wanted one library to make it easier. Well... best laid plans...

That one library became four. First, because we couldn't find an easy solution to support both node-based and web-based platforms and still support Google Cloud's Application Default Credentials (ADC). And then because we wanted a clear "Vertex" labeled package to match what was on the Python side (which were both written by Google).

By the time I had that working in January of 2024, someone at Google had already written a version that just worked with the AI Studio API side. And thus began the confusion.

I've been proud of the google-common based libraries. We tried many things that the community wanted - we had cross-platform compatibility for over a year before Google offered a library that did the same thing, and when Gemini 2.0 launched, we had compatibility within days, while Google took over a month to get its new JavaScript library out. We experimented with features such as a Security Manager and a Media Manager. We supported other models besides Gemini, with the first two being Gemma on the AI Studio platform and those from Anthropic on Vertex AI.

But the packages were confusing, and one was outdated. So it was time to find a better solution.

Simplifying the packaging

I'm thrilled that, going forward, we'll be supporting one package:
@langchain/google

pause for cheers and applause

As always, you'll be able to install it with the package manager of your choice:

yarn add @langchain/google

Using the library should feel familiar. If you're currently using the ChatGoogle class, you'll continue to do so, just from a new library:

import { ChatGoogle } from "@langchain/google";

const llm = new ChatGoogle({
  model: "gemini-3-pro-preview",
})

You'll also be able to use the new LangChainJS 1 way of creating agents with the createAgent() function by just specifying the model, and it will use this new library.

import { createAgent } from "langchain";

const agent = createAgent({
  model: "gemini-3-flash-preview",
  tools: [],
});

(Until these changes are in place, this may not do what you expect. So beware.)

Just like the old libraries, the new library will continue to support API Keys from both AI Studio and Vertex AI Express mode, as well as Google Cloud credentials for service accounts and individuals. Credentials can be provided explicitly in the code, loaded through environment variables where available, or resolved through ADC.
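One way to picture that lookup is the sketch below. It only illustrates the fallback idea and is not the library's code: the function and type names are hypothetical, GOOGLE_API_KEY is an assumed variable name, and the precedence (explicit, then environment, then ADC) is an assumption based on the order the options are listed above.

```typescript
// Illustrative only: these names are hypothetical and are NOT the
// @langchain/google API. The environment variable name is an assumption.
type CredentialSource =
  | { kind: "explicit-api-key"; apiKey: string }
  | { kind: "env-api-key"; apiKey: string }
  | { kind: "adc" };

interface CredentialOptions {
  apiKey?: string;
}

// The environment is passed in so the function stays pure and testable.
type Env = Record<string, string | undefined>;

function resolveCredentialSource(
  options: CredentialOptions,
  env: Env,
  envVarName = "GOOGLE_API_KEY", // assumed name
): CredentialSource {
  // 1. A key passed explicitly in code wins.
  if (options.apiKey) {
    return { kind: "explicit-api-key", apiKey: options.apiKey };
  }
  // 2. Otherwise use a key from the environment, if one is set.
  const fromEnv = env[envVarName];
  if (fromEnv) {
    return { kind: "env-api-key", apiKey: fromEnv };
  }
  // 3. Otherwise fall back to Application Default Credentials.
  return { kind: "adc" };
}
```

Passing the environment in as an argument, rather than reading a global, keeps the sketch usable in web-like runtimes where no process environment exists.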

This new library uses REST behind the scenes, so it doesn't depend on any Google library to communicate with Gemini. I learned a lot while building the original REST version, and worked closely with LangChainJS engineers to try and avoid some of the worst mistakes we made back then. Our hope is that this new library becomes a model for how REST-based libraries can look and work for other integrations.

Even so, our goal was largely to keep what used to work working, so you wouldn't need to make big code changes.

But LangChainJS 1 brings with it a lot of new features. And this new library is ready to use them.

Improved (and standard!) text and multimodal support

While there are many great features with both LangChainJS 1 and Gemini 3, I want to highlight one of the biggest new features that this library will be supporting.

LangChainJS 0 was mostly oriented around text - which all models supported when it was created. As models began to support multimodal input, and eventually output, the implementation was a bit haphazard and different for each model. LangChainJS 1 sought to standardize that.

Better ways to handle replies - text and multimodal

Previously, the response.content field would be either a string or an array of MessageContentComplex objects. Most tasks assumed it was a string, but if you needed multimodal support, this started getting messy.
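To make the messiness concrete, here is a minimal sketch of the kind of branching that a string-or-array field pushes onto callers. The types are simplified stand-ins defined for this example, not the real LangChainJS types, and contentToText is a hypothetical helper.

```typescript
// Stand-in types for illustration; the real LangChainJS types are richer.
type ContentPart = { type: string; text?: string };
type LegacyContent = string | ContentPart[];

function contentToText(content: LegacyContent): string {
  // Simple case: the model returned plain text.
  if (typeof content === "string") {
    return content;
  }
  // Complex case: keep only the parts that carry text and join them.
  return content
    .filter((part) => part.type === "text" && typeof part.text === "string")
    .map((part) => part.text ?? "")
    .join("");
}
```

Every caller that wanted plain text needed some version of this check, which is the boilerplate a guaranteed-string accessor removes.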

LangChainJS 1 keeps response.content for backwards compatibility, and we've tried to respect that. So if you want text, you can still look here.

But the better way to get the text parts from the response is to use response.text, which now guarantees you will get a string. Something like this:

const llm = new ChatGoogle({
  model: "gemini-3-flash-preview",
});
const result: AIMessage = await llm.invoke("Why is the sky blue?");
// result.text is always a plain string, whatever shape result.content has.
const answer: string = result.text;

If you need to differentiate between the "thinking" or "reasoning" parts of the response and the final response, or if you get multimodal responses back, you can use the new response.contentBlocks field. This field is guaranteed to be an array of the new, consistent, ContentBlock.Standard objects.

For example:

const llm = new ChatGoogle({
  model: "gemini-3-pro-image-preview",
});
const prompt = "Draw a parrot sitting on a chain-link fence.";
const result: AIMessage = await llm.invoke(prompt);
result.contentBlocks.forEach((block: ContentBlock.Standard) => {
  // Anything without a "text" field is treated as media here.
  // saveToFile() is not defined in this snippet; supply your own helper.
  if (!("text" in block)) {
    saveToFile(block);
  }
});
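If you want to separate reasoning, final text, and images in one pass, switching on each block's type is one option. The sketch below uses stand-in block shapes defined locally. The field names (text, reasoning, data, mimeType) are assumptions modeled on the standard blocks described above, so check them against the real ContentBlock.Standard definitions. partitionBlocks is a hypothetical helper.

```typescript
// Stand-in block shapes for illustration. Field names are assumptions;
// check the real ContentBlock.Standard type definitions before relying on them.
type TextBlock = { type: "text"; text: string };
type ReasoningBlock = { type: "reasoning"; reasoning: string };
type ImageBlock = { type: "image"; data: string; mimeType: string };
type Block = TextBlock | ReasoningBlock | ImageBlock;

interface Partitioned {
  answer: string; // concatenated final text
  reasoning: string; // concatenated "thinking" text
  images: ImageBlock[]; // image blocks, in order of appearance
}

function partitionBlocks(blocks: Block[]): Partitioned {
  const result: Partitioned = { answer: "", reasoning: "", images: [] };
  for (const block of blocks) {
    switch (block.type) {
      case "text":
        result.answer += block.text;
        break;
      case "reasoning":
        result.reasoning += block.reasoning;
        break;
      case "image":
        result.images.push(block);
        break;
    }
  }
  return result;
}
```

Under these assumed shapes, switching on type also keeps reasoning blocks out of the media path, which a plain "has no text field" check would not.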

Sending multimodal input to Gemini

This ContentBlock.Standard also works for sending data to Gemini. For example:

import * as fs from "node:fs/promises"; // needed for fs.readFile below

const llm = new ChatGoogle({
  model: "gemini-3-flash-preview",
});
const dataPath = "src/chat_models/tests/data/blue-square.png";
const dataType = "image/png";
const data = await fs.readFile(dataPath);
const data64 = data.toString("base64");

const content: ContentBlock.Standard[] = [
  {
    type: "text",
    text: "What is in this image?",
  },
  {
    type: "image",
    data: data64,
    mimeType: dataType,
  },
];
const message = new HumanMessage({
  contentBlocks: content
});

const result: AIMessage = await llm.invoke([message]);
console.log(result.text);

Similar tasks work for audio and video input as well.
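As a rough sketch of what "similar" could look like, the hypothetical helper below picks a block type from the MIME type so that the same code path can build image, audio, or video blocks. The "audio", "video", and "file" type names are assumptions that mirror the "image" block shown above. Confirm them against the real ContentBlock.Standard definitions.

```typescript
// Stand-in shape for illustration; the type names other than "image"
// are assumptions, not confirmed by the examples in this post.
type MediaBlock = {
  type: "image" | "audio" | "video" | "file";
  data: string; // base64-encoded bytes
  mimeType: string;
};

// Map a MIME type such as "audio/mpeg" to a block type.
function mediaTypeFor(mimeType: string): MediaBlock["type"] {
  if (mimeType.startsWith("image/")) return "image";
  if (mimeType.startsWith("audio/")) return "audio";
  if (mimeType.startsWith("video/")) return "video";
  return "file"; // assumed catch-all for anything else
}

// Build a block from already base64-encoded data.
function toMediaBlock(data64: string, mimeType: string): MediaBlock {
  return { type: mediaTypeFor(mimeType), data: data64, mimeType };
}
```

With a helper like this, the image example above would only change in the file path and MIME type it passes in.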

What's missing, what's next, and what do you want to see?

We plan to release an alpha version of this in early January 2026, with a final version within a month after.

There is still a lot of discussion around what will happen with the old versions of the library, and the LangChain team and I welcome your thoughts.
My current thinking is:

They will receive a version bump when the new @langchain/google is released.
Both the older versions and this bumped version will be marked as deprecated, pointing to @langchain/google as the replacement package.
This final release will actually delegate all functionality to @langchain/google - the old libraries will just be a thin veneer.
- This will give you a little more time to migrate to the newer features without having to do extensive code changes.
- I can't guarantee full backwards compatibility, but the hope is that such issues will be minimal.

This first release of @langchain/google is also sure to be missing some features. We'd like to hear your feedback about what is most important to you. For example, here are some features that may not be available on day 1:

Embedding support
Batch support
Media manager
Security manager
Support for non-Gemini models (which are most important to you?)
Support for Veo and Imagen (how would you like to see these?)
Google's Gemini Deep Thinking model and the Interactions API

You may have other features that you think are important - if so, we'd love to hear which ones. (And if you are willing to help integrate them - let's talk.)

I, personally, want many of these features. But I want to get your feedback about what my priorities should be.

Some Personal Thanks

The past few months have been hectic for me, which is part of why this update has been delayed. I appreciate the support from the team at LangChain, from the community, and from my fellow GDEs. It means a lot to me when people tell me they're using Gemini with LangChainJS.

Thanks to my employer, for encouraging open source work, to LangChain, for providing staff to assist in technical questions, and to Google for providing cloud credits to help make testing these updates possible and for sponsoring the #AISprintH2.

Very special thanks to Denis, Linda, Steven, Noble, and Mark who have always been there with technical and editorial advice, as well as a friendly voice when times got rough.

Very very special thanks to my family, who have always been there for me.

As many of you know, although I am both a Google Developer Expert and a LangChain Champion, I work for neither company. My work for the past two years on this project has been a labor of love because I appreciate the products that both Google and LangChain have delivered, and I want to make both better. I plan to continue that work - and I hope you are also out there, trying to make the world a better place in your own way.