<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Dario Farzati</title>
    <description>The latest articles on Forem by Dario Farzati (@dariofarzati).</description>
    <link>https://forem.com/dariofarzati</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F984917%2F141b0005-ddba-43f2-8686-b6989fdd2040.jpeg</url>
      <title>Forem: Dario Farzati</title>
      <link>https://forem.com/dariofarzati</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/dariofarzati"/>
    <language>en</language>
    <item>
      <title>De APIs a ACIs: La Próxima Evolución en la Interacción con el Software</title>
      <dc:creator>Dario Farzati</dc:creator>
      <pubDate>Fri, 13 Sep 2024 09:04:12 +0000</pubDate>
      <link>https://forem.com/dariofarzati/de-apis-a-acis-la-proxima-evolucion-en-la-interaccion-con-el-software-4po4</link>
      <guid>https://forem.com/dariofarzati/de-apis-a-acis-la-proxima-evolucion-en-la-interaccion-con-el-software-4po4</guid>
      <description>&lt;h2&gt;
  
  
  Introducción
&lt;/h2&gt;

&lt;p&gt;En el panorama en constante evolución del desarrollo de software, hemos visto cambios significativos en cómo las aplicaciones se comunican e interactúan. Desde los primeros días de las arquitecturas monolíticas hasta el auge de los microservicios, cada evolución tenía como objetivo hacer el software más eficiente y fácil de usar. Sin embargo, a medida que la tecnología avanza, también lo hace la complejidad de las interacciones.&lt;/p&gt;

&lt;p&gt;Aquí es donde entran en juego las &lt;strong&gt;Interfaces Conversacionales de Aplicación (ACIs)&lt;/strong&gt;, y me gustaría compartir algunos pensamientos sobre un nuevo paradigma que podría reducir la carga cognitiva y hacer que la interacción con el software sea más intuitiva que nunca.&lt;/p&gt;

&lt;p&gt;Los Large Language Models están transformando la forma en que interactuamos con la tecnología. Estamos entrando en una era donde comunicarse con las máquinas es tan natural como conversar con un colega. Los usuarios ahora pueden expresar sus necesidades en lenguaje cotidiano, y los sistemas responden en consecuencia. Y esta es la base de las ACIs.&lt;/p&gt;

&lt;p&gt;Esta propuesta no es solo una actualización tecnológica, es un cambio de paradigma que también plantea preguntas profundas: ¿cómo remodelará esto la experiencia para usuarios, desarrolladores y agentes de IA?&lt;/p&gt;

&lt;h2&gt;
  
  
  La interfaz que se disuelve
&lt;/h2&gt;

&lt;p&gt;En una era donde los LLMs pueden responder con texto, formatos estructurados, imágenes, voz o incluso código que genera interfaces visuales dinámicas, debemos preguntarnos: ¿cuánto tiempo más seguiremos construyendo capas entre nosotros y el software? ¿Qué pasaría si dejáramos de desarrollar &lt;em&gt;user interfaces&lt;/em&gt; y &lt;em&gt;application programming interfaces&lt;/em&gt; tradicionales por completo?&lt;/p&gt;

&lt;p&gt;Hoy en día, todo parece converger en una interfaz de prompt o chat. Aunque no todos los problemas se resolverán por estos medios, esto es solo el comienzo. La clave no es la interfaz en sí, sino lo que representa: un cambio fundamental en cómo interactuamos con la tecnología.&lt;/p&gt;

&lt;p&gt;Por ejemplo, integrar una API relativamente compleja requiere navegar por una extensa documentación, entender esquemas, respuestas, excepciones; esto no solo consume tiempo valioso, sino que también desvía el enfoque de la construcción de características principales.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsp7a16f32nkis9hq9lf6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsp7a16f32nkis9hq9lf6.png" alt="Especificación de la API de Stripe" width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ahora, imagina eludir estas complejidades a través de una interfaz que entiende tu intención mediante lenguaje natural. Aquí es donde podemos ver que la era de las UIs y APIs podría estar evolucionando hacia un nuevo paradigma: las ACIs.&lt;/p&gt;

&lt;p&gt;Antes de profundizar en el qué, cómo y por qué de las ACIs, revisemos brevemente el viaje que nos ha llevado a este momento crucial.&lt;/p&gt;

&lt;h2&gt;
  
  
  Todo es una API
&lt;/h2&gt;

&lt;p&gt;Tracemos la evolución de las APIs para entender por qué fueron inventadas. Inicialmente, cuando teníamos sistemas con algoritmos y estructuras de datos específicos resolviendo problemas en ciertos dominios, era eficiente encapsularlos y exponerlos a través de interfaces - Interfaces de Programación de Aplicaciones (APIs). Para evitar confusiones más adelante, llamemos a estas APIs específicas "APIs de Dominio". Luego, encima de eso, agregamos otra interfaz, una Interfaz de Usuario, que capturaría las intenciones del usuario y las traduciría en llamadas API correspondientes.&lt;/p&gt;

&lt;p&gt;En esencia, implementamos una interfaz (UI) para interactuar con otra interfaz (API de Dominio).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9fyzxlrsl22ie31ewxq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9fyzxlrsl22ie31ewxq.png" alt="UI interactuando con una API" width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Con el tiempo, también necesitábamos interactuar con programas que se ejecutan en lugares remotos a través de Internet, por lo que nuestras interfaces o incluso nuestros sistemas backend interactuarían con estas interfaces remotas. Y así nuestro viaje de construcción de interfaces nos llevó a donde estamos hoy:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5v6heut560qkgj3p10hp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5v6heut560qkgj3p10hp.png" alt="UI y sistema remoto interactuando con API" width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Entonces, todo es una I(nterfaz)... ¿y?
&lt;/h2&gt;

&lt;p&gt;Bueno, no solo todo es una interfaz, sino que cada una opera en un nivel de abstracción diferente, escrita en varios lenguajes, por diversos autores que toman decisiones de diseño distintas (y no siempre coherentes) que responden a casos de uso únicos.&lt;/p&gt;

&lt;p&gt;Como desarrolladores, al navegar por todas estas interfaces, pasamos un tiempo considerable moviéndonos a través de las diferentes capas de un sistema y sus dependencias. Esencialmente, estamos traduciendo datos de una interfaz a otra, todo mientras intentamos mantener la vista en nuestra intención original. ¡En muchos casos, implementar correctamente estas interfaces es más desafiante y tedioso que escribir la lógica de dominio real que resuelve el problema!&lt;/p&gt;

&lt;p&gt;Una parte sustancial de este esfuerzo se dedica principalmente a los &lt;strong&gt;pasos 2 y 3&lt;/strong&gt; a continuación:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Los desarrolladores implementan sistemas backend.&lt;/li&gt;
&lt;li&gt;Los desarrolladores exponen interfaces de programación en esos sistemas backend.&lt;/li&gt;
&lt;li&gt;Los desarrolladores consumen interfaces de programación para construir sistemas backend y/o interfaces de usuario.&lt;/li&gt;
&lt;li&gt;Los usuarios finales interactúan con interfaces de usuario para lograr sus tareas.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Esta complejidad a menudo oscurece nuestro objetivo principal: resolver eficientemente el problema del usuario.&lt;/p&gt;

&lt;h2&gt;
  
  
  El Amanecer de las Interfaces Generadas Dinámicamente
&lt;/h2&gt;

&lt;p&gt;En los últimos, ¿qué, 16 meses?, hemos observado un avance: aplicaciones LLM como ChatGPT han comenzado a usar acciones que consumen APIs y devuelven datos en múltiples formatos, incluyendo JSON estructurado y código HTML, CSS y JavaScript.&lt;/p&gt;

&lt;p&gt;Vemos de nuevo una interfaz de usuario, en este caso, en forma de prompt o chat. Esta es una UI que, como cualquier otra UI, está escrita &lt;em&gt;de antemano&lt;/em&gt; por un grupo de ingenieros. Pero esta tiene una característica especial: esta interfaz de prompt &lt;em&gt;genera nuevas interfaces sobre la marcha&lt;/em&gt;, particularmente UIs (pero no limitado a ellas, como veremos más adelante).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F55xmgpr3yb4s06qmuajw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F55xmgpr3yb4s06qmuajw.png" alt="Interfaz conversacional generando una interfaz de usuario" width="800" height="632"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hoy puedes pedirle a modelos como Sonnet o ChatGPT que creen interfaces interactivas con entradas y salidas, visualizaciones de gráficos, SVGs, lo que quieras. Combinado con acciones que permiten al modelo conectarse con APIs, esto proporciona al usuario una interfaz de usuario inmediata, bajo demanda y ad-hoc para abordar cualquier tarea particular.&lt;/p&gt;

&lt;p&gt;Pero eso no es todo. Con capacidades más potentes de Tool Calling y Structured Output, estas interfaces no solo pueden generar UIs sino también APIs. Devuelven datos en cualquier formato y esquema que especifiques.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiw04btcyf2o0zp2gxfrv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiw04btcyf2o0zp2gxfrv.png" alt="Interfaces conversacionales generando APIs" width="800" height="508"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Y esto es solo el comienzo.&lt;/p&gt;

&lt;p&gt;Las capas que se interponen entre el usuario final y los algoritmos reales que resuelven las necesidades del usuario —las interfaces intermediarias diseñadas para traducir la intención del usuario en llamadas API programáticas— se están volviendo redundantes.&lt;/p&gt;

&lt;p&gt;Claro, la mayoría de los humanos son visuales; y las personas visuales todavía quieren dashboards, paneles y botones. Pero el punto no es reemplazar las UIs con Prompts; el punto es en &lt;em&gt;qué&lt;/em&gt; se convertirán realmente estas UIs de ahora en adelante, &lt;em&gt;quién&lt;/em&gt; tendrá que implementarlas, &lt;em&gt;cuándo&lt;/em&gt; y &lt;em&gt;cómo&lt;/em&gt;.&lt;br&gt;
¿Es realmente necesario construir una UI lo suficientemente común para atender al usuario promedio? ¿Intentar anticipar casos de uso? ¿Agregar localización para múltiples idiomas? ¿Garantizar la accesibilidad? ¿Siempre limitarse a una interfaz visual?&lt;/p&gt;

&lt;p&gt;Todas estas tareas podrían abordarse dinámicamente de una manera que se ajuste mejor a &lt;strong&gt;Cada&lt;/strong&gt;. &lt;strong&gt;Único&lt;/strong&gt;. &lt;strong&gt;Usuario&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Estos usuarios pueden recibir exactamente la interfaz que necesitan, mostrando la información precisa que requieren, en un formato que coincida con sus preferencias y necesidades de accesibilidad: visual, audio o texto.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8jc7ofredaqyh8k7fd79.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8jc7ofredaqyh8k7fd79.png" alt="Interfaz multimodal" width="800" height="558"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Hacia una Interfaz Única
&lt;/h2&gt;

&lt;p&gt;Con estas capas colapsando y las interfaces siendo generadas dinámicamente sobre la marcha, podemos visualizar una única interfaz que sirva para todos los propósitos:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sirve a los desarrolladores que buscan integrar funcionalidad en sus aplicaciones.&lt;/li&gt;
&lt;li&gt;Sirve a los usuarios finales que buscan interactuar directamente con el sistema.&lt;/li&gt;
&lt;li&gt;Sirve a los agentes de IA que realizan tareas o buscan información de forma autónoma.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Llamaría a esta interfaz única una Interfaz Conversacional de Aplicación (ACI).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2yl5fb03mzj6wqjo6xsu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2yl5fb03mzj6wqjo6xsu.png" alt="ACI: La Interfaz Única" width="800" height="560"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Si bien su impacto podría compararse, salvando las distancias, con cómo GraphQL cambió las interacciones de API frente a REST, las ACIs van mucho más allá al proponer un paradigma completamente nuevo.&lt;/p&gt;

&lt;p&gt;Podemos esbozar varias diferencias radicales en comparación con las APIs tradicionales:&lt;/p&gt;
&lt;h3&gt;
  
  
  Interacción basada en intención
&lt;/h3&gt;

&lt;p&gt;Las ACIs no tienen contratos fijos o preestablecidos, esquemas, métodos o flujos de trabajo precisos. Son &lt;strong&gt;&lt;em&gt;basadas en intención&lt;/em&gt;&lt;/strong&gt; en lugar de &lt;strong&gt;&lt;em&gt;basadas en llamadas procedurales&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;En lugar de llamar a métodos específicos con parámetros predefinidos, los desarrolladores y usuarios pueden expresar sus intenciones en lenguaje natural. Por ejemplo, "&lt;em&gt;Crear una nueva cuenta de usuario con privilegios de administrador&lt;/em&gt;" es más intuitivo que construir una o dos llamadas API con el endpoint y los parámetros correctos.&lt;/p&gt;
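&lt;p&gt;Un boceto ilustrativo del contraste (endpoints y parámetros hipotéticos): lo que hoy requiere una o dos llamadas procedurales frente a una única expresión de intención:&lt;/p&gt;

```javascript
// Hoy: una o dos llamadas API procedurales (rutas y cuerpos hipotéticos)
const llamadasApi = [
  { method: 'POST', path: '/users', body: { name: 'Ana' } },
  { method: 'POST', path: '/users/42/roles', body: { role: 'admin' } },
];

// Con una ACI: una sola expresión de intención en lenguaje natural
const solicitudAci = {
  intent: 'Crear una nueva cuenta de usuario con privilegios de administrador para Ana',
};

console.log(llamadasApi.length); // 2
console.log(Object.keys(solicitudAci).length); // 1
```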

&lt;p&gt;Dado que las ACIs no están limitadas por contratos rígidos, pueden entender y adaptarse a nuevas solicitudes sin requerir actualizaciones explícitas o cambios de versión. Esta flexibilidad reduce la sobrecarga de mantenimiento y acelera los ciclos de desarrollo.&lt;/p&gt;

&lt;p&gt;Solo imagina cómo construirías un endpoint de API para convertir nombres en codificación base64, que responda a diversas necesidades del consumidor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Llamada simple, podría ser hecha por un cliente ACI o incluso por el usuario final&lt;/span&gt;
curl localhost &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"¿Cómo se ve mi nombre en base64? Es John"&lt;/span&gt;
Tu nombre &lt;span class="s2"&gt;"John"&lt;/span&gt; en &lt;span class="nb"&gt;base64 &lt;/span&gt;es: &lt;span class="nv"&gt;Sm9obg&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;

&lt;span class="c"&gt;# Llamada con una respuesta estructurada que puede ser usada por un cliente ACI&lt;/span&gt;
curl localhost &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"Escribe un documento JSON con &lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;code: &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;base64 &lt;/span&gt;de &lt;span class="s1"&gt;'OpenACI'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"code"&lt;/span&gt;:&lt;span class="s2"&gt;"T3BlbkFDSQ=="&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# Llamada con un cuerpo de solicitud JSON y una respuesta JSON&lt;/span&gt;
&lt;span class="c"&gt;# – no diferente a una API JSON normal (¡podría incluso&lt;/span&gt;
&lt;span class="c"&gt;# ser usada durante una fase de transición de API a ACI!)&lt;/span&gt;
curl localhost &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"intent":"convertToBase64","name":"OpenACI"}'&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"name"&lt;/span&gt;:&lt;span class="s2"&gt;"OpenACI"&lt;/span&gt;,&lt;span class="s2"&gt;"base64"&lt;/span&gt;:&lt;span class="s2"&gt;"T3BlbkFDSQ=="&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# Llamada con múltiples parámetros (ACI llama al manejador&lt;/span&gt;
&lt;span class="c"&gt;# de intención múltiples veces - ¡no hay necesidad de cambiar&lt;/span&gt;
&lt;span class="c"&gt;# la implementación!)&lt;/span&gt;
curl localhost &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"Muéstrame una tabla ascii con la representación en base64 de estos nombres: John Connor, Sarah Connor, Kyle Reese"&lt;/span&gt;
| Nombre        | Base64                |
|---------------|-----------------------|
| John Connor   | &lt;span class="nv"&gt;Sm9obiBDb25ub3I&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;      |
| Sarah Connor  | U2FyYWggQ29ubm9y      |
| Kyle Reese    | &lt;span class="nv"&gt;S3lsZSBSZWVzZQ&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;      |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Ahora compara la API que imaginaste con la implementación ACI que realmente respondió a las solicitudes ilustradas arriba:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;HttpAci&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@openaci/http&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;HttpAci&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;llmName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Convert name to base64&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;entities&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;base64&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
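&lt;p&gt;La lógica de dominio del manejador anterior se reduce a una línea de Node.js; podemos verificar con ella los valores que aparecen en la transcripción:&lt;/p&gt;

```javascript
// La misma conversión que usa el manejador de intención:
// Buffer forma parte de Node.js, sin dependencias externas.
function toBase64(name) {
  return Buffer.from(name).toString('base64');
}

console.log(toBase64('John'));         // Sm9obg==
console.log(toBase64('OpenACI'));      // T3BlbkFDSQ==
console.log(toBase64('Sarah Connor')); // U2FyYWggQ29ubm9y
```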

&lt;h3&gt;
  
  
  Diseño centrado en el humano
&lt;/h3&gt;

&lt;p&gt;Estas interfaces posicionan a los humanos como los principales consumidores, apoyándolos en cualquier rol, ya sea como usuarios finales o desarrolladores. Esto es crucial: las ACIs sirven a ambos a través del mismo marco conversacional. Esta unificación simplifica la arquitectura y reduce la necesidad de interfaces separadas como UIs o APIs especializadas.&lt;/p&gt;

&lt;p&gt;Para los desarrolladores, como se dijo anteriormente, podemos integrar rápidamente funcionalidades sin gastar tiempo aprendiendo una nueva API. Simplemente podemos escribir código cliente que establezca lo que queremos lograr, y la ACI interpreta y ejecuta la intención.&lt;/p&gt;
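&lt;p&gt;Como boceto (el endpoint y los nombres son supuestos), ese código cliente puede reducirse a enviar la intención como texto plano por HTTP:&lt;/p&gt;

```javascript
// Construye la solicitud HTTP para una ACI hipotética:
// el "contrato" es solo la frase que describe lo que queremos lograr.
function construirSolicitud(intencion) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'text/plain' },
    body: intencion,
  };
}

// Con fetch (Node 18 o superior) bastaría algo como:
// fetch('http://localhost:3000/', construirSolicitud(
//   'Muéstrame todos los documentos que creé ayer'));

console.log(construirSolicitud('hola').method); // POST
```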

&lt;p&gt;Los usuarios finales, por su parte, pueden personalizar su interacción con el sistema sobre la marcha. Por ejemplo, un usuario podría pedir, ya sea por chat o voz, "Muéstrame todos los documentos que creé ayer" sin navegar a través de múltiples pantallas de UI. Las aplicaciones aún podrían ofrecer una UI visual predeterminada, pero los usuarios podrían aprovechar las ACIs para personalizar y adaptar la interfaz a sus necesidades y preferencias, al detalle.&lt;/p&gt;
&lt;h3&gt;
  
  
  Accesibilidad y conveniencia
&lt;/h3&gt;

&lt;p&gt;Teniendo en cuenta el punto anterior, la accesibilidad es un aspecto fundamental de las ACIs. No solo por inclusividad, sino también por conveniencia. Las ACIs son multilenguaje y multimodales. Al soportar múltiples idiomas y modalidades (texto, voz, visuales), las ACIs hacen que los sistemas sean más accesibles para una diversa gama de usuarios, incluidos aquellos con discapacidades.&lt;/p&gt;
&lt;h3&gt;
  
  
  Más allá de la interacción humana
&lt;/h3&gt;

&lt;p&gt;A medida que los LLMs continúan evolucionando y los agentes de IA se vuelven más sofisticados, también se convierten en consumidores de ACIs. En este sentido, un consumidor no es solo humano, sino cualquiera capaz de interactuar con estas interfaces usando lenguaje natural. Esto abre posibilidades para sistemas multi-agente distribuidos que colaboren y negocien usando lenguaje natural. Lo más probable, sin embargo, es que los agentes de IA aprovechen la flexibilidad de las ACIs para acordar el mejor formato de datos e intercambiar mensajes de la manera más optimizada, sin nuestra intervención.&lt;/p&gt;
&lt;h2&gt;
  
  
  El Camino por Delante
&lt;/h2&gt;

&lt;p&gt;Las ACIs representan un enfoque transformador para la interacción con el software, alineando la tecnología más estrechamente con los patrones de comunicación humana. Tienen el potencial de reducir la sobrecarga de desarrollo al eliminar la necesidad de múltiples capas intermediarias; definitivamente empoderarán a los usuarios, quienes ganarán acceso directo y personalizado a las capacidades del sistema sin depender de que alguien más piense en una interfaz que podría no satisfacer todas las necesidades o preferencias del usuario. Las ACIs también fomentarán la innovación, ya que con menos barreras para la integración, nuevos servicios y colaboraciones pueden surgir más rápidamente.&lt;/p&gt;

&lt;p&gt;Ahora bien, todavía hay un camino por recorrer. Aunque creo que las ACIs podrían implementarse hoy para algunos casos de uso, la realidad es que el rendimiento, incluso para los modelos más rápidos, y las economías de escala, aún tienen que inclinarse a favor de una adopción generalizada para aplicaciones de alto tráfico. Todavía no estamos en una etapa donde podríamos reemplazar una API que recibe cientos de solicitudes por segundo. La estructura de costos actual sigue siendo prohibitiva para muchos casos de uso.&lt;/p&gt;

&lt;p&gt;Pero por supuesto, esto es solo cuestión de tiempo. Hemos visto el impresionante avance en los últimos 12 meses. Creo que el cambio de APIs a ACIs ya nos permite reimaginar cómo interactuamos con el software.&lt;/p&gt;
&lt;h2&gt;
  
  
  Conclusión
&lt;/h2&gt;

&lt;p&gt;Las Interfaces Conversacionales de Aplicación podrían ser un cambio transformador en cómo interactuamos con el software. Al disolver las capas tradicionales entre usuarios y aplicaciones, las ACIs prometen un futuro donde la interacción es más intuitiva, personalizada y accesible que nunca. No veo esto solo como una mejora incremental, es una reimaginación fundamental de nuestra relación con la tecnología.&lt;/p&gt;

&lt;p&gt;Sin embargo, como con cualquier cambio de paradigma, las ACIs plantean preguntas y desafíos que creo que nosotros, como comunidad, necesitamos abordar:&lt;/p&gt;

&lt;p&gt;¿Cómo remodelarán las ACIs los roles de desarrolladores y diseñadores? Con interfaces siendo generadas dinámicamente, ¿qué nuevas habilidades necesitarán cultivar los profesionales?&lt;/p&gt;

&lt;p&gt;¿Cuáles son las implicaciones para la privacidad y seguridad del usuario en un mundo dominado por interacciones basadas en intención? ¿Cómo aseguramos que la conveniencia de las ACIs no comprometa la protección de datos?&lt;/p&gt;

&lt;p&gt;¿Cómo podemos superar las barreras actuales de rendimiento y costo para hacer que las ACIs sean viables para aplicaciones de alto tráfico? ¿Qué innovaciones se necesitan en hardware o software para apoyar este cambio?&lt;/p&gt;

&lt;p&gt;Estas preguntas no son solo técnicas, tocan dimensiones éticas, sociales y económicas que darán forma al futuro de nuestro mundo digital.&lt;/p&gt;
&lt;h2&gt;
  
  
  Únete a la conversación
&lt;/h2&gt;

&lt;p&gt;El viaje hacia la plena realización del potencial de las ACIs está apenas comenzando, e invita a la colaboración y al diálogo. Tus ideas, experiencias y pensamientos son invaluables para navegar este nuevo panorama.&lt;/p&gt;
&lt;h3&gt;
  
  
  La especificación OpenACI
&lt;/h3&gt;

&lt;p&gt;Con este paradigma en mente, queremos proponer una especificación abierta para Interfaces Conversacionales de Aplicación. Se llama OpenACI y su primer borrador será publicado la próxima semana.&lt;/p&gt;

&lt;p&gt;Mientras tanto, puedes jugar con un prototipo muy temprano (y simplista) de una &lt;a href="https://github.com/openaci/http-node" rel="noopener noreferrer"&gt;implementación HTTP OpenACI en nuestro repositorio de GitHub&lt;/a&gt;:&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.dev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/openaci" rel="noopener noreferrer"&gt;
        openaci
      &lt;/a&gt; / &lt;a href="https://github.com/openaci/http-node" rel="noopener noreferrer"&gt;
        http-node
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      OpenACI Node implementation 
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;strong&gt;Application Conversational Interfaces&lt;/strong&gt; (ACIs) introduce a new paradigm, in a way comparable to what GraphQL was for RPC or RESTful APIs, but in this case proposing something entirely different to replace the term “API”.&lt;/p&gt;
&lt;p&gt;We can enumerate a few radically different points when compared to APIs:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;ACIs don't have a fixed or pre-established contract, schema, methods or a precise workflow. They are &lt;strong&gt;intent-based&lt;/strong&gt; rather than &lt;strong&gt;procedural call-based&lt;/strong&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;These interfaces put humans as the main consumer, &lt;strong&gt;without making a distinction&lt;/strong&gt; whether they are &lt;strong&gt;end-users&lt;/strong&gt; or &lt;strong&gt;developers&lt;/strong&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;With the previous point in mind, &lt;strong&gt;accessibility&lt;/strong&gt; is an important aspect of ACIs. Not only for inclusivity but also for convenience. ACIs are &lt;strong&gt;multi-language&lt;/strong&gt; and &lt;strong&gt;multi-modal&lt;/strong&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;As LLMs continue evolving and AI agents perform better at reasoning, they will also qualify as consumers of ACIs. In this sense, we can iterate over the concept and think of a consumer as anybody capable…&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/openaci/http-node" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;



&lt;p&gt;¡Esperamos tus comentarios! Si quieres unirte para discutir y definir la especificación OpenACI, escríbeme a &lt;a href="mailto:lfarzati@gmail.com"&gt;lfarzati@gmail.com&lt;/a&gt; o &lt;a href="https://linkedin.com/in/luisfarzati" rel="noopener noreferrer"&gt;contáctame por LinkedIn&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;[Anthropic Claude 3.5 Sonnet fue utilizado para revisar, corregir y mejorar la legibilidad de algunas secciones de este artículo, así como su traducción al Español]&lt;/p&gt;

</description>
      <category>llm</category>
      <category>api</category>
      <category>ui</category>
      <category>ai</category>
    </item>
    <item>
      <title>From APIs to ACIs: The Next Evolution in Software Interaction</title>
      <dc:creator>Dario Farzati</dc:creator>
      <pubDate>Thu, 12 Sep 2024 23:34:01 +0000</pubDate>
      <link>https://forem.com/dariofarzati/from-apis-to-acis-the-next-evolution-in-software-interaction-4d39</link>
      <guid>https://forem.com/dariofarzati/from-apis-to-acis-the-next-evolution-in-software-interaction-4d39</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In the ever-evolving landscape of software development, we've seen significant shifts in how applications communicate and interact. From the early days of monolithic architectures to the rise of microservices, each evolution aimed to make software more efficient and user-friendly. Yet, as technology advances, so does the complexity of interactions. &lt;/p&gt;

&lt;p&gt;This is where &lt;strong&gt;Application Conversational Interfaces (ACIs)&lt;/strong&gt; come into play, and I'd like to share some thoughts on a new paradigm that could reduce cognitive load and make software interaction more intuitive than ever before.&lt;/p&gt;

&lt;p&gt;Large Language Models are transforming how we interact with technology. We're entering an era where communicating with machines is as natural as conversing with a colleague. Users can now express their needs in everyday language, and systems respond accordingly. And this is the foundation of ACIs. &lt;/p&gt;

&lt;p&gt;This proposal is not just a technology upgrade – it's a paradigm shift that also raises profound questions: how will this reshape the experience for users, developers and AI agents?&lt;/p&gt;

&lt;h2&gt;
  
  
  The dissolving interface
&lt;/h2&gt;

&lt;p&gt;In an era where LLMs can respond with text, structured formats, images, voice, or even code generating dynamic visual interfaces, we must ask ourselves: how much longer will we continue building layers between us and the software? What if we stop developing traditional &lt;em&gt;user interfaces&lt;/em&gt; and &lt;em&gt;application programming interfaces&lt;/em&gt; altogether?&lt;/p&gt;

&lt;p&gt;Today, everything seems to be converging into a prompt or chat interface. While not all problems will be solved through such means, this is merely the beginning. The key isn't the interface itself but what it represents—a fundamental shift in how we interact with technology.&lt;/p&gt;

&lt;p&gt;For example, integrating a relatively complex API requires navigating extensive documentation and understanding schemas, responses, and exceptions. This not only consumes valuable time but also diverts focus from building core features.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsp7a16f32nkis9hq9lf6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsp7a16f32nkis9hq9lf6.png" alt="Stripe API spec" width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, imagine bypassing these complexities through an interface that understands your intent via natural language. This is where we can see that the era of UIs and APIs could be evolving into a new paradigm: ACIs.&lt;/p&gt;

&lt;p&gt;Before we delve into the what, how, and why of ACIs, let's briefly revisit the journey that has led us to this pivotal moment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Everything is an API
&lt;/h2&gt;

&lt;p&gt;Let's trace the evolution of APIs to understand why they were invented. Initially, when we had systems with specific algorithms and data structures solving problems in certain domains, it was efficient to encapsulate and expose them through interfaces — Application Programming Interfaces (APIs). To avoid confusion later, let's call these specific APIs "Domain APIs." Then, on top of that, we added another interface, a User Interface, that would capture user intentions and translate them into corresponding API calls.&lt;/p&gt;

&lt;p&gt;In essence, we implemented an interface (UI) to interact with another interface (Domain API).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9fyzxlrsl22ie31ewxq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9fyzxlrsl22ie31ewxq.png" alt="UI interacting with an API" width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Eventually, we also needed to interface with programs running in remote places all over the Internet, thus our interfaces or even our backend systems would interact with these remote interfaces. And so our interface-building journey led us to where we are today:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5v6heut560qkgj3p10hp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5v6heut560qkgj3p10hp.png" alt="UI and remote system interacting with API" width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  So, everything is an I(nterface)... and?
&lt;/h2&gt;

&lt;p&gt;Well, not only is everything an interface; each one operates at a different abstraction level, is written in a different language, and is built by different authors making distinct (and not always coherent) design choices in response to unique use cases.&lt;/p&gt;

&lt;p&gt;As developers, when navigating all this interfacing, we spend considerable time moving through the different layers of a system and its dependencies. We're essentially translating data from one interface to another, all while trying to keep sight of our original intent. In many cases, implementing these interfaces correctly is more challenging and tedious than writing the actual domain logic that solves the problem! &lt;/p&gt;

&lt;p&gt;A substantial part of this effort is primarily spent on &lt;strong&gt;steps 2 and 3&lt;/strong&gt; below:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Developers implement backend systems.&lt;/li&gt;
&lt;li&gt;Developers expose programming interfaces in those backend systems.&lt;/li&gt;
&lt;li&gt;Developers consume programming interfaces to build backend systems and/or user interfaces.&lt;/li&gt;
&lt;li&gt;End-users interact with user interfaces to accomplish their tasks.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This complexity often obscures our primary goal: efficiently solving the user's problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Dawn of Dynamically Generated Interfaces
&lt;/h2&gt;

&lt;p&gt;In the past 16 months or so, we've observed a breakthrough: LLM applications like ChatGPT have started using actions that consume APIs and return data in multiple formats, including structured JSON as well as HTML, CSS, and JavaScript code.&lt;/p&gt;

&lt;p&gt;We see again a user interface, in this case in the form of a prompt or chat. Like any other UI, it is written &lt;em&gt;beforehand&lt;/em&gt; by a group of engineers. But this one has a special characteristic: this prompt interface &lt;em&gt;generates new interfaces on the fly&lt;/em&gt;, particularly UIs (but not only UIs, as we'll see below).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F55xmgpr3yb4s06qmuajw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F55xmgpr3yb4s06qmuajw.png" alt="Conversational interface generating a user interface" width="800" height="632"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Today you can ask models like Sonnet or ChatGPT to create interactive interfaces with inputs and outputs, chart visualizations, SVGs, you name it. Combined with actions that enable the model to connect with APIs, this provides the user with an immediate, on-demand, ad-hoc user interface for addressing any particular task.&lt;/p&gt;

&lt;p&gt;But that's not all. With more powerful Tool Calling and Structured Output capabilities, these interfaces can generate not only UIs but also APIs, returning data in whatever format and schema you specify.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiw04btcyf2o0zp2gxfrv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiw04btcyf2o0zp2gxfrv.png" alt="Conversational interfaces generating APIs" width="800" height="508"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And this is just the beginning.&lt;/p&gt;

&lt;p&gt;The layers that sit between the end-user and the actual algorithms solving the user's needs —the intermediary interfaces designed to translate user intent into programmatic API calls— are becoming redundant.&lt;/p&gt;

&lt;p&gt;Sure, most humans are visual, and visual people still want dashboards, panels, and buttons. But the point is not about replacing UIs with prompts; the point is &lt;em&gt;what&lt;/em&gt; these UIs will really become from now on, and &lt;em&gt;who&lt;/em&gt; will have to implement them, &lt;em&gt;when&lt;/em&gt;, and &lt;em&gt;how&lt;/em&gt;. &lt;br&gt;
Is it really necessary to build a common-enough UI to cater to the average user? Trying to anticipate use cases? Adding localization for multiple languages? Ensuring accessibility? Always limiting ourselves to a visual interface? &lt;/p&gt;

&lt;p&gt;All these tasks could be dynamically addressed in the way that best fits &lt;strong&gt;Every&lt;/strong&gt;. &lt;strong&gt;Single&lt;/strong&gt;. &lt;strong&gt;User&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;These users can receive exactly the interface they need, showing the precise information they require, in a format that matches their preferences and accessibility needs: visual, audio, or text.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8jc7ofredaqyh8k7fd79.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8jc7ofredaqyh8k7fd79.png" alt="Multimodal interface" width="800" height="558"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Towards a Single Interface
&lt;/h2&gt;

&lt;p&gt;With these layers collapsing and interfaces being dynamically generated on the fly, we can envision a single interface serving all purposes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It serves developers looking to integrate functionality into their applications.&lt;/li&gt;
&lt;li&gt;It serves end-users seeking to interact directly with the system.&lt;/li&gt;
&lt;li&gt;It serves AI agents performing tasks or seeking information autonomously.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'd call this single interface an Application Conversational Interface (ACI).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2yl5fb03mzj6wqjo6xsu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2yl5fb03mzj6wqjo6xsu.png" alt="ACI: The Single Interface" width="800" height="560"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Their impact may be loosely comparable to how GraphQL changed API interactions relative to REST, but ACIs go much further, proposing an entirely new paradigm. &lt;/p&gt;

&lt;p&gt;We can outline several radical differences when compared to traditional APIs:&lt;/p&gt;
&lt;h3&gt;
  
  
  Intent-based interaction
&lt;/h3&gt;

&lt;p&gt;ACIs don't have fixed or pre-established contracts, schemas, methods, or precise workflows. They are &lt;strong&gt;&lt;em&gt;intent-based&lt;/em&gt;&lt;/strong&gt; rather than &lt;strong&gt;&lt;em&gt;procedural call-based&lt;/em&gt;&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;Instead of calling specific methods with predefined parameters, developers and users can express their intentions in natural language. For example, "&lt;em&gt;Create a new user account with admin privileges&lt;/em&gt;" is more intuitive than constructing one or two API calls with the correct endpoint and parameters.&lt;/p&gt;
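&lt;p&gt;To make the contrast concrete, here is a sketch of the two request styles side by side. The endpoint path and field names are made up for illustration:&lt;/p&gt;

```javascript
// Traditional procedural API call: the caller must know the exact
// endpoint, method, and parameter schema ahead of time.
// (Endpoint and field names here are hypothetical.)
const apiRequest = {
  method: "POST",
  path: "/v2/users",
  body: { username: "jdoe", role: "admin", sendInvite: true },
};

// ACI-style request: the caller just states the intent.
const aciRequest = {
  method: "POST",
  path: "/",
  body: "Create a new user account for jdoe with admin privileges",
};

console.log(typeof apiRequest.body); // "object": shaped by a contract
console.log(typeof aciRequest.body); // "string": shaped by the user's words
```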

&lt;p&gt;Since ACIs are not bound by rigid contracts, they can understand and adapt to new requests without requiring explicit updates or version changes. This flexibility reduces maintenance overhead and accelerates development cycles.&lt;/p&gt;

&lt;p&gt;Just imagine how you would build an API endpoint for converting names into base64, one that responds to consumer needs as diverse as these:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Simple call, could be made by an ACI client or even by end-user&lt;/span&gt;
curl localhost &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"How does my name look like in base64? It's John"&lt;/span&gt;
Your name &lt;span class="s2"&gt;"John"&lt;/span&gt; &lt;span class="k"&gt;in &lt;/span&gt;&lt;span class="nb"&gt;base64 &lt;/span&gt;is: &lt;span class="nv"&gt;Sm9obg&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;

&lt;span class="c"&gt;# Call with a structured response that can be used by an ACI client&lt;/span&gt;
curl localhost &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"Write a JSON document with &lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;code: &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;base64 &lt;/span&gt;of &lt;span class="s1"&gt;'OpenACI'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"code"&lt;/span&gt;:&lt;span class="s2"&gt;"T3BlbkFDSQ=="&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# Call with a JSON request body and a JSON response&lt;/span&gt;
&lt;span class="c"&gt;# – no different to a normal JSON API (could even&lt;/span&gt;
&lt;span class="c"&gt;# be used during an API-to-ACI transition phase!)&lt;/span&gt;
curl localhost &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"intent":"convertToBase64","name":"OpenACI"}'&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"name"&lt;/span&gt;:&lt;span class="s2"&gt;"OpenACI"&lt;/span&gt;,&lt;span class="s2"&gt;"base64"&lt;/span&gt;:&lt;span class="s2"&gt;"T3BlbkFDSQ=="&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# Call with multiple parameters (ACI calls the intent&lt;/span&gt;
&lt;span class="c"&gt;# handler multiple times - no need to change the&lt;/span&gt;
&lt;span class="c"&gt;# implementation!)&lt;/span&gt;
curl localhost &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"Show me an ascii table with the base64 representation of these names: John Connor, Sarah Connor, Kyle Reese"&lt;/span&gt;
| Name          | Base64                |
|---------------|-----------------------|
| John Connor   | &lt;span class="nv"&gt;Sm9obiBDb25ub3I&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;      |
| Sarah Connor  | U2FyYWggQ29ubm9y      |
| Kyle Reese    | &lt;span class="nv"&gt;S3lsZSBSZWVzZQ&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;      |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Now compare the API you pictured with the ACI implementation that actually responded to the requests illustrated above:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;HttpAci&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@openaci/http&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;HttpAci&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;llmName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Convert name to base64&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;entities&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;base64&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
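&lt;p&gt;Under the hood, a call like "app.intent(...)" is essentially registering a handler keyed by a natural-language description. Setting the LLM aside, a toy dispatcher sketching that idea could look like this; a naive keyword matcher stands in for the model, and none of this is the actual @openaci/http internals:&lt;/p&gt;

```javascript
// Toy intent registry: maps intent descriptions to handler functions.
// A real ACI would use an LLM to match free-form text to an intent and
// to extract entities; here both steps are faked for illustration.
const intents = new Map();

function registerIntent(description, handler) {
  intents.set(description.toLowerCase(), handler);
}

// Naive "understanding": pick the registered intent sharing the most
// words with the request. A real implementation delegates this to the model.
function dispatch(request) {
  const words = new Set(request.toLowerCase().split(/\W+/));
  let best = null;
  let bestScore = 0;
  for (const [description, handler] of intents) {
    const score = description.split(/\W+/).filter((w) => words.has(w)).length;
    if (score > bestScore) {
      best = handler;
      bestScore = score;
    }
  }
  return best ? best(request) : "Sorry, I can't help with that.";
}

registerIntent("convert name to base64", (req) => {
  // Entity extraction faked with a quoted-string match.
  const name = (req.match(/"([^"]+)"/) || [])[1] || "";
  return Buffer.from(name).toString("base64");
});

console.log(dispatch('Convert the name "John" to base64')); // Sm9obg==
```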

&lt;h3&gt;
  
  
  Human-centric design
&lt;/h3&gt;

&lt;p&gt;These interfaces position humans as the primary consumers, supporting them in any role — whether as end-users or developers. This is crucial: ACIs serve both of them through the same conversational framework. This unification simplifies the architecture and reduces the need for separate interfaces like UIs or specialized APIs. &lt;/p&gt;

&lt;p&gt;For developers, as noted above, this means quickly integrating functionality without spending time learning a new API: we simply write client code that states what we want to achieve, and the ACI interprets and executes the intent.&lt;/p&gt;

&lt;p&gt;End-users, in turn, can customize their interaction with the system on the fly. For instance, a user could ask, by chat or voice, "Show me all the documents I created yesterday" without navigating through multiple UI screens. Apps could still offer a default visual UI, but users could leverage ACIs to customize and adapt the interface to their needs and preferences, down to the smallest detail.&lt;/p&gt;
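&lt;p&gt;To sketch what that request could look like on the inside: the ACI's job would be to reduce the sentence to a structured filter over the application's data. The document store and field names below are hypothetical:&lt;/p&gt;

```javascript
// Sketch of what "Show me all the documents I created yesterday" could
// resolve to once the ACI has extracted a structured filter.
const documents = [
  { title: "Q3 report", createdAt: "2024-09-11" },
  { title: "Draft post", createdAt: "2024-09-12" },
  { title: "Notes", createdAt: "2024-09-12" },
];

// In a real ACI this filter would be derived by the model from the
// user's sentence; here it is hard-coded for illustration.
const filter = { createdAt: "2024-09-12" };

const results = documents.filter((d) => d.createdAt === filter.createdAt);
console.log(results.map((d) => d.title)); // [ 'Draft post', 'Notes' ]
```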
&lt;h3&gt;
  
  
  Accessibility and convenience
&lt;/h3&gt;

&lt;p&gt;With the previous point in mind, accessibility is a fundamental aspect of ACIs. Not only for inclusivity but also for convenience. ACIs are multi-language and multi-modal. By supporting multiple languages and modalities (text, voice, visuals), ACIs make systems more accessible to a diverse range of users, including those with disabilities.&lt;/p&gt;
&lt;h3&gt;
  
  
  Beyond human interaction
&lt;/h3&gt;

&lt;p&gt;As LLMs continue evolving and AI agents become more sophisticated, they also become consumers of ACIs. In this sense, a consumer isn't necessarily human: it's anyone capable of interacting with these interfaces using natural language. This opens up possibilities for distributed multi-agent systems that collaborate and negotiate in natural language. More likely, though, AI agents will leverage the flexibility of ACIs to agree on the best data format and exchange messages in the most optimized way, without our involvement.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Road Ahead
&lt;/h2&gt;

&lt;p&gt;ACIs represent a transformative approach to software interaction, aligning technology more closely with human communication patterns. They have the potential to reduce development overhead by eliminating the need for multiple intermediary layers; they will definitely empower users, who gain direct and personalized access to system capabilities without depending on someone else designing an interface that might not fulfill all their needs or preferences. ACIs will also foster innovation: with fewer barriers to integration, new services and collaborations can emerge more rapidly.&lt;/p&gt;

&lt;p&gt;Now, there's still a road ahead. While I believe ACIs could be implemented today for some use cases, the reality is that performance, even for the fastest models, and the economies of scale have yet to tilt in favor of widespread adoption for high-traffic applications. We are not yet at a stage where we could replace an API that handles anything above hundreds of requests per second. The current cost structure remains prohibitive for many use cases.&lt;/p&gt;

&lt;p&gt;But of course, this is just a matter of time. We've seen mind-blowing breakthroughs in just the past 12 months, and I believe the shift from APIs to ACIs already lets us reimagine how we interact with software.&lt;/p&gt;
&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Application Conversational Interfaces could be a transformative shift in how we interact with software. By dissolving the traditional layers between users and applications, ACIs promise a future where interaction is more intuitive, personalized, and accessible than ever before. I don't see this as just an incremental improvement — it's a fundamental reimagining of our relationship with technology.&lt;/p&gt;

&lt;p&gt;However, as with any paradigm shift, ACIs bring forth questions and challenges that I think we, as a community, need to address:&lt;/p&gt;

&lt;p&gt;How will ACIs reshape the roles of developers and designers? With interfaces being dynamically generated, what new skills will professionals need to cultivate?&lt;/p&gt;

&lt;p&gt;What are the implications for user privacy and security in a world dominated by intent-based interactions? How do we ensure that the convenience of ACIs doesn't compromise data protection?&lt;/p&gt;

&lt;p&gt;How can we overcome the current performance and cost barriers to make ACIs viable for high-traffic applications? What innovations are needed in hardware or software to support this shift?&lt;/p&gt;

&lt;p&gt;These questions are not just technical—they touch on ethical, social, and economic dimensions that will shape the future of our digital world.&lt;/p&gt;
&lt;h2&gt;
  
  
  Join the conversation
&lt;/h2&gt;

&lt;p&gt;The journey towards fully realizing the potential of ACIs is just beginning, and it invites collaboration and dialogue. Your insights, experiences, and ideas are invaluable in navigating this new landscape.&lt;/p&gt;
&lt;h3&gt;
  
  
  The OpenACI specification
&lt;/h3&gt;

&lt;p&gt;With this paradigm in sight, we want to propose an open specification for Application Conversational Interfaces. It's called OpenACI, and its first draft will be published next week.&lt;/p&gt;

&lt;p&gt;In the meantime, you can play with a very early (and simplistic) prototype of an &lt;a href="https://github.com/openaci/http-node" rel="noopener noreferrer"&gt;HTTP OpenACI implementation in our GitHub repo&lt;/a&gt;:&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.dev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/openaci" rel="noopener noreferrer"&gt;
        openaci
      &lt;/a&gt; / &lt;a href="https://github.com/openaci/http-node" rel="noopener noreferrer"&gt;
        http-node
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      OpenACI Node implementation 
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;strong&gt;Application Conversational Interfaces&lt;/strong&gt; (ACIs) introduce a new paradigm, in a way comparable to what GraphQL was for RPC or RESTful APIs, but in this case proposing something entirely different to replace the term “API”.&lt;/p&gt;
&lt;p&gt;We can enumerate a few radically different points when compared to APIs:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;ACIs don't have a fixed or pre-established contract, schema, methods or a precise workflow. They are &lt;strong&gt;intent-based&lt;/strong&gt; rather than &lt;strong&gt;procedural call-based&lt;/strong&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;These interfaces put humans as the main consumer, &lt;strong&gt;without making a distinction&lt;/strong&gt; whether they are &lt;strong&gt;end-users&lt;/strong&gt; or &lt;strong&gt;developers&lt;/strong&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;With the previous point in mind, &lt;strong&gt;accessibility&lt;/strong&gt; is an important aspect of ACIs. Not only for inclusivity but also for convenience. ACIs are &lt;strong&gt;multi-language&lt;/strong&gt; and &lt;strong&gt;multi-modal&lt;/strong&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;As LLMs continue evolving and AI agents perform better at reasoning, they will also qualify as consumers of ACIs. In this sense, we can iterate over the concept and think of a consumer as anybody capable…&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/openaci/http-node" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;



&lt;p&gt;Looking forward to your feedback! If you want to join us discussing and defining the OpenACI spec, write me at &lt;a href="mailto:lfarzati@gmail.com"&gt;lfarzati@gmail.com&lt;/a&gt; or &lt;a href="https://linkedin.com/in/luisfarzati" rel="noopener noreferrer"&gt;ping me on LinkedIn&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;[Anthropic Claude 3.5 Sonnet was used to review, proofread, and improve the readability of some sections in this article]&lt;/p&gt;

</description>
      <category>api</category>
      <category>ai</category>
      <category>llm</category>
      <category>development</category>
    </item>
    <item>
      <title>How I built an AI-based Telegram bot in 21 minutes with Make and OpenAI</title>
      <dc:creator>Dario Farzati</dc:creator>
      <pubDate>Tue, 06 Dec 2022 17:15:59 +0000</pubDate>
      <link>https://forem.com/dariofarzati/how-i-built-an-ai-based-telegram-bot-in-21-minutes-with-make-and-openai-ebn</link>
      <guid>https://forem.com/dariofarzati/how-i-built-an-ai-based-telegram-bot-in-21-minutes-with-make-and-openai-ebn</guid>
      <description>&lt;p&gt;No, it's not that I'm bragging, it really was &lt;em&gt;that&lt;/em&gt; easy. Let me share the experience and all the details with you. By the end of this post you should be able to build an AI bot for yourself, your friends or your business!&lt;/p&gt;

&lt;h2&gt;
  
  
  The idea
&lt;/h2&gt;

&lt;p&gt;It all started when I wanted to build sort of a &lt;a href="https://github.com/features/copilot" rel="noopener noreferrer"&gt;Copilot AI&lt;/a&gt; but for Telegram. The bot would respond to tech questions, explain concepts, or even provide answers down to code-level detail. &lt;/p&gt;

&lt;h3&gt;
  
  
  OpenAI
&lt;/h3&gt;

&lt;p&gt;For the AI part I was looking at &lt;a href="https://openai.com" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt; - they have what's called the &lt;a href="https://en.wikipedia.org/wiki/GPT-3" rel="noopener noreferrer"&gt;GPT-3&lt;/a&gt; models, which they describe as&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A set of models that can understand and generate natural language.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can read more about these models in the &lt;a href="https://beta.openai.com/docs/models/overview" rel="noopener noreferrer"&gt;Models page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;OpenAI's website has a &lt;a href="https://beta.openai.com/playground" rel="noopener noreferrer"&gt;playground&lt;/a&gt; where you can try these models without any code or configuration. The way it works is like this: you provide a text ("prompt"), and the AI will predict (or "complete") the text that would follow. Here's an example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbaoismfrc8rryv1a95b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbaoismfrc8rryv1a95b.png" alt="Screenshot of OpenAI's playground website" width="800" height="471"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I wanted to achieve the same result but via a Telegram chat. The cool thing is that OpenAI provides this functionality via an &lt;a href="https://openai.com/api" rel="noopener noreferrer"&gt;API&lt;/a&gt; that you can integrate in your own apps.&lt;/p&gt;

&lt;p&gt;Ok now I had a picture of the solution: you would simply ask the bot a question on Telegram; this question would then be sent as a prompt to OpenAI's API; and the API response would be sent back to Telegram as a reply.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsde9krrlqaru0zorwur2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsde9krrlqaru0zorwur2.png" alt="The solution diagram" width="800" height="152"&gt;&lt;/a&gt;&lt;/p&gt;
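&lt;p&gt;In code, the middle step of that diagram boils down to forwarding the message text as the prompt of a completion request. Here's a rough sketch of how that request could be built; the model and parameter choices are just what I'd reach for, not prescriptive:&lt;/p&gt;

```javascript
// Build the OpenAI completion request for an incoming Telegram message.
// Model and parameter values here are illustrative choices.
function buildCompletionRequest(question) {
  return {
    url: "https://api.openai.com/v1/completions",
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // An Authorization: "Bearer <OPENAI_API_KEY>" header goes here.
    },
    body: JSON.stringify({
      model: "text-davinci-003",
      prompt: question,
      max_tokens: 512,
      temperature: 0.2,
    }),
  };
}

const req = buildCompletionRequest("Explain JavaScript closures");
console.log(JSON.parse(req.body).prompt); // Explain JavaScript closures
```

&lt;p&gt;The first completion choice in the response would then be posted back to the chat with Telegram's sendMessage method.&lt;/p&gt;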

&lt;p&gt;&lt;em&gt;Before you say it: yes, Slack would be a better place for this bot – and I built that as well! But the Telegram version was more fun so I'm going with it for this post&lt;/em&gt; 😄 &lt;/p&gt;

&lt;p&gt;So, I got the idea, I sketched the solution, and now I needed to build it! The thing was, I didn't want to spend any time setting up a Node project, installing modules, writing code, or deploying it somewhere... maybe it was about time to try these so-called "no-code platforms"?&lt;/p&gt;

&lt;h3&gt;
  
  
  "Make" it happen
&lt;/h3&gt;

&lt;p&gt;First of all, don't get the wrong idea - this is not sponsored content (&lt;em&gt;although if Make wants to alleviate my bill for these non-profit, just-for-fun projects, I would be grateful&lt;/em&gt; 😅). &lt;/p&gt;

&lt;p&gt;In fact, I've been totally skeptical about these platforms. My bar is really high for tools that abstract away so much complexity; it's really hard to find the right angles from which to approach the problem, and to nail a UX that hides all that complexity while still offering something powerful and flexible that "just works".&lt;/p&gt;

&lt;p&gt;My first attempts with a couple of these platforms only reinforced this opinion. They sucked big time. Either they had a complicated, unintuitive, or simply ugly user interface; or they were slow; or they had a bunch of other issues that made me frown every time.&lt;/p&gt;

&lt;p&gt;But then just when I was about to give up, a colleague of mine recommended &lt;a href="https://make.com" rel="noopener noreferrer"&gt;Make&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Sure, like any tool, Make is not a silver bullet. But the truth is, I found it quite powerful for automating tasks, or even for rapidly setting up backend support to prototype an idea, maybe even to get an MVP done. &lt;/p&gt;

&lt;p&gt;It has a clean, decluttered, simple UI that only progressively expands into details as you consciously choose to dig deeper; a consistent way of doing things that takes only a few minutes to learn; and a set of primitives that may not be very diverse yet, but are really well designed and sufficient for most typical use cases. I have to say Make passed many of my high demands. 😅 ✅&lt;/p&gt;

&lt;p&gt;So with that out of the way, what else did I need? Ah, of course! The bot!&lt;/p&gt;

&lt;h3&gt;
  
  
  Telegram bots
&lt;/h3&gt;

&lt;p&gt;Creating a &lt;a href="https://core.telegram.org/bots" rel="noopener noreferrer"&gt;Telegram bot&lt;/a&gt; is fun: there's no website, no sign up, no forms — you just use a... bot. Yes, a bot that creates bots. It's called the &lt;a href="https://t.me/BotFather" rel="noopener noreferrer"&gt;BotFather&lt;/a&gt; 😂&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs8i783g0kr2bq6826ofm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs8i783g0kr2bq6826ofm.png" alt="The BotFather account in Telegram" width="688" height="966"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Making sense of everything together
&lt;/h3&gt;

&lt;p&gt;Was I missing something? Let's recap:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The Telegram bot would receive my message, and it would pass it as a prompt to OpenAI's API.&lt;/li&gt;
&lt;li&gt;OpenAI would run my prompt against their GPT-3 model and respond with a completion.&lt;/li&gt;
&lt;li&gt;The bot would use this response as a reply to my message.&lt;/li&gt;
&lt;/ol&gt;
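&lt;p&gt;The three steps above can be sketched in code as something like this (a hypothetical shape with injected helpers – Make ends up wiring all of this visually, no code needed):&lt;/p&gt;

```javascript
// End-to-end sketch of the planned flow. `callOpenAI` and `sendTelegram`
// are injected so the flow itself stays small and testable.
async function handleUpdate(update, callOpenAI, sendTelegram) {
  // 1. The Telegram update carries the user's message text.
  const prompt = update.message.text;

  // 2. Run the prompt against the model and take the first completion.
  const completion = await callOpenAI({ model: "text-davinci-003", prompt });
  const answer = completion.choices[0].text;

  // 3. Reply in the same chat, quoting the original message.
  return sendTelegram({
    chat_id: update.message.chat.id,
    text: answer,
    reply_to_message_id: update.message.message_id,
  });
}
```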

&lt;p&gt;Confident and with a plan, I was ready to jump into action! &lt;/p&gt;

&lt;h2&gt;
  
  
  Part I: creating the Telegram bot
&lt;/h2&gt;

&lt;p&gt;I did this with the desktop Telegram client because it was easier to type, copy/paste and so on.&lt;/p&gt;

&lt;p&gt;First thing I did was to look for BotFather:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foylvkbnym4846li8werf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foylvkbnym4846li8werf.png" alt="Searching for BotFather on Telegram" width="752" height="304"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After clicking its name and opening the conversation window, BotFather sent me a message enumerating all the available commands. &lt;/p&gt;

&lt;p&gt;I typed &lt;code&gt;/newbot&lt;/code&gt; and answered the couple of questions needed for creating my bot. After that, BotFather gave me a token for using Telegram's Bot API.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy0tjtsojchoftscuk8cn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy0tjtsojchoftscuk8cn.png" alt="Creating a bot with BotFather" width="738" height="960"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I copied the token into my Notes and continued my journey. Next stop? Getting an OpenAI account!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Time spent: ~3 minutes&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Part II: setting up OpenAI
&lt;/h2&gt;

&lt;p&gt;Pretty simple step, I just needed to get an &lt;a href="https://beta.openai.com/account/api-keys" rel="noopener noreferrer"&gt;API key&lt;/a&gt; that the bot would use for calling the API.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy87hgcwyaec8tmgznvpc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy87hgcwyaec8tmgznvpc.png" alt="Generating an OpenAI API key" width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I wanted to go for a quick test. But, what API should I call, and how? Fortunately the &lt;a href="https://beta.openai.com/docs/api-reference/introduction" rel="noopener noreferrer"&gt;OpenAI API reference&lt;/a&gt; is nice and clear. After browsing the docs for a little bit I found it: what I was looking for was the &lt;a href="https://beta.openai.com/docs/api-reference/completions" rel="noopener noreferrer"&gt;Completions API&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;They even provide a curl command line example I could use! I tried it and, of course, it worked like a charm.&lt;/p&gt;
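&lt;p&gt;If you prefer Node over curl, the same quick test is only a few lines (a sketch, assuming Node 18+ with built-in fetch, and the endpoint and model names as documented at the time):&lt;/p&gt;

```javascript
// Build the request exactly as the Completions docs describe it:
// POST to /v1/completions with a Bearer token.
function buildCompletionRequest(prompt, apiKey) {
  return {
    url: "https://api.openai.com/v1/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model: "text-davinci-003", prompt, max_tokens: 64 }),
    },
  };
}

// Fire the request and pull out the first completion's text.
async function complete(prompt, apiKey) {
  const { url, init } = buildCompletionRequest(prompt, apiKey);
  const res = await fetch(url, init);
  return (await res.json()).choices[0].text;
}
```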

&lt;p&gt;&lt;em&gt;Time spent: ~4 minutes&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Part III: implementation time!
&lt;/h2&gt;

&lt;p&gt;I went to Make and created a new Scenario. Scenarios are essentially programs composed of "Modules". Each module does one thing, for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Make an HTTP call&lt;/li&gt;
&lt;li&gt;Parse a JSON string&lt;/li&gt;
&lt;li&gt;Extract text from HTML&lt;/li&gt;
&lt;li&gt;Iterate over an array&lt;/li&gt;
&lt;li&gt;Branch the flow on given conditions&lt;/li&gt;
&lt;li&gt;Set a variable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;... and so on.&lt;/p&gt;

&lt;p&gt;So what you do is basically add and connect modules, customizing them to your needs.&lt;/p&gt;

&lt;p&gt;However, that's not all, and the game changer here is &lt;a href="https://www.make.com/en/integrations" rel="noopener noreferrer"&gt;the number of integrations&lt;/a&gt; that Make has with existing services out there. In that sense, it's no different from other tools such as &lt;a href="https://zapier.com" rel="noopener noreferrer"&gt;Zapier&lt;/a&gt; or &lt;a href="https://ifttt.com" rel="noopener noreferrer"&gt;IFTTT&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;So for example, would you like to do some stuff with Reddit or TikTok? Lots of modules for that:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6sjyd91itubgy4rftg68.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6sjyd91itubgy4rftg68.png" alt="Examples of what you can do with Reddit or TikTok" width="800" height="632"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And, you may have guessed it already - there are modules for Telegram as well!&lt;/p&gt;

&lt;h4&gt;
  
  
  Receiving Telegram messages
&lt;/h4&gt;

&lt;p&gt;Let's go back to the flow I sketched. First thing I needed to do was to receive the Telegram message that was sent to the bot. For this I needed the "Watch Updates" module:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fshq7bgud1owkaqfgl5tl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fshq7bgud1owkaqfgl5tl.png" alt="The " width="800" height="259"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I added the module to the scenario and it opened up a configuration panel:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb3q3svek5f5zvfkppqhi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb3q3svek5f5zvfkppqhi.png" alt="Configuring the module" width="800" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This module wanted to know the webhook that was gonna be used to receive the updates. Since this was a new project, I didn't have one, so I went and added a new webhook. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfonfzz32ssvqfgjch21.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfonfzz32ssvqfgjch21.png" alt="Adding a new webhook" width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now it was asking me which Telegram connection the webhook should be added to but, again, new project, so no connections existed yet! I proceeded to add a new connection.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqvh1tisvvzrtlspf12a7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqvh1tisvvzrtlspf12a7.png" alt="Adding a new Telegram connection" width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I copy/pasted the token I had saved in my Notes, saved the connection, then saved the webhook, and after a few seconds I was back in the module configuration panel, only this time it said:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We read your mind, we attached this webhook automatically for you.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftox07lu0u365rff86iv6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftox07lu0u365rff86iv6.png" alt="Configured module" width="800" height="374"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This was cool! To appreciate what Make just did for me, let's imagine I had to do it myself:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I would have to create an Express app&lt;/li&gt;
&lt;li&gt;Then add a route that follows the Telegram webhook spec (meaning, I need to read and understand the docs as well)&lt;/li&gt;
&lt;li&gt;Then implement logic for parsing the Telegram webhook notification message (again, reading docs to see how the message is formatted and structured)&lt;/li&gt;
&lt;li&gt;Then use ngrok, localtunnel or similar to temporarily expose my local server&lt;/li&gt;
&lt;li&gt;Then either on the side, or as part of the app bootstrap process, make a call to &lt;code&gt;setWebhook&lt;/code&gt; in the Telegram API and give it the public URL of my server (again, reading docs involved)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;and let's also count going back and forth multiple times between some of those points unless I'm lucky enough to get it 100% right in one go. Only after that, I would be ready to test.&lt;/p&gt;

&lt;p&gt;But this no-code platform was already shining, having solved all that stuff in a mere couple of minutes after I picked the module.&lt;/p&gt;

&lt;p&gt;Now it was time to give it a test! The module was configured so it should be able to get my messages.&lt;/p&gt;

&lt;p&gt;I went back to Telegram, searched for my bot, opened a conversation window, and clicked the Start button. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgsxzl30ga92qd4zw4vl6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgsxzl30ga92qd4zw4vl6.png" alt="Looking for my bot on Telegram" width="752" height="314"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There I was, in an empty, silent chatroom with my bot. Before it became too uncomfortable, I went back to Make and hit the Play button to run the scenario:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs322kdm2auqqo35akdhd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs322kdm2auqqo35akdhd.png" alt="Running the scenario" width="800" height="310"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Telegram Bot module started spinning, indicating it was waiting for a message to come. I went back to the chatroom and wrote my bot a message:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2f9auc7u3a7wcski3ece.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2f9auc7u3a7wcski3ece.png" alt="Sending my first message to the bot" width="740" height="74"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Back in Make, the scenario was now stopped, but a bubble coming out from the module indicated a message had been received:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F22tdap0242qijf90m684.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F22tdap0242qijf90m684.jpeg" alt="Inspecting the received message" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Amazing!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Time spent: ~5 minutes&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Connecting the bot to OpenAI
&lt;/h4&gt;

&lt;p&gt;Now I needed to send this message as a prompt to OpenAI.&lt;/p&gt;

&lt;p&gt;I looked for OpenAI modules:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuqhh6snolsykpiuv0ijk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuqhh6snolsykpiuv0ijk.png" alt="There is no OpenAI module yet" width="800" height="209"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ok that would have been too easy. I hope Make comes up with an OpenAI integration soon (pretty sure they will).&lt;/p&gt;

&lt;p&gt;But hey, not a big deal. We said OpenAI offers an API, more specifically an HTTP API. And guess what - Make has modules for that. One of them is "Make an API Key Auth request". Exactly what I needed!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1u2d8g9bggiysn3fqbqq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1u2d8g9bggiysn3fqbqq.png" alt="HTTP request module" width="800" height="239"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After adding it to the scenario, it asked me for a bunch of stuff. First of all, the credentials that should be used for the API call. I clicked Add, since I hadn't added any credentials so far, and typed in the values. I knew where the API key should go from the docs and the example I had run before:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffcjfdf4guqw9pjfrwufs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffcjfdf4guqw9pjfrwufs.png" alt="OpenAI expected authentication" width="800" height="125"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So I completed the credentials configuration and clicked Create:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mq92k1kuato4j9e3geu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mq92k1kuato4j9e3geu.png" alt=" " width="800" height="510"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I then proceeded with the module settings. Something you'll notice when going through the configuration fields is that anytime a field accepts a dynamic value, Make shows a popup with a lot of values or "placeholders" you can embed in it: not only the available outputs from other modules (such as the Telegram Bot module, in this case), but also utilities such as string, numeric, and date functions. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fttlp5dpsuv8jcrgez2ik.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fttlp5dpsuv8jcrgez2ik.png" alt="Configuring the HTTP request module" width="800" height="713"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This was really convenient, since I wanted the &lt;code&gt;prompt&lt;/code&gt; property in the request body to be whatever message text we received from Telegram. I needed some way to express the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "model": "text-davinci-003",
  "prompt": "&amp;lt;the-telegram-message-here&amp;gt;"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, as you can see, it was a very simple thing to do. I just positioned the cursor between the quotes in &lt;code&gt;prompt: ""&lt;/code&gt;, and picked the placeholder for the &lt;code&gt;Text&lt;/code&gt; property of the &lt;code&gt;Message&lt;/code&gt; object coming from the Telegram Bot module:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvr2exapxx77a9nwt7tfj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvr2exapxx77a9nwt7tfj.png" alt="Embedding dynamic values in module inputs" width="800" height="177"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Keep in mind that Make can only suggest values it knows about. If I hadn't run the Telegram module before, there wouldn't have been any output; and without output, Make doesn't know the structure or values the Telegram module can "spit out" after execution. &lt;/p&gt;

&lt;p&gt;In a way this is OK, but since we are talking about well-defined APIs here, I would have expected Make to know the schema of the modules it provides and let me choose any property of that schema, regardless of whether I had run the scenario or what its result was.&lt;/p&gt;

&lt;p&gt;Anyway, I saved the module and was ready for another test! I hit the Play button again, went back to Telegram, and wrote another message.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7ayh6iuhaovsylf57jn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7ayh6iuhaovsylf57jn.png" alt="Writing another test message on Telegram" width="792" height="204"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Back in Make, this time it took several seconds for the update to arrive – so, if this happens to you, give it a good 10, maybe 20 seconds. The message finally arrived, the flow then continued to the HTTP module, which made the request to the OpenAI API passing the Telegram message, and I could inspect the API response as well:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgkdkz7xjbxenkxv4h6lk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgkdkz7xjbxenkxv4h6lk.png" alt="Inspecting inputs and outputs in the HTTP module" width="800" height="780"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So exciting! But no time for celebrations yet, we need to send that response back to Telegram!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Time spent: ~6 minutes&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Replying back on Telegram
&lt;/h4&gt;

&lt;p&gt;I knew there was another Telegram module for sending messages:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fenxycpabrxhk00x8w1p6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fenxycpabrxhk00x8w1p6.png" alt="Send message module" width="800" height="321"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Configuration for this one was super straightforward - the connection was automatically selected, and for the rest of the fields it was just a matter of picking up the right values from the right modules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I extracted the Chat ID from the original message, since I wanted to respond in the same chatroom.&lt;/li&gt;
&lt;li&gt;The message text would be the OpenAI response, which comes in the &lt;code&gt;choices&lt;/code&gt; array (take note: arrays in Make start at 1, not 0!).&lt;/li&gt;
&lt;li&gt;I wanted the message to show up as a reply in Telegram, so I also extracted the message id from the original message.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiehusapbxcxkoruqrr8p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiehusapbxcxkoruqrr8p.png" alt="Configuring the send message module" width="800" height="1199"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Time to test it again? Of course! I hit that Play button one more time:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frrc859vatzksp7mw1s8g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frrc859vatzksp7mw1s8g.png" alt=" " width="800" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Wrote another message on Telegram:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmckdkeadwfag94gf4a8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmckdkeadwfag94gf4a8.png" alt="Writing another message to my bot..." width="758" height="118"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and I waited... I didn't even want to check the Make scenario. I just waited for the magic. And a few seconds later...&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmpr36i3z12pumsprlrgu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmpr36i3z12pumsprlrgu.png" alt="... and receiving its response!" width="750" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🙀 🙀 🙀 🙀 🙀&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/3oz8xxMxaYh11PXifu/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/3oz8xxMxaYh11PXifu/giphy.gif" width="480" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Time spent: ~3 minutes&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;About 21 minutes later, I had a running AI bot. I invited the bot to a Telegram group I have with some friends, and oh boy, did we have fun! The text-davinci-003 model is super awesome. There are cheaper (and lower-latency) models too; I suggest you test which one best fits your needs. &lt;/p&gt;

&lt;h3&gt;
  
  
  To be continued!
&lt;/h3&gt;

&lt;p&gt;But, hold on. This is not the end of the story. I wanted to make several improvements to the bot - so I did. &lt;/p&gt;

&lt;p&gt;For example, I didn't want the bot to react to absolutely every message: when the bot is in a group, that's a guaranteed recipe for chaos and spam. This turned out to be fairly simple with Make. I encourage you to figure it out yourself before I get to write the follow-up post. 😉&lt;/p&gt;

&lt;p&gt;Second, I wanted the bot to keep the thread of the conversation. Otherwise, every new message it receives effectively starts a brand-new conversation. This required a little more effort, but again it was done in about half an hour, thanks to Make.&lt;/p&gt;
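&lt;p&gt;If you want a head start on the threading part, one common approach (a sketch, not necessarily what I ended up doing) is to keep the last few exchanges and prepend them to each new prompt:&lt;/p&gt;

```javascript
// One common way to fake conversational memory with a completions model
// (a sketch): replay the last few question/answer turns before the new
// message, so the model sees the conversation so far.
function buildThreadedPrompt(history, newMessage, maxTurns = 5) {
  const recent = history.slice(-maxTurns); // keep the prompt short
  const transcript = recent
    .map((turn) => `Human: ${turn.question}\nAI: ${turn.answer}`)
    .join("\n");
  return `${transcript}\nHuman: ${newMessage}\nAI:`.trimStart();
}
```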

&lt;p&gt;Stay tuned, I'll post the rest of the story soon!&lt;/p&gt;

</description>
      <category>watercooler</category>
    </item>
  </channel>
</rss>
