Serverless Chats

Episode #31: Voice Automation with Serverless with Aleksandar Simovic

About Aleksandar Simovic

Aleksandar is an AWS Serverless Hero and an experienced senior software engineer at Science Exchange, a biotech company based in Palo Alto, California, that helps scientists, research laboratories and big pharma companies move faster in experimentation and research. He is a co-author of the book “Serverless Applications with Node.js”, published by Manning Publications. He is based in Belgrade and is a co-organizer of JS Belgrade, Map Meetup Belgrade and Serverless Belgrade. He is one of the core team members of Claudia.js and a contributor to AWS SAM, AWS CDK, AWS Lambda Builders and many other open source libraries.


Transcript

Jeremy: Hi, everyone. I'm Jeremy Daly, and you are listening to Serverless Chats. This week, I'm chatting with Aleksandar Simovic. Hi, Aleksandar. Thanks for joining me.

Aleksandar: Hi, Jeremy. Thank you for having me. It's awesome to be here.

Jeremy: You are a senior software engineer at Science Exchange, plus you're also an AWS Serverless hero. Why don't you explain to the listeners a little about yourself and what you've been doing at Science Exchange.

Aleksandar: Yup. You're right. I'm a senior software engineer at Science Exchange, and I've been doing serverless for a bit more than four years at the moment. Yeah, there's a lot of titles here: AWS Serverless Hero, where I work with two other serverless heroes, Gojko and Slobodan, on Claudia.js, one of the first frameworks for serverless. I also co-authored a book, Serverless Applications with Node.js, with Slobodan, and I run many meet-ups on the JavaScript, serverless and Wardley Maps scene in Belgrade, Serbia. My main focus is serverless and business strategy, basically building products with serverless and Wardley Maps.

Jeremy: Awesome, all right, so I want to talk to you about something today that maybe is not going to seem like it's about serverless, but I think you and I will agree that it very much so is. That has to do with voice automation or the ability to use voice integration, I'm sorry, voice interface technology. I think that the ability to control something with your voice is absolutely the future of how pretty much most interactions are going to go. Maybe I'm a little bit crazy here, but I think you sort of agree with me?

Aleksandar: Yeah, this is something that there's a lot of heated discussion about, but I'm going to just tell you a story of this Christmas I saw my seven-year-old nephew, who basically doesn't ... He's Serbian. He doesn't know English. He doesn't know how to type properly. He doesn't know the Latin letters. I saw him using the phone in a very different way than we used to use it. He basically started ... He only uses the phone by using the Google Voice function, so he opens up the phone and he just presses the Google search function and he basically just says what he wants without even typing or anything.

For him, that was the easiest way to interact with technology. And that's something which blew my mind, as I saw that the way we are interacting with technology has evolved so much that in our age we sort of ... We started tapping on the iPhones and everything, and now we have a new kind of age slowly creeping in using voice.

What's surprising is that for many people who are not used to phones, who are not used to the traditional ways of using technology, voice has become a normal thing, something very ordinary.

Jeremy: Yeah, and I promise the listeners we're going to get to why serverless is important here, but I want to just quickly start with ... just sort of lay this out, like lay out the groundwork here and what we mean by voice interface technology. When we started with visual interfaces we were using desktops or computers, and then everything started shifting to mobile, and companies started thinking mobile first. Now there's this thing, sort of voice first, right?

Aleksandar: Yeah.

Jeremy: We've seen this with Alexa and Google Home and Siri and some of these other things. It started very simple, where we were saying like "Oh, Alexa, play this song." Or, "Alexa, set a timer," or things like that, and I hope people aren't playing this over the speakers so that their Alexa devices are going crazy. I should say, "Alexa, order 100 rolls of toilet paper." But these sorts of interfaces now have become much more sophisticated.

The technology's much more sophisticated, and now people can do very, very complex things. I want to get into that in a minute, but when I think about voice interaction or this idea of using your voice to control different systems, and of course this home automation and all this kind of stuff, this was sort of predictable right?

Aleksandar: Yeah, so voice ... As you can see, in technology everything evolves, and everything evolves so fast. The main issue that we have is how do we anticipate change. How do we anticipate what's going to happen? Luckily, maybe around 15 years ago, you know, something called Wardley Maps appeared, a kind of strategic map developed by Simon Wardley, one researcher at ... one amazing, actually, researcher, and a former CEO. He discovered this way to actually anticipate change and have situational awareness of how things are going and evolving. Many of your serverless listeners have already heard about him, but he basically created this concept called Wardley Maps, which are strategic maps of a business landscape.

They are represented in the form of a value chain of components, which evolve over time. Now that doesn't sound very, like, novel, for some people maybe, I don't know, but basically he created a very visual map, a visual way of mapping the business surroundings. Based on that, you're able to anticipate how things are going to evolve. For example, we know about electricity, how electricity was something novel, new, unknown, coming to a point where it's commoditized, industrialized. I mean all of our common lives are kind of pointless without electricity at the moment.

Our technology and the things we do are pointless without it. And as Simon developed this amazing mapping technique, and basically a structure around strategy, he found out different things that were going on. For example, that as things become commoditized, you are able to build new things on top of them. We can see that with electricity we got radio. We got television and we got computers, the internet, and we came here to serverless.

So, basically, what happened, I mean what Simon saw 15 years ago, is that there's going to be a ... He actually even created the first serverless technology in his company, and he basically, 15 years ago, he said there's going to be something, such as AWS Lambda. There is going to be something where you're going to have a runtime, a runtime as a commodity, where you won't have to think about servers, where you won't have to think about infrastructure, so basically he developed a way how do you anticipate change and how do you anticipate where is it going and what's going to happen.

Of course, someone's going to say, "Well, it's not a crystal ball. You can't see that much." Well you can't see that far ahead, but you can see things that are in several layers on top of the current technology which we have, and this is where we came to voice.

Jeremy: Yeah, no, so I think you're right about this idea of once things become commoditized, then being able to build things on top of those is ... or adding more value on top of those is a huge thing. That's, like you said, where we come to voice here, so serverless, which is runtime, has basically been commoditized, and I should take that back, so that I don't get in trouble for saying serverless is only a runtime, but functions as a service, let's use that, functions as a service is a commoditized runtime.

Aleksandar: Sure.

Jeremy: It's not just AWS who's doing it. You've got Microsoft Azure. You've got GCP. You've got a ton of open source projects, everything that runs on top of Knative-

Aleksandar: Oracle.

Jeremy: ... and Oracle, yes, yes, the new Oracle functions, all kinds of crazy things like that. But now that you've commoditized the ability to process ... and not only that, it's not only just processing business logic, it's also the natural language processing and the parsing of the voices and all that kind of ... that's all been commoditized now through different providers. So now Amazon Web Services, and I guess it's Amazon in general really, with their Alexa device, they have created a completely commoditized platform where it will do all of the voice recognition. It'll do all of the slot filling, and we can talk about that later. And then it passes that off to a Lambda function, which is a commoditized runtime, and then you can process your business logic off of that.
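The hand-off Jeremy describes, where the platform recognizes the speech, fills the slots, and then invokes a Lambda function, can be sketched roughly as below. This is a minimal illustration rather than code from the episode: the intent name `PlaySongIntent` and the slot name `song` are made up for the example, and a real skill would more likely use the Alexa Skills Kit SDK than hand-built dictionaries.

```python
def build_response(text, end_session=True):
    """Wrap plain text in the JSON envelope an Alexa custom skill returns."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Entry point invoked by Lambda with the already-parsed voice request."""
    request = event.get("request", {})

    # The user opened the skill without asking for anything specific.
    if request.get("type") == "LaunchRequest":
        return build_response("What would you like to hear?", end_session=False)

    if request.get("type") == "IntentRequest":
        intent = request.get("intent", {})
        # "PlaySongIntent" and its "song" slot are hypothetical names.
        if intent.get("name") == "PlaySongIntent":
            slots = intent.get("slots") or {}
            song = (slots.get("song") or {}).get("value")
            if song is None:
                # The slot was not filled, so ask a follow-up question.
                return build_response("Which song?", end_session=False)
            # Business logic would go here; the sketch only confirms.
            return build_response(f"Playing {song}.")

    return build_response("Sorry, I did not understand that.")
```

The function never touches audio: by the time it runs, the utterance has already been reduced to an intent name and slot values, which is the commoditized part of the stack.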

Aleksandar: Exactly, and what's interesting is now that's ... I don't know, maybe 20 years ago, when we saw the first Star Trek episodes and whenever we saw people talking to computers and interacting, we were thinking it's fantasy, but at the moment now it's the reality. We see that Alexa and other competitors are present in our everyday homes, and what's even more interesting is that now you can ... I don't know, I think Amazon reported about a 300% increase in shopping on Alexa just last year, so this is kind of ...

We're coming to a point where we have this new interface, a new way of interacting with software, and a new way of interacting with other entities, other companies or products. So, basically we're coming to a point where everybody is going to start booking more and more things through voice. I mean we already see that. For example, there's also been a report of a developer who actually started earning over $10,000 each month for just one Alexa skill, so things are going super crazily there, and this is still of course kind of like the genesis where ... I mean where people are custom building their own skills, kind of trying to discover how they can use voice and in which way.

If you're familiar with Wardley Maps, you can even anticipate that maybe in five to 10 years we're going to have this race of these intelligent agents that are going to appear everywhere. We're going to have like ... for example, I don't know if you remember 2014, 2015, you probably do remember when Lambda came out, the whole "how could we use Lambda?" There were these cases where people were thinking, you know, I have this [inaudible 00:10:58] that kind of works for me, but where could I put Lambda, and they started doing small chunks, pieces, a piece here, a piece there, maybe I'm going to do a small function as a converter and we get a PDF converter or something like that.

And now we're starting to see the same thing appearing in the voice space, where we have these voice skills, that are of course naturally, completely serverless, because that's the only ... that's like the most recommended way to interact with an Alexa skill. Basically, here we see these small pieces where people are actually building small building blocks. You are, for example, you can order, I don't know ... Naturally, of course, you can order things from Amazon or whatever, but you're now slowly starting to see, for example, print, give me some report, or send this or send a message to someone else. And we've seen the appearance of these Echo Shows in the past two years, where you can even see when something is happening in front of you.

Jeremy: Right, it's like a visual interface that's on top of your voice commands?

Aleksandar: Exactly, and what my kind of prediction, and things that I'm working on, I'll talk about it later, is that we're going to come to a point where things are going to be automated using Alexa on many manual things that we're already doing right now, I don't know, office, office manager thing, office manager tasks, like I don't know, send somebody a reminder or schedule a meeting.

Actually we already can see that using Alexa for Business, but all these small pieces are starting ... Like people are starting to discover how they can easily use it, but as serverless evolved, and now people are actually building huge applications, like enterprise-scale applications, on serverless, and we saw that at re:Invent this last year.

This is how things are going to evolve with voice as well, so we're going to see an explosion of higher order kind of software, like more complex software, that's ... You might have an intelligent agent or an Alexa skill that's going to be able to do some financial or maybe do your taxes, you don't know, you know?

Jeremy: Yeah.

Aleksandar: So, things are slowly building. These building blocks are appearing. We see AWS is building this whole serverless ecosystem around itself, where it's going to be a piece of cake actually combining these components and creating something out of the box.

And here we come to a point that ... so something which I've been building in the past, let's say, a year and a half or two years, something called The Computer, which is basically building software with voice. As much as it is, it's ... There's a video about it. I guess you'll put the link there.

Jeremy: Sure.

Aleksandar: Basically, I created these first prototypes of how you can build software using voice.

Jeremy: Yeah, and so this project, and I think you originally called it Jarvis, right?

Aleksandar: Yeah.

Jeremy: It was sort of a Tony Stark type thing.

Aleksandar: Yeah.

Jeremy: But I think that's absolutely fascinating, and before we get into that project though, because I do want to talk a little bit about that. The voice interface stuff that you deal with now, so we talked about things being commoditized, and obviously things get smarter every single day, right? So we know from the re:MARS conference that AWS had, that Alexa can now do emotion and things like that. There's some scary stuff happening there, but at the same time also some really interesting stuff.

And so as someone who has built a couple of skills, just sort of more playing around with it or whatever, it's very prescriptive, right? Like you have to really script out how these conversations flow. So, if you say Alexa open or Alexa use this particular skill, and then you say do this and then do that, you have to kind of outline how that conversation is going to work. I know Amazon recognized this. I know other companies have recognized that this is sort of a problem.

So these intelligent agents, and I think like Bixby is one, now Alexa Conversations, there's a few of these sorts of tools and capabilities. What are those about? Because that to me is really interesting.

Aleksandar: Yeah, well honestly, yeah, it can sound a bit scary. To be honest, the moment I heard that Alexa can recognize emotion, I was kind of initially scared, but then I discovered that actually this is an amazing feature where for the first time, I could say ... maybe I'm wrong about it, but like in human history, a machine's going to be able to understand whether this human is really angry at me or really sad or feeling upset, from a voice kind of perspective.

By the way, that's being led by a friend I met at Amazon called Victor Rozich, who is a senior machine learning scientist at Amazon Alexa, and I was amazed when I saw that presentation by his boss, I can't remember her name, sorry about it, and him. I was amazed at how that is possible. We see that every single piece, even this ... the whole Alexa device, and even the corresponding technology behind it, is evolving as we speak, and things and patterns are emerging. For example, I guess all of you remember the first voice agent was Siri, right?

Siri was ... people were amazed, like, I can tell Siri this, I can tell Siri that, but again, those were pretty basic things. It was a novel thing. People didn't understand what it was. I remember there was a podcast with Adam Cheyer on voice, because Adam Cheyer is actually one of the founders of Siri, and, I don't know if you're familiar with this, but Siri actually was first an application. It was first an app.

And then when they released it on the app store, if I recall correctly, Steve Jobs actually contacted Adam Cheyer, and he wanted actually Siri to be part of the ecosystem. I think Apple really lost a big advantage it had over everyone else. It could be like a real competitor to Alexa, and at the moment, we don't see it that much. We just see this kind of Siri shortcuts that appeared and so forth.

But anyway, the thing is, these developers from Siri went to Bixby, and they worked at Samsung, and what is interesting is that they have actually discovered a better way to handle skills and voice applications and voice agents, intelligent agents.

And they have actually created ... Bixby actually functions in a very different way than an Alexa skill. You make those capsules where ... Actually, you teach Bixby how to interact with a certain API or something like that, you know? You don't actually ask Bixby to create an app or to ask another application or a skill to do something like that, because let's ask ourselves, how many applications on our phones do we know off the top of our heads? At max, 30 or 40. How many of those do we know that are residing in our Alexa device? Probably far fewer.

Jeremy: Yeah, exactly. I can never remember what the name of the skill is, and that's the worst.

Aleksandar: Yeah, exactly, and you're like, okay, what was the name of the skill? It doesn't work that way actually. So, the guys at Viv Labs, I think that's the name, that's actually the company there, so they have actually evolved this way of interacting. They have found out that it doesn't work to have like 500 applications, 500 Alexa skills. It doesn't work, but actually you have these capsules where you actually can teach ... that's actually Samsung Bixby, you can teach it a certain, let's say, application or some method of interaction.

So, for example, to translate it to Alexa, you could say, "Alexa get me ... order me some cab or make an appointment with my ... I don't know, doctor," and it would actually invoke those skills you made, but you wouldn't interact with those skills by name. You wouldn't say, "Alexa, ask this to do that." You could just say, "Alexa, do that." Which is more convenient, and it's much more communicative for people, and we can actually see that with Alexa Conversations.

So, we see that Amazon has kind of discovered that. We have Alexa Conversations, where one skill can actually call another skill from another developer, so we see that this is evolving that way. I'm not the person who discovered this. I'm just repeating, in a way, Ben Basche, who is a product manager at MultiChoice, I think in South Africa, and he actually talks a lot about how intelligent agents are evolving and what's the master agent and all of these other things, so yeah.

Basically, there's a lot of people now investing in voice. You can see that even voice by itself is evolving, and what is particularly interesting for serverless, and from the aspect of voice, is that 90% or maybe even more than that of voice skills are actually built on serverless apps. So, what happened is that now we have, besides a web and a mobile interface, a voice interface that we should start thinking about, as for people voice is one of the most natural ways of interacting with technology.
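The name-free invocation Aleksandar describes, where a "master agent" picks the right capability instead of the user remembering a skill name, can be illustrated with a toy router. This is a deliberately naive sketch under stated assumptions: the capability names and keyword sets are invented, and matching by keyword overlap stands in for the natural language understanding that Bixby or Alexa actually perform.

```python
import re

# Hypothetical capabilities and the words that hint at them.
CAPABILITIES = {
    "ride_booking": {"order", "cab", "ride", "taxi"},
    "appointments": {"make", "appointment", "doctor", "schedule"},
}


def route(utterance):
    """Return the capability whose keywords best match the utterance.

    Returns None when nothing matches, so the caller can ask the user
    to rephrase instead of guessing.
    """
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    best, best_score = None, 0
    for name, keywords in CAPABILITIES.items():
        score = len(words & keywords)
        if score > best_score:
            best, best_score = name, score
    return best


if __name__ == "__main__":
    print(route("Alexa, order me a cab"))
    print(route("make an appointment with my doctor"))
```

The point of the sketch is the shape of the interaction: the user states a goal, and the agent, not the user, is responsible for knowing which capability handles it.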

Jeremy: Yeah, all right, so I totally agree with this idea of these intelligent voice services or these intelligent agents, because that's something that is such a problem or I think a limitation, even when you're using something like Siri, you're ... You still have to remember what it is that you're trying to ask it to do.

Sometimes there's very specific ways that that needs to happen, so being able to just say Alexa order me an Uber, that's probably easy to say, order me an Uber, but if you said something like what's the weather for the next 10 days, or something, and rather than it accessing the default weather app that Siri's got, that Alexa's got built in, like if there was a custom skill that you wanted it to access ... and then one thing that's very cool that I'm pretty sure Siri does, and I know for a fact that Alexa does now, is it actually does voice recognition, so not voice recognition in terms of understanding what you're saying, but understanding who's talking to it, which is very, very cool.

Because now if you set that up in Alexa, you can say, Alexa who am I, and it will tell you which user it thinks you are. It's very accurate, so that's kind of a cool thing. All right, so I do want to get more into some of the other business automation things that we might be able to do, because ... and I'm thinking of this, yes, there's all kinds of great things that you can do from a home automation standpoint, and yes, order me a cab or order me an Uber or those sorts of things. I think there's going to be some really powerful business use cases, but walk us through the Jarvis or The Computer project that you did, and just explain basically what that process was and how that worked, because I think this was an interesting use of a voice interface.

Aleksandar: Yeah, well, I mean as I mentioned from the beginning, using this concept called Wardley Maps you can predict how things are going to happen. Basically, as we saw with serverless, at the beginning people were doing small scale cellular programs in serverless functions and slowly started to evolve, as I mentioned before.

We can see that that's going to happen with voice as well, so at the moment, voice can do some very simplistic things, but in the future, it will probably be able to read your email. You can probably type. You can use voice to send an email, just maybe even do CRM, like for example such as seeing the ... show me the last 100 orders that somebody did in my platform or something like that.

Of course, what I mentioned is some people are going to get scared, like am I going to lose my job if Alexa is going to be able to do that or something, but actually what's going to happen is there's going to be an explosion of even more engineers and even more people required to actually operate these things.

We'll need more and more people, and nobody should be afraid of losing their job. As these small-scale services get integrated, like these small Lego building blocks that AWS is providing us, we're actually going to have ... We're going to require more engineers to work on much more complex and larger problems in this space we haven't yet discovered.

There's actually a nice saying by a friend of mine, who was actually a software development manager in Alexa, [Ben Dimage 00:24:55], who said, "What is the problem you want to work on? What is the problem you want to solve? Do you want to solve the problem of manually managing servers or whatever, or do you want to solve business problems that are of higher customer value and importance?"

So, in a sense, it's the same thing with voice, as small things that get automated, we're going to be able to work on larger problems. As we see, when electricity was invented, more jobs were created with radio and everything.

Aleksandar: Anyway, but I believe that the next step after solving these more complex software tasks using voice, we're going to even be able to manage robots with IoT. Maybe Ben Kehoe, another serverless hero, who works at iRobot, maybe we're going to be able to tell your iRobot to go clean the living room and now clean the bathroom or whatever, and it's going to do it for you. Like, I don't know if you remember Rosie from The Jetsons.

Jeremy: Yes, of course.

Aleksandar: So, basically Roomba will be the Rosie of our age, but we see this. I have to come back again to this: Simon Wardley predicted this basically 10 years ago, speaking to everything in the future with voice, and ... nobody believed him.

Because it kind of does sound weird, and it kind of does sound like you're Nostradamus or something, but it's just understanding how we interact at home with technology, and which problems we are solving along the way. And you and I discussed before, maybe, this Alexa for Business, where we have the meeting room scheduler, linking email calendars, to-dos with reminders, but I have already seen that there are some skills and some people building assistants for bio and laboratory, retail industry. Things are evolving super fast.

Jeremy: Yeah.

Aleksandar: I think in five years, it's going to be like ... not maybe five years, but 10 years max, we're going to see something happening with voice, what's happening with serverless right now, everybody jumping on the bandwagon and pushing in all directions.

Jeremy: Yeah, they should jump on it sooner, because ... So the point that I was trying to get to was, and you made a very good point about the sort of encapsulating these pieces of business logic into these building blocks, and maybe they're not only business logic. Sometimes it's a building block that might be an API interface or some other interface, but what you did with that computer project was ... and it's hard to say The Computer project, but what you did with the Jarvis/The Computer project, was you took some of these pre-configured blocks or these Lego blocks that were in AWS, and then you used voice to basically assemble them and launch an application with them.

And that application can do multiple things, and that's why ... What's cool about that to me is that shows a very extreme, in my opinion, sort of an extreme sort of no-code approach to building something with your voice. But there are things in between that, business processes, that would be a lot less complex, and fairly easy to implement, with the technology we have right now, and so I think we were talking about this at one point, where we were saying that when you're a developer and you're building these backend systems, sometimes the front end piece of it might be the harder thing for you to build, to visualize that or give people some way to interact with that.

And of course with things like the Alexa Presentation Language, you can visualize some of that without having to do any real design work, but I see this as something that could be like, let's say you're just ... You're a manager at some company, and you want to see the most recent inventory numbers, so you say, "Alexa, show me the most recent inventory numbers," and either that shows right up on an Echo Show, or it sends it to a dashboard, a wallboard somewhere, or it sends you an email with a PDF report in there or something like that. Those types of processes now, those are possible today.
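Jeremy's inventory example could look something like the sketch below: the same request is answered on screen when the device has one, and read aloud otherwise. The inventory data is a made-up in-memory list, the function names are illustrative, and the APL document is kept to a bare minimum, so treat it as an outline of the idea rather than a production skill.

```python
# Hypothetical inventory data; a real skill would query a database or API.
INVENTORY = [("widgets", 120), ("gadgets", 45)]


def has_screen(event):
    """Check whether the requesting device reports APL (screen) support."""
    interfaces = (
        event.get("context", {})
        .get("System", {})
        .get("device", {})
        .get("supportedInterfaces", {})
    )
    return "Alexa.Presentation.APL" in interfaces


def inventory_response(event):
    """Build a skill response, adding a visual directive when possible."""
    summary = ", ".join(f"{name}: {count}" for name, count in INVENTORY)
    response = {"shouldEndSession": True}

    if has_screen(event):
        # Keep the speech short and put the numbers on the screen.
        speech = "Here are the most recent inventory numbers."
        response["directives"] = [
            {
                "type": "Alexa.Presentation.APL.RenderDocument",
                "token": "inventory",
                "document": {
                    "type": "APL",
                    "version": "1.4",
                    "mainTemplate": {
                        "items": [{"type": "Text", "text": summary}]
                    },
                },
            }
        ]
    else:
        # No screen: read the numbers aloud instead.
        speech = f"The most recent inventory numbers are {summary}."

    response["outputSpeech"] = {"type": "PlainText", "text": speech}
    return {"version": "1.0", "response": response}
```

Sending the report to a wallboard or by email, the other options Jeremy mentions, would be further branches of the same handler and are left out here.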

Aleksandar: Yeah, so my opinion is that it's never going to be voice only, like maybe rarely, like for example, I don't know if you remember, you can say hey mom get me something over the phone or whatever, but when you interact with a voice module, my kind of prediction is that you're going to want to see something.

We, as humans, we're not just audio only or video only, we're not visually focused only, but we're going to actually want to interact. We want to say something and see the result in front of us. Like let's say maybe your Alexa skill or application got stuck or something, you want to see that something's going on. You don't want to stay confused and be like, "Okay, what's going on here."

So, in a sense, that's kind of something which, yeah, I've been working on with this project called The Computer, which basically is building small-scale applications using voice. Building higher order workflows is a much more complex thing.

Jeremy: Sure. Definitely would need visual feedback to do something like that.

Aleksandar: Exactly, and that's something ... yeah, basically what I built with this Computer project is that you can very easily explain to Alexa, just by saying I want to create some service, I want to create some application.

We can very simply say add this element or add this other element, and anyone who knows how to explain their business is able, basically just by using voice, to explain what they want from an application, like do they want to save, delete or manage customers in some way, and just tell Jarvis to create the solution. While they're explaining it, they can see on the UI, on the interface, what this application looks like.

And then they can just say now deploy this solution and it's done. Naturally, of course, much more complex solutions require a lot more time, a lot more time and dedication to explain it, but it's basically able to do that. Basically, it's just a ... For some people it's more of an experiment, but my belief is that nobody's going to use also voice ... I mean how can you use voice, for example, like that, in your cubicle or [inaudible 00:32:11] software piece. You won't be able to do that.
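The create, add, deploy flow Aleksandar walks through can be pictured as a small session object that accumulates a blueprint from successive voice commands. This is not how The Computer is implemented, which the episode does not detail; the block names, the command grammar, and the `BuildSession` class are all assumptions made for illustration, and "deploy" here only flips a flag.

```python
from dataclasses import dataclass, field

# Hypothetical catalogue of pre-configured building blocks.
KNOWN_BLOCKS = {"api", "database", "storage", "authentication"}


@dataclass
class Blueprint:
    name: str
    blocks: list = field(default_factory=list)
    deployed: bool = False


class BuildSession:
    """Accumulates an application blueprint from already-transcribed commands."""

    def __init__(self):
        self.blueprint = None

    def handle(self, command):
        words = command.lower().split()
        if words[:1] == ["create"] and len(words) > 1:
            self.blueprint = Blueprint(name=" ".join(words[1:]))
            return f"Created {self.blueprint.name}."
        if self.blueprint is None:
            return "Create an application first."
        if words[:1] == ["add"] and len(words) == 2:
            block = words[1]
            if block not in KNOWN_BLOCKS:
                return f"I do not know the block {block}."
            self.blueprint.blocks.append(block)
            return f"Added {block}."
        if words == ["deploy"]:
            if not self.blueprint.blocks:
                return "There is nothing to deploy yet."
            # A real system would hand the blueprint to a deployment tool.
            self.blueprint.deployed = True
            return f"Deployed {self.blueprint.name} with {len(self.blueprint.blocks)} blocks."
        return "Sorry, I did not understand that."


if __name__ == "__main__":
    session = BuildSession()
    for spoken in ["create customer service", "add api", "add database", "deploy"]:
        print(session.handle(spoken))
```

Each reply doubles as the feedback the speakers say users need: after every command the session state could be rendered on a screen so the user sees the application taking shape.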

Jeremy: That's a good point, yeah.

Aleksandar: Exactly, it's going to actually force you to sit down with your UI engineer, UI designer, UX, like the whole team all together, and you're all going to try to work collaboratively, because you won't be able to use an Alexa device, say, hey, imagine like 300 people in an office space yelling at their Alexa.

Jeremy: Right.

Aleksandar: That just won't work, you know? So, what's going to happen is we're going to have this collaborative work, where several people are working in a meeting and discussing how they should build an app, and just using an Alexa as a support device for actually explaining what's going to happen.

We see Amazon thinking in this kind of space, not really developing by voice, but these kinds of no-code solutions. I think Forrest Brazeal, another serverless hero colleague, even tweeted maybe nine months ago that somewhere, hidden, there's this no-code kind of solution in the works inside of Amazon.

Which means they have understood, like even Amazon has understood, that even though we have all these infrastructure services, that are extremely, extremely good and useful, we do not have these kinds of UI interfaces, and you can see now a huge wave of no-code, that's going to ...

Like no-code is now the blockchain of 2017, basically. Everybody is no-code now. I mean it doesn't work that way, but anyway, it's ... for some people, it's a nice way to get some VC investors. Anyway, I'm sorry, I'm sorry to be...

Jeremy: That's a little advice from Aleksandar out there for anyone starting a startup company.

Aleksandar: Yeah, you want to get some VC money, just label no-code on it and it's done, you can even put a regular CRM, just say no-code CRM. There really is no code, you just click around it. Anyway, so coming back to this whole serverless, and voice thing, voice interaction, to be honest, we see everybody knows that there's going to be another wave on top of serverless, again, and things are evolving, so there's a high ... there's a big chance that voice might be that thing, maybe not in a direct way how we see it at the moment, but we see like the UI, for example, as you can see, Amazon is investing ... AWS actually is investing a lot into AWS Amplify, which is really an amazing solution, which I recommend to everyone. If you're building a product, don't start building a product ... I mean actually start building a product with Amplify first, and then try to separate the pieces using other serverless technologies.

But we can see that we're coming to a point where now the UI's the next layer that AWS is building, which is something that we should also focus on, and then we are coming to a point where voice as well is just another side channel, a side interaction channel as well. So, we should think about these things as ... as I mentioned from the beginning of this episode, I mentioned this, like even seven-year-olds are capable, and they easily ... nobody showed my nephew how to do it. He just was pressing buttons and he discovered that and he was like, "Wow, this is easy."

Then he was like just first harassing the Google Voice a lot, and then he goes like, "Okay, this is useful." So, it's natural, and my belief is that we had first just visual interfaces, and actually we had the terminal at the beginning, and then we had more visual interfaces.

Jeremy: Right.

Aleksandar: We're coming to a point where we have web and mobile and now we have voice as well, so we have this Three Musketeers of human interaction, human-computer interaction, and yeah, I think it's going to go in that direction. I can even mention there's some experimental project we're working on called The Doctor, which kind of sounds like wait, the computer is building software, what does The Doctor do? Does it cure disease? Does he use it? No.

Actually, it's something ... both are going to be under the ... I mean I already have a domain, but, domains, but anyway, it's actually helping out like researchers and doctors to actually discover more important, actually significant relationships in between the works they're doing or whatever. I'm going to be honest, for example, let's say you are searching for some heart disease or something, there's like a gazillion articles that you can read about.

Jeremy: Sure.

Aleksandar: If you mention certain keywords, and you talk in a certain way, it's again, it's something that it's not like I'm the inventor of the idea, but basically helping out people who want to quickly and naturally find out certain works and papers. They are going to be able to do it very easily and just by using voice, so yeah.

Jeremy: I think that idea, this idea of medical advice or medical feedback, in the moment, actually could be really, really powerful. If you think about in the emergency room, if a doctor or a nurse that's treating a patient could just say something like, "Does this patient have any allergies," right?

Aleksandar: Yeah.

Jeremy: And that was just kind of tied together and it'd say, oh, yeah, that whatever, or-

Aleksandar: Exactly.

Jeremy: What's the correct dosage of this or that.

Aleksandar: Exactly.

Jeremy: You could ask questions, and that could extend to things that were not quite as life saving as maybe the medical profession, but what if you were-

Aleksandar: Exactly.

Jeremy: Maybe this is a little ... I don't think this is farfetched, and I'm just curious, like let's say you were a plumber, and you're working on a sink, or you're doing something, and then you forget what the right, I don't know, fitting is or what the torque is supposed to be or something, and you could just ask that question, and a system could answer it for you, that would just make people more productive. I know that's one of those things, where I do it all the time, where it's like I stop for a second, and I'm thinking about like oh what's the function to write that piece of code, so I go and I Google it or whatever, and the smarter these devices become, and the more questions they can answer for us, in a way that we expect them to answer those questions, would be really powerful. So, just a couple more things, and then I'll let you go.

Aleksandar: It's a pleasure actually.

Jeremy: I know it's the end of your day, so things like the accuracy, so you've obviously built a number of apps, or a number of skills, so you know that we use slots and intents, like we basically have to tell the system in what order or in what way to expect us or expect our users to speak into the system. Slots are very cool, because slots have ... They have like a type, so you can say that I'm expecting a number here or I'm expecting an actor's name and things like that, so they're very precise if you capture the right utterances, as they call them, to know what your user might ask it. But even as accurate as it is though, you still think that it's still a bit of a limitation, right?

Aleksandar: Yeah, I'm going to be honest, this is why I said it's going to take 10 years for Alexa to actually really be super, super powerful, maybe not 10 years, but seven or eight for sure. These slots are so, I mean from my experience, these slots are super limited. Even though it can understand emotion, my opinion is that it's on a level of a four-year-old. A four-year-old knows what you want. Do you want me to ... like the most basic stuff.

He doesn't know the answer to Einstein's formula or whatever, or if the answer is always 42 or something, but it knows that if you're angry at it, if you want it to bring something, is it okay, or find that information from a playbook or whatever. Even though it's very precise and accurate, it's still not on a level that we expect it to. I've tried from biological, for some biological states, medical conditions, it understands sort of things, but when it comes to like chemistry formulas or even more complex things, it's very easy to mix certain things and it's not able to really understand.

I've tried with some chemical compounds and solutions, and it really isn't able to understand anything, and not only that, but it's able to even mix certain words, so we're very far away from something super amazing, like an intelligent 12-year-old or something like that. It's still a four-year-old baby basically, which is able to understand many commands, but not a lot. It still needs to learn a lot, so yeah, even ... What you mentioned, it's able to understand voice supervision, to understand oh this user is Jeremy and the other is Alex or whoever, but it's still a four-year-old unfortunately.

But, we see that AWS is understanding ... Amazon is understanding how Alexa works and what are its kind of implications there and ... So, for example, we have this Alexa Presentation Language where if ... to come back to the point that actually people are not just audio only. You can use this Alexa Presentation Language to describe a skill's UI and voice basically at the same time. That's an amazing thing when you interact with an Alexa skill, and you say show me, like I want to get an airplane ticket from Belgrade to Boston, where you live, I'm going to be like, well it's going to say well there are five airplane routes that you can take, and you're going to be like, "Wait," and then it's going to repeat the five airplane routes, but if you don't see it visually-

Jeremy: Yeah, you're not going to remember it.

Aleksandar: You're not going to ... You're going to be like, "Wait, what was the..."

Jeremy: Even now, when you get those automated phone things, when you call in and they're like ... they list like six options, and like to repeat these options, you're like-

Aleksandar: Yeah, exactly.

Jeremy: ... because I don't remember, because I forgot the one thinking that it would be something better when it got through, so no, I totally agree with you. And it's funny, the emotion thing, if Alexa can detect emotion, then it's probably not going to like my kids very much, because they're always yelling at it. They always get very angry.

My wife hates it when she asks what the weather is and it has to give her the whole forecast and tell her to have a nice day, and that for some reason makes her angry, but-

Aleksandar: Yeah, but this is actually an evolution of machine learning by itself.

Jeremy: Of course.

Aleksandar: If we take a look at it, we have these ... At the moment, the majority of machine learning is actually in a broad way, like so we are learning from our customers, what do they really want, from a broad range of customers. The next step, which is actually even part of this ... There was a lecture by this friend of ours, Ann, who actually was speaking on the terms of Alexa like the next step is going to be this personal kind of machine learning.

So the next step is going to be where Alexa is going to understand that ... I remember even Gojko saying there's this ... I don't know, some musician that he likes, and he's like, "It's not able to understand." I mean if you say Alexa play this music, and it just puts the wrong artist, and say stop immediately, it should be able to learn that you don't want that.

Jeremy: Yeah, that's not what you meant, yeah.

Aleksandar: Yeah, that's really not what you meant. You want to actually for ... You want Alexa to learn on your habits, so this personalization is also going to be an important evolutionary step into voice.

Jeremy: Yeah, I totally think so too, and just again to tie this all back to serverless, it's been the commoditization of that runtime that really made the accessibility of Alexa so much easier, and building skills is still kind of tough. I mean there's some things to do there, but there's the Skills Kit or the ASK SDK or something like that, that makes building it a little bit easier, but really serverless did enable, I think, what's going to be a mass adoption of some of this voice technology, certainly from the Alexa side of things, at least in my opinion.

But anyways, so listen, Aleksandar, thank you so much for joining me, and sharing all of this complex knowledge about voice interaction technology. Anyways, if our listeners want to get a hold of you, and find out more about what you're doing, how do they do that?

Aleksandar: Well, I mean first thank you for having me. That's super ... I'm super grateful for that, and I really watch and ... Watch, listen and read the show, because I basically, even sometimes when I'm busy and everything, I just skim in the transport or something. I read through who's here and whatever. It's really an honor for me to be here, but if somebody wants to contact me, so they can go on Twitter, @simalexan, or same for GitHub. So, there's three serverless heroes writing on serverless.pub, which is a place where we kind of write our discoveries and the things that we like, and things we want to share with the general ... like everybody who's interested in serverless or something like that.

And we also wrote a book, so if somebody's interested, they can read about it, like Serverless Applications with Node.js. But yeah, that's kind of basically it. They can even send us an email. They can find the email on GitHub or something like that.

Jeremy: Awesome, all right, I will get all that into the show notes. Thanks again.

Aleksandar: Thank you very much.
