The first Amazon Echo, all the way back in 2014, was pitched as a device for a few simple things: playing music, asking basic questions, getting the weather. Since then, Amazon has found a few new things for people to do, like control smart home devices. But a decade later, Alexa is still mostly for playing music, asking basic questions, and getting the weather. And that’s largely because, even as Amazon made Alexa ubiquitous in devices and homes all over the place, it never convinced developers to care.
Alexa was never supposed to have an app store. Instead, it had “skills,” which Amazon hoped developers would use to connect Alexa to new functionality and information. Developers weren’t supposed to build their own things on top of an operating system; they were supposed to build new things for Alexa to do. The difference is subtle but important. Our phones are mostly a series of disconnected experiences: Instagram is a universe entirely apart from TikTok and Snapchat and your calendar app and Gmail. That just doesn’t work for Alexa or any other successful assistant. If it knows your to-do list but not your calendar or knows your favorite kind of pizza but not your credit card number, it can’t do much. It needs access to everything, and all the necessary tools at its disposal, to get things done for you.
In Amazon’s dream world, where “ambient computing” is perfect and everywhere, you’d just ask Alexa a question or give it an instruction: “Find me something fun to do this weekend.” “Book my train to New York next week.” “Get me up to speed on deep learning.” Alexa would have access to all the apps and information sources it needs, but you’d never need to worry about that; Alexa would just handle it however it needed and bring you the answers. There are a thousand complicated questions about how it actually works, but that’s still the big idea.
“Alexa Skills made it fast and easy for developers to build voice-driven experiences, unlocking an entirely new way for developers and brands to engage with their customers,” Amazon spokesperson Jill Tornifoglio said in a statement. Customers use them billions of times a year, she said, and as the company embraces generative AI, “we’re excited for what’s next.”
In retrospect, Amazon’s idea was pretty much exactly right. All these years later, OpenAI and other companies are also trying to build their own third-party ecosystems around chatbots, which are just another take on the idea of an interactive interface for the internet. But for all its prescience on the AI revolution, Amazon never figured out how to make skills work. It never solved some fundamental problems for developers, never cracked the user interface, and never found a way to show people all the things their Alexa device could do if only they’d ask.
Amazon certainly tried its best to make skills happen. The company steadily rolled out new tools for developers, paid them in AWS credits and cash when their skills got used (though it recently stopped doing so), and tried to make skill development practically effortless. And on some level, all that effort paid off: Amazon says there are more than 160,000 skills available for the platform. That pales next to the millions of app store apps on smartphones, but it’s still a big number.
The interface for finding and using all those skills, though, has always been a mess. Let’s just take one simple example: if you ask Alexa to order you pizza, it might tell you it has a few skills for that and recommend Domino’s. (If you’re wondering why Amazon would pick Domino’s and not Pizza Hut or DoorDash or any other pizza-summoning service: great question. No idea.) You respond yes. “Here’s Domino’s,” Alexa says. Then a moment later: “Here’s the skill Domino’s, by Domino’s Pizza, LLC.” Another moment, then: “To link your Domino’s Pizza Profile please go to the Skills setting in your Alexa app. We’ll need your email address to place a guest order. Please enable ‘Email Address’ permissions in your Alexa app.” At this point, you have to find a buried setting in an app you might not even have on your phone; it would be vastly easier to just go to Domino’s website. Or, heck, call the place.
If you know the skill you’re looking for, the system is a little better. You can say “Alexa, open Nature Sounds” or “Alexa, enable Jeopardy,” and it’ll open the skill with that name. But if you don’t remember that the skill is called “Easy Yoga,” asking Alexa to start a yoga workout won’t get you anywhere.
There are little friction points like this all across the system. When you’ve activated a skill, you have to explicitly say “stop” or “cancel” to back out of it in order to use another one. You can’t easily do things across skills; I’d like to price-check my pizza, but Alexa won’t let me. And maybe most frustrating of all, even once you’ve enabled a skill, you still have to address it specifically. Saying “Alexa, ask AnyList to add spaghetti to my grocery list” is not seamless interaction with an all-knowing assistant; that’s having to learn a computer’s incredibly specific language just to use it properly.
As it has turned out, many of the most popular Alexa skills have two things in common: they’re simple Q&A games, and they’re made by a company called Volley. From Song Quiz to Jeopardy to Who Wants to Be a Millionaire to Are You Smarter Than a 5th Grader, Volley is one of the companies that has figured out how to make skills that really work. And Max Child, Volley’s cofounder and CEO, says that getting your skill in front of people is one of the most important, and hardest, parts of the job.
“I think one of the underrated reasons that the iOS and Android app stores are so successful is because Facebook ads are so good,” he says. The pipeline from a hyper-targeted ad to an app install has been ruthlessly perfected over the years, and there’s just nothing like that for voice assistants. The nearest equivalent is probably people asking their Alexa devices what they can do (which Child says does happen!), but there’s just no competing with in-feed ads and hours of social scrolling. “Because you don’t have that hyper-targeted marketing, you end up having to do broad marketing, and you have to build broad games.” Hence games like Jeopardy and Millionaire, which are huge brands that appeal to practically everyone.
One way Volley makes money is through subscriptions. The full Jeopardy experience, for instance, is $12.99 a month, and like so many other modern subscriptions, it’s a lot easier to subscribe than to cancel. It’s also one of the few ways to make money with a skill: developers are allowed to have audio ads in some kinds of skills, or to ask users to add their credit card details directly the way Domino’s does, but asking a voice-first user to pick up their phone and dig through settings is a high bar to clear. Ads are only useful at vast scale; there was a brief moment when a lot of media companies thought the so-called “flash briefings” might be a hit, but that hasn’t turned into much.
These are hardly unique challenges, by the way. Mobile app stores have similar huge discovery problems, issues with monetization, sketchy subscription systems, and more. It’s just that with Alexa, the solution seemed so enticing: you shouldn’t, and wouldn’t, even need an app store. You should just be able to ask for what you want, and Alexa can go do it for you.
A decade on, it appears that an all-powerful, omni-capable voice AI might just be impossible to pull off. If Amazon makes everything so seamless and fast that you never even have to know you’re interacting with a third-party developer and your pizza just magically appears at your door, it raises some huge privacy concerns and questions about how Amazon picks those providers. If it asks you to choose all those defaults for yourself, it’s signing every new user up for an awful lot of busy work. If it allows developers to own and operate even more of the experience, it wrecks the ambient simplicity that makes Alexa so enticing in the first place. Too much simplicity and abstraction is actually a problem.
We’re at something of an inflection point, though. A decade after its launch, Alexa is changing in two key ways. One is good news for the future of skills, the other might be bad. The good is that Alexa is no longer a voice-only, or even voice-first, experience: as Echo Show and Fire TV devices have gotten more popular, more people are interacting with Alexa with a screen nearby. That could solve a lot of interaction problems and give developers new ways to put their skills in front of users. (Screens are also a great place to advertise your skill, a fact Amazon knows maybe too well.) When Alexa can show you things, it can do a lot more.
Already, Child says that a majority of Volley’s players are on a device with a screen. “We’re very long on smart TVs,” he says, laughing. “Every single smart TV that’s sold now has a microphone in the remote. I really think casual voice games … might make a lot of sense, and I think could be even more immersive.”
Amazon is also about to re-architect Alexa around LLMs, which could be the key to making all of this work. A smarter, AI-powered Alexa could finally understand what you’re actually trying to do, and do away with some of the awkward syntax required to use skills. It could understand more complicated questions and multistep instructions and use skills on your behalf. “Developers now need to only describe the capabilities of their device,” Amazon’s Charlie French said at Amazon’s AI Alexa launch event last year. “They don’t need to try and predict what a customer is going to say.” Amazon is just one of the companies promising that LLMs will be able to do things on your behalf with no extra work required; in that world, do skills even need to exist, or will the model simply figure out how to order pizza?
There’s some evidence that Amazon is behind in its AI work and that plugging in a language model won’t suddenly make Alexa amazing. (Even the best LLMs feel like they’re only sort of slightly close to almost being good enough to do this stuff.) But even if it does, it only makes the bigger question more important: what can virtual assistants really do for us? And how do we ask them to do it? The correct answers are “anything you want,” and “any way you like.” That requires a lot of developers to give Alexa new powers. Which requires Amazon to give them a product, and a business, worth the effort.