Marley Rafson

The Case for Augmented Reality on the Web

Augmented reality is already making its way into everyday browsers! This talk will dive into what that might mean for the traditional web developer, and why developing immersive experiences makes so much sense on the web, even in the face of native alternatives. We will cover topics like off-the-shelf web technologies, performance, and privacy all in the context of augmented reality.


Thank you. And hello. Today I'll be talking to you about the case for augmented reality on the web. I'm Marley Rafson.

I'm a software engineer at Google on the WebXR team. My Twitter handle is mprafson.

Love to talk later. The two things today are the web and augmented reality. I believe that the combination of these two things is a "the whole is greater than the sum of its parts" situation. I think that the web will gain from augmented reality, and I think that augmented reality will certainly gain from the web.

So, to begin, let's start by getting everyone on the same page and let's define augmented reality. It's the mixing of computer-generated content with the real world. To me, that means we can now add contextualized information into the world around us. So, for example, you're trying to make a coffee at the coffee machine, and you have the instructions overlaid on top of the coffee machine.

An implication of augmented reality is that you can interact with 3D content in a 3D space. So, the canonical example would be you're trying to buy a piece of furniture and you're able to put it into your room before you buy it.

Let's look at what augmented reality looks like today. Today this is smartphone AR. This is run using a smartphone, which basically has some understanding of the world around it, allowing virtual content to interact with that world too. We also have augmented reality on headsets.

Your hands would be free, and you would be free to interact with the content using your hands. And the third example, which I think is less talked about but which I find compelling, is audio-based augmented reality.

You're seeing Bose Frames. It's a pair of sunglasses that have speakers that shoot audio into your ears. With this example, with audio AR, you would have that experience where you're at a coffee machine and you don't know what to do. But instead of seeing the information, it's read off to you.

And I would also like to say, if you hear me say AR, that's just short for augmented reality. So, this talk, as I mentioned, is also about the web. Let's talk about where those two things intersect. That's what we'll call the immersive web.

And the immersive web is the use of augmented reality, virtual reality and other 3D technologies on the web. It's this whole spectrum of things.

It can be more augmented reality than virtual reality, and it doesn't quite matter. And you'll hear me say XR. And that really just stands for wildcard reality. So, it could be virtual reality, it could be augmented reality, or some combination of the two.

So, today what the immersive web looks like is something like this. We have seen progressive enhancements of sites we use every day. The left two examples are Facebook 3D posts, and they're incorporated directly into the newsfeed. So, that first example is a 3D model that you can interact with on the newsfeed.

The second is using portrait mode depth information to delightfully play with these photos with a bit of depth added. And the third example is from Wikipedia, which about a year ago started supporting 3D models in its articles.

So, if you're not a large company and you don't have a whole engineering team dedicated to building this model viewing experience, there's a solution to that now. So, recently a team at Google open sourced a web component for viewing models. It's called model-viewer. It's fantastic. Using HTML, you can add a model into your site and look around it.

That's an example with the astronaut, which we know and love. And something that's great about the model-viewer team is they're thinking about accessibility early. So, they've already incorporated alt text into the component.
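As a rough illustration of how little markup this involves, here is a minimal TypeScript sketch that builds the tag as a string. The helper `modelViewerMarkup`, its options type, and the file name `astronaut.glb` are hypothetical and not from the talk. The attribute names (`src`, `alt`, `camera-controls`, `ar`) follow the model-viewer documentation as commonly published and may differ between versions of the component, and the component's own script still has to be loaded on the page separately.

```typescript
// Hypothetical helper, not part of model-viewer itself:
// it only assembles the HTML string for the element.
interface ModelViewerOptions {
  src: string;              // URL of a glTF/GLB model (placeholder name in the usage below)
  alt: string;              // alt text, so assistive technology can describe the model
  cameraControls?: boolean; // let the user orbit the model
  ar?: boolean;             // ask for an AR entry point where the platform offers one
}

// Escape characters that would break out of a double-quoted attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}

function modelViewerMarkup(opts: ModelViewerOptions): string {
  const attrs: string[] = [
    `src="${escapeAttr(opts.src)}"`,
    `alt="${escapeAttr(opts.alt)}"`,
  ];
  if (opts.cameraControls) attrs.push("camera-controls");
  if (opts.ar) attrs.push("ar");
  return `<model-viewer ${attrs.join(" ")}></model-viewer>`;
}
```

For example, `modelViewerMarkup({ src: "astronaut.glb", alt: "A 3D model of an astronaut", cameraControls: true })` yields a single tag that could be pasted into a page or assigned to a container's `innerHTML`.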

So, let's say you have these 3D models on your website. How do you actually get them into augmented reality? So, right now what we've seen is implementations of native system viewers that allow for augmented reality. This is not running in a web browser, but it is a slick and tight integration with the web. So, you would have this search result, so it's integrated into search with Google. You can tap the button, and then it intents into this native viewer, and then you can place the model in augmented reality. The same exists on iOS, and it similarly intents out to a native viewer.

And so, what I just said there was that we're using native viewers to do augmented reality. But we would like to do that in the browser itself. That's where the WebXR Device API comes in. It exposes low-level sensory information. So, the camera pose, or whether you can place an object on the floor.

And that's under development. It's open so we can all contribute to it. And you can find that on GitHub.
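To give a feel for what asking the browser for an AR session can look like, here is a minimal TypeScript sketch. It assumes the method names `isSessionSupported` and `requestSession`, the `"immersive-ar"` mode, and the `"hit-test"` feature string as they appear in the published WebXR specifications. Since the API was still under development when this talk was given, the names in use at that time may have differed. The `XRSystemLike` type and the `startArSession` helper are hypothetical and exist only so the sketch compiles without WebXR typings.

```typescript
// Minimal structural type standing in for navigator.xr (hypothetical name).
interface XRSystemLike<S> {
  isSessionSupported(mode: string): Promise<boolean>;
  requestSession(mode: string, init?: { requiredFeatures?: string[] }): Promise<S>;
}

// Returns a session, or null when the browser has no WebXR or no AR support.
async function startArSession<S>(nav: { xr?: XRSystemLike<S> }): Promise<S | null> {
  if (!nav.xr) return null; // WebXR not exposed at all
  const supported = await nav.xr.isSessionSupported("immersive-ar");
  if (!supported) return null; // WebXR present, but no AR on this device
  // "hit-test" asks for the ability to find real-world surfaces,
  // for example to place an object on the floor.
  return nav.xr.requestSession("immersive-ar", { requiredFeatures: ["hit-test"] });
}
```

In a browser this would be called with the global `navigator` (cast to the structural type above), and browsers generally require the request to come from a user gesture such as a button tap.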

And those examples that I just shared are all about bringing immersive technologies into the web. But we can also bring the web into immersive technologies. Here are two examples of XR browsers. You will see the Helio browser placing images into the world around you, and then Firefox Reality, which is a VR browser.

So, I think the immersive web today is fantastic. I think it's adding information, it's making it more delightful to browse the web. But when we're developing these technologies, it's important to see where we're going next and to start thinking about that earlier rather than later. So, when I think about a future of augmented reality, I start by thinking about how can this assist me? So, I think about maybe I'm leaving work one day and I'm wearing these lightweight AR goggles, I'm walking down the street.

I'm looking for dinner. So, as I look at the restaurants on the street, I'm seeing Yelp data populating with the star rating and the type of menu.

And maybe I also want to see photos of the food. So, Instagram information's coming too. And I like Pokemon. So, maybe while I'm doing all of this, I'm catching Pokemon, playing Pokemon Go. And when I get to the train station, I'll have information populated for me telling me which train to take so I can get home fastest.

And as I mentioned earlier, I probably don't want to be restricted to doing this on just one headset. Maybe I want headphones another day from a different company. Who knows? Maybe I just want to use my smartphone. So, when I'm thinking of this vision of augmented reality going forward, I start to see some interesting things and patterns that I would like to call out here.

So, the first is that it inherently uses web content. I want access to all the things that I already use when I'm browsing the web. Another thing is it's this really interesting composition of different types of information. So, that could be models or 2D pages. And we're going to have to interact with them in a way that's aesthetic and delightful and still makes me want to use it.

And a third trend that I start to see is that we're really interacting with the world around us. I could be placing information onto a building or onto a tree or onto passing cars. And so, when I start to think about this vision that I want and where we are today, which is mostly placing 3D models, I think there is a lot of work to do.

And where I see a lot of that work coming into play is with the user agent. So, the definition from the W3C is that a user agent is any software that retrieves, renders and facilitates end user interaction with web content. So, today that's usually just your browser. So, you get a lot of things when you're browsing using any of the modern browsers.

It could be Chrome, it could be Firefox. It helps you do safe browsing. It renders the HTML pages for you and things like that.

So, to understand how a user agent will play out with this added dimension for augmented reality, let's start to break down this definition of user agent. I'll go in reverse order and start with: a user agent facilitates end user interaction with web content.

So, what I believe is that the user should always be in control of how information is presented to them. So, on a modern web browser, that's preferences like default text sizing. That's contrast ratios. And Chrome extensions and things like that.

But when you think about augmented reality, we have this whole new dimension. It's really immersive. And so, we need to start to think about other ways that we want to have safe browsing and we want the user to be in control.

One example, think about physical proximity. You have content that's being rendered possibly close to you and I should be able to say I'm not comfortable with anything rendering closer than 5 meters away from me. Or I think it's compelling to think about this in terms of sound. I probably would never want to allow someone to whisper into my ear as I'm walking down the street.

So, a lot of these considerations come into play when we're talking about augmented reality. And in order to accommodate these things for the user, we're going to need an intervention point for the information.

So, to do this, the user agent needs to know what's being asked of it. If we think of declarative content that we know and love, that would be HTML and CSS. I see this future where there's declarative content that you're able to add into the world around you, so you could specify things like "I would like float left and pin to a building." Or you could say, "I want the depth to be 5 meters from the user."
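One way to picture a declarative request like this, together with a user agent that is free to adjust it, is the small TypeScript sketch below. Everything in it is hypothetical: the types, the `resolvePlacement` function, and the idea of a single minimum-distance preference are illustrative assumptions, not an existing API or proposal.

```typescript
// What a page might declare (hypothetical shape).
interface PlacementRequest {
  depthMeters: number; // requested distance from the user
}

// What the user has told their user agent (hypothetical shape).
interface UserComfortPreferences {
  minDistanceMeters: number; // nothing may render closer than this
}

interface PlacementDecision {
  depthMeters: number; // distance the user agent will actually use
  adjusted: boolean;   // true when the request could not be honored as-is
}

// The user agent honors the request only as far as the user's preferences allow.
function resolvePlacement(
  request: PlacementRequest,
  prefs: UserComfortPreferences,
): PlacementDecision {
  const depth = Math.max(request.depthMeters, prefs.minDistanceMeters);
  return { depthMeters: depth, adjusted: depth !== request.depthMeters };
}
```

With a preference of 5 meters, a request for 2 meters would be pushed out to 5 and flagged as adjusted, while a request for 8 meters would pass through unchanged.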

But it's up to the user agent to decide whether or not it can actually honor that request. And it's in that way that it can advocate for the user. So, another thing that declarative content offers is that it provides semantic understanding for the user agent. What that means is that the user agent has a view into what's being asked to be rendered or presented to the user. I find this example extremely illuminating of what I'm talking about.

If we look at the left side, you're seeing the web. If you're given this canvas to render the object, like on the web today, then given an XY location, you can only see the color. I see a black pixel, a white pixel, a gray pixel. But nothing more than that.

But with semantic understanding of what's actually in the scene, we have a much richer understanding of the model and the content that we're placing. So, if you look at the right side, you can see I'm looking at the windshield. And it has a material that's glass. And it's transparent.

You can also see things like the tire and stuff like that. And so, you can imagine that with screen readers or accessibility, or just translating this into sound-based AR, you can actually talk about what the user is seeing. So, you could say, "I see a van with black tires and white car metal paint."
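The difference between pixels and semantics can be sketched in a few lines of TypeScript. The `SceneNode` shape and the `describeScene` function below are hypothetical, chosen only to show how labeled parts and materials could be turned into a sentence for a screen reader or for audio AR. They are not taken from any existing scene format.

```typescript
// A hypothetical, simplified semantic scene description.
interface SceneNode {
  label: string;        // what the thing is, e.g. "van" or "windshield"
  material?: string;    // e.g. "glass"
  color?: string;       // e.g. "black" or "transparent"
  children?: SceneNode[];
}

// "transparent glass windshield", "black tires", ...
function describePart(node: SceneNode): string {
  return [node.color, node.material, node.label]
    .filter((s): s is string => Boolean(s))
    .join(" ");
}

// Build one spoken-style sentence from the root object and its direct parts.
function describeScene(root: SceneNode): string {
  const parts = (root.children ?? []).map(describePart);
  if (parts.length === 0) return `I see a ${root.label}.`;
  const list =
    parts.length === 1
      ? parts[0]
      : `${parts.slice(0, -1).join(", ")} and ${parts[parts.length - 1]}`;
  return `I see a ${root.label} with ${list}.`;
}
```

A canvas can only answer "what color is this pixel", whereas a structure like this lets the user agent answer "what is this", which is what a description for a non-visual user needs.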

So, the next part of the user agent that I'll talk about is rendering. So, the user agent is responsible for rendering in the browser. So, modern web rendering engines are amazing. They're fantastic.

That's Blink, Gecko, WebKit, things like that. When you think about what they're asked to do, it's mostly 2D and it's text-based.

They're fantastic at rendering pages like The Verge at 60 frames per second. And some of you might know and some of you might not, but the web already does performant 3D rendering. That's provided to the user through a canvas element and uses WebGL to create those graphics. What that might look like today is this. This is not augmented or virtual reality. It's just rendering in a web page using a canvas and WebGL.

So, for the rendering needs of augmented reality, thinking back to the vision, there is this complex composition and layout dynamics of content. So, we're probably gonna need to build a rendering engine that can handle that from the ground up, whereas the modern rendering engines were optimized for 2D.

And so, even though I just said that we probably want this built from the ground up and that we don't want to use canvas elements, we tried it. So, my team experimented on this internal prototype, AR web view. It's a set of libraries that enable AR through native apps and a web view. A web view is a lightweight browser that you can embed into a native app.

The technical understanding of what we've done is that we have web-rendered content, and then we have tracking from ARCore and ARKit. And we combine that with a natively rendered camera feed.

Let's look at what that looks like. If you look at the example, we have a stormtrooper and it's convincingly rendered into the space around you. You need to make sure that the content that's rendered moves at the same time as the native camera feed. Because if it's off, it completely breaks the illusion.

So, we have this transparent web view, it's rendering the content, and this becomes a difficult technical problem. If you're familiar with this, and don't worry if you're not, you have a render loop that's running from the browser. But then you also have a render loop that's running natively, and you also have the update loop for the native tracking.
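The talk does not say how AR web view keeps these loops aligned. As a generic illustration only, one common way to think about the problem is timestamp matching: each camera frame carries a timestamp, and the web content is drawn with the tracking pose whose timestamp is closest to it. The `TimedPose` type and the `poseForFrame` function in this TypeScript sketch are hypothetical.

```typescript
// A tracking sample produced by the native tracking loop (hypothetical shape).
interface TimedPose {
  timestampMs: number;
  position: [number, number, number];
}

// Pick the buffered pose closest in time to the camera frame being shown,
// so that web-rendered content and the camera image move together.
function poseForFrame(poses: TimedPose[], frameTimestampMs: number): TimedPose | undefined {
  let best: TimedPose | undefined;
  let bestDelta = Infinity;
  for (const pose of poses) {
    const delta = Math.abs(pose.timestampMs - frameTimestampMs);
    if (delta < bestDelta) {
      best = pose;
      bestDelta = delta;
    }
  }
  return best; // undefined when no poses have been buffered yet
}
```

A real system would likely interpolate between poses and bound the buffer size. This sketch only shows why the loops need a shared notion of time.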

So, it's figuring out a way to get all of these things working together in a performant way. And one of the advantages of AR web view is that it's lightweight.

It's only 60 kilobytes added to your app. It's cross-platform. So, you write code once and it works on both iOS and Android. It's embeddable into native apps and you can pull down content and code on demand.

And to highlight the advantage of that last part, you can make changes to this augmented reality experience independently of your native app code. You don't have to wait for an app release to change your AR experience.

So, one use of AR web view in action was at Google I/O this year. If you were there, you would have seen this experience that allowed you to orient yourself once you were on site at the conference. Using augmented reality, you could see which way you might want to go next. And so, this experience is literally just a web page. And these white text labels that say "water station" and "eats market" are actually just HTML elements styled with 3D transforms.
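To make "HTML elements styled with 3D transforms" concrete, here is a minimal TypeScript sketch that produces a CSS `transform` value for a label. The `LabelPlacement` type, the `labelTransform` function, and the pixel-based coordinates are hypothetical. How the Google I/O experience actually computed its transforms is not described in the talk.

```typescript
// Where a label should sit relative to the viewer (hypothetical units: CSS pixels and degrees).
interface LabelPlacement {
  x: number;          // right (+) / left (-)
  y: number;          // down (+) / up (-)
  z: number;          // toward the viewer (+) / away (-)
  yawDegrees: number; // rotation around the vertical axis
}

// Build a value for the CSS `transform` property using standard 3D transform functions.
function labelTransform(p: LabelPlacement): string {
  return `translate3d(${p.x}px, ${p.y}px, ${p.z}px) rotateY(${p.yawDegrees}deg)`;
}
```

In a page this could be applied with `element.style.transform = labelTransform(placement)`, with a CSS `perspective` set on the parent so that the depth is visible. A real AR overlay would update the value every frame from the tracked camera pose.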

So, again, AR web view, it's a prototype, it's internal. We had a lot of fun and learned a lot with it. So, the last aspect of the user agent that I'll talk about is that the user agent retrieves content on behalf of the user. And this part is so fascinating to me.

So, if you think back to this vision of augmented reality, you're walking down the street. And you look at a restaurant and it knows what you want is the menu information.

And it knows what you want is the Instagram photos. And so, if I was doing this today on my smartphone, I would go to the restaurant. I would see the name. I would type it into maybe Google search, who knows. And then I would access the information that I wanted.

But what we have with augmented reality is this fundamental paradigm shift. Now the user agent is querying on my behalf and should know what I want to see. That's very interesting: in a camera-first browser environment, the user receives content rather than querying for it. That means that the user agent has discretion over what it's surfacing. This is an interesting question off the bat.

We can think about modern questions like echo chambers and stuff like that. Imagine you're in an augmented reality immersive world and you don't know when there's content that you're not seeing. You should have a way to know, and have ways to set preferences on what you do want to see.

And I think that there is kind of this pessimistic view of that. But I think there are a lot of advantages to having this intermediary between you and the information. Here's an example that I like to think about.

Imagine you're a coffee drinker. You wake up, you want to wear your AR glasses on the way to work. You haven't had your cup of coffee, so you want the bare minimum. Once you get to the coffee shop, then you are ready for the other things to start coming. In that way, the user agent having discretion is an advantage.

And so, as you start to hear these things about augmented reality and maybe this future where everybody is wearing headsets, you might be like me yesterday in Conrad electronics thinking, that's a lot of cameras. So, it's true. A camera-first future requires a lot of cameras.

And it's not just augmented reality that's facing these things. It would be things like self-driving cars, where they have six sensors, eight, a ton of sensors looking at the world around them as they're processing. What that means is we're capturing basically everything that happens on the sidewalk. Anything that someone wearing one of these headsets can see. And so, as we try to build towards this augmented reality future, we need to have privacy and surveillance top of mind when thinking about world-scale AR.

So, at face value, I think one of the first things you would think about in the context of privacy is that these cameras are capturing RGB data about the world around you. Which is just a picture. So, what happens with that picture data? Is your device computing it on device? Is it going up to the cloud? Who owns that data on the cloud? Is it going to your private cloud? And then you think about secondary data. So, that's information that's extracted from this camera image, which is how augmented reality works.

You can imagine if you're building meshes around you. You live in an apartment building with a distinct statue outside. You walk out.

Your headset creates a mesh of this statue and someone could know exactly where that statue is and locate you. Even if they don't have the camera data, they can understand that from the derivative information that's used for augmented reality. And what I think, though, is that user agents being more powerful is actually an advantage in this scenario. Because if we can move that information away from the developer and closer to a trusted user agent, that's a positive thing. But we'll need something like open source code to actually know that we can trust our user agents.

So, to me, augmented reality and the web is an amazing partnership. The web will gain a whole new axis for understanding information. It will be more contextualized. It will be ergonomic.

And for augmented reality, the web for sure is an important piece. It will provide the content and I love the webby principles that we use today.

I think as we look forward to a future with augmented reality, we need to keep those webby principles in mind. If you are interested in using this today, check out the resources. Look at model-viewer and see if adding 3D models increases use and adds context to your information. And also, start talking to us about the WebXR Device API. Everyone's opinion matters and it will be great for the ecosystem overall.

So, to me if the web plus augmented reality is an open ecosystem of rich, contextualized information, then I'm excited and I hope that you all are too.

[ Applause ]