Real-time communication with WebRTC: Google I/O 2013

By | January 20, 2020

JUSTIN UBERTI: Hi everyone. Thanks for coming to the
session on WebRTC for plugin-free realtime
communication. I’m Justin Uberti, tech lead
for WebRTC at Google. And with me today is– hey,
has anyone seen Sam? SAM DUTTON: Hey. JUSTIN UBERTI: Sam Dutton,
coming to you live from WebRTC on Chrome for Android. [APPLAUSE] SAM DUTTON: On a beautiful
Nexus 7. We got this low-res to cope
with the Wi-Fi here. That seems to be working
pretty well. JUSTIN UBERTI: That was
quite an entrance. Why don’t you come up here
and introduce yourself? SAM DUTTON: Yeah. Hey. I’m Sam Dutton. I’m a developer advocate
for Chrome. JUSTIN UBERTI: So we’re here to
talk to you today about the great things that WebRTC’s been
working on and how you can use them. So what is WebRTC? In a nutshell, it’s what we call
realtime communication– RTC– the ability to communicate
live with somebody or something as if you were right
there next to them. And this can mean audio,
video, or even just peer-to-peer data. And we think WebRTC
is really cool. But there’s a lot of other
people who are really excited about WebRTC as well. And one of the reasons is that
WebRTC fills a critical gap in the web platform, where
previously, a native proprietary app like Skype could
do something the web just couldn’t. But now we’ve turned that around
and changed that so we have a web of connected WebRTC
devices that can communicate in realtime just by loading
a web page. So here’s what we’re trying to
do with WebRTC, to build the key APIs for realtime
communication into the web, to make an amazing media stack in
Chrome so that developers can build great experiences, and
to use this network of connected WebRTC devices
to create a new communications ecosystem. And these kind of seem
like lofty goals. But take this quote from the
current CTO of the FCC who said he sees traditional
telephony fading away as voice just becomes another web app. So we’re trying to live
up to that promise. And right now, you can build a
single app with WebRTC that connects Chrome, Chrome for
Android, Firefox, and very soon, Opera. I’m especially excited to
announce that as of this week, Firefox 22 is going to beta,
which is the very first WebRTC-enabled version
of Firefox. So within a matter of weeks, we
will have over one billion users using a WebRTC-enabled
browser. [APPLAUSE] JUSTIN UBERTI: And I think that
just gives a good idea of the size of the opportunity
here. And we expect that number to
grow very significantly as both Chrome and Firefox get
increased adoption. For places where we don’t have
WebRTC-enabled browsers, we’re providing native, supported,
official toolkits for both Android and, very soon, iOS,
that can interoperate with WebRTC in the browser. [APPLAUSE] JUSTIN UBERTI: So here are
just a handful of the companies that see the
opportunity in WebRTC and are building their business
around it. So that’s the vision
for WebRTC. Now let’s dig into the APIs. There are three main categories of
API that exist in WebRTC. First, getting access
to input devices– accessing the microphone,
accessing the webcam, getting a stream of media from
either of them. Secondly, being able to connect
to another WebRTC endpoint across the internet,
and to send this audio and video in realtime. And third, the ability to do
this not just for audio and video, but for arbitrary
application data. And we think this one is
especially interesting. So because there’s
three categories, we have three objects. Three primary objects in WebRTC
to access this stuff. The first one, MediaStream, for
getting access to media, then RTCPeerConnection
and RTCDataChannel. And we’ll get into each one
of these individually. Sam, why don’t you tell
us about MediaStream? SAM DUTTON: Yeah, sure. So MediaStream represents a
single source of synchronized audio or video or both. Each MediaStream contains one
or more MediaStream tracks. For example, on your laptop,
you’ve got a webcam and a microphone providing video and
audio streams, and they’re synchronized. We get access to these local
devices using the getUserMedia method of Navigator. So we just look at the code for
that, just highlight that. And you can see that
getUserMedia there, it takes three parameters, three
arguments there. And the first one, if we look
at the constraints argument I’ve got, you can see I’m just
specifying I want video. That’s all I’m saying. Just give me video
and nothing else. And then in the success
callback, we’re setting the source of a video using the
stream that’s returned by getUserMedia. Let’s see that in action, really
simple example here. And you can see when we fire
the getUserMedia method, we get the allow permissions
bar at the top there. Now, this means that users have
to explicitly opt in to allowing access to their
microphone and camera. And yeah, there we have it. Using that code,
we’ve got video displayed in a video element. Great. What really excites me about
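
The getUserMedia code highlighted on the slide is not reproduced in the transcript. A minimal sketch of the same idea follows, assuming the promise-based navigator.mediaDevices.getUserMedia that later replaced the three-argument callback form described here. The helper name showCamera and its injectable mediaDevices parameter are illustrative and not from the talk.

```javascript
// Constraints as described in the talk: video only, nothing else.
const videoOnlyConstraints = { video: true, audio: false };

// Hypothetical helper: request the camera and show it in a <video> element.
// `mediaDevices` defaults to the browser's navigator.mediaDevices; it is a
// parameter only so the function can be exercised outside a browser.
async function showCamera(videoElement, mediaDevices = navigator.mediaDevices) {
  // The browser asks the user for permission before this promise resolves.
  const stream = await mediaDevices.getUserMedia(videoOnlyConstraints);
  // Attach the returned MediaStream to the element so it starts rendering.
  videoElement.srcObject = stream;
  return stream;
}
```

In a page this might be called as showCamera(document.querySelector("video")), with a catch handler for the case where the user declines the permission prompt.
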
these APIs is when they come up against each other,
like in this example. What’s happening is, that we’ve
got getUserMedia being piped into a canvas element,
and then the canvas element being analyzed, and then
producing ASCII, just like that, which could make a
good codec, I think. JUSTIN UBERTI: It would
be a good codec. You can compress it using
just gzip. SAM DUTTON: Yeah, smaller font
sizes, high resolution. Also, another example of
this from Facekat. Now what’s happening here is
that it’s using the head tracker JavaScript library to
track the position of my head. And when I move around, you can
see I’m moving through the game and trying to stay alive,
which is quite difficult. God, this is painful. Anyway– whoa. OK, I think I’ve flipped
into hyperspace there. And an old favorite, you may
well have seen Webcam Toy, which gives us access to the
camera, kind of photobooth app, uses WebGL to create a
bunch of slightly psychedelic effects there. I quite like this old movie
one, so I’ll take that and get a snapshot. And I can share that with my
friends, so beautiful work from Paul Neave there. Now you might remember I said
that we can use the constraints object. The simple example there was
just saying, use the video, nothing else. Well, we can do more interesting
things with constraints than that. We can do stuff like specify
the resolution or the frame rate, a whole stack of
things that we want from our local devices. A little example from that,
if we go over here. Now, let’s look at the
code, actually. If we go to the dev tools there,
you can see that I’ve got three different constraints
objects, one for each resolution. So when I press the buttons,
I use the QVGA constraints, getUserMedia, and then with the
VGA one, I’m getting high resolution. And for HD, I’m getting
the full 1280 by 720. We can also use getUserMedia
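
As a sketch of the three constraints objects described in this demo, one per resolution: the version below uses the current width/height constraint syntax, which may differ from what the 2013 demo page used. The 1280 by 720 size comes from the talk; the QVGA and VGA sizes are the standard 320 by 240 and 640 by 480, and the names resolutions and constraintsFor are illustrative.

```javascript
// Frame sizes for the three buttons in the demo.
const resolutions = {
  qvga: { width: 320, height: 240 },
  vga: { width: 640, height: 480 },
  hd: { width: 1280, height: 720 },
};

// Build a getUserMedia constraints object that asks for exactly one size.
function constraintsFor(name) {
  const size = resolutions[name];
  if (!size) {
    throw new Error(`unknown resolution: ${name}`);
  }
  return {
    video: {
      width: { exact: size.width },
      height: { exact: size.height },
    },
  };
}
```

The result would be passed straight to getUserMedia, for example getUserMedia(constraintsFor("hd")). Using exact makes the request fail if the camera cannot deliver that size; ideal could be used instead to let the browser pick the closest match.
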
now for input from our microphone. In other words, we can use
getUserMedia to provide a source node for web audio. And there’s a huge amount of
interesting stuff we can do with that processing audio using
web audio, from the mic or wherever. A little example
of that here– I’ll just allow access to the
mic, and you can see, I’m getting a nice little
visualization there in the canvas element. And I can start to
record this, blah blah blah blah blah– [AUDIO PLAYBACK] -To record this, blah blah
blah blah blah– [END AUDIO PLAYBACK] SAM DUTTON: And yeah, you can
see that’s used recorder.js to save that locally to disk. GetUserMedia also now– this is
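
A rough sketch of feeding a microphone stream into Web Audio for a visualization like the one in this demo. createMediaStreamSource, createAnalyser and getByteFrequencyData are standard Web Audio calls; the helper names and the fftSize value are illustrative, and the demo's own code, including its Recorder.js recording step, is not shown in the transcript.

```javascript
// Hypothetical helper: route a microphone MediaStream into an AnalyserNode.
function connectMicToAnalyser(audioContext, micStream) {
  // getUserMedia output becomes a source node in the Web Audio graph.
  const source = audioContext.createMediaStreamSource(micStream);
  const analyser = audioContext.createAnalyser();
  analyser.fftSize = 256; // small FFT, enough for a simple bar display
  source.connect(analyser);
  return analyser;
}

// Read one frame of frequency data (values 0 to 255) for drawing to a canvas.
function readLevels(analyser) {
  const levels = new Uint8Array(analyser.frequencyBinCount);
  analyser.getByteFrequencyData(levels);
  return levels;
}
```

A page would typically call readLevels inside a requestAnimationFrame loop and draw one bar per value.
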
kind of experimental, but we can use getUserMedia to get
a screen capture, in other words data coming directly from
what we see on screen, not from the audio and video from
the mic and the camera. Probably the simplest if I show
you an example of this, so yeah, a little application
here. And when I click to make the
call, allow, and you can see there that I get this kind of
crazy hall of mirrors effect, because I’m capturing the screen
that I’m capturing, and so on and so on. Now that’s quite nice. But it would be really useful
if we could take that screen capture and then transmit that
to another computer. And for that, we have
RTCPeerConnection. JUSTIN UBERTI: Thanks, Sam. So as the name implies,
RTCPeerConnection is all about making a connection to another
peer and over this peer connection, we can actually
then go and send audio and video. And the way we do this is we
take the media streams that we’ve got from getUserMedia, and
we plug them into the peer connection, and send them
off to the other side. When the other side receives
them, they’ll pop out as a new media stream on their
peer connection. And they can then plug that
into a video element to display on the page. And so both sides of a peer
connection, they both get streams from getUserMedia, they
plug them in, and then those media streams pop out
magically encoded and decoded on the other side. Now under the hood,
peer connection is doing a ton of stuff– signal processing to remove
noise from audio and video; codec selection and compression
and decompression of the actual audio and video;
finding the actual peer-to-peer route through
firewalls, through NATs, through relays; encrypting the
data so that a user’s data is fully protected at all times;
and then actually managing the bandwidth so that if you have
two megabits, we use it. If you have 200 kilobits,
that’s all we use. But we do everything we can to hide
this complexity from the web developer. And so the main thing is that
you get your media streams, you plug them in via
addStream to the peer connection, and off you go. And here’s a little
example of this. SAM DUTTON: Yeah, so you can see
here that we’ve created a new RTCPeerConnection. And when the stream is received,
the callback for that in gotRemoteStream there
attaches the media we’re getting from the stream
to a video element. Now, at the same time, we’re
also creating what’s called an offer, giving information about
media, and we’re setting that as the local description,
and then sending that to the callee, so that they can set
the remote description. You can see that in the
gotAnswer function there. Let’s have a little look at
RTCPeerConnection on one page, a very simple example here. So what we’ve got here
is getUserMedia here, just start that up. So it’s getting video from the
local camera here, displaying it on the left there. Now when I press call, it’s
using RTCPeerConnection to communicate that video
to the other– yeah, the other video element
on the page there. This is a great place to start
to get your head around RTCPeerConnection. And if we look in the code
there, you can see that it’s really simple. There’s not a lot of code there
to do that, to transmit video from one peer
to another. JUSTIN UBERTI: So that’s
really cool stuff. A full video chat client in a
single web page, and just about 15 lines of JavaScript. And we talked a bit quickly
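
The one-page demo code is not included in the transcript. The sketch below reconstructs the flow described above (plug in local media, create an offer, set local and remote descriptions, answer) under two assumptions: it uses the promise-based API, and it uses addTrack and ontrack, the later replacements for the addStream and stream callback mentioned in the talk. The function name callWithinPage and the injectable PeerConnection parameter are illustrative.

```javascript
// Sketch of a caller and a callee living in the same page, as in the demo.
// `PeerConnection` is a parameter only so the flow can run outside a browser.
async function callWithinPage(localStream, onRemoteStream, PeerConnection = RTCPeerConnection) {
  const caller = new PeerConnection();
  const callee = new PeerConnection();

  // Both ends share a page, so network candidates are handed across
  // directly instead of going through a signaling server.
  caller.onicecandidate = (event) => {
    if (event.candidate) callee.addIceCandidate(event.candidate);
  };
  callee.onicecandidate = (event) => {
    if (event.candidate) caller.addIceCandidate(event.candidate);
  };

  // When media arrives on the callee, hand the stream to the page.
  callee.ontrack = (event) => onRemoteStream(event.streams[0]);

  // Plug the local media into the caller's connection.
  for (const track of localStream.getTracks()) {
    caller.addTrack(track, localStream);
  }

  // Offer/answer: each description is set locally on the side that
  // created it and remotely on the other side.
  const offer = await caller.createOffer();
  await caller.setLocalDescription(offer);
  await callee.setRemoteDescription(offer);

  const answer = await callee.createAnswer();
  await callee.setLocalDescription(answer);
  await caller.setRemoteDescription(answer);

  return { caller, callee };
}
```

With a stream from getUserMedia, a page could call callWithinPage(stream, (remote) => { remoteVideo.srcObject = remote; }), where remoteVideo is the second video element.
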
through the whole thing around how we set up the parameters of
the call, the offers and answers, but I’ll come
back to that later. The next thing I want to talk
about is RTCDataChannel. And this says, if we have a peer
connection which already creates our peer-to-peer link
for us, can we send arbitrary application data over it? And this is the mechanism
that we use to do so. Now one example where we would
do this would be in a game. Like, take this game. I think it’s called Jank
Wars or something. And we have all these ships
floating around onscreen. Now, when a ship moves, we
want to make sure that’s communicated to the other player
as quickly as possible. And so we have this little JSON
object that contains the parameters and the position and
the velocity of the ships. And we can just take that object
and stuff it into the send method, and it will shoot
it across the other side where it pops out as onMessage. And the other side can
do the same thing. It can call send on its data
channel, and it works pretty much just like a WebSocket. That’s not an accident. And we tried to design it that
way, so that people familiar with using WebSockets could
also use a similar API for RTCDataChannel. And the benefit is that here,
we have a peer-to-peer connection with the lowest
possible latency for doing this communication. In addition, RTCDataChannel can be either unreliable
or reliable. And we can think about this kind
of like UDP versus TCP. If you’re doing a game, it’s
more important that your packets get there quickly
than they’re guaranteed to get there. Whereas if you’re doing a file
transfer, the files are only any good if the entire
file is delivered. So you can choose this as the
app developer, which mode you want to use, either unreliable
or reliable. And lastly, everything
is fully secure. We use standard DTLS encryption
to make sure that the packets you send across
the data channel are fully encrypted on their way
to the destination. And you can do this either with
audio and video, or if you want to make a peer
connection for just data, you can do that as well. So Sam’s going to show us
how this actually works. SAM DUTTON: Yeah, so again,
another really simple example. We’re creating a peer connection
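
The slide for this example is not in the transcript. A minimal sketch of the pieces being described, assuming the standard createDataChannel, ondatachannel, onmessage and send members: the ordered and maxRetransmits options are one way to express the unreliable versus reliable choice mentioned earlier, and the function names and the channel label are illustrative.

```javascript
// Options for createDataChannel, chosen by delivery mode.
function channelOptions(mode) {
  if (mode === "unreliable") {
    // Unordered, no retransmissions: UDP-like, suited to game state.
    return { ordered: false, maxRetransmits: 0 };
  }
  // Ordered and fully reliable: TCP-like, suited to file transfer.
  return { ordered: true };
}

// Sending side: create the channel on the peer connection.
function openSendChannel(peerConnection, mode) {
  return peerConnection.createDataChannel("sendDataChannel", channelOptions(mode));
}

// Receiving side: wait for the channel, then pass each message to the page.
function listenForChannel(peerConnection, onText) {
  peerConnection.ondatachannel = (event) => {
    const receiveChannel = event.channel;
    receiveChannel.onmessage = (message) => onText(message.data);
  };
}

// Game-style usage: serialize a small state object and send it.
function sendState(channel, state) {
  channel.send(JSON.stringify(state));
}
```

In the demo's terms, onText would write into the receiving element on the page, and the send button's click handler would call send on the channel returned by openSendChannel.
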
here, and once the data channel is received, in
the callback to that, we’re setting the receive
channel using the object. Now, when the receive channel
gets a message, kind of like WebSocket really, we’re just
putting some text in a local div there, using Now, the send channel
was created with createDataChannel. And then we got a send button. When that’s clicked, we get the
data from a text area, and we use the send channel to send
that to the other peer. Again, let’s see
this in action. This is, again, a good place to
start– one page demo, with all the code for RTCDataChannel,
so type in some text, and we hit send, and
it’s transmitting it to the other text area. A great place to start
if you’re looking at RTCDataChannel. Something a little more
useful here, a great app from Sharefest. Now, Sharefest is using
RTCDataChannel to enable us to do
file sharing. I think I’m going to select a
nice photo here I’ve got of some cherries. And it’s popeye, is the URL. And now Justin is going to try
and get that up on screen on his side, just to check that
that’s gone through. So like I say, this is doing
file sharing using RTCDataChannel, and there’s
a huge amount of potential there. There we go. Those are the cherries. JUSTIN UBERTI: I
love cherries. SAM DUTTON: These are beautiful
Mountain View cherries, actually. They were really, really nice. JUSTIN UBERTI: All this data
is being sent peer-to-peer, and anybody else who connects to
the same URL will download that data peer-to-peer
from Sam’s machine. And so none of this has to
touch Sharefest servers. And I think that’s pretty
interesting if you think about things like file transfer and
bulk video distribution. OK, so we talked a lot about
how we can do really clever peer-to-peer stuff with
RTCPeerConnection. But it turns out we need servers
to kind of get the process kicked off. And the first part of it is
actually making sure that both sides can agree to actually
conduct the session. And this is the process that
we call signaling. The signaling in WebRTC is
abstract, which means that there’s no fully-defined
protocol on exactly how you do it. The key part is that you just
have to exchange session description objects. And if you think about this kind
of like a telephone call, when you make a call to someone,
the telephone network sends a message to the person
you’re calling, telling them there’s an incoming call and
the phone should ring. Then, when they answer the call,
they send a message back that says, the call
is now active. Now, these messages also contain
parameters around what media format to use, where the
person is on the network, and the same is true for WebRTC. And these things, these session
description objects, contain parameters like, what
codecs to use, what security keys to use, the network
information for setting up the peer-to-peer route. And the only important thing is
that you just send it from your side to the other
side, and vice versa. You can use any mechanism
you want– WebSockets, Google Cloud
Messaging, XHR. You can use any protocol, even
just send it as JSON, or you can use a standard protocol
like SIP or XMPP. Here’s a picture of how
this all works. The app gets a session
description from the browser and sends it across through the
cloud to the other side. Once it gets the message back
from the other side with the other side’s session
description, and both session descriptions are passed down to WebRTC
in the browser, WebRTC can then set up and conduct the
media link peer-to-peer. So we do a lot to try to hide
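
Since signaling is left to the application, here is one possible shape for it, as a sketch only: session descriptions are sent as JSON over whatever transport the app already has. The names encodeDescription, decodeDescription, startCall and sendToPeer are hypothetical, and sendToPeer stands in for a WebSocket, XHR or any other channel.

```javascript
// Serialize a session description for any transport.
function encodeDescription(description) {
  return JSON.stringify({ type: description.type, sdp: description.sdp });
}

// Parse and sanity-check a description received from the other side.
function decodeDescription(text) {
  const { type, sdp } = JSON.parse(text);
  if ((type !== "offer" && type !== "answer") || typeof sdp !== "string") {
    throw new Error("not a session description");
  }
  return { type, sdp };
}

// Caller side: get a description from the browser, send it across, and
// return a function to call when the other side's answer text arrives.
async function startCall(peerConnection, sendToPeer) {
  const offer = await peerConnection.createOffer();
  await peerConnection.setLocalDescription(offer);
  sendToPeer(encodeDescription(offer));
  return (answerText) =>
    peerConnection.setRemoteDescription(decodeDescription(answerText));
}
```

The callee would mirror this: decode the offer, set it as the remote description, create an answer, set that locally and send it back the same way.
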
the details of what’s inside the RTCSessionDescription,
because this includes a whole bunch of parameters– as I said, codecs, network
information, all sorts of stuff– this is just a snippet of what’s
contained inside a session description right now. Really advanced apps can do
complex behaviors by modifying this, but we designed the API so
that regular apps just don’t have to think about it. The other thing that we need
servers for is to actually get the peer-to-peer session
fully routed. And in the old days, this
wouldn’t be a problem. A long time ago, each side
had a public IP address. They send each other’s IP
address to each other through the cloud, and we make the link directly between the peers. Well, in the age of NAT, things
are more complicated. NATs hand out what’s called a
private IP address, and these IP addresses are not useful
for communication. There’s no way we can make the
link actually peer-to-peer unless we have a public address. So this is where we bring in a
technology called STUN. The STUN server we can contact
from WebRTC, and we say, what’s my public IP address? And basically, the request comes
into the STUN server, it sees the address that that
request came from, puts the address into the packet,
and sends it back. So now WebRTC knows its public
IP address, and the STUN server doesn’t have to be in
the party anymore, doesn’t have to have media flowing
through it. So here, if you look at this
example, each side has contacted that STUN server to
find out what its public IP address is. And then it’s sent the traffic
to the other IP address through its NAT, and the data
still flows peer-to-peer. So this is kind of magic stuff,
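
The talk does not go into the wire format. As an illustration of "puts the address into the packet", the sketch below shows how the STUN specification (RFC 5389) encodes the observed address in its XOR-MAPPED-ADDRESS attribute, for IPv4 only. The function names are illustrative, and a real client would also need to build and parse the surrounding STUN message.

```javascript
const MAGIC_COOKIE = 0x2112a442; // fixed value defined by RFC 5389

// Server side: write the address the request came from into the attribute value.
function encodeXorMappedAddress(ip, port) {
  const view = new DataView(new ArrayBuffer(8));
  view.setUint8(0, 0); // reserved
  view.setUint8(1, 0x01); // address family: IPv4
  // The port is XORed with the top 16 bits of the magic cookie.
  view.setUint16(2, port ^ (MAGIC_COOKIE >>> 16));
  // The address is XORed with the whole magic cookie.
  const address = ip
    .split(".")
    .reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
  view.setUint32(4, (address ^ MAGIC_COOKIE) >>> 0);
  return new Uint8Array(view.buffer);
}

// Client side: recover the public address and port from the attribute value.
function decodeXorMappedAddress(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const port = view.getUint16(2) ^ (MAGIC_COOKIE >>> 16);
  const address = (view.getUint32(4) ^ MAGIC_COOKIE) >>> 0;
  const ip = [24, 16, 8, 0].map((shift) => (address >>> shift) & 0xff).join(".");
  return { ip, port };
}
```

The XOR step exists so that NATs which rewrite anything that looks like an IP address in a payload do not corrupt the reply.
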
and it usually works. Usually we can make sure that
the data all flows properly peer-to-peer, but not
in every case. And for that, we have a
technology called TURN built into WebRTC. This turns things around and
provides a cloud fallback when a peer-to-peer link is
impossible, basically asks for a relay in the cloud, saying,
give me a public address. And because this public address
is in the cloud, anybody can contact it, which
means the call always sets up, even if you’re behind
a restrictive NAT, or even behind a proxy. The downside is that since the
data actually is being relayed through the server, there is
an operational cost to it. But it does mean the call
works in almost all environments. Now, on one hand, we have STUN,
which is super cheap, but doesn’t always work. And we have TURN, which
always works, but has some cost to it. How do we make sure we get
the best of both worlds? Here’s TURN in action, where
we try to use STUN and STUN didn’t work. And we couldn’t get the
things to actually penetrate the NATs. So instead, we fell back. Only then did we use TURN, and
sent the media from our one peer, through the NAT, through
the TURN server, and to the other side. And this is all done by a
technology called ICE. ICE knows about STUN and TURN,
and tries all the things in parallel to figure out the
best path for the call. If it can do STUN,
it does STUN. If it can do TURN, well then
I’ll fall back to TURN, but I’ll do so quickly. And we have stats from a
deployed WebRTC application that says 86% of the time,
we can make things work with just STUN. So only one out of seven calls
actually have to run through a TURN server. So how do you deploy TURN
for your application? Well, we have some testing
servers, a testing STUN server that you can use, plus we make
source code available for our own STUN and TURN server
as part of the WebRTC code package. But the thing I would really
recommend is the long name, but really good product– rfc5766-turn-server– which has Amazon VM images
that you can just take, download, and deploy into the
cloud, and you’ve got your TURN server provisioned for all
your users right there. I also recommend restund,
another TURN server that we’ve used with excellent results. One question that comes up
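
Once STUN and TURN servers are deployed, the application tells ICE about them through the peer connection's configuration. A minimal sketch, assuming the standard iceServers option; the host names and credentials in the usage note are placeholders, and the helper names are illustrative. The second helper reads the candidate type from a candidate line, where srflx marks an address learned via STUN and relay marks one allocated on a TURN server.

```javascript
// Build an RTCPeerConnection configuration from the app's own server details.
function buildIceConfig({ stunUrl, turnUrl, username, credential }) {
  return {
    iceServers: [
      { urls: stunUrl },
      { urls: turnUrl, username, credential },
    ],
  };
}

// Extract the candidate type ("host", "srflx", "relay", ...) from a candidate line.
function candidateType(candidateLine) {
  const match = / typ (\w+)/.exec(candidateLine);
  return match ? match[1] : null;
}
```

For example, new RTCPeerConnection(buildIceConfig({ stunUrl: "stun:stun.example.org:3478", turnUrl: "turn:turn.example.org:3478", username: "user", credential: "secret" })) would let ICE try both. Watching candidateType on gathered candidates is one way to see whether a relay address was offered at all.
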
around WebRTC is, how is security handled? And the great thing is that
security has been built into WebRTC from the very beginning,
and so this means several different things. It means we have mandatory
encryption for both media and data. So all the data that’s being
sent by WebRTC is being encrypted using standard
AES encryption. We also have secure UI, meaning
the user’s camera and microphone can only be accessed
if they’ve explicitly opted in to making that
functionality available. And last, WebRTC runs inside
the Chrome sandbox. So even if somebody tries to
attack WebRTC inside of Chrome, the browser and the user
will be fully protected. So what you need to do
to take advantage of the security in WebRTC
is really simple. Your app just needs
to use HTTPS for actually doing the signaling. As long as the signaling goes
over a secure conduit, the data will be fully secured as
well using the standard protocols of SRTP for media
or Datagram TLS for the data channel. One more question that comes
up is around making a multi-party call, a
conference call. How should I architect
my application? In the simple two-party
case, it’s easy. We just have a peer-to-peer
link. But as you start adding more
peers into the mix, things get a bit more complicated. And one approach that people use
is a mesh, where basically every peer connects to
every other peer. And this is really simple,
because there’s no servers or anything involved, other than
the signaling stuff. But every peer has to send
and copy this data to every other peer. So this has a corresponding
CPU and bandwidth cost. So depending on the media
you’re trying to send– for audio, it can be
kind of higher. For video, it’s going to be
less– the number of peers you can support in this topology is
fairly limited, especially if one of the peers is
on a mobile device. To deal with that, another
architecture that can be used is the star architecture. And here, you can pick the most
capable device to be what we call the focus
for the call. And the focus is the part that’s
actually responsible for taking the data and sending
a copy to each of the other endpoints. But as we get to handling
multiple HD video streams, the job for a focus becomes
pretty difficult. And so for the most robust
conferencing architecture, we recommend an MCU, or multipoint
control unit. And this is a server that’s
custom made for relaying large amounts of audio and video. And it can do various things. It can do selective
stream forwarding. It can actually mix the
audio or video data. It can also do things
like recording. And so if one peer drops out, it
doesn’t interrupt the whole conference, because the MCU is
taking care of everything. So WebRTC is made with
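
To put rough numbers on the three topologies, here is a small counting sketch. It assumes every participant contributes one stream, counts media connections only (signaling is excluded), and for the star case reports the upload count of an ordinary peer rather than the focus. The function and field names are illustrative.

```javascript
// Count connections and per-peer uploads for a call with `peers` participants.
function topologyCost(topology, peers) {
  switch (topology) {
    case "mesh":
      // Every peer connects to every other peer and sends each one a copy.
      return { connections: (peers * (peers - 1)) / 2, uploadsPerPeer: peers - 1 };
    case "star":
      // One peer is the focus; every other peer connects only to it.
      return { connections: peers - 1, uploadsPerPeer: 1 };
    case "mcu":
      // Every peer connects once to the server, which does the fan-out.
      return { connections: peers, uploadsPerPeer: 1 };
    default:
      throw new Error(`unknown topology: ${topology}`);
  }
}
```

For a six-person call, topologyCost("mesh", 6) gives 15 connections with each peer uploading five copies, which is the cost that becomes a problem on mobile devices.
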
standards in mind. And so you can connect
things that aren’t even WebRTC devices. And one thing that people want
to talk to from WebRTC is phones. And there’s a bunch of easy
things that can be dropped into your web page to
make this happen. There’s sipML5, which is
a way to talk to various standard SIP devices, Phono, and
what we’re going to show you now, a widget from Zingaya
to make a phone call. SAM DUTTON: OK, so we’ve got a
special guest joining us a little bit later in
the presentation. I just wanted to give him a call
to see if he’s available. So let’s use the Zingaya
WebRTC phone app now. And you could see, it’s
accessing my microphone. [PHONE DIALING AND RINGING] SAM DUTTON: Calling someone. I hope it’s the person I want. [PHONE RINGING] SAM DUTTON: See if he’s there. CHRIS WILSON: Hello? SAM DUTTON: Hey. Is that you, Chris? CHRIS WILSON: Hey, Sam. How’s it going? It is. SAM DUTTON: Hey. Fantastic. I just want to check you’re
ready for your gig later on. CHRIS WILSON: I’m ready
whenever you are. SAM DUTTON: That’s fantastic. OK, speak to you soon, Chris. Thanks. Bye bye. CHRIS WILSON: Talk
to you soon. Bye. SAM DUTTON: Cheers. JUSTIN UBERTI: It’s great–
no plugins, realtime communication. SAM DUTTON: Yeah, that
situation, we had a guy with a telephone. Something we were thinking
about is situations where there is no telephone network. Now, Voxio demonstrated this
with something called Tethr, which is kind of disaster
communications in a box. It uses the OpenBTS cell
framework– you can see, it’s that little box there– to
enable calls between feature phones via the OpenBTS cell
through WebRTC to computers. You can imagine this is kind
of fun to get a license for this in downtown San Francisco,
but this is incredibly useful in situations
where there is no infrastructure. Yeah, this is like telephony
without a carrier, which is amazing. JUSTIN UBERTI: So we have a code
lab this afternoon that I hope you can come to, where
I’ll really go into the details of exactly how to build
a WebRTC application. But now we’re going to talk
about some resources that I think are really useful. The first one is something
called WebRTC Internals. And this is a page you can open
up just by going to this URL while you’re in
a WebRTC call. And it’ll show all sorts of
great statistics about what’s actually happening
inside your call. This would be things like packet
loss, bandwidth, video resolution and sizes. And there’s also a full log of
all the calls made to the WebRTC API that you can
download and export. So if a customer’s reporting
problems with their call, you can easily get this debugging
information from them. Another thing is, the
WebRTC spec has been updating fairly rapidly. And so in a given browser, the
API might not always match the latest spec. Well, adapter.js is something
that’s there to insulate the web developer from the
differences between browsers and the differences
between versions. And so we make sure that
adapter.js always implements the latest spec, and then thunks
down to whatever the version supports. So as new APIs are added, we
polyfill them to make sure that you don’t have to write
custom version code or custom browser code for each browser. And we use this in our
own applications. SAM DUTTON: OK, if all this is
too much for you, good news is, we’ve got some fantastic
JavaScript frameworks that have come up in the last few months, really
great abstraction libraries to make it really, really simple to
build WebRTC apps just with a few lines of code. Example here from SimpleWebRTC,
a little bit of JavaScript there to specify a
video element that represents local video, and one that
represents the remote video stream coming in. And then join a room just by
calling the joinRoom method with a room name– really, really simple. PeerJS does something similar
for RTCDataChannel– create a peer, and then on
connection, you can send messages, receive messages, so
really, really easy to use. JUSTIN UBERTI: So JavaScript
frameworks go a long way, but they don’t cover the production aspects of the service– the signaling, the STUN and TURN
service we talked about. But fortunately, we have things
from both OpenTok and vLine that are basically turnkey
WebRTC services that handle all this stuff for you. You basically sign up for the
service, get an API key, and then you can make calls
using their production infrastructure, which is spread throughout the entire globe. They also make UI widgets that
can be easily dropped into your WebRTC app. So you get up and running
with WebRTC super fast. Now, we’ve got a special
treat for you today. Chris Wilson, a colleague of
ours, and a developer on the original Mosaic browser, and
an occasional musician as well, is going to be joining us
courtesy of WebRTC to show off the HD video quality and
full-band audio quality that we’re now able to offer in the
latest version of Chrome. Take it away, Chris. CHRIS WILSON: Hey, guys. SAM DUTTON: Hey, Chris. How’s it going? CHRIS WILSON: I’m good. How are you? SAM DUTTON: Yeah, good. Have you got some kind of
musical instrument with you? CHRIS WILSON: I do. You know, originally
you asked me for a face-melting guitar solo. But I’m a little more
relaxed now. I/O is starting to wind down. You can tell I’ve already got
my Hawaiian shirt on. I’m now ready for
some vacation. So I figured I’d bring my
ukulele and hook it up through a nice microphone here, so we
can listen to how that sounds. SAM DUTTON: Take it away. Melt my face, Chris. [PLAYING UKULELE] SAM DUTTON: That’s
pretty good. JUSTIN UBERTI: He’s
pretty good. All right. SAM DUTTON: That
was beautiful. Thank you, Chris. [APPLAUSE] CHRIS WILSON: All right, guys. JUSTIN UBERTI: Chris
Wilson, everybody. SAM DUTTON: The audience
has gone crazy, Chris. Thank you very much. JUSTIN UBERTI: You want
to finish up? SAM DUTTON: Yeah. So, we’ve had– well, a fraction
over 30 minutes to cover a really big topic. There’s a great deal more
information out there online, some good stuff on HTML5 Rocks,
and a really good e-book too, if you want to
take a look at that. There are several ways
to contact us. There’s a great Google group– discuss-webrtc– post your technical questions. All the kind of new news for
WebRTC comes through on our Google+ and Twitter streams. And we’re really grateful to
all the people, all of you who’ve submitted feature
requests and bugs. And please keep them coming,
and the URL for that is So thank you for that. [APPLAUSE] JUSTIN UBERTI: And so we’ve
built this stuff into the web platform to make realtime
communication accessible to everyone. And we’re super excited because
we can’t wait to see what you all are
going to build. So thank you for coming. Once again, the link. And now, if you have any
questions, we’ll be happy to try to answer them. Thank you very much. SAM DUTTON: Yeah. Thank you. [APPLAUSE] AUDIENCE: Hi. My name is Mark. I’d like to know, because I’m
using Linux and Ubuntu, how finally can I get rid of the
talk plugin for using Hangouts in Google+? JUSTIN UBERTI: The question
is, when can we get rid of that Hangouts plug-in? And so unfortunately, we
can only talk about WebRTC matters today. That’s handled by
another team. But let’s say that there
are many of us who have the same feeling. AUDIENCE: OK. Great. [LAUGHTER] AUDIENCE: Can you make any
comments on Microsoft’s competing standard, considering
they kind of hold the cards with Skype, and how
maybe we can go forward supporting both or maybe
converge the two, or just your thoughts on that? JUSTIN UBERTI: So Microsoft
has actually been a great participant in standards. They have several people they
sent from their team. And although they don’t see
things exactly the same way that we do, I think that the API
differences are sort of, theirs is a lot more low-level,
geared for expert developers. Ours is a little more
high-level, geared for web developers. And I think that really what
you can do is you can implement the high-level one on
top of the low-level one, maybe even vice versa. So Microsoft is a little more
secretive about what they do. So we don’t know exactly
what their timeframe is relative to IE. But they’re fully
participating. And obviously, they’re very
interested in Skype. So I’m very optimistic that
we’ll see a version of IE that supports this technology in the
not-too-distant future. AUDIENCE: Very good to hear. Thank you. AUDIENCE: My question would be,
I think you mentioned it quickly in the beginning. So if I wanted to communicate
with WebRTC, but one, I’m using a different environment
than the browser. Let’s say I want a web
application to speak to a native Android app. So what would be the approach to
integrate that with WebRTC? JUSTIN UBERTI: As I mentioned
earlier, we have a fully supported official native
version of PeerConnection, PeerConnection.java, which is
open source, and you can download, and you can build
that into your native application. And it interoperates. We have a demo app that
interoperates with our AppRTC demo app. So I think that using Chrome for
Android in a web view is one thing you can think about. But if that doesn’t work for
you, we have a native version that works great. AUDIENCE: OK. Thank you. AUDIENCE: Hi. My question would be, are there
any things that need to be taken care of for cross-browser
compatibility between Firefox and Chrome? Anything specific that
needs to be taken care of, or does it just work? JUSTIN UBERTI: There are
some minor differences. I mentioned adapter.js covers
some of the things where the API isn’t quite in sync
in both places. One specific thing is that
Firefox only supports the Opus codec, and they only support
DTLS encryption. They don’t support something
called SDES, that we also support. So for right now, you have to
set one parameter in the API, and you can see that in our
AppRTC source code, to make sure that communication
actually uses those compatible protocols. We actually have a document,
though, on our web page, that documents exactly what you have
to do, which is really setting a single constraint
parameter when you’re creating your peer connection object. SAM DUTTON: Yeah. If you go to JUSTIN UBERTI: Yeah. That works at org/interop. AUDIENCE: OK. Thank you. AUDIENCE: When a peer connection
is made and it falls back to TURN, does the
TURN server, is it capable of unencrypting the messages that
go between the two endpoints? JUSTIN UBERTI: No. The TURN server is just
a packet relay. So this stuff is fully
encrypted. It doesn’t have the keying
information to do anything to it. So the TURN server just takes a
byte, sends a byte, takes a packet, sends a packet. AUDIENCE: So for keeping data
in sync with low latency between, say, an Android
application and the server, how would both the native
and the Android Chrome implementations of WebRTC fare
in terms of battery life? JUSTIN UBERTI: I don’t really
have a good answer for that. I wouldn’t think there would
be much difference. I mean, the key things that
are going to be driving battery consumption
in this case– are you talking about data,
or are you talking about audio and video? AUDIENCE: Data. JUSTIN UBERTI: For data, the
key drivers of your power consumption are going to be the
screen and the network. And so I think those should be
comparable between Chrome for Android and the native
application. AUDIENCE: OK, cool. Thanks. AUDIENCE: With two computers
running Chrome, what have you seen for glass-to-glass
latency? JUSTIN UBERTI: Repeat? AUDIENCE: Glass-to-glass, so
from the camera to the LCD. JUSTIN UBERTI: Oh, yeah. So it depends on the platform,
because the camera can have a large delay built
into it itself. Also, some of the audio
things have higher latencies than others. But the overall target is 150
milliseconds end-to-end. And we’ve seen lower than 100
milliseconds in best case solutions for glass-to-glass
type latency. AUDIENCE: OK. And how are you ensuring
priority of your data across the network? JUSTIN UBERTI: That’s a complex question with a long answer. But the basic thing, are you
saying, how do we compete with cat videos? AUDIENCE: No, just within the
WebRTC, are you just– how are you tagging
your packets? JUSTIN UBERTI: Right, so there
is something called DSCP where we can mark QoS bits– and this
isn’t yet implemented in WebRTC, but it’s on the roadmap,
to be able to tag things like audio as higher
priority than, say, video, and that as a higher priority
than cat videos. AUDIENCE: So it’s not today,
but will be done? JUSTIN UBERTI: It
will be done. We also have things for doing
FEC type mechanisms to protect things at the application
layer. But the expectation is that as
WebRTC becomes more pervasive, carriers will support DSCP at
least on the bit coming off the computer and going
onto their network. And we have seen that DSCP does
help going through Wi-Fi access points, because Wi-Fi
access points do give priority to DSCP-marked traffic. AUDIENCE: Thank you. AUDIENCE: So with Chrome for iOS
being limited to UIWebView and with other restrictions, how
much of WebRTC will you be able to implement? JUSTIN UBERTI: So that’s a
really interesting question. They haven’t made it easy for
us, but the Chrome for iOS team has already done some
amazing things to deliver the Chrome experience that
exists there now. And so we’re pretty optimistic
that one way or another, we can find some way to
make that work. No commitment to the
time frame, though. AUDIENCE: What are the
mechanisms for saving video and audio that’s broadcast with
WebRTC, like making video recordings from it? JUSTIN UBERTI: So if you have
the media stream, you can then take the media stream and plug
it into things like the Web Audio API, where you can actually
get the raw samples, and then make a wave file
and save that out. On the video side, you can go
into a canvas, and then extract the frames from a
canvas, and you can save that. There isn’t really any way to
sort of save it as an .mp4 or .webm file yet. But if you want to make a thing
that just captures audio from the computer and then stores it
on a server, you could basically make a custom server
that could do that recording. That’s one option. AUDIENCE: So the TURN
server is open– but you said the TURN server
doesn’t capture. JUSTIN UBERTI: No. AUDIENCE: It can’t act
as an endpoint. Do you have server technology
that acts as an endpoint? JUSTIN UBERTI: There
are people building this sort of stuff. Vline might be one particular
vendor who does this, but there’s something where you can
basically have an MCU, and the MCU that receives the media
could then do things like compositing or recording
of that media. AUDIENCE: So presumably, the
libraries for Java or Objective C could be used
to create a server implementation? JUSTIN UBERTI: Exactly. That’s what they’re doing. AUDIENCE: Hi, kind of two-part
question that has to do around codecs, specifically
on the video side, currently VP8, WebM. Are there plans for H.264,
and also what’s the timeline for VP9? JUSTIN UBERTI: Our plans are
around the VP family of codecs, so we support VP8. And VP9, you may have heard that
it’s sort of trying to finalize the bit stream
right now. So we are very much looking
forward to taking advantage of VP9 with all its new coding
techniques, once it’s both finished and also optimized
for realtime. AUDIENCE: And H.264, not
really on the plan? JUSTIN UBERTI: We think that
VP9 provides much better compression and overall
performance than H.264, so we have no plans as far as
H.264 at this time. AUDIENCE: OK. AUDIENCE: Running WebRTC on
Chrome for Android for mobile and tablets, how does it
compare with native performance, like Hangouts
on Android? JUSTIN UBERTI: We think that
we provide a comparable performance to any native
application right now. We’re always trying to
make things better. On Chrome for
Android, WebRTC is still behind a flag because we still have work
to do around improving audio and improving some
of the performance. But we think we can deliver
equivalent performance on the web browser. And we’re also working on taking
advantage of hardware acceleration, in cases where
there are hardware decoders, as there are on the Nexus 10, and making
that so we can get the same sort of down-to-the-metal
performance that you could get from a native app. AUDIENCE: So the Google Talk
plugin is using not just H.264, but H.264 SVC optimized
for the needs of videoconferencing. Are VP8 and VP9 going to
be similarly optimized specifically in an SVC-like
fashion for video conferencing versus just the versions
for file encoding? JUSTIN UBERTI: So VP8 already
supports temporal scalability in the S part of SVC. VP9 supports additional
scalability modes as well. So we’re very excited about the
new coding techniques that are coming in VP9. AUDIENCE: So we want to use
WebRTC to do live streaming from, let’s say, cameras,
hardware cameras. What are the things that
we should take care of in that kind of application? And you mentioned
VP8 and VP9 support, and that H.264 is not supported. Assuming your hardware supports
only H.264, can WebRTC be used with Chrome
in that case? JUSTIN UBERTI: We are building
up support for hardware VP8, and later, VP9 encoders. So you can make a media
streaming application like you described, but we’re expecting
that all the major SoC vendors are now shipping hardware
with built-in VP8 encoders and decoders. So as this stuff gets into
market, you’re going to see this stuff become the most
efficient way to record and compress data. AUDIENCE: So the only way is
to support VP8 in hardware right now, right? JUSTIN UBERTI: If you want
hardware compression, the only things that we support right
now will be VP8 encoders. AUDIENCE: That’s on the device
side, you know, the camera which is on– JUSTIN UBERTI: Right. If you have encoded video from
a device that you want to be decoded within the browser, I
advise you to do it in VP8. AUDIENCE: Thank you. JUSTIN UBERTI: Thank
you all for coming. SAM DUTTON: Yeah, thank you. [APPLAUSE]
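
The recording answer above mentions pulling raw samples out of the Web Audio API and then making a wave file. What follows is a minimal TypeScript sketch of only that last step, assuming mono float samples in the range -1 to 1 have already been collected. `encodeWav` is a hypothetical helper name used for illustration; it is not from the talk or from any WebRTC library.

```typescript
// Hypothetical helper: packs mono float samples into a 16-bit PCM WAV file.
// In a browser the samples would come from the Web Audio API; here they are
// simply passed in as a Float32Array.
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const headerSize = 44;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(headerSize + dataSize);
  const view = new DataView(buffer);

  // Write an ASCII tag such as "RIFF" one byte at a time.
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, 1, true); // channel count (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // bytes per second
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample and scale it to a signed 16-bit integer.
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(headerSize + i * bytesPerSample, Math.round(scaled), true);
  }
  return new Uint8Array(buffer);
}
```

The returned bytes could then be saved locally or uploaded to a server; how the samples are captured and where the file goes are left out of this sketch.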

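The Firefox interop answer above refers to "a single constraint parameter" without naming it. At the time, the setting commonly used for this was the legacy `DtlsSrtpKeyAgreement` optional constraint; that name, the constraints shape, and the helper `buildInteropConstraints` below are assumptions for illustration, not something stated in the talk. Current browsers negotiate DTLS-SRTP by default, and the modern `RTCPeerConnection` constructor does not take this second argument, so this sketch is of historical interest only.

```typescript
// Assumed shape of the legacy (2013-era) constraints argument. It is not part
// of today's RTCPeerConnection API.
interface LegacyPeerConstraints {
  mandatory: Record<string, boolean>;
  optional: Array<Record<string, boolean>>;
}

// Hypothetical helper: builds the constraints object that asked the browser
// to use DTLS key agreement instead of SDES.
function buildInteropConstraints(enableDtls: boolean): LegacyPeerConstraints {
  return {
    mandatory: {},
    optional: enableDtls ? [{ DtlsSrtpKeyAgreement: true }] : [],
  };
}

// In a 2013 browser this would have been passed as the second constructor
// argument, roughly:
//   new webkitRTCPeerConnection(iceConfig, buildInteropConstraints(true));
```
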
65 thoughts on “Real-time communication with WebRTC: Google I/O 2013”

  1. Sam Dutton Post author

    As linked to above and in the annotation on the video:

  2. stupedcraig Post author

    Am I missing something or does STUN not work with NAT unless port forwarding is setup?

  3. Justin Uberti Post author

    STUN will work with most NATs without explicit port forwarding set up.

  4. SleepyBoBos Post author

    So will this kill apps like Skype? Seems like peer to peer communications with video and audio is going to be so trivial now using your browser. It might at least change the business model for this type of service ie you go to a site to use their free 'call your friend' app and just be exposed to '5 tricks to a flat belly' ad?

  5. Antony Meyn Post author

    accoording to 24:30 of this video: "MCU large N-way call" would help build a robust group conference app, but isn't MCU a public server?? Would it then still need STUN or TURN?

  6. Sandip Thakor Post author

    for me :Its really next generation communication platform, for some one its agaist SKYPE mean again MS. for some its Google s another milestone for chrome browser… Anyway it is innovation from Google and thrs lots more to on Web RTC.

  7. Justin Uberti Post author

    You might still need TURN to traverse firewalls, but you wouldn't need STUN.

  8. Beetz Mee Post author

    It all sounds great but, what are you going to do to improve the infrastructure to support all that demand? The LAN and LAN/WEB based apps continue to explode. One problem, it all starts to fail when every yak and his brother no longer want to just talk, text and email and want to throw video on the pile as well….outside of security, the pipes aren't getting fatter and nooooooo one seems to have enough budget to buy and support it….and now, both sides have to comply or else?

  9. noauth Post author

    Assuming the MCU is a public server (it doesn't make sense otherwise) then NAT traversal is not necessary.

  10. sune00 Post author

    Check out our summerstudents: "" they did WebRTC conference

  11. lennyhome Post author

    So it does TURN only if STUN doesn't work or if the NSA asks for it.

  12. Chris Charles Post author

    I need a video chat app built with this.. for android, chrome and possibly ios… Anyone know a good coder who can script this up quick?

  13. Justin Uberti Post author

    Media is still encrypted, even when going over TURN. TURN is simply a packet relay.

  14. Kenneth Snyder Post author

    Is there a reference link available to the FCC quote regarding 'telephony fading away to just becoming a web app'?

  15. Justin Uberti Post author

    Check out the IETF 86 technical plenary recording on MeetEcho, around 13:20:
    "And we have efforts, which some of you are involved in, namely WebRTC, which basically turns voice into just another javascript application."

  16. Mohmed Ataala Post author

    Does WebRTC support broadcast audio  with  chat group 

  17. Collin Anglin Post author

    If anyone wants to test and experience a WEB RTC product send me a message and I will invite you to our beta testing session!  It is incredible and surely will make profits for anyone looking to help in the distribution of this product.

  18. Vijay sol Post author

    can we implement this functionality in PHP? if yes then please suggest us ?

  19. Ahmad Atlam Post author

    Does anyone else realize how much of a security nightmare they have just created ?? MIC, speakers and web cam access directly from browser using JS !! … That ought to be very interesting to see how it unfolds, but, why do I get the feeling that very soon ALOT of pictures and recordings will be taken that shouldnt have happened. How about behind the scenes face recognition for user profiling on e-commerce ?? From now on, put a sticker on your camera, disable your built-in mic and never leave an external one plugged in … on another note, what if in the same browser, 2 pages are using webRTC ?? how do you even know if that silly pop-up is using WebRTC ?? as much as this has potential, I prefer dedicated software to do the job … very soon we'll be running servers on each browser !! Mozilla and Chrome will inherit all the computers !! we might as well just plug in a browser like the matrix

  20. Anjunakitchen Post author

    Yay, Chrome becomes bloated with native crap an OS should be handling, along with the now even more bloated Firefox.

  21. VOIP Portland Post author

    3CX's new Webmeeting application brought me here.

  22. Francois Brand Post author

    they could have conveyed all this in under 10 mins at the MOST..

  23. CamiloSanchez1979 Post author

    So if the caller has to use TURN a fee will be charged? 

  24. julian correa Post author

    very exciting ..only concern i hope is the data travels safely and anonymously

  25. GUILD STREAM Post author

    I am needing help with my web.rtc. This is what i need. The web.rtc is asking to use the and mic each time my members go to their dashboard section. I need the web.rtc to remember the members for each log in

  26. George Penchev Post author

    The only video that provides so much valuable information for beginners at WebRTC.

  27. Dankest Elf Post author

    "high quality audio and video"… starts with horrible quality video, how ironic xD

  28. Gregory Owen Post author

    Hello everyone, I would like to build a radio network on a dedicated server for about 100 users +-. How do I go about using WebRTC to build such a thing? Is there anybody that knows that wants to work on something like this with me?

    I want to build it for a Neighbourhood Watch as we want to use it for live communications. Is there a way that the communication can only be received and played by 1 person at a time? Like a walkie-talkie network?

  29. Mattias Johansson Post author

    Any info about iOS? Could webRTC be added to Chrome just like it's been addeed to Bowser?

  30. Ahihi Đồ Ngốc Post author

    I have a idea to bring my camera security to the wed service. So Web RTC can help or not. Please reply if you have any idea.
    Thank you so much.

  31. fatima-z boujrar Post author

    please i really need your help i have a php app i want to add a chat(text,video) i have all the informations about the peers in a table in mysql db like the IP the session… i need to use webRTC without using a signaling servers like nodeJS is that possible ? if it isn't , how can i use the webRTC in my app in the real world ? thanks for response please .

  32. Rifat Gazi Post author

    Free voip call,User I'd and pin numbers operator code,Rifat Gazi topup balance recharge:$20000000 for sales to worldwide activation code.

  33. Евгений Щербина Post author

    This is pure chatter by morons for morons. To babble this much crap and explain nothing!! This is exactly how not to do it. And look at their site with the "live" examples that work on their site… and explain fuck all about how to apply this on your own server!! Fucking morons!!

  34. Ahana Soni Post author

    Hello, can I make video call option in the android studio from web RTC like WhatsApp?.

  35. Jayant Barthwal Post author

    how can I record the broadcast so that I can show that after some time , I hope you understand , I have developed the live broadcast website but how to record that ,plZz help

  36. static whales Post author

    is there any A-Z training available for the webrtc?

  37. Lee Boon Kong Post author

    Documentations did not even exist for Android Native, I need to read sample from scratch, please make some documentations

  38. Alejandro Torres Post author

    I am new to video real time comunication, Is WebRtc still a thing?

  39. RISHAV BHOWMIK Post author

    stun and turn proves that WebRTC is clearly a Fake Peer to Peer Connection

  40. tuan charlie Post author

    Hi, thanks for the great vid

    Using Chrome and Opera on some mobile devices the image gets compressed vertically, no matter if using constraints or not on the getUserMedia or if sizing the video tag, while always fine in mobile Firefox, and all desktop browsers. Any clue or direction to debug? Many thanks, cheers, Mat

