The State of Images – The State of the Web

September 13, 2019


RICK VISCOMI: Hello,
everyone, and welcome back to “The State of the Web.” My guest is Colin Bendell. He’s part of the CTO
office at Cloudinary, and co-author of the book,
“High Performance Images.” And today, we’re talking
about the state of images. Let’s get started. [MUSIC PLAYING] Colin, thank you for being here. I want to start by just talking
about some of the things that developers have
probably already heard about with image optimization. Let’s start with
layout stability. What are some of the
things developers could do to eliminate that
jankiness of image loading? COLIN BENDELL: Yeah, so that’s one of the many challenges we have with images and video on the web: that experience where the user is engaging with the page, and they go to click on a link or scroll, and all of a sudden everything moves and it’s gone. We talk about jank a lot in the context of JavaScript, where you’re trying to do too much work and can’t fill the frame buffer in time. But images also play into that, because the browser may have downloaded the CSS and most of the JavaScript it needs and be doing the layout, but images are a low priority request. And so they come in later. And as they’re coming in, if the browser didn’t already have a box predefined, then when the image finally comes in, it realizes, oh, this is a 16 by 9 image, and it has to reflow everything. You’ve got that re-layout. And so there’s a
number of strategies that we’ve used in the web
because ultimately, we’re trying to stop the user from
having a poor experience, and we don’t want to distract
the user from that experience. So we’ve used in the past
things like defining the height and width of an image, which
is great in a fixed layout kind of style. And this becomes one of those
many different strategies of, well, it kind of
worked in the past. But now that I’m using
max-width and min-width, I’m using responsive
layouts, now how do I also
define that height? Because if I can’t define
that height in absolute terms, we still have the same problem. So some of the more
advanced strategies now are using techniques like
low quality placeholders, where you have an image that
maybe is encoded in base64, so it’s a very
small image or maybe even an SVG, that gives enough
information to the browser early while it’s doing layout
to be able to already define what the height and
width of that element is. These are all challenges
with the browser. And later on, we can talk about
some of the new enhancements, like intrinsic sizes and
aspect ratio, which are also trying to provide
us new techniques to give that layout piece. But ultimately, the biggest bang for your buck is to make sure you have those widths and heights defined. Low quality placeholders are a good way to start with that, and then we can start to transition to intrinsic sizes and things like that.
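To make that concrete, here’s a minimal sketch of the two approaches Colin describes– fixed dimensions on the img tag, and the newer aspect-ratio-style CSS he alludes to. The file names and dimensions are illustrative:

    <!-- Fixed dimensions let the browser reserve the image's box
         before any bytes arrive, so nothing reflows later: -->
    <img src="hero.jpg" width="1280" height="720" alt="Hero image">

    <!-- In a responsive layout, CSS can reserve the space instead: -->
    <style>
      img.responsive {
        max-width: 100%;
        height: auto;
        aspect-ratio: 16 / 9; /* the box is sized before the image loads */
      }
    </style>

RICK VISCOMI: Another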
age-old battle developers have been fighting
with images has been the balance between
quality and file size. And the image format
has been one way of ensuring that you
minimize the file size. So what is the conventional
wisdom for using the right file format? COLIN BENDELL: Almost every
day, there’s a new file format that somebody’s talking about. And we’ve got a lot of
great standby formats– JPEG, PNG, GIF. I mean, if you’ve
been on the web and you search around
for image optimizations, you probably have seen a
lot of these talks talking about these standard formats. And even some new formats,
like WEBP, or FLIF, or other standards out there. Now the challenge is that
each of these formats have a different lineage,
a different history. JPEG’s been around for
25 years, 26 years, and it’s a great format because
it does a lot of great things to reduce the bytes
per pixel and give you a really nice, small image. But it’s designed for photographic content. So inside of a photograph, you’re expecting to have gradients of colors– that sunset picture or the clothing– and that’s not always the kind of content you’re serving. So you have different
formats that actually excel in different ways. So you have GIFs
and PNGs, which are great for more
illustrations or where you’re dealing with palettes or
solid colors in large swaths. So that way, in the
optimization of that format, it’s looking for those
patterns and basically reducing the bytes
down, and so then you’ve got a smaller image that way. That’s PNG. But where they find those patterns varies dramatically between these different formats. And so it really starts to come down, first off, to the use cases. What are you trying to do? What kind of content? Are you trying to create icons, illustrations? Are you doing photography? Are you taking pictures for a news event, or a picture of your plate at dinner? So each format has
strengths and weaknesses to how you address it. And then you also can talk
about formats like SVGs, which are great for vector-based
imagery that can scale based on the display area. You don’t have an
exact pixel that needs to be stretched or shrunk. So those are the
formats that we have. But the challenge
always comes down to, what does the end user support? The browser or the app that they’re consuming this content in– fortunately, here you are in 2019: JPEG, GIF, and PNG are supported everywhere. They’ve been around for a long time, and these libraries are very old. They’re very robust. So when we talk about new formats, the first hurdle is, how do we make sure that they can be supported?
compare performance-wise? Both in decode, but
also byte savings. Yeah, and so those standbys
are always a good first place to start. RICK VISCOMI: And some of those
features that you may want out of your images, like
transparency, just aren’t even supported in some formats. COLIN BENDELL: Oh, exactly. So JPEG is great
for photographic content, but doesn’t do transparency. GIFs can do transparency, but only single-bit, whereas you can do much better transparency with PNG– you can do that anti-aliasing with PNGs. So then you’re also trapped with these trade-offs of, what am I trying to accomplish, and what’s the best format to get the byte savings down? It’s good to note this paradigm that we’re evolving into. We’re in this Instagram age. We’ve got lots of
images everywhere to grab our attention. And it almost seems like
images are an old thing. We should have solved
this a long time ago. But yet, it’s a new thing. We have this tension because
we have content creators who are trying to create
beautiful experiences, and then us
developers are trying to get a good
score on Lighthouse and get our page
load time down, and we have to balance these two objectives with just the format selection. RICK VISCOMI: Exactly. So once you’ve
chosen a format, what are some of the knobs that you
can fine tune to optimize image quality? COLIN BENDELL: Yeah. OK, so the standard ones
that you’ll probably see a lot on the web,
for JPEG, quality index. So you change the quality
factor and you’ll probably hear, use quality 80 as a good– now it’s probably a
good spot to talk about, when we say quality, we’re
not talking about percentages. A lot of people think,
quality of 80% on JPEG. Actually, it’s a scale with a base of 100, and each of those 100 values maps to a quantization matrix. An 8 by 8 matrix is applied in the JPEG world to this image to try to reduce the number of colors, if you will, in each block. So if we can make all those whites the same kind of white, then we can easily compress those series of pixels out, and we can get better savings the more aggressively we can take the range of colors down. So the quality index in JPEG is
the one most people talk about. You start with quality
80, quality 90, you get a lot of bang
for your buck on average. Now there’s other strategies. And we can talk
about other ones, like using chroma
subsampling, which is a strategy that comes
from the TV industry, where you can separate the
colors, because your eye– you’ve got cones
and rods, and you are more sensitive to
brightness than you are to the color itself. So in, say, JPEG,
you can express the bitstream of color– instead of just RGB,
you can do it in YCbCr. That’s the chroma,
which is the color, and luminance is the black
and white or the brightness. And so chroma subsampling is a
technique where you basically keep the luminance channel, and
the chroma, you can, in say a 2 by 2 grid, share
the same colors, and put one pixel of color. And blending that with the black and white, or the luminance channel, it’s perceived like different colors, but they’re actually the same. This is generally how TV works– it’s how we got color TVs back in the ’50s, by doing this trickery, playing with our eyes. And so JPEG can support that. PNGs, GIFs, not so much.
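To put numbers on that 2 by 2 grid, here’s the raw-sample arithmetic for the common 4:2:0 subsampling scheme– a back-of-the-envelope sketch, before any entropy coding:

    \[
    \underbrace{4}_{\text{luma } Y} + \underbrace{1}_{C_b} + \underbrace{1}_{C_r} = 6
    \ \text{samples per } 2 \times 2 \text{ block, versus } 4 \times 3 = 12 \text{ for full RGB}
    \]
    \[
    \frac{6}{12} = 50\% \ \text{of the raw samples, with little perceived loss}
    \]

Another one is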
progressive displays, so how you progressively
provide that content. So if you do one quarter of
the image and then one half, and then full, general
terms, those scan layers, as they apply, they bring in
the resolution of the image. And by doing that,
you’re actually able to save some bytes in
those layers in the JPEG world. So if you were to
start from somewhere, I would say start with
these three levers– quality probably 80 or 90,
use chroma subsampling– again, there’s footnotes there– and enable progressive
on those images. RICK VISCOMI: So we should
also talk about mobile, and how not only are the screen sizes getting smaller, but also the data plans are getting more expensive. So what role have images played in adapting to the mobile world?
a lot of research on my own about the consumption
habits by users, and looking at how different
users in different environments consume the web, what kind
of formats are available, and how is that in
different experiences. And I always had this mental
model of mobile and desktop, and that there is
these two paradigms. And if I track it
by time of day, I expected to see like desktop
traffic go up in the morning. People go to work. And then I expected it to peter off in the afternoon and evening, as they go home, and then mobile traffic kind of carries on. It turns out that I
was completely wrong, that in today’s
day and age, what we see is, yes, the morning,
8 o’clock, 9 o’clock, you see traffic of desktop,
but mobile’s always there. And then in the afternoon,
you have this decline. You know, 5 o’clock–
slight, slight decline, but it continues on all the way. And mobile continues
on and it bumps up. So the first distinction
I wanted to make is that there really isn’t a
mobile versus desktop world. Desktop is not just a gray box at our desk anymore. A mobile device
includes our laptops, which we classify as
desktop, but really, we carry that around on the subway. We do work. So the minor correction
I want to make is that yeah, desktop is
actually a mobile device. So to your question, the real
question you asked about, what should we do for
mobile strategies, well, the first place
we start is, of course, we should be dealing
with responsive layouts using flexible grids and
all of the great stuff to give that consistent
experience, regardless of your form factor,
whether you’re viewing it on a tablet,
or a phone, or a desktop. But this has a number
of other challenges because you now have a very
wide screen or a very narrow, portrait and
landscape challenges. So where do I start? So the good news is that
for a number of years, we’ve had support for
responsive images. So you can use srcset and src– these two attributes
on the img tag. srcset allows you to
specify to the browser, hey, I have a number of
different versions of the same image that
allow me to define different breakpoints. So if you have a
wide image and you want to say on a giant screen
versus a smaller screen, it’s the same image, but I’ve
got different pixel volume. So srcset allows
you to say, I’ve got a 200-pixel-wide version,
a 300, a 500, a 1,000 pixel, et cetera. But the same image
just being resized. The attribute we have–
well, it’s not an attribute, it’s an element– is the source element. So in the picture tag, you can
now have a source that says, I have not just one
image here, but I have actually different
cuts of that image. So I have the image
that is square, perhaps, in this layout,
but meanwhile, I’ve got a wider version, as well,
if I’ve got a desktop view. So what can I do with that? See, you can’t specify
your media queries and then you provide
that different context so you can have that very fluid
experience of the same image, but instead of just
resizing it, and making it squishy and small, but
you’ve got maybe different cuts, from portrait, to square, to
letterbox, whatever, based on the experience of the web. RICK VISCOMI: You make a good
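As a rough sketch of the markup Colin is describing– the URLs, widths, and breakpoints here are illustrative, and the sizes attribute is the companion piece that tells the browser the image’s layout width:

    <!-- srcset: the same image at several widths; the browser picks one -->
    <img src="dress-500.jpg"
         srcset="dress-200.jpg 200w, dress-500.jpg 500w, dress-1000.jpg 1000w"
         sizes="(max-width: 600px) 100vw, 50vw"
         alt="A dress with sunflowers">

    <!-- picture + source: different cuts of the image per media query -->
    <picture>
      <source media="(orientation: portrait)" srcset="dress-square.jpg">
      <source media="(min-width: 1024px)" srcset="dress-wide.jpg">
      <img src="dress-500.jpg" alt="A dress with sunflowers">
    </picture>

RICK VISCOMI: You make a good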
point about you can never really be sure, is a desktop
device purely the gray box sitting at home? But for example, I use my mobile
phone at home on my home Wi-Fi, where the network isn’t
really affecting me. So you can’t really make too
many assumptions about, well, I need to serve
low quality images. They’re on a small
form-factor device. It’s not always true. So do client hints
help at all here to understand the context
of the user’s experience? COLIN BENDELL: Yeah. So client hints is a strategy
that allows the browser to send additional
information along the wire to the content servers. And now, this is
at the HTTP layer. So you enable it through
markup or through HTTP headers, saying, enable the client hints. And the different client hints that are available today are Width, Viewport-Width, and DPR. So DPR– sorry, let me start with Width. Width is how wide this specific image is in the layout. Viewport-Width is the actual screen size– the viewport. And then DPR is the density of the pixels– 2x or 3x. These hints, when they’re sent
across the wire in the content negotiation, allow
the content server to say, oh, I see you’re making
a request for this image. I see that your layout is 500
pixels wide for this image. I’ll send you an image
that’s appropriately sized. The objective here is it’ll help you with your markup, so you don’t have to have all of the verbose markup that says, hey, here’s all of the different srcsets and all the different source elements. You can now simplify that down and let the content negotiation define what should be sent to that client.
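A minimal sketch of that flow, assuming a server that understands the hints. The meta form shown here is the markup opt-in Colin mentions; an Accept-CH HTTP response header is the equivalent server-side option, and the file name is illustrative:

    <!-- Opt in to client hints for this page: -->
    <meta http-equiv="Accept-CH" content="DPR, Width, Viewport-Width">

    <!-- The markup stays simple; sizes gives the browser the layout
         width to report in the Width hint, and the server responds
         with an appropriately sized variant: -->
    <img src="/images/dress.jpg" sizes="500px" alt="A dress with sunflowers">

Now, I’ll put a footnote here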
because client hints have been around for a number of years. Some great work was done
to bring it to fruition, but it also has flagged a
number of security concerns around fingerprinting, because
those client hints were also being sent to everybody. So we’ve had, actually, some
really good developments in the Feature Policy
spec, and where we’re evolving client hints now
to be much more security-aware. In fact, it’s going
to hopefully bring us to a spot where we can be a lot
more security-first conscious, and actually help us
lock down the user-agent, because the user-agent’s
such a terrible HTTP header. So this way, we’re
enabling the developer to say, these are the
sites that I trust to do content negotiation,
where I want my image server or my video server
to actually be able to do that negotiation
for width, and height, and DPR. But I don’t want those
ad networks doing that, and I’ve got control to make
that decision of what I trust. Because the
opportunity’s not just about providing
the right content, but also using other details
about the environment that might give out a hint of, how do
we get content to the eyeballs as fast as possible? So if you’re using, say, network
RTT, or just looking at the congestion window– these are very low-level TCP signals that allow the server, perhaps, to recognize that hey, you’re on Wi-Fi, but it’s a hotel Wi-Fi and it’s not the
greatest experience. It’s really slow. So perhaps a server
can then say, hey, I want to give you
actually a half-DPR image, so the pixels are going to
be half the volume of pixels so I can send you across the
wire a much smaller file, so I know it’s
going to get there. And I’ll let the browser
do that upsizing for you. So client hints
can, in that regard, help that content
negotiation back and forth. RICK VISCOMI: Images also
play an important role in accessibility. And according to the HTTP
Archive and Lighthouse, only about 50% of mobile
images have alt attributes set. So this is a drum we’ve been
beating for a very long time, and it’s kind of surprising to
hear that only half of images are properly set up for this. So why do you think
this is still an issue? COLIN BENDELL: We started
off talking about, hey, there is this tension between content
creators and us developers. The reality is that when
I’m building something, the first user is me. And so it comes from my
experiences that I start from. And the alt tag is a good
example of accessibility. You’ve got img, src,
blah, blah, and then alt. The alt tag is used by
anybody using a screen reader. And if you’re using
a screen reader and you don’t have
that alt tag, the person with visual difficulties is literally going to be listening to the URL string of that image, right? Terrible. Now, the alt tag
gives that person the opportunity to
know that this is a dress with sunflowers on it. This is information that I care
about when I’m looking at it.
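In markup terms, it’s one attribute. An illustrative example, borrowing Colin’s sunflower dress:

    <!-- Without alt, a screen reader falls back to reading out the
         file name or URL; with it, the listener gets the meaning: -->
    <img src="/products/dress-sunflowers.jpg"
         alt="A yellow dress with a sunflower print">

But the challenge is that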
I, as a developer– well, I’m fortunate enough to have good eyesight, so I don’t suffer from this.
to remember to add the alt text, the description
attributes, for the rest. And so unless you’ve got that
direct, tangible experience, it’s easy to forget. So fortunately, we’ve got
things like Lighthouse that have the accessibility
scores in there that help remind us, but
it’s a challenge. Now, there’s actually
some cool stuff. There’s a Canary
version of Chrome that has machine
learning generating some of those alt tags. I think we’re seeing some
really cool stuff coming out in that angle to help
with the accessibility. But even a machine
learning-based system that’s going to generate tags– it’ll say that it’s a dinosaur on a skateboard. Well, is that really what it is? It’s still best to author that content yourself. RICK VISCOMI: It’s
a good safety net, but developers should
still be providing that alt text if they can. COLIN BENDELL: Absolutely. And so when we talk
about images and video, there’s actually a whole
spectrum of that experience. So it’s not just
about the alt text, like why we’re doing that. Oh, because we want to
feel warm and fuzzy inside. Well, no, we’re trying to
do that for accessibility. It actually also helps with
SEO and discoverability, so there should be a lot of
people championing for this. But you also should be
thinking about the experience as a whole, because
we’re dealing with eyes that are making decisions. It’s our lizard
brain, if you will, like we’ve got these cones
and rods that go straight to our visual cortex in
the back of the head, and then it wakes up our
mammal brain and says, hey, you want to pay attention. So we’re creating visual
content experiences on the web to attract
you, to get that hook, to get above the
noise of the web. But our eyes aren’t
all the same. Some people– colorblind. We’re taught in school
that we’ve got three cones. Some people only have
two cones, or have two and a third cone that’s a little bit less sensitive. So that’s where you get the
gradients of red-green color confusion. But there’s also a percentage
of people, particularly women, that have four cones that
can actually see more colors. Now the interesting
bit about this is that all of our
physiology research, particularly around vision,
is all based in the 1940s, ’50s, ’60s, and it’s
predominantly dudes that did all this research. And depending on who
you’re looking at, there’s different people
that will say, well, maybe only 12% of women
have this fourth cone. Some will say that there’s
up to 50% of women. So we have this other aspect
here of visual experiences that are different based
on possibly gender, and there’s just no research to
really say how much that really makes a difference. But it does make a
difference when we make some of our decisions. So I said use 80
quality factor in JPEG. I take that back because that’s
almost a terrible choice, because if I’m
saying quality 80, I’m assuming some
editorial control. But I’m a dude that has spent
my time in front of code, and I’ve not spent my time
listening to my lizard brain, learning how the creative
expression works. And so I can very easily
make a poor choice by saying quality 80 that will actually wash out those reds– where a person who has better apparatus than I have would actually be able to see the difference between those reds and say, no, that’s not a good red. And if you look at it, there’s
very few textile websites out there that offer feedback. There’s a few out there that do. Lululemon is an example where,
if you look at the comments, the most common
recurring comment is that I love this
clothing, but the color wasn’t what I thought it was. And it’s always about the color. It’s not exactly what
I expected it to be. And there’s this
downstream impact. They’re returning products
and things like that. But we expect the image to
be a representative reality. But if we’ve got people making
decisions and choosing a quality that waters down those
colors, then that experience could be diminished. So we have to be
conscious of it. So the fortunate
thing is that we also have a lot of
algorithms now that can help us to find the
right combination of chroma subsampling, or whether
it be quality factors, and decide how do we
preserve that experience? Structural similarity, SSIM-based algorithms try to make it more equation-based and more math-driven, looking at the experience, and the emerging ones also look at the colors as a part of that input, so
that you can have confidence that yes, I want
to reduce bytes, but I want to make sure that
it meets the expectations of the content creator. So the accessibility
is one aspect of it, but also the
emotional experience, or trying to grab that user
and bring them in and get that experience consistent. RICK VISCOMI: That’s
such a great example about the shopping cart
thing because a user may look at a color in
an image and not expect to see it arrive at their house in a totally different color. So the way
developers actually fine-tune their image
quality doesn’t just save bytes on a wire and
dollars in egress costs, but it actually
could incur costs in return fees and bad reviews. And there’s a real
monetary cost to that. COLIN BENDELL: Absolutely. I would love to be able to
do some research on this. This is one of these areas I think we haven’t explored enough. In the real world, so many times we
walk down the street and we walk by this fruit
cart, and all of sudden we stop and we’re like, oh, yeah. Actually, I would like
to have that apple. I’d like to make that choice. I’d like to buy that apple. And I was actually hungry. Yeah, I didn’t know about
it, but I was hungry. We’re making these decisions
because of that lizard brain seeing the environment. When we have the
right combination of colors and experiences,
those decisions get triggered. So if we had the web without images, it would be very fast, but we probably wouldn’t see anybody actually engaging with your content, because the hook isn’t there. And our content creation teams
have spent their entire career learning how to get that
hook, that emotional connection that matches reality. RICK VISCOMI: I want
to ask you about WEBP. It’s been around since 2010,
but I’m ashamed to admit, I haven’t quite adopted it yet. So what do you think is the
current advice you would give to somebody about WEBP? Should I be using it? COLIN BENDELL: It’s been
around, what, since 2010 or so? It’s a great format
that has hit the web in the absence of other
formats getting large traction. And as I mentioned
earlier, there’s a lot of different
competing formats out there. But WEBP– it’s supported
by Chrome, predominantly, and more recently now we’ve
got Firefox and Edge that also support Chrome– sorry, support WEBP. Now, WEBP is trying to
learn from the past, learn the techniques that we’ve
developed over the last 20 years and be able to
apply new algorithms that can save more bytes. And so it’s a really
good mechanism. I’ve got data that suggests
that with JPEG, we’ve got techniques, and
with, say, MozJPEG, we can save another
10% of bytes. With WEBP, depending on
the size of the image, we can save anywhere from
10% to up to 30% bytes. And it does depend. So small images, actually we say
you see that 30% byte savings. As you get to a larger
image, it gets watered down, so it’s about 10% over what
you could do with JPEG. So that’s some significant
byte savings across the wire. If you’re constrained on
that cellular network, then getting that visual
hook in front of the person– that’s a great win. But it comes with
some cons, as well. So JPEG’s been around
for a long time, as I said, and it’s a
really fast decoder. So when we’re
dealing with WEBP, we have to deal with
modern browsers. So we have to hope
that everybody’s adopting the latest browser
like you and I might be. The reality is that
not everybody does. I still see a lot of
Lotus Notes traffic. There are people getting web content in the browser and it’s from Lotus
Notes, or it’s from an old version
of Internet Explorer, or old versions of Firefox. The internet’s weird. There’s a lot of
stuff out there. So the one part
of it is adoption.
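One common way to handle that adoption gap, sketched with illustrative file names, is to let the browser negotiate via the picture element’s type attribute, with a JPEG fallback for everything else:

    <picture>
      <!-- Browsers that support WEBP take this source... -->
      <source type="image/webp" srcset="product.webp">
      <!-- ...everyone else falls back to the JPEG: -->
      <img src="product.jpg" alt="Product photo">
    </picture>

The second challenge is,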
of course, the use cases. With WEBP, we have,
finally, the ability to do transparency and even
animation inside of it. So we’ve learned those
lessons from the JPEG days. But we are also constrained
in two big ways. One is it’s forcing
chroma subsampling. That’s what I was mentioning
about the technique that we’ve learned from the
video industry– the luma and the chroma– and being able to reduce the amount of chroma. You’re hacking the eye.
detail but you’re able to get away with it. But what WEBP does is it forces
a 4:2:0 chroma subsampling, which means that there are
many cases where it’s not a good choice to use WEBP. For instance, in Asian
markets, most times you’ll have an image of whatever product you’re selling, and you’ll have the logographic characters bonded over top of that image, maybe with a red outline. Well, those hard lines are
very problematic for chroma subsampling. They become blurry
and they’re not great when you have that
blurry effect. So any time you have hard
lines, or illustrations, these are cases where WEBP just
falls down and doesn’t give us the flexibility that we need. And then the other gap is
color palette, color space. So on the web, we’ve built everything based on 1996
standards around SRGB. If we looked up the amount
of colors that the typical– I say typical, quote unquote– three-coned human
can experience, if we plot those reddish,
greenish, and bluish cones, we get this– you’ve probably seen, it’s a
horseshoe shape, the XYZ color space. All possible colors. That’s not two-dimensional. It’s actually a three
dimensional view, but you get the idea. SRGB was a standard established
in the late 90s that said, we can represent
these sets of colors– the triangle of colors– and we can represent that on
the web, and in LCD displays, and so forth. So all of the web has been
based around this SRGB that limits to 8 bits per channel,
so you have this 24-bit image color space that is really a
fraction of the full colors that you can actually see. So with JPEG, there’s
some hacks you can do to kind of get
over that if you can beg, borrow the pixels in the index. But you’re still limited. With WEBP, you’re
limited to this SRGB. We need to move
beyond that and to be able to support more
colors, because if we’re trying to get that
hook of that user, we need to represent reality. And SRGB is just a small
percentage of that overall. Now, we’ve got devices– if
you’ve got a modern Android device or modern iPhone, you’ve
got devices that can do P3. P3 is an even wider
set of colors, both in the overall
sets of color, but also the discrete
colors between. And if you’ve got a 4K TV at
home, you’ve probably got Rec. 2020 with HDR,
which is even larger. We’re closing in on what we
could do with analog film back in the ’70s, the colors
that we can reproduce. Still not 100% of reality. So we talked about formats. You know, these are
some of the gaps. We’ve solved some, and
WEBP is great and available in a lot of browsers. But we have these other
upcoming challenges that we need to also solve. So yeah, the internet is weird. [LAUGHTER] RICK VISCOMI: I
always say that, too. COLIN BENDELL: Yeah. RICK VISCOMI: So
you had mentioned that WEBP supports
animations, and that brings me back to the
days of those blinking “under construction” GIFs
that you’d see on the web everywhere. So we’ve obviously
come a long way, but what is the state
of animations today? And what would you recommend
for people who want to use them? COLIN BENDELL: Yeah,
so animations– it’s one of the spots where it
feels like it’s been stagnant for a long time. But surprisingly, it hasn’t died out. Everybody assumed– animated GIFs are terrible. Just to put it out there. They’re terrible because they’re large. For a very short clip of, say, a TV show, if you have a sequence of that, you could be racking up megabytes of data really quickly. But they’re really good at
expressing meaning and intent. That’s why they’re so popular
in social media platforms, because you’re able to express
an idea really quickly. And you’ve got, it’s like
a video, but it’s muted. It loops. It’s got a whole different
creative aesthetic. But GIFs are really the
only option out there. Now, we’ve done a lot of
hacks to get you higher color palettes. It’s not just 256,
but you’re able to get the effectiveness of a
larger color palette, even though it’s not there. But we still have this challenge that the images are really large. With WEBP, learning
from the past, they obviously brought in the
ability to support animation. So that 1.7 megabyte
animated GIF in a WEBP is maybe about 300 KB. That’s pretty awesome. But that’s only supported inside the Chrome environment. Last year, Apple and
WebKit, in Safari 12, brought in support for
image source equals MP4. So you can actually ship
an MP4 to an image tag and get an animated
GIF experience, where you can have a HEVC or
an H264 payload inside an MP4. It loads like an image tag,
but it’s muted and it’s looped. And so then the
advantage there is if you’ve got animations that are more picture-based, because HEVC and H264– those are really good codecs for real life, for video– movies, TV, that kind of stuff. So if you’ve got animations
that have that kind of content, then you’ve got a great vehicle
that brings that 300k in WEBP down to maybe 50k in HEVC. In the contrast,
though, you still have these gaps of
palette-based animations. If you’re trying to illustrate
the ionic bonds for textbooks and things like
that, those cases, actually GIFs might be
still among the better, and WEBP are good for that. With these video codecs,
they become a little bit more blurry, so maybe not as good. So there’s still
lots of room for us to figure out how
to best deal with these different composites. Just one little quick
anecdote, though. It’s not just those young
kids with their animated GIFs. Talking with a number of people in the ad tech industry– they’re trying to figure out how to get past people putting on ad blockers because, well, it’s a crappy experience, and to up the game in the quality of the ad so that ads are relevant and people aren’t blocking them. And there’s emerging research showing that, yeah, animated GIFs are actually really good for conversions. But they can’t be too noisy. Too noisy, really
disturbs people. But cinemagraphs–
subtle movements– create great engagement
for the ad tech. In fact, it’s working
great even in commerce. You probably see those like
the girl with the dress, moving the dress
so you could see the color, the light
refracting a little bit. So I think as we can start
solving some of these technology bits, we’re going to
see the content creation kind of explode
looking for animations. We talked about animation
as an image tag. The age-old recommendation, of course, is to use video tags. If you’ve got an animation,
just convert it to MP4 and use a video tag. That’s probably
actually the best way to get the mix of all worlds. You don’t have to
depend on Safari just to get the image as an
MP4 in an image tag. So if you can
change your markup, you change your templates
to use video, that’s great. The challenge is
that not everybody can modify their markup. If you’ve got a CMS system, or if you’ve got some other
platform, then you’re going to have to sometimes
live with the tools you have. So this is where there’s
different strategies we can play. If you can play with the markup,
then great, use the video. Make it looping and muted.
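A minimal sketch of both routes, with an illustrative file name:

    <!-- The video-tag route: an animated-GIF-style experience from an MP4 -->
    <video src="clip.mp4" autoplay muted loop playsinline></video>

    <!-- The Safari 12+ route Colin described: an MP4 in an image tag -->
    <img src="clip.mp4" alt="Short looping clip">

But if you are bound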
by the content, or it’s going to be six
months, a year before you can get the next code
release out there that can do that conversion, then you
still have lots of other tools available to you. RICK VISCOMI: One
more HTTP Archive stat that you’ve probably heard. About 2/3 of the average web
page is made up of image bytes, so practically
speaking, if developers do optimize their
images, what types of impact on the user experience
could they expect to see? Maybe web performance-wise? COLIN BENDELL: Yeah. So this is one of those
areas that is a yes, but no. We see the stats and
we say, where am I going to get my byte savings
to get my user engagement up? And how do I get my
Lighthouse score up? And it’s easy to say, pick on
the big bully on the street. But yes, there’s a large
amount of bytes on a web page that are dedicated to images,
and video, and animations. That’s because we’re
trying to grip that user, engage that user, because
people don’t read. Let’s be honest. You want to see the content. But I think I
mentioned this earlier, that images are a low priority
resource in the browser. So when the browser’s
parsing the HTML, it’s going to go and look for
all of the CSS and JavaScript in the preloader. And if it finds images, it
will make those requests in the network, but it’ll put
those as a low priority request so that we get the CSS back
and we can do the layout. So if you’re going to save
bytes on images, most times, it’s really the
JavaScript that’s still holding up the train. Unless you’ve got really tight
CSS and JavaScript already, saving bytes on images
will not give you the same kind of big– like
chopping 50% of your bytes is not going to speed
up your page 50% because you usually have
a lot of this JavaScript. There’s a counterbalance though. The counterbalance is
that a lot of times you’re handcuffed to the system
and the solutions you have. Saving bytes can actually
help in some situations where you have a lot of
JavaScript that’s injected by tag
managers and things like that, where the
browser is actually getting confused and
getting things out of order. I see lots of cases
where the preloader finds all of the images
first before it finds the CSS and JavaScript,
ships those requests, and then realizes that there’s
more JavaScript to load and it has to wait
for those images to come across before it
can do the other work. So in cases like
that, where you’ve got the browser
discoverability problem, those are cases where you
can improve the base load. But there’s another
challenge, another metric that I think we often need to look at, which is not just about the raw time to first byte, or the time to interactive,
or the first paint, and things like that, but it’s
how long does the user stick around? So the best practices
are around making sure that images are small
byte-wise, and we are making sure our CSS, and
web fonts, and things like that are discoverable
and loaded first. But we want to make sure that
those images and video are loaded so when the
user starts scrolling that that content is there. So the balance we want to find
is deferring image loading, and there’s a lot of
strategies to do that. I mentioned earlier low
quality placeholders. We put a little SVG or
inline text, to lazy loading, where we just use
pure JavaScript and don’t load that image until
the user actually scrolls. Those are good strategies to
make sure the bytes aren’t plugging up the pipe. But you also then want to
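Here’s a minimal sketch of that pure-JavaScript lazy loading, using IntersectionObserver with a hypothetical data-src attribute holding the real URL:

    <img data-src="photo.jpg" width="800" height="600" alt="Photo">
    <script>
      // Swap in the real URL only when the image nears the viewport.
      const io = new IntersectionObserver((entries, observer) => {
        for (const entry of entries) {
          if (entry.isIntersecting) {
            entry.target.src = entry.target.dataset.src; // start the download
            observer.unobserve(entry.target);
          }
        }
      });
      document.querySelectorAll('img[data-src]').forEach(img => io.observe(img));
    </script>

But you also then want to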
counterbalance that with once the layout has happened, once
you’ve got all the JavaScript, you want to make sure that
as the user goes to start scrolling that they’re there. You want to keep them there. You want to keep the user
going through their journey. So you want to then
open up the gate and make sure that those images
start to flow in after you’ve got all the essentials. So this is give and take
that you need to adjust. But don’t expect the world
to move on those core metrics for image byte savings. It will, depending on your market. But what you will see by optimizing those bytes is your sessions, your user
engagement, as they interact– those metrics should
start to improve, as well. RICK VISCOMI: So what
is the ideal solution of optimizing the bytes
with the quality of images? Is it for people
to spend more time understanding all of
these different levers on their configuration? Or is it we need to rely more
on computers and algorithms to do the right thing? COLIN BENDELL:
What you said, yes. Yeah, this is the challenge. We’ve got a lot of
different formats, a lot of different
levers to use, and a lot of different contexts. And the good news
is that we have a lot of developed
industry knowledge to run through that
rubric to evaluate the context of those images. So this is where we
are really getting good as an industry, being
able to classify, is this a picture
or an illustration? Has this got lots of
high action movement or low action movement? Has this got lots of colors,
or is this grayscale? And from there, what
is the right experience we’re trying to drive? So should we use this
knob and this knob? Should we change the quality
low and quality high? So yeah, absolutely we should
be depending on algorithms because if we’re
changing our markup, there’s a lot of different
ways we can be expressing that. And there’s almost a
hamster in a wheel effect where there’s always
another technique that we can apply that we
haven’t thought about before. There’s a lot of
great tools out there that allow us to just
do that automatically. And you can see that. I think Squoosh– there’s
a nice tool there. You can toggle the different checkboxes– if I want to change it to MozJPEG or whatever– and you can see the impact. But there’s also great tools
that will do that for you that you can apply that in
your build chain, apply that into your CMS, et cetera. RICK VISCOMI: Not to mention,
you’re not always in control over the images that
get loaded on your site. You’re not always
putting them on your CMS. Sometimes your own users are
uploading them to your site. For example, my
avatar or something. You need those algorithms
to process images outside of your control. COLIN BENDELL: Yeah. And there’s a whole
actually other dimension here from a content
reputation perspective, from content you
maybe don’t want to have on your platforms
for legal or moral issues. There’s also security concerns. Images, and videos,
and things like that have had histories of
other vulnerabilities where they become a vehicle
for passing on a CVE where a certain decoder,
if you’re using ImageMagick or something like that,
could be exploited with this certain
kind of payload. So this is, yeah,
absolutely where the algorithms come into play. Depending on algorithms
would be my number one recommendation for dealing
with qualities and format selections. And then as we move
into the other domains, like adjusting for colors
and things like that, that’s where you now should
be focusing your attention with your creative teams. RICK VISCOMI: Finally, would
you recommend any resources for people to learn more
about image optimization? COLIN BENDELL: There’s a
lot of great resources. Adi has a great
site, images.guide. There’s a lot of
great content there. I’ve published a number
of different articles around the role of client
hints and some of the stats that we see there, and the
use of animations in Safari. And on the web.dev
site, there’s a ton of great resources about
different strategies, different tools
you can put in the build chain to kind
of fill that flow. RICK VISCOMI: Well, Colin,
this has been great. Thanks for coming on the show. COLIN BENDELL: Thank you. RICK VISCOMI: If you’d
like to find links to everything we talked about,
check out the description below. Thanks for watching. We’ll see you next time. [MUSIC PLAYING]
