Make the Web Fast: Automagic site optimization with mod_pagespeed 1.0!

November 9, 2019


ILYA GRIGORIK: Hello everyone,
and welcome to our “Make the Web Fast” series here on
Google Developers Live. Today we’ll be talking about
a very awesome tool called mod_pagespeed, which
is a performance JIT for your website. So first off, my name
is Ilya Grigorik. I work with the “Make
the Web Fast” team. And with me today– JOSHUA MARANTZ: Hi. I’m Josh Marantz, and I work on
the mod_pagespeed team. ILYA GRIGORIK: Awesome. So Josh, I heard through the
rumor mill that there is an upcoming 1.0 release
for mod_pagespeed. JOSHUA MARANTZ: That’s
correct. The rumors are correct. We are releasing mod_pagespeed
1.0 next week. After two years of beta it’s
ready for broader adoption. ILYA GRIGORIK: Wow. So true to Google philosophy of
keeping everything in beta, that’s two years and quite
a few users, I think, as well, right? So it’ll be really interesting
to dive in and understand what’s happening underneath,
because I spent some time looking at what you guys
have built, certainly worked with the team. And the first thing that
stands out to me– I guess we have some slides
that I’ve prepared– is that if you care about
performance, there’s a lot of stuff that you need
to care about. In fact, my officemate, Steve
Souders, wrote a couple books on the subject. And there is stuff like image
compression, combining your resources, and deferring
JavaScript, cache extensions. Frankly, it’s a full-time job to
keep up with all the things that you need to do to keep
your website fast. JOSHUA MARANTZ: Yeah. We believe that if you’re
building a website, you’re trying to convey information. You’re trying to sell a product,
gain acceptance of an idea, communicate something. You’re not trying to hack your
website to make it faster. So the less you have to focus
on that, the better. Now, it is important that you
make websites fast because you get more engagement,
happier users. People will come back. But you don’t want to spend
all your time doing that. ILYA GRIGORIK: Right. And that’s actually a very good
point, very important point, which is of course
we care about speed. But speed in itself can be
almost a full-time job, because even these best
practices change over time. The browsers get smarter. We get more video on the web,
more images on the web. So these things shift, and you
need to keep up with it. And not only that, but some
techniques, like let’s say spriting images together or
inlining resources on the page, are actually pretty heavy
in terms of required work, either for the design
team or for the dev team. So it adds additional
complexity to your development cycle. So it’s not all just easy
wins along the way. JOSHUA MARANTZ: And the other
aspect is that sometimes it’s hard to draw a balance between
being able to keep your site up to date and keeping
it cached. For example, everybody who
works on web performance knows that the more assets
on a website that are cached, the better. So if you go back to a typical
newspaper site every day to see what the news is, you’ll
typically wind up downloading their JavaScript and their CSS,
which hasn’t changed– usually has not changed– every day, because they
have to set a fairly short cache lifetime. I typically see a minute or an
hour at the most so that they can push changes
out when they want to. But they don’t do
it every day. They probably do it once
every few weeks. And you want to be able to
change your website on the fly and have that propagate quickly
to everybody, but also have it so that when you don’t
change it, it stays in people’s caches. So this is a complicated and
messy thing to do manually. It’s a very easy thing
to do automatically. ILYA GRIGORIK: Right. And I think that is the core
insight behind mod_pagespeed, which is to say, sure, you
can apply all these optimizations yourself. And in fact, you should know
them, because they should be best practices on your team. But we can, in fact, automate
some of this stuff. And that’s what mod_pagespeed
is all about. That’s why we say it’s a
performance JIT, Just In Time compiler, in your web server. So maybe you can tell us a
little bit as to what that actually means and how
you guys have gone about doing this work. JOSHUA MARANTZ: Sure. There are a number of approaches
to automated website optimization. And our approach was to make
it really easy to adopt. So half the websites around
the world are powered by Apache web servers. And so what we did was we
packaged our optimization framework as an open-source
Apache module. So you pretty much in three
commands can download our package, install it, and restart
Apache, and your website runs faster. ILYA GRIGORIK: That’s a pretty
compelling pitch. JOSHUA MARANTZ: There’s
then more you can do. There’s a core set of filters
that we believe is very safe, will benefit websites a
lot and be very safe to run on all websites. And those come on when
you do that process. But then there’s more that you
can do if you’re willing to investigate and tune
it a little bit. But the whole idea was out
of the box, really good performance. ILYA GRIGORIK: Right. And safe, right? So your website shouldn’t
be broken. And I think we’ll actually take
a look at kind of deep in the guts of some of the filters
and how they work. But before we even get there,
one of the things that I wanted to highlight was that
I believe mod_pagespeed is actually based on another
open-source project. JOSHUA MARANTZ: That’s right. So the way that it is structured
is that we thought Apache was a very good delivery
vehicle for our technology. But we know it’s not the only
delivery vehicle for our technology that can
ever exist. So we layered this as an
optimization framework called the PageSpeed Optimization
Libraries, which is not tied to any particular server. It’s a plug-in architecture. And we’ll get into that more
a little bit later. And then we packaged that with
a connection to Apache, an Apache gasket, if you will, so
that it’s just plug-and-play and you don’t have to modify
the structure. ILYA GRIGORIK: So mod_pagespeed
is basically a wrapper around PageSpeed
Optimization Libraries for Apache. But if I want to adopt it
to some other server– maybe I’ve written one myself,
or I’m using some other popular server– I could actually still
reuse that same code. JOSHUA MARANTZ: Yes. And in fact, this is what we
have done with PageSpeed service, so that we’ve now
deployed this on two very different server stacks, one
based on Apache, one an internal Google one. But we can bring the same
technology in two very different deployments. And we are looking to expand
to any server that rises in popularity as well. ILYA GRIGORIK: Right. And PageSpeed Service is our
hosted version of this, which is actually still in beta,
and we’re still kind of field-testing it with
other customers. But it’s actually running
on the same code base, if you will. JOSHUA MARANTZ: Yes. ILYA GRIGORIK: Very cool. So I think we covered
some of this. We have a 1.0 coming. We know that it’s an
Apache module. You guys have been working
on it for over two years. And you actually mentioned some
of the core filters and optional filters. And it sounds like there’s
quite a few. JOSHUA MARANTZ: Yeah. There’s a wide variety. There’s a lot of ideas. Web performance is a topic
that invites papers. It invites conferences. Many companies are founded
around this. And there’s lots of ideas that
are pouring into this. But we try to take the ones that
are most effective, that are incredibly robust and
predictable, and put them into the core set so that the
out-of-the-box experience is really good. And then there are a lot of
other things that we are working on, that we’re
validating, that we’re making sure are really solid
and will make it in. And there’s others that we think
will probably always be kind of a manual configuration
kind of option. A good one of these is where
we defer JavaScript. That’s a complicated thing to
do and has generally amazing effects on websites. But it is something that you
want to hand-validate, and you don’t want to just
turn that on. ILYA GRIGORIK: Right. So I think this highlights kind
of a general point, which is to say there is a core set of
filters that you should be able to turn on or that will be
turned on once you install mod_pagespeed, and your site
should just go faster. But depending on your site,
you probably want to spend some time going through the
available filters and just seeing which ones may
apply to your site. And you’ll be able to get more
performance benefits out of mod_pagespeed. JOSHUA MARANTZ: Right. ILYA GRIGORIK: Very cool. So I think we’re going
to dive into the details of some of these. But I do want to touch on one
point, which is we do support 2.2 and 2.4 of Apache? JOSHUA MARANTZ: That’s
correct. Apache 2.4 support came
out recently. And that’s in our 1.0
release as well. ILYA GRIGORIK: Awesome. And you mentioned that it’s
just a couple of lines. So we have, I guess, packages
for Debian and RPMs that you can just install. JOSHUA MARANTZ: We do, although
external developers have generated packages for
openSUSE and even for Windows. It’s an open-source product. We have a build process and
instructions for doing that. And so people can put
up other packages. FreeBSD is another
one that has support. ILYA GRIGORIK: Right. So I can just build it
from source, right? JOSHUA MARANTZ: Yes. ILYA GRIGORIK: OK. Cool. And then one more
thing for 1.0. I know that until recently, or
until we ship 1.0, there was one release tree, if you
will, or one package. And I think moving forward,
once we release 1.0, we’ll actually have two. JOSHUA MARANTZ: That’s right. The current release package, if
you’re using mod_pagespeed today, you’re on the
beta channel if you installed from binaries. And we’re going to continue
to have that beta channel. But the 1.0 release introduces
a new stable channel. And the way that this will work
is that we will release new features into beta. And after we’re really
comfortable and solid with them, then we’ll update
the stable channel. And then when you update your
packages and your operating system with yum update or
updating the Debian package system, you’ll upgrade
based on the channel that you’ve selected. ILYA GRIGORIK: Cool. That makes sense. All right. So let’s dive into
the guts of it. But I think we touched
on this already. I’ll just mention it briefly. The whole point, I guess, with
mod_pagespeed is to highlight the things that you
don’t need to do. So instead of having to worry
about whether I need to have an extra build process for
optimizing images or concatenating my CSS or
JavaScript or all the rest, all that is taken care
of by mod_pagespeed. And in fact, that means that
I don’t need to modify my current workflow or my team’s
workflow to take advantage of all these optimizations. JOSHUA MARANTZ: That’s
exactly the point. It’s a drop-in solution for
performing best practices for web clients. ILYA GRIGORIK: So the clients,
or the visitors I should say, would see the optimized
resources. I still have my original
resources in my dev environment. And mod_pagespeed
does the rest. JOSHUA MARANTZ: Exactly. ILYA GRIGORIK: Very cool. So you mentioned this earlier. We have over 100,000
mod_pagespeed installs today, since you guys announced
the product. And in fact, there are a number
of partners who have already installed it as part of their
hosting infrastructure. So for example, I know that in
DreamHost or Go Daddy, you can actually go into your control
panel, click– I think it’s in these
settings. I’m not sure exactly where
in the menu it is. But I know there
is a check-box. You say, please accelerate
my site. And all of a sudden, the
site goes faster. And what happens under the
hood is they enable mod_pagespeed for your site. JOSHUA MARANTZ: That’s
exactly right. Having that check-box to just
turn it on is even easier than the three-step install process
that I mentioned earlier. ILYA GRIGORIK: So it’s
like the Turbo button back in the old days. It’s like make my site fast. Why wouldn’t you turn that on? Very cool. So let’s dive into, I guess,
some of the guts. Here’s an example. JOSHUA MARANTZ: Yeah. So the best way to see
it in action, this is a shopping website. And we made a video comparing
a first view of that site on Chrome with mod_pagespeed off
and with mod_pagespeed on. And this is– ILYA GRIGORIK: So this
is a side-by-side recording of on and off. Wow. JOSHUA MARANTZ: Correct. So this video was made with
WebPagetest, which offers all kinds of opportunities– ILYA GRIGORIK: I’m going to play
that again just so you guys can see it. So the page loads
in 2.1 seconds. And then the other one takes
about five seconds, if I scroll back here. And you can see that
the mod_pagespeed one loads a lot faster. Now, there could be many reasons
for this, right? I’m guessing it’s a combination
of filters that come into effect to make
this difference. But it’s 200% faster,
or more, here. JOSHUA MARANTZ: Yeah. It’s also important to note that
even though the rendering takes 2.1 seconds, it’s actually
pretty visible after probably less than a second. ILYA GRIGORIK: Right. So if I just scrub back here. So we have 1.8 seconds in, we
already have the page, whereas the other one is blank. So that’s a dramatic
difference. JOSHUA MARANTZ: So we can see
a little bit of why that happened by going to
the next slide. So those that have dabbled in
web performance have seen these waterfall diagrams
that are available. ILYA GRIGORIK: That’s the
bread and butter. JOSHUA MARANTZ: They’re
available in Firebug and the Chrome Developer Tools. And these are from
WebPagetest. Again, this is from Chrome. ILYA GRIGORIK: So this is
the same website, right? We’re just looking at the
waterfall charts. JOSHUA MARANTZ: Exactly. The same exact website. On the left, the waterfall chart
is tall, which means there’s a lot of resources. And it’s wide. There’s a lot of
wide blue bars. Those wide blue bars are big
images which don’t need to be nearly that big. And so on the right, they
become a lot skinnier. There’s also a lot less bars. So the two things that are most
visible in the waterfall diagram, in terms of the effects
of mod_pagespeed filters, are one, optimizing
images. So there’s actually three
ways in which the images are optimized. Number one, mod_pagespeed looks
at the context in which images are displayed. Very often, images are taken
from cameras at full resolution and instantiated
into very small divs or elements in HTML, 100
by 200 or something. And there’s way more pixels
being sent down to the browser than the browser needs. And this wastes bandwidth and
it wastes CPU time on the browser resizing. So it’s much better to
resize on the server. But who wants to do that? Well, turn on mod_pagespeed and
it happens automatically. The other thing is that the
images are typically at a much higher quality ratio than you
need for an LCD display or a retina display. And it is pretty straightforward
to remove a lot of the bytes of
that image without reducing any visible quality. And the third point is that
modern web browsers, including Chrome and Opera, support a more
modern format of image called WebP, which Google
released over a year ago, which for the same quality can
get you about 30% less bytes. And so this is not something
you would do manually. But an automated tool can tune
the experience, tune the HTML that’s delivered and the images
that are delivered to the browser in question. So mod_pagespeed can take a JPEG
resource and transcode it and deliver it as WebP to Chrome
and Opera, and to other browsers deliver it as JPEG, so
that it works either way. So between all of those, we
shrunk this site way down. And actually, the waterfall
diagram, that blue line represents the onload event. What happens after the onload
event in this particular site is third-party widgets that are
loading asynchronously, which is great. So the site was built with
that really well. And so there’s analytics
running. There’s buttons from different
third-party vendors that are loading at that point. But it doesn’t block onload,
so the user is fully interactive at the time of that
blue line, which happens way earlier than on
the other site. ILYA GRIGORIK: So it’s really
interesting that just by kind of blurring your eyes, we can
look at this waterfall and just figure out what’s happening
just on the shape, without even looking
at the resources. So you can say like, we
optimized the images. We probably concatenated some
files, which is why it became shorter, and a few
other things. JOSHUA MARANTZ: Yeah. I mean, mod_pagespeed is not
necessarily all that you would ever want to do to make
your site fast. It’s now running in
two-point-something seconds. So you could probably get it
down to one second, because there’s a lot of kind of
cascading effect here. And there’s not much
parallelism, especially after onload. And so diving deeper into the
waterfalls is something that you might want to do if you
want that next level. But kind of without any effort
at all you can get– ILYA GRIGORIK: Yeah. I’ll take it. Right. 2X just for turning on a flag. I’ll take it. All right. So now we will go under the
hood of this thing. OK. So we’ll start with
a simple one. So HTML Collapse. This is an example filter, a
very simple one but it will be a good introduction to
some of the more interesting ones later. JOSHUA MARANTZ: Yeah. So this is kind of the simplest
possible filter that you can have. And this is actually a filter
that we have in mod_pagespeed today. It’s a little bit more involved
than this in reality. But this is essentially it. You can, as every filter can,
register for interest in various HTML events,
as it were. And as the events stream through
the system, we can say, hey, this one’s
interested in a Characters node. That’s all Collapse Whitespace
cares about. And then it basically just wipes
out extra spaces that it’s pretty sure can’t matter. And cases where it would matter
is if it’s in a pre tag, and there’s other
cases as well. ILYA GRIGORIK: Right. So it’s not as simple
as it looks. This is not like run a gsub
and remove all the spaces. You’re actually parsing
the HTML. And you’re saying, hey, this
is inside of a pre tag or a script tag, so the white
space is significant. But nonetheless, inside of your
regular HTML markup, you can still compress the
extra white space. JOSHUA MARANTZ: Exactly. And this is a relatively
popular filter. It’s actually not
a core filter. ILYA GRIGORIK: Interesting. JOSHUA MARANTZ: And that’s
because we are a little bit conservative. And it is quite possible for an
element to have its white space become significant due to
a JavaScript event, which is not something that
mod_pagespeed currently looks at. So this is a filter that’s
pretty safe to do. But we leave it up to users
to turn it on themselves. I’ve noticed a lot of users do
turn this on, because it’s mostly pretty safe. ILYA GRIGORIK: Right. OK. That makes sense. So this is a more interesting
one, right? So now we’re talking combining
multiple CSS files. So how does this work? JOSHUA MARANTZ: Right. So the basic idea here is that
as HTML is streaming through mod_pagespeed, we’re
parsing tags. We’re saying, hey, here’s
four link tags. Let’s collect all of those
together, collect the contents of those, and collect the
names of the CSS files. And we’ll get into how that
happens in a few minutes. But what it is that happens is
that those four link tags get replaced with a single
link tag. ILYA GRIGORIK: Which
is the one we see on the bottom, right? JOSHUA MARANTZ: Which is the one
that we see on the bottom. And it has the names of the
original CSS files separated by plus signs, literally. And then there is a .pagespeed
keyword, which is something that we look for when
we are serving it. ILYA GRIGORIK: It’s
kind of a hint to mod_pagespeed, if you will. JOSHUA MARANTZ: Yeah. And then there’s a code
“cc,” which actually means Combine CSS. And then there’s a HASH. And this HASH is
very important. This HASH is the technology that
lets mod_pagespeed serve any resource with a one-year
cache lifetime, because that HASH is the MD5 sum of the
optimized resource. ILYA GRIGORIK: So it’s the MD5
sum of the combined CSS files. JOSHUA MARANTZ: Right. So it’s kind of a signature
for this file. Or you could think of it as
a version of this file. ILYA GRIGORIK: Right. So if I modify, let’s say,
big.css, and I add extra white space, and I save it, the MD5
sum would change, and you would regenerate
this resource. JOSHUA MARANTZ: That’s correct
if we didn’t minify that CSS file and get rid of
that white space. ILYA GRIGORIK: OK. Right. So white space is
a bad example. JOSHUA MARANTZ: But if you
actually change the content of the CSS file, then we would
have a different MD5 sum. So we might have cached
this one for a year. And you might think it’s stale,
but it’s OK because we’ll never reference
it again. ILYA GRIGORIK: Right. OK. And I guess maybe to backtrack a
little bit, and the reason I guess we want to do this is that
fetching multiple files consumes maybe additional
TCP connections. So by combining it all
together, we have one resource, which you can
fetch down faster. And hopefully that’ll lead to
a faster render on the page. JOSHUA MARANTZ: Exactly. This is kind of the height
of that waterfall chart that you see. If your waterfall chart, for
example, doesn’t fit on your screen, you know that you
have some work to do. ILYA GRIGORIK: Right. Yes. That’s a good rule of
thumb, in general. JOSHUA MARANTZ: The other point
that I want to make is that by providing long cache
lifetimes, you make all the caches that are in the network
in between the server and the client more effective. You make the browser cache
more effective. You make any caching done at the
ISP layer more effective. And you make content delivery
networks more effective, because the versions of the
assets that they store, they know that they don’t have to
check back with the origin to revalidate for a year. ILYA GRIGORIK: Right. OK. Very interesting. So let’s take a look at
the monster diagram. So maybe you can just walk us
through what happens when an HTTP request comes in. JOSHUA MARANTZ: Sure. So this is the view of what
happens in Apache. Apache has a module
architecture, which allows anybody to write their own
Apache module that can help make some kind of transformation
to the content or the networking. ILYA GRIGORIK: Some examples
are, like, mod_deflate, mod_security. There’s lots and lots
of these things. JOSHUA MARANTZ: Actually,
mod_deflate is a really good example, because what that one
does is– the most important thing that you can do even
before you run mod_pagespeed is make sure to always
compress your output. And that is basically an output
filter that just looks at the stream of bytes that are
coming through it and just makes them smaller and adds the
header to say, by the way, I gzipped it. ILYA GRIGORIK: This is a little
bit of an aside, but would you use mod_deflate
with mod_pagespeed? JOSHUA MARANTZ: Actually, if
you have mod_pagespeed, we will turn on mod_deflate. So they work together. Mod_pagespeed would be less
effective if mod_deflate wasn’t there. But they’re complementary,
because mod_pagespeed doesn’t attempt to gzip assets itself. It depends on mod_deflate
to do that. But it does make them smaller
in the first place. And image compression is
not really addressed by mod_deflate either. So the way that an Apache module
works is that it can install into the Apache kernel
an input filter, which takes requests and mutates them in
some way that’s particular to the filter. Content generators can look at
URLs and say, either I know how to handle that one. I’ll take it over. Nobody else needs to
worry about it. Or, that one’s not for me. I decline it. I’ll pass it on to
the next one. And they can install output
filters, which just get put into the chain of the byte
stream as it goes through. ILYA GRIGORIK: Right. So here in this diagram, you
just have the PHP handler. So if I have a .PHP file, it
would intercept that and say, hey, that’s for me. I will generate the
byte stream. JOSHUA MARANTZ: Exactly. So mod_pagespeed puts in a handler
which looks at those “.pagespeed.” resources. ILYA GRIGORIK: So it’s like
a custom extension. JOSHUA MARANTZ: Exactly. And that’s for handling
resources, for handling images, CSS, and JavaScript. For HTML, it installs an output
filter where it looks at this stream of
bytes going by. And whenever it finds HTML, it
parses it and tries to make optimizations in it as
it goes through. So if an HTML file comes into
Apache, what will typically happen is it’ll go through
the input filters. The PageSpeed resource handler
will look at it. But it won’t do anything
with it, because it’s not a resource. The PHP handler, if PHP was
handling those, would take the URL and generate HTML out, which
would then be sent to mod_pagespeed’s output filter,
which would start looking at HTML and deciding, based on the
tags and the characters that are parsed, whether
it wants to mutate those bytes or not. An important thing that
mod_pagespeed tries to do is never slow down the page. So some of the things that
mod_pagespeed does are actually compute-intensive
or rely on the network. ILYA GRIGORIK: Right. That was actually going
to be my question. It sounds like a lot of work. JOSHUA MARANTZ: There is
definitely work going on. So there’s HTML parsing, but
streaming parsers go fast, so that’s not really a problem. But when we have to go and
optimize an image– well, we have to fetch images, we have to
optimize images– we’ll do that in the background,
typically, and also optimize them in the background. So we will only do the tag
replacement for images if we already had that in cache. ILYA GRIGORIK: Interesting. So let’s say I’ve just started
my web server. Nobody has hit it. And I make the first request, I
would still get the original unoptimized resource then? JOSHUA MARANTZ: That’s right. Probably for the most part,
your resources will come through unoptimized,
but Collapse Whitespace would work. ILYA GRIGORIK: Right. OK. So you would apply filters
that work really fast. And then on the second hit, you
would actually serve me the optimized content. JOSHUA MARANTZ: Exactly. ILYA GRIGORIK: Right. That’s a very good point. OK. So maybe one more quick note,
which is to say we talked about PHP, but I think it’s
important to note that one of the strong or popular
applications for Apache is that it can act as a proxy. So if you have some other server
running somewhere– that can be another app server,
maybe it’s a Ruby app server, Java, what have you– and you’re using mod_proxy,
this still applies, right, because it’s effectively
another handler? JOSHUA MARANTZ: That’s
correct. It’s easy to set up
mod_pagespeed as a reverse proxy or actually as a forward
proxy as well. And that way, it can optimize
content that’s not necessarily even generated within
the Apache server. ILYA GRIGORIK: So if I have a
Java server running right now serving my assets, I could
actually put Apache in front, turn on mod_pagespeed, and maybe
inherit some of these optimizations for free. JOSHUA MARANTZ: Right. That would be a reverse
proxy application. ILYA GRIGORIK: That’s right. Very cool. So we talked a little
about images. And images are a big deal
on the internet today. Just prior to this, we were kind
of talking, and we said that over 50% of all the
bandwidth on the internet is video, which is moving
pictures. But then the second-biggest
component is still images. So you guys put a lot of work
into optimizing images, in particular. And you already covered some
examples, but this is kind of an in-depth look at
what happens. JOSHUA MARANTZ: Yeah. This is kind of the life of an
image as it flies through mod_pagespeed. You’re right. A lot of the benefit of
mod_pagespeed, the real wins in terms of bandwidth usage and
latency that mod_pagespeed gets, at least in the core
filter set, on first view are from making images smaller. And so we put a lot of
effort into that. And this is how it works
at a high level. So we install a filter called
the image rewriting filter, which scans for elements with
image tags, and it looks for the source attribute. And the way that it works, in
order to not slow down HTML even on the first view, is it
looks in a metadata cache to see if we’ve seen this resource
at this width and height before. So because we’re optimizing
images for the element that they’re going to be drawn into,
those all go into the key of the metadata cache,
if you will. And so when that’s a hit, if
we have a warm server, it doesn’t matter whether the
browser cache is warm or cold, but if the server cache is warm,
then all we have to do to deliver that optimized image
is swap out that source attribute with the one that we
found in our metadata cache. ILYA GRIGORIK: The optimized
version of the image. JOSHUA MARANTZ: Exactly. And so if it’s a miss, though,
then we pretty much have to give up on this round, because
we’re not going to fetch a large image and optimize it
on the fly without delaying the HTML. So we spin up a machine that
runs in the background– not a physical machine, but a
finite-state machine that runs in the software– and it goes off and
it does the fetch of the image resource. It runs the image optimization
algorithms. And we discussed what
those were before. So we can do transcoding. We’ll do resizing. And we’ll do recompression. ILYA GRIGORIK: Right. And I’m guessing you guys also
do stuff like removing extra metadata, which is
pretty popular in like PNG images, right? JOSHUA MARANTZ: Sure. That’s actually– in the core set, we’ll remove
the metadata and resize, and then it’s an option
to recompress. ILYA GRIGORIK: Actually, I’ll
highlight the resize, because I think this is very
important. You mentioned it, but I think
it’s still worth talking about for a little bit. So if I have an image– say if I have an image tag that
says the width of this image is 100 pixels and
the height is 100 pixels, so it’s square– but I can actually push a
larger image into it. It can be 1,000 by 1,000, which
is actually not uncommon on the internet. Somebody takes a photo. They resize it in
whatever editor. They upload it. And you’re actually getting the
full-res image, which then gets rescaled in the browser. So just by providing the width
and height in the markup, mod_pagespeed will be smart
enough to look at that and say, yes, but the origin image
is much bigger, so let me rescale that and serve
the proper version. JOSHUA MARANTZ: Yeah. I would go further to say not
only is it not uncommon, it’s quite common to take images
from your camera and put them online. ILYA GRIGORIK: So this alone
saves me a lot of time, because if I’m thinking
about– if I have a lot of images,
you mentioned kind of the newspaper use case
earlier, right? Lots of images there. I can just define the width and
height and push kind of the resizing logic
to mod_pagespeed. JOSHUA MARANTZ: Exactly. ILYA GRIGORIK: That’s
very cool. JOSHUA MARANTZ: And so we do
this kind of gauntlet of image optimizations. And when it comes out the other
side, we have a new URL with kind of the instructions
on how that got created encoded into it. So this image in this
example– this is on modpagespeed.com, which has
all of our examples. On modpagespeed.com, you’ll find
this Puzzle.jpg is the origin image. That’s shown in green. The width in which it was
displayed in our sample page is 256 by 192. ILYA GRIGORIK: Right. So this is from the
HTML markup. JOSHUA MARANTZ: Exactly. It was a JPG file originally,
but we were displaying it in Chrome. And we took it and now we’re
going to transcode it to WebP so that it’s delivered
more efficiently. We also put into the URL the MD5
sum of this image file so we can serve it for
a long time. And even if I change Puzzle.jpg,
then it won’t be a problem with stale caches. ILYA GRIGORIK: Right. It’s kind of a similar
pattern to what we saw with CSS earlier. JOSHUA MARANTZ: Exactly. ILYA GRIGORIK: OK. And I guess the WebP one is
really interesting, because this would get served– you
mentioned because this was in Chrome, you’d get WebP. But if I visited the same
website in, let’s say, Firefox browser, which currently does
not, unfortunately, support WebP, I would still get a JPG. JOSHUA MARANTZ: Exactly. So as a site owner, you can
make a decision, by using mod_pagespeed, that you’re going
to serve images in a modern web format that is not
supported by all browsers, but your site will still work
well on all browsers. ILYA GRIGORIK: Very cool. So that’s not even something
that I could do with a build step, right? JOSHUA MARANTZ: Correct. ILYA GRIGORIK: Yes. Very nice. JOSHUA MARANTZ: So I wanted to
dive into a little bit of what the PageSpeed Optimization
Library is. ILYA GRIGORIK: So this is
the part that powers mod_pagespeed, right? JOSHUA MARANTZ: Right. So this is a server-independent
library that does all of these
optimizations. And the way that it gets hooked
up to– and again, this is completely open-source
software. But the way that it gets hooked
into a server stack is that whoever is doing that
supplies some mechanism to do HTTP fetching and some mechanism
to do caching. And in different environments,
there are different technologies for accomplishing
these things. ILYA GRIGORIK: These things
are implemented in Apache. So Apache, I’m guessing, already
has an HTTP fetcher, which you reuse, but the cache
is likely something that you guys have implemented
yourself. JOSHUA MARANTZ: Sure. Actually, the cache that we use
for mod_pagespeed is also open sourced and would be
the default setting. But typically, in a serving
environment that has some maturity to it, there will be
some other caching solution you’ll want to use instead
of the one that we have open sourced. ILYA GRIGORIK: So in fact,
maybe could I even use something– like if I’m building
something with this library, I could use
memcached, right? JOSHUA MARANTZ: Yes. Yes. You’re kind of forcing
me to tip my hand. So a feature that we will be
releasing soon but is not yet in 1.0 is support of memcached,
which is an important feature for
scaling up websites. ILYA GRIGORIK: Right. Nice. OK. So if I have a custom server, I
could actually take this and build my own mod_pagespeed
variant. JOSHUA MARANTZ: Exactly. Yeah. There’s API documentation
on the web in the developers’ site. And we would be happy to support
actively anybody interested in porting this
to a new platform. ILYA GRIGORIK: Right. And we’ll mention this later,
but you guys do have an active Google Group where people can
come in and discuss, propose new filters, file bugs, all
that kind of stuff. JOSHUA MARANTZ: Yeah. There’s actually a variety
of support forums. There’s the Google Groups. There’s the issues list. People seem to be
fairly active on Stack Overflow as well. We try to be responsive. ILYA GRIGORIK: Yeah. I see a lot of questions
there. JOSHUA MARANTZ: We try to
be responsive to that. But we track everything in our
issues list, which is all accessed off of code.google.com. ILYA GRIGORIK: Right. OK. Perfect. So I wanted to highlight a few
kind of tips, configuration tricks, and a few
other things. We looked at the guts. We talked about kind of
high-level things. But one question that I get
quite commonly with mod_pagespeed is like,
OK, great. So I grabbed this,
installed it. I ran these three commands. Now it’s on. What if it doesn’t work, or
I’m scared, or how can I experiment with mod_pagespeed? And there’s a couple
of ways to do that. First of all, because we have
this additional module installed, you can actually
configure through a couple of different ways. So you can use query parameters
that will be intercepted by mod_pagespeed. So for example in this rewrite
CSS example, we have ModPagespeed=on, which basically
says turn on mod_pagespeed for this
request only. So you can have it disabled,
but I’m going to enable it here. And by the way, enable
this specific filter. So if I want to experiment with
some non-core filter, I can just pass this in, see what
happens, kind of test the waters, and then decide if I
want to make that the default for my configuration or not. JOSHUA MARANTZ: Yeah. It’s kind of a way to
interactively rapidly iterate on your site without having to
restart Apache or anything. ILYA GRIGORIK: I think it’s one
of my favorite features. I love just being able to
quickly get feedback on, how is this going to look? One alternative to that is to
actually send HTTP headers. So if you have some sort of a
client or server solution that you want to test with,
that’s another way. And then the last one is– we
actually mentioned this– the mod_proxy forward proxy
example, where you can actually say, please fetch me
this other site and run it through PageSpeed and
show me what will happen when we do that. JOSHUA MARANTZ: Yeah. This is a very good way if
you’re considering the option of using mod_pagespeed on your
site, but you’re nervous about like installing it and rolling
out to your users without kind of looking at it first– ILYA GRIGORIK: Yeah. Let’s install it on 1,000
servers and see what happens. JOSHUA MARANTZ: –you can
install it on one server local to your system, which is running
your origin content. It’s running as a proxy. And then you can look at your
site through mod_pagespeed by setting a browser proxy. ILYA GRIGORIK: Right. That’s a very handy tool. And in fact, all three of these
are documented really well on the mod_pagespeed
site. So I have a link down here. But if you guys search on your
favorite search engine for mod_pagespeed and experiment,
you’ll find instructions for how to set up the mod_proxy,
which is really handy. I wanted to highlight this,
which is we mentioned already that there is a lot of
different filters. And we do have good
documentation. And there’s a couple different
resources. So one that you mentioned, which
is modpagespeed.com, where we actually list
all the filters. And we actually also
provide the demos. So it’s usually kind of a simple
file which illustrates what the filter does. So if you guys want to take a
look at that, that’s a very good place. And another one is, once again,
the configuration, or config filters, page on our
developers.google.com site, where we actually explain
what each one does. And we also highlight which
ones are in the core set and not. And another thing I’ll mention
is that by default, when you enable mod_pagespeed,
as Josh said, you have your core filters. But you can actually
say, don’t worry about the core filters. I’m going to hand-tune all
the filters myself. So you can customize it
completely for your site. JOSHUA MARANTZ: Yeah. By turning on the core filters,
what you’re doing is you’re kind of letting us make
the decision as we move the software and advance it of what
we think is safe for most sites, and you’ll take that. If you want to have total
control, and when you upgrade you’ll decide which filters you
want to enable for the new release, then you can put it in
pass-through mode and then add the filters that you want. ILYA GRIGORIK: Right. So that’s a good point. So I should probably, unless
I have a specific reason to avoid core filters, I could
leave that on because maybe in the subsequent release you guys
have added another filter or improved another filter such
that now it’s considered safe, and that would just be
automatically included during an upgrade. JOSHUA MARANTZ: Yeah. I’ll give you an example. I believe in the current
release, we have a filter called Flatten CSS Imports. One of the biggest anti-patterns
for performance in CSS files is to
use @import. But it’s incredibly convenient
to do it. As a designer, that’s
what you want. You want to be able to
structure your code. You want modular code. So that’s a good thing. It’s bad how it’s delivered. Mod_pagespeed with the Flatten
CSS Imports filter will flatten those all out so you get
the best performance when you deliver it, but you don’t
have to maintain that. That was something that
we built into the product some time ago. But we wanted to do a lot of
testing on it to make sure it was rock solid. That’s being promoted into the
core filters in the next release after 1.0. And so if you have
core filters then you just get that. ILYA GRIGORIK: Interesting. OK. That’s good to know. So we also touched on
some configuration. But one of the really nice
things about Apache is that you can configure it in a
million different ways. So there’s your Apache config,
where you can specify your virtual hosts. So mod_pagespeed can be
configured at a v-host level. So an example, down here we’re
saying mod_pagespeed is on for this example site, and
pass-through is actually the command that tells us, don’t
include the core filters. I’ll hand-tune the filters
that I want. So we’re just enabling these, I
guess, five filters for this example site. But I can also be much
more granular. I can use the .htaccess file. So for example, I
have my v-host. I have my example file. But if in my, I don’t know,
/assets directory I want to have a different set of filters, I
could actually drop in an .htaccess file with another
configuration. JOSHUA MARANTZ: Right. And there’s yet another twist,
which is you can use a directory scope in the
configuration file. ILYA GRIGORIK: Right. So I could literally have
different filters running on different subsections
of my website. JOSHUA MARANTZ: Exactly. Actually, the implementation
of just how the options get configured is itself a pretty
big topic within the mod_pagespeed codebase, because
you can configure by request headers, by query
parameters, by virtual host, by directory scope,
and at the root. ILYA GRIGORIK: Yeah. But I think it highlights
the fact that our users have asked for that. So they are using all of
these mechanisms to customize their sites. So we needed to have it. And it allows you a lot
of flexibility, which is very nice. And experiments. So I think this is something
that you guys added just recently. JOSHUA MARANTZ: That’s right. We’ve been traditionally using
WebPagetest, which is an amazing tool for doing
detailed analysis. That’s how we produced the
video and the waterfall diagrams that we saw earlier. But WebPagetest will allow you
to run your tests from a set of servers that are running in
some corner of the world. There’s ones in Singapore,
in Dublin, in Virginia, and so on. But what you really want to do
at some point after you deploy is see what experience your
actual users are having. And so what this does is it
injects some performance measurement, using Google
Analytics, right into the web pages and allows you to bucket
users into experiment groups. And you can say, for example,
first of all you would establish what Google Analytics
ID you want to report the data to. And then you can say, well, I’m
going to send a third of my users into kind of a control
bucket which doesn’t have any optimizations in it. mod_pagespeed is running, but it
isn’t doing anything except injecting the Analytics
experiment. The second one we can say,
let’s just have the image compression and nothing else. And the third one, let’s have
the default settings. Or there’s a whole set of
options that you can do to customize your experiments. Then you can let this run
for a day, a week. Depending on the experiments
you might leave a small control group just to see how
it’s doing, and go back to log into Google Analytics and see
how users for each bucket are faring in terms of the latency
that they’re seeing on their web pages. ILYA GRIGORIK: So this
is really cool. So what you’ve described there
is the difference between synthetic testing and real
user measurement, which I think is what you’re referring
to when you’re saying Google Analytics, right? JOSHUA MARANTZ: Yes. ILYA GRIGORIK: And we actually
had an episode with Justin Cutroni from Google Analytics
where we talked about navigation timing and why
it’s so important. And the point that Justin always
loves to make is that it’s great that the developers
want to optimize the site. They always want to
optimize the site. But how does it affect
my bottom line? Like the business metrics,
the dollars as he put it. So this will actually
tell you. So we have three buckets here. And if I have in my Google
Analytics some conversion metrics– that could be a
purchase, that could be a registration, even time on
site or bounce rate– now I could measure against
that and say, well, you know what? Users that get a faster
experience are staying for longer. Maybe they’re converting
for more. And that makes for a very
compelling case to the rest of the team to say, this is why
we should invest into more performance optimization. JOSHUA MARANTZ: Exactly. ILYA GRIGORIK: Awesome. I love the business use case. It’s not just like speed
for speed’s sake. Although speed for speed’s sake
is also good, because it makes the web faster. So this example, this is
actually a very common question that we see, which is
many people have already applied some optimizations
to their site. So a good example of that
is something like domain sharding, where the problem is
that modern browsers allow up to six connections per host. So if you’re hosting a lot of
images on your domain, you may get blocked as you’re trying to
download a lot of images. So the general best practice for
that is to say, well, host it on different subdomains. And then that will allow the
browser to open multiple connections– more than six, I should say. But that creates a little
bit of complexity for mod_pagespeed. This is where you need to
kind of hand-tune your configuration. So can you explain what’s
happening here? JOSHUA MARANTZ: Sure. So the challenge is
that you want to– well, there’s a couple
challenges. So if somebody has hand-sharded
their domains or, in many cases, just done a
simple best practice of moving their resources to cookie-less
domains, which is all good, the first thing that you
have to do if you want mod_pagespeed to be effective
is you have to let us know what those domains are, because
mod_pagespeed doesn’t know what the domain
mapping is. So we have pagespeed.conf
settings to tell us. So the first thing you have
to tell us is what are the domains that are basically
equivalent on your site. And so if you have, like,
static.example.com, your HTML is coming on www.example.com,
you have to authorize, at least with ModPagespeedDomain,
static.example.com. And if you’ve done
hand-sharding, you may have to authorize more than one of those
and tell us that they are essentially equivalent by
mapping them to kind of a canonical name. ILYA GRIGORIK: Right. So if I’m running example.com,
and I’m serving images from example.com, then mod_pagespeed
would say, yes, I know that I’m hosting this. Hence, I can optimize
this asset. JOSHUA MARANTZ: Exactly. ILYA GRIGORIK: But if I’m
hosting on a cdn.example.com, that could be anywhere, or it
could be a third-party asset. So mod_pagespeed won’t touch
that by default. JOSHUA MARANTZ: Right. If for example you’re serving
an image on Flickr or something, Flickr is not yet
running mod_pagespeed. And so if you just rewrite the
URL the way we did with the .pagespeed [INAUDIBLE] and it’s on Flickr, then it
just won’t work, because Flickr won’t be able to
decode that name. So we wouldn’t necessarily
authorize that. But if you have images on your
site that you want to put onto a CDN that knows how to reach
back to your origin, then you can do a domain mapping to say,
I want to take the images that are on example.com and put
them on cdn.example.com. Now when mod_pagespeed rewrites
that URL, when it optimizes the image or the CSS
file, et cetera, it will rewrite the domain to
go onto the CDN. This is, I think, kind of a
development feature which allows you, for example,
to develop locally and turn that off. But then when you’re ready to
actually push resources to the CDN, you can turn that on. This also allows you
to apply sharding. So by establishing the shards,
if you want to, for example, shard two ways, then you can use
the command that we gave here, ModPagespeedShardDomain
example.com to example1, and example2. ILYA GRIGORIK: That’s
the bottom one here. JOSHUA MARANTZ: And then
mod_pagespeed will kind of randomly disperse the resources
to those two domains so that you can have more
parallel connections. ILYA GRIGORIK: So this is
definitely a more advanced use case where that’s going
to reach deeper into mod_pagespeed and also think
about how does this work in the context of me using a CDN. But that in itself is actually
an important point. It is CDN-friendly. So you can make it work with
your CDN provider and help your CDN serve optimized
assets. JOSHUA MARANTZ: Exactly. And this is something that
I think it’s useful to experiment with. One of the things that you
probably don’t want to do is try to hand-shard your resources
in your HTML file, because the best practice
is to shard domains, but exactly to what? I’ve seen the right answer be
four, the right answer be two, the right answer can sometimes
just be one. And so all the effort you do to
hacking your HTML to edit the domains really is kind of
counter to the notion that you want to experiment with it. And you can experiment very
easily by just iterating over your pagespeed.conf file and
looking at WebPagetest. ILYA GRIGORIK: Yes. That certainly makes
it a lot easier. Yeah. So we talked about the
forward proxy. But I recently came across a
blog post, I think it was Frank Denis that wrote this
really awesome blog post that kind of blew me away, because
what he did was he used mod_pagespeed as a forwarding
proxy for his phone. And the basic observation was
that when you’re on your mobile device, you probably
don’t have a Wi-Fi connection most of the time. You’re in 3G. If you’re lucky, you’re
in 4G, what have you. And you’re downloading these
massive websites. So instead of using
mod_pagespeed to accelerate your site, why not use
mod_pagespeed to accelerate the rest of the web
as you fetch it? So in this diagram here,
I have my phone. We’re sending a request through
this forward proxy, which is running
mod_pagespeed. Mod_pagespeed requests the
actual site that I requested. I get this fat response back
with all kinds of unoptimized images, et cetera. Mod_pagespeed crunches all
of that and sends me the optimized assets, which
I thought was really, really clever. So he did this with
his iPhone. And he observed that for the
sites that he tested it on, he got much faster renders
and much fewer bytes. And in fact, he shared
some examples. And we’ll take a look
at those later. But these are the actual
filters that he used. So he shared those. And some examples that I wanted
to highlight was first, he enabled core filters. So that’s kind of by default. But I think he just
wanted to have it in there to be explicit. He said, I’m going to rewrite
images, convert JPGs to WebP– so he knows that he’s accessing
this on Chrome on iOS when he’s using that– convert PNG to JPG. And in fact, this is an
important one that you mentioned earlier, when I’m on
a mobile device, I have a small screen. I probably don’t want 100%
fidelity of all the pixels. I’m OK with the 75%
compression ratio. And that gives me a
lot of savings, byte savings for images. So this is kind of an
interesting example. And he also did a couple of
aggressive filters, which say defer all iframes until after
onload and other things, just to accelerate his browsing. JOSHUA MARANTZ: Yes. And pointing out, in particular,
defer JavaScript has a huge impact on the
speed of websites. It’s something that you want
to look at the results of when you do it. It was aggressive to put it into
a forward proxy, but he was extremely happy
having done it. ILYA GRIGORIK: Right. Yeah. And these are some examples. So of course, this is not
representative of the entire web, but he kind of
highlighted a few. So for example, this over-blog
URL, it went from 400 kilobytes to 271, which
is pretty significant. Going from 39 seconds of onloads
to 2 seconds is a big improvement. And not only that, but you can
see that because he was combining resources, it
went from 34 to 21. So the mobile browser had
to make fewer requests. All of those things are a win. And he got a better
mobile experience. Now this next one just kind of
completely blew me away, because I didn’t believe it. But it serves as
a good example. Cooking With Frank. So this is a blog,
lots of pictures. And guess what? The unoptimized version
is 3.15 megs. With compression, it comes
out to be 10 times smaller, 340 kilobytes. So when I’m on my mobile data
plan, I probably want the 340 kilobyte version. It’ll load much faster. Instead of making 85 requests,
it made 28 requests. So this is a dramatic
difference. JOSHUA MARANTZ: Yeah. We’re still learning exactly
what works really well on what kind of mobile device and
what kind of connection. But it seems likely that having
a lot fewer requests will benefit mobile even more
than it will benefit desktop. ILYA GRIGORIK: Yeah. So this, in general, seems like
a very interesting area to explore for mod_pagespeed,
like I want this on my phone. So just a quick recap. We’ve covered a lot
of stuff here. So we talked about the
upcoming 1.0 release. It’s an open-source
Apache module. It works with 2.2 and 2.4. Kind of the pitch, if you
will, is just-in-time performance optimization
for your website. And it’s already very widely
deployed across the web. So we feel it’s 1.0 ready. It’s 1.0 ready by Google
standards, which is perpetual beta. So that says a lot. But one question I do have for
you is, what’s after 1.0? Are we done? JOSHUA MARANTZ: I feel
like we’re at the beginning of this process. We’ve definitely discovered
that there is some meat to chew on here. There’s a lot more
that we can do. SPDY is an obvious topic. The rules change when you’re
working with SPDY. Combining becomes less
important, because you can multiplex multiple resources
over the same connection. Inlining becomes
less important. ILYA GRIGORIK: Same
reason, right? JOSHUA MARANTZ: In the release
that is coming after the 1.0 release, we’ll start seeing
some of the deeper SPDY integration. So Google also has a module
called mod_spdy, which we work pretty well with. And look for more
in that space. I would say the big wins that
we have right now, images; extending cache lifetime, which
is something that really benefits repeat viewers to
things like news sites; deferring JavaScript. There’s kind of other big areas
where we’re more aware of the networking
characteristics of the page and we’re optimizing. I feel like we’re relatively
early in our understanding. We’ve found a lot of
good things to do. But when we find good things to
do, it usually uncovers 10 more that we don’t have
time to do yet. ILYA GRIGORIK: Yeah. So I think that’s very
representative of the web performance community
in general. I think we’re still finding a
lot of interesting edge cases. And the browsers are only
getting smarter. We’re only getting more and
more assets on the web. So in fact, we know that the web
pages are growing, both in size and number of requests. So it sounds like there’s
a lot of work to do. JOSHUA MARANTZ: There’s
an astounding amount of work to do. But I think that we’ve come to
a point now where we have a stake in the ground where we
have demonstrable benefit. We have adoption. And we’d like to grow it. And we’re ready to take
off from here. ILYA GRIGORIK: So I’m glad that
you guys are doing it, because that makes my life
a little bit easier. I can install this and inherit
all of the work that you’ve put into this. So I think for the last
slide here, we’ve covered some of these. But I want to highlight these,
because I get these questions quite frequently on Stack
Overflow, through email, and through other means. So I kind of bucketed them. We already talked about
mod_deflate, mod_expires. So those work together
with mod_pagespeed. JOSHUA MARANTZ: That’s right. In fact, mod_pagespeed
turns mod_deflate on. And it’s kind of dependent on
mod_expires, because we have to know how often to poll
the origin resource. And you definitely want to
put an expires header. You want to use that. Tell us how often to check back
to see if your resource has updated. Actually, I just want to point
out one other thing. Mod_pagespeed can also look
directly at the file system, in which case it can just stat
the file to see if it’s changed, which is a little bit
more efficient if your files are right there on the same
server, as opposed to being generated by PHP or pulled
from somewhere else. ILYA GRIGORIK: Actually,
that’s a good point. That’s another config flag
that you can find in our documentation. JOSHUA MARANTZ: That’s
correct. So that’s ModPagespeedLoadFromFile.
And I think that if the files
are there on your disk, just get mod_pagespeed to look
at them directly. But if they’re not and we have
to do a fetch to get them, then you definitely want to use
mod_expires to tell us how often to do that fetch. ILYA GRIGORIK: Right. OK. For the CDN , I think we’ve
covered a little bit. JOSHUA MARANTZ: Yeah. We covered it. CDNs are driven by the
cachability of resources we make things cachable
for a year. ILYA GRIGORIK: Which also,
I think, answers the next question which is, if you’re
using a CDN– or maybe if you’re not using a CDN, rather,
but you are using another cache in front, maybe a
Squid, a Varnish, what have you, maybe Nginx, those should
still work, right? JOSHUA MARANTZ: Exactly. ILYA GRIGORIK: They’d just
be more efficient. JOSHUA MARANTZ: Exactly. They’ll just have to pull from
the origin less often. ILYA GRIGORIK: Yeah. OK. Perfect. So we talked about or
we mentioned the mod_pagespeed cache. So mod_pagespeed has
its own cache. We talked about the
upcoming memcached support. But as a developer, do I need
to worry about that? So if I have my assets– and who manages that? If I update my asset, do I need
to worry about flushing the cache, et cetera? JOSHUA MARANTZ: So mod_pagespeed
comes pre-configured to use the
file system as a cache. And that works reasonably
well. As you scale up websites, you
have to think a little bit. We set the default cache, I
think, at 100 megabytes. Is that enough for
your assets? Or do you want to
make that grow? ILYA GRIGORIK: So it’s something
you can probably tweak in the configuration. JOSHUA MARANTZ: That’s another
configuration parameter. How often we go and garbage
collect that cache is another question. So when you change your assets,
you don’t have to manually purge the cache. Mod_pagespeed will just
do it automatically. ILYA GRIGORIK: And that was
actually that file name kind of scheme that we looked
at earlier, right? JOSHUA MARANTZ: Yeah. Well, the files on the cache
have recognizable names. But they’re not exactly
that scheme. But the hierarchy of your URL
space for your assets is reflected in the cache. So you can kind of poke around
the cache and see what we have in there. And you can just delete it. They’re just files. ILYA GRIGORIK: But it sounds
like generally speaking, I shouldn’t be touching them. JOSHUA MARANTZ: But you don’t
really need to touch it. You can just configure how big
you want it to be and how often you think we should
go and purge it. ILYA GRIGORIK: Perfect. JOSHUA MARANTZ: And upcoming,
you’ll be able to say, well, instead of storing the files on
the disk, I want to store them in memcached. And here are the host
and port numbers of my memcached instances. And then you can share that
cache among multiple servers so that you can scale up your
website a little bit better. ILYA GRIGORIK: Yeah. That’s very cool. So we actually talked about
affecting or not affecting the page load time when the
cache is empty. So that was that if we don’t
have the image resource optimized, we will just serve
the original image. But on the next hit, you will
get the optimized resource. So as you said, the last thing
that mod_pagespeed wants to do is to make your site slower. That would be the
anti-pattern. So that should never happen. But I’m guessing all
of this work does consume some resources. So what should we expect? If I install this on my
server is there kind of an average number? Does it really vary based on
the site, because it seems like it would, right? JOSHUA MARANTZ: Sure. A very image-rich site that
installs mod_pagespeed for the first time will go through
a period where we’ll use resources on the server to
optimize the images. There will be a bounded
amount of resources. This is actually another config
parameter that you can set, because we don’t know
exactly how many CPUs you have or anything. But by default, we will do, I
believe, eight concurrent image optimizations maximum
per physical machine. ILYA GRIGORIK: Right. So it’s like background workers optimizing these images. JOSHUA MARANTZ: And
that’s across all of the Apache processes. And so it doesn’t just fan out
arbitrarily until it kills your machine. ILYA GRIGORIK: Right. That would be an anti-pattern. JOSHUA MARANTZ: Yes. That would be another
anti-pattern for serving your resources efficiently. But what will happen is if you
have a page full of images, and the first time somebody
goes to them, we’ll start spinning up the optimization
of those. Once those are in cache, that’ll settle
back down. So there will typically
be a few minutes– it would vary on the site– of where all these images get
optimized, put into the cache, and then you’re good to go. If the cache is too small,
then it might be ongoing. ILYA GRIGORIK: So for the most part,
if your website doesn’t change dramatically every couple
of minutes, chances are your visitors will be just
hitting the cache. And you would only see this
extra work being done when you have new assets or, for whatever
reason, that asset got evicted from the cache. JOSHUA MARANTZ: Right. ILYA GRIGORIK: And that’s where
you may want to go back and configure or check, is
your cache being used up? Maybe you should increase
the size or something to that extent. JOSHUA MARANTZ: This is probably
also a good time to point out that mod_pagespeed
offers some visibility into what it’s doing, because it
has a statistics page. So on the local server, you
can go to /mod_pagespeed_statistics, which by default
is accessible only from localhost. But you can configure
that too. And then you’ll see how many
image rewrites are going on. You’ll see a variety of
statistics, which kind of give you a way to put your finger on
the pulse of mod_pagespeed. ILYA GRIGORIK: So I’m guessing
if I’m using some monitoring system, I could probably get
the variables out of there, shove it into Ganglia or some
other system, and track all that performance
there as well. JOSHUA MARANTZ: It’s
very scrape-able. And in fact, I think very soon
after mod_pagespeed was released people started to say,
well, I’ve hooked this up to this visualization system,
and here’s what it’s doing. ILYA GRIGORIK: That’s the first
thing that I would look for as well. That makes perfect sense. So shifting gears a little bit,
we didn’t specifically talk about mobile, with the
exception, I guess, of the forward proxy. But is there anything in
particular that we need to be aware of for mobile
and mod_pagespeed? JOSHUA MARANTZ: Mod_pagespeed,
this is actually one of the areas where I think we can do
a lot more in the future. But we’re already providing a
substantial benefit, making things smaller and
making fewer requests. It’s all good. ILYA GRIGORIK: It’s images. We saw that, right? JOSHUA MARANTZ: It’s all good. What Frank Denis did was
he cranked the quality level down to 75. Typically, we would recommend
if you want to do this for desktop, we would say 85
is a very safe number. But for mobile, you might want
to crank it down further. I can’t think of anything that
mod_pagespeed does that would be undesirable for mobile. I think it’s all good. ILYA GRIGORIK: Smaller
resources, fewer requests, all of those things are prime
candidates for improving mobile performance. JOSHUA MARANTZ: The
only question is, is there more we can do? And the answer is absolutely. Stay tuned. ILYA GRIGORIK: Right. I think that’s a good note
to kind of end this on. I’ll just mention that we do
have a lot of online resources about mod_pagespeed, if we
didn’t answer your question. So a good place to start
is modpagespeed.com. I think there are actually links
to the Google Group, the issue list, and demos there. So that’s a great place to
kind of kick off your exploration. We do have a Google Group where
you can ask questions. And of course, you can also just
reach out to me or Josh, and we will be happy
to answer any questions. So thank you, guys. JOSHUA MARANTZ: Thanks.
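
As a rough illustration of the scraping discussed above, the sketch below parses a plain-text dump of counters into a dictionary that could be forwarded to a monitoring system. It assumes the statistics page renders one `name: value` pair per line. That format, the sample counter names, and the function name `parse_statistics` are assumptions made for illustration and are not taken from the talk.

```python
import re

# Assumed line format: "<counter_name>: <integer>", one pair per line.
_STAT_LINE = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*:\s*(-?\d+)\s*$")


def parse_statistics(text):
    """Return {counter_name: int_value} for each line matching the assumed format."""
    stats = {}
    for line in text.splitlines():
        match = _STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


# Hypothetical sample; real counter names may differ.
sample = """
image_rewrites: 42
cache_hits: 1300
cache_misses: 17
<some html noise>
"""
stats = parse_statistics(sample)
hit_ratio = stats["cache_hits"] / (stats["cache_hits"] + stats["cache_misses"])
print(stats, round(hit_ratio, 3))
```

In practice the text would come from an HTTP fetch of the statistics page made on the server itself (for example with `urllib.request.urlopen`), and the resulting dictionary could be pushed to Ganglia or another monitoring system on a timer.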

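The question of whether the cache is being used up can also be checked from the filesystem side. This is a minimal sketch, assuming the rewritten resources sit in a file cache directory you can read: it sums the file sizes under that directory and compares the total with a size limit. The helper names are hypothetical, the cache path is left as a parameter because it depends on your installation, and the 100 MB limit is the default figure quoted in the comments below rather than something verified here.

```python
import os
import tempfile


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total


def cache_usage_ratio(root, limit_bytes):
    """Fraction of the assumed size limit currently in use."""
    return directory_size_bytes(root) / limit_bytes


# Demo against a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as demo_cache:
    with open(os.path.join(demo_cache, "a.bin"), "wb") as handle:
        handle.write(b"x" * 1024)
    demo_ratio = cache_usage_ratio(demo_cache, limit_bytes=100 * 1024 * 1024)
    print(f"cache is {demo_ratio:.4%} full")
```

A ratio that stays close to 1.0 would suggest the cache is full and resources are being evicted and reoptimized, which is the situation where raising the limit may help.
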
56 thoughts on “Make the Web Fast: Automagic site optimization with mod_pagespeed 1.0!”

  1. vbtechsupportcom

    Thanks, guys, for the hard work. I've tried mod_pagespeed, and stress testing my servers I found you get 2x faster page loads. But there's a cost: CPU usage increases 3x, and this is with the rewrite_images filter disabled. Hopefully, version 1.0 will reduce the CPU load 🙂

  2. Colin Brazendale

    How do you install this? Or must I ask my hosting company?

  3. ShopBakersNook

    What is the typical time needed to install this on a server? 2 hours? 10 hours?

  4. Ilya Grigorik

    Should be measured in minutes, or less. There are ready-made packages for popular platforms, which means that you should be 2 or 3 commands away from having it installed and running on your Apache server.

  5. Ilya Grigorik

    If you manage your own server, then check our install instructions on the project page – we have packages for popular platforms, which should translate into 2 or 3 commands to get everything up and running. Now, if you are on a shared hosting platform, then yes, definitely ping your hosting provider and ask about enabling mod_pagespeed on their servers!

  6. Ilya Grigorik

    We had the stickers available at Google I/O, not sure if they're still around. Check the online Google store!

  7. Ilya Grigorik

    There will be extra load on the server when a resource is optimized on the first hit; after that, the resource should be served out of the cache. As Josh mentioned in the video, the default size for the cache is 100 MB. It may be worth checking your stats to see if you're exceeding that and hence are forced to reoptimize the assets.

    Last but not least, ping us on mod-pagespeed-discuss on Google Groups – happy to dig in deeper into this.

  8. Davin Studer

    At 53 minutes into the video … Why don't mobile carriers have forward proxies set up on their side to compress sites this way? Imagine the bandwidth saving that they could gain by doing this.

  9. Ilya Grigorik

    That's a great question! Turns out, some of them do! It's hit and miss, but some carriers do apply a similar technique for image optimization, to reduce the payload size. And of course, there are services like Opera Turbo which explicitly re-encode all images to WebP to get the extra savings.

  10. Michal Mocny

    Great video, thanks Ilya!

    I'd like to point out that the benchmarks for the mobile forward proxy may be a bit misleading, considering that measuring DOMContentLoaded times when using deferred iframe/javascript may not produce comparable results (extent depends on site). I suspect that most sites would still show a good portion of content even if it isn't all dressed up yet, so this is still a huge win!

    I'm going to set this up myself to test. Wonder if my home server upload is sufficient..

  11. Michal Mocny

    Also, I hadn't heard of the strategy of appending a hash value of static content to the filename so that you can safely set a long cache time and still have a way to update the content. Brilliant! And exactly the type of feature you want your webserver to handle and not roll by hand/preprocess.

    Same for image resizing/recompression, saves so much developer time!

  12. Ilya Grigorik

    Right, that's a great point. As Josh mentioned during the show, defer_js is definitely an aggressive setting for a forward proxy, and there is a good chance it may break some sites, but on others it can make a world of difference.

  13. Susan Daniels

    Most of this is way over my head. But, I would just like to know if there is a WordPress plugin. I love speed – I can work as fast as it will go 🙂

    Warmly,
    Susan Daniels

  14. Victor Jonsson

    It runs in the Apache server. No plugin is necessary for any CMS.

  15. Carlos Aguilar

    If you are a WordPress developer or have sites running on WordPress, you should watch this video.

  16. Boris Köster

    Thanks – installed, tested and I love it. Currently testing the mod on 3 servers.

  17. ROBERT JAMES

    Hi,
    the info you have provided is very detailed, useful, and professional.
    I am a normal user and an online auction site owner. How can I install this software? Please explain in simple language and simple steps.
    I would appreciate more info.
    Thanks,
    Robert.

  18. Bernhard Hofmann

    There is a huge desire for this to be available for IIS.

  19. janusdisplays

    I am wondering if this mod alters CSS and JS like many of the other compression methods.

  20. pritam gurung

    It would be a shame for you not to earn more cash when these people are able to get extra income so easily using Certor Cash Code (search for it on google).

  21. Arul

    How can I use it on a production site if it's in beta 🙁

  22. Jnanadas K Kunjan

    VERY USEFUL TO EVERY GOOGLE DEVELOPER. THANKS…

  23. awan atmadja

    developers.google.com/speed/pagespeed/module/download

  24. 쓰준

    Thank you Google~! And please help also the Windows Server 2003/2008 user~~ 😀

  25. Mike lynn

    Some cache scripts have a bad effect on the actual site structure, changing the actual design of the site.

  26. ura soul

    i notice the pricing page for pagespeed says the 'free' service will be a charged service at an undisclosed point..

    https://developers.google.com/speed/pagespeed/service/pricing

    so we debug and inspire the platform while it is 'open source' and then at some point the service is paid for?

    or am i missing something here?

  27. Santi locomia

    Has anyone watched this video for more than ten minutes??? Guys, cut to the chase. Too much bla bla. Just get to the point. There are only a few million more interesting videos that get to the point right away. I'm not wasting more time listening to your bla blas. GET TO THE POINT!! 1:05 long video? Are you kidding me? Sorry I have more important things to do like….watching videos that get to the point….!!

  28. Mario Cantor

    sounds excellent… and looks like it works with nginx… see https://groups.drupal.org/node/297348

  29. The Gambling Post

    I tested my site and with this new service I will save 0.254 seconds. Not even worth setting up for less than half a second. And yeah this video is way too long.

  30. snoektv csgo

    I wish the voice volume was a lot louder. I'm at max PC volume, max headset volume, max YouTube volume and can barely hear it clearly.

  31. Andrew Rulnick

    Getting ready to roll out our service update to broaden the adoption of Google PageSpeed Apache mod_pagespeed

  32. Andrew Rulnick

    I'm so excited to be back to working with this again – it's been on my list of things to do for so long…

  33. DBL07 Consulting & Website Design

    Is this something I can install on my server for my website http://www.dbl07.co/ or is this something my hosting company Hostgator VPS has to do for me?
    Cheers

  34. ed lorens

    I am looking forward to better speed and optimization. Thanks.

  35. Johnny Hurst

    This probably sounds like a stupid question but how do you install this?

  36. Bill Guerra

    Can you kindly tell me if the above 1 hour video from 2012 is current for 2015, by chance? Thank you.

  37. Dhvani Kotak - NCrypted

    Is this tool still useful with new concepts for Google Pagespeed?


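The cache-extension idea raised in Michal Mocny's comment above can be sketched in a few lines: derive the file name from a hash of the content, so the URL changes whenever the content does and each version can safely be cached for a long time. The naming pattern `name.<digest>.ext`, the choice of SHA-256, and the function name `hashed_name` are assumptions for illustration. This is not mod_pagespeed's actual URL scheme.

```python
import hashlib
import os


def hashed_name(filename, content, digest_chars=10):
    """Insert a short content digest before the extension, e.g. style.<digest>.css."""
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    stem, ext = os.path.splitext(filename)
    return f"{stem}.{digest}{ext}"


# Two versions of the same file get two different names.
v1 = hashed_name("style.css", b"body { color: red; }")
v2 = hashed_name("style.css", b"body { color: blue; }")
print(v1, v2)
```

Because the name is a pure function of the bytes, unchanged files keep their URL (and stay cached), while an edited file gets a new URL that browsers have never seen and must fetch.
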