A lot of things happen between the time you request a web page and the time it completely loads in your browser. Each of those things takes anywhere from a few milliseconds to several minutes to complete, and reducing the time of any step in the process improves the overall performance of the page.
Some of the factors along the way, such as the size of your HTML and CSS files, are under your control. With others, such as the DNS lookup that converts domain.com into an IP address, there’s not a lot you can do to speed up the process.
Last week I began a series about website performance. To understand how we can make sites perform better, I think it makes sense to know a little about what happens when someone requests a web page in their browser.
I won’t pretend to understand every detail and I won’t try to cover every possible thing that can happen. However, I would like to present a general idea of what happens when you request a web page and point out some areas where we have influence and some where we can’t do much to improve performance.
The Path of a Browser Request
There are a handful of general steps that occur between the time you request a web page and the time it displays in your browser.
- DNS Lookup
- Browser sends an HTTP request
- Server responds and sends back the requested HTML file
- Browser begins to render HTML
Each of these general steps has additional steps and checks and in the rest of this post I want to go through each of them in a little more detail.
DNS Lookup
Domain.com is easy for humans to remember. An IP address like 18.104.22.168 isn’t. The latter is how servers are located though, so the first step in requesting a page is converting the domain to an IP address.
Domain information is spread across a hierarchy of DNS servers, with the root servers at the top. A root server doesn’t hold the IP address of every domain. It points the lookup to the servers responsible for a top-level domain such as .com, and those point in turn to the nameservers for the individual domain, which supply the IP address your browser should send its request to.
There are 13 named root servers at the top of the hierarchy and they serve the root zone. They’re operated by 12 different organizations, each responsible for one root server, with the exception of Verisign, which operates two of them. Each of the 13 is really a group of machines, which is how hundreds of root server instances end up scattered around the world.
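To make the hierarchy a little more concrete, here’s a minimal sketch of how a lookup is handed from one level to the next. It’s a toy model and not a real resolver: the server names, the `resolve` function, and the address (taken from the 192.0.2.0/24 range reserved for documentation) are all invented for illustration.

```python
# Toy delegation data. Every server name and the address are made up.
ROOT = {"com": "tld-server-for-com"}
TLD_SERVERS = {"tld-server-for-com": {"example.com": "nameserver-for-example"}}
AUTHORITATIVE = {"nameserver-for-example": {"www.example.com": "192.0.2.10"}}


def resolve(hostname):
    """Walk root -> TLD -> authoritative, recording each server that was asked."""
    labels = hostname.split(".")
    tld = labels[-1]
    registered = ".".join(labels[-2:])

    asked = ["root"]
    tld_server = ROOT[tld]  # the root only knows who handles .com
    asked.append(tld_server)
    nameserver = TLD_SERVERS[tld_server][registered]  # the TLD knows the domain's nameserver
    asked.append(nameserver)
    ip = AUTHORITATIVE[nameserver][hostname]  # only this server knows the address
    return ip, asked


ip, asked = resolve("www.example.com")
print(ip, asked)
```

Three servers are asked before an address comes back, which is part of why caching the answer is worthwhile.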
Because domain mapping information is fairly consistent and because it takes time to send packets of data around the web, it makes sense to cache the information. It’s much quicker to look up a cached static page than it is to send the request and wait for a response.
Also, since most of us tend to visit the same sites again and again, we don’t need to store every possible domain to IP mapping. A number of devices you interact with to view web pages will store a DNS cache, starting with your browser. Here’s a list of devices that will store a cache in the order they’re checked.
- Browser cache
- Operating system cache
- Router cache
- ISP DNS cache
First the cache stored in your browser is checked, then the cache stored by your operating system, and so on. If none of these caches has the information needed, a recursive lookup takes place, starting at the root DNS servers.
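The order of those checks can be sketched in a few lines of Python. This is a simplified model under my own assumptions: the `resolve` and `recursive_lookup` functions are hypothetical, the caches are plain dictionaries, and the address is a documentation placeholder.

```python
# Toy model of the cache chain. Every name and address here is made up.
CACHE_ORDER = ["browser", "operating system", "router", "ISP"]


def recursive_lookup(domain):
    """Stand-in for the full recursive lookup that starts at a root server."""
    authoritative = {"example.com": "192.0.2.10"}
    return authoritative[domain]


def resolve(domain, caches):
    """Check each cache in order and fall back to a recursive lookup on a miss."""
    for name in CACHE_ORDER:
        ip = caches[name].get(domain)
        if ip is not None:
            return ip, name
    ip = recursive_lookup(domain)
    # Store the answer in every cache so the next request is answered locally.
    for name in CACHE_ORDER:
        caches[name][domain] = ip
    return ip, "recursive lookup"


caches = {name: {} for name in CACHE_ORDER}
first = resolve("example.com", caches)   # nothing is cached yet
second = resolve("example.com", caches)  # answered by the browser cache
print(first, second)
```

The first request has to go all the way out to the recursive lookup. The second never leaves the browser.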
If you’ve ever wondered why changing nameservers for a domain can take up to 48 hours to take effect, it’s because all these caches need to expire or be updated before they reflect the new information. It’s why you might see the site on its new server while a friend with a cache that hasn’t yet updated still sees the site on the old server.
A full DNS lookup doesn’t exactly take a long time, but it is quicker to check a local cache, so one thing we can do to improve performance is to suggest the mapping stay in local caches longer by setting a longer time to live (TTL) on the domain’s DNS records.
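How long a mapping stays in a cache is governed by the time to live (TTL) attached to the DNS record. Here’s a minimal sketch of that idea, assuming a hypothetical `TtlCache` class and a clock passed in as plain seconds so the behavior is easy to follow. Real caches are more involved.

```python
class TtlCache:
    """Minimal DNS-style cache: an entry is dropped once its TTL has passed."""

    def __init__(self):
        self._entries = {}  # domain -> (ip, expiry time in seconds)

    def put(self, domain, ip, ttl, now):
        self._entries[domain] = (ip, now + ttl)

    def get(self, domain, now):
        entry = self._entries.get(domain)
        if entry is None:
            return None
        ip, expires_at = entry
        if now >= expires_at:
            del self._entries[domain]  # stale, so force a fresh lookup
            return None
        return ip


cache = TtlCache()
cache.put("example.com", "192.0.2.10", ttl=3600, now=0)
still_cached = cache.get("example.com", now=1800)  # half an hour later
expired = cache.get("example.com", now=3600)       # the hour is up
print(still_cached, expired)
```

A longer TTL means more requests are answered from the cache, at the cost of changes taking longer to reach everyone.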
Browser Sends Request
After a browser has performed the DNS lookup, it sends an HTTP request to the appropriate server. It doesn’t have to literally be HTTP. It can be HTTPS or, more recently, an HTTP/2 request. The general idea, though, is that your browser sends a request for a specific file, often an HTML file.
I say browser, but the server itself doesn’t really care where the request originated. It might come from a browser and it might come from a search engine spider or anything else that can send a request.
Here’s the request Firefox sent when I requested the home page of this site.
GET / HTTP/1.1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:50.0) Gecko/20100101 Firefox/50.0
Accept-Encoding: gzip, deflate
Cookie: _ga=GA1.2.604372764.1484810328; PHPSESSID=7ip53s84rki0mn2rjae5bhin05; bp-activity-oldestpage=1
This information isn’t a complete list of what can be sent and whatever browser you use has ways to send additional header information or to tweak the default.
For example, your browser probably has a way to send a different user-agent so you can see what information gets returned to different browsers. It’s something I have to do too often when a site delivers a video via Flash, as I don’t have Flash installed on my laptop. Most of the time I just close the page, but in those cases where I really want to watch the video, I’ll change my user-agent to Safari on an iPad (or something similar) and the site will often deliver me a non-Flash version of the same video.
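A request is only text, so changing a header like the user-agent is a matter of changing one line. Here’s a rough sketch that assembles a request by hand. The `build_request` function, the example.com host, and the user-agent string are placeholders of mine. The Host header isn’t in the listing above, but HTTP/1.1 requires one, so I’ve included it.

```python
def build_request(path, host, headers):
    """Assemble the text of an HTTP/1.1 GET request."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_request(
    "/",
    "example.com",
    {
        "User-Agent": "ExampleBrowser/1.0",  # placeholder, not a real browser string
        "Accept-Encoding": "gzip, deflate",
    },
)
print(request)
```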
You can’t control the request headers that are sent to your site since you don’t make the request, but there’s information in the header suggesting how we can improve performance and I’ll call your attention to one.
If you look back at the request above, you can see Firefox accepts gzip (as do other browsers), so one way we can improve performance is to use gzip and have compressed content delivered from the server. Compressing and decompressing usually takes less time than sending everything uncompressed over the network.
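To get a feel for the savings, here’s a small sketch using Python’s built-in gzip module. The markup is made up and deliberately repetitive, so the result is more dramatic than you’d see on a real page, but HTML and CSS do tend to compress well.

```python
import gzip

# Made-up page content. Markup repeats a lot, which is why it compresses well.
html = ('<li class="item"><a href="/post">A post title</a></li>\n' * 200).encode("utf-8")

compressed = gzip.compress(html)
restored = gzip.decompress(compressed)  # what the browser does on arrival

print(len(html), "bytes before,", len(compressed), "bytes after")
```

On a live site the compression is normally switched on in the web server configuration. You wouldn’t write this code yourself.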
Server Response
Once the server receives a request, it will respond, though how will depend on a number of things. For example, the page requested may no longer exist on the server and the server will send back a 404 (file not found) error. It’s also possible the file has been moved to a new location and requests are being redirected temporarily (302) or permanently (301). If this is the case, your browser needs to make another request for the file at the new location.
As you can guess, redirection slows performance, as each redirect is an extra request. There are reasons why you’d want to add a redirect, but keep in mind fewer are better for performance. Unless you have a good reason, you don’t want to chain redirects one after the other, since each one adds another round trip.
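The cost of a chain is easy to see if you count the requests. Here’s a minimal sketch with a made-up redirect table. The paths and the `follow` function are hypothetical, and a real browser does the same thing over the network.

```python
# Hypothetical redirect table: requested path -> (status code, new location).
REDIRECTS = {
    "/old-page": (301, "/newer-page"),
    "/newer-page": (302, "/current-page"),
}


def follow(path, max_hops=10):
    """Return the final path and the number of requests it took to get there."""
    requests_made = 1
    while path in REDIRECTS:
        if requests_made > max_hops:
            raise RuntimeError("too many redirects")
        _status, path = REDIRECTS[path]
        requests_made += 1
    return path, requests_made


chained = follow("/old-page")     # two redirects, so three requests
direct = follow("/current-page")  # no redirects, so one request
print(chained, direct)
```

Pointing /old-page straight at /current-page would save one of those three requests.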
Lots of other things can happen and the server sends different status codes based on those things. Ideally it sends a 200 OK, which means success, but again it might send a 404 or 301 or 500 (internal server error).
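The first digit of a status code tells you which family it belongs to. Here’s a short sketch that groups the codes mentioned above, using the HTTPStatus names that ship with Python. The `describe` function is just my own helper for illustration.

```python
from http import HTTPStatus


def describe(code):
    """Label a status code with its standard phrase and its general family."""
    phrase = HTTPStatus(code).phrase
    if 200 <= code < 300:
        kind = "success"
    elif 300 <= code < 400:
        kind = "redirect"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "informational"
    return f"{code} {phrase}: {kind}"


for code in (200, 301, 302, 404, 500):
    print(describe(code))
```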
Anything other than 200 success means your browser will either send additional requests or let you know something is wrong so let’s focus on success. Here’s what my server sent back to the request Firefox sent it.
Date: Thu, 26 Jan 2017 23:39:09 GMT
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Link: <http://vanseodesign.com/wp-json/>; rel="https://api.w.org/", <http://vanseodesign.com/>; rel=shortlink
Connection: keep-alive, Keep-Alive
Keep-Alive: timeout=3, max=10
Content-Type: text/html; charset=UTF-8
You can see the status code at the bottom, the date and time at the top, information about the server, and other header information about the type of content requested, the length of the content requested, what kind of encoding, etc. Assuming a successful response the server also sends the requested file, which, in this case and in general, will be an HTML file.
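Like the request, the response headers are plain text in a "Name: value" format, which makes them easy to pick apart. Here’s a minimal sketch that parses two of the headers from the response above. The `parse_headers` function is my own simplified helper and it ignores details a real HTTP library handles, such as repeated headers.

```python
# Two of the response headers shown above, as they travel over the wire.
raw_headers = (
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Keep-Alive: timeout=3, max=10\r\n"
)


def parse_headers(text):
    """Split 'Name: value' lines into a dict keyed by lowercased header name."""
    headers = {}
    for line in text.splitlines():
        if not line:
            continue
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


headers = parse_headers(raw_headers)
media_type = headers["content-type"].split(";")[0]
print(media_type, "|", headers["keep-alive"])
```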
The response time of the server is usually not the main bottleneck in performance. If it is, you probably need a new host or a better hosting package with the same host. Still there are some things you can do to improve response times, such as removing unnecessary redirection.
If your site relies on a database, you can optimize the database. And since the response ultimately leads to files being sent across the network, reducing the size of those files is something we should try to do.
Browser Renders Page
Once your browser receives an HTML file, it needs to render the page and it has to go through a few steps before you see it displayed. These steps are known as the critical rendering path, in which your browser needs to:
- Process HTML markup and build the DOM tree.
- Process CSS markup and build the CSSOM tree.
- Combine the DOM and CSSOM into a render tree.
- Run the layout on the render tree to compute the geometry of each node.
- Paint the individual nodes to the screen.
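The first three steps can be sketched in a very reduced form. This is a toy model and nothing like a real browser engine: the `DomBuilder` class and the helper functions are my own, the CSSOM is a hand-written dictionary keyed by tag name, and layout and paint aren’t modeled at all. What it does show is that the render tree only keeps the nodes that will actually be displayed.

```python
from html.parser import HTMLParser


class DomBuilder(HTMLParser):
    """Step 1: turn markup into a tree of {'tag', 'children'} nodes."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "document", "children": []}
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        self._stack[-1]["children"].append(node)
        self._stack.append(node)

    def handle_endtag(self, tag):
        if len(self._stack) > 1:
            self._stack.pop()


def build_render_tree(node, cssom):
    """Step 3: keep only the nodes that will be displayed, with styles attached."""
    style = cssom.get(node["tag"], {})
    if style.get("display") == "none":
        return None  # hidden nodes, and everything inside them, are left out
    children = [build_render_tree(child, cssom) for child in node["children"]]
    return {
        "tag": node["tag"],
        "style": style,
        "children": [child for child in children if child is not None],
    }


def tags(node):
    """Flatten a tree into a list of tag names, depth first."""
    return [node["tag"]] + [t for child in node["children"] for t in tags(child)]


builder = DomBuilder()
builder.feed("<html><body><h1>Title</h1><aside><p>Hidden</p></aside><p>Text</p></body></html>")

# Step 2, written by hand here instead of being parsed from a stylesheet.
cssom = {"h1": {"font-size": "32px"}, "aside": {"display": "none"}}

render_tree = build_render_tree(builder.root, cssom)
dom_tags = tags(builder.root)
render_tags = tags(render_tree)
print(dom_tags)
print(render_tags)
```

The aside and the paragraph inside it are in the DOM but never make it into the render tree, so the browser has nothing to lay out or paint for them.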
Hopefully you can already guess at a few things we can do to improve the time it takes to render a web page. Fewer embedded file requests, smaller files being requested, and reducing the number of render blocking resources will all improve performance, but they aren’t the only things.
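As a rough way to see how many extra requests a page makes and how many of them hold up rendering, here’s a small sketch that scans some markup. The `ResourceCounter` class, the file names, and the rules it applies are simplifications of mine. It treats stylesheets and external scripts without async or defer as render blocking and ignores everything else a browser takes into account.

```python
from html.parser import HTMLParser


class ResourceCounter(HTMLParser):
    """Count the extra files a page asks for and flag the render-blocking ones."""

    def __init__(self):
        super().__init__()
        self.requests = 0
        self.blocking = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "stylesheet":
            self.requests += 1
            self.blocking += 1  # stylesheets block rendering by default
        elif tag == "script" and "src" in attrs:
            self.requests += 1
            if "async" not in attrs and "defer" not in attrs:
                self.blocking += 1  # a plain external script holds up parsing
        elif tag == "img" and "src" in attrs:
            self.requests += 1  # images don't block the first render


page = """
<head>
  <link rel="stylesheet" href="main.css">
  <script src="analytics.js" async></script>
  <script src="menu.js"></script>
</head>
<body><img src="photo.jpg"></body>
"""

counter = ResourceCounter()
counter.feed(page)
print(counter.requests, "requests,", counter.blocking, "render blocking")
```

Adding defer to menu.js in this made-up page would leave the stylesheet as the only render blocking resource.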
That was a pretty quick walkthrough of what happens when you type a URL into a browser or click a link from one page to another. It admittedly skips a lot of details, but hopefully it’s enough to understand that a lot of things happen before a web page loads.
There are things you can do to improve each and every step, but most of what we’ll do involves the last step, which is probably where performance is impacted the most. Even if we only focus on the last step, the critical rendering path, there are still a lot of different improvements we can potentially make.
Before we start making changes we should understand a few more things. We should know where the bottlenecks are so we can spend our time where it’s most needed. We should also have some goals in mind to decide how much effort to put into each possible improvement.
Next week I want to talk about the goals we should set. I’ll talk about performance budgets, what they are, and how to set them. The following week I’ll show you how to find the performance bottlenecks in your site.
Interesting article, thanks for posting
Never realised there was so much to loading a web page! Interesting stuff, thanks.
I understood a lot, thank you
Ok got it thanks. I have been confused.. so if a site is filtered out by an ISP for example it can be filtered out immediately at the IP stage and that is why you see immediate error in connection and no codes or any more data onscreen.
When the browser requests a page I assumed the server would get the request and try to return that page. If it was a first request for page.com I thought it would ask for page.com/index.
How does the server know to go to Wordpress?
It replies via the webserver, which has Wordpress installed on it.
Very helpful information here – Great article Steven!
Whoops. My comments disappeared with a careless keystroke.
Thank you for your explanation of how browser and servers exchange data when serving a page.
How can you see the requests and responses that are being sent between browser and web server?
I usually use a browser extension. It’s been awhile so I don’t have a specific one to recommend, but if you search for extensions for your favorite browser that show http headers, the extension should show the requests and responses.
Can’t you just view the requests in the dev tools of the browser?
Do they all show that information? I don’t think they showed the HTTP info when I wrote the post, but I could be wrong. If browser dev tools show HTTP requests then you can certainly use them.
Very informative article. Thank you.