Why optimize HTML?
When HTML is slow to download and parse, or slow to pull in its external files, user experience suffers. Page load time (the onload event) grows, and the longer users have to wait, the more likely they are to abandon the page.
There are many ways you can optimize HTML to avoid these outcomes, including semantic optimizations that vary by browser. These shift over time because HTML syntax evolves and browsers adopt the updates at different rates.
I won't be covering those granular optimizations today. Instead, I want to give you a deeper understanding of:
- How to make sure HTML gets delivered quickly regardless of the browser type, and
- Which syntactical pieces affect modern HTML parsing the most.
I’m also going to explain why these optimizations are recommended, which will get a little technical, so I’ve included an overview at the beginning of each section, and then a TL;DR at the end.
Let’s get started.
Best practices for fast HTML delivery
- Clean up HTML so it is concise
- Compress HTML server-side
- Use non-standard optimizations as needed
HTML gets delivered like any other file on the internet – over a network in data packets, which have limited room for data. Here’s what the process looks like:
- On a new connection, the server can send up to 10 TCP packets in the first roundtrip.
- The server waits for the client (i.e., browser) to acknowledge the data.
- If the server receives confirmation from the client that it received the data, the server will double how much data it sends for each successive trip.
10 TCP packets is equivalent to about 14.3KB. So if the HTML is larger than 14.3KB, it will take multiple roundtrips to deliver the base file. Ideally, you would be able to fit multiple files into that first roundtrip, like CSS delivered with server push, in order to complete the critical rendering path in a shorter amount of time.
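To see where the 14.3KB figure comes from, here is a back-of-the-envelope calculation. It assumes a maximum segment size of 1,460 bytes per packet (typical for a 1,500-byte Ethernet MTU, but an assumption rather than something stated above) and an idealized connection with no packet loss, so real transfers may differ.

```latex
\begin{align*}
\text{first roundtrip} &= 10 \times 1460\ \text{B} = 14600\ \text{B} \approx 14.3\ \text{KB} \\
\text{total after } n \text{ roundtrips} &= 10\,(2^{n} - 1) \times 1460\ \text{B} \\
n = 2 &: \; 30 \times 1460\ \text{B} = 43800\ \text{B} \approx 42.8\ \text{KB} \\
n = 3 &: \; 70 \times 1460\ \text{B} = 102200\ \text{B} \approx 99.8\ \text{KB}
\end{align*}
```

Under these assumptions, a 40KB HTML file needs two roundtrips, while one that is trimmed or compressed to under about 14KB arrives in one.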
Reducing the size of the HTML file helps reach this goal, with two main ways to do so:
- Clean up excess HTML code to shorten the file length.
- Compress the HTML file so that a smaller file is delivered.
HTML Delivery Tip #1: Clean up HTML so it is concise
Following W3C specifications for markup makes HTML more maintainable and readable. The practices that reduce HTML file length the most are listed below.
Don’t use inline styles.
Link to a stylesheet in the <head> of the document instead of using inline styles. The type attribute does not need to be declared, so the reference to the external stylesheet looks something like this:
<link rel="stylesheet" href="styles.css">
Don't use inline scripts.
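As a rough sketch of the difference (the class name, the file names styles.css and script.js, and the inline snippets are made up for illustration), moving styles and behavior into external files leaves only short references in the HTML:

```html
<!-- Before: styling and behavior are written inline, repeated wherever they are needed -->
<button style="color: white; background: navy;" onclick="alert('Saved')">Save</button>

<!-- After, in the <head>: one short reference to the external stylesheet -->
<link rel="stylesheet" href="styles.css">

<!-- After, in the <body>: a class hook for styles.css, and script.js attaches the click handler -->
<button class="save-button">Save</button>
<script src="script.js"></script>
```

The external files can also be cached by the browser and shared across pages, so their contents are not re-sent with every HTML response.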
Reduce blank lines and unnecessary indentation.
Mozilla recommends indenting with 2 spaces rather than a tab – the equivalent of 4 spaces – and only separating blocks of code with a blank line when there is a good reason. You can also use a tool like HTML Tidy to strip out whitespace and extra blank lines from valid HTML.
HTML Delivery Tip #2: Compress HTML server-side
GZIP or a similar compression scheme lets the server send less data to the end user’s browser to construct the same page. Total compressed page size is about half the uncompressed page size.
If you’re not compressing HTML and other files, your site is likely slower than competitors.
HTML Delivery Tip #3: Use non-standard optimizations as needed
There are some kinds of optimizations done regularly to other files that are not standard for HTML.
Minification deletes all unnecessary whitespace and all new line characters, and is not common practice in HTML. While you can minify HTML if you wish to do so, it can make the document more difficult to read, especially if the page changes often.
Caching is not always used for HTML either, because HTML files tend to change frequently.
That being said, it is possible to cache HTML. Caching rules allow you to dictate where users’ browsers will request the document from – the cache or the server. Use caution, because you don’t want to serve up an old version of a website. Static HTML pages, like blog posts, can usually be cached without adverse effects.
Best practices for fast HTML parsing
- Get critical rendering files early
- Load files in the right order
- Load render-blocking scripts asynchronously
- Use valid markup and include essential tags
Once the HTML document has been delivered to a browser, several steps need to happen in the background before anything shows on the screen. This is known as the critical rendering path – the minimum steps that the browser has to take before the first pixel displays.
HTML parsing is included in the critical rendering path. The faster HTML parsing can occur, the quicker DOM construction can happen, and the faster the rendering will occur. I will probably cover this content in a separate blog post in the future, but for now, I’ll explain the critical rendering path in conjunction with optimizations for the HTML portion of this path.
HTML Parsing Tip #1: Get critical rendering files early
For both critical CSS and JS, you can use preload and HTTP server push to get these files to the browser sooner. CSS and JS are also typically static, which makes them excellent candidates for caching.
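A minimal sketch of preloading, assuming hypothetical file names (styles.css and critical.js): the preload hint tells the browser to start fetching a file as soon as it reads the <head>, rather than when it reaches the tag that uses it.

```html
<head>
  <meta charset="utf-8">
  <title>Preload sketch</title>
  <!-- Hint: start downloading this script right away; "as" tells the browser what kind of file to expect -->
  <link rel="preload" href="critical.js" as="script">
  <link rel="stylesheet" href="styles.css">
</head>
<body>
  <p>Page content</p>
  <!-- A preload hint only fetches the file; it still has to be referenced to run -->
  <script src="critical.js"></script>
</body>
```

Preload tends to pay off most for files the browser would otherwise discover late, such as a script referenced at the bottom of the page or a file requested from inside a stylesheet. For a stylesheet, the hint would use as="style" instead.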
HTML Parsing Tip #2: Load files in the right order
Load order matters between external CSS and JS files, too. Both HTML and CSS have to be parsed for the page to render. When the browser reads through – parses – the HTML, it goes from top to bottom. When it runs into CSS, the browser can start parsing it.
However, the default behavior of the browser when it sees a <script> tag is to stop parsing the HTML, download the script, parse it, and execute it. This is because the browser expects the script to affect the structure of the HTML, which in turn affects the way the page renders. It also means that any HTML below the script is not parsed until the script has finished. If a script has to load in the <head> tag of the document, load it after external CSS. Otherwise, place it at the bottom of the <body> tag, after the HTML content.
Finally, limit the number of files that need to load for rendering to happen. This can mean deferring third-party content that would otherwise load early in the page and slow rendering.
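The ordering described in this section can be sketched as follows, assuming hypothetical file names (styles.css, head-script.js, app.js):

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Load order sketch</title>
  <!-- 1. External CSS first, so the browser can start on styles right away -->
  <link rel="stylesheet" href="styles.css">
  <!-- 2. Only scripts that truly must run in the head, and only after the CSS -->
  <script src="head-script.js"></script>
</head>
<body>
  <h1>Content the user is waiting for</h1>
  <p>This markup is parsed before the script below runs.</p>
  <!-- 3. Other synchronous scripts last, after the HTML content -->
  <script src="app.js"></script>
</body>
</html>
```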
HTML Parsing Tip #3: Load render-blocking scripts asynchronously
When a browser sees a <script> tag in HTML, it stops HTML parsing until the script is downloaded (if external), parsed, and executed. This is known as synchronous behavior because it all happens in the main processing thread of the browser. However, there are two attributes you can assign scripts to change this default behavior – async and defer.
async attribute – short for “asynchronous”
When a browser encounters an asynchronous script in the HTML, downloading and parsing of the script happens in a separate processing thread, allowing HTML parsing to continue. The only portion of an asynchronous script that affects HTML parsing occurs upon execution, which occurs as soon as the script is parsed.
The async attribute should be denoted like this:
<script async src="script.js"></script>
defer attribute – for deferral of execution
Downloading and parsing a deferred script is also asynchronous, taking place in a separate processing thread. However, a script with the defer attribute will only execute once the HTML is done being parsed, just before the DOMContentLoaded event fires.
The defer attribute should be denoted like this:
<script defer src="script.js"></script>
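Deferred scripts run in the order of their <script> tags. Async scripts, by contrast, run in whatever order they finish downloading, so use the async attribute sparingly when execution order matters. Use either attribute only when the script does not alter the initial rendering of the page.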
HTML Parsing Tip #4: Use valid markup and include essential tags
Valid HTML5 markup is specified by the W3C. They also have an HTML validation tool you can use to see syntax and style errors in your code. There will almost always be some errors, but excessive errors in your document should be a concern. Browsers rely on HTML standardization to read and understand what an HTML document contains and how to display it, but poor document structure and poor use of syntax can slow down how quickly the page can display.
To make sure browsers can easily read your HTML, you should:
- Include essential tags and attributes.
- Close all tags that require closure.
- Use descriptive tags in favor of generic ones.
Include essential tags and attributes.
Declare the doctype.
The doctype declaration should happen at the very top of the HTML document, outside of any other tags and above the <html> tag. This lets the browser know what it’s looking at as soon as the page is delivered. For most cases, <!DOCTYPE html> will be appropriate, which defaults to the current HTML version. However, other doctypes can be declared depending on how the document will primarily be used.
Declare the document language.
Letting the browser know what language it’s looking at reduces errors in parsing and allows faster rendering. Use as short a language declaration as possible inside the <html> tag. For example, if your document is in Japanese, you would write:
<html lang="ja">
You can exclude the country code in this example because Japanese is spoken almost exclusively in Japan, so a region code such as ja-JP adds nothing.
Declare what character encoding the browser should use.
For character encoding, the current standard is to use UTF-8, which avoids the vulnerabilities of UTF-7. Character encoding is declared with the charset attribute of a <meta> tag within the <head> of the document.
Without a declared character encoding, the browser has to guess how to read the file. For that reason, you should include the <meta> tag with the charset attribute immediately after the opening <head> tag so that it is one of the first things the browser reads.
In summary, the beginning of the HTML document should include something like this at minimum:
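A minimal sketch of such an opening, assuming an English-language page (the lang value, title, and body text are placeholders):

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Character encoding first, so the browser knows how to decode everything that follows -->
  <meta charset="utf-8">
  <title>Page title</title>
</head>
<body>
  <p>Page content goes here.</p>
</body>
</html>
```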
Close all tags that require closure
In HTML, a few tags are assumed to be self-closing, called void elements. Most tags, however, require a closing tag. Although a browser can usually read HTML without closing tags, leaving tags open can result in disproportionately poor performance because the browser must construct additional DOM nodes to compensate for the nested elements.
An appropriately opened and closed element looks like this:
<p>This paragraph is opened and then closed.</p>
Void elements that do not require a closing tag in HTML include <br>, <hr>, <img>, <input>, <link>, and <meta>.
Favor descriptive element types and avoid generic ones
HTML5 includes new elements that are more specific to certain kinds of content. Using these descriptive element names gives the browser a more rigid set of rules for reading and styling the content contained in the element than using a generic element would. This can cut down on the number of rules necessary in CSS, as well as reducing redundant class attributes.
For instance, a navigation bar containing a set of links for the main navigation on the site can be denoted with the new <nav> element instead of with a generic <div>:
<nav>
<a href="/link1/">Link 1</a> |
<a href="/link2/">Link 2</a> |
<a href="/link3/">Link 3</a>
</nav>
Takeaways and the TL;DR
HTML can make or break your site. The way it’s delivered and structured determines how quickly the browser renders a webpage and what quality the rendering will be. With that in mind, we determined that reducing the amount of data that gets delivered with the HTML file allows the browser to start reading the HTML sooner, and that following best practices for document structure helps the browser read it faster.
Here are the HTML optimization recommendations made in this article (the TL;DR):
- Reduce unnecessary whitespace and blank lines.
- Compress HTML on the server with GZIP or similar.
- Get critical rendering files – like above-the-fold styles – early in the page load with preload and server push.
- Always load external CSS before JS in the <head>.
- Place synchronous JS at the bottom of the <body>.
- Load scripts asynchronously whenever possible.
- Validate your HTML.
- Always include essential tags and attributes, like <!DOCTYPE html>, lang, and charset.
- Favor descriptive element types over generic ones.
And as always, test changes before you deploy them!