HTTP Protocol Tutorial

By Xah Lee.

In 10 minutes, you'll have a basic understanding of the HTTP protocol.

Here's a summary of what the HTTP protocol is:

  1. A client/server model.
  2. Request/response. The client makes a request to a server, and the server responds.
  3. The request is typically sent over TCP, to port 80 on the server.
  4. In HTTP/1.x, a request or response message has two parts, header and payload (content), separated by an empty line. The header is plain text.
  5. The first line of a request message is called the request line. It contains the “command” (aka “verb”).
  6. The first line of a response message is called the status line. It contains the “status code”.
  7. There are different “commands”, technically called “request methods”. The most useful are GET and POST. GET asks for a resource (for example a file, or any data identified by a path). POST sends data to the server, such as the data needed for a login or a shopping cart.
  8. Each response has a status code.

For example, when you use a web browser to view a URL, the following happens:

  1. The browser initiates a request to a server. (The server address is contained in the URL.)
  2. The server sends back a response, which also starts with a plain text header. (If the content is an image file, the image bytes are sent as they are in the payload; they are not converted to text.)
  3. The browser renders the result. (If it is HTML, the browser parses it, and may make other requests for images, style sheets, JavaScript files, etc.)

To understand the HTTP protocol, we just need to understand the HTTP messages that the client and server send each other. Let's first look at tools to view HTTP messages.

How to See HTTP Messages

Web Browser Development Tool to View HTTP Messages

You can use a web browser to view the headers sent and received by the client and server.

[Screenshot: Google Chrome showing HTTP message headers, 2016-04-01.]

Here's how to use Google Chrome to view HTTP messages:

  1. Open the web development tool. (In Google Chrome, press F12 on Windows or Linux. Other browsers and operating systems have a similar tool. You can find it in their menus.)
  2. Click on the Network tab.
  3. Visit some page: type a URL in the URL box and press Enter ↵.
  4. Click on an item in the left of the network report to see the HTTP message header for that item. (Each item is an HTTP request made by the browser.)

Linux Command to View HTTP Messages

The following command line tools can show the HTTP response header.

Here's a HEAD example: HEAD example.com

[Screenshot: Linux HTTP GET/HEAD command line tool.]

The Linux commands GET, HEAD, and POST are Perl scripts from the libwww-perl package, which is available on Ubuntu. You can read their documentation with man HEAD.

Here's a curl example: curl --head example.com

~/web/xahlee_info/linux $ curl --head example.com
HTTP/1.1 200 OK
Content-Encoding: gzip
Accept-Ranges: bytes
Cache-Control: max-age=604800
Content-Type: text/html
Date: Sat, 02 Apr 2016 17:23:12 GMT
Etag: "359670651+gzip"
Expires: Sat, 09 Apr 2016 17:23:12 GMT
Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT
Server: ECS (iad/182A)
X-Cache: HIT
x-ec-custom-error: 1
Content-Length: 606

You can also use wget: wget --server-response --spider example.com

~/web/xahlee_info/linux $ wget --server-response --spider example.com
Spider mode enabled. Check if remote file exists.
--2016-04-02 10:25:10--  http://example.com/
Resolving example.com (example.com)... 2606:2800:220:1:248:1893:25c8:1946, 93.184.216.34
Connecting to example.com (example.com)|2606:2800:220:1:248:1893:25c8:1946|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Content-Encoding: gzip
  Accept-Ranges: bytes
  Cache-Control: max-age=604800
  Content-Type: text/html
  Date: Sat, 02 Apr 2016 17:25:11 GMT
  Etag: "359670651+gzip"
  Expires: Sat, 09 Apr 2016 17:25:11 GMT
  Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT
  Server: ECS (iad/182A)
  X-Cache: HIT
  x-ec-custom-error: 1
  Content-Length: 606
Length: 606 [text/html]
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.

〔➤see Linux: Download Website by Command: wget, curl, HEAD, GET〕

Programming languages such as Python and Ruby have similar tools or libraries.
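As a rough illustration of the library route, here is a minimal Python sketch that does what curl --head does, using the standard http.client module. The names fetch_head and DemoHandler are made up for this example, and the tiny local server exists only so that the sketch runs without internet access.

```python
import http.client
import http.server
import threading


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical throwaway handler, only here so the sketch runs offline."""

    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet (no access log on stderr)


def fetch_head(host, port=80, path="/"):
    """Send a HEAD request; return (status, reason, list of header fields)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.reason, response.getheaders()
    finally:
        conn.close()


# Start the local demo server on a free port chosen by the operating system.
server = http.server.HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    status, reason, headers = fetch_head("127.0.0.1", server.server_address[1])
finally:
    server.shutdown()
    server.server_close()

print(status, reason)
for name, value in headers:
    print(f"{name}: {value}")
```

Against a real site, the call would be something like fetch_head("example.com"), which uses the default port 80.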

Client/Server Messaging

Sample message sent by client:

GET /hello.txt HTTP/1.1
User-Agent: curl/7.16.3 libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3
Host: www.example.com
Accept-Language: en, mi

Sample message sent by server:

HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
ETag: "34aa387-d-1568eb00"
Accept-Ranges: bytes
Content-Length: 51
Vary: Accept-Encoding
Content-Type: text/plain

Hello World! My payload includes a trailing CRLF.

The messages exchanged by the client and server have 2 parts, header and content. The header is plain text.

The header and content are separated by 1 blank line.

The first line of the header is special.

If it's a request, the first line is called the request line. For example, it looks like this:

GET /tutorial/index.html HTTP/1.1

If it's a response, the first line is called the status line. For example, it looks like this:

HTTP/1.1 200 OK

The rest of the header is made of lines; each line is called a “field”.

A field is separated by its first colon : into two parts: field-name and field-value.
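The structure described above (first line, fields split at the first colon, blank line, content) can be sketched in Python as follows. The function name parse_http_message is hypothetical, and the sketch assumes a well-formed HTTP/1.x message whose lines end with "\r\n". Real parsers handle many more cases.

```python
def parse_http_message(raw):
    """Split a raw HTTP/1.x message into (first_line, fields, content)."""
    # The header and the content are separated by one blank line.
    header, _, content = raw.partition("\r\n\r\n")
    lines = header.split("\r\n")
    first_line = lines[0]  # request line or status line
    fields = {}
    for line in lines[1:]:
        # Split at the FIRST colon only: the value may contain colons too.
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    return first_line, fields, content


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Mon, 27 Jul 2009 12:28:53 GMT\r\n"
    "Content-Length: 51\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "Hello World! My payload includes a trailing CRLF.\r\n"
)
first_line, fields, content = parse_http_message(raw)
print(first_line)
print(fields)
print(repr(content))
```

Field names are lowercased here because they are case-insensitive. Note that the Date value keeps its own colons, since only the first colon separates name from value.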

I recommend you keep the network tab of the browser's web development console open, visit some websites you use frequently, and look at the request/response headers. You'll soon be familiar with them. If you are curious about the fields you see frequently, look them up in the HTTP/1.1 specification, such as RFC 7231, cited later in this tutorial.

Note: lines in the header of an HTTP message must be separated by the character sequence "\r\n". (That is, a carriage return followed by a line feed. Yes, both.) 〔➤see ASCII Table〕
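To make the line ending rule concrete, here is a small Python sketch that assembles a request message by hand. The helper name build_request and its parameters are assumptions made for this illustration, not part of any library.

```python
def build_request(method, path, host, extra_fields=None):
    """Build the text of a body-less HTTP/1.1 request message."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (extra_fields or {}).items():
        lines.append(f"{name}: {value}")
    # Every header line ends with "\r\n"; one extra "\r\n" (an empty line)
    # marks the end of the header.
    return "\r\n".join(lines) + "\r\n\r\n"


message = build_request(
    "GET", "/hello.txt", "www.example.com", {"Accept-Language": "en, mi"}
)
print(repr(message))
```

Printing with repr makes the otherwise invisible "\r\n" sequences show up in the output.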

HTTP Methods (aka Commands/Verbs)

Recall that the first line of a request looks like this: GET /tutorial/index.html HTTP/1.1

It has 3 parts: ① request method. ② resource path. ③ HTTP version.
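Since the three parts are separated by single spaces, taking a request line apart is short. This is only a sketch; parse_request_line is a made-up name, and it assumes a well-formed line.

```python
def parse_request_line(line):
    """Return (method, path, version) from an HTTP request line."""
    # Unpacking fails with ValueError unless there are exactly 3 parts.
    method, path, version = line.split(" ")
    return method, path, version


method, path, version = parse_request_line("GET /tutorial/index.html HTTP/1.1")
print(method)
print(path)
print(version)
```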

The most used HTTP commands are GET, HEAD, and POST.

RFC 7231 also defines PUT, DELETE, CONNECT, OPTIONS, and TRACE. These are less used commands of HTTP/1.1, and may not be implemented by a server.

For details about the commands, see: 〔RFC 7231 HTTP/1.1: Semantics and Content By IETF. @ https://tools.ietf.org/html/rfc7231〕

HTTP Status Code

In the server response message, the first line is the status line. Here's an example:

HTTP/1.1 200 OK

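It has 3 parts: ① the HTTP version. ② the status code. ③ a human-readable description of the status code (the reason phrase).

The status code has 3 digits. Its meaning is grouped into categories by the first digit:

  1. 1xx: Informational.
  2. 2xx: Successful.
  3. 3xx: Redirection.
  4. 4xx: Client Error.
  5. 5xx: Server Error.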
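A short Python sketch of reading a status line and classifying the code by its first digit. The names parse_status_line and status_class are hypothetical; the class names follow RFC 7231.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line):
    """Return (version, code, reason) from an HTTP status line."""
    # Split at most twice: the reason phrase may itself contain spaces.
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


def status_class(code):
    """Name the category of a 3-digit status code by its first digit."""
    return STATUS_CLASSES.get(code // 100, "Unknown")


version, code, reason = parse_status_line("HTTP/1.1 404 Not Found")
print(version, code, reason, "->", status_class(code))
```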

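The most used status codes include 200 (OK), 301 (Moved Permanently), 302 (Found), 304 (Not Modified), 404 (Not Found), and 500 (Internal Server Error).

The full list of status codes in RFC 7231, many of them less used and often unsupported, is: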

   +------+-------------------------------+--------------------------+
   | Code | Reason-Phrase                 | Defined in...            |
   +------+-------------------------------+--------------------------+
   | 100  | Continue                      | Section 6.2.1            |
   | 101  | Switching Protocols           | Section 6.2.2            |
   | 200  | OK                            | Section 6.3.1            |
   | 201  | Created                       | Section 6.3.2            |
   | 202  | Accepted                      | Section 6.3.3            |
   | 203  | Non-Authoritative Information | Section 6.3.4            |
   | 204  | No Content                    | Section 6.3.5            |
   | 205  | Reset Content                 | Section 6.3.6            |
   | 206  | Partial Content               | Section 4.1 of [RFC7233] |
   | 300  | Multiple Choices              | Section 6.4.1            |
   | 301  | Moved Permanently             | Section 6.4.2            |
   | 302  | Found                         | Section 6.4.3            |
   | 303  | See Other                     | Section 6.4.4            |
   | 304  | Not Modified                  | Section 4.1 of [RFC7232] |
   | 305  | Use Proxy                     | Section 6.4.5            |
   | 307  | Temporary Redirect            | Section 6.4.7            |
   | 400  | Bad Request                   | Section 6.5.1            |
   | 401  | Unauthorized                  | Section 3.1 of [RFC7235] |
   | 402  | Payment Required              | Section 6.5.2            |
   | 403  | Forbidden                     | Section 6.5.3            |
   | 404  | Not Found                     | Section 6.5.4            |
   | 405  | Method Not Allowed            | Section 6.5.5            |
   | 406  | Not Acceptable                | Section 6.5.6            |
   | 407  | Proxy Authentication Required | Section 3.2 of [RFC7235] |
   | 408  | Request Timeout               | Section 6.5.7            |
   | 409  | Conflict                      | Section 6.5.8            |
   | 410  | Gone                          | Section 6.5.9            |
   | 411  | Length Required               | Section 6.5.10           |
   | 412  | Precondition Failed           | Section 4.2 of [RFC7232] |
   | 413  | Payload Too Large             | Section 6.5.11           |
   | 414  | URI Too Long                  | Section 6.5.12           |
   | 415  | Unsupported Media Type        | Section 6.5.13           |
   | 416  | Range Not Satisfiable         | Section 4.4 of [RFC7233] |
   | 417  | Expectation Failed            | Section 6.5.14           |
   | 426  | Upgrade Required              | Section 6.5.15           |
   | 500  | Internal Server Error         | Section 6.6.1            |
   | 501  | Not Implemented               | Section 6.6.2            |
   | 502  | Bad Gateway                   | Section 6.6.3            |
   | 503  | Service Unavailable           | Section 6.6.4            |
   | 504  | Gateway Timeout               | Section 6.6.5            |
   | 505  | HTTP Version Not Supported    | Section 6.6.6            |
   +------+-------------------------------+--------------------------+

For details of all status codes, see: 〔RFC 7231 HTTP/1.1: Semantics and Content By IETF. @ https://tools.ietf.org/html/rfc7231〕

HTTP Cookies

Cookies are also sent as part of the HTTP header.

What is a cookie?

Basically, when a server responds, it can return a header field such as Set-Cookie: name=value. When the browser sees that, it normally stores the cookie locally, along with which server the cookie came from. When the browser later makes a request to the same server, it also sends back the cookies that server set before (as long as they have not expired).

The purpose of cookies is to let the server keep state about clients. For example, by setting a cookie, the server is able to know if the browser user is logged in.

Here's an example of a header from a server that asks the browser to store cookies:

HTTP/1.0 200 OK
Content-type: text/html
Set-Cookie: name=value
Set-Cookie: name2=value2; Expires=Wed, 09 Jun 2021 10:18:14 GMT

Server's HTTP response header, including 2 Set-Cookie lines.

When the browser makes a request to the server, it sends the cookies it got from that server. Here's an example header from a browser with cookies:

GET /spec.html HTTP/1.1
Host: www.example.org
Cookie: name=value; name2=value2
Accept: */*
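The round trip in the two examples above can be sketched in Python. This is a deliberately simplified illustration: parse_set_cookie and cookie_header are made-up helpers, the "jar" is a plain dict, and attributes such as Expires are ignored. (Python's standard library has the http.cookies module for real use.)

```python
def parse_set_cookie(field_value):
    """Return (name, value) from a Set-Cookie field value.

    Attributes after the first ";" (such as Expires) are ignored here.
    """
    pair = field_value.split(";", 1)[0]
    name, _, value = pair.partition("=")
    return name.strip(), value.strip()


def cookie_header(jar):
    """Build the Cookie field value the browser sends back."""
    return "; ".join(f"{name}={value}" for name, value in jar.items())


# The two Set-Cookie field values from the server response above.
received = [
    "name=value",
    "name2=value2; Expires=Wed, 09 Jun 2021 10:18:14 GMT",
]
jar = {}
for field_value in received:
    name, value = parse_set_cookie(field_value)
    jar[name] = value

header = cookie_header(jar)
print("Cookie: " + header)
```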

Here's a real world cookie from apple.com:

[Screenshot: cookie in HTTP header, 2016-04-02.]

For details on how cookies work, see:

JavaScript: Checking, Getting, Setting, Cookies

Anatomy of URL

Here's a review of the parts of a URL. It is not part of the HTTP protocol, but is useful for web programmers who don't know it already.

http://www.example.com:80/a/b/c?x=1&y=2#frag
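One way to see the parts is to let Python's standard urllib.parse module split a URL. The sketch below assumes the URL is written in the standard order, with the query string before the fragment.

```python
from urllib.parse import parse_qs, urlsplit

parts = urlsplit("http://www.example.com:80/a/b/c?x=1&y=2#frag")

print(parts.scheme)    # protocol
print(parts.hostname)  # server name
print(parts.port)      # port number
print(parts.path)      # resource path
print(parts.query)     # query string
print(parts.fragment)  # fragment (used by the browser only)

# The query string decoded into names and lists of values.
params = parse_qs(parts.query)
print(params)
```

Only the path and query string end up in the request line. The host goes into the Host field, and the fragment is not sent to the server.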

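The query string is generated in 2 ways: it can be written directly in a link's URL, or the browser can build it from the fields of an HTML form that uses method="get".

The HTML form can specify method="post" instead. In that case, the URL won't have a query string. The form data, in the same format as a query string, is sent in the body part of the HTTP message.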
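As a sketch of what a form submitted with method="post" produces, the Python below encodes form fields the way a query string is encoded and places them in the message body. The helper build_form_post, the path /login, and the field names are assumptions for illustration only.

```python
from urllib.parse import urlencode


def build_form_post(path, host, form):
    """Build the text of an HTTP/1.1 POST request carrying form data."""
    body = urlencode(form)  # same format as a query string
    header_lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        # Length of the body in bytes (urlencode output is ASCII).
        f"Content-Length: {len(body.encode('ascii'))}",
    ]
    # An empty line separates the header from the body.
    return "\r\n".join(header_lines) + "\r\n\r\n" + body


message = build_form_post("/login", "www.example.com", {"x": 1, "y": 2})
print(message)
```

The request line contains only the path, with no query string; the form data travels after the blank line.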

〔➤see HTML Form Example〕

The TCP/IP Protocol Suite, Or, is HTTP Message Carried by Birds?

HTTP is a high-level, application layer protocol of the TCP/IP internet protocol suite. It is mostly about the client and server exchanging human-readable text messages.

But how exactly does a browser find a server across the globe? How exactly does the browser send a message: by car, by boat, by airplane?

The details of how the client and server communicate are specified by many lower-level protocols in TCP/IP. For a basic introduction, see TCP/IP Tutorial for Beginner.

Reference

Hypertext Transfer Protocol Version 2 (HTTP/2). By IETF. @ https://tools.ietf.org/html/rfc7540

Hypertext Transfer Protocol -- HTTP/1.1 (obsolete). By IETF. @ https://tools.ietf.org/html/rfc2616