HTTP Protocol Tutorial
In 10 minutes, you'll have a basic understanding of the HTTP protocol.
Here's a summary of what the HTTP protocol is:
- A Client/Server model.
- Request/Response. Client makes a request to a server, server responds.
- Requests typically go over TCP, to port 80 on the server. [see TCP/IP Tutorial for Beginner]
- A request or response message has two parts, header and payload (content), separated by an empty line. The header is plain text.
- First line of request message is called request line. It contains the “command”.
- First line of response message is called status line. It contains the “status code”.
- There are different “commands”, technically called “request methods”. The most useful are GET and POST. GET basically just asks for a resource (for example a file, or any data identified by a path). POST means sending some data to the server, such as needed by a login or shopping cart.
- Each response has a status code.
For example, when you use a web browser to view a URL, the following happens:
- The browser sends a request to a server. (The server address is contained in the URL.)
- The server sends back a response. Its header is plain text. (If the content is an image file, the image is sent in the body as raw bytes; it is not encoded into text.)
To understand HTTP protocol, we just need to understand the HTTP messages that the client/server send. Let's first look at tools to view HTTP messages.
How to See HTTP Messages
See HTTP Headers in Web Browser
You can use a web browser to view the headers sent/received by client/server.
Here's how to use Google Chrome to view HTTP messages:
- Open the web development tool. (In Google Chrome, press F12 on Windows or Linux. Other browsers/OSes have a similar tool. You can find it in their menu.)
- Click on the Network tab.
- Visit some page: type a URL in the URL box and press Enter.
- Click on an item on the left of the network report to see the HTTP message headers for that item. (Each item is an HTTP request made by the browser.)
Linux Command to View HTTP Headers
The following command-line tools show the HTTP response headers.
curl --head example.com
wget --server-response --spider example.com
Here's a curl example:
curl --head example.com
HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Mon, 11 Mar 2019 04:25:57 GMT
Etag: "1541025663"
Expires: Mon, 18 Mar 2019 04:25:57 GMT
Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT
Server: ECS (sjc/4E4E)
X-Cache: HIT
Content-Length: 1270
Here's a wget example:
wget --server-response --spider example.com
Spider mode enabled. Check if remote file exists.
--2019-03-10 21:29:03-- http://example.com/
Resolving example.com (example.com)... 2606:2800:220:1:248:1893:25c8:1946, 184.108.40.206
Connecting to example.com (example.com)|2606:2800:220:1:248:1893:25c8:1946|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Content-Encoding: gzip
  Accept-Ranges: bytes
  Cache-Control: max-age=604800
  Content-Type: text/html; charset=UTF-8
  Date: Mon, 11 Mar 2019 04:29:03 GMT
  Etag: "1541025663"
  Expires: Mon, 18 Mar 2019 04:29:03 GMT
  Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT
  Server: ECS (sjc/4E45)
  X-Cache: HIT
  Content-Length: 606
Length: 606 [text/html]
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.
[see Linux: Download Website: wget, curl]
Other languages, such as Python and Ruby, have similar tools or libraries.
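As a rough Python counterpart to curl --head, the sketch below uses the standard http.client module to send a HEAD request and collect the response headers. So that it runs without internet access, it first starts a throwaway local server. HelloHandler, fetch_headers and demo are names made up for this illustration.

```python
import http.client
import http.server
import threading


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler that serves one small text page."""

    BODY = b"Hello World!\n"

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(self.BODY)))
        self.end_headers()

    def do_HEAD(self):
        # HEAD: same headers as GET, but no body.
        self._send_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(self.BODY)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_headers(host, port, path="/"):
    """Send a HEAD request; return (status code, reason, list of header pairs)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        resp.read()  # a HEAD response has no body, so this returns b""
        return resp.status, resp.reason, resp.getheaders()
    finally:
        conn.close()


def demo():
    """Start a local server on a free port, fetch its headers, then stop it."""
    server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch_headers("127.0.0.1", server.server_port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, reason, headers = demo()
    print(status, reason)
    for name, value in headers:
        print(f"{name}: {value}")
```

To look at a real site instead, call fetch_headers("example.com", 80) directly.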
Sample message sent by client:
GET /hello.txt HTTP/1.1
User-Agent: curl/7.16.3 libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3
Host: www.example.com
Accept-Language: en, mi
Sample message sent by server:
HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
ETag: "34aa387-d-1568eb00"
Accept-Ranges: bytes
Content-Length: 51
Vary: Accept-Encoding
Content-Type: text/plain

Hello World! My payload includes a trailing CRLF.
Lines in an HTTP message must be separated by the character sequence CR LF, written "\r\n" in many programming languages
(that is, a carriage return followed by a line feed. Yes, both.)
[see ASCII Table]
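To make the line-ending rule concrete, here is a minimal Python sketch that assembles a request with no body, ending every line with carriage return plus line feed ("\r\n"). The function name build_request is an assumption for illustration, not part of any library.

```python
CRLF = "\r\n"  # carriage return (ASCII 13) followed by line feed (ASCII 10)


def build_request(method, path, headers):
    """Build an HTTP/1.1 request without a body, as bytes ready to send."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers]
    # Every line ends with CRLF; one more CRLF forms the empty line
    # that marks the end of the header.
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")


request = build_request(
    "GET",
    "/hello.txt",
    [("Host", "www.example.com"), ("Accept-Language", "en, mi")],
)
print(request)
```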
A message exchanged by client/server has 2 parts, header and content. The header is plain text.
Header and content are separated by 1 blank line.
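The header/content split can be sketched in Python as below. The function name split_message is made up for this example, and the sample bytes are a shortened version of the server message shown earlier.

```python
def split_message(raw):
    """Split raw HTTP message bytes into (list of header lines, body bytes)."""
    # The first empty line (two CRLFs in a row) ends the header.
    head, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("no blank line found; message is incomplete")
    header_lines = head.decode("iso-8859-1").split("\r\n")
    return header_lines, body


raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 51\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"Hello World! My payload includes a trailing CRLF.\r\n"
)
header_lines, body = split_message(raw_response)
print(header_lines)
print(len(body))
```

The body is kept as bytes because, unlike the header, it is not necessarily text.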
The first line of the header is special.
If it's a request, it's called the request line. For example, it looks like this:
GET /tutorial/index.html HTTP/1.1
If it's a response, it's called the status line. For example, it looks like this:
HTTP/1.1 200 OK
The rest of the header is made of lines; each line is called a “field”.
A field is split at its first colon : into two parts: field-name and field-value.
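Here is a small Python sketch of splitting a field at its first colon. parse_field is a hypothetical helper name. Only the first colon counts, which matters because values such as dates contain colons of their own.

```python
def parse_field(line):
    """Split one header line into (field-name, field-value)."""
    name, colon, value = line.partition(":")  # splits at the FIRST colon only
    if not colon:
        raise ValueError(f"not a header field: {line!r}")
    # Whitespace around the value is not part of it.
    return name, value.strip()


print(parse_field("Host: www.example.com"))
print(parse_field("Date: Mon, 27 Jul 2009 12:28:53 GMT"))
```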
Recall that the first line of request looks like this:
GET /tutorial/index.html HTTP/1.1
It has 3 parts: ① request method. ② resource path. ③ HTTP version.
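The three parts are separated by single spaces, so pulling them apart is short. This is a minimal Python sketch; RequestLine and parse_request_line are names invented for the example.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str
    path: str
    version: str


def parse_request_line(line):
    """Split a request line into its method, path and HTTP version."""
    parts = line.split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    return RequestLine(*parts)


parsed = parse_request_line("GET /tutorial/index.html HTTP/1.1")
print(parsed.method, parsed.path, parsed.version)
```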
The most used request methods are:
- GET → request a resource.
- HEAD → Same as GET, but only get headers, no content. That is, just request metadata. Useful for web crawler, proxy server, etc.
- POST → Send some info to server. For example, used for login, credit card, shopping cart, via HTML Form. [see HTML Form Example]
Other methods are much less used, and may not be implemented by a server.
The following is a more complete list from HTTP/1.1 (source is Wikipedia 2019-03-11)
- GET → requests a representation of the specified resource. Requests using GET should only retrieve data and should have no other effect. (This is also true of some other HTTP methods.) The W3C has published guidance principles on this distinction, saying, “Web application design should be informed by the above principles, but also by the relevant limitations.” See safe methods below.
- HEAD → asks for a response identical to that of a GET request, but without the response body. This is useful for retrieving meta-information written in response headers, without having to transport the entire content.
- POST → requests that the server accept the entity enclosed in the request as a new subordinate of the web resource identified by the URI. The data POSTed might be, for example, an annotation for existing resources; a message for a bulletin board, newsgroup, mailing list, or comment thread; a block of data that is the result of submitting a web form to a data-handling process; or an item to add to a database.
- PUT → requests that the enclosed entity be stored under the supplied URI. If the URI refers to an already existing resource, it is modified; if the URI does not point to an existing resource, then the server can create the resource with that URI.
- DELETE → deletes the specified resource.
- TRACE → echoes the received request so that a client can see what (if any) changes or additions have been made by intermediate servers.
- OPTIONS → returns the HTTP methods that the server supports for the specified URL. This can be used to check the functionality of a web server by requesting ‘*’ instead of a specific resource.
- CONNECT → converts the request connection to a transparent TCP/IP tunnel, usually to facilitate SSL-encrypted communication (HTTPS) through an unencrypted HTTP proxy. See HTTP CONNECT method.
- PATCH → applies partial modifications to a resource.
For detail about the commands, see: [RFC 7231 HTTP/1.1: Semantics and Content By IETF. At https://tools.ietf.org/html/rfc7231 , accessed on 2016-04-02 ]
HTTP Status Code
In the server response message, the first line is the status line. Here's an example:
HTTP/1.1 200 OK
It has 3 parts: ① the HTTP version. ② the status code. ③ Human readable representation of the status code.
The status code has 3 digits. Its meaning is grouped into categories by the first digit:
- 1xx (Informational): The request was received, continuing process
- 2xx (Successful): The request was successfully received, understood, and accepted
- 3xx (Redirection): Further action needs to be taken in order to complete the request
- 4xx (Client Error): The request contains bad syntax or cannot be fulfilled
- 5xx (Server Error): The server failed to fulfill an apparently valid request
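As an illustration, the Python sketch below splits a status line into its 3 parts and looks up the category from the first digit of the code. parse_status_line and status_class are hypothetical names, and the sketch assumes the status line includes a reason phrase.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line):
    """Split a status line into (HTTP version, status code, reason phrase)."""
    # The reason phrase may contain spaces, so split at most twice.
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


def status_class(code):
    """Return the category name given by the first digit of the code."""
    return STATUS_CLASSES[code // 100]


version, code, reason = parse_status_line("HTTP/1.1 404 Not Found")
print(version, code, reason, "->", status_class(code))
```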
Here's the list of status codes from RFC 7231. Those with a 🌟 sign are most frequently used.
- 100 → Continue
- 101 → Switching Protocols
- 200 → 🌟 OK
- 201 → Created
- 202 → Accepted
- 203 → Non-Authoritative Information
- 204 → No Content
- 205 → Reset Content
- 206 → Partial Content
- 300 → Multiple Choices
- 301 → 🌟 Moved Permanently
- 302 → Found
- 303 → See Other
- 304 → Not Modified
- 305 → Use Proxy
- 307 → Temporary Redirect
- 400 → Bad Request
- 401 → Unauthorized
- 402 → Payment Required
- 403 → 🌟 Forbidden
- 404 → 🌟 Not Found
- 405 → Method Not Allowed
- 406 → Not Acceptable
- 407 → Proxy Authentication Required
- 408 → Request Timeout
- 409 → Conflict
- 410 → Gone
- 411 → Length Required
- 412 → Precondition Failed
- 413 → Payload Too Large
- 414 → URI Too Long
- 415 → Unsupported Media Type
- 416 → Range Not Satisfiable
- 417 → Expectation Failed
- 426 → Upgrade Required
- 500 → Internal Server Error
- 501 → Not Implemented
- 502 → Bad Gateway
- 503 → Service Unavailable
- 504 → Gateway Timeout
- 505 → HTTP Version Not Supported
For detail of all status code, see: [RFC 7231 HTTP/1.1: Semantics and Content By IETF. At https://tools.ietf.org/html/rfc7231 , accessed on 2016-04-02 ]
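Python's standard library already carries this code table, so a program does not need its own copy. The short sketch below looks up the human readable phrase for the 🌟 codes using http.HTTPStatus. The wrapper name reason_phrase is made up here.

```python
from http import HTTPStatus


def reason_phrase(code):
    """Return the standard human readable phrase for a status code."""
    return HTTPStatus(code).phrase


for code in (200, 301, 403, 404):
    print(code, reason_phrase(code))
```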
Cookies are also sent as part of the HTTP header.
What is a cookie?
Basically, when a server responds, it can return a header field such as Set-Cookie: name=value. When the browser sees that, it normally stores the cookie locally, along with which server the cookie came from. When the browser later makes a request to that server, it sends back the cookies that the server set before.
The purpose of cookies is to let the server keep state about clients. For example, by setting a cookie, the server is able to know if the browser user is logged in.
Here's an example of a header from a server that asks the browser to store cookies:
HTTP/1.0 200 OK
Content-type: text/html
Set-Cookie: name=value
Set-Cookie: name2=value2; Expires=Wed, 09 Jun 2021 10:18:14 GMT
(Server's HTTP response header, including 2 lines of cookie.)
When the browser makes a request to the server, it sends the cookies it got from that server. Here's an example header from a browser with cookies:
GET /spec.html HTTP/1.1
Host: www.example.org
Cookie: name=value; name2=value2
Accept: */*
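The round trip can be sketched in Python with the standard http.cookies module. This is a simplified illustration, not how a real browser works: the "jar" is a plain dict for a single server, and attributes such as Expires are parsed but then ignored. store_cookies and cookie_header are hypothetical names.

```python
from http.cookies import SimpleCookie


def store_cookies(jar, set_cookie_values):
    """Record name/value pairs from Set-Cookie field values into jar (a dict)."""
    for field_value in set_cookie_values:
        parsed = SimpleCookie()
        parsed.load(field_value)
        for name, morsel in parsed.items():
            jar[name] = morsel.value  # attributes like Expires are dropped here


def cookie_header(jar):
    """Build the value of the Cookie field to send back to the server."""
    return "; ".join(f"{name}={value}" for name, value in jar.items())


jar = {}
store_cookies(
    jar,
    ["name=value", "name2=value2; Expires=Wed, 09 Jun 2021 10:18:14 GMT"],
)
print("Cookie:", cookie_header(jar))
```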
For detail on how cookies work, see RFC 6265, HTTP State Management Mechanism.
Anatomy of URL
Here's a review of URL parts. It is not part of the HTTP protocol, but is useful for web programmers if you don't know it already. For example, the URL http://www.example.com:80/a/b/c?x=1&y=2#frag has these parts:
- http → protocol
- www.example.com → host. The “example.com” is domain name. “www” is sub-domain.
- 80 → port number
- a/b/c → resource path
- frag → url fragment
- x=1&y=2 → Called “query string”. It's a list of key/value pairs, used as argument for input.
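The same breakdown can be reproduced with Python's standard urllib.parse module. The full URL below is assembled from the parts listed above, so treat it as an illustrative example.

```python
from urllib.parse import parse_qsl, urlsplit

# Example URL assembled from the parts in the list above.
url = "http://www.example.com:80/a/b/c?x=1&y=2#frag"
parts = urlsplit(url)

print(parts.scheme)     # protocol
print(parts.hostname)   # host
print(parts.port)       # port number
print(parts.path)       # resource path
print(parts.query)      # query string
print(parts.fragment)   # url fragment

# The query string decoded into a list of key/value pairs.
pairs = parse_qsl(parts.query)
print(pairs)
```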
The query string is generated in 2 ways:
- User can type it directly in URL, or it can be embedded in a link in HTML.
- Or, it is generated by an HTML form when the form specifies the GET method like this:
<form action="http://example.com/xyz" method="get" enctype="application/x-www-form-urlencoded">
The HTML form can specify method="post" instead. In that case, the URL won't have a query string. The key/value pairs are sent, in the same format, in the body part of the HTTP message.
[see HTML Form Example]
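To show where the form data ends up in each case, the Python sketch below builds the approximate request text for a GET form and for a POST form. form_request is a made-up helper, and a real browser would add many more header fields.

```python
from urllib.parse import urlencode


def form_request(method, host, path, fields):
    """Return roughly the request text a form submission would produce."""
    encoded = urlencode(fields)  # for example "x=1&y=2"
    if method.upper() == "GET":
        # GET: the data goes into the URL as the query string; no body.
        return (
            f"GET {path}?{encoded} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "\r\n"
        )
    # POST: the same encoded data goes into the body instead.
    return (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(encoded)}\r\n"
        "\r\n"
        f"{encoded}"
    )


get_text = form_request("get", "example.com", "/xyz", {"x": "1", "y": "2"})
post_text = form_request("post", "example.com", "/xyz", {"x": "1", "y": "2"})
print(get_text)
print(post_text)
```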
The TCP/IP Protocol Suite
The HTTP protocol is a high-level application layer protocol of the TCP/IP internet protocol suite. HTTP protocol is about client/server exchanging messages.
But how exactly does a browser find a server across the globe? And how exactly does the browser send a message, by airplane?
The details of how client/server communicate are specified by many lower-level protocols in TCP/IP. For a basic introduction, see TCP/IP Tutorial for Beginner.
- RFC 7230, HTTP/1.1: Message Syntax and Routing.
- RFC 7231, HTTP/1.1: Semantics and Content.
- RFC 7232, HTTP/1.1: Conditional Requests.
- RFC 7233, HTTP/1.1: Range Requests.
- RFC 7234, HTTP/1.1: Caching.
- RFC 7235, HTTP/1.1: Authentication.
- RFC 2817 Upgrading to TLS Within HTTP/1.1.
- RFC 5785 Defining Well-Known Uniform Resource Identifiers (URIs).
- RFC 6266 Use of the Content-Disposition Header Field in the Hypertext Transfer Protocol (HTTP).
- RFC 6585 Additional HTTP Status Codes.
- Cookies: RFC 6265. [HTTP State Management Mechanism By IETF. At http://tools.ietf.org/html/rfc6265 , accessed on 2014-02-28 ]
- HTTP/2: RFC 7540. [Hypertext Transfer Protocol Version 2 (HTTP/2) By IETF. At https://tools.ietf.org/html/rfc7540 , accessed on 2016-04-01 ]
- Obsolete HTTP/1.1 specification: RFC 2616. [Hypertext Transfer Protocol -- HTTP/1.1 By IETF. At https://tools.ietf.org/html/rfc2616 , accessed on 2016-04-02 ]