TCP/IP Tutorial for Beginners

By Xah Lee.

This is a basic tutorial on TCP/IP, for beginning programmers or scientists. In an hour, you should have a basic understanding.

TCP/IP is a set of protocols, and is the primary technology of the internet. When you browse the web, send email, chat online, or play online games, TCP/IP is working busily underneath.

What is a protocol?

A protocol is a set of rules and procedures, such as what format to use, when data should be sent, what the numbers in the data mean, what commands to use, what error codes there are and what they mean, etc.

When two computers exchange data, they can understand each other if both use the same protocol.

Overview of How Internet Works

Suppose you are viewing a web page, chatting with a friend online, or downloading a file. What happens underneath?

The data the app (email, chat, etc) wants to send is broken into many small independent pieces. Each piece is called a packet (or datagram). Each packet has the destination IP address embedded in it. Your computer sends the packet to your router, and your router sends it to another router that's closer to the destination. This process continues until the machine with that IP address receives it. This is done for each and every packet. The receiving machine reassembles all these packets into the original whole, in the right order, and hands it to the right application on that machine (the email server, web server, or chat server), which in turn repeats the same process to send it to your friend's machine.
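The break-up and reassembly described above can be sketched in a few lines of Python. This is a toy model only: the dictionary layout, the function names packetize and reassemble, and the 8-byte packet size are assumptions for illustration, not how a real network stack stores packets.

```python
import random

def packetize(data: bytes, dest_ip: str, size: int = 8):
    """Split data into small packets, each tagged with destination and sequence number."""
    return [
        {"dest": dest_ip, "seq": i, "payload": data[off:off + size]}
        for i, off in enumerate(range(0, len(data), size))
    ]

def reassemble(packets):
    """Put packets back in order by sequence number and join their payloads."""
    ordered = sorted(packets, key=lambda p: p["seq"])
    return b"".join(p["payload"] for p in ordered)

message = b"Hello, this is a chat message sent over the internet."
packets = packetize(message, "192.0.2.10")   # 192.0.2.10 is a documentation address
random.shuffle(packets)                      # packets may arrive out of order
restored = reassemble(packets)
print(restored == message)
```

Each packet carries its own destination, so it can travel independently. The sequence number is what lets the receiver restore the order.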

Computer software follows a set of standardized rules of procedure when talking to each other. The set of standardized rules used for the internet is called the Internet Protocol Suite (aka TCP/IP).

Networking Hardware

Before we talk about internet protocols, let's take a look at the hardware needed, because hardware gives us a good overview of how things are connected.

The essential hardware for the internet to work, for our purposes, is:

Network Interface Controller

Network Interface Controller (NIC) (aka network adapter, network card, network interface card, LAN adapter) is a piece of hardware that lets your computer talk to the internet. Every internet-capable device has at least one. Today's computers usually have two, one for Ethernet (wired) and one for wireless.

Network Interface Controller provides one of:

Network adapter card. It provides an Ethernet port. This is used if your computer does not have wired Ethernet built in. This was popular in the early 2000s. Since about 2010, it's usually just a chip on the motherboard.
Ethernet cable, for wired internet.
USB wireless network adapter. This is used if your computer does not have a wireless network adapter built in. This was popular in the late 2000s, when wireless became popular but most laptop and desktop computers didn't have it built in yet.

As of year 2020, nearly every phone and laptop has a wireless network adapter built in, and so do many desktop computers.

How to list all Network Interface?

Router

The second most important piece of hardware is the router. A router transfers packets between internet devices.

TP-Link AC1750 WiFi 5 Router and NETGEAR Cable Modem CM500V
Home wireless router ports. A home wireless router also serves as a wired router.

The Network Adapter (wired or wireless) in your computer sends signals to the router, then the router either sends them to another computer in your home, or sends them to the internet via the physical cable or phone line connected to it (typically through a device called a Cable Modem).

Typically, software on an internet device sends info to the Network Adapter, then the Network Adapter sends it to a router, then that router sends it to another router, and so on, until a router sends it to the destination computer's Network Adapter. The destination is usually a corporation's machine, which we call a “server” (e.g. when you go to google.com or any other website). The server either stores your info (e.g. photos), or sends it to another user's machine (e.g. online chat messages).

Internet Addresses

In order to send info, devices must have addresses for the destination. Two kinds of address are most important:

MAC Address (aka Hardware Address, Physical Address)

Each Network Adapter has an ID, called MAC address (aka hardware address, physical address). This ID is burned into the hardware. (“MAC” is an abbreviation for “Media Access Control”. The name is historical.)

A MAC address is a 48-bit number, usually written as 6 groups of 2 hexadecimal digits. For example, 8d-cc-58-ab-db-b8.
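To make the "48-bit number written as 6 groups of 2 hexadecimal digits" idea concrete, here is a small Python sketch that converts between the written form and the underlying integer. The helper names mac_to_int and int_to_mac are made up for this example.

```python
def mac_to_int(mac: str) -> int:
    """Turn a written MAC address into the 48-bit integer it represents."""
    return int(mac.replace("-", "").replace(":", ""), 16)

def int_to_mac(n: int) -> str:
    """Turn a 48-bit integer back into 6 groups of 2 hexadecimal digits."""
    raw = n.to_bytes(6, "big")            # 6 bytes = 48 bits
    return "-".join(f"{b:02x}" for b in raw)

mac = "8d-cc-58-ab-db-b8"
n = mac_to_int(mac)
print(n < 2 ** 48)        # True: it fits in 48 bits
print(int_to_mac(n))      # 8d-cc-58-ab-db-b8
```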

How to find the MAC address of the Network Adapters on my machine?

Binary Number, Hexadecimal Number

When working with networking protocols, you need to understand binary numbers and hexadecimal numbers in detail, and you need to be able to convert between them.

bit means binary digit. A binary digit is either 1 or 0. E.g. 4 bits look like this: 1001, or 0011, or 1100, etc.

octet means 8 bits.

byte means 8 bits. (This wasn't always so before the 1990s.)

1 hexadecimal digit is equivalent to 4 bits.
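As a quick illustration of these conversions, the following Python lines use only built-in functions. The sample value 10011100 is arbitrary.

```python
n = 0b10011100                      # one octet written in binary

as_hex = format(n, "02x")           # '9c'
as_bin = format(0x9c, "08b")        # '10011100'
as_dec = int("9c", 16)              # 156

# Each hexadecimal digit corresponds to exactly 4 bits.
nibbles = [format(int(digit, 16), "04b") for digit in "9c"]   # ['1001', '1100']

print(as_hex, as_bin, as_dec, nibbles)
```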

IP Address

IP addresses are used to identify internet devices. (Each internet device may have one or more IP addresses.)

There are 2 versions of IP address: IPv4 and IPv6.

IPv4 is the older standard. It's only 32 bits, good for 2^32 unique addresses (about 4.3 billion). This has not been enough since the late 1990s. So, IPv6 was invented.

How to find the IP address of my network adapter?

How to find the IP address of my router?

Host, Hostname

A host refers to a particular machine (e.g. your computer). A hostname is just a name for a machine. Hostnames let humans easily identify a machine. A host may have more than one IP address (because it can have multiple Network Adapters, or, a computer can be set up to function as a router, etc.).

How to find my hostname?

Network vs Host

Hosts on the internet are grouped into “networks”. For example, all computers in a company can be one network. All computers in a home can be one network.

Each host is a part of a network.

IP Address Structure: Network, Host, Special Addresses

IP addresses are divided into 2 parts: network and host. The beginning bits are the network, the rest are host.

When a router gets a packet, it needs to know where to send this packet to (of all devices connected to it). Ultimately, this is done by a look-up table called Routing table (aka Routing Information Base, RIB).

When the network part of a packet's destination IP address matches a network the router is directly connected to, the router knows the destination host is on that network, so it can send the packet to that host directly. Otherwise, the destination is on a different network, and the router sends the packet to another router.

The reason an IP address is divided into network and host parts is that it makes routing much more efficient. It is similar to how a home address is divided into Country, State/Province, City, then finally street address.

Netmask: Network Bitmask

Each IPv4 address comes with a 32-bit number called bitmask. The bitmask indicates how many bits are the network part. The network bits are 1, and the host bits are 0.

For example, if an IP address has a bitmask of 11111111 11111111 00000000 00000000, it means the first 16 bits of the IP address are the network, and the rest are the host.
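Here is a short Python sketch of applying that 16-bit mask with bitwise AND. It uses the standard ipaddress module only to convert between dotted notation and integers. The address 172.16.254.1 is an arbitrary example.

```python
import ipaddress

ip = int(ipaddress.IPv4Address("172.16.254.1"))
mask = int(ipaddress.IPv4Address("255.255.0.0"))   # 16 one-bits, then 16 zero-bits

network_part = ip & mask                  # keep only the bits where the mask is 1
host_part = ip & ~mask & 0xFFFFFFFF       # keep only the bits where the mask is 0

print(format(mask, "032b"))                   # 11111111111111110000000000000000
print(ipaddress.IPv4Address(network_part))    # 172.16.0.0
print(ipaddress.IPv4Address(host_part))       # 0.0.254.1
```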

CIDR Notation

CIDR notation is used to indicate how many bits at the beginning of an IP address are network. (CIDR means Classless Inter-Domain Routing.)

CIDR notation looks like this: x.x.x.x/n, where x.x.x.x is the usual dotted decimal notation for an IP address, and n is the number of bits for the network part.

Example:

192.0.2.0/24

It means the first 24 bits are network.
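Python's standard ipaddress module understands CIDR notation directly, which gives an easy way to check your reading of an address. A minimal sketch using the example network above:

```python
import ipaddress

net = ipaddress.ip_network("192.0.2.0/24")

print(net.prefixlen)        # 24 network bits
print(net.netmask)          # 255.255.255.0
print(net.num_addresses)    # 256, since 8 host bits remain

# Membership test: does an address share the same first 24 bits?
inside = ipaddress.ip_address("192.0.2.77") in net
outside = ipaddress.ip_address("192.0.3.1") in net
print(inside, outside)      # True False
```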

IPv4 Special Address

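When the network part or the host part of an IP address is all 0 or all 1, it has special meaning. For example, within a network, the address whose host part is all 0 refers to the network itself, and the address whose host part is all 1 is the broadcast address of that network.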

There are more special addresses. See Reserved IP addresses
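As a small illustration of the all-0 and all-1 host parts, the standard ipaddress module exposes both for any network. The network 192.0.2.0/24 is just a sample.

```python
import ipaddress

net = ipaddress.ip_network("192.0.2.0/24")

network_address = net.network_address      # host bits all 0
broadcast_address = net.broadcast_address  # host bits all 1
usable = list(net.hosts())                 # everything in between

print(network_address)     # 192.0.2.0
print(broadcast_address)   # 192.0.2.255
print(len(usable))         # 254
```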

TCP/IP Protocol Layers

TCP/IP data flow. The solid lines are the actual data connections. The dotted lines are abstract connections. 〔image source 2013-01-27 ❮http://en.wikipedia.org/wiki/File:IP_stack_connections.svg❯〕

TCP/IP is a set of protocols that are logically separated into 4 layers. From highest to lowest, they are: Application layer, Transport layer, Internet layer, Link layer.

Each layer down covers more detail about how to send a datagram.

Here's a human example. If I send you a letter, I'm not concerned about how it gets there, by car, by plane, or by boat, or who delivers the letter, or what happens if it's raining. All I care about is the mail content and address, and whether you got the letter (and how soon you can get it). This is the highest level. But beneath it, there must be a system, such as an address system, a transportation system, government law or structure for delivering mail, etc.

In TCP/IP, the highest layer, the Application Layer, is concerned only with software sending some content (a sequence of bytes) to another address such as an email address, URL, or IP address. The lowest layer, the Link Layer, is concerned with how to actually connect hardware physically, over cable/wire or radio waves, such as the design of the cable and the electric signals.

Here's more detail about each layer.

Application layer (process-to-process): This is the highest layer. Application layer protocols focus on communication from the application's perspective, such as sending and receiving the data, and the format of the data. For example, HTTP (web), SMTP (email), and DHCP (automatic host config) are protocols at this level.

Transport layer (host-to-host): provides end-to-end communication services for applications. The transport layer provides convenient services such as connection-oriented data stream support, reliability, flow control, and multiplexing. Two most used protocols in this layer are TCP and UDP.

Internet layer (internetworking): The internet layer is about exchanging datagrams across machines. This layer defines the addressing and routing structures used in TCP/IP. The primary example is the IP (Internet Protocol), which defines IP addresses. Its function in routing is to send datagrams to the next router that is closer to the destination IP address.

Link layer: This layer is pretty much about physical connection technology. That is, translating packets to various electric or optical wire signals, or wireless radio waves or satellite transmission. Ethernet is a standard of the link layer.

Sample encapsulation of data in TCP/IP. At the top, the highest abstraction layer, the data is simply what we want to send, such as chat text. The data is broken into many small pieces. Below it, at the Transport layer, each piece becomes a datagram by adding a “header”. This header contains info such as port numbers and length. Below it, the IP header contains even lower level info, such as the source and destination IP addresses. And so on. 〔image source 2013-01-27 ❮http://en.wikipedia.org/wiki/File:UDP_encapsulation.svg❯〕
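The layering can be imitated in Python with the struct module: each layer simply puts its own header in front of what the layer above handed down. This is a simplified sketch, not a working network stack. The function names are made up, both checksums are left as 0, and the addresses and ports are sample values.

```python
import socket
import struct

def udp_datagram(src_port: int, dst_port: int, data: bytes) -> bytes:
    """Transport layer: prepend the 8-byte UDP header (ports, length, checksum)."""
    header = struct.pack("!HHHH", src_port, dst_port, 8 + len(data), 0)
    return header + data

def ipv4_packet(src_ip: str, dst_ip: str, payload: bytes) -> bytes:
    """Internet layer: prepend a minimal 20-byte IPv4 header (checksum left 0 here)."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                 # version 4, header length 5 x 4 bytes
        0,                    # type of service
        20 + len(payload),    # total length
        0, 0,                 # identification, flags/fragment offset
        64,                   # time to live
        17,                   # protocol number 17 = UDP
        0,                    # header checksum (not computed in this sketch)
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
    )
    return header + payload

data = b"hi there"                                   # application data
datagram = udp_datagram(50000, 53, data)             # data + UDP header
packet = ipv4_packet("192.0.2.1", "192.0.2.2", datagram)
print(len(data), len(datagram), len(packet))         # 8 16 36
```

The link layer would wrap the packet once more in a frame (for example an Ethernet frame) before it goes onto the wire.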

Port Number

Port Number is a 16-bit number. It is used as an address to identify the app/process on a machine. The IP address identifies a host, and the port number identifies the process on that host.

Port number is used by TCP and UDP.

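Port numbers are divided into three ranges: well-known ports (0 through 1023), registered ports (1024 through 49151), and dynamic or private ports (49152 through 65535).

Well-known ports are those from 0 through 1023. Examples: 22 (SSH), 25 (SMTP), 53 (DNS), 80 (HTTP), 443 (HTTPS).

Here's a complete list: List of TCP and UDP port numbers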

Socket

A network socket is basically an API for programs to talk to the network. A socket address is a combination of an IP address and a port number.

So, when a browser or email app wants to talk to the internet, it talks to a socket. The socket is usually provided by the Operating System as an API. The programmer doesn't have to worry about TCP/IP details. He just creates a socket (by calling a function or creating a new object), specifies the IP address, port number, and type of connection, and calls functions/methods to send/receive data on it.
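As a minimal sketch of that workflow, here is a tiny TCP exchange using Python's standard socket module. Both ends run on the same machine (127.0.0.1) so the example is self-contained. The helper echo_once and the upper-casing "service" are made up for illustration.

```python
import socket
import threading

def echo_once(server: socket.socket) -> None:
    """Accept one connection and send back what was received, upper-cased."""
    conn, _addr = server.accept()
    with conn:
        conn.sendall(conn.recv(1024).upper())

# Server side: create a socket, give it an IP address and port, and listen.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # SOCK_STREAM = TCP
server.bind(("127.0.0.1", 0))          # port 0 asks the OS to pick a free port
server.listen(1)
port = server.getsockname()[1]
worker = threading.Thread(target=echo_once, args=(server,))
worker.start()

# Client side: connect to (IP address, port), then send and receive bytes.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)

worker.join()
server.close()
print(reply)
```

Note that the program never touches packets, headers, or acknowledgements. The operating system handles those behind the socket API.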

Here are sample docs on coding sockets in different programming languages:

Connection Oriented vs Connectionless

There are 2 types of communication in TCP/IP: connection-oriented (e.g. TCP) and connectionless (e.g. UDP).

TCP/IP by nature is a connectionless network, because each packet is independent. This is called packet switching networking technology. (Meaning, lots of small data “packets” are sent, each one independent of the others. They swarm towards the destination via routing (the “switch” part).)

Packet Switching is in contrast to circuit switching tech.

A circuit switching network is a connection-oriented networking approach. When a caller calls another, an electric circuit is established between the callers. It was used by early analog telephone networks. A circuit switching network in a sense dedicates the cable (or channel, medium) to each active call/connection. See Circuit switching.

A telephone operator manually connecting calls with cord pairs at a telephone switchboard. Photo taken in 1975. (photo by Joseph A Carr. Used with permission) 〔image source 2013-02-20 ❮http://en.wikipedia.org/wiki/File:JT_Switchboard_770x540.jpg❯〕

However, a packet switching network (TCP/IP) can emulate the effects of a physical connection by using protocols that acknowledge transmission, thereby establishing a virtual connection. TCP does this.

Here's how connection-oriented networking works. When a packet is sent, the receiver sends back an acknowledgement. If the sender doesn't receive it, it re-sends the packet. When a session of communication is over, the sender and receiver say goodbye to each other, thereby “closing” the connection. In this way, communication is established as if through a physical connection, even though the data units transmitted are actually discrete and go through many routers that don't have any notion of who is connected to whom.
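The acknowledge-and-resend idea can be simulated without any real network. The sketch below is a simplified "send one, wait for the ack" scheme, which is much simpler than what TCP actually does. The function name, the 30% loss rate, and the fixed random seed are all assumptions chosen for illustration.

```python
import random

def send_reliably(chunks, loss_rate=0.3, seed=1):
    """Deliver chunks in order over a simulated lossy link, resending until acked."""
    rng = random.Random(seed)
    received = []            # what the receiver has accepted so far
    transmissions = 0
    for seq, chunk in enumerate(chunks):
        acked = False
        while not acked:
            transmissions += 1
            if rng.random() < loss_rate:
                continue                     # packet lost, no ack comes back, resend
            if seq == len(received):
                received.append(chunk)       # new data; a duplicate would be ignored
            acked = rng.random() >= loss_rate    # the ack itself can also get lost
    return received, transmissions

chunks = [b"he", b"ll", b"o!"]
received, transmissions = send_reliably(chunks)
print(received == chunks, transmissions)
```

Even though individual packets and acknowledgements get lost, the receiver ends up with every chunk exactly once and in order. The cost is extra transmissions.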

TCP connection. 〔image source 2013-02-09 ❮http://en.wikipedia.org/wiki/File:TCP_CLOSE.svg❯〕

IP Datagram

An IP packet consists of a header section and a data section.

An IP packet has no data checksum or any other footer after the data section. Typically the link layer encapsulates IP packets in frames with a CRC footer that detects most errors, and typically the end-to-end TCP layer checksum detects most other errors.
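The IPv4 header does carry a checksum of its own, covering only the header. A short Python sketch follows of how that header checksum is computed (a one's-complement sum of 16-bit words) and how a packet splits into header and data. The function names are made up, the 20 header bytes are a sample, and the 95 zero bytes stand in for real data.

```python
import struct

def internet_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, then inverted (header length must be even)."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def split_packet(packet: bytes):
    """Use the header-length field (low 4 bits of byte 0, in 4-byte units) to split."""
    header_len = (packet[0] & 0x0F) * 4
    return packet[:header_len], packet[header_len:]

# Sample 20-byte header with the checksum field (bytes 10-11) set to zero.
zeroed = bytes.fromhex("45000073000040004011" "0000" "c0a80001c0a800c7")
checksum = internet_checksum(zeroed)
print(hex(checksum))                         # 0xb861

# With the checksum filled in, summing the whole header gives 0.
filled = zeroed[:10] + struct.pack("!H", checksum) + zeroed[12:]
header, data = split_packet(filled + bytes(95))
print(internet_checksum(header), len(header), len(data))   # 0 20 95
```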

ipv4 header format 2023-06-05

Routing schemes: unicast, anycast, multicast, broadcast

Routing

The Internet Protocol addressing system recognizes 3 main types of addressing.

Transmission Control Protocol (TCP)

Transmission Control Protocol

TCP provides a communication service at an intermediate level between an application program and the Internet Protocol (IP). That is, when an application program desires to send a large chunk of data across the Internet using IP, instead of breaking the data into IP-sized pieces and issuing a series of IP requests, the software can issue a single request to TCP and let TCP handle the IP details.

tcp segment header 2023-06-05

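The sequence number has 2 meanings, depending on whether the SYN flag in the segment is on or off. If SYN is on, it is the initial sequence number. If SYN is off, it is the sequence number of the first data byte in this segment.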

UDP (User Datagram Protocol)

User Datagram Protocol

UDP sends datagrams without prior communications to set up special transmission channels or data paths.

UDP uses a simple transmission model with a minimum of protocol mechanism. It has no handshaking dialogs, and thus exposes any unreliability of the underlying network protocol to the user's program. As this is normally IP over unreliable media, there is no guarantee of delivery, ordering or duplicate protection. UDP provides checksums for data integrity, and port numbers for addressing different functions at the source and destination of the datagram.

UDP is suitable for purposes where error checking and correction is either not necessary or performed in the application, avoiding the overhead of such processing at the network interface level. Time-sensitive applications often use UDP because dropping packets is preferable to waiting for delayed packets, which may not be an option in a real-time system. If error correction facilities are needed at the network interface level, an application may use the Transmission Control Protocol (TCP) or Stream Control Transmission Protocol (SCTP) which are designed for this purpose.
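For comparison with TCP, here is a minimal UDP exchange in Python using the standard socket module. It runs entirely on the local machine (127.0.0.1), and the message b"ping" is arbitrary.

```python
import socket

# Receiver: a datagram socket bound to an address. There is no listen() or accept().
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)   # SOCK_DGRAM = UDP
receiver.bind(("127.0.0.1", 0))        # port 0 asks the OS to pick a free port
receiver.settimeout(2.0)               # do not wait forever if the datagram is lost
address = receiver.getsockname()

# Sender: no connection setup and no handshake, just send to an address.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"ping", address)

data, source = receiver.recvfrom(1024)
sender.close()
receiver.close()
print(data, source[0])
```

Nothing here confirms delivery. On a real network the datagram could be lost, and the program would have to notice and cope (here, through the timeout).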

Datagram Congestion Control Protocol (DCCP)

Stream Control Transmission Protocol (SCTP)

Address Resolution Protocol (ARP)

Address Resolution Protocol is a protocol that creates a look-up table for mapping IP addresses to MAC addresses.

Each host has an ARP cache. If a host wants to send data to another host in the same segment, it checks whether that host's MAC address is in the ARP cache. If not, the host sends a broadcast called an ARP request frame. The host with that IP address responds with its MAC address.
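The cache-then-broadcast logic can be modeled in a few lines of Python. This is a simulation only: the Host class and the lan dictionary are hypothetical stand-ins for a real network segment, and the addresses are documentation examples.

```python
class Host:
    """Toy model of a host with an ARP cache (no real network traffic)."""

    def __init__(self, lan):
        self.lan = lan              # hypothetical segment: IP address -> MAC address
        self.arp_cache = {}
        self.requests_sent = 0

    def resolve(self, ip):
        if ip in self.arp_cache:            # cache hit: no broadcast needed
            return self.arp_cache[ip]
        self.requests_sent += 1             # cache miss: broadcast an ARP request
        mac = self.lan.get(ip)              # only the owner of that IP replies
        if mac is not None:
            self.arp_cache[ip] = mac        # remember the answer for next time
        return mac

lan = {"192.0.2.1": "00-00-5e-00-53-01", "192.0.2.7": "00-00-5e-00-53-07"}
host = Host(lan)
first = host.resolve("192.0.2.7")
second = host.resolve("192.0.2.7")
print(first, second, host.requests_sent)    # the second lookup is served from the cache
```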

Reverse Address Resolution Protocol (RARP) is obsolete, replaced by Bootstrap Protocol (BOOTP) then by Dynamic Host Configuration Protocol (DHCP).

Internet Control Message Protocol (ICMP)

Internet Control Message Protocol

The Internet Control Message Protocol (ICMP) is one of the core protocols of the Internet Protocol Suite. It is chiefly used by the operating systems of networked computers to send error messages indicating, for example, that a requested service is not available or that a host or router could not be reached. ICMP can also be used to relay query messages. It is assigned protocol number 1.

ICMP differs from transport protocols such as TCP and UDP in that it is not typically used to exchange data between systems, nor is it regularly employed by end-user network applications (with the exception of some diagnostic tools like ping and traceroute).

ICMP for Internet Protocol version 4 (IPv4) is also known as ICMPv4. IPv6 has a similar protocol, ICMPv6.

ICMP is often used by routers to send messages back to a host to indicate problems. Here are common scenarios.

ICMPv6

Internet Group Management Protocol

The Internet Group Management Protocol (IGMP) is a communications protocol used by hosts and adjacent routers on IP networks to establish multicast group memberships. IGMP is an integral part of IP multicast.

IGMP can be used for one-to-many networking applications such as online streaming video and gaming, and allows more efficient use of resources when supporting these types of applications.

IGMP is used on IPv4 networks. Multicast management on IPv6 networks is handled by Multicast Listener Discovery (MLD) which uses ICMPv6 messaging in contrast to IGMP's bare IP encapsulation.

Routing

Routing is one of the most important elements of the internet, because it is routing that moves data.

By definition, a router has 2 or more network adapters, because a router is used to forward data between different networks. For home routers, usually one end is connected to a cable modem or DSL modem leading to the internet, and the other end has Ethernet ports for the home network.

The most critical part is the routing table. A routing table can be set up manually, called static routing, but it is almost always constructed automatically by “discovery” protocols, called dynamic routing (because manually setting up the routing table is humanly impossible when there are more than a handful of networks). The routing table can still be manually adjusted, however.

Routing Table

Routing table

Routing table, aka Routing Information Base (RIB), is a data table stored in a router or a computer that lists the routes to particular network destinations, and in some cases, metrics (distances) associated with those routes. The routing table contains information about the topology of the network immediately around it.

The construction of routing tables is the primary goal of routing protocols. Static routes are entries made in a routing table by non-automatic means and which are fixed rather than being the result of some network topology “discovery” procedure.
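To show how a routing table is consulted, here is a small Python sketch of a lookup. The table entries and the "direct" marker are invented for the example. The rule it applies, picking the matching route with the longest network prefix, is the usual convention for IP routing, though it is not spelled out in the text above.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop).
ROUTES = [
    ("0.0.0.0/0", "203.0.113.1"),        # default route: matches everything
    ("192.0.2.0/24", "direct"),          # a network this router is attached to
    ("198.51.100.0/24", "192.0.2.254"),  # reachable through another router
]

def next_hop(destination: str) -> str:
    """Return the next hop of the most specific route that contains the destination."""
    address = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(network), hop)
        for network, hop in ROUTES
        if address in ipaddress.ip_network(network)
    ]
    _network, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop

print(next_hop("192.0.2.9"))       # direct
print(next_hop("198.51.100.3"))    # 192.0.2.254
print(next_hop("8.8.8.8"))         # 203.0.113.1
```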

How to see the routing table of my computer?

Routing Protocols

The job of a routing protocol is to fill the routing table.

There are 2 major types of routing protocol: link-state and distance-vector.

A link-state routing protocol is one of the two main classes of routing protocols used in packet switching networks for computer communications (the other is the distance-vector routing protocol). Examples of link-state routing protocols include open shortest path first (OSPF) and intermediate system to intermediate system (IS-IS).

The link-state protocol is performed by every router in the network. The basic concept of link-state routing is that every node constructs a map of the connectivity to the network, in the form of a graph, showing which nodes are connected to which other nodes. Each node then independently calculates the next best logical path from it to every possible destination in the network. The collection of best paths will then form the node's routing table.

This contrasts with distance-vector routing protocols, which work by having each node share its routing table with its neighbors. In a link-state protocol the only information passed between nodes is connectivity related.
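The "each node calculates the best path from its map" step of link-state routing is commonly done with Dijkstra's shortest-path algorithm. Below is a minimal Python sketch under that assumption. The four-router graph and its link costs are made up, and real protocols such as OSPF involve far more than this calculation.

```python
import heapq

def shortest_paths(graph, source):
    """Dijkstra's algorithm: cost to every node, plus the first hop to take from source."""
    dist = {source: 0}
    first_hop = {}
    heap = [(0, source, None)]
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue                              # stale entry, a better path was found
        if hop is not None:
            first_hop.setdefault(node, hop)
        for neighbor, link_cost in graph[node].items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                via = neighbor if node == source else hop
                heapq.heappush(heap, (new_cost, neighbor, via))
    return dist, first_hop

# Hypothetical map of the network as learned from link-state messages.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"C": 1},
}
dist, first_hop = shortest_paths(graph, "A")
print(dist)         # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
print(first_hop)    # {'B': 'B', 'C': 'B', 'D': 'B'}
```

The first_hop mapping is essentially router A's routing table: for each destination, which neighbor to hand the packet to.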

Routing Information Protocol (RIP) is a distance-vector routing protocol.

A RIP router broadcasts an update message every 30 seconds. It can also request an update.
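What a router does with a neighbor's update can be sketched as follows. This is a simplified distance-vector merge in Python, not the full RIP specification. The function name and table layout are assumptions. The hop count of 16 meaning "unreachable" is RIP's convention.

```python
def merge_update(table, neighbor, neighbor_table, infinity=16):
    """Merge a neighbor's advertised table into ours. Tables map dest -> (hops, next_hop)."""
    changed = False
    for dest, (hops, _next) in neighbor_table.items():
        new_hops = min(hops + 1, infinity)        # one extra hop to reach the neighbor
        old_hops, old_next = table.get(dest, (infinity, None))
        # Accept a shorter route, or any change on a route we already use via this neighbor.
        if new_hops < old_hops or (old_next == neighbor and new_hops != old_hops):
            table[dest] = (new_hops, neighbor)
            changed = True
    return changed

# Router A knows only itself. Neighbor B advertises itself and a route to C.
table_a = {"A": (0, None)}
update_from_b = {"B": (0, None), "C": (1, "C")}
changed = merge_update(table_a, "B", update_from_b)
print(changed, table_a)
```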

Open Shortest Path First (OSPF) is a link-state routing protocol.

hop count

Routing loop problem

Core router

A core router is a router designed to operate in the Internet backbone, or core. To fulfill this role, a router must be able to support multiple telecommunications interfaces of the highest speed in use in the core Internet and must be able to forward IP packets at full speed on all of them. It must also support the routing protocols being used in the core. A core router is distinct from an edge router: edge routers sit at the edge of a backbone network and connect to core routers.

Autonomous System (Internet)


Dynamic Host Configuration Protocol (DHCP)

Dynamic Host Configuration Protocol

Zero configuration networking




Tunneling protocol

Computer networks use a tunneling protocol when one network protocol (the delivery protocol) encapsulates a different payload protocol. By using tunneling one can (for example) carry a payload over an incompatible delivery-network, or provide a secure path through an untrusted network.

Virtual private network

Simple Service Discovery Protocol (SSDP)

Simple Network Management Protocol

Network segment. A term for a portion of a network. For example, an Ethernet hub is a device for connecting multiple Ethernet devices together and making them act as a single network segment.

Ethernet hub

Network switch. A switch is a telecommunication device which receives a message from any device connected to it and then transmits the message only to the device for which the message was meant. This makes the switch a more intelligent device than a hub (which receives a message and then transmits it to all the other devices on its network).
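One common way a switch knows which port a device is on is by remembering the source MAC address of each frame it sees. The Python sketch below models that idea. The Switch class is a toy, the MAC addresses are made-up strings, and sending to all other ports when the destination is unknown is an assumption about typical switch behavior, not something stated above.

```python
class Switch:
    """Toy model of a learning switch: returns the set of ports a frame is sent out of."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}                    # learned: MAC address -> port

    def receive(self, in_port, src_mac, dst_mac):
        self.mac_table[src_mac] = in_port      # learn where the sender lives
        if dst_mac in self.mac_table:
            return {self.mac_table[dst_mac]}   # known destination: one port only
        return self.ports - {in_port}          # unknown: send everywhere else, like a hub

switch = Switch(ports=[1, 2, 3, 4])
first = switch.receive(1, "aa-aa", "bb-bb")    # bb-bb not yet learned
second = switch.receive(2, "bb-bb", "aa-aa")   # aa-aa was learned on port 1
third = switch.receive(1, "aa-aa", "bb-bb")    # bb-bb was learned on port 2
print(first, second, third)
```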

Promiscuous mode

In computer networking, promiscuous mode or promisc mode is a mode for a wired network interface controller (NIC) or wireless network interface controller (WNIC) that causes the controller to pass all traffic it receives to the central processing unit (CPU) rather than passing only the frames that the controller is intended to receive. This mode is normally used for packet sniffing that takes place on a router or on a computer connected to a hub (instead of a switch) or one being part of a WLAN. The mode is also required for bridged networking for hardware virtualization.

In IEEE 802 networks such as Ethernet, token ring, and IEEE 802.11, and in FDDI, each frame includes a destination Media Access Control address (MAC address). In non-promiscuous mode, when a NIC receives a frame, it normally drops it unless the frame is addressed to that NIC's MAC address or is a broadcast or multicast frame. In promiscuous mode, however, the card allows all frames through, thus allowing the computer to read frames intended for other machines or network devices.


Wireless

IEEE 802.11

Service set (802.11 network)

Common Problems

See: How to Diagnose Computer Networking Problems

Firewall

firewall 21570 http://en.wikipedia.org/wiki/File:Firewall.png

A firewall (see Firewall (computing)) filters traffic. Firewalls can be classified by their power:

Placement of firewall:

A firewall can be software based or hardware based. The function of a firewall is often part of other services or devices. Most operating systems have a software based firewall. Some routers can also do some firewall functions, or be a powerful firewall. A firewall can also be a proxy server.

On Linux, the firewall framework is netfilter (iptables). For an intro, see: Linux: What's Netfilter, iptables, Their Differences?

Port scanner

DNS and host file

Hosts (file)

Domain Name System

WAN

Wide area network

Integrated Services Digital Network ISDN

High-Level Data Link Control HDLC

ATM

Asynchronous Transfer Mode


OpenWrt

DD-WRT

FON

Keynote talk by Radia Perlman at Linux.conf.au 2013 http://mirror.linux.org.au/linux.conf.au/2013/mp4/Keynote_Radia_Perlman.mp4