TCP/IP Tutorial for Beginners
This is a basic tutorial on TCP/IP for beginner programmers and scientists. In an hour, you should have a basic understanding.
TCP/IP is a set of protocols, and is the primary technology of the internet. When you browse the web, send email, chat online, or play online games, TCP/IP is working busily underneath.
What is a protocol?
A protocol is a set of rules and procedures, such as what format to use, when data should be sent, what the numbers in the data mean, what commands to use, what error codes there are and what they mean, etc.
When two computers exchange data, they can understand each other if both use the same protocol.
Overview of How Internet Works
Suppose you are viewing a web page, chatting with a friend online, or downloading a file. What happens underneath?
The data from the app (email, chat, etc) is broken into many small independent pieces. Each piece is called a packet (or datagram). Each packet carries the destination IP address. Your computer sends the packet to your router, and your router sends it to another router that's closer to the destination. This process continues until the machine with that IP address receives it. This is done for each and every packet. The receiving machine re-assembles all these packets into the original whole, in the right order, and sends it to the right application on that machine (the email server, web server, or chat server), which in turn repeats the same process to send it to your friend's machine.
Computer software follows a set of standardized rules of procedure when talking to each other. The standardized rules of procedure used for the internet are called the Internet Protocol Suite (aka TCP/IP).
Networking Hardware
Before we talk about internet protocols, let's take a look at the hardware needed, because hardware gives us a good overview of how things are connected.
The essential hardware for the internet to work, for our purposes, is:
- Network Adapter
- Router
Network Interface Controller
Network Interface Controller (NIC) (aka network adapter, network card, network interface card, LAN adapter) is hardware that lets your computer talk to the internet. Every internet-capable device has at least one. Today's computers usually have two, one for Ethernet (wired) and one for wireless.
Network Interface Controller provides one of:
- wired ethernet port
- wireless internet transceiver
As of year 2020, nearly every phone, laptop, and desktop computer has a wireless network adapter built in.
How to list all Network Interfaces?
- Linux: Type
ip link
or
ifconfig -a
- Windows: Type
ipconfig /all
Router
The second most important hardware is the router. A router transfers packets between internet devices.
The Network Adapter (wired or wireless) in your computer sends signals to the router, then the router either sends them to another computer in your home, or sends them to the internet via the physical cable or phone line connected to it (typically through a device called a cable modem).
Typically, each internet device starts with its software sending info to the Network Adapter, then the Network Adapter sends it to a router, then that router sends it to another router, and so on, until a router sends it to the destination computer's Network Adapter. The destination is usually a corporation's machine, which we call a “server” (e.g. when you go to google.com or any other website). The server either stores your info (e.g. photos) or sends it to another user's machine (e.g. online chat messages).
Internet Addresses
In order to send info, devices must have addresses for the destination. Two kinds are most important:
- MAC Address
- IP Address
MAC Address (aka Hardware Address, Physical Address)
Each Network Adapter has an ID, called its MAC address (aka hardware address, physical address). This ID is burned into the hardware. (“MAC” is an abbreviation for “Media Access Control”; the name is historical.)
A MAC address is a 48-bit number, usually written as 6 groups of 2 hexadecimal digits. For example, 8d-cc-58-ab-db-b8.
How to find the MAC address of the Network Adapters on my machine?
- Linux: Type
ip link
or
ifconfig -a
- Windows: Type
ipconfig /all
Binary Number, Hexadecimal Number
When working with networking protocols, you need to understand Binary Number and Hexadecimal Number in detail, and you need to be able to convert them.
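bit means binary digit. A binary digit is either 1 or 0. For example, 4 bits look like this: 1001, or 0011, or 1100, etc.
octet means 8 bits.
byte means 8 bits. (On some older machines a byte had a different size, which is why networking documents say “octet” when exactly 8 bits is meant.)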
1 hexadecimal digit is equivalent to 4 bits.
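Here's a small sketch of converting between binary and hexadecimal, using only Python built-ins. The function names bits_to_hex and hex_to_bits are made up for this illustration.

```python
# Convert between binary and hexadecimal strings using Python built-ins.

def bits_to_hex(bits: str) -> str:
    """Convert a string of binary digits (e.g. '10101011') to hexadecimal."""
    width = (len(bits) + 3) // 4            # 4 bits per hexadecimal digit
    return format(int(bits, 2), "0{}x".format(width))

def hex_to_bits(hex_digits: str) -> str:
    """Convert hexadecimal digits (e.g. 'ab') to binary, 4 bits per digit."""
    width = len(hex_digits) * 4
    return format(int(hex_digits, 16), "0{}b".format(width))

print(bits_to_hex("1001"))     # 9
print(hex_to_bits("ab"))       # 10101011
print(int("11111111", 2))      # 255, the largest value of one octet
```

Each hexadecimal digit maps to exactly 4 bits, so an octet is always 2 hexadecimal digits.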
IP Address
An IP address is used to identify an internet device. (Each internet device may have one or more IP addresses.)
There are 2 versions of IP address: IPv4 and IPv6.
- IPv4 address = 32 bits. Usually written as 4 groups, each a decimal number, separated by dots. For example, 172.16.254.1. Each decimal group represents 8 bits.
- IPv6 address = 128 bits. Usually written as 8 groups, each of 4 hexadecimal digits, separated by colons, with leading 0s omitted. For example, 2001:db8:0:1234:0:567:8:1. Each group of hexadecimal digits represents 16 bits.
IPv4 is the older standard. Because it's only 32 bits, it is good for 2^32 unique addresses (about 4.3 billion). This has not been enough since the late 1990s, so IPv6 was invented.
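To see the bit sizes concretely, here's a minimal sketch using Python's standard ipaddress module with the two example addresses above. The helper name to_bits is made up for this illustration.

```python
import ipaddress

v4 = ipaddress.ip_address("172.16.254.1")
v6 = ipaddress.ip_address("2001:db8:0:1234:0:567:8:1")

def to_bits(addr) -> str:
    """Return the address as a binary string, padded to its full bit length."""
    return format(int(addr), "0{}b".format(addr.max_prefixlen))

print(v4.version, len(to_bits(v4)))   # 4 32
print(v6.version, len(to_bits(v6)))   # 6 128
print(to_bits(v4))                    # 10101100000100001111111000000001
print(v6.exploded)                    # 2001:0db8:0000:1234:0000:0567:0008:0001
print(2 ** 32)                        # 4294967296
```

The exploded form shows the leading 0s that the usual IPv6 notation omits.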
How to find the IP address of my network adapter?
- Linux: Type
ip addr
or
ifconfig -a
- Windows: Type
ipconfig
How to find the IP address of my router?
- Linux: Type ip route. The line containing “default” has the IP address of the default router.
- Windows: Type ipconfig, then the “Default Gateway” line contains your router's IP address.
Host, Hostname
A host refers to a particular machine (e.g. your computer). A hostname is just a name for a machine, used so that humans can easily identify it. A host may have more than one IP address (because it can have multiple Network Adapters, or be set up to function as a router, etc.).
How to find my hostname?
- Linux: Type
hostname
- Windows: Type
hostname
Network vs Host
Hosts on the internet are grouped into “networks”. For example, all computers in a company can be one network. All computers in a home can be one network.
Each host is a part of a network.
IP Address Structure: Network, Host, Special Addresses
An IP address is divided into 2 parts: network and host. The beginning bits are the network part, the rest are the host part.
When a router gets a packet, it needs to know where to send the packet (of all the devices connected to it). Ultimately, this is done with a look-up table called the routing table (aka Routing Information Base, RIB).
When the network part of the destination IP address matches a network the router is directly attached to, the router knows the destination is on that network, so it can send the packet to the host machine. Otherwise, the destination is on a different network, and the router sends the packet to another router.
The reason an IP address is divided into network and host parts is that it makes routing much more efficient. It is similar to how a home address is divided into Country, State/Province, City, then finally street address.
Netmask: Network Bitmask
Each IPv4 address comes with a 32-bit number called a bitmask (netmask). The bitmask indicates how many bits are the network part. The network bits are 1, and the host bits are 0.
For example, if an IP address has a bitmask of 11111111 11111111 00000000 00000000, it means the first 16 bits of the IP address are the network part, and the rest are the host part.
CIDR Notation
CIDR notation is used to indicate how many bits at the beginning of an IP address are the network part. (CIDR means Classless Inter-Domain Routing.)
CIDR notation looks like this: x.x.x.x/n, where x.x.x.x is the usual dotted decimal notation for an IP address, and n is the number of bits for the network part.
Example:
192.0.2.0/24
It means the first 24 bits are network.
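The bitmask idea can be sketched with plain bitwise operations. This is an illustration only: the function names prefix_to_mask and split_address, and the sample host 192.0.2.37, are made up here.

```python
import ipaddress

def prefix_to_mask(prefix_len: int) -> int:
    """Build a 32-bit IPv4 netmask with prefix_len leading 1 bits."""
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF

def split_address(address: str, prefix_len: int):
    """Return (network part, host part) of an IPv4 address as dotted strings."""
    addr = int(ipaddress.IPv4Address(address))
    mask = prefix_to_mask(prefix_len)
    network = addr & mask                  # keep only the network bits
    host = addr & ~mask & 0xFFFFFFFF       # keep only the host bits
    return str(ipaddress.IPv4Address(network)), str(ipaddress.IPv4Address(host))

print(format(prefix_to_mask(16), "032b"))  # 11111111111111110000000000000000
print(split_address("192.0.2.37", 24))     # ('192.0.2.0', '0.0.0.37')
```

ANDing the address with the mask leaves the network part. ANDing with the inverted mask leaves the host part.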
IPv4 Special Address
When the network part or the host part of an IP address is all 0 or all 1, it has special meaning.
- If the host part's bits are all 0, the address refers to the network itself (the network address).
- If the host part's bits are all 1, it's a broadcast address, meant to be sent to all hosts that belong to the destination network/subnet. (This is called “directed broadcast”.)
- If the entire IP address is all 1 (that is, 255.255.255.255), it means local network broadcast. Routers never forward packets with this destination outside the local network.
- Default route = 0.0.0.0/0
- 127.0.0.0/8 = loopback net block. The hostname localhost translates to an address in this block, usually 127.0.0.1, or to ::1 in IPv6.
There are more special addresses. See Reserved IP addresses
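Python's standard ipaddress module can show some of these special addresses for a given network. A short sketch, using the example network 192.0.2.0/24:

```python
import ipaddress

net = ipaddress.ip_network("192.0.2.0/24")
print(net.network_address)      # 192.0.2.0    (host bits all 0)
print(net.broadcast_address)    # 192.0.2.255  (host bits all 1)
print(net.netmask)              # 255.255.255.0

print(ipaddress.ip_address("127.0.0.1").is_loopback)   # True
print(ipaddress.ip_address("::1").is_loopback)         # True

# Every IPv4 address falls inside the default route 0.0.0.0/0.
print(ipaddress.ip_address("198.51.100.7") in ipaddress.ip_network("0.0.0.0/0"))  # True
```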
TCP/IP Protocol Layers
- “Host” means a computer.
- The “process” means a running software program, such as web browser.
- The “link” means “router”.
- {Ethernet, fiber, satellite} are physical links (e.g. cable or radio wave transmissions.)
TCP/IP is a set of protocols that are logically separated into 4 layers. They are:
- Application layer
- Transport layer
- Internet layer
- Link layer
Each layer down covers more detail about how to send a datagram.
Here's a human example. If I send you a letter, I'm not concerned about how it gets there (by car, by plane, by boat), or who delivers the letter, or what happens if it's raining. All I care about is the mail content and address, and whether you got the letter (and how soon you can get it). This is the highest level. But beneath it, there must be a system, such as an address system, a transportation system, government law or structure for delivering mail, etc.
In TCP/IP, the highest layer, the Application Layer, is concerned only with software sending some content (a sequence of bytes) to another address, such as an email address, URL, or IP address. The lowest layer, the link layer, is concerned with how to actually connect hardware physically, over cable/wire or radio waves, such as the design of the cable and the electric signals.
Here's more detail about each layer.
Application layer (process-to-process): This is the highest layer. Application layer protocols focus on communication from the application's perspective, such as sending/receiving the data and the format of the data. For example, {HTTP (web), SMTP (email), DHCP (automatic host config)} are protocols at this level.
Transport layer (host-to-host): provides end-to-end communication services for applications. The transport layer provides convenient services such as connection-oriented data stream support, reliability, flow control, and multiplexing. Two most used protocols in this layer are TCP and UDP.
Internet layer (internetworking): The internet layer is about exchanging datagrams across machines. This layer defines the addressing and routing structures used in TCP/IP. The primary example is the IP (Internet Protocol), which defines IP addresses. Its function in routing is to send datagrams to the next router that is closer to the destination IP address.
Link layer: This layer is pretty much about physical connection technology. That is, translating packets to various electric or optical wire signals, or wireless transmission by radio waves or satellite. Ethernet is a standard of the link layer.
Port Number
Port Number is a 16-bit number. It is used as an address to identify the app/process on a machine. The IP address identifies a host, and the port number identifies the process on that host.
Port number is used by TCP and UDP.
Port numbers are divided into three ranges:
- well-known ports
- registered ports
- dynamic or private ports
Well-known ports are those from 0 through 1023. Examples:
- 20 and 21: FTP
- 22: SSH (for secure remote command line access.)
- 23: Telnet (for remote command line access)
- 25: SMTP (for sending email)
- 53: DNS
- 80: HTTP
- 110: POP3 (for receiving email)
- 143: IMAP. (for receiving email, improved POP)
- 161: Simple Network Management Protocol (SNMP)
- 443: HTTPS. (secure web, for example, online banking, shopping)
Here's a complete list: List of TCP and UDP port numbers
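Here's a small sketch that classifies a port number into the three ranges. The well-known range (0 through 1023) is from the text above. The other two boundaries (registered = 1024 through 49151, dynamic = 49152 through 65535) are not stated in this tutorial and are taken from the IANA port registry. The function name classify_port is made up.

```python
def classify_port(port: int) -> str:
    """Return which of the three port ranges a port number belongs to."""
    if not 0 <= port <= 65535:
        raise ValueError("a port number must fit in 16 bits (0 to 65535)")
    if port <= 1023:
        return "well-known"
    if port <= 49151:            # boundary from the IANA registry, not this tutorial
        return "registered"
    return "dynamic/private"

print(classify_port(80))      # well-known
print(classify_port(8080))    # registered
print(classify_port(50000))   # dynamic/private
```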
Socket
A network socket is basically an API for programs to talk to the network. A socket address is a combination of an IP address and a port number.
So, when a browser or email app wants to talk to the internet, it talks to a socket. The socket is usually provided by the Operating System as an API. The programmer doesn't have to worry about TCP/IP details: they just create a socket (by calling a function or creating a new object), specify the IP address, port number, and type of connection, and call functions/methods to send/receive data on it.
Here are sample docs for coding sockets in different programming languages:
- golang https://golang.org/pkg/net/
- python https://docs.python.org/3/library/socket.html
- emacs lisp Network (ELISP Manual)
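As a rough illustration of the socket steps described above (create a socket, specify IP address and port, send/receive), here's a minimal Python sketch that runs a one-shot echo server and a client on the same machine. The function names echo_once and tcp_round_trip are made up, and a real program would loop on recv and handle errors.

```python
import socket
import threading

def echo_once(server_sock: socket.socket) -> None:
    """Accept one connection, echo back what it receives, then close it."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)

def tcp_round_trip(message: bytes) -> bytes:
    """Send message to a local echo server over TCP and return the reply."""
    # Server socket: IPv4 (AF_INET), TCP (SOCK_STREAM).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
        server.listen(1)
        host, port = server.getsockname()      # the socket address: (IP, port)
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()
        # Client socket: connect to the server's socket address, send, receive.
        with socket.create_connection((host, port), timeout=5) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply

print(tcp_round_trip(b"hello"))   # b'hello'
```

The program never touches packets, routing, or acknowledgements. It only names an IP address, a port, and a connection type, and the operating system does the rest.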
Connection Oriented vs Connectionless
There are 2 types of connection in TCP/IP:
- Connection Oriented.
- Connectionless.
TCP/IP by nature is a connectionless network, because each packet is independent. This is called packet switching. (Meaning, lots of small data “packets” are sent, each one independent of the others. They swarm towards the destination via routing (the “switch” part).)
Packet switching is in contrast to circuit switching.
A circuit switching network is a connection-oriented networking approach. When a caller calls another, an electric circuit is established between the callers. It was used by early analog telephone networks. A circuit switching network in a sense dedicates the cable (or channel, medium) to each active call/connection.
However, a packet switching network (TCP/IP) can emulate the effect of a physical connection by using protocols that acknowledge transmission, thereby establishing a virtual connection. TCP does this.
Here's how connection-oriented networking works. When a packet is sent, the receiver sends back an acknowledgement. If the sender doesn't receive it, it re-sends. When a session of communication is over, the sender and receiver say goodbye to each other, thereby “closing” the connection. In this way, communication is established as if through a physical connection, even though the data units transmitted are actually discrete and go through many routers that don't have any notion of who is connected to whom.
- TCP is a connection oriented protocol.
- UDP is a connectionless protocol.
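The acknowledge-and-resend idea can be sketched as a toy simulation. This is not how TCP is actually implemented (real TCP uses timers, sliding windows, and byte sequence numbers). It only shows the resend-until-acknowledged loop. The function name send_reliably and the lost_attempts parameter are made up for this illustration.

```python
def send_reliably(packets, lost_attempts):
    """Simulate acknowledged delivery over an unreliable channel.

    packets: list of payloads to send, in order.
    lost_attempts: set of (sequence number, attempt number) pairs that the
        simulated network drops. A made-up stand-in for real packet loss.
    Returns (payloads the receiver got, total number of transmissions).
    """
    received = []
    transmissions = 0
    for seq, payload in enumerate(packets):
        attempt = 0
        while True:
            attempt += 1
            transmissions += 1
            if (seq, attempt) in lost_attempts:
                continue             # no acknowledgement arrives, so resend
            received.append(payload)
            break                    # acknowledgement arrived, go to next packet
    return received, transmissions

print(send_reliably(["a", "b", "c"], {(1, 1), (1, 2)}))
# (['a', 'b', 'c'], 5)
```

Packet "b" is dropped twice and sent three times, yet the receiver still ends up with all packets in order.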
IP Datagram
An IP packet consists of a header section and a data section.
An IP packet has no data checksum or any other footer after the data section. Typically the link layer encapsulates IP packets in frames with a CRC footer that detects most errors, and typically the end-to-end TCP layer checksum detects most other errors.
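The Protocol field in the IP header is a number that identifies which protocol the data section carries. Common values:
- ICMP = 1
- TCP = 6
- UDP = 17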
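To make the header section concrete, here's a sketch that decodes the fixed 20-byte part of an IPv4 header with Python's struct module. The field layout follows the standard IPv4 header, which this tutorial does not spell out. The function name parse_ipv4_header and the sample values are made up, and the sample's checksum field is left at 0 rather than computed.

```python
import socket
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": version_ihl >> 4,                 # high 4 bits
        "header_bytes": (version_ihl & 0x0F) * 4,    # low 4 bits count 32-bit words
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,                        # 1 = ICMP, 6 = TCP, 17 = UDP
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }

# A hand-made sample header: IPv4, 20-byte header, TTL 64, carrying TCP.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 6, 0,
                     socket.inet_aton("192.0.2.1"), socket.inet_aton("192.0.2.2"))
print(parse_ipv4_header(sample))
```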
Routing schemes: unicast, anycast, multicast, broadcast
The Internet Protocol addressing system recognizes 3 main types of addressing.
- Unicast addressing uses a one-to-one association between destination address and network endpoint: each destination address uniquely identifies a single receiver endpoint.
- Broadcast or multicast addressing uses a one-to-many association: datagrams are routed from a single sender to multiple endpoints simultaneously in a single transmission. The network automatically replicates datagrams as needed for all network segments (links) that contain an eligible receiver.
- Anycast addressing routes datagrams to a single member of a group of potential receivers that are all identified by the same destination address. This is a one-to-one-of-many association.
Transmission Control Protocol (TCP)
TCP provides a communication service at an intermediate level between an application program and the Internet Protocol (IP). That is, when an application program desires to send a large chunk of data across the Internet using IP, instead of breaking the data into IP-sized pieces and issuing a series of IP requests, the software can issue a single request to TCP and let TCP handle the IP details.
The TCP header starts with the following fields:
- Source port (16 bits). The sender's port number.
- Destination port (16 bits). The receiver's port number.
- Sequence number (32 bits). It has 2 meanings, depending on whether the SYN flag in the segment is on or off.
- If the SYN flag is 0, then this is the accumulated sequence number of the first data byte of this segment for the current session.
- If the SYN flag is 1, then this is the initial sequence number. The sequence number of the actual first data byte and the acknowledged number in the corresponding ACK are then this sequence number plus 1.
- Acknowledgment number (32 bits) – if the ACK flag is set then the value of this field is the next sequence number that the receiver is expecting. This acknowledges receipt of all prior bytes (if any). The first ACK sent by each end acknowledges the other end's initial sequence number itself, but no data.
- Data offset (4 bits) – specifies the size of the TCP header in 32-bit words. The minimum size header is 5 words and the maximum is 15 words thus giving the minimum size of 20 bytes and maximum of 60 bytes, allowing for up to 40 bytes of options in the header. This field gets its name from the fact that it is also the offset from the start of the TCP segment to the actual data.
- Reserved (3 bits) – for future use and should be set to zero
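Here's a sketch that decodes the fields listed above from the first 14 bytes of a TCP header. The positions of the SYN and ACK flag bits come from the standard TCP header layout, which is not listed in this tutorial. The function name parse_tcp_header and the sample values are made up.

```python
import struct

def parse_tcp_header(raw: bytes) -> dict:
    """Decode the first 14 bytes of a TCP header."""
    src_port, dst_port, seq, ack, offset_flags = struct.unpack("!HHIIH", raw[:14])
    data_offset = offset_flags >> 12              # top 4 bits: header size in 32-bit words
    return {
        "source_port": src_port,
        "destination_port": dst_port,
        "sequence": seq,
        "acknowledgment": ack,
        "header_bytes": data_offset * 4,
        "syn": bool(offset_flags & 0x002),        # SYN flag bit (standard layout)
        "ack_flag": bool(offset_flags & 0x010),   # ACK flag bit (standard layout)
    }

# A hand-made sample: a SYN segment from port 49152 to port 80,
# initial sequence number 1000, minimum header of 5 words.
sample = struct.pack("!HHIIH", 49152, 80, 1000, 0, (5 << 12) | 0x002)
print(parse_tcp_header(sample))
```

With a data offset of 5 words the header is 20 bytes, the minimum size mentioned above.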
UDP (User Datagram Protocol)
UDP sends datagrams without prior communications to set up special transmission channels or data paths.
UDP uses a simple transmission model with a minimum of protocol mechanism. It has no handshaking dialogs, and thus exposes any unreliability of the underlying network protocol to the user's program. As this is normally IP over unreliable media, there is no guarantee of delivery, ordering or duplicate protection. UDP provides checksums for data integrity, and port numbers for addressing different functions at the source and destination of the datagram.
UDP is suitable for purposes where error checking and correction is either not necessary or performed in the application, avoiding the overhead of such processing at the network interface level. Time-sensitive applications often use UDP because dropping packets is preferable to waiting for delayed packets, which may not be an option in a real-time system.[2] If error correction facilities are needed at the network interface level, an application may use the Transmission Control Protocol (TCP) or Stream Control Transmission Protocol (SCTP) which are designed for this purpose.
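Here's a minimal sketch of sending one UDP datagram on the local machine. Note there is no connect or accept step: the sender just names the destination socket address on each send. The function name udp_round_trip is made up. Delivery works on the loopback interface in practice, but UDP itself does not guarantee it.

```python
import socket

def udp_round_trip(message: bytes) -> bytes:
    """Send one datagram to a local UDP socket and return what it received."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
        receiver.settimeout(5)                # give up if the datagram never arrives
        sender.sendto(message, receiver.getsockname())   # no handshake first
        data, _sender_addr = receiver.recvfrom(2048)
    return data

print(udp_round_trip(b"hello"))   # b'hello'
```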
Datagram Congestion Control Protocol (DCCP)
Stream Control Transmission Protocol (SCTP)
Address Resolution Protocol (ARP)
Address Resolution Protocol = a protocol that builds a look-up table for mapping IP addresses to MAC addresses.
Each host has an ARP cache. If a host wants to send data to another host in the same segment, it checks whether the MAC address is in the ARP cache. If not, the host sends a broadcast called an ARP request frame. The host with that IP address responds with its MAC address.
Reverse Address Resolution Protocol (RARP) is obsolete, replaced by Bootstrap Protocol (BOOTP) then by Dynamic Host Configuration Protocol (DHCP).
Internet Control Message Protocol (ICMP)
The Internet Control Message Protocol (ICMP) is one of the core protocols of the Internet Protocol Suite. It is chiefly used by the operating systems of networked computers to send error messages indicating, for example, that a requested service is not available or that a host or router could not be reached. ICMP can also be used to relay query messages. It is assigned protocol number 1.
ICMP differs from transport protocols such as TCP and UDP in that it is not typically used to exchange data between systems, nor is it regularly employed by end-user network applications (with the exception of some diagnostic tools like ping and traceroute).
ICMP for Internet Protocol version 4 (IPv4) is also known as ICMPv4. IPv6 has a similar protocol, ICMPv6.
ICMP is often used by routers to send messages back to a host to indicate problems. Here are common scenarios.
- Echo Request and Echo Reply. For testing. Used by ping, traceroute.
- Source Quench. Tells the sender that it's sending too fast.
- Destination Unreachable. For example, the destination is down.
- Time Exceeded. For example, too many hops: the TTL (Time to Live) reached zero. This happens when a routing loop occurs. (A routing loop can happen when the routing table is set manually, or it can happen anyway.)
- Fragmentation Needed
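To show what an ICMP message looks like in bytes, here's a sketch that builds an Echo Request (the message ping sends) and computes its checksum. The type value 8, the header layout, and the checksum method come from the ICMP standard, not from this tutorial. The function names are made up. Actually sending it would need a raw socket and usually administrator rights, so this sketch only builds the bytes.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum, as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"                                # pad to whole 16-bit words
    total = sum(struct.unpack("!{}H".format(len(data) // 2), data))
    while total >> 16:                                 # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """Build an ICMP Echo Request: type 8, code 0, checksum, id, sequence, data."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)     # computed with checksum field = 0
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload

packet = build_echo_request(1, 1, b"ping")
print(packet[0])                    # 8, the Echo Request type
print(internet_checksum(packet))    # 0, which means the checksum verifies
```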
Internet Group Management Protocol
The Internet Group Management Protocol (IGMP) is a communications protocol used by hosts and adjacent routers on IP networks to establish multicast group memberships. IGMP is an integral part of IP multicast.
IGMP can be used for one-to-many networking applications such as online streaming video and gaming, and allows more efficient use of resources when supporting these types of applications.
IGMP is used on IPv4 networks. Multicast management on IPv6 networks is handled by Multicast Listener Discovery (MLD) which uses ICMPv6 messaging in contrast to IGMP's bare IP encapsulation.
Routing
Routing is one of the most important elements of the internet, because it is routing that moves data.
By definition, a router has 2 or more network adapters, because a router is used to forward data between different networks. For home routers, usually one end is connected through a cable modem or DSL modem to the internet, and the other end has Ethernet ports for the home network.
A router does the following:
- Receives data from one of its attached networks.
- Checks the destination address in the IP header. If it's on the network the data came from, the datagram is ignored, because it has already reached its destination network. (Ethernet sends it to all in the same network.)
- If the destination IP address is on a different network, the router checks the routing table to determine where to forward the datagram.
- It dis-assembles and re-assembles the datagram (into a frame for the outgoing link) and sends it to the right adapter.
The most critical part is the routing table. A routing table can be set up manually, called static routing, but it is almost always constructed automatically by “discovery” protocols, called dynamic routing (because manually setting up the routing table is humanly impossible when there are more than a handful of networks). A routing table can still be manually adjusted, however.
Routing Table
Routing table, aka Routing Information Base (RIB), is a data table stored in a router or a computer that lists the routes to particular network destinations, and in some cases, metrics (distances) associated with those routes. The routing table contains information about the topology of the network immediately around it.
The construction of routing tables is the primary goal of routing protocols. Static routes are entries made in a routing table by non-automatic means and which are fixed rather than being the result of some network topology “discovery” procedure.
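A routing table lookup can be sketched as "find every route whose network contains the destination, then pick the most specific one". The table contents and the function name next_hop below are made up for illustration. Real routers use faster data structures and also consider metrics.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop) pairs.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "203.0.113.1"),          # default route
    (ipaddress.ip_network("192.0.2.0/24"), "directly connected"),
    (ipaddress.ip_network("198.51.100.0/24"), "192.0.2.254"),
]

def next_hop(destination: str) -> str:
    """Return the next hop of the most specific route that matches."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    # The default route matches everything, so matches is never empty here.
    _best_net, best_hop = max(matches, key=lambda entry: entry[0].prefixlen)
    return best_hop

print(next_hop("192.0.2.37"))     # directly connected
print(next_hop("198.51.100.9"))   # 192.0.2.254
print(next_hop("8.8.8.8"))        # 203.0.113.1 (falls through to the default route)
```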
How to see the routing table of my computer?
- Linux: Type
ip route
or
route
- Windows:
Routing Protocols
The job of a routing protocol is to fill the routing table.
There are 2 major types of routing protocol: link-state and distance-vector.
A link-state routing protocol is one of the two main classes of routing protocols used in packet switching networks for computer communications (the other is the distance-vector routing protocol). Examples of link-state routing protocols include open shortest path first (OSPF) and intermediate system to intermediate system (IS-IS).
The link-state protocol is performed by every router in the network. The basic concept of link-state routing is that every node constructs a map of the connectivity to the network, in the form of a graph, showing which nodes are connected to which other nodes. Each node then independently calculates the next best logical path from it to every possible destination in the network. The collection of best paths will then form the node's routing table.
This contrasts with distance-vector routing protocols, which work by having each node share its routing table with its neighbors. In a link-state protocol the only information passed between nodes is connectivity related.
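The "each node calculates the best path to every destination" step of a link-state protocol is commonly done with Dijkstra's shortest path algorithm. Here's a sketch of that calculation on a made-up 4-router map. The graph, the link costs, and the function name shortest_paths are assumptions for illustration, not taken from any particular protocol.

```python
import heapq

def shortest_paths(graph, source):
    """Dijkstra's algorithm. Returns {node: (total cost, first hop from source)}."""
    best = {source: (0, None)}
    queue = [(0, source, None)]
    while queue:
        cost, node, first_hop = heapq.heappop(queue)
        if cost > best[node][0]:
            continue                         # stale queue entry, a better path exists
        for neighbor, link_cost in graph[node].items():
            new_cost = cost + link_cost
            hop = neighbor if node == source else first_hop
            if neighbor not in best or new_cost < best[neighbor][0]:
                best[neighbor] = (new_cost, hop)
                heapq.heappush(queue, (new_cost, neighbor, hop))
    return best

# Hypothetical connectivity map: router -> {neighbor: link cost}.
GRAPH = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1},
    "C": {"A": 5, "B": 1, "D": 2},
    "D": {"C": 2},
}
print(shortest_paths(GRAPH, "A"))
# {'A': (0, None), 'B': (1, 'B'), 'C': (2, 'B'), 'D': (4, 'B')}
```

The "first hop" column is what would go into router A's routing table: to reach C or D, forward to B.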
Routing Information Protocol (RIP): a distance-vector routing protocol.
A RIP router broadcasts an update message every 30 seconds. It can also request an update.
Open Shortest Path First (OSPF): a link-state routing protocol.
hop count
A core router is a router designed to operate in the Internet backbone, or core. To fulfill this role, a router must be able to support multiple telecommunications interfaces of the highest speed in use in the core Internet and must be able to forward IP packets at full speed on all of them. It must also support the routing protocols being used in the core. A core router is distinct from an edge router: edge routers sit at the edge of a backbone network and connect to core routers.
Dynamic Host Configuration Protocol (DHCP)
Dynamic Host Configuration Protocol
- multiplexing = multiple analog message signals or digital data streams are combined into one signal over a shared medium.
- duplex = a point-to-point system composed of two connected parties or devices that can communicate with one another in both directions.
Computer networks use a tunneling protocol when one network protocol (the delivery protocol) encapsulates a different payload protocol. By using tunneling one can (for example) carry a payload over an incompatible delivery-network, or provide a secure path through an untrusted network.
Simple Service Discovery Protocol (SSDP)
Simple Network Management Protocol
Network segment. A term for a portion of network. For example, An Ethernet hub is a device for connecting multiple Ethernet devices together and making them act as a single network segment.
- An Ethernet hub, active hub, network hub, repeater hub, multiport repeater or hub is a device for connecting multiple Ethernet devices together and making them act as a single network segment.
- It has multiple input/output (I/O) ports, in which a signal introduced at the input of any port appears at the output of every port except the original incoming.
- A hub works at the physical layer (layer 1) of the OSI model.
- The device is a form of multiport repeater. Repeater hubs also participate in collision detection, forwarding a jam signal to all ports if it detects a collision.
- A network hub is an unsophisticated device in comparison with, for example, a switch. A hub does not examine or manage any of the traffic that comes through it: any packet entering any port is rebroadcast on all other ports.
- The availability of low-priced network switches has largely rendered hubs obsolete
Network switch: a telecommunication device which receives a message from any device connected to it and then transmits the message only to the device for which the message was meant. This makes the switch a more intelligent device than a hub (which receives a message and then transmits it to all the other devices on its network).
In computer networking, promiscuous mode or promisc mode is a mode for a wired network interface controller (NIC) or wireless network interface controller (WNIC) that causes the controller to pass all traffic it receives to the central processing unit (CPU) rather than passing only the frames that the controller is intended to receive. This mode is normally used for packet sniffing that takes place on a router or on a computer connected to a hub (instead of a switch) or one being part of a WLAN. The mode is also required for bridged networking for hardware virtualization.
In IEEE 802 networks such as Ethernet, token ring, and IEEE 802.11, and in FDDI, each frame includes a destination Media Access Control address (MAC address). In non-promiscuous mode, when a NIC receives a frame, it normally drops it unless the frame is addressed to that NIC's MAC address or is a broadcast or multicast frame. In promiscuous mode, however, the card allows all frames through, thus allowing the computer to read frames intended for other machines or network devices.
wireless
common problems
See: How to Diagnose Computer Networking Problems
Firewall
A firewall filters traffic. Firewalls can be classified by their power:
- Basic firewall (aka packet filter). Simply looks at each packet and decides whether to drop it based on any of {IP address, port number, protocol, TCP/UDP traffic} in the packet. When the packet fits a filter rule, the firewall may simply drop the packet or send an error response.
- Stateful firewall. Understands up to the transport layer. This is done by accumulating (caching) packets. Can detect invalid packets, session hijacking, and some DoS attacks.
- Application firewall. A more advanced firewall that understands the application layer.
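A basic packet filter can be sketched as a list of rules checked in order. The rule format, the rule set, the first-match-wins policy, and the function name filter_packet are all made up for illustration. They are not the syntax of iptables or any real firewall.

```python
import ipaddress

# Hypothetical rule set. Each rule: (action, source network, protocol, destination port).
# None means "match anything". The first matching rule wins.
RULES = [
    ("drop",  ipaddress.ip_network("203.0.113.0/24"), None,  None),
    ("allow", ipaddress.ip_network("0.0.0.0/0"),      "tcp", 443),
    ("allow", ipaddress.ip_network("0.0.0.0/0"),      "tcp", 22),
]
DEFAULT_ACTION = "drop"     # anything not explicitly allowed is dropped

def filter_packet(source_ip: str, protocol: str, dest_port: int) -> str:
    """Return 'allow' or 'drop' for a packet, looking only at its header values."""
    src = ipaddress.ip_address(source_ip)
    for action, network, rule_protocol, rule_port in RULES:
        if src not in network:
            continue
        if rule_protocol is not None and rule_protocol != protocol:
            continue
        if rule_port is not None and rule_port != dest_port:
            continue
        return action
    return DEFAULT_ACTION

print(filter_packet("198.51.100.5", "tcp", 443))   # allow
print(filter_packet("203.0.113.9", "tcp", 443))    # drop (blocked source network)
print(filter_packet("198.51.100.5", "udp", 53))    # drop (no rule matches)
```

Each packet is judged on its own, with no memory of earlier packets. That is the difference from a stateful firewall.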
Placement of firewall:
- Normal: between the local network and the outside.
- Put public services outside the firewall.
- Two firewalls, separating the outside world, public services, and the local network. The middle zone is called a DMZ. (Two firewalls are not strictly necessary for this. A single one can filter/direct traffic among 3 zones (3 Network Interfaces).)
A firewall can be software based or hardware. The function of a firewall is often part of other services or devices. Most Operating Systems have a software-based firewall. Some routers can also do some firewall functions, or be a powerful firewall. A firewall can also be a proxy server.
On Linux, the firewall framework is netfilter (iptables). For an intro, see: Linux: What's Netfilter, iptables, Their Differences?
DNS and host file
WAN
Wide area network
Integrated Services Digital Network ISDN
High-Level Data Link Control HDLC
Keynote talk by Radia Perlman at Linux.conf.au 2013 http://mirror.linux.org.au/linux.conf.au/2013/mp4/Keynote_Radia_Perlman.mp4