.. _ip:

The Internet Protocol
#####################

.. warning::

   This chapter is work-in-progress and may contain vastly incomplete information.

The Internet Protocol (IP for short) is what most of the Internet currently runs on. It was specified as Internet Protocol version 4 (IPv4 for short) in :rfc:`791` in 1981 and revised as Internet Protocol version 6 (IPv6 for short) in :rfc:`2460` in 1998.

This section only covers the very basics. If you want more detail, for example about more advanced routing, I can recommend the great book [TIAIP]_. I will only put down a few bullet points about IP addresses.

* IPv4 addresses consist of 32 bits and look like this, for example: ``127.0.0.1`` (a loopback address), ``10.190.239.10`` (a private address from one of the three private address ranges defined in :rfc:`1918`), ``141.1.1.1`` (a globally unique address).

* IPv6 addresses consist of 128 bits and look like this, for example: ``::1`` (the loopback address), ``fe80::1`` (a link-local address), ``2001:db8::1`` (formatted like a globally unique address; the ``2001:db8::/32`` range is reserved for documentation).

* IP addresses are grouped into *subnets*. A subnet or *prefix* is a consecutive range of IP addresses whose size is a power of two and whose first address is aligned to a multiple of that size. The *prefix length* is the number of leading bits which all addresses in the subnet share. For example, the address range ``2001:db8::0`` through ``2001:db8::f`` (16 addresses in total) can be written as ``2001:db8::/124``: the first 124 bits are fixed, and the last four bits are variable and thus distinguish the addresses within the subnet. This works for both IPv4 and IPv6.

* A single address is equivalent to a subnet with a prefix length equal to the address length. For example, ``127.0.0.1/32`` is equivalent to ``127.0.0.1`` and ``fe80::1/128`` is equivalent to ``fe80::1``.

* The Internet consists of many connected nodes, each of which has at least one IP address (IPv4, IPv6, or both).

* To reach node :math:`A` from node :math:`B`, data usually travels over many other nodes (*hops*). Routers are nodes which decide where to send a packet based on the destination IP address (possibly among other criteria).
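
The address and prefix notation from the bullet points above can be tried out with a short Python sketch. It relies only on the standard-library ``ipaddress`` module; the variable names (``v4``, ``v6``, ``subnet``, ``single``) are illustrative choices made here and are not taken from any specification.

.. code-block:: python

   import ipaddress

   # Address lengths: 32 bits for IPv4, 128 bits for IPv6.
   v4 = ipaddress.ip_address("10.190.239.10")
   v6 = ipaddress.ip_address("fe80::1")
   print(v4.max_prefixlen, v6.max_prefixlen)  # 32 128

   # A /124 prefix fixes the first 124 bits and leaves 4 bits variable,
   # so the subnet contains 2**4 = 16 addresses.
   subnet = ipaddress.ip_network("2001:db8::/124")
   print(subnet.num_addresses)    # 16
   print(subnet[0], subnet[-1])   # 2001:db8:: 2001:db8::f

   # Membership: only addresses sharing the first 124 bits belong to it.
   print(ipaddress.ip_address("2001:db8::a") in subnet)   # True
   print(ipaddress.ip_address("2001:db8::10") in subnet)  # False

   # A single address is a subnet whose prefix length equals the address length.
   single = ipaddress.ip_network("127.0.0.1/32")
   print(single.num_addresses)    # 1

The same calls work for IPv4 prefixes such as ``10.0.0.0/8``, since the module picks the address family from the string it is given.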

..
   If you are unfamiliar with the topic of IP, you might expect that everything is running on IPv6 by now. It was specified in 1998, and that is, at the time of writing, 19 years ago. But here is your first lesson: things move slowly in system administration. A fundamental principle some people adhere to is "never change a running system". The more low-level a technology is, the slower it advances, I think. Now this wouldn’t be a fundamentally bad thing if there were nothing wrong with IPv4. However, there is a lot wrong with IPv4.


   Addressing
   ==========

   A core principle of IP is that each participant in the Internet has an address. The address is used to send them data, in the form of IP packets. IPv4 addresses consist of 32 bits, so there are in theory 4294967296 addresses available. This is fewer than there are currently humans on this planet. So if every human had a single device connected to the Internet (a smartphone for example, and we are getting close to that scenario), it would not be possible for each of those devices to have its own address. And it isn’t just you and me with a smartphone and maybe a laptop or desktop PC; there are whole datacenters of big companies with hundreds of thousands of (virtual) systems which all need to be addressable. With the rise of the Internet of Things, even your light bulb is connected to the Internet. There is no way 4 billion addresses can be enough *even at this very moment* (and I haven’t even mentioned yet that a lot of those 4 billion addresses are concentrated at a few huge organisations for legacy reasons; read up on classful addressing and e.g. the U.S. Department of Defense if you want to get into that history).

   This is a good reason for IPv6. IPv6 has addresses which are 128 bits wide. This means there are

   .. math::

     2^{128} = 340\,282\,366\,920\,938\,463\,463\,374\,607\,431\,768\,211\,456

   addresses. This is ... a ... huge number. Before we compare that to a few numbers, we need to mention a few things about how those addresses are handed out to users. In IPv6, the idea is that every end-user (like you and me) gets *at least* :math:`2^{64}` addresses. Yes, you heard right, each of us is supposed to get over :math:`18446744073` *billion* addresses. Some critics suggest that this will get us into the same mess we got into with IPv4 (running out of addresses very soon), but that’s not the case if you do the math yourself. Per square meter of Earth’s surface (including oceans!), there are more than 36000 of these :math:`2^{64}` address blocks available. This is an amount of raw addresses you cannot even imagine.

   An IPv4 address is typically written down as four dotted decimal numbers, each from 0 to 255. For example ``10.190.239.10`` is the IPv4 address my machine has in my local network.

   IPv6 addresses are much longer. The format is explained in detail in :rfc:`4291`. An example is ``2001:db8:abcd:ef01:2345:6789:abcd:ef01``, which is an address I chose from the range of :rfc:`3849` addresses, which are reserved for documentation.


   Forwarding
   ==========

   .. epigraph::

      The router knows the way.


   Now that we have established what addresses look like, we will briefly discuss how computers know where to send data when tasked with sending a packet to a specific address. First of all, the Internet isn’t a direct link between your system and the server you are trying to reach.


..
   This is already where critics of the IPv6 protocol come around with statements like "see, this handing out of huge address blocks is what gave us the scarcity of IPv4 right now". However, bear with me, I’ll show you that these critics have no point whatsoever. But first, let’s get even more crazy. In :rfc:`7368` (IPv6 Home Networking Architecture Principles) it is suggested that a user may get multiple such blocks of :math:`2^{64}` addresses to be able to create subnetworks of their own locally (we will go into a bit of detail with this *subnetting* later). So let’s assume each end-user gets :math:`2^{72}` addresses for themselves (an example from that RFC, and also what I’m getting at a large German ISP right now). That means that 56 of the 128 bits are outside the control of the end-user (called a *prefix*, see also later), i.e. we can serve :math:`2^{56}` end users. A few bits also get lost to administrative overhead, so let’s say we end up with :math:`2^{50}` available *prefixes*.

   Now, :math:`2^{50}` and :math:`2^{32}` (you recall from IPv4) don’t sound so different. Let’s compare :math:`2^{50}` to a few dimensions.

   * The surface of Earth. `According to Wikipedia <https://en.wikipedia.org/wiki/Earth#Surface>`_, the surface of Earth is 510 million square kilometers in size.

     .. math::

        \frac{2^{50}\,\text{prefixes}}{510\cdot 10^6\,\text{km}^2} = \frac{2^{50}\,\text{prefixes}}{5.1\cdot 10^{14}\,\text{m}^2} \approx 2.2\,\frac{\text{prefixes}}{\text{m}^2}

     Note the change of units there. Per square *meter* there are 2.2 prefixes of :math:`2^{72}` addresses each available.

.. [TIAIP] The Internet and its Protocols