Building a web server from scratch in Python? Join me as I journey from simple http.server to raw sockets, threading, selectors, and asyncio, exploring different concurrency models and benchmarking performance along the way.
Have you ever wondered what’s happening under the hood when you access a website? I certainly have! That curiosity led me down a rabbit hole – building a web server from scratch using nothing but pure Python. Now, before you get too excited, let me be clear: this isn’t about creating the next Nginx. My goal is much simpler, and hopefully more insightful: to understand the fundamental concepts of networking and concurrency that power the web.
Think of this as an educational adventure. We’re going to ditch the fancy frameworks and get our hands dirty with the raw building blocks. We’ll be exploring different ways to construct a basic web server, starting with Python’s built-in http.server and then diving deeper into sockets, threading, selectors, and finally, the asynchronous magic of asyncio.
Along the way, we’ll benchmark each server using Apache Benchmark (ab) on an AWS t2.micro EC2 instance. The EC2 machine is entirely optional; you can run these benchmarks on your local machine as well. I used it to give us a consistent, reproducible way to compare performance. So, join me as we embark on this journey. It’s all about learning, experimenting, and maybe, just maybe, gaining a newfound appreciation for what goes into serving up those cat 🐈 videos you love.
Environment Details
All code examples provided with this post were tested with Python 3.9.20.
To start our journey, I wanted to establish a really simple baseline. Python’s http.server module is perfect for this. It’s like the ‘Hello, World!’ of web servers – incredibly easy to set up. Let’s take a look at the code.
httpd/server.py
import http.server
import time
from http import HTTPStatus

PORT = 8000


class MyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Simulate a database call or some processing
        time.sleep(0.1)
        self.send_response(HTTPStatus.OK)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        response_html = """
        <html>
            <head>
                <title>My Basic Server</title>
            </head>
            <body>
                <h1>Hello from my basic server</h1>
            </body>
        </html>
        """
        self.wfile.write(response_html.encode())


if __name__ == "__main__":
    with http.server.HTTPServer(("", PORT), MyHandler) as httpd:
        print(f"Serving on port {PORT}")
        httpd.serve_forever()
As you can see, it’s remarkably short.
We import http.server, define a handler class MyHandler that inherits from BaseHTTPRequestHandler, and override the do_GET method.
This method is called whenever the server receives a GET request.
Inside, I’ve added time.sleep(0.1) to mimic a slow database call or some processing – because real-world servers aren’t instant.
Then, we construct a simple HTML response and send it back.
To run this, just type python server.py in your terminal. You should see “Serving on port 8000”. To test it, open your browser or use curl http://localhost:8000. You should see the “Hello from my basic server” message.
Now, let’s see how it performs under a bit of load. I used Apache Benchmark with this command:
ab -n 1000 -c 10 http://localhost:8000/
This sends 1000 requests with a concurrency of 10. Here are the results I got.
Server Software:        BaseHTTP/0.6
Server Hostname:        3.90.155.197
Server Port:            8000

Document Path:          /
Document Length:        249 bytes

Concurrency Level:      10
Time taken for tests:   114.367 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      366000 bytes
HTML transferred:       249000 bytes
Requests per second:    8.74 [#/sec] (mean)
Time per request:       1143.666 [ms] (mean)
Time per request:       114.367 [ms] (mean, across all concurrent requests)
Transfer rate:          3.13 [Kbytes/sec] received
Looking at these numbers, especially the Requests per second: 8.74, it’s clear that while http.server is incredibly easy to use, it’s not exactly a performance beast. This is because it’s a very basic, single-threaded server. When time.sleep(0.1) is running for one request, the entire server is essentially waiting. This is what we call blocking I/O. One operation blocks everything else.
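Incidentally, the standard library already ships a drop-in threaded variant, http.server.ThreadingHTTPServer, which handles each request on its own thread. We’ll build that idea ourselves from raw sockets in Attempt 3, but for reference, switching the baseline over is a one-line change (using the same PORT and MyHandler defined above):

```python
# Drop-in threaded variant from the standard library (not benchmarked in this post).
# Each request is handled on its own thread, previewing the threading approach.
with http.server.ThreadingHTTPServer(("", PORT), MyHandler) as httpd:
    print(f"Serving on port {PORT}")
    httpd.serve_forever()
```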
This simple server gives us a starting point. It works, but it’s clearly not designed for handling many requests concurrently. In the next step, we’ll dive into raw sockets to understand things at a lower level and see if we can improve performance.
Attempt 2 - Web Server with Raw Sockets
Okay, http.server was easy, but it felt a bit like magic, right? To really understand what’s going on, I decided to ditch the convenience and build a server using raw sockets. This means we’re going to interact directly with the network, handling connections and HTTP protocol details ourselves. Let’s look at the code:
sockets/server.py
import socket
import time


def handle_request(conn, addr):
    try:
        request_data = conn.recv(1024).decode()
        if request_data:
            # Simulate a database call or some processing
            time.sleep(0.1)  # 100 milliseconds delay
            response_html = """
            <html>
                <head>
                    <title>My Basic Server</title>
                </head>
                <body>
                    <h1>Hello from my basic server</h1>
                </body>
            </html>
            """
            response = "HTTP/1.1 200 OK\r\n"
            response += "Content-Type: text/html\r\n"
            response += f"Content-Length: {len(response_html)}\r\n"
            response += "\r\n"
            response += response_html
            conn.sendall(response.encode())
        else:
            print(f"Client {addr} sent no data")
    except Exception as e:
        print(f"Error handling client {addr}: {e}")
    finally:
        conn.close()


if __name__ == "__main__":
    HOST = ""  # Listen on all available interfaces
    PORT = 8000
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((HOST, PORT))
        s.listen()
        print(f"Listening on port {PORT}")
        while True:
            conn, addr = s.accept()
            with conn:
                handle_request(conn, addr)
This code is a bit more involved, but still pretty straightforward.
The socket.socket() function is used to create a new socket object in Python. It takes two main arguments: socket.socket(family, type). I used socket.socket(socket.AF_INET, socket.SOCK_STREAM) where
socket.AF_INET specifies the address family, meaning the socket will use IPv4. If you wanted an IPv6 socket, you would use socket.AF_INET6
socket.SOCK_STREAM specifies the socket type, meaning it will be a TCP (stream-based) socket. If you wanted a UDP socket, you would use socket.SOCK_DGRAM
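Side by side, the common combinations look like this:

```python
import socket

tcp4 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # IPv4 + TCP (what we use here)
tcp6 = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)  # IPv6 + TCP
udp4 = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)    # IPv4 + UDP
```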
Then, s.bind((HOST, PORT)) binds the socket to listen on all available interfaces (HOST = "") and port 8000.
s.listen() puts the socket into listening mode, ready to accept incoming connections.
The while True: loop is the heart of our server. s.accept() waits (blocks) for a new connection and, when one arrives, it returns:
a new socket object (conn) for sending/receiving data with that client
addr contains the client’s (IP address, port) tuple.
We then call handle_request(conn, addr) to process this connection.
Inside handle_request
conn.recv(1024) attempts to receive up to 1024 bytes of data from the client – this is where we get the HTTP request. If the client sends less than 1024 bytes, it reads whatever is available. If the request is longer than 1024 bytes, only the first part is read (you may need a loop for large requests).
.decode() converts the received raw bytes into a string using UTF-8 encoding; data sent over sockets is binary, so it needs decoding. We then check whether any data was actually received.
Just like before, time.sleep(0.1) simulates processing work.
Then comes the part where we manually construct the HTTP response. We need to include the status line (HTTP/1.1 200 OK), headers like Content-Type and Content-Length, and the HTML body, all separated by \r\n (CRLF) as the HTTP protocol requires. Finally, conn.sendall(response.encode()) sends the encoded (binary) response back to the client, and conn.close() closes the connection.
Optimal Buffer Size (recv(N))
The optimal size for recv() depends on several factors, such as the expected request size, network performance, and memory efficiency. Here’s how you can determine the best size:
| Buffer Size (recv(N)) | Use Case |
| --- | --- |
| 1024 (1 KB) | Works well for small HTTP requests (GET requests, simple headers). |
| 2048 (2 KB) | Good for typical HTTP requests with longer headers. |
| 4096 (4 KB) | Often used as a standard buffer size for web servers. |
| 8192 (8 KB) | Suitable for handling larger requests efficiently. |
| 16384+ (16 KB or more) | Used for high-performance servers or large payloads (e.g., file uploads, API requests). |
Choosing the Optimal Size
For a simple web server: 4096 (4 KB) or 8192 (8 KB) is a good choice because:
Most HTTP request headers are under 8 KB.
This balances efficiency and memory usage.
For handling large requests (e.g., POST with form data or JSON):
Use 8192 (8 KB) or more.
Implement a loop to dynamically read the entire request, as the sketch below shows.
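Here is a minimal sketch of such a loop, assuming a plain HTTP/1.1 request whose headers end with \r\n\r\n. The function name, buffer size, and size cap are illustrative choices, not part of the servers benchmarked in this post:

```python
def recv_http_head(conn, bufsize=4096, max_size=65536):
    """Read from conn until the end of the HTTP headers is seen.

    A sketch only: it ignores any request body and caps the total size.
    """
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(bufsize)
        if not chunk:  # Client closed the connection
            break
        data += chunk
        if len(data) > max_size:  # Guard against oversized requests
            break
    return data.decode(errors="replace")
```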
Let’s benchmark it with the same ab command, and here are the results:
Concurrency Level:      10
Time taken for tests:   100.414 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      314000 bytes
HTML transferred:       249000 bytes
Requests per second:    9.96 [#/sec] (mean)
Time per request:       1004.143 [ms] (mean)
Time per request:       100.414 [ms] (mean, across all concurrent requests)
Transfer rate:          3.05 [Kbytes/sec] received
Looking at Requests per second: 9.96, the performance is actually slightly better than http.server (8.74 req/sec), but still in the same ballpark. It’s not a significant improvement. Why? Because we are still using blocking sockets and a single process. Just like before, time.sleep(0.1) in handle_request blocks the entire server from handling other requests while it’s waiting. We are still processing requests sequentially, one after another.
Building with raw sockets gives us more control and a deeper understanding, but in terms of concurrency and performance, this version is not fundamentally different from http.server. In the next step, we’ll introduce threads to handle multiple requests concurrently and hopefully see a real jump in performance.
Attempt 3 - Threading to the Rescue
The single-threaded nature of our previous servers is clearly the bottleneck. To handle multiple requests concurrently, the classic solution is threading. Let’s see how threading can boost our server’s performance. Here’s the code:
threading/server.py
import socket
import threading
import time


def handle_request(conn, addr):
    try:
        request_data = conn.recv(1024).decode()
        if request_data:
            # Simulate a database call or some processing
            time.sleep(0.1)  # 100 milliseconds delay
            response_html = """
            <html>
                <head>
                    <title>My Basic Server</title>
                </head>
                <body>
                    <h1>Hello from my basic server</h1>
                </body>
            </html>
            """
            response = "HTTP/1.1 200 OK\r\n"
            response += "Content-Type: text/html\r\n"
            response += f"Content-Length: {len(response_html)}\r\n"
            response += "\r\n"
            response += response_html
            conn.sendall(response.encode())
        else:
            print(f"Client {addr} sent no data")
    except Exception as e:
        print(f"Error handling client {addr}: {e}")
    finally:
        conn.close()


def threaded_server():
    HOST = ""
    PORT = 8000
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((HOST, PORT))
        s.listen()
        print(f"Listening on port {PORT}")
        while True:
            conn, addr = s.accept()
            thread = threading.Thread(target=handle_request, args=(conn, addr))
            thread.start()


if __name__ == "__main__":
    threaded_server()
The core handle_request function remains the same as in sockets/server.py. The key change is in the threaded_server() function. Inside the while True: loop, after accepting a connection with conn, addr = s.accept(), instead of directly calling handle_request, we now create a new thread: thread = threading.Thread(target=handle_request, args=(conn, addr)).
We pass the handle_request function as the target for the thread, and the connection object “conn” and address “addr” as arguments.
Then, thread.start() starts the new thread, which will execute handle_request concurrently.
This means that when a new connection comes in, the main thread quickly accepts it and offloads the actual request handling to a separate thread. The main thread then immediately goes back to listening for new connections. This allows us to handle multiple requests seemingly at the same time.
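One caveat worth noting: this design spawns an unbounded number of threads, one per connection, which can exhaust memory under heavy load. A common refinement (a sketch under that assumption, not the version benchmarked below) is to cap the worker count with a thread pool:

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def pooled_server(host="", port=8000, max_workers=50):
    """Same idea as threaded_server, but with a bounded pool of worker threads.

    handle_request is the function defined above; max_workers is an arbitrary cap.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind((host, port))
            s.listen()
            while True:
                conn, addr = s.accept()
                pool.submit(handle_request, conn, addr)  # Queued if all workers are busy
```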
Now, let’s benchmark it, and here are the results:
Concurrency Level:      10
Time taken for tests:   11.085 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      314000 bytes
HTML transferred:       249000 bytes
Requests per second:    90.21 [#/sec] (mean)
Time per request:       110.852 [ms] (mean)
Time per request:       11.085 [ms] (mean, across all concurrent requests)
Transfer rate:          27.66 [Kbytes/sec] received
Wow! Look at the Requests per second: 90.21. That’s a huge jump compared to our previous servers (around 9 req/sec)! Threading has made a massive difference. The time per request has also dropped significantly. This is because now, while one thread is waiting for time.sleep(0.1) to finish, other threads can continue processing other requests concurrently. We are no longer blocking the entire server on a single request.
Threading is a simple way to achieve concurrency in Python and is very effective for I/O-bound tasks like web servers, where the server spends a lot of time waiting for network operations or, in our case, our simulated database call. However, it’s important to remember that Python’s Global Interpreter Lock (GIL) can limit the effectiveness of threads for CPU-bound tasks. Also, there’s overhead associated with creating and managing threads.
Despite these limitations, for our simple I/O-bound web server, threading provides a dramatic performance improvement. In the next sections, we’ll explore asynchronous I/O using selectors and asyncio to see if we can achieve even better concurrency and efficiency.
Attempt 4 - Selectors with Blocking Time Simulation
Threading significantly improved concurrency, but there’s another approach: asynchronous I/O. Instead of threads, asynchronous I/O allows a single thread to handle multiple connections by using non-blocking sockets and event notifications. Let’s explore this with Python’s selectors module. We’ll start with server_blocking.py, which introduces selectors but still uses a blocking time.sleep to simulate work – this is intentional to highlight the structure of a selector-based server, even with a blocking operation.
Here’s the code:
selectors/server_blocking.py
import socket
import selectors
import time

selector = selectors.DefaultSelector()


def send_response(conn, addr, response):
    """Send the response when the socket is ready for writing."""
    try:
        conn.sendall(response.encode())
    except Exception as e:
        print(f"Error sending response to {addr}: {e}")
    finally:
        selector.unregister(conn)
        conn.close()


def handle_request(conn, addr):
    try:
        request_data = conn.recv(1024).decode()
        if request_data:
            response_html = """
            <html>
                <head>
                    <title>My Basic Server</title>
                </head>
                <body>
                    <h1>Hello from my basic server</h1>
                </body>
            </html>
            """
            response = "HTTP/1.1 200 OK\r\n"
            response += "Content-Type: text/html\r\n"
            response += f"Content-Length: {len(response_html)}\r\n"
            response += "\r\n"
            response += response_html
            time.sleep(0.1)  # <-- Ideally, replace this with a non-blocking timer
            selector.modify(
                conn,
                selectors.EVENT_WRITE,
                lambda conn: send_response(conn, addr, response),
            )
        else:
            print(f"Client {addr} sent no data")
            selector.unregister(conn)
            conn.close()
    except Exception as e:
        print(f"Error handling client {addr}: {e}")
        selector.unregister(conn)
        conn.close()


def accept_connection(sock):
    conn, addr = sock.accept()
    conn.setblocking(False)  # Set the connection to non-blocking
    selector.register(
        conn, selectors.EVENT_READ, lambda conn: handle_request(conn, addr)
    )


def asynchronous_server():
    HOST = ""
    PORT = 8000
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((HOST, PORT))
        sock.listen()
        sock.setblocking(False)  # Set the main socket to non-blocking
        selector.register(sock, selectors.EVENT_READ, accept_connection)
        print(f"Listening on port {PORT}")
        while True:
            events = selector.select()  # This function returns all the events
            for key, _ in events:
                callback = key.data  # This is the function we registered earlier
                callback(key.fileobj)  # Execute callback with the socket


if __name__ == "__main__":
    asynchronous_server()
This code introduces several new concepts. Let’s unpack them before moving forward.
Unpack Selectors — High-level I/O multiplexing
What Are Selectors and Why Do We Need Them?
Selectors are a way to efficiently manage multiple sockets at the same time without blocking the program. Instead of waiting for one socket to send or receive data before moving to the next, a selector watches multiple sockets at once and tells the program when each socket is ready. This makes it possible to handle thousands of connections in a single thread, saving system resources and improving performance.
We need selectors because traditional blocking sockets make the server wait for each client one at a time, which is slow. Instead of creating a separate thread for each connection (which is expensive), selectors allow us to handle all connections efficiently in an event-driven manner.
How Are Selectors Different from Sockets?
A socket is just an endpoint for sending and receiving data over a network, like a phone line for communication. Normally, a server listens for connections and then handles each socket one at a time (blocking) or spawns a thread for each socket (multi-threading).
Imagine a basic blocking server that handles one client at a time.
conn, addr = sock.accept()  # Blocks until a connection is received
data = conn.recv(1024)      # Blocks until data is received
conn.sendall(response)      # Blocks until data is sent
conn.close()
Each step blocks the execution, meaning the server can’t handle other clients until the current one is fully processed. This becomes a huge bottleneck!
A selector, on the other hand, is a tool that monitors multiple sockets at once. Instead of blocking or creating threads, it checks all registered sockets and only acts on the ones that are ready. Selectors provide a non-blocking, event-driven approach. Instead of waiting for each client, the server registers multiple sockets with a selector and processes them only when they are ready. This makes it much more efficient, especially when dealing with a large number of clients.
Think of it like this:
Basic sockets: You call each person on the phone one by one.
Sockets + threads: You hire an assistant for each phone call.
Selectors: You put all calls on hold and switch between them only when they need attention.
How Do Selectors Compare with Sockets + Threads?
Using sockets with threads, the server creates a new thread for each client connection. This works well for a small number of clients, but as the number grows, CPU and memory usage skyrocket due to context switching and thread management. If thousands of clients connect, the system slows down or crashes because threads take too much memory.
Selectors solve this by handling all connections in a single thread. Instead of creating a new thread per client, it waits for any socket to be ready and processes it immediately. This allows a single-threaded server to handle tens of thousands of connections efficiently, using far less memory and CPU.
For Comparison
| Feature | Sockets (Blocking) | Sockets + Threads | Selectors |
| --- | --- | --- | --- |
| Concurrency | Low (one at a time) | Medium (one thread per client) | High (handles many clients in one thread) |
| CPU Usage | Low (but slow) | High (many threads) | Low (single-threaded, event-driven) |
| Memory Usage | Low | High (each thread takes memory) | Very Low |
| Scalability | Poor | Medium (limited by threads) | Excellent (handles thousands of clients) |
| Use Case | Small servers | Moderate workload | High-performance servers (e.g., Nginx, chat apps) |
After unpacking Selectors and their importance, let’s move back to our code again.
First, we import selectors and create a selector object using selectors.DefaultSelector().
What is selectors.DefaultSelector()? It provides a high-level abstraction for I/O multiplexing, meaning it allows monitoring multiple sockets (or file descriptors) for events like:
Read readiness (EVENT_READ) → Data is available to read
Write readiness (EVENT_WRITE) → The socket is ready to send data
selectors.DefaultSelector() automatically picks the best available system-dependent selector mechanism. This ensures optimal performance depending on the operating system:
| OS | Selector Used |
| --- | --- |
| Windows | SelectSelector (based on select()) |
| Linux | EpollSelector (based on epoll()) |
| macOS | KqueueSelector (based on kqueue()) |
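You can check which implementation DefaultSelector picked on your machine:

```python
import selectors

sel = selectors.DefaultSelector()
print(type(sel).__name__)  # e.g. EpollSelector on Linux, KqueueSelector on macOS
```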
In asynchronous_server(), we create a socket, bind, and listen, just like before. Crucially, we set both the listening socket (sock.setblocking(False)) and the connection socket (conn.setblocking(False) in accept_connection()) to non-blocking mode. This means that operations like sock.accept() and conn.recv() will return immediately, even if there’s no data or connection ready.
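To see what "return immediately" means in practice, here is a tiny standalone demonstration (port 8001 is an arbitrary choice for the demo):

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("", 8001))
s.listen()
s.setblocking(False)
try:
    s.accept()  # No client is connecting, so there is nothing to accept...
except BlockingIOError:
    print("accept() would block; nothing is ready yet")  # ...and we get an error instead of waiting
s.close()
```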
We register the listening socket with the selector: selector.register(sock, selectors.EVENT_READ, accept_connection). This tells the selector to monitor “sock” for read events (new connections) and call the accept_connection function when a connection is ready. Similarly, in accept_connection, we register each new connection socket (conn) with the selector to monitor for EVENT_READ and call handle_request when data is ready to be read from that connection.
Why Do We Register Twice for selectors.EVENT_READ?
The second register call, made inside accept_connection, registers the newly accepted client socket (conn).
Again, the event type is selectors.EVENT_READ, meaning the selector will monitor when the client sends data (HTTP request).
When data arrives, the handle_request function is called.
Why Is This Necessary? Each socket has a different role:
The server socket (sock) listens for new connections.
The client socket (conn) listens for incoming data from the client.
If we only registered the server socket, we wouldn’t be able to read incoming HTTP requests from clients. Likewise, if we didn’t register the client socket, we wouldn’t know when a client sends data.
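Here are the two register calls from the server above, side by side:

```python
# 1. Server (listening) socket: EVENT_READ means "a new connection is waiting".
selector.register(sock, selectors.EVENT_READ, accept_connection)

# 2. Client (connection) socket: EVENT_READ means "the client sent request data".
selector.register(
    conn, selectors.EVENT_READ, lambda conn: handle_request(conn, addr)
)
```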
The while True: loop in asynchronous_server is the event loop.
events = selector.select() blocks until at least one registered socket is ready. The crucial difference from a blocking recv() or accept() is that it waits on all registered sockets at once, so the thread only sleeps when there is genuinely nothing to do.
It returns a list of events. For each event, callback = key.data retrieves the callback function we registered (e.g., accept_connection or handle_request), and callback(key.fileobj) executes that function, passing the socket object as an argument.
Now, look at handle_request. After receiving the request and preparing the response, we still have time.sleep(0.1). And then, instead of sending the response directly, we register the connection conn with the selector for EVENT_WRITE and associate it with the send_response callback via selector.modify(...). The intention is to send the response when the socket is ready for writing. However, because time.sleep(0.1) runs before selector.modify, the event loop is still blocked for the duration of the sleep.
Let’s benchmark it.
Concurrency Level:      10
Time taken for tests:   100.484 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      314000 bytes
HTML transferred:       249000 bytes
Requests per second:    9.95 [#/sec] (mean)
Time per request:       1004.840 [ms] (mean)
Time per request:       100.484 [ms] (mean, across all concurrent requests)
Transfer rate:          3.05 [Kbytes/sec] received
The Requests per second: 9.95 is essentially the same as our raw sockets server (9.96 req/sec), and an order of magnitude below the threaded server. This is not surprising. Even though we’ve introduced selectors and non-blocking sockets, the time.sleep(0.1) in handle_request is still blocking the event loop. While the selector efficiently multiplexes many connections, our simulated work is still synchronous and serializes request processing.
This selectors/server_blocking.py example, as written, doesn’t give us the performance benefits of asynchronous I/O because of the blocking time.sleep. However, it’s a crucial stepping stone. It demonstrates the structure of a selector-based event loop, registering sockets and callbacks. In the next iteration, we’ll try to replace the blocking time.sleep with a truly non-blocking delay mechanism to unlock the real power of asynchronous I/O with selectors.
Attempt 5 - Selectors with Non-Blocking Timer Simulation
In the previous section, we saw the structure of a selector-based server, but the blocking time.sleep negated any performance gains. To truly leverage asynchronous I/O, we need to replace that blocking delay with a non-blocking mechanism. In this attempt, I’ve used threading.Timer to simulate a non-blocking delay in conjunction with selectors. It’s still not pure asynchronous I/O in the ideal sense, as threading.Timer uses threads behind the scenes, but it’s a step closer and demonstrates the concept.
Here’s the code:
selectors/server_nonblocking.py
import socket
import selectors
import threading

selector = selectors.DefaultSelector()


def send_response(conn, addr, response):
    """Send the response when the timer expires."""
    try:
        conn.sendall(response.encode())
    except Exception as e:
        print(f"Error sending response to {addr}: {e}")
    finally:
        selector.unregister(conn)
        conn.close()


def handle_request(conn, addr):
    try:
        request_data = conn.recv(1024).decode()
        if request_data:
            response_html = """
            <html>
                <head>
                    <title>My Basic Server</title>
                </head>
                <body>
                    <h1>Hello from my basic server</h1>
                </body>
            </html>
            """
            response = "HTTP/1.1 200 OK\r\n"
            response += "Content-Type: text/html\r\n"
            response += f"Content-Length: {len(response_html)}\r\n"
            response += "\r\n"
            response += response_html
            # Use threading.Timer to call send_response after a delay
            timer = threading.Timer(0.1, send_response, args=(conn, addr, response))
            timer.start()
        else:
            print(f"Client {addr} sent no data")
            selector.unregister(conn)
            conn.close()
    except Exception as e:
        print(f"Error handling client {addr}: {e}")
        selector.unregister(conn)
        conn.close()


def accept_connection(sock):
    conn, addr = sock.accept()
    conn.setblocking(False)  # Set the connection to non-blocking
    selector.register(
        conn, selectors.EVENT_READ, lambda conn: handle_request(conn, addr)
    )


def asynchronous_server():
    HOST = ""
    PORT = 8000
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((HOST, PORT))
        sock.listen()
        sock.setblocking(False)  # Set the main socket to non-blocking
        selector.register(sock, selectors.EVENT_READ, accept_connection)
        print(f"Listening on port {PORT}")
        while True:
            events = selector.select()  # This function returns all the events
            for key, _ in events:
                callback = key.data  # This is the function we registered earlier
                callback(key.fileobj)  # Execute callback with the socket


if __name__ == "__main__":
    asynchronous_server()
The accept_connection and asynchronous_server functions are the same as in selectors/server_blocking.py. The key change is again in handle_request. Instead of calling time.sleep(0.1) and then registering for EVENT_WRITE, we now use threading.Timer(0.1, send_response, args=(conn, addr, response)).
What is threading.Timer?
threading.Timer is part of Python’s threading module and it allows you to run a function after a specified delay. It works in the following way:
Delay: You give it a time delay in seconds.
Function: You specify the function that should be executed when the time delay is over.
Arguments: You can pass arguments to that function.
This creates a new thread that waits for the specified delay and then runs the given function. It’s important to note that this is happening in the background, so the rest of the code can keep running.
The timer.start() line starts the timer. Without calling start(), the timer won’t actually run. The timer runs in the background, which means it doesn’t block the main code from executing. The rest of the program continues to run while waiting for the timer to expire. Once the timer expires, the given function is called automatically, and the response is sent back to the client.
This creates a Timer that will call send_response(conn, addr, response) after 0.1 seconds, but importantly, it does this in a separate thread. The handle_request function itself returns immediately after starting the timer. This means the main event loop in asynchronous_server() is no longer blocked during the simulated delay. It can continue to process other events, like handling new connections or reading data from other sockets.
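If threading.Timer is new to you, here is a tiny standalone sketch of the same pattern (the names are illustrative):

```python
import threading

def greet(name):
    print(f"Hello, {name}!")

timer = threading.Timer(0.1, greet, args=("world",))
timer.start()             # Returns immediately; greet runs ~0.1 s later on a background thread
print("Timer scheduled")  # Printed before "Hello, world!"
```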
Let’s benchmark it.
Concurrency Level:      10
Time taken for tests:   11.073 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      314000 bytes
HTML transferred:       249000 bytes
Requests per second:    90.31 [#/sec] (mean)
Time per request:       110.731 [ms] (mean)
Time per request:       11.073 [ms] (mean, across all concurrent requests)
Transfer rate:          27.69 [Kbytes/sec] received
The Requests per second: 90.31 is very similar to our threaded server (90.21 req/sec)! This is a significant improvement over the blocking selector version and the basic socket servers. By using threading.Timer, we’ve effectively offloaded the delay to a separate thread, allowing the main selector event loop to remain non-blocked and handle other connections concurrently.
While this approach works and shows improved concurrency, it’s crucial to understand that it’s still a hybrid approach, not pure asynchronous I/O. We’re using threads (via threading.Timer) to achieve non-blocking behavior. True asynchronous I/O aims to avoid threads altogether for concurrency, relying solely on event loops and non-blocking operations within a single thread.
In the next and final step, we’ll explore asyncio, Python’s built-in library for true asynchronous programming, to see how we can achieve non-blocking I/O and concurrency in a more elegant and efficient way, without relying on threads for the simulated delay.
Attempt 6 - asyncio True Non-Blocking I/O
Finally, we arrive at asyncio, Python’s built-in library for asynchronous programming. asyncio provides a framework for writing single-threaded concurrent code using coroutines, allowing for true non-blocking I/O without the complexities of threads for concurrency in I/O-bound operations.
What is a coroutine?
A coroutine in Python is a special type of function that can be paused and resumed during execution, making it useful for asynchronous programming. Coroutines allow Python to handle non-blocking operations efficiently, such as network requests, file I/O, or database queries, without needing multiple threads.
How is a Coroutine Different from a Regular Function?
Defined with async def: Unlike normal functions (def), coroutines use async def.
Uses await to pause execution: Coroutines can pause at await statements, allowing other coroutines to run in the meantime.
Needs to be explicitly scheduled: Calling a coroutine doesn’t execute it immediately; instead, it returns a coroutine object that must be awaited or run using an event loop.
Example of a Coroutine
import asyncio


async def say_hello():
    print("Hello!")
    await asyncio.sleep(2)  # Simulates a non-blocking delay
    print("World!")


# Running the coroutine
asyncio.run(say_hello())
Let’s examine the code.
asyncio/server.py
import asyncio


async def send_response(writer, response):
    """Send the response when the timer expires."""
    try:
        writer.write(response.encode())
        await writer.drain()  # Ensure data is sent
    except Exception as e:
        print(f"Error sending response: {e}")
    finally:
        writer.close()
        await writer.wait_closed()  # Wait for the writer to close


async def handle_request(reader, writer):
    addr = writer.get_extra_info("peername")
    try:
        request_data = await reader.read(1024)  # Asynchronously read data
        request_data = request_data.decode()
        if request_data:
            response_html = """
            <html>
                <head>
                    <title>My Basic Server</title>
                </head>
                <body>
                    <h1>Hello from my basic server</h1>
                </body>
            </html>
            """
            response = "HTTP/1.1 200 OK\r\n"
            response += "Content-Type: text/html\r\n"
            response += f"Content-Length: {len(response_html)}\r\n"
            response += "\r\n"
            response += response_html
            # Use asyncio.sleep for non-blocking delay
            await asyncio.sleep(0.1)
            asyncio.create_task(
                send_response(writer, response)
            )  # Create a task to send response asynchronously
        else:
            print(f"Client {addr} sent no data")
            writer.close()
            await writer.wait_closed()
    except Exception as e:
        print(f"Error handling client {addr}: {e}")
        writer.close()
        await writer.wait_closed()


async def main():
    HOST = ""
    PORT = 8000

    async def accept_connection(reader, writer):
        await handle_request(reader, writer)

    server = await asyncio.start_server(accept_connection, HOST, PORT)
    addr = server.sockets[0].getsockname()
    print(f"Serving on {addr}")
    async with server:
        await server.serve_forever()


if __name__ == "__main__":
    asyncio.run(main())
This code looks quite different from the previous versions, leveraging async and await keywords.
Note
The send_response and handle_request functions are now defined as async def, making them coroutines.
asyncio.start_server(accept_connection, HOST, PORT) starts the asynchronous server.
asyncio.start_server() expects a client-connected callback that receives an asyncio.StreamReader and an asyncio.StreamWriter, which is exactly the signature our accept_connection follows:
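```python
# The client-connected callback receives a (StreamReader, StreamWriter) pair:
async def accept_connection(
    reader: asyncio.StreamReader, writer: asyncio.StreamWriter
) -> None:
    await handle_request(reader, writer)
```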
request_data = await reader.read(1024) asynchronously reads data from the client. The await keyword is crucial here. It’s where the magic of non-blocking I/O happens. When await reader.read(1024) is encountered, the handle_request coroutine pauses execution, yielding control back to the asyncio event loop. The event loop can then proceed to handle other tasks, like processing other connections. When data is available to be read on this connection, the event loop will resume the handle_request coroutine right after the await line. This is true non-blocking I/O within a single thread.
Similarly, await asyncio.sleep(0.1) provides a non-blocking delay. Instead of pausing the entire thread, it pauses only the current coroutine, allowing the event loop to continue processing other tasks.
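This is exactly why ten concurrent requests cost roughly 0.1 s in total rather than 1 s. A quick standalone check of that claim:

```python
import asyncio
import time

async def main():
    start = time.perf_counter()
    # Ten concurrent 0.1 s sleeps finish together in about 0.1 s, not 1 s
    await asyncio.gather(*(asyncio.sleep(0.1) for _ in range(10)))
    print(f"Elapsed: {time.perf_counter() - start:.2f} s")

asyncio.run(main())
```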
asyncio.create_task(send_response(writer, response)) creates an asyncio.Task to run send_response concurrently. This means that sending the response happens in the background, without blocking the handle_request coroutine from processing further requests (though in our simple example, handle_request is essentially done after this).
Task vs Coroutine?
Coroutine: an asynchronous function (a function defined with async def) that can pause its execution using await, allowing other tasks to run while it’s paused.
Task:
A task is created from a coroutine using asyncio.create_task(coroutine()).
It is responsible for actually running the coroutine in the event loop.
You can manage the Task (e.g., cancel it, wait for it to finish, etc.).
Once a task is created, it is scheduled on the event loop right away and completes when the coroutine finishes its work.
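A small illustration of the difference:

```python
import asyncio

async def work(tag):
    await asyncio.sleep(0.1)
    print(f"{tag} done")

async def main():
    coro = work("coroutine")                   # Nothing runs yet: just a coroutine object
    task = asyncio.create_task(work("task"))   # Scheduled on the event loop right away
    await coro                                 # The plain coroutine runs only once awaited
    await task                                 # Wait for the already-running task to finish

asyncio.run(main())
```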
Why have we not created any other Task? What is special about send_response()? There’s no need to create additional tasks for the other operations because:
They’re non-blocking (they let the event loop run other tasks while waiting).
The response sending (via send_response) is the only operation that benefits from being run in the background while the server handles other clients.
Okay, so could we use a blocking sleep as well, assigned to a separate task? Yes, we can. I have used a non-blocking sleep in this example, but you can wrap a blocking call in a coroutine and hand it to the event loop’s executor, as the sketch below shows.
asyncio/server_blocking.py

import asyncio
import time


def blocking_sleep():
    """This simulates a blocking sleep."""
    time.sleep(1)  # This is a blocking sleep


async def blocking_sleep_task():
    """Run blocking sleep in a separate task."""
    # This simulates a blocking operation
    loop = asyncio.get_event_loop()  # Get the current event loop
    await loop.run_in_executor(None, blocking_sleep)  # Run the blocking function in a separate thread


# Use blocking sleep in a separate task
asyncio.create_task(blocking_sleep_task())
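Since the examples in this post target Python 3.9, the same pattern can also be written with asyncio.to_thread, a convenience wrapper added in that release:

```python
async def blocking_sleep_task():
    """Equivalent to the run_in_executor version above."""
    await asyncio.to_thread(blocking_sleep)  # Runs blocking_sleep in the default thread pool
```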
The main() function sets up the server and starts the asyncio event loop using asyncio.run(main()).
Let’s benchmark it.
Concurrency Level:      10
Time taken for tests:   12.061 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      314000 bytes
HTML transferred:       249000 bytes
Requests per second:    82.91 [#/sec] (mean)
Time per request:       120.612 [ms] (mean)
Time per request:       12.061 [ms] (mean, across all concurrent requests)
Transfer rate:          25.42 [Kbytes/sec] received
The Requests per second: 82.91 is slightly lower than the threaded and timer-selector versions in this particular benchmark run, but still significantly better than the initial blocking servers. In many scenarios, asyncio can outperform threading for I/O-bound tasks due to lower overhead and more efficient concurrency management. However, the exact performance can vary depending on the workload and specific system conditions.
“asyncio” represents a more modern and efficient approach to concurrency for I/O-intensive applications in Python. It allows us to write highly concurrent code within a single thread, avoiding many of the complexities and overheads associated with threads. It’s the foundation for many modern Python web frameworks and asynchronous libraries.
Conclusion
We’ve come a long way, starting from a super simple http.server to exploring threading, selectors, and finally, asyncio. Let’s take a moment to look back at the performance of each server implementation. Here’s a table summarizing the ‘Requests per second’ we observed with Apache Benchmark:
| Server Implementation | Requests per Second (approx.) |
| --- | --- |
| http.server | 8.74 |
| Raw Sockets | 9.96 |
| Threading | 90.21 |
| Selectors (Blocking) | 9.95 |
| Selectors (Non-Blocking) | 90.31 |
| asyncio | 82.91 |
As you can clearly see, threading and the selector-timer hybrid approaches provided a dramatic performance boost compared to the basic single-threaded servers. asyncio, while showing slightly lower RPS than the threaded versions in this specific benchmark run, still demonstrated a significant improvement over the blocking approaches and represents a more robust and scalable architecture for I/O-bound applications in the long run.
It’s important to remember that these are very basic, toy servers. They lack many features of production-ready web servers, such as robust HTTP parsing, proper error handling, security considerations, and more. They are meant for educational purposes – to illustrate the core concepts of networking and concurrency.
Building these basic servers from scratch was a learning journey for me. I hope it has been for you too! It really demystifies what’s happening behind the scenes and gives a deeper appreciation for the evolution of concurrency approaches in Python and the power of asynchronous I/O.