In this two-part article, we will explore the basics of Node.js, including its non-blocking architecture and the core of its execution environment: the event loop. We will also examine the Node.js environment, which includes a look at the Node command line tool, as well as the Node Package Manager (NPM) tool. Using this foundational knowledge, we will conclude with a walkthrough of the creation of a simple Hypertext Transfer Protocol (HTTP) server to handle HTTP requests from clients. By the end of this article series, we will have accumulated a fundamental understanding of how Node.js works and where it fits into the ever-changing world of web development.
Although there are many server-side languages and frameworks in existence today, including Java and Spring, Ruby and Rails, and Python and Flask, Node.js differentiates itself in two important ways:
- It is excellent at performing IO-intensive tasks, including handling HTTP requests and interacting with file system objects.
The Node.js Stack
The Event Loop
The event loop is the core of what makes Node.js such a valuable framework, allowing for thousands, or even tens of thousands, of simultaneous connections and responsive reactions to IO-based events. In most programming languages and frameworks, this is accomplished through multiple threads or some other concurrent execution mechanism, but with Node.js this is not the case. Instead, code is executed within an iterative event loop, which cycles through a series of phases and notifies application code as each of these steps completes. In this section, we will explore the concept of event-driven programming, which is at the heart of Node.js, and then, using this prerequisite knowledge, look into the internals of how the Node.js event loop operates.
In the vast majority of code, steps are executed in a synchronous manner, where code on one line is executed prior to code on subsequent lines. For example, if we wanted to open a User Datagram Protocol (UDP) socket in Python and listen for incoming data, we could write the following (obtained from the Python Wiki):
```python
import socket
import struct

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('', 4242))
# Join a multicast group (224.0.0.1, the all-hosts group, is used here as an example)
mreq = struct.pack("=4sl", socket.inet_aton("224.0.0.1"), socket.INADDR_ANY)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
while True:
    print(sock.recv(10240))
```
Even with the inclusion of IO tasks, such as opening a network socket on a specified network interface, the code still executes in a top-down manner. The problem with this style of programming is that some calls may be blocking, where the current thread of execution halts until a specified task completes. For example, when we make the
sock.recv call, our current thread blocks until data arrives on the socket, and only then continues execution. While this ensures that we execute code only after an event occurs, it is inefficient, since we can no longer do useful work on this thread while we wait.
In event-driven programming, we register a callback function with the system that provides logic to be executed once an event has occurred. In short, a callback can be summed up using the following definition:
A callback is an observer function that is supplied to another function to be executed at a later time when an event of interest occurs.
This allows us to respond to events, such as receiving data on a socket, while still executing subsequent code as we wait for those events to occur. For example, we could have written a socket interface in Python to operate in the following manner:
```python
class UdpSocketServer(object):
    def __init__(self, ip, port):
        self._ip = ip
        self._port = port
        self._callbacks = []

    def on_data_received(self, callback):
        self._callbacks.append(callback)

    def listen(self):
        # Connect to the socket and spawn a new thread to listen for data,
        # executing the registered callbacks when data is received
        pass


server = UdpSocketServer('127.0.0.1', 5005)
server.on_data_received(lambda data: print(data))
server.listen()
print('We are now listening')
```
In this event-driven socket, the line
print('We are now listening') is executed immediately after the
server.listen() call returns, even before any data is received by the socket server. This allows us to continue executing on our current thread while, at the same time, responding to events as they occur. We can only program in this manner, though, if some mechanism exists that listens for events on a separate thread of execution and invokes our callback functions when an event occurs. For example, we could delegate to the Operating System (OS) kernel for sockets or to a separate thread for files.
This type of programming is very reminiscent of hardware Direct Memory Access (DMA), used by computers to transfer large amounts of data from a peripheral device (such as a disk) to memory without causing the processor to block while waiting for the transfer to complete. In the case of DMA, the processor tells the DMA controller to transfer a specified number of bytes from the peripheral device into memory at a specified address and to interrupt the processor when the transfer is complete; the processor then continues its execution as normal and is interrupted at a later time, once the transfer has finished. Just as with DMA, event-driven programming says to the event framework, "Do this operation in the background and let me know when it is done."
As we will see in the following section, Node.js has its own technique for supporting event-driven programming, which allows us to register a series of callbacks for common events, such as responding to an HTTP request or opening a file. By registering callbacks with the Node.js API, we are freed from managing large numbers of threads or making low-level OS kernel calls. Instead, we register our callbacks and let Node.js handle the events in the manner it deems most efficient, knowing that when an event occurs, we are only responsible for supplying the logic that handles it at the application level.
The Internals of the Event Loop
The event loop is at the heart of the Node.js framework and at the core of libuv, providing the groundwork for the execution semantics of Node.js. Even with this importance, it has been a commonly misunderstood facet, causing many incorrect conceptual models to abound. In the fall of 2016, Bert Belder of IBM gave a talk on the Node.js event loop at Node.js Interactive Europe, where he cleared up much of the misunderstanding and controversy about this execution model. An abstracted view of his conceptual model is presented below:
Next, the Unicorn function is executed. In this bubble, IO events, such as network and file system events, are checked, and if an event with a corresponding callback has occurred, the callback is executed. This process continues until all fired events have been handled. Once all of the Unicorn callbacks have been handled, all setImmediate callbacks are executed. While we do not cover the
setImmediate function in this article, more information can be found at this Stackoverflow post by JohnnyHK.
Finally, internal closing and cleanup procedures are executed by Node.js, and any callbacks tied to these procedures are executed. Once these callbacks complete, a single iteration of the loop has finished. The continuation criterion for the loop is a reference-counting scheme: each registered callback awaiting execution increments the reference count, and each executed callback decrements it. Once the reference count reaches 0 (denoting that there are no more callbacks that can be executed), the loop exits.
Within Our Code
Each time a blue bubble is executed, a nested sub-loop is initiated: each time our code is called, Node.js will check whether there are any resolved promises with pending callbacks, as well as whether there are any
nextTick calls (which will be executed within the same iteration of the loop, rather than being deferred to a later iteration). Once all promises have been handled and all
nextTick callbacks executed, this sub-loop is exited and the main event loop continues. For more information on the
nextTick function, see the official Node.js Event Loop Timers documentation.
The Unicorn Function
The Unicorn function is responsible for handling the events that the OS kernel and thread pool have fired, as well as preparing near-term timeout events. In Node.js, a large portion of the IO tasks are performed through the OS kernel, such as reading and writing to sockets and handling network communications; when possible, Node.js will delegate to the functions and structures provided by the kernel and will check to see when these events have completed.
For example, when the Unicorn function is executed, Node.js will check whether there are any IO events that can be processed on any of the watched IO resources and, if there are, will collect the events and process them using our callbacks. In the case of Node.js on Linux, this checking is performed using the epoll functionality of the kernel. For more information, see the documentation for epoll and epoll_wait.
When asynchronous tasks cannot be performed by the operating system, they are delegated to a thread pool, which allows for tasks to be executed on separate threads. During the execution of a Unicorn function, any events that are fired from the thread pool are collected and processed using our code in a similar manner to the processing of OS kernel-based events.
Prior to leaving the Unicorn phase, near-term timer events are collected, preparing them to be executed during the Timeout phase in the next iteration of the main event loop. Note that if Node.js predicts that a timeout will not occur on the next iteration of the main event loop, then a timeout event is not collected. Instead, Node.js will defer preparing this timeout event until it determines that it will be executed in the subsequent event loop iteration.
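One practical consequence, sketched below (exact timing varies by platform): a timer's delay is a lower bound rather than an exact schedule, since the callback runs during the first timers phase at or after the deadline:

```javascript
const start = Date.now();
let elapsed;

setTimeout(() => {
  elapsed = Date.now() - start;
  // At least 100 ms, and often slightly more, depending on when
  // the next timers phase happens to run.
  console.log(`fired after ~${elapsed} ms`);
}, 100);
```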
Pulling it All Together
As we will see in the following part of this series, Node.js also provides an environment that allows us to include many different packages and supporting tools that take Node.js from a simple event loop framework to a full-fledged server implementation environment.