Writing client/server applications with WebSockets (WSX)

 

The WebSocket protocol (aka websox or WSX) is currently the best method of durable communication between JavaScript code running in a web browser and server-side applications running behind any of the major web servers (Apache, nginx, etc.). When multiple client-server interactions are required, it serves as a more secure, performant and flexible replacement for all other dynamic web content methodologies, including CGI, SSI, PHP and AJAX. It can also be used to create stand-alone clients and servers, independent of a full HTTP environment.

The WSX protocol is defined in RFC 6455. A slightly more readable description can be found on the Mozilla Developer Network, which also covers the JavaScript WebSocket interface built into major web browsers. Fundamentally, the protocol consists of an HTTP handshake, followed by a binary line protocol for actual data transfer. The entire protocol can be implemented in JSUS programs using the getb() and putb() functions (UTF-8 encoded HTTP headers are, of course, fully compatible with the 8-bit binary supported by these functions).

However, while the websox protocol is fairly simple, it remains non-trivial to implement, which makes an implementation at the JavaScript application layer more prone to errors and security breaches. In addition, in our opinion, the full protocol is more than a trifle "over-engineered" for most of the applications for which it is appropriate. Therefore, we have embedded a subset of the WSX protocol into the JSUS ask() and say() functions, but only when used over a TCP or Unix socket, redirected to both stdin and stdout. These functions automatically detect the beginning of a websox session and then transparently handle WebSocket message framing and convert between JavaScript strings and the UTF-8 or binary data sent over the line. They also transparently respond to ping, pong and close messages from the opposite end of the connection.

After a successful HTTP handshake, the JavaScript programmer need only send and receive standard strings, with a message type (or "opcode") as the first character. Both functions support unfragmented messages up to 64K in length, which is more than enough for most applications. In the unlikely event that longer messages are required, the programmer must provide their own fragmentation implementation (or use the getb() and putb() functions to implement the full protocol, which is beyond the scope of this manual).
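As an illustration of application-level fragmentation, the following is a minimal sketch only. The marker characters ("+" for "more to come", "." for "last part"), the chunk size and the function names are our own assumptions and are not part of JSUS or the WebSocket protocol. The chunk size is counted in characters and kept well below 64K because a UCS-2 character may occupy up to 3 bytes as UTF-8.

```javascript
// Hypothetical chunk size: 20000 characters is at most 60000 bytes of UTF-8.
const CHUNK_CHARS = 20000;

// Split a long text into parts, each prefixed with a one-character marker:
// "+" means more parts follow, "." marks the final part.
function splitMessage(text, chunkChars = CHUNK_CHARS) {
  const parts = [];
  for (let i = 0; i < text.length; i += chunkChars) {
    const last = i + chunkChars >= text.length;
    parts.push((last ? "." : "+") + text.slice(i, i + chunkChars));
  }
  if (parts.length === 0) parts.push("."); // an empty text is a single final part
  return parts;
}

// Reassemble the original text from the received parts, in order.
function joinMessage(parts) {
  return parts.map((part) => part.slice(1)).join("");
}
```

Each part would be sent as an ordinary text message, and the receiver would keep collecting parts until one beginning with "." arrives. Both markers are printable, so they do not collide with the opcode character.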

A websox connection must be initiated by the client: either a stand-alone program or, far more likely, JavaScript running in a web browser. In the latter case, the program instantiates a WebSocket object, as described in Writing a WebSocket Client, on the Mozilla Developer Network. A WebSocket can be created almost anywhere in browser-side JavaScript (including within a Web Worker thread). For example:

var webSok = new WebSocket("ws://example.com/wsx/program/path/info");

The ws (or wss) scheme replaces http (or https) in the URI for example.com. When the web browser executes this statement, it uses the HTTP Upgrade mechanism to ask the web server at example.com to switch the connection to the WebSocket line protocol. To do so, the browser (or stand-alone client) first establishes a normal TCP connection and sends header lines such as the following:

GET /wsx/program/path/info HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Origin: http://example.com

This client handshake is transparent to JavaScript running in the browser and may also include other standard headers, such as "Cookie". However, as we all know, Internet applications are like politicians, so nothing they say can be trusted. This is especially true for custom clients, as opposed to more trustworthy web browsers. It is particularly relevant for the Origin header, which should provide a valuable security check but, in practice, doesn't, because a custom client can set it to any value. In almost all cases, the developer will need to provide their own session verification, independent of the connection headers.
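On the browser side, the rest of the client can be sketched roughly as follows. This is an illustration only: the function name openWebSok, the "hello" greeting and the onText callback are hypothetical. The WebSocket constructor is passed in as a parameter purely so the sketch can be tried outside a browser; in a browser one would pass the built-in WebSocket.

```javascript
// Open a websox connection and exchange text messages.
// WebSocketCtor: the WebSocket constructor (the browser's WebSocket).
// onText: hypothetical callback receiving each incoming message.
function openWebSok(WebSocketCtor, url, onText) {
  const webSok = new WebSocketCtor(url);

  // Fired once the HTTP handshake has completed and the server has accepted.
  webSok.onopen = () => webSok.send("hello");

  // Fired for each complete message from the server.
  webSok.onmessage = (event) => onText(event.data);

  // Fired after the closing handshake, with the close code.
  webSok.onclose = (event) => onText("closed: " + event.code);

  return webSok;
}
```

In a browser this might be called as openWebSok(WebSocket, "ws://example.com/wsx/program/path/info", console.log).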

The way these headers are processed depends on how the main server is configured. There are many possible configurations, including dedicated websox servers, and we cannot cover them all within the scope of this document. Our standard solution is the jsite server, which configures the nginx web server to intercept all requests beginning with wsx/ (as in the example above). All such requests are reverse-proxied to jsite's own WSX server, which is similar to its CGI server. A new server process is forked by jsite for each connection and the connected socket is redirected to both stdin and stdout. Next, the headers are validated and stored (with additional variables) in the process's environment, just like a CGI program. Then, jsite drops privileges to the example.com user and executes a jump() to the program named immediately after the wsx/ in the GET request. This program must be located (or linked to) in the bin folder of the site's home directory: /home/example.com/bin/program. At this point, the connection is still "pending" and the program must make its own decision whether to reject or accept the request, based on its own information and the content of the environment.
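The mapping from request path to program file described above can be sketched as follows. This is not jsite's actual code: the function name resolveWsxRequest and the restriction on program names (no leading dot, no slashes) are our own assumptions, added so that a name such as ".." cannot escape the bin folder.

```javascript
// Split a request path of the form /wsx/<program>[/<path info>][?<query>]
// into the program file under the site's bin folder and the CGI-style parts.
// Returns null when the path does not name a valid wsx program.
function resolveWsxRequest(requestPath, siteName) {
  const match = /^\/wsx\/(\w[\w.-]*)(\/[^?]*)?(?:\?(.*))?$/.exec(requestPath);
  if (!match) return null;
  return {
    program: "/home/" + siteName + "/bin/" + match[1],
    pathInfo: match[2] || "",
    queryString: match[3] || "",
  };
}
```

For the example request, resolveWsxRequest("/wsx/program/path/info", "example.com") yields the program /home/example.com/bin/program with a path info of /path/info.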

As with a CGI program, the connection headers are stored in the environment with names beginning with an HTTP_ prefix (for example, HTTP_COOKIE). In addition, jsite passes the following variables (most of which are identical to those for CGI programs):

WS_ACCEPT       Pre-computed Sec-WebSocket-Accept header.
DOCUMENT_ROOT   Path to the web site's root directory.
SERVER_NAME     Server hostname (may be IP address).
REQUEST         Entire HTTP request line received by HTTP server.
REMOTE_HOST     Client host name (may be same as REMOTE_ADDR).
REMOTE_ADDR     IP address of the client browser (or proxy).
REQUEST_METHOD  HTTP method (must be GET for a web socket).
SCRIPT_NAME     Path to the program file (before PATH_INFO).
PATH_INFO       URL path suffix, following program name.
QUERY_STRING    Everything after question mark in the URI.
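The HTTP_ naming of header variables can be illustrated with a short sketch. It assumes jsite follows the usual CGI convention (header name in upper case, hyphens replaced by underscores), which matches the HTTP_COOKIE example but is not otherwise confirmed here; the helper names are hypothetical.

```javascript
// Convert an HTTP header name to its CGI-style environment variable name.
function headerToEnvName(headerName) {
  return "HTTP_" + headerName.toUpperCase().replace(/-/g, "_");
}

// Build an environment object from "Name: value" header lines.
// Lines without a colon (such as a simple request line) are skipped.
function headersToEnv(headerLines) {
  const env = {};
  for (const line of headerLines) {
    const colon = line.indexOf(":");
    if (colon < 1) continue;
    const name = line.slice(0, colon).trim();
    env[headerToEnvName(name)] = line.slice(colon + 1).trim();
  }
  return env;
}
```

Under this convention, the client's Sec-WebSocket-Key header would appear as HTTP_SEC_WEBSOCKET_KEY.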

After startup, a server program can communicate with the client at any time, using simple say() and ask() calls (as can any child process). But first, the WebSocket upgrade request must be accepted or rejected. If appropriate, the program can refuse a connection simply by writing an HTTP error status response to stdout, for example: say("HTTP/1.1 400 Bad Request\r\n\r\n"). After this, the program need only exit() and the connection will be closed. To accept a connection, the program must say() at least the following HTTP response headers:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

The Sec-WebSocket-Accept header must be sent last, as a separate line, because this is what causes ask() and say() to upgrade into websox server mode. According to RFC 6455, the value of this header is computed from the value received in the client's Sec-WebSocket-Key header, concatenated with the magic string value "258EAFA5-E914-47DA-95CA-C5AB0DC85B11", which is then SHA-1 hashed and base64 encoded. This cumbersome procedure has no security value and serves only to reassure the browser that it really is talking to a WebSocket server. Fortunately, jsite is kind enough to pre-compute and pass the entire header in the WS_ACCEPT variable (including the trailing line end and empty line). Thus, the final header is easily written with: say(env("WS_ACCEPT")). This establishes the connected state and both browser and server switch to the WebSocket binary line protocol.
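For readers who are not running under jsite, the RFC 6455 computation described above can be sketched as follows. This example assumes Node.js and its standard crypto module rather than JSUS, and the function names are ours. The layout of the acceptHeader() result (header line plus an empty line) mirrors the description of WS_ACCEPT above.

```javascript
const crypto = require("node:crypto");

// The fixed "magic" GUID defined by RFC 6455.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Compute the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key:
// concatenate key and GUID, SHA-1 hash the result, then base64 encode it.
function computeAccept(secWebSocketKey) {
  return crypto
    .createHash("sha1")
    .update(secWebSocketKey + WS_GUID)
    .digest("base64");
}

// Build the complete header, including the trailing line end and empty line.
function acceptHeader(secWebSocketKey) {
  return "Sec-WebSocket-Accept: " + computeAccept(secWebSocketKey) + "\r\n\r\n";
}
```

With the sample key from the client handshake above, computeAccept("dGhlIHNhbXBsZSBub25jZQ==") returns s3pPLMBiTxaQ9kYGzzhZRbK+xOo=, the value shown in the example response.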

Once the connection is established, complete messages may be transferred as JavaScript strings containing either UCS-2 characters or 8-bit binary bytes stored in the lower half of each character. The ask() and say() functions handle the protocol itself and automatically frame the data and translate back and forth between UCS-2 and UTF-8 over the line. However, the first byte over the line must always be an 8-bit value identifying the type (or "opcode") of the message, as follows:

0x01  UCS-2 character data, transmitted as UTF-8.
0x02  Binary data, encoded 1 byte per character.
0x08  A CLOSE message, to which the peer replies CLOSE.
0x09  A PING message, to which the peer replies PONG.
0x0A  A PONG message, to which there is no reply.
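To show what happens to such a string over the line, here is a rough sketch of how an opcode-prefixed message could be turned into an unfragmented, unmasked server-to-client frame as laid out in RFC 6455. It assumes Node.js (for Buffer) and is not the JSUS implementation; the function name encodeServerFrame is hypothetical.

```javascript
// Encode a message whose first character is the opcode into a single
// WebSocket frame, as sent from server to client (no masking).
function encodeServerFrame(message) {
  const opcode = message.charCodeAt(0);
  const body = message.slice(1);

  // Text (0x01) goes out as UTF-8; every other type is sent as
  // one byte per character (the lower half of each character).
  const payload = Buffer.from(body, opcode === 0x01 ? "utf8" : "latin1");
  if (payload.length > 0xffff) {
    throw new RangeError("payload exceeds 64K; fragment at the application level");
  }

  const FIN = 0x80; // final (and only) fragment of the message
  const header =
    payload.length < 126
      ? Buffer.from([FIN | opcode, payload.length])
      : Buffer.from([FIN | opcode, 126, payload.length >> 8, payload.length & 0xff]);

  return Buffer.concat([header, payload]);
}
```

For example, the text message "\x01Hi" becomes the four bytes 81 02 48 69: FIN plus opcode 1, a length of 2, then the UTF-8 payload.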

Normally, only the first two types are exchanged (freely intermixed) with the body of each message determined by a private, application level protocol. A say() usually returns immediately, while ask() blocks until a complete message is received. While waiting, ask() automatically responds to PING and CLOSE messages from the peer (and PONG is always ignored). Thus, ask() returns (normally) with either a data message, or after completion of a closing handshake with the peer. In the latter case, the message body is either empty or a 2-byte, integer reason code for the peer's closure (as a single UCS-2 character).

A say() may be used to send any of these message types (with opcodes and other binary values coded as escapes within the JavaScript string). However, if PING is sent, the PONG reply will simply be ignored by a following ask(). A PONG, by itself, will be ignored by the peer, although it may still serve to keep the connection alive. An alternate method is to interrupt a long waiting ask() with a timer, issue a PING, and restart the ask(). A PING or PONG message consists of the opcode followed by an optional 2-byte, integer reason code, packed into a single UCS-2 character. A text message may also be appended but this will be ignored by both ask() and say() functions.

In many (if not most) cases the application protocol will be based on UCS-2 character strings, rather than binary data. To simplify this, say() allows the opcode to be omitted as long as the actual text begins with a printable character (greater than or equal to space). The opcode is then inserted automatically, over the line. Similarly, ask() returns only the actual text string without the opcode (as long as the first character is printable). Thus, server-side programs typically only send and receive text strings, without any opcode. Special handling is required only for binary messages, for CLOSE and PING messages, and for text messages beginning with a control character (below space).
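The opcode-omission rule can be expressed as a pair of small functions. This is only a sketch of the rule as described above, not the JSUS source; the names toWire and fromWire are hypothetical, and the treatment of an empty string (left unchanged) is our assumption.

```javascript
const OP_TEXT = "\x01";

// Outgoing: if the message starts with a printable character (>= space),
// it is plain text with the opcode omitted, so the text opcode is inserted.
function toWire(message) {
  if (message.length > 0 && message.charAt(0) >= " ") {
    return OP_TEXT + message;
  }
  return message; // already starts with an opcode (or is empty)
}

// Incoming: strip the text opcode only when the text itself begins with
// a printable character; everything else is returned with its opcode.
function fromWire(message) {
  if (message.charAt(0) === OP_TEXT && message.length > 1 && message.charAt(1) >= " ") {
    return message.slice(1);
  }
  return message;
}
```

A text message that itself begins with a control character, such as a tab, therefore keeps its explicit 0x01 opcode in both directions.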

The WebSocket protocol is designed to provide full-duplex communication between a client and server (which is one of its major advantages over earlier methodologies). JSUS programs, on the other hand, only operate in half-duplex mode: the ask() function blocks while waiting for input, preventing data from being sent simultaneously by a say(). This is easily remedied by forking a child process, which inherits both standard files. The child process may then continually stream data to the client (e.g. a video feed). At the same time, the parent process sends and receives control messages and signals the child when it needs to pause or stop. The parent process can also handle streaming from the client and both, working together, support full-duplex streaming in both directions. JSUS was intentionally designed this way to "keep it simple", and the same technique works equally well in other contexts, over any type of socket.

A server process must not terminate with just an exit() call (which would also close the standard files). Other server processes may still have these files open, which would leave the client still connected. A CLOSE message should always be sent to the client to initiate a formal closing handshake from the server. On receipt, the client will reply with a matching CLOSE. The say() for the server's CLOSE will return immediately and should be followed by an ask() for the client's CLOSE. When the client's CLOSE arrives, the socket is closed and the client's CLOSE message is returned to the process, which may then exit(). During the CLOSE, the client browser will stop sending any further messages and will ignore any messages received from other server processes. And, after it closes its own side of the socket connection, any further ask() or say() calls will end abnormally with EOF or error. Thus, other server processes do not normally need to be aware of a CLOSE in progress.
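Putting the pieces together, a complete server program might be structured roughly like the sketch below. It is written as a function taking ask, say and env as parameters so that it can be tried outside JSUS; in a real JSUS program these are the built-in functions and each return would be followed by exit(). The allow() predicate, the "quit" command, the "echo: " reply and the assumption that ask() returns a CLOSE message as a string beginning with the 0x08 opcode are all ours.

```javascript
const OP_CLOSE = "\x08";

// Accept or reject the pending connection, echo text messages, and
// finish with a formal closing handshake.
// io:    { ask, say, env } standing in for the JSUS built-ins.
// allow: hypothetical predicate deciding whether to accept, given env.
function runEchoServer(io, allow) {
  const { ask, say, env } = io;

  if (!allow(env)) {
    say("HTTP/1.1 400 Bad Request\r\n\r\n"); // refuse the upgrade
    return "rejected";
  }

  // Accept: the Sec-WebSocket-Accept header is sent last, by itself.
  say("HTTP/1.1 101 Switching Protocols\r\nUpgrade: websocket\r\nConnection: Upgrade\r\n");
  say(env("WS_ACCEPT"));

  for (;;) {
    const msg = ask(); // blocks until a complete message arrives
    if (msg.charAt(0) === OP_CLOSE) {
      return "closed by client"; // ask() has completed the handshake
    }
    if (msg === "quit") {
      say(OP_CLOSE); // initiate the closing handshake from the server
      ask();         // wait for the client's matching CLOSE
      return "closed by server";
    }
    say("echo: " + msg); // plain text, opcode omitted
  }
}
```

A typical allow() predicate would check a session identifier, for example one carried in HTTP_COOKIE, against the application's own records rather than trusting the connection headers.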

 

See also:

Serving multiple web sites (jsite)
Network security considerations
Server Program Display Agent (jspda.js)