Writing server-side web applications (CGI)

Intentionally, JSUS does not provide interfaces to graphical desktop environments such as Windows, Gnome or KDE. While technically possible, this would require a great deal of extra effort; it would support only the users of that environment; and it would do nothing for server-only systems (which don't usually have a GUI). Instead, we take the view that there is already a better graphical interface, fully standardized and readily available to all users of all operating systems, everywhere. It's called the World Wide Web or, more precisely, HTML5 + JavaScript. In this view, every web browser is a potential "client" for any JavaScript program running on any computer with a standard web server (Apache, nginx, etc.). And when client and server run on the same host, JSUS programs provide a user experience similar to that of a conventional GUI, without a massive additional programming effort.

There are three major methodologies for web-based client-server applications. The WebSocket interface (WSI) is the most recent, flexible and performant, but it is not implemented in a standard way by all web servers. Our jsite manager for the nginx web server simply redirects the client's socket to stdin and stdout, so the requested server-side program need only ask() for or say() text messages, just like a command line program (this is described more fully on the WebSocket page). The XMLHttpRequest (XHR) method was designed to support Asynchronous JavaScript And XML (AJAX) applications. An XHR object allows a JavaScript function running in the browser to make normal HTTP and HTTPS requests to the web server. Since this is a browser-only technology, it is not discussed further in this manual. The Common Gateway Interface (CGI), described here, is the oldest and slowest, but in many ways the simplest, method for writing server-side application programs.

For historical reasons, there is an enormous installed base of CGI programs, and CGI continues to be the best approach for simple, occasional tasks such as processing input from web "forms". As a result, all major web servers provide some level of CGI support, and almost all support the "standard" CGI protocol described here. The one glaring exception is the nginx web server, which directly supports the fastCGI protocol but not standard CGI. In our opinion, fastCGI adds unnecessary complexity to CGI programs and, especially under Linux, is only marginally faster, and then only in special cases. Since nginx is nevertheless our preferred server, we developed the jsite manager, which creates fast backend servers for both standard CGI and WebSocket programs. It also automatically configures nginx to serve multiple, user-defined web site directories and to proxy the backend servers from all these sites. (This is our strongly recommended solution for most web sites.)

In standard CGI, any link on a web page may refer to an executable program file instead of a text file containing HTML code. The program may be written in any language, in the same style as a basic command line program, and runs as a separate process forked by the web server. When it executes, the program's standard output is transparently redirected back to the web server, so that anything written to stdout (by say(), for example) is captured and sent to the browser just like any other web page. (Any HTTP headers required by the server or browser, typically at least a Content-Type header, must come first and be separated from the HTML by an empty line.) If the HTTP request type is POST (or PUT), any data sent with the request is presented to the CGI program on stdin, and may be read with ask() calls. For a GET request, any parameters in the URL are passed to the program through environment variables (notably QUERY_STRING), which may be retrieved with the env() function. In fact, regardless of request type, the web server always creates an "environment" for the CGI program, containing a subset of the web server's own variables as well as others created specifically for the current request. The exact set of variables is determined by the web server, but the following are provided by all standard servers (including jsite):

GATEWAY_INTERFACE      CGI protocol version.
DOCUMENT_ROOT          path to the web site's root directory.
SERVER_SOFTWARE        name and version of the web server.
SERVER_NAME            server hostname (may be IP address).
SERVER_ADMIN           email address for the server administrator.
SERVER_PROTOCOL        HTTP protocol version.
SERVER_PORT            TCP port number (usually 80 or 443).
REMOTE_HOST            client host name (may be the same as REMOTE_ADDR).
REMOTE_ADDR            IP address of the client browser (or proxy).
REMOTE_USER            sometimes (not usually) available.
REMOTE_IDENT           unset, unless the server did a lookup (rare).
AUTH_TYPE              authorization type, or unset.
REQUEST_METHOD         HTTP method (GET, POST, etc.).
PATH_TRANSLATED        full path, as determined by the server.
SCRIPT_NAME            path to the program file (before PATH_INFO).
PATH_INFO              URL path suffix, following the program name.
QUERY_STRING           the part of the URL after the ? character, if any.
CONTENT_TYPE           media type (from HTTP header, if POST, etc.).
CONTENT_LENGTH         length of data content on stdin.
HTTP_USER_AGENT        These (and similar variables) come from
HTTP_REFERER           HTTP headers provided by the web browser.
HTTP_COOKIE
HTTP_ACCEPT
HTTP_ACCEPT_LANGUAGE
. . .
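
As a rough illustration of the output rules described above, the sketch below builds a complete CGI response (header lines, one empty line, then the HTML body) from a plain object standing in for the environment. It is plain JavaScript with no JSUS-specific calls; the function names cgiResponse and requestReport are hypothetical, and in a real JSUS program the values would presumably be fetched with env() and the finished string written to stdout with say().

```javascript
// Build a standard CGI response: header lines first, then one empty
// line, then the body (normally HTML).
function cgiResponse(headers, body) {
  const headerLines = Object.entries(headers).map(
    ([name, value]) => `${name}: ${value}`
  );
  return headerLines.join("\n") + "\n\n" + body;
}

// Hypothetical page reporting two server-provided variables.
// `env` is a plain object standing in for the CGI environment.
function requestReport(env) {
  const method = env.REQUEST_METHOD || "(unset)";
  // CONTENT_LENGTH arrives as a string, or not at all for a GET request.
  const length = parseInt(env.CONTENT_LENGTH || "0", 10) || 0;
  const body =
    `<p>Method: ${method}</p>\n` +
    `<p>Bytes waiting on stdin: ${length}</p>\n`;
  return cgiResponse({ "Content-Type": "text/html; charset=utf-8" }, body);
}
```

For example, requestReport({ REQUEST_METHOD: "POST", CONTENT_LENGTH: "12" }) yields the Content-Type line, an empty line, and then the two paragraphs.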

At first sight, the wealth of variables seems to offer the possibility of "securely" managing individual connections over the Internet. Sadly, this has always been an illusion. Internet programs, including web browsers, are like politicians: they can say anything they like to please the listener, and nothing they say can be relied upon. You cannot even believe that HTTP_USER_AGENT legitimately represents the name of the browser it purports to be. Even REMOTE_ADDR, the IP address of the client, can be faked, and when access is through a proxy server it identifies the proxy rather than the real client. In short, all of these variables should be taken with a large pinch of salt.
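
Because any of these values may be forged, a program that echoes one back into a page should at least escape it, so that a hostile value (for example, a fake user agent containing <script>) cannot inject markup. A minimal sketch in plain JavaScript; escapeHtml and userAgentLine are hypothetical helpers, not JSUS functions, and env is again a plain object standing in for the environment.

```javascript
// Escape the characters that are significant in HTML text and attribute
// values, so that client-supplied strings are displayed literally.
// "&" must be replaced first, or the other entities would be double-escaped.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical example: echo the (unverified) user agent back safely.
function userAgentLine(env) {
  const agent = env.HTTP_USER_AGENT || "unknown";
  return `<p>Your browser claims to be: ${escapeHtml(agent)}</p>`;
}
```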

Having said that, a couple of these variables can be of some limited use, as long as absolute secrecy is not required, or over a secure network connection. PATH_INFO contains whatever follows the program's own path in the request URL, starting with the separating slash. For example, a web page may contain a link to a CGI program followed by extra path components that identify the requested action more precisely:

<a href="http://example.com/cgi/program/some/thing/extra">a very specific link</a>

In this case, when cgi/program is executed, SCRIPT_NAME would be /cgi/program and PATH_INFO would be /some/thing/extra; the suffix plays no part in locating the program itself. (The same web page might also contain multiple links to the same program, each passing a different PATH_INFO.)
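
A program can split PATH_INFO into its components and dispatch on them. The sketch below is plain JavaScript under stated assumptions: pathInfoSegments and chooseAction are hypothetical names, and the list of known actions is invented for illustration. Since the suffix is supplied by the client, anything unrecognized is treated as an error rather than trusted.

```javascript
// Split PATH_INFO (e.g. "/some/thing/extra") into its path segments.
// Empty segments from leading, trailing or doubled slashes are dropped,
// so the result is the same with or without the leading slash.
function pathInfoSegments(pathInfo) {
  return (pathInfo || "").split("/").filter((segment) => segment !== "");
}

// Hypothetical dispatcher: the first segment selects an action and the
// rest are its arguments; unknown actions are reported, not executed.
function chooseAction(pathInfo) {
  const [action = "index", ...args] = pathInfoSegments(pathInfo);
  const knownActions = ["index", "list", "show"];
  if (!knownActions.includes(action)) {
    return { action: "error", args: [action] };
  }
  return { action, args };
}
```

With a link ending in /cgi/program/show/item/42, chooseAction would select "show" with the arguments "item" and "42".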

Similarly, QUERY_STRING is intended to hold data entered into a "form" and submitted to the web server as a GET request. It contains everything in the URL following the ? character. As originally intended, it normally holds "name=value" pairs separated by ampersands, with spaces encoded as + and other special characters as %XX escapes. However, it is important to remember that a client program can write anything after the ? character, so caution is always advised. For a POST request, the same name=value pairs are passed to the program over stdin instead, and CONTENT_LENGTH gives the number of bytes to read. In this case the form data is not part of the URL, so it cannot be saved as part of a link (and hidden form fields are less conspicuous to the remote user, since they do not appear in the browser's address bar).
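
The name=value format described above (application/x-www-form-urlencoded) can be decoded with a few lines of plain JavaScript. This is a sketch, not a JSUS library function: parseFormData is a hypothetical name, it would apply equally to QUERY_STRING and to a POST body read from stdin, and it silently skips pairs with malformed percent escapes, since the client can send anything.

```javascript
// Decode one form-encoded component: "+" stands for a space and
// %XX sequences are percent-encoded UTF-8 bytes.
function decodeComponent(text) {
  try {
    return decodeURIComponent(text.replace(/\+/g, " "));
  } catch (err) {
    // Malformed escapes such as "%zz" come from careless or hostile clients.
    return null;
  }
}

// Parse "name=value&name2=value2" into an object with no prototype,
// so names like "__proto__" are stored as ordinary keys.
// If a name repeats, the last occurrence wins.
function parseFormData(text) {
  const result = Object.create(null);
  for (const pair of (text || "").split("&")) {
    if (pair === "") continue;
    const eq = pair.indexOf("=");
    const name = decodeComponent(eq < 0 ? pair : pair.slice(0, eq));
    const value = decodeComponent(eq < 0 ? "" : pair.slice(eq + 1));
    if (name === null || value === null) continue; // skip malformed pairs
    result[name] = value;
  }
  return result;
}
```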

The historically misspelled HTTP_REFERER contains the URL of the web page containing the link to the CGI program. It can be used, for example, to re-display that page after the program has completed, by writing a Location header to stdout. However, this variable will be unset or empty if the program was invoked from the web browser's address bar rather than from a hyperlink, and browsers may also withhold it for privacy reasons. And, again, it is easily spoofed by the client.
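
Given that HTTP_REFERER may be missing or forged, a redirect based on it should have a fallback and some basic validation. The sketch below is plain JavaScript; redirectBack and its fallback argument are hypothetical. It only accepts a single-line http or https URL, which also stops a forged value from smuggling extra header lines into the response, and its output is a Location header followed by the empty line that ends the headers.

```javascript
// Build a CGI response that sends the browser back to the referring page,
// or to a trusted fallback URL if the referer is missing or suspicious.
function redirectBack(env, fallback) {
  const referer = env.HTTP_REFERER || "";
  // \S+ rejects spaces, CR and LF, so no extra headers can be injected.
  const target = /^https?:\/\/\S+$/.test(referer) ? referer : fallback;
  // Location header, then the empty line; no body is needed.
  return `Location: ${target}\n\n`;
}
```

A stricter version might also check that the referer's host matches SERVER_NAME, so that the program cannot be used to bounce visitors to arbitrary sites.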

The HTTP_COOKIE variable contains all cookies stored in the web browser for this user that apply to this particular domain (and path). They are passed as name=value pairs, separated by semicolons (each normally followed by a space). The uses for cookies are almost limitless and are therefore beyond the scope of this manual. For more information, see HTTP cookies on Wikipedia or search elsewhere on the web.
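
Splitting HTTP_COOKIE into individual cookies is straightforward. A minimal plain-JavaScript sketch (parseCookies is a hypothetical helper, not a JSUS function): values are returned exactly as sent, because any encoding was chosen by whoever set the cookie, and like everything else from the client they should not be trusted without checking.

```javascript
// Parse HTTP_COOKIE ("name1=value1; name2=value2") into an object with
// no prototype. Entries without a name are skipped; if a name repeats,
// the last occurrence wins.
function parseCookies(header) {
  const cookies = Object.create(null);
  for (const part of (header || "").split(";")) {
    const item = part.trim();
    const eq = item.indexOf("=");
    if (eq <= 0) continue; // empty, nameless or malformed entry
    cookies[item.slice(0, eq).trim()] = item.slice(eq + 1).trim();
  }
  return cookies;
}
```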

The standard CGI model is more than sufficient for many simple JavaScript applications when running under Linux and JSUS. JSUS does not directly support alternatives such as fastCGI, because the problem they purport to address is largely non-existent in our working environment. The idea behind them is that a web server can become bogged down under heavy load, because a new CGI process has to be started for each new request. However, this is mostly a Windows OS problem, where creating a new process really does require a great deal of extra effort (which we consider a major design flaw). Furthermore, the Windows-style "solution" (lightweight "threads") introduces an entirely new set of security issues that we feel to be unacceptable. In modern versions of Unix, especially Linux, the cost of creating a new process is essentially the same as for a new thread, but without the security issues. However, if truly necessary, there is nothing to prevent a programmer from writing a JSUS program that does comply with the fastCGI protocol. Of course, a WebSocket application would be much faster, more flexible and probably less complicated.
