Pipes, named and anonymous

Syntax:

t/f = mkfifo(path[,mode])

num = mkpipe([read,write[,size]])

str = get[b](path|fd[,max])

num = put[b](path|fd,str)

str = ask([str,][max])

num = say(str)

num = cry(str)

t/f = close(path)

Synopsis:

UNIX has long supported both anonymous and named pipes as one of the standard methods of inter-process communication (IPC). In most operating systems, a pipe is a unidirectional connection between two processes (or, in Linux, between two pipe ends within a single process). One or more processes write data into one end of the pipe and, at the other end, one or more processes can read whatever data is already in the pipe. The only differences between anonymous and named pipes are the way they are created and the fact that named pipes exist within directories of the file system (just like UNIX domain sockets). As such, they can be located and opened publicly by any program with appropriate permissions. Anonymous pipes must be created dynamically and are passed between processes privately, by being inherited across a fork().

Pipes are sometimes called FIFO queues (short for First In, First Out). However, this is something of a misnomer because writes longer than 4K bytes may be received interleaved with each other at the other end of the pipe. At one time, pipes were the fastest means of passing moderate amounts of data from one process to another because, even though named pipes live in the file system, there is no actual I/O against any external medium (communication is entirely within the kernel). Today, shared memory provides a faster and in many ways superior solution (see memory mapped files), and sockets provide full duplex communication, even across a network. Nevertheless, there are still many valid uses for pipes.

Named pipes are created either with the mkfifo command or with the mkfifo() function. If omitted, the optional mode argument defaults to 0660 (read and write by user and group only). Once created, any program can open the pipe for either reading or writing, but not both. While Linux itself permits a pipe file to be opened read-write (as a special case), this is not supported by JSUS. Both ends of a named pipe must be "open" before data can be transferred and, normally, the first get() or put() will block (suspend the program) until the other end has been opened by another process.
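To see exactly what the default mode grants, the octal value can be decoded into the familiar ls-style permission string. This is a small stand-alone JavaScript sketch; modeString() is a hypothetical helper, not part of JSUS, and it ignores the process umask, which the system mkfifo() call normally also applies.

```javascript
// Hypothetical helper (not part of JSUS): render the low nine permission
// bits of a numeric mode the way `ls -l` shows them.
function modeString(mode) {
  const letters = "rwxrwxrwx"; // user, group, other
  let out = "";
  for (let bit = 8; bit >= 0; bit--) {
    // Bit 8 is user-read (0o400), bit 0 is other-execute (0o001).
    out += mode & (1 << bit) ? letters[8 - bit] : "-";
  }
  return out;
}

console.log(modeString(0o660)); // default mkfifo() mode: rw-rw----
```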

Anonymous pipes must be created with the mkpipe() function. Because these pipes never exist within the file system, they cannot be referenced by a regular file name. Instead, the read and write ends of the pipe must be specified as integer "file descriptors", as used in low-level I/O programming. Subsequent get() and put() calls then reference each end of the pipe using a pseudo-file name of the form "PIPE:n", where n is the number of the descriptor (similar to socket file names). Alternatively, the actual descriptor may be given instead of a path. If the standard I/O descriptors (0, 1 and 2) are specified, however, the pipe ends replace the three standard files that every program has open on startup, and they should be referred to as "/dev/stdout", etc., or through the ask(), say() and cry() functions. Indeed, the most common use of anonymous pipes is to transfer data from stdout in one process to stdin in another process. In such cases the other process can use standard file I/O without even knowing that a pipe has been created, and it need not even be written as a JSUS program. For this reason, if no arguments are given, the default is mkpipe(0,1).

For descriptors greater than 2, the programmer can choose any integer up to the system maximum (but not exceeding 16,777,215). On the first call to mkpipe(), which creates a new pipe, these descriptors must not already be open. The standard file descriptors, however, may already be open on any non-pipe file (which will then be closed before the new pipe is created). If either descriptor already refers to a pipe, both descriptors must refer to opposite ends of the same anonymous pipe, in the same directions as originally created. This allows a JSUS process to gain access to a pipe created in another program (perhaps not written in JavaScript). Such a second call is not necessary for pure JSUS child processes because all necessary information is inherited across the fork() function. These child processes can simply continue to use the pipe, concurrently with their parent, without any additional calls.

Alternatively, the descriptor for either or both ends of the pipe may be specified as -1. In this case, JSUS will assign the next available descriptor to that end of the pipe. This is most convenient when it is difficult to determine which descriptors are currently available to the program. On success, mkpipe() returns the actual values of both descriptors combined into a single number, input*16777216+output. This is equivalent to a 48-bit integer with the input descriptor in the high 24 bits and the output descriptor in the low 24 bits. A JavaScript program can easily obtain the corresponding file names with input="PIPE:"+Math.floor(value/16777216); output="PIPE:"+value%16777216. (On error, mkpipe() returns null, with standard error handling.)
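The packing and unpacking can be wrapped in two small functions. This is a minimal sketch in plain JavaScript; encodeMkpipe() and decodeMkpipe() are hypothetical helper names, not JSUS functions, and the layout is taken directly from the formula above.

```javascript
// Hypothetical helpers (not part of JSUS) for the value returned by mkpipe():
//   value = input * 16777216 + output
// with each descriptor limited to 24 bits (0..16,777,215).
const FD_RADIX = 16777216; // 2 ** 24

function encodeMkpipe(input, output) {
  for (const fd of [input, output]) {
    if (!Number.isInteger(fd) || fd < 0 || fd >= FD_RADIX) {
      throw new RangeError("descriptor out of range: " + fd);
    }
  }
  return input * FD_RADIX + output;
}

function decodeMkpipe(value) {
  if (value === null) return null; // mkpipe() returns null on error
  const inFd = Math.floor(value / FD_RADIX); // high 24 bits
  const outFd = value % FD_RADIX;            // low 24 bits
  return { inFd, outFd, input: "PIPE:" + inFd, output: "PIPE:" + outFd };
}

const ends = decodeMkpipe(encodeMkpipe(5, 6));
console.log(ends.input, ends.output); // PIPE:5 PIPE:6
```

Plain arithmetic is used instead of the shift and mask operators because JavaScript's bitwise operators truncate to 32 bits, while the combined value can need up to 48 bits. Since 2^48 is below 2^53, every combined value is still represented exactly as a JavaScript number.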

Under Linux, it is possible for both ends of an anonymous pipe to be accessed from a single process. For example, a secondary program written in JavaScript could be executed with load(), instead of jump(), for a significant saving in system overhead. The load()ed child writes all of its output to stdout and then terminates. Immediately after the load(), the parent can process all of the data by reading stdin. Unfortunately, pipes have a maximum capacity which, if exceeded, causes the write (and usually the entire process) to fail. This can be avoided by giving the specific size (i.e. capacity) of the pipe, in units of 1K bytes, as the optional third argument. If this is zero or omitted, the pipe capacity is left at the system default (64K for Linux). Otherwise, the value is rounded up to the next page size increment, up to the system maximum specified in /proc/sys/fs/pipe-max-size (1M for Linux). If running as root, even this system maximum may be exceeded. It should also be noted that there are sometimes good reasons for specifying a capacity smaller than the system default.
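The capacity rule above can be modelled as simple arithmetic, which helps when estimating whether a load()ed child's output will fit. This is a sketch under stated assumptions: pipeCapacity() is a hypothetical function, the 4K page size and the 64K and 1M limits are typical Linux values passed as defaults, and "up to the system maximum" is read as clamping for non-root callers. The kernel itself may round a request up further, so the real capacity can be larger than this estimate.

```javascript
// Hypothetical model (not part of JSUS) of the capacity rule described
// above. Defaults are assumptions: 4K pages, a 64K default capacity and
// a 1M pipe-max-size.
function pipeCapacity(sizeK, opts = {}) {
  const {
    pageSize = 4096,
    defaultSize = 64 * 1024,
    maxSize = 1024 * 1024, // value of /proc/sys/fs/pipe-max-size
    isRoot = false,
  } = opts;
  if (!sizeK) return defaultSize; // zero or omitted: keep the system default
  const bytes = sizeK * 1024;     // the argument is in units of 1K bytes
  const rounded = Math.ceil(bytes / pageSize) * pageSize; // next whole page
  return isRoot ? rounded : Math.min(rounded, maxSize);   // root may exceed the maximum
}

console.log(pipeCapacity(0));    // 65536
console.log(pipeCapacity(5));    // 8192 (5K rounded up to two pages)
console.log(pipeCapacity(2048)); // 1048576 (capped at pipe-max-size)
```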

Except as noted, the semantics for named and anonymous pipes are identical (and similar to those for regular files). But this can be complicated by the fact that, while a pipe may have multiple readers as well as multiple writers, any data already read from the pipe can no longer be accessed from another process (unlike a regular file). So, for reliability reasons, there should never be more than one process reading from a pipe at any given time. In addition, pipes are simple, binary byte streams, with no concept of line or record boundaries. Furthermore, native Linux pipes have no understanding of the Unicode strings preferred by JavaScript programs.

JSUS addresses these limitations with internal wrappers around the native pipes. Writing into a pipe is fairly straightforward, and any number of processes may put(), say() or cry() any number of strings into the same pipe. Each string will be received atomically at the read end of the pipe, as long as its multi-byte UTF-8 encoding totals less than 4096 bytes. If the pipe is already too full to accept the new data, the process will block until sufficient data has been removed from the pipe. If there is no longer any reader at the other end of the pipe, the write will fail, returning zero (with errno = EPIPE).
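Because the atomicity limit applies to UTF-8 bytes rather than JavaScript characters, a string's length property is not a reliable guide. The following stand-alone sketch computes the byte count with the standard TextEncoder; utf8Length() and isAtomicWrite() are hypothetical helpers, and any bytes JSUS itself might add to a write are not counted.

```javascript
// Hypothetical check (not part of JSUS): would put(), say() or cry() deliver
// this string atomically under the rule above, i.e. is its UTF-8 encoding
// shorter than 4096 bytes?
const ATOMIC_LIMIT = 4096;
const encoder = new TextEncoder();

function utf8Length(str) {
  return encoder.encode(str).length; // number of UTF-8 bytes, not characters
}

function isAtomicWrite(str) {
  return utf8Length(str) < ATOMIC_LIMIT;
}

// Character count is not byte count: "€" takes 3 bytes in UTF-8.
console.log("€".repeat(1365).length, utf8Length("€".repeat(1365))); // 1365 4095
console.log(isAtomicWrite("€".repeat(1366)));                        // false
```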

For reading, JSUS maintains an internal buffer equal in size to the capacity of the pipe (default 64K in Linux). All read requests are satisfied entirely from this buffer and, when it is empty, it is refilled by reading the entire contents of the pipe. (The process blocks if the pipe itself is empty.) If the max argument is omitted, JSUS performs its own "record" handling by searching the unread data in the buffer for a new-line character (\n). If one is found, get() returns everything up to and including the new-line (and removes this line from the buffer). If none is found (or if max is zero), get() returns all unread data in the buffer, which is then emptied. This allows "records" to be easily written and parsed as single lines.
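The line-splitting behaviour can be illustrated with a small in-memory model. This is not the JSUS implementation: ReadBufferModel and its fill() method are hypothetical, data is held as a JavaScript string rather than raw bytes, only an omitted or zero max is modelled, and an empty buffer simply yields null where the real get() would refill from the pipe (blocking if necessary).

```javascript
// Simplified model (not the JSUS implementation) of how get() hands out
// data from its internal read buffer when max is omitted or zero.
class ReadBufferModel {
  constructor() {
    this.data = "";
  }

  // Stands in for refilling the buffer with the current pipe contents.
  fill(chunk) {
    this.data += chunk;
  }

  get(max) {
    if (max !== undefined && max !== 0) {
      throw new RangeError("only an omitted or zero max is modelled");
    }
    // Real get() would refill here (blocking if the pipe is empty) and
    // return null only at end-of-file; the model simply returns null.
    if (this.data === "") return null;
    const nl = this.data.indexOf("\n");
    if (max === undefined && nl !== -1) {
      const line = this.data.slice(0, nl + 1); // keep the terminating \n
      this.data = this.data.slice(nl + 1);
      return line;
    }
    // No new-line buffered, or max === 0: hand over everything.
    const all = this.data;
    this.data = "";
    return all;
  }
}

const buf = new ReadBufferModel();
buf.fill("alpha\nbeta\ngam");
console.log(JSON.stringify([buf.get(), buf.get(), buf.get(), buf.get()]));
// ["alpha\n","beta\n","gam",null]
```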

Embedded nulls can also be read using a specific max block value, but this should only be attempted if the length of the multi-byte UTF-8 representation can be determined in advance. Typically, this means the data should be coded as pure ASCII, which occupies only one byte per character in the pipe. Alternatively, each record could be written into the pipe prefixed by the byte length of its UTF-8 data (computed before writing), with the reader fetching the length first as a separate operation.
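The length-prefix approach can be sketched in plain JavaScript. The framing format here is an assumption chosen for illustration: the decimal UTF-8 byte length on its own line, followed by the record bytes. With this format a JSUS reader could take the length line with a plain get() and then request exactly that many bytes with a max argument, assuming max counts bytes as the discussion of UTF-8 lengths suggests. frameRecord() and parseFrames() are hypothetical names; parseFrames() works on a byte array so it can be tested without a pipe.

```javascript
// Sketch of length-prefixed records (the framing format is an assumption):
// each record is written as its UTF-8 byte length in decimal, a new-line,
// and then the UTF-8 bytes of the record itself.
const enc = new TextEncoder();
const dec = new TextDecoder();

// Writer side: the string that would be passed to put().
function frameRecord(str) {
  return enc.encode(str).length + "\n" + str;
}

// Reader side: extract every complete record from a byte array, returning
// the decoded strings and how many bytes were consumed.
function parseFrames(bytes) {
  const records = [];
  let pos = 0;
  while (pos < bytes.length) {
    const nl = bytes.indexOf(0x0a, pos); // end of the length line
    if (nl === -1) break;                // length line not complete yet
    const digits = dec.decode(bytes.subarray(pos, nl));
    if (!/^\d+$/.test(digits)) throw new Error("bad length prefix: " + digits);
    const end = nl + 1 + Number(digits);
    if (end > bytes.length) break;       // record body not complete yet
    records.push(dec.decode(bytes.subarray(nl + 1, end)));
    pos = end;
  }
  return { records, consumed: pos };
}

// Records may safely contain new-lines and nulls.
const wire = enc.encode(frameRecord("héllo\nworld") + frameRecord("a\u0000b"));
console.log(parseFrames(wire).records.length); // 2
```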

An "end-of-file" condition is detected after the last writer process has closed its end of the pipe, with no more data in the pipe or buffer. In this case a null is returned from the read (without any error). For a named pipe only, the reader process may then issue another get(), causing the process to block until another process writes more data into the pipe (effectively ignoring the condition). For an anonymous pipe, losing the last writer can only be detected if the reader process does not, itself, have the write end open. A close() can be issued against the write-end file but, unfortunately, this means new writer processes can no longer be forked from the reader (because they won't inherit any file to write to). A similar situation occurs when a writer process cannot detect the loss of the reader because it has its own read-end file still open. As an alternative to close(), an asynchronous timer may be used, prior to the get() or put(), to interrupt a deadlocked (blocked) read or write operation.

See also: