str | t/f = etc([path[,locale|value]])
str = gulp(string|file|fd[,locale])
The history of computer programming is replete with multiple, different means of passing configuration or "start up" information to a program. In the Unix (and Linux) world, the most complete approach is the use of human-readable "configuration files", which are read and processed by the program before it begins to operate. The main advantage of this approach is that these files can be created and modified with a simple text editor and they can contain comments to guide the administrator in inserting appropriate values. But there are many more disadvantages. First, there are no accepted "standards" for these files, so every program has to create a design of its own, and the administrator has to learn to understand all of the designs of all of the programs he wishes to use. This leads to the second, larger problem, of creating a great deal of unique, custom start-up code in each and every program. And, generally, the more useful the program the more of this cumbersome cruft has to be produced.
The Windoze world started out using a similar approach, if slightly more standardized, and called them ".ini" files. Later, the Windows "Registry" was created — a database of configuration values which can be set and retrieved by individual programs via system calls. In principle, this was a greatly improved approach but, unfortunately, the registry has its own issues (beyond the scope of this discussion) and the Unix world has never broadly adopted a similar strategy. Windows and Unix also share a history with other methods of passing initialization information — command line options and environment variables, for example. These also suffer from the same non-standardization problems as configuration files.
The ETC database is designed to cohesively integrate all of these into a single configuration database system, similar to the registry, but overcoming many of its limitations. The etc() function returns text values from the database and largely eliminates the need for parsing during initialization. It also allows configuration values to be easily shared between multiple programs. The gulp() function produces a new output string from an input file (or string), replacing all "macro" references in the input text with values from the database. This is useful for creating conventional configuration files, sharing multiple configuration values between them. It is also very useful for producing easily customized HTML pages.
The database consists of from one to three, human-readable, text files. They provide up to three levels of administrative authority (five if environment and command line overrides are considered). At the top, the "root" file (/etc/etc) is maintained by a qualified administrator in the OS distribution's developer team (or the administrator of a custom installation). The "stem" file (/etc/local/etc) is intended for use by the local site administrator where an ETC aware distribution is installed. If present, the values in this file typically override the like-named values in the root file (but see below). A "leaf" file (typically ~/.etc) is created automagically in the home folder of every real (logged-in) user who runs a program which writes a new database value. A value in one of these leaf files typically overrides those in both the root and stem files (again, see below). Finally, a value with the same name found in the environment typically overrides all ETC file values and a command line argument overrides everything.
Each ETC file is organized as a pseudo-file system, with each "file" comprising a single name=value pair, containing a single value string, terminated by a single line-feed character, "\n". Folders (directories) are also permitted and each may contain any number of name-value pairs (and nested folders, to any depth). Values are UTF-8 strings and may therefore contain any valid Unicode sequences, EXCEPT embedded ASCII control characters. They may also contain the first argument of a getText() function call, which is then automatically translated before return or macro replacement. If needed, control characters may be encoded as standard escape sequences (below). The main difference from a real file system is that the names of folders and values MUST be valid JavaScript identifiers.
The most common example of the etc() function is the retrieval of a single value (or an entire folder) based on its pseudo-path in the ETC database:
etc(path) or etc(path,locale)
The rarely needed second argument specifies the language locale for translation. Normally, this can be omitted because the user's default locale will be taken from the environment. In addition, if a retrieved value does not begin with a period (or begins with .. or ./) the locale is ignored and no translation is attempted.
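The translation rule above can be expressed as a small predicate. This is only a sketch of the rule as stated; isTranslatable is a hypothetical helper name, not part of JSUS, and the real implementation may apply further checks.

```javascript
// Hypothetical helper: decides whether a retrieved value would be a
// candidate for translation, following the rule described above.
function isTranslatable(value) {
  // Translation applies only to values that begin with a period,
  // excluding those that begin with ".." or "./".
  return (
    value.startsWith(".") &&
    !value.startsWith("..") &&
    !value.startsWith("./")
  );
}
```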
As with any other file path, the "path" argument consists of an optional, hierarchical list of folder names and an optional target "file" name, each separated by a solidus (slash). The search always begins at the top, so a leading slash is optional. When retrieving an entire target folder, a trailing slash is required. Leading and trailing slashes are usually omitted when referring to a single value in the top level folder (or the command line or environment).
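To make the path rules concrete, here is a minimal sketch of how such a path might be split into its parts. The function name parseEtcPath and the shape of its result are assumptions made for illustration; JSUS does not necessarily expose anything like it.

```javascript
// Hypothetical helper: splits an ETC pseudo-path into folder names and an
// optional target name, following the rules described above.
function parseEtcPath(path) {
  // A trailing slash means the target is an entire folder.
  const isFolder = path.endsWith("/");
  // The leading slash is optional, so empty segments are simply dropped.
  const segments = path.split("/").filter((s) => s.length > 0);
  // For a value path, the last segment is the target "file" name.
  const name = isFolder ? null : segments.pop() ?? null;
  return { folders: segments, name, isFolder };
}
```

Under this sketch, "/time/zone" and "time/zone" parse identically, while "/my/new/" yields two folders and no target name.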
In principle, etc() searches first for a command line argument, with the path string treated the same as a string argument to the arg() function (thus, it may not contain slashes and may not begin with a digit). If an argument with the same name is found, etc() returns that value, instead of searching the entire ETC database. Obviously, this search can only proceed if the target value would normally be located in the "root" folder of the ETC database itself, or in the base folder of a text domain ("/$/name", below). Next, if no argument is found, etc() looks for an environment variable with the name from "path", and returns any associated value. Only then, if still not found, does it search the ETC files themselves: first the leaf (~/.etc), then the stem (/etc/local/etc), and finally the root (/etc/etc).
One exception to this search order occurs if the name of a value, or any containing folder, begins with an underscore character (_). Such values and folders (and everything within them) are considered to be "protected" from overrides. In this case, the search order is reversed, and begins with the root, followed by the stem and then the leaf. Command line arguments and environment variables are not even considered, in a protected search. This protection is rarely needed, in practice. The root and stem files are well protected, and malicious changes to leaf files affect only those users foolish enough to allow them.
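The normal and protected search orders can be modelled with a few in-memory maps. This is a simplified sketch, not the JSUS implementation: lookupEtc and the layers object are hypothetical, each ETC file is flattened to a Map keyed by full pseudo-path, and the text domain case ("/$/name") is ignored.

```javascript
// Hypothetical model of the search order. Command line arguments and
// environment variables are Maps keyed by bare name; the three ETC files
// are Maps keyed by full pseudo-path.
function lookupEtc(path, layers) {
  const segments = path.split("/").filter((s) => s.length > 0);
  const key = "/" + segments.join("/");
  // A leading underscore on the value or any containing folder protects it.
  const isProtected = segments.some((s) => s.startsWith("_"));
  let files = [layers.leaf, layers.stem, layers.root];

  if (isProtected) {
    // Protected search: root first, and no argument or environment lookup.
    files = [layers.root, layers.stem, layers.leaf];
  } else if (segments.length === 1) {
    // Only top level names can match an argument or environment variable.
    const name = segments[0];
    if (layers.args.has(name)) return layers.args.get(name);
    if (layers.env.has(name)) return layers.env.get(name);
  }
  for (const file of files) {
    if (file.has(key)) return file.get(key);
  }
  return false; // not found anywhere
}

// Example data, invented purely for illustration.
const demoLayers = {
  args: new Map([["identity", "cli-owner"]]),
  env: new Map([["identity", "env-owner"], ["editor", "vi"]]),
  leaf: new Map([["/editor", "nano"], ["/_policy/shell", "/bin/evil"], ["/time/zone", "UTC"]]),
  stem: new Map([["/time/zone", "Europe/Paris"]]),
  root: new Map([["/_policy/shell", "/bin/sh"], ["/identity", "root-owner"], ["/time/zone", "GMT"]]),
};

console.log(lookupEtc("identity", demoLayers));       // cli-owner
console.log(lookupEtc("/time/zone", demoLayers));     // UTC
console.log(lookupEtc("/_policy/shell", demoLayers)); // /bin/sh
```

In the last call the leaf file holds a conflicting value, but the protected search consults the root file first, so the leaf entry is never reached.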
The etc() function may also be called to insert, delete or replace values (or folders) in the ETC database, but ONLY in the leaf file:
etc(path,value)
This allows application programs to maintain per-user configuration values, but does not permit them to modify the root and stem files intended for system administrators (these files should be modified only with a secure, specialized program, /sbin/etcetera). The leaf file is automatically created in the home folder, as needed (unless the real user lacks a valid entry in the system password file). The new "value" should be a string intended to be inserted (if the path does not exist), or to replace an existing value. If needed, control characters may be embedded within any value by the standard escape sequences defined in JavaScript (and most other languages) with the backslash itself escaped by repetition. Thus, "\\t" and "\\x09" are both translated to a tab character, on retrieval. An empty string legitimately eliminates any previous value, or creates a new empty value, while leaving the name-value pseudo-file still present in the database (important if "notes" are attached to the value, below). A new "value" specified as the unquoted number zero causes the "file" or folder to be completely deleted from the database, including any attached notes. A new folder can be inserted by specifying a non-existent path (ending in slash) and any non-zero "value" (which is promptly ignored by JSUS). This is rarely required, however, since any missing folders are created automatically when new values (or notes) are inserted. For example:
if (etc("/my/new/value","line1\\nline2"))
    strvar = etc("/my/new/value");
etc("/my/new/value",0);
The first line creates two new folders, "my" and "new", and the pseudo-file "value". This contains two lines of text, separated with an escaped line-feed. The first call returns true, and the second line is therefore executed. The second call returns the value of "value" as a JavaScript string, with the "\\n" properly translated to integer 10 (x0A). If the value had somehow been deleted by another program, it would return false. The third line does, in fact, delete the value, but NOT the two folders. These must be deleted separately (etc("/my/",0) will do that nicely). In practice, applications are expected to create their own folder names, and to leave both folders and values in place.
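The translation of stored escape sequences on retrieval can be approximated as follows. This is a sketch under assumptions: decodeEscapes is a hypothetical name, and the set of sequences handled here (\n, \t, \r, \\, \xHH and \uHHHH) is a guess at a reasonable subset, not a statement of what JSUS actually supports.

```javascript
// Hypothetical decoder: converts escape sequences stored as plain text
// (a backslash followed by a code) into the characters they stand for.
const SIMPLE_ESCAPES = new Map([
  ["n", "\n"],
  ["t", "\t"],
  ["r", "\r"],
  ["\\", "\\"],
]);

function decodeEscapes(stored) {
  return stored.replace(
    /\\(x[0-9a-fA-F]{2}|u[0-9a-fA-F]{4}|.)/g,
    (match, body) => {
      // \xHH and \uHHHH carry a hexadecimal character code.
      if (body.length > 1) {
        return String.fromCharCode(parseInt(body.slice(1), 16));
      }
      // Known single-character escapes; anything else keeps the bare character.
      return SIMPLE_ESCAPES.has(body) ? SIMPLE_ESCAPES.get(body) : body;
    }
  );
}

// The JavaScript literal "line1\\nline2" contains a backslash and an "n",
// which is what would be stored in the database.
const decoded = decodeEscapes("line1\\nline2");
console.log(decoded.charCodeAt(5)); // 10
```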
To make life a little easier, the etc() and gulp() functions both support the use of the getText() text domain as the name for a program's collection of configuration values. The occurrence of a single dollar symbol ($) as the first element in any path is automagically replaced by the default text domain, if one has been set. Text domains are similar to Internet domains, for example, ".application.company". This can be used as the name of a program's main configuration folder, in any of the ETC files, but especially the per-user leaf file. It should be noted that dollar and period characters are not normally permitted in path names, but they are permitted here as a useful special case. When the same name is used in the text translation database, this also helps to reduce naming conflicts. (If not obvious, the text domain is reversed within the path, in this example as /company/application/).
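The dollar substitution and domain reversal can be illustrated with a short sketch. The function expandTextDomain is hypothetical, and passing the text domain as an explicit argument is an assumption made to keep the example self-contained; in JSUS the default text domain is set elsewhere.

```javascript
// Hypothetical helper: replaces a leading "$" path element with the
// reversed text domain, e.g. ".application.company" -> /company/application/.
function expandTextDomain(path, textDomain) {
  const segments = path.split("/").filter((s) => s.length > 0);
  // Only a "$" in first position is special, and only if a domain is set.
  if (segments[0] !== "$" || !textDomain) return path;

  const reversed = textDomain
    .split(".")
    .filter((s) => s.length > 0)
    .reverse();
  const tail = segments.slice(1);
  // Keep a trailing slash when the result names a folder.
  const trailing = path.endsWith("/") || tail.length === 0 ? "/" : "";
  return "/" + reversed.concat(tail).join("/") + trailing;
}

console.log(expandTextDomain("/$/name", ".application.company"));
// /company/application/name
```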
An etc() call without any arguments may be used to release the resources used by the ETC database system (which include memory mappings of the three files). It isn't essential, especially for short lived programs. But longer running programs are usually finished with retrieving and setting values early in their life cycle. It makes sense, then, to release these resources back to the system for use by other programs. If needed later, for example saving values during termination, the next etc() call will automatically reclaim the needed resources, without much fuss or overhead.
The gulp() function reads an entire text file (or a long string) and scans it line by line for macro references, which are replaced by values from the ETC database. Macros are embedded in the text with the format "&&path&&", where "path" has exactly the same semantics as in the etc() function. Thus, on our own system, &&identity&& would be replaced by the name of the system's owner (but might be overridden by an environment or command line option). Whereas &&/time/zone&&, beginning with a slash, would be replaced with the name of the system's time zone (and could not be overridden beyond the leaf file). Macros may also be embedded in normal database values, permitting up to three levels of macro re-direction.
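The macro replacement described above might look roughly like this. It is a sketch rather than the actual gulp() code: expandMacros is a hypothetical name, the lookup callback stands in for etc(), unknown macros are left in place (an assumption), and "three levels of re-direction" is interpreted here as at most three nested replacement passes.

```javascript
// Hypothetical macro expander. "lookup" stands in for etc(path) and should
// return a string, or false when no value exists.
function expandMacros(text, lookup, maxDepth = 3) {
  const expand = (input, depth) =>
    input.replace(/&&([^&\s]+)&&/g, (whole, path) => {
      const value = lookup(path);
      // Assumption: an unknown macro is left untouched in the output.
      if (typeof value !== "string") return whole;
      // Values may contain macros themselves, up to maxDepth passes.
      return depth < maxDepth ? expand(value, depth + 1) : value;
    });
  // The text is scanned line by line, as described above.
  return text
    .split("\n")
    .map((line) => expand(line, 1))
    .join("\n");
}

// Example values, invented for illustration.
const demoValues = new Map([
  ["identity", "Alice"],
  ["greeting", "Hello &&identity&&"],
]);
const demoLookup = (path) => (demoValues.has(path) ? demoValues.get(path) : false);

console.log(expandMacros("Hi: &&greeting&&!", demoLookup)); // Hi: Hello Alice!
```

The depth limit also prevents a value that refers to itself from expanding without end.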
The main purpose of gulp() is to provide system administrators with a simple method of managing the large configuration files used for traditional Unix server programs. In our own systems, we replace all variable values in most configuration files with macros, which can usually be shared across multiple files. The start-up scripts, which are all written in JavaScript, then gulp() the configuration file and write the result to a memory based file system, for use by the server program itself. This is one of the key techniques that has allowed us to build a pre-installed, pre-configured Linux server distribution that can be easily managed through a few simple web pages, even remotely. Almost all customization and changes are a simple matter of editing the ETC stem file, which is further simplified by embedded notes, in multiple languages.
The ETC database fully supports internationalized comments, or "notes". These can be attached to both values and folders, by appending a single colon (:) to the path and providing the default text in the second argument. For example:
etc("my/dollars:",".iFree Freedom isn't free");
The default note (by convention in English) is treated exactly the same as default text passed to the getText() function. It may begin with a full text code or, more often, with a simple text id (within the default text domain). If possible, a translated note is retrieved from the translation database and is returned to the caller (with the text code removed). If this is not possible, the default note is returned, while an error text is returned if there is no note at all.
Notes are not required if one has already been attached to the same name at a higher level of the ETC database, for example, to the default values in the root or stem files. In addition, as with getText() itself, the note may consist of nothing but a text id, in which case the translated or default note will be retrieved from the translation database. Furthermore, notes are rarely retrieved by normal programs, since they are intended for use mainly by the ETC database editor. In many cases, they can be safely dealt with administratively, outside the development process itself.