Passwords, Encoding and Encryption

Syntax:

str = crypt(plain[,salt|hash])

t/f = pass(plain,hashed)

str = btoa(text[,{16|8}])

str = atob(text[,{16|8}])

str = hash(algorithm,text[,{16|8}])

str = aes(bits,key|text[,{16|8}])

Synopsis:

Unix passwords are reasonably secure, as long as only a password hash and a random 'salt' value are stored, separately from the /etc/passwd file, and readable only by root. The crypt() function creates a Unix password hash string, using the like-named function from the system run-time library. However, the JSUS version does not support DES encryption (which is no longer considered secure). The MD5 algorithm is also generally considered less secure and should probably be avoided for new applications. The plain argument specifies the password string to be cryptographically hashed, after first being converted to UTF-8 encoding. The salt argument is a string used to strengthen the hash. It has the same format as a password hash:

$1$ssssssss$[eeeee . . .]

$5$ssssssss$[eeeee . . .]

$6$ssssssss$[eeeee . . .]

The 1, 5 or 6 specifies a hash with the MD5, SHA2-256 or SHA2-512 algorithm, respectively. sss... is the salt value, which is logically concatenated with the plain password and used to perturb the algorithm (for increased security). This part must contain 2 to 16 characters, randomly selected from A-Z, a-z, 0-9, a dot (".") or a slash ("/"), with eight characters normally used for standard UNIX password hashes. If the salt is passed as a string of hash marks ("#"), the function computes a random salt with the given number of characters. eee... is any previously computed hash of the password and salt, which is ignored when passed to crypt(). Thus, the second argument may be a complete hash string produced by an earlier call to crypt().

Alternatively, a hash argument may be given as the numeric argument 1, 5, or 6 (with the same meaning as above). A password hash is then computed and returned using the specified algorithm and a random eight character salt. If the salt|hash is completely omitted, a hash of 6 (SHA2-512) is assumed, with a random salt.

If successful, crypt() returns a string like that passed as the salt argument, but always with the hashed value appended (eee...). This last part has a fixed length of 22 (MD5), 43 (SHA2-256) or 86 (SHA2-512) characters. This string is the "password" value normally stored by Unix in the /etc/passwd or /etc/shadow file.
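The layout of such a string can be checked mechanically. The following is a minimal sketch in plain JavaScript, not part of JSUS: the helper name parseCryptString and the table CRYPT_ALGORITHMS are hypothetical, and the field lengths are taken from the figures quoted above.

```javascript
// Hypothetical helper (not a JSUS function): split a crypt-style
// string of the form $id$salt$hash into its fields.
const CRYPT_ALGORITHMS = {
  "1": { name: "MD5", hashLength: 22 },
  "5": { name: "SHA2-256", hashLength: 43 },
  "6": { name: "SHA2-512", hashLength: 86 },
};

function parseCryptString(str) {
  // The salt is 2 to 16 characters drawn from A-Z, a-z, 0-9, "." and "/".
  const match = /^\$([156])\$([A-Za-z0-9./]{2,16})\$([A-Za-z0-9./]*)$/.exec(str);
  if (!match) return null;
  const [, id, salt, hash] = match;
  const info = CRYPT_ALGORITHMS[id];
  // An empty hash part is a bare salt specification;
  // otherwise the hash part must have the fixed length for its algorithm.
  if (hash.length !== 0 && hash.length !== info.hashLength) return null;
  return { id: Number(id), algorithm: info.name, salt, hash };
}
```

A string with an empty hash part, such as "$6$abcdefgh$", parses as a salt specification, while a complete stored value must carry a hash part of exactly the expected length.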

The pass() function validates a pass phrase. It returns true if the "plain" pass phrase matches the "hashed" value when crypt()ed with the algorithm and salt contained in the hashed string. It returns null if they do not match or if there are any errors. The hashed string is usually the output of an earlier crypt() against the password, as stored in the system password files.
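The check described above amounts to re-hashing with the stored string as the salt argument and comparing the results. A minimal sketch of that logic follows; makePass and its cryptFn parameter are hypothetical names, and cryptFn stands in for any crypt()-like function. This is not the JSUS implementation.

```javascript
// Build a pass()-like verifier from any crypt()-like function.
// `cryptFn` is a hypothetical parameter standing in for crypt().
function makePass(cryptFn) {
  return function pass(plain, hashed) {
    try {
      // Re-hash using the algorithm and salt embedded in `hashed`.
      const result = cryptFn(plain, hashed);
      // true on a match, null otherwise (as described for pass()).
      return result === hashed ? true : null;
    } catch (err) {
      return null; // any error is also reported as null
    }
  };
}
```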

The crypt() and pass() functions both assume the pass phrase is stored in UTF-16 format and always convert it to UTF-8 before processing. This is reasonable for most JSUS function calls because JavaScript defines the contents of strings as 16-bit characters (while most operating systems are designed to work best with 8-bit characters). Strictly speaking, JavaScript strings are neither UTF-16 nor UCS-2; they are best described as arrays of unsigned 16-bit integers, each of which can hold any value from 0 through 65535. Naturally, any character can also contain an 8-bit "byte" value, in the range 0 through 255. This includes any and all 8-bit encodings such as ASCII, EBCDIC, UTF-8 or Latin-1. Strings can even hold pure binary data bytes in the lower half of each character. This is extremely important because, over the Internet, data is almost exclusively transferred in 8-bit encodings (most often, UTF-8).
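To make the distinction concrete, the sketch below converts an ordinary JavaScript string into a "byte string" holding one UTF-8 byte in the low half of each character, and back again. It uses the standard TextEncoder and TextDecoder classes; the function names toByteString and fromByteString are hypothetical and are not JSUS functions.

```javascript
// Convert a JavaScript string to a "byte string":
// one UTF-8 byte stored in the low half of each 16-bit character.
function toByteString(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  let out = "";
  for (const b of bytes) out += String.fromCharCode(b);
  return out;
}

// Reverse: read the low byte of each character as UTF-8 data.
function fromByteString(byteString) {
  const bytes = Uint8Array.from(byteString, (ch) => ch.charCodeAt(0) & 0xff);
  return new TextDecoder().decode(bytes);
}
```

For example, the single character "€" (code 0x20AC) becomes a three-character byte string holding 0xE2, 0x82 and 0xAC.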

The btoa() and atob() functions are replacements for the like-named JavaScript functions defined within the HTML5 standard. They convert to and from "Base64" encoding of data as 6-bit values (0 through 63), each transformed into a printable ASCII character (A-Z, a-z, 0-9, "+", or "/"). This encoding was originally invented to allow data to be sent in email, which officially only supports 7-bit ASCII characters! If you're a "bit" confused, it gets worse. The btoa() function is designed to convert a "binary" JavaScript string to a simple 64-character alphabet for transmission, while atob() does the reverse. Unfortunately, because Base64 was designed to encode 8-bit characters, some implementations fail when presented with characters having a code point greater than 255 (i.e. most of UCS-2). Other implementations convert the string to UTF-8 before encoding to Base64. But what if the string is already encoded in UTF-8 (or some other encoding)?

JSUS addresses this by supporting an optional second argument, after the text string, specifying whether the internal string has a 16- or 8-bit encoding. If omitted, btoa() defaults to a value of 16 and the string is first converted to UTF-8 before applying Base64 encoding (which covers most expected cases). In reverse, atob() converts from UTF-8 to UCS after decoding from Base64. If 8 bits are specified, UTF-8 conversions are suppressed and any character with a value higher than 255 is treated as an error (and null is returned). The same solution is applied to the hash() and aes() functions, which have similar difficulties and are given an optional, final, character size argument.
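The effect of the size argument can be imitated outside JSUS. The sketch below uses Node.js Buffer as a stand-in; btoaSized and atobSized are hypothetical names, and the behaviour is modelled only on the description above, not on the JSUS source.

```javascript
// Hypothetical Node.js stand-ins for the two-argument btoa() and atob().
function btoaSized(text, size = 16) {
  if (size === 16) {
    // Treat the string as text: convert to UTF-8, then to Base64.
    return Buffer.from(text, "utf8").toString("base64");
  }
  // Size 8: every character must already be a byte value (0 through 255).
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) > 255) return null;
  }
  return Buffer.from(text, "latin1").toString("base64");
}

function atobSized(encoded, size = 16) {
  const bytes = Buffer.from(encoded, "base64");
  // Size 16 decodes the bytes as UTF-8; size 8 leaves one byte per character.
  return size === 16 ? bytes.toString("utf8") : bytes.toString("latin1");
}
```

With these definitions, btoaSized("é") encodes the two UTF-8 bytes 0xC3 0xA9, while btoaSized("é", 8) encodes the single byte 0xE9, so the two calls yield different Base64 text for the same input string.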

The hash() function returns a string which is the cryptographic hash of the plain text string given as its second argument. (This is not the same as the crypt() function, which adds other features related to password management.) The required algorithm is specified as the numeric first argument. The values 5 and 1 represent the MD5 and SHA1 algorithms, both of which should normally be avoided because of cryptanalytic issues (they should be used only when specified as part of another protocol). The "best" algorithm, currently, is SHA2, which is specified as a 2 followed by the number of bits in the hash value (2224, 2256, 2384 or 2512). A future version of JSUS will support SHA3 (3xxx), but this is not presently in wide use and is not yet cryptographically essential. The special values of 0 and -1 may be used to select the "best" and previous best algorithm, respectively. Presently, these are set as SHA2-256 and SHA1. When SHA3 becomes more widely used it will (probably) become the new "best" algorithm and SHA2-256 the previous. This allows programs to automatically upgrade their algorithm to the latest generally accepted level, by hashing with the previous best if the latest fails.

The hash string is returned in 8-bit binary form, in the low order bytes of each 16-bit character. This string would typically be Base64 encoded before storage or transmission over a network, for example:

str = btoa(hash(2256,"perhaps a passphrase"),8);
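For comparison, roughly the same computation can be sketched with the Node.js crypto module. The DIGESTS table and the hashSketch function are hypothetical; the mapping from numeric codes to digest names is an assumption based on the description above, and the result is returned as a byte string (one byte per character) to mirror hash().

```javascript
const nodeCrypto = require("crypto");

// Assumed mapping from the numeric codes described above to Node digest names.
const DIGESTS = new Map([
  [5, "md5"], [1, "sha1"],
  [2224, "sha224"], [2256, "sha256"], [2384, "sha384"], [2512, "sha512"],
  [0, "sha256"], // current "best"
  [-1, "sha1"],  // previous "best"
]);

// Hypothetical stand-in for hash(algorithm, text[, size]).
function hashSketch(algorithm, text, size = 16) {
  const name = DIGESTS.get(algorithm);
  if (!name) return null;
  // Size 16: convert the text to UTF-8; size 8: use the low byte of each character.
  const input = Buffer.from(text, size === 16 ? "utf8" : "latin1");
  // Return the digest as a byte string, one byte per character.
  return nodeCrypto.createHash(name).update(input).digest("latin1");
}

// Base64-encode the byte string for storage or transmission.
const encoded = Buffer.from(hashSketch(2256, "perhaps a passphrase"), "latin1").toString("base64");
```

A SHA2-256 digest is 32 bytes, so the Base64 text in encoded is 44 characters long.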

The aes() function encrypts or decrypts one or more strings using Cipher Block Chaining (CBC) mode. It prepends the plain text with a one block (16 byte) explicit initialization vector, or nonce, computed randomly for each encryption call. It also extends any short ending block to the AES block size of 128 bits (16 bytes) using PKCS#7 padding. Thus, cipher text strings are always longer than the plain text, by 16 to 31 bytes. The nonce and padding are discarded during decryption and only the original plain text is returned. This is the technique most likely to be compatible with other AES implementations. It is intended for sending multiple encrypted messages over a communication channel (although the cipher text may also need to be Base64 encoded for some applications).
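The general scheme (CBC mode, a fresh random initialization vector per message, PKCS#7 padding) can be sketched with the Node.js crypto module. This is an illustration under assumptions, not the JSUS implementation: encryptCbc and decryptCbc are hypothetical names, the sketch places the initialization vector in front of the cipher text, and Node's default padding always adds between 1 and 16 bytes, which may not match aes() exactly.

```javascript
const nodeCrypto = require("crypto");

// Hypothetical helpers; `key` is a Buffer of 16, 24 or 32 bytes.
function encryptCbc(key, plainText) {
  const iv = nodeCrypto.randomBytes(16); // fresh random IV for every call
  const cipher = nodeCrypto.createCipheriv(`aes-${key.length * 8}-cbc`, key, iv);
  // Node applies PKCS#7 padding by default.
  const body = Buffer.concat([cipher.update(plainText, "utf8"), cipher.final()]);
  return Buffer.concat([iv, body]); // IV travels with the message
}

function decryptCbc(key, cipherText) {
  const iv = cipherText.subarray(0, 16); // first block is the IV
  const decipher = nodeCrypto.createDecipheriv(`aes-${key.length * 8}-cbc`, key, iv);
  const plain = Buffer.concat([decipher.update(cipherText.subarray(16)), decipher.final()]);
  return plain.toString("utf8"); // IV and padding are discarded
}
```

Because the initialization vector is random, encrypting the same plain text twice with the same key gives two different cipher texts, both of which decrypt to the original.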

The first call in a series sets the key to be used for subsequent calls to aes(). The first argument specifies the size of the key given as the second argument (128, 192 or 256 bits). A negative key size causes the following calls to decrypt cipher text rather than encrypt plain text. Subsequent calls must provide either a plain or cipher text string as the second argument, while the first argument must always have the same size and sign as in the first, key-setting call.

Internally, aes() maintains separate encryption and decryption states, allowing bi-directional encrypted conversations, using the same or different keys. This eliminates the expensive key setup overhead from individual messages, although a nonce is always computed for every encryption call. Either or both states may be reset by passing an empty string as the second argument, thereby ending the series. This must be done before any encryption key can be changed.

Keys and cipher text strings are always treated as 8-bit values in the low byte of each character. The third argument, even when defaulting to 16, applies only to the plain text: either the input to encryption or the result of decryption. By default, 16-bit UCS strings are automatically converted to and from UTF-8, before and after encryption (8-bit plain text is left as-is). Caution: while 7-bit ASCII characters can be used as key values, this is usually discouraged because the strength of the key is reduced. The most secure keys, like passwords, consist of random 8-bit values.
