

Character representation

In traditional (Edinburgh) Prolog, characters are represented using character codes. Character codes are integer indices into a specific character set. Traditionally the character set was 7-bit US-ASCII. 8-bit character sets have long been allowed as well, providing support for national character sets, of which ISO Latin-1 (ISO 8859-1) is applicable to many western languages. Text files are supposed to represent a sequence of character codes.

ISO Prolog introduces three types, two of which are used for characters and one for accessing binary streams (see open/4). These types are:


\begin{itemlist}
\item [code]
A \jargon{character-code} is an integer represent...
...ging.
\item [byte]
Bytes are used for accessing binary-streams.
\end{itemlist}

The current version of SWI-Prolog does not provide support for multi-byte character encodings. This implies, for example, that it is not capable of breaking a multi-byte encoded atom into characters. For SWI-Prolog, bytes and codes are the same, and one-character atoms are simple atoms containing one byte.

To ease the pain of these multiple representations, SWI-Prolog's built-in predicates dealing with character data are as flexible as possible: they accept data in any of these formats as long as the interpretation is unambiguous. In addition, for output arguments that are already instantiated, the character is extracted before unification. This implies that the following two calls are equivalent, both testing whether the next input character is an a.


\begin{code}
peek_code(Stream, a).
peek_code(Stream, 97).
\end{code}
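
The idea of extracting the character before unification can be sketched in plain Prolog. The predicates to_code/2 and same_character/2 below are hypothetical helpers introduced only for illustration; they are not part of SWI-Prolog and do not claim to reflect how the built-ins are implemented. The sketch assumes that an integer is taken as a code and a one-character atom as a character.

\begin{code}
% to_code(+CharOrCode, -Code)
% Hypothetical helper: reduce either representation to a character code.
% An integer is taken to be a code already.
to_code(X, Code) :-
    integer(X), !,
    Code = X.
% A one-character atom is converted with the ISO built-in char_code/2.
to_code(X, Code) :-
    atom(X),
    atom_length(X, 1),
    char_code(X, Code).

% same_character(+A, +B)
% Hypothetical helper: succeeds if A and B denote the same character,
% regardless of whether each is given as a code or as a one-character atom.
same_character(A, B) :-
    to_code(A, Code),
    to_code(B, Code).
\end{code}

With these definitions, same_character(a, 97) succeeds and same_character(a, 98) fails. An atom of more than one character, such as ab, is rejected by to_code/2 because its interpretation as a single character would be ambiguous.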

These multiple representations are handled by a large number of built-in predicates, all of which are ISO-compatible. For converting between code and character there is char_code/2. For breaking atoms and numbers into characters there are atom_chars/2, atom_codes/2, number_codes/2 and number_chars/2. For character I/O on streams there are get_char/[1,2], get_code/[1,2], get_byte/[1,2], peek_char/[1,2], peek_code/[1,2], peek_byte/[1,2], put_code/[1,2], put_char/[1,2] and put_byte/[1,2]. The Prolog flag double_quotes (see current_prolog_flag/2) controls how text between double quotes is interpreted.
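
As a minimal sketch of the conversion predicates named above, the following wrappers return both the character-list and the code-list form of an atom or a number. The names text_forms/3 and number_forms/3 are hypothetical and exist only for this example; the built-ins they call are the ISO predicates listed in the text.

\begin{code}
% text_forms(+Atom, -Chars, -Codes)
% Hypothetical wrapper: break an atom into one-character atoms
% and into character codes.
text_forms(Atom, Chars, Codes) :-
    atom_chars(Atom, Chars),
    atom_codes(Atom, Codes).

% number_forms(+Number, -Chars, -Codes)
% Hypothetical wrapper: the same two views for the text of a number.
number_forms(Number, Chars, Codes) :-
    number_chars(Number, Chars),
    number_codes(Number, Codes).

% letter_a_code(-Code)
% char_code/2 relates a one-character atom to its character code.
letter_a_code(Code) :-
    char_code(a, Code).
\end{code}

For example, text_forms(abc, Chars, Codes) yields Chars = [a,b,c] and Codes = [97,98,99], number_forms(42, Chars, Codes) yields Chars = ['4','2'] and Codes = [52,50], and letter_a_code(Code) yields Code = 97.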


Dr. Richard Botting 2001-12-12