Clojure is a homoiconic language, which is a fancy term describing the fact that Clojure programs are represented by Clojure data structures. This is a very important difference between Clojure (and Common Lisp) and most other programming languages - Clojure is defined in terms of the evaluation of data structures and not in terms of the syntax of character streams/files. It is quite common, and easy, for Clojure programs to manipulate, transform and produce other Clojure programs.
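As a small illustration of code-as-data, the sketch below builds a program as an ordinary list, rewrites it with normal sequence functions, and hands both versions to `eval`. The names `program` and `rewritten` are hypothetical, chosen only for this example.

```clojure
;; A program is just a list: the symbol + followed by three numbers.
(def program (list '+ 2 3 4))

;; Transform it like any other data: swap the operator symbol for *.
(def rewritten (cons '* (rest program)))

rewritten        ;; => (* 2 3 4)
(eval program)   ;; => 9
(eval rewritten) ;; => 24
```

Nothing here is special macro machinery: `list`, `cons` and `rest` are the same functions used on any other sequence.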
That said, most Clojure programs begin life as text files, and it is the task of the reader to parse the text and produce the data structure the compiler will see. This is not merely a phase of the compiler. The reader, and the Clojure data representations, have utility on their own in many of the same contexts where one might use XML or JSON.
One might say the reader has syntax defined in terms of characters, and the Clojure language has syntax defined in terms of symbols, lists, vectors, maps etc. The reader is represented by the function read, which reads the next form (not character) from a stream, and returns the object represented by that form.
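A minimal sketch of what the reader returns. It uses `read-string`, the string-based counterpart of `read` in clojure.core, so no stream setup is needed; the names `form` and `config` are hypothetical. Note that the result is data, not evaluated code. For untrusted input, `clojure.edn/read-string` is generally the safer choice in current Clojure releases, since `read-string` can evaluate code via `#=` while `*read-eval*` is true.

```clojure
;; Reading text yields a list whose first element is the symbol +,
;; not the addition function; nothing is evaluated.
(def form (read-string "(+ 1 2)"))

(list? form)    ;; => true
(first form)    ;; => +  (a symbol)

;; Used purely as a data notation, much like JSON.
(def config (read-string "{:name \"fred\" :tags [:a :b]}"))
(:tags config)  ;; => [:a :b]
```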
Since we have to start somewhere, this reference starts where evaluation starts, with the reader forms. This will inevitably entail talking about data structures whose descriptive details, and interpretation by the compiler, will follow.
Reader forms
Symbols
Symbols begin with a non-numeric character and can contain alphanumeric characters and *, +, !, -, _, and ? (other characters will be allowed eventually, but not all macro characters have been determined). '/' has special meaning: it can be used once in the middle of a symbol to separate the namespace from the name, e.g. my-namespace/foo. '/' by itself names the division function. '.' has special meaning: it can be used one or more times in the middle of a symbol to designate a fully-qualified class name, e.g. java.util.BitSet. Symbols beginning with '.' are reserved by Clojure. Symbols containing / or . are said to be 'qualified'.
Literals
- Strings - Enclosed in "double quotes". Standard Java escape characters are supported.
- Numbers - as per Java, plus indefinitely long integers are supported, as well as ratios, e.g. 22/7.
- Characters - preceded by a backslash: \c. \newline, \space and \tab yield the corresponding characters.
- nil - means 'nothing/no-value'; represents Java null and tests logical false.
- Booleans - true and false.
- Keywords - Keywords are like symbols, except:
  - They can and must begin with a colon, e.g. :fred.
  - They cannot contain '.' or name classes.
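The symbol, literal and keyword rules above can be checked at a REPL. This is a sketch assuming a current Clojure release; `read-string` is used to show what the reader produces from the raw text, and `my-namespace/foo` is only a quoted example symbol, so no such namespace needs to exist.

```clojure
;; Symbols: a single '/' separates the namespace part from the name part.
(namespace 'my-namespace/foo)                  ;; => "my-namespace"
(name 'my-namespace/foo)                       ;; => "foo"

;; Numbers: ratios and integers too large for a long are both readable.
(ratio? (read-string "22/7"))                  ;; => true
(read-string "123456789012345678901234567890") ;; an arbitrary-precision integer

;; Characters: \newline reads as the newline character.
(= (char 10) (read-string "\\newline"))        ;; => true

;; Strings support Java escapes: "a\tb" is three characters long.
(count "a\tb")                                 ;; => 3

;; nil and booleans.
(nil? (read-string "nil"))                     ;; => true
(false? (read-string "false"))                 ;; => true

;; Keywords begin with a colon and evaluate to themselves.
(= :fred (eval :fred))                         ;; => true
```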
Lists
Lists are zero or more forms enclosed in parentheses:
(a b c)
Vectors
Vectors are zero or more forms enclosed in square brackets:
[1 2 3]
Maps
Maps are zero or more key/value pairs enclosed in braces:
{:a 1 :b 2}
Commas are considered whitespace, and can be used to organize the pairs:
{:a 1, :b 2}
Keys and values can be any forms.
Sets
Sets are zero or more forms enclosed in braces preceded by #:
#{:a :b :c}
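A quick check of the four collection syntaxes, sketched with `read-string` so the result is the raw data the reader produces rather than an evaluated value.

```clojure
;; Each bracket style reads as a different collection type.
(list?   (read-string "(a b c)"))      ;; => true
(vector? (read-string "[1 2 3]"))      ;; => true
(map?    (read-string "{:a 1 :b 2}"))  ;; => true
(set?    (read-string "#{:a :b :c}"))  ;; => true

;; Commas are whitespace, so these two maps are equal.
(= {:a 1 :b 2} {:a 1, :b 2})           ;; => true

;; Keys and values can be any forms, including collections.
(get {[1 2] "vector key"} [1 2])       ;; => "vector key"
```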
Macro characters
The behavior of the reader is driven by a combination of built-in constructs and an extension system called the read table. Entries in the read table provide mappings from certain characters, called macro characters, to specific reading behavior, called reader macros. Unless indicated otherwise, macro characters cannot be used in user symbols.
Quote (')
'form => (quote form)
Character (\)
As per above, yields a character literal.
Comment (;)
Single-line comment, causes the reader to ignore everything from the semicolon to the end-of-line.
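The quote, character and comment macro characters can be observed directly in what the reader returns. A minimal sketch, again using `read-string` so the forms are not evaluated.

```clojure
;; 'form is read as the two-element list (quote form).
(= '(quote form) (read-string "'form"))  ;; => true

;; A backslash introduces a character literal.
(char? (read-string "\\c"))              ;; => true

;; The reader skips everything from ; to the end of the line,
;; then reads the next form.
(read-string "; ignored\n42")            ;; => 42
```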
Meta (^)
^form => (meta form)
Deref (@)
@form => (deref form)
Dispatch (#)
The dispatch macro causes the reader to use a reader macro from another table, indexed by the character following #:
Regex patterns (#"pattern")
A regex pattern is read and compiled at read time. The resulting object is of type java.util.regex.Pattern.
Metadata (#^)
Symbols, Lists, Vectors and Maps can have metadata, which is a map associated with the object. The metadata reader macro first reads the metadata and attaches it to the next form read:
#^{:a 1 :b 2} [1 2 3]
yields the vector [1 2 3] with a metadata map of {:a 1 :b 2}.
A shorthand version allows the metadata to be a simple symbol or keyword, in which case it is treated as a single-entry map with a key of :tag and a value of the symbol provided, e.g.:
#^String x
is the same as
#^{:tag String} x
Such tags can be used to convey type information to the compiler.
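The regex and metadata reader macros can be sketched as follows. This assumes a current Clojure release, which spells metadata as ^ but is assumed here to still accept the #^ form documented on this page as a deprecated alias.

```clojure
;; #"..." is compiled to a java.util.regex.Pattern when the form is read.
(instance? java.util.regex.Pattern #"a+b")  ;; => true
(re-matches #"a+b" "aaab")                  ;; => "aaab"

;; #^ attaches the map to the next form as metadata.
(meta #^{:a 1 :b 2} [1 2 3])                ;; => {:a 1, :b 2}

;; The shorthand yields {:tag String}, where String is the
;; unevaluated symbol, since read does not evaluate anything.
(meta (read-string "#^String x"))           ;; => {:tag String}
```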
Var-quote (#')
#'x => (var x)
Anonymous function literal (#())
#(...) => (fn [args] (...))
where args are determined by the presence of argument literals taking the form %, %n or %&. % is a synonym for %1, %n designates the nth arg (1-based), and %& designates a rest arg. This is not a replacement for fn; idiomatic use would be for very short one-off mapping/filter fns and the like. #() forms cannot be nested.
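A short sketch of var-quote and the anonymous function literal, showing what #'x reads as and how %, %n and %& bind arguments.

```clojure
;; #'x reads as (var x).
(= '(var x) (read-string "#'x"))  ;; => true

;; % is the first (here, only) argument.
(map #(* % %) [1 2 3])            ;; => (1 4 9)

;; %1, %2 ... name positional args; %& collects the rest.
(#(+ %1 %2) 3 4)                  ;; => 7
(#(apply + %&) 1 2 3)             ;; => 6
```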
Syntax-quote (`, the "backquote" character), Unquote (~) and Unquote-splicing (~@)
For all forms other than Symbols, Lists, Vectors and Maps, `x is the same as 'x.
For Symbols, syntax-quote resolves the symbol in the current context, yielding a fully-qualified symbol (i.e. namespace/name or fully.qualified.Classname). If a symbol is non-namespace-qualified and ends with '#', it is resolved to a generated symbol with the same name to which '_' and a unique id have been appended, e.g. x# will resolve to x_123. All references to that symbol within a syntax-quoted expression resolve to the same generated symbol.
For Lists/Vectors/Maps, syntax-quote establishes a template of the corresponding data structure. Within the template, unqualified forms behave as if recursively syntax-quoted, but forms can be exempted from such recursive quoting by qualifying them with unquote or unquote-splicing, in which case they will be treated as expressions and be replaced in the template by their value, or sequence of values, respectively.
For example:
user=> (def x 5)
user=> (def lst '(a b c))
user=> `(fred x ~x lst ~@lst 7 8 :nine)
(user/fred user/x 5 user/lst a b c 7 8 :nine)
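The generated-symbol (x#) rule is what makes hygienic local bindings in macros possible. The sketch below uses a hypothetical macro named `square`: because every `v#` in the template becomes the same generated symbol, the argument expression is evaluated only once. The exact spelling of the generated name varies between Clojure versions, so the check compares the symbols to each other rather than to a fixed name.

```clojure
;; Hypothetical macro: v# becomes one generated symbol everywhere it
;; appears in this syntax-quoted template; ~e splices in the argument.
(defmacro square [e]
  `(let [v# ~e]
     (* v# v#)))

(square 5)  ;; => 25

;; Inspect the expansion: the binding and both uses are the same symbol.
(let [[_ [sym _] [_ a b]] (macroexpand-1 '(square 3))]
  (= sym a b))  ;; => true
```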
The read table is currently not accessible to user programs.
(read)
(read stream)
(read stream eof-is-error)
(read stream eof-is-error eof-value)
(read stream eof-is-error eof-value is-recursive)
Reads the next object from stream, which must be an instance of java.io.PushbackReader or a subclass. stream defaults to the current value of *in*. eof-is-error defaults to true, in which case encountering the end of file during the read is an error. If eof-is-error is nil, then eof-value (defaults to nil) will be returned when EOF is encountered. Finally, is-recursive (defaults to nil) indicates that this call to read is happening within another call to read.
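A sketch of reading every form from a stream, assuming a current Clojure release where the three-argument arity is (read stream eof-error? eof-value). The helper `read-all-forms` is hypothetical. Each call to read consumes exactly one form, and a fresh Object is passed as eof-value so that a literal nil in the input is not mistaken for end of file.

```clojure
(import '(java.io PushbackReader StringReader))

;; Hypothetical helper: collect every top-level form in string s.
(defn read-all-forms [s]
  (let [rdr (PushbackReader. (StringReader. s))
        eof (Object.)]                    ; unique sentinel for end of input
    (loop [forms []]
      (let [form (read rdr false eof)]    ; false: EOF is not an error
        (if (identical? form eof)
          forms
          (recur (conj forms form)))))))

(read-all-forms "(+ 1 2) [:a :b] nil foo")
;; => [(+ 1 2) [:a :b] nil foo]
```

Using nil as the eof-value would also work for input that never contains nil, but the sentinel avoids that ambiguity entirely.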