Lexical conventions
This section defines the lexical grammar of the ShockScript language.
The tokenizer scans one of the following input goal symbols depending on the syntactic context: InputElementDiv, InputElementRegExp, InputElementXMLTag, InputElementPI, InputElementXMLContent.
The following program illustrates how the tokenizer decides which is the input goal symbol to scan:
/(?:)/ ;
a / b ;
<a>Text</a> ;
The following table indicates which is the input goal symbol that is scanned for each of the tokens comprising the previous program:
| Token | Input goal |
|---|---|
| /(?:)/ | InputElementRegExp |
| ; | InputElementDiv |
| a | InputElementRegExp |
| / | InputElementDiv |
| b | InputElementRegExp |
| ; | InputElementDiv |
| < | InputElementRegExp |
| a | InputElementXMLTag |
| > | InputElementXMLTag |
| Text | InputElementXMLContent |
| </ | InputElementXMLContent |
| a | InputElementXMLTag |
| > | InputElementXMLTag |
| ; | InputElementDiv |
The InputElementPI goal symbol must be used while parsing a <?fixed={x}?> expression.
Note: InputElementPI has nothing to do with E4X. It’s currently used in the fixed expression for escaping out of dynamic properties.
Syntax
-
InputElementDiv ::
-
WhiteSpace
LineTerminator
Comment
Identifier
ReservedWord
Punctuator
/
/=
NumericLiteral
StringLiteral
-
InputElementRegExp ::
-
WhiteSpace
LineTerminator
Comment
Identifier
ReservedWord
Punctuator
NumericLiteral
StringLiteral
RegularExpressionLiteral
XMLMarkup
<?fixed={
-
InputElementXMLTag ::
-
XMLName
XMLTagPunctuator
XMLAttributeValue
XMLWhitespace
{
-
InputElementPI ::
-
?>
-
InputElementXMLContent ::
-
XMLMarkup
XMLText
{
< [lookahead ∉ { ?, !, / }]
</
Source Characters
Syntax
-
SourceCharacter ::
-
Unicode code point
-
SourceCharacters ::
-
SourceCharacter SourceCharactersopt
White Space
The WhiteSpace token is filtered out by the lexical scanner.
Syntax
-
WhiteSpace ::
-
U+09 tab
U+0B vertical tab
U+0C form feed
U+20 space
U+A0 no-break space
Unicode “space separator”
Line Terminator
The LineTerminator token is filtered out by the lexical scanner, however it may result in a VirtualSemicolon to be inserted.
Syntax
-
LineTerminator ::
-
U+0A line feed
U+0D carriage return
U+2028 line separator
U+2029 paragraph separator
Comment
The Comment token is filtered out by the lexical scanner, however it propagates any LineTerminator token from its characters.
/*
* /*
* *
* */
*/
Syntax
-
Comment ::
-
// SingleLineCommentCharacters
MultiLineComment
-
SingleLineCommentCharacters ::
-
SingleLineCommentCharacter SingleLineCommentCharactersopt
-
SingleLineCommentCharacter ::
-
[lookahead ∉ { LineTerminator }] SourceCharacter
-
MultiLineComment ::
-
/* MultiLineCommentCharactersopt */
-
MultiLineCommentCharacters ::
-
SourceCharacters [but no embedded sequence /*]
MultiLineComment
MultiLineCommentCharacters SourceCharacters [but no embedded sequence /*]
MultiLineCommentCharacters MultiLineComment
Virtual Semicolon
The VirtualSemicolon nonterminal matches an automatically inserted semicolon, known as a virtual semicolon.
Virtual semicolons are inserted in the following occasions:
- After a right-curly character }
- Before a LineTerminator
Identifier
The Identifier symbol is similiar to that from the ECMA-262 third edition, but with support for scalar Unicode escapes, \xXX escapes and a \x{...} escape (alias to \u{...}).
Syntax
-
Identifier ::
-
IdentifierName [but not ReservedWord or ContextKeyword]
ContextKeyword
-
IdentifierName ::
-
IdentifierStart
IdentifierName IdentifierPart
-
IdentifierStart ::
-
UnicodeLetter
underscore _
$
UnicodeEscapeSequence
-
IdentifierPart ::
-
UnicodeLetter
UnicodeCombiningMark
UnicodeConnectorPunctuation
UnicodeDigit
underscore _
$
UnicodeEscapeSequence
-
UnicodeLetter ::
-
Unicode letter (“L”)
Unicode letter number (“Nl”)
-
UnicodeDigit ::
-
Unicode decimal digit number (“Nd”)
-
UnicodeCombiningMark ::
-
Unicode nonspacing mark (“Mn”)
Unicode spacing combining mark (“Mc”)
-
UnicodeConnectorPunctuation ::
-
Unicode connector punctuation (“Pc”)
Keywords
ReservedWord includes the following reserved words:
as
do
if
in
is
for
let
new
not
try
use
var
case
else
null
this
true
void
with
await
break
catch
class
const
false
super
throw
while
yield
delete
import
public
return
switch
typeof
default
extends
finally
package
private
continue
function
internal
interface
protected
implements
ContextKeyword is one of the following in certain syntactic contexts:
get
map
set
tap
xml
each
enum
meta
type
Embed
final
native
static
decimal
generic
abstract
override
namespace
undefined
Punctuator
Punctuator includes one of the following:
:: @
. .. ...
( ) [ ] { }
: ; ,
? ! =
?.
< <=
> >=
== ===
!= !==
+ - * % **
++ --
<< >> >>>
& ^ | ~
&& ^^ || ??
The @ punctuator must not be followed by a single quote ’ or a double quote character “.
Punctuator includes CompoundAssignmentPunctuator. CompoundAssignmentPunctuator is one of the following:
+= -= *= %= **=
<<= >>= >>>= &= ^= |=
&&= ^^= ||=
??=
Numeric Literal
NumericLiteral is similiar to NumericLiteral from the ECMA-262 third edition, with support for binary literals, underscore separators and certain suffixes:
0b1011
0o77777
0x0A
1_000
10d // double(10) or simply 10
10f // float(10)
10i // int(10)
10m // decimal(10). "m" for money
10n // bigint(10)
10u // uint(10)
Syntax
-
NumericLiteral ::
-
DecimalLiteral DecimalLiteralSuffixopt [lookahead ∉ { IdentifierStart, DecimalDigit }]
HexIntegerLiteral HexLiteralSuffixopt [lookahead ∉ { IdentifierStart, DecimalDigit }]
BinIntegerLiteral BinLiteralSuffixopt [lookahead ∉ { IdentifierStart, DecimalDigit }]
OctalIntegerLiteral OctalLiteralSuffixopt [lookahead ∉ { IdentifierStart, DecimalDigit }]
-
DecimalLiteralSuffix ::
-
d
D
f
F
i
I
m
M
n
N
u
U
-
HexLiteralSuffix ::
-
i
I
m
M
n
N
u
U
-
BinLiteralSuffix ::
-
DecimalLiteralSuffix
-
OctalLiteralSuffix ::
-
DecimalLiteralSuffix
-
DecimalLiteral ::
-
DecimalIntegerLiteral . UnderscoreDecimalDigitsopt
ExponentPartopt
. UnderscoreDecimalDigits ExponentPartopt
DecimalIntegerLiteral ExponentPartopt
-
DecimalIntegerLiteral ::
-
0
[lookahead = NonZeroDigit] UnderscoreDecimalDigitsopt
-
DecimalDigits ::
-
DecimalDigit{1,}
-
UnderscoreDecimalDigits ::
-
DecimalDigits
UnderscoreDecimalDigits _ DecimalDigits
-
DecimalDigit ::
-
0-9
-
NonZeroDigit ::
-
1-9
-
ExponentPart ::
-
ExponentIndicator SignedInteger
-
ExponentIndicator ::
-
e
E
-
SignedInteger ::
-
UnderscoreDecimalDigits
+ UnderscoreDecimalDigits
- UnderscoreDecimalDigits
-
HexIntegerLiteral ::
-
0x UnderscoreHexDigits
0X UnderscoreHexDigits
-
HexDigit ::
-
0-9
A-F
a-f
-
UnderscoreHexDigits ::
-
HexDigit{1,}
UnderscoreHexDigits _ HexDigit{1,}
-
BinIntegerLiteral ::
-
0b UnderscoreBinDigits
0B UnderscoreBinDigits
-
BinDigit ::
-
0
1
-
UnderscoreBinDigits ::
-
BinDigit{1,}
UnderscoreBinDigits _ BinDigit{1,}
-
OctalIntegerLiteral ::
-
0o UnderscoreOctalDigits
0O UnderscoreOctalDigits
-
OctalDigit ::
-
0-7
-
UnderscoreOctalDigits ::
-
OctalDigit{1,}
UnderscoreOctalDigits _ OctalDigit{1,}
Regular Expression Literal
RegularExpressionLiteral is similiar to RegularExpressionLiteral from the ECMA-262 third edition, with support for line breaks.
Syntax
-
RegularExpressionLiteral ::
-
/ RegularExpressionBody / RegularExpressionFlags
-
RegularExpressionBody ::
-
RegularExpressionFirstChar RegularExpressionChars
-
RegularExpressionChars ::
-
«empty»
RegularExpressionChars RegularExpressionChar
-
RegularExpressionFirstChar ::
-
SourceCharacter [but not * or \ or /]
BackslashSequence
-
RegularExpressionChar ::
-
SourceCharacter [but not \ or /]
BackslashSequence
-
BackslashSequence ::
-
\ SourceCharacter
-
RegularExpressionFlags ::
-
«empty»
RegularExpressionFlags IdentifierPart
String Literal
StringLiteral is similiar to the StringLiteral symbol from the ECMA-262 third edition. The following additional features are included:
- Scalar UnicodeEscapeSequence using the
\u{...}or\x{...}form - Triple strings
- Raw strings using the
@prefix
Triple string literals use either """ or ''' as delimiter and may span multiple lines. The contents of triple string literals are indentation-based, as can be observed in the following program:
const text = """
foo
bar
"""
text == "foo\nbar"
Triple strings are processed as follows:
- The base line for determining nested indentation characters is the non-empty (i.e. not a whitespace only line) first line that contains the lowest-indentation level after whitespace characters.
- Every line contents start from the base line’s first non-whitespace character.
- Beginning and end lines that are empty or consist only of whitespace are discarded.
Both regular and triple strings accept the @ prefix, designating raw string literals. Raw string literals contain no escape sequences.
const text = @"""
x\y
"""
Escape sequences are described by the following table:
| Escape | Description |
|---|---|
| \’ | U+27 single-quote |
| \“ | U+22 double-quote |
| \\ | U+5C backslash character |
| \b | U+08 backspace character |
| \f | U+0C form feed character |
| \n | U+0A line feed character |
| \r | U+0D carriage return character |
| \t | U+09 tab character |
| \v | U+0B vertical tab character |
| \0 | U+00 character |
| \xHH | Contributes an Unicode code point value |
| \uHHHH | Contributes an Unicode code point value |
| \u{…} | Contributes an Unicode code point value |
| \ followed by LineTerminator | Contributes nothing |
Syntax
-
StringLiteral ::
-
[lookahead ≠ """] " DoubleStringCharacter{0,} "
[lookahead ≠ '''] ' SingleStringCharacter{0,} '
""" TripleDoubleStringCharacter{0,} """
''' TripleSingleStringCharacter{0,} '''
RawStringLiteral
-
RawStringLiteral ::
-
@ [lookahead ≠ """] " DoubleStringRawCharacter{0,} "
@ [lookahead ≠ '''] ' SingleStringRawCharacter{0,} '
@""" TripleDoubleStringRawCharacter{0,} """
@''' TripleSingleStringRawCharacter{0,} '''
-
DoubleStringCharacter ::
-
SourceCharacter [but not double-quote " or backslash \ or LineTerminator]
EscapeSequence
-
SingleStringCharacter ::
-
SourceCharacter [but not single-quote ' or backslash \ or LineTerminator]
EscapeSequence
-
DoubleStringRawCharacter ::
-
SourceCharacter [but not double-quote " or LineTerminator]
-
SingleStringRawCharacter ::
-
SourceCharacter [but not single-quote ' or LineTerminator]
-
TripleDoubleStringCharacter ::
-
[lookahead ≠ """] SourceCharacter [but not backslash \ or LineTerminator]
EscapeSequence
LineTerminator
-
TripleSingleStringCharacter ::
-
[lookahead ≠ '''] SourceCharacter [but not backslash \ or LineTerminator]
EscapeSequence
LineTerminator
-
TripleDoubleStringRawCharacter ::
-
[lookahead ≠ """] SourceCharacter [but not LineTerminator]
LineTerminator
-
TripleSingleStringRawCharacter ::
-
[lookahead ≠ '''] SourceCharacter [but not LineTerminator]
LineTerminator
Escape Sequences
Syntax
-
EscapeSequence ::
-
\ CharacterEscapeSequence
\0 [lookahead ∉ DecimalDigit]
\ LineTerminator
UnicodeEscapeSequence
-
CharacterEscapeSequence ::
-
SingleEscapeCharacter
NonEscapeCharacter
-
SingleEscapeCharacter ::
-
'
"
\
b
f
n
r
t
v
-
NonEscapeCharacter ::
-
SourceCharacter [but not EscapeCharacter or LineTerminator]
-
EscapeCharacter ::
-
SingleEscapeCharacter
DecimalDigit
x
u
-
UnicodeEscapeSequence ::
-
\x HexDigit HexDigit
\u HexDigit{4}
\x { HexDigit{1,} }
\u { HexDigit{1,} }
XML
This section defines nonterminals used in the lexical grammar as part of the XML capabilities of the ShockScript language.
If a XMLMarkup, XMLAttributeValue or XMLText contains a LineTerminator after parsed, it contributes such LineTerminator to the lexical scanner.
Syntax
-
XMLMarkup ::
-
XMLComment
XMLCDATA
XMLPI
-
XMLWhitespaceCharacter ::
-
U+20 space
U+09 tab
U+0D carriage return
U+0A line feed
-
XMLWhitespace ::
-
XMLWhitespaceCharacter
XMLWhitespace XMLWhitespaceCharacter
-
XMLText ::
-
SourceCharacters [but no embedded left-curly { or less-than <]
-
XMLName ::
-
XMLNameStart
XMLName XMLNamePart
-
XMLNameStart ::
-
UnicodeLetter
underscore _
colon :
-
XMLNamePart ::
-
UnicodeLetter
UnicodeDigit
period .
hyphen -
underscore _
colon :
-
XMLComment ::
-
<!-- XMLCommentCharactersopt -->
-
XMLCommentCharacters ::
-
SourceCharacters [but no embedded sequence -->]
-
XMLCDATA ::
-
<![CDATA[ XMLCDATACharacters ]]>
-
XMLCDATACharacters ::
-
SourceCharacters [but no embedded sequence ]]>]
-
XMLPI ::
-
<? XMLPICharactersopt ?>
-
XMLPICharacters ::
-
SourceCharacters [but no embedded sequence ?>]
-
XMLAttributeValue ::
-
" XMLDoubleStringCharactersopt "
' XMLSingleStringCharactersopt '
-
XMLDoubleStringCharacters ::
-
SourceCharacters [but no embedded double-quote "]
-
XMLSingleStringCharacters ::
-
SourceCharacters [but no embedded single-quote ']
-
XMLTagPunctuator ::
-
=
&=
>
/>
Semantics
XMLCDATA contents, excluding the <![CDATA[ opening sequence and the ]]> closing sequence, are processed the same way as triple strings:
- The base line for determining nested indentation characters is the non-empty (i.e. not a whitespace only line) first line that contains the lowest-indentation level after whitespace characters.
- Every line contents start from the base line’s first non-whitespace character.
- Beginning and end lines that are empty or consist only of whitespace are discarded.
For XMLText, unlike the E4X standard, ShockScript always trims any whitespace at the beginning and end of the text. The parser can skip the token if its empty after trimming whitespace. Note that this does not apply to the XML or XMLList parsers during runtime; they ignore whitespace depending on the XMLContext object specified by the use xml pragma.