Lexical conventions
This section defines the lexical grammar of the ShockScript language.
The tokenizer scans one of the following input goal symbols depending on the syntactic context: InputElementDiv, InputElementRegExp, InputElementXMLTag, InputElementXMLPI, InputElementXMLContent.
The following program illustrates how the tokenizer selects the input goal symbol to scan:
/(?:)/ ;
a / b ;
<a>Text</a> ;
The following table lists the input goal symbol scanned for each token of the previous program:
Token | Input goal |
---|---|
/(?:)/ | InputElementRegExp |
; | InputElementDiv |
a | InputElementRegExp |
/ | InputElementDiv |
b | InputElementRegExp |
; | InputElementDiv |
< | InputElementRegExp |
a | InputElementXMLTag |
> | InputElementXMLTag |
Text | InputElementXMLContent |
</ | InputElementXMLContent |
a | InputElementXMLTag |
> | InputElementXMLTag |
; | InputElementDiv |
The InputElementXMLPI goal symbol must be used when parsing the <?html={exp}?> markup.
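As a rough illustration of goal-directed scanning, the sketch below shows how a scanner might treat a leading `/` differently under InputElementDiv and InputElementRegExp. The function name `scan_slash`, the goal constants, and the simplified regular expression body (ASCII-letter flags only) are assumptions made for this example and are not part of the specification.

```python
import re

# Hypothetical goal names mirroring two of the input goal symbols.
DIV = "InputElementDiv"
REGEXP = "InputElementRegExp"

# Simplified regular expression literal: the first body character may not be
# '*', backslash sequences are kept as pairs, and flags are ASCII letters only.
_REGEXP_LITERAL = re.compile(
    r"/(?:\\.|[^\\/*])(?:\\.|[^\\/])*/[A-Za-z]*", re.DOTALL
)

def scan_slash(source: str, pos: int, goal: str) -> str:
    """Return the token starting with '/' at `pos` under the given goal."""
    assert source[pos] == "/"
    if goal == REGEXP:
        match = _REGEXP_LITERAL.match(source, pos)
        if match is None:
            raise SyntaxError("unterminated regular expression literal")
        return match.group(0)
    # Under InputElementDiv, '/' starts either '/=' or '/'.
    return "/=" if source.startswith("/=", pos) else "/"
```

In this sketch the caller (typically the parser) supplies the goal, which matches the idea that the goal depends on the syntactic context rather than on the characters alone.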
Syntax
-
InputElementDiv ::
-
WhiteSpace
LineTerminator
Comment
Identifier
ReservedWord
Punctuator
/
/=
NumericLiteral
StringLiteral
-
InputElementRegExp ::
-
WhiteSpace
LineTerminator
Comment
Identifier
ReservedWord
Punctuator
NumericLiteral
StringLiteral
RegularExpressionLiteral
XMLMarkup
-
InputElementXMLTag ::
-
XMLName
XMLTagPunctuator
XMLAttributeValue
XMLWhitespace
{
-
InputElementXMLPI ::
-
{
?>
-
InputElementXMLContent ::
-
XMLMarkup
XMLText
{
< [lookahead ∉ { ?, !, / }]
</
Source Characters
Syntax
-
SourceCharacter ::
-
Unicode code point
-
SourceCharacters ::
-
SourceCharacter SourceCharactersopt
White Space
The WhiteSpace token is filtered out by the lexical scanner.
Syntax
-
WhiteSpace ::
-
U+09 tab
U+0B vertical tab
U+0C form feed
U+20 space
U+A0 no-break space
Unicode “space separator”
Line Terminator
The LineTerminator token is filtered out by the lexical scanner; however, it may cause a VirtualSemicolon to be inserted.
Syntax
-
LineTerminator ::
-
U+0A line feed
U+0D carriage return
U+2028 line separator
U+2029 paragraph separator
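The WhiteSpace and LineTerminator productions can be approximated with a pair of character predicates. This is a minimal sketch that assumes Unicode "space separator" corresponds to the general category Zs; the function and constant names are illustrative only.

```python
import unicodedata

# Characters listed explicitly by the WhiteSpace production.
EXPLICIT_WHITESPACE = frozenset("\u0009\u000B\u000C\u0020\u00A0")
# Characters listed by the LineTerminator production.
LINE_TERMINATORS = frozenset("\u000A\u000D\u2028\u2029")

def is_whitespace(ch: str) -> bool:
    """True for the listed characters or any Unicode space separator (Zs)."""
    return ch in EXPLICIT_WHITESPACE or unicodedata.category(ch) == "Zs"

def is_line_terminator(ch: str) -> bool:
    """True for the four LineTerminator code points."""
    return ch in LINE_TERMINATORS
```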
Comment
The Comment token is filtered out by the lexical scanner; however, it propagates any LineTerminator contained in its characters. Multi-line comments may nest:
/*
* /*
* *
* */
*/
Syntax
-
Comment ::
-
// SingleLineCommentCharacters
MultiLineComment
-
SingleLineCommentCharacters ::
-
SingleLineCommentCharacter SingleLineCommentCharactersopt
-
SingleLineCommentCharacter ::
-
[lookahead ∉ { LineTerminator }] SourceCharacter
-
MultiLineComment ::
-
/* MultiLineCommentCharactersopt */
-
MultiLineCommentCharacters ::
-
SourceCharacters [but no embedded sequence /* or */]
MultiLineComment
MultiLineCommentCharacters SourceCharacters [but no embedded sequence /* or */]
MultiLineCommentCharacters MultiLineComment
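Because MultiLineCommentCharacters may itself contain a MultiLineComment, a scanner needs to track nesting depth. The following sketch shows one way to do that with a counter while also recording whether a LineTerminator was seen, so that it can be propagated. The name `scan_multi_line_comment` and its return shape are assumptions for illustration.

```python
LINE_TERMINATORS = {"\n", "\r", "\u2028", "\u2029"}

def scan_multi_line_comment(source: str, pos: int) -> tuple[int, bool]:
    """Scan a possibly nested /* ... */ comment starting at `pos`.

    Returns (position just past the comment, whether a line terminator was seen).
    """
    assert source.startswith("/*", pos)
    depth = 0
    saw_line_terminator = False
    i = pos
    while i < len(source):
        if source.startswith("/*", i):
            depth += 1  # entering a (nested) comment
            i += 2
        elif source.startswith("*/", i):
            depth -= 1  # leaving the innermost comment
            i += 2
            if depth == 0:
                return i, saw_line_terminator
        else:
            if source[i] in LINE_TERMINATORS:
                saw_line_terminator = True
            i += 1
    raise SyntaxError("unterminated multi-line comment")
```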
Virtual Semicolon
The VirtualSemicolon nonterminal matches an automatically inserted semicolon, known as a virtual semicolon.
Virtual semicolons are inserted in the following cases:
- After a right-curly character }
- Before a LineTerminator
Identifier
The Identifier symbol is similar to that of the ECMA-262 third edition, but with support for scalar Unicode escapes.
Syntax
-
Identifier ::
-
IdentifierName [but not ReservedWord or ContextKeyword]
ContextKeyword
-
IdentifierName ::
-
IdentifierStart
IdentifierName IdentifierPart
-
IdentifierStart ::
-
UnicodeLetter
underscore _
$
UnicodeEscapeSequence
-
IdentifierPart ::
-
UnicodeLetter
UnicodeCombiningMark
UnicodeConnectorPunctuation
UnicodeDigit
underscore _
$
UnicodeEscapeSequence
-
UnicodeLetter ::
-
Unicode letter (“L”)
Unicode letter number (“Nl”)
-
UnicodeDigit ::
-
Unicode decimal digit number (“Nd”)
-
UnicodeCombiningMark ::
-
Unicode nonspacing mark (“Mn”)
Unicode spacing combining mark (“Mc”)
-
UnicodeConnectorPunctuation ::
-
Unicode connector punctuation (“Pc”)
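The IdentifierStart and IdentifierPart productions map fairly directly onto Unicode general categories. The sketch below checks an IdentifierName that contains no UnicodeEscapeSequence; escape handling is deliberately left out, and it assumes that "Unicode letter (L)" covers the categories Lu, Ll, Lt, Lm and Lo. All function names are illustrative.

```python
import unicodedata

# "L" (all letter categories) plus letter numbers "Nl".
_LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# Additional categories allowed after the first character.
_PART_CATEGORIES = _LETTER_CATEGORIES | {"Mn", "Mc", "Pc", "Nd"}

def is_identifier_start(ch: str) -> bool:
    return ch in ("_", "$") or unicodedata.category(ch) in _LETTER_CATEGORIES

def is_identifier_part(ch: str) -> bool:
    return ch in ("_", "$") or unicodedata.category(ch) in _PART_CATEGORIES

def is_identifier_name(text: str) -> bool:
    """Check IdentifierName, ignoring UnicodeEscapeSequence for brevity."""
    if text == "":
        return False
    return is_identifier_start(text[0]) and all(
        is_identifier_part(ch) for ch in text[1:]
    )
```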
Keywords
ReservedWord includes the following reserved words:
as
do
if
in
is
for
new
not
try
use
var
case
else
null
this
true
void
with
await
break
catch
class
const
false
super
throw
while
yield
delete
import
public
return
switch
typeof
default
extends
finally
package
private
continue
function
internal
interface
protected
implements
ContextKeyword is one of the following in certain syntactic contexts:
get
set
each
enum
type
Embed
final
native
static
abstract
override
namespace
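A scanner that has recognized an IdentifierName can classify it with two set lookups. This is a small sketch using the word lists above; the function name `classify_name` and the returned labels are assumptions, and deciding whether a ContextKeyword acts as a keyword is left to the parser, since that depends on the syntactic context.

```python
RESERVED_WORDS = frozenset("""
    as do if in is for new not try use var case else null this true void with
    await break catch class const false super throw while yield delete import
    public return switch typeof default extends finally package private
    continue function internal interface protected implements
""".split())

CONTEXT_KEYWORDS = frozenset("""
    get set each enum type Embed final native static abstract override
    namespace
""".split())

def classify_name(name: str) -> str:
    """Label an IdentifierName; matching is case-sensitive."""
    if name in RESERVED_WORDS:
        return "ReservedWord"
    if name in CONTEXT_KEYWORDS:
        return "ContextKeyword"
    return "Identifier"
```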
Punctuator
Punctuator includes one of the following:
:: @
. .. ...
( ) [ ] { }
: ; ,
? ! =
?.
< <=
> >=
== ===
!= !==
+ - * % **
++ --
<< >> >>>
& ^ | ~
&& ^^ || ??
The @ punctuator must not be followed by a single-quote ' or double-quote " character.
Punctuator includes CompoundAssignmentPunctuator. CompoundAssignmentPunctuator is one of the following:
+= -= *= %= **=
<<= >>= >>>= &= ^= |=
&&= ^^= ||=
??=
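Punctuators with shared prefixes (for example `>`, `>>`, `>>>` and `>>>=`) are usually resolved by taking the longest match. The sketch below assumes that rule, which this section does not state explicitly, and also applies the restriction on `@`. The name `scan_punctuator` is hypothetical; `/` and `/=` are absent because they belong to InputElementDiv rather than Punctuator.

```python
PUNCTUATORS = """
    :: @ . .. ... ( ) [ ] { } : ; , ? ! = ?. < <= > >= == === != !==
    + - * % ** ++ -- << >> >>> & ^ | ~ && ^^ || ??
    += -= *= %= **= <<= >>= >>>= &= ^= |= &&= ^^= ||= ??=
""".split()

# Try longer punctuators first so that the longest candidate wins.
_BY_LENGTH = sorted(PUNCTUATORS, key=len, reverse=True)

def scan_punctuator(source: str, pos: int):
    """Return the longest punctuator at `pos`, or None if there is none."""
    for punctuator in _BY_LENGTH:
        if source.startswith(punctuator, pos):
            # '@' followed by a quote starts a raw string literal instead.
            if punctuator == "@" and source[pos + 1:pos + 2] in ('"', "'"):
                return None
            return punctuator
    return None
```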
Numeric Literal
NumericLiteral is similar to NumericLiteral from the ECMA-262 third edition, with support for binary literals and underscore separators:
0b1011
1_000
Syntax
-
NumericLiteral ::
-
DecimalLiteral [lookahead ∉ { IdentifierStart, DecimalDigit }]
HexIntegerLiteral [lookahead ∉ { IdentifierStart, DecimalDigit }]
BinIntegerLiteral [lookahead ∉ { IdentifierStart, DecimalDigit }]
-
DecimalLiteral ::
-
DecimalIntegerLiteral . UnderscoreDecimalDigitsopt ExponentPartopt
. UnderscoreDecimalDigits ExponentPartopt
DecimalIntegerLiteral ExponentPartopt
-
DecimalIntegerLiteral ::
-
0
[lookahead = NonZeroDigit] UnderscoreDecimalDigitsopt
-
DecimalDigits ::
-
DecimalDigit{1,}
-
UnderscoreDecimalDigits ::
-
DecimalDigits
UnderscoreDecimalDigits _ DecimalDigits
-
DecimalDigit ::
-
0-9
-
NonZeroDigit ::
-
1-9
-
ExponentPart ::
-
ExponentIndicator SignedInteger
-
ExponentIndicator ::
-
e
E
-
SignedInteger ::
-
UnderscoreDecimalDigits
+ UnderscoreDecimalDigits
- UnderscoreDecimalDigits
-
HexIntegerLiteral ::
-
0x UnderscoreHexDigits
0X UnderscoreHexDigits
-
HexDigit ::
-
0-9
A-F
a-f
-
UnderscoreHexDigits ::
-
HexDigit{1,}
UnderscoreHexDigits _ HexDigit{1,}
-
BinIntegerLiteral ::
-
0b UnderscoreBinDigits
0B UnderscoreBinDigits
-
BinDigit ::
-
0
1
-
UnderscoreBinDigits ::
-
BinDigit{1,}
UnderscoreBinDigits _ BinDigit{1,}
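The numeric grammar can be approximated with regular expressions. The sketch below recognizes decimal, hexadecimal and binary literals with single underscores between digit groups, and then rejects a literal that is immediately followed by an identifier start or a digit. It assumes the underscore rule for hexadecimal and binary digits mirrors the decimal one, approximates IdentifierStart with `str.isalpha`, and returns the value as a Python float; the name `parse_numeric_literal` is illustrative.

```python
import re

_DIGITS = r"[0-9]+(?:_[0-9]+)*"
_DECIMAL_INT = r"(?:0|[1-9][0-9]*(?:_[0-9]+)*)"
_EXPONENT = rf"(?:[eE][+-]?{_DIGITS})"
_DECIMAL = (
    rf"(?:{_DECIMAL_INT}\.(?:{_DIGITS})?{_EXPONENT}?"
    rf"|\.{_DIGITS}{_EXPONENT}?"
    rf"|{_DECIMAL_INT}{_EXPONENT}?)"
)
_HEX = r"0[xX][0-9A-Fa-f]+(?:_[0-9A-Fa-f]+)*"
_BIN = r"0[bB][01]+(?:_[01]+)*"
_NUMERIC = re.compile(rf"(?:{_HEX}|{_BIN}|{_DECIMAL})")

def parse_numeric_literal(source: str, pos: int = 0) -> tuple[str, float]:
    """Return (literal text, numeric value) for the literal at `pos`."""
    match = _NUMERIC.match(source, pos)
    if match is None:
        raise SyntaxError("expected a numeric literal")
    following = source[match.end():match.end() + 1]
    # Lookahead restriction: no identifier start or digit right after.
    if following and (
        following.isalpha() or following.isdigit() or following in ("_", "$")
    ):
        raise SyntaxError("unexpected character after numeric literal")
    text = match.group(0)
    digits = text.replace("_", "")
    prefix = digits[:2].lower()
    if prefix == "0x":
        value = float(int(digits[2:], 16))
    elif prefix == "0b":
        value = float(int(digits[2:], 2))
    else:
        value = float(digits)
    return text, value
```

For instance, `parse_numeric_literal("0b1011")` yields the value 11.0 and `parse_numeric_literal("1_000")` yields 1000.0, while a trailing underscore such as `1_` is rejected by the lookahead check.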
Regular Expression Literal
RegularExpressionLiteral is similar to RegularExpressionLiteral from the ECMA-262 third edition, with support for line breaks.
Syntax
-
RegularExpressionLiteral ::
-
/ RegularExpressionBody / RegularExpressionFlags
-
RegularExpressionBody ::
-
RegularExpressionFirstChar RegularExpressionChars
-
RegularExpressionChars ::
-
«empty»
RegularExpressionChars RegularExpressionChar
-
RegularExpressionFirstChar ::
-
SourceCharacter [but not * or \ or /]
BackslashSequence
-
RegularExpressionChar ::
-
SourceCharacter [but not \ or /]
BackslashSequence
-
BackslashSequence ::
-
\ SourceCharacter
-
RegularExpressionFlags ::
-
«empty»
RegularExpressionFlags IdentifierPart
String Literal
StringLiteral is similar to the StringLiteral symbol from the ECMA-262 third edition. The following additional features are included:
- Scalar UnicodeEscapeSequence using the \u{...} form
- Triple string literals
- Raw string literals using the @ prefix
Triple string literals use either """ or ''' as the delimiter and may span multiple lines. The contents of triple string literals are indentation-based, as can be observed in the following program:
const text = """
    foo
    bar
    """
text == "foo\nbar"
Triple string literals are processed as follows:
- The first empty line is ignored.
- The base indentation of a triple string literal is that of the last string line.
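The two processing rules can be sketched as a small function. This is only one possible reading: it assumes the input is the text between the delimiters with line terminators normalized to `\n`, that "the last string line" is the line holding the closing delimiter, and that this line is dropped when it contains only indentation. The name `process_triple_string` is hypothetical.

```python
def _strip_indent(line: str, base: int) -> str:
    """Remove at most `base` leading space or tab characters."""
    i = 0
    while i < base and i < len(line) and line[i] in " \t":
        i += 1
    return line[i:]

def process_triple_string(raw: str) -> str:
    """Apply the triple string rules to the text between the delimiters."""
    lines = raw.split("\n")
    # Rule 1: ignore the first line when it is empty.
    if lines and lines[0].strip(" \t") == "":
        lines = lines[1:]
    if not lines:
        return ""
    # Rule 2: the base indentation is that of the last line.
    last = lines[-1]
    base = len(last) - len(last.lstrip(" \t"))
    # Assumption: a last line made only of indentation contributes nothing.
    body = lines[:-1] if last.strip(" \t") == "" else lines
    return "\n".join(_strip_indent(line, base) for line in body)
```

Under these assumptions, the text `"\n    foo\n    bar\n    "` is processed to `"foo\nbar"`, and lines indented deeper than the base keep their extra indentation.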
Both regular and triple string literals accept the @ prefix, designating raw string literals. Raw string literals contain no escape sequences.
const text = @"""
    x\y
    """
Escape sequences are described by the following table:
Escape | Description |
---|---|
\' | U+27 single-quote |
\" | U+22 double-quote |
\\ | U+5C backslash character |
\b | U+08 backspace character |
\f | U+0C form feed character |
\n | U+0A line feed character |
\r | U+0D carriage return character |
\t | U+09 tab character |
\v | U+0B vertical tab character |
\0 | U+00 character |
\xHH | Contributes a Unicode code point value |
\uHHHH | Contributes a Unicode code point value |
\u{...} | Contributes a Unicode code point value |
\ followed by LineTerminator | Contributes nothing |
Syntax
-
StringLiteral ::
-
[lookahead ≠ """] " DoubleStringCharacter{0,} "
[lookahead ≠ '''] ' SingleStringCharacter{0,} '
""" TripleDoubleStringCharacter{0,} """
''' TripleSingleStringCharacter{0,} '''
RawStringLiteral
-
RawStringLiteral ::
-
@ [lookahead ≠ """] " DoubleStringRawCharacter{0,} "
@ [lookahead ≠ '''] ' SingleStringRawCharacter{0,} '
@""" TripleDoubleStringRawCharacter{0,} """
@''' TripleSingleStringRawCharacter{0,} '''
-
DoubleStringCharacter ::
-
SourceCharacter [but not double-quote " or backslash \ or LineTerminator]
EscapeSequence
-
SingleStringCharacter ::
-
SourceCharacter [but not single-quote ' or backslash \ or LineTerminator]
EscapeSequence
-
DoubleStringRawCharacter ::
-
SourceCharacter [but not double-quote " or LineTerminator]
-
SingleStringRawCharacter ::
-
SourceCharacter [but not single-quote ' or LineTerminator]
-
TripleDoubleStringCharacter ::
-
[lookahead ≠ """] SourceCharacter [but not backslash \ or LineTerminator]
EscapeSequence
LineTerminator
-
TripleSingleStringCharacter ::
-
[lookahead ≠ '''] SourceCharacter [but not backslash \ or LineTerminator]
EscapeSequence
LineTerminator
-
TripleDoubleStringRawCharacter ::
-
[lookahead ≠ """] SourceCharacter [but not LineTerminator]
LineTerminator
-
TripleSingleStringRawCharacter ::
-
[lookahead ≠ '''] SourceCharacter [but not LineTerminator]
LineTerminator
Escape Sequences
Syntax
-
EscapeSequence ::
-
\ CharacterEscapeSequence
\0 [lookahead ∉ DecimalDigit]
\ LineTerminator
HexEscapeSequence
UnicodeEscapeSequence
-
CharacterEscapeSequence ::
-
SingleEscapeCharacter
NonEscapeCharacter
-
SingleEscapeCharacter ::
-
'
"
\
b
f
n
r
t
v
-
NonEscapeCharacter ::
-
SourceCharacter [but not EscapeCharacter or LineTerminator]
-
EscapeCharacter ::
-
SingleEscapeCharacter
DecimalDigit
x
u
-
HexEscapeSequence ::
-
\x HexDigit HexDigit
-
UnicodeEscapeSequence ::
-
\u HexDigit{4}
\u { HexDigit{1,} }
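The EscapeSequence grammar can be exercised with a small decoder for the contents of a non-raw string literal. The sketch below follows the escape table and assumes, as in ECMA-262, that a NonEscapeCharacter after a backslash contributes itself; the name `decode_escapes` is illustrative and any unmatched backslash is reported as an error.

```python
import re

_SINGLE_ESCAPES = {
    "'": "'", '"': '"', "\\": "\\", "b": "\b", "f": "\f",
    "n": "\n", "r": "\r", "t": "\t", "v": "\v",
}

_ESCAPE = re.compile(
    r"\\(?:x(?P<x>[0-9A-Fa-f]{2})"
    r"|u(?P<u4>[0-9A-Fa-f]{4})"
    r"|u\{(?P<scalar>[0-9A-Fa-f]+)\}"
    r"|(?P<nul>0)(?![0-9])"
    r"|(?P<newline>[\n\r\u2028\u2029])"
    r"|(?P<char>[^0-9xu])"
    r"|(?P<invalid>))"
)

def decode_escapes(text: str) -> str:
    """Replace every escape sequence in `text` with the value it contributes."""
    def replace(match: re.Match) -> str:
        if match.group("invalid") is not None:
            raise SyntaxError("invalid escape sequence")
        hex_digits = (
            match.group("x") or match.group("u4") or match.group("scalar")
        )
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        if match.group("nul"):
            return "\0"
        if match.group("newline"):
            return ""  # backslash followed by LineTerminator contributes nothing
        ch = match.group("char")
        return _SINGLE_ESCAPES.get(ch, ch)
    return _ESCAPE.sub(replace, text)
```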
XML
This section defines nonterminals used in the lexical grammar as part of the XML capabilities of the ShockScript language.
If an XMLMarkup, XMLAttributeValue or XMLText contains a LineTerminator once parsed, it contributes that LineTerminator to the lexical scanner.
Syntax
-
XMLMarkup ::
-
XMLComment
XMLCDATA
XMLPI
<?html=
-
XMLWhitespaceCharacter ::
-
U+20 space
U+09 tab
U+0D carriage return
U+0A line feed
-
XMLWhitespace ::
-
XMLWhitespaceCharacter
XMLWhitespace XMLWhitespaceCharacter
-
XMLText ::
-
SourceCharacters [but no embedded left-curly { or less-than <]
-
XMLName ::
-
XMLNameStart
XMLName XMLNamePart
-
XMLNameStart ::
-
UnicodeLetter
underscore _
colon :
-
XMLNamePart ::
-
UnicodeLetter
UnicodeDigit
period .
hyphen -
underscore _
colon :
-
XMLComment ::
-
<!-- XMLCommentCharactersopt -->
-
XMLCommentCharacters ::
-
SourceCharacters [but no embedded sequence -->]
-
XMLCDATA ::
-
<![CDATA[ XMLCDATACharacters ]]>
-
XMLCDATACharacters ::
-
SourceCharacters [but no embedded sequence ]]>]
-
XMLPI ::
-
<? XMLPICharactersopt ?>
-
XMLPICharacters ::
-
SourceCharacters [but no embedded sequence ?>]
-
XMLAttributeValue ::
-
" XMLDoubleStringCharactersopt "
' XMLSingleStringCharactersopt '
-
XMLDoubleStringCharacters ::
-
SourceCharacters [but no embedded double-quote "]
-
XMLSingleStringCharacters ::
-
SourceCharacters [but no embedded single-quote ']
-
XMLTagPunctuator ::
-
=
&=
>
/>
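As an illustration of the XMLMarkup production, the sketch below recognizes XMLComment, XMLCDATA, XMLPI and the `<?html=` opener by looking for each closing sequence. It assumes that `<?html=` takes priority over a general XMLPI, which the grammar above does not spell out; the name `scan_xml_markup` is hypothetical.

```python
# Opening and closing sequences for the delimited XMLMarkup alternatives.
_DELIMITED = (
    ("<!--", "-->"),        # XMLComment
    ("<![CDATA[", "]]>"),   # XMLCDATA
    ("<?", "?>"),           # XMLPI
)

def scan_xml_markup(source: str, pos: int):
    """Return the XMLMarkup token text at `pos`, or None if none starts here."""
    # Assumption: the '<?html=' opener is tried before a general XMLPI.
    if source.startswith("<?html=", pos):
        return "<?html="
    for opener, closer in _DELIMITED:
        if source.startswith(opener, pos):
            end = source.find(closer, pos + len(opener))
            if end < 0:
                raise SyntaxError("unterminated XML markup: " + opener)
            return source[pos:end + len(closer)]
    return None
```

After `<?html=` is returned, a scanner would presumably continue with the InputElementXMLPI goal, whose alternatives are `{` and `?>`.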