Lexical conventions

This section defines the lexical grammar of the ShockScript language.

The tokenizer scans one of the following input goal symbols depending on the syntactic context: InputElementDiv, InputElementRegExp, InputElementXMLTag, InputElementPI, InputElementXMLContent.

The following program illustrates how the tokenizer decides which input goal symbol to scan:

/(?:)/       ;
a / b        ;
<a>Text</a>  ;

The following table shows the input goal symbol scanned for each token of the previous program:

Token	Input goal
/(?:)/	InputElementRegExp
;	InputElementDiv
a	InputElementRegExp
/	InputElementDiv
b	InputElementRegExp
;	InputElementDiv
<	InputElementRegExp
a	InputElementXMLTag
>	InputElementXMLTag
Text	InputElementXMLContent
</	InputElementXMLContent
a	InputElementXMLTag
>	InputElementXMLTag
;	InputElementDiv
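
The choice between InputElementDiv and InputElementRegExp in the table can be approximated from the kind of the previous token. The following Python sketch is only an illustration: the name next_goal and the OPERAND_ENDINGS set are assumptions, a real implementation would normally let the parser pick the goal from its grammar position, and the XML goals are not covered.

```python
# Token kinds after which an operator is expected, so "/" means division.
# This set is an assumption made for the sketch, not taken from the specification.
OPERAND_ENDINGS = {"identifier", "number", "string", "regexp", ")", "]"}

def next_goal(previous_kind):
    """Pick the input goal for the next token from the previous token's kind."""
    if previous_kind in OPERAND_ENDINGS:
        return "InputElementDiv"
    return "InputElementRegExp"

# Kinds of the first six tokens of the program above: /(?:)/ ; a / b ;
kinds = ["regexp", ";", "identifier", "/", "identifier", ";"]

previous = None
goals = []
for kind in kinds:
    goals.append(next_goal(previous))
    previous = kind
```

Under these assumptions, goals reproduces the first six rows of the table.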

The InputElementPI goal symbol must be used while parsing a <?fixed={x}?> expression.

Note: InputElementPI has nothing to do with E4X. It’s currently used in the fixed expression for escaping out of dynamic properties.

Syntax

InputElementDiv

WhiteSpace

LineTerminator

Comment

Identifier

ReservedWord

Punctuator

NumericLiteral

StringLiteral

InputElementRegExp

WhiteSpace

LineTerminator

Comment

Identifier

ReservedWord

Punctuator

NumericLiteral

StringLiteral

RegularExpressionLiteral

XMLMarkup

<?fixed={

InputElementXMLTag

XMLName

XMLTagPunctuator

XMLAttributeValue

XMLWhitespace

{

InputElementPI

InputElementXMLContent

XMLMarkup

XMLText

{

Source Characters

Syntax

SourceCharacter

Unicode code point

SourceCharacters

SourceCharacter

SourceCharacters

_opt

White Space

The WhiteSpace token is filtered out by the lexical scanner.

Syntax

WhiteSpace

Line Terminator

The LineTerminator token is filtered out by the lexical scanner; however, it may cause a VirtualSemicolon to be inserted.

Syntax

LineTerminator

Comment

The Comment token is filtered out by the lexical scanner; however, it propagates any LineTerminator token from its characters. Multi-line comments may be nested, as in the following example:

/*
 * /*
 *  *
 *  */
 */
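
A nested comment such as the one above can be scanned with a depth counter. The Python sketch below is a minimal illustration, not the reference scanner: the function name scan_multiline_comment is hypothetical, and the set of line terminator characters is assumed to match ECMA-262.

```python
# Assumed line terminator set (borrowed from ECMA-262); the specification
# text above does not list the characters.
LINE_TERMINATORS = "\n\r\u2028\u2029"

def scan_multiline_comment(source, start):
    """Return (end_index, has_line_terminator) for the comment opening at start."""
    if not source.startswith("/*", start):
        raise ValueError("not at a multi-line comment")
    depth = 0
    i = start
    has_line_terminator = False
    while i < len(source):
        if source.startswith("/*", i):
            depth += 1          # entering a (possibly nested) comment
            i += 2
        elif source.startswith("*/", i):
            depth -= 1          # leaving the innermost open comment
            i += 2
            if depth == 0:
                return i, has_line_terminator
        else:
            if source[i] in LINE_TERMINATORS:
                has_line_terminator = True
            i += 1
    raise ValueError("unterminated multi-line comment")
```

The second element of the result tells the caller whether the comment should contribute a LineTerminator, which is what the propagation rule above requires.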

Syntax

Comment

SingleLineCommentCharacters

MultiLineComment

SingleLineCommentCharacters

SingleLineCommentCharacter

SingleLineCommentCharacters

_opt

SingleLineCommentCharacter

LineTerminator

SourceCharacter

MultiLineComment

MultiLineCommentCharacters

_opt

MultiLineCommentCharacters

SourceCharacters

MultiLineComment

MultiLineCommentCharacters

SourceCharacters

MultiLineCommentCharacters

MultiLineComment

Virtual Semicolon

The VirtualSemicolon nonterminal matches an automatically inserted semicolon, known as a virtual semicolon.

Virtual semicolons are inserted in the following cases:

After a right-curly character }
Before a LineTerminator

Identifier

The Identifier symbol is similar to that of the ECMA-262 third edition, but with support for scalar Unicode escapes, \xXX escapes and a \x{...} escape (an alias of \u{...}).

Syntax

Identifier

IdentifierName

ReservedWord

ContextKeyword

IdentifierName

IdentifierStart

IdentifierName

IdentifierPart

IdentifierStart

UnicodeLetter

UnicodeEscapeSequence

IdentifierPart

UnicodeLetter

UnicodeCombiningMark

UnicodeConnectorPunctuation

UnicodeDigit

UnicodeEscapeSequence

UnicodeLetter

UnicodeDigit

Unicode decimal digit number (“Nd”)

UnicodeCombiningMark

UnicodeConnectorPunctuation

Unicode connector punctuation (“Pc”)

Keywords

ReservedWord includes the following reserved words:

as
do
if
in
is

for
let
new
not
try
use
var

case
else
null
this
true
void
with

await
break
catch
class
const
false
super
throw
while
yield

delete
import
public
return
switch
typeof

default
extends
finally
package
private

continue
function
internal

interface
protected

implements

ContextKeyword is one of the following in certain syntactic contexts:

get
map
set
tap
xml

each
enum
meta
type

Embed
final

native
static

decimal
generic

abstract
override

namespace
undefined

Punctuator

Punctuator includes one of the following:

::  @
.  ..  ...
(  )  [  ]  {  }
:  ;  ,
?  !  =
?.
<  <=
>  >=
==  ===
!=  !==
+  -  *  %  **
++  --
<<  >>  >>>
&  ^  |  ~
&&  ^^  ||  ??

The @ punctuator must not be followed by a single quote character ' or a double quote character ".

Punctuator includes CompoundAssignmentPunctuator. CompoundAssignmentPunctuator is one of the following:

+=  -=  *=  %=  **=
<<=  >>=  >>>=  &=  ^=  |=
&&=  ^^=  ||=
??=
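
Because several punctuators share a prefix (for example >, >>, >>> and >>>=), a scanner typically takes the longest one that matches. The Python sketch below illustrates that longest-match approach using the punctuators listed above; the longest-match rule itself is an assumption carried over from ECMA-262 practice, and match_punctuator is a hypothetical helper name.

```python
# Punctuators and compound assignment punctuators copied from the lists above.
PUNCTUATORS = [
    "::", "@", ".", "..", "...", "(", ")", "[", "]", "{", "}",
    ":", ";", ",", "?", "!", "=", "?.", "<", "<=", ">", ">=",
    "==", "===", "!=", "!==", "+", "-", "*", "%", "**", "++", "--",
    "<<", ">>", ">>>", "&", "^", "|", "~", "&&", "^^", "||", "??",
    "+=", "-=", "*=", "%=", "**=", "<<=", ">>=", ">>>=", "&=", "^=", "|=",
    "&&=", "^^=", "||=", "??=",
]

# Try longer punctuators first so that ">>>=" wins over ">>>" and ">".
_BY_LENGTH = sorted(PUNCTUATORS, key=len, reverse=True)

def match_punctuator(source, start=0):
    """Return the longest punctuator found at source[start:], or None."""
    for punctuator in _BY_LENGTH:
        if source.startswith(punctuator, start):
            return punctuator
    return None
```

The sketch does not enforce the restriction on the character following @.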

Numeric Literal

NumericLiteral is similar to NumericLiteral from the ECMA-262 third edition, with support for binary and octal literals, underscore separators and certain suffixes:

0b1011
0o77777
0x0A
1_000

10d     // double(10) or simply 10
10f     // float(10)
10i     // int(10)
10m     // decimal(10). "m" for money
10n     // bigint(10)
10u     // uint(10)
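
The integer forms shown above can be decoded as in the following Python sketch. It is a simplified illustration under stated assumptions: parse_integer_literal is a hypothetical name, fractions and exponents are not handled, one or more underscores are accepted between digit groups, and suffixes are only recognised on non-hexadecimal literals because d and f are themselves hexadecimal digits.

```python
import re

_PREFIXES = {"0x": 16, "0b": 2, "0o": 8}
_SUFFIXES = "dfimnu"  # the suffix letters from the examples above

# Digit groups separated by one or more underscores (assumed separator rule).
_DIGITS = re.compile(r"[0-9a-fA-F]+(?:_+[0-9a-fA-F]+)*")

def parse_integer_literal(text):
    """Return (value, suffix) for an integer literal; suffix is None if absent."""
    base = _PREFIXES.get(text[:2].lower(), 10)
    body = text[2:] if base != 10 else text
    suffix = None
    # Assumption: hexadecimal literals are left without suffix handling here.
    if base != 16 and body and body[-1] in _SUFFIXES:
        body, suffix = body[:-1], body[-1]
    if not _DIGITS.fullmatch(body):
        raise ValueError("malformed numeric literal: " + text)
    # int() rejects digits that are out of range for the base.
    return int(body.replace("_", ""), base), suffix
```

For instance, parse_integer_literal("1_000") yields (1000, None) and parse_integer_literal("10f") yields (10, "f"); mapping the suffix to a type is left to the caller.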

Syntax

NumericLiteral

DecimalLiteral

DecimalLiteralSuffix

_opt

IdentifierStart

DecimalDigit

HexIntegerLiteral

HexLiteralSuffix

_opt

IdentifierStart

DecimalDigit

BinIntegerLiteral

BinLiteralSuffix

_opt

IdentifierStart

DecimalDigit

OctalIntegerLiteral

OctalLiteralSuffix

_opt

IdentifierStart

DecimalDigit

DecimalLiteralSuffix

HexLiteralSuffix

BinLiteralSuffix

DecimalLiteralSuffix

OctalLiteralSuffix

DecimalLiteralSuffix

DecimalLiteral

DecimalIntegerLiteral

UnderscoreDecimalDigits

_opt

ExponentPart

_opt

UnderscoreDecimalDigits

ExponentPart

_opt

DecimalIntegerLiteral

ExponentPart

_opt

DecimalIntegerLiteral

NonZeroDigit

UnderscoreDecimalDigits

_opt

DecimalDigits

DecimalDigit

_{1,}

UnderscoreDecimalDigits

DecimalDigits

UnderscoreDecimalDigits

DecimalDigits

DecimalDigit

NonZeroDigit

ExponentPart

ExponentIndicator

SignedInteger

ExponentIndicator

SignedInteger

UnderscoreDecimalDigits

HexIntegerLiteral

UnderscoreHexDigits

HexDigit

UnderscoreHexDigits

HexDigit

_{1,}

UnderscoreHexDigits

HexDigit

_{1,}

BinIntegerLiteral

UnderscoreBinDigits

BinDigit

UnderscoreBinDigits

BinDigit

_{1,}

UnderscoreBinDigits

BinDigit

_{1,}

OctalIntegerLiteral

UnderscoreOctalDigits

OctalDigit

UnderscoreOctalDigits

OctalDigit

_{1,}

UnderscoreOctalDigits

OctalDigit

_{1,}

Regular Expression Literal

RegularExpressionLiteral is similar to RegularExpressionLiteral from the ECMA-262 third edition, with support for line breaks.

Syntax

RegularExpressionLiteral

RegularExpressionBody

RegularExpressionFlags

RegularExpressionBody

RegularExpressionFirstChar

RegularExpressionChars

RegularExpressionChar

RegularExpressionFirstChar

SourceCharacter

BackslashSequence

RegularExpressionChar

SourceCharacter

BackslashSequence

SourceCharacter

RegularExpressionFlags

IdentifierPart

String Literal

StringLiteral is similar to the StringLiteral symbol from the ECMA-262 third edition. The following additional features are included:

Scalar UnicodeEscapeSequence using the \u{...} or \x{...} form
Triple strings
Raw strings using the @ prefix

Triple string literals use either """ or ''' as delimiter and may span multiple lines. The contents of triple string literals are indentation-based, as can be observed in the following program:

const text = """
    foo
    bar
"""
text == "foo\nbar"

Triple strings are processed as follows:

The base line for determining indentation is the first non-empty line (that is, a line that does not consist only of whitespace) with the lowest indentation level.
The contents of every line start at the column of the base line’s first non-whitespace character.
Leading and trailing lines that are empty or consist only of whitespace are discarded.
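
The processing rules above can be sketched in Python as follows. This is one possible reading, not the reference algorithm: dedent_triple_string is a hypothetical name, only the very first and very last lines are discarded when blank, indentation is measured in space and tab characters, and lines are assumed to be separated by a line feed.

```python
def dedent_triple_string(raw):
    """Apply the indentation rules to the text between the triple delimiters."""
    lines = raw.split("\n")
    # Discard a blank first line and a blank last line.
    if lines and lines[0].strip() == "":
        lines = lines[1:]
    if lines and lines[-1].strip() == "":
        lines = lines[:-1]
    # The base line is the non-blank line with the lowest indentation.
    indents = [len(line) - len(line.lstrip(" \t")) for line in lines if line.strip()]
    base = min(indents, default=0)
    # Every line starts at the base line's first non-whitespace column.
    return "\n".join(line[base:] if line.strip() else "" for line in lines)
```

With the earlier example, the text between the delimiters is "\n    foo\n    bar\n", and the sketch returns "foo\nbar".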

Both regular and triple strings accept the @ prefix, designating raw string literals. Raw string literals contain no escape sequences.

const text = @"""
    x\y
"""

Escape sequences are described by the following table:

Escape	Description
\'	U+27 single-quote
\"	U+22 double-quote
\\	U+5C backslash character
\b	U+08 backspace character
\f	U+0C form feed character
\n	U+0A line feed character
\r	U+0D carriage return character
\t	U+09 tab character
\v	U+0B vertical tab character
\0	U+00 character
\xHH	Contributes a Unicode code point value
\uHHHH	Contributes a Unicode code point value
\u{…}	Contributes a Unicode code point value
\ followed by LineTerminator	Contributes nothing
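
The table can be turned into a small decoder, sketched here in Python. Several details are assumptions rather than statements of the specification: decode_escapes is a hypothetical name, \x{...} is accepted as an alias of \u{...} as described for identifiers, any other escaped character contributes itself, and the line terminator set follows ECMA-262.

```python
import re

# Single-character escapes from the table above.
_SIMPLE = {
    "'": "'", '"': '"', "\\": "\\", "b": "\b", "f": "\f",
    "n": "\n", "r": "\r", "t": "\t", "v": "\v", "0": "\0",
}

# Groups: 1 = \xHH, 2 = \uHHHH, 3 = \u{...} or \x{...}, 4 = any other escaped text.
_ESCAPE = re.compile(
    r"\\(?:x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4})|[ux]\{([0-9a-fA-F]+)\}|(\r\n|[\s\S]))"
)

def decode_escapes(body):
    """Decode the escape sequences in the body of a non-raw string literal."""
    def replace(match):
        hex_digits = match.group(1) or match.group(2) or match.group(3)
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        char = match.group(4)
        if char in ("\n", "\r", "\r\n", "\u2028", "\u2029"):
            return ""  # backslash followed by a line terminator contributes nothing
        # Assumption: unknown escapes contribute the escaped character itself.
        return _SIMPLE.get(char, char)
    return _ESCAPE.sub(replace, body)
```

Raw string literals would bypass this step entirely, since they contain no escape sequences.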

Syntax

StringLiteral

"""

DoubleStringCharacter

_{0,}

'''

SingleStringCharacter

_{0,}

"""

TripleDoubleStringCharacter

_{0,}

"""

'''

TripleSingleStringCharacter

_{0,}

'''

RawStringLiteral

"""

DoubleStringRawCharacter

_{0,}

'''

SingleStringRawCharacter

_{0,}

@"""

TripleDoubleStringRawCharacter

_{0,}

"""

@'''

TripleSingleStringRawCharacter

_{0,}

'''

DoubleStringCharacter

SourceCharacter

LineTerminator

EscapeSequence

SingleStringCharacter

SourceCharacter

LineTerminator

EscapeSequence

DoubleStringRawCharacter

SourceCharacter

LineTerminator

SingleStringRawCharacter

SourceCharacter

LineTerminator

TripleDoubleStringCharacter

"""

SourceCharacter

LineTerminator

EscapeSequence

LineTerminator

TripleSingleStringCharacter

'''

SourceCharacter

LineTerminator

EscapeSequence

LineTerminator

TripleDoubleStringRawCharacter

"""

SourceCharacter

LineTerminator

TripleSingleStringRawCharacter

'''

SourceCharacter

LineTerminator

Escape Sequences

Syntax

EscapeSequence

CharacterEscapeSequence

DecimalDigit

LineTerminator

UnicodeEscapeSequence

CharacterEscapeSequence

SingleEscapeCharacter

NonEscapeCharacter

SingleEscapeCharacter

NonEscapeCharacter

SourceCharacter

EscapeCharacter

LineTerminator

EscapeCharacter

SingleEscapeCharacter

DecimalDigit

UnicodeEscapeSequence

HexDigit

_{4}

{

HexDigit

_{1,}

}

{

HexDigit

_{1,}

}

XML

This section defines nonterminals used in the lexical grammar as part of the XML capabilities of the ShockScript language.

If an XMLMarkup, XMLAttributeValue or XMLText contains a LineTerminator after being parsed, it contributes that LineTerminator to the lexical scanner.

Syntax

XMLMarkup

XMLComment

XMLCDATA

XMLPI

XMLWhitespaceCharacter

XMLWhitespace

XMLWhitespaceCharacter

XMLWhitespace

XMLWhitespaceCharacter

XMLText

SourceCharacters

{

XMLName

XMLNameStart

XMLName

XMLNamePart

XMLNameStart

UnicodeLetter

XMLNamePart

UnicodeLetter

UnicodeDigit

XMLComment

<!--

XMLCommentCharacters

_opt

-->

XMLCommentCharacters

SourceCharacters

-->

XMLCDATA

<![CDATA[

XMLCDATACharacters

]]>

XMLCDATACharacters

SourceCharacters

]]>

XMLPI

XMLPICharacters

_opt

XMLPICharacters

SourceCharacters

XMLAttributeValue

XMLDoubleStringCharacters

_opt

XMLSingleStringCharacters

_opt

XMLDoubleStringCharacters

SourceCharacters

XMLSingleStringCharacters

SourceCharacters

XMLTagPunctuator

Semantics

XMLCDATA contents, excluding the <![CDATA[ opening sequence and the ]]> closing sequence, are processed the same way as triple strings:

The base line for determining indentation is the first non-empty line (that is, a line that does not consist only of whitespace) with the lowest indentation level.
The contents of every line start at the column of the base line’s first non-whitespace character.
Leading and trailing lines that are empty or consist only of whitespace are discarded.

For XMLText, unlike the E4X standard, ShockScript always trims any whitespace at the beginning and end of the text. The parser can skip the token if it is empty after trimming whitespace. Note that this does not apply to the XML or XMLList parsers at runtime; they ignore whitespace depending on the XMLContext object specified by the use xml pragma.
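
The XMLText trimming rule can be illustrated with the short Python sketch below. The helper name xml_text_token and the whitespace set (space, tab, carriage return, line feed) are assumptions; the specification text above does not list the XML whitespace characters.

```python
# Assumed XML whitespace characters.
XML_WHITESPACE = " \t\r\n"

def xml_text_token(raw):
    """Return the trimmed text, or None when the token can be skipped."""
    trimmed = raw.strip(XML_WHITESPACE)
    return trimmed or None
```

Whitespace inside the text is left untouched; only the leading and trailing runs are removed.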
