Lexical conventions

This section defines the lexical grammar of the ShockScript language.

The tokenizer scans one of the following input goal symbols depending on the syntactic context: InputElementDiv, InputElementRegExp, InputElementXMLTag, InputElementXMLPI, InputElementXMLContent.

The following program illustrates how the tokenizer decides which input goal symbol to scan:

/(?:)/       ;
a / b        ;
<a>Text</a>  ;

The following table indicates the input goal symbol scanned for each token of the previous program:

Token	Input goal
/(?:)/	InputElementRegExp
;	InputElementDiv
a	InputElementRegExp
/	InputElementDiv
b	InputElementRegExp
;	InputElementDiv
<	InputElementRegExp
a	InputElementXMLTag
>	InputElementXMLTag
Text	InputElementXMLContent
</	InputElementXMLContent
a	InputElementXMLTag
>	InputElementXMLTag
;	InputElementDiv
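As a rough illustration of the Div/RegExp choice in the table, the following Python sketch replays the non-XML tokens of the program. It assumes an ECMAScript-style heuristic: a slash after something that can end an operand is a division, and anywhere else it starts a regular expression literal. The token kind names and the `goal_for` and `goals` functions are hypothetical; a real ShockScript parser would choose the goal from its own syntactic state rather than from the previous token alone.

```python
from typing import List, Optional, Tuple

# Hypothetical token kinds; this section does not prescribe these names.
OPERAND_END_KINDS = {"Identifier", "NumericLiteral", "StringLiteral",
                     "RegularExpressionLiteral"}
# Punctuators and reserved words assumed to end an operand (heuristic only).
OPERAND_END_TEXTS = {")", "]", "}", "this", "super", "null", "true", "false"}

def goal_for(prev: Optional[Tuple[str, str]]) -> str:
    """Pick the input goal for the next token from the previous (kind, text) token."""
    if prev is None:
        return "InputElementRegExp"   # start of input: an operand may follow
    kind, text = prev
    if kind in OPERAND_END_KINDS or text in OPERAND_END_TEXTS:
        return "InputElementDiv"      # an operand just ended, so "/" divides
    return "InputElementRegExp"       # an operand may start, so "/" opens a regexp

def goals(tokens: List[Tuple[str, str]]) -> List[str]:
    """Return the goal symbol used to scan each token, in order."""
    result = []
    prev = None
    for token in tokens:
        result.append(goal_for(prev))
        prev = token
    return result

program = [("RegularExpressionLiteral", "/(?:)/"), ("Punctuator", ";"),
           ("Identifier", "a"), ("Punctuator", "/"),
           ("Identifier", "b"), ("Punctuator", ";")]
for (kind, text), goal in zip(program, goals(program)):
    print(f"{text}\t{goal}")
```

Running it prints the same goals as the first six rows of the table. The heuristic is known to be imperfect (for example, after a block-closing }), which is why the goal is ultimately a parser decision.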

The InputElementXMLPI goal symbol must be used when parsing the <?fixed={exp}?> markup.

Syntax

InputElementDiv
    WhiteSpace
    LineTerminator
    Comment
    Identifier
    ReservedWord
    Punctuator
    NumericLiteral
    StringLiteral

InputElementRegExp
    WhiteSpace
    LineTerminator
    Comment
    Identifier
    ReservedWord
    Punctuator
    NumericLiteral
    StringLiteral
    RegularExpressionLiteral
    XMLMarkup
    <?fixed={

InputElementXMLTag
    XMLName
    XMLTagPunctuator
    XMLAttributeValue
    XMLWhitespace
    {

InputElementXMLPI

InputElementXMLContent
    XMLMarkup
    XMLText
    {

Source Characters

Syntax

SourceCharacter
    Unicode code point

SourceCharacters
    SourceCharacter SourceCharacters_opt

White Space

The WhiteSpace token is filtered out by the lexical scanner.

Syntax

WhiteSpace

Line Terminator

The LineTerminator token is filtered out by the lexical scanner; however, it may cause a VirtualSemicolon to be inserted.

Syntax

LineTerminator

Comment

The Comment token is filtered out by the lexical scanner; however, any LineTerminator contained in its characters is propagated to the scanner. Multi-line comments may be nested, as in the following comment:

/*
 * /*
 *  *
 *  */
 */

Syntax

Comment
    // SingleLineCommentCharacters_opt
    MultiLineComment

SingleLineCommentCharacters
    SingleLineCommentCharacter SingleLineCommentCharacters_opt

SingleLineCommentCharacter
    SourceCharacter but not LineTerminator

MultiLineComment
    /* MultiLineCommentCharacters_opt */

MultiLineCommentCharacters
    SourceCharacters
    MultiLineComment
    MultiLineCommentCharacters SourceCharacters
    MultiLineCommentCharacters MultiLineComment
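The nesting shown in the comment example and the LineTerminator propagation rule can be sketched in Python as follows. This is a minimal model, not a reference implementation: `scan_multiline_comment` is a hypothetical name, and the LineTerminator set is assumed to be that of ECMA-262 (LF, CR, U+2028, U+2029), since its definition is not shown in this section.

```python
from typing import Tuple

# Assumption: the ECMA-262 LineTerminator set.
LINE_TERMINATORS = {"\n", "\r", "\u2028", "\u2029"}

def scan_multiline_comment(src: str, pos: int) -> Tuple[int, bool]:
    """Scan a possibly nested /* ... */ comment that starts at src[pos].

    Returns the index just past the outermost */ and whether any
    LineTerminator occurred inside, so that the caller can propagate it.
    """
    if not src.startswith("/*", pos):
        raise SyntaxError("expected '/*'")
    depth = 0
    saw_line_terminator = False
    i = pos
    while i < len(src):
        if src.startswith("/*", i):
            depth += 1                 # an inner comment opens
            i += 2
        elif src.startswith("*/", i):
            depth -= 1                 # the innermost open comment closes
            i += 2
            if depth == 0:
                return i, saw_line_terminator
        else:
            if src[i] in LINE_TERMINATORS:
                saw_line_terminator = True
            i += 1
    raise SyntaxError("unterminated multi-line comment")
```

With this model, the example comment in this section scans as a single Comment token that reports a line terminator.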

Virtual Semicolon

The VirtualSemicolon nonterminal matches an automatically inserted semicolon, known as a virtual semicolon.

Virtual semicolons are inserted on the following occasions:

After a right curly brace }
Before a LineTerminator
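A parser might apply the two insertion rules at the point where it expects a semicolon. The sketch below is hypothetical (the `Token` class and `match_semicolon` are illustrative names) and models only the two occasions listed; behavior at the end of input is not specified here and is therefore not modeled.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Token:
    text: str
    newline_before: bool = False   # a LineTerminator separated this token from the previous one

def match_semicolon(tokens: List[Token], i: int) -> int:
    """Match ';' or a VirtualSemicolon at tokens[i]; return the index of the next token."""
    if i < len(tokens) and tokens[i].text == ";":
        return i + 1               # explicit semicolon: consume it
    after_right_curly = i > 0 and tokens[i - 1].text == "}"
    before_line_terminator = i < len(tokens) and tokens[i].newline_before
    if after_right_curly or before_line_terminator:
        return i                   # virtual semicolon: consume nothing
    raise SyntaxError(f"expected ';' before token {i}")
```

For `x = 1` followed by a line break and `y = 2`, the parser can call `match_semicolon` before `y` and obtain a virtual semicolon without consuming any token.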

Identifier

The Identifier symbol is similar to that of the ECMA-262 third edition, but also supports scalar Unicode escapes (the \u{...} form).

Syntax

Identifier

IdentifierName

ReservedWord

ContextKeyword

IdentifierName
    IdentifierStart
    IdentifierName IdentifierPart

IdentifierStart
    UnicodeLetter
    UnicodeEscapeSequence

IdentifierPart
    UnicodeLetter
    UnicodeCombiningMark
    UnicodeConnectorPunctuation
    UnicodeDigit
    UnicodeEscapeSequence

UnicodeLetter

UnicodeDigit
    Unicode decimal digit number (“Nd”)

UnicodeCombiningMark

UnicodeConnectorPunctuation
    Unicode connector punctuation (“Pc”)
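The following Python sketch scans an IdentifierName using Unicode general categories from the standard `unicodedata` module. Several details are assumptions, because the bodies of UnicodeLetter and UnicodeCombiningMark are not shown here: letters are taken as categories Lu, Ll, Lt, Lm, Lo, and Nl, combining marks as Mn and Mc, and $ and _ are accepted as start characters, all as in ECMA-262, third edition. Reserved words are not filtered out, and `scan_identifier_name` and `read_unicode_escape` are hypothetical names.

```python
import unicodedata
from typing import Tuple

# Assumed category sets, mirroring ECMA-262, third edition.
LETTER = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}   # UnicodeLetter
PART_ONLY = {"Mn", "Mc", "Nd", "Pc"}            # combining marks, digits, connectors
HEX = set("0123456789abcdefABCDEF")

def read_unicode_escape(src: str, i: int) -> Tuple[str, int]:
    """Decode \\uHHHH or \\u{H...} starting at src[i]; return (char, next index)."""
    if not src.startswith("\\u", i):
        raise SyntaxError("expected a \\u escape")
    i += 2
    if src.startswith("{", i):
        close = src.find("}", i)
        digits = src[i + 1:close] if close != -1 else ""
        end = close + 1
    else:
        digits = src[i:i + 4] if len(src) >= i + 4 else ""
        end = i + 4
    if not digits or not set(digits) <= HEX or int(digits, 16) > 0x10FFFF:
        raise SyntaxError("malformed Unicode escape")
    return chr(int(digits, 16)), end

def scan_identifier_name(src: str, pos: int) -> Tuple[str, int]:
    """Scan an IdentifierName at src[pos]; return (decoded name, end index)."""
    chars = []
    i = pos
    while i < len(src):
        if src[i] == "\\":
            ch, nxt = read_unicode_escape(src, i)
        else:
            ch, nxt = src[i], i + 1
        category = unicodedata.category(ch)
        is_start = category in LETTER or ch in "$_"
        is_part = is_start or category in PART_ONLY
        if not (is_part if chars else is_start):
            if src[i] == "\\":
                raise SyntaxError("escape does not denote an identifier character")
            break
        chars.append(ch)
        i = nxt
    if not chars:
        raise SyntaxError("expected an identifier")
    return "".join(chars), i
```

For example, scanning `caf\u00e9` yields the name café, and `\u{41}bc` yields Abc; the escapes are decoded before the name is compared against the reserved words.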

Keywords

ReservedWord includes the following reserved words:

as
do
if
in
is

for
new
not
try
use
var

case
else
null
this
true
void
with

await
break
catch
class
const
false
super
throw
while
yield

delete
import
public
return
switch
typeof

default
extends
finally
package
private

continue
function
internal

interface
protected

implements

ContextKeyword is one of the following in certain syntactic contexts:

get
set

each
enum
type

Embed
final

native
static

abstract
override

namespace
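Once an IdentifierName has been scanned, a lexer can classify it with two set lookups. The sets below transcribe the two lists above; `classify_name` is a hypothetical helper. Matching is assumed to be case-sensitive, as the capitalized Embed suggests, and whether a context keyword actually acts as a keyword is left to the parser, since that depends on the syntactic context.

```python
# Transcribed from the ReservedWord list.
RESERVED_WORDS = frozenset("""
    as do if in is
    for new not try use var
    case else null this true void with
    await break catch class const false super throw while yield
    delete import public return switch typeof
    default extends finally package private
    continue function internal
    interface protected
    implements
""".split())

# Transcribed from the ContextKeyword list.
CONTEXT_KEYWORDS = frozenset("""
    get set
    each enum type
    Embed final
    native static
    abstract override
    namespace
""".split())

def classify_name(name: str) -> str:
    """Classify a scanned IdentifierName (case-sensitive, by assumption)."""
    if name in RESERVED_WORDS:
        return "ReservedWord"
    if name in CONTEXT_KEYWORDS:
        return "ContextKeyword"   # a keyword only in certain syntactic contexts
    return "Identifier"
```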

Punctuator

Punctuator includes one of the following:

::  @
.  ..  ...
(  )  [  ]  {  }
:  ;  ,
?  !  =
?.
<  <=
>  >=
==  ===
!=  !==
+  -  *  %  **
++  --
<<  >>  >>>
&  ^  |  ~
&&  ^^  ||  ??

The @ punctuator must not be immediately followed by a single-quote (') or double-quote (") character, because that sequence begins a raw string literal.

Punctuator includes CompoundAssignmentPunctuator. CompoundAssignmentPunctuator is one of the following:

+=  -=  *=  %=  **=
<<=  >>=  >>>=  &=  ^=  |=
&&=  ^^=  ||=
??=
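Punctuators are commonly scanned by longest match, so that >>>= is preferred over >>>, >>, and >. The Python sketch below, with the hypothetical name `scan_punctuator`, transcribes both lists above and applies the @ rule by declining to treat @ as a punctuator when a quote follows, leaving that input to the raw-string scanner. Longest match is an assumption carried over from ECMA-262 practice; this section does not state it.

```python
from typing import Optional, Tuple

# Transcribed from the Punctuator and CompoundAssignmentPunctuator lists.
_PUNCTUATORS = """
    :: @
    . .. ...
    ( ) [ ] { }
    : ; ,
    ? ! =
    ?.
    < <=
    > >=
    == ===
    != !==
    + - * % **
    ++ --
    << >> >>>
    & ^ | ~
    && ^^ || ??
    += -= *= %= **=
    <<= >>= >>>= &= ^= |=
    &&= ^^= ||=
    ??=
""".split()

# Try longer punctuators first so that, e.g., ">>>=" wins over ">>".
_BY_LENGTH = sorted(set(_PUNCTUATORS), key=len, reverse=True)

def scan_punctuator(src: str, pos: int) -> Optional[Tuple[str, int]]:
    """Return (punctuator, end index) at src[pos], or None if none applies."""
    if src.startswith("@", pos) and src[pos + 1:pos + 2] in ("'", '"'):
        return None                  # @' or @" starts a raw string literal instead
    for p in _BY_LENGTH:
        if src.startswith(p, pos):
            return p, pos + len(p)
    return None
```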

Numeric Literal

NumericLiteral is similar to NumericLiteral from the ECMA-262 third edition, with added support for binary literals and underscore digit separators:

0b1011
1_000

Syntax

NumericLiteral

DecimalLiteral

IdentifierStart

DecimalDigit

HexIntegerLiteral

IdentifierStart

DecimalDigit

BinIntegerLiteral

IdentifierStart

DecimalDigit

DecimalLiteral

DecimalIntegerLiteral

UnderscoreDecimalDigits

_opt

ExponentPart

_opt

UnderscoreDecimalDigits

ExponentPart

_opt

DecimalIntegerLiteral

ExponentPart

_opt

DecimalIntegerLiteral

NonZeroDigit

UnderscoreDecimalDigits

_opt

DecimalDigits

DecimalDigit

_{1,}

UnderscoreDecimalDigits

DecimalDigits

UnderscoreDecimalDigits

DecimalDigits

DecimalDigit

NonZeroDigit

ExponentPart

ExponentIndicator

SignedInteger

ExponentIndicator

SignedInteger

UnderscoreDecimalDigits

HexIntegerLiteral

UnderscoreHexDigits

HexDigit

UnderscoreHexDigits

HexDigit

_{1,}

UnderscoreDecimalDigits

HexDigit

_{1,}

BinIntegerLiteral

UnderscoreBinDigits

BinDigit

UnderscoreBinDigits

BinDigit

_{1,}

UnderscoreDecimalDigits

BinDigit

_{1,}
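The integer forms of NumericLiteral can be sketched with regular expressions. This Python sketch rests on stated assumptions, since some terminals of the grammar above are not visible: hexadecimal literals use the prefix 0x and binary literals the prefix 0b (lowercase only), an underscore may appear only between two digits, and the fractional and exponent parts of DecimalLiteral are left out. It does enforce the rule that a literal must not be directly followed by an IdentifierStart or DecimalDigit. `scan_integer_literal` is a hypothetical name.

```python
import re
from typing import Tuple

# Assumed forms: prefixes 0x / 0b, and "_" only between digits.
_INTEGER_FORMS = [
    (re.compile(r"0x([0-9A-Fa-f]+(?:_[0-9A-Fa-f]+)*)"), 16),  # HexIntegerLiteral
    (re.compile(r"0b([01]+(?:_[01]+)*)"), 2),                 # BinIntegerLiteral
    (re.compile(r"(0|[1-9][0-9]*(?:_[0-9]+)*)"), 10),         # DecimalIntegerLiteral
]

def scan_integer_literal(src: str, pos: int) -> Tuple[int, int]:
    """Scan an integer literal at src[pos]; return (value, end index)."""
    for pattern, base in _INTEGER_FORMS:
        match = pattern.match(src, pos)
        if match is None:
            continue
        end = match.end()
        following = src[end:end + 1]
        # The literal must not be directly followed by an IdentifierStart or DecimalDigit.
        if following and (following.isalnum() or following in "_$\\"):
            raise SyntaxError(f"invalid numeric literal at {pos}")
        return int(match.group(1).replace("_", ""), base), end
    raise SyntaxError(f"expected a numeric literal at {pos}")
```

Under these assumptions, 0b1011 evaluates to 11 and 1_000 to 1000, while 1__0, 1_, and 007 are rejected.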

Regular Expression Literal

RegularExpressionLiteral is similar to RegularExpressionLiteral from the ECMA-262 third edition, except that it may contain line breaks.

Syntax

RegularExpressionLiteral

RegularExpressionBody

RegularExpressionFlags

RegularExpressionBody

RegularExpressionFirstChar

RegularExpressionChars

RegularExpressionChar

RegularExpressionFirstChar

SourceCharacter

BackslashSequence

RegularExpressionChar

SourceCharacter

BackslashSequence

SourceCharacter

RegularExpressionFlags

IdentifierPart
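A minimal Python sketch of scanning a regular expression literal, showing the one difference from ECMA-262 third edition: line terminators are accepted inside the body. Two details are assumptions: the first body character may not be * or / (as in ECMA-262, since those would start a comment), and flags are approximated as letters, digits, and _ rather than the full IdentifierPart set. `scan_regexp_literal` is a hypothetical name.

```python
from typing import Tuple

def scan_regexp_literal(src: str, pos: int) -> Tuple[str, str, int]:
    """Scan /body/flags at src[pos]; return (body, flags, end index)."""
    if not src.startswith("/", pos):
        raise SyntaxError("expected '/'")
    i = pos + 1
    # Assumed from ECMA-262: the first body character may not be '*' or '/'.
    if i < len(src) and src[i] in "*/":
        raise SyntaxError("invalid first character in regular expression")
    while True:
        if i >= len(src):
            raise SyntaxError("unterminated regular expression literal")
        ch = src[i]
        if ch == "\\":
            if i + 1 >= len(src):
                raise SyntaxError("unterminated backslash sequence")
            i += 2                  # BackslashSequence: '\' plus any SourceCharacter
        elif ch == "/":
            break                   # closing delimiter
        else:
            i += 1                  # any other SourceCharacter, line breaks included
    body = src[pos + 1:i]
    i += 1
    flags_start = i
    # Approximation of IdentifierPart for the flags.
    while i < len(src) and (src[i].isalnum() or src[i] == "_"):
        i += 1
    return body, src[flags_start:i], i
```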

String Literal

StringLiteral is similar to the StringLiteral symbol from the ECMA-262 third edition, with the following additional features:

Scalar UnicodeEscapeSequence using the \u{...} form
Triple string literals
Raw string literals using the @ prefix

Triple string literals use either """ or ''' as the delimiter and may span multiple lines. Their contents are indentation-based, as the following program shows:

const text = """
    foo
    bar
    """
text == "foo\nbar"

Triple string literals are processed as follows:

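The first line, if empty (that is, when a line break directly follows the opening delimiter), is ignored.
The base indentation of a triple string literal is that of its last line, the line containing the closing delimiter; this indentation is removed from each line.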
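The two processing rules can be sketched as a function from the characters between the delimiters to the string value. Beyond what the rules state, this Python sketch assumes that the closing-delimiter line is dropped when it holds only indentation (which the result "foo\nbar" in the example implies), that the base indentation is removed only from lines that start with it, and that line breaks are plain LF. Escape processing is left out, and `process_triple_string` is a hypothetical name.

```python
def process_triple_string(raw: str) -> str:
    """raw: the characters between the opening and closing triple delimiters."""
    lines = raw.split("\n")
    # Rule 1: an empty first line (line break right after the opener) is ignored.
    if lines and lines[0] == "":
        lines = lines[1:]
    if not lines:
        return ""
    # Rule 2: the base indentation is the leading whitespace of the last line.
    last = lines[-1]
    base = last[: len(last) - len(last.lstrip(" \t"))]
    # Assumption: a last line holding only indentation contributes no text.
    if last.strip(" \t") == "":
        lines = lines[:-1]
    # Assumption: the base indentation is stripped from lines that start with it.
    stripped = [line[len(base):] if line.startswith(base) else line for line in lines]
    return "\n".join(stripped)
```

Applied to the body of the example above, the function returns "foo\nbar", and lines indented more deeply than the closing delimiter keep their extra indentation.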

Both regular and triple string literals accept the @ prefix, which designates a raw string literal. Raw string literals do not interpret escape sequences, so a backslash stands for itself, as in the following literal:

const text = @"""
    x\y
    """

Escape sequences are described by the following table:

Escape	Description
\'	U+27 single quote
\"	U+22 double quote
\\	U+5C backslash
\b	U+08 backspace
\f	U+0C form feed
\n	U+0A line feed
\r	U+0D carriage return
\t	U+09 tab
\v	U+0B vertical tab
\0	U+00 null character
\xHH	Contributes the Unicode code point given by two hexadecimal digits
\uHHHH	Contributes the Unicode code point given by four hexadecimal digits
\u{...}	Contributes the Unicode code point given by one or more hexadecimal digits
\ followed by LineTerminator	Contributes nothing (line continuation)
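The table can be turned into a small decoder for the body of a non-raw string literal. In this Python sketch, a backslash before any character not in the table stands for that character, following the NonEscapeCharacter production, and a backslash before a decimal digit other than a lone \0 is rejected, since DecimalDigit is an EscapeCharacter. Rejecting \0 when a digit follows, treating CR LF as one line terminator, and using the ECMA-262 LineTerminator set are assumptions. `decode_escapes` is a hypothetical name.

```python
from typing import Dict

SINGLE_ESCAPES: Dict[str, str] = {
    "'": "'", '"': '"', "\\": "\\", "b": "\b", "f": "\f",
    "n": "\n", "r": "\r", "t": "\t", "v": "\v",
}
HEX_DIGITS = set("0123456789abcdefABCDEF")
DECIMAL_DIGITS = "0123456789"
LINE_TERMINATORS = "\n\r\u2028\u2029"   # assumed ECMA-262 set

def _hex_value(digits: str) -> int:
    if not digits or not set(digits) <= HEX_DIGITS:
        raise SyntaxError(f"malformed hexadecimal digits {digits!r}")
    return int(digits, 16)

def decode_escapes(body: str) -> str:
    """Decode the escape sequences of a non-raw string literal body."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise SyntaxError("dangling backslash")
        esc = body[i + 1]
        i += 2
        if esc in SINGLE_ESCAPES:
            out.append(SINGLE_ESCAPES[esc])
        elif esc == "0" and not (i < len(body) and body[i] in DECIMAL_DIGITS):
            out.append("\0")                       # \0 not followed by a digit
        elif esc == "x":
            out.append(chr(_hex_value(body[i:i + 2] if len(body) >= i + 2 else "")))
            i += 2
        elif esc == "u":
            if body.startswith("{", i):
                close = body.find("}", i)
                if close == -1:
                    raise SyntaxError("unterminated \\u{...} escape")
                code_point = _hex_value(body[i + 1:close])
                i = close + 1
            else:
                code_point = _hex_value(body[i:i + 4] if len(body) >= i + 4 else "")
                i += 4
            if code_point > 0x10FFFF:
                raise SyntaxError("code point out of range")
            out.append(chr(code_point))
        elif esc in LINE_TERMINATORS:
            # Line continuation contributes nothing; CR LF counts as one (assumption).
            if esc == "\r" and body.startswith("\n", i):
                i += 1
        elif esc in DECIMAL_DIGITS:
            raise SyntaxError(f"invalid escape \\{esc}")
        else:
            out.append(esc)                        # NonEscapeCharacter stands for itself
    return "".join(out)
```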

Syntax

StringLiteral

"""

DoubleStringCharacter

_{0,}

'''

SingleStringCharacter

_{0,}

"""

TripleDoubleStringCharacter

_{0,}

"""

'''

TripleSingleStringCharacter

_{0,}

'''

RawStringLiteral

"""

DoubleStringRawCharacter

_{0,}

'''

SingleStringRawCharacter

_{0,}

@"""

TripleDoubleStringRawCharacter

_{0,}

"""

@'''

TripleSingleStringRawCharacter

_{0,}

'''

DoubleStringCharacter

SourceCharacter

LineTerminator

EscapeSequence

SingleStringCharacter

SourceCharacter

LineTerminator

EscapeSequence

DoubleStringRawCharacter

SourceCharacter

LineTerminator

SingleStringRawCharacter

SourceCharacter

LineTerminator

TripleDoubleStringCharacter

"""

SourceCharacter

LineTerminator

EscapeSequence

LineTerminator

TripleSingleStringCharacter

'''

SourceCharacter

LineTerminator

EscapeSequence

LineTerminator

TripleDoubleStringRawCharacter

"""

SourceCharacter

LineTerminator

TripleSingleStringRawCharacter

'''

SourceCharacter

LineTerminator

Escape Sequences

Syntax

EscapeSequence

CharacterEscapeSequence

DecimalDigit

LineTerminator

HexEscapeSequence

UnicodeEscapeSequence

CharacterEscapeSequence

SingleEscapeCharacter

NonEscapeCharacter

SingleEscapeCharacter

NonEscapeCharacter

SourceCharacter

EscapeCharacter

LineTerminator

EscapeCharacter

SingleEscapeCharacter

DecimalDigit

HexEscapeSequence

HexDigit

UnicodeEscapeSequence

HexDigit

_{4}

{

HexDigit

_{1,}

}

XML

This section defines the nonterminals of the lexical grammar that support the XML capabilities of the ShockScript language.

If an XMLMarkup, XMLAttributeValue, or XMLText token contains a LineTerminator once parsed, that LineTerminator is contributed to the lexical scanner.
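A minimal sketch of scanning the two XMLMarkup forms whose delimiters appear in the grammar below (XMLComment and XMLCDATA), reporting whether the markup contains a LineTerminator so that it can be contributed to the scanner. XMLPI is left out, `scan_xml_markup` is a hypothetical name, and the LineTerminator set is assumed to be that of ECMA-262.

```python
from typing import Tuple

LINE_TERMINATORS = ("\n", "\r", "\u2028", "\u2029")   # assumed ECMA-262 set

def scan_xml_markup(src: str, pos: int) -> Tuple[str, int, bool]:
    """Scan an XML comment or CDATA section at src[pos].

    Returns (markup text, end index, whether it contains a LineTerminator).
    """
    for opener, closer in (("<!--", "-->"), ("<![CDATA[", "]]>")):
        if src.startswith(opener, pos):
            # The content may not embed the closing sequence, so the first one ends it.
            close = src.find(closer, pos + len(opener))
            if close == -1:
                raise SyntaxError(f"unterminated {opener} markup")
            end = close + len(closer)
            text = src[pos:end]
            return text, end, any(t in text for t in LINE_TERMINATORS)
    raise SyntaxError("expected XML comment or CDATA markup")
```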

Syntax

XMLMarkup
    XMLComment
    XMLCDATA
    XMLPI

XMLWhitespaceCharacter

XMLWhitespace
    XMLWhitespaceCharacter
    XMLWhitespace XMLWhitespaceCharacter

XMLText

SourceCharacters

{

XMLName
    XMLNameStart
    XMLName XMLNamePart

XMLNameStart
    UnicodeLetter

XMLNamePart
    UnicodeLetter
    UnicodeDigit

XMLComment
    <!-- XMLCommentCharacters_opt -->

XMLCommentCharacters
    SourceCharacters but no embedded sequence -->

XMLCDATA
    <![CDATA[ XMLCDATACharacters ]]>

XMLCDATACharacters
    SourceCharacters but no embedded sequence ]]>

XMLPI
    <? XMLPICharacters_opt ?>

XMLPICharacters
    SourceCharacters

XMLAttributeValue
    " XMLDoubleStringCharacters_opt "
    ' XMLSingleStringCharacters_opt '

XMLDoubleStringCharacters
    SourceCharacters

XMLSingleStringCharacters
    SourceCharacters

XMLTagPunctuator

ShockScript Language Specification