Lexical conventions

This section defines the lexical grammar of the ShockScript language.

The tokenizer scans one of the following input goal symbols depending on the syntactic context: InputElementDiv, InputElementRegExp, InputElementXMLTag, InputElementPI, InputElementXMLContent.

The following program illustrates how the tokenizer decides which input goal symbol to scan:

/(?:)/       ;
a / b        ;
<a>Text</a>  ;

The following table indicates which input goal symbol is scanned for each token of the previous program:

Token     Input goal
/(?:)/    InputElementRegExp
;         InputElementDiv
a         InputElementRegExp
/         InputElementDiv
b         InputElementRegExp
;         InputElementDiv
<         InputElementRegExp
a         InputElementXMLTag
>         InputElementXMLTag
Text      InputElementXMLContent
</        InputElementXMLContent
a         InputElementXMLTag
>         InputElementXMLTag
;         InputElementDiv

The InputElementPI goal symbol must be used while parsing a <?fixed={x}?> expression.

Note: InputElementPI has nothing to do with E4X. It’s currently used in the fixed expression for escaping out of dynamic properties.
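
The effect of the goal symbol on a leading slash, as seen in the program and table above, can be sketched in Python. This is only an illustration: the function name `classify_slash` and the string-valued goal names are hypothetical, and it assumes the parser passes the current goal symbol to the tokenizer.

```python
def classify_slash(src: str, pos: int, goal: str) -> str:
    """Classify the token that starts with '/' at src[pos] under a goal symbol.

    Hypothetical helper: `goal` is assumed to be supplied by the parser.
    """
    if not src.startswith("/", pos):
        raise ValueError("expected '/' at the given position")
    # Comment appears in both InputElementDiv and InputElementRegExp.
    if src.startswith(("//", "/*"), pos):
        return "Comment"
    if goal == "InputElementDiv":
        # Only the division punctuators are available under this goal.
        return "/=" if src.startswith("/=", pos) else "/"
    if goal == "InputElementRegExp":
        # A slash begins a RegularExpressionLiteral under this goal.
        return "RegularExpressionLiteral"
    raise ValueError("a slash is not handled here for goal " + goal)
```

For `a / b`, the slash is scanned under InputElementDiv and is classified as `/`; for `/(?:)/`, it is scanned under InputElementRegExp and starts a regular expression literal.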

Syntax

    InputElementDiv ::
      WhiteSpace
      LineTerminator
      Comment
      Identifier
      ReservedWord
      Punctuator
      /
      /=
      NumericLiteral
      StringLiteral
    InputElementRegExp ::
      WhiteSpace
      LineTerminator
      Comment
      Identifier
      ReservedWord
      Punctuator
      NumericLiteral
      StringLiteral
      RegularExpressionLiteral
      XMLMarkup
      <?fixed={
    InputElementXMLTag ::
      XMLName
      XMLTagPunctuator
      XMLAttributeValue
      XMLWhitespace
      {
    InputElementPI ::
      ?>
    InputElementXMLContent ::
      XMLMarkup
      XMLText
      {
      < [lookahead ∉ { ?, !, / }]
      </

Source Characters

Syntax

    SourceCharacter ::
      Unicode code point
    SourceCharacters ::
      SourceCharacter SourceCharactersopt

White Space

The WhiteSpace token is filtered out by the lexical scanner.

Syntax

    WhiteSpace ::
      U+09 tab
      U+0B vertical tab
      U+0C form feed
      U+20 space
      U+A0 no-break space
      Unicode “space separator”

Line Terminator

The LineTerminator token is filtered out by the lexical scanner; however, it may cause a VirtualSemicolon to be inserted.

Syntax

    LineTerminator ::
      U+0A line feed
      U+0D carriage return
      U+2028 line separator
      U+2029 paragraph separator
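
A minimal Python sketch of the WhiteSpace and LineTerminator character classes follows. The function names are hypothetical, and it assumes that Unicode "space separator" corresponds to the general category `Zs` as reported by Python's `unicodedata` module.

```python
import unicodedata

# Code points listed explicitly by the WhiteSpace production.
WHITESPACE = {"\u0009", "\u000B", "\u000C", "\u0020", "\u00A0"}
# Code points listed by the LineTerminator production.
LINE_TERMINATORS = {"\u000A", "\u000D", "\u2028", "\u2029"}


def is_whitespace(ch: str) -> bool:
    """True if the single character ch matches WhiteSpace."""
    # Assumption: "space separator" is the Unicode general category Zs.
    return ch in WHITESPACE or unicodedata.category(ch) == "Zs"


def is_line_terminator(ch: str) -> bool:
    """True if the single character ch matches LineTerminator."""
    return ch in LINE_TERMINATORS
```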

Comment

The Comment token is filtered out by the lexical scanner; however, it propagates any LineTerminator token from its characters. Multi-line comments may nest, as in the following example:

/*
 * /*
 *  *
 *  */
 */

Syntax

    Comment ::
      // SingleLineCommentCharacters
      MultiLineComment
    SingleLineCommentCharacters ::
      SingleLineCommentCharacter SingleLineCommentCharactersopt
    SingleLineCommentCharacter ::
      [lookahead ∉ { LineTerminator }] SourceCharacter
    MultiLineComment ::
      /* MultiLineCommentCharactersopt */
    MultiLineCommentCharacters ::
      SourceCharacters [but no embedded sequence /*]
      MultiLineComment
      MultiLineCommentCharacters SourceCharacters [but no embedded sequence /*]
      MultiLineCommentCharacters MultiLineComment
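
The nesting rule of MultiLineComment can be sketched in Python with a depth counter. This is an illustration only: `scan_comment` is a hypothetical name, and the returned flag models the statement that a comment propagates any LineTerminator it contains.

```python
LINE_TERMINATORS = "\n\r\u2028\u2029"


def scan_comment(src: str, pos: int = 0):
    """Scan a Comment at src[pos].

    Returns (end_index, contains_line_terminator), or None if no comment
    starts at pos. Raises SyntaxError for an unterminated multi-line comment.
    """
    if src.startswith("//", pos):
        end = pos + 2
        # A single-line comment stops before the line terminator.
        while end < len(src) and src[end] not in LINE_TERMINATORS:
            end += 1
        return end, False
    if not src.startswith("/*", pos):
        return None
    depth = 0
    saw_line_terminator = False
    i = pos
    while i < len(src):
        if src.startswith("/*", i):
            depth += 1  # entering a (possibly nested) comment
            i += 2
        elif src.startswith("*/", i):
            depth -= 1  # leaving one nesting level
            i += 2
            if depth == 0:
                return i, saw_line_terminator
        else:
            if src[i] in LINE_TERMINATORS:
                saw_line_terminator = True
            i += 1
    raise SyntaxError("unterminated multi-line comment")
```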

Virtual Semicolon

The VirtualSemicolon nonterminal matches an automatically inserted semicolon, known as a virtual semicolon.

Virtual semicolons are inserted on the following occasions:

  • After a right-curly character }
  • Before a LineTerminator

Identifier

The Identifier symbol is similar to that of the ECMA-262 third edition, but with support for scalar Unicode escapes, \xXX escapes and a \x{...} escape (an alias of \u{...}).

Syntax

    Identifier ::
      IdentifierName [but not ReservedWord or ContextKeyword]
      ContextKeyword
    IdentifierName ::
      IdentifierStart
      IdentifierName IdentifierPart
    IdentifierStart ::
      UnicodeLetter
      underscore _
      $
      UnicodeEscapeSequence
    IdentifierPart ::
      UnicodeLetter
      UnicodeCombiningMark
      UnicodeConnectorPunctuation
      UnicodeDigit
      underscore _
      $
      UnicodeEscapeSequence
    UnicodeLetter ::
      Unicode letter (“L”)
      Unicode letter number (“Nl”)
    UnicodeDigit ::
      Unicode decimal digit number (“Nd”)
    UnicodeCombiningMark ::
      Unicode nonspacing mark (“Mn”)
      Unicode spacing combining mark (“Mc”)
    UnicodeConnectorPunctuation ::
      Unicode connector punctuation (“Pc”)
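
The character classes used by IdentifierName map directly onto Unicode general categories, which the following Python sketch checks with the standard `unicodedata` module. The function names are hypothetical, and UnicodeEscapeSequence inside identifiers is deliberately left out to keep the sketch short.

```python
import unicodedata


def is_unicode_letter(ch: str) -> bool:
    """UnicodeLetter: any letter category ("L") or letter number ("Nl")."""
    category = unicodedata.category(ch)
    return category.startswith("L") or category == "Nl"


def is_identifier_start(ch: str) -> bool:
    """IdentifierStart, ignoring UnicodeEscapeSequence."""
    return is_unicode_letter(ch) or ch in ("_", "$")


def is_identifier_part(ch: str) -> bool:
    """IdentifierPart, ignoring UnicodeEscapeSequence."""
    # Mn/Mc: combining marks, Pc: connector punctuation, Nd: decimal digits.
    return is_identifier_start(ch) or unicodedata.category(ch) in {"Mn", "Mc", "Pc", "Nd"}


def is_identifier_name(text: str) -> bool:
    """True if text matches IdentifierName (reserved words are not checked)."""
    return (
        len(text) > 0
        and is_identifier_start(text[0])
        and all(is_identifier_part(ch) for ch in text[1:])
    )
```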

Keywords

ReservedWord includes the following reserved words:

as
do
if
in
is

for
let
new
not
try
use
var

case
else
null
this
true
void
with

await
break
catch
class
const
false
super
throw
while
yield

delete
import
public
return
switch
typeof

default
extends
finally
package
private

continue
function
internal

interface
protected

implements

ContextKeyword is one of the following in certain syntactic contexts:

get
map
set
tap
xml

each
enum
meta
type

Embed
final

native
static

decimal
generic

abstract
override

namespace
undefined

Punctuator

Punctuator includes one of the following:

::  @
.  ..  ...
(  )  [  ]  {  }
:  ;  ,
?  !  =
?.
<  <=
>  >=
==  ===
!=  !==
+  -  *  %  **
++  --
<<  >>  >>>
&  ^  |  ~
&&  ^^  ||  ??

The @ punctuator must not be followed by a single quote or a double quote character.

Punctuator includes CompoundAssignmentPunctuator. CompoundAssignmentPunctuator is one of the following:

+=  -=  *=  %=  **=
<<=  >>=  >>>=  &=  ^=  |=
&&=  ^^=  ||=
??=
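
A common way to scan punctuators such as these is longest-match: try the longer spellings first so that `>>>=` is not split into `>>` and `>=`. The Python sketch below assumes that strategy, which this section does not prescribe; `scan_punctuator` is a hypothetical name. The `/` and `/=` tokens are omitted because they belong to InputElementDiv rather than Punctuator.

```python
PUNCTUATORS = (
    ":: @ . .. ... ( ) [ ] { } : ; , ? ! = ?. < <= > >= == === != !== "
    "+ - * % ** ++ -- << >> >>> & ^ | ~ && ^^ || ??"
).split()
COMPOUND_ASSIGNMENT = "+= -= *= %= **= <<= >>= >>>= &= ^= |= &&= ^^= ||= ??=".split()

# Longest spellings first, so that the longest match wins.
_BY_LENGTH = sorted(PUNCTUATORS + COMPOUND_ASSIGNMENT, key=len, reverse=True)


def scan_punctuator(src: str, pos: int = 0):
    """Return the longest punctuator starting at src[pos], or None."""
    for punctuator in _BY_LENGTH:
        if src.startswith(punctuator, pos):
            # '@' directly before a quote starts a raw string, not a punctuator.
            if punctuator == "@" and src[pos + 1:pos + 2] in ("'", '"'):
                return None
            return punctuator
    return None
```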

Numeric Literal

NumericLiteral is similar to NumericLiteral from the ECMA-262 third edition, with support for binary and octal literals, underscore separators and certain suffixes:

0b1011
0o77777
0x0A
1_000

10d     // double(10) or simply 10
10f     // float(10)
10i     // int(10)
10m     // decimal(10). "m" for money
10n     // bigint(10)
10u     // uint(10)

Syntax

    NumericLiteral ::
      DecimalLiteral DecimalLiteralSuffixopt [lookahead ∉ { IdentifierStart, DecimalDigit }]
      HexIntegerLiteral HexLiteralSuffixopt [lookahead ∉ { IdentifierStart, DecimalDigit }]
      BinIntegerLiteral BinLiteralSuffixopt [lookahead ∉ { IdentifierStart, DecimalDigit }]
      OctalIntegerLiteral OctalLiteralSuffixopt [lookahead ∉ { IdentifierStart, DecimalDigit }]
    DecimalLiteralSuffix ::
      d
      D
      f
      F
      i
      I
      m
      M
      n
      N
      u
      U
    HexLiteralSuffix ::
      i
      I
      m
      M
      n
      N
      u
      U
    BinLiteralSuffix ::
      DecimalLiteralSuffix
    OctalLiteralSuffix ::
      DecimalLiteralSuffix
    DecimalLiteral ::
      DecimalIntegerLiteral . UnderscoreDecimalDigitsopt ExponentPartopt
      . UnderscoreDecimalDigits ExponentPartopt
      DecimalIntegerLiteral ExponentPartopt
    DecimalIntegerLiteral ::
      0
      [lookahead = NonZeroDigit] UnderscoreDecimalDigitsopt
    DecimalDigits ::
      DecimalDigit{1,}
    UnderscoreDecimalDigits ::
      DecimalDigits
      UnderscoreDecimalDigits _ DecimalDigits
    DecimalDigit ::
      0-9
    NonZeroDigit ::
      1-9
    ExponentPart ::
      ExponentIndicator SignedInteger
    ExponentIndicator ::
      e
      E
    SignedInteger ::
      UnderscoreDecimalDigits
      + UnderscoreDecimalDigits
      - UnderscoreDecimalDigits
    HexIntegerLiteral ::
      0x UnderscoreHexDigits
      0X UnderscoreHexDigits
    HexDigit ::
      0-9
      A-F
      a-f
    UnderscoreHexDigits ::
      HexDigit{1,}
      UnderscoreHexDigits _ HexDigit{1,}
    BinIntegerLiteral ::
      0b UnderscoreBinDigits
      0B UnderscoreBinDigits
    BinDigit ::
      0
      1
    UnderscoreBinDigits ::
      BinDigit{1,}
      UnderscoreBinDigits _ BinDigit{1,}
    OctalIntegerLiteral ::
      0o UnderscoreOctalDigits
      0O UnderscoreOctalDigits
    OctalDigit ::
      0-7
    UnderscoreOctalDigits ::
      OctalDigit{1,}
      UnderscoreOctalDigits _ OctalDigit{1,}
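
As a rough cross-check of the productions above, the NumericLiteral grammar can be transcribed into a Python regular expression. This is a sketch under two stated simplifications: the trailing lookahead approximates IdentifierStart with ASCII letters, `_` and `$`, and underscores are accepted only between digit runs. The names `NUMERIC_LITERAL` and `match_numeric_literal` are hypothetical.

```python
import re

_DEC = r"[0-9]+(?:_[0-9]+)*"          # UnderscoreDecimalDigits
_INT = rf"(?:0|(?=[1-9]){_DEC})"      # DecimalIntegerLiteral
_EXP = rf"(?:[eE][+-]?{_DEC})"        # ExponentPart
_DECIMAL = rf"(?:{_INT}\.(?:{_DEC})?{_EXP}?|\.{_DEC}{_EXP}?|{_INT}{_EXP}?)"
_SUFFIX = r"[dDfFiImMnNuU]?"          # DecimalLiteralSuffix (optional)
_HEX_SUFFIX = r"[iImMnNuU]?"          # HexLiteralSuffix (optional)
# Simplified lookahead: no ASCII identifier start or decimal digit may follow.
_END = r"(?![A-Za-z_$0-9])"

NUMERIC_LITERAL = re.compile(
    rf"(?:0[xX][0-9A-Fa-f]+(?:_[0-9A-Fa-f]+)*{_HEX_SUFFIX}"
    rf"|0[bB][01]+(?:_[01]+)*{_SUFFIX}"
    rf"|0[oO][0-7]+(?:_[0-7]+)*{_SUFFIX}"
    rf"|{_DECIMAL}{_SUFFIX}){_END}"
)


def match_numeric_literal(src: str, pos: int = 0):
    """Return the NumericLiteral text starting at src[pos], or None."""
    match = NUMERIC_LITERAL.match(src, pos)
    return match.group(0) if match else None
```

Under these assumptions, `10m` and `1_000` are accepted, while `10dx` is rejected because an identifier character follows the suffix.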

Regular Expression Literal

RegularExpressionLiteral is similar to RegularExpressionLiteral from the ECMA-262 third edition, with support for line breaks.

Syntax

    RegularExpressionLiteral ::
      / RegularExpressionBody / RegularExpressionFlags
    RegularExpressionBody ::
      RegularExpressionFirstChar RegularExpressionChars
    RegularExpressionChars ::
      «empty»
      RegularExpressionChars RegularExpressionChar
    RegularExpressionFirstChar ::
      SourceCharacter [but not * or \ or /]
      BackslashSequence
    RegularExpressionChar ::
      SourceCharacter [but not \ or /]
      BackslashSequence
    BackslashSequence ::
      \ SourceCharacter
    RegularExpressionFlags ::
      «empty»
      RegularExpressionFlags IdentifierPart
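
The grammar above can be followed almost literally by a hand-written scanner. The Python sketch below is illustrative only: `scan_regexp` is a hypothetical name, and flag characters are approximated with `str.isalnum()` plus `_` and `$` instead of the full IdentifierPart definition.

```python
def scan_regexp(src: str, pos: int = 0):
    """Scan a RegularExpressionLiteral at src[pos].

    Returns (body, flags, end_index), or None if no literal starts at pos.
    """
    if not src.startswith("/", pos):
        return None
    i = pos + 1
    # RegularExpressionFirstChar: the body may not begin with '*' or '/'.
    if i >= len(src) or src[i] in ("*", "/"):
        return None
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            # BackslashSequence: a backslash followed by any source character.
            if i + 1 >= len(src):
                return None
            i += 2
        elif ch == "/":
            body = src[pos + 1:i]
            j = i + 1
            # RegularExpressionFlags (approximation of IdentifierPart).
            while j < len(src) and (src[j].isalnum() or src[j] in ("_", "$")):
                j += 1
            return body, src[i + 1:j], j
        else:
            # Line breaks are allowed inside the body.
            i += 1
    return None
```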

String Literal

StringLiteral is similar to the StringLiteral symbol from the ECMA-262 third edition. The following additional features are included:

  • Scalar UnicodeEscapeSequence using the \u{...} or \x{...} form
  • Triple strings
  • Raw strings using the @ prefix

Triple string literals use either """ or ''' as delimiter and may span multiple lines. The contents of triple string literals are indentation-based, as can be observed in the following program:

const text = """
    foo
    bar
"""
text == "foo\nbar"

Triple strings are processed as follows:

  • The base line for determining indentation is the first non-empty line (that is, a line that does not consist only of whitespace) with the lowest indentation level.
  • The contents of every line start at the column of the base line’s first non-whitespace character.
  • Beginning and end lines that are empty or consist only of whitespace are discarded.
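
The processing steps above can be sketched in Python as follows. This is one possible reading, not a reference implementation: `dedent_triple_string` is a hypothetical name, lines are assumed to be separated by `\n` only, only the very first and very last line are dropped when blank, interior blank lines become empty lines, and escape sequences are not handled.

```python
def dedent_triple_string(raw: str) -> str:
    """Apply indentation-based processing to the raw contents of a triple string."""
    lines = raw.split("\n")
    # Discard a blank beginning line and a blank end line (assumption: one each).
    if lines and lines[0].strip() == "":
        lines = lines[1:]
    if lines and lines[-1].strip() == "":
        lines = lines[:-1]
    # The base line is the non-blank line with the lowest indentation.
    indents = [len(line) - len(line.lstrip()) for line in lines if line.strip()]
    base = min(indents, default=0)
    # Every line starts at the base line's first non-whitespace column.
    return "\n".join(line[base:] if line.strip() else "" for line in lines)
```

Applied to the contents of the earlier example, `dedent_triple_string("\n    foo\n    bar\n")` yields `"foo\nbar"`.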

Both regular and triple strings accept the @ prefix, designating raw string literals. Raw string literals contain no escape sequences.

const text = @"""
    x\y
"""

Escape sequences are described by the following table:

Escape                          Description
\'                              U+27 single-quote
\"                              U+22 double-quote
\\                              U+5C backslash character
\b                              U+08 backspace character
\f                              U+0C form feed character
\n                              U+0A line feed character
\r                              U+0D carriage return character
\t                              U+09 tab character
\v                              U+0B vertical tab character
\0                              U+00 character
\xHH                            Contributes a Unicode code point value
\uHHHH                          Contributes a Unicode code point value
\u{…}                           Contributes a Unicode code point value
\ followed by LineTerminator    Contributes nothing

Syntax

    StringLiteral ::
      [lookahead ≠ """] " DoubleStringCharacter{0,} "
      [lookahead ≠ '''] ' SingleStringCharacter{0,} '
      """ TripleDoubleStringCharacter{0,} """
      ''' TripleSingleStringCharacter{0,} '''
      RawStringLiteral
    RawStringLiteral ::
      @ [lookahead ≠ """] " DoubleStringRawCharacter{0,} "
      @ [lookahead ≠ '''] ' SingleStringRawCharacter{0,} '
      @""" TripleDoubleStringRawCharacter{0,} """
      @''' TripleSingleStringRawCharacter{0,} '''
    DoubleStringCharacter ::
      SourceCharacter [but not double-quote " or backslash \ or LineTerminator]
      EscapeSequence
    SingleStringCharacter ::
      SourceCharacter [but not single-quote ' or backslash \ or LineTerminator]
      EscapeSequence
    DoubleStringRawCharacter ::
      SourceCharacter [but not double-quote " or LineTerminator]
    SingleStringRawCharacter ::
      SourceCharacter [but not single-quote ' or LineTerminator]
    TripleDoubleStringCharacter ::
      [lookahead ≠ """] SourceCharacter [but not backslash \ or LineTerminator]
      EscapeSequence
      LineTerminator
    TripleSingleStringCharacter ::
      [lookahead ≠ '''] SourceCharacter [but not backslash \ or LineTerminator]
      EscapeSequence
      LineTerminator
    TripleDoubleStringRawCharacter ::
      [lookahead ≠ """] SourceCharacter [but not LineTerminator]
      LineTerminator
    TripleSingleStringRawCharacter ::
      [lookahead ≠ '''] SourceCharacter [but not LineTerminator]
      LineTerminator

Escape Sequences

Syntax

    EscapeSequence ::
      \ CharacterEscapeSequence
      \0 [lookahead ∉ DecimalDigit]
      \ LineTerminator
      UnicodeEscapeSequence
    CharacterEscapeSequence ::
      SingleEscapeCharacter
      NonEscapeCharacter
    SingleEscapeCharacter ::
      '
      "
      \
      b
      f
      n
      r
      t
      v
    NonEscapeCharacter ::
      SourceCharacter [but not EscapeCharacter or LineTerminator]
    EscapeCharacter ::
      SingleEscapeCharacter
      DecimalDigit
      x
      u
    UnicodeEscapeSequence ::
      \x HexDigit HexDigit
      \u HexDigit{4}
      \x { HexDigit{1,} }
      \u { HexDigit{1,} }
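
The EscapeSequence productions can be exercised with a small decoder, sketched here in Python. Several points are assumptions rather than statements of this section: `decode_escapes` is a hypothetical name, a NonEscapeCharacter is taken to contribute itself (as in ECMA-262), a backslash before a line terminator consumes exactly one terminator character, and malformed escapes raise `ValueError`.

```python
_SIMPLE = {"'": "'", '"': '"', "\\": "\\", "b": "\b", "f": "\f",
           "n": "\n", "r": "\r", "t": "\t", "v": "\v"}
_LINE_TERMINATORS = {"\n", "\r", "\u2028", "\u2029"}
_DIGITS = set("0123456789")
_HEX = set("0123456789abcdefABCDEF")


def decode_escapes(text: str) -> str:
    """Decode the escape sequences in the contents of a non-raw string."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("dangling backslash")
        nxt = text[i + 1]
        if nxt in _SIMPLE:                      # SingleEscapeCharacter
            out.append(_SIMPLE[nxt])
            i += 2
        elif nxt in _LINE_TERMINATORS:          # contributes nothing
            i += 2
        elif nxt == "0" and text[i + 2:i + 3] not in _DIGITS:
            out.append("\0")
            i += 2
        elif nxt in ("x", "u"):                 # UnicodeEscapeSequence
            if text[i + 2:i + 3] == "{":
                close = text.find("}", i + 3)
                digits = text[i + 3:close] if close != -1 else ""
                end = close + 1
            else:
                width = 2 if nxt == "x" else 4
                digits = text[i + 2:i + 2 + width]
                end = i + 2 + width
                if len(digits) != width:
                    digits = ""
            if not digits or any(d not in _HEX for d in digits):
                raise ValueError("malformed \\" + nxt + " escape")
            out.append(chr(int(digits, 16)))
            i = end
        elif nxt in _DIGITS:                    # not a NonEscapeCharacter
            raise ValueError("invalid escape \\" + nxt)
        else:                                   # NonEscapeCharacter (assumed identity)
            out.append(nxt)
            i += 2
    return "".join(out)
```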

XML

This section defines nonterminals used in the lexical grammar as part of the XML capabilities of the ShockScript language.

If an XMLMarkup, XMLAttributeValue or XMLText contains a LineTerminator once parsed, it contributes that LineTerminator to the lexical scanner.

Syntax

    XMLMarkup ::
      XMLComment
      XMLCDATA
      XMLPI
    XMLWhitespaceCharacter ::
      U+20 space
      U+09 tab
      U+0D carriage return
      U+0A line feed
    XMLWhitespace ::
      XMLWhitespaceCharacter
      XMLWhitespace XMLWhitespaceCharacter
    XMLText ::
      SourceCharacters [but no embedded left-curly { or less-than <]
    XMLName ::
      XMLNameStart
      XMLName XMLNamePart
    XMLNameStart ::
      UnicodeLetter
      underscore _
      colon :
    XMLNamePart ::
      UnicodeLetter
      UnicodeDigit
      period .
      hyphen -
      underscore _
      colon :
    XMLComment ::
      <!-- XMLCommentCharactersopt -->
    XMLCommentCharacters ::
      SourceCharacters [but no embedded sequence -->]
    XMLCDATA ::
      <![CDATA[ XMLCDATACharacters ]]>
    XMLCDATACharacters ::
      SourceCharacters [but no embedded sequence ]]>]
    XMLPI ::
      <? XMLPICharactersopt ?>
    XMLPICharacters ::
      SourceCharacters [but no embedded sequence ?>]
    XMLAttributeValue ::
      " XMLDoubleStringCharactersopt "
      ' XMLSingleStringCharactersopt '
    XMLDoubleStringCharacters ::
      SourceCharacters [but no embedded double-quote "]
    XMLSingleStringCharacters ::
      SourceCharacters [but no embedded single-quote ']
    XMLTagPunctuator ::
      =
      &=
      >
      />

Semantics

XMLCDATA contents, excluding the <![CDATA[ opening sequence and the ]]> closing sequence, are processed the same way as triple strings:

  • The base line for determining indentation is the first non-empty line (that is, a line that does not consist only of whitespace) with the lowest indentation level.
  • The contents of every line start at the column of the base line’s first non-whitespace character.
  • Beginning and end lines that are empty or consist only of whitespace are discarded.

For XMLText, unlike in the E4X standard, ShockScript always trims any whitespace at the beginning and end of the text. The parser can skip the token if it is empty after trimming whitespace. Note that this does not apply to the XML or XMLList parsers at runtime; they ignore whitespace depending on the XMLContext object specified by the use xml pragma.
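
The XMLText trimming rule can be sketched in Python as follows. The name `normalize_xml_text` is hypothetical, and the sketch assumes that "whitespace" here means the four XMLWhitespaceCharacter code points.

```python
# Assumption: whitespace for trimming is XMLWhitespaceCharacter
# (space, tab, carriage return, line feed).
XML_WHITESPACE = " \t\r\n"


def normalize_xml_text(text: str):
    """Trim an XMLText token; return None when the parser may skip it."""
    trimmed = text.strip(XML_WHITESPACE)
    return trimmed if trimmed else None
```

With this sketch, `"\n  Text \n"` becomes `"Text"`, and a token that consists only of whitespace is reported as skippable.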