Lexical conventions

This section defines the lexical grammar of the ShockScript language.

The tokenizer scans one of the following input goal symbols depending on the syntactic context: InputElementDiv, InputElementRegExp, InputElementXMLTag, InputElementXMLPI, InputElementXMLContent.

The following program illustrates how the tokenizer decides which input goal symbol to scan:

/(?:)/       ;
a / b        ;
<a>Text</a>  ;

The following table shows the input goal symbol scanned for each token of the previous program:

Token     Input goal
/(?:)/    InputElementRegExp
;         InputElementDiv
a         InputElementRegExp
/         InputElementDiv
b         InputElementRegExp
;         InputElementDiv
<         InputElementRegExp
a         InputElementXMLTag
>         InputElementXMLTag
Text      InputElementXMLContent
</        InputElementXMLContent
a         InputElementXMLTag
>         InputElementXMLTag
;         InputElementDiv

The InputElementXMLPI goal symbol must be used when parsing the <?html={exp}?> markup.

Syntax

    InputElementDiv ::
      WhiteSpace
      LineTerminator
      Comment
      Identifier
      ReservedWord
      Punctuator
      /
      /=
      NumericLiteral
      StringLiteral
    InputElementRegExp ::
      WhiteSpace
      LineTerminator
      Comment
      Identifier
      ReservedWord
      Punctuator
      NumericLiteral
      StringLiteral
      RegularExpressionLiteral
      XMLMarkup
    InputElementXMLTag ::
      XMLName
      XMLTagPunctuator
      XMLAttributeValue
      XMLWhitespace
      {
    InputElementXMLPI ::
      {
      ?>
    InputElementXMLContent ::
      XMLMarkup
      XMLText
      {
      < [lookahead ∉ { ?, !, / }]
      </

Source Characters

Syntax

    SourceCharacter ::
      Unicode code point
    SourceCharacters ::
      SourceCharacter SourceCharactersopt

White Space

The WhiteSpace token is filtered out by the lexical scanner.

Syntax

    WhiteSpace ::
      U+09 tab
      U+0B vertical tab
      U+0C form feed
      U+20 space
      U+A0 no-break space
      Unicode “space separator”

Line Terminator

The LineTerminator token is filtered out by the lexical scanner; however, it may cause a VirtualSemicolon to be inserted.

Syntax

    LineTerminator ::
      U+0A line feed
      U+0D carriage return
      U+2028 line separator
      U+2029 paragraph separator
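A minimal Python sketch of how a scanner might classify single characters against the WhiteSpace and LineTerminator productions. The helper names are hypothetical; the Unicode "space separator" rule is approximated with the general category Zs reported by Python's unicodedata module.

```python
import unicodedata

# Hypothetical helpers restating the WhiteSpace and LineTerminator productions.
LINE_TERMINATORS = frozenset("\n\r\u2028\u2029")

def is_white_space(ch: str) -> bool:
    """True if the single character ch matches WhiteSpace."""
    if len(ch) != 1:
        return False
    # Tab, vertical tab and form feed are listed explicitly; U+20 space,
    # U+A0 no-break space and all other "space separator" (Zs) characters
    # are covered by the category test.
    return ch in "\t\v\f" or unicodedata.category(ch) == "Zs"

def is_line_terminator(ch: str) -> bool:
    """True if ch matches LineTerminator."""
    return ch in LINE_TERMINATORS
```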

Comment

The Comment token is filtered out by the lexical scanner; however, any LineTerminator among its characters is still propagated to the scanner. Multi-line comments may be nested, as in the following example:

/*
 * /*
 *  *
 *  */
 */

Syntax

    Comment ::
      // SingleLineCommentCharacters
      MultiLineComment
    SingleLineCommentCharacters ::
      SingleLineCommentCharacter SingleLineCommentCharactersopt
    SingleLineCommentCharacter ::
      [lookahead ∉ { LineTerminator }] SourceCharacter
    MultiLineComment ::
      /* MultiLineCommentCharactersopt */
    MultiLineCommentCharacters ::
      SourceCharacters [but no embedded sequence /* or */]
      MultiLineComment
      MultiLineCommentCharacters SourceCharacters [but no embedded sequence /* or */]
      MultiLineCommentCharacters MultiLineComment
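Because multi-line comments nest, a scanner has to track nesting depth instead of stopping at the first */. The following Python sketch (skip_comment is a hypothetical name) skips one Comment and reports whether a LineTerminator occurred inside it, so the caller can still propagate that LineTerminator as described above.

```python
from typing import Tuple

LINE_TERMINATORS = "\n\r\u2028\u2029"

def skip_comment(source: str, pos: int) -> Tuple[int, bool]:
    """Skip the Comment starting at source[pos].

    Returns (end, saw_line_terminator): the index just past the comment and
    whether a LineTerminator occurred inside it.
    """
    if source.startswith("//", pos):
        end = pos + 2
        # SingleLineCommentCharacters stop before the LineTerminator,
        # which is left for the scanner to handle normally.
        while end < len(source) and source[end] not in LINE_TERMINATORS:
            end += 1
        return end, False
    if not source.startswith("/*", pos):
        raise ValueError("no comment at this position")
    depth = 1
    i = pos + 2
    saw_line_terminator = False
    while i < len(source):
        if source.startswith("/*", i):       # nested MultiLineComment opens
            depth += 1
            i += 2
        elif source.startswith("*/", i):     # innermost open comment closes
            depth -= 1
            i += 2
            if depth == 0:
                return i, saw_line_terminator
        else:
            if source[i] in LINE_TERMINATORS:
                saw_line_terminator = True
            i += 1
    raise ValueError("unterminated multi-line comment")
```

For example, skip_comment("/* a /* b */ c */x", 0) stops just before x, having consumed both the inner and the outer comment.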

Virtual Semicolon

The VirtualSemicolon nonterminal matches an automatically inserted semicolon, known as a virtual semicolon.

Virtual semicolons are inserted on the following occasions:

  • After a right-curly character }
  • Before a LineTerminator

Identifier

The Identifier symbol is similar to that of the ECMA-262 third edition, but also supports the scalar \u{...} form of UnicodeEscapeSequence.

Syntax

    Identifier ::
      IdentifierName [but not ReservedWord or ContextKeyword]
      ContextKeyword
    IdentifierName ::
      IdentifierStart
      IdentifierName IdentifierPart
    IdentifierStart ::
      UnicodeLetter
      underscore _
      $
      UnicodeEscapeSequence
    IdentifierPart ::
      UnicodeLetter
      UnicodeCombiningMark
      UnicodeConnectorPunctuation
      UnicodeDigit
      underscore _
      $
      UnicodeEscapeSequence
    UnicodeLetter ::
      Unicode letter (“L”)
      Unicode letter number (“Nl”)
    UnicodeDigit ::
      Unicode decimal digit number (“Nd”)
    UnicodeCombiningMark ::
      Unicode nonspacing mark (“Mn”)
      Unicode spacing combining mark (“Mc”)
    UnicodeConnectorPunctuation ::
      Unicode connector punctuation (“Pc”)
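A Python sketch of scanning an IdentifierName with these productions, mapping each Unicode class to its general category code. The function names are hypothetical, and the check that an escape must denote a valid identifier character is an assumption carried over from ECMA-262 rather than something stated above.

```python
import unicodedata
from typing import List, Optional, Tuple

HEX_DIGITS = "0123456789abcdefABCDEF"
# UnicodeLetter: categories Lu, Ll, Lt, Lm, Lo ("L") and Nl.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# IdentifierPart adds combining marks, decimal digits and connector punctuation.
PART_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}

def read_unicode_escape(source: str, pos: int) -> Optional[Tuple[str, int]]:
    """Decode a UnicodeEscapeSequence at pos; None if there is none there."""
    if not source.startswith("\\u", pos):
        return None
    if source.startswith("{", pos + 2):                 # \u{H...} form
        close = source.find("}", pos + 3)
        if close == -1:
            raise ValueError("unterminated \\u{...} escape")
        digits, end = source[pos + 3:close], close + 1
    else:                                               # \uHHHH form
        digits, end = source[pos + 2:pos + 6], pos + 6
        if len(digits) != 4:
            raise ValueError("\\u needs four hex digits")
    if not digits or any(c not in HEX_DIGITS for c in digits):
        raise ValueError("malformed Unicode escape")
    value = int(digits, 16)
    if value > 0x10FFFF:
        raise ValueError("code point out of range")
    return chr(value), end

def scan_identifier_name(source: str, pos: int) -> Optional[Tuple[str, int]]:
    """Scan an IdentifierName; returns (decoded name, end index) or None."""
    chars: List[str] = []
    i = pos
    while i < len(source):
        escape = read_unicode_escape(source, i)
        ch, next_i = escape if escape else (source[i], i + 1)
        allowed = PART_CATEGORIES if chars else START_CATEGORIES
        if ch not in "_$" and unicodedata.category(ch) not in allowed:
            if escape:
                # Assumption: an escape must denote a valid identifier character.
                raise ValueError("escape does not denote an identifier character")
            break
        chars.append(ch)
        i = next_i
    return ("".join(chars), i) if chars else None
```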

Keywords

ReservedWord includes the following reserved words:

as
do
if
in
is

for
new
not
try
use
var

case
else
null
this
true
void
with

await
break
catch
class
const
false
super
throw
while
yield

delete
import
public
return
switch
typeof

default
extends
finally
package
private

continue
function
internal

interface
protected

implements

ContextKeyword is one of the following in certain syntactic contexts:

get
set

each
enum
type

Embed
final

native
static

abstract
override

namespace

Punctuator

Punctuator includes the following:

::  @
.  ..  ...
(  )  [  ]  {  }
:  ;  ,
?  !  =
?.
<  <=
>  >=
==  ===
!=  !==
+  -  *  %  **
++  --
<<  >>  >>>
&  ^  |  ~
&&  ^^  ||  ??

The @ punctuator must not be immediately followed by a single-quote ' or double-quote " character, since that sequence begins a raw string literal.

Punctuator includes CompoundAssignmentPunctuator. CompoundAssignmentPunctuator is one of the following:

+=  -=  *=  %=  **=
<<=  >>=  >>>=  &=  ^=  |=
&&=  ^^=  ||=
??=
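A Python sketch of recognizing these punctuators, assuming the usual longest-match (maximal munch) rule, which this section does not state explicitly. scan_punctuator is a hypothetical name; / and /= are omitted because only the InputElementDiv goal produces them.

```python
from typing import Optional

# Punctuator and CompoundAssignmentPunctuator lexemes from the lists above.
PUNCTUATORS = """
    :: @ . .. ... ( ) [ ] { } : ; , ? ! = ?. < <= > >= == === != !==
    + - * % ** ++ -- << >> >>> & ^ | ~ && ^^ || ??
    += -= *= %= **= <<= >>= >>>= &= ^= |= &&= ^^= ||= ??=
""".split()
# Longest lexemes first, so ">>>=" is preferred over ">>>", ">>" and ">".
BY_LENGTH = sorted(PUNCTUATORS, key=len, reverse=True)

def scan_punctuator(source: str, pos: int) -> Optional[str]:
    """Return the longest punctuator starting at pos, or None."""
    for lexeme in BY_LENGTH:
        if source.startswith(lexeme, pos):
            # @ directly followed by a quote begins a raw string literal instead.
            if lexeme == "@" and source[pos + 1:pos + 2] in ("'", '"'):
                return None
            return lexeme
    return None
```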

Numeric Literal

NumericLiteral is similar to NumericLiteral from the ECMA-262 third edition, with support for binary literals and underscore separators:

0b1011
1_000

Syntax

    NumericLiteral ::
      DecimalLiteral [lookahead ∉ { IdentifierStart, DecimalDigit }]
      HexIntegerLiteral [lookahead ∉ { IdentifierStart, DecimalDigit }]
      BinIntegerLiteral [lookahead ∉ { IdentifierStart, DecimalDigit }]
    DecimalLiteral ::
      DecimalIntegerLiteral . UnderscoreDecimalDigitsopt ExponentPartopt
      . UnderscoreDecimalDigits ExponentPartopt
      DecimalIntegerLiteral ExponentPartopt
    DecimalIntegerLiteral ::
      0
      [lookahead = NonZeroDigit] UnderscoreDecimalDigits
    DecimalDigits ::
      DecimalDigit{1,}
    UnderscoreDecimalDigits ::
      DecimalDigits
      UnderscoreDecimalDigits _ DecimalDigits
    DecimalDigit ::
      0-9
    NonZeroDigit ::
      1-9
    ExponentPart ::
      ExponentIndicator SignedInteger
    ExponentIndicator ::
      e
      E
    SignedInteger ::
      UnderscoreDecimalDigits
      + UnderscoreDecimalDigits
      - UnderscoreDecimalDigits
    HexIntegerLiteral ::
      0x UnderscoreHexDigits
      0X UnderscoreHexDigits
    HexDigit ::
      0-9
      A-F
      a-f
    UnderscoreHexDigits ::
      HexDigit{1,}
      UnderscoreHexDigits _ HexDigit{1,}
    BinIntegerLiteral ::
      0b UnderscoreBinDigits
      0B UnderscoreBinDigits
    BinDigit ::
      0
      1
    UnderscoreBinDigits ::
      BinDigit{1,}
      UnderscoreBinDigits _ BinDigit{1,}
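The productions above translate fairly directly into regular expressions. This Python sketch (scan_numeric_literal is a hypothetical name) accepts underscores only between digits, rejects leading zeros such as 0123, and approximates the trailing [lookahead ∉ { IdentifierStart, DecimalDigit }] with Python's \w class plus $ and backslash. It does not model longest-match error reporting for inputs such as 1.x.

```python
import re
from typing import Optional

# Regular expressions mirroring the NumericLiteral productions above.
DIGITS = r"[0-9]+(?:_[0-9]+)*"                    # UnderscoreDecimalDigits
DECIMAL_INT = r"(?:0|[1-9][0-9]*(?:_[0-9]+)*)"    # DecimalIntegerLiteral
EXPONENT = rf"[eE][+-]?{DIGITS}"                  # ExponentPart
DECIMAL = (rf"(?:{DECIMAL_INT}\.(?:{DIGITS})?|\.{DIGITS}|{DECIMAL_INT})"
           rf"(?:{EXPONENT})?")                   # DecimalLiteral
HEX = r"0[xX][0-9A-Fa-f]+(?:_[0-9A-Fa-f]+)*"      # HexIntegerLiteral
BIN = r"0[bB][01]+(?:_[01]+)*"                    # BinIntegerLiteral
# Approximation of the trailing lookahead: no letter, digit, _, $ or backslash.
NUMERIC_LITERAL = re.compile(rf"(?:{HEX}|{BIN}|{DECIMAL})(?![\w$\\])")

def scan_numeric_literal(source: str, pos: int) -> Optional[str]:
    """Return the NumericLiteral lexeme at pos, or None."""
    match = NUMERIC_LITERAL.match(source, pos)
    return match.group(0) if match else None
```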

Regular Expression Literal

RegularExpressionLiteral is similar to RegularExpressionLiteral from the ECMA-262 third edition, except that it may contain line breaks.

Syntax

    RegularExpressionLiteral ::
      / RegularExpressionBody / RegularExpressionFlags
    RegularExpressionBody ::
      RegularExpressionFirstChar RegularExpressionChars
    RegularExpressionChars ::
      «empty»
      RegularExpressionChars RegularExpressionChar
    RegularExpressionFirstChar ::
      SourceCharacter [but not * or \ or /]
      BackslashSequence
    RegularExpressionChar ::
      SourceCharacter [but not \ or /]
      BackslashSequence
    BackslashSequence ::
      \ SourceCharacter
    RegularExpressionFlags ::
      «empty»
      RegularExpressionFlags IdentifierPart
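A Python sketch of scanning a RegularExpressionLiteral under the InputElementRegExp goal. scan_regexp_literal is a hypothetical name; unlike ECMA-262, line breaks are accepted in the body, and the IdentifierPart test for flags is approximated with str.isalnum plus _ and $.

```python
from typing import Optional, Tuple

def scan_regexp_literal(source: str, pos: int) -> Optional[Tuple[str, str, int]]:
    """Scan a RegularExpressionLiteral starting at source[pos].

    Returns (body, flags, end), or None if no literal can be scanned.
    """
    if not source.startswith("/", pos):
        return None
    i = pos + 1
    # RegularExpressionFirstChar must not be * or /; a leading backslash
    # is handled below as a BackslashSequence.
    if i >= len(source) or source[i] in "*/":
        return None
    while i < len(source) and source[i] != "/":
        if source[i] == "\\":
            i += 1                  # BackslashSequence: skip the escaped character
            if i >= len(source):
                return None
        i += 1                      # line breaks are ordinary body characters here
    if i >= len(source):
        return None                 # unterminated literal
    body = source[pos + 1:i]
    i += 1                          # closing /
    flags_start = i
    while i < len(source) and (source[i].isalnum() or source[i] in "_$"):
        i += 1
    return body, source[flags_start:i], i
```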

String Literal

StringLiteral is similar to the StringLiteral symbol from the ECMA-262 third edition, with the following additional features:

  • Scalar UnicodeEscapeSequence using the \u{...} form
  • Triple string literals
  • Raw string literals using the @ prefix

Triple string literals use either """ or ''' as the delimiter and may span multiple lines. Their contents are indentation-based, as the following program shows:

const text = """
    foo
    bar
    """
text == "foo\nbar"

Triple string literals are processed as follows:

  • The first empty line is ignored.
  • The base indentation of a triple string literal is the indentation of its last line, that is, the line containing the closing delimiter.
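A minimal Python sketch of these two rules, applied to the text between the delimiters. dedent_triple is a hypothetical name, and several details are assumptions rather than statements of this section: "the first empty line" is read as the first line when it is empty; a whitespace-only last line supplies the base indentation and is dropped together with the line break before it (as the result "foo\nbar" above suggests); only \n and \r\n count as line breaks; and the processing happens before escape sequences are decoded.

```python
def dedent_triple(raw: str) -> str:
    """Apply the triple string literal rules to the text between delimiters.

    Assumes this runs on source text before any escape decoding.
    """
    lines = raw.replace("\r\n", "\n").split("\n")
    # The first line is ignored when it is empty.
    if lines and lines[0] == "":
        lines = lines[1:]
    # A whitespace-only last line supplies the base indentation and is dropped.
    base = ""
    if lines and lines[-1].strip() == "":
        base = lines.pop()
    # Remove the base indentation from every remaining line that carries it.
    result = [line[len(base):] if line.startswith(base) else line
              for line in lines]
    return "\n".join(result)
```

Applied to the first example, dedent_triple("\n    foo\n    bar\n    ") yields "foo\nbar"; relative indentation beyond the base is preserved.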

Both regular and triple string literals accept the @ prefix, which designates a raw string literal. Raw string literals contain no escape sequences: a backslash is an ordinary character.

const text = @"""
    x\y
    """

Escape sequences are described by the following table:

Escape                        Description
\'                            U+27 single-quote
\"                            U+22 double-quote
\\                            U+5C backslash character
\b                            U+08 backspace character
\f                            U+0C form feed character
\n                            U+0A line feed character
\r                            U+0D carriage return character
\t                            U+09 tab character
\v                            U+0B vertical tab character
\0                            U+00 null character (must not be followed by a decimal digit)
\xHH                          The code point U+00HH, where HH are two hexadecimal digits
\uHHHH                        The code point U+HHHH, where HHHH are four hexadecimal digits
\u{...}                       The code point whose hexadecimal value is written between the braces
\ followed by LineTerminator  Contributes nothing (line continuation)

Syntax

    StringLiteral ::
      [lookahead ≠ """] " DoubleStringCharacter{0,} "
      [lookahead ≠ '''] ' SingleStringCharacter{0,} '
      """ TripleDoubleStringCharacter{0,} """
      ''' TripleSingleStringCharacter{0,} '''
      RawStringLiteral
    RawStringLiteral ::
      @ [lookahead ≠ """] " DoubleStringRawCharacter{0,} "
      @ [lookahead ≠ '''] ' SingleStringRawCharacter{0,} '
      @""" TripleDoubleStringRawCharacter{0,} """
      @''' TripleSingleStringRawCharacter{0,} '''
    DoubleStringCharacter ::
      SourceCharacter [but not double-quote " or backslash \ or LineTerminator]
      EscapeSequence
    SingleStringCharacter ::
      SourceCharacter [but not single-quote ' or backslash \ or LineTerminator]
      EscapeSequence
    DoubleStringRawCharacter ::
      SourceCharacter [but not double-quote " or LineTerminator]
    SingleStringRawCharacter ::
      SourceCharacter [but not single-quote ' or LineTerminator]
    TripleDoubleStringCharacter ::
      [lookahead ≠ """] SourceCharacter [but not backslash \ or LineTerminator]
      EscapeSequence
      LineTerminator
    TripleSingleStringCharacter ::
      [lookahead ≠ '''] SourceCharacter [but not backslash \ or LineTerminator]
      EscapeSequence
      LineTerminator
    TripleDoubleStringRawCharacter ::
      [lookahead ≠ """] SourceCharacter [but not LineTerminator]
      LineTerminator
    TripleSingleStringRawCharacter ::
      [lookahead ≠ '''] SourceCharacter [but not LineTerminator]
      LineTerminator
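The StringLiteral productions can be read as a small scanning procedure: an optional @, then a triple delimiter if one is present (the [lookahead ≠ """] conditions make """ always start a triple literal), then characters up to the matching delimiter. In this Python sketch, scan_string_literal is a hypothetical name; it returns the undecoded body and leaves escape decoding and triple-literal indentation processing to later steps.

```python
from typing import Tuple

LINE_TERMINATORS = "\n\r\u2028\u2029"

def scan_string_literal(source: str, pos: int) -> Tuple[bool, str, str, int]:
    """Scan a StringLiteral at source[pos].

    Returns (raw, delimiter, body, end), where body is the undecoded text
    between the delimiters. Raises ValueError on malformed input.
    """
    raw = source.startswith("@", pos)
    i = pos + 1 if raw else pos
    if source.startswith('"""', i) or source.startswith("'''", i):
        delim = source[i:i + 3]
    elif i < len(source) and source[i] in "\"'":
        delim = source[i]
    else:
        raise ValueError("not a string literal")
    triple = len(delim) == 3
    start = i = i + len(delim)
    while not source.startswith(delim, i):
        if i >= len(source):
            raise ValueError("unterminated string literal")
        if source[i] in LINE_TERMINATORS and not triple:
            raise ValueError("line break in a single-line string literal")
        if source[i] == "\\" and not raw:
            i += 1    # the escaped character (even a LineTerminator) is consumed too
        i += 1
    return raw, delim, source[start:i], i + len(delim)
```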

Escape Sequences

Syntax

    EscapeSequence ::
      \ CharacterEscapeSequence
      \0 [lookahead ∉ DecimalDigit]
      \ LineTerminator
      HexEscapeSequence
      UnicodeEscapeSequence
    CharacterEscapeSequence ::
      SingleEscapeCharacter
      NonEscapeCharacter
    SingleEscapeCharacter ::
      '
      "
      \
      b
      f
      n
      r
      t
      v
    NonEscapeCharacter ::
      SourceCharacter [but not EscapeCharacter or LineTerminator]
    EscapeCharacter ::
      SingleEscapeCharacter
      DecimalDigit
      x
      u
    HexEscapeSequence ::
      \x HexDigit HexDigit
    UnicodeEscapeSequence ::
      \u HexDigit{4}
      \u { HexDigit{1,} }
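The escape table and the EscapeSequence grammar can be illustrated with a decoder for the body of a non-raw string literal. This Python sketch is an illustration only: decode_escapes and hex_char are hypothetical names, and it assumes, as in ECMA-262, that a NonEscapeCharacter after a backslash stands for itself. Decimal digits other than a lone \0 are rejected because DecimalDigit is an EscapeCharacter.

```python
SINGLE_ESCAPES = {"'": "'", '"': '"', "\\": "\\", "b": "\b", "f": "\f",
                  "n": "\n", "r": "\r", "t": "\t", "v": "\v"}
LINE_TERMINATORS = "\n\r\u2028\u2029"
HEX_DIGITS = "0123456789abcdefABCDEF"
DECIMAL_DIGITS = "0123456789"

def hex_char(digits: str) -> str:
    """Turn hexadecimal digits into the character with that code point."""
    if not digits or any(c not in HEX_DIGITS for c in digits):
        raise ValueError("malformed hexadecimal escape")
    value = int(digits, 16)
    if value > 0x10FFFF:
        raise ValueError("code point out of range")
    return chr(value)

def decode_escapes(body: str) -> str:
    """Decode the EscapeSequences in the body of a non-raw string literal."""
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        nxt = body[i + 1:i + 2]
        if not nxt:
            raise ValueError("dangling backslash")
        if nxt in SINGLE_ESCAPES:                 # \' \" \\ \b \f \n \r \t \v
            out.append(SINGLE_ESCAPES[nxt])
            i += 2
        elif nxt in LINE_TERMINATORS:             # \ LineTerminator: contributes nothing
            i += 2
        elif nxt == "0":                          # \0 [lookahead ∉ DecimalDigit]
            following = body[i + 2:i + 3]
            if following and following in DECIMAL_DIGITS:
                raise ValueError("\\0 followed by a decimal digit")
            out.append("\0")
            i += 2
        elif nxt == "x":                          # \x HexDigit HexDigit
            digits = body[i + 2:i + 4]
            if len(digits) != 2:
                raise ValueError("\\x needs two hex digits")
            out.append(hex_char(digits))
            i += 4
        elif nxt == "u" and body.startswith("{", i + 2):   # \u { HexDigit{1,} }
            close = body.find("}", i + 3)
            if close == -1:
                raise ValueError("unterminated \\u{...} escape")
            out.append(hex_char(body[i + 3:close]))
            i = close + 1
        elif nxt == "u":                          # \u HexDigit{4}
            digits = body[i + 2:i + 6]
            if len(digits) != 4:
                raise ValueError("\\u needs four hex digits")
            out.append(hex_char(digits))
            i += 6
        elif nxt in DECIMAL_DIGITS:               # other digits are EscapeCharacters
            raise ValueError("invalid decimal escape")
        else:                                     # NonEscapeCharacter (assumed: itself)
            out.append(nxt)
            i += 2
    return "".join(out)
```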

XML

This section defines nonterminals used in the lexical grammar as part of the XML capabilities of the ShockScript language.

If an XMLMarkup, XMLAttributeValue, or XMLText token contains a LineTerminator, that LineTerminator is contributed to the lexical scanner.

Syntax

    XMLMarkup ::
      XMLComment
      XMLCDATA
      XMLPI
      <?html=
    XMLWhitespaceCharacter ::
      U+20 space
      U+09 tab
      U+0D carriage return
      U+0A line feed
    XMLWhitespace ::
      XMLWhitespaceCharacter
      XMLWhitespace XMLWhitespaceCharacter
    XMLText ::
      SourceCharacters [but no embedded left-curly { or less-than <]
    XMLName ::
      XMLNameStart
      XMLName XMLNamePart
    XMLNameStart ::
      UnicodeLetter
      underscore _
      colon :
    XMLNamePart ::
      UnicodeLetter
      UnicodeDigit
      period .
      hyphen -
      underscore _
      colon :
    XMLComment ::
      <!-- XMLCommentCharactersopt -->
    XMLCommentCharacters ::
      SourceCharacters [but no embedded sequence -->]
    XMLCDATA ::
      <![CDATA[ XMLCDATACharacters ]]>
    XMLCDATACharacters ::
      SourceCharacters [but no embedded sequence ]]>]
    XMLPI ::
      <? XMLPICharactersopt ?>
    XMLPICharacters ::
      SourceCharacters [but no embedded sequence ?>]
    XMLAttributeValue ::
      " XMLDoubleStringCharactersopt "
      ' XMLSingleStringCharactersopt '
    XMLDoubleStringCharacters ::
      SourceCharacters [but no embedded double-quote "]
    XMLSingleStringCharacters ::
      SourceCharacters [but no embedded single-quote ']
    XMLTagPunctuator ::
      =
      &=
      >
      />
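Two of the simpler XML nonterminals, XMLName and XMLText, can be sketched in Python as follows. The function names are hypothetical; UnicodeLetter and UnicodeDigit are mapped to the same general categories used for identifiers, and an empty XMLText result simply means that no XMLText token is present at that position.

```python
import unicodedata
from typing import Optional, Tuple

LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}   # UnicodeLetter

def is_xml_name_start(ch: str) -> bool:
    """XMLNameStart: UnicodeLetter, underscore or colon."""
    return ch in "_:" or unicodedata.category(ch) in LETTER_CATEGORIES

def is_xml_name_part(ch: str) -> bool:
    """XMLNamePart: XMLNameStart plus UnicodeDigit, period and hyphen."""
    return (is_xml_name_start(ch) or ch in ".-"
            or unicodedata.category(ch) == "Nd")

def scan_xml_name(source: str, pos: int) -> Optional[Tuple[str, int]]:
    """Return (name, end) for the XMLName at pos, or None."""
    if pos >= len(source) or not is_xml_name_start(source[pos]):
        return None
    end = pos + 1
    while end < len(source) and is_xml_name_part(source[end]):
        end += 1
    return source[pos:end], end

def scan_xml_text(source: str, pos: int) -> Tuple[str, int]:
    """XMLText runs until the next left-curly { or less-than <."""
    end = pos
    while end < len(source) and source[end] not in "{<":
        end += 1
    return source[pos:end], end
```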