1*9880d681SAndroid Build Coastguard Worker=========================================== 2*9880d681SAndroid Build Coastguard WorkerKaleidoscope: Implementing a Parser and AST 3*9880d681SAndroid Build Coastguard Worker=========================================== 4*9880d681SAndroid Build Coastguard Worker 5*9880d681SAndroid Build Coastguard Worker.. contents:: 6*9880d681SAndroid Build Coastguard Worker :local: 7*9880d681SAndroid Build Coastguard Worker 8*9880d681SAndroid Build Coastguard WorkerChapter 2 Introduction 9*9880d681SAndroid Build Coastguard Worker====================== 10*9880d681SAndroid Build Coastguard Worker 11*9880d681SAndroid Build Coastguard WorkerWelcome to Chapter 2 of the "`Implementing a language with 12*9880d681SAndroid Build Coastguard WorkerLLVM <index.html>`_" tutorial. This chapter shows you how to use the 13*9880d681SAndroid Build Coastguard Workerlexer, built in `Chapter 1 <LangImpl1.html>`_, to build a full 14*9880d681SAndroid Build Coastguard Worker`parser <http://en.wikipedia.org/wiki/Parsing>`_ for our Kaleidoscope 15*9880d681SAndroid Build Coastguard Workerlanguage. Once we have a parser, we'll define and build an `Abstract 16*9880d681SAndroid Build Coastguard WorkerSyntax Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST). 17*9880d681SAndroid Build Coastguard Worker 18*9880d681SAndroid Build Coastguard WorkerThe parser we will build uses a combination of `Recursive Descent 19*9880d681SAndroid Build Coastguard WorkerParsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and 20*9880d681SAndroid Build Coastguard Worker`Operator-Precedence 21*9880d681SAndroid Build Coastguard WorkerParsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to 22*9880d681SAndroid Build Coastguard Workerparse the Kaleidoscope language (the latter for binary expressions and 23*9880d681SAndroid Build Coastguard Workerthe former for everything else). Before we get to parsing though, lets 24*9880d681SAndroid Build Coastguard Workertalk about the output of the parser: the Abstract Syntax Tree. 25*9880d681SAndroid Build Coastguard Worker 26*9880d681SAndroid Build Coastguard WorkerThe Abstract Syntax Tree (AST) 27*9880d681SAndroid Build Coastguard Worker============================== 28*9880d681SAndroid Build Coastguard Worker 29*9880d681SAndroid Build Coastguard WorkerThe AST for a program captures its behavior in such a way that it is 30*9880d681SAndroid Build Coastguard Workereasy for later stages of the compiler (e.g. code generation) to 31*9880d681SAndroid Build Coastguard Workerinterpret. We basically want one object for each construct in the 32*9880d681SAndroid Build Coastguard Workerlanguage, and the AST should closely model the language. In 33*9880d681SAndroid Build Coastguard WorkerKaleidoscope, we have expressions, a prototype, and a function object. 34*9880d681SAndroid Build Coastguard WorkerWe'll start with expressions first: 35*9880d681SAndroid Build Coastguard Worker 36*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 37*9880d681SAndroid Build Coastguard Worker 38*9880d681SAndroid Build Coastguard Worker /// ExprAST - Base class for all expression nodes. 39*9880d681SAndroid Build Coastguard Worker class ExprAST { 40*9880d681SAndroid Build Coastguard Worker public: 41*9880d681SAndroid Build Coastguard Worker virtual ~ExprAST() {} 42*9880d681SAndroid Build Coastguard Worker }; 43*9880d681SAndroid Build Coastguard Worker 44*9880d681SAndroid Build Coastguard Worker /// NumberExprAST - Expression class for numeric literals like "1.0". 45*9880d681SAndroid Build Coastguard Worker class NumberExprAST : public ExprAST { 46*9880d681SAndroid Build Coastguard Worker double Val; 47*9880d681SAndroid Build Coastguard Worker 48*9880d681SAndroid Build Coastguard Worker public: 49*9880d681SAndroid Build Coastguard Worker NumberExprAST(double Val) : Val(Val) {} 50*9880d681SAndroid Build Coastguard Worker }; 51*9880d681SAndroid Build Coastguard Worker 52*9880d681SAndroid Build Coastguard WorkerThe code above shows the definition of the base ExprAST class and one 53*9880d681SAndroid Build Coastguard Workersubclass which we use for numeric literals. The important thing to note 54*9880d681SAndroid Build Coastguard Workerabout this code is that the NumberExprAST class captures the numeric 55*9880d681SAndroid Build Coastguard Workervalue of the literal as an instance variable. This allows later phases 56*9880d681SAndroid Build Coastguard Workerof the compiler to know what the stored numeric value is. 57*9880d681SAndroid Build Coastguard Worker 58*9880d681SAndroid Build Coastguard WorkerRight now we only create the AST, so there are no useful accessor 59*9880d681SAndroid Build Coastguard Workermethods on them. It would be very easy to add a virtual method to pretty 60*9880d681SAndroid Build Coastguard Workerprint the code, for example. Here are the other expression AST node 61*9880d681SAndroid Build Coastguard Workerdefinitions that we'll use in the basic form of the Kaleidoscope 62*9880d681SAndroid Build Coastguard Workerlanguage: 63*9880d681SAndroid Build Coastguard Worker 64*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 65*9880d681SAndroid Build Coastguard Worker 66*9880d681SAndroid Build Coastguard Worker /// VariableExprAST - Expression class for referencing a variable, like "a". 67*9880d681SAndroid Build Coastguard Worker class VariableExprAST : public ExprAST { 68*9880d681SAndroid Build Coastguard Worker std::string Name; 69*9880d681SAndroid Build Coastguard Worker 70*9880d681SAndroid Build Coastguard Worker public: 71*9880d681SAndroid Build Coastguard Worker VariableExprAST(const std::string &Name) : Name(Name) {} 72*9880d681SAndroid Build Coastguard Worker }; 73*9880d681SAndroid Build Coastguard Worker 74*9880d681SAndroid Build Coastguard Worker /// BinaryExprAST - Expression class for a binary operator. 75*9880d681SAndroid Build Coastguard Worker class BinaryExprAST : public ExprAST { 76*9880d681SAndroid Build Coastguard Worker char Op; 77*9880d681SAndroid Build Coastguard Worker std::unique_ptr<ExprAST> LHS, RHS; 78*9880d681SAndroid Build Coastguard Worker 79*9880d681SAndroid Build Coastguard Worker public: 80*9880d681SAndroid Build Coastguard Worker BinaryExprAST(char op, std::unique_ptr<ExprAST> LHS, 81*9880d681SAndroid Build Coastguard Worker std::unique_ptr<ExprAST> RHS) 82*9880d681SAndroid Build Coastguard Worker : Op(op), LHS(std::move(LHS)), RHS(std::move(RHS)) {} 83*9880d681SAndroid Build Coastguard Worker }; 84*9880d681SAndroid Build Coastguard Worker 85*9880d681SAndroid Build Coastguard Worker /// CallExprAST - Expression class for function calls. 86*9880d681SAndroid Build Coastguard Worker class CallExprAST : public ExprAST { 87*9880d681SAndroid Build Coastguard Worker std::string Callee; 88*9880d681SAndroid Build Coastguard Worker std::vector<std::unique_ptr<ExprAST>> Args; 89*9880d681SAndroid Build Coastguard Worker 90*9880d681SAndroid Build Coastguard Worker public: 91*9880d681SAndroid Build Coastguard Worker CallExprAST(const std::string &Callee, 92*9880d681SAndroid Build Coastguard Worker std::vector<std::unique_ptr<ExprAST>> Args) 93*9880d681SAndroid Build Coastguard Worker : Callee(Callee), Args(std::move(Args)) {} 94*9880d681SAndroid Build Coastguard Worker }; 95*9880d681SAndroid Build Coastguard Worker 96*9880d681SAndroid Build Coastguard WorkerThis is all (intentionally) rather straight-forward: variables capture 97*9880d681SAndroid Build Coastguard Workerthe variable name, binary operators capture their opcode (e.g. '+'), and 98*9880d681SAndroid Build Coastguard Workercalls capture a function name as well as a list of any argument 99*9880d681SAndroid Build Coastguard Workerexpressions. One thing that is nice about our AST is that it captures 100*9880d681SAndroid Build Coastguard Workerthe language features without talking about the syntax of the language. 101*9880d681SAndroid Build Coastguard WorkerNote that there is no discussion about precedence of binary operators, 102*9880d681SAndroid Build Coastguard Workerlexical structure, etc. 103*9880d681SAndroid Build Coastguard Worker 104*9880d681SAndroid Build Coastguard WorkerFor our basic language, these are all of the expression nodes we'll 105*9880d681SAndroid Build Coastguard Workerdefine. Because it doesn't have conditional control flow, it isn't 106*9880d681SAndroid Build Coastguard WorkerTuring-complete; we'll fix that in a later installment. The two things 107*9880d681SAndroid Build Coastguard Workerwe need next are a way to talk about the interface to a function, and a 108*9880d681SAndroid Build Coastguard Workerway to talk about functions themselves: 109*9880d681SAndroid Build Coastguard Worker 110*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 111*9880d681SAndroid Build Coastguard Worker 112*9880d681SAndroid Build Coastguard Worker /// PrototypeAST - This class represents the "prototype" for a function, 113*9880d681SAndroid Build Coastguard Worker /// which captures its name, and its argument names (thus implicitly the number 114*9880d681SAndroid Build Coastguard Worker /// of arguments the function takes). 115*9880d681SAndroid Build Coastguard Worker class PrototypeAST { 116*9880d681SAndroid Build Coastguard Worker std::string Name; 117*9880d681SAndroid Build Coastguard Worker std::vector<std::string> Args; 118*9880d681SAndroid Build Coastguard Worker 119*9880d681SAndroid Build Coastguard Worker public: 120*9880d681SAndroid Build Coastguard Worker PrototypeAST(const std::string &name, std::vector<std::string> Args) 121*9880d681SAndroid Build Coastguard Worker : Name(name), Args(std::move(Args)) {} 122*9880d681SAndroid Build Coastguard Worker }; 123*9880d681SAndroid Build Coastguard Worker 124*9880d681SAndroid Build Coastguard Worker /// FunctionAST - This class represents a function definition itself. 125*9880d681SAndroid Build Coastguard Worker class FunctionAST { 126*9880d681SAndroid Build Coastguard Worker std::unique_ptr<PrototypeAST> Proto; 127*9880d681SAndroid Build Coastguard Worker std::unique_ptr<ExprAST> Body; 128*9880d681SAndroid Build Coastguard Worker 129*9880d681SAndroid Build Coastguard Worker public: 130*9880d681SAndroid Build Coastguard Worker FunctionAST(std::unique_ptr<PrototypeAST> Proto, 131*9880d681SAndroid Build Coastguard Worker std::unique_ptr<ExprAST> Body) 132*9880d681SAndroid Build Coastguard Worker : Proto(std::move(Proto)), Body(std::move(Body)) {} 133*9880d681SAndroid Build Coastguard Worker }; 134*9880d681SAndroid Build Coastguard Worker 135*9880d681SAndroid Build Coastguard WorkerIn Kaleidoscope, functions are typed with just a count of their 136*9880d681SAndroid Build Coastguard Workerarguments. Since all values are double precision floating point, the 137*9880d681SAndroid Build Coastguard Workertype of each argument doesn't need to be stored anywhere. In a more 138*9880d681SAndroid Build Coastguard Workeraggressive and realistic language, the "ExprAST" class would probably 139*9880d681SAndroid Build Coastguard Workerhave a type field. 140*9880d681SAndroid Build Coastguard Worker 141*9880d681SAndroid Build Coastguard WorkerWith this scaffolding, we can now talk about parsing expressions and 142*9880d681SAndroid Build Coastguard Workerfunction bodies in Kaleidoscope. 143*9880d681SAndroid Build Coastguard Worker 144*9880d681SAndroid Build Coastguard WorkerParser Basics 145*9880d681SAndroid Build Coastguard Worker============= 146*9880d681SAndroid Build Coastguard Worker 147*9880d681SAndroid Build Coastguard WorkerNow that we have an AST to build, we need to define the parser code to 148*9880d681SAndroid Build Coastguard Workerbuild it. The idea here is that we want to parse something like "x+y" 149*9880d681SAndroid Build Coastguard Worker(which is returned as three tokens by the lexer) into an AST that could 150*9880d681SAndroid Build Coastguard Workerbe generated with calls like this: 151*9880d681SAndroid Build Coastguard Worker 152*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 153*9880d681SAndroid Build Coastguard Worker 154*9880d681SAndroid Build Coastguard Worker auto LHS = llvm::make_unique<VariableExprAST>("x"); 155*9880d681SAndroid Build Coastguard Worker auto RHS = llvm::make_unique<VariableExprAST>("y"); 156*9880d681SAndroid Build Coastguard Worker auto Result = std::make_unique<BinaryExprAST>('+', std::move(LHS), 157*9880d681SAndroid Build Coastguard Worker std::move(RHS)); 158*9880d681SAndroid Build Coastguard Worker 159*9880d681SAndroid Build Coastguard WorkerIn order to do this, we'll start by defining some basic helper routines: 160*9880d681SAndroid Build Coastguard Worker 161*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 162*9880d681SAndroid Build Coastguard Worker 163*9880d681SAndroid Build Coastguard Worker /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current 164*9880d681SAndroid Build Coastguard Worker /// token the parser is looking at. getNextToken reads another token from the 165*9880d681SAndroid Build Coastguard Worker /// lexer and updates CurTok with its results. 166*9880d681SAndroid Build Coastguard Worker static int CurTok; 167*9880d681SAndroid Build Coastguard Worker static int getNextToken() { 168*9880d681SAndroid Build Coastguard Worker return CurTok = gettok(); 169*9880d681SAndroid Build Coastguard Worker } 170*9880d681SAndroid Build Coastguard Worker 171*9880d681SAndroid Build Coastguard WorkerThis implements a simple token buffer around the lexer. This allows us 172*9880d681SAndroid Build Coastguard Workerto look one token ahead at what the lexer is returning. Every function 173*9880d681SAndroid Build Coastguard Workerin our parser will assume that CurTok is the current token that needs to 174*9880d681SAndroid Build Coastguard Workerbe parsed. 175*9880d681SAndroid Build Coastguard Worker 176*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 177*9880d681SAndroid Build Coastguard Worker 178*9880d681SAndroid Build Coastguard Worker 179*9880d681SAndroid Build Coastguard Worker /// LogError* - These are little helper functions for error handling. 180*9880d681SAndroid Build Coastguard Worker std::unique_ptr<ExprAST> LogError(const char *Str) { 181*9880d681SAndroid Build Coastguard Worker fprintf(stderr, "LogError: %s\n", Str); 182*9880d681SAndroid Build Coastguard Worker return nullptr; 183*9880d681SAndroid Build Coastguard Worker } 184*9880d681SAndroid Build Coastguard Worker std::unique_ptr<PrototypeAST> LogErrorP(const char *Str) { 185*9880d681SAndroid Build Coastguard Worker LogError(Str); 186*9880d681SAndroid Build Coastguard Worker return nullptr; 187*9880d681SAndroid Build Coastguard Worker } 188*9880d681SAndroid Build Coastguard Worker 189*9880d681SAndroid Build Coastguard WorkerThe ``LogError`` routines are simple helper routines that our parser will 190*9880d681SAndroid Build Coastguard Workeruse to handle errors. The error recovery in our parser will not be the 191*9880d681SAndroid Build Coastguard Workerbest and is not particular user-friendly, but it will be enough for our 192*9880d681SAndroid Build Coastguard Workertutorial. These routines make it easier to handle errors in routines 193*9880d681SAndroid Build Coastguard Workerthat have various return types: they always return null. 194*9880d681SAndroid Build Coastguard Worker 195*9880d681SAndroid Build Coastguard WorkerWith these basic helper functions, we can implement the first piece of 196*9880d681SAndroid Build Coastguard Workerour grammar: numeric literals. 197*9880d681SAndroid Build Coastguard Worker 198*9880d681SAndroid Build Coastguard WorkerBasic Expression Parsing 199*9880d681SAndroid Build Coastguard Worker======================== 200*9880d681SAndroid Build Coastguard Worker 201*9880d681SAndroid Build Coastguard WorkerWe start with numeric literals, because they are the simplest to 202*9880d681SAndroid Build Coastguard Workerprocess. For each production in our grammar, we'll define a function 203*9880d681SAndroid Build Coastguard Workerwhich parses that production. For numeric literals, we have: 204*9880d681SAndroid Build Coastguard Worker 205*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 206*9880d681SAndroid Build Coastguard Worker 207*9880d681SAndroid Build Coastguard Worker /// numberexpr ::= number 208*9880d681SAndroid Build Coastguard Worker static std::unique_ptr<ExprAST> ParseNumberExpr() { 209*9880d681SAndroid Build Coastguard Worker auto Result = llvm::make_unique<NumberExprAST>(NumVal); 210*9880d681SAndroid Build Coastguard Worker getNextToken(); // consume the number 211*9880d681SAndroid Build Coastguard Worker return std::move(Result); 212*9880d681SAndroid Build Coastguard Worker } 213*9880d681SAndroid Build Coastguard Worker 214*9880d681SAndroid Build Coastguard WorkerThis routine is very simple: it expects to be called when the current 215*9880d681SAndroid Build Coastguard Workertoken is a ``tok_number`` token. It takes the current number value, 216*9880d681SAndroid Build Coastguard Workercreates a ``NumberExprAST`` node, advances the lexer to the next token, 217*9880d681SAndroid Build Coastguard Workerand finally returns. 218*9880d681SAndroid Build Coastguard Worker 219*9880d681SAndroid Build Coastguard WorkerThere are some interesting aspects to this. The most important one is 220*9880d681SAndroid Build Coastguard Workerthat this routine eats all of the tokens that correspond to the 221*9880d681SAndroid Build Coastguard Workerproduction and returns the lexer buffer with the next token (which is 222*9880d681SAndroid Build Coastguard Workernot part of the grammar production) ready to go. This is a fairly 223*9880d681SAndroid Build Coastguard Workerstandard way to go for recursive descent parsers. For a better example, 224*9880d681SAndroid Build Coastguard Workerthe parenthesis operator is defined like this: 225*9880d681SAndroid Build Coastguard Worker 226*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 227*9880d681SAndroid Build Coastguard Worker 228*9880d681SAndroid Build Coastguard Worker /// parenexpr ::= '(' expression ')' 229*9880d681SAndroid Build Coastguard Worker static std::unique_ptr<ExprAST> ParseParenExpr() { 230*9880d681SAndroid Build Coastguard Worker getNextToken(); // eat (. 231*9880d681SAndroid Build Coastguard Worker auto V = ParseExpression(); 232*9880d681SAndroid Build Coastguard Worker if (!V) 233*9880d681SAndroid Build Coastguard Worker return nullptr; 234*9880d681SAndroid Build Coastguard Worker 235*9880d681SAndroid Build Coastguard Worker if (CurTok != ')') 236*9880d681SAndroid Build Coastguard Worker return LogError("expected ')'"); 237*9880d681SAndroid Build Coastguard Worker getNextToken(); // eat ). 238*9880d681SAndroid Build Coastguard Worker return V; 239*9880d681SAndroid Build Coastguard Worker } 240*9880d681SAndroid Build Coastguard Worker 241*9880d681SAndroid Build Coastguard WorkerThis function illustrates a number of interesting things about the 242*9880d681SAndroid Build Coastguard Workerparser: 243*9880d681SAndroid Build Coastguard Worker 244*9880d681SAndroid Build Coastguard Worker1) It shows how we use the LogError routines. When called, this function 245*9880d681SAndroid Build Coastguard Workerexpects that the current token is a '(' token, but after parsing the 246*9880d681SAndroid Build Coastguard Workersubexpression, it is possible that there is no ')' waiting. For example, 247*9880d681SAndroid Build Coastguard Workerif the user types in "(4 x" instead of "(4)", the parser should emit an 248*9880d681SAndroid Build Coastguard Workererror. Because errors can occur, the parser needs a way to indicate that 249*9880d681SAndroid Build Coastguard Workerthey happened: in our parser, we return null on an error. 250*9880d681SAndroid Build Coastguard Worker 251*9880d681SAndroid Build Coastguard Worker2) Another interesting aspect of this function is that it uses recursion 252*9880d681SAndroid Build Coastguard Workerby calling ``ParseExpression`` (we will soon see that 253*9880d681SAndroid Build Coastguard Worker``ParseExpression`` can call ``ParseParenExpr``). This is powerful 254*9880d681SAndroid Build Coastguard Workerbecause it allows us to handle recursive grammars, and keeps each 255*9880d681SAndroid Build Coastguard Workerproduction very simple. Note that parentheses do not cause construction 256*9880d681SAndroid Build Coastguard Workerof AST nodes themselves. While we could do it this way, the most 257*9880d681SAndroid Build Coastguard Workerimportant role of parentheses are to guide the parser and provide 258*9880d681SAndroid Build Coastguard Workergrouping. Once the parser constructs the AST, parentheses are not 259*9880d681SAndroid Build Coastguard Workerneeded. 260*9880d681SAndroid Build Coastguard Worker 261*9880d681SAndroid Build Coastguard WorkerThe next simple production is for handling variable references and 262*9880d681SAndroid Build Coastguard Workerfunction calls: 263*9880d681SAndroid Build Coastguard Worker 264*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 265*9880d681SAndroid Build Coastguard Worker 266*9880d681SAndroid Build Coastguard Worker /// identifierexpr 267*9880d681SAndroid Build Coastguard Worker /// ::= identifier 268*9880d681SAndroid Build Coastguard Worker /// ::= identifier '(' expression* ')' 269*9880d681SAndroid Build Coastguard Worker static std::unique_ptr<ExprAST> ParseIdentifierExpr() { 270*9880d681SAndroid Build Coastguard Worker std::string IdName = IdentifierStr; 271*9880d681SAndroid Build Coastguard Worker 272*9880d681SAndroid Build Coastguard Worker getNextToken(); // eat identifier. 273*9880d681SAndroid Build Coastguard Worker 274*9880d681SAndroid Build Coastguard Worker if (CurTok != '(') // Simple variable ref. 275*9880d681SAndroid Build Coastguard Worker return llvm::make_unique<VariableExprAST>(IdName); 276*9880d681SAndroid Build Coastguard Worker 277*9880d681SAndroid Build Coastguard Worker // Call. 278*9880d681SAndroid Build Coastguard Worker getNextToken(); // eat ( 279*9880d681SAndroid Build Coastguard Worker std::vector<std::unique_ptr<ExprAST>> Args; 280*9880d681SAndroid Build Coastguard Worker if (CurTok != ')') { 281*9880d681SAndroid Build Coastguard Worker while (1) { 282*9880d681SAndroid Build Coastguard Worker if (auto Arg = ParseExpression()) 283*9880d681SAndroid Build Coastguard Worker Args.push_back(std::move(Arg)); 284*9880d681SAndroid Build Coastguard Worker else 285*9880d681SAndroid Build Coastguard Worker return nullptr; 286*9880d681SAndroid Build Coastguard Worker 287*9880d681SAndroid Build Coastguard Worker if (CurTok == ')') 288*9880d681SAndroid Build Coastguard Worker break; 289*9880d681SAndroid Build Coastguard Worker 290*9880d681SAndroid Build Coastguard Worker if (CurTok != ',') 291*9880d681SAndroid Build Coastguard Worker return LogError("Expected ')' or ',' in argument list"); 292*9880d681SAndroid Build Coastguard Worker getNextToken(); 293*9880d681SAndroid Build Coastguard Worker } 294*9880d681SAndroid Build Coastguard Worker } 295*9880d681SAndroid Build Coastguard Worker 296*9880d681SAndroid Build Coastguard Worker // Eat the ')'. 297*9880d681SAndroid Build Coastguard Worker getNextToken(); 298*9880d681SAndroid Build Coastguard Worker 299*9880d681SAndroid Build Coastguard Worker return llvm::make_unique<CallExprAST>(IdName, std::move(Args)); 300*9880d681SAndroid Build Coastguard Worker } 301*9880d681SAndroid Build Coastguard Worker 302*9880d681SAndroid Build Coastguard WorkerThis routine follows the same style as the other routines. (It expects 303*9880d681SAndroid Build Coastguard Workerto be called if the current token is a ``tok_identifier`` token). It 304*9880d681SAndroid Build Coastguard Workeralso has recursion and error handling. One interesting aspect of this is 305*9880d681SAndroid Build Coastguard Workerthat it uses *look-ahead* to determine if the current identifier is a 306*9880d681SAndroid Build Coastguard Workerstand alone variable reference or if it is a function call expression. 307*9880d681SAndroid Build Coastguard WorkerIt handles this by checking to see if the token after the identifier is 308*9880d681SAndroid Build Coastguard Workera '(' token, constructing either a ``VariableExprAST`` or 309*9880d681SAndroid Build Coastguard Worker``CallExprAST`` node as appropriate. 310*9880d681SAndroid Build Coastguard Worker 311*9880d681SAndroid Build Coastguard WorkerNow that we have all of our simple expression-parsing logic in place, we 312*9880d681SAndroid Build Coastguard Workercan define a helper function to wrap it together into one entry point. 313*9880d681SAndroid Build Coastguard WorkerWe call this class of expressions "primary" expressions, for reasons 314*9880d681SAndroid Build Coastguard Workerthat will become more clear `later in the 315*9880d681SAndroid Build Coastguard Workertutorial <LangImpl6.html#user-defined-unary-operators>`_. In order to parse an arbitrary 316*9880d681SAndroid Build Coastguard Workerprimary expression, we need to determine what sort of expression it is: 317*9880d681SAndroid Build Coastguard Worker 318*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 319*9880d681SAndroid Build Coastguard Worker 320*9880d681SAndroid Build Coastguard Worker /// primary 321*9880d681SAndroid Build Coastguard Worker /// ::= identifierexpr 322*9880d681SAndroid Build Coastguard Worker /// ::= numberexpr 323*9880d681SAndroid Build Coastguard Worker /// ::= parenexpr 324*9880d681SAndroid Build Coastguard Worker static std::unique_ptr<ExprAST> ParsePrimary() { 325*9880d681SAndroid Build Coastguard Worker switch (CurTok) { 326*9880d681SAndroid Build Coastguard Worker default: 327*9880d681SAndroid Build Coastguard Worker return LogError("unknown token when expecting an expression"); 328*9880d681SAndroid Build Coastguard Worker case tok_identifier: 329*9880d681SAndroid Build Coastguard Worker return ParseIdentifierExpr(); 330*9880d681SAndroid Build Coastguard Worker case tok_number: 331*9880d681SAndroid Build Coastguard Worker return ParseNumberExpr(); 332*9880d681SAndroid Build Coastguard Worker case '(': 333*9880d681SAndroid Build Coastguard Worker return ParseParenExpr(); 334*9880d681SAndroid Build Coastguard Worker } 335*9880d681SAndroid Build Coastguard Worker } 336*9880d681SAndroid Build Coastguard Worker 337*9880d681SAndroid Build Coastguard WorkerNow that you see the definition of this function, it is more obvious why 338*9880d681SAndroid Build Coastguard Workerwe can assume the state of CurTok in the various functions. This uses 339*9880d681SAndroid Build Coastguard Workerlook-ahead to determine which sort of expression is being inspected, and 340*9880d681SAndroid Build Coastguard Workerthen parses it with a function call. 341*9880d681SAndroid Build Coastguard Worker 342*9880d681SAndroid Build Coastguard WorkerNow that basic expressions are handled, we need to handle binary 343*9880d681SAndroid Build Coastguard Workerexpressions. They are a bit more complex. 344*9880d681SAndroid Build Coastguard Worker 345*9880d681SAndroid Build Coastguard WorkerBinary Expression Parsing 346*9880d681SAndroid Build Coastguard Worker========================= 347*9880d681SAndroid Build Coastguard Worker 348*9880d681SAndroid Build Coastguard WorkerBinary expressions are significantly harder to parse because they are 349*9880d681SAndroid Build Coastguard Workeroften ambiguous. For example, when given the string "x+y\*z", the parser 350*9880d681SAndroid Build Coastguard Workercan choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common 351*9880d681SAndroid Build Coastguard Workerdefinitions from mathematics, we expect the later parse, because "\*" 352*9880d681SAndroid Build Coastguard Worker(multiplication) has higher *precedence* than "+" (addition). 353*9880d681SAndroid Build Coastguard Worker 354*9880d681SAndroid Build Coastguard WorkerThere are many ways to handle this, but an elegant and efficient way is 355*9880d681SAndroid Build Coastguard Workerto use `Operator-Precedence 356*9880d681SAndroid Build Coastguard WorkerParsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_. 357*9880d681SAndroid Build Coastguard WorkerThis parsing technique uses the precedence of binary operators to guide 358*9880d681SAndroid Build Coastguard Workerrecursion. To start with, we need a table of precedences: 359*9880d681SAndroid Build Coastguard Worker 360*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 361*9880d681SAndroid Build Coastguard Worker 362*9880d681SAndroid Build Coastguard Worker /// BinopPrecedence - This holds the precedence for each binary operator that is 363*9880d681SAndroid Build Coastguard Worker /// defined. 364*9880d681SAndroid Build Coastguard Worker static std::map<char, int> BinopPrecedence; 365*9880d681SAndroid Build Coastguard Worker 366*9880d681SAndroid Build Coastguard Worker /// GetTokPrecedence - Get the precedence of the pending binary operator token. 367*9880d681SAndroid Build Coastguard Worker static int GetTokPrecedence() { 368*9880d681SAndroid Build Coastguard Worker if (!isascii(CurTok)) 369*9880d681SAndroid Build Coastguard Worker return -1; 370*9880d681SAndroid Build Coastguard Worker 371*9880d681SAndroid Build Coastguard Worker // Make sure it's a declared binop. 372*9880d681SAndroid Build Coastguard Worker int TokPrec = BinopPrecedence[CurTok]; 373*9880d681SAndroid Build Coastguard Worker if (TokPrec <= 0) return -1; 374*9880d681SAndroid Build Coastguard Worker return TokPrec; 375*9880d681SAndroid Build Coastguard Worker } 376*9880d681SAndroid Build Coastguard Worker 377*9880d681SAndroid Build Coastguard Worker int main() { 378*9880d681SAndroid Build Coastguard Worker // Install standard binary operators. 379*9880d681SAndroid Build Coastguard Worker // 1 is lowest precedence. 380*9880d681SAndroid Build Coastguard Worker BinopPrecedence['<'] = 10; 381*9880d681SAndroid Build Coastguard Worker BinopPrecedence['+'] = 20; 382*9880d681SAndroid Build Coastguard Worker BinopPrecedence['-'] = 20; 383*9880d681SAndroid Build Coastguard Worker BinopPrecedence['*'] = 40; // highest. 384*9880d681SAndroid Build Coastguard Worker ... 385*9880d681SAndroid Build Coastguard Worker } 386*9880d681SAndroid Build Coastguard Worker 387*9880d681SAndroid Build Coastguard WorkerFor the basic form of Kaleidoscope, we will only support 4 binary 388*9880d681SAndroid Build Coastguard Workeroperators (this can obviously be extended by you, our brave and intrepid 389*9880d681SAndroid Build Coastguard Workerreader). The ``GetTokPrecedence`` function returns the precedence for 390*9880d681SAndroid Build Coastguard Workerthe current token, or -1 if the token is not a binary operator. Having a 391*9880d681SAndroid Build Coastguard Workermap makes it easy to add new operators and makes it clear that the 392*9880d681SAndroid Build Coastguard Workeralgorithm doesn't depend on the specific operators involved, but it 393*9880d681SAndroid Build Coastguard Workerwould be easy enough to eliminate the map and do the comparisons in the 394*9880d681SAndroid Build Coastguard Worker``GetTokPrecedence`` function. (Or just use a fixed-size array). 395*9880d681SAndroid Build Coastguard Worker 396*9880d681SAndroid Build Coastguard WorkerWith the helper above defined, we can now start parsing binary 397*9880d681SAndroid Build Coastguard Workerexpressions. The basic idea of operator precedence parsing is to break 398*9880d681SAndroid Build Coastguard Workerdown an expression with potentially ambiguous binary operators into 399*9880d681SAndroid Build Coastguard Workerpieces. Consider, for example, the expression "a+b+(c+d)\*e\*f+g". 400*9880d681SAndroid Build Coastguard WorkerOperator precedence parsing considers this as a stream of primary 401*9880d681SAndroid Build Coastguard Workerexpressions separated by binary operators. As such, it will first parse 402*9880d681SAndroid Build Coastguard Workerthe leading primary expression "a", then it will see the pairs [+, b] 403*9880d681SAndroid Build Coastguard Worker[+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are 404*9880d681SAndroid Build Coastguard Workerprimary expressions, the binary expression parser doesn't need to worry 405*9880d681SAndroid Build Coastguard Workerabout nested subexpressions like (c+d) at all. 406*9880d681SAndroid Build Coastguard Worker 407*9880d681SAndroid Build Coastguard WorkerTo start, an expression is a primary expression potentially followed by 408*9880d681SAndroid Build Coastguard Workera sequence of [binop,primaryexpr] pairs: 409*9880d681SAndroid Build Coastguard Worker 410*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 411*9880d681SAndroid Build Coastguard Worker 412*9880d681SAndroid Build Coastguard Worker /// expression 413*9880d681SAndroid Build Coastguard Worker /// ::= primary binoprhs 414*9880d681SAndroid Build Coastguard Worker /// 415*9880d681SAndroid Build Coastguard Worker static std::unique_ptr<ExprAST> ParseExpression() { 416*9880d681SAndroid Build Coastguard Worker auto LHS = ParsePrimary(); 417*9880d681SAndroid Build Coastguard Worker if (!LHS) 418*9880d681SAndroid Build Coastguard Worker return nullptr; 419*9880d681SAndroid Build Coastguard Worker 420*9880d681SAndroid Build Coastguard Worker return ParseBinOpRHS(0, std::move(LHS)); 421*9880d681SAndroid Build Coastguard Worker } 422*9880d681SAndroid Build Coastguard Worker 423*9880d681SAndroid Build Coastguard Worker``ParseBinOpRHS`` is the function that parses the sequence of pairs for 424*9880d681SAndroid Build Coastguard Workerus. It takes a precedence and a pointer to an expression for the part 425*9880d681SAndroid Build Coastguard Workerthat has been parsed so far. Note that "x" is a perfectly valid 426*9880d681SAndroid Build Coastguard Workerexpression: As such, "binoprhs" is allowed to be empty, in which case it 427*9880d681SAndroid Build Coastguard Workerreturns the expression that is passed into it. In our example above, the 428*9880d681SAndroid Build Coastguard Workercode passes the expression for "a" into ``ParseBinOpRHS`` and the 429*9880d681SAndroid Build Coastguard Workercurrent token is "+". 430*9880d681SAndroid Build Coastguard Worker 431*9880d681SAndroid Build Coastguard WorkerThe precedence value passed into ``ParseBinOpRHS`` indicates the 432*9880d681SAndroid Build Coastguard Worker*minimal operator precedence* that the function is allowed to eat. For 433*9880d681SAndroid Build Coastguard Workerexample, if the current pair stream is [+, x] and ``ParseBinOpRHS`` is 434*9880d681SAndroid Build Coastguard Workerpassed in a precedence of 40, it will not consume any tokens (because 435*9880d681SAndroid Build Coastguard Workerthe precedence of '+' is only 20). With this in mind, ``ParseBinOpRHS`` 436*9880d681SAndroid Build Coastguard Workerstarts with: 437*9880d681SAndroid Build Coastguard Worker 438*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 439*9880d681SAndroid Build Coastguard Worker 440*9880d681SAndroid Build Coastguard Worker /// binoprhs 441*9880d681SAndroid Build Coastguard Worker /// ::= ('+' primary)* 442*9880d681SAndroid Build Coastguard Worker static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec, 443*9880d681SAndroid Build Coastguard Worker std::unique_ptr<ExprAST> LHS) { 444*9880d681SAndroid Build Coastguard Worker // If this is a binop, find its precedence. 445*9880d681SAndroid Build Coastguard Worker while (1) { 446*9880d681SAndroid Build Coastguard Worker int TokPrec = GetTokPrecedence(); 447*9880d681SAndroid Build Coastguard Worker 448*9880d681SAndroid Build Coastguard Worker // If this is a binop that binds at least as tightly as the current binop, 449*9880d681SAndroid Build Coastguard Worker // consume it, otherwise we are done. 450*9880d681SAndroid Build Coastguard Worker if (TokPrec < ExprPrec) 451*9880d681SAndroid Build Coastguard Worker return LHS; 452*9880d681SAndroid Build Coastguard Worker 453*9880d681SAndroid Build Coastguard WorkerThis code gets the precedence of the current token and checks to see if 454*9880d681SAndroid Build Coastguard Workerif is too low. Because we defined invalid tokens to have a precedence of 455*9880d681SAndroid Build Coastguard Worker-1, this check implicitly knows that the pair-stream ends when the token 456*9880d681SAndroid Build Coastguard Workerstream runs out of binary operators. If this check succeeds, we know 457*9880d681SAndroid Build Coastguard Workerthat the token is a binary operator and that it will be included in this 458*9880d681SAndroid Build Coastguard Workerexpression: 459*9880d681SAndroid Build Coastguard Worker 460*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 461*9880d681SAndroid Build Coastguard Worker 462*9880d681SAndroid Build Coastguard Worker // Okay, we know this is a binop. 463*9880d681SAndroid Build Coastguard Worker int BinOp = CurTok; 464*9880d681SAndroid Build Coastguard Worker getNextToken(); // eat binop 465*9880d681SAndroid Build Coastguard Worker 466*9880d681SAndroid Build Coastguard Worker // Parse the primary expression after the binary operator. 467*9880d681SAndroid Build Coastguard Worker auto RHS = ParsePrimary(); 468*9880d681SAndroid Build Coastguard Worker if (!RHS) 469*9880d681SAndroid Build Coastguard Worker return nullptr; 470*9880d681SAndroid Build Coastguard Worker 471*9880d681SAndroid Build Coastguard WorkerAs such, this code eats (and remembers) the binary operator and then 472*9880d681SAndroid Build Coastguard Workerparses the primary expression that follows. This builds up the whole 473*9880d681SAndroid Build Coastguard Workerpair, the first of which is [+, b] for the running example. 474*9880d681SAndroid Build Coastguard Worker 475*9880d681SAndroid Build Coastguard WorkerNow that we parsed the left-hand side of an expression and one pair of 476*9880d681SAndroid Build Coastguard Workerthe RHS sequence, we have to decide which way the expression associates. 477*9880d681SAndroid Build Coastguard WorkerIn particular, we could have "(a+b) binop unparsed" or "a + (b binop 478*9880d681SAndroid Build Coastguard Workerunparsed)". To determine this, we look ahead at "binop" to determine its 479*9880d681SAndroid Build Coastguard Workerprecedence and compare it to BinOp's precedence (which is '+' in this 480*9880d681SAndroid Build Coastguard Workercase): 481*9880d681SAndroid Build Coastguard Worker 482*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 483*9880d681SAndroid Build Coastguard Worker 484*9880d681SAndroid Build Coastguard Worker // If BinOp binds less tightly with RHS than the operator after RHS, let 485*9880d681SAndroid Build Coastguard Worker // the pending operator take RHS as its LHS. 486*9880d681SAndroid Build Coastguard Worker int NextPrec = GetTokPrecedence(); 487*9880d681SAndroid Build Coastguard Worker if (TokPrec < NextPrec) { 488*9880d681SAndroid Build Coastguard Worker 489*9880d681SAndroid Build Coastguard WorkerIf the precedence of the binop to the right of "RHS" is lower or equal 490*9880d681SAndroid Build Coastguard Workerto the precedence of our current operator, then we know that the 491*9880d681SAndroid Build Coastguard Workerparentheses associate as "(a+b) binop ...". In our example, the current 492*9880d681SAndroid Build Coastguard Workeroperator is "+" and the next operator is "+", we know that they have the 493*9880d681SAndroid Build Coastguard Workersame precedence. In this case we'll create the AST node for "a+b", and 494*9880d681SAndroid Build Coastguard Workerthen continue parsing: 495*9880d681SAndroid Build Coastguard Worker 496*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 497*9880d681SAndroid Build Coastguard Worker 498*9880d681SAndroid Build Coastguard Worker ... if body omitted ... 499*9880d681SAndroid Build Coastguard Worker } 500*9880d681SAndroid Build Coastguard Worker 501*9880d681SAndroid Build Coastguard Worker // Merge LHS/RHS. 502*9880d681SAndroid Build Coastguard Worker LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS), 503*9880d681SAndroid Build Coastguard Worker std::move(RHS)); 504*9880d681SAndroid Build Coastguard Worker } // loop around to the top of the while loop. 505*9880d681SAndroid Build Coastguard Worker } 506*9880d681SAndroid Build Coastguard Worker 507*9880d681SAndroid Build Coastguard WorkerIn our example above, this will turn "a+b+" into "(a+b)" and execute the 508*9880d681SAndroid Build Coastguard Workernext iteration of the loop, with "+" as the current token. The code 509*9880d681SAndroid Build Coastguard Workerabove will eat, remember, and parse "(c+d)" as the primary expression, 510*9880d681SAndroid Build Coastguard Workerwhich makes the current pair equal to [+, (c+d)]. It will then evaluate 511*9880d681SAndroid Build Coastguard Workerthe 'if' conditional above with "\*" as the binop to the right of the 512*9880d681SAndroid Build Coastguard Workerprimary. In this case, the precedence of "\*" is higher than the 513*9880d681SAndroid Build Coastguard Workerprecedence of "+" so the if condition will be entered. 514*9880d681SAndroid Build Coastguard Worker 515*9880d681SAndroid Build Coastguard WorkerThe critical question left here is "how can the if condition parse the 516*9880d681SAndroid Build Coastguard Workerright hand side in full"? In particular, to build the AST correctly for 517*9880d681SAndroid Build Coastguard Workerour example, it needs to get all of "(c+d)\*e\*f" as the RHS expression 518*9880d681SAndroid Build Coastguard Workervariable. The code to do this is surprisingly simple (code from the 519*9880d681SAndroid Build Coastguard Workerabove two blocks duplicated for context): 520*9880d681SAndroid Build Coastguard Worker 521*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 522*9880d681SAndroid Build Coastguard Worker 523*9880d681SAndroid Build Coastguard Worker // If BinOp binds less tightly with RHS than the operator after RHS, let 524*9880d681SAndroid Build Coastguard Worker // the pending operator take RHS as its LHS. 525*9880d681SAndroid Build Coastguard Worker int NextPrec = GetTokPrecedence(); 526*9880d681SAndroid Build Coastguard Worker if (TokPrec < NextPrec) { 527*9880d681SAndroid Build Coastguard Worker RHS = ParseBinOpRHS(TokPrec+1, std::move(RHS)); 528*9880d681SAndroid Build Coastguard Worker if (!RHS) 529*9880d681SAndroid Build Coastguard Worker return nullptr; 530*9880d681SAndroid Build Coastguard Worker } 531*9880d681SAndroid Build Coastguard Worker // Merge LHS/RHS. 532*9880d681SAndroid Build Coastguard Worker LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS), 533*9880d681SAndroid Build Coastguard Worker std::move(RHS)); 534*9880d681SAndroid Build Coastguard Worker } // loop around to the top of the while loop. 535*9880d681SAndroid Build Coastguard Worker } 536*9880d681SAndroid Build Coastguard Worker 537*9880d681SAndroid Build Coastguard WorkerAt this point, we know that the binary operator to the RHS of our 538*9880d681SAndroid Build Coastguard Workerprimary has higher precedence than the binop we are currently parsing. 539*9880d681SAndroid Build Coastguard WorkerAs such, we know that any sequence of pairs whose operators are all 540*9880d681SAndroid Build Coastguard Workerhigher precedence than "+" should be parsed together and returned as 541*9880d681SAndroid Build Coastguard Worker"RHS". To do this, we recursively invoke the ``ParseBinOpRHS`` function 542*9880d681SAndroid Build Coastguard Workerspecifying "TokPrec+1" as the minimum precedence required for it to 543*9880d681SAndroid Build Coastguard Workercontinue. In our example above, this will cause it to return the AST 544*9880d681SAndroid Build Coastguard Workernode for "(c+d)\*e\*f" as RHS, which is then set as the RHS of the '+' 545*9880d681SAndroid Build Coastguard Workerexpression. 546*9880d681SAndroid Build Coastguard Worker 547*9880d681SAndroid Build Coastguard WorkerFinally, on the next iteration of the while loop, the "+g" piece is 548*9880d681SAndroid Build Coastguard Workerparsed and added to the AST. With this little bit of code (14 549*9880d681SAndroid Build Coastguard Workernon-trivial lines), we correctly handle fully general binary expression 550*9880d681SAndroid Build Coastguard Workerparsing in a very elegant way. This was a whirlwind tour of this code, 551*9880d681SAndroid Build Coastguard Workerand it is somewhat subtle. I recommend running through it with a few 552*9880d681SAndroid Build Coastguard Workertough examples to see how it works. 553*9880d681SAndroid Build Coastguard Worker 554*9880d681SAndroid Build Coastguard WorkerThis wraps up handling of expressions. At this point, we can point the 555*9880d681SAndroid Build Coastguard Workerparser at an arbitrary token stream and build an expression from it, 556*9880d681SAndroid Build Coastguard Workerstopping at the first token that is not part of the expression. Next up 557*9880d681SAndroid Build Coastguard Workerwe need to handle function definitions, etc. 558*9880d681SAndroid Build Coastguard Worker 559*9880d681SAndroid Build Coastguard WorkerParsing the Rest 560*9880d681SAndroid Build Coastguard Worker================ 561*9880d681SAndroid Build Coastguard Worker 562*9880d681SAndroid Build Coastguard WorkerThe next thing missing is handling of function prototypes. In 563*9880d681SAndroid Build Coastguard WorkerKaleidoscope, these are used both for 'extern' function declarations as 564*9880d681SAndroid Build Coastguard Workerwell as function body definitions. The code to do this is 565*9880d681SAndroid Build Coastguard Workerstraight-forward and not very interesting (once you've survived 566*9880d681SAndroid Build Coastguard Workerexpressions): 567*9880d681SAndroid Build Coastguard Worker 568*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 569*9880d681SAndroid Build Coastguard Worker 570*9880d681SAndroid Build Coastguard Worker /// prototype 571*9880d681SAndroid Build Coastguard Worker /// ::= id '(' id* ')' 572*9880d681SAndroid Build Coastguard Worker static std::unique_ptr<PrototypeAST> ParsePrototype() { 573*9880d681SAndroid Build Coastguard Worker if (CurTok != tok_identifier) 574*9880d681SAndroid Build Coastguard Worker return LogErrorP("Expected function name in prototype"); 575*9880d681SAndroid Build Coastguard Worker 576*9880d681SAndroid Build Coastguard Worker std::string FnName = IdentifierStr; 577*9880d681SAndroid Build Coastguard Worker getNextToken(); 578*9880d681SAndroid Build Coastguard Worker 579*9880d681SAndroid Build Coastguard Worker if (CurTok != '(') 580*9880d681SAndroid Build Coastguard Worker return LogErrorP("Expected '(' in prototype"); 581*9880d681SAndroid Build Coastguard Worker 582*9880d681SAndroid Build Coastguard Worker // Read the list of argument names. 583*9880d681SAndroid Build Coastguard Worker std::vector<std::string> ArgNames; 584*9880d681SAndroid Build Coastguard Worker while (getNextToken() == tok_identifier) 585*9880d681SAndroid Build Coastguard Worker ArgNames.push_back(IdentifierStr); 586*9880d681SAndroid Build Coastguard Worker if (CurTok != ')') 587*9880d681SAndroid Build Coastguard Worker return LogErrorP("Expected ')' in prototype"); 588*9880d681SAndroid Build Coastguard Worker 589*9880d681SAndroid Build Coastguard Worker // success. 590*9880d681SAndroid Build Coastguard Worker getNextToken(); // eat ')'. 591*9880d681SAndroid Build Coastguard Worker 592*9880d681SAndroid Build Coastguard Worker return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames)); 593*9880d681SAndroid Build Coastguard Worker } 594*9880d681SAndroid Build Coastguard Worker 595*9880d681SAndroid Build Coastguard WorkerGiven this, a function definition is very simple, just a prototype plus 596*9880d681SAndroid Build Coastguard Workeran expression to implement the body: 597*9880d681SAndroid Build Coastguard Worker 598*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 599*9880d681SAndroid Build Coastguard Worker 600*9880d681SAndroid Build Coastguard Worker /// definition ::= 'def' prototype expression 601*9880d681SAndroid Build Coastguard Worker static std::unique_ptr<FunctionAST> ParseDefinition() { 602*9880d681SAndroid Build Coastguard Worker getNextToken(); // eat def. 603*9880d681SAndroid Build Coastguard Worker auto Proto = ParsePrototype(); 604*9880d681SAndroid Build Coastguard Worker if (!Proto) return nullptr; 605*9880d681SAndroid Build Coastguard Worker 606*9880d681SAndroid Build Coastguard Worker if (auto E = ParseExpression()) 607*9880d681SAndroid Build Coastguard Worker return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E)); 608*9880d681SAndroid Build Coastguard Worker return nullptr; 609*9880d681SAndroid Build Coastguard Worker } 610*9880d681SAndroid Build Coastguard Worker 611*9880d681SAndroid Build Coastguard WorkerIn addition, we support 'extern' to declare functions like 'sin' and 612*9880d681SAndroid Build Coastguard Worker'cos' as well as to support forward declaration of user functions. These 613*9880d681SAndroid Build Coastguard Worker'extern's are just prototypes with no body: 614*9880d681SAndroid Build Coastguard Worker 615*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 616*9880d681SAndroid Build Coastguard Worker 617*9880d681SAndroid Build Coastguard Worker /// external ::= 'extern' prototype 618*9880d681SAndroid Build Coastguard Worker static std::unique_ptr<PrototypeAST> ParseExtern() { 619*9880d681SAndroid Build Coastguard Worker getNextToken(); // eat extern. 620*9880d681SAndroid Build Coastguard Worker return ParsePrototype(); 621*9880d681SAndroid Build Coastguard Worker } 622*9880d681SAndroid Build Coastguard Worker 623*9880d681SAndroid Build Coastguard WorkerFinally, we'll also let the user type in arbitrary top-level expressions 624*9880d681SAndroid Build Coastguard Workerand evaluate them on the fly. We will handle this by defining anonymous 625*9880d681SAndroid Build Coastguard Workernullary (zero argument) functions for them: 626*9880d681SAndroid Build Coastguard Worker 627*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 628*9880d681SAndroid Build Coastguard Worker 629*9880d681SAndroid Build Coastguard Worker /// toplevelexpr ::= expression 630*9880d681SAndroid Build Coastguard Worker static std::unique_ptr<FunctionAST> ParseTopLevelExpr() { 631*9880d681SAndroid Build Coastguard Worker if (auto E = ParseExpression()) { 632*9880d681SAndroid Build Coastguard Worker // Make an anonymous proto. 633*9880d681SAndroid Build Coastguard Worker auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>()); 634*9880d681SAndroid Build Coastguard Worker return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E)); 635*9880d681SAndroid Build Coastguard Worker } 636*9880d681SAndroid Build Coastguard Worker return nullptr; 637*9880d681SAndroid Build Coastguard Worker } 638*9880d681SAndroid Build Coastguard Worker 639*9880d681SAndroid Build Coastguard WorkerNow that we have all the pieces, let's build a little driver that will 640*9880d681SAndroid Build Coastguard Workerlet us actually *execute* this code we've built! 641*9880d681SAndroid Build Coastguard Worker 642*9880d681SAndroid Build Coastguard WorkerThe Driver 643*9880d681SAndroid Build Coastguard Worker========== 644*9880d681SAndroid Build Coastguard Worker 645*9880d681SAndroid Build Coastguard WorkerThe driver for this simply invokes all of the parsing pieces with a 646*9880d681SAndroid Build Coastguard Workertop-level dispatch loop. There isn't much interesting here, so I'll just 647*9880d681SAndroid Build Coastguard Workerinclude the top-level loop. See `below <#full-code-listing>`_ for full code in the 648*9880d681SAndroid Build Coastguard Worker"Top-Level Parsing" section. 649*9880d681SAndroid Build Coastguard Worker 650*9880d681SAndroid Build Coastguard Worker.. code-block:: c++ 651*9880d681SAndroid Build Coastguard Worker 652*9880d681SAndroid Build Coastguard Worker /// top ::= definition | external | expression | ';' 653*9880d681SAndroid Build Coastguard Worker static void MainLoop() { 654*9880d681SAndroid Build Coastguard Worker while (1) { 655*9880d681SAndroid Build Coastguard Worker fprintf(stderr, "ready> "); 656*9880d681SAndroid Build Coastguard Worker switch (CurTok) { 657*9880d681SAndroid Build Coastguard Worker case tok_eof: 658*9880d681SAndroid Build Coastguard Worker return; 659*9880d681SAndroid Build Coastguard Worker case ';': // ignore top-level semicolons. 660*9880d681SAndroid Build Coastguard Worker getNextToken(); 661*9880d681SAndroid Build Coastguard Worker break; 662*9880d681SAndroid Build Coastguard Worker case tok_def: 663*9880d681SAndroid Build Coastguard Worker HandleDefinition(); 664*9880d681SAndroid Build Coastguard Worker break; 665*9880d681SAndroid Build Coastguard Worker case tok_extern: 666*9880d681SAndroid Build Coastguard Worker HandleExtern(); 667*9880d681SAndroid Build Coastguard Worker break; 668*9880d681SAndroid Build Coastguard Worker default: 669*9880d681SAndroid Build Coastguard Worker HandleTopLevelExpression(); 670*9880d681SAndroid Build Coastguard Worker break; 671*9880d681SAndroid Build Coastguard Worker } 672*9880d681SAndroid Build Coastguard Worker } 673*9880d681SAndroid Build Coastguard Worker } 674*9880d681SAndroid Build Coastguard Worker 675*9880d681SAndroid Build Coastguard WorkerThe most interesting part of this is that we ignore top-level 676*9880d681SAndroid Build Coastguard Workersemicolons. Why is this, you ask? The basic reason is that if you type 677*9880d681SAndroid Build Coastguard Worker"4 + 5" at the command line, the parser doesn't know whether that is the 678*9880d681SAndroid Build Coastguard Workerend of what you will type or not. For example, on the next line you 679*9880d681SAndroid Build Coastguard Workercould type "def foo..." in which case 4+5 is the end of a top-level 680*9880d681SAndroid Build Coastguard Workerexpression. Alternatively you could type "\* 6", which would continue 681*9880d681SAndroid Build Coastguard Workerthe expression. Having top-level semicolons allows you to type "4+5;", 682*9880d681SAndroid Build Coastguard Workerand the parser will know you are done. 683*9880d681SAndroid Build Coastguard Worker 684*9880d681SAndroid Build Coastguard WorkerConclusions 685*9880d681SAndroid Build Coastguard Worker=========== 686*9880d681SAndroid Build Coastguard Worker 687*9880d681SAndroid Build Coastguard WorkerWith just under 400 lines of commented code (240 lines of non-comment, 688*9880d681SAndroid Build Coastguard Workernon-blank code), we fully defined our minimal language, including a 689*9880d681SAndroid Build Coastguard Workerlexer, parser, and AST builder. With this done, the executable will 690*9880d681SAndroid Build Coastguard Workervalidate Kaleidoscope code and tell us if it is grammatically invalid. 691*9880d681SAndroid Build Coastguard WorkerFor example, here is a sample interaction: 692*9880d681SAndroid Build Coastguard Worker 693*9880d681SAndroid Build Coastguard Worker.. code-block:: bash 694*9880d681SAndroid Build Coastguard Worker 695*9880d681SAndroid Build Coastguard Worker $ ./a.out 696*9880d681SAndroid Build Coastguard Worker ready> def foo(x y) x+foo(y, 4.0); 697*9880d681SAndroid Build Coastguard Worker Parsed a function definition. 698*9880d681SAndroid Build Coastguard Worker ready> def foo(x y) x+y y; 699*9880d681SAndroid Build Coastguard Worker Parsed a function definition. 700*9880d681SAndroid Build Coastguard Worker Parsed a top-level expr 701*9880d681SAndroid Build Coastguard Worker ready> def foo(x y) x+y ); 702*9880d681SAndroid Build Coastguard Worker Parsed a function definition. 703*9880d681SAndroid Build Coastguard Worker Error: unknown token when expecting an expression 704*9880d681SAndroid Build Coastguard Worker ready> extern sin(a); 705*9880d681SAndroid Build Coastguard Worker ready> Parsed an extern 706*9880d681SAndroid Build Coastguard Worker ready> ^D 707*9880d681SAndroid Build Coastguard Worker $ 708*9880d681SAndroid Build Coastguard Worker 709*9880d681SAndroid Build Coastguard WorkerThere is a lot of room for extension here. You can define new AST nodes, 710*9880d681SAndroid Build Coastguard Workerextend the language in many ways, etc. In the `next 711*9880d681SAndroid Build Coastguard Workerinstallment <LangImpl3.html>`_, we will describe how to generate LLVM 712*9880d681SAndroid Build Coastguard WorkerIntermediate Representation (IR) from the AST. 713*9880d681SAndroid Build Coastguard Worker 714*9880d681SAndroid Build Coastguard WorkerFull Code Listing 715*9880d681SAndroid Build Coastguard Worker================= 716*9880d681SAndroid Build Coastguard Worker 717*9880d681SAndroid Build Coastguard WorkerHere is the complete code listing for this and the previous chapter. 718*9880d681SAndroid Build Coastguard WorkerNote that it is fully self-contained: you don't need LLVM or any 719*9880d681SAndroid Build Coastguard Workerexternal libraries at all for this. (Besides the C and C++ standard 720*9880d681SAndroid Build Coastguard Workerlibraries, of course.) To build this, just compile with: 721*9880d681SAndroid Build Coastguard Worker 722*9880d681SAndroid Build Coastguard Worker.. code-block:: bash 723*9880d681SAndroid Build Coastguard Worker 724*9880d681SAndroid Build Coastguard Worker # Compile 725*9880d681SAndroid Build Coastguard Worker clang++ -g -O3 toy.cpp 726*9880d681SAndroid Build Coastguard Worker # Run 727*9880d681SAndroid Build Coastguard Worker ./a.out 728*9880d681SAndroid Build Coastguard Worker 729*9880d681SAndroid Build Coastguard WorkerHere is the code: 730*9880d681SAndroid Build Coastguard Worker 731*9880d681SAndroid Build Coastguard Worker.. literalinclude:: ../../examples/Kaleidoscope/Chapter2/toy.cpp 732*9880d681SAndroid Build Coastguard Worker :language: c++ 733*9880d681SAndroid Build Coastguard Worker 734*9880d681SAndroid Build Coastguard Worker`Next: Implementing Code Generation to LLVM IR <LangImpl03.html>`_ 735*9880d681SAndroid Build Coastguard Worker 736