xref: /aosp_15_r20/external/llvm/docs/tutorial/LangImpl02.rst (revision 9880d6810fe72a1726cb53787c6711e909410d58)
1*9880d681SAndroid Build Coastguard Worker===========================================
2*9880d681SAndroid Build Coastguard WorkerKaleidoscope: Implementing a Parser and AST
3*9880d681SAndroid Build Coastguard Worker===========================================
4*9880d681SAndroid Build Coastguard Worker
5*9880d681SAndroid Build Coastguard Worker.. contents::
6*9880d681SAndroid Build Coastguard Worker   :local:
7*9880d681SAndroid Build Coastguard Worker
8*9880d681SAndroid Build Coastguard WorkerChapter 2 Introduction
9*9880d681SAndroid Build Coastguard Worker======================
10*9880d681SAndroid Build Coastguard Worker
11*9880d681SAndroid Build Coastguard WorkerWelcome to Chapter 2 of the "`Implementing a language with
12*9880d681SAndroid Build Coastguard WorkerLLVM <index.html>`_" tutorial. This chapter shows you how to use the
13*9880d681SAndroid Build Coastguard Workerlexer, built in `Chapter 1 <LangImpl1.html>`_, to build a full
14*9880d681SAndroid Build Coastguard Worker`parser <http://en.wikipedia.org/wiki/Parsing>`_ for our Kaleidoscope
15*9880d681SAndroid Build Coastguard Workerlanguage. Once we have a parser, we'll define and build an `Abstract
16*9880d681SAndroid Build Coastguard WorkerSyntax Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST).
17*9880d681SAndroid Build Coastguard Worker
18*9880d681SAndroid Build Coastguard WorkerThe parser we will build uses a combination of `Recursive Descent
19*9880d681SAndroid Build Coastguard WorkerParsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and
20*9880d681SAndroid Build Coastguard Worker`Operator-Precedence
21*9880d681SAndroid Build Coastguard WorkerParsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to
22*9880d681SAndroid Build Coastguard Workerparse the Kaleidoscope language (the latter for binary expressions and
23*9880d681SAndroid Build Coastguard Workerthe former for everything else). Before we get to parsing though, lets
24*9880d681SAndroid Build Coastguard Workertalk about the output of the parser: the Abstract Syntax Tree.
25*9880d681SAndroid Build Coastguard Worker
26*9880d681SAndroid Build Coastguard WorkerThe Abstract Syntax Tree (AST)
27*9880d681SAndroid Build Coastguard Worker==============================
28*9880d681SAndroid Build Coastguard Worker
29*9880d681SAndroid Build Coastguard WorkerThe AST for a program captures its behavior in such a way that it is
30*9880d681SAndroid Build Coastguard Workereasy for later stages of the compiler (e.g. code generation) to
31*9880d681SAndroid Build Coastguard Workerinterpret. We basically want one object for each construct in the
32*9880d681SAndroid Build Coastguard Workerlanguage, and the AST should closely model the language. In
33*9880d681SAndroid Build Coastguard WorkerKaleidoscope, we have expressions, a prototype, and a function object.
34*9880d681SAndroid Build Coastguard WorkerWe'll start with expressions first:
35*9880d681SAndroid Build Coastguard Worker
36*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
37*9880d681SAndroid Build Coastguard Worker
38*9880d681SAndroid Build Coastguard Worker    /// ExprAST - Base class for all expression nodes.
39*9880d681SAndroid Build Coastguard Worker    class ExprAST {
40*9880d681SAndroid Build Coastguard Worker    public:
41*9880d681SAndroid Build Coastguard Worker      virtual ~ExprAST() {}
42*9880d681SAndroid Build Coastguard Worker    };
43*9880d681SAndroid Build Coastguard Worker
44*9880d681SAndroid Build Coastguard Worker    /// NumberExprAST - Expression class for numeric literals like "1.0".
45*9880d681SAndroid Build Coastguard Worker    class NumberExprAST : public ExprAST {
46*9880d681SAndroid Build Coastguard Worker      double Val;
47*9880d681SAndroid Build Coastguard Worker
48*9880d681SAndroid Build Coastguard Worker    public:
49*9880d681SAndroid Build Coastguard Worker      NumberExprAST(double Val) : Val(Val) {}
50*9880d681SAndroid Build Coastguard Worker    };
51*9880d681SAndroid Build Coastguard Worker
52*9880d681SAndroid Build Coastguard WorkerThe code above shows the definition of the base ExprAST class and one
53*9880d681SAndroid Build Coastguard Workersubclass which we use for numeric literals. The important thing to note
54*9880d681SAndroid Build Coastguard Workerabout this code is that the NumberExprAST class captures the numeric
55*9880d681SAndroid Build Coastguard Workervalue of the literal as an instance variable. This allows later phases
56*9880d681SAndroid Build Coastguard Workerof the compiler to know what the stored numeric value is.
57*9880d681SAndroid Build Coastguard Worker
58*9880d681SAndroid Build Coastguard WorkerRight now we only create the AST, so there are no useful accessor
59*9880d681SAndroid Build Coastguard Workermethods on them. It would be very easy to add a virtual method to pretty
60*9880d681SAndroid Build Coastguard Workerprint the code, for example. Here are the other expression AST node
61*9880d681SAndroid Build Coastguard Workerdefinitions that we'll use in the basic form of the Kaleidoscope
62*9880d681SAndroid Build Coastguard Workerlanguage:
63*9880d681SAndroid Build Coastguard Worker
64*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
65*9880d681SAndroid Build Coastguard Worker
66*9880d681SAndroid Build Coastguard Worker    /// VariableExprAST - Expression class for referencing a variable, like "a".
67*9880d681SAndroid Build Coastguard Worker    class VariableExprAST : public ExprAST {
68*9880d681SAndroid Build Coastguard Worker      std::string Name;
69*9880d681SAndroid Build Coastguard Worker
70*9880d681SAndroid Build Coastguard Worker    public:
71*9880d681SAndroid Build Coastguard Worker      VariableExprAST(const std::string &Name) : Name(Name) {}
72*9880d681SAndroid Build Coastguard Worker    };
73*9880d681SAndroid Build Coastguard Worker
74*9880d681SAndroid Build Coastguard Worker    /// BinaryExprAST - Expression class for a binary operator.
75*9880d681SAndroid Build Coastguard Worker    class BinaryExprAST : public ExprAST {
76*9880d681SAndroid Build Coastguard Worker      char Op;
77*9880d681SAndroid Build Coastguard Worker      std::unique_ptr<ExprAST> LHS, RHS;
78*9880d681SAndroid Build Coastguard Worker
79*9880d681SAndroid Build Coastguard Worker    public:
80*9880d681SAndroid Build Coastguard Worker      BinaryExprAST(char op, std::unique_ptr<ExprAST> LHS,
81*9880d681SAndroid Build Coastguard Worker                    std::unique_ptr<ExprAST> RHS)
82*9880d681SAndroid Build Coastguard Worker        : Op(op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}
83*9880d681SAndroid Build Coastguard Worker    };
84*9880d681SAndroid Build Coastguard Worker
85*9880d681SAndroid Build Coastguard Worker    /// CallExprAST - Expression class for function calls.
86*9880d681SAndroid Build Coastguard Worker    class CallExprAST : public ExprAST {
87*9880d681SAndroid Build Coastguard Worker      std::string Callee;
88*9880d681SAndroid Build Coastguard Worker      std::vector<std::unique_ptr<ExprAST>> Args;
89*9880d681SAndroid Build Coastguard Worker
90*9880d681SAndroid Build Coastguard Worker    public:
91*9880d681SAndroid Build Coastguard Worker      CallExprAST(const std::string &Callee,
92*9880d681SAndroid Build Coastguard Worker                  std::vector<std::unique_ptr<ExprAST>> Args)
93*9880d681SAndroid Build Coastguard Worker        : Callee(Callee), Args(std::move(Args)) {}
94*9880d681SAndroid Build Coastguard Worker    };
95*9880d681SAndroid Build Coastguard Worker
96*9880d681SAndroid Build Coastguard WorkerThis is all (intentionally) rather straight-forward: variables capture
97*9880d681SAndroid Build Coastguard Workerthe variable name, binary operators capture their opcode (e.g. '+'), and
98*9880d681SAndroid Build Coastguard Workercalls capture a function name as well as a list of any argument
99*9880d681SAndroid Build Coastguard Workerexpressions. One thing that is nice about our AST is that it captures
100*9880d681SAndroid Build Coastguard Workerthe language features without talking about the syntax of the language.
101*9880d681SAndroid Build Coastguard WorkerNote that there is no discussion about precedence of binary operators,
102*9880d681SAndroid Build Coastguard Workerlexical structure, etc.
103*9880d681SAndroid Build Coastguard Worker
104*9880d681SAndroid Build Coastguard WorkerFor our basic language, these are all of the expression nodes we'll
105*9880d681SAndroid Build Coastguard Workerdefine. Because it doesn't have conditional control flow, it isn't
106*9880d681SAndroid Build Coastguard WorkerTuring-complete; we'll fix that in a later installment. The two things
107*9880d681SAndroid Build Coastguard Workerwe need next are a way to talk about the interface to a function, and a
108*9880d681SAndroid Build Coastguard Workerway to talk about functions themselves:
109*9880d681SAndroid Build Coastguard Worker
110*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
111*9880d681SAndroid Build Coastguard Worker
112*9880d681SAndroid Build Coastguard Worker    /// PrototypeAST - This class represents the "prototype" for a function,
113*9880d681SAndroid Build Coastguard Worker    /// which captures its name, and its argument names (thus implicitly the number
114*9880d681SAndroid Build Coastguard Worker    /// of arguments the function takes).
115*9880d681SAndroid Build Coastguard Worker    class PrototypeAST {
116*9880d681SAndroid Build Coastguard Worker      std::string Name;
117*9880d681SAndroid Build Coastguard Worker      std::vector<std::string> Args;
118*9880d681SAndroid Build Coastguard Worker
119*9880d681SAndroid Build Coastguard Worker    public:
120*9880d681SAndroid Build Coastguard Worker      PrototypeAST(const std::string &name, std::vector<std::string> Args)
121*9880d681SAndroid Build Coastguard Worker        : Name(name), Args(std::move(Args)) {}
122*9880d681SAndroid Build Coastguard Worker    };
123*9880d681SAndroid Build Coastguard Worker
124*9880d681SAndroid Build Coastguard Worker    /// FunctionAST - This class represents a function definition itself.
125*9880d681SAndroid Build Coastguard Worker    class FunctionAST {
126*9880d681SAndroid Build Coastguard Worker      std::unique_ptr<PrototypeAST> Proto;
127*9880d681SAndroid Build Coastguard Worker      std::unique_ptr<ExprAST> Body;
128*9880d681SAndroid Build Coastguard Worker
129*9880d681SAndroid Build Coastguard Worker    public:
130*9880d681SAndroid Build Coastguard Worker      FunctionAST(std::unique_ptr<PrototypeAST> Proto,
131*9880d681SAndroid Build Coastguard Worker                  std::unique_ptr<ExprAST> Body)
132*9880d681SAndroid Build Coastguard Worker        : Proto(std::move(Proto)), Body(std::move(Body)) {}
133*9880d681SAndroid Build Coastguard Worker    };
134*9880d681SAndroid Build Coastguard Worker
135*9880d681SAndroid Build Coastguard WorkerIn Kaleidoscope, functions are typed with just a count of their
136*9880d681SAndroid Build Coastguard Workerarguments. Since all values are double precision floating point, the
137*9880d681SAndroid Build Coastguard Workertype of each argument doesn't need to be stored anywhere. In a more
138*9880d681SAndroid Build Coastguard Workeraggressive and realistic language, the "ExprAST" class would probably
139*9880d681SAndroid Build Coastguard Workerhave a type field.
140*9880d681SAndroid Build Coastguard Worker
141*9880d681SAndroid Build Coastguard WorkerWith this scaffolding, we can now talk about parsing expressions and
142*9880d681SAndroid Build Coastguard Workerfunction bodies in Kaleidoscope.
143*9880d681SAndroid Build Coastguard Worker
144*9880d681SAndroid Build Coastguard WorkerParser Basics
145*9880d681SAndroid Build Coastguard Worker=============
146*9880d681SAndroid Build Coastguard Worker
147*9880d681SAndroid Build Coastguard WorkerNow that we have an AST to build, we need to define the parser code to
148*9880d681SAndroid Build Coastguard Workerbuild it. The idea here is that we want to parse something like "x+y"
149*9880d681SAndroid Build Coastguard Worker(which is returned as three tokens by the lexer) into an AST that could
150*9880d681SAndroid Build Coastguard Workerbe generated with calls like this:
151*9880d681SAndroid Build Coastguard Worker
152*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
153*9880d681SAndroid Build Coastguard Worker
154*9880d681SAndroid Build Coastguard Worker      auto LHS = llvm::make_unique<VariableExprAST>("x");
155*9880d681SAndroid Build Coastguard Worker      auto RHS = llvm::make_unique<VariableExprAST>("y");
156*9880d681SAndroid Build Coastguard Worker      auto Result = std::make_unique<BinaryExprAST>('+', std::move(LHS),
157*9880d681SAndroid Build Coastguard Worker                                                    std::move(RHS));
158*9880d681SAndroid Build Coastguard Worker
159*9880d681SAndroid Build Coastguard WorkerIn order to do this, we'll start by defining some basic helper routines:
160*9880d681SAndroid Build Coastguard Worker
161*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
162*9880d681SAndroid Build Coastguard Worker
163*9880d681SAndroid Build Coastguard Worker    /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
164*9880d681SAndroid Build Coastguard Worker    /// token the parser is looking at.  getNextToken reads another token from the
165*9880d681SAndroid Build Coastguard Worker    /// lexer and updates CurTok with its results.
166*9880d681SAndroid Build Coastguard Worker    static int CurTok;
167*9880d681SAndroid Build Coastguard Worker    static int getNextToken() {
168*9880d681SAndroid Build Coastguard Worker      return CurTok = gettok();
169*9880d681SAndroid Build Coastguard Worker    }
170*9880d681SAndroid Build Coastguard Worker
171*9880d681SAndroid Build Coastguard WorkerThis implements a simple token buffer around the lexer. This allows us
172*9880d681SAndroid Build Coastguard Workerto look one token ahead at what the lexer is returning. Every function
173*9880d681SAndroid Build Coastguard Workerin our parser will assume that CurTok is the current token that needs to
174*9880d681SAndroid Build Coastguard Workerbe parsed.
175*9880d681SAndroid Build Coastguard Worker
176*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
177*9880d681SAndroid Build Coastguard Worker
178*9880d681SAndroid Build Coastguard Worker
179*9880d681SAndroid Build Coastguard Worker    /// LogError* - These are little helper functions for error handling.
180*9880d681SAndroid Build Coastguard Worker    std::unique_ptr<ExprAST> LogError(const char *Str) {
181*9880d681SAndroid Build Coastguard Worker      fprintf(stderr, "LogError: %s\n", Str);
182*9880d681SAndroid Build Coastguard Worker      return nullptr;
183*9880d681SAndroid Build Coastguard Worker    }
184*9880d681SAndroid Build Coastguard Worker    std::unique_ptr<PrototypeAST> LogErrorP(const char *Str) {
185*9880d681SAndroid Build Coastguard Worker      LogError(Str);
186*9880d681SAndroid Build Coastguard Worker      return nullptr;
187*9880d681SAndroid Build Coastguard Worker    }
188*9880d681SAndroid Build Coastguard Worker
189*9880d681SAndroid Build Coastguard WorkerThe ``LogError`` routines are simple helper routines that our parser will
190*9880d681SAndroid Build Coastguard Workeruse to handle errors. The error recovery in our parser will not be the
191*9880d681SAndroid Build Coastguard Workerbest and is not particular user-friendly, but it will be enough for our
192*9880d681SAndroid Build Coastguard Workertutorial. These routines make it easier to handle errors in routines
193*9880d681SAndroid Build Coastguard Workerthat have various return types: they always return null.
194*9880d681SAndroid Build Coastguard Worker
195*9880d681SAndroid Build Coastguard WorkerWith these basic helper functions, we can implement the first piece of
196*9880d681SAndroid Build Coastguard Workerour grammar: numeric literals.
197*9880d681SAndroid Build Coastguard Worker
198*9880d681SAndroid Build Coastguard WorkerBasic Expression Parsing
199*9880d681SAndroid Build Coastguard Worker========================
200*9880d681SAndroid Build Coastguard Worker
201*9880d681SAndroid Build Coastguard WorkerWe start with numeric literals, because they are the simplest to
202*9880d681SAndroid Build Coastguard Workerprocess. For each production in our grammar, we'll define a function
203*9880d681SAndroid Build Coastguard Workerwhich parses that production. For numeric literals, we have:
204*9880d681SAndroid Build Coastguard Worker
205*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
206*9880d681SAndroid Build Coastguard Worker
207*9880d681SAndroid Build Coastguard Worker    /// numberexpr ::= number
208*9880d681SAndroid Build Coastguard Worker    static std::unique_ptr<ExprAST> ParseNumberExpr() {
209*9880d681SAndroid Build Coastguard Worker      auto Result = llvm::make_unique<NumberExprAST>(NumVal);
210*9880d681SAndroid Build Coastguard Worker      getNextToken(); // consume the number
211*9880d681SAndroid Build Coastguard Worker      return std::move(Result);
212*9880d681SAndroid Build Coastguard Worker    }
213*9880d681SAndroid Build Coastguard Worker
214*9880d681SAndroid Build Coastguard WorkerThis routine is very simple: it expects to be called when the current
215*9880d681SAndroid Build Coastguard Workertoken is a ``tok_number`` token. It takes the current number value,
216*9880d681SAndroid Build Coastguard Workercreates a ``NumberExprAST`` node, advances the lexer to the next token,
217*9880d681SAndroid Build Coastguard Workerand finally returns.
218*9880d681SAndroid Build Coastguard Worker
219*9880d681SAndroid Build Coastguard WorkerThere are some interesting aspects to this. The most important one is
220*9880d681SAndroid Build Coastguard Workerthat this routine eats all of the tokens that correspond to the
221*9880d681SAndroid Build Coastguard Workerproduction and returns the lexer buffer with the next token (which is
222*9880d681SAndroid Build Coastguard Workernot part of the grammar production) ready to go. This is a fairly
223*9880d681SAndroid Build Coastguard Workerstandard way to go for recursive descent parsers. For a better example,
224*9880d681SAndroid Build Coastguard Workerthe parenthesis operator is defined like this:
225*9880d681SAndroid Build Coastguard Worker
226*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
227*9880d681SAndroid Build Coastguard Worker
228*9880d681SAndroid Build Coastguard Worker    /// parenexpr ::= '(' expression ')'
229*9880d681SAndroid Build Coastguard Worker    static std::unique_ptr<ExprAST> ParseParenExpr() {
230*9880d681SAndroid Build Coastguard Worker      getNextToken(); // eat (.
231*9880d681SAndroid Build Coastguard Worker      auto V = ParseExpression();
232*9880d681SAndroid Build Coastguard Worker      if (!V)
233*9880d681SAndroid Build Coastguard Worker        return nullptr;
234*9880d681SAndroid Build Coastguard Worker
235*9880d681SAndroid Build Coastguard Worker      if (CurTok != ')')
236*9880d681SAndroid Build Coastguard Worker        return LogError("expected ')'");
237*9880d681SAndroid Build Coastguard Worker      getNextToken(); // eat ).
238*9880d681SAndroid Build Coastguard Worker      return V;
239*9880d681SAndroid Build Coastguard Worker    }
240*9880d681SAndroid Build Coastguard Worker
241*9880d681SAndroid Build Coastguard WorkerThis function illustrates a number of interesting things about the
242*9880d681SAndroid Build Coastguard Workerparser:
243*9880d681SAndroid Build Coastguard Worker
244*9880d681SAndroid Build Coastguard Worker1) It shows how we use the LogError routines. When called, this function
245*9880d681SAndroid Build Coastguard Workerexpects that the current token is a '(' token, but after parsing the
246*9880d681SAndroid Build Coastguard Workersubexpression, it is possible that there is no ')' waiting. For example,
247*9880d681SAndroid Build Coastguard Workerif the user types in "(4 x" instead of "(4)", the parser should emit an
248*9880d681SAndroid Build Coastguard Workererror. Because errors can occur, the parser needs a way to indicate that
249*9880d681SAndroid Build Coastguard Workerthey happened: in our parser, we return null on an error.
250*9880d681SAndroid Build Coastguard Worker
251*9880d681SAndroid Build Coastguard Worker2) Another interesting aspect of this function is that it uses recursion
252*9880d681SAndroid Build Coastguard Workerby calling ``ParseExpression`` (we will soon see that
253*9880d681SAndroid Build Coastguard Worker``ParseExpression`` can call ``ParseParenExpr``). This is powerful
254*9880d681SAndroid Build Coastguard Workerbecause it allows us to handle recursive grammars, and keeps each
255*9880d681SAndroid Build Coastguard Workerproduction very simple. Note that parentheses do not cause construction
256*9880d681SAndroid Build Coastguard Workerof AST nodes themselves. While we could do it this way, the most
257*9880d681SAndroid Build Coastguard Workerimportant role of parentheses are to guide the parser and provide
258*9880d681SAndroid Build Coastguard Workergrouping. Once the parser constructs the AST, parentheses are not
259*9880d681SAndroid Build Coastguard Workerneeded.
260*9880d681SAndroid Build Coastguard Worker
261*9880d681SAndroid Build Coastguard WorkerThe next simple production is for handling variable references and
262*9880d681SAndroid Build Coastguard Workerfunction calls:
263*9880d681SAndroid Build Coastguard Worker
264*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
265*9880d681SAndroid Build Coastguard Worker
266*9880d681SAndroid Build Coastguard Worker    /// identifierexpr
267*9880d681SAndroid Build Coastguard Worker    ///   ::= identifier
268*9880d681SAndroid Build Coastguard Worker    ///   ::= identifier '(' expression* ')'
269*9880d681SAndroid Build Coastguard Worker    static std::unique_ptr<ExprAST> ParseIdentifierExpr() {
270*9880d681SAndroid Build Coastguard Worker      std::string IdName = IdentifierStr;
271*9880d681SAndroid Build Coastguard Worker
272*9880d681SAndroid Build Coastguard Worker      getNextToken();  // eat identifier.
273*9880d681SAndroid Build Coastguard Worker
274*9880d681SAndroid Build Coastguard Worker      if (CurTok != '(') // Simple variable ref.
275*9880d681SAndroid Build Coastguard Worker        return llvm::make_unique<VariableExprAST>(IdName);
276*9880d681SAndroid Build Coastguard Worker
277*9880d681SAndroid Build Coastguard Worker      // Call.
278*9880d681SAndroid Build Coastguard Worker      getNextToken();  // eat (
279*9880d681SAndroid Build Coastguard Worker      std::vector<std::unique_ptr<ExprAST>> Args;
280*9880d681SAndroid Build Coastguard Worker      if (CurTok != ')') {
281*9880d681SAndroid Build Coastguard Worker        while (1) {
282*9880d681SAndroid Build Coastguard Worker          if (auto Arg = ParseExpression())
283*9880d681SAndroid Build Coastguard Worker            Args.push_back(std::move(Arg));
284*9880d681SAndroid Build Coastguard Worker          else
285*9880d681SAndroid Build Coastguard Worker            return nullptr;
286*9880d681SAndroid Build Coastguard Worker
287*9880d681SAndroid Build Coastguard Worker          if (CurTok == ')')
288*9880d681SAndroid Build Coastguard Worker            break;
289*9880d681SAndroid Build Coastguard Worker
290*9880d681SAndroid Build Coastguard Worker          if (CurTok != ',')
291*9880d681SAndroid Build Coastguard Worker            return LogError("Expected ')' or ',' in argument list");
292*9880d681SAndroid Build Coastguard Worker          getNextToken();
293*9880d681SAndroid Build Coastguard Worker        }
294*9880d681SAndroid Build Coastguard Worker      }
295*9880d681SAndroid Build Coastguard Worker
296*9880d681SAndroid Build Coastguard Worker      // Eat the ')'.
297*9880d681SAndroid Build Coastguard Worker      getNextToken();
298*9880d681SAndroid Build Coastguard Worker
299*9880d681SAndroid Build Coastguard Worker      return llvm::make_unique<CallExprAST>(IdName, std::move(Args));
300*9880d681SAndroid Build Coastguard Worker    }
301*9880d681SAndroid Build Coastguard Worker
302*9880d681SAndroid Build Coastguard WorkerThis routine follows the same style as the other routines. (It expects
303*9880d681SAndroid Build Coastguard Workerto be called if the current token is a ``tok_identifier`` token). It
304*9880d681SAndroid Build Coastguard Workeralso has recursion and error handling. One interesting aspect of this is
305*9880d681SAndroid Build Coastguard Workerthat it uses *look-ahead* to determine if the current identifier is a
306*9880d681SAndroid Build Coastguard Workerstand alone variable reference or if it is a function call expression.
307*9880d681SAndroid Build Coastguard WorkerIt handles this by checking to see if the token after the identifier is
308*9880d681SAndroid Build Coastguard Workera '(' token, constructing either a ``VariableExprAST`` or
309*9880d681SAndroid Build Coastguard Worker``CallExprAST`` node as appropriate.
310*9880d681SAndroid Build Coastguard Worker
311*9880d681SAndroid Build Coastguard WorkerNow that we have all of our simple expression-parsing logic in place, we
312*9880d681SAndroid Build Coastguard Workercan define a helper function to wrap it together into one entry point.
313*9880d681SAndroid Build Coastguard WorkerWe call this class of expressions "primary" expressions, for reasons
314*9880d681SAndroid Build Coastguard Workerthat will become more clear `later in the
315*9880d681SAndroid Build Coastguard Workertutorial <LangImpl6.html#user-defined-unary-operators>`_. In order to parse an arbitrary
316*9880d681SAndroid Build Coastguard Workerprimary expression, we need to determine what sort of expression it is:
317*9880d681SAndroid Build Coastguard Worker
318*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
319*9880d681SAndroid Build Coastguard Worker
320*9880d681SAndroid Build Coastguard Worker    /// primary
321*9880d681SAndroid Build Coastguard Worker    ///   ::= identifierexpr
322*9880d681SAndroid Build Coastguard Worker    ///   ::= numberexpr
323*9880d681SAndroid Build Coastguard Worker    ///   ::= parenexpr
324*9880d681SAndroid Build Coastguard Worker    static std::unique_ptr<ExprAST> ParsePrimary() {
325*9880d681SAndroid Build Coastguard Worker      switch (CurTok) {
326*9880d681SAndroid Build Coastguard Worker      default:
327*9880d681SAndroid Build Coastguard Worker        return LogError("unknown token when expecting an expression");
328*9880d681SAndroid Build Coastguard Worker      case tok_identifier:
329*9880d681SAndroid Build Coastguard Worker        return ParseIdentifierExpr();
330*9880d681SAndroid Build Coastguard Worker      case tok_number:
331*9880d681SAndroid Build Coastguard Worker        return ParseNumberExpr();
332*9880d681SAndroid Build Coastguard Worker      case '(':
333*9880d681SAndroid Build Coastguard Worker        return ParseParenExpr();
334*9880d681SAndroid Build Coastguard Worker      }
335*9880d681SAndroid Build Coastguard Worker    }
336*9880d681SAndroid Build Coastguard Worker
337*9880d681SAndroid Build Coastguard WorkerNow that you see the definition of this function, it is more obvious why
338*9880d681SAndroid Build Coastguard Workerwe can assume the state of CurTok in the various functions. This uses
339*9880d681SAndroid Build Coastguard Workerlook-ahead to determine which sort of expression is being inspected, and
340*9880d681SAndroid Build Coastguard Workerthen parses it with a function call.
341*9880d681SAndroid Build Coastguard Worker
342*9880d681SAndroid Build Coastguard WorkerNow that basic expressions are handled, we need to handle binary
343*9880d681SAndroid Build Coastguard Workerexpressions. They are a bit more complex.
344*9880d681SAndroid Build Coastguard Worker
345*9880d681SAndroid Build Coastguard WorkerBinary Expression Parsing
346*9880d681SAndroid Build Coastguard Worker=========================
347*9880d681SAndroid Build Coastguard Worker
348*9880d681SAndroid Build Coastguard WorkerBinary expressions are significantly harder to parse because they are
349*9880d681SAndroid Build Coastguard Workeroften ambiguous. For example, when given the string "x+y\*z", the parser
350*9880d681SAndroid Build Coastguard Workercan choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common
351*9880d681SAndroid Build Coastguard Workerdefinitions from mathematics, we expect the later parse, because "\*"
352*9880d681SAndroid Build Coastguard Worker(multiplication) has higher *precedence* than "+" (addition).
353*9880d681SAndroid Build Coastguard Worker
354*9880d681SAndroid Build Coastguard WorkerThere are many ways to handle this, but an elegant and efficient way is
355*9880d681SAndroid Build Coastguard Workerto use `Operator-Precedence
356*9880d681SAndroid Build Coastguard WorkerParsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_.
357*9880d681SAndroid Build Coastguard WorkerThis parsing technique uses the precedence of binary operators to guide
358*9880d681SAndroid Build Coastguard Workerrecursion. To start with, we need a table of precedences:
359*9880d681SAndroid Build Coastguard Worker
360*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
361*9880d681SAndroid Build Coastguard Worker
362*9880d681SAndroid Build Coastguard Worker    /// BinopPrecedence - This holds the precedence for each binary operator that is
363*9880d681SAndroid Build Coastguard Worker    /// defined.
364*9880d681SAndroid Build Coastguard Worker    static std::map<char, int> BinopPrecedence;
365*9880d681SAndroid Build Coastguard Worker
366*9880d681SAndroid Build Coastguard Worker    /// GetTokPrecedence - Get the precedence of the pending binary operator token.
367*9880d681SAndroid Build Coastguard Worker    static int GetTokPrecedence() {
368*9880d681SAndroid Build Coastguard Worker      if (!isascii(CurTok))
369*9880d681SAndroid Build Coastguard Worker        return -1;
370*9880d681SAndroid Build Coastguard Worker
371*9880d681SAndroid Build Coastguard Worker      // Make sure it's a declared binop.
372*9880d681SAndroid Build Coastguard Worker      int TokPrec = BinopPrecedence[CurTok];
373*9880d681SAndroid Build Coastguard Worker      if (TokPrec <= 0) return -1;
374*9880d681SAndroid Build Coastguard Worker      return TokPrec;
375*9880d681SAndroid Build Coastguard Worker    }
376*9880d681SAndroid Build Coastguard Worker
377*9880d681SAndroid Build Coastguard Worker    int main() {
378*9880d681SAndroid Build Coastguard Worker      // Install standard binary operators.
379*9880d681SAndroid Build Coastguard Worker      // 1 is lowest precedence.
380*9880d681SAndroid Build Coastguard Worker      BinopPrecedence['<'] = 10;
381*9880d681SAndroid Build Coastguard Worker      BinopPrecedence['+'] = 20;
382*9880d681SAndroid Build Coastguard Worker      BinopPrecedence['-'] = 20;
383*9880d681SAndroid Build Coastguard Worker      BinopPrecedence['*'] = 40;  // highest.
384*9880d681SAndroid Build Coastguard Worker      ...
385*9880d681SAndroid Build Coastguard Worker    }
386*9880d681SAndroid Build Coastguard Worker
387*9880d681SAndroid Build Coastguard WorkerFor the basic form of Kaleidoscope, we will only support 4 binary
388*9880d681SAndroid Build Coastguard Workeroperators (this can obviously be extended by you, our brave and intrepid
389*9880d681SAndroid Build Coastguard Workerreader). The ``GetTokPrecedence`` function returns the precedence for
390*9880d681SAndroid Build Coastguard Workerthe current token, or -1 if the token is not a binary operator. Having a
391*9880d681SAndroid Build Coastguard Workermap makes it easy to add new operators and makes it clear that the
392*9880d681SAndroid Build Coastguard Workeralgorithm doesn't depend on the specific operators involved, but it
393*9880d681SAndroid Build Coastguard Workerwould be easy enough to eliminate the map and do the comparisons in the
394*9880d681SAndroid Build Coastguard Worker``GetTokPrecedence`` function. (Or just use a fixed-size array).
395*9880d681SAndroid Build Coastguard Worker
396*9880d681SAndroid Build Coastguard WorkerWith the helper above defined, we can now start parsing binary
397*9880d681SAndroid Build Coastguard Workerexpressions. The basic idea of operator precedence parsing is to break
398*9880d681SAndroid Build Coastguard Workerdown an expression with potentially ambiguous binary operators into
399*9880d681SAndroid Build Coastguard Workerpieces. Consider, for example, the expression "a+b+(c+d)\*e\*f+g".
400*9880d681SAndroid Build Coastguard WorkerOperator precedence parsing considers this as a stream of primary
401*9880d681SAndroid Build Coastguard Workerexpressions separated by binary operators. As such, it will first parse
402*9880d681SAndroid Build Coastguard Workerthe leading primary expression "a", then it will see the pairs [+, b]
403*9880d681SAndroid Build Coastguard Worker[+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are
404*9880d681SAndroid Build Coastguard Workerprimary expressions, the binary expression parser doesn't need to worry
405*9880d681SAndroid Build Coastguard Workerabout nested subexpressions like (c+d) at all.
406*9880d681SAndroid Build Coastguard Worker
407*9880d681SAndroid Build Coastguard WorkerTo start, an expression is a primary expression potentially followed by
408*9880d681SAndroid Build Coastguard Workera sequence of [binop,primaryexpr] pairs:
409*9880d681SAndroid Build Coastguard Worker
410*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
411*9880d681SAndroid Build Coastguard Worker
412*9880d681SAndroid Build Coastguard Worker    /// expression
413*9880d681SAndroid Build Coastguard Worker    ///   ::= primary binoprhs
414*9880d681SAndroid Build Coastguard Worker    ///
415*9880d681SAndroid Build Coastguard Worker    static std::unique_ptr<ExprAST> ParseExpression() {
416*9880d681SAndroid Build Coastguard Worker      auto LHS = ParsePrimary();
417*9880d681SAndroid Build Coastguard Worker      if (!LHS)
418*9880d681SAndroid Build Coastguard Worker        return nullptr;
419*9880d681SAndroid Build Coastguard Worker
420*9880d681SAndroid Build Coastguard Worker      return ParseBinOpRHS(0, std::move(LHS));
421*9880d681SAndroid Build Coastguard Worker    }
422*9880d681SAndroid Build Coastguard Worker
423*9880d681SAndroid Build Coastguard Worker``ParseBinOpRHS`` is the function that parses the sequence of pairs for
424*9880d681SAndroid Build Coastguard Workerus. It takes a precedence and a pointer to an expression for the part
425*9880d681SAndroid Build Coastguard Workerthat has been parsed so far. Note that "x" is a perfectly valid
426*9880d681SAndroid Build Coastguard Workerexpression: As such, "binoprhs" is allowed to be empty, in which case it
427*9880d681SAndroid Build Coastguard Workerreturns the expression that is passed into it. In our example above, the
428*9880d681SAndroid Build Coastguard Workercode passes the expression for "a" into ``ParseBinOpRHS`` and the
429*9880d681SAndroid Build Coastguard Workercurrent token is "+".
430*9880d681SAndroid Build Coastguard Worker
431*9880d681SAndroid Build Coastguard WorkerThe precedence value passed into ``ParseBinOpRHS`` indicates the
432*9880d681SAndroid Build Coastguard Worker*minimal operator precedence* that the function is allowed to eat. For
433*9880d681SAndroid Build Coastguard Workerexample, if the current pair stream is [+, x] and ``ParseBinOpRHS`` is
434*9880d681SAndroid Build Coastguard Workerpassed in a precedence of 40, it will not consume any tokens (because
435*9880d681SAndroid Build Coastguard Workerthe precedence of '+' is only 20). With this in mind, ``ParseBinOpRHS``
436*9880d681SAndroid Build Coastguard Workerstarts with:
437*9880d681SAndroid Build Coastguard Worker
438*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
439*9880d681SAndroid Build Coastguard Worker
440*9880d681SAndroid Build Coastguard Worker    /// binoprhs
441*9880d681SAndroid Build Coastguard Worker    ///   ::= ('+' primary)*
442*9880d681SAndroid Build Coastguard Worker    static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,
443*9880d681SAndroid Build Coastguard Worker                                                  std::unique_ptr<ExprAST> LHS) {
444*9880d681SAndroid Build Coastguard Worker      // If this is a binop, find its precedence.
445*9880d681SAndroid Build Coastguard Worker      while (1) {
446*9880d681SAndroid Build Coastguard Worker        int TokPrec = GetTokPrecedence();
447*9880d681SAndroid Build Coastguard Worker
448*9880d681SAndroid Build Coastguard Worker        // If this is a binop that binds at least as tightly as the current binop,
449*9880d681SAndroid Build Coastguard Worker        // consume it, otherwise we are done.
450*9880d681SAndroid Build Coastguard Worker        if (TokPrec < ExprPrec)
451*9880d681SAndroid Build Coastguard Worker          return LHS;
452*9880d681SAndroid Build Coastguard Worker
453*9880d681SAndroid Build Coastguard WorkerThis code gets the precedence of the current token and checks to see if
454*9880d681SAndroid Build Coastguard Workerif is too low. Because we defined invalid tokens to have a precedence of
455*9880d681SAndroid Build Coastguard Worker-1, this check implicitly knows that the pair-stream ends when the token
456*9880d681SAndroid Build Coastguard Workerstream runs out of binary operators. If this check succeeds, we know
457*9880d681SAndroid Build Coastguard Workerthat the token is a binary operator and that it will be included in this
458*9880d681SAndroid Build Coastguard Workerexpression:
459*9880d681SAndroid Build Coastguard Worker
460*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
461*9880d681SAndroid Build Coastguard Worker
462*9880d681SAndroid Build Coastguard Worker        // Okay, we know this is a binop.
463*9880d681SAndroid Build Coastguard Worker        int BinOp = CurTok;
464*9880d681SAndroid Build Coastguard Worker        getNextToken();  // eat binop
465*9880d681SAndroid Build Coastguard Worker
466*9880d681SAndroid Build Coastguard Worker        // Parse the primary expression after the binary operator.
467*9880d681SAndroid Build Coastguard Worker        auto RHS = ParsePrimary();
468*9880d681SAndroid Build Coastguard Worker        if (!RHS)
469*9880d681SAndroid Build Coastguard Worker          return nullptr;
470*9880d681SAndroid Build Coastguard Worker
471*9880d681SAndroid Build Coastguard WorkerAs such, this code eats (and remembers) the binary operator and then
472*9880d681SAndroid Build Coastguard Workerparses the primary expression that follows. This builds up the whole
473*9880d681SAndroid Build Coastguard Workerpair, the first of which is [+, b] for the running example.
474*9880d681SAndroid Build Coastguard Worker
475*9880d681SAndroid Build Coastguard WorkerNow that we parsed the left-hand side of an expression and one pair of
476*9880d681SAndroid Build Coastguard Workerthe RHS sequence, we have to decide which way the expression associates.
477*9880d681SAndroid Build Coastguard WorkerIn particular, we could have "(a+b) binop unparsed" or "a + (b binop
478*9880d681SAndroid Build Coastguard Workerunparsed)". To determine this, we look ahead at "binop" to determine its
479*9880d681SAndroid Build Coastguard Workerprecedence and compare it to BinOp's precedence (which is '+' in this
480*9880d681SAndroid Build Coastguard Workercase):
481*9880d681SAndroid Build Coastguard Worker
482*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
483*9880d681SAndroid Build Coastguard Worker
484*9880d681SAndroid Build Coastguard Worker        // If BinOp binds less tightly with RHS than the operator after RHS, let
485*9880d681SAndroid Build Coastguard Worker        // the pending operator take RHS as its LHS.
486*9880d681SAndroid Build Coastguard Worker        int NextPrec = GetTokPrecedence();
487*9880d681SAndroid Build Coastguard Worker        if (TokPrec < NextPrec) {
488*9880d681SAndroid Build Coastguard Worker
489*9880d681SAndroid Build Coastguard WorkerIf the precedence of the binop to the right of "RHS" is lower or equal
490*9880d681SAndroid Build Coastguard Workerto the precedence of our current operator, then we know that the
491*9880d681SAndroid Build Coastguard Workerparentheses associate as "(a+b) binop ...". In our example, the current
492*9880d681SAndroid Build Coastguard Workeroperator is "+" and the next operator is "+", we know that they have the
493*9880d681SAndroid Build Coastguard Workersame precedence. In this case we'll create the AST node for "a+b", and
494*9880d681SAndroid Build Coastguard Workerthen continue parsing:
495*9880d681SAndroid Build Coastguard Worker
496*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
497*9880d681SAndroid Build Coastguard Worker
498*9880d681SAndroid Build Coastguard Worker          ... if body omitted ...
499*9880d681SAndroid Build Coastguard Worker        }
500*9880d681SAndroid Build Coastguard Worker
501*9880d681SAndroid Build Coastguard Worker        // Merge LHS/RHS.
502*9880d681SAndroid Build Coastguard Worker        LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS),
503*9880d681SAndroid Build Coastguard Worker                                               std::move(RHS));
504*9880d681SAndroid Build Coastguard Worker      }  // loop around to the top of the while loop.
505*9880d681SAndroid Build Coastguard Worker    }
506*9880d681SAndroid Build Coastguard Worker
507*9880d681SAndroid Build Coastguard WorkerIn our example above, this will turn "a+b+" into "(a+b)" and execute the
508*9880d681SAndroid Build Coastguard Workernext iteration of the loop, with "+" as the current token. The code
509*9880d681SAndroid Build Coastguard Workerabove will eat, remember, and parse "(c+d)" as the primary expression,
510*9880d681SAndroid Build Coastguard Workerwhich makes the current pair equal to [+, (c+d)]. It will then evaluate
511*9880d681SAndroid Build Coastguard Workerthe 'if' conditional above with "\*" as the binop to the right of the
512*9880d681SAndroid Build Coastguard Workerprimary. In this case, the precedence of "\*" is higher than the
513*9880d681SAndroid Build Coastguard Workerprecedence of "+" so the if condition will be entered.
514*9880d681SAndroid Build Coastguard Worker
515*9880d681SAndroid Build Coastguard WorkerThe critical question left here is "how can the if condition parse the
516*9880d681SAndroid Build Coastguard Workerright hand side in full"? In particular, to build the AST correctly for
517*9880d681SAndroid Build Coastguard Workerour example, it needs to get all of "(c+d)\*e\*f" as the RHS expression
518*9880d681SAndroid Build Coastguard Workervariable. The code to do this is surprisingly simple (code from the
519*9880d681SAndroid Build Coastguard Workerabove two blocks duplicated for context):
520*9880d681SAndroid Build Coastguard Worker
521*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
522*9880d681SAndroid Build Coastguard Worker
523*9880d681SAndroid Build Coastguard Worker        // If BinOp binds less tightly with RHS than the operator after RHS, let
524*9880d681SAndroid Build Coastguard Worker        // the pending operator take RHS as its LHS.
525*9880d681SAndroid Build Coastguard Worker        int NextPrec = GetTokPrecedence();
526*9880d681SAndroid Build Coastguard Worker        if (TokPrec < NextPrec) {
527*9880d681SAndroid Build Coastguard Worker          RHS = ParseBinOpRHS(TokPrec+1, std::move(RHS));
528*9880d681SAndroid Build Coastguard Worker          if (!RHS)
529*9880d681SAndroid Build Coastguard Worker            return nullptr;
530*9880d681SAndroid Build Coastguard Worker        }
531*9880d681SAndroid Build Coastguard Worker        // Merge LHS/RHS.
532*9880d681SAndroid Build Coastguard Worker        LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS),
533*9880d681SAndroid Build Coastguard Worker                                               std::move(RHS));
534*9880d681SAndroid Build Coastguard Worker      }  // loop around to the top of the while loop.
535*9880d681SAndroid Build Coastguard Worker    }
536*9880d681SAndroid Build Coastguard Worker
537*9880d681SAndroid Build Coastguard WorkerAt this point, we know that the binary operator to the RHS of our
538*9880d681SAndroid Build Coastguard Workerprimary has higher precedence than the binop we are currently parsing.
539*9880d681SAndroid Build Coastguard WorkerAs such, we know that any sequence of pairs whose operators are all
540*9880d681SAndroid Build Coastguard Workerhigher precedence than "+" should be parsed together and returned as
541*9880d681SAndroid Build Coastguard Worker"RHS". To do this, we recursively invoke the ``ParseBinOpRHS`` function
542*9880d681SAndroid Build Coastguard Workerspecifying "TokPrec+1" as the minimum precedence required for it to
543*9880d681SAndroid Build Coastguard Workercontinue. In our example above, this will cause it to return the AST
544*9880d681SAndroid Build Coastguard Workernode for "(c+d)\*e\*f" as RHS, which is then set as the RHS of the '+'
545*9880d681SAndroid Build Coastguard Workerexpression.
546*9880d681SAndroid Build Coastguard Worker
547*9880d681SAndroid Build Coastguard WorkerFinally, on the next iteration of the while loop, the "+g" piece is
548*9880d681SAndroid Build Coastguard Workerparsed and added to the AST. With this little bit of code (14
549*9880d681SAndroid Build Coastguard Workernon-trivial lines), we correctly handle fully general binary expression
550*9880d681SAndroid Build Coastguard Workerparsing in a very elegant way. This was a whirlwind tour of this code,
551*9880d681SAndroid Build Coastguard Workerand it is somewhat subtle. I recommend running through it with a few
552*9880d681SAndroid Build Coastguard Workertough examples to see how it works.
553*9880d681SAndroid Build Coastguard Worker
554*9880d681SAndroid Build Coastguard WorkerThis wraps up handling of expressions. At this point, we can point the
555*9880d681SAndroid Build Coastguard Workerparser at an arbitrary token stream and build an expression from it,
556*9880d681SAndroid Build Coastguard Workerstopping at the first token that is not part of the expression. Next up
557*9880d681SAndroid Build Coastguard Workerwe need to handle function definitions, etc.
558*9880d681SAndroid Build Coastguard Worker
559*9880d681SAndroid Build Coastguard WorkerParsing the Rest
560*9880d681SAndroid Build Coastguard Worker================
561*9880d681SAndroid Build Coastguard Worker
562*9880d681SAndroid Build Coastguard WorkerThe next thing missing is handling of function prototypes. In
563*9880d681SAndroid Build Coastguard WorkerKaleidoscope, these are used both for 'extern' function declarations as
564*9880d681SAndroid Build Coastguard Workerwell as function body definitions. The code to do this is
565*9880d681SAndroid Build Coastguard Workerstraight-forward and not very interesting (once you've survived
566*9880d681SAndroid Build Coastguard Workerexpressions):
567*9880d681SAndroid Build Coastguard Worker
568*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
569*9880d681SAndroid Build Coastguard Worker
570*9880d681SAndroid Build Coastguard Worker    /// prototype
571*9880d681SAndroid Build Coastguard Worker    ///   ::= id '(' id* ')'
572*9880d681SAndroid Build Coastguard Worker    static std::unique_ptr<PrototypeAST> ParsePrototype() {
573*9880d681SAndroid Build Coastguard Worker      if (CurTok != tok_identifier)
574*9880d681SAndroid Build Coastguard Worker        return LogErrorP("Expected function name in prototype");
575*9880d681SAndroid Build Coastguard Worker
576*9880d681SAndroid Build Coastguard Worker      std::string FnName = IdentifierStr;
577*9880d681SAndroid Build Coastguard Worker      getNextToken();
578*9880d681SAndroid Build Coastguard Worker
579*9880d681SAndroid Build Coastguard Worker      if (CurTok != '(')
580*9880d681SAndroid Build Coastguard Worker        return LogErrorP("Expected '(' in prototype");
581*9880d681SAndroid Build Coastguard Worker
582*9880d681SAndroid Build Coastguard Worker      // Read the list of argument names.
583*9880d681SAndroid Build Coastguard Worker      std::vector<std::string> ArgNames;
584*9880d681SAndroid Build Coastguard Worker      while (getNextToken() == tok_identifier)
585*9880d681SAndroid Build Coastguard Worker        ArgNames.push_back(IdentifierStr);
586*9880d681SAndroid Build Coastguard Worker      if (CurTok != ')')
587*9880d681SAndroid Build Coastguard Worker        return LogErrorP("Expected ')' in prototype");
588*9880d681SAndroid Build Coastguard Worker
589*9880d681SAndroid Build Coastguard Worker      // success.
590*9880d681SAndroid Build Coastguard Worker      getNextToken();  // eat ')'.
591*9880d681SAndroid Build Coastguard Worker
592*9880d681SAndroid Build Coastguard Worker      return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames));
593*9880d681SAndroid Build Coastguard Worker    }
594*9880d681SAndroid Build Coastguard Worker
595*9880d681SAndroid Build Coastguard WorkerGiven this, a function definition is very simple, just a prototype plus
596*9880d681SAndroid Build Coastguard Workeran expression to implement the body:
597*9880d681SAndroid Build Coastguard Worker
598*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
599*9880d681SAndroid Build Coastguard Worker
600*9880d681SAndroid Build Coastguard Worker    /// definition ::= 'def' prototype expression
601*9880d681SAndroid Build Coastguard Worker    static std::unique_ptr<FunctionAST> ParseDefinition() {
602*9880d681SAndroid Build Coastguard Worker      getNextToken();  // eat def.
603*9880d681SAndroid Build Coastguard Worker      auto Proto = ParsePrototype();
604*9880d681SAndroid Build Coastguard Worker      if (!Proto) return nullptr;
605*9880d681SAndroid Build Coastguard Worker
606*9880d681SAndroid Build Coastguard Worker      if (auto E = ParseExpression())
607*9880d681SAndroid Build Coastguard Worker        return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));
608*9880d681SAndroid Build Coastguard Worker      return nullptr;
609*9880d681SAndroid Build Coastguard Worker    }
610*9880d681SAndroid Build Coastguard Worker
611*9880d681SAndroid Build Coastguard WorkerIn addition, we support 'extern' to declare functions like 'sin' and
612*9880d681SAndroid Build Coastguard Worker'cos' as well as to support forward declaration of user functions. These
613*9880d681SAndroid Build Coastguard Worker'extern's are just prototypes with no body:
614*9880d681SAndroid Build Coastguard Worker
615*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
616*9880d681SAndroid Build Coastguard Worker
617*9880d681SAndroid Build Coastguard Worker    /// external ::= 'extern' prototype
618*9880d681SAndroid Build Coastguard Worker    static std::unique_ptr<PrototypeAST> ParseExtern() {
619*9880d681SAndroid Build Coastguard Worker      getNextToken();  // eat extern.
620*9880d681SAndroid Build Coastguard Worker      return ParsePrototype();
621*9880d681SAndroid Build Coastguard Worker    }
622*9880d681SAndroid Build Coastguard Worker
623*9880d681SAndroid Build Coastguard WorkerFinally, we'll also let the user type in arbitrary top-level expressions
624*9880d681SAndroid Build Coastguard Workerand evaluate them on the fly. We will handle this by defining anonymous
625*9880d681SAndroid Build Coastguard Workernullary (zero argument) functions for them:
626*9880d681SAndroid Build Coastguard Worker
627*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
628*9880d681SAndroid Build Coastguard Worker
629*9880d681SAndroid Build Coastguard Worker    /// toplevelexpr ::= expression
630*9880d681SAndroid Build Coastguard Worker    static std::unique_ptr<FunctionAST> ParseTopLevelExpr() {
631*9880d681SAndroid Build Coastguard Worker      if (auto E = ParseExpression()) {
632*9880d681SAndroid Build Coastguard Worker        // Make an anonymous proto.
633*9880d681SAndroid Build Coastguard Worker        auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>());
634*9880d681SAndroid Build Coastguard Worker        return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));
635*9880d681SAndroid Build Coastguard Worker      }
636*9880d681SAndroid Build Coastguard Worker      return nullptr;
637*9880d681SAndroid Build Coastguard Worker    }
638*9880d681SAndroid Build Coastguard Worker
639*9880d681SAndroid Build Coastguard WorkerNow that we have all the pieces, let's build a little driver that will
640*9880d681SAndroid Build Coastguard Workerlet us actually *execute* this code we've built!
641*9880d681SAndroid Build Coastguard Worker
642*9880d681SAndroid Build Coastguard WorkerThe Driver
643*9880d681SAndroid Build Coastguard Worker==========
644*9880d681SAndroid Build Coastguard Worker
645*9880d681SAndroid Build Coastguard WorkerThe driver for this simply invokes all of the parsing pieces with a
646*9880d681SAndroid Build Coastguard Workertop-level dispatch loop. There isn't much interesting here, so I'll just
647*9880d681SAndroid Build Coastguard Workerinclude the top-level loop. See `below <#full-code-listing>`_ for full code in the
648*9880d681SAndroid Build Coastguard Worker"Top-Level Parsing" section.
649*9880d681SAndroid Build Coastguard Worker
650*9880d681SAndroid Build Coastguard Worker.. code-block:: c++
651*9880d681SAndroid Build Coastguard Worker
652*9880d681SAndroid Build Coastguard Worker    /// top ::= definition | external | expression | ';'
653*9880d681SAndroid Build Coastguard Worker    static void MainLoop() {
654*9880d681SAndroid Build Coastguard Worker      while (1) {
655*9880d681SAndroid Build Coastguard Worker        fprintf(stderr, "ready> ");
656*9880d681SAndroid Build Coastguard Worker        switch (CurTok) {
657*9880d681SAndroid Build Coastguard Worker        case tok_eof:
658*9880d681SAndroid Build Coastguard Worker          return;
659*9880d681SAndroid Build Coastguard Worker        case ';': // ignore top-level semicolons.
660*9880d681SAndroid Build Coastguard Worker          getNextToken();
661*9880d681SAndroid Build Coastguard Worker          break;
662*9880d681SAndroid Build Coastguard Worker        case tok_def:
663*9880d681SAndroid Build Coastguard Worker          HandleDefinition();
664*9880d681SAndroid Build Coastguard Worker          break;
665*9880d681SAndroid Build Coastguard Worker        case tok_extern:
666*9880d681SAndroid Build Coastguard Worker          HandleExtern();
667*9880d681SAndroid Build Coastguard Worker          break;
668*9880d681SAndroid Build Coastguard Worker        default:
669*9880d681SAndroid Build Coastguard Worker          HandleTopLevelExpression();
670*9880d681SAndroid Build Coastguard Worker          break;
671*9880d681SAndroid Build Coastguard Worker        }
672*9880d681SAndroid Build Coastguard Worker      }
673*9880d681SAndroid Build Coastguard Worker    }
674*9880d681SAndroid Build Coastguard Worker
675*9880d681SAndroid Build Coastguard WorkerThe most interesting part of this is that we ignore top-level
676*9880d681SAndroid Build Coastguard Workersemicolons. Why is this, you ask? The basic reason is that if you type
677*9880d681SAndroid Build Coastguard Worker"4 + 5" at the command line, the parser doesn't know whether that is the
678*9880d681SAndroid Build Coastguard Workerend of what you will type or not. For example, on the next line you
679*9880d681SAndroid Build Coastguard Workercould type "def foo..." in which case 4+5 is the end of a top-level
680*9880d681SAndroid Build Coastguard Workerexpression. Alternatively you could type "\* 6", which would continue
681*9880d681SAndroid Build Coastguard Workerthe expression. Having top-level semicolons allows you to type "4+5;",
682*9880d681SAndroid Build Coastguard Workerand the parser will know you are done.
683*9880d681SAndroid Build Coastguard Worker
684*9880d681SAndroid Build Coastguard WorkerConclusions
685*9880d681SAndroid Build Coastguard Worker===========
686*9880d681SAndroid Build Coastguard Worker
687*9880d681SAndroid Build Coastguard WorkerWith just under 400 lines of commented code (240 lines of non-comment,
688*9880d681SAndroid Build Coastguard Workernon-blank code), we fully defined our minimal language, including a
689*9880d681SAndroid Build Coastguard Workerlexer, parser, and AST builder. With this done, the executable will
690*9880d681SAndroid Build Coastguard Workervalidate Kaleidoscope code and tell us if it is grammatically invalid.
691*9880d681SAndroid Build Coastguard WorkerFor example, here is a sample interaction:
692*9880d681SAndroid Build Coastguard Worker
693*9880d681SAndroid Build Coastguard Worker.. code-block:: bash
694*9880d681SAndroid Build Coastguard Worker
695*9880d681SAndroid Build Coastguard Worker    $ ./a.out
696*9880d681SAndroid Build Coastguard Worker    ready> def foo(x y) x+foo(y, 4.0);
697*9880d681SAndroid Build Coastguard Worker    Parsed a function definition.
698*9880d681SAndroid Build Coastguard Worker    ready> def foo(x y) x+y y;
699*9880d681SAndroid Build Coastguard Worker    Parsed a function definition.
700*9880d681SAndroid Build Coastguard Worker    Parsed a top-level expr
701*9880d681SAndroid Build Coastguard Worker    ready> def foo(x y) x+y );
702*9880d681SAndroid Build Coastguard Worker    Parsed a function definition.
703*9880d681SAndroid Build Coastguard Worker    Error: unknown token when expecting an expression
704*9880d681SAndroid Build Coastguard Worker    ready> extern sin(a);
705*9880d681SAndroid Build Coastguard Worker    ready> Parsed an extern
706*9880d681SAndroid Build Coastguard Worker    ready> ^D
707*9880d681SAndroid Build Coastguard Worker    $
708*9880d681SAndroid Build Coastguard Worker
709*9880d681SAndroid Build Coastguard WorkerThere is a lot of room for extension here. You can define new AST nodes,
710*9880d681SAndroid Build Coastguard Workerextend the language in many ways, etc. In the `next
711*9880d681SAndroid Build Coastguard Workerinstallment <LangImpl3.html>`_, we will describe how to generate LLVM
712*9880d681SAndroid Build Coastguard WorkerIntermediate Representation (IR) from the AST.
713*9880d681SAndroid Build Coastguard Worker
714*9880d681SAndroid Build Coastguard WorkerFull Code Listing
715*9880d681SAndroid Build Coastguard Worker=================
716*9880d681SAndroid Build Coastguard Worker
717*9880d681SAndroid Build Coastguard WorkerHere is the complete code listing for this and the previous chapter.
718*9880d681SAndroid Build Coastguard WorkerNote that it is fully self-contained: you don't need LLVM or any
719*9880d681SAndroid Build Coastguard Workerexternal libraries at all for this. (Besides the C and C++ standard
720*9880d681SAndroid Build Coastguard Workerlibraries, of course.) To build this, just compile with:
721*9880d681SAndroid Build Coastguard Worker
722*9880d681SAndroid Build Coastguard Worker.. code-block:: bash
723*9880d681SAndroid Build Coastguard Worker
724*9880d681SAndroid Build Coastguard Worker    # Compile
725*9880d681SAndroid Build Coastguard Worker    clang++ -g -O3 toy.cpp
726*9880d681SAndroid Build Coastguard Worker    # Run
727*9880d681SAndroid Build Coastguard Worker    ./a.out
728*9880d681SAndroid Build Coastguard Worker
729*9880d681SAndroid Build Coastguard WorkerHere is the code:
730*9880d681SAndroid Build Coastguard Worker
731*9880d681SAndroid Build Coastguard Worker.. literalinclude:: ../../examples/Kaleidoscope/Chapter2/toy.cpp
732*9880d681SAndroid Build Coastguard Worker   :language: c++
733*9880d681SAndroid Build Coastguard Worker
734*9880d681SAndroid Build Coastguard Worker`Next: Implementing Code Generation to LLVM IR <LangImpl03.html>`_
735*9880d681SAndroid Build Coastguard Worker
736