Sunday, February 19, 2017

Chapter 4: Lexical and Syntax Analysis

1. What are three reasons why syntax analyzers are based on grammars?
First, BNF descriptions of the syntax of programs are clear and concise. Second, the BNF description can be used as the direct basis for the syntax analyzer. Third, implementations based on BNF are relatively easy to maintain because of their modularity.
2. Explain the three reasons why lexical analysis is separated from syntax analysis.
1. Simplicity (Techniques for lexical analysis are less complex than those required for syntax analysis, so the lexical-analysis process can be simpler if it is separate. Also, removing the low level details of lexical analysis from the syntax analyzer makes the syntax analyzer both smaller and less complex)
2. Efficiency (Although it pays to optimize the lexical analyzer, because lexical analysis requires a significant portion of total compilation time, it is not fruitful to optimize the syntax analyzer. Separation facilitates this selective optimization)
3. Portability (Because the lexical analyzer reads input program files and often includes buffering of that input, it is somewhat platform dependent. However, the syntax analyzer can be platform independent. It is always good to isolate machine-dependent parts of any software system.)


3. Define lexeme and token
A lexeme is one of the logical groupings into which the lexical analyzer collects characters, and a token is the internal code for a category of those groupings.
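For instance, a statement like index = 2 * count + 17; (an illustrative example; the token names vary by implementation) would be split into lexemes and tokens as follows:

    Lexeme    Token
    index     IDENT
    =         ASSIGN_OP
    2         INT_LIT
    *         MULT_OP
    count     IDENT
    +         ADD_OP
    17        INT_LIT
    ;         SEMICOLON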
4. What are the primary tasks of a lexical analyzer?
The lexical analysis process includes skipping comments and white space outside lexemes, as they are not relevant to the meaning of the program. The lexical analyzer also inserts lexemes for user-defined names into the symbol table, which is used by later phases of the compiler. Finally, lexical analyzers detect syntactic errors in tokens, such as ill-formed floating-point literals, and report such errors to the user.
5. Describe briefly the three approaches to building a lexical analyzer.
–       Write a formal description of the token patterns of the language using a descriptive language related to regular expressions (a software tool can then generate the lexical analyzer automatically).
–       Design a state transition diagram that describes the token patterns of the language and write a program that implements the diagram.
–       Design a state transition diagram that describes the token patterns of the language and hand-construct a table-driven implementation of the state diagram.
6. What is a state transition diagram?
A state transition diagram, or just state diagram, is a directed graph. The nodes of a state diagram are labeled with state names. The arcs are labeled with the input characters that cause the transitions among the states. An arc may also include actions the lexical analyzer must perform when the transition is taken.
State diagrams of the form used for lexical analyzers are representations of a class of mathematical machines called finite automata. Finite automata can be designed to recognize members of a class of languages called regular languages. Regular grammars are generative devices for regular languages. The tokens of a programming language are a regular language, and a lexical analyzer is a finite automaton.
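To make this concrete, below is a minimal sketch in C of a lexer that walks such a state diagram to recognize names and integer literals. The names getChar, addChar, and lex follow the flavor of textbook examples and the token values are arbitrary; this is an illustrative sketch, not a fixed API.

    /* A minimal state-diagram lexer for names and integer literals. */
    #include <stdio.h>
    #include <ctype.h>

    #define LETTER   0        /* character classes */
    #define DIGIT    1
    #define UNKNOWN 99
    #define IDENT   11        /* token codes (illustrative values) */
    #define INT_LIT 10
    #define EOF_TOK -1

    static int nextChar, charClass, lexLen;
    static char lexeme[100];

    static void getChar(void) {               /* read and classify a character */
        nextChar = getchar();
        if (isalpha(nextChar))      charClass = LETTER;
        else if (isdigit(nextChar)) charClass = DIGIT;
        else                        charClass = UNKNOWN;
    }

    static void addChar(void) {               /* append nextChar to the lexeme */
        if (lexLen < 99) {
            lexeme[lexLen++] = (char)nextChar;
            lexeme[lexLen] = '\0';
        }
    }

    static int lex(void) {                    /* one walk through the diagram */
        lexLen = 0;
        while (isspace(nextChar)) getChar();  /* skip white space */
        if (nextChar == EOF) return EOF_TOK;
        switch (charClass) {
        case LETTER:                          /* name: LETTER (LETTER|DIGIT)* */
            addChar(); getChar();
            while (charClass == LETTER || charClass == DIGIT) {
                addChar(); getChar();
            }
            return IDENT;
        case DIGIT:                           /* integer literal: DIGIT+ */
            addChar(); getChar();
            while (charClass == DIGIT) { addChar(); getChar(); }
            return INT_LIT;
        default:                              /* anything else: one-char lexeme */
            addChar(); getChar();
            return UNKNOWN;
        }
    }

    int main(void) {
        getChar();
        for (int tok = lex(); tok != EOF_TOK; tok = lex())
            printf("token %d, lexeme \"%s\"\n", tok, lexeme);
        return 0;
    }

Each case of the switch corresponds to a state of the diagram, and each call to getChar corresponds to following an arc.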

7. Why are character classes used, rather than individual characters, for the letter and digit transitions of a state diagram for a lexical analyzer?
Observe that there are 52 different characters that can begin a name, which would require 52 transitions from the transition diagram's initial state. However, a lexical analyzer is interested only in determining that it is a name and is not concerned with which specific name it happens to be. Therefore, we define a character class named LETTER for all 52 letters and use a single transition on the first letter of any name.
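A hypothetical transition table makes the saving visible: with the classes LETTER, DIGIT, and OTHER, a table-driven lexer needs three columns rather than one per character. The state names and table contents below are illustrative only.

    /* 3 columns instead of 62 (52 letters + 10 digits) */
    enum { S_START, S_IN_NAME, S_IN_NUM, S_DONE };     /* states  */
    enum { C_LETTER, C_DIGIT, C_OTHER, N_CLASSES };    /* classes */

    static const int delta[3][N_CLASSES] = {
        /*              C_LETTER    C_DIGIT     C_OTHER */
        /* S_START   */ {S_IN_NAME, S_IN_NUM,  S_DONE},
        /* S_IN_NAME */ {S_IN_NAME, S_IN_NAME, S_DONE},
        /* S_IN_NUM  */ {S_DONE,    S_IN_NUM,  S_DONE},
    };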
8. What are the two distinct goals of syntax analysis?
1. The syntax analyzer must check the input program to determine whether it is syntactically correct. When an error is found, the analyzer must produce a diagnostic message and recover (recovery means getting back to a normal state and continuing the analysis of the input program).
2. It must produce a complete parse tree, or at least trace the structure of the complete parse tree, for syntactically correct input. The parse tree is used as the basis for translation.
9. Describe the difference between top-down and bottom-up parser.
Top-down: the parse tree is built from the root downward to the leaves.
Bottom-up: the parse tree is built from the leaves upward to the root.

10. Describe the parsing problem for a top-down parser
The general form of a left sentential form is xAy, where, by our notational conventions, x is a string of terminal symbols, A is a nonterminal, and y is a mixed string. Because x contains only terminals, A is the leftmost nonterminal in the sentential form, so it is the one that must be expanded to get the next sentential form in a leftmost derivation.
Determining the next sentential form is a matter of choosing the correct grammar rule that has A as its LHS. For example, if the current sentential form is xAy and the A-rules are A → bB, A → cBb, and A → a, a top-down parser must choose among these three rules to get the next sentential form, which could be xbBy, xcBby, or xay. This is the parsing decision problem for top-down parsers.

11. Describe the parsing problem for a bottom-up parser
A bottom-up parser must find the handle of each right sentential form: the substring that is the RHS of the rule whose reduction yields the previous right sentential form in the rightmost derivation. In other words, it identifies and processes the lowest-level details of the structure first, working upward toward the overall structure at the highest level.
12. Explain why compilers use parsing algorithms that work on only a subset of all grammars.
Because parsing algorithms that work for any unambiguous grammar are complex and inefficient; in fact, their complexity is O(n³). Algorithms of O(n³) complexity are not usually useful for practical processes such as syntax analysis in a compiler, because they are far too slow. In such situations, computer scientists often search for a faster algorithm, even if it is less general. In the case of parsing, faster (linear-time) algorithms have been found that work for only a subset of the set of all possible grammars, and those subsets are large enough to include the grammars of programming languages.
13. Why are named constants used, rather than numbers, for token codes?
Because named constants make the lexical and syntax analyzers far more readable than bare numbers would, and the token codes can be changed in a single place.
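For instance, a lexer might define its token codes like this; the names echo common textbook examples, and the numeric values are arbitrary:

    /* token codes as named constants (values are illustrative) */
    #define INT_LIT     10
    #define IDENT       11
    #define ASSIGN_OP   20
    #define ADD_OP      21
    #define SUB_OP      22
    #define MULT_OP     23
    #define DIV_OP      24
    #define LEFT_PAREN  25
    #define RIGHT_PAREN 26

A test such as nextToken == ASSIGN_OP says what it means, where nextToken == 20 does not.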
14. Describe how a recursive-descent parsing subprogram is written for a rule with a single RHS
A recursive-descent subprogram for a rule with a single RHS is relatively simple: for each terminal symbol in the RHS, that terminal symbol is compared with nextToken; if they do not match, it is a syntax error, but if they match, the lexical analyzer is called to get the next input token. For each nonterminal in the RHS, the parsing subprogram for that nonterminal is called. For example, the subprogram for <expr> → <term> {(+ | -) <term>} first calls the function that parses terms (term), then continues to call that function as long as it finds ADD_OP or SUB_OP tokens (which it passes over by calling lex).
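A C sketch of what such subprograms might look like; nextToken, lex(), term(), and error() are assumed helpers, the token values are arbitrary, and <assign> is a hypothetical single-RHS rule added for illustration:

    #define IDENT     11        /* illustrative token codes */
    #define ASSIGN_OP 20
    #define ADD_OP    21
    #define SUB_OP    22

    extern int nextToken;       /* assumed to be supplied elsewhere */
    int  lex(void);
    void term(void);
    void error(void);

    /* <expr> -> <term> {(+ | -) <term>} */
    void expr(void) {
        term();                                /* nonterminal: call its parser */
        while (nextToken == ADD_OP || nextToken == SUB_OP) {
            nextToken = lex();                 /* pass over the operator */
            term();
        }
    }

    /* <assign> -> id = <expr>: terminals are compared with nextToken */
    void assign(void) {
        if (nextToken == IDENT)     nextToken = lex(); else error();
        if (nextToken == ASSIGN_OP) nextToken = lex(); else error();
        expr();                                /* nonterminal */
    }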
15. Explain the two grammar characteristics that prohibit them from being used as the basis for a top-down parser.
One is left recursion, direct or indirect. A recursive-descent parser subprogram for a left-recursive nonterminal A immediately calls itself to parse the first symbol in its RHS; that activation of the A subprogram then immediately calls itself again, and again, and so forth. It is easy to see that this leads nowhere (stack overflow), as the sketch below illustrates. The other is failure of the pairwise disjointness test: if two RHSs for the same nonterminal can begin with the same token, the parser cannot choose between them on the basis of the next token (see question 17).
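A sketch of the failure mode, assuming a naive recursive-descent translation of a hypothetical left-recursive rule A → A b | c:

    /* the leftmost symbol of the first RHS is A itself, so the
       subprogram calls itself before consuming any input... */
    void A(void) {
        A();
        /* ...never reached: the recursion overflows the stack */
    }

The usual cure is to rewrite the grammar without left recursion, e.g. A → c A' with A' → b A' | ε, which generates the same language.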
16. What is the FIRST set for a given grammar and sentential form?
FIRST(α) = {a | α =>* aβ} (if α =>* ε, then ε is in FIRST(α)),
in which =>* means 0 or more derivation steps.
17. Describe the pairwise disjointness test.
The pairwise disjointness test checks whether a parsing subprogram can always determine which RHS is being parsed on the basis of the next token of input: for each nonterminal A that has more than one RHS, and for each pair of rules A → αi and A → αj, it must be true that FIRST(αi) ∩ FIRST(αj) = ∅.
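For example, take the illustrative grammar A → aB | bAb | Bb with B → cB | d. Then FIRST(aB) = {a}, FIRST(bAb) = {b}, and FIRST(Bb) = {c, d}; these sets are pairwise disjoint, so the A-rules pass the test. If a rule A → aBb were added, FIRST(aBb) = {a} would intersect FIRST(aB), and a parser seeing the token a could not choose between those two RHSs.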
18. What is left factoring?
Left factoring is a transformation that replaces rules whose RHSs begin with the same symbols with rules that pass the pairwise disjointness test, by factoring the common prefix out into a single rule. For example, the rules <variable> → identifier | identifier [<expression>] state that a <variable> is either an identifier or an identifier followed by an expression in brackets (a subscript). These rules clearly do not pass the pairwise disjointness test, because both RHSs begin with the same terminal, identifier. Left factoring them yields <variable> → identifier <new> and <new> → ε | [<expression>], which do pass the test.
19.  What is a phrase of a sentential form?

A phrase of a sentential form is a substring that is derived, in one or more steps, from a single nonterminal in the parse tree of that sentential form; equivalently, it is a subsequence that is eventually reduced to a single nonterminal.

20. What is a simple phrase of a sentential form?
A simple phrase is a phrase that is derived in a single derivation step from its root nonterminal node.
21. What is the handle of a sentential form?
A handle of a right-sentential form γ is a production A → β and a position in γ where β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ.
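As an illustration, take the usual expression grammar E → E + T | T, T → T * F | F, F → (E) | id, and the right-sentential form E + T * id (reached by the rightmost derivation E => E + T => E + T * F => E + T * id). Its phrases are E + T * id, T * id, and id; its only simple phrase is id, derived in one step by F → id; and its handle is therefore that id, which is reduced to F to recover the previous right-sentential form E + T * F.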
22. What is the mathematical machine on which both top-down and bottom-up parsers are based?
Both top-down and bottom-up parsers are based on the pushdown automaton (PDA), because a PDA is a recognizer for context-free languages.
23. Describe three advantages of LR parsers.
1. They can be built for all programming languages.
2. They can detect syntax errors as soon as it is possible in a left-to-right scan.
3. The LR class of grammars is a proper superset of the class parsable by LL parsers.
25. Describe the purpose of the ACTION table of an LR parser
The ACTION part specifies what the parser should do, given the state symbol on top of the parse stack and the next token of input.
26. Describe the purpose of the GOTO table of an LR parser
The rows of the GOTO part of the LR parsing table have state symbols as labels. This part of the table has nonterminals as column labels. The values in the GOTO part of the table indicate which state symbol should be pushed onto the parse stack after a reduction has been completed, which means the handle has been removed from the parse stack and the new nonterminal has been pushed onto the parse stack.
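Putting the two tables together, the LR driver loop might look like the following C sketch. The table layout and the names ACTION, GO_TO, rhsLen, and lhsIndex are assumptions for illustration, not any particular generator's API:

    #define NSTATES   64            /* illustrative sizes */
    #define NTOKENS   32
    #define NNONTERMS 16
    #define NRULES    32

    typedef enum { SHIFT, REDUCE, ACCEPT, ERROR } Kind;
    typedef struct { Kind kind; int arg; } Action;   /* arg: state or rule */

    extern Action ACTION[NSTATES][NTOKENS];   /* (state, token) -> action  */
    extern int    GO_TO[NSTATES][NNONTERMS];  /* (state, nonterm) -> state */
    extern int    rhsLen[NRULES];             /* length of each rule's RHS */
    extern int    lhsIndex[NRULES];           /* each rule's LHS nonterm   */
    int  lex(void);
    void error(void);

    void parse(void) {
        int stack[1000], top = 0;             /* stack of state symbols */
        stack[top] = 0;                       /* initial state          */
        int tok = lex();
        for (;;) {
            Action a = ACTION[stack[top]][tok];
            if (a.kind == SHIFT) {
                stack[++top] = a.arg;         /* push the new state */
                tok = lex();                  /* advance the input  */
            } else if (a.kind == REDUCE) {
                top -= rhsLen[a.arg];         /* pop the handle's states,  */
                ++top;                        /* then push the GO_TO state */
                stack[top] = GO_TO[stack[top - 1]][lhsIndex[a.arg]];
            } else if (a.kind == ACCEPT) {
                return;                       /* parse succeeded */
            } else {
                error();                      /* syntax error */
                return;
            }
        }
    }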
27. Is left recursion a problem for LR parsers?
No. Many left-recursive grammars are LR, but none are LL.

