Sunday, February 19, 2017

CONCEPTS OF PROGRAMMING LANGUAGES – CHAPTER 5 (NAMES, BINDINGS, AND SCOPES)

1.   What are the design issues for names?
=    The design issues for names are: Are names case sensitive? And are the special words of the language reserved words or keywords?
2.   What is the potential danger of case-sensitive names?
=   To some people, the problems are with writability and readability. Readability suffers because names that look very similar can in fact denote different entities. Writability suffers because many predefined names mix uppercase and lowercase letters, so the programmer must remember the exact casing.
3.   In what way are reserved words better than keywords?
=    A reserved word is a special word that cannot be used as a user-defined name. A keyword is a word that is special only in certain contexts. Reserved words are better because allowing keywords to be redefined as user-defined names can make programs confusing and hard to read.
4.   What is an alias?
=    When more than one variable can be used to access the same memory location, the variables are called aliases.
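Aliasing is easy to demonstrate in Python, where assigning one name to another binds both names to the same object (a minimal sketch; the variable names are illustrative):

```python
# Two names bound to the same mutable object are aliases:
# a change made through one name is visible through the other.
a = [10, 20, 30]
b = a            # b now refers to the same list object as a

b[0] = 99        # modify the shared object through the alias b
print(a[0])      # 99: the change is visible through a as well
print(a is b)    # True: both names denote one object
```

This illustrates why aliases hurt readability: to understand what a modifies, the reader must also track every other name for the same storage.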
5.   Which category of C++ reference variables is always aliases?
=    C++ reference variables that refer to other variables are always aliases, because a reference must be initialized with a variable and thereafter permanently names that variable's memory cell, giving two names for one cell. (Union members can also produce aliases, since a union's variants share the same storage.)
6.   What is the l-value of a variable? What is the r-value?
=    The l-value of a variable is its address; the r-value of a variable is its value.

7.   Define binding and binding time.
=    A binding is an association, such as between an attribute and an entity, or between an operation and a symbol. Binding time is the time at which binding takes place.
8.   After language design and implementation, what are the four times bindings can take place in a program?
=    Compile time, link time, load time, and run time. Examples of bindings at the possible binding times:
·         Language design time — bind operator symbols to operations
·         Language implementation time — bind floating-point type to a representation
·         Compile time — bind a variable to a type in C or Java
·         Link time — bind a call to a library subprogram to the subprogram's code
·         Load time — bind a C or C++ static variable to a memory cell
·         Runtime — bind a nonstatic local variable to a memory cell
9.   Define static binding and dynamic binding.
=    A binding is static if it first occurs before run time and remains unchanged throughout program execution. A binding is dynamic if it first occurs during execution or can change during execution of the program.

10. What are the advantages and disadvantages of implicit declarations?
·         Advantage: convenient to programmers.
·         Disadvantage: can be detrimental to reliability because they prevent the compilation process from detecting some typographical and programmer errors.
11. What are the advantages and disadvantages of dynamic type binding?
AD: The primary advantage of dynamic binding of variables to types is that it provides more programming flexibility.
DIS: First, it causes programs to be less reliable, because the error-detection capability of the compiler is diminished relative to a compiler for a language with static type bindings. Dynamic type binding allows any variable to be assigned a value of any type. Incorrect types of right sides of assignments are not detected as errors; rather, the type of the left side is simply changed to the incorrect type.
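The reliability disadvantage described above can be seen directly in Python, a dynamically type-bound language. This minimal sketch (variable names are illustrative) shows a typo-style bug that a static type binding would have caught at compile time being accepted silently, with the error surfacing only later, far from its cause:

```python
total = 0          # total is bound to int
total = "oops"     # a bug, but no error: total is simply rebound to str

# The incorrect type is detected only when the value is finally used:
try:
    total = total + 5
except TypeError as e:
    print("error surfaces only at run time:", e)
```

Note that the assignment itself is never flagged; the type of the left side simply changes, exactly as the answer above describes.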
12. Define static, stack-dynamic, explicit heap-dynamic, and implicit heap-dynamic variables. What are their advantages and disadvantages?
=    Static: bound to memory cells before execution begins and remain bound to the same cells throughout execution. Advantages: efficiency (direct addressing, no run-time allocation) and support for history-sensitive subprograms. Disadvantages: no recursion, and storage cannot be shared among variables.
Stack-dynamic: storage bindings are created when their declaration statements are elaborated and destroyed when the declaring unit exits. Advantages: supports recursion and allows subprograms to share stack storage. Disadvantages: run-time allocation/deallocation overhead and slower indirect addressing.
Explicit heap-dynamic: allocated and deallocated by explicit directives specified by the programmer, which take effect during execution. Advantage: convenient for dynamic structures such as linked lists and trees. Disadvantages: the difficulty and unreliability of using pointers or references correctly, and the cost of heap management.
Implicit heap-dynamic: allocation and deallocation are caused by assignment statements. Advantage: maximum flexibility (highly generic code). Disadvantages: the run-time overhead of maintaining dynamic attributes and the loss of some compile-time error detection.
13. Define lifetime, scope, static scope, and dynamic scope.
=    Lifetime: The time during which the variable is bound to a specific memory location. The lifetime begins when it is bound to a specific cell and ends when it is unbound from that cell.
      Scope: The range of statements in which the variable is visible. A variable is visible in a statement if it can be referenced in that statement.
      Static scope: Based on the program text; to connect a name reference to a variable, you (or the compiler) must find the declaration in the innermost enclosing scope.
      Dynamic scope: Based on calling sequences of program units, not their textual layout (temporal versus spatial). References to variables are connected to declarations by searching back through the chain of subprogram calls that forced execution to this point.
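Python itself is statically scoped, so the contrast with dynamic scoping can be made concrete by simulating dynamic scoping with an explicit stack of call environments, searched from the most recent call backward (all names here are illustrative):

```python
# Simulated dynamic scoping: references are resolved by searching
# back through the chain of active subprogram calls.
call_stack = []

def lookup(name):
    for env in reversed(call_stack):   # newest call first
        if name in env:
            return env[name]
    raise NameError(name)

def sub2():
    # Under dynamic scoping, x resolves to whichever active caller
    # most recently declared it.
    return lookup("x")

def sub1():
    call_stack.append({"x": "from sub1"})
    result = sub2()
    call_stack.pop()
    return result

def main():
    call_stack.append({"x": "from main"})
    r1 = sub2()   # x found in main's environment
    r2 = sub1()   # x found in sub1's environment, hiding main's
    call_stack.pop()
    return r1, r2

print(main())   # ('from main', 'from sub1')
```

Under static scoping, both calls to sub2 would resolve x the same way, from the program text; under dynamic scoping the answer depends on the calling sequence, which is exactly why readability suffers.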
14. How is a reference to a nonlocal variable in a static-scoped program connected to its definition?
=    A reference to a nonlocal variable in a static-scoped language with nested subprograms requires a two-step access process:
1. Find the correct activation record instance
2. Determine the correct offset within that activation record instance
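The two steps above can be sketched in Python by modeling activation-record instances with static links, as in a static-chain implementation (the class and variable names are illustrative, not from the text):

```python
# A sketch of nonlocal access: step 1 follows the static chain to the
# declaring unit's activation record instance; step 2 uses the offset
# within that record.
class ActivationRecord:
    def __init__(self, static_link, local_vars):
        self.static_link = static_link   # AR of the static parent unit
        self.locals = local_vars         # local-variable values by offset

def access(ar, chain_length, offset):
    for _ in range(chain_length):        # step 1: find the correct ARI
        ar = ar.static_link
    return ar.locals[offset]             # step 2: use the offset

# main declares x (offset 0); sub is nested inside main and references x.
main_ar = ActivationRecord(None, [42])
sub_ar = ActivationRecord(main_ar, [])

print(access(sub_ar, 1, 0))   # 42: one static link up, offset 0
```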
15. What is the general problem with static scoping?
=    It usually grants too much access: as a program evolves, variables tend to migrate toward broader scopes (often global), and the intended scope structure is destroyed.
16. What is the referencing environment of a statement?
=    Set of all names visible to the statement.
17. What is a static ancestor of a subprogram? What is a dynamic ancestor of a subprogram?
=    The static ancestors of a subprogram sub() are all the procedures in the program within which the procedure sub() is defined, i.e., the definition of the procedure sub() is nested. The definition of a procedure may be directly nested within only one procedure, called its static parent procedure. However, this static parent procedure may itself be nested within another procedure, and so on up to the main() program. All these procedures are considered to be static ancestors of the procedure sub(). Simply put, the static ancestors are those that strictly contain the subprogram in question.
The dynamic ancestors of a subprogram sub() are all the procedures called before sub() during the execution of a program, that have not yet finished executing. These are the procedures that are waiting for procedure sub() to finish executing before they can terminate. Simply put, dynamic ancestors are those that are called to reach the subprogram in question.
18. What is a block?
=    A block is a section of code that can have its own local variables; the storage of such a variable is allocated when the section is entered and deallocated when the section is exited.
19. What is the purpose of the let constructs in functional languages?
=    “let” introduces a new variable scope, and allows you to bind variables to values for that scope. It is often read as “let x be [value] in …” 
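Python has no let construct, but the same idea can be modeled by immediately applying a lambda: the parameter is bound to a value only within the lambda's body, which plays the role of the "in …" part (a sketch, not how any particular functional language implements it):

```python
# "let x = 41 in x + 1" modeled as applying a lambda:
# x is visible only inside the lambda's body (its scope).
result = (lambda x: x + 1)(41)
print(result)   # 42
```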
22. What are the advantages and disadvantages of dynamic scoping?
=    Advantage: convenience.
Disadvantages: can't type-check at compile time, and poor readability, because the variable (and hence the type) a reference names cannot be determined statically.
23. What are the advantages of named constants?
=    The advantages are readability and modifiability.


Chapter 4: Lexical and Syntax Analysis

1. What are three reasons why syntax analyzers are based on grammars?
First, BNF descriptions of the syntax of programs are clear and concise. Second, the BNF description can be used as the direct basis for the syntax analyzer. Third, implementations based on BNF are relatively easy to maintain because of their modularity.
2. Explain the three reasons why lexical analysis is separated from syntax analysis.
1. Simplicity (Techniques for lexical analysis are less complex than those required for syntax analysis, so the lexical-analysis process can be simpler if it is separate. Also, removing the low level details of lexical analysis from the syntax analyzer makes the syntax analyzer both smaller and less complex)
2. Efficiency (Although it pays to optimize the lexical analyzer, because lexical analysis requires a significant portion of total compilation time, it is not fruitful to optimize the syntax analyzer. Separation facilitates this selective optimization)
3. Portability (Because the lexical analyzer reads input program files and often includes buffering of that input, it is somewhat platform dependent. However, the syntax analyzer can be platform independent. It is always good to isolate machine-dependent parts of any software system.)


3. Define lexeme and token
A lexeme is a logical grouping of characters that the lexical analyzer collects, and a token is the internal code for a category of these groupings.
4. What are the primary tasks of a lexical analyzer?
The lexical-analysis process includes skipping comments and white space outside lexemes, as they are not relevant to the meaning of the program. The lexical analyzer also inserts lexemes for user-defined names into the symbol table, which is used by later phases of the compiler. Finally, it detects syntactic errors in tokens, such as ill-formed floating-point literals, and reports such errors to the user.
5. Describe briefly the three approaches to building a lexical analyzer.
–       Write a formal description of the token patterns of the language using a descriptive language related to regular expressions, and use it as input to a software tool that generates the lexical analyzer.
–       Design a state transition diagram that describes the token patterns of the language and write a program that implements the diagram.
–       Design a state transition diagram that describes the token patterns of the language and hand-construct a table-driven implementation of the state diagram.
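The second approach can be illustrated with a minimal hand-coded Python lexer that implements a state diagram using the character classes LETTER and DIGIT (via isalpha/isdigit) instead of 52 and 10 separate transitions; the token names are illustrative:

```python
# A minimal state-diagram lexer: skip white space, collect identifier
# and integer-literal lexemes using character classes, and emit
# (token, lexeme) pairs.
def lex(source):
    tokens, i = [], 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                      # white space: not a lexeme
            i += 1
        elif ch.isalpha():                    # LETTER starts an identifier
            start = i
            while i < len(source) and (source[i].isalpha() or source[i].isdigit()):
                i += 1
            tokens.append(("IDENT", source[start:i]))
        elif ch.isdigit():                    # DIGIT starts an integer literal
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("INT_LIT", source[start:i]))
        elif ch in "+-*/()":
            tokens.append(("OP", ch))
            i += 1
        else:
            raise ValueError("illegal character: " + ch)
    return tokens

print(lex("sum2 + 47"))
# [('IDENT', 'sum2'), ('OP', '+'), ('INT_LIT', '47')]
```

A table-driven version (the third approach) would replace the if/elif chain with a transition table indexed by state and character class.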
6. What is a state transition diagram?
A state transition diagram, or just state diagram, is a directed graph. The nodes of a state diagram are labeled with state names. The arcs are labeled with the input characters that cause the transitions among the states. An arc may also include actions the lexical analyzer must perform when the transition is taken.
State diagrams of the form used for lexical analyzers are representations
of a class of mathematical machines called finite automata. Finite automata
can be designed to recognize members of a class of languages called regular
languages. Regular grammars are generative devices for regular languages.
The tokens of a programming language are a regular language, and a lexical
analyzer is a finite automaton.

7. Why are character classes used, rather than individual characters, for the letter and digit transitions of a state diagram for a lexical analyzer?
Observe that there are 52 different characters that can begin a name, which would require 52 transitions from the transition diagram's initial state. However, a lexical analyzer is interested only in determining that it is a name and is not concerned with which specific name it happens to be. Therefore, we define a character class named LETTER for all 52 letters and use a single transition on the first letter of any name.
8. What are the two distinct goals of syntax analysis?
1. The syntax analyzer must check the input program to determine whether it is syntactically correct. When an error is found, the analyzer must produce a diagnostic message and recover (recovery means getting back to a normal state and continuing the analysis of the input program).
2. Produce a complete parse tree, or at least trace the structure of the complete parse tree, for syntactically correct input. The parse tree is used as the basis for translation.
9. Describe the difference between top-down and bottom-up parser.
Top-down: the parse tree is built from the root downward to the leaves.
Bottom-up: the parse tree is built from the leaves upward to the root.

10. Describe the parsing problem  for a Top-Down Parser
The general form of a left sentential form is xAy, where, by our notational conventions, x is a string of terminal symbols, A is a nonterminal, and y is a mixed string. Because x contains only terminals, A is the leftmost nonterminal in the sentential form, so it is the one that must be expanded to get the next sentential form in a leftmost derivation.
Determining the next sentential form is a matter of choosing the correct grammar rule that has A as its LHS. For example, if the current sentential form is xAy and the A-rules are A → bB, A → cBb, and A → a, a top-down parser must choose among these three rules to get the next sentential form, which could be xbBy, xcBby, or xay. This is the parsing decision problem for top-down parsers.

11. Describe the parsing problem for a bottom-up parser
Given a right sentential form, a bottom-up parser must determine which substring is the handle — the substring that must be reduced to the LHS of a rule to produce the previous right sentential form in a rightmost derivation.
12. Explain why compilers use parsing algorithms that work on only a subset of all grammars.
Because parsing algorithms that work for any unambiguous grammar are complex and inefficient: their complexity is O(n³), which is too slow for practical processes such as syntax analysis in a compiler. Computer scientists therefore use faster algorithms that work on only a subset of all possible grammars; these subsets (such as the LL and LR grammars) admit linear-time parsers and are large enough to describe programming languages.
13. Why are named constants used, rather than numbers, for token codes?
Because named constants make the lexical and syntax analyzers more readable and easier to modify.
14. Describe how a recursive-descent parsing subprogram is written for a rule with a single RHS
A recursive-descent subprogram for a rule with a single RHS is relatively simple. For each terminal symbol in the RHS, that terminal symbol is compared with nextToken. If they do not match, it is a syntax error; if they match, the lexical analyzer is called to get the next input token. For each nonterminal in the RHS, the parsing subprogram for that nonterminal is called.
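This pattern can be sketched in Python for a hypothetical single-RHS rule <paren_expr> → ( <id> ), using a simple list of tokens as the input stream (the rule, class, and token names are illustrative):

```python
# Recursive-descent sketch for the single-RHS rule:
#   <paren_expr> -> ( <id> )
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.next_token = tokens[0]

    def lex(self):
        # Get the next input token.
        self.pos += 1
        self.next_token = self.tokens[self.pos] if self.pos < len(self.tokens) else "EOF"

    def match(self, expected):
        # Terminal in the RHS: compare with next_token, then advance.
        if self.next_token != expected:
            raise SyntaxError(f"expected {expected}, found {self.next_token}")
        self.lex()

    def ident(self):
        # Parsing subprogram for the nonterminal <id>.
        self.match("id")

    def paren_expr(self):
        # One action per RHS symbol, in order: ( <id> )
        self.match("(")
        self.ident()     # nonterminal: call its parsing subprogram
        self.match(")")

Parser(["(", "id", ")"]).paren_expr()   # legal input parses silently
```

An illegal input such as ["(", ")"] raises a syntax error at the point where the mismatch is detected.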
15. Explain the two grammar characteristics that prohibit them from being used as the basis for a top-down parser.
First, direct or indirect left recursion: the recursive-descent parsing subprogram for a left-recursive nonterminal A immediately calls itself to parse the first symbol of its RHS, that activation immediately calls itself again, and so forth. This leads nowhere (stack overflow). Second, failure of the pairwise disjointness test: if two RHSs for the same nonterminal can begin with the same terminal, the parser cannot choose between them on the basis of the next token.
16. What is the FIRST set for a given grammar and sentential form?
FIRST(α) = {a | α ⇒* aβ} (if α ⇒* ε, ε is in FIRST(α)),
in which ⇒* means 0 or more derivation steps
17. Describe the pairwise disjointness test.
The pairwise disjointness test checks whether a parsing subprogram can determine which RHS is being parsed on the basis of the next input token: for each nonterminal A with more than one RHS, A → αi and A → αj, it must be true that FIRST(αi) ∩ FIRST(αj) = ∅.
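For a simplified, ε-free grammar whose RHSs all begin with terminals (lowercase letters), FIRST of each RHS is just its first symbol, and the test reduces to checking those symbols for duplicates. A minimal sketch (the grammar representation is illustrative):

```python
# Pairwise disjointness for a simplified grammar: rules maps each
# nonterminal to the list of its RHS strings, each beginning with a
# terminal, so FIRST(RHS) is just the first character.
def pairwise_disjoint(rules):
    for lhs, rhss in rules.items():
        firsts = [rhs[0] for rhs in rhss]
        if len(firsts) != len(set(firsts)):
            return False    # two RHSs for lhs share a FIRST symbol
    return True

# A -> aB | bAb | c passes: FIRST sets {a}, {b}, {c} are disjoint.
print(pairwise_disjoint({"A": ["aB", "bAb", "c"]}))   # True
# A -> aB | aAb fails: both RHSs begin with a.
print(pairwise_disjoint({"A": ["aB", "aAb"]}))        # False
```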
18. What is left factoring?
Left factoring rewrites a grammar whose alternative RHSs share a common prefix so that the prefix appears only once. For example, the rules <variable> → identifier | identifier [<expression>] state that a <variable> is either an identifier or an identifier followed by an expression in brackets (a subscript). These rules clearly do not pass the pairwise disjointness test, because both RHSs begin with the same terminal, identifier. Left factoring replaces them with <variable> → identifier <new> and <new> → ε | [<expression>], which do pass the test.
19.  What is a phrase of a sentential form?

A phrase is a subsequence of a sentential form that is eventually reduced to a single non-terminal

21. What is a simple phrase of a sentential form?
A simple phrase is a phrase that is derived from its root nonterminal in a single derivation step.
21. What is the handle of a sentential form?
a handle of a right-sentential form γ is a production A → β and a position in γ where β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ.
22. What is the mathematical machine on which both top-down and bottom-up parsers are based?
The pushdown automaton (PDA): both top-down and bottom-up parsers are based on PDAs, because a PDA is the machine that recognizes context-free languages.
23. Describe three advantages of LR parsers.
1. They can be built for all programming languages.
2. They can detect syntax errors as early as is possible in a left-to-right scan of the input.
3. The LR class of grammars is a proper superset of the class parsable by LL parsers.
25. Describe the purpose of the ACTION table of an LR parser
The ACTION part specifies what the parser should do, given the state symbol on
top of the parse stack and the next token of input.
26. Describe the purpose of the GOTO table of an LR parser
The rows of the GOTO part of the LR parsing table have state symbols as labels. This part of the table has nonterminals as column labels. The values in the GOTO part of the table indicate which state symbol should be pushed onto the parse stack after a reduction has been completed, which means the handle has been removed from the parse stack and the new nonterminal has been pushed onto the parse stack.
27. Is left recursion a problem for LR parsers?
No. Many left-recursive grammars are LR, but none are LL.


Thursday, February 2, 2017

Chapter 3 Describing Syntax and Semantics

          
•Syntax: the form or structure of the expressions, statements, and program units
•Semantics: the meaning of the expressions, statements, and program units
•Syntax and semantics provide a language's definition
–Users of a language definition:
·         Other language designers
·         Implementers
·         Programmers (the users of the language)
The General Problem of Describing Syntax: Terminology
•A sentence is a string of characters over some alphabet
•A language is a set of sentences
•A lexeme is the lowest level syntactic unit of a language (e.g., *, sum, begin)
•A token is a category of lexemes (e.g., identifier)

Formal Definition of Languages
•Recognizers
–A recognition device reads input strings over the alphabet of the language and decides whether the input strings belong to the language
–Example: the syntax analysis part of a compiler
•Generators
–A device that generates sentences of a language
–One can determine if the syntax of a particular sentence is syntactically correct by comparing it to the structure of the generator

Formal Methods of Describing Syntax
•Formal language-generation mechanisms, usually called grammars, are commonly used to describe the syntax of programming languages.


Attribute Grammars: Static Semantics
•Static semantics has nothing to do with meaning
•Context-free grammars (CFGs) cannot describe all of the syntax of programming languages
•Categories of constructs that are trouble:
–Context-free, but cumbersome (e.g., types of operands in expressions)
–Non-context-free (e.g., variables must be declared before they are used)
•Attribute grammars (AGs) have additions to CFGs to carry some semantic info on parse tree nodes
•Primary value of AGs:
–Static semantics specification
–Compiler design (static semantics checking)
Attribute Grammars: Definition
Def: An attribute grammar is a context-free grammar G = (S, N, T, P) with the following additions:
–For each grammar symbol x there is a set A(x) of attribute values
–Each rule has a set of functions that define certain attributes of the nonterminals in the rule
–Each rule has a (possibly empty) set of predicates to check for attribute consistency
Attribute Grammars: Definition (continued)
•Let X0 → X1 ... Xn be a rule
•Functions of the form S(X0) = f(A(X1), ..., A(Xn)) define synthesized attributes
•Functions of the form I(Xj) = f(A(X0), ..., A(Xn)), for 1 <= j <= n, define inherited attributes
•Initially, there are intrinsic attributes on the leaves
Attribute Grammars: An Example
•Syntax rule:
<proc_def> → procedure <proc_name>[1] <proc_body> end <proc_name>[2];
•Predicate:
<proc_name>[1].string == <proc_name>[2].string
Attribute Grammars: An Example
•Syntax:
<assign> → <var> = <expr>
<expr> → <var> + <var> | <var>
<var> → A | B | C
•actual_type: synthesized for <var> and <expr>
•expected_type: inherited for <expr>
Attribute Grammar (continued)
•Syntax rule:
<expr> → <var>[1] + <var>[2]
•Semantic rule:
<expr>.actual_type ← <var>[1].actual_type
•Predicates:
<var>[1].actual_type == <var>[2].actual_type
<expr>.expected_type == <expr>.actual_type
•Syntax rule:
<var> → id
•Semantic rule:
<var>.actual_type ← lookup(<var>.string)
Operational Semantics (continued)
•Uses of operational semantics:
–Language manuals and textbooks
–Teaching programming languages
•Two different levels of uses of operational semantics:
–Natural operational semantics
–Structural operational semantics
•Evaluation
–Good if used informally (language manuals, etc.)
–Extremely complex if used formally (e.g., VDL)
Denotational Semantics
•Based on recursive function theory
•The most abstract semantics description method
•Originally developed by Scott and Strachey (1970)

Denotational Semantics (continued)
•The process of building a denotational specification for a language:
–Define a mathematical object for each language entity
–Define a function that maps instances of the language entities onto instances of the corresponding mathematical objects
•The meaning of language constructs is defined by only the values of the program's variables
Denotational Semantics: Program State
•The state of a program is the values of all its current variables:
s = {<i1, v1>, <i2, v2>, ..., <in, vn>}
•Let VARMAP be a function that, when given a variable name and a state, returns the current value of the variable:
VARMAP(ij, s) = vj
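The state and VARMAP definitions above translate almost directly into code. A minimal Python sketch, modeling the state as a list of <identifier, value> pairs (the sample identifiers and values are illustrative):

```python
# The program state s as a list of <identifier, value> pairs.
s = [("i1", 13), ("i2", 20), ("i3", 100)]

def VARMAP(name, state):
    # Given a variable name and a state, return the variable's
    # current value (or "undef" if the name is not bound).
    for ident, value in state:
        if ident == name:
            return value
    return "undef"

print(VARMAP("i2", s))   # 20
print(VARMAP("i9", s))   # undef
```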
Evaluation of Denotational Semantics
•Can be used to prove the correctness of programs
•Provides a rigorous way to think about programs
•Can be an aid to language design
•Has been used in compiler generation systems
•Because of its complexity, it is of little use to language users
Axiomatic Semantics
•Based on formal logic (predicate calculus)
•Original purpose: formal program verification
•Axioms or inference rules are defined for each statement type in the language (to allow transformations of logic expressions into more formal logic expressions)
•The logic expressions are called assertions
Axiomatic Semantics (continued)
•An assertion before a statement (a precondition) states the relationships and constraints among variables that are true at that point in execution
•An assertion following a statement is a postcondition
•A weakest precondition is the least restrictive precondition that will guarantee the postcondition
Evaluation of Axiomatic Semantics
•Developing axioms or inference rules for all of the statements in a language is difficult
•It is a good tool for correctness proofs and an excellent framework for reasoning about programs
•Its usefulness in describing the meaning of a programming language is limited for language users and compiler writers
Denotational Semantics vs. Operational Semantics
•In operational semantics, the state changes are defined by coded algorithms
•In denotational semantics, the state changes are defined by rigorous mathematical functions