6 <p><strong>Compilers</strong> allow us to <strong>analyze</strong> and
7 <strong>manipulate textual</strong> data. There are numerous use cases.
20 <p>A <strong>language</strong> is a way to express or to
21 <strong>formulate</strong> a <strong>solution</strong> to a
22 <strong>problem</strong>. And there are many problems. We read and
24 <strong>understood</strong> by <strong>machines</strong>. This operation is
25 possible thanks to <strong>compilers</strong>.</p>
27 languages</a> is a theory that includes the <strong>automatic
28 analysis</strong> of those languages through tools like
29 <strong>automata</strong> or <strong>grammars</strong>. It is necessary to
36 <p>A <strong>language</strong> is a set of <strong>words</strong>. Each word
37 is a <strong>sequence</strong> of <strong>symbols</strong> belonging to an
38 <strong>alphabet</strong>. A symbol represents the smallest <strong>lexical
39 unit</strong> of a language; it is atomic, and we will call it a
40 <strong>lexeme</strong> (also often mentioned as a token). These sequences of
41 lexemes representing words are built with <strong>rules</strong>. From a word
42 and a root rule, we will try to <strong>derive</strong> its sub-rules. If a
43 derivation exists, then the word is considered <strong>valid</strong>; otherwise
44 it is considered <strong>invalid</strong>. We also speak about word
45 <strong>matching</strong>. For example, if we consider the following
56 <strong>disjunction</strong>, i.e. the “or”, is represented by the
63 <p>We continue to derive until we <strong>eliminate</strong> every rule in
64 order to have only <strong>lexemes</strong>:</p>
82 <p>A set of rules is called a <strong>grammar</strong>. And so, a grammar
83 represents a <strong>language</strong>!</p>
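To make the idea concrete outside of any particular tool, a grammar can be sketched as a set of rules, and a word is valid if a derivation from the root rule exists. The following is a minimal Python sketch with an invented toy grammar (it is not the Hoa API, just an illustration of derivation):

```python
# Toy grammar (invented for this illustration):
#   root -> pair root | pair
#   pair -> 'a' 'b'

def match_pair(word, i):
    """Try to derive the 'pair' rule at position i; return the new position or None."""
    if word[i:i + 2] == "ab":
        return i + 2
    return None

def match_root(word, i=0):
    """Try to derive the 'root' rule: one or more pairs."""
    i = match_pair(word, i)
    if i is None:
        return None
    while True:
        j = match_pair(word, i)
        if j is None:
            return i
        i = j

def is_valid(word):
    """A word is valid if a full derivation from the root rule exists."""
    return match_root(word) == len(word)

print(is_valid("abab"))  # True: a derivation exists
print(is_valid("aba"))   # False: no derivation
```

Here the grammar `root`/`pair` represents exactly the language of words made of one or more `ab` pairs: the set of rules is the language.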
87 <strong>levels</strong>:</p>
89 <li><strong>unrestricted</strong> grammars, matching languages known as
91 <li><strong>context-sensitive</strong> grammars, matching contextual
93 <li><strong>context-free</strong> grammars, matching algebraic languages,
95 <li><strong>regular</strong> grammars, matching regular languages.</li>
102 developers. But regular grammars are not able to match a <strong>pair of
103 symbols</strong> (like parentheses, braces or quotes), while algebraic
126 <strong>analyses</strong>: <strong>lexical</strong> and
127 <strong>syntactic</strong>. A lexical analysis <strong>splits</strong> a word
128 into a <strong>sequence of lexemes</strong>. This sequence will thereafter be
130 <strong>belongs</strong> to the language.</p>
133 sequence and verify that they allow <strong>going forward</strong> in the
134 <strong>derivation</strong> of the grammar's rules.</p>
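These two analyses can be sketched in a few lines of Python on an invented toy language of additions such as <code>1+2+3</code> (lexeme names and rules here are illustrative, not taken from any real grammar):

```python
import re

# Hypothetical lexemes for a toy language of additions.
lexemes = [
    ("number", re.compile(r"\d+")),
    ("plus",   re.compile(r"\+")),
]

def lex(data):
    """Lexical analysis: split the data into a sequence of lexemes."""
    tokens, i = [], 0
    while i < len(data):
        for name, pattern in lexemes:   # the order of the lexemes matters
            m = pattern.match(data, i)
            if m:
                tokens.append((name, m.group()))
                i = m.end()
                break
        else:
            raise ValueError(f"unexpected character at position {i}")
    return tokens

def parse(tokens):
    """Syntactic analysis for the rule: number ('+' number)*"""
    names = [name for name, _ in tokens]
    if not names or names[0] != "number":
        return False
    for expected, name in zip(["plus", "number"] * len(names), names[1:]):
        if name != expected:
            return False
    return len(names) % 2 == 1   # the sequence must end on a number

print(parse(lex("1+2+3")))  # True
print(parse(lex("1+")))     # False
```

The lexer only splits; the parser then checks that the sequence of lexemes allows going forward in the derivation.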
142 <strong>go back</strong> <em>k</em> steps; we also speak about
143 backtracking. Put another way, rules can be <strong>ambiguous</strong>: each time
146 <em>k</em> variable defines the ambiguity <strong>level</strong>. If a grammar
148 <strong>unambiguous</strong>: each time a lexeme is used to derive our rules,
150 that the <em>k</em> variable is <strong>undefined</strong>. The following
175 <p>Writing compilers is a <strong>laborious</strong> task. It is not always
177 why there exist <strong>compilers of compilers</strong>, or in other words,
179 <strong>intermediary</strong> language to write a grammar. The
182 compiler-compilers through a <strong>dedicated</strong> language.</p>
187 <strong>algebraic grammars</strong>. It is written in files having the
191 <p>A grammar is composed of <strong>lexemes</strong> and
192 <strong>rules</strong>. The declaration of a lexeme has the following syntax:
195 <strong>name</strong> of the lexeme, <code><em>value</em></code> represents
196 its <strong>value</strong>, expressed with the
200 represent the names of <strong>namespaces</strong> and are optional (the
205 <p>Namespaces represent disjoint <strong>subsets</strong> of lexemes, used
206 to <strong>ease</strong> analyses. A <code>%skip</code> declaration is similar
208 <strong>skip</strong>, meaning not to consider. A common example of
212 example, which is a slightly <strong>simplified</strong> version of the
215 <strong>complete</strong> grammar is located in the
256 <code>string</code> (it allows us to avoid <strong>confusion</strong> with the
259 <strong>disjunction</strong> of several lexemes and rules. The
260 <strong>constructions</strong> of the PP language are the following:</p>
262 <li><code><em>rule</em>()</code> to <strong>call</strong> a rule,</li>
264 to <strong>declare</strong> a lexeme (namespaces do not appear here),</li>
265 <li><code>|</code> for a <strong>disjunction</strong> (a choice),</li>
268 <strong>optional</strong>,</li>
270 present <strong>1 or more</strong> times,</li>
272 present <strong>0 or more</strong> times,</li>
276 <li><code>#<em>node</em></code> to create a <strong>node</strong>
278 <li><code><em>token</em>[<em>i</em>]</code> to <strong>unify</strong>
308 classical: it starts by <strong>lexically</strong> analyzing the textual data,
310 <strong>order</strong> of the lexemes is essential because the lexical
312 <strong>syntactic</strong> analyzer comes into play in order to
313 <strong>match</strong> our data.</p>
315 <strong>trace</strong>. This trace can be transformed into an AST, which stands for
318 later), which allows us to add new <strong>constraints</strong> which can be
321 is a <strong>helper</strong> to easily read a grammar expressed with the PP
352 <strong>often</strong>. This is why the <code>hoa</code> script provides the
367 <p>This is a useful way to <strong>quickly</strong> test data against a
370 <p>The analyses start with the <strong>root</strong> rule but we can use
371 <strong>another rule</strong> thanks to the second argument of the
398 <li>during the <strong>lexical</strong> analysis,
403 <li>during the <strong>syntactic</strong> analysis,
412 <strong>production</strong> of an AST. It is important to control its
413 <strong>form</strong>, its <strong>size</strong>, the data it
414 <strong>holds</strong>, etc. This is why it is necessary to understand the
416 <strong>nodes</strong> in the AST. First, lexemes declared with the
422 be <strong>overwritten</strong> inside the <strong>same rule</strong>. Finally,
424 <code>#array</code> rule, which allows us to define a <strong>default</strong>
440 <strong>derivation</strong> of a rule. The mechanism is normally fairly
441 <strong>intuitive</strong>.</p>
446 the <strong>namespaces</strong>.</p>
447 <p><strong>Lexemes</strong> are put in namespaces, which means that only the
448 lexemes of the <strong>current</strong> namespace are used by the lexical
451 lexeme is consumed, it is able to <strong>change</strong> the current
457 embarrassing because it is put in <strong>different</strong> namespaces:</p>
514 The data has been successfully <strong>split</strong>, and we have jumped
521 <code>-></code> operator. No <strong>history</strong> about the used
523 be accessible through <strong>several</strong> namespaces and we would like
524 a lexeme to jump back to the <strong>previous</strong> namespace. Put
533 are managed like a <strong>stack</strong>, hence the <em>shift</em>
597 is the capability to express a <strong>unification</strong> of lexemes. Let's
607 <strong>also</strong> <code>"foo'</code> and <code>'foo"</code>!</p>
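The quote problem above can be sketched in Python: without unification, any quote lexeme (matching <code>"</code> or <code>'</code>) is accepted on both sides, so <code>"foo'</code> matches; unification forces every occurrence of the lexeme inside one rule instance to hold the same value. This is an illustration only, not the Hoa implementation:

```python
import re

# Illustrative lexemes: a quote is " or ', a string is lowercase letters.
QUOTE = re.compile(r"[\"']")
STRING = re.compile(r"[a-z]+")

def match_quoted_string(data, unify=True):
    """Match quote string quote, optionally unifying the two quote values."""
    m1 = QUOTE.match(data, 0)
    if not m1:
        return False
    m2 = STRING.match(data, m1.end())
    if not m2:
        return False
    m3 = QUOTE.match(data, m2.end())
    if not m3 or m3.end() != len(data):
        return False
    if unify and m1.group() != m3.group():
        return False   # quote[0]: both quotes must unify to the same value
    return True

print(match_quoted_string('"foo\''))               # False with unification
print(match_quoted_string('"foo\'', unify=False))  # True without
```

With unification enabled, only <code>"foo"</code> and <code>'foo'</code> remain valid.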
609 <strong>constraint</strong> on the <strong>value</strong> of lexemes at
612 value. And finally, the unification is <strong>local</strong> to an instance
614 rules and the unification is only applied on the <strong>current</strong>
642 <p>An <strong>abstract syntax</strong> tree represents textual data in a
643 <strong>structural</strong> form. Each <strong>node</strong> of this tree is
659 <p>Visitors are the simplest way to <strong>traverse</strong> an AST. As an
766 <p>The <code>getData</code> method is very useful to <strong>store</strong>
767 data that are likely to be <strong>reused</strong>, for example from one
768 visitor to another. This method returns a <strong>reference</strong> to an
777 <p>It is common to use one visitor per <strong>constraint</strong>:
783 <strong>trivial</strong> and we will use it most of the time with a
807 <strong>layers</strong>. Each layer can be used with maximal modularity and
810 <strong>trace</strong> is computed. This trace is a flat array containing only
890 trace can be useful: if we are on a rule, we have its <strong>name</strong>
892 <strong>name</strong> (with the <code>getTokenName</code> method), its
893 <strong>representation</strong> (expressed in the PCRE format, with the
894 <code>getRepresentation</code> method), its <strong>value</strong> (with the
895 <code>getValue</code> method), if it is a node to <strong>keep</strong> in the
896 AST (with the <code>isKept</code> method) and its <strong>unification</strong>
902 and sometimes we enter a <strong>numerical</strong> rule, like
905 each one of these rules is <strong>linearized</strong> regarding the PP
917 array. We will print all the <strong>accessible</strong> rules from the root
985 <strong>tiniest</strong> form. It allows us to <strong>reason</strong> about rules
986 in an <strong>easier</strong> and <strong>faster</strong> way. Indeed,
992 <p>A grammar can be used to satisfy two goals: to <strong>validate</strong>
993 data (if we see it as an automaton) or to <strong>generate</strong> data. So
995 <strong>initialization</strong> of our compiler, running the lexical and
996 syntactic <strong>analyzers</strong>, producing a <strong>trace</strong>,
997 transforming this trace into an <strong>AST</strong>, and finally
998 <strong>visiting</strong> that tree. But we can stop at the first step, the
1002 <strong>generation</strong> algorithms:</p>
1009 algorithm can satisfy <strong>numerous</strong> usage contexts. Each one
1011 <p>Generating a data based on a grammar requires <strong>two
1012 steps</strong>:</p>
1014 <li>generating values for all <strong>lexemes</strong> in order to get raw
1016 <li>generating a <strong>path</strong> in the rules of the grammar.</li>
1025 it has to be <strong>fast</strong>. We are going to use a process called
1026 <strong>isotropic</strong>. We start with a rule and we go forward only by
1027 using <strong>random</strong> and <strong>locally uniform</strong> (only for
1044 <strong>sequences of lexemes</strong>.</p>
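The isotropic process described above (random, locally uniform choices among the alternatives of the current rule) can be sketched in Python on an invented toy grammar; this is an illustration of the idea, not the Hoa implementation:

```python
import random

# Toy grammar (invented):  expr -> number | number '+' expr
grammar = {
    "expr": [["number"], ["number", "+", "expr"]],
}

def generate(rule, rng):
    """Isotropic generation: pick a branch uniformly among the local
    alternatives, then generate a value for each lexeme encountered."""
    out = []
    for symbol in rng.choice(grammar[rule]):
        if symbol in grammar:
            out.extend(generate(symbol, rng))     # derive a sub-rule
        elif symbol == "number":
            out.append(str(rng.randint(0, 9)))    # generate a lexeme value
        else:
            out.append(symbol)                    # literal lexeme, e.g. '+'
    return out

rng = random.Random(42)
print("".join(generate("expr", rng)))  # e.g. a sum like "3+5" (seed-dependent)
```

Note the two steps from the list above: a path is drawn through the rules, and a value is produced for every lexeme on that path. The uniformity is only local: longer sequences are exponentially less likely than short ones.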
1049 <p>The first algorithm is the <strong>random</strong> and
1050 <strong>uniform</strong> generation. This algorithm is useful if we would like
1051 to generate sequences of lexemes of a <strong>given size <em>n</em></strong>
1052 with a <strong>uniformity</strong>, not local (like in the isotropic
1053 generation), but on the <strong>set</strong> of all the possible sequences.
1054 Succinctly, the algorithm works in two steps: <strong>pre-computation</strong>
1055 (once per size) followed by a <strong>generation</strong>. The pre-computation
1056 is an <strong>automatic</strong> step and computes the <strong>number</strong>
1058 helps to compute <strong>cumulative distribution functions</strong> in order
1059 to <strong>guide</strong> the final sequences of lexemes.</p>
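The pre-computation step can be illustrated with a tiny counting function in Python: for a toy grammar (invented for the example), it counts how many sequences of lexemes of exactly a given size each rule derives. Such counts are what cumulative distribution functions are built from in order to draw branches with global, rather than local, uniformity:

```python
from functools import lru_cache

# Toy grammar (invented):  expr -> 'n' | 'n' '+' expr
# count(size) = number of lexeme sequences of exactly `size` lexemes
# derivable from 'expr'; computed once per size (the pre-computation).

@lru_cache(maxsize=None)
def count(size):
    if size <= 0:
        return 0
    total = 1 if size == 1 else 0          # branch: 'n'
    total += count(size - 2)               # branch: 'n' '+' expr
    return total

# Only odd sizes are derivable: n, n+n, n+n+n, ...
print([count(n) for n in range(1, 6)])  # [1, 0, 1, 0, 1]
```

With these counts, a generator can weight each branch by the number of sequences it leads to, so every sequence of size <em>n</em> is equally likely.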
1099 <p>The second algorithm is the <strong>bounded exhaustive</strong> generation.
1100 This algorithm generates <strong>all</strong> sequences of lexemes (for the
1101 exhaustive side) from size 1 <strong>to</strong> <em>n</em> (for the
1103 exhaustiveness is kind of a <strong>proof</strong>! The algorithm behaves like
1153 <p>The last algorithm is the <strong>grammar coverage-based</strong>
1154 generation. This algorithm reduces the <strong>combinatorial explosion</strong>
1156 generate sequences of lexemes that will “<strong>activate</strong>” all the
1157 rule <strong>branches</strong> of the grammar. A rule is said to be covered if and
1159 been used in a sequence. To ensure <strong>diversity</strong> in the
1161 <strong>randomly</strong>. The algorithm also behaves like an iterator:</p>
1181 <strong>termination</strong> of the algorithm, we will use the following
1182 <strong>heuristic</strong> on the <strong>repetition</strong> operators:
1188 finds its origin in <strong>limit</strong> testing.</p>
1190 <strong>cover</strong> our grammar entirely!</p>
1195 <p>Here are some <strong>clues</strong> to know when to use one algorithm
1225 <strong>monolithic</strong>.</p>
1229 <p><code>Hoa\Compiler</code> provides two <strong>compiler-compilers</strong>:
1231 <strong>validate</strong> data. The <strong>PP language</strong> allows us to
1232 write <strong>algebraic grammars</strong> in a <strong>simple</strong> and
1233 <strong>natural</strong> way. The LL(<em>k</em>) compiler-compiler is split
1234 into <strong>distinct processes</strong>, which encourages hackability. Finally,
1235 several algorithms allow us to <strong>generate</strong> data based on a grammar