6 <p><strong>Compilers</strong> allow us to <strong>analyze</strong> and
7 <strong>manipulate textual</strong> data. There are numerous use cases.
20 <p>A <strong>language</strong> is a way to express or to
21 <strong>formulate</strong> a <strong>solution</strong> to a
22 <strong>problem</strong>. And there are many problems. We read and
24 <strong>understood</strong> by <strong>machines</strong>. This operation is
25 possible thanks to <strong>compilers</strong>.</p>
27 languages</a> is a theory that includes the <strong>automatic
28 analysis</strong> of those languages through tools like
29 <strong>automata</strong> or <strong>grammars</strong>. It is necessary to
36 <p>A <strong>language</strong> is a set of <strong>words</strong>. Each word
37 is a <strong>sequence</strong> of <strong>symbols</strong> belonging to an
38 <strong>alphabet</strong>. A symbol represents the smallest <strong>lexical
39 unit</strong> of a language; it is atomic, and we will call it a
40 <strong>lexeme</strong> (also often mentioned as a token). These sequences of
41 lexemes representing words are built with <strong>rules</strong>. From a word
42 and a root rule, we will try to <strong>derive</strong> its sub-rules. If a
43 derivation exists, then the word is considered <strong>valid</strong>; otherwise
44 it is considered <strong>invalid</strong>. We also speak about word
45 <strong>matching</strong>. For example, if we consider the following
56 <strong>disjunction</strong>, i.e. the “or”, is represented by the
63 <p>We continue to derive until we <strong>eliminate</strong> every rule in
64 order to have only <strong>lexemes</strong>:</p>
82 <p>A set of rules is called a <strong>grammar</strong>. And so, a grammar
83 represents a <strong>language</strong>!</p>
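To make the idea concrete outside of any particular tool, a grammar can be sketched as a set of rules, and a word is valid if a derivation from the root rule exists. The following is a minimal Python sketch with an invented toy grammar (it is not the Hoa API, just an illustration of derivation):

```python
# Toy grammar (invented for this illustration):
#   root -> pair root | pair
#   pair -> 'a' 'b'

def match_pair(word, i):
    """Try to derive the 'pair' rule at position i; return the new position or None."""
    if word[i:i + 2] == "ab":
        return i + 2
    return None

def match_root(word, i=0):
    """Try to derive the 'root' rule: one or more pairs."""
    i = match_pair(word, i)
    if i is None:
        return None
    while True:
        j = match_pair(word, i)
        if j is None:
            return i
        i = j

def is_valid(word):
    """A word is valid if a full derivation from the root rule exists."""
    return match_root(word) == len(word)

print(is_valid("abab"))  # True: a derivation exists
print(is_valid("aba"))   # False: no derivation
```

Here the grammar `root`/`pair` represents exactly the language of words made of one or more `ab` pairs: the set of rules is the language.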
87 <strong>levels</strong>:</p>
89 <li><strong>unrestricted</strong> grammars, matching languages known as
91 <li><strong>context-sensitive</strong> grammars, matching contextual
93 <li><strong>context-free</strong> grammars, matching algebraic languages,
95 <li><strong>regular</strong> grammars, matching regular languages.</li>
102 developers. But regular grammars are not able to match a <strong>pair of
103 symbols</strong> (like parentheses, braces or quotes), while algebraic
126 <strong>analyses</strong>: <strong>lexical</strong> and
127 <strong>syntactic</strong>. A lexical analysis <strong>splits</strong> a word
128 into a <strong>sequence of lexemes</strong>. This sequence will thereafter be
130 <strong>belongs</strong> to the language.</p>
133 sequence and verify that they allow <strong>going forward</strong> in the
134 <strong>derivation</strong> of the grammar's rules.</p>
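These two analyses can be sketched in a few lines of Python on an invented toy language of additions such as <code>1+2+3</code> (lexeme names and rules here are illustrative, not taken from any real grammar):

```python
import re

# Hypothetical lexemes for a toy language of additions.
lexemes = [
    ("number", re.compile(r"\d+")),
    ("plus",   re.compile(r"\+")),
]

def lex(data):
    """Lexical analysis: split the data into a sequence of lexemes."""
    tokens, i = [], 0
    while i < len(data):
        for name, pattern in lexemes:   # the order of the lexemes matters
            m = pattern.match(data, i)
            if m:
                tokens.append((name, m.group()))
                i = m.end()
                break
        else:
            raise ValueError(f"unexpected character at position {i}")
    return tokens

def parse(tokens):
    """Syntactic analysis for the rule: number ('+' number)*"""
    names = [name for name, _ in tokens]
    if not names or names[0] != "number":
        return False
    for expected, name in zip(["plus", "number"] * len(names), names[1:]):
        if name != expected:
            return False
    return len(names) % 2 == 1   # the sequence must end on a number

print(parse(lex("1+2+3")))  # True
print(parse(lex("1+")))     # False
```

The lexer only splits; the parser then checks that the sequence of lexemes allows going forward in the derivation.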
142 <strong>go back</strong> <em>k</em> steps; we also speak about
143 backtracking. Put another way, rules can be <strong>ambiguous</strong>: each time
146 <em>k</em> variable defines the ambiguity <strong>level</strong>. If a grammar
148 <strong>unambiguous</strong>: each time a lexeme is used to derive our rules,
150 that the <em>k</em> variable is <strong>undefined</strong>. The following
175 <p>Writing compilers is a <strong>laborious</strong> task. It is not always
177 why there exist <strong>compilers of compilers</strong>, or in other words,
179 <strong>intermediary</strong> language to write a grammar. The
182 compiler-compilers through a <strong>dedicated</strong> language.</p>
187 <strong>algebraic grammars</strong>. It is written in files having the
191 <p>A grammar is composed of <strong>lexemes</strong> and
192 <strong>rules</strong>. The declaration of a lexeme has the following syntax:
195 <strong>name</strong> of the lexeme, <code><em>value</em></code> represents
196 its <strong>value</strong>, expressed with the
200 represent the names of <strong>namespaces</strong> and are optional (the
205 <p>Namespaces represent disjoint <strong>subsets</strong> of lexemes, used
206 to <strong>ease</strong> analyses. A <code>%skip</code> declaration is similar
208 <strong>skip</strong>, meaning not to consider. A common example of
212 example, which is a slightly <strong>simplified</strong> version of the
215 <strong>complete</strong> grammar is located in the
256 <code>string</code> (it allows us to avoid <strong>confusion</strong> with the
259 <strong>disjunction</strong> of several lexemes and rules. The
260 <strong>constructions</strong> of the PP language are the following:</p>
262 <li><code><em>rule</em>()</code> to <strong>call</strong> a rule,</li>
264 to <strong>declare</strong> a lexeme (namespaces do not appear here),</li>
265 <li><code>|</code> for a <strong>disjunction</strong> (a choice),</li>
268 <strong>optional</strong>,</li>
270 present <strong>1 or more</strong> times,</li>
272 present <strong>0 or more</strong> times,</li>
276 <li><code>#<em>node</em></code> to create a <strong>node</strong>
278 <li><code><em>token</em>[<em>i</em>]</code> to <strong>unify</strong>
308 classical: it starts by <strong>lexically</strong> analyzing the textual data,
310 <strong>order</strong> of the lexemes is essential because the lexical
312 <strong>syntactic</strong> analyzer comes into play in order to
313 <strong>match</strong> our data.</p>
315 <strong>trace</strong>. This trace can be transformed into an AST, which stands for
318 later), which allows us to add new <strong>constraints</strong> which can be
321 is a <strong>helper</strong> to easily read a grammar expressed with the PP
352 <strong>often</strong>. This is why the <code>hoa</code> script provides the
367 <p>This is a useful way to <strong>quickly</strong> test data against a
370 <p>The analyses start with the <strong>root</strong> rule but we can use
371 <strong>another rule</strong> thanks to the second argument of the
398 <li>during the <strong>lexical</strong> analysis,
403 <li>during the <strong>syntactic</strong> analysis,
412 <strong>production</strong> of an AST. It is important to control its
413 <strong>form</strong>, its <strong>size</strong>, the data it
414 <strong>holds</strong>, etc. This is why it is necessary to understand the
416 <strong>nodes</strong> in the AST. First, lexemes declared with the
422 be <strong>overwritten</strong> inside the <strong>same rule</strong>. Finally,
424 <code>#array</code> rule, which allows us to define a <strong>default</strong>
440 <strong>derivation</strong> of a rule. The mechanism is normally fairly
441 <strong>intuitive</strong>.</p>
446 the <strong>namespaces</strong>.</p>
447 <p><strong>Lexemes</strong> are put in namespaces, which means that only the
448 lexemes of the <strong>current</strong> namespace are used by the lexical
451 lexeme is consumed, it is able to <strong>change</strong> the current
457 embarrassing because it is put in <strong>different</strong> namespaces:</p>
514 The data has been successfully <strong>split</strong>, and we have jumped
521 <code>-></code> operator. No <strong>history</strong> about the used
523 be accessible through <strong>several</strong> namespaces and we would like
524 a lexeme to jump back to the <strong>previous</strong> namespace. Put
533 are managed like a <strong>stack</strong>, hence the <em>shift</em>
597 is the capability to express a <strong>unification</strong> of lexemes. Let's
607 <strong>also</strong> <code>"foo'</code> and <code>'foo"</code>!</p>
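The quote problem above can be sketched in Python: without unification, any quote lexeme (matching <code>"</code> or <code>'</code>) is accepted on both sides, so <code>"foo'</code> matches; unification forces every occurrence of the lexeme inside one rule instance to hold the same value. This is an illustration only, not the Hoa implementation:

```python
import re

# Illustrative lexemes: a quote is " or ', a string is lowercase letters.
QUOTE = re.compile(r"[\"']")
STRING = re.compile(r"[a-z]+")

def match_quoted_string(data, unify=True):
    """Match quote string quote, optionally unifying the two quote values."""
    m1 = QUOTE.match(data, 0)
    if not m1:
        return False
    m2 = STRING.match(data, m1.end())
    if not m2:
        return False
    m3 = QUOTE.match(data, m2.end())
    if not m3 or m3.end() != len(data):
        return False
    if unify and m1.group() != m3.group():
        return False   # quote[0]: both quotes must unify to the same value
    return True

print(match_quoted_string('"foo\''))               # False with unification
print(match_quoted_string('"foo\'', unify=False))  # True without
```

With unification enabled, only <code>"foo"</code> and <code>'foo'</code> remain valid.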
609 <strong>constraint</strong> on the <strong>value</strong> of lexemes at
612 value. And finally, the unification is <strong>local</strong> to an instance
614 rules and the unification is only applied on the <strong>current</strong>
642 <p>An <strong>abstract syntax</strong> tree represents textual data in a
643 <strong>structural</strong> form. Each <strong>node</strong> of this tree is
659 <p>Visitors are the simplest way to <strong>traverse</strong> an AST. As an
766 <p>The <code>getData</code> method is very useful to <strong>store</strong>
767 data that are likely to be <strong>reused</strong>, for example from one
768 visitor to another. This method returns a <strong>reference</strong> to an
777 <p>It is common to use one visitor per <strong>constraint</strong>:
783 <strong>trivial</strong> and we will use it most of the time with a
807 <strong>layers</strong>. Each layer can be used with maximal modularity and
810 <strong>trace</strong> is computed. This trace is a flat array containing only
890 trace can be useful: if we are on a rule, we have its <strong>name</strong>
892 <strong>name</strong> (with the <code>getTokenName</code> method), its
893 <strong>representation</strong> (expressed in the PCRE format, with the
894 <code>getRepresentation</code> method), its <strong>value</strong> (with the
895 <code>getValue</code> method), if it is a node to <strong>keep</strong> in the
896 AST (with the <code>isKept</code> method) and its <strong>unification</strong>
902 and sometimes we enter a <strong>numerical</strong> rule, like
905 each one of these rules is <strong>linearized</strong> regarding the PP
917 array. We will print all the <strong>accessible</strong> rules from the root
985 <strong>tiniest</strong> form. It allows us to <strong>reason</strong> about rules
986 in an <strong>easier</strong> and <strong>faster</strong> way. Indeed,
992 <p>A grammar can be used to satisfy two goals: to <strong>validate</strong>
993 data (if we see it as an automaton) or to <strong>generate</strong> data. So
995 <strong>initialization</strong> of our compiler, running the lexical and
996 syntactic <strong>analyzers</strong>, producing a <strong>trace</strong>,
997 transforming this trace into an <strong>AST</strong>, and finally
998 <strong>visiting</strong> that tree. But we can stop at the first step, the
1002 <strong>generation</strong> algorithms:</p>
1009 algorithm can satisfy <strong>numerous</strong> usage contexts. Each one
1011 <p>Generating a data based on a grammar requires <strong>two
1012 steps</strong>:</p>
1014 <li>generating values for all <strong>lexemes</strong> in order to get raw
1016 <li>generating a <strong>path</strong> in the rules of the grammar.</li>
1025 it has to be <strong>fast</strong>. We are going to use a process called
1026 <strong>isotropic</strong>. We start with a rule and we go forward only by
1027 using <strong>random</strong> and <strong>locally uniform</strong> (only for
1044 <strong>sequences of lexemes</strong>.</p>
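The isotropic process described above (random, locally uniform choices among the alternatives of the current rule) can be sketched in Python on an invented toy grammar; this is an illustration of the idea, not the Hoa implementation:

```python
import random

# Toy grammar (invented):  expr -> number | number '+' expr
grammar = {
    "expr": [["number"], ["number", "+", "expr"]],
}

def generate(rule, rng):
    """Isotropic generation: pick a branch uniformly among the local
    alternatives, then generate a value for each lexeme encountered."""
    out = []
    for symbol in rng.choice(grammar[rule]):
        if symbol in grammar:
            out.extend(generate(symbol, rng))     # derive a sub-rule
        elif symbol == "number":
            out.append(str(rng.randint(0, 9)))    # generate a lexeme value
        else:
            out.append(symbol)                    # literal lexeme, e.g. '+'
    return out

rng = random.Random(42)
print("".join(generate("expr", rng)))  # e.g. a sum like "3+5" (seed-dependent)
```

Note the two steps from the list above: a path is drawn through the rules, and a value is produced for every lexeme on that path. The uniformity is only local: longer sequences are exponentially less likely than short ones.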
1049 <p>The first algorithm is the <strong>random</strong> and
1050 <strong>uniform</strong> generation. This algorithm is useful if we would like
1051 to generate sequences of lexemes of a <strong>given size <em>n</em></strong>
1052 with a <strong>uniformity</strong>, not local (like in the isotropic
1053 generation), but on the <strong>set</strong> of all the possible sequences.
1054 Succinctly, the algorithm works in two steps: <strong>pre-computation</strong>
1055 (once per size) followed by a <strong>generation</strong>. The pre-computation
1056 is an <strong>automatic</strong> step and computes the <strong>number</strong>
1058 helps to compute <strong>cumulative distribution functions</strong> in order
1059 to <strong>guide</strong> the final sequences of lexemes.</p>
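The pre-computation step can be illustrated with a tiny counting function in Python: for a toy grammar (invented for the example), it counts how many sequences of lexemes of exactly a given size each rule derives. Such counts are what cumulative distribution functions are built from in order to draw branches with global, rather than local, uniformity:

```python
from functools import lru_cache

# Toy grammar (invented):  expr -> 'n' | 'n' '+' expr
# count(size) = number of lexeme sequences of exactly `size` lexemes
# derivable from 'expr'; computed once per size (the pre-computation).

@lru_cache(maxsize=None)
def count(size):
    if size <= 0:
        return 0
    total = 1 if size == 1 else 0          # branch: 'n'
    total += count(size - 2)               # branch: 'n' '+' expr
    return total

# Only odd sizes are derivable: n, n+n, n+n+n, ...
print([count(n) for n in range(1, 6)])  # [1, 0, 1, 0, 1]
```

With these counts, a generator can weight each branch by the number of sequences it leads to, so every sequence of size <em>n</em> is equally likely.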
1099 <p>The second algorithm is the <strong>bounded exhaustive</strong> generation.
1100 This algorithm generates <strong>all</strong> sequences of lexemes (for the
1101 exhaustive side) from size 1 <strong>to</strong> <em>n</em> (for the
1103 exhaustiveness is kind of a <strong>proof</strong>! The algorithm behaves like
1153 <p>The last algorithm is the <strong>grammar coverage-based</strong>
1154 generation. This algorithm reduces the <strong>combinatorial explosion</strong>
1156 generate sequences of lexemes that will “<strong>activate</strong>” all the
1157 rule <strong>branches</strong> of the grammar. A rule is said to be covered if and
1159 been used in a sequence. To ensure <strong>diversity</strong> in the
1161 <strong>randomly</strong>. The algorithm also behaves like an iterator:</p>
1181 <strong>termination</strong> of the algorithm, we will use the following
1182 <strong>heuristic</strong> on the <strong>repetition</strong> operators:
1188 finds its origin in <strong>limit</strong> testing.</p>
1190 <strong>cover</strong> our grammar entirely!</p>
1195 <p>Here are some <strong>clues</strong> to know when to use one algorithm
1225 <strong>monolithic</strong>.</p>
1229 <p><code>Hoa\Compiler</code> provides two <strong>compiler-compilers</strong>:
1231 <strong>validate</strong> data. The <strong>PP language</strong> allows us to
1232 write <strong>algebraic grammars</strong> in a <strong>simple</strong> and
1233 <strong>natural</strong> way. The LL(<em>k</em>) compiler-compiler is split
1234 into <strong>distinct processes</strong>, which encourages hackability. Finally,
1235 several algorithms allow us to <strong>generate</strong> data based on a grammar