1<?xml version="1.0" encoding="utf-8"?>
2
3<overlay xmlns="http://hoa-project.net/xyl/xylophone">
4<yield id="chapter">
5
6  <p>Strings can sometimes be <strong>complex</strong>, especially when they use
7  the <code>Unicode</code> encoding format. The <code>Hoa\Ustring</code> library
8  provides several operations on UTF-8 strings.</p>
9
10  <h2 id="Table_of_contents">Table of contents</h2>
11
12  <tableofcontents id="main-toc" />
13
14  <h2 id="Introduction" for="main-toc">Introduction</h2>
15
  <p>When we manipulate strings, the <a href="http://unicode.org/">Unicode</a>
  format has become the obvious choice because of its
  <strong>compatibility</strong> with historical formats (like ASCII) and its
  capacity to represent a <strong>large</strong> range of characters and
  symbols for all cultures and all regions of the world. PHP provides several
  tools to manipulate such strings, like the following extensions:
  <a href="http://php.net/mbstring"><code>mbstring</code></a>,
  <a href="http://php.net/iconv"><code>iconv</code></a> and the excellent
  <a href="http://php.net/intl"><code>intl</code></a>, which is based on
  <a href="http://icu-project.org/">ICU</a>, the reference implementation of
  Unicode. Unfortunately, we sometimes have to mix these extensions to achieve
  our aims, at the cost of a certain <strong>complexity</strong> along with
  a regrettable <strong>verbosity</strong>.</p>
  <p>The <code>Hoa\Ustring</code> library addresses these issues by providing a
  <strong>simple</strong> way to manipulate strings with
  <strong>performance</strong> and <strong>efficiency</strong> in mind. It
  also provides some advanced algorithms to perform <strong>search</strong>
  operations on strings.</p>
34
35  <h2 id="Unicode_strings" for="main-toc">Unicode strings</h2>
36
  <p>The <code>Hoa\Ustring\Ustring</code> class represents a
  <strong>UTF-8</strong> Unicode string and allows us to manipulate it easily.
  This class implements the
  <a href="http://php.net/arrayaccess"><code>ArrayAccess</code></a>,
  <a href="http://php.net/countable"><code>Countable</code></a> and
  <a href="http://php.net/iteratoraggregate"><code>IteratorAggregate</code></a>
  interfaces. We are going to use three examples in three different languages:
  French, Arabic and Japanese. Thus:</p>
45  <pre><code class="language-php">$french   = new Hoa\Ustring\Ustring('Je t\'aime');
46$arabic   = new Hoa\Ustring\Ustring('أحبك');
47$japanese = new Hoa\Ustring\Ustring('私はあなたを愛して');</code></pre>
48  <p>Now, let's see what we can do on these three strings.</p>
49
50  <h3 id="String_manipulation" for="main-toc">String manipulation</h3>
51
52  <p>Let's start with <strong>elementary</strong> operations. If we would like
53  to <strong>count</strong> the number of characters (not bytes), we will use
54  the <a href="http://php.net/count"><code>count</code> function</a>. Thus:</p>
55  <pre><code class="language-php">var_dump(
56    count($french),
57    count($arabic),
58    count($japanese)
59);
60
61/**
62 * Will output:
63 *     int(9)
64 *     int(4)
65 *     int(9)
66 */</code></pre>
67  <p>When we speak about text position, it is not suitable to speak about the
68  right or the left, but rather about a <strong>beginning</strong> or an
69  <strong>end</strong>, and based on the <strong>direction</strong> of writing.
70  We can know this direction thanks to the
71  <code>Hoa\Ustring\Ustring::getDirection</code> method. It returns the value of
72  one of the following constants:</p>
73  <ul>
74    <li><code>Hoa\Ustring\Ustring::LTR</code>, for left-to-right, if the text is
75    written from the left to the right,</li>
76    <li><code>Hoa\Ustring\Ustring::RTL</code>, for right-to-left, if the text is
77    written from the right to the left.</li>
78  </ul>
79  <p>Let's observe the result with our examples:</p>
80  <pre><code class="language-php">var_dump(
81    $french->getDirection()   === Hoa\Ustring\Ustring::LTR, // is left-to-right?
82    $arabic->getDirection()   === Hoa\Ustring\Ustring::RTL, // is right-to-left?
83    $japanese->getDirection() === Hoa\Ustring\Ustring::LTR  // is left-to-right?
84);
85
86/**
87 * Will output:
88 *     bool(true)
89 *     bool(true)
90 *     bool(true)
91 */</code></pre>
92  <p>The result of this method is computed thanks to the
93  <code>Hoa\Ustring\Ustring::getCharDirection</code> static method which computes
94  the direction of only one character.</p>
  <p>If we would like to <strong>concatenate</strong> another string to the end
  or to the beginning, we will respectively use the
  <code>Hoa\Ustring\Ustring::append</code> and
  <code>Hoa\Ustring\Ustring::prepend</code> methods. These methods, like most of
  the ones which modify the string, return the object itself, in order to
  chain the calls. For instance:</p>
101  <pre><code class="language-php">echo $french->append('… et toi, m\'aimes-tu ?')->prepend('Mam\'zelle ! ');
102
103/**
104 * Will output:
105 *     Mam'zelle ! Je t'aime… et toi, m'aimes-tu ?
106 */</code></pre>
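  <p>Chaining works because each modifying method returns the object itself.
  As a minimal sketch of this pattern, here is a hypothetical
  <code>TinyString</code> class; it is an illustration only and is not part of
  <code>Hoa\Ustring</code>:</p>
  <pre><code class="language-php">// Hypothetical class illustrating call chaining; not the library's code.
final class TinyString
{
    private $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    public function append(string $piece): self
    {
        $this->value .= $piece;

        return $this; // returning the object itself allows chaining
    }

    public function prepend(string $piece): self
    {
        $this->value = $piece . $this->value;

        return $this;
    }

    public function __toString(): string
    {
        return $this->value;
    }
}

echo (new TinyString('b'))->append('c')->prepend('a'), "\n";

/**
 * Will output:
 *     abc
 */</code></pre>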
107  <p>We also have the <code>Hoa\Ustring\Ustring::toLowerCase</code> and
108  <code>Hoa\Ustring\Ustring::toUpperCase</code> methods to, respectively, set
109  the case of the string to lower or upper. For instance:</p>
110  <pre><code class="language-php">echo $french->toUpperCase();
111
112/**
113 * Will output:
114 *     MAM'ZELLE ! JE T'AIME… ET TOI, M'AIMES-TU ?
115 */</code></pre>
116  <p>We can also add characters to the beginning or to the end of the string to
117  reach a <strong>minimum</strong> length. This operation is frequently called
118  the <em>padding</em> (for historical reasons dating back to typewriters).
119  That's why we have the <code>Hoa\Ustring\Ustring::pad</code> method which
120  takes three arguments: the minimum length, characters to add and a constant
121  indicating whether we have to add at the end or at the beginning of the string
122  (respectively <code>Hoa\Ustring\Ustring::END</code>, by default, and
123  <code>Hoa\Ustring\Ustring::BEGINNING</code>).</p>
124  <pre><code class="language-php">echo $arabic->pad(20, ' ');
125
126/**
127 * Will output:
128 *                     أحبك
129 */</code></pre>
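  <p>PHP's built-in <code>str_pad</code> function counts bytes, so it cannot be
  applied as is to multi-byte strings. As a rough sketch of padding by
  characters, assuming the <code>mbstring</code> extension and a padding piece
  of a single character (the <code>padEnd</code> helper is hypothetical and is
  not the library's implementation):</p>
  <pre><code class="language-php">// Hypothetical helper, not the library's code: pad at the end of the string,
// counting characters rather than bytes. Assumes $piece is a single character.
function padEnd(string $string, int $length, string $piece = ' '): string
{
    $missing = $length - mb_strlen($string, 'UTF-8');

    return 0 >= $missing ? $string : $string . str_repeat($piece, $missing);
}

var_dump(mb_strlen(padEnd('أحبك', 20), 'UTF-8'));

/**
 * Will output:
 *     int(20)
 */</code></pre>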
  <p>A similar operation removes, by default, <strong>spaces</strong>
  at the beginning and at the end of the string thanks to the
  <code>Hoa\Ustring\Ustring::trim</code> method. For example, to retrieve our
  original Arabic string:</p>
134  <pre><code class="language-php">echo $arabic->trim();
135
136/**
137 * Will output:
138 *     أحبك
139 */</code></pre>
  <p>If we would like to remove other characters, we can use its first argument,
  which must be a regular expression. Its second argument specifies the side
  from which characters are removed: at the beginning, at the end or both,
  still by using the <code>Hoa\Ustring\Ustring::BEGINNING</code> and
  <code>Hoa\Ustring\Ustring::END</code> constants. We can combine these
  constants to express “both sides”, which is the default value:
  <code class="language-php">Hoa\Ustring\Ustring::BEGINNING |
  Hoa\Ustring\Ustring::END</code>. For example, to remove all the numbers and
  the spaces only at the end, we will write:</p>
155  <pre><code class="language-php">$arabic->trim('\s|\d', Hoa\Ustring\Ustring::END);</code></pre>
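  <p>For comparison, a similar effect can be obtained with PCRE alone. The
  <code>trimEnd</code> helper below is hypothetical: it only sketches the idea
  of trimming with a regular expression and does not claim to reproduce the
  library's implementation:</p>
  <pre><code class="language-php">// Hypothetical helper, not the library's code: remove every trailing run of
// characters matching $regex. The "u" option makes PCRE read the subject as UTF-8.
function trimEnd(string $string, string $regex = '\s'): string
{
    return preg_replace('#(?:' . $regex . ')+$#u', '', $string);
}

var_dump(trimEnd("été 42 \t", '\s|\d'));

/**
 * Will output:
 *     string(5) "été"
 */</code></pre>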
156  <p>We can also <strong>reduce</strong> the string to a
157  <strong>sub-string</strong> by specifying the position of the first character
158  followed by the length of the sub-string to the
159  <code>Hoa\Ustring\Ustring::reduce</code> method:</p>
160  <pre><code class="language-php">echo $french->reduce(3, 6)->reduce(2, 4);
161
162/**
163 * Will output:
164 *     aime
165 */</code></pre>
166  <p>If we would like to get a specific character, we can rely on the
167  <code>ArrayAccess</code> interface. For instance, to get the first character
168  of each of our examples (from their original definitions):</p>
169  <pre><code class="language-php">var_dump(
170    $french[0],
171    $arabic[0],
172    $japanese[0]
173);
174
175/**
176 * Will output:
177 *     string(1) "J"
178 *     string(2) "أ"
179 *     string(3) "私"
180 */</code></pre>
181  <p>If we would like the last character, we will use the -1 index. The index is
182  not bounded to the length of the string. If the index exceeds this length,
183  then a <em>modulo</em> will be applied.</p>
184  <p>We can also modify or remove a specific character with this method. For
185  example:</p>
186  <pre><code class="language-php">$french->append(' ?');
187$french[-1] = '!';
188echo $french;
189
190/**
191 * Will output:
192 *     Je t'aime !
193 */</code></pre>
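  <p>The modulo applied to negative or out-of-range indexes can be sketched
  with the <code>mbstring</code> extension. The <code>charAt</code> helper is
  hypothetical and shows one plausible reading of that rule for a non-empty
  string, not the library's actual code:</p>
  <pre><code class="language-php">// Hypothetical helper, not the library's code: map any integer index onto a
// valid character position by applying a modulo on the string length.
// Assumes a non-empty string.
function charAt(string $string, int $index): string
{
    $length = mb_strlen($string, 'UTF-8');
    $index  = (($index % $length) + $length) % $length;

    return mb_substr($string, $index, 1, 'UTF-8');
}

var_dump(
    charAt('Je t\'aime', -1),
    charAt('Je t\'aime', 9)
);

/**
 * Will output:
 *     string(1) "e"
 *     string(1) "J"
 */</code></pre>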
194  <p>Another very useful method is the <strong>ASCII</strong> transformation.
195  Be careful, this is not always possible, according to your settings. For
196  example:</p>
197  <pre><code class="language-php">$title = new Hoa\Ustring\Ustring('Un été brûlant sur la côte');
198echo $title->toAscii();
199
200/**
201 * Will output:
202 *     Un ete brulant sur la cote
203 */</code></pre>
  <p>We can also transform from Arabic or Japanese to ASCII. Symbols, like
  mathematical symbols or emojis, are also transformed:</p>
206  <pre><code class="language-php">$emoji = new Hoa\Ustring\Ustring('I ❤ Unicode');
207$maths = new Hoa\Ustring\Ustring('∀ i ∈ ℕ');
208
209echo
210    $arabic->toAscii(), "\n",
211    $japanese->toAscii(), "\n",
212    $emoji->toAscii(), "\n",
213    $maths->toAscii(), "\n";
214
215/**
216 * Will output:
217 *     ahbk
218 *     sihaanatawo aishite
219 *     I (heavy black heart)️ Unicode
220 *     (for all) i (element of) N
221 */</code></pre>
  <p>For this method to work correctly, the
  <a href="http://php.net/intl"><code>intl</code></a> extension needs to be
  present, so that the
  <a href="http://php.net/transliterator"><code>Transliterator</code></a> class
  is available. If it does not exist, the
  <a href="http://php.net/normalizer"><code>Normalizer</code></a> class must
  exist. If this class does not exist either, the
  <code>Hoa\Ustring\Ustring::toAscii</code> method can still try a
  transformation, but it is less efficient. To activate this last solution,
  <code>true</code> must be passed as a single argument. This <em lang="fr">tour
  de force</em> is not recommended in most cases.</p>
  <p>We also find the <code>getTransliterator</code> method which returns a
  <code>Transliterator</code> object, or <code>null</code> if this class does
  not exist. This method takes a transliteration identifier as argument. We
  suggest <a href="http://userguide.icu-project.org/transforms/general">reading
  the documentation about the transliterator of ICU</a> to understand this
  identifier. The <code>transliterate</code> method transliterates the
  current string based on an identifier, a beginning index and an end
  index. This method works the same way as the
  <a href="http://php.net/transliterator.transliterate"><code>Transliterator::transliterate</code></a>
  method.</p>
  <p>More generally, to change the <strong>encoding</strong> format, we can use
  the <code>Hoa\Ustring\Ustring::transcode</code> static method, with a string
  as first argument, the original encoding format as second argument and the
  expected encoding format as third argument (UTF-8 by default). To get the
  list of encoding formats, we have to refer to the
  <a href="http://php.net/iconv"><code>iconv</code></a> extension or to use the
  following command line in a terminal:</p>
  <pre><code class="language-shell">$ iconv --list</code></pre>
251  <p>To know if a string is encoded in UTF-8, we can use the
252  <code>Hoa\Ustring\Ustring::isUtf8</code> static method; for instance:</p>
253  <pre><code class="language-php">var_dump(
254    Hoa\Ustring\Ustring::isUtf8('a'),
255    Hoa\Ustring\Ustring::isUtf8(Hoa\Ustring\Ustring::transcode('a', 'UTF-8', 'UTF-16'))
256);
257
258/**
259 * Will output:
260 *     bool(true)
261 *     bool(false)
262 */</code></pre>
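  <p>A comparable check can be tried with the <code>mbstring</code> extension
  alone. This is only an illustration with built-in functions, under the
  assumption that <code>mbstring</code> is available; it does not show what the
  library does internally:</p>
  <pre><code class="language-php">// "\u{e9}" is "é" encoded in UTF-8 (escape syntax available since PHP 7).
$utf8   = "\u{e9}";
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

var_dump(
    mb_check_encoding($utf8, 'UTF-8'),   // 2 bytes, a valid UTF-8 sequence
    mb_check_encoding($latin1, 'UTF-8')  // 1 byte (0xe9), invalid on its own
);

/**
 * Will output:
 *     bool(true)
 *     bool(false)
 */</code></pre>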
263  <p>We can <strong>split</strong> the string into several sub-strings by using
264  the <code>Hoa\Ustring\Ustring::split</code> method. As first argument, we have
265  a regular expression (of kind <a href="http://pcre.org/">PCRE</a>), then an
266  integer representing the maximum number of elements to return and finally a
267  combination of constants. These constants are the same as the ones of
268  <a href="http://php.net/preg_split"><code>preg_split</code></a>.</p>
269  <p>By default, the second argument is set to -1, which means infinity, and the
270  last argument is set to <code>PREG_SPLIT_NO_EMPTY</code>. Thus, if we would
271  like to get all the words of a string, we will write:</p>
272  <pre><code class="language-php">print_r($title->split('#\b|\s#'));
273
274/**
275 * Will output:
276 *     Array
277 *     (
278 *         [0] => Un
279 *         [1] => ete
280 *         [2] => brulant
281 *         [3] => sur
282 *         [4] => la
283 *         [5] => cote
284 *     )
285 */</code></pre>
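  <p>Since the arguments mirror those of <code>preg_split</code>, the effect of
  the default flag can be observed with the built-in function directly. A small
  sketch, using a simpler pattern than the one above:</p>
  <pre><code class="language-php">// The "u" option makes PCRE read the subject as UTF-8;
// PREG_SPLIT_NO_EMPTY drops the empty pieces around the spaces.
$words = preg_split('#\s+#u', ' Un été  brûlant ', -1, PREG_SPLIT_NO_EMPTY);

print_r($words);

/**
 * Will output:
 *     Array
 *     (
 *         [0] => Un
 *         [1] => été
 *         [2] => brûlant
 *     )
 */</code></pre>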
  <p>If we would like to <strong>iterate</strong> over all the
  <strong>characters</strong>, it is recommended to rely on the
  <code>IteratorAggregate</code> interface, that is the
  <code>Hoa\Ustring\Ustring::getIterator</code> method. Let's see it on the
  Arabic example:</p>
291  <pre><code class="language-php">foreach ($arabic as $letter) {
292    echo $letter, "\n";
293}
294
295/**
296 * Will output:
297 *     أ
298 *     ح
299 *     ب
300 *     ك
301 */</code></pre>
  <p>We notice that the iteration is based on the text direction: the first
  element of the iteration is the first letter of the string, starting from
  the beginning.</p>
305  <p>Of course, if we would like to get an array of characters, we can use the
306  <a href="http://php.net/iterator_to_array"><code>iterator_to_array</code></a>
307  PHP function:</p>
308  <pre><code class="language-php">print_r(iterator_to_array($arabic));
309
310/**
311 * Will output:
312 *     Array
313 *     (
314 *         [0] => أ
315 *         [1] => ح
316 *         [2] => ب
317 *         [3] => ك
318 *     )
319 */</code></pre>
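  <p>The important point is that the iteration works on characters, not on
  bytes. As a hedged illustration with plain PHP (not the library's
  implementation), an empty pattern with the <code>u</code> option splits a
  UTF-8 string into its characters:</p>
  <pre><code class="language-php">// An empty pattern matches between every character; with the "u" option,
// preg_split therefore returns one element per character, not per byte.
$letters = preg_split('##u', 'أحبك', -1, PREG_SPLIT_NO_EMPTY);

var_dump(count($letters));

/**
 * Will output:
 *     int(4)
 */</code></pre>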
320
321  <h3 id="Comparison_and_search" for="main-toc">Comparison and search</h3>
322
323  <p>Strings can also be <strong>compared</strong> thanks to the
324  <code>Hoa\Ustring\Ustring::compare</code> method:</p>
325  <pre><code class="language-php">$string = new Hoa\Ustring\Ustring('abc');
326var_dump(
327    $string->compare('wxyz')
328);
329
330/**
331 * Will output:
 *     int(-1)
333 */</code></pre>
  <p>This method returns -1 if the initial string comes before (in
  alphabetical order), 0 if it is identical and 1 if it comes after. If we
  would like to use all the power of the underlying mechanism, we can call the
  <code>Hoa\Ustring\Ustring::getCollator</code> static method (if the
  <a href="http://php.net/Collator"><code>Collator</code></a> class exists;
  otherwise <code>Hoa\Ustring\Ustring::compare</code> will use a simple
  byte-to-byte comparison without taking care of the other parameters). Thus,
  if we would like to sort an array of strings, we will write:</p>
342  <pre><code class="language-php">$strings = array('c', 'Σ', 'd', 'x', 'α', 'a');
343Hoa\Ustring\Ustring::getCollator()->sort($strings);
344print_r($strings);
345
346/**
347 * Could output:
348 *     Array
349 *     (
350 *         [0] => a
351 *         [1] => c
352 *         [2] => d
353 *         [3] => x
354 *         [4] => α
355 *         [5] => Σ
356 *     )
357 */</code></pre>
  <p>Comparison between two strings depends on the <strong>locale</strong>,
  that is, on the localization of the system, like the language, the country,
  the region etc. We can use the
  <a href="@hack:chapter=Locale"><code>Hoa\Locale</code> library</a> to modify
  these data, but it is not a dependency of <code>Hoa\Ustring</code>.</p>
363  <p>We can also know if a string <strong>matches</strong> a certain pattern,
364  still expressed with a regular expression. To achieve that, we will use the
365  <code>Hoa\Ustring\Ustring::match</code> method. This method relies on the
366  <a href="http://php.net/preg_match"><code>preg_match</code></a> and
367  <a href="http://php.net/preg_match_all"><code>preg_match_all</code></a> PHP
368  functions, but by modifying the pattern's options to ensure the Unicode
369  support. We have the following parameters: the pattern, a variable passed by
370  reference to collect the matches, flags, an offset and finally a boolean
371  indicating whether the search is global or not (respectively if we have to use
372  <code>preg_match_all</code> or <code>preg_match</code>). By default, the
373  search is not global.</p>
374  <p>Thus, we will check that our French example contains <code>aime</code> with
375  a direct object complement:</p>
  <pre><code class="language-php">$french->match('#(?:(?&lt;direct_object>\w)[\'\b])aime#', $matches);
377var_dump($matches['direct_object']);
378
379/**
380 * Will output:
381 *     string(1) "t"
382 */</code></pre>
  <p>This method returns <code>false</code> if an error is raised (for example
  if the pattern is not correct), 0 if no match has been found, and the number
  of matches otherwise.</p>
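  <p>These return values follow those of the underlying PHP functions. Here is
  a minimal sketch of the choice between <code>preg_match</code> and
  <code>preg_match_all</code>, with a hypothetical <code>countMatches</code>
  helper which assumes that the pattern carries no trailing options; it is not
  the library's code:</p>
  <pre><code class="language-php">// Hypothetical helper, not the library's code. Appending "u" assumes that
// the pattern ends with its delimiter (no options yet).
function countMatches(string $subject, string $pattern, bool $global = false)
{
    $pattern .= 'u';

    return $global
        ? preg_match_all($pattern, $subject)
        : preg_match($pattern, $subject);
}

var_dump(
    countMatches('abc1 defg2 hikl3 xyz4', '#\w+\d#', true),
    countMatches('abc1 defg2 hikl3 xyz4', '#\w+\d#'),
    countMatches('abc', '#\d#')
);

/**
 * Will output:
 *     int(4)
 *     int(1)
 *     int(0)
 */</code></pre>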
386  <p>Similarly, we can <strong>search</strong> and <strong>replace</strong>
387  sub-strings by other sub-strings based on a pattern, still expressed with a
388  regular expression. To achieve that, we will use the
389  <code>Hoa\Ustring\Ustring::replace</code> method. This method uses the
390  <a href="http://php.net/preg_replace"><code>preg_replace</code></a> and
391  <a href="http://php.net/preg_replace_callback"><code>preg_replace_callback</code></a>
392  PHP functions, but still by modifying the pattern's options to ensure the
393  Unicode support. As first argument, we find one or more patterns, as second
394  argument, one or more replacements and as last argument the limit of
395  replacements to apply. If the replacement is a callable, then the
396  <code>preg_replace_callback</code> function will be used.</p>
397  <p>Thus, we will modify our French example to be more polite:</p>
  <pre><code class="language-php">$french->replace('#(?:\w[\'\b])(?&lt;verb>aime)#', function ($matches) {
399    return 'vous ' . $matches['verb'];
400});
401
402echo $french;
403
404/**
405 * Will output:
406 *     Je vous aime
407 */</code></pre>
408  <p>The <code>Hoa\Ustring\Ustring</code> class provides constants which are
409  aliases of existing PHP constants and ensure a better readability of the
410  code:</p>
411  <ul>
412    <li><code>Hoa\Ustring\Ustring::WITHOUT_EMPTY</code>, alias of
413    <code>PREG_SPLIT_NO_EMPTY</code>,</li>
414    <li><code>Hoa\Ustring\Ustring::WITH_DELIMITERS</code>, alias of
415    <code>PREG_SPLIT_DELIM_CAPTURE</code>,</li>
416    <li><code>Hoa\Ustring\Ustring::WITH_OFFSET</code>, alias of
417    <code>PREG_OFFSET_CAPTURE</code> and
418    <code>PREG_SPLIT_OFFSET_CAPTURE</code>,</li>
419    <li><code>Hoa\Ustring\Ustring::GROUP_BY_PATTERN</code>, alias of
420    <code>PREG_PATTERN_ORDER</code>,</li>
421    <li><code>Hoa\Ustring\Ustring::GROUP_BY_TUPLE</code>, alias of
422    <code>PREG_SET_ORDER</code>.</li>
423  </ul>
424  <p>Because they are strict aliases, we can write:</p>
425  <pre><code class="language-php">$string = new Hoa\Ustring\Ustring('abc1 defg2 hikl3 xyz4');
426$string->match(
427    '#(\w+)(\d)#',
428    $matches,
429    Hoa\Ustring\Ustring::WITH_OFFSET
430  | Hoa\Ustring\Ustring::GROUP_BY_TUPLE,
431    0,
432    true
433);</code></pre>
434
435  <h3 id="Characters" for="main-toc">Characters</h3>
436
  <p>The <code>Hoa\Ustring\Ustring</code> class offers static methods working on
  a single Unicode character. We have already mentioned the
  <code>getCharDirection</code> method which gives the
  <strong>direction</strong> of a character. We also have the
  <code>getCharWidth</code> method which counts the <strong>number of
  columns</strong> necessary to print a single character. Thus:</p>
443  <pre><code class="language-php">var_dump(
444    Hoa\Ustring\Ustring::getCharWidth(Hoa\Ustring\Ustring::fromCode(0x7f)),
445    Hoa\Ustring\Ustring::getCharWidth('a'),
446    Hoa\Ustring\Ustring::getCharWidth('㽠')
447);
448
449/**
450 * Will output:
451 *     int(-1)
452 *     int(1)
453 *     int(2)
454 */</code></pre>
455  <p>This method returns -1 or 0 if the character is not
456  <strong>printable</strong> (for instance, if this is a control character, like
457  <code>0x7f</code> which corresponds to <code>DELETE</code>), 1 or more if this
458  is a character that can be printed. In our example, <code>㽠</code> requires
459  2 columns to be printed.</p>
460  <p>To get more semantics, we have the
461  <code>Hoa\Ustring\Ustring::isCharPrintable</code> method which allows to know
462  whether a character is printable or not.</p>
463  <p>If we would like to count the number of columns necessary for a whole
464  string, we have to use the <code>Hoa\Ustring\Ustring::getWidth</code> method.
465  Thus:</p>
466  <pre><code class="language-php">var_dump(
467    $french->getWidth(),
468    $arabic->getWidth(),
469    $japanese->getWidth()
470);
471
472/**
473 * Will output:
474 *     int(9)
475 *     int(4)
476 *     int(18)
477 */</code></pre>
478  <p>Try this in your terminal with a <strong>monospaced</strong> font. You will
479  observe that Japanese requires 18 columns to be printed. This measure is very
480  useful if we would like to know the length of a string to position it
481  efficiently.</p>
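  <p>For a quick cross-check without the library, the <code>mbstring</code>
  extension offers <code>mb_strwidth</code>, which also counts East Asian wide
  characters as 2 columns. It is shown here only as a point of comparison; we
  do not claim that both functions agree on every character:</p>
  <pre><code class="language-php">var_dump(
    mb_strwidth('Je t\'aime', 'UTF-8'),
    mb_strwidth('أحبك', 'UTF-8'),
    mb_strwidth('私はあなたを愛して', 'UTF-8')
);

/**
 * Will output:
 *     int(9)
 *     int(4)
 *     int(18)
 */</code></pre>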
  <p>The <code>getCharWidth</code> method differs from <code>getWidth</code>
  because it includes control characters. This method is intended to be used,
  for example, with terminals (please, see the
  <a href="@hack:chapter=Console"><code>Hoa\Console</code> library</a>).</p>
  <p>Finally, if this time we are not interested in Unicode characters but
  rather in <strong>machine</strong> characters <code>char</code> (being
  1 byte), we have an extra operation. The
  <code>Hoa\Ustring\Ustring::getBytesLength</code> method will count the
  <strong>length</strong> of the string in bytes:</p>
491  <pre><code class="language-php">var_dump(
492    $arabic->getBytesLength(),
493    $japanese->getBytesLength()
494);
495
496/**
497 * Will output:
498 *     int(8)
499 *     int(27)
500 */</code></pre>
  <p>If we compare these results with the ones of the
  <code>Hoa\Ustring\Ustring::count</code> method, we understand that the Arabic
  characters are encoded with 2 bytes whereas Japanese characters are encoded
  with 3 bytes. We can also get a specific byte thanks to the
  <code>Hoa\Ustring\Ustring::getByteAt</code> method. Once again, the index is
  not bounded.</p>
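  <p>The same difference between bytes and characters can be observed with
  PHP's built-in functions, assuming the <code>mbstring</code> extension is
  available:</p>
  <pre><code class="language-php">foreach (['أحبك', '私はあなたを愛して'] as $string) {
    printf(
        "%d bytes / %d characters\n",
        strlen($string),             // machine characters (bytes)
        mb_strlen($string, 'UTF-8')  // Unicode characters
    );
}

/**
 * Will output:
 *     8 bytes / 4 characters
 *     27 bytes / 9 characters
 */</code></pre>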
507
508  <h3 id="Code-point" for="main-toc">Code-point</h3>
509
510  <p>Each character is represented by an integer, called a
511  <strong>code-point</strong>. To get the code-point of a character, we can
512  use the <code>Hoa\Ustring\Ustring::toCode</code> static method, and to get a
513  character based on its code-point, we can use the
514  <code>Hoa\Ustring\Ustring::fromCode</code> static method. We also have the
515  <code>Hoa\Ustring\Ustring::toBinaryCode</code> method which returns the binary
516  representation of a character. Let's take an example:</p>
517  <pre><code class="language-php">var_dump(
518    Hoa\Ustring\Ustring::toCode('Σ'),
519    Hoa\Ustring\Ustring::toBinaryCode('Σ'),
    Hoa\Ustring\Ustring::fromCode(0x3a3)
521);
522
523/**
524 * Will output:
525 *     int(931)
 *     string(16) "1100111010100011"
527 *     string(2) "Σ"
528 */</code></pre>
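  <p>These values can be cross-checked with built-in functions. The sketch
  below assumes PHP 7.2 or later with the <code>mbstring</code> extension (for
  <code>mb_ord</code> and <code>mb_chr</code>) and builds the binary
  representation by hand, 8 bits per byte of the UTF-8 encoding; it is not the
  library's implementation:</p>
  <pre><code class="language-php">$sigma = "\u{3a3}"; // Σ, GREEK CAPITAL LETTER SIGMA, encoded in UTF-8

// Binary representation: 8 bits for each byte of the UTF-8 encoding.
$binary = '';

foreach (str_split($sigma) as $byte) {
    $binary .= sprintf('%08b', ord($byte));
}

var_dump(
    mb_ord($sigma, 'UTF-8'),
    $binary,
    mb_chr(0x3a3, 'UTF-8')
);

/**
 * Will output:
 *     int(931)
 *     string(16) "1100111010100011"
 *     string(2) "Σ"
 */</code></pre>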
529
530  <h2 id="Search_algorithms" for="main-toc">Search algorithms</h2>
531
532  <p>The <code>Hoa\Ustring</code> library provides sophisticated
533  <strong>search</strong> algorithms on strings through the
534  <code>Hoa\Ustring\Search</code> class.</p>
535  <p>We will study the <code>Hoa\Ustring\Search::approximated</code> algorithm
536  which searches a sub-string in a string up to <strong><em>k</em>
537  differences</strong> (a difference is an addition, a deletion or a
538  modification). Let's take the classical example of a DNA representation: We
539  will search all the sub-strings approximating <code>GATAA</code> with
540  1 difference (maximum) in <code>CAGATAAGAGAA</code>. So, we will write:</p>
541  <pre><code class="language-php">$x      = 'GATAA';
542$y      = 'CAGATAAGAGAA';
543$k      = 1;
544$search = Hoa\Ustring\Search::approximated($y, $x, $k);
545$n      = count($search);
546
547echo 'Try to match ', $x, ' in ', $y, ' with at most ', $k, ' difference(s):', "\n";
548echo $n, ' match(es) found:', "\n";
549
550foreach ($search as $position) {
    echo '    • ', substr($y, $position['i'], $position['l']), "\n";
552}
553
554/**
555 * Will output:
556 *     Try to match GATAA in CAGATAAGAGAA with at most 1 difference(s):
557 *     4 match(es) found:
558 *         • AGATA
559 *         • GATAA
560 *         • ATAAG
561 *         • GAGAA
562 */</code></pre>
  <p>This method returns an array of arrays. Each sub-array represents a result
  and contains three indexes: <code>i</code> for the position of the first
  character (byte) of the result, <code>j</code> for the position of the last
  character and <code>l</code> for the length of the result (simply
  <code>j</code> - <code>i</code>). Thus, we can compute the results by using
  our initial string (here <code class="language-php">$y</code>) and its
  indexes.</p>
570  <p>With our example, we have four results. The first is <code>AGATA</code>,
571  being <code>GATA<em>A</em></code> with one moved character, and
572  <code>AGATA</code> exists in <code>C<em>AGATA</em>AGAGAA</code>.  The second
573  result is <code>GATAA</code>, our sub-string, which well and truly exists in
574  <code>CA<em>GATAA</em>GAGAA</code>. The third result is <code>ATAAG</code>,
575  being <code><em>G</em>ATAA</code> with one moved character, and
576  <code>ATAAG</code> exists in <code>CAG<em>ATAAG</em>AGAA</code>. Finally, the
577  last result is <code>GAGAA</code>, being <code>GA<em>T</em>AA</code> with one
578  modified character, and <code>GAGAA</code> exists in
579  <code>CAGATAA<em>GAGAA</em></code>.</p>
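  <p>To give an intuition of a search with at most <em>k</em> differences, here
  is a sketch of a classical dynamic programming approach (often attributed to
  Sellers). It is an assumption made for illustration, not necessarily the
  algorithm used by <code>Hoa\Ustring\Search</code>: the hypothetical
  <code>approximateEnds</code> function works on bytes and only reports the end
  positions of the results with their number of differences:</p>
  <pre><code class="language-php">// Hypothetical function, not the library's code. Returns an array mapping each
// end position in $text (number of bytes read so far) to the smallest number of
// differences between $pattern and a sub-string ending there, if at most $k.
function approximateEnds(string $text, string $pattern, int $k): array
{
    $m = strlen($pattern);
    $n = strlen($text);
    // $column[$i]: differences between the first $i bytes of $pattern and
    // the best sub-string of $text ending at the current position.
    $column = range(0, $m);
    $ends   = [];

    for ($j = 1; $n >= $j; ++$j) {
        $previous  = $column;
        $column[0] = 0; // a result may start anywhere in $text

        for ($i = 1; $m >= $i; ++$i) {
            $cost       = $pattern[$i - 1] === $text[$j - 1] ? 0 : 1;
            $column[$i] = min(
                $previous[$i - 1] + $cost, // same byte, or a modification
                $previous[$i] + 1,         // an extra byte in $text
                $column[$i - 1] + 1        // a missing byte in $text
            );
        }

        if ($k >= $column[$m]) {
            $ends[$j] = $column[$m];
        }
    }

    return $ends;
}

$ends = approximateEnds('CAGATAAGAGAA', 'GATAA', 1);
var_dump($ends[7], $ends[12]);

/**
 * Will output:
 *     int(0)
 *     int(1)
 */</code></pre>
  <p>The result ending after the 7<sup>th</sup> byte is the exact occurrence of
  <code>GATAA</code> (0 difference), and the one ending after the
  12<sup>th</sup> byte is <code>GAGAA</code> (1 modification).</p>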
580  <p>Another example, more concrete this time. We will consider the
581  <code>--testIt --foobar --testThat --testAt</code> string (which represents
582  possible options of a command line), and we will search <code>--testot</code>,
583  an option that should have been given by the user. This option does not exist
584  as it is. We will then use our search algorithm with at most 1 difference.
585  Let's see:</p>
586  <pre><code class="language-php">$x      = 'testot';
587$y      = '--testIt --foobar --testThat --testAt';
588$k      = 1;
589$search = Hoa\Ustring\Search::approximated($y, $x, $k);
590$n      = count($search);
591
592// …
593
594/**
595 * Will output:
596 *     Try to match testot in --testIt --foobar --testThat --testAt with at most 1 difference(s)
597 *     2 match(es) found:
598 *         • testIt
599 *         • testAt
600 */</code></pre>
  <p>The <code>testIt</code> and <code>testAt</code> results are true options,
  so we can suggest them to the user. This is a mechanism used by
  <code>Hoa\Console</code> to suggest corrections to the user in case of a
  mistyping.</p>
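  <p>When the candidate options are already known as a list, a simpler and
  different technique gives a similar suggestion mechanism: comparing each
  whole option with the typed one thanks to PHP's built-in
  <code>levenshtein</code> function. The <code>suggest</code> helper below is
  hypothetical; it is not the approximated search described above, nor the code
  of <code>Hoa\Console</code>:</p>
  <pre><code class="language-php">// Hypothetical helper: keep the options whose edit distance to $typed is
// at most $k. This compares whole words, not sub-strings.
function suggest(string $typed, array $options, int $k = 1): array
{
    return array_values(array_filter(
        $options,
        function (string $option) use ($typed, $k): bool {
            return $k >= levenshtein($typed, $option);
        }
    ));
}

print_r(suggest('testot', ['testIt', 'foobar', 'testThat', 'testAt']));

/**
 * Will output:
 *     Array
 *     (
 *         [0] => testIt
 *         [1] => testAt
 *     )
 */</code></pre>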
605
606  <h2 id="Conclusion" for="main-toc">Conclusion</h2>
607
  <p>The <code>Hoa\Ustring</code> library provides facilities to manipulate
  strings encoded with the Unicode format, but also to perform sophisticated
  searches on strings.</p>
611
612</yield>
613</overlay>
614