1<?xml version="1.0" encoding="utf-8"?>
2<!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN"
3"http://www.boost.org/tools/boostbook/dtd/boostbook.dtd">
4
5
6<!-- Copyright (c) 2002-2006 Pavol Droba.
7     Subject to the Boost Software License, Version 1.0.
8     (See accompanying file LICENSE_1_0.txt or  http://www.boost.org/LICENSE_1_0.txt)
9-->
10
11
12<section id="string_algo.usage" last-revision="$Date$">
13    <title>Usage</title>
14
15    <using-namespace name="boost"/>
16    <using-namespace name="boost::algorithm"/>
17
18
19    <section>
20        <title>First Example</title>
21
22        <para>
23            Using the algorithms is straightforward. Let us have a look at the first example:
24        </para>
25        <programlisting>
26    #include &lt;boost/algorithm/string.hpp&gt;
27    using namespace std;
28    using namespace boost;
29
30    // ...
31
32    string str1(" hello world! ");
33    to_upper(str1);  // str1 == " HELLO WORLD! "
34    trim(str1);      // str1 == "HELLO WORLD!"
35
36    string str2=
37       to_lower_copy(
38          ireplace_first_copy(
39             str1,"hello","goodbye")); // str2 == "goodbye world!"
40        </programlisting>
41        <para>
            This example converts str1 to upper case and trims spaces from the start and the end
            of the string. str2 is then created as a lower-case copy of str1 in which the first
            occurrence of "hello", matched case-insensitively, is replaced with "goodbye".
            This example demonstrates several important concepts used in the library:
45        </para>
46        <itemizedlist>
47            <listitem>
48                <para><emphasis role="bold">Container parameters:</emphasis>
                    Unlike the STL algorithms, the algorithms in this library do not take their
                    input only as a pair of iterators. The STL convention allows for great flexibility,
                    but it has several limitations. Because a container is passed as two parameters,
                    the return value of one algorithm cannot be passed directly to another, so it is
                    not possible to <emphasis>stack</emphasis> algorithms together. It is also considerably easier to write
                    <code>to_lower(str1)</code> than <code>to_lower(str1.begin(), str1.end())</code>.
55                </para>
56                <para>
57                    The magic of <ulink url="../../libs/range/index.html">Boost.Range</ulink>
58                    provides a uniform way of handling different string types.
59                    If there is a need to pass a pair of iterators,
60                    <ulink url="../../libs/range/doc/html/range/reference/utilities/iterator_range.html"><code>boost::iterator_range</code></ulink>
61                    can be used to package iterators into a structure with a compatible interface.
62                </para>
63            </listitem>
64            <listitem>
65                <para><emphasis role="bold">Copy vs. Mutable:</emphasis>
                    Many algorithms in the library perform a transformation of the input.
                    The transformation can be done in-place, mutating the input sequence, or a copy
                    of the transformed input can be created, leaving the input intact. Neither
                    approach is superior to the other; each has its own
                    advantages and disadvantages. For this reason, both are provided by the library.
71                </para>
72            </listitem>
73            <listitem>
74                <para><emphasis role="bold">Algorithm stacking:</emphasis>
                    Copy versions return the transformed input as a result, which allows simple chaining of
                    transformations within one expression (e.g. one can write <code>trim_copy(to_upper_copy(s))</code>).
                    Mutable versions return <code>void</code>, to avoid misuse.
78                </para>
79            </listitem>
80            <listitem>
81                <para><emphasis role="bold">Naming:</emphasis>
82                    Naming follows the conventions from the Standard C++ Library. If there is a
83                    copy and a mutable version of the same algorithm, the mutable version has no suffix
84                    and the copy version has the suffix <emphasis>_copy</emphasis>.
85                    Some algorithms have the prefix <emphasis>i</emphasis>
86                    (e.g. <functionname>ifind_first()</functionname>).
87                    This prefix identifies that the algorithm works in a case-insensitive manner.
88                </para>
89            </listitem>
90        </itemizedlist>
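        <para>
            As a minimal sketch of the iterator packaging mentioned under <emphasis>Container parameters</emphasis>,
            the following helper applies a mutable algorithm to only a part of a string.
            The name <code>upper_prefix</code> is hypothetical and chosen for illustration only;
            <code>boost::make_iterator_range</code> comes from Boost.Range.
        </para>
        <programlisting>
    #include &lt;boost/algorithm/string.hpp&gt;
    #include &lt;boost/range/iterator_range.hpp&gt;
    #include &lt;cstddef&gt;
    #include &lt;string&gt;

    // Hypothetical helper: upper-cases only the first n characters of s.
    std::string upper_prefix(std::string s, std::size_t n)
    {
        if (n &gt; s.size()) n = s.size();

        // Package a pair of iterators into a range the algorithm accepts.
        boost::iterator_range&lt;std::string::iterator&gt; prefix =
            boost::make_iterator_range(s.begin(), s.begin() + n);

        boost::to_upper(prefix); // mutates s through the range
        return s;
    }
        </programlisting>
        <para>
            With this helper, <code>upper_prefix("hello world", 5)</code> yields <code>"HELLO world"</code>.
        </para>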
91        <para>
92            To use the library, include the <headername>boost/algorithm/string.hpp</headername> header.
93            If the regex related functions are needed, include the
94            <headername>boost/algorithm/string_regex.hpp</headername> header.
95        </para>
96    </section>
97    <section>
98        <title>Case conversion</title>
99
100        <para>
            STL has a nice way of converting character case. Unfortunately, it works only
            for a single character, and we want to convert a whole string:
103        </para>
104        <programlisting>
105    string str1("HeLlO WoRld!");
106    to_upper(str1); // str1=="HELLO WORLD!"
107        </programlisting>
108        <para>
            <functionname>to_upper()</functionname> and <functionname>to_lower()</functionname> convert the case of
            characters in a string using a specified locale (by default, a copy of the current global locale).
111        </para>
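        <para>
            The locale can also be passed explicitly. The sketch below assumes that the classic "C"
            locale is the desired one; the function name <code>shout</code> is hypothetical.
        </para>
        <programlisting>
    #include &lt;boost/algorithm/string/case_conv.hpp&gt;
    #include &lt;locale&gt;
    #include &lt;string&gt;

    // Hypothetical helper: upper-case copy of s, independent of the global locale.
    std::string shout(const std::string&amp; s)
    {
        // The second argument selects the locale used for the conversion.
        return boost::to_upper_copy(s, std::locale::classic());
    }
        </programlisting>
        <para>
            For example, <code>shout("HeLlO WoRld!")</code> returns <code>"HELLO WORLD!"</code>.
        </para>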
112        <para>
113            For more information see the reference for <headername>boost/algorithm/string/case_conv.hpp</headername>.
114        </para>
115    </section>
116    <section>
117        <title>Predicates and Classification</title>
118        <para>
119            A part of the library deals with string related predicates. Consider this example:
120        </para>
121        <programlisting>
    bool is_executable( const string&amp; filename )
123    {
124        return
125            iends_with(filename, ".exe") ||
126            iends_with(filename, ".com");
127    }
128
129    // ...
130    string str1("command.com");
131    cout
132        &lt;&lt; str1
        &lt;&lt; (is_executable(str1)? " is": " is not")
        &lt;&lt; " an executable"
135        &lt;&lt; endl; // prints "command.com is an executable"
136
137    //..
138    char text1[]="hello";
139    cout
140        &lt;&lt; text1
141        &lt;&lt; (all( text1, is_lower() )? " is": " is not")
142        &lt;&lt; " written in the lower case"
143        &lt;&lt; endl; // prints "hello is written in the lower case"
144        </programlisting>
145        <para>
            The predicates determine whether a substring is contained in the input string
            under various conditions. The conditions are: the string starts with the substring,
            ends with the substring,
            simply contains the substring, or both strings are equal. See the reference for
150            <headername>boost/algorithm/string/predicate.hpp</headername> for more details.
151        </para>
152        <para>
153            Note that if we had used "hello world" as the input to the test, it would have
154            output "hello world is not written in the lower case" because the space in the
155            input string is not a lower case letter.
156        </para>
157        <para>
            In addition, the algorithm <functionname>all()</functionname> checks
            whether all elements of a container satisfy a condition specified by a predicate.
            This predicate can be any unary predicate, but the library provides a number of
            useful string-related predicates and combinators ready for use.
            These are located in the <headername>boost/algorithm/string/classification.hpp</headername> header.
            Classification predicates can be combined using logical combinators to form
            more complex expressions. For example: <code>is_from_range('a','z') || is_digit()</code>
165        </para>
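        <para>
            A minimal sketch of such a combined predicate used with <functionname>all()</functionname>
            might look as follows. The function name <code>is_simple_id</code> is hypothetical.
        </para>
        <programlisting>
    #include &lt;boost/algorithm/string/classification.hpp&gt;
    #include &lt;boost/algorithm/string/predicate.hpp&gt;
    #include &lt;string&gt;

    // Hypothetical helper: true if every character is in 'a'..'z' or is a digit.
    bool is_simple_id(const std::string&amp; s)
    {
        using namespace boost::algorithm;
        // operator|| combines two classification predicates into one.
        return all(s, is_from_range('a', 'z') || is_digit());
    }
        </programlisting>
        <para>
            Here <code>is_simple_id("abc123")</code> is true, while <code>is_simple_id("abc 123")</code>
            is false because the space satisfies neither predicate.
        </para>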
166    </section>
167    <section>
168        <title>Trimming</title>
169
170        <para>
171            When parsing the input from a user, strings often have unwanted leading or trailing
172            characters. To get rid of them, we need trim functions:
173        </para>
174        <programlisting>
175    string str1="     hello world!     ";
176    string str2=trim_left_copy(str1);   // str2 == "hello world!     "
177    string str3=trim_right_copy(str1);  // str3 == "     hello world!"
178    trim(str1);                         // str1 == "hello world!"
179
180    string phone="00423333444";
181    // remove leading 0 from the phone number
182    trim_left_if(phone,is_any_of("0")); // phone == "423333444"
183        </programlisting>
184        <para>
            It is possible to trim the spaces on the right, on the left or on both sides of a string.
            For those cases when something other than blank space needs to be removed, there
            are <emphasis>_if</emphasis> variants. Using these, a user can specify a functor which
            selects the <emphasis>space</emphasis> to be removed. It is possible to use classification
            predicates like <functionname>is_digit()</functionname> mentioned in the previous section.
            See the reference for <headername>boost/algorithm/string/trim.hpp</headername>.
191        </para>
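        <para>
            As an illustration, the following sketch uses a classification predicate with an
            <emphasis>_if</emphasis> variant. The function name <code>strip_digits</code> is hypothetical.
        </para>
        <programlisting>
    #include &lt;boost/algorithm/string/classification.hpp&gt;
    #include &lt;boost/algorithm/string/trim.hpp&gt;
    #include &lt;string&gt;

    // Hypothetical helper: removes leading and trailing digits, keeps inner ones.
    std::string strip_digits(const std::string&amp; s)
    {
        return boost::trim_copy_if(s, boost::is_digit());
    }
        </programlisting>
        <para>
            For example, <code>strip_digits("12a3b45")</code> returns <code>"a3b"</code>.
        </para>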
192    </section>
193    <section>
194        <title>Find algorithms</title>
195
196        <para>
197            The library contains a set of find algorithms. Here is an example:
198        </para>
199        <programlisting>
200    char text[]="hello dolly!";
201    iterator_range&lt;char*&gt; result=find_last(text,"ll");
202
203    transform( result.begin(), result.end(), result.begin(), bind2nd(plus&lt;char&gt;(), 1) );
    // text == "hello dommy!"
205
206    to_upper(result); // text == "hello doMMy!"
207
208    // iterator_range is convertible to bool
    if(find_first(text, "hello"))
    {
        cout &lt;&lt; "hello is there" &lt;&lt; endl;
212    }
213        </programlisting>
214        <para>
            We have used <functionname>find_last()</functionname> to search the <code>text</code> for "ll".
            The result is given as a <ulink url="../../libs/range/doc/html/range/reference/utilities/iterator_range.html"><code>boost::iterator_range</code></ulink>.
            This range delimits the
            part of the input which satisfies the find criteria. In our example it is the last occurrence of "ll".
            As we can see, the input of the <functionname>find_last()</functionname> algorithm can also be
            a <code>char[]</code>, because this type is supported by
            <ulink url="../../libs/range/index.html">Boost.Range</ulink>.
            The following lines transform the result. Notice that
            <ulink url="../../libs/range/doc/html/range/reference/utilities/iterator_range.html"><code>boost::iterator_range</code></ulink> has the familiar
            <code>begin()</code> and <code>end()</code> methods, so it can be used like any other STL container.
            It is also convertible to bool, so it is easy to use find algorithms for simple containment checks.
228        </para>
229        <para>
230            Find algorithms are located in <headername>boost/algorithm/string/find.hpp</headername>.
231        </para>
232    </section>
233    <section>
234        <title>Replace Algorithms</title>
235        <para>
            Find algorithms can be used to search for a specific part of a string. Replace goes one step
            further. After a matching part is found, it is substituted with something else. The substitution is computed
            from the original, using some transformation.
239        </para>
240        <programlisting>
    string str1="Hello  Dolly,   Hello World!";
    replace_first(str1, "Dolly", "Jane");      // str1 == "Hello  Jane,   Hello World!"
    replace_last(str1, "Hello", "Goodbye");    // str1 == "Hello  Jane,   Goodbye World!"
    erase_all(str1, " ");                      // str1 == "HelloJane,GoodbyeWorld!"
    erase_head(str1, 5);                       // str1 == "Jane,GoodbyeWorld!"
246        </programlisting>
247        <para>
248            For the complete list of replace and erase functions see the
249            <link linkend="string_algo.reference">reference</link>.
            There are many predefined functions for common usage; however, the library also allows you to
            define a custom <code>replace()</code> that suits a specific need. There is a generic <functionname>find_format()</functionname>
            function which, besides the input, takes two parameters.
            The first one is a <link linkend="string_algo.finder_concept">Finder</link> object, the second one is
            a <link linkend="string_algo.formatter_concept">Formatter</link> object.
            The Finder object is a functor which searches for the part to be replaced. The Formatter object
            takes the result of the Finder (usually a reference to the found substring) and creates a
            substitute for it. The replace algorithm puts these two together and makes the desired substitution.
258        </para>
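        <para>
            The following sketch combines a Finder and a Formatter by hand. It assumes that a
            case-insensitive search for "dolly" and a fixed replacement text are wanted;
            the function name <code>censor</code> is hypothetical.
        </para>
        <programlisting>
    #include &lt;boost/algorithm/string/compare.hpp&gt;
    #include &lt;boost/algorithm/string/find_format.hpp&gt;
    #include &lt;boost/algorithm/string/finder.hpp&gt;
    #include &lt;boost/algorithm/string/formatter.hpp&gt;
    #include &lt;string&gt;

    // Hypothetical helper: replaces every "dolly" (any case) with "***".
    std::string censor(const std::string&amp; s)
    {
        using namespace boost::algorithm;
        return find_format_all_copy(
            s,
            first_finder("dolly", is_iequal()), // Finder: locates each match
            const_formatter("***"));            // Formatter: supplies the substitute
    }
        </programlisting>
        <para>
            For example, <code>censor("Hello Dolly, hello DOLLY!")</code> returns
            <code>"Hello ***, hello ***!"</code>.
        </para>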
259        <para>
260            Check <headername>boost/algorithm/string/replace.hpp</headername>, <headername>boost/algorithm/string/erase.hpp</headername> and
261            <headername>boost/algorithm/string/find_format.hpp</headername> for reference.
262        </para>
263    </section>
264    <section>
265        <title>Find Iterator</title>
266
267        <para>
            An extension to find algorithms is the Find Iterator. Instead of searching for just one part of a string,
            the find iterator allows us to iterate over the substrings matching the specified criteria.
            This facility uses the <link linkend="string_algo.finder_concept">Finder</link> to incrementally
            search the string.
            Dereferencing a find iterator yields a <ulink url="../../libs/range/doc/html/range/reference/utilities/iterator_range.html"><code>boost::iterator_range</code></ulink>
            object that delimits the current match.
274        </para>
275        <para>
            There are two iterators provided: <classname>find_iterator</classname> and
277            <classname>split_iterator</classname>. The former iterates over substrings that are found using the specified
278            Finder. The latter iterates over the gaps between these substrings.
279        </para>
280        <programlisting>
281    string str1("abc-*-ABC-*-aBc");
282    // Find all 'abc' substrings (ignoring the case)
283    // Create a find_iterator
284    typedef find_iterator&lt;string::iterator&gt; string_find_iterator;
285    for(string_find_iterator It=
286            make_find_iterator(str1, first_finder("abc", is_iequal()));
287        It!=string_find_iterator();
288        ++It)
289    {
290        cout &lt;&lt; copy_range&lt;std::string&gt;(*It) &lt;&lt; endl;
291    }
292
293    // Output will be:
294    // abc
295    // ABC
    // aBc
297
298    typedef split_iterator&lt;string::iterator&gt; string_split_iterator;
299    for(string_split_iterator It=
300        make_split_iterator(str1, first_finder("-*-", is_iequal()));
301        It!=string_split_iterator();
302        ++It)
303    {
304        cout &lt;&lt; copy_range&lt;std::string&gt;(*It) &lt;&lt; endl;
305    }
306
307    // Output will be:
308    // abc
309    // ABC
    // aBc
311        </programlisting>
312        <para>
            Note that the find iterators have only one template parameter: the base iterator type.
            The Finder is specified at runtime. This allows us to typedef a find iterator for
            common string types and reuse it. Additionally, the make_*_iterator functions help
            to construct a find iterator for a particular range.
317        </para>
318        <para>
319            See the reference in <headername>boost/algorithm/string/find_iterator.hpp</headername>.
320        </para>
321    </section>
322    <section>
323        <title>Split</title>
324
325        <para>
326            Split algorithms are an extension to the find iterator for one common usage scenario.
327            These algorithms use a find iterator and store all matches into the provided
328            container. This container must be able to hold copies (e.g. <code>std::string</code>) or
329            references (e.g. <code>iterator_range</code>) of the extracted substrings.
330        </para>
331        <para>
            Two algorithms are provided. <functionname>find_all()</functionname> finds all occurrences
            of a string in the input. <functionname>split()</functionname> splits the input into parts.
334        </para>
335
336        <programlisting>
337    string str1("hello abc-*-ABC-*-aBc goodbye");
338
339    typedef vector&lt; iterator_range&lt;string::iterator&gt; &gt; find_vector_type;
340
    find_vector_type FindVec; // #1: Find all occurrences of "abc"
342    ifind_all( FindVec, str1, "abc" ); // FindVec == { [abc],[ABC],[aBc] }
343
344    typedef vector&lt; string &gt; split_vector_type;
345
    split_vector_type SplitVec; // #2: Split into tokens at the separators
347    split( SplitVec, str1, is_any_of("-*"), token_compress_on ); // SplitVec == { "hello abc","ABC","aBc goodbye" }
348        </programlisting>
349        <para>
            In the comments above, a bracketed string such as <code>[abc]</code> designates an <code>iterator_range</code> delimiting that substring.
351        </para>
352        <para>
            The first example shows how to construct a container to hold references to all extracted
            substrings. The algorithm <functionname>ifind_all()</functionname> puts into FindVec references
            to all substrings that are equal to "abc" when compared case-insensitively.
356        </para>
357        <para>
            The second example uses <functionname>split()</functionname> to split the string str1 into parts
            separated by the characters '-' or '*'. These parts are then put into SplitVec.
            The last argument specifies whether adjacent separators are merged into one
            (<code>token_compress_on</code>) or not.
361        </para>
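        <para>
            The effect of the compression mode can be sketched as follows. The function name
            <code>split_fields</code> and its <code>compress</code> flag are hypothetical and serve only
            to contrast <code>token_compress_on</code> with <code>token_compress_off</code>.
        </para>
        <programlisting>
    #include &lt;boost/algorithm/string/classification.hpp&gt;
    #include &lt;boost/algorithm/string/split.hpp&gt;
    #include &lt;string&gt;
    #include &lt;vector&gt;

    // Hypothetical helper: splits s at '-' or '*', optionally merging adjacent separators.
    std::vector&lt;std::string&gt; split_fields(const std::string&amp; s, bool compress)
    {
        std::vector&lt;std::string&gt; parts;
        boost::split(parts, s, boost::is_any_of("-*"),
                     compress ? boost::token_compress_on
                              : boost::token_compress_off);
        return parts;
    }
        </programlisting>
        <para>
            For the input <code>"abc-*-ABC"</code>, compression yields two parts, <code>"abc"</code> and
            <code>"ABC"</code>. Without compression the result has four parts, because the two gaps
            between the three adjacent separators are reported as empty strings.
        </para>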
362        <para>
363            More information can be found in the reference: <headername>boost/algorithm/string/split.hpp</headername>.
364        </para>
365   </section>
366</section>
367