<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN"
"http://www.boost.org/tools/boostbook/dtd/boostbook.dtd">


<!-- Copyright (c) 2002-2006 Pavol Droba.
     Subject to the Boost Software License, Version 1.0.
     (See accompanying file LICENSE_1_0.txt or http://www.boost.org/LICENSE_1_0.txt)
-->


<section id="string_algo.usage" last-revision="$Date$">
    <title>Usage</title>

    <using-namespace name="boost"/>
    <using-namespace name="boost::algorithm"/>


    <section>
        <title>First Example</title>

        <para>
            Using the algorithms is straightforward. Let us have a look at the first example:
        </para>
        <programlisting>
    #include <boost/algorithm/string.hpp>
    using namespace std;
    using namespace boost;

    // ...

    string str1(" hello world! ");
    to_upper(str1);  // str1 == " HELLO WORLD! "
    trim(str1);      // str1 == "HELLO WORLD!"

    string str2=
       to_lower_copy(
          ireplace_first_copy(
             str1,"hello","goodbye")); // str2 == "goodbye world!"
        </programlisting>
        <para>
            This example converts str1 to upper case and trims spaces from the start and the end
            of the string. str2 is then created as a lower-case copy of str1 in which "hello"
            (matched case-insensitively) is replaced with "goodbye".
            This example demonstrates several important concepts used in the library:
        </para>
        <itemizedlist>
            <listitem>
                <para><emphasis role="bold">Container parameters:</emphasis>
                    Unlike in the STL algorithms, parameters are not restricted to pairs of iterators.
                    The STL convention allows for great flexibility,
                    but it has several limitations. Algorithms cannot be <emphasis>stacked</emphasis> together,
                    because a container is passed as two parameters, so the return value of one algorithm
                    cannot be used directly as the input of another. It is also considerably easier to write
                    <code>to_lower(str1)</code> than <code>to_lower(str1.begin(), str1.end())</code>.
                </para>
                <para>
                    <ulink url="../../libs/range/index.html">Boost.Range</ulink>
                    provides a uniform way of handling different string types.
                    If there is a need to pass a pair of iterators,
                    <ulink url="../../libs/range/doc/html/range/reference/utilities/iterator_range.html"><code>boost::iterator_range</code></ulink>
                    can be used to package the iterators into a structure with a compatible interface.
                </para>
            </listitem>
            <listitem>
                <para><emphasis role="bold">Copy vs. Mutable:</emphasis>
                    Many algorithms in the library perform a transformation of the input.
                    The transformation can be done in-place, mutating the input sequence, or a copy
                    of the transformed input can be created, leaving the input intact. Neither
                    possibility is superior to the other; each has its own
                    advantages and disadvantages. For this reason, both are provided by the library.
                </para>
            </listitem>
            <listitem>
                <para><emphasis role="bold">Algorithm stacking:</emphasis>
                    Copy versions return the transformed input as a result and thus allow simple chaining of
                    transformations within one expression (e.g. one can write <code>trim_copy(to_upper_copy(s))</code>).
                    Mutable versions return <code>void</code>, to avoid misuse.
                </para>
            </listitem>
            <listitem>
                <para><emphasis role="bold">Naming:</emphasis>
                    Naming follows the conventions from the Standard C++ Library. If there is a
                    copy and a mutable version of the same algorithm, the mutable version has no suffix
                    and the copy version has the suffix <emphasis>_copy</emphasis>.
                    Some algorithms have the prefix <emphasis>i</emphasis>
                    (e.g. <functionname>ifind_first()</functionname>).
                    This prefix indicates that the algorithm works in a case-insensitive manner.
                </para>
            </listitem>
        </itemizedlist>
        <para>
            To use the library, include the <headername>boost/algorithm/string.hpp</headername> header.
            If the regex-related functions are needed, include the
            <headername>boost/algorithm/string_regex.hpp</headername> header.
        </para>
    </section>
    <section>
        <title>Case conversion</title>

        <para>
            STL has a nice way of converting character case. Unfortunately, it works only
            for a single character, and we want to convert a whole string:
        </para>
        <programlisting>
    string str1("HeLlO WoRld!");
    to_upper(str1); // str1=="HELLO WORLD!"
        </programlisting>
        <para>
            <functionname>to_upper()</functionname> and <functionname>to_lower()</functionname> convert the case of
            characters in a string using a specified locale.
        </para>
        <para>
            For more information see the reference for <headername>boost/algorithm/string/case_conv.hpp</headername>.
        </para>
    </section>
    <section>
        <title>Predicates and Classification</title>
        <para>
            A part of the library deals with string-related predicates. Consider this example:
        </para>
        <programlisting>
    bool is_executable( const string& filename )
    {
        return
            iends_with(filename, ".exe") ||
            iends_with(filename, ".com");
    }

    // ...
    string str1("command.com");
    cout
        << str1
        << (is_executable(str1)? " is": " is not")
        << " an executable"
        << endl; // prints "command.com is an executable"

    //..
    char text1[]="hello";
    cout
        << text1
        << (all( text1, is_lower() )? " is": " is not")
        << " written in the lower case"
        << endl; // prints "hello is written in the lower case"
        </programlisting>
        <para>
            The predicates determine whether a substring is contained in the input string
            under various conditions. The conditions are: the string starts with the substring,
            ends with the substring,
            simply contains the substring, or both strings are equal. See the reference for
            <headername>boost/algorithm/string/predicate.hpp</headername> for more details.
        </para>
        <para>
            Note that if we had used "hello world" as the input to the test, it would have
            output "hello world is not written in the lower case" because the space in the
            input string is not a lower case letter.
        </para>
        <para>
            In addition, the algorithm <functionname>all()</functionname> checks whether
            all elements of a container satisfy a condition specified by a predicate.
            This predicate can be any unary predicate, but the library provides a number of
            useful string-related predicates and combinators ready for use.
            These are located in the <headername>boost/algorithm/string/classification.hpp</headername> header.
            Classification predicates can be combined using logical combinators to form
            more complex expressions. For example: <code>is_from_range('a','z') || is_digit()</code>
        </para>
    </section>
    <section>
        <title>Trimming</title>

        <para>
            When parsing input from a user, strings often have unwanted leading or trailing
            characters. To get rid of them, we need trim functions:
        </para>
        <programlisting>
    string str1=" hello world! ";
    string str2=trim_left_copy(str1);   // str2 == "hello world! "
    string str3=trim_right_copy(str1);  // str3 == " hello world!"
    trim(str1);                         // str1 == "hello world!"

    string phone="00423333444";
    // remove leading 0 from the phone number
    trim_left_if(phone,is_any_of("0")); // phone == "423333444"
        </programlisting>
        <para>
            It is possible to trim the spaces on the right, on the left or on both sides of a string.
            For those cases when there is a need to remove something other than blank space, there
            are <emphasis>_if</emphasis> variants. Using these, a user can specify a functor which
            selects the <emphasis>space</emphasis> to be removed. It is possible to use classification
            predicates like <functionname>is_digit()</functionname> mentioned in the previous section.
            See the reference for <headername>boost/algorithm/string/trim.hpp</headername>.
        </para>
    </section>
    <section>
        <title>Find algorithms</title>

        <para>
            The library contains a set of find algorithms. Here is an example:
        </para>
        <programlisting>
    char text[]="hello dolly!";
    iterator_range<char*> result=find_last(text,"ll");

    transform( result.begin(), result.end(), result.begin(), bind2nd(plus<char>(), 1) );
    // text == "hello dommy!"

    to_upper(result); // text == "hello doMMy!"

    // iterator_range is convertible to bool
    // (text was modified above, so "dolly" is no longer present and nothing is printed)
    if(find_first(text, "dolly"))
    {
        cout << "Dolly is there" << endl;
    }
        </programlisting>
        <para>
            We have used <functionname>find_last()</functionname> to search the <code>text</code> for "ll".
            The result is given as a <ulink url="../../libs/range/doc/html/range/reference/utilities/iterator_range.html"><code>boost::iterator_range</code></ulink>.
            This range delimits the
            part of the input which satisfies the find criteria. In our example it is the last occurrence of "ll".

            As we can see, the input of the <functionname>find_last()</functionname> algorithm can also be
            a <code>char[]</code>, because this type is supported by
            <ulink url="../../libs/range/index.html">Boost.Range</ulink>.

            The following lines transform the result. Notice that
            <ulink url="../../libs/range/doc/html/range/reference/utilities/iterator_range.html"><code>boost::iterator_range</code></ulink> has the familiar
            <code>begin()</code> and <code>end()</code> methods, so it can be used like any other STL container.
            It is also convertible to bool, so find algorithms are easy to use for simple containment checks.
        </para>
        <para>
            Find algorithms are located in <headername>boost/algorithm/string/find.hpp</headername>.
        </para>
    </section>
    <section>
        <title>Replace Algorithms</title>
        <para>
            Find algorithms can be used for searching for a specific part of a string. Replace goes one step
            further. After a matching part is found, it is substituted with something else. The substitution is computed
            from the original, using some transformation.
        </para>
        <programlisting>
    string str1="Hello Dolly, Hello World!";
    replace_first(str1, "Dolly", "Jane");      // str1 == "Hello Jane, Hello World!"
    replace_last(str1, "Hello", "Goodbye");    // str1 == "Hello Jane, Goodbye World!"
    erase_all(str1, " ");                      // str1 == "HelloJane,GoodbyeWorld!"
    erase_head(str1, 5);                       // str1 == "Jane,GoodbyeWorld!"
        </programlisting>
        <para>
            For the complete list of replace and erase functions see the
            <link linkend="string_algo.reference">reference</link>.
            There are many predefined functions for common usage; however, the library also allows you to
            define a custom <code>replace()</code> that suits a specific need. There is a generic <functionname>find_format()</functionname>
            function which, in addition to the input, takes two parameters.
            The first one is a <link linkend="string_algo.finder_concept">Finder</link> object, the second one is
            a <link linkend="string_algo.formatter_concept">Formatter</link> object.
            The Finder object is a functor which searches for the part to be replaced. The Formatter object
            takes the result of the Finder (usually a reference to the found substring) and creates a
            substitute for it. The replace algorithm puts these two together and makes the desired substitution.
        </para>
        <para>
            Check <headername>boost/algorithm/string/replace.hpp</headername>, <headername>boost/algorithm/string/erase.hpp</headername> and
            <headername>boost/algorithm/string/find_format.hpp</headername> for reference.
        </para>
    </section>
    <section>
        <title>Find Iterator</title>

        <para>
            An extension to find algorithms is the Find Iterator. Instead of searching for just one part of a string,
            the find iterator allows us to iterate over all substrings matching the specified criteria.
            This facility uses the <link linkend="string_algo.finder_concept">Finder</link> to incrementally
            search the string.
            Dereferencing a find iterator yields a <ulink url="../../libs/range/doc/html/range/reference/utilities/iterator_range.html"><code>boost::iterator_range</code></ulink>
            object that delimits the current match.
        </para>
        <para>
            There are two iterators provided: <classname>find_iterator</classname> and
            <classname>split_iterator</classname>. The former iterates over substrings that are found using the specified
            Finder. The latter iterates over the gaps between these substrings.
        </para>
        <programlisting>
    string str1("abc-*-ABC-*-aBc");
    // Find all 'abc' substrings (ignoring the case)
    // Create a find_iterator
    typedef find_iterator<string::iterator> string_find_iterator;
    for(string_find_iterator It=
            make_find_iterator(str1, first_finder("abc", is_iequal()));
        It!=string_find_iterator();
        ++It)
    {
        cout << copy_range<std::string>(*It) << endl;
    }

    // Output will be:
    // abc
    // ABC
    // aBc

    typedef split_iterator<string::iterator> string_split_iterator;
    for(string_split_iterator It=
        make_split_iterator(str1, first_finder("-*-", is_iequal()));
        It!=string_split_iterator();
        ++It)
    {
        cout << copy_range<std::string>(*It) << endl;
    }

    // Output will be:
    // abc
    // ABC
    // aBc
        </programlisting>
        <para>
            Note that the find iterators have only one template parameter. It is the base iterator type.
            The Finder is specified at runtime. This allows us to typedef a find iterator for
            common string types and reuse it.
            Additionally, the make_*_iterator functions help
            to construct a find iterator for a particular range.
        </para>
        <para>
            See the reference in <headername>boost/algorithm/string/find_iterator.hpp</headername>.
        </para>
    </section>
    <section>
        <title>Split</title>

        <para>
            Split algorithms are an extension to the find iterator for one common usage scenario.
            These algorithms use a find iterator and store all matches into the provided
            container. This container must be able to hold copies (e.g. <code>std::string</code>) or
            references (e.g. <code>iterator_range</code>) of the extracted substrings.
        </para>
        <para>
            Two algorithms are provided. <functionname>find_all()</functionname> finds all occurrences
            of a string in the input. <functionname>split()</functionname> splits the input into parts.
        </para>

        <programlisting>
    string str1("hello abc-*-ABC-*-aBc goodbye");

    typedef vector< iterator_range<string::iterator> > find_vector_type;

    find_vector_type FindVec; // #1: Find all occurrences of "abc" (ignoring the case)
    ifind_all( FindVec, str1, "abc" ); // FindVec == { [abc],[ABC],[aBc] }

    typedef vector< string > split_vector_type;

    split_vector_type SplitVec; // #2: Split the input at the separators
    split( SplitVec, str1, is_any_of("-*"), token_compress_on ); // SplitVec == { "hello abc","ABC","aBc goodbye" }
        </programlisting>
        <para>
            The notation <code>[abc]</code> designates an <code>iterator_range</code> delimiting that substring.
        </para>
        <para>
            The first example shows how to construct a container to hold references to all extracted
            substrings. The algorithm <functionname>ifind_all()</functionname> puts into FindVec references
            to all substrings that are equal to "abc" in a case-insensitive manner.
        </para>
        <para>
            The second example uses <functionname>split()</functionname> to split the string str1 into parts
            separated by the characters '-' or '*'. These parts are then put into the SplitVec.
            It is possible to specify whether adjacent separators are merged into one or not;
            the example above merges them by passing <code>token_compress_on</code>.
        </para>
        <para>
            More information can be found in the reference: <headername>boost/algorithm/string/split.hpp</headername>.
        </para>
    </section>
</section>