C++ string handling

The C++ programming language has support for string handling, mostly implemented in its standard library. The language standard specifies several string types, some inherited from C, some designed to make use of the language's features, such as classes and RAII. The most-used of these is std::string.
Since the initial versions of C++ had only the "low-level" C string handling functionality and conventions, multiple incompatible string handling classes have been designed over the years and are still used instead of std::string, so C++ programmers may need to handle multiple conventions in a single application.

History

The std::string type has been the main string datatype in standard C++ since 1998, but it was not always part of C++. From C, C++ inherited the convention of using null-terminated strings that are handled by a pointer to their first element, and a library of functions that manipulate such strings. In modern standard C++, a string literal such as "hello" still denotes a NUL-terminated array of characters.
Using C++ classes to implement a string type offers several benefits: automated memory management, a reduced risk of out-of-bounds accesses, and more intuitive syntax for string comparison and concatenation. Therefore, it was strongly tempting to create such a class. Over the years, C++ application, library and framework developers produced their own, incompatible string representations, such as the one in AT&T's Standard Components library or the CString type in Microsoft's MFC. While std::string standardized strings, legacy applications still commonly contain such custom string types, and libraries may expect C-style strings, making it "virtually impossible" to avoid using multiple string types in C++ programs and requiring programmers to decide on the desired string representation before starting a project.
In a 1991 retrospective on the history of C++, its inventor Bjarne Stroustrup called the lack of a standard string type in C++ 1.0 the worst mistake he made in its development; "the absence of those led to everybody re-inventing the wheel and to an unnecessary diversity in the most fundamental classes".

Implementation issues

The various vendors' string types have different implementation strategies and performance characteristics. In particular, some string types use a copy-on-write strategy, where an operation such as

string a = "hello!";
string b = a; // Copy constructor

does not actually copy the content of a to b; instead, both strings share their contents and a reference count on the content is incremented. The actual copying is postponed until a mutating operation, such as appending a character to either string, makes the strings' contents differ. Copy-on-write can significantly change the performance of code using strings, for better or for worse. Though std::string no longer uses it, many alternative string libraries still implement copy-on-write strings.
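The copy-on-write strategy described above can be sketched as follows. This is a minimal, single-threaded illustration and not any vendor's actual implementation: the class name CowString and its members are hypothetical, and the reference count is borrowed from std::shared_ptr rather than managed by hand.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical class for illustration only; single-threaded sketch.
class CowString {
public:
    explicit CowString(std::string text)
        : data_(std::make_shared<std::string>(std::move(text))) {}

    // Copying shares the buffer; only the reference count changes.
    CowString(const CowString&) = default;
    CowString& operator=(const CowString&) = default;

    // Read-only access never copies.
    const std::string& view() const { return *data_; }

    // A mutating operation first detaches from any other owners.
    void append(char c) {
        detach();
        data_->push_back(c);
    }

    // True while both objects still point at the same buffer.
    bool shares_with(const CowString& other) const {
        return data_ == other.data_;
    }

private:
    // Make a private copy of the contents if the buffer is shared.
    void detach() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::string>(*data_);
        }
    }

    std::shared_ptr<std::string> data_;
};
```

With this sketch, CowString b = a; leaves both objects sharing one buffer, and the first call to b.append() is what triggers the deferred copy.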
Some string implementations store 16-bit or 32-bit code points instead of bytes; this was intended to facilitate processing of Unicode text. Qt's QString is an example. However, it means that conversion to these types from std::string or from arrays of bytes is a slow and often lossy operation, dependent on the "locale", and can throw exceptions. Any processing advantages of 16-bit code units vanished when the variable-width UTF-16 encoding was introduced.
Third-party string implementations also differed considerably in the syntax to extract or compare substrings, or to perform searches in the text.

Standard string types

The std::string class has been the standard representation for a text string since C++98. The class provides some typical string operations like comparison, concatenation, find and replace, and a function for obtaining substrings. A std::string can be constructed from a C-style string, and a C-style string can also be obtained from one.
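A few of these operations are sketched below. The helper names make_greeting and c_length are hypothetical and exist only for illustration; the member functions they call (find, replace, c_str) and operator+ are part of std::string.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: builds a greeting from a C-style string, then
// replaces the first occurrence of "world" in it, if there is one.
std::string make_greeting(const char* c_name) {
    std::string name = c_name;            // construct from a C-style string
    std::string text = "hello, " + name;  // concatenation
    std::string::size_type pos = text.find("world");
    if (pos != std::string::npos) {
        text.replace(pos, 5, "C++");      // replace the 5 characters of "world"
    }
    return text;
}

// Hypothetical helper: length as seen by the C library through c_str(),
// which yields a null-terminated C-style string.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

Comparison uses the ordinary operators, so make_greeting("world") == "hello, C++" is a valid expression, and substr(0, 5) on the result yields the leading "hello".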
The individual units making up the string are of type char, at least 8 bits each. In modern usage these are often not "characters", but parts of a multibyte character encoding such as UTF-8.
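The difference between code units and characters can be made concrete with a small sketch. The helper count_code_points is hypothetical and assumes its input is valid UTF-8; it counts every byte that is not a UTF-8 continuation byte (bit pattern 10xxxxxx).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts UTF-8 code points by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the UTF-8 bytes of "café", written as "caf\xC3\xA9", size() reports 5 char units while the helper reports 4 code points.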
The copy-on-write strategy was deliberately allowed by the initial C++ Standard for std::string because it was deemed a useful optimization, and it was used by nearly all implementations. However, there were mistakes: in particular, operator[] returned a non-const reference in order to make it easy to port C in-place string manipulations. This allowed code such as the following, which shows that operator[] must make a copy even though it is almost always used only to examine the string and not modify it:

std::string original("aaaaaaa");
std::string string_copy = original; // make a copy
char* pointer = &string_copy[3]; // some tried to make operator[] return a "trick" class but this makes it complex
arbitrary_code_here(); // placeholder for any intervening code; no optimizations can fix this
*pointer = 'b'; // if operator[] did not copy, this would change original unexpectedly

This caused some implementations to abandon copy-on-write. It was also discovered that the overhead in multi-threaded applications due to the locking needed to examine or change the reference count was greater than the overhead of copying small strings on modern processors. The optimization was finally disallowed in C++11, with the result that even passing a std::string as an argument to a function, viz.

void print(std::string s) { std::cout << s; }

must be expected to perform a full copy of the string into newly allocated memory. The common idiom to avoid such copying is to pass as a const reference:

void print(const std::string& s) { std::cout << s; }

C++17 added a new class, std::string_view, that is only a pointer and a length referring to read-only data. This makes passing arguments far faster than either of the above examples:

void print(std::string_view s) { std::cout << s; }
...
std::string x =...;
print(x); // does not copy x.data()
print("this is a literal string"); // also does not copy the characters!
...
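That a std::string_view merely refers to existing characters can be checked with a short sketch. The helper first_word is hypothetical; it requires C++17 and relies on std::string_view::substr clamping its count to the end of the view.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: returns the part of `s` before the first space, or
// all of `s` if there is no space. The result refers to the caller's
// characters; nothing is copied or allocated.
std::string_view first_word(std::string_view s) {
    return s.substr(0, s.find(' '));
}
```

Because the returned view points into the caller's buffer, it must not outlive the string it was created from.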

Example usage

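#include <iostream>
#include <string>

int main() {
    std::string greeting = "hello";
    std::cout << greeting + ", world" << '\n';
}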

std::string is a typedef for a particular instantiation of the std::basic_string template class. Its definition is found in the <string> header:

typedef basic_string<char> string;

Thus std::string provides basic_string functionality for strings having elements of type char. There is a similar class, std::wstring, which consists of wchar_t, and is most often used to store UTF-16 text on Windows and UTF-32 on most Unix-like platforms. The C++ standard, however, does not impose any interpretation as Unicode code points or code units on these types and does not even guarantee that a wchar_t holds more bits than a char. To resolve some of the incompatibilities resulting from wchar_t's properties, C++11 added two new classes: std::u16string and std::u32string, which have the given number of bits per code unit on all platforms.
C++11 also added new string literals of 16-bit and 32-bit "characters" and syntax for putting Unicode code points into null-terminated strings.
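The literal prefixes and escapes can be sketched as follows. The function names are hypothetical; the u and U prefixes and the \u and \U universal character names are the C++11 syntax referred to above. The example also shows why UTF-16 is a variable-width encoding: a code point outside the Basic Multilingual Plane occupies two 16-bit code units.

```cpp
#include <string>

// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 needs a
// surrogate pair (two 16-bit code units) while UTF-32 needs one unit.
inline std::u16string smiley_utf16() { return u"\U0001F600"; }
inline std::u32string smiley_utf32() { return U"\U0001F600"; }

// \u00E9 is a universal character name for U+00E9 (e with acute accent),
// which fits in a single 16-bit code unit.
inline std::u16string cafe_utf16() { return u"caf\u00E9"; }
```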
A basic_string is guaranteed to be specializable for any type with a char_traits struct to accompany it. As of C++11, only the char, wchar_t, char16_t and char32_t specializations are required to be implemented in the standard library; any other types are implementation-defined. Each specialization is also a Standard Library container, and thus the Standard Library algorithms can be applied to the code units in strings.
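As a brief sketch of the last point, the hypothetical helpers below pass a string's iterators to std::count and std::reverse. Note that they operate on code units, so reversing a string that holds multibyte UTF-8 sequences would corrupt it.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: counts how many code units in `s` equal `unit`.
std::size_t count_unit(const std::string& s, char unit) {
    return static_cast<std::size_t>(std::count(s.begin(), s.end(), unit));
}

// Hypothetical helper: returns a copy of `s` with its code units reversed.
std::string reversed(std::string s) {
    std::reverse(s.begin(), s.end());
    return s;
}
```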

Critiques

The design of std::string has been held up as an example of monolithic design by Herb Sutter, who reckons that of the 103 member functions on the class in C++98, 71 could have been decoupled without loss of implementation efficiency.
