Redirected from Concatenation
|
A common representation is an array of character codes[?], occupying one byte (e.g. in ASCII code) or two bytes (e.g. in unicode) each. The length can be stored implicitly by using a special terminating character (often NUL, ASCII code 0) -- the C programming language uses this convention -- or explicitly, for example by prefixing the string with integer value (convention used in Pascal).
Here is an example of a NUL terminated string stored in a 10 byte buffer, along with its ASCII representation:
F | R | A | N | K | k | f | f | w | |
46 | 52 | 41 | 4E | 4B | 00 | 6B | 66 | 66 | 77 |
The length of a string in the above example 5 characters, but note that it occupies 6 bytes. Characters after the terminator do not form part of the representation; they may be either part of another string or just garbage.
Here is the equivalent Pascal string:
F | R | A | N | K | k | f | f | w | |
05 | 46 | 52 | 41 | 4E | 4B | 6B | 66 | 66 | 77 |
Of course, other representations are possible. Using trees and lists makes certain string operations, such as character insertions or deletions, more efficient.
A simple operation on strings is concatenation. Other common operations include searching a substring in a longer string, sorting a list of strings and parsing a string. Because there are so many practical uses for strings there are many associated algorithms with various tradeoffs.
Advanced string algorithms often employ complex mechanisms and data structures, among them suffix trees[?], finite state machines.
Strings are such a useful datatype that several languages have been designed in order to make string processing applications easy to write. Examples include:
Many UNIX utilities perform simple string manipulations and can be used to easily program some powerful string processing algorithms. Files and finite streams may be viewed as strings.
Recent scripting languages, including Perl, Python and Ruby, employ regular expressions to facilitate text operations.
In theoretical computer science, one starts with a non-empty finite set called the alphabet; strings are then defined as finite sequences of elements from the alphabet, including the empty sequence. The set of all strings over a given alphabet, together with string concatentation, then forms a monoid, in fact a free monoid. Formal languages, the central objects of study, are defined as subsets of this monoid.
wikipedia.org dumped 2003-03-17 with terodump