Most programming languages that have a string datatype provide some string functions, although there may be other low-level ways within each language to handle strings directly. In object-oriented languages, string functions are often implemented as properties and methods of string objects. In functional and list-based languages, a string is represented as a list (of character codes), so all list-manipulation procedures can be considered string functions. However, such languages may implement a subset of explicit string-specific functions as well.
For functions that manipulate strings, modern object-oriented languages such as C# and Java have immutable strings and return a copy (in newly allocated dynamic memory), while others, such as C, manipulate the original string unless the programmer copies the data to a new string. See, for example, Concatenation below.
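The distinction between returning a copy and modifying in place can be sketched in Python, whose str type is immutable. The variable names are illustrative only, and bytearray is used here merely as a stand-in for a mutable, C-like buffer.

```python
# Python strings are immutable: string methods return a new object.
original = "hello"
shouted = original.upper()   # new string; `original` is left untouched

# A bytearray is mutable, so it can be changed in place,
# loosely comparable to writing into a C character buffer.
buffer = bytearray(b"hello")
buffer[0] = ord("j")         # modifies the existing object
```

After running this, original is still "hello", while buffer has been changed in place.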
The most basic example of a string function is the length(string) function. This function returns the length of a string.
For example, length("hello world") would return 11.
Different languages often provide string functions with similar or identical syntax, parameters or results under different names. For example, in many languages the length function is written len(string). The list of common functions below aims to limit this confusion.
Common string functions (multi language reference)
String functions common to many languages are listed below, including the different names used. The list aims to help programmers find the equivalent function in a given language. Note that string concatenation and regular expressions are handled in separate pages. Statements in guillemets (« … ») are optional.
{ Example in Pascal }
var
  MyStr: string = 'Hello, World';
  MyChar: Char;
begin
  MyChar := MyStr[2]; // 'e'
# Example in ALGOL 68 #
"Hello, World"[2]; // 'e'
// Example in C
#include <stdio.h> // for printf
char MyStr[] = "Hello, World";
printf("%c", *(MyStr+1)); // 'e'
printf("%c", *(MyStr+7)); // 'W'
printf("%c", MyStr[11]); // 'd'
printf("%s", MyStr); // 'Hello, World'
printf("%s", "Hello(2), World(2)"); // 'Hello(2), World(2)'
// Example in C++
#include <iostream> // for "cout"
#include <string> // for "string" data type
using namespace std;
char MyStr1[] = "Hello(1), World(1)";
string MyStr2 = "Hello(2), World(2)";
cout << "Hello(3), World(3)"; // 'Hello(3), World(3)'
cout << MyStr2[6]; // '2'
cout << string(MyStr1).substr(5, 3); // '(1)' (a char array has no substr, so it is converted first)
// Example in C#
"Hello, World"[2]; // 'l'
# Example in Perl 5
substr("Hello, World", 1, 1); # 'e'
# Examples in Python
"Hello, World"[2] # 'l'
"Hello, World"[-3] # 'r'
# Example in Raku
"Hello, World".substr(1, 1); # 'e'
' Example in Visual Basic
Mid("Hello, World", 2, 1)
' Example in Visual Basic .NET
"Hello, World".Chars(2) ' "l"c
" Example in Smalltalk "
'Hello, World' at: 2. "$e"
// Example in Rust
"Hello, World".chars().nth(2); // Some('l')
Compare (integer result)
Definition
compare(string1,string2) returns integer.
Description
Compares two strings to each other. If they are equivalent, zero is returned. Otherwise, most of these routines return a positive or negative result according to whether string1 is lexicographically greater than or less than string2, respectively. The exceptions are the Scheme and Rexx routines, which return the index of the first mismatch, and Smalltalk, which answers a comparison code telling how the receiver sorts relative to the string parameter.
Format
Languages
IF string1<string2 THEN -1 ELSE ABS (string1>string2) FI
(stringOP? string1 string2), where OP can be any of =, -ci=, <, -ci<, >, -ci>, <=, -ci<=, >= and -ci>= (operators starting with '-ci' are case-insensitive)
(stringOP string1 string2), where OP can be any of =, -ci=, <>, -ci<>, <, -ci<, >, -ci>, <=, -ci<=, >= and -ci>= (operators starting with '-ci' are case-insensitive)
(stringOP string1 string2), where OP can be any of =, -equal, /=, -not-equal, <, -lessp, >, -greaterp, <=, -not-greaterp, >= and -not-lessp (the verbal operators are case-insensitive)
string1 OP string2, where OP can be any of -eq, -ceq, -ne, -cne, -lt, -clt, -gt, -cgt, -le, -cle, -ge, and -cge (operators starting with 'c' are case-sensitive)
string1 OP string2 is available in the syntax, but means comparison of the pointers pointing to the strings, not of the string contents. Use the Compare (integer result) function.
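As an illustration of how an integer-result comparison relates to the relational operators, here is a minimal Python sketch. The function name compare is hypothetical (Python 3 has no built-in of this kind), and the ordering is by code point, not by any locale-specific collation.

```python
def compare(string1: str, string2: str) -> int:
    """Return -1, 0 or 1 depending on how string1 sorts relative to string2.

    Hypothetical helper: ordering is by Unicode code point, not locale-aware.
    """
    # Booleans behave as 0 and 1 in arithmetic, so the difference
    # is 1, 0 or -1.
    return (string1 > string2) - (string1 < string2)
```

With this sketch, compare("abc", "abd") gives -1, compare("abc", "abc") gives 0, and compare("b", "a") gives 1.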
Concatenates (joins) two strings to each other, returning the combined string. In languages with mutable strings, such as C, the second string is actually appended to the first string and the mutated string is returned.
{ Example in Pascal }
'abc' + 'def'; // returns "abcdef"
// Example in C#
"abc" + "def"; // returns "abcdef"
' Example in Visual Basic
"abc" & "def" ' returns "abcdef"
"abc" + "def" ' returns "abcdef"
"abc" & Null ' returns "abc"
"abc" + Null ' returns Null
// Example in D
"abc" ~ "def"; // returns "abcdef"
;; Example in Common Lisp
(concatenate 'string "abc " "def " "ghi") ; returns "abc def ghi"
# Example in Perl 5
"abc" . "def"; # returns "abcdef"
"Perl " . 5; # returns "Perl 5"
# Example in Raku
"abc" ~ "def"; # returns "abcdef"
"Perl " ~ 6; # returns "Perl 6"
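For comparison, a short Python sketch of the same operation. Unlike the Perl and Raku examples, Python does not convert a number to a string implicitly, so the conversion is written out with str(); the variable names are illustrative.

```python
joined = "abc" + "def"                    # the + operator builds a new string
many = "".join(["abc ", "def ", "ghi"])   # join concatenates a whole sequence
label = "Python " + str(3)                # numbers must be converted explicitly
```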
Contains
Definition
contains(string,substring) returns boolean
Description
Returns whether string contains substring as a substring. This is equivalent to using Find and then detecting that it does not result in the failure condition listed in the third column of the Find section. However, some languages have a simpler way of expressing this test.
¢ Example in ALGOL 68 ¢
string in string("e", loc int, "Hello mate"); ¢ returns true ¢
string in string("z", loc int, "word"); ¢ returns false ¢
// Example in C#
"Hello mate".Contains("e"); // returns true
"word".Contains("z"); // returns false
# Example in Python
"e" in "Hello mate" # returns True
"z" in "word" # returns False
# Example in Raku
"Good morning!".contains('z') # returns False
"¡Buenos días!".contains('í'); # returns True
" Example in Smalltalk "
'Hello mate' includesSubstring: 'e' " returns true "
'word' includesSubstring: 'z' " returns false "
Equality
Tests if two strings are equal. See also the Compare sections above. Note that doing equality checks via a generic Compare with integer result is not only confusing for the programmer but is often a significantly more expensive operation; this is especially true when using "C-strings".
' Example in Visual Basic
"hello" = "world" ' returns false
# Examples in Perl 5
'hello' eq 'world' # returns 0
'hello' eq 'hello' # returns 1
# Examples in Raku
'hello' eq 'world' # returns False
'hello' eq 'hello' # returns True
# Example in Windows PowerShell
"hello" -eq "world" # returns false
⍝ Example in APL
'hello' ≡ 'world' ⍝ returns 0
Find
Definition
find(string,substring) returns integer
Description
Returns the position of the start of the first occurrence of substring in string. If the substring is not found most of these routines return an invalid index value – -1 where indexes are 0-based, 0 where they are 1-based – or some value to be interpreted as Boolean FALSE.
Related
instrrev
Format
Languages
If not found
string in string(substring, pos, string[startpos:])
'Hello mate' indexOfSubCollection: 'late' ifAbsent: [ self error ] "raises an exception"
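The two failure conventions described above (an invalid index versus an exception) both exist in Python, which makes it a convenient language for a small sketch. The variable names are illustrative only.

```python
text = "Hello mate"

first_e = text.find("e")      # 0-based position of the first occurrence
missing = text.find("late")   # find signals "not found" with -1

# str.index performs the same search but raises ValueError instead.
try:
    text.index("late")
    raised = False
except ValueError:
    raised = True
```

Here first_e is 1, missing is -1, and raised ends up True.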
Find character
Definition
find_character(string,char) returns integer
Description
Returns the position of the start of the first occurrence of the character char in string. If the character is not found most of these routines return an invalid index value – -1 where indexes are 0-based, 0 where they are 1-based – or some value to be interpreted as Boolean FALSE. This can be accomplished as a special case of #Find, with a string of one character; but it may be simpler or more efficient in many languages to locate just one character. Also, in many languages, characters and strings are different types, so it is convenient to have such a function.
returns BOOL: TRUE or FALSE, and position in REF INT pos.
instr(string, any char«,startpos») (char can contain more than one character, in which case the position of the first appearance of any of them is returned.)
// Examples in C#
"Hello mate".IndexOf('e'); // returns 1
"word".IndexOf('z') // returns -1
; Examples in Common Lisp
(position #\e "Hello mate") ; returns 1
(position #\z "word") ; returns NIL
^a Given a set of characters, SCAN returns the position of the first character found,[19] while VERIFY returns the position of the first character that does not belong to the set.[20]
// Example in C#
String.Format("My {0} costs {1:C2}", "pen", 19.99); // returns "My pen costs $19.99"
// Example in Object Pascal (Delphi)
Format('My %s costs $%2f', ['pen', 19.99]); // returns "My pen costs $19.99"
// Example in Java
String.format("My %s costs $%.2f", "pen", 19.99); // returns "My pen costs $19.99"
# Examples in Raku
sprintf "My %s costs \$%.2f", "pen", 19.99; # returns "My pen costs $19.99"
1.fmt("%04d"); # returns "0001"
# Example in Python
"My %s costs $%.2f" % ("pen", 19.99); # returns "My pen costs $19.99"
"My {0} costs ${1:.2f}".format("pen", 19.99); # returns "My pen costs $19.99"
# Example in Python 3.6+
pen = "pen"
f"My {pen} costs {19.99}" # returns "My pen costs 19.99"
; Example in Scheme
(format "My ~a costs $~1,2F" "pen" 19.99) ; returns "My pen costs $19.99"
/* Example in PL/I */
put string(some_string) edit('My', 'pen', 'costs', 19.99)(a,a,a,p'$$$V.99')
/* returns "My pen costs $19.99" */
Inequality
Tests if two strings are not equal. See also #Equality.
Format
Languages
string1 ne string2, string1 NE string2
ALGOL 68 – note: the operator "ne" is literally in bold type-font.
Returns the leftmost n characters of a string. If n is greater than the length of the string then most implementations return the whole string (exceptions exist – see code examples). Note that for variable-length encodings such as UTF-8, UTF-16 or Shift-JIS, cutting at a fixed position can split a character, so it can be necessary to remove further string positions at the end in order to avoid invalid strings.
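A minimal Python sketch of both points follows. The helper names left and left_bytes are hypothetical; left works on code points, while left_bytes illustrates the encoding caveat by cutting UTF-8 bytes and discarding a trailing partial character.

```python
def left(string: str, n: int) -> str:
    """Return the first n characters; the whole string if n is too large."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return string[:n]


def left_bytes(string: str, n_bytes: int) -> str:
    """Keep at most n_bytes of the UTF-8 encoding (hypothetical helper).

    A multi-byte character cut in half at the end is dropped so that
    the result is still valid text.
    """
    truncated = string.encode("utf-8")[:n_bytes]
    return truncated.decode("utf-8", errors="ignore")
```

For instance, left("abcde", 8) returns "abcde", and left_bytes("h\u00e9llo", 2) returns "h", because the second byte is only the first half of the two-byte "é".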
Returns the length of a string (not counting the null terminator or any other of the string's internal structural information). An empty string returns a length of 0.
# Examples in Perl 5
length("hello"); # returns 5
length(""); # returns 0
# Examples in Raku
"🏳️🌈".chars; chars "🏳️🌈"; # both return 1
"🏳️🌈".codes; codes "🏳️🌈"; # both return 4
"".chars; chars ""; # both return 0
"".codes; codes ""; # both return 0
' Examples in Visual Basic
Len("hello") ' returns 5
Len("") ' returns 0
// Examples in Objective-C
[@"hello" length] // returns 5
[@"" length] // returns 0
-- Examples in Lua
("hello"):len() -- returns 5
#"" -- returns 0
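What "length" counts differs between languages, as the Raku example suggests. A brief Python sketch, in which the variable names are illustrative: len on a str counts code points, while the length of the encoded bytes depends on the encoding.

```python
word = "h\u00e9llo"                        # "héllo" with a precomposed é

code_points = len(word)                    # number of Unicode code points
utf8_bytes = len(word.encode("utf-8"))     # number of bytes once encoded
empty = len("")                            # an empty string has length 0
```

Here code_points is 5 and utf8_bytes is 6, since "é" takes two bytes in UTF-8.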
// Example in C#
"Wiki means fast?".ToLower(); // "wiki means fast?"
; Example in Scheme
(use-modules (srfi srfi-13))
(string-downcase "Wiki means fast?") ; "wiki means fast?"
/* Example in C */
#include <ctype.h>
#include <stdio.h>
int main(void) {
    char string[] = "Wiki means fast?";
    int i;
    for (i = 0; i < sizeof(string) - 1; ++i) {
        /* transform characters in place, one by one */
        string[i] = tolower(string[i]);
    }
    puts(string); /* "wiki means fast?" */
    return 0;
}
# Example in Raku
"Wiki means fast?".lc; # "wiki means fast?"
" Example in Smalltalk "
'hello' reversed " returns 'olleh' "
# Example in Perl 5
reverse "hello" # returns "olleh"
# Example in Raku
"hello".flip # returns "olleh"
# Example in Python
"hello"[::-1] # returns "olleh"
; Example in Scheme
(use-modules (srfi srfi-13))
(string-reverse "hello") ; returns "olleh"
rfind
Definition
rfind(string,substring) returns integer
Description
Returns the position of the start of the last occurrence of substring in string. If the substring is not found most of these routines return an invalid index value – -1 where indexes are 0-based, 0 where they are 1-based – or some value to be interpreted as Boolean FALSE.
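A short Python sketch of a right-to-left search, using the built-in str.rfind, which follows the 0-based convention of returning -1 on failure. The variable names are illustrative.

```python
text = "Hello mate"

last_e = text.rfind("e")     # position of the last "e"
first_e = text.find("e")     # position of the first "e", for contrast
missing = text.rfind("z")    # -1 when the substring does not occur
```

Here last_e is 9, first_e is 1, and missing is -1.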
Returns the rightmost n characters of a string. If n is greater than the length of the string then most implementations return the whole string (exceptions exist – see code examples).
; Examples in Scheme
(use-modules (srfi srfi-13))
(string-take-right "abcde" 3) ; returns "cde"
(string-take-right "abcde" 8) ; error
' Examples in Visual Basic
Right("sandroguidi", 3) ' returns "idi"
Right("sandroguidi", 100) ' returns "sandroguidi"
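In Python the same result can be sketched with a negative slice. The helper name right is hypothetical; the explicit check for zero is needed because string[-0:] would be the whole string, not an empty one.

```python
def right(string: str, n: int) -> str:
    """Return the last n characters; the whole string if n is too large."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ""           # string[-0:] would return everything
    return string[-n:]
```

With this sketch, right("sandroguidi", 3) returns "idi" and right("sandroguidi", 100) returns the whole string, matching the Visual Basic behaviour shown above.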
rpartition
Definition
<string>.rpartition(separator) Searches for the separator from right-to-left within the string then returns the sub-string before the separator; the separator; then the sub-string after the separator.
Description
Splits the given string by the right-most separator and returns the three substrings that together make the original.
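Python's built-in str.rpartition behaves as described, so it serves as a brief illustration; the sample strings and variable names are arbitrary.

```python
# Split on the right-most "." into (before, separator, after).
before, separator, after = "archive.tar.gz".rpartition(".")

# When the separator is absent, Python returns two empty strings
# followed by the original string.
no_match = "archive".rpartition(".")

# The three parts always concatenate back to the original string.
rebuilt = before + separator + after
```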
<string>.split(separator[, limit]) splits a string on separator, optionally only up to a limited number of substrings
Description
Splits the given string by occurrences of the separator (itself a string) and returns a list (or array) of the substrings. If limit is given, after limit – 1 separators have been read, the rest of the string is made into the last substring, regardless of whether it has any separators in it. The Scheme and Erlang implementations are similar but differ in several ways. JavaScript also differs in that it truncates: it does not put the rest of the string into the last element. The Cobra implementation will default to whitespace. Opposite of join.
// Example in C#
"abc,defgh,ijk".Split(','); // {"abc", "defgh", "ijk"}
"abc,defgh;ijk".Split(',', ';'); // {"abc", "defgh", "ijk"}
% Example in Erlang
string:tokens("abc;defgh;ijk", ";"). % ["abc", "defgh", "ijk"]
// Examples in Java
"abc,defgh,ijk".split(","); // {"abc", "defgh", "ijk"}
"abc,defgh;ijk".split(",|;"); // {"abc", "defgh", "ijk"}
{ Example in Pascal }
var
  lStrings: TStringList;
  lStr: string;
begin
  lStrings := TStringList.Create;
  lStrings.Delimiter := ',';
  lStrings.DelimitedText := 'abc,defgh,ijk';
  lStr := lStrings.Strings[0]; // 'abc'
  lStr := lStrings.Strings[1]; // 'defgh'
  lStr := lStrings.Strings[2]; // 'ijk'
end;
# Examples in Perl 5
split(/spam/, 'Spam eggs spam spam and ham'); # ('Spam eggs ', ' ', ' and ham')
split(/X/, 'Spam eggs spam spam and ham'); # ('Spam eggs spam spam and ham')
# Examples in Raku
'Spam eggs spam spam and ham'.split(/spam/); # (Spam eggs and ham)
split(/X/, 'Spam eggs spam spam and ham'); # (Spam eggs spam spam and ham)
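The limit semantics described above can be sketched in Python. Note one assumption made explicit here: Python's str.split takes a maximum number of splits rather than a maximum number of substrings, so the hypothetical helper split_limit subtracts one.

```python
def split_limit(string: str, separator: str, limit: int) -> list:
    """Split into at most `limit` substrings (hypothetical helper)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # str.split counts splits, not resulting substrings, hence limit - 1.
    return string.split(separator, limit - 1)


parts = "abc,defgh,ijk".split(",")               # no limit
limited = split_limit("abc,defgh,ijk", ",", 2)   # rest goes in the last item
rejoined = ",".join(parts)                       # join is the opposite of split
```

Here parts is ["abc", "defgh", "ijk"], limited is ["abc", "defgh,ijk"], and rejoined equals the original string.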
Returns a substring of string, either between startpos and endpos or starting at startpos with length numChars. The resulting string is truncated if there are fewer than numChars characters beyond the starting point. endpos represents the index after the last character in the substring. Note that for variable-length encodings such as UTF-8, UTF-16 or Shift-JIS, it can be necessary to remove string positions at the end, in order to avoid invalid strings.
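The two calling conventions can be sketched in Python with slices, which use 0-based positions and an exclusive end index. Both helper names are hypothetical.

```python
def substring_by_end(string: str, startpos: int, endpos: int) -> str:
    """Characters from startpos up to, but not including, endpos."""
    return string[startpos:endpos]


def substring_by_length(string: str, startpos: int, num_chars: int) -> str:
    """At most num_chars characters starting at startpos."""
    # Slicing silently truncates when fewer characters remain.
    return string[startpos:startpos + num_chars]
```

For example, substring_by_end("Hello, World", 7, 12) and substring_by_length("Hello, World", 7, 100) both return "World"; the second call is truncated because only five characters follow position 7.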
// Example in C#
"Wiki means fast?".ToUpper(); // "WIKI MEANS FAST?"
# Example in Perl 5
uc("Wiki means fast?"); # "WIKI MEANS FAST?"
# Example in Raku
uc("Wiki means fast?"); # "WIKI MEANS FAST?"
"Wiki means fast?".uc; # "WIKI MEANS FAST?"
/* Example in Rexx */
translate("Wiki means fast?") /* "WIKI MEANS FAST?" */
/* Example #2 */
A='This is an example.'
UPPER A /* "THIS IS AN EXAMPLE." */
/* Example #3 */
A='upper using Translate Function.'
Translate UPPER VAR A Z /* Z="UPPER USING TRANSLATE FUNCTION." */
; Example in Scheme
(use-modules (srfi srfi-13))
(string-upcase "Wiki means fast?") ; "WIKI MEANS FAST?"
' Example in Visual Basic
UCase("Wiki means fast?") ' "WIKI MEANS FAST?"
There is no standard trim function in C or C++. Most of the available string libraries[55] for C contain code which implements trimming, or functions that significantly ease an efficient implementation. The function has also often been called EatWhitespace in some non-standard C libraries.
In C, programmers often combine an ltrim and an rtrim function to implement trim.
Boost's function named simply trim modifies the input sequence in place and returns no result.
Another open-source C++ library, Qt, has several trim variants, including a standard one:[57]
#include <QString>
trimmed = s.trimmed();
The Linux kernel also includes a strip function, strstrip(), since 2.6.18-rc1, which trims the string "in place". Since 2.6.33-rc1, the kernel uses strim() instead of strstrip() to avoid false warnings.[58]
may be interpreted as follows: f drops the preceding whitespace, and reverses the string. f is then again applied to its own output. Note that the type signature (the second line) is optional.
J
The trim algorithm in J is a functional description:
trim=.#~[:(+./\*.+./\.)' '&~:
That is: filter (#~) for non-space characters (' '&~:) between leading (+./\) and (*.) trailing (+./\.) spaces.
JavaScript
There is a built-in trim function in JavaScript 1.8.1 (Firefox 3.5 and later), and the ECMAScript 5 standard. In earlier versions it can be added to the String object's prototype as follows:
Perl 5 has no built-in trim function. However, the functionality is commonly achieved using regular expressions.
Example:
$string =~ s/^\s+//; # remove leading whitespace
$string =~ s/\s+$//; # remove trailing whitespace
or:
$string =~ s/^\s+|\s+$//g; # remove both leading and trailing whitespace
These examples modify the value of the original variable $string.
Also available for Perl is StripLTSpace in String::Strip from CPAN.
There are, however, two functions that are commonly used to strip whitespace from the end of strings, chomp and chop:
chop removes the last character from a string and returns it.
chomp removes the trailing newline character(s) from a string if present. (What constitutes a newline is $INPUT_RECORD_SEPARATOR dependent).
In Raku, the sister language of Perl, strings have a trim method.
Example:
$string = $string.trim; # remove leading and trailing whitespace
$string .= trim; # same thing
Tcl
The Tcl string command has three relevant subcommands: trim, trimright and trimleft. For each of those commands, an additional argument may be specified: a string that represents a set of characters to remove. The default is whitespace (space, tab, newline, carriage return).
Example of trimming vowels:
set string onomatopoeia
set trimmed [string trim $string aeiou] ;# result is nomatop
set r_trimmed [string trimright $string aeiou] ;# result is onomatop
set l_trimmed [string trimleft $string aeiou] ;# result is nomatopoeia
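The same idea of trimming an arbitrary character set can be sketched in Python, whose strip, rstrip and lstrip methods accept an optional set of characters and default to whitespace. The variable names mirror the Tcl example and are illustrative.

```python
word = "onomatopoeia"

trimmed = word.strip("aeiou")      # remove vowels from both ends
r_trimmed = word.rstrip("aeiou")   # remove vowels from the right end only
l_trimmed = word.lstrip("aeiou")   # remove vowels from the left end only

plain = "  hello \n".strip()       # with no argument, whitespace is removed
```

The results match the Tcl example: "nomatop", "onomatop" and "nomatopoeia".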
XSLT
XSLT includes the function normalize-space(string) which strips leading and trailing whitespace, in addition to replacing any whitespace sequence (including line breaks) with a single space.
XSLT 2.0 includes regular expressions, providing another mechanism to perform string trimming.
Another XSLT technique for trimming is to utilize the XPath 2.0 substring() function.
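To make the behaviour of normalize-space concrete, here is a rough Python sketch of the same transformation. It is an approximation written for illustration, not XSLT itself: the function name normalize_space is hypothetical, and whitespace is assumed to mean space, tab, carriage return and line feed.

```python
import re

# Assumption: whitespace here means space, tab, CR and LF only.
_WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def normalize_space(string: str) -> str:
    """Collapse whitespace runs to one space and trim both ends."""
    collapsed = _WHITESPACE_RUN.sub(" ", string)
    return collapsed.strip(" ")
```

For example, normalize_space("  a \n b\t c ") returns "a b c".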
References
The index can be negative, which then indicates the number of places before the end of the string.
In Rust, the str::chars method iterates over code points, and the std::iter::Iterator::nth method on iterators returns the zero-indexed nth value from the iterator, or None.
The index cannot be negative; use *-N, where N indicates the number of places before the end of the string.
In C++, the overloaded operator<=> method on a string returns a std::strong_ordering object (otherwise std::weak_ordering): less, equal (same as equivalent), or greater.
In Rust, the operators == and != and the methods eq, ne are implemented by the PartialEq trait, and the operators <, >, <=, >= and the methods lt, gt, le, ge are implemented by the PartialOrd trait.
The operators use the compiler's default collating sequence.
Modifies string1, which must have enough space to store the result.
In Rust, the + operator is implemented by the Add trait.
If n is larger than the length of the string, then in Debug mode ArrayRangeException is thrown; in Release mode, the behaviour is unspecified.
If n is larger than the length of the string, Java will throw an IndexOutOfBoundsException.
If n is larger than the length of the string, raises Invalid_argument.
If n is larger than the length of the string, throws the message "StringTake::take:".
In Rust, strings are indexed in terms of byte offsets and there is a runtime panic if the index is out of bounds or if it would result in invalid UTF-8. A &str (string reference) can be indexed by various types of ranges, including Range (0..n), RangeFrom (n..), and RangeTo (..n) because they all implement the SliceIndex trait with str being the type being indexed. The str::get method is the non-panicking way to index. It returns None in the cases in which indexing would panic.
In Rust, the str::chars method iterates over code points, and the std::iter::Iterator::count method on iterators consumes the iterator and returns the total number of elements in the iterator.
The transform function exists in the std:: namespace. You must include the <algorithm> header file to use it. The tolower and toupper functions are in the global namespace, obtained by the <ctype.h> header file. The std::tolower and std::toupper names are overloaded and cannot be passed to std::transform without a cast to resolve a function overloading ambiguity, e.g. std::transform(string.begin(), string.end(), result.begin(), (int (*)(int))std::tolower);
std::string only; the result is stored in string result, which is at least as long as string, and may or may not be string itself.
Only ASCII characters, as Ruby lacks Unicode support.
The "find" string in this construct is interpreted as a regular expression. Certain characters have special meaning in regular expressions. If you want to find a string literally, you need to quote the special characters.
In Rust, the str::to_uppercase method returns a newly allocated String with any lowercase characters changed to uppercase ones following the Unicode rules.
In Rust, the str::trim method returns a reference to the original &str.