Compare two strings, ignoring upper/lower case.
Synopsis
Result
cmp An integer which is:
< 0 | if str1 < str2 in lexicographic order ignoring upper/lower case distinctions |
0 | if str1 = str2 ignoring upper/lower case distinctions |
> 0 | if str1 > str2 in lexicographic order ignoring upper/lower case distinctions |
Parameters
str1 A non-NULL pointer to a (C-style NUL-terminated) string to be compared.
str2 A non-NULL pointer to a (C-style NUL-terminated) string to be compared.
Discussion
The standard C library strcmp() function does a case-sensitive string comparison, i.e. strcmp("cactus", "Cactus") will find the two strings not equal. Sometimes it’s useful to do case-insensitive string comparison, where upper/lower case distinctions are ignored. Many systems provide a strcasecmp() or strcmpi() function to do this, but some systems don’t, and even on those that do, the name isn’t standardised. So, Cactus provides its own version, Util_StrCmpi().
Notice that the return value of Util_StrCmpi(), like that of strcmp(), is zero (logical “false” in C) for equal strings, and nonzero (logical “true” in C) for non-equal strings. Code of the form if (Util_StrCmpi(str1, str2)) or if (!Util_StrCmpi(str1, str2)) may be confusing to readers, because the sense of the comparison isn’t immediately obvious. Writing an explicit comparison against zero may make things clearer: if (Util_StrCmpi(str1, str2) != 0) or if (Util_StrCmpi(str1, str2) == 0).
Unfortunately, the basic concept of “case-insensitive” string operations doesn’t generalize well to non-English character sets,[1] where lower-case ↔ upper-case mappings may be context-dependent, many-to-one, and/or time-dependent.[2] At present Cactus basically ignores these issues. :(
See Also
strcmp() Standard C library function (prototype in <string.h>) to compare two strings.
Examples
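The following is a minimal sketch of how a case-insensitive comparison with the return convention described above can be written in portable C. The function sketch_strcmpi() is a hypothetical stand-in for illustration, not the Cactus implementation of Util_StrCmpi(), and like Cactus it assumes the simple per-character mapping given by tolower().

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical stand-in for Util_StrCmpi(); NOT the Cactus source.
   Compares character by character after folding each one with tolower(). */
int sketch_strcmpi(const char *str1, const char *str2)
{
  for (;; str1++, str2++)
  {
    /* go through unsigned char: tolower() is undefined for negative values */
    const int c1 = tolower((unsigned char) *str1);
    const int c2 = tolower((unsigned char) *str2);
    if (c1 != c2)
      return c1 - c2;   /* first difference decides the ordering */
    if (c1 == '\0')
      return 0;         /* both strings ended together: equal */
  }
}

/* Explicit comparisons against zero keep the sense of each test visible. */
void sketch_strcmpi_demo(void)
{
  assert(sketch_strcmpi("cactus", "Cactus") == 0);  /* equal ignoring case */
  assert(sketch_strcmpi("apple", "Banana") < 0);    /* 'a' sorts before 'b' */
  assert(sketch_strcmpi("pear", "PEACH") > 0);      /* 'r' sorts after 'c' */
}
```

With the real function the tests would read the same way, for example if (Util_StrCmpi(str1, str2) == 0) for "the strings match".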
“Duplicate” a string, i.e. copy it to a newly-malloced buffer.
Synopsis
Result
copy A pointer to a buffer obtained from malloc(), which this function sets to a copy of the (C-style
NUL-terminated) string str. This buffer should be freed with free() when it’s not needed any more.
Parameters
str A non-NULL pointer to a (C-style NUL-terminated) string.
Discussion
Many systems have a C library function strdup(), which mallocs sufficient memory for a copy of its argument string, does the copy, and returns a pointer to the copy. However, some systems lack this function, so Cactus provides its own version, Util_Strdup().
See Also
<stdlib.h> System header file containing prototypes for malloc() and free().
strcpy() Standard C library function (prototype in <string.h>) to copy a string to a buffer. This does not
check that the buffer is big enough to hold the string, and is thus very dangerous. Use Util_Strlcpy() instead!
Util_Strlcpy() [B40] Safely copy a string.
Errors
NULL malloc() was unable to allocate memory for the buffer.
Examples
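As an illustration of the allocate, copy, check, and free pattern described in the Discussion, here is a minimal sketch. The function sketch_strdup() is a hypothetical stand-in, not the Cactus implementation of Util_Strdup(); it only assumes the behavior documented above (a malloc()ed copy, or NULL if the allocation fails).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Util_Strdup(); NOT the Cactus source. */
char *sketch_strdup(const char *str)
{
  const size_t n_bytes = strlen(str) + 1;   /* +1 for the terminating NUL */
  char *copy = malloc(n_bytes);
  if (copy == NULL)
    return NULL;                            /* allocation failed */
  memcpy(copy, str, n_bytes);               /* copies the NUL as well */
  return copy;
}

void sketch_strdup_demo(void)
{
  const char original[] = "Cactus";
  char *copy = sketch_strdup(original);
  if (copy == NULL)
    return;                                 /* out of memory: nothing to check */
  assert(copy != original);                 /* a distinct buffer ... */
  assert(strcmp(copy, original) == 0);      /* ... with the same contents */
  free(copy);                               /* the caller owns the buffer */
}
```

The caller is responsible for checking the result against NULL before use and for passing it to free() exactly once.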
Concatenate strings safely.
Synopsis
Result
result_len The size of the string the function tried to create, i.e. the initial strlen(dst) plus strlen(src).
Parameters
dst A non-NULL pointer to the (C-style NUL-terminated) destination string.
src A non-NULL pointer to the (C-style NUL-terminated) source string.
size The size of the destination buffer.
Discussion
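The standard strcat() and strcpy() functions provide no way to specify the size of the destination buffer, so code using these functions is often vulnerable to buffer overflows. The standard strncat() and strncpy() functions can be used to write safe code, but their API is cumbersome, error-prone, and sometimes surprisingly inefficient: strncpy() does not NUL-terminate the destination when the source is too long, it pads all unused space in the destination with NUL characters, and the size argument of strncat() is the maximum number of characters to append rather than the size of the destination buffer.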
To solve these problems, the OpenBSD project developed the strlcat() and strlcpy() functions. See http://www.openbsd.org/papers/strlcpy-paper.ps for a history and general discussion of these functions. Some other Unix systems (notably Solaris) now provide these, but many don’t, so Cactus provides its own versions, Util_Strlcat() and Util_Strlcpy().
Util_Strlcat() appends the NUL-terminated string src to the end of the NUL-terminated string dst. It will append at most size - strlen(dst) - 1 characters (hence it never overflows the destination buffer), and it always leaves the dst string NUL-terminated.
See Also
strcat() Standard C library function (prototype in <string.h>) to concatenate two strings. This does not
check that the buffer is big enough to hold the result, and is thus very dangerous. Use Util_Strlcat() instead!
Util_Strlcpy() [B40] Safely copy a string.
Examples
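The sketch below illustrates the behavior described in the Discussion, including how the return value can be used to detect truncation. The function sketch_strlcat() is a hypothetical stand-in written from that description, not the Cactus implementation of Util_Strlcat(), and it assumes that dst already holds a NUL-terminated string shorter than size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Util_Strlcat(); NOT the Cactus source.
   Assumes dst already holds a NUL-terminated string shorter than size. */
size_t sketch_strlcat(char *dst, const char *src, size_t size)
{
  const size_t dst_len = strlen(dst);
  const size_t src_len = strlen(src);
  if (dst_len + 1 < size)
  {
    size_t n_copy = size - dst_len - 1;   /* room left, excluding the NUL */
    if (n_copy > src_len)
      n_copy = src_len;
    memcpy(dst + dst_len, src, n_copy);
    dst[dst_len + n_copy] = '\0';
  }
  return dst_len + src_len;               /* length the full result needs */
}

void sketch_strlcat_demo(void)
{
  char buffer[8] = "abc";
  size_t result_len = sketch_strlcat(buffer, "de", sizeof(buffer));
  assert(result_len == 5 && strcmp(buffer, "abcde") == 0);

  /* "abcde" + "fghij" needs 10 characters; only 7 fit before the NUL */
  result_len = sketch_strlcat(buffer, "fghij", sizeof(buffer));
  assert(strcmp(buffer, "abcdefg") == 0);
  assert(result_len == 10);
  assert(result_len >= sizeof(buffer));   /* signals truncation */
}
```

A result that is greater than or equal to the buffer size means the concatenated string did not fit and was truncated.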
Copies a string safely.
Synopsis
Result
result_len The size of the string the function tried to create, i.e. strlen(src).
Parameters
dst A non-NULL pointer to the (C-style NUL-terminated) destination string.
src A non-NULL pointer to the (C-style NUL-terminated) source string.
size The size of the destination buffer.
Discussion
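The standard strcat() and strcpy() functions provide no way to specify the size of the destination buffer, so code using these functions is often vulnerable to buffer overflows. The standard strncat() and strncpy() functions can be used to write safe code, but their API is cumbersome, error-prone, and sometimes surprisingly inefficient: strncpy() does not NUL-terminate the destination when the source is too long, it pads all unused space in the destination with NUL characters, and the size argument of strncat() is the maximum number of characters to append rather than the size of the destination buffer.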
To solve these problems, the OpenBSD project developed the strlcat() and strlcpy() functions. See http://www.openbsd.org/papers/strlcpy-paper.ps for a history and general discussion of these functions. Some other Unix systems (notably Solaris) now provide these, but many don’t, so Cactus provides its own versions, Util_Strlcat() and Util_Strlcpy().
Util_Strlcpy() copies up to size-1 characters from the source string to the destination string, followed by a NUL character (so dst is always NUL-terminated). Unlike strncpy(), Util_Strlcpy() does not fill any left-over space at the end of the destination buffer with NUL characters.
See Also
strcpy() Standard C library function (prototype in <string.h>) to copy a string to a buffer. This does not
check that the buffer is big enough to hold the string, and is thus very dangerous. Use Util_Strlcpy() instead!
Util_Strdup() [B33] “Duplicate” a string, i.e. copy it to a newly-malloced buffer.
Util_Strlcat() [B36] Safely concatenates two strings.
Examples
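The sketch below illustrates the copy semantics described in the Discussion: at most size-1 characters are copied, the result is always NUL-terminated, and unused space is left untouched. The function sketch_strlcpy() is a hypothetical stand-in written from that description, not the Cactus implementation of Util_Strlcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Util_Strlcpy(); NOT the Cactus source. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
  const size_t src_len = strlen(src);
  if (size > 0)
  {
    /* copy everything if it fits, otherwise size-1 characters */
    const size_t n_copy = (src_len < size) ? src_len : size - 1;
    memcpy(dst, src, n_copy);
    dst[n_copy] = '\0';
  }
  return src_len;                         /* length the full copy needs */
}

void sketch_strlcpy_demo(void)
{
  char buffer[6];
  size_t result_len;

  memset(buffer, 'x', sizeof(buffer));
  result_len = sketch_strlcpy(buffer, "hi", sizeof(buffer));
  assert(result_len == 2 && strcmp(buffer, "hi") == 0);
  assert(buffer[5] == 'x');               /* left-over space is not NUL-filled */

  /* "Cactus" needs 6 characters plus the NUL; only 5 characters fit */
  result_len = sketch_strlcpy(buffer, "Cactus", sizeof(buffer));
  assert(strcmp(buffer, "Cactu") == 0);
  assert(result_len >= sizeof(buffer));   /* signals truncation */
}
```

As with the concatenation function, comparing the result against the buffer size is enough to detect truncation.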
Separate off the first token from a string.
Synopsis
Result
token This function returns the original value of *string_ptr, or NULL if the end of the string is reached.
Parameters
string_ptr A non-NULL pointer to a (modifiable) non-NULL pointer to the (C-style NUL-terminated) string
to operate on.
delim_set A non-NULL pointer to a (C-style NUL-terminated) string representing a set of delimiter characters
(the order of these characters doesn’t matter).
Discussion
Many Unix systems define a function strsep() which provides a clean way of splitting a string into “words”. However, some systems only provide the older (and inferior-in-several-ways) strtok() function, so Cactus implements its own strsep() function, Util_StrSep().
Util_StrSep() finds the first occurrence in the string pointed to by *string_ptr of any character in the string pointed to by delim_set (or the terminating NUL if there is no such character), and replaces this by NUL. The location of the next character after the NUL character just stored (or NULL, if the end of the string was reached) is stored in *string_ptr.
An “empty” field, i.e. one caused by two adjacent delimiter characters, yields a token whose first character is the NUL. It can therefore be detected (after Util_StrSep() returns) by testing the returned token: *token == '\0', or equivalently strlen(token) == 0.
See the example section below for the typical usage of Util_StrSep().
See Also
strsep() Some systems provide this in the standard C library (prototype in <string.h>); Util_StrSep() is a
clone of this.
strtok() Inferior API for splitting a string into tokens (defined by the ANSI/ISO C standard).
Examples
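The typical usage is a loop that keeps calling the function until it returns NULL. The sketch below shows that loop. The function sketch_strsep() is a hypothetical stand-in written from the description in the Discussion, not the Cactus implementation of Util_StrSep(), and sketch_count_fields() is likewise an illustrative helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Util_StrSep(); NOT the Cactus source. */
char *sketch_strsep(char **string_ptr, const char *delim_set)
{
  char *token = *string_ptr;
  char *end;
  if (token == NULL)
    return NULL;                          /* end of string already reached */

  end = token + strcspn(token, delim_set);  /* first delimiter, or the NUL */
  if (*end == '\0')
    *string_ptr = NULL;                   /* this was the last token */
  else
  {
    *end = '\0';                          /* terminate the token in place */
    *string_ptr = end + 1;                /* resume after the delimiter */
  }
  return token;
}

/* Typical loop: count the tokens and the empty fields in a buffer.
   The buffer is modified in place, so it must be writable. */
void sketch_count_fields(char *buffer, const char *delim_set,
                         int *n_tokens, int *n_empty)
{
  char *rest = buffer;
  char *token;
  *n_tokens = 0;
  *n_empty = 0;
  while ((token = sketch_strsep(&rest, delim_set)) != NULL)
  {
    ++*n_tokens;
    if (*token == '\0')
      ++*n_empty;                         /* empty field */
  }
}

void sketch_strsep_demo(void)
{
  char text[] = "a,b;;c";                 /* tokens: "a", "b", "", "c" */
  int n_tokens, n_empty;
  sketch_count_fields(text, ",;", &n_tokens, &n_empty);
  assert(n_tokens == 4);
  assert(n_empty == 1);
}
```

Because the delimiters are overwritten with NUL characters, the string must live in writable storage; a string literal must be copied into an array or a malloc()ed buffer first.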
[1] Hawaiian and Swahili are apparently the only other living languages that use solely the 26-letter “English” Latin alphabet.
[2] For example, the (lower-case) German “ß” doesn’t have a unique upper-case equivalent: “ß” usually maps to “SS” (for example “groß” ↔ “GROSS”), but if that would conflict with another word, then “ß” maps to “SZ” (for example “maße” ↔ “MASZE” because there’s a different word “MASSE”). Or at least that’s the way it was prior to 1998. The 1998 revisions to German orthography removed the SZ rule, so now (post-1998) the two distinct German words “masse” (English “mass”) and “maße” (“measures”) have identical upper-case forms “MASSE”. To further complicate matters, (the German-speaking parts of) Switzerland have a slightly different orthography, which never had the SZ rule.
French provides another tricky example: In France “é” ↔ “É” and “è” ↔ “È”, whereas in (the French-speaking parts of) Canada there are no accents on upper-case letters, so “é” ↔ “E” and “è” ↔ “E”.