Chapter B3
Full Descriptions of String Functions

Util_StrCmpi

Compare two strings, ignoring upper/lower case.

Synopsis

C
#include "util_String.h"  
int cmp = Util_StrCmpi(const char *str1, const char *str2);


Result

cmp An integer which is:

< 0 if str1 < str2 in lexicographic order ignoring upper/lower case distinctions
0 if str1 = str2 ignoring upper/lower case distinctions
> 0 if str1 > str2 in lexicographic order ignoring upper/lower case distinctions

Parameters

str1 A non-NULL pointer to a (C-style NUL-terminated) string to be compared.
str2 A non-NULL pointer to a (C-style NUL-terminated) string to be compared.

Discussion

The standard C library strcmp() function does a case-sensitive string comparison, i.e. strcmp("cactus", "Cactus") will find the two strings not equal. Sometimes it’s useful to do case-insensitive string comparison, where upper/lower case distinctions are ignored. Many systems provide a strcasecmp() or strcmpi() function to do this, but some systems don’t, and even on those that do, the name isn’t standardised. So, Cactus provides its own version, Util_StrCmpi().
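As a rough illustration of what such a function has to do, here is a minimal sketch written only with standard C. The name sketch_strcmpi() is hypothetical and this is not the Cactus source; it simply assumes that "ignoring case" means comparing the tolower() images of the characters in the current C locale.

```c
#include <ctype.h>

/*
 * sketch_strcmpi() is a hypothetical stand-in, NOT the Cactus source:
 * compare two NUL-terminated strings, ignoring case as defined by
 * tolower() in the current C locale.
 * Returns < 0, 0, or > 0 in the same sense as strcmp().
 */
int sketch_strcmpi(const char *str1, const char *str2)
{
    for (;; ++str1, ++str2) {
        /* go via unsigned char: tolower() is undefined for negative values */
        const int c1 = tolower((unsigned char) *str1);
        const int c2 = tolower((unsigned char) *str2);

        if (c1 != c2)
            return (c1 < c2) ? -1 : 1;
        if (c1 == '\0')
            return 0;       /* both strings ended at the same place */
    }
}
```

With this sketch, "cactus" and "Cactus" compare equal, while a string that is a proper prefix of another compares as less than it.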

Notice that the return value of Util_StrCmpi(), like that of strcmp(), is zero (logical “false” in C) for equal strings, and nonzero (logical “true” in C) for non-equal strings. Code of the form

if (Util_StrCmpi(str1, str2))  
        { /* strings differ */ }

or

if (!Util_StrCmpi(str1, str2))  
        { /* strings are identical apart from case distinctions */ }

may be confusing to readers, because the sense of the comparison isn’t immediately obvious. Writing an explicit comparison against zero may make things clearer:

if (Util_StrCmpi(str1, str2) != 0)  
        { /* strings differ */ }

or

if (Util_StrCmpi(str1, str2) == 0)  
        { /* strings are identical apart from case distinctions */ }
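Another option, sketched here as one possibility rather than as a Cactus convention, is to hide the comparison against zero inside a small predicate whose name states the sense of the test. Both function names below are hypothetical; sketch_strcmpi() is only a stand-in so that the fragment compiles without Cactus, and real code would call Util_StrCmpi() there instead.

```c
#include <ctype.h>

/* hypothetical stand-in for Util_StrCmpi(), so this sketch compiles on its own */
static int sketch_strcmpi(const char *s1, const char *s2)
{
    while (*s1 != '\0'
           && tolower((unsigned char) *s1) == tolower((unsigned char) *s2)) {
        ++s1;
        ++s2;
    }
    return tolower((unsigned char) *s1) - tolower((unsigned char) *s2);
}

/*
 * hypothetical helper: returns 1 if the strings are identical apart
 * from case distinctions, 0 otherwise
 */
int strings_match_nocase(const char *str1, const char *str2)
{
    return sketch_strcmpi(str1, str2) == 0;
}
```

A call such as if (strings_match_nocase(driver, "pugh")) then reads the way the logic works, with "true" meaning "the strings match".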

Unfortunately, the basic concept of “case-insensitive” string operations doesn’t generalize well to non-English character sets,[1] where lower-case ↔ upper-case mappings may be context-dependent, many-to-one, and/or time-dependent.[2] At present Cactus basically ignores these issues. :(

See Also

strcmp() Standard C library function (prototype in <string.h>) to compare two strings.

Examples

C
#include "util_String.h"  
 
/* does the Cactus parameter  driver  specify the PUGH driver? */  
/* (Cactus parameters are supposed to be case-insensitive) */  
if (Util_StrCmpi(driver, "pugh") == 0)  
        { /* PUGH code */ }  
else  
        { /* non-PUGH code */ }


Util_Strdup

“Duplicate” a string, i.e. copy it to a newly-malloced buffer.

Synopsis

C
#include "util_String.h"  
char* copy = Util_Strdup(const char *str);


Result

copy A pointer to a buffer obtained from malloc(), which this function sets to a copy of the (C-style NUL-terminated) string str. This buffer should be freed with free() when it’s not needed any more.

Parameters

str A non-NULL pointer to a (C-style NUL-terminated) string.

Discussion

Many systems have a C library function strdup(), which mallocs sufficient memory for a copy of its argument string, does the copy, and returns a pointer to the copy. However, some systems lack this function, so Cactus provides its own version, Util_Strdup().
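The three steps just described (allocate, copy, return the pointer) can be sketched in a few lines of standard C. The name sketch_strdup() is hypothetical and this is not the Cactus source; it only illustrates the behaviour described here, including the NULL result when malloc() fails.

```c
#include <stdlib.h>
#include <string.h>

/*
 * sketch_strdup() is a hypothetical stand-in, NOT the Cactus source:
 * return a malloc()ed copy of the NUL-terminated string str,
 * or NULL if malloc() could not allocate the buffer.
 * The caller owns the result and should free() it.
 */
char *sketch_strdup(const char *str)
{
    const size_t n_bytes = strlen(str) + 1;     /* +1 for the terminating NUL */
    char *copy = malloc(n_bytes);

    if (copy != NULL)
        memcpy(copy, str, n_bytes);             /* copies the NUL as well */
    return copy;
}
```

Because the copy lives in its own buffer, the caller can modify it freely without touching the original string.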

See Also

<stdlib.h> System header file containing prototypes for malloc() and free().
strcpy() Standard C library function (prototype in <string.h>) to copy a string to a buffer. This does not check that the buffer is big enough to hold the string, and is thus very dangerous. Use Util_Strlcpy() instead!
Util_Strlcpy() [B40] Safely copy a string.

Errors

NULL malloc() was unable to allocate memory for the buffer.

Examples

C
#include <stdlib.h>  
#include "util_String.h"  
 
/*  
 * return the (positive) answer to a question,  
 * or negative if an error occurred  
 */  
int answer_question(const char* question)  
{  
/*  
 * we need to modify the question string in the process of parsing it  
 * but we must not destroy the input ==> copy it and modify the copy  
 *  
 * ... note the const qualifier on  question_copy  says that  
 *     the pointer  question_copy  won’t itself change, but  
 *     we can modify the string that it points to  
 */  
char* const question_copy = Util_Strdup(question);  
if (question_copy == NULL)  
        { return -1; }     /* couldn’t get memory for copy buffer */  
 
/* code that will modify  question_copy */  
 
free(question_copy);  
return 42;  
}


Util_Strlcat

Concatenate strings safely.

Synopsis

C
#include "util_String.h"  
size_t result_len = Util_Strlcat(char *dst, const char *src, size_t size);


Result

result_len The size of the string the function tried to create, i.e. the initial strlen(dst) plus strlen(src).

Parameters

dst A non-NULL pointer to the (C-style NUL-terminated) destination string.
src A non-NULL pointer to the (C-style NUL-terminated) source string.
size The size of the destination buffer.

Discussion

The standard strcat() and strcpy() functions provide no way to specify the size of the destination buffer, so code using these functions is often vulnerable to buffer overflows. The standard strncat() and strncpy() functions can be used to write safe code, but their API is cumbersome, error-prone, and sometimes surprisingly inefficient.
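To make the strncpy() complaint concrete, here is a small sketch of what a safe copy looks like when written with strncpy(). The helper name copy_with_strncpy() is hypothetical. It relies on two properties of the standard function: strncpy() does not NUL-terminate the destination when the source does not fit, and it pads the whole remainder of the destination with NUL characters when the source is short.

```c
#include <string.h>

/*
 * Hypothetical helper: copy src into the buffer dst[size] using
 * strncpy(), supplying the NUL termination that strncpy() itself
 * does not guarantee.
 */
void copy_with_strncpy(char *dst, const char *src, size_t size)
{
    if (size == 0)
        return;                     /* no room even for the NUL */

    /*
     * copies at most size-1 characters; if src is shorter than that,
     * strncpy() NUL-pads all the way to dst[size-2], which can be
     * wasteful for a large buffer and a short string
     */
    strncpy(dst, src, size - 1);

    /* needed whenever strlen(src) >= size-1: strncpy() stored no NUL */
    dst[size - 1] = '\0';
}
```

Forgetting either the size - 1 or the explicit terminator gives code that works in testing and fails only when a string happens to fill the buffer.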

To solve these problems, the OpenBSD project developed the strlcat() and strlcpy() functions. See http://www.openbsd.org/papers/strlcpy-paper.ps for a history and general discussion of these functions. Some other Unix systems (notably Solaris) now provide these, but many don’t, so Cactus provides its own versions, Util_Strlcat() and Util_Strlcpy().

Util_Strlcat() appends the NUL-terminated string src to the end of the NUL-terminated string dst. It will append at most size - strlen(dst) - 1 characters (hence it never overflows the destination buffer), and it always leaves dst NUL-terminated.
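The behaviour described above can be sketched as follows. The name sketch_strlcat() is hypothetical and this is not the Cactus source. It follows the semantics of OpenBSD's strlcat(), including one corner case that is an assumption with respect to Cactus: if dst contains no NUL within its first size bytes, nothing is appended.

```c
#include <string.h>

/*
 * sketch_strlcat() is a hypothetical stand-in, NOT the Cactus source.
 * Append src to the string in dst[size], truncating if necessary.
 * Returns the length of the string it tried to create, i.e.
 * (initial length of dst) + strlen(src); a result >= size means
 * that the output was truncated.
 */
size_t sketch_strlcat(char *dst, const char *src, size_t size)
{
    const size_t src_len = strlen(src);
    size_t dst_len = 0;

    /* find the end of dst, but never look beyond the buffer */
    while (dst_len < size && dst[dst_len] != '\0')
        ++dst_len;

    if (dst_len == size)
        return size + src_len;      /* dst isn't terminated: append nothing */

    /* room left for characters, keeping one byte for the NUL */
    const size_t room = size - dst_len - 1;
    const size_t n_copy = (src_len < room) ? src_len : room;

    memcpy(dst + dst_len, src, n_copy);
    dst[dst_len + n_copy] = '\0';
    return dst_len + src_len;
}
```

For example, appending "defghij" to "abc" in an 8-byte buffer stores "abcdefg" and returns 10, which is >= 8 and so signals truncation.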

See Also

strcat() Standard C library function (prototype in <string.h>) to concatenate two strings. This does not check that the buffer is big enough to hold the result, and is thus very dangerous. Use Util_Strlcat() instead!
Util_Strlcpy() [B40] Safely copy a string.

Examples

C
#include "util_String.h"  
 
/*  
 * safely concatenate strings s1,s2,s3 into buffer:  
 * ... this code is safe (it will never overflow the buffer), but  
 *     quick-n-dirty in that it doesn’t give any error indication  
 *     if the result is truncated to fit in the buffer  
 */  
#define BUFFER_SIZE     1024  
char buffer[BUFFER_SIZE];  
 
Util_Strlcpy(buffer, s1, sizeof(buffer));  
Util_Strlcat(buffer, s2, sizeof(buffer));  
Util_Strlcat(buffer, s3, sizeof(buffer));


C
#include "util_String.h"  
 
#define OK              0  
#define ERROR_TRUNC     1  
 
/*  
 * safely concatenate strings s1,s2,s3 into buffer[N_buffer];  
 * return OK if ok, ERROR_TRUNC if result was truncated to fit in buffer  
 */  
int cat3(int N_buffer, char buffer[],  
         const char s1[], const char s2[], const char s3[])  
{  
int length;  
 
length = Util_Strlcpy(buffer, s1, N_buffer);  
if (length >= N_buffer)  
        return ERROR_TRUNC;                   /*** ERROR EXIT ***/  
 
length = Util_Strlcat(buffer, s2, N_buffer);  
if (length >= N_buffer)  
        return ERROR_TRUNC;                   /*** ERROR EXIT ***/  
 
length = Util_Strlcat(buffer, s3, N_buffer);  
if (length >= N_buffer)  
        return ERROR_TRUNC;                   /*** ERROR EXIT ***/  
 
return OK;                                    /*** NORMAL RETURN ***/  
}


Util_Strlcpy

Copy a string safely.

Synopsis

C
#include "util_String.h"  
size_t result_len = Util_Strlcpy(char *dst, const char *src, size_t size);


Result

result_len The size of the string the function tried to create, i.e. strlen(src).

Parameters

dst A non-NULL pointer to the destination buffer.
src A non-NULL pointer to the (C-style NUL-terminated) source string.
size The size of the destination buffer.

Discussion

The standard strcat() and strcpy() functions provide no way to specify the size of the destination buffer, so code using these functions is often vulnerable to buffer overflows. The standard strncat() and strncpy() functions can be used to write safe code, but their API is cumbersome, error-prone, and sometimes surprisingly inefficient.

To solve these problems, the OpenBSD project developed the strlcat() and strlcpy() functions. See http://www.openbsd.org/papers/strlcpy-paper.ps for a history and general discussion of these functions. Some other Unix systems (notably Solaris) now provide these, but many don’t, so Cactus provides its own versions, Util_Strlcat() and Util_Strlcpy().

Util_Strlcpy() copies up to size-1 characters from the source string to the destination string, followed by a NUL character (so dst is always NUL-terminated). Unlike strncpy(), Util_Strlcpy() does not fill any left-over space at the end of the destination buffer with NUL characters.
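A minimal sketch of these semantics, written with standard C only, might look like this. The name sketch_strlcpy() is hypothetical and this is not the Cactus source. The handling of size == 0 (nothing is written at all) follows OpenBSD's strlcpy() and is an assumption as far as Cactus is concerned.

```c
#include <string.h>

/*
 * sketch_strlcpy() is a hypothetical stand-in, NOT the Cactus source.
 * Copy src into the buffer dst[size], truncating if necessary.
 * Returns strlen(src); a result >= size means that the output
 * was truncated.
 */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    const size_t src_len = strlen(src);

    if (size > 0) {
        /* copy at most size-1 characters, keeping one byte for the NUL */
        const size_t n_copy = (src_len < size - 1) ? src_len : size - 1;

        memcpy(dst, src, n_copy);
        dst[n_copy] = '\0';     /* single terminator: no NUL padding */
    }
    return src_len;
}
```

Note that only one NUL is written: any bytes of dst beyond the terminator keep whatever values they had before the call.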

See Also

strcpy() Standard C library function (prototype in <string.h>) to copy a string to a buffer. This does not check that the buffer is big enough to hold the string, and is thus very dangerous. Use Util_Strlcpy() instead!
Util_Strdup() [B33] “Duplicate” a string, i.e. copy it to a newly-malloced buffer.
Util_Strlcat() [B36] Safely concatenate two strings.

Examples

C
#include "util_String.h"  
 
/*  
 * safely concatenate strings s1,s2,s3 into buffer:  
 * ... this code is safe (it will never overflow the buffer), but  
 *     quick-n-dirty in that it doesn’t give any error indication  
 *     if the result is truncated to fit in the buffer  
 */  
#define BUFFER_SIZE     1024  
char buffer[BUFFER_SIZE];  
 
Util_Strlcpy(buffer, s1, sizeof(buffer));  
Util_Strlcat(buffer, s2, sizeof(buffer));  
Util_Strlcat(buffer, s3, sizeof(buffer));


C
#include "util_String.h"  
 
#define OK              0  
#define ERROR_TRUNC     1  
 
/*  
 * safely concatenate strings s1,s2,s3 into buffer[N_buffer];  
 * return OK if ok, ERROR_TRUNC if result was truncated to fit in buffer  
 */  
int cat3(int N_buffer, char buffer[],  
         const char s1[], const char s2[], const char s3[])  
{  
int length;  
 
length = Util_Strlcpy(buffer, s1, N_buffer);  
if (length >= N_buffer)  
        return ERROR_TRUNC;                   /*** ERROR EXIT ***/  
 
length = Util_Strlcat(buffer, s2, N_buffer);  
if (length >= N_buffer)  
        return ERROR_TRUNC;                   /*** ERROR EXIT ***/  
 
length = Util_Strlcat(buffer, s3, N_buffer);  
if (length >= N_buffer)  
        return ERROR_TRUNC;                   /*** ERROR EXIT ***/  
 
return OK;                                    /*** NORMAL RETURN ***/  
}


Util_StrSep

Separate off the first token from a string.

Synopsis

C
#include "util_String.h"  
char* token = Util_StrSep(const char** string_ptr, const char* delim_set);


Result

token This function returns the original value of *string_ptr, or NULL if the end of the string is reached.

Parameters

string_ptr A non-NULL pointer to a (modifiable) non-NULL pointer to the (C-style NUL-terminated) string to operate on.
delim_set A non-NULL pointer to a (C-style NUL-terminated) string representing a set of delimiter characters (the order of these characters doesn’t matter).

Discussion

Many Unix systems define a function strsep() which provides a clean way of splitting a string into “words”. However, some systems only provide the older (and inferior-in-several-ways) strtok() function, so Cactus implements its own strsep() function, Util_StrSep().

Util_StrSep() finds the first occurrence in the string pointed to by *string_ptr of any character in the string pointed to by delim_set (or the terminating NUL if there is no such character), and replaces this by NUL. The location of the next character after the NUL character just stored (or NULL, if the end of the string was reached) is stored in *string_ptr.

An “empty” field, i.e. one caused by two adjacent delimiter characters, can be detected (after Util_StrSep() returns) by testing the returned token: *token == '\0', or equivalently strlen(token) == 0.

See the example section below for the typical usage of Util_StrSep().
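The algorithm described above can be sketched with the standard strcspn() function. The name sketch_strsep() is hypothetical and this is not the Cactus source. For simplicity the sketch takes a plain char** (as the BSD strsep() does), which is an assumption and differs from the const-qualified prototype in the synopsis.

```c
#include <string.h>

/*
 * sketch_strsep() is a hypothetical stand-in, NOT the Cactus source.
 * Split off the first token of the string *string_ptr: returns the
 * original value of *string_ptr (NULL if it was already NULL), and
 * advances *string_ptr past the delimiter that ended the token.
 */
char *sketch_strsep(char **string_ptr, const char *delim_set)
{
    char *token = *string_ptr;

    if (token == NULL)
        return NULL;                /* a previous call reached the end */

    /* strcspn() = number of leading characters NOT in delim_set */
    char *end = token + strcspn(token, delim_set);

    if (*end != '\0') {
        *end = '\0';                /* overwrite the delimiter */
        *string_ptr = end + 1;      /* next call starts just after it */
    } else {
        *string_ptr = NULL;         /* this was the last token */
    }
    return token;
}
```

Applied repeatedly to "a,,b" with the delimiter set ",", this sketch yields "a", then an empty token (from the two adjacent commas), then "b", and finally NULL.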

See Also

strsep() Some systems provide this in the standard C library (prototype in <string.h>); Util_StrSep() is a clone of this.
strtok() Inferior API for splitting a string into tokens (defined by the ANSI/ISO C standard).

Examples

C
#include <stdio.h>  
#include <stdlib.h>  
#include "util_String.h"  
 
/* prototypes */  
int parse_string(char* string,  
                 int N_argv, char* argv[]);  
 
/*  
 * Suppose we have a Cactus parameter  gridfn_list  containing a  
 * whitespace-separated list of grid functions.  This function  
 * "processes" (here just prints the name of) each grid function.  
 */  
void process_gridfn_list(const char* gridfn_list)  
{  
#define MAX_N_GRIDFNS   100  
int N_gridfns;  
int i;  
char* copy_of_gridfn_list;  
char* gridfn[MAX_N_GRIDFNS];  
 
copy_of_gridfn_list = Util_Strdup(gridfn_list);  
if (copy_of_gridfn_list == NULL)  
        { return; }     /* couldn't get memory for copy buffer */  
N_gridfns = parse_string(copy_of_gridfn_list,  
                         MAX_N_GRIDFNS, gridfn);  
 
        for (i = 0 ; i < N_gridfns ; ++i)  
        {  
        /* "process" (here just print the name of) each gridfn */  
        printf("grid function %d is \"%s\"\n", i, gridfn[i]);  
        }  
 
free(copy_of_gridfn_list);  
}  
 
/*  
 * This function parses a string containing whitespace-separated  
 * tokens into a main()-style argument vector (of size  N_argv ).  
 * This function returns the number of pointers stored into  argv[] .  
 *  
 * Adjacent sequences of whitespace are treated the same as single  
 * whitespace characters.  
 *  
 * Note that this function modifies its input string; see  
 * Util_Strdup()  if this is a problem  
 */  
int parse_string(char* string,  
                 int N_argv, char* argv[])  
{  
int i;  
 
        for (i = 0 ; i < N_argv ; )  
        {  
        argv[i] = Util_StrSep(&string, " \t\n\r\v");  
        if (argv[i] == NULL)  
                { break; }      /* reached end-of-string */  
 
        if (*argv[i] == '\0')  
                {  
                /*  
                 * found a 0-length "token" (a sequence of  
                 * two or more adjacent whitespace characters)  
                 * ==> skip this "token" (don’t store it)  
                 * ==> no-op here  
                 */  
                }  
           else {  
                /* token has length > 0 ==> store it */  
                ++i;  
                }  
        }  
 
return i;  
}


[1] Hawaiian and Swahili are apparently the only other living languages that use solely the 26-letter “English” Latin alphabet.

[2] For example, the (lower-case) German “ß” doesn’t have a unique upper-case equivalent: “ß” usually maps to “SS” (for example “groß” ↔ “GROSS”), but if that would conflict with another word, then “ß” maps to “SZ” (for example “maße” ↔ “MASZE” because there’s a different word “MASSE”). Or at least that’s the way it was prior to 1998. The 1998 revisions to German orthography removed the SZ rule, so now (post-1998) the two distinct German words “masse” (English “mass”) and “maße” (“measures”) have identical upper-case forms “MASSE”. To further complicate matters, (the German-speaking parts of) Switzerland have a slightly different orthography, which never had the SZ rule.

French provides another tricky example: In France “é” ↔ “É” and “è” ↔ “È”, whereas in (the French-speaking parts of) Canada there are no accents on upper-case letters, so “é” ↔ “E” and “è” ↔ “E”.