< Free Open Study > |
12.4. Characters and Strings

This section provides some tips for using strings. The first applies to strings in all languages.

Cross-Reference: Issues for using magic characters and strings are similar to those for magic numbers discussed in Section 12.1, "Numbers in General."

Avoid magic characters and strings
Magic characters are literal characters (such as 'A') and magic strings are literal strings (such as "Gigamatic Accounting Program") that appear throughout a program. If you program in a language that supports the use of named constants, use them instead. Otherwise, use global variables. Several reasons for avoiding literal strings exist:
C++ Examples of Comparisons Using Strings
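A minimal sketch of such comparisons, written in C-style code that is also expected to compile as C++. The names ESCAPE, PROGRAM_TITLE, IsEscape, and IsProgramTitle are hypothetical and chosen only for illustration; the point is that each literal appears once, behind a name, and the comparisons read in terms of that name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical named constants standing in for literals that would
   otherwise be repeated throughout the program. */
#define ESCAPE 0x1B
static const char PROGRAM_TITLE[] = "Gigamatic Accounting Program";

/* Returns 1 if inputChar is the escape character, 0 otherwise.
   The intent is clearer than comparing against a bare 0x1B. */
int IsEscape(char inputChar) {
    return inputChar == ESCAPE;
}

/* Returns 1 if title matches the program title, 0 otherwise.
   The title text lives in one place, so it can change in one place. */
int IsProgramTitle(const char *title) {
    assert(title != NULL);
    return strcmp(title, PROGRAM_TITLE) == 0;
}
```

If the title ever changes, or has to be translated, only the definition of PROGRAM_TITLE needs to be touched.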
Watch for off-by-one errors
Because substrings can be indexed much as arrays are, watch for off-by-one errors that read or write past the end of a string.

cc2e.com/1285

Know how your language and environment support Unicode
In some languages such as Java, all strings are Unicode. In others such as C and C++, handling Unicode strings requires its own set of functions. Conversion between Unicode and other character sets is often required for communication with standard and third-party libraries. If some strings won't be in Unicode (for example, in C or C++), decide early on whether to use the Unicode character set at all. If you decide to use Unicode strings, decide where and when to use them.

Decide on an internationalization/localization strategy early in the lifetime of a program
Internationalization and localization are major issues. Key considerations are deciding whether to store all strings in an external resource and whether to create separate builds for each language or to determine the specific language at run time.

cc2e.com/1292

If you know you only need to support a single alphabetic language, consider using an ISO 8859 character set
For applications that need to support only a single alphabetic language (such as English) and that don't need to support multiple languages or an ideographic language (such as written Chinese), the ISO 8859 extended-ASCII-type standard makes a good alternative to Unicode.

If you need to support multiple languages, use Unicode
Unicode provides more comprehensive support for international character sets than ISO 8859 or other standards.

Decide on a consistent conversion strategy among string types
If you use multiple string types, one common approach that helps keep the string types distinct is to keep all strings in a single format within the program and convert the strings to other formats as close as possible to input and output operations.
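The conversion strategy just described can be sketched in C as a pair of boundary functions. This is only an illustration under stated assumptions: the internal format is taken to be wchar_t, the external format is the current locale's multibyte encoding, and NAME_LENGTH, ImportName, and ExportName are hypothetical names. The standard functions mbstowcs() and wcstombs() do the actual conversion, and their behavior depends on the active locale.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

#define NAME_LENGTH 30  /* hypothetical maximum length, excluding the terminator */

/* Input boundary: convert external multibyte text into the internal
   wide-character format. internal must have room for NAME_LENGTH + 1
   elements. Returns 0 on success, -1 if the input cannot be converted. */
int ImportName(const char *external, wchar_t *internal) {
    size_t count;
    assert(external != NULL && internal != NULL);
    count = mbstowcs(internal, external, NAME_LENGTH);
    if (count == (size_t)-1) {
        return -1;
    }
    /* mbstowcs() writes no terminator when the input is truncated. */
    internal[count] = L'\0';
    return 0;
}

/* Output boundary: convert the internal wide string back to multibyte
   text. externalSize is the full size of the external buffer in bytes.
   Returns 0 on success, -1 if a character cannot be converted. */
int ExportName(const wchar_t *internal, char *external, size_t externalSize) {
    size_t count;
    assert(internal != NULL && external != NULL && externalSize > 0);
    count = wcstombs(external, internal, externalSize - 1);
    if (count == (size_t)-1) {
        return -1;
    }
    /* wcstombs() writes no terminator when the output is truncated. */
    external[count] = '\0';
    return 0;
}
```

Code between the two boundaries then works only with wchar_t strings and never has to ask which format a given string is in.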
Strings in C

C++'s standard template library string class has eliminated most of the traditional problems with strings in C. For those programmers working directly with C strings, here are some ways to avoid common pitfalls:

Be aware of the difference between string pointers and character arrays
The problem with string pointers and character arrays arises because of the way C handles strings. Be alert to the difference between them in two ways:
Declare C-style strings to have length CONSTANT+1
In C and C++, off-by-one errors with C-style strings are common because it's easy to forget that a string of length n requires n + 1 bytes of storage and to forget to leave room for the null terminator (the byte set to 0 at the end of the string). An effective way to avoid such problems is to use named constants to declare all strings. A key in this approach is that you use the named constant the same way every time. Declare the string to be length CONSTANT+1, and then use CONSTANT to refer to the length of a string in the rest of the code. Here's an example:

C Example of Good String Declarations
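A minimal sketch of this convention, assuming a hypothetical value of 30 for NAME_LENGTH and hypothetical helper names FillName and FilledNameLength. The constant plus one appears only in the declaration; every other use refers to NAME_LENGTH itself.

```c
#include <assert.h>
#include <string.h>

#define NAME_LENGTH 30  /* hypothetical length of a name, excluding the terminator */

/* Sets all NAME_LENGTH character positions to fill. name must point to
   a buffer declared with NAME_LENGTH + 1 elements. Note that the loop
   bound is NAME_LENGTH, not NAME_LENGTH + 1. */
void FillName(char *name, char fill) {
    int i;
    assert(name != NULL);
    for (i = 0; i < NAME_LENGTH; i++) {
        name[i] = fill;
    }
    name[NAME_LENGTH] = '\0';  /* the extra byte holds the terminator */
}

/* Declares a name the conventional way, fills it, and returns the
   resulting string length, which is NAME_LENGTH. */
size_t FilledNameLength(void) {
    char name[NAME_LENGTH + 1] = { 0 };  /* string of length NAME_LENGTH */
    FillName(name, 'A');
    return strlen(name);
}
```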
If you don't have a convention to handle this, you'll sometimes declare the string to be of length NAME_LENGTH and have operations on it work with NAME_LENGTH-1; at other times you'll declare the string to be of length NAME_LENGTH+1 and have operations on it work with length NAME_LENGTH. Every time you use a string, you'll have to remember which way you declared it.

When you use strings the same way every time, you don't have to remember how you dealt with each string individually and you eliminate mistakes caused by forgetting the specifics of an individual string. Having a convention minimizes mental overload and programming errors.

Initialize strings to null to avoid endless strings
C determines the end of a string by finding a null terminator, a byte set to 0 at the end of the string. No matter how long you think the string is, C doesn't find the end of the string until it finds a 0 byte. If you forget to put a null at the end of the string, your string operations might not act the way you expect them to.

Cross-Reference: For more details on initializing data, see Section 10.3, "Guidelines for Initializing Variables."

You can avoid endless strings in two ways. First, initialize arrays of characters to 0 when you declare them:

C Example of a Good Declaration of a Character Array

    char EventName[ MAX_NAME_LENGTH + 1 ] = { 0 };

Second, when you allocate strings dynamically, initialize them to 0 by using calloc() instead of malloc(). calloc() allocates memory and initializes it to 0. malloc() allocates memory without initializing it, so you take your chances when you use memory allocated by malloc().

Use arrays of characters instead of pointers in C
If memory isn't a constraint, and often it isn't, declare all your string variables as arrays of characters. This helps to avoid pointer problems, and the compiler will give you more warnings when you do something wrong.

Cross-Reference: For more discussion of arrays, read Section 12.8, "Arrays," later in this chapter.
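The two initialization techniques and the array-versus-pointer distinction above can be sketched together. The value of MAX_NAME_LENGTH and the helper names NewEventName, EventNameCapacity, and IsEmptyName are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_NAME_LENGTH 30  /* hypothetical value */

/* Dynamic allocation: calloc() returns zero-filled memory, so the
   buffer is already a valid empty string. Returns NULL if allocation
   fails; the caller releases the buffer with free(). */
char *NewEventName(void) {
    return calloc(MAX_NAME_LENGTH + 1, sizeof(char));
}

/* Array declaration: the initializer zero-fills the whole array, and
   sizeof reports the full buffer size. Applied to a char pointer,
   sizeof would report only the size of the pointer. */
size_t EventNameCapacity(void) {
    char eventName[MAX_NAME_LENGTH + 1] = { 0 };
    return sizeof eventName;
}

/* Returns 1 if name is an empty string, 0 otherwise. */
int IsEmptyName(const char *name) {
    assert(name != NULL);
    return name[0] == '\0';
}
```

Because the array keeps its size in its type, expressions such as sizeof eventName stay correct if MAX_NAME_LENGTH changes, which is one practical reason to prefer arrays when memory allows.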
Use strncpy() instead of strcpy() to avoid endless strings
String routines in C come in safe versions and dangerous versions. The more dangerous routines such as strcpy() and strcmp() keep going until they run into a null terminator. Their safer companions, strncpy() and strncmp(), take a parameter for maximum length so that even if the strings go on forever, your function calls won't. Note that strncpy() does not write a terminator when the source is at least as long as the maximum length, so pair it with the CONSTANT+1 convention or set the final byte to 0 yourself.
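A minimal sketch of the bounded routines in use, assuming a hypothetical NAME_LENGTH of 30 and hypothetical helpers CopyName and SameName. The destination is assumed to be declared with NAME_LENGTH + 1 bytes, so the byte after the copied characters is always available for the terminator.

```c
#include <assert.h>
#include <string.h>

#define NAME_LENGTH 30  /* hypothetical length of a name, excluding the terminator */

/* Copies at most NAME_LENGTH characters from source into dest and
   always terminates the result. dest must point to a buffer declared
   with NAME_LENGTH + 1 bytes. */
void CopyName(char *dest, const char *source) {
    assert(dest != NULL && source != NULL);
    strncpy(dest, source, NAME_LENGTH);
    dest[NAME_LENGTH] = '\0';  /* strncpy() may leave the copy unterminated */
}

/* Returns 1 if the first NAME_LENGTH characters of a and b match,
   0 otherwise. The comparison stops at NAME_LENGTH even if neither
   string is terminated within that range. */
int SameName(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    return strncmp(a, b, NAME_LENGTH) == 0;
}
```

A source string longer than NAME_LENGTH is silently truncated here; whether truncation is acceptable or should be reported as an error is a separate design decision.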