Recipe 1.14. Handling International Encodings

Problem

You need to handle strings that contain non-ASCII characters: probably Unicode characters encoded in UTF-8.

Solution

To use Unicode in Ruby 1.8, simply add the following to the beginning of your code:
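On Ruby 1.8 those are two lines:

- $KCODE = 'u' tells Ruby to treat strings and regular expressions as UTF-8.
- require 'jcode' loads the multibyte-aware String methods.

Ruby 1.9 and later ignore $KCODE and no longer ship jcode, because every string carries its own encoding. The sketch below therefore wraps both lines in a feature check. The check relies on String#encoding, which only exists from 1.9 on. This guard is our own assumption for code that must load on both versions; it is not part of the recipe.

```ruby
# Ruby 1.8 setup from the recipe, wrapped in a feature check (an assumption
# added here): String#encoding exists only in Ruby 1.9+, where $KCODE is
# ignored and jcode no longer ships, so this branch runs only on Ruby 1.8.
unless "".respond_to?(:encoding)
  $KCODE = 'u'     # treat strings and regexps as UTF-8
  require 'jcode'  # multibyte-aware String methods (jlength, jcount, ...)
end
```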
You can also invoke the Ruby interpreter with arguments that do the same thing:
If you use a Unix environment, you can add the arguments to the shebang line of your Ruby application:
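On Ruby 1.8 the arguments are:

- -Ku, the command-line form of $KCODE = 'u'.
- -rjcode, which requires jcode.

The command is therefore ruby -Ku -rjcode followed by the script name, and the shebang line becomes #!/usr/bin/ruby -Ku -rjcode.

Ruby 1.9 and later have no jcode library, so a modern script header might look like the sketch below instead. Two points to note:

- The /usr/bin/env path is an assumption about your system.
- The encoding magic comment is only honored on the first line, or on the line right after the shebang. It declares the encoding of the source file itself. UTF-8 is already the default from Ruby 2.0 on, so the comment mainly documents intent.

```ruby
#!/usr/bin/env ruby
# encoding: utf-8
# Hypothetical script header for Ruby 1.9+: no -Ku or -rjcode needed.
# The magic comment above sets the encoding of the literals in this file,
# and __ENCODING__ reports the source encoding Ruby actually used.
letters = "ＡＢＣ" # three fullwidth letters written directly in UTF-8

puts __ENCODING__   # prints UTF-8
puts letters.length # prints 3
```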
The jcode library overrides the String methods that would otherwise split multibyte characters apart, such as succ, tr, delete, squeeze, and chop, and makes them capable of handling multibyte text. The exceptions are String#length, String#count, and String#size, which are not overridden. Instead, jcode defines three new methods: String#jlength, String#jcount, and String#jsize.

Discussion

Consider a UTF-8 string that encodes six Unicode characters, the fullwidth letters U+FF21 to U+FF26, each of which takes three bytes in UTF-8: efbca1 (Ａ), efbca2 (Ｂ), and so on up to efbca6 (Ｆ):
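One way to build the string is from its raw bytes, using \x escapes with one three-byte group per letter. The same expression works on Ruby 1.8 and on later versions. On Ruby 1.9 and later the literals take the source file's encoding, which is assumed here to be UTF-8 (the default since Ruby 2.0).

```ruby
# Six fullwidth letters, U+FF21 (Ａ) to U+FF26 (Ｆ), each spelled out
# as its three UTF-8 bytes.
string = "\xef\xbc\xa1" + "\xef\xbc\xa2" + "\xef\xbc\xa3" +
         "\xef\xbc\xa4" + "\xef\xbc\xa5" + "\xef\xbc\xa6"

puts string # shows ＡＢＣＤＥＦ on a UTF-8 terminal
```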
The string contains 18 bytes that encode 6 characters:
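On Ruby 1.8 with jcode loaded, string.size returns 18 and string.jsize returns 6. Ruby 1.9 and later dropped jsize and jlength, because length and size already count characters there. The byte count moved to String#bytesize. The modern equivalents look roughly like this:

```ruby
# Assumes Ruby 1.9+, where strings know their own encoding.
string = "\xef\xbc\xa1\xef\xbc\xa2\xef\xbc\xa3\xef\xbc\xa4\xef\xbc\xa5\xef\xbc\xa6"

puts string.bytesize # 18: raw bytes, what size reported on Ruby 1.8
puts string.length   # 6: characters, what jlength reported on Ruby 1.8
puts string.size     # 6: size is the same as length
```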
String#count takes a string of bytes, treats it as a set, and counts how many bytes in the string belong to that set. String#jcount takes a string of characters and counts how many times those characters occur in the string:
String#count treats "\xef\xbc\xa2" as three separate bytes and counts the number of times each of those bytes shows up in the string. String#jcount treats the same string as a single character and looks for that character in the string, finding it only once.
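On Ruby 1.8 with jcode loaded, the two calls give:

- string.count("\xef\xbc\xa2") returns 13: six ef bytes, six bc bytes, and one a2 byte.
- string.jcount("\xef\xbc\xa2") returns 1.

In Ruby 1.9 and later, jcount is gone, and String#count on a UTF-8 string already works per character, as jcount did. To get the old byte-level answer, one option is to count on binary copies made with String#b (Ruby 2.0+). This is a sketch of that approach, not something the recipe itself uses.

```ruby
# Assumes Ruby 2.0+ (String#b) and a UTF-8 source file.
string = "\xef\xbc\xa1\xef\xbc\xa2\xef\xbc\xa3\xef\xbc\xa4\xef\xbc\xa5\xef\xbc\xa6"
target = "\xef\xbc\xa2" # the fullwidth letter Ｂ (U+FF22)

# Character-level count: on a UTF-8 string, count sees target as one character.
char_count = string.count(target)

# Byte-level count: String#b returns an ASCII-8BIT (binary) copy, so count
# sees the bytes ef, bc, and a2 separately, like Ruby 1.8's String#count.
byte_count = string.b.count(target.b)

puts char_count # 1
puts byte_count # 13
```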
Apart from these differences, Ruby 1.8 with jcode loaded handles most Unicode behind the scenes: once your data is in UTF-8, you rarely have to worry about it. Given that Ruby's creator, Yukihiro Matsumoto, is Japanese, it is no wonder that Ruby handles Unicode so elegantly.

See Also
Wednesday, October 28, 2009