The most common sorting style is code point sorting, which is culture insensitive. This type of sorting doesn't respect culture-specific ordering rules such as radical order, but it is the fastest way to sort.
For example:
The character 'E' has code point 0x45 and the character 'a' has code point 0x61. If we compare or sort the characters by code point, 'E' comes before 'a'. This contradicts the dictionary order we expect, in which 'a' always comes before 'E'.
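The comparison above can be checked with a small console program. This is only a minimal sketch: it assumes a Delphi console application, and it uses the SysUtils functions CompareStr (ordinal comparison) and AnsiCompareStr (comparison using the current user locale), neither of which appears in the original text.

```pascal
program CodePointOrder;

{$APPTYPE CONSOLE}

uses
  SysUtils;

begin
  // Ord returns the ordinal value of a character: 'E' is $45 and 'a' is $61.
  Writeln(Format('E = 0x%x, a = 0x%x', [Ord('E'), Ord('a')]));

  // CompareStr compares ordinal values, so a negative result here means
  // 'E' sorts before 'a' in code point order.
  if CompareStr('E', 'a') < 0 then
    Writeln('Code point order: E before a');

  // AnsiCompareStr uses the current user locale. The result depends on the
  // machine's settings, so no particular output is assumed here.
  if AnsiCompareStr('a', 'E') < 0 then
    Writeln('Locale order: a before E')
  else
    Writeln('Locale order: a not before E');
end.
```

On a typical Western locale the second comparison is expected to put 'a' first, but that depends on the user's regional settings.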
Another example is Chinese characters, whose conventional sort order depends on phonetics (pinyin) or on the number of pen strokes. Sorting by code point doesn't make much sense for Chinese characters.
The following chart shows some Chinese characters sorted by Unicode code point, which is culture insensitive:
Ideograph | Hanyu Pinyin (Phonetic) | Stroke Count | Unicode Code Point |
一 | yi | 1 | 0x4E00 |
丁 | ding | 2 | 0x4E01 |
上 | shang | 3 | 0x4E0A |
且 | qie | 5 | 0x4E14 |
人 | ren | 2 | 0x4EBA |
We can use the Windows API function CompareString to perform culture-sensitive comparisons when sorting.
var
  L: DWORD;
  R: Integer;
  Str1, Str2: string;
begin
  ...
  // For Stroke Count Order
  L := MAKELCID(MAKELANGID(LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED), SORT_CHINESE_PRC);
  // R is CSTR_LESS_THAN (1), CSTR_EQUAL (2) or CSTR_GREATER_THAN (3); 0 means the call failed
  R := CompareString(L, 0, PChar(Str1), Length(Str1), PChar(Str2), Length(Str2));
  // For Phonetic Order
  L := MAKELCID(MAKELANGID(LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED), SORT_CHINESE_PRCP);
  R := CompareString(L, 0, PChar(Str1), Length(Str1), PChar(Str2), Length(Str2));
  ...
  // For Ordinal Comparison (code point comparison, culture insensitive)
  // R is negative, zero or positive
  R := StrComp(PChar(Str1), PChar(Str2));
end;
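To sort a whole list rather than compare two strings, the CompareString call can be wrapped in a comparison callback. The following is a minimal sketch, not a definitive implementation: it assumes a Unicode Delphi (2009 or later) console application on Windows, and the names SortLocale and CompareByLocale are hypothetical helpers introduced only for this illustration. It sorts the five ideographs from the tables in this article and prints their code points in the resulting order.

```pascal
program ChineseSortSketch;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes;

var
  // Hypothetical global: the locale that CompareByLocale sorts with.
  SortLocale: LCID;

// Hypothetical helper matching the TStringListSortCompare signature.
function CompareByLocale(List: TStringList; Index1, Index2: Integer): Integer;
var
  S1, S2: string;
begin
  S1 := List[Index1];
  S2 := List[Index2];
  // CompareString returns 1, 2 or 3 on success; subtracting 2 gives -1, 0 or 1.
  // A failed call (return value 0) is not handled in this sketch.
  Result := CompareString(SortLocale, 0, PChar(S1), Length(S1),
    PChar(S2), Length(S2)) - 2;
end;

var
  List: TStringList;
  I: Integer;
begin
  List := TStringList.Create;
  try
    // The five ideographs from the tables, written as code points.
    List.Add(#$4E00);
    List.Add(#$4E01);
    List.Add(#$4E0A);
    List.Add(#$4E14);
    List.Add(#$4EBA);

    // Stroke count order; use SORT_CHINESE_PRCP instead for phonetic order.
    SortLocale := MAKELCID(MAKELANGID(LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED),
      SORT_CHINESE_PRC);
    List.CustomSort(CompareByLocale);

    // Print code points rather than glyphs, since the console may not
    // be able to display Chinese characters.
    for I := 0 to List.Count - 1 do
      Writeln(Format('0x%x', [Ord(List[I][1])]));
  finally
    List.Free;
  end;
end.
```

A global variable carries the locale because the TStringList.CustomSort callback has no parameter for extra context. If the sort IDs are supported on the running Windows version, the printed order should correspond to the stroke count table in this article.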
Stroke Count Order:
Ideograph | Hanyu Pinyin (Phonetic) | Stroke Count | Unicode Code Point |
一 | yi | 1 | 0x4E00 |
丁 | ding | 2 | 0x4E01 |
人 | ren | 2 | 0x4EBA |
上 | shang | 3 | 0x4E0A |
且 | qie | 5 | 0x4E14 |
Phonetic Order:
Ideograph | Hanyu Pinyin (Phonetic) | Stroke Count | Unicode Code Point |
丁 | ding | 2 | 0x4E01 |
且 | qie | 5 | 0x4E14 |
人 | ren | 2 | 0x4EBA |
上 | shang | 3 | 0x4E0A |
一 | yi | 1 | 0x4E00 |