March 15, 2012

(C++) Return Email Headers as iso-8859-15? (or any other charset)

Filed under: charset, cpp | admin @ 2:58 pm

Question:

In C++, is it somehow possible to specify a desired charset (like ISO-8859-15) when getting mail headers with POP3?

Answer:

Instead of calling the method that returns a “const char *”, which returns either utf-8 or ANSI depending on the Utf8 property common to all Chilkat C++ classes (see the Chilkat blog post about the Utf8 property), call the alternate method that returns the string in a CkString object. You can then get the iso-8859-15 string from the CkString object.

Each Chilkat C++ method that returns a string has two versions: an upper-case version (the method name begins with an upper-case letter) that returns the string in a CkString passed as the last argument, and a lower-case version that returns a “const char *”.

For example, in the CkEmail class:

    bool GetHeaderField(const char *fieldName, CkString &outFieldValue);
    const char *getHeaderField(const char *fieldName);

The lower-case method returning a “const char *” returns a pointer to memory that may be overwritten by subsequent calls. Therefore, copy the string to a safe place immediately, before making any additional calls on the same Chilkat object instance. (Only methods that also return “const char *” overwrite the memory from a previous call.)

The upper-case version of the method returns the string in a CkString object. The CkString is an output-only argument, meaning its contents are replaced, not appended to. To get the iso-8859-15 string from the CkString, call its getEnc method. For example:

    const char *str_iso_8859_15 = outFieldValue.getEnc("iso-8859-15");

This returns a null-terminated string in which each character is represented as a single byte in the iso-8859-15 encoding.

September 12, 2009

BASE64 Decode with Charset GB2312

Filed under: C#, Encoding, charset | admin @ 7:57 am

Question:

I have a Base64 decode error, as follows:

    CkString str;
    str.setString("16q");
    str.base64Decode("gb2312");
    const char *strResult = str.getString();

The converted result is { cb f2 }, but the correct result should be { d7 aa }.

What’s wrong?

The platform is WinCE 6.0, using Chilkat_PPC_M5.lib.

Answer:

The following code shows how to do it correctly:

    #include <stdio.h>
    #include "CkString.h"
    #include "CkByteData.h"

    int main()
    {
        CkString str;
        str.setString("16q");

        // The following line of code tells the CkString object to
        // decode the base64 to raw bytes, then interpret those bytes as
        // GB2312 encoded characters and store them within the string.
        // Internally, the string is stored as utf-8.
        str.base64Decode("gb2312");

        // getString returns either ANSI or utf-8, depending on the
        // Utf8 property (get_Utf8/put_Utf8). With Utf8 off, this is an
        // implicit conversion to the system's ANSI code page, which is
        // not necessarily GB2312.
        const char *strAnsi = str.getString();

        // Instead, fetch the string as GB2312 bytes:
        const char *strGb2312 = str.getEnc("gb2312");

        const unsigned char *c = (const unsigned char *) strGb2312;
        while (*c != '\0') { printf("%02x ", *c); c++; }
        printf("\n");

        // The output is "d7 aa "

        // Another way to decode using CkByteData...
        CkByteData data;
        data.appendEncoded("16q", "base64");
        c = data.getData();
        unsigned long i;
        unsigned long sz = data.getSize();
        for (i = 0; i < sz; i++) { printf("%02x ", *c); c++; }
        printf("\n");

        // The output is "d7 aa "
        return 0;
    }

September 10, 2008

Visual Basic Font.Charset Property

Filed under: character | admin @ 7:14 am
Charset Name          Charset Value (Hex)  Charset Value (Decimal)  Code-Page ID
ANSI_CHARSET          0x00                 0                        1252
DEFAULT_CHARSET       0x01                 1
SYMBOL_CHARSET        0x02                 2
SHIFTJIS_CHARSET      0x80                 128                      932
HANGUL_CHARSET        0x81                 129                      949
GB2312_CHARSET        0x86                 134                      936
CHINESEBIG5_CHARSET   0x88                 136                      950
GREEK_CHARSET         0xA1                 161                      1253
TURKISH_CHARSET       0xA2                 162                      1254
HEBREW_CHARSET        0xB1                 177                      1255
ARABIC_CHARSET        0xB2                 178                      1256
BALTIC_CHARSET        0xBA                 186                      1257
RUSSIAN_CHARSET       0xCC                 204                      1251
THAI_CHARSET          0xDE                 222                      874
EE_CHARSET            0xEE                 238                      1250
OEM_CHARSET           0xFF                 255