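MySQL's default character set is latin1. By default, the httpd module we compiled can read a database of that type correctly, with no garbled characters. However, if the database uses a different character set, such as UTF8, the data read back by default is garbled, and no parameter we set has any effect. (When reposting, please credit breaksoftware's CSDN blog.)
Let's look at an example of a utf8 database. Use the following statement to view the character sets: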
SHOW VARIABLES LIKE 'character_set_%';
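From the value of character_set_database, we can tell that this database uses utf8. To read from it, we therefore need to specify the utf8 character set. In other languages, this is usually done with a setting like:
"charset=utf8"
Let's try adding this to the parameters used to connect to the database: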
status = apr_dbd_open(driver, pool_db, "host=localhost;user=user_name;pass=password;dbname=database_name;charset=utf8", &handle);
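This API call succeeds, but the results read back are still garbled! That made no sense, so I browsed the database-related functions in APR and found no dedicated interface for setting the character set. Presumably the apr-util library is just a wrapper around the complex interface of libmysql++-dev, so one possibility is that apr-util's implementation is incomplete. Let's read the implementation behind apr_dbd_open: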
struct {
    const char *field;
    const char *value;
} fields[] = {
    {"host", NULL},
    {"user", NULL},
    {"pass", NULL},
    {"dbname", NULL},
    {"port", NULL},
    {"sock", NULL},
    {"flags", NULL},
    {"fldsz", NULL},
    {"group", NULL},
    {"reconnect", NULL},
    {NULL, NULL}
};
unsigned int port = 0;
apr_dbd_t *sql = apr_pcalloc(pool, sizeof(apr_dbd_t));
sql->fldsz = FIELDSIZE;
sql->conn = mysql_init(sql->conn);
if ( sql->conn == NULL ) {
    return NULL;
}
for (ptr = strchr(params, '='); ptr; ptr = strchr(ptr, '=')) {
    /* don't dereference memory that may not belong to us */
    if (ptr == params) {
        ++ptr;
        continue;
    }
    for (key = ptr-1; apr_isspace(*key); --key);
    klen = 0;
    while (apr_isalpha(*key)) {
        /* don't parse backwards off the start of the string */
        if (key == params) {
            --key;
            ++klen;
            break;
        }
        --key;
        ++klen;
    }
    ++key;
    for (value = ptr+1; apr_isspace(*value); ++value);
    vlen = strcspn(value, delims);
    for (i = 0; fields[i].field != NULL; i++) {
        if (!strncasecmp(fields[i].field, key, klen)) {
            fields[i].value = apr_pstrndup(pool, value, vlen);
            break;
        }
    }
    ptr = value+vlen;
}
if (fields[4].value != NULL) {
    port = atoi(fields[4].value);
}
if (fields[6].value != NULL &&
    !strcmp(fields[6].value, "CLIENT_FOUND_ROWS")) {
    flags |= CLIENT_FOUND_ROWS; /* only option we know */
}
if (fields[7].value != NULL) {
    sql->fldsz = atol(fields[7].value);
}
if (fields[8].value != NULL) {
    mysql_options(sql->conn, MYSQL_READ_DEFAULT_GROUP, fields[8].value);
}
#if MYSQL_VERSION_ID >= 50013
if (fields[9].value != NULL) {
    do_reconnect = atoi(fields[9].value) ? 1 : 0;
}
#endif
#if MYSQL_VERSION_ID >= 50013
/* the MySQL manual says this should be BEFORE mysql_real_connect */
mysql_options(sql->conn, MYSQL_OPT_RECONNECT, &do_reconnect);
#endif
real_conn = mysql_real_connect(sql->conn, fields[0].value,
                               fields[1].value, fields[2].value,
                               fields[3].value, port,
                               fields[5].value, flags);
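A quick read of this code leads to the following observations: the parser only recognizes the keys listed in fields[], so an unknown key such as charset is silently ignored; and mysql_options is called only for MYSQL_READ_DEFAULT_GROUP and MYSQL_OPT_RECONNECT, never for a character set.
The MySQL documentation (http://dev.mysql.com/doc/refman/5.7/en/mysql-real-connect.html) contains the following passage: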
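To make concrete why charset=utf8 has no effect, here is a minimal standalone sketch (plain C, no APR) of how a table-driven parser of this kind drops a key it does not know. The names known_keys, find_key, and count_ignored are hypothetical and are not part of apr-util. The matching rule (a case-insensitive comparison over the key's length) mirrors the loop quoted above, while the tokenizing is deliberately simplified and assumes ';' is the only delimiter.

```c
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Hypothetical, trimmed-down copy of the driver's key table. */
static const char *const known_keys[] = {
    "host", "user", "pass", "dbname", "port",
    "sock", "flags", "fldsz", "group", "reconnect", NULL
};

/* Returns the index of key in known_keys, or -1 if it is not recognized.
 * As in the quoted loop, only the first klen characters are compared,
 * case-insensitively. */
static int find_key(const char *key, size_t klen)
{
    for (int i = 0; known_keys[i] != NULL; i++) {
        if (strncasecmp(known_keys[i], key, klen) == 0) {
            return i;
        }
    }
    return -1;
}

/* Walks a "key=value;key=value" string and counts the keys that
 * match nothing in the table and would therefore be dropped. */
static int count_ignored(const char *params)
{
    int ignored = 0;
    const char *p = params;

    while (*p != '\0') {
        size_t pair_len = strcspn(p, ";");          /* one "key=value" chunk */
        const char *eq = memchr(p, '=', pair_len);  /* '=' inside this chunk */

        if (eq != NULL && eq != p && find_key(p, (size_t)(eq - p)) < 0) {
            ignored++;
        }
        p += pair_len;
        if (*p == ';') {
            p++;                                    /* skip the delimiter */
        }
    }
    return ignored;
}
```

Under these assumptions, count_ignored on the connection string used earlier reports exactly one dropped key, charset, and nothing tells the caller that it was discarded.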
The user and passwd parameters use whatever character set has been configured for the MYSQL object. By default, this is latin1, but can be changed by calling mysql_options(mysql, MYSQL_SET_CHARSET_NAME, "charset_name") prior to connecting.
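So we need to call mysql_options with MYSQL_SET_CHARSET_NAME to set the character set. However, a search of the entire apr-util library shows that mysql_options is used only in apr_dbd_open, and MYSQL_SET_CHARSET_NAME does not appear anywhere. We can conclude that apr-util does not yet implement "character set selection", so we need to patch the code by hand (/usr/src/apr-util-1.5.4/dbd/apr_dbd_mysql.c):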
struct {
    const char *field;
    const char *value;
} fields[] = {
    {"host", NULL},
    {"user", NULL},
    {"pass", NULL},
    {"dbname", NULL},
    {"port", NULL},
    {"sock", NULL},
    {"flags", NULL},
    {"fldsz", NULL},
    {"group", NULL},
    {"reconnect", NULL},
    {"charset", NULL},
    {NULL, NULL}
};
First add the field to be parsed, then insert the following before the call to mysql_real_connect:
if (fields[10].value != NULL) {
    mysql_options(sql->conn, MYSQL_SET_CHARSET_NAME, fields[10].value);
}
With that, after recompiling apr-util and httpd, our module supports selecting the database character set.
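The patch reads the new value through the hard-coded index fields[10], which would silently pick up the wrong entry if the table were ever reordered. As one possible alternative, the sketch below looks values up by name. It is standalone C rather than apr-util code: struct field, parse_params, get_field, and VALUE_MAX are hypothetical names, and the parser is simplified (fixed-size buffers, ';' as the only delimiter, exact-length key matching).

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strncasecmp, strcasecmp (POSIX) */

#define VALUE_MAX 64  /* hypothetical limit for this sketch */

struct field {
    const char *name;       /* key name; NULL terminates the table */
    char value[VALUE_MAX];  /* parsed value, valid only when is_set != 0 */
    int is_set;
};

/* Fills tbl from a "key=value;key=value" string; unknown keys are skipped. */
static void parse_params(struct field *tbl, const char *params)
{
    const char *p = params;

    while (*p != '\0') {
        size_t pair_len = strcspn(p, ";");
        const char *eq = memchr(p, '=', pair_len);

        if (eq != NULL && eq != p) {
            size_t klen = (size_t)(eq - p);
            size_t vlen = pair_len - klen - 1;

            for (struct field *f = tbl; f->name != NULL; f++) {
                if (strlen(f->name) == klen &&
                    strncasecmp(f->name, p, klen) == 0 &&
                    vlen < VALUE_MAX) {
                    memcpy(f->value, eq + 1, vlen);
                    f->value[vlen] = '\0';
                    f->is_set = 1;
                    break;
                }
            }
        }
        p += pair_len;
        if (*p == ';') {
            p++;  /* skip the delimiter */
        }
    }
}

/* Looks a value up by name, so adding a key never shifts an index.
 * Returns NULL when the key is unknown or was not supplied. */
static const char *get_field(const struct field *tbl, const char *name)
{
    for (const struct field *f = tbl; f->name != NULL; f++) {
        if (strcasecmp(f->name, name) == 0) {
            return f->is_set ? f->value : NULL;
        }
    }
    return NULL;
}
```

With a table that contains a "charset" entry, parsing the connection string shown earlier makes get_field(tbl, "charset") return "utf8", which is the value the patch would hand to mysql_options with MYSQL_SET_CHARSET_NAME.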