Problem downloading a data set from a wlatin1 environment to a UTF-8 environment

Stack Overflow user
Asked on 2022-09-16 12:38:18
Answers: 3  Views: 229  Followers: 0  Votes: 0

I am using PROC DOWNLOAD to download a data set from a server whose SAS session uses wlatin1 encoding to a server that uses UTF-8 encoding.

I get this error:

ERROR: Some character data was lost during transcoding in the data set
       libref.datasetname. Either the data contains characters that are not
       representable in the new encoding or truncation occurred during transcoding.

I tried setting inencoding='utf8' and inencoding='asciiany' on the input data set, but neither worked (possibly because the wlatin1 server runs SAS 9.3 while the UTF-8 server runs SAS 9.4).

Rewriting the file as in the code below (and then running PROC DOWNLOAD on myoutput) works, but I wonder whether there is a more elegant way to do the same thing.

data myoutput;
  set pathin.myinput;
  /* Transliterate accented I to plain I */
  des_nome = tranwrd(des_nome,'CD'x,'I');
  des_nome = tranwrd(des_nome,'ED'x,'i');
  /* Transliterate accented A to plain A */
  des_nome = tranwrd(des_nome,'C1'x,'A');
  des_nome = tranwrd(des_nome,'E1'x,'a');
  /* Transliterate accented E to plain E */
  des_nome = tranwrd(des_nome,'C9'x,'E');
  des_nome = tranwrd(des_nome,'E9'x,'e');
  /* Transliterate accented O to plain O */
  des_nome = tranwrd(des_nome,'D2'x,'O');
  des_nome = tranwrd(des_nome,'D3'x,'O');
  des_nome = tranwrd(des_nome,'D6'x,'O');
  des_nome = tranwrd(des_nome,'F3'x,'o');
  /* Transliterate accented U to plain U */
  des_nome = tranwrd(des_nome,'DC'x,'U');
  des_nome = tranwrd(des_nome,'F9'x,'u');
  /* Transliterate accented Y to plain Y */
  des_nome = tranwrd(des_nome,'DD'x,'Y');
  des_nome = tranwrd(des_nome,'FD'x,'y');
  /* Transliterate stray accents to ' */
  des_nome = tranwrd(des_nome,'B4'x,"'");
  /* Transliterate odd symbols to spaces */
  des_nome = tranwrd(des_nome,'A7'x,' ');  /* § in NOME */
  des_nome = tranwrd(des_nome,'A3'x,' ');  /* £ in NOME */
  cod_cap_res = tranwrd(cod_cap_res,'A3'x,' '); /* £ in CAP (postal code) */
run;

Stack Overflow user

Posted on 2022-09-16 19:19:43

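The problem is that the storage length of at least one variable in the original data set is too short to hold the longer UTF-8 representation of at least one of its strings. In WLATIN1 every character occupies one byte, but in UTF-8 any character above 0x7F needs two bytes (three for some, such as the euro sign), so a value that exactly fills its variable in WLATIN1 can overflow it after transcoding.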
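To see the size of the effect, the Python sketch below (an illustration, not SAS code) compares how many bytes a value needs in each encoding, using Python's cp1252 codec as a stand-in for WLATIN1 (Windows-1252). The function names are hypothetical. SAS character lengths are counted in bytes, so a value fits a variable of length n only if its byte count in the session encoding is at most n.

```python
"""Compare storage needs of the same text in WLATIN1 and UTF-8 (illustration)."""
from typing import Tuple


def byte_lengths(text: str) -> Tuple[int, int]:
    """Return (bytes in WLATIN1, bytes in UTF-8); cp1252 stands in for WLATIN1."""
    return len(text.encode("cp1252")), len(text.encode("utf-8"))


def fits(text: str, length: int, encoding: str = "utf-8") -> bool:
    """True if text fits a character variable of `length` bytes in `encoding`."""
    return len(text.encode(encoding)) <= length


if __name__ == "__main__":
    for value in ["Rossi", "Pérez", "€ 10"]:
        print(value, byte_lengths(value))  # (5, 5), (5, 6), (4, 6)
    # "Pérez" fits a 5-byte variable in WLATIN1 but not in UTF-8.
    print(fits("Pérez", 5, "cp1252"), fits("Pérez", 5))  # True False
```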

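Here is a simple way to demonstrate the problem. In a session that uses WLATIN1 (or LATIN1) encoding, create a small data set containing all 256 possible characters.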

340  %put %sysfunc(getoption(encoding,keyword));
ENCODING=WLATIN1
341  data 'c:\downloads\wlatin1.sas7bdat';
342    string = collate(0,256);
343  run;

NOTE: The data set c:\downloads\wlatin1.sas7bdat has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
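The arithmetic behind this demonstration can be checked outside SAS. The Python sketch below builds a string of all 256 byte values, decodes it as Latin-1, and counts the UTF-8 bytes. Latin-1 is used because Python's strict cp1252 codec rejects the few byte values that Windows-1252 leaves undefined. The 128 ASCII characters stay one byte each, while the other 128 take two, so a 256-byte value needs 384 bytes once transcoded.

```python
"""Size of all 256 single-byte characters after conversion to UTF-8 (illustration)."""

ALL_BYTES = bytes(range(256))        # 0x00 .. 0xFF, one byte each
text = ALL_BYTES.decode("latin-1")   # Latin-1 maps every byte to U+0000..U+00FF
utf8 = text.encode("utf-8")

# Characters below 0x80 keep a one-byte UTF-8 encoding.
ascii_part = sum(1 for ch in text if ord(ch) < 0x80)
print(len(ALL_BYTES), len(utf8), ascii_part)  # 256 384 128
```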

Now try to read it in a session that uses UTF-8 encoding. The transcoding fails even when the variable is made longer in the output data set.

38   data test1;
39     length string $1024;
40     set 'c:\downloads\wlatin1.sas7bdat';
NOTE: Data file WC000001.WLATIN1.DATA is in a format that is native to another host, or the file encoding does not match the
      session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce
      performance.
41   run;

ERROR: Some character data was lost during transcoding in the dataset WC000001.WLATIN1. Either the data contains characters that
       are not representable in the new encoding or truncation occurred during transcoding.
NOTE: The DATA step has been abnormally terminated.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEST1 may be incomplete.  When this step was stopped there were 0 observations and 1 variables.
WARNING: Data set WORK.TEST1 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

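So, to fix the problem, read the file with encoding='any', so that SAS does not transcode it automatically, and then convert the strings from WLATIN1 to UTF-8 yourself with KCVT, into a variable that is long enough to hold the converted value.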

42   data test;
43     length string $1024;
44     set 'c:\downloads\wlatin1.sas7bdat' (encoding='any');
45     string = kcvt(string,'wlatin1','utf-8');
46   run;

NOTE: There were 1 observations read from the data set c:\downloads\wlatin1.sas7bdat.
NOTE: The data set WORK.TEST has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
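The fix has two parts: read the raw bytes without automatic transcoding, then convert them explicitly into a target that is long enough. A rough Python analogue of the second step is sketched below, with cp1252 standing in for WLATIN1. The helper `convert_to_length` is hypothetical; it raises an error when the converted value would not fit the target length, which is the situation the SAS error message reports.

```python
"""Explicit WLATIN1 -> UTF-8 conversion with a length check (illustration)."""


def convert_to_length(raw: bytes, length: int,
                      source: str = "cp1252", target: str = "utf-8") -> bytes:
    """Transcode raw bytes and fail loudly if they exceed `length` bytes.

    Mirrors the idea of reading the bytes unchanged and then converting
    them into a variable that is large enough for the result.
    """
    converted = raw.decode(source).encode(target)
    if len(converted) > length:
        raise ValueError(
            f"needs {len(converted)} bytes, target holds only {length}")
    return converted


if __name__ == "__main__":
    raw = b"P\xe9rez"                    # "Pérez" as stored in WLATIN1
    print(convert_to_length(raw, 1024))  # b'P\xc3\xa9rez'
    try:
        convert_to_length(raw, 5)
    except ValueError as err:
        print(err)                       # needs 6 bytes, target holds only 5
```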
Votes: 1
Original page content provided by Stack Overflow; translation support by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/73744935
