Question: Problem downloading a data set from a wlatin1 environment to a UTF-8 environment

Stack Overflow user

Asked 2022-09-16 12:38:18

Answers: 3, Views: 229, Followers: 0, Votes: 0

I'm downloading a data set (in SAS, using proc download) from a server that uses the wlatin1 encoding to a server that uses the UTF-8 encoding.

I get this error:

ERROR: Some character data was lost during transcoding in the data set
       libref.datasetname. Either the data contains characters that are not
       representable in the new encoding or truncation occurred during transcoding.

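I tried setting inencoding='utf8' or inencoding='asciiany' on the input data set, but it doesn't work (possibly because the wlatin1 server runs SAS 9.3 while the UTF-8 server runs SAS 9.4).

Rewriting the file as in the code below (and then running proc download on myoutput) works, but I'd like to know whether there is a more elegant way to do the same thing.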

data myoutput;
set pathin.myinput;
/* Transliterate accented I to plain I */
des_nome = tranwrd(des_nome,'CD'x,'I');
des_nome = tranwrd(des_nome,'ED'x,'i');
/* Transliterate accented A to plain A */
des_nome = tranwrd(des_nome,'C1'x,'A');
des_nome = tranwrd(des_nome,'E1'x,'a');
/* Transliterate accented E to plain E */
des_nome = tranwrd(des_nome,'C9'x,'E');
des_nome = tranwrd(des_nome,'E9'x,'e');
/* Transliterate accented O to plain O */
des_nome = tranwrd(des_nome,'D2'x,'O');
des_nome = tranwrd(des_nome,'D3'x,'O');
des_nome = tranwrd(des_nome,'D6'x,'O');
des_nome = tranwrd(des_nome,'F3'x,'o');
/* Transliterate accented U to plain U */
des_nome = tranwrd(des_nome,'DC'x,'U');
des_nome = tranwrd(des_nome,'F9'x,'u');
/* Transliterate accented Y to plain Y */
des_nome = tranwrd(des_nome,'DD'x,'Y');
des_nome = tranwrd(des_nome,'FD'x,'y');
/* Transliterate stray accent marks to ' */
des_nome = tranwrd(des_nome,'B4'x,"'");
/* Transliterate stray symbols to spaces */
des_nome = tranwrd(des_nome,'A7'x,' ');  /* § in the name (des_nome) */
des_nome = tranwrd(des_nome,'A3'x,' ');  /* £ in the name (des_nome) */
cod_cap_res = tranwrd(cod_cap_res,'A3'x,' '); /* £ in the postal code (cod_cap_res) */
run;

encoding

character-encoding

sas

Stack Overflow user

Answered 2022-09-16 19:19:43

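The problem is that the storage length of at least one variable in the original data set is too short to hold the expanded length that the UTF-8 representation of at least one of its strings requires.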
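To make the expansion concrete: in WLATIN1 the character é is the single byte E9, while in UTF-8 it takes the two bytes C3 A9, so a variable that was exactly full in WLATIN1 no longer fits once transcoded. The following is only a minimal sketch, assuming it is run in a UTF-8 session; the variable names latin, utf8, latin_bytes and utf8_bytes are made up for illustration.

```sas
/* Sketch only: compare the byte length of one character in both encodings. */
/* Assumes a UTF-8 session; all variable names here are illustrative.       */
data _null_;
  length latin $1 utf8 $4;
  latin = 'E9'x;                            /* e-acute as one WLATIN1 byte  */
  utf8  = kcvt(latin, 'wlatin1', 'utf-8');  /* same character, UTF-8 bytes  */
  latin_bytes = length(latin);              /* LENGTH counts bytes          */
  utf8_bytes  = length(utf8);
  put latin_bytes= utf8_bytes= utf8= $hex8.; /* write byte counts and hex to the log */
run;
```

Any character outside the 7-bit ASCII range grows this way, which is why only some rows of a data set trigger the error.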

Here is a simple way to demonstrate the problem. Create a simple file that contains all 256 possible characters of the WLATIN1 (or LATIN1) encoding.

340  %put %sysfunc(getoption(encoding,keyword));
ENCODING=WLATIN1
341  data 'c:\downloads\wlatin1.sas7bdat';
342    string = collate(0,256);
343  run;

NOTE: The data set c:\downloads\wlatin1.sas7bdat has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

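Now try to read it in a session that uses the UTF-8 encoding. The transcoding fails even if you make the variable longer in the output data set.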

38   data test1;
39     length string $1024;
40     set 'c:\downloads\wlatin1.sas7bdat';
NOTE: Data file WC000001.WLATIN1.DATA is in a format that is native to another host, or the file encoding does not match the
      session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce
      performance.
41   run;

ERROR: Some character data was lost during transcoding in the dataset WC000001.WLATIN1. Either the data contains characters that
       are not representable in the new encoding or truncation occurred during transcoding.
NOTE: The DATA step has been abnormally terminated.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEST1 may be incomplete.  When this step was stopped there were 0 observations and 1 variables.
WARNING: Data set WORK.TEST1 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

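So to fix it, read the file with the ENCODING='ANY' data set option, which suppresses the automatic transcoding, and then convert the strings from WLATIN1 to UTF-8 yourself.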

42   data test;
43     length string $1024;
44     set 'c:\downloads\wlatin1.sas7bdat' (encoding='any');
45     string = kcvt(string,'wlatin1','utf-8');
46   run;

NOTE: There were 1 observations read from the data set c:\downloads\wlatin1.sas7bdat.
NOTE: The data set WORK.TEST has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
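The same pattern could be applied to the data set from the question by looping over all character variables. This is a hedged sketch, not a tested solution: it assumes the WLATIN1 file is readable from the UTF-8 session through the libref pathin, the output name myoutput_utf8 is made up, and the lengths $200 and $20 are placeholders that must be replaced with values large enough for the real data (for example, twice the lengths that proc contents reports).

```sas
/* Sketch only: transcode every character variable from WLATIN1 to UTF-8.   */
/* Assumptions: pathin.myinput is readable from a UTF-8 session; the        */
/* lengths below are placeholders, not the real lengths of these variables. */
data myoutput_utf8;
  /* Lengthen the variables BEFORE the SET statement so the longer          */
  /* UTF-8 values are not truncated.                                        */
  length des_nome $200 cod_cap_res $20;
  /* ENCODING='ANY' reads the bytes as they are, without transcoding.       */
  set pathin.myinput (encoding='any');
  array _chars {*} _character_;
  do _i = 1 to dim(_chars);
    _chars{_i} = kcvt(_chars{_i}, 'wlatin1', 'utf-8');
  end;
  drop _i;
run;
```

Character variables that are not listed in the length statement keep their original lengths, so values in them that contain accented characters may still be truncated after conversion.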

Votes: 1


Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/73744935
