I'm using proc download to move a dataset from a server whose SAS session uses wlatin1 encoding to a server that uses UTF-8 encoding.
I get this error:
ERROR: Some character data was lost during transcoding in the data set
libref.datasetname. Either the data contains characters that are not
representable in the new encoding or truncation occurred during transcoding.
I tried setting inencoding='utf8' or inencoding='asciiany' on the input dataset, but it did not work (possibly because the wlatin1 server runs SAS 9.3 and the UTF-8 server runs SAS 9.4).
Rewriting the file as in the code below (and then running proc download on myoutput) works, but I'd like to know whether there is a more elegant way to do the same thing.
data myoutput;
set pathin.myinput;
/*Transliterate accented I to plain I*/
des_nome = tranwrd(des_nome,'CD'x,'I');
des_nome = tranwrd(des_nome,'ED'x,'i');
/*Transliterate accented A to plain A*/
des_nome = tranwrd(des_nome,'C1'x,'A');
des_nome = tranwrd(des_nome,'E1'x,'a');
/*Transliterate accented E to plain E*/
des_nome = tranwrd(des_nome,'C9'x,'E');
des_nome = tranwrd(des_nome,'E9'x,'e');
/*Transliterate accented O to plain O*/
des_nome = tranwrd(des_nome,'D2'x,'O');
des_nome = tranwrd(des_nome,'D3'x,'O');
des_nome = tranwrd(des_nome,'D6'x,'O');
des_nome = tranwrd(des_nome,'F3'x,'o');
/*Transliterate accented U to plain U*/
des_nome = tranwrd(des_nome,'DC'x,'U');
des_nome = tranwrd(des_nome,'F9'x,'u');
/*Transliterate accented Y to plain Y*/
des_nome = tranwrd(des_nome,'DD'x,'Y');
des_nome = tranwrd(des_nome,'FD'x,'y');
/*Replace stray accent marks with '*/
des_nome = tranwrd(des_nome,'B4'x,"'");
/*Replace odd symbols with spaces*/
des_nome = tranwrd(des_nome,'A7'x,' '); /* § in the name (NOME) */
des_nome = tranwrd(des_nome,'A3'x,' '); /* £ in the name (NOME) */
cod_cap_res = tranwrd(cod_cap_res,'A3'x,' '); /* £ in the postal code (CAP) */
run;
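For reference, the chain of single-character TRANWRD calls above could probably be collapsed with TRANSLATE(source, to, from), which replaces each byte listed in FROM with the byte at the same position in TO. This is a sketch, assuming it runs in the WLATIN1 session where each character is one byte; demo_input, demo_output and the sample values are hypothetical stand-ins for pathin.myinput and myoutput.

```sas
/* Hypothetical stand-in for pathin.myinput, built from WLATIN1 bytes:
   E9 = e-acute, F3 = o-acute, A3 = pound sign */
data demo_input;
  length des_nome $20 cod_cap_res $5;
  des_nome    = 'Jos' || 'E9'x || ' L' || 'F3'x || 'pez';
  cod_cap_res = '12' || 'A3'x || '45';
run;

data demo_output;
  set demo_input;   /* in the question: set pathin.myinput; */
  /* TRANSLATE(source, to, from): the 17 bytes in FROM map position by
     position to the 17 characters in TO:
       CD ED -> I i        C1 E1 -> A a        C9 E9 -> E e
       D2 D3 D6 F3 -> O O O o                  DC F9 -> U u
       DD FD -> Y y        B4 -> apostrophe    A7 A3 -> blank */
  des_nome = translate(des_nome, "IiAaEeOOOoUuYy'  ",
                       'CDEDC1E1C9E9D2D3D6F3DCF9DDFDB4A7A3'x);
  /* in the postal code only the pound sign is replaced */
  cod_cap_res = translate(cod_cap_res, ' ', 'A3'x);
run;
```

Because every substitution in the original code swaps one byte for one byte, the TRANSLATE version should give the same result as the TRANWRD chain. It still has to run before the data leaves the WLATIN1 session, since TRANSLATE works byte by byte.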
Posted on 2022-09-16 19:19:43
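The problem is that the storage length of at least one variable in the original dataset is too short to hold the longer UTF-8 representation of at least one of its strings. Characters 0-127 take one byte in both encodings, but every WLATIN1 character above 127 needs two or three bytes in UTF-8.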
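To see how much a value grows, the following sketch converts a few single WLATIN1 characters with KCVT (the same function used in the fix below) and records their UTF-8 size with LENGTH, which counts bytes rather than characters. The dataset name bytes_needed is hypothetical.

```sas
/* UTF-8 byte counts for a few WLATIN1 characters:
   E9 = e-acute, A3 = pound sign, B4 = acute accent, 80 = euro sign */
data bytes_needed;
  length wl $1 utf8 $4;
  do wl = 'E9'x, 'A3'x, 'B4'x, '80'x;
    utf8   = kcvt(wl, 'wlatin1', 'utf-8');  /* transcode one character */
    nbytes = length(utf8);                  /* LENGTH counts bytes */
    put wl= $hex2. nbytes=;
    output;
  end;
run;
```

é, £ and ´ each need 2 bytes and € needs 3, so a variable that was exactly full in WLATIN1 cannot hold the same text after conversion.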
Here is a simple way to demonstrate the problem. Create a small file containing all 256 possible characters of the WLATIN1 (or LATIN1) encoding.
340 %put %sysfunc(getoption(encoding,keyword));
ENCODING=WLATIN1
341 data 'c:\downloads\wlatin1.sas7bdat';
342 string = collate(0,256);
343 run;
NOTE: The data set c:\downloads\wlatin1.sas7bdat has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
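Now try to read it in a session that uses UTF-8 encoding. Even if you make the variable longer in the output dataset, the transcoding fails, because it happens while the input dataset is read, using the input variable's original length.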
38 data test1;
39 length string $1024;
40 set 'c:\downloads\wlatin1.sas7bdat';
NOTE: Data file WC000001.WLATIN1.DATA is in a format that is native to another host, or the file encoding does not match the
session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce
performance.
41 run;
ERROR: Some character data was lost during transcoding in the dataset WC000001.WLATIN1. Either the data contains characters that
are not representable in the new encoding or truncation occurred during transcoding.
NOTE: The DATA step has been abnormally terminated.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEST1 may be incomplete. When this step was stopped there were 0 observations and 1 variables.
WARNING: Data set WORK.TEST1 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
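So, to fix the problem, read the file with encoding='any' (which turns off transcoding) and then convert the strings from WLATIN1 to UTF-8 yourself with KCVT.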
42 data test;
43 length string $1024;
44 set 'c:\downloads\wlatin1.sas7bdat' (encoding='any');
45 string = kcvt(string,'wlatin1','utf-8');
46 run;
NOTE: There were 1 observations read from the data set c:\downloads\wlatin1.sas7bdat.
NOTE: The data set WORK.TEST has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
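Applied to the question's data, the same approach would presumably widen each character variable before the SET statement and run KCVT over all of them. A minimal sketch for the UTF-8 session: work.myinput_wl is a hypothetical stand-in for the WLATIN1 dataset (written with encoding='any' so its bytes are stored untouched), and the widened lengths are illustrative, about 3x the original because no WLATIN1 character needs more than 3 bytes in UTF-8.

```sas
/* Hypothetical stand-in for the WLATIN1 dataset.
   ENCODING='ANY' writes the bytes as-is: E9 = e-acute, A3 = pound sign */
data work.myinput_wl (encoding='any');
  length des_nome $10 cod_cap_res $5;
  des_nome    = 'Jos' || 'E9'x;
  cod_cap_res = '12' || 'A3'x || '45';
run;

/* Widen the character variables BEFORE the SET statement (illustrative
   lengths), read without transcoding, then convert every character
   variable from WLATIN1 to UTF-8 explicitly. */
data myoutput;
  length des_nome $30 cod_cap_res $15;
  set work.myinput_wl (encoding='any');
  array _chars {*} _character_;
  do _i = 1 to dim(_chars);
    _chars{_i} = kcvt(_chars{_i}, 'wlatin1', 'utf-8');
  end;
  drop _i;
run;
```

For the real data, the stand-in step would be replaced by a libref pointing at the WLATIN1 file, and every character variable would need its own entry in the LENGTH statement.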
https://stackoverflow.com/questions/73744935