Q: PowerShell - Strip non-UTF-8 characters from multiple files, rewrite the files, and create a new file listing the bad characters (EBCDIC?)

Stack Overflow user

Asked 2021-03-11 23:24:35

Answers: 1 · Views: 200 · Followers: 0 · Votes: 0

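I have a small script that finds and replaces characters or strings in a file. It works, and I can use it to replace non-UTF-8 characters.

What I need is to run the script once, replace all the invalid data in one shot, and create another file that lists each file name and the bad characters found in it. Right now I have to run the script over and over, once for each invalid character I can spot by eye. Then I edit my tracking file by hand with what I ran the script for and which file I ran it against. Not efficient at all. To be clear, I have almost no idea how to write the second part, the one that tracks what was corrected. Can anyone offer a better way?

Thank you, -Ron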

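$old = 'BAD DATA'   # text to find (-replace treats this as a regular expression)
$new = ' '          # replacement text

# Every file under the current directory, recursively (-File skips directories)
$configFiles = Get-ChildItem -Path . -File -Recurse
foreach ($file in $configFiles)
{
    # The parentheses read the whole file first, so it can be rewritten in place
    (Get-Content $file.PSPath) |
    ForEach-Object { $_ -replace $old, $new } |
    Set-Content $file.PSPath
}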

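Here is a sample of my data.

"PARTHENIA STREET°212 ","CAUGA PARK"

In hex, the '°' is the two bytes C2 and B0. In the original file, before the FTP transfer, it was a single byte, hex 09. So the transfer not only converted it incorrectly, it also added a byte to the file.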

regex

powershell

utf-8

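1 Answer

Stack Overflow user

Answered 2021-03-12 01:30:12

Here is an example of converting EBCDIC to ASCII, based on "ASCII-to-EBCDIC or EBCDIC-to-ASCII" and "Working with non-native PowerShell encoding (EBCDIC)". An actual EBCDIC file would be completely unrecognizable, though, and it has no BOM. The file was downloaded with sftp, but it sounds as if it got corrupted.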

"hi`tthere","how`tare" | set-content file.txt  # tab 0x09 in the middle 

# From ASCII to EBCDIC
# -Encoding Byte is Windows PowerShell 5.1 syntax; PowerShell 6+ uses -AsByteStream instead
$asciiBytes = Get-Content file.txt -Encoding Byte
$rawString = [System.Text.Encoding]::ASCII.GetString($asciiBytes)
$ebcdicBytes = [System.Text.Encoding]::GetEncoding('ebcdic-cp-us').GetBytes($rawString)
$ebcdicBytes | Set-Content ebcidic.txt -Encoding Byte

# From EBCDIC to ASCII
$ebcdicBytes = Get-Content ebcidic.txt -Encoding Byte
$rawString = [System.Text.Encoding]::GetEncoding('ebcdic-cp-us').GetString($ebcdicBytes)
$asciiBytes = [System.Text.Encoding]::ASCII.GetBytes($rawString)
$asciiBytes | Set-Content ascii.txt -Encoding Byte
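
One way to sanity-check a conversion like this is to compare the bytes in memory, without touching any files. The following is only a sketch: the helper name `ConvertTo-HexString` is made up for this example, and it assumes the 'ebcdic-cp-us' encoding is available in your session, as it is in the example above.

```powershell
# Hypothetical helper (not a built-in cmdlet): render bytes as space-separated hex.
function ConvertTo-HexString {
    param([byte[]]$Bytes)
    ($Bytes | ForEach-Object { $_.ToString('X2') }) -join ' '
}

$text   = "hi`tthere"   # same kind of sample line as above, with a tab in the middle
$ascii  = [System.Text.Encoding]::ASCII
$ebcdic = [System.Text.Encoding]::GetEncoding('ebcdic-cp-us')

# Encode the same string both ways.
$asciiBytes  = $ascii.GetBytes($text)
$ebcdicBytes = $ebcdic.GetBytes($text)

"ASCII : $(ConvertTo-HexString $asciiBytes)"
"EBCDIC: $(ConvertTo-HexString $ebcdicBytes)"

# Decoding the EBCDIC bytes with the EBCDIC encoding should give back the original text.
$roundTrip = $ebcdic.GetString($ebcdicBytes)
"Round trip matches: $($roundTrip -eq $text)"
```

The tab is byte 09 in ASCII but byte 05 in this EBCDIC code page, and the letters move too, so a file that really is EBCDIC looks like garbage when read as ASCII or UTF-8. A file that is mostly readable with a few odd characters is more likely a mis-transcoded text file.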

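Here is a script called nonascii.ps1 that strips non-ASCII characters (anything that is neither a tab nor between space and tilde in the ASCII table) and writes the result back to the same file name.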

(get-content $args[0]) -replace '[^ -~\t]' | set-content $args[0]
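
For the tracking file the question asks about, the same pattern can be used to record what gets removed before the file is rewritten. This is a rough sketch rather than a tested tool: the function name `Remove-NonAsciiWithLog`, its parameters, and the CSV column names are all invented for this example, and it assumes the files are text that can be read as UTF-8 and that the log is written outside the folder being processed.

```powershell
# Hypothetical function: strip characters outside space..tilde and tab from
# every file under $Root, and log each removed character to a CSV file.
function Remove-NonAsciiWithLog {
    param(
        [Parameter(Mandatory)][string]$Root,     # folder to process (recursively)
        [Parameter(Mandatory)][string]$LogPath   # CSV to create; keep it outside $Root
    )
    $pattern = '[^ -~\t]'   # same character class as nonascii.ps1

    $log = foreach ($file in Get-ChildItem -Path $Root -File -Recurse) {
        # Read the whole file up front so it can be rewritten in place later.
        $lines  = @(Get-Content -LiteralPath $file.FullName -Encoding utf8)
        $lineNo = 0
        $hits = foreach ($line in $lines) {
            $lineNo++
            foreach ($m in [regex]::Matches($line, $pattern)) {
                [pscustomobject]@{
                    File      = $file.FullName
                    Line      = $lineNo
                    Char      = $m.Value
                    CodePoint = 'U+{0:X4}' -f [int]$m.Value[0]
                }
            }
        }
        if ($hits) {
            # Only rewrite files that actually contained something to strip.
            $lines -replace $pattern | Set-Content -LiteralPath $file.FullName
            $hits
        }
    }
    # Write the log only when something was found.
    if ($log) {
        $log | Export-Csv -LiteralPath $LogPath -NoTypeInformation -Encoding utf8
    }
}
```

A call might look like `Remove-NonAsciiWithLog -Root .\data -LogPath .\badchars.csv`, where both paths are placeholders. Since it rewrites files in place and would also touch any binary files under the folder, running it against a copy first seems sensible.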

Note that in PowerShell 5.1, Get-Content does not recognize a UTF-8 file without a BOM unless you pass the '-Encoding utf8' parameter.

get-content file -encoding utf8
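
To see why the encoding choice matters, the two bytes reported in the question (C2 B0) can be decoded both ways in memory. This is a sketch; ISO-8859-1 is used here as a stand-in for whatever single-byte ANSI code page Windows PowerShell 5.1 would fall back to on a given machine, which is an assumption.

```powershell
# The two bytes the question reports for the degree sign.
$bytes = [byte[]](0xC2, 0xB0)

# Decoded as UTF-8, the pair is a single character (U+00B0).
$asUtf8 = [System.Text.Encoding]::UTF8.GetString($bytes)

# Decoded with a single-byte encoding, each byte becomes its own character.
# ISO-8859-1 stands in for the system's ANSI code page (assumption).
$asSingleByte = [System.Text.Encoding]::GetEncoding('iso-8859-1').GetString($bytes)

"UTF-8      : {0} character(s)" -f $asUtf8.Length
"Single-byte: {0} character(s)" -f $asSingleByte.Length
```

Reading with the wrong encoding therefore yields two characters where the file holds one, which is worth ruling out before stripping anything.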

Also note that PowerShell 6.2 and later can use any encoding known to .NET, although tab completion does not reflect this:

"hi`tthere" | set-content ebcidic.txt -encoding ebcdic-cp-us
get-content ebcidic.txt -encoding ebcdic-cp-us

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/66585479
