
Q: Reading a large CSV in PowerShell, parsing multiple columns for unique values, and keeping results based on the oldest value in a column

Stack Overflow user

Asked 2019-06-02 22:26:15

2 answers · 559 views · 0 followers · 0 votes

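I have a large file with 10 million rows (currently a CSV). I need to read through the file and remove duplicates based on multiple columns.

A row of data has the following columns:

ComputerName, IPAddress, MacAddress, CurrentDate, FirstSeenDate

I want to check MacAddress and ComputerName for duplicates and, when a duplicate is found, keep only the entry with the oldest FirstSeenDate.

I read the CSV into a variable with Import-Csv and then processed that variable with Sort-Object and similar cmdlets, but it is very slow.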

$data | Group-Object -Property ComputerName, MacAddress | ForEach-Object { $_.Group | Sort-Object -Property FirstSeenDate | Select-Object -First 1 }

I think I could use a stream reader to read the CSV line by line and build a unique array using array-contains logic.

Any ideas?

powershell

csv

unique

large-data

Stack Overflow user

Answered 2019-06-03 04:44:06

If performance is the main concern, I would probably use Python. Or LogParser.

However, if I had to use PowerShell, I would probably try something like this:

$CultureInfo = [CultureInfo]::InvariantCulture
$DateFormat = 'M/d/yyyy' # Use whatever date format is appropriate

# We need to convert the strings that represent dates. You can skip the ParseExact() calls if the dates are already in a string sortable format (e.g., yyyy-MM-dd).
$Data = Import-Csv $InputFile | Select-Object -Property ComputerName, IPAddress, MacAddress, @{n = 'CurrentDate'; e = {[DateTime]::ParseExact($_.CurrentDate, $DateFormat, $CultureInfo)}}, @{n = 'FirstSeenDate'; e = {[DateTime]::ParseExact($_.FirstSeenDate, $DateFormat, $CultureInfo)}}

$Results = @{}
foreach ($Record in $Data) {
    $Key = $Record.ComputerName + ';' + $Record.MacAddress
    if (!$Results.ContainsKey($Key)) {
        $Results[$Key] = $Record
    }
    elseif ($Record.FirstSeenDate -lt $Results[$Key].FirstSeenDate) {
        $Results[$Key] = $Record
    }
}

$Results.Values | Sort-Object -Property ComputerName, MacAddress | Export-Csv $OutputFile -NoTypeInformation

This is likely to be faster because, although Group-Object is very powerful, it is often a bottleneck.
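To make the hashtable idea easier to try out, here is a minimal sketch that applies it to a few in-memory rows. The function name Get-EarliestRecord and the sample data are made up for illustration, and the sketch assumes FirstSeenDate is already in a string-sortable format (yyyy-MM-dd), so the ParseExact() step is skipped. It also takes rows from the pipeline one at a time rather than storing the whole file in a variable first, which is my variation and not part of the code above.

```powershell
# Hypothetical helper (not a built-in cmdlet): for each ComputerName + MacAddress
# pair, keep the row with the earliest FirstSeenDate.
# Assumption: FirstSeenDate is yyyy-MM-dd, so ordinal string order is date order.
function Get-EarliestRecord {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [psobject]$Record
    )
    begin {
        # One entry per unique key; PowerShell hashtable string keys are case-insensitive.
        $earliest = @{}
    }
    process {
        $key = $Record.ComputerName + ';' + $Record.MacAddress
        if (-not $earliest.ContainsKey($key) -or
            [string]::CompareOrdinal($Record.FirstSeenDate, $earliest[$key].FirstSeenDate) -lt 0) {
            $earliest[$key] = $Record
        }
    }
    end {
        # Emit the surviving rows once all input has been seen.
        $earliest.Values
    }
}

# Made-up sample rows: PC01 appears twice with different FirstSeenDate values.
$sample = @'
ComputerName,IPAddress,MacAddress,CurrentDate,FirstSeenDate
PC01,10.0.0.5,00-11-22-33-44-55,2019-06-01,2019-03-10
PC01,10.0.0.9,00-11-22-33-44-55,2019-06-01,2019-01-15
PC02,10.0.0.7,66-77-88-99-AA-BB,2019-06-01,2019-02-20
'@ | ConvertFrom-Csv

$unique = @($sample | Get-EarliestRecord | Sort-Object -Property ComputerName, MacAddress)
$unique | Format-Table
```

Against a real file, the same function could sit in a pipeline such as Import-Csv $InputFile | Get-EarliestRecord | Export-Csv $OutputFile -NoTypeInformation. Only the unique rows are held in the hashtable, which may help with memory on a 10-million-row file, though I have not measured it.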

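If you really want to try a stream reader, try the Microsoft.VisualBasic.FileIO.TextFieldParser class, which is part of the .NET Framework despite its slightly misleading name. You can access it by running Add-Type -AssemblyName Microsoft.VisualBasic.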
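For what it's worth, a rough sketch of the TextFieldParser route might look like the following. The function name Get-EarliestRecordFromFile is hypothetical, and the sketch assumes a comma-delimited file with a header row containing ComputerName, MacAddress and FirstSeenDate, with dates already in a string-sortable format such as yyyy-MM-dd. Treat it as a starting point rather than a tuned implementation.

```powershell
Add-Type -AssemblyName Microsoft.VisualBasic

# Hypothetical helper: reads the CSV row by row with TextFieldParser and keeps,
# per ComputerName + MacAddress pair, the row with the earliest FirstSeenDate.
# Assumption: FirstSeenDate is yyyy-MM-dd, so ordinal string order is date order.
function Get-EarliestRecordFromFile {
    param([Parameter(Mandatory)][string]$Path)

    # .NET does not follow PowerShell's current location, so pass a full path.
    $fullPath = Convert-Path -LiteralPath $Path
    $parser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser($fullPath)
    try {
        $parser.TextFieldType = [Microsoft.VisualBasic.FileIO.FieldType]::Delimited
        $parser.SetDelimiters(',')
        $parser.HasFieldsEnclosedInQuotes = $true

        # First row is the header; map column names to positions.
        $header = $parser.ReadFields()
        $col = @{}
        for ($i = 0; $i -lt $header.Length; $i++) { $col[$header[$i]] = $i }
        foreach ($name in 'ComputerName', 'MacAddress', 'FirstSeenDate') {
            if (-not $col.ContainsKey($name)) { throw "Missing column: $name" }
        }
        $nameIdx = $col['ComputerName']
        $macIdx  = $col['MacAddress']
        $dateIdx = $col['FirstSeenDate']

        # Key -> raw field array of the earliest row seen so far.
        $earliest = @{}
        while (-not $parser.EndOfData) {
            $fields = $parser.ReadFields()
            $key = $fields[$nameIdx] + ';' + $fields[$macIdx]
            if (-not $earliest.ContainsKey($key) -or
                [string]::CompareOrdinal($fields[$dateIdx], $earliest[$key][$dateIdx]) -lt 0) {
                $earliest[$key] = $fields
            }
        }

        # Turn the surviving field arrays back into objects for Export-Csv.
        foreach ($fields in $earliest.Values) {
            $row = [ordered]@{}
            for ($i = 0; $i -lt $header.Length; $i++) { $row[$header[$i]] = $fields[$i] }
            [pscustomobject]$row
        }
    }
    finally {
        $parser.Close()
    }
}
```

Usage would be something like Get-EarliestRecordFromFile -Path $InputFile | Sort-Object -Property ComputerName, MacAddress | Export-Csv $OutputFile -NoTypeInformation. Working on raw string arrays avoids creating one object per input row, which is where I would expect any gain over Import-Csv to come from, but it is worth timing both on your own data.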

Votes: 0


Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/56415931
