Q: Selecting columns and rows with specific names using a Bash script

Stack Overflow user

Asked 2016-05-10 00:28:45

Answers: 1 · Views: 5.3K · Followers: 0 · Votes: 1

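I'm working with a very large text file (4 GB), and I'd like to create a smaller file containing just the data I need. It's a tab-delimited file with row and column headers. Basically, I want to select the subset of the data that has given column and/or row names.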

     colname_1    colname_2    colname_3    colname_4
row_1    1            2             3            5
row_2    4            6             9            1
row_3    2            3             4            2

I plan to have a file containing the list of columns I want.

colname_1    colname_3

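I'm new to scripting and I really don't know how to do this. I've seen other examples, but they all knew which column numbers they wanted, and I don't. Sorry if this is a duplicate question; I did try searching.

I'd like the result to be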

     colname_1     colname_3
row_1    1             3
row_2    4             9
row_3    2             4

bash

1 Answer

Stack Overflow user

Answered 2016-05-10 03:14:24

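You can do this by tracking the array indexes of the columns in the data file whose names match the names in the column-list file. Once you have found the data-file array index for each name in the column-list file, you simply read the data file (starting at the second line) and output the row label followed by the data at the array indexes determined when matching the column list against the original columns.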

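There are probably several ways to approach this. The following assumes the data in each column contains no whitespace. The use of arrays presumes bash (or another advanced shell that supports arrays) rather than a POSIX shell.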
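The no-whitespace assumption follows from bash's default word splitting, which breaks on spaces as well as tabs. If a value might contain spaces, one option is to split on tabs only. The sketch below is an illustration under that assumption; `split_tabs` is a hypothetical helper name and is not part of the answer's script.

```shell
#!/bin/bash

# split_tabs (hypothetical helper): split one tab-delimited line into fields
# and print each field on its own line. Setting IFS to a tab only for the
# read command keeps spaces inside a field intact.
split_tabs() {
    local -a fields
    IFS=$'\t' read -r -a fields <<< "$1"
    printf '%s\n' "${fields[@]}"
}

# Example: the middle field contains a space and stays in one piece.
split_tabs $'row_1\tfoo bar\t3'
```

Because a tab counts as IFS whitespace in bash, consecutive tabs are treated as a single delimiter, so empty fields are dropped with this approach.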

The script takes two filenames as input. The first is the original data file. The second is the column-list file. One approach could be:

#!/bin/bash

declare -a cols  ## array holding original columns from original data file
declare -a csel  ## array holding columns to select (from file 2)
declare -a cpos  ## array holding array indexes of matching columns

cols=( $(head -n 1 "$1") )  ## fill cols from 1st line of data file
csel=( $(< "$2") )          ## read select columns from file 2

## fill column position array
for ((i = 0; i < ${#csel[@]}; i++)); do
    for ((j = 0; j < ${#cols[@]}; j++)); do
        [ "${csel[i]}" = "${cols[j]}" ] && cpos+=( $j )
    done
done

printf " " 
for ((i = 0; i < ${#csel[@]}; i++)); do   ## output header row
    printf "    %s" "${csel[i]}"
done

printf "\n"     ## output newline
unset cols      ## unset cols to reuse in reading lines below

while read -r line; do        ## read each data line in data file 
    cols=( $line )            ## separate into cols array
    printf "%s" "${cols[0]}"  ## output row label
    for ((j = 0; j < ${#cpos[@]}; j++)); do
        [ "$j" -eq "0" ] && { ## handle format for first column
            printf "%5s" "${cols[$((${cpos[j]}+1))]}"
            continue
        }                     ## output remaining columns
        printf "%13s" "${cols[$((${cpos[j]}+1))]}"
    done
    printf "\n"
done < <( tail -n+2 "$1" )

Using the example data as follows:

Data file

$ cat dat/col+data.txt
     colname_1    colname_2    colname_3    colname_4
row_1    1            2             3            5
row_2    4            6             9            1
row_3    2            3             4            2

Column selection file

$ cat dat/col.txt
colname_1    colname_3

Example use/output

$ bash colnum.sh dat/col+data.txt dat/col.txt
     colname_1    colname_3
row_1    1            3
row_2    4            9
row_3    2            4
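The column-matching step in the script (the nested loop that fills `cpos`) could also be sketched with an associative array, which looks each wanted name up directly instead of rescanning the header. This is only an illustration: it assumes bash 4 or later for `declare -A`/`local -A`, and `col_positions` is a hypothetical helper name, not something from the script above.

```shell
#!/bin/bash

# col_positions (hypothetical helper): given a header line and a line of
# wanted column names, print the 0-based header index of each wanted name,
# in the order requested. Names not present in the header are skipped.
# Assumes bash 4+ (associative arrays) and names without whitespace.
col_positions() {
    local header=$1 wanted=$2
    local -A index=()
    local -a names=() picks=()
    local i name

    read -r -a names <<< "$header"
    for i in "${!names[@]}"; do
        index["${names[i]}"]=$i          # map column name -> position
    done

    read -r -a picks <<< "$wanted"
    for name in "${picks[@]}"; do
        [[ -n ${index[$name]+set} ]] && printf '%s\n' "${index[$name]}"
    done
    return 0
}

# Example with the header and column list from the question.
col_positions "colname_1 colname_2 colname_3 colname_4" "colname_1 colname_3"
```

As in the script, 1 would be added to each index when reading a data row, because the first field of a data row is the row label.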

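Give it a try and let me know if you have any questions. Note that bash isn't known for blazing speed on large files, but as long as the column list isn't horrendously long, the script should be reasonably fast.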
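Given the caveat about speed on large files, the same selection might be delegated to awk from a bash function. This is a different technique from the answer's pure-bash loop, offered only as a sketch. It assumes the data file is strictly tab-delimited and that its header line begins with an empty field (a leading tab), so header and data columns line up. `select_cols` is a hypothetical name.

```shell
#!/bin/bash

# select_cols DATA_FILE COLS_FILE (hypothetical helper)
# Prints the row label plus the columns named in COLS_FILE, tab-separated.
# Assumes a non-empty COLS_FILE and a DATA_FILE header that starts with a tab.
select_cols() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR {                       # first file: wanted column names
            m = split($0, tmp, /[ \t]+/)
            for (i = 1; i <= m; i++)
                if (tmp[i] != "") want[++n] = tmp[i]
            next
        }
        FNR == 1 {                        # header line of the data file
            for (i = 1; i <= NF; i++) pos[$i] = i
            line = ""
            for (k = 1; k <= n; k++)
                if (want[k] in pos) {     # keep only names found in header
                    sel[++s] = pos[want[k]]
                    line = line OFS want[k]
                }
            print line
            next
        }
        {                                 # data rows: label, then picks
            line = $1
            for (k = 1; k <= s; k++) line = line OFS $(sel[k])
            print line
        }
    ' "$2" "$1"
}
```

It would be called with the same argument order as the script, for example `select_cols dat/col+data.txt dat/col.txt`.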

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/37127467
