Q: Efficient way to edit a tabular text file so that each cell starts at the same position

Stack Overflow user

Asked 2019-02-13 18:56:15

Answers: 3, Views: 67, Followers: 0, Votes: 0

I have a text file with a table-like structure. Each line contains 0 to 4 words, separated by an arbitrary number of spaces.

hello     world  this  is
     an   example  file
is   there a   good
way to    clean this
  your help is   
highly      appreciated

My goal is to reformat the file so that the elements start at the same position on every line, for example:

hello    world        this     is
         an           example  file
is       there        a        good
way      to           clean    this
         your         help     is       
highly   appreciated

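The number of spaces is arbitrary. I would prefer that lines starting with a space skip the first element, but this is not a strict requirement.

I am sure there are many ways to do this. My order of preference is:

1. some clever trick
2. a bash command
3. a text editor that has this functionality
4. a scripting language (probably Python)

Since this is part of a data preparation/validation process, I do not need a perfect method; I will do a manual check afterwards anyway. I am looking for something that does, say, 80% to 90% of the work.

Can anyone suggest an efficient approach?

If it helps, a sample file is here.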

python

bash

vim

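3 Answers

Stack Overflow user

Accepted answer

Posted 2019-02-13 19:36:01

Here is one way to make column respect leading whitespace: change the leading space to some other character, run column -t, then change it back.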

sed 's/^ /_ /' file | column -t | sed 's/^_ /  /'

hello   world        this     is
        an           example  file
is      there        a        good
way     to           clean    this
        your         help     is
highly  appreciated
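
For readers who prefer a script, the same idea can be sketched in Python. This is only an illustration of what the pipeline above appears to do, not a replacement for column: the function name align_columns and the inline sample list are assumptions made for this example, and the two-space gap between columns is chosen to match the output shown above.

```python
def align_columns(lines):
    """Left-align whitespace-separated cells, column by column."""
    rows = []
    for line in lines:
        cells = line.split()
        # A line that starts with whitespace gets an empty first cell,
        # playing the role of the '_' placeholder in the sed pipeline.
        if line[:1].isspace() and cells:
            cells.insert(0, "")
        rows.append(cells)

    # Width of each column = longest cell found in that column.
    n_cols = max((len(row) for row in rows), default=0)
    widths = [0] * n_cols
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))

    # Pad every cell to its column width and join with two spaces.
    return [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    ]


sample = [
    "hello     world  this  is",
    "     an   example  file",
    "is   there a   good",
    "way to    clean this",
    "  your help is   ",
    "highly      appreciated",
]
print("\n".join(align_columns(sample)))
```

Unlike the fixed-width approach, each column here is only as wide as its own longest cell, which is why the result stays compact.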

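Votes: 3

Stack Overflow user

Posted 2019-02-13 20:31:39

Python's re module and .format() offer a good approach to option 4 (a scripting language).

The column width is the length of the longest non-whitespace string in the file plus the column_pad value.

You can adjust column_pad to vary the actual column width.

If you pass rename_file=True, you get a new file named 'cleaned_<filename>'. Otherwise, the script replaces the original file with the cleaned one.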

#!/usr/bin/env python
import re


def clean_columns(filename, rename_file=False, column_pad=4):
    # Write to 'cleaned_<filename>' when rename_file is set,
    # otherwise overwrite the original file in place.
    cleaned_filename = 'cleaned_' + filename if rename_file else filename

    with open(filename, 'r') as dirty_file:
        text = dirty_file.readlines()

    # Column width = longest word in the file + padding.
    words = {word for line in text for word in line.split()}
    max_string_length = max((len(word) for word in words), default=0)
    column_width = max_string_length + column_pad
    formatting_string = '{: <' + str(column_width) + '}'

    cleaned_lines = []
    for line in text:
        # Collapse each run of whitespace into one space. A leading run
        # leaves an empty first cell, so such lines skip the first column.
        cells = re.sub(r'\s+', ' ', line.rstrip()).split(' ')
        formatting = formatting_string * len(cells)
        # Drop the padding after the last cell to avoid trailing spaces.
        cleaned_lines.append(formatting.format(*cells).rstrip())

    with open(cleaned_filename, 'w') as cleaned:
        cleaned.write(''.join(line + '\n' for line in cleaned_lines))


clean_columns('sample.txt', rename_file=True, column_pad=8)

Output:

hello              world              this               is
                   an                 example            file
is                 there              a                  good
way                to                 clean              this
                   your               help               is
highly             appreciated
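
To make the width arithmetic concrete, here is a small sketch of how a single row is padded. The words list is a hand-picked subset of the sample used only for illustration; it assumes that "appreciated" (11 characters) is the longest word in the file, which with column_pad=8 gives the 19-character columns seen in the output above.

```python
# Hypothetical subset of the file's words, chosen for this example.
words = ["hello", "world", "highly", "appreciated"]
column_pad = 8

# Longest word (11 characters) plus padding gives the column width.
column_width = len(max(words, key=len)) + column_pad
# '{: <19}' means: pad with spaces, left-align, minimum width 19.
cell_format = "{: <" + str(column_width) + "}"

# An empty first cell stands in for the skipped first column.
row = ["", "an", "example", "file"]
line = (cell_format * len(row)).format(*row)

print(column_width)
print(repr(line.rstrip()))
```

Because every column shares one width, a single long word widens the whole table; lowering column_pad is the only knob available in this approach.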

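Votes: 2

Stack Overflow user

Posted 2019-02-14 08:22:24

You can use the https://github.com/junegunn/vim-easy-align plugin to align text around a variety of delimiters.

Just select the lines and press:

<CR>: mapped to <Plug>(EasyAlign)
<C-P>: live preview, optional
*: align around all delimiters
<C-D>: toggle to left-aligned delimiters
<C-X>\s\@<=\S\+: select non-whitespace that follows whitespace as the delimiter

Or use the command: '<,'>EasyAlign */\s\@<=\S\+/dl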
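
The pattern \s\@<=\S\+ is a Vim lookbehind: it matches a run of non-whitespace that is immediately preceded by whitespace. As a rough way to see which pieces of a line it picks out, here is a sketch using what I take to be the closest Python equivalent, (?<=\s)\S+. The two regex dialects differ, so treat this as an analogy only; the helper name cells_after_space is made up for this example.

```python
import re

# Python analogue of Vim's \s\@<=\S\+ : a run of non-whitespace
# characters immediately preceded by a whitespace character.
CELL_AFTER_SPACE = re.compile(r"(?<=\s)\S+")


def cells_after_space(line):
    """Return the words on a line that follow whitespace."""
    return CELL_AFTER_SPACE.findall(line)


print(cells_after_space("is   there a   good"))      # ['there', 'a', 'good']
print(cells_after_space("     an   example  file"))  # ['an', 'example', 'file']
```

A word at the very start of a line has no whitespace before it, so it is not matched; only the words that follow a gap are treated as alignment points.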

Votes: 2

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/54677474
