文章/答案/技术大牛

发布

社区首页 >问答首页 >熊猫的课文格式解读

问熊猫的课文格式解读
EN

Stack Overflow用户

提问于 2021-05-05 15:27:14

回答 1查看 75关注 0票数 2

我有一张桌子的文本，我想读给熊猫听。

我可以使用\n来分隔行，但是如何将它们以格式分开(2个文本字段，然后是6x数字)。

是否有使用regex或类似的方法？

table_text = '''Name AIC sector Price (last close) Price (bid) Price (offer) NAV Total assets (£m) Market cap (£m)
  3i Infrastructure Plc Infrastructure GBX 296.00 2.96 2.96 254.50 2,268.700 2,638.645
  Aberdeen Asian Income Fund Limited Asia Pacific Income GBX 227.50 2.26 2.29 252.51 479.110 399.796
  Aberdeen Diversified Income & Growth Ord Flexible Investment GBX 95.20 0.95 0.96 115.34 379.030 294.985
  Aberdeen Emerging Markets Investment Company Limited Global Emerging Markets GBX 704.00 6.98 7.10 829.47 391.268 323.595
  Aberdeen Japan Investment Trust Plc Japan GBX 712.50 7.00 7.25 784.79 114.957 94.198
  Aberdeen Latin American Income Latin America GBX 57.00 0.54 0.57 62.13 40.985 32.555
  Aberdeen New Dawn Asia Pacific GBX 322.00 3.22 3.26 365.56 431.544 350.752
  Aberdeen New India Investment Trust Plc India GBX 516.00 5.16 5.18 601.47 375.170 301.268
  Aberdeen New Thai Investment Trust Plc Country Specialist GBX 445.00 4.40 4.50 516.30 92.585 71.180
  Aberdeen Smaller Companies Income Trust UK Smaller Companies GBX 358.00 3.56 3.60 397.45 95.028 79.153
  Aberdeen Standard Asia Focus 2025 CULS Asia Pacific Smaller Companies GBX 100.95 1.01 1.01 97.25 391.484 37.026
  Aberdeen Standard Asia Focus PLC Asia Pacific Smaller Companies GBX 1,280.00 12.75 13.00 1,440.65 483.841 402.730
  Aberdeen Standard Equity Inc Trust plc UK Equity Income GBX 353.00 3.50 3.56 379.60 203.368 170.598
  Aberdeen Standard European Logistics Income PLC Property - Europe GBX 116.00 1.15 1.16 117.82 309.808 305.022
  Aberforth Smaller Companies Trust Plc UK Smaller Companies GBX 1,496.00 14.94 15.00 1,613.41 1,513.467 1,327.297
  Aberforth Split Level Income Trust Plc UK Smaller Companies GBX 80.10 0.80 0.81 91.46 228.143 152.390
  Aberforth Split Level Income ZDP 2024 UK Smaller Companies GBX 111.50 1.10 1.13 113.83 227.713 53.032
  Acorn Income Fund Ltd UK Equity & Bond Income GBX 351.00 3.46 3.56 415.97 100.206 55.517
  Acorn Income Fund ZDP 2022 UK Equity & Bond Income GBX 161.00 1.61 1.61 162.09 34.413 34.182
  AEW UK REIT Ord Property - UK Commercial GBX 92.40 0.92 0.92 97.85 194.107 146.384'''
  
df = pd.DataFrame([x.split(';') for x in table_text.split('\n')])
print(df)

产出：

                                                    0
0   Name AIC sector Price (last close) Price (bid)...
1     3i Infrastructure Plc Infrastructure GBX 296...
2     Aberdeen Asian Income Fund Limited Asia Paci...
3     Aberdeen Diversified Income & Growth Ord Fle...
4     Aberdeen Emerging Markets Investment Company...
5     Aberdeen Japan Investment Trust Plc Japan GB...
6     Aberdeen Latin American Income Latin America...
7     Aberdeen New Dawn Asia Pacific GBX 322.00 3....
8     Aberdeen New India Investment Trust Plc Indi...
9     Aberdeen New Thai Investment Trust Plc Count...
10    Aberdeen Smaller Companies Income Trust UK S...
11    Aberdeen Standard Asia Focus 2025 CULS Asia ...
12    Aberdeen Standard Asia Focus PLC Asia Pacifi...
13    Aberdeen Standard Equity Inc Trust plc UK Eq...
14    Aberdeen Standard European Logistics Income ...
15    Aberforth Smaller Companies Trust Plc UK Sma...
16    Aberforth Split Level Income Trust Plc UK Sm...
17    Aberforth Split Level Income ZDP 2024 UK Sma...
18    Acorn Income Fund Ltd UK Equity & Bond Incom...
19    Acorn Income Fund ZDP 2022 UK Equity & Bond ...
20    AEW UK REIT Ord Property - UK Commercial GBX...

编辑：

这是我讨厌的做法。但是，它依赖于有一个用"GBX“填充的货币列。

会欢迎任何关于更好的方法的想法吗？

是否有一种正则化的方法可以找到三个大写字母，前面有一个空格，然后是一个空格，然后是数字呢？这将在没有硬编码"GBX“的情况下找到这种货币。

def convert_rows(df):
    sector_name = "GBX"   
    for index, row in df.iterrows():
        if sector_name in row[0]:
            name = row[0].split(sector_name)[0]
            numbers = row[0].split(sector_name)[1]
            df.at[index, ['Name']] = name
            df.at[index, ['AIC sector']] = sector_name
            df.at[index,['Price (last close)', 'Price (bid)', 'Price (offer)', 'NAV', 'Total assets (£m)', 'Market cap (£m)']] = numbers.split()
    return df
 
df = convert_rows(df)

regex

pandas

python-re

回答 1

Stack Overflow用户

回答已采纳

发布于 2021-05-07 08:01:43

你可以试试这个：

import re

def convert_rows(df): 
    for index, row in df.iterrows():
        # Search for the pattern
        sector_name = re.match(r".+\s([A-Z]{3})\s\d+.+", row[0])
        if sector_name:
            sector_name = sector_name.group(1)  # GBX for instance
            name = row[0].split(sector_name)[0]
            numbers = row[0].split(sector_name)[1]
            df.at[index, ['Name']] = name
            df.at[index, ['AIC sector']] = sector_name
            df.at[index,['Price (last close)', 'Price (bid)', 'Price (offer)', 'NAV', 'Total assets (£m)', 'Market cap (£m)']] = numbers.split()
    return df

票数 1

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/67404272

复制

相似问题

问熊猫的课文格式解读
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问熊猫的课文格式解读EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问熊猫的课文格式解读
EN