I have a huge DataFrame like the one below:
Event_ID Name CompanyAd ticket Revenue Expences
0 G-00001 ABC a097ABSD00E1|ABS_DEw|51Job, Inc.Cayman_Islands NMS|01922453|909234 671 6720 150
1 G-00002 CSA a097A34D10E1|ABS_DEw|724 Solution's Inc. Canada NMS|90922453|209234 5 56 18
2 G-00003 CSA a097ABSD20E1|ABS_DEw|A B SKF_Sweden OTC+|70922453|509234 5 78 38
3 G-00004 VSX a097ABSD00E1|ABS_DEw|A/S Steamship Company$ Torm Denmark"s NMS OTC+|2092286453|09234 23 34 23
4 G-00005 ABC a09712SD00E1|ABS_DEw|ABB Ltd. Switzerland OTC+|09262453|092394 4 89 150
5 G-00006 ABC a097ABS680E1|ABS_DEw|Aber Diamond Ltd. Canada CAP MKT|0922453|092234 60 73 55
6 G-00007 CSA a097ABSD23E1||Abitibi Consolidated Inc. Canada OTC +|092245653|0925634 60 345 110
7 G-00008 ABC a09734SD00E1|ABS_DEw|ABN Amro Bank's N.V. Netherlands AMEX - Preferred OTC+|560922453|09234 89 890 150
8 G-00009 VSX a397ABSD00E1|ABS_DEw|ABN Amro Holding N.V. Netherlands NYSE|092242353|09234 0 0 0
9 G-00010 CSA a097AB5560E1|ABS_DEw|Acambis plc United Kingdom OTC +|0922453|0926734 6 45 16
10 G-00011 VSX a097A12D00E1|ABS_DEw|Ace Aviation Holdings'aed Inc. Canada OTC|02922453|09234 3 39 23
11 G-00012 ABC a097ABSD00E1||Acetex Corp. Canada OTC - Debt+|097722453|092234 2 34 150
12 G-00013 VSX a097ABS560E1|ABS_DEw|Acrex Ventures, Ltd. Canada OTC+|0922453|0967234 4 89 48
13 G-00014 VSX a097AqwD00E1|ABS_DEw|ACS-Tech 80 Ltd. Israel CAP MKT|09242453|0956234 32 127 35
14 G-00015 ABC a097ABS230E1|ABS_DEw|Actions Semiconductor Co. Ltd. Cayman Islands NMS|092234453|0923674 3 84 55
15 G-00016 ABC a097ABS900E1||Adastra Minerals Inc. Canada OTC*|092246753|0928934 1 100 150
16 G-00017 CSA a097dfrD00E1|ABS_DEw|ADB Systems International Inc. Canada OTC|092234453|09234 23 525 90
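I want to extract the company text from the long string in "CompanyAd", using the second "|" and the third "|" as delimiters. In other words, the value I need is the text between the second and third "|":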
Event_ID Name CompanyAd ticket Revenue Expences
0 G-00001 ABC 51Job, Inc.Cayman_Islands NMS 671 6720 150
1 G-00002 CSA 724 Solution's Inc. Canada NMS 5 56 18
2 G-00003 CSA A B SKF_Sweden OTC+ 5 78 38
3 G-00004 VSX A/S Steamship Company$ Torm Denmark"s NMS OTC+ 23 34 23
4 G-00005 ABC ABB Ltd. Switzerland OTC+ 4 89 150
5 G-00006 ABC Aber Diamond Ltd. Canada CAP MKT 60 73 55
6 G-00007 CSA Abitibi Consolidated Inc. Canada OTC + 60 345 110
7 G-00008 ABC ABN Amro Bank's N.V. Netherlands AMEX - Preferred OTC+ 89 890 150
8 G-00009 VSX ABN Amro Holding N.V. Netherlands NYSE 0 0 0
9 G-00010 CSA Acambis plc United Kingdom OTC + 6 45 16
10 G-00011 VSX Ace Aviation Holdings'aed Inc. Canada OTC 3 39 23
11 G-00012 ABC Acetex Corp. Canada OTC - Debt+ 2 34 150
12 G-00013 VSX Acrex Ventures, Ltd. Canada OTC+ 4 89 48
13 G-00014 VSX ACS-Tech 80 Ltd. Israel CAP MKT 32 127 35
14 G-00015 ABC Actions Semiconductor Co. Ltd. Cayman Islands NMS 3 84 55
15 G-00016 ABC Adastra Minerals Inc. Canada OTC* 1 100 150
16 G-00017 CSA ADB Systems International Inc. Canada OTC 23 525 90
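I tried to extract it with a regex pattern, but without success. Any help would be appreciated. Thanks in advance.
Posted on 2022-08-22 20:03:40
You could split the string instead of using a regular expression, then use a list comprehension like this: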
df["CompanyAd"] = [x.split("|")[2] for x in df["CompanyAd"]]
Hope this works.
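A minimal, self-contained sketch of the list comprehension above, using three rows copied from the question. The reduced two-column frame and the `companies` variable are only there for illustration. It also shows that a row with an empty second field (`||`) still works, because `str.split` keeps empty fields.

```python
import pandas as pd

# Three rows copied from the question; the second one has an empty middle field.
df = pd.DataFrame({
    "Event_ID": ["G-00001", "G-00007", "G-00006"],
    "CompanyAd": [
        "a097ABSD00E1|ABS_DEw|51Job, Inc.Cayman_Islands NMS|01922453|909234",
        "a097ABSD23E1||Abitibi Consolidated Inc. Canada OTC +|092245653|0925634",
        "a097ABS680E1|ABS_DEw|Aber Diamond Ltd. Canada CAP MKT|0922453|092234",
    ],
})

# split("|") keeps empty fields, so index 2 is always the text
# between the second and the third "|".
df["CompanyAd"] = [x.split("|")[2] for x in df["CompanyAd"]]

companies = df["CompanyAd"].tolist()
print(companies)
```

Note that this version assumes every value is a string with at least two "|" characters; a missing value or a shorter string would raise an exception.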
Posted on 2022-08-22 20:06:35
If it is always in the third position, you can do this:
df['CompanyAd'] = df['CompanyAd'].str.split('|').str[2]
print(df['CompanyAd'])
0 51Job, Inc.Cayman_Islands NMS
1 724 Solution's Inc. Canada NMS
2 A B SKF_Sweden OTC+
3 A/S Steamship Company$ Torm Denmark"s NMS OTC+
4 ABB Ltd. Switzerland OTC+
5 Aber Diamond Ltd. Canada CAP MKT
6 Abitibi Consolidated Inc. Canada OTC +
7 ABN Amro Bank's N.V. Netherlands AMEX - Prefer...
8 ABN Amro Holding N.V. Netherlands NYSE
9 Acambis plc United Kingdom OTC +
10 Ace Aviation Holdings'aed Inc. Canada OTC
11 Acetex Corp. Canada OTC - Debt+
12 Acrex Ventures, Ltd. Canada OTC+
13 ACS-Tech 80 Ltd. Israel CAP MKT
14 Actions Semiconductor Co. Ltd. Cayman Islands NMS
15 Adastra Minerals Inc. Canada OTC*
16 ADB Systems International Inc. Canada OTC
Name: CompanyAd, dtype: object
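Since the question mentions a regex attempt, here is a hedged sketch of how the same field could be captured with `Series.str.extract`, shown next to the `str.split` approach. The pattern and the last two rows (a malformed string and a missing value) are assumptions added for illustration and are not part of the original data.

```python
import pandas as pd

s = pd.Series([
    "a097ABSD00E1|ABS_DEw|51Job, Inc.Cayman_Islands NMS|01922453|909234",
    "a097ABS900E1||Adastra Minerals Inc. Canada OTC*|092246753|0928934",
    "no-delimiters-here",  # hypothetical malformed row
    None,                  # hypothetical missing value
], name="CompanyAd")

# Skip two "|"-terminated fields from the start, then capture
# everything up to the next "|" (or the end of the string).
pattern = r"^(?:[^|]*\|){2}([^|]*)"
by_regex = s.str.extract(pattern, expand=False)

# The vectorized split from the answer above, for comparison.
by_split = s.str.split("|").str[2]

print(by_regex)
print(by_split)
```

Both vectorized variants yield a missing value for the malformed and missing rows instead of raising, which a plain list comprehension with `x.split("|")[2]` would not do.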
https://stackoverflow.com/questions/73450377