<pre><code>data['result'] = data['result'].map(lambda x: x.lstrip('+-').rstrip('aAbBcC'))
</code></pre>
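As a sketch, here is the same <code>map</code> call run on the sample data from the question (the DataFrame construction below is a reconstruction of the question's table). Note that <code>rstrip('aAbBcC')</code> treats its argument as a set of characters rather than a suffix, so it removes any trailing run of those letters.

```python
import pandas as pd

# Sample data reconstructed from the question's table
data = pd.DataFrame(
    {"time": ["09:00", "10:00", "11:00", "12:00", "13:00"],
     "result": ["+52A", "+62B", "+44a", "+30b", "-110a"]},
    index=[1, 2, 3, 4, 5],
)

# lstrip/rstrip take a *set* of characters to remove from each end
data["result"] = data["result"].map(lambda x: x.lstrip("+-").rstrip("aAbBcC"))

# Optional: convert the cleaned strings to integers
data["result"] = data["result"].astype(int)
print(data)
```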

I'd use the pandas <code>replace</code> function; it's simple and powerful because it accepts regular expressions. Below I'm using the regex <code>\D</code> to remove any non-digit characters, but obviously you could get quite creative with regex.

<pre><code>data['result'].replace(regex=True,inplace=True,to_replace=r'\D',value=r'')
</code></pre>
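A minimal sketch of the same idea that assigns the result back instead of using <code>inplace=True</code>; modifying a column selection in place may not propagate to the DataFrame in newer pandas versions with copy-on-write, so the assignment form is the safer assumption. The sample data is a reconstruction of the question's table. Keep in mind that <code>\D</code> also removes the leading <code>-</code>, so negative values lose their sign, which matches the desired output in the question.

```python
import pandas as pd

# Sample data reconstructed from the question's table
data = pd.DataFrame(
    {"time": ["09:00", "10:00", "11:00", "12:00", "13:00"],
     "result": ["+52A", "+62B", "+44a", "+30b", "-110a"]}
)

# Replace every non-digit character (\D) with an empty string,
# then assign the cleaned column back rather than using inplace=True
data["result"] = data["result"].replace(to_replace=r"\D", value="", regex=True)
data["result"] = data["result"].astype(int)
print(data)
```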

In the particular case where you know how many characters you want to remove from the DataFrame column, you can use string slicing inside a lambda function to get rid of those parts:

Last character:

<pre><code>data['result'] = data['result'].map(lambda x: str(x)[:-1])
</code></pre>

First two characters:

<pre><code>data['result'] = data['result'].map(lambda x: str(x)[2:])
</code></pre>
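If every value has exactly one unwanted character at each end (true for the sample data in the question, but an assumption in general), the two slices can be combined, and the <code>.str</code> accessor can do the slicing without a lambda. A sketch on a reconstruction of the question's data:

```python
import pandas as pd

data = pd.DataFrame(
    {"time": ["09:00", "10:00", "11:00", "12:00", "13:00"],
     "result": ["+52A", "+62B", "+44a", "+30b", "-110a"]}
)

# Drop the first and the last character of every value
# (equivalent to data["result"].str[1:-1])
data["result"] = data["result"].str.slice(1, -1)
print(data["result"].tolist())
```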

There's a bug here: currently you cannot pass arguments to <code>str.lstrip</code> and <code>str.rstrip</code>:

<a href="http://github.com/pydata/pandas/issues/2411" rel="noreferrer">http://github.com/pydata/pandas/issues/2411</a>

EDIT: 2012-12-07 this works now on the dev branch:

<pre><code>In [8]: df['result'].str.lstrip('+-').str.rstrip('aAbBcC')
Out[8]: 
1     52
2     62
3     44
4     30
5    110
Name: result
</code></pre>
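In current pandas versions, <code>Series.str.lstrip</code> and <code>Series.str.rstrip</code> accept a <code>to_strip</code> argument with the set of characters to remove, so the call above works as written. A minimal sketch on the question's data (the DataFrame construction is a reconstruction):

```python
import pandas as pd

df = pd.DataFrame(
    {"time": ["09:00", "10:00", "11:00", "12:00", "13:00"],
     "result": ["+52A", "+62B", "+44a", "+30b", "-110a"]},
    index=[1, 2, 3, 4, 5],
)

# Strip a leading sign, then any trailing letters from the given set
df["result"] = df["result"].str.lstrip("+-").str.rstrip("aAbBcC")
print(df["result"])
```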

A very simple method would be to use the <code>extract</code> method to pull out the digits. Simply supply it the regular expression <code>'\d+'</code>, which matches any number of consecutive digits.

<pre><code>df['result'] = df.result.str.extract(r'(\d+)', expand=True).astype(int)
df

    time  result
1  09:00      52
2  10:00      62
3  11:00      44
4  12:00      30
5  13:00     110
</code></pre>
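A small variation, offered as a sketch: with <code>expand=False</code> and a single capture group, <code>str.extract</code> returns a <code>Series</code> rather than a one-column <code>DataFrame</code>, which maps directly onto a column assignment. Note that <code>\d+</code> only captures the first run of digits, so a value such as <code>3+b0</code> would yield <code>3</code>, and the sign is dropped. The DataFrame below is a reconstruction of the question's data.

```python
import pandas as pd

df = pd.DataFrame(
    {"time": ["09:00", "10:00", "11:00", "12:00", "13:00"],
     "result": ["+52A", "+62B", "+44a", "+30b", "-110a"]},
    index=[1, 2, 3, 4, 5],
)

# With a single capture group and expand=False, extract returns a Series
digits = df["result"].str.extract(r"(\d+)", expand=False)
df["result"] = digits.astype(int)
print(df)
```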

Suppose your DF also has those extra characters in between the numbers, as in the last entry:

<pre><code>  result   time
0   +52A  09:00
1   +62B  10:00
2   +44a  11:00
3   +30b  12:00
4  -110a  13:00
5   3+b0  14:00
</code></pre>

You can try <code>str.replace</code> to remove characters not only from the start and end but also from in between.

<pre><code>DF['result'] = DF['result'].str.replace(r'\+|a|b|\-|A|B', '', regex=True)
</code></pre>

Output:

<pre><code>  result   time
0     52  09:00
1     62  10:00
2     44  11:00
3     30  12:00
4    110  13:00
5     30  14:00
</code></pre>
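A sketch of the same cleanup with <code>regex=True</code> spelled out (recent pandas versions treat the pattern as a literal string by default) and with the alternation rewritten as the equivalent character class <code>[-+abAB]</code>, which matches the same single characters. The DataFrame reproduces the example table above.

```python
import pandas as pd

DF = pd.DataFrame(
    {"result": ["+52A", "+62B", "+44a", "+30b", "-110a", "3+b0"],
     "time": ["09:00", "10:00", "11:00", "12:00", "13:00", "14:00"]}
)

# Remove every '+', '-', 'a', 'b', 'A' or 'B' anywhere in the string
DF["result"] = DF["result"].str.replace(r"[-+abAB]", "", regex=True)
print(DF)
```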

I often use list comprehensions for these kinds of tasks because they tend to be faster.

There can be big differences in performance between the various ways of doing something like this (i.e. modifying every element of a Series within a DataFrame). Often a list comprehension is the fastest; see the code race below for this task:

<pre><code>import pandas as pd
#Map
data = pd.DataFrame({'time':['09:00','10:00','11:00','12:00','13:00'], 'result':['+52A','+62B','+44a','+30b','-110a']})
%timeit data['result'] = data['result'].map(lambda x: x.lstrip('+-').rstrip('aAbBcC'))
10000 loops, best of 3: 187 µs per loop
#List comprehension
data = pd.DataFrame({'time':['09:00','10:00','11:00','12:00','13:00'], 'result':['+52A','+62B','+44a','+30b','-110a']})
%timeit data['result'] = [x.lstrip('+-').rstrip('aAbBcC') for x in data['result']]
10000 loops, best of 3: 117 µs per loop
#.str
data = pd.DataFrame({'time':['09:00','10:00','11:00','12:00','13:00'], 'result':['+52A','+62B','+44a','+30b','-110a']})
%timeit data['result'] = data['result'].str.lstrip('+-').str.rstrip('aAbBcC')
1000 loops, best of 3: 336 µs per loop
</code></pre>
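The <code>%timeit</code> magic only works in IPython/Jupyter. Outside it, a rough equivalent uses the standard-library <code>timeit</code> module; the sketch below first checks that the three approaches agree and then times only the computation (not the column assignment). Absolute numbers depend on the machine and pandas version, so no timings are shown, and with only five rows the comparison is likely dominated by per-call overhead.

```python
import timeit

import pandas as pd

data = pd.DataFrame(
    {"time": ["09:00", "10:00", "11:00", "12:00", "13:00"],
     "result": ["+52A", "+62B", "+44a", "+30b", "-110a"]}
)
col = data["result"]


def with_map():
    return col.map(lambda x: x.lstrip("+-").rstrip("aAbBcC"))


def with_list_comprehension():
    return [x.lstrip("+-").rstrip("aAbBcC") for x in col]


def with_str_accessor():
    return col.str.lstrip("+-").str.rstrip("aAbBcC")


# Sanity check: all three approaches must produce the same values
expected = ["52", "62", "44", "30", "110"]
results = {
    "map": list(with_map()),
    "list comprehension": with_list_comprehension(),
    ".str": list(with_str_accessor()),
}
assert all(r == expected for r in results.values())

# Time each approach; numbers vary by machine and pandas version
number = 1000
for name, func in [("map", with_map),
                   ("list comprehension", with_list_comprehension),
                   (".str", with_str_accessor)]:
    seconds = timeit.timeit(func, number=number)
    print(f"{name}: {seconds / number * 1e6:.1f} µs per loop")
```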

Try this using a regular expression:

<pre><code>import re
data['result'] = data['result'].map(lambda x: re.sub(r'[-+A-Za-z]', '', x))
</code></pre>
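When the same pattern is applied to many values, it can be compiled once with <code>re.compile</code> and its <code>sub</code> method reused. A minimal sketch on a reconstruction of the question's data; the character class <code>[-+A-Za-z]</code> removes signs and letters anywhere in the string, so the sign of negative values is dropped.

```python
import re

import pandas as pd

data = pd.DataFrame(
    {"time": ["09:00", "10:00", "11:00", "12:00", "13:00"],
     "result": ["+52A", "+62B", "+44a", "+30b", "-110a"]}
)

# Compile once: matches a '-' or '+' sign or any ASCII letter
unwanted = re.compile(r"[-+A-Za-z]")

# Replace each match with an empty string
data["result"] = data["result"].map(lambda x: unwanted.sub("", x))
print(data["result"].tolist())
```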

I am looking for an efficient way to remove unwanted parts from strings in a DataFrame column.

The data looks like:

<pre><code>    time  result
1  09:00    +52A
2  10:00    +62B
3  11:00    +44a
4  12:00    +30b
5  13:00   -110a
</code></pre>

I need to trim these data to:

<pre><code>    time  result
1  09:00      52
2  10:00      62
3  11:00      44
4  12:00      30
5  13:00     110
</code></pre>

I tried <code>.str.lstrip('+-')</code> and <code>.str.rstrip('aAbBcC')</code>, but got an error: 

<pre><code>TypeError: wrapper() takes exactly 1 argument (2 given)
</code></pre>

Any pointers would be greatly appreciated!

Remove unwanted parts from strings in a column
