String-Handling Tips in Python 3

深度学习与Python

Published 2019-07-10 14:14:07


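Python offers many library functions, but mastering a few essential programming techniques matters just as much. If you are comfortable with reference counting, type checking, data manipulation, stacks, variable management, eliminating unnecessary lists, and writing fewer and fewer "for" loops, your code becomes concise and efficient, and reading it becomes a pleasure.

Python has long been criticized for its speed, but the way code is written also affects how fast it runs. Take Fibonacci, which can be implemented in several ways. The most popular version uses only a "for" loop, because programmers coming from a C background iterate with for loops heavily. Using the internal loops that Python's data structures provide is often faster and more concise than an explicit "for" loop. Here are some of Python's built-in text-processing methods: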

>>> m = ['i am amazing in all the ways I should have']
>>> m[0]
'i am amazing in all the ways I should have'
>>> m[0].split()
['i', 'am', 'amazing', 'in', 'all', 'the', 'ways', 'I', 'should', 'have']
>>> n = m[0].split()
>>> n[2:]
['amazing', 'in', 'all', 'the', 'ways', 'I', 'should', 'have']
>>> n[0:2]
['i', 'am']
>>> n[-2]
'should'
>>>
>>> n[:-2]
['i', 'am', 'amazing', 'in', 'all', 'the', 'ways', 'I']
>>> n[::-2]
['have', 'I', 'the', 'in', 'am']

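These examples manipulate a string through a list. Note that none of them uses a for loop, yet each one does its job concisely and efficiently.

Here are some uses of the collections module: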
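As a small sketch of the same idea, the slices above can be combined with `str.join` to rebuild strings without writing an explicit loop. The variable names here are illustrative and not from the original article.

```python
sentence = "i am amazing in all the ways I should have"
words = sentence.split()

# Reverse the word order with a slice, then glue the words back together.
reversed_sentence = " ".join(words[::-1])

# Capitalize every word with a generator expression instead of a for statement.
title_cased = " ".join(word.capitalize() for word in words)

# Keep only the first three words.
first_three = " ".join(words[:3])

print(reversed_sentence)  # have should I ways the all in amazing am i
print(title_cased)        # I Am Amazing In All The Ways I Should Have
print(first_three)        # i am amazing
```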

>>> from collections import Counter
>>> Counter(range(10))
Counter({0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1})
>>> just_list_again = Counter(range(10))
>>> just_list_again_is_dict = just_list_again
>>> just_list_again_is_dict[1]
1
>>> just_list_again_is_dict[2]
1
>>> just_list_again_is_dict[3]
1
>>> just_list_again_is_dict['3']
0

Some other Counter methods:

>>> Counter('abraakadabraaaaa')
Counter({'a': 10, 'b': 2, 'r': 2, 'k': 1, 'd': 1})
>>> c1 = Counter('abraakadabraaaaa')
>>> c1.most_common(4)
[('a', 10), ('b', 2), ('r', 2), ('k', 1)]
>>> c1['b']  # works like a dictionary
2
>>> c1['k']  # works like a dictionary
1
>>> type(c1)
<class 'collections.Counter'>
>>> c1['b'] = 20
>>> c1.most_common(4)
[('b', 20), ('a', 10), ('r', 2), ('k', 1)]
>>> c1['b'] += 20
>>> c1.most_common(4)
[('b', 40), ('a', 10), ('r', 2), ('k', 1)]

The following counts how many times each character occurs in a string:

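>>> from collections import Counter
>>> c1 = Counter('hello hihi hoo')
>>> +c1  # unary plus keeps only the positive counts
Counter({'h': 4, 'o': 3, 'l': 2, ' ': 2, 'i': 2, 'e': 1})
>>> -c1  # unary minus keeps only the negative counts, with the sign flipped
Counter()
>>> c1['x']  # a missing key counts as 0
0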
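
Counter works on any iterable of hashable items, not only on characters. The sketch below counts whole words instead; the sample sentence and variable names are made up for illustration.

```python
from collections import Counter

text = "the cat and the dog and the bird"
word_counts = Counter(text.split())

# most_common(n) returns the n highest counts as (item, count) pairs.
top_two = word_counts.most_common(2)

# A key that was never seen counts as zero instead of raising KeyError.
fish_count = word_counts["fish"]

# Subtracting another Counter keeps only the entries that stay positive.
remaining = word_counts - Counter(["the", "the", "and"])

print(top_two)           # [('the', 3), ('and', 2)]
print(fish_count)        # 0
print(remaining["the"])  # 1
```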

OrderedDict:

>>> from collections import OrderedDict
>>> d = {'banana': 3, 'apple':4, 'pear': 1, 'orange': 2}
>>> new_d = OrderedDict(sorted(d.items()))
>>> new_d
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
>>> for key in new_d:
...     print (key, new_d[key])
... 
apple 4
banana 3
orange 2
pear 1
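
The example above sorts by key. As a further sketch, the same dictionary can be sorted by value, and `move_to_end` (a method specific to OrderedDict) can then reposition a single key. The name `by_count` is illustrative only.

```python
from collections import OrderedDict

d = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}

# Sort by value (the count) rather than by key.
by_count = OrderedDict(sorted(d.items(), key=lambda item: item[1]))
print(list(by_count))  # ['pear', 'orange', 'banana', 'apple']

# last=False moves the key to the front instead of the end.
by_count.move_to_end('apple', last=False)
print(list(by_count))  # ['apple', 'pear', 'orange', 'banana']
```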

Namedtuple:

from collections import namedtuple

# The primitive approach: fields are accessed by position
lat_lng = (37.78, -122.40)
print('The latitude is %f' % lat_lng[0])
print('The longitude is %f' % lat_lng[1])
# The glorious namedtuple: fields are accessed by name
LatLng = namedtuple('LatLng', ['latitude', 'longitude'])
lat_lng = LatLng(37.78, -122.40)
print('The latitude is %f' % lat_lng.latitude)
print('The longitude is %f' % lat_lng.longitude)
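
A namedtuple is still an ordinary tuple, so it unpacks and compares like one, and it is immutable. The sketch below shows a few of its helper methods; the variable names `point`, `moved`, and `as_dict` are illustrative.

```python
from collections import namedtuple

LatLng = namedtuple('LatLng', ['latitude', 'longitude'])
point = LatLng(37.78, -122.40)

# It unpacks like a plain tuple.
latitude, longitude = point

# Fields cannot be reassigned; _replace returns a modified copy instead.
moved = point._replace(latitude=40.0)

# _asdict converts the fields to a mapping of name to value.
as_dict = dict(point._asdict())

print(moved)    # LatLng(latitude=40.0, longitude=-122.4)
print(as_dict)  # {'latitude': 37.78, 'longitude': -122.4}
```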

ChainMap:

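ChainMap requires Python 3.3 or later. It links two dictionaries together without merging them: a lookup searches the dictionaries in order and returns the first match.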

>>> from collections import ChainMap
>>> a1 = {'m':2,'n':20,'r':490}
>>> a2 = {'m':34,'n':32,'z':90}
>>> chain = ChainMap(a1,a2)
>>> chain
ChainMap({'m': 2, 'n': 20, 'r': 490}, {'m': 34, 'n': 32, 'z': 90})
>>> chain['n']
20
>>> new_chain = ChainMap({'a':22,'n':27},chain)
>>> new_chain['a']
22
>>> new_chain['n']
27
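
Because the dictionaries are linked rather than copied, a ChainMap behaves as a live view. A minimal sketch, assuming a hypothetical settings scenario with `defaults` and `overrides` dictionaries that are not part of the original article:

```python
from collections import ChainMap

defaults = {'color': 'blue', 'size': 10}
overrides = {'size': 12}
settings = ChainMap(overrides, defaults)

print(settings['size'])   # 12, found in overrides first
print(settings['color'])  # blue, falls through to defaults

# Writes go to the first mapping only; defaults stays untouched.
settings['color'] = 'red'
print(overrides)          # {'size': 12, 'color': 'red'}
print(defaults['color'])  # blue

# Later changes to an underlying dict are visible through the chain.
defaults['shape'] = 'round'
print(settings['shape'])  # round
```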

Comprehensions: you can also write comprehensions for dictionaries and sets.

>>> m = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
>>> m
{'a': 1, 'b': 2, 'c': 3, 'd': 4}
>>> {v: k for k, v in m.items()}
{1: 'a', 2: 'b', 3: 'c', 4: 'd'}
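
The example above is a dictionary comprehension. As a sketch of the set form, and of a dictionary comprehension with a filter, consider the following; the word list is made up for illustration.

```python
words = ['apple', 'Banana', 'avocado', 'banana', 'cherry']

# Set comprehension: duplicates collapse automatically.
first_letters = {w[0].lower() for w in words}

# Dictionary comprehension with a condition: keep words longer than 5 letters.
lengths = {w: len(w) for w in words if len(w) > 5}

print(sorted(first_letters))  # ['a', 'b', 'c']
print(lengths)  # {'Banana': 6, 'avocado': 7, 'banana': 6, 'cherry': 6}
```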

startswith and endswith:

startswith and endswith test the beginning and the end of a string. Use these methods whenever you need to check how a string starts or ends.

phrase = "cat, dog and bird"
# See if the phrase starts with these strings.
if phrase.startswith("cat"):
    print(True)
if phrase.startswith("cat, dog"):
    print(True)
# It does not start with this string.
if not phrase.startswith("elephant"):
    print(False)
Output
True
True
False
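
The snippet above only exercises startswith. The sketch below adds endswith and two lesser-known options: both methods accept a tuple of candidates, and an optional start position. The file names are hypothetical.

```python
phrase = "cat, dog and bird"

# endswith checks the tail of the string.
ends_with_bird = phrase.endswith("bird")

# A tuple matches if any one of its items matches.
starts_with_pet = phrase.startswith(("dog", "cat"))

# An optional second argument sets where the test begins (index 5 is 'd').
dog_at_five = phrase.startswith("dog", 5)

# Filter file names by several extensions in one call.
filenames = ["notes.txt", "photo.png", "draft.md", "todo.txt"]
text_files = [name for name in filenames if name.endswith((".txt", ".md"))]

print(ends_with_bird, starts_with_pet, dog_at_five)  # True True True
print(text_files)  # ['notes.txt', 'draft.md', 'todo.txt']
```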

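map and imap:

map and imap apply a function to every item of an iterable. In Python 3 the built-in map returns a lazy iterator, which saves a lot of memory on large inputs. In Python 2, map builds a complete list, so for lazy evaluation there you can use the itertools module, where the function is named imap.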

>>> m = lambda x: x * x
>>> print(m)
<function <lambda> at 0x7f61acf9a9b0>
>>> print(m(3))
9
>>> # A lambda returns the value of its expression, so it can be passed
>>> # to map like any other function.
>>> my_sequence = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
>>> print(list(map(m, my_sequence)))  # list() consumes the lazy iterator
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400]
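
To make the lazy behavior in Python 3 concrete, here is a minimal sketch. The helper `square` and the variable names are illustrative; the point is that a map object produces values on demand and can be consumed only once.

```python
def square(x):
    return x * x

numbers = range(1, 6)
lazy_squares = map(square, numbers)

# Nothing has been computed yet; next() pulls one value at a time.
first = next(lazy_squares)

# list() consumes whatever is left in the iterator.
rest = list(lazy_squares)

# The iterator is now exhausted, so a second pass yields nothing.
leftover = list(lazy_squares)

# A list comprehension builds the full list eagerly.
eager = [x * x for x in numbers]

print(first, rest, leftover)  # 1 [4, 9, 16, 25] []
print(eager)                  # [1, 4, 9, 16, 25]
```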

Reference

https://towardsdatascience.com/python-for-text-processing-e8fa81802a71


Originally published: 2019-07-03
