Python Basics: strings 03

披头

Published 2019-12-26 11:02:08

Included in the column: datartisan

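Find the occurrence count and index positions of a substring
Check whether a substring exists and find its index position
Find the occurrence count and index positions of all characters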

Find the occurrence count and index positions of a substring

Count substring occurrences with str.count()

str.count(sub[, start[, end]])

In [35]: mainStr = 'This is a sample string and a sample code. It is very short.'
    ...:
    ...: # Get the occurrence count of sub-string in main string.
    ...: count = mainStr.count('sample')
    ...:
    ...: print("'sample' sub string frequency / occurrence count : " , count)
'sample' sub string frequency / occurrence count :  2
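
The optional start and end arguments in the signature above restrict the count to a slice of the string. A minimal sketch reusing the sample sentence; the variable names are illustrative only:

```python
main_str = 'This is a sample string and a sample code. It is very short.'

# Count over the whole string.
total = main_str.count('sample')

# Count from index 11 onward, which skips the first 'sample' (it starts at index 10).
after_first = main_str.count('sample', 11)

# Count only within main_str[0:20], which covers the first 'sample' but not the second.
in_prefix = main_str.count('sample', 0, 20)

print(total, after_first, in_prefix)  # 2 1 1
```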

Count occurrences with a Python regular expression

In [36]: import re
    ...:
    ...: # Create a Regex pattern to match the substring
    ...: regexPattern = re.compile("sample")
    ...:
    ...: # Get a list of strings that matches the given pattern i.e. substring
    ...: listOfMatches = regexPattern.findall(mainStr)
    ...:
    ...: print("'sample' sub string frequency / occurrence count : ", len(listOfMatches))
'sample' sub string frequency / occurrence count :  2
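
One caveat with the regex approach: the substring is interpreted as a pattern, so characters such as '.' or '*' change its meaning. The sketch below assumes the goal is to count literal text and uses re.escape() for that; count_literal is a hypothetical helper name, not part of the original article:

```python
import re

def count_literal(main_str, sub_str):
    """Count non-overlapping occurrences of sub_str, treated as literal text."""
    pattern = re.compile(re.escape(sub_str))
    return len(pattern.findall(main_str))

text = 'a.b.c'
print(len(re.findall('.', text)))  # 5: an unescaped '.' matches every character
print(count_literal(text, '.'))    # 2: only the literal dots
```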

Counting overlapping substrings

str.count() counts only non-overlapping occurrences, so it undercounts substrings that overlap.

In [37]: mainStr = 'thathatthat'

In [38]: # string.count() will not be able to count occurrences of overlapping sub-strings
    ...: count = mainStr.count('that')

In [39]: count
Out[39]: 2
'------ "that" should be counted 3 times when overlaps are included ------'

In [40]: # Custom function that counts occurrences, including overlapping ones
    ...: def frequencyCount(mainStr, subStr):
    ...:    counter = pos = 0
    ...:    while(True):
    ...:        pos = mainStr.find(subStr , pos)
    ...:        # pos is the start index for find(), which returns -1 when nothing is found
    ...:        if pos > -1:
    ...:            counter = counter + 1
    ...:            pos = pos + 1
    ...:        else:
    ...:            break
    ...:    return counter
    ...:

In [41]: # count occurrences of overlapping substrings
    ...: count = frequencyCount(mainStr, 'that')
    ...:
    ...: print("'that' sub string frequency count : ", count)
'that' sub string frequency count :  3
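
As an alternative to the loop above, overlapping matches can also be counted with a regex lookahead, which matches at a position without consuming any characters. This is a sketch of a different technique than the one the article uses; count_overlapping is a hypothetical name:

```python
import re

def count_overlapping(main_str, sub_str):
    """Count occurrences of sub_str in main_str, including overlapping ones."""
    if not sub_str:
        raise ValueError('sub_str must not be empty')
    # (?=...) is a zero-width lookahead: each start position is tested independently.
    pattern = '(?=' + re.escape(sub_str) + ')'
    return len(re.findall(pattern, main_str))

print(count_overlapping('thathatthat', 'that'))  # 3
```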

Find the occurrence count and all starting index positions

Using Python regex finditer()

In [50]: print('**** Find Occurrence count and all index position of a sub-string in a String **** ')
    ...:
    ...: import re
    ...:
    ...: mainStr = 'This is a sample string and a sample code. It is very Short.'
    ...:
    ...: # Create a Regex pattern to match the substring
    ...: regexPattern = re.compile('sample')
    ...:
    ...: # Iterate over all the matches of the substring using the iterator of match objects returned by finditer()
    ...: iteratorOfMatchObs = regexPattern.finditer(mainStr)
    ...: indexPositions = []
    ...: count = 0
    ...: for matchObj in iteratorOfMatchObs:
    ...:    indexPositions.append(matchObj.start())
    ...:    count = count + 1
    ...:
    ...: print("Occurrence Count of substring 'sample' : ", count)
    ...: print("Index Positions of 'sample' are : ", indexPositions)
**** Find Occurrence count and all index position of a sub-string in a String ****
Occurrence Count of substring 'sample' :  2
Index Positions of 'sample' are :  [10, 30]
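
Each match object also exposes span(), which returns the start and the exclusive end index together. A compact sketch with a list comprehension (variable names are illustrative):

```python
import re

main_str = 'This is a sample string and a sample code. It is very Short.'

# span() returns (start, end), where end is one past the last matched character.
spans = [m.span() for m in re.finditer(re.escape('sample'), main_str)]

print(len(spans), spans)  # 2 [(10, 16), (30, 36)]
```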

Using a custom function to find the index positions of overlapping substrings

In [51]: def frequencyCountAndPositions(mainStr, subStr):
    ...:    counter = pos = 0
    ...:    indexpos = []
    ...:    while(True):
    ...:        pos = mainStr.find(subStr , pos)
    ...:        # pos is the start index for find(), which returns -1 when nothing is found
    ...:        if pos > -1:
    ...:            indexpos.append(pos)
    ...:            counter = counter + 1
    ...:            pos = pos + 1
    ...:        else:
    ...:            break
    ...:    return (counter, indexpos)
    ...:

In [52]: mainStr = 'thathatthat'
    ...:
    ...: result = frequencyCountAndPositions(mainStr, 'that')
    ...:
    ...: print("Occurrence Count of overlapping sub-strings 'that' : ", result[0])
    ...: print("Index Positions of 'that' are : ", result[1])
Occurrence Count of overlapping sub-strings 'that' :  3
Index Positions of 'that' are :  [0, 3, 7]

Find the index position of the n-th occurrence

In [54]: mainStr = 'This is a sample string and a sample code. It is very Short.'
    ...:
    ...: result = frequencyCountAndPositions(mainStr, 'is')
    ...: if result[0] >= 2:
    ...:    print("Index Positions of 2nd Occurrence of sub-string 'is'  : ", result[1][1])
    ...:
Index Positions of 2nd Occurrence of sub-string 'is'  :  5
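
When only the n-th occurrence is needed, collecting every position first is not required. The sketch below stops as soon as the n-th match is reached; find_nth is a hypothetical helper that counts overlapping occurrences the same way frequencyCountAndPositions does:

```python
def find_nth(main_str, sub_str, n):
    """Return the start index of the n-th (1-based) occurrence, or -1 if there is none."""
    if n < 1:
        raise ValueError('n must be at least 1')
    pos = -1
    for _ in range(n):
        # Resume the search one character after the previous hit.
        pos = main_str.find(sub_str, pos + 1)
        if pos == -1:
            return -1
    return pos

main_str = 'This is a sample string and a sample code. It is very Short.'
print(find_nth(main_str, 'is', 2))  # 5
```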

Check whether a substring exists and find its index position

Using the in / not in operators

In [55]: mainStr = "This is a sample String with sample message."
    ...:
    ...: # Use in operator to check if sub string exists in another string
    ...: if "sample" in mainStr:
    ...:    print ('Sub-string Found')
    ...: else:
    ...:    print('Sub-string not found')
    ...:
Sub-string Found

In [56]: mainStr = "This is a sample String with sample message."
    ...:
    ...: if "Hello" not in mainStr:
    ...:    print("Sub-string doesn't exist in main String")
    ...:
Sub-string doesn't exist in main String
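
The in operator only reports whether the substring is present. To get its index position as well, the built-in str.find(), str.rfind() and str.index() methods can be used. A short sketch (variable names are illustrative):

```python
main_str = "This is a sample String with sample message."

first_pos = main_str.find('sample')     # index of the first occurrence
last_pos = main_str.rfind('sample')     # index of the last occurrence
missing_pos = main_str.find('Hello')    # find() returns -1 when nothing is found

# index() behaves like find() but raises ValueError instead of returning -1.
try:
    main_str.index('Hello')
    index_raised = False
except ValueError:
    index_raised = True

print(first_pos, last_pos, missing_pos, index_raised)  # 10 29 -1 True
```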

Ignoring case

In [57]: mainStr = "This is a sample String with sample message."
    ...:
    ...: # use in operator to check if sub string exists by ignoring case of strings
    ...: # Convert both the strings to lower case then check for membership using in operator
    ...: if "SAMple".lower() in mainStr.lower():
    ...:    print('Sub-string Found')
    ...: else:
    ...:    print('Sub-string not found')
    ...:
Sub-string Found
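
For text that may contain non-ASCII letters, str.casefold() is a more aggressive normalization than lower() and is intended for caseless comparison. The sketch below assumes such input; contains_ignore_case is a hypothetical helper name:

```python
def contains_ignore_case(main_str, sub_str):
    """Caseless membership test based on str.casefold()."""
    return sub_str.casefold() in main_str.casefold()

print(contains_ignore_case('This is a sample String', 'SAMple'))  # True

# lower() leaves the German sharp s unchanged, casefold() expands it to 'ss'.
print('STRASSE'.lower() in 'Die Straße'.lower())      # False
print(contains_ignore_case('Die Straße', 'STRASSE'))  # True
```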

Check whether the string contains any element of a list

In [58]: mainStr = "This is a sample String with sample message."
    ...:
    ...: listOfstrs = ['Hello', 'here', 'with', 'here', 'who']
    ...:
    ...: def checkIfAny(mainStr, listOfStr):
    ...:    for subStr in listOfStr:
    ...:        if subStr in mainStr:
    ...:            return (True, subStr)
    ...:    return (False, "")
    ...:
    ...: # Check if mainStr string contains any string from the list
    ...: result = checkIfAny(mainStr, listOfstrs)
    ...: if result[0]:
    ...:    print('Sub-string Found in main String : ', result[1])
    ...:
Sub-string Found in main String :  with

Using any() and a list comprehension

In [59]: # Check if any string from the list exists in given string
    ...: result = any([subStr in mainStr for subStr in listOfstrs])
    ...:
    ...: if result:
    ...:    print('A string from list Found in main String ')
    ...:
A string from list Found in main String
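
A generator expression lets any() stop at the first hit instead of building the whole list first, and next() with a default can return the matching substring itself. A minimal sketch (variable names are illustrative):

```python
main_str = "This is a sample String with sample message."
candidates = ['Hello', 'here', 'with', 'here', 'who']

# any() short-circuits: the remaining candidates are not checked after a hit.
found = any(sub in main_str for sub in candidates)

# next() returns the first matching candidate, or None when there is no match.
first_match = next((sub for sub in candidates if sub in main_str), None)

print(found, first_match)  # True with
```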

Check whether the string contains all elements of a list

In [60]: mainStr = "This is a sample String with sample message."
    ...: listOfstrs = ['sample', 'String', 'with']
    ...:
    ...: # Check if all strings from the list exist in the given string
    ...: result = all([subStr in mainStr for subStr in listOfstrs])
    ...:
    ...: if result:
    ...:    print('All strings from list Found in main String ')
    ...:
All strings from list Found in main String
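
When all() returns False it is often useful to know which elements were missing. The sketch below adds a hypothetical extra entry, 'code', to the list to show that case:

```python
main_str = "This is a sample String with sample message."
required = ['sample', 'String', 'with', 'code']  # 'code' is added for illustration

# Collect the required substrings that do not appear in main_str.
missing = [sub for sub in required if sub not in main_str]
all_present = not missing

print(all_present, missing)  # False ['code']
```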

Using Python regex

Case-sensitive

In [61]: # Create a pattern to match string 'sample'
    ...: patternObj = re.compile("sample")

In [62]: mainStr = "This is a sample String with sample message."
    ...:
    ...: # search for the pattern in the string and return the match object
    ...: matchObj = patternObj.search(mainStr)
    ...:
    ...: # check that the match object is not None
    ...: if matchObj:
    ...:    print('Sub-string Found')
    ...: else:
    ...:    print('Sub-string Not Found')
    ...:
Sub-string Found

Ignoring case

In [63]: # search for the sub-string in string by ignoring case
    ...: matchObj =  re.search('SAMple', mainStr, flags=re.IGNORECASE)
    ...:
    ...: if matchObj:
    ...:    print('Sub-string Found')
    ...: else:
    ...:    print('Sub-string Not Found')
    ...:
Sub-string Found
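
The match object returned by re.search() also carries the index position and the text that actually matched. A short sketch, assuming the substring should be treated as literal text (hence re.escape()):

```python
import re

main_str = "This is a sample String with sample message."

# Case-insensitive search; match is None when nothing is found.
match = re.search(re.escape('SAMple'), main_str, flags=re.IGNORECASE)

if match:
    # start() is the index of the first occurrence, group() the matched text.
    print(match.start(), match.group())  # 10 sample
```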

Find the occurrence count and index positions of all characters

Using collections.Counter()

collections.Counter([iterable-or-mapping])

In [65]: from collections import Counter

In [66]: mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'
    ...:
    ...: # Counter is a dict sub class that keeps the characters in string as keys and their frequency as value
    ...: frequency = Counter(mainStr)
    ...:
    ...: print("Occurrence Count of all characters :")
    ...: # Iterate over the dictionary and Print the frequency of each character
    ...: for (key, value) in frequency.items():
    ...:    print("Occurrence Count of ", key, " is : ", value)
    ...:
Occurrence Count of all characters :
Occurrence Count of  T  is :  1
Occurrence Count of  h  is :  2
Occurrence Count of  i  is :  5
Occurrence Count of  s  is :  8
Occurrence Count of     is :  15
Occurrence Count of  a  is :  6
Occurrence Count of  m  is :  2
Occurrence Count of  p  is :  2
Occurrence Count of  l  is :  2
Occurrence Count of  e  is :  4
Occurrence Count of  t  is :  4
Occurrence Count of  r  is :  4
Occurrence Count of  n  is :  3
Occurrence Count of  g  is :  2
Occurrence Count of  d  is :  2
Occurrence Count of  c  is :  1
Occurrence Count of  o  is :  2
Occurrence Count of  .  is :  2
Occurrence Count of  I  is :  1
Occurrence Count of  v  is :  1
Occurrence Count of  y  is :  1
Occurrence Count of  0  is :  2
Occurrence Count of  1  is :  2
Occurrence Count of  2  is :  2
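
Counter also provides most_common(), which returns the entries sorted by descending count. A brief sketch on the same string (variable names are illustrative):

```python
from collections import Counter

main_str = 'This is a sample string and a sample code. It is a very short string. 001122'

frequency = Counter(main_str)

# The three most frequent characters as (character, count) pairs.
top_three = frequency.most_common(3)

print(top_three)  # [(' ', 15), ('s', 8), ('a', 6)]
```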

Using Python regex

In [67]: import re
    ...:
    ...: # Create a Regex pattern to match alphanumeric characters
    ...: regexPattern = re.compile('[a-zA-Z0-9]')
    ...:
    ...: mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'
    ...:
    ...: # Iterate over all the alphanumeric characters in string (that matches the regex pattern)
    ...: # While Iterating keep on updating the frequency count of each character in a dictionary
    ...: iteratorOfMatchObs = regexPattern.finditer(mainStr)
    ...: frequencyOfChars = {}
    ...: indexPositions = {}
    ...:
    ...: for matchObj in iteratorOfMatchObs:
    ...:    frequencyOfChars[matchObj.group()] = frequencyOfChars.get(matchObj.group(), 0) + 1
    ...:    indexPositions[matchObj.group()] = indexPositions.get(matchObj.group(), []) + [matchObj.start()]
    ...:
    ...: # Iterate over the dictionary and Print the frequency of each character
    ...: for (key, value) in frequencyOfChars.items():
    ...:    print("Occurrence Count of ", key , " is : ", value , ' & Index Positions : ', indexPositions[key])
    ...:
Occurrence Count of  T  is :  1  & Index Positions :  [0]
Occurrence Count of  h  is :  2  & Index Positions :  [1, 57]
Occurrence Count of  i  is :  5  & Index Positions :  [2, 5, 20, 46, 65]
Occurrence Count of  s  is :  8  & Index Positions :  [3, 6, 10, 17, 30, 47, 56, 62]
Occurrence Count of  a  is :  6  & Index Positions :  [8, 11, 24, 28, 31, 49]
Occurrence Count of  m  is :  2  & Index Positions :  [12, 32]
Occurrence Count of  p  is :  2  & Index Positions :  [13, 33]
Occurrence Count of  l  is :  2  & Index Positions :  [14, 34]
Occurrence Count of  e  is :  4  & Index Positions :  [15, 35, 40, 52]
Occurrence Count of  t  is :  4  & Index Positions :  [18, 44, 60, 63]
Occurrence Count of  r  is :  4  & Index Positions :  [19, 53, 59, 64]
Occurrence Count of  n  is :  3  & Index Positions :  [21, 25, 66]
Occurrence Count of  g  is :  2  & Index Positions :  [22, 67]
Occurrence Count of  d  is :  2  & Index Positions :  [26, 39]
Occurrence Count of  c  is :  1  & Index Positions :  [37]
Occurrence Count of  o  is :  2  & Index Positions :  [38, 58]
Occurrence Count of  I  is :  1  & Index Positions :  [43]
Occurrence Count of  v  is :  1  & Index Positions :  [51]
Occurrence Count of  y  is :  1  & Index Positions :  [54]
Occurrence Count of  0  is :  2  & Index Positions :  [70, 71]
Occurrence Count of  1  is :  2  & Index Positions :  [72, 73]
Occurrence Count of  2  is :  2  & Index Positions :  [74, 75]
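
The same result can be sketched without a regex by walking the string with enumerate() and grouping positions in a defaultdict. Note one assumption: str.isalnum() also accepts non-ASCII letters and digits, so it is broader than the [a-zA-Z0-9] pattern above, although the two agree on this ASCII-only string:

```python
from collections import defaultdict

main_str = 'This is a sample string and a sample code. It is a very short string. 001122'

# Map each alphanumeric character to the list of indexes where it occurs.
positions = defaultdict(list)
for index, char in enumerate(main_str):
    if char.isalnum():
        positions[char].append(index)

for char, indexes in positions.items():
    print("Occurrence Count of ", char, " is : ", len(indexes), ' & Index Positions : ', indexes)
```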

Using collections.Counter() to find repeated characters

In [69]: from collections import Counter
    ...:
    ...: mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'
    ...:
    ...: listOfDupChars = []
    ...: # Counter is a dict sub class that keeps the characters in string as keys and their frequency as value
    ...: frequency = Counter(mainStr)
    ...:
    ...: # Collect every character that occurs more than 4 times
    ...: for (key, value) in frequency.items():
    ...:    if value > 4:
    ...:        listOfDupChars.append(key)
    ...: print('Duplicate characters : ', listOfDupChars)
Duplicate characters :  ['i', 's', ' ', 'a']
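
The session above keeps only characters that occur more than 4 times. If "duplicate" is taken to mean "occurs at least twice", the threshold can be made a parameter. This is a sketch under that assumption; chars_with_min_count is a hypothetical helper name:

```python
from collections import Counter

def chars_with_min_count(text, min_count=2):
    """Return the characters that occur at least min_count times, in first-seen order."""
    return [char for char, count in Counter(text).items() if count >= min_count]

main_str = 'This is a sample string and a sample code. It is a very short string. 001122'

print(chars_with_min_count('hello'))      # ['l']
print(chars_with_min_count(main_str, 5))  # ['i', 's', ' ', 'a']
```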

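This article is part of the Tencent Cloud self-media syndication program and was shared from a WeChat official account.

Originally published: 2019-06-17. To report infringement, contact cloudcommunity@tencent.com for removal.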
