  • Count substring occurrences with str.count()

str.count(sub[, start[, end]])

In [35]: mainStr = 'This is a sample string and a sample code. It is very short.'
    ...: # Get the occurrence count of sub-string in main string.
    ...: count = mainStr.count('sample')
    ...: print("'sample' sub string frequency / occurrence count : " , count)
'sample' sub string frequency / occurrence count :  2
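
The optional start and end arguments restrict the count to a slice of the string. A minimal sketch, reusing the sample sentence from above; the variable names and the slice boundaries (20) are chosen only for illustration:

```python
main_str = 'This is a sample string and a sample code. It is very short.'

# Count over the whole string.
total = main_str.count('sample')

# Count only from index 20 onward; the first 'sample' (index 10) is skipped.
from_20 = main_str.count('sample', 20)

# Count only within main_str[0:20]; the second 'sample' (index 30) is excluded.
first_20 = main_str.count('sample', 0, 20)

print(total, from_20, first_20)  # 2 1 1
```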
  • Count occurrences with a Python regular expression
In [36]: import re
    ...: # Create a Regex pattern to match the substring
    ...: regexPattern = re.compile("sample")
    ...: # Get a list of the strings that match the given pattern, i.e. the substring
    ...: listOfMatches = regexPattern.findall(mainStr)
    ...: print("'sample' sub string frequency / occurrence count : ", len(listOfMatches))
'sample' sub string frequency / occurrence count :  2
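
One caveat with the regex approach: the substring is interpreted as a pattern, so characters such as '.' take on a special meaning. A small sketch of how re.escape() may be used to count a literal substring; the sample text and variable names here are made up for illustration:

```python
import re

text = 'a.b axb a.b'
sub = 'a.b'

# Unescaped, '.' matches any character, so 'axb' is counted as well.
unescaped_count = len(re.findall(sub, text))

# re.escape() makes every character in sub match literally.
escaped_count = len(re.findall(re.escape(sub), text))

print(unescaped_count, escaped_count)  # 3 2
```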
  • Count overlapping substrings

str.count() counts only non-overlapping occurrences, so it undercounts substrings that overlap.

In [37]: mainStr = 'thathatthat'

In [38]: # str.count() does not count overlapping occurrences of a sub-string
    ...: count = mainStr.count('that')

In [39]: count
Out[39]: 2
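In [40]: # Custom function that counts occurrences, including overlapping ones
    ...: def frequencyCount(mainStr, subStr):
    ...:    counter = pos = 0
    ...:    while True:
    ...:        pos = mainStr.find(subStr, pos)
    ...:        # find() searches from index pos and returns -1 when nothing is found
    ...:        if pos > -1:
    ...:            counter += 1
    ...:            # Advance by one character only, so overlapping matches are counted
    ...:            pos += 1
    ...:        else:
    ...:            break
    ...:    return counter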

In [41]: # count occurrences of overlapping substrings
    ...: count = frequencyCount(mainStr, 'that')
    ...: print("'that' sub string frequency count : ", count)
'that' sub string frequency count :  3
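
As an alternative to the find() loop, overlapping occurrences can also be counted with a regex lookahead. This is only a sketch, not part of the original session; the function name count_overlapping is hypothetical:

```python
import re

def count_overlapping(main_str, sub_str):
    """Count occurrences of sub_str in main_str, including overlapping ones."""
    # A lookahead (?=...) matches without consuming characters, so the next
    # match may start inside the text covered by the previous one.
    pattern = '(?=' + re.escape(sub_str) + ')'
    return len(re.findall(pattern, main_str))

print(count_overlapping('thathatthat', 'that'))  # 3
```

The lookahead version is shorter, while the find() loop avoids the regex engine entirely; which one reads better is largely a matter of taste.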
  • Find the occurrence count and all starting index positions

Using Python regex finditer()

In [50]: print('**** Find Occurrence count and all index position of a sub-string in a String **** ')
    ...: import re
    ...: mainStr = 'This is a sample string and a sample code. It is very Short.'
    ...: # Create a Regex pattern to match the substring
    ...: regexPattern = re.compile('sample')
    ...: # Iterate over all matches of the substring using the iterator of match objects returned by finditer()
    ...: iteratorOfMatchObs = regexPattern.finditer(mainStr)
    ...: indexPositions = []
    ...: count = 0
    ...: for matchObj in iteratorOfMatchObs:
    ...:    indexPositions.append(matchObj.start())
    ...:    count = count + 1
    ...: print("Occurrence Count of substring 'sample' : ", count)
    ...: print("Index Positions of 'sample' are : ", indexPositions)
**** Find Occurrence count and all index position of a sub-string in a String ****
Occurrence Count of substring 'sample' :  2
Index Positions of 'sample' are :  [10, 30]
  • Find index positions of overlapping substrings with a custom function
In [51]: def frequencyCountAndPositions(mainStr, subStr):
    ...:    counter = pos = 0
    ...:    indexpos = []
    ...:    while True:
    ...:        pos = mainStr.find(subStr, pos)
    ...:        # find() searches from index pos and returns -1 when nothing is found
    ...:        if pos > -1:
    ...:            indexpos.append(pos)
    ...:            counter += 1
    ...:            # Advance by one character only, so overlapping matches are found
    ...:            pos += 1
    ...:        else:
    ...:            break
    ...:    return (counter, indexpos)

In [52]: mainStr = 'thathatthat'
    ...: result = frequencyCountAndPositions(mainStr, 'that')
    ...: print("Occurrence Count of overlapping sub-strings 'that' : ", result[0])
    ...: print("Index Positions of 'that' are : ", result[1])
Occurrence Count of overlapping sub-strings 'that' :  3
Index Positions of 'that' are :  [0, 3, 7]
  • Find the index position of the nth occurrence
In [54]: mainStr = 'This is a sample string and a sample code. It is very Short.'
    ...: result = frequencyCountAndPositions(mainStr, 'is')
    ...: if result[0] >= 2:
    ...:    print("Index Positions of 2nd Occurrence of sub-string 'is'  : ", result[1][1])
Index Positions of 2nd Occurrence of sub-string 'is'  :  5
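
The session above collects every position and then indexes into the list. When only the nth position is needed, the search can stop early. A minimal sketch, assuming a 1-based n and a return value of -1 when there are fewer than n occurrences; find_nth is a hypothetical helper name:

```python
def find_nth(main_str, sub_str, n):
    """Return the start index of the n-th (1-based) occurrence, or -1."""
    if n < 1 or not sub_str:
        return -1
    pos = -1
    for _ in range(n):
        # Resume one character after the previous hit, so overlaps are allowed.
        pos = main_str.find(sub_str, pos + 1)
        if pos == -1:
            return -1
    return pos

main_str = 'This is a sample string and a sample code. It is very Short.'
print(find_nth(main_str, 'is', 2))  # 5
```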
  • Use the in / not in operators
In [55]: mainStr = "This is a sample String with sample message."
    ...: # Use in operator to check if sub string exists in another string
    ...: if "sample" in mainStr:
    ...:    print ('Sub-string Found')
    ...: else:
    ...:    print('Sub-string not found')
Sub-string Found
In [56]: mainStr = "This is a sample String with sample message."
    ...: if "Hello" not in mainStr:
    ...:    print("Sub-string doesn't exist in main String")
Sub-string doesn't exist in main String

  • Ignore case
In [57]: mainStr = "This is a sample String with sample message."
    ...: # use in operator to check if sub string exists by ignoring case of strings
    ...: # Convert both the strings to lower case then check for membership using in operator
    ...: if "SAMple".lower() in mainStr.lower():
    ...:    print('Sub-string Found')
    ...: else:
    ...:    print('Sub-string not found')
Sub-string Found
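
For text that is not plain ASCII, str.casefold() is generally a more robust choice than lower() for caseless matching. The sketch below illustrates the difference with the German sharp s, written as the escape \u00df; the helper name contains_ignore_case is hypothetical:

```python
def contains_ignore_case(main_str, sub_str):
    """Check membership after applying casefold() to both strings."""
    return sub_str.casefold() in main_str.casefold()

sample = "This is a sample String with sample message."
german = 'Stra\u00dfe'  # 'Strasse' spelled with a sharp s

print(contains_ignore_case(sample, 'SAMple'))   # True
# lower() keeps the sharp s, so the 'ss' spelling is not found.
print('STRASSE'.lower() in german.lower())      # False
# casefold() expands the sharp s to 'ss', so both spellings compare equal.
print(contains_ignore_case(german, 'STRASSE'))  # True
```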
  • Check whether a string contains any element of a list
In [58]: mainStr = "This is a sample String with sample message."
    ...: listOfstrs = ['Hello', 'here', 'with', 'here', 'who']
    ...: def checkIfAny(mainStr, listOfStr):
    ...:    for subStr in listOfStr:
    ...:        if subStr in mainStr:
    ...:            return (True, subStr)
    ...:    return (False, "")
    ...: # Check if mainStr string contains any string from the list
    ...: result = checkIfAny(mainStr, listOfstrs)
    ...: if result[0]:
    ...:    print('Sub-string Found in main String : ', result[1])
Sub-string Found in main String :  with

Using any() with a list comprehension

In [59]: # Check if any string from the list exists in given string
    ...: result = any([subStr in mainStr for subStr in listOfstrs])
    ...: if result:
    ...:    print('A string from list Found in main String ')
A string from list Found in main String
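
A list comprehension tests every element before any() sees the results. A generator expression lets any() stop at the first hit, and next() can return the matching element itself. A short sketch with illustrative variable names:

```python
main_str = "This is a sample String with sample message."
candidates = ['Hello', 'here', 'with', 'here', 'who']

# Generator expression: evaluation stops as soon as one element matches.
found = any(sub in main_str for sub in candidates)

# next() yields the first matching element, or None if nothing matches.
first_match = next((sub for sub in candidates if sub in main_str), None)

print(found, first_match)  # True with
```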
  • Check whether a string contains all elements of a list
In [60]: mainStr = "This is a sample String with sample message."
    ...: listOfstrs = ['sample', 'String', 'with']
    ...: # Check if all strings from the list exist in the given string
    ...: result = all([subStr in mainStr for subStr in listOfstrs])
    ...: if result:
    ...:    print('All strings from list Found in main String ')
All strings from list Found in main String
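
When the all() check fails, it is often useful to know which elements were missing. One possible sketch; the extra list entry 'Hello' is added here only to force a miss:

```python
main_str = "This is a sample String with sample message."
required = ['sample', 'String', 'with', 'Hello']

# Collect every required substring that does not occur in main_str.
missing = [sub for sub in required if sub not in main_str]

# An empty list means every required substring was found.
all_present = not missing

print(all_present, missing)  # False ['Hello']
```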
  • Check for a substring with Python regex


In [61]: # Create a pattern to match string 'sample'
    ...: patternObj = re.compile("sample")

In [62]: mainStr = "This is a sample String with sample message."
    ...: # search for the pattern in the string and return the match object
    ...: matchObj = patternObj.search(mainStr)
    ...: # check that the match object is not None
    ...: if matchObj:
    ...:    print('Sub-string Found')
    ...: else:
    ...:    print('Sub-string Not Found')
Sub-string Found


In [63]: # search for the sub-string in string by ignoring case
    ...: matchObj =  re.search('SAMple', mainStr, flags=re.IGNORECASE)
    ...: if matchObj:
    ...:    print('Sub-string Found')
    ...: else:
    ...:    print('Sub-string Not Found')
Sub-string Found
  • Count character occurrences with collections.Counter()


In [65]: from collections import Counter

In [66]: mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'
    ...: # Counter is a dict subclass that stores each character of the string as a key and its frequency as the value
    ...: frequency = Counter(mainStr)
    ...: print("Occurrence Count of all characters :")
    ...: # Iterate over the dictionary and Print the frequency of each character
    ...: for (key, value) in frequency.items():
    ...:    print("Occurrence Count of ", key, " is : ", value)
Occurrence Count of all characters :
Occurrence Count of  T  is :  1
Occurrence Count of  h  is :  2
Occurrence Count of  i  is :  5
Occurrence Count of  s  is :  8
Occurrence Count of     is :  15
Occurrence Count of  a  is :  6
Occurrence Count of  m  is :  2
Occurrence Count of  p  is :  2
Occurrence Count of  l  is :  2
Occurrence Count of  e  is :  4
Occurrence Count of  t  is :  4
Occurrence Count of  r  is :  4
Occurrence Count of  n  is :  3
Occurrence Count of  g  is :  2
Occurrence Count of  d  is :  2
Occurrence Count of  c  is :  1
Occurrence Count of  o  is :  2
Occurrence Count of  .  is :  2
Occurrence Count of  I  is :  1
Occurrence Count of  v  is :  1
Occurrence Count of  y  is :  1
Occurrence Count of  0  is :  2
Occurrence Count of  1  is :  2
Occurrence Count of  2  is :  2
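
Counter also offers most_common() for ranking, and it returns 0 for keys it has never seen instead of raising KeyError. A brief sketch on the same sample string:

```python
from collections import Counter

main_str = 'This is a sample string and a sample code. It is a very short string. 001122'
frequency = Counter(main_str)

# The three most frequent characters, as (character, count) pairs.
top_three = frequency.most_common(3)
print(top_three)  # [(' ', 15), ('s', 8), ('a', 6)]

# Looking up a character that never occurs gives 0.
print(frequency['z'])  # 0
```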
  • Count characters and record their index positions with Python regex
In [67]: import re
    ...: # Create a Regex pattern to match alphanumeric characters
    ...: regexPattern = re.compile('[a-zA-Z0-9]')
    ...: mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'
    ...: # Iterate over all the alphanumeric characters in string (that matches the regex pattern)
    ...: # While Iterating keep on updating the frequency count of each character in a dictionary
    ...: iteratorOfMatchObs = regexPattern.finditer(mainStr)
    ...: frequencyOfChars = {}
    ...: indexPositions = {}
    ...: for matchObj in iteratorOfMatchObs:
    ...:    frequencyOfChars[matchObj.group()] = frequencyOfChars.get(matchObj.group(), 0) + 1
    ...:    indexPositions[matchObj.group()] = indexPositions.get(matchObj.group(), []) + [matchObj.start()]
    ...: # Iterate over the dictionary and Print the frequency of each character
    ...: for (key, value) in frequencyOfChars.items():
    ...:    print("Occurrence Count of ", key , " is : ", value , ' & Index Positions : ', indexPositions[key])
Occurrence Count of  T  is :  1  & Index Positions :  [0]
Occurrence Count of  h  is :  2  & Index Positions :  [1, 57]
Occurrence Count of  i  is :  5  & Index Positions :  [2, 5, 20, 46, 65]
Occurrence Count of  s  is :  8  & Index Positions :  [3, 6, 10, 17, 30, 47, 56, 62]
Occurrence Count of  a  is :  6  & Index Positions :  [8, 11, 24, 28, 31, 49]
Occurrence Count of  m  is :  2  & Index Positions :  [12, 32]
Occurrence Count of  p  is :  2  & Index Positions :  [13, 33]
Occurrence Count of  l  is :  2  & Index Positions :  [14, 34]
Occurrence Count of  e  is :  4  & Index Positions :  [15, 35, 40, 52]
Occurrence Count of  t  is :  4  & Index Positions :  [18, 44, 60, 63]
Occurrence Count of  r  is :  4  & Index Positions :  [19, 53, 59, 64]
Occurrence Count of  n  is :  3  & Index Positions :  [21, 25, 66]
Occurrence Count of  g  is :  2  & Index Positions :  [22, 67]
Occurrence Count of  d  is :  2  & Index Positions :  [26, 39]
Occurrence Count of  c  is :  1  & Index Positions :  [37]
Occurrence Count of  o  is :  2  & Index Positions :  [38, 58]
Occurrence Count of  I  is :  1  & Index Positions :  [43]
Occurrence Count of  v  is :  1  & Index Positions :  [51]
Occurrence Count of  y  is :  1  & Index Positions :  [54]
Occurrence Count of  0  is :  2  & Index Positions :  [70, 71]
Occurrence Count of  1  is :  2  & Index Positions :  [72, 73]
Occurrence Count of  2  is :  2  & Index Positions :  [74, 75]
  • Find repeated characters with collections.Counter()
In [69]: from collections import Counter
    ...: mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'
    ...: listOfDupChars = []
    ...: # Counter is a dict subclass that stores each character of the string as a key and its frequency as the value
    ...: frequency = Counter(mainStr)
    ...: # Iterate over the dictionary and collect the characters that occur more than 4 times
    ...: for (key, value) in frequency.items():
    ...:    if value > 4:
    ...:        listOfDupChars.append(key)
    ...: print('Characters occurring more than 4 times : ', listOfDupChars)
Characters occurring more than 4 times :  ['i', 's', ' ', 'a']
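
The session above keeps only characters that occur more than four times (value > 4). If "duplicate" is meant literally, that is, any character occurring at least twice, the threshold can be made a parameter. A minimal sketch; repeated_chars and min_count are hypothetical names:

```python
from collections import Counter

def repeated_chars(text, min_count=2):
    """Return the characters occurring at least min_count times, in first-seen order."""
    return [char for char, count in Counter(text).items() if count >= min_count]

main_str = 'This is a sample string and a sample code. It is a very short string. 001122'

# Same result as the "value > 4" filter used in the session.
print(repeated_chars(main_str, min_count=5))  # ['i', 's', ' ', 'a']

# Literal duplicates: every character that appears two or more times.
duplicates = repeated_chars(main_str)
print(duplicates)
```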
