Python String Processing
Background: project applications based on regular expressions.
1. Deleting whitespace or specific characters from the ends of a Python string: strip(), rstrip(), lstrip()
Description. strip(), rstrip(), and lstrip() remove consecutive whitespace characters, or consecutive characters from a given set, at both ends, the right end, and the left end of a string, respectively.
① strip() demonstration.
s = 'abc '
s2 = s.strip()
print(s2)
#abc
s = ' hello world '
s2 = s.strip()
print(s2)
#hello world
s = 'aaaassddf'
s2 = s.strip('a')
print(s2)
#ssddf
s3 = s.strip('af')
print(s3)
#ssdd
s = 'aabbccddeeffg'
s2 = s.strip('af')
print(s2)
#bbccddeeffg  (the character f is not at either end of the string, so it is not removed)
s3 = s.strip('aefg')
print(s3)
#bbccdd
② rstrip() demonstration.
s = 'aaaaddfaaa'
s2 = s.rstrip('a')
print(s2)
#aaaaddf
③ lstrip() demonstration.
s = 'aaaaddfaaa'
s2 = s.lstrip('a')
print(s2)
#ddfaaa
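One thing worth spelling out about the demonstrations above: the argument to strip(), rstrip(), and lstrip() is treated as a set of characters, not as a prefix or suffix. The following is a minimal sketch of the difference. The file name and the helper remove_suffix are hypothetical and introduced only for illustration.

```python
def remove_suffix(text, suffix):
    """Remove suffix once if text ends with it; otherwise return text unchanged."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text

name = 'report.txt'  # hypothetical file name

# rstrip() removes any trailing run of the characters '.', 't', 'x',
# so it also eats the final 't' of 'report'.
stripped = name.rstrip('.txt')
print(stripped)   # repor

# Removing the exact suffix keeps the stem intact.
trimmed = remove_suffix(name, '.txt')
print(trimmed)    # report
```

On Python 3.9 and later, the built-in str.removesuffix() and str.removeprefix() should do the same job as the helper.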
2. Deleting a single character at a fixed position: slicing + concatenation
s = 'abc:123'
s1 = s[:3] + s[4:]
print(s1)
#abc123
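The same slicing idea can be wrapped in a small function so the position does not have to be hard-coded. This is only a sketch; the name delete_at is hypothetical.

```python
def delete_at(text, index):
    """Return text with the character at position index removed."""
    if not 0 <= index < len(text):
        raise IndexError('index out of range')
    return text[:index] + text[index + 1:]

s = 'abc:123'
print(delete_at(s, 3))             # abc123
# The position can also be looked up instead of counted by hand.
print(delete_at(s, s.index(':')))  # abc123
```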
3. Deleting characters at any position, or several different characters at once: replace(), re.sub()
s = 'abc 123 xyz'
print(s.replace(' ', ''))
#abc123xyz
import re
s = 'abc 123 xyz '
print(re.sub('[ ]', '', s))
#abc123xyz
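To make the difference between the two calls more visible, here is a small sketch with a hypothetical sample string that mixes tabs, spaces, and line endings. replace() deletes one fixed substring, while a character class in re.sub() deletes any of several characters in one pass.

```python
import re

s = '\tabc\t123 xyz\r\n'  # hypothetical sample with mixed whitespace

# replace() deletes every occurrence of one fixed substring.
no_tabs = s.replace('\t', '')
print(repr(no_tabs))   # 'abc123 xyz\r\n'

# A character class deletes any of the listed characters wherever they occur.
no_ws = re.sub(r'[ \t\r\n]', '', s)
print(no_ws)           # abc123xyz
```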
4. Replacing or deleting several different characters at once: translate()
The translate() method converts the characters of a string according to the table given by the table parameter. For bytes and bytearray the table must contain 256 bytes, and the characters to be filtered out are passed in the delete parameter; for str the table is a mapping, usually built with str.maketrans().
Syntax of the translate() method:
str.translate(table)
bytes.translate(table,delete)
bytearray.translate(table,delete)
① The str.translate(table) method.
I. Build the mapping with maketrans().
intab = 'aeiou'
outtab = '12345'
trantab = str.maketrans(intab, outtab)
print(str.maketrans('aeiou', '12345'))
#{97: 49, 101: 50, 105: 51, 111: 52, 117: 53}
II. Apply the mapping to a string.
s = 'this is string example....wow!'
print(s.translate(trantab))
#th3s 3s str3ng 2x1mpl2....w4w!
② The bytes.translate(table, delete) method.
bytes_test = bytes.maketrans(b'run', b'RUN')
bytes_p = b'ruoon'
print(bytes_p.translate(bytes_test, b'o'))
#b'RUN'
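Since this section is about deleting several characters at once, it may help to see that str.translate() can delete as well: str.maketrans() accepts an optional third argument listing the characters to remove. A minimal sketch, with sample strings chosen only for illustration:

```python
# Third argument of str.maketrans(): characters to delete (mapped to None).
delete_vowels = str.maketrans('', '', 'aeiou')
text = 'this is string example....wow!'
no_vowels = text.translate(delete_vowels)
print(no_vowels)   # ths s strng xmpl....ww!

# Replacement and deletion can be combined in one table.
table = str.maketrans('ab', 'AB', '.!')
combined = 'abc...abc!'.translate(table)
print(combined)    # ABcABc
```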
5. What do the prefixes u, r, and b in front of a Python string mean?
u: a Unicode string.
r: a raw string, in which backslashes are not treated as escape characters.
b: bytes, a data type whose unit is the byte.
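A short sketch of what the three prefixes do in Python 3. The variable names are arbitrary.

```python
# Normal string: the sequence \n is a single newline character.
normal = 'a\nb'
# Raw string: the backslash and the n stay as two separate characters.
raw = r'a\nb'
print(len(normal), len(raw))   # 3 4

# In Python 3 the u prefix is accepted but changes nothing.
print(u'abc' == 'abc')         # True

# The b prefix creates a bytes object; indexing it yields integers.
data = b'abc'
print(type(data).__name__, data[0])   # bytes 97
```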
Description.
① The default str in Python 3 corresponds to unicode in Python 2, and bytes in Python 3 corresponds to str in Python 2; the b'' prefix denotes bytes.
② From Python 3 on, the str and bytes types are completely separate. A str is processed in units of characters, while bytes is processed in units of bytes.
The type conversion between str and bytes is as follows.
str to bytes: bytes(s, encoding='utf-8')
bytes to str: str(b, encoding='utf-8')
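As a round-trip sketch of the two conversions, using a hypothetical sample string: the encode() and decode() methods are equivalent spellings and are the ones seen more often in practice.

```python
s = '中文 abc'   # hypothetical sample text

b1 = bytes(s, encoding='utf-8')
b2 = s.encode('utf-8')        # equivalent to the bytes(...) call
print(b1 == b2)               # True

s1 = str(b1, encoding='utf-8')
s2 = b1.decode('utf-8')       # equivalent to the str(...) call
print(s1 == s2 == s)          # True

# len() counts characters for str and bytes for bytes.
print(len(s), len(b1))        # 6 10
```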
What is a character?
A character is any letter, digit, word, or symbol used in a computer. The storage space a character needs differs between encodings. In ASCII, one English character takes 1 byte. In GB 2312 or GBK, one Chinese character takes 2 bytes. In UTF-8, an English character takes 1 byte and a Chinese character takes 3-4 bytes. In UTF-16, an English or Chinese character takes 2 bytes (characters outside the Basic Multilingual Plane take 4). In UTF-32, every character takes 4 bytes.
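These sizes can be checked by encoding a single character and counting the bytes. A minimal sketch follows; the helper name byte_len is hypothetical, and the little-endian codec variants are used because the plain 'utf-16' and 'utf-32' codecs prepend a byte order mark that would inflate the count.

```python
def byte_len(ch, encoding):
    """Number of bytes needed to store ch in the given encoding."""
    return len(ch.encode(encoding))

print(byte_len('A', 'ascii'))       # 1
print(byte_len('中', 'gbk'))         # 2
print(byte_len('A', 'utf-8'))       # 1
print(byte_len('中', 'utf-8'))       # 3
print(byte_len('中', 'utf-16-le'))   # 2
print(byte_len('A', 'utf-32-le'))   # 4
```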
Regular expressions in practice. The data below was copied directly from a web table; it is just an arbitrary excerpt.
Copy data
Analysis of the data format: number + school name + real number + star rating + integer
The basic regular expression: (\d+)+\s+([\u4E00-\u9FA5])+\s+(\d+(\.\d)?)+\s+(\d+★)+\s+(\d)
From here, whichever fields you want can be picked out by adjusting this pattern directly.
original code
Print results
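Since the original data and code are not reproduced here, the following is only a sketch of how a pattern for this format could be applied. The two rows and the field names (rank, school, score, stars, count) are hypothetical, and the pattern is a tidied variant of the one given above, not the original code: each quantifier sits inside its capture group, and the decimal part is optional.

```python
import re

# Hypothetical rows in the format:
# number, school name, real number, star rating, integer.
text = """1 示例大学 98.5 8★ 35
2 样本学院 76 5★ 12"""

# Tidied variant of the pattern (an assumption, not the original code).
pattern = re.compile(
    r'(\d+)\s+([\u4E00-\u9FA5]+)\s+(\d+(?:\.\d+)?)\s+(\d+★)\s+(\d+)'
)

# findall() returns one tuple of captured groups per matching row.
rows = pattern.findall(text)
for rank, school, score, stars, count in rows:
    print(rank, school, float(score), stars, int(count))
```

Placing the + inside the parentheses matters: with ([\u4E00-\u9FA5])+ the group would capture only the last character of the school name rather than the whole name.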
Overall impression: regular expressions are very friendly for processing strings or text with a well-defined format. In other words, if we define the file output ourselves, a well-defined format is no problem. Of course, strings with a less standardized format can still be handled with regular expressions too.
This is the first simple regular expression I've forced myself to write. It is not necessarily meaningful or worth learning from, but at least for general string processing or lookups it shouldn't be much of a problem. My mastery of regular expressions is probably around 40%, I think; as for the rest, such as the constructs with brackets, we'll talk about them when we need to.