Python String Processing

Background: project applications based on regular expressions.

1. Deleting whitespace or specific characters from the ends of a Python string

Description. strip(), rstrip(), and lstrip() remove consecutive whitespace characters, or consecutive characters from a given set, from both ends, the right end, and the left end of a string, respectively.

strip() Demonstration.

s = 'abc '

s2 = s.strip()

print(s2)

#abc

s = ' hello world '

s2 = s.strip()

print(s2)

#hello world

s = 'aaaassddf'

s2 = s.strip('a')

print(s2)

#ssddf

s3 = s.strip('af')

print(s3)

#ssdd

s = 'aabbccddeeffg'

s2 = s.strip('af')

print(s2)

#bbccddeeffg #f is not at either end of the string, so it is not deleted

s3 = s.strip('aefg')

print(s3)

#bbccdd

rstrip() Demonstration.

s = 'aaaaddfaaa'

s2 = s.rstrip('a')

print(s2)

#aaaaddf

lstrip() Demonstration.

s = 'aaaaddfaaa'

s2 = s.lstrip('a')

print(s2)

#ddfaaa
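One point from these demonstrations is easy to miss: the argument to strip() is treated as a set of characters, not as a prefix or suffix. The sketch below illustrates this; the sample strings are made up for the example.

```python
# The argument is a set of characters, so its order does not matter.
s = 'aabbccddeeffg'
same = s.strip('aefg') == s.strip('gfea')
print(same)                    # True

# With no argument, every kind of leading/trailing whitespace is removed,
# including tabs and newlines.
padded = '\t  hello world \n'
cleaned = padded.strip()
print(repr(cleaned))           # 'hello world'

# Pitfall: 'w.com' is NOT read as the affixes "www." and ".com".
# Every character of this string is in the set, so nothing is left.
url = 'www.cow.com'
stripped_url = url.strip('w.com')
print(repr(stripped_url))      # ''
```

When a literal prefix or suffix has to be removed, slicing (section 2) or a regular expression is usually the safer tool.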



2. Deleting a single character at a fixed position: slicing + concatenation

s = 'abc:123'

s1 = s[:3] + s[4:]

print(s1)

#abc123
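The same slicing idea can be wrapped in a small helper. This is only a sketch: delete_at is a hypothetical name introduced here, not a built-in.

```python
def delete_at(s, index):
    """Return a copy of s without the character at position index."""
    if not 0 <= index < len(s):
        raise IndexError('index out of range')
    # Everything before index, plus everything after it.
    return s[:index] + s[index + 1:]


result = delete_at('abc:123', 3)
print(result)   # abc123
```

Strings are immutable, so the helper always builds a new string instead of modifying the original.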



3. Deleting a character at any position, or several different characters at once: replace(), re.sub()

s = 'abc 123 xyz'

print(s.replace(' ', ''))

#abc123xyz

import re

s = 'abc 123 xyz '

print(re.sub('[ ]', '', s))

#abc123xyz
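To delete several different characters in one pass, re.sub() can be given a character class that lists all of them. A minimal sketch, assuming the unwanted characters are tab, carriage return, and newline (the sample string is made up):

```python
import re

s = 'abc\t123\r\nxyz'

# [\t\r\n] matches any single one of the three characters;
# replacing each match with '' deletes it.
cleaned = re.sub(r'[\t\r\n]', '', s)
print(cleaned)    # abc123xyz

# The same result with chained replace() calls, one call per character.
chained = s.replace('\t', '').replace('\r', '').replace('\n', '')
print(chained)    # abc123xyz
```

replace() needs one call per character, while a single character class keeps growing gracefully as more characters are added.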

4. Simultaneous deletion of multiple different characters: translate()

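The translate() method converts the characters of a string according to the table given by the table parameter. For bytes.translate(), the table contains 256 entries and the bytes to be filtered out are passed in the delete parameter; str.translate() takes only the table, which is typically built with str.maketrans().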

translate() method syntax.

① The str.translate(table) method.

I. Build the mapping with maketrans().

intab = 'aeiou'

outtab = '12345'

trantab = str.maketrans(intab, outtab)



II. Apply the mapping to a string.

s = 'this is string example....wow!'

print(s.translate(trantab))

#th3s 3s str3ng 2x1mpl2....w4w!
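For deletion specifically, str.maketrans() accepts an optional third argument: every character listed there is mapped to None, so translate() removes it. A short sketch with made-up sample strings:

```python
# Empty first and second arguments: nothing is remapped,
# but every character in the third argument is deleted.
delete_table = str.maketrans('', '', ',;!')
text = 'a,b;c!d'
deleted = text.translate(delete_table)
print(deleted)     # abcd

# Mapping and deletion can be combined in one table.
combo_table = str.maketrans('abc', 'xyz', '-')
combined = 'a-b-c'.translate(combo_table)
print(combined)    # xyz
```

The table is built once and can be reused on any number of strings, which makes this approach convenient when the same set of characters has to be removed repeatedly.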

② The bytes.translate(table, delete) method.

bytes_test = bytes.maketrans(b'run', b'RUN')

bytes_p = b'ruoon'

print(bytes_p.translate(bytes_test, b'o'))

#b'RUN'
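When bytes only need to be deleted and nothing has to be remapped, bytes.translate() also accepts None as the table. A minimal sketch; the sample data is invented for the example:

```python
data = b'r-u-n o o b'

# With table=None no byte is remapped; the bytes listed in the
# second argument (here '-' and ' ') are simply removed.
compact = data.translate(None, b'- ')
print(compact)    # b'runoob'
```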



5. What do the prefixes u, r, and b in front of a Python string mean?

u: indicates a Unicode string.

r: indicates a raw string, in which backslashes are not treated as escape characters.

b: indicates bytes, a data type whose unit is the byte.


① The default str in Python 3 corresponds to unicode in Python 2, and bytes corresponds to Python 2's str; the b'' prefix denotes bytes.

② From Python 3 on, the string and bytes types are completely separate. Strings are processed in units of characters, while bytes are processed in units of bytes.
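The three prefixes can be compared directly in a Python 3 interpreter. The literals below are just illustrative samples:

```python
normal = '\n'     # one character: a newline
raw = r'\n'       # two characters: a backslash followed by n
print(len(normal), len(raw))         # 1 2

# In Python 3 the u prefix is accepted but changes nothing.
print(u'abc' == 'abc')               # True

# b'' creates a bytes object; indexing it yields integers, not characters.
data = b'abc'
print(type(data).__name__, data[0])  # bytes 97

# str counts characters, bytes counts bytes.
ch = '中'
print(len(ch), len(ch.encode('utf-8')))   # 1 3
```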

The type conversion between str and bytes is as follows.

str to bytes: bytes(s, encoding='utf-8')

bytes to str: str(b, encoding='utf-8')
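The two conversions are equivalent to the more commonly seen encode() and decode() methods. A small round-trip sketch, with a made-up sample string:

```python
s = '中文 abc'

# str -> bytes: both spellings give the same result.
b = s.encode('utf-8')
assert b == bytes(s, encoding='utf-8')
print(len(s), len(b))    # 6 10

# bytes -> str: again two equivalent spellings.
back = b.decode('utf-8')
assert back == str(b, encoding='utf-8') == s

# Decoding with the wrong codec fails instead of silently guessing.
try:
    b.decode('ascii')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)     # True
```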

What is a character?

A character is any letter, digit, word, or symbol used in a computer. The storage space a character requires differs from one encoding to another. In ASCII, one English character needs 1 byte. In GB 2312 or GBK, one Chinese character needs 2 bytes. In UTF-8, one English character needs 1 byte and one Chinese character needs 3 to 4 bytes. In UTF-16, one English or Chinese character usually needs 2 bytes (characters outside the Basic Multilingual Plane need 4). In UTF-32, every character needs 4 bytes.
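These sizes can be checked by encoding a character and counting the resulting bytes. In this sketch the sample characters are arbitrary, and the "-le" codec variants are used so that the byte-order mark Python prepends for plain 'utf-16' and 'utf-32' is not counted:

```python
samples = [
    ('a', 'ascii'),
    ('中', 'gbk'),
    ('a', 'utf-8'),
    ('中', 'utf-8'),
    ('a', 'utf-16-le'),
    ('中', 'utf-16-le'),
    ('\U0001F600', 'utf-16-le'),   # a character outside the Basic Multilingual Plane
    ('a', 'utf-32-le'),
    ('中', 'utf-32-le'),
]

# Map (character, encoding) to the number of bytes needed.
sizes = {(ch, enc): len(ch.encode(enc)) for ch, enc in samples}

for (ch, enc), n in sizes.items():
    print(f'{ch!r} in {enc}: {n} byte(s)')
```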

Regular expression hands-on. The data below was copied directly from a web table; feel free to trim it as needed.

Copy data

Format analysis: number + school name + real number + star rating + integer

The basic regular expression: (\d+)\s+([\u4E00-\u9FA5]+)\s+(\d+(\.\d+)?)\s+(\d+★)\s+(\d+)
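A sketch of how a pattern of this shape might be applied with the re module. The sample row and the field names are made up for illustration, since the copied data itself is not reproduced here, and the decimal part uses a non-capturing group (?:...) so that exactly five fields come back:

```python
import re

# number, school name (Chinese characters), real number, star rating, integer
ROW_PATTERN = re.compile(
    r'(\d+)\s+([\u4E00-\u9FA5]+)\s+(\d+(?:\.\d+)?)\s+(\d+★)\s+(\d+)'
)

# Hypothetical row in the "number + school name + real number + stars + integer" layout.
row = '1 示例大学 95.3 8★ 12'

match = ROW_PATTERN.search(row)
if match:
    number, school, score, stars, count = match.groups()
    print(number, school, score, stars, count)
else:
    print('row does not match the expected format')
```

Every captured field is a string; converting with int() or float() is a separate step once the row has matched.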

To extract only a few of these items, modify the pattern above directly.

original code

Print results

Overall impression: regular expressions are very friendly for processing strings or text with a well-defined format. In other words, if we define the output format of the file ourselves, the format is no obstacle. Of course, strings with a less standardized format can still be handled with regular expressions as well.

This is the first simple regular expression I have forced myself to write. It is not necessarily meaningful or worth studying, but for general string processing or lookups it should not cause much trouble. I think my mastery of regular expressions is around 40%; the rest, such as the constructs with brackets, can wait until they are needed.

