Getting Started with Regular Expressions

Regular expressions

What: a rule (pattern) for matching strings.

Where: 1. form validation for login and registration, 2. web crawlers, 3. automation development.

Why: a rule lets you 1. check whether a particular string matches it, and 2. find the parts of a large body of text that match it.
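As a rough illustration of these two uses, here is a minimal sketch with Python's built-in `re` module. The username rule (`USERNAME_RULE`, 3 to 16 letters, digits, or underscores) and the helper `is_valid_username` are hypothetical, chosen only for this example. `re.fullmatch` checks a whole string against a rule, while `re.findall` collects every match in a longer text.

```python
import re

# Hypothetical rule for illustration: 3 to 16 letters, digits, or underscores
USERNAME_RULE = r"\w{3,16}"


def is_valid_username(name: str) -> bool:
    # re.fullmatch succeeds only if the *whole* string matches the rule
    return re.fullmatch(USERNAME_RULE, name) is not None


# Use 1: check whether a particular string matches the rule
print(is_valid_username("alice_01"))  # True
print(is_valid_username("a!"))        # False

# Use 2: find everything that matches a rule in a larger text
text = "Order 12 shipped on day 7; order 305 is pending."
print(re.findall(r"\d+", text))       # ['12', '7', '305']
```

The first use corresponds to form validation, the second to pulling data out of a page in a crawler.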

How:

The syntax of regular expressions covers: 1. metacharacters, 2. quantifiers, 3. special uses and phenomena.

1 Metacharacters

[...]----- matches any one character in the character set

[^...]---- matches any one character not in the character set

\d------- matches a digit

\w------- matches a letter, digit, or underscore

\s------- matches any whitespace character

\D------- matches any non-digit

\W------- matches any character other than a letter, digit, or underscore

\S------- matches any non-whitespace character

\n------- matches a newline (line feed)

\t------- matches a tab

^-------- matches the beginning of the string

$-------- matches the end of the string

a|b------ matches a or b

() ------ matches the expression in parentheses and also represents a group

.-------- matches any character other than a line break

\b------- matches a word boundary
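A quick way to try the metacharacters above is with Python's `re.findall` and `re.search`. The sample strings are made up for illustration, and patterns are written as raw strings (r"...") so that backslashes reach the regex engine unchanged, as the escape-character section explains.

```python
import re

# Character sets and shorthand classes
print(re.findall(r"[aeiou]", "regex"))        # ['e', 'e']
print(re.findall(r"[^aeiou]", "regex"))       # ['r', 'g', 'x']
print(re.findall(r"\d", "a1b22"))             # ['1', '2', '2']
print(re.findall(r"\w+", "hi_there, 42!"))    # ['hi_there', '42']
print(re.findall(r"\s", "a b\tc\nd"))         # [' ', '\t', '\n']
print(re.findall(r"\D", "a1b2"))              # ['a', 'b']
print(re.findall(r"\n|\t", "a\tb\nc"))        # ['\t', '\n']

# Anchors: ^ and $ tie the match to the start and end of the string
print(bool(re.search(r"^abc$", "abc")))       # True
print(bool(re.search(r"^abc$", "xabc")))      # False

# Alternation and grouping
print(re.findall(r"cat|dog", "cat, dog, cow"))   # ['cat', 'dog']
m = re.search(r"(\d+)-(\d+)", "range 10-20")
print(m.group(1), m.group(2))                    # 10 20

# . matches any character except a newline, so "a\nc" is skipped
print(re.findall(r"a.c", "abc a\nc a-c"))        # ['abc', 'a-c']

# \b matches a word boundary, so the "cat" inside "concatenate" is skipped
print(re.findall(r"\bcat\b", "cat concatenate cat."))  # ['cat', 'cat']
```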

2 Quantifiers

*-------- repeated zero or more times

+ -------- repeated one or more times

?-------- repeated zero times or once

{n}------ repeated exactly n times

{n,}----- repeated n or more times

{n,m}---- repeated n to m times
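The quantifiers can be compared side by side by applying each one to the same hypothetical string. This sketch assumes Python's `re` module; in each pattern the quantifier applies only to the single character b just before it.

```python
import re

text = "a ab abb abbb"

# * : zero or more b's after an a
print(re.findall(r"ab*", text))      # ['a', 'ab', 'abb', 'abbb']
# + : one or more b's after an a
print(re.findall(r"ab+", text))      # ['ab', 'abb', 'abbb']
# ? : zero or one b after an a
print(re.findall(r"ab?", text))      # ['a', 'ab', 'ab', 'ab']
# {n} : exactly 2 b's
print(re.findall(r"ab{2}", text))    # ['abb', 'abb']
# {n,} : 2 or more b's
print(re.findall(r"ab{2,}", text))   # ['abb', 'abbb']
# {n,m} : 1 to 2 b's
print(re.findall(r"ab{1,2}", text))  # ['ab', 'abb', 'abb']
```

Note that ab{2} also matches the first three characters of "abbb": a quantifier only says how many times to repeat, not that the match must end at a word boundary.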

Escape characters

In regular expressions, many metacharacters have a special meaning, such as \d and \s. If you want to match the literal text "\d" rather than a digit, you have to escape the "\" by writing it as '\\'.

In Python, both the regular expression and the text being matched are strings, and "\" also has a special meaning inside a string, so it must be escaped again. To match a literal "\d" once, the text is written as '\\d' in a string, and the pattern then has to be written as '\\\\d', which is too much trouble. This is where raw strings such as r'\d' come in: with a raw string the pattern is simply r'\\d'.
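The sketch below, assuming Python's `re` module and a made-up sample string, shows that the doubly escaped pattern '\\\\d' and the raw-string pattern r'\\d' are the same regex, and that both match a literal backslash followed by d rather than a digit.

```python
import re

# The text holds a literal backslash followed by d: 4 characters a, \, d, 1
text = "a\\d1"

# Without a raw string, the backslash is doubled twice:
# "\\\\d" in Python source becomes the 3-character regex \\d
plain = re.findall("\\\\d", text)

# With a raw string, the regex is written exactly as the engine sees it
raw = re.findall(r"\\d", text)

print(plain, raw)               # ['\\d'] ['\\d']

# For the ordinary digit class, r"\d" is the usual spelling
print(re.findall(r"\d", text))  # ['1']
```

The list output shows '\\d' because Python prints the repr of each string; the matched text itself is the two characters \ and d.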

Greedy matching

Greedy matching: when a match is possible, match the longest possible string. Quantifiers use greedy matching by default.
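A small sketch of the default greedy behavior, assuming Python's `re` module; the sample strings are hypothetical.

```python
import re

# \d+ could legally stop after one digit, but greedy matching takes them all
digits = re.search(r"\d+", "12345abc").group()
print(digits)  # 12345

# .* also grabs as much as it can, so this match runs from the first "<"
# all the way to the last ">" in the string
html = "<b>bold</b> and <i>italic</i>"
tags = re.findall(r"<.*>", html)
print(tags)    # ['<b>bold</b> and <i>italic</i>']
```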

3 Special uses and phenomena

A few commonly used non-greedy (lazy) quantifiers

*?------- repeat any number of times, but as few times as possible

+?------- repeat one or more times, but as few times as possible

??------- repeat zero or one time, but as few times as possible

{n,m}?--- repeat n to m times, but as few times as possible

{n,}?---- repeat n or more times, but as few times as possible
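To see the difference, this sketch (assuming Python's `re` module and a hypothetical string "abbbb") runs each greedy quantifier next to its non-greedy version. `first_match` is a helper introduced only for this example.

```python
import re

text = "abbbb"


def first_match(pattern: str) -> str:
    """Return the first match of pattern in text (every pattern here matches)."""
    return re.search(pattern, text).group()


# Each greedy pattern paired with its non-greedy (lazy) counterpart
pairs = [
    (r"ab*", r"ab*?"),          # abbbb vs a
    (r"ab+", r"ab+?"),          # abbbb vs ab
    (r"ab?", r"ab??"),          # ab    vs a
    (r"ab{2,4}", r"ab{2,4}?"),  # abbbb vs abb
    (r"ab{3,}", r"ab{3,}?"),    # abbbb vs abbb
]

for greedy, lazy in pairs:
    print(greedy, "->", first_match(greedy), "|", lazy, "->", first_match(lazy))
```

With nothing after the quantifier, the lazy version stops at the minimum count it is allowed.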

Usage of .*?

. matches any character (except a newline)

* repeats it zero or more times, with no upper limit

? makes the repetition non-greedy

Combined, .*? takes as few arbitrary characters as possible. It is rarely written on its own like this; it is mostly used in the form .*?x, which means: take characters of any length until an x occurs.
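A typical use of .*?x, sketched with Python's `re` module on hypothetical inputs: with x = ">", each HTML-like tag is matched on its own instead of one long greedy match, and with a group around .*? only the text up to the next ";" is captured. The variable names are illustrative.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# .*?> takes as few characters as possible until the next ">" appears,
# so each tag becomes a separate match
tags = re.findall(r"<.*?>", html)
print(tags)    # ['<b>', '</b>', '<i>', '</i>']

# Same idea with a group: capture whatever sits between "=" and the next ";"
config = "a=1;b=22;c=333;"
values = re.findall(r"=(.*?);", config)
print(values)  # ['1', '22', '333']
```

When the pattern contains one group, `re.findall` returns the group's contents rather than the whole match, which is why only the values appear.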