Getting Started with Regular Expressions
regular expression
what: a rule for matching strings
where: common uses in programming: 1. form validation for login and registration 2. web crawlers 3. automation development
why: a rule lets you 1. check whether a given string matches it 2. find the parts that match it within a large body of text
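A minimal Python sketch of the two uses above, using the standard re module. The username rule and the 11-digit phone format are made-up examples for illustration, not rules from any particular site.

```python
import re

# Hypothetical rule: a username of 3 to 16 letters, digits, or underscores.
USERNAME_RULE = r'\w{3,16}'

def is_valid_username(s: str) -> bool:
    # Use 1: check whether the WHOLE string matches the rule.
    return re.fullmatch(USERNAME_RULE, s) is not None

def find_phone_numbers(text: str) -> list:
    # Use 2: find every substring that matches a rule inside a larger text.
    # Here the rule is "11 consecutive digits", an assumed phone format.
    return re.findall(r'\d{11}', text)

print(is_valid_username('alice_01'))  # True
print(is_valid_username('a!'))        # False
print(find_phone_numbers('call 13800138000 or 13912345678'))
# ['13800138000', '13912345678']
```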
how:
Syntax of regular expressions: 1. metacharacters 2. quantifiers 3. special uses and phenomena
1 Metacharacters
[...] ------ matches any character in the character set
[^...] ----- matches any character not in the character set
\d --------- matches a digit
\w --------- matches a letter, digit, or underscore
\s --------- matches any whitespace character
\D --------- matches a non-digit
\W --------- matches any character that is not a letter, digit, or underscore
\S --------- matches a non-whitespace character
\n --------- matches a newline (line feed) character
\t --------- matches a tab
^ ---------- matches the beginning of the string
$ ---------- matches the end of the string
a|b -------- matches a or b
() --------- matches the expression in parentheses and also forms a group
. ---------- matches any character except a newline
\b --------- matches a word boundary
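The metacharacters above can be tried out with Python's re.findall and re.search. Patterns are written as raw strings (r'...') so the backslashes reach the regex engine unchanged; the sample strings are arbitrary.

```python
import re

# Character sets
print(re.findall(r'[aeiou]', 'regex'))     # ['e', 'e']
print(re.findall(r'[^aeiou]', 'regex'))    # ['r', 'g', 'x']

# Shorthand classes (uppercase = complement of the lowercase one)
print(re.findall(r'\d', 'a1b2'))           # ['1', '2']
print(re.findall(r'\D', 'a1b2'))           # ['a', 'b']
print(re.findall(r'\w', 'a_1!'))           # ['a', '_', '1']
print(re.findall(r'\W', 'a_1!'))           # ['!']
print(re.findall(r'\s', 'a b\tc\nd'))      # [' ', '\t', '\n']
print(re.findall(r'\S', 'a b'))            # ['a', 'b']

# Newline and tab
print(re.findall(r'\n|\t', 'x\ty\nz'))     # ['\t', '\n']

# Anchors: ^ is the start of the string, $ is the end
print(bool(re.search(r'^abc', 'abcdef')))  # True
print(bool(re.search(r'^abc', 'xabc')))    # False
print(bool(re.search(r'def$', 'abcdef')))  # True

# Alternation and grouping
print(re.findall(r'cat|dog', 'cat, dog, bird'))  # ['cat', 'dog']
m = re.search(r'(\d\d\d\d)-(\d\d)', 'date: 2023-05')
print(m.group(1), m.group(2))              # 2023 05

# '.' matches anything except a newline
print(re.findall(r'a.c', 'abc a\nc'))      # ['abc']

# \b is a word boundary, so the 'cat' inside 'concat' is skipped
print(re.findall(r'\bcat\b', 'cat concat cat.'))  # ['cat', 'cat']
```

In Python 3, \d, \w and \s on str patterns also match non-ASCII digits, letters and whitespace; pass the re.ASCII flag if only ASCII characters should count.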
2 Quantifiers
*-------- repeated zero or more times
+ -------- repeated one or more times
? -------- repeated zero times or once
{n} ------ repeated exactly n times
{n,} ----- repeated n or more times
{n,m} ---- repeated n to m times
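A short sketch of each quantifier with re.findall; the strings are arbitrary examples. The last line also shows that {2,3} takes three digits whenever it can, which ties into greedy matching.

```python
import re

print(re.findall(r'ab*', 'a ab abbb'))          # ['a', 'ab', 'abbb']  zero or more b
print(re.findall(r'ab+', 'a ab abbb'))          # ['ab', 'abbb']       one or more b
print(re.findall(r'colou?r', 'color colour'))   # ['color', 'colour']  optional u
print(re.findall(r'\d{3}', '12345678'))         # ['123', '456']       exactly 3 digits
print(re.findall(r'\d{3,}', '12 123 12345'))    # ['123', '12345']     3 or more digits
print(re.findall(r'\d{2,3}', '1 12 1234'))      # ['12', '123']        2 to 3 digits
```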
Escape characters
In a regular expression, many metacharacters have a special meaning, for example \d and \s. If you want to match an ordinary "\d" (a backslash followed by the letter d) instead of a digit, you need to escape the '\' by writing it as '\\'.
In Python, both the regex pattern and the text to be matched are strings, and inside a Python string '\' also has a special meaning, so it has to be escaped as well. To match a literal "\d" once, the text is written as '\\d' in a Python string, and the pattern then has to be written as '\\\\d', which is too much trouble. This is where raw strings come in: with the r prefix, as in r'\d', Python leaves backslashes alone, so the pattern can simply be written as r'\\d'.
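The escaping problem above, in Python. The sample text is made up; it contains a real backslash followed by the letter d.

```python
import re

text = r'path\data 42'   # a literal backslash, then 'd'

# Without a raw string, each regex backslash must be escaped again for Python.
p1 = '\\\\d'             # string value is \\d -> regex: literal '\' then 'd'
# With a raw string, Python leaves the backslashes alone.
p2 = r'\\d'              # the same three characters, much easier to read

print(p1 == p2)                  # True
print(len(p2))                   # 3
print(re.findall(p2, text))      # ['\\d']  (repr of backslash + 'd')
print(re.findall(r'\d', text))   # ['4', '2']  unescaped \d still means "a digit"
```

Since Python 3.12, an unrecognized escape such as '\d' in a normal string literal also triggers a SyntaxWarning, which is another reason to write patterns as raw strings.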
Greedy matching
Greedy matching: as long as the overall match still succeeds, a quantifier matches the longest possible string. Quantifiers are greedy by default.
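A sketch of greedy behavior, assuming a small made-up HTML snippet as input.

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Greedy: .* grabs as much as it can, so the match runs to the LAST '>'.
print(re.findall(r'<.*>', html))   # ['<b>bold</b> and <i>italic</i>']

# A greedy quantifier still gives characters back when the rest of the
# pattern needs them: .* backs off until the final 'b' can match.
print(re.search(r'a.*b', 'a1b2b3').group())  # a1b2b
```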
3 Special uses and phenomena
A few commonly used non-greedy (lazy) quantifiers
*? ------- repeat any number of times, but as few as possible
+? ------- repeat one or more times, but as few as possible
?? ------- repeat 0 or 1 time, but as few as possible
{n,m}? --- repeat n to m times, but as few as possible
{n,}? ---- repeat n or more times, but as few as possible
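Each non-greedy quantifier compared with its greedy counterpart on the same string. re.match is used so that matching always starts at the beginning; the string 'aaaa' is an arbitrary example.

```python
import re

s = 'aaaa'

print(repr(re.match(r'a*', s).group()))       # 'aaaa'  greedy
print(repr(re.match(r'a*?', s).group()))      # ''      zero repetitions suffice
print(repr(re.match(r'a+?', s).group()))      # 'a'     one is the minimum
print(repr(re.match(r'a??', s).group()))      # ''      zero is the minimum
print(repr(re.match(r'a{2,3}', s).group()))   # 'aaa'   greedy
print(repr(re.match(r'a{2,3}?', s).group()))  # 'aa'    lazy
print(repr(re.match(r'a{2,}?', s).group()))   # 'aa'    lazy
```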
Usage of .*?
. matches any character (except a newline)
* repeats it zero or more times, with no upper limit
? makes the repetition non-greedy
Combined, .*? takes as few arbitrary characters as possible. It is rarely written on its own; it is mostly used in the form
.*?x, which takes characters of any length until the first x appears.
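The .*?x pattern in practice. The HTML-like strings are made-up inputs; the last line shows the typical crawler-style use of a lazy group to pull out text between tags, a sketch rather than a robust HTML parser.

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# '<.*?>': from '<', take as few characters as possible until the first '>'
print(re.findall(r'<.*?>', html))   # ['<b>', '</b>', '<i>', '</i>']

# .*?x stops at the first 'x'; the greedy .*x runs to the last one
print(re.search(r'.*?x', 'abxcdx').group())   # abx
print(re.search(r'.*x', 'abxcdx').group())    # abxcdx

# Capture the text between tags with a lazy group
print(re.findall(r'<b>(.*?)</b>', '<b>one</b>, <b>two</b>'))  # ['one', 'two']
```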