on 07-Jan-2009 07:21
Welcome to this edition of the PowerShell ABC's, a series of 26 posts, each detailing a component of the PowerShell scripting language, one letter at a time. Today's letter is "M", and I'm going to touch on the power of regular expressions and pattern Matching.
Along with the basic comparison operators (-[ci]eq, -[ci]ne, -[ci]gt, -[ci]ge, -[ci]lt, -[ci]le, -[ci]contains, and -[ci]notcontains), PowerShell has a number of operators allowing one to perform pattern matching comparisons.
These operators work on strings, matching and manipulating them using wildcard and regular expression patterns.
Wildcard Patterns
The wildcard pattern matching operators are listed in the following table:
Operator | Description | Example | Result |
---|---|---|---|
-like, -clike, -ilike | Do a wildcard pattern match | "one" -like "o*" | $true |
-notlike, -cnotlike, -inotlike | Do a wildcard pattern match; true if the pattern doesn't match | "one" -notlike "o*" | $false |
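As a quick illustration of the operators in the table, here is a minimal sketch of the case-sensitive and case-insensitive variants. The sample strings and variable names are made up for this example; the c prefix makes the comparison case-sensitive, while the i prefix spells out the default case-insensitive behavior.

```powershell
# -like is case-insensitive by default; -ilike is the explicit form of the same behavior
$default   = "One" -like  "o*"    # True
$explicit  = "One" -ilike "o*"    # True

# -clike makes the comparison case-sensitive
$sensitive = "One" -clike "o*"    # False: "O" is not "o"

# -notlike inverts the result of -like
$negated   = "one" -notlike "o*"  # False

# With an array on the left side, the operator returns the matching elements
$filtered  = @("one", "two", "three") -like "t*"   # two, three
```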
You may be familiar with wildcard patterns if you have ever run "dir *.txt" or something similar on the command line. Wildcard patterns offer more than the asterisk "*": you can match any run of characters with the asterisk, any single character with the question mark, or any one character from a range or set with square brackets. These are illustrated in the following table of wildcard expressions:
Wildcard | Description | Example | Matches | Doesn't Match |
---|---|---|---|---|
* | Matches zero or more characters anywhere in the string | a* | a, aa, abc, ab | ba, babc |
? | Matches any single character | a?c | abc, aXc | a, ab |
[-] | Matches a sequential range of characters | a[b-d]c | abc, acc, adc | aac, aec, afc, abbc |
[...] | Matches any one character from a set of characters | a[bc]c | abc, acc | a, ab, Ac, adc |
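The rows of the table can be checked directly at the prompt. The following is a small sketch that tries one matching and one non-matching string for each wildcard; the variable names are only there for illustration.

```powershell
# "*" matches zero or more characters, but the pattern must cover the whole string
$star     = "abc"  -like "a*"        # True
$starNo   = "babc" -like "a*"        # False: the string does not start with "a"

# "?" matches exactly one character
$qmark    = "aXc"  -like "a?c"       # True
$qmarkNo  = "ab"   -like "a?c"       # False: too short

# "[b-d]" matches one character in the range b through d
$range    = "acc"  -like "a[b-d]c"   # True
$rangeNo  = "abbc" -like "a[b-d]c"   # False: the range matches only one character

# "[bc]" matches one character from the set
$set      = "acc"  -like "a[bc]c"    # True
$setNo    = "adc"  -like "a[bc]c"    # False: "d" is not in the set
```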
Wildcard patterns are fairly simple and somewhat limited in their capabilities, and this is where regular expressions come in.
Regular Expressions
Regular expressions are a superset of wildcard expressions: any pattern you can express with wildcards can also be expressed as a regular expression, just with a slightly different syntax. Instead of "*" to match any sequence of characters you use ".*", and instead of "?" to match any single character you use ".".
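One detail is worth seeing in practice when translating between the two syntaxes: a wildcard pattern has to cover the whole string, whereas a regular expression matches anywhere in the string unless it is anchored with "^" and "$". The sketch below uses made-up sample strings to show the difference.

```powershell
# The wildcard "a?c" and the anchored regex "^a.c$" accept the same strings
$wild       = "abc"  -like  "a?c"      # True
$regex      = "abc"  -match "^a.c$"    # True

# An unanchored regex finds "a.*" in the middle of the string
$unanchored = "babc" -match "a.*"      # True

# Anchoring the regex gives it the same whole-string behavior as the wildcard "a*"
$anchored   = "babc" -match "^a.*$"    # False
$wildWhole  = "babc" -like  "a*"       # False
```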
The syntax for regular expression patterns is extensive, so describing it completely is beyond the scope of this entry. PowerShell regular expressions are implemented with the .NET regular expression classes, so any documentation on the format and use of .NET regular expressions applies to PowerShell as well.
PowerShell provides the following operators for use with regular expressions:
Operator | Description | Example | Result |
---|---|---|---|
-match, -cmatch, -imatch | Do a pattern match using regular expressions | "Hello" -match "[jkl]" | $true |
-notmatch, -cnotmatch, -inotmatch | Do a regex pattern match; return true if the pattern doesn't match | "Hello" -notmatch "[jkl]" | $false |
-replace, -creplace, -ireplace | Do a regular expression substitution on the string on the left side and return the modified string | "Hello" -replace "ello", 'i' | "Hi" |
-replace, -creplace, -ireplace | Delete the portion of the string matching the regular expression | "abcde" -replace "bcd" | "ae" |
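The table entries can be tried out as follows. This is just a sketch with illustrative variable names; as with the wildcard operators, the c prefix makes the match case-sensitive and the default is case-insensitive.

```powershell
# -match succeeds if the pattern is found anywhere in the string
$match       = "Hello" -match    "[jkl]"   # True: "l" is in the set
$matchUpper  = "Hello" -match    "[JKL]"   # True: case-insensitive by default
$matchCase   = "Hello" -cmatch   "[JKL]"   # False: no uppercase J, K, or L

# -notmatch inverts the result
$notMatch    = "Hello" -notmatch "[jkl]"   # False

# -replace with two arguments substitutes the matched text
$replaced    = "Hello" -replace "ello", "i"   # Hi

# -replace with only a pattern removes the matched text
$deleted     = "abcde" -replace "bcd"         # ae
```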
The -match operator matches a pattern and returns a result. Along with the result, it also sets the $matches variable, which holds the portions of the string matched by individual parts of the regular expression. $matches contains a hashtable whose keys are indexes corresponding to the parts of the pattern that matched, and whose values are the matching substrings of the target string. Here is an example from the excellent Windows PowerShell in Action by Bruce Payette.
PS C:\> "abcdef" -match "(a)(((b)(c))de)f"
True
PS C:\> $matches
Key   Value
---   -----
5     c
4     b
3     bc
2     bcde
1     a
0     abcdef
You'll notice that there is one extra entry in the $matches hashtable beyond the five groups specified in the pattern. This is because there is always a default element, with key 0, that represents the entire substring that matched.
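In a script, $matches is typically read right after a successful -match, since it is only updated when the match succeeds. Here is a minimal sketch; the log line and the variable names are hypothetical and exist only for this example.

```powershell
# Hypothetical input string, used only for illustration
$line = "Error 404: page not found"

# Check the result of -match before reading $matches
if ($line -match "Error (\d+): (.*)") {
    $whole   = $matches[0]   # the entire matched text
    $code    = $matches[1]   # first group: the digits
    $message = $matches[2]   # second group: the rest of the line
}
```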
Since PowerShell is built on .NET regular expressions, you are not limited to index-based keys as illustrated above, which can be a pain when you are trying to figure out which index goes with which group. The .NET regular expression implementation allows for named captures by placing the sequence "?<name>" immediately inside the opening parenthesis of the group. The above example with named keys would look like this:
PS C:\> "abcdef" -match "(?<o1>a)(?<o2>((?<e3>b)(?<e4>c))de)f"
True
PS C:\> $matches
Key   Value
---   -----
o1    a
e3    b
e4    c
o2    bcde
1     bc
0     abcdef
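Named captures make scripts that pull fields out of text easier to read. The following is a small sketch; the input string and the group names "user" and "id" are invented for this example.

```powershell
# Hypothetical input string, used only for illustration
$line = "user=alice id=42"

# Each (?<name>...) group becomes a key of that name in $matches
if ($line -match "user=(?<user>\w+) id=(?<id>\d+)") {
    $user = $matches["user"]    # indexer syntax
    $id   = [int]$matches.id    # property syntax works on hashtable keys too
}
```

Either syntax works for reading the value; the cast to [int] is only there to show that captured values start out as strings.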