Regex

Like it or not, network CLIs will be around for a long time. To automate networks, you will likely need to automate CLIs, which generally means dealing with unstructured (text) data. Creating data structures from CLI configuration allows you to interact with the data much more programmatically.

Regex Building Blocks

Meta Characters

Meta characters are the building blocks of regular expressions. Meta characters are interpreted, not evaluated as an exact match. The meta characters are as follows:

Meta character Description
. Period matches any single character except a line break.
[ ] Character class. Matches any character contained between the square brackets.
[^ ] Negated character class. Matches any character that is not contained between the square brackets.
* Matches 0 or more repetitions of the preceding symbol.
+ Matches 1 or more repetitions of the preceding symbol.
? Makes the preceding symbol optional.
{n,m} Braces. Matches at least "n" but not more than "m" repetitions of the preceding symbol.
(xyz) Character group. Matches the characters xyz in that exact order.
| Alternation. Matches either the characters before or the characters after the symbol.
\ Escapes the next character. This allows you to match reserved characters [ ] ( ) { } . * + ? ^ $ \ |
^ Matches the beginning of the input.
$ Matches the end of the input.

Special Sequence

A special sequence, denoted by a \ followed by one of the characters in the list below, carries a specific meaning:

Special Sequence Description
\d Matches digits: [0-9]
\D Matches non-digits: [^\d]
\s Matches whitespace characters: [\t\n\f\r\p{Z}]
\S Matches non-whitespace characters: [^\s]
\w Matches alphanumeric characters: [a-zA-Z0-9_]
\W Matches non-alphanumeric characters: [^\w]

The following special sequences are outside the scope of this lesson but are listed for reference.

Special Sequence Description
\A Returns a match if the specified characters are at the beginning of the string
\b Returns a match where the specified characters are at the beginning or at the end of a word
\B Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word
\Z Returns a match if the specified characters are at the end of the string

Lookaround

Though outside the scope of this lesson, it is worth noting the four types of lookaround.

Symbol Description
?= Positive Lookahead
?! Negative Lookahead
?<= Positive Lookbehind
?<! Negative Lookbehind

Pattern Matching in Regex

Use this regex101.com link for this lesson.

Basic Matching

In regular expressions, a basic match involves finding a specific sequence of characters within a given text. For example, the regular expression cisco would match the word "cisco" wherever it appears in the text.

Note: Without the case insensitive flag which is covered later, case sensitivity matters.

Cisco
cisco
IOS
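As a minimal sketch of basic matching in Python, the snippet below assumes the standard library re module and one line copied from the sample output in the appendix. It only illustrates that a literal pattern is case sensitive by default.

```python
import re

# One line from the sample output in the appendix.
line = "Copyright (c) 1986-2019 by Cisco Systems, Inc."

# A literal pattern is case sensitive unless a flag says otherwise.
upper_match = re.search(r"Cisco", line)
lower_match = re.search(r"cisco", line)

print(upper_match.group(0))  # Cisco
print(lower_match)           # None, because this line only contains "Cisco"
```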

Escaping Special Characters

Certain characters have special meanings in regular expressions. To match them literally, you need to escape them using a backslash \. For instance, \. matches a literal dot, and \\ matches a literal backslash.

Software \[Amsterdam\]
License \(\"GPL\"\)
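The effect of escaping can be sketched in Python as follows. The two sample strings are taken from the appendix data; re.escape() is a standard library helper that is not part of the lesson itself and is shown here only as a convenience.

```python
import re

line = "Cisco IOS Software [Amsterdam], Virtual XE Software"
license_line = 'licensed under the GNU General Public License ("GPL") Version 2.0.  The'

# Unescaped, [Amsterdam] is a character class that stands for ONE character.
unescaped = re.search(r"Software [Amsterdam]", line)
# Escaped, the brackets are matched literally.
escaped = re.search(r"Software \[Amsterdam\]", line)

# re.escape() builds the escaped form of a literal string for you.
gpl = re.search("License " + re.escape('("GPL")'), license_line)

print(unescaped)         # None
print(escaped.group(0))  # Software [Amsterdam]
print(gpl.group(0))      # License ("GPL")
```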

Any character

The dot . meta-character matches any single character except a newline. For instance, the pattern C.sco matches "Cisco," and each additional dot, as in C..co or C...o, stands in for exactly one more character.

C.sco
C..co
c...o

Note: With c...o you will also match parts of components, processor, and more.
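A short Python sketch of the dot, assuming a small hand-picked string of words that appear in the sample output. It shows that each dot stands in for exactly one character and that a loose pattern can match inside unrelated words.

```python
import re

# A few words that appear in the sample output.
words = "components processor Cisco"

one_dot = re.findall(r"C.sco", words)
three_dots = re.findall(r"c...o", words)

# By default the dot does not match a newline.
across_newline = re.search(r"a.b", "a\nb")

print(one_dot)         # ['Cisco']
print(three_dots)      # ['compo', 'cesso']
print(across_newline)  # None
```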

Character Sets

Character sets allow you to specify a group of characters that you want to match. For instance, [aeiou] would match any lowercase vowel in a text. They can come in ranges, such as [a-z], [A-Z], [a-c], [0-9], etc.

[Cc]isco
c[aeo]n
c[a-m]n
c[n-z]n
[0-9] days

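Ranges follow character code point order, which for plain ASCII text is the familiar ASCII ordering. Note that Python 3 string patterns match Unicode by default; the re.ASCII flag is not set unless you pass it.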
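The character set patterns above can be tried in Python roughly like this. The three strings are lines from the appendix data, and the variable names are only illustrative.

```python
import re

banner = "Cisco IOS-XE software, Copyright (c) 2005-2019 by cisco Systems, Inc."
notice = "This product contains cryptographic features"
uptime = "csr1 uptime is 3 days, 15 hours, 45 minutes"

either_case = re.findall(r"[Cc]isco", banner)  # ['Cisco', 'cisco']
late_range = re.findall(r"c[n-z]n", notice)    # ['con'], since "o" falls in n-z
early_range = re.findall(r"c[a-m]n", notice)   # [], no a-m letter sits between c and n
one_digit = re.findall(r"[0-9] days", uptime)  # ['3 days']

print(either_case, late_range, early_range, one_digit)
```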

Negated Character Sets

A negated character set is represented by using ^ as the first character within square brackets. For example, [^0-9] matches any character that is not a digit.

[^c]isco
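A small Python sketch of negated sets, assuming two lines from the appendix data. Keep in mind that a negated set still consumes one character; it does not mean "nothing here."

```python
import re

banner = "Cisco IOS-XE software, Copyright (c) 2005-2019 by cisco Systems, Inc."
compiled = "Compiled Fri 22-Nov-19 03:39 by mcpre"

# [^c] consumes exactly one character that is anything except lowercase "c".
not_lower_c = re.findall(r"[^c]isco", banner)

# [^0-9] followed by + grabs the run of non-digits at the start of the line.
leading_text = re.match(r"[^0-9]+", compiled).group(0)

print(not_lower_c)         # ['Cisco']
print(repr(leading_text))  # 'Compiled Fri '
```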

Repetitions

Repetitions specify how many times a preceding element can occur in the text. There are several ways to control this, such as *, +, {}, and ?.

Star

The * meta-character matches zero or more occurrences of the preceding element. For instance, a* matches "aaa," "a," and an empty string.

.*
.* Cisco
laws[,\.]*
cisco.com/.*
.*@cisco.com
w*\.cisco.com
CSR[0-9]*V
Processor board ID [0-9A-Z]*

Plus

The + meta-character matches one or more occurrences of the preceding element. For example, a+ matches "aaa" and "a," but not an empty string.

.+
.+ Cisco
laws[,\.]+
cisco.com/.+
.+@cisco.com
w+\.cisco.com
CSR[0-9]+V
Processor board ID [0-9A-Z]+
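The difference between * and + is easiest to see when the repeated element is absent. The sketch below assumes two lines from the appendix data, one where "laws" is followed by punctuation and one where it is not.

```python
import re

mid_sentence = "A summary of U.S. laws governing Cisco cryptographic products"
end_sentence = "compliance with U.S. and local country laws. By using this product you"

# * accepts zero punctuation characters, so "laws" alone is enough.
star = re.search(r"laws[,\.]*", mid_sentence)
# + demands at least one, so the same line no longer matches.
plus = re.search(r"laws[,\.]+", mid_sentence)
# With a trailing period present, + matches and includes it.
plus_end = re.search(r"laws[,\.]+", end_sentence)

print(star.group(0))      # laws
print(plus)               # None
print(plus_end.group(0))  # laws.
```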

Question Mark

The ? meta-character matches zero or one occurrence of the preceding element. For instance, colou?r matches both "color" and "colour."

laws[,\.]?
[Uu]\.?[Ss]\.? 
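As a rough Python illustration of the optional quantifier, the snippet below runs the second pattern against one line from the appendix. It also shows a side effect worth knowing: because both periods are optional, the pattern matches the "us" inside "using."

```python
import re

line = "compliance with U.S. and local country laws. By using this product you"

optional_period = re.search(r"laws[,\.]?", line)
us_variants = re.findall(r"[Uu]\.?[Ss]\.?", line)

print(optional_period.group(0))  # laws.
print(us_variants)               # ['U.S.', 'us']  (the second hit is from "using")
```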

Braces

Braces {} are used to specify the number of occurrences of the preceding element. For example, a{3} matches "aaa," and a{2,4} matches "aa," "aaa," and "aaaa."

CSR[0-9]{4}V
[0-9]{4}K
[0-9]{4,7}K
[0-9]{4,7}K bytes
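The brace patterns above behave as follows in Python when run against the memory line from the appendix. This is only a sketch; note how a fixed count of four picks up the tail of a longer number.

```python
import re

line = "cisco CSR1000V (VXE) processor (revision VXE) with 2078006K/3075K bytes of memory."

model = re.search(r"CSR[0-9]{4}V", line).group(0)
exactly_four = re.findall(r"[0-9]{4}K", line)
four_to_seven = re.findall(r"[0-9]{4,7}K", line)
with_suffix = re.findall(r"[0-9]{4,7}K bytes", line)

print(model)          # CSR1000V
print(exactly_four)   # ['8006K', '3075K']
print(four_to_seven)  # ['2078006K', '3075K']
print(with_suffix)    # ['3075K bytes']
```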

Pattern Matching - Lab

Using what you have seen above, develop the regex pattern to match the following:

Note: For all labs, make reasonable assumptions about which data will be dynamic, e.g. the version and time can reasonably be presumed to be different.

  • License Level:
  • 45 minutes
  • 17.1.1, RELEASE SOFTWARE
  • 22-Nov-19
  • Copyright (c) 1986-2019

Capture Groups and Anchors

Capturing Groups

Capturing groups in regular expressions are used to group sub-patterns together. They are enclosed in parentheses ( and ), allowing you to apply quantifiers or operations to multiple characters. For example, (ab)+ matches "ab," "abab," and so on. Capture groups can be used in Python to define "the interesting data". It is much easier to find 17.01.01 in the context of Cisco IOS XE Software, Version 17.01.01.

Next reload license Level: (.+)
([0-9]+) days, ([0-9]+) hours, ([0-9]+) minutes
Cisco IOS (XE)*
ROM: (.+)
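One way capture groups turn text into a data structure in Python is sketched below. The two input lines come from the appendix; the dictionary keys are hypothetical names chosen for this example.

```python
import re

uptime_line = "csr1 uptime is 3 days, 15 hours, 45 minutes"
rom_line = "ROM: IOS-XE ROMMON"

uptime = re.search(r"([0-9]+) days, ([0-9]+) hours, ([0-9]+) minutes", uptime_line)
rom = re.search(r"ROM: (.+)", rom_line)

# group(0) is the whole match; group(1), group(2), ... are the capture groups.
facts = {
    "uptime_days": int(uptime.group(1)),
    "uptime_hours": int(uptime.group(2)),
    "uptime_minutes": int(uptime.group(3)),
    "rom": rom.group(1),
}

print(uptime.groups())  # ('3', '15', '45')
print(facts)
```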

Non-Capturing Groups

Non-capturing groups, denoted by (?:...), are similar to capturing groups but don't create a separate capture in the result. They are useful when you want to apply quantifiers or modifiers to a group without needing to reference the captured content.

Next reload license Level: (?:.+)
(?:[0-9]+) days, (?:[0-9]+) hours, (?:[0-9]+) minutes
Cisco IOS (?:XE)*
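The practical difference shows up in what the match object reports. In this Python sketch, which assumes the first line of the appendix data, both patterns match the same text, but only the capturing version records a group.

```python
import re

line = "Cisco IOS XE Software, Version 17.01.01"

capturing = re.search(r"Cisco IOS (XE)*", line)
non_capturing = re.search(r"Cisco IOS (?:XE)*", line)

print(capturing.group(0), capturing.groups())          # Cisco IOS XE ('XE',)
print(non_capturing.group(0), non_capturing.groups())  # Cisco IOS XE ()
```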

Alternation

Alternation, denoted by the vertical bar |, allows you to match one of several alternative patterns. For example, (cisco|arista) matches either "cisco" or "arista."

(Cisco|Arista).*
[0-9]+ (days|hours|minutes)
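A brief Python sketch of alternation using two lines from the appendix. Note that re.findall returns the contents of the capture group, not the whole match, when the pattern contains exactly one group.

```python
import re

uptime_line = "csr1 uptime is 3 days, 15 hours, 45 minutes"
copyright_line = "Copyright (c) 1986-2019 by Cisco Systems, Inc."

# With one capture group, findall returns what the group matched.
units = re.findall(r"[0-9]+ (days|hours|minutes)", uptime_line)

vendor = re.search(r"(Cisco|Arista).*", copyright_line)

print(units)            # ['days', 'hours', 'minutes']
print(vendor.group(1))  # Cisco
print(vendor.group(0))  # Cisco Systems, Inc.
```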

Anchors

Anchors are used to match patterns at specific positions within the text.

Caret

The caret ^ is an anchor that matches the beginning of the string, or the beginning of each line when the multiline flag is set. For example, ^abc matches "abc" only if it's at the beginning of a line.

^Cisco
^[Cc]isco
^Cisco.+
^Processor board.+

Dollar Sign

The dollar sign $ is an anchor that matches the end of the string, or the end of each line when the multiline flag is set. For instance, xyz$ matches "xyz" only if it's at the end of a line.

^cisco.+memory\.$
.*minutes$
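The anchors can be exercised in Python as sketched below, assuming three lines from the appendix. The last two calls show that, in Python, ^ and $ only apply per line on multi-line text when re.MULTILINE is passed.

```python
import re

lines = [
    "cisco CSR1000V (VXE) processor (revision VXE) with 2078006K/3075K bytes of memory.",
    "Processor board ID 9SAGBHTUEE9",
    "32768K bytes of non-volatile configuration memory.",
]

# Checked one line at a time, the anchors refer to that line's start and end.
starts_with_processor = [l for l in lines if re.search(r"^Processor board.+", l)]
ends_with_memory = [l for l in lines if re.search(r"memory\.$", l)]

# On the joined text, the anchors refer to the whole string by default.
text = "\n".join(lines)
whole_text = re.findall(r"^cisco.+memory\.$", text)
per_line = re.findall(r"^cisco.+memory\.$", text, flags=re.MULTILINE)

print(len(starts_with_processor), len(ends_with_memory))  # 1 2
print(whole_text)     # []
print(len(per_line))  # 1
```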

Greedy vs Lazy Matching

In regular expressions, quantifiers such as *, +, and ? are greedy by default. This means they match as much text as possible while still allowing the entire pattern to match. However, there are situations where you might want to match the smallest possible portion of text. This is where greedy vs lazy matching comes into play.

Greedy Matching

Greedy matching is the default behavior of quantifiers. It tries to match as many characters as possible while still allowing the overall pattern to match. For example, in the pattern a.*b, the .* consumes as much as it can, so the match runs from the first "a" to the last "b" on the line.

Cisco IOS.+(Software)
Virtual XE Software (.+),

Lazy (Non-Greedy) Matching

Lazy matching, also known as non-greedy or minimal matching, is achieved by adding a ? after a quantifier. It matches the shortest possible string of characters that allows the overall pattern to match. For example, in the pattern a.*?b, the .*? consumes as little as it can, so the match stops at the first "b" that follows the "a."

Lazy matching is particularly useful when you want to match content within the smallest possible scope. It's especially important when dealing with text that contains nested or repeating patterns.

Cisco IOS.+?(Software)
Virtual XE Software (.+?),
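The greedy and lazy patterns above can be compared side by side in Python. This sketch assumes the second line of the appendix data; the only difference between each pair of patterns is the added ?.

```python
import re

line = ("Cisco IOS Software [Amsterdam], Virtual XE Software "
        "(X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 17.1.1, RELEASE SOFTWARE (fc3)")

# Greedy: .+ runs to the LAST place the rest of the pattern can still match.
greedy_sw = re.search(r"Cisco IOS.+(Software)", line).group(0)
greedy_pkg = re.search(r"Virtual XE Software (.+),", line).group(1)

# Lazy: .+? stops at the FIRST place the rest of the pattern can match.
lazy_sw = re.search(r"Cisco IOS.+?(Software)", line).group(0)
lazy_pkg = re.search(r"Virtual XE Software (.+?),", line).group(1)

print(greedy_sw)   # Cisco IOS Software [Amsterdam], Virtual XE Software
print(lazy_sw)     # Cisco IOS Software
print(greedy_pkg)  # (X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 17.1.1
print(lazy_pkg)    # (X86_64_LINUX_IOSD-UNIVERSALK9-M)
```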

Capture Group and Lazy Matching - Lab

Using only the concepts covered above, develop the regex pattern to match the following in the capture group:

  • fc3
  • 3 days, 15 hours, 46 minutes // Capture control processor uptime only
  • 17.1.1
  • 17.01.01
  • 1986-2019

Special Sequences

Special sequences in regular expressions provide shortcuts for matching common patterns in text. The pattern that they will follow is lowercase for digits, words, and whitespace and uppercase for the respective opposite. These will greatly simplify the usage of character sets.

Digits and Non-digits

The special sequence \d matches any digit character (equivalent to [0-9]). On the other hand, \D matches any non-digit character.

Version (\d+\.\d+\.\d+)
cisco (\D+)
uptime is \d+ days, \d+ hours, \d+ minutes
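A quick Python sketch of \d and \D against lines from the appendix. For plain ASCII text like this, \d+ and [0-9]+ return the same results.

```python
import re

version_line = "Cisco IOS XE Software, Version 17.01.01"
platform_line = "cisco CSR1000V (VXE) processor (revision VXE) with 2078006K/3075K bytes of memory."
uptime_line = "csr1 uptime is 3 days, 15 hours, 45 minutes"

version = re.search(r"Version (\d+\.\d+\.\d+)", version_line).group(1)
# \D+ stops at the first digit, so only the letters before "1000V" are captured.
prefix = re.search(r"cisco (\D+)", platform_line).group(1)
numbers = re.findall(r"\d+", uptime_line)

print(version)  # 17.01.01
print(prefix)   # CSR
print(numbers)  # ['1', '3', '15', '45']  (the '1' comes from the hostname csr1)
```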

Words and Non-Words

The special sequence \w matches any word character (equivalent to [a-zA-Z0-9_]). It's commonly used to match variable names, identifiers, and alphanumeric characters. \W matches any non-word character.

License Level: \w+
Cisco IOS Software \W(\w+)\W+
http:\/\/www\.(\w+)\.com
(\W+\w+)\.html$
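The \w and \W patterns above can be checked in Python like this, using lines from the appendix as input. The variable names are illustrative only.

```python
import re

license_level = re.search(r"License Level: \w+", "License Level: ax").group(0)

domain = re.search(
    r"http:\/\/www\.(\w+)\.com",
    "Technical Support: http://www.cisco.com/techsupport",
).group(1)

# \W matches the "[" before the release name and \W+ the "], " after it.
release = re.search(
    r"Cisco IOS Software \W(\w+)\W+",
    "Cisco IOS Software [Amsterdam], Virtual XE Software",
).group(1)

page = re.search(
    r"(\W+\w+)\.html$",
    "http://www.cisco.com/wwl/export/crypto/tool/stqrg.html",
).group(1)

print(license_level)  # License Level: ax
print(domain)         # cisco
print(release)        # Amsterdam
print(page)           # /stqrg
```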

Whitespace and Non-Whitespace

The special sequence \s matches any whitespace character, including spaces, tabs, and newlines. It's used to match formatting characters. \S matches any non-whitespace character.

GPL Version (\S+)\.\s+For
^\s+$
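A small Python sketch of \s and \S follows. It assumes one line from the appendix and uses a variant of the GPL pattern with the period escaped, so that the sentence-ending dot is matched literally and kept out of the capture group.

```python
import re

gpl_line = "GPL code under the terms of GPL Version 2.0.  For more details, see the"

# \S+ first grabs "2.0." and then gives back the final period to the escaped \.
gpl_version = re.search(r"GPL Version (\S+)\.\s+For", gpl_line).group(1)

# \S+ splits a line into whitespace-separated tokens.
tokens = re.findall(r"\S+", "License Level: ax")

# ^\s+$ matches a line made up only of whitespace.
whitespace_only = re.search(r"^\s+$", "   ") is not None
has_content = re.search(r"^\s+$", "ROM: IOS-XE ROMMON") is not None

print(gpl_version)                   # 2.0
print(tokens)                        # ['License', 'Level:', 'ax']
print(whitespace_only, has_content)  # True False
```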

Special Sequences - Lab

Using only the concepts covered above, develop the regex pattern to match the following in the capture group:

  • fc3
  • 3 days, 15 hours, 46 minutes // Capture control processor uptime only
  • 17.1.1
  • 17.01.01
  • 1986-2019

Flags - Case Insensitive, Global Search, Multiline

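Flags in regular expressions modify the behavior of the pattern matching process. In tools that write a pattern between delimiters, such as regex101, they are added after the closing delimiter of the regular expression; in Python they are passed as the flags argument to functions in the re module.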

The case insensitive flag allows you to perform a case-insensitive search. This means that uppercase and lowercase letters are treated as equivalent. For example, with the flag, the pattern apple will match "apple," "Apple," and "APPLE."

The global search flag returns every match in the text instead of stopping after the first one.

The multiline flag changes how the anchors behave. The ^ anchor matches the beginning of each line, and the $ anchor matches the end of each line, instead of only the beginning and end of the entire text.

A related flag, single line (also called dotall), affects the behavior of the dot . meta-character. By default, the dot matches any character except a newline. With the single line flag, the dot matches any character, including newlines.
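In Python's re module, these behaviors can be sketched as follows. Python has no global flag: re.search returns the first match, while re.findall (or re.finditer) returns all of them. Case insensitivity is re.IGNORECASE, per-line anchors are re.MULTILINE, and a dot that also matches newlines is re.DOTALL. The two-line sample text is taken from the appendix.

```python
import re

text = ("Cisco IOS-XE software, Copyright (c) 2005-2019 by cisco Systems, Inc.\n"
        "All rights reserved.")

# Case insensitive: "cisco" also matches "Cisco".
case_sensitive = re.findall(r"cisco", text)
case_insensitive = re.findall(r"cisco", text, flags=re.IGNORECASE)

# "Global" behavior: search stops at the first match, findall returns all.
first_only = re.search(r"cisco", text, flags=re.IGNORECASE).group(0)

# Multiline: ^ matches at the start of every line, not just the string.
line_starts_default = re.findall(r"^\w+", text)
line_starts_multi = re.findall(r"^\w+", text, flags=re.MULTILINE)

# Dotall: the dot is allowed to cross the newline.
dot_default = re.search(r"Inc\..+reserved", text)
dot_all = re.search(r"Inc\..+reserved", text, flags=re.DOTALL)

print(case_sensitive, case_insensitive)        # ['cisco'] ['Cisco', 'cisco']
print(first_only)                              # Cisco
print(line_starts_default, line_starts_multi)  # ['Cisco'] ['Cisco', 'All']
print(dot_default, dot_all is not None)        # None True
```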

Flags - Lab

Go to regex101 and try the same regexes with the different flags turned on.

Appendix

The regex101.com link should already include the data below; it is captured here in case the link is lost.

Cisco IOS XE Software, Version 17.01.01
Cisco IOS Software [Amsterdam], Virtual XE Software (X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 17.1.1, RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2019 by Cisco Systems, Inc.
Compiled Fri 22-Nov-19 03:39 by mcpre


Cisco IOS-XE software, Copyright (c) 2005-2019 by cisco Systems, Inc.
All rights reserved.  Certain components of Cisco IOS-XE software are
licensed under the GNU General Public License ("GPL") Version 2.0.  The
software code licensed under GPL Version 2.0 is free software that comes
with ABSOLUTELY NO WARRANTY.  You can redistribute and/or modify such
GPL code under the terms of GPL Version 2.0.  For more details, see the
documentation or "License Notice" file accompanying the IOS-XE software,
or the applicable URL provided on the flyer accompanying the IOS-XE
software.


ROM: IOS-XE ROMMON

csr1 uptime is 3 days, 15 hours, 45 minutes
Uptime for this control processor is 3 days, 15 hours, 46 minutes
System returned to ROM by reload
System image file is "bootflash:packages.conf"
Last reload reason: reload



This product contains cryptographic features and is subject to United
States and local country laws governing import, export, transfer and
use. Delivery of Cisco cryptographic products does not imply
third-party authority to import, export, distribute or use encryption.
Importers, exporters, distributors and users are responsible for
compliance with U.S. and local country laws. By using this product you
agree to comply with applicable laws and regulations. If you are unable
to comply with U.S. and local laws, return this product immediately.

A summary of U.S. laws governing Cisco cryptographic products may be found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html

If you require further assistance please contact us by sending email to
export@cisco.com.

License Level: ax
License Type: N/A(Smart License Enabled)
Next reload license Level: ax


Smart Licensing Status: UNREGISTERED/No Licenses in Use

cisco CSR1000V (VXE) processor (revision VXE) with 2078006K/3075K bytes of memory.
Processor board ID 9SAGBHTUEE9
9 Gigabit Ethernet interfaces
32768K bytes of non-volatile configuration memory.
3978444K bytes of physical memory.
6188032K bytes of virtual hard disk at bootflash:.

Configuration register is 0x2102