Primer on regular expressions inside of Emacs
Raw link: https://www.youtube.com/watch?v=TxYGHjKBMUg
In this video tutorial I show how to use regular expression syntax to solve various practical problems in Emacs.
Knowledge of regexp notation is not a prerequisite to using Emacs effectively. In fact, you can be very productive without knowing anything about regular expressions. However, learning them will certainly boost your productivity and make Emacs an even more powerful tool in your hands.
See my dotemacs for the documentation and package declarations I provide.
This is the full text of my presentation, which was done using org-mode (check my dotemacs for presentations with Org).
* Emacs regular expressions in practice
Emacs has a few ways to operate on regexp matches, such as:
+ =isearch= (the regexp variant, =isearch-forward-regexp=)
+ =query-replace= (the regexp variant, =query-replace-regexp=)
+ =keep-lines=
+ =flush-lines=
To make our life easier, we can practice with the built-in
=regexp-builder= or the third-party package =visual-regexp=. This demo
will rely on the latter.
If you have the Emacs manual installed, run =C-h r i regexp= to get to
the relevant chapter. *Do it!*
** Line boundaries
The caret =^= denotes the beginning of the line.
The dollar sign =$= marks the end.
Match all lines that start with a space:
Emacs
Emacs
Emacs
Emacs
Emacs
And all that end with a capital =S=:
emacs emacS
emacS emacs
emacs emacs
emacS emacS
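The two anchors can also be checked from Lisp. This is only a minimal sketch: the function names are hypothetical, and binding =case-fold-search= to nil is an assumption made here so that =S= does not also match =s=.

```elisp
(defun my-demo-starts-with-space-p (line)
  "Return non-nil if LINE starts with a space."
  (string-match-p "^ " line))

(defun my-demo-ends-with-capital-s-p (line)
  "Return non-nil if LINE ends with a capital S."
  ;; Bind `case-fold-search' to nil so the match is case-sensitive.
  (let ((case-fold-search nil))
    (string-match-p "S$" line)))

(my-demo-starts-with-space-p " Emacs")         ; matches
(my-demo-starts-with-space-p "Emacs")          ; no match
(my-demo-ends-with-capital-s-p "emacs emacS")  ; matches
(my-demo-ends-with-capital-s-p "emacS emacs")  ; no match
```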
** Remove or keep lines
Remove the empty lines. Then keep the ones that contain "username".
<username><![CDATA[name]]></username>
emacs emacS
emacS emacs
emacs emacs
emacS emacS
<userName><![CDATA[nom]]></userName>
emacs emacS
emacS emacs
emacs emacs
emacS emacS
<username><![CDATA[name]]></username>
emacs emacS
emacS emacs
emacs emacs
emacS emacS
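The same two steps can be sketched in Lisp with =flush-lines= and =keep-lines= inside a temporary buffer. The function name and the =CASE-SENSITIVE= argument are hypothetical. Both commands are mainly meant for interactive use, so treat this as an illustration rather than a recommended API. Note that with the default case folding, =username= also matches =userName=.

```elisp
(defun my-demo-clean-lines (text &optional case-sensitive)
  "Return TEXT minus empty lines, keeping only lines with \"username\".
With CASE-SENSITIVE non-nil, \"userName\" does not count as a match."
  (with-temp-buffer
    (insert text)
    (let ((case-fold-search (not case-sensitive)))
      ;; Both commands act from point to the end of the buffer,
      ;; so move to the top before each call.
      (goto-char (point-min))
      (flush-lines "^$")
      (goto-char (point-min))
      (keep-lines "username"))
    (buffer-string)))

;; Keeps both the <username> and the <userName> line:
(my-demo-clean-lines
 "<username>a</username>\n\nemacs emacS\n<userName>b</userName>\n")
```

Passing a non-nil second argument keeps only the lowercase =<username>= line.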
** The dot character
The dot or full stop =.= matches any single character except the
newline.
Match these words using their common part =ired= as a string.
dired
fired
mired
tired
wired
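A short sketch of the dot in Lisp, assuming each word is tested as a separate string (the function name is hypothetical):

```elisp
(defun my-demo-ired-word-p (word)
  "Return non-nil if WORD is exactly one character followed by \"ired\"."
  ;; \\` and \\' anchor the match to the start and end of the string.
  (string-match-p "\\`.ired\\'" word))

(my-demo-ired-word-p "dired")    ; matches
(my-demo-ired-word-p "ired")     ; no match: the dot needs one character
(my-demo-ired-word-p "\nired")   ; no match: the dot skips the newline
```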
** Character sets and ranges
A set of individual characters is marked between brackets =[]=.
Sets can be written as ranges:
| Range      | Scope                                    |
|------------+------------------------------------------|
| [a-z]      | all lowercase alphabetic characters      |
| [A-Za-z]   | all uppercase or lowercase letters       |
| [a-z0-9]   | lowercase letters or digits 0 through 9  |
| [abcd1234] | letters a, b, c, d and digits 1, 2, 3, 4 |
Match both of those using a character set for the first letter:
emacs
Emacs
Match those that end with a number:
Emacs
emacs-27
emacs-26
GNU emacs
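Both exercises can be approximated in Lisp as follows. The function names are hypothetical. Case folding is switched off in the first function as an assumption of this sketch, because otherwise a plain =emacs= would already match =Emacs= and the set would not be needed.

```elisp
(defun my-demo-emacs-word-p (string)
  "Return non-nil if STRING is exactly \"emacs\" or \"Emacs\"."
  (let ((case-fold-search nil))
    (string-match-p "\\`[Ee]macs\\'" string)))

(defun my-demo-ends-with-digit-p (string)
  "Return non-nil if STRING ends with a digit."
  (string-match-p "[0-9]$" string))

(my-demo-emacs-word-p "Emacs")          ; matches
(my-demo-emacs-word-p "EMACS")          ; no match
(my-demo-ends-with-digit-p "emacs-27")  ; matches
(my-demo-ends-with-digit-p "GNU emacs") ; no match
```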
** Difference between postfix operators ?, +, *
"Postfix" means that the operator comes after a given term and
determines how many times that term may repeat.
+ =?= matches the previous term zero or one time.
+ =+= matches the previous term one or more times.
+ =*= matches the previous term zero or more times.
Match the =s= optionally:
day
days
Use =prote= followed by a postfix:
prot
prote
proteeee
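To compare the three operators side by side, here is a small sketch with a hypothetical helper that only succeeds when the regexp covers the whole string. In each pattern the postfix applies to the single character right before it.

```elisp
(defun my-demo-whole-match-p (regexp string)
  "Return non-nil if REGEXP matches the whole of STRING."
  ;; Wrap REGEXP in a shy group anchored to both ends of STRING.
  (string-match-p (concat "\\`\\(?:" regexp "\\)\\'") string))

(my-demo-whole-match-p "days?" "day")        ; matches
(my-demo-whole-match-p "days?" "days")       ; matches
(my-demo-whole-match-p "prote?" "proteeee")  ; no match: at most one "e"
(my-demo-whole-match-p "prote+" "prot")      ; no match: needs one "e"
(my-demo-whole-match-p "prote*" "proteeee")  ; matches
```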
** Grouped matches
A group is enclosed inside escaped parentheses =\(GROUP\)=.
Match both of these, including the optional suffix =ig=:
conf
config
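One way to express the optional suffix in Lisp is to make the whole group optional with =?=. A minimal sketch with a hypothetical function name; note that backslashes are doubled inside a Lisp string.

```elisp
(defun my-demo-conf-parts (string)
  "Return (MATCH SUFFIX) if STRING is \"conf\" or \"config\", else nil.
SUFFIX is nil when the optional group did not take part in the match."
  (when (string-match "\\`conf\\(ig\\)?\\'" string)
    (list (match-string 0 string) (match-string 1 string))))

(my-demo-conf-parts "conf")    ; => ("conf" nil)
(my-demo-conf-parts "config")  ; => ("config" "ig")
(my-demo-conf-parts "confi")   ; => nil
```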
** Greedy versus non-greedy
Postfix characters are greedy by default. A "greedy" match covers the
longest possible part, whereas a "non-greedy" one covers the
shortest.
A non-greedy variant is used when the postfix is followed by =?=.
Using the =.*= construct, match items both greedily and not:
Hello world
Hello world world world world
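The difference is easy to see by printing what each variant actually matched. A minimal sketch, with a hypothetical helper name:

```elisp
(defun my-demo-first-match (regexp string)
  "Return the text of the first match of REGEXP in STRING, or nil."
  (when (string-match regexp string)
    (match-string 0 string)))

;; Greedy: runs up to the last "world".
(my-demo-first-match "Hello.*world" "Hello world world world world")
;; => "Hello world world world world"

;; Non-greedy: stops at the first "world".
(my-demo-first-match "Hello.*?world" "Hello world world world world")
;; => "Hello world"
```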
** Multiple groups
Match the alphabetic and numeric parts in two separate groups.
emacs27
emacs26
emacs25
emacs24
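A possible solution, sketched in Lisp: one group for the letters and one for the digits. The function name is hypothetical, and case folding is disabled here as an assumption so that =[a-z]= means lowercase only.

```elisp
(defun my-demo-split-name-version (string)
  "Return (NAME VERSION) for a STRING such as \"emacs27\", or nil."
  (let ((case-fold-search nil))
    (when (string-match "\\`\\([a-z]+\\)\\([0-9]+\\)\\'" string)
      ;; Group 1 holds the letters, group 2 the digits.
      (list (match-string 1 string) (match-string 2 string)))))

(my-demo-split-name-version "emacs27")  ; => ("emacs" "27")
```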
** Literal hyphen and dot
Match the hyphen as part of the alphabetic group and the dot as part
of the numeric one.
emacs-27.1
emacs-26.3
emacs-25.2
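Inside a character set the dot is literal, and a hyphen is literal when it is the first or last character of the set. A minimal sketch along those lines, with a hypothetical function name:

```elisp
(defun my-demo-split-dashed-version (string)
  "Return (NAME VERSION) for a STRING such as \"emacs-27.1\", or nil."
  (let ((case-fold-search nil))
    ;; [a-z-] : lowercase letters plus a literal hyphen (placed last).
    ;; [0-9.] : digits plus a literal dot.
    (when (string-match "\\`\\([a-z-]+\\)\\([0-9.]+\\)\\'" string)
      (list (match-string 1 string) (match-string 2 string)))))

(my-demo-split-dashed-version "emacs-27.1")  ; => ("emacs-" "27.1")
```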
** Exclude sets
To exclude a set, put a caret right after the opening bracket: =[^SET]=
Match every line except those that start with a capital letter.
GNU
Emacs
org-mode
regexp
emacs_lisp
Linux
guix
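In Lisp the exercise might look like the sketch below. The function name is hypothetical. Case folding is turned off as an assumption of this example, since the exclusion is about capitals specifically.

```elisp
(defun my-demo-no-leading-capital-p (line)
  "Return non-nil if LINE starts with a character other than A-Z."
  (let ((case-fold-search nil))
    ;; The first ^ anchors to the line start, the second negates the set.
    (string-match-p "^[^A-Z]" line)))

(my-demo-no-leading-capital-p "org-mode")  ; matches
(my-demo-no-leading-capital-p "Emacs")     ; no match
```

Keep in mind that in a buffer a negated set can also match a newline, unless the newline is listed in the set.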
** Alternative groups with literal brackets
Use a character set that matches =name= and =nom=.
name
nom
Then:
1. Match the =username= variants' =[name]= or =[nom]=.
2. Replace the match with =[PROT]=.
<username><![CDATA[name]]></username>
<nameuser><![CDATA[nam]]></nameuser>
<userName><![CDATA[nom]]></userName>
<nameuser><![CDATA[nome]]></nameuser>
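One possible way to do the replacement from Lisp is sketched below. The regexp is an assumption of this example, not necessarily the one used in the video: it captures the =<username>= or =<userName>= prefix in a group, matches the bracketed =name= or =nom= after it, and puts the prefix back with =\1=. The function name is hypothetical.

```elisp
(defun my-demo-replace-cdata-name (line)
  "Replace [name] or [nom] with [PROT] on the username variants of LINE."
  (let ((case-fold-search nil))
    (replace-regexp-in-string
     ;; \\[ is a literal opening bracket.  A closing bracket outside
     ;; of a character set is already literal.
     "\\(<user[Nn]ame><!\\[CDATA\\)\\[n[ao]me?]"
     "\\1[PROT]"
     line
     t)))                               ; t: keep the replacement's case

(my-demo-replace-cdata-name "<userName><![CDATA[nom]]></userName>")
;; => "<userName><![CDATA[PROT]]></userName>"
```

The =nameuser= lines are left alone because the prefix group does not match them.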
** Either match
To match either of two alternatives, separate them with =\|=.
Prepend =vr/= to the first =group= and =match= on each line.
`(group-0 ((group (:inherit modus-theme-intense-blue))))
`(group-1 ((group (:inherit modus-theme-intense-magenta))))
`(group-2 ((group (:inherit modus-theme-intense-green))))
`(match-0 ((match (:inherit modus-theme-refine-yellow))))
`(match-1 ((match (:inherit modus-theme-refine-yellow))))
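A sketch of this replacement in Lisp, assuming each line is handled as its own string and that the first =group= or =match= always follows the opening backquote and parenthesis. The function name is hypothetical.

```elisp
(defun my-demo-prefix-face-name (line)
  "Prepend \"vr/\" to the first \"group\" or \"match\" in LINE."
  (let ((case-fold-search nil))
    (replace-regexp-in-string
     ;; Anchor to the line start so that only the first name is touched.
     "^`(\\(group\\|match\\)"
     "`(vr/\\1"
     line
     t)))                               ; t: keep the replacement's case

(my-demo-prefix-face-name
 "`(group-0 ((group (:inherit modus-theme-intense-blue))))")
;; => "`(vr/group-0 ((group (:inherit modus-theme-intense-blue))))"
```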
** Running elisp functions on groups
Run elisp in the replacement text by escaping the comma =\,= and then
following it with a Lisp expression inside parentheses: =\,(FUNCTION)=.
Using the =.ired= pattern from earlier, run a replace command where you
must execute the =upcase= function on the second/middle match. Keep the
rest intact.
direddireddired
firedfiredfired
miredmiredmired
tiredtiredtired
wiredwiredwired
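In an interactive =query-replace-regexp= the replacement could be written as =\1\,(upcase \2)\3=. The Lisp sketch below does the equivalent with =replace-regexp-in-string=, which accepts a function as its replacement argument. The function name is hypothetical.

```elisp
(defun my-demo-upcase-middle (string)
  "Upcase the second of three consecutive \".ired\" words in STRING."
  (replace-regexp-in-string
   "\\(.ired\\)\\(.ired\\)\\(.ired\\)"
   (lambda (match)
     ;; While this function runs, the match data refer to MATCH,
     ;; so the groups are read from that substring.
     (concat (match-string 1 match)
             (upcase (match-string 2 match))
             (match-string 3 match)))
   string
   t))                                  ; t: do not alter the case further

(my-demo-upcase-middle "direddireddired")  ; => "diredDIREDdired"
```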