Primer on regular expressions inside of Emacs

Raw link: https://www.youtube.com/watch?v=TxYGHjKBMUg

In this video tutorial I show how to use regular expression syntax to solve various practical problems in Emacs.

Knowledge of regexp notation is not a prerequisite to using Emacs effectively. In fact, you can be very productive without knowing anything about regular expressions. However, learning the notation will certainly boost your productivity and make Emacs an even more powerful tool in your hands.

See my dotemacs for the documentation and package declarations I provide.


This is the full text of my presentation, which was done using org-mode (check my dotemacs for presentations with Org).

* Emacs regular expressions in practice

Emacs has a few ways to operate on regexp matches, such as:

+ =isearch=
+ =query-replace=
+ =keep-lines=
+ =flush-lines=

To make our life easier, we can practice with the built-in
=regexp-builder= or the third-party package =visual-regexp=.  This demo
will rely on the latter.

If you have the Emacs manual installed, you can run =C-h r i regexp= to
get to the relevant chapter.  *Do it!*

** Line boundaries

The caret =^= denotes the beginning of the line.

The dollar sign =$= marks the end.

Match all lines that start with a space:

Emacs
Emacs
 Emacs
Emacs
 Emacs

And all that end with a capital =S=:

emacs emacS
emacS emacs
emacs emacs
emacS emacS
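
The same two anchors can be tried outside the interactive commands.
What follows is a minimal sketch, not what the demo itself runs:
=my-demo-matching-lines= is a hypothetical helper, and it binds
=case-fold-search= to nil on the assumption that the capital =S= must
be matched case-sensitively.

#+begin_src emacs-lisp
;; Hypothetical helper, not part of Emacs: filter LINES by REGEXP.
(defun my-demo-matching-lines (regexp lines)
  "Return the elements of LINES that match REGEXP, case-sensitively."
  (let ((case-fold-search nil))
    (delq nil
          (mapcar (lambda (line)
                    (and (string-match-p regexp line) line))
                  lines))))

;; Lines that start with a space.
(my-demo-matching-lines
 "^ " '("Emacs" "Emacs" " Emacs" "Emacs" " Emacs"))
;; => (" Emacs" " Emacs")

;; Lines that end with a capital S.
(my-demo-matching-lines
 "S$" '("emacs emacS" "emacS emacs" "emacs emacs" "emacS emacS"))
;; => ("emacs emacS" "emacS emacS")
#+end_src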

** Remove or keep lines

Remove the empty lines.  Then keep the ones that contain "username".

<username><![CDATA[name]]></username>
emacs emacS
emacS emacs
emacs emacs
emacS emacS



<userName><![CDATA[nom]]></userName>
emacs emacS
emacS emacs
emacs emacs
emacS emacS




<username><![CDATA[name]]></username>
emacs emacS
emacS emacs
emacs emacs
emacS emacS
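
For those who prefer to see it as Lisp, here is a rough sketch of the
same two steps with =flush-lines= and =keep-lines= in a temporary
buffer.  The function name =my-demo-filter-text= is made up for this
example.  It also assumes case-sensitive matching, so =userName= is
dropped.  When you run =keep-lines= interactively with an all-lowercase
pattern, the default settings ignore case and would keep that line too.

#+begin_src emacs-lisp
;; Hypothetical helper: apply both steps to TEXT and return the result.
(defun my-demo-filter-text (text)
  "Remove empty lines from TEXT, then keep lines containing \"username\"."
  (with-temp-buffer
    (insert text)
    (let ((case-fold-search nil))   ; assumption: match case-sensitively
      (goto-char (point-min))
      (flush-lines "^$")            ; step 1: remove the empty lines
      (goto-char (point-min))
      (keep-lines "username"))      ; step 2: keep lines with "username"
    (buffer-string)))

(my-demo-filter-text
 (concat "<username><![CDATA[name]]></username>\n"
         "emacs emacS\n"
         "\n"
         "\n"
         "<userName><![CDATA[nom]]></userName>\n"))
;; => "<username><![CDATA[name]]></username>\n"
#+end_src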

** The dot character

The dot or full stop =.= matches any single character except the
newline.

Match these words using their common part =ired= as a string.

dired
fired
mired
tired
wired
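
One way to check the pattern =.ired= from Lisp is sketched below.  The
predicate name is hypothetical.  The anchors =\`= and =\'= (start and
end of the string) are my addition so that the whole word has to match.
Note that backslashes are doubled inside a Lisp string, whereas the
interactive prompts take a single one.

#+begin_src emacs-lisp
;; Hypothetical predicate for trying out the dot.
(defun my-demo-ired-word-p (word)
  "Return non-nil if WORD is one character followed by \"ired\"."
  (let ((case-fold-search nil))
    ;; \\` and \\' anchor the match to the start and end of WORD.
    (string-match-p "\\`.ired\\'" word)))

(mapcar #'my-demo-ired-word-p '("dired" "fired" "mired" "tired" "wired"))
;; => (0 0 0 0 0), every word matches starting at index 0

(my-demo-ired-word-p "ired")
;; => nil, because the dot must consume exactly one character
#+end_src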

** Character sets and ranges

A set of individual characters is written between square brackets =[]=.

Sets can be written as ranges:

| Range      | Scope                                      |
|------------+--------------------------------------------|
| [a-z]      | all lowercase alphabetic characters        |
| [A-Za-z]   | all upper or lower case letters            |
| [a-z0-9]   | lower case alphabet or numbers 0 through 9 |
| [abcd1234] | letters a,b,c,d and numbers 1,2,3,4        |

Match both of those using a character set for the first letter:

emacs
Emacs

Match those that end with a number:

Emacs
emacs-27
emacs-26
GNU emacs
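
Here is a small sketch of both exercises, using the set =[Ee]= for the
first letter and the range =[0-9]= before the end of the line.  The two
function names are hypothetical.  Binding =case-fold-search= to nil is
an assumption on my part: with case folding enabled, a plain =emacs=
would already match =Emacs= and the set would prove nothing.

#+begin_src emacs-lisp
;; Hypothetical predicates for trying out sets and ranges.
(defun my-demo-emacs-word-p (string)
  "Return t if STRING contains \"emacs\" or \"Emacs\"."
  (let ((case-fold-search nil))
    (and (string-match-p "[Ee]macs" string) t)))

(defun my-demo-ends-with-digit-p (string)
  "Return t if STRING ends with a digit."
  (let ((case-fold-search nil))
    (and (string-match-p "[0-9]$" string) t)))

(mapcar #'my-demo-emacs-word-p '("emacs" "Emacs"))
;; => (t t)

(mapcar #'my-demo-ends-with-digit-p
        '("Emacs" "emacs-27" "emacs-26" "GNU emacs"))
;; => (nil t t nil)
#+end_src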

** Difference between postfix operators ?, +, *

"Postfix" means that it comes after a given set and alters its scope.

+ =?= matches the previous term zero or one time.
+ =+= matches the previous term one or more times.
+ =*= matches the previous term zero or more times.

Match the =s= optionally:

day
days

Use =prote= followed by a postfix:

prot
prote
proteeee
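
To compare the three operators side by side, consider this sketch.  The
helper =my-demo-whole-match-p= is hypothetical.  It wraps the pattern
so that it must cover the entire string, which is stricter than what
=isearch= does but makes the differences easy to see.

#+begin_src emacs-lisp
;; Hypothetical helper: REGEXP must match the whole of STRING.
(defun my-demo-whole-match-p (regexp string)
  "Return t if REGEXP matches all of STRING, case-sensitively."
  (let ((case-fold-search nil))
    ;; \\(?: ... \\) is a non-capturing group; \\` and \\' anchor it.
    (and (string-match-p (concat "\\`\\(?:" regexp "\\)\\'") string) t)))

(defun my-demo-try (regexp strings)
  "Return a list of t or nil for each of STRINGS tested against REGEXP."
  (mapcar (lambda (s) (my-demo-whole-match-p regexp s)) strings))

(my-demo-try "days?"  '("day" "days"))              ; => (t t)
(my-demo-try "prote?" '("prot" "prote" "proteeee")) ; => (t t nil)
(my-demo-try "prote+" '("prot" "prote" "proteeee")) ; => (nil t t)
(my-demo-try "prote*" '("prot" "prote" "proteeee")) ; => (t t t)
#+end_src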

** Grouped matches

A group is enclosed inside escaped parentheses =\(GROUP\)=.

Match both of these, including the optional suffix =ig=:

conf
config
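
A possible pattern is =conf\(ig\)?=, where the postfix =?= applies to
the whole group.  The sketch below (with a hypothetical function name)
returns the full match together with the content of group 1, so you can
see when the group took part in the match.

#+begin_src emacs-lisp
;; Hypothetical helper for inspecting an optional group.
(defun my-demo-conf-groups (word)
  "Return (FULL-MATCH GROUP-1) for WORD, or nil if it does not match."
  (let ((case-fold-search nil))
    ;; In a Lisp string the group is written \\( ... \\).
    (when (string-match "\\`conf\\(ig\\)?\\'" word)
      (list (match-string 0 word) (match-string 1 word)))))

(my-demo-conf-groups "conf")    ; => ("conf" nil)
(my-demo-conf-groups "config")  ; => ("config" "ig")
(my-demo-conf-groups "confi")   ; => nil
#+end_src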

** Greedy versus non-greedy

Postfix operators are greedy by default.  A "greedy" operator matches
the longest possible part, whereas a "non-greedy" one matches the
shortest.

A non-greedy variant is used when the postfix is followed by =?=.

Using the =.*= construct, match items both greedily and not:

Hello world
Hello world world world world
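
The difference is easiest to see by printing what each variant
captures.  This is only a sketch: =my-demo-first-match= is a
hypothetical helper and the patterns =Hello.*world= and =Hello.*?world=
are one way to do the exercise.

#+begin_src emacs-lisp
;; Hypothetical helper: return the text of the first match.
(defun my-demo-first-match (regexp string)
  "Return the part of STRING matched by REGEXP, or nil."
  (let ((case-fold-search nil))
    (when (string-match regexp string)
      (match-string 0 string))))

;; Greedy: .* extends to the last "world" on the line.
(my-demo-first-match "Hello.*world" "Hello world world world world")
;; => "Hello world world world world"

;; Non-greedy: .*? stops at the first "world".
(my-demo-first-match "Hello.*?world" "Hello world world world world")
;; => "Hello world"
#+end_src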

** Multiple groups

Match the alphabetic and numeric parts in two separate groups.

emacs27
emacs26
emacs25
emacs24
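
One candidate is =\([a-z]+\)\([0-9]+\)=.  The following sketch, with a
hypothetical function name, returns the two groups as a pair so that
you can confirm where one ends and the other begins.

#+begin_src emacs-lisp
;; Hypothetical helper: split a string such as "emacs27" in two.
(defun my-demo-alpha-and-number (string)
  "Return (ALPHA . NUMBER) from STRING, or nil if there is no match."
  (let ((case-fold-search nil))
    (when (string-match "\\([a-z]+\\)\\([0-9]+\\)" string)
      (cons (match-string 1 string) (match-string 2 string)))))

(mapcar #'my-demo-alpha-and-number '("emacs27" "emacs26"))
;; => (("emacs" . "27") ("emacs" . "26"))
#+end_src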

** Literal hyphen and dot

Match the hyphen as part of the alphabetic group and the dot as part
of the numeric one.

emacs-27.1
emacs-26.3
emacs-25.2
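
Inside a character set the dot is literal, and so is a hyphen placed
last in the set.  Outside a set, a literal dot is written =\.=.  The
sketch below takes the first route with =\([a-z-]+\)\([0-9.]+\)=.  It
is one possible answer under that assumption, and the function name is
hypothetical.

#+begin_src emacs-lisp
;; Hypothetical helper: split a string such as "emacs-27.1" in two.
(defun my-demo-name-and-version (string)
  "Return (NAME . VERSION) from STRING, or nil if there is no match."
  (let ((case-fold-search nil))
    ;; [a-z-] is a-z plus a literal hyphen; [0-9.] is digits plus a dot.
    (when (string-match "\\([a-z-]+\\)\\([0-9.]+\\)" string)
      (cons (match-string 1 string) (match-string 2 string)))))

(mapcar #'my-demo-name-and-version
        '("emacs-27.1" "emacs-26.3" "emacs-25.2"))
;; => (("emacs-" . "27.1") ("emacs-" . "26.3") ("emacs-" . "25.2"))
#+end_src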

** Exclude sets

To exclude a set you prepend a caret sign: =[^SET]=

Match every line except those that start with a capital letter.

GNU
Emacs
org-mode
regexp
emacs_lisp
Linux
guix
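
A likely answer is =^[^A-Z]=: the first caret anchors to the start of
the line, the second one negates the set.  The sketch uses a
hypothetical helper and binds =case-fold-search= to nil, because with
case folding enabled the range =[A-Z]= also covers lowercase letters
and the exclusion would reject every line below.

#+begin_src emacs-lisp
;; Hypothetical helper: keep lines that do not start with a capital.
(defun my-demo-not-capitalised (lines)
  "Return the elements of LINES whose first character is not A-Z."
  (let ((case-fold-search nil))   ; required for [A-Z] to mean capitals
    (delq nil
          (mapcar (lambda (line)
                    (and (string-match-p "^[^A-Z]" line) line))
                  lines))))

(my-demo-not-capitalised
 '("GNU" "Emacs" "org-mode" "regexp" "emacs_lisp" "Linux" "guix"))
;; => ("org-mode" "regexp" "emacs_lisp" "guix")
#+end_src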

** Alternative groups with literal brackets

Use a character set that matches =name= and =nom=.

name
nom

Then:

1. Match the =username= variants' =[name]= or =[nom]=.
2. Replace the match with =[PROT]=.


<username><![CDATA[name]]></username>
<nameuser><![CDATA[nam]]></nameuser>
<userName><![CDATA[nom]]></userName>
<nameuser><![CDATA[nome]]></nameuser>
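
What follows is one possible reading of the exercise, not necessarily
the pattern used in the video.  It assumes =n[ao]me?= for the two words,
=\[= for the literal opening bracket, and a leading group that limits
the match to the =username= and =userName= tags.  The function name is
hypothetical.

#+begin_src emacs-lisp
;; Hypothetical helper: replace [name] or [nom] with [PROT].
(defun my-demo-replace-cdata (line)
  "Return LINE with the CDATA value of a username tag set to PROT."
  (let ((case-fold-search nil))
    (replace-regexp-in-string
     ;; Group 1 is the tag and the CDATA prefix.  \\[ is a literal
     ;; opening bracket; a closing bracket outside a set is literal.
     "\\(<user[Nn]ame><!\\[CDATA\\)\\[n[ao]me?]"
     "\\1[PROT]"
     line
     t)))                        ; FIXEDCASE: keep PROT in capitals

(my-demo-replace-cdata "<username><![CDATA[name]]></username>")
;; => "<username><![CDATA[PROT]]></username>"
(my-demo-replace-cdata "<userName><![CDATA[nom]]></userName>")
;; => "<userName><![CDATA[PROT]]></userName>"
(my-demo-replace-cdata "<nameuser><![CDATA[nam]]></nameuser>")
;; => "<nameuser><![CDATA[nam]]></nameuser>" (unchanged)
#+end_src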

** Either match

To target either set, use =\|=.

Prepend =vr/= to the first =group= and =match= on each line.

`(group-0 ((group (:inherit modus-theme-intense-blue))))
`(group-1 ((group (:inherit modus-theme-intense-magenta))))
`(group-2 ((group (:inherit modus-theme-intense-green))))
`(match-0 ((match (:inherit modus-theme-refine-yellow))))
`(match-1 ((match (:inherit modus-theme-refine-yellow))))
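
As a sketch of how this could be done from Lisp, the pattern below
anchors to the start of the line so that only the first =group= or
=match= is touched, then reuses the captured word in the replacement.
The function name is hypothetical and the anchoring is my own choice.

#+begin_src emacs-lisp
;; Hypothetical helper: prefix the first group/match symbol with vr/.
(defun my-demo-add-vr-prefix (line)
  "Return LINE with vr/ prepended to its leading group or match."
  (let ((case-fold-search nil))
    (replace-regexp-in-string
     "^`(\\(group\\|match\\)"    ; \\| means "either side may match"
     "`(vr/\\1"                  ; \\1 is whichever word was captured
     line
     t)))

(my-demo-add-vr-prefix
 "`(group-0 ((group (:inherit modus-theme-intense-blue))))")
;; => "`(vr/group-0 ((group (:inherit modus-theme-intense-blue))))"

(my-demo-add-vr-prefix
 "`(match-1 ((match (:inherit modus-theme-refine-yellow))))")
;; => "`(vr/match-1 ((match (:inherit modus-theme-refine-yellow))))"
#+end_src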

** Running elisp functions on groups

Run elisp in the replacement text by escaping the comma =\,= and then
following it with a function call inside parentheses: =\,(FUNCTION)=.

Using the =.ired= pattern from earlier, run a replace command where you
must execute the =upcase= function on the second/middle match.  Keep the
rest intact.

direddireddired
firedfiredfired
miredmiredmired
tiredtiredtired
wiredwiredwired
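
In the replace prompt, a likely solution is to match with
=\(.ired\)\(.ired\)\(.ired\)= and replace with =\1\,(upcase \2)\3=.
The =\,= syntax only exists in the interactive replace commands, so the
sketch below approximates it in Lisp by passing a function as the
replacement.  The function name is hypothetical.

#+begin_src emacs-lisp
;; Hypothetical helper: upcase the middle of three .ired words.
(defun my-demo-upcase-middle (string)
  "Return STRING with the second of three \".ired\" matches upcased."
  (let ((case-fold-search nil))
    (replace-regexp-in-string
     "\\(.ired\\)\\(.ired\\)\\(.ired\\)"
     ;; The function receives the matched text and can read its groups.
     (lambda (match)
       (concat (match-string 1 match)
               (upcase (match-string 2 match))
               (match-string 3 match)))
     string
     t      ; FIXEDCASE: do not adjust the case of the result
     t)))   ; LITERAL: insert the returned text verbatim

(my-demo-upcase-middle "direddireddired")  ; => "diredDIREDdired"
(my-demo-upcase-middle "wiredwiredwired")  ; => "wiredWIREDwired"
#+end_src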