
Mastering sed and Regular Expressions: From Basics to Advanced Text Processing

updated: 2026/04/16 created: 2026/04/16

Introduction

When it comes to manipulating strings on the command line, sed is an incredibly powerful option. Whether you are replacing file content or extracting specific lines, combining it with regular expressions allows you to complete complex tasks in a single line.

This article provides a step-by-step guide from basic usage to advanced techniques using regular expressions. It is structured to be accessible even for those unfamiliar with the command line, highlighting common pitfalls for beginners.

Reference: sed, a stream editor - GNU Official Documentation

How sed and Regex Replace the First Character with “H”

The primary command we will examine is:

sed 's/./H/' input.txt
Element     Description
sed         The stream editor. Processes files or standard input line by line.
's/./H/'    The substitution command. Format: s/search_pattern/replacement_string/.
.           A regex metacharacter matching any single character.
H           The string to replace the match with.
input.txt   The target file for processing.

The s command without flags only replaces the first match in each line. Consequently, only the very first character of every line changes to "H".

Before Execution

First, create input.txt using the following command. Line 5 begins with a tab character; if you type this heredoc interactively, press Ctrl+v and then Tab to insert a literal tab, because Tab alone usually triggers completion in the shell.

cat << 'EOF' > input.txt
hello.
world.

hello world.
	hello.
1hello 2world.
EOF

input.txt:

hello.
world.

hello world.
	hello.
1hello 2world.

After Execution

sed 's/./H/' input.txt

Output:

Hello.
Horld.

Hello world.
Hhello.
Hhello 2world.

Empty lines remain unchanged because there is no character to match. In Line 5, the leading tab character is replaced by "H". In Line 6, the "1" is the first character, so it becomes "H".


Differences Between GNU sed and BSD sed

macOS typically comes with the BSD version of sed, while Linux usually features the GNU version. Their behaviors can differ.

Item                   GNU sed                           BSD sed
-i (in-place editing)  sed -i 's/a/b/' file              sed -i '' 's/a/b/' file (-i always takes a backup-suffix argument; '' means no backup)
\t (tab)               Supported in regex                May not be supported
\+, \?                 Supported in BRE (GNU extension)  Often not supported
-E                     Enables extended regex (ERE)      Also enables ERE

Unless otherwise noted, commands in this article are written for the BSD version.
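Because the -i syntax differs, one commonly used workaround for scripts that must run on both systems is to attach the suffix directly to -i (as in -i.bak), a form that GNU and BSD sed both accept, or to skip -i entirely and go through a temporary file. A minimal sketch, using throwaway file names chosen only for this example:

```shell
# Throwaway file to edit (inplace.txt is just an example name).
printf 'a\n' > inplace.txt

# Suffix attached to -i with no space: accepted by both GNU and BSD sed.
# The original content is kept in inplace.txt.bak.
sed -i.bak 's/a/b/' inplace.txt

# POSIX-only alternative without -i: write to a temp file, then rename it.
printf 'a\n' > posix.txt
sed 's/a/b/' posix.txt > posix.txt.tmp && mv posix.txt.tmp posix.txt

cat inplace.txt inplace.txt.bak posix.txt
```

Delete the .bak file once you have confirmed the edit is correct.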

Replacing All Occurrences of hello with HI

sed 's/hello/HI/g' input.txt
Element   Description
s         Substitute command.
hello     The string to search for.
HI        The replacement string.
g         The global flag. Replaces all matches within a line.

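Without the g flag, only the first "hello" on each line is replaced; with it, every occurrence on the line is changed. In input.txt each line contains at most one "hello", so both versions produce the same output here; a line such as "hello hello" shows the difference.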
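A small check that makes the g flag visible, assuming a scratch file twice.txt (a name used only here) whose single line contains hello twice:

```shell
# One line with two occurrences of "hello".
printf 'hello hello\n' > twice.txt

first_only=$(sed 's/hello/HI/' twice.txt)   # no flag: first match only -> HI hello
every_one=$(sed 's/hello/HI/g' twice.txt)   # g flag: every match       -> HI HI

printf '%s\n' "$first_only" "$every_one"
```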

Extracting Only Lines That Contain hello

sed -n '/hello/p' input.txt
Element   Description
-n        Suppresses default output.
/hello/   The pattern to match.
p         The print command. Explicitly outputs the matched line.

By default, sed outputs every line. Combining -n to suppress output with p to explicitly print matched lines achieves grep-like extraction behavior.
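A quick way to see why -n matters, using a scratch file lines.txt (hypothetical name): without -n, each matching line is printed twice, once by p and once by the automatic output, while with -n the result is the same as grep.

```shell
printf 'hello.\nworld.\n\nhello world.\n' > lines.txt

# Without -n: matching lines appear twice (p plus the automatic print).
doubled=$(sed '/hello/p' lines.txt)

# With -n: only the lines printed by p, like grep.
via_sed=$(sed -n '/hello/p' lines.txt)
via_grep=$(grep 'hello' lines.txt)

printf '%s\n' "$via_sed"
```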

Extracting a Column Using Backreferences

sed -n 's/\(hello\).*/\1/p' input.txt
Element     Description
\(hello\)   Grouping. The matched content can be referenced as \1.
.*          Matches any sequence of characters (zero or more).
\1          References the string matched by the first group.
p           Outputs only if the substitution was successful.

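By combining regular expression grouping with backreferences, you can keep only part of a line, effectively pulling out a specific column. Note that this command keeps everything before the match: on line 6, "1hello 2world." becomes "1hello", and the leading tab on line 5 is preserved. To output only the captured text, match the start of the line as well: sed -n 's/.*\(hello\).*/\1/p' input.txt.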
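A sketch comparing the command in this section with a variant that also matches the text before the group, assuming a one-line scratch file col.txt:

```shell
printf '1hello 2world.\n' > col.txt

# Only "hello" and what follows are replaced, so the prefix "1" survives.
keep_prefix=$(sed -n 's/\(hello\).*/\1/p' col.txt)   # -> 1hello

# Leading .* consumes the prefix too, leaving just the group.
# (The leading .* is greedy, so with several hellos the last one is captured.)
only_group=$(sed -n 's/.*\(hello\).*/\1/p' col.txt)  # -> hello

printf '%s\n' "$keep_prefix" "$only_group"
```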

Swapping the Order of hello and world Using Backreferences

sed -n 's/\(hello\) \(world\)/\2 \1/p' input.txt
Element     Description
\(hello\)   First group (referenced as \1)
\(world\)   Second group (referenced as \2)
\2 \1       Outputs the groups in reversed order

Backreferences are numbered in order as \1, \2, and so on. On lines matching hello world, the substitution is applied and the output becomes world hello.

Removing Consecutive Blank Lines

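sed '/^$/N;/\n$/D' input.txt
Element   Description
/^$/N     If the current line is empty, append the next line to the pattern space, separated by a newline
/\n$/D    If the pattern space now ends in a newline (the appended line was empty too), delete up to the first newline and restart the cycle without reading a new line

By default, sed processes one line at a time. The N command also reads the next line, so two lines can be examined together. Without N, the pattern space never contains a newline character and /\n$/ can never match. This is why N is necessary when collapsing consecutive blank lines: each run of blank lines is reduced to a single blank line, similar to cat -s. Because input.txt contains only one blank line, its output is unchanged. Note that when N runs out of input, GNU sed prints the pattern space while many other implementations exit without printing it, so a blank line at the very end of a file may be dropped outside GNU sed.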
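A self-checking sketch of the widely used one-liner sed '/^$/N;/\n$/D' on a hypothetical file squeeze.txt that contains several blank lines in a row:

```shell
# Three blank lines between a and b, one between b and c.
printf 'a\n\n\n\nb\n\nc\n' > squeeze.txt

# /^$/N   : on an empty line, pull in the next line.
# /\n$/D  : if that line was empty too, drop the first line and loop.
out=$(sed '/^$/N;/\n$/D' squeeze.txt)

printf '%s\n' "$out"
```

Each run of blank lines collapses to a single blank line, the same effect that cat -s has.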

Replacing Spaces with Underscores

sed 's/ /_/g' input.txt
Element   Description
(space)   A half-width space
_         The replacement string
g         Replaces all matches on each line

Although it looks straightforward, a common mistake is confusing half-width and full-width spaces. When a full-width space is present, the pattern will not match. Always pay close attention to the type of whitespace character when working with spaces in regular expressions.

Replacing Tab Characters with the String SPACE

sed 's/\t/SPACE/g' input.txt
Element   Description
\t        Escape sequence representing a tab character
SPACE     The replacement string

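In BSD sed, \t may not be recognized inside regular expressions. In that case, let the shell produce the tab instead: in bash or zsh, ANSI-C quoting such as sed $'s/\t/SPACE/g' input.txt converts \t into a real tab before sed sees it. Alternatively, type a literal tab by pressing Ctrl+v followed by Tab. Since line 5 of input.txt begins with a tab, this command can be used to verify the behavior.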
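A portable sketch that does not depend on sed understanding \t: the shell builds the tab with printf and substitutes it into the script. The file name tab.txt is only for illustration.

```shell
printf 'a\tb\n' > tab.txt

# printf '\t' is handled by the shell's printf, not by sed.
tab=$(printf '\t')
out=$(sed "s/${tab}/SPACE/g" tab.txt)   # -> aSPACEb

printf '%s\n' "$out"
```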

Escaping the Dot (.) and Replacing It with DOT

sed 's/\./DOT/g' input.txt
Element   Description
\.        An escaped dot. Matches a literal . character
DOT       The replacement string

In regular expressions, . is a metacharacter meaning "any single character." To match a literal dot, it must be escaped with a backslash (\.). Forgetting the escape makes the pattern match every character, so with the g flag every character on the line is replaced with DOT.
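A sketch contrasting the unescaped and escaped dot on a scratch file dot.txt. Writing the dot inside a bracket expression, [.], is another way to make it literal, because . has no special meaning inside brackets.

```shell
printf 'a.b\n' > dot.txt

unescaped=$(sed 's/./DOT/g' dot.txt)    # every character matches -> DOTDOTDOT
escaped=$(sed 's/\./DOT/g' dot.txt)     # only the literal dot     -> aDOTb
bracketed=$(sed 's/[.]/DOT/g' dot.txt)  # [.] is also literal      -> aDOTb

printf '%s\n' "$unescaped" "$escaped" "$bracketed"
```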

Replacing Digits with #

sed 's/[0-9]/#/g' input.txt
Element   Description
[0-9]     A character class matching any single digit from 0 to 9
#         The replacement string

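[0-9] is a regular expression character class that represents a single digit. The Perl-style \d is not a digit class in sed: BSD sed does not support it, and in GNU sed \dNNN instead means the character whose decimal code is NNN. [0-9], or the POSIX class [[:digit:]], is the safe and portable choice.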
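A sketch using a date-like line, showing that [0-9] and [[:digit:]] behave the same and that [0-9][0-9]* (one or more digits, written in BRE) replaces a whole run of digits at once. digits.txt is a scratch file name.

```shell
printf '2024-04-16\n' > digits.txt

each_digit=$(sed 's/[0-9]/#/g' digits.txt)          # -> ####-##-##
posix_class=$(sed 's/[[:digit:]]/#/g' digits.txt)   # -> ####-##-##
digit_runs=$(sed 's/[0-9][0-9]*/#/g' digits.txt)    # -> #-#-#

printf '%s\n' "$each_digit" "$posix_class" "$digit_runs"
```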

Dynamic Substitution Using Shell Variables and $1

Wrapping the sed command in double quotes allows shell variables to be expanded.

var="argument"
sed "s/hello/$var/" input.txt
sed 's/hello/$var/' input.txt
Element           Description
"s/hello/$var/"   Double quotes. The shell expands $var to argument
's/hello/$var/'   Single quotes. $var is treated as a literal string

Variables are not expanded inside single quotes. In shell scripts, $1 (the first positional argument) is commonly used. For example, sed "s/hello/$1/" input.txt allows the replacement string to be passed dynamically when the script is run.
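A sketch of the quoting rules, plus a hypothetical helper replace_hello standing in for a script that takes the replacement as $1. Because the expanded value becomes part of the sed script, characters such as /, & or \ in it are interpreted by sed; switching the s delimiter (here to |) is one common way to handle values that contain slashes.

```shell
printf 'hello world.\n' > vars.txt

var="argument"
double=$(sed "s/hello/$var/" vars.txt)   # shell expands $var -> argument world.
single=$(sed 's/hello/$var/' vars.txt)   # sed sees literal $var -> $var world.

# Hypothetical helper mimicking a script: $1 = replacement, $2 = file.
replace_hello() {
  sed "s/hello/$1/" "$2"
}
from_arg=$(replace_hello "HI" vars.txt)  # -> HI world.

# A value containing / would end the s command early; use another delimiter.
dir="/usr/local"
with_slash=$(sed "s|hello|$dir|" vars.txt)  # -> /usr/local world.

printf '%s\n' "$double" "$single" "$from_arg" "$with_slash"
```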

Greedy and Non-Greedy Matching

sed -E 's/h.*o/X/' input.txt
Element   Description
h.*o      Matches the longest possible string starting with h and ending with o (greedy matching)

By default, .* in sed performs greedy (longest) matching. When applied to hello world., h.*o matches as far right as possible: instead of stopping at the first o, it continues to the last o on the line, so the line becomes Xrld.

Non-greedy (shortest) quantifiers such as .*? are supported by neither BSD sed nor GNU sed. The usual workaround is a negated character class, as in h[^o]*o, which cannot extend past the first o. If true non-greedy matching is essential, consider using Perl or Python instead.
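A sketch comparing greedy .* with the negated-class workaround on the line hello world., using a scratch file greedy.txt:

```shell
printf 'hello world.\n' > greedy.txt

greedy=$(sed 's/h.*o/X/' greedy.txt)      # runs to the last o    -> Xrld.
bounded=$(sed 's/h[^o]*o/X/' greedy.txt)  # stops at the first o  -> X world.

printf '%s\n' "$greedy" "$bounded"
```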

Replacing Only on a Specific Line

sed '4s/hello/HI/' input.txt
Element       Description
4             Address specifier targeting only line 4
s/hello/HI/   Substitution command

Address specifiers allow processing to be limited to a specific line number or lines matching a regular expression. A range such as 2,4s/hello/HI/ is also supported.
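A sketch of single-line, range, and regex addresses on a hypothetical four-line file addr.txt:

```shell
printf 'hello one\nhello two\nhello three\nhello world\n' > addr.txt

line_2=$(sed '2s/hello/HI/' addr.txt)          # only line 2
lines_2_3=$(sed '2,3s/hello/HI/' addr.txt)     # lines 2 through 3
by_regex=$(sed '/world/s/hello/HI/' addr.txt)  # only lines matching /world/

printf '%s\n---\n' "$line_2" "$lines_2_3" "$by_regex"
```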

About BRE and ERE

sed supports two types of regular expressions: BRE (Basic Regular Expressions) and ERE (Extended Regular Expressions).

BRE example:

sed 's/\(hello\)/HI/' input.txt

In BRE, grouping requires \( and \).

ERE example:

sed -E 's/(hello)/HI/' input.txt

With the -E option, ERE is enabled and grouping can be done with just ( and ). Syntax that requires \( in BRE can be written more cleanly in ERE.

Replacing hello or world with X Using Extended Regular Expressions

sed -E 's/(hello|world)/X/g' input.txt
Element         Description
-E              Enables extended regular expressions
(hello|world)   Matches either hello or world
X               The replacement string

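In ERE, | works without escaping. In BRE, POSIX defines no alternation operator and an unescaped | is an ordinary character; GNU sed accepts \| as an extension.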
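A sketch contrasting ERE alternation with the same pattern written in BRE, where an unescaped | is an ordinary character. alt.txt is a scratch file whose second line contains a literal |.

```shell
printf 'hello world.\nhello|world\n' > alt.txt

# ERE: | means "or", so hello and world are replaced separately.
ere=$(sed -E 's/(hello|world)/X/g' alt.txt)   # -> X X. / X|X

# BRE: | is literal, so only the exact text hello|world matches.
bre=$(sed 's/hello|world/X/' alt.txt)         # -> hello world. / X

printf '%s\n---\n' "$ere" "$bre"
```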

Quick Reference: Common Regular Expressions in sed

Expression   Meaning                                            Example
.            Any single character                               s/./X/
*            Zero or more repetitions of the preceding element  s/el*/X/
^            Beginning of line                                  s/^/> /
$            End of line                                        s/$/ end/
[abc]        Any one of a, b, or c                              s/[abc]/*/g
[^abc]       Any character except a, b, or c                    s/[^abc]//g
[0-9]        Any single digit                                   s/[0-9]/#/g
\(…\)        Grouping (BRE)                                     s/\(hello\)/[\1]/
\1           Backreference                                      s/\(hello\) \(world\)/\2\1/
\.           Literal dot                                        s/\./,/g
\t           Tab character (GNU sed)                            s/\t/ /g
+            One or more repetitions (ERE, needs -E)            s/[0-9]+/#/g
?            Zero or one occurrence (ERE, needs -E)             s/e?l/X/g
|            Alternation (ERE, needs -E)                        s/hello|world/X/g

Reverse Lookup: Find the Command for What You Want to Do

Example 1: Add a string at the beginning of each line

cat << 'EOF' > input.txt
hello.
world.
EOF
sed 's/^/> /' input.txt

^ matches the beginning of a line, and > is inserted there.

Example 2: Remove trailing spaces from each line

cat << 'EOF' > input.txt
hello   
world   
EOF
sed 's/ *$//' input.txt

$ matches the end of a line, and any spaces immediately before it are removed.
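The pattern ' *$' removes only spaces. If trailing whitespace may include tabs, the POSIX class [[:blank:]] (space and tab) covers both. A sketch with a hypothetical file trailing.txt whose first line ends in space, tab, space:

```shell
printf 'hello \t \nworld\n' > trailing.txt

spaces_only=$(sed 's/ *$//' trailing.txt)         # stops at the tab
blanks=$(sed 's/[[:blank:]]*$//' trailing.txt)    # removes spaces and tabs

printf '%s\n' "$spaces_only" "$blanks"
```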

Example 3: Delete blank lines

cat << 'EOF' > input.txt
hello.

world.
EOF
sed '/^$/d' input.txt

^$ matches a line where the beginning and end are adjacent — in other words, an empty line. The d command deletes those lines.
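Lines that contain only spaces or tabs are not empty, so /^$/ keeps them. A sketch comparing it with /^[[:space:]]*$/, assuming a scratch file blanks.txt whose third line holds three spaces:

```shell
printf 'hello.\n\n   \nworld.\n' > blanks.txt

strict=$(sed '/^$/d' blanks.txt)              # deletes only truly empty lines
loose=$(sed '/^[[:space:]]*$/d' blanks.txt)   # also deletes whitespace-only lines

printf '%s\n---\n' "$strict" "$loose"
```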

Common Pitfalls When Commands Don’t Work as Expected

Full-width space deletion has no effect

cat << 'EOF' > sample.txt
hello　world
EOF
sed 's/ //g' sample.txt

Here sample.txt contains a full-width space (U+3000) between hello and world, but the command contains an ordinary half-width space, which easily happens when the character is converted during copy-paste or by terminal encoding. Because the two characters differ, the pattern does not match and nothing changes. Use cat -A (GNU coreutils) or the portable od -c to check the actual bytes in the file.
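A sketch that reproduces the mismatch without relying on copy-paste: printf writes the UTF-8 bytes of U+3000 (octal 343 200 200), so the file is guaranteed to contain a real full-width space. fullwidth.txt is a scratch file name.

```shell
# UTF-8 encoding of U+3000 (IDEOGRAPHIC SPACE), written as octal escapes.
fw=$(printf '\343\200\200')
printf 'hello%sworld\n' "$fw" > fullwidth.txt

ascii_pattern=$(sed 's/ //g' fullwidth.txt)   # half-width space: no match
fw_pattern=$(sed "s/$fw//g" fullwidth.txt)    # real U+3000 in the pattern

printf '%s\n' "$ascii_pattern" "$fw_pattern"
od -c fullwidth.txt   # shows every byte, so the multibyte space is visible
```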

HTML tag removal deletes the entire line

cat << 'EOF' > sample.txt
<p>hello</p> and <span>world</span>
EOF
sed 's/<.*>//g' sample.txt

Because .* is greedy, it matches from the first < all the way to the last > on the line, removing everything in between. To limit matching to individual tags, use [^>]* instead of .*.
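A sketch comparing both patterns on the same line, using a scratch file tags.txt:

```shell
printf '<p>hello</p> and <span>world</span>\n' > tags.txt

greedy=$(sed 's/<.*>//g' tags.txt)       # one match from first < to last >
per_tag=$(sed 's/<[^>]*>//g' tags.txt)   # each match stops at the next >

printf '[%s]\n[%s]\n' "$greedy" "$per_tag"
```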

Backslash errors in date format conversion

cat << 'EOF' > sample.txt
2024-04-16
EOF
sed 's/\([0-9]\+\)-\([0-9]\+\)-\([0-9]\+\)/\3\/\2\/\1/' sample.txt

In BSD sed, \+ (one or more repetitions) may not be supported. In that case, rewrite it as [0-9][0-9]*, or switch to extended regular expressions using the -E option.
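Two portable rewrites of the date conversion, assuming a scratch file date.txt: one stays in BRE with [0-9][0-9]*, the other uses -E together with a # delimiter so the slashes in the replacement need no backslashes.

```shell
printf '2024-04-16\n' > date.txt

# BRE: [0-9][0-9]* replaces the GNU-only \+.
bre=$(sed 's/\([0-9][0-9]*\)-\([0-9][0-9]*\)-\([0-9][0-9]*\)/\3\/\2\/\1/' date.txt)

# ERE: + works with -E; # as delimiter lets / appear unescaped.
ere=$(sed -E 's#([0-9]+)-([0-9]+)-([0-9]+)#\3/\2/\1#' date.txt)

printf '%s\n' "$bre" "$ere"
```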

Broaden Your String Processing Skills with sed and Regular Expressions

sed may look simple at first glance, but combined with regular expressions it handles a wide range of tasks — substitution, extraction, formatting, and more. Beginners often find the differences between BRE and ERE, or the behavioral gaps between BSD and GNU sed, confusing at first. Running the commands in this article on actual files is the best way to build real understanding. Start small, experiment freely, and gradually expand your range of applications.


©︎ 2025-2026 running terminal commands