Objective: Remove ANSI escape sequences from an input file on UNIX / Linux.
ANSI escape code (or escape sequences) is a method of in-band signaling used to control the formatting, color, and other output options on video text terminals. To encode this formatting information, it embeds certain sequences of bytes into the text, which the terminal looks for and interprets as commands rather than as character codes.
ANSI escape sequences start with the ESC character (0x1B), and the most common sequence is called CSI (Control Sequence Introducer, or Control Sequence Initiator). A CSI sequence starts with the ‘ESC’ (0x1B) and ‘[’ (left bracket, 0x5B) characters.
For ANSI color and styling (SGR, Select Graphic Rendition), a CSI sequence ends with the ‘m’ character. An example ANSI escape sequence is shown below; it switches the foreground color to black.
\x1b[30m
The above has 1 SGR parameter (30). Below are examples with 2 and 3 SGR parameters. Multiple SGR parameters are separated by a ‘;’ character.
\x1b[1;33m
\x1b[1;33;41m
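To see these bytes directly, the sketch below dumps the 1-parameter example with od and makes the 3-parameter example readable with cat -v, which prints the invisible ESC byte as ^[. It assumes a POSIX-like shell whose printf understands octal escapes such as \033 (the octal form of 0x1B); the sample text "text" is made up for illustration.

```shell
# Raw bytes of the 1-parameter sequence: ESC (1b), '[' (5b), '3' (33), '0' (30), 'm' (6d)
bytes=$(printf '\033[30m' | od -An -tx1)
echo "$bytes"

# cat -v renders the ESC byte as ^[ so the whole sequence becomes visible
visible=$(printf '\033[1;33;41mtext\033[0m\n' | cat -v)
echo "$visible"    # ^[[1;33;41mtext^[[0m
```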
To remove ANSI SGR escape sequences from a file “ansi.log”, use the following GNU sed syntax. It handles sequences with up to 2 SGR parameters of 1 or 2 digits each.
$ sed -r "s/\x1b\[([0-9]{1,2}(;[0-9]{1,2})?)?m//g" < ansi.log > noansi.log
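A quick end-to-end run of the command above, assuming GNU sed (the -r option and the \x1b escape are GNU extensions and may not be supported by other sed implementations). The contents written to ansi.log are invented for illustration.

```shell
# Create a sample ansi.log with a 2-parameter SGR sequence (bold yellow) and a reset (0)
printf '\033[1;33mWARN\033[0m disk almost full\n' > ansi.log

# Remove SGR sequences with up to 2 parameters
sed -r "s/\x1b\[([0-9]{1,2}(;[0-9]{1,2})?)?m//g" < ansi.log > noansi.log

cat noansi.log    # WARN disk almost full
```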
EL (Erase in Line) sequences end with ‘K’ rather than ‘m’. To handle both SGR and EL sequences, the sed syntax has to be modified slightly.
$ sed -r "s/\x1b\[([0-9]{1,2}(;[0-9]{1,2})?)?[mK]//g" < ansi.log > noansi.log
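The sample below mixes SGR and EL sequences in the style of GNU grep's colored output, where each color change is followed by ESC[K. It uses the bracket expression [mK]: inside brackets ‘|’ is an ordinary character, so [m|K] would also accept ‘|’ as a terminator. The sample text is made up, and GNU sed is assumed.

```shell
# Sample with SGR sequences, each followed by EL (ESC[K), plus a parameterless reset (ESC[m)
printf '\033[01;31m\033[Kerror\033[m\033[K: file not found\n' > ansi.log

# Strip both SGR (ending in m) and EL (ending in K) sequences
sed -r "s/\x1b\[([0-9]{1,2}(;[0-9]{1,2})?)?[mK]//g" < ansi.log > noansi.log

cat noansi.log    # error: file not found
```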
If the ANSI sequence has 3 or more SGR parameters, the above will not work, because the “?” quantifier in the regular expression matches only zero or one occurrence of the preceding element (here, one additional ‘;NN’ parameter). To match any number of additional parameters, replace it with the ‘*’ quantifier.
$ sed -r "s/\x1b\[([0-9]{1,2}(;[0-9]{1,2})*)?[mK]//g" < ansi.log > noansi.log
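To see why the quantifier matters, this sketch runs the ‘?’ and ‘*’ versions over the same 3-parameter sample and shows each result with cat -v. The variable names and sample text are illustrative; GNU sed is assumed.

```shell
# A 3-parameter SGR sequence followed by a reset
sample=$(printf '\033[1;33;41mALERT\033[0m')

# '?' version: only the reset is removed; the 3-parameter sequence survives
with_opt=$(printf '%s\n' "$sample" | sed -r "s/\x1b\[([0-9]{1,2}(;[0-9]{1,2})?)?[mK]//g" | cat -v)

# '*' version: both sequences are removed
with_star=$(printf '%s\n' "$sample" | sed -r "s/\x1b\[([0-9]{1,2}(;[0-9]{1,2})*)?[mK]//g" | cat -v)

echo "$with_opt"     # ^[[1;33;41mALERT
echo "$with_star"    # ALERT
```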
The following printf statement prints bold yellow text on a red background (\033 is the octal form of 0x1B, and \033[0m resets all attributes). sed then removes the ANSI escape sequences and prints the text without any formatting.
printf "\033[1;33;41m %s \033[0m\n" "YELLOW on RED" | sed -r "s/\x1b\[([0-9]{1,2}(;[0-9]{1,2})*)?m//g"
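For repeated use, the final expression can be wrapped in a small shell function. The name strip_ansi is hypothetical, the function assumes GNU sed, and it removes only SGR and EL sequences whose parameters have 1 or 2 digits, so, for example, a 256-color sequence such as ESC[38;5;196m (with the 3-digit value 196) is not matched.

```shell
# strip_ansi (hypothetical name): removes SGR (ending in m) and EL (ending in K) sequences.
# With no arguments it filters stdin; otherwise it reads the named files.
strip_ansi() {
    sed -r 's/\x1b\[([0-9]{1,2}(;[0-9]{1,2})*)?[mK]//g' "$@"
}

# Pipeline use, with the same printf statement as above
plain=$(printf "\033[1;33;41m %s \033[0m\n" "YELLOW on RED" | strip_ansi)
echo "[$plain]"    # [ YELLOW on RED ]

# File use: strip_ansi ansi.log > noansi.log
```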