Filters are programs that take plain text (either stored in a file or produced by another program) as standard input, transform it in some way (for example by selecting, sorting or counting lines), and then write the result to standard output. Linux has a number of filters. Some of the most commonly used filters are explained below:
Common Unix/Linux filter programs are cat, cut, grep, head, sort, uniq, and tail. Programs like awk and sed can be used to build quite complex filters because they are fully programmable. Data scientists can also use Unix filters to get a quick overview of a file-based dataset.
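A minimal sketch of the filter idea, assuming a Linux shell with the GNU tools installed. Each command below reads standard input and writes standard output, so the pipe operator | can chain them. The sample words are made up for illustration.

```shell
# printf writes four lines of sample text to standard output.
# sort orders them, uniq drops the now-adjacent duplicate "apple",
# and head keeps only the first two remaining lines.
printf 'banana\napple\ncherry\napple\n' | sort | uniq | head -n 2
# Output:
# apple
# banana
```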
Various Filter Commands with Examples
1. head command: Displays the first n lines of the specified text files. If the number of lines is not specified, it prints the first 10 lines by default.
Syntax: head [-number_of_lines_to_print] [path]
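A short sketch of head, assuming GNU coreutils. Here seq is used only to generate 15 numbered lines of sample input, which head reads from standard input instead of a file.

```shell
# seq 1 15 prints the numbers 1 to 15, one per line.
seq 1 15 | head          # no count given: prints 1 through 10
seq 1 15 | head -3       # the -number form from the syntax above: prints 1, 2, 3
seq 1 15 | head -n 3     # same result using the POSIX -n option
```

The -n form is the one specified by POSIX; GNU head still accepts the older -3 form for compatibility.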
tail command: Works like head, but displays the last n lines of the specified text files (10 by default). The lines are printed in their original order, not reversed; to reverse lines, use tac.
Syntax: tail [-number_of_lines_to_print] [path]
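A matching sketch for tail, again assuming GNU coreutils and using seq only to produce sample input. It shows that tail keeps the original line order.

```shell
seq 1 15 | tail          # last 10 lines: 6 through 15, in their original order
seq 1 15 | tail -3       # prints 13, 14, 15 (not 15, 14, 13)
seq 1 15 | tail -n 3     # same result using the POSIX -n option
```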
2. sort command: Sorts the lines alphabetically by default, but many options are available to modify the sorting behaviour (for example, -n for numeric order and -r for reverse order). Be sure to check the man page (man sort) to see everything it can do.
Syntax: sort [-options] [path]
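A small sketch of sort and two of its options, assuming GNU sort. The sample words and numbers are made up. The last two lines show why -n matters: without it, numbers are compared as text, character by character.

```shell
printf 'pear\napple\nfig\n' | sort        # alphabetical: apple, fig, pear
printf 'pear\napple\nfig\n' | sort -r     # reverse order: pear, fig, apple
printf '10\n9\n100\n' | sort              # text order: 10, 100, 9
printf '10\n9\n100\n' | sort -n           # numeric order: 9, 10, 100
```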
3. uniq command: Removes duplicate lines. uniq has a limitation: it only removes duplicate lines that are adjacent to each other, so its input is usually sorted first.
Syntax: uniq [options] [path]
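The sketch below, assuming GNU uniq and made-up sample lines, shows the adjacency limitation and the usual sort | uniq workaround. It also uses the -c option, which prefixes each output line with the number of times it occurred.

```shell
# The last "a" is not adjacent to the first two.
printf 'a\na\nb\na\n' | uniq              # prints a, b, a: only the adjacent pair is merged
printf 'a\na\nb\na\n' | sort | uniq       # prints a, b: sorting groups the duplicates first
printf 'a\na\nb\na\n' | sort | uniq -c    # prints the counts 3 (for a) and 1 (for b)
```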
4. wc command: Short for word count. The wc command counts the lines, words and bytes in the data.
Syntax: wc [-options] [path]
By default, wc prints four fields, in this order:
number of lines
number of words
number of bytes
path (omitted when wc reads from standard input)
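The sketch below shows the default output for a file and for standard input. It assumes GNU wc and uses a hypothetical file name, sample.txt. Column padding differs between wc implementations, so the comments note only the values.

```shell
# Create a two-line sample file: 2 lines, 5 words, 29 bytes.
printf 'hello world\nsecond line here\n' > sample.txt

wc sample.txt     # prints 2, 5 and 29, followed by the path sample.txt
wc < sample.txt   # same three counts, but no path: wc is reading standard input

rm sample.txt     # remove the sample file
```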
Newer versions of wc can distinguish between the byte count and the character count. The two differ for text in multi-byte encodings such as UTF-8, where a single Unicode character can take several bytes. Select the desired count with -c (bytes) or -m (characters).
GNU wc used to be part of the GNU textutils package; it is now part of GNU coreutils.
Options:
wc -l <filename> print the line count
wc -c <filename> print the byte count
wc -m <filename> print the character count
wc -L <filename> print the length of longest line
wc -w <filename> print the word count
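The options above are sketched below on standard input, assuming GNU wc (-L is a GNU extension). The word "café" is a made-up sample chosen because "é" takes two bytes in UTF-8. The -m result depends on the locale, so treat that comment as an assumption about your environment.

```shell
printf 'hello world\nsecond line here\n' | wc -l   # 2 lines
printf 'hello world\nsecond line here\n' | wc -w   # 5 words
printf 'hello world\nsecond line here\n' | wc -L   # 16, the length of "second line here"

# Byte count versus character count.
printf 'café' | wc -c   # 5 bytes
printf 'café' | wc -m   # 4 characters in a UTF-8 locale (5 in the C locale)
```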
5. grep command: Searches a text file for lines that match a pattern and prints those lines.
Syntax: grep [options] pattern [path]
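A brief sketch of grep and three common options, assuming GNU grep. The log-style sample lines are made up. Matching is case-sensitive unless -i is given.

```shell
printf 'error: disk full\nok\nERROR: timeout\n' | grep error       # only "error: disk full"
printf 'error: disk full\nok\nERROR: timeout\n' | grep -i error    # -i ignores case: both error lines
printf 'error: disk full\nok\nERROR: timeout\n' | grep -v error    # -v inverts: "ok" and "ERROR: timeout"
printf 'error: disk full\nok\nERROR: timeout\n' | grep -ci error   # -c prints the match count: 2
```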
6. tac command: Prints the lines of a file in reverse order, last line first (its name is cat spelled backwards).
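A quick sketch of tac, assuming GNU coreutils (tac is not installed by default on some non-Linux systems). The second line combines it with tail to print the last lines newest-first.

```shell
seq 1 3 | tac               # prints 3, 2, 1
seq 1 5 | tail -n 2 | tac   # last two lines, reversed: 5, 4
```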
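7. cmp command: Compares two files byte by byte and reports the first byte and line where they differ. If the files are identical, it prints nothing.
Syntax: cmp [options] file1 file2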
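A minimal sketch of cmp, assuming GNU diffutils and three hypothetical files created just for the example. cmp's exit status is 0 when the files are identical and 1 when they differ. The exact wording of the difference message varies between versions; older releases say "char" instead of "byte".

```shell
# a.txt and b.txt differ only in line 2 ("two" versus "Two");
# c.txt is an exact copy of a.txt.
printf 'one\ntwo\n' > a.txt
printf 'one\nTwo\n' > b.txt
cp a.txt c.txt

cmp a.txt c.txt && echo "identical"            # cmp prints nothing; exit status 0
cmp a.txt b.txt || echo "cmp exit status: $?"  # reports byte 5, line 2; exit status 1

rm a.txt b.txt c.txt   # remove the sample files
```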
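8. sed command: sed stands for stream editor. It is most often used to search for and replace text.
The expression 's/search/replace/g' replaces every match of search with replace on each line; without the trailing g, only the first match on each line is replaced.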
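A short sketch of the substitute expression, assuming GNU or POSIX sed and a made-up input line. It shows the effect of the g flag. sed writes the edited text to standard output and does not modify its input.

```shell
echo 'red car, red bike' | sed 's/red/blue/'    # first match only: blue car, red bike
echo 'red car, red bike' | sed 's/red/blue/g'   # every match: blue car, blue bike
```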
9. nl command: Numbers the lines of its input. By default, nl numbers only non-empty lines.
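A small sketch of nl, assuming GNU coreutils and made-up input containing one empty line. By default the empty line is left unnumbered. The -b a option (body numbering style "all") numbers every line.

```shell
printf 'first\n\nthird\n' | nl        # numbers "first" as 1 and "third" as 2
printf 'first\n\nthird\n' | nl -b a   # numbers all three lines, including the empty one
```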