Awk is a scripting language used for manipulating data and generating reports. The awk programming language requires no compiling and allows the user to use variables, numeric functions, string functions, and logical operators.

Awk is a utility that enables a programmer to write small but effective programs in the form of statements, each of which defines a text pattern to search for in every line of a document and the action to take when a line matches. Awk is mostly used for pattern scanning and processing: it searches one or more files for lines that match the specified patterns and then performs the associated actions.

The name Awk comes from the initials of its developers: Alfred Aho, Peter Weinberger, and Brian Kernighan.

WHAT CAN WE DO WITH AWK?

1. AWK Operations:
(a) Scans a file line by line
(b) Splits each input line into fields
(c) Compares input line/fields to pattern
(d) Performs action(s) on matched lines

2. Useful For:
(a) Transform data files
(b) Produce formatted reports

3. Programming Constructs:
(a) Format output lines
(b) Arithmetic and string operations
(c) Conditionals and loops
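As an illustration of the programming constructs listed above, here is a small sketch that uses a conditional, arithmetic, a string function, formatted output, and a loop in one program. The item/quantity input is hypothetical.

```shell
# Hypothetical input: an item name and a quantity on each line
printf 'pens 3\npaper 12\nink 0\n' |
awk '{
  if ($2 == 0) next                       # conditional: skip zero-quantity items
  total += $2                             # arithmetic on a numeric field
  printf "%-6s %3d\n", toupper($1), $2    # formatted output plus a string function
}
END {
  for (i = 0; i < 10; i++) printf "-"     # loop: draw a separator line
  printf "\n%-6s %3d\n", "TOTAL", total
}'
# PENS     3
# PAPER   12
# ----------
# TOTAL   15
```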

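Syntax:

awk [options] 'pattern { action }' input-file > output-file

Options:

-F fs : use fs as the input field separator (this sets FS)
-f program-file : read the awk program from program-file instead of from the command line
-v var=value : assign value to the variable var before the program starts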
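A short sketch of the -F, -v, and -f options. The colon-separated input is hypothetical, and report.awk is a hypothetical file name used only to hold the program text.

```shell
# -F sets the input field separator (here a colon)
printf 'alice:45000\nbob:25000\n' | awk -F ':' '{ print $1 }'
# alice
# bob

# -v assigns the variable min before the program runs
printf 'alice:45000\nbob:25000\n' | awk -F ':' -v min=30000 '$2 > min { print $1 }'
# alice

# -f reads the program text from a file instead of the command line
printf '{ print NR ": " $0 }\n' > report.awk
printf 'alice\nbob\n' | awk -f report.awk
# 1: alice
# 2: bob
```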

Sample Commands

Example:
Consider the following text file as the input file for all cases below.

1. Default behavior of Awk: With no pattern and a plain print action, Awk prints every line of data from the specified file.
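A minimal sketch of this command. The six-line employee.txt below (name, designation, department, salary) is hypothetical sample data, not the article's original input file.

```shell
# Hypothetical sample file: name, designation, department, salary
cat > employee.txt <<'EOF'
alice manager account 45000
bob clerk account 25000
carol manager sales 50000
dave peon sales 15000
erin director purchase 80000
frank clerk sales 23000
EOF

# No pattern and a bare print: every line is printed unchanged
awk '{print}' employee.txt
```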

In the above example, no pattern is given, so the action applies to every line. The print action without any argument prints the whole line, so every line of the file is printed.

2. Print the lines that match the given pattern.
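A minimal sketch, again assuming a hypothetical employee.txt; the regular expression /manager/ selects every line that contains that string.

```shell
# Hypothetical sample file: name, designation, department, salary
cat > employee.txt <<'EOF'
alice manager account 45000
bob clerk account 25000
carol manager sales 50000
dave peon sales 15000
erin director purchase 80000
frank clerk sales 23000
EOF

# Pattern /manager/ plus action {print}: only matching lines are printed
awk '/manager/ {print}' employee.txt
# alice manager account 45000
# carol manager sales 50000
```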

In the above example, the awk command prints all the lines that match the pattern ‘manager’.

3. Splitting a Line Into Fields: For each record (i.e., line), the awk command by default splits the record into fields delimited by whitespace and stores them in the variables $1, $2, and so on. If a line has 4 words, they are stored in $1, $2, $3, and $4 respectively, while $0 holds the whole line.
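A minimal sketch, assuming the same kind of hypothetical four-column employee.txt; the comma in print inserts the output field separator (a space by default) between the two fields.

```shell
# Hypothetical sample file: name, designation, department, salary
cat > employee.txt <<'EOF'
alice manager account 45000
bob clerk account 25000
carol manager sales 50000
dave peon sales 15000
erin director purchase 80000
frank clerk sales 23000
EOF

# Print the first field (name) and the fourth field (salary) of each line
awk '{print $1, $4}' employee.txt
# alice 45000
# bob 25000
# ...
```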

In the above example, $1 and $4 represent the Name and Salary fields respectively.

Built-In Variables in Awk

Awk’s built-in variables include the field variables $1, $2, $3, and so on ($0 is the entire line), which break a line of text into individual words or pieces called fields.

NR: NR keeps a running count of the number of input records read so far. Remember that records are usually lines. Awk performs the pattern/action statements once for each record in a file.

NF: NF holds the number of fields in the current input record.

FS: FS contains the field separator used to divide the input line into fields. The default is “white space”, meaning space and tab characters. FS can be reassigned to another character (typically in BEGIN) to change the field separator.

RS: RS stores the current record separator character. Since, by default, an input line is the input record, the default record separator is a newline.

OFS: OFS stores the output field separator, which separates the fields when Awk prints them. The default is a blank space. Whenever print has several arguments separated by commas, it prints the value of OFS between each pair of arguments.

ORS: ORS stores the output record separator, which separates the output lines when Awk prints them. The default is a newline character. print automatically outputs the contents of ORS at the end of whatever it is given to print.
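A small sketch of FS, OFS, ORS, and RS in action, using hypothetical inline input. Each separator is set in a BEGIN block so it is in effect before the first record is read.

```shell
# FS and OFS: read colon-separated input, write comma-separated output
printf 'alice:manager:45000\nbob:clerk:25000\n' |
  awk 'BEGIN { FS = ":"; OFS = "," } { print $1, $3 }'
# alice,45000
# bob,25000

# ORS: end each output record with ";" instead of a newline
printf 'a\nb\nc\n' | awk 'BEGIN { ORS = ";" } { print }'
echo    # ORS replaced the newlines, so end the line manually
# a;b;c;

# RS: treat each comma-separated item as a separate record
printf 'x,y,z' | awk 'BEGIN { RS = "," } { print NR, $0 }'
# 1 x
# 2 y
# 3 z
```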

Examples:

Use of the NR built-in variable (Display Line Number)
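A minimal sketch, assuming a hypothetical six-line employee.txt: NR is printed before the whole line ($0).

```shell
# Hypothetical sample file: name, designation, department, salary
cat > employee.txt <<'EOF'
alice manager account 45000
bob clerk account 25000
carol manager sales 50000
dave peon sales 15000
erin director purchase 80000
frank clerk sales 23000
EOF

# Prefix every line with its record (line) number
awk '{print NR, $0}' employee.txt
# 1 alice manager account 45000
# 2 bob clerk account 25000
# ...
```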

In the above example, the awk command with NR prints all the lines along with the line number.

Use of the NF built-in variable (Display Last Field)
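A minimal sketch with a hypothetical employee.txt: because NF is the number of fields, $NF is always the last field, whatever the line length.

```shell
# Hypothetical sample file: name, designation, department, salary
cat > employee.txt <<'EOF'
alice manager account 45000
bob clerk account 25000
carol manager sales 50000
dave peon sales 15000
erin director purchase 80000
frank clerk sales 23000
EOF

# Print the first field and the last field of each line
awk '{print $1, $NF}' employee.txt
# alice 45000
# bob 25000
# ...
```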

In the above example, $1 represents Name and $NF represents Salary. Because NF holds the number of fields in the line, $NF always refers to the last field, so we can get the Salary with $NF.

Another use of the NR built-in variable (Display Lines 3 to 6)
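A minimal sketch, assuming a hypothetical six-line employee.txt. The range pattern `NR==3, NR==6` starts matching at record 3 and stops after record 6.

```shell
# Hypothetical sample file: name, designation, department, salary
cat > employee.txt <<'EOF'
alice manager account 45000
bob clerk account 25000
carol manager sales 50000
dave peon sales 15000
erin director purchase 80000
frank clerk sales 23000
EOF

# Range pattern: print records 3 through 6 with their line numbers
awk 'NR==3, NR==6 {print NR, $0}' employee.txt
# 3 carol manager sales 50000
# 4 dave peon sales 15000
# 5 erin director purchase 80000
# 6 frank clerk sales 23000
```

The condition `NR >= 3 && NR <= 6` selects the same lines; the range form simply reads more like "from here to there".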

More Examples

For the given text file:

1) To print the first item of each line in geeksforgeeks.txt along with the row number (NR), separated by a dash:

2) To print the second column (item) of each line in geeksforgeeks.txt:

3) To print only the non-empty lines:

4) To find the length of the longest line in the file:

5) To count the lines in a file:

6) To print lines with more than 10 characters:

7) To find/check for a string in a specific column:

8) To print the squares of the numbers from 1 to n, say 6:
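The list above can be sketched as follows, assuming a hypothetical four-line geeksforgeeks.txt (the original sample file is not reproduced here). Because of how this sample is laid out, item 7 checks the second column.

```shell
# Hypothetical sample file; the third line is intentionally empty
cat > geeksforgeeks.txt <<'EOF'
apple A12 10
kiwi B6 20

banana M42 30
EOF

# 1) Row number and first item, separated by " - "
awk '{ print NR " - " $1 }' geeksforgeeks.txt
# 2) Second column (item) of each line
awk '{ print $2 }' geeksforgeeks.txt
# 3) Non-empty lines only: NF is 0 on an empty line
awk 'NF > 0' geeksforgeeks.txt
# 4) Length of the longest line (13 for this sample)
awk '{ if (length($0) > max) max = length($0) } END { print max }' geeksforgeeks.txt
# 5) Line count: NR still holds the last record number in END (4 here)
awk 'END { print NR }' geeksforgeeks.txt
# 6) Lines longer than 10 characters
awk 'length($0) > 10' geeksforgeeks.txt
# 7) Lines whose second column is exactly "B6"
awk '$2 == "B6" { print $0 }' geeksforgeeks.txt
# 8) Squares of 1..6; only a BEGIN block, so no input file is read
awk 'BEGIN { for (i = 1; i <= 6; i++) print "square of", i, "is", i * i }'
```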

This article is contributed by Anshika Goyal and Praveen Negi.

I have 2 data files, each containing one column. I want to make another data file by merging both columns. I have a shell command line that does this, but I don’t know how it works.

Please explain elaborately the below command:

I used the command above in a shell script and got the following output:

How does this command line work on this example to give that output?

How to use the awk command on linux

Well, part of learning to use Unix is figuring out what existing scripts are doing. In this case you need to know a bit about how awk works to understand the code. I will focus on describing the awk part; this should get you started in figuring out the rest.

Basically awk is a pattern-driven scripting language, where commands consist of both a (search) pattern/condition and a corresponding code block. During execution, any input files are read line by line and if the pattern/condition is true for a line, the code block is executed. There are special patterns BEGIN and END which are used to trigger code to get executed before the first line or after the last line is read.
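For example (a minimal sketch with hypothetical two-line input), the BEGIN block runs before any input is read, the unconditioned block runs once per line, and the END block runs after the last line:

```shell
printf 'x\ny\n' | awk 'BEGIN { print "start" }
                       { print "line", NR ": " $0 }
                       END { print "end after", NR, "lines" }'
# start
# line 1: x
# line 2: y
# end after 2 lines
```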

In your example you have three pattern/code lines:

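NR and FNR are two special variables set by awk. You can look up their meaning with man awk to see that NR counts the input records read so far across all input files, while FNR counts the records read from the current file only.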

So basically this condition is true only while lines from the first file are being read (which means that a[i++]=$0 is executed once for each line of the first file) and false for all additional files. $0 is the current line of input.

This code block has no condition/pattern so it gets executed for every line read (from all files including the first one).

This part runs after the last line of the last file has been read and processed.
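To see the NR == FNR idea in isolation, here is a minimal sketch that pastes two hypothetical single-column files side by side. It is not necessarily the command from the question: this variant adds `next`, so, unlike the unconditioned block described above, its second rule runs only for lines of the second file.

```shell
# Two hypothetical single-column files
printf '1\n2\n3\n' > col1.txt
printf 'a\nb\nc\n' > col2.txt

# NR == FNR holds only while the first file is read: remember each line by
# its line number, then next skips the remaining rules for that line.
# The second rule therefore sees only the second file's lines and prints
# the stored line with the same FNR next to the current one.
awk 'NR == FNR { a[FNR] = $0; next }
     { print a[FNR], $0 }' col1.txt col2.txt
# 1 a
# 2 b
# 3 c
```

One caveat of this idiom: if the first file is empty, NR == FNR is also true while the second file is read, so the sketch assumes the first file has at least one line.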

With these basics you should be able to figure out the meaning of the different code blocks and variables yourself.