
Text Processing Tools In Linux


Tools For Extracting Text:

Linux lets us store data in files and offers several techniques for extracting data from a particular file. Most Linux configuration and related files are plain text, so the system provides a range of tools for viewing, managing, and manipulating text.

Viewing File Contents:

A file stores information as text, so we can display the contents of an existing file using particular tools.

1. Navigating text:

Linux allows us to navigate the contents of a file using a pager command. The more command views a large file one screen at a time. The less command quickly views any file and any section of it, and it doesn't require the whole file to be loaded into memory to view part of it. Several keys navigate the pages in less: b moves back a full screen, the space bar moves ahead a full screen, Enter moves ahead one line, k moves back one line, G moves to the bottom of the file, n repeats the last search, and q quits.

Example:

more <file name>

less <file name>

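Viewing File Excerpts:

Linux allows us to display just the beginning or the end of a file using particular commands.

  1. head: displays the first lines of a file (the first 10 by default)
  2. tail: displays the last lines of a file (the last 10 by default)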
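As a minimal sketch of the two commands, the following creates a small sample file and reads its first and last lines. The file name sample.txt and its contents are made up for illustration.

```shell
# Hypothetical sample file with 12 numbered lines.
workdir=$(mktemp -d)
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$workdir/sample.txt"

# head prints the first 10 lines by default; -n chooses another count.
first_three=$(head -n 3 "$workdir/sample.txt")

# tail prints the last 10 lines by default; -n works the same way.
last_two=$(tail -n 2 "$workdir/sample.txt")

printf '%s\n' "$first_three"
printf '%s\n' "$last_two"

rm -rf "$workdir"
```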

Extracting Text By Keyword:

Linux allows us to extract selected contents of a file. We can search an existing file for the lines that match a search pattern.

1. Grep:

Grep is a command that finds the lines of a file that match a given search pattern. The name comes from the ed editor command g/re/p, "globally search for a regular expression and print". We pass it a search pattern, written as a regular expression, to extract the matching lines of the file.

-c counts the lines that match the search pattern.

-n displays the line number of each matching line.

-v displays the lines that do not match.

-i ignores case.

-r searches the files under a directory recursively.

Syntax:

grep [option] 'pattern' <file name>

Example:

grep 'anna' abc

grep -n anna abc

grep -i anna abc
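The options can be tried out on a small file. This is only a sketch: the file abc is created in a temporary directory and its four lines are invented sample data.

```shell
# Hypothetical sample data for the file "abc".
workdir=$(mktemp -d)
printf '%s\n' 'Anna 100' 'nanna 50' 'Janna 33' 'Raju 70' > "$workdir/abc"

# -c: count the lines containing lowercase "anna" ("Anna" does not match).
match_count=$(grep -c 'anna' "$workdir/abc")

# -n: prefix each matching line with its line number.
numbered=$(grep -n 'anna' "$workdir/abc")

# -i: ignore case, so "Anna" matches as well.
any_case=$(grep -i 'anna' "$workdir/abc")

# -v: print only the lines that do NOT match.
non_matching=$(grep -v 'anna' "$workdir/abc")

printf '%s\n' "$match_count" "$numbered" "$any_case" "$non_matching"

rm -rf "$workdir"
```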

Extracting Text By Column:

The contents of a file are often laid out in rows and columns. Linux helps us filter the contents of a file by column.

1. Cut:

It is a command that extracts part of every line of a file. We can read data from the file by field (-f, together with -d to set the delimiter, which is a tab by default) or by character position (-c). We pass the list or range of fields or characters to keep.

Roll Name Marks
1 Anna 100
2 Nanna 50
3 Janna 33

cut -c <character range> <file name>

cut -c 1-6 anna
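A short sketch using the table above, assuming the columns are separated by single spaces. The file name marks.txt is hypothetical.

```shell
# The sample table from above, stored with single-space separators.
workdir=$(mktemp -d)
printf '%s\n' 'Roll Name Marks' '1 Anna 100' '2 Nanna 50' '3 Janna 33' > "$workdir/marks.txt"

# -c 1-6: keep character positions 1 through 6 of every line.
first_six=$(cut -c 1-6 "$workdir/marks.txt")

# -d ' ' -f 2: split on a space and keep the second field (the Name column).
names=$(cut -d ' ' -f 2 "$workdir/marks.txt")

printf '%s\n' "$first_six"
printf '%s\n' "$names"

rm -rf "$workdir"
```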

Gathering Text Statistics:

A text file contains data as lines, words, and characters. Unix allows us to count each of these for a file, for example with the wc command (-l for lines, -w for words, -c for bytes).
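One common tool for this job is wc. The sketch below counts the lines, words, and bytes of a hypothetical file marks.txt that holds the sample marks table used in this article.

```shell
# Hypothetical sample file holding the marks table.
workdir=$(mktemp -d)
printf '%s\n' 'Roll Name Marks' '1 Anna 100' '2 Nanna 50' '3 Janna 33' > "$workdir/marks.txt"

# Reading from standard input keeps the file name out of the output.
# tr strips the padding that some wc implementations add.
line_count=$(wc -l < "$workdir/marks.txt" | tr -d ' ')
word_count=$(wc -w < "$workdir/marks.txt" | tr -d ' ')
byte_count=$(wc -c < "$workdir/marks.txt" | tr -d ' ')

printf 'lines=%s words=%s bytes=%s\n' "$line_count" "$word_count" "$byte_count"

rm -rf "$workdir"
```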

Sorting Text:

Data is processed and stored in files, organized into lines and fields. We can arrange the lines in sorted order, either ascending or descending.

1. Sort command:

It is a command that arranges the lines of text in ascending order by default. We pass it the file name.

2. -r:

It is an option that allows arranging the data into descending order.

3. -n:

It is an option that sorts the contents by numeric value instead of character by character.

4. -f:

Ignore case, so that uppercase and lowercase letters sort together.

5. -u:

To remove duplicate lines in the output.

sort <file name>

sort anna

sort -f anna

sort -r anna

sort -n anna

sort -u anna
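The effect of each option is easiest to see on a few lines of data. In this sketch the files words.txt and numbers.txt are made up, and LC_ALL=C is set as an assumption so that the ordering is the same on every system (uppercase letters sort before lowercase ones in that locale).

```shell
# Fix the locale so the sort order is predictable.
LC_ALL=C
export LC_ALL

workdir=$(mktemp -d)
printf '%s\n' banana Cherry apple banana > "$workdir/words.txt"
printf '%s\n' 10 9 100 > "$workdir/numbers.txt"

plain=$(sort "$workdir/words.txt")         # ascending, uppercase first
folded=$(sort -f "$workdir/words.txt")     # -f: ignore case
reversed=$(sort -r "$workdir/words.txt")   # -r: descending
deduped=$(sort -u "$workdir/words.txt")    # -u: drop duplicate lines
numeric=$(sort -n "$workdir/numbers.txt")  # -n: compare as numbers

printf '%s\n' "$plain" "$folded" "$reversed" "$deduped" "$numeric"

rm -rf "$workdir"
```

Without -n the numbers file would come out as 10, 100, 9, because the lines are compared character by character.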

Eliminating Duplicate Lines:

Unix allows us to filter the unique records out of existing data, which means we can remove the duplicate lines from a particular file's output. The original file itself is left unchanged.

1. Uniq command:

It is a command that removes duplicate lines from the given input. We pass it the file name to check. The uniq command compares adjacent lines only, so the input is usually sorted first.

-u: Displays only the lines that are not repeated

-d: Displays one copy of each line that is repeated in the input

-c: Prefixes each line with the number of times it occurs

uniq <file name>

uniq anna

uniq -u anna

uniq -c anna

uniq -d anna
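A small sketch of the three options, assuming a hypothetical file fruits.txt whose duplicate lines are already adjacent (uniq only compares neighbouring lines, so unsorted input is normally passed through sort first).

```shell
# Hypothetical, already sorted sample data.
workdir=$(mktemp -d)
printf '%s\n' apple apple banana cherry cherry cherry > "$workdir/fruits.txt"

collapsed=$(uniq "$workdir/fruits.txt")       # one copy of every line
only_unique=$(uniq -u "$workdir/fruits.txt")  # lines that occur exactly once
only_dupes=$(uniq -d "$workdir/fruits.txt")   # one copy of each repeated line
counted=$(uniq -c "$workdir/fruits.txt")      # occurrence count before each line

printf '%s\n' "$collapsed" "$only_unique" "$only_dupes" "$counted"

rm -rf "$workdir"
```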

Comparing File:

Linux allows us to compare files line by line. This method takes two files and compares their contents.

1. Diff:

It is a command that compares the contents of the given files. It reports the lines that must be changed, added, or deleted to make the first file match the second. We pass it 2 file names.
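The following sketch compares two three-line files that differ only in their second line. The names old.txt and new.txt are hypothetical.

```shell
# Two hypothetical files that differ on line 2.
workdir=$(mktemp -d)
printf '%s\n' a b c > "$workdir/old.txt"
printf '%s\n' a B c > "$workdir/new.txt"

# diff exits with 0 when the files are identical and 1 when they differ.
changes=$(diff "$workdir/old.txt" "$workdir/new.txt")
status=$?

printf '%s\n' "$changes"
printf 'exit status: %s\n' "$status"

rm -rf "$workdir"
```

In the output, 2c2 means that line 2 of the first file must be changed to line 2 of the second; lines starting with < come from the first file and lines starting with > from the second.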

Duplicating File Changes:

The output of the diff command records the differences between two existing files. The unified format (diff -u) adds surrounding context lines, which help a utility such as patch apply the same changes accurately to another copy of the file. In this way Unix allows us to compare files and duplicate the changes between them.
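As an illustration of a diff format that carries extra context, the sketch below produces a unified diff with the -u option. The file names are hypothetical, and the two header lines of the real output also contain paths and timestamps that vary from run to run.

```shell
# Two hypothetical files that differ on line 2.
workdir=$(mktemp -d)
printf '%s\n' a b c > "$workdir/old.txt"
printf '%s\n' a B c > "$workdir/new.txt"

# -u: unified format. Removed lines start with "-", added lines with "+",
# and unchanged context lines start with a space.
unified=$(diff -u "$workdir/old.txt" "$workdir/new.txt")

printf '%s\n' "$unified"

rm -rf "$workdir"
```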

Spell Checking:

When we type words into a file, we can check their spelling using a particular command.

Aspell:

It is a command that checks the spelling in the given file. It suggests correct spellings for each misspelled word in the file.

aspell check <file name>

aspell check anna

Look:

It is a command that displays the dictionary words that begin with the given string.

look girl

Tools For Manipulating Text:

1. tr:

It is a command that translates particular characters into new characters. It reads standard input rather than taking a file name, so we redirect the file into it and pass the characters to change and their replacements.

tr 'old characters' 'new characters' < <file name>

tr 'a-z' 'A-Z' < anna
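A minimal sketch of converting lowercase letters to uppercase. The file anna is created here with invented contents, and it is redirected into tr because tr reads only standard input.

```shell
# Hypothetical contents for the file "anna".
workdir=$(mktemp -d)
printf '%s\n' anna nanna > "$workdir/anna"

# Map every character in a-z to the character at the same position in A-Z.
uppercased=$(tr 'a-z' 'A-Z' < "$workdir/anna")

printf '%s\n' "$uppercased"

rm -rf "$workdir"
```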

2. Sed:

It is a command that also replaces contents with new contents. It is basically used to convert a particular word into another word. The search text is a regular expression, so we can use metacharacters in it, and the g flag replaces every match on a line instead of only the first.

sed 's/old/new/g' <file name>

sed 's/Anna/Banna/' anna
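The difference the g flag makes can be sketched on a line that contains the word twice. The file contents are invented for illustration.

```shell
# Hypothetical contents for the file "anna".
workdir=$(mktemp -d)
printf '%s\n' 'Anna met Anna' 'Janna' > "$workdir/anna"

# Without g, only the first match on each line is replaced.
first_only=$(sed 's/Anna/Banna/' "$workdir/anna")

# With g, every match on each line is replaced.
every_match=$(sed 's/Anna/Banna/g' "$workdir/anna")

printf '%s\n' "$first_only"
printf '%s\n' "$every_match"

rm -rf "$workdir"
```

Neither command modifies the file itself; sed writes the edited text to standard output.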

Special Character For Search:

1. Regular Expressions:

The term regular expression refers to a search pattern that uses special characters called metacharacters. We use metacharacters to identify data in a particular file, and the command displays the lines of the file where the pattern matches. For example, * matches zero or more of the preceding character, and [ie] matches either i or e. Regular expressions are distinct from the shell's wildcard characters, although some symbols look alike.

Grep:

grep 'an*' anna

grep 'Raj[ie]v' anna
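The two patterns can be tried on a handful of sample lines. This is a sketch with made-up file contents; the anchored variant '^an*$' is an addition here to show how ^ and $ restrict a match to the whole line.

```shell
# Hypothetical contents for the file "anna".
workdir=$(mktemp -d)
printf '%s\n' a an ann bob Rajiv Rajev Rajav > "$workdir/anna"

# 'an*' is "a" followed by zero or more "n", so any line containing "a" matches.
star_count=$(grep -c 'an*' "$workdir/anna")

# Anchoring with ^ and $ requires the whole line to be "a", "an", "ann", ...
whole_line=$(grep '^an*$' "$workdir/anna")

# [ie] matches exactly one character, either "i" or "e".
bracket=$(grep 'Raj[ie]v' "$workdir/anna")

printf '%s\n' "$star_count" "$whole_line" "$bracket"

rm -rf "$workdir"
```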

Using Awk:

Unix gives us a way to manage data laid out in fields and helps to process it. A file of such records can be handled like a simple database. Awk is basically used to match patterns and process the matching content with a small program. Awk is named after its authors, Aho, Weinberger, and Kernighan. It is a program that supports its own programming language and related instructions.

Awk:

It is a command that filters records and supports programming language instructions. We can process the data on the basis of given fields.

$1 $2 $3
Roll Name Marks
1 Anna 100
2 Janna 30
3 Kanna 50

awk 'pattern {action}' <file name>

awk '{print}' anna

awk '/anna/ {print}' anna
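Here is a sketch of the pattern and action pairs on the sample table, stored in a temporary file named anna. The last command, which selects rows by a numeric condition on the Marks field, is an extra illustration and not taken from the examples above.

```shell
# The sample table, stored in a hypothetical file "anna".
workdir=$(mktemp -d)
printf '%s\n' 'Roll Name Marks' '1 Anna 100' '2 Janna 30' '3 Kanna 50' > "$workdir/anna"

# No pattern: the action runs for every record.
all_rows=$(awk '{print}' "$workdir/anna")

# /anna/ selects records containing lowercase "anna" ("Anna" does not match).
matching=$(awk '/anna/ {print}' "$workdir/anna")

# Skip the header (NR > 1) and print the name when the marks exceed 40.
above_40=$(awk 'NR > 1 && $3 > 40 {print $2}' "$workdir/anna")

printf '%s\n' "$all_rows" "$matching" "$above_40"

rm -rf "$workdir"
```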

Print:

It is an awk action that prints the current record, or only the fields passed to it.

Splitting A Line Into Fields:

Awk splits each record of a file into fields. By default it separates fields on whitespace and numbers them 1, 2, 3, and so on; we refer to them with the $ symbol ($1, $2, $3). We use -F to set a different field separator.

awk -F'!' '{print $2, $3}' anna
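A minimal sketch of -F, assuming a hypothetical file in which the fields are separated by "!" characters. The separator is written in single quotes so that an interactive shell does not treat "!" specially.

```shell
# Hypothetical "!"-separated records.
workdir=$(mktemp -d)
printf '%s\n' '1!Anna!100' '2!Janna!30' > "$workdir/anna"

# -F'!' sets the field separator; the comma in print inserts a space
# (the default output field separator) between the two fields.
name_and_marks=$(awk -F'!' '{print $2, $3}' "$workdir/anna")

printf '%s\n' "$name_and_marks"

rm -rf "$workdir"
```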

About Parichay
