Useful one-liner to get the number of fields in the file

Today at work I had to figure out how many observations in a large text file were broken. Each line represents one observation. For some reason the script that produced the file sometimes broke partway through a line, so some lines were shorter than expected. Instead of looking like:

1,0:0:0:0:…:0

with a total of 81 colon-separated fields after the “,”, they looked like

1,0:0:0:0
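On the short sample line, it is easy to check the difference between counting fields on the comma and counting the colon-separated values. A quick sketch using only the example line above:

```shell
#!/usr/bin/env bash
# Shortened observation from the example above.
line='1,0:0:0:0'

# With "," as the field separator, everything after the comma is a single
# field, so NF is 2 however many values the line holds.
echo "$line" | awk 'BEGIN {FS = ","} {print NF}'                      # prints 2

# split() breaks $2 on ":" and returns the number of pieces, i.e. the values.
echo "$line" | awk 'BEGIN {FS = ","} {print split($2, vals, ":")}'    # prints 4
```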

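To figure out the exact number of broken lines and to locate them, I used awk. First, the count:

awk 'BEGIN {FS = ","} split($2, vals, ":") < 81' unfiltered_file.txt | wc -l

‘FS’ sets the field separator, here the comma. Because the 81 values after the comma are separated by colons, the comma alone splits every line into just two fields, so ‘NF’ (the variable holding the number of fields in the current line) would be 2 whether the line is broken or not. Instead, ‘split($2, vals, ":")’ breaks the second field on colons and returns the number of pieces. A pattern with no action prints every line that matches, and we pipe those lines into ‘wc’ with the option ‘-l’ to count them. If we instead wanted to keep only the intact lines and filter out the broken ones, we could do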
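The count alone does not say where the broken lines are. Awk also tracks the current line number in ‘NR’, so a single pass can report both. The sketch below builds a small hypothetical sample file in a temporary directory (the three sample lines and the helper name report_broken are illustrative, not from the original). It then prints the line number and value count of each short line, followed by the total:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample file: lines 1 and 3 are intact, line 2 is truncated.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
file="$tmp/unfiltered_file.txt"
full=$(awk 'BEGIN {s = "0"; for (i = 2; i <= 81; i++) s = s ":0"; print s}')
printf '1,%s\n2,0:0:0:0\n3,%s\n' "$full" "$full" > "$file"

# report_broken FILE (hypothetical helper): for every line with fewer than 81
# ":"-separated values after the ",", print its line number (NR) and how many
# values it has, then print how many such lines were found.
report_broken() {
  awk 'BEGIN {FS = ","}
       (n = split($2, vals, ":")) < 81 { print "line " NR ": " n " values"; broken++ }
       END { printf "%d broken line(s)\n", broken }' "$1"
}

report_broken "$file"
```

Doing the count inside awk also removes the need for the separate ‘wc -l’ step.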

awk 'BEGIN {FS = ","} split($2, vals, ":") == 81' unfiltered_file.txt > filtered_file.txt
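If the broken lines should be kept too, one awk pass can send each line to one of two files. This is a sketch: broken_lines.txt is a hypothetical name, and the two-line sample input is made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory with a hypothetical two-line sample input.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
full=$(awk 'BEGIN {s = "0"; for (i = 2; i <= 81; i++) s = s ":0"; print s}')
printf '1,%s\n2,0:0:0:0\n' "$full" > unfiltered_file.txt

# Lines with exactly 81 values after the "," go to filtered_file.txt and all
# others to broken_lines.txt. Within one awk run, print > "name" opens the file
# once (truncating it) and later prints append to it.
awk 'BEGIN {FS = ","}
     {
       if (split($2, vals, ":") == 81)
         print > "filtered_file.txt"
       else
         print > "broken_lines.txt"
     }' unfiltered_file.txt
```

Note that broken_lines.txt is only created if at least one broken line exists.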

We can then compare the two files in the Vim editor to see quickly which lines got broken and where they are in the file:

vim -d unfiltered_file.txt filtered_file.txt

This last technique is quite useful. In just the past two months I have used it quite often when working with different versions of scripts that are not under source control. Vim highlights the differences between the files in color, and in diff mode you can press ]c and [c to jump forward and back between the places where the two files differ. I find this more convenient than the standard Linux ‘diff’ utility.
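For a quick non-interactive check, plain ‘diff’ can report the same thing. Here is a sketch with hypothetical sample data. Because filtered_file.txt only ever loses lines, every change diff reports is a deletion. The line number (or range) before the "d" gives the broken line's position in the original file.

```shell
#!/usr/bin/env bash
set -uo pipefail

# Scratch directory with a hypothetical sample: line 2 is truncated.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
full=$(awk 'BEGIN {s = "0"; for (i = 2; i <= 81; i++) s = s ":0"; print s}')
printf '1,%s\n2,0:0:0:0\n3,%s\n' "$full" "$full" > unfiltered_file.txt
awk 'BEGIN {FS = ","} split($2, vals, ":") == 81' unfiltered_file.txt > filtered_file.txt

# diff exits with status 1 when the files differ, so capture its output
# without treating that status as an error.
changes=$(diff unfiltered_file.txt filtered_file.txt || true)
printf '%s\n' "$changes"

# Only deletions can occur here, so the "<" lines are exactly the broken ones.
printf '%s\n' "$changes" | grep '^<'
```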

This entry was posted in Bash.