
I have a file containing tons of information. It looks like this:

===============================================================================


   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
     52      -4.8969E+05     5.1393E+00     1.7327E+03     P1         31

 BOND    =    29534.6906  ANGLE   =     2139.5547  DIHED      =     9235.7381
 VDWAALS =    51148.8783  EEL     =  -595288.4773  HBOND      =        0.0000
 1-4 VDW =     2741.3848  1-4 EEL =    26043.4789  RESTRAINT  =       29.3591
 DFTBESCF=   -15274.2075
 EAMBER  =  -489718.9594
 NMR restraints: Bond =    0.000   Angle =     0.000   Torsion =     0.000
===============================================================================

Now I want to extract the value in the second column (here -4.8969E+05, under the word ENERGY) from the line directly below that header, and collect all such values in a single column.

I tried to extract it with grep but have not been able to.
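For the record, grep alone can't print a field from the following line, but GNU grep's -A1 option (print one line of trailing context after each match) gets close when combined with awk. A sketch, with mdout.txt as a placeholder filename:

```shell
# Print each line that follows an ENERGY line, then take its second field.
# -A1 adds one line of trailing context after every match; the grep -v
# step drops the matching lines themselves and the "--" group separators.
grep -A1 'ENERGY' mdout.txt | grep -v -e 'ENERGY' -e '^--$' | awk '{ print $2 }'
```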

  • I made your question somewhat readable. Take it as an example and improve further ([edit]) if needed. In particular: do these === belong to the file? do ** belong? Also please note we are not a script writing service, some research effort is required. What have you tried so far? – Kamil Maciorowski Oct 31 '18 at 22:28
  • Please click on edit above to the left, and add to your original post what we have asked for. Is the file you are trying to extract from tabulated; do {TAB} characters separate the columns, or are they separated by a number of spaces? If you can upload a sample file to someplace like Google Drive and share the URL so we might download it, that would speed resolution. Please also include what you tried with grep so far; it is far easier for us to fix a broken script than to write from scratch. – K7AAY Oct 31 '18 at 22:38
  • If the file contains only space delimiters (no tabs, no back-spaces), then you should be able to use cut -c M-N to extract the columns you want, with some additional filters to remove the non-columnar data. – AFH Oct 31 '18 at 23:07

1 Answer


You seem to be saying “I want the second field from the line immediately after the line that contains the word ENERGY (in which ENERGY is the second field).”  If that’s what you want, you can do it with

awk '/ENERGY/ { found_it=1; next;     }
    found_it  { print $2; found_it=0; }'

(Put your filename at the end of that command — right after the }' — or pipe your data into the above command.)

This simply

  • Looks for a line that contains the string ENERGY,
    • sets a flag (found_it) when it does,
    • and skips that line.
  • When it encounters a line with the found_it flag set (meaning the previous line contained ENERGY), it will
    • print the second word from that line, and
    • clear (zero) the found_it flag, so we don’t produce output from any subsequent lines.
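Run against the relevant lines of the sample output above (here fed in through a here-document rather than a file), this prints the value from the line below the header:

```shell
awk '/ENERGY/ { found_it=1; next;     }
    found_it  { print $2; found_it=0; }' <<'EOF'
   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
     52      -4.8969E+05     5.1393E+00     1.7327E+03     P1         31
EOF
# prints: -4.8969E+05
```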

If your file has ENERGY on lines 4, 14 and 24, then the above command will print the second field from lines 5, 15 and 25.  If that’s not what you want, a simpler approach is

awk '/ENERGY/ { found_it=1; next; }
    found_it  { print $2;   exit; }'

which is the same as the first one except that, after printing the second field from line 5, it stops looking.  Even if ENERGY appears only once in the file, this approach is preferable in that it doesn’t require reading the entire file, but only up to the value you want.
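The same logic can also be written with awk’s getline, which reads the next record directly instead of using a flag; a more compact sketch of this stop-after-first variant (mdout.txt is a placeholder filename):

```shell
# On a line containing ENERGY, read the next line into $0 and print
# its second field; exit so the rest of the file is never read.
awk '/ENERGY/ { if ((getline) > 0) print $2; exit }' mdout.txt
```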

These commands will:

  • find the string ENERGY even if it is part of a larger word, such as CENERGY, ENERGY-CONSUMING, ENERGYLEVEL or HIGH-ENERGY.
  • find the string ENERGY even if it is not the second field on its line.
  • print the value of the second field, not the field where it found ENERGY.
  • not find Energy or energy.
  • fail silently if ENERGY is on the last line.

If those are problems, edit your question to specify your requirements.
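If, say, the first two points above matter, the pattern can be tightened so that ENERGY must be exactly the second field of the header line; a sketch:

```shell
# $2 == "ENERGY" matches only when ENERGY is the whole second field,
# so CENERGY, HIGH-ENERGY and the like no longer trigger the flag.
awk '$2 == "ENERGY" { found_it=1; next; }
    found_it        { print $2; exit;  }' <<'EOF'
 TOTAL CENERGY 12.5
   NSTEP       ENERGY          RMS
     52      -4.8969E+05     5.1393E+00
EOF
# prints: -4.8969E+05
```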

  • It works greatly! I was getting crazy with awk! Thanks a lot for your precious effort and time! You made my day! :) – Vito Genna Nov 01 '18 at 13:28