• Status: Solved

How do I join multiple lines in a text file, using simple criteria to select the lines to join, and make sure the data keeps the correct spacing?

I am working with an interface that passes data from one system to another in XML. One of the systems has a description that can have multiple lines in freeform text. The text is delimited by <ATTRIBUTE NAME="Description"> at the beginning and </ATTRIBUTE> at the end. An example of what a file might look like is this:

        <ATTRIBUTE NAME="Description"> This is the description
of the item in
question. </ATTRIBUTE>

What I want it to end up looking like is the following:

        <ATTRIBUTE NAME="Description"> This is the description of the item in question. </ATTRIBUTE>

Spaces may or may not need to be added at each join to keep the data correct. If the lines were simply joined, the above data would look like:

        <ATTRIBUTE NAME="Description"> This is the descriptionof the item inquestion. </ATTRIBUTE>

So, I also need to make certain that spaces are in the appropriate locations after the lines are joined.
Asked by: e033343
 
e033343 (Author) commented:
The description can span multiple lines, not just two.
 
Murugesan Nagarajan, subject-matter expert in C/C++ delivery and implementation on UNIX-oriented operating systems (Windows: CYGWIN_NT, MINGW32_NT, MINGW64_NT), commented:
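# Execute the following command:
awk 'BEGIN { inAttribute = 0; ATTRIBUTEsentence = "" }
{
    # Assumption: the tag strings "<ATTRIBUTE" and "</ATTRIBUTE>" are taken
    # from the question. The opening tag must start in column 1.
    if (inAttribute == 0)
    {
        if (substr($0, 1, 10) == "<ATTRIBUTE")
        {
            # An ATTRIBUTE element starts here: begin collecting it
            ATTRIBUTEsentence = $0
            inAttribute = 1
        }
        else
        {
            # Any other line is passed through unchanged
            OtherLines = $0
            print OtherLines
        }
    }
    else
    {
        # Continuation line: add a space unless one is already present
        if (substr($0, 1, 1) != " " && ATTRIBUTEsentence !~ / $/)
        {
            currSentence = " " $0
        }
        else
        {
            currSentence = $0
        }
        ATTRIBUTEsentence = ATTRIBUTEsentence "" currSentence
    }
    if (inAttribute == 1 && index(ATTRIBUTEsentence, "</ATTRIBUTE>") > 0)
    {
        # Closing tag reached: print the joined line and reset
        print ATTRIBUTEsentence
        inAttribute = 0
    }
}' XMLfile > RequiredFileName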

http://www.geocities.com/mukeshgct/technical/shellscripting/awkATTRIBUTE.html
 
e033343 (Author) commented:
I am working with this code. I think I will be able to modify it to get what I want. Thanks for the help.
 
Murugesan Nagarajan, subject-matter expert in C/C++ delivery and implementation on UNIX-oriented operating systems (Windows: CYGWIN_NT, MINGW32_NT, MINGW64_NT), commented:
Let us know if any further changes are required for this code.

For the following file:
##################
testing12
testing11
testing11
 First
 of the item in
 of the item in
 of the item in
of the item in
of the item in
 of the item in
of the item in
 of the item in
 of the item in
of the item in
question.
testing9
testing8
testing7
testing7
Second
of the item in
question.
testing5
testing4
testing3
testing3
 Third
of the item in
question.
testing1
##################


This code produces the following output:
##################
testing12
testing11
testing11
 First of the item in of the item in of the item in of the item in of the item in of the item in of the item in of the item in of the item in of the item in question.
testing9
testing8
testing7
testing7
Second of the item in question.
testing5
testing4
testing3
testing3
 Third of the item in question.
testing1
##################
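
The sample above mixes continuation lines with and without a leading space, yet every join ends up with exactly one space. A minimal sketch of that spacing rule on its own follows. The names `join_fragments` and `join_with_space` are hypothetical, chosen only for illustration, and the sketch assumes that a single space is wanted at each join unless one side already supplies it.

```shell
# join_fragments: hypothetical helper. Reads fragments on stdin, one per
# line, and prints them joined into a single line.
join_fragments() {
    awk '
    # Join two strings, adding a space only when neither side has one
    function join_with_space(left, right) {
        if (left == "" || left ~ / $/ || right ~ /^ /)
            return left right
        return left " " right
    }
    { joined = join_with_space(joined, $0) }
    END { print joined }
    '
}

# Example with fragments taken from the sample file
printf '%s\n' ' First' ' of the item in' 'of the item in' 'question.' | join_fragments
# prints " First of the item in of the item in question."
```

Keeping the rule in one function makes it easy to change later, for example to collapse runs of several spaces as well.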
 
e033343 (Author) commented:
The code works great for the data in your test file. My data, however, is indented with leading spaces to identify sections. Here is an example of how the data actually looks:
<?xml version="1.0" encoding="US-ASCII"?><OBJECT>
 <OBJECTDATA>
  <CONTROLAREA>
   <BSR>
    <VERB>Save</VERB>
    <NOUN>DIB</NOUN>
    <INTERFACEID>PDMCSAP</INTERFACEID>
   </BSR>
   <SOURCE LOCATION="ABC" DIRECTION="OUT" REQUESTERID="GHI">
    <AUTHID>PDMCSAP</AUTHID>
    <DATE>
     <MONTH>11</MONTH>
     <DAY>05</DAY>
     <YEAR>2008</YEAR>
    </DATE>
    <TIME>13:20</TIME>
   </SOURCE>
  </CONTROLAREA>
  <DATAAREA>
   <PART NAME="ABC" REVISION="Z" VAULT="PPP" POLICY="X" STATE="A" TYPE="Part" NEWDESIGN="Yes">
    <ATTRIBUTELIST>
     <ATTRIBUTE NAME="End Item">XX</ATTRIBUTE>
     <ATTRIBUTE NAME="RoHS Compliant">Unassigned</ATTRIBUTE>
     <ATTRIBUTE NAME="Method of Classification">ODI</ATTRIBUTE>
     <ATTRIBUTE NAME="Controlling Document Revision"></ATTRIBUTE>
     <ATTRIBUTE NAME="Unit of Measure">TTT</ATTRIBUTE>
     <ATTRIBUTE NAME="Op Code">Unknown</ATTRIBUTE>
     <ATTRIBUTE NAME="Serial Indicator">No traceability</ATTRIBUTE>
     <ATTRIBUTE NAME="Material Type">HALB Semi Finished Products</ATTRIBUTE>
     <ATTRIBUTE NAME="ODA Cage Code"></ATTRIBUTE>
     <ATTRIBUTE NAME="Tooling Cross-Reference"></ATTRIBUTE>
     <ATTRIBUTE NAME="Vendor Part and CAGE Code"></ATTRIBUTE>
     <ATTRIBUTE NAME="Related Technology Export Classification"></ATTRIBUTE>
     <ATTRIBUTE NAME="Release Status">A</ATTRIBUTE>
     <ATTRIBUTE NAME="Special Conditions"></ATTRIBUTE>
     <ATTRIBUTE NAME="Spare Part">A</ATTRIBUTE>
     <ATTRIBUTE NAME="Description">TEST
TO CONCATENATE
LINES</ATTRIBUTE>
     <ATTRIBUTE NAME="Weight Unit">X</ATTRIBUTE>
     <ATTRIBUTE NAME="Print Code">Unknown</ATTRIBUTE>
     <ATTRIBUTE NAME="Originator"></ATTRIBUTE>
     <ATTRIBUTE NAME="Pb-Free">J</ATTRIBUTE>
    </ATTRIBUTELIST>
   </PART>
  </DATAAREA>
 </OBJECTDATA>
</OBJECT>
 
As you can see, only the first line, the last line, and the data I am trying to fix have no spaces at the beginning of the line.
 
Thank you for your help.
 
Murugesan Nagarajan, subject-matter expert in C/C++ delivery and implementation on UNIX-oriented operating systems (Windows: CYGWIN_NT, MINGW32_NT, MINGW64_NT), commented:
The following will work as expected:

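awk 'BEGIN { previousLineAttribute = 0 ; }
{
      currLine = $0
      if(previousLineAttribute == 0 )
      {
            # Skip leading spaces and tabs to find the first real character.
            # awk strings are indexed from 1.
            attributeLineLen = length(currLine) ;
            for( i=1; i <= attributeLineLen ; i++)
            {
                  firstCharacter = substr(currLine,i,1) ;
                  if( (firstCharacter!=" ") && (firstCharacter!="\t") )
                  {
                        break;
                  }
            }
            attributeLine = substr(currLine,i)
            # Assumption: the tag strings "<ATTRIBUTE" and "</ATTRIBUTE>"
            # are taken from the question.
            if( (substr(attributeLine,1,10)=="<ATTRIBUTE") && (index(attributeLine,"</ATTRIBUTE>")==0) )
            {
                  # Opening tag without its closing tag: start collecting.
                  previousLineAttribute = 1;
                  attributeLine = currLine ;
            }
            else
            {
                  print currLine
            }
      }
      else
      {
            tmpCurrLine = currLine ;
            # Add a space at the join unless one is already there.
            if ( (substr(currLine,1,1)!=" ") && (attributeLine !~ / $/) )
            {
                  tmpCurrLine = " "currLine
            }
            attributeLine=attributeLine""tmpCurrLine
            attributeLineLen = length(attributeLine) ;
            # The last 12 characters are the closing tag when the element ends.
            if(substr(attributeLine, attributeLineLen-11)=="</ATTRIBUTE>")
            {
                  print attributeLine ;
                  previousLineAttribute = 0 ;
            }
      }
}' XMLFileName > ChangedFileName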

http://www.geocities.com/mukeshgct/technical/shellscripting/awkATTRIBUTE.html
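
For comparison, the same idea can be sketched more compactly with regular expressions instead of character-by-character scanning. This is only a sketch under stated assumptions, not the code from the answer above: the wrapper name `join_attributes` is hypothetical, each line is assumed to hold at most one ATTRIBUTE element, and the closing tag is assumed to be spelled exactly `</ATTRIBUTE>`.

```shell
# join_attributes: hypothetical wrapper. Reads XML on stdin and joins any
# ATTRIBUTE element that spans several lines into a single line.
join_attributes() {
    awk '
    # Not currently inside a multi-line element
    !joining {
        if ($0 ~ /<ATTRIBUTE[ >]/ && $0 !~ /<\/ATTRIBUTE>/) {
            # Opening tag without its closing tag: start collecting
            buf = $0
            joining = 1
        } else {
            print
        }
        next
    }
    # Inside a multi-line element: append, adding a space only if missing
    {
        sep = (buf ~ / $/ || $0 ~ /^ /) ? "" : " "
        buf = buf sep $0
        if ($0 ~ /<\/ATTRIBUTE>/) {
            print buf
            joining = 0
        }
    }
    # Flush an element that never closed, so no input is lost
    END { if (joining) print buf }
    '
}

# Example with lines taken from the data posted above
printf '%s\n' \
    '     <ATTRIBUTE NAME="Spare Part">A</ATTRIBUTE>' \
    '     <ATTRIBUTE NAME="Description">TEST' \
    'TO CONCATENATE' \
    'LINES</ATTRIBUTE>' | join_attributes
```

On a real file this would be run as `join_attributes < XMLFileName > ChangedFileName`. Because the opening tag is matched anywhere on the line, the leading indentation does not matter.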
 
e033343 (Author) commented:
Thanks for your help.
Question has a verified solution.
