07592161981m


Splitting a large file into smaller files

Hi Experts,

I have a large Java dump file that bundles 6 days of data into a single file.
How can I split it into smaller files, for example 6 days of data into 6 separate files, one per day? How can I do this with a script?
Steven Vona

07592161981m

ASKER

I've requested that this question be deleted for the following reason:

Wrong Question.

Assuming that the first column of your file contains a datestamp (not timestamp!) without embedded spaces:

awk '{print $0 > ($1 ".out")}' inputfile
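A minimal sketch of that first-column approach, run on a small made-up sample (the file name inputfile and the ISO-style dates are assumptions for illustration, not the asker's real data). The redirection target is parenthesized because awk implementations do not agree on how an unparenthesized concatenation after ">" is parsed:

```shell
#!/bin/sh
# Sketch only: assumes column 1 holds a datestamp with no embedded spaces.
# "inputfile" and its contents are hypothetical sample data.
cd "$(mktemp -d)" || exit 1

cat > inputfile <<'EOF'
2012-09-11 first event
2012-09-11 second event
2012-09-12 third event
EOF

# Write every line to a file named after its first field, e.g. 2012-09-11.out.
# The parentheses make the concatenated file name unambiguous in all awks.
awk '{ print $0 > ($1 ".out") }' inputfile

ls ./*.out
```

Each distinct value of the first column becomes its own output file, and lines keep their original order within each file.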
awk '{print $0 > $1".out"}' verbosegc.20120911.170447.37451.txt.001
awk: (FILENAME=verbosegc.20120911.170447.37451.txt.001 FNR=14) fatal: can't redirect to `</initialized>.out' (No such file or directory)

Getting this error.
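One way to see why that command fails on this file is to print the first field of each line. The sample below is made up from lines quoted in this thread (in the real file the failing line was line 14, per FNR=14). For a line such as </initialized>, the first field is the tag itself, so the output name becomes "</initialized>.out"; the "/" turns that into a path inside a directory named "<", which does not exist, hence "No such file or directory":

```shell
#!/bin/sh
# Hypothetical sample built from lines quoted in the thread.
cd "$(mktemp -d)" || exit 1

cat > sample.txt <<'EOF'
</initialized>
<af type="nursery" id="1" timestamp="Sep 11 17:04:52 2012" intervalms="0.000">
  <minimum requested_bytes="184" />
</af>
EOF

# Show the line number and the first whitespace-separated field of each line.
# In this log the first fields are XML tags, not dates, so they make poor file names.
awk '{ print FNR ": " $1 }' sample.txt | tee fields.txt
```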
<af type="nursery" id="1" timestamp="Sep 11 17:04:52 2012" intervalms="0.000">

Based on the line above, I need separate files for Sep 11, 12, 13 and 14.

These timestamps appear at various points throughout the file:
<af type="nursery" id="1" timestamp="Sep 11 17:04:52 2012" intervalms="0.000">
<af type="nursery" id="193" timestamp="Sep 12 00:00:03 2012" intervalms="1370951.683">
<af type="nursery" id="491" timestamp="Sep 13 00:00:05 2012" intervalms="802938.834">
<af type="nursery" id="757" timestamp="Sep 14 00:02:49 2012" intervalms="1115834.953">

I need to split the file so that each new file starts at the first timestamp of its day.
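Before splitting, it can help to check which days the file actually contains. This sketch uses awk's match() with RSTART and RLENGTH (a different technique from the field-separator commands used later in the thread) to pull the month and day out of each timestamp="..." attribute. The file name verbosegc.txt and its contents are assumptions built from the sample lines above:

```shell
#!/bin/sh
# Hypothetical sample assembled from the timestamp lines quoted above.
cd "$(mktemp -d)" || exit 1

cat > verbosegc.txt <<'EOF'
<af type="nursery" id="1" timestamp="Sep 11 17:04:52 2012" intervalms="0.000">
  <minimum requested_bytes="184" />
<af type="nursery" id="193" timestamp="Sep 12 00:00:03 2012" intervalms="1370951.683">
<af type="nursery" id="491" timestamp="Sep 13 00:00:05 2012" intervalms="802938.834">
<af type="nursery" id="757" timestamp="Sep 14 00:02:49 2012" intervalms="1115834.953">
EOF

# Print each distinct day once, in order of first appearance.
awk '
  match($0, /timestamp="[A-Za-z]+ +[0-9]+/) {
    # Skip the 11 characters of timestamp=" to keep only the month and day.
    day = substr($0, RSTART + 11, RLENGTH - 11)
    gsub(/ /, "", day)    # "Sep 11" becomes "Sep11"
    if (!(day in seen)) { seen[day] = 1; print day }
  }' verbosegc.txt | tee days.txt
```

The printed names (Sep11, Sep12, ...) have the same form as the day part of the output file names in the later commands, so the list doubles as a preview of which files a split would create.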
ASKER CERTIFIED SOLUTION
woolmilkporc

You are the Master. It worked great.
Thx for the points!

You're fast! I was planning to enhance the command so that the output file names would also contain the input file name:

awk -F"timestamp=\"" '{D=substr($2,1,6); gsub(" ","",D); print $0 > (FILENAME "-" D ".out")}' inputfile

Better?
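Because FILENAME is part of the output name, several dump files can be split in one run without their outputs mixing. A sketch with two hypothetical input files, gc1.txt and gc2.txt, each holding a single timestamp line (a simplification: real files also contain lines without a timestamp):

```shell
#!/bin/sh
# gc1.txt and gc2.txt are hypothetical one-line inputs for illustration.
cd "$(mktemp -d)" || exit 1

printf '%s\n' '<af type="nursery" id="1" timestamp="Sep 11 17:04:52 2012" intervalms="0.000">' > gc1.txt
printf '%s\n' '<af type="nursery" id="193" timestamp="Sep 12 00:00:03 2012" intervalms="1370951.683">' > gc2.txt

# Split on the text timestamp=" so that $2 starts with e.g. "Sep 11 17:04:52".
# The first 6 characters give the day; removing spaces yields "Sep11".
awk -F'timestamp="' '{
  D = substr($2, 1, 6); gsub(" ", "", D)
  print $0 > (FILENAME "-" D ".out")
}' gc1.txt gc2.txt

ls ./*.out
```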
After each timestamp line there are more lines of data that should be copied into the same file, but they are not being copied into the new files.

<af type="nursery" id="1" timestamp="Sep 11 17:04:52 2012" intervalms="0.000">
  <minimum requested_bytes="184" />
  <time exclusiveaccessms="0.008" meanexclusiveaccessms="0.008" threads="0" lastthreadtid="0x0000000000011100" />
  <refs soft="2390" weak="11402" phantom="0" dynamicSoftReferenceThreshold="32" maxSoftReferenceThreshold="32" />
  <nursery freebytes="0" totalbytes="167772160" percent="0" />
  <tenured freebytes="1808907656" totalbytes="1811939328" percent="99" >
    <soa freebytes="1718311304" totalbytes="1721342976" percent="99" />
    <loa freebytes="90596352" totalbytes="90596352" percent="100" />
  </tenured>
  <gc type="scavenger" id="1" totalid="1" intervalms="0.000">
    <flipped objectcount="257542" bytes="12485696" />
    <tenured objectcount="0" bytes="0" />
    <finalization objectsqueued="421" />
    <scavenger tiltratio="50" />
    <nursery freebytes="155140736" totalbytes="167772160" percent="92" tenureage="10" />
    <tenured freebytes="1808907656" totalbytes="1811939328" percent="99" >
      <soa freebytes="1718311304" totalbytes="1721342976" percent="99" />
      <loa freebytes="90596352" totalbytes="90596352" percent="100" />
    </tenured>
    <time totalms="115.926" />
  </gc>
  <nursery freebytes="155075200" totalbytes="167772160" percent="92" />
  <tenured freebytes="1808907656" totalbytes="1811939328" percent="99" >
    <soa freebytes="1718311304" totalbytes="1721342976" percent="99" />
    <loa freebytes="90596352" totalbytes="90596352" percent="100" />
  </tenured>
  <refs soft="2383" weak="11274" phantom="0" dynamicSoftReferenceThreshold="32" maxSoftReferenceThreshold="32" />
  <time totalms="116.058" />
</af>


<af type="nursery" id="2" timestamp="Sep 11 17:04:56 2012" intervalms="4104.046">
  <minimum requested_bytes="568" />
  <time exclusiveaccessms="0.016" meanexclusiveaccessms="0.016" threads="0" lastthreadtid="0x0000000000011100" />
  <refs soft="3883" weak="11844" phantom="0" dynamicSoftReferenceThreshold="31" maxSoftReferenceThreshold="32" />
  <nursery freebytes="0" totalbytes="167772160" percent="0" />
  <tenured freebytes="1806034848" totalbytes="1811939328" percent="99" >
    <soa freebytes="1715438496" totalbytes="1721342976" percent="99" />
    <loa freebytes="90596352" totalbytes="90596352" percent="100" />
  </tenured>
  <gc type="scavenger" id="2" totalid="3" intervalms="4104.164">
    <flipped objectcount="503380" bytes="25876296" />
    <tenured objectcount="0" bytes="0" />
    <finalization objectsqueued="2509" />
    <scavenger tiltratio="50" />
    <nursery freebytes="141327832" totalbytes="167772160" percent="84" tenureage="11" />
    <tenured freebytes="1806034848" totalbytes="1811939328" percent="99" >
      <soa freebytes="1715438496" totalbytes="1721342976" percent="99" />
      <loa freebytes="90596352" totalbytes="90596352" percent="100" />
    </tenured>
    <time totalms="190.079" />
  </gc>
  <nursery freebytes="141262296" totalbytes="167772160" percent="84" />
  <tenured freebytes="1806034848" totalbytes="1811939328" percent="99" >
    <soa freebytes="1715438496" totalbytes="1721342976" percent="99" />
    <loa freebytes="90596352" totalbytes="90596352" percent="100" />
  </tenured>
  <refs soft="2779" weak="11428" phantom="0" dynamicSoftReferenceThreshold="31" maxSoftReferenceThreshold="32" />
  <time totalms="190.319" />
</af>
So not every line contains "timestamp"? Didn't know that.

Anyway, here you go:

awk -F"timestamp=\"" 'BEGIN {D="NODATE"} {if($0~"timestamp") {D=substr($2,1,6); gsub(" ","",D)}; print $0 > (FILENAME "-" D ".out")}' inputfile

If the first line of inputfile does not contain "timestamp", then that line (and all following lines up to, but not including, the first line containing "timestamp") will go to a file named "inputfile-NODATE.out".
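To check the NODATE behavior and that the detail lines now follow their timestamp line, here is the same logic run on a small made-up inputfile whose first line has no timestamp (the contents are assumptions assembled from lines quoted in the thread). The output name is parenthesized for portability; otherwise the logic matches the command above:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical sample: a preamble line, then two GC events on different days.
cat > inputfile <<'EOF'
</initialized>
<af type="nursery" id="1" timestamp="Sep 11 17:04:52 2012" intervalms="0.000">
  <minimum requested_bytes="184" />
</af>
<af type="nursery" id="193" timestamp="Sep 12 00:00:03 2012" intervalms="1370951.683">
  <minimum requested_bytes="568" />
</af>
EOF

# D keeps the day of the most recent timestamp line, so the lines that follow
# it (which have no timestamp of their own) go to the same output file.
awk -F'timestamp="' '
  BEGIN { D = "NODATE" }
  {
    if ($0 ~ /timestamp/) { D = substr($2, 1, 6); gsub(" ", "", D) }
    print $0 > (FILENAME "-" D ".out")
  }
' inputfile

wc -l inputfile-*.out
```

Note that ">" truncates each output file the first time awk opens it during a run, so rerunning the command overwrites earlier results rather than appending to them.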
Thank you very much, woolmilkporc. It worked without issues.