Generating multiple files

Posted on 2011-05-09
Last Modified: 2012-05-11

The following command (perl -i.bak -pe '$_.="\n\n" unless $.%2000') produces output with two extra \n after every 2,000 records.

The files generated have no fixed column length (in some cases 98K records, in others 157K).

I'm looking for a way to adjust the above so that the final output is created in batches of 50K records (not counting the \n).

Thank you
Question by: faithless1

    Accepted Solution

    Hi ff1,

    How about this:
        perl -i.bak -pe '$len+=length;print "\n\n",$len=0 if $len>51200'

    Assisted Solution

    Correction to my post above:
        perl -i.bak -pe '$len+=length;$len=length,print "\n\n" if $len>51200'

    This still won't always create paragraphs of exactly 50K (51,200 bytes), since it doesn't split lines. Is that a concern, ff1? If so, do you want lines to be split?

    By the way, this script is similar to ozo's one, but neither of them generates multiple files, as the subject of this thread suggests. Do you want multiple files?

    Author Comment

    Hi tel2,

    Yes, I'm looking to generate multiple files from a file with 467192 records. Every 2k records will be followed by 2 empty lines.

    Author Comment

    This command: perl -i.bak -pe '$_.="\n\n" unless $.%2000'

    Suppose I truncate the file to 450K records for simplicity. The above script will then add 432 empty lines, and I will need to output each batch to a different file.

    50048 - 1st

    100096 - 2nd etc
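
One possible reading of the requirement above, sketched under assumptions: each output file gets a fixed number of records, with two empty lines between groups inside a file but none at the end of a file. With 50000 records per file and groups of 2000 that gives 24 gaps x 2 = 48 empty lines, i.e. 50048 lines per file, and 9 x 48 = 432 empty lines for 450K records. The file names input.txt and batch_NNN.txt are hypothetical, and the sizes are scaled down to 6 and 2 so the result is easy to inspect.

```shell
# Work in a scratch directory so the demo does not touch real files.
cd "$(mktemp -d)" || exit 1

# Sample input: 14 records (hypothetical file name).
perl -le 'print "rec$_" for 1..14' > input.txt

perl -ne '
    BEGIN { $per_file = 6; $per_group = 2 }   # the thread would use 50000 and 2000
    $i = ($. - 1) % $per_file;                # position of this record within its file
    if ($i == 0) {                            # first record of a batch: open the next file
        $name = sprintf "batch_%03d.txt", ++$n;
        open $out, ">", $name or die "cannot open $name: $!";
    }
    # Two empty lines between groups, but not before the first group of a file.
    print {$out} "\n\n" if $i && $i % $per_group == 0;
    print {$out} $_;
' input.txt

wc -l batch_*.txt
```

Here batch_001.txt and batch_002.txt each hold 6 records plus 4 empty lines, and batch_003.txt holds the remaining 2 records. If the empty lines should also follow the last group in each file, the separator would have to be printed after the record instead of before it.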

    Assisted Solution

    Sorry fl1, I still don't understand the requirements, and I doubt I will be putting more time into it.
    Your initial description (apart from the heading) says nothing about multiple files. The code you've supplied doesn't generate multiple files: it generates one file, and puts 2 newlines after each 2000 lines. And your requested adjustment to that code is: "the final output created is in batches of 50k records (not inclusive of \n)". No mention of multiple files there.

    If you want to make this clearer for me or anyone else who might work on this:
    - Please explain the whole thing clearly.
    - Provide sample input and output, including filenames. (For conciseness, feel free to abbreviate it by putting "...etc..." in the middle of the data.)

    By the way, why do you want this?

