Remove extra spaces, empty lines, and dates

Posted on 2010-11-17
Last Modified: 2012-06-21

How can I remove multiple spaces and empty lines? Also, I need to remove all single digits from a large file.

Here's what the file looks like:

     aword aword aword           aword   aword

bword bword             bword


3    #40,000 sss ss           ss $1000 # In this case I would want to remove 1,3,2,4,5,3 but not #40,000 or $1000

1 2 3 4 5 6 8 9 # Need to remove any standalone digit 1-9

www Aug 21, 2007 # Need to remove any instance of www

Oct 29, 2008 # Need to remove any occurrence of a date

Thanks a lot in advance.

Question by: faithless1
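For the first part of the question (squeezing repeated spaces and dropping empty lines), working line by line is the simplest approach to reason about. Below is a minimal Perl sketch, not the thread's solution. The sub names `squeeze_spaces` and `squeeze_text` are hypothetical, and it assumes spaces and tabs are the only whitespace to collapse within a line.

```perl
use strict;
use warnings;

# Collapse runs of spaces/tabs to one space and trim both ends of a line.
sub squeeze_spaces {
    my ($line) = @_;
    $line =~ s/[ \t]+/ /g;   # runs of horizontal whitespace -> one space
    $line =~ s/^ //;         # drop the (now single) leading space
    $line =~ s/ $//;         # drop the (now single) trailing space
    return $line;
}

# Squeeze every line and skip lines that end up empty.
sub squeeze_text {
    my ($text) = @_;
    my @kept;
    for my $line (split /\n/, $text) {
        my $clean = squeeze_spaces($line);
        push @kept, $clean if length $clean;
    }
    return join('', map { "$_\n" } @kept);
}

my $sample = "     aword aword aword           aword   aword\n\nbword bword             bword\n\n\n";
# prints two lines: "aword aword aword aword aword" and "bword bword bword"
print squeeze_text($sample);
```

The same idea fits in a one-liner over a file: `perl -ne 's/[ \t]+/ /g; s/^ //; s/ $//; print if /\S/' my_big_file`, where `print if /\S/` skips lines that contain no non-space character.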

Expert Comment

ID: 34161350
The following one-liner:
perl -e'open(F,$ARGV[0])||die; $_=join("\n",<F>); s/(\s)\s+/$1/gm; 1 while( s/(^\s|^\d\n|^\d\s)//gm); s/\s*www\s*//g; print' my_big_file
which produces:
aword aword aword aword aword
bword bword bword
40,000 sss ss ss $1000Aug 21, 2007
Oct 29, 2008

Is that what you wanted?
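The one-liner above strips digits only at the start of a line (`^\d\s`, `^\d\n`), so a lone digit in the middle of a line would survive. One way to catch standalone digits 1-9 anywhere is a pair of lookarounds, sketched below. The name `drop_lone_digits` is hypothetical, and "standalone" is assumed to mean bounded by whitespace or a line boundary, so `#40,000` and `$1000` are kept and `0` is left alone, as the question asks only about 1-9.

```perl
use strict;
use warnings;

# Remove digits 1-9 that stand alone, i.e. whose neighbours are whitespace
# or the start/end of the line. Digits inside tokens such as "#40,000" or
# "$1000" have a non-space neighbour, so they are kept.
sub drop_lone_digits {
    my ($line) = @_;
    $line =~ s/(?<!\S)[1-9](?!\S)//g;   # delete each standalone digit
    $line =~ s/[ \t]+/ /g;              # collapse the gaps left behind
    $line =~ s/^ | $//g;                # trim one leading/trailing space
    return $line;
}

print drop_lone_digits("3    #40,000 sss ss           ss \$1000"), "\n";
print drop_lone_digits("1 2 3 4 5 6 8 9"), "\n";      # prints an empty line
print drop_lone_digits("route 66 and 0 stay"), "\n";  # unchanged: 66 and 0 kept
```

The lookbehind `(?<!\S)` and lookahead `(?!\S)` test the neighbours without consuming them, so adjacent lone digits such as `1 2 3` are all removed in a single `s///g` pass instead of needing a `1 while` loop.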


Author Comment

ID: 34161375
Superb, thanks! I also want to remove any date in that format (e.g. Aug 21, 2007). Thanks again.

Accepted Solution

Answer by: jeromee
ID: 34161712
Here you go:

perl -e'open(F,$ARGV[0])||die; $_=join("\n",<F>); s/(\s)\s+/$1/gm; 1 while( s/(^\s|^\d\n|^\d\s)//gm); s/\s*www\s*//g; s/(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d+, \d{4}\s*//g; print' my_big_file
which produces:
aword aword aword aword aword
bword bword bword
40,000 sss ss ss $1000
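Because both one-liners join the whole file into one string, `s/\s*www\s*//g` can also swallow the newline next to `www`. That is why the first run printed `$1000Aug 21, 2007` on one line. A per-line variant avoids that. The sketch below combines the rules from the question. `clean_line`, `clean_text` and `$MONTH` are hypothetical names. The date pattern assumes the `Mon D, YYYY` format with English three-letter month abbreviations, and `www` is removed only as a whole word.

```perl
use strict;
use warnings;

# Three-letter English month abbreviations, as in "Aug 21, 2007".
my $MONTH = qr/(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/;

# Clean a single line (no trailing newline); returns '' if nothing is left.
sub clean_line {
    my ($line) = @_;
    $line =~ s/\b$MONTH\s+\d{1,2},\s*\d{4}\b//g;  # dates like "Oct 29, 2008"
    $line =~ s/\bwww\b//g;                        # the word "www"
    $line =~ s/(?<!\S)[1-9](?!\S)//g;             # standalone digits 1-9
    $line =~ s/\s+/ /g;                           # collapse any whitespace
    $line =~ s/^ | $//g;                          # trim both ends
    return $line;
}

# Clean every line of a text block and drop lines that end up empty.
sub clean_text {
    my ($text) = @_;
    my @out = grep { length } map { clean_line($_) } split /\n/, $text;
    return join('', map { "$_\n" } @out);
}

# The question's sample, with the trailing "# ..." remarks left out.
my $sample = join "\n",
    "     aword aword aword           aword   aword",
    "",
    "bword bword             bword",
    "",
    "3    #40,000 sss ss           ss \$1000",
    "1 2 3 4 5 6 8 9",
    "www Aug 21, 2007",
    "Oct 29, 2008";

print clean_text($sample);
```

To run it over a file instead of the embedded sample, a loop such as `while (my $l = <>) { chomp $l; my $c = clean_line($l); print "$c\n" if length $c }` could replace the final `print`. Processing one line at a time also keeps memory use flat on a large file.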

Expert Comment

ID: 34177998
Glad I was able to help.
Happy Perling!
