?
Solved

Remove extra spaces, empty lines, dates,

Posted on 2010-11-17
4
Medium Priority
?
510 Views
Last Modified: 2012-06-21
Hi,

How can I remove multiple spaces and empty lines? Also, I need to remove all single digits from a large file.

Here's what the file looks like:

     aword aword aword           aword   aword

bword bword             bword

1
3
2
4
5

3    #40,000 sss ss           ss $1000 # In this case I would want to remove 1,3,2,4,5,3 but not #40,000 or $1000


1 2 3 4 5 6 8 9 #Need to remove any standalone character 1-9


www Aug 21, 2007 #need to remove any instance of www


Oct 29, 2008 # Need to remove any occurrence of a date


Thanks a lot in advance.








 
0
Comment
Question by:faithless1
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
4 Comments
 
LVL 10

Expert Comment

by:jeromee
ID: 34161350
The following:
perl -e'open(F,$ARGV[0])||die; $_=join("\n",<F>); s/(\s)\s+/$1/gm; 1 while( s/(^\s|^\d\n|^\d\s)//gm); s/\s*www\s*//g; print' my_big_file

Returns

aword aword aword aword aword
bword bword bword
40,000 sss ss ss $1000Aug 21, 2007
Oct 29, 2008

is that what you wanted?


0
 

Author Comment

by:faithless1
ID: 34161375
Superb thanks! I also wanted to remove any instance of a date that follows that format (Aug 21, 2007 etc). Thanks again
0
 
LVL 10

Accepted Solution

by:
jeromee earned 2000 total points
ID: 34161712
Here you go:

perl -e'open(F,$ARGV[0])||die; $_=join("\n",<F>); s/(\s)\s+/$1/gm; 1 while( s/(^\s|^\d\n|^\d\s)//gm); s/\s*www\s*//g; s/(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d+, \d{4}\s*//g; print' my_big_file

aword aword aword aword aword
bword bword bword
40,000 sss ss ss $1000
0
 
LVL 10

Expert Comment

by:jeromee
ID: 34177998
Glad I was able to help.
Happy Perling!
0

Featured Post

VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

As most anyone who uses or has come across them can attest to, regular expressions (regex) are a complicated bit of magic. Packed so succinctly within their cryptic syntax lies a great deal of power. It's not the "take over the world" kind of power,…
Checking the Alert Log in AWS RDS Oracle can be a pain through their user interface.  I made a script to download the Alert Log, look for errors, and email me the trace files.  In this article I'll describe what I did and share my script.
Learn several ways to interact with files and get file information from the bash shell. ls lists the contents of a directory: Using the -a flag displays hidden files: Using the -l flag formats the output in a long list: The file command gives us mor…
Six Sigma Control Plans
Suggested Courses

770 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question