Solved

Remove extra spaces, empty lines, dates,

Posted on 2010-11-17
4
505 Views
Last Modified: 2012-06-21
Hi,

How can I remove multiple spaces and empty lines? Also, I need to remove all single digits from a large file.

Here's what the file looks like:

     aword aword aword           aword   aword

bword bword             bword

1
3
2
4
5

3    #40,000 sss ss           ss $1000 # In this case I would want to remove 1,3,2,4,5,3 but not #40,000 or $1000


1 2 3 4 5 6 8 9 #Need to remove any standalone character 1-9


www Aug 21, 2007 #need to remove any instance of www


Oct 29, 2008 # Need to remove any occurrence of a date


Thanks a lot in advance.








 
0
Comment
Question by:faithless1
  • 3
4 Comments
 
LVL 10

Expert Comment

by:jeromee
ID: 34161350
The following:
perl -e'open(F,$ARGV[0])||die; $_=join("\n",<F>); s/(\s)\s+/$1/gm; 1 while( s/(^\s|^\d\n|^\d\s)//gm); s/\s*www\s*//g; print' my_big_file

Returns

aword aword aword aword aword
bword bword bword
40,000 sss ss ss $1000Aug 21, 2007
Oct 29, 2008

is that what you wanted?


0
 

Author Comment

by:faithless1
ID: 34161375
Superb thanks! I also wanted to remove any instance of a date that follows that format (Aug 21, 2007 etc). Thanks again
0
 
LVL 10

Accepted Solution

by:
jeromee earned 500 total points
ID: 34161712
Here you go:

perl -e'open(F,$ARGV[0])||die; $_=join("\n",<F>); s/(\s)\s+/$1/gm; 1 while( s/(^\s|^\d\n|^\d\s)//gm); s/\s*www\s*//g; s/(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d+, \d{4}\s*//g; print' my_big_file

aword aword aword aword aword
bword bword bword
40,000 sss ss ss $1000
0
 
LVL 10

Expert Comment

by:jeromee
ID: 34177998
Glad I was able to help.
Happy Perling!
0

Featured Post

Networking for the Cloud Era

Join Microsoft and Riverbed for a discussion and demonstration of enhancements to SteelConnect:
-One-click orchestration and cloud connectivity in Azure environments
-Tight integration of SD-WAN and WAN optimization capabilities
-Scalability and resiliency equal to a data center

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

As most anyone who uses or has come across them can attest to, regular expressions (regex) are a complicated bit of magic. Packed so succinctly within their cryptic syntax lies a great deal of power. It's not the "take over the world" kind of power,…
There are many situations when we need to display the data in sorted order. For example: Student details by name or by rank or by total marks etc. If you are working on data driven based projects then you will use sorting techniques very frequently.…
Learn several ways to interact with files and get file information from the bash shell. ls lists the contents of a directory: Using the -a flag displays hidden files: Using the -l flag formats the output in a long list: The file command gives us mor…
In a recent question (https://www.experts-exchange.com/questions/29004105/Run-AutoHotkey-script-directly-from-Notepad.html) here at Experts Exchange, a member asked how to run an AutoHotkey script (.AHK) directly from Notepad++ (aka NPP). This video…

820 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question