We help IT Professionals succeed at work.

Delete lines from long file UNLESS the line starts with specified character (Perl)

hankknight
hankknight asked
on
256 Views
Last Modified: 2013-11-05
Hello,

I have a file that is more than 800 mb (it is a .sql dump saved in UTF-8).

It has useful information but also allot of junk.

Most of the lines that I do NOT need start with funny characters like this:

ÂÈÒØÙæçè÷øÿ

So I want to REMOVE all lines UNLESS they start with one of the following:

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'"`\/-#$*()

Lines may ALSO begin with a space or a tab.

If a line starts with ANYTHING else then I want the entire line DELETED.

How can I do this with a Perl script that I can exicute from a command line?

Thanks!
Comment
Watch Question

Top Expert 2005
Commented:
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION
Top Expert 2009

Commented:
You might want to use [\x20-\x7E].  This will get all normal characters.
See here for the normal characters that you are missing in the above:
http://www.asciitable.com/

You could use this in the command given my ravenpl, just change the characters matched.

Author

Commented:
Thanks.

I tried both of your ideas and neither one worked.  They both created 800mb files, meaning they deleted nothing.

cat dump.sql | perl -ne 'next unless(m/^[\sa-zA-Z\"\`\\\/\-\#\$\*\(\)]/); print' > fix1.sql
cat dump.sql | perl -ne 'next unless(m/^[\x20-\x7E]/); print' > fix2.sql

I have placed a small except from the real file here for testing (just the first 1.5 mb)
https://filedb.experts-exchange.com/incoming/ee-stuff/4405-fixDump.zip


Top Expert 2009

Commented:
The dump.sql file has 46 lines, with lines 45 and 46 being about 1,000,000 characters long each.
Lines 45 and 46 are each an INSERT statement.
It looks as if must of that data is an embeded image (looks like JPEG data at quick look).

The fixed.sql file has 45 lines - only 1 INSERT statement.
The INSERT INTO that begins at byte 2142 is actually data within the 11 (content) column in the first INSERT.

I think if you try to use the fixed.sql, you'll get an error.
What are you trying to do?  Are you trying to remove the content column?  Obviously, that data will be lost if you do.  But if you do, you need to remove it properly, or the file will become useless.

Author

Commented:
I have a .sql dump that is very important to me.  My server crashed and I MUST restore this .sql file backup.  

But it is 800mb large and when I try to restore it from a command line I get a mySQL error.

I don't mind manually going in there and removing some junk but there is way too much junk to remove.

I have another idea?  Maybe remove ALL lines that are longer than 500 characters and leave the rest?  What was left would be more manageable for me.

Yes, most of the useless data are binary jpgs  or png images that I do not need.
Top Expert 2009

Commented:
In the sample that you posted, if you remove all lines > 500 characters, you will be left with only the create table - no data.

What is the error you get when you try to restore the file?  

Author

Commented:
This is the error I am getting:

ERROR 1064 (42000) at line 158: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''PNG\r\n\Z\n\0\0\0\rIHDR\0\0Ø\0\0\0\0\0´Âø·\0\0\0+tEXtCreation Time\0Thu 9' at line 1

Can you think of a way to fix it?
Top Expert 2009

Commented:
Well, if you don't care about the content column data, we could remove it from the .sql file.  You should still be able to get all other data.

You might want to ask in the MySQL area to see if anyone there knows of a way to solve the problem without losing your data.

Author

Commented:
I have the mySQL people and they can't help    :-(

How can I get rid of all content column data?
Top Expert 2009
Commented:
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION
Top Expert 2009

Commented:
Was this file ever transferred via FTP?

Author

Commented:
->>Was this file ever transferred via FTP?

Yes.  Maybe that is my problem!  I tried to transfer it three times and it was bust all three times. What is your idea?
Top Expert 2009

Commented:
I'm not certain on the format of the file, but I'm guessing the file needed to be transferred as BINARY.  If it was transferred as ASCII, that would have corrupted the data.

If the FTP transfer failed, that might have made things worse...

Gain unlimited access to on-demand training courses with an Experts Exchange subscription.

Get Access
Why Experts Exchange?

Experts Exchange always has the answer, or at the least points me in the correct direction! It is like having another employee that is extremely experienced.

Jim Murphy
Programmer at Smart IT Solutions

When asked, what has been your best career decision?

Deciding to stick with EE.

Mohamed Asif
Technical Department Head

Being involved with EE helped me to grow personally and professionally.

Carl Webster
CTP, Sr Infrastructure Consultant
Empower Your Career
Did You Know?

We've partnered with two important charities to provide clean water and computer science education to those who need it most. READ MORE

Ask ANY Question

Connect with Certified Experts to gain insight and support on specific technology challenges including:

  • Troubleshooting
  • Research
  • Professional Opinions
Unlock the solution to this question.
Join our community and discover your potential

Experts Exchange is the only place where you can interact directly with leading experts in the technology field. Become a member today and access the collective knowledge of thousands of technology experts.

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.