Solved

Delete duplicate rows in a file, as well as all commas and values that are -999 or 999, using Perl

Posted on 2011-09-02
Last Modified: 2012-08-14
I have a file below. It has duplicate rows for each day. I only want a single row for each day, so it should:
1. Keep the first of each original row, then delete all the duplicates.

Then I want to:

2. Delete commas, and

3. Delete negative values, or values of -999 or 999.
test.txt
Question by:libertyforall2
 
LVL 9

Expert Comment

by:parparov
ID: 36475832
Use sort -u to get rid of the duplicates.
Use sed 's/,//g' to eliminate commas.
Use sed 's/999//g' to eliminate 999s, and so on.
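As a minimal sketch of how such commands can be typed at a shell prompt, the steps can be chained with pipes so that each command reads the previous one's output. The sample data and the names workdir, sample.txt and cleaned.txt are made up for illustration, since the contents of test.txt are not shown in this thread. The g flag makes sed replace every comma on a line rather than only the first. A plain substitution of 999 would also alter values such as 1999, so that step is left out of this sketch.

```shell
# Hypothetical sample standing in for test.txt
# (assumed: comma-separated values, one row per line).
workdir=$(mktemp -d)
printf '%s\n' \
  '06-17-2011 00:00:00,7,8,999' \
  '06-17-2011 00:00:00,7,8,999' \
  '06-18-2011 00:00:00,4,-999,12' > "$workdir/sample.txt"

# sort -u drops exact duplicate lines; sed then turns every comma into a space.
sort -u "$workdir/sample.txt" | sed 's/,/ /g' > "$workdir/cleaned.txt"

# Show the result: two rows remain, with no commas.
cat "$workdir/cleaned.txt"
```

Note that sort -u also reorders the rows, which is harmless here only because the rows start with a date in the same month and year.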
 

Author Comment

by:libertyforall2
ID: 36476038
How do I write that in the command window? Also, how would I eliminate ALL negative numbers?
 

Author Comment

by:libertyforall2
ID: 36476048
uila% sort -u test.txt >text2.txt

It did not eliminate the duplicate rows.

 
LVL 9

Accepted Solution

by:
parparov earned 500 total points
ID: 36476219
Your file looks unclean: it has carriage returns (\r) instead of newlines (\n), so sort sees the whole file as a single line.
This sequence should do the job:
perl -pe 's/\r/\n/g' < test.txt | sort -u | sed 's/,/ /g' | sed 's/ 999//g' | sed 's/ -999//g'
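A variation on the same idea, offered as a sketch rather than as the answer's own method: it swaps in tr for the Perl one-liner and awk for sort and sed. The awk step keeps the first copy of each row in its original order, and it drops 999 and -999 only when they are a whole field, so a value such as 9995 is left alone (a substring substitution of " 999" would turn "8 9995" into "85"). The function name clean_rows and the sample data are hypothetical, and the comma-separated layout is an assumption.

```shell
# Hypothetical sample: records separated by bare carriage returns,
# as the answer suspects for test.txt.
workdir=$(mktemp -d)
printf '06-17-2011 00:00:00,7,-999,8\r06-17-2011 00:00:00,7,-999,8\r06-18-2011 00:00:00,999,4,9995\r' \
  > "$workdir/cr.txt"

clean_rows() {
  # Turn each carriage return into a newline, then filter with awk.
  tr '\r' '\n' < "$1" |
    awk -F',' '
      NF == 0    { next }   # skip blank lines (e.g. from \r\n pairs)
      seen[$0]++ { next }   # skip rows that have already been printed
      {
        out = $1            # first field: date and time
        for (i = 2; i <= NF; i++)
          if ($i != "999" && $i != "-999")   # whole-field match only
            out = out " " $i
        print out
      }'
}

clean_rows "$workdir/cr.txt"
# prints:
# 06-17-2011 00:00:00 7 8
# 06-18-2011 00:00:00 4 9995
```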

 

Author Comment

by:libertyforall2
ID: 36476396
Almost. It worked, but it left negative numbers and rows with no data. Some of the output lines are shown below:

06-17-2011 00:00:00 7 8 18 7 15 9 8 32 14 14 12 11 5 -1 -1 -1 0 0 0 8 14 14 12 6
06-18-2011 00:00:00 4 8 8 4 1 2 5 17 13 11 8 15 11 10 16 12 11 11 9 10 8 4 5 6
06-19-2011 00:00:00 8 10 13 14 16 12 10 11 16 9 8 6 15 7 9 13 13 12 14 10 6 7 8 7
06-20-2011 00:00:00 6 7 8 5 4 7 9 18 8 6 6 10 12 8 7 9 9 8 7 7 4 3 2 3
06-21-2011 00:00:00 8 6 2 2 5 6 3 7 11 7 10 10 5 3 7 8 5 7 5 5 5 3 5 6
06-22-2011 00:00:00 6 5 4 3 3 3 2 20 8 6 4 10 18 17 19 16 22 24 10 22 21 23
06-23-2011 00:00:00 16 15 18 18 17 14 16 28 20 12 11
06-24-2011 00:00:00
06-25-2011 00:00:00
06-26-2011 00:00:00
06-27-2011 00:00:00
06-28-2011 00:00:00
06-29-2011 00:00:00 5 5 5 4 6 5 4 5 5 5 6 6 4 6 5
06-30-2011 00:00:00 2 3 5 2 4 6 6 8 8 4 4 5 6 4
06-30-2011 00:00:00
07-01-2011 00:00:00
07-02-2011 00:00:00
07-03-2011 00:00:00
07-04-2011 00:00:00
07-05-2011 00:00:00 4 14 13 10 11 13 14 22
07-06-2011 00:00:00 20 18 13 13 12 6 2 6 8 7 12 7 6 11 8 6 11 25 27 24 18 20 25 14
07-07-2011 00:00:00 14 7 5 8 11 11 11 15 23 12 16 20 9 8 9 12 16 13 17 16 19 18 20 22
 
LVL 9

Expert Comment

by:parparov
ID: 36476477
Is deleting rows with no data required?
 

Author Comment

by:libertyforall2
ID: 36476505
I need to delete rows with no data, but the negative data points are more important. Let's just focus on that first.
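The thread does not show how this was finally done. One possible sketch, assuming the space-separated layout of the output above (date in the first field, time in the second, values after that): drop every value that starts with a minus sign, then print a row only if at least one value is left. The function name drop_negatives_and_empty_rows is hypothetical.

```shell
drop_negatives_and_empty_rows() {
  # Reads the named files, or standard input when no file is given.
  awk '
    {
      out = $1 " " $2          # assumed: date and time come first
      kept = 0
      for (i = 3; i <= NF; i++)
        if ($i !~ /^-/) {      # keep values that do not start with "-"
          out = out " " $i
          kept++
        }
      if (kept > 0) print out  # rows with no remaining values are dropped
    }' "$@"
}

# Usage with a few made-up rows in the same shape as the output above.
printf '%s\n' \
  '06-17-2011 00:00:00 5 -1 -1 0 8' \
  '06-24-2011 00:00:00' \
  '06-29-2011 00:00:00 5 5' | drop_negatives_and_empty_rows
# prints:
# 06-17-2011 00:00:00 5 0 8
# 06-29-2011 00:00:00 5 5
```

A row whose values are all negative is also dropped by this sketch, since nothing is left to print. Zeros are kept.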
 

Author Closing Comment

by:libertyforall2
ID: 36477268
The files worked.