Solved

Regex: how to include and exclude specific characters?

Posted on 2016-09-01
24
74 Views
Last Modified: 2016-09-01
I use this tool, Word List Updater 2.7.
All I want is to filter out (exclude) all email domains and characters of this type: ®©�ØÇÖÄüèöµÃ‖|¦ while still including these: :^/\\,.+ .;
The code below excludes these as well: ^/\\,.+:;
^[^/\\{«»„““”‘’|\n\t….,;`^"<>'}+:?®©�ØÇÖÄüèöµÃ‖|¦]*$

This is the list I want to filter:

john>123
john:123
john;123
john/123
john@123
john!123
john#123
john%123
john^123
john=123
john*1
john&1
john+1
john\123
èáàúùóò
acan
itoh
ö
ü


Ã
Ä
Ö
Ç
Ø
RE
�^�O�OsG���
w���n���
john-123
john_123
marcy
µ
john
marcy
michael
test
&amp;lt;
&amp;lt
&amp;gt;
&amp;gt
&amp;lt;&amp;gt;
&amp;
&amp

^
��
¦
johnny$1234
john~123
john)123
john(123



So the final list must look like this:

john>123
john:123
john;123
john/123
john@123
john!123
john#123
john%123
john^123
john=123
john*1
john&1
john+1
john\123
acan
itoh
RE
john-123
john_123
marcy
john
marcy
michael
test
^
johnny$1234
john~123
john)123
john(123


Question by:john lambert
24 Comments
 
LVL 51

Expert Comment

by:Rgonzo1971
ID: 41779925
Hi,

Please try:


^[^{«»„““”‘’|\n\t…`"<>'}?®©�ØÇÖÄüèöµÃ‖|¦]*$ 


Regards
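A note for readers comparing the two patterns: a negated class [^...] rejects only the characters listed inside it, so removing / \ , . + : ; ^ from the list is what lets those characters through. Below is a minimal Python sketch of that idea. It is not the tool's own regex engine, the names EXCLUDED and keep are made up for illustration, and it cannot tell us how Word List Updater decodes the file, which may be why results differ there.

```python
import re

# Characters to reject, taken from the pattern posted above.
# "\ufffd" is the replacement character that is displayed as "�".
EXCLUDED = "{}«»„“”‘’|\n\t…`\"<>'?®©\ufffdØÇÖÄüèöµÃ‖¦"

# [^...] matches any single character NOT in the list, so a line passes
# only when none of its characters appear in EXCLUDED.
ALLOWED_LINE = re.compile("^[^" + re.escape(EXCLUDED) + "]*$")

def keep(line: str) -> bool:
    """Return True when the line contains no excluded character."""
    return ALLOWED_LINE.match(line) is not None

for sample in ["john:123", "john/123", "john^123", "ö", "john>123"]:
    print(sample, keep(sample))
```

Note that > is still in the excluded list, so in this sketch john>123 is rejected even though the question wants it kept; it would have to be removed from the class as well.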
0
 

Author Comment

by:john lambert
ID: 41779961
No, that doesn't work; this is the result. Please try the same tool I use; it is easy to find and clean.

john/123
test:123
test;123
john@123
john!123
john#123
john%123
john^123
john=123
john*1
john&1
john+1
john\123
èáàúùóò
acan
itoh
ö
ü


Ã
Ä
Ö
Ç
Ø
RE
�^�O�OsG���
w���n���
john-123
john_123
marcy
µ
john
marcy
michael
test
&amp;lt;
&amp;lt
&amp;gt;
&amp;gt
&amp;lt;&amp;gt;
&amp;
&amp

^
��
¦
johnny$1234
john~123
john)123
john(123


0
 

Author Comment

by:john lambert
ID: 41779975
Or I can give you mine.
0
 
LVL 51

Expert Comment

by:Rgonzo1971
ID: 41780182
Sorry, I can't help further.
0
 

Author Comment

by:john lambert
ID: 41780393
OK, no problem. Thanks anyway.
0
 
LVL 40

Expert Comment

by:Subsun
ID: 41780475
Are you reading from a text file? If so, I can try a PowerShell script to achieve the result.
0
 

Author Comment

by:john lambert
ID: 41780479
Yes, I use that tool with a huge text file (8 GB).
0
 
LVL 40

Expert Comment

by:Subsun
ID: 41780511
Try the following code. It works for the input and output data posted in the question.
GC C:\temp\input.txt | ?{$_ -match "^(?!(&|\?)).*([$>:;/@!#%^=*&+\\\-_$~)(]).*(?<!;)$|^[a-zA-Z].*[a-zA-Z]$"} | Out-File C:\temp\output.txt
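To see what the pattern in this one-liner accepts, here is a small Python sketch that copies the same regex text and runs it over a few lines from the question. It assumes Python's re module treats these constructs (lookahead, lookbehind, character class) the same way .NET does for this input; KEEP and the sample list are illustrative names, not part of the posted script.

```python
import re

# The pattern from the one-liner above, copied as-is into Python raw strings.
KEEP = re.compile(
    # Branch 1: does not start with & or ?, contains one listed symbol,
    # and does not end with a semicolon.
    r"^(?!(&|\?)).*([$>:;/@!#%^=*&+\\\-_$~)(]).*(?<!;)$"
    # Branch 2: starts and ends with an ASCII letter.
    r"|^[a-zA-Z].*[a-zA-Z]$"
)

lines = ["john>123", "&amp;lt;", "èáàúùóò", "RE", "\ufffd^\ufffdO", "ö", "john;123"]
kept = [line for line in lines if KEEP.search(line)]
print(kept)
```

In this sketch the garbled line containing ^ is kept, because ^ is one of the listed symbols and branch 1 does not look at the other characters in the line.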


1
 

Author Comment

by:john lambert
ID: 41780524
In your result, this must disappear too: �^�O�OsG���
and these 2 lines too:

john@yahoo.com
john@live.com

It can handle huge text files, right? 8-10 GB?
I added 2 new lines (2 emails), so: no emails, please.

john/123
test:123
test;123
john@123
john@yahoo.com
john@live.com
john!123
john#123
john%123
john^123
john=123
john*1
john&1
john+1
john\123
�^�O�OsG���
john-123
john_123
^
johnny$1234
john~123
john)123
john(123


0
 
LVL 40

Expert Comment

by:Subsun
ID: 41780555
Try..
GC C:\temp\input.txt | ?{$_ -match "^(?!(&|\?|�)).*([$>:;/@!#%^=*&+\\\-_$~)(]).*(?<!;)$|^[a-zA-Z].*[a-zA-Z]$"} | Out-File C:\temp\output.txt


1
 

Author Comment

by:john lambert
ID: 41780568
Yes, much better. One last favour: please make the emails disappear.

john@yahoo.com
john@live.com
0
 

Author Comment

by:john lambert
ID: 41780575
So the final list must look like this.

Input list:

john/123
test:123
test;123
john@123
john!123
john@yahoo.com
john@live.com
john#123
john%123
john^123
john=123
john*1
john&1
john+1
john\123
èáàúùóò
acan
itoh
ö
ü



Ã
Ä
Ö
Ç
Ø
RE

�^�O�OsG���
w���n���
john-123
john_123
marcy
µ
john
marcy
michael
test
&amp;lt;
&amp;lt
&amp;gt;
&amp;gt
&amp;lt;&amp;gt;
&amp;
&amp

^
��
¦
johnny$1234
john~123
john)123
john(123





output list:
---------------
john/123
test:123
test;123
john@123
john!123
john#123
john%123
john^123
john=123
john*1
john&1
john+1
john\123
acan
itoh
RE
john-123
john_123
marcy
john
marcy
michael
test
johnny$1234
john~123
john)123
john(123


0
 
LVL 40

Expert Comment

by:Subsun
ID: 41780576
Try..
GC C:\temp\input.txt | ?{$_ -match "^(?!(&|\?|�)).*([$>:;/@!#%^=*&+\\\-_$~)(]).*(?<!;)$|^[a-zA-Z].*[a-zA-Z]$" -and $_ -notmatch '\w+@\w+\.\w+'} | Out-File C:\temp\output.txt
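The extra -notmatch test drops any line that contains an email-like token anywhere in it. A minimal Python sketch of just that second condition follows; EMAIL_LIKE and drop_emails are hypothetical names, and the sample list is taken from the lines in this thread.

```python
import re

# Same email-like pattern as the -notmatch test above; it is searched
# anywhere in the line, not anchored to the start or end.
EMAIL_LIKE = re.compile(r"\w+@\w+\.\w+")

def drop_emails(lines):
    """Yield only the lines that do not contain an email-like token."""
    for line in lines:
        if not EMAIL_LIKE.search(line):
            yield line

sample = ["john@123", "john@yahoo.com", "john@live.com", "john!123"]
result = list(drop_emails(sample))
print(result)
```

john@123 survives because nothing after the @ has the word.word shape. One limit worth knowing: \w does not match a hyphen, so in this sketch an address such as a@my-site.com is not recognised as an email.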


1
 

Author Comment

by:john lambert
ID: 41780593
Yes, perfect! To make everything complete, could you modify your script a bit to do the following as well?

1. Handle huge files (10-20-50 GB)?
2. Remove extra spaces?
3. Remove duplicates from huge files using this regex?


For example, input:
john
john
john^123
  money
carlos 123
  marcos 123
john&123
john&123



output:
------------
john
john^123
money
carlos123 
marcos123
john&123


0
 
LVL 40

Expert Comment

by:Subsun
ID: 41780615
1. Handle huge files (10-20-50 GB)?
Not sure, I have not tested it.
2. Remove duplicates from huge files using this regex?
3. Remove extra spaces?
For 2 and 3, Select -Unique drops duplicate lines and Trim() strips leading and trailing spaces:
GC C:\temp\input.txt | ?{$_ -match "^(?!(&|\?|�)).*([$>:;/@!#%^=*&+\\\-_$~)(]).*(?<!;)$|^[a-zA-Z].*[a-zA-Z]$" -and $_ -notmatch '\w+@\w+\.\w+'} | %{$_.Trim()} | Select -Unique | Out-File C:\temp\output.txt
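Whether this copes with 10-50 GB is left untested in the thread. As a rough illustration of what drives memory use, here is a Python sketch that reads line by line and removes duplicates. Any exact de-duplication has to remember the lines it has already seen, so memory grows with the number of distinct lines rather than with the file size. The function name is hypothetical, and skipping empty lines is an assumption of this sketch.

```python
import io

def unique_stripped(lines):
    """Yield each stripped, non-empty line once, in first-seen order."""
    seen = set()  # grows with the number of distinct lines
    for line in lines:
        text = line.strip()
        if text and text not in seen:
            seen.add(text)
            yield text

# io.StringIO stands in for a real file. With a real file, iterating over
# open(path, encoding="utf-8", errors="replace") also reads one line at a time.
demo = io.StringIO("john\njohn\n  money\njohn&123\njohn&123\n")
result = list(unique_stripped(demo))
print(result)
```

If the distinct lines themselves do not fit in memory, a different approach (for example sorting the file on disk first) would be needed; that is outside what this sketch shows.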


0
 

Author Comment

by:john lambert
ID: 41780625
It doesn't work; I can't see these lines:
input
-------
  money
carlos 123
  marcos 123



Output (these 3 lines don't appear):
----------
money
carlos 123
 marcos 123


0
 
LVL 40

Expert Comment

by:Subsun
ID: 41780645
Try..
GC C:\temp\input.txt | ?{$_ -match "^(?!(&|\?|�)).*([$>:;/@!#%^=*&+\\\-_$~)(\s]).*(?<!;)$|^[a-zA-Z].*[a-zA-Z]$" -and $_ -notmatch '\w+@\w+\.\w+'} | %{$_.Trim()} | Select -Unique | Out-File C:\temp\output.txt


0
 

Author Comment

by:john lambert
ID: 41780655
I can see in your snapshot:

carlos 123
marcos 123

I want:

carlos123
marcos123

No extra spaces at all.

My output is this; I used your last string:

john/123
test:123
test;123
john@123
john^123
john!123
john#123
john%123
john=123
john*1
john&1
john+1
john\123
acan
itoh
RE
john-123
john_123
marcy
john
michael
test
^
johnny$1234
john~123
john)123
john(123


0
 
LVL 40

Accepted Solution

by:
Subsun earned 500 total points
ID: 41780662
Try..
GC C:\temp\input.txt | ?{$_ -match "^(?!(&|\?|�)).*([$>:;/@!#%^=*&+\\\-_$~)(\s]).*(?<!;)$|^[a-zA-Z].*[a-zA-Z]$" -and $_ -notmatch '\w+@\w+\.\w+'} | %{$_.Trim() -replace "\s"} | Select -Unique | Out-File C:\temp\output.txt
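The change in this version is the -replace "\s" step, which removes every whitespace character rather than only the leading and trailing ones. A short Python sketch of the difference, using one of the sample lines from the thread (squeeze is an illustrative name):

```python
import re

def squeeze(line: str) -> str:
    """Remove every whitespace character, not only leading and trailing ones."""
    return re.sub(r"\s", "", line)

trimmed = "  marcos 123".strip()     # only the ends are cleaned
squeezed = squeeze("  marcos 123")   # inner spaces are removed as well
print(repr(trimmed), repr(squeezed))
```

Because all whitespace is removed, a separate trim adds nothing to the result here, and lines that differed only by spaces become identical before duplicates are removed.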


1
 

Author Comment

by:john lambert
ID: 41780693
Oh my God! That was hard work; you have nerves of steel, Subsun. It seems to remove duplicates as well, amazing. I just hope it can handle HUGE text files, like 8 GB or 20 GB. I will test tomorrow with huge text files and hope it works fine.
0
 

Author Comment

by:john lambert
ID: 41780698
It works perfectly, thank you. Just tell me: does it handle huge text files?
0
 
LVL 40

Expert Comment

by:Subsun
ID: 41780707
I just tested the code I posted in my last comment, and I am not getting �^�O�OsG��� in the output.
0
 

Author Comment

by:john lambert
ID: 41780713
No, my mistake; the code is PERFECT. Congratulations and a thousand thanks!
You worked a lot. God bless you!
0
 

Author Closing Comment

by:john lambert
ID: 41780726
Working perfectly. Thanks for your hard work, God bless you.
1
