Solved

Regex: how do I include some characters and exclude others?

Posted on 2016-09-01
Last Modified: 2016-09-01
I use this tool: Word List Updater 2.7.
All I want is to filter out (exclude) all email domains and characters of this type: ®©�ØÇÖÄüèöµÃ‖|¦ while keeping (including) these: :^/\\,.+ .;
The code below excludes these too: ^/\\,.+:;
^[^/\\{«»„““”‘’|\n\t….,;`^"<>'}+:?®©�ØÇÖÄüèöµÃ‖|¦]*$
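
Why the pattern above rejects too much: it is a single negated character class anchored with ^ and $, so a line passes only if it contains none of the characters between the brackets. Because : ^ / \ , . + ; are listed there as well, lines containing them are dropped. A minimal sketch of the difference, written in Python rather than the tool's own regex dialect; REJECTED is only an illustrative subset of the unwanted characters, and the names too_strict, relaxed, and keep are made up for this example:

```python
import re

# A few of the characters the asker wants to reject (illustrative subset).
REJECTED = "®©ØÇÖÄüèöµÃ‖|¦"

# Like the original pattern: the class also lists : ^ / \ , . + ;
# so any line containing one of them is rejected as well.
too_strict = re.compile(r"^[^/\\,.;^+:" + re.escape(REJECTED) + r"]*$")

# Same idea with the wanted punctuation left out of the brackets.
relaxed = re.compile(r"^[^" + re.escape(REJECTED) + r"]*$")


def keep(pattern, lines):
    """Return the lines that the anchored pattern matches in full."""
    return [line for line in lines if pattern.match(line)]


sample = ["john:123", "john/123", "john^123", "ö", "Ã", "marcy"]
print(keep(too_strict, sample))  # ['marcy']
print(keep(relaxed, sample))     # ['john:123', 'john/123', 'john^123', 'marcy']
```

This only illustrates the bracket rule. It does not deal with email addresses, which need a separate test.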

This is the list I want to filter:

john>123
john:123
john;123
john/123
john@123
john!123
john#123
john%123
john^123
john=123
john*1
john&1
john+1
john\123
èáàúùóò
acan
itoh
ö
ü


Ã
Ä
Ö
Ç
Ø
RE
�^�O�OsG���
w���n���
john-123
john_123
marcy
µ
john
marcy
michael
test
&amp;lt;
&amp;lt
&amp;gt;
&amp;gt
&amp;lt;&amp;gt;
&amp;
&amp

^
��
¦
johnny$1234
john~123
john)123
john(123



So the final list must look like this:

john>123
john:123
john;123
john/123
john@123
john!123
john#123
john%123
john^123
john=123
john*1
john&1
john+1
john\123
acan
itoh
RE
john-123
john_123
marcy
john
marcy
michael
test
^
johnny$1234
john~123
john)123
john(123


0
Question by:john lambert
24 Comments
 
LVL 48

Expert Comment

by:Rgonzo1971
ID: 41779925
Hi,

Please try:


^[^{«»„““”‘’|\n\t…`"<>'}?®©�ØÇÖÄüèöµÃ‖|¦]*$ 


Regards
0
 

Author Comment

by:john lambert
ID: 41779961
No, that doesn't work; this is the result. Please try the same tool I use; it is easy to find and clean.

john/123
test:123
test;123
john@123
john!123
john#123
john%123
john^123
john=123
john*1
john&1
john+1
john\123
èáàúùóò
acan
itoh
ö
ü


Ã
Ä
Ö
Ç
Ø
RE
�^�O�OsG���
w���n���
john-123
john_123
marcy
µ
john
marcy
michael
test
&amp;lt;
&amp;lt
&amp;gt;
&amp;gt
&amp;lt;&amp;gt;
&amp;
&amp

^
��
¦
johnny$1234
john~123
john)123
john(123


0
 

Author Comment

by:john lambert
ID: 41779975
Or I can give you mine.
0
 
LVL 48

Expert Comment

by:Rgonzo1971
ID: 41780182
Sorry, I can't help further.
0
 

Author Comment

by:john lambert
ID: 41780393
OK, no problem, thanks anyway.
0
 
LVL 40

Expert Comment

by:Subsun
ID: 41780475
Are you reading from a text file? If so, I can try a PowerShell script to achieve the result.
0
 

Author Comment

by:john lambert
ID: 41780479
Yes, I use that tool with a huge text file (8 GB).
0
 
LVL 40

Expert Comment

by:Subsun
ID: 41780511
Try the following code. It works for the input and output data posted in the question.
GC C:\temp\input.txt | ?{$_ -match "^(?!(&|\?)).*([$>:;/@!#%^=*&+\\\-_$~)(]).*(?<!;)$|^[a-zA-Z].*[a-zA-Z]$"} | Out-File C:\temp\output.txt
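
How the pattern in the command above decides: it has two alternatives. The first accepts a line that does not start with & or ?, contains at least one of the listed symbols, and does not end with ;. The second accepts a line that starts and ends with an ASCII letter. A rough Python port for experimenting, assuming the same sample data; it simplifies (&|\?) to [&?] and drops the capture group, and the names LINE_FILTER and keep_lines are made up for this sketch:

```python
import re

# Branch 1: does not start with & or ?, contains at least one listed symbol,
#           and does not end with ;
# Branch 2: starts and ends with an ASCII letter (so at least two characters).
LINE_FILTER = re.compile(
    r"^(?![&?]).*[$>:;/@!#%^=*&+\\\-_~)(].*(?<!;)$"
    r"|^[a-zA-Z].*[a-zA-Z]$"
)


def keep_lines(lines):
    """Keep only the lines accepted by either branch of LINE_FILTER."""
    return [line for line in lines if LINE_FILTER.search(line)]


sample = ["john>123", "acan", "ö", "&amp;lt;", "èáàúùóò", "^", "john", "µ", "ö:1"]
print(keep_lines(sample))  # ['john>123', 'acan', '^', 'john', 'ö:1']
```

Note that ö:1 is kept: the first branch only checks that a symbol is present, not that the rest of the line is plain ASCII. The pattern appears tuned to the sample list rather than being a general rule, so it is worth testing against real data.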


1
 

Author Comment

by:john lambert
ID: 41780524
In your result, this line must disappear too: �^�O�OsG���
and these 2 lines too:

john@yahoo.com
john@live.com

It can handle huge text files, right? 8-10 GB?
I added 2 new lines (2 emails) to the input, so no emails should remain.

john/123
test:123
test;123
john@123
john@yahoo.com
john@live.com
john!123
john#123
john%123
john^123
john=123
john*1
john&1
john+1
john\123
�^�O�OsG���
john-123
john_123
^
johnny$1234
john~123
john)123
john(123


0
 
LVL 40

Expert Comment

by:Subsun
ID: 41780555
Try..
GC C:\temp\input.txt | ?{$_ -match "^(?!(&|\?|�)).*([$>:;/@!#%^=*&+\\\-_$~)(]).*(?<!;)$|^[a-zA-Z].*[a-zA-Z]$"} | Out-File C:\temp\output.txt


1
 

Author Comment

by:john lambert
ID: 41780568
Yes, much better. One last favour: please make the emails disappear.

john@yahoo.com
john@live.com
0
 

Author Comment

by:john lambert
ID: 41780575
So the final result must look like this.

Input list:

john/123
test:123
test;123
john@123
john!123
john@yahoo.com
john@live.com
john#123
john%123
john^123
john=123
john*1
john&1
john+1
john\123
èáàúùóò
acan
itoh
ö
ü



Ã
Ä
Ö
Ç
Ø
RE

�^�O�OsG���
w���n���
john-123
john_123
marcy
µ
john
marcy
michael
test
&amp;lt;
&amp;lt
&amp;gt;
&amp;gt
&amp;lt;&amp;gt;
&amp;
&amp

^
��
¦
johnny$1234
john~123
john)123
john(123





output list:
---------------
john/123
test:123
test;123
john@123
john!123
john#123
john%123
john^123
john=123
john*1
john&1
john+1
john\123
acan
itoh
RE
john-123
john_123
marcy
john
marcy
michael
test
johnny$1234
john~123
john)123
john(123


0

 
LVL 40

Expert Comment

by:Subsun
ID: 41780576
Try..
GC C:\temp\input.txt | ?{$_ -match "^(?!(&|\?|�)).*([$>:;/@!#%^=*&+\\\-_$~)(]).*(?<!;)$|^[a-zA-Z].*[a-zA-Z]$" -and $_ -notmatch '\w+@\w+\.\w+'} | Out-File C:\temp\output.txt
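
The extra -notmatch test in the command above drops any line that contains word characters, an @, more word characters, a dot, and more word characters. That is why john@123 survives while john@yahoo.com does not. A small Python sketch of just that test; EMAIL_LIKE and drop_email_like are names invented for this example:

```python
import re

# Unanchored, like -notmatch: a hit anywhere in the line counts.
EMAIL_LIKE = re.compile(r"\w+@\w+\.\w+")


def drop_email_like(lines):
    """Remove lines that contain something shaped like name@domain.tld."""
    return [line for line in lines if not EMAIL_LIKE.search(line)]


sample = ["john@123", "john@yahoo.com", "john@live.com", "john!123"]
print(drop_email_like(sample))  # ['john@123', 'john!123']
```

This is a loose shape check, not address validation. For example, an address such as john@my-mail.com would slip through, because \w does not match the hyphen.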


1
 

Author Comment

by:john lambert
ID: 41780593
Yes, perfect! Just wondering: to make everything perfect, could you modify your script a bit to do this too?

1. Handle huge files (10, 20, 50 GB)?
2. Can you remove extra spaces?
3. Can you remove duplicates from huge files using this regex?


For example, input:
john
john
john^123
  money
carlos 123
  marcos 123
john&123
john&123



output:
------------
john
john^123
money
carlos123 
marcos123
john&123


0
 
LVL 40

Expert Comment

by:Subsun
ID: 41780615
1. Handle huge files (10, 20, 50 GB)?
Not sure, I have not tested it.
2. Can you remove extra spaces?
3. Can you remove duplicates from huge files using this regex?
The following trims each line and keeps only unique lines:
GC C:\temp\input.txt | ?{$_ -match "^(?!(&|\?|�)).*([$>:;/@!#%^=*&+\\\-_$~)(]).*(?<!;)$|^[a-zA-Z].*[a-zA-Z]$" -and $_ -notmatch '\w+@\w+\.\w+'} | %{$_.Trim()} | Select -Unique | Out-File C:\temp\output.txt


0
 

Author Comment

by:john lambert
ID: 41780625
It doesn't work; I can't see these lines:
input
-------
  money
carlos 123
  marcos 123



output (these 3 lines don't appear):
----------
money
carlos 123
 marcos 123


0
 
LVL 40

Expert Comment

by:Subsun
ID: 41780645
Try..
GC C:\temp\input.txt | ?{$_ -match "^(?!(&|\?|�)).*([$>:;/@!#%^=*&+\\\-_$~)(\s]).*(?<!;)$|^[a-zA-Z].*[a-zA-Z]$" -and $_ -notmatch '\w+@\w+\.\w+'} | %{$_.Trim()} | Select -Unique | Out-File C:\temp\output.txt


0
 

Author Comment

by:john lambert
ID: 41780655
I can see in your screenshot:

carlos 123
marcos 123

I want:
 
carlos123
marcos123

No extra spaces at all.

This is my output; I used your last command:

john/123
test:123
test;123
john@123
john^123
john!123
john#123
john%123
john=123
john*1
john&1
john+1
john\123
acan
itoh
RE
john-123
john_123
marcy
john
michael
test
^
johnny$1234
john~123
john)123
john(123


0
 
LVL 40

Accepted Solution

by:
Subsun earned 500 total points
ID: 41780662
Try..
Get-Content C:\temp\input.txt |
    Where-Object { $_ -match "^(?!(&|\?|�)).*([$>:;/@!#%^=*&+\\\-_$~)(\s]).*(?<!;)$|^[a-zA-Z].*[a-zA-Z]$" -and $_ -notmatch '\w+@\w+\.\w+' } |
    ForEach-Object { $_.Trim() -replace "\s" } |
    Select-Object -Unique |
    Out-File C:\temp\output.txt
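
The last two stages of the accepted command remove all whitespace from each kept line and then drop repeated lines. The order matters: carlos 123 and carlos123 only count as duplicates once the space is gone. A minimal Python sketch of those two stages alone, using the asker's own example; clean_and_dedupe is a made-up name, and the filtering step is left out here:

```python
def clean_and_dedupe(lines):
    """Strip all whitespace from each line, then drop repeats (first one wins)."""
    seen = set()
    result = []
    for line in lines:
        # split() with no arguments breaks on any whitespace, so joining the
        # pieces removes leading, trailing, and inner spaces in one step.
        cleaned = "".join(line.split())
        if cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


sample = ["john", "john", "john^123", "  money", "carlos 123",
          "  marcos 123", "john&123", "john&123"]
print(clean_and_dedupe(sample))
# ['john', 'john^123', 'money', 'carlos123', 'marcos123', 'john&123']
```

In the accepted command the regex filter runs before the whitespace is removed, so lines are judged with their spaces still in place. That is why \s was added to the symbol list in the pattern.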


1
 

Author Comment

by:john lambert
ID: 41780693
Oh my God! That was hard work; you have nerves of steel, Subsun. It seems to remove duplicates too, amazing. I just hope it can handle HUGE text files, like 8 GB or 20 GB. I will test tomorrow with huge text files; I hope it works fine.
0
 

Author Comment

by:john lambert
ID: 41780698
Working perfectly, thank you. Just tell me whether it can handle huge text files.
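
The thread never settles the huge-file question. One common approach, sketched here in Python under stated assumptions and not tested on files of that size, is to read the file one line at a time instead of loading it whole, and to remember only the lines already written. The function filter_file, its keep parameter, and the demo file names are all hypothetical; keep stands in for whatever line test is wanted:

```python
import os
import tempfile


def filter_file(src_path, dst_path, keep):
    """Stream src_path line by line, writing each kept, cleaned line once."""
    seen = set()
    # errors="replace" turns undecodable bytes into U+FFFD instead of failing.
    with open(src_path, encoding="utf-8", errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for raw in src:
            line = raw.rstrip("\r\n")
            if not keep(line):
                continue
            cleaned = "".join(line.split())  # remove all whitespace
            if cleaned and cleaned not in seen:
                seen.add(cleaned)
                dst.write(cleaned + "\n")
    return len(seen)


# Tiny demonstration with a throwaway file and an ASCII-only test.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.txt")
    dst = os.path.join(tmp, "output.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write("john\njohn\n  money\ncarlos 123\nö\n")
    count = filter_file(src, dst, keep=lambda s: s.isascii())
    with open(dst, encoding="utf-8") as f:
        output = f.read().splitlines()

print(count, output)  # 3 ['john', 'money', 'carlos123']
```

Only one input line is held at a time, but the set of unique lines still grows with the data. If most lines in a multi-gigabyte file are distinct, that set may not fit in memory, and a sort-based pass would be the usual fallback.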
0
 
LVL 40

Expert Comment

by:Subsun
ID: 41780707
I just tested the code I posted in my last comment; I am not getting �^�O�OsG��� in the output.
0
 

Author Comment

by:john lambert
ID: 41780713
No, my mistake; the code is PERFECT. CONGRATULATIONS! 1000 THANKS!
You worked a lot. God bless you!
0
 

Author Closing Comment

by:john lambert
ID: 41780726
Working perfectly. Thanks for your hard work, God bless you.
1
