• Status: Solved
  • Priority: Medium
  • Security: Public

Regex code: how do I include and exclude characters?

I use this tool, Word List Updater 2.7.
All I want is to filter out (exclude) all email domains and characters of this type: ®©�ØÇÖÄüèöµÃ‖|¦ while keeping (including) these: :^/\\,.+ .;
The code below excludes these too: ^/\\,.+:;
^[^/\\{«»„““”‘’|\n\t….,;`^"<>'}+:?®©�ØÇÖÄüèöµÃ‖|¦]*$

This is the list I want to filter:

john>123
john:123
john;123
john/123
john@123
john!123
john#123
john%123
john^123
john=123
john*1
john&1
john+1
john\123
èáàúùóò
acan
itoh
ö
ü


Ã
Ä
Ö
Ç
Ø
RE
�^�O�OsG���
w���n���
john-123
john_123
marcy
µ
john
marcy
michael
test
&amp;lt;
&amp;lt
&amp;gt;
&amp;gt
&amp;lt;&amp;gt;
&amp;
&amp

^
��
¦
johnny$1234
john~123
john)123
john(123


So the final list must look like this:

john>123
john:123
john;123
john/123
john@123
john!123
john#123
john%123
john^123
john=123
john*1
john&1
john+1
john\123
acan
itoh
RE
john-123
john_123
marcy
john
marcy
michael
test
^
johnny$1234
john~123
john)123
john(123

Asked by: john lambert
 
Rgonzo1971 Commented:
Hi,

Please try:


^[^{«»„““”‘’|\n\t…`"<>'}?®©�ØÇÖÄüèöµÃ‖|¦]*$ 

Regards
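
As a minimal sketch of why the two patterns above behave differently: inside `^[^...]*$`, every character listed in the brackets rejects any line that contains it, so keeping `:^/\,.+;` means leaving them out of the class. The example uses Python rather than the asker's tool, and `line_filter`, `PUNCTUATION`, and `UNWANTED` are hypothetical names. It only illustrates the regex logic and does not show how Word List Updater applies a pattern.

```python
import re

# Hypothetical helper: build ^[^...]*$ from a string of forbidden characters.
# A line passes only if it contains none of those characters.
def line_filter(forbidden: str) -> "re.Pattern[str]":
    return re.compile("^[^" + re.escape(forbidden) + "]*$")

PUNCTUATION = "/\\,.+:;^"          # characters the asker wants to keep
UNWANTED = "®©ØÇÖÄüèöµÃ‖|¦"        # characters the asker wants to reject

strict = line_filter(PUNCTUATION + UNWANTED)   # like the pattern in the question
relaxed = line_filter(UNWANTED)                # punctuation left out of the class

for line in ["john:123", "john\\123", "john", "Ø"]:
    print(repr(line), bool(strict.match(line)), bool(relaxed.match(line)))
```

Neither pattern removes email addresses; that needs a separate test.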
 
john lambert (Author) Commented:
No, that doesn't work; this is the result. Please try the same tool I use; it is easy to find and is clean.

john/123
test:123
test;123
john@123
john!123
john#123
john%123
john^123
john=123
john*1
john&1
john+1
john\123
èáàúùóò
acan
itoh
ö
ü


Ã
Ä
Ö
Ç
Ø
RE
�^�O�OsG���
w���n���
john-123
john_123
marcy
µ
john
marcy
michael
test
&amp;lt;
&amp;lt
&amp;gt;
&amp;gt
&amp;lt;&amp;gt;
&amp;
&amp

^
��
¦
johnny$1234
john~123
john)123
john(123

 
john lambert (Author) Commented:
Or I can give you mine.
 
Rgonzo1971 Commented:
Sorry, I can't help further.
 
john lambert (Author) Commented:
OK, no problem. Thanks anyway.
 
Subsun Commented:
Are you reading from a text file? If so, I can try a PowerShell script to achieve the result.
 
john lambert (Author) Commented:
Yes, I use that tool with a huge text file (8 GB).
 
Subsun Commented:
Try the following code. It works based on the input and output data posted in the question.
GC C:\temp\input.txt | ?{$_ -match "^(?!(&|\?)).*([$>:;/@!#%^=*&+\\\-_$~)(]).*(?<!;)$|^[a-zA-Z].*[a-zA-Z]$"} | Out-File C:\temp\output.txt

 
john lambert (Author) Commented:
In your result, this line must disappear too: �^�O�OsG���
and these 2 lines too:

john@yahoo.com
john@live.com

It can handle huge text files, right? 8-10 GB?
I added 2 new lines, 2 emails, so no emails should remain.

john/123
test:123
test;123
john@123
john@yahoo.com
john@live.com
john!123
john#123
john%123
john^123
john=123
john*1
john&1
john+1
john\123
�^�O�OsG���
john-123
john_123
^
johnny$1234
john~123
john)123
john(123

 
Subsun Commented:
Try:
GC C:\temp\input.txt | ?{$_ -match "^(?!(&|\?|�)).*([$>:;/@!#%^=*&+\\\-_$~)(]).*(?<!;)$|^[a-zA-Z].*[a-zA-Z]$"} | Out-File C:\temp\output.txt

 
john lambert (Author) Commented:
Yes, much better. One last favour: please make the emails disappear.

john@yahoo.com
john@live.com
 
john lambert (Author) Commented:
So the final list must look like this.

Input list:

john/123
test:123
test;123
john@123
john!123
john@yahoo.com
john@live.com
john#123
john%123
john^123
john=123
john*1
john&1
john+1
john\123
èáàúùóò
acan
itoh
ö
ü



Ã
Ä
Ö
Ç
Ø
RE

�^�O�OsG���
w���n���
john-123
john_123
marcy
µ
john
marcy
michael
test
&amp;lt;
&amp;lt
&amp;gt;
&amp;gt
&amp;lt;&amp;gt;
&amp;
&amp

^
��
¦
johnny$1234
john~123
john)123
john(123


Output list:
---------------
john/123
test:123
test;123
john@123
john!123
john#123
john%123
john^123
john=123
john*1
john&1
john+1
john\123
acan
itoh
RE
john-123
john_123
marcy
john
marcy
michael
test
johnny$1234
john~123
john)123
john(123

 
Subsun Commented:
Try:
GC C:\temp\input.txt | ?{$_ -match "^(?!(&|\?|�)).*([$>:;/@!#%^=*&+\\\-_$~)(]).*(?<!;)$|^[a-zA-Z].*[a-zA-Z]$" -and $_ -notmatch '\w+@\w+\.\w+'} | Out-File C:\temp\output.txt
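
The `-notmatch '\w+@\w+\.\w+'` clause added here can be checked on its own. The sketch below is in Python rather than PowerShell, and `EMAIL_LIKE` and `drop_email_like` are hypothetical names. Like `-notmatch`, `re.search` looks for the pattern anywhere in the line.

```python
import re

# Same email-like pattern as in the one-liner above: word chars, "@",
# word chars, a literal dot, word chars.
EMAIL_LIKE = re.compile(r"\w+@\w+\.\w+")

def drop_email_like(lines):
    """Return only the lines that do not contain an email-like substring."""
    return [line for line in lines if not EMAIL_LIKE.search(line)]

sample = ["john@123", "john@yahoo.com", "john@live.com",
          "john.doe@mail.example.org", "a@b"]
print(drop_email_like(sample))  # ['john@123', 'a@b']
```

`john@123` survives because the pattern requires a dot after the `@`. A domain containing a hyphen, such as `john@my-host.com`, also survives, since `\w` does not match `-`, so a looser pattern may be needed for real address lists.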

 
john lambert (Author) Commented:
Yes, perfect! To make everything perfect, can you modify your script a bit to do this too?

1. Handle huge files (10-20-50 GB)?
2. Can you remove extra spaces?
3. Can you remove duplicates from huge files using this regex?


For example, input:
john
john
john^123
  money
carlos 123
  marcos 123
john&123
john&123


output:
------------
john
john^123
money
carlos123 
marcos123
john&123

 
Subsun Commented:
1. Handle huge files (10-20-50 GB)?
Not sure; I have not tested it.
2. Can you remove duplicates from huge files using this regex?
3. Can you remove extra spaces?
The following handles both:
GC C:\temp\input.txt | ?{$_ -match "^(?!(&|\?|�)).*([$>:;/@!#%^=*&+\\\-_$~)(]).*(?<!;)$|^[a-zA-Z].*[a-zA-Z]$" -and $_ -notmatch '\w+@\w+\.\w+'} | %{$_.Trim()} | Select -Unique | Out-File C:\temp\output.txt

 
john lambert (Author) Commented:
It doesn't work; I can't see these lines.
Input:
-------
  money
carlos 123
  marcos 123


Output (these 3 lines don't appear):
----------
money
carlos 123
 marcos 123

 
Subsun Commented:
Try:
GC C:\temp\input.txt | ?{$_ -match "^(?!(&|\?|�)).*([$>:;/@!#%^=*&+\\\-_$~)(\s]).*(?<!;)$|^[a-zA-Z].*[a-zA-Z]$" -and $_ -notmatch '\w+@\w+\.\w+'} | %{$_.Trim()} | Select -Unique | Out-File C:\temp\output.txt

 
john lambert (Author) Commented:
I can see in your snapshot:

carlos 123
marcos 123

I want:

carlos123
marcos123

No extra spaces at all.

My output is this; I used your last string:

john/123
test:123
test;123
john@123
john^123
john!123
john#123
john%123
john=123
john*1
john&1
john+1
john\123
acan
itoh
RE
john-123
john_123
marcy
john
michael
test
^
johnny$1234
john~123
john)123
john(123

 
Subsun Commented:
Try:
GC C:\temp\input.txt | ?{$_ -match "^(?!(&|\?|�)).*([$>:;/@!#%^=*&+\\\-_$~)(\s]).*(?<!;)$|^[a-zA-Z].*[a-zA-Z]$" -and $_ -notmatch '\w+@\w+\.\w+'} | %{$_.Trim() -replace "\s"} | Select -Unique | Out-File C:\temp\output.txt
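
Whether this pipeline copes with 8-50 GB files was not tested in the thread. As a rough sketch of one way to keep memory use down, the same three steps (filter, strip whitespace, drop duplicates) can be written as a line-by-line stream. This is a Python port and an assumption, not the code that was run above; `clean_lines`, `clean_file`, and the UTF-8 default are hypothetical choices, and `\ufffd` stands for the � character in the original pattern.

```python
import re

# Port of the two patterns from the one-liner above (assumed equivalent).
KEEP = re.compile(
    r"^(?!(&|\?|\ufffd)).*([$>:;/@!#%^=*&+\\\-_$~)(\s]).*(?<!;)$"
    r"|^[a-zA-Z].*[a-zA-Z]$"
)
EMAIL_LIKE = re.compile(r"\w+@\w+\.\w+")
WHITESPACE = re.compile(r"\s")

def clean_lines(lines):
    """Yield kept lines with all whitespace removed, first occurrence only."""
    seen = set()  # grows with the number of distinct kept lines
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not KEEP.search(line) or EMAIL_LIKE.search(line):
            continue
        line = WHITESPACE.sub("", line)
        if line and line not in seen:
            seen.add(line)
            yield line

def clean_file(src, dst, encoding="utf-8"):
    """Stream src to dst one line at a time instead of loading the file."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of failing.
    with open(src, encoding=encoding, errors="replace") as fin, \
         open(dst, "w", encoding=encoding) as fout:
        for line in clean_lines(fin):
            fout.write(line + "\n")
```

The `seen` set still holds every distinct kept line, so a file with billions of unique lines may need an on-disk approach (for example, sorting and then removing adjacent duplicates) instead.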

 
john lambert (Author) Commented:
Oh my God! Hard work; you have nerves of steel, Subsun. It seems to remove duplicates as well, amazing. I just hope it can handle huge text files like 8 GB or 20 GB. I will test tomorrow with huge text files and hope it works fine.
 
john lambert (Author) Commented:
Working perfectly, thank you. Just tell me: can it handle huge text files?
 
Subsun Commented:
I just tested the code that I posted in my last comment, and I am not getting �^�O�OsG��� in the output.
 
john lambert (Author) Commented:
No, my mistake; the code is perfect. Congratulations and a thousand thanks!
You worked a lot. God bless you!
 
john lambert (Author) Commented:
Working perfectly. Thanks for your hard work; God bless you.
