john lambert

Regex to filter this type of combo email line

I need a regex to filter this type of combo email line. Is it possible to do this?

3:awhitcomb@gmail.com:aberw:0x77E61D83DD3D2961059CC734E58E644852D172CC:''

1st output:
awhitcomb@gmail.com:0x77E61D83DD3D2961059CC734E58E644852D172CC

2nd output:
aberw:0x77E61D83DD3D2961059CC734E58E644852D172CC
Rgonzo1971

Hi,

you could use subgroups 1 and 3 first, and then subgroups 2 and 3
(?:3:)(\w+@\w+\.\w+:)([^:]+:)(0x\w+)

Regards

ASKER

Doesn't work for me.
.*?:(.*?):(.*?):(.*?):
 
Use group 1 and 3 for first output
Use group 2 and 3 for second output

HTH,
Dan
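To see what "group 1 and 3" and "group 2 and 3" mean in practice, here is a minimal Python sketch of the pattern above. The sample line is made up (it only mimics the id:email:username:hash layout), and the variable names are illustrative, not part of any tool mentioned in this thread.

```python
import re

# Pattern from the post above: skip the first field lazily, then capture
# the next three colon-separated fields as groups 1, 2 and 3.
PATTERN = re.compile(r".*?:(.*?):(.*?):(.*?):")

# Made-up sample line in the same id:email:username:hash:extra layout.
line = "3:user@example.com:someuser:0xDEADBEEF:''"

m = PATTERN.match(line)
email_hash = f"{m.group(1)}:{m.group(3)}"  # first output: groups 1 and 3
user_hash = f"{m.group(2)}:{m.group(3)}"   # second output: groups 2 and 3

print(email_hash)
print(user_hash)
```

This only works in a tool that lets you reference several capture groups in the replacement, which is exactly what the tool discussed further down turns out not to support.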
Well, in my tool (Word List Updater 2.7) neither of these two patterns works.
I don't know why it isn't working. Can we try this one too, please?

179819085:best-boy23@seznam.cz:0179819085:548af13bdd5fc92c120491aef92f9a22:John:John:Pepa:1975-08-08:37:M:65:969:900563

output:
best-boy23@seznam.cz:548af13bdd5fc92c120491aef92f9a22

.*?:(.*?):(.*?):(.*?):.*

I see, but it doesn't work in my tool. I tried all four patterns with no good results. You can try the same tool yourself; you can find a clean copy on the internet, or I can upload it for you:

.*?:(.*?:).*?:(.*?):.*

test content:

john@yahoo.com:lazio:::lazio99:::juventus@1
john@yahoo.com:lazio:::lazio99
3:awhitcomb@gmail.com:aberw:0x77E61D83DD3D2961059CC734E58E644852D172CC:''
john@yahoo.com:lazio:lazio99:juventus@@
dgfd

Here is my clean copy of the tool. It is very small, so you can download it and try. It doesn't work:

https://www.sendspace.com/file/5v7gw2
Your tool can't use multiple groups.

Why don't you use Notepad++ or EditPad or any other editor with a proper Regex implementation?

Anyway, you can obtain all 3 groups in your tool by using this retain pattern:
.*?:(.*?:.*?:.*?):.*

Why? Because this tool handles enormous text files (mine, for example, is 7 GB), and because it is very fast and never gives me errors.
OK. Then after the first step (when you obtain awhitcomb@gmail.com:aberw:0x77E61D83DD3D2961059CC734E58E644852D172CC) use the following retain pattern to get the second option (aberw:0x77E61D83DD3D2961059CC734E58E644852D172CC):
.*?:(.*)

For the first option I don't know how you can get it. Using this as remove pattern almost works, but deletes the : also:
:.*?:
With the first pattern I got this:
awhitcomb@gmail.com:aberw:0x77E61D83DD3D2961059CC734E58E644852D172CC:''

But the second pattern doesn't work. I tried all 4 options.
Anyone?
I'll have to meditate a bit about that, but here are two tips I can offer for now:

  • The x64 version of Notepad++ supports very big files, too.
  • The free tool Expresso (see here) is wonderful for developing and testing regular expressions (even though it doesn't support testing on gigabyte-sized data piles ...)
It doesn't support my huge 16 GB list; it says the file is too big. Does it support at least 6 GB?
First I want this:

Input:
139903:gcoquio@surfeu.ch:guillaume_tell:8c5b7bb6042110ac96c9ae351dbd7fbc:Comandante Ch?:::1962-04-22:50:M:66:729:67793

Output:

gcoquio@surfeu.ch:8c5b7bb6042110ac96c9ae351dbd7fbc

Or split it in steps. First this:

139903:gcoquio@surfeu.ch:guillaume_tell:8c5b7bb6042110ac96c9ae351dbd7fbc

then this:
gcoquio@surfeu.ch:guillaume_tell:8c5b7bb6042110ac96c9ae351dbd7fbc

then this:
guillaume_tell:8c5b7bb6042110ac96c9ae351dbd7fbc

then this:
 gcoquio@surfeu.ch::8c5b7bb6042110ac96c9ae351dbd7fbc

Look, the UltraEdit text editor can load 16 GB, but I don't know how to use regular expressions in it.

This pattern, (^(.*?):(.*?)), works to cut off the first field. Input:
139903:gcoquio@surfeu.ch:guillaume_tell:8c5b7bb6042110ac96c9ae351dbd7fbc:Comandante Ch?:::1962-04-22:50:M:66:729:67793

output:
gcoquio@surfeu.ch:guillaume_tell:8c5b7bb6042110ac96c9ae351dbd7fbc:Comandante Ch?:::1962-04-22:50:M:66:729:67793

Can you please send a sample file of a few dozen lines?

If all the lines have the same number of fields, with : as delimiter, this looks like a job for awk.
OK, this is what I have so far: I cut off the leading field and the trailing ::: part. Here is what I have now:
By the way, I am using Windows, not Linux.

gcoquio@surfeu.ch:guillaume_tell:8c5b7bb6042110ac96c9ae351dbd7fbc:Comandante Ch?
helil38@yahoo.de:paar48of:4708471dadc188ddc28ca02ad3203c00:Ilse Stelz
fabpatsmash2000@yahoo.com:fabpatsmash:8155adbe513300ff7b12a99ee717d12e:Fabpatsmash
nosekeponer_666@hotmail.com:devil_cara:e6b28db7802e90c9372b2069ed9b3e47:Javi
maisesap@hotmail.com:diablilla1:06c0681c95e6d499eb653073e1ed4bb5:Carmen
eloy_malaguita@hotmail.com:cojone:50fccc7c8b7417438df5e34d019c9036:Yo
chikito200@hotmail.com:ckikito200:45c8d284d70198a61b45524a9ce29795:Preguntamelo
kicks_r_us@hotmail.com:oulanem:81b74ebe1e8baae94d4f6c3d1e82673a:Oulanem
zerdesht03@hotmail.com:xolefize:e10adc3949ba59abbe56e057f20f883e:Ahmed
ropa60@hotmail.com:249886.scheda.batty35030:b51418d07c873d79c1ac21e5d9ea0dd0:Anto76

Hmmm - how about this regex:
(\d*?:){0,1}(.*?@.*?):(.*?):(.*?):

First result would be
$2:$4

Second result would be
$3:$4

Snippet from Expresso attached ...

Seems to work fine with the last sample text, too.
Expresso-Sample.png
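As a rough check of the regex above outside Expresso, the following Python sketch applies it to lines with and without the leading numeric id. The sample lines and the `extract` helper are invented for illustration; Python writes the `$2:$4` style of result by reading the groups directly.

```python
import re

# Regex from the post above: an optional numeric id, then email, username, hash.
PATTERN = re.compile(r"(\d*?:){0,1}(.*?@.*?):(.*?):(.*?):")

def extract(line):
    """Return (email:hash, username:hash), or None if the line doesn't match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    email, username, hash_ = m.group(2), m.group(3), m.group(4)
    return f"{email}:{hash_}", f"{username}:{hash_}"

# Made-up lines: one with the leading id, one without, one that is junk.
with_id = extract("1001:user@example.com:someuser:abc123:Some Name:::")
without_id = extract("user@example.com:someuser:abc123:Some Name")
junk = extract("dgfd")

print(with_id)
print(without_id)
print(junk)
```

Note that the pattern needs a colon after the hash, so a line that ends right at the hash would not match as written.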
Who cares what you're using? Windows 10 has bash shell, you can download gawk for windows from here:
http://gnuwin32.sourceforge.net/packages/gawk.htm

The beauty of awk is that it works with fields, like Excel. For ex, if sterge.txt has the content that you posted above
awk 'BEGIN { FS=":" } { print $1 }' sterge.txt
will print
gcoquio@surfeu.ch
helil38@yahoo.de
fabpatsmash2000@yahoo.com
nosekeponer_666@hotmail.com
maisesap@hotmail.com
eloy_malaguita@hotmail.com
chikito200@hotmail.com
kicks_r_us@hotmail.com
zerdesht03@hotmail.com
ropa60@hotmail.com

awk 'BEGIN { FS=":" } { print $1":"$3 }' sterge.txt
will print
gcoquio@surfeu.ch:8c5b7bb6042110ac96c9ae351dbd7fbc
helil38@yahoo.de:4708471dadc188ddc28ca02ad3203c00
fabpatsmash2000@yahoo.com:8155adbe513300ff7b12a99ee717d12e
nosekeponer_666@hotmail.com:e6b28db7802e90c9372b2069ed9b3e47
maisesap@hotmail.com:06c0681c95e6d499eb653073e1ed4bb5
eloy_malaguita@hotmail.com:50fccc7c8b7417438df5e34d019c9036
chikito200@hotmail.com:45c8d284d70198a61b45524a9ce29795
kicks_r_us@hotmail.com:81b74ebe1e8baae94d4f6c3d1e82673a
zerdesht03@hotmail.com:e10adc3949ba59abbe56e057f20f883e
ropa60@hotmail.com:b51418d07c873d79c1ac21e5d9ea0dd0
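For readers without awk at hand, the field idea can be sketched in Python as below. `pick_fields` is a hypothetical helper written for this illustration, and the sample line is made up; field numbers are 1-based to match awk's `$1`, `$3` and so on.

```python
def pick_fields(line, field_numbers, sep=":"):
    """Return the requested 1-based fields of `line`, re-joined with `sep`."""
    fields = line.rstrip("\n").split(sep)
    # awk yields an empty string for a field that does not exist; do the same.
    picked = [fields[n - 1] if 0 < n <= len(fields) else "" for n in field_numbers]
    return sep.join(picked)

# Made-up line in the email:username:hash:name layout used above.
sample = "user@example.com:someuser:abc123:Some Name"

first = pick_fields(sample, [1])          # like: print $1
email_hash = pick_fields(sample, [1, 3])  # like: print $1":"$3

print(first)
print(email_hash)
```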
C:\Program Files (x86)\GnuWin32\bin>awk 'BEGIN { FS=":" } { print $1 }' sterge.txt
awk: 'BEGIN
awk: ^ invalid char ''' in expression

In Windows awk you're stuck with double quotes:

D:\portables\Gnu utilities\bin>awk "BEGIN { FS=""":""" } { print $1 }" sterge.txt

D:\portables\Gnu utilities\bin>awk "BEGIN { FS=""":""" } { print $1""":"""$2 }" sterge.txt
gcoquio@surfeu.ch:guillaume_tell
helil38@yahoo.de:paar48of
fabpatsmash2000@yahoo.com:fabpatsmash
nosekeponer_666@hotmail.com:devil_cara
maisesap@hotmail.com:diablilla1
eloy_malaguita@hotmail.com:cojone
chikito200@hotmail.com:ckikito200
kicks_r_us@hotmail.com:oulanem
zerdesht03@hotmail.com:xolefize
ropa60@hotmail.com:249886.scheda.batty35030

output is this:

john@yahoo.com
john@yahoo.com
3
139903
john@yahoo.com
dgfd

Not good. I need

email:hash
then
username:hash
Also, this doesn't work for filtering email:pass, and I need the results saved to output.txt. The file is 16 GB.
That's why I asked for a sample file, to see the input.
If what you posted after I asked is the input, then:

email:hash
D:\portables\Gnu utilities\bin>awk "BEGIN { FS=""":""" } { print $1""":"""$3 }" sterge.txt
gcoquio@surfeu.ch:8c5b7bb6042110ac96c9ae351dbd7fbc
helil38@yahoo.de:4708471dadc188ddc28ca02ad3203c00
fabpatsmash2000@yahoo.com:8155adbe513300ff7b12a99ee717d12e
nosekeponer_666@hotmail.com:e6b28db7802e90c9372b2069ed9b3e47
maisesap@hotmail.com:06c0681c95e6d499eb653073e1ed4bb5
eloy_malaguita@hotmail.com:50fccc7c8b7417438df5e34d019c9036
chikito200@hotmail.com:45c8d284d70198a61b45524a9ce29795
kicks_r_us@hotmail.com:81b74ebe1e8baae94d4f6c3d1e82673a
zerdesht03@hotmail.com:e10adc3949ba59abbe56e057f20f883e
ropa60@hotmail.com:b51418d07c873d79c1ac21e5d9ea0dd0

username:hash
D:\portables\Gnu utilities\bin>awk "BEGIN { FS=""":""" } { print $2""":"""$3 }" sterge.txt
guillaume_tell:8c5b7bb6042110ac96c9ae351dbd7fbc
paar48of:4708471dadc188ddc28ca02ad3203c00
fabpatsmash:8155adbe513300ff7b12a99ee717d12e
devil_cara:e6b28db7802e90c9372b2069ed9b3e47
diablilla1:06c0681c95e6d499eb653073e1ed4bb5
cojone:50fccc7c8b7417438df5e34d019c9036
ckikito200:45c8d284d70198a61b45524a9ce29795
oulanem:81b74ebe1e8baae94d4f6c3d1e82673a
xolefize:e10adc3949ba59abbe56e057f20f883e
249886.scheda.batty35030:b51418d07c873d79c1ac21e5d9ea0dd0

Oh my god, it's working! But can you send the output to a file, please, say ''output.txt''? The result will be huge.
OK, so with this sample it works perfectly:
helil38@yahoo.de:paar48of:4708471dadc188ddc28ca02ad3203c00:Ilse Stelz

I got that sample by first cutting the lines with a pattern in Word List Updater 2.7. Can you try to make it work with this original sample?
Thank you, you've won 90% of the battle. You're great!

139903:gcoquio@surfeu.ch:guillaume_tell:8c5b7bb6042110ac96c9ae351dbd7fbc:Comandante Ch?:::1962-04-22:50:M:66:729:67793

Since you only have an additional field, add 1 to the fields in the awk commands:

sterge.txt
139903:gcoquio@surfeu.ch:guillaume_tell:8c5b7bb6042110ac96c9ae351dbd7fbc:Comandante Ch?:::1962-04-22:50:M:66:729:67793

D:\portables\Gnu utilities\bin>awk "BEGIN { FS=""":""" } { print $3""":"""$4 }" sterge.txt
guillaume_tell:8c5b7bb6042110ac96c9ae351dbd7fbc

D:\portables\Gnu utilities\bin>awk "BEGIN { FS=""":""" } { print $2""":"""$4 }" sterge.txt
gcoquio@surfeu.ch:8c5b7bb6042110ac96c9ae351dbd7fbc

Want to save the output?
D:\portables\Gnu utilities\bin>awk "BEGIN { FS=""":""" } { print $2""":"""$4 }" sterge.txt > output.txt
Saving the output works perfectly, but the results are not what I need. I need two commands, one to save email:hash and one to save username:hash, using the original sample above.

awk "BEGIN { FS=""":""" } { print $2""":"""$4 }" sterge.txt > emailhash.txt
awk "BEGIN { FS=""":""" } { print $3""":"""$4 }" sterge.txt > usernamehash.txt
No, no, look, this is the original sample. Please try splitting the original sample from the very beginning:

139903:gcoquio@surfeu.ch:guillaume_tell:8c5b7bb6042110ac96c9ae351dbd7fbc:Comandante Ch?:::1962-04-22:50:M:66:729:67793
60099:gcoquio@mia.uk:gianii:8c5b7bb6042110ac96c9ae351dbd7fbc:rihagd:::1990-01-22:80:M:66:729:66666

First extract email:hash:
gcoquio@surfeu.ch:8c5b7bb6042110ac96c9ae351dbd7fbc
gcoquio@mia.uk:8c5b7bb6042110ac96c9ae351dbd7fbc

Then extract username:hash:
surfeu.ch:guillaume_tell:8c5b7bb6042110ac96c9ae351dbd7fbc
gianii:8c5b7bb6042110ac96c9ae351dbd7fbc:rihagd

So what is the problem??

sterge.txt:
139903:gcoquio@surfeu.ch:guillaume_tell:8c5b7bb6042110ac96c9ae351dbd7fbc:Comandante Ch?:::1962-04-22:50:M:66:729:67793
60099:gcoquio@mia.uk:gianii:8c5b7bb6042110ac96c9ae351dbd7fbc:rihagd:::1990-01-22:80:M:66:729:66666

Commands:
D:\portables\Gnu utilities\bin>awk "BEGIN { FS=""":""" } { print $2""":"""$4 }" sterge.txt > emailhash.txt

D:\portables\Gnu utilities\bin>awk "BEGIN { FS=""":""" } { print $3""":"""$4 }" sterge.txt > usernamehash.txt

Results:
emailhash.txt:
gcoquio@surfeu.ch:8c5b7bb6042110ac96c9ae351dbd7fbc
gcoquio@mia.uk:8c5b7bb6042110ac96c9ae351dbd7fbc

usernamehash.txt:
guillaume_tell:8c5b7bb6042110ac96c9ae351dbd7fbc
gianii:8c5b7bb6042110ac96c9ae351dbd7fbc
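
The two awk commands each read the whole input once. A rough Python sketch that writes both output files in a single pass is shown below; it is an alternative illustration, not what was used in this thread. `split_combo_file` is a hypothetical helper, the sample data is made up, and, unlike the awk commands, it skips lines with fewer than four fields.

```python
import os
import tempfile

def split_combo_file(src_path, email_out_path, user_out_path, sep=":"):
    """Stream src_path once, writing field2:field4 and field3:field4 lines."""
    written = 0
    with open(src_path, encoding="utf-8", errors="replace") as src, \
         open(email_out_path, "w", encoding="utf-8") as email_out, \
         open(user_out_path, "w", encoding="utf-8") as user_out:
        for line in src:  # reads line by line, so memory use stays small
            fields = line.rstrip("\r\n").split(sep)
            if len(fields) < 4:
                continue  # assumption: skip lines without the expected layout
            email_out.write(f"{fields[1]}{sep}{fields[3]}\n")
            user_out.write(f"{fields[2]}{sep}{fields[3]}\n")
            written += 1
    return written

# Small self-contained demo on made-up data in a temporary directory.
tmp_dir = tempfile.mkdtemp()
src_file = os.path.join(tmp_dir, "sample.txt")
email_file = os.path.join(tmp_dir, "emailhash.txt")
user_file = os.path.join(tmp_dir, "usernamehash.txt")
with open(src_file, "w", encoding="utf-8") as f:
    f.write("1001:user@example.com:someuser:abc123:Some Name:::\n")
    f.write("short line\n")

count = split_combo_file(src_file, email_file, user_file)
print(count)
```

Because the file is read line by line, the approach does not depend on the input fitting in memory.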

Works perfectly, you're in good shape!
What about this, is it possible?
Original sample:

139903:gcoquio@surfeu.ch:guillaume_tell:8c5b7bb6042110ac96c9ae351dbd7fbc:Comandante Ch?:::1962-04-22:50:M:66:729:67793
60099:gcoquio@mia.uk:gianii:8c5b7bb6042110ac96c9ae351dbd7fbc:rihagd:::1990-01-22:80:M:66:729:66666

Reversed username:hash.
Look at the output:
8c5b7bb6042110ac96c9ae351dbd7fbc:Comandante Ch?
8c5b7bb6042110ac96c9ae351dbd7fbc:rihagd

Or, and this may be crazy, is it possible to save it swapped, in the format below? Or save it in the format above and then use another command to swap it?

Comandante Ch?:8c5b7bb6042110ac96c9ae351dbd7fbc
rihagd:8c5b7bb6042110ac96c9ae351dbd7fbc

You did not look at awk, did you? It simply uses : as a field separator (that's what FS=':' means) and then it splits every line at every : and counts the parts as fields.

On your sample, the id is the first field, the email the 2nd, the username 3rd field, the hash is the 4th field, in the 5th you have some other username and so on.

Want to print the 5th field and then the 4th?
D:\portables\Gnu utilities\bin>awk "BEGIN { FS=""":""" } { print $5""":"""$4 }" sterge.txt
Comandante Ch?:8c5b7bb6042110ac96c9ae351dbd7fbc
rihagd:8c5b7bb6042110ac96c9ae351dbd7fbc

This one is more typical, and it is the last favour I ask:
Input:
54:jmshaw3567@hotmail.com:54:0x7BABC233DE26AB19EAD1B9C278128D5C434910EE:''
56:mark13886570@sina.com:56:0x66036CCD51CD5EB31978E803784D79CC5DADFBEC:''

Output:
jmshaw3567@hotmail.com:0x7BABC233DE26AB19EAD1B9C278128D5C434910EE
mark13886570@sina.com:0x66036CCD51CD5EB31978E803784D79CC5DADFBEC

Tomorrow, when my head is clear, I will try to learn to customize and follow your examples.
ASKER CERTIFIED SOLUTION
Dan Craciun

This solution is only available to members.
Yes.
Well... amazing! No words! I honestly thought I would never solve these problems.
Thank you very much! You're the best! Thank you, God bless you!