Solved

collect number of times unique lines occur in a text file

Posted on 2014-11-14
Last Modified: 2014-11-17
This is the VB6 code that I accepted in an earlier question:

Private Sub Command1_Click()
    Dim f As Integer
    Dim g As Integer
    Dim strLine As String
    
    f = FreeFile
    Open "C:\Users\Alpesh\Desktop\111114.txt" For Input As #f
        g = FreeFile
        Open "c:\Users\Alpesh\Desktop\052214.txt" For Append As #g
            Do Until EOF(f)
                Line Input #f, strLine
                If InStr(strLine, "BlockedIP") > 0 Then
                    Print #g, strLine
                End If
            Loop
        Close #g
    Close #f
End Sub

Is it possible to count how many times each unique line occurs instead of writing every matching line to the output? For example, rather than pasting all 502 "BlockedIP" lines, I would like to see a count for each one, as shown below: 502 times for the first one, 4 times for the second, and so on.


502- 00:01:05 192.168.1.100 [DNSRedir] BlockedIP response sent, keyword blogspot.com: adsense.blogspot.com. -> 192.168.1.100
4- 00:01:33 192.168.1.102 [DNSRedir] BlockedIP response sent, keyword adnxs.com: ib.adnxs.com. -> 192.168.1.100
052214.txt
Question by:al4629740
 
LVL 45

Expert Comment

by:Martin Liss
Using the file you attached and the code below, I get the following, which does not match the counts you said there should be. The code assumes that a record is unique based on the text after "keyword". Is that not correct?

7- adnxs.com: ib.adnxs.com. -> 192.168.1.100
1207- blogspot.com: adsense.blogspot.com. -> 192.168.1.100

Private Sub Command1_Click()
Dim f As Integer
Dim strLine As String
Dim lngLines As Long
Dim arrKeys() As String
Dim bFound As Boolean
Dim bFirst As Boolean
Dim intCount As Integer
Dim strParts() As String

bFirst = True
f = FreeFile

Open "C:\temp\052214.txt" For Input As #f
ReDim arrKeys(1, 0)
Do Until EOF(f)
    Line Input #f, strLine
    bFound = False
    If InStr(strLine, "BlockedIP") > 0 Then
        strParts = Split(strLine, "keyword")
        For lngLines = 0 To intCount - 1
            If arrKeys(1, lngLines) = strParts(1) Then
                ' Increment the count of the key that matched, not the last key added
                arrKeys(0, lngLines) = arrKeys(0, lngLines) + 1
                bFound = True
                Exit For
            End If
        Next
        If Not bFound Then
            If Not bFirst Then
                ReDim Preserve arrKeys(1, intCount)
            End If
            arrKeys(1, intCount) = strParts(1)
            arrKeys(0, intCount) = 1
            bFirst = False
            intCount = intCount + 1
        End If
    End If
Loop
Close #f
' The keys are stored along the second dimension, so UBound needs the dimension argument
For lngLines = 0 To UBound(arrKeys, 2)
    Debug.Print arrKeys(0, lngLines) & "-" & arrKeys(1, lngLines)
Next
MsgBox "done"
End Sub
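
As an alternative to the array search above, the same count can be sketched with a Scripting.Dictionary, which does the lookup by key. This is only a sketch, not the code from the attached project: the procedure name CountBlockedLines is made up for illustration, the file path is a placeholder, and it assumes the Microsoft Scripting Runtime is available on the machine (it is created late-bound, so no project reference is needed).

```vb
' Sketch only: counts "BlockedIP" lines per key with a late-bound Scripting.Dictionary.
' CountBlockedLines and the file path are placeholders, not part of the original project.
Private Sub CountBlockedLines()
    Dim f As Integer
    Dim strLine As String
    Dim strParts() As String
    Dim dictCounts As Object
    Dim vKey As Variant

    Set dictCounts = CreateObject("Scripting.Dictionary")

    f = FreeFile
    Open "C:\temp\052214.txt" For Input As #f
    Do Until EOF(f)
        Line Input #f, strLine
        If InStr(strLine, "BlockedIP") > 0 Then
            strParts = Split(strLine, "keyword")
            ' Skip lines that contain no "keyword" marker
            If UBound(strParts) >= 1 Then
                ' Reading a missing key adds it as Empty, which is treated as 0 in the addition
                dictCounts(strParts(1)) = dictCounts(strParts(1)) + 1
            End If
        End If
    Loop
    Close #f

    ' One line per unique key, in the order the keys were first seen
    For Each vKey In dictCounts.Keys
        Debug.Print dictCounts(vKey) & "-" & vKey
    Next
End Sub
```

As in the code above, a record is treated as unique based on the text after "keyword".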

 

Author Comment

by:al4629740
It looks like I have the wrong numbers. Let me test it and get back to you.
 
LVL 45

Expert Comment

by:Martin Liss
Any results from your testing? Did you see the message I sent you?
 

Author Comment

by:al4629740
Martin,

Where is the output file?  It executes but I can't see where the results are.
 
LVL 45

Expert Comment

by:Martin Liss
They are in the Debug (Immediate) window, which you can open by going to the VBE and pressing Ctrl+G.
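
If a file is preferred over the Immediate window, the counts could be written out with Print # in the same way the original question's code writes matching lines. The sketch below is an assumption about how that might look: WriteCounts and strOutPath are hypothetical names, and arrKeys is assumed to be the 2-D array built in Command1_Click (row 0 holds the count, row 1 holds the key).

```vb
' Sketch only: writes the counts to a text file instead of the Immediate window.
' WriteCounts and strOutPath are hypothetical names; arrKeys is assumed to be
' the array built in Command1_Click (row 0 = count, row 1 = key).
Private Sub WriteCounts(arrKeys() As String, ByVal strOutPath As String)
    Dim g As Integer
    Dim lngIdx As Long

    g = FreeFile
    ' Output mode creates the file, or overwrites it if it already exists
    Open strOutPath For Output As #g
    ' Loop over the second dimension, which holds one column per unique key
    For lngIdx = 0 To UBound(arrKeys, 2)
        Print #g, arrKeys(0, lngIdx) & "-" & arrKeys(1, lngIdx)
    Next
    Close #g
End Sub
```

It could be called just before End Sub in Command1_Click, for example WriteCounts arrKeys, "C:\temp\counts.txt" (the path is a placeholder).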
 

Author Comment

by:al4629740
Not every blocked site shows up.

 
LVL 45

Expert Comment

by:Martin Liss
There are 1992 lines in the file you posted that contain "BlockedIP". My results in the Immediate window (which you may have to scroll to see in full) are:
7- adnxs.com: ib.adnxs.com. -> 192.168.1.100
1207- blogspot.com: adsense.blogspot.com. -> 192.168.1.100
20- visualwebsiteoptimizer.com: dev.visualwebsiteoptimizer.com. -> 192.168.1.100
41- ^.*s(3|e)x: expertsexchange.112.2o7.net. -> 192.168.1.100
389- ^.*\.(asp|aspx|htm|html|jsp|php|xml)-: www.xml-sitemaps.com. -> 192.168.1.100
1- adsrvr.org: match.adsrvr.org. -> 192.168.1.100
1- tube: rtd.tubemogul.com. -> 192.168.1.100
1- criteo.com: dis.criteo.com. -> 192.168.1.100
1- w55c.net: geo-lb02.w55c.net. -> 192.168.1.100
1- tube: rtb.tubemogul.com. -> 192.168.1.100
1- w55c.net: i.w55c.net. -> 192.168.1.100
7- xnxx: www.xnxx.com. -> 192.168.1.100
1- criteo.com: rtax.criteo.com. -> 192.168.1.100
4- twitt: twitter.github.io. -> 192.168.1.100
1- dailymotion.com: www.dailymotion.com. -> 192.168.1.100
1- ^(.*\.)?xvideos\.(com|net)$: www.xvideos.com. -> 192.168.1.100
1- pinterest.com: assets.pinterest.com. -> 192.168.1.100
2- taboo: cdn.taboola.com. -> 192.168.1.100
79- disqus.com: collegetimescom.disqus.com. -> 192.168.1.100
3- pinterest.com: www.pinterest.com. -> 192.168.1.100
1- tumblr.com: officegirls.tumblr.com. -> 192.168.1.100
1- tumblr.com: greekpowerlady.tumblr.com. -> 192.168.1.100
1- tumblr.com: www.tumblr.com. -> 192.168.1.100
3- tumblr.com: sandybrown121.tumblr.com. -> 192.168.1.100
3- eroti: eleganteroticdresses.tumblr.com. -> 192.168.1.100
1- tumblr.com: assets.tumblr.com. -> 192.168.1.100
1- tumblr.com: static.tumblr.com. -> 192.168.1.100
1- tumblr.com: 38.media.tumblr.com. -> 192.168.1.100
1- tumblr.com: 33.media.tumblr.com. -> 192.168.1.100
1- tumblr.com: 40.media.tumblr.com. -> 192.168.1.100
1- tumblr.com: 41.media.tumblr.com. -> 192.168.1.100
1- tumblr.com: 36.media.tumblr.com. -> 192.168.1.100
4- tumblr.com: secure.assets.tumblr.com. -> 192.168.1.100
3- lingerie: www.lingeriediva.com. -> 192.168.1.100
2- tumblr.com: platform.tumblr.com. -> 192.168.1.100
1- pinterest.com: passets-lt.pinterest.com. -> 192.168.1.100
3- lingerie: www.spicylingerie.com. -> 192.168.1.100
2- mature: elegantmatures.tumblr.com. -> 192.168.1.100
2- tumblr.com: classicwomen.tumblr.com. -> 192.168.1.100
2- tumblr.com: strictbeauties.tumblr.com. -> 192.168.1.100
1- adcash.com: www.adcash.com. -> 192.168.1.100
3- tumblr.com: api.tumblr.com. -> 192.168.1.100
1- mgid.com: jsc.mgid.com. -> 192.168.1.100
1- directrev.com: xch.directrev.com. -> 192.168.1.100
1- pinterest.com: widgets.pinterest.com. -> 192.168.1.100
2- addthisedge.com: m.addthisedge.com. -> 192.168.1.100
7- tumblr.com: 31.media.tumblr.com. -> 192.168.1.100
9- adsrvr.org: rtb.adsrvr.org. -> 192.168.1.100
7- blogspot.com: 1.bp.blogspot.com. -> 192.168.1.100
1- tumblr.com: heavenlycheesecake.tumblr.com. -> 192.168.1.100
155- tumblr.com: elegantsexy.tumblr.com. -> 192.168.1.100
The counts displayed above sum to 1992.
 

Author Comment

by:al4629740
I ran the code twice, and the Immediate window showed this:

7- adnxs.com: ib.adnxs.com. -> 192.168.1.100
1207- blogspot.com: adsense.blogspot.com. -> 192.168.1.100
7- adnxs.com: ib.adnxs.com. -> 192.168.1.100
1207- blogspot.com: adsense.blogspot.com. -> 192.168.1.100

This is the code:

Dim f As Integer
Dim strLine As String
Dim lngLines As Long
Dim arrKeys() As String
Dim bFound As Boolean
Dim bFirst As Boolean
Dim intCount As Integer
Dim strParts() As String

bFirst = True
f = FreeFile

Open "C:\Users\Me\Desktop\111114.txt" For Input As #f
ReDim arrKeys(1, 0)
Do Until EOF(f)
    Line Input #f, strLine
    bFound = False
    If InStr(strLine, "BlockedIP") > 0 Then
        strParts = Split(strLine, "keyword")
        For lngLines = 0 To intCount - 1
            If arrKeys(1, lngLines) = strParts(1) Then
                arrKeys(0, intCount - 1) = arrKeys(0, intCount - 1) + 1
                bFound = True
                Exit For
            End If
        Next
        If Not bFound Then
            If Not bFirst Then
                ReDim Preserve arrKeys(1, intCount)
            End If
            arrKeys(1, intCount) = strParts(1)
            arrKeys(0, intCount) = 1
            bFirst = False
            intCount = intCount + 1
        End If
    End If
Loop
Close
For lngLines = 0 To UBound(arrKeys)
    Debug.Print arrKeys(0, lngLines) & "-" & arrKeys(1, lngLines)
Next
MsgBox "done"

 

Author Comment

by:al4629740
The Immediate window output is also short. There is nowhere to scroll to.
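
One possible explanation for the short output, assuming the code quoted above is exactly what was run: the loop For lngLines = 0 To UBound(arrKeys) asks for the upper bound of the first dimension. The array is sized with ReDim arrKeys(1, n), so that bound is always 1 and only two entries are printed per run, however many keys were collected. The sketch below shows the difference; ShowBounds and arrDemo are made-up names for illustration.

```vb
' Sketch only: shows which bound UBound returns for a 2-D array.
' ShowBounds and arrDemo are hypothetical names used for illustration.
Private Sub ShowBounds()
    Dim arrDemo() As String

    ReDim arrDemo(1, 4)               ' 2 rows (0 to 1), 5 columns (0 to 4)
    Debug.Print UBound(arrDemo)       ' prints 1, the bound of the first dimension
    Debug.Print UBound(arrDemo, 2)    ' prints 4, the bound of the second dimension
End Sub
```

Looping to UBound(arrKeys, 2) instead would visit every collected key.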
 
LVL 45

Accepted Solution

by:
Martin Liss earned 500 total points
The only thing I can imagine is that you aren't using the file that you posted in your original question. In case something has happened to the original on your PC, why don't you download it again from your post? I'm attaching my whole project. Change the Open "C:\temp\052214.txt" For Input As #f statement in Command1_Click to match your file path, run it, and tell me what happens. I'm mystified because, as I've shown, I get different output than you do.
Project1.zip
 

Author Comment

by:al4629740
I must have copied your code incorrectly.

Thank you Martin
 
LVL 45

Expert Comment

by:Martin Liss
If you need any tweaks let me know. In any case you're welcome and I'm glad I was able to help.

In my profile you'll find links to some articles I've written that may interest you.
Marty - MVP 2009 to 2014