Solved

List Parsing

Posted on 2001-06-04
Last Modified: 2010-05-02
Hi,

I have a program that produces text files, each containing thousands of numbers separated by newlines. The numbers are between 4 and 7 digits long. Some of these files are upwards of 3 MB. Unfortunately, most of the numbers are duplicated. What would be the most efficient way for me to parse these files and remove all of the duplicates?

Zaphod.
Question by:Z_Beeblebrox
8 Comments
 
LVL 1

Expert Comment

by:superchook
Well...

One way that I have used in the past is to read the list into an array (or a database if the number of unique values is truly huge).

Then scan the array (or database) for each new number you read, and add or discard it.

Using an SQL-compliant database has a couple of other advantages: you can dump the list sorted or filtered by any number of criteria, whereas with an array you have to perform those operations yourself. But an array in RAM is much faster if the sample set fits into memory.
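
A minimal sketch of the array scan described above, assuming one number per line and values that fit in a Long. The function name UniqueByScan and its parameter are made up for illustration. Every new value is compared against everything kept so far, so the work grows roughly with the square of the number of unique values: fine for a few thousand numbers, likely slow for a 3 MB file.

```vb
' Hypothetical helper: reads one number per line from sPath and returns
' the unique values in first-seen order as an array of Long.
' Returns Empty if the file contains no numbers.
Public Function UniqueByScan(ByVal sPath As String) As Variant
    Dim aUnique() As Long
    Dim nCount As Long, i As Long
    Dim nValue As Long
    Dim sLine As String
    Dim bFound As Boolean
    Dim ff As Integer

    ff = FreeFile
    Open sPath For Input As #ff
    Do While Not EOF(ff)
        Line Input #ff, sLine
        If Len(Trim$(sLine)) > 0 Then       ' skip blank lines
            nValue = CLng(Val(sLine))
            ' Scan everything kept so far for this value
            bFound = False
            For i = 0 To nCount - 1
                If aUnique(i) = nValue Then
                    bFound = True
                    Exit For
                End If
            Next i
            ' Keep the value only if the scan did not find it
            If Not bFound Then
                ReDim Preserve aUnique(nCount)
                aUnique(nCount) = nValue
                nCount = nCount + 1
            End If
        End If
    Loop
    Close #ff

    If nCount > 0 Then UniqueByScan = aUnique
End Function
```

Calling UniqueByScan("YourFileName") and testing the result with IsEmpty before using UBound on it covers the empty-file case.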



 

Expert Comment

by:sunnysideandy
VERSION 5.00
Object = "{831FDD16-0C5C-11D2-A9FC-0000F8754DA1}#2.0#0"; "mscomctl.ocx"
Begin VB.Form frmSortRand
   Caption         =   "Form1"
   ClientHeight    =   5190
   ClientLeft      =   60
   ClientTop       =   345
   ClientWidth     =   5895
   LinkTopic       =   "Form1"
   ScaleHeight     =   5190
   ScaleWidth      =   5895
   StartUpPosition =   3  'Windows Default
   Begin MSComctlLib.ListView lstOrder
      Height          =   4965
      Left            =   2175
      TabIndex        =   3
      Top             =   150
      Width           =   1740
      _ExtentX        =   3069
      _ExtentY        =   8758
      View            =   3
      LabelWrap       =   -1  'True
      HideSelection   =   -1  'True
      FullRowSelect   =   -1  'True
      _Version        =   393217
      ForeColor       =   -2147483640
      BackColor       =   -2147483643
      BorderStyle     =   1
      Appearance      =   1
      NumItems        =   0
   End
   Begin MSComctlLib.ListView lstRandom
      Height          =   4965
      Left            =   300
      TabIndex        =   2
      Top             =   150
      Width           =   1665
      _ExtentX        =   2937
      _ExtentY        =   8758
      View            =   3
      LabelWrap       =   -1  'True
      HideSelection   =   -1  'True
      FullRowSelect   =   -1  'True
      _Version        =   393217
      ForeColor       =   -2147483640
      BackColor       =   -2147483643
      BorderStyle     =   1
      Appearance      =   1
      NumItems        =   0
   End
   Begin VB.CommandButton cmdList
      Caption         =   "List"
      Height          =   390
      Left            =   4050
      TabIndex        =   1
      Top             =   900
      Width           =   1665
   End
   Begin VB.CommandButton cmdGetRandom
      Caption         =   "Get Random"
      Height          =   390
      Left            =   4050
      TabIndex        =   0
      Top             =   225
      Width           =   1665
   End
End
Attribute VB_Name = "frmSortRand"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = True
Attribute VB_Exposed = False
Option Explicit
Const MAX_RAND = 1000

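' Fill lstRandom with MAX_RAND random numbers between 1 and MAX_RAND (duplicates are likely)
Private Sub cmdGetRandom_Click()
    Dim nIndex As Integer
    lstRandom.ListItems.Clear
    For nIndex = 1 To MAX_RAND
        lstRandom.ListItems.Add , "id=" & nIndex, CStr(Int((MAX_RAND * Rnd) + 1))
    Next
End Sub

' List each number from lstRandom exactly once, in ascending order
Private Sub cmdList_Click()
    Dim aSorted(MAX_RAND) As Integer    ' aSorted(n) = 1 once n has been seen
    Dim nIndex As Integer
    ' Flag every number that occurs; duplicates just set the same flag again
    For nIndex = 1 To lstRandom.ListItems.Count
        aSorted(CInt(lstRandom.ListItems.Item(nIndex).Text)) = 1
    Next
    lstOrder.ListItems.Clear
    ' Walk the flags in index order, so the output is sorted and free of duplicates
    For nIndex = 0 To MAX_RAND
        If (aSorted(nIndex) = 1) Then
            lstOrder.ListItems.Add , "id=" & nIndex, CStr(nIndex)
        End If
    Next
End Sub

' Give each list view a single column
Private Sub Form_Load()
    lstRandom.ColumnHeaders.Add , , , lstRandom.Width - 265
    lstOrder.ColumnHeaders.Add , , , lstOrder.Width - 265
End Sub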
 
LVL 3

Accepted Solution

by:Hornet241 (earned 50 total points)
This way will take a minute or so, but you won't be hampered by the maximum limits of a list box. (On a K6 266 with 80 MB of RAM, it took 20 seconds to loop through an array of 0 to 9999999.)

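Dim NumArray() As Boolean   ' NumArray(n) = True once n has been seen
Dim ff As Integer
Dim InNum As String
Dim ThisNum As Long, LastHi As Long, a As Long

ReDim NumArray(0)           ' allocate up front so indexing and UBound are always valid

ff = FreeFile
Open "YourFileName" For Input As #ff

Do While Not EOF(ff)
    Line Input #ff, InNum
    If Len(Trim$(InNum)) > 0 Then       ' skip blank lines, which Val would read as 0
        ThisNum = CLng(Val(InNum))
        If ThisNum > LastHi Then
            ' Grow the flag array to reach the new highest number
            LastHi = ThisNum
            ReDim Preserve NumArray(LastHi)
        End If
        NumArray(ThisNum) = True
    End If
Loop
Close #ff

ff = FreeFile
Open "NextFileName" For Output As #ff

' Write each flagged number once; the output comes out in ascending order
For a = 0 To UBound(NumArray)
    If NumArray(a) Then
        Print #ff, CStr(a)
    End If
Next a

Close #ff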
 
LVL 7

Author Comment

by:Z_Beeblebrox
Hi,

Just so you guys don't think I am ignoring this question: so far I prefer Hornet241's solution. In fact, it's pretty impressive. But just in case there is a better way, I will leave this question open until tomorrow evening. If there is no better answer by then, you can have the points.

Zaphod.
 
LVL 43

Expert Comment

by:TimCottee
Zaphod, another possible solution: this works equally well with strings and numbers. It uses the Collection object, which lets you test directly, by key, whether an element already exists:

Private colNumbers As Collection

Private Sub Command3_Click()
    Dim lngNumber As Long
    Dim lngCount As Long
    Set colNumbers = New Collection
    Do
        lngCount = lngCount + 1
        lngNumber = Rnd() * 10000
        ' Add the number only if its key is not already in the collection
        If Not NumberExists(lngNumber) Then
            colNumbers.Add lngNumber, CStr(lngNumber)
        End If
        Label1.Caption = colNumbers.Count & " / " & lngCount
        Label2.Caption = lngNumber
        DoEvents
    Loop Until lngCount = 1000000   ' the bound has to fit in a Long
    MsgBox colNumbers.Count
    Set colNumbers = Nothing
End Sub

Private Function NumberExists(ByVal Number As Long) As Boolean
    Dim varItem As Variant
    On Error Resume Next
    ' Looking up a missing key raises an error, so no error means the key exists
    varItem = colNumbers(CStr(Number))
    NumberExists = (Err.Number = 0)
    Err.Clear
End Function

This example uses random numbers. If you run it (with the label controls so you can see what is going on), you can see that the count of numbers tested keeps going up, while the count of elements climbs relatively slowly to the maximum and then just sits there.
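
The same Collection idea can be pointed at the file from the question rather than at random numbers. What follows is only a sketch under a few assumptions: one value per line, blank lines ignored, and each line treated as a string key, so "0123" and "123" would count as different entries. The name DedupeWithCollection and its parameters are made up for illustration. Instead of a separate existence test, it relies on Collection.Add raising error 457 when the key is already present.

```vb
' Hypothetical helper: copies sInPath to sOutPath, dropping repeated lines
' and keeping the first occurrence of each. Returns the number of unique lines.
Public Function DedupeWithCollection(ByVal sInPath As String, _
                                     ByVal sOutPath As String) As Long
    Dim colSeen As Collection
    Dim sLine As String
    Dim bIsNew As Boolean
    Dim ffIn As Integer, ffOut As Integer

    Set colSeen = New Collection

    ffIn = FreeFile
    Open sInPath For Input As #ffIn
    ffOut = FreeFile                ' called after the first Open, so it returns a different number
    Open sOutPath For Output As #ffOut

    Do While Not EOF(ffIn)
        Line Input #ffIn, sLine
        sLine = Trim$(sLine)
        If Len(sLine) > 0 Then
            ' Adding an existing key raises error 457; no error means a new value
            On Error Resume Next
            colSeen.Add sLine, sLine
            bIsNew = (Err.Number = 0)
            On Error GoTo 0         ' restores normal error handling and clears Err
            If bIsNew Then Print #ffOut, sLine
        End If
    Loop

    Close #ffOut
    Close #ffIn
    DedupeWithCollection = colSeen.Count
End Function
```

Unlike the flag-array approach, this keeps the numbers in the order they first appear rather than sorting them, and memory use depends on how many unique values there are, not on the largest value.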
 
LVL 2

Expert Comment

by:Microsoft
Accept Hornet's code; it's very well written, even if I say so myself.

Well done, Hornet.

cheers
Andy