Solved

List Parsing

Posted on 2001-06-04
Last Modified: 2010-05-02
Hi,

I have a program that produces text files, each containing thousands of numbers separated by new lines. The numbers are between 4 and 7 digits long, and some of the files are upwards of 3 MB. Unfortunately, most of the numbers are duplicated. What would be the most efficient way for me to parse these files and remove all of the duplicates?

Zaphod.
Question by:Z_Beeblebrox
 
LVL 1

Expert Comment

by:superchook
ID: 6154517
Well...

One way that I have used in the past is to read the list into an array (or a database, if the number of unique values is truly huge).

Then, for each new number you read, scan the array (or database) for it: add the number if it is not there yet, or discard it if it is.

Using an SQL-compliant database has a couple of other advantages: you can dump the list sorted or filtered by any number of criteria, whereas with an array you have to perform those operations yourself. Arrays in RAM are much faster, though, if the sample set fits into memory.
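
For what it's worth, the array version of this could be sketched roughly as below. This is only an illustration, assuming every value fits in a Long and the unique values fit in memory; the names AddIfNew, aUnique and nUnique are made up for the example. Note that each call scans everything stored so far, so the cost per number grows with the count of unique values.

```vb
Option Explicit

Private aUnique() As Long   ' unique values found so far (hypothetical name)
Private nUnique As Long     ' how many slots of aUnique are in use

' Returns True if lngValue was new and has been stored,
' False if it was already in the array (a duplicate).
Private Function AddIfNew(ByVal lngValue As Long) As Boolean
    Dim i As Long

    ' Linear scan of everything stored so far
    For i = 0 To nUnique - 1
        If aUnique(i) = lngValue Then Exit Function   ' duplicate: discard
    Next i

    ' Not found: grow the array in chunks rather than one ReDim per item
    If nUnique = 0 Then
        ReDim aUnique(0 To 1023)
    ElseIf nUnique > UBound(aUnique) Then
        ReDim Preserve aUnique(0 To nUnique * 2 - 1)
    End If

    aUnique(nUnique) = lngValue
    nUnique = nUnique + 1
    AddIfNew = True
End Function
```

You would call AddIfNew(CLng(Val(strLine))) once per line read from the file, then write out aUnique(0) to aUnique(nUnique - 1) at the end.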



 

Expert Comment

by:sunnysideandy
ID: 6154557
VERSION 5.00
Object = "{831FDD16-0C5C-11D2-A9FC-0000F8754DA1}#2.0#0"; "mscomctl.ocx"
Begin VB.Form frmSortRand
   Caption         =   "Form1"
   ClientHeight    =   5190
   ClientLeft      =   60
   ClientTop       =   345
   ClientWidth     =   5895
   LinkTopic       =   "Form1"
   ScaleHeight     =   5190
   ScaleWidth      =   5895
   StartUpPosition =   3  'Windows Default
   Begin MSComctlLib.ListView lstOrder
      Height          =   4965
      Left            =   2175
      TabIndex        =   3
      Top             =   150
      Width           =   1740
      _ExtentX        =   3069
      _ExtentY        =   8758
      View            =   3
      LabelWrap       =   -1  'True
      HideSelection   =   -1  'True
      FullRowSelect   =   -1  'True
      _Version        =   393217
      ForeColor       =   -2147483640
      BackColor       =   -2147483643
      BorderStyle     =   1
      Appearance      =   1
      NumItems        =   0
   End
   Begin MSComctlLib.ListView lstRandom
      Height          =   4965
      Left            =   300
      TabIndex        =   2
      Top             =   150
      Width           =   1665
      _ExtentX        =   2937
      _ExtentY        =   8758
      View            =   3
      LabelWrap       =   -1  'True
      HideSelection   =   -1  'True
      FullRowSelect   =   -1  'True
      _Version        =   393217
      ForeColor       =   -2147483640
      BackColor       =   -2147483643
      BorderStyle     =   1
      Appearance      =   1
      NumItems        =   0
   End
   Begin VB.CommandButton cmdList
      Caption         =   "List"
      Height          =   390
      Left            =   4050
      TabIndex        =   1
      Top             =   900
      Width           =   1665
   End
   Begin VB.CommandButton cmdGetRandom
      Caption         =   "Get Random"
      Height          =   390
      Left            =   4050
      TabIndex        =   0
      Top             =   225
      Width           =   1665
   End
End
Attribute VB_Name = "frmSortRand"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = True
Attribute VB_Exposed = False
Option Explicit
Const MAX_RAND = 1000

Private Sub cmdGetRandom_Click()
    ' Fill the left-hand list with MAX_RAND random values in the range 1..MAX_RAND.
    Dim nIndex As Integer
    lstRandom.ListItems.Clear
    For nIndex = 1 To MAX_RAND
        lstRandom.ListItems.Add , "id=" & nIndex, CStr(Int((MAX_RAND * Rnd) + 1))
    Next
End Sub

Private Sub cmdList_Click()
    ' Flag each value seen, using the value itself as the array index.
    ' A duplicate just sets the same flag again, so duplicates collapse.
    Dim aSorted(MAX_RAND) As Integer
    Dim nIndex As Integer
    For nIndex = 1 To lstRandom.ListItems.Count
        aSorted(CInt(lstRandom.ListItems.Item(nIndex).Text)) = 1
    Next
    ' Walk the flags in index order: the output is unique and already sorted.
    lstOrder.ListItems.Clear
    For nIndex = 0 To MAX_RAND
        If (aSorted(nIndex) = 1) Then
            lstOrder.ListItems.Add , "id=" & nIndex, CStr(nIndex)
        End If
    Next
End Sub

Private Sub Form_Load()
    lstRandom.ColumnHeaders.Add , , , lstRandom.Width - 265
    lstOrder.ColumnHeaders.Add , , , lstOrder.Width - 265
End Sub
 
LVL 3

Accepted Solution

by:
Hornet241 earned 200 total points
ID: 6154859
This way will take a minute or so, but you won't be hampered by the maximum item limits of a list box. (It took a K6 266 with 80 MB of RAM 20 seconds to loop through an array of 0 to 9999999.)

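Dim NumArray() As Boolean
Dim ff As Integer
Dim InNum As String
Dim ThisNum As Long, LastHi As Long, a As Long

ReDim NumArray(0)   ' so that UBound works even if the file is empty

ff = FreeFile
Open "YourFileName" For Input As #ff

Do While Not EOF(ff)
    Line Input #ff, InNum
    ThisNum = Val(InNum)
    ' Grow the flag array whenever a new highest number turns up
    If ThisNum > LastHi Then
        LastHi = ThisNum
        ReDim Preserve NumArray(LastHi)
    End If
    NumArray(ThisNum) = True   ' a duplicate just sets the same flag again
Loop
Close #ff

ff = FreeFile
Open "NextFileName" For Output As #ff

' Every flagged index is a number that occurred at least once
For a = 0 To UBound(NumArray)
    If NumArray(a) Then
        Print #ff, Trim(Str(a))
    End If
Next a

Close #ff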
 
LVL 7

Author Comment

by:Z_Beeblebrox
ID: 6154926
Hi,

Just so you guys don't think I am ignoring this question: so far I prefer Hornet241's solution. In fact, it's pretty impressive. But just in case there is a better way, I will leave this question open until tomorrow evening. If there is no better answer by then, you can have the points.

Zaphod.
 
LVL 43

Expert Comment

by:TimCottee
ID: 6155448
Zaphod, here is another possible solution. It works equally well with strings and numbers, and it uses the Collection object, which lets you test directly, by key, whether an element already exists:

Private colNumbers As Collection

Private Sub Command3_Click()
    Dim lngNumber As Long
    Dim lngCount As Long
    Set colNumbers = New Collection
    Do
        lngCount = lngCount + 1
        lngNumber = Rnd() * 10000
        If Not NumberExists(lngNumber) Then
            colNumbers.Add lngNumber, CStr(lngNumber)
        End If
        Label1.Caption = colNumbers.Count & " / " & lngCount
        Label2.Caption = lngNumber
        DoEvents
    Loop Until lngCount = 1000000   ' the limit has to stay within the range of a Long
    MsgBox colNumbers.Count
    Set colNumbers = Nothing
End Sub

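Private Function NumberExists(ByVal Number As Long) As Boolean
    Dim varItem As Variant
    ' Looking up a key that is not in the collection raises a run-time error,
    ' so "no error" means the number is already stored.
    On Error Resume Next
    varItem = colNumbers(CStr(Number))
    NumberExists = (Err.Number = 0)
    On Error GoTo 0
End Function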

This example uses random numbers. If you run it (with the label controls in place so you can see what is going on), you will see that the count of numbers tested keeps climbing, while the count of elements creeps up (relatively slowly) to the maximum and then just sits there.
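
To tie this back to the original question, the same Collection idea applied to a file might look something like the sketch below. It is only an illustration under a few assumptions: RemoveDuplicates and its two file-name parameters are made-up names, blank lines are skipped, and each line is used as its own key. Unlike the flag-array approach, the output keeps the order of first appearance rather than being sorted.

```vb
Option Explicit

' Reads strInFile line by line and writes each distinct line to strOutFile,
' in order of first appearance. Both file names are placeholders.
Private Sub RemoveDuplicates(ByVal strInFile As String, ByVal strOutFile As String)
    Dim colSeen As Collection
    Dim ffIn As Integer, ffOut As Integer
    Dim strLine As String
    Dim blnNew As Boolean

    Set colSeen = New Collection

    ffIn = FreeFile
    Open strInFile For Input As #ffIn
    ffOut = FreeFile
    Open strOutFile For Output As #ffOut

    Do While Not EOF(ffIn)
        Line Input #ffIn, strLine
        strLine = Trim$(strLine)
        If Len(strLine) > 0 Then
            ' Adding a key that already exists raises run-time error 457,
            ' which is used here as the duplicate test.
            On Error Resume Next
            colSeen.Add strLine, strLine
            blnNew = (Err.Number = 0)
            On Error GoTo 0
            If blnNew Then Print #ffOut, strLine
        End If
    Loop

    Close #ffOut
    Close #ffIn
End Sub
```

It would be called as RemoveDuplicates "YourFileName", "NextFileName".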
 
LVL 2

Expert Comment

by:Microsoft
ID: 6157663
Accept Hornet's code; it is very well written, even if I say so myself.

well done horn

cheers
Andy