Solved

Word List Pivot table not counting correctly.

Posted on 2013-11-12
11
412 Views
Last Modified: 2013-11-14
Using the following macro to generate word counts of a list of words


Sub MakeWordList(wkbk As Excel.Workbook)
    Dim InputSheet As Worksheet
    Dim WordListSheet As Worksheet
    Dim PuncChars As Variant, x As Variant
    Dim i As Long, r As Long
    Dim txt As String
    Dim wordCnt As Long
    Dim AllWords As Range
    Dim PC As PivotCache
    Dim PT As PivotTable
   
    Application.ScreenUpdating = False
    Set InputSheet = Workbooks("Automated Interview Analysis.xlsm").Sheets("owssvr")
 
    Set WordListSheet = wkbk.Worksheets.Add
    WordListSheet.name = "FreqAll"
    WordListSheet.Range("A1") = "All Words"
    WordListSheet.Range("A1").Font.Bold = True
    InputSheet.Activate
    wordCnt = 2
    PuncChars = Array(".", ",", ";", ":", "'", "!", "#", _
        "$", "%", "&", "(", ")", " - ", "_", "--", "+", _
        "=", "~", "/", "\", "{", "}", "[", "]", """", "?", "*")
    r = 1

'   Loop until blank cell is encountered
    Do While Cells(r, 1) <> ""
'       covert to UPPERCASE
        txt = UCase(Cells(r, 1))
'       Remove punctuation
        For i = 0 To UBound(PuncChars)
            txt = Replace(txt, PuncChars(i), "")
        Next i
'       Remove excess spaces
        txt = WorksheetFunction.Trim(txt)
'       Extract the words
        x = Split(txt)
        For i = 0 To UBound(x)
            WordListSheet.Cells(wordCnt, 1) = x(i)
            wordCnt = wordCnt + 1
        Next i
    r = r + 1
    Loop
   
'   Create pivot table
    WordListSheet.Activate
    Set AllWords = Range("A1").CurrentRegion
    Set PC = ActiveWorkbook.PivotCaches.Add _
        (SourceType:=xlDatabase, _
        SourceData:=AllWords)
    Set PT = PC.CreatePivotTable _
        (TableDestination:=Range("C1"), _
        tableName:="PivotTable1")
    With PT
        .AddDataField .PivotFields("All Words")
        .PivotFields("All Words").Orientation = xlRowField
    End With
End Sub


However, when I create th epivot table.  It puts some words twice with the count and so they need to be further summed.  Any reason this is happening?  Can't figure it out.
0
Comment
Question by:HyperBPP
  • 6
  • 5
11 Comments
 
LVL 85

Accepted Solution

by:
Rory Archibald earned 500 total points
ID: 39644184
Hard to be sure without a sample workbook but I suspect you have some non-printing characters that are causing the 'duplicates'.
0
 
LVL 6

Author Comment

by:HyperBPP
ID: 39644987
The document is a governemnt FOUO document so I cannot post it.  How would I identify these non-printing characters?
0
 
LVL 85

Expert Comment

by:Rory Archibald
ID: 39645008
Start by testing the length of the cells in the pivot table row fields (using the LEN function) and see if they are different.
0
 
LVL 6

Author Comment

by:HyperBPP
ID: 39645066
Tested.  They are coming up different.  If I click in a cell they don't appear to be different.  So how do I get rid of the non-printing characters?
0
 
LVL 85

Expert Comment

by:Rory Archibald
ID: 39645113
If you edit the cell do there appear to be any trailing or leading spaces? Did the data perhaps come from a web page?
0
U.S. Department of Agriculture and Acronis Access

With the new era of mobile computing, smartphones and tablets, wireless communications and cloud services, the USDA sought to take advantage of a mobilized workforce and the blurring lines between personal and corporate computing resources.

 
LVL 6

Author Comment

by:HyperBPP
ID: 39645136
If edit the cell there are no leading or trailing spaces.  The data came from a MS word document.  I tried the following function that I found online:

Public Function TrimComplete(ByVal sValue As String) As _
        String

        Dim sAns As String
        Dim sWkg As String
        Dim sChar As String
        Dim lLen As Long
        Dim lCtr As Long

        sAns = sValue
        lLen = Len(sValue)

        If lLen > 0 Then
            'Ltrim
            For lCtr = 1 To lLen
                sChar = Mid(sAns, lCtr, 1)
                If (Asc(sChar) > 32) and (Asc(sChar) < 127) Then Exit For
            Next

            sAns = Mid(sAns, lCtr)
            lLen = Len(sAns)

            'Rtrim
            If lLen > 0 Then
                For lCtr = lLen To 1 Step -1
                    sChar = Mid(sAns, lCtr, 1)
                    If (Asc(sChar) > 32) and (Asc(sChar) < 127) Then Exit For
                Next
            End If
            sAns = Left$(sAns, lCtr)
        End If

        TrimComplete = sAns

    End Function

It collapsed a couple of the words.   However, there appear to still be a few non-printable characters after this function runs.  It fixed some of though.
0
 
LVL 85

Expert Comment

by:Rory Archibald
ID: 39645336
Can you test this version on a copy of your workbook?

Sub MakeWordList(wkbk As Excel.Workbook)
   Dim strPattern                  As String
   Dim RegExp                      As Object
   Dim InputSheet                  As Worksheet
   Dim WordListSheet               As Worksheet
   Dim x                           As Variant
   Dim i                           As Long
   Dim r                           As Long
   Dim txt                         As String
   Dim wordCnt                     As Long
   Dim AllWords                    As Range
   Dim PC                          As PivotCache
   Dim PT                          As PivotTable

   Application.ScreenUpdating = False
   '    Set InputSheet = Workbooks("Automated Interview Analysis.xlsm").Sheets("owssvr")
   Set InputSheet = ActiveWorkbook.Sheets(1)
   Set WordListSheet = wkbk.Worksheets.Add
   Set RegExp = CreateObject("vbscript.regexp")
   strPattern = "([^A-Za-z0-9 ]+)"
   With RegExp
      .Pattern = strPattern
      .IgnoreCase = False
   End With
   WordListSheet.Name = "FreqAll"
   WordListSheet.Range("A1") = "All Words"
   WordListSheet.Range("A1").Font.Bold = True
   InputSheet.Activate
   r = 1

   '   Loop until blank cell is encountered
   Do While Cells(r, 1) <> ""
      '       covert to UPPERCASE
      txt = UCase(Cells(r, 1).Value)
      '       clean text
      txt = RegExp.Replace(txt, "")
      '       Remove excess spaces
      txt = WorksheetFunction.Trim(txt)
      '       Extract the words
      x = Split(txt)
      For i = 0 To UBound(x)
         WordListSheet.Cells(wordCnt, 1) = x(i)
         wordCnt = wordCnt + 1
      Next i
      r = r + 1
   Loop

   '   Create pivot table
   WordListSheet.Activate
   Set AllWords = Range("A1").CurrentRegion
   Set PC = ActiveWorkbook.PivotCaches.Add _
            (SourceType:=xlDatabase, _
             SourceData:=AllWords)
   Set PT = PC.CreatePivotTable _
            (TableDestination:=Range("C1"), _
             TableName:="PivotTable1")
   With PT
      .AddDataField .PivotFields("All Words")
      .PivotFields("All Words").Orientation = xlRowField
   End With
End Sub

Open in new window

0
 
LVL 6

Author Comment

by:HyperBPP
ID: 39645365
Throws and error at line 42.... Application defined or object defined error
0
 
LVL 85

Expert Comment

by:Rory Archibald
ID: 39645416
Oops - too much hack and slash. Add back:
wordCnt = 2

Open in new window

before line 28
0
 
LVL 6

Author Comment

by:HyperBPP
ID: 39645425
Does only a few words, spits out 1's for those words and then stops.  No errors.  Just very incomplete.
0
 
LVL 6

Author Closing Comment

by:HyperBPP
ID: 39648589
This was the case.  Ended up using the Clean() function to recitfy.
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction This Article briefly covers methods of calculating the NPV and IRR variants in Excel as well as the limitations in calculating and interpreting IRR results. Paraphrasing Richard Shockley, author of my favourite finance reference tex…
As with any other System Center product, the installation for the Authoring Tool can be quite a pain sometimes. This article serves to help you avoid making these mistakes and hopefully save you a ton of time on troubleshooting :)  Step 1: Make sur…
The viewer will learn how to use the =DISCRINV command to create a discrete random variable, use this command to model a set of probabilities and outcomes in a Monte Carlo simulation, and learn how to find the standard deviation of a set of probabil…
The viewer will learn how to create a normally distributed random variable in Excel, use a normal distribution to simulate the return on an investment over a period of years, Create a Monte Carlo simulation using a normal random variable, and calcul…

919 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

23 Experts available now in Live!

Get 1:1 Help Now