Solved

Using regular expressions to convert chess pgn data into columns

Posted on 2007-11-25
11
811 Views
Last Modified: 2010-04-30
What I want to do is take a chess .pgn file and convert it to print out in column form by running a Word or Excel macro.

The raw.pgn file looks like:

[Event "Team 45 45 League T35"]
[Site "Internet Chess Club"]
[Date "2007.10.21"]
[Round "1"]
[White "Checkmates"]
[Black "MBelli"]
[Result "1/2-1/2"]
[WhiteElo "2394"]
[BlackElo "2530"]
[Opening "King's Indian: Petrosian system, Stein variation"]
[ECO "E92"]
[Time "00:29:48"]
[TimeControl "2700+45"]

1. d4 {0:45:41} Nf6 {0:45:32} 2. c4 {0:46:21} g6 {0:46:15} 3. Nc3 {0:47:04}
Bg7 {0:46:59} 4. e4 {0:47:48} d6 {0:47:43} 5. Nf3 {0:48:10} O-O {0:48:26} 6.
Be2 {0:48:52} e5 {0:49:10} 7. d5 {0:49:35} a5 {0:49:51} 8. Bg5 {0:50:17} h6
{0:50:19} 9. Bh4 {0:50:58} Na6 {0:50:59} 10. Nd2 {0:51:39} Qe8 {0:51:39} 11.
O-O {0:52:12} Nh7 {0:52:06} 12. a3 {0:52:46} Bd7 {0:52:48} 13. Kh1 {0:53:28}
h5 {0:53:09} 14. f3 {0:54:09} Bh6 {0:53:36} 15. b3 {0:54:46} Qb8 {0:54:12}
16. Bf2 {0:55:20} f5 {0:53:51} 17. Qc2 {0:55:26} Qd8 {0:53:25} 18. Rab1
{0:55:55} Qg5 {0:53:30} 19. Rfd1 {0:55:37} h4 {0:53:36} 20. b4 {0:55:52}
axb4 {0:54:19} 21. axb4 {0:56:32} Nf6 {0:55:00} 22. c5 {0:55:11} Nh5
{0:54:46} 23. Bd3 {0:52:51} {Game drawn by mutual agreement} 1/2-1/2

and I would like to have it look like:

1. d4 Nf6
2. c4 g6
3. Nc3 Bg7
etc.

I am a regular expression rookie that has hit the wall on this problem. Thanks in advance for the help!

Fred
0
Comment
Question by:freddick
  • 6
  • 5
11 Comments
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 20346692
Hello freddick,

This is close--it works on the sample data, but fails on Move 23 because it does not fit the pattern.

Regards,

Patrick
Option Explicit 
Sub GrabThem() 
    Dim fso As Object
    Dim ts As Object
    Dim AllFile As String
    Dim arr As Variant
    Dim Counter As Long
    Dim TheLine As String
    
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile("c:\foo.txt")
    
    AllFile = ts.ReadAll
    
    arr = RegExpFind(AllFile, "\d+\.\s+([a-z]|-)+\d?\s+\{\d+:\d{2}:\d{2}\}\s+([a-z]|-)+\d?\s+\{\d+:\d{2}:\d{2}\}", , _
        False)
    
    For Counter = 0 To UBound(arr)
        Cells(Counter + 1, 1) = Counter + 1
        Cells(Counter + 1, 2) = RegExpFind(CStr(arr(Counter)), "([a-z]|-)+\d?", 1, False)
        Cells(Counter + 1, 3) = RegExpFind(CStr(arr(Counter)), "([a-z]|-)+\d?", 2, False)
    Next
    
    ts.Close
    Set ts = Nothing
    Set fso = Nothing
    
End Sub 
Function RegExpFind(LookIn As String, PatternStr As String, Optional Pos, _
    Optional MatchCase As Boolean = True) 
    ' This function uses Regular Expressions to parse a string (LookIn), and return matches to a
    ' pattern (PatternStr).  Use Pos to indicate which match you want:
    ' Pos omitted               : function returns a zero-based array of all matches
    ' Pos = 0                   : the last match
    ' Pos = 1                   : the first match
    ' Pos = 2                   : the second match
    ' Pos = <positive integer>  : the Nth match
    ' If Pos is greater than the number of matches, is negative, or is non-numeric, the function
    ' returns an empty string.  If no match is found, the function returns an empty string
    
    ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
    ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
    
    ' If you use this function in Excel, you can use range references for any of the arguments.
    ' If you use this in Excel and return the full array, make sure to set up the formula as an
    ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
    
    Dim RegX As Object
    Dim TheMatches As Object
    Dim Answer() As String
    Dim Counter As Long
    
    ' Evaluate Pos.  If it is there, it must be numeric and converted to Long
    If Not IsMissing(Pos) Then
        If Not IsNumeric(Pos) Then
            RegExpFind = ""
            Exit Function
        Else
            Pos = CLng(Pos)
        End If
    End If
    
    ' Create instance of RegExp object
    Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = True
        .IgnoreCase = Not MatchCase
    End With
        
    ' Test to see if there are any matches
    If RegX.test(LookIn) Then
        
        ' Run RegExp to get the matches, which are returned as a zero-based collection
        Set TheMatches = RegX.Execute(LookIn)
        
        ' If Pos is missing, user wants array of all matches.  Build it and assign it as the
        ' function's return value
        If IsMissing(Pos) Then
            ReDim Answer(0 To TheMatches.Count - 1) As String
            For Counter = 0 To UBound(Answer)
                Answer(Counter) = TheMatches(Counter)
            Next
            RegExpFind = Answer
        
        ' User wanted the Nth match (or last match, if Pos = 0).  Get the Nth value, if possible
        Else
            Select Case Pos
                Case 0                          ' Last match
                    RegExpFind = TheMatches(TheMatches.Count - 1)
                Case 1 To TheMatches.Count      ' Nth match
                    RegExpFind = TheMatches(Pos - 1)
                Case Else                       ' Invalid item number
                    RegExpFind = ""
            End Select
        End If
    
    ' If there are no matches, return empty string
    Else
        RegExpFind = ""
    End If
    
    ' Release object variables
    Set RegX = Nothing
    Set TheMatches = Nothing
    
End Function

Open in new window

0
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 20346742
freddick,

Before anyone can come up with a fix for the "last move", you are going to have t show how the
various end of game scenarios (checkmate, resigns, draw, out of time, etc) are depicted.

Regards,

Patrick
0
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 20346780
freddick,

Taking a look at a PGN standards page (http://www.chessclub.com/help/PGN-spec) I see that
I also have not allowed for pawn promotions, and no doubt other things too...

Regards,

Patrick
0
 

Author Comment

by:freddick
ID: 20347529
Thank you for looking at this.

I thought that the easiest way to do this was to paste from move 1. to the end of all the moves into Word, then do a find \{*:*:*\} and replace with 'nothing' to get thet times out.

Then go to the beggining of the moves and start with the move number  1. then go to the second set of double spaces and insert a comma.

1.<space>d4<space><space>f5<space><space>,

then do a text to table conversion based on the ,

One problem that I ran into was that once the {times} were deleted the text realigned itself and sometimes replaced one of the second spaces with a paragraph mark.

The other problem that I ran into was in matching the move number my pattern would first match tens digit and then the whole number e.g., the 1 in 12 and then 12.

What I then unsuccessfully tried to do was to say, "Ok, I have removed the times and am left with something like this:

1.<space>d4<space><space>f5<space><space>2.<space> etc.

...and all I really care about is starting at 1. and matching until the space before the next move number, in this case 2. and then inserting my comma in order to be able to perform a text to table conversion. But I didn't know how to make the pattern match look at the entire move number only, for example 12 and not 1 and then 12. And then once I found the second number how to back up one space and insert my comma.  

Some additional info --


Piece names:

pawn - usually referred to by their starting square

d5
dxe5

in the case of a doubled pawn where more than 1 pawn could capture

d4xe5

R = rook
N = Knight
B = Bishop
Q = Queen
K = King

A rare case where 2 Knights (for example) could be used to capture 2 opponents Knights, both starting from the same rank or file and capturing on the same (for the target pieces) rank of file

Na6xNb8

Pawn promotion could be any of the following:

a8=Q
a8=Q+ with check
a8=Q# with mate

or the pawn could promote to any other piece R,N,B,Q

in short, any move could end in check (+) or checkmate (#)


0
 

Author Comment

by:freddick
ID: 20349614
Does anyone have a solution in Perl or ?.
0
Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 20349841
freddick,

OK, tried a different approach, and this seems to be working more generally.  It should pick up
"regular" moves, captures, pawn promotions, castling, check, and checkmate, as well as the
various game outcomes (draw, black/white wins, abandoned).

Regards,

Patrick
Option Explicit 
Sub GrabThem() 
    Dim fso As Object
    Dim ts As Object
    Dim AllFile As String
    Dim arr As Variant
    Dim Counter As Long
    Dim TheMove As String
    
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile("c:\foo.txt")
    
    AllFile = ts.ReadAll
    
    arr = RegExpFind(AllFile, "([RNBQK]?[a-h][1-8](x[RNBQK]?[a-h][1-8])?(=[RNBQK])?[\+#]?)|O(-O){1,2}|" & _
        "1/2-1/2|1-0|0-1|\*", , True)
    
    ' use 1 and not 0 as lower terminus because 0 will grab the result from the tags at the top of the file
    For Counter = 1 To UBound(arr) Step 2
        Cells((Counter + 1) / 2, 1) = (Counter + 1) / 2
        TheMove = arr(Counter)
        Select Case TheMove
            Case "1/2-1/2": Cells((Counter + 1) / 2, 2) = "Draw"
            Case "1-0": Cells((Counter + 1) / 2, 2) = "White wins"
            Case "0-1": Cells((Counter + 1) / 2, 2) = "Black wins"
            Case "*": Cells((Counter + 1) / 2, 2) = "Game abandoned"
            Case Else: Cells((Counter + 1) / 2, 2) = TheMove
        End Select
        If (Counter + 1) <= UBound(arr) Then
            TheMove = arr(Counter + 1)
            Select Case TheMove
                Case "1/2-1/2": Cells((Counter + 1) / 2, 3) = "Draw"
                Case "1-0": Cells((Counter + 1) / 2, 3) = "White wins"
                Case "0-1": Cells((Counter + 1) / 2, 3) = "Black wins"
                Case "*": Cells((Counter + 1) / 2, 3) = "Game abandoned"
                Case Else: Cells((Counter + 1) / 2, 3) = TheMove
            End Select
        End If
    Next
    
    ts.Close
    Set ts = Nothing
    Set fso = Nothing
    
End Sub 
Function RegExpFind(LookIn As String, PatternStr As String, Optional Pos, _
    Optional MatchCase As Boolean = True) 
    ' This function uses Regular Expressions to parse a string (LookIn), and return matches to a
    ' pattern (PatternStr).  Use Pos to indicate which match you want:
    ' Pos omitted               : function returns a zero-based array of all matches
    ' Pos = 0                   : the last match
    ' Pos = 1                   : the first match
    ' Pos = 2                   : the second match
    ' Pos = <positive integer>  : the Nth match
    ' If Pos is greater than the number of matches, is negative, or is non-numeric, the function
    ' returns an empty string.  If no match is found, the function returns an empty string
    
    ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
    ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
    
    ' If you use this function in Excel, you can use range references for any of the arguments.
    ' If you use this in Excel and return the full array, make sure to set up the formula as an
    ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
    
    Dim RegX As Object
    Dim TheMatches As Object
    Dim Answer() As String
    Dim Counter As Long
    
    ' Evaluate Pos.  If it is there, it must be numeric and converted to Long
    If Not IsMissing(Pos) Then
        If Not IsNumeric(Pos) Then
            RegExpFind = ""
            Exit Function
        Else
            Pos = CLng(Pos)
        End If
    End If
    
    ' Create instance of RegExp object
    Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = True
        .IgnoreCase = Not MatchCase
    End With
        
    ' Test to see if there are any matches
    If RegX.test(LookIn) Then
        
        ' Run RegExp to get the matches, which are returned as a zero-based collection
        Set TheMatches = RegX.Execute(LookIn)
        
        ' If Pos is missing, user wants array of all matches.  Build it and assign it as the
        ' function's return value
        If IsMissing(Pos) Then
            ReDim Answer(0 To TheMatches.Count - 1) As String
            For Counter = 0 To UBound(Answer)
                Answer(Counter) = TheMatches(Counter)
            Next
            RegExpFind = Answer
        
        ' User wanted the Nth match (or last match, if Pos = 0).  Get the Nth value, if possible
        Else
            Select Case Pos
                Case 0                          ' Last match
                    RegExpFind = TheMatches(TheMatches.Count - 1)
                Case 1 To TheMatches.Count      ' Nth match
                    RegExpFind = TheMatches(Pos - 1)
                Case Else                       ' Invalid item number
                    RegExpFind = ""
            End Select
        End If
    
    ' If there are no matches, return empty string
    Else
        RegExpFind = ""
    End If
    
    ' Release object variables
    Set RegX = Nothing
    Set TheMatches = Nothing
    
End Function

Open in new window

0
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 20349871
freddick,

BTW, that code I just posted assumes that in the tags at the top of the file you have a tag like this:

[Result "1/2-1/2"]
[Result "1-0"]
[Result "0-1"]
[Result "*"]

Basically, I am counting on seeing the game disposition twice: once in the opening tags, and again
in the MoveText.

Regards,

Patrick
0
 

Author Comment

by:freddick
ID: 20357711
Hi Patrick,

Thank you for all your work on this! It's very much appreciated.

I tried the code on a few games and it always stops at the same spot at line 20...

Cells((Counter + 1) / 2, 1) =

Any ideas?

Thank You!

Fred
0
 

Author Comment

by:freddick
ID: 20357765
Hi Patrick,

Just to make sure I am using this properly...

Copy Patrick's program
Open Word and create new macro from program
Go to www.team4545league.org and copy entire game pgn code (or use code in above message)
Paste game code into Word
CTRL+HOME
Run new macro

get error line 20 "Compile error: Sub or Function not defined"

Fred
0
 
LVL 92

Accepted Solution

by:
Patrick Matthews earned 500 total points
ID: 20362551
fred,

Sorry for the confusion--the code I posted is for Excel, not Word :)

Regards,

Patrick
0
 

Author Comment

by:freddick
ID: 20367572
Thanks Patrick!

Works great!

One little thing left to do, if you are up to it, is if the game ends on the first player's move it doesn't print the last move. In the case above, line 23 is missing after you run the macro.

THANKS!!

Fred
0

Featured Post

Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

Join & Write a Comment

Nice table. Huge mess. Maybe this was something you created way back before you figured out tabs or a document you received from someone else. Either way, using the spacebar to separate the columns resulted in a mess. Trying to convert text to t…
Shortcuts in Word Just the other day I had a training for Microsoft and they wanted me to show how well the new Windows and Office behaved on a touch device, which by the way is great, but it was only then that I realized that using keyboard shortc…
This video walks the viewer through the process of creating a watermark for their document, customizing it, and saving it for viewing/printing needs.
Office 365 is currently available in five editions. Three of them are for business use: Office 365 Business Essentials, Office 365 Business, and Office 365 Business Premium. Two of them are for home/personal use: Office 365 Home and Office 365 Perso…

744 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now