Link to home
Start Free TrialLog in
Avatar of freddick
freddick

asked on

Using regular expressions to convert chess pgn data into columns

What I want to do is take a chess .pgn file and convert it to print out in column form by running a Word or Excel macro.

The raw.pgn file looks like:

[Event "Team 45 45 League T35"]
[Site "Internet Chess Club"]
[Date "2007.10.21"]
[Round "1"]
[White "Checkmates"]
[Black "MBelli"]
[Result "1/2-1/2"]
[WhiteElo "2394"]
[BlackElo "2530"]
[Opening "King's Indian: Petrosian system, Stein variation"]
[ECO "E92"]
[Time "00:29:48"]
[TimeControl "2700+45"]

1. d4 {0:45:41} Nf6 {0:45:32} 2. c4 {0:46:21} g6 {0:46:15} 3. Nc3 {0:47:04}
Bg7 {0:46:59} 4. e4 {0:47:48} d6 {0:47:43} 5. Nf3 {0:48:10} O-O {0:48:26} 6.
Be2 {0:48:52} e5 {0:49:10} 7. d5 {0:49:35} a5 {0:49:51} 8. Bg5 {0:50:17} h6
{0:50:19} 9. Bh4 {0:50:58} Na6 {0:50:59} 10. Nd2 {0:51:39} Qe8 {0:51:39} 11.
O-O {0:52:12} Nh7 {0:52:06} 12. a3 {0:52:46} Bd7 {0:52:48} 13. Kh1 {0:53:28}
h5 {0:53:09} 14. f3 {0:54:09} Bh6 {0:53:36} 15. b3 {0:54:46} Qb8 {0:54:12}
16. Bf2 {0:55:20} f5 {0:53:51} 17. Qc2 {0:55:26} Qd8 {0:53:25} 18. Rab1
{0:55:55} Qg5 {0:53:30} 19. Rfd1 {0:55:37} h4 {0:53:36} 20. b4 {0:55:52}
axb4 {0:54:19} 21. axb4 {0:56:32} Nf6 {0:55:00} 22. c5 {0:55:11} Nh5
{0:54:46} 23. Bd3 {0:52:51} {Game drawn by mutual agreement} 1/2-1/2

and I would like to have it look like:

1. d4 Nf6
2. c4 g6
3. Nc3 Bg7
etc.

I am a regular expression rookie that has hit the wall on this problem. Thanks in advance for the help!

Fred
Avatar of Patrick Matthews
Patrick Matthews
Flag of United States of America image

Hello freddick,

This is close--it works on the sample data, but fails on Move 23 because it does not fit the pattern.

Regards,

Patrick
Option Explicit 
Sub GrabThem() 
    Dim fso As Object
    Dim ts As Object
    Dim AllFile As String
    Dim arr As Variant
    Dim Counter As Long
    Dim TheLine As String
    
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile("c:\foo.txt")
    
    AllFile = ts.ReadAll
    
    arr = RegExpFind(AllFile, "\d+\.\s+([a-z]|-)+\d?\s+\{\d+:\d{2}:\d{2}\}\s+([a-z]|-)+\d?\s+\{\d+:\d{2}:\d{2}\}", , _
        False)
    
    For Counter = 0 To UBound(arr)
        Cells(Counter + 1, 1) = Counter + 1
        Cells(Counter + 1, 2) = RegExpFind(CStr(arr(Counter)), "([a-z]|-)+\d?", 1, False)
        Cells(Counter + 1, 3) = RegExpFind(CStr(arr(Counter)), "([a-z]|-)+\d?", 2, False)
    Next
    
    ts.Close
    Set ts = Nothing
    Set fso = Nothing
    
End Sub 
Function RegExpFind(LookIn As String, PatternStr As String, Optional Pos, _
    Optional MatchCase As Boolean = True) 
    ' This function uses Regular Expressions to parse a string (LookIn), and return matches to a
    ' pattern (PatternStr).  Use Pos to indicate which match you want:
    ' Pos omitted               : function returns a zero-based array of all matches
    ' Pos = 0                   : the last match
    ' Pos = 1                   : the first match
    ' Pos = 2                   : the second match
    ' Pos = <positive integer>  : the Nth match
    ' If Pos is greater than the number of matches, is negative, or is non-numeric, the function
    ' returns an empty string.  If no match is found, the function returns an empty string
    
    ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
    ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
    
    ' If you use this function in Excel, you can use range references for any of the arguments.
    ' If you use this in Excel and return the full array, make sure to set up the formula as an
    ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
    
    Dim RegX As Object
    Dim TheMatches As Object
    Dim Answer() As String
    Dim Counter As Long
    
    ' Evaluate Pos.  If it is there, it must be numeric and converted to Long
    If Not IsMissing(Pos) Then
        If Not IsNumeric(Pos) Then
            RegExpFind = ""
            Exit Function
        Else
            Pos = CLng(Pos)
        End If
    End If
    
    ' Create instance of RegExp object
    Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = True
        .IgnoreCase = Not MatchCase
    End With
        
    ' Test to see if there are any matches
    If RegX.test(LookIn) Then
        
        ' Run RegExp to get the matches, which are returned as a zero-based collection
        Set TheMatches = RegX.Execute(LookIn)
        
        ' If Pos is missing, user wants array of all matches.  Build it and assign it as the
        ' function's return value
        If IsMissing(Pos) Then
            ReDim Answer(0 To TheMatches.Count - 1) As String
            For Counter = 0 To UBound(Answer)
                Answer(Counter) = TheMatches(Counter)
            Next
            RegExpFind = Answer
        
        ' User wanted the Nth match (or last match, if Pos = 0).  Get the Nth value, if possible
        Else
            Select Case Pos
                Case 0                          ' Last match
                    RegExpFind = TheMatches(TheMatches.Count - 1)
                Case 1 To TheMatches.Count      ' Nth match
                    RegExpFind = TheMatches(Pos - 1)
                Case Else                       ' Invalid item number
                    RegExpFind = ""
            End Select
        End If
    
    ' If there are no matches, return empty string
    Else
        RegExpFind = ""
    End If
    
    ' Release object variables
    Set RegX = Nothing
    Set TheMatches = Nothing
    
End Function

Open in new window

freddick,

Before anyone can come up with a fix for the "last move", you are going to have t show how the
various end of game scenarios (checkmate, resigns, draw, out of time, etc) are depicted.

Regards,

Patrick
freddick,

Taking a look at a PGN standards page (http://www.chessclub.com/help/PGN-spec) I see that
I also have not allowed for pawn promotions, and no doubt other things too...

Regards,

Patrick
Avatar of freddick
freddick

ASKER

Thank you for looking at this.

I thought that the easiest way to do this was to paste from move 1. to the end of all the moves into Word, then do a find \{*:*:*\} and replace with 'nothing' to get thet times out.

Then go to the beggining of the moves and start with the move number  1. then go to the second set of double spaces and insert a comma.

1.<space>d4<space><space>f5<space><space>,

then do a text to table conversion based on the ,

One problem that I ran into was that once the {times} were deleted the text realigned itself and sometimes replaced one of the second spaces with a paragraph mark.

The other problem that I ran into was in matching the move number my pattern would first match tens digit and then the whole number e.g., the 1 in 12 and then 12.

What I then unsuccessfully tried to do was to say, "Ok, I have removed the times and am left with something like this:

1.<space>d4<space><space>f5<space><space>2.<space> etc.

...and all I really care about is starting at 1. and matching until the space before the next move number, in this case 2. and then inserting my comma in order to be able to perform a text to table conversion. But I didn't know how to make the pattern match look at the entire move number only, for example 12 and not 1 and then 12. And then once I found the second number how to back up one space and insert my comma.  

Some additional info --


Piece names:

pawn - usually referred to by their starting square

d5
dxe5

in the case of a doubled pawn where more than 1 pawn could capture

d4xe5

R = rook
N = Knight
B = Bishop
Q = Queen
K = King

A rare case where 2 Knights (for example) could be used to capture 2 opponents Knights, both starting from the same rank or file and capturing on the same (for the target pieces) rank of file

Na6xNb8

Pawn promotion could be any of the following:

a8=Q
a8=Q+ with check
a8=Q# with mate

or the pawn could promote to any other piece R,N,B,Q

in short, any move could end in check (+) or checkmate (#)


Does anyone have a solution in Perl or ?.
freddick,

OK, tried a different approach, and this seems to be working more generally.  It should pick up
"regular" moves, captures, pawn promotions, castling, check, and checkmate, as well as the
various game outcomes (draw, black/white wins, abandoned).

Regards,

Patrick
Option Explicit 
Sub GrabThem() 
    Dim fso As Object
    Dim ts As Object
    Dim AllFile As String
    Dim arr As Variant
    Dim Counter As Long
    Dim TheMove As String
    
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile("c:\foo.txt")
    
    AllFile = ts.ReadAll
    
    arr = RegExpFind(AllFile, "([RNBQK]?[a-h][1-8](x[RNBQK]?[a-h][1-8])?(=[RNBQK])?[\+#]?)|O(-O){1,2}|" & _
        "1/2-1/2|1-0|0-1|\*", , True)
    
    ' use 1 and not 0 as lower terminus because 0 will grab the result from the tags at the top of the file
    For Counter = 1 To UBound(arr) Step 2
        Cells((Counter + 1) / 2, 1) = (Counter + 1) / 2
        TheMove = arr(Counter)
        Select Case TheMove
            Case "1/2-1/2": Cells((Counter + 1) / 2, 2) = "Draw"
            Case "1-0": Cells((Counter + 1) / 2, 2) = "White wins"
            Case "0-1": Cells((Counter + 1) / 2, 2) = "Black wins"
            Case "*": Cells((Counter + 1) / 2, 2) = "Game abandoned"
            Case Else: Cells((Counter + 1) / 2, 2) = TheMove
        End Select
        If (Counter + 1) <= UBound(arr) Then
            TheMove = arr(Counter + 1)
            Select Case TheMove
                Case "1/2-1/2": Cells((Counter + 1) / 2, 3) = "Draw"
                Case "1-0": Cells((Counter + 1) / 2, 3) = "White wins"
                Case "0-1": Cells((Counter + 1) / 2, 3) = "Black wins"
                Case "*": Cells((Counter + 1) / 2, 3) = "Game abandoned"
                Case Else: Cells((Counter + 1) / 2, 3) = TheMove
            End Select
        End If
    Next
    
    ts.Close
    Set ts = Nothing
    Set fso = Nothing
    
End Sub 
Function RegExpFind(LookIn As String, PatternStr As String, Optional Pos, _
    Optional MatchCase As Boolean = True) 
    ' This function uses Regular Expressions to parse a string (LookIn), and return matches to a
    ' pattern (PatternStr).  Use Pos to indicate which match you want:
    ' Pos omitted               : function returns a zero-based array of all matches
    ' Pos = 0                   : the last match
    ' Pos = 1                   : the first match
    ' Pos = 2                   : the second match
    ' Pos = <positive integer>  : the Nth match
    ' If Pos is greater than the number of matches, is negative, or is non-numeric, the function
    ' returns an empty string.  If no match is found, the function returns an empty string
    
    ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
    ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
    
    ' If you use this function in Excel, you can use range references for any of the arguments.
    ' If you use this in Excel and return the full array, make sure to set up the formula as an
    ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
    
    Dim RegX As Object
    Dim TheMatches As Object
    Dim Answer() As String
    Dim Counter As Long
    
    ' Evaluate Pos.  If it is there, it must be numeric and converted to Long
    If Not IsMissing(Pos) Then
        If Not IsNumeric(Pos) Then
            RegExpFind = ""
            Exit Function
        Else
            Pos = CLng(Pos)
        End If
    End If
    
    ' Create instance of RegExp object
    Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = True
        .IgnoreCase = Not MatchCase
    End With
        
    ' Test to see if there are any matches
    If RegX.test(LookIn) Then
        
        ' Run RegExp to get the matches, which are returned as a zero-based collection
        Set TheMatches = RegX.Execute(LookIn)
        
        ' If Pos is missing, user wants array of all matches.  Build it and assign it as the
        ' function's return value
        If IsMissing(Pos) Then
            ReDim Answer(0 To TheMatches.Count - 1) As String
            For Counter = 0 To UBound(Answer)
                Answer(Counter) = TheMatches(Counter)
            Next
            RegExpFind = Answer
        
        ' User wanted the Nth match (or last match, if Pos = 0).  Get the Nth value, if possible
        Else
            Select Case Pos
                Case 0                          ' Last match
                    RegExpFind = TheMatches(TheMatches.Count - 1)
                Case 1 To TheMatches.Count      ' Nth match
                    RegExpFind = TheMatches(Pos - 1)
                Case Else                       ' Invalid item number
                    RegExpFind = ""
            End Select
        End If
    
    ' If there are no matches, return empty string
    Else
        RegExpFind = ""
    End If
    
    ' Release object variables
    Set RegX = Nothing
    Set TheMatches = Nothing
    
End Function

Open in new window

freddick,

BTW, that code I just posted assumes that in the tags at the top of the file you have a tag like this:

[Result "1/2-1/2"]
[Result "1-0"]
[Result "0-1"]
[Result "*"]

Basically, I am counting on seeing the game disposition twice: once in the opening tags, and again
in the MoveText.

Regards,

Patrick
Hi Patrick,

Thank you for all your work on this! It's very much appreciated.

I tried the code on a few games and it always stops at the same spot at line 20...

Cells((Counter + 1) / 2, 1) =

Any ideas?

Thank You!

Fred
Hi Patrick,

Just to make sure I am using this properly...

Copy Patrick's program
Open Word and create new macro from program
Go to www.team4545league.org and copy entire game pgn code (or use code in above message)
Paste game code into Word
CTRL+HOME
Run new macro

get error line 20 "Compile error: Sub or Function not defined"

Fred
ASKER CERTIFIED SOLUTION
Avatar of Patrick Matthews
Patrick Matthews
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thanks Patrick!

Works great!

One little thing left to do, if you are up to it, is if the game ends on the first player's move it doesn't print the last move. In the case above, line 23 is missing after you run the macro.

THANKS!!

Fred