freddick
asked on
Using regular expressions to convert chess pgn data into columns
What I want to do is take a chess .pgn file and convert it to print out in column form by running a Word or Excel macro.
The raw.pgn file looks like:
[Event "Team 45 45 League T35"]
[Site "Internet Chess Club"]
[Date "2007.10.21"]
[Round "1"]
[White "Checkmates"]
[Black "MBelli"]
[Result "1/2-1/2"]
[WhiteElo "2394"]
[BlackElo "2530"]
[Opening "King's Indian: Petrosian system, Stein variation"]
[ECO "E92"]
[Time "00:29:48"]
[TimeControl "2700+45"]
1. d4 {0:45:41} Nf6 {0:45:32} 2. c4 {0:46:21} g6 {0:46:15} 3. Nc3 {0:47:04}
Bg7 {0:46:59} 4. e4 {0:47:48} d6 {0:47:43} 5. Nf3 {0:48:10} O-O {0:48:26} 6.
Be2 {0:48:52} e5 {0:49:10} 7. d5 {0:49:35} a5 {0:49:51} 8. Bg5 {0:50:17} h6
{0:50:19} 9. Bh4 {0:50:58} Na6 {0:50:59} 10. Nd2 {0:51:39} Qe8 {0:51:39} 11.
O-O {0:52:12} Nh7 {0:52:06} 12. a3 {0:52:46} Bd7 {0:52:48} 13. Kh1 {0:53:28}
h5 {0:53:09} 14. f3 {0:54:09} Bh6 {0:53:36} 15. b3 {0:54:46} Qb8 {0:54:12}
16. Bf2 {0:55:20} f5 {0:53:51} 17. Qc2 {0:55:26} Qd8 {0:53:25} 18. Rab1
{0:55:55} Qg5 {0:53:30} 19. Rfd1 {0:55:37} h4 {0:53:36} 20. b4 {0:55:52}
axb4 {0:54:19} 21. axb4 {0:56:32} Nf6 {0:55:00} 22. c5 {0:55:11} Nh5
{0:54:46} 23. Bd3 {0:52:51} {Game drawn by mutual agreement} 1/2-1/2
and I would like to have it look like:
1. d4 Nf6
2. c4 g6
3. Nc3 Bg7
etc.
I am a regular expression rookie that has hit the wall on this problem. Thanks in advance for the help!
Fred
The raw.pgn file looks like:
[Event "Team 45 45 League T35"]
[Site "Internet Chess Club"]
[Date "2007.10.21"]
[Round "1"]
[White "Checkmates"]
[Black "MBelli"]
[Result "1/2-1/2"]
[WhiteElo "2394"]
[BlackElo "2530"]
[Opening "King's Indian: Petrosian system, Stein variation"]
[ECO "E92"]
[Time "00:29:48"]
[TimeControl "2700+45"]
1. d4 {0:45:41} Nf6 {0:45:32} 2. c4 {0:46:21} g6 {0:46:15} 3. Nc3 {0:47:04}
Bg7 {0:46:59} 4. e4 {0:47:48} d6 {0:47:43} 5. Nf3 {0:48:10} O-O {0:48:26} 6.
Be2 {0:48:52} e5 {0:49:10} 7. d5 {0:49:35} a5 {0:49:51} 8. Bg5 {0:50:17} h6
{0:50:19} 9. Bh4 {0:50:58} Na6 {0:50:59} 10. Nd2 {0:51:39} Qe8 {0:51:39} 11.
O-O {0:52:12} Nh7 {0:52:06} 12. a3 {0:52:46} Bd7 {0:52:48} 13. Kh1 {0:53:28}
h5 {0:53:09} 14. f3 {0:54:09} Bh6 {0:53:36} 15. b3 {0:54:46} Qb8 {0:54:12}
16. Bf2 {0:55:20} f5 {0:53:51} 17. Qc2 {0:55:26} Qd8 {0:53:25} 18. Rab1
{0:55:55} Qg5 {0:53:30} 19. Rfd1 {0:55:37} h4 {0:53:36} 20. b4 {0:55:52}
axb4 {0:54:19} 21. axb4 {0:56:32} Nf6 {0:55:00} 22. c5 {0:55:11} Nh5
{0:54:46} 23. Bd3 {0:52:51} {Game drawn by mutual agreement} 1/2-1/2
and I would like to have it look like:
1. d4 Nf6
2. c4 g6
3. Nc3 Bg7
etc.
I am a regular expression rookie that has hit the wall on this problem. Thanks in advance for the help!
Fred
freddick,
Before anyone can come up with a fix for the "last move", you are going to have t show how the
various end of game scenarios (checkmate, resigns, draw, out of time, etc) are depicted.
Regards,
Patrick
Before anyone can come up with a fix for the "last move", you are going to have t show how the
various end of game scenarios (checkmate, resigns, draw, out of time, etc) are depicted.
Regards,
Patrick
freddick,
Taking a look at a PGN standards page (http://www.chessclub.com/help/PGN-spec) I see that
I also have not allowed for pawn promotions, and no doubt other things too...
Regards,
Patrick
Taking a look at a PGN standards page (http://www.chessclub.com/help/PGN-spec) I see that
I also have not allowed for pawn promotions, and no doubt other things too...
Regards,
Patrick
ASKER
Thank you for looking at this.
I thought that the easiest way to do this was to paste from move 1. to the end of all the moves into Word, then do a find \{*:*:*\} and replace with 'nothing' to get thet times out.
Then go to the beggining of the moves and start with the move number 1. then go to the second set of double spaces and insert a comma.
1.<space>d4<space><space>f 5<space><s pace>,
then do a text to table conversion based on the ,
One problem that I ran into was that once the {times} were deleted the text realigned itself and sometimes replaced one of the second spaces with a paragraph mark.
The other problem that I ran into was in matching the move number my pattern would first match tens digit and then the whole number e.g., the 1 in 12 and then 12.
What I then unsuccessfully tried to do was to say, "Ok, I have removed the times and am left with something like this:
1.<space>d4<space><space>f 5<space><s pace>2.<sp ace> etc.
...and all I really care about is starting at 1. and matching until the space before the next move number, in this case 2. and then inserting my comma in order to be able to perform a text to table conversion. But I didn't know how to make the pattern match look at the entire move number only, for example 12 and not 1 and then 12. And then once I found the second number how to back up one space and insert my comma.
Some additional info --
Piece names:
pawn - usually referred to by their starting square
d5
dxe5
in the case of a doubled pawn where more than 1 pawn could capture
d4xe5
R = rook
N = Knight
B = Bishop
Q = Queen
K = King
A rare case where 2 Knights (for example) could be used to capture 2 opponents Knights, both starting from the same rank or file and capturing on the same (for the target pieces) rank of file
Na6xNb8
Pawn promotion could be any of the following:
a8=Q
a8=Q+ with check
a8=Q# with mate
or the pawn could promote to any other piece R,N,B,Q
in short, any move could end in check (+) or checkmate (#)
I thought that the easiest way to do this was to paste from move 1. to the end of all the moves into Word, then do a find \{*:*:*\} and replace with 'nothing' to get thet times out.
Then go to the beggining of the moves and start with the move number 1. then go to the second set of double spaces and insert a comma.
1.<space>d4<space><space>f
then do a text to table conversion based on the ,
One problem that I ran into was that once the {times} were deleted the text realigned itself and sometimes replaced one of the second spaces with a paragraph mark.
The other problem that I ran into was in matching the move number my pattern would first match tens digit and then the whole number e.g., the 1 in 12 and then 12.
What I then unsuccessfully tried to do was to say, "Ok, I have removed the times and am left with something like this:
1.<space>d4<space><space>f
...and all I really care about is starting at 1. and matching until the space before the next move number, in this case 2. and then inserting my comma in order to be able to perform a text to table conversion. But I didn't know how to make the pattern match look at the entire move number only, for example 12 and not 1 and then 12. And then once I found the second number how to back up one space and insert my comma.
Some additional info --
Piece names:
pawn - usually referred to by their starting square
d5
dxe5
in the case of a doubled pawn where more than 1 pawn could capture
d4xe5
R = rook
N = Knight
B = Bishop
Q = Queen
K = King
A rare case where 2 Knights (for example) could be used to capture 2 opponents Knights, both starting from the same rank or file and capturing on the same (for the target pieces) rank of file
Na6xNb8
Pawn promotion could be any of the following:
a8=Q
a8=Q+ with check
a8=Q# with mate
or the pawn could promote to any other piece R,N,B,Q
in short, any move could end in check (+) or checkmate (#)
ASKER
Does anyone have a solution in Perl or ?.
freddick,
OK, tried a different approach, and this seems to be working more generally. It should pick up
"regular" moves, captures, pawn promotions, castling, check, and checkmate, as well as the
various game outcomes (draw, black/white wins, abandoned).
Regards,
Patrick
OK, tried a different approach, and this seems to be working more generally. It should pick up
"regular" moves, captures, pawn promotions, castling, check, and checkmate, as well as the
various game outcomes (draw, black/white wins, abandoned).
Regards,
Patrick
Option Explicit
Sub GrabThem()
Dim fso As Object
Dim ts As Object
Dim AllFile As String
Dim arr As Variant
Dim Counter As Long
Dim TheMove As String
Set fso = CreateObject("Scripting.FileSystemObject")
Set ts = fso.OpenTextFile("c:\foo.txt")
AllFile = ts.ReadAll
arr = RegExpFind(AllFile, "([RNBQK]?[a-h][1-8](x[RNBQK]?[a-h][1-8])?(=[RNBQK])?[\+#]?)|O(-O){1,2}|" & _
"1/2-1/2|1-0|0-1|\*", , True)
' use 1 and not 0 as lower terminus because 0 will grab the result from the tags at the top of the file
For Counter = 1 To UBound(arr) Step 2
Cells((Counter + 1) / 2, 1) = (Counter + 1) / 2
TheMove = arr(Counter)
Select Case TheMove
Case "1/2-1/2": Cells((Counter + 1) / 2, 2) = "Draw"
Case "1-0": Cells((Counter + 1) / 2, 2) = "White wins"
Case "0-1": Cells((Counter + 1) / 2, 2) = "Black wins"
Case "*": Cells((Counter + 1) / 2, 2) = "Game abandoned"
Case Else: Cells((Counter + 1) / 2, 2) = TheMove
End Select
If (Counter + 1) <= UBound(arr) Then
TheMove = arr(Counter + 1)
Select Case TheMove
Case "1/2-1/2": Cells((Counter + 1) / 2, 3) = "Draw"
Case "1-0": Cells((Counter + 1) / 2, 3) = "White wins"
Case "0-1": Cells((Counter + 1) / 2, 3) = "Black wins"
Case "*": Cells((Counter + 1) / 2, 3) = "Game abandoned"
Case Else: Cells((Counter + 1) / 2, 3) = TheMove
End Select
End If
Next
ts.Close
Set ts = Nothing
Set fso = Nothing
End Sub
Function RegExpFind(LookIn As String, PatternStr As String, Optional Pos, _
Optional MatchCase As Boolean = True)
' This function uses Regular Expressions to parse a string (LookIn), and return matches to a
' pattern (PatternStr). Use Pos to indicate which match you want:
' Pos omitted : function returns a zero-based array of all matches
' Pos = 0 : the last match
' Pos = 1 : the first match
' Pos = 2 : the second match
' Pos = <positive integer> : the Nth match
' If Pos is greater than the number of matches, is negative, or is non-numeric, the function
' returns an empty string. If no match is found, the function returns an empty string
' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
' If you use this function in Excel, you can use range references for any of the arguments.
' If you use this in Excel and return the full array, make sure to set up the formula as an
' array formula. If you need the array formula to go down a column, use TRANSPOSE()
Dim RegX As Object
Dim TheMatches As Object
Dim Answer() As String
Dim Counter As Long
' Evaluate Pos. If it is there, it must be numeric and converted to Long
If Not IsMissing(Pos) Then
If Not IsNumeric(Pos) Then
RegExpFind = ""
Exit Function
Else
Pos = CLng(Pos)
End If
End If
' Create instance of RegExp object
Set RegX = CreateObject("VBScript.RegExp")
With RegX
.Pattern = PatternStr
.Global = True
.IgnoreCase = Not MatchCase
End With
' Test to see if there are any matches
If RegX.test(LookIn) Then
' Run RegExp to get the matches, which are returned as a zero-based collection
Set TheMatches = RegX.Execute(LookIn)
' If Pos is missing, user wants array of all matches. Build it and assign it as the
' function's return value
If IsMissing(Pos) Then
ReDim Answer(0 To TheMatches.Count - 1) As String
For Counter = 0 To UBound(Answer)
Answer(Counter) = TheMatches(Counter)
Next
RegExpFind = Answer
' User wanted the Nth match (or last match, if Pos = 0). Get the Nth value, if possible
Else
Select Case Pos
Case 0 ' Last match
RegExpFind = TheMatches(TheMatches.Count - 1)
Case 1 To TheMatches.Count ' Nth match
RegExpFind = TheMatches(Pos - 1)
Case Else ' Invalid item number
RegExpFind = ""
End Select
End If
' If there are no matches, return empty string
Else
RegExpFind = ""
End If
' Release object variables
Set RegX = Nothing
Set TheMatches = Nothing
End Function
freddick,
BTW, that code I just posted assumes that in the tags at the top of the file you have a tag like this:
[Result "1/2-1/2"]
[Result "1-0"]
[Result "0-1"]
[Result "*"]
Basically, I am counting on seeing the game disposition twice: once in the opening tags, and again
in the MoveText.
Regards,
Patrick
BTW, that code I just posted assumes that in the tags at the top of the file you have a tag like this:
[Result "1/2-1/2"]
[Result "1-0"]
[Result "0-1"]
[Result "*"]
Basically, I am counting on seeing the game disposition twice: once in the opening tags, and again
in the MoveText.
Regards,
Patrick
ASKER
Hi Patrick,
Thank you for all your work on this! It's very much appreciated.
I tried the code on a few games and it always stops at the same spot at line 20...
Cells((Counter + 1) / 2, 1) =
Any ideas?
Thank You!
Fred
Thank you for all your work on this! It's very much appreciated.
I tried the code on a few games and it always stops at the same spot at line 20...
Cells((Counter + 1) / 2, 1) =
Any ideas?
Thank You!
Fred
ASKER
Hi Patrick,
Just to make sure I am using this properly...
Copy Patrick's program
Open Word and create new macro from program
Go to www.team4545league.org and copy entire game pgn code (or use code in above message)
Paste game code into Word
CTRL+HOME
Run new macro
get error line 20 "Compile error: Sub or Function not defined"
Fred
Just to make sure I am using this properly...
Copy Patrick's program
Open Word and create new macro from program
Go to www.team4545league.org and copy entire game pgn code (or use code in above message)
Paste game code into Word
CTRL+HOME
Run new macro
get error line 20 "Compile error: Sub or Function not defined"
Fred
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thanks Patrick!
Works great!
One little thing left to do, if you are up to it, is if the game ends on the first player's move it doesn't print the last move. In the case above, line 23 is missing after you run the macro.
THANKS!!
Fred
Works great!
One little thing left to do, if you are up to it, is if the game ends on the first player's move it doesn't print the last move. In the case above, line 23 is missing after you run the macro.
THANKS!!
Fred
This is close--it works on the sample data, but fails on Move 23 because it does not fit the pattern.
Regards,
Patrick
Open in new window