Search and replace, swapping tags

I have thousands of XML files that contain attributes whose values I need to swap. The attributes are on the following element:

<assessmentItem xmlns="http://www.imsglobal.org/xsd/imsqti_v2p1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.imsglobal.org/xsd/imsqti_v2p1  http://www.imsglobal.org/xsd/qti/qtiv2p1/imsqti_v2p1p1.xsd http://www.w3.org/1998/Math/MathML http://www.w3.org/Math/XMLSchema/mathml2/mathml2.xsd" identifier="choice" title="ABC1234567" adaptive="false" timeDependent="false">

I need to swap the identifier and title, or if swapping is too involved, I really just need the identifier to contain the value of the title.

However, further down in the file there are additional occurrences of both identifier= and title=, so I can't just find and replace them globally.

I'm not sure what the best way to go about this is.
Asked by: musickmann (Data Analyst)

Bill Prew commented:
What platform are you working on (Windows, Unix, Apple, etc...)?

So, you want the updated line in your example to read as below afterwards?

<assessmentItem xmlns="http://www.imsglobal.org/xsd/imsqti_v2p1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.imsglobal.org/xsd/imsqti_v2p1  http://www.imsglobal.org/xsd/qti/qtiv2p1/imsqti_v2p1p1.xsd http://www.w3.org/1998/Math/MathML http://www.w3.org/Math/XMLSchema/mathml2/mathml2.xsd" identifier="ABC1234567" title="choice" adaptive="false" timeDependent="false">

»bp

musickmann (Data Analyst, author) commented:
I'm running a Mac, but have a Win10 VM as well.

That is the correct re-written line.

What I've been doing is making two replacements:
First:
Find: " title
Replace: " identifier
Second:
Find: xsd" identifier
Replace: xsd" title

However, since I can't be 100% sure there won't be an errant match elsewhere in the file, I am doing this in batches of 300 since the files happen to be in folders of 300 each. As long as each Find/Replace only matches 300 occurrences, then I'll be okay to go on that package. But, with over 85,000 files, this will take a long time :)

Bill Prew commented:
Can you supply a full example file?

Would this always be the first match in a file? If so, replacing only the first occurrence per file might get us there.
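
For anyone following along, the "first occurrence only" idea could be sketched in PowerShell roughly as below. This is only a sketch under assumptions: $sample is a trimmed-down stand-in for a real item file, and the pattern assumes identifier is immediately followed by title, as in the header line quoted in the question. It uses the .NET Regex.Replace overload that takes a maximum replacement count.

```powershell
# Trimmed-down stand-in for an item file (hypothetical content, not the real sample).
$sample = @'
<assessmentItem identifier="choice" title="ABC1234567" adaptive="false">
  <responseDeclaration identifier="RESPONSE" title="other"/>
</assessmentItem>
'@

# Capture the two attribute values when identifier is directly followed by title.
$regex = [regex]'identifier="([^"]*)" title="([^"]*)"'

# The third argument limits the replacement to the first match only,
# so later identifier=/title= pairs in the file are left alone.
$result = $regex.Replace($sample, 'identifier="$2" title="$1"', 1)

$result
```

A text-level replace like this leaves the rest of the file byte-for-byte unchanged, but it depends on the attribute order being the same in every file, which an XML-aware approach does not.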


»bp
musickmann (Data Analyst, author) commented:
Interesting thought, it would be the first occurrence in the file. It is always in the header area of the file. Attached is a sample full file. These are based on the QTI 2.1 standard from IMS Global.

The content I need to swap will always be in the <assessmentItem> area.

Bill Prew commented:
"Attached is a sample full file."
Sorry, nothing attached.

»bp

musickmann (Data Analyst, author) commented:
Ha, guess it helps to click the upload file button.
QTI-Question-Sample.xml

Bill Prew commented:
Here is a small PowerShell script that I believe will do the job. Adjust the $folder variable to reference your folder. Since it updates the files in place, test it on a copy of your data first to make sure it works as desired.

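$folder = 'B:\EE\EE29077691\Files'   # folder containing the XML files; adjust to your own path
$filter = '*.xml'

# Process every matching file in the folder (files are overwritten in place)
Get-ChildItem $folder -Filter $filter |
ForEach-Object {
    Write-Host $_.FullName
    # Parse the file as XML
    [xml]$xml = (Get-Content $_.FullName)
    # The root <assessmentItem> element holds the two attributes to swap
    $node = $xml.assessmentItem
    $saveTitle = $node.title
    $node.title = $node.identifier
    $node.identifier = $saveTitle
    # Write the modified document back to the same file
    $xml.Save($_.FullName)
}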

»bp
musickmann (Data Analyst, author) commented:
This is the first time I've ever used anything in PowerShell, so that's pretty cool on its own.

It appears to be doing as expected, but I wanted to add a few counts as a check to make sure I see the expected numbers. I think I got it right, but would appreciate a second set of eyes.

In looking through this, another thought popped into my head. It would be a different question altogether, but is it possible to have PowerShell parse the XML files and export data from some of the nodes into a txt file?

 
$folder = 'Z:\LocalFiles\test'
$filter = '*.xml'
$item_count = 0
$identifier_count = 0
$title_count = 0
Get-ChildItem $folder -Filter $filter | 
Foreach-Object {
    Write-Host $_.FullName
    [xml]$xml = (Get-Content $_.FullName)
    $node = $xml.assessmentItem
    $saveTitle = $node.title
    $node.title = $node.identifier
    $title_count ++
    $node.identifier = $saveTitle
    $identifier_count ++
    $xml.Save($_.FullName)
    Write-Host "New Title:" $node.title
    Write-Host "New Identifier:" $node.identifier
    $item_count ++
}
Write-Host "Total Items:" $item_count
Write-Host "Total Identifiers:" $identifier_count
Write-Host "Total Titles:" $title_count
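
On the side question above about exporting node data to a text file: that looks doable with the same [xml] approach. The following is a minimal sketch, not a tested solution for the real packages. The function name Export-ItemSummary, the output file name, and the tab-separated layout are all assumptions made for illustration, and it only pulls the identifier and title attributes from the root element.

```powershell
# Hypothetical helper: writes one tab-separated line per item file
# (file name, identifier, title) to a text file.
function Export-ItemSummary {
    param(
        [string]$Folder,    # folder holding the item XML files
        [string]$OutFile    # text file to create (name and location are up to you)
    )

    Get-ChildItem $Folder -Filter '*.xml' -File |
        Where-Object { $_.Name -ne 'imsmanifest.xml' } |   # skip the manifest
        ForEach-Object {
            [xml]$xml = Get-Content $_.FullName -Raw
            $node = $xml.assessmentItem
            # Build the output line for this file
            "{0}`t{1}`t{2}" -f $_.Name, $node.identifier, $node.title
        } |
        Set-Content -Path $OutFile
}

# Example call (paths are placeholders):
# Export-ItemSummary -Folder 'Z:\LocalFiles\test' -OutFile 'Z:\LocalFiles\items.txt'
```

Nothing is written back to the XML files here, so this should be safe to try on the live data. Other nodes could presumably be added to the format string in the same way.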

musickmann (Data Analyst, author) commented:
I actually do have one follow-up:
There is one file in each folder that I want to exclude, named imsmanifest.xml. I tried adding an -Exclude option, and I also tried changing the filter to -Include and adding -Exclude, but the output is just the script, with no action taken.

It isn't critical, as this file's structure doesn't have the same tags, so it more than likely won't match, but I want to be doubly sure.

musickmann (Data Analyst, author) commented:
Thanks so much, this will be super helpful, and I'm excited about maybe poking around some PowerShell - double win!

Bill Prew commented:
You could do the following.  -Filter is faster than -Include / -Exclude, so often preferred for simple selections.

Get-ChildItem $folder -Include "*.xml" -Exclude "imsmanifest.xml" |

»bp

musickmann (Data Analyst, author) commented:
The Include/Exclude combo results in no files being processed, so I wanted to share how I resolved it in case anyone else finds this question helpful.

The version of line 6 of my script (the Get-ChildItem line) that works is:
Get-ChildItem "$folder\*" -File -Exclude "imsmanifest.xml" |

I wasn't clear with the entirety of the directory structure at the beginning of this question as I was just focused on the immediate need, but since the solution was great, it was easy to go back and tweak down to be a little more specific.

The folder structure is:
Package folder
--passages folder
--images folder
--audio folder
--stylesheets folder
--items.xml (hundreds of files)
--imsmanifest.xml (one file)

I wanted to alter only the items. Using just the -Exclude option to remove the imsmanifest file, the script still tried to process the folders and presented errors, which didn't hurt anything, but threw off my counts. I expect to see 300 at the end of each package, but was getting anything from 301-303, which made me pause to figure out what happened and review the console.

With the above solution, only the item files are being processed, so my counts should be 300 unless something truly unexpected happened. That lets me streamline my process: run the script and move forward without double-checking each package.

I also added another count check by storing the file list in a variable:
$list = Get-ChildItem "$folder\*" -File -Exclude "imsmanifest.xml"
At the end, in my little report section, I added $list.count as Total Items in Package. This way, if a package has a different number of items, I'll know without thinking something went wrong.
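
Put together, the approach described above might look something like the sketch below. It is not the exact script from this thread, and several things in it are assumptions: the function name Invoke-IdentifierTitleSwap is made up, a Where-Object name check is swapped in for -Exclude, XmlDocument.Load is swapped in for Get-Content, and files without an assessmentItem root are skipped rather than reported as errors.

```powershell
# Hypothetical wrapper around the swap, reporting package size and swap count.
function Invoke-IdentifierTitleSwap {
    param([string]$Folder)   # one package folder

    # Files in the package root only, minus the manifest; subfolders are ignored by -File.
    $list = @(Get-ChildItem $Folder -Filter '*.xml' -File |
              Where-Object { $_.Name -ne 'imsmanifest.xml' })

    $swapped = 0
    foreach ($file in $list) {
        # XmlDocument.Load reads the file using the encoding declared in the XML itself.
        $xml = New-Object System.Xml.XmlDocument
        $xml.Load($file.FullName)

        $node = $xml.assessmentItem
        if ($null -eq $node) { continue }   # not an item file; leave it untouched

        $saveTitle = $node.title
        $node.title = $node.identifier
        $node.identifier = $saveTitle
        $xml.Save($file.FullName)           # overwrites the file in place
        $swapped++
    }

    Write-Host "Total Items in Package:" $list.Count
    Write-Host "Total Items Swapped:" $swapped

    # Also return the counts as an object so they can be logged or compared.
    [pscustomobject]@{ InPackage = $list.Count; Swapped = $swapped }
}
```

If the two numbers differ for a package, that would point to a file in the root that is not an item, which is the kind of surprise the extra count is meant to catch.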

Thanks again for a great solution and a little primer on PowerShell, Bill!

Bill Prew commented:
I think you could also do:

Get-ChildItem $folder\*.xml -Exclude "imsmanifest.xml" |

»bp

Question has a verified solution.
