XML

20K

Solutions

13K

Contributors

Extensible Markup Language (XML) refers to the encoding of documents such that they can be read by both machines and humans. XML documents use tags to show the beginning and end of a set of data. XML is used extensively on websites to show volumes of data, and is the default for a number of office productivity suites. This topic includes discussions of XML-related technologies, such as XQuery (the XML Query language), XPath (the XML Path language), XSLT (eXtensible Stylesheet Language Transformations), XLink (the XML Linking language) and XPointer (the XML Pointer language).


Create a Windows 10 custom Image with custom task bar and custom start menu using XML for deployment.
1
Many times as a report developer I've been asked to display normalized data such as three rows with values Jack, Joe, and Bob as a single comma-separated string such as 'Jack, Joe, Bob', and vice versa.  Here's how to do it. 
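The article's own technique is not shown in this excerpt. As a rough illustration of the round trip only, here is a minimal Python sketch; the function names `rows_to_string` and `string_to_rows` are hypothetical, and this is not the author's method.

```python
def rows_to_string(values, separator=", "):
    """Collapse one value per row into a single delimited string."""
    return separator.join(values)


def string_to_rows(text, separator=","):
    """Split a delimited string back into one value per row.

    Surrounding whitespace is trimmed and empty entries are dropped.
    """
    return [part.strip() for part in text.split(separator) if part.strip()]


names = ["Jack", "Joe", "Bob"]
combined = rows_to_string(names)
print(combined)
print(string_to_rows(combined))
```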
6
 
LVL 66

Author Comment

by:Jim Horn
Done.  This is a republication from My Website which is allowed with the NoIndex checkbox checked per recent rules change.
0
Example PowerPoint Add-In
I was working on a PowerPoint add-in the other day and a client asked me "can you implement a feature which processes a chart when it's pasted into a slide from another deck?". It got me wondering how to hook into built-in ribbon events in Office.
5
 
LVL 16

Expert Comment

by:Kyle Santos
Great job!
0
 
LVL 12

Author Comment

by:Jamie Garroch
Thanks Kyle Santos :-)
0
PaperPort is a popular document imaging/management product from Nuance Communications, previously known as ScanSoft. PaperPort is in widespread use by both individuals and businesses.

The current version of PaperPort is 14. The previous version was 12. Yes, Nuance got superstitious and skipped 13. Both of these most recent versions come in two editions, Professional and Standard, although the Nuance folks do not call it Standard – they simply leave Professional off the name, i.e., PaperPort 12 and PaperPort Professional 12; PaperPort 14 and PaperPort Professional 14. In this article, I refer to them as PP-Std and PP-Pro, and all such references are valid for versions 12 and 14.

There are numerous differences between PP-Std and PP-Pro. The comparison matrices may be seen in the Files section at this PaperPort wiki in these files:

Comparison Matrix of PP12 Standard and PP12 Professional.pdf
Comparison Matrix of PP14 Standard and PP14 Professional.pdf

As shown in the documents above, one of the differences between PP-Std and PP-Pro is that the former allows only five Scanning Profiles to be created, while the latter allows an unlimited number. However, it turns out that PP-Std will properly handle an unlimited number of Scanning Profiles. The problem is that it won't let you create them. This is easy to overcome by creating the file containing the Scanning Profiles outside of PP-Std. This article describes two ways to do it.

3
 

Expert Comment

by:mapline
Hi Joe
Great suggestion. Two comments:
1. My PP 14.5 Std stores the file in C:\ProgramData\Nuance\PaperPort\14\Profiles.xml (Windows 10).
2. Notepad++ is a great free app for viewing/editing XML files.
Many thanks
0
 
LVL 54

Author Comment

by:Joe Winograd, EE MVE 2015&2016
Hi Michael,
Sorry I'm just replying to your 25-Mar-2016 comment now. I don't recollect seeing it when it first came in and only just now saw it when I received a notification that you endorsed the article today — btw, thanks for that!

> My PP 14.5 std stores file in C:\ProgramData\Nuance\PaperPort\14\Profiles.xml (Windows 10)

You will also find it at C:\Users\All Users\Nuance\PaperPort\14\Profiles.xml in W10. That's because C:\Users\All Users\ points to C:\ProgramData\. In other words, C:\ProgramData\ is the "real" folder and C:\Users\All Users\ is simply a pointer to it — technically known as a junction or symbolic link. So if you look at C:\ProgramData\ and C:\Users\All Users\ in your file manager, they'll show the identical contents, because they are one-and-the-same folder.

> Notepad++ great free app for viewing/editing xml files.

I have Notepad++ installed and agree that it is a great free app, although I use it only for test purposes, since I do all of my text editing with my fav text editor that I've been using forever. But thanks for the tip to our readers! Regards, Joe
0
The Confluence of Individual Knowledge and the Collective Intelligence
At this writing (summer 2013) the term API has made its way into the popular lexicon of the English language.  A few years ago, the term would only be recognized by oil companies and a few geeky programmers.  But today, the term gives relevance and meaning to the "rise of the machines."  The explosion of online storage and computing power has given us a host of new applications that perform highly valuable, highly specialized functions, and that enable direct machine-to-machine communication.  The output from these applications can be used by other applications to deliver rich internet application experiences that are customized and personalized.  The information from these functions can underpin business decisions in advertising and marketing, in shipping and transportation, in medical diagnosis, and many other data-intensive endeavors.

In the context of our discussion here, when we refer to "API" we mean "web API" -- specifically the collection and dissemination of information via HTTP protocols.  These APIs let servers talk to each other in ways that build powerful online applications with relatively little effort for the developers.

Governments and companies like Google, Yahoo, Weather.com, MapQuest, UnderTone Networks, and many others gather, analyze, store, collate and publish detailed information …
6
 

Expert Comment

by:APD_Toronto
Good Explanation!
0
 

Browsing the questions asked of the Experts on this forum, you will be amazed to see how often people struggle with monster regular expressions (regex) to select the specific part of an HTML or XML file they want to extract. The examples in this article are coded in PHP.

Even when a regex seems to work for their case, they realize that it is very hard to write a regex that is safe in all cases. For example, if you try to match the beginning of a paragraph, which is one of the simplest things you could imagine, you will try to match "<p>", right?

Well, in that case, you are going to miss <P>, like in the first versions of HTML. I can already hear the answer of the regex fans: You only have to adapt your pattern to match "<[pP]>".

What happens when the author of the HTML uses the freedom to add spaces between "<p" and the closing ">", as in "<p >"? The regex is not defeated yet: you can rescue it with the pattern "<[pP] *>".

And what if I also want to match <p some-attribute="some-string"> along with <p> without attributes? No problem: the regex becomes "<[pP][^>]*>". Pretty simple, no? But it doesn't work when one of the attribute strings contains a ">" character, or when the opening < and the closing > are on different lines, and that's really the point: XML and HTML files are not organised line by line, which is what regexes are written for.
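The cases listed above (upper-case tags, stray spaces, attributes containing ">", tags split across lines) are the ones a real parser handles for free. The article's examples are in PHP; the sketch below uses Python's standard `html.parser` module instead, purely as an illustration. The class name `ParagraphCollector` and the function `extract_paragraphs` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of every <p> element, however the tag is written."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_paragraph = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # The parser reports tag names in lower case, so <P> and <p> look alike.
        if tag == "p":
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            self.paragraphs.append("".join(self._buffer).strip())
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)


def extract_paragraphs(html):
    """Return the text content of each paragraph in the given HTML string."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    return collector.paragraphs


sample = '<P>one</P><p >two</p><p title="a>b">three</p><p\n class="x">four</p>'
print(extract_paragraphs(sample))
```

The parser, not the pattern, decides where a tag starts and ends, so none of the special cases needs its own workaround.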

It is better to stop here: I hope you start to …
0

Introduction

In my previous article I showed you how the XML Source component can be used to load XML files into a SQL Server database, using fairly simple XML structures.  In this follow-up article I will demonstrate how to tackle the complex XML issue.
 

The Complex XML Example

You probably know that SSRS reports, RDLs, are actually XML files.  And they’re not the easiest types of XML files around.  To humans they are still readable but the structure can be quite complex.  So there we’ve got our example: an RDL.  More specifically I’ll be using the RDL that’s available for download in one of my earlier articles.
 

The Goal

Every good example has got a goal.  Our goal today is to retrieve a list of datasets and fields as defined in the RDL.  Shouldn’t be too difficult, right?
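To make the goal concrete outside of SSIS, here is a minimal Python sketch that lists dataset and field names from an RDL-like document. It is not the article's approach (which uses the XML Source component). The sample XML is a made-up, heavily simplified fragment with the namespaces removed, and the names `SAMPLE_RDL` and `list_dataset_fields` are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified RDL fragment with namespaces stripped.
SAMPLE_RDL = """<Report>
  <DataSets>
    <DataSet Name="Sales">
      <Fields>
        <Field Name="OrderID"><DataField>OrderID</DataField></Field>
        <Field Name="Amount"><DataField>Amount</DataField></Field>
      </Fields>
    </DataSet>
    <DataSet Name="Customers">
      <Fields>
        <Field Name="CustomerName"><DataField>CustomerName</DataField></Field>
      </Fields>
    </DataSet>
  </DataSets>
</Report>"""


def list_dataset_fields(rdl_text):
    """Return (dataset name, field name) pairs in document order."""
    root = ET.fromstring(rdl_text)
    pairs = []
    for dataset in root.findall("./DataSets/DataSet"):
        for field in dataset.findall("./Fields/Field"):
            pairs.append((dataset.get("Name"), field.get("Name")))
    return pairs


for dataset_name, field_name in list_dataset_fields(SAMPLE_RDL):
    print(dataset_name, field_name)
```

A real RDL declares namespaces on its root element, so the paths would need namespace-qualified names unless the references are stripped first.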
 

Using The XML Source Component

Let’s try to get this done through the XML Source component with which we’re very familiar by now.  You know the drill: drag an XML Source into your Data Flow, open it up and configure the XML and XSD locations.

Note: to be able to do this I cheated a bit by manually manipulating the RDL a little.  More precisely I removed all the namespace references from the <report> tag and further down the XML (removed “rd:”).

With both files configured, let’s have a look at the Columns page:

 The XML Source component handling a really complex XML file
7
 

Expert Comment

by:Giuseppe Serra
Hi. How can I fetch an XML file from a remote URL that also requires login credentials? I have no clue how to set up the package.
0
 

The Client Need Led Us to RSS
I recently had an investment company ask me how they might notify their constituents about their newsworthy publications.  Probably you would think "Facebook" or "Twitter" but this is an interesting client.  Their constituents are mostly bankers and builders.  The average demographic was male, 50 years old, using a desktop computer, and occasionally a Blackberry.  Not exactly the Twitter crowd.  We considered using broadcast email, but the client wanted something more automatic and less intrusive.  And since attachments may not make it to a Blackberry something other than email seemed to be needed.

When the investment company released a publication, they made a PDF for print and they were willing to use FTP to place a copy of the PDF on their web server.  But that was the extent of their interest in the process; they wanted automation to handle the rest.

So we arrived at the idea of an RSS feed.  It was automated, and just low-tech enough that everyone could understand it.

RSS Feeds
An RSS feed is simply a specialized subset of XML.  It carries only a few bits of information, such as a title, description and link.  RSS is very lightweight and easy to use.  A competing standard, Atom, is quite similar.  We chose RSS because everyone at the meeting had heard of it, perhaps because it also forms the basis for podcasting.
More information on RSS is available here:
http://cyber.law.harvard.edu/rss/rss.html
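To show how little an RSS feed needs, here is a minimal Python sketch that builds an RSS 2.0 document with a channel and items carrying only a title, link and description. It is an illustration under assumptions, not the code used for the client: the function name `build_rss`, the example.com URLs and the publication titles are all made up.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, channel_description, items):
    """Build a minimal RSS 2.0 feed and return it as a string.

    `items` is a list of dicts with "title", "link" and "description" keys.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description

    for item in items:
        node = ET.SubElement(channel, "item")
        for field in ("title", "link", "description"):
            ET.SubElement(node, field).text = item[field]

    return ET.tostring(rss, encoding="unicode")


# Hypothetical publication list; in practice this might come from the
# PDFs found in an upload directory.
feed = build_rss(
    "Example Investment Publications",
    "http://www.example.com/",
    "Newly released publications",
    [
        {
            "title": "Quarterly Outlook",
            "link": "http://www.example.com/pubs/outlook.pdf",
            "description": "Our latest quarterly outlook (PDF).",
        }
    ],
)
print(feed)
```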
And …
5

The Problem

How do you write an XQuery that works like a SQL outer join, providing placeholders for absent data on the outer side?  I give a bit more background at the end.

The situation expressed as relational data
Let’s work through this.  I’ve mocked up some data in Access, as I can’t share the original data with you.

We have some data in tblMain, and some related data in tblSub.  We’re going to write a query with an outer join so that we can see everything in tblMain and related data in tblSub.

(An inner join would act as a constraint and limit what was displayed from tblMain to only those records with a match in tblSub.)

Here are some screenshots to illustrate.

[Figures: tblMain and tblSub datasheets]
MainID is the Primary Key in tblMain and the Foreign Key in tblSub.

The SQL Solution
Here's the SQL that will produce the outer join that I'm after:
 
SELECT 
    tblMain.MainID, 
    tblMain.MainText, 
    tblSub.SubID, 
    tblSub.Sub
FROM 
    tblMain 
LEFT OUTER JOIN 
    tblSub 
ON 
    tblMain.MainID = tblSub.MainID;



The result of this SQL is shown in the next figure.  As you can see, and this will be no surprise to you folks who work with databases, the datasheet has empty cells where there is no matching data from tblSub.
[Figure: SQL output]
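Since the screenshots are not reproduced here, the behaviour can be checked with a small Python sketch that runs the same LEFT OUTER JOIN against an in-memory SQLite database. The table and column names come from the SQL above; the row values and the function name `left_outer_join_rows` are made up for illustration, and SQLite stands in for Access.

```python
import sqlite3


def left_outer_join_rows():
    """Run the outer join on mock data and return the result rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tblMain (MainID INTEGER PRIMARY KEY, MainText TEXT);
        CREATE TABLE tblSub (SubID INTEGER PRIMARY KEY, MainID INTEGER, Sub TEXT);
        INSERT INTO tblMain VALUES (1, 'First'), (2, 'Second'), (3, 'Third');
        INSERT INTO tblSub VALUES (10, 1, 'A'), (11, 1, 'B'), (12, 3, 'C');
        """
    )
    rows = conn.execute(
        """
        SELECT tblMain.MainID, tblMain.MainText, tblSub.SubID, tblSub.Sub
        FROM tblMain
        LEFT OUTER JOIN tblSub ON tblMain.MainID = tblSub.MainID
        ORDER BY tblMain.MainID, tblSub.SubID
        """
    ).fetchall()
    conn.close()
    return rows


for row in left_outer_join_rows():
    # Rows from tblMain with no match in tblSub carry None in the tblSub columns.
    print(row)
```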
Now let's contrast this with the XML case.

Expressed as XML
The same data, once exported to XML, is as follows.
tblMain:
<?xml 


0
 
LVL 8

Author Comment

by:Andrew_Webster
I posted this to the x-query.com mailing list, and so far I've had two really great responses.

First up, from David Carlisle is this:
 
I don't think you need to build the matched and unmatched cases separately and then sort them back. Also, sorting on $sorted/ID will be a string sort, so (on Saxon at least) 10 comes before 2 (I think it depends on your system's default collation).

I'd just do something like

xquery version "1.0";
<results>
{
  for $main in doc("tblMain.xml")/dataroot/tblMain
  let $subs :=  doc("tblSub.xml")/dataroot/tblSub[MainID = $main/MainID]
   return
   if (exists($subs))
   then
       for $sub in $subs
       return
       <row>
       <ID>{data($main/MainID)}</ID>
       <Main>{data($main/MainText)}</Main>
       <Sub>{data($sub/Sub)}</Sub>
       </row>
   else
   <row>
       <ID>{data($main/MainID)}</ID>
       <Main>{data($main/MainText)}</Main>
       <Sub>-</Sub>
   </row>
}
</results>



Second up, from Martin Probst, is this simple solution:
 
Both of these illustrate some great lessons in XQuery.  For me, they are a lesson in how I think in SQL, and can visualize SQL solutions, but still really struggle to do so with XQuery.  However, I've been developing with SQL for fifteen years and with XQuery for three months, so maybe that's not a surprise.

I've got to write queries based on what I've learned here that will operate on XML data representing millions of rows of relational data.  I'll try all three approaches and check them for performance, then I'll post the results here.  
0
 
LVL 8

Author Comment

by:Andrew_Webster
Here's the code for Martin's solution - it didn't make it to the last comment!
<results>
{
  for $main in doc("C:\Xquery Outer Joins\tblMain.xml")/dataroot/tblMain
  let $subs := doc("C:\Xquery Outer Joins\tblSub.xml")/dataroot/tblSub[MainID = $main/MainID]/Sub
  (: fall back to a '-' placeholder when there is no matching tblSub row :)
  let $actual := if ($subs) then $subs else '-'
  for $sub in $actual
  return
    <row>
      <MainID>{data($main/MainID)}</MainID>
      <MainText>{data($main/MainText)}</MainText>
      <Sub>{data($sub)}</Sub>
    </row>
}
</results>
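For readers who think more easily in a general-purpose language, the same outer-join logic can be sketched in Python with the standard `xml.etree.ElementTree` module. This is only an illustration, not a replacement for the XQuery above: the element names follow the queries, while the sample values and the function name `outer_join` are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like the exported tables.
MAIN_XML = """<dataroot>
  <tblMain><MainID>1</MainID><MainText>First</MainText></tblMain>
  <tblMain><MainID>2</MainID><MainText>Second</MainText></tblMain>
</dataroot>"""

SUB_XML = """<dataroot>
  <tblSub><SubID>10</SubID><MainID>1</MainID><Sub>A</Sub></tblSub>
  <tblSub><SubID>11</SubID><MainID>1</MainID><Sub>B</Sub></tblSub>
</dataroot>"""


def outer_join(main_xml, sub_xml, placeholder="-"):
    """Return (MainID, MainText, Sub) rows, one per match or one placeholder row."""
    main_root = ET.fromstring(main_xml)
    sub_root = ET.fromstring(sub_xml)
    rows = []
    for main in main_root.findall("tblMain"):
        main_id = main.findtext("MainID")
        subs = [
            sub.findtext("Sub")
            for sub in sub_root.findall("tblSub")
            if sub.findtext("MainID") == main_id
        ]
        # An empty match list is replaced by a single placeholder value,
        # mirroring the "if ($subs) then $subs else '-'" step.
        for sub_value in subs or [placeholder]:
            rows.append((main_id, main.findtext("MainText"), sub_value))
    return rows


for row in outer_join(MAIN_XML, SUB_XML):
    print(row)
```

This version rescans every tblSub element for each tblMain row, so for millions of rows an index (for example a dict keyed on MainID) would be the first thing to try.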


0
