?
Solved

Excel VBA to Extract XBRL Data

Posted on 2014-08-13
12
Medium Priority
?
1,592 Views
Last Modified: 2014-08-17
Hi, I have downloaded programmatically from SEC (https://www.sec.gov/Archives/edgar/data/1800/000110465914056950/0001104659-14-056950-index.htm) a set of XBRL documents in zipped format  for a specific entity ( CIK Number is 0000001800    ) to my local drive.
The zip file is 0000001800_0001104659-14-056950-xbrl.zip and contains 6 xml documents. One of these files is named: abt-20140630.xml.

I would like to pull 4 values from the said xml to Excel using VBA ( without the need to unzipped the zip file).
i.e. Sales: 5551000000, CostOfGoodsSold: 2506000000, Start Date: 2014-01-01 and End Date:2014-06-30
The reference sources are:

1. <us-gaap:SalesRevenueNet id="Tag334" decimals="-6" contextRef="D2014Q2" unitRef="USD">5551000000</us-gaap:SalesRevenueNet>
2. <us-gaap:CostOfGoodsSold id="Tag437" decimals="-6" contextRef="D2014Q2" unitRef="USD">2506000000</us-gaap:CostOfGoodsSold>

3. <xbrli:period>
      <xbrli:startDate>2014-01-01</xbrli:startDate>
      <xbrli:endDate>2014-06-30</xbrli:endDate>
    </xbrli:period>


Specifications:
1. MS Excel 2010
2. Windows 8

Would appreciate your help with the VBA code to achieve this.

Thank you.
0000001800-0001104659-14-056950-xbrl.zip
0
Comment
Question by:Russellbrown
  • 6
  • 6
12 Comments
 
LVL 46

Expert Comment

by:aikimark
ID: 40262108
VBA can be used to extract one or more files from a zip file like this:
Dim oShell As Object

Set oShell = CreateObject("Shell.Application")
oShell.Namespace("C:\users\RussellBrown\Downloads").CopyHere oApp.Namespace("C:\users\RussellBrown\Downloads\0000001800_0001104659-14-056950-xbrl.zip").items.Item("abt-20140630.xml")

Open in new window


Once extracted, you would use a DOM object or a regular expression object to extract the data from the abt-20140630.xml file

This is a regular expression pattern to get that data
<xbrli:startDate>([^<]*)</xbrli:startDate>(?:.|\n)*?<xbrli:endDate>([^<]*)</xbrli:endDate>(?:.|\n)*?<us-gaap:SalesRevenueNet id="Tag334"[^>]*>(\d*)</us-gaap:SalesRevenueNet>(?:.|\n)*?<us-gaap:CostOfGoodsSold id="Tag437"[^>]*>(\d*)</us-gaap:CostOfGoodsSold>

Open in new window

Which produces the following

SubMatch 0: 2014-01-01
SubMatch 1: 2014-06-30
SubMatch 2: 5551000000
SubMatch 3: 2506000000

I'm not sure of some of the details.  For instance,
1. Since you have multiple SalesRevenueNet and CostOfGoodsSold items, do we differentiate them by just their id values or by other/more data?
2. Since there are multiple xbrli:startDate and xbrli:endDate items, which pair should be used?
0
 

Author Comment

by:Russellbrown
ID: 40262253
Hi, thank you very much for the reply.
I am able to get the first part working i.e.

Sub ExtractFromZip()
Dim oShell As Object

Set oShell = CreateObject("Shell.Application")
oShell.Namespace("C:\ZTestSEC\TestEE").CopyHere oShell.Namespace("C:\ZTestSEC\0000001800_0001104659-14-056950-xbrl.zip").items.Item("abt-20140630.xml")

End Sub

How would I add the additional code for the second part i.e.
'<xbrli:startDate>([^<]*)</xbrli:startDate>(?:.|\n)*?<xbrli:endDate>([^<]*)</xbrli:endDate>(?:.|\n)*?<us-gaap:SalesRevenueNet id="Tag334"[^>]*>(\d*)</us-gaap:SalesRevenueNet>(?:.|\n)*?<us-gaap:CostOfGoodsSold id="Tag437"[^>]*>(\d*)</us-gaap:CostOfGoodsSold>

Sorry, I am not familiar with the DOM object or a regular expression object. Can you please assist by providing the vba code lines?

Will revert on the 2 questions that you have kindly raised  after getting the code done,  in order to keep it simple.

Thanks.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40262276
Please answer my questions relative to the actual XML file.
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 

Author Comment

by:Russellbrown
ID: 40262307
Hi,
1. for SalesRevenueNet and CostOfGoodsSold  items- we will use the contextRef, and in this case it is ="D2014Q2" representing the 2nd quarter of Year 2014. Later on we can modify it into a variable.
2. For the start and end dates - the first instance of the <xbrli:period></xbrli:period> if it is possible.
0
 
LVL 46

Accepted Solution

by:
aikimark earned 2000 total points
ID: 40262402
Here is a version of VBA code that displays the parsed data in the immediate window
Option Explicit

Sub Q_28497353()
    Dim oRE As Object
    Dim oMatches As Object
    Dim oM As Object
    Dim oSM As Object
    Dim lngSM As Long
    Dim strXML As String
    Dim oShell As Object
    Const cPath As String = "C:\users\RussellBrown\Downloads"
    Const cFile As String = "abt-20140630.xml"
    Const cZipfile As String = "0000001800-0001104659-14-056950-xbrl.zip"
    
    If Len(Dir(cPath & "\" & cFile)) = 0 Then
        If Len(Dir(cPath & "\" & cZipfile)) = 0 Then
            MsgBox "Zip file does not exist", vbCritical
            Exit Sub
        Else
            Set oShell = CreateObject("Shell.Application")
            oShell.Namespace(cPath).CopyHere oShell.Namespace(cPath & "\" & cZipfile).items.Item(cFile)
        End If
    End If
    If Len(Dir(cPath & "\" & cFile)) = 0 Then
        MsgBox "Unable to extract " & cFile & " from zip file", vbCritical
        Exit Sub
    End If
    
    Open cPath & "\" & cFile For Input As #1
    strXML = Input(LOF(1), #1)
    Close #1
    
    Set oRE = CreateObject("vbscript.regexp")
    oRE.Global = True
    oRE.Pattern = "<xbrli:startDate>([^<]*)</xbrli:startDate>(?:.|\n)*?<xbrli:endDate>([^<]*)</xbrli:endDate>(?:.|\n)*?<us-gaap:SalesRevenueNet .*?contextRef=""D2014Q2""[^>]*>(\d*)</us-gaap:SalesRevenueNet>(?:.|\n)*?<us-gaap:CostOfGoodsSold .*?contextRef=""D2014Q2""[^>]*>(\d*)</us-gaap:CostOfGoodsSold>"

    If oRE.test(strXML) Then
        Set oMatches = oRE.Execute(strXML)
        For Each oM In oMatches
           With oM
               For lngSM = 0 To .submatches.Count - 1
                   Debug.Print "submatch(" & lngSM & ")", .submatches(lngSM)
               Next
           End With
        Next
    Else
        MsgBox "XML data did not match regular expression pattern", vbExclamation
    End If

End Sub

Open in new window

0
 

Author Comment

by:Russellbrown
ID: 40262573
It works perfectly. Thank you very much.  I noticed the patterns i.e.

1)([^<]*)

2) (?:.|\n)*?

3)  .*?contextRef=""D2014Q2""[^>]*>(\d*)    

I visited http://msdn.microsoft.com/en-us/library/ms974570.aspx to have a look at the syntax and expressions. Hope this is the correct reference site.
Can you please kick-start me on how to read the symbols so I can apply it accordingly to other scenarios?
 
Thanks.
0
 

Author Comment

by:Russellbrown
ID: 40262710
I experimented with the codes by adding 2 more items: Sales and COGS for D2014Q2YTD i.e.
a = "<xbrli:startDate>([^<]*)</xbrli:startDate>(?:.|\n)*?"
b = "<xbrli:endDate>([^<]*)</xbrli:endDate>(?:.|\n)*?"
c = "<us-gaap:SalesRevenueNet .*?contextRef=""D2014Q2""[^>]*>(\d*)</us-gaap:SalesRevenueNet>(?:.|\n)*?"
d = "<us-gaap:CostOfGoodsSold .*?contextRef=""D2014Q2""[^>]*>(\d*)</us-gaap:CostOfGoodsSold>(?:.|\n)*?"
e = "<us-gaap:SalesRevenueNet .*?contextRef=""D2014Q2YTD""[^>]*>(\d*)</us-gaap:SalesRevenueNet>(?:.|\n)*?"
f = "<us-gaap:CostOfGoodsSold .*?contextRef=""D2014Q2YTD""[^>]*>(\d*)</us-gaap:CostOfGoodsSold>"
   
I get the following results:
1)  oRE.Pattern = a & b & c & d works ( 4 items )
2)  oRE.Pattern = a & b & e & f works  ( 4 items )
3)  oRE.Pattern = a & b & c & d & e & f not working ( 4 + 2 more items )

Would appreciate it very much if you could point out where I have do it wrong. Thanks.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40262924
Here is a good article by Patrick Matthews that you should read.
http:A_1336-Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html

1)([^<]*)
Capture all characters that are not a less than character and place them into a capture group.

2) (?:.|\n)*?
Ignore all characters, but don't be greedy when matching.  Although this is in parentheses, the leading ?: characters indicate that this group should not be captured.

3)  .*?contextRef=""D2014Q2""[^>]*>(\d*)
Ignore all characters, but don't be greedy when matching.  Then match contextRef="D2014Q2" string followed by any characters that aren't a > character, terminating with a > character.  Then capture zero or more digits.  Since this pattern is a VB string, actual quote mark characters are represented by double-quote character sequences.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40262932
The pattern should resemble the order of the items in the file.
a & b & c & e & d & f

It is possible to go after the data one or two patterns at a time.
0
 

Author Comment

by:Russellbrown
ID: 40265748
Thank you so much for the excellent replies. I will go through the references and try to work with some examples. Before I sign off and happily accept  the solutions to my question(s), would you be familiar with the SEC XBRL? If so, I will post it as a new question i.e.

Either using Python or Excel VBA:
1. obtain a list of Instance URLs for 10-K filings e.g. http://www.sec.gov/Archives/edgar/data/1800/000110465909048013/abt-20090630.xml
a) use the monthly RSS feed e.g. http://www.sec.gov/Archives/edgar/monthly/xbrlrss-2014-08.xml  to obtain the company name, CIK, Accession Number/Acc-no: and Period of Report. Add a loop to iterate through the years and months.
b) output the list with details to either Excel / Pyhton.

Many thanks and regards.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40266003
If you have a follow-up question, then close this thread and include a link to this thread in your new question.  After posting the new question, add a comment in this thread letting the experts (and future readers) know about the new question.
0
 

Author Closing Comment

by:Russellbrown
ID: 40266009
Excellent. The replies were clear and concise and addressed my question(s) completely.
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Do you use a spreadsheet like Microsoft's Excel?  Have you ever wanted to link out to a non excel file on your computer or network drive?  This is the way I found to do it!
If you need to forecast numbers -- typically for finance -- the Windows and Mac versions of Excel 2016 have a basket of tools to get the job done.
Graphs within dashboards are meant to be dynamic, representing data from a period of time that will change each time the dashboard is updated with new data. Rather than update each graph to point to a different set within a static set of data, t…
This Micro Tutorial will demonstrate how to use longer labels with horizontal bar charts instead of the vertical column chart.

839 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question