Need help with reading line by line a large .nc file converted .txt using Microsoft Access DAO Recordset.

stephenlecomptejr used Ask the Experts™
Need help with reading line by line a large .nc file converted .txt using Microsoft Access DAO Recordset.

I'm providing a sample database as to my attempts to read a linked txt file that I basically renamed from a .nc data file.
If you take a look - there is a module:  modEExchange with a sub:  PullNonValidTagLineItemsOnly that I'm trying to get working correctly.  
To test you open the immediate window and enter:  Call PullNonValidTagLineItemsOnly("0001130310-19-000016", 7)  per eeexample2.png.

Basically I've identified some tags within this file that don't seem to find the ending </us-gaap and now I'm taking that and trying to go to that exact line no within that text file and put in a separate table until I come across either that ending </us-gaap or  - the next 10 line items within that text file at least

I understand doing this can be tricky cause text files have limitations when you are looking at this line by line and the string limitations as to width.  I made sure though the data file is long text for the Field1 and have linked the text file with the advanced link specifications in image:  eeexample1.png.

Per eexample2.png - the first tag to come up is CommitmentsAndContingencies on line number:  19175 within the text file..based on the sSQL looping through the DAO.REcordset ...  once I grab the first tag I loop through the linked table:  0001130310-19-000016 and am now trying to go to that exact line no - to start pulling that tag - into table:  usgaapfileswtags_SUB as a sample of that data.

But as I go through using DAO.Recordset:  rsTxt.... it reads line by line by stops at line 84, per image:  eexample2.png shows.
I don't know why it doesn't go to line 19175 instead.
Manually - In eexample2.png you can see I can go in the linked table:   - select 19175 - and go to that specific record line item - and get that tag no problem
In eexample3.png you see notepad ++ goes to line no 19175 - and there is the same information.

But here using DAO - if you notice in eexample2 - it stops on line no 84 and refuses to loop anymore.

So my question is - how do I make the dao recordset jump to that specific line no. within the linked text file and grab that record?

Here is the code:

Public Sub PullNonValidTagLineItemsOnly(sTextFile As String, lIDFileName As Long)

  Dim sSQL As String
  Dim rs As DAO.Recordset
  Dim sSQLTxt As String
  Dim sLineItem As String
  Dim rsTxt As DAO.Recordset
  Dim m As Long
  Dim sTagName As String
  Dim lLineNo As Long
  Dim lDetailNo As Long
  Dim lOrgFileswTags As Long
  Dim sTagFull As String
  Dim lLimit As Long
  Dim lTimes As Long

  sSQL = "SELECT usgaaptags.tagname, usgaapfileswtags.usgaaplineno, usgaapfileswtags.usgaapvalid, usgaapfileswtags.usgaapfilesid, usgaapfileswtags.usgaapdetailid, usgaapfileswtags.usgaaprow"
  sSQL = sSQL & " FROM usgaaptags INNER JOIN usgaapfileswtags ON usgaaptags.tagid = usgaapfileswtags.tagid"
  sSQL = sSQL & " WHERE [usgaapfileswtags].[usgaapvalid] = False And [usgaapfileswtags].[usgaapfilesid] = " & lIDFileName
  Set rs = CurrentDb.OpenRecordset(sSQL)
  Do Until rs.EOF
    sTagName = rs.Fields(0)
    lLineNo = Nz(rs.Fields(1), 0)
    lOrgFileswTags = rs.Fields(3)
    lDetailNo = rs.Fields(4)
    sTagFull = Nz(rs.Fields(5), "")
    'now loop through and grab those from here...
    'find the lineno and do this to go in for 10 tries at the most...
    'or warn couldn't find it..
    m = 0
    sSQLTxt = "SELECT * FROM [" & sTextFile & "]"
    Set rsTxt = CurrentDb.OpenRecordset(sSQL)
    Do Until rsTxt.EOF
      m = m + 1
      'If m = 22783 Then Stop
      'sLineItem = Nz(rsTxt.Fields(0), "")
      If m = lLineNo Then
        sLineItem = Nz(rsTxt.Fields(0), "")
        sSQL = "INSERT INTO [usgaapfileswtags_SUB] (usgaapdetailid, usgaaplineno, usgaaprowsub)"
        sSQL = sSQL & " VALUES (" & lDetailNo & ", " & m & ", '" & sLineItem & "')"
        CurrentDb.Execute sSQL
        If sLineItem Like "*</us-gaap:*" Or lTimes > lLimit Then
          Exit Do
        End If
      End If
    Debug.Print m
    Set rsTxt = Nothing
    lLimit = 10
    lTimes = 0
  Set rs = Nothing

End Sub

Open in new window

Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Top Expert 2014

It is stopping on a Stop statement, which is expected behavior.

I don't think there's a way to implement this with your text format using either DAO or ADODB.  

* convert part of your input into XML and use XPath to do your navigation and lookups
* preprocess the input text, populating a dictionary with the key values. The value of the each dictionary item is its line number, or position in the file, or actual text of the line.
* find an ODBC driver for that file format. I'd look at ProgressData or CData to start with.
Most Valuable Expert 2015
Distinguished Expert 2018

First, it is not clear to me, what info you are trying to look up.
Second, as this should be a one-time scan of the file, I think it would be way simpler to use the FileSystemObject and the textreader rather than linking the file.



First, trying to grab all tags and put in a database.  It does that but when the tag is no longer ending on one line - I'm trying to identify that information also and put in a different table.

" the tag is no longer ending on one line"  what I mean I'm looking for this:

Open in new window

vs a tag just doing this:

	<us-gaap:CommercialPaper contextRef="FI2018Q4_us-gaap_DebtInstrumentAxis_us-gaap_CommercialPaperMember" decimals="0" id="Fact-962AE4C245AD51B698FBD7F8B20B0E07" unitRef="usd">0</us-gaap:CommercialPaper>

Open in new window

Like there is a unique identifier for CommercialPaper tag.   And then that unique identifier - I want to show those 5 line items for it when a person clicks on that.

Second, I'm beginning to agree with you on that.



I really, really appreciate your remarks as it's helping me better figure this out.
I just never did this before and still digesting what you are saying.  I wish there was a video that could guide me through the steps so I can attempt this.
I'm only used to MS Access VBA at this point.
Acronis in Gartner 2019 MQ for datacenter backup

It is an honor to be featured in Gartner 2019 Magic Quadrant for Datacenter Backup and Recovery Solutions. Gartner’s MQ sets a high standard and earning a place on their grid is a great affirmation that Acronis is delivering on our mission to protect all data, apps, and systems.

Top Expert 2014

Which tags are you trying to get from this input file?

Gustav's comment about using FSO is the way I would bring the data into the program.


What I need help is understanding the difference between the way the tags are displayed above.

CommercialPaper is the same tag but is one the value assigned and the other the definition?
Top Expert 2014

I don't know the data enough to help you in that regard.

Although you've indicated that some tags extend across multiple lines, I don't see that in the data file you shared.


Here are the two files I was talking about.
Please do a search for: <us-gaap:CommercialPaper within using Notepad++
Top Expert 2014

John TsioumprisSoftware & Systems Engineer

Your solution starts with a wrong assumption ...that the tags will only expand to 10 lines...
This is no way to do it...the correct way is by counting the "<" and ">"
For example
< tag 'Here you have "<" incremented to 1
Some Info
Some Other Info
End> ' Here you have ">" so you decrease to 0...tag closed...
You can also scan for cases of "<" and "/>" for nested data
Top Expert 2014

Interesting.  Some of the us-gaap tags aren't matched with /us-gaap tags.


Yea aikimark.  Thank you.  That's my challenge to provide for arrangements to catch both those scenarios.

And the 10 lines is just a way for me to get this working, I know it's not only 10 lines. John.
Top Expert 2014
Ok. One step closer.  I first solved this in a Powershell environment (see below).  There are two regex patterns that are needed.

$path = 'C:\Users\New User\Downloads\'
#$path = 'C:\Users\New User\Downloads\0001130310-19-000016.txt'
$content = gc -path $path -raw
$rgxPatterns = '<us-gaap:[^>]+?">[^<]*<\/us-gaap:[^>]+>', '<us-gaap:[^>]+?/>'

$HashTbl = [hashtable]@{}
foreach ($pattern in $rgxPatterns){
    $rgx = [regex]$pattern   
    $filtered = $rgx.Matches($content)
    $filtered | 
      % {$null = $_.groups[0].Value -match 'id="([^`"]+)"'
         $key = $Matches[1]
         $HashTbl.Add($key, $_.value)

Open in new window


Do you want to take a crack at this?
Basically, you use a filesystemobject textstream to .readall() the contents of the file into a variable.  Then invoke the .Execute() method of a vbscript.regexp object for each of the patterns I listed above.  Then iterate through the matches, populating a dictionary or collection object.

Note: there is a third pattern I use for parsing the id value of the tag.

Despite the size of the file, this performed reasonably well.


Really appreciate the replies.   They have been helpful.
aikimark, the coding has really helped tremendously.

Thank you sincerely!

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial