Link to home
Start Free TrialLog in
Avatar of MorDeRor
MorDeRor

asked on

reading files programmatically

Hello,

I have to go through the text of a bunch of stored procedures looking for other objects that they reference. For example: the text that follows the word "exec" is a name of a stored procedure, one after words like "insert", "delete" refer to a table.

can some one give me an idea what is the easiest way to accomplish the task of going throgh all these files and retrieving the necessary information?

thanks
Avatar of SunBow
SunBow
Flag of United States of America image

Learn what is real about parsing and syntax.

Roughly, take file in as text one character at a time (preferably buffered)
Make use of 'nice' variables such as token
Given the syntax, create some subroutine(s) like 'GetToken' to make that process redundant
With luck you can create the necessary variable from the beginning letter to the next time non-letters are used.
It may be useful to define a character as a delimiter that supports that function, such as whitespace, paren, or punctuation.

It also helps to cheat, having inside knowledge about the file contents.
As such, with limited understanding of goal, I might start by making a printout of every line containing the token 'exec', then figure out what further coding may be useful there. The same goes for other desireable 'tokens' such as "insert"
Avatar of MorDeRor
MorDeRor

ASKER

Thanks, I am not sure I understand everything that you wrote, but I can at least look some of it up.

as far as the logic of parsing, it would be to locate an instance of let's say "exec" and take in whatever characters are there until the next blank.

with other key words, it's a bit more involved. Here parser has to find the key word - let's say "delete", and evaluate the next word (i.e. group of alphanumeric characters enclosed by blank spaces). If it is one of the "secondary key words" - in this case "from", then the target is the next word after that.

does this make sense?
Avatar of Patrick Matthews
Please make a list of all the rules you need to be followed.  Be as explicit as you can.  I believe a
Regular Expressions approach will work to get almost all of them, if you can state your rules
clearly enough.

It would also be helpful if you could provide a sample file to www.ee-stuff.com.

Patrick
thanks matthew,

the original text is in a SQL server table. I can put it in a file and parse the file, but if it's not too much trouble I would love to skip this step.

here are the parsing  rules:
we are looking for particular key words. These are:
1. exec
2. insert
3. update
4. delete

"word" is defined as a group of alphanumeric characters surrounded by blank spaces.

for each key word:

1. target word follows
2. if the next word is "into" then target word is after that, else target word follows the key word
3. target word follows
4. if the next word is "from" then target word is after that, else target word follows the key word

all words must be stripped of square brackets ( [] ) if present

I hope this is clear.

here is sample text:

create procedure [sp_MSins_IBI_Incident_First_Forecast] @c1 int,@c2 datetime,@c3 datetime

AS
BEGIN

insert into [IBI_Incident_First_Forecast](
[iIncidentId], [dtCloseDate], [dtInsertDate]
 )

values (
@c1, @c2, @c3
 )


UPDATE       IBI_Top_Incident
SET       Orig_CloseDate       = FF.dtCloseDate
FROM       IBI_Top_Incident I, IBI_Incident_First_Forecast FF
WHERE   I.iIncidentId = @c1
AND       I.iIncidentId       = FF.iIncidentId

UPDATE       IBI_salesIncident
SET       Forecastentrydate = ih.dtInsertDate
FROM    IBI_Incident_First_Forecast Ih, IBI_salesIncident I
WHERE   I.iIncidentId = @c1
AND     I.Iincidentid = Ih.Iincidentid

END


Function RegExpFind(LookIn As String, PatternStr As String, Optional Pos, _
    Optional MatchCase As Boolean = True)

    ' This function uses Regular Expressions to parse a string (LookIn), and return matches to a
    ' pattern (PatternStr).  Use Pos to indicate which match you want:
    ' Pos omitted               : function returns a zero-based array of all matches
    ' Pos = 0                   : the last match
    ' Pos = 1                   : the first match
    ' Pos = 2                   : the second match
    ' Pos = <positive integer>  : the Nth match
    ' If Pos is greater than the number of matches, is negative, or is non-numeric, the function
    ' returns an empty string.  If no match is found, the function returns an empty string
   
    ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
    ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
   
    ' If you use this function in Excel, you can use range references for any of the arguments.
    ' If you use this in Excel and return the full array, make sure to set up the formula as an
    ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
   
    Dim RegX As Object
    Dim TheMatches As Object
    Dim Answer() As String
    Dim Counter As Long
   
    ' Evaluate Pos.  If it is there, it must be numeric and converted to Long
    If Not IsMissing(Pos) Then
        If Not IsNumeric(Pos) Then
            RegExpFind = ""
            Exit Function
        Else
            Pos = CLng(Pos)
        End If
    End If
   
    ' Create instance of RegExp object
    Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = True
        .IgnoreCase = Not MatchCase
    End With
       
    ' Test to see if there are any matches
    If RegX.test(LookIn) Then
       
        ' Run RegExp to get the matches, which are returned as a zero-based collection
        Set TheMatches = RegX.Execute(LookIn)
       
        ' If Pos is missing, user wants array of all matches.  Build it and assign it as the
        ' function's return value
        If IsMissing(Pos) Then
            ReDim Answer(0 To TheMatches.Count - 1) As String
            For Counter = 0 To UBound(Answer)
                Answer(Counter) = TheMatches(Counter)
            Next
            RegExpFind = Answer
       
        ' User wanted the Nth match (or last match, if Pos = 0).  Get the Nth value, if possible
        Else
            Select Case Pos
                Case 0                          ' Last match
                    RegExpFind = TheMatches(TheMatches.Count - 1)
                Case 1 To TheMatches.Count      ' Nth match
                    RegExpFind = TheMatches(Pos - 1)
                Case Else                       ' Invalid item number
                    RegExpFind = ""
            End Select
        End If
   
    ' If there are no matches, return empty string
    Else
        RegExpFind = ""
    End If
   
    ' Release object variables
    Set RegX = Nothing
    Set TheMatches = Nothing
   
End Function

Function Parse()

    Dim fso As Object
    Dim ts As Object
    Dim WholeFile As String
    Dim arr As Variant
    Dim Counter As Long
   
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile("c:\Documents and Settings\matthewsp\Desktop\Blah.txt")
   
    WholeFile = ts.ReadAll
    ts.Close
    Set ts = Nothing
    Set fso = Nothing
   
    arr = RegExpFind(WholeFile, "(exec|insert( into){0,1}|update|delete( from){0,1}) +\[{0,1}[a-z0-9_]+\]{0,1}", , False)
   
    For Counter = 0 To UBound(arr)
        arr(Counter) = Replace(arr(Counter), "exec", "", 1, 1, vbTextCompare)
        arr(Counter) = Replace(arr(Counter), "insert", "", 1, 1, vbTextCompare)
        arr(Counter) = Replace(arr(Counter), "into", "", 1, 1, vbTextCompare)
        arr(Counter) = Replace(arr(Counter), "from", "", 1, 1, vbTextCompare)
        arr(Counter) = Replace(arr(Counter), "delete", "", 1, 1, vbTextCompare)
        arr(Counter) = Replace(arr(Counter), "update", "", 1, 1, vbTextCompare)
        arr(Counter) = Replace(arr(Counter), "[", "", 1, 1, vbTextCompare)
        arr(Counter) = Replace(arr(Counter), "]", "", 1, 1, vbTextCompare)
        arr(Counter) = Trim(arr(Counter))
        Debug.Print arr(Counter)
    Next
   
    Parse = arr
   
End Function





The function Parse returns a zero-based array with the matches.

Patrick
thanks patric, I am not sure yet how it works, but it seems to be a big step in the right direction. I just need to figure out a way to ignore comments, and to return a 2D array so that I would know what action is performed - e.g. update, insert, or exec.
> I just need to figure out a way to ignore comments, and to return a 2D array so that I would know
> what action is performed - e.g. update, insert, or exec.

I wish you had mentioned that earlier!  I can do that, maybe in a few hours.

Patrick
sorry, did not see that until I ran some tests with the code you sent me. The text is T-SQL code, so comments can be either line-by-line, or block comments. line by line comment is identified by "--" at the beginning of the respective line, block comments are contained between "/*" and "*/" symbols.
ASKER CERTIFIED SOLUTION
Avatar of Patrick Matthews
Patrick Matthews
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
it's so beautiful I could cry!
You're welcome :)
OK, a slight problem with the replace function. it removes everything between first "/*" and last "*/". which as you can imagine is not the best thing to do because there can be multiple /* */ comment blocks. any suggestions? If this should be a new question please let me know.
I'm not seeing that behavior.  Please post sample text that is exhibiting this behavior.

Patrick
OK, here is a test file:

 CREATE PROC IBISp_Repl_Ins_Company_IBI_Company_SalesRep_Contact  
@iCompanyId int /* to identify records in the Company table */  
 
AS  
 
/* create a table variable to hold all the processing */  
declare @insert TABLE  
(  
 [icompanyid] [int] NOT NULL ,  
 [parent_flag] [char] (1) NULL,  
 [dtinsertdate] [datetime] NULL ,  
 [vchcompanyname] [varchar] (255) NULL ,  
 [UP_companyname] [varchar] (255) NULL ,  
 [company_revenue] [varchar] (255) NULL ,  
 [company_type] [char] (20) NULL,  
 [company_subtype] [char] (20) NULL,  
 [top_parentid] [int] NULL ,  
 [parent_revenue] [varchar] (255) NULL ,  
 [parent_name] [varchar] (255) NULL ,  
 [UP_parent_name] [varchar] (255) NULL ,  
 [parent_company_type] [char] (20) NULL ,  
 [parent_branch_name] [char] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,  
 [branch_name] [char] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,  
 [region_name] [char] (25) NULL,  
 [Zbranch] [int] NULL,  
        [Zregion] [int] NULL,  
 [countrystatus] [char] (1) NULL,  
 [salesrep_contact_ID] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS  NULL,  
 [salesrep_contact] [char] (30) NULL,  
 [visit_flag] [char] (1) NULL ,  
 [Billion_Revenue_Flag] [char] (1) NULL ,  
        [mf_rescue_flag] [char] (1) NULL,  
 [individual_count] [int] NULL  
)  
 
 
 
/* perform the initial load of data from Company  */  
INSERT @insert  
(  
icompanyid,  
dtinsertdate,  
vchcompanyname,    
company_type,  
company_subtype  
)  
SELECT  
icompanyid,  
dtinsertdate,  
vchcompanyname,  
iCompanyTypeCode,  
iCompanySubTypeCode  
FROM company c  
WHERE  
c.iCompanyId = @iCompanyId  
AND c.tirecordstatus =1  
AND c.istatusid = 328  
 
IF @@ERROR<>0 GOTO ERROR  
 
 
-- SET branch from company record, then if there is a sales rep assigned it gets SET to sales reps branch  
UPDATE @insert  
SET  Branch_Name = b.branch_name,  
 Region_Name = b.region_name,  
 CountryStatus = b.countrystatus,  
 Zbranch = b.iparameterid_branch,  
        Zregion = b.iparameterid_region  
FROM @insert A, ibi_decode_branch_region b  
WHERE a.iCompanyId = @iCompanyId  
AND a.branch_name = b.branch_name  
 
UPDATE @insert  
SET company_revenue = rp.vchparameterdesc  
FROM @insert A, company c, referenceparameters rp  
WHERE a.iCompanyId = @iCompanyId  
AND a.icompanyid = c.icompanyid  
AND c.vchuser3 = rp.iparameterid  
 
UPDATE @insert  
SET company_revenue = 'Unknown'  
WHERE iCompanyId = @iCompanyId  
AND (company_revenue is NULL) or (company_revenue = '0')  
 
-- icontacttypeid 7 is a Sales Rep  
UPDATE @insert  
SET  salesrep_contact_ID = b.chuserid,  
     salesrep_contact = u.chusername  
FROM @insert a, contactinternal b, users u  
WHERE a.iCompanyId = @iCompanyId  
AND a.icompanyid = b.iownerid  
AND b.chuserid = u.chuserid  
AND b.tirecordstatus = 1  
AND u.tirecordstatus = 1  
AND b.icontacttypeid in (7)  
 
-- hd04422 6-26-2003 If company does not have internal contact, then get internal contact from parent company  
UPDATE @insert  
SET  salesrep_contact_ID = b.chuserid,  
     salesrep_contact = u.chusername  
FROM @insert a, contactinternal b, users u  
WHERE a.iCompanyId = @iCompanyId  
AND a.top_parentid = b.iownerid  
AND b.chuserid = u.chuserid  
AND b.tirecordstatus = 1  
AND u.tirecordstatus = 1  
AND b.icontacttypeid in (7)  
AND a.salesrep_contact_ID is NULL  
 
-- SET branch based on sales rep location instead of taking branch on company record  
-- hd04422 11-14-2003 per Dave Small  
UPDATE @insert  
SET branch_name = b.branch_name,  
    region_name = b.region_name,  
    countrystatus = b.countrystatus,  
    zBranch = b.iParameterid_Branch,  
    zRegion = b.iParameterid_Region  
FROM @insert a, users u, ibi_decode_branch_region b  
WHERE a.iCompanyId = @iCompanyId  
AND a.salesrep_contact_ID = u.chuserid  
AND u.chgroupid = b.branch_code  
AND a.salesrep_contact_ID <> 'Unassigned'  
 
UPDATE @insert  
SET  top_parentid = b.highest_parent,  
     parent_revenue = b.parent_revenue,  
     parent_name = b.vchcompanyname  
FROM @insert a, ibiv_company_hierarchy3 b  
WHERE a.iCompanyId = @iCompanyId  
AND a.icompanyid = b.icompanyid  
 
-- hd04422 11-14-2003  
-- SET branch from company record, then if there is a sales rep assigned it gets SET to sales reps branch  
UPDATE @insert  
SET  parent_branch_name = b.branch_name  
FROM @insert a, company c, ibi_decode_branch_region b  
WHERE a.iCompanyId = @iCompanyId  
AND a.top_parentid = c.icompanyid  
AND c.vchuser1 = b.iparameterid_branch  
 
UPDATE @insert    
SET  parent_branch_name = d.branch_name  
FROM @insert a, company c, contactinternal b, users u, ibi_decode_branch_region d  
WHERE a.iCompanyId = @iCompanyId  
AND a.top_parentid = c.icompanyid  
AND c.icompanyid = b.iownerid  
AND b.chuserid = u.chuserid  
AND b.tirecordstatus = 1  
AND u.tirecordstatus = 1  
AND b.icontacttypeid in (7)  
AND u.chgroupid = d.branch_code  
 
-- if parent is in branch that is not same as company branch, SET parent to company ID  
UPDATE @insert  
SET  top_parentid = icompanyid ,  
     parent_revenue = company_revenue ,  
     parent_name = vchcompanyname  
FROM @insert  
WHERE iCompanyId = @iCompanyId  
AND branch_name <> parent_branch_name  
 
UPDATE @insert  
SET  visit_flag = 'Y'  
FROM @insert A, IBI_Top_SalesTask V  
WHERE a.iCompanyId = @iCompanyId  
AND a.icompanyid = v.icompanyid and  
      v.iIncidentTypeId in (102314, 102460, 102459, 101942)  
 
UPDATE @insert  
SET parent_flag = 'Y'  
WHERE iCompanyId = @iCompanyId  
AND top_parentid is not NULL  
 
CREATE TABLE [dbo].[temp_parent2]  
(  
 [icompanyid] [int] NOT NULL  
)  
 
INSERT temp_parent2  
SELECT distinct top_parentid  
FROM IBI_Company_SalesRep_Contact  
WHERE top_parentid = @iCompanyId  
AND top_parentid is not NULL  
 
UPDATE @insert  
SET parent_flag = 'Y'  
FROM @insert A , temp_parent2 T, IBI_Company_SalesRep_Contact C  
WHERE a.iCompanyId = @iCompanyId  
AND a.iCompanyId = t.iCompanyid  
AND t.icompanyid = c.icompanyid  
AND c.top_parentid  = c.icompanyid  
 
DROP TABLE temp_parent2  
 
 
UPDATE @insert  
SET mf_rescue_flag = 'Y'  
FROM @insert A, company C  
WHERE a.iCompanyId = @iCompanyId  
AND A.icompanyid = C.icompanyid  
AND c.vchuser2 = '101871'  
 
UPDATE @insert  
SET mf_rescue_flag = 'N'  
FROM  @insert A, company C  
WHERE a.iCompanyId = @iCompanyId  
AND A.icompanyid = C.icompanyid  
AND c.vchuser2 is NULL  
 
UPDATE @insert  
SET mf_rescue_flag = 'N'  
FROM  @insert A, company C  
WHERE a.iCompanyId = @iCompanyId  
AND A.icompanyid = C.icompanyid  
AND c.vchuser2 != '101871'  
 
 
UPDATE  @insert  
SET  Billion_Revenue_Flag = 'Y'  
FROM   @insert A  
WHERE   iCompanyId = @iCompanyId  
AND Company_Revenue = 'Over $1 Billion'  
 
UPDATE  @insert  
SET  Billion_Revenue_Flag = 'N'  
FROM  @insert  
WHERE   iCompanyId = @iCompanyId  
AND  Billion_Revenue_Flag is NULL  
 
UPDATE @insert  
SET UP_companyname = UPPER(vchcompanyname),  
    UP_parent_name = UPPER(parent_name)  
 
UPDATE @insert  
SET parent_company_type = b.icompanytypecode  
FROM  @insert A, Company B  
WHERE a.iCompanyId = @iCompanyId  
AND a.top_parentid = b.icompanyid  
 
UPDATE @insert  
SET individual_count = (SELECT count(i.iindividualid)  
 from individual i  
        WHERE a.iCompanyId = @iCompanyId AND a.icompanyid = i.icompanyid  
 and i.tirecordstatus = 1  
 and i.istatusid = 328)  
FROM  @insert A  
 
 
IF @@ERROR<>0 GOTO ERROR  
 
/* now do the real update */  
BEGIN TRANSACTION  
 
insert IBI_Company_SalesRep_Contact  
(  
 [icompanyid] ,  
 [parent_flag] ,  
 [dtinsertdate] ,  
 [vchcompanyname]  ,  
 [UP_companyname]  ,  
 [company_revenue]  ,  
 [company_type],  
 [company_subtype] ,  
 [top_parentid]  ,  
 [parent_revenue]  ,  
 [parent_name]  ,  
 [UP_parent_name]  ,  
 [parent_company_type]  ,  
 [parent_branch_name] ,  
 [branch_name]  ,  
 [region_name]  ,  
 [Zbranch] ,  
        [Zregion] ,  
 [countrystatus] ,  
 [salesrep_contact_ID] ,  
 [salesrep_contact] ,  
 [visit_flag] ,  
 [Billion_Revenue_Flag] ,  
        [mf_rescue_flag] ,  
 [individual_count]  
)  
select  
 [icompanyid] ,  
 [parent_flag] ,  
 [dtinsertdate] ,  
 [vchcompanyname]  ,  
 [UP_companyname]  ,  
 [company_revenue]  ,  
 [company_type],  
 [company_subtype] ,  
 [top_parentid]  ,  
 [parent_revenue]  ,  
 [parent_name]  ,  
 [UP_parent_name]  ,  
 [parent_company_type]  ,  
 [parent_branch_name] ,  
 [branch_name]  ,  
 [region_name]  ,  
 [Zbranch]  ,  
        [Zregion] ,  
 [countrystatus] ,  
 [salesrep_contact_ID] ,  
 [salesrep_contact] ,  
 [visit_flag] ,  
 [Billion_Revenue_Flag] ,  
        [mf_rescue_flag] ,  
 [individual_count]  
 
FROM @insert  
/* This also was commented out on 9/05/06: WHERE iCompanyId=@iCompanyId */  
/* This code was commented out on 8/9/06: and salesrep_contact_ID is NOT NULL */  
 
IF @@ERROR<>0  
BEGIN  
ROLLBACK TRANSACTION  
GOTO ERROR  
END  
 
COMMIT TRANSACTION  
RETURN  
 
ERROR:  
PRINT 'ERROR HAS OCCURED!'  
RETURN -101  
 
I solved the original problem by removing the /W switch from one of the calls to RegExpReplace so now it looks like:

    WholeFile = RegExpReplace(WholeFile, "/\*[\w\n]*\*/", " ", True, False)

this gives me the text I am looking for. Now the other problem I ran into is that search string  "insert" will match "@insert" I tried to add [^@] to the search conditions, and that excludes the unwanted results, but also grabs the extra character in from of the word "insert" ...
OK, I solved the unwanted "@" character appearance for now by combination of:

arr = RegExpFind(WholeFile, "[^@](exec|insert|update|delete) +\[{0,1}\w+\]{0,1}", , False)
and
arr(Counter) = Mid(arr(Counter), 2)
hopefully the last update. I don't know why, but I had to put \W parameter back in. this together with the last comment I wrote finally provides the desired result! thanks
Part of the problem is that you created a variable, @insert, whose name was similar enough to the key word INSERT
that my Regular Expression was not making a fine enough distinction.  Again, if you had been more clear and complete
up front about your needs, it would have saved time for both of us.

I'll take another look.

Patrick
don't get me wrong, I wasn't compaining. It is already working the way that I did it (see above). Just from the point of view of keeping the code clean, I would have loved it to be able to do something like the whole word matching that is possible from Office search dialog boxes.
OK, I misunderstood.  Glad you got it working :)