Need help with document linecounts

Hey guys,

We have an existing system done in C#.net using SQL Server 2005 as database.  We have no way of updating that existing app.

Database has a blob col.  Its a word document file.  Now in the database there's a linecount col wherein the computation is

Character with Space / 65 = Linecount.

Now there's a new demand called VBC = Visible Black Character wherein spaces are not included only those characters typed in meaning Character without spaces.

VBC count is not included in the database.  And I'm creating an app in VB6 that will need that VBC.

How can i use the existing database and get VBC to each document without making the app slow?

Any input guys is greatly appreciated.

TIA.
m3mdiclAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Brook BraswellApplication Development ManagerCommented:
dim sTmpDoc as string
dim VBCount as single
sTmpDoc = trim(replace(BlobData," ",""))
VBCount = len(stmpdoc)

0
GrahamSkanRetiredCommented:
I don't understand exactly what you need, but you should know that a Word document file does not have a line count figure. Any such number is assigned when the application is opened and pagination is done with respect to the application's current printer driver.
0
Kevin CrossChief Technology OfficerCommented:
How are you getting the {characters with space} count now? Are you using LEN()?
If so, you can use the same approach but alter by adding an inner REPLACE() function.
LEN(REPLACE(your_column, ' ', '')) will tell you {characters w/o space} count.
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

Kevin CrossChief Technology OfficerCommented:
Guess I type way to slow. Two comments now. Sorry fellows. For what it is worth, mine is for SQL Server; however, you can see the syntax/functions are very similar to what you would use in VB. I did not believe trimming string for spaces was necessary after replacing all the spaces with empty string, but again that is based on SQL server where non-space white space is not removed. I forget if TRIM() In VB acts differently. Anyway, that would be the only difference in code. If you use trimming method in SQL, you will have to use left and right functions: LTRIM() and RTRIM(); there is no single TRIM() for T-SQL.
0
m3mdiclAuthor Commented:
It's getting it via word object I guess.  I don't know how C# works but its like this

Character with Space / 65 = Linecount.

Just to remind you guys that I have no control over the existing app.  Only I can edit/change the database.
0
Brook BraswellApplication Development ManagerCommented:
If you do not have control of the VB app, then are you wanting to replace the linecount column of the database with the modified calculation without spaces or create a new column for the different value ?
if you use a new column then likely the app would have to be changed anyway.
if you put a change to the existing column then you would need to have it updated on a regular basis when the value changes...

Suggest:
An Update Trigger to set the LineCount column with the Length(Replace(BlobColumn,' ',''))/65
0
Kevin CrossChief Technology OfficerCommented:
Okay. Hmm. With a BLOB, the REPLACE() and LEN() approach may not be suitable, plus if you cannot change application I am not sure if you can successfully impact this number unless you have a column in SELECT query that is sending the count. If the application is reading the characters from the BLOB, I am not sure you want to remove all spaces as that would leave you will one massive string of characters.
0
Kevin CrossChief Technology OfficerCommented:
@Brook1966: I did not see your last comment, btw. The comment on REPLACE/LEN not being suitable was in response to my own suggestion to use those and not yours. However, do note that it is LEN() for SQL Server.
0
Brook BraswellApplication Development ManagerCommented:
I am not sure you can do all of this from solely within the database as the Blob may contain the word doc in binary format and not as easily readable as straight text....
instead of "Mary had a little lamb" you might get "ÐÏࡱá                >  þÿ                     4          6      þÿÿÿ    3"
0
m3mdiclAuthor Commented:
Because I will be running queries all over the database and computing the correct VBC linecount for it, which the VBC linecount column doesn't exist.

Any recommendations on what do to?  Do i need to create another database for that?  Do i need to just hardcode it in VB6 make a function to convert the linecount to VBC linecount something like

Public Function ConvertTOVBC(Byval LCount as double) as double
End Function

Open in new window


I worried about this coz it will cause so much overhead. Any ideas?  If only the database has this VBC Col then there's no problem with my current VB App.
0
Brook BraswellApplication Development ManagerCommented:
@mwvisa1:
it is kewl - not looking to steal a cookie but help a client.  You can have the points if it helps him :)
In some versions of SQL it is LENGTH rather then LEN ( Sybase SQL Anywhere )
** Well noted that the client is using SQL Server 2005
0
m3mdiclAuthor Commented:
Existing APP in C#.Net not in VB6.  I cannot update the existing app because I dunno how to use C#.

Instead I created a kinda replica with limited functionality using VB6 and use the existing database.  
This app will be distributed to our clients for viewing purpose only as we cannot give them access to our current C#.Net app.

Now in the database there's no VBC Column, which our client prefers.

How do i do it?  The existing app is LIVE it's been running 24/7 for the past 3 years i think.
0
Kevin CrossChief Technology OfficerCommented:
Hmm. Two things:
1. Do you feel comfortable with VB.NET or just VB6?
2. Is your own knowledge of C# the only barrier to changing original app? i.e., is that acceptable to the original vendor who wrote the application?

First the reason I asked the latter is that if the C# application is calculating the characters with spaces, then it is likely already converting the data from binary to character data. You could simply add supplementary code to that process to calculate the character total less spaces.

With that said, there is also the possibility of translating the specific C# code to VB.NET. It may be more efficient than what you will have to do with ADO. Now, I have not tried with a large Word document, but SQL 2005 introduced some higher limits on VARCHAR with VARCHAR(MAX), so may be able to do this easily with what you have. Just checking what all your options are, as you may be able to have a Moderator add VB and/or C# .NET topic areas also.
0
Brook BraswellApplication Development ManagerCommented:
I do not think this will be a very good method for you since it seems you need the entire database updated for your client....but here is the populated function
Public Function ConvertTOVBC(Byval LCount as double) as double
                dim CN as new adodb.Connection
                DIM RS as new adodb.recordset
                dim sBLOB as string
                dim sCount as single
                dim SQL as string
                CN.OPEN "YOUR CONNECTION STRING"
                ' SET YOUR sBLOB to a converted string from the blob column
                SQL = "SELECT BLOB FROM TABLE WHERE CONDITION"
                RS.OPEN SQL,CN,adopenstatic, adreadonly
                sBlob = RS!BLOB ' CONVERT THIS TO NORMAL STRING
                sCount = LEN(REPLACE(sBLOB," ",""))
                ConvertToVBC = sCount
                
End Function

Open in new window

0
m3mdiclAuthor Commented:
@mwvisa1
1.) I'm comfortable with VB6.
2.) The problem is we don't have the source code from the developer.  But I will take that into account.
0
Kevin CrossChief Technology OfficerCommented:
No worries. It looks like Brook1966 is handling helping you with VB6 code.
0
m3mdiclAuthor Commented:
Yes but that's my last resort. I'm looking for second opinion.

I'm thinking is it possible in SQL Server 2005 to create a script that will update the existing database?

I mean the database is live is it possible to trigger a script when a new data has been added?  
I'm planning on inserting a VBC column on the existing database.

Then update all the rows with the corresponding VBC count per row.
Is this possible with SQL Server alone?
0
m3mdiclAuthor Commented:
Since I have no control over the frontend, I'll just use the existing database to automatically generate the correct VBC value on the VBC column using a query.

What do you think guys?
0
Kevin CrossChief Technology OfficerCommented:
You can do that as indicated earlier using a trigger. You can use an AFTER INSERT, UPDATE trigger. That brings up a thought: how is INSERT and UPDATE handled? Is it all done in the C# or is it through Stored Procedures? I will have to do some testing on the BLOB data manipulation before commenting on that further in T-SQL.
0
m3mdiclAuthor Commented:
Through Stored Procedure.
0
Kevin CrossChief Technology OfficerCommented:
What is that code?
0
m3mdiclAuthor Commented:
Ok to make things more simple, I created another table to put the VBC data.

Can someone help me with a script to export the ID and VBC to the other table.

So for example Table A has columns ID and  DocumentBLOB

What i need to do is compute the correct VBC linecount by opening the BLOB and then putting it to the table i created. Create two columns like ID and VBC on Table B.
0
Kevin CrossChief Technology OfficerCommented:
Try this:
SELECT ID
     , LEN(REPLACE(CONVERT(NVARCHAR(MAX), DocumentBLOB), ' ', '')) AS VBC
FROM TableA
;

Open in new window


If DocumentBLOB is not already VARBINARY(MAX), then try like this:
SELECT ID
     , LEN(REPLACE(CONVERT(NVARCHAR(MAX), CONVERT(VARBINARY(MAX), DocumentBLOB)), ' ', '')) AS VBC
FROM TableA
;

Open in new window


Not sure it will work, but have tested on simplistic case:
DECLARE @DocumentBLOB VARBINARY(MAX)
SET @DocumentBLOB = CONVERT(VARBINARY(MAX), 'Some text with spaces in it!');
SELECT LEN(REPLACE(CONVERT(VARCHAR(MAX), @DocumentBLOB), ' ', ''));

Open in new window

Final result is 23 which is the 28-character string less the five spaces between each word.

Whichever works for you, just put INSERT INTO VBCTable(ID, VBC) in front of the SELECT ...
0
m3mdiclAuthor Commented:
Thanks for the reply.  I will try that and let you know.
0
m3mdiclAuthor Commented:
Hi I tried it out but it seems like it don't match with the VBC count.

Here are the actual results randomized.

ID -------------------  VBC---------------------- Expected VBC from word ( character with no spaces)
91074-------------193443----------------------3628
91118-------------537291----------------------8488
91128-------------73530-----------------------1930
91150-------------89867-----------------------2790

Here is my SQL Query

INSERT INTO VBCTable(DocumentID, VBC)
SELECT DocumentID, LEN(REPLACE(CONVERT(NVARCHAR(MAX), RawFile), ' ', '')) AS VBC
FROM dbo.DocumentBinary

Open in new window

0
Kevin CrossChief Technology OfficerCommented:
You probably just have to divide by words per line or something.
0
m3mdiclAuthor Commented:
There must be something wrong because i'm computing for characters with no space.
I think it is synonymous to Length of string right?

The results are troublesome it doesn't make any sense.  Where do you think is the problem?
0
Kevin CrossChief Technology OfficerCommented:
Not sure as I have no way of telling which value is correct. i.e., I am not sure what is being stored in the first place. We are replacing spaces, but remember that does not include carriage return and line feeds. Word's count of characters with no spaces may not be taking characters such as that into account. SQL will for sure. With the gap being so vast, I suspect there is also a disconnect in how the expected VBC is being calculated OR as you said there is a serious disconnect in what is being stored.
0
m3mdiclAuthor Commented:
Maybe we should replace linefeeds and carriage return as well. Can we do that?  Can we remove those?
0
Kevin CrossChief Technology OfficerCommented:
You can try and see if that is a significant part of the difference or not.

-- INSERT INTO VBCTable(DocumentID, VBC)
SELECT DocumentID
     , LEN(REPLACE(REPLACE(REPLACE(CONVERT(NVARCHAR(MAX), RawFile), ' ', ''), CHAR(13), ''), CHAR(10), '')) AS VBC
FROM dbo.DocumentBinary
;

Open in new window


Typically, the carriage return plus line feed combination is found together, but just in case I replaced each individually above. CHAR(13) is carriage return character; CHAR(10), line feed.

Regards,
Kevin
0
m3mdiclAuthor Commented:
Thanks Kevin let me try that and give you updates.
0
m3mdiclAuthor Commented:
Hi, it is still not matching even with the linefeeds removed.

ID -------------------  VBC---------------------- Expected VBC from word ( character with no spaces)
91074-------------193429----------------------3628
91118-------------537261----------------------8488
91128-------------73527-----------------------1930
91150-------------89865-----------------------2790

Any ideas why the margin is so big?
0
Kevin CrossChief Technology OfficerCommented:
Not sure without knowing what the content of the field is. It could be binary representation of the Word file, including all its formatting and headers, i.e., the document itself or it could be the text contents as a long string. It just depends on what the C# code is doing before inserting it to the database.

I guess a question is what does the following yield:
SELECT CONVERT(NVARCHAR(1930), RawFile) AS FileContent
FROM dbo.DocumentBinary
/* not sure if DocumentID is INT or not, 
   so quotes may not be necessary. */
WHERE DocumentID = '91128'
;

Open in new window


Does this look like the contents of the Word document you are comparing to?
0
m3mdiclAuthor Commented:
let me try that sir thanks.
0
m3mdiclAuthor Commented:
This is the output in SQL Server Mgt Studio Express.

¿¿¿¿¿¿42
0
m3mdiclAuthor Commented:
That inverted question mark should be square like ascii codes.
0
Kevin CrossChief Technology OfficerCommented:
Hmm. That means the data is not converting nicely to VARCHAR, so that is why the length is not matching up. I will be offline several times over the next few days, so please feel free to ask a Moderator to ask for additional Experts. Maybe they can catch something I am missing, so it may not be a bad idea anyway.
0
GrahamSkanRetiredCommented:
If you want to analyse a Word document, you will have to use the Word application and the Word object model. Attempts to parse the file directly will not be able to distinguish between the document text and 'control' information (pointers, formatting information, styles, macros etc.).

Note too that the file storage format is radically different between 2003 and earlier versions, vis-a-vis 2007 and later. Actually, if it is a 2007 or later document, it is theoretically possible to Unzip the file and to analyse the XML in the result. If you want to try that then, I wish you the best of British luck. You'll need it.

Finally, bear in mind the difficulty in defining a 'line' in a Word document as I mentioned in my earlier comment:
#37040369

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
m3mdiclAuthor Commented:
Hi GrahamSkan

So you are saying that we can't do this in SQL Server alone.  I need to hardcode everything in VB.

So what will i do is something like this.

-Open BLOB in Word Object
-Get the character with no space
-Place the value on TableB

How can i do this since the database is adding up everyday and it's running 24/7.  I need to be able to update those values at Table B automatically.

TIA.
0
GrahamSkanRetiredCommented:
I think that you would have to create a Word application

Then with each the BLOB
    save as a file,
    open it in Word,
    get and store the count,
    close the file without saving

As a start, see if you can save the BLOB as a document file and open it (manually) with Word.
0
m3mdiclAuthor Commented:
Can you help me with the snippet i forgot how to use ado.stream to open the binary in word. Thanks.
0
GrahamSkanRetiredCommented:
I'm not too familiar myself, but try this:
Set stm = New ADODB.Stream
With stm
    .Open
    .Type = adTypeBinary
    .Write rs.Fields("MyBlob").Value
    .SaveToFile “MyFile.doc”, adSaveCreateOverWrite
    
End With

Open in new window

0
m3mdiclAuthor Commented:
Thanks for the quick reply. I think this is the only possible way but it's so slow.

Here is my test code.  I haven't inserted it yet in a table.  I just showed it in a listbox.
Can you please check if we can still optimize it a bit faster?

Private Sub Command1_Click()
Dim stm As ADODB.Stream
Dim rs As ADODB.Recordset
Dim objWord As Word.Application
Dim VBC As String
Dim SQL As String
Dim x As Long

    Set objWord = CreateObject("Word.Application")

    'open connection in module
    Call Open_Connection
    
    SQL = "SELECT DocumentID, BLOB FROM TableA"
        
    Set rs = New ADODB.Recordset
    rs.Open SQL, con, adOpenKeyset, adLockOptimistic
    
    Set stm = New ADODB.Stream
    
    stm.Type = adTypeBinary
    stm.Open
    
    For x = 1 To rs.RecordCount
        With stm
            .Write rs.Fields("BLOB").Value
            .SaveToFile App.Path & "\Temp.doc", adSaveCreateOverWrite
        End With
        
    objWord.Documents.Open App.Path & "\Temp.doc"
    VBC = objWord.ActiveDocument.ComputeStatistics(wdStatisticCharacters)
        
    List1.AddItem VBC
    objWord.ActiveDocument.Close False
    
    rs.MoveNext
    Next
    
    rs.Close
    con.Close
End Sub

Open in new window

0
GrahamSkanRetiredCommented:
I can't advise anything faster, but don't forget to end the Word application after the loop:

objWord.Quit

or you will leave an invisible instance in memory every time that you run the code. For the same reason, you might like to make the application visible in case of aborts.
0
m3mdiclAuthor Commented:
Ok thanks guys for the help. I rest my case for the time being.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server 2005

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.