Link to home
Start Free TrialLog in
Avatar of dsteinschneider
dsteinschneider

asked on

Need help with Invalid length parameter passed to the substring function

I need to parse out several lines of email headers contained in a text type column

Here's an example

Date: Thu, 27 Apr 2000 09:05:22 -0400
From: John Doe <bill@website.com>
Subject: test
To: <BillSmith@website2.com>
Mime-Version: 1.0
Organization: Company, Inc.

I need to pull the Date:, From: Subject: and To: lines to their own columns. I know how to use SUBSTRING(EmailText,PATINDEX('%From:%',EmailText),40) to get 40 chars of the From: line but would like to pull everything up to the linefeed for each of the above.

I accepted the following as a solution previously but discovered it includes the trailing char(10) which shows up in the column as a trailing pair of boxes when I SELECT INTO a new table:

SELECt SUBSTRING(EmailText,PATINDEX('%From:%',EmailText),PATINDEX('%'+CHAR(10)+'%',SUBSTRING(EmailText,PATINDEX('%From:%',EmailText),Len(EmailText))))
 
I tried the following but I get "Invalid length parameter passed to the substring function" due to missing data in certain rows:

SELECt SUBSTRING(EmailText,PATINDEX('%From:%',EmailText),PATINDEX('%'+CHAR(10)+'%',SUBSTRING(EmailText,PATINDEX('%From:%',EmailText),Len(EmailText)))-2)

I tried eliminating incomplete headers through the WHERE clause by making sure that the strings 'From:','To:','Subject:' and 'Date:' are in the header but sometimes they appear with no following text which results in the invalid length parameter passed error.
ASKER CERTIFIED SOLUTION
Avatar of Anthony Perkins
Anthony Perkins
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
The reason I suggest you do it this way, is that it will give you greater flexibility in modifying it.  For example, you may find you are getting false positives:  The "From:" may be embedded in the text somewhere.  What you would have to do then is search for CHAR(10) + 'From:'

Nome other notes:
1.  Since the column is a Text datatype use DATALENGTH instead of LEN
2.  I am making the assumption all the "lines" end in CHAR(10), except for optionally the last.
3.  If you return Null instead of '' it will alert you to the fact that the "field" did not exist as opposed to being really empty.
Avatar of dsteinschneider
dsteinschneider

ASKER

Thanks,
I agree with you three notes. The way you wrote the function makes it very clear. It works well except that each column has a leading space. I tried changing this:

Set @Value = SUBSTRING(@EmailText, @FromPos+1, @ToPos - @FromPos-2)

Looked good for the top 2000 rows but somewhere beyond that it resulted in the invalid parameter for substring error.

>>It works well except that each column has a leading space. <<
Right that is because there is a space after every "Field".  If you do not need the space, just change the query to (assuming that all "Fields" have a space at the end):
Select     ID,
     EmailText,
     dbo.udf_GetEmailText(EmailText, 'Date: '),
     dbo.udf_GetEmailText(EmailText, 'From: '),
     dbo.udf_GetEmailText(EmailText, 'Subject: '),
     dbo.udf_GetEmailText(EmailText, 'To: '),
     dbo.udf_GetEmailText(EmailText, 'Mime-Version: '),
     dbo.udf_GetEmailText(EmailText, 'Organization: ')
From     YourTableName
I had tried already to modify the query by altering the second parameter but I still got the leading space.

I realized my last attempt to shift the @FromPos and @ToPos was faulty - here's the change that works:

Set @Value = SUBSTRING(@EmailText, @FromPos+1, @ToPos - (@FromPos+1))

Thanks again for your help - this is for a compliance report.