Need help with Invalid length parameter passed to the substring function

I need to parse out several lines of email headers contained in a text type column

Here's an example

Date: Thu, 27 Apr 2000 09:05:22 -0400
From: John Doe <bill@website.com>
Subject: test
To: <BillSmith@website2.com>
Mime-Version: 1.0
Organization: Company, Inc.

I need to pull the Date:, From: Subject: and To: lines to their own columns. I know how to use SUBSTRING(EmailText,PATINDEX('%From:%',EmailText),40) to get 40 chars of the From: line but would like to pull everything up to the linefeed for each of the above.

I accepted the following as a solution previously but discovered it includes the trailing char(10) which shows up in the column as a trailing pair of boxes when I SELECT INTO a new table:

SELECt SUBSTRING(EmailText,PATINDEX('%From:%',EmailText),PATINDEX('%'+CHAR(10)+'%',SUBSTRING(EmailText,PATINDEX('%From:%',EmailText),Len(EmailText))))
 
I tried the following but I get "Invalid length parameter passed to the substring function" due to missing data in certain rows:

SELECt SUBSTRING(EmailText,PATINDEX('%From:%',EmailText),PATINDEX('%'+CHAR(10)+'%',SUBSTRING(EmailText,PATINDEX('%From:%',EmailText),Len(EmailText)))-2)

I tried eliminating incomplete headers through the WHERE clause by making sure that the strings 'From:','To:','Subject:' and 'Date:' are in the header but sometimes they appear with no following text which results in the invalid length parameter passed error.
LVL 1
dsteinschneiderAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Anthony PerkinsCommented:
You may want to resort to using a function like this:

CREATE Function dbo.udf_GetEmailText(
            @EmailText text,
            @Field varchar(20))

Returns varchar(1000)

Begin
Declare @FromPos integer,
      @ToPos integer,
      @LF char(1),
      @Value varchar(1000)

Set @LF = CHAR(10)

Set @FromPos = CHARINDEX(@Field, @EmailText)
If @FromPos > 0
   Begin
      Set @FromPos = @FromPos + LEN(@Field)
      Set @ToPos = CHARINDEX(@LF, @EmailText, @FromPos)
      If @ToPos = 0
         Begin
            Set @ToPos = DATALENGTH(@EmailText)
         End
      Set @Value = SUBSTRING(@EmailText, @FromPos, @ToPos - @FromPos + 1)
   End
Else
   Begin
      Set @Value = Null       -- Or '' whichever you prefer
   End

Return @Value
End

You can then call it as follows:

Select      ID,
      EmailText,
      dbo.udf_GetEmailText(EmailText, 'Date:'),
      dbo.udf_GetEmailText(EmailText, 'From:'),
      dbo.udf_GetEmailText(EmailText, 'Subject:'),
      dbo.udf_GetEmailText(EmailText, 'To:'),
      dbo.udf_GetEmailText(EmailText, 'Mime-Version:'),
      dbo.udf_GetEmailText(EmailText, 'Organization:')
From      YourTableName
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Anthony PerkinsCommented:
The reason I suggest you do it this way, is that it will give you greater flexibility in modifying it.  For example, you may find you are getting false positives:  The "From:" may be embedded in the text somewhere.  What you would have to do then is search for CHAR(10) + 'From:'

Nome other notes:
1.  Since the column is a Text datatype use DATALENGTH instead of LEN
2.  I am making the assumption all the "lines" end in CHAR(10), except for optionally the last.
3.  If you return Null instead of '' it will alert you to the fact that the "field" did not exist as opposed to being really empty.
0
dsteinschneiderAuthor Commented:
Thanks,
I agree with you three notes. The way you wrote the function makes it very clear. It works well except that each column has a leading space. I tried changing this:

Set @Value = SUBSTRING(@EmailText, @FromPos+1, @ToPos - @FromPos-2)

Looked good for the top 2000 rows but somewhere beyond that it resulted in the invalid parameter for substring error.

0
Anthony PerkinsCommented:
>>It works well except that each column has a leading space. <<
Right that is because there is a space after every "Field".  If you do not need the space, just change the query to (assuming that all "Fields" have a space at the end):
Select     ID,
     EmailText,
     dbo.udf_GetEmailText(EmailText, 'Date: '),
     dbo.udf_GetEmailText(EmailText, 'From: '),
     dbo.udf_GetEmailText(EmailText, 'Subject: '),
     dbo.udf_GetEmailText(EmailText, 'To: '),
     dbo.udf_GetEmailText(EmailText, 'Mime-Version: '),
     dbo.udf_GetEmailText(EmailText, 'Organization: ')
From     YourTableName
0
dsteinschneiderAuthor Commented:
I had tried already to modify the query by altering the second parameter but I still got the leading space.

I realized my last attempt to shift the @FromPos and @ToPos was faulty - here's the change that works:

Set @Value = SUBSTRING(@EmailText, @FromPos+1, @ToPos - (@FromPos+1))

Thanks again for your help - this is for a compliance report.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.