Counting occurrences of a short string within a long string

gene-analyst
I would like to count how many times a particular string occurs in a textbox,
e.g. the string "AUG" in the text
ATTTCACGAACTCAUGTACACGACTTAGAUG
and then display the count.

I have searched for the answer everywhere,
so any help at all would be appreciated.
Thanks

Author

Commented:
I managed to find a simple solution to this; I was overthinking it, lol. I don't know how to close the question, though!

Private Sub calculate_Click()

ttext = Text1.Text
tpos = 1
baslen = Len(Text1.Text)
basnum.Caption = baslen
codnum.Caption = baslen / 3

For i = 0 To baselen
If InStr(tpos, ttext, "aug") Then tcount = tcount + 1
tpos = tpos + 1
Next i
thyminnum.Caption = tcount


End Sub

Author

Commented:
I thought this had worked, but it seems not. I can't understand what is wrong with it!

Commented:
That looks like amino acid coding sequences in DNA/RNA, in which case you need to be sure that you are counting "proper" triples (codons) and not matches that overlap two triples. This makes the solution quite a bit more involved.

Commented:
'Look for '#### comments:

' Notes:
'   Using "Option Explicit" would have caught the baselen/baslen misspelling.
'   Better variable naming and control naming would help a lot.
'   InStr was always searching "the rest of" the string.
'      It started checking for "aug" in all 31 characters,
'      then in the last 30 chars, the last 29, etc.
'   Since "AUG" appears at the end, InStr would almost always match.
'   Mid$ picks 3 characters out each time for comparison
'      (except close to the end of the string).

Option Explicit                 '#### I highly recommend always using Option Explicit

Private Sub calculate_Click()
    Dim ttext As String    '#### ... and declaring all your variables.
    Dim baslen As Integer
    Dim tcount As Integer
    Dim i As Integer
   
    ttext = LCase$(Text1.Text)      '#### Changed to lowercase to match "aug".
                                    '#### tpos is unnecessary, so omitted.
    baslen = Len(Text1.Text)
    basnum.Caption = baslen
    codnum.Caption = baslen / 3
   
    For i = 0 To baslen              '#### baselen changed to baslen
                                     '#### used i+1 instead of tpos in next line
        If Mid$(ttext, i+1, 3) = "aug" Then tcount = tcount + 1
                                      '#### used Mid$ instead of Instr

    Next i
    thyminnum.Caption = tcount

End Sub
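
If you would rather keep using InStr, it can also do the job, as long as each search resumes just after the previous match instead of stepping one character at a time. The following is only a rough sketch: CountOccurrences, sequence and motif are names I made up for illustration, and vbTextCompare is used so that "aug" matches "AUG" without calling LCase$.

```vb
Option Explicit

' Hypothetical helper (not part of the original form code):
' counts every occurrence of motif in sequence, including
' overlapping ones, ignoring upper/lower case.
Public Function CountOccurrences(ByVal sequence As String, _
                                 ByVal motif As String) As Long
    Dim pos As Long
    Dim total As Long

    ' Nothing to search for: return 0.
    If Len(motif) = 0 Then Exit Function

    ' InStr returns the 1-based position of the next match, or 0 if none.
    pos = InStr(1, sequence, motif, vbTextCompare)
    Do While pos > 0
        total = total + 1
        ' Resume one character after the start of this match.
        pos = InStr(pos + 1, sequence, motif, vbTextCompare)
    Loop

    CountOccurrences = total
End Function
```

In the click handler this would be called as thyminnum.Caption = CountOccurrences(Text1.Text, "aug"). For the sample sequence in the question it should give 2, since "AUG" starts at positions 14 and 29.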

Commented:
If Arthur_Wood's comment is valid, you could use a loop like:

For i = 0 To baslen Step 3

This still assumes that the proper triples start in the first position. Allowing a different starting position would be easy to implement, too.
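
A rough sketch of that idea, under my own assumptions: CountCodon and frameOffset are hypothetical names, frameOffset is expected to be 0, 1 or 2 (the number of bases to skip before the first triple), and the loop runs from 1 rather than 0 so that Mid$ can use i directly.

```vb
Option Explicit

' Hypothetical helper: counts the triples equal to codon when sequence
' is read in non-overlapping groups of three.
' frameOffset (assumed to be 0, 1 or 2) is the number of bases skipped
' before the first triple; negative values are not handled here.
Public Function CountCodon(ByVal sequence As String, _
                           ByVal codon As String, _
                           Optional ByVal frameOffset As Long = 0) As Long
    Dim i As Long
    Dim total As Long

    ' Compare in a single case so "aug" and "AUG" are treated alike.
    sequence = UCase$(sequence)
    codon = UCase$(codon)

    ' Mid$ is 1-based; stop at the last position where a full triple fits.
    For i = frameOffset + 1 To Len(sequence) - 2 Step 3
        If Mid$(sequence, i, 3) = codon Then total = total + 1
    Next i

    CountCodon = total
End Function
```

With the sequence from the question, CountCodon(ttext, "AUG", 0) and CountCodon(ttext, "AUG", 2) should both return 0, while CountCodon(ttext, "AUG", 1) should return 2, because both "AUG" triples start one base into the first reading frame.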

Author

Commented:
Thanks farsight, that worked great! Although I had declared my variables (and should have included them!), I can see now how essential the Option Explicit statement is. As a newbie I found your answer clear and beautifully simple.
As Arthur says, there will be overlap, but I can easily solve that now thanks to you, and can at last put the ibuprofen away :-)

Commented:
You're welcome.  Is this problem work-related, or just an exercise?  The "ATCG" caught my eye.

I'm a software developer with a great interest in genetics, proteomics, etc.  I've bought a few deep books on Computational Molecular Biology and on Algorithms for processing DNA and protein data, though I've done no work in that area (yet).

Author

Commented:
Just an exercise. I'm a microbiology graduate looking to work within Bioinformatics. The Masters degree I'm applying for will involve programming in VB and genetics, so I've just built a simple prog whereby a sequence can be copied and pasted into a text box (in base form) and reveal the number of each type of amino acid present. Very basic, but a start.
The books you are studying sound interesting. I don't know what your science background is, but if you find the molecular side too heavy you might want to take a look at "Biochemistry" by Lubert Stryer. It was like a bible for biochemistry lectures! I think I'll do a search for the books you mentioned, and hope they aren't beyond me!
