# Routine to compress very long string

I am looking for a routine (in full VB, no OCX, DLL...) to compress very long string. Those strings are in fact RTF files stored in a database.
As they take a lot of place, I will reduce the size of them.

If the answer is very good, I will increase the points.
LVL 14
###### Who is Participating?

Commented:
OK
0

Commented:
public function compress()

On Error Resume Next

For TT = 1 To Len(Text1)
sana1 = Mid(Text1, TT, 1)
sana2 = Mid(Text1, TT + 1, 1)
sana3 = Mid(Text1, TT + 2, 1)
X = 1

If Not sana1 = sana2 Then löyty = 2

If sana1 = sana2 Then

If sana1 = sana3 Then
löyty = 1
End If

End If

If löyty = 1 Then
alku:
X = X + 1
merkki = Mid(Text1, TT + X + 1, 1)

If merkki = sana1 Then GoTo alku
sana = Chr(255) & Chr(X - 1) & sana1
TT = TT + X
End If

If löyty = 2 Then sana = sana1
Text = Text & sana
Next

Text1 = Text
end function

public function uncompress()

On Error Resume Next

For TT = 1 To Len(Text1)
sana1 = Asc(Mid(Text1, TT, 1))
sana2 = Asc(Mid(Text1, TT + 1, 1))
sana3 = Asc(Mid(Text1, TT + 2, 1))
sana4 = Asc(Mid(Text1, TT - 1, 1))

If sana1 = 255 Then

For TT6 = 1 To sana2
sana = sana & Chr(sana3)
Next

sana1 = ""
sana2 = ""
End If

If sana = "" Then

If Not sana4 = 255 Then
sana = Chr(sana1)
End If

End If

Text = Text & sana
sana = ""
Next

Text1 = Text
end function
0

Author Commented:
I have tried your routine, and it doesn't compress at all.
0

Commented:
idcanada, have you considered using latin alphabet and a word of WHAT this thing should do?
0

Commented:
Hi Waty,

IDcanada's routine does only compress when you have a lot of sequences of identical characters.

BTW: Have you tried to use compress.exe. So create a temp file. Call compress.exe and store the result.
See http://www.filelibrary.com/Contents/Windows/74/2.html for an example
0

Commented:

' In this Function alone I have compressed strings more than 52%
' Simply put DefInt A-Z in the module and you're set.

Public Function PackTxt(text\$) As String
' 1) The sub adds CHR\$(255) to a packed string to indicate t
'     hat
'     'it is packed. (For automatic packing/unpacking)
' 2) The sub checks if the values of the characters in the s
'     tring are
'     'between 32 and 127 to reduce errors while packing.
'     ' 3) Last but not least, I've added some documatation.
'Usage is very simple. Take a string (A\$, for example) and c
'     all PackTxt:
'      'a\$ = "This is a test string for the text packer."
'a\$ = a\$ + "we we and we we we we we here we go again to th
'     e pasting level
in our season"

'The string will be packed and CHR\$(255) added to the beginn
'     ing to indicate
'that it is packed. To unpack a string just call PackTxt aga
'     in:

'     'There. That's all there is to it. Have fun :)
'     'PackTxt SUB begins here
'This sub packs strings to about 52% of their original size.
'      The sub
' automatically checks if the string is packed or not and un
'     packs/packs
'     ' it accordingly. (neat, huh?)
' Don't play around with the dictionary or you can lower the
'      packing
' ratio (when I got it there was a single space missing and
'     that
'     ' reduced the ratio by about 10%!).
' NOTE: text\$ may only contain ASCII characters with the val
'     ues 32-127!
'the commets here are cluttered up... delete them if you wan
'     t
'The dictionary... it may look like junk, but it is pretty m
'     uch the heart
'     ' of the packer (so don't mess it up :) )
d1\$ = " e as tinthouerhet anreesr d onn or o i y wo tontyo. neisarte"
d2\$ = "ed, ctiy bat snd fal pensestvengitu talehaurllcousa mf dfoof "
d3\$ = "siril hmeg om Icehironsasiossbedepe rli Tetel nicho lilprcactut"
d4\$ = "Thpaeceachh wige ebuaisursulmawaotowtsmploI solyee Cunm rtieno S"
d5\$ = "diwhs.rafincademe.irplk ury Pwoacos gams,duayavucColamowe Aoopu"
Dict\$ = d1\$ + d2\$ + d3\$ + d4\$ + d5\$

If Len(Dict\$) <> 320 Then 'just to check...
MsgBox "PACKING ERROR: Dictionary is the wrong size"
Exit Function
End If

If Not Left\$(text\$, 1) = Chr\$(255) Then 'check if the string is packed
'      'the string isn't packed so pack it...
If Len(text\$) < 4 Then Exit Function 'no use with strings less than 4
chars.
For a = 1 To Len(text\$) 'check if there are any characters
v = Asc(Mid\$(text\$, a, 1)) ' with values out of range (they

If v < 32 Or v > 127 Then Exit Function ' _cannot_ be packed otherwise)
Next a
Do
DoEvents
cnt = cnt + 1'read pointer in text\$
Chars\$ = Mid\$(text\$, cnt, 2) 'characters to be checked for in Dict\$

If cnt = Len(text\$) Then'if the end of the string has been reached
text\$ = Chr\$(255) + temp\$ + Chr\$(Asc(Mid\$(text\$, cnt, 1)) - 32)
Exit Function '^^^^ add the last character
End If

xx = 1 'read pointer in Dict\$
ReDo:
x = InStr(xx, Dict\$, Chars\$) '4;1;120;1;0xp

If x Then'if the characters from text\$ are in Dict\$
If (x \ 2) = (x / 2) Then 'if the instr of the characters can't be
xx = x + 1 ' divided by 2 then look again (it needs
GoTo ReDo' to be divided so it can be packed)
End If
temp\$ = temp\$ + Chr\$((x \ 2) + 96) 'add the instr of the characters
cnt = cnt + 1 ' in Dict\$ to temp\$ (note that
'      '^^^ characters shouldn't ' it's stored so that it's more
'      ' be compressed twice ' than 95)
PackTxt = temp\$
Else
'      'if the characters aren't found store the first character
'(note that it's less than 95 and that cnt is only moved up
'      _1_)
temp\$ = temp\$ + Chr\$(Asc(Mid\$(text\$, cnt, 1)) - 32)
End If

Loop While cnt < Len(text\$)
text\$ = Chr\$(255) + temp\$ 'copy temp\$ into text\$ and add CHR\$(255)
PackTxt = text\$
Exit Function ' to indicate a packed string
Else
'      'text\$ is packed so unpack it
comp\$ = Right\$(text\$, Len(text\$) - 1) 'remove CHR\$(255)
text\$ = ""'re-init text\$
For x = 1 To Len(comp\$)
Chars\$ = Asc(Mid\$(comp\$, x, 1))

If Chars\$ > 95 Then 'if char > 95 then char is the instr of the
'      ' unpacked characters in Dict\$, remember?
text\$ = text\$ + Mid\$(Dict\$, (Chars\$ - 96) * 2 + 1, 2)
PackTxt = text\$
Else 'if the characters weren't found in Dict\$ they were stored
'     ' with a value less than 95 (most are found, though)
text\$ = text\$ + Chr\$(Asc(Mid\$(comp\$, x, 1)) + 32)
PackTxt = text\$
End If

Next x

End If

PackTxt = text\$
End Function

0

Commented:

BTW. It does work and in case you are not satisfied, here is another.
0

Commented:
why don'y you use huffman code , to compress them .

0

Author Commented:
idcanada, you routines doesn't handles the carriage returns. As I want to compress HTML string, the routines doesn't work.
NB : for your sample, I compress your string from 118 to 82.

Mirkwood, using compress involve write and read to disk, and I don't want to do that. But I will check on the url you gave me.

Inteqam : Do you have some code to provide?
0

Commented:

0

Author Commented:
Even with Chr(10) & Chr(13), it doesn't work to much, I am looking to modify the code.

NB : there is also the chr\$(9) (tab) char.
0

Commented:

Why do you keep on rejecting my answer?
Your question asks how to compress strings, I showed you 2 different ways of doing it
and you still reject my answer.
so maybe they do not forfill all 100% of your requirements, however,
I am sure that I have covered a large part of it.

Do you not think your judgeing me harshly?

0

Author Commented:
I leave this question opened, so other users could post their answers too.

Your second answer was better, but it can not handle special chars like Chr\$(10), Chr\$(13), Chr\$(9).
As I store HTML and RTF files in an access database, I want to pack them. With your algorithm, I gain 20% of the size. But I can't handle the files. I am trying to modify your code to in order to recognize those chars.
If you can post the code to pack an HTML file, you will receive the points.

0

Commented:
Why reinvent the wheel? There are several compression
algorithms out there.. find the source and use one?

0

Commented:
Try this one (LZW compression), found on www.planet-source-code.com/vb

'****************************************************************
' Name: LZW (Lempel, Ziv, Welch) Dictionary Compression
' Description:The Lempel, Ziv, Welch compression algorithm i
'     s considered the most efficcient all purpose compression alg
'     orithm there is.
'
' Inputs:None
' Returns:None
' Assumes:None
' Side Effects:None
'
'Code provided by Planet Source Code(tm) 'as is', without
'     warranties as to performance, fitness, merchantability,
'     and any other warranty (whether expressed or implied).
'****************************************************************

'     ' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
'     '| LZW - Compression/Uncompression|
'     '|-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-|
'     '|Author: Asgeir B. Ingvarsson |
'     '| |
'     '|E-Mail: abi@islandia.is |
'     '| |
'     '| IS-107, Reykjavik|
'     '| ICELAND |
'     '|-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-|
'     '|using either of the above measures. |
'     '|-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-|
'     '|This code has one flaw, it can't process characters |
'     '|higher than 127. |
'     '|For the code that can compress all 256 ascii chars. |
'     '|If you use this code or modify it, I would appreciate|
'     '|it if you would mention my name somewhere and send me|
'     '|a copy of the code (if it has been modified).|
'     '|-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-|
'     '|LZW is property of Unisys and is free for|
'     '|noncommercial software. |
'     ' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Private Dict(0 To 255) As String
Private Count As Integer

Private Sub Init()

For i = 0 To 127
Dict(i) = Chr(i)
Next

End Sub

Private Function Search(inp As String) As Integer

For i = 0 To 255

If Dict(i) = inp Then Search = i: Exit Function
Next

Search = 256
End Function

If Count = 256 Then Wipe
Dict(Count) = inp
Count = Count + 1
End Sub

Private Sub Wipe()

For i = 128 To 255
Dict(i) = ""
Next

Count = 128
End Sub

Public Function Deflate(inp As String) As String

'     'Begin Error Checking

If Len(inp) = 0 Then Exit Function

For i = 1 To Len(inp)

If Asc(Mid(inp, i, 1)) > 127 Then MsgBox "Illegal Character Value", vbCritical, "Error:": Exit Function
Next

'     'End Error Checking
Init
Wipe
p = ""
i = 1

Do Until i > Len(inp)
c = Mid(inp, i, 1)
i = i + 1
temp = p & c

If Not Search(CStr(temp)) = 256 Then
p = temp
Else
o = o & Chr(Search(CStr(p)))
p = c
End If

Loop

o = o & Chr(Search(CStr(p)))
Deflate = o
End Function

Public Function Inflate(inp As String) As String

If Len(inp) = 0 Then Exit Function
Init
Wipe
cW = Asc(Mid(inp, 1, 1))
o = Dict(cW)
i = 2

Do Until i > Len(inp)
pW = cW
cW = Asc(Mid(inp, i, 1))
i = i + 1

If Not Dict(cW) = "" Then
o = o & Dict(cW)
p = Dict(pW)
c = Mid(Dict(cW), 1, 1)
ElseIf Dict(cW) = "" Then
p = Dict(pW)
c = Mid(Dict(pW), 1, 1)
o = o & p & c
End If

Loop

Inflate = o
End Function

Public Sub main()

inp = "Hello World, Hello World"
d = Deflate(CStr(inp)) 'Compress
q = Inflate(CStr(d)) 'Uncompress
MsgBox "Uncompressed: " & q & vbCrLf & vbCrLf & _
"Compressed: " & d & vbCrLf & vbCrLf & _
"Compressed Size: " & Len(d) & vbCrLf & vbCrLf & _
"Uncompressed Size: " & Len(q) & vbCrLf & vbCrLf & _
"Compression Ratio: " & (100 - (((Len(d) / Len(q)) * 100) \ 1)) & "%", vbOKOnly, "Results:"
End Sub

0

Commented:
Somehow my last answer got removed, basically I said.
Write the string out to disk in binary format byte by byte.
use pkzip to zip it (Called from within your program)
Read the zipped file back into a string and store that string in your database.

If you think this may be an answer to your question let me know and I will again post the full code on how to accomplish this.
0

Author Commented:
lppjohns, you could post the code here or send it to me at waty.thierry@usa.net

MikeP, your code works better than the one of idcanada, the speed to files of 40K is very low.
0

Commented:
you can code the pkzip functionality in your code ,

by using the huffman code ,

i think i can write it.

0

Author Commented:
Inteqam, if your code is very quick, I am interessed. The code from MikeP works (20% saved) but is very slow.
0

Commented:
In VB, it will never be quick. VB just is not optimized to do that.
0

Author Commented:
Yes, Mirkwood, I know that, so probably the method with PKZip is the best at this moment, or if someone has a DLL written in C and wich is royalty free....
0

Commented:
You can try the LZW algorythm as a part of the Windows API (the LZ* functions), maybe it will make things faster
0

Author Commented:
MikeP, I think your option is the best. If you already have this job, could you post it, otherwise, I will implement it.

0

Commented:
I e-mailed you, what was the problem with my answer?
I'm just curious.
0

Commented:
waty, with your permission, if noone emailed you a better algorythm, I will lock this questions.
0

Author Commented:
MikeP, your solution has approched what I wanted, but I was not entirely satisfied. I think it will be definitivelly not possible to do it under VB.
0

Author Commented:
I forgot to tell you that you could lock the question
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.