Solved

# Routine to compress very long string

Posted on 1998-11-30
1,172 Views
I am looking for a routine (in full VB, no OCX, DLL...) to compress very long string. Those strings are in fact RTF files stored in a database.
As they take a lot of place, I will reduce the size of them.

If the answer is very good, I will increase the points.
0
Question by:waty
• 10
• 5
• 4
• +5

LVL 4

Expert Comment

ID: 1447410
public function compress()

On Error Resume Next

For TT = 1 To Len(Text1)
sana1 = Mid(Text1, TT, 1)
sana2 = Mid(Text1, TT + 1, 1)
sana3 = Mid(Text1, TT + 2, 1)
X = 1

If Not sana1 = sana2 Then lÃ¶yty = 2

If sana1 = sana2 Then

If sana1 = sana3 Then
lÃ¶yty = 1
End If

End If

If lÃ¶yty = 1 Then
alku:
X = X + 1
merkki = Mid(Text1, TT + X + 1, 1)

If merkki = sana1 Then GoTo alku
sana = Chr(255) & Chr(X - 1) & sana1
TT = TT + X
End If

If lÃ¶yty = 2 Then sana = sana1
Text = Text & sana
Next

Text1 = Text
end function

public function uncompress()

On Error Resume Next

For TT = 1 To Len(Text1)
sana1 = Asc(Mid(Text1, TT, 1))
sana2 = Asc(Mid(Text1, TT + 1, 1))
sana3 = Asc(Mid(Text1, TT + 2, 1))
sana4 = Asc(Mid(Text1, TT - 1, 1))

If sana1 = 255 Then

For TT6 = 1 To sana2
sana = sana & Chr(sana3)
Next

sana1 = ""
sana2 = ""
End If

If sana = "" Then

If Not sana4 = 255 Then
sana = Chr(sana1)
End If

End If

Text = Text & sana
sana = ""
Next

Text1 = Text
end function
0

LVL 14

Author Comment

ID: 1447411
I have tried your routine, and it doesn't compress at all.
0

LVL 1

Expert Comment

ID: 1447412
idcanada, have you considered using latin alphabet and a word of WHAT this thing should do?
0

LVL 13

Expert Comment

ID: 1447413
Hi Waty,

IDcanada's routine does only compress when you have a lot of sequences of identical characters.

BTW: Have you tried to use compress.exe. So create a temp file. Call compress.exe and store the result.
See http://www.filelibrary.com/Contents/Windows/74/2.html for an example
0

LVL 4

Expert Comment

ID: 1447414

' In this Function alone I have compressed strings more than 52%
' Simply put DefInt A-Z in the module and you're set.

Public Function PackTxt(text\$) As String
' 1) The sub adds CHR\$(255) to a packed string to indicate t
'     hat
'     'it is packed. (For automatic packing/unpacking)
' 2) The sub checks if the values of the characters in the s
'     tring are
'     'between 32 and 127 to reduce errors while packing.
'     ' 3) Last but not least, I've added some documatation.
'Usage is very simple. Take a string (A\$, for example) and c
'     all PackTxt:
'      'a\$ = "This is a test string for the text packer."
'a\$ = a\$ + "we we and we we we we we here we go again to th
'     e pasting level
in our season"

'The string will be packed and CHR\$(255) added to the beginn
'     ing to indicate
'that it is packed. To unpack a string just call PackTxt aga
'     in:

'     'There. That's all there is to it. Have fun :)
'     'PackTxt SUB begins here
'This sub packs strings to about 52% of their original size.
'      The sub
' automatically checks if the string is packed or not and un
'     packs/packs
'     ' it accordingly. (neat, huh?)
' Don't play around with the dictionary or you can lower the
'      packing
' ratio (when I got it there was a single space missing and
'     that
'     ' reduced the ratio by about 10%!).
' NOTE: text\$ may only contain ASCII characters with the val
'     ues 32-127!
'the commets here are cluttered up... delete them if you wan
'     t
'The dictionary... it may look like junk, but it is pretty m
'     uch the heart
'     ' of the packer (so don't mess it up :) )
d1\$ = " e as tinthouerhet anreesr d onn or o i y wo tontyo. neisarte"
d2\$ = "ed, ctiy bat snd fal pensestvengitu talehaurllcousa mf dfoof "
d3\$ = "siril hmeg om Icehironsasiossbedepe rli Tetel nicho lilprcactut"
d4\$ = "Thpaeceachh wige ebuaisursulmawaotowtsmploI solyee Cunm rtieno S"
d5\$ = "diwhs.rafincademe.irplk ury Pwoacos gams,duayavucColamowe Aoopu"
Dict\$ = d1\$ + d2\$ + d3\$ + d4\$ + d5\$

If Len(Dict\$) <> 320 Then 'just to check...
MsgBox "PACKING ERROR: Dictionary is the wrong size"
Exit Function
End If

If Not Left\$(text\$, 1) = Chr\$(255) Then 'check if the string is packed
'      'the string isn't packed so pack it...
If Len(text\$) < 4 Then Exit Function 'no use with strings less than 4
chars.
For a = 1 To Len(text\$) 'check if there are any characters
v = Asc(Mid\$(text\$, a, 1)) ' with values out of range (they

If v < 32 Or v > 127 Then Exit Function ' _cannot_ be packed otherwise)
Next a
Do
DoEvents
cnt = cnt + 1'read pointer in text\$
Chars\$ = Mid\$(text\$, cnt, 2) 'characters to be checked for in Dict\$

If cnt = Len(text\$) Then'if the end of the string has been reached
text\$ = Chr\$(255) + temp\$ + Chr\$(Asc(Mid\$(text\$, cnt, 1)) - 32)
Exit Function '^^^^ add the last character
End If

xx = 1 'read pointer in Dict\$
ReDo:
x = InStr(xx, Dict\$, Chars\$) '4;1;120;1;0xp

If x Then'if the characters from text\$ are in Dict\$
If (x \ 2) = (x / 2) Then 'if the instr of the characters can't be
xx = x + 1 ' divided by 2 then look again (it needs
GoTo ReDo' to be divided so it can be packed)
End If
temp\$ = temp\$ + Chr\$((x \ 2) + 96) 'add the instr of the characters
cnt = cnt + 1 ' in Dict\$ to temp\$ (note that
'      '^^^ characters shouldn't ' it's stored so that it's more
'      ' be compressed twice ' than 95)
PackTxt = temp\$
Else
'      'if the characters aren't found store the first character
'(note that it's less than 95 and that cnt is only moved up
'      _1_)
temp\$ = temp\$ + Chr\$(Asc(Mid\$(text\$, cnt, 1)) - 32)
End If

Loop While cnt < Len(text\$)
text\$ = Chr\$(255) + temp\$ 'copy temp\$ into text\$ and add CHR\$(255)
PackTxt = text\$
Exit Function ' to indicate a packed string
Else
'      'text\$ is packed so unpack it
comp\$ = Right\$(text\$, Len(text\$) - 1) 'remove CHR\$(255)
text\$ = ""'re-init text\$
For x = 1 To Len(comp\$)
Chars\$ = Asc(Mid\$(comp\$, x, 1))

If Chars\$ > 95 Then 'if char > 95 then char is the instr of the
'      ' unpacked characters in Dict\$, remember?
text\$ = text\$ + Mid\$(Dict\$, (Chars\$ - 96) * 2 + 1, 2)
PackTxt = text\$
Else 'if the characters weren't found in Dict\$ they were stored
'     ' with a value less than 95 (most are found, though)
text\$ = text\$ + Chr\$(Asc(Mid\$(comp\$, x, 1)) + 32)
PackTxt = text\$
End If

Next x

End If

PackTxt = text\$
End Function

0

LVL 4

Expert Comment

ID: 1447415

BTW. It does work and in case you are not satisfied, here is another.
0

LVL 7

Expert Comment

ID: 1447416
why don'y you use huffman code , to compress them .

0

LVL 14

Author Comment

ID: 1447417
idcanada, you routines doesn't handles the carriage returns. As I want to compress HTML string, the routines doesn't work.
NB : for your sample, I compress your string from 118 to 82.

Mirkwood, using compress involve write and read to disk, and I don't want to do that. But I will check on the url you gave me.

Inteqam : Do you have some code to provide?
0

LVL 4

Expert Comment

ID: 1447418

Add the Chr(10) & Chr(13)

0

LVL 14

Author Comment

ID: 1447419
Even with Chr(10) & Chr(13), it doesn't work to much, I am looking to modify the code.

NB : there is also the chr\$(9) (tab) char.
0

LVL 4

Expert Comment

ID: 1447420

Why do you keep on rejecting my answer?
Your question asks how to compress strings, I showed you 2 different ways of doing it
and you still reject my answer.
so maybe they do not forfill all 100% of your requirements, however,
I am sure that I have covered a large part of it.

Do you not think your judgeing me harshly?

0

LVL 14

Author Comment

ID: 1447421
I leave this question opened, so other users could post their answers too.

Your second answer was better, but it can not handle special chars like Chr\$(10), Chr\$(13), Chr\$(9).
As I store HTML and RTF files in an access database, I want to pack them. With your algorithm, I gain 20% of the size. But I can't handle the files. I am trying to modify your code to in order to recognize those chars.
If you can post the code to pack an HTML file, you will receive the points.

0

LVL 3

Expert Comment

ID: 1447422
Why reinvent the wheel? There are several compression
algorithms out there.. find the source and use one?

0

LVL 8

Expert Comment

ID: 1447423
Try this one (LZW compression), found on www.planet-source-code.com/vb

'****************************************************************
' Name: LZW (Lempel, Ziv, Welch) Dictionary Compression
' Description:The Lempel, Ziv, Welch compression algorithm i
'     s considered the most efficcient all purpose compression alg
'     orithm there is.
' By: Asgeir Bjarni Ingvarsson
'
' Inputs:None
' Returns:None
' Assumes:None
' Side Effects:None
'
'Code provided by Planet Source Code(tm) 'as is', without
'     warranties as to performance, fitness, merchantability,
'     and any other warranty (whether expressed or implied).
'****************************************************************

'     ' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
'     '| LZW - Compression/Uncompression|
'     '|-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-|
'     '|Author: Asgeir B. Ingvarsson |
'     '| |
'     '|E-Mail: abi@islandia.is |
'     '| |
'     '|Address: Hringbraut 119 |
'     '| IS-107, Reykjavik|
'     '| ICELAND |
'     '|-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-|
'     '|using either of the above measures. |
'     '|-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-|
'     '|This code has one flaw, it can't process characters |
'     '|higher than 127. |
'     '|For the code that can compress all 256 ascii chars. |
'     '|please e-mail me.|
'     '|If you use this code or modify it, I would appreciate|
'     '|it if you would mention my name somewhere and send me|
'     '|a copy of the code (if it has been modified).|
'     '|-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-|
'     '|LZW is property of Unisys and is free for|
'     '|noncommercial software. |
'     ' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Private Dict(0 To 255) As String
Private Count As Integer

Private Sub Init()

For i = 0 To 127
Dict(i) = Chr(i)
Next

End Sub

Private Function Search(inp As String) As Integer

For i = 0 To 255

If Dict(i) = inp Then Search = i: Exit Function
Next

Search = 256
End Function

Private Sub Add(inp As String)

If Count = 256 Then Wipe
Dict(Count) = inp
Count = Count + 1
End Sub

Private Sub Wipe()

For i = 128 To 255
Dict(i) = ""
Next

Count = 128
End Sub

Public Function Deflate(inp As String) As String

'     'Begin Error Checking

If Len(inp) = 0 Then Exit Function

For i = 1 To Len(inp)

If Asc(Mid(inp, i, 1)) > 127 Then MsgBox "Illegal Character Value", vbCritical, "Error:": Exit Function
Next

'     'End Error Checking
Init
Wipe
p = ""
i = 1

Do Until i > Len(inp)
c = Mid(inp, i, 1)
i = i + 1
temp = p & c

If Not Search(CStr(temp)) = 256 Then
p = temp
Else
o = o & Chr(Search(CStr(p)))
p = c
End If

Loop

o = o & Chr(Search(CStr(p)))
Deflate = o
End Function

Public Function Inflate(inp As String) As String

If Len(inp) = 0 Then Exit Function
Init
Wipe
cW = Asc(Mid(inp, 1, 1))
o = Dict(cW)
i = 2

Do Until i > Len(inp)
pW = cW
cW = Asc(Mid(inp, i, 1))
i = i + 1

If Not Dict(cW) = "" Then
o = o & Dict(cW)
p = Dict(pW)
c = Mid(Dict(cW), 1, 1)
Add (CStr(p) & CStr(c))
ElseIf Dict(cW) = "" Then
p = Dict(pW)
c = Mid(Dict(pW), 1, 1)
o = o & p & c
Add (CStr(p) & CStr(c))
End If

Loop

Inflate = o
End Function

Public Sub main()

inp = "Hello World, Hello World"
d = Deflate(CStr(inp)) 'Compress
q = Inflate(CStr(d)) 'Uncompress
MsgBox "Uncompressed: " & q & vbCrLf & vbCrLf & _
"Compressed: " & d & vbCrLf & vbCrLf & _
"Compressed Size: " & Len(d) & vbCrLf & vbCrLf & _
"Uncompressed Size: " & Len(q) & vbCrLf & vbCrLf & _
"Compression Ratio: " & (100 - (((Len(d) / Len(q)) * 100) \ 1)) & "%", vbOKOnly, "Results:"
End Sub

0

Expert Comment

ID: 1447424
Somehow my last answer got removed, basically I said.
Write the string out to disk in binary format byte by byte.
use pkzip to zip it (Called from within your program)
Read the zipped file back into a string and store that string in your database.

If you think this may be an answer to your question let me know and I will again post the full code on how to accomplish this.
0

LVL 14

Author Comment

ID: 1447425
lppjohns, you could post the code here or send it to me at waty.thierry@usa.net

MikeP, your code works better than the one of idcanada, the speed to files of 40K is very low.
0

LVL 7

Expert Comment

ID: 1447426
you can code the pkzip functionality in your code ,

by using the huffman code ,

i think i can write it.

0

LVL 14

Author Comment

ID: 1447427
Inteqam, if your code is very quick, I am interessed. The code from MikeP works (20% saved) but is very slow.
0

LVL 13

Expert Comment

ID: 1447428
In VB, it will never be quick. VB just is not optimized to do that.
0

LVL 14

Author Comment

ID: 1447429
Yes, Mirkwood, I know that, so probably the method with PKZip is the best at this moment, or if someone has a DLL written in C and wich is royalty free....
0

LVL 8

Expert Comment

ID: 1447430
You can try the LZW algorythm as a part of the Windows API (the LZ* functions), maybe it will make things faster
0

LVL 14

Author Comment

ID: 1447431
MikeP, I think your option is the best. If you already have this job, could you post it, otherwise, I will implement it.

0

Expert Comment

ID: 1447432
I e-mailed you, what was the problem with my answer?
I'm just curious.
0

LVL 8

Expert Comment

ID: 1447433
waty, with your permission, if noone emailed you a better algorythm, I will lock this questions.
0

LVL 14

Author Comment

ID: 1447434
MikeP, your solution has approched what I wanted, but I was not entirely satisfied. I think it will be definitivelly not possible to do it under VB.
0

LVL 14

Author Comment

ID: 1447435
I forgot to tell you that you could lock the question
0

LVL 8

Accepted Solution

MikeP090797 earned 200 total points
ID: 1447436
OK
0

## Join & Write a Comment Already a member? Login.

### Suggested Solutions

Introduction I needed to skip over some file processing within a For...Next loop in some old production code and wished that VB (classic) had a statement that would drop down to the end of the current iteration, bypassing the statements that were câ€¦
Introduction While answering a recent question (http://www.experts-exchange.com/Q_27402310.html) in the VB classic zone, I wrote some VB code in the (Office) VBA environment, rather than fire up my older PC.  I didn't post completely correct code oâ€¦
Get people started with the process of using Access VBA to control Outlook using automation, Microsoft Access can control other applications. An example is the ability to programmatically talk to Microsoft Outlook. Using automation, an Access applicâ€¦
Get people started with the utilization of class modules. Class modules can be a powerful tool in Microsoft Access. They allow you to create self-contained objects that encapsulate functionality. They can easily hide the complexity of a process fromâ€¦

#### 762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

#### Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!