Solved

Routine to compress very long string

Posted on 1998-11-30
27
1,172 Views
Last Modified: 2006-11-17
I am looking for a routine (in full VB, no OCX, DLL...) to compress very long string. Those strings are in fact RTF files stored in a database.
As they take a lot of place, I will reduce the size of them.

If the answer is very good, I will increase the points.
0
Comment
Question by:waty
  • 10
  • 5
  • 4
  • +5
27 Comments
 
LVL 4

Expert Comment

by:idcanada
ID: 1447410
public function compress()

       On Error Resume Next

              For TT = 1 To Len(Text1)
                     sana1 = Mid(Text1, TT, 1)
                     sana2 = Mid(Text1, TT + 1, 1)
                     sana3 = Mid(Text1, TT + 2, 1)
                     X = 1

                            If Not sana1 = sana2 Then löyty = 2

                                          If sana1 = sana2 Then

                                                        If sana1 = sana3 Then
                                                               löyty = 1
                                                        End If

                                          End If


                                          If löyty = 1 Then
                                                 alku:
                                                 X = X + 1
                                                 merkki = Mid(Text1, TT + X + 1, 1)

                                                        If merkki = sana1 Then GoTo alku
                                                               sana = Chr(255) & Chr(X - 1) & sana1
                                                               TT = TT + X
                                                        End If


                                                        If löyty = 2 Then sana = sana1
                                                               Text = Text & sana
                                                        Next

                                                 Text1 = Text
                                          end function


public function uncompress()

       On Error Resume Next

              For TT = 1 To Len(Text1)
                     sana1 = Asc(Mid(Text1, TT, 1))
                     sana2 = Asc(Mid(Text1, TT + 1, 1))
                     sana3 = Asc(Mid(Text1, TT + 2, 1))
                     sana4 = Asc(Mid(Text1, TT - 1, 1))

                            If sana1 = 255 Then

                                          For TT6 = 1 To sana2
                                                 sana = sana & Chr(sana3)
                                          Next

                                   sana1 = ""
                                   sana2 = ""
                            End If


                            If sana = "" Then

                                          If Not sana4 = 255 Then
                                                 sana = Chr(sana1)
                                          End If

                            End If

                     Text = Text & sana
                     sana = ""
              Next

       Text1 = Text
end function
0
 
LVL 14

Author Comment

by:waty
ID: 1447411
I have tried your routine, and it doesn't compress at all.
0
 
LVL 1

Expert Comment

by:zivf
ID: 1447412
idcanada, have you considered using latin alphabet and a word of WHAT this thing should do?
0
 
LVL 13

Expert Comment

by:Mirkwood
ID: 1447413
Hi Waty,

IDcanada's routine does only compress when you have a lot of sequences of identical characters.

BTW: Have you tried to use compress.exe. So create a temp file. Call compress.exe and store the result.
See http://www.filelibrary.com/Contents/Windows/74/2.html for an example
0
 
LVL 4

Expert Comment

by:idcanada
ID: 1447414

' In this Function alone I have compressed strings more than 52%
' Simply put DefInt A-Z in the module and you're set.



Public Function PackTxt(text$) As String
       ' 1) The sub adds CHR$(255) to a packed string to indicate t
       '     hat
       '     'it is packed. (For automatic packing/unpacking)
       ' 2) The sub checks if the values of the characters in the s
       '     tring are
       '     'between 32 and 127 to reduce errors while packing.
       '     ' 3) Last but not least, I've added some documatation.
       'Usage is very simple. Take a string (A$, for example) and c
       '     all PackTxt:
       '      'a$ = "This is a test string for the text packer."
        'a$ = a$ + "we we and we we we we we here we go again to th
       '     e pasting level
       in our season"
       
       
       'The string will be packed and CHR$(255) added to the beginn
       '     ing to indicate
       'that it is packed. To unpack a string just call PackTxt aga
       '     in:
       
       '     'There. That's all there is to it. Have fun :)
       '     'PackTxt SUB begins here
       'This sub packs strings to about 52% of their original size.
       '      The sub
       ' automatically checks if the string is packed or not and un
       '     packs/packs
       '     ' it accordingly. (neat, huh?)
       ' Don't play around with the dictionary or you can lower the
       '      packing
       ' ratio (when I got it there was a single space missing and
       '     that
       '     ' reduced the ratio by about 10%!).
       ' NOTE: text$ may only contain ASCII characters with the val
       '     ues 32-127!
       'the commets here are cluttered up... delete them if you wan
       '     t
       'The dictionary... it may look like junk, but it is pretty m
       '     uch the heart
       '     ' of the packer (so don't mess it up :) )
       d1$ = " e as tinthouerhet anreesr d onn or o i y wo tontyo. neisarte"
       d2$ = "ed, ctiy bat snd fal pensestvengitu talehaurllcousa mf dfoof "
       d3$ = "siril hmeg om Icehironsasiossbedepe rli Tetel nicho lilprcactut"
       d4$ = "Thpaeceachh wige ebuaisursulmawaotowtsmploI solyee Cunm rtieno S"
       d5$ = "diwhs.rafincademe.irplk ury Pwoacos gams,duayavucColamowe Aoopu"
       Dict$ = d1$ + d2$ + d3$ + d4$ + d5$

              If Len(Dict$) <> 320 Then 'just to check...
                      MsgBox "PACKING ERROR: Dictionary is the wrong size"
                      Exit Function
              End If


              If Not Left$(text$, 1) = Chr$(255) Then 'check if the string is packed
                     '      'the string isn't packed so pack it...
                      If Len(text$) < 4 Then Exit Function 'no use with strings less than 4
                     chars.
                      For a = 1 To Len(text$) 'check if there are any characters
                     v = Asc(Mid$(text$, a, 1)) ' with values out of range (they

                            If v < 32 Or v > 127 Then Exit Function ' _cannot_ be packed otherwise)
                                    Next a
                                    Do
                                    DoEvents
                                   cnt = cnt + 1'read pointer in text$
                                   Chars$ = Mid$(text$, cnt, 2) 'characters to be checked for in Dict$
                                   

                                          If cnt = Len(text$) Then'if the end of the string has been reached
                                                  text$ = Chr$(255) + temp$ + Chr$(Asc(Mid$(text$, cnt, 1)) - 32)
                                                  Exit Function '^^^^ add the last character
                                          End If

                                   
                                   xx = 1 'read pointer in Dict$
                                   ReDo:
                                   x = InStr(xx, Dict$, Chars$) '4;1;120;1;0xp

                                          If x Then'if the characters from text$ are in Dict$
                                                  If (x \ 2) = (x / 2) Then 'if the instr of the characters can't be
                                                 xx = x + 1 ' divided by 2 then look again (it needs
                                                 GoTo ReDo' to be divided so it can be packed)
                                                  End If
                                                  temp$ = temp$ + Chr$((x \ 2) + 96) 'add the instr of the characters
                                                  cnt = cnt + 1 ' in Dict$ to temp$ (note that
                                                 '      '^^^ characters shouldn't ' it's stored so that it's more
                                                 '      ' be compressed twice ' than 95)
                                                 PackTxt = temp$
                                          Else
                                                 '      'if the characters aren't found store the first character
                                                  '(note that it's less than 95 and that cnt is only moved up
                                                 '      _1_)
                                                  temp$ = temp$ + Chr$(Asc(Mid$(text$, cnt, 1)) - 32)
                                          End If

                                    Loop While cnt < Len(text$)
                                    text$ = Chr$(255) + temp$ 'copy temp$ into text$ and add CHR$(255)
                                   PackTxt = text$
                                    Exit Function ' to indicate a packed string
                            Else
                                   '      'text$ is packed so unpack it
                                    comp$ = Right$(text$, Len(text$) - 1) 'remove CHR$(255)
                                    text$ = ""'re-init text$
                                    For x = 1 To Len(comp$)
                                   Chars$ = Asc(Mid$(comp$, x, 1))

                                          If Chars$ > 95 Then 'if char > 95 then char is the instr of the
                                                 '      ' unpacked characters in Dict$, remember?
                                                  text$ = text$ + Mid$(Dict$, (Chars$ - 96) * 2 + 1, 2)
                                                 PackTxt = text$
                                          Else 'if the characters weren't found in Dict$ they were stored
                                                 '     ' with a value less than 95 (most are found, though)
                                                  text$ = text$ + Chr$(Asc(Mid$(comp$, x, 1)) + 32)
                                                 PackTxt = text$
                                          End If

                                    Next x
                                   
                            End If

                     PackTxt = text$
              End Function


0
 
LVL 4

Expert Comment

by:idcanada
ID: 1447415

BTW. It does work and in case you are not satisfied, here is another.
0
 
LVL 7

Expert Comment

by:Inteqam
ID: 1447416
why don'y you use huffman code , to compress them .

0
 
LVL 14

Author Comment

by:waty
ID: 1447417
idcanada, you routines doesn't handles the carriage returns. As I want to compress HTML string, the routines doesn't work.
NB : for your sample, I compress your string from 118 to 82.

Mirkwood, using compress involve write and read to disk, and I don't want to do that. But I will check on the url you gave me.

Inteqam : Do you have some code to provide?
0
 
LVL 4

Expert Comment

by:idcanada
ID: 1447418

Add the Chr(10) & Chr(13)

0
 
LVL 14

Author Comment

by:waty
ID: 1447419
Even with Chr(10) & Chr(13), it doesn't work to much, I am looking to modify the code.

NB : there is also the chr$(9) (tab) char.
0
 
LVL 4

Expert Comment

by:idcanada
ID: 1447420

Why do you keep on rejecting my answer?
Your question asks how to compress strings, I showed you 2 different ways of doing it
and you still reject my answer.
My answers do what you ask,
so maybe they do not forfill all 100% of your requirements, however,
I am sure that I have covered a large part of it.

Do you not think your judgeing me harshly?



0
 
LVL 14

Author Comment

by:waty
ID: 1447421
I leave this question opened, so other users could post their answers too.

Your first answer was not good.

Your second answer was better, but it can not handle special chars like Chr$(10), Chr$(13), Chr$(9).
As I store HTML and RTF files in an access database, I want to pack them. With your algorithm, I gain 20% of the size. But I can't handle the files. I am trying to modify your code to in order to recognize those chars.
If you can post the code to pack an HTML file, you will receive the points.

0
 
LVL 3

Expert Comment

by:myqlG
ID: 1447422
Why reinvent the wheel? There are several compression
algorithms out there.. find the source and use one?

0
Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

 
LVL 8

Expert Comment

by:MikeP090797
ID: 1447423
Try this one (LZW compression), found on www.planet-source-code.com/vb


'****************************************************************
' Name: LZW (Lempel, Ziv, Welch) Dictionary Compression
' Description:The Lempel, Ziv, Welch compression algorithm i
'     s considered the most efficcient all purpose compression alg
'     orithm there is.
' By: Asgeir Bjarni Ingvarsson
'
' Inputs:None
' Returns:None
' Assumes:None
' Side Effects:None
'
'Code provided by Planet Source Code(tm) 'as is', without
'     warranties as to performance, fitness, merchantability,
'     and any other warranty (whether expressed or implied).
'****************************************************************

'     ' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
'     '| LZW - Compression/Uncompression|
'     '|-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-|
'     '|Author: Asgeir B. Ingvarsson |
'     '| |
'     '|E-Mail: abi@islandia.is |
'     '| |
'     '|Address: Hringbraut 119 |
'     '| IS-107, Reykjavik|
'     '| ICELAND |
'     '|-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-|
'     '|For any comments or questions, please contact me |
'     '|using either of the above measures. |
'     '|-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-|
'     '|This code has one flaw, it can't process characters |
'     '|higher than 127. |
'     '|For the code that can compress all 256 ascii chars. |
'     '|please e-mail me.|
'     '|If you use this code or modify it, I would appreciate|
'     '|it if you would mention my name somewhere and send me|
'     '|a copy of the code (if it has been modified).|
'     '|-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-|
'     '|LZW is property of Unisys and is free for|
'     '|noncommercial software. |
'     ' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Private Dict(0 To 255) As String
Private Count As Integer

Private Sub Init()


              For i = 0 To 127
                     Dict(i) = Chr(i)
              Next

End Sub


Private Function Search(inp As String) As Integer


              For i = 0 To 255

                            If Dict(i) = inp Then Search = i: Exit Function
                            Next

                     Search = 256
              End Function


Private Sub Add(inp As String)


              If Count = 256 Then Wipe
                     Dict(Count) = inp
                     Count = Count + 1
              End Sub


Private Sub Wipe()


              For i = 128 To 255
                     Dict(i) = ""
              Next

       Count = 128
End Sub


Public Function Deflate(inp As String) As String

       '     'Begin Error Checking

              If Len(inp) = 0 Then Exit Function

                            For i = 1 To Len(inp)

                                          If Asc(Mid(inp, i, 1)) > 127 Then MsgBox "Illegal Character Value", vbCritical, "Error:": Exit Function
                                          Next

                                   '     'End Error Checking
                                   Init
                                   Wipe
                                   p = ""
                                   i = 1

                                          Do Until i > Len(inp)
                                                 c = Mid(inp, i, 1)
                                                 i = i + 1
                                                 temp = p & c

                                                        If Not Search(CStr(temp)) = 256 Then
                                                               p = temp
                                                        Else
                                                               o = o & Chr(Search(CStr(p)))
                                                               Add CStr(temp)
                                                               p = c
                                                        End If

                                          Loop

                                   o = o & Chr(Search(CStr(p)))
                                   Deflate = o
                            End Function


Public Function Inflate(inp As String) As String


              If Len(inp) = 0 Then Exit Function
                     Init
                     Wipe
                     cW = Asc(Mid(inp, 1, 1))
                     o = Dict(cW)
                     i = 2

                            Do Until i > Len(inp)
                                   pW = cW
                                   cW = Asc(Mid(inp, i, 1))
                                   i = i + 1

                                          If Not Dict(cW) = "" Then
                                                 o = o & Dict(cW)
                                                 p = Dict(pW)
                                                 c = Mid(Dict(cW), 1, 1)
                                                 Add (CStr(p) & CStr(c))
                                          ElseIf Dict(cW) = "" Then
                                                 p = Dict(pW)
                                                 c = Mid(Dict(pW), 1, 1)
                                                 o = o & p & c
                                                 Add (CStr(p) & CStr(c))
                                          End If

                            Loop

                     Inflate = o
              End Function


Public Sub main()

       inp = "Hello World, Hello World"
       d = Deflate(CStr(inp)) 'Compress
       q = Inflate(CStr(d)) 'Uncompress
       MsgBox "Uncompressed: " & q & vbCrLf & vbCrLf & _
       "Compressed: " & d & vbCrLf & vbCrLf & _
       "Compressed Size: " & Len(d) & vbCrLf & vbCrLf & _
       "Uncompressed Size: " & Len(q) & vbCrLf & vbCrLf & _
       "Compression Ratio: " & (100 - (((Len(d) / Len(q)) * 100) \ 1)) & "%", vbOKOnly, "Results:"
End Sub

0
 

Expert Comment

by:lppjohns
ID: 1447424
Somehow my last answer got removed, basically I said.
Write the string out to disk in binary format byte by byte.
use pkzip to zip it (Called from within your program)
Read the zipped file back into a string and store that string in your database.

If you think this may be an answer to your question let me know and I will again post the full code on how to accomplish this.
0
 
LVL 14

Author Comment

by:waty
ID: 1447425
lppjohns, you could post the code here or send it to me at waty.thierry@usa.net

MikeP, your code works better than the one of idcanada, the speed to files of 40K is very low.
0
 
LVL 7

Expert Comment

by:Inteqam
ID: 1447426
you can code the pkzip functionality in your code ,  

by using the huffman code ,

i think i can write it.


0
 
LVL 14

Author Comment

by:waty
ID: 1447427
Inteqam, if your code is very quick, I am interessed. The code from MikeP works (20% saved) but is very slow.
0
 
LVL 13

Expert Comment

by:Mirkwood
ID: 1447428
In VB, it will never be quick. VB just is not optimized to do that.
0
 
LVL 14

Author Comment

by:waty
ID: 1447429
Yes, Mirkwood, I know that, so probably the method with PKZip is the best at this moment, or if someone has a DLL written in C and wich is royalty free....
0
 
LVL 8

Expert Comment

by:MikeP090797
ID: 1447430
You can try the LZW algorythm as a part of the Windows API (the LZ* functions), maybe it will make things faster
0
 
LVL 14

Author Comment

by:waty
ID: 1447431
MikeP, I think your option is the best. If you already have this job, could you post it, otherwise, I will implement it.

0
 

Expert Comment

by:lppjohns
ID: 1447432
I e-mailed you, what was the problem with my answer?
I'm just curious.
0
 
LVL 8

Expert Comment

by:MikeP090797
ID: 1447433
waty, with your permission, if noone emailed you a better algorythm, I will lock this questions.
Please comment.
0
 
LVL 14

Author Comment

by:waty
ID: 1447434
MikeP, your solution has approched what I wanted, but I was not entirely satisfied. I think it will be definitivelly not possible to do it under VB.
0
 
LVL 14

Author Comment

by:waty
ID: 1447435
I forgot to tell you that you could lock the question
0
 
LVL 8

Accepted Solution

by:
MikeP090797 earned 200 total points
ID: 1447436
OK
0

Featured Post

Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

Join & Write a Comment

Introduction I needed to skip over some file processing within a For...Next loop in some old production code and wished that VB (classic) had a statement that would drop down to the end of the current iteration, bypassing the statements that were c…
Introduction While answering a recent question (http://www.experts-exchange.com/Q_27402310.html) in the VB classic zone, I wrote some VB code in the (Office) VBA environment, rather than fire up my older PC.  I didn't post completely correct code o…
Get people started with the process of using Access VBA to control Outlook using automation, Microsoft Access can control other applications. An example is the ability to programmatically talk to Microsoft Outlook. Using automation, an Access applic…
Get people started with the utilization of class modules. Class modules can be a powerful tool in Microsoft Access. They allow you to create self-contained objects that encapsulate functionality. They can easily hide the complexity of a process from…

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now