What is the most effective way to apply CRC32?

I am implementing a protocol for use in direct modem-to-modem connections.

This is the packet format:

0x7e (start of packet)
  <send sequence number, acknowledged sequence, and data area 0 to 512 bytes>
0x7e (end of packet)

Where the meaning of "send sequence number" and "acknowledged sequence number" are exactly as in TCP.
The data is escaped using HDLC conventions (0x7d 0x5d, 0x7d 0x5e) but escaping is done *after* CRC calculation because the CRC itself is subject to escaping.

My question is about the most effective way to apply the CRC32. I have two possible approaches:

1. CRC the header and the data area, EXCLUDING the CRC area, then append the CRC32 to the packet. On the receiving end, do the same (checksum all but the CRC area) and do a direct comparison to the appended CRC.

2. CRC the header, the data area, *AND* the CRC (initialized to zero), ie. checksum the WHOLE packet, including the four zeros where the CRC is located, then overwrite the 4 zeros with the calculated CRC. On the receiving end, checksum the whole packet, *including* the appended CRC and look for a "magic CRC" value that will always be the result if the packet has not been corrupted.

My first thought was approach #1 was simpler and was just as effective as approach #2 might be. Then I got thinking about how CRC detects all possible one bit errors, and all possible two bit errors, etc. If I apply approach #2, I get the full effectiveness of CRC on the CRC itself. If I apply approach #1, the CRC itself is not "protected" by the CRC.

Is approach #2 stronger than approach #1? Any comments about the strongest approach, or links to online references would be great. I have already found many web pages on CRC, but none that address the checksumming of the CRC itself, and the increased effectiveness of doing so.
Who is Participating?
Approach 2 doesn't "protect" the crc area anymore than approach 1. In fact, with approach 1 you crc header+data; with approach 2: header+data+4zerobytes; then transmit header+data+crc. So approach2 is the same thing as doing some (payload indipendent) invertible transformation to the crc of approach 1 before transmitting it, which doesn't help error checking.
I do not think possible approach #2 is possible at all. When you create a checksum with four zeros in place of CRC sum, and then overwrite the four zeros with checksum you modify the original data thus invalidate your checksum.

Additionally you do not have to check your checksum against corruptions. Think the following scenarios:

. If your data is corrupted, its checksum on the receiveing end will not match original checksum added to the end of your data.

. If your data transferred inact but your checksum is corrupted then again the checksum of your data will not match the checksum value transferred over network.

so approach #2 is not possible and not needed. Approach #1 already provides enough checks against data & checksum corruption.
Approach 2 is somewhat possible but not exactly as as adq presented it. I mean, it is always possible to find a 32bit integer x such that
crc32(data -concatenated with- x) = some given constant K.
Thus it is possible to have the receiving end just crc the whole packet and check that the result is equal to K (rather than crc-ing the packet minus crc field and comparing the result with the crc field). However it's not enough to calculate crc32(data -concatenated with- 32bit 0) and use that as x.
adg080898Author Commented:
To alikoank:
I have come across, in at least one CRC reference that I remember, that some implementations of CRC will checksum right through the CRC, and will then expect a zero CRC to be correct. It mentioned that this is a "bonus" for CRC because it makes it much easier to do CRC in hardware because it doesn't have to "skip" the CRC, and it is very easy to test for zero in hardware (no "compare" just a wide NAND gate).

You highlighted something that could potentially *weaken* approach 2. Since it is checksumming an additional 4 bytes, it might VERY slightly reduce the effectiveness of the CRC. Or maybe not...would it just jump around the CRC lookup table a little bit and have no real effect - performing a transform that is essentially a no-op as far as error detection is concerned?

Should I just forget about all this "one bit error" "two bit error" stuff and just accept that there is a 1 in 2**32 chance that the packet is corrupted AND the checksum matches?

I am worried slightly by something else. If line noise corrupts a data byte into a 0x7e then it will confuse the code that finds packet boundaries. It will then take the four bytes before the corrupted byte and treat THEM as the CRC32. Then, when it encounters the "real" 0x7e, it will again do a bogus checksum test of the remainder of the "real" packet. This is scary because the data being checked (in both cases) is drastically different from the original packet - maximizing the chance of a false positive. In this case, the best checksum on the planet boils down to its width: how likely is a false match.

I do plan to detect bad escapes (a 0x7d that is followed by something other than a 0x5d or 0x5e) and drop the packet if any are encountered, so this might buy me a little bit of extra protection.

I was considering storing the CRC32 followed by the inverted CRC32 to make a false positive incredibly unlikely. It does scream "overkill" though! :)

Am I being overly cautious? Practically every modem currently in use in PCs has error detection algorithms (V.42 / MNP / etc). I believe modems use a CRC16 on their "frames" and their frames are very small. Is it safe to say that it is virtually impossible for an error to get past V.42 AND a 32-bit application-level checksum? Somewhere between 1 in 2**32 and 1 in 2**48 likelyhood?

If anybody has some empirical knowledge about modem reliability to share (the frequency of corruption getting past modem error detection), that would be terrific.
adg080898Author Commented:
Correction: Actually, the final checksum in that hardware-CRC case may not necessarily be zero, but it is at least constant.
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.