You didn't specify what flavor of assembler. Here's a version using C syntax with very simple operations. You should be able to convert pretty readily to any particular assembly language.

This version assumes a 32-bit number. If you need fewer bits, you can drop the steps for the largest digits.

Because you want fast, the easiest thing is to unroll all the digit processing and find each digit by repeated subtraction. This removes any need for an integer divide, which is typically the slowest integer instruction (and is missing entirely on some small processors).

I used a set_digit() macro from C to simplify understanding. Every assembler I've ever programmed has had a macro language more powerful than the C preprocessor, so this shouldn't present much of a challenge.

If you don't know how to do a 'while' statement in assembly, it usually looks like this:

    # while (test) { stuff }
    #
    #     jumpto TEST
    # LOOP:
    #     stuff
    # TEST:
    #     test the condition
    #     branch-if-true, LOOP
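For readers more comfortable in C than in any particular assembler, the same jump-to-test shape can be spelled out with goto, which maps one-to-one onto the labels and branches above. This is only a sketch: the function name count_subtractions and its parameters are made up for illustration, and it assumes step is nonzero (otherwise the loop never ends).

```c
/* Hypothetical helper: counts how many times `step` can be subtracted
 * from `value`, written in the jump-to-test loop shape.
 * Assumes step != 0. */
unsigned count_subtractions(unsigned long value, unsigned long step)
{
    unsigned count = 0;

    goto TEST;              /* jumpto TEST */
LOOP:
    value -= step;          /* stuff */
    ++count;
TEST:
    if (value >= step)      /* test the condition */
        goto LOOP;          /* branch-if-true, LOOP */

    return count;
}
```

The body is exactly what each digit step of the conversion does: subtract a power of ten until it no longer fits, counting as you go.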

There are two flavors of BCD in common use. The most basic BCD encoding simply takes the values (0-9) of two decimal digits and puts them into the high and low nibbles of a byte, respectively:

19 -> 1,9 -> byte[1 9] -> byte[0001 1001]

Putting the 1 into the high nibble is "1 << 4" in C, which uses what are called "bit shifting" operators. For your processor the instruction may be called "left shift" or "shift left".

To join the two nibbles, high and low, you treat them as bytes with 4 interesting bits set. To start with:

0000HHHH - high nibble value (0000 0001 = 0x01)

0000LLLL - low nibble value (0000 1001 = 0x09)

Then you move (shift left) the high nibble to the left:

1 << 4

HHHH0000 = 0001 0000 = 0x10

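Then you "bitwise or" the two values together. Many instruction sets just call this OR or "logical or"; in C it is the | operator, not ||:

0x10 | 0x09

HHHH0000 = 0001 0000 = 0x10

0000LLLL = 0000 1001 = 0x09

HHHHLLLL = 0001 1001 = 0x19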
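The shift-then-or step can be written as a one-line C function. The name pack_bcd is hypothetical, and the sketch assumes both arguments are already in the range 0-9; it does no range checking.

```c
/* Hypothetical helper: pack two decimal digits (each assumed to be 0-9)
 * into one BCD byte, high digit in the high nibble. */
unsigned char pack_bcd(unsigned char high, unsigned char low)
{
    /* 0000HHHH << 4 gives HHHH0000; OR-ing in 0000LLLL gives HHHHLLLL. */
    return (unsigned char)((high << 4) | low);
}
```

With the digits from the example, pack_bcd(1, 9) gives 0x19.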

That is the simple BCD form. Various processors, including the 8080, 6502, and 68000 families, had special purpose instructions for dealing with data in this form, so if you intend to do complex math with the data this is where you need to be.

But transferring BCD in this form is a chore, because the byte values frequently include 0x00, and a 0x00 byte is treated as a string terminator by C language routines and by many text-oriented protocols.

So rather than expand the BCD out to ASCII bytes (simple, but wasteful of space) there is an "advanced form" of BCD that avoids NUL bytes and low-order ASCII characters (but does require 8-bit safe transmission). That form takes the BCD and does a binary NOT operation (most CPUs have this as a single opcode).

The encoded digit values 9 and 0 are the extreme cases for BCD. Look at their representation:

99 = 1001 1001

90 = 1001 0000

09 = 0000 1001

00 = 0000 0000

The problems here are with 00, which is a NUL when expressed as ASCII, and 09 (TAB).

By inverting the bits, we get:

99 = 0110 0110 (0x66)

90 = 0110 1111 (0x6F)

09 = 1111 0110 (0xF6)

00 = 1111 1111 (0xFF)

The lowest possible byte is 0110 0110 = 0x66, the ASCII letter 'f' (the inverse of 99). This avoids NULs and other control bytes, but does require that the transmission channel support high-bit (aka 8-bit) characters.
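To check the range of the inverted form, here is a small sketch that inverts a packed BCD byte and then searches every valid digit pair for the smallest result. Both function names are hypothetical. Working it through, the smallest inverted byte comes from 99 (0x99), whose inverse is 0x66, so no valid pair can produce a NUL or a control character.

```c
/* Hypothetical helper: the inverted form is the bitwise NOT of a
 * packed BCD byte, truncated back to 8 bits. */
unsigned char invert_bcd(unsigned char packed)
{
    return (unsigned char)~packed;
}

/* Hypothetical helper: smallest inverted byte over all valid digit
 * pairs 00..99. */
unsigned char lowest_inverted_byte(void)
{
    unsigned char lowest = 0xFF;
    int high, low;

    for (high = 0; high <= 9; ++high) {
        for (low = 0; low <= 9; ++low) {
            unsigned char inverted =
                invert_bcd((unsigned char)((high << 4) | low));
            if (inverted < lowest)
                lowest = inverted;
        }
    }
    return lowest;
}
```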

If you want just simple encoding, return at the end of the simple encoding section. If you also want the advanced encoding, continue through to the end.

    unsigned long accum;            /* value to convert, 0..4294967295 */
    unsigned char buffer[10] = {0}; /* one decimal digit per byte, starts zeroed */
    unsigned char digits[6];        /* five packed BCD bytes plus a terminator */
    int buf_x = 0;
    int digit_x;

    /* Find one digit by repeated subtraction, then move to the next slot. */
    #define set_digit(DIGIT)        \
        while (accum >= DIGIT) {    \
            accum -= DIGIT;         \
            ++buffer[buf_x];        \
        }                           \
        ++buf_x;

    long_to_b10_digits:
        set_digit( 1000000000 );
        set_digit( 100000000 );
        set_digit( 10000000 );
        set_digit( 1000000 );
        set_digit( 100000 );
        set_digit( 10000 );
        set_digit( 1000 );
        set_digit( 100 );
        set_digit( 10 );
        buffer[buf_x] = accum; // 1s digit

    simple_bcd_encoding:
        /* This approach does not use digits[5], but see below. */
        buf_x = 0;
        for (digit_x = 0; digit_x < 5; ++digit_x) {
            digits[digit_x] = (buffer[buf_x] << 4) | buffer[buf_x + 1];
            buf_x += 2;
        }

    advanced_bcd_encoding:
        for (digit_x = 0; digit_x < 5; ++digit_x)
            digits[digit_x] = ~digits[digit_x];
        digits[5] = 0;  /* safe terminator: no inverted byte is ever 0 */
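If you want to try the whole sequence from C before translating it, the steps can be wrapped in a single function along these lines. This is a sketch under stated assumptions, not a drop-in routine: the name long_to_bcd and the invert flag are made up, the value is assumed to fit in 32 bits, and a table of powers of ten in a loop stands in for the unrolled set_digit() calls purely to keep the listing short. In assembly you would still unroll.

```c
/* Hypothetical wrapper: converts value (assumed 0..4294967295) into five
 * packed BCD bytes followed by a 0 byte. If invert is nonzero, the five
 * data bytes are bitwise-inverted (the "advanced" form). */
void long_to_bcd(unsigned long value, unsigned char digits[6], int invert)
{
    static const unsigned long powers[9] = {
        1000000000UL, 100000000UL, 10000000UL, 1000000UL,
        100000UL, 10000UL, 1000UL, 100UL, 10UL
    };
    unsigned char buffer[10] = {0};  /* one decimal digit per byte */
    int i;

    /* Repeated subtraction instead of division, one power of ten at a time. */
    for (i = 0; i < 9; ++i) {
        while (value >= powers[i]) {
            value -= powers[i];
            ++buffer[i];
        }
    }
    buffer[9] = (unsigned char)value;  /* 1s digit */

    /* Pack two digits per byte, optionally inverting each byte. */
    for (i = 0; i < 5; ++i) {
        digits[i] = (unsigned char)((buffer[2 * i] << 4) | buffer[2 * i + 1]);
        if (invert)
            digits[i] = (unsigned char)~digits[i];
    }
    digits[5] = 0;
}
```

For example, converting 19 without inversion leaves 0x19 in digits[4] and 0x00 in the four bytes before it; with inversion those become 0xE6 and 0xFF.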
