DanRollins asked:

Challenge/Puzzle: ASCII Opcodes

Have you ever examined a hex dump of a program's code?  Sometimes you can see a short sequence of ASCII text.  For instance,
   PSQ
is
    PUSH EAX    
    PUSH EBX
    PUSH ECX

This challenge is designed to test just how well you know the Intel opcodes.  The challenge is either:

1) Write a program that will run without error, whose opcodes make up a recognizable word or phrase.

2) Write a short program that does something useful but which can be TYPED ENTIRELY ON A KEYBOARD.  
DanRollins (ASKER)

#2 is VERY HARD because all opcodes and ModRegR/M and immediate and offset bytes must be in the range of 0x20 through 0x7E (some symbols, letters, and numbers).  Here is a sample of how hard this is:

If you want to clear out a register, say EAX, you will typically use

    XOR EAX,EAX
but the opcodes for that are
    0x33 0xC0
and while 0x33 is an ASCII '3', 0xC0 is disallowed (you can't type 0xC0 on the keyboard).  
==-=-=-=-
(Note: If you tell me that it is possible to type the 0xC0 character by pressing and holding ALT while striking some digits on the numeric keypad, then you have missed the point and you are disqualified on account of idiocy.)
==-=-=-=-
One example of how to clear out a register is this:

    push 0x31313131
    pop  eax
    sub  eax, 0x31313131

The opcodes are (in hex)
    68 31 31 31 31 58 2D 31 31 31 31
which is the 'keyboard-able' ASCII string:
    h1111X-1111
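
A quick way to sanity-check a candidate sequence like this is to test every byte against the 0x20 through 0x7E range and redo the arithmetic. The sketch below uses Python purely for illustration; the helper name `is_typeable` is made up for this example.

```python
def is_typeable(code: bytes) -> bool:
    """True if every byte lies in the keyboard range 0x20..0x7E."""
    return all(0x20 <= b <= 0x7E for b in code)

xor_eax_eax = bytes.fromhex("33C0")                      # XOR EAX,EAX
clear_eax = bytes.fromhex("6831313131582D31313131")      # push/pop/sub version

print(is_typeable(xor_eax_eax))    # False: 0xC0 is out of range
print(is_typeable(clear_eax))      # True
print(clear_eax.decode("ascii"))   # h1111X-1111

# Emulate push 0x31313131 / pop eax / sub eax,0x31313131 with 32-bit wraparound
eax = 0x31313131
eax = (eax - 0x31313131) & 0xFFFFFFFF
print(hex(eax))                    # 0x0
```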

=-=-=-=-=-=-=-=-=-=-=-
Here is something to get you started:  Look in the Intel Manual (Vol 2: Instruction Set Reference) Table A2.  Note that the available opcodes are mainly AND, XOR, INC, PUSH and some Jmps.  There may be some two-byte opcodes available as well.

Very few addressing modes are available because the ModRegR/M byte must start with a MOD of 00 or 01 (which specify mainly access to memory via immediate offset or indirectly through one or more index registers).  Refer to tables 2-1 and 2-2
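
To see why only MOD 00 and 01 survive, it may help to split the byte into its three fields (MOD is the top two bits, REG the middle three, R/M the bottom three) and scan the typeable range. This is only an illustrative Python sketch; `split_modrm` is a hypothetical helper name.

```python
def split_modrm(byte: int):
    """Return the (mod, reg, rm) fields of a ModR/M byte."""
    return byte >> 6, (byte >> 3) & 7, byte & 7

typeable = range(0x20, 0x7F)   # 0x20 through 0x7E inclusive
mods = sorted({split_modrm(b)[0] for b in typeable})
print(mods)                    # [0, 1]

# 0xC0 (second byte of XOR EAX,EAX) has mod = 3: register-to-register
print(split_modrm(0xC0))       # (3, 0, 0)
# 0x61 ('a') has mod = 1 (8-bit displacement), reg = 4, rm = 1
print(split_modrm(0x61))       # (1, 4, 1)
```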
=-=-=-=-=-=-=-=-=-=-=-
To accomplish anything meaningful, you will almost certainly need to resort to self-modifying code.

Any takers?

-- Dan
I'll pick up the gauntlet (but I may disappear in oblivion again for a short while ;).

I think #1 is harder than #2, unless you meant the hexadecimal opcodes that must make up a certain phrase (still hard, with only A-F as letters), unless you mean that "recognizable" means that this: #$↨╥A☻↨Bττ¬É┐L counts as recognizable for my name? (I wonder if EE picks up all the 127+ codes correctly!)

Though it is a funny idea of course! Two ways to achieve number one are possibly searching binaries in your system for phrases (who knows, maybe someone has already done it accidentally?) or using illegal opcodes but with legal jumps to make the program work.

It's a pity that a lot of "funny" opcodes are two-byte opcodes starting with 0Fh or FFh, which is out of the range for at least #2.

Abel
Actually #1 is too easy, since it doesn't need to perform any sort of function.  An example of a short phrase:

push  6F6C6C65h
and   byte ptr [ecx+62h],ah
ins   byte ptr [edi],dx
and   dword ptr [ecx],esp
and   dword ptr [ecx],esp

which is:
    68 65 6C 6C 6F 20 61 62 65 6C 21 21 21 21                
which is
    hello abel!!!!
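
If you want to check the text for a byte sequence like this without a hex editor, decoding it as ASCII is enough. A small Python sketch, just for illustration:

```python
# The opcode bytes from the listing above, read back as ASCII text
code = bytes.fromhex("68 65 6C 6C 6F 20 61 62 65 6C 21 21 21 21")
text = code.decode("ascii")
print(repr(text))
```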

And of course, you could just use the DB pseudo-op to dump preset strings into your code segment.  So, there is no real challenge there.  

But maybe some magic phrase will strike somebody funny:   "Cool! executing my name makes the printer go offline" or "executing 'quiET! CaTS be hERe' makes the mouse freeze up" or something.

-- Dan
#2 is not that hard. In a DOS prompt, try this:

type X5O!P%@AP[4\PZX54(P^)7CC)7}$This really is a valid Application!$H+H* >test.com
test

;-) (Actually, I stole the idea. I'll explain and give credits in a later comment.)
Ah. What stupid things am I saying? Of course not "TYPE"! Well, rather like this:

copy con test.com
X5O!P%@AP[4\PZX54(P^)7CC)7}$This really is a valid Application!$H+H*
^Z (<-- note: CTRL-Z or F6)
test
(of course, you can also use Notepad or any ASCII text editor of your choice instead of COPY CON. In this case, there is no need for the ^Z, which is an EOF char to signal COPY that the end of file has been reached.)
AvonWyss, your code reminded me of something:

Enter the EICAR test file. It is not actually a virus, but try your AV kit on it and you will see that it is recognized as one. The same string, but with the text "EICAR-STANDARD-ANTIVIRUS-TEST-FILE!", is a standardized way of testing your virus scanner. EICAR = European Institute for Computer Antivirus Research.

But you probably knew all this already. (I know you wanted to give the credits, I just couldn't resist posting this once I found out that my NAV was not so happy with your program)

A valid application that you can read as a book. It won't be Windows PE format then. And not DOS EXE format either. But hey, with COM we should be able to tell a story of 64kB! I guess that it should be possible to actually do something (a first person shooter?) in the 64k characters of text. Let's get to work ;)
abel, of course you're right, it's an EICAR test file with the modified string. I just happened to think about this at the very moment when I read Dan's question #2.

But for more general things, a working approach may be to build a library (or even sort of an assembly "language") to create such applications. Assuming that we should be able to somehow put together a fairly complete set of commands which are compiled to ASCII codes (where special attention has to be paid wherever a command would accept a constant or a memory address), we could program just about anything just as a sequence of keyboard chars. Another solution would be to create a small interpreter, just-in-time-compiler or decoder, which would generate the actual program in memory. If both this interpreter/compiler/decoder and the code to be interpreted are just keyboard chars, the task would also have been successful.

Actually, I think that a BASE64 decoder may just be ideal for that kind of task. It's not too complex and therefore one should be able to write one using just keyboard chars, and the BASE64 data is easily created from binary data...
That's a good example.  As a COM program, it seems that it automatically runs in 16-bit mode.  A disassembly yields:

0B9B:0100 58            POP     AX
0B9B:0101 354F21        XOR     AX,214F
0B9B:0104 50            PUSH    AX
0B9B:0105 254041        AND     AX,4140
0B9B:0108 50            PUSH    AX
0B9B:0109 5B            POP     BX
0B9B:010A 345C          XOR     AL,5C
0B9B:010C 50            PUSH    AX
0B9B:010D 5A            POP     DX
0B9B:010E 58            POP     AX
0B9B:010F 353428        XOR     AX,2834
0B9B:0112 50            PUSH    AX
0B9B:0113 5E            POP     SI
0B9B:0114 2937          SUB     [BX],SI
0B9B:0116 43            INC     BX
0B9B:0117 43            INC     BX
0B9B:0118 2937          SUB     [BX],SI
0B9B:011A 7D24          JGE     0140
DB 'This really is a valid Application!$'

(after self modifications)

0B9B:0140 CD21          INT     21
0B9B:0142 CD20          INT     20

=-=-=-=-=-=-=-=-=-=-=-
Most of the code is setting up the Int21/int20 sequence, which is otherwise not possible to code.  So it builds the four opcode bytes and fixes them in memory (it relies on some existing bytes being there so that it can use SUB rather than MOV).  The rest is just setting up AH and DX for the Int21 call and the message itself.
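
The register arithmetic can be replayed outside DOS to confirm what ends up at 0140. The following is a rough Python sketch, not an emulator: it assumes the word popped by the first POP AX is zero and that the file is loaded at offset 100h, and the helper names are invented for the example.

```python
program = bytearray(
    b"X5O!P%@AP[4\\PZX54(P^)7CC)7}$This really is a valid Application!$H+H*"
)
ORG = 0x100  # COM programs load at offset 100h

def word_at(addr):
    i = addr - ORG
    return program[i] | (program[i + 1] << 8)

def sub_word(addr, value):
    """Emulate SUB [addr],value on a 16-bit little-endian word."""
    result = (word_at(addr) - value) & 0xFFFF
    i = addr - ORG
    program[i] = result & 0xFF
    program[i + 1] = result >> 8

ax = 0x0000          # assumption: POP AX fetches a zero word
ax ^= 0x214F         # XOR AX,214F
saved = ax           # PUSH AX
ax &= 0x4140         # AND AX,4140
bx = ax              # PUSH AX / POP BX
ax ^= 0x5C           # XOR AL,5C
dx = ax              # PUSH AX / POP DX
ax = saved           # POP AX
ax ^= 0x2834         # XOR AX,2834
si = ax              # PUSH AX / POP SI
sub_word(bx, si)     # SUB [BX],SI
bx += 2              # INC BX / INC BX
sub_word(bx, si)     # SUB [BX],SI

print(hex(dx), hex(ax >> 8))   # DX points at the message, AH holds the function number
print(program[-4:].hex())      # the patched tail of the file
```

With these assumptions the trailing `H+H*` becomes CD 21 CD 20, DX ends up at 011C (the start of the message) and AH is 09.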

Note the effective technique used to copy data from one register to another:  PUSH/POP.  

The code relies on the nature of COM programs and the environment in effect when a COM program begins.  For instance, POP AX appears to be a quick way to clear AX.

I thought that an easy way to create this would be:

   echo X5O!P%@AP[4\PZX...etc..>test.com

but the %@ gets converted to just @ and the ^ gets lost for some reason.  Note that the technique would work if you use %% rather than %, but I don't know how to handle the ^ character.  If the program is > 128 (max command line len), you would need to do it in two steps:

   echo YadaYada...Etc...>DoIt.com
   echo FooBar...Etc...>>DoIt.com

but that would cause an embedded 0D 0A in the code.  You would need to either jump around it or use it in some way (hmmmm...)

-==-=-=-=-=-=-=-=-=-=-=-=-
Ten or 15 years ago, I had a program like this published in PC Magazine.  The program it generated was actually quite useful:  It set ERRORLEVEL based upon whether or not there was a diskette in drive A: (other techniques tended to trigger an "Abort, Retry, Ignore" prompt).  I used the DOS interrupt INT 25H to (attempt to) read the boot sector of the disk.

>> ...virus...
I think that we may be playing with fire here.  But the challenge is so novel and interesting, that I think it worth pursuing anyway.  As my grandpa used to say "Guns don't kill people, it's the bullets I think, or maybe old age or something."

Re my PC Mag article, I later realised that a one-byte change (replace 25 with 26) was a recipe for disaster.  I'm pretty sure that modern OS would foil such foul mischief, but I don't know.

-- Dan
>> Actually, I think that a BASE64 decoder may just be ideal .

I have a hunch that a Base64 decoder would be a lot harder to write than a hex-digit pair decoder.  Maybe that should be our first focus...

-- Dan
MarkSteward

Copy the following to a .com file:

 PPPHHHHH7P[5__0G!)G2)G4)G6X5555u5PAX[!Gr4455555'1GrH_^]]"
 This program will reboot the computer.
 It is written entirely in x86 machine code, but only uses characters between " and u, so can be created automatically by a batch file.
 Written by Mark Steward (marksteward@hotmail.com), 2001.  Do not distribute without credit.

I made the above when I was bored last summer (exams, etc.).  It was designed to be memorable (I even closed the square brackets), and is ideal for use in a batch file.
It was entirely written by me and last modified 1/6/2001.
You can optionally ignore the final character (") if you're sure it's going to work, as it shouldn't reach that far.

I have also written an on-the-fly hexadecimal decoder (a little larger and less memorable), but I can't find it at the moment.  Please post if interested.

BTW, coping with variable origins (other than 0x100, e.g.) is very difficult: has anyone worked out a sure way?

Mark.


Source (compiles in Debug!):

push ax ; P
 push ax ; P
  push ax ; P
   dec ax ; H
   dec ax ; H
   dec ax ; H
   dec ax ; H
   dec ax ; H
   aaa ; 7
   push ax ; P
   pop bx ; [
   ; modify code below:
   xor ax,5f5f ; 5__
   ; or 4_, and then modify later
   xor [bx+21],al ; 0G!
   sub [bx+32],ax ; )G2
   sub [bx+34],ax ; )G4
   sub [bx+36],ax ; )G6
   ; mov ds,40:
  pop ax ; X
  xor ax,3535 ; 555
  xor ax,3575 ; 5u5
  push ax ; P
  ; (pop ds)
  db 'A'
  ; mov [72],1234:
 pop ax ; X
pop bx ; [
and [bx+72],ax ; !Gr
xor al,34; 44
xor ax,3535 ; 555
xor ax,2735 ; 55'
xor [bx+72],ax ; 1Gr
; (jmp ffff:0)
db 'H_^]]'
; (ret)
db '"'
; 39h (57 decimal) bytes!
Dan: (I assumed by "write" a program you meant something original).
Hya MarkSteward!  I'm glad you dropped by!  There are some clever techniques there:

* You need to be able to clear a register now and then, so you push some zeros onto the stack early on.  

* The AAA to get 0x0101 is nifty.

*) A useful way to set a register to some hard-to-get values:
    xor AX, (this)
    xor AX, (that)

*) That RET at the end.... GENIUS! (lol)

-==-=-=-=-
>>I assumed by "write" a program you meant something original

Absolutely -- AvonWyss knew that previous code was not really in the running (didn't you Avon?)

>> ...have also written an on-the-fly hexadecimal decoder...

We were just discussing that.  Of course, given a hex-to-binary decoder, you would be free to write any kind of program at all.  It would bootstrap into a full-fledged program.

>> coping with variable origins (other than 0x100, e.g.) is very difficult...

Lotsa things are difficult here!  Only the impossible is impossible, and I've been wondering a bit about THAT!

I think you could make some self-relative code ... Prepare some memory location (say 0x3535) with some code, then CALL that addr.  There, you could pop the return address into BX, push it back and then RET.  That would set up an index register with a code-local address.  There would be some difficulties...

=-=-=-=-=-=-=-=--==-
One thing to explore... It seems to me that since PUSH and POP are freely available, it may be relatively easy to write short sequences on the stack.  Can anyone think of a general-purpose technique to execute such code?

I'm looking at some free time pretty soon, so I expect to work on a hex decoder.  That's gotta be the holy grail.

-- Dan
Yeah, a hex decoder is useful: I wrote it when I was being asked to test my school's security (they were trying programs intended to make Windows 98 secure).  I'll try to find it.  The easiest thing to write from scratch would be a kind of hexadecimal decoder that used the charset 0123456789:;<=>?, but you'd have difficulty piping that in DOS, etc.

Stylistically, pushing the zeroes also makes the machine code easier to remember, besides making it easier to make up many numbers.  I did quite a lot of programming using only opcodes between 20 and 7e, and I think you can set a register to any value with only two xors or subs.  Pushing immediates (which I haven't used, as it's 286+) is a more convenient way to get certain values into registers, though.
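
How many immediates a given value needs can be explored with a small search. The Python sketch below is only one narrow reading of the problem: it assumes AX starts at zero and that only SUB AX,imm16 with typeable immediate bytes is used. The function names are invented for the example. Under those assumptions two immediates do not reach every value (21CDh is one that fails), while three always do; mixing in XOR or starting from a different value, as in the programs above, may well do better.

```python
LO, HI = 0x20, 0x7E   # typeable byte range

def split_sum(total, n):
    """Split total into n typeable bytes, or return None if impossible."""
    if not n * LO <= total <= n * HI:
        return None
    parts = []
    for left in range(n - 1, -1, -1):
        b = min(HI, total - LO * left)   # leave at least LO for each later byte
        parts.append(b)
        total -= b
    return parts

def sub_immediates(target, n):
    """Find n typeable imm16 values with 0 - imm1 - ... - immN == target (mod 2**16).

    Sketch only: the carry handling is valid for n <= 4.
    """
    need = (-target) & 0xFFFF
    for lo_sum in (need & 0xFF, (need & 0xFF) + 0x100):
        lo = split_sum(lo_sum, n)
        if lo is None:
            continue
        carry = lo_sum >> 8
        hi_need = ((need >> 8) - carry) & 0xFF
        for hi_sum in (hi_need, hi_need + 0x100):
            hi = split_sum(hi_sum, n)
            if hi is not None:
                return [(h << 8) | l for h, l in zip(hi, lo)]
    return None

def apply_subs(imms, start=0):
    """Replay the SUB AX,imm16 sequence with 16-bit wraparound."""
    ax = start
    for imm in imms:
        ax = (ax - imm) & 0xFFFF
    return ax

print(sub_immediates(0x21CD, 2))                 # None under these assumptions
imms = sub_immediates(0x21CD, 3)
print([hex(i) for i in imms], hex(apply_subs(imms)))
```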

Speaking of viruses, this stuff is probably quite useful for buffer overruns (although I suppose if you can overrun, you can probably write special characters).  You could also have a text document that starts with something like "$$ss", which will jump forward 73h (115) bytes, potentially to another "ss" or similar, ending at a fake public key or signature.  That would be fun to craft.  Or a virus could conceal part of its code in html files after "<!--", as that is cmp al,21;sub ax,xx2d.  But enough on viruses.

I'm interested in your idea of executing a frame.  It might be more efficient in certain places.

OK, so it's at least 5 years out of date, but this could help those people who like doing more than they're meant to with QBasic, etc.  Instead of pages of Chr$(&H8B)+Chr$(4)+... you could have an executable string, which would be CALLed ABSOLUTE.

Just some ideas; I'm working on the recognisable word or phrase.

    Mark.
Dan, Mark,

Of course I knew that this would not exactly be what was asked for, and I think that my comment about the credits not belonging to me made this clear. But it came to my mind while reading Dan's question (and the comment "#2 is VERY HARD because...") , so that I put this here as starting point for a discussion.
AvonWyss: sorry, I see that did look a bit like a slur.  No discredit intended; I was just trying to justify my interruption into a fairly aged conversation.  Thanks for helping clear it up.
I think that quite a long time back, I saw a project that converts any .com file to one which contains only viewable characters. Hope it is of interest to you all.
ASKER CERTIFIED SOLUTION
MarkSteward

Game's over.  An Australian, Jim Tucker did this in 1995.  He also uses XP as the start of the program, and uses base 91 encoding (I assume ASCII 20-7e, excluding a couple of odd ones).  See http://www.simtel.net/pub/pd/43214.html to download the program.

Still, the challenge was to involve knowledge of Intel opcodes, and to make something intelligible that hides code.  I've got a long paragraph that prints a message, but I'm sure there's still some fun stuff to find...
checking...
In reply to the output message,
   hello!

VERY!  The code builds this sequence:

    LODSW
    SUB     AX,6161
again:
    AAD     10
    STOSB
    LODSW
    SUB     AX,6161
    JNB     again
    RET

Which loops through converting the data
    65 6C --> B4  (C-1 is B, 5-1 is 4)
    6A 61 --> 09
    6B 6C --> BA
    69 61 --> 08
    62 61 --> 01
    6E 6D --> CD
    62 63 --> 21
    64 6D --> C3

...to create the payload:

    0100 B409      MOV  AH,09
    0102 BA0801    MOV  DX,0108
    0105 CD21      INT  21
    0107 C3        RET
    0108 686924    DB 'hi$'
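
The conversion is easy to mimic in a high-level language to check the table above. This Python sketch roughly mirrors the LODSW / SUB AX,6161 / AAD 10 / STOSB loop; `decode_pairs` is a made-up name, and unlike the original loop it tests for the terminator before the first pair as well.

```python
def decode_pairs(text: bytes) -> bytes:
    """Decode letter pairs ('a' = 0 ... 'p' = 15, low nibble first) into bytes."""
    out = bytearray()
    for i in range(0, len(text) - 1, 2):
        ax = (text[i + 1] << 8) | text[i]    # LODSW: AL = first byte, AH = second
        if ax < 0x6161:                      # SUB would borrow, so JNB falls through
            break
        ax -= 0x6161                         # SUB AX,6161
        al, ah = ax & 0xFF, ax >> 8
        out.append((ah * 0x10 + al) & 0xFF)  # AAD 10 (hex): AL = AH*16 + AL, then STOSB
    return bytes(out)

payload = decode_pairs(b"eljakliabanmbcdm")
print(payload.hex(" "))   # b4 09 ba 08 01 cd 21 c3
```

Any pair that compares below 6161h (two spaces, for instance) ends the loop, which is how the original knows where the encoded data stops.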

But to get there, it first converts this sequence:
    013A   JNB 0160                              
    013C   POP DI                                
    013D   INC BX                                
    013E   SUB AX,612D                            
    0141   DB  61                                
    0142   PUSH BP                                
    0143   POP SP                                
    0144   SUB CH,[DI]                            
    0146   SUB AX,6161                            
    0149   JNB     01C2                              
    014B   INC BX                                
to...
    REPZ
    MOVSB
    POP    DI
    RET  ; execute the converter
;------------- the hex converter
    LODSW
    SUB     AX,6161
    AAD     10
    STOSB
    LODSW
    SUB     AX,6161
    JNB     0142
    RET

What I like is the insight to use 6?6? as the source bytes.  It would be much more difficult to use actual hex digits 30, 31, etc...

-- Dan
MarkSteward,
EE is not letting me accept your comment as an answer.  I'm told that if you post an answer, I should be able to accept it.  I'm upping the points (if it will let me...)

-- Dan
Bummer! Now we need a new challenge!
-- Dan