Re: Integer Basic Tokenization

comp.sys.apple2

Author: Paul Schlyter
Date: 17 Mar 2001 3:35 pm

In article <98t1cf$r6...@news5.svr.pol.co.uk>,
Thug <th...@optusnet.com.au> wrote:
 
> "Paul Schlyter" <pau...@saaf.se> wrote in message
> news:98sef0$5tq$1@merope.saaf.se...
>>
>> Visit my apple 2 page at:
>>
>>     http://hotel04.ausys.se/pausch/apple2
>>
>> and download my utility FID, which comes with C source.  It contains
>> de-tokenizers for Applesoft Basic, Integer Basic, and S-C Assembler
>> source files (the latter are stored as "I" type files too!).  That
>> code should help you figure out how Integer Basic programs are
>> tokenized.
...........................
> Another example:
>  
> 2020 X0=133*L : Y0=100 : S=-2 : X1= RND(200)+150)*(1-L)
>  : FOR I=1 TO 500 : NEXT I : X3=0
>  
> Becomes:
>  
> 2020 X-20111^EHIMEM:*L : Y-20111 POKE HIMEM: : S=-2 : X14449
> RND(200)+150)*(1-L)
>  : FOR I=1 TO 500 : NEXT I : X-20367 HIMEM:HIMEM:
>  
> So, obviously it's Variable names with trailing digits which is the problem.
..............
> Anyway, the fix is easy (I think). You need to add another check to the
> code, that's all. You need to add an "InVar" boolean flag to indicate that a
> variable name is being "constructed".  InVar would get set whenever an
> AlphaNum character is encountered that isn't part of a REM or a String; and
> it would get reset as soon as a Token is encountered. Finally, make InVar
> another exception to the "convert the following two bytes to a number"
> routine.
 
An updated version of FID, where this IntBasic listing bug is fixed,
is now available on my page above.
 
The old code already had a "lastAN" boolean flag, which remembered
whether the last token was an alphanumeric character or not -- although
it was also set if the last token was a closing quote or a closing
parenthesis.  This was to help determine whether a leading space
should be inserted in front of the next token.  I renamed that
boolean flag to "leadSP", and lastAN is now
set only if the last token was an alphanumeric ASCII character;
it is NOT set if the last token was an integer constant.
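
As a rough illustration of the lastAN rule only (not the FID code
itself), the sketch below renders a short run of token bytes.  The
helper name render_tokens() and its buffer handling are hypothetical,
and it deliberately ignores REM text, strings, control characters and
every token except "=" ($71):

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef unsigned char U8;

/* Hypothetical helper, not part of FID: renders n token bytes into out.
 * Only what the example needs is handled: hi-bit ASCII, integer
 * constants, and the "=" token ($71); any other token prints as "?". */
static void render_tokens(const U8 *p, size_t n, char *out, size_t outsz)
{
    int lastAN = 0;              /* previous byte was alphanumeric ASCII */
    size_t used = 0, i = 0;

    assert(p != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    while (i < n && used + 8 < outsz) {
        U8 b = p[i];
        if (b >= 0xB0 && b <= 0xB9 && !lastAN && i + 2 < n) {
            /* Digit byte not preceded by a letter or digit: an integer
             * constant, stored as 2 little-endian bytes after the marker. */
            int v = p[i + 1] | (p[i + 2] << 8);
            if (v >= 0x8000)
                v -= 0x10000;    /* interpret as signed 16-bit */
            used += (size_t)sprintf(out + used, "%d", v);
            i += 3;              /* lastAN stays 0 after a constant */
        } else if (b & 0x80) {
            /* Plain character, possibly a digit inside a variable name. */
            char c = (char)(b & 0x7F);
            out[used++] = c;
            out[used] = '\0';
            lastAN = isalnum((unsigned char)c) != 0;
            i++;
        } else {
            /* Token byte: resets lastAN. */
            used += (size_t)sprintf(out + used, "%s", b == 0x71 ? "=" : "?");
            lastAN = 0;
            i++;
        }
    }
}
```

Feeding it the bytes $D8 $B0 $71 $B1 $85 $00 gives "X0=133".  Those
bytes are inferred from the garbled listing quoted above: without the
lastAN check, the $B0 after "X" is taken as the start of a constant
and $71 $B1 is read as 0xB171, which is -20111 as a signed 16-bit value.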
 
All the needed modifications are in the dumpBufferAsIntBasicFile()
function, and the updated version appears below.  The Integer
Basic tokens are assumed to reside where "data" points, and "len"
is supposed to contain the length of the Integer Basic file image.
The parameters "fname" and "f" are used to control where the
output appears and how it's supposed to be named.  The dump ends
with a "SAVE <filename>" line, so it can be transferred to the
Apple II as a text file and EXEC'ed into memory, or one can do an
IN#2 (assuming a serial card resides in slot 2) and then have some
other computer send it directly to that serial card.
 
uint  =  unsigned int
U8    =  unsigned char
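
The function also calls get16(), which is defined elsewhere in FID and
not shown in this post.  A minimal sketch of the two type names and of
what get16() presumably does (read a 16-bit little-endian value),
written as an assumption rather than copied from the FID source:

```c
#include <assert.h>

typedef unsigned int  uint;
typedef unsigned char U8;

/* Assumed behaviour of get16(): combine two bytes, low byte first,
 * into one 16-bit value.  The real FID version may differ. */
static uint get16(const U8 *p)
{
    assert(p != 0);
    return (uint)p[0] | ((uint)p[1] << 8);
}
```

With this definition the bytes $85 $00 yield 133, and $71 $B1 yield
0xB171.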
==========================================================================
 
int dumpBufferAsIntBasicFile( U8 *data, char *fname, uint len, FILE *f )
/*
 *   Integer Basic file format:
 *
 *   <Length_of_file> (16-bit little endian)
 *   <Line>
 *   ......
 *   <Line>
 *
 *   where <Line> is:
 *      1 byte:   Line length
 *      2 bytes:  Line number, binary little endian
 *      <token>
 *      <token>
 *      <token>
 *      .......
 *      <end-of-line token>
 *
 *   <token> is one of:
 *      $12 - $7F:   Tokens as listed below: 1 byte/token
 *      $80 - $FF:   ASCII characters with high bit set
 *      $B0 - $B9:   Integer constant, 3 bytes:  $B0-$B9,
 *                     followed by the integer value in
 *                     2-byte binary little-endian format
 *                     (Note: a $B0-$B9 byte preceded by an
 *                      alphanumeric ASCII(hi_bit_set) byte
 *                      is not the start of an integer
 *                      constant, but instead part of a
 *                      variable name)
 *
 *   <end-of-line token> is:
 *      $01:         One byte having the value $01
 *                   (Note: a $01 byte may also appear
 *                    inside an integer constant)
 *
 *  Note that the tokens $02 to $11 represent commands which
 *  can be executed as direct commands only -- any attempt to
 *  enter them into an Integer Basic program will be rejected
 *  as a syntax error.  Therefore, no Integer Basic program
 *  which was entered through the Integer Basic interpreter
 *  will contain any of the tokens $02 to $11.  The token $00
 *  appears to be unused and won't appear in Integer Basic
 *  programs either.  However, $00 is used as an end-of-line
 *  marker in S-C Assembler source files, which also are of
 *  DOS file type "I".
 *
 *  (note here a difference from Applesoft Basic, where there
 *  are no "direct mode only" commands - any Applesoft commands
 *  can be entered into an Applesoft program as well).
 *
 */
{
#define REM_TOKEN   0x5D
#define UNARY_PLUS  0x35
#define UNARY_MINUS 0x36
#define QUOTE_START 0x28
#define QUOTE_END   0x29
    static char *itoken[128] =
    {
        /* $00-$0F */
        "HIMEM:","<$01>", "_",     " : ",
        "LOAD",  "SAVE",  "CON",   "RUN",    /* Direct commands */
        "RUN",   "DEL",   ",",     "NEW",
        "CLR",   "AUTO",  ",",     "MAN",
 
        /* $10-$1F */
        "HIMEM:","LOMEM:","+",     "-",     /* Binary ops */
        "*",     "/",     "=",     "#",
        ">=",    ">",     "<=",    "<>",
        "<",     "AND",   "OR",    "MOD",
 
        /* $20-$2F */
        "^",     "+",     "(",     ",",
        "THEN",  "THEN",  ",",     ",",
        "\"",    "\"",    "(",     "!",
        "!",     "(",     "PEEK",  "RND",
 
        /* $30-$3F */
        "SGN",   "ABS",   "PDL",   "RNDX",
        "(",     "+",     "-",     "NOT",      /* Unary ops */
        "(",     "=",     "#",     "LEN(",
        "ASC(",  "SCRN(", ",",     "(",
 
        /* $40-$4F */
        "$",     "$",     "(",     ",",
        ",",     ";",     ";",     ";",
        ",",     ",",     ",",     "TEXT",  /* Statements */
        "GR",    "CALL",  "DIM",   "DIM",
 
        /* $50-$5F */
        "TAB",   "END",   "INPUT", "INPUT",
        "INPUT", "FOR",   "=",     "TO",
        "STEP",  "NEXT",  ",",     "RETURN",
        "GOSUB", "REM",   "LET",   "GOTO",
 
        /* $60-$6F */
        "IF",    "PRINT", "PRINT", "PRINT",
        "POKE",  ",",     "COLOR=","PLOT",
        ",",     "HLIN",  ",",     "AT",
        "VLIN",  ",",     "AT",    "VTAB",
 
        /* $70-$7F */
        "=",     "=",     ")",     ")",
        "LIST",  ",",     "LIST",  "POP",
        "NODSP", "DSP",  "NOTRACE","DSP",
        "DSP",   "TRACE", "PR#",   "IN#",
    };
 
    U8 *data0 = data;
    int alen = get16(data);
    pause(22,f);
    for( data+=2; *data && (data-data0 <= alen); )
    {
        int inREM = 0, inQUOTE = 0;
        int lastAN = 0, leadSP = 0, lastTOK = 0;
        unsigned int lineno;
        unsigned int linelen = *data++;
        lineno = get16(data), data += 2;
        (void) linelen;   /* not needed for listing; avoids an unused-variable warning */
        fprintf( f, "%u ", lineno );
        for( ; *data!=0x01; data++ )
        {
            leadSP = leadSP || lastAN;
            if ( *data & 0x80 )
            {
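                /* A $B0-$B9 byte starts an integer constant only outside REM
                   text and strings, and only if it doesn't continue a variable name */
                if ( !inREM && !inQUOTE && !lastAN && (*data >= 0xB0 && *data <= 0xB9) )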
                {
                    signed short integer = get16(data+1);
                    int leadspace = lastTOK && leadSP;
                    fprintf( f, leadspace ? " %d" : "%d", (int) integer );
                    data += 2;
                    leadSP = 1;
                }
                else
                {
                    char c = *data & 0x7F;
                    int leadspace = !inREM && !inQUOTE &&
                                    lastTOK && leadSP && isalnum(c);
                    if ( leadspace )
                        fprintf( f, " " );
                    if ( c >= 0x20 )
                        fprintf( f, "%c", c );
                    else
                        fprintf( f, "^%c", c+0x40 );
                    lastAN = isalnum(c);
                }
                lastTOK = 0;
            }
            else
            {
                char *tok = itoken[*data];
                char lastchar = tok[strlen(tok)-1];
                int leadspace = leadSP &&
                          ( isalnum(tok[0]) ||
                            *data == UNARY_PLUS ||
                            *data == UNARY_MINUS ||
                            *data == QUOTE_START  );
                switch( *data )
                {
                    case REM_TOKEN:   inREM = 1;    break;
                    case QUOTE_START: inQUOTE = 1;  break;
                    case QUOTE_END:   inQUOTE = 0;  break;
                    default:  break;
                }
                fprintf( f, leadspace ? " %s" : "%s", tok );
                lastAN  = 0;
                leadSP = isalnum(lastchar) || lastchar == ')' || lastchar == '\"';
                lastTOK = 1;
            }
        }
        fprintf( f, "\n" ), data++;
        if ( pause(0,f) < 0 )
            goto exit;
    }
    (void) len;   /* unused; the length is taken from the file header instead */
    if ( f != stdout )
        fprintf( f, "\nSAVE %s\n", fname );
exit:
    return 0;
}  /* dumpBufferAsIntBasicFile */
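
The note in the format comment that a $01 byte may also appear inside
an integer constant means a reader cannot simply search for the next
$01 to find the end of a line.  The sketch below, which is an
illustration and not part of FID, walks a file image the same way the
function above does and collects the line numbers.  The helper name
list_line_numbers() and the sample image are hypothetical:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef unsigned char U8;

/* Hypothetical helper: stores up to max line numbers found in an
 * Integer Basic file image of n bytes and returns how many it found. */
static size_t list_line_numbers(const U8 *img, size_t n,
                                unsigned *out, size_t max)
{
    size_t count = 0;
    size_t i = 2;                       /* skip the 16-bit file length */

    assert(img != NULL && out != NULL);
    while (i + 3 <= n && img[i] != 0 && count < max) {
        int inREM = 0, inQUOTE = 0, lastAN = 0;

        i++;                            /* line length byte, unused here */
        out[count++] = (unsigned)(img[i] | (img[i + 1] << 8));
        i += 2;
        while (i < n && img[i] != 0x01) {
            U8 b = img[i];
            if (b & 0x80) {
                if (!inREM && !inQUOTE && !lastAN && b >= 0xB0 && b <= 0xB9) {
                    i += 3;             /* marker + 2 value bytes, which */
                    continue;           /* may themselves contain $01    */
                }
                lastAN = isalnum(b & 0x7F) != 0;
            } else {
                if (b == 0x5D)      inREM = 1;     /* REM         */
                else if (b == 0x28) inQUOTE = 1;   /* start quote */
                else if (b == 0x29) inQUOTE = 0;   /* end quote   */
                lastAN = 0;
            }
            i++;
        }
        i++;                            /* step over the $01 terminator */
    }
    return count;
}
```

For a hand-built image holding "10 A=1" and "20 END", the constant 1
is stored as $B1 $01 $00; the walker skips those three bytes together,
so the $01 inside the constant is not mistaken for the end of line 10.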
 
--  
----------------------------------------------------------------
Paul Schlyter,  Swedish Amateur Astronomer's Society (SAAF)
Grev Turegatan 40,  S-114 38 Stockholm,  SWEDEN
e-mail:  pausch at saaf dot se   or    paul.schlyter at ausys dot se
WWW:     http://hotel04.ausys.se/pausch    http://welcome.to/pausch


