TODATA.TXT
----------

"To add or to remove a line number, that is the subject."

by Emmanuel ROCHE

Tonight, there is a storm where I live on the Atlantic Coast, so it is difficult to sleep.

This weekend, while trying to put some order into my stuff, I came across the listing of an old BASIC program to transfer files between computers. I had a look. Since the programmer was using a line editor, he was expecting, of course, that the recipient would also send him files with line numbers at the beginning of each line. This was perfectly normal, back then. However, in case the sender had used ED or WordStar to write his message, there was an option, in this old comm program in BASIC, to add line numbers in front of each line.

This got me thinking. This is almost exactly what I do from time to time, and I have not found a nice, simple way of doing it. (PIP has an option (in fact, 2) to add line numbers to a file, but they are not of the form that I needed.)

For example, a few months ago, when explaining the chain of thoughts that went into the creation of "Computer-Aided Family Trees (CAFT)", I wrote:

"I used one of my old BASIC programs, TODATA.BAS, to add a number and a DATA statement in front of each line. (Since this program is ugly, I decided not to show it to you. It is old, and should be rewritten, but I only use it from time to time, so have never invested the time to improve it. As long as it works without error, that's the most important thing, before being readable or pretty.)"

Is it, finally, the opportunity to improve it? Time will tell. Anyway, I have nothing to read, and the storm is noisy. Programming will concentrate my thoughts so much that I will no longer hear it.

The old comm program in BASIC used the STR$ function. The least that can be said is that I very rarely use this function. It would be interesting to check, among the two dozen BASIC programs that I have published on the comp.os.cpm Newsgroup, how many times I have used it.

According to my manual, "The value of the numeric-expression is converted to a decimal string in the same form as used in a PRINT command. Note that positive values yield a string with a leading space, whereas negative values have a leading minus sign."

It is precisely that leading space that has often bothered me. For instance, in a dump program like the recently-published ENTCOM.BAS, you need to print in neat columns the (hex) address, then row after row of (hex) bytes, neatly arranged vertically, while BASIC, it is said, "works in decimal"...

Well, this is not exactly true because, right from the start, the "math packages" found inside BASIC interpreters separated them into several kinds: those able to compute only byte values (that is to say: only up to 255, like the original Tiny BASIC!), most, which used "integers" (that is to say: one sign bit and 15-bit numbers, enabling them to compute up to 32,767; beyond that, for example if you wanted to reach the top of memory, you had to do some strange math operation), some that used fixed-point maths, some that used BCD, and finally one well-known BASIC interpreter that used not one, but two different formats of floating-point numbers. (Since then, some BASIC interpreters, while still being compatible with the "standard", also use UNT values internally; that is to say: "Unsigned iNTeger", or 16-bit values: no more strange math operation to access the full 64K of memory of an 8-bit system.)

So, in short, while BASIC appears to input and output decimal numbers, internally it uses at least 4 (very) different versions of those numbers, with a quantity of subroutines to convert from one format to the next, and the problems of rounding and truncation that can occur during those conversions. We are only concerned with the format used to output numbers, but the above was meant to explain that, internally, things can be pretty complex.

To give you an idea, the "Falconer Floating-Point Package" that I published on the comp.os.cpm Newsgroup takes 2.5K... 2.5 KiloBytes: that is to say, as big as a fully working Tiny BASIC interpreter! All that just to print the "correct" value on screen...

Anyway, let us go back to the problem: displaying a number on screen. According to the manual, "Positive numbers are preceded by a space; negative numbers by a minus sign. All numbers are followed by a space." Well, a quick session with BASIC in command mode should confirm this:

? "|"8"|"
| 8 |

Yes, indeed, BASIC adds a space BEFORE and AFTER the number.

A long, long time ago, when I was printing a hex number, I first checked if it was above 0FH (that is to say: was printed with 2 hex digits). If not, I first printed a leading "0" before the single hex digit, since the famous BASIC had no option specifying the width for printing hex numbers. Later, I found the PRINT USING command, but settled on a variant (that must be portable, since I have used it ever since). For example, to print an address:

PRINT RIGHT$ ("000" + HEX$ (adr), 4) "| " etc

At first, it looks strange, but the hex address is correctly displayed using 4 hex digits.

Finally, I encountered Mallard BASIC, probably... No! Certainly the best MBASIC-compatible interpreter ever written. It is a pleasure to use it. Its line editor, being designed for a screen, is much simpler than MBASIC's line editor, which was designed for an ASR-33 Teletype, but was never modified, even when sold for the IBM Clown... Regarding HEX$, Mallard BASIC has a very nice option:

PRINT HEX$ (adr, 4)

will print the hex address using 4 digits. (When listing memory above 0900H (like the BDOS), I use 5 digits, to get a PL/M-compatible hex address that I end, of course, with an "H".)

Now, let us see the action of STR$:

? "|"STR$(8)"|"
| 8|

Ha? There is a change: no space after the number.

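The difference between PRINT and STR$ noted above can be double-checked by looking at the string itself. The following is only a small sketch, assuming an MBASIC-compatible interpreter that behaves as the manual quoted above describes; the variable names s$ and t$ are mine.

```basic
10 REM Inspect what STR$ returns, using LEN and LEFT$
20 s$ = STR$ (8)
30 PRINT LEN (s$) : REM length 2: one leading space plus one digit
40 PRINT "[" LEFT$ (s$, 1) "]" : REM the first character is the space
50 t$ = STR$ (-8)
60 PRINT "[" LEFT$ (t$, 1) "]" : REM for a negative value, it is the minus sign
```

The length of 2 shows that the trailing space seen with PRINT belongs to the PRINT command, not to the string produced by STR$.
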
Just to be sure, I rechecked the subroutine used in the old comm program in BASIC: it output a space before each line number. Apparently, this old BASIC had no trouble reading lines starting with spaces. Personally, however, I found them quite unpleasant: BASIC and ED, and all the line editors that I know, display the line number starting at column one, without any leading space.

So, how to print this number without this leading space?

Suddenly, I had a flash of light: STR$ converts a number into a string! So, instead of adding PRINT USING or PRINT RIGHT$ commands, why not treat the number as a string, and print it on screen not from the first character (the space) but from the second (the first digit of the number)?

Of course, I knew the answer, but I checked in the manual nevertheless. The most often used string functions are LEFT$, RIGHT$, and MID$. Since we have no idea how long a string will be (for instance, when getting a line from a file), we cannot use LEFT$ or RIGHT$. Only MID$ remains. Its form is:

MID$ (string, start-position, optional-sub-string-length)

In our case, we don't care about the optional sub-string length, and we already have a string (our number): only start-position remains. "The start-position is an integer-expression which specifies the character in string which is to be the first character of the sub-string. The integer-expression must yield a value in the range 1 to 255." That should be enough. My widest daisy-wheel printer is 136 columns wide, and I only know of one printer (for a Mainframe computer) that was 256 columns wide.

Later, it is also written: "The first character of the original string is at position 1." That is to say: the leading space is at position 1, so the first digit of the number is at position 2... Rush to the BASIC interpreter:

? "|" MID$ (STR$ (8), 2) "|"
|8|

Wahoo! No more leading space!

? "|" MID$ (STR$ (88), 2) "|"
|88|

? "|" MID$ (STR$ (888), 2) "|"
|888|

(etc.) It works!

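One caveat with the MID$ trick above: for a negative number, position 1 holds the minus sign, not a space, so starting at position 2 would drop the sign. Line numbers are never negative, so this does not matter here, but a general-purpose version might look like the sketch below. FNnum$ is a hypothetical helper name of mine, and the sketch assumes an MBASIC-compatible interpreter where a true comparison evaluates to -1.

```basic
10 REM Number to string without the leading space, sign preserved
20 REM FNnum$ is a hypothetical helper, not part of TODATA.BAS
30 REM (n < 0) is -1 when true, so MID$ starts at 1 for negatives
40 DEF FNnum$(n) = MID$ (STR$ (n), 2 + (n < 0))
50 PRINT "|" FNnum$(8) "|" : REM |8|
60 PRINT "|" FNnum$(-8) "|" : REM |-8|
```

For a program that only ever numbers lines, the plain MID$ (STR$ (n), 2) form is enough.
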
So, I quickly patched TODATA.BAS and decided to make a good test. I copied my biggest ASM file into the RAMdisk, changed the default input filetype to ASM (instead of ASC), then typed "run":

TODATA> Enter ASC File Name: ? asm86

ASM86.DAT is 19582-lines long.

Using WordStar 4, I then opened the DAT file and jumped to the last line. No doubt about it: the DAT file was really 19582 lines long!

One last thing. Line editors used to be standard on computers. That's why BASIC has one, since screens did not exist when it was created (in 1964, long before microcomputers, when the ASR-33 Teletype was the standard terminal, hence the word "PRINT" to... really, physically print characters on the paper roll of the Teletype). The only difference between a line editor and a full-screen file editor is that the former adds line numbers at the beginning of each line. Once you know this, line editors are perfectly usable. But what size of file can they edit?

To answer this question, I loaded Good Old Mallard BASIC Version 1.29, as originally discovered on the Amstrad PCW8256:

M>basic

Mallard-80 BASIC with Jetsam Version 1.29
(c) Copyright 1984 Locomotive Software Ltd
All rights reserved

30061 free bytes

Ok

I then tried to load the above DAT file, knowing full well that it was too big:

load"asm86.dat
Memory full
Ok

and saved the portion that it had been able to load. It was 30K big, and the last DATA line number was 11670: that is to say, this 8-bit BASIC had loaded the first 1167 lines of an ASM file which was chosen at random to be typical of ASM files. Now, one page, in the USA, happens to be 55 lines tall, so 1167 / 55 = about 21 pages.

Now, 21 pages (and 30K) may not seem much today but, on an 8-bit CP/M system, this happens to be also the size of the source code of Palo Alto Tiny BASIC... That is to say: using BASIC, you can create a COMmand file as powerful as a Tiny BASIC 3K big.

(Generally speaking, a COM file is 10 to 20 times smaller than its ASM file, depending upon the quantity of comments.)

(Of course, if 21 pages is not enough, you can use INCLUDE or MACLIB pseudo-ops in your ASM file. That's why so many old programs of the CP/M User Group are split into several files.)

When using BASIC to create source code files for an assembler, you then need a program to remove the line numbers and the REM (or ' character). Assembly language programs have (almost universally) the ASM file type. It would be better if our assembler understood several file types, thus enabling us to write source code with BASIC, ED, or WordStar. Unfortunately, this is not the case. For historical reasons, ASM and MAC accept line numbers and "*" in column one as indicating a comment -- they were used by the Processor Technology "Software Package #1" assembler -- but they only accept an ASM file type.

Of course, if we wrote an 8080 assembler in BASIC, it would be simpler if it could assemble both source code written for ASM (for compatibility) and files with line numbers created with BASIC. Let us call this hypothetical "BASIC assembler" BSM.

To edit a file with BASIC, it needs to have line numbers. To assemble a file with ASM, it needs to have the line numbers removed. So, we need at least 2 small utilities. Let us call them TOBSM, which adds line numbers, and TOASM, which removes them.

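Before the full listings, here is a sketch of the round trip on a single line, using the same string operations as the two programs (number, then "0 '", then the ASM text, and INSTR to find the separator again). The sample ASM line and the variable names asm$, bsm$, and n are only illustrative.

```basic
10 REM Round trip of one line: ASM text -> BSM line -> ASM text
20 asm$ = "START:  LXI  SP,STACK"
30 n = 1
40 bsm$ = MID$ (STR$ (n), 2) + "0 '" + asm$
50 PRINT bsm$ : REM 10 'START:  LXI  SP,STACK
60 ptr = INSTR (bsm$, " '")
70 PRINT MID$ (bsm$, ptr + 2) : REM START:  LXI  SP,STACK
```

Since INSTR returns the first match, and the line number itself contains neither a space nor a quote, a " '" sequence further along in the ASM text (in a comment, say) should not confuse the removal step.
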
10 REM TOBSM.BAS by Emmanuel ROCHE
20 :
30 ' BSM line = 1(2345)0 + " '" + ASM line
40 :
50 PRINT
60 INPUT "TOBSM> Enter ASM File Name: " ; file$
70 PRINT
80 file1$ = file$ + ".ASM"
90 OPEN "I", #1, file1$
100 file2$ = file$ + ".BSM"
110 OPEN "O", #2, file2$
120 li = 0
130 ' lin = 0
140 WHILE NOT EOF (1)
150 li = li + 1
160 LINE INPUT #1, line$
170 ' PRINT MID$ (STR$ (li), 2) "0 '" LEFT$ (line$, 68)
180 PRINT #2, MID$ (STR$ (li), 2) "0 '" line$
190 ' lin = lin + 1
200 ' IF lin = 23 THEN lin = 0 : PRINT : PRINT "Press RETURN to Continue " ; : WHILE INKEY$ = "" : WEND : PRINT
210 WEND
220 CLOSE
230 ' PRINT
240 PRINT UPPER$ (file2$) " is" STR$ (li) "-lines long."
250 PRINT
260 END

10 REM TOASM.BAS by Emmanuel ROCHE
20 :
30 ' BSM line = 1(2345)0 + " '" + ASM line
40 :
50 PRINT
60 INPUT "TOASM> Enter BSM File Name: " ; file$
70 PRINT
80 file1$ = file$ + ".BSM"
90 OPEN "I", #1, file1$
100 file2$ = file$ + ".ASM"
110 OPEN "O", #2, file2$
120 li = 0
130 ' lin = 0
140 WHILE NOT EOF (1)
150 LINE INPUT #1, line$
160 ' PRINT LEFT$ (line$, 78)
170 li = li + 1
180 ' lin = lin + 1
190 ptr = INSTR (line$, " '")
200 line2$ = MID$ (line$, ptr+2)
210 ' PRINT LEFT$ (line2$, 78)
220 PRINT #2, line2$
230 ' IF lin = 23 THEN lin = 0 : PRINT : PRINT "Press RETURN to Continue " ; : WHILE INKEY$ = "" : WEND : PRINT
240 WEND
250 CLOSE
260 ' PRINT
270 PRINT UPPER$ (file2$) " is" STR$ (li) "-lines long."
280 PRINT
290 END

The proper functioning of those 2 BASIC programs was checked by comparing the output file with the original file (ASM --> BSM --> BIS). My COMPARE.BAS program found them (ASM and BIS) to be identical, no matter how long the files were.

By the way, for the record, the Amstrad PCW8256 was supplied with only one BASIC program: RPED (later, I was told this is the acronym of "Roland Perry's EDitor"). This BASIC program defined a 200-line array, into which the user could load (and from which he could save) any ASCII file.
200 lines is roughly 4 pages, while we have already seen that 1167 lines are 21 pages... (And TOASM is as long as RPED... but RPED was "full-screen", while TOASM is line-oriented. I have never seen any program like TOASM published in the English magazines catering to the PCW, but I only read a few from time to time, since I was not a subscriber to them.)

Ha! By the way, here is TODATA, which provided the impetus for all those thoughts.

10 REM TODATA.BAS by Emmanuel ROCHE
20 :
30 ' DATA line = 1(2345)0 + " DATA " + line read
40 :
50 PRINT
60 INPUT "TODATA> Enter ASC File Name: " ; file$
70 PRINT
80 file1$ = file$ + ".ASC"
90 OPEN "I", #1, file1$
100 file2$ = file$ + ".DAT"
110 OPEN "O", #2, file2$
120 li = 0
130 ' lin = 0
140 WHILE NOT EOF (1)
150 li = li + 1
160 LINE INPUT #1, line$
170 ' PRINT MID$ (STR$ (li), 2) "0 DATA " LEFT$ (line$, 68)
180 PRINT #2, MID$ (STR$ (li), 2) "0 DATA " line$
190 ' lin = lin + 1
200 ' IF lin = 23 THEN lin = 0 : PRINT : PRINT "Press RETURN to Continue " ; : WHILE INKEY$ = "" : WEND : PRINT
210 WEND
220 CLOSE
230 ' PRINT
240 PRINT UPPER$ (file2$) " is" STR$ (li) "-lines long."
250 PRINT
260 END

system

A>That's all, folks! (Time to go to bed...)

Yours Sincerely,
"French Luser"

EOF