Please title this page. (multask.htm)

Robotics, data communications, measurement and control, and computer music are only a few of the microcomputer programming applications that must respond to external real-time stimuli. Handling these stimuli is no problem if only one event occurs at a time and there is enough time between each event to do all the required computing for that task. But real-time problems are seldom so obliging. Often, the computer must handle several events concurrently.

This article describes one way to include concurrency in your programs, so that each simultaneous function can be written as a separate, straightforward task. You achieve concurrency by using a compact set of routines called a multitask kernel, to which each task speaks in a well-defined way. To demonstrate, I will present an 8080 kernel that supports multiple concurrent tasks in programs running under typical single-user operating systems.

A Split-Screen Display

First, let's explore some multitasking concepts by taking a look at an example program. Suppose you would like to use your computer as a smart terminal for a remote system, but you have a tendency to forget the passage of time, thereby running up a gigantic bill for computer and telephone services. A possible solution is to display the current time in a large banner format on the upper one-third of the console screen and show the usual dialogue with the remote system on the lower two-thirds. Because you can assume that the processor and the console screen run much faster than the telephone modem, this goal appears reasonable.

Five tasks are chosen to run at the same time; these tasks send keyboard characters to the modem, receive modem input, periodically get the time of day, and display the two parts of the screen. The keyboard task can be written in pseudocode as:

The routine that enables you to write this task as though it had exclusive use of the processor is called SWAPOUT. This kernel routine takes control from the calling task and gives it to another task. Thus, one way that the processor can be shared among concurrent tasks is by having them voluntarily relinquish control, with the understanding that they will soon get it back. Of course, the best time for a task to do this is when it is in a waiting mode.

The SWAPOUT routine demonstrates how multiple tasks are made to behave as concurrently executing processes. Each task is assigned its own data area, or stack frame. When SWAPOUT is called, the complete state of the calling task is saved; the state of the task includes the condition of its program counter, machine registers and flags, and everything already on its stack. Then, by switching the stack pointer to another stack frame, the next ready task is restored to the state in which it was preserved, and returned to execution.

Unlike the keyboard task, the other sample tasks (modem receive, time, and display) are not truly independent of each other. They need to communicate somehow; in particular, the display task needs to receive characters for display from the remaining two tasks. Moreover, it needs to know when characters are available. The other two tasks need to know when to send characters. I will explore these two closely related issues of task communication and synchronization further, but, for now, assume the existence of some kind of buffer mechanism between tasks. The remaining four tasks can then be sketched as follows:

A new kernel routine, DELAY, is introduced in TIME_TASK. This routine allows you to postpone a task for a specified time, giving control of the microprocessor to the remaining tasks. Here, it triggers a new time display about once per minute. To implement the DELAY routine, the existence of some type of real-time clock is assumed, perhaps the same one used to handle the time of day.

The clock handler calls the kernel routine TICK each time a tick (cycle) occurs. When the time specified to DELAY has elapsed, the delayed task is readied to continue execution.

The availability of a periodic clock interrupt that cycles every few milliseconds also raises the possibility of calling SWAPOUT from the interrupt handler. This technique (called time-slicing) would eliminate the need for each task to call SWAPOUT itself. Some new problems would be introduced, however, because the order of task execution is harder to control. What's more, all parts of the system must now be reentrant; that is, they must not use memory cells that are accessible to other parts of the system, except explicitly for intertask communication. An important consequence of this situation is that runtime libraries and operating system calls must be reentrant, which they not always are. A more direct way to eliminate multiple calls to SWAPOUT involves the use of semaphores.

Semaphores

You might notice that in the sample keyboard task, the keyboard is checked many times before a character is finally typed. Calling SWAPOUT prevents getting hung up in a loop, but it seems a waste of microprocessor time to keep swapping KEYBOARD_TASK in and out of execution with nothing accomplished. A better method would forgo execution until a certain event occurs (a key is hit). This facility can be provided by a semaphore, a simple data structure used to signal the occurrence of events and to wait for them. The event associated with a semaphore is agreed upon by the tasks involved; the routines for using it are provided by the multitask kernel.

A semaphore is based on the idea of using a Boolean variable to communicate an event (for example, the real-time clock flip-flop). A way to suspend a task's execution has been added by the inclusion of a link field, which the kernel routines can use to build a list of tasks waiting for the event (see figure 1). This field takes care of one or more tasks that are anticipating an event. But what if lots of events are signaled before they can all be handled? Within reasonable limits, these events can be handled by a counting semaphore, a semaphore whose Boolean variable is replaced by a counter. Each time an event is signaled with nothing waiting for it, the counter is incremented. When a task finally gets around to checking the semaphore, the counter is decremented, and the task doesn't need to wait at all.

Figure 1: The semaphore data structure consists of a count
of the number of times the semaphore has been signaled
with no tasks waiting and a link to a list of tasks waiting for signals.

The modem-receive task could undergo similar changes, with both tasks using the new kernel routine WAIT(semaphore). The address of the semaphore is passed to the kernel. If at least one event has occurred, WAIT returns immediately; otherwise, it postpones the task until a routine such as SIGNAL(semaphore) is called to signal the event. In the above example, an interrupt handler could do the signaling, or one task could be dedicated to checking all the devices and signaling the appropriate semaphores.

Intertask Communication

Now that you have semaphores to handle the synchronizing of tasks, you can build ways of moving data between tasks as well. Focusing on the passing of a single character, I will use a circular first-in/first-out (FIFO) buffer of the form shown in figure 2; in this buffer, characters are put in at TAIL and removed at HEAD, advancing these pointers with each access. What's needed is a way to indicate when the buffer is full and when it is empty; you will also want to suspend a task that can't access the buffer at these times.

Figure 2: The FIFO buffer. Characters are put into the buffer where TAIL points and
removed at HEAD. After each access, the appropriate pointer is advanced; at the
end of the buffer, the pointers are wrapped around to point to the first slot.

Both of these needs are met by using a pair of counting semaphores, one to guard the input and one to guard the output. The count fields correspond to the count of characters put in and taken out, respectively. Initially, the input count is 0, the output count is set to the buffer size, and the buffer is empty. The following complementary routines can now be written to access the top and bottom screen buffers:

The effect of these routines is to enable tasks to pass characters to other tasks without concem for whether they can be accepted at the moment; likewise, the accepting tasks need not worry about whether characters are available (SWAPOUT is unnecessary). Buffer-space housekeeping is done by the semaphore counts, and through the semaphore links, tasks are postponed if their requests cannot be honored.

Mutual Excdusion

The possibility of tasks interfering with each other may be important not only in conjunction with memory cells used by simultaneous tasks but also with any resource. In fact, the console screen is itself a single resource shared by two display tasks. What makes this condition a problem is that screen operations are divisible; that is, a single console function (such as positioning the cursor) might require several accesses (escape sequences). Each of these accesses might take enough time so that the microprocessor could be executing other tasks while waiting. Thus, the console-screen accesses constitute a critical region of the program--one that needs protection from use by more than one task at a time.

The mutual exclusion of tasks from a critical region can be handily accomplished using a semaphore. The count field of this semaphore takes on only two values: available (1) and reserved (O). A calling task waits at the entrance to the region if it is reserved; upon exit, the region is made available again. The semaphore is initially set to available (1). The following routine is passed either a displayable character or a special code requesting a control function such as erase line or inverse video on/off:

The routine calls WAIT or SWAP-OUT between sending characters to the console device. Therefore, other tasks run but cannot enter the critical region until the first calling task is finished.

An 8080 Multitask Kernel

Listing 1 (page 458) shows a set of routines, called MTK80, written in 8080 code for the Digital Research MAC assembler. You can insert the routines into a program along with the included macroinstructions for convenient calling sequences (see listing 2, page 466). Except for the user macroinstructions, all labels include a nonalphanumeric character to reduce conflicts with user symbols. If you use the DELAY call, a real-time (periodic) clock is also required.

If the conditional assembly control symbol INTS? is true, interrupts are disabled on entry and enabled on exit from each kernel routine. This setup ensures that the kernel data structures cannot be accessed by more than one routine at a time (a simple form of mutual exclusion). If STAK?CK is true, the stack pointer is checked to see if it is inside the current frame. If it intrudes upon a neighboring frame at the time it is checked, crucial data has likely been destroyed, so the kernel is abandoned. The operation of the MTK80 kernel can be described from three perspectives: the states that tasks enter, the underlying data structure, and the function calls used.

Task States

The state diagram in figure 3 shows the possible states a task can be in (circles) and the routines used to make transitions (arrows) to other states. You would need one such diagram for each task to describe the condition of all tasks in a program. Only one task can be in the running state at a time, and, of course, it makes the function calls to change its own state or the state of another task.

The dorrnant state applies to tasks that might be in program memory but have no assigned stack frame. When a task initially starts, it is associated with a frame and given control. As the diagram shows, only a running task can stop its own execution altogether and become dormant again. It is not possible to terminate other tasks directly.

A task is in the blocked state when it is waiting for either a semaphore signal or a delay time-out. Only a running task can become blocked; the task is then out of contention for microprocessor time until the specified event occurs.

Tasks in the ready state are in contention for the microprocessor simultaneously (but are not running). Only one task can execute at a time; the remaining tasks are in line to run when given a chance. Tasks are made ready when an event is signaled, a delay times out, or the running task gives up control voluntarily. When more than one task is ready, the task that was presented to the kernel first will be executed.

Data Structure

The 8080 kernel is based on the data structure in figure 4; in this figure the RAM (random-access read/write memory) area pointed to by STACK is divided into frames of equal size. Several bytes in each frame are used to save the stack pointer for that frame (SP), link to other frames to form lists (LINK), and save the start-up time during delays (TIME). The remainder of each frame is used as a stack in the normal 8080 manner.

In figure 4, the queue heads shown are pointers to (possibly empty) lists of frames, strung together by their LINK fields. Lists are maintained of ready, available, and delayed frames (and thus of their associated tasks). There is also a pointer to the current frame (the running task). Frames not presently allocated to tasks are available for use by dormant tasks being initiated. In the ready list, the newest arrival is placed at the end of the list. The frame at the head of the list (the one that has been ready the longest) is the one to go when the microprocessor becomes free. All tasks have equal priority.

A semaphore is a 3-byte structure in memory; 1 byte contains the signal count and the other 2 point to a (possibly empty) list of tasks waiting for signals. Two representative semaphores are shown in figure 4. One of them has no waiters (a nil LINK) but can have a nonzero count; the other has a single frame on its wait list.

You establish the data structure and error exit by using the macro-instruction INITMT (listing 2). The macroinstruction sets the ready and delayed lists empty, puts all frames but one on the available list, and makes the remaining frame the currently running one. When INITMT is called, the (hardware) stack pointer must be valid. The default operating system stack is valid and can serve as the stack pointer. On the return from INITMT, however, the state of the machine has been completely altered, and the calling task has become a task with a fresh stack.

Semaphores are initialized by the INITSEM macroinstruction, which sets the count field to the specified initial value and sets the wait queue to empty.

The START macroinstruction gets a frame from the available list and makes that frame the new running task, beginning its execution at the address supplied. The calling task is put on the ready list. An interesting result of the dynamic connection between executable code and stack frames is that, as long as there are available frames, the same piece of (reentrant) code can be alive in several incarnations at the same time! Because the task started gets a copy of the creator's registers, it might be handy to have more than one identical task active, each with different initial register contents. In this way you could pass parameters to new tasks on start-up; you could also use this method in the split-screen example to reduce the two nearly identical subscreen tasks to one routine.

A special case of the START macroinstruction is the TASK macroinstruction, which terminates the calling task and starts the new one. It is equivalent to resetting the stack pointer to the top of the current frame and jumping to the new starting address.

Figure 4: The MTK80 data structures. A sample state is shown, involving six stack frames and two
semaphores. NIL represents the end of a list

The STOP macroinstruction terminates a task by returning the task's frame to the available list. The first frame on the ready list is removed and is run.

SWAPOUT puts the currently running task at the end of the ready list, swapping it for the one at the top of the ready list. As with other functions that take the current task out of execution, SWAPOUT first pushes the return address and registers onto the stack, saves the stack pointer (at SP), and performs the optional stack overflow check. When a new current task is selected, its stack pointer is restored, followed by its register contents. It is then allowed to resume execution where it left off.

Using a semaphore, the WAIT macroinstruction enables a task to synchronize itself with an event. If the semaphore count is nonzero, it has been signaled (the event has occurred), and the count is decremented, acknowledging the signal. If the count is 0, the task becomes blocked; the current frame is added to the end of the semaphore queue list using its LINK field, and the next ready task begins execution.

Two special cases of WAIT are CWAIT and CKSEM, both of which return the status of the semaphore in the Z flag (nonzero if signaled). CWAIT acknowledges a signal by decrementing the count; CKSEM leaves the count unchanged. Control returns immediately to the caller in either case.

The complement of WAIT is the SIGNAL macroinstruction, which uses a semaphore to signal an event. If any tasks are waiting for the event, the first one is moved to the end of the ready list. Otherwise, the semaphore's count of unacknowledged signals is incremented. No check for overflow of this count is made (the maximum is 255). Another version of SIGNAL is FLAG, which works in exactly the same way except that it never increments the count past 1. This behavior is appropriate for mutual exclusion semaphores and for preventing count overflow. Neither SIGNAL nor TICK takes the calling task out of execution.

The DELAY macroinstruction puts the calling task on the delayed list, thereby blocking it until the specified number of system clock ticks have passed. The delayed list is ordered according to the length of time requested, with the shortest time first.

The proper placement of the calling task in the list is handled by DELAY, because this somewhat complex insertion needs to be done only once. The next ready task is then placed into execution.

DELAY uses a 2-byte counter register. Because this counter rolls over at 65,536, that value is the maximum number of cycles supported in one DELAY call. The register contains only a relative count that is incremented once by each call to TICK. Because the task with the shortest delay is at the top of the delayed list, TICK then compares the target wakeup time of that task with the current tick count. Any and all tasks whose time is up are appended to the ready

list. The current task then resumes execution, whether it called TICK or was interrupted.

Applications

Listing 3 contains a skeletal example of the declaration of data structures and the use of kernel macroinstruction calls. Refer also to the comments in listing 2 for details of ways to use the routines. If you have no macroinstruction facilities, use the macroinstruction definitions as a set of instructions for calling the kernel routines explicitly.

I used a Z80 version of this kernel for more than three years in my ham radio teletype (RTTY) communications program. The kernel made it easier for me to include features such as a split-screen display, continuous time and modem-frequency monitoring, and the reading and writing of a circular disk log file while transmitting and receiving.

I have written a related kernel in BDS C (B.D. Software C) for inclusion in a CP/M RTTY program. Expansion to allow monitoring of multiple communication channels (VHF, HF, telephone, packet radio, and satellite) is a natural application of multiple concurrent tasks. -

References

1. Agoston, M. K. "A Microprocessor Operating System: The Kernel." Dr. Dobb's Journal, vol. 2, number 8, September 1977, pages 20-40. Goes right into the details of a kernel for an 8080 real-time operating system that includes message passing between tasks.

2. Chein, Y. P. "Multi-tasking Executive Simplifies Real-time Microprocessor System Design." Computer Design, January 1980, pages 109-117. Description of a commercial product for the 8080 family. Emphasis is on highly structured interactions and a consistent channel mechanism for communicating with hardware and with other tasks and processors.

3. Dahmke, Mark. " Introduction to Multiprogramming." September 1979 BYTE, pages 20-32. Multiuser system design considerations. Emphasis on timing, I/O, memory allocation, and data structures.

4. Kinzer, Don. "A Time-Sharing/Multi-User Subsystem for Microprocessors." June 1980 BYTE, pages 122-134. A hardware approach to accommodating multiple users on a small system, each with a bank-selected memory segment. Includes an introduction to time-sliced processor sharing .

5. Knies, Wilfred, et al. "Industrial Real-Time FORTRAN Draft Standard." ACM SIGPLAN Notices, vol. 16, number 7, July 1981, pages 45-60. A proposed definition of procedures to handle multitasking functions, bit manipulations, process l/O, and shared files in FORTRAN programs. Glossary of relevant terms; state model and transition diagram.

6. Tsichritzis, D. C., and P. A. Bernstein. Operating Systems. New York: Academic Press, 1974. A textbook that focuses on large operating-system issues. Discus sions of concurrent processes, interprocess communication and synchronization, and resource allocation.

Jerry Holter (2329-B Hilgard Ave., Berkeley, CA 94709) is a senior engineer at Ampe Corporation in Redwood City, CA. He is involved with real-time microprocessor servomechanism control.

The author would like to thank David Altekruse, Ron Porat, and Robin Gurse for their support and assistance in the preparation of this article

Add Multiple Tasks to Your Communication and Control Program

A special kernel lets your 8080 run multiple tasks concurrently