Musical Instrument Digital Interface: MIDI Serial

Many devices use playback of digital audio waveforms as their audio source. It was desirable to implement a sub-protocol within MIDI in which devices could exchange this digital audio waveform data. In other words, a protocol was needed that allowed devices to exchange waveform data over MIDI cables within the parameters of MIDI. The only way to do this was with System Exclusive messages, and so several specific SysEx messages were defined in order to implement Sample Dump Standard (SDS). Many samplers support this protocol.

The device that sends the waveform data is the transmitter, and the device that receives it is the receiver.

A waveform exchange (ie, dump) can be done with or without handshaking. In the non-handshaking version, the transmitter's MIDI OUT is connected to the receiver's MIDI IN, and only the transmitter sends MIDI messages to the receiver. In the handshaking version, the transmitter's MIDI OUT is connected to the receiver's MIDI IN and the receiver's MIDI OUT is connected to the transmitter's MIDI IN. The transmitter sends a portion of the waveform data, after which it expects some sort of acknowledgement from the receiver that the portion has been received successfully or otherwise, and then the transmitter sends the next portion. All of this is accomplished with the devices passing defined SysEx messages between themselves.

The SysEx messages are the DUMP REQUEST, ACK, NAK, WAIT, CANCEL, Dump Header, and Data Packet messages. The first 5 (capitalized) are generated by the receiver. The last 2 are generated by the transmitter.

The dump procedure works as follows. The transmitter sends a Dump Header to indicate a dump start to the receiver. (This could have happened as a result of the receiver requesting the transmitter to start a dump via the DUMP REQUEST). The transmitter then waits for up to 2 seconds for a response from the receiver. This gives the receiver a chance to decide if it wants to and can accept the waveform. If no response is received, then the transmitter assumes a non-handshaking action, and proceeds to send out the first Data Packet. If an expected response (ie, handshake) is received instead, then the transmitter bases its next action upon the receiver's response. If the response is an ACK, the transmitter proceeds to send out the first Data Packet. If the response is a NAK, the transmitter sends the Dump Header again. If the response is a CANCEL, the transmitter aborts the dump. If the response is a WAIT, the transmitter pauses indefinitely until it subsequently receives one of the preceding responses.

After the first Data Packet is sent, the transmitter waits for up to 20 milliseconds for a response from the receiver. This gives the receiver time to perform certain error-checking on the packet's contents. If no response is received, then the transmitter assumes a non-handshaking action, and proceeds to send out the next Data Packet. If an expected response is received instead, then the transmitter bases its next action upon the receiver's response. If the response is an ACK, the transmitter proceeds to send out the next Data Packet. If the response is a NAK, the transmitter resends that same (ie, first) Data Packet. If the response is a CANCEL, the transmitter aborts the dump. If the response is a WAIT, the transmitter pauses indefinitely until it subsequently receives one of the preceding responses.

The transmitter carries on in this way, repeating the handshake procedure after each packet (or assuming a non-handshaking action after each packet), until it has sent as many Data Packets as are needed to pass all of the waveform data to the receiver. After that happens, the dump is done.
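The transmitter's decision after each Dump Header or Data Packet can be sketched as a small lookup. This is only an illustration of the rules described above, not code from any particular sampler. The names SdsResponse, SdsAction, sds_next_action and sds_timeout_ms are hypothetical, and SDS_TIMEOUT is an invented value meaning "nothing arrived before the timeout".

```c
/* Responses the transmitter may see after sending a header or packet.
   SDS_TIMEOUT is a hypothetical value meaning "no response in time". */
typedef enum { SDS_ACK, SDS_NAK, SDS_CANCEL, SDS_WAIT, SDS_TIMEOUT } SdsResponse;

/* What the transmitter should do next. */
typedef enum { SDS_SEND_NEXT, SDS_RESEND, SDS_ABORT, SDS_KEEP_WAITING } SdsAction;

/* How long to wait for a response, in milliseconds:
   up to 2 seconds after a Dump Header, up to 20 ms after a Data Packet. */
int sds_timeout_ms(int after_header)
{
    return after_header ? 2000 : 20;
}

/* Map the receiver's response to the transmitter's next action. */
SdsAction sds_next_action(SdsResponse response)
{
    switch (response)
    {
        case SDS_ACK:     return SDS_SEND_NEXT;     /* accepted, move on */
        case SDS_TIMEOUT: return SDS_SEND_NEXT;     /* assume non-handshaking */
        case SDS_NAK:     return SDS_RESEND;        /* send the same message again */
        case SDS_CANCEL:  return SDS_ABORT;         /* stop the dump */
        case SDS_WAIT:    return SDS_KEEP_WAITING;  /* no timeout until next response */
    }
    return SDS_ABORT; /* unknown value: safest to stop */
}
```

Note that SDS_KEEP_WAITING means the timeout is switched off until the receiver sends an ACK, NAK, or CANCEL.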

Here are the messages (with all bytes in hex). In each message, the byte notated as cc represents the SysEx channel that the message is being sent upon. There are 128 possible SysEx channels that a device can be set to (ie, 0 to 127). This allows various devices to be set to different SysEx channels along the daisy-chain, and have the dump occur between 2 particular devices with matching SysEx channels.

DUMP REQUEST

F0 7E cc 03 sl sh F7

If a receiving device wishes to initiate the dump (ie, tell some other device to send some waveform data), then the receiver sends the DUMP REQUEST. The sl sh is the 14-bit number (ie 0 to 16,383) of the waveform which the receiver is requesting from the transmitter. Most samplers number their internal waveforms from 0 to however many waveforms there are. Note that the 14-bit sample number is transmitted as 2 bytes where the first byte (sl) contains bits 0 to 6 (with high bit clear), and the second byte (sh) contains bits 7 to 13, right-justified (with high bit clear). When the transmitter gets this request, if such a sample number is available, the transmitter will kick off the dump with a Dump Header. Otherwise, the transmitter will cancel the dump. Typically, the receiver will wait for the Dump Header for a few seconds, and if not received, will abort the operation.
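As a sketch of how the 14-bit sample number is split into sl and sh, here is a C function that fills a buffer with a complete DUMP REQUEST. The function name sds_build_dump_request is hypothetical, and the caller is assumed to supply a buffer of at least 7 bytes.

```c
#include <stddef.h>

/* Build a DUMP REQUEST (F0 7E cc 03 sl sh F7) into buf.
   buf must have room for 7 bytes. Returns the number of bytes written. */
size_t sds_build_dump_request(unsigned char *buf, unsigned char channel,
                              unsigned short sample)
{
    buf[0] = 0xF0;                   /* SysEx start */
    buf[1] = 0x7E;
    buf[2] = channel & 0x7F;         /* cc: SysEx channel */
    buf[3] = 0x03;                   /* DUMP REQUEST */
    buf[4] = sample & 0x7F;          /* sl: bits 0 to 6 */
    buf[5] = (sample >> 7) & 0x7F;   /* sh: bits 7 to 13, right-justified */
    buf[6] = 0xF7;                   /* SysEx end */
    return 7;
}
```

For example, sample number 300 (0x012C) goes out as sl = 2C and sh = 02.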

Dump Header

F0 7E cc 01 sl sh ee pl pm ph gl gm gh hl hm hh il im ih jj F7

The transmitter sends this to the receiver to provide information about the waveform data that is about to be sent (in Data Packet messages). The sl sh is the waveform's 14-bit number (ie 0 to 16,383). See Dump Request.

ee is the number of significant bits of the waveform. For example, a 16-bit resolution waveform would have a 16 here.

pl pm ph is the sample period in nanoseconds (ie, 1,000,000,000/sample rate in Hertz). For example, a waveform sampled at 41667 Hertz will have a period of 23,999 nanoseconds. This value is transmitted as 3 bytes where pl is bits 0 to 6, pm is bits 7 to 13 right-justified, and ph is bits 14 to 20 right-justified (ie, for a total of 21 bits of resolution) with the high bit of all 3 bytes clear. So, our 23,999 (0x5DBF) becomes the 3 bytes 3F 3B 01.
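Here is a short C sketch of that conversion. The function names sds_period_ns, sds_pack21 and sds_unpack21 are hypothetical, and the period is simply truncated by integer division, which is an assumption (it matches the 23,999 figure above).

```c
/* Sample period in nanoseconds for a given sample rate in Hertz.
   Integer division truncates, eg 41667 Hz gives 23999. */
unsigned long sds_period_ns(unsigned long rate_hz)
{
    return 1000000000UL / rate_hz;
}

/* Split a value into 3 bytes of 7 bits each, lowest 7 bits first,
   each right-justified with the high bit clear. */
void sds_pack21(unsigned long value, unsigned char *out)
{
    out[0] = value & 0x7F;          /* bits 0 to 6 */
    out[1] = (value >> 7) & 0x7F;   /* bits 7 to 13 */
    out[2] = (value >> 14) & 0x7F;  /* bits 14 to 20 */
}

/* Rebuild the value from the 3 received bytes. */
unsigned long sds_unpack21(const unsigned char *in)
{
    return (unsigned long)in[0]
         | ((unsigned long)in[1] << 7)
         | ((unsigned long)in[2] << 14);
}
```

The same 3-byte layout is used for the waveform length and the loop positions in the Dump Header.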

gl gm gh is the waveform length in words. (What this implies is that if you have 8-bit or less resolution, the waveform length will be half the number of sample points that you intend to dump. You always end up having to send an even number of points).

hl hm hh is the word offset (from 0, ie, the very first sample point in the waveform) where the sustain loop starts. il im ih is the word offset where the sustain loop ends (ie, where the playback loops back to the sustain loop start). jj is the looptype where 00 means "forward only" (most common), 01 means "backward/forward", and 7F means "no loop point" (ie, the waveform is played through once only without looping). Note that older MIDI samplers didn't support the 7F value for looptype. For these older samplers, usually, if you set both the start and end loop points to the same value as the waveform length, a sampler will consider this to be a non-looped waveform. So to be safe, when you want to indicate that a waveform is not to be looped, you should set looptype to 7F, and set the start and end loop positions to the same value as the waveform's length.
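Putting the Dump Header fields together, a transmitter might assemble the 21-byte message along these lines. This is a sketch only. The SdsHeader structure, the put21 helper and sds_build_header are hypothetical names, and the caller is assumed to provide a buffer of at least 21 bytes.

```c
#include <stddef.h>

/* Hypothetical container for the Dump Header fields. */
typedef struct {
    unsigned short number;        /* sl sh: waveform number */
    unsigned char  bits;          /* ee: significant bits */
    unsigned long  period_ns;     /* pl pm ph */
    unsigned long  length_words;  /* gl gm gh */
    unsigned long  loop_start;    /* hl hm hh */
    unsigned long  loop_end;      /* il im ih */
    unsigned char  loop_type;     /* jj: 00, 01 or 7F */
} SdsHeader;

/* Write a value as 3 bytes of 7 bits each, lowest first. */
static unsigned char *put21(unsigned char *p, unsigned long v)
{
    *p++ = v & 0x7F;
    *p++ = (v >> 7) & 0x7F;
    *p++ = (v >> 14) & 0x7F;
    return p;
}

/* Build F0 7E cc 01 sl sh ee pl pm ph gl gm gh hl hm hh il im ih jj F7.
   Returns the number of bytes written (always 21). */
size_t sds_build_header(unsigned char *buf, unsigned char channel,
                        const SdsHeader *h)
{
    unsigned char *p = buf;

    *p++ = 0xF0;
    *p++ = 0x7E;
    *p++ = channel & 0x7F;
    *p++ = 0x01;                        /* Dump Header */
    *p++ = h->number & 0x7F;            /* sl */
    *p++ = (h->number >> 7) & 0x7F;     /* sh */
    *p++ = h->bits & 0x7F;              /* ee */
    p = put21(p, h->period_ns);
    p = put21(p, h->length_words);
    p = put21(p, h->loop_start);
    p = put21(p, h->loop_end);
    *p++ = h->loop_type & 0x7F;         /* jj */
    *p++ = 0xF7;
    return (size_t)(p - buf);
}
```

To follow the "safe" non-looped convention described above, set loop_type to 0x7F and set both loop_start and loop_end to length_words.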

Data Packet

F0 7E cc 02 kk [120 bytes here] ll F7

The data packet is what is used to transfer the actual waveform data. It transfers 120 bytes of waveform data at a time. So, the total size of a packet is 127 bytes.

kk is the packet number from 0 to 127. The first packet that is sent is number 0. The second packet is number 1. After packet number 127, the count rolls over to 0 again (ie, packet 128 becomes 0 again). This number is used by the receiver to ensure that it hasn't missed any packets. The packet number is also used to distinguish new packets from resent packets. After all, if packet number 3 follows packet number 1, then either packet number 2 has been missed by the receiver, or the transmitter sent packets out of order. For example, assume that a device has gotten packet 1, found an error in it, and sends a NAK to the transmitter. But, the transmitter has already assumed non-handshaking and started sending packet 2. The receiver would note that the next arriving packet is number 2. Then, the transmitter finally sees the receiver's late NAK to packet number 1, and resends that packet. The receiver can then note that it has received packet 1 out of order.
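One way a receiver might use kk is sketched below. Because the count wraps at 128, the comparison has to be done modulo 128. The rule used here (a packet up to 63 ahead counts as "skipped ahead", anything else counts as an older packet being resent) is my own assumption for illustration, not something the SDS messages define. All of the names are hypothetical.

```c
/* Result of comparing an arriving packet number with the expected one. */
typedef enum { SDS_PKT_EXPECTED, SDS_PKT_AHEAD, SDS_PKT_BEHIND } SdsPacketOrder;

/* Packet number that follows kk, rolling over from 127 to 0. */
unsigned char sds_next_packet_number(unsigned char kk)
{
    return (unsigned char)((kk + 1) & 0x7F);
}

/* Classify an arriving packet number against the one we expect.
   Assumption: a distance of 1 to 63 (modulo 128) means packets were
   missed; a distance of 64 to 127 means an earlier packet was resent. */
SdsPacketOrder sds_classify_packet(unsigned char expected, unsigned char received)
{
    int distance = (received - expected) & 0x7F;

    if (distance == 0)
        return SDS_PKT_EXPECTED;
    if (distance < 64)
        return SDS_PKT_AHEAD;
    return SDS_PKT_BEHIND;
}
```

In the example above, a receiver expecting packet 2 that sees packet 1 arrive again would get SDS_PKT_BEHIND, and could treat it as the resend of the packet it NAK'ed.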

The 120 bytes of waveform data follow. The transmitter has to pack up each sample point of its waveform data. With a 16-bit waveform, the transmitter must break up each 16-bit word into 3 bytes for transmission where the first contains bits 15 to 9, the second contains bits 8 to 2, and the third contains bits 1 and 0. In other words, unlike with the waveform length of the DUMP HEADER, the DATA PACKET's bytes are left-justified. The first data byte contains the highest 7 bits (which are placed in bit positions 0 to 6, since you'll remember that all transmitted data bytes must have bit 7 clear). The second data byte contains the next highest 7 bits. And the last data byte contains the remaining, lowest bits, which for a 16-bit point means the last two bits (placed in bit positions 6 and 5 of that byte, with the unused lower bits clear). For example, the 16-bit sample word 0xF0F0 would be 0x78 0x3C 0x00. Because each 16-bit word must be broken up into 3 bytes, and because there must be only 120 bytes in a packet, that means that a packet can contain 40 16-bit sample points. In fact, waveforms with resolutions of 15 to 21 bits pack up likewise. Waveforms with resolutions of 8 to 14 bits pack each sample point into 2 bytes (for 60 points per packet). Waveforms with resolutions of 22 to 28 bits pack each sample point into 4 bytes (for 30 points per packet). Sample points are represented by 0 being full negative value. So, in a 16-bit waveform, 0x0000 is full negative value and 0xFFFF is full positive value (ie, signed shorts aren't used, unlike in the WAVE file format, so you have to subtract 0x8000 from a 16-bit point after unpacking the 3 bytes into an unsigned short, if you want to adjust to a signed short).

Here's a C example of how to unpack 3 bytes of a DATA PACKET into a 16-bit point (ie, # of significant bits = 16). It is passed a pointer to the first of those 3 bytes, and returns a signed 16-bit point.

short unpack3(unsigned char * ptr)
{
    unsigned short num;

    /* Unpack 3 bytes into an unsigned short */
    num = ((unsigned short)(*ptr) << 9) | ((unsigned short)(*(ptr+1)) << 2) | (*(ptr+2) >> 5);

    /* Change unsigned range to signed range */
    num -= 0x8000;

    return((short)num);
}
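Going the other way, a transmitter needs to pack each signed 16-bit point into 3 bytes. Here is a sketch of the reverse operation. The name pack3 is my own, chosen to mirror unpack3, and it assumes the usual 16-bit short.

```c
/* Pack one signed 16-bit point into 3 DATA PACKET bytes.
   ptr must have room for 3 bytes. */
void pack3(short point, unsigned char *ptr)
{
    unsigned short num;

    /* Change signed range to unsigned range (0x0000 = full negative) */
    num = (unsigned short)((unsigned short)point + 0x8000u);

    /* Break the unsigned short into 3 left-justified 7-bit pieces */
    ptr[0] = (num >> 9) & 0x7F;                  /* bits 15 to 9 */
    ptr[1] = (num >> 2) & 0x7F;                  /* bits 8 to 2 */
    ptr[2] = (unsigned char)((num & 0x03) << 5); /* bits 1 and 0, in bit positions 6 and 5 */
}
```

The signed point 0x70F0 corresponds to the unsigned word 0xF0F0 from the earlier example, so it packs to 0x78 0x3C 0x00.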

NOTE: Even the last packet must have 120 data bytes in it. If a particular waveform packs up such that there aren't 120 bytes for the last packet, then that last packet's data should be padded out with 0 bytes to 120 bytes total. The receiver should ACK this last packet also.
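To work out how many packets a dump needs (and so how much padding the last packet gets), you can compute the bytes per point from the resolution. The following sketch uses hypothetical function names and assumes a resolution of 8 to 28 bits, as covered above.

```c
/* Bytes used per sample point: 2 for 8 to 14 bits, 3 for 15 to 21 bits,
   4 for 22 to 28 bits (ie, the resolution rounded up to whole 7-bit bytes). */
unsigned int sds_bytes_per_point(unsigned int bits)
{
    return (bits + 6) / 7;
}

/* Sample points that fit in the 120 data bytes of one packet. */
unsigned int sds_points_per_packet(unsigned int bits)
{
    return 120 / sds_bytes_per_point(bits);
}

/* Packets needed for a waveform; the last one is padded with 0 bytes
   if the points don't fill it. */
unsigned long sds_packet_count(unsigned int bits, unsigned long points)
{
    unsigned long per_packet = sds_points_per_packet(bits);

    return (points + per_packet - 1) / per_packet;  /* round up */
}
```

So a 16-bit waveform of 100 points takes 3 packets, with the last packet carrying 20 points (60 bytes) followed by 60 bytes of 0 padding.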

ll is the checksum. This is the XOR of the bytes 0x7E, cc, 0x02, kk, and all 120 bytes of waveform data (with bit 7 of result masked off to 0). The receiver computes the same XOR over the bytes it received and compares the result with ll, to check that no errors occurred in the packet transmission. If the two values don't match, the receiver will NAK this packet, and expect the transmitter to resend it.
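Here's a sketch of the checksum calculation in C. The function name sds_checksum is hypothetical, and data is assumed to point to exactly 120 bytes.

```c
/* Compute the ll checksum for a Data Packet.
   data must point to the packet's 120 bytes of waveform data. */
unsigned char sds_checksum(unsigned char channel, unsigned char packet,
                           const unsigned char *data)
{
    unsigned char sum;
    int i;

    /* XOR of 0x7E, cc, 0x02 and kk */
    sum = (unsigned char)(0x7E ^ channel ^ 0x02 ^ packet);

    /* XOR in all 120 data bytes */
    for (i = 0; i < 120; i++)
        sum ^= data[i];

    /* Mask off bit 7 */
    return sum & 0x7F;
}
```

The transmitter stores the returned value in ll. The receiver calls the same function on what it received and NAKs the packet if the result differs from the ll byte that arrived.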

ACK

F0 7E cc 7F kk F7

The receiver sends this after successfully receiving a Dump Header and after each successfully received Data Packet. It means "the last message was received correctly. Proceed with the next message". kk is the packet number that was received correctly (0 if responding to a Dump Header). The transmitter uses this to determine which particular packet the receiver has accepted (in case packet dumps get out of order).

NAK

F0 7E cc 7E kk F7

The receiver sends this after unsuccessfully receiving a Dump Header and after each unsuccessfully received Data Packet. It means "the last message was not received correctly. Resend that message". kk is the packet number that was received incorrectly (0 if responding to a Dump Header). The transmitter uses this to determine which particular packet the receiver has rejected (in case packet dumps get out of order).

CANCEL

F0 7E cc 7D kk F7

The receiver sends this when it wishes the transmitter to stop the dump. kk is the packet number upon which the dump is aborted (0 if responding to a Dump Header).

WAIT

F0 7E cc 7C kk F7

The receiver sends this when it wants the transmitter to pause the dump operation. The transmitter will send nothing until it receives another message from the receiver; an ACK to continue, a NAK to resend, or a CANCEL to abort the dump. kk is the packet number upon which the wait was initiated (0 if responding to a Dump Header).

This is useful for receivers which need to perform lengthy operations at certain times, such as writing data to floppy disk. If the receiver did not issue a WAIT, then the transmitter might count down its 20 millisecond timeout, and assume a non-handshaking action such as sending the next packet, without waiting for a response from the receiver. A WAIT tells the transmitter to wait indefinitely for a response.
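Since ACK, NAK, CANCEL and WAIT all share the same 6-byte layout, one pair of routines can build and recognize all four. The following is a sketch with hypothetical names. It assumes the caller has already collected one complete SysEx message into a buffer.

```c
#include <stddef.h>

/* The fourth byte of each handshake message. */
enum { SDS_MSG_WAIT = 0x7C, SDS_MSG_CANCEL = 0x7D,
       SDS_MSG_NAK = 0x7E, SDS_MSG_ACK = 0x7F };

/* Build F0 7E cc tt kk F7 into buf (6 bytes). Returns bytes written. */
size_t sds_build_handshake(unsigned char *buf, unsigned char channel,
                           unsigned char type, unsigned char packet)
{
    buf[0] = 0xF0;
    buf[1] = 0x7E;
    buf[2] = channel & 0x7F;   /* cc */
    buf[3] = type;             /* 7F, 7E, 7D or 7C */
    buf[4] = packet & 0x7F;    /* kk */
    buf[5] = 0xF7;
    return 6;
}

/* Check whether msg is a handshake message on our SysEx channel.
   Returns 1 and fills in type and packet if so, otherwise returns 0. */
int sds_parse_handshake(const unsigned char *msg, size_t len,
                        unsigned char channel,
                        unsigned char *type, unsigned char *packet)
{
    if (len != 6 || msg[0] != 0xF0 || msg[1] != 0x7E || msg[5] != 0xF7)
        return 0;
    if (msg[2] != channel)
        return 0;              /* meant for some other device */
    if (msg[3] < SDS_MSG_WAIT || msg[3] > SDS_MSG_ACK)
        return 0;              /* not one of the 4 handshake messages */

    *type = msg[3];
    *packet = msg[4];
    return 1;
}
```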


Some people like to use computer-based wave editing software to find and set loop points. This is because the computer's large display, and mouse support, is more conducive to displaying a waveform and quickly locating satisfactory loop points, than the typically small LCD upon MIDI samplers (plus a lack of pointing devices such as a mouse). The task of finding satisfactory loop points typically involves much trial-and-error. The user has to set the start and end loop points, listen to the result, and then choose other points if the result is not yet satisfactory. Because the waveform usually has to be sent back to the sampler in order to properly judge the results, and because a MIDI Sample Dump can be a time-consuming procedure, this means that the user typically wastes a lot of time waiting for samples to be transferred. For this reason, 2 messages were added to the SDS specification. (Note that many early MIDI samplers do not support these newer messages). One message allows a transmitter (such as a computer) to ask the receiver to send only the position of the loop points for a given waveform. That means that the transmitter can quickly get information about loop points without needing to transfer an entire waveform dump. The other message allows a transmitter (such as a computer) to tell the receiver to set the start and end loop points to particular positions for a given waveform. That means that the transmitter can quickly set new loop positions without needing to transfer an entire waveform dump.

It is also possible to send/receive multiple loop points (up to 16384) in one message (as described below).

LOOP POINT TRANSMIT

F0 7E cc 05 01 sl sh ll lh jj hl hm hh il im ih more F7

The transmitter sends this to the receiver to set the start loop and end loop positions for a particular waveform. The receiver should set those loop positions for that waveform and ACK this message if successful. Otherwise, a NAK is returned. The sl sh is the waveform's 14-bit number (ie 0 to 16,383). See Dump Request.

The ll lh is the loop's 14-bit number (ie 0 to 16,383). Many samplers allow more than one loop to be set for a given waveform. For example, there can be a sustain loop (ie, the part of the waveform looped while the user holds down a key and the sustain portion of a VCA is sustaining the sound), and a release loop (ie, the part of the waveform looped after the user releases the key and the release portion of a VCA is slowly fading out the sound). The sampler numbers the loops from 0 to however many loops are supported per waveform. Note that the 14-bit loop number is transmitted as 2 bytes where the first byte (ll) contains bits 0 to 6 (with high bit clear), and the second byte (lh) contains bits 7 to 13, right-justified (with high bit clear). The number of loops supported will likely vary from manufacturer to manufacturer, but a loop number of 00 00 always refers to the sustain loop. A loop number of 7F 7F is reserved to mean "delete all loops" (ie, the sampler will delete all loops that are currently set for the waveform. This is an easy way to start with a "clean slate", but note that not all samplers support this special request).

hl hm hh, il im ih, and jj are the loop start position, loop end position, and looptype. They are specified in the same way as per the Dump Header message.

It is also possible to specify more loop points (up to 16384) in one message. Where you see more in the above template, you could put another loop number, followed by its looptype, loop start position, and loop end position. After this, you could repeat these fields for the next loop, etc. So how does the receiver know how many loop points it is getting? Well, if it doesn't find an F7 where it expects one, then it must be dealing with the ll byte of the next loop's number. Therefore it should expect to find the lh jj hl hm hh il im ih bytes following. After that should be an F7, but of course, it could be yet another loop's ll byte.
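The "keep going until you hit an F7" logic can be sketched in C as follows. The SdsLoop structure and the function names are hypothetical, and the caller is assumed to pass one complete LOOP POINT TRANSMIT message (from F0 through F7).

```c
#include <stddef.h>

/* Hypothetical container for one loop's fields. */
typedef struct {
    unsigned short number;  /* ll lh */
    unsigned char  type;    /* jj */
    unsigned long  start;   /* hl hm hh */
    unsigned long  end;     /* il im ih */
} SdsLoop;

/* Rebuild a value from 3 bytes of 7 bits each, lowest first. */
static unsigned long get21(const unsigned char *p)
{
    return (unsigned long)p[0]
         | ((unsigned long)p[1] << 7)
         | ((unsigned long)p[2] << 14);
}

/* Parse F0 7E cc 05 01 sl sh [ll lh jj hl hm hh il im ih]... F7.
   Stores up to max loops in out. Returns the number of loops found,
   or -1 if the message is malformed or holds more than max loops. */
int sds_parse_loops(const unsigned char *msg, size_t len,
                    SdsLoop *out, int max)
{
    size_t i = 7;   /* first byte after sl sh */
    int count = 0;

    if (len < 8 || msg[0] != 0xF0 || msg[1] != 0x7E ||
        msg[3] != 0x05 || msg[4] != 0x01)
        return -1;

    /* No F7 yet, so this must be the ll byte of another loop */
    while (i < len && msg[i] != 0xF7)
    {
        /* Need 9 bytes for this loop plus at least the final F7 */
        if (i + 9 >= len || count >= max)
            return -1;

        out[count].number = (unsigned short)(msg[i] | (msg[i + 1] << 7));
        out[count].type   = msg[i + 2];
        out[count].start  = get21(msg + i + 3);
        out[count].end    = get21(msg + i + 6);
        count++;
        i += 9;
    }

    return (i < len) ? count : -1;  /* -1 if we ran out before an F7 */
}
```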

LOOP POINT REQUEST

F0 7E cc 05 02 sl sh ll lh F7

The transmitter sends this to the receiver to ask it to send the start loop and end loop positions for a particular waveform. The receiver will then return a Loop Point Transmit message containing the requested information, or a NAK if it can't handle the request successfully. The sl sh is the waveform's 14-bit number (ie 0 to 16,383). See Dump Request.

The ll lh is the loop's 14-bit number (ie 0 to 16,383). See Loop Point Transmit.

I don't have enough information to determine what happens when you use a loop number of 7F 7F. This may cause the receiver to return a Loop Point Transmit containing all of the loops for that waveform. Nor do I know whether you can specify several loop numbers in the above message, in order to have the receiver return all of those loops in one Loop Point Transmit message. You'll have to experiment to deduce this information. If someone does some experiments with a sampler that supports these Loop Point messages, please inform me of the results.