A Gentle Introduction to the MIDI Tuning Specification

©2001 by Joe Monzo


 
--- In tuning@y..., jpehrson@r... wrote:

http://groups.yahoo.com/group/tuning/message/23368

> I thought somebody said, was it John DeLaubenfels (??)
> that the MIDI unit was not a set division, but could be
> altered somewhat, so it is not a permanent measure... 
> Did I get that wrong (??)


Hi Joe,

I had originally coined the term "midipu" (= MIDI Pitch-bend
Unit) and defined it as 1/4096 = 1/(2^12) of a Semitone.
This was based on my use of pitch-bend in Cakewalk.

Manuel corrected me and posted a link to the official
MIDI tuning specification webpage, wherein he stated
correctly that the finest tuning resolution available
in MIDI is 1/16384 = 1/(2^14) of a Semitone, and that the
figure in my definition was merely a less-finely-resolved
choice that Cakewalk had made.  IIRC, John deL. chimed
in in agreement.

(I have since renamed that measurement a "cawapu" = 
CAkeWAlk Pitch-bend Unit, and changed my definition of
midipu to agree with Manuel's).

So there is much variability in how different manufacturers
choose to implement the MIDI tuning spec, but the spec itself
offers 1/(2^14) Semitone as the limit of resolution.

Now, here's my long essay on how it all works....


-----------------------------------------------------

GENTLE INTRODUCTION TO THE MIDI TUNING SPECIFICATION

by Joe Monzo

-----------------------------------------------------



Much of this is going to be elementary for anyone who
knows anything about how computers work internally.  But
following my explanation should shed some light on how
MIDI tuning and pitch-bend work.

I'm going to explain four numbering systems that all
have a bearing on learning how to use the MIDI tuning
specification:

1. decimal       base_10
2. binary        base_2
3. hexadecimal   base_16
4. octal         base_8


Think in terms of a prime-factor vector, which you've
seen me (and Graham Breed) use here many times before.
This use of a prime-factor vector was the confusing
aspect of my HEWM notation which Joe Pehrson cited
from my paper a couple of months back (in April 2001
posts to the Tuning List), where I omit the primes
themselves and just string the exponents out in a series.

Understanding bits and bytes is similar to this.

The difference is that instead of representing exponents
of prime-factors, the numbers represent values of the
base-number raised to successive exponents.



Different numbering systems
---------------------------


For example, let's start with the system you're most
familiar with: decimal, also called base_10.
Of course I realize everyone understands this, but
it's necessary to spell out the procedure.


There are 10 decimal digits: 0 1 2 3 4 5 6 7 8 9.
These can be thought of as indicating the value of a
number of individual "units".

When we've used up all 10 digits and must continue
counting, how do we do it?

We imagine that these digits may appear in a "place"
which has a specific meaning: each place refers to an
exponent of the base number, and the digit in that place
indicates how many of that power are being counted.

So to get to the number "ten", simply form a new place
to the left of the original one and make that place
represent not units but "tens".  In math, the difference
is indicated by:  unit = 10^0, ten = 10^1.  So placing
a "1" in the "tens" place indicates 1 "ten", or simply 10.
The zero in the units place shows that the total value
is only 10 and not more than 10.

Then we cycle thru the units place again until we reach
19, and then go back to zero in the units and bump the
tens up to 2, with the result of 20.  Etc., etc.
The next place to the left is 100s (= 10^2), the next
to the left of that 1000s (= 10^3), etc.

So in other words the number 111, for example, really means:

  1*(10^2) + 1*(10^1) + 1*(10^0)

=   100    +    10    +    1

=   111.




Now on to binary, also called base_2.

There are only 2 binary digits: 0 and 1.  This is so
effective for use in computers because electronic
switches can relay data by being in either of two
states: on or off.

The word "bit" is a contraction of the two words
"b-inary dig-it".  A "byte" is an 8-digit binary string.


So how do we get any numbers bigger than 1 in binary?
Easy... the same way as in the decimal system.

Each place to the left will be a higher exponent of 2.

So the right-most place is 2^0 = 1, then next to the left
is 2^1 = 2, the next is 2^2 = 4, the next is 2^3 = 8, etc.

So we cycle thru the whole set by placing first a zero,
then a one, in each successive place:

binary  decimal

   0  = 0
   1  = 1
  10  = 2
  11  = 3
 100  = 4
 101  = 5
 110  = 6
 111  = 7
1000  = 8

etc.

So, for example, the number 7 [decimal] is represented
in binary with "1"s filling the first three places:

      1         1         1  [_base-2]

=  1*(2^2) + 1*(2^1) + 1*(2^0)

=     4    +    2    +    1

=     7  [_base-10]


The next number, 8 [decimal], is exactly the third
power of 2, so it has a "1" filling the 2^3 place and
zeros in all the others, and looks like this in
binary: 1000.


      1         0         0         0  [_base-2]

=  1*(2^3) + 0*(2^2) + 0*(2^1) + 0*(2^0)

=     8    +    0    +    0    +    0

=     8  [_base-10]
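
The place-value rule used in the decimal and binary examples above can be sketched in a few lines of Python. This is only an illustration; the function name positional_value is an arbitrary choice, not anything from the MIDI spec.

```python
def positional_value(digits, base):
    """Sum digit * base**exponent, with the right-most digit at exponent 0."""
    total = 0
    for exponent, digit in enumerate(reversed(digits)):
        total += digit * base ** exponent
    return total


# The three worked examples from the text.
print(positional_value([1, 1, 1], 10))    # decimal 111
print(positional_value([1, 1, 1], 2))     # binary 111
print(positional_value([1, 0, 0, 0], 2))  # binary 1000
```

Python's built-in int("111", 2) performs the same conversion on a string of digits.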



The next number system I'll discuss is hexadecimal,
or hex for short, also called base_16.

Long strings of 1s and 0s are difficult to comprehend
visually, so hex is used by programmers as a convenient
shorthand for binary.  Its combination of numbers and
letters is much easier for humans to parse.

In this system, each place can hold numbers which
represent 0 thru 15.  But we need to keep our "digits"
to, obviously, a single digit, so we invoke the first
few letters of the alphabet after we pass 9.

Let's continue counting from the table above, this
time with the results in hex rather than decimal (you
only see a difference after 9).  I'll give the decimal
value at the end, just so you can see what it is:

binary  hex   decimal

1001  =  9  =   9
1010  =  A  =  10
1011  =  B  =  11
1100  =  C  =  12
1101  =  D  =  13
1110  =  E  =  14
1111  =  F  =  15


Because 1 hex digit can represent exactly 4 binary digits,
this is sometimes found to be a useful grouping, and is
called a "nibble" (I've also seen it spelled "nybble").
A nibble is exactly half a byte... get it?

I'm mentioning nibbles here because they play a role
in understanding how the MIDI tuning spec uses the data
contained in each byte of a signal.  I'll use nibble-size
binary groupings below to illustrate the hex numbers that
we'll come across.

Also note that it's easy to specify binary strings
of 1s in decimal format in a manner which makes obvious
their binary derivation, by bumping up to the next
higher power of 2 and adding "-1".  For example,
our last number in the above table, 1111 [binary]
= 10000 [binary] minus 1, or in decimal, (2^4)-1
= 16 - 1 = 15, which = F in hex.


After F [hex], which = 15 [decimal], we get to
16 [decimal] by using the same procedure as before:
bump up to the next higher exponent of 16 and start
over again.  So a "1" in the next higher place means
16^1, and in the next higher place after that, 16^2, etc.

So the highest 2-digit number in hex is FF [hex],
which equals

  (15 * (16^1)) + (15 * (16^0))

= (15 *   16)   + (15 *  1)

=     240       +    15

=  255


The next higher number would be 256 [decimal],
which is exactly the second power of 16,
so we put a "1" in the third place followed by
two zeros: 100 [hex] = 256 [decimal].   So,
in hex, FF + 1 = 100.
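
The nibble grouping and the FF + 1 = 100 step can be checked with a short Python sketch. The helper name nibbles is hypothetical; format() and int() are standard built-ins.

```python
def nibbles(byte):
    """Split an 8-bit value into its high and low 4-bit halves."""
    return byte >> 4, byte & 0x0F


high, low = nibbles(0xFF)
print(format(high, "04b"), format(low, "04b"))  # each nibble as 4 binary digits
print(int("FF", 16))                            # FF [hex] as a decimal number
print(format(0xFF + 1, "X"))                    # one more than FF, shown in hex
```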



There's an intermediate numbering system called octal,
which is (you guessed it) base_8.  This system is a
bit easier to understand because it works just like
decimal but only has digits 0 thru 7.  After 7 comes
10 [octal] = 8 [decimal].

In fact it's somewhat akin to our usual diatonic
musical numbering system which uses A B C D E F G,
then starts again at A when we reach the next "octave".

Octal is not much used these days, hex being preferred.
But because of the structure of the MIDI tuning data
protocol, octal plays an important role in MIDI tuning
calculations.



Now, with that out of the way, on to the specifics
of the MIDI tuning data format.

From the official MIDI website:

> Frequency data shall be sent via system exclusive
> messages. Because system exclusive data bytes have
> their high bit set low, containing 7 bits of data,
> a 3-byte (21-bit) frequency data word is used for
> specifying a frequency with the suggested resolution.


In other words, the left-most bit or place, in each of
the three 8-digit binary strings called a "byte" which
belong to a tuning command, is set to 0, which in this
case is simply a flag to let the hardware know that
each of these 3 bytes is a data byte (MIDI status bytes
have that bit set to 1).  That leaves 7 usable bits
per byte.

This is extremely important, because it means that the
rules of enumeration which I elaborated above are not
quite followed.  There's a different system in use here
that's called an "offset".


More from the MIDI website:

> Frequency data shall be defined in units which are
> fractions of a semitone. The frequency range starts
> at MIDI note 0, C = 8.1758 Hz, and extends above
> MIDI note 127, G = 12543.875 Hz. The first byte of
> the frequency data word specifies the nearest
> equal-tempered semitone below the frequency. The next
> two bytes (14 bits) specify the fraction of 100 cents
> above the semitone at which the frequency lies.
> Effective resolution = 100 cents / 214 = .0061 cents.



There's a serious typo on the webpage here, in the
denominator of that fraction: "214" should really be 2^14,
so that effective resolution should be given as
100 cents / 2^14 = ~0.006103516 cent.

  
Thus, the greatest possible MIDI tuning resolution
is equivalent to 1200 / (0.006103516) = 196608-EDO,
or 2^14 = 16384 midipus per Semitone.
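
The arithmetic behind these figures is short enough to check directly. The following Python lines are a sketch with variable names chosen only for illustration.

```python
UNITS_PER_SEMITONE = 2 ** 14                 # 14 bits of fraction per Semitone

cents_per_unit = 100 / UNITS_PER_SEMITONE    # size of one unit in cents
units_per_octave = 12 * UNITS_PER_SEMITONE   # the equivalent EDO

print(cents_per_unit)
print(units_per_octave)
```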


Also, the frequencies may be specified to a more accurate
number of decimal places than those published in the MIDI spec,
which is particularly important in the case of the low
frequencies:

    MIDI note 0 is the 12edo "C" which is 5 "8ves" plus a "major-6th" below A-440, which = 440 * 2^(-5-(9/12)) = ~8.175798916 Hz, which is ~1/4355 of a cent lower than the published figure.

    MIDI note 127 is the 12edo "G" which is 4 "8ves" plus a "minor-7th" above the reference tone of A-440, which = 440 * 2^(4+(10/12)) = ~12543.85395 Hz, which is ~1/344 of a cent lower than the published figure. (The published figure appears to be the result of an erroneous calculation, because rounding off intermediate values in the calculation results in other values which are all different from it.)
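
These two frequencies, and their distance in cents from the published figures, can be recomputed with a small Python sketch. It assumes 12edo with A-440 sitting 69 semitones (5 "8ves" plus a "major-6th") above MIDI note 0; the function names are illustrative only.

```python
import math


def note_frequency(note, reference_note=69, reference_hz=440.0):
    """Frequency in Hz of a 12edo MIDI note, measured from A-440."""
    return reference_hz * 2 ** ((note - reference_note) / 12)


def cents_between(low_hz, high_hz):
    """Size in cents of the interval from low_hz up to high_hz."""
    return 1200 * math.log2(high_hz / low_hz)


lowest = note_frequency(0)
highest = note_frequency(127)
print(lowest, highest)

# How far the published figures lie above the computed ones, in cents.
print(cents_between(lowest, 8.1758))
print(cents_between(highest, 12543.875))
```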

Let's begin by illustrating the nature of the MIDI data.
I'll use variables s and m, to stand for Semitone bit and
midipu bit, respectively:

  0sssssss 0mmmmmmm 0mmmmmmm

So in other words, the highest value that any of these
bytes can have is 1111111 [binary] = (2^7)-1 [decimal],
which equals 127.  This is the same as saying:

  (2^6)+(2^5)+(2^4)+(2^3)+(2^2)+(2^1)+(2^0)

=   64 +  32 +  16 +   8 +   4 +   2 +   1

=  127.

127 [decimal] = 7F [hex], because the first "nibble" is
0111 [binary] = 7 [hex], and the second "nibble" is
1111 [binary] = F [hex].

So all the values in the three MIDI data bytes must be
between 0 and 127 [decimal], which is the same as between
0 and 7F [hex].



The Semitone component
------------------------


This is why there are 128 possible different MIDI notes,
numbered from C0 to G10; or, to put it another way which
relates to the illustration above, 128 different semitone
divisions of the total pitch-space.  In Semitones, this is:

  10 "octaves" + the highest "octave" of C + a "5th"

= (12 * 10) + 1 + 7

= 128 total MIDI-notes.


Let's examine some specific MIDI-note numbers to see how
it works.

First let's try the familiar "octave".  This is the 12th
note above the starting note.  Recapping the hex table I
gave above, we see that:

hex   decimal

 9  =   9
 A  =  10
 B  =  11
 C  =  12

So in hex the 12th note would be the digit "C":

0C [hex]

= (0 * (16^1)) + (12 * (16^0))  [decimal]
= (0 *   16)   + (12 *   1)
=      0       +     12
=  12 [decimal], or an "octave"
=  MIDI-note C1.


Let's try the highest hex digit, F.  Remember,
F [hex] = 15 [decimal]:

0F [hex]

= (0 * (16^1)) + (15 * (16^0))  [decimal]
= (0 *   16)   + (15 *   1)
=      0       +     15
=  15 [decimal]
= (15 / 12) = 1 & 3/12 "octaves" above C0
=  an "octave" + a "minor 3rd" above C0
=  MIDI-note Eb1/D#1.


And to find our highest MIDI-note:

7F [hex]

= (7 * (16^1)) + (15 * (16^0))  [decimal]
= (7 *   16)   + (15 *   1)
=     112      +     15
=  127 [decimal]
= (127 / 12) = 10 & 7/12 "octaves" above C0
=  10 "octaves" + a "perfect 5th" above C0
=  MIDI-note G10.


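The three conversions just worked by hand can be reproduced with a minimal Python sketch. It assumes the numbering used here, in which MIDI-note 0 is called C0; the name table and both function names are hypothetical choices for illustration.

```python
NOTE_NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "G#", "A", "Bb", "B"]


def note_name(note):
    """Name a MIDI note, calling note 0 'C0' as in the text."""
    octave, step = divmod(note, 12)
    return f"{NOTE_NAMES[step]}{octave}"


def note_hex(note):
    """Two-digit hex form of the Semitone byte (0 to 7F)."""
    if not 0 <= note <= 127:
        raise ValueError("MIDI note numbers run from 0 to 127")
    return format(note, "02X")


for hex_byte in ("0C", "0F", "7F"):
    note = int(hex_byte, 16)
    print(hex_byte, "=", note, "=", note_name(note))
```
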
Now let's reverse the procedure, so that for any given
MIDI-note we find the hex value.

Let's find "middle-C":

=  MIDI-note C5.
=  5 "octaves" above C0
=  5 "octaves" * 12 Semitones
=  60 [decimal]  (MIDI-note number 60)

(60 / 16 = 3 & 12/16, therefore...)

60 [decimal]

=  48 + 12
= (3 * 16) + (12 * 1)
= (3 * (16^1)) + (12 * (16^0))  [decimal]
=  3C [hex]


Since A-440 Hz is the MIDI tuning reference, let's find
that:

=  MIDI-note A5.
=  5 "octaves" + a "major 6th" above C0
=  5 & 9/12 "octaves" above C0
= ((5 * 12) + 9) Semitones
= (60 + 9) Semitones
=  69 [decimal]  (MIDI-note number 69)

(69 / 16 = 4 & 5/16, therefore...)

69 [decimal]

= (4 * 16) + 5
=  64 + 5
= (4 * 16) + (5 * 1)
= (4 * (16^1)) + (5 * (16^0))  [decimal]
=  45 [hex]


I chose these MIDI-notes deliberately because they appear
in the table on the MIDI spec website.  This explanation
should make it easier to understand that table.

Knowing the MIDI-note number of A-440 Hz enables us to
calculate more accurate frequency values to replace those
given on the MIDI website (one needs to be especially
careful when rounding off very low frequencies):

Frequency of lowest MIDI-note

=  440 Hz / (ratio of A5:C0)
=  440 / (2^((69-0)/12)) Hz
=  ~8.175798916 Hz

Frequency of highest MIDI-note

= (ratio of G10:A5) * 440 Hz
= (2^((127-69)/12)) * 440 Hz
=  ~12543.85395 Hz



The pitch-bend component
------------------------


That's easy enough for the semitone component of the
tuning spec, because it only occupies one byte.

But for the fraction-of-a-semitone component (or pitch-bend
component), which occupies *two* bytes, the math is a bit
more complicated.

You can't simply keep bumping up to the next higher exponent
of your base as in normal calculation, because the MIDI spec
requires that the first bit of each MIDI data byte must be
a zero in order to flag it as a data byte.  That zero in
what is called the most significant bit throws the
calculation off by one exponent.

(It's called the "most significant bit" because it has the
highest *potential* value in its byte, even tho in the MIDI
spec it is actually equal to zero.)

Here's the solution.

127 [decimal] = 7F [hex] is the highest value we can have
in any of the three tuning data bytes.  So if our first
data byte (the one to the right) has a value of
7F [hex] = 127 [decimal], we can't simply use the regular
80 [hex] to represent 128 [decimal].

We have to skip over the predetermined zero in the most
significant bit of this byte, and put a "1" into the next
available place, which would be the least significant bit
of the next higher byte.  In binary notation, we may
designate the mandatory zero in the highest bit with an "x",
to illustrate that it cannot be used in our calculation.

This ends up giving us a rather bizarre combination of
octal and hex in our calculations, and has made MIDI tuning
math more complicated than it probably needed to be.  I will
refer to this as "octal-hex" in my labels.

We will also find that it is easier to understand the
octal-hex combination if we divide the bytes into two
nibbles for the purposes of binary notation, and if we use
zeros as place-holders in the unused places of the octal-hex
numbers and divide them into bytes.

Thus, in effect, the two pitch-bend data bytes are divided
into 4 nibbles which are counted in the pattern:
octal - hex - octal - hex (from left to right).

So if we start now at:

  127 [decimal]
= 00 7F [octal-hex]
= x000 0000  x111 1111 [binary],

this gives a tuning inflection of
0.775146484 (= 3175/4096) cent.

The next number is:

  128 [decimal]
= 01 00 [octal-hex]
= x000 0001  x000 0000 [binary].

This gives a tuning inflection of 0.78125 (= 25/32) cent.

So we can see that 01 00 [octal-hex], instead of
representing 256 [decimal] as in a regular hex calculation,
will now represent 128 [decimal] instead.

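Skipping the mandatory zero bit amounts to shifting the left byte up by 7 places instead of 8, which is the same as multiplying it by 128. Here is a minimal Python sketch of that idea; the function names fraction_units and units_to_cents are illustrative, not terms from the spec.

```python
def fraction_units(left_byte, right_byte):
    """Combine the two pitch-bend data bytes into one 14-bit number."""
    for byte in (left_byte, right_byte):
        if not 0 <= byte <= 0x7F:
            raise ValueError("a MIDI data byte must lie between 00 and 7F")
    # The top bit of each byte is always 0, so the left byte is worth
    # 2**7 = 128 per step rather than 2**8 = 256.
    return (left_byte << 7) | right_byte


def units_to_cents(units):
    """Convert 14-bit units (1/16384 Semitone each) to cents."""
    return units * 100 / 2 ** 14


for left, right in ((0x00, 0x7F), (0x01, 0x00)):
    units = fraction_units(left, right)
    print(f"{left:02X} {right:02X} = {units} units = {units_to_cents(units)} cent")
```
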
So now we may cycle thru all the possible combinations of
digits in the lower (right-most) byte until we fill all the
places with their highest digit (which is "1" in binary),
which would give us:

  x000 0001  x111 1111 [binary]
= 01 7F [octal-hex]
= 128 + 127 [decimal]
= 255 [decimal].

This gives a tuning inflection of
~1.556396484 (= 1 + 2279/4096) cents.

The next number is:

  256 [decimal]
= 02 00 [octal-hex]
= x000 0010  x000 0000 [binary]

This gives a tuning inflection of 1.5625 (= 1 + 9/16) cents.

So this rather complicated calculation is achieved by
treating the left byte the same way as the right one,
then multiplying it by 128, then adding both bytes together.

Alternatively, perhaps it is easier to think of each hex
digit as a certain exponent of 2 which follows an
alternating irregular pattern:

        4       3       4       ... pattern of exponent increase
      /   \   /   \   /   \
=  2^11    2^7     2^4     2^0  ... exponent of 2
=  2048    128      16       1  ... decimal value

Since this interrupted pattern (i.e., the mandatory zero
bit that doesn't count in the calculation) is a non-standard
kind of math, let's cycle thru all the remaining pairs of
numbers where the next "place" changes, to be absolutely
clear on how it works.


= x000 0010  x111 1111 [binary]
= 02 7F [octal-hex]
= 383 [decimal]
Tuning inflection: 2.337646484 (= 2 + 1383/4096) cents.

= x000 0011  x000 0000 [binary]
= 03 00 [octal-hex]
= 384 [decimal]
Tuning inflection: 2.34375 (= 2 + 11/32) cents.


= x000 0011  x111 1111 [binary]
= 03 7F [octal-hex]
= 511 [decimal]
Tuning inflection: 3.118896484 (= 3 + 487/4096) cents.

= x000 0100  x000 0000 [binary]
= 04 00 [octal-hex]
= 512 [decimal]
Tuning inflection: 3.125 (= 3 + 1/8) cents.


= x000 0111  x111 1111 [binary]
= 07 7F [octal-hex]
= 1023 [decimal]
Tuning inflection: 6.243896484 (= 6 + 999/4096) cents.

= x000 1000  x000 0000 [binary]
= 08 00 [octal-hex]
= 1024 [decimal]
Tuning inflection: 6.25 (= 6 + 1/4) cents.


= x000 1111  x111 1111 [binary]
= 0F 7F [octal-hex]
= 2047 [decimal]
Tuning inflection: 12.49389648 (= 12 + 2023/4096) cents.

= x001 0000  x000 0000 [binary]
= 10 00 [octal-hex]
= 2048 [decimal]
Tuning inflection: 12.5 (= 12 + 1/2) cents.


= x001 1111  x111 1111 [binary]
= 1F 7F [octal-hex]
= 4095 [decimal]
Tuning inflection: 24.99389648 (= 24 + 4071/4096) cents.

= x010 0000  x000 0000 [binary]
= 20 00 [octal-hex]
= 4096 [decimal]
Tuning inflection: 25 cents.


= x011 1111  x111 1111 [binary]
= 3F 7F [octal-hex]
= 8191 [decimal]
Tuning inflection: 49.99389648 (= 49 + 4071/4096) cents.

= x100 0000  x000 0000 [binary]
= 40 00 [octal-hex]
= 8192 [decimal]
Tuning inflection: 50 cents.


= x111 1111  x111 1111 [binary]
= 7F 7F [octal-hex]
= 16383 [decimal]
Tuning inflection: 99.99389648 (= 99 + 4071/4096) cents.


Whew!  Still with me?  We made it thru the hardest part.

Since 7F 7F [octal-hex] = 16383 [decimal] is the highest
possible value, there are a total of 16384 = 2^14 possible
divisions of the Semitone in the MIDI tuning spec.  This is
how I found the error in the fraction on the MIDI website.

Most instruments and software do not take full advantage
of this super-fine resolution.  As the MIDI spec says:

> An instrument which does not support the full
> suggested resolution may discard any unneeded
> lower bits on reception, but it is preferred
> where possible that full resolution be stored
> internally, for possible transmission to other
> instruments which can use the increased resolution.


Cakewalk [TM] 2.0, the MIDI sequencer I use, gives a tuning
resolution of 4096 = 2^12 cawapus (a new term I just coined)
per Semitone.

Thus it ignores the first two bits available in the spec,
and therefore gives a range of possible values from
0 to 4095 [decimal] = 00 00 to 1F 7F [hex].

In other words, the first nibble can only be a 0 or 1
in all four numbering systems considered here:

  let "x" designate the two bits that cannot be used
  because they are reserved as the data-byte flag.

  let "y" designate the two bits that Cakewalk's tuning
  spec cannot recognize.

  the cawapu spec uses a total of 1+4+3+4 = 12 bits.

  thus, the maximum possible value is:

    xyy1 1111  x111 1111  [binary]
  =   1    F     7    F   [hex]
  =  4095 [decimal]

  since the leading nibble can only designate a binary
  digit, the cawapu data stream is thus really a weird
  progression of binary-hex-octal-hex.



-monz
http://www.monz.org
"All roads lead to n^0"


Errata on official MIDI tuning webpage

As stated above, there are several errors on the official MIDI tuning page.

At the end of the section titled "FREQUENCY DATA FORMAT" is a table titled "Examples of frequency data:" (almost halfway down the page).

Below I give the correct figures for Hz, using 8 decimal places of precision instead of the 4 as on the MIDI page, along with some additional data showing MIDI-note, pitch-bend amount in both tetradekamus and cents, and ratios from the tuning reference of A-440 Hz.

The "7F 7F 7F" command is reserved to indicate "no change", thus the highest possible frequency obtainable in MIDI is 13289.6566 Hz. ("14mu" is my abbreviation for "tetradekamu".)



  MIDI       MIDI --pitch-bend--    ratio
freq.data    note +14mus  +cents  from A-440                Hz

00 00 00  =    0      0   0.0000  0.018581361       8.17579892
00 00 01  =    0      1   0.0061  0.018581427       8.17582774
01 00 00  =    1      0   0.0000  0.019686266       8.66195722
0C 00 00  =   12      0   0.0000  0.037162722      16.35159783
3C 00 00  =   60      0   0.0000  0.594603558     261.62556530
3D 00 00  =   61      0   0.0000  0.629960525     277.18263098
44 7F 7F  =   68  16383  99.9939  0.999996474     439.99844877
45 00 00  =   69      0   0.0000  1               440.00000000
45 00 01  =   69      1   0.0061  1.000003526     440.00155124
78 00 00  =  120      0   0.0000  19.02731384    8372.01808962
78 00 01  =  120      1   0.0061  19.02738092    8372.04760546
7F 00 00  =  127      0   0.0000  28.50875898   12543.85395142
7F 00 01  =  127      1   0.0061  28.50885949   12543.89817521
7F 7F 7E  =  127  16382  99.9878  30.20376504   13289.65661609
7F 7F 7F     --     --      --        --              --
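
The Hz column above can be regenerated from the three data bytes with a short Python sketch. It assumes 12edo referenced to A-440 at MIDI-note 69 and treats 7F 7F 7F as the reserved "no change" word; the function name decode_frequency is hypothetical.

```python
def decode_frequency(semitone_byte, left_byte, right_byte):
    """Return the frequency in Hz for a 3-byte frequency data word.

    Returns None for 7F 7F 7F, which is reserved to mean "no change".
    """
    word = (semitone_byte, left_byte, right_byte)
    if any(not 0 <= byte <= 0x7F for byte in word):
        raise ValueError("each data byte must lie between 00 and 7F")
    if word == (0x7F, 0x7F, 0x7F):
        return None
    fraction = ((left_byte << 7) | right_byte) / 2 ** 14  # part of a Semitone
    semitones = semitone_byte + fraction
    return 440.0 * 2 ** ((semitones - 69) / 12)


for word in ((0x00, 0x00, 0x01), (0x45, 0x00, 0x00), (0x7F, 0x7F, 0x7E)):
    print(" ".join(f"{byte:02X}" for byte in word), "=", decode_frequency(*word))
```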

The most egregious error in the table is the second note in the list, which the official MIDI page gives as "00 00 01 = 8.2104 Hz". The interval between this note and the first one is ~7.3111 cents, whereas it states explicitly in the text that 1 unit of pitch-bend equals only 0.0061 cents! The actual frequency data needed to obtain this frequency would be 00 09 2E -- quite a difference! This must have been the result of an error in the calculation.
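
The 00 09 2E figure can be checked by running the conversion in the other direction. The Python sketch below assumes A-440 at MIDI-note 69 and rounds to the nearest 1/16384 Semitone (a choice made for this illustration); the function name encode_frequency is hypothetical.

```python
import math

UNITS_PER_SEMITONE = 2 ** 14
HIGHEST_WORD = 127 * UNITS_PER_SEMITONE + 16382  # 7F 7F 7E; 7F 7F 7F is reserved


def encode_frequency(hz):
    """Return the 3 data bytes whose tuning lies nearest to hz."""
    semitones = 69 + 12 * math.log2(hz / 440.0)
    units = round(semitones * UNITS_PER_SEMITONE)
    if not 0 <= units <= HIGHEST_WORD:
        raise ValueError("frequency is outside the MIDI tuning range")
    note, fraction = divmod(units, UNITS_PER_SEMITONE)
    # Split the 14-bit fraction into two 7-bit data bytes.
    return note, fraction >> 7, fraction & 0x7F


print(" ".join(f"{byte:02X}" for byte in encode_frequency(8.2104)))
```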

The other errors are much smaller, and are probably the result of rounding various values at some point in the calculation. (Remember that the larger difference in numbers for the higher frequencies doesn't actually sound as big as it looks, because we perceive pitch logarithmically.)


