February 1997

Shrinking the Synth

by Paul D. Lehrman

Last month I talked about the new level of complexity that synthesis is heading toward. I extolled the virtues of physical-modeling technology, whose real-time-oriented control capabilities are demanding a higher level of involvement on the part of the synthesist.

You got a problem with that? Too dry, too expensive, too involved, too much work? Okay, this month, how about we go in the opposite direction: cheap, and hands-off? Or even better, free and totally hands-off? We can do that. And the scary part is, this direction may turn out to be our best hope for a decent soundtrack as we hurtle on down the information superhighway.

Let’s back up a little. We all know about computer sound cards, those horrid things with two-operator FM chips that produce nasty noises in the background while you play games, and that have made MIDI a dirty word in multimedia circles. Well, sound cards have gotten a lot better of late, using chips with sophisticated sound engines and decent sample sets from manufacturers of high-end synth gear, but their era may be drawing to a close. Computer manufacturers are putting more audio DSP hardware on their motherboards, for things like digital recording, voice mail, and speech synthesis, and a lot of that hardware can handle music synthesis just as well. With all of that stuff already on there, adding a synth chip, maybe a reverb processor, and a couple of megabytes of sample ROM right onto the motherboard is child’s play. Communicate with the chip over an internal bus using MIDI for the control language (just like a sound card); run its output through the computer’s onboard digital-to-analog converter; throw in some MIDI File playback software, and voilà!–a synth studio in a box.

Does this cost very much? I’ve heard reports that the per-unit price to computer manufacturers of some OEM synth chips is about to drop down to about the level of a Zip cartridge, and maybe lower. And not only is the cost invisible to the user, so is the technology: no more trying to figure out if the card on the store shelf fits into the type of slot in your computer, no more IRQ conflicts or hardware jumpers. No more driver software issues either: the right drivers will be pre-installed on the computer’s hard disk when it leaves the factory. And since everything is in software, there’s no reason why manufacturers have to lock into a particular method of synthesis on their DSP chips. While today most companies are concentrating on PCM sample-playback technology, we might someday see other architectures, including subtractive, granular, real FM (like six-op with wavetables, still a wonderful way to make sound), and even our friend physical modeling finding their way onto computer motherboards.

Besides the inexorable march of technology, there have been political forces at work on this situation, which will make onboard synths even more useable. Today’s General MIDI synths, whether they’re in a rack or on a chip, use fixed sets of instrument samples stored in ROM. High-end samplers and sampling synths, on the other hand, let you stick any sound you want–from a mic, a CD, or whatever–into RAM and trigger it with a sequencer. But with RAM now so cheap, why shouldn’t GM synths be able to access different sounds too? The answer is that there is no industry-standard architecture for including external samples in a synth engine; while some samplers have the capability of using sounds stored in other samplers’ formats, there is by no means universal agreement on how sounds should be exchanged. Since GM by definition must be a universally agreed-upon standard, sampling has simply been out of the question.

But that’s about to change, thanks to a proposed addition to the General MIDI spec that has been hotly discussed for a couple of years within the MIDI Manufacturers Association and its Interactive Audio Special Interest Group, and may well be ratified by the time you read this: Downloadable Samples (DLS). The DLS spec outlines a structure for transmitting, mapping, and triggering samples that can be used by any synth manufacturer. Therefore a set of instruments, ambiences, or sound effects that conforms to the DLS spec can be used by any synth engine that also conforms. Now a musical file can contain not only MIDI instructions, but also the exact sounds required to play it.

But let’s get even cheaper. How about synthesis without any visible hardware at all? Today’s Power PC and Pentium CPUs are so fast, they can generate and process complex waveforms on the fly, needing only a DAC to turn them into real-world audio. You don’t need a DSP chip at all: put the basic waveforms of the sounds you want in RAM, and pull them out with MIDI commands. The CPU can address the RAM, as well as handle pitch changing, filtering, and other DSP in real time. If the computer is fast enough, you can address multiple sounds simultaneously. Get 24 sounds going at once, and you have a genuine General MIDI synth, without a stitch of hardware to be found.
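For the curious, here is a rough sketch in Python of what "put the waveforms in RAM and pull them out with MIDI commands" can look like. It is an illustration only, not how any shipping engine works: the 44.1 kHz rate, the single-cycle sine table standing in for a real sample, and the function names (make_wavetable, render_voice, mix) are all assumptions made up for this example.

```python
import math

SAMPLE_RATE = 44100  # assumed output rate in Hz

def make_wavetable(size=2048):
    """One cycle of a sine wave held in RAM; a stand-in for a real sample."""
    return [math.sin(2 * math.pi * i / size) for i in range(size)]

def midi_to_hz(note):
    """Equal-tempered frequency for a MIDI note number (69 = A at 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

def render_voice(table, note, velocity, num_samples):
    """Pitch changing: step through the table faster or slower per note."""
    step = midi_to_hz(note) * len(table) / SAMPLE_RATE
    gain = velocity / 127
    phase = 0.0
    out = []
    for _ in range(num_samples):
        i = int(phase)
        frac = phase - i
        a = table[i]
        b = table[(i + 1) % len(table)]
        out.append(gain * (a + frac * (b - a)))  # linear interpolation
        phase = (phase + step) % len(table)
    return out

def mix(voices):
    """Sum simultaneous voices sample by sample, scaled to avoid clipping."""
    count = len(voices)
    return [sum(frame) / count for frame in zip(*voices)]

# Three notes at once (a C major triad), 10 ms of audio at 44.1 kHz.
table = make_wavetable()
chord = [render_voice(table, note, 100, 441) for note in (60, 64, 67)]
block = mix(chord)
```

Every extra voice is another pass through the inner loop for every output sample, which is why polyphony is bought directly with CPU time.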

This "software synthesis" is getting lots of attention from some big players. It’s in Apple’s QuickTime multimedia system software, where it’s called "QuickTime Music Architecture", and uses a set of instrument samples designed by Roland. Over a year ago, you may remember, Intel announced its "Native DSP" sound engine, which was to include a synthesizer component. That software was never released, however–it was quashed, according to reports, by Microsoft, who didn’t want any of its thunder stolen at the moment it was announcing Windows 95. But now Microsoft has taken up the cause itself: it recently announced its own system, also using Roland sounds, called "DirectMusic". (Typically for the company, it announced that the new technology would be compatible with Downloadable Samples, months before that spec was approved.) Other developers have continued to work on synth engines for Intel machines; a couple are already available and several more are in the works, both by companies you’ve never heard of and by some of the better-known names in music technology.

The coolest part of software synthesis is that it’s a total no-brainer for the user. There’s no hardware to deal with, no setup procedures outside of loading the software drivers and sound sets, and no wiring besides the output cable. What’s cool for developers of content for the systems is that the engines are completely portable–even more so than General MIDI. You can literally put the entire synthesizer on a few floppies or a website and install it on any machine that has a DAC and the requisite horsepower. Now everyone in your audience is using identical synthesizer engines, regardless of who made their computer, and so getting your files to sound exactly the same on every platform – something the developers of General MIDI could wish for, but never before expect to achieve – is a snap. Adding to the coolness factor is that, since the sounds are in RAM, not ROM, replacing them is easy, so using Downloadable Samples is another no-brainer.

But there are some serious disadvantages to using software synthesis. The largest of these is that it eats up CPU horsepower: the more voices you want to synthesize, the more CPU cycles are taken up fetching and processing the waveforms. Use large samples or fast sample rates, or fancy delays and reverbs, and the problem gets worse. Since audio has to be processed in real time, other functions, such as I/O or graphics rendering, will wait, with a resultant degradation in performance. It doesn’t take much to degrade video performance to the point where frames have to be dropped, and many users will find that an unacceptable compromise. There’s no free lunch here: yes, CPUs are getting faster, but everyone in the multimedia biz thinks that the extra bandwidth belongs to them. As the machines get speedier, the demands being put on them are keeping pace, what with 3-D graphics in millions of colors, streaming digital audio, high-speed networking, and the other wonders of the modern age. If every CPU is running at 90% of capacity, asking for 20% more to do real-time synthesis may not be such a hot idea. As one observer has put it, it’s no bargain using a quarter of a $300 chip to make music, when you can do the same with a $20 dedicated DSP chip, and get all of it.

Another problem is that software synths have a hard time being played in real time: when they respond to MIDI data coming in from an external source, like a keyboard, there is often a delay in processing the data, known as latency. I’ve seen latencies from 50 milliseconds up to 300 milliseconds, and even longer in some cases, any of which is totally unacceptable in any kind of performing or recording environment. So while they may work okay for playing back pre-recorded files, software synths are not very useful for creating them.
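Where the delay comes from varies from engine to engine. One plausible contributor, assumed here purely for illustration, is the block of audio the software has to compute and queue before the DAC can play it. The sketch below works out how long such a block lasts; the 44.1 kHz rate, the buffer sizes, and the function name buffer_latency_ms are hypothetical.

```python
SAMPLE_RATE = 44100  # assumed output rate in Hz

def buffer_latency_ms(buffer_samples, num_buffers=1, sample_rate=SAMPLE_RATE):
    """Milliseconds of audio queued ahead of the DAC.

    A note played now cannot be heard until the audio already
    sitting in the queue has been played out.
    """
    return 1000.0 * buffer_samples * num_buffers / sample_rate

for size in (256, 2048, 8192):
    print(size, "samples:", round(buffer_latency_ms(size), 1), "ms")
```

Under these assumptions a 2,048-sample buffer already accounts for about 46 milliseconds, and two 8,192-sample buffers for well over 300.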

As with on-board hardware synthesis, the most common form of software synthesis is PCM sample playback, although other methods are certainly feasible, and even though no announcements have been made, at least two manufacturers have been talking about releasing other types of synthesis engines. The quality of the sounds, rather than being fixed in hardware, is adjustable–in some cases by the user. On even the fastest CPU, there’s always going to be a trade-off between sound quality and either polyphony or image quality–and in the latter case, it means most users will opt for better pictures and worse sound. And it’s not like there’s a lot of quality to spare: at their best, none of the software synth engines I have heard yet sound really good. Certainly they’re a step up from the old "Sound Smasher" cards, but what I’ve heard ranges in quality from halfway decent to seriously awful.

You might be asking at this juncture, what’s the point of all this? Why do we want to put all of this synthesis power in the hands of every computer user? Are we trying to make composers and sound designers out of everybody? Not hardly. What this is all about is trying to create a reliable delivery system for reasonable-sounding music under extremely restricted conditions: i.e., over the Internet. There’s this problem with the Internet (and with another popular delivery system, CD-ROMs), which can be summed up in one word, a word that makes anyone who’s tried to work with audio on it cringe: bandwidth.

Despite all the solemn industry pronouncements that "In the next century bandwidth will not be an issue!", bandwidth is always going to be an issue. (Remember when they said that nuclear power was going to make electricity "too cheap to meter"?) Like CPU speed, what we demand from our Internet connections will constantly keep pace with the capacity. And even when we’ve got cable modems and DVD, audio is always going to be the poor cousin, the afterthought, something to be given whatever bandwidth or disc real estate is left over when the visual folks have had their fill. Nobody likes sitting in front of their computer waiting for images to materialize (why do you think they call it the "World Wide Wait"?), and they’re going to like even less waiting for audio to download, or even for the buffers required by streaming audio to fill.

You can’t do much about compressing audio, not if you want it to sound good. Yes, there are some remarkable strides being made in conveying intelligible speech at 28,800 bps, but no one can pretend that this is an acceptable way to transmit music. Even with an ISDN or T1 line, handling more than one channel of high-quality audio at a time is next to impossible, so creating any kind of multi-channel sound environment–either to create a surround field, or to allow the user to mix multiple sources–is out of the question.

If there’s a synthesizer already in your computer, however, and you feed it MIDI data, you can avoid all of these problems. The bandwidth of MIDI is anywhere between 1/100th and 1/10,000th of what’s required for digital audio, and the fidelity of the playback is not at all dependent (assuming the receiver has decent buffering) on the speed of the connection. It’s really no more demanding than text, and so you don’t need to tell your browser to lower the music quality just to preserve the visuals any more than you would need to tell it to stop sending text because it’s slowing down the graphics!
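A back-of-the-envelope calculation shows where ratios like these come from. The sketch below compares uncompressed CD-quality stereo with a stream of MIDI notes, assuming three bytes for each note-on and each note-off and ignoring timing data, controllers, and running status. The note densities and the function names are assumptions for illustration, not measurements.

```python
CD_AUDIO_BPS = 44100 * 16 * 2  # 44.1 kHz, 16-bit, stereo = 1,411,200 bps
MODEM_BPS = 28800              # the modem speed mentioned above

def midi_stream_bps(notes_per_second, bytes_per_event=3):
    """Bits per second for a note stream: one note-on plus one note-off each."""
    return notes_per_second * 2 * bytes_per_event * 8

def audio_to_midi_ratio(notes_per_second):
    """How many times more data CD audio needs than the MIDI stream."""
    return CD_AUDIO_BPS / midi_stream_bps(notes_per_second)

# A sparse line, a busy arrangement, and a very dense score (assumed figures).
for density in (2, 20, 200):
    print(density, "notes/s:", midi_stream_bps(density), "bps,",
          "ratio 1 :", round(audio_to_midi_ratio(density)))

# CD audio needs 49 times what a 28,800 bps modem can carry.
print("CD audio vs. modem:", CD_AUDIO_BPS / MODEM_BPS)
```

At 20 notes per second the stream comes to 960 bps, roughly 1/1,500th of the audio rate and a small fraction of what even a 28,800 bps modem can move.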

If the synthesizer can use alternative sound sets of downloadable samples, then its capabilities are effectively unlimited: a clever programmer can create a sound set that will encompass everything needed for a game, movie, or interactive presentation, and either pre-load it into the receiving computer, or send it down the line incrementally, during slack periods. A lot of music and sound effects can be created from just a couple of megabytes of well-designed samples. With playback engines already onboard, either in software or hardware, this "cheap" synthesis transforms itself into something very different: the high-fidelity alternative for Internet audio.

All of this, no doubt, is leaving both quality-conscious audio engineers and MIDI purists horrified. The fact that the best we can hope for in terms of soundtracks in this brave new world of the Internet is the ability to access a few megabytes of samples accompanying a telephone-quality voice track is sobering, to say the least. For an industry that has strived for so many years to attain the highest reproduction quality, it’s pretty odd to be staring down the barrel of sonic mediocrity and calling it the future. We’ll just have to hope that as our audiences jump on the Internet, they don’t throw out their CD players and hi-fi VCRs. And as content providers, we’d better not throw out our MIDI sequencers.

Paul Lehrman had a musical score premiered at Lincoln Center last month (really!) but they forgot to put his name in the program. On the other hand, his new book, Getting Into Digital Recording, published by Hal Leonard, does have his name on it.

These materials copyright ©1997 by Paul D. Lehrman and Intertec Publishing