
How does USB audio transfer work ?

post #1 of 22
Thread Starter 
Just trying to get a handle on how exactly it works behind the scenes. I am finding bits and pieces of information but nothing to put the whole picture together. In simple terms -

Between the host (PC) and the device (sound card/DAC), which side controls the transfer and the timing?

Does the device send a request to the PC every time it is ready for more data, or is it just a synchronized/timed transfer that keeps happening at a fixed interval?

Does the PC send data to the device one sample at a time or in batches (many samples together)?

post #2 of 22
In short, USB audio protocols are a one-way street: the host sends the data, like S/PDIF, with no possibility of retransmission.

I have never found the answer to the timing issues. Is the audio stream really like S/PDIF on the bus, or does the DAC have to reclock it? If it reclocks, how does it avoid underflow/overflow errors?

I understand some devices have their own proprietary communication using normal USB data protocols. USB audio streaming, by contrast, means any device can be used without special drivers.
post #3 of 22
In addition to a variety of proprietary solutions from vendors there is a generic Device Class Specification for USB in version 1.0 and version 2.0 plus some associated format specs.

While the transport is isochronous and you do not get retransmissions there are modes that are not just one way with the source/PC sending data to the audio device.

In standard isochronous mode the receiver must estimate the playback rate by averaging over the incoming data. The master clock is a multimedia timer in the PC and the USB audio receiver uses a buffer with a software PLL to adapt to whatever the average data rate from the PC is.
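A minimal sketch of that averaging idea, in Python for illustration only. The function name, the smoothing factor `alpha`, and the use of a simple exponential average in place of a real software PLL are all assumptions, not how any particular receiver chip does it.

```python
def estimate_rate(samples_per_frame, frames_per_second=1000, alpha=0.01):
    """Estimate the host's average sample rate in Hz.

    samples_per_frame: number of samples received in each 1 ms USB frame.
    alpha: smoothing factor of the exponential average (hypothetical value).
    """
    avg = float(samples_per_frame[0])
    for n in samples_per_frame[1:]:
        # Move the running average a small step toward the newest count.
        avg += alpha * (n - avg)
    return avg * frames_per_second


# A 48 kHz stream delivers exactly 48 samples in every frame.
print(estimate_rate([48] * 100))
```

A real receiver would use the estimate to steer its DAC clock slowly enough that the buffer neither drains nor fills.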

With async mode there is a secondary connection back from the device to the PC where the device can tell the PC how much data to send, so it effectively slaves the PC to its local clock.


post #4 of 22
Originally Posted by thomaspf View Post
With async mode there is a secondary connection back from the device to the PC where the device can tell the PC how much data to send, so it effectively slaves the PC to its local clock.
Has anybody implemented that?
post #5 of 22
My best guess is that units such as the 0404 USB and other "pro audio" USB devices that have the capability to record/play at 96/192 kHz must use such an arrangement, as they require device specific drivers and operate above the max sample rate defined in the USB Audio standard.
post #6 of 22

Everything you wanted to know about USB audio but were afraid to ask

Goodsound, others;

Basically, when any USB device is plugged into a computer, the computer asks the device for what is called its enumeration. This is a big table of information that declares what the device can do, etc. You can look at some of this information on a PC with USBView and on Macs with USB Prober.

In audio without drivers there are basically three methods of isochronous (ISO) transmission:

1) Sync: not used much. Basically the device strictly tracks the computer on the sample rate.

2) Adaptive: here the data is buffered and the sample clock is updated every so often to match the computer.

3) Async: here the device tells the computer what rate to send the data at with a feedback packet. It is kind of like flow control: a 3 byte data packet tells the computer to slow down or speed up the data flow. Most promising.
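To make the 3 byte feedback packet in Async mode a bit more concrete, here is a hedged Python sketch. It assumes the full-speed encoding as I understand it from the USB 2.0 specification: the desired rate in samples per 1 ms frame, stored as an unsigned 10.14 fixed-point number in three little-endian bytes. The function names are made up for illustration.

```python
def encode_feedback_10_14(samples_per_frame):
    """Pack a samples-per-frame rate into 3 bytes (assumed 10.14 fixed point)."""
    value = round(samples_per_frame * (1 << 14))
    if not 0 <= value < (1 << 24):
        raise ValueError("rate does not fit in 24 bits")
    return value.to_bytes(3, "little")


def decode_feedback_10_14(data):
    """Unpack the 3 feedback bytes back into samples per frame."""
    return int.from_bytes(data, "little") / (1 << 14)


packet = encode_feedback_10_14(44.1)
print(packet.hex(), decode_feedback_10_14(packet))
```

The device would send a slightly larger or smaller value than nominal to make the computer speed up or slow down.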


If you write a driver then you can do just about anything, but most are doing what is called Block Mode (0404 USB). The computer sends down packets and the device spills them out. Block mode has a low priority and can therefore cause skipping and the like, which some of you have seen.


Ok back to ISO stuff...

In the enumeration you declare what is called the rate of the SOF (Start of Frame) packet. This is a small packet, usually sent every 1ms for audio, that determines the timing from the computer in Adaptive mode. In Async mode I use it to determine how much data is in my buffer and whether I should slow down or speed up the data flow.

Typically what happens in all operating systems is that a packet is sent between each of the SOF frames. Since these happen every 1ms let's look at what we have.

OK, take 44.1/16 bits. That means we need 4 bytes of data per stereo sample (32 bits, or 16 bits by 2). Since each SOF is 1ms, we have 1000 SOFs a second, so we divide the 44,100 sample rate by 1000. That gives (44,100/1000)*4 bytes = 176.4 bytes per frame.

Well, it is impossible to send a fraction of a sample, so what the computer does is send 9 packets of 176 bytes (44 samples) and then 1 packet of 180 bytes (45 samples):

(176*9 + 180)/10 = 176.4
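The same arithmetic as a small Python sketch. The accumulator approach and the function name are my own illustration; a real host may order the long and short packets differently, but the average has to come out the same.

```python
def packet_sizes(sample_rate, bytes_per_sample_pair=4, frames=10, sof_per_second=1000):
    """Return the packet size in bytes for each of the next `frames` USB frames."""
    sizes = []
    acc = 0  # leftover "fractional samples", counted in units of 1/sof_per_second
    for _ in range(frames):
        acc += sample_rate
        samples = acc // sof_per_second   # whole samples that fit in this frame
        acc -= samples * sof_per_second   # carry the remainder to the next frame
        sizes.append(samples * bytes_per_sample_pair)
    return sizes


sizes = packet_sizes(44100)
print(sizes, sum(sizes) / len(sizes))
```

At 48 kHz the remainder is always zero, so every packet is the same 192 bytes.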

In the case of Adaptive, which is the most common type of controller (Benchmark, most of the PCM27xx series and others), the controller takes the SOF frame, compares it to its internal clock, and then adjusts the serial stream to the DAC to make sure that the internal buffers don't underrun or overrun.

In Async mode we actually compare the input DMA pointer (iDMA) to the outgoing DMA pointer (oDMA) going to the DAC and determine if we need the computer to stream faster or slower.
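A rough Python sketch of that pointer comparison, under stated assumptions: a ring buffer, a target fill level of one half, and made-up values for the dead band (`margin`) and the rate adjustment (`step`). None of these numbers come from a real design.

```python
def buffer_fill(write_ptr, read_ptr, size):
    """Bytes waiting in a ring buffer, given the input and output DMA pointers."""
    return (write_ptr - read_ptr) % size


def feedback_request(write_ptr, read_ptr, size, nominal=44.1, step=0.05, margin=0.1):
    """Samples per frame to ask the host for, based on how full the buffer is."""
    fill = buffer_fill(write_ptr, read_ptr, size) / size
    if fill < 0.5 - margin:
        return nominal + step   # buffer draining: ask the host to send faster
    if fill > 0.5 + margin:
        return nominal - step   # buffer filling up: ask the host to send slower
    return nominal              # inside the dead band: hold the nominal rate


print(feedback_request(write_ptr=100, read_ptr=0, size=1000))
```

The DAC clock itself never moves here; only the requested data rate does.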

So basically the reason that Async is better than Adaptive is that in Async the clock does not change, only the rate at which the data reaches the device. This is not to say that either is really better than the other in practice; there are many other attributes that determine how good a device may be.


Jitter... mainly confused and stuff.

First, there is no jitter in the USB stream like there is in SPDIF, because there is no clock riding over the data like there is in SPDIF products. What all clocked audio devices do have, both SPDIF and USB, is jitter that is called intrinsic. Intrinsic jitter is caused by the internal clocking of the audio receiver (SPDIF, USB, Firewire, etc.), by the DAC, and by the power supply.

Much of the jitter caused by USB audio devices is due to the USB receiver itself. See, many of them, including the TAS1020, TUSB3200 and PCM27xx devices, take some multiple of the required USB 12MHz clock to determine the serial clock for the DAC. Well, try to get 44.1, 48, 88.2 and 96K from a 12MHz clock and you can see the problem. Many of these use clock multipliers and dividers of great complexity to do this. This causes most of the jitter that people write about when they say that USB audio is bad. Well... that is not really true.
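To see why the 12MHz clock is awkward, here is a small Python sketch that reduces each sample rate to its exact ratio against 12MHz. It only looks at the raw sample rates; real chips derive a much faster master clock, so treat this as an illustration of the arithmetic, not of any specific part.

```python
from fractions import Fraction

USB_CLOCK_HZ = 12_000_000


def ratio_from_usb_clock(target_hz):
    """Exact multiplier/divider ratio needed to reach target_hz from 12 MHz."""
    return Fraction(target_hz, USB_CLOCK_HZ)


for fs in (44_100, 48_000, 88_200, 96_000):
    r = ratio_from_usb_clock(fs)
    print(f"{fs} Hz = 12 MHz * {r.numerator} / {r.denominator}")
```

The 48K family comes out of a plain divider, while the 44.1K family needs a multiply by 147 as well, which is where the PLL and the complex dividers come in.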

Also with Async mode and using either an ARM controller or the TAS1020/TUSB3200 you can source the Master Clock (MCLK) into the device using a very low jitter clock and derive all the Audio Clock signals from that Clock lowering the jitter by a substantial amount.

Or you can do what other companies on the SPDIF side have done for a long time: use a PLL/VCXO to rederive the clock, and buffer the data via FIFOs and other things, to clean up the signal between the USB audio receiver and the DAC or processor.


Part of the problem with USB Audio at this point is there isn't really a controller you can buy off the shelf that already implements all this stuff without programming. The main reason being the enumeration. The amount of options is really staggering. Also to make a generic controller that works with anything is really a task that may never pan out.

With, say, an ARM processor or the TAS1020/TUSB3200 you can program your way into some really good designs. This is no easy task, as USB is not something simple to deal with. There is a lot going on to make one of these work really well. But expect more and more companies to climb on board with this technology in the near future.

Five years ago this month I made my first USB DAC. I thought this is silly, nobody is ever going to buy this thing. My first listen through the Cosecant and I knew it was going to be hot.

post #7 of 22
Thread Starter 
Hi Gordon,
Thanks for your reply. It was very useful.

I am still reading a little bit about USB and while my knowledge about the whole interface is still in its infancy I was wondering –
Now that the bandwidth and data payload capacity have increased with USB 2.0 (and USB 3.0 will roll out next year with even more), would it make any sense to use the Interrupt mode of transfer for audio purposes ? It does seem to have some sort of a stop/stall/retry mechanism to delay/resend the transmission if it fails (out buffer not ready). That way the device and host can keep running at their own independent pace i.e. device could have its own clock.

Audigy 2 NX supposedly uses Asynchronous mode but I have so far only “heard” that it is async. I don’t know for sure if it really is or not. It uses the Philips ISP1581BD controller. Not sure if that supports async or not.

I am trying to find out more about Asynchronous mode but I can’t seem to find anything online. I guess I will have to look at the “USB Specification” for that.
post #8 of 22

Great post, that was extremely informative.

While I'm no expert on modern digital design techniques I did a fair amount of machine code and assembly language programming about twenty years ago on microprocessors as well as microcontrollers. I'm fairly familiar with DAC and ADC devices and wrote an assembler successive approximation ADC routine for a 6809 processor that worked very well. The hardest part of that code was to make sure that every possible subroutine in the code took exactly the same number of clock cycles to complete.

It seems to me that configuring an audio device to "spoof" a USB hard drive might be one way to go. Would such an idea be feasible or is that something that you were discussing above and I just missed the point?
post #9 of 22
Good Sound,

Async mode is like this... You have a data pipe, the same as for Sync and Adaptive, which carries the data that will end up at the DAC. This data pipe is an ISO OUT endpoint. All the references are from the computer's point of view, so OUT is to the device and IN is from the device. Anyways... for Async there is also an ISO IN endpoint pipe that is used for what they call Feedback but what I would call flow control. With it you can tell the computer to speed up, slow down, or stay even.

Reading the USB Specification is like reading the Urantia Book. The preface should say about the same thing. We will make up new terms to describe the concepts of this Specification to further confuse anyone trying to read it!

Anyways like anything making an Async device doesn't mean it's going to work any better than say another device. I have a $50 async dac sitting here and it's terrible. I am not sure why it's even done async.

The point is that if you can remove the internally generated clocks from the Audio Clock (i.e I2S, Left Justified, Right Justified even AC97) then you can supply a low jitter Master Clock into the system and get better results.


Let's see.... the first computer I built had 128 bytes of memory. I had a #10 Zilog Z80 development system with a wire-wrapped backplane and 8" single-sided diskettes. I wrote CP/M floppy drivers that would use NMI interrupts to read and write sectors one at a time. I have written ROM BIOS for two PCs I developed for the government, as well as designing the motherboards, power supplies and many of the boards.

Coding is so much easier now than it ever was. In the past you had to do everything with as little as possible. Twenty years ago I could tell you op-codes and T states for at least 5 or 6 different processors.

Now I prefer Analog mostly, it's much more fun and harder to do.

post #10 of 22

It seems to me that configuring an audio device to "spoof" a USB hard drive might be one way to go. Would such an idea be feasible or is that something that you were discussing above and I just missed the point?
This is what a lot of PRO stuff does, like the EMU 0404 USB. But the problem is that it would require a driver to do this. Also, as I stated above, Interrupt and Bulk both have a much lower priority than ISO mode devices do. Therefore many of these products tend to slip up and miss samples when used on certain machines with internal hubs and devices that interact with them.

post #11 of 22
Thread Starter 
Originally Posted by Wavelength View Post
Reading the USB Specification is like reading the Urantia Book. The preface should say about the same thing. We will make up new terms to describe the concepts of this Specification to further confuse anyone trying to read it!
I don't feel as bad now that you mention it. This is like my n'th attempt at reading anything about USB and I was almost on the verge of giving up(again)!

Originally Posted by Wavelength View Post
Interrupt and Bulk both have a much lower priority than does ISO mode devices
I was of the impression that Interrupt mode can also be made high priority and is the only mode, other than isochronous, that gets "guaranteed latency" on the bus. Maybe even that "high" priority is lower than iso ?

Anyway, let's throw in a real example just for the fun of it, and to make it a little less abstract. I downloaded USBView and here is a screenshot of the Audigy 2 NX in action.

As you can see it has opened 4 endpoints - 1 Interrupt, 1 Control and 2 Isochronous. But here's the thing –

At idle, i.e. no playback or recording happening, it has only the Interrupt endpoint open.

When only in playback mode it opens the Control and one Isochronous endpoint. This is the iso endpoint that doesn't have any payload size and has a bsyncaddress of 0x09h.

While in record mode only it opens only the Control endpoint. The iso endpoint is gone. I forgot whether it closes the Interrupt endpoint or not, but most likely not.

And while in full duplex mode (play & rec at the same time) it opens only the Control endpoint and the two iso endpoints (as seen in the screenshot).

I was hoping to see the value of bmAttributes for each of the endpoints.
So what and how exactly is everything going about getting done ?
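For anyone who does get at the raw bmAttributes byte of an endpoint descriptor, here is a hedged Python sketch of how I understand the USB 2.0 bit layout: bits 1..0 give the transfer type, and for isochronous endpoints bits 3..2 give the synchronization type and bits 5..4 the usage type. The function name is made up, and older devices may leave the usage bits at zero.

```python
TRANSFER_TYPES = ["Control", "Isochronous", "Bulk", "Interrupt"]
SYNC_TYPES = ["No Synchronization", "Asynchronous", "Adaptive", "Synchronous"]
USAGE_TYPES = ["Data", "Feedback", "Implicit feedback data", "Reserved"]


def decode_bm_attributes(value):
    """Return (transfer type, sync type, usage type) for an endpoint's bmAttributes."""
    transfer = TRANSFER_TYPES[value & 0x03]
    if transfer != "Isochronous":
        # The sync and usage fields only apply to isochronous endpoints.
        return (transfer, None, None)
    sync = SYNC_TYPES[(value >> 2) & 0x03]
    usage = USAGE_TYPES[(value >> 4) & 0x03]
    return (transfer, sync, usage)


print(decode_bm_attributes(0x05))
print(decode_bm_attributes(0x09))
```

An async playback endpoint would decode as Isochronous/Asynchronous, and an adaptive one as Isochronous/Adaptive.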

I completely agree about having a master clock for the DAC and then deriving the "rest" of the clocks from it instead of the other way around. I would go a little further and say there could be two master clocks: one for 44.1kHz and one for all the rest of the sampling rates that are multiples of 48kHz.

On that subject, I see that the Audigy 2 NX has 3 oscillators on board (markings on the case in parentheses):
- 12MHz (12.000N3H)
- 24MHz (24x5H3)
- 3.686MHz (C3.6864LOJ)
There's also probably another crystal/oscillator tucked in somewhere, probably a 3.xx MHz one again.

I know it doesn't end here. The quality of the clocks, dacs, opamps, power supply go a long way in making a good (or bad) usb solution, but I thought I'd bring this up as an exercise in usb audio if not anything else.
post #12 of 22

If you scroll down in USBView you are going to see a bunch more stuff.

It's hard to say about the oscillators. To support 48/96 you typically need a 24.576MHz clock. But for 44.1/88.2 you need 22.5792MHz.
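A quick Python check of why those two frequencies are the usual choices: each one is a whole multiple of the sample rates in its own family and of none in the other. The helper name is just for illustration.

```python
def oversampling_factor(master_hz, sample_rate_hz):
    """Return master/sample rate if it divides exactly, otherwise None."""
    quotient, remainder = divmod(master_hz, sample_rate_hz)
    return quotient if remainder == 0 else None


for master in (24_576_000, 22_579_200):
    for fs in (44_100, 48_000, 88_200, 96_000):
        print(master, fs, oversampling_factor(master, fs))
```

With an exact integer ratio the audio clocks can come from a simple divider off the oscillator.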

For USB you need 12MHZ and many controllers can use a PLL/DLL multiplier and complex dividers to get just about any Fs rate you may want.

post #13 of 22
Thread Starter 
Actually, that's all it's displaying. There is nothing below that.
post #14 of 22
Did you install the Creative driver, or are you just connecting the 2NX and working with the Windows system driver?

With usbaudio.sys you should find traffic on 2 isochronous endpoints during playback. One from the PC to the device and one the other way round for the flow control.


post #15 of 22
Thread Starter 
Yes, the Creative driver is installed and everything else checks out fine. The sound card works perfectly otherwise.