Friday, 16 July 2010

MUSB isochronous transfers fixed (hopefully)

So, it took a while, but I think I managed to fix the MUSB bug, the relevant commit is here.

The MUSB DMA supports 2 different transfer modes: mode-0 transfers packets one at a time, requiring CPU intervention to reload the next packet, while mode-1 is able to transfer multiple packets, before triggering an interrupt.

For every mode-0 transfer, or for the last mode-1 packet, the packet in the FIFO must be manually flushed, that is, MUSB_TXCSR_TXPKTRDY must be set. This was somehow done in musb_g_tx (musb_gadget.c:513), but incorrectly. Using the procedure described on the wiki, with a packet size of 4 (parameter -x 4), I got the following usbmon output:
S Zi:2:120:1 -115:1024:0 1 -18:0:512 512 <
C Zi:2:120:1 0:1024:5584:0 1 0:0:512 512 = de010203 de050607 de090a0b de0d0e0f de111213 de151617 de19
1a1b de1d1e1f
S Zi:2:120:1 -115:1024:5584 1 -18:0:512 512 <
C Zi:2:120:1 0:1024:6608:0 1 0:0:4 4 = df010203
S Zi:2:120:1 -115:1024:6608 1 -18:0:512 512 <
C Zi:2:120:1 0:1024:7632:0 1 0:0:4 4 = df010203
S Zi:2:120:1 -115:1024:7632 1 -18:0:512 512 <
C Zi:2:120:1 0:1024:464:0 1 0:0:4 4 = df010203
S Zi:2:120:1 -115:1024:464 1 -18:0:512 512 <
...
The first packet is 512 bytes, which is normal, and then same data was repeated again and again: df is supposed to be incremented, that is, the next packets should contain e0010203, e1010203. I tried commenting out
if (csr & MUSB_TXCSR_TXPKTRDY)
return;
as recommend by Ajay, with the same result...

I then realized that the following lines (which, BTW, do not necessarily send a zero packet, they simply ask the MUSB controller to fetch the next packet from the FIFO):
DBG(4, "sending zero pkt\n");
musb_writew(epio, MUSB_TXCSR, MUSB_TXCSR_MODE
| MUSB_TXCSR_TXPKTRDY);
were clearing all flags from MUSB_TXCSR (including things like MUSB_TXCSR_ISO or >MUSB_TXCSR_DMAEN). So I replaced these with:
DBG(4, "sending zero pkt\n");
musb_writew(epio, MUSB_TXCSR, csr
| MUSB_TXCSR_TXPKTRDY);
where csr is the current value of MUSB_TXCSR. I'm not sure if some other flags must be cleared, but at least, it works for my test case...

And the output looks better:
S Zi:2:125:1 -115:1024:0 1 -18:0:512 512 <
C Zi:2:125:1 0:1024:2608:0 1 0:0:512 512 = de010203 de050607 de090a0b de0d0e0f de111213 de151617 de191a1b de1d1e1f
S Zi:2:125:1 -115:1024:2608 1 -18:0:512 512 <
C Zi:2:125:1 0:1024:3632:0 1 0:0:12 12 = df010203 e0010203 e1010203
S Zi:2:125:1 -115:1024:3632 1 -18:0:512 512 <
C Zi:2:125:1 0:1024:4656:0 1 0:0:12 12 = e2010203 e3010203 e4010203
S Zi:2:125:1 -115:1024:4656 1 -18:0:512 512 <
C Zi:2:125:1 0:1024:5680:0 1 0:0:4 4 = e5010203
S Zi:2:125:1 -115:1024:5680 1 -18:0:512 512 <
C Zi:2:125:1 0:1024:6704:0 1 0:0:4 4 = e6010203
...
Except that 3 requests were merged into one packet (twice), which is not supposed to happen, I suppose: If the gadget driver sends a short request to the gadget controller, it probably means that it wants a short packet to be transmitted.

In any case, a bigger problem appeared, if the driver sent packets of 5 bytes, instead of 4 bytes:
S Zi:2:009:1 -115:8:0 1 -18:0:512 512 <
C Zi:2:009:1 0:8:4568:0 1 0:0:512 512 = de010203 de050607 de090a0b de0d0e0f de111213 de151617 de191a1b de1d1e1f
S Zi:2:009:1 -115:8:4568 1 -18:0:512 512 <
C Zi:2:009:1 0:8:4576:0 1 0:0:15 15 = df010203 e0010203 e1010203 de0d0e
S Zi:2:009:1 -115:8:4576 1 -18:0:512 512 <
C Zi:2:009:1 0:8:4584:0 1 0:0:10 10 = e1e20203 e3010203 e301
S Zi:2:009:1 -115:8:4584 1 -18:0:512 512 <
C Zi:2:009:1 0:8:4592:0 1 0:0:15 15 = e4010203 e5010203 e6010203 e60102
S Zi:2:009:1 -115:8:4592 1 -18:0:512 512 <
C Zi:2:009:1 0:8:4600:0 1 0:0:5 5 = e7010203 e7
S Zi:2:009:1 -115:8:4600 1 -18:0:512 512 <
C Zi:2:009:1 0:8:4608:0 1 0:0:5 5 = e8010203 e8
...
The second packet data is badly corrupted, it should be df010203 04e00102 0304e101 020304. It seems like the DMA engine is not able to properly write unaligned data to the FIFO.

After seeing that problem (added to the fact that requests should not be merged together in the first place), I realized that MUSB_TXCSR_TXPKTRDY was probably not set at the right place in the code: it should be set right after the DMA finishes copying the data to the FIFO.

This happens in musbhsdma.c, function dma_controller_irq. And the code was already there, it's just that, for some reasons, it was only enabled for the host mode, and not the peripheral mode (if (devctl & MUSB_DEVCTL_HM), line 533). I enabled that code path, with a fix for DMA mode 0: it is useless to clear MUSB_TXCSR_DMAMODE in that case, and MUSB_TXCSR_DMAENAB may need to be cleared before clearing MUSB_TXCSR_DMAMODE, but the flag needs to be set again, otherwise this confuses the driver...

Then, I disabled the code setting MUSB_TXCSR_TXPKTRDY in musb_g_tx, and replaced it with another piece of code that forces the packet to be sent, by setting MUSB_TXCSR_FLUSHFIFO, and, it seems to work!
S Zi:2:016:1 -115:8:0 1 -18:0:512 512 <
C Zi:2:016:1 0:8:1328:0 1 0:0:512 512 = de010203 de050607 de090a0b de0d0e0f de111213 de151617 de191a1b de1d1e1f
S Zi:2:016:1 -115:8:1328 1 -18:0:512 512 <
C Zi:2:016:1 0:8:1336:0 1 0:0:5 5 = df010203 df
S Zi:2:016:1 -115:8:1336 1 -18:0:512 512 <
C Zi:2:016:1 0:8:1344:0 1 0:0:5 5 = e0010203 e0
S Zi:2:016:1 -115:8:1344 1 -18:0:512 512 <
C Zi:2:016:1 0:8:1352:0 1 0:0:5 5 = e1010203 e1
...


Long story short, the webcam works, even with DMA enabled... And g_ether still works (i.e., I didn't break everything else)...

The latest code is available here: the kernel is now based on a 2.6.34 tree.