Topics on this page:

  • DV, DVCAM, DVCPRO -- What is DV? What's the difference between DV, DVCAM, and DVCPRO? Digital8? How good are the DV formats compared to other formats? What are the DV artifacts I keep hearing about?
  • Digital-S/D-9, DVCPRO50, DVCPROHD100, D-9HD -- What are Digital-S and DVCPRO50? Four codecs for HD?
  • 4:2:2, 4:1:1, 4:2:0 -- What are 4:2:2, 4:1:1, and 4:2:0 anyway? Why does PAL DV use 4:2:0? Can I chromakey with 4:1:1? Can I use 4:1:1 DV sources for upconversion to HDTV?
  • 1394/FireWire -- What is 1394 and/or "FireWire"? Why are DV and 1394 always discussed together? What does a 1394 connection do for me? Is 1394 that much better than Y/C or component analog?
  • locked vs unlocked audio -- What's the difference between locked and unlocked audio? Will unlocked audio hurt me? How do I deal with it? How do I intermix locked and unlocked audio? Does unlocked audio explain why my audio loses sync in Adobe Premiere?

by Adam J. Wilt

What is DV?
DV is an international standard created by a consortium of 10 companies for a consumer digital video format. The companies involved were Matsushita Electric Industrial Corp (Panasonic), Sony Corp, Victor Corporation of Japan (JVC), Philips Electronics, N.V., Sanyo Electric Co. Ltd, Hitachi, Ltd., Sharp Corporation,  Thomson Multimedia, Mitsubishi Electric Corporation, and Toshiba Corporation. Since then others have joined up; there are now over 60 companies in the DV consortium.

DV, originally known as DVC (Digital Video Cassette), uses a 1/4 inch (6.35mm) metal evaporate tape to record very high quality digital video. The video is sampled at the same rate as D-1, D-5, or Digital Betacam video -- 720 pixels per scanline -- although the color information is sampled at half the D-1 rate: 4:1:1 in 525-line (NTSC), and 4:2:0 in 625-line (PAL) formats. (See below for a discussion of color sampling.)

The sampled video is compressed using a Discrete Cosine Transform (DCT), the same sort of compression used in motion-JPEG. However, DV's DCT allows for more local optimization (of quantizing tables) within the frame than do JPEG compressors, allowing for higher quality at the nominal 5:1 compression factor than a JPEG frame would show.

DV uses intraframe compression: Each compressed frame depends entirely on itself, and not on any data from preceding or following frames. However, it also uses adaptive interfield compression; if the compressor detects little difference between the two interlaced fields of a frame, it will compress them together, freeing up some of the "bit budget" to allow for higher overall quality. In theory, this means that static areas of images will be more accurately represented than areas with a lot of motion; in practice, this can sometimes be observed as a slight degree of "blockiness" in the immediate vicinity of moving objects, as discussed below.
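As a rough illustration of the adaptive interfield idea, here is a minimal Python sketch. It is not the DV codec's actual decision logic: the function name, the difference measure, and the threshold are all assumptions made up for this example. It takes one 8x8 block of luma samples from an interlaced frame, measures how much the two fields disagree, and decides whether to compress them together or separately.

```python
def choose_block_mode(block, threshold=64):
    """Pick 'together' or 'separate' for one 8x8 block of an interlaced frame.

    block: 8 rows of 8 luma values; even rows come from one field,
    odd rows from the other. threshold is a hypothetical tuning value.
    """
    # Sum of absolute differences between each even row and the odd row below it.
    field_difference = sum(
        abs(a - b)
        for top, bottom in zip(block[0::2], block[1::2])
        for a, b in zip(top, bottom)
    )
    return "together" if field_difference <= threshold else "separate"


# A static block: both fields saw the same picture.
static = [[100] * 8 for _ in range(8)]
# A moving edge: the second field was captured after the object had moved.
moving = [[100] * 8 if row % 2 == 0 else [200] * 8 for row in range(8)]

print(choose_block_mode(static))  # together
print(choose_block_mode(moving))  # separate
```

A real encoder would make this choice with its own criteria; the point is only that blocks with little inter-field difference are the ones that free up bit budget.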

DV video information is carried in a nominal 25 megabit per second (Mbps) data stream. Once you add in audio, subcode (including timecode), Insert and Track Information (ITI), and error correction, the total data stream comes to about 36 Mbps. Roger Jennings' paper on the Adaptec website runs through the detailed numbers.
 

What's the difference between DV, DVCAM, and DVCPRO?
Not a lot! The basic video encoding algorithm is the same between all three formats. The VTR sections of the US$20,000 DVCAM DXC-D130 or  US$17,000 DVCPRO AJ-D700 cameras will record no better an image than the lowly DV format DCR-VX1000 at US$4,000 (please note: I am not saying that the camera section and lens of the VX1000 are the equals of the high-end pro and broadcast cameras: there are significant quality differences! But the video data recorded in all three formats is essentially identical, though there may be minor differences in the actual codec implementations). A summary of differences (and similarities) is tabled in Technical Details.

The consumer-oriented DV uses 10 micron tracks in SP recording mode. Newer camcorders offer an LP mode to increase recording times, but the 6.7 micron tracks make tape interchange problematic on DV machines, and prevent LP tapes from being played in DVCAM or DVCPRO VTRs. Sony's DVCAM professional format increases the track pitch to 15 microns (at the loss of recording time) to improve tape interchange and increase the robustness and reliability of insert editing. Panasonic's DVCPRO increases track pitch and width to 18 microns, and uses a metal particle tape for better durability. DVCPRO also adds a longitudinal analog audio cue track and a control track to improve editing performance and user-friendliness in linear editing operations.
 

Digital8?
Sony's Digital8 uses DV compression atop the existing Video8/Hi8 technological base. Digital8 records on Video8 or Hi8 tapes, but these run at twice their normal speed and thus hold half the time listed on the label. Digital8 will also play back existing Video8 and Hi8 tapes, even over 1394/i.link, allowing such tapes to be read into NLEs (at least, those for which the lack of timecode is not an issue -- batch capture utilities are unlikely to work, since Video8/Hi8 timecodes are not sent across the 1394 connection).

Digital8 is a camcorder-only format as of Spring 1999; no VTRs are expected. It appears to be the 8mm division's way of keeping its customer base from defecting to DV. By leveraging the massive investments of 15 years in 8mm analog camcorders and transports, the unit cost of Digital8 gear is kept very low, roughly half of what a comparable DV camcorder would cost, and its ability to play back legacy analog tapes is worthwhile for those with large libraries of 8mm.

All Digital8 camcorders can record from the analog inputs (at least outside the EU), and all are equipped with i.LINK ports for digital dubbing and NLE connections.

How good are the DV formats compared to other formats?
DV formats are typically reckoned to be equal to or slightly better than Betacam SP and MII in terms of picture quality (however, DV holds up better over repeated play cycles, where BetaSP shows noticeable dropout). They are a notch below Digital-S and DVCPRO50, which are themselves a (largely imperceptible) notch below Digital Betacam, D-1, and D-5. They are quite a bit better than 3/4" U-matic, Hi8, and SVHS.

On a scale of 1 to 10, where 1 is just barely video and 10 is as good as it gets, I would arrogantly rate assorted formats as follows:

D-5 (10-bit uncompressed digital): 10
D-1 (8-bit uncompressed digital): 9.9
Digital Betacam, Ampex DCT: 9.7
D-9 (Digital-S), DVCPRO50: 9.6
DV, DVCAM, D-7 (DVCPRO): 9
MII, Betacam SP: 8.9
1" Type C: 8.7
3/4" SP: 6.5
3/4", Hi8, SVHS: 5
Video 8, Betamax: 4
VHS: 3
EIAJ Type 1, Fisher-Price Pixelvision: 1

I had previously placed D-2 and D-3 uncompressed composite digital formats just below BetaSP, lower than any of the component formats. My feeling was that while D-2 and D-3 are excellent first-generation formats for composite analog playback and NTSC broadcast, the compositing of color with luminance (which includes a color bandwidth limitation even more severe than DV or BetaSP employ) makes clean multigeneration and multi-layer image compositing problematic at best (even such simple things as adding titles).

However, I was severely upbraided by several folks with extensive digital composite experience, who all rated D-2 and D-3 between DV and DigiBeta. If you've got a high-end all-digital postproduction chain, the quality in these formats holds up over multiple generations extremely well, much better than any analog format, be it component or composite. While this is certainly true, if you don't have that all-digital pathway, I'm doubtful about how they would fare... so assume that D-2 and D-3 fall somewhere in the range between 1" and DigiBeta, and go have a look for yourself!

I've also moved 1" / BetaSP / DV formats down a bit numerically, though the relative rankings are preserved. Again, folks who live in high-end digital suites all day suggested this, and I have to agree. Bear in mind that my perceptions are largely predisposed to see BetaSP quality as pretty darned good; most of my work has been in analog component and Y/C editing with analog Y/C monitoring on PVM-series monitors. But after you sit in front of analog component or digital monitoring using BVM or Panasonic broadcast-grade monitors, your attitudes start to adjust upwards, and you start to discern differences between the merely very good stuff and the truly excellent stuff a bit more readily!

What are the DV artifacts I keep hearing about?
DV artifacts come in three flavors: mosquito noise, quilting, and motion blocking. Other picture defects encountered are dropouts and banding (a sign of tape damage or head clogging).

The most noticeable spatial artifacts are feathering or mosquito noise around (typically) diagonal fine detail. These are compression-induced errors usually seen around sharp-edged fine text, dense clusters of leaves, and the like; they show up as pixel noise within 8 pixels of the fine detail or edge causing them. The best place to look for them is in fine text superimposed on a non-black background. White on blue seems to show it off best. The magnitude of these errors and their location tends to be such that if you monitor the tape using a composite video connection, the artifacts will be masked by dot-crawl and other composite artifacts.

A spatial quilting artifact can also be seen on certain diagonals -- typically long, straight edges about 20 degrees off of the horizontal. These are minor discontinuities in the rendering of the diagonal as it passes from one DCT block to the next; so minor that they're usually invisible. Watching such diagonals during slow pans is often the only way to see the artifact.

Motion blocking occurs when the two fields in a frame (or portions of the two fields) are too different for the DVC codec to compress them together. "Bit budget" must be expended on compressing them separately, and as a result some fine detail is lost, showing up as a slight blockiness or coarseness of the image when compared to the same scene with no motion. Motion blocking is best observed in a lockdown shot of a static scene through which objects are moving: in the immediate vicinity of the moving object (say, a car driving through the scene), some loss of detail is seen. This loss of detail travels with the object, always bounded by DCT block boundaries. However, motion blur in the scene usually masks most of this artifact, making this sort of blocking hard to see in most circumstances.

Finally, banding or striping of the image occurs when one head of the two on the scanner is clogged or otherwise unable to recover data. The image will show 10 horizontal bands (12 in PAL countries), with every other band showing a "live" picture and the alternate bands showing a freeze frame of a previous image or of no image at all (or, at least in the case of the JVC GR-DV1u, a black-and-white checkerboard, which the frame buffers appear to be initialized with).  Most often this is due to a head clog, and cleaning the heads using a standard manufacturer's head cleaning tape is all that's required. It can also be caused by tape damage, or by a defective tape. If head cleaning and changing the tape used don't solve it, you may have a dead head or head preamp; service will be required.

What are Digital-S and DVCPRO50?
JVC's D-9 (formerly known as Digital-S) and Panasonic's  DVCPRO50 use two DV codecs in parallel. The tape data rate is doubled to 50 Mbps (video) and the compression work is split between the two codecs. The result is a 4:2:2 image compressed about 3.3:1. It's visually lossless and utterly gorgeous. Think of Digital Betacam at a bargain price.
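The "about 3.3:1" figure, and the nominal 5:1 quoted earlier for 25 Mbps DV, can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check, not a spec calculation: it assumes 8-bit samples, 480 active lines per 525-line frame, and the per-line sample counts given in the sampling discussion further down this page (720 luma; 180 or 360 for each color-difference signal).

```python
FRAME_RATE_525 = 30000 / 1001  # 525/59.94 frame rate, about 29.97 frames/sec
ACTIVE_LINES_525 = 480         # assumption: active picture lines recorded per frame
BITS_PER_SAMPLE = 8            # assumption: 8-bit samples


def uncompressed_mbps(luma_per_line, chroma_per_line):
    """Raw video bit rate in Mbps; chroma_per_line is per color-difference signal."""
    samples_per_line = luma_per_line + 2 * chroma_per_line
    bits_per_frame = samples_per_line * ACTIVE_LINES_525 * BITS_PER_SAMPLE
    return bits_per_frame * FRAME_RATE_525 / 1e6


def compression_ratio(luma_per_line, chroma_per_line, video_mbps):
    """Ratio of the raw bit rate to the recorded video bit rate."""
    return uncompressed_mbps(luma_per_line, chroma_per_line) / video_mbps


dv = compression_ratio(720, 180, 25)    # 4:1:1 at 25 Mbps
dv50 = compression_ratio(720, 360, 50)  # 4:2:2 at 50 Mbps
print(f"DV: {dv:.1f}:1, D-9 / DVCPRO50: {dv50:.1f}:1")
```

Under those assumptions the 25 Mbps formats land at roughly 5:1 and the 50 Mbps formats at roughly 3.3:1, matching the nominal figures.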

JVC's D-9 uses the 1/2" SVHS form factor for tapes and VTRs, although the tape cassette itself is more robust and the transport is equipped with sapphire guide roller flanges and tape cleaner blades and a new scanner design. One of the D-9 players will also play back analog SVHS tapes, allowing its use for editing existing libraries of SVHS tapes as well as newer D-9 footage. Head life (so far, in on-air broadcast usage) is well in excess of 4000 hours; equipment cost is very low (comparable to 25 Mbps DVCAM or DVCPRO); and maintenance expenses are well below those of the Betacam decks that D-9 is typically displacing. So far only JVC is supporting this format, which has resulted in a less-than-headlong rush by the video community to embrace it. Watch it, though; it's hot. If you're doing high-end EFP on a budget, this is the format to use.

Panasonic's DVCPRO50 uses the same DVCPRO tapes and transports as its 25 Mbps DVCPRO products (there is also a 93-minute DVCPRO50 tape specifically for the AJ-D950A VTR, which Panasonic says should only be used in DVCPRO50 mode. When using standard DVCPRO tapes, the maximum recording time is about 61 minutes since the P123L cassette is being run twice as fast).  DVCPRO50 VTRs will also play back DVCPRO tapes.

The 900-series DVCPRO50 kit is real jack-of-all-trades stuff. The AJ-D910WA camcorder (US$24,200) will record either DVCPRO or DVCPRO50, in either 4:3 or true 16:9 modes. The AJ-D950A VTR (US$25,000) records and plays back either DVCPRO or DVCPRO50, and additionally is switchable between 525/59.94 (NTSC) and 625/50 (PAL) formats. The only thing you give up is miniDV cassette playback; even with the adapter the 950 won't read the tiny tapes. Fortunately the AJ-D940 DVCPRO50 player, US$19,500 or so, will play back those miniDV tapes, and offers a wider range of slo-mo speeds in the bargain.

Unlike D-9, second-sourcing is available from Philips, Hitachi, and Ikegami.

The DVCPRO50 kit is also a lot more portable and lightweight than D-9, so it's the format of choice if you're doing high-end EFP with a somewhat bigger budget and you want to keep your camera operators from wearing out as quickly!

Panasonic also has DVCPRO-form-factor progressive-scan cameras and VTRs that use the 50 Mb/sec data rate to encode a 480-line proscan image.
 

Four codecs for HD?
Both JVC and Panasonic showed working prototypes of 100 Mbps DV-derived products at NAB '99 for handling HDTV; Panasonic was shipping at NAB 2000.  Both firms gang four DV codecs together to get the 100 Mbps datastream, while preserving the same equipment form factor and operational methodologies used in the current 50 Mbps products. Panasonic calls their stuff DVCPROHD100, while JVC uses the D-9HD moniker, reflecting the SMPTE standard number for their DV50 format.

DVCPRO HD and D-9 HD both record 1280 Y samples and 640 Cr and Cb samples per line, compared to HDCAM's 1440 Y and 480 Cr and Cb samples. Thus the DV100 formats have slightly lower luma resolution than HDCAM but slightly better chroma resolution (see the next section for a discussion of sampling).

It should be noted that both of these companies are well-placed to serve the growing DTV market whatever image format a broadcaster selects. Panasonic is selling a switchable 720p/1080i HD-D5 VTR (not based on DV technology), the AJ-HD2700, which has already become the studio standard VTR for the dawn of US DTV. JVC's NAB '98 and '99 displays featured D-9 variants of most popular ATSC DTV formats -- 480i, 480p/30, 480p/60, 720p, and 1080i. These two companies will be pushing the edge of the DV envelope for quite some time to come...

Sony's HDCAM format uses compression technology "derived from DV and with certain similarities", but it is not on the main branch of the DV family tree. Its data rate of 135 Mbps yields beautiful images; it's extremely rare to see a noticeable artifact in an HDCAM picture.
 

What are 4:2:2, 4:1:1, and 4:2:0 anyway?
These are all shorthand notations for different sampling structures for digital video. They are also used for CIF and QSIF and suchlike MPEG frame sizes, but in the discussion that follows, I focus on the numbers for SDTV (standard-definition TV) digitized to the ITU-R BT.601 standards: 13.5 MHz sample frequency and 720 pixels per line.

The first number refers to the 13.5 MHz sampling rate of the luminance: "4" because (a) it's nominally almost approximately sort of four times the NTSC and/or PAL color subcarrier frequencies, and (b) because if it's "4" the other numbers can be integers whereas if it were "1" the formats would be "1:0.5:0.5", "1:0.25:0.25", and "1:0.5:0" respectively, and which would you rather try to read off in a hurry? The 13.5 MHz sampling yields 720 pixels per scanline in both 525/59.94 and 625/50 systems (NTSC and PAL/SECAM). This number applies to D-1, D-5, Digital Betacam, BetaSX, Digital-S, and all the DV formats just the same.

The other two numbers refer to the sampling rates of the color difference signals R-Y and B-Y (or, more properly in the digital domain, Cr and Cb).

In 4:2:2 systems (D-1, D-5, DigiBeta, BetaSX, Digital-S, DVCPRO50) the color is sampled at half the rate of the luminance, with both color-difference samples co-sited with (located at the same place as) alternate luminance samples. Thus you have 360 color samples (in each of Cr and Cb) per scanline.

In 4:1:1 systems (NTSC DV & DVCAM, DVCPRO) the color data are sampled half as frequently as in 4:2:2, resulting in 180 color samples per scanline. The Cr and Cb samples are considered to be co-sited with every fourth luminance sample. Yes, this sounds horrible -- but it's still enough for a color bandwidth extending to around 1.5 MHz, about the same color bandwidth as Betacam SP (which, were it a digital format, would be characterized as 3:1:1).
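The "around 1.5 MHz" figure follows from the sampling rates. Here is a small sketch that computes the theoretical ceiling on color bandwidth (half the chroma sampling rate, the Nyquist limit) from the 13.5 MHz luminance rate. The function name is made up for illustration, and real-world filtering keeps the usable bandwidth somewhat below this ceiling.

```python
LUMA_SAMPLE_RATE_MHZ = 13.5  # ITU-R BT.601 luminance sampling rate


def chroma_nyquist_mhz(chroma_factor, luma_factor=4):
    """Upper bound on chroma bandwidth in MHz: half the chroma sampling rate."""
    chroma_rate = LUMA_SAMPLE_RATE_MHZ * chroma_factor / luma_factor
    return chroma_rate / 2


print("4:2:2 chroma ceiling:", chroma_nyquist_mhz(2), "MHz")
print("4:1:1 chroma ceiling:", chroma_nyquist_mhz(1), "MHz")
```

For 4:1:1 the ceiling comes out just under 1.7 MHz, so a practical bandwidth of around 1.5 MHz is about what you would expect.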

So where does 4:2:0 (PAL DV, DVD, main-profile MPEG-2) fit in? 4 x Y, 2 x Cr, and 0 x Cb? Fortunately not! 4:2:0 is the non-intuitive notation for half-luminance-rate sampling of color in both the horizontal and vertical dimensions. Chroma is sampled 360 times per line, but only on every other line of each field. The theory here is that by evenly subsampling chroma in both H and V dimensions, you get a better image than the seemingly unbalanced 4:1:1, where the vertical color resolution appears to be four times the horizontal color resolution. Alas, it ain't so: while 4:2:0 works well with PAL and SECAM color encoding and broadcasting, interlace already diminishes vertical resolution, and the heavy filtering needed to properly process 4:2:0 images causes noticeable losses; as a result, multigeneration work in 4:2:0 is much more subject to visible degradation than multigeneration work in 4:1:1.
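To put numbers on the three structures, the following sketch counts the samples of one color-difference signal per frame. The function is illustrative only; the 576 active lines used in the example is an assumption (a 625-line frame), chosen because both 4:2:0 and 4:1:1 exist in 625/50 DV-family formats.

```python
LUMA_PER_LINE = 720


def chroma_samples(scheme, active_lines):
    """Return (samples per sampled line, lines carrying chroma, total per frame)
    for ONE color-difference signal (Cr or Cb)."""
    if scheme == "4:2:2":
        per_line, lines = LUMA_PER_LINE // 2, active_lines
    elif scheme == "4:1:1":
        per_line, lines = LUMA_PER_LINE // 4, active_lines
    elif scheme == "4:2:0":
        per_line, lines = LUMA_PER_LINE // 2, active_lines // 2
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return per_line, lines, per_line * lines


for scheme in ("4:2:2", "4:1:1", "4:2:0"):
    print(scheme, chroma_samples(scheme, active_lines=576))
```

4:1:1 and 4:2:0 carry the same total amount of color data, half that of 4:2:2; they differ only in whether the saving is taken horizontally or split between horizontal and vertical.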

"Now how much would you pay? But wait, there's more!" In US implementations of 4:2:0, the color samples are supposed to be vertically interleaved with luminance, whereas in European 4:2:0 they're supposed to be co-sited. Practically speaking, this is a headache for developers of codecs, encoders, and DVEs, but for DV purposes it's not especially exciting, since only European DV is 4:2:0.
 

Why does PAL DV use 4:2:0?
The best explanation I can come up with why PAL DV went with 4:2:0 is that both PAL and SECAM show reduced vertical color resolution and better horizontal color resolution compared to NTSC, so 4:2:0 seemed a closer match to the native display systems in PAL/SECAM countries. As PAL DV was intended as a consumer format for off-air recording or camcorder acquisition, multigeneration losses in 4:2:0 were considered a less important factor than the optimization of first-generation performance. PAL DVCAM also used 4:2:0.

When Panasonic developed DVCPRO, they opted for 4:1:1 even in PAL versions, specifically for the multigeneration advantage. Thus PAL DVCPRO decks have the pleasure and responsibility of handling both 4:1:1 DVCPRO playback and 4:2:0 DV playback; they have extra hardware to digitally resample the 4:2:0 signal and come up with a decently synthesized 4:1:1. Sometimes there is a reason for the higher prices that the poor Europeans are saddled with when it comes time to purchase gear...
 

Can I chroma-key with 4:1:1?
Yes indeed. Many early DVEs were 4:1:1 internally; plenty of digital boxes out there still are (such as the Panasonic WJ-MX50 and Sony FXE-series vision mixers, both of which chroma-key). As previously mentioned, BetaSP could be considered a 3:1:1 format in terms of component bandwidth, and BetaSP is used for chroma-key applications all the time.

True, the chroma performance of 4:2:2 formats is superior to 4:1:1 formats, especially in multigeneration analog dubbing. Part of the standard JVC sales pitch for D-9 is the superiority of 4:2:2 (which is true), and the utter doom and degradation that awaits you should you try to do anything -- including chroma-key -- with a 4:1:1 format (which is, shall we say, a wee bit exaggerated). But that doesn't mean that you can't do very satisfactory work in 4:1:1. A Bentley may not be as fancy as a Rolls Royce, but it'll still get you there in style. If you're used to the VW Beetle world of color-under analog formats, DV's Bentley should present few problems.

JVC has an excellent D-9 demo tape showing multigeneration performance comparisons of DV, D-9, and Digital Betacam; watch it if you can. Just be sure you take the hype with a grain of salt...
 

Can I use 4:1:1 DV sources for upconversion to HDTV?
All SDTV source material will suffer when upconverted to HDTV, compared with material originated in HD to begin with. 4:1:1 material is reported by some to be problematic in this respect; certainly a 4:2:2 original will be more forgiving, and if upconversion is your primary goal, you may want to look closely at D-9 (Digital-S) or DVCPRO50.

Snell & Wilcox have run DV through upconversion and report that it looks OK, especially if the excessive aperture correction (edge enhancement) in most DV cameras is turned down.

Of more concern is that DV artifacts, especially mosquito noise, may become annoyingly prominent when upconverted. However, the jury is still out on this.

Also, all HD material (at least in the USA) is likely to be 16:9. The way many DV cameras produce 16:9 by throwing away vertical resolution is enough to send shudders up my spine for SDTV work; for HD, it'll be a complete disaster. Perhaps I should add a section on shooting for HD upconversion; there are lots of issues...
 

What is 1394 and/or "FireWire"?
IEEE-1394 is a standard communications protocol for high-speed, short-distance data transfer. It has been developed from Apple Computer's original "FireWire" proposal (FireWire is a trademark of Apple Computer). Check out the white papers on Adaptec's website for pointers to additional 1394 sites for detailed information.

Sony calls their implementation of 1394 "i.LINK".
 

Why are DV and 1394 always discussed together?
They appear to have been developed together. The data stored on DV tape appear to reflect the packet structure sent across a 1394 link to a frightening degree of exactness. Certainly the DV format and 1394 High Performance Data Bus co-evolved, such that the first consumer DV camcorder in the USA (the Sony DCR-VX1000 and its single-chip brother the VX700) was also the first 1394-equipped consumer product available.
 

What does a 1394 connection do for me?
Plenty of good things:

  • You can make digital dubs between two camcorders or VTRs using 1394 I/O, and the copy will be identical to the original.
  • You can do cuts-only linear editing over 1394, with no generation loss.
  • You can stick a 1394 board into your computer (PC or Mac), and transfer DV to and from your hard disk. If your system can support 3.6 MBytes/sec sustained data rate -- simple enough with many A/V rated SCSI-2 drives and with most ATA/EIDE drives these days -- the world of computer-based nonlinear editing is open to you without paying the quality price of heavy JPEG compression and its associated artifacts, or the monetary price of buying heavy-duty NLE hardware and banks of RAID-striped hard drives.
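For planning disk space, the 3.6 MBytes/sec figure translates directly into storage per minute of footage. This is just a sketch using that quoted rate, taking a MByte as 10**6 bytes and a GB as 10**9 bytes (an assumption; drive and OS vendors count differently).

```python
DV_BYTES_PER_SECOND = 3.6e6  # sustained rate quoted above, 1 MByte taken as 10**6 bytes


def dv_storage_gigabytes(minutes):
    """Approximate disk space, in GB of 10**9 bytes, for a DV capture of this length."""
    return DV_BYTES_PER_SECOND * minutes * 60 / 1e9


for minutes in (1, 20, 60):
    print(f"{minutes:3d} min of DV needs about {dv_storage_gigabytes(minutes):.2f} GB")
```

An hour of DV works out to about 13 GB at this rate.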

Some time ago I edited a friend's wedding, going from Hi8 camera originals to a DV edit master. The 20-minute ceremony was covered by two cameras; we sync-rolled the VTRs and mixed the show in real time as if it were live. At the end, we weren't sure we liked it. So we dubbed it off via 1394 to another DV cassette, inserted a fresh DV cassette, and had another bash at the edit. This time, we liked it. We put the tape into the VX1000 and set up the DHR-1000 VTR as the recorder, using the built-in editor to drop the second attempt in frame-accurately atop the first across the 1394 wire. No generation loss. And we still had the first edit on the backup tape, should we have changed our minds.
 

Is 1394 that much better than Y/C or component analog?
Yes. A 1394 dub is a digital copy. It's identical to the original. That's really nice.

Yes, you can do almost the same thing with a SMPTE 259M SDI (serial digital interface) transfer. But VTRs with SDI cost big money. 1394 is built into many low-end cameras and VTRs, and the connecting cable -- even at Sony prices -- is only US$50; you can find it for US$20 if you shop around.

Also, transferring via 1394 is a digital copy, a data dump. No decompression or recompression occurs. Transferring DV around as baseband video, even digitally over SDI, subjects it to the small but definite degradation of repeated decompression/recompression.

If a digitally-perfect copy is a 10, and a point-the-camera-at-the-screen-and-pray transfer is a 1, here's how DV picture quality holds up over different transfer methods:
 

IEEE-1394: 10
SDI: 9.8
Analog Component (Y, R-Y, B-Y): 9
Y/C ("S-video"): 8
Analog Composite: 5
Point camera at screen and pray: 1

What's the deal with DVCPRO gear and 1394?
DVCPRO, or D-7, is a DV-based format with a few subtle differences in its datastream. These changes were made by Panasonic's engineers to improve the robustness and reliability of the DVCPRO system when compared to DV, but they do mean that certain data header bits do not conform to Blue Book standards. Thus a direct data interchange between DVCPRO gear and DV/DVCAM gear is not possible in the same way that DV and DVCAM gear can interchange data; furthermore some nonlinear editor systems are not capable of accepting or generating a D-7-compatible signal.

As a result, DVCPRO gear with 1394 connections can only exchange data with other DVCPRO systems, not with DV or DVCAM gear. Since a 1394 transfer is a direct data dump, this is understandable; if a cross-format transfer were to be possible it would require that one deck or the other "translate" the signal to or from the DVCPRO data format to the Blue Book format.

As far as incompatibility with 1394 transfers to and from NLEs, this limitation is expected to diminish (and eventually vanish) as developers get a chance to work with DVCPRO over 1394, and to provide switches inside their programs to supply a Blue Book or DVCPRO datastream as required.

Remember, D-7 was designed first and foremost as an ENG format; robustness of the signal was paramount, and interconnection of gear in the ENG world is done via analog or via SDI (1394 is too limited an interface for the broadcast world, where the ability to switch and route video over thousand-meter runs is both necessary and taken for granted; 1394 has a length limit of 4.5 meters and requires a point-to-point session-level communication instead of a switchable open-ended transmission). 1394 was added to the DVCPRO lineup as an afterthought, at the prompting of customers, and as it becomes more prevalent (and if the marketplace demands it) you'll see more NLEs capable of dealing with D-7 data as readily as with Blue Book data, and possibly even real time DV/DVCPRO format translators. It's early in D-7's evolution; there may yet be surprises up Panasonic's sleeves...
 

What's the difference between locked and unlocked audio?
Locked audio is "audio done right": the audio sample clock (the digital time reference used in the sampling process) is precisely locked to the video sample clock such that there is exactly the same number of audio samples recorded per "audio frame" of video (not all TV formats and sound sample rates have a neat integer relationship between audio samples and frames, so an "audio frame" is my term [similar to a "color frame"] for the number of video frames it takes for audio and video to match up in the same phase relationship).

For PAL, 625/50 video, locked audio provides exactly the same number of samples per video frame with either 32 or 48kHz audio, but for NTSC, 525/59.94 video, the 48kHz "audio frame" is 5 video frames: locked audio will provide exactly the same number of audio samples for every five video frames, though not every frame within that 5-frame sequence has an equal number of audio samples. 32kHz locked "audio frames" cover a whopping 15 video frames!
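The "audio frame" lengths fall out of simple rational arithmetic. The sketch below takes the 525/59.94 frame rate as exactly 30000/1001 frames per second and finds the smallest number of video frames that holds a whole number of audio samples; the function and dictionary names are made up for this example.

```python
from fractions import Fraction

FRAME_RATES = {
    "625/50": Fraction(25),
    "525/59.94": Fraction(30000, 1001),
}


def audio_frame(sample_rate_hz, frame_rate):
    """Return (video frames per audio frame, audio samples in that span)."""
    samples_per_frame = Fraction(sample_rate_hz) / frame_rate
    # The reduced denominator is the smallest frame count giving whole samples.
    frames = samples_per_frame.denominator
    return frames, int(samples_per_frame * frames)


for system, rate in FRAME_RATES.items():
    for sample_rate in (32000, 48000):
        frames, samples = audio_frame(sample_rate, rate)
        print(f"{system} at {sample_rate} Hz: {samples} samples every {frames} frame(s)")
```

For 525/59.94 this gives 8008 samples per 5 frames at 48 kHz and 16016 samples per 15 frames at 32 kHz; for 625/50 every single frame holds a whole number of samples (1920 or 1280).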

[There is such a thing as an AES/EBU audio frame, but I'm not sure if that's the same thing I'm referring to. Comments/clarifications welcomed!]

Unlocked audio: theory:

Unfortunately, such precisely-locked audio clocks are expensive. Since DV was designed as a consumer format, unlocked audio was allowed as a cost-saving measure. In unlocked audio, the audio clock is allowed some imprecision, such that there can be a variation from the locked spec of up to +/- 25 audio samples written to tape for every frame, instead of a precise and exact number.

This economy measure is simply one of allowing the audio clock to "hunt" a bit around the desired frequency; the phase-locked loop (or other slaving method) used to keep the audio sampling in sync with the video sampling can have a bit more slop in its lockup, with the audio sampling sometimes running a bit slower, sometimes a bit faster, but always staying in sync over the long run. The total amount of sync slippage allowed in unlocked audio is +/- 1/3 frame -- not enough to really worry about.
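To get a feel for how small +/- 1/3 frame is, this sketch converts it to milliseconds and to audio samples. The function name is hypothetical, and the 525/59.94 frame rate is taken as exactly 30000/1001 frames per second.

```python
from fractions import Fraction


def slippage_limit(sample_rate_hz, frame_rate, fraction_of_frame=Fraction(1, 3)):
    """Return the +/- slippage allowance as (milliseconds, audio samples)."""
    seconds = fraction_of_frame / frame_rate
    return float(seconds * 1000), float(seconds * sample_rate_hz)


ms, samples = slippage_limit(48000, Fraction(30000, 1001))
print(f"525/59.94: +/- {ms:.1f} ms, about {samples:.0f} samples at 48 kHz")
ms, samples = slippage_limit(48000, Fraction(25))
print(f"625/50:    +/- {ms:.1f} ms, about {samples:.0f} samples at 48 kHz")
```

That is on the order of 11 to 13 milliseconds either way.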

It's the difference between walking a dog on a short leather leash, always forcing the dog to stay right by your side (locked audio), and using a long, elastic leash or one of those "retractable clothesline" leashes that allows the dog to run ahead a bit or lag behind (unlocked audio). In either case both you and the dog will get where you're going at the same time, but along the way the "unlocked" dog has a bit more freedom to deviate from your exact walking pace.

Unlocked audio should not cause audio sync to drift away from video over a long period of time. The audio clock is still linked to the video clock; it's just allowed a bit more oscillation about the desired frequency (more wow & flutter if you will) as it's trying to track the video clock. Like the dog on the springy leash, it can run a bit ahead or a bit behind the video clock momentarily (up to 1/3 frame ahead or behind), but in the long run it'll still be pacing the video clock and on average will be right there in sync with it. I have shot one-hour continuous takes of talking heads with a consumer DV camcorder (DCR-VX1000) and experienced no drift at all between audio and video.

DV cameras and VTRs generate unlocked audio, both in 32 kHz 12 bit and in 48 kHz 16 bit recordings. DVCAM and DVCPRO cameras and VTRs generate locked audio in 48/16 audio format, and DVCAM can also generate locked 32/12 audio. 44.1kHz, discussed below, is never locked; it has no neat integer relationship with either 625/50 or 525/59.94 frame rates.

Some nonlinear DV/1394 editors generate locked audio, some output unlocked, and some allow the choice. Final Cut Pro through version 1.2.1 (at least) generates locked audio always, but it doesn't set the flag to tell the VTR that it's locked -- so the VTR reports it as unlocked.

DV gear is happy to record locked audio via 1394, just as the DVCAM DSR-20 VTR will accept unlocked audio. The DVCAM DSR-30 VTR and DSR-200 camcorder can also be made to record unlocked audio with a bit of coaxing.

Also, many nonlinear editors output 16 bit 44.1 kHz audio (at least on PC platforms), which both DV and DVCAM 1394-equipped decks record without any problems. 44.1 kHz is part of the Blue Book spec, so this is not too surprising.

(Many thanks to Earl Jamgochian at Sony for filling in and clarifying many of the details in this section.)

Unlocked audio: real life:

"The difference between theory and real life is that in theory, there is no difference between theory and real life, but in real life, there is a difference."  -DV Filmmaker Marshall Spight

While the theory sounds good, real life is sometimes a bit different. Some manufacturers appear to take the word "unlocked" literally; a completely separate clock seems to be used for the digitization of audio, with no direct linkage or locking to the video clock. The result is an audio time base stability that's excellent (since no "hunting" around a target frequency is present), but the possibility arises of a long-term drift between audio and video, when processed independent of each other.

This was revealed at NAB '99 by Randy Ubillos, lead engineer on Final Cut Pro, who has found that while most DV cameras are pretty good, Canon cameras grab 48kHz sound at around 48.009 kHz, which can result in about two-thirds of a second of video/audio slippage over the course of an hour (or around one frame every three minutes). Sonys, by contrast, seem to average 48.001 or 48.0005 kHz, resulting in perhaps a couple of frames of slippage over the same time period (and I haven't seen any slippage in my own tests of the VX1000). Clocking rates for other cameras were not discussed.
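As a back-of-the-envelope check on those figures, here is a minimal Python sketch of the drift arithmetic. It assumes the audio is recorded at the stated rate and later played back at a nominal 48 kHz against 29.97 fps video; the function name is mine, and since the quoted clock rates are approximate, the results should be read as order-of-magnitude only.

```python
def av_drift_seconds(actual_hz: float, nominal_hz: float, duration_s: float) -> float:
    """Seconds by which audio sampled at actual_hz runs long (positive) or
    short (negative) when it is later played back at nominal_hz."""
    samples_recorded = actual_hz * duration_s
    return samples_recorded / nominal_hz - duration_s


NTSC_FPS = 30000 / 1001  # 29.97 frames per second

# Clock rates as quoted in the text; all three are approximate.
for label, rate_hz in [("48.009 kHz", 48009.0),
                       ("48.001 kHz", 48001.0),
                       ("48.0005 kHz", 48000.5)]:
    drift = av_drift_seconds(rate_hz, 48000.0, 3600.0)
    print(f"{label}: {drift:.3f} s per hour, about {drift * NTSC_FPS:.1f} frames")
```

The same function works for any capture length: pass the clip duration in seconds as the third argument.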

In normal playback of the DV tape this isn't seen, since on playback the audio is played back based on its embedded clocking data, in sync with the image. Both the audio and video slave to the data samples in each packet; as these are commingled in the DV datastream, the sound and picture will always play back in sync.

In most DV NLE systems to date (May '99), it has also not been a problem, since captures have been limited to under ten minutes by the 2 Gigabyte file size limit, and the slippage seen in so short a time is minimal.
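To see roughly where the "under ten minutes" figure comes from, here is a small Python sketch. It rests on assumptions that are mine, not part of the discussion above: that a 525/59.94 DV frame occupies 120,000 bytes, that the limit is exactly 2^31 bytes, and (for the second case) that the file also carries a separate 48 kHz 16-bit stereo PCM audio track. File headers and indexes are ignored.

```python
FILE_LIMIT_BYTES = 2**31                 # assumed exact value of the 2 Gigabyte limit
DV_BYTES_PER_FRAME_525 = 120_000         # assumed size of one 525/59.94 DV frame
NTSC_FPS = 30000 / 1001                  # 29.97 frames per second
PCM_AUDIO_BYTES_PER_S = 48_000 * 2 * 2   # assumed extra track: 48 kHz, 16 bit, stereo


def max_capture_seconds(extra_audio_track: bool) -> float:
    """Longest clip, in seconds, that fits under the file size limit."""
    bytes_per_second = DV_BYTES_PER_FRAME_525 * NTSC_FPS
    if extra_audio_track:
        bytes_per_second += PCM_AUDIO_BYTES_PER_S
    return FILE_LIMIT_BYTES / bytes_per_second


for extra in (False, True):
    minutes, seconds = divmod(max_capture_seconds(extra), 60)
    print(f"extra audio track={extra}: about {int(minutes)}:{int(seconds):02d}")
```

Under these assumptions both cases land a little under ten minutes, which is in line with the limits reported by NLE users.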

Final Cut Pro, however, uses file referencing to span the 2 Gig limit, allowing captures limited only by available disk space, and the QuickTime media format used treats audio and video as separate tracks, each with its own time reference. When capturing long clips, the drift can become apparent; Final Cut can measure this drift and recalculate the audio sample frequency so that QuickTime playback will stay in sync.

As far as I can tell, the AVI file format used in some Windows-based NLEs does not allow this sort of long-term slippage to occur, but I may simply lack sufficient data. I do know that various QuickTime-based DV NLEs have shown certain oddball audio/video sync problems that I have not seen or heard of in AVI-based NLEs; this is not a QuickTime problem per se, merely a side effect of QuickTime's flexible and elegant approach to multiple-track media streams, which makes it possible for such problems to occur.
 

Will unlocked audio hurt me? How do I deal with it?
When using analog audio I/O, the whole question of locked vs unlocked is moot: it's analog and there are no clocks to worry about. Analog is always safe to use for dubbing or editing. As discussed above, DV audio data are converted to analog in real time as the data come off the tape, and audio slippage simply doesn't occur regardless of the accuracy of the sampling clock.

It should also be of no concern when taking the audio in via 1394 to a DV-based nonlinear editing system. When all the audio samples are stored in a neat memory array, the software doesn't care if there was some time base instability on the original recording; when non-real-time rendering is occurring, a sample is a sample is a sample.

However, some long-term slippage between audio and video can occur in long clips, at least in QuickTime format, if the capture application doesn't compensate for any audio clock inaccuracy. Fortunately, the problem is understood by those in the business (at least at Apple and Digital Origin), and corrective measures are taken at capture time: Final Cut Pro measures the actual number of samples captured over time vs. the theoretical number, calculates the actual effective sampling rate, and uses that in QuickTime file processing.
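The correction described above boils down to one division. The following Python sketch illustrates the idea only; it is not Apple's code, and the function name and example numbers are hypothetical. It uses exact fractions so that the 30000/1001 NTSC frame rate introduces no rounding error.

```python
from fractions import Fraction


def effective_sample_rate(samples_captured: int, video_frames: int,
                          fps: Fraction) -> Fraction:
    """Audio rate implied by the number of samples that actually arrived
    during the time spanned by the captured video frames."""
    video_seconds = Fraction(video_frames) / fps
    return Fraction(samples_captured) / video_seconds


NTSC_FPS = Fraction(30000, 1001)

# Hypothetical capture: 30,000 frames (1001 seconds of video) from a camera
# whose audio clock runs slightly fast.
frames = 30_000
samples = 48_009 * 1001
rate = effective_sample_rate(samples, frames, NTSC_FPS)
print(f"effective rate: {float(rate):.1f} Hz")
```

Labeling the audio track with this measured rate, rather than the nominal 48000 Hz, makes its duration match the video's duration.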

Unlocked is only a potential problem when doing real-time audio and video editing with digital transfer of the audio between source and recorder. "Digital" means conveyance of the audio using the IEEE-1394 bus, AES/EBU digital audio outputs (on pro DVCAM/DVCPRO VTRs), or SDI embedded audio (ditto).

As far as DV-based editing is concerned, when you make an edit in the digital domain between two different DV datastreams using unlocked audio, you might wind up with a few too many audio samples or not quite enough. Either way you can get a click or pop on the soundtrack during playback: with too many samples, the audio subsystem has to discard the extra data and resynchronize (an audio buffer overrun); with too few, it runs out of sound to cover the time available (a buffer underrun) and you get a momentary dead spot or mute effect. (Depending on the audio circuitry used, the system may also mute when it's resynchronizing after discarding samples.) In either case the audio glitch will last only a fraction of a second; it won't result in several seconds of dead audio or any prolonged audio noise. Reportedly, it's also only a problem at the out-points of insert edits, not at edit in-points (unverified).

Interestingly enough the same problem may occur when cutting between two locked audio streams without regard to synchronization of the "audio frames", though here the problem is much smaller in scope since the variation in sample counts will only be +/- 2 samples per video frame. Such errors are typically inaudible, though they may still complicate things if the audio track is then used in real-time digital audio mixing (see below), and they'll only occur in 525/59.94 video, never 625/50 due to 625's 1:1 relationship between video frames and "audio frames".
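The frame-rate arithmetic behind this can be sketched in Python with exact fractions. The helper names are mine, and the per-frame distribution printed below is just one simple way of spreading whole samples across frames; it is not necessarily the sequence the DV specification prescribes.

```python
import math
from fractions import Fraction

FRAME_RATES = {"525/59.94": Fraction(30000, 1001), "625/50": Fraction(25)}


def samples_per_frame(sample_rate_hz: int, fps: Fraction) -> Fraction:
    """Exact (possibly fractional) number of audio samples per video frame."""
    return Fraction(sample_rate_hz) / fps


def illustrative_frame_counts(sample_rate_hz: int, fps: Fraction) -> list:
    """Whole-sample counts over the shortest run of frames that holds an
    integer number of samples (illustrative distribution only)."""
    per_frame = samples_per_frame(sample_rate_hz, fps)
    run_length = per_frame.denominator
    counts, previous_total = [], 0
    for frame in range(1, run_length + 1):
        total = math.floor(per_frame * frame)  # samples due by end of this frame
        counts.append(total - previous_total)
        previous_total = total
    return counts


for name, fps in FRAME_RATES.items():
    per_frame = samples_per_frame(48000, fps)
    print(name, "48 kHz:", per_frame, "samples per frame;",
          "one possible pattern:", illustrative_frame_counts(48000, fps))
```

At 25 fps every frame holds the same whole number of samples, so a cut can never land "between" audio frames; at 29.97 fps the count has to vary from frame to frame, which is where the mismatch at an edit can creep in.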

[It's also worth noting that any hard cut between clips can result in a pop or click if the instantaneous level of the audio at the cut point is mismatched, causing impulse noise. This is true in locked or unlocked audio; it can even occur when working in analog. This is one reason that linear analog audio tape and film fullcoat mag tracks are often spliced at an angle instead of with a straight cut; this mechanically performs a quick crossfade between the two tracks instead of an abrupt transition.]
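The effect of an angled splice is easy to mimic digitally. This Python sketch is an illustration with made-up sample values and helper names of my own, not a description of what any particular editor does: it compares the largest sample-to-sample jump of a hard cut with that of a short linear crossfade centered on the same cut point.

```python
def hard_cut(outgoing, incoming, cut_index):
    """Switch abruptly from the outgoing clip to the incoming one."""
    return outgoing[:cut_index] + incoming[cut_index:]


def crossfade_cut(outgoing, incoming, cut_index, fade_len):
    """Blend linearly from outgoing to incoming over fade_len samples
    centered on cut_index (the digital cousin of an angled splice)."""
    start = cut_index - fade_len // 2
    mixed = list(outgoing[:start])
    for i in range(fade_len):
        gain_in = (i + 1) / (fade_len + 1)
        mixed.append(outgoing[start + i] * (1.0 - gain_in)
                     + incoming[start + i] * gain_in)
    mixed.extend(incoming[start + fade_len:])
    return mixed


def largest_step(samples):
    """Biggest jump between neighbouring samples; big jumps are heard as clicks."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))


# Made-up worst case: the two clips sit at opposite levels at the cut point.
clip_a = [0.8] * 20
clip_b = [-0.8] * 20
print("hard cut: ", largest_step(hard_cut(clip_a, clip_b, 10)))
print("crossfade:", largest_step(crossfade_cut(clip_a, clip_b, 10, 8)))
```

The crossfade spreads the same level change over several samples, so no single step is large enough to act as an impulse.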

When all you are doing is editing one generation down from camera originals to an edit master, and then making release copies on an analog format such as BetaSP, SVHS, Hi8, VHS, or the like, all you need to be concerned about is audible popping or muting. The release copies will contain an analog track that records what you hear; there are no hidden gremlins due to asynchronous clocking, jitter, or other nasties that so complicate digital audio.

However, when you take the digital audio datastream from a DV tape and try to integrate it into a larger digital audio system, such as AES/EBU routers, digital audio workstations (DAWs), and/or multitrack digital audio recorders including the Alesis ADAT and Tascam DA-88/98, the sloppy synchronization of unlocked audio can cause glitches, artifacts, and distortion. If the receiving gear is trying to derive its audio clock from the unlocked audio datastream, the entire downstream audio chain can be rendered unstable and dysfunctional.

Furthermore, playback of unlocked audio including edit-point glitches as discussed above into a DAW or other digital audio system can cause a major commotion when the edit-point glitch is played back. Ever had a really bad splice go through the gate on a film projector, or past the heads on an analog audio tape recorder? A glitched unlocked audio edit is the digital equivalent of that crummy splice, only worse!

Fortunately it's fairly simple to avoid this. Either convert unlocked audio to locked, or use analog audio connections between your unlocked source and the digital audio chain you're feeding (and if your source tape has 44.1kHz/16 bit or 32kHz/12-bit sound, going analog into the digital system means that you get a rate conversion into 48kHz sound at however many bits are being used courtesy of the A/D converter on the professional digital system; it may actually sound better -- and be easier -- than hooking up digital sample rate converters in the chain).

There are four known ways to convert unlocked audio to locked audio:

1) The DSR-60/80/85/2000 DVCAM VTRs will convert unlocked audio to locked audio on playback. DVCPRO VTRs are also supposed to relock DV audio on playback. This solves your problem at the point of playback. If you need to make a tape with locked audio, then...

2) Dub your DV tape to a DVCAM or DVCPRO tape using analog audio connections between the source and the recorder. Hey presto, locked audio! The video can be dubbed via SDI for minimal if any losses. This is also the recommended route if your source audio is not 48kHz, since you want the dub to have 48kHz audio for best compatibility.

3) Play back the DV tape in a high-end DVCAM or DVCPRO VTR, and dub it to a high-end DVCAM or DVCPRO VTR using either the AES/EBU digital audio or the SDI embedded audio options. The player will reclock the data and the recorder will write locked audio to tape.

4) Transfer your footage into a nonlinear editor that allows outputting locked audio, and use the NLE to write out locked audio, even to a DV-format tape. Slow and cranky, but it works.
 

How do I intermix locked and unlocked audio?
It's best not to intermix any variations of digital audio on the same tape. While VTRs seem to cope with sudden changes in sampling rate, bit depth, and locked/unlocked status, often you'll get a brief moment of silence at the transition between audio types as the internal workings of the audio chain readjust themselves to the new audio type. Some nonlinear editors are very uppity about audio changes; if you start digitizing a 48 kHz clip and the audio changes to 32 kHz, you'll get silence for the entire 32 kHz section, or vice versa. Once the capture card and software start grabbing data at a certain rate, they're too busy to try to change rates in midstream; furthermore, the meta-data stored with the clip can only remember one audio format per clip. And if you try to digitally feed such mixed-mode tapes' audio into further digital processing, major glitches can be expected.

The best thing when doing a linear edit is to use analog audio, or (if the only changes you have are between locked and unlocked audio) use the digital outputs from a high-end VTR as described above. For nonlinear editing, capture clips each containing only a single format of audio; when you render the finished project, all the audio will be converted to a common format.
 

Does unlocked audio explain why my audio loses sync in Adobe Premiere (or Final Cut Pro, or...)?
Sorry, no! Adobe Premiere 4.2 and earlier versions have a historical problem with synchronous audio playback from the timeline. As discussed above, unlocked audio doesn't drift over the long term. Premiere audio can drift regardless of whether the source was locked or unlocked. This particular problem is variously attributed to the difference between 30 Hz and the 29.97 Hz that NTSC runs at; the inability of an AVI or QuickTime file to maintain synchronous audio; the weakness of the Windows VFW subsystem at really keeping things in sync, and the phases of the moon (if anyone knows what's really going on, this author would appreciate being appropriately enlightened).

Premiere 5.1 fixes 4.2's audio sync problems. Certainly I've had no problems with Premiere 5.1 on Windows editing clips up to 9:30 in length (the 2 Gig limit of my AVI-based system), nor have I heard of any such problems in discussions with other people.

If you find you're getting sync slippage, check two things, especially on QuickTime-based editors: (1) when you capture a clip, make sure that the sample rate selected in the capture menus is the same as the sample rate on tape, and (2) set your timeline/sequence options to use the same sample rate as your captured clips (reportedly, not doing this causes sync drift in Final Cut Pro). It also can't hurt to make sure your video capture settings are correctly set to 29.97 fps (NTSC) or 25 fps (PAL).


Adam J. Wilt is in the middle of an illustrious career involving a multitude of disciplines. His experience includes serving as Project Lead Senior Software Engineer for the Abekas A72 video character generator and a video software designer for ABC-TV and Pinnacle Systems, among many others. His fields of expertise include film & video production/postproduction, stop-motion animation, still photography; computer graphics, interface design, object-oriented design and programming, graphical user interfaces, and real-time hardware control. See his Web site at adamwilt.com.

Copyright 2000 Adam J. Wilt, excerpted from http://www.adamwilt.com/DV-FAQ-tech.html