0.1% Caveats

by
Evan T. Chen
August 7, 1996

An edited and shortened version was published in the August 1997 issue of
TV Technology.
Copyright © 1997. All Rights Reserved.

Table of Contents

  1. Introduction
  2. Issue #1: A Nurtured Misconception (NTSC Color Does Not Run At 29.97fps!)
  3. Issue #2: A Mysterious Offset
  4. Issue #3: The Tascam DA-88
  5. Issue #4: Tricking Stubborn DAWs
  6. Conclusion


Introduction

The "0.1%" issue is how film and video post houses in this day and age of digital deal with the temporal discrepancy between the slower realm created in NTSC color video and real world time which runs 0.1% faster. If you've ever dealt with 29.97 vs. 30, 59.94 vs. 60, 44.056 vs. 44.1, 47.952 vs. 48, pull down vs. pull up, vari-speed, or sync wide vs. sync narrow, then you've delved headfirst into this dilemma. The annoyance (or benefit, depending on your point of view) has long been considered the technical cornerstone of film post-production houses in the digital era -- although the issue isn't all that technical, just misunderstood -- and for the "technical guru" at these facilities the primary reason why they're employed. Yet, the popular but somewhat disconcerting and unorganized trial-and-error approach in dealing with this issue, though feasible and often times circumstantially necessary, seems rather unprofessional, especially since we, the film post industry, make claims such as "the best in the business," "years of experience and expertise," and "dedicated to improving the way we do post." Apparently, the handful of individuals that truly understand the "0.1%" issue as it pertains to audio are mostly design engineers that unfortunately lack the experience of motion picture sound work, and those in the editorial houses whose job it is to effectively keep this dilemma transparent are mainly efficient end-users who, though readily familiar with the operations of every piece of gear in the facility, lack a design engineer's specialized approach to understanding functionality. Is it so necessary for the post community to have such a firm grasp of this concept? Yes, if we are to better our methods and increase productivity. The inefficiency that results ripples up the chain of command until the one who holds the pink slip (Producer, Executive Producer, Director?) unknowingly shells out more dough than necessary to complete a project. 
Although the economics of this is great, morally speaking it may cause some jitters. Yet if there's little systematic competition among the head honcho supervisors on the block, then they're asking themselves, "Why change a sure thing?" Nevertheless, this article focuses on four infamous 0.1% issues that seem to have the film post industry mind-boggled. How the information is used (or not used) is up to the individual.


Issue #1: A Nurtured Misconception (NTSC Color Does Not Run At 29.97fps!)

It's taken some time but the eclectic film post industry, which understandably is always dead last against other audio venues in the race to play catch-up to technology, is now finally aware that there exists a speed difference between video and "real-world" time in film, and I'd say a third of us can now effectively (though uncomfortably) deal with the annoying prospect that color video runs at 29.97fps (sort of) versus a true 30fps. However, with everything in post becoming more and more sophisticated, and with the deluge of software upgrades cluttering our workspace with more complicated and confusing options from which to choose, and with the recent surge in audio synchronization problems among desktop non-linear digital picture editing workstations, I think it's time to reveal that color video does NOT run at 29.97fps! Say again? Theoretically NTSC color video runs slightly faster than 29.97fps, but instead of printing 29.97002616431...fps all the time, the pioneers in the industry reasonably decided to make things easier to read and use the truncated 29.97 value instead. Thus in the past several years, with the merging of technologies and the blending of music and post, this industry, which doesn't concern itself with hard-core techno-engineering stuff, has gotten used to seeing "29.97" splattered all over the place, from trade magazines to users' manuals, and along with the technology this misconception rapidly bloomed. Now, when an inexperienced engineer fresh out of the University of Silicon Valley Whiz-Kids or head technoid at a post facility comes along and has to do some sort of calculation or designing with these numbers, the results will inevitably be incorrect. 30 to 29.97, yeah that's 1000 to 999, 29.97 is 0.1% less than 30, no problem, right? Well, mathematically speaking yes, but for audio post, not so. Here's a real-world example.

Don T. Know, the director of a music video, shoots on location in film with the lead singer of No Money lip-syncing to a 7-minute song played back wild on a consumer DAT player. Don gets his footage telecined to video and heads over to the local film post house where the $3000/wk Sound Editor, Les Clue, is supervising the project. Les gives his $1400/wk Assistant Sound Editor, Mimi Butkiss, the responsibility of resolving the speed of the DAT. When Les isn't looking, Mimi passes the task onto the less-than-minimum-wage intern Alec Smart, fresh graduate of the Recording Engineering program from the Yew C. Less Institute of Music Technology. Alec's read that video runs at 29.97, so if he vari-speeds this wild DAT down 0.1% on their DA60, then 30 - 30 x 0.1% = 29.97, that's it! Pleased with his ingenuity, Alec glows warmly with the knowledge and pride that he's done such great work for No Money. All tongue-in-cheek aside, what's Alec's mistake?

Since we now have accuracy of up to 1:48,000th of a second (and maybe even 1:96,000th in the near future) and an unwieldy 44.1k sampling rate, precision takes on an entirely new role. Fortunately, AES and SMPTE endorsed a method for synchronizing digital audio with picture (all you design engineers out there pay attention!), a 1001 to 1000 ratio between true 30fps and NTSC color video. The problem with Alec's calculation is his incorrect assumption that for digital audio NTSC color video runs at a precise 29.97fps. In the world of digital, NTSC color video, according to AES and SMPTE, runs at precisely:

30fps x 1000 ÷ 1001 = 29.97002997002997...fps.

Translation: 30fps is 0.1% faster than NTSC color video (as it pertains to digital audio), and although the number 29.97 is 0.1% less than 30, NTSC color video (again as it pertains to digital audio) does NOT run 0.1% slower than 30fps! Did you get that? Think about this: 999 is 0.1% less than 1000, but 1000 is NOT 0.1% greater than 999; 1001 is 0.1% greater than 1000, but 1000 is NOT 0.1% less than 1001. That's why it's possible to use vari-speed to increase the audio rate by 0.1% from the world of video to a real-world film speed, but not possible to vari-speed a real-world program down to NTSC color video! Now that you know this, from here on out, whenever you see the 29.97fps, keep in mind that it actually refers to a slightly faster rate.
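The asymmetry is easy to check with exact fractions. The following is only a minimal sketch in Python; the names FILM_RATE, NTSC_RATE, and percent_change are made up for illustration and do not come from any standard or product.

```python
from fractions import Fraction

FILM_RATE = Fraction(30)                       # true 30 fps
NTSC_RATE = FILM_RATE * Fraction(1000, 1001)   # the 1001:1000 ratio quoted above

def percent_change(old, new):
    """Exact percentage change when going from `old` to `new`."""
    return (new - old) / old * 100

up = percent_change(NTSC_RATE, FILM_RATE)      # video speed -> film speed
down = percent_change(FILM_RATE, NTSC_RATE)    # film speed -> video speed

print(float(NTSC_RATE))   # roughly 29.97003, not 29.97
print(up)                 # exactly 1/10, i.e. +0.1%
print(float(down))        # roughly -0.0999, NOT -0.1
```

Going up is a clean 0.1% step, while going down is 100/1001 percent, which a vari-speed control that moves in 0.1% divisions cannot hit.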

Now for some bone-picking: there are some other problems with Alec's scenario (besides the ethical). The technically savvy will realize that a consumer DAT's internal crystal is typically rated between 50-100ppm (parts per million). For a 7-minute song, this results in a worst-case scenario of a loss or gain of about 1.25 frames, noticeable for the scrutinizers. However, even if we assume that a custom-modified DAT player with a tighter crystal tolerance was used for the shoot, Alec would still be using his facility TC DAT's internal oscillator -- a crystal with no doubt a worse specification than the customized location DAT machine -- since few TC DAT players allow the user to vari-speed by 0.1% divisions while referenced to blackburst. Hence, the inaccuracy of the internal clocks here will contribute more significantly to drift than any miscalculation between 999:1000 and 1000:1001 (the proper ratio) on Alec's part. (The crystal argument is that if internal oscillators get any more accurate, we wouldn't need house sync, just some sort of phase sync.) So what's the correct answer? Alec should have inserted 29.97nd timecode referenced to video on the DAT, and then played it back at 30nd with the band nrr setting off. Another solution is to vari-speed the DAT by +0.1% at the shoot, and in post turn vari-speed off. The best solution is to use a TC DAT player referenced to the same 60Hz as the camera (cameras don't run at precisely 24fps either) at the production shoot. If the TC DAT was pre-striped at 29.97ndf, then the TC DAT player must have its band nrr setting turned off at the shoot, and once in post, the DA60 is left at 29.97ndf referenced to video. If the TC DAT was pre-striped at 30ndf (band nrr off) referenced to video, then that DAT player must be left at this setting through the entire production and post-production process, with only its reference source changing. 
Use this same procedure if the TC DAT was pre-striped at 30ndf referenced to its internal clock; the only difference is that in the latter case the music sounds 1/1001 times slower but with everything still in sync (try and figure that one out!).
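To put the two error sources side by side, here is a rough back-of-the-envelope sketch in Python. It assumes a 7-minute song, a 30-frame count, and the worst-case 100ppm crystal mentioned above; drift_frames is a hypothetical helper name.

```python
from fractions import Fraction

FPS = 30               # frames per second used for counting drift
SONG_SECONDS = 7 * 60  # the 7-minute song from the example

def drift_frames(relative_error, seconds=SONG_SECONDS, fps=FPS):
    """Frames gained or lost over `seconds` for a given relative speed error."""
    return float(relative_error * seconds * fps)

# Worst-case consumer crystal: 100 parts per million.
crystal = drift_frames(Fraction(100, 1_000_000))

# Alec's arithmetic slip: using 999/1000 where 1000/1001 was called for.
ratio_error = Fraction(1000, 1001) - Fraction(999, 1000)
ratio = drift_frames(ratio_error)

print(crystal)  # about 1.26 frames
print(ratio)    # about 0.013 frames
```

Under these assumptions the crystal tolerance outweighs the ratio mix-up by roughly a factor of a hundred over the length of the song.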

I know, I know, this 1001:1000 business is getting way too esoteric now, but I am aware of one mis-implementation of this AES/SMPTE recommended practice (and suspect two) by a major DAW manufacturer commonly used in film post. "Selective sacrifice" a.k.a. "corner-cutting" is what engineers and manufacturers necessarily must do to stay competitive -- it's really shocking how certain cost-cutting features are being designed (or mis-designed) for pure marketing reasons -- and it's difficult, even for design engineers, to test for these practices on strictly an end-user level without any design notes or schematics. However, not conforming to a digital audio synchronization standard is a big no-no when it comes to post, so everybody heads up.


Issue #2: A Mysterious Offset

We've all seen how the timecode instantly shifts when the format on a digital recorder such as the Tascam DA-88 or Sony 7050 is changed, say from 29.97nd to 30nd or vice versa. For example, let's say you've just transferred your Dolby printmaster onto a timecoded DAT pre-formatted 44.056k-29.97nd (tcF=30nd, band nrr off) referenced to video starting at 00:58:00:00.00 with a "2-beep" at 1:00:00:00.00 (all sampling rates and timecode rates in this article are absolute, i.e. 44.1k is 44.1k, NOT 44.056k or 44.144k and 30 is 30fps NOT 29.97!). You pass this off to the Post-Production Coordinator who later returns complaining that it's slowly drifting out of sync and that the sampling rate should be at 44.1k with the "2-beep" at 1:00:00:00.00. Confidently, you switch the DAT player to tcF=29.97nd referenced to video, which indeed bumps up the sampling rate, but all of a sudden your "2-beep" has shifted to 00:59:56:12! What do you do? Well, the inaccurate way is to jog the DAT player (presuming your TC DAT has this feature) to the start of the "2-beep," eyeball the timecode at which it starts, and enter an offset. A better solution would be to load the beginning of the DAT into a DAW, determine precisely where the "2-beep" occurs, and then calculate an offset. There is, however, a method to calculate accurately where this new "2-beep" starts without having to take any of these measures. Read on!

Similarly, on certain digital audio workstations and non-linear picture editing workstations, the user can choose between several timecode formats which ultimately affect the timeline, with 29.97 or 30 being the most confusing choice. To make matters even more complicated, sometimes the 29.97 and 30 settings also contain a puzzling pull-up/dn option. Thus the possible timecode configurations are now 24, 25, 29.97df, 29.97df pull-up, 29.97nd, 29.97nd pull-up, 30df, 30nd, 30df pull-dn, 30nd pull-dn (although few boxes implement all of these)! Some of you may have seen a 29.97 pull-dn and 30 pull-up configuration before as well. This is mainly a problem with semantics in the industry among manufacturers, much like the loose end of a cable labelled "in" which can be interpreted two ways: "in" TO the machine or "in" FROM the machine. Chances are "29.97 pull-dn" does not refer to a 29.94fps rate, and "30 pull-up" does not refer to a 30.03fps rate, despite what manufacturers have decided to call it. In any case, changing the timecode format in your digital audio workstation, say from 29.97 to 30, shifts your timeline, and the doorclose that used to occur at 2:00:10:00 has now moved to 2:00:17:06! What gives?

This is another hazy area that's plagued the film post industry for ages, and the solution to this problem involves understanding two things about the equipment in use: sampling rate and timecode rate. I'm going to warn you right now that the next few paragraphs are already much too textbook-like for me to give detailed derivations of my calculations, so I'm giving you only the bare-bones equations. If you can't handle number crunching, skip this section, and just live with the haze (which is what everybody's doing anyways). If you can follow the logic, chances are you'll be able to figure out the hidden details on your own. Since this problem deals exclusively with 30-frame timecode running at either 29.97 or 30fps, I'll focus solely on these two rates and use the more cumbersome of the two most popular sampling rates, 44.1k, as an example.

The most common combinations of sampling/timecode rates are 44.1k and 29.97, 44.056k and 29.97, 44.1k and 30, and 44.144k and 30. However, just like the value 29.97fps shouldn't be taken too literally, neither should 44.056k and 44.144k -- these are NOT precise values! For any one of these four combinations your digital audio gear plays back a specific number of samples per video frame. For example, at 44.1k samples/second referenced to 30fps, 1470 samples are played per frame. 1470 samples/frame is also the ratio for 44.056k samples/second at 29.97fps (and not coincidentally either). For 44.1k to 29.97 and 44.144k to 30 this ratio becomes 1471.47 samples/frame (if you're attempting to do the math, don't forget the 1001:1000 ratio mentioned earlier). Wait a minute! How can there be a fraction of a sample (0.47 samples) in a frame? Well, there can't, so designers of the audio gear you're using have implemented a staggered mapping system. Now, when you change a timecode or sampling rate setting on the DA-88 or TC DAT or DAW, you've modified this ratio. That is, instead of playing back X amount of samples/frame, you're now playing back Y samples/frame. Returning to that doorclose, which let's say starts at sample number 318,278,961, it's still going to occur at that sample (samples are absolute), but since we're now only playing 1470 samples/frame instead of 1471.47 samples/frame, it'll take place at a different timecode.
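For those attempting the math, the four ratios can be reproduced with exact fractions. This is a sketch only: it assumes the nominal "44.056k", "44.144k", and "29.97" figures are really 44.1k and 30 scaled by the 1001:1000 ratio, and the names PULL, samples_per_frame, and combos are illustrative.

```python
from fractions import Fraction

PULL = Fraction(1000, 1001)  # the 1001:1000 ratio, written as a pull-down factor

def samples_per_frame(sample_rate, frame_rate):
    """Exact number of samples played per timecode frame."""
    return Fraction(sample_rate) / Fraction(frame_rate)

# (sampling rate, timecode rate) for the four common combinations.
combos = {
    "44.1k / 30":      (44100,        30),
    "44.056k / 29.97": (44100 * PULL, 30 * PULL),
    "44.1k / 29.97":   (44100,        30 * PULL),
    "44.144k / 30":    (44100 / PULL, 30),
}

for name, (rate, fps) in combos.items():
    print(name, float(samples_per_frame(rate, fps)))

print(float(44100 * PULL))  # about 44055.944, loosely called "44.056k"
print(float(44100 / PULL))  # 44144.1, loosely called "44.144k"
```

The first two combinations come out to exactly 1470 samples/frame and the last two to exactly 1471.47, which is why only two ratios matter in what follows.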

Calculating exactly where an event will occur after disturbing the TC rate or sampling rate depends on whether a digital audio recorder or DAW is in use and whether one's traversing from 1470 samples/frame to 1471.47 samples/frame or vice versa. In order to redistribute samples, an arbitrary fixed point in timecode must be allotted. For digital audio recorders it's usually 00:00:00:00.00 and for DAWs it's usually the starting timecode of the timeline, session, EDL, whatever you have. Our doorclose on the Tascam DA-88 occurred at 2:00:10:00.00 at 29.97nd video reference pull-off, which means the unit's sampling rate is 44.1k samples/second, a ratio of 1471.47 samples/frame. Now switching the unit to 30nd pull-dn maintains the 29.97nd video reference rate but changes the sampling rate to 44.056k samples/second, a new ratio of 1470 samples/frame. To redistribute 1471.47 samples/frame to 1470 samples/frame, just multiply the difference in time of the doorclose from the arbitrary reference timecode (00:00:00:00.00) by 1001:1000 (there's that magic ratio again):

(2:00:10:00.00 - 00:00:00:00.00) x 1001:1000 = 2:00:17:06.30.
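The same result falls out if you follow the doorclose through its absolute sample number. The sketch below assumes 30-frame non-drop counting with the fixed point at 00:00:00:00.00; the helper names are hypothetical and not taken from any recorder's documentation.

```python
from fractions import Fraction

FPS = 30                                # 30-frame non-drop counting
SPF_BEFORE = Fraction(147147, 100)      # 1471.47 samples/frame (44.1k at 29.97)
SPF_AFTER = Fraction(1470)              # 1470 samples/frame (44.056k at 29.97)

def frames_since_zero(h, m, s, f=0):
    """Frame count from 00:00:00:00 for a non-drop timecode."""
    return (h * 3600 + m * 60 + s) * FPS + f

def split_frames(frames):
    """Frame count -> (hours, minutes, seconds, frames incl. fraction)."""
    seconds, f = divmod(frames, FPS)
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return h, m, s, float(f)

old_frames = frames_since_zero(2, 0, 10)   # doorclose at 2:00:10:00
sample = old_frames * SPF_BEFORE           # its absolute sample number
new_frames = sample / SPF_AFTER            # same sample, new samples/frame

print(sample)                    # 318278961
print(split_frames(new_frames))  # (2, 0, 17, 6.3)
```

Dividing by 1470 instead of 1471.47 is the same as multiplying the frame count by 1001:1000, which is why the shortcut works.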

Now let's say we have our R3 A/B "2-beep" occurring at 3:00:06:00.00 in our DAW with the EDL starting at 02:59:30:00.00. Our reference is internal 30nd sampling at 44.1k. Now we decide to change the timecode to 29.97fps, and all of a sudden the timeline's numbers shift. Where does our "2-beep" occur now? In this case we're converting from 1470 samples/frame to 1471.47 samples/frame, which means we multiply the RELATIVE position of the "2-beep" by 1000:1001 and add the offset [(X - Xo) x 1000:1001 + Xo]:

(1000 x 3:00:06:00.00 + 02:59:30:00.00) ÷ 1001 = 3:00:05:28.92,

just over a one-frame shift. And there you have it, mystery solved, case closed. By the way, EDL scripts and cue sheet programs that allow the changing of timecode rates are based on a similar principle.
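For anyone who would rather let a script do the arithmetic, here is a minimal sketch of the offset calculation in Python. It assumes 30-frame non-drop counting and timecode written as HH:MM:SS:FF.ff; tc_to_frames, frames_to_tc, and remap are hypothetical names, not functions from any DAW or EDL tool.

```python
from fractions import Fraction

FPS = 30  # 30-frame non-drop counting, as in the examples above

def tc_to_frames(tc):
    """'HH:MM:SS:FF.ff' -> exact frame count since 00:00:00:00.00."""
    h, m, s, f = tc.split(":")
    return (int(h) * 3600 + int(m) * 60 + int(s)) * FPS + Fraction(f)

def frames_to_tc(frames):
    """Exact frame count -> 'HH:MM:SS:FF.ff' (fraction rounded to 2 places)."""
    seconds, f = divmod(Fraction(frames), FPS)
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}:{float(f):05.2f}"

def remap(tc, ratio, anchor="00:00:00:00.00"):
    """Scale the distance from `anchor` by `ratio`, then add the anchor back."""
    x, x0 = tc_to_frames(tc), tc_to_frames(anchor)
    return frames_to_tc((x - x0) * ratio + x0)

# DAW case: fixed point is the start of the EDL.
print(remap("03:00:06:00.00", Fraction(1000, 1001), anchor="02:59:30:00.00"))
# -> 03:00:05:28.92

# Recorder case: fixed point is 00:00:00:00.00.
print(remap("02:00:10:00.00", Fraction(1001, 1000)))  # -> 02:00:17:06.30
print(remap("01:00:00:00.00", Fraction(1000, 1001)))  # -> 00:59:56:12.11
```

The last call reproduces the shifted "2-beep" from the timecoded DAT example at the start of this section.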


Issue #3: The Tascam DA-88

In the last few years the Tascam DA-88 has become the workhorse standard as a comparatively inexpensive digital medium in film post, but when it comes to its pull-up/dn/off operations, it has the industry going "huh?" As with every piece of digital record/playback gear, it boils down to its timecode rate (the film post industry almost exclusively deals with 29.97 and 30fps) and sampling rate, assuming everything's referenced and playing back in-phase to house sync. To maintain the benefits of digital technology, 44.1k seems to be the favored sampling rate due to all the sound effects CD libraries in common use (by the way, would everybody please stop using that same trite warning buzzer sound from that popular effects library -- I'm sure you know the one I'm talking about), although 48k seems to be the "easy way out" since machines like the Sony 3348 have less complicated options for that sampling rate. (I must admit however that certain digital decks only operate at 48k, which necessitates that sampling rate, and that technically speaking 48k samples/sec maps easier into 29.97f/s than 44.1k samples/sec). Anyways, here's the low down:

Settings                                 Results
Tape TC            TC       Pull  Ref    TC Rate  Sampling Rate  Vari
29.97nd or 30nd    29.97nd  off   vid    29.97    44.1k          off
29.97nd or 30nd    29.97nd  off   int    29.97    44.1k          off
29.97nd or 30nd    29.97nd  up    vid    29.97    44.1k          off
29.97nd or 30nd    29.97nd  up    int    30       44.144k        on
29.97nd or 30nd    30nd     dn    vid    29.97    44.056k        off
29.97nd or 30nd    30nd     dn    int    30       44.1k          off
29.97nd or 30nd    30nd     off   vid    n/a      n/a            n/a
29.97nd or 30nd    30nd     off   int    30       44.1k          off

The parameters "TC Rate", "Sampling Rate", and "Vari" cannot be set by the user; the data in those columns are a result of the parameters set under "Tape TC", "TC", "Pull", and "Ref." "Tape TC" indicates the original timecode setting ("TC") of the tape when it was formatted. Notice that regardless of whether the tape was formatted at 29.97nd or 30nd, the results are identical for the same user settings! "TC" refers to the timecode setting of the DA-88. "Pull" is the Tascam's pull-up/dn/off setting. "Ref" indicates whether I used blackburst or the DA-88's internal clock as a timing reference. "TC Rate" is the actual speed of the timecode in frames/second. "Vari" is the vari-speed indicator light. Obviously this chart doesn't cover all the possibilities but one can easily extend it to do so.

So what does this all mean? Well, first off, the post community is under the impression that DA-88 tapes must always be pre-formatted at 30nd pull-down, which is not the case. If someone makes a mistake formatting at, say, 29.97nd pull-off, just set the DA-88 to 30nd pull-down before recording and playback and you'll be fine (if you record at 29.97nd pull-off and play back at 30nd pull-down, though, your timecode and sampling rate are both going to be out). There's also this myth that all predub sources at a film mix must be running at 44.056k. With regard to the DA-88, this means that tapes used as predub sources at a film mix must be at 30nd pull-down/44.056k. (Be aware that the Sony PCM800 and Tascam DA-88 will accept a 44.1k AES/EBU signal while recording at 44.056k -- an illegal procedure -- without giving you a warning. You'll definitely hear glitches when playing back.) However, if you're transferring tracks analog or if your DAW supports both 44.056k and 44.1k sampling rates, there's no reason you can't record all predubs at the higher 44.1k sampling rate. Sure, if you're mixing through an all-digital console then sampling rates matter, but that's far from being the case as it stands now. Sure, if you're mixing to a projected image running at a true 24fps or 30fps, then yes, you will need to bump everything up by 0.1%. Unfortunately, doing it this way means your timecode values are all going to be off, while most film mixes are to video anyways. Don't we have to eventually pull up our mix to real-world time? Yes, sure, when your soundtrack's in the theater running at 24/30fps. Even during the transfer stages everything may be done at video speed! Hence, predubs heading out of the editing facility to the mix house are not required to be at 44.056k (unless, of course, it's one of the house specs, in which case I strongly suggest they re-evaluate their post process)!


Issue #4: Tricking Stubborn DAWs

Transferring tracks digitally to a DA-88 means setting sampling rates that agree. Granted, certain DAWs run only at 44.056k while referenced to video (in this instance DA-88 predub sources must be at 30nd pull-dn, no way around it), while on the flip side others work solely at 44.1k regardless of their reference. However, there are ways to trick most DAWs that play at only a fixed sampling rate into functioning at different rates. By adjusting the DAW's sampling rate tolerance to 0.1% or 1000ppm (this may be an undocumented command), a machine that only plays back 44.056k or 44.1k will now record DIGITALLY at 44.1k or 44.056k respectively. On playback these DAWs will automatically pull-up/dn their programs. For example, if one needed to pull-up an 8-track mix from a DA-88 recorded at 30nd pull-dn referenced to blackburst (29.97nd, 44.056k), transfer the material digitally into an Avid AudioStation with its tolerance set to 50 samples/second or so. When the Avid plays back the mix, since the software runs only at 44.1k and assuming the startup hardware settings have been returned, it will operate at 44.1k.

An alternate method is to change the DAW's hardware reference clock speed, whether it's word clock, oversample clock, blackburst, whatever. In this case the user tricks the software into thinking that it's still being fed "proper" timing information, but in reality the program is brute-forced into running slightly slower or faster. Again, this will only work if one can adjust the DAW's timing tolerance to at least 0.1%. For instance, say you need to run audio 0.1% faster on a DAW that recorded the material analog at 44.056k while referenced to word clock. Change the sampling rate tolerance to 1000ppm, or 50 samples/second, or sync wide, or sync narrow off, or however the manufacturer decided to implement this feature, and bump up your word clock to 44.1k (there are several boxes that will do this). Simple as that! One disadvantage to the brute force method though is that since these DAWs only work at 44.1k/48k or 44.056k/47.952k referenced to blackburst, there's no option to remap the timecode timeline in your project. In other words, your EDL or session that used to display timecode that ran in sync at 29.97fps with NTSC color video won't anymore. It will now be either at 30fps or 29.9400898...fps, varying along with your word clock, so don't trust those numbers! As you can see, this approach is not for the fainthearted, and should only be used by those who are competent in the internal workings of DAWs.
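To see how wide the tolerance window has to be, the deviation between the two sampling rates can be expressed in parts per million. This is only a sketch: it assumes "44.056k" means 44.1k scaled by 1000:1001, and deviation_ppm is a hypothetical helper rather than any manufacturer's setting.

```python
from fractions import Fraction

PULL = Fraction(1000, 1001)  # pull-down factor from the 1001:1000 ratio

def deviation_ppm(nominal, actual):
    """Absolute deviation of `actual` from `nominal`, in parts per million."""
    nominal, actual = Fraction(nominal), Fraction(actual)
    return float(abs(actual - nominal) / nominal * 1_000_000)

pulled_down = 44100 * PULL   # the rate loosely called "44.056k"

print(deviation_ppm(pulled_down, 44100))  # 44.056k machine fed 44.1k: 1000.0
print(deviation_ppm(44100, pulled_down))  # 44.1k machine fed 44.056k: about 999
print(deviation_ppm(44100, 44150))        # 50 samples/second at 44.1k: about 1134
```

Under these assumptions a 0.1% pull works out to roughly 1000 parts per million, so a window of 50 samples/second at 44.1k covers it with a little room to spare.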

As an example, I was involved on a feature in which almost everything was cut on Pro Tools running at 29.97/44.056 (there's that myth again). The ADR was done with the Avid AudioVision's ADR Tool, which is programmed to map samples across its timeline at 29.97/44.1k or 48k (versus 29.97/44.056 or 47.952, or 30/44.1 or 48). Hence, the AudioVision exported the ADR session as strictly a 29.97/44.1 session, which in our case was the wrong frame/sampling rate combination. It was suggested that the sample rate tolerance on the AudioVision be set at 50 samples/second and the sampling rate from the Timeline MicroLynx be changed to 44.056k (the MicroLynx controls the AudioVision's sampling rate through a 256x Oversample Clock) before the start of the looping session. This theory is flawed in that the AudioVision's timeline maintains a constant ratio of samples per frame; therefore, slowing the sampling rate also means slowing the timeline! In other words, if the sampling rate had been set at 32k samples/second, the timeline would be running at 21.75fps, whereas 44.056k translates to 29.94fps -- only 44.1k corresponds to an NTSC color video speed of 29.97fps -- so on export Pro Tools would extrapolate a 29.97/44.1k session regardless! (However, had we recorded ADR at 21.75fps/32k, we'd hear chipmunks on playback).
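The flaw is easier to see with numbers. The sketch below models a timeline locked to a constant samples-per-frame ratio, as described above; the 1471.47 figure is 44.1k at 29.97, and timeline_fps is a made-up name for illustration only.

```python
from fractions import Fraction

PULL = Fraction(1000, 1001)

# Samples per frame the timeline is locked to (44.1k at NTSC video speed).
SAMPLES_PER_FRAME = Fraction(44100) / (30 * PULL)   # exactly 1471.47

def timeline_fps(sample_rate):
    """Frame rate the timeline actually runs at for a given clock rate."""
    return float(Fraction(sample_rate) / SAMPLES_PER_FRAME)

print(round(timeline_fps(44100), 2))         # 29.97
print(round(timeline_fps(44100 * PULL), 2))  # 29.94
print(round(timeline_fps(32000), 2))         # 21.75
```

Whatever clock is fed in, the samples-per-frame ratio never changes, so the exported session still describes a 29.97/44.1k mapping.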


Conclusion

Times are a changin'! It just isn't good enough for us in post to know or do only the "creative stuff" anymore, especially when artistry and technology have merged. (Actually there was always a one-to-one correlation between the two -- Leonardo da Vinci comes to mind; it's just more apparent now.) What, you think solving the techno mumbo-jumbo puzzle isn't a creative process? Anyways, the important thing is not what you know, it's how you apply what you know. "Push-button" knowing is no big deal. Understanding is.

Evan T. Chen is a freelance all-around post-production kinda guy who recently returned to the Los Angeles feature film scene after a two-and-a-half-year stint in New York. He is currently negotiating with various companies to design a DAW specifically for post based on high-end Unix and/or NT platforms.

[Note: This was the original biographical blurb that appeared in the 1997 issue of TV Technology. - Evan 2/21/03]

Copyright © 2010 by Evan T. Chen. All rights reserved.