Synchronization Discrepancies When Editing Audio for Film in the Video Domain


by
Evan T. Chen
August 25, 1995

Most of those who use Digital Audio Workstations (DAWs) to do post audio work for film in the video domain will testify to the overwhelming advantages digitized picture has over work-tape and traditional mag editing. However, retaining frame-locked audio synchronization is less of an issue with the latter method than it is when working with random-access picture or film-to-tape transfers. In fact, DAWs with digitized picture can still cause a full video frame synchronization discrepancy on the final optical if post-production "professionals" fail to understand certain critical issues involved in posting in video for film.

The single most important concept to grasp in posting for film in the video realm is the film-to-tape transfer: how is film, which runs at 24fps, transferred to NTSC color video, which runs at approximately 29.970fps? A telecine machine does the transfer, and it either "pulls up" the video to 30fps by a specified ratio -- the ratio of the frequency of the AC mains to the field rate of NTSC color, approximately 60.000/59.940 -- or "pulls down" the film (59.940/60.000) to approximately 23.976fps, so that the ratio of film frames to video frames is 24/30 = 23.976/29.970 = 4/5, a fraction of whole numbers. The problem now becomes how four film frames are mapped into five video frames. Since every video frame consists of two interlaced fields, the four film frames are equivalently mapped into ten video fields, which simplifies to two film frames per five video fields (4/5 => 4/10 = 2/5). Graphically this is what we have:

Figure 1
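The frame-rate arithmetic above can be checked with exact fractions. This is only a sketch: it assumes the exact NTSC color frame rate of 30000/1001 fps (the article rounds it to 29.970) and a pull factor of 1000/1001 (the article's approximate 59.940/60.000). The variable names are illustrative.

```python
from fractions import Fraction

FILM_FPS = Fraction(24)
# Assumption: exact NTSC color frame rate; the article quotes ~29.970 fps.
NTSC_FPS = Fraction(30000, 1001)
# Assumption: exact pull-down factor; the article quotes ~59.940/60.000.
PULL = Fraction(1000, 1001)

pulled_down_film = FILM_FPS * PULL   # film slowed to ~23.976 fps
pulled_up_video = NTSC_FPS / PULL    # video sped up to exactly 30 fps

# Film frames per video frame, computed both ways.
ratio_pull_up = FILM_FPS / pulled_up_video
ratio_pull_down = pulled_down_film / NTSC_FPS

# Two interlaced fields per video frame, so film frames per video field:
film_frames_per_field = ratio_pull_up / 2

print(ratio_pull_up, ratio_pull_down, film_frames_per_field)
```

Either way the ratio reduces to 4/5, and one video field spans 2/5 of a film frame, which is the unit used for the sync errors discussed later.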

There exist four logical combinations of mapping two film frames into five video fields:


Figure 2



Figure 3



Figure 4



Figure 5

In these examples period T is arbitrarily set to 1/12th of a second, which assumes actual film speed (24fps) and that the video is pulled up to 30fps during the transfer process.

The transfer sequences differ in two respects: whether the first film frame is transferred into three fields or two fields (3:2 sequence or 2:3 sequence, hence "3:2 pull-down"), and whether the first film frame is transferred into the first or second field of a video frame.
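As a rough illustration, the four sequences can be generated from a cadence and a starting field. The letter assignments below are an assumption inferred from the text ("A" and "C" are 2:3, "B" and "D" are 3:2, and the "C" example starts on a second field); the original figures may label them differently. `field_map` is a hypothetical helper name.

```python
from itertools import cycle, islice

# Assumption: letter labels inferred from the article's text, not from its figures.
# Each entry is (cadence, index of first field used); even = first field, odd = second.
SEQUENCES = {
    "A": ((2, 3), 0),
    "B": ((3, 2), 0),
    "C": ((2, 3), 1),
    "D": ((3, 2), 1),
}

def field_map(name, film_frames=4):
    """Return (video_frame, field_number, film_frame) for each video field."""
    cadence, start = SEQUENCES[name]
    out = []
    field = start
    counts = islice(cycle(cadence), film_frames)
    for film_frame, count in enumerate(counts, start=1):
        for _ in range(count):
            # field // 2 is the video frame index, field % 2 + 1 is field 1 or 2
            out.append((field // 2, field % 2 + 1, film_frame))
            field += 1
    return out

for name in SEQUENCES:
    print(name, field_map(name))
```

Four film frames always fill ten fields, so every sequence repeats after five video frames.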

When many workstations are stopped on a "frame," they actually display only the first or the second field of a video frame. In this article I'll assume the first field. For example, if a reel is transferred to video in a "C-sequence" and that video is subsequently digitized into the DAW, the first two transferred images as displayed by the workstation would be the following:


Figure 6

Now, if a sound editor were to use the DAW to place distinct audio events, such as a clapstick, footstep, gunshot, or door slam, against the corresponding picture at film frames one and two, the sounds would in reality be out of synchronization by 2/5 and 1/5 of a film frame (or 1 field and 1/2 of a field) respectively. The reason is that the actual film footage does not start -- in this particular instance -- on the first field of a video frame. In actuality film frame 1 starts one field early, and film frame 2 starts half a field early!


Figure 7

To be perfectly in sync the events would have to be shifted correspondingly and placed as shown in Figure 8.

One comment is in order. One film frame does not last ON SCREEN for precisely 1/24 of a second: it takes the projector a fraction of this time to advance the film. I am assuming in this article that this minuscule amount of time is negligible -- which it is.


Figure 8

One film frame may be transferred into the video realm in eight possible orientations as shown in Figures 9 and 10.


Figure 9



Figure 10

One may notice that in cases 5 and 7 the same film frame is transferred into three fields, two of which are first fields of a video frame. Therefore, in these instances, the DAW displays the same film frame on two consecutive video frames. The question arises whether to place an audio event on the first or the second of the duplicate shots. By observation the first is the better choice, having less of a local synchronization error. Hence, a good rule of thumb is to always place the audio event on the first of the two when a repeating shot due to a 3:2 pull-down is encountered (this is assuming that the DAW displays the first field).

Careful analysis will also show that, to keep local synchronization errors in sound editing to a minimum, a worktape transferred in an "A" or "C" sequence (2:3 sequences) is more advantageous than one transferred in a "B" or "D" sequence (3:2). This is shown in Figures 11 and 12.


Figure 11



Figure 12

Similar analysis for "C" and "D" sequences will show that ΔTC=ΔTA=4/5 film frame and ΔTD=ΔTB=6/5 film frame. Four film frames are used in the above illustration because that is the period of our repeating transfer sequence; in other words, the transfer cycle repeats every four film frames. Our recurring local synchronization errors therefore occur within this region as well. A note on Δtmax: this represents the greatest discrepancy in synchronization within one period of the transfer sequence. As we shall see, there is a way of minimizing both Δtmax and ΔT for certain cases.
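The quoted totals can be reproduced with a small model. This is a sketch under several assumptions: the sequence labels are inferred from the text as before, the DAW shows only first fields, each event is placed on the earliest displayed image of its film frame, and ΔT is taken to be the sum of the magnitudes of the four per-frame errors. `sync_errors` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import cycle, islice

FIELD = Fraction(2, 5)  # one video field, expressed in film frames

# Assumption: (cadence, first field index); even index = first field of a video frame.
SEQUENCES = {"A": ((2, 3), 0), "B": ((3, 2), 0), "C": ((2, 3), 1), "D": ((3, 2), 1)}

def sync_errors(name, film_frames=4):
    """Lateness of each placed event in film frames (negative means early)."""
    cadence, start = SEQUENCES[name]
    errors = []
    field = start
    for k, count in enumerate(islice(cycle(cadence), film_frames)):
        # The DAW displays only first fields (even indices); take the earliest one.
        shown = next(f for f in range(field, field + count) if f % 2 == 0)
        # Film frame k truly starts k film frames after the head of the reel.
        errors.append((shown - start) * FIELD - k)
        field += count
    return errors

for name in SEQUENCES:
    errs = sync_errors(name)
    total = sum(abs(e) for e in errs)    # assumed meaning of delta-T
    worst = max(abs(e) for e in errs)    # assumed meaning of delta-t max
    print(name, [str(e) for e in errs], "total", total, "max", worst)
```

Under these assumptions the totals come out to 4/5 of a film frame for "A" and "C" and 6/5 for "B" and "D", and the first two "C" errors are 2/5 and 1/5 of a film frame, matching the figures quoted in the text.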

Looking carefully at Figure 12, one might notice that whenever an audio synchronization error occurs, it always occurs late, after the visual cue. What if one were to globally shift the entire audio track(s) early by a fraction of a frame? This is illustrated in Figure 13.


Figure 13

Notice that Δtmax and ΔT have both decreased in value. Closer examination suggests that shifting the audio track(s) even further won't improve ΔT, since Δt1 and Δt4 will increase by the same amount that Δt2 and Δt3 decrease. However, it may improve ΔtBmax.

By shifting the audio track(s) 1.5/5 of a film frame, as shown in Figure 14, Δtmax and ΔT are both minimized.


Figure 14

Any further shifting will only cause Δtmax and then ΔT to increase, as shown in Figure 15. Using similar procedures, optimal "shifts" may be found for the other three transfer sequences. Remember, 1/5 of a film frame is one half of a video field or 1/120 of a second.


Figure 15
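The effect of a global shift can be tabulated numerically. The sketch below assumes the four "B"-sequence events land 0, 3/5, 2/5 and 1/5 of a film frame late (values derived from a first-field-display model, not read from the original figures) and that ΔT is the sum of the error magnitudes. `after_shift` is a hypothetical helper name.

```python
from fractions import Fraction

# Assumption: per-frame lateness for one "B" period, in film frames.
errors_b = [Fraction(0), Fraction(3, 5), Fraction(2, 5), Fraction(1, 5)]

def after_shift(errors, shift):
    """Return (total error, worst error) after moving the audio earlier by `shift`."""
    shifted = [e - shift for e in errors]
    return sum(abs(e) for e in shifted), max(abs(e) for e in shifted)

# Try shifts from 0 to 3/5 of a film frame in steps of 0.5/5.
candidates = [Fraction(n, 10) for n in range(7)]
table = {s: after_shift(errors_b, s) for s in candidates}

# Prefer the smallest total, then the smallest worst-case error.
best = min(candidates, key=lambda s: table[s])

for s in candidates:
    total, worst = table[s]
    print(f"shift {s}: total {total}, max {worst}")
print("best shift:", best)
```

With these inputs the total stays at 4/5 of a film frame for any shift between 1/5 and 2/5, while the worst-case error is smallest at 3/10, that is, 1.5/5 of a film frame.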

The location of the "2-beep" or "2-pop" is crucial to correct synchronization between picture and sound, but depending on how the "2" frame was transferred, the location of the "2-pop" may not align perfectly with the frame on the worktape. In some instances a guide track in mag with a one-frame beep is transferred along with the film, and if the alignment of the audio track and transfer proceeded without error, then the transferred "2-pop" is properly positioned on the worktape, whether it appears to be in synchronization with the "2" frame or not. If it is required that a "2-pop" be added by the sound editor, then to achieve perfect synchronization it is crucial to know exactly how the "2" frame is transferred. For example, referring to Figures 9 and 10, an editor would be incorrect six out of eight times if using a DAW. As an example, see Figure 16 below.


Figure 16

Furthermore, in cases 1 and 5 in Figures 9 and 10, the two instances in which the beep is placed correctly, ambiguity arises for the latter case since the workstation displays the "2" frame twice. The rule of thumb applies here as well: in the event that the DAW displays the same image twice consecutively, always align the audio event with the first image. Also, since one film frame lasts longer than a video frame, be sure to make the "2-pop" the proper length, that is 1/24th of a second, not 1/30th of a second! If the beep is shorter than a film frame, the optical lab may decide to synchronize the end of the film frame with the end of the beep, rather than the beginning of the film frame with the beginning of the beep (this rarely occurs but why give them that choice?). By making the duration of the "2-pop" a full film frame in length, the lab can't help but align it perfectly with the "2" frame.
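To put the length difference in concrete terms, the two durations can be converted to samples. The 48 kHz sample rate below is purely a hypothetical session setting, and `pop_length_samples` is an illustrative helper; substitute whatever rate the project actually uses.

```python
from fractions import Fraction

def pop_length_samples(sample_rate, frames_per_second):
    """Number of audio samples spanned by one frame at the given frame rate."""
    return Fraction(sample_rate, frames_per_second)

SAMPLE_RATE = 48000  # assumption: hypothetical session sample rate

film_pop = pop_length_samples(SAMPLE_RATE, 24)    # one film frame
video_pop = pop_length_samples(SAMPLE_RATE, 30)   # one video frame
shortfall = film_pop - video_pop

print(film_pop, video_pop, shortfall)
```

At this rate a beep cut to one video frame falls 400 samples short of a full film frame, which is exactly the slack that would let the lab align on the wrong edge.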

Since so much of sound editing for film in video depends on the transfer process, how does one determine how a particular film frame was transferred? The easiest way is to punch a hole in the first frame to be transferred (or have some other distinguishing mark), have the lab burn-in a timecode window along with field dominance, and have the lab specify the transfer sequence. It is then just a matter of verifying the transfer sequence by comparing the punch hole frame on a VTR with the timecode burn-in. An in-house VITC-capable window generator can be used for further verification. Unfortunately, different transfer houses have varying degrees of control over how a film is transferred to video, and some may not be able to modify these parameters at all. For example, many transfer houses can't specify field dominance.

Without knowing exactly the location of the first frame transferred, one can only determine a particular frame transfer to within 1/5 of a film frame accuracy. With a workstation this procedure is based on the fact that in all four transfer sequences there always exist film frames that are displayed twice consecutively, a marker to which all other frames can be referenced. Also, if the worktape has a timecode window, a transfer sequence will repeat every five video frames. That is, if the first frame of an "A" sequence is at 1:00:00:00, then video frames at 1:00:00:05, 1:00:00:10, 1:00:00:15, 1:00:00:20, 1:00:00:25, 1:00:01:00, etc. will also be first frames of an "A" sequence. This assumes non-drop-frame timecode, though; drop-frame timecode requires additional calculation. Once this information is known, finding out transfer information about individual frames is just a matter of literally "filling in the blanks."
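The five-frame repetition is easy to compute for non-drop-frame timecode. The following is a minimal sketch; the function names are illustrative, and it deliberately does not handle drop-frame timecode, which needs the additional calculation mentioned above.

```python
FPS = 30  # non-drop-frame timecode counts 30 frames per second

def tc_to_frames(tc):
    """Convert 'H:MM:SS:FF' non-drop-frame timecode to an absolute frame count."""
    hours, minutes, seconds, frames = (int(part) for part in tc.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * FPS + frames

def cycle_position(tc, reference_tc):
    """Position (0-4) of a video frame within the five-frame transfer cycle.

    reference_tc is a frame known to be the first frame of a cycle.
    """
    return (tc_to_frames(tc) - tc_to_frames(reference_tc)) % 5

reference = "1:00:00:00"
for tc in ("1:00:00:05", "1:00:00:25", "1:00:01:00", "1:00:00:07"):
    print(tc, cycle_position(tc, reference))
```

A result of 0 means the frame starts a new cycle; any other value gives the number of video frames since the last cycle start.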

For example, suppose this is what the DAW displays:


Figure 17

Hence, without knowing the first frame transferred, two possibilities exist:


Figure 18

Armed with this knowledge, many sync-related problems can be avoided, albeit sometimes in an unorthodox manner. For example, I recently worked on a project in which a few days before the mix the negative cutter discovered four film frames missing from the workprint. After further investigation we concluded that the frames were not consecutive: two film frames were missing near the beginning of the film, one was missing in the middle, and one more later on. New workprints were out of the question due to a looming deadline, so with my fingers crossed I shifted portions of the audio track by carefully-calculated amounts (this was definitely a brain-teaser). After mixing the soundtrack, I requested a pulled-up transfer to mag be made in order to check sync before going to the final optical. The verdict: no problems.

One final instance is changeovers. With digital audio workstations, butt-splicing the beginning of the next reel to the end of the previous reel is not necessarily the best method of creating changeovers. Again referring to Figures 9 and 10, assume that the LFOA of the first reel is transferred as in case 5, and the FFOA of the second reel is transferred as in case 1. Butt-splicing produces what is shown in Figure 19:


Figure 19

Remember, on most DAWs video can only be moved into the nearest first field; second field edits are not allowed. Now, although the workstation's video timeline appears fine, what the audience in the theater is seeing and hearing is shown below in Figure 20. Thus, the audio is late by 3/5 of a film frame. A 3/5 film frame shift in the audio tracks of the second reel would remedy this situation, but notice that the audio would overlap the tracks in reel one by 1/5 of a film frame.


Figure 20

Finally, how relevant are a few fifths of a film frame? Is it really detrimental to the soundtrack if the audio synchronization is off by this much? Won't human limitations introduce more of an error during changeovers? Well, it may not get you fired, but the tried-and-true mag guys who have converted to Digital Audio Workstations have been noticing that their non-mag projects seem a little "rubbery." Furthermore, these minor faults do add up. In an industry where so many people at so many facilities can create so many problems at so many different stages of the filmmaking process, any help minimizing these distortions, I'm sure, would be greatly appreciated by those most critically involved. And it's not just about sync discrepancies: it's about having the knowledge to be able to solve the various post-related issues that seem to inevitably arise on every project. Plus, it's always the attention to detail that makes a person, a facility, or a project stand out. Anybody can just get by. Perhaps there should be a "code of honor" engraved in stone somewhere in the film community concerning the responsibilities and ethics of post-production personnel. Maybe there should be a standardized test, governed by a select committee, given to those wanting to become unionized "professionals." It's too bad that one can conceivably create more problems if these aspects are misunderstood and therefore misused. But then again, maybe a DAW will be able to compensate automatically in the near future.

Copyright © 2010 by Evan T. Chen. All rights reserved.