I am attempting to reconstruct the velocity of an object from iPhone videos, so I need to determine the time difference between frames. From what I understand, the iPhone (HEVC encoding, .mov files) records at a variable frame rate (VFR). Because I need high accuracy, I cannot assume the video has a constant frame rate (CFR). So far I've only been able to recover pts_time values that effectively reflect CFR video, but based on other posts it seems that recovering the true per-frame timestamps should be possible. I am working with ffmpeg 5.1.1 on macOS Monterey 12.6.

Here's what I've tried:

  1. "Confirmed" that the video is VFR by running mediainfo input.mov from the command line, with output:
Video
ID                                       : 1
Format                                   : HEVC
Format/Info                              : High Efficiency Video Coding
Format profile                           : Main@L5@Main
Codec ID                                 : hvc1
Codec ID/Info                            : High Efficiency Video Coding
Duration                                 : 1 s 372 ms
Source duration                          : 1 s 947 ms
Bit rate                                 : 32.2 Mb/s
Width                                    : 3 840 pixels
Height                                   : 2 160 pixels
Display aspect ratio                     : 16:9
Rotation                                 : 180°
Frame rate mode                          : Variable
Frame rate                               : 25.685 FPS
Minimum frame rate                       : 7.500 FPS
Maximum frame rate                       : 75.000 FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Bits/(Pixel*Frame)                       : 0.151
Stream size                              : 4.78 MiB (91%)
Source stream size                       : 5.23 MiB (99%)
Title                                    : Core Media Video
Encoded date                             : UTC 2022-08-27 18:44:20
Tagged date                              : UTC 2022-08-27 18:44:20
Color range                              : Limited
Color primaries                          : BT.709
Transfer characteristics                 : BT.709
Matrix coefficients                      : BT.709
Codec configuration box                  : hvcC

Audio
ID                                       : 2
Format                                   : AAC LC
Format/Info                              : Advanced Audio Codec Low Complexity
Codec ID                                 : mp4a-40-2
Duration                                 : 1 s 372 ms
Source duration                          : 1 s 440 ms
Bit rate mode                            : Variable
Bit rate                                 : 181 kb/s
Channel(s)                               : 2 channels
Channel layout                           : L R
Sampling rate                            : 44.1 kHz
Frame rate                               : 43.066 FPS (1024 SPF)
Compression mode                         : Lossy
Stream size                              : 30.3 KiB (1%)
Source stream size                       : 31.9 KiB (1%)
Title                                    : Core Media Audio
Encoded date                             : UTC 2022-08-27 18:44:20
Tagged date                              : UTC 2022-08-27 18:44:20

Other #1
Type                                     : meta
Duration                                 : 1 s 372 ms
Source duration                          : 9 s 245 ms
Stream size                              : 0.00 Byte
Source stream size                       : 10.0 Bytes

Other #2
Type                                     : meta
Duration                                 : 1 s 372 ms
Source duration                          : 9 s 245 ms
Stream size                              : 0.00 Byte
Source stream size                       : 8.00 Bytes
  2. Ran ffmpeg -i input.mov -vf vfrdet -an -f null - (which suggests it's not VFR?), with output:
MacBook-Pro-6:timing_validation $ ffmpeg -i 30_4k-3ft.mov -vf vfrdet -an -f null -
ffmpeg version 5.1.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with Apple clang version 13.1.6 (clang-1316.0.21.2.5)
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '30_4k-3ft.mov':
  Metadata:
    major_brand     : qt  
    minor_version   : 0
    compatible_brands: qt  
    creation_time   : 2022-08-27T18:44:20.000000Z
    com.apple.quicktime.make: Apple
    com.apple.quicktime.model: iPhone 13 Pro Max
    com.apple.quicktime.software: 15.0
    com.apple.quicktime.creationdate: 2022-08-27T11:21:43-0700
  Duration: 00:00:01.37, start: 0.000000, bitrate: 32207 kb/s
  Stream #0:0[0x1](und): Video: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv, bt709), 3840x2160, 22539 kb/s, 25.68 fps, 30 tbr, 600 tbn (default)
    Metadata:
      creation_time   : 2022-08-27T18:44:20.000000Z
      handler_name    : Core Media Video
      vendor_id       : [0][0][0][0]
      encoder         : HEVC
    Side data:
      displaymatrix: rotation of -180.00 degrees
  Stream #0:1[0x2](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 181 kb/s (default)
    Metadata:
      creation_time   : 2022-08-27T18:44:20.000000Z
      handler_name    : Core Media Audio
      vendor_id       : [0][0][0][0]
  Stream #0:2[0x3](und): Data: none (mebx / 0x7862656D), 0 kb/s (default)
    Metadata:
      creation_time   : 2022-08-27T18:44:20.000000Z
      handler_name    : Core Media Metadata
  Stream #0:3[0x4](und): Data: none (mebx / 0x7862656D), 0 kb/s (default)
    Metadata:
      creation_time   : 2022-08-27T18:44:20.000000Z
      handler_name    : Core Media Metadata
Stream mapping:
  Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
  Metadata:
    major_brand     : qt  
    minor_version   : 0
    compatible_brands: qt  
    com.apple.quicktime.creationdate: 2022-08-27T11:21:43-0700
    com.apple.quicktime.make: Apple
    com.apple.quicktime.model: iPhone 13 Pro Max
    com.apple.quicktime.software: 15.0
    encoder         : Lavf59.27.100
  Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, bt709, progressive), 3840x2160, q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
    Metadata:
      creation_time   : 2022-08-27T18:44:20.000000Z
      handler_name    : Core Media Video
      vendor_id       : [0][0][0][0]
      encoder         : Lavc59.37.100 wrapped_avframe
    Side data:
      displaymatrix: rotation of -0.00 degrees
frame=    1 fps=0.0 q=-0.0 size=N/A time=00:00:00.03 bitrate=N/A speed=0.0622x
frame=   16 fps= 15 q=-0.0 size=N/A time=00:00:00.53 bitrate=N/A speed=0.498x
frame=   41 fps= 28 q=-0.0 Lsize=N/A time=00:00:01.36 bitrate=N/A speed=0.921x
video:19kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_vfrdet_0 @ 0x7ff2c8836fc0] VFR:0.000000 (0/40)
  3. Ran ffprobe -select_streams v:0 -show_entries packet=pts_time,duration_time,stream_index input.mov, which shows a couple of deviations from CFR (although duration_time doesn't reflect this), but the overall range of pts_time values doesn't match the video duration shown in the previous output:
[PACKET]
stream_index=0
pts_time=-0.433333
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=-0.300000
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=-0.366667
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=-0.166667
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=-0.233333
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=-0.033333
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=-0.100000
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.100000
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.033333
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.000000
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.066667
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.233333
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.166667
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.133333
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.200000
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.366667
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.300000
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.266667
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.333333
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.500000
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.433333
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.400000
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.466667
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.633333
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.566667
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.533333
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.600000
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.766667
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.700000
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.666667
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.733333
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.900000
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.833333
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.800000
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.866667
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=1.033333
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.966667
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=0.933333
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=1.000000
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=1.166667
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=1.100000
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=1.066667
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=1.133333
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=1.300000
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=1.233333
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=1.200000
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=1.266667
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=1.433333
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=1.366667
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=1.333333
duration_time=0.033333
[/PACKET]
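
Note that the packets above are listed in decode order, so the pts_time values are not monotonic. To read off inter-frame gaps they have to be sorted into presentation order first. Below is a minimal sketch of that step, assuming the ffprobe text has been saved or captured as a string. The helper names `parse_pts_times` and `inter_frame_deltas` and the `SAMPLE` string are hypothetical, made up for illustration; `SAMPLE` just repeats the first five packets from the listing.

```python
import re


def parse_pts_times(ffprobe_text):
    """Collect every numeric pts_time value from ffprobe's default [PACKET] output."""
    pattern = r"^pts_time=(-?\d+(?:\.\d+)?)\s*$"
    return [float(v) for v in re.findall(pattern, ffprobe_text, flags=re.MULTILINE)]


def inter_frame_deltas(pts_times):
    """Sort timestamps into presentation order and return successive gaps (seconds)."""
    ordered = sorted(pts_times)
    return [round(later - earlier, 6) for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical sample: the first five packets from the listing, in decode order.
SAMPLE = """\
[PACKET]
stream_index=0
pts_time=-0.433333
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=-0.300000
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=-0.366667
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=-0.166667
duration_time=0.033333
[/PACKET]
[PACKET]
stream_index=0
pts_time=-0.233333
duration_time=0.033333
[/PACKET]
"""

for gap in inter_frame_deltas(parse_pts_times(SAMPLE)):
    print(f"{gap * 1000:.3f} ms")
```

For the full file, the same functions could be applied to the text produced by redirecting the ffprobe command above into a file. This only reports what the container timestamps say; it does not tell me whether those timestamps match the actual capture times.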

Some questions:

  1. Are frames being dropped in the process, or is some other assumption being made?

  2. Is there any guarantee on the accuracy of the inter-frame time differences? It would be great if they were accurate to the millisecond.

  3. If this isn't realistic, are there other libraries/modules that can do this? I'm interested in anything open-source that can run from the command line or in Python/C++.
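
For context on question 2, the ffmpeg stream line above reports 600 tbn, i.e. a container time base of 1/600 s, so stored timestamps cannot be finer than one tick of that clock. Here is a small sketch of the arithmetic, assuming that time base applies to the video track. The names `TIME_BASE`, `tick_ms`, and `ticks_per_frame` are hypothetical, chosen only for this example.

```python
from fractions import Fraction

# Assumption: time base taken from "600 tbn" in the ffmpeg stream description.
TIME_BASE = Fraction(1, 600)


def tick_ms(time_base):
    """Duration of one timestamp tick in milliseconds."""
    return float(time_base * 1000)


def ticks_per_frame(fps, time_base):
    """Number of time-base ticks spanned by one frame at a nominal frame rate."""
    return Fraction(1, fps) / time_base


print(f"one tick = {tick_ms(TIME_BASE):.4f} ms")
print(f"ticks per frame at 30 fps = {ticks_per_frame(30, TIME_BASE)}")
```

So even in the best case the container timestamps are quantized to roughly 1.67 ms steps, which is coarser than the 1 ms I was hoping for. Whether they reflect the real capture times at all is a separate question.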

ereastin
  • Based on the packet timings, doesn't look like the optical acquisition times are used for encoding. The mebx streams may contain this info. Contact the developers/forum for exiftool to see if they can decode those. – Gyan Oct 27 '22 at 05:31
  • Will check it out, thanks. If I forced these videos to mp4 at film time (iPhone has a max compatibility option), do you know if this would be any different? – ereastin Oct 28 '22 at 02:33
  • I doubt it. Both can accommodate VFR. – Gyan Oct 28 '22 at 04:16
