My idea is: 1) Determine how frequently you want to update the number on screen. Every frame is too often, once per second is too infrequent. Maybe 4× per second is OK. 2) Determine how much time to average over. One or two seconds seems fine. 3) From those two numbers, compute number of buckets. For example update 4× per second, average over 2 seconds, gives you 8 buckets. 4) Every time a frame is drawn, based on current time, you either a) increment the counter in the last bucket or b) shift all buckets over one position and increment the new last bucket from zero to one. 5) Average the sum of frames drawn over that time period you decided earlier. 6) Don't display fraction of a FPS. 7) Done. It costs you 8 integers, plus maybe one more for a timestamp of the last bucket, plus maybe one more to implement ring buffer instead of shifting all those integers over. Less than a single CPU cache line.
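That scheme is small enough to sketch directly. A minimal C version (the struct and function names are mine; the 8-bucket / quarter-second numbers are the example values from above):

```c
#define NUM_BUCKETS 8        /* update 4x per second, average over 2 seconds */
#define BUCKET_SECS 0.25     /* 1 / (updates per second) */

typedef struct {
    int buckets[NUM_BUCKETS];   /* frames drawn per quarter-second slot */
    int head;                   /* ring-buffer index of the newest bucket */
    double bucket_start;        /* time at which the newest bucket began */
} FpsCounter;

void fps_init(FpsCounter* c, double now) {
    for (int i = 0; i < NUM_BUCKETS; i++) c->buckets[i] = 0;
    c->head = 0;
    c->bucket_start = now;
}

/* Step 4: call once per drawn frame with the current time in seconds. */
void fps_frame(FpsCounter* c, double now) {
    while (now - c->bucket_start >= BUCKET_SECS) {   /* rotate stale buckets */
        c->head = (c->head + 1) % NUM_BUCKETS;
        c->buckets[c->head] = 0;
        c->bucket_start += BUCKET_SECS;
    }
    c->buckets[c->head]++;
}

/* Steps 5-6: sum over the window, divide by its length, round to whole FPS. */
int fps_value(const FpsCounter* c) {
    int frames = 0;
    for (int i = 0; i < NUM_BUCKETS; i++) frames += c->buckets[i];
    return (int)(frames / (NUM_BUCKETS * BUCKET_SECS) + 0.5);
}
```

The ring buffer replaces the "shift all buckets over" step: advancing `head` and zeroing one slot is the shift.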
If all you care about is a windowed average, you can implement the FPS counter (really you keep track of seconds per frame) using an exponential moving average, which has constant time and space complexity regardless of window size. The calculation you do once per frame is:

spf_avg = alpha * cur_spf + (1.0f - alpha) * spf_avg;

For the alpha value you can use the formula:

alpha = 2/(n+1)
This will give smoothing comparable to an n-sample moving average. It's the same formula used for n-day exponential moving averages for stocks.
As the article points out, this is a sample based window which is not as good as a time based window, but it's also dead simple to implement.
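For completeness, the whole counter is only a few lines of C (a sketch; the names are mine, and the fixed alpha comes from the 2/(n+1) rule):

```c
/* Exponentially smoothed seconds-per-frame, roughly comparable to
   an n-sample moving average via alpha = 2/(n+1). */
typedef struct {
    float spf_avg;
    float alpha;
} EmaFps;

void ema_init(EmaFps* e, int n_samples) {
    e->alpha = 2.0f / (float)(n_samples + 1);
    e->spf_avg = 0.0f;           /* converges after the first few frames */
}

void ema_frame(EmaFps* e, float cur_spf) {
    e->spf_avg = e->alpha * cur_spf + (1.0f - e->alpha) * e->spf_avg;
}

float ema_fps(const EmaFps* e) {
    return (e->spf_avg > 0.0f) ? 1.0f / e->spf_avg : 0.0f;
}
```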
Edit:
Just spitballing, because I haven't thought about this problem in a while and asked an AI for a formula for an EMA with variable-duration events, so take it with a grain of salt. Maybe for a time-based window you could use a dynamic alpha (forgetting factor) with the following formula:

alpha = 1-e^-(cur_spf/window_secs)
For dynamic alpha, start by pretending the new incoming, windowed data doesn't exist. Look at your old data. How quickly do you want it to disappear? It'll have some formula like e^-(bt) for some constant b governing what proportion from 0 to 1 of that data still remains.
So...T seconds have elapsed since your last frame, and you have a new data point you'd like to incorporate into your EMA (and, for this problem, that data point is T itself). You keep e^-(bT) of the old data, 1 minus that of the new data, and you're done. Alpha is indeed 1-e^-(b * cur_spf) for some constant b, just like the AI said.
Which b do you choose though? I usually prefer to think of it in terms of half lives. Let H be your desired half life, and set 1-e^-(bH) equal to 1/2. You get b=log(2)/H. That's similar to the AI answer, but it rescales window_secs into a parameter you can actually start to reason about. The AI answer gives you a 1/e life instead, which is a less comfortable constant to mentally process.
As a fun fact, you can tailor this sort of stateless solution to have almost any decay property you'd like. Start with your favorite ODE satisfying a few properties, and this equation falls out as the step a discrete solver would take to approximate the ODE.
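Concretely, for the exponential case (my derivation, in the notation above): start from the ODE

y' = -b (y - x)

i.e. y relaxes toward the input x at rate b. With x held constant over a step of length T, the exact solution is

y(T) = x + (y(0) - x) e^-(bT)

which rearranges to y_new = (1 - alpha) * y_old + alpha * x with alpha = 1 - e^-(bT), exactly the EMA update above.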
That's basically a one-pole low pass filter. You can use this for smoothing any kind of data that arrives at a steady rate.
You can also dynamically calculate the filter coefficient based on the current delta time, which makes sure that the smoothing behavior is independent from the current framerate.
Thanks, I added a formula for a dynamic filter coefficient to my original comment. It makes sense intuitively to me, but I'm also not certain if the exact formula I provided is correct.
Ideally you'd want to measure _perceived_ performance of the game by players, which would probably depend on the _lowest_ "fps" value during the specific interval. I've seen some games change the colour of the fps counter based on whether or not there were significant FPS _dips_ below the one-second average. So e.g. you might be able to render 100 frames in a specific second, but if one frame took 0.1s and the others took the rest, then for users it'll feel like the game plays at 10fps at that point, even though the actual number of frames rendered is much higher.
> but if one frame took 0.1s and the others took the rest, then for users it'll feel like the game plays at 10fps at that point
Wouldn't it feel like 10fps for 0.1s only? I agree it's a good thing to measure, I think it's usually called "stutter", but I'm not sure you can say "it feels like 10 fps" since it's such a brief moment.
Yes, you are right. The word I was looking for is smoothness: the game won't feel like a stable 10fps, but it will feel as _smooth_ as a 10fps game, or even worse, because it's less predictable, and our brains can adapt to stable latency relatively well.
Right, if that happens every 5 seconds or so, you'd basically feel intermittent stutters, but again, it wouldn't really "feel like a 10fps game", because those render 10 frames every second, not "X frames per second, plus a 0.1s frame every Y seconds". A game like that would just feel "stuttery", at least from my perspective, not "like a 10fps game".
FPS based on the median of a moving window is good if you want perceived frame rate, which rejects extreme outliers.

FPS based on the average of a moving window is good if you want statistical mean frame rate.
> FPS based on last frame is good to see load spikes, particular periodic ones, if you graph them.
This.
The other two are not good for anything gaming-related, as any framerate inconsistency breaks the experience. A stable frame rate is much more important than a high one.
It all boils down to what the FPS counter is supposed to show. In my games I use three delta-time indicators: 100%, low 1%, and low 0.1%, averaged over a 10 s rolling window. This helps with spotting dropped frames and stutters.
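One way to compute such a low-1% / low-0.1% figure (a sketch; the names and the copy-and-sort approach are mine, not from the comment): keep the window's frame times, sort a copy descending, and average the worst tail.

```c
#include <stdlib.h>

/* qsort comparator: largest (slowest) frame times first. */
static int cmp_desc(const void* a, const void* b) {
    float fa = *(const float*)a, fb = *(const float*)b;
    return (fa < fb) - (fa > fb);
}

/* Average of the worst `tail_fraction` of `count` frame times,
   e.g. tail_fraction = 0.01f for the "low 1%" indicator.
   Returns seconds per frame; use 1/x for an FPS-style number. */
float low_percent_spf(const float* times, int count, float tail_fraction) {
    float* sorted = malloc(count * sizeof *sorted);
    for (int i = 0; i < count; i++) sorted[i] = times[i];
    qsort(sorted, count, sizeof *sorted, cmp_desc);
    int tail = (int)(count * tail_fraction);
    if (tail < 1) tail = 1;                  /* always at least the worst frame */
    double sum = 0.0;
    for (int i = 0; i < tail; i++) sum += sorted[i];
    free(sorted);
    return (float)(sum / tail);
}
```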
Technically, the methods with a queue drop up to an entire frame at the beginning of the window. Depending on how the averageProcessingTime() function is implemented, this can mean either faster recovery after a single heavy frame (if it divides by the sum of the durations of the frames in the window) or slightly lower than actual values overall (if it just divides by the duration of the window).
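The two behaviors in a quick sketch (`averageProcessingTime()` itself is the article's; these standalone variants are mine). Suppose one frame straddled the window edge, so only 59 frames of a 60 fps second fall inside a 1-second window:

```c
/* Variant A: frame count divided by the summed durations of those
   frames; recovers faster after a single heavy frame, since the
   heavy frame also enlarges the denominator. */
float fps_by_sum(const float* times, int count) {
    float sum = 0.0f;
    for (int i = 0; i < count; i++) sum += times[i];
    return (float)count / sum;
}

/* Variant B: frame count divided by the fixed window length; reads
   slightly low when part of a frame fell off the start of the window. */
float fps_by_window(int count, float window_secs) {
    return (float)count / window_secs;
}
```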
But that's just the nerd in me talking. The article is great!
Present-to-present time, especially while we wait for VK_EXT_present_timing to become adopted, can only be indirectly measured. This makes just-in-time rendering unnecessarily hard. High-accuracy event timings can only be made for rendering, not presentation. The missed latches can be seen by phase doubling. Waiting on the last frame to finish displaying requires use of a fence on a separate timing thread. The timings provided by this and by VK_KHR_present_wait are muddied with OS scheduler latency. Spin-locking the waits with zero timeouts should be a thing, but does not seem to be guaranteed. The compositor also seems to inject scheduler jitter.
After all that, people can talk about averaging methods, but there's a lot to be done before what this blog is talking about is even available.
The reason solving just-in-time rendering is important is that queue priority is not actually supported by most drivers. Some extensions can give us global priority for the process, not real priority for queues. The right way, then, to keep workload A from causing workload B to miss a latch is to put workload A into the idle time that would exist from running B just in time. This is itself a luxury resting on the fact that workload B is lightweight enough that its own uncertainty only rarely exceeds the latch deadline.
At least on VRR displays, making B a bit late has much less dire consequences, but driving refresh from the application needs exclusive access to the display, and not all compositors want to provide this.
Please do reach out if it seems like I'm only still catching up. I'm sure someone knows a decent way to get sub-millisecond just-in-time rendering accuracy without watching the phase suddenly double on FRR. Ping https://github.com/positron-solutions/mutate and we can get in touch.
...and don't just smooth your measured frame duration for displaying the FPS (or better: frame duration in milliseconds), but also use it as actual frame time for your animations and game logic timing to prevent micro-stutter.
The measured frame duration will have jitter of up to 1 or even 2 milliseconds for various 'external reasons', even when your per-frame work fits comfortably into the vsync interval every single frame. Using an extremely precise timer doesn't help much unfortunately, it will just very precisely measure the externally introduced jitter which your code has absolutely no control over :)
What you are measuring is basically the time distance between when the operating system decides to schedule your per-frame workload. But OS schedulers (usually) don't know about vsync, and they don't care about being one or two milliseconds late, and this may introduce micro-stutter when naively using a measured frame time directly for 'driving the game logic'.
For instance if the previous frame was a 'long' frame, but the current frame will be 'short' because of scheduling jitter, you'll overshoot and introduce visible micro-stuttering, because the rendered frames will still be displayed at the fixed vsync-interval (I'm conveniently ignoring vsync-off or variable-refresh-rate scenarios).
The measurement jitter may be caused by other reasons too, e.g. on web browsers all time sources have had reduced precision since Spectre/Meltdown, but thankfully the resulting jitter goes both ways, and averaging/filtering over enough frames gives you back the exact refresh interval (for instance 8.333 or 16.667 milliseconds, even when the time source only has millisecond precision).
On some 3D APIs you can also query the 'presentation timestamp', but so far I only found the timestamp provided by CADisplayLink on macOS and iOS to be completely jitter-free.
I also found an EMA (Exponential Moving Average) filter more useful than a simple sliding-window average (which I used before in sokol_app.h). A properly tuned EMA filter reacts quicker and 'less harshly' to frame duration changes (like moving the render window to a display with a different refresh rate), and it also has less implementation complexity because it doesn't require a ring buffer of previous frame durations.
TL;DR: proper frame timing for games is a surprisingly complex topic because desktop operating systems are usually not "tuned" for game workloads.
Also see the "classic" blog post about frame timing jitter:
https://medium.com/@alen.ladavac/the-elusive-frame-timing-16...