Make Flame 32 times faster with collision detection and additional tricks

Alexey Volkov
5 min readDec 10, 2022

Flame rendering performance is strictly dependent on how fast your CPU is and how many operations it must perform at each rendering cycle. I already described it in detail in my previous article about optimization techniques.

But if you have a lot of interacting objects, the most expensive operation in your game would be collision detection. It might be so expansive that you would not see anything else except slow collision calculation on the performance-monitoring screen. It will hide all other unoptimized things and become the most painful problem.

By default, Flame uses “Sweep and prune” algorithm at broad phase of collision detection. It works fast enough if you have less than 100 collideable components. In other case you will have big performance problems.

But not since Flame 1.4.0 release. Since this release you can change broad phase algorithm to more efficient one: “Quad tree”.

Quad Tree implementation problems

I’ll skip detail description of “what broad phase is?” — it is already well described in official Flame documentation and many other articles about collision detection.

I’ll also skip detailed description of “how does Quad Tree work?” — this is very common thing for gamedev ant there are a lot of articles about.

Here I want to describe Flame’s implementation features, why it would work faster with this engine than anything you’ll find in internet.

For Dart we already have implementation of the algorithm: quadtree_dart. It works nice, it even has spectacular demo, but it shares problems of many other implementations — no matter what language was used:

  1. Authors of algorithms firstly think about objects that constantly moves. So that’s why they do not bother about state preservation, they just recalculate whole map every calculation tick, every time rebuild quadrants from scratch. As the advantage of such approach, they highlight that a map would be divided into quadrants in most optimal way for collision calculation.
  2. But in many games, there are a lot of static objects, at least temporary static. In such situation it worth to divide the map into quadrants once and then only receive updates for moved objects, and to create new sections if needed. And to skip resource-heavy step of removing unnecessary quadrants.
  3. As a consequence of 1st point, many algorithms do not save relation between object and quadrant, instead it is calculated every time by object’s coordinates. This is also too expensive, at least for Dart.

So, many of “algorithmic guys” prefer to run calculations over and over instead saving state in something like HashMap and just update it incrementally in cause if something would change. It absolutely worth to spend a little more memory, to create additional data structures but increase performance a lot.

Flame’s implementation is based on one of such algorithms (to be concrete, on this one: https://gamedevelopment.tutsplus.com/tutorials/quick-tip-use-quadtrees-to-detect-likely-collisions-in-2d-space--gamedev-374 ), but all mentioned problems are solved.

Benchmarks

A month before “QuadTree feature” become a part of Flame, I wrote small benchmark app to demonstrate difference between “Sweep and prune” and “Quad tree”. You can to try it here. It uses Quad Tree from external library, still not refactored and merged into “main”. But it does not matter, if there is some difference in performance — it would be absolutely small.

At this example you can compare performance in 3 modes:

  1. Standard Flame’s “Sweep” algorithm
  2. New “Quad Tree” algorithm
  3. New “Quad Tree” algorithm with avoiding of unnecessary “updateTree” and “renderTree” calls

Here is what I saw in first mode, with “TestGameStandard” launched:

D means disappointment

Very disappointing picture. The most of application time is spend on calculating collisions, and this can not pe playable at all.

Let’s change “TestGameStandard” to “TestGameQuadTree” but let’s additionally remove the mixin “UpdateOnce” from “Brick” class. Most probably you would see something like this:

Nice but still not ideal.

Well, new collision system works 2x faster than “Sweep”! But we still did not achieve our goal: at least 60 FPS. What’s a problem?

As we can see at CPU Flame Chart, collision detection system does not spend so much time as before, and it reveals for us another bottleneck: useless “updateTree” calls. If you would explore this example, you will notice that it does not render every Brick individually. Instead, Bricks are pre-rendered into “LayerComponent” and this cached picture is displayed every game tick. Brick’s state is never changed. So, what actually does updateTree then? Nothing. It just spends our recources, running unnecesary loops and calling functions withought any real effect.

Well, let’s bring “UpdateOnce” mixin back into its place at Brick class. And launch example at this mode. The result should look like this:

This is how a “success” looks like

Congratulations, we got it!

This is more precise benchmark results for better evaluation of effectivity of every used solution:

Microseconds spend on “update()”:

  • with “Sweep and Prune”: average 80 000
  • with “Quad Tree”: average 2500

Objects pairs which are potentially collided:

  • with “Sweep and Prune”: 19–150
  • with “Quad Tree”: 13–60, and 0 with additional check of minimal distance.

As we can see, “Quad Tree” is faster than “Sweep and prune” and works a bit more effective. However, unnecessary updateTree() calls are your another greatest enemy. Combining these two approaches we reached 32 times performance improvement!

Problems

You might to think that this algorithm is something like “silver bullet” for collision performance problem. But this would not be a truth.

Here is a list of problems:

1. Quadrants recursive searching. We can not just check one quadrant. Some objects are “on edge” between quadrants, it means that we should check wider range of objects. Let’s look at following example:

Objects, marked by red lines, are “objects on the edge”, such objects belong to multiple quadrants. It makes us to gather objects from “parent” quadrants. In worst cause collision check could be performed between target object and all objects on the map.

2. We should know game field size. If you game field will grow up dynamically, you will need to clear all QuadTree and rebuild it from scratch, again and again. This operation isn’t cheap.

So, QuadTree still is very basic and general optimization approach. It just increases possible amount of collideable game objects, but it is still limited.

--

--