bobhansky/FrustumCullingPerformanceAnalysis

FrustumCullingPerformanceAnalysis

Features of this project

  • Bounding Volume Hierarchy
  • CPU frustum culling
  • GPU frustum culling
  • Compute shader
  • Instance Drawing

The main purpose is to compare the performance gain from each of these features and to analyze where the performance bottleneck lies.

Frustum Culling:

Demo

6 Versions:

  • Ver1: No instance drawing (draw call for every identical object) + No Frustum Culling.
  • Ver2: No instance drawing + With CPU Frustum Culling
  • Ver3: With Instance drawing + No Frustum Culling
  • Ver4: With Instance drawing + With CPU Frustum Culling. Update and send the model matrices (4 x 16 = 64 bytes per instance, N instances) of visible objects to the shader through a VBO every frame
  • Ver5: With Instance drawing + With CPU Frustum Culling. Send all model matrices of an object to the shader in an SSBO once, then update and send only the indices (4 bytes per instance, N instances) of visible objects to the shader every frame
  • Ver6: With Instance drawing + With GPU Frustum Culling (Compute Shader).

Note

1. The BVH stores exactly one object per leaf node. Nodes are split along the axis of longest extent, at its midpoint.
2. CPU frustum culling tests visibility by traversing the BVH and testing the bounding box of each visited node against the view frustum.
3. GPU frustum culling tests visibility by sending the bounding box of every object to a compute shader, which tests each one (brute force).
4. Applications are run in Release mode unless Debug mode is explicitly stated.
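Note 1 can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the names `AABB`, `BVHNode`, `build` and `countLeaves` are hypothetical, the split position is assumed to be the midpoint of the node's bounding box, objects are assumed to be sorted by box center, and the median fallback for degenerate splits is an addition of this sketch.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <memory>
#include <vector>

// Axis-aligned bounding box; starts "empty" so grow() can accumulate into it.
struct AABB {
    std::array<float, 3> lo{ 1e30f,  1e30f,  1e30f};
    std::array<float, 3> hi{-1e30f, -1e30f, -1e30f};

    void grow(const AABB& other) {
        for (int a = 0; a < 3; ++a) {
            lo[a] = std::min(lo[a], other.lo[a]);
            hi[a] = std::max(hi[a], other.hi[a]);
        }
    }
    float center(int axis) const { return 0.5f * (lo[axis] + hi[axis]); }
    float extent(int axis) const { return hi[axis] - lo[axis]; }
};

struct BVHNode {
    AABB box;
    int object = -1;  // object index on a leaf, -1 on an internal node
    std::unique_ptr<BVHNode> left, right;
    bool isLeaf() const { return object >= 0; }
};

// Builds a BVH over ids[first, last); boxes[i] is the bounding box of object i.
std::unique_ptr<BVHNode> build(const std::vector<AABB>& boxes,
                               std::vector<int>& ids, int first, int last) {
    assert(first < last);
    auto node = std::make_unique<BVHNode>();
    for (int i = first; i < last; ++i) node->box.grow(boxes[ids[i]]);

    // Exactly one object per leaf.
    if (last - first == 1) {
        node->object = ids[first];
        return node;
    }

    // Pick the axis along which the node's box is longest.
    int axis = 0;
    for (int a = 1; a < 3; ++a)
        if (node->box.extent(a) > node->box.extent(axis)) axis = a;
    const float mid = node->box.center(axis);

    // Objects whose box center lies below the midpoint go to the left child.
    auto it = std::partition(ids.begin() + first, ids.begin() + last,
                             [&](int id) { return boxes[id].center(axis) < mid; });
    int split = static_cast<int>(it - ids.begin());

    // Assumed fallback: if every object landed on one side, split by count.
    if (split == first || split == last) split = first + (last - first) / 2;

    node->left = build(boxes, ids, first, split);
    node->right = build(boxes, ids, split, last);
    return node;
}

int countLeaves(const BVHNode& n) {
    return n.isLeaf() ? 1 : countLeaves(*n.left) + countLeaves(*n.right);
}
```

Calling `build(boxes, ids, 0, ids.size())` with `ids` filled with 0..N-1 returns the root. With one object per leaf, the tree has exactly N leaves.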

Test Environment

Test scene description: cup objects (666 triangles each) arranged in a 3D grid with dimension NUM = 40, for a total of 64001 cups.

Every version is measured from the same camera position and viewing angle, rendering the same content to the frame.

Tests were performed on Windows 11 with an Intel Core i7-12700H and an NVIDIA RTX 3060 Laptop GPU.

Result

Ver1: No instance drawing (draw call for every identical object) + No Frustum Culling.

Naive drawing issues 64001 (instances) * 2 (meshes) draw calls, resulting in an unsurprisingly low 7 fps.


Ver2: No instance drawing + With CPU Frustum Culling

Naive drawing + CPU frustum culling draws only 29450 objects, for a total of 29450 * 2 draw calls. The frame rate roughly doubles to 15 fps compared to Ver1.


Ver3: With Instance drawing + No Frustum Culling

Instance Drawing + No Frustum Culling reduces the number of draw calls to 2 but renders all 64001 objects. The frame rate rises to 68 fps, a 353% increase over Ver2.
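The draw-call counts and percentage figures quoted in these results can be reproduced with a few lines of arithmetic. The helper names `drawCalls` and `percentIncrease` are hypothetical and exist only for this illustration. The inputs are the numbers reported above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Number of draw calls per frame. Without instancing every drawn object
// issues one call per mesh; with instancing there is one call per mesh.
std::uint64_t drawCalls(bool instanced, std::uint64_t objectsDrawn,
                        std::uint64_t meshesPerObject) {
    return instanced ? meshesPerObject : objectsDrawn * meshesPerObject;
}

// Relative change from `before` to `after`, in percent.
double percentIncrease(double before, double after) {
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

For example, `drawCalls(false, 64001, 2)` gives 128002 for Ver1, `drawCalls(false, 29450, 2)` gives 58900 for Ver2, and `percentIncrease(15.0, 68.0)` gives about 353.3, matching the Ver2 to Ver3 figure.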


Ver4: With Instance drawing + With CPU Frustum Culling. Update and send the model matrices (4 x 16 = 64 bytes per instance, N instances) of visible objects to the shader through a VBO every frame

Instance Drawing + CPU Side Frustum Culling:

In Release mode, where CPU performance is not capped, the bottleneck is on the GPU. The frame rate rises to 137 fps from the 68 fps of Ver3, a 101% increase.

In Debug mode, the CPU side is significantly slower (used here to mimic a CPU bottleneck). With frustum culling enabled, the extra computation to check whether each bounding box is inside the view frustum consumes nearly half of the CPU cycles. In this case the frame rate with frustum culling is 37 fps, significantly lower than Ver3, which still reaches 68 fps in Debug mode.

The CPU profile in Visual Studio 2022 shows that ~45% of the time is spent in updateVisibleObject(), in which the program traverses the BVH and tests bounding volumes against the view frustum.
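The kind of work done in that function can be sketched as below. This is an assumption-laden illustration rather than the actual body of updateVisibleObject(): the types `Plane`, `Frustum`, `AABB`, `Node` and the function `collectVisible` are hypothetical, planes are assumed to point inward, and the box test is the common conservative "positive vertex" check.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <vector>

// Plane with inward-facing normal: a point is inside when
// nx*x + ny*y + nz*z + d >= 0.
struct Plane { float nx, ny, nz, d; };
using Frustum = std::array<Plane, 6>;

struct AABB { float lo[3]; float hi[3]; };

struct Node {
    AABB box;
    int object = -1;  // object index on a leaf, -1 on an internal node
    std::unique_ptr<Node> left, right;
};

// True when the whole box lies on the outer side of the plane. Only the
// corner farthest along the plane normal (the "positive vertex") is tested.
bool outside(const AABB& b, const Plane& p) {
    const float x = p.nx >= 0.0f ? b.hi[0] : b.lo[0];
    const float y = p.ny >= 0.0f ? b.hi[1] : b.lo[1];
    const float z = p.nz >= 0.0f ? b.hi[2] : b.lo[2];
    return p.nx * x + p.ny * y + p.nz * z + p.d < 0.0f;
}

// Conservative test: may keep a box that is slightly outside a frustum
// corner, but never rejects a box that is visible.
bool intersects(const AABB& b, const Frustum& f) {
    for (const Plane& p : f)
        if (outside(b, p)) return false;
    return true;
}

// Depth-first traversal: a rejected node discards its whole subtree.
void collectVisible(const Node* n, const Frustum& f, std::vector<int>& out) {
    if (n == nullptr || !intersects(n->box, f)) return;
    if (n->object >= 0) {
        out.push_back(n->object);
        return;
    }
    assert(n->left && n->right);  // internal nodes have two children
    collectVisible(n->left.get(), f, out);
    collectVisible(n->right.get(), f, out);
}
```

Each visited node costs up to six plane tests on the CPU. With tens of thousands of visible leaves per frame, this is consistent with culling taking a large share of CPU time in Debug mode.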


Ver5: With Instance drawing + With CPU Frustum Culling. Send all model matrices of an object to the shader in an SSBO once, then update and send only the indices (4 bytes per instance, N instances) of visible objects to the shader every frame

The per-frame CPU-to-GPU transfer size is reduced to 1/16 of that needed when sending matrices every frame. The frame rate rises to 153 fps, an 11.678% increase over Ver4 in Release mode.
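The 1/16 figure follows from the per-instance payload: a 4x4 float matrix versus a single 32-bit index. The sketch below works this out. The constant and function names are hypothetical. The visible count of 29450 is taken from the Ver2 result and assumed to apply here because every version renders the same view.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");

// Ver4 uploads one 4x4 float model matrix per visible instance.
constexpr std::size_t kMatrixBytes = 16 * sizeof(float);     // 64 bytes
// Ver5 uploads one 32-bit index per visible instance.
constexpr std::size_t kIndexBytes = sizeof(std::uint32_t);   // 4 bytes

static_assert(kMatrixBytes == 16 * kIndexBytes, "index payload is 1/16 of a matrix");

// Bytes sent from CPU to GPU each frame for `visible` instances.
std::size_t perFrameUploadBytes(std::size_t visible, std::size_t bytesPerInstance) {
    assert(bytesPerInstance > 0);
    return visible * bytesPerInstance;
}
```

With 29450 visible cups, `perFrameUploadBytes(29450, kMatrixBytes)` is 1884800 bytes per frame for Ver4, against 117800 bytes for Ver5 with `kIndexBytes`.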

Ver6: With Instance drawing + With GPU Frustum Culling (Compute Shader).

In Release mode, the frame rate drops to 140 fps. The bottleneck is already on the GPU in Release mode, and the GPU now also has to perform frustum culling while the program has to synchronize with the execution of the compute shader. In Debug mode, where the bottleneck is on the CPU, moving the frustum culling work to the GPU raises the frame rate from 37 to 138 fps (Debug mode Ver4 vs. Debug mode Ver6).
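The brute-force GPU pass can be pictured with a CPU-side emulation. The repository's actual compute shader is not shown here, so everything below is an assumption: `cullOne` stands in for one shader invocation, the `std::atomic` counter stands in for an atomic counter in GPU memory, and the loop in `cullAll` stands in for the dispatch. All names are hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cstdint>
#include <vector>

// Plane with inward-facing normal: inside when nx*x + ny*y + nz*z + d >= 0.
struct Plane { float nx, ny, nz, d; };
struct AABB { float lo[3]; float hi[3]; };

// Conservative box-vs-frustum test using the corner farthest along each normal.
bool boxVisible(const AABB& b, const std::array<Plane, 6>& frustum) {
    for (const Plane& p : frustum) {
        const float x = p.nx >= 0.0f ? b.hi[0] : b.lo[0];
        const float y = p.ny >= 0.0f ? b.hi[1] : b.lo[1];
        const float z = p.nz >= 0.0f ? b.hi[2] : b.lo[2];
        if (p.nx * x + p.ny * y + p.nz * z + p.d < 0.0f) return false;
    }
    return true;
}

// Emulates one invocation: test one object, and if it is visible, reserve a
// slot in the output list with an atomic increment and write its index there.
void cullOne(std::uint32_t id, const std::vector<AABB>& boxes,
             const std::array<Plane, 6>& frustum,
             std::vector<std::uint32_t>& visible,
             std::atomic<std::uint32_t>& count) {
    if (id >= boxes.size()) return;  // guard for padded dispatch sizes
    if (boxVisible(boxes[id], frustum)) visible[count.fetch_add(1)] = id;
}

// Emulates the dispatch: every object is tested, with no hierarchy.
// Returns the number of visible objects; their indices fill the front of `visible`.
std::uint32_t cullAll(const std::vector<AABB>& boxes,
                      const std::array<Plane, 6>& frustum,
                      std::vector<std::uint32_t>& visible) {
    visible.assign(boxes.size(), 0);
    std::atomic<std::uint32_t> count{0};
    for (std::uint32_t id = 0; id < boxes.size(); ++id)
        cullOne(id, boxes, frustum, visible, count);
    return count.load();
}
```

On a real GPU the invocations run in parallel, so the order of indices in the output list is not deterministic. The draw call can only consume the list once the compute pass has finished, which is the synchronization cost mentioned above.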
