CS513 Assignment 4.2: Faster Particles (Computation in GPU)
by Muhammad Karim
General Configurations:
   # Development Platform: Linux 64-bit, Code::Blocks IDE with GCC, Debug+Release build
   # Used the OpenFrameworks library for Linux
   # CPU: Intel Core2Duo 2.24 GHz
   # Graphics Hardware: ATI Mobility Radeon HD 3470 (OpenGL 2.1)
Particle Rendering Results:
Particles (Position and Velocity) updated in GPU (PBO+FBO) with Blending
128x128 particles generated, FBO updates were tested in debug rectangle behind
Comparison with and without blending
Particle image fragments were discarded after a certain distance from the center
Timings for particle updates and rendering: client-side (CPU) updates with Vertex Arrays vs. updates on the GPU and rendering using PBOs
FPS was recorded 10 seconds after starting the application, to let the particles move around and stabilize.
All values in the table are frames per second (FPS).
# of Particles    VA + CPU    R2VB + GPU
32x32             57          35
64x64             60          62
128x128           56          57
256x256           32          36
512x512           11          15
1024x1024         3           3
***Update on Point Sprites on Windows
I have updated to the latest ATI driver (using the Mobility Modder tool to install the actual Catalyst distribution from ATI rather than
the only driver made available for the laptop by the manufacturer), and ... got Point Sprites to work on Windows (Finally!)
Discussion:
Dynamics: Each particle is generated at a source position with a random direction and shoots upwards against a gravitational pull. As the gravitational pull increases over time, the particles drop to the floor and bounce off.
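A minimal CPU-side sketch of this kind of update is shown here, not the actual assignment code. The Particle struct, the stepParticle function and the parameters gravity, floorY and restitution are all illustrative names, and a constant gravity with explicit Euler integration is assumed for simplicity.

```cpp
#include <cassert>

// Hypothetical particle state; field names are illustrative only.
struct Particle {
    float x, y, z;     // position
    float vx, vy, vz;  // velocity
};

// Advance one particle by dt seconds with explicit Euler integration.
// gravity, floorY and restitution are assumed parameters, not values
// taken from the assignment.
inline void stepParticle(Particle& p, float dt, float gravity,
                         float floorY, float restitution) {
    assert(dt > 0.0f);

    // Gravity pulls the vertical velocity downwards.
    p.vy -= gravity * dt;

    // Move the particle along its (updated) velocity.
    p.x += p.vx * dt;
    p.y += p.vy * dt;
    p.z += p.vz * dt;

    // Bounce: if the particle went below the floor, put it back on the
    // floor and reflect the vertical velocity, losing some energy.
    if (p.y < floorY) {
        p.y = floorY;
        p.vy = -p.vy * restitution;
    }
}
```

Calling stepParticle once per particle per frame corresponds to the VA + CPU path. On the GPU path the same arithmetic would live in a fragment program that writes the new position and velocity into a render target.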
For particle updates (position and velocity) on the GPU, I used FBOs to bake the position and velocity of the particles, with ping-pong swapping between 2 sets of buffers (at each frame one is the source and the other is the destination render target).
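The ping-pong idea can be illustrated without any OpenGL calls. The sketch below is a CPU stand-in, assuming two plain float arrays in place of the two FBO-attached textures. The class name PingPong and its methods are hypothetical and do not come from the assignment code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// CPU stand-in for a pair of render targets used in ping-pong fashion.
// On the GPU each vector would be a texture attached to an FBO.
class PingPong {
public:
    explicit PingPong(std::size_t count) : src_(0), dst_(1) {
        buffers_[0].assign(count, 0.0f);
        buffers_[1].assign(count, 0.0f);
    }

    std::vector<float>& source() { return buffers_[src_]; }
    std::vector<float>& destination() { return buffers_[dst_]; }

    // Exchange the roles of the two buffers.
    void swap() { std::swap(src_, dst_); }

    // One update pass: read every element from the source, write the
    // result to the destination, then swap so that the freshly written
    // buffer becomes the source of the next pass.
    template <typename Fn>
    void pass(Fn fn) {
        const std::vector<float>& in = buffers_[src_];
        std::vector<float>& out = buffers_[dst_];
        assert(in.size() == out.size());
        for (std::size_t i = 0; i < in.size(); ++i) {
            out[i] = fn(in[i]);
        }
        swap();
    }

private:
    std::array<std::vector<float>, 2> buffers_;
    int src_;
    int dst_;
};
```

The point of the two buffers is that a pass never reads from the target it is writing to, which is the same constraint that applies when a texture is both sampled and rendered to on the GPU.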
Before rendering the particles in each frame, the updated position contents of the FBO were copied on the GPU into the PBO using OpenGL's PBO functions. I found this to be a little slower at low particle counts but faster at reasonably higher particle counts (64x64 or more).
I tried using VTF (vertex texture fetch) in the vertex program, but since I am using an ATI card it only seems to support the arbvp1 profile, which does not provide texture lookups. I tried compiling the programs in GLSL but could not get correct behaviour out of them, so I only tried the R2VB technique.
I used a single render target for each FBO, which means I had to use separate FBOs for particle position and velocity. For some reason I could not access more than 1 texture sampler (using tex2D) from a fragment program that is bound with MRT! The programs seem to output different colors to multiple targets correctly, but could not fetch texel values from multiple samplers. I am planning to translate my shaders to GLSL and see if that works.
GPU computation seems to give better results in most cases with relatively high particle counts, but the difference was not very significant in release builds. Most likely this is because the GPU is a lower-end mobility-class part while the CPU is relatively high-end and fast.
Point sprites seem to generate coordinates for the fragment program correctly and render correctly on the Linux platform; the same code does not seem to work on Windows (driver issue).
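The results mention that particle image fragments were discarded beyond a certain distance from the center. That per-fragment test can be mirrored on the CPU as below. This is only a sketch: it assumes the point-sprite coordinates run from 0 to 1 across the sprite, and the function name keepFragment and the radius parameter are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Returns true when a fragment at sprite coordinate (s, t) should be
// kept, and false when a fragment program would discard it.
// Assumes s and t each run from 0 to 1 across the point sprite, so the
// sprite center is at (0.5, 0.5). radius is an assumed cutoff distance.
inline bool keepFragment(float s, float t, float radius) {
    assert(radius >= 0.0f);
    const float dx = s - 0.5f;
    const float dy = t - 0.5f;
    return std::sqrt(dx * dx + dy * dy) <= radius;
}
```

With a radius of 0.5 this keeps the disc inscribed in the sprite and drops the corners, which gives round particles from square point sprites.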
I will upload code and binaries in a few days; I have not yet figured out how to make a distributable package on Linux.