CS513 Assignment 4.2: Faster Particles (Computation in GPU)

by Muhammad Karim

General Configurations:
   # Development Platform: Linux 64-bit, Code::Blocks IDE with GCC, Debug+Release build
   # Library: openFrameworks for Linux
   # CPU: Intel Core2Duo 2.24 GHz
   # Graphics Hardware: ATI Mobility Radeon HD 3470 (OpenGL 2.1)

  1. Particle Rendering Results:

    Particles (Position and Velocity) updated in GPU (PBO+FBO) with Blending


    128x128 particles generated; FBO updates were verified in the debug rectangle behind the particles

    Comparison with and without blending


    Particle image fragments were discarded after a certain distance from the center
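
    The distance-based discard mentioned in the last caption can be sketched on the CPU as follows. This is only an illustration in C++ of the per-fragment test, not the fragment program used in the assignment; the function name shouldDiscard and the radius parameter are hypothetical, and the actual cutoff distance is not stated in this write-up.

```cpp
#include <cassert>
#include <cmath>

// Returns true if the fragment at point-sprite coordinate (u, v), with both
// components in [0, 1], lies farther than `radius` from the sprite centre
// (0.5, 0.5) and should therefore be discarded.
// `radius` is a hypothetical parameter; the real cutoff is not given.
bool shouldDiscard(float u, float v, float radius) {
    assert(radius >= 0.0f);
    const float du = u - 0.5f;
    const float dv = v - 0.5f;
    return std::sqrt(du * du + dv * dv) > radius;
}
```

    With a radius of 0.5 the corners of the sprite quad are discarded and a round particle remains.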

  2. Timings for particle updates and rendering: client-side (CPU) update with Vertex Arrays vs. GPU update with rendering from PBOs

    FPS was recorded 10 seconds after starting the application, to let the particles move around and stabilize.
    All values in the table below are frame rates in frames per second (FPS); higher is better.

    # of Particles    VA + CPU    R2VB + GPU
    32x32             57          35
    64x64             60          62
    128x128           56          57
    256x256           32          36
    512x512           11          15
    1024x1024         3           3
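
    To compare rows with different particle counts, it can help to convert a frame rate into a frame time and a particle update rate. The helpers below are a small sketch of that arithmetic, assuming the table values are frames per second; the function names are hypothetical and not part of the assignment code.

```cpp
#include <cassert>

// Time spent on one frame, in milliseconds, for a given frame rate.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Particle updates per second for a side x side grid of particles
// rendered at the given frame rate.
double particleUpdatesPerSecond(int side, double fps) {
    assert(side > 0 && fps > 0.0);
    return static_cast<double>(side) * side * fps;
}
```

    For example, frameTimeMs(15.0) gives the frame time for the 512x512 R2VB + GPU entry, and particleUpdatesPerSecond(512, 15.0) gives the corresponding update rate.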

  3. ***Update on Point Sprites on Windows

    I have updated to the latest ATI driver (using the Mobility Modder tool to install the actual Catalyst distribution by ATI rather than the only driver made available for the laptop by the manufacturer), and ... got Point Sprites to work on Windows (finally!)




  4. Discussion:

    1. Dynamics: Each particle is generated at a source position with a random direction and shoots upwards against a gravitational pull. As the effect of gravity builds up over time, the particles drop to the floor and bounce off it.

    2. For particle updates (position and velocity) on the GPU, I used FBOs to bake the positions and velocities of the particles, with ping-pong swapping between 2 sets of buffers (at each frame one set is the source and the other is the destination render target).

    3. Before rendering the particles in each frame, the updated position contents of the FBO were copied on the GPU into the PBO using OpenGL's PBO functions. I found this to be a little slower at low particle counts but faster at reasonably high particle counts (64x64 or more).

    4. I tried using VTF (vertex texture fetch) in the vertex program, but since I am using an ATI card it only seems to support the arbvp1 profile. I tried compiling the programs in GLSL but could not get correct behaviour out of them, so I only used the R2VB technique.

    5. I used a single render target for each FBO, which means I had to use separate FBOs for particle position and velocity. For some reason I could not access more than 1 texture sampler (using tex2D) from a fragment program that is bound with MRT! The program seems to output different colors to the multiple targets correctly, but it could not fetch texel values from multiple samplers. I am planning to translate my shaders to GLSL to see if that works.

    6. GPU computation seems to give better results in most cases with a relatively high particle count, but the difference was not very significant in release builds. Most likely this is because the GPU is a lower-end mobility-class part while the CPU is relatively high-end and fast.

    7. Point sprites seem to generate coordinates for the fragment program correctly and render correctly on Linux, but the same code does not seem to work on Windows (driver issue).

    8. I will upload the code and binaries in a few days; I have not yet figured out how to make a distributable package on Linux.
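
    The dynamics from item 1 and the ping-pong scheme from item 2 can be illustrated with a CPU-side analogue. This is a minimal sketch in C++, not the shader code of the assignment: two std::vector-based states stand in for the two sets of FBO render targets, and every name and constant (ParticleState, PingPong, kGravity, kFloorY, kRestitution, the emission ranges) is a hypothetical choice, since the write-up does not give actual values. It assumes a constant gravitational acceleration integrated with a simple Euler step.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

struct Vec3 { float x, y, z; };

// Stand-in for one set of render targets: position and velocity per particle.
struct ParticleState {
    std::vector<Vec3> position;
    std::vector<Vec3> velocity;
};

// Hypothetical constants; the write-up does not state the real values.
constexpr float kGravity = -9.8f;     // acceleration along the y axis
constexpr float kFloorY = 0.0f;       // height of the floor
constexpr float kRestitution = 0.6f;  // fraction of speed kept after a bounce

// Emit n particles at `source`, each with a random, upward-biased velocity.
ParticleState emit(std::size_t n, Vec3 source, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> side(-1.0f, 1.0f);
    std::uniform_real_distribution<float> up(4.0f, 8.0f);
    ParticleState s;
    s.position.assign(n, source);
    s.velocity.resize(n);
    for (Vec3& v : s.velocity) {
        v = Vec3{side(rng), up(rng), side(rng)};
    }
    return s;
}

// One update pass: reads only from src and writes only to dst, as a
// fragment program reads source textures and writes a render target.
void step(const ParticleState& src, ParticleState& dst, float dt) {
    assert(src.position.size() == src.velocity.size());
    const std::size_t n = src.position.size();
    dst.position.resize(n);
    dst.velocity.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        Vec3 v = src.velocity[i];
        Vec3 p = src.position[i];
        v.y += kGravity * dt;  // gravity accumulates in the velocity
        p.x += v.x * dt;
        p.y += v.y * dt;
        p.z += v.z * dt;
        if (p.y < kFloorY) {   // bounce off the floor
            p.y = kFloorY;
            v.y = -v.y * kRestitution;
        }
        dst.position[i] = p;
        dst.velocity[i] = v;
    }
}

// Two states; each frame one is the source and the other the destination.
struct PingPong {
    ParticleState buffers[2];
    int src = 0;  // index of the state that is read this frame

    ParticleState& source() { return buffers[src]; }
    ParticleState& destination() { return buffers[1 - src]; }
    void swap() { src = 1 - src; }
};

// Advance one frame, then swap roles so the fresh data becomes the source.
void advance(PingPong& pp, float dt) {
    step(pp.source(), pp.destination(), dt);
    pp.swap();
}
```

    A caller would fill buffers[0] with emit(...) once and then call advance(...) every frame; after each call, source() holds the most recent positions, which is the state that would be copied out for rendering.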