Real-time many-body dynamic collision detection with CUDA
https://tianxiag.github.io/parallel/
We would like to provide a parallel scheme for solving the real-time many-body dynamic collision detection problem with CUDA. We want to base our collision detection methods on Kinetic Sweep and Prune algorithm that take both the objects’ shapes and the objects’ motion into account.
The Sweep and Prune algorithm is a classic algorithm for solving the collision detection problem. It involves mainly 2 phases: sweep phase and prune phase. It will first check if the endpoints of objects have overlap in the sweep phase, and then check whether the objects collide in the prune phase. On top of that, we want to use motion model to find collision detection dynamically in terms of motion. The steps of the model can be summarized in the graph below:
(Source: Kinetic Sweep and Prune for Collision Detection by Daniel S. Coming and Oliver G. Staadt http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.115.4443&rep=rep1&type=pdf ). Because collision detection is widely used in computer graphics and computer animations, and also because the computation is very intensive for this problem, collision detection might easily become a bottleneck in graphics rendering. On the other hand, computationally intensive might also mean that there are rooms for parallelism once the problems of data communication and data dependency are solved.
We hope that parallelism could greatly increase the speed of collision detection and could enhance the overall experience of graphics rendering.
The problem is challenging in the following ways.
It has time dependency. The work from the next timestamps rely on the works from previous timestamps, so we have to figure out a way to hide the stall.
We need to solve the work balance problem. Because we are dealing with a multi objects problem, we would not want to compare one object with all other objects. In this way, we have to find the closest objects of a particular object. Because those objects will not be distributed evenly, we need to find a way to balance the work, and also to solve the potential locality problem.
Computing: GHC Machines.
Code: Start from scratch.
- A paper that describes one possible implementation of Collision Detection.
- A survey paper that summarizes and compares several parallelization methods.
PLAN TO ACHIEVE
An implementation that ensures the correctness of collision detection.
A set of pairs of “input that varies in shapes and object numbers” and “output that validates the quality of the collision detection algorithm”.
A CUDA-based parallelized version that has a speedup ratio relatively proportional to the hardware capability.
HOPE TO ACHIEVE
Other parallelized versions based on OpenMP, MPI, SIMD, and/or Pthreads.
A preprocessing module that can select the most suitable parallelization algorithm according to the features of the input.
Language: C/C++.
CUDA: Given the independence of input, CUDA should be able to reach a satisfactory scalability.
11.1-11.7: Read paper, do background research, start to generate valid input-output pairs.
11.8-11.14: Finish to generate valid input-output pairs, develop the serial version of collision detection code, develop performance analysis scripts.
11.15-11.21: Finish the serial version of collision detection code, start to develop a CUDA-based parallelization version, write the milestone report.
11.22-11.28: Finish the development of the CUDA version, generate more input-output pairs and analyze the relationship between scalability and input features.
11.29-12.5: Develop other parallelization versions, keep marking down the insights from the analysis results.
12.6-12.9: Write down final report.
After doing many researches on collision detection algorithm, we choose the V-Collide algorithm designed in http://gamma.cs.unc.edu/V-COLLIDE/ to be the base of our sequential collision detection algorithm, and build the kinetic motion part on it. We configed and ran the start code on customized simple cases such as boxes and circles meeting each other halfway.
The start code is built to be a library instead of executable files. To migrate the starter code to Cuda, we simplified the packaging issue by linking the collision detection method directly to the testing cases.
Progress
We can now run the serialized version to ensure the functionality of collision detection. The Cuda algorithm is implemented by inheriting the sequential method. The integration has been tested to be successful.
Schedule
We are on the track as we expected at the beginning.
Because the collision detection code is very long and has a lot of dependencies, it took us a while to figure out the logic and reorganize the code to run in parallel. Currently we have a runnable version of code (in serial), and are still trying our best to ensure the correctness of our cuda collision detection code. We spent longer than expected time on migrating the sequential code to Cuda because of the unexpected complexity of code. However, we hope that we can catch up during Thanksgiving time. We still believe that we can finish our goals on time. For now, we decide to stick to the original plan, with the schedule slightly revised.
Revised Schedule
Hao
11.22-11.24 - Finish parallelized code in Cuda for n-body collision parts.
11.25-11.28 - Propose and implement methods to parallelize over the OBB tree of the collision detection algorithm.
11.28-12.1 - Finish parallelized code for simple cases such as 2 boxes
12.2-12.5 - Generalize the code for complex cases
12.6-12.9 - Wrap up and write final report
Tianxiang
11.22-11.28 - Finish parallelized code in Cuda for n-body collision parts.
11.29-12.1 - Integrate performance checking and correctness validation scripts.
12.2-12.5 - Integrate OpenGL rendering functionality for the demo videos.
12.6-12.9 - Wrap up and write final report
We plan to use a demo of videos displaying the animation of collided objects.
We hope to pay more attention to the parallelism algorithm itself instead of on the demonstration tools. Thus, even though an interactively-runnable demo may be more impressive and descriptive, we decide to use static videos to convey the ideas.
For now, we are still trying to resolve the dependency of methods. There are two major pain-points: