Real-time many-body dynamic collision detection with CUDA

Proposal

TITLE

PROJECT LINK

https://tianxiag.github.io/parallel/

SUMMARY

We would like to provide a parallel scheme for solving the real-time many-body dynamic collision detection problem with CUDA. We want to base our collision detection methods on Kinetic Sweep and Prune algorithm that take both the objects’ shapes and the objects’ motion into account.

BACKGROUND

The Sweep and Prune algorithm is a classic algorithm for solving the collision detection problem. It involves mainly 2 phases: sweep phase and prune phase. It will first check if the endpoints of objects have overlap in the sweep phase, and then check whether the objects collide in the prune phase. On top of that, we want to use motion model to find collision detection dynamically in terms of motion. The steps of the model can be summarized in the graph below:

(Source: Kinetic Sweep and Prune for Collision Detection by Daniel S. Coming and Oliver G. Staadt http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.115.4443&rep=rep1&type=pdf ). Because collision detection is widely used in computer graphics and computer animations, and also because the computation is very intensive for this problem, collision detection might easily become a bottleneck in graphics rendering. On the other hand, computationally intensive might also mean that there are rooms for parallelism once the problems of data communication and data dependency are solved.

We hope that parallelism could greatly increase the speed of collision detection and could enhance the overall experience of graphics rendering.

THE CHALLENGE

The problem is challenging in the following ways.

It has time dependency. The work from the next timestamps rely on the works from previous timestamps, so we have to figure out a way to hide the stall.
We need to solve the work balance problem. Because we are dealing with a multi objects problem, we would not want to compare one object with all other objects. In this way, we have to find the closest objects of a particular object. Because those objects will not be distributed evenly, we need to find a way to balance the work, and also to solve the potential locality problem.

RESOURCES

Computing: GHC Machines.

Code: Start from scratch.

REFERENCE

Kinetic Sweep and Prune for Collision Detection

- A paper that describes one possible implementation of Collision Detection.

The Study of Parallel Collision Detection Algorithms

- A survey paper that summarizes and compares several parallelization methods.

GOALS AND DELIVERABLES

PLAN TO ACHIEVE

An implementation that ensures the correctness of collision detection.

A set of pairs of “input that varies in shapes and object numbers” and “output that validates the quality of the collision detection algorithm”.

A CUDA-based parallelized version that has a speedup ratio relatively proportional to the hardware capability.

HOPE TO ACHIEVE

Other parallelized versions based on OpenMP, MPI, SIMD, and/or Pthreads.

A preprocessing module that can select the most suitable parallelization algorithm according to the features of the input.

PLATFORM CHOICE

Language: C/C++.
CUDA: Given the independence of input, CUDA should be able to reach a satisfactory scalability.

SCHEDULE

11.1-11.7: Read paper, do background research, start to generate valid input-output pairs.
11.8-11.14: Finish to generate valid input-output pairs, develop the serial version of collision detection code, develop performance analysis scripts.
11.15-11.21: Finish the serial version of collision detection code, start to develop a CUDA-based parallelization version, write the milestone report.
11.22-11.28: Finish the development of the CUDA version, generate more input-output pairs and analyze the relationship between scalability and input features.
11.29-12.5: Develop other parallelization versions, keep marking down the insights from the analysis results.
12.6-12.9: Write down final report.

Check Point

One to two paragraphs, summarize the work that you have completed so far. (This should be easy if you have been maintaining this information on your project page.)

Research and the Starter Code

After doing many researches on collision detection algorithm, we choose the V-Collide algorithm designed in http://gamma.cs.unc.edu/V-COLLIDE/ to be the base of our sequential collision detection algorithm, and build the kinetic motion part on it. We configed and ran the start code on customized simple cases such as boxes and circles meeting each other halfway.

Cuda Environment Configuration

The start code is built to be a library instead of executable files. To migrate the starter code to Cuda, we simplified the packaging issue by linking the collision detection method directly to the testing cases.

Describe how you are doing with respect to the goals and deliverables stated in your proposal. Do you still believe you will be able to produce all your deliverables? If not, why? What about the ”nice to haves”? In your milestone writeup we want a new list of goals that you plan to hit for the poster session.

Progress

We can now run the serialized version to ensure the functionality of collision detection. The Cuda algorithm is implemented by inheriting the sequential method. The integration has been tested to be successful.

Schedule

We are on the track as we expected at the beginning.

Because the collision detection code is very long and has a lot of dependencies, it took us a while to figure out the logic and reorganize the code to run in parallel. Currently we have a runnable version of code (in serial), and are still trying our best to ensure the correctness of our cuda collision detection code. We spent longer than expected time on migrating the sequential code to Cuda because of the unexpected complexity of code. However, we hope that we can catch up during Thanksgiving time. We still believe that we can finish our goals on time. For now, we decide to stick to the original plan, with the schedule slightly revised.

Revised Schedule

Hao

11.22-11.24 - Finish parallelized code in Cuda for n-body collision parts.

11.25-11.28 - Propose and implement methods to parallelize over the OBB tree of the collision detection algorithm.

11.28-12.1 - Finish parallelized code for simple cases such as 2 boxes

12.2-12.5 - Generalize the code for complex cases

12.6-12.9 - Wrap up and write final report

Tianxiang

11.22-11.28 - Finish parallelized code in Cuda for n-body collision parts.

11.29-12.1 - Integrate performance checking and correctness validation scripts.

12.2-12.5 - Integrate OpenGL rendering functionality for the demo videos.

12.6-12.9 - Wrap up and write final report

What do you plan to show at the poster session? Will it be a demo? Will it be a graph?

We plan to use a demo of videos displaying the animation of collided objects.

We hope to pay more attention to the parallelism algorithm itself instead of on the demonstration tools. Thus, even though an interactively-runnable demo may be more impressive and descriptive, we decide to use static videos to convey the ideas.

List the issues that concern you the most. Are there any remaining unknowns (things you simply don’t know how to solve, or resource you don’t know how to get) or is it just a 4 matter of coding and doing the work? If you do not wish to put this information on a public web site you are welcome to email the staff directly.

For now, we are still trying to resolve the dependency of methods. There are two major pain-points:

Data Structure: The starter code uses linked-lists and other customized data structures to represent intermediate results and the final outputs. This is mainly designed for the flexibility of appending newly detected collided objects. This, however, generates a lot of trouble for the data transition between the host and device memory space.
Hierarchical Implementation: The starter code encapsulates the core collision detection function inside an unwriteable function. Although the major part that we want to parallelize is the core detection function, directly invoking it will cause device functions to call host functions, which is prohibited in Cuda.