NAMDPerformance
NAMD Performance Study
NAMD is written in Charm++ and thus has some unique attributes when profiled by TAU. For example, the Charm++ scheduler, which assigns tasks to processors and helps load-balance the program, has a notion of idling while waiting for tasks to complete. TAU therefore creates an event (Idle) to capture the time the scheduler spends in its idle state, as well as an event (Main) to account for communication latencies. These charts show how NAMD performs on different hardware:
[Image:intrepid-ranger-breakdown.png]
On Intrepid (Blue Gene/P), Idle time (red) increases as NAMD scales, whereas on Ranger (Sun x86 cluster), Main time (blue) increases. This shows how Ranger's relatively slower communication layer results in larger latencies as NAMD scales, compared with how NAMD scales on Intrepid.
NAMD's ability to scale to a large number of processors is highly dependent on how it is configured. Many options are provided to tune NAMD for different simulation parameters and machines. So instead of focusing on NAMD's scaling behavior, we show how TAU can identify other performance aspects of NAMD. This chart shows the increasing variation across processors for various NAMD events. Notice how, after each load balancing phase, the divergence among processors is temporarily arrested.
[Image:namd-deviation-snapshot.png|800px]
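One way to summarize the per-processor variation plotted above is a standard deviation across processors, computed separately for each event and each snapshot. The following is a minimal Python sketch under stated assumptions: the nested-dictionary layout, the function name `deviation_per_snapshot`, and all timing values are hypothetical. They are neither TAU's profile format nor measured NAMD data, and the chart itself may use a different deviation measure.

```python
from statistics import pstdev


def deviation_per_snapshot(snapshots):
    """Return, for each snapshot, {event: std dev of time across processors}.

    `snapshots` is a list of dicts mapping an event name to a list of
    per-processor times (one entry per processor). This layout is an
    assumption made for illustration, not TAU's profile format.
    """
    return [
        {event: pstdev(times) for event, times in snapshot.items()}
        for snapshot in snapshots
    ]


# Made-up timings (seconds) for four processors at two snapshots.
# They are NOT measurements from Intrepid or Ranger.
snapshots = [
    {"Idle": [1.0, 1.0, 1.0, 1.0], "Main": [2.0, 2.0, 2.0, 2.0]},
    {"Idle": [1.0, 3.0, 1.0, 3.0], "Main": [2.0, 2.0, 4.0, 4.0]},
]

deviations = deviation_per_snapshot(snapshots)
for index, per_event in enumerate(deviations):
    print(index, per_event)
```

Plotting such a per-event deviation against snapshot index would show growing divergence as a rising curve, and a load balancing phase as a point where the rise stalls.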