It is not only the latency of memory accesses that can limit the execution speed of an application. There is also a limit on the rate at which data can be transferred from main memory: the memory bandwidth limit.
When the memory bandwidth limit is reached, it is no longer the latency of the memory accesses that limits the execution speed, but the number of main memory accesses the application causes. The only way to improve performance is then to reduce the application's fetch ratio, so that fewer accesses go to main memory.
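A common way to reduce the fetch ratio is to restructure the code so that each piece of data is reused while it is still in the cache. As a minimal sketch (the function names are made up for illustration), fusing two loops that traverse the same large array roughly halves the main memory traffic when the array does not fit in the cache:

```c
#include <stddef.h>

/* Two separate passes over a large array: when the array does not fit
   in cache, every element is fetched from main memory twice. */
void two_passes(double *a, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] * 2.0;       /* first pass streams a[] through the cache */
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] + 1.0;       /* second pass fetches every line again */
}

/* Fusing the loops performs both operations while each element is still
   in a register, roughly halving the number of main memory accesses. */
void fused(double *a, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] * 2.0 + 1.0;
}
```

Both versions compute the same result; only the number of times the array is streamed from main memory differs.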
When a program hits the memory bandwidth limit, some optimizations intended to reduce the impact of memory access latencies become ineffective, or even counterproductive. For example, prefetching does not reduce the number of main memory accesses, so it loses its effect. In fact, prefetching often increases the number of main memory accesses, since some of the prefetched data is never used, which decreases the performance of the application.
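As a hedged sketch of why this happens, consider software prefetching with GCC/Clang's `__builtin_prefetch` (the prefetch distance of 16 elements is an arbitrary assumption here): the prefetches can hide latency, but every cache line of the array still crosses the memory bus once, so the bandwidth consumed is not reduced, and a loop that prefetches past the end of the array fetches lines that are never used:

```c
#include <stddef.h>

/* Sums a large array while prefetching ahead. The prefetches can hide
   memory latency, but each cache line is still transferred from main
   memory exactly once, so the bandwidth demand is unchanged. */
double sum_with_prefetch(const double *a, size_t n) {
    const size_t dist = 16;              /* assumed prefetch distance */
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + dist < n)                /* without this guard, the tail
                                            iterations would prefetch
                                            unused lines past the array */
            __builtin_prefetch(&a[i + dist], 0, 0);
        s += a[i];
    }
    return s;
}
```

Real compiler-generated or hand-unrolled prefetch loops often omit the bounds check for speed, which is exactly how the extra, unused fetches arise.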
For single-core processors it is relatively unusual for applications to hit the memory bandwidth limit; very high fetch ratios are required for this to happen. However, with the current trend towards multithreaded and multicore processors, the computational power of processors is increasing very rapidly, while memory bandwidth is growing much more slowly. This makes the memory bandwidth limitation one of the biggest obstacles to scaling application performance on multicore processors.
Applications that stay below the memory bandwidth limit on single-core processors, and whose latency problems were hidden by hardware or software prefetching, may suddenly hit the memory bandwidth limit when parallelized on multicore processors. It is not unusual for such applications to get no speed-up at all. Some applications even see reduced performance, since the threads start evicting each other's data from the cache, further increasing the fetch ratio.
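This last effect can be illustrated with a sketch along these lines (assuming POSIX threads and an array size chosen to exceed the shared cache; the names and sizes are made up): when both threads stream their arrays at once, their combined working set overflows the shared cache, each thread's misses evict the other's lines, and each thread's fetch ratio rises compared to running alone:

```c
#include <pthread.h>
#include <stddef.h>

#define N (1 << 20)   /* assumed: each array alone exceeds the shared cache */

static double buf[2][N];

/* Each thread streams its own array. Run alone, a thread may keep part
   of its working set cached; run together, the two streams evict each
   other's lines from the shared cache, so both threads miss more. */
static void *stream(void *arg) {
    double *a = arg;
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        s += a[i];
    a[0] = s;         /* store the sum so the loop is not optimized away */
    return NULL;
}

void run_both(void) {
    pthread_t t0, t1;
    pthread_create(&t0, NULL, stream, buf[0]);
    pthread_create(&t1, NULL, stream, buf[1]);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
}
```

The code is functionally correct either way; the point is that the aggregate memory traffic of the two threads, not the work of either one, is what saturates the bus.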