Last weekend I participated in IOM-2009, a workshop on The Influence of I/O on Microprocessor Architecture, co-located with the 2009 High Performance Computer Architecture (HPCA) conference in Raleigh, NC. Ram Huggahalli, Principal Engineer and Platform Architect in the Communication Technology Lab at Intel, served as the workshop chair. I must say that this workshop was very well organized: there were 8 presentations of 45 minutes each, which left ample time for valuable discussion and feedback.
Challenges
The workshop addressed the challenge of providing more I/O bandwidth to systems, particularly with 40 GigE and 100 GigE about to be standardized. Here are the challenges as listed by the workshop chair:
1) Making I/O an integral part of chip and system design rather than treating it as a peripheral. Networking and other I/O need to be designed together with the microprocessor instead of being thought of as peripheral devices.
2) Bringing about a revolutionary change in the way network I/O is performed, because (i) memory access won't become faster, and (ii) demand for I/O (networking) will only increase, so the current approach will probably not scale.
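A quick calculation shows why the second challenge bites. At 1 Gb/s one bit takes exactly 1 ns, so the wire time of a frame in nanoseconds is its size in bits divided by the rate in Gb/s. The sketch below assumes the standard Ethernet on-wire overhead of an 8-byte preamble/start-of-frame delimiter plus a 12-byte inter-frame gap; the function names are made up for illustration.

```python
# Wire time per Ethernet frame at a given line rate.
# Assumed on-wire overhead per frame: 8 bytes of preamble + start-of-frame
# delimiter and a 12-byte inter-frame gap.
PREAMBLE_SFD_BYTES = 8
INTER_FRAME_GAP_BYTES = 12


def frame_time_ns(frame_bytes: int, rate_gbps: float, include_overhead: bool = True) -> float:
    """Nanoseconds one frame occupies the wire.

    At 1 Gb/s a bit takes exactly 1 ns, so ns = bits / rate_gbps.
    """
    wire_bytes = frame_bytes
    if include_overhead:
        wire_bytes += PREAMBLE_SFD_BYTES + INTER_FRAME_GAP_BYTES
    return wire_bytes * 8 / rate_gbps


def frames_per_second(frame_bytes: int, rate_gbps: float) -> float:
    """Maximum frame rate at line rate, including on-wire overhead."""
    return 1e9 / frame_time_ns(frame_bytes, rate_gbps)


if __name__ == "__main__":
    for rate in (10, 40, 100):
        for size in (64, 1518):
            print(f"{rate:>3} GigE, {size:>4}-byte frames: "
                  f"{frame_time_ns(size, rate):8.2f} ns/frame, "
                  f"{frames_per_second(size, rate) / 1e6:7.2f} M frames/s")
```

For a maximum-size 1518-byte frame this gives about 123 ns per frame at 100 GigE (roughly 8.1 million frames per second), and for minimum-size 64-byte frames about 6.7 ns (about 148.8 million frames per second). Even the larger budget is on the order of a single main-memory access on many systems, which is why a design that goes to DRAM for every packet is unlikely to keep up.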
Summary of viewpoints expressed at the workshop
1) A presentation from Sandia National Labs showed experimental data on pre-release Nehalem systems. Main points: (i) Nehalem shows much improved network I/O performance because of its reduced local memory latency;
(ii) Nehalem's NUMA design plays a major role: the presenters listed commercial benchmarks whose performance varies a lot depending on whether memory is located locally or remotely. Memory placement is controlled in Linux with numactl.
2) The main bottleneck for network I/O is the copy from user space to kernel space on transmit, and from kernel space to user space on receive. Strategies presented for avoiding it:
(i) Cache injection (2 papers: one from IBM Labs, Zurich, and one from the University of Victoria): the strategy is to inject received data directly into the cache so that when receive() is issued there is no cache miss and the data is readily available. The challenge is that the data displaced from the cache will cause cache misses later, so it is hard to come up with an algorithm that suits all workloads. The presenters from the University of Victoria described strategies in the context of MPI running on an IBM Cell processor.
(ii) IOMMU to drive hardware accelerators (one paper from the University of Pittsburgh and Intel): the IOMMU hardware translates virtual addresses into physical memory addresses, so a hardware accelerator device can access memory directly while being handed only virtual addresses. The presenter demonstrated this approach in the context of a USB drive.
(iii) DMA cache (Chinese Academy of Sciences): a separate cache holds I/O data until the application reads it, so the primary cache is not affected.
(iv) Intelligent NICs (Virginia Tech): NICs that interact with the CPU and transfer data when it is required.
3) Other interesting papers:
(i) Using network processors to create virtual NICs (University of Massachusetts, Lowell).
(ii) Active end-system analysis for finding the bottleneck rate for receive-side network I/O: this work is mainly from UC Davis, and I contributed to the theoretical part. It demonstrates the importance of pacing on the transmit side and illustrates how to compute the bottleneck rate at the receiver using a stochastic model. Slides are available here.
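For readers who want to reproduce a local-versus-remote memory comparison like the one in the Sandia presentation, numactl's --cpunodebind and --membind options pin a process's CPUs and its memory allocations to chosen NUMA nodes. The helper below only builds the two command lines; it is one common way to set up such a comparison, not the presenters' methodology. It assumes a machine with at least two NUMA nodes numbered 0 and 1 (numactl --hardware lists them), and ./my_benchmark is a hypothetical placeholder.

```python
import shlex
from typing import Dict, List, Sequence


def numactl_argv(cpu_node: int, mem_node: int, command: Sequence[str]) -> List[str]:
    """Command line that runs `command` on the CPUs of `cpu_node` and
    restricts its memory allocations to `mem_node`."""
    return ["numactl", f"--cpunodebind={cpu_node}", f"--membind={mem_node}", *command]


def local_and_remote(command: Sequence[str], cpu_node: int = 0,
                     remote_node: int = 1) -> Dict[str, List[str]]:
    """Same CPUs in both runs; only the memory node changes."""
    return {
        "local": numactl_argv(cpu_node, cpu_node, command),
        "remote": numactl_argv(cpu_node, remote_node, command),
    }


if __name__ == "__main__":
    bench = ["./my_benchmark"]  # hypothetical benchmark binary
    for label, argv in local_and_remote(bench).items():
        print(f"{label:>6}: {shlex.join(argv)}")
```

To actually run one of them, pass the list to subprocess.run(argv, check=True). Keeping the CPUs fixed on node 0 and moving only the memory binding isolates the effect of memory locality from everything else.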
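The trade-off between cache injection and a separate DMA cache can be made concrete with a toy LRU cache model. This is a hypothetical illustration, not the algorithm from any of the papers: every parameter (an 8-line cache, a 6-line application working set, a 4-line packet per round, a 4-line DMA cache) is made up, and each packet is assumed to land in fresh buffer lines.

```python
from collections import OrderedDict
from typing import Tuple


class LRUCache:
    """Fully associative cache of `capacity` lines with LRU replacement."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> None, least recently used first

    def contains(self, addr: int) -> bool:
        return addr in self.lines

    def insert(self, addr: int) -> None:
        """Install (or refresh) a line, evicting the LRU line if full."""
        self.lines[addr] = None
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)

    def access(self, addr: int) -> bool:
        """CPU read: True on hit; on a miss the line is fetched and installed."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        self.insert(addr)
        return False


def simulate(policy: str, cache_lines: int = 8, working_set: int = 6,
             packet_lines: int = 4, rounds: int = 10,
             dma_lines: int = 4) -> Tuple[int, int]:
    """Return (working-set misses, packet-data misses) for one policy.

    policy: "none"      - DMA writes go to DRAM only
            "inject"    - DMA writes are injected into the CPU cache
            "dma_cache" - DMA writes go to a separate DMA cache
    """
    if policy not in ("none", "inject", "dma_cache"):
        raise ValueError(f"unknown policy {policy!r}")
    cache, dma = LRUCache(cache_lines), LRUCache(dma_lines)
    app_misses = io_misses = 0
    next_io_addr = 1_000_000  # each packet lands in fresh buffer lines
    for _ in range(rounds):
        packet = range(next_io_addr, next_io_addr + packet_lines)
        next_io_addr += packet_lines
        # 1. The NIC DMA-writes the packet.
        for addr in packet:
            if policy == "inject":
                cache.insert(addr)
            elif policy == "dma_cache":
                dma.insert(addr)
        # 2. receive(): the application reads the packet.
        for addr in packet:
            if policy == "dma_cache" and dma.contains(addr):
                continue  # served by the DMA cache; main cache untouched
            if not cache.access(addr):
                io_misses += 1
        # 3. The application works on its own data (addresses 0..working_set-1).
        for addr in range(working_set):
            if not cache.access(addr):
                app_misses += 1
    return app_misses, io_misses


if __name__ == "__main__":
    for policy in ("none", "inject", "dma_cache"):
        app, io = simulate(policy)
        print(f"{policy:>9}: working-set misses={app:3d}, packet misses={io:3d}")
```

With the defaults, no injection gives 60 working-set misses and 40 packet misses over 10 rounds. Injection removes all 40 packet misses, but the injected lines still push the working set out, so the application keeps missing (60). The separate DMA cache leaves the main cache alone, and the working set stays resident after the first round (6 misses in total). If the main cache is enlarged to 10 lines, injection alone also reaches 6, which hints at why the right policy depends so much on the workload.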
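The IOMMU approach rests on address translation: the device issues a virtual (I/O) address, and the IOMMU looks it up in a table set up by software, returning a physical address or raising a fault. The sketch below is a deliberately simplified single-level model assuming 4 KiB pages; real IOMMUs such as Intel VT-d use multi-level page tables and IOTLBs, and none of the names here correspond to a real driver API.

```python
PAGE_SHIFT = 12                 # assumption: 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT
PAGE_MASK = PAGE_SIZE - 1


class IOPageFault(Exception):
    """Raised when a device uses an address with no valid mapping."""


class ToyIOMMU:
    """Single-level model: device-visible virtual page -> physical frame."""

    def __init__(self) -> None:
        # virtual page number -> (physical frame number, writable)
        self.table: dict = {}

    def map(self, vaddr: int, paddr: int, writable: bool = True) -> None:
        """Software (OS/driver) installs a page-aligned mapping."""
        if (vaddr | paddr) & PAGE_MASK:
            raise ValueError("addresses must be page aligned")
        self.table[vaddr >> PAGE_SHIFT] = (paddr >> PAGE_SHIFT, writable)

    def unmap(self, vaddr: int) -> None:
        self.table.pop(vaddr >> PAGE_SHIFT, None)

    def translate(self, vaddr: int, write: bool = False) -> int:
        """Per device access: look up the page, check permissions,
        and splice the page offset onto the physical frame."""
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK
        entry = self.table.get(vpn)
        if entry is None:
            raise IOPageFault(f"no mapping for {vaddr:#x}")
        pfn, writable = entry
        if write and not writable:
            raise IOPageFault(f"write to read-only page at {vaddr:#x}")
        return (pfn << PAGE_SHIFT) | offset


if __name__ == "__main__":
    iommu = ToyIOMMU()
    # The driver maps a buffer at virtual 0x40000000 to physical 0x1234000.
    iommu.map(0x40000000, 0x1234000)
    print(hex(iommu.translate(0x40000ABC, write=True)))  # 0x1234abc
```

Because the device only ever sees virtual addresses, the OS can hand an accelerator a user buffer directly, and an access outside the mapped pages faults instead of corrupting memory.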
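Why pacing on the transmit side matters can be illustrated with a much simpler, deterministic toy model than the stochastic one in the paper (this is not the paper's model). A receiver drains packets at a fixed bottleneck rate of one packet per service_time and has room for buffer_size packets. Both senders below average one packet per time unit, exactly the bottleneck rate; one sends bursts of 10, the other spaces packets evenly. All parameters and function names are made up.

```python
from collections import deque
from typing import Iterable, List


def count_drops(arrivals: Iterable[int], service_time: int, buffer_size: int) -> int:
    """Drops at a receiver that processes one packet every `service_time`
    time units (its bottleneck rate) and holds at most `buffer_size`
    packets, including the one being processed."""
    in_system = deque()  # departure times, FIFO order
    drops = 0
    for t in sorted(arrivals):
        while in_system and in_system[0] <= t:  # finished packets free slots
            in_system.popleft()
        if len(in_system) >= buffer_size:
            drops += 1
            continue
        start = max(t, in_system[-1]) if in_system else t
        in_system.append(start + service_time)
    return drops


def bursty_arrivals(n_packets: int, burst_size: int, period: int) -> List[int]:
    """burst_size packets back-to-back at the start of every period."""
    return [(i // burst_size) * period for i in range(n_packets)]


def paced_arrivals(n_packets: int, gap: int) -> List[int]:
    """One packet every `gap` time units."""
    return [i * gap for i in range(n_packets)]


if __name__ == "__main__":
    # Same average rate (1 packet per time unit) in both cases.
    bursty = bursty_arrivals(20, burst_size=10, period=10)
    paced = paced_arrivals(20, gap=1)
    print("bursty drops:", count_drops(bursty, service_time=1, buffer_size=8))
    print("paced drops: ", count_drops(paced, service_time=1, buffer_size=8))
```

With these numbers the bursty sender loses 4 of its 20 packets (2 per burst, since only 8 fit in the buffer), while the paced sender loses none, even though both send at the same average rate.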