Be a lifelong learner with a natural curiosity to figure out how the world works, and an architect with passion to shape the world to come by crafting the next big thing. Don't worry dude, just hacking!
这周二晚上收到了SC’12大会的邮件通知,我的论文终于被接收了.在被SC’12录取之前,我这篇文章分别被IPDPS,ICS和SC据过,每一次被拒都得到了非常多有帮助的评审意见,帮助我更好的改进这篇文章.当然,被拒的滋味不好受.我认识的众多好友投顶级会议纷纷一投就中(例如madong的IJCAI, jiayu的NIPS和SIGIR, yang xi的ASPLOS和OOSPLA),我被拒了那么多次咋还没中呢,心里的挫败感多多少少还是有一点.
简介:IBM Research最近在Big Data领域有很多工作,例如我们组在4月份在10台采用POWER7处理器的P730服务器上成功地用14分钟跑完了1TB数据的排序(7月份又在10台Power7R2上用8分44秒跑完了1TB排序),这项工作已经发表为一篇IBM Research Report,欢迎大家围观,并提出宝贵意见,谢谢。
The use of Big Data underpins critical activities in all sectors of our society. Achieving the full transformative potential of Big Data in this increasingly digital world requires both new data analysis algorithms and a new class of systems to handle the dramatic data growth, the demand to integrate structured and unstructured data analytics, and the increasing computing needs of massive-scale analytics. In this paper, we discuss several Big Data research activities at IBM Research: (1) Big Data benchmarking and methodology; (2) workload optimized systems for Big Data; (3) case study of Big Data workloads on IBM Power systems. In (3), we show that preliminary infrastructure tuning results in sorting 1TB data in 14 minutes on 10 Power 730 machines running IBM InfoSphere BigInsights. Further improvement is expected, among other factors, on the new IBM PowerLinuxTM 7R2 systems.
By: Anne E. Gattiker, Fadi H. Gebara, Ahmed Gheith, H. Peter Hofstee, Damir A. Jamsek, Jian Li, Evan Speight, Ju Wei Shi, Guan Cheng Chen, Peter W. Wong
Published in: RC25281 in 2012
LIMITED DISTRIBUTION NOTICE:
This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.
#include <amp.h>
#include <iostream>
using namespace concurrency;
void CampMethod() {
int aCPP[] = {1, 2, 3, 4, 5};
int bCPP[] = {6, 7, 8, 9, 10};
int sumCPP[5] = {0, 0, 0, 0, 0};
// Create C++ AMP objects.
array_view<int, 1> a(5, aCPP);
array_view<int, 1> b(5, bCPP);
array_view<int, 1> sum(5, sumCPP);
parallel_for_each(
// Define the compute domain, which is the set of threads that are created.
sum.extent,
// Define the code to run on each thread on the accelerator.
[=](index<1> idx) restrict(amp)
{
sum[idx] = a[idx] + b[idx];
}
);
// Print the results. The expected output is "7, 9, 11, 13, 15".
for (int i = 0; i < 5; i++) {
std::cout << sum[i] << "\n";
}
}
Intel’s Nehalem family of CPUs span from large multi-socket 32 core/64 thread systems to ultra small form factor laptops. What were some of the key tradeoffs in architecting and developing the Nehalem family of CPUs? What pipeline should it use? Should it optimize for servers? For desktops? For Laptops? There are lots of tradeoffs here. This talk will discuss some of the tradeoffs and results.
在大型机时代,计算机非常昂贵,用户需要分时共享同一台大型机。计算资源的稀缺使得那时候的软件开发者必须高效地利用每一个处理器时钟周期,因此他们大都使用汇编、C等非常底层的语言来进行软件开发,而算法的效率是他们最关心的问题。在之后的几十年中,计算机硬件变得越来越廉价,软件开发者越来越不需要关心软件的性能。以主流的互联网应用为例,现在的开发大量使用成熟的框架来帮助自动生成大量的代码。就拿Django这个流行的Web开发框架来说,它的设计原则是“focuses on automating as much as possible and adhering to the DRY principle: Don’t Repeat Yourself.”开发者最核心的目标已经变成了如何用最少的代码,最快的速度将自己的点子转为成可用的软件产品并推向市场。“市场投放时间”已经取代“处理器时钟周期”成为软件开发的关键指标。在过去的几十年里,正是因为硬件一直在按照摩尔定律稳步地发展,所以开发者不再需要时刻关注软件的性能,而是将其注意力转移到更为重要的开发效率上,这点在近十年来Java、Python、Ruby等高级语言的兴起上就可见一斑。多核的出现,将硬件的细节再一次暴露在程序员的面前。如果想利用好多核,程序员必须手动的处理同步、死锁、数据竞跑等疑难问题,这极大的降低了软件开发的效率。现有的生产工具(多核开发框架、开发工具)远不能满足生产力(软件开发效率)的发展需要,还有很大的发展空间。可以预见,不久的将来更简单易用的多核开发框架将不断涌现,在多核上进行并行编程将变得越来越容易。
[2] Wei Xue, JuWei Shi, Bo Yang. X-RIME: Cloud-Based Large Scale Social Network Analysis. Proceedings of 2010 IEEE International Conference on Services Computing.
[3] Kai Shuang, Yin Yang, Bin Cai, Zhe Xiang. X-RIME: HADOOP-BASED LARGE-SCALE SOCIAL NETWORK ANALYSIS. Proceedings of IC-BNMT2010.