首页 > 技术相关 > Mapreduce简介

Mapreduce简介

2011年4月21日 发表评论 阅读评论

今天在Opps里和grapeot,Ouyang等人扯到了mapreduce,感觉这东西还是挺有意思的,并且和分布式系统等体系结构领域有关,我于是花了点时间去看了下(主要是看wiki以及google员工在MIT讲课的课件),在这里对mapreduce做个简要介绍。

Mapreduce是google提出的一种编程模型,该模型最早在OSDI(一个操作系统领域顶级会议) 2004上提出。其主要是用于大规模数据并行处理(如排序,索引),不过现在貌似机器学习(machine learning,ML)上用的也挺多的。其主要有两个步骤,一个是Map(可译为分发,映射),一个是reduce(可译为化简,回收,合并等),这两个概念都是从函数式语言借来的。Map和reduce的两个步骤主要工作如下:

"Map" step: The master node takes the input, partitions it up into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes that smaller problem, and passes the answer back to its master node.

Map阶段主要是主节点把输入划分成一些子问题,并且分发给工作节点。而工作节点有可能继续重复这一工作,直到划分够细,子问题能在规定的时间内有一个工作节点完成为止,形成一个树状结构。下一层的工作节点会将完成的计算结果传输给上一层的(主)节点。

"Reduce" step: The master node then takes the answers to all the sub-problems and combines them in some way to get the output — the answer to the problem it was originally trying to solve.

Reduce阶段把所有的问题,所有的答案按一定的方式组合化简归并,最终形成最初问题的答案并输出。

在这两个阶段中,Map阶段分配后的并行度是很高的,因此很适合一些数据处理的应用,比如说对大量网页文件进行建立索引,因为这时候可以直接把数以亿计的网页直接按数G分给各个工作节点处理(貌似google一般是以64M为单位的)。而reduce阶段的并行度相对不高,但是由于可以层层的reduce,并行度其实也很可观。

为了提高可靠性,每个节点会周期性的把完成的工作和状态的更新报告回来。如果一个节点保持沉默超过一个预设的时间间隔,主节点(类同Google File System中的主服务器)记录下这个节点状态为死亡,并把分配给这个节点的数据发到别的节点。

另外,可以看出,其实这个并行模型和分治法非常像,基本思想几乎一致,感觉是一个基本思想(分治)在大规模数据处理中的一个应用和具体化。

也许也正因为这个原因,mapreduce从技术上没有任何先进之处,感觉其火起来很大一部分是由于云计算泡沫的原因,因此一些传统的数据库领域的人甚至认为其是一种倒退,因为:

  1. 在大规模的数据密集应用的编程领域,它是一个巨大的倒退
  2. 它是一个非最优的实现,使用了蛮力而非索引
  3. 它一点也不新颖——代表了一种25年前已经开发得非常完善的技术
  4. 它缺乏当前DBMS基本都拥有的大多数特性
  5. 它和DBMS用户已经依赖的所有工具都不兼容  

顺便说件大傻帽grapeot碰到的傻帽事:

    求一个图的连通片个数,7M个节点70M条边,在Amazon Elastic EC2上开了16个small instance跑了4个半小时就跑完了。然后我写了个串行的版本,在自己的笔记本上跑。。。90秒。。

    这件事说明,任何事物都具有适用性,反过来说,任何事物都有存在的理由。想用万能的通用结构或者模型高效解决任何问题是不可能的,因此,多样性结构是高效的必要条件,因为应用是多样的。

不过,相对与传统的牛叉的各种数据库技术,mapreduce能活的这么好,主要原因是:传统的数据库太难用了,传统的并行编程模型也太难用了,而mapreduce简单,因此简单的,易用的往往是最好的,这也是windows能够打败linux的原因之一。而历史上,野蛮的蒙古人就用简单的暴力几乎整个地球上所谓的文明人!

最后,附一张来自google在MIT的课件上的图,来形象的描述mapreduce:

mapreduce示意图

参考:

维基:http://en.wikipedia.org/wiki/MapReduce

google lab:http://labs.google.com/papers/mapreduce.html

MIT MapReduce course:http://sites.google.com/site/mriap2008/home

另附:MapReduce: 一个巨大的倒退的链接(点前面标题)



本文作者: Sigma    在新浪微博关注SigmaSigmaWeibo    RSS订阅本博客
本文链接: http://www.sigma.me/2011/04/21/intro-to-mapreduce.html
本博客采用知识共享署名—非商业性-禁止演绎使用3.0协议进行许可,转载请保留作者和原文链接。

  1. 2011年4月22日00:23 | #1

    为什么shuffle在reducer上完成?感觉这才是map-reduce的核心或者说性能瓶颈,因为这一步很难并行化。可是很多教程讲得都很简略,甚至就没提到。

    为什么map-reduce要和DBMS联系起来?它是一种计算模型而不是数据存储模型啊,存储模型应该去看HDFS之类。

    关于图的连通片问题,其实map-reduce版算法的效率很高(http://www.cs.columbia.edu/~coms699812/homework2.pdf第2题题目中有介绍),应该说这个问题是适合map-reduce的。我没有在我的机器上架hadoop测试,但怀疑是amazon的map-reduce实现比较挫造成的性能损耗。

    还有你才tm大傻帽。

  2. 2011年4月22日03:03 | #2

    @grapeot

    其实维基上讲到了,我懒得翻译,主要shuffle干的事不清不楚的

    后面它要联系起DBMS是因为很多mapreduce处理后的数据一般还需要输入到数据库里吧

    图的连通片的问题,我也觉得是不是问题的问题,而是amazon那种集群结构以及网络负荷导致通信开销太大了,也许map的时间就能算完了

    傻帽就是傻帽,别转移注意力

  3. 2011年4月22日04:07 | #3

    @grapeot

    我没说shuffle没技术含量,像fft运算,其实shuffle是最难高效实现的,但介绍fft算法的人,很少会去扯shuffle这种细节

    我说他不清不楚,是指它做的事比较不具体,和你后面要怎么reduce有关,其实它可以算在reduce上面,他就是为了方便reduce

    对比的话,不是我干的,我只是列出一个反对观点而已

    起码mapreduce和dbms都是数据处理的,拿来对比也未尝不可,但是你不会说你和马桶都是用来。。。。

  4. 2011年4月22日14:55 | #4

    高端

  5. 2011年4月27日03:32 | #5

    过来看看

  6. 2011年4月22日03:48 | #6

    shuffle干的才是有技术含量的事情。。。这个很难并行所以优化一点是一点,怎么不清不楚了。。。

    囧。。如果说输出要输入到数据库里,为啥拿map-reduce和DBMS比。。。要联系也是看接口好不好啊。。。就像你拉的屎要掉到马桶里,拿你跟马桶比很奇怪啊。。。

    你才傻冒!

  7. 2019年1月9日06:33 | #7

    I must convey my respect for your generosity giving support to women who actually need guidance on this particular issue. Your real commitment to getting the message all around had been unbelievably advantageous and have continuously helped workers like me to achieve their pursuits. The warm and friendly help denotes a whole lot a person like me and especially to my office workers. Thanks a ton; from each one of us.

  8. 2019年1月9日08:32 | #8

    I not to mention my buddies were found to be examining the good tips and tricks located on your site and so suddenly I got a horrible feeling I never expressed respect to the web blog owner for them. My boys appeared to be for that reason joyful to read through them and now have sincerely been tapping into those things. Many thanks for being indeed thoughtful and for finding this sort of superb areas most people are really wanting to learn about. My personal honest regret for not expressing appreciation to earlier.

  9. 2019年1月9日15:02 | #9

    I precisely had to say thanks yet again. I am not sure the things I could possibly have worked on without the entire points documented by you over my subject. It was actually a real intimidating circumstance in my view, but coming across the very well-written form you processed the issue took me to leap for gladness. I will be thankful for the work and even expect you recognize what a great job that you are getting into instructing many people all through your webpage. I am certain you’ve never met any of us.

  10. 2019年1月9日19:20 | #10

    Thanks a lot for providing individuals with remarkably remarkable possiblity to read articles and blog posts from here. It is usually so pleasurable and also stuffed with a great time for me personally and my office colleagues to search your blog at a minimum thrice in a week to see the newest guidance you have. And of course, I am also always impressed considering the special pointers you serve. Selected 2 tips in this post are absolutely the most beneficial we’ve had.

  11. 2019年1月10日10:59 | #11

    I simply needed to appreciate you once more. I’m not certain what I could possibly have carried out without the type of concepts shown by you on this area of interest. It was actually a real frightening circumstance for me personally, but understanding this expert form you managed that forced me to cry over happiness. I’m just happy for the support as well as hope you comprehend what an amazing job you have been putting in training some other people all through a blog. Most probably you have never got to know any of us.

  12. 2019年1月10日14:38 | #12

    Thanks so much for giving everyone an extremely nice possiblity to read from this blog. It’s always so lovely and stuffed with fun for me and my office mates to search your site really 3 times weekly to study the latest items you have. Of course, I’m so actually fulfilled with the terrific methods you serve. Certain 1 areas in this posting are particularly the most efficient I have had.

  13. 2019年1月11日06:46 | #13

    I must express appreciation to you for bailing me out of this particular dilemma. After looking out throughout the world-wide-web and obtaining basics which were not helpful, I figured my entire life was well over. Living minus the answers to the difficulties you’ve resolved as a result of your post is a critical case, and ones which might have negatively damaged my career if I hadn’t noticed your blog. The natural talent and kindness in dealing with all the things was useful. I am not sure what I would’ve done if I hadn’t discovered such a step like this. I can at this time look ahead to my future. Thank you very much for your specialized and result oriented help. I will not be reluctant to recommend your web page to any individual who wants and needs counselling about this area.

  14. 2019年1月11日09:35 | #14

    I intended to create you one bit of word to help say thanks once again for these wonderful pointers you have shared above. It is simply tremendously generous of people like you in giving extensively precisely what numerous people could possibly have made available for an e book to earn some profit for their own end, especially given that you could possibly have done it if you ever desired. Those inspiring ideas likewise served as a easy way to be aware that other people online have the identical dreams similar to my own to understand a great deal more with regard to this issue. I am sure there are several more pleasurable opportunities up front for individuals that take a look at your blog post.

  15. 2019年1月12日03:04 | #15

    I needed to write you a little note just to give thanks yet again considering the lovely opinions you have featured at this time. This has been quite unbelievably open-handed with you to provide without restraint what most people would’ve sold for an ebook in making some money for their own end, and in particular now that you could have done it in case you decided. The smart ideas additionally worked to provide a good way to recognize that some people have similar passion similar to mine to realize a little more pertaining to this issue. Certainly there are a lot more pleasurable times ahead for folks who read through your blog.

  16. 2019年1月12日05:25 | #16

    A lot of thanks for your entire labor on this website. Kim take interest in setting aside time for investigation and it is simple to grasp why. My spouse and i learn all relating to the powerful method you make important solutions through this web blog and even inspire response from other people about this point plus our daughter is always becoming educated a lot of things. Enjoy the rest of the year. You are performing a great job.

  17. 2019年1月12日19:34 | #17

    I happen to be writing to make you understand what a magnificent discovery my cousin’s child found studying your blog. She learned a good number of things, with the inclusion of what it is like to have an excellent teaching heart to get certain people without problems know selected complicated subject matter. You actually did more than her desires. Thank you for supplying the precious, trustworthy, educational as well as unique tips on your topic to Ethel.

  18. 2019年1月13日01:35 | #18

    I must point out my appreciation for your kindness supporting individuals who must have assistance with this particular matter. Your special commitment to getting the message all through appeared to be extremely important and has in every case enabled most people much like me to realize their ambitions. This useful guide indicates much a person like me and even further to my fellow workers. With thanks; from all of us.

  19. 2019年1月13日07:19 | #19

    Thank you so much for providing individuals with an exceptionally marvellous opportunity to read from here. It is always so terrific plus full of a lot of fun for me personally and my office peers to visit your site not less than three times in a week to find out the newest items you have. And indeed, we are actually motivated with the mind-boggling points you serve. Certain 1 ideas on this page are in truth the most effective I’ve had.

  20. 2019年1月13日14:34 | #20

    I intended to compose you that little bit of note to finally thank you very much as before for all the striking basics you have provided in this case. It is really shockingly generous with people like you to offer publicly just what a few people could possibly have marketed for an e book to end up making some money on their own, principally seeing that you could have tried it in case you considered necessary. These things as well served like the easy way to be sure that many people have a similar dream much like my own to realize a good deal more in terms of this problem. I think there are some more pleasurable situations in the future for folks who see your website.

  21. 2019年1月13日18:55 | #21

    I and also my pals happened to be taking note of the nice helpful tips found on your web site and then the sudden I got an awful feeling I never thanked the blog owner for those techniques. The young men are already totally thrilled to read all of them and have in effect honestly been making the most of these things. Thank you for actually being very kind and for figuring out such fabulous issues millions of individuals are really desperate to understand about. My personal honest regret for not expressing appreciation to earlier.

  22. 2019年1月14日09:08 | #22

    I must express appreciation to this writer for bailing me out of this particular scenario. After checking through the the web and meeting basics which are not helpful, I assumed my life was gone. Being alive without the presence of approaches to the difficulties you have resolved by way of your good blog post is a critical case, and those that could have adversely affected my career if I had not come across your blog post. Your primary ability and kindness in playing with all things was useful. I don’t know what I would have done if I hadn’t discovered such a stuff like this. It’s possible to now relish my future. Thanks so much for the expert and results-oriented guide. I will not hesitate to endorse your web blog to any individual who would need guidance on this area.

  1. 本文目前尚无任何 trackbacks 和 pingbacks.

无觅相关文章插件,快速提升流量