
MapReduce Overview

1. MapReduce Definition

MapReduce is a programming framework for distributed computation and the core framework for developing "Hadoop-based data analysis applications".

The core job of MapReduce is to combine the business-logic code written by the user with its built-in default components into one complete distributed program that runs in parallel on a Hadoop cluster.

2. Pros and Cons of MapReduce

Advantages

1) MapReduce is easy to program

By simply implementing a few interfaces, you can produce a distributed program that runs on a large number of cheap PC machines. In other words, writing a distributed program feels the same as writing a simple serial program. This property is what made MapReduce programming so popular.

Implement its interfaces and components, and you can develop a distributed program.

2) Good scalability

When your computing resources are no longer sufficient, you can extend the computing capacity simply by adding machines.
Machines can be added for extra capacity at any time, as demand requires.

3) High fault tolerance

MapReduce was designed from the start to run on cheap PC hardware, which requires high fault tolerance. For example, if one machine dies, its computation tasks are moved to another node so the job does not fail, and the whole process needs no manual intervention: Hadoop handles it internally.
This is complemented by the multiple replicas of the data kept across the cluster.

4) Well suited to offline processing of PB-scale (and larger) data

Clusters of thousands of servers can work concurrently to provide the data-processing capacity.
Offline processing of massive data sets is Hadoop's core capability.

Disadvantages

1) Not good at real-time computation

MapReduce cannot return results within milliseconds or seconds the way MySQL can.
It is slow because it constantly spills to disk and performs I/O. That does not mean it can never do real-time computation, only that it is not good at it.

2) Not good at stream computation

The input of stream computation arrives dynamically, whereas a MapReduce input data set is static and cannot change while the job runs: MapReduce's own design requires the data source to be static.

3) Not good at DAG (directed acyclic graph) computation

In a DAG workload, multiple applications depend on one another, each application's input being the previous one's output. MapReduce can handle this, but every MapReduce job writes its results to disk, causing heavy disk I/O and very poor performance.

3. The Core Idea of MapReduce

1) A distributed MapReduce program usually executes in two phases.
2) The concurrent MapTask instances of the first phase run fully in parallel and do not interfere with each other.
3) The concurrent ReduceTask instances of the second phase also do not interfere with each other, but their input depends on the output of all MapTask instances of the previous phase.
4) The MapReduce programming model contains only one Map phase and one Reduce phase. If the user's business logic is more complex than that, the only option is to run several MapReduce programs serially, as in the sketch below.
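
The post gives no code for this serial chaining, so here is a minimal sketch (not from the original post; the class name, three-directory path layout, and the decision to rely on Hadoop's default identity map/reduce for brevity are all illustrative assumptions). Job 2 simply takes Job 1's output directory as its input, and Job 1 must finish before Job 2 starts:

package com.fengxq.mr.wc;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainedJobsDriver {
    public static void main(String[] args) throws Exception {
        Path in  = new Path(args[0]);
        Path mid = new Path(args[1]); // intermediate directory written by job 1
        Path out = new Path(args[2]);

        // Job 1: set mapper/reducer and output types here as in the WordCount
        // driver below; omitted, so Hadoop's identity map/reduce is used.
        Job job1 = Job.getInstance();
        job1.setJarByClass(ChainedJobsDriver.class);
        FileInputFormat.setInputPaths(job1, in);
        FileOutputFormat.setOutputPath(job1, mid);
        // run job 1 to completion before job 2 may start
        if (!job1.waitForCompletion(true)) System.exit(1);

        // Job 2: its input is exactly job 1's output directory
        Job job2 = Job.getInstance();
        job2.setJarByClass(ChainedJobsDriver.class);
        FileInputFormat.setInputPaths(job2, mid);
        FileOutputFormat.setOutputPath(job2, out);
        System.exit(job2.waitForCompletion(true) ? 0 : 1);
    }
}

Note that the full intermediate result is written to disk between the two jobs, which is exactly the disk-I/O cost described under the DAG limitation above.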

4. Which Processes Does a MapReduce Program Involve?

A complete MapReduce program running in distributed mode has three kinds of instance processes:

1) MrAppMaster: responsible for scheduling the whole program and coordinating its state
2) MapTask: responsible for the entire data-processing flow of the Map phase
3) ReduceTask: responsible for the entire data-processing flow of the Reduce phase

5. The First Example

The WordCount example consists of three main classes: a custom Mapper class, a custom Reducer class, and a Driver class; the full implementation is shown in the walkthrough below.

6. Java Serialization Types vs. Hadoop Serialization Types

Java type	Hadoop Writable type
Boolean	BooleanWritable
Byte	ByteWritable
Int	IntWritable
Float	FloatWritable
Long	LongWritable
Double	DoubleWritable
String	Text
Map	MapWritable
Array	ArrayWritable
Null	NullWritable
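
As a quick illustration of the table above, this small sketch (not from the original post) wraps plain Java values in Writable types and unwraps them again; Writables are mutable and designed to be reused:

package com.fengxq.mr.wc;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;

public class WritableTypesDemo {
    public static void main(String[] args) {
        Text word = new Text("hadoop");             // String -> Text
        IntWritable count = new IntWritable(3);     // int    -> IntWritable
        NullWritable nothing = NullWritable.get();  // Null   -> NullWritable (a singleton)

        // unwrap back into plain Java values
        String s = word.toString();
        int n = count.get();

        // mutate in place instead of allocating a new object
        count.set(n + 1);
        System.out.println(s + "\t" + count.get() + "\t" + nothing);
    }
}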

7. MapReduce Programming Conventions

A user program is divided into three parts: Mapper, Reducer, and Driver.

1. Mapper stage

1) A custom Mapper class extends Hadoop's Mapper parent class
2) The Mapper's input is a KV pair; for reading text data this is usually LongWritable, Text
3) The Mapper's business logic lives in the map method, which the MapTask calls once for every record it reads
4) When the map method finishes, it writes the processed KV pair out through the context.

2. Reducer stage

1) A custom Reducer class extends Hadoop's Reducer parent class
2) The Reducer's input KV types match the Mapper's output KV types
3) The Reducer's business logic lives in the reduce method, which the ReduceTask calls once per group of values sharing the same key
4) When the reduce method finishes, it writes the processed KV pair out through the context.

3. Driver stage

The Driver acts as the client of the Yarn cluster: it submits our whole program to Yarn in the form of a Job object that encapsulates the MapReduce program's run parameters.

8. WordCount Walkthrough

1) Requirement

Count the total number of occurrences of each word in a given text file

Input data

fengxq fengxq
ss ss
cls cls
jiao
banzhang
xue
hadoop
sgg sgg sgg
nihao nihao
bigdata0111
laiba

Expected output

banzhang 1
bigdata0111 1
cls 2
fengxq 2
hadoop 1
jiao 1
laiba 1
nihao 2
sgg 3
ss 2
xue 1

2) Code

Custom WordCountMapper class

package com.fengxq.mr.wc;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * Custom mapper: extends Hadoop's Mapper class
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private Text outk = new Text();                // reused output key (the word)
    private IntWritable outv = new IntWritable(1); // each occurrence counts as 1
    /**
     * The map method: called by the MapTask once per input line
     * @param key     byte offset of the line within the file
     * @param value   one line of text
     * @param context used to write the resulting KV pairs out
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // read the line of input
        String line = value.toString();
        /**
         * split on spaces
         * line format: fengxq fengxq
         */
        String[] datas = line.split(" ");
        // emit <word, 1> for every word on the line
        for (String data : datas) {
            outk.set(data);
            context.write(outk, outv);
        }
    }
}

Custom WordCountReducer class

package com.fengxq.mr.wc;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable outv = new IntWritable(); // reused output value (the total count)
    /**
     * Called once per key with its already-grouped values: sums them and emits <word, total>
     * @param key     the word
     * @param values  all counts emitted for this word by the mappers
     * @param context used to write the resulting KV pair out
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0; // running total for this key
        for (IntWritable value : values) {
            sum += value.get();
        }
        outv.set(sum);
        context.write(key, outv);
    }
}

Custom WordCountDriver class

package com.fengxq.mr.wc;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCountDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // create the job; Hadoop locates the jar via the driver class
        Job job = Job.getInstance();
        job.setJarByClass(WordCountDriver.class);

        // wire up the mapper/reducer and declare their output key/value types
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // on a cluster, take the paths from the command line instead:
//        FileInputFormat.setInputPaths(job, new Path(args[0]));
//        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // local Windows paths for a local (LocalJobRunner) run
        FileInputFormat.setInputPaths(job, new Path("D:\\hadoop_in\\wcinput"));
        FileOutputFormat.setOutputPath(job, new Path("D:\\hadoop_out\\wcinput_out"));
        // submit the job and block until it finishes
        job.waitForCompletion(true);

    }
}
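
The driver above hard-codes local Windows paths, so the job runs inside Hadoop's LocalJobRunner, as the log below shows. To run the same program on a real cluster, switch back to the commented-out args[] lines, package the classes into a jar, and submit it with the hadoop jar command; the jar name and HDFS paths here are illustrative:

hadoop jar wordcount.jar com.fengxq.mr.wc.WordCountDriver /wcinput /wcoutput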

Execution log

C:\instal\java\jdk8\bin\java.exe "-javaagent:C:\instal\JetBrains\IntelliJ IDEA 2019.3.1\lib\idea_rt.jar=5998:C:\instal\JetBrains\IntelliJ IDEA 2019.3.1\bin" -Dfile.encoding=UTF-8 -classpath C:\instal\java\jdk8\jre\lib\charsets.jar;C:\instal\java\jdk8\jre\lib\deploy.jar;C:\instal\java\jdk8\jre\lib\ext\access-bridge-64.jar;C:\instal\java\jdk8\jre\lib\ext\cldrdata.jar;C:\instal\java\jdk8\jre\lib\ext\dnsns.jar;C:\instal\java\jdk8\jre\lib\ext\jaccess.jar;C:\instal\java\jdk8\jre\lib\ext\jfxrt.jar;C:\instal\java\jdk8\jre\lib\ext\localedata.jar;C:\instal\java\jdk8\jre\lib\ext\nashorn.jar;C:\instal\java\jdk8\jre\lib\ext\sunec.jar;C:\instal\java\jdk8\jre\lib\ext\sunjce_provider.jar;C:\instal\java\jdk8\jre\lib\ext\sunmscapi.jar;C:\instal\java\jdk8\jre\lib\ext\sunpkcs11.jar;C:\instal\java\jdk8\jre\lib\ext\zipfs.jar;C:\instal\java\jdk8\jre\lib\javaws.jar;C:\instal\java\jdk8\jre\lib\jce.jar;C:\instal\java\jdk8\jre\lib\jfr.jar;C:\instal\java\jdk8\jre\lib\jfxswt.jar;C:\instal\java\jdk8\jre\lib\jsse.jar;C:\instal\java\jdk8\jre\lib\management-agent.jar;C:\instal\java\jdk8\jre\lib\plugin.jar;C:\instal\java\jdk8\jre\lib\resources.jar;C:\instal\java\jdk8\jre\lib\rt.jar;C:\mydisk\ws_bigdata2\MapReduce\target\classes;C:\mydisk\pub_noinstal\RepMaven\junit\junit\4.12\junit-4.12.jar;C:\mydisk\pub_noinstal\RepMaven\org\hamcrest\hamcrest-core\1.3\hamcrest-core-1.3.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\logging\log4j\log4j-slf4j-impl\2.12.0\log4j-slf4j-impl-2.12.0.jar;C:\mydisk\pub_noinstal\RepMaven\org\slf4j\slf4j-api\1.7.25\slf4j-api-1.7.25.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\logging\log4j\log4j-api\2.12.0\log4j-api-2.12.0.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\logging\log4j\log4j-core\2.12.0\log4j-core-2.12.0.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\hadoop\hadoop-client\3.1.3\hadoop-client-3.1.3.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\hadoop\hadoop-common\3.1.3\hadoop-common-3.1.3.jar;C:\mydisk\pub_noinstal\RepMaven\com\google\guava\guava\27.0-jre\guava-27.0-jre.jar;C:\mydisk\pub_noinstal\RepMaven\com\google\guava\failureaccess\1.0\failureaccess-1.0.jar;C:\mydisk\pub_noinstal\RepMaven\com\google\guava\listenablefuture\9999.0-empty-to-avoid-conflict-with-guava\listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar;C:\mydisk\pub_noinstal\RepMaven\org\checkerframework\checker-qual\2.5.2\checker-qual-2.5.2.jar;C:\mydisk\pub_noinstal\RepMaven\com\google\errorprone\error_prone_annotations\2.2.0\error_prone_annotations-2.2.0.jar;C:\mydisk\pub_noinstal\RepMaven\com\google\j2objc\j2objc-annotations\1.1\j2objc-annotations-1.1.jar;C:\mydisk\pub_noinstal\RepMaven\org\codehaus\mojo\animal-sniffer-annotations\1.17\animal-sniffer-annotations-1.17.jar;C:\mydisk\pub_noinstal\RepMaven\commons-cli\commons-cli\1.2\commons-cli-1.2.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\commons\commons-math3\3.1.1\commons-math3-3.1.1.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\httpcomponents\httpclient\4.5.2\httpclient-4.5.2.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\httpcomponents\httpcore\4.4.4\httpcore-4.4.4.jar;C:\mydisk\pub_noinstal\RepMaven\commons-codec\commons-codec\1.11\commons-codec-1.11.jar;C:\mydisk\pub_noinstal\RepMaven\commons-io\commons-io\2.5\commons-io-2.5.jar;C:\mydisk\pub_noinstal\RepMaven\commons-net\commons-net\3.6\commons-net-3.6.jar;C:\mydisk\pub_noinstal\RepMaven\commons-collections\commons-collections\3.2.2\commons-collections-3.2.2.jar;C:\mydisk\pub_noinstal\RepMaven\org\eclipse\jetty\jetty-servlet\9.3.24.v20180605\jetty-servlet-9.3.24.v20180605.jar;C:\mydisk\pub_noi
nstal\RepMaven\org\eclipse\jetty\jetty-security\9.3.24.v20180605\jetty-security-9.3.24.v20180605.jar;C:\mydisk\pub_noinstal\RepMaven\org\eclipse\jetty\jetty-webapp\9.3.24.v20180605\jetty-webapp-9.3.24.v20180605.jar;C:\mydisk\pub_noinstal\RepMaven\org\eclipse\jetty\jetty-xml\9.3.24.v20180605\jetty-xml-9.3.24.v20180605.jar;C:\mydisk\pub_noinstal\RepMaven\javax\servlet\jsp\jsp-api\2.1\jsp-api-2.1.jar;C:\mydisk\pub_noinstal\RepMaven\com\sun\jersey\jersey-servlet\1.19\jersey-servlet-1.19.jar;C:\mydisk\pub_noinstal\RepMaven\commons-logging\commons-logging\1.1.3\commons-logging-1.1.3.jar;C:\mydisk\pub_noinstal\RepMaven\log4j\log4j\1.2.17\log4j-1.2.17.jar;C:\mydisk\pub_noinstal\RepMaven\commons-lang\commons-lang\2.6\commons-lang-2.6.jar;C:\mydisk\pub_noinstal\RepMaven\commons-beanutils\commons-beanutils\1.9.3\commons-beanutils-1.9.3.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\commons\commons-configuration2\2.1.1\commons-configuration2-2.1.1.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\commons\commons-lang3\3.4\commons-lang3-3.4.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\avro\avro\1.7.7\avro-1.7.7.jar;C:\mydisk\pub_noinstal\RepMaven\org\codehaus\jackson\jackson-core-asl\1.9.13\jackson-core-asl-1.9.13.jar;C:\mydisk\pub_noinstal\RepMaven\org\codehaus\jackson\jackson-mapper-asl\1.9.13\jackson-mapper-asl-1.9.13.jar;C:\mydisk\pub_noinstal\RepMaven\com\thoughtworks\paranamer\paranamer\2.3\paranamer-2.3.jar;C:\mydisk\pub_noinstal\RepMaven\org\xerial\snappy\snappy-java\1.0.5\snappy-java-1.0.5.jar;C:\mydisk\pub_noinstal\RepMaven\com\google\re2j\re2j\1.1\re2j-1.1.jar;C:\mydisk\pub_noinstal\RepMaven\com\google\protobuf\protobuf-java\2.5.0\protobuf-java-2.5.0.jar;C:\mydisk\pub_noinstal\RepMaven\com\google\code\gson\gson\2.2.4\gson-2.2.4.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\hadoop\hadoop-auth\3.1.3\hadoop-auth-3.1.3.jar;C:\mydisk\pub_noinstal\RepMaven\com\nimbusds\nimbus-jose-jwt\4.41.1\nimbus-jose-jwt-4.41.1.jar;C:\mydisk\pub_noinstal\RepMaven\com\github\stephenc\jcip\jcip-annotations\1.0-1\jcip-annotations-1.0-1.jar;C:\mydisk\pub_noinstal\RepMaven\net\minidev\json-smart\2.3\json-smart-2.3.jar;C:\mydisk\pub_noinstal\RepMaven\net\minidev\accessors-smart\1.2\accessors-smart-1.2.jar;C:\mydisk\pub_noinstal\RepMaven\org\ow2\asm\asm\5.0.4\asm-5.0.4.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\curator\curator-framework\2.13.0\curator-framework-2.13.0.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\curator\curator-client\2.13.0\curator-client-2.13.0.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\curator\curator-recipes\2.13.0\curator-recipes-2.13.0.jar;C:\mydisk\pub_noinstal\RepMaven\com\google\code\findbugs\jsr305\3.0.0\jsr305-3.0.0.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\htrace\htrace-core4\4.1.0-incubating\htrace-core4-4.1.0-incubating.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\commons\commons-compress\1.18\commons-compress-1.18.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\kerby\kerb-simplekdc\1.0.1\kerb-simplekdc-1.0.1.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\kerby\kerb-client\1.0.1\kerb-client-1.0.1.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\kerby\kerby-config\1.0.1\kerby-config-1.0.1.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\kerby\kerb-core\1.0.1\kerb-core-1.0.1.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\kerby\kerby-pkix\1.0.1\kerby-pkix-1.0.1.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\kerby\kerby-asn1\1.0.1\kerby-asn1-1.0.1.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\kerby\kerby-util\1.0.1\kerby-util-1.0.1.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\k
erby\kerb-common\1.0.1\kerb-common-1.0.1.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\kerby\kerb-crypto\1.0.1\kerb-crypto-1.0.1.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\kerby\kerb-util\1.0.1\kerb-util-1.0.1.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\kerby\token-provider\1.0.1\token-provider-1.0.1.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\kerby\kerb-admin\1.0.1\kerb-admin-1.0.1.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\kerby\kerb-server\1.0.1\kerb-server-1.0.1.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\kerby\kerb-identity\1.0.1\kerb-identity-1.0.1.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\kerby\kerby-xdr\1.0.1\kerby-xdr-1.0.1.jar;C:\mydisk\pub_noinstal\RepMaven\com\fasterxml\jackson\core\jackson-databind\2.7.8\jackson-databind-2.7.8.jar;C:\mydisk\pub_noinstal\RepMaven\com\fasterxml\jackson\core\jackson-core\2.7.8\jackson-core-2.7.8.jar;C:\mydisk\pub_noinstal\RepMaven\org\codehaus\woodstox\stax2-api\3.1.4\stax2-api-3.1.4.jar;C:\mydisk\pub_noinstal\RepMaven\com\fasterxml\woodstox\woodstox-core\5.0.3\woodstox-core-5.0.3.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\hadoop\hadoop-hdfs-client\3.1.3\hadoop-hdfs-client-3.1.3.jar;C:\mydisk\pub_noinstal\RepMaven\com\squareup\okhttp\okhttp\2.7.5\okhttp-2.7.5.jar;C:\mydisk\pub_noinstal\RepMaven\com\squareup\okio\okio\1.6.0\okio-1.6.0.jar;C:\mydisk\pub_noinstal\RepMaven\com\fasterxml\jackson\core\jackson-annotations\2.7.8\jackson-annotations-2.7.8.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\hadoop\hadoop-yarn-api\3.1.3\hadoop-yarn-api-3.1.3.jar;C:\mydisk\pub_noinstal\RepMaven\javax\xml\bind\jaxb-api\2.2.11\jaxb-api-2.2.11.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\hadoop\hadoop-yarn-client\3.1.3\hadoop-yarn-client-3.1.3.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\hadoop\hadoop-mapreduce-client-core\3.1.3\hadoop-mapreduce-client-core-3.1.3.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\hadoop\hadoop-yarn-common\3.1.3\hadoop-yarn-common-3.1.3.jar;C:\mydisk\pub_noinstal\RepMaven\javax\servlet\javax.servlet-api\3.1.0\javax.servlet-api-3.1.0.jar;C:\mydisk\pub_noinstal\RepMaven\org\eclipse\jetty\jetty-util\9.3.24.v20180605\jetty-util-9.3.24.v20180605.jar;C:\mydisk\pub_noinstal\RepMaven\com\sun\jersey\jersey-core\1.19\jersey-core-1.19.jar;C:\mydisk\pub_noinstal\RepMaven\javax\ws\rs\jsr311-api\1.1.1\jsr311-api-1.1.1.jar;C:\mydisk\pub_noinstal\RepMaven\com\sun\jersey\jersey-client\1.19\jersey-client-1.19.jar;C:\mydisk\pub_noinstal\RepMaven\com\fasterxml\jackson\module\jackson-module-jaxb-annotations\2.7.8\jackson-module-jaxb-annotations-2.7.8.jar;C:\mydisk\pub_noinstal\RepMaven\com\fasterxml\jackson\jaxrs\jackson-jaxrs-json-provider\2.7.8\jackson-jaxrs-json-provider-2.7.8.jar;C:\mydisk\pub_noinstal\RepMaven\com\fasterxml\jackson\jaxrs\jackson-jaxrs-base\2.7.8\jackson-jaxrs-base-2.7.8.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\hadoop\hadoop-mapreduce-client-jobclient\3.1.3\hadoop-mapreduce-client-jobclient-3.1.3.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\hadoop\hadoop-mapreduce-client-common\3.1.3\hadoop-mapreduce-client-common-3.1.3.jar;C:\mydisk\pub_noinstal\RepMaven\org\apache\hadoop\hadoop-annotations\3.1.3\hadoop-annotations-3.1.3.jar com.fengxq.mr.wc.WordCountDriver
log4j:WARN No appenders could be found for logger (org.apache.htrace.core.Tracer).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[WARN] [2021-06-01 20:09:54][org.apache.hadoop.metrics2.impl.MetricsConfig]Cannot locate configuration: tried hadoop-metrics2-jobtracker.properties,hadoop-metrics2.properties
[INFO] [2021-06-01 20:09:54][org.apache.hadoop.metrics2.impl.MetricsSystemImpl]Scheduled Metric snapshot period at 10 second(s).
[INFO] [2021-06-01 20:09:54][org.apache.hadoop.metrics2.impl.MetricsSystemImpl]JobTracker metrics system started
[WARN] [2021-06-01 20:09:55][org.apache.hadoop.mapreduce.JobResourceUploader]Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
[WARN] [2021-06-01 20:09:55][org.apache.hadoop.mapreduce.JobResourceUploader]No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
[INFO] [2021-06-01 20:09:55][org.apache.hadoop.mapreduce.lib.input.FileInputFormat]Total input files to process : 1
[INFO] [2021-06-01 20:09:55][org.apache.hadoop.mapreduce.JobSubmitter]number of splits:1
[INFO] [2021-06-01 20:09:55][org.apache.hadoop.mapreduce.JobSubmitter]Submitting tokens for job: job_local758397422_0001
[INFO] [2021-06-01 20:09:55][org.apache.hadoop.mapreduce.JobSubmitter]Executing with tokens: []
[INFO] [2021-06-01 20:09:55][org.apache.hadoop.mapreduce.Job]The url to track the job: http://localhost:8080/
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapreduce.Job]Running job: job_local758397422_0001
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.LocalJobRunner]OutputCommitter set in config null
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]File Output Committer Algorithm version is 2
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.LocalJobRunner]OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.LocalJobRunner]Waiting for map tasks
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.LocalJobRunner]Starting task: attempt_local758397422_0001_m_000000_0
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]File Output Committer Algorithm version is 2
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.Task] Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@5792442a
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]Processing split: file:/D:/hadoop_in/wcinput/hello.txt:0+106
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask](EQUATOR) 0 kvi 26214396(104857584)
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]mapreduce.task.io.sort.mb: 100
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]soft limit at 83886080
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]bufstart = 0; bufvoid = 104857600
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]kvstart = 26214396; length = 6553600
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.LocalJobRunner]
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]Starting flush of map output
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]Spilling map output
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]bufstart = 0; bufend = 163; bufvoid = 104857600
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]kvstart = 26214396(104857584); kvend = 26214332(104857328); length = 65/6553600
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]Finished spill 0
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.Task]Task:attempt_local758397422_0001_m_000000_0 is done. And is in the process of committing
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.LocalJobRunner]map
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.Task]Task 'attempt_local758397422_0001_m_000000_0' done.
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.Task]Final Counters for attempt_local758397422_0001_m_000000_0: Counters: 17
	File System Counters
		FILE: Number of bytes read=261
		FILE: Number of bytes written=341453
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Map input records=11
		Map output records=17
		Map output bytes=163
		Map output materialized bytes=203
		Input split bytes=101
		Combine input records=0
		Spilled Records=17
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=261619712
	File Input Format Counters 
		Bytes Read=106
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.LocalJobRunner]Finishing task: attempt_local758397422_0001_m_000000_0
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.LocalJobRunner]map task executor complete.
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.LocalJobRunner]Waiting for reduce tasks
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.LocalJobRunner]Starting task: attempt_local758397422_0001_r_000000_0
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]File Output Committer Algorithm version is 2
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.Job]Job job_local758397422_0001 running in uber mode : false
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.Job] map 100% reduce 0%
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.Task] Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@1d0cf332
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.ReduceTask]Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@1346e016
[WARN] [2021-06-01 20:09:57][org.apache.hadoop.metrics2.impl.MetricsSystemImpl]JobTracker metrics system already initialized!
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]MergerManager: memoryLimit=2654155520, maxSingleShuffleLimit=663538880, mergeThreshold=1751742720, ioSortFactor=10, memToMemMergeOutputsThreshold=10
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.task.reduce.EventFetcher]attempt_local758397422_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.task.reduce.LocalFetcher]localfetcher#1 about to shuffle output of map attempt_local758397422_0001_m_000000_0 decomp: 199 len: 203 to MEMORY
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput]Read 199 bytes from map-output for attempt_local758397422_0001_m_000000_0
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]closeInMemoryFile -> map-output of size: 199, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->199
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.task.reduce.EventFetcher]EventFetcher is interrupted.. Returning
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.LocalJobRunner]1 / 1 copied.
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.Merger]Merging 1 sorted segments
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.Merger]Down to the last merge-pass, with 1 segments left of total size: 188 bytes
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]Merged 1 segments, 199 bytes to disk to satisfy reduce memory limit
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]Merging 1 files, 203 bytes from disk
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]Merging 0 segments, 0 bytes from memory into reduce
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.Merger]Merging 1 sorted segments
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.Merger]Down to the last merge-pass, with 1 segments left of total size: 188 bytes
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.LocalJobRunner]1 / 1 copied.
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.conf.Configuration.deprecation]mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.Task]Task:attempt_local758397422_0001_r_000000_0 is done. And is in the process of committing
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.LocalJobRunner]1 / 1 copied.
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.Task]Task attempt_local758397422_0001_r_000000_0 is allowed to commit now
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]Saved output of task 'attempt_local758397422_0001_r_000000_0' to file:/D:/hadoop_out/wcinput_out
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.LocalJobRunner]reduce > reduce
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.Task]Task 'attempt_local758397422_0001_r_000000_0' done.
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.Task]Final Counters for attempt_local758397422_0001_r_000000_0: Counters: 24
	File System Counters
		FILE: Number of bytes read=699
		FILE: Number of bytes written=341757
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Combine input records=0
		Combine output records=0
		Reduce input groups=11
		Reduce shuffle bytes=203
		Reduce input records=17
		Reduce output records=11
		Spilled Records=17
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=261619712
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Output Format Counters 
		Bytes Written=101
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.LocalJobRunner]Finishing task: attempt_local758397422_0001_r_000000_0
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.LocalJobRunner]reduce task executor complete.
[INFO] [2021-06-01 20:09:58][org.apache.hadoop.mapreduce.Job] map 100% reduce 100%
[INFO] [2021-06-01 20:09:58][org.apache.hadoop.mapreduce.Job]Job job_local758397422_0001 completed successfully
[INFO] [2021-06-01 20:09:58][org.apache.hadoop.mapreduce.Job]Counters: 30
	File System Counters
		FILE: Number of bytes read=960
		FILE: Number of bytes written=683210
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Map input records=11
		Map output records=17
		Map output bytes=163
		Map output materialized bytes=203
		Input split bytes=101
		Combine input records=0
		Combine output records=0
		Reduce input groups=11
		Reduce shuffle bytes=203
		Reduce input records=17
		Reduce output records=11
		Spilled Records=34
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=523239424
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=106
	File Output Format Counters 
		Bytes Written=101

Process finished with exit code 0

Execution result

banzhang	1
bigdata0111	1
cls	2
fengxq	2
hadoop	1
jiao	1
laiba	1
nihao	2
sgg	3
ss	2
xue	1

