how to run JavaWordCount in Spark

Jerry Wang
Published 2021-02-20 14:51:06

Created by Jerry Wang, last modified on Aug 17, 2015

The general steps are described in this Stack Overflow question: http://stackoverflow.com/questions/22252534/how-to-run-a-spark-java-program-from-command-line

  1. mkdir example-java-build/; cd example-java-build
  2. mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=spark.examples -DartifactId=JavaWordCount -Dfilter=org.apache.maven.archetypes:maven-archetype-quickstart
     (The value of -DartifactId becomes the name of the generated project folder.)

Below is my pom.xml:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <!-- same as the groupId given on the command line -->
  <groupId>spark.examples</groupId>
  <!-- same as the artifactId given on the command line -->
  <artifactId>JavaWordCount</artifactId>
  <packaging>jar</packaging>
  <version>1</version>
  <name>JavaWordCount</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-examples_2.10</artifactId>
      <version>1.1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.4.1</version>
    </dependency>
  </dependencies>
</project>
```
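3. cd example-java-build/JavaWordCount; mvn package

This builds the jar file, JavaWordCount-1.jar, inside the target directory. With the pom.xml above it is a plain jar that holds only the project's own classes, not a fat jar with bundled dependencies.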
![clipboard2](https://user-images.githubusercontent.com/5669954/28005843-f8c64808-654c-11e7-8a72-bf61d78e15bd.png)

The classes folder contains the individual .class files:
![clipboard3](https://user-images.githubusercontent.com/5669954/28005849-fd29e0e4-654c-11e7-8e25-8563f2219e18.png)

Copy the jar file to any location on the server, then go to the bin folder of your Spark installation.
  
Submit the Spark job: ./spark-submit --class "org.apache.spark.examples.JavaWordCount" --master local /root/devExpert/spark-1.4.1/example-java-build/JavaWordCount/target/JavaWordCount-1.jar
 
Use jd.exe to open the compiled Java class and make sure the value passed to --class equals the fully qualified name of the class; in my example it is org.apache.spark.examples.JavaWordCount. Otherwise you will get a java.lang.ClassNotFoundException.
![clipboard4](https://user-images.githubusercontent.com/5669954/28005858-0459ae6c-654d-11e7-9c61-d61b36d20334.png)
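
As an alternative to opening the class in a decompiler, the fully qualified class names can be read straight from the jar's entry paths. The following is only a minimal sketch: the class name `JarClassNames` and the helper `toClassName` are hypothetical names introduced for illustration, and the jar path is expected as the first command-line argument.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Spark or Maven.
class JarClassNames {

    // Converts a jar entry path such as "a/b/C.class" into the class name "a.b.C".
    static String toClassName(String entryPath) {
        String withoutSuffix =
                entryPath.substring(0, entryPath.length() - ".class".length());
        return withoutSuffix.replace('/', '.');
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarClassNames <path-to-jar>");
            return;
        }
        // Print every class in the jar by its fully qualified name.
        try (JarFile jar = new JarFile(args[0])) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class")) {
                    System.out.println(toClassName(name));
                }
            }
        }
    }
}
```

Any name printed this way is in the form that --class expects; if the name you pass to spark-submit is not in the list, the jar does not contain that class.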

4. Run the job with an input file: ./spark-submit --class "org.apache.spark.examples.JavaWordCount" --master local /root/devExpert/spark-1.4.1/example-java-build/JavaWordCount/target/JavaWordCount-1.jar /root/devExpert/spark-1.4.1/bin/test.txt

To debug, trace the script with sh -x: sh -x ./spark-submit --class "org.apache.spark.examples.JavaWordCount" --master local /root/devExpert/spark-1.4.1/example-java-build/JavaWordCount/target/JavaWordCount-1.jar /root/devExpert/spark-1.4.1/bin/test.txt

The trace shows that the call is equivalent to: /usr/jdk1.7.0_79/bin/java -cp /root/devExpert/spark-1.4.1/conf/:/root/devExpert/spark-1.4.1/assembly/target/scala-2.10/spark-assembly-1.4.1-hadoop2.4.0.jar:/root/devExpert/spark-1.4.1/lib_managed/jars/datanucleus-rdbms-3.2.9.jar:/root/devExpert/spark-1.4.1/lib_managed/jars/datanucleus-core-3.2.10.jar:/root/devExpert/spark-1.4.1/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar -Xms512m -Xmx512m -XX:MaxPermSize=256m org.apache.spark.deploy.SparkSubmit --master local --class org.apache.spark.examples.JavaWordCount /root/devExpert/spark-1.4.1/example-java-build/JavaWordCount/target/JavaWordCount-1.jar /root/devExpert/spark-1.4.1/bin/test.txt

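-cp is the same as -classpath: it specifies where to find the other classes that the class being run depends on, usually libraries and jar files. Each jar must be given by its full path. Entries are separated by a semicolon ";" on Windows and by a colon ":" on Linux. Patterns such as *.jar are not supported, so jars are normally listed one by one (since Java 6, a directory followed by /* is expanded to all jar files in that directory). A single dot "." stands for the current directory.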
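
The separator rule can be illustrated with a small Java sketch that joins classpath entries using the platform's own separator, `File.pathSeparator`. The class name `ClasspathBuilder` and the entries in `main` are made up for illustration, and `String.join` assumes Java 8 or later, which is newer than the JDK 1.7 used in this article.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
class ClasspathBuilder {

    // Joins classpath entries with the given separator (":" on Linux, ";" on Windows).
    static String join(List<String> entries, String separator) {
        return String.join(separator, entries);
    }

    public static void main(String[] args) {
        // Illustrative entries: a config directory, two jars, and the current directory.
        List<String> entries = Arrays.asList(
                "conf/", "lib/example-a.jar", "lib/example-b.jar", ".");
        // File.pathSeparator picks the right separator for the current platform.
        System.out.println(join(entries, File.pathSeparator));
    }
}
```

Using `File.pathSeparator` instead of a hard-coded character keeps the same code valid on both Windows and Linux.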
Output:
![clipboard6](https://user-images.githubusercontent.com/5669954/28005861-095700e0-654d-11e7-86da-e3b08feda93b.png)
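
To make the result easier to interpret, here is a plain Java sketch of the kind of computation a word count performs on the input file. It does not use Spark and is not the source of org.apache.spark.examples.JavaWordCount; the class name `PlainWordCount`, the sample lines, and the choice to split on single spaces and skip empty tokens are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in that runs in a single JVM without Spark.
class PlainWordCount {

    // Counts how often each word occurs, splitting every line on single spaces.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                if (word.isEmpty()) {
                    continue; // skip empty tokens caused by repeated spaces
                }
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for the lines of test.txt.
        List<String> lines = Arrays.asList("hello spark", "hello java");
        for (Map.Entry<String, Integer> entry : count(lines).entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

A Spark job distributes the same split-and-sum idea across partitions of the input; the sketch only shows the logic on one machine.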

Originally published on the author's personal site/blog, 2019-06-30.
