Software versions: JDK 1.7.0_67, Scala 2.10.4, Hadoop 2.5.0, Spark 1.6.1, Maven 3.3.3, Zinc 0.3.5.3
(1) Set up the Maven environment
1) Extract the Maven package
Maven download URL: http://archive.apache.org/dist/maven/maven-3/3.3.3/binaries/
cd /opt/softwares/
softwares]$ tar -zxf apache-maven-3.3.3-bin.tar.gz -C /opt/modules/
2) Edit the Maven settings file
apache-maven-3.3.3]$ cd conf
conf]$ vim settings.xml
<mirrors>
  <mirror>
    <id>aliyun</id>
    <mirrorOf>central</mirrorOf>
    <name>aliyun repository</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  </mirror>
</mirrors>
3) Configure the MAVEN_HOME environment variable
$ vim /etc/profile
# MAVEN_HOME
export MAVEN_HOME=/opt/modules/apache-maven-3.3.3
export PATH=$PATH:$MAVEN_HOME/bin
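After /etc/profile is saved, the new variables only apply to shells that reload it. The following is a minimal sketch for checking the result; `check_maven_home` is a hypothetical helper name (not part of Maven), and it only verifies that an executable `bin/mvn` exists under the directory it is given.

```shell
# Hypothetical helper: succeeds if an executable mvn launcher exists
# under the given MAVEN_HOME directory.
check_maven_home() {
  maven_home="$1"
  if [ -x "$maven_home/bin/mvn" ]; then
    echo "ok: $maven_home/bin/mvn"
  else
    echo "missing: $maven_home/bin/mvn" >&2
    return 1
  fi
}
```

A typical use would be `source /etc/profile && check_maven_home "$MAVEN_HOME" && mvn -version`.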
(2) Set up the Spark build environment
1) Extract the Spark source package
Spark source package download URL: http://archive.apache.org/dist/spark/spark-1.6.1/
softwares]$ tar -zxf spark-1.6.1.tgz -C /opt/modules/
2) Edit /opt/modules/spark-1.6.1/make-distribution.sh and hard-code the version variables
# Figure out where the Spark framework is installed
SPARK_HOME=/opt/modules/spark-1.6.1
DISTDIR="$SPARK_HOME/dist"
VERSION=1.6.1
SCALA_VERSION=2.10.4
SPARK_HADOOP_VERSION=2.5.0
SPARK_HIVE=1
3) Set hadoop.version and scala.version in /opt/modules/spark-1.6.1/pom.xml
<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
  <akka.group>com.typesafe.akka</akka.group>
  <akka.version>2.3.11</akka.version>
  <java.version>1.7</java.version>
  <maven.version>3.3.3</maven.version>
  <sbt.project.name>spark</sbt.project.name>
  <mesos.version>0.21.1</mesos.version>
  <mesos.classifier>shaded-protobuf</mesos.classifier>
  <slf4j.version>1.7.10</slf4j.version>
  <log4j.version>1.2.17</log4j.version>
  <hadoop.version>2.5.0</hadoop.version>
  <protobuf.version>2.5.0</protobuf.version>
  <yarn.version>${hadoop.version}</yarn.version>
  <hbase.version>0.98.7-hadoop2</hbase.version>
  <hbase.artifact>hbase</hbase.artifact>
  <flume.version>1.6.0</flume.version>
  <zookeeper.version>3.4.5</zookeeper.version>
  <curator.version>2.4.0</curator.version>
  <hive.group>org.spark-project.hive</hive.group>
  <!-- Version used in Maven Hive dependency -->
  <hive.version>1.2.1.spark</hive.version>
  <!-- Version used for internal directory structure -->
  <hive.version.short>1.2.1</hive.version.short>
  <derby.version>10.10.1.1</derby.version>
  <parquet.version>1.7.0</parquet.version>
  <hive.parquet.version>1.6.0</hive.parquet.version>
  <jblas.version>1.2.4</jblas.version>
  <jetty.version>8.1.14.v20131031</jetty.version>
  <orbit.version>3.0.0.v201112011016</orbit.version>
  <chill.version>0.5.0</chill.version>
  <ivy.version>2.4.0</ivy.version>
  <oro.version>2.0.8</oro.version>
  <codahale.metrics.version>3.1.2</codahale.metrics.version>
  <avro.version>1.7.7</avro.version>
  <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
  <jets3t.version>0.7.1</jets3t.version>
  <aws.kinesis.client.version>1.4.0</aws.kinesis.client.version>
  <!-- the producer is used in tests -->
  <aws.kinesis.producer.version>0.10.1</aws.kinesis.producer.version>
  <!-- org.apache.httpcomponents/httpclient-->
  <commons.httpclient.version>4.3.2</commons.httpclient.version>
  <!-- commons-httpclient/commons-httpclient-->
  <httpclient.classic.version>3.1</httpclient.classic.version>
  <commons.math3.version>3.4.1</commons.math3.version>
  <!-- managed up from 3.2.1 for SPARK-11652 -->
  <commons.collections.version>3.2.2</commons.collections.version>
  <scala.version>2.10.4</scala.version>
  <scala.binary.version>2.10</scala.binary.version>
  <jline.version>${scala.version}</jline.version>
  <jline.groupid>org.scala-lang</jline.groupid>
  <codehaus.jackson.version>1.9.13</codehaus.jackson.version>
  <fasterxml.jackson.version>2.4.4</fasterxml.jackson.version>
  <snappy.version>1.1.2</snappy.version>
  <netlib.java.version>1.1.2</netlib.java.version>
  <calcite.version>1.2.0-incubating</calcite.version>
  <commons-codec.version>1.10</commons-codec.version>
  <!-- org.apache.commons/commons-lang/-->
  <commons-lang2.version>2.6</commons-lang2.version>
  <!-- org.apache.commons/commons-lang3/-->
  <commons-lang3.version>3.3.2</commons-lang3.version>
  <datanucleus-core.version>3.2.10</datanucleus-core.version>
  <janino.version>2.7.8</janino.version>
  <jersey.version>1.9</jersey.version>
  <joda.version>2.9</joda.version>
  <jodd.version>3.5.2</jodd.version>
  <jsr305.version>1.3.9</jsr305.version>
  <libthrift.version>0.9.2</libthrift.version>
  <test.java.home>${java.home}</test.java.home>
  <test.exclude.tags></test.exclude.tags>
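Editing two properties by hand in a file this long is error-prone. As a rough sketch, the helper below rewrites the first occurrence of a property element in a POM-like file. `set_pom_property` is a hypothetical name, and the approach assumes the opening and closing tags of the property sit on the same line. Later occurrences of the same property (for example inside build profiles) are left untouched.

```shell
# Hypothetical helper: set_pom_property <pom-file> <property-name> <new-value>
# Rewrites only the first <name>...</name> element found; both tags must share a line.
set_pom_property() {
  pom="$1"; name="$2"; value="$3"
  tmp=$(mktemp) || return 1
  awk -v otag="<$name>" -v ctag="</$name>" -v val="$value" '
    !done {
      s = index($0, otag)
      e = index($0, ctag)
      if (s > 0 && e > s) {
        # Keep the text before the element and everything from the closing tag onward.
        $0 = substr($0, 1, s - 1) otag val substr($0, e)
        done = 1
      }
    }
    { print }
  ' "$pom" > "$tmp" && cat "$tmp" > "$pom"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For this build it could be called as `set_pom_property /opt/modules/spark-1.6.1/pom.xml hadoop.version 2.5.0`, and again with `scala.version 2.10.4`.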
4) Extract scala-2.10.4.zip and zinc-0.3.5.3.tgz, which the build uses, into /opt/modules/spark-1.6.1/build (zinc-0.3.5.3.tgz download URL: http://downloads.typesafe.com/zinc/0.3.5.3/zinc-0.3.5.3.tgz).
build]$ unzip /opt/softwares/scala-2.10.4.zip
build]$ tar -zxf /opt/softwares/zinc-0.3.5.3.tgz
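Presumably the two archives are unpacked in advance so that the build finds them locally. A small sketch for confirming that they landed where expected follows. `check_build_tools` is a hypothetical helper, and the paths `scala-2.10.4/bin/scala` and `zinc-0.3.5.3/bin/zinc` are assumptions about how the archives unpack.

```shell
# Hypothetical helper: report which pre-unpacked build tools are missing.
# Assumes the archives unpack to scala-2.10.4/ and zinc-0.3.5.3/ with launchers under bin/.
check_build_tools() {
  build_dir="$1"
  missing=0
  for tool in scala-2.10.4/bin/scala zinc-0.3.5.3/bin/zinc; do
    if [ ! -f "$build_dir/$tool" ]; then
      echo "missing: $build_dir/$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_build_tools /opt/modules/spark-1.6.1/build` prints nothing and returns 0 when both launchers are present.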
(3) Build Spark
spark-1.6.1]$ ./make-distribution.sh --tgz -Phadoop-2.4 -Dhadoop.version=2.5.0 -Pyarn -Phive -Phive-thriftserver
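The build runs for a long time, so it can help to keep its output for later inspection. The sketch below only assembles the arguments used in this article for a given Hadoop version; `spark_build_args` is a hypothetical helper, and the log file name `build.log` in the usage line is an assumption.

```shell
# Hypothetical helper: print the make-distribution.sh arguments used in this
# article for the given Hadoop version.
spark_build_args() {
  hadoop_version="$1"
  echo "--tgz -Phadoop-2.4 -Dhadoop.version=$hadoop_version -Pyarn -Phive -Phive-thriftserver"
}
```

It could be used as `./make-distribution.sh $(spark_build_args 2.5.0) 2>&1 | tee build.log`, which runs the same command as above while saving a copy of the output.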
(4) Build process and success screenshots
Spark 1.6.1 compiled successfully against Apache Hadoop 2.5.0.
[Error 1]
Using `mvn` from path: /opt/modules/apache-maven-3.3.3/bin/mvn
[ERROR] Error executing Maven.
[ERROR] 1 problem was encountered while building the effective settings
[FATAL] Non-parseable settings /opt/modules/apache-maven-3.3.3/conf/settings.xml: Duplicated tag: 'mirrors' (position: START_TAG seen ...</mirrors>\n\n <mirrors>... @161:12) @ /opt/modules/apache-maven-3.3.3/conf/settings.xml, line 161, column 12
[Solution] The error reports a duplicated tag. Remove the <mirrors> element that already existed in /opt/modules/apache-maven-3.3.3/conf/settings.xml and keep only the one containing the aliyun mirror that I configured.
Original settings.xml:
<mirrors>
  <!-- mirror
   | Specifies a repository mirror site to use instead of a given repository. The repository that
   | this mirror serves has an ID that matches the mirrorOf element of this mirror. IDs are used
   | for inheritance and direct lookup purposes, and must be unique across the set of mirrors.
   |
  <mirror>
    <id>mirrorId</id>
    <mirrorOf>repositoryId</mirrorOf>
    <name>Human Readable Name for this Mirror.</name>
    <url>http://my.repository.com/repo/path</url>
  </mirror>
   -->
</mirrors>
<mirrors>
  <mirror>
    <id>aliyun</id>
    <mirrorOf>central</mirrorOf>
    <name>aliyun repository</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  </mirror>
</mirrors>
Revised settings.xml:
<mirrors>
  <mirror>
    <id>aliyun</id>
    <mirrorOf>central</mirrorOf>
    <name>aliyun repository</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  </mirror>
</mirrors>
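A duplicated element like this is easy to catch before starting a long build. The following is a minimal sketch that counts opening tags with plain text matching; `count_open_tags` is a hypothetical helper, and because it does not parse XML it also counts tags that appear inside comments.

```shell
# Hypothetical helper: count_open_tags <file> <tag-name>
# Counts occurrences of the exact opening tag, e.g. <mirrors> (not <mirror>).
count_open_tags() {
  file="$1"; tag="$2"
  grep -o "<$tag>" "$file" | wc -l | tr -d ' '
}
```

For example, `count_open_tags /opt/modules/apache-maven-3.3.3/conf/settings.xml mirrors` should print 1 for a settings file with a single <mirrors> element.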
Software versions: JDK 1.7.0_67, Scala 2.10.4, Hadoop 2.5.0-cdh5.3.6, Spark 1.6.1, Maven 3.3.3, Zinc 0.3.5.3
(1) Back up the Maven environment
cd /home/beifeng
~]$ mkdir m2-apache-spark-backup
~]$ cp -r ./.m2/* m2-apache-spark-backup/
cd /home/beifeng/.m2
.m2]$ rm -rf ./*
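The backup-then-clear steps above can be sketched as one function that only deletes after the copy has succeeded. `back_up_and_clear` is a hypothetical helper; unlike the `*` globs above, it also copies and removes hidden files, which may or may not be what you want for your own ~/.m2.

```shell
# Hypothetical helper: back_up_and_clear <source-dir> <backup-dir>
back_up_and_clear() {
  src="$1"; dest="$2"
  [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
  mkdir -p "$dest" || return 1
  # "$src/." copies the directory contents, including hidden files.
  cp -a "$src/." "$dest/" || return 1
  # Clear the source only after the copy has succeeded.
  find "$src" -mindepth 1 -delete
}
```

Here it would be called as `back_up_and_clear /home/beifeng/.m2 /home/beifeng/m2-apache-spark-backup`.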
(2) Set up the Spark build environment
1) Extract the Spark source package
softwares]$ tar -zxf spark-1.6.1.tgz
softwares]$ mv spark-1.6.1 /opt/modules/spark-1.6.1-cdh5.3.6
2) In /opt/modules/spark-1.6.1-cdh5.3.6/pom.xml, change the component versions to their CDH 5.3.6 builds
<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
  <akka.group>com.typesafe.akka</akka.group>
  <akka.version>2.3.11</akka.version>
  <java.version>1.7</java.version>
  <maven.version>3.3.3</maven.version>
  <sbt.project.name>spark</sbt.project.name>
  <mesos.version>0.21.1</mesos.version>
  <mesos.classifier>shaded-protobuf</mesos.classifier>
  <slf4j.version>1.7.10</slf4j.version>
  <log4j.version>1.2.17</log4j.version>
  <hadoop.version>2.5.0-cdh5.3.6</hadoop.version>
  <protobuf.version>2.5.0</protobuf.version>
  <yarn.version>${hadoop.version}</yarn.version>
  <hbase.version>0.98.7-hadoop2</hbase.version>
  <hbase.artifact>hbase</hbase.artifact>
  <flume.version>1.6.0-cdh5.3.6</flume.version>
  <zookeeper.version>3.4.5-cdh5.3.6</zookeeper.version>
  <curator.version>2.4.0</curator.version>
  <hive.group>org.spark-project.hive</hive.group>
  <!-- Version used in Maven Hive dependency -->
  <hive.version>1.2.1.spark</hive.version>
  <!-- Version used for internal directory structure -->
  <hive.version.short>1.2.1</hive.version.short>
  <derby.version>10.10.1.1</derby.version>
  <parquet.version>1.7.0</parquet.version>
  <hive.parquet.version>1.6.0</hive.parquet.version>
  <jblas.version>1.2.4</jblas.version>
  <jetty.version>8.1.14.v20131031</jetty.version>
  <orbit.version>3.0.0.v201112011016</orbit.version>
  <chill.version>0.5.0</chill.version>
  <ivy.version>2.4.0</ivy.version>
  <oro.version>2.0.8</oro.version>
  <codahale.metrics.version>3.1.2</codahale.metrics.version>
  <avro.version>1.7.7</avro.version>
  <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
  <jets3t.version>0.7.1</jets3t.version>
  <aws.kinesis.client.version>1.4.0</aws.kinesis.client.version>
  <!-- the producer is used in tests -->
  <aws.kinesis.producer.version>0.10.1</aws.kinesis.producer.version>
  <!-- org.apache.httpcomponents/httpclient-->
  <commons.httpclient.version>4.3.2</commons.httpclient.version>
  <!-- commons-httpclient/commons-httpclient-->
  <httpclient.classic.version>3.1</httpclient.classic.version>
  <commons.math3.version>3.4.1</commons.math3.version>
  <!-- managed up from 3.2.1 for SPARK-11652 -->
  <commons.collections.version>3.2.2</commons.collections.version>
  <scala.version>2.10.4</scala.version>
  <scala.binary.version>2.10</scala.binary.version>
  <jline.version>${scala.version}</jline.version>
  <jline.groupid>org.scala-lang</jline.groupid>
  <codehaus.jackson.version>1.9.13</codehaus.jackson.version>
  <fasterxml.jackson.version>2.4.4</fasterxml.jackson.version>
  <snappy.version>1.1.2</snappy.version>
  <netlib.java.version>1.1.2</netlib.java.version>
  <calcite.version>1.2.0-incubating</calcite.version>
  <commons-codec.version>1.10</commons-codec.version>
  <!-- org.apache.commons/commons-lang/-->
  <commons-lang2.version>2.6</commons-lang2.version>
  <!-- org.apache.commons/commons-lang3/-->
  <commons-lang3.version>3.3.2</commons-lang3.version>
  <datanucleus-core.version>3.2.10</datanucleus-core.version>
  <janino.version>2.7.8</janino.version>
  <jersey.version>1.9</jersey.version>
  <joda.version>2.9</joda.version>
  <jodd.version>3.5.2</jodd.version>
  <jsr305.version>1.3.9</jsr305.version>
  <libthrift.version>0.9.2</libthrift.version>
  <test.java.home>${java.home}</test.java.home>
  <test.exclude.tags></test.exclude.tags>
3) Edit /opt/modules/spark-1.6.1-cdh5.3.6/make-distribution.sh
VERSION=1.6.1
SCALA_VERSION=2.10.4
SPARK_HADOOP_VERSION=2.5.0-cdh5.3.6
SPARK_HIVE=1
4) Edit /opt/modules/apache-maven-3.3.3/conf/settings.xml
<mirrors>
  <mirror>
    <id>cloudera-repo</id>
    <mirrorOf>central</mirrorOf>
    <name>Cloudera Repository</name>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </mirror>
  <mirror>
    <id>aliyun</id>
    <mirrorOf>central</mirrorOf>
    <name>aliyun repository</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  </mirror>
</mirrors>
5) Extract the Scala and Zinc packages into /opt/modules/spark-1.6.1-cdh5.3.6/build/
build]$ unzip /opt/softwares/scala-2.10.4.zip -d /opt/modules/spark-1.6.1-cdh5.3.6/build/
build]$ tar -zxf /opt/softwares/zinc-0.3.5.3.tgz -C .
(3) Build Spark
spark-1.6.1-cdh5.3.6]$ ./make-distribution.sh --tgz -Phadoop-2.4 -Dhadoop.version=2.5.0-cdh5.3.6 -Pyarn -Phive -Phive-thriftserver
(4) The build failed, and the errors could not be resolved
[Error 1]
Using `mvn` from path: /opt/modules/apache-maven-3.3.3/bin/mvn
[ERROR] Error executing Maven.
[ERROR] 1 problem was encountered while building the effective settings
[FATAL] Non-parseable settings /opt/modules/apache-maven-3.3.3/conf/settings.xml: end tag name </mirrors> must match start tag name <mirror> from line 154 (position: TEXT seen ...</snapshots>\n </mirrors>... @164:13) @ /opt/modules/apache-maven-3.3.3/conf/settings.xml, line 164, column 13
[Solution] Make sure the cloudera-repo <mirror> element is closed with </mirror>:
<mirror>
  <id>cloudera-repo</id>
  <name>Cloudera Repository</name>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</mirror>
[Error 2]
Using `mvn` from path: /opt/modules/apache-maven-3.3.3/bin/mvn
[ERROR] Error executing Maven.
[ERROR] 2 problems were encountered while building the effective settings
[WARNING] Unrecognised tag: 'releases' (position: START_TAG seen ...</url>\n <releases>... @158:17) @ /opt/modules/apache-maven-3.3.3/conf/settings.xml, line 158, column 17
[ERROR] 'mirrors.mirror.mirrorOf' for cloudera-repo is missing @ /opt/modules/apache-maven-3.3.3/conf/settings.xml
[Solution] Add the missing <mirrorOf> element to the cloudera-repo mirror:
<mirrors>
  <mirror>
    <id>aliyun</id>
    <mirrorOf>central</mirrorOf>
    <name>aliyun repository</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  </mirror>
  <mirror>
    <id>cloudera-repo</id>
    <mirrorOf>central</mirrorOf>
    <name>Cloudera Repository</name>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </mirror>
</mirrors>
[Error 3]
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project spark-launcher_2.10: Could not resolve dependencies for project org.apache.spark:spark-launcher_2.10:jar:1.6.1: Could not find artifact org.apache.hadoop:hadoop-client:jar:2.5.0-cdh5.3.6 in aliyun (http://maven.aliyun.com/nexus/content/groups/public/) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :spark-launcher_2.10
[Solution] Move the cloudera-repo mirror ahead of the aliyun mirror:
<mirrors>
  <mirror>
    <id>cloudera-repo</id>
    <mirrorOf>central</mirrorOf>
    <name>Cloudera Repository</name>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </mirror>
  <mirror>
    <id>aliyun</id>
    <mirrorOf>central</mirrorOf>
    <name>aliyun repository</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  </mirror>
</mirrors>
[Error 4]
[INFO] ------------------------------------------------------------------------
[ERROR] Plugin org.apache.maven.plugins:maven-remote-resources-plugin:1.5 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.maven.plugins:maven-remote-resources-plugin:jar:1.5: Could not transfer artifact org.apache.maven.plugins:maven-remote-resources-plugin:pom:1.5 from/to cloudera-repo (https://repository.cloudera.com/artifactory/cloudera-repos): Remote host closed connection during handshake: SSL peer shut down incorrectly -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginResolutionException
[Solution] Remove the <releases> and <snapshots> elements from the cloudera-repo mirror and set the aliyun mirror's <mirrorOf> to *:
<mirrors>
  <mirror>
    <id>cloudera-repo</id>
    <mirrorOf>central</mirrorOf>
    <name>Cloudera Repository</name>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
  </mirror>
  <mirror>
    <id>aliyun</id>
    <mirrorOf>*</mirrorOf>
    <name>aliyun repository</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  </mirror>
</mirrors>
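Errors 3 and 4 both arose with two mirrors claiming the same repository (central). Maven's mirror settings guide states that Maven does not aggregate mirrors with the same <mirrorOf> value and simply picks the first match, so the declaration order decides which server is asked. That would be consistent with Error 3 naming only aliyun and Error 4 naming only cloudera-repo, although it has not been verified against this build. The sketch below flags such overlaps; `duplicate_mirror_targets` is a hypothetical helper that uses plain text matching, so it also sees entries inside XML comments.

```shell
# Hypothetical helper: print every <mirrorOf> value that occurs more than once
# in the given settings file.
duplicate_mirror_targets() {
  file="$1"
  grep -oE '<mirrorOf>[^<]*</mirrorOf>' "$file" \
    | sed -E 's/<[^>]+>//g' \
    | sort | uniq -d
}
```

Running `duplicate_mirror_targets /opt/modules/apache-maven-3.3.3/conf/settings.xml` prints nothing when each repository is claimed by at most one mirror.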