A simple single-node Spark setup

Components to install: hadoop, hive, scala, spark

I. Environment variable configuration (~/.bash_profile):

PATH=$PATH:$HOME/bin
export PATH

export JAVA_HOME=/usr/local/jdk
export SCALA_HOME=/usr/local/scala
export SPARK_HOME=/usr/local/spark
export PATH=.:$JAVA_HOME/bin:$SCALA_HOME/bin:$PATH

HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_HOME PATH

HIVE_HOME=/usr/local/hive
PATH=$HIVE_HOME/bin:$PATH
export HIVE_HOME PATH

II. Hadoop installation

1. Set up passwordless ssh to the local machine:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
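As a sanity check on the permission steps above, here is a minimal stand-alone sketch: it generates a throwaway key pair in a scratch directory (not the real ~/.ssh) and reads back the modes sshd requires. It uses rsa rather than dsa, since newer OpenSSH builds reject dsa keys.

```shell
# Sketch: scratch key pair plus the permission bits sshd expects
# (700 on the directory, 600 on authorized_keys). Paths are
# temporary stand-ins for ~/.ssh.
tmp=$(mktemp -d)
ssh-keygen -t rsa -N '' -f "$tmp/id_rsa" -q
cat "$tmp/id_rsa.pub" >> "$tmp/authorized_keys"
chmod 700 "$tmp"
chmod 600 "$tmp/authorized_keys"
echo "$(stat -c '%a' "$tmp") $(stat -c '%a' "$tmp/authorized_keys")"   # prints 700 600
```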

2. Set the hostname to yul32:

vi /etc/hosts
vi /etc/sysconfig/network

3. Edit hadoop-env.sh:

export JAVA_HOME=/usr/local/jdk

4. Edit core-site.xml:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://yul32:9000</value>
</property>

5. Edit hdfs-site.xml (/usr/hadoop-2.3.0/etc/hadoop):

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/usr/local/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/usr/local/hadoop/dfs/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>

6. Edit mapred-site.xml (/usr/hadoop-2.3.0/etc/hadoop; if only mapred-site.xml.template exists, copy it to mapred-site.xml first). Note that mapreduce.framework.name belongs here, not in yarn-site.xml:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.cluster.temp.dir</name>
  <value></value>
  <description>No description</description>
  <final>true</final>
</property>
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value></value>
  <description>No description</description>
  <final>true</final>
</property>

7. Edit yarn-site.xml (/usr/hadoop-2.3.0/etc/hadoop):

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

8. Edit the slaves file:

yul32

9. Format the namenode:

hadoop namenode -format

10. Start hadoop:

cd hadoop/sbin
./start-all.sh

(ifup/ifdown bring a network interface up or down if networking is not yet configured.)

III. Spark setup (/usr/spark-1.1.0-bin-hadoop2.3/conf; if a file opens read-only in vi, fix its permissions first)

1. Edit conf/slaves:

yul32
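Once the start script returns, jps should list the HDFS and YARN daemons. The sketch below runs against a hypothetical, hard-coded jps listing to show the kind of check you can script; on a real node, replace the here-variable with "$(jps)".

```shell
# Hypothetical jps output for a healthy single-node cluster;
# substitute "$(jps)" on a real machine.
jps_output="6614 NameNode
6727 DataNode
6850 SecondaryNameNode
7377 ResourceManager
7510 NodeManager"
missing=0
for d in NameNode DataNode ResourceManager NodeManager; do
  echo "$jps_output" | grep -q "$d" || { echo "missing: $d"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all daemons up"   # prints all daemons up
```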

2. Edit spark-env.sh (/usr/spark-1.1.0-bin-hadoop2.3/conf):

export SCALA_HOME=/usr/local/scala
export JAVA_HOME=/usr/local/jdk
export SPARK_MASTER_IP=yul32
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_MEMORY=1g
export MASTER=spark://${SPARK_MASTER_IP}:${SPARK_MASTER_PORT}

3. Start spark:

./sbin/start-all.sh

4. Run a spark example:

./bin/run-example org.apache.spark.examples.JavaSparkPi 2

5. Run the scala shell:

./bin/spark-shell --master local[2]

6. Run the python shell:

./bin/pyspark --master local[2]

7. Start the spark sql thrift server:

./sbin/start-thriftserver.sh
(or on yarn: ./sbin/start-thriftserver.sh --master yarn)

To run it in the background:
nohup ./sbin/start-thriftserver.sh --master yarn &

To list background jobs:
jobs -l

After startup, jps should show a SparkSubmit process.

8. Connect a spark sql client:

./bin/beeline -u jdbc:hive2://yul32:10000 -n spark -p spark
(-n is the username, -p the password)

Or interactively:
./bin/beeline
beeline> !connect jdbc:hive2://yul32:10000 <username> <password>
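One detail of the spark-env.sh settings above is worth spelling out: the MASTER line is plain string interpolation over the two variables set earlier in the file. This stand-alone sketch shows the value it produces with this guide's settings:

```shell
# Assemble the standalone master URL the same way spark-env.sh does.
SPARK_MASTER_IP=yul32
SPARK_MASTER_PORT=7077
MASTER=spark://${SPARK_MASTER_IP}:${SPARK_MASTER_PORT}
echo "$MASTER"   # prints spark://yul32:7077
```

This is the URL you would pass to --master when submitting jobs against the standalone cluster instead of local[2] or yarn.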

Uploading a file and creating tables:

1. Put the csv file into hdfs:

hadoop fs -ls /user/ocdc/coc
hadoop fs -put /home/ocdc/CI_CUSER_20141104112305197.csv /user/ocdc/coc

2. Create and load the tables:

shark> create table CI_CUSER_20141104112305196 (PRODUCT_NO string)
       ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
shark> load data inpath '/user/ocdc/coc/CI_CUSER_20141104112305197.csv' into table CI_CUSER_20141104112305196;
shark> create table CI_CUSER_20141104112305197 (PRODUCT_NO string)
       ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' stored as rcfile;
shark> insert into table CI_CUSER_20141104112305197 select * from CI_CUSER_20141104112305196;

IV. Hive installation and configuration (optional)

1. Edit hive-env.sh:

export HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf

2. Start the hive remote service (port 10000):

hive --service hiveserver &

Hive JDBC URL: jdbc:hive://ip:10000/default
(default Hive port: 10000; default database: default)

The hive warehouse location is set in hive/conf/hive-site.xml; hive.metastore.warehouse.dir defaults to /user/hive/warehouse:

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>

Shark JDBC connection

1. Check that SharkServer is running:

[ocdc@oc98 conf]$ jps
7983 Kafka
8803 SharkCliDriver
7377 ResourceManager
16894 SharkServer
6925 JournalNode
12601 CoarseGrainedExecutorBackend
17056 CoarseGrainedExecutorBackend
18424 Jps
14486 Master
4108 QuorumPeerMain
23408 HRegionServer
17655 RunJar
6727 DataNode
7132 DFSZKFailoverController
7510 NodeManager
12553 WorkerLauncher
6614 NameNode
23268 HMaster
12415 SharkCliDriver

2. Find the SharkServer port:

[ocdc@oc98 conf]$ netstat -apn | grep 16894
tcp 0 0 ::ffff:10.1.251.98:57902 :::* LISTEN 16894/java
tcp 0 0 :::52309 :::* LISTEN 16894/java
tcp 0 0 :::9977 :::* LISTEN 16894/java
tcp 0 0 :::41222 :::* LISTEN 16894/java
tcp 0 0 :::4040 :::* LISTEN 16894/java
tcp 0 0 :::45192 :::* LISTEN 16894/java
(ESTABLISHED and unix-socket lines omitted)

Port 9977 is the shark service port, matching how the server was started:

nohup ./bin/shark --service sharkserver --p 9977 &

3. JDBC connection:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class SharkTest {
    private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }
        Connection con = DriverManager.getConnection(
                "jdbc:hive://10.1.251.98:9977/default", "ocdc", "asiainfo");
        Statement stmt = con.createStatement();
        ResultSet res = stmt.executeQuery("select * from src");
        while (res.next()) {
            System.out.println(res.getString(1) + " " + res.getString(2));
        }
    }
}

Starting the Spark SQL server:

./sbin/start-thriftserver.sh --master yarn

Client connection:

./bin/beeline -u jdbc:hive2://10.1.251.98:10000 -n ocdc -p asiainfo

To make profile changes take effect immediately:

source /etc/profile

Jars required by the JDBC client:

hive-common-0.8.1.jar
hive-exec-0.8.1.jar
hive-jdbc-0.8.1.jar
hive-metastore-0.8.1.jar
hive-service-0.8.1.jar
libfb303.jar
slf4j-api-1.4.3.jar
slf4j-log4j12-1.4.3.jar
httpclient-4.2.5.jar
hadoop-common-2.3.0.jar
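To compile and run a client like SharkTest, these jars must be on the classpath. A minimal sketch of assembling one (LIBDIR is a hypothetical location; point it at wherever the jars actually live, and add the remaining jars from the list the same way):

```shell
# Build a colon-separated classpath from a few of the jars above.
# LIBDIR is a hypothetical stand-in for the real jar directory.
LIBDIR=/usr/local/hive/lib
CP=.
for j in hive-jdbc-0.8.1.jar hive-common-0.8.1.jar libfb303.jar; do
  CP="$CP:$LIBDIR/$j"
done
echo "$CP"
# java -cp "$CP" SharkTest    # then run the client with it
```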

vi basics: :wq saves and quits, i enters insert (edit) mode, :q! quits without saving.

Granting permissions:

1. cd into the directory containing the file.
2. chmod 777 slaves grants everyone full permissions on that file.
3. To give the user ysy write access, change the ownership:

chown -R ysy132:ysy132 dfs
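A small self-contained illustration of what chmod 777 does, using a scratch file rather than the real slaves file (chown is left out here because it requires root):

```shell
# Create a scratch file, open it up to everyone, and read back
# the octal mode bits to confirm the change took effect.
f=$(mktemp)
stat -c '%a' "$f"   # mktemp creates files as 600
chmod 777 "$f"
stat -c '%a' "$f"   # prints 777
```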

Switch users with: su - ysy

Hadoop error logs live under /usr/hadoop-2.3.0/logs. To view the last 500 lines of one:

tail -500 hadoop-root-namenode-ysy0915.log

Starting hadoop: from the hadoop-2.3.0 directory, run ./sbin/start-dfs.sh; stop it with ./sbin/stop-dfs.sh.

Check which daemons are running with jps. (cd .. moves back up to the parent directory.)

Example: Spark SQL code generation for select a+b from table. The generated evaluation code is:

val a: Int = inputRow.getInt(0)
val b: Int = inputRow.getInt(1)
val result: Int = a + b
resultRow.setInt(0, result)

and the generator that produces it pattern-matches on the expression tree:

def generateCode(e: Expression): Tree = e match {
  case Attribute(ordinal) =>
    q"inputRow.getInt($ordinal)"
  case Add(left, right) =>
    q"""
      {
        val leftResult = ${generateCode(left)}
        val rightResult = ${generateCode(right)}
        leftResult + rightResult
      }
    """
}

This article is part of the Tencent Cloud self-media sharing program.
