Spark2.x学习笔记:15、Spark SQL的SQL

15、 Spark SQL的SQL

15.1 Spark SQL所支持的SQL语法

select [distinct] [column names]|[wildcard]
from tableName
[join clause tableName on join condition]
[where condition]
[group by column name]
[having conditions]
[order by column names [asc|desc]]

如果只用join进行查询,则支持的语法为:

select statement
from statement
[join | inner join | left join | left semi join | left outer join | right join |right outer join | full join | full outer join]
on join condition

15.2 Spark SQL的SQL的框架

15.3 与Hive Metastore结合

(1)Spark要能找到HDFS和Hive的配置文件

  • 第1种方法:可以直接将core-site.xml、hdfs-site.xml和hive-site.xml复制到Spark安装目录下的conf目录中。该方法存在一个缺陷,如果HDFS或Hive的配置修改了,则需要手动修改Spark对应的配置文件。
  • 第2种方法:在Spark配置文件中指定Hadoop配置文件目录

(2)Spark SQL与Hive Metastore结合,直接使用spark.sql(“select … from table where …”)

15.4 实例演示

(1)spark-shell

[root@node1 ~]# spark-shell
17/10/24 10:15:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://192.168.80.131:4040
Spark context available as 'sc' (master = local[*], app id = local-1508854525067).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.sql("show databases").show
+------------+
|databaseName|
+------------+
|     default|
|        test|
+------------+


scala> spark.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
| default|  copyemp|      false|
| default|     demo|      false|
| default|     dept|      false|
| default|     dual|      false|
| default|      emp|      false|
| default|   empbak|      false|
| default|employees|      false|
| default|     mytb|      false|
| default|    users|      false|
+--------+---------+-----------+


scala> spark.sql("select * from emp").show
+----+------+---------+----+----------+------+------+----+
| eid| ename|      job| mgr|  hiredate|   sal|  comm| did|
+----+------+---------+----+----------+------+------+----+
|7782| CLARK|  MANAGER|7839|1981-06-09|2450.0|   0.0|  10|
|7839|  KING|PRESIDENT|   0|1981-11-17|5000.0|   0.0|  10|
|7934|MILLER|    CLERK|7782|1982-01-23|1300.0|   0.0|  10|
|7369| SMITH|    CLERK|7902|1980-12-17| 800.0|   0.0|  20|
|7566| JONES|  MANAGER|7839|1981-04-02|2975.0|   0.0|  20|
|7902|  FORD|  ANALYST|7566|1981-12-03|3000.0|   0.0|  20|
|7499| ALLEN| SALESMAN|7698|1981-02-20|1600.0| 300.0|  30|
|7521|  WARD| SALESMAN|7698|1981-02-22|1250.0| 500.0|  30|
|7654|MARTIN| SALESMAN|7698|1981-09-28|1250.0|1400.0|  30|
|7698| BLAKE|  MANAGER|7839|1981-05-01|2850.0|   0.0|  30|
|7844|TURNER| SALESMAN|7698|1981-09-08|1500.0|   0.0|  30|
|7900| JAMES|    CLERK|7698|1981-12-03| 950.0|   0.0|  30|
|8888|HADRON|     null|null|2016-08-31|6666.0|  null|null|
+----+------+---------+----+----------+------+------+----+


scala> 

(2)spark-sql

[root@node1 ~]# spark-sql
17/10/24 10:17:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/10/24 10:17:32 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/10/24 10:17:32 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
spark-sql> show databases;
default
test
Time taken: 3.93 seconds, Fetched 2 row(s)
spark-sql> show tables;
default copyemp false
default demo    false
default dept    false
default dual    false
default emp false
default empbak  false
default employees   false
default mytb    false
default users   false
Time taken: 0.145 seconds, Fetched 9 row(s)
spark-sql> select * from emp;
7782    CLARK   MANAGER 7839    1981-06-09  2450.0  0.0 10
7839    KING    PRESIDENT   0   1981-11-17  5000.0  0.0 10
7934    MILLER  CLERK   7782    1982-01-23  1300.0  0.0 10
7369    SMITH   CLERK   7902    1980-12-17  800.0   0.0 20
7566    JONES   MANAGER 7839    1981-04-02  2975.0  0.0 20
7902    FORD    ANALYST 7566    1981-12-03  3000.0  0.0 20
7499    ALLEN   SALESMAN    7698    1981-02-20  1600.0  300.0   30
7521    WARD    SALESMAN    7698    1981-02-22  1250.0  500.0   30
7654    MARTIN  SALESMAN    7698    1981-09-28  1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-05-01  2850.0  0.0 30
7844    TURNER  SALESMAN    7698    1981-09-08  1500.0  0.0 30
7900    JAMES   CLERK   7698    1981-12-03  950.0   0.0 30
8888    HADRON  NULL    NULL    2016-08-31  6666.0  NULL    NULL
Time taken: 3.266 seconds, Fetched 13 row(s)
spark-sql> 

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

发表于

我来说两句

0 条评论
登录 后参与评论

相关文章

来自专栏智能计算时代

Bluemix Local: Architectural Overview

We are ready to take Bluemix into your datacentres, behind your firewall. Some o...

2858
来自专栏牛肉圆粉不加葱

ResourceManager HA无法连接Spark TrackUi现象解决方案

在对ResourceManager做了基于Zookeeper的HA后, 在YARN集群上执行Spark application后, 打开Spark Applic...

842
来自专栏康怀帅的专栏

Docker 相关概念总览

Docker 概念总览 Docker Engine Docker 引擎 Docker architecture Docker 架构 Docker daemon ...

3908
来自专栏乐沙弥的世界

Oracle 11g RAC CRS-4535/ORA-15077

    新安装了Oracle 11g rac之后,不知道是什么原因导致第二个节点上的crsd无法启动?其错误消息是CRS-4535: Cannot commun...

743
来自专栏鬼谷君

Docker 部署Django项目

这里指定 Python 版本为docker官方提供的 "0.0.0.0:8000" 这里笔者开启容器中 8000 端口

943
来自专栏鬼谷君

Docker部署Vue 工程包

这样前端工程镜像就build好了,可以执行docker run -d -p9528:9528 dist:v0.1启动

732
来自专栏王亚昌的专栏

How to build your own ubuntu image with docker?

docker run -d -p 222:22 ubuntu-sshd-admin

942
来自专栏积累沉淀

ie8和chrome获取上传图片的宽度和高度等尺寸

测试后可用 <html> <head>     <title>测试</title>     <meta chars...

3688
来自专栏琯琯博客

docker-resources资源汇集相关项目博文

docker资源汇总。英文版本链接 资源汇集 书籍 第一本Docker书 (7.4分) Docker —— 从入门到实践 (内容一般) The Docker B...

4237
来自专栏虚拟化云计算

用qemu中最少的代码实现一个kvm模拟器

本文依据qemu2.11的源码,把整个初始化和运行虚拟机的代码拿出来,完成一个可以运行的模拟器demo。从中可以很清晰的看出qemu-kvm的初始化以及虚拟机的...

94710

扫码关注云+社区