If you do not apply an alias to each dataframe, you'll get an error once you have created the joined dataframe and try to use it. With two columns of the same name, referencing either one returns an error that essentially says Spark cannot tell which column you mean (the reference is ambiguous). SQL Server and other engines would either reject such a query or automatically append a prefix or suffix to the field name.
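As a minimal sketch of the aliasing idea, assuming the same sample data as in the question: the aliases `a` and `b`, the app name, and the variable names `joined` and `result` are made up for illustration, and the block is written as a standalone script rather than as the only way to do this.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Reuses an existing session if there is one; otherwise starts a local one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("alias-join-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

val df1 = Seq((1, "har"), (2, "ron"), (3, "fred")).toDF("ID", "NAME")
val df2 = Seq(("har", "HARRY"), ("ron", "RONALD")).toDF("NAME", "ACTUALNAME")

// Alias both sides so that each NAME column has a distinct qualifier.
val joined = df1.as("a")
  .join(df2.as("b"), col("a.NAME") === col("b.NAME"), "left_outer")

// An unqualified joined.select("NAME") would be ambiguous here;
// qualifying with the alias picks exactly one of the two columns.
val result = joined.select(col("a.ID"), col("a.NAME"), col("b.ACTUALNAME"))
result.show()
```

Selecting through the aliases leaves a single `NAME` column in `result`, so later references to it are no longer ambiguous.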

Try this: you can use <code>col()</code> to refer to the column by its qualified name.

<pre><code>scala&gt; spark.sql("select * from table1 LEFT OUTER JOIN table2 ON table1.NAME = table2.NAME").drop(col("table2.NAME")).show()
+---+----+----------+
| ID|NAME|ACTUALNAME|
+---+----+----------+
|  1| har|     HARRY|
|  2| ron|    RONALD|
|  3|fred|      null|
+---+----+----------+
</code></pre>

You can select just the required fields in the SQL query, as shown below.

<pre><code>spark.sql("select A.ID, A.NAME, B.ACTUALNAME from table1 A LEFT OUTER JOIN table2 B ON A.NAME = B.NAME").show()
</code></pre>
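A related sketch, not part of the answer above: Spark SQL also accepts a `USING` clause in joins, which joins on the shared column and emits it only once, much like `df1.join(df2, Seq("NAME"))`. The app name and the variable name `usingJoin` are made up for illustration, and the column order of `select *` is not assumed here.

```scala
import org.apache.spark.sql.SparkSession

// Reuses an existing session if there is one; otherwise starts a local one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("using-join-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Same sample data and view names as in the question.
Seq((1, "har"), (2, "ron"), (3, "fred")).toDF("ID", "NAME")
  .createOrReplaceTempView("table1")
Seq(("har", "HARRY"), ("ron", "RONALD")).toDF("NAME", "ACTUALNAME")
  .createOrReplaceTempView("table2")

// USING (NAME) joins on the common column and keeps a single NAME in the output.
val usingJoin = spark.sql(
  "select * from table1 LEFT OUTER JOIN table2 USING (NAME)")
usingJoin.show()
```

This keeps the query in pure SQL and avoids both the explicit column list and the follow-up `drop`.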

This is mostly an academic exercise, but you can also do it without the need to drop columns by switching on the ability of Spark SQL to interpret regular expressions in quoted identifiers, an ability inherited from Hive SQL. You need to set <code>spark.sql.parser.quotedRegexColumnNames</code> to <code>true</code> when building the Spark context for this to work.

<pre><code>$ spark-shell --master "local[*]" --conf spark.sql.parser.quotedRegexColumnNames=true
...
scala&gt; spark.sql("select table1.*, table2.`^(?!NAME$).*$` from table1 LEFT OUTER JOIN table2 ON table1.NAME = table2.NAME").show()
+---+----+----------+
| ID|NAME|ACTUALNAME|
+---+----+----------+
|  1| har|     HARRY|
|  2| ron|    RONALD|
|  3|fred|      null|
+---+----+----------+
</code></pre>

Here

<pre><code>table2.`^(?!NAME$).*$`
</code></pre>

resolves to all columns of <code>table2</code> except <code>NAME</code>. Any valid Java regular expression should work.
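As a hedged sketch of the same idea without restarting the shell: as far as I know, `spark.sql.parser.quotedRegexColumnNames` is a regular (non-static) SQL option, so it should also be possible to switch it on for an existing session with `spark.conf.set`. Treat that as an assumption to verify on your Spark version; the app name and the variable name `regexJoin` are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Reuses an existing session if there is one; otherwise starts a local one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("regex-columns-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Same sample data and view names as in the question.
Seq((1, "har"), (2, "ron"), (3, "fred")).toDF("ID", "NAME")
  .createOrReplaceTempView("table1")
Seq(("har", "HARRY"), ("ron", "RONALD")).toDF("NAME", "ACTUALNAME")
  .createOrReplaceTempView("table2")

// Assumption: the option can be set at runtime instead of via --conf at startup.
spark.conf.set("spark.sql.parser.quotedRegexColumnNames", "true")

// The quoted identifier is read as a regex: every table2 column except NAME.
val regexJoin = spark.sql(
  "select table1.*, table2.`^(?!NAME$).*$` " +
    "from table1 LEFT OUTER JOIN table2 ON table1.NAME = table2.NAME")
regexJoin.show()
```

Note that with this option enabled, every backtick-quoted identifier in a select list is treated as a regular expression, so it is probably best scoped to the queries that need it.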

I am writing a join query for two dataframes. I have to join on a column that has the same name in both dataframes. How can I write this as a query?

<pre><code>var df1 = Seq((1,"har"),(2,"ron"),(3,"fred")).toDF("ID", "NAME")
var df2 = Seq(("har", "HARRY"),("ron", "RONALD")).toDF("NAME", "ACTUALNAME")
df1.createOrReplaceTempView("table1")
df2.createOrReplaceTempView("table2")
</code></pre>

I know we can do <code>df3 = df1.join(df2, Seq("NAME"))</code> where <code>NAME</code> is the common column. In this scenario <code>df3</code> will have only <code>ID, NAME, ACTUALNAME</code>.

If we do it from SQL, the query is <code>select * from table1 LEFT OUTER JOIN table2 ON table1.NAME = table2.NAME</code>. The output dataframe then has the columns <code>ID, NAME, NAME, ACTUALNAME</code>. How can I remove the extra <code>NAME</code> column that came from <code>df2</code>?

This does not work either: <code>spark.sql("select * from table1 LEFT OUTER JOIN table2 ON table1.NAME = table2.NAME").drop(df2("NAME"))</code>

Is there a cleaner way to do this? Renaming the <code>df2</code> columns is a last resort that I would rather avoid. In my scenario, writing SQL queries is easier than using the dataframe API, so I am looking for Spark SQL specific answers only.

Spark SQL QUERY join on Same column name

Stack Overflow user
