
Adding to @Patrick's answer, you can use the following to drop multiple columns:

<pre><code>columns_to_drop = ['id', 'id_copy']
df = df.drop(*columns_to_drop)
</code></pre>


An easy way to do this is to use <code>select</code>, keeping in mind that you can get a list of all <code>columns</code> for the <code>dataframe</code>, <code>df</code>, with <code>df.columns</code>:

<pre><code>drop_list = ['a column', 'another column', ...]

df.select([column for column in df.columns if column not in drop_list])
</code></pre>
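The filtering step can be tried without a Spark session. Below is a minimal sketch in which a plain Python list stands in for <code>df.columns</code>; the helper name <code>columns_to_keep</code> and the sample column names are assumptions made for illustration, not part of the answer above.

```python
# Plain lists stand in for df.columns so this runs without Spark.
def columns_to_keep(all_columns, drop_list):
    """Return the columns not in drop_list, preserving their original order."""
    to_drop = set(drop_list)  # set membership avoids rescanning the list
    return [column for column in all_columns if column not in to_drop]


all_columns = ["id", "julian_date", "user_id", "id_copy"]  # hypothetical schema
kept = columns_to_keep(all_columns, ["id_copy", "not_a_column"])
print(kept)
```

Names in the drop list that do not exist are simply ignored. With a real DataFrame the result would be passed on as <code>df.select(columns_to_keep(df.columns, drop_list))</code>.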


There are two ways to do this:

1: Keep only the columns you need:

<pre><code>drop_column_list = ["drop_column"]
df = df.select([column for column in df.columns if column not in drop_column_list]) 
</code></pre>

2: Use <code>drop</code>. This is the more elegant way:

<pre><code>df = df.drop("col_name")
</code></pre>

You should avoid the <code>collect()</code> version, because it sends the complete dataset to the driver (master), which takes a lot of computing effort!


You could either explicitly name the columns you want to keep, like so:

<pre><code>keep = [a.id, a.julian_date, a.user_id, b.quan_created_money, b.quan_created_cnt]
</code></pre>

Or, more generally, you can include all columns except a specific one via a list comprehension. For example (excluding the <code>id</code> column from <code>b</code>):

<pre><code>keep = [a[c] for c in a.columns] + [b[c] for c in b.columns if c != 'id']
</code></pre>

Finally you make a selection on your join result:

<pre><code>d = a.join(b, a.id==b.id, 'outer').select(*keep)
</code></pre>
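If you would rather not hard-code <code>'id'</code>, the overlapping names can be worked out from the two column lists first. A minimal sketch, using plain lists in place of <code>a.columns</code> and <code>b.columns</code> (the variable names <code>duplicated</code> and <code>keep_from_b</code> are my own):

```python
# Column names copied from the two DataFrames in the question.
a_columns = ["id", "julian_date", "user_id"]
b_columns = ["id", "quan_created_money", "quan_created_cnt"]

# Names that appear on both sides would show up twice after the join.
duplicated = [c for c in b_columns if c in a_columns]

# Keep everything from a, and from b only the names a does not already have.
keep_from_b = [c for c in b_columns if c not in duplicated]

print(duplicated)
print(keep_from_b)
```

The keep list for the join would then be <code>[a[c] for c in a.columns] + [b[c] for c in keep_from_b]</code>.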


Maybe a little bit off topic, but here is a solution using Scala. Make an <code>Array</code> of column names from your <code>oldDataFrame</code> and remove the columns that you want to drop (<code>"colExclude"</code>). Then pass the <code>Array[Column]</code> to <code>select</code> and unpack it.

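<pre><code>import org.apache.spark.sql.{Column, DataFrame}
// Keep every column except "colExclude", then expand the array as varargs.
val columnsToKeep: Array[Column] = oldDataFrame.columns.diff(Array("colExclude"))
 .map(x =&gt; oldDataFrame.col(x))
val newDataFrame: DataFrame = oldDataFrame.select(columnsToKeep: _*)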
</code></pre>


Yes, it is possible to drop/select columns by slicing like this:
<pre><code>slice = data.columns[a:b]
data.select(slice).show()
</code></pre>
Example:
<pre><code>newDF = spark.createDataFrame([
 (1, &quot;a&quot;, &quot;4&quot;, 0), 
 (2, &quot;b&quot;, &quot;10&quot;, 3), 
 (7, &quot;b&quot;, &quot;4&quot;, 1), 
 (7, &quot;d&quot;, &quot;4&quot;, 9)],
 (&quot;id&quot;, &quot;x1&quot;, &quot;x2&quot;, &quot;y&quot;))


slice = newDF.columns[1:3]
newDF.select(slice).show()
</code></pre>
Use the <code>select</code> method to get the feature columns:
<pre><code>features = newDF.columns[:-1]
newDF.select(features).show()
</code></pre>
Use the <code>drop</code> method to keep only the last column:
<pre><code>last_col = newDF.drop(*features)
last_col.show()
</code></pre>
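Since <code>df.columns</code> is an ordinary Python list, the slices can be checked on their own. A small sketch using the same four column names as <code>newDF</code>; the variable names <code>middle</code> and <code>label</code> are assumptions for illustration:

```python
# Same column names as newDF in the example; no Spark session required.
columns = ["id", "x1", "x2", "y"]

middle = columns[1:3]    # positions 1 and 2; the end index is exclusive
features = columns[:-1]  # everything except the last column
label = columns[-1]      # the one column left after dropping the features

print(middle, features, label)
```

This is why <code>newDF.drop(*features)</code> leaves a DataFrame with only the <code>y</code> column.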


You can delete a column like this:

<pre><code>df.drop("column Name").columns
</code></pre>

In your case:

<pre><code>df.drop("id").columns
</code></pre>

If you want to drop more than one column you can do:

<pre><code>dfWithLongColName.drop("ORIGIN_COUNTRY_NAME", "DEST_COUNTRY_NAME")
</code></pre>

<pre><code>&gt;&gt;&gt; a
DataFrame[id: bigint, julian_date: string, user_id: bigint]
&gt;&gt;&gt; b
DataFrame[id: bigint, quan_created_money: decimal(10,0), quan_created_cnt: bigint]
&gt;&gt;&gt; a.join(b, a.id==b.id, 'outer')
DataFrame[id: bigint, julian_date: string, user_id: bigint, id: bigint, quan_created_money: decimal(10,0), quan_created_cnt: bigint]
</code></pre>

There are two <code>id: bigint</code> columns and I want to delete one. How can I do that?

How to delete columns in pyspark dataframe
