In PySpark and Scala, the withColumn() method creates or replaces a column in an existing DataFrame — it does not append a new row, despite how it is sometimes used. Applying it with lit() constants overwrites the value in every existing row, as the examples below show.

In PySpark, the steps are as follows:
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()

# Sample DataFrame with three rows
data = [("Alice", 25), ("Bob", 30), ("Charlie", 35)]
df = spark.createDataFrame(data, ["Name", "Age"])
df.show()
Output:
+-------+---+
| Name|Age|
+-------+---+
| Alice| 25|
| Bob| 30|
|Charlie| 35|
+-------+---+
Calling withColumn() with lit() constants does not add a new row; it overwrites the named columns in every existing row:

new_row = ("Dave", 40)
df_new = df.withColumn("Name", lit(new_row[0])).withColumn("Age", lit(new_row[1]))
df_new.show()
Output:
+-------+---+
| Name|Age|
+-------+---+
| Dave| 40|
| Dave| 40|
| Dave| 40|
+-------+---+
In Scala, the steps are as follows:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder.getOrCreate()

// Sample DataFrame with three rows
val data = Seq(("Alice", 25), ("Bob", 30), ("Charlie", 35))
val df = spark.createDataFrame(data).toDF("Name", "Age")
df.show()
Output:
+-------+---+
| Name|Age|
+-------+---+
| Alice| 25|
| Bob| 30|
|Charlie| 35|
+-------+---+
Again, withColumn() with lit() overwrites every existing row instead of appending one:

val new_row = ("Dave", 40)
val df_new = df.withColumn("Name", lit(new_row._1)).withColumn("Age", lit(new_row._2))
df_new.show()
Output:
+-------+---+
| Name|Age|
+-------+---+
| Dave| 40|
| Dave| 40|
| Dave| 40|
+-------+---+
In the examples above, the withColumn() method, together with the lit() function, turned "Name" and "Age" into constant columns — every existing row was overwritten rather than a new row added. To actually append a row, build a one-row DataFrame with the same schema and combine it with the original via df.union().