Creating a large Spark DataFrame with random content in Scala can be done as follows:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

val spark = SparkSession.builder()
  .appName("RandomDataFrame")
  .master("local[*]") // use all local cores; remove this line when submitting to a cluster
  .getOrCreate()

// Target structure of the frame. Note: the selectExpr approach below defines
// its column types directly, so this schema is only needed if you build the
// frame with createDataFrame instead (see the sketch after the explanation below).
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = false),
  StructField("age", IntegerType, nullable = false)
))

val numRows = 1000000 // number of rows in the DataFrame

val randomDF = spark.range(numRows) // a single LongType column "id": 0 until numRows
  .selectExpr(
    "CAST(id AS INT)",                    // sequential integer id
    "CONCAT('Name', CAST(id AS STRING))", // deterministic name such as "Name42"
    "CAST(RAND() * 100 AS INT)")          // random age in [0, 100)
  .toDF("id", "name", "age")
Here, spark.range generates a DataFrame with the requested number of rows, and selectExpr derives the three columns from it: id and name come deterministically from the range value, while RAND() supplies a random age.
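Equivalently, the same columns can be built with the typed Column API instead of SQL expression strings. A minimal sketch, reusing the spark session and numRows from above (randomDF2 is just an illustrative name):

import org.apache.spark.sql.functions.{col, concat, lit, rand}

// Same result as the selectExpr version: only the age column is random.
val randomDF2 = spark.range(numRows)
  .select(
    col("id").cast("int").as("id"),
    concat(lit("Name"), col("id").cast("string")).as("name"),
    (rand() * 100).cast("int").as("age"))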
randomDF.show() // preview the first 20 rows
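If you do want the explicit schema to drive the construction, a sketch using createDataFrame over an RDD of Rows would look like the following (scala.util.Random here is an assumption; any value generator works):

import org.apache.spark.sql.Row
import scala.util.Random

// Build Rows matching the schema declared above, then attach it explicitly.
// Values are generated per partition on the executors.
val rowRDD = spark.sparkContext
  .parallelize(0 until numRows)
  .map(i => Row(i, s"Name$i", Random.nextInt(100)))
val schemaDF = spark.createDataFrame(rowRDD, schema)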
Here is the complete code example:
import org.apache.spark.sql.SparkSession

object RandomDataFrame {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RandomDataFrame")
      .master("local[*]")
      .getOrCreate()

    val numRows = 1000000 // number of rows in the DataFrame

    // id and name are derived from the range; RAND() supplies the random age.
    val randomDF = spark.range(numRows)
      .selectExpr(
        "CAST(id AS INT)",
        "CONCAT('Name', CAST(id AS STRING))",
        "CAST(RAND() * 100 AS INT)")
      .toDF("id", "name", "age")

    randomDF.show() // preview the first 20 rows
    spark.stop()
  }
}
With this, you can create a large Spark DataFrame filled with random content using Scala.
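For truly large frames, it usually pays to set the partition count up front and to write the result out rather than pulling rows to the driver. A sketch under those assumptions (the row count, partition count, and Parquet path are placeholders):

// The four-argument spark.range overload lets you choose the parallelism directly.
val bigDF = spark.range(0L, 100000000L, 1L, 200) // 100M rows across 200 partitions
  .selectExpr(
    "CAST(id AS INT)",
    "CONCAT('Name', CAST(id AS STRING))",
    "CAST(RAND() * 100 AS INT)")
  .toDF("id", "name", "age")

// Writing to Parquet keeps the data distributed; "/tmp/random_df" is a placeholder path.
bigDF.write.mode("overwrite").parquet("/tmp/random_df")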