In PySpark, you can filter values by the number of times they occur as follows:
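from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("ValueFilter").getOrCreate()
# Second column named "quantity" to avoid confusion with the "count" column added below
data = [("apple", 5), ("banana", 3), ("orange", 2), ("apple", 2), ("banana", 4)]
df = spark.createDataFrame(data, ["fruit", "quantity"])
# Number of rows in which each fruit value appears
count_df = df.groupBy("fruit").count()
# Keep values that appear more than once (no fruit appears more than twice, so "> 2" is empty)
filtered_df = count_df.filter(col("count") > 1)
filtered_df.show()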
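To check what the count-then-filter step computes on this sample data without starting Spark, the same logic can be sketched in plain Python. This is only an illustration; collections.Counter stands in for groupBy().count(), and the names occurrences and frequent are hypothetical. It also shows that no fruit occurs more than twice in the sample, so a threshold such as "> 2" would return an empty result.

```python
from collections import Counter

data = [("apple", 5), ("banana", 3), ("orange", 2), ("apple", 2), ("banana", 4)]

# Plain-Python stand-in for df.groupBy("fruit").count(): rows per fruit value
occurrences = Counter(fruit for fruit, _ in data)

# Stand-in for filtering on the count column: values occurring more than once
frequent = {fruit: n for fruit, n in occurrences.items() if n > 1}
print(frequent)  # {'apple': 2, 'banana': 2}
```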
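Explanation: groupBy("fruit").count() returns one row per distinct fruit value, with a new column named count that holds the number of rows in which that value appears. Filtering on that column, e.g. filter(col("count") > N), then keeps only the values that occur more than N times. The result contains the distinct values and their counts, not the original rows.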