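This code creates 50k rows containing NumPy ndarrays, and producing the 1.5 GB file takes more than 8 minutes:
x = pd.DataFrame(columns
# as if it concatenates x with a new DataFrame each time
As mentioned in …, you can load only specific columns: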
x = pd.read_parquet
In this code from …:
parquetFile = sqlContext.read.parquet("people.parquet")
# Parquet files can also be registered as tables and then used in SQL statements.
sqlContext.sql("SELECT name FROM