I am trying to find out what data type each column in a Spark DataFrame is, and then operate on the column based on that type.
Here is what I have so far:
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('MyApp').getOrCreate()
df = spark.read.csv('Path To csv File', inferSchema=True, header=True)

# df.columns yields only the column names (strings), so type(x) is always str;
# df.dtypes yields (name, type_string) pairs, which is what we need here.
# Note Spark reports IntegerType columns as 'int', not 'integer'.
for name, dtype in df.dtypes:
    if dtype == 'int':
        ...  # operate on the integer column here
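The matching logic can be shown without a running Spark session: a minimal sketch below simulates `df.dtypes` with a hypothetical list of (name, type_string) pairs, since that is exactly the shape Spark returns.

```python
# Hypothetical stand-in for df.dtypes, which returns (column_name, type_string)
# pairs such as [('id', 'int'), ('name', 'string')].
dtypes = [("id", "int"), ("name", "string"), ("score", "double")]

# Pick out the integer columns by their type string ('int', not 'integer').
int_cols = [name for name, dtype in dtypes if dtype == "int"]
print(int_cols)  # → ['id']
```

With the real DataFrame you would write `for name, dtype in df.dtypes:` and branch on `dtype` the same way.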
Below is the code I use to write the data into Hive:
from pyspark.sql import SparkSession
from pyspark.sql.functions import isnan
from pyspark.sql.types import *

# Note: `since`, `_functions` and `pyspark.HiveContext` are internal or
# invalid imports; SQLContext and HiveContext are superseded by SparkSession.
spark = SparkSession.builder.appName("
After trying to export the Spark DataFrame to a csv file with df.write.csv, I get the following error message:
~\AppData\Local\Programs\Python\Python39\lib\site-packages\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324     value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325     if answer[1]