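I am trying to do some source-to-target testing in PySpark. The first part I want to do is run a count for each column, using a Lean Six Sigma approach to make sure the difference between the counts is less than 3/1,000,000. However, when I run this, the if statement throws:
TypeError: Invalid argument, not a string or column: -276244 of type <class 'int'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.
Can anyone help?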
import pyspark.sql.functions as f
from pyspark.sql.types import *

good_fields = []
bad_fields = {}
count_issues = {}
columns = list(spark.sql('show columns from tu_historical').toPandas()['col_name'])
for col in columns:
    print(col)
    df = spark.sql(f'select pid,fnum,{col} from historical_clean')
    df1 = spark.sql(f'select pid,fnum,{col} from historical1')
    # count issue testing: flag the column when the counts differ by more than 3 per million
    if abs(df1.count()-df.count()) > df1.count()*.000003:
        count_issues[col] = df1.count()-df.count()
    # join on the key columns selected above (pid, fnum) and keep rows whose values differ
    test_df = df.join(df1,(df.fnum == df1.fnum) & (df1.pid == df.pid),'left').filter(df1[col]!=df[col])

Answered on 2022-05-02 14:33:12
It seems your columns list contains a strange value.
You may want to use this to get the column names instead:
columns = spark.sql('select * from tu_historical limit 0').columns
https://stackoverflow.com/questions/72088067
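A further possibility, which the thread does not confirm: the message in the question is what pyspark.sql.functions.abs raises when it is handed a plain Python integer, so the built-in abs may have been shadowed earlier in the session, for example by a wildcard import such as `from pyspark.sql.functions import *`. The sketch below is a minimal pure-Python version of the count check under that assumption. The helper name `count_mismatch` and its parameters are hypothetical and do not come from the question.

```python
import builtins


def count_mismatch(source_count: int, target_count: int, tolerance: float = 3e-6) -> bool:
    """Return True when the two row counts differ by more than `tolerance` of source_count."""
    # builtins.abs is called explicitly so that a wildcard import from
    # pyspark.sql.functions cannot replace it with the Column-based abs.
    difference = builtins.abs(source_count - target_count)
    return difference > source_count * tolerance


# Hypothetical usage inside the loop: compute each count once and reuse it.
#     n_hist, n_clean = df1.count(), df.count()
#     if count_mismatch(n_hist, n_clean):
#         count_issues[col] = n_hist - n_clean
print(count_mismatch(1_000_000, 999_998))  # False: a difference of 2 is within 3 per million
print(count_mismatch(1_000_000, 999_990))  # True: a difference of 10 exceeds 3 per million
```

Storing the counts in variables also avoids triggering the same Spark count job several times per column, which the original loop does by calling count() inside both the condition and the assignment.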