I am using the joda.time.DateTime library to convert a string to a datetime field, but it throws an unsupported-operation exception. Here is the main class code:
//create new var with input data without header
var inputDataWithoutHeader: RDD[String] = dropHeader(inputFile)
var inputDF1 = inputDataWithoutHeader.map(_.split(",")).map{p =>
val dateYMD: DateTime = DateTimeFormat.forPattern(
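The core step here (split each CSV record and parse one field into a date) can be sketched in plain Python; the column layout and the `yyyy-MM-dd` pattern below are assumptions, since the original pattern string is cut off above:

```python
from datetime import datetime

def parse_record(line):
    """Split a CSV line and parse its first field as a date.
    Field positions and the date pattern are hypothetical here."""
    fields = line.split(",")
    date_ymd = datetime.strptime(fields[0], "%Y-%m-%d")
    return (date_ymd, fields[1:])

dt, rest = parse_record("2017-03-15,foo,bar")
```

If the parse itself works outside the map, the exception usually comes from something else in the closure, so isolating this step is a useful first test.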
I am trying to use Structured Streaming to read a JSON stream from an MQTT broker in Apache Spark (via Bahir), read some attributes of the incoming JSON, and output them to the console. My code looks like this:
val spark = SparkSession
.builder()
.appName("BahirStructuredStreaming")
.master("local[*]")
.getOrCreate()
import spark.implicits._
val topic = "temp"
val brokerUrl = "tcp://localhost:1883"
v
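The per-message parsing step (pull a few attributes out of each JSON payload) can be sketched on its own, independent of the streaming wiring; the field names `sensor` and `temp` are hypothetical, since the real schema is not shown in the question:

```python
import json

def extract_attrs(payload):
    """Parse one MQTT message payload and pull out two attributes.
    The 'sensor' and 'temp' field names are assumptions."""
    msg = json.loads(payload)
    return msg.get("sensor"), msg.get("temp")

sensor, temp = extract_attrs('{"sensor": "s1", "temp": 21.5}')
```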
I have a simple Spark job that splits a file into words and loads them into a table in Hive.
public static void wordCountJava7() {
// Define a configuration to use to interact with Spark
SparkConf conf = new SparkConf().setMaster("local[4]").setAppName("Work Count App");
SparkContext sc = new SparkContext(conf);
// Crea
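The word-count core of a job like this (the flatMap-split then reduceByKey-count steps, before anything is written to Hive) can be sketched in plain Python:

```python
from collections import Counter

def word_count(lines):
    """Split each line on whitespace and tally word frequencies,
    mirroring the flatMap/reduceByKey steps of a Spark word count."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)

result = word_count(["to be or not", "to be"])
```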
I am using Spark 2.1.0 and Scala 2.10.6.
When I try to do this:
val x = (avroRow1).join(flattened)
I get the error:
value join is not a member of org.apache.spark.rdd.RDD[org.apache.spark.sql.Row]
Why am I getting this message? I have the following import statements:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
impor
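The error happens because `join` is only defined on pair RDDs, i.e. `RDD[(K, V)]` (it comes from `PairRDDFunctions`, pulled in by `import org.apache.spark.SparkContext._`); a plain `RDD[Row]` has no join key, so each side must be keyed first. The keying-then-joining idea, sketched in plain Python:

```python
def key_by(rows, idx):
    """Turn a list of rows into a key -> row mapping, like rdd.keyBy."""
    return {row[idx]: row for row in rows}

def inner_join(left, right):
    """Join two keyed collections on their shared keys, the way
    PairRDDFunctions.join works on RDD[(K, V)]."""
    return {k: (left[k], right[k]) for k in left if k in right}

a = key_by([("x", 1), ("y", 2)], 0)
b = key_by([("x", 10), ("z", 30)], 0)
joined = inner_join(a, b)
```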
I have a SQL query to which I just added a new WHERE clause; once I add that clause it stops fetching anything, but if I copy the SQL query and run it outside the program, it works fine.
SELECT P.PROPS, P.DT_CHANGED, PI.KOLICH, S.TEGLO,S.CENA_PROD_ED1, PI.STOKA, P.DATETIME
FROM PRODAWA P
LEFT JOIN PRODAWA_ITEMS PI ON (P.DATETIME = PI.DATETIME)
LEFT JOIN STOKI_DEF S ON (PI.STOKA = S.STOKA)
W
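When a query works in a console but returns nothing from the program, one common cause is that the value the program builds into the new WHERE clause (often a date) is formatted differently from the one typed by hand. Binding the value as a parameter instead of concatenating it into the SQL string avoids that; a minimal sketch with sqlite3 and a hypothetical one-column schema:

```python
import sqlite3

# Hypothetical minimal table standing in for PRODAWA.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE PRODAWA (PROPS TEXT, DATETIME TEXT)")
con.execute("INSERT INTO PRODAWA VALUES ('a', '2017-01-01 10:00:00')")

# The filter value is bound as a parameter, not pasted into the string.
rows = con.execute(
    "SELECT PROPS FROM PRODAWA WHERE DATETIME >= ?",
    ("2017-01-01 00:00:00",),
).fetchall()
```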
I need to compute the difference in minutes between two columns of timestamp type. There are plenty of simple examples on the web, but none of them work properly with psycopg2 + SQLAlchemy. I have tried:
import sqlalchemy as sa
from datetime import datetime
# con is a standard pool of connections :class:Connection
con.execute(
sa.func.datediff(
sa.literal_column('minute'),
datetime.utcnow()
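One likely reason the examples fail is that PostgreSQL has no `datediff()` function; the usual Postgres idiom is `EXTRACT(EPOCH FROM (t2 - t1)) / 60`. The computation itself, sketched in plain Python:

```python
from datetime import datetime

def minutes_between(t1, t2):
    """Minute difference between two timestamps. The PostgreSQL
    equivalent is EXTRACT(EPOCH FROM (t2 - t1)) / 60."""
    return (t2 - t1).total_seconds() / 60.0

d = minutes_between(datetime(2020, 1, 1, 10, 0), datetime(2020, 1, 1, 10, 45))
```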
I created a DataFrame as follows:
val file = sc.textFile(FileName)
case class CreateDF(project:String, title:String, requests_num:Int, return_size:Int)
val df = file.map(line=>line.split(" ")).map(line=> CreateDF(line(0),line(1),line(2).toInt,line(3).toInt)).toDF()
+-------+--------------------+
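The line-to-typed-record step of the code above can be sketched in plain Python (the sample values below are made up, since the file contents are not shown):

```python
def to_record(line):
    """Split one space-delimited line into the typed fields of the
    CreateDF case class: project, title, requests_num, return_size."""
    p = line.split(" ")
    return (p[0], p[1], int(p[2]), int(p[3]))

rec = to_record("en Main_Page 42 1000")
```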
I would be surprised if this kind of problem cannot be solved with Spark:
iris_tbl <- copy_to(sc, aDataFrame)
# date_vector is a character vector of element
# in this format: YYYY-MM-DD (year, month, day)
for (d in date_vector) {
...
aDataFrame %>% mutate(newValue = gsub("-", "", d))
...
}
I get this error:
Error: org.apache.spark
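The string transformation inside the loop is simple on its own; `gsub("-", "", d)` just turns a `YYYY-MM-DD` string into `YYYYMMDD`, as this plain-Python equivalent shows:

```python
def strip_dashes(date_str):
    """Equivalent of gsub('-', '', d): 'YYYY-MM-DD' -> 'YYYYMMDD'."""
    return date_str.replace("-", "")

out = strip_dashes("2018-04-09")
```

If this is all the loop needs, the error is more likely in how `d` (an R-side loop variable) is being referenced inside the `mutate` that sparklyr translates to Spark SQL.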
While migrating my code from Spark 2.0 to 2.1, I stumbled on a problem related to saving a DataFrame.
Here is the code:
import org.apache.spark.sql.types._
import org.apache.spark.ml.linalg.VectorUDT
val df = spark.createDataFrame(Seq(Tuple1(1))).toDF("values")
val toSave = new org.apache.spark.ml.feature.VectorAssembler().setInputCols(Array("value
It looks like Spark SQL is case-sensitive for LIKE queries, right?
spark.sql("select distinct status, length(status) from table")
returns
Active|6
spark.sql("select distinct status from table where status like '%active%'")
returns no rows, while
spark.sql("select distinct status from table where status like '%Activ
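Spark SQL's LIKE is indeed case-sensitive; the usual workaround is to normalize case on both sides, e.g. `where lower(status) like '%active%'`. The effect of that normalization, sketched in plain Python:

```python
def like_icase(value, fragment):
    """Case-insensitive contains-match, the effect of
    lower(status) LIKE '%active%' in Spark SQL."""
    return fragment.lower() in value.lower()

hit = like_icase("Active", "active")
miss = like_icase("Closed", "active")
```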