Java - Performance impact of RDD to JavaRDD conversion
I have the code below, and I want to work against JavaRDD instead of RDD, so I'm doing a conversion here. I want to know the performance impact of this transformation, especially when I'm dealing with GBs of data.
RDD<String> textFile = sc.textFile(filePath, 2);
JavaRDD<String> javaRDD = textFile.toJavaRDD();
Is this a wide or a narrow transformation? And what is the difference between JavaRDD and RDD?
There's no significant performance penalty. JavaRDD is a simple wrapper around RDD, just to make calls from Java code more convenient. It holds the original RDD as a member and calls that member's method on each method invocation, for example (from JavaRDD.scala):
def cache(): JavaRDD[T] = wrapRDD(rdd.cache())
wrapRDD boils down to new JavaRDD[T](rdd), so the only performance penalty is creating a thin Java object on every method invocation. That is entirely negligible, because it's not done per element in the RDD but once for the entire object.
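To make the "once per object, not per element" point concrete, here is a minimal, self-contained sketch of the same delegation pattern. It does not use Spark at all: ToyRdd, ToyJavaRdd, wrapRdd and Main are hypothetical stand-ins written for illustration, assuming that RDD.cache() returns the same RDD and that JavaRDD simply stores a reference to it. Wrapping and caching allocate one small object and leave the element counter at zero; only an action such as count() actually walks the data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Spark's Scala RDD (NOT the real class).
class ToyRdd<T> {
    private final List<T> data;
    int elementsVisited = 0;   // how many individual elements have been touched
    boolean cached = false;

    ToyRdd(List<T> data) { this.data = data; }

    // Like RDD.cache(): flips a flag and returns the same instance; no element is touched.
    ToyRdd<T> cache() { cached = true; return this; }

    // An action that really does visit every element.
    long count() {
        long n = 0;
        for (T ignored : data) { elementsVisited++; n++; }
        return n;
    }

    // Like RDD.toJavaRDD(): wraps this instance without copying any data.
    ToyJavaRdd<T> toJavaRdd() { return new ToyJavaRdd<>(this); }
}

// Hypothetical stand-in for JavaRDD: holds the original RDD and delegates to it.
class ToyJavaRdd<T> {
    final ToyRdd<T> rdd;

    ToyJavaRdd(ToyRdd<T> rdd) { this.rdd = rdd; }

    // Mirrors `def cache(): JavaRDD[T] = wrapRDD(rdd.cache())`.
    ToyJavaRdd<T> cache() { return wrapRdd(rdd.cache()); }

    long count() { return rdd.count(); }

    // Mirrors wrapRDD: one small allocation per call, independent of data size.
    static <T> ToyJavaRdd<T> wrapRdd(ToyRdd<T> rdd) { return new ToyJavaRdd<>(rdd); }
}

public class Main {
    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) data.add(i);
        ToyRdd<Integer> rdd = new ToyRdd<>(data);

        ToyJavaRdd<Integer> javaRdd = rdd.toJavaRdd().cache();
        // Wrapping and caching touched no elements and kept the same underlying RDD.
        System.out.println(rdd.elementsVisited);   // 0
        System.out.println(javaRdd.rdd == rdd);    // true
        // Only the action walks the data.
        System.out.println(javaRdd.count());       // 1000000
        System.out.println(rdd.elementsVisited);   // 1000000
    }
}
```

In the sketch, the wrapper's cost stays the same whether the list holds three elements or a million, which is the same reasoning the answer applies to the real JavaRDD.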