python - What happens when a Spark DataFrame is converted to a pandas DataFrame using the toPandas() method
I have a Spark DataFrame that I can convert to a pandas DataFrame using the toPandas() method available in PySpark.
I have the following questions about this:
- Does this conversion defeat the purpose of using Spark itself (distributed computing)?
- The dataset is going to be huge. Will there be speed and memory issues?
- If someone can explain what happens behind this one line of code, that would help.
Thanks.
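Yes. Once toPandas() is called on a Spark DataFrame, the data leaves the distributed system: all rows are collected and a new pandas DataFrame is created in the memory of the cluster's driver node.
If the Spark DataFrame is huge and does not fit in the driver's memory, the driver will crash.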
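One way to act on this is to shrink the data inside Spark first (filter, aggregate, or sample) and only convert the small result. The following is a minimal sketch under that assumption, not a definitive recipe. The helper name `to_pandas_bounded`, the `max_rows` threshold, and the `demo` function are hypothetical and chosen for illustration; `limit()`, `count()`, and `toPandas()` are standard PySpark DataFrame methods.

```python
def to_pandas_bounded(sdf, max_rows=100_000):
    """Convert a Spark DataFrame to pandas only if it has at most max_rows rows.

    `sdf` is expected to be a pyspark.sql.DataFrame. The helper name and the
    default threshold are hypothetical; tune the limit to your driver memory.
    """
    # Counting a limited view lets Spark stop early instead of scanning
    # everything just to learn that the data is too large.
    n = sdf.limit(max_rows + 1).count()
    if n > max_rows:
        raise ValueError(
            f"DataFrame has more than {max_rows} rows; "
            "aggregate, filter, or sample it before calling toPandas()."
        )
    # From here on, all rows are pulled into the driver process.
    return sdf.toPandas()


def demo():
    """Illustrative usage; requires a local PySpark installation."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[2]")
        .appName("topandas-demo")
        .getOrCreate()
    )
    sdf = spark.range(1_000_000)  # distributed DataFrame with a single "id" column

    # Do the heavy work in Spark, where it stays distributed ...
    small = sdf.groupBy((sdf.id % 10).alias("bucket")).count()

    # ... and convert only the small aggregated result on the driver.
    pdf = to_pandas_bounded(small, max_rows=1_000)
    spark.stop()
    return pdf
```

Calling `demo()` converts only the aggregated rows, so the driver never holds the original million-row dataset. Passing the unaggregated `sdf` with the same `max_rows` would raise the `ValueError` instead of attempting the conversion.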