python - What happens when a Spark DataFrame is converted to a pandas DataFrame using the toPandas() method



I have a Spark DataFrame that I can convert to a pandas DataFrame using the toPandas() method available in PySpark.

I have the following questions about this:

  1. Does this conversion defeat the purpose of using Spark itself (distributed computing)?
  2. The dataset is going to be huge, so will there be speed and memory issues?
  3. If someone can explain what happens with this one line of code, that would help.

Thanks

Yes. Once toPandas() is called on a Spark DataFrame, the data leaves the distributed system and a new pandas DataFrame is created in memory on the driver node of the cluster.

And if the Spark DataFrame is huge and does not fit in the driver's memory, the driver will run out of memory and crash.
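One common way to limit the risk described above is to shrink the data while it is still distributed and only then collect it. The following is a minimal sketch, not a definitive recipe: the helper name to_pandas_bounded and the max_rows cap are hypothetical, and it assumes sdf is a pyspark.sql.DataFrame (only its limit() and toPandas() methods are used).

```python
def to_pandas_bounded(sdf, max_rows=100_000):
    """Collect at most max_rows rows of a Spark DataFrame as a pandas DataFrame.

    Hypothetical helper for illustration: `sdf` is assumed to be a
    pyspark.sql.DataFrame, and `max_rows` is an arbitrary safety cap that
    should be chosen to fit comfortably in the driver's memory.
    """
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")

    # limit() is a lazy transformation: nothing is moved yet, and the
    # DataFrame is still distributed across the executors.
    bounded = sdf.limit(max_rows)

    # toPandas() is the action: the remaining rows are sent to the driver
    # and held there as a single in-memory pandas DataFrame.
    return bounded.toPandas()
```

With a real session this might be called as pdf = to_pandas_bounded(spark.range(10**9), max_rows=1000). Filtering or aggregating in Spark first serves the same purpose: the heavy work stays distributed and only a small result reaches the driver.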

