machine learning - Multinomial Logistic Regression in Spark ML vs MLlib


Spark version 2.0.0 has a stated goal of bringing feature parity between the ml package and the now-deprecated mllib package.

Presently the ml package provides ElasticNet support, but only for binary regression. To obtain multinomial regression, do we apparently have to accept using the deprecated mllib?

The downsides of using mllib:

  • It is deprecated, so we will have "why are you using the old stuff?" questions to field.
  • Its classes do not use the ml workflow, so they do not integrate cleanly.
  • For the above reasons we will eventually have to rewrite.

Is there any approach available for achieving one-vs-all multinomial classification with the ml package?

This is an answer-in-progress. There is a OneVsRest classifier in spark.ml.
Apparently the approach is to provide a LogisticRegression classifier as the binary classifier. It will run the binary version across all of the classes and return the class with the highest score.
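
To make the one-vs-rest mechanism concrete, here is a minimal sketch in plain Python (standard library only). It is not Spark's implementation and does not use the Spark API. The function names `train_binary`, `train_one_vs_rest` and `predict`, the learning rate, the epoch count and the toy data are all made up for illustration. It trains one binary logistic regression per class and returns the class whose classifier gives the highest score.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary(X, y, lr=0.5, epochs=1000):
    """Fit a binary logistic regression with batch gradient descent.

    X is a list of feature lists, y a list of 0.0/1.0 targets.
    Returns (weights, bias). Hyperparameters are illustrative assumptions.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train_one_vs_rest(X, labels):
    """Train one binary model per class: that class (1.0) vs. all others (0.0)."""
    models = {}
    for c in sorted(set(labels)):
        y = [1.0 if label == c else 0.0 for label in labels]
        models[c] = train_binary(X, y)
    return models


def predict(models, x):
    """Return the class whose binary model gives the highest score for x."""
    def score(c):
        w, b = models[c]
        return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return max(models, key=score)


# Toy data (hypothetical): three well-separated clusters in two dimensions.
X = [
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.3],   # class "a"
    [2.0, 0.0], [2.2, 0.2], [1.9, 0.1],   # class "b"
    [0.0, 2.0], [0.1, 2.2], [0.3, 1.9],   # class "c"
]
labels = ["a"] * 3 + ["b"] * 3 + ["c"] * 3

models = train_one_vs_rest(X, labels)
print(predict(models, [2.1, 0.1]))
```

With k classes this trains k independent binary models, so any binary classifier that produces a comparable score can be plugged in, which is why the wrapper approach works with a binary-only logistic regression.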

Update in response to @zero323. Here is the info from Xiangrui Meng on the deprecation of mllib:

Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

Hi all,

More than a year ago, in Spark 1.2 we introduced the ML pipeline API built on top of Spark SQL’s DataFrames. Since then the new DataFrame-based API has been developed under the spark.ml package, while the old RDD-based API has been developed in parallel under the spark.mllib package. While it was easier to implement and experiment with new APIs under a new package, it became harder and harder to maintain as both packages grew bigger and bigger. And new users are confused by having two sets of APIs with overlapped functions.

We started to recommend the DataFrame-based API over the RDD-based API in Spark 1.5 for its versatility and flexibility, and we saw the development and the usage gradually shifting to the DataFrame-based API. Counting the lines of Scala code, from 1.5 to the current master we added ~10000 lines to the DataFrame-based API while ~700 to the RDD-based API. So, to gather more resources on the development of the DataFrame-based API and to help users migrate over sooner, I want to propose switching RDD-based MLlib APIs to maintenance mode in Spark 2.0. What does it mean exactly?

* We do not accept new features in the RDD-based spark.mllib package, unless they block implementing new features in the DataFrame-based spark.ml package.
* We still accept bug fixes in the RDD-based API.
* We will add more features to the DataFrame-based API in the 2.x series to reach feature parity with the RDD-based API.
* Once we reach feature parity (possibly in Spark 2.2), we will deprecate the RDD-based API.
* We will remove the RDD-based API from the main Spark repo in Spark 3.0.

Though the RDD-based API is in de facto maintenance mode, this announcement will make it clear and is hence important to both MLlib developers and users. So we’d appreciate your feedback!

(As a side note, people use “Spark ML” to refer to the DataFrame-based API or the entire MLlib component. This causes confusion. To be clear, “Spark ML” is not an official name and there are no plans to rename MLlib to “Spark ML” at this time.)

Best,
Xiangrui

Another update: there is a JIRA, and work is nearing completion as of May 2016, to support multiclass logistic regression in spark.ml.

