python - Classification of sparse data
I am struggling to find the best approach to a classification/prediction problem. Let me explain the task: I have a database of keywords from the abstracts of different research papers, and I also have a list of journals with their impact factors. I want to build a model that classifies articles based on their keywords, where the result is a likely impact factor (taken simply as a number, without any further journal description) for a given set of keywords. I removed the unique keyword tags, since they have no statistical significance, so I am left with the keywords that are repeated two or more times in the abstract list (6000 keywords in total). I am thinking of dummy coding: for each article I would create a binary feature vector 6000 attributes long, where each attribute indicates the presence of a keyword in the abstract, and then classify the whole set with an SVM. I am pretty sure this solution is not elegant and perhaps not correct either. Do you have any suggestions for a better way to deal with this?
There is nothing wrong with using this coding strategy for text together with support vector machines.
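As a rough illustration of the binary keyword coding described in the question, here is a minimal sketch in plain Python. The toy `articles` list and the helper names `encode_sparse` and `to_dense` are made up for this example. With 6000 keywords and only a handful present per article, storing just the indices of the keywords that occur is much cheaper than storing full 0/1 vectors.

```python
from collections import Counter

# Hypothetical toy data: each article is represented by its keyword list.
articles = [
    ["svm", "kernel", "sparse"],
    ["sparse", "regression"],
    ["kernel", "regression", "bayes"],
]

# Keep only keywords that occur in at least two articles,
# mirroring the filtering step described in the question.
doc_freq = Counter(kw for kws in articles for kw in set(kws))
vocab = sorted(kw for kw, n in doc_freq.items() if n >= 2)
index = {kw: i for i, kw in enumerate(vocab)}


def encode_sparse(keywords):
    """Return the sorted column indices of the retained keywords present."""
    return sorted({index[kw] for kw in keywords if kw in index})


def to_dense(indices, size):
    """Expand a list of column indices into a full 0/1 feature vector."""
    present = set(indices)
    return [1 if i in present else 0 for i in range(size)]


sparse_rows = [encode_sparse(kws) for kws in articles]
dense_rows = [to_dense(row, len(vocab)) for row in sparse_rows]
```

Each entry of `sparse_rows` is the compact form of the corresponding row in `dense_rows`. Sparse matrix formats in numerical libraries follow the same idea.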
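For your actual objective:
- Support vector regression (SVR) may be more appropriate, since the impact factor you want to predict is a number rather than a class label.
- Beware of the journal impact factor. It is crude. You need to take temporal aspects into account, and a lot of work is not published in journals at all.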
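The SVR suggestion could look roughly like the following sketch, assuming scikit-learn is available. The keyword lists and impact factor values are invented toy data, and the linear kernel and the `C` and `epsilon` values are arbitrary starting points, not tuned recommendations.

```python
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import SVR

# Hypothetical toy data: keyword lists and the impact factor of the
# journal in which each article appeared.
keyword_lists = [
    ["svm", "kernel", "sparse"],
    ["sparse", "regression"],
    ["kernel", "regression"],
    ["svm", "regression", "sparse"],
]
impact_factors = [2.1, 3.4, 1.8, 2.9]

# One binary column per keyword; sparse_output=True yields a SciPy
# sparse matrix, so the zeros are never stored explicitly.
binarizer = MultiLabelBinarizer(sparse_output=True)
X = binarizer.fit_transform(keyword_lists)

# Support vector regression predicts a continuous target.
model = SVR(kernel="linear", C=1.0, epsilon=0.1)
model.fit(X, impact_factors)

# Predict for a new article that uses only known keywords.
new_article = [["sparse", "kernel"]]
predicted = model.predict(binarizer.transform(new_article))
```

With a real data set you would hold out part of the articles to check how well the predictions generalize. A linear kernel is often a reasonable first choice for high-dimensional sparse text features.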