string - Creating an R data.frame column based on the difference between two character columns

- April 15, 2010

I have a data.frame, df, with two columns: one with the title of the song and the other with the combined title and artist. I wish to create a separate artist field. The first three rows are shown here:

    title                   titleartist
    i'll never smile again  i'll never smile again tommy dorsey & orchestra / frank sinatra & pied pipers
    imagination             imagination glenn miller & orchestra / ray eberle
    breeze ,                breeze , jimmy dorsey & orchestra / bob eberly

There are no issues with this set of data when using this code:

    library(stringr)
    library(dplyr)

    df %>%
      head(3) %>%
      mutate(artist = str_to_title(str_trim(str_replace(titleartist, title, "")))) %>%
      select(artist, title)

                                                      artist                  title
    1 tommy dorsey & orchestra / frank sinatra & pied pipers i'll never smile again
    2                  jimmy dorsey & orchestra / bob eberly               breeze ,
    3                  glenn miller & orchestra / ray eberle            imagination

But when I apply it to thousands of rows, I get an error:

    Error: Incorrectly nested parentheses in regexp pattern. (U_REGEX_MISMATCHED_PAREN)

Or, running just the mutation part:

    df$artist <- str_replace(df$titleartist, df$title, "")
    Error in stri_replace_first_regex(string, pattern, replacement, opts_regex = attr(pattern,  :
      Incorrectly nested parentheses in regexp pattern. (U_REGEX_MISMATCHED_PAREN)

I have removed the parentheses from the columns, and the code now appears to work for a while before giving this error:

    Error: Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX)

Is a special character causing the issue, or could it be something else?

tia

Your general problem is that str_replace treats the title values (the pattern argument) as regular expressions, so there are a lot of potential errors from special characters beyond parentheses, such as `.`, `*`, `?` or `[`. The stringi library, which stringr wraps and simplifies, allows more fine-grained control, including treating arguments as fixed strings instead of regexes. I don't have your original data, but this works when I throw some error-causing characters in:

    library(dplyr)
    library(stringi)

    df = tibble(title = c("i'll never smile again (", "imagination.*", "the breeze , i(?>="),
                titleartist = c("i'll never smile again ( tommy dorsey & orchestra / frank sinatra & pied pipers",
                                "imagination.* glenn miller & orchestra / ray eberle",
                                "the breeze , i(?>= jimmy dorsey & orchestra / bob eberly"))

    # stri_replace_first_fixed() matches each title literally, so regex
    # metacharacters in the titles cannot break the pattern
    df %>%
      mutate(artist = stri_trans_totitle(stri_trim(stri_replace_first_fixed(titleartist, title, "")))) %>%
      select(artist, title)

Results:

    Source: local data frame [3 x 2]

                                                      artist                    title
                                                       (chr)                    (chr)
    1 tommy dorsey & orchestra / frank sinatra & pied pipers i'll never smile again (
    2                  glenn miller & orchestra / ray eberle            imagination.*
    3                  jimmy dorsey & orchestra / bob eberly       the breeze , i(?>=
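If you would rather stay with stringr, a minimal sketch of the same fix wraps the pattern in stringr's fixed() modifier, which makes str_replace() match each title literally rather than as a regex. The sample rows reuse the test data above. The artist2 column shows a second option that skips pattern matching altogether by dropping the first nchar(title) characters; it assumes (not verified against your data) that every title is an exact prefix of its titleartist value.

```r
library(dplyr)
library(stringr)

# Sample rows with regex metacharacters in the titles
df <- tibble(
  title = c("i'll never smile again (", "imagination.*", "the breeze , i(?>="),
  titleartist = c(
    "i'll never smile again ( tommy dorsey & orchestra / frank sinatra & pied pipers",
    "imagination.* glenn miller & orchestra / ray eberle",
    "the breeze , i(?>= jimmy dorsey & orchestra / bob eberly"
  )
)

df <- df %>%
  # fixed() makes str_replace() treat each title as a literal string,
  # so "(", ".*" and "(?>=" are no longer parsed as regex syntax
  mutate(artist = str_to_title(str_trim(str_replace(titleartist, fixed(title), "")))) %>%
  # Assumption: the title is always an exact prefix of titleartist,
  # so the artist is simply everything after the first nchar(title) characters
  mutate(artist2 = str_to_title(str_trim(substring(titleartist, nchar(title) + 1))))

select(df, artist, title)
```

The two columns agree on this sample. The difference shows up when a title is not found verbatim inside titleartist: str_replace() then leaves that row unchanged, whereas the prefix approach silently cuts off the wrong characters.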
