Creating an R data.frame column based on the difference between two character columns
I have a data.frame, df, that has two columns: one is the title of a song and the other is the combined title and artist. I wish to create a separate artist field. The first 3 rows are shown here:
    title                    titleartist
    i'll never smile again   i'll never smile again tommy dorsey & orchestra / frank sinatra & pied pipers
    imagination              imagination glenn miller & orchestra / ray eberle
    breeze ,                 breeze , jimmy dorsey & orchestra / bob eberly
There are no issues on this set of data with the following code:
    library(stringr)
    library(dplyr)

    df %>%
      head(3) %>%
      mutate(artist = str_to_title(str_trim(str_replace(titleartist, title, "")))) %>%
      select(artist, title)

      artist                                                 title
    1 Tommy Dorsey & Orchestra / Frank Sinatra & Pied Pipers i'll never smile again
    2 Jimmy Dorsey & Orchestra / Bob Eberly                  breeze ,
    3 Glenn Miller & Orchestra / Ray Eberle                  imagination
But when I apply it to thousands of rows, I get this error:
    Error: Incorrectly nested parentheses in regexp pattern. (U_REGEX_MISMATCHED_PAREN)

    # or with just part of the mutation
    df$artist <- str_replace(df$titleartist, df$title, "")

    Error in stri_replace_first_regex(string, pattern, replacement, opts_regex = attr(pattern, :
      Incorrectly nested parentheses in regexp pattern. (U_REGEX_MISMATCHED_PAREN)
I have removed all the parentheses from the columns, and the code appears to work for a while before I get this error:

    Error: Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX)

Is there a special character that might be causing the issue, or might it be something else?

TIA
Your general problem is that str_replace is treating your title values (the pattern argument) as regular expressions, so there are a lot of potential errors due to special characters beyond parentheses. The stringi library, which stringr wraps and simplifies, allows more fine-grained control, including treating arguments as fixed strings instead of regexes. I don't have your original data, but this works when I throw some error-causing characters in:
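    library(dplyr)
    library(stringi)

    # Sample data with regex metacharacters deliberately added to the titles.
    # Note: data_frame() is deprecated in current tibble releases; tibble() is its replacement.
    df = data_frame(
      title = c("i'll never smile again (", "imagination.*", "the breeze , i(?>="),
      titleartist = c("i'll never smile again ( tommy dorsey & orchestra / frank sinatra & pied pipers",
                      "imagination.* glenn miller & orchestra / ray eberle",
                      "the breeze , i(?>= jimmy dorsey & orchestra / bob eberly"))

    # Remove the title as a fixed string (not a regex), then trim and title-case the remainder.
    df %>%
      mutate(artist = stri_trans_totitle(stri_trim(stri_replace_first_fixed(titleartist, title, "")))) %>%
      select(artist, title)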
Results:
    Source: local data frame [3 x 2]

      artist                                                 title
      (chr)                                                  (chr)
    1 Tommy Dorsey & Orchestra / Frank Sinatra & Pied Pipers i'll never smile again (
    2 Glenn Miller & Orchestra / Ray Eberle                  imagination.*
    3 Jimmy Dorsey & Orchestra / Bob Eberly                  the breeze , i(?>=
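If you would rather not add a dependency, the same fixed-string idea can be sketched in base R. This is only a minimal illustration, not the approach used above: the helper name strip_title and the two-row sample data are made up for the example, and it skips the title-casing step. Because sub() is not vectorised over its pattern, mapply() pairs each titleartist value with its own title.

```r
# Hypothetical sample data: titles contain regex metacharacters on purpose.
df <- data.frame(
  title = c("imagination.*", "i'll never smile again ("),
  titleartist = c(
    "imagination.* glenn miller & orchestra / ray eberle",
    "i'll never smile again ( tommy dorsey & orchestra / frank sinatra & pied pipers"
  ),
  stringsAsFactors = FALSE
)

# strip_title() is an illustrative helper, not part of any package.
# fixed = TRUE makes sub() match the title literally, so characters such as
# "(" or ".*" are never interpreted as regex syntax.
strip_title <- function(titleartist, title) {
  out <- mapply(
    function(x, p) sub(p, "", x, fixed = TRUE),
    titleartist, title,
    USE.NAMES = FALSE
  )
  trimws(out)  # drop the space left behind after the removed title
}

df$artist <- strip_title(df$titleartist, df$title)
print(df[, c("artist", "title")])
```

If you want to stay with stringr, wrapping the pattern in its fixed() modifier, as in str_replace(titleartist, fixed(title), ""), should have the same effect, since it tells stringr to match the pattern literally.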