C# - Find strings within a certain Hamming distance using LINQ
If I run the following LINQ query (thanks to @octavioccl for the help):
var result = stringsList
    .GroupBy(s => s)
    .Where(g => g.Count() > 1)
    .OrderByDescending(g => g.Count())
    .Select(g => g.Key);
it gives me the strings that occur in the list at least twice (but only exact matches, i.e. Hamming distance = 0).
I was wondering if there is an elegant solution (all the solutions I have tried so far either use loops and a counter, which is ugly, or regex) where I can specify the Hamming distance in the Where clause, so that I get the strings that lie within a specified Hamming distance range?
P.S.: All the strings are of equal length.
Update
Many thanks to krontogiannis for the detailed answer. As I mentioned earlier, I want a list of the strings whose Hamming distance is below a given threshold. The code is working fine for that (thanks again).
The only thing remaining is to take the strings out of the 'resultset' and insert/add them into a `List`.
Basically, this is what I want:
List<string> outputList = new List<string>();
foreach (string str in patternsList)
{
    var rs = wordsList
        .GroupBy(w => hamming(w, str))
        .Where(h => h.Key <= hammingThreshold)
        .OrderByDescending(h => h.Key)
        .Select(h => h.Count());
    outputList.Add(rs); // I know this won't work, but it shows what is needed
}
Thanks
Calculating the Hamming distance between two strings using LINQ can be done in an elegant way:
Func<string, string, int> hamming = (s1, s2) => s1.Zip(s2, (l, r) => l - r == 0 ? 0 : 1).Sum();
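One thing to keep in mind with the lambda above: `Zip` stops at the end of the shorter sequence, so strings of different lengths would be compared silently on their common prefix only. The question states that all strings have equal length, so this is not a problem here, but a more defensive variant could look like the sketch below. The name `HammingDistance` is just an illustrative choice, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Linq;

// Quick check with two words from the sample data.
Console.WriteLine(HammingDistance("hello", "rellp")); // prints 2

// Hypothetical helper: same idea as the lambda above, plus a length check.
static int HammingDistance(string s1, string s2)
{
    if (s1.Length != s2.Length)
        throw new ArgumentException("Strings must have equal length.");

    // Pair up the characters position by position and count the mismatches.
    return s1.Zip(s2, (l, r) => l != r).Count(differs => differs);
}
```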
Your question is a bit vague about the 'grouping'. As you can see, to calculate the Hamming distance you need two strings. So you either need to calculate the Hamming distance of all the words in your string list against an input, or calculate the distance between all the words in your list (or something different that you need to tell us :-) ).
In any case, I'll give two examples for this input:
var words = new[] { "hello", "rellp", "holla", "fooba", "hempd" };
Case 1
var input = "hello";
var hammingThreshold = 3;
var rs = words
    .GroupBy(w => hamming(w, input))
    .Where(h => h.Key <= hammingThreshold)
    .OrderByDescending(h => h.Key);
The output would be something like:
hempd with distance 3
rellp, holla with distance 2
hello with distance 0
Case 2
var hs = words
    .SelectMany((w1, i) => words
        .Where((w2, j) => i > j)
        .Select(w2 => new { Word1 = w1, Word2 = w2 })) // all word pairs except self
    .GroupBy(pair => hamming(pair.Word1, pair.Word2))
    .Where(g => g.Key <= hammingThreshold)
    .OrderByDescending(g => g.Key);
The output would be something like:
(holla, rellp), (fooba, holla), (hempd, hello) with distance 3
(rellp, hello), (holla, hello) with distance 2
Edit: To get only the words from the first grouping, you can use SelectMany:
var output = rs.SelectMany(g => g).ToList();
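Putting this together with the loop from the question's update, a minimal end-to-end sketch might look like the following. It reuses the question's variable names (`wordsList`, `patternsList`, `outputList`, `hammingThreshold`); the sample contents of `patternsList` are an assumption made only for illustration, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

Func<string, string, int> hamming =
    (s1, s2) => s1.Zip(s2, (l, r) => l == r ? 0 : 1).Sum();

var wordsList = new List<string> { "hello", "rellp", "holla", "fooba", "hempd" };
var patternsList = new List<string> { "hello" }; // assumed sample pattern
int hammingThreshold = 3;

var outputList = new List<string>();
foreach (string str in patternsList)
{
    // Group by distance to the pattern, keep groups within the threshold,
    // then flatten the groups back into plain words.
    var matches = wordsList
        .GroupBy(w => hamming(w, str))
        .Where(g => g.Key <= hammingThreshold)
        .OrderByDescending(g => g.Key)
        .SelectMany(g => g);

    outputList.AddRange(matches);
}

Console.WriteLine(string.Join(", ", outputList)); // hempd, rellp, holla, hello
```

With more than one pattern, the same word can be added several times; if that is not wanted, calling `Distinct()` on the final list is one option.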