c# - Find byte sequence within a byte array -
i have byte array , wish find "occurrences" of bytes.
for example 00 69 73 6f 6d
in large byte array (> 50/100 megabytes)
or
even better reverse operation: searching common pattern without knowing code should able read , find file.
you can use boyer-moore algorithm efficiently search sequence of bytes in array of bytes.
here's c# version converted java version the wikipedia entry on boyer-moore.
public sealed class boyermoore { readonly byte[] needle; readonly int[] chartable; readonly int[] offsettable; public boyermoore(byte[] needle) { this.needle = needle; this.chartable = makebytetable(needle); this.offsettable = makeoffsettable(needle); } public ienumerable<int> search(byte[] haystack) { if (needle.length == 0) yield break; (int = needle.length - 1; < haystack.length;) { int j; (j = needle.length - 1; needle[j] == haystack[i]; --i, --j) { if (j != 0) continue; yield return i; += needle.length - 1; break; } += math.max(offsettable[needle.length - 1 - j], chartable[haystack[i]]); } } static int[] makebytetable(byte[] needle) { const int alphabet_size = 256; int[] table = new int[alphabet_size]; (int = 0; < table.length; ++i) table[i] = needle.length; (int = 0; < needle.length - 1; ++i) table[needle[i]] = needle.length - 1 - i; return table; } static int[] makeoffsettable(byte[] needle) { int[] table = new int[needle.length]; int lastprefixposition = needle.length; (int = needle.length - 1; >= 0; --i) { if (isprefix(needle, + 1)) lastprefixposition = + 1; table[needle.length - 1 - i] = lastprefixposition - + needle.length - 1; } (int = 0; < needle.length - 1; ++i) { int slen = suffixlength(needle, i); table[slen] = needle.length - 1 - + slen; } return table; } static bool isprefix(byte[] needle, int p) { (int = p, j = 0; < needle.length; ++i, ++j) if (needle[i] != needle[j]) return false; return true; } static int suffixlength(byte[] needle, int p) { int len = 0; (int = p, j = needle.length - 1; >= 0 && needle[i] == needle[j]; --i, --j) ++len; return len; } }
here's console app test code it:
public static void main() { byte[] haystack = new byte[10000]; byte[] needle = { 0x00, 0x69, 0x73, 0x6f, 0x6d }; // put few copies of needle haystack. (int = 1000; <= 9000; += 1000) array.copy(needle, 0, haystack, i, needle.length); var searcher = new boyermoore(needle); foreach (int index in searcher.search(haystack)) console.writeline(index); }
note how search()
method returns indices of locations of start of needle
inside haystack
.
if wanted count, do:
int count = new boyermoore(needle).search(haystack).count();
for second question: assume asking finding longest repeated sequence of bytes?
that's more complicated - , different - question. if want answer that, should ask separate question it, should read the wikipedia entry on "longest repeated substring problem".
Comments
Post a Comment