javascript - Data structure for storing partial URLs where search speed is the priority
We have a large database of partial URLs (strings) such as:
"example1.com"
"example2.com/test.js"
"/foo.js"
Our software listens for HTTP requests and tries to find one of our database's partial URLs in the HTTP request's full URL.
So we are getting full URLs (e.g. http://www.example.com/blah.js?foo=bar) and trying to match one of our database's partial patterns against them.
Which would be the best data structure to store our partial URL database in if we care about search speed?
Right now, this is what we do:
- Iterate through the entire database of partial URLs (strings) and use indexOf (in JavaScript) to see if the full URL contains each partial string.
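For reference, the current approach could look roughly like the sketch below. The function name findPartialLinear and the sample data are made up for illustration; the real database is assumed to be a plain array of strings. Every request costs one indexOf call per database entry, which is why this gets slow as the database grows.

```javascript
// Hypothetical sample data; the real database would be much larger.
const partialUrls = ["example1.com", "example2.com/test.js", "/foo.js"];

// Return the first partial URL contained in fullUrl, or null if none match.
// This scans the whole array, so the cost grows with the database size.
function findPartialLinear(fullUrl, partials) {
  for (const partial of partials) {
    if (fullUrl.indexOf(partial) !== -1) {
      return partial;
    }
  }
  return null;
}

console.log(findPartialLinear("http://www.example2.com/test.js?foo=bar", partialUrls));
```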
Update:
This software is an extension for Firefox, written in JavaScript on Firefox's Add-on SDK.
Assuming that your partial strings are only domain names and/or page names, you could try to generate all possible combinations from the URL, starting from the end:
http://www.example.com/blah.js?foo=bar
- blah.js
- example.com/blah.js
- www.example.com/blah.js
Then hash all the combinations, store them in an array, and try to find any of them in another array that contains the hashes of all your partial strings from the database.
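A minimal sketch of that idea follows, with a few assumptions that are not in the answer itself: a JavaScript Set stands in for the array of hashes (so the engine does the hashing), the database entries are stored in the same normalized form as the generated candidates (no scheme, no leading slash), and ports and multi-segment paths are not handled specially. The names candidatesFromUrl and findPartialHashed are hypothetical. Host-only candidates are generated as well, since the partial strings may be bare domain names.

```javascript
// Build candidate strings for a full URL, starting from the end, e.g.
// "blah.js", "example.com", "example.com/blah.js", "www.example.com", ...
function candidatesFromUrl(fullUrl) {
  // Drop the scheme ("http://") and anything after "?" or "#".
  const stripped = fullUrl
    .replace(/^[a-z][a-z0-9+.-]*:\/\//i, "")
    .replace(/[?#].*$/, "");
  const slash = stripped.indexOf("/");
  const host = slash === -1 ? stripped : stripped.slice(0, slash);
  const path = slash === -1 ? "" : stripped.slice(slash + 1);

  const candidates = [];
  if (path !== "") candidates.push(path);

  const labels = host.split(".");
  // Start from the last two labels ("example.com") and grow leftwards.
  for (let i = Math.max(labels.length - 2, 0); i >= 0; i--) {
    const hostSuffix = labels.slice(i).join(".");
    candidates.push(hostSuffix);
    if (path !== "") candidates.push(hostSuffix + "/" + path);
  }
  return candidates;
}

// Return the first candidate found in the set of partial URLs, or null.
function findPartialHashed(fullUrl, partialSet) {
  for (const candidate of candidatesFromUrl(fullUrl)) {
    if (partialSet.has(candidate)) return candidate;
  }
  return null;
}

// Hypothetical database, already normalized.
const partialSet = new Set(["example1.com", "example2.com/test.js", "example.com/blah.js"]);
console.log(findPartialHashed("http://www.example.com/blah.js?foo=bar", partialSet));
```

With this layout the work per request depends on the number of candidates generated from the URL (a handful) rather than on the size of the database.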
Note:
In case you want to match any string in the URL, like ample in example.com, it becomes a little complicated in terms of storage, because the number of random combinations of strings in the URL is C(n, k), where n is the length of the URL and k is the length of the string to find. According to this question, the maximum reasonable length of a URL is 2000 characters. Assuming you want to match a random string, you'd have k vary from 1 to 2000, which would result in a large amount of hashes generated from your URL: the sum of C(n, k) for each k from 1 to 2000. Or more precisely, 2000! / (k! * (2000-k)!) different hashes for each k.
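To get a feel for the numbers, here is a small sketch that counts candidates under one explicit assumption: only contiguous substrings matter (a match like ample in example.com is contiguous). Under that assumption there are n - k + 1 substrings of each length k, which is no more than the binomial figure above, yet summed over every k it still comes to about two million candidates for a 2000-character URL. The function names are made up for illustration.

```javascript
// Number of contiguous substrings of length k in a string of length n.
function substringCount(n, k) {
  return k >= 1 && k <= n ? n - k + 1 : 0;
}

// Total over every k from 1 to n; this equals n * (n + 1) / 2.
function totalSubstringCount(n) {
  let total = 0;
  for (let k = 1; k <= n; k++) {
    total += substringCount(n, k);
  }
  return total;
}

// Enumerate the distinct contiguous substrings of a (short) string.
function distinctSubstrings(text) {
  const seen = new Set();
  for (let start = 0; start < text.length; start++) {
    for (let end = start + 1; end <= text.length; end++) {
      seen.add(text.slice(start, end));
    }
  }
  return seen;
}

console.log(totalSubstringCount(2000)); // 2001000
console.log(distinctSubstrings("example.com").has("ample")); // true
```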