looking for hostname geographic hint validation
On Wed, Aug 28, 2013 at 04:07:05PM +0100, Ben wrote:
> Dear Bradley,
>
> So basically you're asking others to do your homework for you ? ;-)
Actually, no. I'm asking people to do something I cannot do myself.
While it is true I could test against a manual inference, I would simply
be checking one inference against another. Agreement would only prove
that the algorithm does what I expect. Only the operators, who actually
know what they are doing, can give me the ground truth I need to test my
inferences against reality.
> For example, picking one example from your list ....
>
> <iata>([^a-z]+[a-z]+\d*){3}.ic.ac.uk
>
> Far from being IATA codes, the intermediate subdomains actually refer to
> departments (DepartmentOfComputing and CHemistry in the two I quoted).
>
> Sorry to rain on your parade, but someone had to say it. ;-)
You are most likely right, but I am not looking for perfection. I am
hoping for an inference that will get me within 10 km of the actual
city most of the time.
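As a rough illustration of what "within 10 km" could mean in a check, here is a minimal sketch that measures the great-circle (haversine) distance between an inferred and an actual location. The function names, the spherical-Earth model, and the coordinates in the usage line are assumptions for illustration; only the 10 km threshold comes from the message.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_threshold(inferred, actual, threshold_km=10.0):
    """Hypothetical helper: True if inferred (lat, lon) is within threshold_km of actual."""
    return haversine_km(*inferred, *actual) <= threshold_km

# Usage (illustrative coordinates): central London vs. Heathrow is roughly 23 km apart,
# so an inference of "London" for a Heathrow-located router would fail a 10 km test.
print(within_threshold((51.5074, -0.1278), (51.4700, -0.4543)))
```

Haversine is a common choice at city scale; a more precise ellipsoidal distance changes the result by well under the 10 km tolerance.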
With the validation data I have so far: of the 19,611 hostnames for which
a location is inferred and validation data exists, we infer the city
correctly 93% of the time.
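The accuracy figure above is computed only over hostnames that have both an inferred location and validation data. A minimal sketch of that calculation might look like the following; the `Record` layout, field names, and the case-insensitive city-name comparison are assumptions for illustration, not the author's actual pipeline (which might compare city identifiers or apply a distance threshold instead).

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

@dataclass
class Record:
    hostname: str
    inferred_city: Optional[str]   # city inferred from hostname hints; None if no hint matched
    validated_city: Optional[str]  # operator-supplied ground truth; None if unavailable

def city_accuracy(records: Iterable[Record]) -> Tuple[int, float]:
    """Return (number evaluated, fraction correct) over records that have
    both an inferred city and validation data; all others are excluded."""
    evaluated = [r for r in records
                 if r.inferred_city is not None and r.validated_city is not None]
    if not evaluated:
        return 0, 0.0
    # Hypothetical correctness rule: case-insensitive city-name equality.
    correct = sum(1 for r in evaluated
                  if r.inferred_city.casefold() == r.validated_city.casefold())
    return len(evaluated), correct / len(evaluated)
```

Excluding hostnames without validation data keeps the denominator honest: the reported percentage describes only the inferences that could actually be checked against ground truth.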
While there is work left to do, it is far from the lost cause you
present.
--
the value of a world model is not how accurately it captures reality
but how often it leads us to take appropriate action