We have a web application based on WebSpeed. It is basically structured like this:
Input Form -> Price Listing -> Selection -> Confirmation
There are of course a number of parameters passed between these pages, and for the price listing we have a URL that looks something like this (a bit simplified):
departure=AAA&destination=BBB&departureDate=CCC&arrivalDate=DDD
The problem is that a number of external sources poll the Price Listing a couple of times per minute, changing the input parameters each time, and then probably scrape the resulting page electronically. It's quite obvious what they are doing, since I log all accesses and the pattern is clear:
Date (YYYY-MM-DD);Time;IP;Departure;Destination;DepartureDate;ArrivalDate
2011-04-07;14:02:59;x.x.x.x;ARN;PVZ;2011-06-15;2011-07-06
2011-04-07;14:03:25;x.x.x.x;ARN;PVZ;2011-06-16;2011-07-07
2011-04-07;14:03:49;x.x.x.x;ARN;PVZ;2011-06-17;2011-07-08
2011-04-07;14:04:11;x.x.x.x;ARN;PVZ;2011-06-18;2011-07-09
2011-04-07;14:04:26;x.x.x.x;ARN;PVZ;2011-06-19;2011-07-10
2011-04-07;14:04:49;x.x.x.x;ARN;PVZ;2011-06-20;2011-07-11
2011-04-07;14:05:19;x.x.x.x;ARN;PVZ;2011-06-21;2011-07-12
2011-04-07;14:05:48;x.x.x.x;ARN;PVZ;2011-06-22;2011-07-13
2011-04-07;14:06:18;x.x.x.x;ARN;PVZ;2011-06-23;2011-07-14
2011-04-07;14:06:47;x.x.x.x;ARN;PVZ;2011-06-24;2011-07-15
2011-04-07;14:07:21;x.x.x.x;ARN;PVZ;2011-06-25;2011-07-16
2011-04-07;14:07:38;x.x.x.x;ARN;PVZ;2011-06-26;2011-07-17
2011-04-07;14:08:11;x.x.x.x;ARN;PVZ;2011-06-27;2011-07-18
2011-04-07;14:08:21;x.x.x.x;ARN;PVZ;2011-06-28;2011-07-19
2011-04-07;14:08:48;x.x.x.x;ARN;PVZ;2011-06-29;2011-07-20
2011-04-07;14:09:28;x.x.x.x;ARN;PVZ;2011-06-30;2011-07-21
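The pattern in a log like this can also be flagged automatically. The following is only a rough sketch in Python, assuming the semicolon-separated layout shown above; the function name `find_sequential_scanners` and the `min_run` threshold are made up for illustration and are not part of our application. It reports IPs whose consecutive searches on the same route move the departure date forward by exactly one day each time.

```python
import csv
from collections import defaultdict
from datetime import datetime, timedelta


def find_sequential_scanners(lines, min_run=5):
    """Return {ip: longest_run} for IPs that step the departure date
    forward one day at a time on the same route.

    `lines` is any iterable of log lines in the assumed format:
    Date;Time;IP;Departure;Destination;DepartureDate;ArrivalDate
    """
    last_seen = {}                   # (ip, dep, dest) -> last departure date
    current_run = defaultdict(int)   # (ip, dep, dest) -> current run length
    longest = defaultdict(int)       # ip -> longest run seen so far

    for row in csv.reader(lines, delimiter=";"):
        # Skip the header line and anything that is not a 7-field record.
        if len(row) != 7 or row[0].startswith("Date"):
            continue
        _, _, ip, dep, dest, dep_date, _ = row
        day = datetime.strptime(dep_date, "%Y-%m-%d").date()
        key = (ip, dep, dest)

        if key in last_seen and day - last_seen[key] == timedelta(days=1):
            current_run[key] += 1    # date advanced by exactly one day
        else:
            current_run[key] = 1     # first search, or the sequence broke
        last_seen[key] = day
        longest[ip] = max(longest[ip], current_run[key])

    # Only report IPs with a run at least `min_run` searches long.
    return {ip: n for ip, n in longest.items() if n >= min_run}
```

A person comparing a few dates could trip a low threshold, so `min_run` would need tuning against real traffic before anything is blocked on this basis.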
The issues we have with this are:
a) They take up resources in WebSpeed, forcing us to buy more licenses than we really need
b) They take up resources on our server, forcing us to invest in faster servers than we need
c) These calls also invoke an external web service call that actually costs money
Has anybody handled a problem like this? We are considering blocking IPs when they reach a specific number of searches, but then we have to keep track of a list of IPs that are free to search as much as they want, etc.
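To make the idea concrete, here is a minimal sketch in Python of per-IP limiting with an allowlist, using a sliding time window. The class name `SearchLimiter` and its parameters are hypothetical, and it keeps its counters in the memory of a single process. With several WebSpeed agents running, the counters would presumably have to live somewhere shared, such as a database table.

```python
import time
from collections import defaultdict, deque


class SearchLimiter:
    """Allow at most `max_searches` per IP within `window_seconds`.

    IPs in `allowlist` are never limited.
    """

    def __init__(self, max_searches, window_seconds, allowlist=()):
        self.max_searches = max_searches
        self.window_seconds = window_seconds
        self.allowlist = set(allowlist)
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed searches

    def allow(self, ip, now=None):
        """Return True if this search may proceed, False if it should be blocked."""
        if ip in self.allowlist:
            return True
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_searches:
            return False                 # over the limit: block, do not record
        hits.append(now)
        return True


# Example: 20 searches per 10 minutes, with one partner IP exempted.
limiter = SearchLimiter(max_searches=20, window_seconds=600,
                        allowlist={"192.0.2.10"})
```

The price listing page would call `limiter.allow(ip)` before doing the search and before making the paid web service call, and return an error page or a cached result when it returns False. The limit values and the exempted address are placeholders.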
Are your legitimate users people of flesh and blood?
In that case a very simple solution is to add a CAPTCHA on your form. Check out
http://en.wikipedia.org/wiki/CAPTCHA
-peter