Download HTML source of 11 webpages and do simple operations on the text

Closed Posted Aug 28, 2010 Paid on delivery

I would like a simple application which does the following:

(1) A search engine, a search phrase and a webpage URL are specified as input. The application downloads the source of the top 10 result pages for that search engine and search phrase, and also downloads the source of the webpage specified as input.
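To illustrate step 1, here is a minimal Java sketch of building the search URL and downloading a page's source. Everything named here is an assumption made for illustration: the class `PageFetcher`, the `{query}` placeholder, the `search.example` host used in the usage note, the 10-second timeouts and the UTF-8 decoding. Extracting the top 10 result URLs from the results page depends on the search engine and is left out. It needs Java 10 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class PageFetcher {
    // Fills the search phrase into an engine-specific URL template.
    // The "{query}" placeholder is a convention invented for this sketch.
    static String buildSearchUrl(String engineTemplate, String phrase) {
        String encoded = URLEncoder.encode(phrase, StandardCharsets.UTF_8);
        return engineTemplate.replace("{query}", encoded);
    }

    // Downloads the raw source of one page.
    // Assumes the page is UTF-8; a fuller version would read the declared charset.
    static String fetchSource(String pageUrl) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(pageUrl).toURL().openConnection();
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

For example, `PageFetcher.buildSearchUrl("https://search.example/find?q={query}", "red shoes")` gives a URL ending in `q=red+shoes`, which can then be passed to `fetchSource`.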

(2) Step 1 yields the source code of 11 pages (10 from the search engine + 1 specified as input). Now do some simple text-based comparison operations on the source of all 11 pages, e.g. check for occurrences of the search phrase in the h2 headings of all 11 pages, and so on. (I will provide a list of the operations to be performed, with a working example, so that you can easily understand exactly which operations have to be done.)
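The h2 check mentioned in step 2 could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, matching is case-insensitive, and a regular expression stands in for a real HTML parser, so badly malformed markup may be missed. The actual list of operations is not part of this posting.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HeadingCheck {
    // Matches <h2 ...>...</h2>, ignoring case and allowing line breaks inside.
    private static final Pattern H2 = Pattern.compile(
        "<h2\\b[^>]*>(.*?)</h2\\s*>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns how many h2 headings in the page source contain the phrase.
    static int countH2WithPhrase(String html, String phrase) {
        String needle = phrase.toLowerCase(Locale.ROOT);
        int count = 0;
        Matcher m = H2.matcher(html);
        while (m.find()) {
            // Replace nested tags such as <b> or <a> with a space before comparing.
            String text = m.group(1)
                .replaceAll("<[^>]+>", " ")
                .toLowerCase(Locale.ROOT);
            if (text.contains(needle)) {
                count++;
            }
        }
        return count;
    }
}
```

Running the same method over the source of each of the 11 pages gives one number per page, which is the kind of per-page result the report in step 3 would list.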

(3) A report is created detailing the results of the various operations of step 2. It is compressed into a zip file and stored on the server.
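The compression part of step 3 might look like the Java sketch below, which uses the standard `java.util.zip` classes. The class name `ReportZipper`, the single-entry layout and the plain-text report format are assumptions for illustration. The archive is built in memory and returned as bytes, so where it is stored on the server is left open.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class ReportZipper {
    // Packs one text report into a zip archive held in memory.
    static byte[] zipReport(String entryName, String reportText) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(reportText.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

The returned bytes can be written to a file or to whatever storage the chosen hosting environment provides.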

(4) There are 2 simple interfaces: one for the admin and one for the end user. The link to create such a report is shown only to the admin, and once the report has been created, a link to it is shown to both the admin and the end user.

(5) Some other simple operations also have to be done, e.g. finding the age of the 11 domains and finding the number of backlinks to each domain. I have a PHP script which does all these things; you can use that script for this application, or you can code it on your own.
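For the domain age in step 5, one possible Java sketch is shown below. It is not a port of the PHP script mentioned above, which is not part of this posting. It assumes the registration date of each domain has already been obtained by some other means (for example a WHOIS lookup, not shown), and the names `DomainInfo`, `hostOf` and `ageInYears` are hypothetical. The host name is used as a stand-in for the domain, so subdomains other than `www.` are not stripped. Counting backlinks needs an external data source and is not covered.

```java
import java.net.URI;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

final class DomainInfo {
    // Extracts the lower-cased host from a page URL and drops a leading "www.".
    static String hostOf(String pageUrl) {
        String host = URI.create(pageUrl).getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in URL: " + pageUrl);
        }
        host = host.toLowerCase(Locale.ROOT);
        return host.startsWith("www.") ? host.substring(4) : host;
    }

    // Whole years between the registration date and the given day.
    static long ageInYears(LocalDate registeredOn, LocalDate today) {
        return ChronoUnit.YEARS.between(registeredOn, today);
    }
}
```

Passing `today` in explicitly, rather than calling `LocalDate.now()` inside the method, keeps the calculation repeatable when a report is regenerated or tested.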

I would prefer this to be done in Java, especially using Google Apps Cloud, but you can also do it as a regular Java application, or even as a PHP application.

Regards,

Arvind.

Java PHP Software Architecture

Project ID: #772020

About the project

4 proposals Remote project Active Oct 27, 2010