Assimilator is a handy tool for automatically searching for and downloading files from the web. The approach is akin to drift-net fishing for files, and is based on a loose "six degrees of separation" idea: if you know a site that contains information related to what you're looking for, chances are it will link to another related site that may have more of the information or files you are after. This documentation is still under construction!
Future areas of development
Bugs and Issues
Completely rewritten with support for Mac. The crawler is now HTTP/1.1 compliant and supports transparent redirection and resuming of downloads. HTML parsing is greatly improved, but support for parsing JavaScript for links has yet to be added. The new version searches images, hyperlinks and frames for links and can search for any sort of file specified by the user. Recently added features include using a Google search to generate the initial URL, HTML formatting of exported file lists, transparent redirection and resume capability (depending on server support).
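Resuming depends on the server because it relies on the HTTP/1.1 Range header: the client asks for the bytes it is missing, and a 206 Partial Content reply means the server honoured the request, while a plain 200 means the download must restart from scratch. The sketch below (not AWC's actual code; the file name is made up) shows how the resume request headers would be built:

```python
# Sketch of HTTP/1.1 download resuming (illustrative, not AWC's code).
# If part of the file is already on disk, ask the server for the rest
# with a Range header; an empty dict means start from the beginning.
import os

def resume_headers(local_path):
    """Build request headers that resume from the bytes already on disk."""
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    headers = {}
    if offset > 0:
        headers["Range"] = f"bytes={offset}-"  # request the remaining bytes
    return headers

print(resume_headers("no-such-file.part"))  # {} -> nothing saved yet, start at byte 0
```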
In order to get the best out of AWC it may help to understand the way in which it works. This section aims to give you an idea of the theory behind AWC and insight into getting better results from the program. AWC extracts all the links from a web page and then parses them for files. Any files matching the type and name specifications are queued for downloading; any links that aren't files are queued for parsing, and the process then repeats with the next page in the queue.
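The loop described above can be sketched as a simple breadth-first crawl. This is only an illustration of the idea, not AWC's actual code; the file extensions, URLs and the canned `fetch` function are all invented for the example:

```python
# Minimal sketch of the AWC crawl loop (illustrative, not the real code).
from collections import deque
from html.parser import HTMLParser

FILE_TYPES = (".zip", ".mp3", ".jpg")  # stand-in for the user's type specifications

class LinkExtractor(HTMLParser):
    """Collects href/src attributes from anchors, images and frames."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)

def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl: matching files go to the download queue,
    other links go back onto the search queue for later parsing."""
    search_queue = deque([start_url])
    file_queue, seen = [], set()
    while search_queue and len(seen) < max_pages:
        url = search_queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        parser = LinkExtractor()
        parser.feed(fetch(url))              # fetch() returns the page HTML
        for link in parser.links:
            if link.lower().endswith(FILE_TYPES):
                file_queue.append(link)      # matches a file type: download it
            else:
                search_queue.append(link)    # otherwise: queue it for parsing
    return file_queue

# Example run with canned pages instead of real HTTP requests:
pages = {
    "http://example.com/": '<a href="song.mp3"></a><a href="http://example.com/more"></a>',
    "http://example.com/more": '<img src="pic.jpg">',
}
print(crawl("http://example.com/", lambda u: pages.get(u, "")))
# -> ['song.mp3', 'pic.jpg']
```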
The quick analysis tool is used to gain a quick insight into the links contained on a certain page (or, rather, those detected by AWC). To check a URL, simply type it into the text box and hit return or click the "Get" button. You can abort an HTTP request at any stage by clicking the "Stop" button. Once the status text registers that the page has been successfully received, you can view any aspect of the page by clicking the appropriate button:
A new search is started by selecting the appropriate option from the File menu. This displays a dialog box for defining the search criteria. The first entry is the initial URL: the location of the web page on which the search will start. This entry is mandatory. To the right of it is a check box labeled "Google"; when this is enabled, the first entry instead becomes the keywords for a search on www.google.com, the results of which form the initial URL for the search.
In each of the window's fields, multiple keywords or types are separated by a single space. If a keyword is preceded by a minus sign, that keyword must not be found in the search field, e.g. "findThis -notThis". To the right of some of the fields is an option labeled "All"; when this is enabled, all keywords in the corresponding field must be found (or not found, in the case of a negative keyword) in order for a match to be made.
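The matching rules above can be sketched as follows. This is an illustration of the behaviour as documented, not AWC's actual code, and the function name is invented:

```python
# Sketch of AWC's keyword matching (illustrative, not the real code).
# A keyword prefixed with "-" must NOT appear in the text. With the "All"
# option enabled every keyword must satisfy its condition; otherwise a
# single satisfied keyword is enough for a match.
def matches(text, keywords, match_all=False):
    text = text.lower()
    results = []
    for word in keywords.split():           # keywords are space-separated
        if word.startswith("-"):
            results.append(word[1:].lower() not in text)  # negative keyword
        else:
            results.append(word.lower() in text)          # positive keyword
    return all(results) if match_all else any(results)

print(matches("findThis page", "findThis -notThis", match_all=True))  # True
print(matches("notThis page", "findThis -notThis", match_all=True))   # False
```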
Once the criteria have been appropriately specified, click Go to begin the search.
Once a search has begun, the search window will appear. This window is used to display the progress of the current search, as well as to manipulate the pages searched and the order in which this is done. At the top of the window are the main operation buttons: Pause/Continue, Skip, Refine, Exit and More/Less. In the bottom half of the window are the various information panels: Search Queue, History, File Queue and Info. Each of these panels displays different information concerning the search currently underway.
Searches need to be "managed" because a crawl can easily wander away from the topic you are interested in and fill the queue with irrelevant pages. When this happens you can remove pages by hand using the delete button in the search panel, or refine the initial search criteria. Currently, Refine will only start a new search, but it is hoped that it will soon be possible to refine searches in progress and reapply criteria to existing queues.
If a URL in the search queue is strongly suspected to contain relevant links, it can be promoted to the top of the queue to be searched next. The URL at the top of the search queue is requested and parsed for links; any links that meet the page criteria are added to the bottom of the search queue, and any that meet the file criteria are added to the file queue for downloading.
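Promotion itself is simple to picture: the URL is moved from wherever it sits in the queue to the front. A minimal sketch (illustrative only; the function name and URLs are invented):

```python
# Sketch of promoting a URL to the top of the search queue (illustrative).
from collections import deque

def promote(queue, url):
    """Move url to the front of the queue so it is searched next."""
    if url in queue:
        queue.remove(url)      # take it out of its current position
    queue.appendleft(url)      # and put it at the head of the queue

search_queue = deque(["a.html", "b.html", "c.html"])
promote(search_queue, "c.html")
print(list(search_queue))  # ['c.html', 'a.html', 'b.html']
```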
If you wish to add a relevant page to the search queue, you can do so by typing it into the small text box provided and clicking the Add button.
The following are ideas for future development of AWC once the current version is up to scratch. After the basic functionality is complete and working bug-free, work will begin on expanding AWC to make it more powerful.