WebGrab+Plus is a multi-site incremental xmltv epg grabber.
It collects tv-program guide data from selected tvguide sites for your favourite channels.
Fast through its incremental mode in which it grabs only what is new or changed.
Rich in detail and highly configurable.
Optional postprocessors to add IMDb data or to customize your xmltv listing.
How it works . . . .
The xmltv update process:
Assuming a previous xmltv listing exists (e.g. yesterday's), the program reads it and stores it both as the target for the update and as a reference for which shows have to be changed or added. If no xmltv listing exists, the program creates a new one. Before grabbing show details, the program determines whether each existing show in the xmltv listing is still valid or needs an update. To do so it connects to the TV guide website and grabs the so-called index pages: the html pages that contain an overview of the scheduled shows per timespan (e.g. a day or several days). It then compares the shows listed there (channel, start and stop times, and title) with the shows in the existing xmltv listing. This comparison yields one of the following situations:
- same (.), no update. The show in the index page is considered the same as the one in the existing xmltv listing.
- changed (c), update. The index show differs from the xmltv show, but their time spans overlap or are equal.
- gap (g), insert. The index show fits in a time gap of the xmltv listing.
- new (n), add. The index show is new; it will be added to the end (or to the beginning, as the case may be) of the xmltv listing.
- repair (r), update. This is a special situation that occurs if errors or overlapping shows are detected in the xmltv listing. The program will try to resolve it by removing and updating the affected shows.
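The five situations above can be illustrated with a small Python sketch. It is not the program's actual code: the Show class, the classify function and the exact rules (for example, treating 'same' as identical times plus a matching title, and 'repair' as overlapping shows in the existing listing) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Show:
    title: str
    start: datetime
    stop: datetime


def overlaps(a, b):
    """True if the two shows share any part of their time spans."""
    return a.start < b.stop and b.start < a.stop


def classify(index_show, listing,
             same_title=lambda a, b: a.lower() == b.lower()):
    """Return '.', 'c', 'g', 'n' or 'r' for one show from the index page.

    listing: shows of one channel from the existing xmltv file, sorted by start.
    same_title: hypothetical stand-in for the program's title comparison.
    """
    # new: nothing to compare with, or entirely before/after the listing
    if not listing:
        return "n"
    if index_show.stop <= listing[0].start or index_show.start >= listing[-1].stop:
        return "n"

    hits = [s for s in listing if overlaps(s, index_show)]

    # gap: inside the listing's range but touching no existing show
    if not hits:
        return "g"

    # repair: the existing shows in this time span overlap each other
    for earlier, later in zip(hits, hits[1:]):
        if earlier.stop > later.start:
            return "r"

    # same: identical time span and a matching title
    for s in hits:
        if (s.start, s.stop) == (index_show.start, index_show.stop) \
                and same_title(s.title, index_show.title):
            return "."

    # changed: overlapping or equal time span, but not the same show
    return "c"
```

With a listing that holds a show from 09:00 to 10:00 and one from 11:00 to 12:00, an index show from 10:00 to 11:00 would be classified as 'g', and one from 12:00 to 13:00 as 'n'.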
When the program runs, these resulting situations for each show are printed in the command window like this (the iiii indicates 4 days of index pages downloaded):
The comparison of the show title in the index page (index_title) with the one in the xmltv file is rather tricky, because the index_title frequently differs to some extent from the title in the show detail page. Differences can be due to abbreviation of long titles, different use of punctuation characters, and the combination of the title with other elements in the index_title (like category and subtitle). The program deals with all those differences through a weighted comparison. The result of this comparison is a 'title match factor', which, roughly, is the highest percentage of 'matching' words between the two titles in any of the elements of the index_title. If this title match factor is less than the value set for it in the siteini file, the show is considered 'not same' and a show update is started.
For the update, the program grabs the show details from the show detail html page(s) of the TV guide website, if the site provides them.
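The idea of a title match factor can be sketched roughly as follows. This is a simplified, unweighted approximation, not the program's weighted comparison: the function names, the word-splitting rule, the choice to score against the shorter title, and the threshold value are all assumptions made for illustration.

```python
import re


def words(text):
    """Set of lower-case words in text, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def title_match_factor(index_elements, xmltv_title):
    """Highest percentage of matching words between xmltv_title and
    any single element of the index_title (title, subtitle, category ...)."""
    target = words(xmltv_title)
    best = 0.0
    for element in index_elements:
        candidate = words(element)
        if not candidate or not target:
            continue
        common = len(candidate & target)
        # Assumption: score against the shorter title, so that an
        # abbreviated index title can still reach a full match.
        score = 100.0 * common / min(len(candidate), len(target))
        best = max(best, score)
    return best


def is_same_title(index_elements, xmltv_title, threshold=50.0):
    """'same' unless the factor is less than the threshold.

    threshold stands in for the value from the siteini file;
    50.0 is an arbitrary example, not a documented default.
    """
    return title_match_factor(index_elements, xmltv_title) >= threshold
```

In this sketch an abbreviated index title such as "The Lord of the Rings: The Two..." still matches "The Lord of the Rings: The Two Towers" fully, because punctuation is ignored and the score is taken relative to the shorter of the two titles.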
The update modes:
The program supports a variety of update modes. The preferred and most efficient is the incremental mode, which works as described above for all shows in the index page. In this mode the download time is reduced to the minimum.
Other update modes are:
'light' (l), which is incremental but forces a re-grab of all shows for 'today',
'smart' (s), which is the same but with a forced re-grab for today and tomorrow,
'full' (f), which is not incremental and forces a full re-grab of all days requested.
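The difference between the update modes comes down to which days are re-grabbed regardless of the comparison result. A minimal sketch, assuming day offsets counted from today (0 = today) and using the mode names as plain strings; the function and table names are hypothetical and not part of the program's configuration:

```python
# Number of leading days that are always re-grabbed, per mode.
# "incremental" is used here as the name for the preferred mode.
FORCED_LEADING_DAYS = {"incremental": 0, "light": 1, "smart": 2}


def forced_regrab_days(mode, days_requested):
    """Return the day offsets (0 = today) that are re-grabbed in full,
    whatever the comparison with the existing xmltv listing says."""
    all_days = list(range(days_requested))
    if mode == "full":
        # not incremental: every requested day is grabbed again
        return all_days
    if mode not in FORCED_LEADING_DAYS:
        raise ValueError(f"unknown update mode: {mode!r}")
    return all_days[:FORCED_LEADING_DAYS[mode]]
```

For a four-day grab this gives no forced days in incremental mode, day 0 for 'light', days 0 and 1 for 'smart', and all four days for 'full'. All remaining days are still handled incrementally.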
Separate from, and independent of, the modes mentioned above is a special grabbing mode, 'index-only', which the program selects automatically if no elements need to be scrubbed from the show detail page. This mode is 'superfast' but seldom useful, because most sites provide very little show data on the index page. But if you are satisfied with just start and stop times and a title, it's there. Occasionally a site offers richer data on the index page (like tvguide.co.uk). Some sites list details only on the index page, or provide more detailed information on detail pages for only some shows. The program automatically recognizes these cases.