I am using radiotimes.com.ini. I want the <desc> tag to use the description from the show details page instead of the description from the index page.
What should I put in the .ini file?
Thank you
Never mind. By trial and error (lots of error), I am making progress.
Hi Graham,
The preferred ini for radiotimes is the one that uses the xmltv feed: http://www.webgrabplus.com/sites/default/files/download/ini/info/zip/UK_radiotimes.com.xmltv.zip
That one is faster and doesn't need a show detail page because all the data is in the index page, so there is no problem with a double listing in desc. But if you prefer to use the other one, radiotimes.com.ini, you can just add a * as the first character of this line:
index_description.scrub {single|"Description":"||","|","}
Jan
Jan. Thank you. I am trying to get text that is not in the xmltv feed and is only available by scraping the show web pages.
I can get the data that I want, and it is all okay when I set <timespan>0</timespan> to get data for today only. But when I set <timespan>7</timespan>, the xmltv produced by WebGrab is not okay: many shows have the wrong date in <programme start="20131201133 ...> and in the stop time.
On 30 November I ran a test using <timespan>7 17:00</timespan> and some shows have the wrong date in the xmltv. On 1 December I ran the same test using the same config.xml and the same .ini and shows again have the wrong date in the xmltv.
The attached zip includes the config.xml and the .ini with the xml files produced by each test. An example of the problem is "Live Snooker" on BBC2. I expected this show to be in the xmltv on 1, 2, 3, 4 and 5 of December but the show is missing on some dates.
Another example can be found by searching the xmltv files for the string "Claire Rawle and James Lewis". The search will find the same show in each xmltv file but with different dates.
I have looked at the .ini but I am not able to see the cause of the problem. Thank you for any suggestions and for the work that you have done on WebGrab++.
Note: .... All attempts to attach a file to this post appear to have failed with
Hi,
The site had some duplicate shows in the index page, which we hadn't discovered before. I removed these duplicates in rev 9 of radiotimes.com.ini, which you can get at http://www.webgrabplus.com/sites/default/files/download/ini/info/zip/UK_radiotimes.com.zip
Jan
Thank you. This is now correctly capturing all of the days that I specify.
You have ...
episode.modify {remove|'temp_1'}
subtitle.modify {remove|'temp_1'}
subtitle.modify {remove|'episode'}
which produces a sub-title ...
<title lang="en">Who Do You Think You Are?</title>
<sub-title lang="en">. Sue Johnston</sub-title>
I have edited to ...
episode.modify {remove|'temp_1'}
subtitle.modify {remove|'temp_1'}
subtitle.modify {remove|'episode'}
subtitle.modify {remove|'. '}
which produces a sub-title (without the leading dot space) ...
<title lang="en">Who Do You Think You Are?</title>
<sub-title lang="en">Sue Johnston</sub-title>
I tried ...
subtitle.modify {remove|'. ' 0 2}
to restrict the modify to the first 2 chars but it didn't work. Any suggestions?
Thank you.
Graham,
Regarding subtitle.modify {remove|'. ' 0 2}: indeed, that won't work! Remove (and also replace and substring, for that matter) comes in several flavours depending on the argument type. The default type is string, which is what your example uses, so remove tries to remove a string with the value "'. ' 0 2". It won't find that, so nothing happens.
Here you intend to use index values (start position and length). For that, the type needs to be specified as type=char (other possible index types are word, sentence, paragraph and element; see the manual, 4.6 - arguments). But even when specified as subtitle.modify {remove(type=char)|0 2} it won't do what you want, because now it will always remove the first two chars, whatever their value. With a few extra lines you can solve that by first checking whether the first two chars are indeed ". ", like this:
temp_1.modify {substring(type=char)|'subtitle' 0 2}
subtitle.modify {remove('temp_1' = ". " type=char)|0 2}
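The logic of those two lines can also be followed outside WebGrab+Plus. Below is a minimal Python sketch, not WebGrab syntax: the function name is hypothetical, and it only mirrors the intent of taking the first two chars and removing them only when they are exactly ". ".

```python
def strip_leading_dot_space(subtitle: str) -> str:
    """Drop the first two characters only if they are exactly '. '."""
    # Step 1: copy the first two characters (the role temp_1 plays above).
    prefix = subtitle[0:2]
    # Step 2: remove by position (start 0, length 2) only when the check passes.
    if prefix == ". ":
        return subtitle[2:]
    return subtitle


print(strip_leading_dot_space(". Sue Johnston"))   # Sue Johnston
print(strip_leading_dot_space("Sue Johnston"))     # Sue Johnston (unchanged)
```

Without the check in step 2, the second call would lose its first two letters, which is the problem with an unconditional remove by position.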
The best way is to use type=regex. This can do the checking and removing in one scrubstring:
subtitle.modify {remove(type=regex)|(\A\. ).+}
The only disadvantage here is that regular expressions take some study to understand.
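To see what the \A anchor buys you, here is a small sketch using Python's re module. It is an illustration only: WebGrab+Plus has its own regex handling, so how it treats the capture group and the trailing .+ is not modelled here, and the function names are hypothetical. The sketch compares an anchored removal with a plain "remove every '. '" such as subtitle.modify {remove|'. '}.

```python
import re

# \A matches only at the very start of the string, so '. ' further
# along in the sub-title is never touched.
LEADING_DOT_SPACE = re.compile(r"\A\. ")


def remove_leading_dot_space(subtitle: str) -> str:
    """Remove '. ' only when it is the first thing in the sub-title."""
    return LEADING_DOT_SPACE.sub("", subtitle)


def remove_every_dot_space(subtitle: str) -> str:
    """Plain string removal: strips every '. ', wherever it occurs."""
    return subtitle.replace(". ", "")


sample = ". Part 1. The End"
print(remove_leading_dot_space(sample))  # Part 1. The End
print(remove_every_dot_space(sample))    # Part 1The End
```

The second result shows why an unrestricted remove of '. ' can damage sub-titles that contain a full stop followed by a space in the middle.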
Jan
Thank you again.
I now have an xmltv file that is good for me.
G