
Use description from show detail page in <desc> tag

Graham
Donator
Joined: 10 years
Last seen: 7 months

I am using radiotimes.com.ini.  I want to use the description from the show details page in the <desc> tag and not use the description from the index page.
What should I put in the .ini file?
Thank you
 
Never mind.  By trial and error (lots of error), I am making progress.

WGMaker
WG++ Team member, Donator
Joined: 11 years
Last seen: 1 hour

Hi Graham,
 
The preferred ini for radiotimes is the one that uses the xmltv feed: http://www.webgrabplus.com/sites/default/files/download/ini/info/zip/UK_radiotimes.com.xmltv.zip
That one is faster and doesn't need a show detail page because all the data is in the index page, so there is no problem with a double listing in desc. But if you prefer to use the other one, radiotimes.com.ini, you can just add a * as the first character of this line:
index_description.scrub {single|"Description":"||","|","}
 
Jan

Graham
WGMaker wrote:

The preferred ini for radiotimes is the one that uses the xmltv feed ...

Jan. Thank you.  I am trying to get text that is not in the xmltv feed and is only available by scraping the show web pages.
 
I can get the data that I want and it is all okay when I set <timespan>0</timespan> to get data for today only. The xmltv produced by WebGrab is not okay when I set <timespan>7</timespan>: many shows have the wrong date in <programme start="20131201133 ...> and in the stop time.
 
On 30 November I ran a test using  <timespan>7 17:00</timespan> and some shows have the wrong date in the xmltv.  On 1 December I ran the same test using  the same config.xml and the same .ini and shows again have the wrong date in the xmltv.

The attached zip includes the config.xml and the .ini with the xml files produced by each test.  An example of the problem is "Live Snooker" on BBC2.  I expected this show to be in the xmltv on 1, 2, 3, 4 and 5 of December but the show is missing on some dates.
 
Another example can be found by searching the xmltv files for the string "Claire Rawle and James Lewis".  The search will find the same show in each xmltv file but with different dates.
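The search described above can also be scripted so that the dates from two runs are easy to compare. This is only a minimal sketch, assuming the output is a standard xmltv file with <programme> elements that carry start and stop attributes and a <desc> child. The function name find_programmes and the sample data (titles, times, channel) are made up for illustration and are not taken from the real test files.

```python
import xml.etree.ElementTree as ET


def find_programmes(xmltv_text, needle):
    """Return (start, stop, title) for each programme whose <desc> contains needle."""
    root = ET.fromstring(xmltv_text)
    hits = []
    for prog in root.iter("programme"):
        desc = prog.findtext("desc", default="")
        if needle in desc:
            title = prog.findtext("title", default="")
            hits.append((prog.get("start"), prog.get("stop"), title))
    return hits


# Hypothetical sample data, NOT copied from the files discussed in this thread.
SAMPLE = """<tv>
  <programme start="20000101120000 +0000" stop="20000101130000 +0000" channel="example">
    <title lang="en">Example Show</title>
    <desc lang="en">With Claire Rawle and James Lewis.</desc>
  </programme>
  <programme start="20000102090000 +0000" stop="20000102100000 +0000" channel="example">
    <title lang="en">Other Show</title>
    <desc lang="en">Something else.</desc>
  </programme>
</tv>"""

if __name__ == "__main__":
    for start, stop, title in find_programmes(SAMPLE, "Claire Rawle and James Lewis"):
        print(start, stop, title)
```

Running the same function over the xmltv file from each test and comparing the start values would show whether the same show has moved to a different date.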
 
I have looked at the .ini but I am not able to see the cause of the problem.  Thank you for any suggestions and for the work that you have done on WebGrab++.
Note: all attempts to attach a file to this post appear to have failed with:

An AJAX HTTP request terminated abnormally.
Debugging information follows.
Path: /file/ajax/field_attachments/und/form-QEEJiX4piLqoJjlHcWcrftmDn9igAyZ8IL2czNZ9CYo
StatusText: n/a
ResponseText:
Error                
The website encountered an unexpected error. Please try again later.
Error message
EntityMalformedException: Missing bundle property on entity of type comment. in entity_extract_ids() (line 7663 of C:\DrupalWG\includes\common.inc).

ReadyState: undefined

WGMaker

Hi,
The site had some duplicate shows in the index page which we hadn't discovered before. I removed these duplicates in rev 9 of radiotimes.com.ini, which you can get at http://www.webgrabplus.com/sites/default/files/download/ini/info/zip/UK_radiotimes.com.zip
 
Jan

Graham
WGMaker wrote:

... rev 9 of radiotimes.com.ini ...

Thank you.  This is now correctly capturing all of the days that I specify.

You have ...

episode.modify {remove|'temp_1'}
subtitle.modify {remove|'temp_1'}
subtitle.modify {remove|'episode'}
which produces a sub-title ...

<title lang="en">Who Do You Think You Are?</title>
<sub-title lang="en">. Sue Johnston</sub-title>

I have edited to ...

episode.modify {remove|'temp_1'}
subtitle.modify {remove|'temp_1'}
subtitle.modify {remove|'episode'}
subtitle.modify {remove|'. '}

which produces a sub-title (without the leading dot space) ...

<title lang="en">Who Do You Think You Are?</title>
<sub-title lang="en">Sue Johnston</sub-title>

I tried ...

subtitle.modify {remove|'. ' 0 2}

to restrict the modify to the first 2 chars but it didn't work.  Any suggestions?

Thank you.
 

WGMaker

Graham,
 
Regarding subtitle.modify {remove|'. ' 0 2}: indeed, that won't work! Remove (and also replace and substring, for that matter) comes in several flavours depending on the argument type. The default type is string, which is what your example uses, so remove tries to remove a string with the value "'. ' 0 2", which it won't find, so nothing happens.
 
Here you intend to use index values (start position and length). For that, the type needs to be specified as type=char (other possible index types are word, sentence, paragraph and element; see the manual, 4.6 - arguments). But even specified as subtitle.modify {remove(type=char)|0 2} it won't do what you want, because it will then always remove the first two chars, whatever their value. With a few extra lines you can solve that by first checking whether the first two chars are indeed ". ", like this:
temp_1.modify {substring(type=char)|'subtitle' 0 2}
subtitle.modify {remove('temp_1' = ". " type=char)|0 2}
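For readers who don't know the ini syntax, the three behaviours described above can be mimicked in plain Python. This is only an analogy under stated assumptions, not how WebGrab+Plus is implemented: the function names are made up, and remove_literal simply treats the whole argument as one literal string, as the explanation above says the default string type does.

```python
def remove_literal(text, needle):
    # String flavour: the whole argument is one literal string to look for.
    # If that literal does not occur in the text, nothing changes.
    return text.replace(needle, "")


def remove_chars(text, start, length):
    # Index flavour: always cut `length` chars beginning at `start`,
    # whatever those chars happen to be.
    return text[:start] + text[start + length:]


def remove_leading_dot_space(text):
    # Check first (the temp_1 step), then remove by index only if the
    # first two chars really are ". ".
    first_two = text[0:2]
    if first_two == ". ":
        return remove_chars(text, 0, 2)
    return text


if __name__ == "__main__":
    print(remove_literal(". Sue Johnston", "'. ' 0 2"))  # unchanged: literal not found
    print(remove_chars("Sue Johnston", 0, 2))            # "e Johnston": unconditional cut
    print(remove_leading_dot_space(". Sue Johnston"))    # "Sue Johnston"
    print(remove_leading_dot_space("Sue Johnston"))      # "Sue Johnston"
```

The second call shows why the unconditional index removal is risky: a sub-title without the leading ". " loses its first two real characters.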

 
The best way is to use type=regex. This can do the checking and the removing in one scrubstring:
subtitle.modify {remove(type=regex)|(\A\. ).+}
The only disadvantage is that understanding regular expressions takes some study.
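To try out the idea of anchoring the match at the start of the string outside WebGrab+Plus, here is a minimal sketch using Python's re module. It assumes only that \A means "start of string", which holds in Python. The pattern is deliberately shorter than the one in the ini line above, because Python's re.sub replaces the whole match, and WebGrab+Plus's own regex handling may differ. The function name is made up for illustration.

```python
import re

# \A anchors at the very start of the string, so only a leading ". " matches.
LEADING_DOT_SPACE = re.compile(r"\A\. ")


def strip_leading_dot_space(subtitle):
    # Replace the leading ". " (if present) with nothing; otherwise return as-is.
    return LEADING_DOT_SPACE.sub("", subtitle)


if __name__ == "__main__":
    print(strip_leading_dot_space(". Sue Johnston"))   # Sue Johnston
    print(strip_leading_dot_space("Sue Johnston"))     # Sue Johnston
    print(strip_leading_dot_space("Part 1. The End"))  # Part 1. The End
```

Because of the anchor, a ". " in the middle of a sub-title is left alone, which a plain string remove of '. ' would not guarantee.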
 
Jan
  
 

Graham
WGMaker wrote:

temp_1.modify {substring(type=char)|'subtitle' 0 2}
subtitle.modify {remove('temp_1' = ". " type=char)|0 2}

Thank you again.
I now have an xmltv file that is good for me.
G


Brought to you by Jan van Straaten

Program Development - Jan van Straaten ------- Web design - Francis De Paemeleere
Supported by: servercare.nl