Use description from show detail page in <desc> tag


Tue, 2013-11-26 13:51

Graham


I am using radiotimes.com.ini. I want to use the description from the show details page in the <desc> tag and not use the description from the index page.
What should I put in the .ini file?
Thank you

Never mind. By trial and error (lots of error), I am making progress.

Thu, 2013-11-28 08:08

WGMaker


Hi Graham,

The preferred ini for radiotimes is the one that uses the xmltv feed: http://www.webgrabplus.com/sites/default/files/download/ini/info/zip/UK_radiotimes.com.xmltv.zip
That one is faster and doesn't need a show detail page because all the data is in the index page, so there is no problem with a double listing in desc. But if you prefer to use the other one, radiotimes.com.ini, you can just add a * as the first character of this line to comment it out:
index_description.scrub {single|"Description":"||","|","}

Jan

Sun, 2013-12-01 15:39

Graham


WGMaker wrote:

The preferred ini for radiotimes is the one that uses the xmltv feed ...

Jan. Thank you. I am trying to get text that is not in the xmltv feed and is only available by scraping the show web pages.

I can get the data that I want, and it is all okay when I set <timespan>0</timespan> to get data for today only. When I set <timespan>7</timespan>, however, the xmltv produced by WebGrab is not okay: many shows have the wrong date in <programme start="20131201133 ...> and in the stop time.

On 30 November I ran a test using <timespan>7 17:00</timespan> and some shows have the wrong date in the xmltv. On 1 December I ran the same test using the same config.xml and the same .ini and shows again have the wrong date in the xmltv.

The attached zip includes the config.xml and the .ini with the xml files produced by each test. An example of the problem is "Live Snooker" on BBC2. I expected this show to be in the xmltv on 1, 2, 3, 4 and 5 of December but the show is missing on some dates.

Another example can be found by searching the xmltv files for the string "Claire Rawle and James Lewis". The search will find the same show in each xmltv file but with different dates.

I have looked at the .ini but I am not able to see the cause of the problem. Thank you for any suggestions and for the work that you have done on WebGrab++.
Note: All attempts to attach a file to this post appear to have failed with:

An AJAX HTTP request terminated abnormally.
Debugging information follows.
Path: /file/ajax/field_attachments/und/form-QEEJiX4piLqoJjlHcWcrftmDn9igAyZ8IL2czNZ9CYo
StatusText: n/a
ResponseText: The website encountered an unexpected error. Please try again later.
Error message: EntityMalformedException: Missing bundle property on entity of type comment. in entity_extract_ids() (line 7663 of C:\DrupalWG\includes\common.inc).
ReadyState: undefined

Attachments:

files.zip

Tue, 2013-12-03 19:45

WGMaker


Hi,
The site had some duplicate shows in the index page that we hadn't noticed before. I removed these duplicates in rev 9 of radiotimes.com.ini, which you can get at http://www.webgrabplus.com/sites/default/files/download/ini/info/zip/UK_radiotimes.com.zip

Jan

Wed, 2013-12-04 18:18

Graham


WGMaker wrote:

... rev 9 of radiotimes.com.ini ...

Thank you. This is now correctly capturing all of the days that I specify.

You have ...

episode.modify {remove|'temp_1'}
subtitle.modify {remove|'temp_1'}
subtitle.modify {remove|'episode'}
which produces a sub-title ...

<title lang="en">Who Do You Think You Are?</title>
<sub-title lang="en">. Sue Johnston</sub-title>

I have edited to ...

episode.modify {remove|'temp_1'}
subtitle.modify {remove|'temp_1'}
subtitle.modify {remove|'episode'}
subtitle.modify {remove|'. '}

which produces a sub-title (without the leading dot space) ...

<title lang="en">Who Do You Think You Are?</title>
<sub-title lang="en">Sue Johnston</sub-title>

I tried ...

subtitle.modify {remove|'. ' 0 2}

to restrict the modify to the first 2 chars but it didn't work. Any suggestions?

Thank you.

Thu, 2013-12-05 09:27

WGMaker


Graham,

Regarding subtitle.modify {remove|'. ' 0 2}: indeed, that won't work! Remove (and also replace and substring, for that matter) comes in several flavours depending on the argument type. The default type is string, which is what your example uses, so remove tries to remove a string with the value "'. ' 0 2", which it won't find, so nothing happens.

Here you intend to use index values (start position and length). For that, the type needs to be specified as type=char (other possible index types are word, sentence, paragraph and element; see the manual, 4.6 - arguments). But even specified as subtitle.modify {remove(type=char)|0 2} it won't work, because it will then always remove the first two chars, whatever their value. With a few extra lines you can solve that by first checking whether the first two chars are indeed ". ", like this:
temp_1.modify {substring(type=char)|'subtitle' 0 2}
subtitle.modify {remove('temp_1' = ". " type=char)|0 2}
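
For readers more at home in a general-purpose language, the check-then-remove logic of those two lines can be sketched in Python. This is only an analogy, not WebGrab+Plus code, and the function name strip_leading_dot is made up for illustration.

```python
def strip_leading_dot(subtitle: str) -> str:
    """Drop the first two chars only when they are exactly '. '."""
    first_two = subtitle[0:2]      # roughly: substring(type=char)|'subtitle' 0 2
    if first_two == ". ":          # roughly: the 'temp_1' = ". " condition
        return subtitle[2:]        # roughly: remove(type=char)|0 2
    return subtitle                # anything else is left untouched


print(strip_leading_dot(". Sue Johnston"))  # Sue Johnston
print(strip_leading_dot("Mr. Bean"))        # Mr. Bean
```

Without the check, the second subtitle would lose its first two characters.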

The best way is to use type=regex. This can do the checking and the removing in one scrubstring:
subtitle.modify {remove(type=regex)|(\A\. ).+}
The only disadvantage here is that regular expressions take some study to understand.
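
As a rough illustration of what the \A anchor does, here is a small Python sketch. It is an analogy only: Python's re module is not the WebGrab+Plus regex engine, the helper name remove_leading_dot is hypothetical, and the sketch assumes the intent is to drop a ". " only when it sits at the very start of the subtitle.

```python
import re

# \A matches only at the very start of the string, so a ". " that
# appears later in the subtitle is never touched.
LEADING_DOT = re.compile(r"\A\. ")


def remove_leading_dot(subtitle: str) -> str:
    """Remove a leading '. ' if present; leave everything else as it is."""
    return LEADING_DOT.sub("", subtitle)


print(remove_leading_dot(". Sue Johnston"))      # Sue Johnston
print(remove_leading_dot("Part 1. The Return"))  # Part 1. The Return

# For contrast, a plain string replacement in Python hits every match:
print("Part 1. The Return".replace(". ", ""))    # Part 1The Return
```

The contrast in the last line is the reason to anchor the match instead of removing the bare string '. '.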

Jan

Thu, 2013-12-05 18:30

Graham


WGMaker wrote:

temp_1.modify {substring(type=char)|'subtitle' 0 2}
subtitle.modify {remove('temp_1' = ". " type=char)|0 2}

Thank you again.
I now have an xmltv file that is good for me.
G
