Join two files guide.xml

Tapiocapioca
Offline
Has donated long time ago
Joined: 7 years
Last seen: 4 years
I am grabbing the EPG with two sessions of WebGrab+Plus to make it faster, so at the end of the job I have two files: guide1.xml and guide2.xml. To use them I need to join the two files into one. Can someone suggest the best way? Is there a REX option that can help? I read the documentation but couldn't find a solution.
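
For readers who only need the two guide files joined, the task can also be done outside WebGrab+Plus. The following is a minimal Python sketch, not a WebGrab+Plus or REX feature: `merge_xmltv` is a hypothetical helper, and it assumes both inputs are well-formed XMLTV files with a `<tv>` root. It keeps the first definition of each channel id and does not remove duplicate programmes.

```python
import xml.etree.ElementTree as ET


def merge_xmltv(paths, out_path):
    """Merge several XMLTV files into a single <tv> document (sketch)."""
    merged = ET.Element("tv")
    channels, programmes, seen_ids = [], [], set()
    for path in paths:
        root = ET.parse(path).getroot()
        for channel in root.findall("channel"):
            # Keep only the first definition of each channel id.
            if channel.get("id") not in seen_ids:
                seen_ids.add(channel.get("id"))
                channels.append(channel)
        programmes.extend(root.findall("programme"))
    # XMLTV lists all <channel> elements before the <programme> elements.
    merged.extend(channels)
    merged.extend(programmes)
    ET.ElementTree(merged).write(out_path, encoding="utf-8", xml_declaration=True)
    return len(channels), len(programmes)
```

A call such as `merge_xmltv([r"c:\guide1.xml", r"c:\guide2.xml"], r"c:\guide.xml")` would write the combined file; the output path here is only an example.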

Tapiocapioca

I think you hate me, but can you give me an example?

I have c:\guide1.xml and c:\guide2.xml

Do I need to create a file merge-xmltv.ini and write this line inside it?

subpage.format {list|c:\guide1.xml c:\guide2.xml}

After that, I don't understand the part about channel.xml :(

Tapiocapioca

OK, OK. Thank you, as usual. If I have problems I will ask again :)

Tapiocapioca

I need to ask you again. I think my ini is set up correctly. Inside my original WebGrab++.config.xml I have this config:

<channel update="i" site="guidatv.sky.it" site_id="##id=101##_##icon_file=101_home.png" xmltv_id="Sky Cinema 1 HD">Sky Cinema 1 HD</channel>
<channel offset="2" same_as="Sky Cinema 1 HD" xmltv_id="Sky Cinema +2HD">Sky Cinema +2 HD</channel>
<channel offset="1" same_as="Sky Cinema 1 HD" xmltv_id="Cinema 1 HD (DSL Lente)">Sky Cinema +1 HD</channel>

How should I modify it so I can use it for merging in the new WebGrab++.config.xml? Can you please explain?

Tapiocapioca

I did it differently, but it is working fine.

I joined all the WebGrab++.config.xml files into one inside the new folder WebGrab_Merger.

I edited the ini following your instructions, and afterwards I replaced "guidatv.sky.it" with "merge-xmltv", and it is working perfectly.

REX is not working, but that is another question...

Tapiocapioca

In the end something is wrong. The channels are merged, but not the descriptions and other details. What could it be? The log shows:

[Error   ] no shows in indexpage!

Tapiocapioca

I fully understand the procedure now, but it does not work completely because it skips the channels with an offset. Is there another way?
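
For the offset channels that the merge skips, one possible workaround outside WebGrab+Plus is to create the shifted copies after the files are merged. The sketch below is only an illustration in Python: `add_offset_channel` is a hypothetical helper, it assumes that `offset` means a shift in whole hours, and it assumes timestamps in the usual XMLTV form `YYYYMMDDhhmmss +zzzz`. It copies the programmes only; a matching `<channel>` element for the new id would still have to be added.

```python
import copy
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

XMLTV_FORMAT = "%Y%m%d%H%M%S %z"


def add_offset_channel(tv, source_id, new_id, hours):
    """Append copies of source_id's programmes under new_id, shifted by `hours`."""

    def shift(stamp):
        # Parse the timestamp, move it, and write it back with the same UTC offset.
        moved = datetime.strptime(stamp, XMLTV_FORMAT) + timedelta(hours=hours)
        return moved.strftime(XMLTV_FORMAT)

    originals = [p for p in tv.findall("programme") if p.get("channel") == source_id]
    for programme in originals:
        clone = copy.deepcopy(programme)
        clone.set("channel", new_id)
        clone.set("start", shift(programme.get("start")))
        if programme.get("stop"):
            clone.set("stop", shift(programme.get("stop")))
        tv.append(clone)
    return len(originals)
```

With the channels from this thread, the call would look like `add_offset_channel(tv, "Sky Cinema 1 HD", "Sky Cinema +2HD", 2)`, where `tv` is the root element of the merged guide.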

Tapiocapioca
Blackbear199 wrote:

what else do you expect when you do it your way and not the way i told you.

should i rub my crystal ball to view your files?

I followed your way, but I think this is an old bug that was never fixed. I have the same problem as this user:

http://www.webgrabplus.com/content/merging-xmltv-files

Tapiocapioca

Excuse me if I am slow to answer; my connection is really bad. I will read your messages soon, but thank you in advance.

Tapiocapioca

OK. I followed everything you told me. I have three files, guide_01.xml, guide_02.xml and guide_03.xml, and my ini file is:

Quote:

**------------------------------------------------------------------------------------------------
* @header_start
* WebGrab+Plus ini for grabbing EPG data from TvGuide websites
* @Site: merge-xmltv-utc
* @MinSWversion: V1.57
* @Revision 0 - [29/08/2016] Blackbear199
*   - creation
* @Remarks: merges ini files and corrects all times to UTC. Variation of the original merge-xmltv.ini
* @header_end
**------------------------------------------------------------------------------------------------
*** edit (optional) - cultureinfo=en-GB - to the cultureinfo of the country for which the xmltv data is created
site {url=merge-xmltv-utc|timezone=UTC|maxdays=31.1|cultureinfo=en-GB|charset=UTF-8|titlematchfactor=90|keepindexpage}
*
*** optionally enable and adapt ratingsystem and episodesystem to your requirements
*site {ratingsystem=GB|episodesystem=onscreen}
*
*** edit - path_of_the_xmltv_file2merge.xml - to your requirements
*** more than one file2merge or just one:
*subpage.format {list|path_of_the_1st_xmltv_file2merge.xml|path_of_the_2nd_xmltv_file2merge.xml|etc}
*** example
*subpage.format {list|D:\guide-1.xml|D:\guide-2.xml}
subpage.format {list|C:\ProgramData\ServerCare\data\xml\guide_01.xml|C:\ProgramData\ServerCare\data\xml\guide_02.xml|C:\ProgramData\ServerCare\data\xml\guide_03.xml}
url_index{url|file://|subpage|}

scope.range {(datelogo)|end}
index_variable_element.modify {set|'config_site_id'}
index_variable_element.modify {cleanup(style=regex)}
end_scope
index_showsplit.scrub {regex||<programme [^>]*channel=\"'index_variable_element'\"[^>]*>.*?</programme>||}
*
index_start.scrub {regex||start="(\d{12})\d{2}\s[-+]\d{4}"||}
index_stop.scrub {regex||stop="(\d{12})\d{2}\s[-+]\d{4}"||}
index_title.scrub {single|<title|>|</title>|</title>}
index_subtitle.scrub {single|<sub-title|>|</sub-title>|</sub-title>}
index_description.scrub {single|<desc|>|</desc>|</desc>}
index_actor.scrub {multi|<actor>||</actor>|</actor>}
index_director.scrub {multi|<director>||</director>|</director>}
index_writer.scrub {multi|<writer>||</writer>|</writer>}
index_producer.scrub {multi|<producer>||</producer>|</producer>}
index_presenter.scrub {multi|<presenter>||</presenter>|</presenter>}
index_productiondate.scrub {single|<year>||</year>|</year>}
index_category.scrub {multi|<category|>|</category>|</category>}
index_rating.scrub {multi|<rating|<value>|</value>|</rating>}
index_starrating.scrub {single|<star-rating>|<value>|</value>|</star-rating>}
index_episode.scrub {single|<episode-num|>|<|/episode-num>}
*
scope.range {(indexshowdetails)|end}
index_temp_9.scrub {regex||start="\d{14}\s([-+]\d{4})"||}
index_temp_8.modify {substring(type=char)|'index_temp_9' 1 4}
index_temp_9.modify {substring(type=char)|0 1}
index_temp_7.modify {substring(type=char)|'index_temp_8' 0 2}
index_temp_8.modify {substring(type=char)|2 4}
index_temp_8.modify {addstart|'index_temp_7':}
index_temp_8.modify {calculate(format=time,H:mm)}
*
index_temp_1.modify {substring(type=char)|'index_start' 0 4} * year
index_temp_1.modify {addend|/}
index_temp_2.modify {substring(type=char)|'index_start' 4 2} * month
index_temp_1.modify {addend|'index_temp_2'/}
index_temp_2.modify {substring(type=char)|'index_start' 6 2} * day
index_temp_1.modify {addend|'index_temp_2' }
index_temp_2.modify {substring(type=char)|'index_start' 8 2} * hour
index_temp_1.modify {addend|'index_temp_2':}
index_temp_2.modify {substring(type=char)|'index_start' 10 2} * minute
index_start.modify {set|'index_temp_1''index_temp_2'}
index_start.modify {calculate('index_temp_9' "-" format=date,unix)|0:'index_temp_8' +}
index_start.modify {calculate('index_temp_9' "+" format=date,unix)|0:'index_temp_8' -}
*
index_temp_1.modify {substring(type=char)|'index_stop' 0 4} * year
index_temp_1.modify {addend|/}
index_temp_2.modify {substring(type=char)|'index_stop' 4 2} * month
index_temp_1.modify {addend|'index_temp_2'/}
index_temp_2.modify {substring(type=char)|'index_stop' 6 2} * day
index_temp_1.modify {addend|'index_temp_2' }
index_temp_2.modify {substring(type=char)|'index_stop' 8 2} * hour
index_temp_1.modify {addend|'index_temp_2':}
index_temp_2.modify {substring(type=char)|'index_stop' 10 2} * minute
index_stop.modify {set|'index_temp_1''index_temp_2'}
index_stop.modify {calculate('index_temp_9' "-" format=date,unix)|0:'index_temp_8' +}
index_stop.modify {calculate('index_temp_9' "+" format=date,unix)|0:'index_temp_8' -}
*
index_description.modify {cleanup}
end_scope
*
**  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _
**      #####  CHANNEL FILE CREATION (only to create the xxx-channel.xml file)
** @auto_xml_channel_start
*index_site_id.scrub {regex||<channel [^>]*id="[^\"]*"[^>]*>.*?</channel>||}
*scope.range {(channellist)|end}
*index_site_channel.modify {addstart|'index_site_id'}
*index_site_id.modify {substring(type=regex)|<channel [^>]*id="([^\"]*)"[^>]*>}
*index_site_channel.modify {substring(type=regex)|<display-name [^>]*>(.*?)</display-name>}
*index_site_id.modify {cleanup(removeduplicates=equal link="index_site_channel")}
*end_scope
** @auto_xml_channel_end
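
The time handling in the ini above (split off the UTC offset, then add or subtract it from start and stop) can be hard to follow in siteini syntax. As a rough illustration of the same idea, here is a short Python sketch; `xmltv_to_utc` is a hypothetical name, and unlike the ini, which keeps only the first 12 digits, this version keeps the seconds.

```python
from datetime import datetime, timezone

XMLTV_FORMAT = "%Y%m%d%H%M%S %z"


def xmltv_to_utc(stamp):
    """Convert an XMLTV timestamp such as '20160919222558 +0200' to UTC."""
    local = datetime.strptime(stamp, XMLTV_FORMAT)
    # A "+" offset is subtracted and a "-" offset is added, as in the ini.
    return local.astimezone(timezone.utc).strftime(XMLTV_FORMAT)


print(xmltv_to_utc("20160919222558 +0200"))  # 20160919202558 +0000
```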

Inside the file WebGrab++.config.xml I copied my default configuration and added the channels from all my config files in this format:

Quote:

    <channel update="i" site="guidatv.sky.it" site_id="##id=899##_##icon_file=899_home.png" xmltv_id="Rai Uno">Rai Uno</channel>
    <channel offset="2" same_as="Rai Uno" xmltv_id="Rai 1 +2HD">Rai 1 +2HD</channel>
    <channel offset="1" same_as="Rai Uno" xmltv_id="Rai 1 +1HD">Rai 1 +1HD</channel>
    <channel offset="0" same_as="Rai Uno" xmltv_id="Rai 1 HD">Rai 1 HD</channel>

If I launch the script I get this result:

Quote:

[        ]
[        ]              WebGrab+Plus/w MDB & REX Postprocess -- version  V1.57             
[        ]
[        ]                                 Jan van Straaten                                
[        ]                              Francis De Paemeleere                              
[        ]
[        ]             thanks to Paul Weterings and all the contributing users             
[        ] --------------------------------------------------------------------------------
[        ]
[        ] Job started at 19/09/2016 22:25:58
[  Debug ]
[  Debug ] Running  on: Microsoft Windows NT 6.1.7601 Service Pack 1
[  Debug ] Environment: 4.0.30319.42000
[  Debug ]
[  Debug ] Loading timezone data
[  Debug ] Embedded timezones source: WGconsole.WG.Common.timezonesdata.txt
[  Debug ] Reading config file: C:\ProgramData\ServerCare\WebGrab_Merger\WebGrab++.config.xml
[        ] Job finished at 19/09/2016 22:25:58 done in 0s
[Critical] Unhandled Exception
[Critical]
Unable to find the siteini: guidatv.sky.it.ini.
Looked in:
C:\ProgramData\ServerCare\WebGrab_Merger
C:\ProgramData\ServerCare\WebGrab_Merger\siteini.user (+ subfolders max.depth = 6)
C:\ProgramData\ServerCare\WebGrab_Merger\siteini.pack (+ subfolders max.depth = 6)
[Critical]
   in WGconsole.Program.ConsoleApplication(String[] args)
   in WGconsole.Program.Main(String[] args)
[Critical] For detailed info, see log file C:\ProgramData\ServerCare\WebGrab_Merger\WebGrab++.log.txt
[Critical] Execution stopped

What's the first step to fix it?
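
The log above says that WebGrab+Plus looks for an ini named after each `site` attribute in the config. A quick way to see which names have no matching ini is sketched below in Python. `missing_siteinis` is a hypothetical helper; it searches every subfolder of the config folder, which is broader than the folders and depth listed in the log.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def missing_siteinis(config_path):
    """Return the site names used in the config that have no <site>.ini nearby."""
    config_path = Path(config_path)
    folder = config_path.parent
    root = ET.parse(config_path).getroot()
    # Channels defined with same_as have no site attribute and are skipped.
    sites = {ch.get("site") for ch in root.iter("channel") if ch.get("site")}
    # "guidatv.sky.it.ini" has the stem "guidatv.sky.it".
    available = {ini.stem for ini in folder.rglob("*.ini")}
    return sorted(sites - available)
```

For the config shown in this post, the result would be expected to contain guidatv.sky.it, because only merge-xmltv-utc.ini exists in the WebGrab_Merger folder.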

Tapiocapioca
Blackbear199 wrote:

<channel update="i" site="guidatv.sky.it" site_id="##id=899##_##icon_file=899_home.png" xmltv_id="Rai Uno">Rai Uno</channel>
    <channel offset="2" same_as="Rai Uno" xmltv_id="Rai 1 +2HD">Rai 1 +2HD</channel>
    <channel offset="1" same_as="Rai Uno" xmltv_id="Rai 1 +1HD">Rai 1 +1HD</channel>
    <channel offset="0" same_as="Rai Uno" xmltv_id="Rai 1 HD">Rai 1 HD</channel>

You use these lines in the WebGrab++.config.xml that grabs the actual data from web sites, not for merging files.

For merging files, use the channel file creation section at the bottom of merge-xmltv-utc.ini to create a channels.xml.

This will scan your guide 1, 2 and 3 XML files and collect all the channel ids. Copy all the <channel ...></channel> lines from merge-xmltv-utc.channels.xml into your WebGrab++.config.xml.

Merging files is exactly the same process as grabbing data from websites, except that the data is read from files instead of a web page. So you need a channels.xml for your XML files, just as you need one to get channel information from a website.

I am trying to use the scope to create the channel list, but I get the same errors. I changed the ini file like this:

Quote:

**      #####  CHANNEL FILE CREATION (only to create the xxx-channel.xml file)
** @auto_xml_channel_start
index_site_id.scrub {regex||<channel [^>]*id="[^\"]*"[^>]*>.*?</channel>||}
scope.range {(channellist)|end}
index_site_channel.modify {addstart|'index_site_id'}
index_site_id.modify {substring(type=regex)|<channel [^>]*id="([^\"]*)"[^>]*>}
index_site_channel.modify {substring(type=regex)|<display-name [^>]*>(.*?)</display-name>}
index_site_id.modify {cleanup(removeduplicates=equal link="index_site_channel")}
end_scope
** @auto_xml_channel_end

For it to work, I also need to modify the channel list like this. Is that the right procedure? Excuse me for always asking, but I want to be sure.

Quote:

<channel update="i" site="merge-xmltv-utc" site_id="##id=899##_##icon_file=899_home.png" xmltv_id="Rai Uno">Rai Uno</channel>
    <channel offset="2" same_as="Rai Uno" xmltv_id="Rai 1 +2HD">Rai 1 +2HD</channel>
    <channel offset="1" same_as="Rai Uno" xmltv_id="Rai 1 +1HD">Rai 1 +1HD</channel>
    <channel offset="0" same_as="Rai Uno" xmltv_id="Rai 1 HD">Rai 1 HD</channel>

If I do that and run WebGrab, the file merge-xmltv-utc.channels.xml is created in the same folder as guide.xml.

When I open merge-xmltv-utc.channels.xml, the channels look like this:

Quote:

<?xml version="1.0" encoding="UTF-8"?>
<site generator-info-name="WebGrab+Plus/w MDB &amp; REX Postprocess -- version  V1.57 -- Jan van Straaten" site="merge-xmltv-utc">
  <channels>
    <channel update="i" site="merge-xmltv-utc" site_id="Rai Uno" xmltv_id="Rai Uno">Rai Uno</channel>
    <channel update="i" site="merge-xmltv-utc" site_id="Rai 1 +2HD" xmltv_id="Rai 1 +2HD">Rai 1 +2HD</channel>
    <channel update="i" site="merge-xmltv-utc" site_id="Rai 1 HD +1" xmltv_id="Rai 1 HD +1">Rai 1 HD +1</channel>
    <channel update="i" site="merge-xmltv-utc" site_id="Rai 1 HD" xmltv_id="Rai 1 HD">Rai 1 HD</channel>
    ..........
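
To see roughly what the channel file creation step does, the following Python sketch reads the `<channel>` elements from the guide files and prints lines in the same shape as the generated file above. It is only an approximation: `channel_lines` is a hypothetical helper, and using the channel id as `site_id` and the display name as `xmltv_id` is an assumption based on the output shown here.

```python
import xml.etree.ElementTree as ET


def channel_lines(guide_paths, site="merge-xmltv-utc"):
    """Build one <channel ...> config line per channel id found in the guides."""
    lines, seen = [], set()
    for path in guide_paths:
        for channel in ET.parse(path).getroot().findall("channel"):
            channel_id = channel.get("id")
            if channel_id in seen:
                continue  # the same channel may appear in several guide files
            seen.add(channel_id)
            name = channel.findtext("display-name", default=channel_id)
            entry = ET.Element(
                "channel",
                {"update": "i", "site": site, "site_id": channel_id, "xmltv_id": name},
            )
            entry.text = name
            lines.append(ET.tostring(entry, encoding="unicode"))
    return lines
```

Printing the returned lines gives a list that can be compared with the file WebGrab+Plus generates before copying anything into WebGrab++.config.xml.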

I copied the whole channel list into WebGrab++.config.xml, put the * back in front of the channel-list lines in the ini, and ran WebGrab again.

After a short time the file guide.xml appears to be built correctly.

Thank you!

mosli
Offline
Donator
Joined: 3 years
Last seen: 2 months
Tapiocapioca wrote:

I think you hate me, but can you give me an example?
I have c:\guide1.xml and c:\guide2.xml

Here's a simple tool to do this: http://www.webgrabplus.com/comment/23188#comment-23188

mat8861
Offline
WG++ Team member, Donator
Joined: 8 years
Last seen: 9 hours

Do you realize you are replying to a 4-year-old post?

mosli

If you google for a solution for that issue, you will still end up here. Probably even in 2030.

mat8861

That user solved his problem using merge. Anyone looking for another solution can google it themselves.

