
ISHOW.GR returning 404 errors for all channels

nikost74
Offline
Donator
Joined: 8 years
Last seen: 1 week

Hello,

I am experiencing issues grabbing EPG data from ishow.gr. All channels return HTTP 404 errors and no guide data is downloaded.

Environment

  • WebGrab++ version: V5.5.0.0
  • Operating System: Linux (server environment)
  • .NET runtimes installed:
     Microsoft.NETCore.App 8.0.24 [/usr/lib64/dotnet/shared/Microsoft.NETCore.App]
     Microsoft.NETCore.App 9.0.13 [/usr/lib64/dotnet/shared/Microsoft.NETCore.App]

Description of the problem

WebGrab++ starts normally and loads the configuration and siteini files correctly. However, when processing channels from ishow.gr, the index pages fail to download and return 404 (NOT FOUND) responses.

Because of this, the guide cannot be updated and the existing guide data is restored.

Example error message

error downloading page: Response status code does not indicate success: 404 (NOT FOUND).
Unable to update channel mega.gr
Generic syntax exception:
no index page data received from mega.gr
unable to update channel, try again later
Existing guide data restored!

 

Notes
The siteini.pack is reported as up to date.
The problem started recently and affects all channels using the ishow.gr.ini (Revision 05) siteini.
I have changed the xmltv_id to match my own headend syntax so you can ignore those names.

mat8861
Offline
WG++ Team member, Donator
Joined: 10 years
Last seen: 1 hour

Update siteini.pack

nikost74
Offline
Donator
Joined: 8 years
Last seen: 1 week

I have done some debugging and testing on ishow.gr, and it seems that a site redesign broke category, URL, description, actor and productiondate scraping.

The ishow.gr website recently changed its HTML structure. This caused several scraping rules in `ishow.gr.ini` to break simultaneously. The most visible symptom was raw HTML fragments being dumped into `<category>` tags in the output `guide.xml`, like this:

<category lang="el">"</category>
<category lang="el">onclick="location.href='/show?guid=29688398-1128-4E7B-8C81-72C15BBF0018'"&gt;</category>
<category lang="el">&lt;td</category>
<category lang="el">class="progTd</category>
<category lang="el">progTdicon"</category>

Additionally, description, production date and actor scraping were silently broken and returned empty fields.

Change 1 - `index_category.scrub`

The site removed the `style` attribute from the `<tr>` row element entirely, and introduced `genre0` as a permanent base class on every show. The old scrub used `" style` as its end delimiter which no longer exists, causing it to overshoot and swallow surrounding HTML.

Old HTML structure:

<tr id="progTr1" class="progTr genre7" style="cursor:pointer" onclick="...">

New HTML structure:

<tr id="progTr1" class="progTr genre0 genre7 " onclick="location.href='/show?guid=...'">

Old line:

index_category.scrub {single(separator=" ")|class="progTr genre0 ||" style|" style}

New line:

index_category.scrub {regex||class="progTr genre0 (genre\d+) "||}

The regex anchors tightly to the class attribute's closing quote, completely ignoring attribute order. It also correctly skips the always-present `genre0` base class and captures only the meaningful secondary genre code.
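
For anyone who wants to check the pattern outside WebGrab++, here is a minimal sketch in Python. It assumes Python's `re` module treats this simple pattern the same way as the regex engine WG++ uses. The two sample rows are the ones quoted above, and `genre_code` is only a helper name for this illustration.

```python
import re

# Pattern copied from the new index_category.scrub line.
CATEGORY_RE = re.compile(r'class="progTr genre0 (genre\d+) "')

# Sample rows taken from the old and new HTML quoted in this post.
old_row = '<tr id="progTr1" class="progTr genre7" style="cursor:pointer" onclick="...">'
new_row = '<tr id="progTr1" class="progTr genre0 genre7 " onclick="location.href=\'/show?guid=...\'">'


def genre_code(row):
    """Return the secondary genre class, or None when the row does not match."""
    match = CATEGORY_RE.search(row)
    return match.group(1) if match else None


print(genre_code(new_row))  # genre7
print(genre_code(old_row))  # None
```

Note that the pattern expects exactly one secondary genre followed by a trailing space before the closing quote, so a row carrying only `genre0` would give no match in this sketch.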

Change 2 - `index_urlshow`

The site changed the `onclick` handler format on programme rows. The capitalisation changed (`onClick` → `onclick`) and the JavaScript prefix was removed (`javascript:document.location.href=` → `location.href=`). This caused show detail page URLs to be constructed incorrectly, breaking description, actor, director and all other detail-page scraping.

Old HTML:

onClick="javascript:document.location.href='/show?guid=...'"

New HTML:

onclick="location.href='/show?guid=...'"

Old line:

index_urlshow {url|https://www.ishow.gr|onClick="javascript:document.location.href='||'">|'">}

New line:

index_urlshow {url|https://www.ishow.gr|onclick="location.href='||'">|'">}
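
The effect of the changed start marker can be reproduced with a small Python sketch. This is only a rough imitation of a start/end delimiter scrub, not how WG++ implements it internally. `show_url` is a hypothetical helper, and the GUID is the one from the broken `<category>` output above.

```python
BASE_URL = "https://www.ishow.gr"

OLD_MARKER = "onClick=\"javascript:document.location.href='"
NEW_MARKER = "onclick=\"location.href='"
END_MARKER = "'\">"


def show_url(row, start_marker, end_marker=END_MARKER):
    """Return BASE_URL plus the text between the markers, or None if a marker is missing."""
    start = row.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = row.find(end_marker, start)
    if end == -1:
        return None
    return BASE_URL + row[start:end]


new_row = (
    '<tr id="progTr1" class="progTr genre0 genre7 " '
    'onclick="location.href=\'/show?guid=29688398-1128-4E7B-8C81-72C15BBF0018\'">'
)

print(show_url(new_row, NEW_MARKER))
# https://www.ishow.gr/show?guid=29688398-1128-4E7B-8C81-72C15BBF0018
print(show_url(new_row, OLD_MARKER))  # None
```

The comparison is case-sensitive in this sketch, which is why the old `onClick` marker finds nothing in the new markup.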

Change 3 - `description.scrub`

Two things changed on the show detail page. First, the `style` attribute on the synopsis div changed from `style="margin-top:10px"` to `style="margin-top: 10px; margin-bottom: 20px"` (added spaces and a second property). Second, the description text is now wrapped in a `<p>` tag inside the div.

Old HTML:

<div id="synopsis" class="show_text" style="margin-top:10px">Description text here</div>

New HTML:

<div id="synopsis" class="show_text" style="margin-top: 10px; margin-bottom: 20px">
    <p>Description text here</p>
</div>

Old line:

description.scrub {single|<div id="synopsis" class="show_text" style="margin-top:10px">||</div>|</div>}

New line:

description.scrub {regex||<div id="synopsis"[^>]*>\s*<p>(.*?)</p>||}

The regex matches the div regardless of style attribute content, and correctly captures the text from inside the `<p>` wrapper.
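
A quick way to try the pattern against both versions of the markup is the Python sketch below. It assumes Python's `re` behaves like the WG++ regex engine for this pattern, and `description` is just an illustrative helper name.

```python
import re

# Pattern copied from the new description.scrub line.
DESC_RE = re.compile(r'<div id="synopsis"[^>]*>\s*<p>(.*?)</p>')

old_html = '<div id="synopsis" class="show_text" style="margin-top:10px">Description text here</div>'
new_html = (
    '<div id="synopsis" class="show_text" style="margin-top: 10px; margin-bottom: 20px">\n'
    '    <p>Description text here</p>\n'
    '</div>'
)


def description(html):
    """Return the text inside the first <p> of the synopsis div, or None."""
    match = DESC_RE.search(html)
    return match.group(1) if match else None


print(description(new_html))  # Description text here
print(description(old_html))  # None
```

In Python, `.` does not match a newline by default, so a description spread over several lines inside the `<p>` would not match in this sketch. I have not checked whether WG++ behaves the same way in that case.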

Change 4 - `actor.scrub`

The actor HTML structure gained an outer wrapper div `cast_person_div`. The inner `cast_person_info` div is still present but the old `multi` scrub was anchored to a `float: left` inline style that no longer exists.

Old HTML:

<div style="float: left; text-align: left" class="cast_person_info">
    Κλιντ Ίστγουντ
    <div class="role">Secret Service Agent</div>
</div>

New HTML:

<div class="cast_person_div" onclick="location.href='/person/56915/...'">
    <div style="float:left; margin-right: 8px">
        <img src="https://webgrabplus.com/..." />
    </div>
    <div class="cast_person_info" style="text-align: left">
        <div>Κλιντ Ίστγουντ</div>
        <div class="role">Secret Service Agent</div>
    </div>
</div>

Old line:

actor.scrub {multi|<div style="float: left; text-align: left" class="cast_person_info">||</div>|</div>}

New line:

actor.scrub {multi|<div class="cast_person_info"|<div>|</div>|</div>}
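
To show what the new line is meant to pick up, here is a rough Python approximation. It assumes the scrub finds each `cast_person_info` block, then takes the text between the next plain `<div>` and the following `</div>`. That is my reading of the scrub arguments, not a statement of how WG++ processes them, and `actors` is a hypothetical helper.

```python
BLOCK_START = '<div class="cast_person_info"'
ELEMENT_START = "<div>"
ELEMENT_END = "</div>"


def actors(html):
    """Collect the first plain <div>...</div> text after each cast_person_info block."""
    names = []
    pos = 0
    while True:
        block = html.find(BLOCK_START, pos)
        if block == -1:
            break
        start = html.find(ELEMENT_START, block)
        if start == -1:
            break
        start += len(ELEMENT_START)
        end = html.find(ELEMENT_END, start)
        if end == -1:
            break
        names.append(html[start:end].strip())
        pos = end
    return names


# New markup as quoted above (image div left out for brevity).
new_html = """
<div class="cast_person_div" onclick="location.href='/person/56915/...'">
    <div class="cast_person_info" style="text-align: left">
        <div>Κλιντ Ίστγουντ</div>
        <div class="role">Secret Service Agent</div>
    </div>
</div>
"""

# Old markup: class comes after style, so the new block start does not occur.
old_html = """
<div style="float: left; text-align: left" class="cast_person_info">
    Κλιντ Ίστγουντ
    <div class="role">Secret Service Agent</div>
</div>
"""

print(actors(new_html))  # ['Κλιντ Ίστγουντ']
print(actors(old_html))  # []
```

The role div is skipped because `<div class="role">` is not a plain `<div>`, so only the name is collected.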

Change 5 - `productiondate.scrub`

The `releaseYear` div gained a `style` attribute, causing the exact-match `single` scrub to fail.

Old HTML:

<div class="releaseYear">1993</div>

New HTML:

<div class="releaseYear" style="margin-left: 10px">1993</div>

Old line:

productiondate.scrub {single|<div class="releaseYear">||</div>|</div>}

New line:

productiondate.scrub {regex||<div class="releaseYear"[^>]*>(\d+)</div>||}
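
The pattern can be checked against both versions of the div with a short Python sketch, again assuming Python's `re` matches this pattern the same way WG++ does. `release_year` is an illustrative helper name.

```python
import re

# Pattern copied from the new productiondate.scrub line.
YEAR_RE = re.compile(r'<div class="releaseYear"[^>]*>(\d+)</div>')

old_html = '<div class="releaseYear">1993</div>'
new_html = '<div class="releaseYear" style="margin-left: 10px">1993</div>'


def release_year(html):
    """Return the year digits from the releaseYear div, or None."""
    match = YEAR_RE.search(html)
    return match.group(1) if match else None


print(release_year(new_html))  # 1993
print(release_year(old_html))  # 1993
```

Because `[^>]*` also matches an empty string, the pattern accepts the div with or without extra attributes, so it should keep working if the `style` attribute is removed again.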

Change 6 - `index_title.modify` and `description.modify` (double-encoded HTML entities)

The ishow.gr website serves apostrophes double-encoded, as `&amp;#039;`. All `replace` directives failed because WG++ re-encodes special characters during the XML write phase. The correct solution is the built-in `htmldecodespecialchar` cleanup style, which decodes HTML entities at the modify stage, before the XML is written.

Add after existing title modify lines:

index_title.modify {cleanup(style=htmldecodespecialchar)}

Add after existing description modify lines:

description.modify {cleanup(style=htmldecodespecialchar)}
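
To illustrate what "double-encoded" means here, the Python sketch below decodes the entity in two passes with the standard `html.unescape` function. It says nothing about WG++ internals. The sample title is a made-up string for illustration only.

```python
import html

raw = "&amp;#039;"

# First pass: "&amp;" becomes "&", leaving a normal numeric entity.
once = html.unescape(raw)
# Second pass: the numeric entity becomes the apostrophe itself.
twice = html.unescape(once)

print(once)   # &#039;
print(twice)  # '

# Hypothetical title, not taken from the site.
title = "Grey&amp;#039;s Anatomy"
print(html.unescape(html.unescape(title)))  # Grey's Anatomy
```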

Verified output after fix

<programme start="20260426004000 +0300" stop="20260426024500 +0300" channel="EPT1">
  <title lang="el">Η δεύτερη ευκαιρία</title>
  <desc lang="el">Τριάντα χρόνια μετά τη δολοφονία του Προέδρου Κένεντι...</desc>
  <credits>
    <actor>Κλιντ Ίστγουντ</actor>
    <actor>Τζον Μάλκοβιτς</actor>
    <actor>Ρενέ Ρούσο</actor>
  </credits>
  <date>1993</date>
  <category lang="el">Ταινία</category>
  <icon src="https://www.ishow.gr/files/images/...jpg"/>
</programme>

All fields clean and correct. Fix tested on multiple channels.
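
If you want to check your own `guide.xml` for the leaked HTML fragments, something like the Python sketch below could help. It is a heuristic under my own assumption that a genre name never contains `<`, `>`, `=` or `"`. `suspicious_categories` is a hypothetical helper, and the sample is a trimmed mix of the broken and fixed `<category>` lines from this post.

```python
import re
import xml.etree.ElementTree as ET

# Characters that should not normally appear in a plain genre name (assumption).
MARKUP_CHARS = re.compile(r'[<>="]')


def suspicious_categories(xml_text):
    """Return <category> texts that look like leaked HTML rather than a genre."""
    root = ET.fromstring(xml_text)
    return [
        cat.text
        for cat in root.iter("category")
        if cat.text and MARKUP_CHARS.search(cat.text)
    ]


sample = """<tv>
  <programme channel="EPT1">
    <category lang="el">"</category>
    <category lang="el">onclick="location.href='/show?guid=29688398-1128-4E7B-8C81-72C15BBF0018'"&gt;</category>
    <category lang="el">&lt;td</category>
    <category lang="el">class="progTd</category>
    <category lang="el">progTdicon"</category>
    <category lang="el">Ταινία</category>
  </programme>
</tv>"""

bad = suspicious_categories(sample)
print(len(bad))  # 5
```

For a real file, read its contents and pass the text to `suspicious_categories`. An empty list means no category matched the heuristic.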

mat8861
Offline
WG++ Team member, Donator
Joined: 10 years
Last seen: 1 hour

Thanks, the new version is in siteini.pack. I also fixed a few other things and added actors with roles. I will update it with the cast page soon.
