Problem with port.hu

fabatka
Offline
Donator
Joined: 7 years
Last seen: 2 months
Hello!

Some channels have stopped downloading EPG from the port.hu website.

1. WebGrab+Plus/w MDB & REX Postprocess -- version V3.2.3.0
2. Ubuntu 16.04.3 LTS
3. Mono JIT compiler version 6.12.0.122 (tarball Mon Feb 22 17:33:28 UTC 2021)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com

Thanks for answers!

Attachments: 
Blackbear199
Offline
Blackbear199's picture
WG++ Team member, Donator
Joined: 8 years
Last seen: 5 hours

looks like they have some sort of flood protection running.
grabbing gets blocked when you grab too much data too fast.
try adding some delays to slow webgrab down.
add the retry= part to the site {xxx} line like below.
don't forget the |, that's used to separate the settings.

site {channelnameprefix=port|retry=<retry time-out="30" channel-delay="5" index-delay="5" show-delay="5">4</retry>}

you can also try lowering the delays. i used the values above and didn't get blocked.

fabatka

thanks for the answer!

but I don’t understand where in the config this needs to be inserted. can you paste it into my config as an example so that I can understand?

Blackbear199

it's not in the config, it's in port.hu.ini

Blackbear199

screenshot.
edit:
i made a typo,corrected.

fabatka

I understand now, thank you. Looks like it's working.

Blackbear199

double post

Blackbear199

fyi it looks like the details page (show-delay) is the main cause of getting blocked.
you could probably use an index-delay and channel-delay of 1, maybe even a show-delay of 1 as well.
if you get blocked, i would increase the show-delay and try again.

the values in my post above slow webgrab down a lot, maybe more than necessary.
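
To get a feel for how much waiting the delays add, here is a rough Python sketch. It assumes the delays are in seconds and that each one is applied once per channel, per index page and per details page; the channel and page counts are made up for illustration and are not taken from port.hu.

```python
def added_delay_seconds(channels, index_pages, detail_pages,
                        channel_delay, index_delay, show_delay):
    """Rough extra wait time, assuming each delay is applied once per item."""
    return (channels * channel_delay
            + index_pages * index_delay
            + detail_pages * show_delay)

# Hypothetical workload: 20 channels, each with 7 index pages and 150 detail pages.
channels = 20
index_pages = channels * 7
detail_pages = channels * 150

slow = added_delay_seconds(channels, index_pages, detail_pages, 5, 5, 5)
fast = added_delay_seconds(channels, index_pages, detail_pages, 1, 1, 1)
print(f"all delays at 5: about {slow / 60:.0f} min of waiting")
print(f"all delays at 1: about {fast / 60:.0f} min of waiting")
```

Under these assumptions the details pages dominate the total, so show-delay is the value that costs the most time.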

fabatka

thanks, I'll test it.

But there is a similar problem with the site musor.tv

PinChin
Offline
PinChin's picture
Donator
Joined: 3 years
Last seen: 4 months

Hi!

I can confirm the issues with musor.tv, and I can add a really strange thing: while most of the channels are indexed/grabbed with the proper time/EIT (that is, with the correct hour), a couple of channels were grabbed/indexed with an incorrect date. The best example is Galaxy 4: I tried to grab it multiple times, but the result was the same (see attached picture). According to the grabbed data the program is supposed to air tomorrow at the grabbed time, but according to the website (musor.tv) it actually airs today at that time.
I didn't modify the site ini, as the other channels/dates/times from the site are OK, and the backend software doesn't adjust the EPG time either.

I've never encountered an issue like this until now (not with other site inis either).

Setup:
- WebGrab+Plus/w MDB & REX Postprocess -- version V5.1.3 beta
- Debian 11

- TvHeadend running under OSMC (Vero 4k+)

Blackbear199

try this.
it's a version i have, different from the one in siteini.pack

Edit: Updated Feb 11 2024

* @Revision 17 - [24/12/2023] Blackbear199
* details title fix for random index_urlshow failures
* change start time to UTC

Attachments: 
PinChin

Hi!

Thx, I'll try it @ the next run and will report back!

UPDATE: For some reason (using your files) ALL of the channels just gave back 'no shows on indexpage', BUT after copying back the original (siteini pack) files, it started grabbing properly again (except for the original issue).

Blackbear199

works fine for me with V5.1.3 and linux.

edit the original ini and on the site {xxx} line change firstshow=now to firstshow=1
that's what is causing the data to be shifted by one day.

i checked a bunch of channels and the firstshow=x setting is probably not even needed, but it's safer to have it as the site uses a start time only, with no date part.

PinChin

Thx for the info. I'll check it later, as at the moment I'm facing an issue with my donation (meaning my 'donator' status has just disappeared).

Blackbear199

i just checked and you're good till 2025, so not sure what happened. i sent a msg to jan to fix it.

fabatka

Hello!

Grabbing does not work for me with your new config on musor.tv

WG version 5.1.3
Alpine Linux v3.18

Blackbear199

works fine for me on windows and linux (ubuntu).
upload your wg config (remove the license info).

fabatka

added in previous post

Blackbear199

no idea why it's not working.
pinchin had the same problem.
did you try changing the firstshow= setting in the original ini as i said above?
mtv europe has the wrong site_id="xx", it should be site_id="MTV_EURO"
that's why it's giving the 404 error. other than that the rest looks fine.

Blackbear199

looking at your config i noticed you're using a different user agent than i do.
try the one i use:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36 Edg/117.0.2045.47

fabatka

added firstshow=1, but it doesn't work

Attachments: 
PinChin

I've noticed 2 things in your config file that are a bit off:

1. You're not using any delays (channel, show etc.) when scraping, which is recommended (min. 2 sec for channel/show etc.). Also increase the timeout to 20. Grabbing too fast on musor.tv can lead to an IP ban.

2. You're not using the suggested user agent (as written in musor.tv.ini).

Edit: Blackbear199 was faster than I can type on my phone :P

fabatka

after changing the user-agent it worked
Thanks Blackbear199

fabatka

To PinChin
Can you write me an example of what to insert into the musor.ini config?

PinChin

If you meant what to put in the 'musor.tv.ini' file: I wouldn't really touch that, but if you open it, you'll see in the 'Remarks' section of the header what is advised to change.

If your WebGrab++ config file is mainly for these few channels and you don't want to grab from another site as well, then I'd change this line (in the WebGrab++ config file):

time-out="10"

to this:

time-out="20" channel-delay="3" index-delay="3" show-delay="3"

With this You can (likely) avoid the IP ban.
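
If you'd rather script that change than edit the file by hand, this is a minimal Python sketch. It assumes the config has a settings root element with a single retry line in it; the sample string is a stripped-down stand-in, not a complete WebGrab++ config, and the function name slow_down is only illustrative.

```python
import xml.etree.ElementTree as ET

# Assumed minimal config: only the <retry> line matters here,
# a real WebGrab++ config file contains many more settings.
CONFIG = """<settings>
  <retry time-out="10">4</retry>
</settings>"""

def slow_down(config_xml, timeout="20", delay="3"):
    """Return config_xml with a longer time-out and delays set on <retry>."""
    root = ET.fromstring(config_xml)
    retry = root.find("retry")
    if retry is None:
        raise ValueError("no <retry> element found")
    retry.set("time-out", timeout)
    for name in ("channel-delay", "index-delay", "show-delay"):
        retry.set(name, delay)
    return ET.tostring(root, encoding="unicode")

updated = slow_down(CONFIG)
print(updated)
```

Keep a backup of the real file before writing anything back to it, since ElementTree does not preserve comments or the original layout.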

fabatka

added these settings

Attachments: 
PinChin

If you're changing the ini file, then also change the channel-delay and index-delay to 3.

BUT (as you might have realised by now) if you change the ini file, those settings are valid ONLY for the site the ini file belongs to, and during grabbing they override the corresponding settings in the WebGrab++ config file.

Blackbear199

glad you guys were able to figure it out.

changing the user agent to the mobile one mentioned in the remarks of the ini in siteini.pack is not really the ideal way.
it's a mobile device user agent.
the user agent set in the webgrab config is a global setting, meaning it's used for all site inis.
it may fix musor.tv but break other inis you use.

the more correct way to do this is to use channel grouping.
the documented configuration files on the downloads page explain this (it's in the webgrab config one).
basically you wrap your musor <channel lines inside <channels>xxxx</channels> tags in your webgrab config.
doing this lets you set specific settings for specific channels, like the user agent, retry and many other settings.
i edited my post above with a txt file with an example and a few small fixes (see the revision comments).

fabatka

why does this happen? nothing changes.
the same data is overwritten by the same data

( 8/178 ) MUSOR.TV -- chan. (xmltv_id=MTV 00s) -- mode Incremental
iiic
epg correction :
CHANGED show corrected,
show with ---- start = 11/11/2023 14:00:00 stop = 11/11/2023 15:00:00 title = Crazy In Love!
Replaces ----- start = 11/11/2023 14:00:00 stop = 11/11/2023 15:00:00 title = Crazy In Love!
c
epg correction :
CHANGED show corrected,
show with ---- start = 11/11/2023 15:00:00 stop = 11/11/2023 19:00:00 title = Non-Stop Y2Ks!
Replaces ----- start = 11/11/2023 15:00:00 stop = 11/11/2023 19:00:00 title = Non-Stop Y2Ks!
c
epg correction :
CHANGED show corrected,
show with ---- start = 11/11/2023 19:00:00 stop = 11/11/2023 22:00:00 title = 40 Worldwide Hits From The Boys!
Replaces ----- start = 11/11/2023 19:00:00 stop = 11/11/2023 22:00:00 title = 40 Worldwide Hits From The Boys!
c
epg correction :
CHANGED show corrected,
show with ---- start = 11/11/2023 22:00:00 stop = 12/11/2023 03:00:00 title = Get The Party Started!
Replaces ----- start = 11/11/2023 22:00:00 stop = 12/11/2023 03:00:00 title = Get The Party Started!
c
epg correction :
CHANGED show corrected,
show with ---- start = 12/11/2023 03:00:00 stop = 12/11/2023 04:00:00 title = Dancefloor Fillers!
Replaces ----- start = 12/11/2023 03:00:00 stop = 12/11/2023 04:00:00 title = Dancefloor Fillers!
c
epg correction :
CHANGED show corrected,
show with ---- start = 12/11/2023 04:00:00 stop = 12/11/2023 09:00:00 title = Non-Stop Y2Ks!
Replaces ----- start = 12/11/2023 04:00:00 stop = 12/11/2023 09:00:00 title = Non-Stop Y2Ks!
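
A quick way to check whether the "show with" and "Replaces" lines really are identical across a long log is a small Python sketch like the one below. It assumes the log keeps the two-line pairing shown above, and it can only compare what the log prints (start, stop and title); the function name noop_corrections is only illustrative.

```python
import re

LOG = """\
CHANGED show corrected,
show with ---- start = 11/11/2023 14:00:00 stop = 11/11/2023 15:00:00 title = Crazy In Love!
Replaces ----- start = 11/11/2023 14:00:00 stop = 11/11/2023 15:00:00 title = Crazy In Love!
"""

# Captures everything after the dashes on a "show with" or "Replaces" line.
ENTRY = re.compile(r"^(?:show with|Replaces) -+ (.*)$")

def noop_corrections(log_text):
    """Yield show descriptions whose 'show with' and 'Replaces' lines match."""
    details = [m.group(1).strip() for line in log_text.splitlines()
               if (m := ENTRY.match(line))]
    # Lines come in (show with, Replaces) pairs, in that order.
    for new, old in zip(details[0::2], details[1::2]):
        if new == old:
            yield new

same = list(noop_corrections(LOG))
print(len(same), "correction(s) replaced a show with identical printed data")
```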

PinChin

I think it's because your grabbing is in incremental mode, which mainly just corrects the already grabbed times

Blackbear199

could be a number of reasons,
e.g. corrupt data left over from the previous broken ini.

i would run a grab once using
<update>f</update>

then remove the f
<update></update>

update will now run in incremental mode (the default update mode, taken from the <channel update="x" setting of each channel).

you should see all dots (....), meaning no changes or corrections

update requested for - 1 - out of - 1 - channels for 1 day(s)
( 1/1 ) MUSOR.TV -- chan. (xmltv_id=AXN (HD)) -- mode Incremental
i.............

Summary for update of AXN (HD)
no changes, no update necessary !
unchanged shows inspected 13
total after update 13

mmario73
Offline
Donator
Joined: 2 months
Last seen: 2 months

Hi all,

I am using the latest 5.1.3 version of wg++ on Ubuntu 18.04. I don't have any issues with the port.hu portal, but I do with musor.tv.
I didn't want to open a new topic for my issue, because I saw that you commented here about musor.tv.
I get a 403 error, and I tried everything you suggested above, but nothing helped.
I changed the user agent and updated the ini from Revision 13 to Revision 16.
The error message I am getting is:

Job started at 11/02/2024 11:05:18
Checking License ..
For License request/update data, see WGLicense.log.txt
found: /home/hts/.hts/tvheadend/.wg++/./siteini.pack/Hungary/musor.tv.ini -- Revision 16
encrypted in 'new (V3)' mode
timezone=UTC+00:00 mapped with timezone_id "Atlantic/Canary"
found: /home/hts/.hts/tvheadend/.wg++/./siteini.pack/Misc/dummy.ini -- Revision 02
processing /home/hts/.hts/tvheadend/.wg++/guide.xml ...
Found existing channel (xmltv_id=Max 4) in the config file
Found existing channel (xmltv_id=Example) in the config file
....

i=index .=same c=change g=gab r=replace n=new

Group (0) :
update requested for - 2 - out of - 2 - channels for 1 day(s)
( 1/2 ) MUSOR.TV -- chan. (xmltv_id=Max 4) -- mode Force
i
error downloading page: Response status code does not indicate success: 403 (Forbidden).
Unable to update channel Max 4
Generic syntax exception:
message:
no index page data received from Max 4
unable to update channel, try again later
Existing guide data restored!
( 1/2 ) DUMMY -- chan. (xmltv_id=Example) -- mode Force
in

Summary for update of Example
missing shows added 0
changed shows updated 0
new shows added 1
unchanged shows inspected 0
total after update 1

Job finished at 11/02/2024 11:05:19 done in 1s

What am I doing wrong?

My WebGrab++.config.xml is below:

<?xml version="1.0"?>

guide.xml

rex

Mozilla/5.0 (Android 13; Mobile; rv:68.0) Gecko/68.0 Firefox/114.0

decrypt_userkey

To force a license update; replace this text with the letter f
on
4
0
f

Max 4
Example

Thank you in advance.

Blackbear199

403 forbidden can be caused by a number of things.

first, redownload the file in post 11, revision 17

second, i would upgrade to V5.1.4.2
https://github.com/SilentButeo2/webgrabplus-siteinipack/blob/master/eval...

third..
please don't paste log data,
upload the entire file.
the same goes for your webgrab config, upload the entire file (remove username/password info).
everyone pastes it (i don't know why) but as you can see the forum messes up the tags.

mmario73

To answer my own question: I got banned.
I connected to my host over WireGuard and tried to load the page in Firefox on a Windows PC, and I got:

"Forbidden
You don't have permission to access this resource."

Does anyone know how long the ban lasts?
Is there any other way to bypass the ban?

Blackbear199

wait 15 min and try again.

edit the ini and increase the delays in the retry= part of the site {xxx} line.
show-delay is the one most likely to cause the ban,
next would be index-delay,
and lastly channel-delay.

the current settings work fine for me.
i just tried a channel and grabbed 14 days and didn't get banned.

mmario73

Hi,

Thank you very much for your quick support and for providing a link to the new version of the software as well as the site ini.
I tried again after 20 minutes and I'm still banned, so I will try again in a few hours.
I will apply your recommended settings.
