ROTATING PROXY TVGUIDE

onceyoda
Offline
Donator
Joined: 2 years
Last seen: 3 days

How can I implement this?
I am using this tool with a bunch of proxy strings:
alpkeskin/rota: A high-performance proxy rotation engine with automated IP management and real-time health monitoring
 

I set it to time-based rotation at 30 seconds, so every 30 seconds it rotates the proxy. The issue is that some of my IPs are blocked, so when a request gets a 403, WebGrab doesn't retry but skips that channel.

Is there a way to retry until it receives an index page, rather than skipping the channel entirely?

Blackbear199
Offline
Has donated long time ago
Joined: 10 years
Last seen: 4 months

I had a quick look.

It could be tricky with a timed rotation: the proxy tool has no way of knowing that WebGrab failed, so it cannot rotate the proxy in response.

Within 30 seconds WebGrab could go through all of its retries, and the channel gets skipped, as you pointed out.

Try setting your time-out higher, to something like 30, the same as the proxy rotation interval.

Maybe even increase your retries to 6.

You can set this in your WebGrab config, but that is a global setting that applies to every ini you use, which you may not want.

The retry settings are explained here:

WebGrab++.config.xml | WebGrab+Plus

 

Really, you only want it on the problematic sites.

I would edit their inis and add it to the site {xxx} line, or even add a new site {xxxx} line with custom retry settings just for that site.

A retry line may already exist in the ini, so look for that first and just edit it.

A new line uses the same format as the retry line in your WebGrab config, but with retry= at the beginning.

Example:

site {retry=<retry channel-delay="0" index-delay="0" show-delay="0" time-out="30">6</retry>}

 

onceyoda
Offline
Donator
Joined: 2 years
Last seen: 3 days

The problem is with getting the index page

error downloading page: Response status code does not indicate success: 400 (Bad Request).
Unable to update channel Showtime FamilyZone (East) HDTV (SHOFe) [3077]
Generic syntax exception:
   message: 
no index page data received from Showtime FamilyZone (East) HDTV (SHOFe) [3077]
unable to update channel, try again later
Existing guide data restored!

Does this have a retry?
If the index page fails, it skips the channel without retrying.

Blackbear199
Offline
Has donated long time ago
Joined: 10 years
Last seen: 4 months

If the index page fails, WebGrab stops.

That's just the way it works.

Error 400 (Bad Request) is not a block; it points to a problem with the request URL or something similar.

Check your channel list. I don't see that channel.

My list shows it with a different channel number, but the callsign should still match; mine shows SHOWFAMHD, which is different from what yours shows.

I'd say your list is old.

 

 

onceyoda
Offline
Donator
Joined: 2 years
Last seen: 3 days

I generated that channel list fresh from a different provider. I have DTV, SLING and DISH channel lists, and I am checking which one works.
I was only using it as an example, to ask whether it is possible to retry a 403 or 400 on the index page. Say I have a working proxy, but within a few minutes TVGUIDE blocks it; that's why I set the rotation to 30 seconds, so WebGrab won't get stuck retrying when TVGUIDE blocks the IP.

So my only concern now is a 403 on the index page, since that can't be retried and the channel is skipped.

I hope that in the future you will support retries on the index page. If that is added, TVGUIDE + rotating proxy will be great.

Blackbear199
Offline
Has donated long time ago
Joined: 10 years
Last seen: 4 months

Are you using delays with tvguide?
I can grab data without being blocked; it's slow, but it works.
I don't think your approach of trying to grab data faster with a rotating proxy is a good idea.
You're shooting yourself in the foot on purpose and expecting a proxy to fix it.

 

Blackbear199
Offline
Has donated long time ago
Joined: 10 years
Last seen: 4 months

Also, if you're using V5.3, the delay setting does not apply to the subdetails page.

This alone will most certainly trigger a block.

For each show on tvguide there can be up to 2 requests besides the index page:
1. details page - has the regular show details, like description, category, year, country, etc.

2. subdetails page - for tvguide this contains the credits: director, actor, etc.

If you really don't need the credits, disable the urlsubdetail lines.

V5.5 has just been released, and it has the delay enabled for the subdetails page.

 

onceyoda
Offline
Donator
Joined: 2 years
Last seen: 3 days

The details page and subdetails page work fine.
I didn't encounter any issues, as any 403 is solved on the next retry when the proxy changes.

As for the delays, I have this at the top:
<retry channel-delay="10" index-delay="10" show-delay="0" time-out="10">50</retry>

Blackbear199
Offline
Has donated long time ago
Joined: 10 years
Last seen: 4 months

A show-delay of zero will definitely cause a ban.

Are these the delays from your WG config?

tvguide.com.ini also has a retry line, which will override these settings.

I think a more realistic setting would be:

channel-delay 5

index-delay 5

show-delay 5

time-out 30 

retries 4

50 retries is kind of harsh, don't you think?

Do you really want WebGrab to try that many times? Do the math:

50 retries x 10 sec time-out = 500 seconds, or just over 8 minutes per channel.

So if a channel keeps failing (timing out), it could take 8 minutes or more before WebGrab moves on to the next channel.
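
The arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes every attempt runs into the full time-out, it ignores the channel, index and show delays, and the function names are made up for illustration.

```python
def worst_case_seconds(retries: int, time_out: int) -> int:
    """Upper bound on the wait if every attempt hits the full time-out."""
    return retries * time_out


def as_minutes(seconds: int) -> str:
    """Format a number of seconds as 'M min S s'."""
    minutes, rest = divmod(seconds, 60)
    return f"{minutes} min {rest} s"


# (retries, time-out) pairs taken from the settings discussed in this thread.
for retries, time_out in [(50, 10), (4, 30)]:
    total = worst_case_seconds(retries, time_out)
    print(f"{retries} retries x {time_out} s time-out = {total} s ({as_minutes(total)})")
```

With the values from the thread this prints 500 s (8 min 20 s) for 50 retries at a 10 second time-out, and 120 s (2 min 0 s) for 4 retries at a 30 second time-out.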

Also, as I said, you need V5.5; only it has the show-delay enabled for subdetails pages, which tvguide uses.

Any version below this doesn't have it and will surely cause a ban on a run of movies or series in a row that have credits (director, actor, etc.).

 

onceyoda
Offline
Donator
Joined: 2 years
Last seen: 3 days

The 50 retries are there to ensure the rotating proxy actually gets used.
If I set 4 retries and the proxy rotates after 30 seconds, it will skip that channel, right? So the proxy is basically useless with a low retry value.

I am running this from cron, so the duration of the process is not an issue, and I grab 2 days of EPG every day, so there is also a buffer.
The only real issue I have right now is the index page, as there is no retry function there, unlike the details pages, where I can just set a higher retry value.

 

Blackbear199
Offline
Has donated long time ago
Joined: 10 years
Last seen: 4 months

If you used:

time-out 30

retries 4

30 x 4 = 120 seconds

the proxy should rotate 4 times.
Also remember that after WebGrab triggers a retry and then starts grabbing data again, the retry count is reset.

So it's not 4 retries for the entire channel grab; it's 4 each time a retry is triggered.

If a retry is triggered 10 times during a channel grab, each time gets 4 retries.

The only way WebGrab skips the channel is when all 4 retries fail in a row.
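
The reset behaviour described above can be modelled in a few lines of Python. This is a simplified model, not WebGrab's actual code: the function name is hypothetical, and it assumes the failing request that triggers the retries is not itself counted as one of them, which the thread does not state.

```python
from typing import Iterable


def channel_is_skipped(results: Iterable[bool], retries: int = 4) -> bool:
    """Return True if the channel would be skipped under this model.

    `results` is the sequence of request outcomes (True = success).
    Assumption: one failing request triggers up to `retries` further
    attempts, and any success resets the counter.
    """
    consecutive_failures = 0
    for ok in results:
        if ok:
            consecutive_failures = 0  # a successful request resets the count
        else:
            consecutive_failures += 1
            # the initial failure plus all `retries` retries have failed
            if consecutive_failures > retries:
                return True
    return False


# Ten separate trouble spots, each recovering on the third retry: not skipped.
flaky = [False, False, False, True] * 10
print(channel_is_skipped(flaky))

# One trouble spot where the request and all 4 retries fail: skipped.
dead = [True, False, False, False, False, False]
print(channel_is_skipped(dead))
```

The first call prints False and the second prints True: the total number of failures matters less than how many happen in a row.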

 

onceyoda
Offline
Donator
Joined: 2 years
Last seen: 3 days

So let's say I am grabbing a channel.
One proxy is loaded for 30 seconds. The 30 seconds is also there to make sure the IP doesn't get blocked completely, because I am picking proxies randomly from my pool.

After 30 seconds a new proxy is used, still on the same channel. So for 1 channel, getting 2 days of EPG takes a minimum of 5-7 proxies on the first run; later runs only need 1 day, since there is a buffer.
It will inevitably hit a 403 at some point, because I am picking proxies randomly and the tool can use the same proxy again within that 1 channel. The 50 retries solve that.

Do you see the logic behind it?

 

Blackbear199
Offline
Has donated long time ago
Joined: 10 years
Last seen: 4 months

First, update to V5.5.

This is your killer: no show delay on the subdetails page is a 100% ban trigger.

I don't get banned, and I grabbed 10 channels for 7 days in testing.

If you get the delays tuned, you won't need the proxy.

Your biggest issue, as I said, is not using V5.5.

Before this release I posted in the forums telling users to disable the subdetails page.

With V5.5 you won't have to.

 

Blackbear199
Offline
Has donated long time ago
Joined: 10 years
Last seen: 4 months

Also, unless your proxy list contains thousands of proxies, you're going to run into trouble.

It doesn't take long to get banned without delays; I think it might make it through one channel, maybe?

I forget how long they ban the IP for. Hours? Days?

So say you had 20 proxies: they will all get banned, and you will be rotating through nothing but banned proxies.

 

 

onceyoda
Offline
Donator
Joined: 2 years
Last seen: 3 days

Okay, I will try to find an optimized config.
____________________________________

This is what happens sometimes: some IPs are already blocked on the first run.

So I need to wait a few seconds and check manually whether the proxy has changed. Once I have confirmed the change, I can run it again just fine.

But if you could add a retry feature at the index level, that would be great for problematic sites with rate limiting.

Blackbear199
Offline
Has donated long time ago
Joined: 10 years
Last seen: 4 months

I can't see Jan considering this.

The problem I see is that there is no way to tell whether a 403 Forbidden comes from an IP ban or is a genuine restriction such as geo-blocking, etc.

I don't think he will go for wasting resources on retries for nothing.

If he did consider it, he may as well ignore the response code entirely and, if no data is received, just keep retrying x number of times regardless of what the problem is.

That I know he won't go for.

 

onceyoda
Offline
Donator
Joined: 2 years
Last seen: 3 days

Does WebGrab use a backend API for some sites, or does it run entirely on local resources?
If it is all local, then that's fine, right? As long as the default is 0 retries and it is configurable, it is up to us.
 

Blackbear199
Offline
Has donated long time ago
Joined: 10 years
Last seen: 4 months

Your screenshots don't really mean anything.

It's not a true test.

Notice your last screenshots:

i.....................

This means you're running in incremental mode, so only the index page is downloaded, no details or subdetails pages.

For a true test, run a forced update to grab all new data, or delete your guide.xml.

 

Blackbear199
Offline
Has donated long time ago
Joined: 10 years
Last seen: 4 months

It depends on the site.

Yes, some have an API and some are all HTML.

 

onceyoda
Offline
Donator
Joined: 2 years
Last seen: 3 days

Okay, I'll try to show a real test.

onceyoda
Offline
Donator
Joined: 2 years
Last seen: 3 days

I'll drop a video file here when it starts to 403.

mat8861
Offline
WG++ Team member, Donator
Joined: 10 years
Last seen: 9 hours

To run the test in force mode, set <update>f</update> (f = full or force), as indicated here:

https://webgrabplus.com/documentation/configuration/webgrabconfigxml#con...

In the log you will see mode Force, for example:

[  Info  ] (   1/38  ) SKY.DE -- chan. (xmltv_id=Sky Sport Austria 1 HD) -- mode Force
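
If you want to check the mode from a script rather than by eye, a small Python sketch like the following could pull it out of a log line. It assumes the line ends with "-- mode <name>" as in the example above; the pattern and function name are made up for illustration, and the only mode name confirmed in this thread is Force.

```python
import re
from typing import Optional

# Assumed layout: the log line ends with "-- mode <name>".
MODE_PATTERN = re.compile(r"--\s*mode\s+(\w+)\s*$")


def grab_mode(log_line: str) -> Optional[str]:
    """Return the mode named at the end of a channel log line, or None."""
    match = MODE_PATTERN.search(log_line)
    return match.group(1) if match else None


line = "[  Info  ] (   1/38  ) SKY.DE -- chan. (xmltv_id=Sky Sport Austria 1 HD) -- mode Force"
print(grab_mode(line))
```

For the example line this prints Force.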

onceyoda
Offline
Donator
Joined: 2 years
Last seen: 3 days

Solved this.
I decided to break the process when it fails on indexing the page and retry using Python.

This is temporary, or permanent if the dev decides not to add a retry function for the index page.
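
The actual script was not posted, so the following is only a rough sketch of how such a wrapper might look, not the poster's solution. It reruns the grab whenever the output contains the "no index page data received" message seen earlier in this thread, and waits between attempts so a timed proxy rotation can take effect. The function names, the attempt limit, the launcher path and the assumption that the message shows up in the captured output are all hypothetical. Unlike the approach described above, it waits for each run to finish instead of breaking it off early.

```python
import subprocess
import time
from typing import Callable

# Message WebGrab logged in this thread when the index page failed.
INDEX_FAILURE_MARKER = "no index page data received"


def index_failed(output: str) -> bool:
    """True if the captured output contains the index-page failure message."""
    return INDEX_FAILURE_MARKER in output


def run_until_index_ok(
    run_grab: Callable[[], str],
    max_attempts: int = 5,
    wait_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call run_grab until its output shows no index failure.

    Returns True on the first clean run, False once max_attempts is used up.
    """
    for attempt in range(1, max_attempts + 1):
        if not index_failed(run_grab()):
            return True
        if attempt < max_attempts:
            # Wait so the timed rotation can switch to another proxy.
            sleep(wait_seconds)
    return False


def run_webgrab() -> str:
    """Hypothetical runner: the command below is a placeholder, not a real path."""
    completed = subprocess.run(
        ["/path/to/your/webgrab-launcher"],  # placeholder, replace with your own command
        capture_output=True,
        text=True,
        check=False,
    )
    return completed.stdout + completed.stderr
```

Calling run_until_index_ok(run_webgrab) would then start the grab up to five times. Setting wait_seconds to the proxy rotation interval (30 seconds in this thread) gives the next attempt a good chance of going out through a different IP. If WebGrab only writes the message to its log file, run_grab would need to read that file instead.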

EPGFreak
Offline
Donator
Joined: 6 years
Last seen: 3 weeks

What does your working configuration look like?
