How can I implement this?
I am using this tool with a bunch of proxy strings:
alpkeskin/rota: A high-performance proxy rotation engine with automated IP management and real-time health monitoring
I set it to time-based rotation at 30 seconds, so it rotates the proxy every 30 seconds. The issue is that some of the IPs I have are blocked, so when a request returns 403, WebGrab doesn't retry but skips that channel.
Is there a way to retry until it receives an index page, instead of skipping the channel entirely?

Had a quick look.
It could be tricky using a timed rotation. The proxy tool has no way to know that WebGrab failed and that it should rotate the proxy.
In 30 seconds WebGrab could go through all of its retries, and the channel gets skipped as you pointed out.
Try setting your time-out limit higher, to something like 30, the same as the proxy rotation.
Maybe even increase your retries to 6.
You can set this in your WebGrab config, but that is a global setting that applies to every ini you use, which you may not want.
Retry settings are explained here:
WebGrab++.config.xml | WebGrab+Plus
Really you only want it on the problematic sites.
I would edit their inis and add it to the site {xxx} line, or even add a new site {xxxx} line with custom retry settings just for that site.
A retry line may also already exist in the ini, so look for that and just edit it.
To add a new line, use the same format as the retry line in your WebGrab config, but with retry= at the beginning.
Example:
site {retry=<retry channel-delay="0" index-delay="0" show-delay="0" time-out="30">6</retry>}
The problem is with getting the index page
error downloading page: Response status code does not indicate success: 400 (Bad Request).
Unable to update channel Showtime FamilyZone (East) HDTV (SHOFe) [3077]
Generic syntax exception:
message:
no index page data received from Showtime FamilyZone (East) HDTV (SHOFe) [3077]
unable to update channel, try again later
Existing guide data restored!
Does this have a retry?
If the index page fails, it skips the channel without a retry.
If the index page fails, WebGrab stops.
It's just the way it works.
Error 400 Bad Request is not a block; the request URL has an issue or something.
Check your channel list, I don't see that channel.
My list shows it with a different channel number, but the callsign should still match. Mine shows it as SHOWFAMHD, different from what yours shows.
I'd say your list is old.
I generated that fresh channel list from a different provider. Basically I have DTV, SLING and DISH channel lists, and I am checking which one will work.
I was just using it as an example, to ask whether it is possible to retry a 403 or 400 on the index page. Let's say I have a working proxy, but within a few minutes TVGUIDE blocks it. That's why I set the rotation to 30 seconds, so it won't get stuck retrying in instances where TVGUIDE blocks the IP.
So my only concern now is the 403 on the index page, as that can't be retried and the channel is skipped.
I hope in the future you support retries on the index page. If that is added, TVGUIDE + rotating proxy will be great.
Are you using delays with tvguide?
I can grab data without being blocked. It's slow, but it works.
I don't think your approach of trying to grab data faster with a rotating proxy is a good idea.
You're shooting yourself in the foot on purpose and expecting a proxy to fix it.
Also, if you're using V5.3, the delay setting does not apply to the subdetails page.
This alone will most certainly trigger a block.
For each show on tvguide there can be up to 2 requests besides the index page:
1. details page - has the regular show details, like description, category, years, country, etc.
2. subdetails page - for tvguide this contains the credits: director, actor, etc.
If you really don't need the credits, disable the urlsubdetail lines.
V5.5 has just been released; it has the delay enabled for the subdetails page.
The details page and subdetails page work fine.
I didn't encounter any issues, as any 403 is solved on the next retry when the proxy changes.
With regard to the delay:
<retry channel-delay="10" index-delay="10" show-delay="0" time-out="10">50</retry>
I have this at the top.
Zero for show-delay will definitely cause a ban.
Are these the delays from your WG config?
tvguide.com.ini also has a retry line, which will override these settings.
I think a more realistic setting would be:
channel-delay 5
index-delay 5
show-delay 5
time-out 30
retries 4
50 retries is kinda harsh, don't you think?
Do you really want WebGrab to try that many times? Do the math:
50 retries x 10 sec time-out = 500 seconds, or just over 8 minutes per channel.
So if a channel keeps failing (timing out), it could take 8 minutes or more before WebGrab moves on to the next channel.
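The arithmetic above can be written out as a small Python sketch. It is a rough upper bound only: it assumes every attempt runs into the full time-out and ignores the channel, index and show delays. The function names and the 30-second rotation interval default are illustrative, not part of WebGrab or the proxy tool.

```python
def worst_case_seconds(retries: int, time_out: int) -> int:
    """Upper bound on time spent on one failing request,
    assuming every retry runs into the full time-out."""
    return retries * time_out


def rotations_during(seconds: int, rotate_every: int = 30) -> int:
    """How many timed proxy rotations fit into that window
    (30 s is the rotation interval mentioned in this thread)."""
    return seconds // rotate_every


# Compare the posted settings (50 retries, 10 s time-out)
# with the suggested ones (4 retries, 30 s time-out).
for retries, time_out in [(50, 10), (4, 30)]:
    total = worst_case_seconds(retries, time_out)
    print(
        f"{retries} retries x {time_out}s time-out = {total}s "
        f"(~{total / 60:.1f} min), "
        f"{rotations_during(total)} proxy rotations in that time"
    )
```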
Also, as I said, you need V5.5. Only it has the show-delay enabled for the subdetails pages, which tvguide uses.
Any version below this doesn't have it and will surely cause a ban when a bunch of movies or series in a row have credits (director, actor, etc.).
The 50 retries are there to ensure the rotating proxy gets used.
If I set 4 retries and the proxy rotates after 30 seconds, it will skip that channel, right? So the proxy is basically useless with a low retry value.
I am running this from cron, so the duration of the process is not an issue, and I am getting 2 days of EPG every day, so there is also a buffer.
The only real issue I have right now is the index page, as there is no retry function there, unlike the details part where I can just set a higher retry value.
If you used:
time-out 30
retries 4
30 x 4 = 120 seconds
the proxy should rotate 4 times.
Also remember that after WebGrab triggers a retry and then starts grabbing data again, the retry count is reset.
So it's not 4 retries for the entire channel grab, it's 4 each time a retry is triggered.
If it triggers 10 times during a channel grab, then each time gets 4 retries.
The only way WebGrab skips the channel is when all 4 retries fail in a row.
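The reset behaviour described above can be illustrated with a toy model in Python. This is only a sketch of the description in this post, not WebGrab's actual code. It assumes the first failed request triggers the retry sequence and is not itself counted as one of the retries. The function name and the list-of-booleans input are made up for the illustration.

```python
def channel_skipped(outcomes, max_retries=4):
    """Toy model: `outcomes` is a sequence of booleans, one per request
    attempt (True = the request succeeded).

    A success resets the counter. The channel is skipped only when the
    triggering failure is followed by `max_retries` failed retries in a row.
    """
    failures_in_a_row = 0
    for ok in outcomes:
        if ok:
            failures_in_a_row = 0  # retry count is reset after a success
            continue
        failures_in_a_row += 1
        # 1 triggering failure + max_retries failed retries
        if failures_in_a_row > max_retries:
            return True
    return False


# Ten separate retry episodes, each recovering on the last retry:
recovering = ([False] * 4 + [True]) * 10
print(channel_skipped(recovering))   # many failures overall, never 4 failed retries in a row

# One episode where every retry fails:
print(channel_skipped([True, True] + [False] * 5))
```

With a timed rotation, this is why the time-out multiplied by the retry count matters more than the total number of failures: the proxy only needs to change once inside a single retry episode.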
So let's say I am getting a channel.
1 proxy is loaded for 30 seconds. The 30 seconds is also to ensure the IP won't be blocked totally, because I am randomly picking the proxy from my pool.
After 30 seconds a new proxy is used while still on the same channel, so for 1 channel getting 2 days of EPG that is a minimum of 5-7 proxies in the first run. In later runs only 1 day is needed, as there is a buffer.
Hitting a 403 is inevitable in some instances: since I am picking the proxy randomly, the tool can reuse the same proxy within that 1 channel. The 50 retries will solve that.
Do you see the logic behind it?
First, update to V5.5.
This is your killer, as no show delay for the subdetails page is a 100% ban trigger.
I don't get banned, and I grabbed 10 channels for 7 days in testing.
If you get the delays tuned in, you won't need the proxy.
Your biggest issue, as I said, is not using V5.5.
Before this I posted in the forums for users to disable the subdetails page.
With V5.5 you won't have to.
Also, unless your proxy list contains 1000s of proxies, you're going to run into trouble using a proxy.
It doesn't take long to get banned without delays. I think it might make it through one channel, maybe?
I forget how long they ban the IP for. Hours? Days?
So say you had 20 proxies: they will all get banned, and you will be rotating through nothing but banned proxies.
Okay, I will try to find an optimized config.


____________________________________
This is what happens sometimes: there are IPs that are already blocked in the first run.
So I need to wait a few seconds and check manually whether the proxy has changed. After I confirm the change, I can run it again just fine.
But if you can add that retry feature at the index level, that would be great for problematic sites with rate limiting.
I cannot see Jan considering this.
The problem I see is that there is no way to tell if the 403 Forbidden is coming from an IP ban or if it's really forbidden, like geo-blocking, etc.
I don't think he will go for wasting resources doing retries for nothing.
If he did consider it, he may as well ignore the response code, and if no data is received just keep retrying x number of times regardless of what the problem is.
That I know he won't go for.
Does WebGrab use something like a backend API for some sites, or is it completely local resources?
If it's all local, then that's fine, right? As long as the default is 0 retries and it's configurable, it is up to us.
Your screenshots don't really mean anything.
It's not a true test.
Notice your last screenshot:
i.....................
This means you're running in incremental mode, so only the index page is downloaded, no details or subdetails pages.
For a true test, run a force update to grab all new data, or delete your guide.xml.
Depends on the site.
Yes, some have an API and some are all HTML code.
Okay.
I'll try to show a real test.
I'll drop a video file here when it starts to 403.
To run the test in force mode, set <update>f</update> (f = full or force) as indicated here:
https://webgrabplus.com/documentation/configuration/webgrabconfigxml#con...
In the log you will see mode Force, for example:
[ Info ] ( 1/38 ) SKY.DE -- chan. (xmltv_id=Sky Sport Austria 1 HD) -- mode Force
Solved this.
I decided to break the process when it fails on the index page and retry using Python.
That is either temporary or permanent, depending on whether the dev decides to add a retry function for the index page.
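For anyone wanting to try the same workaround, here is a minimal Python sketch of the idea. It is not the poster's actual script. It assumes the failure can be spotted by the "no index page data received" message from the log earlier in this thread. The function names, the 10-attempt limit and the 30-second wait (chosen to match the proxy rotation interval) are all illustrative.

```python
import subprocess
import time

# Message taken from the log output quoted earlier in the thread.
INDEX_FAILURE = "no index page data received"


def run_once(command):
    """Run the grabber and stop it as soon as an index failure is logged.

    Returns True if the process finished without the failure message.
    """
    with subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    ) as proc:
        for line in proc.stdout:
            if INDEX_FAILURE in line:
                proc.terminate()  # break the run instead of letting it skip the channel
                return False
    return True


def retry_until_index_ok(command, max_attempts=10, wait_seconds=30):
    """Re-run the grabber until no index failure is seen.

    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if run_once(command):
            return attempt
        if attempt < max_attempts:
            time.sleep(wait_seconds)  # give the timed rotation a chance to switch proxy
    return None
```

Usage would be something like retry_until_index_ok(["./run.net.sh"]), where the command is a placeholder for however you launch WebGrab on your system. Each retry restarts the whole run, so this relies on the incremental mode mentioned above to avoid re-downloading data that was already grabbed.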
What does your working configuration look like?