Problems with replacing back slash with forward slash

khanhkronos

Hi, apologies if this is a basic question. Say, for example, I have this temp value storing a URL:

index_temp_1.modify{set|"http://something.com/v\attachfiles_3422\3sthumdai\44dsdsd.jpg"}

and I want to change the backslashes to forward slashes. How can I do it? I've tried

index_temp_1.modify{replace|\|/}

but the debugger refuses to acknowledge that there are two expressions. I guess it got confused between \ and |?

I also tried {replace|\\|/}, but that didn't do the trick either.

khanhkronos

Alright, after some more trial and error, I think I found the answer. Just modify the command like this:

index_temp_1.modify{replace(type=regex)|\\|/}

and all backslashes become forward slashes. Is this the only way to do it, though?
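For readers who want to see why the regex form behaves this way, here is a minimal sketch in Python. It is only an analogy: it uses Python's `re` module rather than WebGrab's own regex engine, and the variable names are made up for illustration. In a regex pattern, `\\` is an escaped backslash and therefore matches exactly one literal `\`.

```python
import re

# The URL from the first post, written as a raw string so the
# backslashes are kept literally.
url = r"http://something.com/v\attachfiles_3422\3sthumdai\44dsdsd.jpg"

# The regex pattern \\ (an escaped backslash) matches one literal backslash.
fixed = re.sub(r"\\", "/", url)

print(fixed)
```

This mirrors what the (type=regex) line does, under the assumption that WebGrab hands expression-1 to a regex engine with the usual escaping rules.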

Blackbear199

This is just one of those WebGrab quirks you learn by trial and error (trust me, you will find a few more).

I can explain why your first method doesn't work:

index_temp_1.modify {replace|\|/}

As you know, WebGrab uses the | (vertical pipe) as the separator between expression-1 (the \ in this case) and expression-2 (the /).

index_temp_1.modify {replace|expression-1|expression-2}

So when you write \| as expression-1, you are telling WebGrab to replace a vertical pipe, not a \, because the \ acts as an escape, just as it does in regex.

In other words, WebGrab interprets it wrong.

This all comes down to WebGrab using the vertical pipe to separate the expressions.

WebGrab also uses the vertical pipe as the separator for multi-value elements.

So when you see actors, for example, in the guide.xml:

<actor>name 1</actor>

<actor>name 2</actor>

etc.

WebGrab stores them internally (before the xml file is written) in this format:

name 1|name 2|name 3|etc

The vertical pipe is used as the separator.

There are times when scrubbing or modifying data that you may want to replace these pipes to make the element non-multi-value.

You would do this much like your first attempt:

index_actor.modify {replace|\||##}

Result:

name 1##name 2##name 3##etc

Look closely at my expressions:

expression-1 is \|

expression-2 is ##

separated by a vertical pipe.
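As a rough illustration of what that replace achieves, here is a small Python sketch. The variable names are hypothetical and this is not WebGrab code; it only models a multi-value element as a pipe-separated string, as described above.

```python
# Internal representation as described: values joined with a vertical pipe.
stored = "name 1|name 2|name 3"

# Splitting on the pipe gives the individual values, one per <actor> element.
actors = stored.split("|")

# Replacing the pipe with ## leaves a single string with no separator in it,
# so it would be treated as one value instead of three.
flattened = stored.replace("|", "##")

print(actors)
print(flattened)
```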

Back to your attempt: one would think that escaping the \ would fix this:

index_temp_1.modify {replace|\\|/}

It doesn't. It just adds to the confusion, because WebGrab now thinks expression-1 is a \ followed by an escaped | (vertical pipe).

In this case your replace line is never valid, because the vertical pipe is never interpreted as the separator between the two expressions.

It works with (type=regex) as an argument because WebGrab then understands the first \ as an escape for the second, and the vertical pipe is interpreted as the separator between the two expressions.
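The non-regex behavior described above can be mimicked with a toy model in Python. This is an assumption about how the splitting behaves, based only on the explanation in this thread, not WebGrab's actual parser; `split_args` is a hypothetical helper. The rule modeled here: a | directly preceded by a \ counts as an escaped pipe, and every other | is a separator.

```python
import re

def split_args(args: str) -> list:
    """Split on every | that is not directly preceded by a backslash."""
    return re.split(r"(?<!\\)\|", args)

# The three argument strings discussed in this thread.
for args in (r"\|/", r"\\|/", r"\||##"):
    parts = split_args(args)
    print(args, "->", parts, f"({len(parts)} expression(s))")
```

Under this model, only the last string yields two expressions, which matches the observation that {replace|\|/} and {replace|\\|/} are both rejected while {replace|\||##} works.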


khanhkronos

Thank you for the detailed explanation! I suspect the problem is exactly what you've described. I was also thinking along the lines of WebGrab misreading the symbols, but couldn't find a way to correct it until I messed around with regex. Anyway, that is solved now.

I have another question, though I'm not sure whether I should open a new topic for it. The website I am scraping has only one central show page listing all the programs, rather than the usual href links embedded in the schedule, each pointing to a show's own page. The problem is that not every show on the schedule appears on that one show page, so I was only able to match the index_title with the show titles that are there. This leads to the classic (?) title issue in the xml file for the shows that are missing. Is there any way to force-disable the index title and show title check within WebGrab?

Blackbear199

In situations like this (if I understand you correctly), the easiest thing to do is NOT scrub the details title, since some shows don't have one and that leads to the (?) as you said.

Instead, let it use the index title:

title.modify {addstart|'index_title'}

So there is no title.scrub line at all, just the one above.


khanhkronos

Ah, yes...stupid me. That fixed the problem! Thanks again!

