Help needed..unable to scrub "part" of this line !

Fri, 2015-10-23 22:35

karimf

Hello again :)

I've been trying to scrub part of the following line to use as the productiondate of the show.

The line is:

...Of Reason: Renée Zellweger is back, but still torn between steady Colin Firth and slimy Hugh Grant. Will it be wedding cake or comfort ice cream for bumbling Bridget? (2004)(104 mins)

I want to scrub the 2004 part. This is my regex, which is not working:

temp_1.scrub {regex(debug)||\s*?\d{4}\s||}

productiondate.modify {addstart|'temp_1'}

The log says "not match found".

Could anyone give me some guidance?

Thanks.

Sat, 2015-10-24 05:20

francis

2 answers:

1. http://regexpal.com/

2. [^>]*?\([12]\d{3}\)

Things to know:

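\s matches whitespace characters. In the line you posted the year is followed by ")" and not by whitespace, so yours could never work.

[^>] matches any single character except >

*? means: take 0 or more of the preceding item, as few as possible (lazy)

[12]\d{3} means any 4-digit number starting with 1 or 2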
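
A minimal sketch of the points above, using Python's re module as a stand-in for the WG++ regex engine. That substitution is an assumption: the engines may differ in details, though these basic constructs should behave the same way. The sample line is the one from the first post.

```python
import re

line = ("...Of Reason: Renée Zellweger is back, but still torn between steady "
        "Colin Firth and slimy Hugh Grant. Will it be wedding cake or comfort "
        "ice cream for bumbling Bridget? (2004)(104 mins)")

# Original attempt: four digits followed by a whitespace character.
# In this line "2004" is followed by ")", so nothing matches.
original = re.search(r"\s*?\d{4}\s", line)

# Suggested pattern: lazily skip non-">" characters up to "(YYYY)".
suggested = re.search(r"[^>]*?\([12]\d{3}\)", line)

print(original)                 # None
print(suggested.group(0)[-6:])  # (2004)
```

Note that the suggested pattern has no capture group yet, so the match covers everything from the start of the line up to and including "(2004)".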

Sat, 2015-10-24 14:01

karimf

Thanks again, Francis, for always helping and giving your time.

1. I use the site you mentioned, but some things are beyond my limited knowledge :) I'm learning, though.

2. The expression you gave me doesn't get a match. Here's an example from the log file:

[ Debug ] No Production date found in:

[ Debug ] Debugging information SiteIni
[ Debug ] Element: TEMP_3
[ Debug ] html source written to : C:\ProgramData\ServerCare\WebGrab\html.source.htm
[ Debug ] scrub strings:
[ Debug ] type & arguments : regex(debug)
[ Debug ] regex_expression : [^>]*?\([12]\d{3}\)
[ Debug ] !! No match group definition () in :[^>]*?\([12]\d{3}\)
[ Debug ] Found 1 top level un-grouped match(es):
[ Debug ] Paroled US Army ranger Nicolas Cage becomes trapped on a hijacked prison plane. OTT action with Steve Buscemi and John Malkovich among the cons. (1997)
[ Debug ] Element Value(s) :
[ Debug ] ----------begin--element----------
Paroled US Army ranger Nicolas Cage becomes trapped on a hijacked prison plane. OTT action with Steve Buscemi and John Malkovich among the cons. (1997)
[ Debug ] ----------end----element----------

It seems it stops after the 4 digits are found instead of scrubbing only the 4 digits (the original has about 5 more words after these 4 digits).

4. I tried this regex and it worked:

productiondate.scrub{regex(debug)||.+?(\d{4})||}

What do you think? Is it robust enough?

Thanks again.

Sat, 2015-10-24 15:22

francis

2. For me it seems to work correctly. Because you did not define a group in your regex, it grabs the whole match. So just define a group around the year, and WG++ will return only the contents of the group. You can see this because WG++ warns you about it with:
!! No match group definition () in

So just change

<p class=\"bubble-programme-description description\">[^>]*?\([12]\d{3}\)
into
<p class=\"bubble-programme-description description\">[^>]*?\(([12]\d{3})\)
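
To illustrate the difference the extra parentheses make, here is a small sketch in Python's re module (an assumption: it is only a stand-in for the WG++ engine). The description text comes from the debug log above; the surrounding <p> tag, the closing </p> and the trailing words are hypothetical placeholders for the real page source.

```python
import re

# Description from the debug log; tag and trailing words are hypothetical.
html = ('<p class="bubble-programme-description description">'
        'Paroled US Army ranger Nicolas Cage becomes trapped on a hijacked '
        'prison plane. OTT action with Steve Buscemi and John Malkovich '
        'among the cons. (1997) more words here</p>')

prefix = r'<p class=\"bubble-programme-description description\">'
ungrouped = re.search(prefix + r'[^>]*?\([12]\d{3}\)', html)
grouped = re.search(prefix + r'[^>]*?\(([12]\d{3})\)', html)

# Without a group, only the whole match is available: it runs from the
# <p> tag up to and including "(1997)".
print(ungrouped.group(0).endswith("(1997)"))  # True

# With parentheses around the year, group 1 holds just the four digits.
print(grouped.group(1))  # 1997
```

The match itself is identical in both cases; the group only changes which part of it is handed back.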

4. Well, I don't know if there are other blocks after the description. If so, it is risky, because it could grab something like
goto(2100)
that is occurring after the description

Also, yours will catch "this show is about the 2000 people ..."
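
Both risks can be reproduced with a short sketch, again using Python's re module as an assumed stand-in for the WG++ engine. The sample strings are hypothetical; only the quoted phrase and goto(2100) come from this post.

```python
import re

loose = r".+?(\d{4})"
strict = r"[^>]*?\(([12]\d{3})\)"
prefix = r'<p class=\"bubble-programme-description description\">'

# Case 1: a four-digit number inside the description text itself.
text = "this show is about the 2000 people who built the bridge. (1998)"
loose_text = re.search(loose, text).group(1)
strict_text = re.search(strict, text).group(1)
print(loose_text)   # 2000
print(strict_text)  # 1998

# Case 2: no year in the description, but a later block contains digits.
html = ('<p class="bubble-programme-description description">'
        'A show with no year.</p><script>goto(2100)</script>')
loose_html = re.search(loose, html).group(1)
strict_html = re.search(prefix + strict, html)
print(loose_html)   # 2100
print(strict_html)  # None
```

In the second case the anchored pattern finds nothing because [^>] cannot cross the ">" that closes the description, so it never reaches the later block.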

Sat, 2015-10-24 16:28

karimf

Thanks again, Francis, for giving a hand to everybody here as usual :)

I guess your modified regex is better, of course. At least it will not catch wrong numbers like the ones in the examples you gave.

There are no blocks after the description, but I am going to use your regex anyway; as usual, it is better than what I came up with :)

Thanks again.
