**------------------------------------------------------------------------------------------------
* @header_start
* WebGrab+Plus ini for grabbing IMDB data from TvGuide websites
* @MinSWversion: V3.1.4.2
* - (postprocess V3.0.3)
* @Site: imdb.com, primary search with imdb
* @Revision 11 - [03/01/2021] Jan van Straaten
* - rewrite , based on imdb.com.ask rev 13
* @Remarks: none
* @header_end
**------------------------------------------------------------------------------------------------
site {url=imdb.com|mdbinitype=movie|cultureinfo=en-GB|charset=UTF-8|matchfactor=60|searchsite=imdb}
*
scope.range {(primarysearch)|end}
* primary search (using imdb's title search):
url_primarysearch {url|https://www.imdb.com/find?q=|'title'|&ref_=nv_sr_sm}
*url_primarysearch.modify {remove()|%28%29} * in case no productiondate
url_primarysearch.modify {replace()| |+}
url_primarysearch.modify {replace()|'|%27}
url_primarysearch.modify {replace()|:|%3A}
url_primarysearch.modify {replace()|https%3A|https:}
url_primarysearch.modify {replace()|;|%3B}
url_primarysearch.headers {customheader=Accept-Encoding=gzip,deflate}
mdb_show_id.scrub {regex()|primary||/tt(\d{7,8})/||}
mdb_show_id.modify {substring(type=element)|0 3}
* imdb url's:
url_mdb_p1 {url()|primary|https://www.imdb.com/title/tt|mdb_show_id|/}
url_mdb_p2 {url|primary|https://www.imdb.com/title/tt|mdb_show_id|/plotsummary}
url_mdb_p3 {url|primary|https://www.imdb.com/title/tt|mdb_show_id|/releaseinfo#akas}
url_mdb_p4 {url|primary|https://www.imdb.com/title/tt|mdb_show_id|/reviews}
url_mdb_p5 {url|primary|https://www.imdb.com/title/tt|mdb_show_id|/fullcredits#cast}
*url_mdb_p6 {url|primary|https://www.imdb.com/title/tt|mdb_show_id|/criticreviews?ref_=tturv_sa_5
*
url_mdb.headers {customheader=Accept-Encoding=gzip,deflate,br}
end_scope
*imdb elements
scope.range {(match)|end}
* musthaves
*mdb_title.scrub {single|p3| (original title)|||}* original title
mdb_title.scrub {single(separator=" - " exclude="IMDb" include=first)|p1|||(|} * OK
mdb_title.scrub {regex|p3||(.+?)||} *aka's *OK
mdb_title.modify {cleanup}
end_scope
scope.range {(getelements)|end}
mdb_actor.scrub {regex|p5||name/nm\d+?/\?ref_=ttfc_fc_cl_t\d+?\"\s>(.+?)\s+?.+?\s+?(.+?)\s+?||}
mdb_actor.modify {substring(type=element)|0 24} * keep the first 12 actors (double because still name|role format
mdb_actor.modify {remove(type=regex)|""}
mdb_actor.modify {replace|\||###}
mdb_actor.modify {replace(type=regex)|"(###\s{5,})\w+"| (role=}
mdb_actor.modify {replace|###|\|}
** the next is only necessary if actor still contains 9 or more spaces (after the role name)
** some p5 pages have a very simple actor listing without these spaces, so we add them
mdb_actor.modify {addend| }
mdb_actor.modify {substring(type=regex)|"\A\s(.+?\s\(role=.+?)\s{9}"}
mdb_actor.modify {remove|}
mdb_actor.modify {addend(not "")|)}
*
mdb_director.scrub {multi|p1|"director":|"name": "|"|\},}
mdb_director.scrub {multi|p5|/?ref_=ttfc_fc_dr|"\r> ||} * fulllist
mdb_director.modify {substring(type=element)|0 5} * keep the first 6
*
mdb_productiondate.scrub {single|p1||||}
mdb_category.scrub {multi|p1|genres&ref_=tt_ov_inf|>||}
mdb_category.modify {cleanup(removeduplicates)}
*
mdb_description.scrub {single|p1||}
mdb_starrating.scrub {single()|p1||itemprop="ratingValue">||}
mdb_starratingvotes.scrub {single|p1||based on|user ratings|}
mdb_country.scrub {regex|p1||Release Date:.+?\((.+?)\)||}
mdb_plot.scrub {single|p2|id="plot-summaries-content">|||}
mdb_commentsummary.scrub {multi(max=5 excludeblock="Warning: Spoilers")|p4|class="title" >|||}
mdb_review.scrub {multi(max=1 excludeblock="Warning: Spoilers")|p4|class="title" >|||}
mdb_review.modify {cleanup}
end_scope
*