David Bowley
24 Oct 2007, 11:59 AM
<?php
$url = "http://www.somedomain.com/curltest/testpage.html";
$ch = curl_init(); // initialize curl handle
curl_setopt($ch, CURLOPT_URL, $url); // set URL to fetch
curl_setopt($ch, CURLOPT_FAILONERROR, 1); // fail on HTTP status >= 400
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // allow redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return into a variable
curl_setopt($ch, CURLOPT_TIMEOUT, 3); // times out after 3s
$myHtml = curl_exec($ch); // run the whole process
curl_close($ch);
$pattern = "<HTML\b[^>]*>(.*?)</HTML>"
preg_match_all($pattern, $myHtml, $result, PREG_SET_ORDER);
echo $pattern;
?>
At the moment nothing is output. As a test of the regex, everything between the HTML tags on the fetched page should be printed, but nothing happens.
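Several things in the code above would each stop it from working, so what follows is a sketch of a corrected version rather than a definitive fix. The `$pattern` line has no closing semicolon, so PHP stops with a parse error before anything runs. PCRE patterns need delimiters; because the string starts with `<`, PHP treats `<` and `>` as a bracket-style delimiter pair and cuts the pattern off at the first `>`, so `preg_match_all()` fails with an "Unknown modifier" warning. Without the `s` modifier `.` does not match newlines, and without `i` the match is case-sensitive, so a lowercase `<html>` is missed. Finally, the script echoes `$pattern` rather than the matches. The helper names `extractHtmlInner` and `fetchPage` are hypothetical, chosen for illustration.

```php
<?php
// Hypothetical helper: returns whatever sits between <html ...> and </html>,
// or null when the page contains no such element.
function extractHtmlInner(string $html): ?string
{
    // '#' is the delimiter; 'i' = case-insensitive, 's' = '.' also matches newlines.
    $pattern = '#<html\b[^>]*>(.*?)</html>#is';
    if (preg_match($pattern, $html, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: fetches a URL with the same cURL options as the question
// and throws instead of silently returning false.
function fetchPage(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP status >= 400 as failure
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 3);           // give up after 3 seconds
    $body = curl_exec($ch);
    $error = curl_error($ch);
    curl_close($ch);
    if ($body === false) {
        throw new RuntimeException("cURL error: $error");
    }
    return $body;
}
```

With `$url` from the question, `echo htmlspecialchars(extractHtmlInner(fetchPage($url)) ?? 'no <html> element found');` prints the result. Escaping with `htmlspecialchars()` shows the extracted markup as text instead of letting the browser render it.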
Once I've got this sorted out, I want a different regex that extracts all links on a page whose anchor text contains a certain phrase.
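For the link extraction, one possible sketch uses a hypothetical `findLinksByAnchorText` helper (not an existing API) that collects the `href` of every `<a>` whose visible text contains the phrase, case-insensitively. It assumes quoted `href` values and no nested anchors; a regex is not an HTML parser and will miss unusual markup.

```php
<?php
// Hypothetical helper: returns the href of every <a> element whose visible
// text contains $phrase (case-insensitive). Assumes quoted href values and
// no nested <a> elements; a regex is not a full HTML parser.
function findLinksByAnchorText(string $html, string $phrase): array
{
    // Group 2 = href value (single or double quoted), group 3 = anchor content.
    $pattern = '#<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)</a>#is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $m) {
        // Drop inner tags such as <b> before comparing the visible text.
        $text = strip_tags($m[3]);
        if (stripos($text, $phrase) !== false) {
            $links[] = $m[2];
        }
    }
    return $links;
}
```

Applied to the `$myHtml` fetched in the question, `print_r(findLinksByAnchorText($myHtml, 'your phrase'));` lists the matching URLs. For arbitrary real-world pages, `DOMDocument::loadHTML()` with `getElementsByTagName('a')` avoids most of the pitfalls of regex-based parsing.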
Any help anyone?