How do I get the HTML code of a web page in PHP?

91

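I want to retrieve the HTML code of a link (web page) in PHP. For example, if the link is

/programming/ask

then I want the HTML code of the page that is served. I want to retrieve this HTML code and store it in a PHP variable.

How can I do this?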

php html

— Prashant

Can you explain a bit more? I guess you want to send a web request to a given URL and read the response into a variable?

— Chathuranga Chandrasekara

Yes, that is what I want to do. I want the entire source returned by that web request in a variable.

— Prashant

1

You can use this tool to scrape HTML easily.

— Faraz Kelhini, 2014

This function does not return the page's HTML even with allow_url_fopen set to true. What else should I check?

— CodeForGood

138

If your PHP server allows the URL fopen wrappers, the simplest way is:

$html = file_get_contents('/programming/ask');

If you need more control, look at the cURL functions:

$c = curl_init('/programming/ask');
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
//curl_setopt(... other options you want...)

$html = curl_exec($c);

if (curl_error($c))
    die(curl_error($c));

// Get the status code
$status = curl_getinfo($c, CURLINFO_HTTP_CODE);

curl_close($c);

— Greg

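I'm worried about 404s. If the link doesn't exist, I don't need its content and want to show an error message instead. How can we find out whether the URL returns a 404 error (simply, whether the URL works)?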

— Prashant

1

@Prashant: I've edited the answer to add a curl_getinfo call, which will give you 200, 404, or whatever the status code is.

— Greg
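
To make the 404 check from the comments above concrete, here is a minimal sketch. The helper names fetch_html_or_null() and is_success_status() are made up for illustration and are not part of cURL; treating only 2xx codes as success is likewise an assumption you may want to change.

```php
<?php
// Hypothetical helper: true only for 2xx status codes.
function is_success_status(int $status): bool
{
    return $status >= 200 && $status < 300;
}

// Hypothetical helper: returns the page body as a string, or null when the
// transfer fails or the final status code is not 2xx (for example 404).
function fetch_html_or_null(string $url): ?string
{
    $c = curl_init($url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($c, CURLOPT_FOLLOWLOCATION, true); // report the status of the final page

    $html = curl_exec($c);
    $status = (int) curl_getinfo($c, CURLINFO_HTTP_CODE);

    if ($html === false || !is_success_status($status)) {
        return null;
    }
    return $html;
}
```

Usage would look something like `$html = fetch_html_or_null($url); if ($html === null) { echo 'Page not available'; }`.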

Also, how can PHP get the HTML of the current page?

— Renaro Santos, 2014

Does this work across domains?

— I.A.A.盖伊, 2016

Doesn't work on PHP 7. I checked php.ini and fopen is turned on.

— 卡斯珀·帕尔吉

22

Also, if you want to manipulate the retrieved page in some way, you may want to try a PHP DOM parser. I find PHP Simple HTML DOM Parser very easy to use.

— Dmitri Pisarev

11

You might want to check out the YQL library from Yahoo: http://developer.yahoo.com/yql

The task at hand is as simple as

select * from html where url = 'http://stackoverflow.com/questions/ask'

You can try this out in the console: http://developer.yahoo.com/yql/console (requires login)

Also see Chris Heilmann's screencast for some nice ideas on what else you can do: http://developer.yahoo.net/blogs/theater/archives/2009/04/screencast_collating_distributed_information.html

— 艾克蒙德

10

Simple way: use file_get_contents():

$page = file_get_contents('http://stackoverflow.com/questions/ask');

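Note that allow_url_fopen must be true in your php.ini to be able to use the URL-aware fopen wrappers.

More advanced way: if you cannot change your PHP configuration (PHP enables allow_url_fopen by default, but many hosts turn it off) and ext/curl is installed, use the cURL library to connect to the desired page.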

— Stefan Gehrig

This function does not return the page's HTML even with allow_url_fopen set to true. What else should I check?

— CodeForGood
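
For checking the configuration discussed above, a rough diagnostic sketch follows. The function names are hypothetical, and the idea that a missing https stream wrapper (usually the openssl extension) is a likely cause of failed https:// fetches is an assumption, not a diagnosis of any particular setup.

```php
<?php
// Hypothetical helper: is the URL fopen wrapper enabled in this PHP setup?
function url_fopen_enabled(): bool
{
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

// Hypothetical helper: collects the settings that most often matter
// when fetching a remote page.
function fetch_diagnostics(): array
{
    return [
        'allow_url_fopen' => url_fopen_enabled(),
        'curl_loaded'     => extension_loaded('curl'),
        // https:// URLs need the https stream wrapper to be registered
        'https_wrapper'   => in_array('https', stream_get_wrappers(), true),
    ];
}

// Hypothetical helper: picks a fetch method in the order the answer suggests.
function pick_transport(bool $urlFopen, bool $hasCurl): string
{
    if ($urlFopen) {
        return 'file_get_contents';
    }
    return $hasCurl ? 'curl' : 'none';
}
```

Printing `fetch_diagnostics()` with var_dump() on the server in question is a quick first check before digging further.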

4

If you want to store the source in a variable, you can use file_get_contents, but cURL is the better practice.

$url = file_get_contents('http://example.com');
echo $url;

This solution will display the web page on your site. However, cURL is the better option.

— 小猪
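
If you stay with file_get_contents, some of the control cURL offers is still available through a stream context. The sketch below is only an illustration: get_with_status() and status_from_headers() are made-up names, the 5 second timeout is an arbitrary choice, and http_get_last_response_headers() is only used when the running PHP version provides it.

```php
<?php
// Hypothetical helper: extracts the last HTTP status code from a list of
// raw response header lines (after redirects there is one status line per hop).
function status_from_headers(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical helper: returns [body or null, status code or null].
function get_with_status(string $url): array
{
    $context = stream_context_create([
        'http' => [
            'timeout'       => 5,    // seconds; arbitrary value for this sketch
            'ignore_errors' => true, // still return the body on 4xx/5xx responses
        ],
    ]);
    $body = @file_get_contents($url, false, $context);

    // Newer PHP versions offer a function for the headers; older ones fill
    // the local variable $http_response_header instead.
    if (function_exists('http_get_last_response_headers')) {
        $headers = http_get_last_response_headers() ?? [];
    } else {
        $headers = $http_response_header ?? [];
    }

    return [$body === false ? null : $body, status_from_headers($headers)];
}
```

Calling `[$html, $status] = get_with_status($url);` then lets you store the HTML in a variable and still react to a 404.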

3

Take a look at this function:

http://ru.php.net/manual/zh/function.file-get-contents.php

— 谢尔盖

3

include_once('simple_html_dom.php');
$url="http://stackoverflow.com/questions/ask";
$html = file_get_html($url);

You can use this code to get the whole HTML in parsed form (as an object you can query). Download the 'simple_html_dom.php' file here: http://sourceforge.net/projects/simplehtmldom/files/simple_html_dom.php/download

— 萨拉斯

2

Here are two simple ways to get content from a URL:

1) First method

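Enable allow_url_fopen on your host (in php.ini or elsewhere)

<?php
// file_get_contents() returns the page as a string;
// readfile() would print it and return only the byte count
$variableee = file_get_contents("http://example.com/");
echo $variableee;
?>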

OR

2) Second method

Enable php_curl, php_imap, and php_openssl

<?php
// you can add other cURL options too
// see here - http://php.net/manual/en/function.curl-setopt.php
function get_dataa($url) {
  $ch = curl_init();
  $timeout = 5;
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)");
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  // Warning: the next two options disable TLS certificate checks; use them only for testing
  curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
  curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
  $data = curl_exec($ch);
  curl_close($ch);
  return $data;
}

$variableee = get_dataa('http://example.com');
echo $variableee;
?>

— 托杜亚

1

You can also use the DOMDocument approach to pull the content of individual HTML tags into variables

$homepage = file_get_contents('https://www.example.com/');
$doc = new DOMDocument;
$doc->loadHTML($homepage);
$titles = $doc->getElementsByTagName('h3');
echo $titles->item(0)->nodeValue;

— Krishnamoorthy Acharya
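
Real-world pages are rarely valid HTML, so loadHTML() tends to emit parser warnings. Here is a small sketch of the same DOMDocument idea with those warnings collected quietly; first_text_by_tag() is a hypothetical helper name, and the sample markup in the usage note is invented for the example.

```php
<?php
// Hypothetical helper: returns the trimmed text of the first <$tag> element
// in $html, or null when the input is empty or the tag does not occur.
function first_text_by_tag(string $html, string $tag): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = $doc->getElementsByTagName($tag);
    if ($nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}
```

For instance, `first_text_by_tag('<html><body><h3> Hello </h3><h3>World</h3></body></html>', 'h3')` gives `'Hello'`. In practice you would pass in the string returned by file_get_contents or cURL.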

1

$output = file("http://www.example.com"); did not work until I enabled allow_url_fopen, allow_url_include, and file_uploads in php.ini on PHP 7

— 肯