InputStream from a URL

117

How do I get an InputStream from a URL?

For example, I want to take the file at the URL wwww.somewebsite.com/a.txt and read it as an InputStream in Java, through a servlet.

I've tried

InputStream is = new FileInputStream("wwww.somewebsite.com/a.txt");

but what I got was an error:

java.io.FileNotFoundException

java url inputstream

— Whitebear

1

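Why did you roll back the removal of the servlets tag? No javax.servlet.* API is involved here. You would have exactly the same problem doing this in a plain vanilla Java class with a main() method.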

— BalusC

1

Perhaps you should familiarize yourself with what a URL is: docs.oracle.com/javase/tutorial/networking/urls/definition.html

— b1nary.atr0phy

228

Use java.net.URL#openStream() with a proper URL (including the protocol!). For example:

InputStream input = new URL("http://www.somewebsite.com/a.txt").openStream();
// ...
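As a minimal sketch of what the elided part might look like, the class below reads the whole stream into a String and closes it with try-with-resources. The class name UrlStreamExample, the helper readText, and the UTF-8 encoding are assumptions made for illustration; readAllBytes() requires Java 9 or later. A temporary local file stands in for the remote a.txt so the sketch runs without network access.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class UrlStreamExample {

    // Reads everything the URL serves and decodes it as UTF-8
    // (the encoding is an assumption about the remote file).
    static String readText(URL url) throws IOException {
        // try-with-resources closes the stream even if reading fails
        try (InputStream input = url.openStream()) {
            return new String(input.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for http://www.somewebsite.com/a.txt
        Path tmp = Files.createTempFile("a", ".txt");
        Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(readText(tmp.toUri().toURL()));
        Files.delete(tmp);
    }
}
```

For a real download, passing new URL("http://www.somewebsite.com/a.txt") to readText should behave the same way, subject to network errors.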

See also:

Using java.net.URLConnection to fire and handle HTTP requests

— BalusC

2

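Do you know whether this makes a network request on every read of the InputStream, or whether it reads the whole file at once so that later reads don't need network requests?

— gsingh2011

Calling this method on the UI thread in Android will throw an exception. Do it in a background thread, for example with Bolts-Android.

— Behrouz.M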
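The advice to move the call off the UI thread can be sketched with plain java.util.concurrent instead of Bolts-Android. The class name BackgroundFetch and the helper fetchAsync are hypothetical, and CompletableFuture is only available on newer Android API levels, so treat this as an illustration of the idea rather than an Android recipe. A temporary local file again stands in for the remote resource.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BackgroundFetch {

    // Opens and reads the URL on a thread owned by the executor,
    // so the calling (for example UI) thread is never blocked by I/O.
    static CompletableFuture<byte[]> fetchAsync(URL url, ExecutorService executor) {
        return CompletableFuture.supplyAsync(() -> {
            try (InputStream in = url.openStream()) {
                return in.readAllBytes();
            } catch (IOException e) {
                // Lambdas cannot throw checked exceptions, so wrap it
                throw new UncheckedIOException(e);
            }
        }, executor);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("a", ".txt");
        Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // join() blocks here only to keep the demo simple; UI code
            // would attach a callback with thenAccept(...) instead.
            byte[] data = fetchAsync(tmp.toUri().toURL(), executor).join();
            System.out.println(new String(data, StandardCharsets.UTF_8));
        } finally {
            executor.shutdown();
            Files.delete(tmp);
        }
    }
}
```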

19

Try:

final InputStream is = new URL("http://wwww.somewebsite.com/a.txt").openStream();

— whiskeysierra

10

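(a) wwww.somewebsite.com/a.txt isn't a "file URL". It isn't a URL at all. If you put http:// on the front of it, it would be an HTTP URL, which is clearly what you intend here.

(b) FileInputStream is for files, not for URLs.

(c) The way to get an input stream from any URL is via URL.openStream(), or URL.openConnection().getInputStream(), which is equivalent, but you may have other reasons to get the URLConnection and work with it first.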
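One reason to go through the URLConnection first is to configure it before any data is read. The sketch below sets timeouts and then asks for the stream; the class name ConnectionStreamExample, the helper open, and the 5000 ms value are assumptions chosen for illustration. It is exercised against a temporary local file so it runs offline.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ConnectionStreamExample {

    // Same result as url.openStream(), but the connection can be
    // configured (timeouts, request headers, ...) before it is used.
    static InputStream open(URL url, int timeoutMillis) throws IOException {
        URLConnection connection = url.openConnection();
        connection.setConnectTimeout(timeoutMillis);
        connection.setReadTimeout(timeoutMillis);
        return connection.getInputStream();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("a", ".txt");
        Files.write(tmp, "abc".getBytes(StandardCharsets.UTF_8));
        try (InputStream in = open(tmp.toUri().toURL(), 5000)) {
            System.out.println(in.readAllBytes().length + " bytes read");
        } finally {
            Files.delete(tmp);
        }
    }
}
```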

— Marquis of Lorne

4

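Your original code uses FileInputStream, which is for accessing files hosted on the file system.

The constructor you used will try to locate a file named a.txt in the wwww.somewebsite.com subfolder of the current working directory (the value of the system property user.dir). The name you provide is resolved to a file using the File class.

URL objects are the general way to solve this. You can use a URL to access local files as well as network-hosted resources. Besides http:// and https://, the URL class also supports the file:// protocol, so you're good to go.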
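The difference between the two interpretations can be made visible with a small sketch. The class name PathVersusUrl and both helpers are hypothetical; the point is only that the same string is a relative path to the File class and not a URL at all until a protocol is added.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

public class PathVersusUrl {

    // How the File class (and therefore FileInputStream) interprets the
    // string: as a path relative to the current working directory.
    static String asFilePath(String name) {
        return new File(name).getAbsolutePath();
    }

    // True if the string parses as a URL with a protocol the JDK knows.
    static boolean isUrl(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false; // for example "no protocol"
        }
    }

    public static void main(String[] args) {
        String name = "wwww.somewebsite.com/a.txt";
        System.out.println(asFilePath(name));       // a path under user.dir
        System.out.println(isUrl(name));            // no protocol, not a URL
        System.out.println(isUrl("http://" + name));
        System.out.println(isUrl("file:///tmp/a.txt"));
    }
}
```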

— Cristian Botiza

2

Pure Java:

 urlToInputStream(url,httpHeaders);

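I have used this method with some success. It handles redirects, and a variable number of HTTP headers can be passed in as a Map<String,String>. It also allows redirects from HTTP to HTTPS.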

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Map;
import java.util.Map.Entry;

private InputStream urlToInputStream(URL url, Map<String, String> args) {
    HttpURLConnection con = null;
    InputStream inputStream = null;
    try {
        con = (HttpURLConnection) url.openConnection();
        con.setConnectTimeout(15000);
        con.setReadTimeout(15000);
        // Copy the caller-supplied HTTP headers onto the request
        if (args != null) {
            for (Entry<String, String> e : args.entrySet()) {
                con.setRequestProperty(e.getKey(), e.getValue());
            }
        }
        con.connect();
        int responseCode = con.getResponseCode();
        /* By default the connection will follow redirects. The following
         * block is only entered if the implementation of HttpURLConnection
         * does not perform the redirect. The exact behavior depends on
         * the actual implementation (e.g. sun.net).
         * !!! Attention: This block allows the connection to
         * switch protocols (e.g. HTTP to HTTPS), which is not
         * default behavior. See: stackoverflow.com/questions/1884230
         * for more info!!!
         */
        if (responseCode < 400 && responseCode > 299) {
            String redirectUrl = con.getHeaderField("Location");
            // Resolving against the current URL handles both absolute and
            // relative Location values and keeps the port, if any
            URL newUrl = new URL(url, redirectUrl);
            return urlToInputStream(newUrl, args);
        }
        /*!!!!!*/

        inputStream = con.getInputStream();
        return inputStream;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

Full example call

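import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

private InputStream getInputStreamFromUrl(URL url, String user, String passwd) throws IOException {
    // HTTP Basic authentication: Base64 of "user:password"
    String encoded = Base64.getEncoder().encodeToString((user + ":" + passwd).getBytes(StandardCharsets.UTF_8));
    // Map is an interface, so a concrete implementation is instantiated
    Map<String, String> httpHeaders = new HashMap<>();
    httpHeaders.put("Accept", "application/json");
    httpHeaders.put("User-Agent", "myApplication");
    httpHeaders.put("Authorization", "Basic " + encoded);
    return urlToInputStream(url, httpHeaders);
}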

— jschnasse

HttpURLConnection will already follow redirects unless you tell it not to, which you haven't.

— Marquis of Lorne

1

I know the OP didn't mention headers, but I appreciate the succinct (well, considering it's Java) example.

— chbrown

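@EJP I added some explanation as an inline comment. I introduced the redirect block mainly for the case where an HTTP 301 redirects an HTTP address to an HTTPS address. Of course, this goes beyond the original question, but it is a common use case that is not handled by default. See: stackoverflow.com/questions/1884230/…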

— jschnasse

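As I already said, your code works equally well without the redirect block, because HttpURLConnection already follows redirects by default.

— Marquis of Lorne

@user207421 This is partially correct. The redirect block is for protocol switches such as http -> https, which are not supported by default. I tried to express that in the code comment. See stackoverflow.com/questions/1884230/….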

— jschnasse
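The protocol-switch case discussed in these comments can be illustrated without any network traffic. The sketch below resolves a Location header value against the URL that was requested and reports whether following it would change the protocol. The class name RedirectResolution, both helpers, and the example.com addresses are assumptions for illustration only.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class RedirectResolution {

    // Resolves a Location header against the URL that produced it.
    // Absolute values replace the URL; relative ones are resolved
    // against its protocol, host, port and path.
    static URL resolveRedirect(URL current, String location) throws MalformedURLException {
        return new URL(current, location);
    }

    // True for redirects such as http -> https, which is the case the
    // manual redirect block in the answer is meant to cover.
    static boolean switchesProtocol(URL from, URL to) {
        return !from.getProtocol().equalsIgnoreCase(to.getProtocol());
    }

    public static void main(String[] args) throws MalformedURLException {
        URL requested = new URL("http://example.com/dir/a.txt");

        URL sameProtocol = resolveRedirect(requested, "/b.txt");
        System.out.println(sameProtocol + " switches protocol: "
                + switchesProtocol(requested, sameProtocol));

        URL upgraded = resolveRedirect(requested, "https://example.com/dir/a.txt");
        System.out.println(upgraded + " switches protocol: "
                + switchesProtocol(requested, upgraded));
    }
}
```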

-1

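Here is a full example that reads the contents of a given web page. The web page address is read from an HTML form. We use the standard InputStream classes, but it could be done more easily with the JSoup library.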

<dependency>
    <groupId>javax.servlet</groupId>
    <artifactId>javax.servlet-api</artifactId>
    <version>3.1.0</version>
    <scope>provided</scope>
</dependency>

<dependency>
    <groupId>commons-validator</groupId>
    <artifactId>commons-validator</artifactId>
    <version>1.6</version>
</dependency>

These are the Maven dependencies. We use the Apache Commons Validator library to validate the URL string.

package com.zetcode.web;

import com.zetcode.service.WebPageReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(name = "ReadWebPage", urlPatterns = {"/ReadWebPage"})
public class ReadWebpage extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {

        response.setContentType("text/plain;charset=UTF-8");

        String page = request.getParameter("webpage");

        String content = new WebPageReader().setWebPageName(page).getWebPageContent();

        ServletOutputStream os = response.getOutputStream();
        os.write(content.getBytes(StandardCharsets.UTF_8));
    }
}

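The ReadWebPage servlet reads the contents of the given web page and sends it back to the client in plain text format. The task of reading the page is delegated to WebPageReader.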

package com.zetcode.service;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.stream.Collectors;
import org.apache.commons.validator.routines.UrlValidator;

public class WebPageReader {

    private String webpage;
    private String content;

    public WebPageReader setWebPageName(String name) {

        webpage = name;
        return this;
    }

    public String getWebPageContent() {

        try {

            boolean valid = validateUrl(webpage);

            if (!valid) {

                content = "Invalid URL; use http(s)://www.example.com format";
                return content;
            }

            URL url = new URL(webpage);

            try (InputStream is = url.openStream();
                    BufferedReader br = new BufferedReader(
                            new InputStreamReader(is, StandardCharsets.UTF_8))) {

                content = br.lines().collect(
                      Collectors.joining(System.lineSeparator()));
            }

        } catch (IOException ex) {

            content = String.format("Cannot read webpage %s", ex);
            Logger.getLogger(WebPageReader.class.getName()).log(Level.SEVERE, null, ex);
        }

        return content;
    }

    private boolean validateUrl(String webpage) {

        UrlValidator urlValidator = new UrlValidator();

        return urlValidator.isValid(webpage);
    }
}

WebPageReader validates the URL and reads the contents of the web page. It returns a string containing the HTML code of the page.

<!DOCTYPE html>
<html>
    <head>
        <title>Home page</title>
        <meta charset="UTF-8">
    </head>
    <body>
        <form action="ReadWebPage">

            <label for="page">Enter a web page name:</label>
            <input  type="text" id="page" name="webpage">

            <button type="submit">Submit</button>

        </form>
    </body>
</html>

Finally, this is the home page containing the HTML form. This is taken from my tutorial on the topic.

— Jan Bodnar