How do I read an entire file into a std::string in C++?

Question 1

How do I read a file into a std::string, i.e., read the whole file at once?

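Text or binary mode should be specified by the caller. The solution should be standard-compliant, portable and efficient. It should not needlessly copy the string's data, and it should avoid reallocating memory while reading the string.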

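One way to do this would be to stat the file size, resize the std::string, and fread() into the std::string's const_cast<char*>()'ed data(). This requires the std::string's data to be contiguous, which the standard does not require, although it appears to hold for all known implementations. Worse, if the file is read in text mode, the std::string's size may not equal the file's size.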

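A fully correct, standard-compliant and portable solution could be built by streaming std::ifstream's rdbuf() into a std::ostringstream and from there into a std::string. However, this could copy the string data and/or needlessly reallocate memory.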

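Are all relevant standard library implementations smart enough to avoid all unnecessary overhead?
Is there another way to do it?
Did I miss some hidden Boost function that already provides the desired functionality?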

void slurp(std::string& data, bool is_binary)

Question 2

One way is to flush the stream buffer into a separate memory stream, and then convert that to std::string:

std::string slurp(std::ifstream& in) {
    std::ostringstream sstr;
    sstr << in.rdbuf();
    return sstr.str();
}

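This is nicely concise. However, as noted in the question, it performs a redundant copy, and unfortunately there is fundamentally no way of eliding this copy.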

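Unfortunately, the only real solution that avoids redundant copies is to do the reading manually in a loop. Since C++ now guarantees contiguous strings, one could write the following (≥C++17, because of std::string_view):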

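#include <cstddef>
#include <fstream>
#include <string>
#include <string_view>

auto read_file(std::string_view path) -> std::string {
    constexpr auto read_size = std::size_t{4096};
    // A string_view need not be null-terminated, so copy it before opening.
    auto stream = std::ifstream{std::string{path}};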
    stream.exceptions(std::ios_base::badbit);

    auto out = std::string{};
    auto buf = std::string(read_size, '\0');
    while (stream.read(& buf[0], read_size)) {
        out.append(buf, 0, stream.gcount());
    }
    out.append(buf, 0, stream.gcount());
    return out;
}

Question 3

See this answer to a similar question.

For your convenience, I'm reposting CTT's solution:

string readFile2(const string &fileName)
{
    ifstream ifs(fileName.c_str(), ios::in | ios::binary | ios::ate);

    ifstream::pos_type fileSize = ifs.tellg();
    ifs.seekg(0, ios::beg);

    vector<char> bytes(fileSize);
    ifs.read(bytes.data(), fileSize);

    return string(bytes.data(), fileSize);
}

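Averaged over 100 runs against the text of Moby Dick (1.3M), this solution ran about 20% faster than the other answers presented here. Not bad for a portable C++ solution. I would like to see the results of mmap'ing the file ;)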
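The answer does not show how the timings were taken. A minimal sketch of an averaging harness, assuming wall-clock time from std::chrono::steady_clock is an acceptable measure, might look like this; `average_ms` is a hypothetical helper and not part of the original benchmark.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: calls `fn` `runs` times and returns the mean
// wall-clock time per call in milliseconds.
template <typename Fn>
double average_ms(Fn&& fn, std::size_t runs) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::size_t i = 0; i < runs; ++i) {
        fn();
    }
    const std::chrono::duration<double, std::milli> elapsed =
        clock::now() - start;
    return runs ? elapsed.count() / static_cast<double>(runs) : 0.0;
}
```

A call such as `average_ms([&] { text = readFile2(path); }, 100)` would then time the function above, where `text` and `path` are whatever variables the caller supplies. Assigning the result somewhere visible makes it less likely that an optimizing compiler discards the work.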

Question 4

The shortest variant:

std::string str(std::istreambuf_iterator<char>{ifs}, {});

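It requires the header <iterator>.

There were some reports that this method is slower than preallocating the string and using std::istream::read. However, on a modern compiler with optimisations enabled this no longer seems to be the case, though the relative performance of the various methods seems to be highly compiler-dependent.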

Question 5

Use

#include <iostream>
#include <sstream>
#include <fstream>

int main()
{
  std::ifstream input("file.txt");
  std::stringstream sstr;

  while(input >> sstr.rdbuf());

  std::cout << sstr.str() << std::endl;
}

or something very close. I don't have a stdlib reference open to double-check myself.

Yes, I understand that I didn't write the slurp function as asked.

Question 6

If you have C++17 (std::filesystem), there is also this way, which gets the file's size through std::filesystem::file_size instead of seekg and tellg:

#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

std::string readFile(fs::path path)
{
    // Open the stream to 'lock' the file.
    std::ifstream f(path, std::ios::in | std::ios::binary);

    // Obtain the size of the file.
    const auto sz = fs::file_size(path);

    // Create a buffer.
    std::string result(sz, '\0');

    // Read the whole file into the buffer.
    f.read(result.data(), sz);

    return result;
}

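Note: you may need to use <experimental/filesystem> and std::experimental::filesystem if your standard library doesn't yet fully support C++17. You might also need to replace result.data() with &result[0] if it doesn't support the non-const std::basic_string data().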
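One way to apply both substitutions from the note is sketched below. It assumes a compiler that understands `__has_include` (standard since C++17 and available earlier as an extension in several compilers) and that the chosen filesystem library is linked as the toolchain requires. `readFileCompat` is a hypothetical name.

```cpp
#include <fstream>
#include <string>

// Pick whichever filesystem header this toolchain provides.
#if __has_include(<filesystem>)
#include <filesystem>
namespace fs = std::filesystem;
#else
#include <experimental/filesystem>
namespace fs = std::experimental::filesystem;
#endif

// Hypothetical variant of readFile that avoids the non-const data().
std::string readFileCompat(const fs::path& path) {
    std::ifstream f(path.string(), std::ios::in | std::ios::binary);
    if (!f) return std::string();

    std::string result(static_cast<std::size_t>(fs::file_size(path)), '\0');
    if (!result.empty()) {
        // &result[0] is writable even where data() only returns const char*.
        f.read(&result[0], static_cast<std::streamsize>(result.size()));
        // Keep only the bytes that were actually read.
        result.resize(static_cast<std::size_t>(f.gcount()));
    }
    return result;
}
```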

Question 7

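I don't have enough reputation to comment directly on the responses that use tellg().

Please be aware that tellg() can return -1 on error. If you pass the result of tellg() as an allocation parameter, you should sanity-check the result first.

An example of the problem: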

...
std::streamsize size = file.tellg();
std::vector<char> buffer(size);
...

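In the above example, if tellg() encounters an error it will return -1. The implicit conversion from signed (the result of tellg()) to unsigned (the argument to the vector<char> constructor) will make your vector erroneously try to allocate a very large number of bytes. (With a 32-bit size type that is 4294967295 bytes, or 4GB; with a 64-bit size type it is far larger.)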

Modifying paxos1977's answer to account for the above:

string readFile2(const string &fileName)
{
    ifstream ifs(fileName.c_str(), ios::in | ios::binary | ios::ate);

    ifstream::pos_type fileSize = ifs.tellg();
    if (fileSize < 0)                             // <--- ADDED
        return std::string();                     // <--- ADDED

    ifs.seekg(0, ios::beg);

    vector<char> bytes(fileSize);
    ifs.read(&bytes[0], fileSize);

    return string(&bytes[0], fileSize);
}

Question 8

This solution adds error checking to the rdbuf()-based method.

std::string file_to_string(const std::string& file_name)
{
    std::ifstream file_stream{file_name};

    if (file_stream.fail())
    {
        // Error opening file.
    }

    std::ostringstream str_stream{};
    file_stream >> str_stream.rdbuf();  // NOT str_stream << file_stream.rdbuf()

    if (file_stream.fail() && !file_stream.eof())
    {
        // Error reading file.
    }

    return str_stream.str();
}

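I'm adding this answer because adding error checking to the original method is not as trivial as you'd expect. The original method uses stringstream's insertion operator (str_stream << file_stream.rdbuf()). The problem is that this sets the stringstream's failbit when no characters are inserted. That can be due to an error, or it can be due to the file being empty. If you check for failures by inspecting the failbit, you'll get a false positive when you read an empty file. How do you distinguish a legitimate failure to insert any characters from a "failure" to insert any characters because the file is empty?

You might think to explicitly check for an empty file, but that means more code and more associated error checking.

Checking for the failure condition str_stream.fail() && !str_stream.eof() doesn't work, because the insertion operation doesn't set the eofbit (on either the ostringstream or the ifstream).

So, the solution is to change the operation. Instead of using ostringstream's insertion operator (<<), use ifstream's extraction operator (>>), which does set the eofbit. Then check for the failure condition file_stream.fail() && !file_stream.eof().

Importantly, when file_stream >> str_stream.rdbuf() encounters a legitimate failure, it should never set eofbit (according to my understanding of the specification). That means the above check is sufficient to detect legitimate failures.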
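The difference between the two checks can be tried on in-memory streams. The sketch below uses two hypothetical helpers, one per operator, each returning whether its check would report an error. An empty std::istringstream plays the role of an empty file.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: insertion-based copy with the failbit/eofbit check
// applied to the destination stream.
bool insert_check_reports_error(std::istream& in, std::string& out) {
    std::ostringstream buffer;
    buffer << in.rdbuf();  // sets failbit on `buffer` if nothing was inserted
    out = buffer.str();
    return buffer.fail() && !buffer.eof();
}

// Hypothetical helper: extraction-based copy with the check applied to the
// source stream, as in file_to_string above.
bool extract_check_reports_error(std::istream& in, std::string& out) {
    std::ostringstream buffer;
    in >> buffer.rdbuf();  // reaching end of input also sets eofbit on `in`
    out = buffer.str();
    return in.fail() && !in.eof();
}
```

With empty input the insertion-based check reports an error even though nothing went wrong, while the extraction-based check does not.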

Question 9

Here's a version that uses the new filesystem library and has reasonably robust error checking:

#include <cstdint>
#include <exception>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

std::string loadFile(const char *const name);
std::string loadFile(const std::string &name);

std::string loadFile(const char *const name) {
  fs::path filepath(fs::absolute(fs::path(name)));

  std::uintmax_t fsize;

  if (fs::exists(filepath)) {
    fsize = fs::file_size(filepath);
  } else {
    throw(std::invalid_argument("File not found: " + filepath.string()));
  }

  std::ifstream infile;
  infile.exceptions(std::ifstream::failbit | std::ifstream::badbit);
  try {
    infile.open(filepath.c_str(), std::ios::in | std::ifstream::binary);
  } catch (...) {
    std::throw_with_nested(std::runtime_error("Can't open input file " + filepath.string()));
  }

  std::string fileStr;

  try {
    fileStr.resize(fsize);
  } catch (...) {
    std::stringstream err;
    err << "Can't resize to " << fsize << " bytes";
    std::throw_with_nested(std::runtime_error(err.str()));
  }

  infile.read(fileStr.data(), fsize);
  infile.close();

  return fileStr;
}

std::string loadFile(const std::string &name) { return loadFile(name.c_str()); }

Question 10

Something like this shouldn't be too bad:

void slurp(std::string& data, const std::string& filename, bool is_binary)
{
    std::ios_base::openmode openmode = ios::ate | ios::in;
    if (is_binary)
        openmode |= ios::binary;
    ifstream file(filename.c_str(), openmode);
    data.clear();
    data.reserve(file.tellg());
    file.seekg(0, ios::beg);
    data.append(istreambuf_iterator<char>(file.rdbuf()), 
                istreambuf_iterator<char>());
}

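The advantage here is that we do the reserve first, so the string does not have to grow as we read things in. The disadvantage is that we do it character by character. A smarter version could grab the whole read buffer and then call underflow.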

Question 11

You can use the std::getline function and specify eof as the delimiter. The resulting code is a little obscure, though:

std::string data;
std::ifstream in( "test.txt" );
std::getline( in, data, std::string::traits_type::to_char_type( 
                  std::string::traits_type::eof() ) );

Question 12

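Never write into the std::string's const char * buffer. Never ever! Doing so is a massive mistake.

Call reserve() on the std::string to make room for the whole string, read chunks of a reasonable size from the file into a buffer, and append() each one. How large the chunks should be depends on the size of the input file. I'm pretty sure all other portable and STL-compliant mechanisms do the same (though they may look prettier).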
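The reserve/read/append recipe could look roughly like this. It is a sketch under stated assumptions: `read_chunked` is a hypothetical name, the 4096-byte chunk size is an arbitrary choice, the file is assumed to be a regular, seekable file, and seekg()/tellg() are used only as a size hint for reserve().

```cpp
#include <array>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper following the reserve/read/append recipe.
std::string read_chunked(const std::string& path, bool is_binary) {
    std::ios::openmode mode = std::ios::in;
    if (is_binary) mode |= std::ios::binary;

    std::ifstream in(path, mode);
    std::string out;
    if (!in) return out;

    // Use the end position as a capacity hint; skip it if tellg() fails.
    in.seekg(0, std::ios::end);
    const std::streamoff hint = in.tellg();
    in.seekg(0, std::ios::beg);
    if (hint > 0) out.reserve(static_cast<std::size_t>(hint));

    // Read fixed-size chunks into a separate buffer and append each one.
    std::array<char, 4096> chunk;
    while (in.read(chunk.data(), static_cast<std::streamsize>(chunk.size())) ||
           in.gcount() > 0) {
        out.append(chunk.data(), static_cast<std::size_t>(in.gcount()));
    }
    return out;
}
```

The string's own buffer is only ever touched through append(), which is the point of the advice above. The cost is one extra copy per chunk.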

Question 13

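Since this seems like a widely used utility, my approach would be to look for and prefer an existing library over a hand-written solution, especially if the Boost libraries are already linked into your project (linker flags -lboost_system -lboost_filesystem). Boost, including older versions, provides a load_string_file utility: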

#include <iostream>
#include <string>
#include <boost/filesystem/string_file.hpp>

int main() {
    std::string result;
    boost::filesystem::load_string_file("aFileName.xyz", result);
    std::cout << result.size() << std::endl;
}

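As an advantage, this function does not seek through the whole file to determine its size; it uses stat() internally instead. As a possibly negligible disadvantage, which is easy to infer from the source code, the string is needlessly resized with '\0' characters that are then overwritten by the file contents.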

Question 14

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <system_error>

using namespace std;

string GetStreamAsString(const istream& in)
{
    stringstream out;
    out << in.rdbuf();
    return out.str();
}

string GetFileAsString(const string& filePath)
{
    ifstream stream;
    try
    {
        // Set to throw on failure
        stream.exceptions(fstream::failbit | fstream::badbit);
        stream.open(filePath);
    }
    catch (system_error& error)
    {
        cerr << "Failed to open '" << filePath << "'\n" << error.code().message() << endl;
        return "Open fail";
    }

    return GetStreamAsString(stream);
}

Usage:

const string logAsString = GetFileAsString(logFilePath);

Question 15

An updated function that builds upon CTT's solution:

#include <string>
#include <fstream>
#include <limits>
#include <string_view>
std::string readfile(const std::string_view path, bool binaryMode = true)
{
    std::ios::openmode openmode = std::ios::in;
    if(binaryMode)
    {
        openmode |= std::ios::binary;
    }
    // A string_view need not be null-terminated, so copy it before opening.
    std::ifstream ifs{std::string{path}, openmode};
    ifs.ignore(std::numeric_limits<std::streamsize>::max());
    std::string data(ifs.gcount(), 0);
    ifs.seekg(0);
    ifs.read(data.data(), data.size());
    return data;
}

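There are two important differences:

First, tellg() is not guaranteed to return the offset in bytes from the beginning of the file. Instead, as Puzomor Croatia pointed out, it is more of a token that can be used within the fstream calls. gcount(), however, does return the number of unformatted bytes last extracted. We therefore open the file, extract and discard all of its contents with ignore() to get the size of the file, and construct the output string based on that.

Secondly, we avoid having to copy the file's data from a std::vector<char> to a std::string by writing to the string directly.

In terms of performance, this should be the absolute fastest: it allocates a string of the appropriate size ahead of time and calls read() once. As an interesting fact, using ignore() and gcount() instead of ate and tellg() on gcc compiles down to almost the same thing, bit by bit.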

Question 16

#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main(){
    fstream file;
    //Open a file
    file.open("test.txt");
    string copy,temp;
    //While loop to store whole document in copy string
    //Temp reads a complete line
    //Loop stops once getline fails after the last line of the document
    while(getline(file,temp)){
        //add new line text in copy
        copy+=temp;
        //adds a new line
        copy+="\n";
    }
    //Display whole document
    cout<<copy;
    //close the document
    file.close();
}