How do I write a multidimensional array to a text file?

115

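In another question, other users offered some help if I could supply the array I was having trouble with. However, I even fail at a basic I/O task, such as writing an array to a file.

Can anyone explain what kind of loop I would need to write a 4x11x14 numpy array to a file?

This array consists of four 11 x 14 arrays, so I should format it with a nice newline between them, to make the file easier to read.

Edit: So I've tried the numpy.savetxt function. Strangely, it gives the following error: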

TypeError: float argument required, not numpy.ndarray

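I assume this is because the function doesn't work with multidimensional arrays? Are there any solutions that would keep everything in one file?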

python file-io numpy

— Ivo Flipse
source

197

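If you want to write it to disk so that it will be easy to read back in as a numpy array, look into numpy.save. Pickling it will work fine as well, but it's less efficient for large arrays (which yours isn't, so either is perfectly fine).

If you want it to be human readable, look into numpy.savetxt.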
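For the numpy.save route, a minimal round-trip sketch might look like the following. The array contents, the temporary directory and the file name data.npy are my own assumptions for illustration; only numpy.save and numpy.load come from the answer above.

```python
import os
import tempfile

import numpy as np

# Test data with the same shape as the array in the question
data = np.arange(4 * 11 * 14, dtype=float).reshape((4, 11, 14))

# Hypothetical file location inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), "data.npy")

# Binary .npy format: dtype and shape are stored alongside the values
np.save(path, data)

# No reshape needed: the shape comes back from the file
restored = np.load(path)
print(restored.shape)  # (4, 11, 14)
```

The file is not human readable, but nothing about the array has to be remembered separately in order to load it again.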

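Edit: So, it seems like savetxt isn't quite as great an option for arrays with more than 2 dimensions... But just to draw everything out to its full conclusion:

I just realized that numpy.savetxt chokes on ndarrays with more than 2 dimensions... This is probably by design, as there's no inherently defined way to indicate additional dimensions in a text file.

E.g. this (a 2D array) works fine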

import numpy as np
x = np.arange(20).reshape((4,5))
np.savetxt('test.txt', x)

While the same thing would fail (with a rather uninformative error: TypeError: float argument required, not numpy.ndarray) for a 3D array:

import numpy as np
x = np.arange(200).reshape((4,5,10))
np.savetxt('test.txt', x)

One workaround is just to break the 3D (or greater) array into 2D slices. E.g.

x = np.arange(200).reshape((4,5,10))
with open('test.txt', 'w') as outfile:
    for slice_2d in x:
        np.savetxt(outfile, slice_2d)

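However, our goal is to be clearly human readable, while still being easily read back in with numpy.loadtxt. Therefore, we can be a bit more verbose, and differentiate the slices using commented out lines. By default, numpy.loadtxt will ignore any lines that start with # (or whichever character is specified by the comments kwarg). (This looks more verbose than it actually is...)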

import numpy as np

# Generate some test data
data = np.arange(200).reshape((4,5,10))

# Write the array to disk
with open('test.txt', 'w') as outfile:
    # I'm writing a header here just for the sake of readability
    # Any line starting with "#" will be ignored by numpy.loadtxt
    outfile.write('# Array shape: {0}\n'.format(data.shape))

    # Iterating through an n-dimensional array produces slices along
    # the first axis. This is equivalent to data[i,:,:] in this case
    for data_slice in data:

        # The formatting string indicates that I'm writing out
        # the values in left-justified columns 7 characters in width
        # with 2 decimal places.  
        np.savetxt(outfile, data_slice, fmt='%-7.2f')

        # Writing out a break to indicate different slices...
        outfile.write('# New slice\n')

This yields:

# Array shape: (4, 5, 10)
0.00    1.00    2.00    3.00    4.00    5.00    6.00    7.00    8.00    9.00   
10.00   11.00   12.00   13.00   14.00   15.00   16.00   17.00   18.00   19.00  
20.00   21.00   22.00   23.00   24.00   25.00   26.00   27.00   28.00   29.00  
30.00   31.00   32.00   33.00   34.00   35.00   36.00   37.00   38.00   39.00  
40.00   41.00   42.00   43.00   44.00   45.00   46.00   47.00   48.00   49.00  
# New slice
50.00   51.00   52.00   53.00   54.00   55.00   56.00   57.00   58.00   59.00  
60.00   61.00   62.00   63.00   64.00   65.00   66.00   67.00   68.00   69.00  
70.00   71.00   72.00   73.00   74.00   75.00   76.00   77.00   78.00   79.00  
80.00   81.00   82.00   83.00   84.00   85.00   86.00   87.00   88.00   89.00  
90.00   91.00   92.00   93.00   94.00   95.00   96.00   97.00   98.00   99.00  
# New slice
100.00  101.00  102.00  103.00  104.00  105.00  106.00  107.00  108.00  109.00 
110.00  111.00  112.00  113.00  114.00  115.00  116.00  117.00  118.00  119.00 
120.00  121.00  122.00  123.00  124.00  125.00  126.00  127.00  128.00  129.00 
130.00  131.00  132.00  133.00  134.00  135.00  136.00  137.00  138.00  139.00 
140.00  141.00  142.00  143.00  144.00  145.00  146.00  147.00  148.00  149.00 
# New slice
150.00  151.00  152.00  153.00  154.00  155.00  156.00  157.00  158.00  159.00 
160.00  161.00  162.00  163.00  164.00  165.00  166.00  167.00  168.00  169.00 
170.00  171.00  172.00  173.00  174.00  175.00  176.00  177.00  178.00  179.00 
180.00  181.00  182.00  183.00  184.00  185.00  186.00  187.00  188.00  189.00 
190.00  191.00  192.00  193.00  194.00  195.00  196.00  197.00  198.00  199.00 
# New slice

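Reading it back in is very easy, as long as we know the shape of the original array. We can just do numpy.loadtxt('test.txt').reshape((4,5,10)). As an example (you can do this in one line, I'm just being verbose to clarify things):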

# Read the array from disk
new_data = np.loadtxt('test.txt')

# Note that this returned a 2D array!
print(new_data.shape)

# However, going back to 3D is easy if we know the 
# original shape of the array
new_data = new_data.reshape((4,5,10))

# Just to check that they're the same...
assert np.all(new_data == data)
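
If you don't want to remember the shape, one option is to parse it back out of the header comment written above. This is only a sketch: the helper names save_slices and load_slices and the temporary path are made up for illustration, and it assumes the first line of the file has exactly the "# Array shape: (...)" form used in this answer.

```python
import ast
import os
import tempfile

import numpy as np


def save_slices(path, data):
    """Write a 3D array as 2D text slices, preceded by a shape header."""
    with open(path, "w") as outfile:
        outfile.write("# Array shape: {0}\n".format(data.shape))
        for data_slice in data:
            np.savetxt(outfile, data_slice, fmt="%-7.2f")
            outfile.write("# New slice\n")


def load_slices(path):
    """Read the array back, taking the shape from the header line."""
    with open(path) as infile:
        header = infile.readline()
    # The header looks like "# Array shape: (4, 5, 10)"
    shape = ast.literal_eval(header.split(":", 1)[1].strip())
    # loadtxt skips the "#" lines and returns one 2D array
    return np.loadtxt(path).reshape(shape)


data = np.arange(200).reshape((4, 5, 10))
path = os.path.join(tempfile.mkdtemp(), "test.txt")  # hypothetical location
save_slices(path, data)
restored = load_slices(path)
```

Note that the values come back as floats rounded to 2 decimal places, because that is what the format string wrote out.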

— Joe Kington
source

2

+1 from me, also see numpy.loadtxt (docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html)

— Dominic Rodger

2

There is now a much easier solution to this problem: yourStrArray = np.array([str(val) for val in yourMulDArray], dtype='string'); np.savetxt('YourTextFile.txt', yourStrArray, fmt='%s')

— Greg Kramida, 2012

@GregKramida, and how do you recover the array?

— astrojuanlu

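@Juanlu001: I know that numpy.loadtxt(...) also accepts a dtype argument, which can be set to np.string_. I'd give that a shot first. There is also numpy.fromstring(...) for parsing arrays from strings.

— Greg Kramida, 2013

Hey, what if I need to store an array of images? How would we resize it if the image size is, say, 512 x 512?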

— Ambika Saxena

31

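I am not certain if this meets your requirements, given I think you are interested in making the file readable by people, but if that's not a primary concern, just pickle it.

To save it: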

import pickle

my_data = {'a': [1, 2.0, 3, 4+6j],
           'b': ('string', u'Unicode string'),
           'c': None}
output = open('data.pkl', 'wb')
pickle.dump(my_data, output)
output.close()

To read it back:

import pprint, pickle

pkl_file = open('data.pkl', 'rb')

data1 = pickle.load(pkl_file)
pprint.pprint(data1)

pkl_file.close()

— Dominic Rodger
source

You probably don't need pprint to print the dictionary.

— zyy, 2019

11

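Another option to try, if you don't need human-readable output, is to save the array as a MATLAB .mat file, which is a structured array. I despise MATLAB, but the fact that I can both read and write a .mat in very few lines is convenient.

Unlike Joe Kington's answer, the benefit of this is that you don't need to know the original shape of the data in the .mat file, i.e. there is no need to reshape upon reading in. And, unlike using pickle, a .mat file can be read by MATLAB, and probably by some other programs/languages as well.

Here is an example: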

import numpy as np
import scipy.io

# Some test data
x = np.arange(200).reshape((4,5,10))

# Specify the filename of the .mat file
matfile = 'test_mat.mat'

# Write the array to the mat file. For this to work, the array must be the value
# corresponding to a key name of your choice in a dictionary
scipy.io.savemat(matfile, mdict={'out': x}, oned_as='row')

# For the above line, I specified the kwarg oned_as since python (2.7 with 
# numpy 1.6.1) throws a FutureWarning.  Here, this isn't really necessary 
# since oned_as is a kwarg for dealing with 1-D arrays.

# Now load in the data from the .mat that was just saved
matdata = scipy.io.loadmat(matfile)

# And just to check if the data is the same:
assert np.all(x == matdata['out'])

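If you forget the key under which the array is stored in the .mat file, you can always do:

print(matdata.keys())

And of course you can store many arrays using many more keys.

So yes, it won't be readable by eye, but it only takes 2 lines to write and read the data, which I think is a fair trade-off.

Take a look at the docs for scipy.io.savemat and scipy.io.loadmat, and also this tutorial page: scipy.io File IO Tutorial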

— 航海图
source

9

ndarray.tofile() should also work

e.g. if your array is called a:

a.tofile('yourfile.txt',sep=" ",format="%s")

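Not sure how to get newline formatting though.

Edit (credit to Kevin J. Black's comment here):

Since version 1.5.0, np.tofile() takes an optional parameter newline='\n' to allow multi-line output. https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.savetxt.html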
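
To get the array back from a file written this way, numpy.fromfile with the same separator seems to be the natural counterpart. A minimal sketch, assuming you still know the original shape and dtype (neither is stored in the text file); the small test array and the temporary path are made up for illustration:

```python
import os
import tempfile

import numpy as np

a = np.arange(24).reshape((2, 3, 4))
path = os.path.join(tempfile.mkdtemp(), "yourfile.txt")  # hypothetical location

# A non-empty sep switches tofile to text output: values on one line,
# separated by spaces
a.tofile(path, sep=" ", format="%s")

# fromfile returns a flat array, so dtype and shape have to be
# supplied again by hand
b = np.fromfile(path, dtype=a.dtype, sep=" ").reshape(a.shape)
```

Everything ends up on a single line, so this is compact rather than pleasant to read.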

— atomh33ls
source

But is there a way to recreate the original array from the text file?

— Ahashan Alam Sojib

@AhashanAlamSojib see stackoverflow.com/questions/3518778/...

— atomh33ls

tofile doesn't have newline='\n'.

— Nico Schlömer, 2019

4

There are dedicated libraries for exactly this. (Plus wrappers for Python.)

netCDF4: http://www.unidata.ucar.edu/software/netcdf/
netCDF4 Python interface: http://www.unidata.ucar.edu/software/netcdf/software.html#Python
HDF5: http://www.hdfgroup.org/HDF5/

Hope this helps

— 罗尼·布伦德尔
source

1

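You can simply traverse the array in three nested loops and write its values to your file. For reading, you simply use the exact same loop construction. You will get the values in exactly the right order to fill your array correctly again.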
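
A rough sketch of what those loops could look like for the 4x11x14 array from the question. The one-value-per-line layout, the test data and the temporary path are my own assumptions; the answer itself does not prescribe a file layout.

```python
import os
import tempfile

import numpy as np

data = np.arange(4 * 11 * 14, dtype=float).reshape((4, 11, 14))
path = os.path.join(tempfile.mkdtemp(), "loops.txt")  # hypothetical location

# Write: one value per line, in i, j, k order
with open(path, "w") as f:
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            for k in range(data.shape[2]):
                # repr() of a Python float round-trips exactly
                f.write("{0!r}\n".format(float(data[i, j, k])))

# Read: the same loop structure fills an array of the known shape
restored = np.empty(data.shape)
with open(path) as f:
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            for k in range(data.shape[2]):
                restored[i, j, k] = float(f.readline())
```

As with the other text-based approaches, the shape has to be known (or stored somewhere) before reading.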

— 朱勒
source

0

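I have a way to do it using a simple filename.write() operation. It works fine for me, but I'm dealing with arrays that have about 1500 data elements.

I basically just use for loops to iterate through the array and write it to the output destination line by line, in a csv-style output.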

import numpy as np

trial = np.genfromtxt("/extension/file.txt", dtype = str, delimiter = ",")
# Number of columns, taken from the array itself
num_of_columns = trial.shape[1]

with open("/extension/file.txt", "w") as f:
    for x in range(len(trial[:,1])):
        for y in range(num_of_columns):
            if y < num_of_columns-1:
                # Every column except the last is followed by a comma
                f.write(trial[x][y] + ",")
            elif y == num_of_columns-1:
                f.write(trial[x][y])
        f.write("\n")

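The if and elif statements are used to add commas between the data elements. For whatever reason, these get stripped out when the file is read in as an nd array. My goal was to output the file as a csv, so this method helps to handle that.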
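
For comparison, numpy.savetxt with delimiter="," should produce the same kind of csv output for a 2D array without the explicit loops. A small sketch; the string data and the temporary path below are made up to stand in for the real file:

```python
import os
import tempfile

import numpy as np

# Hypothetical string data standing in for the contents of the file
trial = np.array([["a", "b", "c"], ["d", "e", "f"]])
path = os.path.join(tempfile.mkdtemp(), "file.csv")

# fmt="%s" writes each element as-is; delimiter puts commas between columns
np.savetxt(path, trial, fmt="%s", delimiter=",")

with open(path) as f:
    lines = f.read().splitlines()

# Reading it back the same way the answer does
back = np.genfromtxt(path, dtype=str, delimiter=",")
```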

Hope this helps!

— 本尼
source

0

Pickle is best for these cases. Suppose you have an ndarray named x_train. You can dump it into a file and restore it using the following commands:

import pickle

# Dump into file
with open("myfile.pkl","wb") as f:
    pickle.dump(x_train,f)

# Load back from file
with open("myfile.pkl","rb") as f:
    x_temp = pickle.load(f)

— Kenpachi Zaraki
source