Load CSV into a 2D matrix with numpy for plotting



Given this CSV file:

"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12

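I simply want to load it as a matrix/ndarray with 3 rows and 7 columns. However, for some reason, all I can get out of numpy is an ndarray with 3 rows (one per line) and no columns.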

import numpy as np
r = np.genfromtxt(fname, delimiter=',', dtype=None, names=True)
print(r)
print(r.shape)

[ (611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291111964948.0)
 (611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291113113366.0)
 (611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291120650486.0)]
(3,)

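I could iterate over it manually and hack it into the shape I want, but that seems silly. I just want to load it as a proper matrix so I can slice it across different dimensions and plot it, just like in MATLAB.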
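The (3,) shape comes from names=True: genfromtxt turns the header into field names and returns a 1-D structured array in which each element is one whole row record. A minimal sketch reproducing this, assuming the sample data is read from an in-memory io.StringIO buffer (CSV_TEXT is a hypothetical name) instead of fname:

```python
import io

import numpy as np

CSV_TEXT = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
'''

# names=True makes every row one record with seven named fields,
# so the result is 1-D (one element per row), not a 2-D matrix.
r = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",", names=True)

print(r.shape)        # one record per data row
print(r.dtype.names)  # field names taken from the header line
```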

Answers:



Pure numpy

import numpy
numpy.loadtxt(open("test.csv"), delimiter=",", skiprows=1)

See the loadtxt documentation.
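A minimal sketch of the loadtxt approach, assuming the sample rows come from an io.StringIO buffer (the name CSV_TEXT is hypothetical) rather than test.csv. skiprows=1 drops the quoted header line, and what comes back is a plain 2-D float array that can be sliced by column:

```python
import io

import numpy as np

CSV_TEXT = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
'''

# Skip the header line; every remaining field is parsed as a float.
data = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=",", skiprows=1)

print(data.shape)          # (3, 7): three rows, seven columns
first_col = data[:, 0]     # column "A"
timestamps = data[:, -1]   # column "timestamp"
print(first_col)
print(timestamps)
```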

You can also use Python's csv module:

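import csv
import numpy
# In Python 3 the csv module expects a text-mode file opened with newline=""
reader = csv.reader(open("test.csv", newline=""), delimiter=",")
next(reader)  # skip the header row, which cannot be converted to float
x = list(reader)
result = numpy.array(x).astype("float")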

You will have to convert the strings to whatever numeric type you prefer. I guess you could write the whole thing as a one-liner:

result = numpy.array(list(csv.reader(open("test.csv", newline=""), delimiter=","))[1:]).astype("float")

Added tip:

You can also use pandas.io.parsers.read_csv and take the associated numpy array, which can be faster.
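A minimal sketch of the pandas route, assuming pandas 0.24 or later is installed (for DataFrame.to_numpy()). pandas.read_csv is the top-level name for the same parser, and the sample rows are read from a hypothetical io.StringIO buffer rather than a file:

```python
import io

import pandas as pd

CSV_TEXT = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
'''

# read_csv strips the quotes and uses the header line as column labels.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# to_numpy() returns the values only (no header) as a 2-D ndarray.
matrix = df.to_numpy()

print(df.columns.tolist())
print(matrix.shape)
```

Keeping the DataFrame around is handy when you also want the column names for plot labels.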


I'd also add that skiprows=1 skips the first line; it is not enabled by default, so leave it out if you want to keep all of the data. Works perfectly!
Arturo

loadtxt does not also load the column names, which genfromtxt does with names=True.
mhstnsc
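One way to get both the column names and a 2-D array is to parse the header line separately. A minimal sketch, assuming the file contents are already in a string (the name CSV_TEXT is hypothetical); it relies on loadtxt accepting a list of strings as input:

```python
import csv
import io

import numpy as np

CSV_TEXT = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
'''

lines = CSV_TEXT.splitlines()

# The csv module removes the quotes around the header names.
names = next(csv.reader([lines[0]]))

# Feed only the data lines to loadtxt to get a plain 2-D float array.
data = np.loadtxt(lines[1:], delimiter=",")

# Map each name to its column index for label-based slicing.
col = {name: i for i, name in enumerate(names)}
print(names)
print(data[:, col["E"]])
```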

Can I ask: with open called inline on that line, does the file get closed at the end of the line as shown?
Daniel Soutar '18

Yes, it closes the file. See also: stackoverflow.com/questions/8011797/…
Kaveh_kh
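The inline open is closed once the file object is garbage-collected, which CPython does promptly through reference counting; a with block closes it deterministically on any interpreter. A small sketch, assuming a temporary file stands in for test.csv (CSV_TEXT and path are hypothetical names):

```python
import os
import tempfile

import numpy as np

CSV_TEXT = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
'''

# Write the sample data to a temporary file standing in for test.csv.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(CSV_TEXT)
    path = tmp.name

# The with block closes the file as soon as loadtxt returns.
with open(path) as f:
    data = np.loadtxt(f, delimiter=",", skiprows=1)

print(f.closed)    # True
print(data.shape)

os.remove(path)    # clean up the temporary file
```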

I'd recommend the second method, since loadtxt is very slow. Alternatively, pandas is great for this purpose.
fireball.1


I think using dtype when there is a names row is confusing the routine. Try

>>> r = np.genfromtxt(fname, delimiter=',', names=True)
>>> r
array([[  6.11882430e+02,   9.08956010e+03,   5.13300000e+03,
          8.64075140e+02,   1.71537476e+03,   7.65227770e+02,
          1.29111196e+12],
       [  6.11882430e+02,   9.08956010e+03,   5.13300000e+03,
          8.64075140e+02,   1.71537476e+03,   7.65227770e+02,
          1.29111311e+12],
       [  6.11882430e+02,   9.08956010e+03,   5.13300000e+03,
          8.64075140e+02,   1.71537476e+03,   7.65227770e+02,
          1.29112065e+12]])
>>> r[:,0]    # Slice 0'th column
array([ 611.88243,  611.88243,  611.88243])
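In current NumPy releases names=True produces a 1-D structured array, so a way to get the 2-D result shown above from genfromtxt is to skip the header line instead of naming the fields. A minimal sketch, assuming an io.StringIO buffer (hypothetical name CSV_TEXT) in place of fname:

```python
import io

import numpy as np

CSV_TEXT = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
'''

# skip_header=1 discards the quoted header; without names=True every
# field is parsed as a float into a single 2-D array.
r = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",", skip_header=1)

print(r.shape)    # (3, 7)
print(r[:, 0])    # slice the 0th column
```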

Interestingly, that doesn't change anything in my case. I'm using Python 2.5 and numpy 1.4.1, so maybe that's the problem.
dgorissen 2010

I'm using Python 2.6 and NumPy 1.3.0! I prefer the old behavior.
mtrw


You can use np.genfromtxt to read a CSV file with a header into a NumPy structured array. For example:

import numpy as np

csv_fname = 'file.csv'
with open(csv_fname, 'w') as fp:
    fp.write("""\
"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
""")

# Read the CSV file into a Numpy record array
r = np.genfromtxt(csv_fname, delimiter=',', names=True, case_sensitive=True)
print(repr(r))

which looks like this:

array([(611.88243, 9089.5601, 5133., 864.07514, 1715.37476, 765.22777, 1.29111196e+12),
       (611.88243, 9089.5601, 5133., 864.07514, 1715.37476, 765.22777, 1.29111311e+12),
       (611.88243, 9089.5601, 5133., 864.07514, 1715.37476, 765.22777, 1.29112065e+12)],
      dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8'), ('D', '<f8'), ('E', '<f8'), ('F', '<f8'), ('timestamp', '<f8')])

You can access a named column like this, r['E']:

array([1715.37476, 1715.37476, 1715.37476])

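Note: this answer previously used np.recfromcsv to read the data into a NumPy record array. While there is nothing wrong with that method, structured arrays are generally better than record arrays in both speed and compatibility.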
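If you want both named access and MATLAB-style 2-D slicing, the structured array can be converted afterwards. A minimal sketch, assuming NumPy 1.16 or later (where numpy.lib.recfunctions.structured_to_unstructured is available) and a hypothetical io.StringIO buffer in place of file.csv:

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

CSV_TEXT = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
'''

r = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",", names=True)

# Named access still works on the structured array ...
e_col = r["E"]

# ... and all fields can be copied into one (rows, columns) float array.
m = rfn.structured_to_unstructured(r)

print(m.shape)                           # (3, 7)
print(m[:, r.dtype.names.index("E")])    # same values as r["E"]
```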

Licensed under cc by-sa 3.0 with attribution required.