Parsing a CSV file using NodeJS


124

With nodejs I want to parse a .csv file of 10000 records and do some operation on each row. I tried using http://www.adaltas.com/projects/node-csv. I couldn't get this to pause at each row. This just reads through all the 10000 records. I need to do the following:

  1. Read the csv line by line
  2. Perform a time-consuming operation on each line
  3. Go to the next line

Can anyone please suggest any alternative ideas here?


Maybe this will help you: stackoverflow.com/a/15554600/1169798
Sirko 2014

1
Did you add callbacks for each row? Otherwise it is just going to read them all asynchronously.
Ben Fortune

Answers:


81

It seems like you need to use some stream-based solution; such libraries already exist, so before reinventing the wheel, try this library, which also includes validation support: https://www.npmjs.org/package/fast-csv
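
As a quick illustration, here is a minimal sketch of the stream-based approach with fast-csv, including its row-validation hook (the file name data.csv and the age check are assumptions for the example):

const fs = require('fs');
const csv = require('fast-csv');

fs.createReadStream('data.csv')
    .pipe(csv.parse({ headers: true }))         // one object per row, keyed by header
    .validate(row => Number(row.age) >= 0)      // reject rows that fail the predicate
    .on('data-invalid', row => console.warn('invalid row', row))
    .on('data', row => console.log(row))        // valid rows arrive here one at a time
    .on('error', err => console.error(err))
    .on('end', rowCount => console.log('parsed ' + rowCount + ' rows'));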


26
NodeCSV is also well supported and happens to have roughly an order of magnitude more users. npmjs.com/package/csv
steampowered 2015

4
fast-csv is fast, and easy to use and get started with.
Roger Garzon Nieto

1
Does it support urls?
DMS-KH

57

I used this approach:

var fs = require('fs'); 
var parse = require('csv-parse');

var csvData=[];
fs.createReadStream(req.file.path)
    .pipe(parse({delimiter: ':'}))
    .on('data', function(csvrow) {
        console.log(csvrow);
        //do something with csvrow
        csvData.push(csvrow);        
    })
    .on('end',function() {
      //do something with csvData
      console.log(csvData);
    });

2
I might be doing something wrong, but when I run this, parse is not defined. Is there something I'm missing? When I run npm install csv-parse and then add var parse = require("csv-parse"); to the code, it works. Are you sure yours works? Either way, I like this solution (even though I had to include the csv-parse module).
Ian

1
You are right @Ian, the csv-parse module should be included.
vineet '16

1
Awesome, thank you for verifying and updating your answer!
Ian

3
Nice solution. Works for me.
sun bee

3
Sadly this is bad - I got errors with huge files and long lines... (memory errors - though reading it in other ways works fine)
Seti 2016

54

My current solution uses the async module to execute in series:

var fs = require('fs');
var parse = require('csv-parse');
var async = require('async');

var inputFile='myfile.csv';

var parser = parse({delimiter: ','}, function (err, data) {
  async.eachSeries(data, function (line, callback) {
    // do something with the line
    doSomething(line).then(function() {
      // when processing finishes invoke the callback to move to the next one
      callback();
    });
  })
});
fs.createReadStream(inputFile).pipe(parser);

1
I think you are missing some ')'?
Steven Luong C

I think adding ')' at the end of lines 14 and 15 should fix the problem.
ShashankVivek '16

@ShashankVivek - in this old answer (from 2015), 'async' is the npm library that is used. More about it at caolan.github.io/async - and to understand why this may help: blog.risingstack.com/node-hero-async-programming-in-node-js. But javascript has evolved a lot since 2015, so if your question is more about async in general, then read this more recent article: medium.com/@tkssharma/…
prule
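
As an aside to the point above: on modern Node the same sequential processing can be written without the async library, because parser streams are async-iterable. A minimal sketch under that assumption (doSomething stands in for the answer's time-consuming, promise-returning work):

const fs = require('fs');
const parse = require('csv-parse');

async function run() {
    // pipe the file through the parser, then consume it one row at a time
    const parser = fs.createReadStream('myfile.csv').pipe(parse({ delimiter: ',' }));
    for await (const line of parser) {
        // await pauses iteration, so each row finishes before the next begins
        await doSomething(line);
    }
}
run();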

15
  • This solution uses csv-parser instead of the csv-parse used in some of the answers above.
  • csv-parser came around 2 years after csv-parse.
  • Both of them solve the same purpose, but personally I have found csv-parser better, as it is easy to handle headers through it.

First install csv-parser:

npm install csv-parser

So suppose you have a csv file like this:

NAME, AGE
Lionel Messi, 31
Andres Iniesta, 34

You can perform the required operation as follows:

const fs = require('fs'); 
const csv = require('csv-parser');

fs.createReadStream(inputFilePath)
.pipe(csv())
.on('data', function(data){
    try {
        console.log("Name is: "+data.NAME);
        console.log("Age is: "+data.AGE);

        //perform the operation
    }
    catch(err) {
        //error handler
    }
})
.on('end',function(){
    //some final operation
});  

For further information, refer to the package documentation.


13

In order to pause the streaming in fast-csv, you can do the following:

let csvstream = csv.fromPath(filePath, { headers: true })
    .on("data", function (row) {
        csvstream.pause();
        // do some heavy work
        // when done resume the stream
        csvstream.resume();
    })
    .on("end", function () {
        console.log("We are done!")
    })
    .on("error", function (error) {
        console.log(error)
    });

csvstream.pause() and resume() are what I've been looking for! My application would always run out of memory because it read data much faster than it could process it.
ehrhardt

@adnan Thanks for pointing this out. It is not mentioned in the documentation, and that is what I was looking for too.
Piyush Beli

10

The node-csv project that you are referencing is perfectly capable of the task of transforming each row of a large portion of CSV data, per the docs at http://csv.adaltas.com/transform/:

csv()
  .from('82,Preisner,Zbigniew\n94,Gainsbourg,Serge')
  .to(console.log)
  .transform(function(row, index, callback){
    process.nextTick(function(){
      callback(null, row.reverse());
    });
  });

From my experience, I can say that it is also a rather fast implementation; I have been working with it on data sets of nearly 10k records, and the processing times were at a reasonable tens-of-milliseconds level for the whole set.

Regarding jurka's stream-based solution suggestion: node-csv IS stream based and follows the Node.js streaming API.
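
Since the parser is an ordinary Node stream, the OP's pause-per-row requirement can also be met with the standard stream API. A minimal sketch under that assumption (slowOperation stands in for the time-consuming, promise-returning work):

var fs = require('fs');
var parse = require('csv-parse');

var parser = fs.createReadStream('myfile.csv').pipe(parse({ delimiter: ',' }));
parser.on('data', function (row) {
    parser.pause();                        // stop the flow of rows
    slowOperation(row).then(function () {
        parser.resume();                   // ask for the next row when done
    });
});
parser.on('end', function () {
    console.log('done');
});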


8

The fast-csv npm module can read data line-by-line from a csv file.

Here is an example:

let csv = require('fast-csv');
let fs = require('fs');

var stream = fs.createReadStream("my.csv");

csv
 .parseStream(stream, {headers : true})
 .on("data", function(data){
     console.log('I am one line of data', data);
 })
 .on("end", function(){
     console.log("done");
 });

1
fast-csv@4.0.2 doesn't have fromStream(), and its project site lacks examples and documentation.
Cees Timmerman

3

I needed an async csv reader and originally tried @Pransh Tiwari's answer, but couldn't get it working with await and util.promisify(). Eventually I came across node-csvtojson, which does pretty much the same as csv-parser, but with promises. Here is an example usage of csvtojson in action:

const csvToJson = require('csvtojson');

const processRecipients = async () => {
    const recipients = await csvToJson({
        trim:true
    }).fromFile('./recipients.csv');

    // Code executes after recipients are fully loaded.
    recipients.forEach((recipient) => {
        console.log(recipient.name, recipient.email);
    });
};

2

Try the line-by-line npm plugin.

npm install line-by-line --save
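
For illustration, a minimal sketch of how this package's reader can gate a slow per-line operation with its pause/resume API (the file name and the timeout stand in for real work; note it emits raw text lines, so you still have to split CSV fields yourself):

var LineByLineReader = require('line-by-line');
var lr = new LineByLineReader('big_file.csv');

lr.on('line', function (line) {
    lr.pause();                  // stop emitting lines while we work
    setTimeout(function () {     // stand-in for a time-consuming operation
        lr.resume();             // continue with the next line
    }, 100);
});
lr.on('end', function () {
    console.log('all lines read');
});
lr.on('error', function (err) {
    console.log(err);
});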

5
Installing a plugin was not the question that was asked. Adding some code to explain how to use the plugin and/or explain why the OP should use it would be far more beneficial.
domdambrogia

2

This is my solution to get a csv file from an external url:

const parse = require( 'csv-parse/lib/sync' );
const axios = require( 'axios' );

const readCSV = ( module.exports.readCSV = async ( path ) => {
  try {
    const res = await axios( { url: path, method: 'GET', responseType: 'blob' } );
    let records = parse( res.data, {
      columns: true,
      skip_empty_lines: true
    } );

    return records;
  } catch ( e ) {
    console.log( 'err' );
  }
} );
readCSV('https://urltofilecsv');

2

A workaround for doing this task with await/async:

const csv = require('csvtojson')
const csvFilePath = 'data.csv'
// note: await is only valid inside an async function (or an ES module with top-level await)
const array = await csv().fromFile(csvFilePath);

2

Ok, so there are a lot of answers here, and I don't think they address your question, which I think is similar to mine.

You need to do an operation like contacting a database or a third-party api, which takes time and is asynchronous. You do not want to load the entire document into memory because it is too large or for some other reason, so you need to read line by line to process it.

I have read through the fs documents, and it can pause while reading, but using the .on('data') call makes the reading continuous, which is what most of these answers use and what causes the problem.


UPDATE: I now know more about streams than I ever wanted to

The best way to do this is to create a writable stream. This pipes the csv data into your writable stream, in which you can manage your asynchronous calls. The pipe manages the buffer all the way back to the reader, so you will not end up with heavy memory usage.

Simple Version

const fs = require('fs');
const parser = require('csv-parser');
const stripBom = require('strip-bom-stream');
const stream = require('stream')

const mySimpleWritable = new stream.Writable({
  objectMode: true, // Because input is object from csv-parser
  write(chunk, encoding, done) { // Required
    // chunk is object with data from a line in the csv
    console.log('chunk', chunk)
    done();
  },
  final(done) { // Optional
    // last place to clean up when done
    done();
  }
});
fs.createReadStream(fileNameFull).pipe(stripBom()).pipe(parser()).pipe(mySimpleWritable)

Class Version

const fs = require('fs');
const parser = require('csv-parser');
const stripBom = require('strip-bom-stream');
const stream = require('stream')
// Create writable class
class MyWritable extends stream.Writable {
  // Used to set object mode because we get an object piped in from csv-parser
  constructor(another_variable, options) {
    // Calls the stream.Writable() constructor.
    super({ ...options, objectMode: true });
    // additional information if you want
    this.another_variable = another_variable
  }
  // The write method
  // Called over and over, for each line in the csv
  async _write(chunk, encoding, done) {
    // The chunk will be a line of your csv as an object
    console.log('Chunk Data', this.another_variable, chunk)

    // demonstrate await call
    // This will pause the process until it is finished
    await new Promise(resolve => setTimeout(resolve, 2000));

    // Very important to add.  Keeps the pipe buffers correct.  Will load the next line of data
    done();
  };
  // Gets called when all lines have been read
  async _final(done) {
    // Can do more calls here with left over information in the class
    console.log('clean up')
    // lets pipe know its done and the .on('final') will be called
    done()
  }
}

// Instantiate the new writable class
const myWritable = new MyWritable(somevariable) // somevariable is whatever extra state you want inside the writable
// Pipe the read stream to csv-parser, then to your write class
// stripBom is due to Excel saving csv files with UTF8 - BOM format
fs.createReadStream(fileNameFull).pipe(stripBom()).pipe(parser()).pipe(myWritable)

// optional
.on('finish', () => {
  // will be called after the wriables internal _final
  console.log('Called very last')
})

The Old Way:

Problem with readable

const csv = require('csv-parser');
const fs = require('fs');

const processFileByLine = async(fileNameFull) => {

  let reading = false

  const rr = fs.createReadStream(fileNameFull)
  .pipe(csv())

  // Magic happens here
  rr.on('readable', async function(){
    // Called once when data starts flowing
    console.log('starting readable')

    // Found this might be called a second time for some reason
    // This will stop that event from happening
    if (reading) {
      console.log('ignoring reading')
      return
    }
    reading = true
    
    while (null !== (data = rr.read())) {
      // data variable will be an object with information from the line it read
      // PROCESS DATA HERE
      console.log('new line of data', data)
    }

    // All lines have been read and file is done.
    // End event will be called about now so that code will run before below code

    console.log('Finished readable')
  })


  rr.on("end", function () {
    // File has finished being read
    console.log('closing file')
  });

  rr.on("error", err => {
    // Some basic error handling for fs error events
    console.log('error', err);
  });
}

You will notice a reading flag. I noticed that, for some reason, near the end of the file .on('readable') gets called a second time on both small and large files. I am unsure why, but this blocks a second process from reading the same line items.


1

I use this simple one: https://www.npmjs.com/package/csv-parser

Very simple to use:

const csv = require('csv-parser')
const fs = require('fs')
const results = [];

fs.createReadStream('./CSVs/Update 20191103C.csv')
  .pipe(csv())
  .on('data', (data) => results.push(data))
  .on('end', () => {
    console.log(results);
    console.log(results[0]['Lowest Selling Price'])
  });

1

I was using csv-parse, but for larger files I ran into performance problems. One of the better libraries I have found is Papa Parse; the docs are good, there is good support, it is lightweight, and it has no dependencies.

Install papaparse:

npm install papaparse

Usage:

  • async / await
const fs = require('fs');
const Papa = require('papaparse');

const csvFilePath = 'data/test.csv'

// Function to read csv which returns a promise so you can do async / await.

const readCSV = async (filePath) => {
  const csvFile = fs.readFileSync(filePath)
  const csvData = csvFile.toString()  
  return new Promise(resolve => {
    Papa.parse(csvData, {
      header: true,
      transformHeader: header => header.trim(),
      complete: results => {
        console.log('Complete', results.data.length, 'records.'); 
        resolve(results.data);
      }
    });
  });
};

const test = async () => {
  let parsedData = await readCSV(csvFilePath); 
}

test()
  • callback
const fs = require('fs');
const Papa = require('papaparse');

const csvFilePath = 'data/test.csv'

const file = fs.createReadStream(csvFilePath);

var csvData=[];
Papa.parse(file, {
  header: true,
  transformHeader: header => header.trim(),
  step: function(result) {
    csvData.push(result.data)
  },
  complete: function(results, file) {
    console.log('Complete', csvData.length, 'records.'); 
  }
});

Note: header: true is an option in the config; see the docs for other options.
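
For reference, a short sketch of a few other commonly used config options (option names per the Papa Parse docs; csvData is the CSV string, as in the async/await example above):

Papa.parse(csvData, {
  header: true,           // treat the first row as column names
  dynamicTyping: true,    // convert numeric and boolean strings to real types
  skipEmptyLines: true,   // silently drop blank lines
  complete: function(results) {
    console.log(results.data);
  }
});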


0
const fs = require('fs');
fs.readFile('FILENAME WITH PATH', 'utf8', function (err, content) {
  if (err) {
    console.log('error occurred ' + JSON.stringify(err));
    return;
  }
  console.log('File contents are ' + JSON.stringify(content));
});


-1

npm install csv

Sample CSV file: you will need a CSV file to parse, so either you already have one, or you can copy the text below, paste it into a new file, and call that file "mycsv.csv".

ABC, 123, Fudge
532, CWE, ICECREAM
8023, POOP, DOGS
441, CHEESE, CARMEL
221, ABC, HOUSE

Sample code to read and parse the CSV file

Create a new file and insert the following code into it. Make sure to read through what is going on behind the scenes.

var csv = require('csv');
// loads the csv module referenced above.

var obj = csv();
// gets the csv module to access the required functionality

function MyCSV(Fone, Ftwo, Fthree) {
    this.FieldOne = Fone;
    this.FieldTwo = Ftwo;
    this.FieldThree = Fthree;
};
// Define the MyCSV object with parameterized constructor, this will be used for storing the data read from the csv into an array of MyCSV. You will need to define each field as shown above.

var MyData = [];
// MyData array will contain the data from the CSV file and it will be sent to the clients request over HTTP.

obj.from.path('../THEPATHINYOURPROJECT/TOTHE/csv_FILE_YOU_WANT_TO_LOAD.csv').to.array(function (data) {
    for (var index = 0; index < data.length; index++) {
        MyData.push(new MyCSV(data[index][0], data[index][1], data[index][2]));
    }
    console.log(MyData);
});
// Reads the CSV file from the path you specify, and the data is stored in the array we specified using a callback function. This function iterates through the array, and each line from the CSV file is pushed as a record to another array called MyData; it then logs the data to the console to ensure it worked.

var http = require('http');
//Load the http module.

var server = http.createServer(function (req, resp) {
    resp.writeHead(200, { 'content-type': 'application/json' });
    resp.end(JSON.stringify(MyData));
});
// Create a webserver with a request listener callback.  This will write the response header with the content type as json, and end the response by sending the MyData array in JSON format.

server.listen(8080);
// Tells the webserver to listen on port 8080(obviously this may be whatever port you want.)
Things to be aware of in your app.js code:
In lines 7 through 11 (the MyCSV definition), we define the function and the field names.

If your CSV file has multiple columns, make sure you define this correctly to match your file.

On line 17 (the obj.from.path call), we define the location of the CSV file we are loading. Make sure you use the correct path here.

Start your app and verify functionality. Open a console and type the following command:

node app

You should see the following output in your console:

[  MYCSV { Fieldone: 'ABC', Fieldtwo: '123', Fieldthree: 'Fudge' },
   MYCSV { Fieldone: '532', Fieldtwo: 'CWE', Fieldthree: 'ICECREAM' },
   MYCSV { Fieldone: '8023', Fieldtwo: 'POOP', Fieldthree: 'DOGS' },
   MYCSV { Fieldone: '441', Fieldtwo: 'CHEESE', Fieldthree: 'CARMEL' },
   MYCSV { Fieldone: '221', Fieldtwo: 'ABC', Fieldthree: 'HOUSE' }, ]

Now you should open a web browser and navigate to your server. You should see it output the data in JSON format.

Conclusion: using node.js and its csv module, we can quickly and easily read and use data stored on the server and make it available to the client upon request.
