Using nodejs, I want to parse a .csv file of 10000 records and do some operation on each row. I tried using http://www.adaltas.com/projects/node-csv. I couldn't get this to pause at each row. This just reads through all 10000 records. I need to do the following:
- read the csv line by line
- perform some time-consuming operation on each line
- go to the next line
Can anyone please suggest any alternative ideas here?
Answers:
It seems that you need to use some stream-based solution; such libraries already exist, so before reinventing yourself, try this library, which also includes validation support: https://www.npmjs.org/package/fast-csv
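As a rough sketch of what that can look like (assuming the current fast-csv parseFile/validate API; 'data.csv' and the name check are placeholders):

const csv = require('fast-csv');

// Parse the file row by row, dropping rows that fail validation.
csv.parseFile('data.csv', { headers: true })
    .validate(row => row.name !== '')            // reject rows with an empty name
    .on('data-invalid', (row, rowNumber) => {
        console.log('Invalid row #' + rowNumber, row);
    })
    .on('data', row => {
        // process each valid row here
        console.log(row);
    })
    .on('end', rowCount => console.log('Parsed ' + rowCount + ' rows'));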
I used this approach:
var fs = require('fs');
var parse = require('csv-parse');

var csvData = [];
fs.createReadStream(req.file.path)
    .pipe(parse({ delimiter: ':' }))
    .on('data', function (csvrow) {
        console.log(csvrow);
        // do something with csvrow
        csvData.push(csvrow);
    })
    .on('end', function () {
        // do something with csvData
        console.log(csvData);
    });
Comment: parse is undefined. Is there something I'm missing? When I run npm install csv-parse and then add var parse = require("csv-parse"); to the code, it works. Are you sure yours works as written? Either way, I like this solution (even if I have to include the csv-parse module).
My current solution uses the async module to execute in series:
var fs = require('fs');
var parse = require('csv-parse');
var async = require('async');

var inputFile = 'myfile.csv';

var parser = parse({ delimiter: ',' }, function (err, data) {
    async.eachSeries(data, function (line, callback) {
        // do something with the line
        doSomething(line).then(function () {
            // when processing finishes invoke the callback to move to the next one
            callback();
        });
    });
});
fs.createReadStream(inputFile).pipe(parser);
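doSomething is not defined in the answer; any function that returns a promise will do. A hypothetical stand-in for illustration:

// Hypothetical stand-in for the time-consuming asynchronous per-line work.
function doSomething(line) {
    return new Promise(function (resolve) {
        // a database insert or API call would go here
        setTimeout(function () {
            console.log('processed', line);
            resolve();
        }, 100);
    });
}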
Use csv-parser instead of csv-parse. csv-parser came out about 2 years after csv-parse. csv-parser is better, since it is easy to handle headers with it. First install csv-parser:
npm install csv-parser
So suppose you have a csv file like this:
NAME,AGE
Lionel Messi,31
Andres Iniesta,34
You can perform the required operation as:
const fs = require('fs');
const csv = require('csv-parser');

fs.createReadStream(inputFilePath)
    .pipe(csv())
    .on('data', function (data) {
        try {
            console.log('Name is: ' + data.NAME);
            console.log('Age is: ' + data.AGE);
            // perform the operation
        } catch (err) {
            // error handler
        }
    })
    .on('end', function () {
        // some final operation
    });
For more info, refer to the csv-parser documentation.
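Since the original question asks to pause on every row, note that the stream returned by csv-parser is an ordinary Node.js stream, so you can pause it while slow work runs. This pattern is not from the answer above; it is a sketch, and slowOperation is a hypothetical promise-returning task:

const fs = require('fs');
const csv = require('csv-parser');

const stream = fs.createReadStream(inputFilePath).pipe(csv());

stream.on('data', (row) => {
    stream.pause();                  // stop 'data' events while we work on this row
    slowOperation(row).then(() => {  // slowOperation: hypothetical async task
        stream.resume();             // pull the next row only when done
    });
});
stream.on('end', () => console.log('all rows processed'));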
In order to pause the streaming in fast-csv, you can do the following:
// note: fromPath() is the older fast-csv API; in fast-csv v3+ this is csv.parseFile()
let csvstream = csv.fromPath(filePath, { headers: true })
    .on('data', function (row) {
        csvstream.pause();
        // do some heavy work
        // when done resume the stream
        csvstream.resume();
    })
    .on('end', function () {
        console.log('We are done!');
    })
    .on('error', function (error) {
        console.log(error);
    });
The node-csv project that you are referencing is completely sufficient for the task of transforming each row of a large portion of CSV data, from the docs at http://csv.adaltas.com/transform/:
csv()
    .from('82,Preisner,Zbigniew\n94,Gainsbourg,Serge')
    .to(console.log)
    .transform(function (row, index, callback) {
        process.nextTick(function () {
            callback(null, row.reverse());
        });
    });
From my experience, I can say that it is also a rather fast implementation: I have been working with it on data sets with near 10k records, and the processing times were at a reasonable tens-of-milliseconds level for the whole set.
Regarding jurka's stream-based suggestion: node-csv IS stream based and follows the Node.js streaming API.
The fast-csv npm module can read data line-by-line from a csv file.
Here is an example:
const fs = require('fs');
let csv = require('fast-csv');

var stream = fs.createReadStream('my.csv');
csv.parseStream(stream, { headers: true })
    .on('data', function (data) {
        console.log('I am one line of data', data);
    })
    .on('end', function () {
        console.log('done');
    });
Comment: there is also fromStream(), but the project site lacks examples and documentation for it.
I needed an async csv reader and originally tried @Pransh Tiwari's answer, but couldn't get it working with await and util.promisify(). Eventually I came across node-csvtojson, which does pretty much the same as csv-parser, but with promises. Here is an example use of csvtojson in action:
const csvToJson = require('csvtojson');

const processRecipients = async () => {
    const recipients = await csvToJson({
        trim: true
    }).fromFile('./recipients.csv');

    // Code executes after recipients are fully loaded.
    recipients.forEach((recipient) => {
        console.log(recipient.name, recipient.email);
    });
};
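csvtojson can also process one row at a time: if the .subscribe() handler returns a promise, the parser waits for it to resolve before reading the next line. A sketch based on the documented API (verify against the version you install):

const csvToJson = require('csvtojson');

csvToJson()
    .subscribe((json) => {
        // Returning a promise makes csvtojson wait before parsing the next line.
        return new Promise((resolve) => {
            console.log(json);          // the current row as an object
            setTimeout(resolve, 100);   // stand-in for slow async work
        });
    })
    .fromFile('./recipients.csv')
    .then(() => console.log('done'));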
Try the line-by-line npm plugin:
npm install line-by-line --save
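The answer doesn't include code; a minimal sketch of line-by-line's pause/resume flow (note you would still have to split each line on the delimiter yourself):

const LineByLineReader = require('line-by-line');
const lr = new LineByLineReader('myfile.csv');

lr.on('line', (line) => {
    lr.pause();                          // stop emitting lines
    const fields = line.split(',');      // naive CSV split; no quoting support
    // ...time-consuming work on fields here...
    setTimeout(() => lr.resume(), 100);  // resume once the work is done
});
lr.on('end', () => console.log('all lines read'));
lr.on('error', (err) => console.log(err));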
This is my solution to fetch a csv file from an external url:
const parse = require('csv-parse/lib/sync');
const axios = require('axios');

const readCSV = (module.exports.readCSV = async (path) => {
    try {
        const res = await axios({ url: path, method: 'GET', responseType: 'blob' });
        let records = parse(res.data, {
            columns: true,
            skip_empty_lines: true
        });
        return records;
    } catch (e) {
        console.log('err');
    }
});
readCSV('https://urltofilecsv');
Ok, so there are many answers here, and I don't think they answer your question, which I think is similar to mine.
You need to do an operation like contacting a database or a third-party API, which takes time and is asynchronous. You do not want to load the entire document into memory because it is too large, or for some other reason, so you need to read line by line to process.
I have read through the fs documentation and it can pause on reading, but using the .on('data') call will make it continuous, which is what most of these answers use and which causes the problem.
Update: I know more info about streams than I ever wanted to.
The best way to do this is to create a writable stream. This will pipe the csv data into your writable stream, where you can manage the asynchronous calls. The pipe will manage the buffer all the way back to the reader, so you will not wind up with heavy memory usage.
Simple version
const fs = require('fs');
const parser = require('csv-parser');
const stripBom = require('strip-bom-stream');
const stream = require('stream');

const mySimpleWritable = new stream.Writable({
    objectMode: true, // Because input is object from csv-parser
    write(chunk, encoding, done) { // Required
        // chunk is object with data from a line in the csv
        console.log('chunk', chunk);
        done();
    },
    final(done) { // Optional
        // last place to clean up when done
        done();
    }
});
fs.createReadStream(fileNameFull).pipe(stripBom()).pipe(parser()).pipe(mySimpleWritable);
Class version
const fs = require('fs');
const parser = require('csv-parser');
const stripBom = require('strip-bom-stream');
const stream = require('stream');

// Create writable class
class MyWritable extends stream.Writable {
    // Used to set object mode because we get an object piped in from csv-parser
    constructor(another_variable, options) {
        // Calls the stream.Writable() constructor.
        super({ ...options, objectMode: true });
        // additional information if you want
        this.another_variable = another_variable;
    }

    // The write method
    // Called over and over, for each line in the csv
    async _write(chunk, encoding, done) {
        // The chunk will be a line of your csv as an object
        console.log('Chunk Data', this.another_variable, chunk);

        // demonstrate await call
        // This will pause the process until it is finished
        await new Promise(resolve => setTimeout(resolve, 2000));

        // Very important to add. Keeps the pipe buffers correct. Will load the next line of data
        done();
    }

    // Gets called when all lines have been read
    async _final(done) {
        // Can do more calls here with left over information in the class
        console.log('clean up');
        // lets pipe know its done and the .on('finish') will be called
        done();
    }
}

// Instantiate the new writable class
const myWritable = new MyWritable(somevariable); // somevariable: whatever extra state you want to pass in

// Pipe the read stream to csv-parser, then to your write class
// stripBom is due to Excel saving csv files with UTF8 - BOM format
fs.createReadStream(fileNameFull).pipe(stripBom()).pipe(parser()).pipe(myWritable)
    // optional
    .on('finish', () => {
        // will be called after the writable's internal _final
        console.log('Called very last');
    });
Old method: problem with readable
const csv = require('csv-parser');
const fs = require('fs');

const processFileByLine = async (fileNameFull) => {
    let reading = false;

    const rr = fs.createReadStream(fileNameFull)
        .pipe(csv());

    // Magic happens here
    rr.on('readable', async function () {
        // Called once when data starts flowing
        console.log('starting readable');

        // Found this might be called a second time for some reason
        // This will stop that event from happening
        if (reading) {
            console.log('ignoring reading');
            return;
        }
        reading = true;

        while (null !== (data = rr.read())) {
            // data variable will be an object with information from the line it read
            // PROCESS DATA HERE
            console.log('new line of data', data);
        }

        // All lines have been read and file is done.
        // End event will be called about now so that code will run before below code
        console.log('Finished readable');
    });

    rr.on('end', function () {
        // File has finished being read
        console.log('closing file');
    });

    rr.on('error', err => {
        // Some basic error handling for fs error events
        console.log('error', err);
    });
};
You will notice a reading flag. I noticed that, for some reason, near the end of the file .on('readable') gets called a second time on small and large files. I am not sure why, but this blocks a second process from reading the same line items.
I used this simple one: https://www.npmjs.com/package/csv-parser
Very simple to use:
const csv = require('csv-parser');
const fs = require('fs');

const results = [];
fs.createReadStream('./CSVs/Update 20191103C.csv')
    .pipe(csv())
    .on('data', (data) => results.push(data))
    .on('end', () => {
        console.log(results);
        console.log(results[0]['Lowest Selling Price']);
    });
I was using csv-parse, but for larger files I was running into performance issues. One of the better libraries I have found is Papa Parse: the docs are good, it has good support, and it is lightweight with no dependencies.
Install papaparse:
npm install papaparse
Usage:
const fs = require('fs');
const Papa = require('papaparse');

const csvFilePath = 'data/test.csv';

// Function to read csv which returns a promise so you can do async / await.
const readCSV = async (filePath) => {
    const csvFile = fs.readFileSync(filePath);
    const csvData = csvFile.toString();
    return new Promise(resolve => {
        Papa.parse(csvData, {
            header: true,
            transformHeader: header => header.trim(),
            complete: results => {
                console.log('Complete', results.data.length, 'records.');
                resolve(results.data);
            }
        });
    });
};

const test = async () => {
    let parsedData = await readCSV(csvFilePath);
};

test();
A streaming version using the step callback:

const fs = require('fs');
const Papa = require('papaparse');

const csvFilePath = 'data/test.csv';
const file = fs.createReadStream(csvFilePath);

var csvData = [];
Papa.parse(file, {
    header: true,
    transformHeader: header => header.trim(),
    step: function (result) {
        csvData.push(result.data);
    },
    complete: function (results, file) {
        console.log('Complete', csvData.length, 'records.');
    }
});
Note that header: true is an option in the config; see the docs for other options.
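For example, a couple of other standard Papa Parse options (a sketch, parsing an inline string for brevity):

const Papa = require('papaparse');

Papa.parse('name,age\nLionel Messi,31\n\n', {
    header: true,          // first row becomes the field names
    skipEmptyLines: true,  // drop the trailing blank line
    dynamicTyping: true,   // '31' becomes the number 31
    complete: results => console.log(results.data)
});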
You can use the csv-to-json module to convert csv to json format, and then you can easily use the json file in your program.
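If the module meant here is the csvtojson package shown in an earlier answer (an assumption; './mycsv.csv' is a placeholder), converting a whole file is a one-liner:

const csvToJson = require('csvtojson'); // assuming the csvtojson package from the earlier answer

csvToJson()
    .fromFile('./mycsv.csv')
    .then((jsonArray) => {
        // jsonArray is the whole file as an array of objects
        console.log(jsonArray);
    });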
npm install csv
Sample CSV file: You are going to need a CSV file to parse, so either you have one already, or you can copy the text below, paste it into a new file, and call that file "mycsv.csv".
ABC, 123, Fudge
532, CWE, ICECREAM
8023, POOP, DOGS
441, CHEESE, CARMEL
221, ABC, HOUSE
Sample code to read and parse the CSV file:
Create a new file, and insert the following code into it. Make sure to read through what is going on behind the scenes.
var csv = require('csv');
// loads the csv module referenced above.
// note: from.path()/to.array() below is the legacy node-csv 0.x API.

var obj = csv();
// gets the csv module to access the required functionality

function MyCSV(Fone, Ftwo, Fthree) {
    this.FieldOne = Fone;
    this.FieldTwo = Ftwo;
    this.FieldThree = Fthree;
}
// Define the MyCSV object with a parameterized constructor. This will be used for storing the data read from the csv into an array of MyCSV. You will need to define each field as shown above.

var MyData = [];
// MyData array will contain the data from the CSV file, and it will be sent to the client's request over HTTP.

obj.from.path('../THEPATHINYOURPROJECT/TOTHE/csv_FILE_YOU_WANT_TO_LOAD.csv').to.array(function (data) {
    for (var index = 0; index < data.length; index++) {
        MyData.push(new MyCSV(data[index][0], data[index][1], data[index][2]));
    }
    console.log(MyData);
});
// Reads the CSV file from the path you specify, and stores the data in the array we specified using a callback function. This function iterates through the array, and each line from the CSV file is pushed as a record to another array called MyData, then logs the data to the console to ensure it worked.

var http = require('http');
// Load the http module.

var server = http.createServer(function (req, resp) {
    resp.writeHead(200, { 'content-type': 'application/json' });
    resp.end(JSON.stringify(MyData));
});
// Create a webserver with a request listener callback. This will write the response header with the content type as json, and end the response by sending the MyData array in JSON format.

server.listen(8080);
// Tells the webserver to listen on port 8080 (obviously this may be whatever port you want.)
Things to be aware of in your app.js code:
The MyCSV function defines the field names. If your CSV file has multiple columns, make sure you define this correctly to match your file.
The obj.from.path(...) call defines the location of the CSV file that is loaded. Make sure you use the correct path here.
Start your app and verify functionality. Open a console and type the following command:
Node app
You should see the following output in your console:
[ MYCSV { Fieldone: 'ABC', Fieldtwo: '123', Fieldthree: 'Fudge' },
MYCSV { Fieldone: '532', Fieldtwo: 'CWE', Fieldthree: 'ICECREAM' },
MYCSV { Fieldone: '8023', Fieldtwo: 'POOP', Fieldthree: 'DOGS' },
MYCSV { Fieldone: '441', Fieldtwo: 'CHEESE', Fieldthree: 'CARMEL' },
MYCSV { Fieldone: '221', Fieldtwo: 'ABC', Fieldthree: 'HOUSE' }, ]
Now you should open a web browser and navigate to your server. You should see it output the data in JSON format.
Conclusion: Using node.js and its CSV module, we can quickly and easily read and use data stored on the server and make it available to the client upon request.