CodexBloom - Programming Q&A Platform

Parsing Large JSON Files in Node.js - Out of Memory Issues

👀 Views: 29 💬 Answers: 1 📅 Created: 2025-07-02
Node.js JSON streaming JavaScript

I'm working on a project and hit a roadblock. I'm parsing a large JSON file (around 500MB) using Node.js (version 14.17.0), and I run into an out-of-memory error when trying to read it all at once. I initially used `fs.readFile` to load the entire file into memory and then parse it, but that resulted in this error:

```
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
```

I switched to `fs.createReadStream` together with a streaming JSON parser to process the file in chunks, but I'm struggling to handle nested JSON structures correctly. Here's the code I attempted:

```javascript
const fs = require('fs');
const JSONStream = require('JSONStream');

const stream = fs.createReadStream('largefile.json', { encoding: 'utf8' });

stream.pipe(JSONStream.parse('*.items.*'))
  .on('data', (data) => {
    console.log('Parsed item:', data);
  })
  .on('error', (err) => {
    console.error('Error parsing JSON:', err);
  });
```

This approach reduces memory usage, but I ran into issues with the structure of my JSON file. The file looks something like this:

```json
{
  "items": [
    {
      "id": 1,
      "name": "Item 1",
      "details": {
        "description": "A sample item",
        "tags": ["sample", "item"]
      }
    },
    {
      "id": 2,
      "name": "Item 2",
      "details": {
        "description": "Another item",
        "tags": ["example", "item"]
      }
    }
  ]
}
```

The `JSONStream.parse('*.items.*')` selector doesn't seem to work as expected: it isn't capturing the entire object structure. Instead, I'm only getting individual items without their nested properties. I also tried `JSONStream.parse('items.*')`, but that still only returns the top-level item properties. To troubleshoot, I printed out the entire parsed data, and it appears to be missing the nested fields.

How can I parse the entire structure correctly while still keeping memory usage in check? Is there a better way to handle this case?
Any suggestions would be greatly appreciated! For context: I'm using JavaScript on Windows, and this is for a CLI tool running on Ubuntu 22.04.