Memory Limitations in XmlProvider #1501

@ibrahim324

I tried to parse a dump of some Wikipedia pages with XmlProvider, but whether I read the file into a string
or load it from a stream, I get a System.OutOfMemoryException. Is there any guidance or pattern for parsing
large files with type providers? The file is almost exactly 2 GB.

My code:

#r "nuget: FSharp.Data"
open FSharp.Data

open System
open System.IO

// The schema is inferred from a small sample of the dump.
type Wiki = XmlProvider<"""data/wikidata_sample.xml""">


// Attempt 1: read the whole file into one string, then parse it.
let xmlFromFile =
    task {
        let path = "data/wikidata.xml"
        let! text = File.ReadAllTextAsync(path)

        Wiki.Parse(text).Pages
        |> Array.map (fun f -> f.Revision.Text)
        |> Array.iter (fun f -> printfn $"{f}")
    }

// Attempt 2: load from a stream with a small buffer.
let xmlFromStream =
    let options =
        new FileStreamOptions(BufferSize=32)
    use stream = new FileStream("data/wikidata.xml", options)
    stream
    |> Wiki.Load
    |> fun f -> f.Pages
    |> Array.map (fun f -> f.Revision.Text.Value)
    |> Array.iter (fun f -> printfn $"{f}")

xmlFromStream

// xmlFromFile
// |> Async.AwaitTask
// |> Async.RunSynchronously
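
For comparison, here is a rough sketch of one direction that might avoid holding the whole document in memory: walk the file with System.Xml.XmlReader and materialize only one page element at a time via XNode.ReadFrom. This is not an XmlProvider feature. The helper names (streamElements, pageText, printAllTexts) are made up for illustration, and the element names "page" and "text" are assumed from the code above. Whether each XElement can then be wrapped in the provided type (something like Wiki.Page(element)) is an assumption I have not verified, so it only appears as a comment.

```fsharp
open System.Xml
open System.Xml.Linq

/// Lazily yields every element named `localName`, one at a time,
/// so only the current element is held in memory (hypothetical helper).
let streamElements (path: string) (localName: string) : seq<XElement> =
    seq {
        use reader = XmlReader.Create(path)
        reader.MoveToContent() |> ignore
        while not reader.EOF do
            if reader.NodeType = XmlNodeType.Element && reader.LocalName = localName then
                // ReadFrom consumes the whole element and leaves the reader
                // on the node that follows it, so no extra Read() here.
                yield XNode.ReadFrom(reader) :?> XElement
            else
                reader.Read() |> ignore
    }

/// Returns the content of the first descendant named "text", or "" if none.
let pageText (page: XElement) : string =
    page.Descendants()
    |> Seq.tryFind (fun e -> e.Name.LocalName = "text")
    |> Option.map (fun e -> e.Value)
    |> Option.defaultValue ""

/// Prints the text of every page without building an array of all pages.
let printAllTexts (path: string) =
    streamElements path "page"
    // Unverified assumption: the provided type may accept an XElement,
    // e.g. Wiki.Page(element), which would restore typed access here.
    |> Seq.iter (fun page -> printfn "%s" (pageText page))
```

Calling printAllTexts "data/wikidata.xml" would then process the dump page by page. Matching on LocalName ignores the XML namespace, which seemed the simplest choice for a sketch.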
