
[#20113] please consider outputting multi-document yaml

Date: 2016-03-07 07:55
Priority: 3
State: Open
Submitted by: Johannes Schauer (josch)
Assigned to: Pietro Abate (abate)
Summary: please consider outputting multi-document yaml

Detailed description
Currently, it is difficult to consume the large yaml documents that dose3 tools output. For example, with the Python yaml module, parsing a 500 MB yaml document uses 12 GB of memory, even with the CBaseLoader. Documents of this size are nothing unusual when running distcheck on all of Debian unstable, or when checking cross-build satisfiability of source packages with buildcheck (where most source packages fail to satisfy their cross-build dependencies).

Fortunately, the yaml format offers a feature that allows encoding multiple documents in a single stream, using the `---` separator: http://yaml.org/spec/1.2/spec.html#id2760395

So a possible solution to this dilemma would be to let dose3 output one individual yaml document per package. Then yaml parsers would no longer have to parse the whole output at once but could parse each document individually, which should decrease memory usage by multiple orders of magnitude.
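
To illustrate what a consumer of such output could look like, here is a rough sketch in Python. It is not taken from dose3 or any existing tool: the function name `iter_yaml_documents` is made up, and it assumes that every `---` marker sits on a line of its own and that the stream uses no yaml directives or `...` end markers. Only one document is held in memory at a time.

```python
from typing import Iterable, Iterator, List


def iter_yaml_documents(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text of each document in a multi-document yaml stream.

    Assumption: documents are separated by lines consisting only of `---`.
    Empty documents are skipped.
    """
    buffer: List[str] = []
    for line in lines:
        # Indented `---` (e.g. inside a block scalar) does not match here,
        # because only trailing whitespace is stripped.
        if line.rstrip() == "---":
            if buffer:
                yield "".join(buffer)
                buffer = []
        else:
            buffer.append(line)
    # The last document has no marker after it.
    if buffer:
        yield "".join(buffer)
```

Each yielded string could then be handed to a parser on its own, for example `yaml.load(doc, Loader=yaml.CBaseLoader)`. PyYAML's `yaml.load_all` also iterates over the documents of a stream one by one, so an explicit splitter like this may not even be needed.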

I know that botch, rebootstrap and the code generating bootstrap.debian.net would benefit from this. The code generating qa.debian.org/dose would probably also profit.
Message
Date: 2016-04-18 10:25
Sender: Pietro Abate

This is a very good idea. Or maybe, instead of one document per package, n packages per document ... I'll check this out. I guess this is going to break a lot of scripts based on the yaml output.

Field        Old Value  Date              By
assigned_to  none       2016-04-18 10:25  abate