Scour tried to handle the "order" attribute as an SVGLength. However, the
"order" attribute *can* consist of two integers according to the
[SVG 1.1 Specification] and SVGLength is not designed to handle that.
With this change, we now pretend that "order" is a string, which
sidesteps this issue.
[SVG 1.1 Specification]: https://www.w3.org/TR/SVG11/single-page.html#filters-feConvolveMatrixElementOrderAttribute
Closes: #189
Signed-off-by: Niels Thykier <niels@thykier.net>
Avoid looping over DefaultAttribute(s) that are not relevant for a
given node. This skips a lot of calls to removeDefaultAttributeValue
but, more importantly, it avoids the "node.nodeName not in
attribute.elements" check in removeDefaultAttributeValue. As
attribute.elements is a list, this check is a linear scan that becomes
expensive for "larger lists" (or, in this case, when there are a lot
of attributes).
This seems to remove about 1½-2 minutes of runtime (out of ~8) on the
1_42_polytope_7-cube.svg test case provided in #184.
Signed-off-by: Niels Thykier <niels@thykier.net>
There are a lot of "DefaultAttribute"s and for a given tag, most of
the "DefaultAttribute"s are not applicable. Therefore, we create two
data structures to assist us with only dealing with the attributes
that matter.
Here there are two cases:
* Those that always matter. These go into the
default_attributes_unrestricted list.
* Those that matter only based on the node name. These go into the
default_attributes_restricted_by_tag with the node name as key
(with the value being a list of matching attributes).
In the next commit, we will use those for optimizing the removal of
default attributes.
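
A minimal sketch of how such an index could be built. It assumes a
simplified, hypothetical DefaultAttribute tuple in which "elements" is
None for attributes that apply to every tag; the helper names
build_indexes and relevant_attributes are illustrative and not taken
from scour:

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for scour's DefaultAttribute.
# "elements" is None when the attribute applies to every tag.
DefaultAttribute = namedtuple('DefaultAttribute', ['name', 'value', 'elements'])


def build_indexes(default_attributes):
    """Split the attributes into an unrestricted list and a per-tag dict."""
    unrestricted = []
    restricted_by_tag = {}
    for attribute in default_attributes:
        if attribute.elements is None:
            unrestricted.append(attribute)
        else:
            for tag in attribute.elements:
                restricted_by_tag.setdefault(tag, []).append(attribute)
    return unrestricted, restricted_by_tag


def relevant_attributes(tag, unrestricted, restricted_by_tag):
    """Return only the attributes that can matter for the given tag."""
    return unrestricted + restricted_by_tag.get(tag, [])
```

With this layout, looking up the candidates for a node is one dict
lookup by node name instead of a scan over every known attribute.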
Signed-off-by: Niels Thykier <niels@thykier.net>
* properly parse paths without space after boolean flags (fixes #161)
* omit space after boolean flag to shave off a few bytes when not using renderer workarounds
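
To illustrate why the flags need special treatment: in an elliptical
arc command the two boolean flags are single characters ("0" or "1"),
so "011" can mean flag 0, flag 1 and the number 1. The following is a
rough sketch of that idea only, not scour's actual parser; the function
name parse_arc_arguments and the regular expressions are assumptions
made for this example:

```python
import re

# Assumed patterns for this sketch (not taken from scour).
NUMBER = re.compile(r'[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?')
SEPARATORS = re.compile(r'[\s,]*')


def parse_arc_arguments(text):
    """Parse the arguments of an arc command into a flat list.

    Arguments come in groups of seven: rx ry rotation large-arc sweep x y.
    The flags (positions 3 and 4 in each group) are read as one character.
    """
    values = []
    pos = 0
    index = 0
    while True:
        pos = SEPARATORS.match(text, pos).end()
        if pos >= len(text):
            break
        if index % 7 in (3, 4):
            flag = text[pos]
            if flag not in '01':
                raise ValueError('invalid flag %r at offset %d' % (flag, pos))
            values.append(int(flag))
            pos += 1
        else:
            match = NUMBER.match(text, pos)
            if match is None:
                raise ValueError('invalid number at offset %d' % pos)
            values.append(float(match.group()))
            pos = match.end()
        index += 1
    return values
```

For example, parse_arc_arguments("1 1 0 011 1") reads the flags as 0
and 1 and the remaining "1 1" as the end point.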
In python2.7 and python3.3, time.time() is sufficiently accurate for our
purpose and avoids jumping through hoops to select the best available
function.
Signed-off-by: Niels Thykier <niels@thykier.net>
The original implementation of removeDuplicateGradient does an O(n²)
search over all gradients to remove duplicates. In images with many
gradients (such as [MediaWiki_logo_1.svg]), this becomes a significant
overhead as that logo has over 900 duplicated gradients.
We solve this by creating a key for each gradient based on the
attributes we use for duplication detection. This key is generated
such that if two gradients have the same key, they are duplicates (for
our purpose) and if the keys are different, then the gradients are
guaranteed to be different as well. With such a key, we can rely on a
dict to handle the duplication detection (which it does very well).
This change improves the runtime performance on [MediaWiki_logo_1.svg]
by about 25% (8m51s -> 1m56s on 5 runs).
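
The key-based approach can be sketched as follows. This is only an
illustration under simplifying assumptions: gradients are modelled as
plain dicts rather than DOM nodes, and KEY_ATTRIBUTES, gradient_key and
find_duplicates are hypothetical names, not the ones used in scour:

```python
# Hypothetical set of attributes used for duplicate detection.
# The "id" is deliberately excluded, as it differs between duplicates.
KEY_ATTRIBUTES = ('x1', 'y1', 'x2', 'y2', 'gradientUnits', 'stops')


def gradient_key(gradient):
    """Build a hashable key; equal keys mean "duplicate for our purpose"."""
    return tuple(gradient.get(name, '') for name in KEY_ATTRIBUTES)


def find_duplicates(gradients):
    """Map the id of the first gradient seen to the ids of its duplicates."""
    first_seen = {}
    duplicates = {}
    for gradient in gradients:
        # One dict lookup per gradient instead of comparing all pairs.
        master = first_seen.setdefault(gradient_key(gradient), gradient)
        if master is not gradient:
            duplicates.setdefault(master['id'], []).append(gradient['id'])
    return duplicates
```

Each gradient is visited once, so the work grows linearly with the
number of gradients instead of quadratically.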
Signed-off-by: Niels Thykier <niels@thykier.net>
The unprotected_ids function returns all unprotected ids and
removeUnreferencedIDs removes all of them that do not appear in the
return value of findReferencedElements.
On closer observation it turns out that removeUnreferencedIDs cannot
cause nodes/IDs to become unprotected or unreferenced (as it only
removes the "id" attribute, not the node). With this in mind, we can
just remove the loop and save a call to all of these functions.
Signed-off-by: Niels Thykier <niels@thykier.net>
The automated python2 -> python3 converter creates some suboptimal
code patterns in some cases, notably in its handling of dicts.
This commit handles the following cases:
* "if x in list(y.keys()):" => "if x in y:"
The original code neuters the O(1) lookup efficiency of a dict
by turning it into a list. This incurs an O(n) cost for converting it
to a list and then another O(n) for the lookup. When done in a loop,
this becomes O(n * m) rather than the optimal O(m).
* "for x in list(y.keys()):" => "for x in y:" OR "for x in list(y):"
A dict (y in these cases) iterates over its keys by default. This
makes the entire "list(y.keys())" dance redundant _in most cases_.
In some cases, scour modifies the dict while iterating over it and
in those cases, we need a "list(y)" (but not a "y.keys()").
The benefit of this differs between python2 and python3. In
python3, we basically "only" avoid a function call. In python2,
y.keys() generates a list, so here we avoid generating a
"throw-away list".
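
A small self-contained illustration of the two patterns (the function
names and sample data are made up for this example):

```python
def count_known(words, known):
    """Count the words that are keys of the dict "known".

    "word in known" is a hash lookup; "word in list(known.keys())" would
    build a list and scan it for every single word.
    """
    return sum(1 for word in words if word in known)


def drop_unused(usage):
    """Delete entries with a zero count while iterating.

    Iterating over "usage" directly would fail here, because the dict
    changes size during iteration; "list(usage)" takes a snapshot of the
    keys first.
    """
    for key in list(usage):
        if usage[key] == 0:
            del usage[key]
    return usage
```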
The test suite succeeds both with "python testscour.py" and "python3
testscour.py" (used 2.7.14+ and 3.6.4 from Debian testing).
On a 341kB flame-graph generated by "nytprof" (a perl profiler), this
commit changes the runtimes of scour from the range 3.39s - 3.45s to
3.27s - 3.35s making it roughly 3% faster in this case (YMMV,
particularly with different input). The timings were recorded using
the following command line:
time PYTHONPATH=. python3 -m scour.scour --enable-id-stripping \
--shorten-ids --indent=none --enable-comment-stripping \
-i input.svg -o output.svg
This was run 5 times with and 5 times without the patch, picking the
worst and best time to define the range. The runtime test was only
performed on python3.
All changed lines were found with:
grep -rE ' in list[(].*[.]keys[(][)][)]:'
Signed-off-by: Niels Thykier <niels@thykier.net>
It was a dict with a two-element list a la:
{
"id1": [len(nodeListX), nodeListX],
"id2": [len(nodeListY), nodeListY],
...
}
This can trivially be simplified to:
{
"id1": nodeListX,
"id2": nodeListY,
...
}
The two call sites that actually need the length (e.g. to sort by how
often the id is used) can trivially compute that via a call to "len".
All other call sites either just need to tell if an ID is used at all
or work with the nodes referencing the id (e.g. to remap the id). The
former are unaffected by this change and the latter can now avoid a
layer of indirection.
This refactoring has negligible changes to the runtime and probably also
to memory (not tested, but it is a minor constant improvement per
referenced id).
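
As an illustration of the simplified structure, sorting ids by how
often they are referenced only needs "len" on the node list. The helper
name ids_by_usage is hypothetical and the node lists are stand-ins for
lists of DOM nodes:

```python
def ids_by_usage(referenced):
    """Return ids ordered from most to least referenced.

    "referenced" maps an id to the list of nodes that reference it.
    """
    return sorted(referenced, key=lambda id_: len(referenced[id_]), reverse=True)


def is_referenced(referenced, id_):
    """Tell whether an id is used at all; unaffected by the value layout."""
    return id_ in referenced
```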
Signed-off-by: Niels Thykier <niels@thykier.net>
The removeUnusedDefs function does not actually remove anything (that
is left for its callers to do). This implies that
findReferencedElements will return the same value before, during and
after a call to removeUnusedDefs. Therefore, we can reuse the value
from findReferencedElements when recursing into child nodes.
Signed-off-by: Niels Thykier <niels@thykier.net>
Split the handling of referencingProps into a separate loop that calls
findReferencingProperty directly. This saves a bunch of "make list,
join list, append to another list and eventually split text into two
elements" operations.
This gives approximately 10% faster runtimes on a 341 kB flame graph
generated by the "nytprof" Perl profiler.
Signed-off-by: Niels Thykier <niels@thykier.net>
The bare "except" also catches exceptions like "NameError" and
"SystemExit", which we really should not catch. In scour.py, use the
most specific exception (NotFoundErr) and in the tests just catch any
"regular" exception.
Reported by flake8.
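
A minimal sketch of the narrower handler, using xml.dom.minidom. The
helper name remove_attribute_quietly is made up for this example and is
not a function in scour:

```python
from xml.dom import NotFoundErr
from xml.dom.minidom import parseString


def remove_attribute_quietly(node, name):
    """Remove an attribute, ignoring only the "attribute not found" case.

    minidom's removeAttribute may raise NotFoundErr when the attribute
    is missing. Anything else (NameError, SystemExit, ...) propagates,
    which a bare "except:" would have swallowed.
    """
    try:
        node.removeAttribute(name)
    except NotFoundErr:
        pass
```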
Signed-off-by: Niels Thykier <niels@thykier.net>
There has been a minor rearrangement of the code that handles the children
of the element being serialized: The relevant `if' statement has had its
condition effectively negated and thus has also had its consequent and
alternative swapped; now, there is a very short consequent, followed by a
very long alternative, rather than a very long consequent followed by a
very short alternative.
* Do not collapse straight path segments in paths that have intermediate markers (see #145). The intermediate nodes might be unnecessary for the shape of the path, but their markers would be lost.
* Collapse subpaths of moveto `m` and lineto `l` commands if they have the same direction (before we only collapsed horizontal/vertical `h`/`v` lineto commands)
* Attempt to collapse lineto `l` commands into a preceding moveto `m` command (these are then called "implicit lineto commands")
* Preserve empty path segments if they have `stroke-linecap` set to `round` or `square`. They render no visible line but a tiny dot or square.
Third-party applications obviously cannot handle additional output on stdout, nor can they be expected to do any weird stdout/stderr redirection as we do via `options.stdout`.
We probably shouldn't print anything in `scourString()` to start with unless we offer an option to disable all non-SVG output for third-party libraries to use.
- prevent '--set-precision=0' by requiring >=1
- warn user if '--set-c-precision' > '--set-precision' instead of silently ignoring the value
- some code cleanup