An optimization that prunes nested <g>-tags when they contain exactly
one <g> and nothing else (except whitespace nodes). This looks a bit
like `removeNestedGroups` except it only touches <g> tags without
attributes (but can remove <g>-tags completely from a tree, whereas
this optimization always leaves at least one <g> tag behind).
Closes: #215
Signed-off-by: Niels Thykier <niels@thykier.net>
If someone gave it an ID and we have not stripped it, then it is
probably important and can alter the output somehow if we fiddle merge
it into another node (or discard the <g> node with an ID).
Signed-off-by: Niels Thykier <niels@thykier.net>
Both `mergeSiblingGroupsWithCommonAttributes` and `removeNestedGroups`
used the same code in different forms. Extract it into its own
function.
Signed-off-by: Niels Thykier <niels@thykier.net>
This rename makes py.test/py.test-3 find the test suite out of the
box. Example command lines:
# Running the test suite (optionally include "-v")
$ py.test-3
# Running the test suite with coverage enabled (and branch
# coverage).
$ py.test-3 --cov=scour --cov-report=html --cov-branch
Signed-off-by: Niels Thykier <niels@thykier.net>
In some cases, gnuplot generates a very suboptimal SVG content of the
following pattern:
<g color="black" fill="none" stroke="currentColor">
<path d="m82.5 323.3v-4.1" stroke="#000"/>
</g>
<g color="black" fill="none" stroke="currentColor">
<path d="m116.4 323.3v-4.1" stroke="#000"/>
</g>
... repeated 10+ more times here ...
<g color="black" fill="none" stroke="currentColor">
<path d="m65.4 72.8v250.5h420v-250.5h-420z" stroke="#000"/>
</g>
A more optimal pattern would be:
<g color="black" fill="none" stroke="#000">
<path d="m82.5 323.3v-4.1"/>
<path d="m116.4 323.3v-4.1"/>
... 10+ more paths here ...
<path d="m65.4 72.8v250.5h420v-250.5h-420z"/>
</g>
This patch enables that optimization by handling the merging of two
sibling <g> entries that have identical attributes. In the above
example that does not solve the rewrite from "currentColor" to "#000"
for the stroke attribute. However, the existing code already handles
that automatically after the <g> elements have been merged.
This change provides comparable results to --create-groups as shown by
the following diagram while being a distinct optimization:
+----------------------------+-------+--------+
| Test | Size | in % |
+----------------------------+-------+--------+
| baseline | 17961 | 100% |
| baseline + --create-groups | 17418 | 97.0% |
| patched | 16939 | 94.3% |
| patched + --create-groups | 16855 | 93.8% |
+----------------------------+-------+--------+
The image used in the size table above was generated based on the
instructions from https://bugs.debian.org/858039#10 with gnuplot 5.2
patchlevel 2. Beyond the test-based "--create-groups", the following
scour command-line parameters were used:
--enable-id-stripping --enable-comment-stripping \
--shorten-ids --indent=none
Note that the baseline was scour'ed repeatedly to stablize the image
size.
Signed-off-by: Niels Thykier <niels@thykier.net>
Follow the spec "blindly" as it turns out covering all the border
and getting reasonably styled output is just to cumbersome.
This way at least scour output is consistent and it also saves us
some bytes (a lot in some cases as we do not indent <tspan>s etc.
anymore)
SVG specifies special logic for handling whitespace, see
https://www.w3.org/TR/SVG/text.html#WhiteSpace
by implementing it we can even shave off some unneeded bytes here
and there (e.g. consecutive spaces).
Unfortunately handling of newlines by renderers is inconsistent:
Sometimes they are replaced by a single space, sometimes they
are removed in the output.
As we can not know the expected behavior work around this by keeping
newlines inside text content elements intact.
Fixes#160.
Previously we added way to many and removed empty lines afterwards
(potentially destructive if xml:space="preserve")
Also adds proper indentation for comment nodes
Work around an exception in removeDefaultAttributeValue() caused by some rarely used filter attributes that allow an optional second value which SVGLength does not handle properly
The scale function on the transform attribute has a short form, where
only the first argument is used. But optimizeTransform would always
assume that there were two when checking for the identity scale.
Closes: #190
Signed-off-by: Niels Thykier <niels@thykier.net>
Scour tried to handle "order" attribute as a SVGLength. However, the
"order" attribute *can* consist of two integers according to the
[SVG 1.1 Specification] and SVGLength is not designed to handle that.
With this change, we now pretend that "order" is a string, which side
steps this issue.
[SVG 1.1 Specification]: https://www.w3.org/TR/SVG11/single-page.html#filters-feConvolveMatrixElementOrderAttributeCloses: #189
Signed-off-by: Niels Thykier <niels@thykier.net>
This change pushes the responsibility of updating referencedIDs to its
callers where needed. The only caller of renameIDs is shortenIDs and
that works perfectly fine without updating its copy of referencedIDs.
In shortenIDs, we need to be able to lookup which nodes referenced the
"original ID" (and not the "new ID").
While shortenIDs *could* update referencedIDs so it remained valid, it
is extra complexity for no gain. As an example of this complexity,
imagine if two or more IDs are "rotated" like so:
Original IDs: a, bb, ccc, dddd
Mapping:
dddd -> ccc
ccc -> bb
bb -> a
a -> dddd
While doable within reasonable performance, we do not need to support
it at the moment, so there is no reason to handle that complexity.
Signed-off-by: Niels Thykier <niels@thykier.net>
With the current code, scour could do a pointless remap of an ID,
where there is no benefit in it. Consider:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<defs>
<rect id="a" width="80" height="50" fill="red"/>
<rect id="b" width="80" height="50" fill="blue"/>
</defs>
<use xlink:href="#a"/>
<use xlink:href="#b"/>
<use xlink:href="#b"/>
</svg>
```
In this example, there is no point in swapping the IDs - even if "#b"
is used more often than "#a", they have the same length. Besides a
performance win on an already scour'ed image, it also mean scour will
behave like a function with a fixed-point (i.e. scour eventually stops
altering the image).
To solve this, we no longer check whether an we find exactly the same
ID. Instead, we look at the length of the new ID compared to the
original. This gives us a slight complication as we can now "reserve"
a "future" ID to avoid the rename.
Thanks to Eduard "Ede_123" Braun for providing the test case.
Signed-off-by: Niels Thykier <niels@thykier.net>
Avoid looping over DefaultAttribute(s) that are not relevant for a
given node. This skips a lot of calls to removeDefaultAttributeValue
but more importantly, it avoids "node.nodeName not in attribute.elements"
line in removeDefaultAttributeValue. As attribute.elements is a list, this
becomes expensive for "larger lists" (or in this case when there are a lot
of attributes).
This seems to remove about 1½-2 minutes of runtime (out of ~8) on the
1_42_polytope_7-cube.svg test case provided in #184.
Signed-off-by: Niels Thykier <niels@thykier.net>
There are a lot of "DefaultAttribute"s and for a given tag, most of
the "DefaultAttribute"s are not applicable. Therefore, we create two
data structures to assist us with only dealing with the attributes
that matter.
Here there are two cases:
* Those that always matter. These go into
default_attributes_unrestricted list.
* Those that matter only based on the node name. These go into the
default_attributes_restricted_by_tag with the node name as key
(with the value being a list of matching attributes).
In the next commit, we will use those for optimizing the removal of
default attributes.
Signed-off-by: Niels Thykier <niels@thykier.net>
This patch enables shortenIDs to remap IDs currently in use. This is
very helpful to ensure that scour does not change been "optimal" and
"suboptimal" choices for IDs as observed in GH#186.
Closes: #186
Signed-off-by: Niels Thykier <niels@thykier.net>
* properly parse paths without space after boolean flags (fixes#161)
* omit space after boolean flag to shave off a few bytes when not using renderer workarounds
In python2.7 and python3.3, time.time() is sufficient accurate for our
purpose and avoids going through hoops to select the best available
function.
Signed-off-by: Niels Thykier <niels@thykier.net>
This basic implementation can drop and rewrite some cases of "m0 0"
and "z" without triggering the issues experienced in #163. It works
by analysing the path backwards and tracking "z" and "m" commands.
Signed-off-by: Niels Thykier <niels@thykier.net>
The "m0 0" rewrite gets some cases wrong, like:
m150 240h200m0 0 150 150v-300z
Scour rewrote that into the following
m150 240h200l150 150v-300z
However, these two paths do not produce an identical figure at all.
The first is a line followed by a triangle while the second is a
quadrilateral.
While there are some instances we can rewrite (that scour will no
longer rewrite), these will require an analysis over multiple commands
to determine whether the rewrite is safe. This will reappear in the
next commit.
Closes: #163
Signed-off-by: Niels Thykier <niels@thykier.net>