shortenIDs: Avoid pointless renames of IDs

With the current code, scour could do a pointless remap of an ID,
where there is no benefit in it.  Consider:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
 <defs>
  <rect id="a" width="80" height="50" fill="red"/>
  <rect id="b" width="80" height="50" fill="blue"/>
 </defs>
 <use xlink:href="#a"/>
 <use xlink:href="#b"/>
 <use xlink:href="#b"/>
</svg>
```

In this example, there is no point in swapping the IDs - even if "#b"
is used more often than "#a", they have the same length.  Besides a
performance win on an already scour'ed image, it also mean scour will
behave like a function with a fixed-point (i.e. scour eventually stops
altering the image).

To solve this, we no longer check whether an we find exactly the same
ID.  Instead, we look at the length of the new ID compared to the
original.  This gives us a slight complication as we can now "reserve"
a "future" ID to avoid the rename.

Thanks to Eduard "Ede_123" Braun for providing the test case.

Signed-off-by: Niels Thykier <niels@thykier.net>
This commit is contained in:
Niels Thykier 2018-04-15 16:29:44 +00:00
parent 00cf42b554
commit d6406a3470
3 changed files with 43 additions and 5 deletions

View file

@ -715,22 +715,36 @@ def shortenIDs(doc, prefix, options):
idList.extend([rid for rid in identifiedElements if rid not in idList])
# Ensure we do not reuse a protected ID by accident
protectedIDs = protected_ids(identifiedElements, options)
# IDs that we have used "ahead of time". Happens if we are about to
# rename an ID and there is no length difference between the new and
# the old ID.
consumedIDs = set()
curIdNum = 1
for rid in idList:
curId = intToID(curIdNum, prefix)
# Skip ahead if the new ID is in use and protected.
while curId in protectedIDs:
# Skip ahead if the new ID has already been used or is protected.
while curId in protectedIDs or curId in consumedIDs:
curIdNum += 1
curId = intToID(curIdNum, prefix)
# Now check if we found a new ID (we can happen to choose the same
# ID. More likely on a rerun)
if curId != rid:
# Avoid checking the ID if it will not affect the size of the document
# (e.g. remapping "c" to "a" is not going to win us anything)
if len(curId) != len(rid):
# Then go rename it.
num += renameID(doc, rid, curId, identifiedElements, referencedIDs)
elif curId < rid:
# If we skip reassigning an ID because it has the same length
# (E.g. skipping "c" -> "a"), then we have to mark the "future"
# ID as taken ("c" in the example).
# The "strictly less than" in the condition is to ensure that we
# properly update curIdNum in the corner case where curId == rid.
consumedIDs.add(rid)
# Use continue here without updating curIdNum to avoid losing
# the current ID.
continue
curIdNum += 1
return num

View file

@ -1948,6 +1948,19 @@ class ShortenIDsOption(unittest.TestCase):
'Did not update reference to shortened ID')
class ShortenIDsStableOutput(unittest.TestCase):
def runTest(self):
doc = scourXmlFile('unittests/shorten-ids-stable-output.svg',
parse_args(['--shorten-ids']))
use_tags = doc.getElementsByTagName('use')
hrefs_ordered = [x.getAttributeNS('http://www.w3.org/1999/xlink', 'href')
for x in use_tags]
expected = ['#a', '#b', '#b']
self.assertEquals(hrefs_ordered, expected,
'--shorten-ids pointlessly reassigned ids')
class MustKeepGInSwitch(unittest.TestCase):
def runTest(self):

View file

@ -0,0 +1,11 @@
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<defs>
<rect id="a" width="80" height="50" fill="red"/>
<rect id="b" width="80" height="50" fill="blue"/>
</defs>
<use xlink:href="#a"/>
<use xlink:href="#b"/>
<use xlink:href="#b"/>
</svg>

After

Width:  |  Height:  |  Size: 323 B