#3714 RawHeader.get can return also string lists
Closed 2 years ago by tkopecek. Opened 2 years ago by tkopecek.
tkopecek/koji issue3713  into  master

file modified
+27 -2
@@ -754,7 +754,12 @@ 

              elif dtype == 6:

                  # string (null terminated)

                  end = self.header.find(six.b('\0'), pos)

-                 print("String(%d): %r" % (end - pos, self.header[pos:end]))

+                 try:

+                     print("String(%d): %r" % (end - pos, _decode_item(self.header[pos:end])))

+                 except ValueError:

+                     print('INVALID STRING')

+                     print("String(%d): %r" % (end - pos, self.header[pos:end]))

+                     raise

                  next = end + 1

              elif dtype == 7:

                  print("Data: %s" % hex_string(self.header[pos:pos + count]))
@@ -770,7 +775,11 @@ 

                  # unicode string array

                  for i in range(count):

                      end = self.header.find(six.b('\0'), pos)

-                     print("i18n(%d): %r" % (end - pos, self.header[pos:end]))

+                     try:

+                         print("i18n(%d): %r" % (end - pos, _decode_item(self.header[pos:end])))

+                     except Exception:

+                         print('INVALID STRING')

+                         print("i18n(%d): %r" % (end - pos, self.header[pos:end]))

                      pos = end + 1

                  next = pos

              else:
@@ -807,6 +816,22 @@ 

          elif dtype == 7:

              # raw data

              return self.header[pos:pos + count]

+         elif dtype == 8:

+             # string array

+             result = []

+             for _ in range(count):

+                 end = self.header.find(six.b('\0'), pos)

+                 result.append(self.header[pos:end])

+                 pos = end + 1

+             return result

+         elif dtype == 9:

+             # unicode string array

+             result = []

+             for _ in range(count):

+                 end = self.header.find(six.b('\0'), pos)

+                 result.append(_decode_item(self.header[pos:end]))

+                 pos = end + 1

+             return result

          else:

              # XXX - not all valid data types are handled

              raise GenericError("Unable to read header data type: %x" % dtype)

Maybe also dtype=9?

https://pagure.io/fork/mikem/koji/commits/pr3714updates#

This should cover all current tag types.

Also kinda wondering if we should also decode in get() for dtype=6, but that would be an api change.

I'm not sure here. But if we do that, dtype=8 need also decoding.

rebased onto 710654d

2 years ago

@mikem Is there a reason to not reraise exception in dump/9?

Is there a reason to not reraise exception in dump/9?

The goal of dump is just to report what it finds. Bad content in an rpm header shouldn't be an error, just a fact to report the details about.

But if we do that, dtype=8 need also decoding.

An old comment had me misinterpreting type 9. The difference between type 8 and 9 isn't unicode or not, but internationalization. When rpm reads an i18n entry, it checks RPMTAG_HEADERI18NTABLE and the current locale for translations (unless the HEADERGET_RAW flag is used). That behavior is way too much for this class to bother with, so I think we just emulate the raw behavior (which treats type 8 and 9 essentially the same).

The python bindings from rpmlib itself return all these header types (6, 8, and 9) as strings, and in casual experiments rpmbuild errors if you try to make an rpm with invalid unicode in such fields.

RawHeader was originally written for python2 and didn't have to worry about the distinction. We didn't add any decoding to _getitem in the py3 port, so now we have an established history of returning bytes.

Added some changes here that make the decoding optional and more systematic. Also some other fixes.

This is inflating a bit and it might make sense to combine with my other PR (#3703). Wouldn't hurt to add some unit tests too.

Is #3713 causing us a problem somewhere? If so we could probably just go with your original commit for the sake of getting a fix.

One thing I'm still puzzling about is that all the actual examples of dtype=9 I've seen are count=1 and rpmlib seems to return then as single strings (e.g. summary, description) rather than a list.

I've combined these changes and my updates into #3703

It seems that most of the changes from here were added to #3712 inadvertently. Only the "flake8 fix" commit is left if this PR is rebased (and make flake8 seems fine without it).

Pull-Request has been closed by tkopecek

2 years ago
Metadata