[generic] utf8 decode before re.match(), for Python 3

Otherwise we raise TypeError: can't use a string pattern on a bytes-like object.

This perhaps argues for putting it in is_html(), which already does this decoding. But of course plain whitespace isn't just HTML, so perhaps is_html() should be renamed? I dunno which is simpler. Let's start with this.
2024-11-22 08:34:32 +01:00 · 2017-03-19 21:52:13 -04:00 · a5d5a2c068
commit a5d5a2c068
parent 00bc75ca01
1 changed file with 1 addition and 1 deletion
--- a/youtube_dl/extractor/generic.py
+++ b/youtube_dl/extractor/generic.py
@@ -1759,7 +1759,7 @@ class GenericIE(InfoExtractor):
            self._sort_formats(info_dict['formats'])
            return info_dict

-        if re.match(r'^\s+$', first_bytes):
+        if re.match(r'^\s+$', first_bytes.decode('utf-8', 'replace')):
            self._downloader.report_warning(
                'First block is just whitespace? Continuing...')
        elif not is_html(first_bytes):
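
A minimal standalone sketch of the failure and the fix, assuming Python 3. The helper name `is_only_whitespace` and the constant `WHITESPACE_ONLY` are hypothetical and are not part of youtube_dl; only the regex and the `decode('utf-8', 'replace')` call are taken from the change above.

```python
import re

# Same pattern as in the diff: one or more whitespace characters, nothing else.
WHITESPACE_ONLY = r'^\s+$'


def is_only_whitespace(first_bytes):
    """Hypothetical helper: True if the bytes decode to whitespace only."""
    # Decode first, as the patched line does. 'replace' turns invalid
    # UTF-8 sequences into U+FFFD instead of raising UnicodeDecodeError.
    text = first_bytes.decode('utf-8', 'replace')
    return re.match(WHITESPACE_ONLY, text) is not None


# The unpatched call: a str pattern applied to a bytes object.
try:
    re.match(WHITESPACE_ONLY, b' \n\t ')
    raised = False
except TypeError:
    # Python 3 refuses to mix str patterns with bytes input.
    raised = True

print(raised)
print(is_only_whitespace(b' \n\t '))
print(is_only_whitespace(b'<html>'))
```

An alternative would be to keep the input as bytes and use a bytes pattern such as `br'^\s+$'`, in which case `\s` matches ASCII whitespace only. The change above decodes instead, which mirrors what the commit message says is_html() already does.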