[generic] Allow parsing when first 512 bytes are whitespace

is_html(first_bytes) returns false if the first 512 bytes of the
response are all whitespace. Such a response is probably not a direct
video link (the case where we are concerned about inadvertently
downloading the whole file), since a block of leading whitespace
would not be a valid binary video file format.

But it's still peculiar, so don't silently ignore it -- print a
warning and continue on.
John Hawkinson 2017-03-19 21:01:47 -04:00
parent 6206194c5a
commit 00bc75ca01
1 changed file with 6 additions and 3 deletions

@@ -1759,9 +1759,12 @@ class GenericIE(InfoExtractor):
             self._sort_formats(info_dict['formats'])
             return info_dict
 
-        # Maybe it's a direct link to a video?
-        # Be careful not to download the whole thing!
-        if not is_html(first_bytes):
+        if re.match(r'^\s+$', first_bytes):
+            self._downloader.report_warning(
+                'First block is just whitespace? Continuing...')
+        elif not is_html(first_bytes):
+            # Maybe it's a direct link to a video?
+            # Be careful not to download the whole thing!
             self._downloader.report_warning(
                 'URL could be a direct video link, returning it as such.')
             info_dict.update({
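
The three-way decision in this hunk can be tried in isolation with the sketch below. It is only an illustration: `looks_like_html` is a hypothetical stand-in that assumes the HTML check means "optional whitespace followed by `<`", and `classify_first_block` is an invented name, not part of the extractor. The sketch uses a bytes pattern for the whitespace test because, on Python 3, `re` raises a `TypeError` when a str pattern is matched against a bytes object.

```python
import re


def looks_like_html(first_bytes):
    """Hypothetical stand-in for is_html(): treat the block as HTML
    only if optional whitespace is followed by '<'. This is an
    assumption for illustration, not the real helper."""
    text = first_bytes.decode('utf-8', 'replace')
    return re.match(r'^\s*<', text) is not None


def classify_first_block(first_bytes):
    """Return 'whitespace', 'html', or 'direct' for the first block."""
    # Bytes pattern: Python 3 refuses to match a str pattern against bytes.
    if re.match(br'^\s+$', first_bytes):
        return 'whitespace'  # peculiar: warn, then keep parsing as a page
    if looks_like_html(first_bytes):
        return 'html'        # ordinary page: keep parsing
    return 'direct'          # possibly a direct video link


if __name__ == '__main__':
    for block in (b' ' * 512, b'  \n<html>', b'\x00\x00\x00\x18ftypmp42'):
        print(repr(block[:12]), classify_first_block(block))
```

Under this assumption an all-whitespace block is not HTML-like (no `<` is ever found), which is why the whitespace test has to come first.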