1
0
mirror of https://codeberg.org/polarisfm/youtube-dl synced 2024-11-26 02:14:32 +01:00

Merge remote-tracking branch 'upstream/master' into channel-verified

This commit is contained in:
Andrew Udvare 2019-01-27 21:39:52 -05:00
commit 733a0a0b18
No known key found for this signature in database
GPG Key ID: 1AFD9AFC120C26DD
55 changed files with 1941 additions and 591 deletions

View File

@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2019.01.02*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2019.01.27*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2019.01.02** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2019.01.27**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -36,7 +36,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2019.01.02 [debug] youtube-dl version 2019.01.27
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@ -339,7 +339,7 @@ Incorrect:
'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4' 'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
``` ```
### Use safe conversion functions ### Use convenience conversion and parsing functions
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well. Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
@ -347,6 +347,8 @@ Use `url_or_none` for safe URL processing.
Use `try_get` for safe metadata extraction from parsed JSON. Use `try_get` for safe metadata extraction from parsed JSON.
Use `unified_strdate` for uniform `upload_date` or any `YYYYMMDD` meta field extraction, `unified_timestamp` for uniform `timestamp` extraction, `parse_filesize` for `filesize` extraction, `parse_count` for count meta fields extraction, `parse_resolution`, `parse_duration` for `duration` extraction, `parse_age_limit` for `age_limit` extraction.
Explore [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions. Explore [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions.
#### More examples #### More examples

105
ChangeLog
View File

@ -1,3 +1,108 @@
version 2019.01.27
Core
+ [extractor/common] Extract season in _json_ld
* [postprocessor/ffmpeg] Fallback to ffmpeg/avconv for audio codec detection
(#681)
Extractors
* [vice] Fix extraction for locked videos (#16248)
+ [wakanim] Detect DRM protected videos
+ [wakanim] Add support for wakanim.tv (#14374)
* [usatoday] Fix extraction for videos with custom brightcove partner id
(#18990)
* [drtv] Fix extraction (#18989)
* [nhk] Extend URL regular expression (#18968)
* [go] Fix Adobe Pass requests for Disney Now (#18901)
+ [openload] Add support for oload.club (#18969)
version 2019.01.24
Core
* [YoutubeDL] Fix negation for string operators in format selection (#18961)
version 2019.01.23
Core
* [utils] Fix urljoin for paths with non-http(s) schemes
* [extractor/common] Improve jwplayer relative URL handling (#18892)
+ [YoutubeDL] Add negation support for string comparisons in format selection
expressions (#18600, #18805)
* [extractor/common] Improve HLS video-only format detection (#18923)
Extractors
* [crunchyroll] Extend URL regular expression (#18955)
* [pornhub] Bypass scrape detection (#4822, #5930, #7074, #10175, #12722,
#17197, #18338 #18842, #18899)
+ [vrv] Add support for authentication (#14307)
* [videomore:season] Fix extraction
* [videomore] Improve extraction (#18908)
+ [tnaflix] Pass Referer in metadata request (#18925)
* [radiocanada] Relax DRM check (#18608, #18609)
* [vimeo] Fix video password verification for videos protected by
Referer HTTP header
+ [hketv] Add support for hkedcity.net (#18696)
+ [streamango] Add support for fruithosts.net (#18710)
+ [instagram] Add support for tags (#18757)
+ [odnoklassniki] Detect paid videos (#18876)
* [ted] Correct acodec for HTTP formats (#18923)
* [cartoonnetwork] Fix extraction (#15664, #17224)
* [vimeo] Fix extraction for password protected player URLs (#18889)
version 2019.01.17
Extractors
* [youtube] Extend JS player signature function name regular expressions
(#18890, #18891, #18893)
version 2019.01.16
Core
+ [test/helper] Add support for maxcount and count collection len checkers
* [downloader/hls] Fix uplynk ad skipping (#18824)
* [postprocessor/ffmpeg] Improve ffmpeg version parsing (#18813)
Extractors
* [youtube] Skip unsupported adaptive stream type (#18804)
+ [youtube] Extract DASH formats from player response (#18804)
* [funimation] Fix extraction (#14089)
* [skylinewebcams] Fix extraction (#18853)
+ [curiositystream] Add support for non app URLs
+ [bitchute] Check formats (#18833)
* [wistia] Extend URL regular expression (#18823)
+ [playplustv] Add support for playplus.com (#18789)
version 2019.01.10
Core
* [extractor/common] Use episode name as title in _json_ld
+ [extractor/common] Add support for movies in _json_ld
* [postprocessor/ffmpeg] Embed subtitles with non-standard language codes
(#18765)
+ [utils] Add language codes replaced in 1989 revision of ISO 639
to ISO639Utils (#18765)
Extractors
* [youtube] Extract live HLS URL from player response (#18799)
+ [outsidetv] Add support for outsidetv.com (#18774)
* [jwplatform] Use JW Platform Delivery API V2 and add support for more URLs
+ [fox] Add support National Geographic (#17985, #15333, #14698)
+ [playplustv] Add support for playplus.tv (#18789)
* [globo] Set GLBID cookie manually (#17346)
+ [gaia] Add support for gaia.com (#14605)
* [youporn] Fix title and description extraction (#18748)
+ [hungama] Add support for hungama.com (#17402, #18771)
* [dtube] Fix extraction (#18741)
* [tvnow] Fix and rework extractors and prepare for a switch to the new API
(#17245, #18499)
* [carambatv:page] Fix extraction (#18739)
version 2019.01.02 version 2019.01.02
Extractors Extractors

View File

@ -496,7 +496,7 @@ The `-o` option allows users to indicate a template for the output file names.
**tl;dr:** [navigate me to examples](#output-template-examples). **tl;dr:** [navigate me to examples](#output-template-examples).
The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "https://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [python string formatting operations](https://docs.python.org/2/library/stdtypes.html#string-formatting). For example, `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by a formatting operations. Allowed names along with sequence type are: The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "https://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [python string formatting operations](https://docs.python.org/2/library/stdtypes.html#string-formatting). For example, `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by formatting operations. Allowed names along with sequence type are:
- `id` (string): Video identifier - `id` (string): Video identifier
- `title` (string): Video title - `title` (string): Video title
@ -667,7 +667,7 @@ The following numeric meta fields can be used with comparisons `<`, `<=`, `>`, `
- `asr`: Audio sampling rate in Hertz - `asr`: Audio sampling rate in Hertz
- `fps`: Frame rate - `fps`: Frame rate
Also filtering work for comparisons `=` (equals), `!=` (not equals), `^=` (begins with), `$=` (ends with), `*=` (contains) and following string meta fields: Also filtering work for comparisons `=` (equals), `^=` (starts with), `$=` (ends with), `*=` (contains) and following string meta fields:
- `ext`: File extension - `ext`: File extension
- `acodec`: Name of the audio codec in use - `acodec`: Name of the audio codec in use
- `vcodec`: Name of the video codec in use - `vcodec`: Name of the video codec in use
@ -675,6 +675,8 @@ Also filtering work for comparisons `=` (equals), `!=` (not equals), `^=` (begin
- `protocol`: The protocol that will be used for the actual download, lower-case (`http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `mms`, `f4m`, `ism`, `http_dash_segments`, `m3u8`, or `m3u8_native`) - `protocol`: The protocol that will be used for the actual download, lower-case (`http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `mms`, `f4m`, `ism`, `http_dash_segments`, `m3u8`, or `m3u8_native`)
- `format_id`: A short description of the format - `format_id`: A short description of the format
Any string comparison may be prefixed with negation `!` in order to produce an opposite comparison, e.g. `!*=` (does not contain).
Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the video hoster. Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the video hoster.
Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s. Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s.
@ -1211,7 +1213,7 @@ Incorrect:
'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4' 'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
``` ```
### Use safe conversion functions ### Use convenience conversion and parsing functions
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well. Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
@ -1219,6 +1221,8 @@ Use `url_or_none` for safe URL processing.
Use `try_get` for safe metadata extraction from parsed JSON. Use `try_get` for safe metadata extraction from parsed JSON.
Use `unified_strdate` for uniform `upload_date` or any `YYYYMMDD` meta field extraction, `unified_timestamp` for uniform `timestamp` extraction, `parse_filesize` for `filesize` extraction, `parse_count` for count meta fields extraction, `parse_resolution`, `parse_duration` for `duration` extraction, `parse_age_limit` for `age_limit` extraction.
Explore [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions. Explore [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions.
#### More examples #### More examples

View File

@ -320,6 +320,7 @@
- **Fusion** - **Fusion**
- **Fux** - **Fux**
- **FXNetworks** - **FXNetworks**
- **Gaia**
- **GameInformer** - **GameInformer**
- **GameOne** - **GameOne**
- **gameone:playlist** - **gameone:playlist**
@ -360,6 +361,7 @@
- **hitbox** - **hitbox**
- **hitbox:live** - **hitbox:live**
- **HitRecord** - **HitRecord**
- **hketv**: 香港教育局教育電視 (HKETV) Educational Television, Hong Kong Educational Bureau
- **HornBunny** - **HornBunny**
- **HotNewHipHop** - **HotNewHipHop**
- **hotstar** - **hotstar**
@ -370,6 +372,8 @@
- **HRTiPlaylist** - **HRTiPlaylist**
- **Huajiao**: 花椒直播 - **Huajiao**: 花椒直播
- **HuffPost**: Huffington Post - **HuffPost**: Huffington Post
- **Hungama**
- **HungamaSong**
- **Hypem** - **Hypem**
- **Iconosquare** - **Iconosquare**
- **ign.com** - **ign.com**
@ -383,6 +387,7 @@
- **IndavideoEmbed** - **IndavideoEmbed**
- **InfoQ** - **InfoQ**
- **Instagram** - **Instagram**
- **instagram:tag**: Instagram hashtag search
- **instagram:user**: Instagram user profile - **instagram:user**: Instagram user profile
- **Internazionale** - **Internazionale**
- **InternetVideoArchive** - **InternetVideoArchive**
@ -540,8 +545,6 @@
- **MyviEmbed** - **MyviEmbed**
- **MyVisionTV** - **MyVisionTV**
- **n-tv.de** - **n-tv.de**
- **natgeo**
- **natgeo:episodeguide**
- **natgeo:video** - **natgeo:video**
- **Naver** - **Naver**
- **NBA** - **NBA**
@ -642,6 +645,7 @@
- **orf:oe1**: Radio Österreich 1 - **orf:oe1**: Radio Österreich 1
- **orf:tvthek**: ORF TVthek - **orf:tvthek**: ORF TVthek
- **OsnatelTV** - **OsnatelTV**
- **OutsideTV**
- **PacktPub** - **PacktPub**
- **PacktPubCourse** - **PacktPubCourse**
- **PandaTV**: 熊猫TV - **PandaTV**: 熊猫TV
@ -666,6 +670,7 @@
- **Pinkbike** - **Pinkbike**
- **Pladform** - **Pladform**
- **play.fm** - **play.fm**
- **PlayPlusTV**
- **PlaysTV** - **PlaysTV**
- **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz - **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz
- **Playvid** - **Playvid**
@ -934,7 +939,9 @@
- **TVNet** - **TVNet**
- **TVNoe** - **TVNoe**
- **TVNow** - **TVNow**
- **TVNowList** - **TVNowAnnual**
- **TVNowNew**
- **TVNowSeason**
- **TVNowShow** - **TVNowShow**
- **tvp**: Telewizja Polska - **tvp**: Telewizja Polska
- **tvp:embed**: Telewizja Polska - **tvp:embed**: Telewizja Polska
@ -1062,6 +1069,7 @@
- **VVVVID** - **VVVVID**
- **VyboryMos** - **VyboryMos**
- **Vzaar** - **Vzaar**
- **Wakanim**
- **Walla** - **Walla**
- **WalyTV** - **WalyTV**
- **washingtonpost** - **washingtonpost**

View File

@ -153,15 +153,27 @@ def expect_value(self, got, expected, field):
isinstance(got, compat_str), isinstance(got, compat_str),
'Expected field %s to be a unicode object, but got value %r of type %r' % (field, got, type(got))) 'Expected field %s to be a unicode object, but got value %r of type %r' % (field, got, type(got)))
got = 'md5:' + md5(got) got = 'md5:' + md5(got)
elif isinstance(expected, compat_str) and expected.startswith('mincount:'): elif isinstance(expected, compat_str) and re.match(r'^(?:min|max)?count:\d+', expected):
self.assertTrue( self.assertTrue(
isinstance(got, (list, dict)), isinstance(got, (list, dict)),
'Expected field %s to be a list or a dict, but it is of type %s' % ( 'Expected field %s to be a list or a dict, but it is of type %s' % (
field, type(got).__name__)) field, type(got).__name__))
expected_num = int(expected.partition(':')[2]) op, _, expected_num = expected.partition(':')
assertGreaterEqual( expected_num = int(expected_num)
if op == 'mincount':
assert_func = assertGreaterEqual
msg_tmpl = 'Expected %d items in field %s, but only got %d'
elif op == 'maxcount':
assert_func = assertLessEqual
msg_tmpl = 'Expected maximum %d items in field %s, but got %d'
elif op == 'count':
assert_func = assertEqual
msg_tmpl = 'Expected exactly %d items in field %s, but got %d'
else:
assert False
assert_func(
self, len(got), expected_num, self, len(got), expected_num,
'Expected %d items in field %s, but only got %d' % (expected_num, field, len(got))) msg_tmpl % (expected_num, field, len(got)))
return return
self.assertEqual( self.assertEqual(
expected, got, expected, got,
@ -237,6 +249,20 @@ def assertGreaterEqual(self, got, expected, msg=None):
self.assertTrue(got >= expected, msg) self.assertTrue(got >= expected, msg)
def assertLessEqual(self, got, expected, msg=None):
if not (got <= expected):
if msg is None:
msg = '%r not less than or equal to %r' % (got, expected)
self.assertTrue(got <= expected, msg)
def assertEqual(self, got, expected, msg=None):
if not (got == expected):
if msg is None:
msg = '%r not equal to %r' % (got, expected)
self.assertTrue(got == expected, msg)
def expect_warnings(ydl, warnings_re): def expect_warnings(ydl, warnings_re):
real_warning = ydl.report_warning real_warning = ydl.report_warning

View File

@ -497,7 +497,64 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
'width': 1280, 'width': 1280,
'height': 720, 'height': 720,
}] }]
) ),
(
# https://github.com/rg3/youtube-dl/issues/18923
# https://www.ted.com/talks/boris_hesser_a_grassroots_healthcare_revolution_in_africa
'ted_18923',
'http://hls.ted.com/talks/31241.m3u8',
[{
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/audio/600k.m3u8?nobumpers=true&uniqueId=76011e2b',
'format_id': '600k-Audio',
'vcodec': 'none',
}, {
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/audio/600k.m3u8?nobumpers=true&uniqueId=76011e2b',
'format_id': '68',
'vcodec': 'none',
}, {
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/64k.m3u8?nobumpers=true&uniqueId=76011e2b',
'format_id': '163',
'acodec': 'none',
'width': 320,
'height': 180,
}, {
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/180k.m3u8?nobumpers=true&uniqueId=76011e2b',
'format_id': '481',
'acodec': 'none',
'width': 512,
'height': 288,
}, {
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/320k.m3u8?nobumpers=true&uniqueId=76011e2b',
'format_id': '769',
'acodec': 'none',
'width': 512,
'height': 288,
}, {
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/450k.m3u8?nobumpers=true&uniqueId=76011e2b',
'format_id': '984',
'acodec': 'none',
'width': 512,
'height': 288,
}, {
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/600k.m3u8?nobumpers=true&uniqueId=76011e2b',
'format_id': '1255',
'acodec': 'none',
'width': 640,
'height': 360,
}, {
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/950k.m3u8?nobumpers=true&uniqueId=76011e2b',
'format_id': '1693',
'acodec': 'none',
'width': 853,
'height': 480,
}, {
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/1500k.m3u8?nobumpers=true&uniqueId=76011e2b',
'format_id': '2462',
'acodec': 'none',
'width': 1280,
'height': 720,
}]
),
] ]
for m3u8_file, m3u8_url, expected_formats in _TEST_CASES: for m3u8_file, m3u8_url, expected_formats in _TEST_CASES:

View File

@ -239,6 +239,76 @@ class TestFormatSelection(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0] downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'vid-vcodec-dot') self.assertEqual(downloaded['format_id'], 'vid-vcodec-dot')
def test_format_selection_string_ops(self):
formats = [
{'format_id': 'abc-cba', 'ext': 'mp4', 'url': TEST_URL},
{'format_id': 'zxc-cxz', 'ext': 'webm', 'url': TEST_URL},
]
info_dict = _make_result(formats)
# equals (=)
ydl = YDL({'format': '[format_id=abc-cba]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'abc-cba')
# does not equal (!=)
ydl = YDL({'format': '[format_id!=abc-cba]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'zxc-cxz')
ydl = YDL({'format': '[format_id!=abc-cba][format_id!=zxc-cxz]'})
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
# starts with (^=)
ydl = YDL({'format': '[format_id^=abc]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'abc-cba')
# does not start with (!^=)
ydl = YDL({'format': '[format_id!^=abc]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'zxc-cxz')
ydl = YDL({'format': '[format_id!^=abc][format_id!^=zxc]'})
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
# ends with ($=)
ydl = YDL({'format': '[format_id$=cba]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'abc-cba')
# does not end with (!$=)
ydl = YDL({'format': '[format_id!$=cba]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'zxc-cxz')
ydl = YDL({'format': '[format_id!$=cba][format_id!$=cxz]'})
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
# contains (*=)
ydl = YDL({'format': '[format_id*=bc-cb]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'abc-cba')
# does not contain (!*=)
ydl = YDL({'format': '[format_id!*=bc-cb]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'zxc-cxz')
ydl = YDL({'format': '[format_id!*=abc][format_id!*=zxc]'})
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
ydl = YDL({'format': '[format_id!*=-]'})
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
def test_youtube_format_selection(self): def test_youtube_format_selection(self):
order = [ order = [
'38', '37', '46', '22', '45', '35', '44', '18', '34', '43', '6', '5', '17', '36', '13', '38', '37', '46', '22', '45', '35', '44', '18', '34', '43', '6', '5', '17', '36', '13',

View File

@ -507,6 +507,8 @@ class TestUtil(unittest.TestCase):
self.assertEqual(urljoin('http://foo.de/', ''), None) self.assertEqual(urljoin('http://foo.de/', ''), None)
self.assertEqual(urljoin('http://foo.de/', ['foobar']), None) self.assertEqual(urljoin('http://foo.de/', ['foobar']), None)
self.assertEqual(urljoin('http://foo.de/a/b/c.txt', '.././../d.txt'), 'http://foo.de/d.txt') self.assertEqual(urljoin('http://foo.de/a/b/c.txt', '.././../d.txt'), 'http://foo.de/d.txt')
self.assertEqual(urljoin('http://foo.de/a/b/c.txt', 'rtmp://foo.de'), 'rtmp://foo.de')
self.assertEqual(urljoin(None, 'rtmp://foo.de'), 'rtmp://foo.de')
def test_url_or_none(self): def test_url_or_none(self):
self.assertEqual(url_or_none(None), None) self.assertEqual(url_or_none(None), None)

28
test/testdata/m3u8/ted_18923.m3u8 vendored Normal file
View File

@ -0,0 +1,28 @@
#EXTM3U
#EXT-X-VERSION:4
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=1255659,PROGRAM-ID=1,CODECS="avc1.42c01e,mp4a.40.2",RESOLUTION=640x360
/videos/BorisHesser_2018S/video/600k.m3u8?nobumpers=true&uniqueId=76011e2b
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=163154,PROGRAM-ID=1,CODECS="avc1.42c00c,mp4a.40.2",RESOLUTION=320x180
/videos/BorisHesser_2018S/video/64k.m3u8?nobumpers=true&uniqueId=76011e2b
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=481701,PROGRAM-ID=1,CODECS="avc1.42c015,mp4a.40.2",RESOLUTION=512x288
/videos/BorisHesser_2018S/video/180k.m3u8?nobumpers=true&uniqueId=76011e2b
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=769968,PROGRAM-ID=1,CODECS="avc1.42c015,mp4a.40.2",RESOLUTION=512x288
/videos/BorisHesser_2018S/video/320k.m3u8?nobumpers=true&uniqueId=76011e2b
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=984037,PROGRAM-ID=1,CODECS="avc1.42c015,mp4a.40.2",RESOLUTION=512x288
/videos/BorisHesser_2018S/video/450k.m3u8?nobumpers=true&uniqueId=76011e2b
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=1693925,PROGRAM-ID=1,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=853x480
/videos/BorisHesser_2018S/video/950k.m3u8?nobumpers=true&uniqueId=76011e2b
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=2462469,PROGRAM-ID=1,CODECS="avc1.640028,mp4a.40.2",RESOLUTION=1280x720
/videos/BorisHesser_2018S/video/1500k.m3u8?nobumpers=true&uniqueId=76011e2b
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=68101,PROGRAM-ID=1,CODECS="mp4a.40.2",DEFAULT=YES
/videos/BorisHesser_2018S/audio/600k.m3u8?nobumpers=true&uniqueId=76011e2b
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=74298,PROGRAM-ID=1,CODECS="avc1.42c00c",RESOLUTION=320x180,URI="/videos/BorisHesser_2018S/video/64k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=216200,PROGRAM-ID=1,CODECS="avc1.42c015",RESOLUTION=512x288,URI="/videos/BorisHesser_2018S/video/180k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=304717,PROGRAM-ID=1,CODECS="avc1.42c015",RESOLUTION=512x288,URI="/videos/BorisHesser_2018S/video/320k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=350933,PROGRAM-ID=1,CODECS="avc1.42c015",RESOLUTION=512x288,URI="/videos/BorisHesser_2018S/video/450k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=495850,PROGRAM-ID=1,CODECS="avc1.42c01e",RESOLUTION=640x360,URI="/videos/BorisHesser_2018S/video/600k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=810750,PROGRAM-ID=1,CODECS="avc1.4d401f",RESOLUTION=853x480,URI="/videos/BorisHesser_2018S/video/950k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=1273700,PROGRAM-ID=1,CODECS="avc1.640028",RESOLUTION=1280x720,URI="/videos/BorisHesser_2018S/video/1500k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="600k",LANGUAGE="en",NAME="Audio",AUTOSELECT=YES,DEFAULT=YES,URI="/videos/BorisHesser_2018S/audio/600k.m3u8?nobumpers=true&uniqueId=76011e2b",BANDWIDTH=614400

View File

@ -1063,21 +1063,24 @@ class YoutubeDL(object):
if not m: if not m:
STR_OPERATORS = { STR_OPERATORS = {
'=': operator.eq, '=': operator.eq,
'!=': operator.ne,
'^=': lambda attr, value: attr.startswith(value), '^=': lambda attr, value: attr.startswith(value),
'$=': lambda attr, value: attr.endswith(value), '$=': lambda attr, value: attr.endswith(value),
'*=': lambda attr, value: value in attr, '*=': lambda attr, value: value in attr,
} }
str_operator_rex = re.compile(r'''(?x) str_operator_rex = re.compile(r'''(?x)
\s*(?P<key>ext|acodec|vcodec|container|protocol|format_id) \s*(?P<key>ext|acodec|vcodec|container|protocol|format_id)
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)? \s*(?P<negation>!\s*)?(?P<op>%s)(?P<none_inclusive>\s*\?)?
\s*(?P<value>[a-zA-Z0-9._-]+) \s*(?P<value>[a-zA-Z0-9._-]+)
\s*$ \s*$
''' % '|'.join(map(re.escape, STR_OPERATORS.keys()))) ''' % '|'.join(map(re.escape, STR_OPERATORS.keys())))
m = str_operator_rex.search(filter_spec) m = str_operator_rex.search(filter_spec)
if m: if m:
comparison_value = m.group('value') comparison_value = m.group('value')
op = STR_OPERATORS[m.group('op')] str_op = STR_OPERATORS[m.group('op')]
if m.group('negation'):
op = lambda attr, value: not str_op(attr, value)
else:
op = str_op
if not m: if not m:
raise ValueError('Invalid filter specification %r' % filter_spec) raise ValueError('Invalid filter specification %r' % filter_spec)

View File

@ -75,10 +75,14 @@ class HlsFD(FragmentFD):
fd.add_progress_hook(ph) fd.add_progress_hook(ph)
return fd.real_download(filename, info_dict) return fd.real_download(filename, info_dict)
def is_ad_fragment(s): def is_ad_fragment_start(s):
return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s or return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s or
s.startswith('#UPLYNK-SEGMENT') and s.endswith(',ad')) s.startswith('#UPLYNK-SEGMENT') and s.endswith(',ad'))
def is_ad_fragment_end(s):
return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=master' in s or
s.startswith('#UPLYNK-SEGMENT') and s.endswith(',segment'))
media_frags = 0 media_frags = 0
ad_frags = 0 ad_frags = 0
ad_frag_next = False ad_frag_next = False
@ -87,12 +91,13 @@ class HlsFD(FragmentFD):
if not line: if not line:
continue continue
if line.startswith('#'): if line.startswith('#'):
if is_ad_fragment(line): if is_ad_fragment_start(line):
ad_frags += 1
ad_frag_next = True ad_frag_next = True
elif is_ad_fragment_end(line):
ad_frag_next = False
continue continue
if ad_frag_next: if ad_frag_next:
ad_frag_next = False ad_frags += 1
continue continue
media_frags += 1 media_frags += 1
@ -123,7 +128,6 @@ class HlsFD(FragmentFD):
if line: if line:
if not line.startswith('#'): if not line.startswith('#'):
if ad_frag_next: if ad_frag_next:
ad_frag_next = False
continue continue
frag_index += 1 frag_index += 1
if frag_index <= ctx['fragment_index']: if frag_index <= ctx['fragment_index']:
@ -196,8 +200,10 @@ class HlsFD(FragmentFD):
'start': sub_range_start, 'start': sub_range_start,
'end': sub_range_start + int(splitted_byte_range[0]), 'end': sub_range_start + int(splitted_byte_range[0]),
} }
elif is_ad_fragment(line): elif is_ad_fragment_start(line):
ad_frag_next = True ad_frag_next = True
elif is_ad_fragment_end(line):
ad_frag_next = False
self._finish_frag_download(ctx) self._finish_frag_download(ctx)

View File

@ -55,6 +55,7 @@ class BitChuteIE(InfoExtractor):
formats = [ formats = [
{'url': format_url} {'url': format_url}
for format_url in orderedSet(format_urls)] for format_url in orderedSet(format_urls)]
self._check_formats(formats, video_id)
self._sort_formats(formats) self._sort_formats(formats)
description = self._html_search_regex( description = self._html_search_regex(

View File

@ -82,6 +82,12 @@ class CarambaTVPageIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
videomore_url = VideomoreIE._extract_url(webpage) videomore_url = VideomoreIE._extract_url(webpage)
if not videomore_url:
videomore_id = self._search_regex(
r'getVMCode\s*\(\s*["\']?(\d+)', webpage, 'videomore id',
default=None)
if videomore_id:
videomore_url = 'videomore:%s' % videomore_id
if videomore_url: if videomore_url:
title = self._og_search_title(webpage) title = self._og_search_title(webpage)
return { return {

View File

@ -1,20 +1,19 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .turner import TurnerBaseIE from .turner import TurnerBaseIE
from ..utils import int_or_none
class CartoonNetworkIE(TurnerBaseIE): class CartoonNetworkIE(TurnerBaseIE):
_VALID_URL = r'https?://(?:www\.)?cartoonnetwork\.com/video/(?:[^/]+/)+(?P<id>[^/?#]+)-(?:clip|episode)\.html' _VALID_URL = r'https?://(?:www\.)?cartoonnetwork\.com/video/(?:[^/]+/)+(?P<id>[^/?#]+)-(?:clip|episode)\.html'
_TEST = { _TEST = {
'url': 'http://www.cartoonnetwork.com/video/teen-titans-go/starfire-the-cat-lady-clip.html', 'url': 'https://www.cartoonnetwork.com/video/ben-10/how-to-draw-upgrade-episode.html',
'info_dict': { 'info_dict': {
'id': '8a250ab04ed07e6c014ef3f1e2f9016c', 'id': '6e3375097f63874ebccec7ef677c1c3845fa850e',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Starfire the Cat Lady', 'title': 'How to Draw Upgrade',
'description': 'Robin decides to become a cat so that Starfire will finally love him.', 'description': 'md5:2061d83776db7e8be4879684eefe8c0f',
}, },
'params': { 'params': {
# m3u8 download # m3u8 download
@ -25,18 +24,39 @@ class CartoonNetworkIE(TurnerBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
id_type, video_id = re.search(r"_cnglobal\.cvp(Video|Title)Id\s*=\s*'([^']+)';", webpage).groups()
query = ('id' if id_type == 'Video' else 'titleId') + '=' + video_id def find_field(global_re, name, content_re=None, value_re='[^"]+', fatal=False):
return self._extract_cvp_info( metadata_re = ''
'http://www.cartoonnetwork.com/video-seo-svc/episodeservices/getCvpPlaylist?networkName=CN2&' + query, video_id, { if content_re:
'secure': { metadata_re = r'|video_metadata\.content_' + content_re
'media_src': 'http://androidhls-secure.cdn.turner.com/toon/big', return self._search_regex(
'tokenizer_src': 'https://token.vgtf.net/token/token_mobile', r'(?:_cnglobal\.currentVideo\.%s%s)\s*=\s*"(%s)";' % (global_re, metadata_re, value_re),
}, webpage, name, fatal=fatal)
}, {
media_id = find_field('mediaId', 'media id', 'id', '[0-9a-f]{40}', True)
title = find_field('episodeTitle', 'title', '(?:episodeName|name)', fatal=True)
info = self._extract_ngtv_info(
media_id, {'networkId': 'cartoonnetwork'}, {
'url': url, 'url': url,
'site_name': 'CartoonNetwork', 'site_name': 'CartoonNetwork',
'auth_required': self._search_regex( 'auth_required': find_field('authType', 'auth type') != 'unauth',
r'_cnglobal\.cvpFullOrPreviewAuth\s*=\s*(true|false);',
webpage, 'auth required', default='false') == 'true',
}) })
series = find_field(
'propertyName', 'series', 'showName') or self._html_search_meta('partOfSeries', webpage)
info.update({
'id': media_id,
'display_id': display_id,
'title': title,
'description': self._html_search_meta('description', webpage),
'series': series,
'episode': title,
})
for field in ('season', 'episode'):
field_name = field + 'Number'
info[field + '_number'] = int_or_none(find_field(
field_name, field + ' number', value_re=r'\d+') or self._html_search_meta(field_name, webpage))
return info

View File

@ -1241,17 +1241,30 @@ class InfoExtractor(object):
if expected_type is not None and expected_type != item_type: if expected_type is not None and expected_type != item_type:
return info return info
if item_type in ('TVEpisode', 'Episode'): if item_type in ('TVEpisode', 'Episode'):
episode_name = unescapeHTML(e.get('name'))
info.update({ info.update({
'episode': unescapeHTML(e.get('name')), 'episode': episode_name,
'episode_number': int_or_none(e.get('episodeNumber')), 'episode_number': int_or_none(e.get('episodeNumber')),
'description': unescapeHTML(e.get('description')), 'description': unescapeHTML(e.get('description')),
}) })
if not info.get('title') and episode_name:
info['title'] = episode_name
part_of_season = e.get('partOfSeason') part_of_season = e.get('partOfSeason')
if isinstance(part_of_season, dict) and part_of_season.get('@type') in ('TVSeason', 'Season', 'CreativeWorkSeason'): if isinstance(part_of_season, dict) and part_of_season.get('@type') in ('TVSeason', 'Season', 'CreativeWorkSeason'):
info['season_number'] = int_or_none(part_of_season.get('seasonNumber')) info.update({
'season': unescapeHTML(part_of_season.get('name')),
'season_number': int_or_none(part_of_season.get('seasonNumber')),
})
part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries') part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
if isinstance(part_of_series, dict) and part_of_series.get('@type') in ('TVSeries', 'Series', 'CreativeWorkSeries'): if isinstance(part_of_series, dict) and part_of_series.get('@type') in ('TVSeries', 'Series', 'CreativeWorkSeries'):
info['series'] = unescapeHTML(part_of_series.get('name')) info['series'] = unescapeHTML(part_of_series.get('name'))
elif item_type == 'Movie':
info.update({
'title': unescapeHTML(e.get('name')),
'description': unescapeHTML(e.get('description')),
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('dateCreated')),
})
elif item_type in ('Article', 'NewsArticle'): elif item_type in ('Article', 'NewsArticle'):
info.update({ info.update({
'timestamp': parse_iso8601(e.get('datePublished')), 'timestamp': parse_iso8601(e.get('datePublished')),
@ -1588,6 +1601,7 @@ class InfoExtractor(object):
# References: # References:
# 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-21 # 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-21
# 2. https://github.com/rg3/youtube-dl/issues/12211 # 2. https://github.com/rg3/youtube-dl/issues/12211
# 3. https://github.com/rg3/youtube-dl/issues/18923
# We should try extracting formats only from master playlists [1, 4.3.4], # We should try extracting formats only from master playlists [1, 4.3.4],
# i.e. playlists that describe available qualities. On the other hand # i.e. playlists that describe available qualities. On the other hand
@ -1659,11 +1673,16 @@ class InfoExtractor(object):
rendition = stream_group[0] rendition = stream_group[0]
return rendition.get('NAME') or stream_group_id return rendition.get('NAME') or stream_group_id
# parse EXT-X-MEDIA tags before EXT-X-STREAM-INF in order to have the
# chance to detect video only formats when EXT-X-STREAM-INF tags
# precede EXT-X-MEDIA tags in HLS manifest such as [3].
for line in m3u8_doc.splitlines():
if line.startswith('#EXT-X-MEDIA:'):
extract_media(line)
for line in m3u8_doc.splitlines(): for line in m3u8_doc.splitlines():
if line.startswith('#EXT-X-STREAM-INF:'): if line.startswith('#EXT-X-STREAM-INF:'):
last_stream_inf = parse_m3u8_attributes(line) last_stream_inf = parse_m3u8_attributes(line)
elif line.startswith('#EXT-X-MEDIA:'):
extract_media(line)
elif line.startswith('#') or not line.strip(): elif line.startswith('#') or not line.strip():
continue continue
else: else:
@ -2616,7 +2635,7 @@ class InfoExtractor(object):
'id': this_video_id, 'id': this_video_id,
'title': unescapeHTML(video_data['title'] if require_title else video_data.get('title')), 'title': unescapeHTML(video_data['title'] if require_title else video_data.get('title')),
'description': video_data.get('description'), 'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')), 'thumbnail': urljoin(base_url, self._proto_relative_url(video_data.get('image'))),
'timestamp': int_or_none(video_data.get('pubdate')), 'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')), 'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
'subtitles': subtitles, 'subtitles': subtitles,
@ -2643,12 +2662,9 @@ class InfoExtractor(object):
for source in jwplayer_sources_data: for source in jwplayer_sources_data:
if not isinstance(source, dict): if not isinstance(source, dict):
continue continue
source_url = self._proto_relative_url(source.get('file')) source_url = urljoin(
if not source_url: base_url, self._proto_relative_url(source.get('file')))
continue if not source_url or source_url in urls:
if base_url:
source_url = compat_urlparse.urljoin(base_url, source_url)
if source_url in urls:
continue continue
urls.append(source_url) urls.append(source_url)
source_type = source.get('type') or '' source_type = source.get('type') or ''

View File

@ -144,7 +144,7 @@ class CrunchyrollBaseIE(InfoExtractor):
class CrunchyrollIE(CrunchyrollBaseIE, VRVIE): class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
IE_NAME = 'crunchyroll' IE_NAME = 'crunchyroll'
_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:media(?:-|/\?id=)|[^/]*/[^/?&]*?)(?P<video_id>[0-9]+))(?:[/?&]|$)' _VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:media(?:-|/\?id=)|(?:[^/]*/){1,2}[^/?&]*?)(?P<video_id>[0-9]+))(?:[/?&]|$)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513', 'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
'info_dict': { 'info_dict': {
@ -269,6 +269,9 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
}, { }, {
'url': 'http://www.crunchyroll.com/media-723735', 'url': 'http://www.crunchyroll.com/media-723735',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.crunchyroll.com/en-gb/mob-psycho-100/episode-2-urban-legends-encountering-rumors-780921',
'only_matching': True,
}] }]
_FORMAT_IDS = { _FORMAT_IDS = {

View File

@ -46,8 +46,24 @@ class CuriosityStreamBaseIE(InfoExtractor):
self._handle_errors(result) self._handle_errors(result)
self._auth_token = result['message']['auth_token'] self._auth_token = result['message']['auth_token']
def _extract_media_info(self, media):
video_id = compat_str(media['id']) class CuriosityStreamIE(CuriosityStreamBaseIE):
IE_NAME = 'curiositystream'
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/video/(?P<id>\d+)'
_TEST = {
'url': 'https://app.curiositystream.com/video/2',
'md5': '262bb2f257ff301115f1973540de8983',
'info_dict': {
'id': '2',
'ext': 'mp4',
'title': 'How Did You Develop The Internet?',
'description': 'Vint Cerf, Google\'s Chief Internet Evangelist, describes how he and Bob Kahn created the internet.',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
media = self._call_api('media/' + video_id, video_id)
title = media['title'] title = media['title']
formats = [] formats = []
@ -114,38 +130,21 @@ class CuriosityStreamBaseIE(InfoExtractor):
} }
class CuriosityStreamIE(CuriosityStreamBaseIE):
IE_NAME = 'curiositystream'
_VALID_URL = r'https?://app\.curiositystream\.com/video/(?P<id>\d+)'
_TEST = {
'url': 'https://app.curiositystream.com/video/2',
'md5': '262bb2f257ff301115f1973540de8983',
'info_dict': {
'id': '2',
'ext': 'mp4',
'title': 'How Did You Develop The Internet?',
'description': 'Vint Cerf, Google\'s Chief Internet Evangelist, describes how he and Bob Kahn created the internet.',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
media = self._call_api('media/' + video_id, video_id)
return self._extract_media_info(media)
class CuriosityStreamCollectionIE(CuriosityStreamBaseIE): class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
IE_NAME = 'curiositystream:collection' IE_NAME = 'curiositystream:collection'
_VALID_URL = r'https?://app\.curiositystream\.com/collection/(?P<id>\d+)' _VALID_URL = r'https?://(?:app\.)?curiositystream\.com/(?:collection|series)/(?P<id>\d+)'
_TEST = { _TESTS = [{
'url': 'https://app.curiositystream.com/collection/2', 'url': 'https://app.curiositystream.com/collection/2',
'info_dict': { 'info_dict': {
'id': '2', 'id': '2',
'title': 'Curious Minds: The Internet', 'title': 'Curious Minds: The Internet',
'description': 'How is the internet shaping our lives in the 21st Century?', 'description': 'How is the internet shaping our lives in the 21st Century?',
}, },
'playlist_mincount': 12, 'playlist_mincount': 17,
} }, {
'url': 'https://curiositystream.com/series/2',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
collection_id = self._match_id(url) collection_id = self._match_id(url)
@ -153,7 +152,10 @@ class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
'collections/' + collection_id, collection_id) 'collections/' + collection_id, collection_id)
entries = [] entries = []
for media in collection.get('media', []): for media in collection.get('media', []):
entries.append(self._extract_media_info(media)) media_id = compat_str(media.get('id'))
entries.append(self.url_result(
'https://curiositystream.com/video/' + media_id,
CuriosityStreamIE.ie_key(), media_id))
return self.playlist_result( return self.playlist_result(
entries, collection_id, entries, collection_id,
collection.get('title'), collection.get('description')) collection.get('title'), collection.get('description'))

View File

@ -77,10 +77,9 @@ class DRTVIE(InfoExtractor):
r'data-resource="[^>"]+mu/programcard/expanded/([^"]+)"'), r'data-resource="[^>"]+mu/programcard/expanded/([^"]+)"'),
webpage, 'video id') webpage, 'video id')
programcard = self._download_json( data = self._download_json(
'http://www.dr.dk/mu/programcard/expanded/%s' % video_id, 'https://www.dr.dk/mu-online/api/1.4/programcard/%s' % video_id,
video_id, 'Downloading video JSON') video_id, 'Downloading video JSON', query={'expanded': 'true'})
data = programcard['Data'][0]
title = remove_end(self._og_search_title( title = remove_end(self._og_search_title(
webpage, default=None), ' | TV | DR') or data['Title'] webpage, default=None), ' | TV | DR') or data['Title']
@ -97,7 +96,7 @@ class DRTVIE(InfoExtractor):
formats = [] formats = []
subtitles = {} subtitles = {}
for asset in data['Assets']: for asset in [data['PrimaryAsset']]:
kind = asset.get('Kind') kind = asset.get('Kind')
if kind == 'Image': if kind == 'Image':
thumbnail = asset.get('Uri') thumbnail = asset.get('Uri')

View File

@ -15,16 +15,16 @@ from ..utils import (
class DTubeIE(InfoExtractor): class DTubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?d\.tube/(?:#!/)?v/(?P<uploader_id>[0-9a-z.-]+)/(?P<id>[0-9a-z]{8})' _VALID_URL = r'https?://(?:www\.)?d\.tube/(?:#!/)?v/(?P<uploader_id>[0-9a-z.-]+)/(?P<id>[0-9a-z]{8})'
_TEST = { _TEST = {
'url': 'https://d.tube/#!/v/benswann/zqd630em', 'url': 'https://d.tube/#!/v/broncnutz/x380jtr1',
'md5': 'a03eaa186618ffa7a3145945543a251e', 'md5': '9f29088fa08d699a7565ee983f56a06e',
'info_dict': { 'info_dict': {
'id': 'zqd630em', 'id': 'x380jtr1',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Reality Check: FDA\'s Disinformation Campaign on Kratom', 'title': 'Lefty 3-Rings is Back Baby!! NCAA Picks',
'description': 'md5:700d164e066b87f9eac057949e4227c2', 'description': 'md5:60be222088183be3a42f196f34235776',
'uploader_id': 'benswann', 'uploader_id': 'broncnutz',
'upload_date': '20180222', 'upload_date': '20190107',
'timestamp': 1519328958, 'timestamp': 1546854054,
}, },
'params': { 'params': {
'format': '480p', 'format': '480p',
@ -48,7 +48,7 @@ class DTubeIE(InfoExtractor):
def canonical_url(h): def canonical_url(h):
if not h: if not h:
return None return None
return 'https://ipfs.io/ipfs/' + h return 'https://video.dtube.top/ipfs/' + h
formats = [] formats = []
for q in ('240', '480', '720', '1080', ''): for q in ('240', '480', '720', '1080', ''):

View File

@ -411,6 +411,7 @@ from .funk import (
from .funnyordie import FunnyOrDieIE from .funnyordie import FunnyOrDieIE
from .fusion import FusionIE from .fusion import FusionIE
from .fxnetworks import FXNetworksIE from .fxnetworks import FXNetworksIE
from .gaia import GaiaIE
from .gameinformer import GameInformerIE from .gameinformer import GameInformerIE
from .gameone import ( from .gameone import (
GameOneIE, GameOneIE,
@ -451,6 +452,7 @@ from .hellporno import HellPornoIE
from .helsinki import HelsinkiIE from .helsinki import HelsinkiIE
from .hentaistigma import HentaiStigmaIE from .hentaistigma import HentaiStigmaIE
from .hgtv import HGTVComShowIE from .hgtv import HGTVComShowIE
from .hketv import HKETVIE
from .hidive import HiDiveIE from .hidive import HiDiveIE
from .historicfilms import HistoricFilmsIE from .historicfilms import HistoricFilmsIE
from .hitbox import HitboxIE, HitboxLiveIE from .hitbox import HitboxIE, HitboxLiveIE
@ -469,6 +471,10 @@ from .hrti import (
) )
from .huajiao import HuajiaoIE from .huajiao import HuajiaoIE
from .huffpost import HuffPostIE from .huffpost import HuffPostIE
from .hungama import (
HungamaIE,
HungamaSongIE,
)
from .hypem import HypemIE from .hypem import HypemIE
from .iconosquare import IconosquareIE from .iconosquare import IconosquareIE
from .ign import ( from .ign import (
@ -489,7 +495,11 @@ from .ina import InaIE
from .inc import IncIE from .inc import IncIE
from .indavideo import IndavideoEmbedIE from .indavideo import IndavideoEmbedIE
from .infoq import InfoQIE from .infoq import InfoQIE
from .instagram import InstagramIE, InstagramUserIE from .instagram import (
InstagramIE,
InstagramUserIE,
InstagramTagIE,
)
from .internazionale import InternazionaleIE from .internazionale import InternazionaleIE
from .internetvideoarchive import InternetVideoArchiveIE from .internetvideoarchive import InternetVideoArchiveIE
from .iprima import IPrimaIE from .iprima import IPrimaIE
@ -682,11 +692,7 @@ from .myvi import (
MyviEmbedIE, MyviEmbedIE,
) )
from .myvidster import MyVidsterIE from .myvidster import MyVidsterIE
from .nationalgeographic import ( from .nationalgeographic import NationalGeographicVideoIE
NationalGeographicVideoIE,
NationalGeographicIE,
NationalGeographicEpisodeGuideIE,
)
from .naver import NaverIE from .naver import NaverIE
from .nba import NBAIE from .nba import NBAIE
from .nbc import ( from .nbc import (
@ -828,6 +834,7 @@ from .orf import (
ORFOE1IE, ORFOE1IE,
ORFIPTVIE, ORFIPTVIE,
) )
from .outsidetv import OutsideTVIE
from .packtpub import ( from .packtpub import (
PacktPubIE, PacktPubIE,
PacktPubCourseIE, PacktPubCourseIE,
@ -856,6 +863,7 @@ from .piksel import PikselIE
from .pinkbike import PinkbikeIE from .pinkbike import PinkbikeIE
from .pladform import PladformIE from .pladform import PladformIE
from .playfm import PlayFMIE from .playfm import PlayFMIE
from .playplustv import PlayPlusTVIE
from .plays import PlaysTVIE from .plays import PlaysTVIE
from .playtvak import PlaytvakIE from .playtvak import PlaytvakIE
from .playvid import PlayvidIE from .playvid import PlayvidIE
@ -1193,7 +1201,9 @@ from .tvnet import TVNetIE
from .tvnoe import TVNoeIE from .tvnoe import TVNoeIE
from .tvnow import ( from .tvnow import (
TVNowIE, TVNowIE,
TVNowListIE, TVNowNewIE,
TVNowSeasonIE,
TVNowAnnualIE,
TVNowShowIE, TVNowShowIE,
) )
from .tvp import ( from .tvp import (
@ -1363,6 +1373,7 @@ from .vuclip import VuClipIE
from .vvvvid import VVVVIDIE from .vvvvid import VVVVIDIE
from .vyborymos import VyboryMosIE from .vyborymos import VyboryMosIE
from .vzaar import VzaarIE from .vzaar import VzaarIE
from .wakanim import WakanimIE
from .walla import WallaIE from .walla import WallaIE
from .washingtonpost import ( from .washingtonpost import (
WashingtonPostIE, WashingtonPostIE,

View File

@ -1,11 +1,11 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
# import json
# import uuid
from .adobepass import AdobePassIE from .adobepass import AdobePassIE
from .uplynk import UplynkPreplayIE
from ..compat import compat_str
from ..utils import ( from ..utils import (
HEADRequest,
int_or_none, int_or_none,
parse_age_limit, parse_age_limit,
parse_duration, parse_duration,
@ -16,7 +16,7 @@ from ..utils import (
class FOXIE(AdobePassIE): class FOXIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?fox\.com/watch/(?P<id>[\da-fA-F]+)' _VALID_URL = r'https?://(?:www\.)?(?:fox\.com|nationalgeographic\.com/tv)/watch/(?P<id>[\da-fA-F]+)'
_TESTS = [{ _TESTS = [{
# clip # clip
'url': 'https://www.fox.com/watch/4b765a60490325103ea69888fb2bd4e8/', 'url': 'https://www.fox.com/watch/4b765a60490325103ea69888fb2bd4e8/',
@ -43,41 +43,47 @@ class FOXIE(AdobePassIE):
# episode, geo-restricted, tv provided required # episode, geo-restricted, tv provided required
'url': 'https://www.fox.com/watch/30056b295fb57f7452aeeb4920bc3024/', 'url': 'https://www.fox.com/watch/30056b295fb57f7452aeeb4920bc3024/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.nationalgeographic.com/tv/watch/f690e05ebbe23ab79747becd0cc223d1/',
'only_matching': True,
}] }]
# _access_token = None
# def _call_api(self, path, video_id, data=None):
# headers = {
# 'X-Api-Key': '238bb0a0c2aba67922c48709ce0c06fd',
# }
# if self._access_token:
# headers['Authorization'] = 'Bearer ' + self._access_token
# return self._download_json(
# 'https://api2.fox.com/v2.0/' + path, video_id, data=data, headers=headers)
# def _real_initialize(self):
# self._access_token = self._call_api(
# 'login', None, json.dumps({
# 'deviceId': compat_str(uuid.uuid4()),
# }).encode())['accessToken']
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
video = self._download_json( video = self._download_json(
'https://api.fox.com/fbc-content/v1_4/video/%s' % video_id, 'https://api.fox.com/fbc-content/v1_5/video/%s' % video_id,
video_id, headers={ video_id, headers={
'apikey': 'abdcbed02c124d393b39e818a4312055', 'apikey': 'abdcbed02c124d393b39e818a4312055',
'Content-Type': 'application/json', 'Content-Type': 'application/json',
'Referer': url, 'Referer': url,
}) })
# video = self._call_api('vodplayer/' + video_id, video_id)
title = video['name'] title = video['name']
release_url = video['videoRelease']['url'] release_url = video['videoRelease']['url']
# release_url = video['url']
description = video.get('description')
duration = int_or_none(video.get('durationInSeconds')) or int_or_none(
video.get('duration')) or parse_duration(video.get('duration'))
timestamp = unified_timestamp(video.get('datePublished'))
rating = video.get('contentRating')
age_limit = parse_age_limit(rating)
data = try_get( data = try_get(
video, lambda x: x['trackingData']['properties'], dict) or {} video, lambda x: x['trackingData']['properties'], dict) or {}
creator = data.get('brand') or data.get('network') or video.get('network') rating = video.get('contentRating')
series = video.get('seriesName') or data.get(
'seriesName') or data.get('show')
season_number = int_or_none(video.get('seasonNumber'))
episode = video.get('name')
episode_number = int_or_none(video.get('episodeNumber'))
release_year = int_or_none(video.get('releaseYear'))
if data.get('authRequired'): if data.get('authRequired'):
resource = self._get_mvpd_resource( resource = self._get_mvpd_resource(
'fbc-fox', title, video.get('guid'), rating) 'fbc-fox', title, video.get('guid'), rating)
@ -86,6 +92,18 @@ class FOXIE(AdobePassIE):
'auth': self._extract_mvpd_auth( 'auth': self._extract_mvpd_auth(
url, video_id, 'fbc-fox', resource) url, video_id, 'fbc-fox', resource)
}) })
m3u8_url = self._download_json(release_url, video_id)['playURL']
formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
self._sort_formats(formats)
duration = int_or_none(video.get('durationInSeconds')) or int_or_none(
video.get('duration')) or parse_duration(video.get('duration'))
timestamp = unified_timestamp(video.get('datePublished'))
creator = data.get('brand') or data.get('network') or video.get('network')
series = video.get('seriesName') or data.get(
'seriesName') or data.get('show')
subtitles = {} subtitles = {}
for doc_rel in video.get('documentReleases', []): for doc_rel in video.get('documentReleases', []):
@ -98,36 +116,19 @@ class FOXIE(AdobePassIE):
}] }]
break break
info = { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': description, 'formats': formats,
'description': video.get('description'),
'duration': duration, 'duration': duration,
'timestamp': timestamp, 'timestamp': timestamp,
'age_limit': age_limit, 'age_limit': parse_age_limit(rating),
'creator': creator, 'creator': creator,
'series': series, 'series': series,
'season_number': season_number, 'season_number': int_or_none(video.get('seasonNumber')),
'episode': episode, 'episode': video.get('name'),
'episode_number': episode_number, 'episode_number': int_or_none(video.get('episodeNumber')),
'release_year': release_year, 'release_year': int_or_none(video.get('releaseYear')),
'subtitles': subtitles, 'subtitles': subtitles,
} }
urlh = self._request_webpage(HEADRequest(release_url), video_id)
video_url = compat_str(urlh.geturl())
if UplynkPreplayIE.suitable(video_url):
info.update({
'_type': 'url_transparent',
'url': video_url,
'ie_key': UplynkPreplayIE.ie_key(),
})
else:
m3u8_url = self._download_json(release_url, video_id)['playURL']
formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
self._sort_formats(formats)
info['formats'] = formats
return info

View File

@ -1,6 +1,9 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import random
import string
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_HTTPError from ..compat import compat_HTTPError
from ..utils import ( from ..utils import (
@ -87,7 +90,7 @@ class FunimationIE(InfoExtractor):
video_id = title_data.get('id') or self._search_regex([ video_id = title_data.get('id') or self._search_regex([
r"KANE_customdimensions.videoID\s*=\s*'(\d+)';", r"KANE_customdimensions.videoID\s*=\s*'(\d+)';",
r'<iframe[^>]+src="/player/(\d+)"', r'<iframe[^>]+src="/player/(\d+)',
], webpage, 'video_id', default=None) ], webpage, 'video_id', default=None)
if not video_id: if not video_id:
player_url = self._html_search_meta([ player_url = self._html_search_meta([
@ -108,8 +111,10 @@ class FunimationIE(InfoExtractor):
if self._TOKEN: if self._TOKEN:
headers['Authorization'] = 'Token %s' % self._TOKEN headers['Authorization'] = 'Token %s' % self._TOKEN
sources = self._download_json( sources = self._download_json(
'https://prod-api-funimationnow.dadcdigital.com/api/source/catalog/video/%s/signed/' % video_id, 'https://www.funimation.com/api/showexperience/%s/' % video_id,
video_id, headers=headers)['items'] video_id, headers=headers, query={
'pinst_id': ''.join([random.choice(string.digits + string.ascii_letters) for _ in range(8)]),
})['items']
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403: if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
error = self._parse_json(e.cause.read(), video_id)['errors'][0] error = self._parse_json(e.cause.read(), video_id)['errors'][0]

View File

@ -0,0 +1,98 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
str_or_none,
strip_or_none,
try_get,
)
class GaiaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gaia\.com/video/(?P<id>[^/?]+).*?\bfullplayer=(?P<type>feature|preview)'
_TESTS = [{
'url': 'https://www.gaia.com/video/connecting-universal-consciousness?fullplayer=feature',
'info_dict': {
'id': '89356',
'ext': 'mp4',
'title': 'Connecting with Universal Consciousness',
'description': 'md5:844e209ad31b7d31345f5ed689e3df6f',
'upload_date': '20151116',
'timestamp': 1447707266,
'duration': 936,
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'https://www.gaia.com/video/connecting-universal-consciousness?fullplayer=preview',
'info_dict': {
'id': '89351',
'ext': 'mp4',
'title': 'Connecting with Universal Consciousness',
'description': 'md5:844e209ad31b7d31345f5ed689e3df6f',
'upload_date': '20151116',
'timestamp': 1447707266,
'duration': 53,
},
'params': {
# m3u8 download
'skip_download': True,
},
}]
def _real_extract(self, url):
display_id, vtype = re.search(self._VALID_URL, url).groups()
node_id = self._download_json(
'https://brooklyn.gaia.com/pathinfo', display_id, query={
'path': 'video/' + display_id,
})['id']
node = self._download_json(
'https://brooklyn.gaia.com/node/%d' % node_id, node_id)
vdata = node[vtype]
media_id = compat_str(vdata['nid'])
title = node['title']
media = self._download_json(
'https://brooklyn.gaia.com/media/' + media_id, media_id)
formats = self._extract_m3u8_formats(
media['mediaUrls']['bcHLS'], media_id, 'mp4')
self._sort_formats(formats)
subtitles = {}
text_tracks = media.get('textTracks', {})
for key in ('captions', 'subtitles'):
for lang, sub_url in text_tracks.get(key, {}).items():
subtitles.setdefault(lang, []).append({
'url': sub_url,
})
fivestar = node.get('fivestar', {})
fields = node.get('fields', {})
def get_field_value(key, value_key='value'):
return try_get(fields, lambda x: x[key][0][value_key])
return {
'id': media_id,
'display_id': display_id,
'title': title,
'formats': formats,
'description': strip_or_none(get_field_value('body') or get_field_value('teaser')),
'timestamp': int_or_none(node.get('created')),
'subtitles': subtitles,
'duration': int_or_none(vdata.get('duration')),
'like_count': int_or_none(try_get(fivestar, lambda x: x['up_count']['value'])),
'dislike_count': int_or_none(try_get(fivestar, lambda x: x['down_count']['value'])),
'comment_count': int_or_none(node.get('comment_count')),
'series': try_get(node, lambda x: x['series']['title'], compat_str),
'season_number': int_or_none(get_field_value('season')),
'season_id': str_or_none(get_field_value('series_nid', 'nid')),
'episode_number': int_or_none(get_field_value('episode')),
}

View File

@ -72,7 +72,7 @@ class GloboIE(InfoExtractor):
return return
try: try:
self._download_json( glb_id = (self._download_json(
'https://login.globo.com/api/authentication', None, data=json.dumps({ 'https://login.globo.com/api/authentication', None, data=json.dumps({
'payload': { 'payload': {
'email': email, 'email': email,
@ -81,7 +81,9 @@ class GloboIE(InfoExtractor):
}, },
}).encode(), headers={ }).encode(), headers={
'Content-Type': 'application/json; charset=utf-8', 'Content-Type': 'application/json; charset=utf-8',
}) }) or {}).get('glbId')
if glb_id:
self._set_cookie('.globo.com', 'GLBID', glb_id)
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401: if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
resp = self._parse_json(e.cause.read(), None) resp = self._parse_json(e.cause.read(), None)

View File

@ -25,15 +25,15 @@ class GoIE(AdobePassIE):
}, },
'watchdisneychannel': { 'watchdisneychannel': {
'brand': '004', 'brand': '004',
'requestor_id': 'Disney', 'resource_id': 'Disney',
}, },
'watchdisneyjunior': { 'watchdisneyjunior': {
'brand': '008', 'brand': '008',
'requestor_id': 'DisneyJunior', 'resource_id': 'DisneyJunior',
}, },
'watchdisneyxd': { 'watchdisneyxd': {
'brand': '009', 'brand': '009',
'requestor_id': 'DisneyXD', 'resource_id': 'DisneyXD',
} }
} }
_VALID_URL = r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))'\ _VALID_URL = r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))'\
@ -130,8 +130,8 @@ class GoIE(AdobePassIE):
'device': '001', 'device': '001',
} }
if video_data.get('accesslevel') == '1': if video_data.get('accesslevel') == '1':
requestor_id = site_info['requestor_id'] requestor_id = site_info.get('requestor_id', 'DisneyChannels')
resource = self._get_mvpd_resource( resource = site_info.get('resource_id') or self._get_mvpd_resource(
requestor_id, title, video_id, None) requestor_id, title, video_id, None)
auth = self._extract_mvpd_auth( auth = self._extract_mvpd_auth(
url, video_id, requestor_id, resource) url, video_id, requestor_id, resource)

View File

@ -0,0 +1,191 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
clean_html,
ExtractorError,
int_or_none,
merge_dicts,
parse_count,
str_or_none,
try_get,
unified_strdate,
urlencode_postdata,
urljoin,
)
class HKETVIE(InfoExtractor):
IE_NAME = 'hketv'
IE_DESC = '香港教育局教育電視 (HKETV) Educational Television, Hong Kong Educational Bureau'
_GEO_BYPASS = False
_GEO_COUNTRIES = ['HK']
_VALID_URL = r'https?://(?:www\.)?hkedcity\.net/etv/resource/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'https://www.hkedcity.net/etv/resource/2932360618',
'md5': 'f193712f5f7abb208ddef3c5ea6ed0b7',
'info_dict': {
'id': '2932360618',
'ext': 'mp4',
'title': '喜閱一生(共享閱讀樂) (中、英文字幕可供選擇)',
'description': 'md5:d5286d05219ef50e0613311cbe96e560',
'upload_date': '20181024',
'duration': 900,
'subtitles': 'count:2',
},
'skip': 'Geo restricted to HK',
}, {
'url': 'https://www.hkedcity.net/etv/resource/972641418',
'md5': '1ed494c1c6cf7866a8290edad9b07dc9',
'info_dict': {
'id': '972641418',
'ext': 'mp4',
'title': '衣冠楚楚 (天使系列之一)',
'description': 'md5:10bb3d659421e74f58e5db5691627b0f',
'upload_date': '20070109',
'duration': 907,
'subtitles': {},
},
'params': {
'geo_verification_proxy': '<HK proxy here>',
},
'skip': 'Geo restricted to HK',
}]
_CC_LANGS = {
'中文(繁體中文)': 'zh-Hant',
'中文(简体中文)': 'zh-Hans',
'English': 'en',
'Bahasa Indonesia': 'id',
'\u0939\u093f\u0928\u094d\u0926\u0940': 'hi',
'\u0928\u0947\u092a\u093e\u0932\u0940': 'ne',
'Tagalog': 'tl',
'\u0e44\u0e17\u0e22': 'th',
'\u0627\u0631\u062f\u0648': 'ur',
}
_FORMAT_HEIGHTS = {
'SD': 360,
'HD': 720,
}
_APPS_BASE_URL = 'https://apps.hkedcity.net'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = (
self._html_search_meta(
('ed_title', 'search.ed_title'), webpage, default=None) or
self._search_regex(
r'data-favorite_title_(?:eng|chi)=(["\'])(?P<id>(?:(?!\1).)+)\1',
webpage, 'title', default=None, group='url') or
self._html_search_regex(
r'<h1>([^<]+)</h1>', webpage, 'title', default=None) or
self._og_search_title(webpage)
)
file_id = self._search_regex(
r'post_var\[["\']file_id["\']\s*\]\s*=\s*(.+?);',
webpage, 'file ID')
curr_url = self._search_regex(
r'post_var\[["\']curr_url["\']\s*\]\s*=\s*"(.+?)";',
webpage, 'curr URL')
data = {
'action': 'get_info',
'curr_url': curr_url,
'file_id': file_id,
'video_url': file_id,
}
response = self._download_json(
self._APPS_BASE_URL + '/media/play/handler.php', video_id,
data=urlencode_postdata(data),
headers=merge_dicts({
'Content-Type': 'application/x-www-form-urlencoded'},
self.geo_verification_headers()))
result = response['result']
if not response.get('success') or not response.get('access'):
error = clean_html(response.get('access_err_msg'))
if 'Video streaming is not available in your country' in error:
self.raise_geo_restricted(
msg=error, countries=self._GEO_COUNTRIES)
else:
raise ExtractorError(error, expected=True)
formats = []
width = int_or_none(result.get('width'))
height = int_or_none(result.get('height'))
playlist0 = result['playlist'][0]
for fmt in playlist0['sources']:
file_url = urljoin(self._APPS_BASE_URL, fmt.get('file'))
if not file_url:
continue
# If we ever wanted to provide the final resolved URL that
# does not require cookies, albeit with a shorter lifespan:
# urlh = self._downloader.urlopen(file_url)
# resolved_url = urlh.geturl()
label = fmt.get('label')
h = self._FORMAT_HEIGHTS.get(label)
w = h * width // height if h and width and height else None
formats.append({
'format_id': label,
'ext': fmt.get('type'),
'url': file_url,
'width': w,
'height': h,
})
self._sort_formats(formats)
subtitles = {}
tracks = try_get(playlist0, lambda x: x['tracks'], list) or []
for track in tracks:
if not isinstance(track, dict):
continue
track_kind = str_or_none(track.get('kind'))
if not track_kind or not isinstance(track_kind, compat_str):
continue
if track_kind.lower() not in ('captions', 'subtitles'):
continue
track_url = urljoin(self._APPS_BASE_URL, track.get('file'))
if not track_url:
continue
track_label = track.get('label')
subtitles.setdefault(self._CC_LANGS.get(
track_label, track_label), []).append({
'url': self._proto_relative_url(track_url),
'ext': 'srt',
})
# Likes
emotion = self._download_json(
'https://emocounter.hkedcity.net/handler.php', video_id,
data=urlencode_postdata({
'action': 'get_emotion',
'data[bucket_id]': 'etv',
'data[identifier]': video_id,
}),
headers={'Content-Type': 'application/x-www-form-urlencoded'},
fatal=False) or {}
like_count = int_or_none(try_get(
emotion, lambda x: x['data']['emotion_data'][0]['count']))
return {
'id': video_id,
'title': title,
'description': self._html_search_meta(
'description', webpage, fatal=False),
'upload_date': unified_strdate(self._html_search_meta(
'ed_date', webpage, fatal=False), day_first=False),
'duration': int_or_none(result.get('length')),
'formats': formats,
'subtitles': subtitles,
'thumbnail': urljoin(self._APPS_BASE_URL, result.get('image')),
'view_count': parse_count(result.get('view_count')),
'like_count': like_count,
}

View File

@ -0,0 +1,117 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
urlencode_postdata,
)
class HungamaIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:www\.)?hungama\.com/
(?:
(?:video|movie)/[^/]+/|
tv-show/(?:[^/]+/){2}\d+/episode/[^/]+/
)
(?P<id>\d+)
'''
_TESTS = [{
'url': 'http://www.hungama.com/video/krishna-chants/39349649/',
'md5': 'a845a6d1ebd08d80c1035126d49bd6a0',
'info_dict': {
'id': '2931166',
'ext': 'mp4',
'title': 'Lucky Ali - Kitni Haseen Zindagi',
'track': 'Kitni Haseen Zindagi',
'artist': 'Lucky Ali',
'album': 'Aks',
'release_year': 2000,
}
}, {
'url': 'https://www.hungama.com/movie/kahaani-2/44129919/',
'only_matching': True,
}, {
'url': 'https://www.hungama.com/tv-show/padded-ki-pushup/season-1/44139461/episode/ep-02-training-sasu-pathlaag-karing/44139503/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
info = self._search_json_ld(webpage, video_id)
m3u8_url = self._download_json(
'https://www.hungama.com/index.php', video_id,
data=urlencode_postdata({'content_id': video_id}), headers={
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'X-Requested-With': 'XMLHttpRequest',
}, query={
'c': 'common',
'm': 'get_video_mdn_url',
})['stream_url']
formats = self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
self._sort_formats(formats)
info.update({
'id': video_id,
'formats': formats,
})
return info
class HungamaSongIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hungama\.com/song/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'https://www.hungama.com/song/kitni-haseen-zindagi/2931166/',
'md5': 'a845a6d1ebd08d80c1035126d49bd6a0',
'info_dict': {
'id': '2931166',
'ext': 'mp4',
'title': 'Lucky Ali - Kitni Haseen Zindagi',
'track': 'Kitni Haseen Zindagi',
'artist': 'Lucky Ali',
'album': 'Aks',
'release_year': 2000,
}
}
def _real_extract(self, url):
audio_id = self._match_id(url)
data = self._download_json(
'https://www.hungama.com/audio-player-data/track/%s' % audio_id,
audio_id, query={'_country': 'IN'})[0]
track = data['song_name']
artist = data.get('singer_name')
m3u8_url = self._download_json(
data.get('file') or data['preview_link'],
audio_id)['response']['media_url']
formats = self._extract_m3u8_formats(
m3u8_url, audio_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
self._sort_formats(formats)
title = '%s - %s' % (artist, track) if artist else track
thumbnail = data.get('img_src') or data.get('album_image')
return {
'id': audio_id,
'title': title,
'thumbnail': thumbnail,
'track': track,
'artist': artist,
'album': data.get('album_name'),
'release_year': int_or_none(data.get('date')),
'formats': formats,
}

View File

@ -227,44 +227,37 @@ class InstagramIE(InfoExtractor):
} }
class InstagramUserIE(InfoExtractor): class InstagramPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?instagram\.com/(?P<id>[^/]{2,})/?(?:$|[?#])' # A superclass for handling any kind of query based on GraphQL which
IE_DESC = 'Instagram user profile' # results in a playlist.
IE_NAME = 'instagram:user'
_TEST = {
'url': 'https://instagram.com/porsche',
'info_dict': {
'id': 'porsche',
'title': 'porsche',
},
'playlist_count': 5,
'params': {
'extract_flat': True,
'skip_download': True,
'playlistend': 5,
}
}
_gis_tmpl = None _gis_tmpl = None # used to cache GIS request type
def _entries(self, data): def _parse_graphql(self, webpage, item_id):
# Reads a webpage and returns its GraphQL data.
return self._parse_json(
self._search_regex(
r'sharedData\s*=\s*({.+?})\s*;\s*[<\n]', webpage, 'data'),
item_id)
def _extract_graphql(self, data, url):
# Parses GraphQL queries containing videos and generates a playlist.
def get_count(suffix): def get_count(suffix):
return int_or_none(try_get( return int_or_none(try_get(
node, lambda x: x['edge_media_' + suffix]['count'])) node, lambda x: x['edge_media_' + suffix]['count']))
uploader_id = data['entry_data']['ProfilePage'][0]['graphql']['user']['id'] uploader_id = self._match_id(url)
csrf_token = data['config']['csrf_token'] csrf_token = data['config']['csrf_token']
rhx_gis = data.get('rhx_gis') or '3c7ca9dcefcf966d11dacf1f151335e8' rhx_gis = data.get('rhx_gis') or '3c7ca9dcefcf966d11dacf1f151335e8'
self._set_cookie('instagram.com', 'ig_pr', '1')
cursor = '' cursor = ''
for page_num in itertools.count(1): for page_num in itertools.count(1):
variables = json.dumps({ variables = {
'id': uploader_id,
'first': 12, 'first': 12,
'after': cursor, 'after': cursor,
}) }
variables.update(self._query_vars_for(data))
variables = json.dumps(variables)
if self._gis_tmpl: if self._gis_tmpl:
gis_tmpls = [self._gis_tmpl] gis_tmpls = [self._gis_tmpl]
@ -276,21 +269,26 @@ class InstagramUserIE(InfoExtractor):
'%s:%s:%s' % (rhx_gis, csrf_token, std_headers['User-Agent']), '%s:%s:%s' % (rhx_gis, csrf_token, std_headers['User-Agent']),
] ]
# try all of the ways to generate a GIS query, and not only use the
# first one that works, but cache it for future requests
for gis_tmpl in gis_tmpls: for gis_tmpl in gis_tmpls:
try: try:
media = self._download_json( json_data = self._download_json(
'https://www.instagram.com/graphql/query/', uploader_id, 'https://www.instagram.com/graphql/query/', uploader_id,
'Downloading JSON page %d' % page_num, headers={ 'Downloading JSON page %d' % page_num, headers={
'X-Requested-With': 'XMLHttpRequest', 'X-Requested-With': 'XMLHttpRequest',
'X-Instagram-GIS': hashlib.md5( 'X-Instagram-GIS': hashlib.md5(
('%s:%s' % (gis_tmpl, variables)).encode('utf-8')).hexdigest(), ('%s:%s' % (gis_tmpl, variables)).encode('utf-8')).hexdigest(),
}, query={ }, query={
'query_hash': '42323d64886122307be10013ad2dcc44', 'query_hash': self._QUERY_HASH,
'variables': variables, 'variables': variables,
})['data']['user']['edge_owner_to_timeline_media'] })
media = self._parse_timeline_from(json_data)
self._gis_tmpl = gis_tmpl self._gis_tmpl = gis_tmpl
break break
except ExtractorError as e: except ExtractorError as e:
# if it's an error caused by a bad query, and there are
# more GIS templates to try, ignore it and keep trying
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403: if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
if gis_tmpl != gis_tmpls[-1]: if gis_tmpl != gis_tmpls[-1]:
continue continue
@ -348,14 +346,80 @@ class InstagramUserIE(InfoExtractor):
break break
def _real_extract(self, url): def _real_extract(self, url):
username = self._match_id(url) user_or_tag = self._match_id(url)
webpage = self._download_webpage(url, user_or_tag)
data = self._parse_graphql(webpage, user_or_tag)
webpage = self._download_webpage(url, username) self._set_cookie('instagram.com', 'ig_pr', '1')
data = self._parse_json(
self._search_regex(
r'sharedData\s*=\s*({.+?})\s*;\s*[<\n]', webpage, 'data'),
username)
return self.playlist_result( return self.playlist_result(
self._entries(data), username, username) self._extract_graphql(data, url), user_or_tag, user_or_tag)
class InstagramUserIE(InstagramPlaylistIE):
_VALID_URL = r'https?://(?:www\.)?instagram\.com/(?P<id>[^/]{2,})/?(?:$|[?#])'
IE_DESC = 'Instagram user profile'
IE_NAME = 'instagram:user'
_TEST = {
'url': 'https://instagram.com/porsche',
'info_dict': {
'id': 'porsche',
'title': 'porsche',
},
'playlist_count': 5,
'params': {
'extract_flat': True,
'skip_download': True,
'playlistend': 5,
}
}
_QUERY_HASH = '42323d64886122307be10013ad2dcc44',
@staticmethod
def _parse_timeline_from(data):
# extracts the media timeline data from a GraphQL result
return data['data']['user']['edge_owner_to_timeline_media']
@staticmethod
def _query_vars_for(data):
# returns a dictionary of variables to add to the timeline query based
# on the GraphQL of the original page
return {
'id': data['entry_data']['ProfilePage'][0]['graphql']['user']['id']
}
class InstagramTagIE(InstagramPlaylistIE):
_VALID_URL = r'https?://(?:www\.)?instagram\.com/explore/tags/(?P<id>[^/]+)'
IE_DESC = 'Instagram hashtag search'
IE_NAME = 'instagram:tag'
_TEST = {
'url': 'https://instagram.com/explore/tags/lolcats',
'info_dict': {
'id': 'lolcats',
'title': 'lolcats',
},
'playlist_count': 50,
'params': {
'extract_flat': True,
'skip_download': True,
'playlistend': 50,
}
}
_QUERY_HASH = 'f92f56d47dc7a55b606908374b43a314',
@staticmethod
def _parse_timeline_from(data):
# extracts the media timeline data from a GraphQL result
return data['data']['hashtag']['edge_hashtag_to_media']
@staticmethod
def _query_vars_for(data):
# returns a dictionary of variables to add to the timeline query based
# on the GraphQL of the original page
return {
'tag_name':
data['entry_data']['TagPage'][0]['graphql']['hashtag']['name']
}

View File

@ -7,8 +7,8 @@ from .common import InfoExtractor
class JWPlatformIE(InfoExtractor): class JWPlatformIE(InfoExtractor):
_VALID_URL = r'(?:https?://content\.jwplatform\.com/(?:feeds|players|jw6)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})' _VALID_URL = r'(?:https?://(?:content\.jwplatform|cdn\.jwplayer)\.com/(?:(?:feed|player|thumb|preview|video|manifest)s|jw6|v2/media)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'
_TEST = { _TESTS = [{
'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js', 'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js',
'md5': 'fa8899fa601eb7c83a64e9d568bdf325', 'md5': 'fa8899fa601eb7c83a64e9d568bdf325',
'info_dict': { 'info_dict': {
@ -19,7 +19,10 @@ class JWPlatformIE(InfoExtractor):
'upload_date': '20081127', 'upload_date': '20081127',
'timestamp': 1227796140, 'timestamp': 1227796140,
} }
} }, {
'url': 'https://cdn.jwplayer.com/players/nPripu9l-ALJ3XQCI.js',
'only_matching': True,
}]
@staticmethod @staticmethod
def _extract_url(webpage): def _extract_url(webpage):
@ -34,5 +37,5 @@ class JWPlatformIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
json_data = self._download_json('http://content.jwplatform.com/feeds/%s.json' % video_id, video_id) json_data = self._download_json('https://cdn.jwplayer.com/v2/media/' + video_id, video_id)
return self._parse_jwplayer_data(json_data, video_id) return self._parse_jwplayer_data(json_data, video_id)

View File

@ -1,15 +1,9 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from .adobepass import AdobePassIE
from .theplatform import ThePlatformIE
from ..utils import ( from ..utils import (
smuggle_url, smuggle_url,
url_basename, url_basename,
update_url_query,
get_element_by_class,
) )
@ -64,132 +58,3 @@ class NationalGeographicVideoIE(InfoExtractor):
{'force_smil_url': True}), {'force_smil_url': True}),
'id': guid, 'id': guid,
} }
class NationalGeographicIE(ThePlatformIE, AdobePassIE):
IE_NAME = 'natgeo'
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:(?:(?:wild/)?[^/]+/)?(?:videos|episodes)|u)/(?P<id>[^/?]+)'
_TESTS = [
{
'url': 'http://channel.nationalgeographic.com/u/kdi9Ld0PN2molUUIMSBGxoeDhD729KRjQcnxtetilWPMevo8ZwUBIDuPR0Q3D2LVaTsk0MPRkRWDB8ZhqWVeyoxfsZZm36yRp1j-zPfsHEyI_EgAeFY/',
'md5': '518c9aa655686cf81493af5cc21e2a04',
'info_dict': {
'id': 'vKInpacll2pC',
'ext': 'mp4',
'title': 'Uncovering a Universal Knowledge',
'description': 'md5:1a89148475bf931b3661fcd6ddb2ae3a',
'timestamp': 1458680907,
'upload_date': '20160322',
'uploader': 'NEWA-FNG-NGTV',
},
'add_ie': ['ThePlatform'],
},
{
'url': 'http://channel.nationalgeographic.com/u/kdvOstqYaBY-vSBPyYgAZRUL4sWUJ5XUUPEhc7ISyBHqoIO4_dzfY3K6EjHIC0hmFXoQ7Cpzm6RkET7S3oMlm6CFnrQwSUwo/',
'md5': 'c4912f656b4cbe58f3e000c489360989',
'info_dict': {
'id': 'Pok5lWCkiEFA',
'ext': 'mp4',
'title': 'The Stunning Red Bird of Paradise',
'description': 'md5:7bc8cd1da29686be4d17ad1230f0140c',
'timestamp': 1459362152,
'upload_date': '20160330',
'uploader': 'NEWA-FNG-NGTV',
},
'add_ie': ['ThePlatform'],
},
{
'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/episodes/the-power-of-miracles/',
'only_matching': True,
},
{
'url': 'http://channel.nationalgeographic.com/videos/treasures-rediscovered/',
'only_matching': True,
},
{
'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/videos/uncovering-a-universal-knowledge/',
'only_matching': True,
},
{
'url': 'http://channel.nationalgeographic.com/wild/destination-wild/videos/the-stunning-red-bird-of-paradise/',
'only_matching': True,
}
]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
release_url = self._search_regex(
r'video_auth_playlist_url\s*=\s*"([^"]+)"',
webpage, 'release url')
theplatform_path = self._search_regex(r'https?://link\.theplatform\.com/s/([^?]+)', release_url, 'theplatform path')
video_id = theplatform_path.split('/')[-1]
query = {
'mbr': 'true',
}
is_auth = self._search_regex(r'video_is_auth\s*=\s*"([^"]+)"', webpage, 'is auth', fatal=False)
if is_auth == 'auth':
auth_resource_id = self._search_regex(
r"video_auth_resourceId\s*=\s*'([^']+)'",
webpage, 'auth resource id')
query['auth'] = self._extract_mvpd_auth(url, video_id, 'natgeo', auth_resource_id)
formats = []
subtitles = {}
for key, value in (('switch', 'http'), ('manifest', 'm3u')):
tp_query = query.copy()
tp_query.update({
key: value,
})
tp_formats, tp_subtitles = self._extract_theplatform_smil(
update_url_query(release_url, tp_query), video_id, 'Downloading %s SMIL data' % value)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
info = self._extract_theplatform_metadata(theplatform_path, display_id)
info.update({
'id': video_id,
'formats': formats,
'subtitles': subtitles,
'display_id': display_id,
})
return info
class NationalGeographicEpisodeGuideIE(InfoExtractor):
IE_NAME = 'natgeo:episodeguide'
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?(?P<id>[^/]+)/episode-guide'
_TESTS = [
{
'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/episode-guide/',
'info_dict': {
'id': 'the-story-of-god-with-morgan-freeman-season-1',
'title': 'The Story of God with Morgan Freeman - Season 1',
},
'playlist_mincount': 6,
},
{
'url': 'http://channel.nationalgeographic.com/underworld-inc/episode-guide/?s=2',
'info_dict': {
'id': 'underworld-inc-season-2',
'title': 'Underworld, Inc. - Season 2',
},
'playlist_mincount': 7,
},
]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
show = get_element_by_class('show', webpage)
selected_season = self._search_regex(
r'<div[^>]+class="select-seasons[^"]*".*?<a[^>]*>(.*?)</a>',
webpage, 'selected season')
entries = [
self.url_result(self._proto_relative_url(entry_url), 'NationalGeographic')
for entry_url in re.findall('(?s)<div[^>]+class="col-inner"[^>]*?>.*?<a[^>]+href="([^"]+)"', webpage)]
return self.playlist_result(
entries, '%s-%s' % (display_id, selected_season.lower().replace(' ', '-')),
'%s - %s' % (show, selected_season))

View File

@ -5,8 +5,8 @@ from ..utils import ExtractorError
class NhkVodIE(InfoExtractor): class NhkVodIE(InfoExtractor):
_VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/en/vod/(?P<id>[^/]+/[^/?#&]+)' _VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/en/(?:vod|ondemand)/(?P<id>[^/]+/[^/?#&]+)'
_TEST = { _TESTS = [{
# Videos available only for a limited period of time. Visit # Videos available only for a limited period of time. Visit
# http://www3.nhk.or.jp/nhkworld/en/vod/ for working samples. # http://www3.nhk.or.jp/nhkworld/en/vod/ for working samples.
'url': 'http://www3.nhk.or.jp/nhkworld/en/vod/tokyofashion/20160815', 'url': 'http://www3.nhk.or.jp/nhkworld/en/vod/tokyofashion/20160815',
@ -19,7 +19,10 @@ class NhkVodIE(InfoExtractor):
'episode': 'The Kimono as Global Fashion', 'episode': 'The Kimono as Global Fashion',
}, },
'skip': 'Videos available only for a limited period of time', 'skip': 'Videos available only for a limited period of time',
} }, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/video/2015173/',
'only_matching': True,
}]
_API_URL = 'http://api.nhk.or.jp/nhkworld/vodesdlist/v1/all/all/all.json?apikey=EJfK8jdS57GqlupFgAfAAwr573q01y6k' _API_URL = 'http://api.nhk.or.jp/nhkworld/vodesdlist/v1/all/all/all.json?apikey=EJfK8jdS57GqlupFgAfAAwr573q01y6k'
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -115,6 +115,10 @@ class OdnoklassnikiIE(InfoExtractor):
}, { }, {
'url': 'https://m.ok.ru/dk?st.cmd=movieLayer&st.discId=863789452017&st.retLoc=friend&st.rtu=%2Fdk%3Fst.cmd%3DfriendMovies%26st.mode%3Down%26st.mrkId%3D%257B%2522uploadedMovieMarker%2522%253A%257B%2522marker%2522%253A%25221519410114503%2522%252C%2522hasMore%2522%253Atrue%257D%252C%2522sharedMovieMarker%2522%253A%257B%2522marker%2522%253Anull%252C%2522hasMore%2522%253Afalse%257D%257D%26st.friendId%3D561722190321%26st.frwd%3Don%26_prevCmd%3DfriendMovies%26tkn%3D7257&st.discType=MOVIE&st.mvId=863789452017&_prevCmd=friendMovies&tkn=3648#lst#', 'url': 'https://m.ok.ru/dk?st.cmd=movieLayer&st.discId=863789452017&st.retLoc=friend&st.rtu=%2Fdk%3Fst.cmd%3DfriendMovies%26st.mode%3Down%26st.mrkId%3D%257B%2522uploadedMovieMarker%2522%253A%257B%2522marker%2522%253A%25221519410114503%2522%252C%2522hasMore%2522%253Atrue%257D%252C%2522sharedMovieMarker%2522%253A%257B%2522marker%2522%253Anull%252C%2522hasMore%2522%253Afalse%257D%257D%26st.friendId%3D561722190321%26st.frwd%3Don%26_prevCmd%3DfriendMovies%26tkn%3D7257&st.discType=MOVIE&st.mvId=863789452017&_prevCmd=friendMovies&tkn=3648#lst#',
'only_matching': True, 'only_matching': True,
}, {
# Paid video
'url': 'https://ok.ru/video/954886983203',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -244,6 +248,11 @@ class OdnoklassnikiIE(InfoExtractor):
'ext': 'flv', 'ext': 'flv',
}) })
if not formats:
payment_info = metadata.get('paymentInfo')
if payment_info:
raise ExtractorError('This video is paid, subscribe to download it', expected=True)
self._sort_formats(formats) self._sort_formats(formats)
info['formats'] = formats info['formats'] = formats

View File

@ -249,7 +249,7 @@ class OpenloadIE(InfoExtractor):
(?:www\.)? (?:www\.)?
(?: (?:
openload\.(?:co|io|link)| openload\.(?:co|io|link)|
oload\.(?:tv|stream|site|xyz|win|download|cloud|cc|icu|fun) oload\.(?:tv|stream|site|xyz|win|download|cloud|cc|icu|fun|club)
) )
)/ )/
(?:f|embed)/ (?:f|embed)/
@ -334,6 +334,9 @@ class OpenloadIE(InfoExtractor):
}, { }, {
'url': 'https://oload.fun/f/gb6G1H4sHXY', 'url': 'https://oload.fun/f/gb6G1H4sHXY',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://oload.club/f/Nr1L-aZ2dbQ',
'only_matching': True,
}] }]
_USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36' _USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'

View File

@ -0,0 +1,28 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class OutsideTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?outsidetv\.com/(?:[^/]+/)*?play/[a-zA-Z0-9]{8}/\d+/\d+/(?P<id>[a-zA-Z0-9]{8})'
_TESTS = [{
'url': 'http://www.outsidetv.com/category/snow/play/ZjQYboH6/1/10/Hdg0jukV/4',
'md5': '192d968fedc10b2f70ec31865ffba0da',
'info_dict': {
'id': 'Hdg0jukV',
'ext': 'mp4',
'title': 'Home - Jackson Ep 1 | Arbor Snowboards',
'description': 'md5:41a12e94f3db3ca253b04bb1e8d8f4cd',
'upload_date': '20181225',
'timestamp': 1545742800,
}
}, {
'url': 'http://www.outsidetv.com/home/play/ZjQYboH6/1/10/Hdg0jukV/4',
'only_matching': True,
}]
def _real_extract(self, url):
jw_media_id = self._match_id(url)
return self.url_result(
'jwplatform:' + jw_media_id, 'JWPlatform', jw_media_id)

View File

@ -0,0 +1,109 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
clean_html,
ExtractorError,
int_or_none,
PUTRequest,
)
class PlayPlusTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?playplus\.(?:com|tv)/VOD/(?P<project_id>[0-9]+)/(?P<id>[0-9a-f]{32})'
_TEST = {
'url': 'https://www.playplus.tv/VOD/7572/db8d274a5163424e967f35a30ddafb8e',
'md5': 'd078cb89d7ab6b9df37ce23c647aef72',
'info_dict': {
'id': 'db8d274a5163424e967f35a30ddafb8e',
'ext': 'mp4',
'title': 'Capítulo 179 - Final',
'description': 'md5:01085d62d8033a1e34121d3c3cabc838',
'timestamp': 1529992740,
'upload_date': '20180626',
},
'skip': 'Requires account credential',
}
_NETRC_MACHINE = 'playplustv'
_GEO_COUNTRIES = ['BR']
_token = None
_profile_id = None
def _call_api(self, resource, video_id=None, query=None):
return self._download_json('https://api.playplus.tv/api/media/v2/get' + resource, video_id, headers={
'Authorization': 'Bearer ' + self._token,
}, query=query)
def _real_initialize(self):
email, password = self._get_login_info()
if email is None:
self.raise_login_required()
req = PUTRequest(
'https://api.playplus.tv/api/web/login', json.dumps({
'email': email,
'password': password,
}).encode(), {
'Content-Type': 'application/json; charset=utf-8',
})
try:
self._token = self._download_json(req, None)['token']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
raise ExtractorError(self._parse_json(
e.cause.read(), None)['errorMessage'], expected=True)
raise
self._profile = self._call_api('Profiles')['list'][0]['_id']
def _real_extract(self, url):
project_id, media_id = re.match(self._VALID_URL, url).groups()
media = self._call_api(
'Media', media_id, {
'profileId': self._profile,
'projectId': project_id,
'mediaId': media_id,
})['obj']
title = media['title']
formats = []
for f in media.get('files', []):
f_url = f.get('url')
if not f_url:
continue
file_info = f.get('fileInfo') or {}
formats.append({
'url': f_url,
'width': int_or_none(file_info.get('width')),
'height': int_or_none(file_info.get('height')),
})
self._sort_formats(formats)
thumbnails = []
for thumb in media.get('thumbs', []):
thumb_url = thumb.get('url')
if not thumb_url:
continue
thumbnails.append({
'url': thumb_url,
'width': int_or_none(thumb.get('width')),
'height': int_or_none(thumb.get('height')),
})
return {
'id': media_id,
'title': title,
'formats': formats,
'thumbnails': thumbnails,
'description': clean_html(media.get('description')) or media.get('shortDescription'),
'timestamp': int_or_none(media.get('publishDate'), 1000),
'view_count': int_or_none(media.get('numberOfViews')),
'comment_count': int_or_none(media.get('numberOfComments')),
'tags': media.get('tags'),
}

View File

@ -10,7 +10,9 @@ from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_HTTPError, compat_HTTPError,
compat_str, compat_str,
compat_urllib_request,
) )
from .openload import PhantomJSwrapper
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
@ -22,7 +24,29 @@ from ..utils import (
) )
class PornHubIE(InfoExtractor): class PornHubBaseIE(InfoExtractor):
def _download_webpage_handle(self, *args, **kwargs):
def dl(*args, **kwargs):
return super(PornHubBaseIE, self)._download_webpage_handle(*args, **kwargs)
webpage, urlh = dl(*args, **kwargs)
if any(re.search(p, webpage) for p in (
r'<body\b[^>]+\bonload=["\']go\(\)',
r'document\.cookie\s*=\s*["\']RNKEY=',
r'document\.location\.reload\(true\)')):
url_or_request = args[0]
url = (url_or_request.get_full_url()
if isinstance(url_or_request, compat_urllib_request.Request)
else url_or_request)
phantom = PhantomJSwrapper(self, required_version='2.0')
phantom.get(url, html=webpage)
webpage, urlh = dl(*args, **kwargs)
return webpage, urlh
class PornHubIE(PornHubBaseIE):
IE_DESC = 'PornHub and Thumbzilla' IE_DESC = 'PornHub and Thumbzilla'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
@ -307,7 +331,7 @@ class PornHubIE(InfoExtractor):
} }
class PornHubPlaylistBaseIE(InfoExtractor): class PornHubPlaylistBaseIE(PornHubBaseIE):
def _extract_entries(self, webpage, host): def _extract_entries(self, webpage, host):
# Only process container div with main playlist content skipping # Only process container div with main playlist content skipping
# drop-down menu that uses similar pattern for videos (see # drop-down menu that uses similar pattern for videos (see

View File

@ -49,6 +49,16 @@ class RadioCanadaIE(InfoExtractor):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
},
{
# with protectionType but not actually DRM protected
'url': 'radiocanada:toutv:140872',
'info_dict': {
'id': '140872',
'title': 'Épisode 1',
'series': 'District 31',
},
'only_matching': True,
} }
] ]
@ -67,8 +77,10 @@ class RadioCanadaIE(InfoExtractor):
el = find_xpath_attr(metadata, './/Meta', 'name', name) el = find_xpath_attr(metadata, './/Meta', 'name', name)
return el.text if el is not None else None return el.text if el is not None else None
# protectionType does not necessarily mean the video is DRM protected (see
# https://github.com/rg3/youtube-dl/pull/18609).
if get_meta('protectionType'): if get_meta('protectionType'):
raise ExtractorError('This video is DRM protected.', expected=True) self.report_warning('This video is probably DRM protected.')
device_types = ['ipad'] device_types = ['ipad']
if not smuggled_data: if not smuggled_data:

View File

@ -26,7 +26,7 @@ class SkylineWebcamsIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
stream_url = self._search_regex( stream_url = self._search_regex(
r'url\s*:\s*(["\'])(?P<url>(?:https?:)?//.+?\.m3u8.*?)\1', webpage, r'(?:url|source)\s*:\s*(["\'])(?P<url>(?:https?:)?//.+?\.m3u8.*?)\1', webpage,
'stream url', group='url') 'stream url', group='url')
title = self._og_search_title(webpage) title = self._og_search_title(webpage)

View File

@ -14,7 +14,7 @@ from ..utils import (
class StreamangoIE(InfoExtractor): class StreamangoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?streamango\.com/(?:f|embed)/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?(?:streamango\.com|fruithosts\.net)/(?:f|embed)/(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://streamango.com/f/clapasobsptpkdfe/20170315_150006_mp4', 'url': 'https://streamango.com/f/clapasobsptpkdfe/20170315_150006_mp4',
'md5': 'e992787515a182f55e38fc97588d802a', 'md5': 'e992787515a182f55e38fc97588d802a',
@ -38,6 +38,9 @@ class StreamangoIE(InfoExtractor):
}, { }, {
'url': 'https://streamango.com/embed/clapasobsptpkdfe/20170315_150006_mp4', 'url': 'https://streamango.com/embed/clapasobsptpkdfe/20170315_150006_mp4',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://fruithosts.net/f/mreodparcdcmspsm/w1f1_r4lph_2018_brrs_720p_latino_mp4',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -265,6 +265,8 @@ class TEDIE(InfoExtractor):
'format_id': m3u8_format['format_id'].replace('hls', 'http'), 'format_id': m3u8_format['format_id'].replace('hls', 'http'),
'protocol': 'http', 'protocol': 'http',
}) })
if f.get('acodec') == 'none':
del f['acodec']
formats.append(f) formats.append(f)
audio_download = talk_info.get('audioDownload') audio_download = talk_info.get('audioDownload')

View File

@ -96,7 +96,7 @@ class TNAFlixNetworkBaseIE(InfoExtractor):
cfg_xml = self._download_xml( cfg_xml = self._download_xml(
cfg_url, display_id, 'Downloading metadata', cfg_url, display_id, 'Downloading metadata',
transform_source=fix_xml_ampersands) transform_source=fix_xml_ampersands, headers={'Referer': url})
formats = [] formats = []

View File

@ -10,8 +10,9 @@ from ..utils import (
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
parse_duration, parse_duration,
try_get, str_or_none,
update_url_query, update_url_query,
urljoin,
) )
@ -24,8 +25,7 @@ class TVNowBaseIE(InfoExtractor):
def _call_api(self, path, video_id, query): def _call_api(self, path, video_id, query):
return self._download_json( return self._download_json(
'https://api.tvnow.de/v3/' + path, 'https://api.tvnow.de/v3/' + path, video_id, query=query)
video_id, query=query)
def _extract_video(self, info, display_id): def _extract_video(self, info, display_id):
video_id = compat_str(info['id']) video_id = compat_str(info['id'])
@ -108,6 +108,11 @@ class TVNowIE(TVNowBaseIE):
(?!(?:list|jahr)(?:/|$))(?P<id>[^/?\#&]+) (?!(?:list|jahr)(?:/|$))(?P<id>[^/?\#&]+)
''' '''
@classmethod
def suitable(cls, url):
return (False if TVNowNewIE.suitable(url) or TVNowSeasonIE.suitable(url) or TVNowAnnualIE.suitable(url) or TVNowShowIE.suitable(url)
else super(TVNowIE, cls).suitable(url))
_TESTS = [{ _TESTS = [{
'url': 'https://www.tvnow.de/rtl2/grip-das-motormagazin/der-neue-porsche-911-gt-3/player', 'url': 'https://www.tvnow.de/rtl2/grip-das-motormagazin/der-neue-porsche-911-gt-3/player',
'info_dict': { 'info_dict': {
@ -116,7 +121,6 @@ class TVNowIE(TVNowBaseIE):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Der neue Porsche 911 GT 3', 'title': 'Der neue Porsche 911 GT 3',
'description': 'md5:6143220c661f9b0aae73b245e5d898bb', 'description': 'md5:6143220c661f9b0aae73b245e5d898bb',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1495994400, 'timestamp': 1495994400,
'upload_date': '20170528', 'upload_date': '20170528',
'duration': 5283, 'duration': 5283,
@ -161,136 +165,314 @@ class TVNowIE(TVNowBaseIE):
info = self._call_api( info = self._call_api(
'movies/' + display_id, display_id, query={ 'movies/' + display_id, display_id, query={
'fields': ','.join(self._VIDEO_FIELDS), 'fields': ','.join(self._VIDEO_FIELDS),
'station': mobj.group(1),
}) })
return self._extract_video(info, display_id) return self._extract_video(info, display_id)
class TVNowListBaseIE(TVNowBaseIE): class TVNowNewIE(InfoExtractor):
_SHOW_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
(?P<base_url> (?P<base_url>https?://
https?:// (?:www\.)?tvnow\.(?:de|at|ch)/
(?:www\.)?tvnow\.(?:de|at|ch)/[^/]+/ (?:shows|serien))/
(?P<show_id>[^/]+) (?P<show>[^/]+)-\d+/
) [^/]+/
episode-\d+-(?P<episode>[^/?$&]+)-(?P<id>\d+)
''' '''
def _extract_list_info(self, display_id, show_id):
fields = list(self._SHOW_FIELDS)
fields.extend('formatTabs.%s' % field for field in self._SEASON_FIELDS)
fields.extend(
'formatTabs.formatTabPages.container.movies.%s' % field
for field in self._VIDEO_FIELDS)
return self._call_api(
'formats/seo', display_id, query={
'fields': ','.join(fields),
'name': show_id + '.php'
})
class TVNowListIE(TVNowListBaseIE):
_VALID_URL = r'%s/(?:list|jahr)/(?P<id>[^?\#&]+)' % TVNowListBaseIE._SHOW_VALID_URL
_SHOW_FIELDS = ('title', )
_SEASON_FIELDS = ('id', 'headline', 'seoheadline', )
_VIDEO_FIELDS = ('id', 'headline', 'seoUrl', )
_TESTS = [{ _TESTS = [{
'url': 'https://www.tvnow.de/rtl/30-minuten-deutschland/list/aktuell', 'url': 'https://www.tvnow.de/shows/grip-das-motormagazin-1669/2017-05/episode-405-der-neue-porsche-911-gt-3-331082',
'info_dict': {
'id': '28296',
'title': '30 Minuten Deutschland - Aktuell',
},
'playlist_mincount': 1,
}, {
'url': 'https://www.tvnow.de/vox/ab-ins-beet/list/staffel-14',
'only_matching': True,
}, {
'url': 'https://www.tvnow.de/rtl2/grip-das-motormagazin/jahr/2018/3',
'only_matching': True, 'only_matching': True,
}] }]
@classmethod def _real_extract(self, url):
def suitable(cls, url): mobj = re.match(self._VALID_URL, url)
return (False if TVNowIE.suitable(url) base_url = re.sub(r'(?:shows|serien)', '_', mobj.group('base_url'))
else super(TVNowListIE, cls).suitable(url)) show, episode = mobj.group('show', 'episode')
return self.url_result(
# Rewrite new URLs to the old format and use extraction via old API
# at api.tvnow.de as a loophole for bypassing premium content checks
'%s/%s/%s' % (base_url, show, episode),
ie=TVNowIE.ie_key(), video_id=mobj.group('id'))
class TVNowNewBaseIE(InfoExtractor):
def _call_api(self, path, video_id, query={}):
result = self._download_json(
'https://apigw.tvnow.de/module/' + path, video_id, query=query)
error = result.get('error')
if error:
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error), expected=True)
return result
"""
TODO: new apigw.tvnow.de based version of TVNowIE. Replace old TVNowIE with it
when api.tvnow.de is shut down. This version can't bypass premium checks though.
class TVNowIE(TVNowNewBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:www\.)?tvnow\.(?:de|at|ch)/
(?:shows|serien)/[^/]+/
(?:[^/]+/)+
(?P<display_id>[^/?$&]+)-(?P<id>\d+)
'''
_TESTS = [{
# episode with annual navigation
'url': 'https://www.tvnow.de/shows/grip-das-motormagazin-1669/2017-05/episode-405-der-neue-porsche-911-gt-3-331082',
'info_dict': {
'id': '331082',
'display_id': 'grip-das-motormagazin/der-neue-porsche-911-gt-3',
'ext': 'mp4',
'title': 'Der neue Porsche 911 GT 3',
'description': 'md5:6143220c661f9b0aae73b245e5d898bb',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1495994400,
'upload_date': '20170528',
'duration': 5283,
'series': 'GRIP - Das Motormagazin',
'season_number': 14,
'episode_number': 405,
'episode': 'Der neue Porsche 911 GT 3',
},
}, {
# rtl2, episode with season navigation
'url': 'https://www.tvnow.de/shows/armes-deutschland-11471/staffel-3/episode-14-bernd-steht-seit-der-trennung-von-seiner-frau-allein-da-526124',
'only_matching': True,
}, {
# rtlnitro
'url': 'https://www.tvnow.de/serien/alarm-fuer-cobra-11-die-autobahnpolizei-1815/staffel-13/episode-5-auf-eigene-faust-pilot-366822',
'only_matching': True,
}, {
# superrtl
'url': 'https://www.tvnow.de/shows/die-lustigsten-schlamassel-der-welt-1221/staffel-2/episode-14-u-a-ketchup-effekt-364120',
'only_matching': True,
}, {
# ntv
'url': 'https://www.tvnow.de/shows/startup-news-10674/staffel-2/episode-39-goetter-in-weiss-387630',
'only_matching': True,
}, {
# vox
'url': 'https://www.tvnow.de/shows/auto-mobil-174/2017-11/episode-46-neues-vom-automobilmarkt-2017-11-19-17-00-00-380072',
'only_matching': True,
}, {
'url': 'https://www.tvnow.de/shows/grip-das-motormagazin-1669/2017-05/episode-405-der-neue-porsche-911-gt-3-331082',
'only_matching': True,
}]
def _extract_video(self, info, url, display_id):
config = info['config']
source = config['source']
video_id = compat_str(info.get('id') or source['videoId'])
title = source['title'].strip()
paths = []
for manifest_url in (info.get('manifest') or {}).values():
if not manifest_url:
continue
manifest_url = update_url_query(manifest_url, {'filter': ''})
path = self._search_regex(r'https?://[^/]+/(.+?)\.ism/', manifest_url, 'path')
if path in paths:
continue
paths.append(path)
def url_repl(proto, suffix):
return re.sub(
r'(?:hls|dash|hss)([.-])', proto + r'\1', re.sub(
r'\.ism/(?:[^.]*\.(?:m3u8|mpd)|[Mm]anifest)',
'.ism/' + suffix, manifest_url))
formats = self._extract_mpd_formats(
url_repl('dash', '.mpd'), video_id,
mpd_id='dash', fatal=False)
formats.extend(self._extract_ism_formats(
url_repl('hss', 'Manifest'),
video_id, ism_id='mss', fatal=False))
formats.extend(self._extract_m3u8_formats(
url_repl('hls', '.m3u8'), video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False))
if formats:
break
else:
if try_get(info, lambda x: x['rights']['isDrm']):
raise ExtractorError(
'Video %s is DRM protected' % video_id, expected=True)
if try_get(config, lambda x: x['boards']['geoBlocking']['block']):
raise self.raise_geo_restricted()
if not info.get('free', True):
raise ExtractorError(
'Video %s is not available for free' % video_id, expected=True)
self._sort_formats(formats)
description = source.get('description')
thumbnail = url_or_none(source.get('poster'))
timestamp = unified_timestamp(source.get('previewStart'))
duration = parse_duration(source.get('length'))
series = source.get('format')
season_number = int_or_none(self._search_regex(
r'staffel-(\d+)', url, 'season number', default=None))
episode_number = int_or_none(self._search_regex(
r'episode-(\d+)', url, 'episode number', default=None))
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
'series': series,
'season_number': season_number,
'episode_number': episode_number,
'episode': title,
'formats': formats,
}
def _real_extract(self, url): def _real_extract(self, url):
base_url, show_id, season_id = re.match(self._VALID_URL, url).groups() display_id, video_id = re.match(self._VALID_URL, url).groups()
info = self._call_api('player/' + video_id, video_id)
return self._extract_video(info, video_id, display_id)
"""
list_info = self._extract_list_info(season_id, show_id)
season = next( class TVNowListBaseIE(TVNowNewBaseIE):
season for season in list_info['formatTabs']['items'] _SHOW_VALID_URL = r'''(?x)
if season.get('seoheadline') == season_id) (?P<base_url>
https?://
(?:www\.)?tvnow\.(?:de|at|ch)/(?:shows|serien)/
[^/?#&]+-(?P<show_id>\d+)
)
'''
title = list_info.get('title') @classmethod
headline = season.get('headline') def suitable(cls, url):
if title and headline: return (False if TVNowNewIE.suitable(url)
title = '%s - %s' % (title, headline) else super(TVNowListBaseIE, cls).suitable(url))
else:
title = headline or title def _extract_items(self, url, show_id, list_id, query):
items = self._call_api(
'teaserrow/format/episode/' + show_id, list_id,
query=query)['items']
entries = [] entries = []
for container in season['formatTabPages']['items']: for item in items:
items = try_get( if not isinstance(item, dict):
container, lambda x: x['container']['movies']['items'], continue
list) or [] item_url = urljoin(url, item.get('url'))
for info in items: if not item_url:
seo_url = info.get('seoUrl') continue
if not seo_url: video_id = str_or_none(item.get('id') or item.get('videoId'))
continue item_title = item.get('subheadline') or item.get('text')
video_id = info.get('id') entries.append(self.url_result(
entries.append(self.url_result( item_url, ie=TVNowNewIE.ie_key(), video_id=video_id,
'%s/%s/player' % (base_url, seo_url), TVNowIE.ie_key(), video_title=item_title))
compat_str(video_id) if video_id else None))
return self.playlist_result( return self.playlist_result(entries, '%s/%s' % (show_id, list_id))
entries, compat_str(season.get('id') or season_id), title)
class TVNowSeasonIE(TVNowListBaseIE):
_VALID_URL = r'%s/staffel-(?P<id>\d+)' % TVNowListBaseIE._SHOW_VALID_URL
_TESTS = [{
'url': 'https://www.tvnow.de/serien/alarm-fuer-cobra-11-die-autobahnpolizei-1815/staffel-13',
'info_dict': {
'id': '1815/13',
},
'playlist_mincount': 22,
}]
def _real_extract(self, url):
_, show_id, season_id = re.match(self._VALID_URL, url).groups()
return self._extract_items(
url, show_id, season_id, {'season': season_id})
class TVNowAnnualIE(TVNowListBaseIE):
_VALID_URL = r'%s/(?P<year>\d{4})-(?P<month>\d{2})' % TVNowListBaseIE._SHOW_VALID_URL
_TESTS = [{
'url': 'https://www.tvnow.de/shows/grip-das-motormagazin-1669/2017-05',
'info_dict': {
'id': '1669/2017-05',
},
'playlist_mincount': 2,
}]
def _real_extract(self, url):
_, show_id, year, month = re.match(self._VALID_URL, url).groups()
return self._extract_items(
url, show_id, '%s-%s' % (year, month), {
'year': int(year),
'month': int(month),
})
class TVNowShowIE(TVNowListBaseIE): class TVNowShowIE(TVNowListBaseIE):
_VALID_URL = TVNowListBaseIE._SHOW_VALID_URL _VALID_URL = TVNowListBaseIE._SHOW_VALID_URL
_SHOW_FIELDS = ('id', 'title', )
_SEASON_FIELDS = ('id', 'headline', 'seoheadline', )
_VIDEO_FIELDS = ()
_TESTS = [{ _TESTS = [{
'url': 'https://www.tvnow.at/vox/ab-ins-beet', # annual navigationType
'url': 'https://www.tvnow.de/shows/grip-das-motormagazin-1669',
'info_dict': { 'info_dict': {
'id': 'ab-ins-beet', 'id': '1669',
'title': 'Ab ins Beet!',
}, },
'playlist_mincount': 7, 'playlist_mincount': 73,
}, { }, {
'url': 'https://www.tvnow.at/vox/ab-ins-beet/list', # season navigationType
'only_matching': True, 'url': 'https://www.tvnow.de/shows/armes-deutschland-11471',
}, { 'info_dict': {
'url': 'https://www.tvnow.de/rtl2/grip-das-motormagazin/jahr/', 'id': '11471',
'only_matching': True, },
'playlist_mincount': 3,
}] }]
@classmethod @classmethod
def suitable(cls, url): def suitable(cls, url):
return (False if TVNowIE.suitable(url) or TVNowListIE.suitable(url) return (False if TVNowNewIE.suitable(url) or TVNowSeasonIE.suitable(url) or TVNowAnnualIE.suitable(url)
else super(TVNowShowIE, cls).suitable(url)) else super(TVNowShowIE, cls).suitable(url))
def _real_extract(self, url): def _real_extract(self, url):
base_url, show_id = re.match(self._VALID_URL, url).groups() base_url, show_id = re.match(self._VALID_URL, url).groups()
list_info = self._extract_list_info(show_id, show_id) result = self._call_api(
'teaserrow/format/navigation/' + show_id, show_id)
items = result['items']
entries = [] entries = []
for season_info in list_info['formatTabs']['items']: navigation = result.get('navigationType')
season_url = season_info.get('seoheadline') if navigation == 'annual':
if not season_url: for item in items:
continue if not isinstance(item, dict):
season_id = season_info.get('id') continue
entries.append(self.url_result( year = int_or_none(item.get('year'))
'%s/list/%s' % (base_url, season_url), TVNowListIE.ie_key(), if year is None:
compat_str(season_id) if season_id else None, continue
season_info.get('headline'))) months = item.get('months')
if not isinstance(months, list):
continue
for month_dict in months:
if not isinstance(month_dict, dict) or not month_dict:
continue
month_number = int_or_none(list(month_dict.keys())[0])
if month_number is None:
continue
entries.append(self.url_result(
'%s/%04d-%02d' % (base_url, year, month_number),
ie=TVNowAnnualIE.ie_key()))
elif navigation == 'season':
for item in items:
if not isinstance(item, dict):
continue
season_number = int_or_none(item.get('season'))
if season_number is None:
continue
entries.append(self.url_result(
'%s/staffel-%d' % (base_url, season_number),
ie=TVNowSeasonIE.ie_key()))
else:
raise ExtractorError('Unknown navigationType')
return self.playlist_result(entries, show_id, list_info.get('title')) return self.playlist_result(entries, show_id)

View File

@ -3,21 +3,23 @@ from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError,
get_element_by_attribute, get_element_by_attribute,
parse_duration, parse_duration,
try_get,
update_url_query, update_url_query,
ExtractorError,
) )
from ..compat import compat_str from ..compat import compat_str
class USATodayIE(InfoExtractor): class USATodayIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?usatoday\.com/(?:[^/]+/)*(?P<id>[^?/#]+)' _VALID_URL = r'https?://(?:www\.)?usatoday\.com/(?:[^/]+/)*(?P<id>[^?/#]+)'
_TEST = { _TESTS = [{
# Brightcove Partner ID = 29906170001
'url': 'http://www.usatoday.com/media/cinematic/video/81729424/us-france-warn-syrian-regime-ahead-of-new-peace-talks/', 'url': 'http://www.usatoday.com/media/cinematic/video/81729424/us-france-warn-syrian-regime-ahead-of-new-peace-talks/',
'md5': '4d40974481fa3475f8bccfd20c5361f8', 'md5': '033587d2529dc3411a1ab3644c3b8827',
'info_dict': { 'info_dict': {
'id': '81729424', 'id': '4799374959001',
'ext': 'mp4', 'ext': 'mp4',
'title': 'US, France warn Syrian regime ahead of new peace talks', 'title': 'US, France warn Syrian regime ahead of new peace talks',
'timestamp': 1457891045, 'timestamp': 1457891045,
@ -25,8 +27,20 @@ class USATodayIE(InfoExtractor):
'uploader_id': '29906170001', 'uploader_id': '29906170001',
'upload_date': '20160313', 'upload_date': '20160313',
} }
} }, {
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/29906170001/38a9eecc-bdd8-42a3-ba14-95397e48b3f8_default/index.html?videoId=%s' # ui-video-data[asset_metadata][items][brightcoveaccount] = 28911775001
'url': 'https://www.usatoday.com/story/tech/science/2018/08/21/yellowstone-supervolcano-eruption-stop-worrying-its-blow/973633002/',
'info_dict': {
'id': '5824495846001',
'ext': 'mp4',
'title': 'Yellowstone more likely to crack rather than explode',
'timestamp': 1534790612,
'description': 'md5:3715e7927639a4f16b474e9391687c62',
'uploader_id': '28911775001',
'upload_date': '20180820',
}
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s'
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
@ -35,10 +49,11 @@ class USATodayIE(InfoExtractor):
if not ui_video_data: if not ui_video_data:
raise ExtractorError('no video on the webpage', expected=True) raise ExtractorError('no video on the webpage', expected=True)
video_data = self._parse_json(ui_video_data, display_id) video_data = self._parse_json(ui_video_data, display_id)
item = try_get(video_data, lambda x: x['asset_metadata']['items'], dict) or {}
return { return {
'_type': 'url_transparent', '_type': 'url_transparent',
'url': self.BRIGHTCOVE_URL_TEMPLATE % video_data['brightcove_id'], 'url': self.BRIGHTCOVE_URL_TEMPLATE % (item.get('brightcoveaccount', '29906170001'), item.get('brightcoveid') or video_data['brightcove_id']),
'id': compat_str(video_data['id']), 'id': compat_str(video_data['id']),
'title': video_data['title'], 'title': video_data['title'],
'thumbnail': video_data.get('thumbnail'), 'thumbnail': video_data.get('thumbnail'),

View File

@ -94,7 +94,6 @@ class ViceIE(AdobePassIE):
'url': 'https://www.viceland.com/en_us/video/thursday-march-1-2018/5a8f2d7ff1cdb332dd446ec1', 'url': 'https://www.viceland.com/en_us/video/thursday-march-1-2018/5a8f2d7ff1cdb332dd446ec1',
'only_matching': True, 'only_matching': True,
}] }]
_PREPLAY_HOST = 'vms.vice'
@staticmethod @staticmethod
def _extract_urls(webpage): def _extract_urls(webpage):
@ -158,9 +157,8 @@ class ViceIE(AdobePassIE):
}) })
try: try:
host = 'www.viceland' if is_locked else self._PREPLAY_HOST
preplay = self._download_json( preplay = self._download_json(
'https://%s.com/%s/video/preplay/%s' % (host, locale, video_id), 'https://vms.vice.com/%s/video/preplay/%s' % (locale, video_id),
video_id, query=query) video_id, query=query)
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 401): if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 401):

View File

@ -4,8 +4,14 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
orderedSet,
parse_duration,
str_or_none,
unified_strdate,
url_or_none,
xpath_element, xpath_element,
xpath_text, xpath_text,
) )
@ -13,7 +19,19 @@ from ..utils import (
class VideomoreIE(InfoExtractor): class VideomoreIE(InfoExtractor):
IE_NAME = 'videomore' IE_NAME = 'videomore'
_VALID_URL = r'videomore:(?P<sid>\d+)$|https?://videomore\.ru/(?:(?:embed|[^/]+/[^/]+)/|[^/]+\?.*\btrack_id=)(?P<id>\d+)(?:[/?#&]|\.(?:xml|json)|$)' _VALID_URL = r'''(?x)
videomore:(?P<sid>\d+)$|
https?://(?:player\.)?videomore\.ru/
(?:
(?:
embed|
[^/]+/[^/]+
)/|
[^/]*\?.*?\btrack_id=
)
(?P<id>\d+)
(?:[/?#&]|\.(?:xml|json)|$)
'''
_TESTS = [{ _TESTS = [{
'url': 'http://videomore.ru/kino_v_detalayah/5_sezon/367617', 'url': 'http://videomore.ru/kino_v_detalayah/5_sezon/367617',
'md5': '44455a346edc0d509ac5b5a5b531dc35', 'md5': '44455a346edc0d509ac5b5a5b531dc35',
@ -79,6 +97,9 @@ class VideomoreIE(InfoExtractor):
}, { }, {
'url': 'videomore:367617', 'url': 'videomore:367617',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://player.videomore.ru/?partner_id=97&track_id=736234&autoplay=0&userToken=',
'only_matching': True,
}] }]
@staticmethod @staticmethod
@ -136,7 +157,7 @@ class VideomoreIE(InfoExtractor):
class VideomoreVideoIE(InfoExtractor): class VideomoreVideoIE(InfoExtractor):
IE_NAME = 'videomore:video' IE_NAME = 'videomore:video'
_VALID_URL = r'https?://videomore\.ru/(?:(?:[^/]+/){2})?(?P<id>[^/?#&]+)[/?#&]*$' _VALID_URL = r'https?://videomore\.ru/(?:(?:[^/]+/){2})?(?P<id>[^/?#&]+)(?:/*|[?#&].*?)$'
_TESTS = [{ _TESTS = [{
# single video with og:video:iframe # single video with og:video:iframe
'url': 'http://videomore.ru/elki_3', 'url': 'http://videomore.ru/elki_3',
@ -176,6 +197,9 @@ class VideomoreVideoIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
}, {
'url': 'https://videomore.ru/molodezhka/6_sezon/29_seriya?utm_so',
'only_matching': True,
}] }]
@classmethod @classmethod
@ -196,13 +220,16 @@ class VideomoreVideoIE(InfoExtractor):
r'track-id=["\'](\d+)', r'track-id=["\'](\d+)',
r'xcnt_product_id\s*=\s*(\d+)'), webpage, 'video id') r'xcnt_product_id\s*=\s*(\d+)'), webpage, 'video id')
video_url = 'videomore:%s' % video_id video_url = 'videomore:%s' % video_id
else:
video_id = None
return self.url_result(video_url, VideomoreIE.ie_key()) return self.url_result(
video_url, ie=VideomoreIE.ie_key(), video_id=video_id)
class VideomoreSeasonIE(InfoExtractor): class VideomoreSeasonIE(InfoExtractor):
IE_NAME = 'videomore:season' IE_NAME = 'videomore:season'
_VALID_URL = r'https?://videomore\.ru/(?!embed)(?P<id>[^/]+/[^/?#&]+)[/?#&]*$' _VALID_URL = r'https?://videomore\.ru/(?!embed)(?P<id>[^/]+/[^/?#&]+)(?:/*|[?#&].*?)$'
_TESTS = [{ _TESTS = [{
'url': 'http://videomore.ru/molodezhka/sezon_promo', 'url': 'http://videomore.ru/molodezhka/sezon_promo',
'info_dict': { 'info_dict': {
@ -210,8 +237,16 @@ class VideomoreSeasonIE(InfoExtractor):
'title': 'Молодежка Промо', 'title': 'Молодежка Промо',
}, },
'playlist_mincount': 12, 'playlist_mincount': 12,
}, {
'url': 'http://videomore.ru/molodezhka/sezon_promo?utm_so',
'only_matching': True,
}] }]
@classmethod
def suitable(cls, url):
return (False if (VideomoreIE.suitable(url) or VideomoreVideoIE.suitable(url))
else super(VideomoreSeasonIE, cls).suitable(url))
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
@ -219,9 +254,54 @@ class VideomoreSeasonIE(InfoExtractor):
title = self._og_search_title(webpage) title = self._og_search_title(webpage)
entries = [ data = self._parse_json(
self.url_result(item) for item in re.findall( self._html_search_regex(
r'<a[^>]+href="((?:https?:)?//videomore\.ru/%s/[^/]+)"[^>]+class="widget-item-desc"' r'\bclass=["\']seasons-tracks["\'][^>]+\bdata-custom-data=(["\'])(?P<value>{.+?})\1',
% display_id, webpage)] webpage, 'data', default='{}', group='value'),
display_id, fatal=False)
entries = []
if data:
episodes = data.get('episodes')
if isinstance(episodes, list):
for ep in episodes:
if not isinstance(ep, dict):
continue
ep_id = int_or_none(ep.get('id'))
ep_url = url_or_none(ep.get('url'))
if ep_id:
e = {
'url': 'videomore:%s' % ep_id,
'id': compat_str(ep_id),
}
elif ep_url:
e = {'url': ep_url}
else:
continue
e.update({
'_type': 'url',
'ie_key': VideomoreIE.ie_key(),
'title': str_or_none(ep.get('title')),
'thumbnail': url_or_none(ep.get('image')),
'duration': parse_duration(ep.get('duration')),
'episode_number': int_or_none(ep.get('number')),
'upload_date': unified_strdate(ep.get('date')),
})
entries.append(e)
if not entries:
entries = [
self.url_result(
'videomore:%s' % video_id, ie=VideomoreIE.ie_key(),
video_id=video_id)
for video_id in orderedSet(re.findall(
r':(?:id|key)=["\'](\d+)["\']', webpage))]
if not entries:
entries = [
self.url_result(item) for item in re.findall(
r'<a[^>]+href="((?:https?:)?//videomore\.ru/%s/[^/]+)"[^>]+class="widget-item-desc"'
% display_id, webpage)]
return self.playlist_result(entries, display_id, title) return self.playlist_result(entries, display_id, title)

View File

@ -1,6 +1,7 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import base64
import json import json
import re import re
import itertools import itertools
@ -392,6 +393,22 @@ class VimeoIE(VimeoBaseInfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
}, },
{
'url': 'http://player.vimeo.com/video/68375962',
'md5': 'aaf896bdb7ddd6476df50007a0ac0ae7',
'info_dict': {
'id': '68375962',
'ext': 'mp4',
'title': 'youtube-dl password protected test video',
'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user18948128',
'uploader_id': 'user18948128',
'uploader': 'Jaime Marquínez Ferrándiz',
'duration': 10,
},
'params': {
'videopassword': 'youtube-dl',
},
},
{ {
'url': 'http://vimeo.com/moogaloop.swf?clip_id=2539741', 'url': 'http://vimeo.com/moogaloop.swf?clip_id=2539741',
'only_matching': True, 'only_matching': True,
@ -418,6 +435,8 @@ class VimeoIE(VimeoBaseInfoExtractor):
'url': 'https://vimeo.com/160743502/abd0e13fb4', 'url': 'https://vimeo.com/160743502/abd0e13fb4',
'only_matching': True, 'only_matching': True,
} }
# https://gettingthingsdone.com/workflowmap/
# vimeo embed with check-password page protected by Referer header
] ]
@staticmethod @staticmethod
@ -448,18 +467,22 @@ class VimeoIE(VimeoBaseInfoExtractor):
urls = VimeoIE._extract_urls(url, webpage) urls = VimeoIE._extract_urls(url, webpage)
return urls[0] if urls else None return urls[0] if urls else None
def _verify_player_video_password(self, url, video_id): def _verify_player_video_password(self, url, video_id, headers):
password = self._downloader.params.get('videopassword') password = self._downloader.params.get('videopassword')
if password is None: if password is None:
raise ExtractorError('This video is protected by a password, use the --video-password option') raise ExtractorError('This video is protected by a password, use the --video-password option')
data = urlencode_postdata({'password': password}) data = urlencode_postdata({
pass_url = url + '/check-password' 'password': base64.b64encode(password.encode()),
password_request = sanitized_Request(pass_url, data) })
password_request.add_header('Content-Type', 'application/x-www-form-urlencoded') headers = merge_dicts(headers, {
password_request.add_header('Referer', url) 'Content-Type': 'application/x-www-form-urlencoded',
return self._download_json( })
password_request, video_id, checked = self._download_json(
'Verifying the password', 'Wrong password') url + '/check-password', video_id,
'Verifying the password', data=data, headers=headers)
if checked is False:
raise ExtractorError('Wrong video password', expected=True)
return checked
def _real_initialize(self): def _real_initialize(self):
self._login() self._login()
@ -572,7 +595,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
cause=e) cause=e)
else: else:
if config.get('view') == 4: if config.get('view') == 4:
config = self._verify_player_video_password(redirect_url, video_id) config = self._verify_player_video_password(redirect_url, video_id, headers)
vod = config.get('video', {}).get('vod', {}) vod = config.get('video', {}).get('vod', {})

View File

@ -11,10 +11,12 @@ import time
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_HTTPError,
compat_urllib_parse_urlencode, compat_urllib_parse_urlencode,
compat_urllib_parse, compat_urllib_parse,
) )
from ..utils import ( from ..utils import (
ExtractorError,
float_or_none, float_or_none,
int_or_none, int_or_none,
) )
@ -24,29 +26,41 @@ class VRVBaseIE(InfoExtractor):
_API_DOMAIN = None _API_DOMAIN = None
_API_PARAMS = {} _API_PARAMS = {}
_CMS_SIGNING = {} _CMS_SIGNING = {}
_TOKEN = None
_TOKEN_SECRET = ''
def _call_api(self, path, video_id, note, data=None): def _call_api(self, path, video_id, note, data=None):
# https://tools.ietf.org/html/rfc5849#section-3
base_url = self._API_DOMAIN + '/core/' + path base_url = self._API_DOMAIN + '/core/' + path
encoded_query = compat_urllib_parse_urlencode({ query = [
'oauth_consumer_key': self._API_PARAMS['oAuthKey'], ('oauth_consumer_key', self._API_PARAMS['oAuthKey']),
'oauth_nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]), ('oauth_nonce', ''.join([random.choice(string.ascii_letters) for _ in range(32)])),
'oauth_signature_method': 'HMAC-SHA1', ('oauth_signature_method', 'HMAC-SHA1'),
'oauth_timestamp': int(time.time()), ('oauth_timestamp', int(time.time())),
'oauth_version': '1.0', ]
}) if self._TOKEN:
query.append(('oauth_token', self._TOKEN))
encoded_query = compat_urllib_parse_urlencode(query)
headers = self.geo_verification_headers() headers = self.geo_verification_headers()
if data: if data:
data = json.dumps(data).encode() data = json.dumps(data).encode()
headers['Content-Type'] = 'application/json' headers['Content-Type'] = 'application/json'
method = 'POST' if data else 'GET' base_string = '&'.join([
base_string = '&'.join([method, compat_urllib_parse.quote(base_url, ''), compat_urllib_parse.quote(encoded_query, '')]) 'POST' if data else 'GET',
compat_urllib_parse.quote(base_url, ''),
compat_urllib_parse.quote(encoded_query, '')])
oauth_signature = base64.b64encode(hmac.new( oauth_signature = base64.b64encode(hmac.new(
(self._API_PARAMS['oAuthSecret'] + '&').encode('ascii'), (self._API_PARAMS['oAuthSecret'] + '&' + self._TOKEN_SECRET).encode('ascii'),
base_string.encode(), hashlib.sha1).digest()).decode() base_string.encode(), hashlib.sha1).digest()).decode()
encoded_query += '&oauth_signature=' + compat_urllib_parse.quote(oauth_signature, '') encoded_query += '&oauth_signature=' + compat_urllib_parse.quote(oauth_signature, '')
return self._download_json( try:
'?'.join([base_url, encoded_query]), video_id, return self._download_json(
note='Downloading %s JSON metadata' % note, headers=headers, data=data) '?'.join([base_url, encoded_query]), video_id,
note='Downloading %s JSON metadata' % note, headers=headers, data=data)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
raise ExtractorError(json.loads(e.cause.read().decode())['message'], expected=True)
raise
def _call_cms(self, path, video_id, note): def _call_cms(self, path, video_id, note):
if not self._CMS_SIGNING: if not self._CMS_SIGNING:
@ -55,19 +69,22 @@ class VRVBaseIE(InfoExtractor):
self._API_DOMAIN + path, video_id, query=self._CMS_SIGNING, self._API_DOMAIN + path, video_id, query=self._CMS_SIGNING,
note='Downloading %s JSON metadata' % note, headers=self.geo_verification_headers()) note='Downloading %s JSON metadata' % note, headers=self.geo_verification_headers())
def _set_api_params(self, webpage, video_id):
if not self._API_PARAMS:
self._API_PARAMS = self._parse_json(self._search_regex(
r'window\.__APP_CONFIG__\s*=\s*({.+?})</script>',
webpage, 'api config'), video_id)['cxApiParams']
self._API_DOMAIN = self._API_PARAMS.get('apiDomain', 'https://api.vrv.co')
def _get_cms_resource(self, resource_key, video_id): def _get_cms_resource(self, resource_key, video_id):
return self._call_api( return self._call_api(
'cms_resource', video_id, 'resource path', data={ 'cms_resource', video_id, 'resource path', data={
'resource_key': resource_key, 'resource_key': resource_key,
})['__links__']['cms_resource']['href'] })['__links__']['cms_resource']['href']
def _real_initialize(self):
webpage = self._download_webpage(
'https://vrv.co/', None, headers=self.geo_verification_headers())
self._API_PARAMS = self._parse_json(self._search_regex(
[
r'window\.__APP_CONFIG__\s*=\s*({.+?})(?:</script>|;)',
r'window\.__APP_CONFIG__\s*=\s*({.+})'
], webpage, 'app config'), None)['cxApiParams']
self._API_DOMAIN = self._API_PARAMS.get('apiDomain', 'https://api.vrv.co')
class VRVIE(VRVBaseIE): class VRVIE(VRVBaseIE):
IE_NAME = 'vrv' IE_NAME = 'vrv'
@ -86,6 +103,22 @@ class VRVIE(VRVBaseIE):
'skip_download': True, 'skip_download': True,
}, },
}] }]
_NETRC_MACHINE = 'vrv'
def _real_initialize(self):
super(VRVIE, self)._real_initialize()
email, password = self._get_login_info()
if email is None:
return
token_credentials = self._call_api(
'authenticate/by:credentials', None, 'Token Credentials', data={
'email': email,
'password': password,
})
self._TOKEN = token_credentials['oauth_token']
self._TOKEN_SECRET = token_credentials['oauth_token_secret']
def _extract_vrv_formats(self, url, video_id, stream_format, audio_lang, hardsub_lang): def _extract_vrv_formats(self, url, video_id, stream_format, audio_lang, hardsub_lang):
if not url or stream_format not in ('hls', 'dash'): if not url or stream_format not in ('hls', 'dash'):
@ -116,28 +149,16 @@ class VRVIE(VRVBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(
url, video_id,
headers=self.geo_verification_headers())
media_resource = self._parse_json(self._search_regex(
[
r'window\.__INITIAL_STATE__\s*=\s*({.+?})(?:</script>|;)',
r'window\.__INITIAL_STATE__\s*=\s*({.+})'
], webpage, 'inital state'), video_id).get('watch', {}).get('mediaResource') or {}
video_data = media_resource.get('json') episode_path = self._get_cms_resource(
if not video_data: 'cms:/episodes/' + video_id, video_id)
self._set_api_params(webpage, video_id) video_data = self._call_cms(episode_path, video_id, 'video')
episode_path = self._get_cms_resource(
'cms:/episodes/' + video_id, video_id)
video_data = self._call_cms(episode_path, video_id, 'video')
title = video_data['title'] title = video_data['title']
streams_json = media_resource.get('streams', {}).get('json', {}) streams_path = video_data['__links__'].get('streams', {}).get('href')
if not streams_json: if not streams_path:
self._set_api_params(webpage, video_id) self.raise_login_required()
streams_path = video_data['__links__']['streams']['href'] streams_json = self._call_cms(streams_path, video_id, 'streams')
streams_json = self._call_cms(streams_path, video_id, 'streams')
audio_locale = streams_json.get('audio_locale') audio_locale = streams_json.get('audio_locale')
formats = [] formats = []
@ -202,11 +223,7 @@ class VRVSeriesIE(VRVBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
series_id = self._match_id(url) series_id = self._match_id(url)
webpage = self._download_webpage(
url, series_id,
headers=self.geo_verification_headers())
self._set_api_params(webpage, series_id)
seasons_path = self._get_cms_resource( seasons_path = self._get_cms_resource(
'cms:/seasons?series_id=' + series_id, series_id) 'cms:/seasons?series_id=' + series_id, series_id)
seasons_data = self._call_cms(seasons_path, series_id, 'seasons') seasons_data = self._call_cms(seasons_path, series_id, 'seasons')

View File

@ -0,0 +1,66 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
merge_dicts,
urljoin,
)
class WakanimIE(InfoExtractor):
_VALID_URL = r'https://(?:www\.)?wakanim\.tv/[^/]+/v2/catalogue/episode/(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.wakanim.tv/de/v2/catalogue/episode/2997/the-asterisk-war-omu-staffel-1-episode-02-omu',
'info_dict': {
'id': '2997',
'ext': 'mp4',
'title': 'Episode 02',
'description': 'md5:2927701ea2f7e901de8bfa8d39b2852d',
'series': 'The Asterisk War (OmU.)',
'season_number': 1,
'episode': 'Episode 02',
'episode_number': 2,
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
}, {
# DRM Protected
'url': 'https://www.wakanim.tv/de/v2/catalogue/episode/7843/sword-art-online-alicization-omu-arc-2-folge-15-omu',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
m3u8_url = urljoin(url, self._search_regex(
r'file\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage, 'm3u8 url',
group='url'))
# https://docs.microsoft.com/en-us/azure/media-services/previous/media-services-content-protection-overview#streaming-urls
encryption = self._search_regex(
r'encryption%3D(c(?:enc|bc(?:s-aapl)?))',
m3u8_url, 'encryption', default=None)
if encryption and encryption in ('cenc', 'cbcs-aapl'):
raise ExtractorError('This video is DRM protected.', expected=True)
formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
info = self._search_json_ld(webpage, video_id, default={})
title = self._search_regex(
(r'<h1[^>]+\bclass=["\']episode_h1[^>]+\btitle=(["\'])(?P<title>(?:(?!\1).)+)\1',
r'<span[^>]+\bclass=["\']episode_title["\'][^>]*>(?P<title>[^<]+)'),
webpage, 'title', default=None, group='title')
return merge_dicts(info, {
'id': video_id,
'title': title,
'formats': formats,
})

View File

@ -12,7 +12,7 @@ from ..utils import (
class WistiaIE(InfoExtractor): class WistiaIE(InfoExtractor):
_VALID_URL = r'(?:wistia:|https?://(?:fast\.)?wistia\.(?:net|com)/embed/iframe/)(?P<id>[a-z0-9]+)' _VALID_URL = r'(?:wistia:|https?://(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/)(?P<id>[a-z0-9]+)'
_API_URL = 'http://fast.wistia.com/embed/medias/%s.json' _API_URL = 'http://fast.wistia.com/embed/medias/%s.json'
_IFRAME_URL = 'http://fast.wistia.net/embed/iframe/%s' _IFRAME_URL = 'http://fast.wistia.net/embed/iframe/%s'
@ -38,6 +38,9 @@ class WistiaIE(InfoExtractor):
}, { }, {
'url': 'http://fast.wistia.com/embed/iframe/sh7fpupwlt', 'url': 'http://fast.wistia.com/embed/iframe/sh7fpupwlt',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://fast.wistia.net/embed/medias/sh7fpupwlt.json',
'only_matching': True,
}] }]
@staticmethod @staticmethod

View File

@ -68,11 +68,9 @@ class YouPornIE(InfoExtractor):
request.add_header('Cookie', 'age_verified=1') request.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(request, display_id) webpage = self._download_webpage(request, display_id)
title = self._search_regex( title = self._html_search_regex(
[r'(?:video_titles|videoTitle)\s*[:=]\s*(["\'])(?P<title>(?:(?!\1).)+)\1', r'(?s)<div[^>]+class=["\']watchVideoTitle[^>]+>(.+?)</div>',
r'<h1[^>]+class=["\']heading\d?["\'][^>]*>(?P<title>[^<]+)<'], webpage, 'title', default=None) or self._og_search_title(
webpage, 'title', group='title',
default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta( webpage, default=None) or self._html_search_meta(
'title', webpage, fatal=True) 'title', webpage, fatal=True)
@ -134,7 +132,11 @@ class YouPornIE(InfoExtractor):
formats.append(f) formats.append(f)
self._sort_formats(formats) self._sort_formats(formats)
description = self._og_search_description(webpage, default=None) description = self._html_search_regex(
r'(?s)<div[^>]+\bid=["\']description["\'][^>]*>(.+?)</div>',
webpage, 'description',
default=None) or self._og_search_description(
webpage, default=None)
thumbnail = self._search_regex( thumbnail = self._search_regex(
r'(?:imageurl\s*=|poster\s*:)\s*(["\'])(?P<thumbnail>.+?)\1', r'(?:imageurl\s*=|poster\s*:)\s*(["\'])(?P<thumbnail>.+?)\1',
webpage, 'thumbnail', fatal=False, group='thumbnail') webpage, 'thumbnail', fatal=False, group='thumbnail')

View File

@ -566,7 +566,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader': 'SET India', 'uploader': 'SET India',
'uploader_id': 'setindia', 'uploader_id': 'setindia',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/setindia', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/setindia',
'license': 'Standard YouTube License',
'age_limit': 18, 'age_limit': 18,
} }
}, },
@ -604,7 +603,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/8KVIDEO', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/8KVIDEO',
'description': '', 'description': '',
'uploader': '8KVIDEO', 'uploader': '8KVIDEO',
'license': 'Standard YouTube License',
'title': 'UHDTV TEST 8K VIDEO.mp4' 'title': 'UHDTV TEST 8K VIDEO.mp4'
}, },
'params': { 'params': {
@ -639,13 +637,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'id': 'nfWlot6h_JM', 'id': 'nfWlot6h_JM',
'ext': 'm4a', 'ext': 'm4a',
'title': 'Taylor Swift - Shake It Off', 'title': 'Taylor Swift - Shake It Off',
'alt_title': 'Shake It Off', 'description': 'md5:bec2185232c05479482cb5a9b82719bf',
'description': 'md5:95f66187cd7c8b2c13eb78e1223b63c3',
'duration': 242, 'duration': 242,
'uploader': 'TaylorSwiftVEVO', 'uploader': 'TaylorSwiftVEVO',
'uploader_id': 'TaylorSwiftVEVO', 'uploader_id': 'TaylorSwiftVEVO',
'upload_date': '20140818', 'upload_date': '20140818',
'license': 'Standard YouTube License',
'creator': 'Taylor Swift', 'creator': 'Taylor Swift',
}, },
'params': { 'params': {
@ -661,10 +657,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'duration': 219, 'duration': 219,
'upload_date': '20100909', 'upload_date': '20100909',
'uploader': 'TJ Kirk', 'uploader': 'Amazing Atheist',
'uploader_id': 'TheAmazingAtheist', 'uploader_id': 'TheAmazingAtheist',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/TheAmazingAtheist', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/TheAmazingAtheist',
'license': 'Standard YouTube License',
'title': 'Burning Everyone\'s Koran', 'title': 'Burning Everyone\'s Koran',
'description': 'SUBSCRIBE: http://www.youtube.com/saturninefilms\n\nEven Obama has taken a stand against freedom on this issue: http://www.huffingtonpost.com/2010/09/09/obama-gma-interview-quran_n_710282.html', 'description': 'SUBSCRIBE: http://www.youtube.com/saturninefilms\n\nEven Obama has taken a stand against freedom on this issue: http://www.huffingtonpost.com/2010/09/09/obama-gma-interview-quran_n_710282.html',
} }
@ -682,7 +677,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_id': 'WitcherGame', 'uploader_id': 'WitcherGame',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/WitcherGame', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/WitcherGame',
'upload_date': '20140605', 'upload_date': '20140605',
'license': 'Standard YouTube License',
'age_limit': 18, 'age_limit': 18,
}, },
}, },
@ -691,7 +685,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'url': 'https://www.youtube.com/watch?v=6kLq3WMV1nU', 'url': 'https://www.youtube.com/watch?v=6kLq3WMV1nU',
'info_dict': { 'info_dict': {
'id': '6kLq3WMV1nU', 'id': '6kLq3WMV1nU',
'ext': 'webm', 'ext': 'mp4',
'title': 'Dedication To My Ex (Miss That) (Lyric Video)', 'title': 'Dedication To My Ex (Miss That) (Lyric Video)',
'description': 'md5:33765bb339e1b47e7e72b5490139bb41', 'description': 'md5:33765bb339e1b47e7e72b5490139bb41',
'duration': 246, 'duration': 246,
@ -699,7 +693,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_id': 'LloydVEVO', 'uploader_id': 'LloydVEVO',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/LloydVEVO', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/LloydVEVO',
'upload_date': '20110629', 'upload_date': '20110629',
'license': 'Standard YouTube License',
'age_limit': 18, 'age_limit': 18,
}, },
}, },
@ -717,7 +710,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'creator': 'deadmau5', 'creator': 'deadmau5',
'description': 'md5:12c56784b8032162bb936a5f76d55360', 'description': 'md5:12c56784b8032162bb936a5f76d55360',
'uploader': 'deadmau5', 'uploader': 'deadmau5',
'license': 'Standard YouTube License',
'title': 'Deadmau5 - Some Chords (HD)', 'title': 'Deadmau5 - Some Chords (HD)',
'alt_title': 'Some Chords', 'alt_title': 'Some Chords',
}, },
@ -735,7 +727,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'upload_date': '20150827', 'upload_date': '20150827',
'uploader_id': 'olympic', 'uploader_id': 'olympic',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/olympic', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/olympic',
'license': 'Standard YouTube License',
'description': 'HO09 - Women - GER-AUS - Hockey - 31 July 2012 - London 2012 Olympic Games', 'description': 'HO09 - Women - GER-AUS - Hockey - 31 July 2012 - London 2012 Olympic Games',
'uploader': 'Olympic', 'uploader': 'Olympic',
'title': 'Hockey - Women - GER-AUS - London 2012 Olympic Games', 'title': 'Hockey - Women - GER-AUS - London 2012 Olympic Games',
@ -757,7 +748,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/AllenMeow', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/AllenMeow',
'description': 'made by Wacom from Korea | 字幕&加油添醋 by TY\'s Allen | 感謝heylisa00cavey1001同學熱情提供梗及翻譯', 'description': 'made by Wacom from Korea | 字幕&加油添醋 by TY\'s Allen | 感謝heylisa00cavey1001同學熱情提供梗及翻譯',
'uploader': '孫ᄋᄅ', 'uploader': '孫ᄋᄅ',
'license': 'Standard YouTube License',
'title': '[A-made] 變態妍字幕版 太妍 我就是這樣的人', 'title': '[A-made] 變態妍字幕版 太妍 我就是這樣的人',
}, },
}, },
@ -791,7 +781,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_id': 'dorappi2000', 'uploader_id': 'dorappi2000',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/dorappi2000', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/dorappi2000',
'uploader': 'dorappi2000', 'uploader': 'dorappi2000',
'license': 'Standard YouTube License',
'formats': 'mincount:31', 'formats': 'mincount:31',
}, },
'skip': 'not actual anymore', 'skip': 'not actual anymore',
@ -807,7 +796,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader': 'Airtek', 'uploader': 'Airtek',
'description': 'Retransmisión en directo de la XVIII media maratón de Zaragoza.', 'description': 'Retransmisión en directo de la XVIII media maratón de Zaragoza.',
'uploader_id': 'UCzTzUmjXxxacNnL8I3m4LnQ', 'uploader_id': 'UCzTzUmjXxxacNnL8I3m4LnQ',
'license': 'Standard YouTube License',
'title': 'Retransmisión XVIII Media maratón Zaragoza 2015', 'title': 'Retransmisión XVIII Media maratón Zaragoza 2015',
}, },
'params': { 'params': {
@ -880,6 +868,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'skip': 'This video is not available.',
}, },
{ {
# Multifeed video with comma in title (see https://github.com/rg3/youtube-dl/issues/8536) # Multifeed video with comma in title (see https://github.com/rg3/youtube-dl/issues/8536)
@ -916,7 +905,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_id': 'IronSoulElf', 'uploader_id': 'IronSoulElf',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/IronSoulElf', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/IronSoulElf',
'uploader': 'IronSoulElf', 'uploader': 'IronSoulElf',
'license': 'Standard YouTube License',
'creator': 'Todd Haberman, Daniel Law Heath and Aaron Kaplan', 'creator': 'Todd Haberman, Daniel Law Heath and Aaron Kaplan',
'track': 'Dark Walk - Position Music', 'track': 'Dark Walk - Position Music',
'artist': 'Todd Haberman, Daniel Law Heath and Aaron Kaplan', 'artist': 'Todd Haberman, Daniel Law Heath and Aaron Kaplan',
@ -1020,13 +1008,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'id': 'iqKdEhx-dD4', 'id': 'iqKdEhx-dD4',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Isolation - Mind Field (Ep 1)', 'title': 'Isolation - Mind Field (Ep 1)',
'description': 'md5:25b78d2f64ae81719f5c96319889b736', 'description': 'md5:46a29be4ceffa65b92d277b93f463c0f',
'duration': 2085, 'duration': 2085,
'upload_date': '20170118', 'upload_date': '20170118',
'uploader': 'Vsauce', 'uploader': 'Vsauce',
'uploader_id': 'Vsauce', 'uploader_id': 'Vsauce',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/Vsauce', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/Vsauce',
'license': 'Standard YouTube License',
'series': 'Mind Field', 'series': 'Mind Field',
'season_number': 1, 'season_number': 1,
'episode_number': 1, 'episode_number': 1,
@ -1052,7 +1039,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader': 'New Century Foundation', 'uploader': 'New Century Foundation',
'uploader_id': 'UCEJYpZGqgUob0zVVEaLhvVg', 'uploader_id': 'UCEJYpZGqgUob0zVVEaLhvVg',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UCEJYpZGqgUob0zVVEaLhvVg', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UCEJYpZGqgUob0zVVEaLhvVg',
'license': 'Standard YouTube License',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -1080,6 +1066,26 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# DRM protected # DRM protected
'url': 'https://www.youtube.com/watch?v=s7_qI6_mIXc', 'url': 'https://www.youtube.com/watch?v=s7_qI6_mIXc',
'only_matching': True, 'only_matching': True,
},
{
# Video with unsupported adaptive stream type formats
'url': 'https://www.youtube.com/watch?v=Z4Vy8R84T1U',
'info_dict': {
'id': 'Z4Vy8R84T1U',
'ext': 'mp4',
'title': 'saman SMAN 53 Jakarta(Sancety) opening COFFEE4th at SMAN 53 Jakarta',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'duration': 433,
'upload_date': '20130923',
'uploader': 'Amelia Putri Harwita',
'uploader_id': 'UCpOxM49HJxmC1qCalXyB3_Q',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UCpOxM49HJxmC1qCalXyB3_Q',
'formats': 'maxcount:10',
},
'params': {
'skip_download': True,
'youtube_include_dash_manifest': False,
},
} }
] ]
@ -1196,8 +1202,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
funcname = self._search_regex( funcname = self._search_regex(
(r'(["\'])signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(', (r'(["\'])signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\.sig\|\|(?P<sig>[a-zA-Z0-9$]+)\(', r'\.sig\|\|(?P<sig>[a-zA-Z0-9$]+)\(',
r'yt\.akamaized\.net/\)\s*\|\|\s*.*?\s*c\s*&&\s*d\.set\([^,]+\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(', r'yt\.akamaized\.net/\)\s*\|\|\s*.*?\s*c\s*&&\s*d\.set\([^,]+\s*,\s*(?:encodeURIComponent\s*\()?(?P<sig>[a-zA-Z0-9$]+)\(',
r'\bc\s*&&\s*d\.set\([^,]+\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(', r'\bc\s*&&\s*d\.set\([^,]+\s*,\s*(?:encodeURIComponent\s*\()?\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\bc\s*&&\s*d\.set\([^,]+\s*,\s*\([^)]*\)\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\('), r'\bc\s*&&\s*d\.set\([^,]+\s*,\s*\([^)]*\)\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\('),
jscode, 'Initial JS player signature function name', group='sig') jscode, 'Initial JS player signature function name', group='sig')
@ -1544,6 +1550,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if dash_mpd and dash_mpd[0] not in dash_mpds: if dash_mpd and dash_mpd[0] not in dash_mpds:
dash_mpds.append(dash_mpd[0]) dash_mpds.append(dash_mpd[0])
def add_dash_mpd_pr(pl_response):
dash_mpd = url_or_none(try_get(
pl_response, lambda x: x['streamingData']['dashManifestUrl'],
compat_str))
if dash_mpd and dash_mpd not in dash_mpds:
dash_mpds.append(dash_mpd)
is_live = None is_live = None
view_count = None view_count = None
@ -1601,6 +1614,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if isinstance(pl_response, dict): if isinstance(pl_response, dict):
player_response = pl_response player_response = pl_response
if not video_info or self._downloader.params.get('youtube_include_dash_manifest', True): if not video_info or self._downloader.params.get('youtube_include_dash_manifest', True):
add_dash_mpd_pr(player_response)
# We also try looking in get_video_info since it may contain different dashmpd # We also try looking in get_video_info since it may contain different dashmpd
# URL that points to a DASH manifest with possibly different itag set (some itags # URL that points to a DASH manifest with possibly different itag set (some itags
# are missing from DASH manifest pointed by webpage's dashmpd, some - from DASH # are missing from DASH manifest pointed by webpage's dashmpd, some - from DASH
@ -1632,6 +1646,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
pl_response = get_video_info.get('player_response', [None])[0] pl_response = get_video_info.get('player_response', [None])[0]
if isinstance(pl_response, dict): if isinstance(pl_response, dict):
player_response = pl_response player_response = pl_response
add_dash_mpd_pr(player_response)
add_dash_mpd(get_video_info) add_dash_mpd(get_video_info)
if view_count is None: if view_count is None:
view_count = extract_view_count(get_video_info) view_count = extract_view_count(get_video_info)
@ -1817,6 +1832,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
url_data = compat_parse_qs(url_data_str) url_data = compat_parse_qs(url_data_str)
if 'itag' not in url_data or 'url' not in url_data: if 'itag' not in url_data or 'url' not in url_data:
continue continue
stream_type = int_or_none(try_get(url_data, lambda x: x['stream_type'][0]))
# Unsupported FORMAT_STREAM_TYPE_OTF
if stream_type == 3:
continue
format_id = url_data['itag'][0] format_id = url_data['itag'][0]
url = url_data['url'][0] url = url_data['url'][0]
@ -1930,31 +1949,38 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'http_chunk_size': 10485760, 'http_chunk_size': 10485760,
} }
formats.append(dct) formats.append(dct)
elif video_info.get('hlsvp'):
manifest_url = video_info['hlsvp'][0]
formats = []
m3u8_formats = self._extract_m3u8_formats(
manifest_url, video_id, 'mp4', fatal=False)
for a_format in m3u8_formats:
itag = self._search_regex(
r'/itag/(\d+)/', a_format['url'], 'itag', default=None)
if itag:
a_format['format_id'] = itag
if itag in self._formats:
dct = self._formats[itag].copy()
dct.update(a_format)
a_format = dct
a_format['player_url'] = player_url
# Accept-Encoding header causes failures in live streams on Youtube and Youtube Gaming
a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = 'True'
formats.append(a_format)
else: else:
error_message = clean_html(video_info.get('reason', [None])[0]) manifest_url = (
if not error_message: url_or_none(try_get(
error_message = extract_unavailable_message() player_response,
if error_message: lambda x: x['streamingData']['hlsManifestUrl'],
raise ExtractorError(error_message, expected=True) compat_str)) or
raise ExtractorError('no conn, hlsvp or url_encoded_fmt_stream_map information found in video info') url_or_none(try_get(
video_info, lambda x: x['hlsvp'][0], compat_str)))
if manifest_url:
formats = []
m3u8_formats = self._extract_m3u8_formats(
manifest_url, video_id, 'mp4', fatal=False)
for a_format in m3u8_formats:
itag = self._search_regex(
r'/itag/(\d+)/', a_format['url'], 'itag', default=None)
if itag:
a_format['format_id'] = itag
if itag in self._formats:
dct = self._formats[itag].copy()
dct.update(a_format)
a_format = dct
a_format['player_url'] = player_url
# Accept-Encoding header causes failures in live streams on Youtube and Youtube Gaming
a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = 'True'
formats.append(a_format)
else:
error_message = clean_html(video_info.get('reason', [None])[0])
if not error_message:
error_message = extract_unavailable_message()
if error_message:
raise ExtractorError(error_message, expected=True)
raise ExtractorError('no conn, hlsvp, hlsManifestUrl or url_encoded_fmt_stream_map information found in video info')
# uploader # uploader
video_uploader = try_get( video_uploader = try_get(

View File

@ -9,9 +9,6 @@ import re
from .common import AudioConversionError, PostProcessor from .common import AudioConversionError, PostProcessor
from ..compat import (
compat_subprocess_get_DEVNULL,
)
from ..utils import ( from ..utils import (
encodeArgument, encodeArgument,
encodeFilename, encodeFilename,
@ -79,6 +76,20 @@ class FFmpegPostProcessor(PostProcessor):
programs = ['avprobe', 'avconv', 'ffmpeg', 'ffprobe'] programs = ['avprobe', 'avconv', 'ffmpeg', 'ffprobe']
prefer_ffmpeg = True prefer_ffmpeg = True
def get_ffmpeg_version(path):
ver = get_exe_version(path, args=['-version'])
if ver:
regexs = [
r'(?:\d+:)?([0-9.]+)-[0-9]+ubuntu[0-9.]+$', # Ubuntu, see [1]
r'n([0-9.]+)$', # Arch Linux
# 1. http://www.ducea.com/2006/06/17/ubuntu-package-version-naming-explanation/
]
for regex in regexs:
mobj = re.match(regex, ver)
if mobj:
ver = mobj.group(1)
return ver
self.basename = None self.basename = None
self.probe_basename = None self.probe_basename = None
@ -110,11 +121,10 @@ class FFmpegPostProcessor(PostProcessor):
self._paths = dict( self._paths = dict(
(p, os.path.join(location, p)) for p in programs) (p, os.path.join(location, p)) for p in programs)
self._versions = dict( self._versions = dict(
(p, get_exe_version(self._paths[p], args=['-version'])) (p, get_ffmpeg_version(self._paths[p])) for p in programs)
for p in programs)
if self._versions is None: if self._versions is None:
self._versions = dict( self._versions = dict(
(p, get_exe_version(p, args=['-version'])) for p in programs) (p, get_ffmpeg_version(p)) for p in programs)
self._paths = dict((p, p) for p in programs) self._paths = dict((p, p) for p in programs)
if prefer_ffmpeg is False: if prefer_ffmpeg is False:
@ -152,27 +162,45 @@ class FFmpegPostProcessor(PostProcessor):
return self._paths[self.probe_basename] return self._paths[self.probe_basename]
def get_audio_codec(self, path): def get_audio_codec(self, path):
if not self.probe_available: if not self.probe_available and not self.available:
raise PostProcessingError('ffprobe or avprobe not found. Please install one.') raise PostProcessingError('ffprobe/avprobe and ffmpeg/avconv not found. Please install one.')
try: try:
cmd = [ if self.probe_available:
encodeFilename(self.probe_executable, True), cmd = [
encodeArgument('-show_streams'), encodeFilename(self.probe_executable, True),
encodeFilename(self._ffmpeg_filename_argument(path), True)] encodeArgument('-show_streams')]
else:
cmd = [
encodeFilename(self.executable, True),
encodeArgument('-i')]
cmd.append(encodeFilename(self._ffmpeg_filename_argument(path), True))
if self._downloader.params.get('verbose', False): if self._downloader.params.get('verbose', False):
self._downloader.to_screen('[debug] %s command line: %s' % (self.basename, shell_quote(cmd))) self._downloader.to_screen(
handle = subprocess.Popen(cmd, stderr=compat_subprocess_get_DEVNULL(), stdout=subprocess.PIPE, stdin=subprocess.PIPE) '[debug] %s command line: %s' % (self.basename, shell_quote(cmd)))
output = handle.communicate()[0] handle = subprocess.Popen(
if handle.wait() != 0: cmd, stderr=subprocess.PIPE,
stdout=subprocess.PIPE, stdin=subprocess.PIPE)
stdout_data, stderr_data = handle.communicate()
expected_ret = 0 if self.probe_available else 1
if handle.wait() != expected_ret:
return None return None
except (IOError, OSError): except (IOError, OSError):
return None return None
audio_codec = None output = (stdout_data if self.probe_available else stderr_data).decode('ascii', 'ignore')
for line in output.decode('ascii', 'ignore').split('\n'): if self.probe_available:
if line.startswith('codec_name='): audio_codec = None
audio_codec = line.split('=')[1].strip() for line in output.split('\n'):
elif line.strip() == 'codec_type=audio' and audio_codec is not None: if line.startswith('codec_name='):
return audio_codec audio_codec = line.split('=')[1].strip()
elif line.strip() == 'codec_type=audio' and audio_codec is not None:
return audio_codec
else:
# Stream #FILE_INDEX:STREAM_INDEX[STREAM_ID](LANGUAGE): CODEC_TYPE: CODEC_NAME
mobj = re.search(
r'Stream\s*#\d+:\d+(?:\[0x[0-9a-f]+\])?(?:\([a-z]{3}\))?:\s*Audio:\s*([0-9a-z]+)',
output)
if mobj:
return mobj.group(1)
return None return None
def run_ffmpeg_multiple_files(self, input_paths, out_path, opts): def run_ffmpeg_multiple_files(self, input_paths, out_path, opts):
@ -384,9 +412,8 @@ class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
opts += ['-c:s', 'mov_text'] opts += ['-c:s', 'mov_text']
for (i, lang) in enumerate(sub_langs): for (i, lang) in enumerate(sub_langs):
opts.extend(['-map', '%d:0' % (i + 1)]) opts.extend(['-map', '%d:0' % (i + 1)])
lang_code = ISO639Utils.short2long(lang) lang_code = ISO639Utils.short2long(lang) or lang
if lang_code is not None: opts.extend(['-metadata:s:s:%d' % i, 'language=%s' % lang_code])
opts.extend(['-metadata:s:s:%d' % i, 'language=%s' % lang_code])
temp_filename = prepend_extension(filename, 'temp') temp_filename = prepend_extension(filename, 'temp')
self._downloader.to_screen('[ffmpeg] Embedding subtitles in \'%s\'' % filename) self._downloader.to_screen('[ffmpeg] Embedding subtitles in \'%s\'' % filename)

View File

@ -1868,7 +1868,7 @@ def urljoin(base, path):
path = path.decode('utf-8') path = path.decode('utf-8')
if not isinstance(path, compat_str) or not path: if not isinstance(path, compat_str) or not path:
return None return None
if re.match(r'^(?:https?:)?//', path): if re.match(r'^(?:[a-zA-Z][a-zA-Z0-9+-.]*:)?//', path):
return path return path
if isinstance(base, bytes): if isinstance(base, bytes):
base = base.decode('utf-8') base = base.decode('utf-8')
@ -2968,6 +2968,7 @@ class ISO639Utils(object):
'gv': 'glv', 'gv': 'glv',
'ha': 'hau', 'ha': 'hau',
'he': 'heb', 'he': 'heb',
'iw': 'heb', # Replaced by he in 1989 revision
'hi': 'hin', 'hi': 'hin',
'ho': 'hmo', 'ho': 'hmo',
'hr': 'hrv', 'hr': 'hrv',
@ -2977,6 +2978,7 @@ class ISO639Utils(object):
'hz': 'her', 'hz': 'her',
'ia': 'ina', 'ia': 'ina',
'id': 'ind', 'id': 'ind',
'in': 'ind', # Replaced by id in 1989 revision
'ie': 'ile', 'ie': 'ile',
'ig': 'ibo', 'ig': 'ibo',
'ii': 'iii', 'ii': 'iii',
@ -3091,6 +3093,7 @@ class ISO639Utils(object):
'wo': 'wol', 'wo': 'wol',
'xh': 'xho', 'xh': 'xho',
'yi': 'yid', 'yi': 'yid',
'ji': 'yid', # Replaced by yi in 1989 revision
'yo': 'yor', 'yo': 'yor',
'za': 'zha', 'za': 'zha',
'zh': 'zho', 'zh': 'zho',

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2019.01.02' __version__ = '2019.01.27'