Merge branch 'master' into feature/bandcamp_user_support

This commit is contained in:
Lyz 2020-09-24 10:49:20 +02:00
commit ad1fcb938a
No known key found for this signature in database
GPG Key ID: 6C7D7C1612CDE02F
71 changed files with 2113 additions and 842 deletions

View File

@ -18,7 +18,7 @@ title: ''
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.03.24. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.09.20. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser. - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape. - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates. - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -26,7 +26,7 @@ Carefully read and work through this check list in order to prevent the most com
--> -->
- [ ] I'm reporting a broken site support - [ ] I'm reporting a broken site support
- [ ] I've verified that I'm running youtube-dl version **2020.03.24** - [ ] I've verified that I'm running youtube-dl version **2020.09.20**
- [ ] I've checked that all provided URLs are alive and playable in a browser - [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones - [ ] I've searched the bugtracker for similar issues including closed ones
@ -41,7 +41,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2020.03.24 [debug] youtube-dl version 2020.09.20
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@ -19,7 +19,7 @@ labels: 'site-support-request'
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.03.24. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.09.20. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser. - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights. - Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates. - Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
--> -->
- [ ] I'm reporting a new site support request - [ ] I'm reporting a new site support request
- [ ] I've verified that I'm running youtube-dl version **2020.03.24** - [ ] I've verified that I'm running youtube-dl version **2020.09.20**
- [ ] I've checked that all provided URLs are alive and playable in a browser - [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of provided URLs violate any copyrights - [ ] I've checked that none of provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones - [ ] I've searched the bugtracker for similar site support requests including closed ones

View File

@ -18,13 +18,13 @@ title: ''
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.03.24. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.09.20. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates. - Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x]) - Finally, put x into all relevant boxes (like this [x])
--> -->
- [ ] I'm reporting a site feature request - [ ] I'm reporting a site feature request
- [ ] I've verified that I'm running youtube-dl version **2020.03.24** - [ ] I've verified that I'm running youtube-dl version **2020.09.20**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones - [ ] I've searched the bugtracker for similar site feature requests including closed ones

View File

@ -18,7 +18,7 @@ title: ''
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.03.24. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.09.20. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser. - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape. - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates. - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
--> -->
- [ ] I'm reporting a broken site support issue - [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running youtube-dl version **2020.03.24** - [ ] I've verified that I'm running youtube-dl version **2020.09.20**
- [ ] I've checked that all provided URLs are alive and playable in a browser - [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones - [ ] I've searched the bugtracker for similar bug reports including closed ones
@ -43,7 +43,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2020.03.24 [debug] youtube-dl version 2020.09.20
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@ -19,13 +19,13 @@ labels: 'request'
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.03.24. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.09.20. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates. - Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x]) - Finally, put x into all relevant boxes (like this [x])
--> -->
- [ ] I'm reporting a feature request - [ ] I'm reporting a feature request
- [ ] I've verified that I'm running youtube-dl version **2020.03.24** - [ ] I've verified that I'm running youtube-dl version **2020.09.20**
- [ ] I've searched the bugtracker for similar feature requests including closed ones - [ ] I've searched the bugtracker for similar feature requests including closed ones

View File

@ -153,7 +153,7 @@ After you have ensured this site is distributing its content legally, you can fo
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py). 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in. 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want. 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](http://flake8.pycqa.org/en/latest/index.html#quickstart): 8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
$ flake8 youtube_dl/extractor/yourextractor.py $ flake8 youtube_dl/extractor/yourextractor.py

187
ChangeLog
View File

@ -1,3 +1,190 @@
version 2020.09.20
Core
* [extractor/common] Relax interaction count extraction in _json_ld
+ [extractor/common] Extract author as uploader for VideoObject in _json_ld
* [downloader/hls] Fix incorrect end byte in Range HTTP header for
media segments with EXT-X-BYTERANGE (#14748, #24512)
* [extractor/common] Handle ssl.CertificateError in _request_webpage (#26601)
* [downloader/http] Improve timeout detection when reading block of data
(#10935)
* [downloader/http] Retry download when urlopen times out (#10935, #26603)
Extractors
* [redtube] Extend URL regular expression (#26506)
* [twitch] Refactor
* [twitch:stream] Switch to GraphQL and fix reruns (#26535)
+ [telequebec] Add support for brightcove videos (#25833)
* [pornhub] Extract metadata from JSON-LD (#26614)
* [pornhub] Fix view count extraction (#26621, #26614)
version 2020.09.14
Core
+ [postprocessor/embedthumbnail] Add support for non jpg/png thumbnails
(#25687, #25717)
Extractors
* [rtlnl] Extend URL regular expression (#26549, #25821)
* [youtube] Fix empty description extraction (#26575, #26006)
* [srgssr] Extend URL regular expression (#26555, #26556, #26578)
* [googledrive] Use redirect URLs for source format (#18877, #23919, #24689,
#26565)
* [svtplay] Fix id extraction (#26576)
* [redbulltv] Improve support for rebull.com TV localized URLs (#22063)
+ [redbulltv] Add support for new redbull.com TV URLs (#22037, #22063)
* [soundcloud:pagedplaylist] Reduce pagination limit (#26557)
version 2020.09.06
Core
+ [utils] Recognize wav mimetype (#26463)
Extractors
* [nrktv:episode] Improve video id extraction (#25594, #26369, #26409)
* [youtube] Fix age gate content detection (#26100, #26152, #26311, #26384)
* [youtube:user] Extend URL regular expression (#26443)
* [xhamster] Improve initials regular expression (#26526, #26353)
* [svtplay] Fix video id extraction (#26425, #26428, #26438)
* [twitch] Rework extractors (#12297, #20414, #20604, #21811, #21812, #22979,
#24263, #25010, #25553, #25606)
* Switch to GraphQL
+ Add support for collections
+ Add support for clips and collections playlists
* [biqle] Improve video ext extraction
* [xhamster] Fix extraction (#26157, #26254)
* [xhamster] Extend URL regular expression (#25789, #25804, #25927))
version 2020.07.28
Extractors
* [youtube] Fix sigfunc name extraction (#26134, #26135, #26136, #26137)
* [youtube] Improve description extraction (#25937, #25980)
* [wistia] Restrict embed regular expression (#25969)
* [youtube] Prevent excess HTTP 301 (#25786)
+ [youtube:playlists] Extend URL regular expression (#25810)
+ [bellmedia] Add support for cp24.com clip URLs (#25764)
* [brightcove] Improve embed detection (#25674)
version 2020.06.16.1
Extractors
* [youtube] Force old layout (#25682, #25683, #25680, #25686)
* [youtube] Fix categories and improve tags extraction
version 2020.06.16
Extractors
* [youtube] Fix uploader id and uploader URL extraction
* [youtube] Improve view count extraction
* [youtube] Fix upload date extraction (#25677)
* [youtube] Fix thumbnails extraction (#25676)
* [youtube] Fix playlist and feed extraction (#25675)
+ [facebook] Add support for single-video ID links
+ [youtube] Extract chapters from JSON (#24819)
+ [kaltura] Add support for multiple embeds on a webpage (#25523)
version 2020.06.06
Extractors
* [tele5] Bypass geo restriction
+ [jwplatform] Add support for bypass geo restriction
* [tele5] Prefer jwplatform over nexx (#25533)
* [twitch:stream] Expect 400 and 410 HTTP errors from API
* [twitch:stream] Fix extraction (#25528)
* [twitch] Fix thumbnails extraction (#25531)
+ [twitch] Pass v5 Accept HTTP header (#25531)
* [brightcove] Fix subtitles extraction (#25540)
+ [malltv] Add support for sk.mall.tv (#25445)
* [periscope] Fix untitled broadcasts (#25482)
* [jwplatform] Improve embeds extraction (#25467)
version 2020.05.29
Core
* [postprocessor/ffmpeg] Embed series metadata with --add-metadata
* [utils] Fix file permissions in write_json_file (#12471, #25122)
Extractors
* [ard:beta] Extend URL regular expression (#25405)
+ [youtube] Add support for more invidious instances (#25417)
* [giantbomb] Extend URL regular expression (#25222)
* [ard] Improve URL regular expression (#25134, #25198)
* [redtube] Improve formats extraction and extract m3u8 formats (#25311,
#25321)
* [indavideo] Switch to HTTPS for API request (#25191)
* [redtube] Improve title extraction (#25208)
* [vimeo] Improve format extraction and sorting (#25285)
* [soundcloud] Reduce API playlist page limit (#25274)
+ [youtube] Add support for yewtu.be (#25226)
* [mailru] Fix extraction (#24530, #25239)
* [bellator] Fix mgid extraction (#25195)
version 2020.05.08
Core
* [downloader/http] Request last data block of exact remaining size
* [downloader/http] Finish downloading once received data length matches
expected
* [extractor/common] Use compat_cookiejar_Cookie for _set_cookie to always
ensure cookie name and value are bytestrings on python 2 (#23256, #24776)
+ [compat] Introduce compat_cookiejar_Cookie
* [utils] Improve cookie files support
+ Add support for UTF-8 in cookie files
* Skip malformed cookie file entries instead of crashing (invalid entry
length, invalid expires at)
Extractors
* [youtube] Improve signature cipher extraction (#25187, #25188)
* [iprima] Improve extraction (#25138)
* [uol] Fix extraction (#22007)
+ [orf] Add support for more radio stations (#24938, #24968)
* [dailymotion] Fix typo
- [puhutv] Remove no longer available HTTP formats (#25124)
version 2020.05.03
Core
+ [extractor/common] Extract multiple JSON-LD entries
* [options] Clarify doc on --exec command (#19087, #24883)
* [extractor/common] Skip malformed ISM manifest XMLs while extracting
ISM formats (#24667)
Extractors
* [crunchyroll] Fix and improve extraction (#25096, #25060)
* [youtube] Improve player id extraction
* [youtube] Use redirected video id if any (#25063)
* [yahoo] Fix GYAO Player extraction and relax URL regular expression
(#24178, #24778)
* [tvplay] Fix Viafree extraction (#15189, #24473, #24789)
* [tenplay] Relax URL regular expression (#25001)
+ [prosiebensat1] Extract series metadata
* [prosiebensat1] Improve extraction and remove 7tv.de support (#24948)
- [prosiebensat1] Remove 7tv.de support (#24948)
* [youtube] Fix DRM videos detection (#24736)
* [thisoldhouse] Fix video id extraction (#24548, #24549)
+ [soundcloud] Extract AAC format (#19173, #24708)
* [youtube] Skip broken multifeed videos (#24711)
* [nova:embed] Fix extraction (#24700)
* [motherless] Fix extraction (#24699)
* [twitch:clips] Extend URL regular expression (#24290, #24642)
* [tv4] Fix ISM formats extraction (#24667)
* [tele5] Fix extraction (#24553)
+ [mofosex] Add support for generic embeds (#24633)
+ [youporn] Add support for generic embeds
+ [spankwire] Add support for generic embeds (#24633)
* [spankwire] Fix extraction (#18924, #20648)
version 2020.03.24 version 2020.03.24
Core Core

View File

@ -434,9 +434,9 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
either the path to the binary or its either the path to the binary or its
containing directory. containing directory.
--exec CMD Execute a command on the file after --exec CMD Execute a command on the file after
downloading, similar to find's -exec downloading and post-processing, similar to
syntax. Example: --exec 'adb push {} find's -exec syntax. Example: --exec 'adb
/sdcard/Music/ && rm {}' push {} /sdcard/Music/ && rm {}'
--convert-subs FORMAT Convert the subtitles to other format --convert-subs FORMAT Convert the subtitles to other format
(currently supported: srt|ass|vtt|lrc) (currently supported: srt|ass|vtt|lrc)
@ -545,7 +545,7 @@ The basic usage is not to set any template arguments when downloading a single f
- `extractor` (string): Name of the extractor - `extractor` (string): Name of the extractor
- `extractor_key` (string): Key name of the extractor - `extractor_key` (string): Key name of the extractor
- `epoch` (numeric): Unix epoch when creating the file - `epoch` (numeric): Unix epoch when creating the file
- `autonumber` (numeric): Five-digit number that will be increased with each download, starting at zero - `autonumber` (numeric): Number that will be increased with each download, starting at `--autonumber-start`
- `playlist` (string): Name or id of the playlist that contains the video - `playlist` (string): Name or id of the playlist that contains the video
- `playlist_index` (numeric): Index of the video in the playlist padded with leading zeros according to the total length of the playlist - `playlist_index` (numeric): Index of the video in the playlist padded with leading zeros according to the total length of the playlist
- `playlist_id` (string): Playlist identifier - `playlist_id` (string): Playlist identifier
@ -1032,7 +1032,7 @@ After you have ensured this site is distributing its content legally, you can fo
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py). 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in. 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want. 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](http://flake8.pycqa.org/en/latest/index.html#quickstart): 8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
$ flake8 youtube_dl/extractor/yourextractor.py $ flake8 youtube_dl/extractor/yourextractor.py

View File

@ -497,6 +497,7 @@
- **MNetTV** - **MNetTV**
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net - **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
- **Mofosex** - **Mofosex**
- **MofosexEmbed**
- **Mojvideo** - **Mojvideo**
- **Morningstar**: morningstar.com - **Morningstar**: morningstar.com
- **Motherless** - **Motherless**
@ -619,11 +620,21 @@
- **Ooyala** - **Ooyala**
- **OoyalaExternal** - **OoyalaExternal**
- **OraTV** - **OraTV**
- **orf:burgenland**: Radio Burgenland
- **orf:fm4**: radio FM4 - **orf:fm4**: radio FM4
- **orf:fm4:story**: fm4.orf.at stories - **orf:fm4:story**: fm4.orf.at stories
- **orf:iptv**: iptv.ORF.at - **orf:iptv**: iptv.ORF.at
- **orf:kaernten**: Radio Kärnten
- **orf:noe**: Radio Niederösterreich
- **orf:oberoesterreich**: Radio Oberösterreich
- **orf:oe1**: Radio Österreich 1 - **orf:oe1**: Radio Österreich 1
- **orf:oe3**: Radio Österreich 3
- **orf:salzburg**: Radio Salzburg
- **orf:steiermark**: Radio Steiermark
- **orf:tirol**: Radio Tirol
- **orf:tvthek**: ORF TVthek - **orf:tvthek**: ORF TVthek
- **orf:vorarlberg**: Radio Vorarlberg
- **orf:wien**: Radio Wien
- **OsnatelTV** - **OsnatelTV**
- **OutsideTV** - **OutsideTV**
- **PacktPub** - **PacktPub**
@ -706,6 +717,8 @@
- **RayWenderlichCourse** - **RayWenderlichCourse**
- **RBMARadio** - **RBMARadio**
- **RDS**: RDS.ca - **RDS**: RDS.ca
- **RedBull**
- **RedBullEmbed**
- **RedBullTV** - **RedBullTV**
- **RedBullTVRrnContent** - **RedBullTVRrnContent**
- **Reddit** - **Reddit**
@ -939,16 +952,13 @@
- **TVPlayHome** - **TVPlayHome**
- **Tweakers** - **Tweakers**
- **TwitCasting** - **TwitCasting**
- **twitch:chapter**
- **twitch:clips** - **twitch:clips**
- **twitch:profile**
- **twitch:stream** - **twitch:stream**
- **twitch:video**
- **twitch:videos:all**
- **twitch:videos:highlights**
- **twitch:videos:past-broadcasts**
- **twitch:videos:uploads**
- **twitch:vod** - **twitch:vod**
- **TwitchCollection**
- **TwitchVideos**
- **TwitchVideosClips**
- **TwitchVideosCollections**
- **twitter** - **twitter**
- **twitter:amplify** - **twitter:amplify**
- **twitter:broadcast** - **twitter:broadcast**

View File

@ -39,6 +39,13 @@ class TestYoutubeDLCookieJar(unittest.TestCase):
assert_cookie_has_value('HTTPONLY_COOKIE') assert_cookie_has_value('HTTPONLY_COOKIE')
assert_cookie_has_value('JS_ACCESSIBLE_COOKIE') assert_cookie_has_value('JS_ACCESSIBLE_COOKIE')
def test_malformed_cookies(self):
cookiejar = YoutubeDLCookieJar('./test/testdata/cookies/malformed_cookies.txt')
cookiejar.load(ignore_discard=True, ignore_expires=True)
# Cookies should be empty since all malformed cookie file entries
# will be ignored
self.assertFalse(cookiejar._cookies)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()

View File

@ -803,6 +803,8 @@ class TestUtil(unittest.TestCase):
self.assertEqual(mimetype2ext('text/vtt'), 'vtt') self.assertEqual(mimetype2ext('text/vtt'), 'vtt')
self.assertEqual(mimetype2ext('text/vtt;charset=utf-8'), 'vtt') self.assertEqual(mimetype2ext('text/vtt;charset=utf-8'), 'vtt')
self.assertEqual(mimetype2ext('text/html; charset=utf-8'), 'html') self.assertEqual(mimetype2ext('text/html; charset=utf-8'), 'html')
self.assertEqual(mimetype2ext('audio/x-wav'), 'wav')
self.assertEqual(mimetype2ext('audio/x-wav;codec=pcm'), 'wav')
def test_month_by_name(self): def test_month_by_name(self):
self.assertEqual(month_by_name(None), None) self.assertEqual(month_by_name(None), None)

View File

@ -267,7 +267,7 @@ class TestYoutubeChapters(unittest.TestCase):
for description, duration, expected_chapters in self._TEST_CASES: for description, duration, expected_chapters in self._TEST_CASES:
ie = YoutubeIE() ie = YoutubeIE()
expect_value( expect_value(
self, ie._extract_chapters(description, duration), self, ie._extract_chapters_from_description(description, duration),
expected_chapters, None) expected_chapters, None)

View File

@ -74,6 +74,28 @@ _TESTS = [
] ]
class TestPlayerInfo(unittest.TestCase):
def test_youtube_extract_player_info(self):
PLAYER_URLS = (
('https://www.youtube.com/s/player/64dddad9/player_ias.vflset/en_US/base.js', '64dddad9'),
# obsolete
('https://www.youtube.com/yts/jsbin/player_ias-vfle4-e03/en_US/base.js', 'vfle4-e03'),
('https://www.youtube.com/yts/jsbin/player_ias-vfl49f_g4/en_US/base.js', 'vfl49f_g4'),
('https://www.youtube.com/yts/jsbin/player_ias-vflCPQUIL/en_US/base.js', 'vflCPQUIL'),
('https://www.youtube.com/yts/jsbin/player-vflzQZbt7/en_US/base.js', 'vflzQZbt7'),
('https://www.youtube.com/yts/jsbin/player-en_US-vflaxXRn1/base.js', 'vflaxXRn1'),
('https://s.ytimg.com/yts/jsbin/html5player-en_US-vflXGBaUN.js', 'vflXGBaUN'),
('https://s.ytimg.com/yts/jsbin/html5player-en_US-vflKjOTVq/html5player.js', 'vflKjOTVq'),
('http://s.ytimg.com/yt/swfbin/watch_as3-vflrEm9Nq.swf', 'vflrEm9Nq'),
('https://s.ytimg.com/yts/swfbin/player-vflenCdZL/watch_as3.swf', 'vflenCdZL'),
)
for player_url, expected_player_id in PLAYER_URLS:
expected_player_type = player_url.split('.')[-1]
player_type, player_id = YoutubeIE._extract_player_info(player_url)
self.assertEqual(player_type, expected_player_type)
self.assertEqual(player_id, expected_player_id)
class TestSignature(unittest.TestCase): class TestSignature(unittest.TestCase):
def setUp(self): def setUp(self):
TEST_DIR = os.path.dirname(os.path.abspath(__file__)) TEST_DIR = os.path.dirname(os.path.abspath(__file__))

View File

@ -0,0 +1,9 @@
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This is a generated file! Do not edit.
# Cookie file entry with invalid number of fields - 6 instead of 7
www.foobar.foobar FALSE / FALSE 0 COOKIE
# Cookie file entry with invalid expires at
www.foobar.foobar FALSE / FALSE 1.7976931348623157e+308 COOKIE VALUE

View File

@ -57,6 +57,17 @@ try:
except ImportError: # Python 2 except ImportError: # Python 2
import cookielib as compat_cookiejar import cookielib as compat_cookiejar
if sys.version_info[0] == 2:
class compat_cookiejar_Cookie(compat_cookiejar.Cookie):
def __init__(self, version, name, value, *args, **kwargs):
if isinstance(name, compat_str):
name = name.encode()
if isinstance(value, compat_str):
value = value.encode()
compat_cookiejar.Cookie.__init__(self, version, name, value, *args, **kwargs)
else:
compat_cookiejar_Cookie = compat_cookiejar.Cookie
try: try:
import http.cookies as compat_cookies import http.cookies as compat_cookies
except ImportError: # Python 2 except ImportError: # Python 2
@ -2987,6 +2998,7 @@ __all__ = [
'compat_basestring', 'compat_basestring',
'compat_chr', 'compat_chr',
'compat_cookiejar', 'compat_cookiejar',
'compat_cookiejar_Cookie',
'compat_cookies', 'compat_cookies',
'compat_ctypes_WINFUNCTYPE', 'compat_ctypes_WINFUNCTYPE',
'compat_etree_Element', 'compat_etree_Element',

View File

@ -141,7 +141,7 @@ class HlsFD(FragmentFD):
count = 0 count = 0
headers = info_dict.get('http_headers', {}) headers = info_dict.get('http_headers', {})
if byte_range: if byte_range:
headers['Range'] = 'bytes=%d-%d' % (byte_range['start'], byte_range['end']) headers['Range'] = 'bytes=%d-%d' % (byte_range['start'], byte_range['end'] - 1)
while count <= fragment_retries: while count <= fragment_retries:
try: try:
success, frag_content = self._download_fragment( success, frag_content = self._download_fragment(

View File

@ -106,7 +106,12 @@ class HttpFD(FileDownloader):
set_range(request, range_start, range_end) set_range(request, range_start, range_end)
# Establish connection # Establish connection
try: try:
ctx.data = self.ydl.urlopen(request) try:
ctx.data = self.ydl.urlopen(request)
except (compat_urllib_error.URLError, ) as err:
if isinstance(err.reason, socket.timeout):
raise RetryDownload(err)
raise err
# When trying to resume, Content-Range HTTP header of response has to be checked # When trying to resume, Content-Range HTTP header of response has to be checked
# to match the value of requested Range HTTP header. This is due to a webservers # to match the value of requested Range HTTP header. This is due to a webservers
# that don't support resuming and serve a whole file with no Content-Range # that don't support resuming and serve a whole file with no Content-Range
@ -218,24 +223,27 @@ class HttpFD(FileDownloader):
def retry(e): def retry(e):
to_stdout = ctx.tmpfilename == '-' to_stdout = ctx.tmpfilename == '-'
if not to_stdout: if ctx.stream is not None:
ctx.stream.close() if not to_stdout:
ctx.stream = None ctx.stream.close()
ctx.stream = None
ctx.resume_len = byte_counter if to_stdout else os.path.getsize(encodeFilename(ctx.tmpfilename)) ctx.resume_len = byte_counter if to_stdout else os.path.getsize(encodeFilename(ctx.tmpfilename))
raise RetryDownload(e) raise RetryDownload(e)
while True: while True:
try: try:
# Download and write # Download and write
data_block = ctx.data.read(block_size if not is_test else min(block_size, data_len - byte_counter)) data_block = ctx.data.read(block_size if data_len is None else min(block_size, data_len - byte_counter))
# socket.timeout is a subclass of socket.error but may not have # socket.timeout is a subclass of socket.error but may not have
# errno set # errno set
except socket.timeout as e: except socket.timeout as e:
retry(e) retry(e)
except socket.error as e: except socket.error as e:
if e.errno not in (errno.ECONNRESET, errno.ETIMEDOUT): # SSLError on python 2 (inherits socket.error) may have
raise # no errno set but this error message
retry(e) if e.errno in (errno.ECONNRESET, errno.ETIMEDOUT) or getattr(e, 'message', None) == 'The read operation timed out':
retry(e)
raise
byte_counter += len(data_block) byte_counter += len(data_block)
@ -299,7 +307,7 @@ class HttpFD(FileDownloader):
'elapsed': now - ctx.start_time, 'elapsed': now - ctx.start_time,
}) })
if is_test and byte_counter == data_len: if data_len is not None and byte_counter == data_len:
break break
if not is_test and ctx.chunk_size and ctx.data_len is not None and byte_counter < ctx.data_len: if not is_test and ctx.chunk_size and ctx.data_len is not None and byte_counter < ctx.data_len:

View File

@ -249,7 +249,7 @@ class ARDMediathekIE(ARDMediathekBaseIE):
class ARDIE(InfoExtractor): class ARDIE(InfoExtractor):
_VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html' _VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos(?:extern)?/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
_TESTS = [{ _TESTS = [{
# available till 14.02.2019 # available till 14.02.2019
'url': 'http://www.daserste.de/information/talk/maischberger/videos/das-groko-drama-zerlegen-sich-die-volksparteien-video-102.html', 'url': 'http://www.daserste.de/information/talk/maischberger/videos/das-groko-drama-zerlegen-sich-die-volksparteien-video-102.html',
@ -263,6 +263,9 @@ class ARDIE(InfoExtractor):
'upload_date': '20180214', 'upload_date': '20180214',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
}, },
}, {
'url': 'https://www.daserste.de/information/reportage-dokumentation/erlebnis-erde/videosextern/woelfe-und-herdenschutzhunde-ungleiche-brueder-102.html',
'only_matching': True,
}, { }, {
'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html', 'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
'only_matching': True, 'only_matching': True,
@ -310,9 +313,9 @@ class ARDIE(InfoExtractor):
class ARDBetaMediathekIE(ARDMediathekBaseIE): class ARDBetaMediathekIE(ARDMediathekBaseIE):
_VALID_URL = r'https://(?:beta|www)\.ardmediathek\.de/(?P<client>[^/]+)/(?:player|live)/(?P<video_id>[a-zA-Z0-9]+)(?:/(?P<display_id>[^/?#]+))?' _VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?P<client>[^/]+)/(?:player|live|video)/(?P<display_id>(?:[^/]+/)*)(?P<video_id>[a-zA-Z0-9]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://beta.ardmediathek.de/ard/player/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE/die-robuste-roswita', 'url': 'https://ardmediathek.de/ard/video/die-robuste-roswita/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
'md5': 'dfdc87d2e7e09d073d5a80770a9ce88f', 'md5': 'dfdc87d2e7e09d073d5a80770a9ce88f',
'info_dict': { 'info_dict': {
'display_id': 'die-robuste-roswita', 'display_id': 'die-robuste-roswita',
@ -325,6 +328,15 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
'upload_date': '20191222', 'upload_date': '20191222',
'ext': 'mp4', 'ext': 'mp4',
}, },
}, {
'url': 'https://beta.ardmediathek.de/ard/video/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
'only_matching': True,
}, {
'url': 'https://ardmediathek.de/ard/video/saartalk/saartalk-gesellschaftsgift-haltung-gegen-hass/sr-fernsehen/Y3JpZDovL3NyLW9ubGluZS5kZS9TVF84MTY4MA/',
'only_matching': True,
}, {
'url': 'https://www.ardmediathek.de/ard/video/trailer/private-eyes-s01-e01/one/Y3JpZDovL3dkci5kZS9CZWl0cmFnLTE1MTgwYzczLWNiMTEtNGNkMS1iMjUyLTg5MGYzOWQxZmQ1YQ/',
'only_matching': True,
}, { }, {
'url': 'https://www.ardmediathek.de/ard/player/Y3JpZDovL3N3ci5kZS9hZXgvbzEwNzE5MTU/', 'url': 'https://www.ardmediathek.de/ard/player/Y3JpZDovL3N3ci5kZS9hZXgvbzEwNzE5MTU/',
'only_matching': True, 'only_matching': True,
@ -336,7 +348,11 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id') video_id = mobj.group('video_id')
display_id = mobj.group('display_id') or video_id display_id = mobj.group('display_id')
if display_id:
display_id = display_id.rstrip('/')
if not display_id:
display_id = video_id
player_page = self._download_json( player_page = self._download_json(
'https://api.ardmediathek.de/public-gateway', 'https://api.ardmediathek.de/public-gateway',

View File

@ -115,12 +115,14 @@ class BandcampIE(InfoExtractor):
track_number = int_or_none(track_info.get('track_num')) track_number = int_or_none(track_info.get('track_num'))
duration = float_or_none(track_info.get('duration')) duration = float_or_none(track_info.get('duration'))
# r'\b%s\s*["\']?\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % key,
def extract(key): def extract(key):
return self._search_regex( return self._search_regex(
r'\b%s\s*["\']?\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % key, r'\b%s\s*["\']?\s*:\s*(["\'])(?P<value>.+)\1' % key,
webpage, key, default=None, group='value') webpage, key, default=None, group='value')
artist = extract('artist') artist = extract('artist')
album = extract('album_title') album = extract('album_title')
timestamp = unified_timestamp( timestamp = unified_timestamp(
extract('publish_date') or extract('album_publish_date')) extract('publish_date') or extract('album_publish_date'))

View File

@ -528,7 +528,7 @@ class BBCCoUkIE(InfoExtractor):
def get_programme_id(item): def get_programme_id(item):
def get_from_attributes(item): def get_from_attributes(item):
for p in('identifier', 'group'): for p in ('identifier', 'group'):
value = item.get(p) value = item.get(p)
if value and re.match(r'^[pb][\da-z]{7}$', value): if value and re.match(r'^[pb][\da-z]{7}$', value):
return value return value

View File

@ -25,8 +25,8 @@ class BellMediaIE(InfoExtractor):
etalk| etalk|
marilyn marilyn
)\.ca| )\.ca|
much\.com (?:much|cp24)\.com
)/.*?(?:\bvid(?:eoid)?=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})''' )/.*?(?:\b(?:vid(?:eoid)?|clipId)=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
_TESTS = [{ _TESTS = [{
'url': 'https://www.bnnbloomberg.ca/video/david-cockfield-s-top-picks~1403070', 'url': 'https://www.bnnbloomberg.ca/video/david-cockfield-s-top-picks~1403070',
'md5': '36d3ef559cfe8af8efe15922cd3ce950', 'md5': '36d3ef559cfe8af8efe15922cd3ce950',
@ -62,6 +62,9 @@ class BellMediaIE(InfoExtractor):
}, { }, {
'url': 'http://www.etalk.ca/video?videoid=663455', 'url': 'http://www.etalk.ca/video?videoid=663455',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.cp24.com/video?clipId=1982548',
'only_matching': True,
}] }]
_DOMAINS = { _DOMAINS = {
'thecomedynetwork': 'comedy', 'thecomedynetwork': 'comedy',

View File

@ -3,10 +3,11 @@ from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from .vk import VKIE from .vk import VKIE
from ..utils import ( from ..compat import (
HEADRequest, compat_b64decode,
int_or_none, compat_urllib_parse_unquote,
) )
from ..utils import int_or_none
class BIQLEIE(InfoExtractor): class BIQLEIE(InfoExtractor):
@ -47,9 +48,16 @@ class BIQLEIE(InfoExtractor):
if VKIE.suitable(embed_url): if VKIE.suitable(embed_url):
return self.url_result(embed_url, VKIE.ie_key(), video_id) return self.url_result(embed_url, VKIE.ie_key(), video_id)
self._request_webpage( embed_page = self._download_webpage(
HEADRequest(embed_url), video_id, headers={'Referer': url}) embed_url, video_id, headers={'Referer': url})
video_id, sig, _, access_token = self._get_cookies(embed_url)['video_ext'].value.split('%3A') video_ext = self._get_cookies(embed_url).get('video_ext')
if video_ext:
video_ext = compat_urllib_parse_unquote(video_ext.value)
if not video_ext:
video_ext = compat_b64decode(self._search_regex(
r'video_ext\s*:\s*[\'"]([A-Za-z0-9+/=]+)',
embed_page, 'video_ext')).decode()
video_id, sig, _, access_token = video_ext.split(':')
item = self._download_json( item = self._download_json(
'https://api.vk.com/method/video.get', video_id, 'https://api.vk.com/method/video.get', video_id,
headers={'User-Agent': 'okhttp/3.4.1'}, query={ headers={'User-Agent': 'okhttp/3.4.1'}, query={

View File

@ -5,32 +5,34 @@ import base64
import re import re
import struct import struct
from .common import InfoExtractor
from .adobepass import AdobePassIE from .adobepass import AdobePassIE
from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_etree_fromstring, compat_etree_fromstring,
compat_HTTPError,
compat_parse_qs, compat_parse_qs,
compat_urllib_parse_urlparse, compat_urllib_parse_urlparse,
compat_urlparse, compat_urlparse,
compat_xml_parse_error, compat_xml_parse_error,
compat_HTTPError,
) )
from ..utils import ( from ..utils import (
ExtractorError, clean_html,
extract_attributes, extract_attributes,
ExtractorError,
find_xpath_attr, find_xpath_attr,
fix_xml_ampersands, fix_xml_ampersands,
float_or_none, float_or_none,
js_to_json,
int_or_none, int_or_none,
js_to_json,
mimetype2ext,
parse_iso8601, parse_iso8601,
smuggle_url, smuggle_url,
str_or_none,
unescapeHTML, unescapeHTML,
unsmuggle_url, unsmuggle_url,
update_url_query,
clean_html,
mimetype2ext,
UnsupportedError, UnsupportedError,
update_url_query,
url_or_none,
) )
@ -424,7 +426,7 @@ class BrightcoveNewIE(AdobePassIE):
# [2] looks like: # [2] looks like:
for video, script_tag, account_id, player_id, embed in re.findall( for video, script_tag, account_id, player_id, embed in re.findall(
r'''(?isx) r'''(?isx)
(<video\s+[^>]*\bdata-video-id\s*=\s*['"]?[^>]+>) (<video(?:-js)?\s+[^>]*\bdata-video-id\s*=\s*['"]?[^>]+>)
(?:.*? (?:.*?
(<script[^>]+ (<script[^>]+
src=["\'](?:https?:)?//players\.brightcove\.net/ src=["\'](?:https?:)?//players\.brightcove\.net/
@ -553,10 +555,16 @@ class BrightcoveNewIE(AdobePassIE):
subtitles = {} subtitles = {}
for text_track in json_data.get('text_tracks', []): for text_track in json_data.get('text_tracks', []):
if text_track.get('src'): if text_track.get('kind') != 'captions':
subtitles.setdefault(text_track.get('srclang'), []).append({ continue
'url': text_track['src'], text_track_url = url_or_none(text_track.get('src'))
}) if not text_track_url:
continue
lang = (str_or_none(text_track.get('srclang'))
or str_or_none(text_track.get('label')) or 'en').lower()
subtitles.setdefault(lang, []).append({
'url': text_track_url,
})
is_live = False is_live = False
duration = float_or_none(json_data.get('duration'), 1000) duration = float_or_none(json_data.get('duration'), 1000)

View File

@ -10,12 +10,13 @@ import os
import random import random
import re import re
import socket import socket
import ssl
import sys import sys
import time import time
import math import math
from ..compat import ( from ..compat import (
compat_cookiejar, compat_cookiejar_Cookie,
compat_cookies, compat_cookies,
compat_etree_Element, compat_etree_Element,
compat_etree_fromstring, compat_etree_fromstring,
@ -67,6 +68,7 @@ from ..utils import (
sanitized_Request, sanitized_Request,
sanitize_filename, sanitize_filename,
str_or_none, str_or_none,
str_to_int,
strip_or_none, strip_or_none,
unescapeHTML, unescapeHTML,
unified_strdate, unified_strdate,
@ -623,9 +625,12 @@ class InfoExtractor(object):
url_or_request = update_url_query(url_or_request, query) url_or_request = update_url_query(url_or_request, query)
if data is not None or headers: if data is not None or headers:
url_or_request = sanitized_Request(url_or_request, data, headers) url_or_request = sanitized_Request(url_or_request, data, headers)
exceptions = [compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error]
if hasattr(ssl, 'CertificateError'):
exceptions.append(ssl.CertificateError)
try: try:
return self._downloader.urlopen(url_or_request) return self._downloader.urlopen(url_or_request)
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err: except tuple(exceptions) as err:
if isinstance(err, compat_urllib_error.HTTPError): if isinstance(err, compat_urllib_error.HTTPError):
if self.__can_accept_status_code(err, expected_status): if self.__can_accept_status_code(err, expected_status):
# Retain reference to error to prevent file object from # Retain reference to error to prevent file object from
@ -1182,16 +1187,33 @@ class InfoExtractor(object):
'twitter card player') 'twitter card player')
def _search_json_ld(self, html, video_id, expected_type=None, **kwargs): def _search_json_ld(self, html, video_id, expected_type=None, **kwargs):
json_ld = self._search_regex( json_ld_list = list(re.finditer(JSON_LD_RE, html))
JSON_LD_RE, html, 'JSON-LD', group='json_ld', **kwargs)
default = kwargs.get('default', NO_DEFAULT) default = kwargs.get('default', NO_DEFAULT)
if not json_ld:
return default if default is not NO_DEFAULT else {}
# JSON-LD may be malformed and thus `fatal` should be respected. # JSON-LD may be malformed and thus `fatal` should be respected.
# At the same time `default` may be passed that assumes `fatal=False` # At the same time `default` may be passed that assumes `fatal=False`
# for _search_regex. Let's simulate the same behavior here as well. # for _search_regex. Let's simulate the same behavior here as well.
fatal = kwargs.get('fatal', True) if default == NO_DEFAULT else False fatal = kwargs.get('fatal', True) if default == NO_DEFAULT else False
return self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type) json_ld = []
for mobj in json_ld_list:
json_ld_item = self._parse_json(
mobj.group('json_ld'), video_id, fatal=fatal)
if not json_ld_item:
continue
if isinstance(json_ld_item, dict):
json_ld.append(json_ld_item)
elif isinstance(json_ld_item, (list, tuple)):
json_ld.extend(json_ld_item)
if json_ld:
json_ld = self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type)
if json_ld:
return json_ld
if default is not NO_DEFAULT:
return default
elif fatal:
raise RegexNotFoundError('Unable to extract JSON-LD')
else:
self._downloader.report_warning('unable to extract JSON-LD %s' % bug_reports_message())
return {}
def _json_ld(self, json_ld, video_id, fatal=True, expected_type=None): def _json_ld(self, json_ld, video_id, fatal=True, expected_type=None):
if isinstance(json_ld, compat_str): if isinstance(json_ld, compat_str):
@ -1227,7 +1249,10 @@ class InfoExtractor(object):
interaction_type = is_e.get('interactionType') interaction_type = is_e.get('interactionType')
if not isinstance(interaction_type, compat_str): if not isinstance(interaction_type, compat_str):
continue continue
interaction_count = int_or_none(is_e.get('userInteractionCount')) # For interaction count some sites provide string instead of
# an integer (as per spec) with non digit characters (e.g. ",")
# so extracting count with more relaxed str_to_int
interaction_count = str_to_int(is_e.get('userInteractionCount'))
if interaction_count is None: if interaction_count is None:
continue continue
count_kind = INTERACTION_TYPE_MAP.get(interaction_type.split('/')[-1]) count_kind = INTERACTION_TYPE_MAP.get(interaction_type.split('/')[-1])
@ -1247,6 +1272,7 @@ class InfoExtractor(object):
'thumbnail': url_or_none(e.get('thumbnailUrl') or e.get('thumbnailURL')), 'thumbnail': url_or_none(e.get('thumbnailUrl') or e.get('thumbnailURL')),
'duration': parse_duration(e.get('duration')), 'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')), 'timestamp': unified_timestamp(e.get('uploadDate')),
'uploader': str_or_none(e.get('author')),
'filesize': float_or_none(e.get('contentSize')), 'filesize': float_or_none(e.get('contentSize')),
'tbr': int_or_none(e.get('bitrate')), 'tbr': int_or_none(e.get('bitrate')),
'width': int_or_none(e.get('width')), 'width': int_or_none(e.get('width')),
@ -1256,10 +1282,10 @@ class InfoExtractor(object):
extract_interaction_statistic(e) extract_interaction_statistic(e)
for e in json_ld: for e in json_ld:
if isinstance(e.get('@context'), compat_str) and re.match(r'^https?://schema.org/?$', e.get('@context')): if '@context' in e:
item_type = e.get('@type') item_type = e.get('@type')
if expected_type is not None and expected_type != item_type: if expected_type is not None and expected_type != item_type:
return info continue
if item_type in ('TVEpisode', 'Episode'): if item_type in ('TVEpisode', 'Episode'):
episode_name = unescapeHTML(e.get('name')) episode_name = unescapeHTML(e.get('name'))
info.update({ info.update({
@ -1293,11 +1319,17 @@ class InfoExtractor(object):
}) })
elif item_type == 'VideoObject': elif item_type == 'VideoObject':
extract_video_object(e) extract_video_object(e)
continue if expected_type is None:
continue
else:
break
video = e.get('video') video = e.get('video')
if isinstance(video, dict) and video.get('@type') == 'VideoObject': if isinstance(video, dict) and video.get('@type') == 'VideoObject':
extract_video_object(video) extract_video_object(video)
break if expected_type is None:
continue
else:
break
return dict((k, v) for k, v in info.items() if v is not None) return dict((k, v) for k, v in info.items() if v is not None)
@staticmethod @staticmethod
@ -2820,7 +2852,7 @@ class InfoExtractor(object):
def _set_cookie(self, domain, name, value, expire_time=None, port=None, def _set_cookie(self, domain, name, value, expire_time=None, port=None,
path='/', secure=False, discard=False, rest={}, **kwargs): path='/', secure=False, discard=False, rest={}, **kwargs):
cookie = compat_cookiejar.Cookie( cookie = compat_cookiejar_Cookie(
0, name, value, port, port is not None, domain, True, 0, name, value, port, port is not None, domain, True,
domain.startswith('.'), path, True, secure, expire_time, domain.startswith('.'), path, True, secure, expire_time,
discard, None, None, rest) discard, None, None, rest)

View File

@ -13,6 +13,7 @@ from ..compat import (
compat_b64decode, compat_b64decode,
compat_etree_Element, compat_etree_Element,
compat_etree_fromstring, compat_etree_fromstring,
compat_str,
compat_urllib_parse_urlencode, compat_urllib_parse_urlencode,
compat_urllib_request, compat_urllib_request,
compat_urlparse, compat_urlparse,
@ -25,9 +26,9 @@ from ..utils import (
intlist_to_bytes, intlist_to_bytes,
int_or_none, int_or_none,
lowercase_escape, lowercase_escape,
merge_dicts,
remove_end, remove_end,
sanitized_Request, sanitized_Request,
unified_strdate,
urlencode_postdata, urlencode_postdata,
xpath_text, xpath_text,
) )
@ -136,6 +137,7 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
# rtmp # rtmp
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Video gone',
}, { }, {
'url': 'http://www.crunchyroll.com/media-589804/culture-japan-1', 'url': 'http://www.crunchyroll.com/media-589804/culture-japan-1',
'info_dict': { 'info_dict': {
@ -157,11 +159,12 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
'info_dict': { 'info_dict': {
'id': '702409', 'id': '702409',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Re:ZERO -Starting Life in Another World- Episode 5 The Morning of Our Promise Is Still Distant', 'title': compat_str,
'description': 'md5:97664de1ab24bbf77a9c01918cb7dca9', 'description': compat_str,
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'TV TOKYO', 'uploader': 'Re:Zero Partners',
'upload_date': '20160508', 'timestamp': 1462098900,
'upload_date': '20160501',
}, },
'params': { 'params': {
# m3u8 download # m3u8 download
@ -172,12 +175,13 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
'info_dict': { 'info_dict': {
'id': '727589', 'id': '727589',
'ext': 'mp4', 'ext': 'mp4',
'title': "KONOSUBA -God's blessing on this wonderful world! 2 Episode 1 Give Me Deliverance From This Judicial Injustice!", 'title': compat_str,
'description': 'md5:cbcf05e528124b0f3a0a419fc805ea7d', 'description': compat_str,
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Kadokawa Pictures Inc.', 'uploader': 'Kadokawa Pictures Inc.',
'upload_date': '20170118', 'timestamp': 1484130900,
'series': "KONOSUBA -God's blessing on this wonderful world!", 'upload_date': '20170111',
'series': compat_str,
'season': "KONOSUBA -God's blessing on this wonderful world! 2", 'season': "KONOSUBA -God's blessing on this wonderful world! 2",
'season_number': 2, 'season_number': 2,
'episode': 'Give Me Deliverance From This Judicial Injustice!', 'episode': 'Give Me Deliverance From This Judicial Injustice!',
@ -200,10 +204,11 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
'info_dict': { 'info_dict': {
'id': '535080', 'id': '535080',
'ext': 'mp4', 'ext': 'mp4',
'title': '11eyes Episode 1 Red Night ~ Piros éjszaka', 'title': compat_str,
'description': 'Kakeru and Yuka are thrown into an alternate nightmarish world they call "Red Night".', 'description': compat_str,
'uploader': 'Marvelous AQL Inc.', 'uploader': 'Marvelous AQL Inc.',
'upload_date': '20091021', 'timestamp': 1255512600,
'upload_date': '20091014',
}, },
'params': { 'params': {
# Just test metadata extraction # Just test metadata extraction
@ -224,15 +229,17 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
# just test metadata extraction # just test metadata extraction
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Video gone',
}, { }, {
# A video with a vastly different season name compared to the series name # A video with a vastly different season name compared to the series name
'url': 'http://www.crunchyroll.com/nyarko-san-another-crawling-chaos/episode-1-test-590532', 'url': 'http://www.crunchyroll.com/nyarko-san-another-crawling-chaos/episode-1-test-590532',
'info_dict': { 'info_dict': {
'id': '590532', 'id': '590532',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Haiyoru! Nyaruani (ONA) Episode 1 Test', 'title': compat_str,
'description': 'Mahiro and Nyaruko talk about official certification.', 'description': compat_str,
'uploader': 'TV TOKYO', 'uploader': 'TV TOKYO',
'timestamp': 1330956000,
'upload_date': '20120305', 'upload_date': '20120305',
'series': 'Nyarko-san: Another Crawling Chaos', 'series': 'Nyarko-san: Another Crawling Chaos',
'season': 'Haiyoru! Nyaruani (ONA)', 'season': 'Haiyoru! Nyaruani (ONA)',
@ -442,23 +449,21 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
webpage, 'language', default=None, group='lang') webpage, 'language', default=None, group='lang')
video_title = self._html_search_regex( video_title = self._html_search_regex(
r'(?s)<h1[^>]*>((?:(?!<h1).)*?<span[^>]+itemprop=["\']title["\'][^>]*>(?:(?!<h1).)+?)</h1>', (r'(?s)<h1[^>]*>((?:(?!<h1).)*?<(?:span[^>]+itemprop=["\']title["\']|meta[^>]+itemprop=["\']position["\'])[^>]*>(?:(?!<h1).)+?)</h1>',
webpage, 'video_title') r'<title>(.+?),\s+-\s+.+? Crunchyroll'),
webpage, 'video_title', default=None)
if not video_title:
video_title = re.sub(r'^Watch\s+', '', self._og_search_description(webpage))
video_title = re.sub(r' {2,}', ' ', video_title) video_title = re.sub(r' {2,}', ' ', video_title)
video_description = (self._parse_json(self._html_search_regex( video_description = (self._parse_json(self._html_search_regex(
r'<script[^>]*>\s*.+?\[media_id=%s\].+?({.+?"description"\s*:.+?})\);' % video_id, r'<script[^>]*>\s*.+?\[media_id=%s\].+?({.+?"description"\s*:.+?})\);' % video_id,
webpage, 'description', default='{}'), video_id) or media_metadata).get('description') webpage, 'description', default='{}'), video_id) or media_metadata).get('description')
if video_description: if video_description:
video_description = lowercase_escape(video_description.replace(r'\r\n', '\n')) video_description = lowercase_escape(video_description.replace(r'\r\n', '\n'))
video_upload_date = self._html_search_regex(
[r'<div>Availability for free users:(.+?)</div>', r'<div>[^<>]+<span>\s*(.+?\d{4})\s*</span></div>'],
webpage, 'video_upload_date', fatal=False, flags=re.DOTALL)
if video_upload_date:
video_upload_date = unified_strdate(video_upload_date)
video_uploader = self._html_search_regex( video_uploader = self._html_search_regex(
# try looking for both an uploader that's a link and one that's not # try looking for both an uploader that's a link and one that's not
[r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', r'<div>\s*Publisher:\s*<span>\s*(.+?)\s*</span>\s*</div>'], [r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', r'<div>\s*Publisher:\s*<span>\s*(.+?)\s*</span>\s*</div>'],
webpage, 'video_uploader', fatal=False) webpage, 'video_uploader', default=False)
formats = [] formats = []
for stream in media.get('streams', []): for stream in media.get('streams', []):
@ -611,14 +616,15 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
r'(?s)<h\d[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h\d>\s*<h4>\s*Season (\d+)', r'(?s)<h\d[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h\d>\s*<h4>\s*Season (\d+)',
webpage, 'season number', default=None)) webpage, 'season number', default=None))
return { info = self._search_json_ld(webpage, video_id, default={})
return merge_dicts({
'id': video_id, 'id': video_id,
'title': video_title, 'title': video_title,
'description': video_description, 'description': video_description,
'duration': duration, 'duration': duration,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'uploader': video_uploader, 'uploader': video_uploader,
'upload_date': video_upload_date,
'series': series, 'series': series,
'season': season, 'season': season,
'season_number': season_number, 'season_number': season_number,
@ -626,7 +632,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'episode_number': episode_number, 'episode_number': episode_number,
'subtitles': subtitles, 'subtitles': subtitles,
'formats': formats, 'formats': formats,
} }, info)
class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE): class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE):

View File

@ -32,7 +32,7 @@ class DailymotionBaseInfoExtractor(InfoExtractor):
@staticmethod @staticmethod
def _get_cookie_value(cookies, name): def _get_cookie_value(cookies, name):
cookie = cookies.get('name') cookie = cookies.get(name)
if cookie: if cookie:
return cookie.value return cookie.value

View File

@ -15,7 +15,7 @@ from ..utils import (
class ExpressenIE(InfoExtractor): class ExpressenIE(InfoExtractor):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?:www\.)?expressen\.se/ (?:www\.)?(?:expressen|di)\.se/
(?:(?:tvspelare/video|videoplayer/embed)/)? (?:(?:tvspelare/video|videoplayer/embed)/)?
tv/(?:[^/]+/)* tv/(?:[^/]+/)*
(?P<id>[^/?#&]+) (?P<id>[^/?#&]+)
@ -42,13 +42,16 @@ class ExpressenIE(InfoExtractor):
}, { }, {
'url': 'https://www.expressen.se/videoplayer/embed/tv/ditv/ekonomistudion/experterna-har-ar-fragorna-som-avgor-valet/?embed=true&external=true&autoplay=true&startVolume=0&partnerId=di', 'url': 'https://www.expressen.se/videoplayer/embed/tv/ditv/ekonomistudion/experterna-har-ar-fragorna-som-avgor-valet/?embed=true&external=true&autoplay=true&startVolume=0&partnerId=di',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.di.se/videoplayer/embed/tv/ditv/borsmorgon/implantica-rusar-70--under-borspremiaren-hor-styrelsemedlemmen/?embed=true&external=true&autoplay=true&startVolume=0&partnerId=di',
'only_matching': True,
}] }]
@staticmethod @staticmethod
def _extract_urls(webpage): def _extract_urls(webpage):
return [ return [
mobj.group('url') for mobj in re.finditer( mobj.group('url') for mobj in re.finditer(
r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?expressen\.se/(?:tvspelare/video|videoplayer/embed)/tv/.+?)\1', r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?(?:expressen|di)\.se/(?:tvspelare/video|videoplayer/embed)/tv/.+?)\1',
webpage)] webpage)]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -809,6 +809,16 @@ from .orf import (
ORFFM4IE, ORFFM4IE,
ORFFM4StoryIE, ORFFM4StoryIE,
ORFOE1IE, ORFOE1IE,
ORFOE3IE,
ORFNOEIE,
ORFWIEIE,
ORFBGLIE,
ORFOOEIE,
ORFSTMIE,
ORFKTNIE,
ORFSBGIE,
ORFTIRIE,
ORFVBGIE,
ORFIPTVIE, ORFIPTVIE,
) )
from .outsidetv import OutsideTVIE from .outsidetv import OutsideTVIE
@ -913,7 +923,9 @@ from .rbmaradio import RBMARadioIE
from .rds import RDSIE from .rds import RDSIE
from .redbulltv import ( from .redbulltv import (
RedBullTVIE, RedBullTVIE,
RedBullEmbedIE,
RedBullTVRrnContentIE, RedBullTVRrnContentIE,
RedBullIE,
) )
from .reddit import ( from .reddit import (
RedditIE, RedditIE,
@ -1224,14 +1236,11 @@ from .twentymin import TwentyMinutenIE
from .twentythreevideo import TwentyThreeVideoIE from .twentythreevideo import TwentyThreeVideoIE
from .twitcasting import TwitCastingIE from .twitcasting import TwitCastingIE
from .twitch import ( from .twitch import (
TwitchVideoIE,
TwitchChapterIE,
TwitchVodIE, TwitchVodIE,
TwitchProfileIE, TwitchCollectionIE,
TwitchAllVideosIE, TwitchVideosIE,
TwitchUploadsIE, TwitchVideosClipsIE,
TwitchPastBroadcastsIE, TwitchVideosCollectionsIE,
TwitchHighlightsIE,
TwitchStreamIE, TwitchStreamIE,
TwitchClipsIE, TwitchClipsIE,
) )
@ -1401,7 +1410,7 @@ from .webofstories import (
WebOfStoriesPlaylistIE, WebOfStoriesPlaylistIE,
) )
from .weibo import ( from .weibo import (
WeiboIE, WeiboIE,
WeiboMobileIE WeiboMobileIE
) )
from .weiqitv import WeiqiTVIE from .weiqitv import WeiqiTVIE

View File

@ -466,15 +466,18 @@ class FacebookIE(InfoExtractor):
return info_dict return info_dict
if '/posts/' in url: if '/posts/' in url:
entries = [ video_id_json = self._search_regex(
self.url_result('facebook:%s' % vid, FacebookIE.ie_key()) r'(["\'])video_ids\1\s*:\s*(?P<ids>\[.+?\])', webpage, 'video ids', group='ids',
for vid in self._parse_json( default='')
self._search_regex( if video_id_json:
r'(["\'])video_ids\1\s*:\s*(?P<ids>\[.+?\])', entries = [
webpage, 'video ids', group='ids'), self.url_result('facebook:%s' % vid, FacebookIE.ie_key())
video_id)] for vid in self._parse_json(video_id_json, video_id)]
return self.playlist_result(entries, video_id)
return self.playlist_result(entries, video_id) # Single Video?
video_id = self._search_regex(r'video_id:\s*"([0-9]+)"', webpage, 'single video id')
return self.url_result('facebook:%s' % video_id, FacebookIE.ie_key())
else: else:
_, info_dict = self._extract_from_url( _, info_dict = self._extract_from_url(
self._VIDEO_PAGE_TEMPLATE % video_id, self._VIDEO_PAGE_TEMPLATE % video_id,

View File

@ -1708,6 +1708,15 @@ class GenericIE(InfoExtractor):
}, },
'add_ie': ['Kaltura'], 'add_ie': ['Kaltura'],
}, },
{
# multiple kaltura embeds, nsfw
'url': 'https://www.quartier-rouge.be/prive/femmes/kamila-avec-video-jaime-sadomie.html',
'info_dict': {
'id': 'kamila-avec-video-jaime-sadomie',
'title': "Kamila avec vídeo “J'aime sadomie”",
},
'playlist_count': 8,
},
{ {
# Non-standard Vimeo embed # Non-standard Vimeo embed
'url': 'https://openclassrooms.com/courses/understanding-the-web', 'url': 'https://openclassrooms.com/courses/understanding-the-web',
@ -2844,9 +2853,12 @@ class GenericIE(InfoExtractor):
return self.url_result(mobj.group('url'), 'Zapiks') return self.url_result(mobj.group('url'), 'Zapiks')
# Look for Kaltura embeds # Look for Kaltura embeds
kaltura_url = KalturaIE._extract_url(webpage) kaltura_urls = KalturaIE._extract_urls(webpage)
if kaltura_url: if kaltura_urls:
return self.url_result(smuggle_url(kaltura_url, {'source_url': url}), KalturaIE.ie_key()) return self.playlist_from_matches(
kaltura_urls, video_id, video_title,
getter=lambda x: smuggle_url(x, {'source_url': url}),
ie=KalturaIE.ie_key())
# Look for EaglePlatform embeds # Look for EaglePlatform embeds
eagleplatform_url = EaglePlatformIE._extract_url(webpage) eagleplatform_url = EaglePlatformIE._extract_url(webpage)

View File

@ -13,10 +13,10 @@ from ..utils import (
class GiantBombIE(InfoExtractor): class GiantBombIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?giantbomb\.com/videos/(?P<display_id>[^/]+)/(?P<id>\d+-\d+)' _VALID_URL = r'https?://(?:www\.)?giantbomb\.com/(?:videos|shows)/(?P<display_id>[^/]+)/(?P<id>\d+-\d+)'
_TEST = { _TESTS = [{
'url': 'http://www.giantbomb.com/videos/quick-look-destiny-the-dark-below/2300-9782/', 'url': 'http://www.giantbomb.com/videos/quick-look-destiny-the-dark-below/2300-9782/',
'md5': 'c8ea694254a59246a42831155dec57ac', 'md5': '132f5a803e7e0ab0e274d84bda1e77ae',
'info_dict': { 'info_dict': {
'id': '2300-9782', 'id': '2300-9782',
'display_id': 'quick-look-destiny-the-dark-below', 'display_id': 'quick-look-destiny-the-dark-below',
@ -26,7 +26,10 @@ class GiantBombIE(InfoExtractor):
'duration': 2399, 'duration': 2399,
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
} }
} }, {
'url': 'https://www.giantbomb.com/shows/ben-stranding/2970-20212',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)

View File

@ -220,19 +220,27 @@ class GoogleDriveIE(InfoExtractor):
'id': video_id, 'id': video_id,
'export': 'download', 'export': 'download',
}) })
urlh = self._request_webpage(
source_url, video_id, note='Requesting source file', def request_source_file(source_url, kind):
errnote='Unable to request source file', fatal=False) return self._request_webpage(
source_url, video_id, note='Requesting %s file' % kind,
errnote='Unable to request %s file' % kind, fatal=False)
urlh = request_source_file(source_url, 'source')
if urlh: if urlh:
def add_source_format(src_url): def add_source_format(urlh):
formats.append({ formats.append({
'url': src_url, # Use redirect URLs as download URLs in order to calculate
# correct cookies in _calc_cookies.
# Using original URLs may result in redirect loop due to
# google.com's cookies mistakenly used for googleusercontent.com
# redirect URLs (see #23919).
'url': urlh.geturl(),
'ext': determine_ext(title, 'mp4').lower(), 'ext': determine_ext(title, 'mp4').lower(),
'format_id': 'source', 'format_id': 'source',
'quality': 1, 'quality': 1,
}) })
if urlh.headers.get('Content-Disposition'): if urlh.headers.get('Content-Disposition'):
add_source_format(source_url) add_source_format(urlh)
else: else:
confirmation_webpage = self._webpage_read_content( confirmation_webpage = self._webpage_read_content(
urlh, url, video_id, note='Downloading confirmation page', urlh, url, video_id, note='Downloading confirmation page',
@ -242,9 +250,12 @@ class GoogleDriveIE(InfoExtractor):
r'confirm=([^&"\']+)', confirmation_webpage, r'confirm=([^&"\']+)', confirmation_webpage,
'confirmation code', fatal=False) 'confirmation code', fatal=False)
if confirm: if confirm:
add_source_format(update_url_query(source_url, { confirmed_source_url = update_url_query(source_url, {
'confirm': confirm, 'confirm': confirm,
})) })
urlh = request_source_file(confirmed_source_url, 'confirmed source')
if urlh and urlh.headers.get('Content-Disposition'):
add_source_format(urlh)
if not formats: if not formats:
reason = self._search_regex( reason = self._search_regex(

View File

@ -58,7 +58,7 @@ class IndavideoEmbedIE(InfoExtractor):
video_id = self._match_id(url) video_id = self._match_id(url)
video = self._download_json( video = self._download_json(
'http://amfphp.indavideo.hu/SYm0json.php/player.playerHandler.getVideoData/%s' % video_id, 'https://amfphp.indavideo.hu/SYm0json.php/player.playerHandler.getVideoData/%s' % video_id,
video_id)['data'] video_id)['data']
title = video['title'] title = video['title']

View File

@ -16,12 +16,22 @@ class IPrimaIE(InfoExtractor):
_GEO_BYPASS = False _GEO_BYPASS = False
_TESTS = [{ _TESTS = [{
'url': 'http://play.iprima.cz/gondici-s-r-o-33', 'url': 'https://prima.iprima.cz/particka/92-epizoda',
'info_dict': { 'info_dict': {
'id': 'p136534', 'id': 'p51388',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Gondíci s. r. o. (34)', 'title': 'Partička (92)',
'description': 'md5:16577c629d006aa91f59ca8d8e7f99bd', 'description': 'md5:859d53beae4609e6dd7796413f1b6cac',
},
'params': {
'skip_download': True, # m3u8 download
},
}, {
'url': 'https://cnn.iprima.cz/videa/70-epizoda',
'info_dict': {
'id': 'p681554',
'ext': 'mp4',
'title': 'HLAVNÍ ZPRÁVY 3.5.2020',
}, },
'params': { 'params': {
'skip_download': True, # m3u8 download 'skip_download': True, # m3u8 download
@ -68,9 +78,16 @@ class IPrimaIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
title = self._og_search_title(
webpage, default=None) or self._search_regex(
r'<h1>([^<]+)', webpage, 'title')
video_id = self._search_regex( video_id = self._search_regex(
(r'<iframe[^>]+\bsrc=["\'](?:https?:)?//(?:api\.play-backend\.iprima\.cz/prehravac/embedded|prima\.iprima\.cz/[^/]+/[^/]+)\?.*?\bid=(p\d+)', (r'<iframe[^>]+\bsrc=["\'](?:https?:)?//(?:api\.play-backend\.iprima\.cz/prehravac/embedded|prima\.iprima\.cz/[^/]+/[^/]+)\?.*?\bid=(p\d+)',
r'data-product="([^"]+)">'), r'data-product="([^"]+)">',
r'id=["\']player-(p\d+)"',
r'playerId\s*:\s*["\']player-(p\d+)',
r'\bvideos\s*=\s*["\'](p\d+)'),
webpage, 'real id') webpage, 'real id')
playerpage = self._download_webpage( playerpage = self._download_webpage(
@ -125,8 +142,8 @@ class IPrimaIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'title': self._og_search_title(webpage), 'title': title,
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': self._og_search_thumbnail(webpage, default=None),
'formats': formats, 'formats': formats,
'description': self._og_search_description(webpage), 'description': self._og_search_description(webpage, default=None),
} }

View File

@ -4,6 +4,7 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import unsmuggle_url
class JWPlatformIE(InfoExtractor): class JWPlatformIE(InfoExtractor):
@ -32,10 +33,14 @@ class JWPlatformIE(InfoExtractor):
@staticmethod @staticmethod
def _extract_urls(webpage): def _extract_urls(webpage):
return re.findall( return re.findall(
r'<(?:script|iframe)[^>]+?src=["\']((?:https?:)?//content\.jwplatform\.com/players/[a-zA-Z0-9]{8})', r'<(?:script|iframe)[^>]+?src=["\']((?:https?:)?//(?:content\.jwplatform|cdn\.jwplayer)\.com/players/[a-zA-Z0-9]{8})',
webpage) webpage)
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
self._initialize_geo_bypass({
'countries': smuggled_data.get('geo_countries'),
})
video_id = self._match_id(url) video_id = self._match_id(url)
json_data = self._download_json('https://cdn.jwplayer.com/v2/media/' + video_id, video_id) json_data = self._download_json('https://cdn.jwplayer.com/v2/media/' + video_id, video_id)
return self._parse_jwplayer_data(json_data, video_id) return self._parse_jwplayer_data(json_data, video_id)

View File

@ -113,9 +113,14 @@ class KalturaIE(InfoExtractor):
@staticmethod @staticmethod
def _extract_url(webpage): def _extract_url(webpage):
urls = KalturaIE._extract_urls(webpage)
return urls[0] if urls else None
@staticmethod
def _extract_urls(webpage):
# Embed codes: https://knowledge.kaltura.com/embedding-kaltura-media-players-your-site # Embed codes: https://knowledge.kaltura.com/embedding-kaltura-media-players-your-site
mobj = ( finditer = (
re.search( re.finditer(
r"""(?xs) r"""(?xs)
kWidget\.(?:thumb)?[Ee]mbed\( kWidget\.(?:thumb)?[Ee]mbed\(
\{.*? \{.*?
@ -124,7 +129,7 @@ class KalturaIE(InfoExtractor):
(?P<q3>['"])entry_?[Ii]d(?P=q3)\s*:\s* (?P<q3>['"])entry_?[Ii]d(?P=q3)\s*:\s*
(?P<q4>['"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4)(?:,|\s*\}) (?P<q4>['"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4)(?:,|\s*\})
""", webpage) """, webpage)
or re.search( or re.finditer(
r'''(?xs) r'''(?xs)
(?P<q1>["']) (?P<q1>["'])
(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com(?::\d+)?/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)* (?:https?:)?//cdnapi(?:sec)?\.kaltura\.com(?::\d+)?/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)*
@ -138,7 +143,7 @@ class KalturaIE(InfoExtractor):
) )
(?P<q3>["'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3) (?P<q3>["'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3)
''', webpage) ''', webpage)
or re.search( or re.finditer(
r'''(?xs) r'''(?xs)
<(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["']) <(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["'])
(?:https?:)?//(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+) (?:https?:)?//(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
@ -148,7 +153,8 @@ class KalturaIE(InfoExtractor):
(?P=q1) (?P=q1)
''', webpage) ''', webpage)
) )
if mobj: urls = []
for mobj in finditer:
embed_info = mobj.groupdict() embed_info = mobj.groupdict()
for k, v in embed_info.items(): for k, v in embed_info.items():
if v: if v:
@ -160,7 +166,8 @@ class KalturaIE(InfoExtractor):
webpage) webpage)
if service_mobj: if service_mobj:
url = smuggle_url(url, {'service_url': service_mobj.group('id')}) url = smuggle_url(url, {'service_url': service_mobj.group('id')})
return url urls.append(url)
return urls
def _kaltura_api_call(self, video_id, actions, service_url=None, *args, **kwargs): def _kaltura_api_call(self, video_id, actions, service_url=None, *args, **kwargs):
params = actions[0] params = actions[0]

View File

@ -128,6 +128,12 @@ class MailRuIE(InfoExtractor):
'http://api.video.mail.ru/videos/%s.json?new=1' % video_id, 'http://api.video.mail.ru/videos/%s.json?new=1' % video_id,
video_id, 'Downloading video JSON') video_id, 'Downloading video JSON')
headers = {}
video_key = self._get_cookies('https://my.mail.ru').get('video_key')
if video_key:
headers['Cookie'] = 'video_key=%s' % video_key.value
formats = [] formats = []
for f in video_data['videos']: for f in video_data['videos']:
video_url = f.get('url') video_url = f.get('url')
@ -140,6 +146,7 @@ class MailRuIE(InfoExtractor):
'url': video_url, 'url': video_url,
'format_id': format_id, 'format_id': format_id,
'height': height, 'height': height,
'http_headers': headers,
}) })
self._sort_formats(formats) self._sort_formats(formats)

View File

@ -8,7 +8,7 @@ from ..utils import merge_dicts
class MallTVIE(InfoExtractor): class MallTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?mall\.tv/(?:[^/]+/)*(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:(?:www|sk)\.)?mall\.tv/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.mall.tv/18-miliard-pro-neziskovky-opravdu-jsou-sportovci-nebo-clovek-v-tisni-pijavice', 'url': 'https://www.mall.tv/18-miliard-pro-neziskovky-opravdu-jsou-sportovci-nebo-clovek-v-tisni-pijavice',
'md5': '1c4a37f080e1f3023103a7b43458e518', 'md5': '1c4a37f080e1f3023103a7b43458e518',
@ -26,6 +26,9 @@ class MallTVIE(InfoExtractor):
}, { }, {
'url': 'https://www.mall.tv/kdo-to-plati/18-miliard-pro-neziskovky-opravdu-jsou-sportovci-nebo-clovek-v-tisni-pijavice', 'url': 'https://www.mall.tv/kdo-to-plati/18-miliard-pro-neziskovky-opravdu-jsou-sportovci-nebo-clovek-v-tisni-pijavice',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://sk.mall.tv/gejmhaus/reklamacia-nehreje-vyrobnik-tepla-alebo-spekacka',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -26,7 +26,7 @@ class MotherlessIE(InfoExtractor):
'categories': ['Gaming', 'anal', 'reluctant', 'rough', 'Wife'], 'categories': ['Gaming', 'anal', 'reluctant', 'rough', 'Wife'],
'upload_date': '20100913', 'upload_date': '20100913',
'uploader_id': 'famouslyfuckedup', 'uploader_id': 'famouslyfuckedup',
'thumbnail': r're:http://.*\.jpg', 'thumbnail': r're:https?://.*\.jpg',
'age_limit': 18, 'age_limit': 18,
} }
}, { }, {
@ -40,7 +40,7 @@ class MotherlessIE(InfoExtractor):
'game', 'hairy'], 'game', 'hairy'],
'upload_date': '20140622', 'upload_date': '20140622',
'uploader_id': 'Sulivana7x', 'uploader_id': 'Sulivana7x',
'thumbnail': r're:http://.*\.jpg', 'thumbnail': r're:https?://.*\.jpg',
'age_limit': 18, 'age_limit': 18,
}, },
'skip': '404', 'skip': '404',
@ -54,7 +54,7 @@ class MotherlessIE(InfoExtractor):
'categories': ['superheroine heroine superher'], 'categories': ['superheroine heroine superher'],
'upload_date': '20140827', 'upload_date': '20140827',
'uploader_id': 'shade0230', 'uploader_id': 'shade0230',
'thumbnail': r're:http://.*\.jpg', 'thumbnail': r're:https?://.*\.jpg',
'age_limit': 18, 'age_limit': 18,
} }
}, { }, {
@ -76,7 +76,8 @@ class MotherlessIE(InfoExtractor):
raise ExtractorError('Video %s is for friends only' % video_id, expected=True) raise ExtractorError('Video %s is for friends only' % video_id, expected=True)
title = self._html_search_regex( title = self._html_search_regex(
r'id="view-upload-title">\s+([^<]+)<', webpage, 'title') (r'(?s)<div[^>]+\bclass=["\']media-meta-title[^>]+>(.+?)</div>',
r'id="view-upload-title">\s+([^<]+)<'), webpage, 'title')
video_url = (self._html_search_regex( video_url = (self._html_search_regex(
(r'setup\(\{\s*["\']file["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', (r'setup\(\{\s*["\']file["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
r'fileurl\s*=\s*(["\'])(?P<url>(?:(?!\1).)+)\1'), r'fileurl\s*=\s*(["\'])(?P<url>(?:(?!\1).)+)\1'),
@ -84,14 +85,15 @@ class MotherlessIE(InfoExtractor):
or 'http://cdn4.videos.motherlessmedia.com/videos/%s.mp4?fs=opencloud' % video_id) or 'http://cdn4.videos.motherlessmedia.com/videos/%s.mp4?fs=opencloud' % video_id)
age_limit = self._rta_search(webpage) age_limit = self._rta_search(webpage)
view_count = str_to_int(self._html_search_regex( view_count = str_to_int(self._html_search_regex(
r'<strong>Views</strong>\s+([^<]+)<', (r'>(\d+)\s+Views<', r'<strong>Views</strong>\s+([^<]+)<'),
webpage, 'view count', fatal=False)) webpage, 'view count', fatal=False))
like_count = str_to_int(self._html_search_regex( like_count = str_to_int(self._html_search_regex(
r'<strong>Favorited</strong>\s+([^<]+)<', (r'>(\d+)\s+Favorites<', r'<strong>Favorited</strong>\s+([^<]+)<'),
webpage, 'like count', fatal=False)) webpage, 'like count', fatal=False))
upload_date = self._html_search_regex( upload_date = self._html_search_regex(
r'<strong>Uploaded</strong>\s+([^<]+)<', webpage, 'upload date') (r'class=["\']count[^>]+>(\d+\s+[a-zA-Z]{3}\s+\d{4})<',
r'<strong>Uploaded</strong>\s+([^<]+)<'), webpage, 'upload date')
if 'Ago' in upload_date: if 'Ago' in upload_date:
days = int(re.search(r'([0-9]+)', upload_date).group(1)) days = int(re.search(r'([0-9]+)', upload_date).group(1))
upload_date = (datetime.datetime.now() - datetime.timedelta(days=days)).strftime('%Y%m%d') upload_date = (datetime.datetime.now() - datetime.timedelta(days=days)).strftime('%Y%m%d')

View File

@ -6,6 +6,7 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
clean_html, clean_html,
determine_ext,
int_or_none, int_or_none,
js_to_json, js_to_json,
qualities, qualities,
@ -33,42 +34,76 @@ class NovaEmbedIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
bitrates = self._parse_json( duration = None
self._search_regex(
r'(?s)(?:src|bitrates)\s*=\s*({.+?})\s*;', webpage, 'formats'),
video_id, transform_source=js_to_json)
QUALITIES = ('lq', 'mq', 'hq', 'hd')
quality_key = qualities(QUALITIES)
formats = [] formats = []
for format_id, format_list in bitrates.items():
if not isinstance(format_list, list): player = self._parse_json(
format_list = [format_list] self._search_regex(
for format_url in format_list: r'Player\.init\s*\([^,]+,\s*({.+?})\s*,\s*{.+?}\s*\)\s*;',
format_url = url_or_none(format_url) webpage, 'player', default='{}'), video_id, fatal=False)
if not format_url: if player:
continue for format_id, format_list in player['tracks'].items():
if format_id == 'hls': if not isinstance(format_list, list):
formats.extend(self._extract_m3u8_formats( format_list = [format_list]
format_url, video_id, ext='mp4', for format_dict in format_list:
entry_protocol='m3u8_native', m3u8_id='hls', if not isinstance(format_dict, dict):
fatal=False)) continue
continue format_url = url_or_none(format_dict.get('src'))
f = { format_type = format_dict.get('type')
'url': format_url, ext = determine_ext(format_url)
} if (format_type == 'application/x-mpegURL'
f_id = format_id or format_id == 'HLS' or ext == 'm3u8'):
for quality in QUALITIES: formats.extend(self._extract_m3u8_formats(
if '%s.mp4' % quality in format_url: format_url, video_id, 'mp4',
f_id += '-%s' % quality entry_protocol='m3u8_native', m3u8_id='hls',
f.update({ fatal=False))
'quality': quality_key(quality), elif (format_type == 'application/dash+xml'
'format_note': quality.upper(), or format_id == 'DASH' or ext == 'mpd'):
formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id='dash', fatal=False))
else:
formats.append({
'url': format_url,
}) })
break duration = int_or_none(player.get('duration'))
f['format_id'] = f_id else:
formats.append(f) # Old path, not actual as of 08.04.2020
bitrates = self._parse_json(
self._search_regex(
r'(?s)(?:src|bitrates)\s*=\s*({.+?})\s*;', webpage, 'formats'),
video_id, transform_source=js_to_json)
QUALITIES = ('lq', 'mq', 'hq', 'hd')
quality_key = qualities(QUALITIES)
for format_id, format_list in bitrates.items():
if not isinstance(format_list, list):
format_list = [format_list]
for format_url in format_list:
format_url = url_or_none(format_url)
if not format_url:
continue
if format_id == 'hls':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, ext='mp4',
entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False))
continue
f = {
'url': format_url,
}
f_id = format_id
for quality in QUALITIES:
if '%s.mp4' % quality in format_url:
f_id += '-%s' % quality
f.update({
'quality': quality_key(quality),
'format_note': quality.upper(),
})
break
f['format_id'] = f_id
formats.append(f)
self._sort_formats(formats) self._sort_formats(formats)
title = self._og_search_title( title = self._og_search_title(
@ -81,7 +116,8 @@ class NovaEmbedIE(InfoExtractor):
r'poster\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage, r'poster\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
'thumbnail', fatal=False, group='value') 'thumbnail', fatal=False, group='value')
duration = int_or_none(self._search_regex( duration = int_or_none(self._search_regex(
r'videoDuration\s*:\s*(\d+)', webpage, 'duration', fatal=False)) r'videoDuration\s*:\s*(\d+)', webpage, 'duration',
default=duration))
return { return {
'id': video_id, 'id': video_id,

View File

@ -11,7 +11,6 @@ from ..compat import (
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
JSON_LD_RE,
js_to_json, js_to_json,
NO_DEFAULT, NO_DEFAULT,
parse_age_limit, parse_age_limit,
@ -425,13 +424,20 @@ class NRKTVEpisodeIE(InfoExtractor):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
nrk_id = self._parse_json( info = self._search_json_ld(webpage, display_id, default={})
self._search_regex(JSON_LD_RE, webpage, 'JSON-LD', group='json_ld'), nrk_id = info.get('@id') or self._html_search_meta(
display_id)['@id'] 'nrk:program-id', webpage, default=None) or self._search_regex(
r'data-program-id=["\'](%s)' % NRKTVIE._EPISODE_RE, webpage,
'nrk id')
assert re.match(NRKTVIE._EPISODE_RE, nrk_id) assert re.match(NRKTVIE._EPISODE_RE, nrk_id)
return self.url_result(
'nrk:%s' % nrk_id, ie=NRKIE.ie_key(), video_id=nrk_id) info.update({
'_type': 'url_transparent',
'id': nrk_id,
'url': 'nrk:%s' % nrk_id,
'ie_key': NRKIE.ie_key(),
})
return info
class NRKTVSerieBaseIE(InfoExtractor): class NRKTVSerieBaseIE(InfoExtractor):

View File

@ -162,13 +162,12 @@ class ORFTVthekIE(InfoExtractor):
class ORFRadioIE(InfoExtractor): class ORFRadioIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
station = mobj.group('station')
show_date = mobj.group('date') show_date = mobj.group('date')
show_id = mobj.group('show') show_id = mobj.group('show')
data = self._download_json( data = self._download_json(
'http://audioapi.orf.at/%s/api/json/current/broadcast/%s/%s' 'http://audioapi.orf.at/%s/api/json/current/broadcast/%s/%s'
% (station, show_id, show_date), show_id) % (self._API_STATION, show_id, show_date), show_id)
entries = [] entries = []
for info in data['streams']: for info in data['streams']:
@ -183,7 +182,7 @@ class ORFRadioIE(InfoExtractor):
duration = end - start if end and start else None duration = end - start if end and start else None
entries.append({ entries.append({
'id': loop_stream_id.replace('.mp3', ''), 'id': loop_stream_id.replace('.mp3', ''),
'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (station, loop_stream_id), 'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (self._LOOP_STATION, loop_stream_id),
'title': title, 'title': title,
'description': clean_html(data.get('subtitle')), 'description': clean_html(data.get('subtitle')),
'duration': duration, 'duration': duration,
@ -205,6 +204,8 @@ class ORFFM4IE(ORFRadioIE):
IE_NAME = 'orf:fm4' IE_NAME = 'orf:fm4'
IE_DESC = 'radio FM4' IE_DESC = 'radio FM4'
_VALID_URL = r'https?://(?P<station>fm4)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>4\w+)' _VALID_URL = r'https?://(?P<station>fm4)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>4\w+)'
_API_STATION = 'fm4'
_LOOP_STATION = 'fm4'
_TEST = { _TEST = {
'url': 'http://fm4.orf.at/player/20170107/4CC', 'url': 'http://fm4.orf.at/player/20170107/4CC',
@ -223,10 +224,142 @@ class ORFFM4IE(ORFRadioIE):
} }
class ORFNOEIE(ORFRadioIE):
IE_NAME = 'orf:noe'
IE_DESC = 'Radio Niederösterreich'
_VALID_URL = r'https?://(?P<station>noe)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'noe'
_LOOP_STATION = 'oe2n'
_TEST = {
'url': 'https://noe.orf.at/player/20200423/NGM',
'only_matching': True,
}
class ORFWIEIE(ORFRadioIE):
IE_NAME = 'orf:wien'
IE_DESC = 'Radio Wien'
_VALID_URL = r'https?://(?P<station>wien)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'wie'
_LOOP_STATION = 'oe2w'
_TEST = {
'url': 'https://wien.orf.at/player/20200423/WGUM',
'only_matching': True,
}
class ORFBGLIE(ORFRadioIE):
IE_NAME = 'orf:burgenland'
IE_DESC = 'Radio Burgenland'
_VALID_URL = r'https?://(?P<station>burgenland)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'bgl'
_LOOP_STATION = 'oe2b'
_TEST = {
'url': 'https://burgenland.orf.at/player/20200423/BGM',
'only_matching': True,
}
class ORFOOEIE(ORFRadioIE):
IE_NAME = 'orf:oberoesterreich'
IE_DESC = 'Radio Oberösterreich'
_VALID_URL = r'https?://(?P<station>ooe)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'ooe'
_LOOP_STATION = 'oe2o'
_TEST = {
'url': 'https://ooe.orf.at/player/20200423/OGMO',
'only_matching': True,
}
class ORFSTMIE(ORFRadioIE):
IE_NAME = 'orf:steiermark'
IE_DESC = 'Radio Steiermark'
_VALID_URL = r'https?://(?P<station>steiermark)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'stm'
_LOOP_STATION = 'oe2st'
_TEST = {
'url': 'https://steiermark.orf.at/player/20200423/STGMS',
'only_matching': True,
}
class ORFKTNIE(ORFRadioIE):
IE_NAME = 'orf:kaernten'
IE_DESC = 'Radio Kärnten'
_VALID_URL = r'https?://(?P<station>kaernten)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'ktn'
_LOOP_STATION = 'oe2k'
_TEST = {
'url': 'https://kaernten.orf.at/player/20200423/KGUMO',
'only_matching': True,
}
class ORFSBGIE(ORFRadioIE):
IE_NAME = 'orf:salzburg'
IE_DESC = 'Radio Salzburg'
_VALID_URL = r'https?://(?P<station>salzburg)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'sbg'
_LOOP_STATION = 'oe2s'
_TEST = {
'url': 'https://salzburg.orf.at/player/20200423/SGUM',
'only_matching': True,
}
class ORFTIRIE(ORFRadioIE):
IE_NAME = 'orf:tirol'
IE_DESC = 'Radio Tirol'
_VALID_URL = r'https?://(?P<station>tirol)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'tir'
_LOOP_STATION = 'oe2t'
_TEST = {
'url': 'https://tirol.orf.at/player/20200423/TGUMO',
'only_matching': True,
}
class ORFVBGIE(ORFRadioIE):
IE_NAME = 'orf:vorarlberg'
IE_DESC = 'Radio Vorarlberg'
_VALID_URL = r'https?://(?P<station>vorarlberg)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'vbg'
_LOOP_STATION = 'oe2v'
_TEST = {
'url': 'https://vorarlberg.orf.at/player/20200423/VGUM',
'only_matching': True,
}
class ORFOE3IE(ORFRadioIE):
IE_NAME = 'orf:oe3'
IE_DESC = 'Radio Österreich 3'
_VALID_URL = r'https?://(?P<station>oe3)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'oe3'
_LOOP_STATION = 'oe3'
_TEST = {
'url': 'https://oe3.orf.at/player/20200424/3WEK',
'only_matching': True,
}
class ORFOE1IE(ORFRadioIE): class ORFOE1IE(ORFRadioIE):
IE_NAME = 'orf:oe1' IE_NAME = 'orf:oe1'
IE_DESC = 'Radio Österreich 1' IE_DESC = 'Radio Österreich 1'
_VALID_URL = r'https?://(?P<station>oe1)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)' _VALID_URL = r'https?://(?P<station>oe1)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'oe1'
_LOOP_STATION = 'oe1'
_TEST = { _TEST = {
'url': 'http://oe1.orf.at/player/20170108/456544', 'url': 'http://oe1.orf.at/player/20170108/456544',

View File

@ -18,7 +18,7 @@ class PeriscopeBaseIE(InfoExtractor):
item_id, query=query) item_id, query=query)
def _parse_broadcast_data(self, broadcast, video_id): def _parse_broadcast_data(self, broadcast, video_id):
title = broadcast['status'] title = broadcast.get('status') or 'Periscope Broadcast'
uploader = broadcast.get('user_display_name') or broadcast.get('username') uploader = broadcast.get('user_display_name') or broadcast.get('username')
title = '%s - %s' % (uploader, title) if uploader else title title = '%s - %s' % (uploader, title) if uploader else title
is_live = broadcast.get('state').lower() == 'running' is_live = broadcast.get('state').lower() == 'running'

View File

@ -17,6 +17,7 @@ from ..utils import (
determine_ext, determine_ext,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
merge_dicts,
NO_DEFAULT, NO_DEFAULT,
orderedSet, orderedSet,
remove_quotes, remove_quotes,
@ -59,13 +60,14 @@ class PornHubIE(PornHubBaseIE):
''' '''
_TESTS = [{ _TESTS = [{
'url': 'http://www.pornhub.com/view_video.php?viewkey=648719015', 'url': 'http://www.pornhub.com/view_video.php?viewkey=648719015',
'md5': '1e19b41231a02eba417839222ac9d58e', 'md5': 'a6391306d050e4547f62b3f485dd9ba9',
'info_dict': { 'info_dict': {
'id': '648719015', 'id': '648719015',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Seductive Indian beauty strips down and fingers her pink pussy', 'title': 'Seductive Indian beauty strips down and fingers her pink pussy',
'uploader': 'Babes', 'uploader': 'Babes',
'upload_date': '20130628', 'upload_date': '20130628',
'timestamp': 1372447216,
'duration': 361, 'duration': 361,
'view_count': int, 'view_count': int,
'like_count': int, 'like_count': int,
@ -82,8 +84,8 @@ class PornHubIE(PornHubBaseIE):
'id': '1331683002', 'id': '1331683002',
'ext': 'mp4', 'ext': 'mp4',
'title': '重庆婷婷女王足交', 'title': '重庆婷婷女王足交',
'uploader': 'Unknown',
'upload_date': '20150213', 'upload_date': '20150213',
'timestamp': 1423804862,
'duration': 1753, 'duration': 1753,
'view_count': int, 'view_count': int,
'like_count': int, 'like_count': int,
@ -121,6 +123,7 @@ class PornHubIE(PornHubBaseIE):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'skip': 'This video has been disabled',
}, { }, {
'url': 'http://www.pornhub.com/view_video.php?viewkey=ph557bbb6676d2d', 'url': 'http://www.pornhub.com/view_video.php?viewkey=ph557bbb6676d2d',
'only_matching': True, 'only_matching': True,
@ -338,10 +341,10 @@ class PornHubIE(PornHubBaseIE):
video_uploader = self._html_search_regex( video_uploader = self._html_search_regex(
r'(?s)From:&nbsp;.+?<(?:a\b[^>]+\bhref=["\']/(?:(?:user|channel)s|model|pornstar)/|span\b[^>]+\bclass=["\']username)[^>]+>(.+?)<', r'(?s)From:&nbsp;.+?<(?:a\b[^>]+\bhref=["\']/(?:(?:user|channel)s|model|pornstar)/|span\b[^>]+\bclass=["\']username)[^>]+>(.+?)<',
webpage, 'uploader', fatal=False) webpage, 'uploader', default=None)
view_count = self._extract_count( view_count = self._extract_count(
r'<span class="count">([\d,\.]+)</span> views', webpage, 'view') r'<span class="count">([\d,\.]+)</span> [Vv]iews', webpage, 'view')
like_count = self._extract_count( like_count = self._extract_count(
r'<span class="votesUp">([\d,\.]+)</span>', webpage, 'like') r'<span class="votesUp">([\d,\.]+)</span>', webpage, 'like')
dislike_count = self._extract_count( dislike_count = self._extract_count(
@ -356,7 +359,11 @@ class PornHubIE(PornHubBaseIE):
if div: if div:
return re.findall(r'<a[^>]+\bhref=[^>]+>([^<]+)', div) return re.findall(r'<a[^>]+\bhref=[^>]+>([^<]+)', div)
return { info = self._search_json_ld(webpage, video_id, default={})
# description provided in JSON-LD is irrelevant
info['description'] = None
return merge_dicts({
'id': video_id, 'id': video_id,
'uploader': video_uploader, 'uploader': video_uploader,
'upload_date': upload_date, 'upload_date': upload_date,
@ -372,7 +379,7 @@ class PornHubIE(PornHubBaseIE):
'tags': extract_list('tags'), 'tags': extract_list('tags'),
'categories': extract_list('categories'), 'categories': extract_list('categories'),
'subtitles': subtitles, 'subtitles': subtitles,
} }, info)
class PornHubPlaylistBaseIE(PornHubBaseIE): class PornHubPlaylistBaseIE(PornHubBaseIE):

View File

@ -11,6 +11,7 @@ from ..utils import (
determine_ext, determine_ext,
float_or_none, float_or_none,
int_or_none, int_or_none,
merge_dicts,
unified_strdate, unified_strdate,
) )
@ -175,7 +176,7 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
(?: (?:
(?:beta\.)? (?:beta\.)?
(?: (?:
prosieben(?:maxx)?|sixx|sat1(?:gold)?|kabeleins(?:doku)?|the-voice-of-germany|7tv|advopedia prosieben(?:maxx)?|sixx|sat1(?:gold)?|kabeleins(?:doku)?|the-voice-of-germany|advopedia
)\.(?:de|at|ch)| )\.(?:de|at|ch)|
ran\.de|fem\.com|advopedia\.de|galileo\.tv/video ran\.de|fem\.com|advopedia\.de|galileo\.tv/video
) )
@ -193,10 +194,14 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
'info_dict': { 'info_dict': {
'id': '2104602', 'id': '2104602',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Episode 18 - Staffel 2', 'title': 'CIRCUS HALLIGALLI - Episode 18 - Staffel 2',
'description': 'md5:8733c81b702ea472e069bc48bb658fc1', 'description': 'md5:8733c81b702ea472e069bc48bb658fc1',
'upload_date': '20131231', 'upload_date': '20131231',
'duration': 5845.04, 'duration': 5845.04,
'series': 'CIRCUS HALLIGALLI',
'season_number': 2,
'episode': 'Episode 18 - Staffel 2',
'episode_number': 18,
}, },
}, },
{ {
@ -300,8 +305,9 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
'info_dict': { 'info_dict': {
'id': '2572814', 'id': '2572814',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Andreas Kümmert: Rocket Man', 'title': 'The Voice of Germany - Andreas Kümmert: Rocket Man',
'description': 'md5:6ddb02b0781c6adf778afea606652e38', 'description': 'md5:6ddb02b0781c6adf778afea606652e38',
'timestamp': 1382041620,
'upload_date': '20131017', 'upload_date': '20131017',
'duration': 469.88, 'duration': 469.88,
}, },
@ -310,7 +316,7 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
}, },
}, },
{ {
'url': 'http://www.fem.com/wellness/videos/wellness-video-clip-kurztripps-zum-valentinstag.html', 'url': 'http://www.fem.com/videos/beauty-lifestyle/kurztrips-zum-valentinstag',
'info_dict': { 'info_dict': {
'id': '2156342', 'id': '2156342',
'ext': 'mp4', 'ext': 'mp4',
@ -332,19 +338,6 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
'playlist_count': 2, 'playlist_count': 2,
'skip': 'This video is unavailable', 'skip': 'This video is unavailable',
}, },
{
'url': 'http://www.7tv.de/circus-halligalli/615-best-of-circus-halligalli-ganze-folge',
'info_dict': {
'id': '4187506',
'ext': 'mp4',
'title': 'Best of Circus HalliGalli',
'description': 'md5:8849752efd90b9772c9db6fdf87fb9e9',
'upload_date': '20151229',
},
'params': {
'skip_download': True,
},
},
{ {
# title in <h2 class="subtitle"> # title in <h2 class="subtitle">
'url': 'http://www.prosieben.de/stars/oscar-award/videos/jetzt-erst-enthuellt-das-geheimnis-von-emma-stones-oscar-robe-clip', 'url': 'http://www.prosieben.de/stars/oscar-award/videos/jetzt-erst-enthuellt-das-geheimnis-von-emma-stones-oscar-robe-clip',
@ -421,7 +414,6 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
r'<div[^>]+id="veeseoDescription"[^>]*>(.+?)</div>', r'<div[^>]+id="veeseoDescription"[^>]*>(.+?)</div>',
] ]
_UPLOAD_DATE_REGEXES = [ _UPLOAD_DATE_REGEXES = [
r'<meta property="og:published_time" content="(.+?)">',
r'<span>\s*(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}) \|\s*<span itemprop="duration"', r'<span>\s*(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}) \|\s*<span itemprop="duration"',
r'<footer>\s*(\d{2}\.\d{2}\.\d{4}) \d{2}:\d{2} Uhr', r'<footer>\s*(\d{2}\.\d{2}\.\d{4}) \d{2}:\d{2} Uhr',
r'<span style="padding-left: 4px;line-height:20px; color:#404040">(\d{2}\.\d{2}\.\d{4})</span>', r'<span style="padding-left: 4px;line-height:20px; color:#404040">(\d{2}\.\d{2}\.\d{4})</span>',
@ -451,17 +443,21 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
if description is None: if description is None:
description = self._og_search_description(webpage) description = self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage) thumbnail = self._og_search_thumbnail(webpage)
upload_date = unified_strdate(self._html_search_regex( upload_date = unified_strdate(
self._UPLOAD_DATE_REGEXES, webpage, 'upload date', default=None)) self._html_search_meta('og:published_time', webpage,
'upload date', default=None)
or self._html_search_regex(self._UPLOAD_DATE_REGEXES,
webpage, 'upload date', default=None))
info.update({ json_ld = self._search_json_ld(webpage, clip_id, default={})
return merge_dicts(info, {
'id': clip_id, 'id': clip_id,
'title': title, 'title': title,
'description': description, 'description': description,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'upload_date': upload_date, 'upload_date': upload_date,
}) }, json_ld)
return info
def _extract_playlist(self, url, webpage): def _extract_playlist(self, url, webpage):
playlist_id = self._html_search_regex( playlist_id = self._html_search_regex(

View File

@ -82,17 +82,6 @@ class PuhuTVIE(InfoExtractor):
urls = [] urls = []
formats = [] formats = []
def add_http_from_hls(m3u8_f):
http_url = m3u8_f['url'].replace('/hls/', '/mp4/').replace('/chunklist.m3u8', '.mp4')
if http_url != m3u8_f['url']:
f = m3u8_f.copy()
f.update({
'format_id': f['format_id'].replace('hls', 'http'),
'protocol': 'http',
'url': http_url,
})
formats.append(f)
for video in videos['data']['videos']: for video in videos['data']['videos']:
media_url = url_or_none(video.get('url')) media_url = url_or_none(video.get('url'))
if not media_url or media_url in urls: if not media_url or media_url in urls:
@ -101,12 +90,9 @@ class PuhuTVIE(InfoExtractor):
playlist = video.get('is_playlist') playlist = video.get('is_playlist')
if (video.get('stream_type') == 'hls' and playlist is True) or 'playlist.m3u8' in media_url: if (video.get('stream_type') == 'hls' and playlist is True) or 'playlist.m3u8' in media_url:
m3u8_formats = self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
media_url, video_id, 'mp4', entry_protocol='m3u8_native', media_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False) m3u8_id='hls', fatal=False))
for m3u8_f in m3u8_formats:
formats.append(m3u8_f)
add_http_from_hls(m3u8_f)
continue continue
quality = int_or_none(video.get('quality')) quality = int_or_none(video.get('quality'))
@ -128,8 +114,6 @@ class PuhuTVIE(InfoExtractor):
format_id += '-%sp' % quality format_id += '-%sp' % quality
f['format_id'] = format_id f['format_id'] = format_id
formats.append(f) formats.append(f)
if is_hls:
add_http_from_hls(f)
self._sort_formats(formats) self._sort_formats(formats)
creator = try_get( creator = try_get(

View File

@ -1,6 +1,8 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_HTTPError from ..compat import compat_HTTPError
from ..utils import ( from ..utils import (
@ -10,7 +12,7 @@ from ..utils import (
class RedBullTVIE(InfoExtractor): class RedBullTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?redbull(?:\.tv|\.com(?:/[^/]+)?(?:/tv)?)(?:/events/[^/]+)?/(?:videos?|live)/(?P<id>AP-\w+)' _VALID_URL = r'https?://(?:www\.)?redbull(?:\.tv|\.com(?:/[^/]+)?(?:/tv)?)(?:/events/[^/]+)?/(?:videos?|live|(?:film|episode)s)/(?P<id>AP-\w+)'
_TESTS = [{ _TESTS = [{
# film # film
'url': 'https://www.redbull.tv/video/AP-1Q6XCDTAN1W11', 'url': 'https://www.redbull.tv/video/AP-1Q6XCDTAN1W11',
@ -29,8 +31,8 @@ class RedBullTVIE(InfoExtractor):
'id': 'AP-1PMHKJFCW1W11', 'id': 'AP-1PMHKJFCW1W11',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Grime - Hashtags S2E4', 'title': 'Grime - Hashtags S2E4',
'description': 'md5:b5f522b89b72e1e23216e5018810bb25', 'description': 'md5:5546aa612958c08a98faaad4abce484d',
'duration': 904.6, 'duration': 904,
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -44,11 +46,15 @@ class RedBullTVIE(InfoExtractor):
}, { }, {
'url': 'https://www.redbull.com/us-en/events/AP-1XV2K61Q51W11/live/AP-1XUJ86FDH1W11', 'url': 'https://www.redbull.com/us-en/events/AP-1XV2K61Q51W11/live/AP-1XUJ86FDH1W11',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.redbull.com/int-en/films/AP-1ZSMAW8FH2111',
'only_matching': True,
}, {
'url': 'https://www.redbull.com/int-en/episodes/AP-1TQWK7XE11W11',
'only_matching': True,
}] }]
def _real_extract(self, url): def extract_info(self, video_id):
video_id = self._match_id(url)
session = self._download_json( session = self._download_json(
'https://api.redbull.tv/v3/session', video_id, 'https://api.redbull.tv/v3/session', video_id,
note='Downloading access token', query={ note='Downloading access token', query={
@ -105,24 +111,119 @@ class RedBullTVIE(InfoExtractor):
'subtitles': subtitles, 'subtitles': subtitles,
} }
def _real_extract(self, url):
video_id = self._match_id(url)
return self.extract_info(video_id)
class RedBullEmbedIE(RedBullTVIE):
_VALID_URL = r'https?://(?:www\.)?redbull\.com/embed/(?P<id>rrn:content:[^:]+:[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}:[a-z]{2}-[A-Z]{2,3})'
_TESTS = [{
# HLS manifest accessible only using assetId
'url': 'https://www.redbull.com/embed/rrn:content:episode-videos:f3021f4f-3ed4-51ac-915a-11987126e405:en-INT',
'only_matching': True,
}]
_VIDEO_ESSENSE_TMPL = '''... on %s {
videoEssence {
attributes
}
}'''
def _real_extract(self, url):
rrn_id = self._match_id(url)
asset_id = self._download_json(
'https://edge-graphql.crepo-production.redbullaws.com/v1/graphql',
rrn_id, headers={'API-KEY': 'e90a1ff11335423998b100c929ecc866'},
query={
'query': '''{
resource(id: "%s", enforceGeoBlocking: false) {
%s
%s
}
}''' % (rrn_id, self._VIDEO_ESSENSE_TMPL % 'LiveVideo', self._VIDEO_ESSENSE_TMPL % 'VideoResource'),
})['data']['resource']['videoEssence']['attributes']['assetId']
return self.extract_info(asset_id)
class RedBullTVRrnContentIE(InfoExtractor): class RedBullTVRrnContentIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?redbull(?:\.tv|\.com(?:/[^/]+)?(?:/tv)?)/(?:video|live)/rrn:content:[^:]+:(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})' _VALID_URL = r'https?://(?:www\.)?redbull\.com/(?P<region>[a-z]{2,3})-(?P<lang>[a-z]{2})/tv/(?:video|live|film)/(?P<id>rrn:content:[^:]+:[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_TESTS = [{ _TESTS = [{
'url': 'https://www.redbull.com/int-en/tv/video/rrn:content:live-videos:e3e6feb4-e95f-50b7-962a-c70f8fd13c73/mens-dh-finals-fort-william', 'url': 'https://www.redbull.com/int-en/tv/video/rrn:content:live-videos:e3e6feb4-e95f-50b7-962a-c70f8fd13c73/mens-dh-finals-fort-william',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'https://www.redbull.com/int-en/tv/video/rrn:content:videos:a36a0f36-ff1b-5db8-a69d-ee11a14bf48b/tn-ts-style?playlist=rrn:content:event-profiles:83f05926-5de8-5389-b5e4-9bb312d715e8:extras', 'url': 'https://www.redbull.com/int-en/tv/video/rrn:content:videos:a36a0f36-ff1b-5db8-a69d-ee11a14bf48b/tn-ts-style?playlist=rrn:content:event-profiles:83f05926-5de8-5389-b5e4-9bb312d715e8:extras',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.redbull.com/int-en/tv/film/rrn:content:films:d1f4d00e-4c04-5d19-b510-a805ffa2ab83/follow-me',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) region, lang, rrn_id = re.search(self._VALID_URL, url).groups()
rrn_id += ':%s-%s' % (lang, region.upper())
return self.url_result(
'https://www.redbull.com/embed/' + rrn_id,
RedBullEmbedIE.ie_key(), rrn_id)
webpage = self._download_webpage(url, display_id)
video_url = self._og_search_url(webpage) class RedBullIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?redbull\.com/(?P<region>[a-z]{2,3})-(?P<lang>[a-z]{2})/(?P<type>(?:episode|film|(?:(?:recap|trailer)-)?video)s|live)/(?!AP-|rrn:content:)(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.redbull.com/int-en/episodes/grime-hashtags-s02-e04',
'md5': 'db8271a7200d40053a1809ed0dd574ff',
'info_dict': {
'id': 'AA-1MT8DQWA91W14',
'ext': 'mp4',
'title': 'Grime - Hashtags S2E4',
'description': 'md5:5546aa612958c08a98faaad4abce484d',
},
}, {
'url': 'https://www.redbull.com/int-en/films/kilimanjaro-mountain-of-greatness',
'only_matching': True,
}, {
'url': 'https://www.redbull.com/int-en/recap-videos/uci-mountain-bike-world-cup-2017-mens-xco-finals-from-vallnord',
'only_matching': True,
}, {
'url': 'https://www.redbull.com/int-en/trailer-videos/kings-of-content',
'only_matching': True,
}, {
'url': 'https://www.redbull.com/int-en/videos/tnts-style-red-bull-dance-your-style-s1-e12',
'only_matching': True,
}, {
'url': 'https://www.redbull.com/int-en/live/mens-dh-finals-fort-william',
'only_matching': True,
}, {
# only available on the int-en website so a fallback is need for the API
# https://www.redbull.com/v3/api/graphql/v1/v3/query/en-GB>en-INT?filter[uriSlug]=fia-wrc-saturday-recap-estonia&rb3Schema=v1:hero
'url': 'https://www.redbull.com/gb-en/live/fia-wrc-saturday-recap-estonia',
'only_matching': True,
}]
_INT_FALLBACK_LIST = ['de', 'en', 'es', 'fr']
_LAT_FALLBACK_MAP = ['ar', 'bo', 'car', 'cl', 'co', 'mx', 'pe']
def _real_extract(self, url):
region, lang, filter_type, display_id = re.search(self._VALID_URL, url).groups()
if filter_type == 'episodes':
filter_type = 'episode-videos'
elif filter_type == 'live':
filter_type = 'live-videos'
regions = [region.upper()]
if region != 'int':
if region in self._LAT_FALLBACK_MAP:
regions.append('LAT')
if lang in self._INT_FALLBACK_LIST:
regions.append('INT')
locale = '>'.join(['%s-%s' % (lang, reg) for reg in regions])
rrn_id = self._download_json(
'https://www.redbull.com/v3/api/graphql/v1/v3/query/' + locale,
display_id, query={
'filter[type]': filter_type,
'filter[uriSlug]': display_id,
'rb3Schema': 'v1:hero',
})['data']['id']
return self.url_result( return self.url_result(
video_url, ie=RedBullTVIE.ie_key(), 'https://www.redbull.com/embed/' + rrn_id,
video_id=RedBullTVIE._match_id(video_url)) RedBullEmbedIE.ie_key(), rrn_id)

View File

@ -4,6 +4,7 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
determine_ext,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
merge_dicts, merge_dicts,
@ -14,7 +15,7 @@ from ..utils import (
class RedTubeIE(InfoExtractor): class RedTubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www\.)?redtube\.com/|embed\.redtube\.com/\?.*?\bid=)(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:(?:\w+\.)?redtube\.com/|embed\.redtube\.com/\?.*?\bid=)(?P<id>[0-9]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.redtube.com/66418', 'url': 'http://www.redtube.com/66418',
'md5': 'fc08071233725f26b8f014dba9590005', 'md5': 'fc08071233725f26b8f014dba9590005',
@ -30,6 +31,9 @@ class RedTubeIE(InfoExtractor):
}, { }, {
'url': 'http://embed.redtube.com/?bgcolor=000000&id=1443286', 'url': 'http://embed.redtube.com/?bgcolor=000000&id=1443286',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://it.redtube.com/66418',
'only_matching': True,
}] }]
@staticmethod @staticmethod
@ -57,7 +61,7 @@ class RedTubeIE(InfoExtractor):
if not info.get('title'): if not info.get('title'):
info['title'] = self._html_search_regex( info['title'] = self._html_search_regex(
(r'<h(\d)[^>]+class="(?:video_title_text|videoTitle)[^"]*">(?P<title>(?:(?!\1).)+)</h\1>', (r'<h(\d)[^>]+class="(?:video_title_text|videoTitle|video_title)[^"]*">(?P<title>(?:(?!\1).)+)</h\1>',
r'(?:videoTitle|title)\s*:\s*(["\'])(?P<title>(?:(?!\1).)+)\1',), r'(?:videoTitle|title)\s*:\s*(["\'])(?P<title>(?:(?!\1).)+)\1',),
webpage, 'title', group='title', webpage, 'title', group='title',
default=None) or self._og_search_title(webpage) default=None) or self._og_search_title(webpage)
@ -77,7 +81,7 @@ class RedTubeIE(InfoExtractor):
}) })
medias = self._parse_json( medias = self._parse_json(
self._search_regex( self._search_regex(
r'mediaDefinition\s*:\s*(\[.+?\])', webpage, r'mediaDefinition["\']?\s*:\s*(\[.+?}\s*\])', webpage,
'media definitions', default='{}'), 'media definitions', default='{}'),
video_id, fatal=False) video_id, fatal=False)
if medias and isinstance(medias, list): if medias and isinstance(medias, list):
@ -85,6 +89,12 @@ class RedTubeIE(InfoExtractor):
format_url = url_or_none(media.get('videoUrl')) format_url = url_or_none(media.get('videoUrl'))
if not format_url: if not format_url:
continue continue
if media.get('format') == 'hls' or determine_ext(format_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False))
continue
format_id = media.get('quality') format_id = media.get('quality')
formats.append({ formats.append({
'url': format_url, 'url': format_url,

View File

@ -14,12 +14,27 @@ class RtlNlIE(InfoExtractor):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?://(?:(?:www|static)\.)? https?://(?:(?:www|static)\.)?
(?: (?:
rtlxl\.nl/[^\#]*\#!/[^/]+/| rtlxl\.nl/(?:[^\#]*\#!|programma)/[^/]+/|
rtl\.nl/(?:(?:system/videoplayer/(?:[^/]+/)+(?:video_)?embed\.html|embed)\b.+?\buuid=|video/) rtl\.nl/(?:(?:system/videoplayer/(?:[^/]+/)+(?:video_)?embed\.html|embed)\b.+?\buuid=|video/)|
embed\.rtl\.nl/\#uuid=
) )
(?P<id>[0-9a-f-]+)''' (?P<id>[0-9a-f-]+)'''
_TESTS = [{ _TESTS = [{
# new URL schema
'url': 'https://www.rtlxl.nl/programma/rtl-nieuws/0bd1384d-d970-3086-98bb-5c104e10c26f',
'md5': '490428f1187b60d714f34e1f2e3af0b6',
'info_dict': {
'id': '0bd1384d-d970-3086-98bb-5c104e10c26f',
'ext': 'mp4',
'title': 'RTL Nieuws',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'timestamp': 1593293400,
'upload_date': '20200627',
'duration': 661.08,
},
}, {
# old URL schema
'url': 'http://www.rtlxl.nl/#!/rtl-nieuws-132237/82b1aad1-4a14-3d7b-b554-b0aed1b2c416', 'url': 'http://www.rtlxl.nl/#!/rtl-nieuws-132237/82b1aad1-4a14-3d7b-b554-b0aed1b2c416',
'md5': '473d1946c1fdd050b2c0161a4b13c373', 'md5': '473d1946c1fdd050b2c0161a4b13c373',
'info_dict': { 'info_dict': {
@ -31,6 +46,7 @@ class RtlNlIE(InfoExtractor):
'upload_date': '20160429', 'upload_date': '20160429',
'duration': 1167.96, 'duration': 1167.96,
}, },
'skip': '404',
}, { }, {
# best format available a3t # best format available a3t
'url': 'http://www.rtl.nl/system/videoplayer/derden/rtlnieuws/video_embed.html#uuid=84ae5571-ac25-4225-ae0c-ef8d9efb2aed/autoplay=false', 'url': 'http://www.rtl.nl/system/videoplayer/derden/rtlnieuws/video_embed.html#uuid=84ae5571-ac25-4225-ae0c-ef8d9efb2aed/autoplay=false',
@ -76,6 +92,10 @@ class RtlNlIE(InfoExtractor):
}, { }, {
'url': 'https://static.rtl.nl/embed/?uuid=1a2970fc-5c0b-43ff-9fdc-927e39e6d1bc&autoplay=false&publicatiepunt=rtlnieuwsnl', 'url': 'https://static.rtl.nl/embed/?uuid=1a2970fc-5c0b-43ff-9fdc-927e39e6d1bc&autoplay=false&publicatiepunt=rtlnieuwsnl',
'only_matching': True, 'only_matching': True,
}, {
# new embed URL schema
'url': 'https://embed.rtl.nl/#uuid=84ae5571-ac25-4225-ae0c-ef8d9efb2aed/autoplay=false',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -246,7 +246,12 @@ class SoundcloudIE(InfoExtractor):
'comment_count': int, 'comment_count': int,
'repost_count': int, 'repost_count': int,
}, },
} },
{
# with AAC HQ format available via OAuth token
'url': 'https://soundcloud.com/wandw/the-chainsmokers-ft-daya-dont-let-me-down-ww-remix-1',
'only_matching': True,
},
] ]
_API_V2_BASE = 'https://api-v2.soundcloud.com/' _API_V2_BASE = 'https://api-v2.soundcloud.com/'
@ -350,6 +355,9 @@ class SoundcloudIE(InfoExtractor):
format_id_list = [] format_id_list = []
if protocol: if protocol:
format_id_list.append(protocol) format_id_list.append(protocol)
ext = f.get('ext')
if ext == 'aac':
f['abr'] = '256'
for k in ('ext', 'abr'): for k in ('ext', 'abr'):
v = f.get(k) v = f.get(k)
if v: if v:
@ -360,9 +368,13 @@ class SoundcloudIE(InfoExtractor):
abr = f.get('abr') abr = f.get('abr')
if abr: if abr:
f['abr'] = int(abr) f['abr'] = int(abr)
if protocol == 'hls':
protocol = 'm3u8' if ext == 'aac' else 'm3u8_native'
else:
protocol = 'http'
f.update({ f.update({
'format_id': '_'.join(format_id_list), 'format_id': '_'.join(format_id_list),
'protocol': 'm3u8_native' if protocol == 'hls' else 'http', 'protocol': protocol,
'preference': -10 if preview else None, 'preference': -10 if preview else None,
}) })
formats.append(f) formats.append(f)
@ -546,8 +558,10 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
class SoundcloudPagedPlaylistBaseIE(SoundcloudIE): class SoundcloudPagedPlaylistBaseIE(SoundcloudIE):
def _extract_playlist(self, base_url, playlist_id, playlist_title): def _extract_playlist(self, base_url, playlist_id, playlist_title):
# Per the SoundCloud documentation, the maximum limit for a linked partioning query is 200.
# https://developers.soundcloud.com/blog/offset-pagination-deprecated
COMMON_QUERY = { COMMON_QUERY = {
'limit': 2000000000, 'limit': 200,
'linked_partitioning': '1', 'linked_partitioning': '1',
} }

View File

@ -8,15 +8,10 @@ class BellatorIE(MTVServicesInfoExtractor):
_TESTS = [{ _TESTS = [{
'url': 'http://www.bellator.com/fight/atwr7k/bellator-158-michael-page-vs-evangelista-cyborg', 'url': 'http://www.bellator.com/fight/atwr7k/bellator-158-michael-page-vs-evangelista-cyborg',
'info_dict': { 'info_dict': {
'id': 'b55e434e-fde1-4a98-b7cc-92003a034de4', 'title': 'Michael Page vs. Evangelista Cyborg',
'ext': 'mp4', 'description': 'md5:0d917fc00ffd72dd92814963fc6cbb05',
'title': 'Douglas Lima vs. Paul Daley - Round 1',
'description': 'md5:805a8dd29310fd611d32baba2f767885',
},
'params': {
# m3u8 download
'skip_download': True,
}, },
'playlist_count': 3,
}, { }, {
'url': 'http://www.bellator.com/video-clips/bw6k7n/bellator-158-foundations-michael-venom-page', 'url': 'http://www.bellator.com/video-clips/bw6k7n/bellator-158-foundations-michael-venom-page',
'only_matching': True, 'only_matching': True,
@ -25,6 +20,9 @@ class BellatorIE(MTVServicesInfoExtractor):
_FEED_URL = 'http://www.bellator.com/feeds/mrss/' _FEED_URL = 'http://www.bellator.com/feeds/mrss/'
_GEO_COUNTRIES = ['US'] _GEO_COUNTRIES = ['US']
def _extract_mgid(self, webpage):
return self._extract_triforce_mgid(webpage)
class ParamountNetworkIE(MTVServicesInfoExtractor): class ParamountNetworkIE(MTVServicesInfoExtractor):
_VALID_URL = r'https?://(?:www\.)?paramountnetwork\.com/[^/]+/[\da-z]{6}(?:[/?#&]|$)' _VALID_URL = r'https?://(?:www\.)?paramountnetwork\.com/[^/]+/[\da-z]{6}(?:[/?#&]|$)'

View File

@ -114,7 +114,7 @@ class SRGSSRPlayIE(InfoExtractor):
[^/]+/(?P<type>video|audio)/[^?]+| [^/]+/(?P<type>video|audio)/[^?]+|
popup(?P<type_2>video|audio)player popup(?P<type_2>video|audio)player
) )
\?id=(?P<id>[0-9a-f\-]{36}|\d+) \?.*?\b(?:id=|urn=urn:[^:]+:video:)(?P<id>[0-9a-f\-]{36}|\d+)
''' '''
_TESTS = [{ _TESTS = [{
@ -175,6 +175,12 @@ class SRGSSRPlayIE(InfoExtractor):
}, { }, {
'url': 'https://www.srf.ch/play/tv/popupvideoplayer?id=c4dba0ca-e75b-43b2-a34f-f708a4932e01', 'url': 'https://www.srf.ch/play/tv/popupvideoplayer?id=c4dba0ca-e75b-43b2-a34f-f708a4932e01',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.srf.ch/play/tv/10vor10/video/snowden-beantragt-asyl-in-russland?urn=urn:srf:video:28e1a57d-5b76-4399-8ab3-9097f071e6c5',
'only_matching': True,
}, {
'url': 'https://www.rts.ch/play/tv/19h30/video/le-19h30?urn=urn:rts:video:6348260',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -224,9 +224,17 @@ class SVTPlayIE(SVTPlayBaseIE):
self._adjust_title(info_dict) self._adjust_title(info_dict)
return info_dict return info_dict
svt_id = self._search_regex( svt_id = try_get(
r'<video[^>]+data-video-id=["\']([\da-zA-Z-]+)', data, lambda x: x['statistics']['dataLake']['content']['id'],
webpage, 'video id') compat_str)
if not svt_id:
svt_id = self._search_regex(
(r'<video[^>]+data-video-id=["\']([\da-zA-Z-]+)',
r'["\']videoSvtId["\']\s*:\s*["\']([\da-zA-Z-]+)',
r'"content"\s*:\s*{.*?"id"\s*:\s*"([\da-zA-Z-]+)"',
r'["\']svtId["\']\s*:\s*["\']([\da-zA-Z-]+)'),
webpage, 'video id')
return self._extract_by_video_id(svt_id, webpage) return self._extract_by_video_id(svt_id, webpage)

View File

@ -6,18 +6,16 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from .jwplatform import JWPlatformIE from .jwplatform import JWPlatformIE
from .nexx import NexxIE from .nexx import NexxIE
from ..compat import ( from ..compat import compat_urlparse
compat_str,
compat_urlparse,
)
from ..utils import ( from ..utils import (
NO_DEFAULT, NO_DEFAULT,
try_get, smuggle_url,
) )
class Tele5IE(InfoExtractor): class Tele5IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tele5\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?tele5\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_GEO_COUNTRIES = ['DE']
_TESTS = [{ _TESTS = [{
'url': 'https://www.tele5.de/mediathek/filme-online/videos?vid=1549416', 'url': 'https://www.tele5.de/mediathek/filme-online/videos?vid=1549416',
'info_dict': { 'info_dict': {
@ -30,6 +28,21 @@ class Tele5IE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
}, {
# jwplatform, nexx unavailable
'url': 'https://www.tele5.de/filme/ghoul-das-geheimnis-des-friedhofmonsters/',
'info_dict': {
'id': 'WJuiOlUp',
'ext': 'mp4',
'upload_date': '20200603',
'timestamp': 1591214400,
'title': 'Ghoul - Das Geheimnis des Friedhofmonsters',
'description': 'md5:42002af1d887ff3d5b2b3ca1f8137d97',
},
'params': {
'skip_download': True,
},
'add_ie': [JWPlatformIE.ie_key()],
}, { }, {
'url': 'https://www.tele5.de/kalkofes-mattscheibe/video-clips/politik-und-gesellschaft?ve_id=1551191', 'url': 'https://www.tele5.de/kalkofes-mattscheibe/video-clips/politik-und-gesellschaft?ve_id=1551191',
'only_matching': True, 'only_matching': True,
@ -88,15 +101,8 @@ class Tele5IE(InfoExtractor):
if not jwplatform_id: if not jwplatform_id:
jwplatform_id = extract_id(JWPLATFORM_ID_RE, 'jwplatform id') jwplatform_id = extract_id(JWPLATFORM_ID_RE, 'jwplatform id')
media = self._download_json(
'https://cdn.jwplayer.com/v2/media/' + jwplatform_id,
display_id)
nexx_id = try_get(
media, lambda x: x['playlist'][0]['nexx_id'], compat_str)
if nexx_id:
return nexx_result(nexx_id)
return self.url_result( return self.url_result(
'jwplatform:%s' % jwplatform_id, ie=JWPlatformIE.ie_key(), smuggle_url(
video_id=jwplatform_id) 'jwplatform:%s' % jwplatform_id,
{'geo_countries': self._GEO_COUNTRIES}),
ie=JWPlatformIE.ie_key(), video_id=jwplatform_id)

View File

@ -13,14 +13,24 @@ from ..utils import (
class TeleQuebecBaseIE(InfoExtractor): class TeleQuebecBaseIE(InfoExtractor):
@staticmethod @staticmethod
def _limelight_result(media_id): def _result(url, ie_key):
return { return {
'_type': 'url_transparent', '_type': 'url_transparent',
'url': smuggle_url( 'url': smuggle_url(url, {'geo_countries': ['CA']}),
'limelight:media:' + media_id, {'geo_countries': ['CA']}), 'ie_key': ie_key,
'ie_key': 'LimelightMedia',
} }
@staticmethod
def _limelight_result(media_id):
return TeleQuebecBaseIE._result(
'limelight:media:' + media_id, 'LimelightMedia')
@staticmethod
def _brightcove_result(brightcove_id):
return TeleQuebecBaseIE._result(
'http://players.brightcove.net/6150020952001/default_default/index.html?videoId=%s'
% brightcove_id, 'BrightcoveNew')
class TeleQuebecIE(TeleQuebecBaseIE): class TeleQuebecIE(TeleQuebecBaseIE):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
@ -37,11 +47,27 @@ class TeleQuebecIE(TeleQuebecBaseIE):
'id': '577116881b4b439084e6b1cf4ef8b1b3', 'id': '577116881b4b439084e6b1cf4ef8b1b3',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Un petit choc et puis repart!', 'title': 'Un petit choc et puis repart!',
'description': 'md5:b04a7e6b3f74e32d7b294cffe8658374', 'description': 'md5:067bc84bd6afecad85e69d1000730907',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
}, {
'url': 'https://zonevideo.telequebec.tv/media/55267/le-soleil/passe-partout',
'info_dict': {
'id': '6167180337001',
'ext': 'mp4',
'title': 'Le soleil',
'description': 'md5:64289c922a8de2abbe99c354daffde02',
'uploader_id': '6150020952001',
'upload_date': '20200625',
'timestamp': 1593090307,
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
'add_ie': ['BrightcoveNew'],
}, { }, {
# no description # no description
'url': 'http://zonevideo.telequebec.tv/media/30261', 'url': 'http://zonevideo.telequebec.tv/media/30261',
@ -58,7 +84,14 @@ class TeleQuebecIE(TeleQuebecBaseIE):
'https://mnmedias.api.telequebec.tv/api/v2/media/' + media_id, 'https://mnmedias.api.telequebec.tv/api/v2/media/' + media_id,
media_id)['media'] media_id)['media']
info = self._limelight_result(media_data['streamInfo']['sourceId']) source_id = media_data['streamInfo']['sourceId']
source = (try_get(
media_data, lambda x: x['streamInfo']['source'],
compat_str) or 'limelight').lower()
if source == 'brightcove':
info = self._brightcove_result(source_id)
else:
info = self._limelight_result(source_id)
info.update({ info.update({
'title': media_data.get('title'), 'title': media_data.get('title'),
'description': try_get( 'description': try_get(

View File

@ -10,8 +10,8 @@ from ..utils import (
class TenPlayIE(InfoExtractor): class TenPlayIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?10play\.com\.au/[^/]+/episodes/[^/]+/[^/]+/(?P<id>tpv\d{6}[a-z]{5})' _VALID_URL = r'https?://(?:www\.)?10play\.com\.au/(?:[^/]+/)+(?P<id>tpv\d{6}[a-z]{5})'
_TEST = { _TESTS = [{
'url': 'https://10play.com.au/masterchef/episodes/season-1/masterchef-s1-ep-1/tpv190718kwzga', 'url': 'https://10play.com.au/masterchef/episodes/season-1/masterchef-s1-ep-1/tpv190718kwzga',
'info_dict': { 'info_dict': {
'id': '6060533435001', 'id': '6060533435001',
@ -27,7 +27,10 @@ class TenPlayIE(InfoExtractor):
'format': 'bestvideo', 'format': 'bestvideo',
'skip_download': True, 'skip_download': True,
} }
} }, {
'url': 'https://10play.com.au/how-to-stay-married/web-extras/season-1/terrys-talks-ep-1-embracing-change/tpv190915ylupc',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'https://players.brightcove.net/2199827728001/cN6vRtRQt_default/index.html?videoId=%s' BRIGHTCOVE_URL_TEMPLATE = 'https://players.brightcove.net/2199827728001/cN6vRtRQt_default/index.html?videoId=%s'
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -31,6 +31,10 @@ class ThisOldHouseIE(InfoExtractor):
}, { }, {
'url': 'https://www.thisoldhouse.com/21113884/s41-e13-paradise-lost', 'url': 'https://www.thisoldhouse.com/21113884/s41-e13-paradise-lost',
'only_matching': True, 'only_matching': True,
}, {
# iframe www.thisoldhouse.com
'url': 'https://www.thisoldhouse.com/21083431/seaside-transformation-the-westerly-project',
'only_matching': True,
}] }]
_ZYPE_TMPL = 'https://player.zype.com/embed/%s.html?api_key=hsOk_yMSPYNrT22e9pu8hihLXjaZf0JW5jsOWv4ZqyHJFvkJn6rtToHl09tbbsbe' _ZYPE_TMPL = 'https://player.zype.com/embed/%s.html?api_key=hsOk_yMSPYNrT22e9pu8hihLXjaZf0JW5jsOWv4ZqyHJFvkJn6rtToHl09tbbsbe'
@ -38,6 +42,6 @@ class ThisOldHouseIE(InfoExtractor):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
video_id = self._search_regex( video_id = self._search_regex(
r'<iframe[^>]+src=[\'"](?:https?:)?//thisoldhouse\.chorus\.build/videos/zype/([0-9a-f]{24})', r'<iframe[^>]+src=[\'"](?:https?:)?//(?:www\.)?thisoldhouse\.(?:chorus\.build|com)/videos/zype/([0-9a-f]{24})',
webpage, 'video id') webpage, 'video id')
return self.url_result(self._ZYPE_TMPL % video_id, 'Zype', video_id) return self.url_result(self._ZYPE_TMPL % video_id, 'Zype', video_id)

View File

@ -6,7 +6,6 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_HTTPError, compat_HTTPError,
compat_str,
compat_urlparse, compat_urlparse,
) )
from ..utils import ( from ..utils import (
@ -15,9 +14,7 @@ from ..utils import (
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
qualities, qualities,
smuggle_url,
try_get, try_get,
unsmuggle_url,
update_url_query, update_url_query,
url_or_none, url_or_none,
) )
@ -235,11 +232,6 @@ class TVPlayIE(InfoExtractor):
] ]
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
self._initialize_geo_bypass({
'countries': smuggled_data.get('geo_countries'),
})
video_id = self._match_id(url) video_id = self._match_id(url)
geo_country = self._search_regex( geo_country = self._search_regex(
r'https?://[^/]+\.([a-z]{2})', url, r'https?://[^/]+\.([a-z]{2})', url,
@ -285,8 +277,6 @@ class TVPlayIE(InfoExtractor):
'ext': ext, 'ext': ext,
} }
if video_url.startswith('rtmp'): if video_url.startswith('rtmp'):
if smuggled_data.get('skip_rtmp'):
continue
m = re.search( m = re.search(
r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<playpath>.+)$', video_url) r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<playpath>.+)$', video_url)
if not m: if not m:
@ -347,115 +337,80 @@ class ViafreeIE(InfoExtractor):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?:www\.)? (?:www\.)?
viafree\. viafree\.(?P<country>dk|no|se)
(?: /(?P<id>program(?:mer)?/(?:[^/]+/)+[^/?#&]+)
(?:dk|no)/programmer|
se/program
)
/(?:[^/]+/)+(?P<id>[^/?#&]+)
''' '''
_TESTS = [{ _TESTS = [{
'url': 'http://www.viafree.se/program/livsstil/husraddarna/sasong-2/avsnitt-2', 'url': 'http://www.viafree.no/programmer/underholdning/det-beste-vorspielet/sesong-2/episode-1',
'info_dict': { 'info_dict': {
'id': '395375', 'id': '757786',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Husräddarna S02E02', 'title': 'Det beste vorspielet - Sesong 2 - Episode 1',
'description': 'md5:4db5c933e37db629b5a2f75dfb34829e', 'description': 'md5:b632cb848331404ccacd8cd03e83b4c3',
'series': 'Husräddarna', 'series': 'Det beste vorspielet',
'season': 'Säsong 2',
'season_number': 2, 'season_number': 2,
'duration': 2576, 'duration': 1116,
'timestamp': 1400596321, 'timestamp': 1471200600,
'upload_date': '20140520', 'upload_date': '20160814',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'add_ie': [TVPlayIE.ie_key()],
}, { }, {
# with relatedClips # with relatedClips
'url': 'http://www.viafree.se/program/reality/sommaren-med-youtube-stjarnorna/sasong-1/avsnitt-1', 'url': 'http://www.viafree.se/program/reality/sommaren-med-youtube-stjarnorna/sasong-1/avsnitt-1',
'info_dict': { 'only_matching': True,
'id': '758770',
'ext': 'mp4',
'title': 'Sommaren med YouTube-stjärnorna S01E01',
'description': 'md5:2bc69dce2c4bb48391e858539bbb0e3f',
'series': 'Sommaren med YouTube-stjärnorna',
'season': 'Säsong 1',
'season_number': 1,
'duration': 1326,
'timestamp': 1470905572,
'upload_date': '20160811',
},
'params': {
'skip_download': True,
},
'add_ie': [TVPlayIE.ie_key()],
}, { }, {
# Different og:image URL schema # Different og:image URL schema
'url': 'http://www.viafree.se/program/reality/sommaren-med-youtube-stjarnorna/sasong-1/avsnitt-2', 'url': 'http://www.viafree.se/program/reality/sommaren-med-youtube-stjarnorna/sasong-1/avsnitt-2',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'http://www.viafree.no/programmer/underholdning/det-beste-vorspielet/sesong-2/episode-1', 'url': 'http://www.viafree.se/program/livsstil/husraddarna/sasong-2/avsnitt-2',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'http://www.viafree.dk/programmer/reality/paradise-hotel/saeson-7/episode-5', 'url': 'http://www.viafree.dk/programmer/reality/paradise-hotel/saeson-7/episode-5',
'only_matching': True, 'only_matching': True,
}] }]
_GEO_BYPASS = False
@classmethod @classmethod
def suitable(cls, url): def suitable(cls, url):
return False if TVPlayIE.suitable(url) else super(ViafreeIE, cls).suitable(url) return False if TVPlayIE.suitable(url) else super(ViafreeIE, cls).suitable(url)
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) country, path = re.match(self._VALID_URL, url).groups()
content = self._download_json(
'https://viafree-content.mtg-api.com/viafree-content/v1/%s/path/%s' % (country, path), path)
program = content['_embedded']['viafreeBlocks'][0]['_embedded']['program']
guid = program['guid']
meta = content['meta']
title = meta['title']
webpage = self._download_webpage(url, video_id) try:
stream_href = self._download_json(
program['_links']['streamLink']['href'], guid,
headers=self.geo_verification_headers())['embedded']['prioritizedStreams'][0]['links']['stream']['href']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
self.raise_geo_restricted(countries=[country])
raise
data = self._parse_json( formats = self._extract_m3u8_formats(stream_href, guid, 'mp4')
self._search_regex( self._sort_formats(formats)
r'(?s)window\.App\s*=\s*({.+?})\s*;\s*</script', episode = program.get('episode') or {}
webpage, 'data', default='{}'),
video_id, transform_source=lambda x: re.sub(
r'(?s)function\s+[a-zA-Z_][\da-zA-Z_]*\s*\([^)]*\)\s*{[^}]*}\s*',
'null', x), fatal=False)
video_id = None return {
'id': guid,
if data: 'title': title,
video_id = try_get( 'thumbnail': meta.get('image'),
data, lambda x: x['context']['dispatcher']['stores'][ 'description': meta.get('description'),
'ContentPageProgramStore']['currentVideo']['id'], 'series': episode.get('seriesTitle'),
compat_str) 'episode_number': int_or_none(episode.get('episodeNumber')),
'season_number': int_or_none(episode.get('seasonNumber')),
# Fallback #1 (extract from og:image URL schema) 'duration': int_or_none(try_get(program, lambda x: x['video']['duration']['milliseconds']), 1000),
if not video_id: 'timestamp': parse_iso8601(try_get(program, lambda x: x['availability']['start'])),
thumbnail = self._og_search_thumbnail(webpage, default=None) 'formats': formats,
if thumbnail: }
video_id = self._search_regex(
# Patterns seen:
# http://cdn.playapi.mtgx.tv/imagecache/600x315/cloud/content-images/inbox/765166/a2e95e5f1d735bab9f309fa345cc3f25.jpg
# http://cdn.playapi.mtgx.tv/imagecache/600x315/cloud/content-images/seasons/15204/758770/4a5ba509ca8bc043e1ebd1a76131cdf2.jpg
r'https?://[^/]+/imagecache/(?:[^/]+/)+(\d{6,})/',
thumbnail, 'video id', default=None)
# Fallback #2. Extract from raw JSON string.
# May extract wrong video id if relatedClips is present.
if not video_id:
video_id = self._search_regex(
r'currentVideo["\']\s*:\s*.+?["\']id["\']\s*:\s*["\'](\d{6,})',
webpage, 'video id')
return self.url_result(
smuggle_url(
'mtg:%s' % video_id,
{
'geo_countries': [
compat_urlparse.urlparse(url).netloc.rsplit('.', 1)[-1]],
# rtmp host mtgfs.fplive.net for viafree is unresolvable
'skip_rtmp': True,
}),
ie=TVPlayIE.ie_key(), video_id=video_id)
class TVPlayHomeIE(InfoExtractor): class TVPlayHomeIE(InfoExtractor):

View File

@ -1,26 +1,29 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import collections
import itertools import itertools
import re
import random
import json import json
import random
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_kwargs, compat_kwargs,
compat_parse_qs, compat_parse_qs,
compat_str, compat_str,
compat_urlparse,
compat_urllib_parse_urlencode, compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse, compat_urllib_parse_urlparse,
) )
from ..utils import ( from ..utils import (
clean_html, clean_html,
ExtractorError, ExtractorError,
float_or_none,
int_or_none, int_or_none,
orderedSet,
parse_duration, parse_duration,
parse_iso8601, parse_iso8601,
qualities,
try_get, try_get,
unified_timestamp, unified_timestamp,
update_url_query, update_url_query,
@ -50,8 +53,14 @@ class TwitchBaseIE(InfoExtractor):
def _call_api(self, path, item_id, *args, **kwargs): def _call_api(self, path, item_id, *args, **kwargs):
headers = kwargs.get('headers', {}).copy() headers = kwargs.get('headers', {}).copy()
headers['Client-ID'] = self._CLIENT_ID headers.update({
kwargs['headers'] = headers 'Accept': 'application/vnd.twitchtv.v5+json; charset=UTF-8',
'Client-ID': self._CLIENT_ID,
})
kwargs.update({
'headers': headers,
'expected_status': (400, 410),
})
response = self._download_json( response = self._download_json(
'%s/%s' % (self._API_BASE, path), item_id, '%s/%s' % (self._API_BASE, path), item_id,
*args, **compat_kwargs(kwargs)) *args, **compat_kwargs(kwargs))
@ -142,105 +151,16 @@ class TwitchBaseIE(InfoExtractor):
}) })
self._sort_formats(formats) self._sort_formats(formats)
def _download_access_token(self, channel_name):
return self._call_api(
'api/channels/%s/access_token' % channel_name, channel_name,
'Downloading access token JSON')
class TwitchItemBaseIE(TwitchBaseIE): def _extract_channel_id(self, token, channel_name):
def _download_info(self, item, item_id): return compat_str(self._parse_json(token, channel_name)['channel_id'])
return self._extract_info(self._call_api(
'kraken/videos/%s%s' % (item, item_id), item_id,
'Downloading %s info JSON' % self._ITEM_TYPE))
def _extract_media(self, item_id):
info = self._download_info(self._ITEM_SHORTCUT, item_id)
response = self._call_api(
'api/videos/%s%s' % (self._ITEM_SHORTCUT, item_id), item_id,
'Downloading %s playlist JSON' % self._ITEM_TYPE)
entries = []
chunks = response['chunks']
qualities = list(chunks.keys())
for num, fragment in enumerate(zip(*chunks.values()), start=1):
formats = []
for fmt_num, fragment_fmt in enumerate(fragment):
format_id = qualities[fmt_num]
fmt = {
'url': fragment_fmt['url'],
'format_id': format_id,
'quality': 1 if format_id == 'live' else 0,
}
m = re.search(r'^(?P<height>\d+)[Pp]', format_id)
if m:
fmt['height'] = int(m.group('height'))
formats.append(fmt)
self._sort_formats(formats)
entry = dict(info)
entry['id'] = '%s_%d' % (entry['id'], num)
entry['title'] = '%s part %d' % (entry['title'], num)
entry['formats'] = formats
entries.append(entry)
return self.playlist_result(entries, info['id'], info['title'])
def _extract_info(self, info):
status = info.get('status')
if status == 'recording':
is_live = True
elif status == 'recorded':
is_live = False
else:
is_live = None
return {
'id': info['_id'],
'title': info.get('title') or 'Untitled Broadcast',
'description': info.get('description'),
'duration': int_or_none(info.get('length')),
'thumbnail': info.get('preview'),
'uploader': info.get('channel', {}).get('display_name'),
'uploader_id': info.get('channel', {}).get('name'),
'timestamp': parse_iso8601(info.get('recorded_at')),
'view_count': int_or_none(info.get('views')),
'is_live': is_live,
}
def _real_extract(self, url):
return self._extract_media(self._match_id(url))
class TwitchVideoIE(TwitchItemBaseIE): class TwitchVodIE(TwitchBaseIE):
IE_NAME = 'twitch:video'
_VALID_URL = r'%s/[^/]+/b/(?P<id>\d+)' % TwitchBaseIE._VALID_URL_BASE
_ITEM_TYPE = 'video'
_ITEM_SHORTCUT = 'a'
_TEST = {
'url': 'http://www.twitch.tv/riotgames/b/577357806',
'info_dict': {
'id': 'a577357806',
'title': 'Worlds Semifinals - Star Horn Royal Club vs. OMG',
},
'playlist_mincount': 12,
'skip': 'HTTP Error 404: Not Found',
}
class TwitchChapterIE(TwitchItemBaseIE):
IE_NAME = 'twitch:chapter'
_VALID_URL = r'%s/[^/]+/c/(?P<id>\d+)' % TwitchBaseIE._VALID_URL_BASE
_ITEM_TYPE = 'chapter'
_ITEM_SHORTCUT = 'c'
_TESTS = [{
'url': 'http://www.twitch.tv/acracingleague/c/5285812',
'info_dict': {
'id': 'c5285812',
'title': 'ACRL Off Season - Sports Cars @ Nordschleife',
},
'playlist_mincount': 3,
'skip': 'HTTP Error 404: Not Found',
}, {
'url': 'http://www.twitch.tv/tsm_theoddone/c/2349361',
'only_matching': True,
}]
class TwitchVodIE(TwitchItemBaseIE):
IE_NAME = 'twitch:vod' IE_NAME = 'twitch:vod'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
@ -309,17 +229,60 @@ class TwitchVodIE(TwitchItemBaseIE):
'only_matching': True, 'only_matching': True,
}] }]
def _real_extract(self, url): def _download_info(self, item_id):
item_id = self._match_id(url) return self._extract_info(
self._call_api(
'kraken/videos/%s' % item_id, item_id,
'Downloading video info JSON'))
info = self._download_info(self._ITEM_SHORTCUT, item_id) @staticmethod
def _extract_info(info):
status = info.get('status')
if status == 'recording':
is_live = True
elif status == 'recorded':
is_live = False
else:
is_live = None
_QUALITIES = ('small', 'medium', 'large')
quality_key = qualities(_QUALITIES)
thumbnails = []
preview = info.get('preview')
if isinstance(preview, dict):
for thumbnail_id, thumbnail_url in preview.items():
thumbnail_url = url_or_none(thumbnail_url)
if not thumbnail_url:
continue
if thumbnail_id not in _QUALITIES:
continue
thumbnails.append({
'url': thumbnail_url,
'preference': quality_key(thumbnail_id),
})
return {
'id': info['_id'],
'title': info.get('title') or 'Untitled Broadcast',
'description': info.get('description'),
'duration': int_or_none(info.get('length')),
'thumbnails': thumbnails,
'uploader': info.get('channel', {}).get('display_name'),
'uploader_id': info.get('channel', {}).get('name'),
'timestamp': parse_iso8601(info.get('recorded_at')),
'view_count': int_or_none(info.get('views')),
'is_live': is_live,
}
def _real_extract(self, url):
vod_id = self._match_id(url)
info = self._download_info(vod_id)
access_token = self._call_api( access_token = self._call_api(
'api/vods/%s/access_token' % item_id, item_id, 'api/vods/%s/access_token' % vod_id, vod_id,
'Downloading %s access token' % self._ITEM_TYPE) 'Downloading %s access token' % self._ITEM_TYPE)
formats = self._extract_m3u8_formats( formats = self._extract_m3u8_formats(
'%s/vod/%s.m3u8?%s' % ( '%s/vod/%s.m3u8?%s' % (
self._USHER_BASE, item_id, self._USHER_BASE, vod_id,
compat_urllib_parse_urlencode({ compat_urllib_parse_urlencode({
'allow_source': 'true', 'allow_source': 'true',
'allow_audio_only': 'true', 'allow_audio_only': 'true',
@ -329,7 +292,7 @@ class TwitchVodIE(TwitchItemBaseIE):
'nauth': access_token['token'], 'nauth': access_token['token'],
'nauthsig': access_token['sig'], 'nauthsig': access_token['sig'],
})), })),
item_id, 'mp4', entry_protocol='m3u8_native') vod_id, 'mp4', entry_protocol='m3u8_native')
self._prefer_source(formats) self._prefer_source(formats)
info['formats'] = formats info['formats'] = formats
@ -343,7 +306,7 @@ class TwitchVodIE(TwitchItemBaseIE):
info['subtitles'] = { info['subtitles'] = {
'rechat': [{ 'rechat': [{
'url': update_url_query( 'url': update_url_query(
'https://api.twitch.tv/v5/videos/%s/comments' % item_id, { 'https://api.twitch.tv/v5/videos/%s/comments' % vod_id, {
'client_id': self._CLIENT_ID, 'client_id': self._CLIENT_ID,
}), }),
'ext': 'json', 'ext': 'json',
@ -353,166 +316,415 @@ class TwitchVodIE(TwitchItemBaseIE):
return info return info
class TwitchPlaylistBaseIE(TwitchBaseIE): def _make_video_result(node):
_PLAYLIST_PATH = 'kraken/channels/%s/videos/?offset=%d&limit=%d' assert isinstance(node, dict)
video_id = node.get('id')
if not video_id:
return
return {
'_type': 'url_transparent',
'ie_key': TwitchVodIE.ie_key(),
'id': video_id,
'url': 'https://www.twitch.tv/videos/%s' % video_id,
'title': node.get('title'),
'thumbnail': node.get('previewThumbnailURL'),
'duration': float_or_none(node.get('lengthSeconds')),
'view_count': int_or_none(node.get('viewCount')),
}
class TwitchGraphQLBaseIE(TwitchBaseIE):
_PAGE_LIMIT = 100 _PAGE_LIMIT = 100
def _extract_playlist(self, channel_id): _OPERATION_HASHES = {
info = self._call_api( 'CollectionSideBar': '27111f1b382effad0b6def325caef1909c733fe6a4fbabf54f8d491ef2cf2f14',
'kraken/channels/%s' % channel_id, 'FilterableVideoTower_Videos': 'a937f1d22e269e39a03b509f65a7490f9fc247d7f83d6ac1421523e3b68042cb',
channel_id, 'Downloading channel info JSON') 'ClipsCards__User': 'b73ad2bfaecfd30a9e6c28fada15bd97032c83ec77a0440766a56fe0bd632777',
channel_name = info.get('display_name') or info.get('name') 'ChannelCollectionsContent': '07e3691a1bad77a36aba590c351180439a40baefc1c275356f40fc7082419a84',
'StreamMetadata': '1c719a40e481453e5c48d9bb585d971b8b372f8ebb105b17076722264dfa5b3e',
'ComscoreStreamingQuery': 'e1edae8122517d013405f237ffcc124515dc6ded82480a88daef69c83b53ac01',
'VideoPreviewOverlay': '3006e77e51b128d838fa4e835723ca4dc9a05c5efd4466c1085215c6e437e65c',
}
def _download_gql(self, video_id, ops, note, fatal=True):
for op in ops:
op['extensions'] = {
'persistedQuery': {
'version': 1,
'sha256Hash': self._OPERATION_HASHES[op['operationName']],
}
}
return self._download_json(
'https://gql.twitch.tv/gql', video_id, note,
data=json.dumps(ops).encode(),
headers={
'Content-Type': 'text/plain;charset=UTF-8',
'Client-ID': self._CLIENT_ID,
}, fatal=fatal)
class TwitchCollectionIE(TwitchGraphQLBaseIE):
_VALID_URL = r'https?://(?:(?:www|go|m)\.)?twitch\.tv/collections/(?P<id>[^/]+)'
_TESTS = [{
'url': 'https://www.twitch.tv/collections/wlDCoH0zEBZZbQ',
'info_dict': {
'id': 'wlDCoH0zEBZZbQ',
'title': 'Overthrow Nook, capitalism for children',
},
'playlist_mincount': 13,
}]
_OPERATION_NAME = 'CollectionSideBar'
def _real_extract(self, url):
collection_id = self._match_id(url)
collection = self._download_gql(
collection_id, [{
'operationName': self._OPERATION_NAME,
'variables': {'collectionID': collection_id},
}],
'Downloading collection GraphQL')[0]['data']['collection']
title = collection.get('title')
entries = [] entries = []
for edge in collection['items']['edges']:
if not isinstance(edge, dict):
continue
node = edge.get('node')
if not isinstance(node, dict):
continue
video = _make_video_result(node)
if video:
entries.append(video)
return self.playlist_result(
entries, playlist_id=collection_id, playlist_title=title)
class TwitchPlaylistBaseIE(TwitchGraphQLBaseIE):
def _entries(self, channel_name, *args):
cursor = None
variables_common = self._make_variables(channel_name, *args)
entries_key = '%ss' % self._ENTRY_KIND
for page_num in itertools.count(1):
variables = variables_common.copy()
variables['limit'] = self._PAGE_LIMIT
if cursor:
variables['cursor'] = cursor
page = self._download_gql(
channel_name, [{
'operationName': self._OPERATION_NAME,
'variables': variables,
}],
'Downloading %ss GraphQL page %s' % (self._NODE_KIND, page_num),
fatal=False)
if not page:
break
edges = try_get(
page, lambda x: x[0]['data']['user'][entries_key]['edges'], list)
if not edges:
break
for edge in edges:
if not isinstance(edge, dict):
continue
if edge.get('__typename') != self._EDGE_KIND:
continue
node = edge.get('node')
if not isinstance(node, dict):
continue
if node.get('__typename') != self._NODE_KIND:
continue
entry = self._extract_entry(node)
if entry:
cursor = edge.get('cursor')
yield entry
if not cursor or not isinstance(cursor, compat_str):
break
# Deprecated kraken v5 API
def _entries_kraken(self, channel_name, broadcast_type, sort):
access_token = self._download_access_token(channel_name)
channel_id = self._extract_channel_id(access_token['token'], channel_name)
offset = 0 offset = 0
limit = self._PAGE_LIMIT
broken_paging_detected = False
counter_override = None counter_override = None
for counter in itertools.count(1): for counter in itertools.count(1):
response = self._call_api( response = self._call_api(
self._PLAYLIST_PATH % (channel_id, offset, limit), 'kraken/channels/%s/videos/' % channel_id,
channel_id, channel_id,
'Downloading %s JSON page %s' 'Downloading video JSON page %s' % (counter_override or counter),
% (self._PLAYLIST_TYPE, counter_override or counter)) query={
page_entries = self._extract_playlist_page(response) 'offset': offset,
if not page_entries: 'limit': self._PAGE_LIMIT,
'broadcast_type': broadcast_type,
'sort': sort,
})
videos = response.get('videos')
if not isinstance(videos, list):
break break
for video in videos:
if not isinstance(video, dict):
continue
video_url = url_or_none(video.get('url'))
if not video_url:
continue
yield {
'_type': 'url_transparent',
'ie_key': TwitchVodIE.ie_key(),
'id': video.get('_id'),
'url': video_url,
'title': video.get('title'),
'description': video.get('description'),
'timestamp': unified_timestamp(video.get('published_at')),
'duration': float_or_none(video.get('length')),
'view_count': int_or_none(video.get('views')),
'language': video.get('language'),
}
offset += self._PAGE_LIMIT
total = int_or_none(response.get('_total')) total = int_or_none(response.get('_total'))
# Since the beginning of March 2016 twitch's paging mechanism if total and offset >= total:
# is completely broken on the twitch side. It simply ignores
# a limit and returns the whole offset number of videos.
# Working around by just requesting all videos at once.
# Upd: pagination bug was fixed by twitch on 15.03.2016.
if not broken_paging_detected and total and len(page_entries) > limit:
self.report_warning(
'Twitch pagination is broken on twitch side, requesting all videos at once',
channel_id)
broken_paging_detected = True
offset = total
counter_override = '(all at once)'
continue
entries.extend(page_entries)
if broken_paging_detected or total and len(page_entries) >= total:
break break
offset += limit
return self.playlist_result(
[self._make_url_result(entry) for entry in orderedSet(entries)],
channel_id, channel_name)
def _make_url_result(self, url):
try:
video_id = 'v%s' % TwitchVodIE._match_id(url)
return self.url_result(url, TwitchVodIE.ie_key(), video_id=video_id)
except AssertionError:
return self.url_result(url)
def _extract_playlist_page(self, response):
videos = response.get('videos')
return [video['url'] for video in videos] if videos else []
def _real_extract(self, url):
return self._extract_playlist(self._match_id(url))
class TwitchProfileIE(TwitchPlaylistBaseIE): class TwitchVideosIE(TwitchPlaylistBaseIE):
IE_NAME = 'twitch:profile' _VALID_URL = r'https?://(?:(?:www|go|m)\.)?twitch\.tv/(?P<id>[^/]+)/(?:videos|profile)'
_VALID_URL = r'%s/(?P<id>[^/]+)/profile/?(?:\#.*)?$' % TwitchBaseIE._VALID_URL_BASE
_PLAYLIST_TYPE = 'profile'
_TESTS = [{ _TESTS = [{
'url': 'http://www.twitch.tv/vanillatv/profile', # All Videos sorted by Date
'info_dict': { 'url': 'https://www.twitch.tv/spamfish/videos?filter=all',
'id': 'vanillatv',
'title': 'VanillaTV',
},
'playlist_mincount': 412,
}, {
'url': 'http://m.twitch.tv/vanillatv/profile',
'only_matching': True,
}]
class TwitchVideosBaseIE(TwitchPlaylistBaseIE):
_VALID_URL_VIDEOS_BASE = r'%s/(?P<id>[^/]+)/videos' % TwitchBaseIE._VALID_URL_BASE
_PLAYLIST_PATH = TwitchPlaylistBaseIE._PLAYLIST_PATH + '&broadcast_type='
class TwitchAllVideosIE(TwitchVideosBaseIE):
IE_NAME = 'twitch:videos:all'
_VALID_URL = r'%s/all' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
_PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'archive,upload,highlight'
_PLAYLIST_TYPE = 'all videos'
_TESTS = [{
'url': 'https://www.twitch.tv/spamfish/videos/all',
'info_dict': { 'info_dict': {
'id': 'spamfish', 'id': 'spamfish',
'title': 'Spamfish', 'title': 'spamfish - All Videos sorted by Date',
}, },
'playlist_mincount': 869, 'playlist_mincount': 924,
}, {
# All Videos sorted by Popular
'url': 'https://www.twitch.tv/spamfish/videos?filter=all&sort=views',
'info_dict': {
'id': 'spamfish',
'title': 'spamfish - All Videos sorted by Popular',
},
'playlist_mincount': 931,
}, {
# Past Broadcasts sorted by Date
'url': 'https://www.twitch.tv/spamfish/videos?filter=archives',
'info_dict': {
'id': 'spamfish',
'title': 'spamfish - Past Broadcasts sorted by Date',
},
'playlist_mincount': 27,
}, {
# Highlights sorted by Date
'url': 'https://www.twitch.tv/spamfish/videos?filter=highlights',
'info_dict': {
'id': 'spamfish',
'title': 'spamfish - Highlights sorted by Date',
},
'playlist_mincount': 901,
}, {
# Uploads sorted by Date
'url': 'https://www.twitch.tv/esl_csgo/videos?filter=uploads&sort=time',
'info_dict': {
'id': 'esl_csgo',
'title': 'esl_csgo - Uploads sorted by Date',
},
'playlist_mincount': 5,
}, {
# Past Premieres sorted by Date
'url': 'https://www.twitch.tv/spamfish/videos?filter=past_premieres',
'info_dict': {
'id': 'spamfish',
'title': 'spamfish - Past Premieres sorted by Date',
},
'playlist_mincount': 1,
}, {
'url': 'https://www.twitch.tv/spamfish/videos/all',
'only_matching': True,
}, { }, {
'url': 'https://m.twitch.tv/spamfish/videos/all', 'url': 'https://m.twitch.tv/spamfish/videos/all',
'only_matching': True, 'only_matching': True,
}]
class TwitchUploadsIE(TwitchVideosBaseIE):
IE_NAME = 'twitch:videos:uploads'
_VALID_URL = r'%s/uploads' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
_PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'upload'
_PLAYLIST_TYPE = 'uploads'
_TESTS = [{
'url': 'https://www.twitch.tv/spamfish/videos/uploads',
'info_dict': {
'id': 'spamfish',
'title': 'Spamfish',
},
'playlist_mincount': 0,
}, { }, {
'url': 'https://m.twitch.tv/spamfish/videos/uploads', 'url': 'https://www.twitch.tv/spamfish/videos',
'only_matching': True, 'only_matching': True,
}] }]
Broadcast = collections.namedtuple('Broadcast', ['type', 'label'])
class TwitchPastBroadcastsIE(TwitchVideosBaseIE): _DEFAULT_BROADCAST = Broadcast(None, 'All Videos')
IE_NAME = 'twitch:videos:past-broadcasts' _BROADCASTS = {
_VALID_URL = r'%s/past-broadcasts' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE 'archives': Broadcast('ARCHIVE', 'Past Broadcasts'),
_PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'archive' 'highlights': Broadcast('HIGHLIGHT', 'Highlights'),
_PLAYLIST_TYPE = 'past broadcasts' 'uploads': Broadcast('UPLOAD', 'Uploads'),
'past_premieres': Broadcast('PAST_PREMIERE', 'Past Premieres'),
'all': _DEFAULT_BROADCAST,
}
_DEFAULT_SORTED_BY = 'Date'
_SORTED_BY = {
'time': _DEFAULT_SORTED_BY,
'views': 'Popular',
}
_OPERATION_NAME = 'FilterableVideoTower_Videos'
_ENTRY_KIND = 'video'
_EDGE_KIND = 'VideoEdge'
_NODE_KIND = 'Video'
@classmethod
def suitable(cls, url):
return (False
if any(ie.suitable(url) for ie in (
TwitchVideosClipsIE,
TwitchVideosCollectionsIE))
else super(TwitchVideosIE, cls).suitable(url))
@staticmethod
def _make_variables(channel_name, broadcast_type, sort):
return {
'channelOwnerLogin': channel_name,
'broadcastType': broadcast_type,
'videoSort': sort.upper(),
}
@staticmethod
def _extract_entry(node):
return _make_video_result(node)
def _real_extract(self, url):
channel_name = self._match_id(url)
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
filter = qs.get('filter', ['all'])[0]
sort = qs.get('sort', ['time'])[0]
broadcast = self._BROADCASTS.get(filter, self._DEFAULT_BROADCAST)
return self.playlist_result(
self._entries(channel_name, broadcast.type, sort),
playlist_id=channel_name,
playlist_title='%s - %s sorted by %s'
% (channel_name, broadcast.label,
self._SORTED_BY.get(sort, self._DEFAULT_SORTED_BY)))
class TwitchVideosClipsIE(TwitchPlaylistBaseIE):
_VALID_URL = r'https?://(?:(?:www|go|m)\.)?twitch\.tv/(?P<id>[^/]+)/(?:clips|videos/*?\?.*?\bfilter=clips)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.twitch.tv/spamfish/videos/past-broadcasts', # Clips
'url': 'https://www.twitch.tv/vanillatv/clips?filter=clips&range=all',
'info_dict': { 'info_dict': {
'id': 'spamfish', 'id': 'vanillatv',
'title': 'Spamfish', 'title': 'vanillatv - Clips Top All',
}, },
'playlist_mincount': 0, 'playlist_mincount': 1,
}, { }, {
'url': 'https://m.twitch.tv/spamfish/videos/past-broadcasts', 'url': 'https://www.twitch.tv/dota2ruhub/videos?filter=clips&range=7d',
'only_matching': True, 'only_matching': True,
}] }]
Clip = collections.namedtuple('Clip', ['filter', 'label'])
class TwitchHighlightsIE(TwitchVideosBaseIE): _DEFAULT_CLIP = Clip('LAST_WEEK', 'Top 7D')
IE_NAME = 'twitch:videos:highlights' _RANGE = {
_VALID_URL = r'%s/highlights' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE '24hr': Clip('LAST_DAY', 'Top 24H'),
_PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'highlight' '7d': _DEFAULT_CLIP,
_PLAYLIST_TYPE = 'highlights' '30d': Clip('LAST_MONTH', 'Top 30D'),
'all': Clip('ALL_TIME', 'Top All'),
}
# NB: values other than 20 result in skipped videos
_PAGE_LIMIT = 20
_OPERATION_NAME = 'ClipsCards__User'
_ENTRY_KIND = 'clip'
_EDGE_KIND = 'ClipEdge'
_NODE_KIND = 'Clip'
@staticmethod
def _make_variables(channel_name, filter):
return {
'login': channel_name,
'criteria': {
'filter': filter,
},
}
@staticmethod
def _extract_entry(node):
assert isinstance(node, dict)
clip_url = url_or_none(node.get('url'))
if not clip_url:
return
return {
'_type': 'url_transparent',
'ie_key': TwitchClipsIE.ie_key(),
'id': node.get('id'),
'url': clip_url,
'title': node.get('title'),
'thumbnail': node.get('thumbnailURL'),
'duration': float_or_none(node.get('durationSeconds')),
'timestamp': unified_timestamp(node.get('createdAt')),
'view_count': int_or_none(node.get('viewCount')),
'language': node.get('language'),
}
def _real_extract(self, url):
channel_name = self._match_id(url)
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
range = qs.get('range', ['7d'])[0]
clip = self._RANGE.get(range, self._DEFAULT_CLIP)
return self.playlist_result(
self._entries(channel_name, clip.filter),
playlist_id=channel_name,
playlist_title='%s - Clips %s' % (channel_name, clip.label))
class TwitchVideosCollectionsIE(TwitchPlaylistBaseIE):
_VALID_URL = r'https?://(?:(?:www|go|m)\.)?twitch\.tv/(?P<id>[^/]+)/videos/*?\?.*?\bfilter=collections'
_TESTS = [{ _TESTS = [{
'url': 'https://www.twitch.tv/spamfish/videos/highlights', # Collections
'url': 'https://www.twitch.tv/spamfish/videos?filter=collections',
'info_dict': { 'info_dict': {
'id': 'spamfish', 'id': 'spamfish',
'title': 'Spamfish', 'title': 'spamfish - Collections',
}, },
'playlist_mincount': 805, 'playlist_mincount': 3,
}, {
'url': 'https://m.twitch.tv/spamfish/videos/highlights',
'only_matching': True,
}] }]
_OPERATION_NAME = 'ChannelCollectionsContent'
_ENTRY_KIND = 'collection'
_EDGE_KIND = 'CollectionsItemEdge'
_NODE_KIND = 'Collection'
class TwitchStreamIE(TwitchBaseIE): @staticmethod
def _make_variables(channel_name):
return {
'ownerLogin': channel_name,
}
@staticmethod
def _extract_entry(node):
assert isinstance(node, dict)
collection_id = node.get('id')
if not collection_id:
return
return {
'_type': 'url_transparent',
'ie_key': TwitchCollectionIE.ie_key(),
'id': collection_id,
'url': 'https://www.twitch.tv/collections/%s' % collection_id,
'title': node.get('title'),
'thumbnail': node.get('thumbnailURL'),
'duration': float_or_none(node.get('lengthSeconds')),
'timestamp': unified_timestamp(node.get('updatedAt')),
'view_count': int_or_none(node.get('viewCount')),
}
def _real_extract(self, url):
channel_name = self._match_id(url)
return self.playlist_result(
self._entries(channel_name), playlist_id=channel_name,
playlist_title='%s - Collections' % channel_name)
class TwitchStreamIE(TwitchGraphQLBaseIE):
IE_NAME = 'twitch:stream' IE_NAME = 'twitch:stream'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
@ -560,37 +772,52 @@ class TwitchStreamIE(TwitchBaseIE):
def suitable(cls, url): def suitable(cls, url):
return (False return (False
if any(ie.suitable(url) for ie in ( if any(ie.suitable(url) for ie in (
TwitchVideoIE,
TwitchChapterIE,
TwitchVodIE, TwitchVodIE,
TwitchProfileIE, TwitchCollectionIE,
TwitchAllVideosIE, TwitchVideosIE,
TwitchUploadsIE, TwitchVideosClipsIE,
TwitchPastBroadcastsIE, TwitchVideosCollectionsIE,
TwitchHighlightsIE,
TwitchClipsIE)) TwitchClipsIE))
else super(TwitchStreamIE, cls).suitable(url)) else super(TwitchStreamIE, cls).suitable(url))
def _real_extract(self, url): def _real_extract(self, url):
channel_id = self._match_id(url) channel_name = self._match_id(url).lower()
stream = self._call_api( gql = self._download_gql(
'kraken/streams/%s?stream_type=all' % channel_id.lower(), channel_name, [{
channel_id, 'Downloading stream JSON').get('stream') 'operationName': 'StreamMetadata',
'variables': {'channelLogin': channel_name},
}, {
'operationName': 'ComscoreStreamingQuery',
'variables': {
'channel': channel_name,
'clipSlug': '',
'isClip': False,
'isLive': True,
'isVodOrCollection': False,
'vodID': '',
},
}, {
'operationName': 'VideoPreviewOverlay',
'variables': {'login': channel_name},
}],
'Downloading stream GraphQL')
user = gql[0]['data']['user']
if not user:
raise ExtractorError(
'%s does not exist' % channel_name, expected=True)
stream = user['stream']
if not stream: if not stream:
raise ExtractorError('%s is offline' % channel_id, expected=True) raise ExtractorError('%s is offline' % channel_name, expected=True)
# Channel name may be typed if different case than the original channel name access_token = self._download_access_token(channel_name)
# (e.g. http://www.twitch.tv/TWITCHPLAYSPOKEMON) that will lead to constructing token = access_token['token']
# an invalid m3u8 URL. Working around by use of original channel name from stream
# JSON and fallback to lowercase if it's not available.
channel_id = stream.get('channel', {}).get('name') or channel_id.lower()
access_token = self._call_api(
'api/channels/%s/access_token' % channel_id, channel_id,
'Downloading channel access token')
stream_id = stream.get('id') or channel_name
query = { query = {
'allow_source': 'true', 'allow_source': 'true',
'allow_audio_only': 'true', 'allow_audio_only': 'true',
@ -600,44 +827,42 @@ class TwitchStreamIE(TwitchBaseIE):
'playlist_include_framerate': 'true', 'playlist_include_framerate': 'true',
'segment_preference': '4', 'segment_preference': '4',
'sig': access_token['sig'].encode('utf-8'), 'sig': access_token['sig'].encode('utf-8'),
'token': access_token['token'].encode('utf-8'), 'token': token.encode('utf-8'),
} }
formats = self._extract_m3u8_formats( formats = self._extract_m3u8_formats(
'%s/api/channel/hls/%s.m3u8?%s' '%s/api/channel/hls/%s.m3u8' % (self._USHER_BASE, channel_name),
% (self._USHER_BASE, channel_id, compat_urllib_parse_urlencode(query)), stream_id, 'mp4', query=query)
channel_id, 'mp4')
self._prefer_source(formats) self._prefer_source(formats)
view_count = stream.get('viewers') view_count = stream.get('viewers')
timestamp = parse_iso8601(stream.get('created_at')) timestamp = unified_timestamp(stream.get('createdAt'))
channel = stream['channel'] sq_user = try_get(gql, lambda x: x[1]['data']['user'], dict) or {}
title = self._live_title(channel.get('display_name') or channel.get('name')) uploader = sq_user.get('displayName')
description = channel.get('status') description = try_get(
sq_user, lambda x: x['broadcastSettings']['title'], compat_str)
thumbnails = [] thumbnail = url_or_none(try_get(
for thumbnail_key, thumbnail_url in stream['preview'].items(): gql, lambda x: x[2]['data']['user']['stream']['previewImageURL'],
m = re.search(r'(?P<width>\d+)x(?P<height>\d+)\.jpg$', thumbnail_key) compat_str))
if not m:
continue title = uploader or channel_name
thumbnails.append({ stream_type = stream.get('type')
'url': thumbnail_url, if stream_type in ['rerun', 'live']:
'width': int(m.group('width')), title += ' (%s)' % stream_type
'height': int(m.group('height')),
})
return { return {
'id': compat_str(stream['_id']), 'id': stream_id,
'display_id': channel_id, 'display_id': channel_name,
'title': title, 'title': self._live_title(title),
'description': description, 'description': description,
'thumbnails': thumbnails, 'thumbnail': thumbnail,
'uploader': channel.get('display_name'), 'uploader': uploader,
'uploader_id': channel.get('name'), 'uploader_id': channel_name,
'timestamp': timestamp, 'timestamp': timestamp,
'view_count': view_count, 'view_count': view_count,
'formats': formats, 'formats': formats,
'is_live': True, 'is_live': stream_type == 'live',
} }

View File

@ -578,6 +578,18 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
IE_NAME = 'twitter:broadcast' IE_NAME = 'twitter:broadcast'
_VALID_URL = TwitterBaseIE._BASE_REGEX + r'i/broadcasts/(?P<id>[0-9a-zA-Z]{13})' _VALID_URL = TwitterBaseIE._BASE_REGEX + r'i/broadcasts/(?P<id>[0-9a-zA-Z]{13})'
_TEST = {
# untitled Periscope video
'url': 'https://twitter.com/i/broadcasts/1yNGaQLWpejGj',
'info_dict': {
'id': '1yNGaQLWpejGj',
'ext': 'mp4',
'title': 'Andrea May Sahouri - Periscope Broadcast',
'uploader': 'Andrea May Sahouri',
'uploader_id': '1PXEdBZWpGwKe',
},
}
def _real_extract(self, url): def _real_extract(self, url):
broadcast_id = self._match_id(url) broadcast_id = self._match_id(url)
broadcast = self._call_api( broadcast = self._call_api(

View File

@ -2,12 +2,17 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urllib_parse_urlencode,
)
from ..utils import ( from ..utils import (
clean_html, clean_html,
int_or_none, int_or_none,
parse_duration, parse_duration,
parse_iso8601,
qualities,
update_url_query, update_url_query,
str_or_none,
) )
@ -16,21 +21,25 @@ class UOLIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?uol\.com\.br/.*?(?:(?:mediaId|v)=|view/(?:[a-z0-9]+/)?|video(?:=|/(?:\d{4}/\d{2}/\d{2}/)?))(?P<id>\d+|[\w-]+-[A-Z0-9]+)' _VALID_URL = r'https?://(?:.+?\.)?uol\.com\.br/.*?(?:(?:mediaId|v)=|view/(?:[a-z0-9]+/)?|video(?:=|/(?:\d{4}/\d{2}/\d{2}/)?))(?P<id>\d+|[\w-]+-[A-Z0-9]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://player.mais.uol.com.br/player_video_v3.swf?mediaId=15951931', 'url': 'http://player.mais.uol.com.br/player_video_v3.swf?mediaId=15951931',
'md5': '25291da27dc45e0afb5718a8603d3816', 'md5': '4f1e26683979715ff64e4e29099cf020',
'info_dict': { 'info_dict': {
'id': '15951931', 'id': '15951931',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Miss simpatia é encontrada morta', 'title': 'Miss simpatia é encontrada morta',
'description': 'md5:3f8c11a0c0556d66daf7e5b45ef823b2', 'description': 'md5:3f8c11a0c0556d66daf7e5b45ef823b2',
'timestamp': 1470421860,
'upload_date': '20160805',
} }
}, { }, {
'url': 'http://tvuol.uol.com.br/video/incendio-destroi-uma-das-maiores-casas-noturnas-de-londres-04024E9A3268D4C95326', 'url': 'http://tvuol.uol.com.br/video/incendio-destroi-uma-das-maiores-casas-noturnas-de-londres-04024E9A3268D4C95326',
'md5': 'e41a2fb7b7398a3a46b6af37b15c00c9', 'md5': '2850a0e8dfa0a7307e04a96c5bdc5bc2',
'info_dict': { 'info_dict': {
'id': '15954259', 'id': '15954259',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Incêndio destrói uma das maiores casas noturnas de Londres', 'title': 'Incêndio destrói uma das maiores casas noturnas de Londres',
'description': 'Em Londres, um incêndio destruiu uma das maiores boates da cidade. Não há informações sobre vítimas.', 'description': 'Em Londres, um incêndio destruiu uma das maiores boates da cidade. Não há informações sobre vítimas.',
'timestamp': 1470674520,
'upload_date': '20160808',
} }
}, { }, {
'url': 'http://mais.uol.com.br/static/uolplayer/index.html?mediaId=15951931', 'url': 'http://mais.uol.com.br/static/uolplayer/index.html?mediaId=15951931',
@ -55,91 +64,55 @@ class UOLIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
_FORMATS = {
'2': {
'width': 640,
'height': 360,
},
'5': {
'width': 1280,
'height': 720,
},
'6': {
'width': 426,
'height': 240,
},
'7': {
'width': 1920,
'height': 1080,
},
'8': {
'width': 192,
'height': 144,
},
'9': {
'width': 568,
'height': 320,
},
'11': {
'width': 640,
'height': 360,
}
}
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
media_id = None
if video_id.isdigit():
media_id = video_id
if not media_id:
embed_page = self._download_webpage(
'https://jsuol.com.br/c/tv/uol/embed/?params=[embed,%s]' % video_id,
video_id, 'Downloading embed page', fatal=False)
if embed_page:
media_id = self._search_regex(
(r'uol\.com\.br/(\d+)', r'mediaId=(\d+)'),
embed_page, 'media id', default=None)
if not media_id:
webpage = self._download_webpage(url, video_id)
media_id = self._search_regex(r'mediaId=(\d+)', webpage, 'media id')
video_data = self._download_json( video_data = self._download_json(
'http://mais.uol.com.br/apiuol/v3/player/getMedia/%s.json' % media_id, # https://api.mais.uol.com.br/apiuol/v4/player/data/[MEDIA_ID]
media_id)['item'] 'https://api.mais.uol.com.br/apiuol/v3/media/detail/' + video_id,
video_id)['item']
media_id = compat_str(video_data['mediaId'])
title = video_data['title'] title = video_data['title']
ver = video_data.get('revision', 2)
query = { uol_formats = self._download_json(
'ver': video_data.get('numRevision', 2), 'https://croupier.mais.uol.com.br/v3/formats/%s/jsonp' % media_id,
'r': 'http://mais.uol.com.br', media_id)
} quality = qualities(['mobile', 'WEBM', '360p', '720p', '1080p'])
for k in ('token', 'sign'):
v = video_data.get(k)
if v:
query[k] = v
formats = [] formats = []
for f in video_data.get('formats', []): for format_id, f in uol_formats.items():
if not isinstance(f, dict):
continue
f_url = f.get('url') or f.get('secureUrl') f_url = f.get('url') or f.get('secureUrl')
if not f_url: if not f_url:
continue continue
query = {
'ver': ver,
'r': 'http://mais.uol.com.br',
}
for k in ('token', 'sign'):
v = f.get(k)
if v:
query[k] = v
f_url = update_url_query(f_url, query) f_url = update_url_query(f_url, query)
format_id = str_or_none(f.get('id')) format_id = format_id
if format_id == '10': if format_id == 'HLS':
formats.extend(self._extract_m3u8_formats( m3u8_formats = self._extract_m3u8_formats(
f_url, video_id, 'mp4', 'm3u8_native', f_url, media_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False)) m3u8_id='hls', fatal=False)
encoded_query = compat_urllib_parse_urlencode(query)
for m3u8_f in m3u8_formats:
m3u8_f['extra_param_to_segment_url'] = encoded_query
m3u8_f['url'] = update_url_query(m3u8_f['url'], query)
formats.extend(m3u8_formats)
continue continue
fmt = { formats.append({
'format_id': format_id, 'format_id': format_id,
'url': f_url, 'url': f_url,
'source_preference': 1, 'quality': quality(format_id),
} 'preference': -1,
fmt.update(self._FORMATS.get(format_id, {})) })
formats.append(fmt) self._sort_formats(formats)
self._sort_formats(formats, ('height', 'width', 'source_preference', 'tbr', 'ext'))
tags = [] tags = []
for tag in video_data.get('tags', []): for tag in video_data.get('tags', []):
@ -148,12 +121,24 @@ class UOLIE(InfoExtractor):
continue continue
tags.append(tag_description) tags.append(tag_description)
thumbnails = []
for q in ('Small', 'Medium', 'Wmedium', 'Large', 'Wlarge', 'Xlarge'):
q_url = video_data.get('thumb' + q)
if not q_url:
continue
thumbnails.append({
'id': q,
'url': q_url,
})
return { return {
'id': media_id, 'id': media_id,
'title': title, 'title': title,
'description': clean_html(video_data.get('desMedia')), 'description': clean_html(video_data.get('description')),
'thumbnail': video_data.get('thumbnail'), 'thumbnails': thumbnails,
'duration': int_or_none(video_data.get('durationSeconds')) or parse_duration(video_data.get('duration')), 'duration': parse_duration(video_data.get('duration')),
'tags': tags, 'tags': tags,
'formats': formats, 'formats': formats,
'timestamp': parse_iso8601(video_data.get('publishDate'), ' '),
'view_count': int_or_none(video_data.get('viewsQtty')),
} }

View File

@ -140,28 +140,28 @@ class VimeoBaseInfoExtractor(InfoExtractor):
}) })
# TODO: fix handling of 308 status code returned for live archive manifest requests # TODO: fix handling of 308 status code returned for live archive manifest requests
sep_pattern = r'/sep/video/'
for files_type in ('hls', 'dash'): for files_type in ('hls', 'dash'):
for cdn_name, cdn_data in config_files.get(files_type, {}).get('cdns', {}).items(): for cdn_name, cdn_data in config_files.get(files_type, {}).get('cdns', {}).items():
manifest_url = cdn_data.get('url') manifest_url = cdn_data.get('url')
if not manifest_url: if not manifest_url:
continue continue
format_id = '%s-%s' % (files_type, cdn_name) format_id = '%s-%s' % (files_type, cdn_name)
if files_type == 'hls': sep_manifest_urls = []
formats.extend(self._extract_m3u8_formats( if re.search(sep_pattern, manifest_url):
manifest_url, video_id, 'mp4', for suffix, repl in (('', 'video'), ('_sep', 'sep/video')):
'm3u8' if is_live else 'm3u8_native', m3u8_id=format_id, sep_manifest_urls.append((format_id + suffix, re.sub(
note='Downloading %s m3u8 information' % cdn_name, sep_pattern, '/%s/' % repl, manifest_url)))
fatal=False)) else:
elif files_type == 'dash': sep_manifest_urls = [(format_id, manifest_url)]
mpd_pattern = r'/%s/(?:sep/)?video/' % video_id for f_id, m_url in sep_manifest_urls:
mpd_manifest_urls = [] if files_type == 'hls':
if re.search(mpd_pattern, manifest_url): formats.extend(self._extract_m3u8_formats(
for suffix, repl in (('', 'video'), ('_sep', 'sep/video')): m_url, video_id, 'mp4',
mpd_manifest_urls.append((format_id + suffix, re.sub( 'm3u8' if is_live else 'm3u8_native', m3u8_id=f_id,
mpd_pattern, '/%s/%s/' % (video_id, repl), manifest_url))) note='Downloading %s m3u8 information' % cdn_name,
else: fatal=False))
mpd_manifest_urls = [(format_id, manifest_url)] elif files_type == 'dash':
for f_id, m_url in mpd_manifest_urls:
if 'json=1' in m_url: if 'json=1' in m_url:
real_m_url = (self._download_json(m_url, video_id, fatal=False) or {}).get('url') real_m_url = (self._download_json(m_url, video_id, fatal=False) or {}).get('url')
if real_m_url: if real_m_url:
@ -170,11 +170,6 @@ class VimeoBaseInfoExtractor(InfoExtractor):
m_url.replace('/master.json', '/master.mpd'), video_id, f_id, m_url.replace('/master.json', '/master.mpd'), video_id, f_id,
'Downloading %s MPD information' % cdn_name, 'Downloading %s MPD information' % cdn_name,
fatal=False) fatal=False)
for f in mpd_formats:
if f.get('vcodec') == 'none':
f['preference'] = -50
elif f.get('acodec') == 'none':
f['preference'] = -40
formats.extend(mpd_formats) formats.extend(mpd_formats)
live_archive = live_event.get('archive') or {} live_archive = live_event.get('archive') or {}
@ -186,6 +181,12 @@ class VimeoBaseInfoExtractor(InfoExtractor):
'preference': 1, 'preference': 1,
}) })
for f in formats:
if f.get('vcodec') == 'none':
f['preference'] = -50
elif f.get('acodec') == 'none':
f['preference'] = -40
subtitles = {} subtitles = {}
text_tracks = config['request'].get('text_tracks') text_tracks = config['request'].get('text_tracks')
if text_tracks: if text_tracks:

View File

@ -56,7 +56,7 @@ class WistiaIE(InfoExtractor):
urls.append(unescapeHTML(match.group('url'))) urls.append(unescapeHTML(match.group('url')))
for match in re.finditer( for match in re.finditer(
r'''(?sx) r'''(?sx)
<div[^>]+class=(["']).*?\bwistia_async_(?P<id>[a-z0-9]{10})\b.*?\2 <div[^>]+class=(["'])(?:(?!\1).)*?\bwistia_async_(?P<id>[a-z0-9]{10})\b(?:(?!\1).)*?\1
''', webpage): ''', webpage):
urls.append('wistia:%s' % match.group('id')) urls.append('wistia:%s' % match.group('id'))
for match in re.finditer(r'(?:data-wistia-?id=["\']|Wistia\.embed\(["\']|id=["\']wistia_)(?P<id>[a-z0-9]{10})', webpage): for match in re.finditer(r'(?:data-wistia-?id=["\']|Wistia\.embed\(["\']|id=["\']wistia_)(?P<id>[a-z0-9]{10})', webpage):

View File

@ -20,13 +20,13 @@ from ..utils import (
class XHamsterIE(InfoExtractor): class XHamsterIE(InfoExtractor):
_DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster[27]\.com)' _DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster\d+\.com)'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?:.+?\.)?%s/ (?:.+?\.)?%s/
(?: (?:
movies/(?P<id>\d+)/(?P<display_id>[^/]*)\.html| movies/(?P<id>[\dA-Za-z]+)/(?P<display_id>[^/]*)\.html|
videos/(?P<display_id_2>[^/]*)-(?P<id_2>\d+) videos/(?P<display_id_2>[^/]*)-(?P<id_2>[\dA-Za-z]+)
) )
''' % _DOMAINS ''' % _DOMAINS
_TESTS = [{ _TESTS = [{
@ -99,12 +99,21 @@ class XHamsterIE(InfoExtractor):
}, { }, {
'url': 'https://xhamster2.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445', 'url': 'https://xhamster2.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://xhamster11.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
'only_matching': True,
}, {
'url': 'https://xhamster26.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
'only_matching': True,
}, { }, {
'url': 'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html', 'url': 'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd', 'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://de.xhamster.com/videos/skinny-girl-fucks-herself-hard-in-the-forest-xhnBJZx',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -129,7 +138,8 @@ class XHamsterIE(InfoExtractor):
initials = self._parse_json( initials = self._parse_json(
self._search_regex( self._search_regex(
r'window\.initials\s*=\s*({.+?})\s*;\s*\n', webpage, 'initials', (r'window\.initials\s*=\s*({.+?})\s*;\s*</script>',
r'window\.initials\s*=\s*({.+?})\s*;'), webpage, 'initials',
default='{}'), default='{}'),
video_id, fatal=False) video_id, fatal=False)
if initials: if initials:

View File

@ -12,6 +12,7 @@ from ..compat import (
) )
from ..utils import ( from ..utils import (
clean_html, clean_html,
ExtractorError,
int_or_none, int_or_none,
mimetype2ext, mimetype2ext,
parse_iso8601, parse_iso8601,
@ -368,31 +369,47 @@ class YahooGyaOPlayerIE(InfoExtractor):
'url': 'https://gyao.yahoo.co.jp/episode/%E3%81%8D%E3%81%AE%E3%81%86%E4%BD%95%E9%A3%9F%E3%81%B9%E3%81%9F%EF%BC%9F%20%E7%AC%AC2%E8%A9%B1%202019%2F4%2F12%E6%94%BE%E9%80%81%E5%88%86/5cb02352-b725-409e-9f8d-88f947a9f682', 'url': 'https://gyao.yahoo.co.jp/episode/%E3%81%8D%E3%81%AE%E3%81%86%E4%BD%95%E9%A3%9F%E3%81%B9%E3%81%9F%EF%BC%9F%20%E7%AC%AC2%E8%A9%B1%202019%2F4%2F12%E6%94%BE%E9%80%81%E5%88%86/5cb02352-b725-409e-9f8d-88f947a9f682',
'only_matching': True, 'only_matching': True,
}] }]
_GEO_BYPASS = False
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url).replace('/', ':') video_id = self._match_id(url).replace('/', ':')
video = self._download_json( headers = self.geo_verification_headers()
'https://gyao.yahoo.co.jp/dam/v1/videos/' + video_id, headers['Accept'] = 'application/json'
video_id, query={ resp = self._download_json(
'fields': 'longDescription,title,videoId', 'https://gyao.yahoo.co.jp/apis/playback/graphql', video_id, query={
}, headers={ 'appId': 'dj00aiZpPUNJeDh2cU1RazU3UCZzPWNvbnN1bWVyc2VjcmV0Jng9NTk-',
'X-User-Agent': 'Unknown Pc GYAO!/2.0.0 Web', 'query': '''{
}) content(parameter: {contentId: "%s", logicaAgent: PC_WEB}) {
video {
delivery {
id
}
title
}
}
}''' % video_id,
}, headers=headers)
content = resp['data']['content']
if not content:
msg = resp['errors'][0]['message']
if msg == 'not in japan':
self.raise_geo_restricted(countries=['JP'])
raise ExtractorError(msg)
video = content['video']
return { return {
'_type': 'url_transparent', '_type': 'url_transparent',
'id': video_id, 'id': video_id,
'title': video['title'], 'title': video['title'],
'url': smuggle_url( 'url': smuggle_url(
'http://players.brightcove.net/4235717419001/SyG5P0gjb_default/index.html?videoId=' + video['videoId'], 'http://players.brightcove.net/4235717419001/SyG5P0gjb_default/index.html?videoId=' + video['delivery']['id'],
{'geo_countries': ['JP']}), {'geo_countries': ['JP']}),
'description': video.get('longDescription'),
'ie_key': BrightcoveNewIE.ie_key(), 'ie_key': BrightcoveNewIE.ie_key(),
} }
class YahooGyaOIE(InfoExtractor): class YahooGyaOIE(InfoExtractor):
IE_NAME = 'yahoo:gyao' IE_NAME = 'yahoo:gyao'
_VALID_URL = r'https?://(?:gyao\.yahoo\.co\.jp/(?:p|title/[^/]+)|streaming\.yahoo\.co\.jp/p/y)/(?P<id>\d+/v\d+|[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})' _VALID_URL = r'https?://(?:gyao\.yahoo\.co\.jp/(?:p|title(?:/[^/]+)?)|streaming\.yahoo\.co\.jp/p/y)/(?P<id>\d+/v\d+|[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_TESTS = [{ _TESTS = [{
'url': 'https://gyao.yahoo.co.jp/p/00449/v03102/', 'url': 'https://gyao.yahoo.co.jp/p/00449/v03102/',
'info_dict': { 'info_dict': {
@ -405,6 +422,9 @@ class YahooGyaOIE(InfoExtractor):
}, { }, {
'url': 'https://gyao.yahoo.co.jp/title/%E3%81%97%E3%82%83%E3%81%B9%E3%81%8F%E3%82%8A007/5b025a49-b2e5-4dc7-945c-09c6634afacf', 'url': 'https://gyao.yahoo.co.jp/title/%E3%81%97%E3%82%83%E3%81%B9%E3%81%8F%E3%82%8A007/5b025a49-b2e5-4dc7-945c-09c6634afacf',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://gyao.yahoo.co.jp/title/5b025a49-b2e5-4dc7-945c-09c6634afacf',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -70,9 +70,14 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
_PLAYLIST_ID_RE = r'(?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)[0-9A-Za-z-_]{10,}' _PLAYLIST_ID_RE = r'(?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)[0-9A-Za-z-_]{10,}'
_YOUTUBE_CLIENT_HEADERS = {
'x-youtube-client-name': '1',
'x-youtube-client-version': '1.20200609.04.02',
}
def _set_language(self): def _set_language(self):
self._set_cookie( self._set_cookie(
'.youtube.com', 'PREF', 'f1=50000000&hl=en', '.youtube.com', 'PREF', 'f1=50000000&f6=8&hl=en',
# YouTube sets the expire time to about two months # YouTube sets the expire time to about two months
expire_time=time.time() + 2 * 30 * 24 * 3600) expire_time=time.time() + 2 * 30 * 24 * 3600)
@ -298,10 +303,11 @@ class YoutubeEntryListBaseInfoExtractor(YoutubeBaseInfoExtractor):
# Downloading page may result in intermittent 5xx HTTP error # Downloading page may result in intermittent 5xx HTTP error
# that is usually worked around with a retry # that is usually worked around with a retry
more = self._download_json( more = self._download_json(
'https://youtube.com/%s' % mobj.group('more'), playlist_id, 'https://www.youtube.com/%s' % mobj.group('more'), playlist_id,
'Downloading page #%s%s' 'Downloading page #%s%s'
% (page_num, ' (retry #%d)' % count if count else ''), % (page_num, ' (retry #%d)' % count if count else ''),
transform_source=uppercase_escape) transform_source=uppercase_escape,
headers=self._YOUTUBE_CLIENT_HEADERS)
break break
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (500, 503): if isinstance(e.cause, compat_HTTPError) and e.cause.code in (500, 503):
@ -388,8 +394,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
(?:www\.)?invidious\.drycat\.fr/| (?:www\.)?invidious\.drycat\.fr/|
(?:www\.)?tube\.poal\.co/| (?:www\.)?tube\.poal\.co/|
(?:www\.)?vid\.wxzm\.sx/| (?:www\.)?vid\.wxzm\.sx/|
(?:www\.)?yewtu\.be/|
(?:www\.)?yt\.elukerio\.org/| (?:www\.)?yt\.elukerio\.org/|
(?:www\.)?yt\.lelux\.fi/| (?:www\.)?yt\.lelux\.fi/|
(?:www\.)?invidious\.ggc-project\.de/|
(?:www\.)?yt\.maisputain\.ovh/|
(?:www\.)?invidious\.13ad\.de/|
(?:www\.)?invidious\.toot\.koeln/|
(?:www\.)?invidious\.fdn\.fr/|
(?:www\.)?watch\.nettohikari\.com/|
(?:www\.)?kgg2m7yk5aybusll\.onion/| (?:www\.)?kgg2m7yk5aybusll\.onion/|
(?:www\.)?qklhadlycap4cnod\.onion/| (?:www\.)?qklhadlycap4cnod\.onion/|
(?:www\.)?axqzx4s6s54s32yentfqojs3x5i7faxza6xo3ehd4bzzsg2ii4fv2iid\.onion/| (?:www\.)?axqzx4s6s54s32yentfqojs3x5i7faxza6xo3ehd4bzzsg2ii4fv2iid\.onion/|
@ -397,6 +410,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
(?:www\.)?fz253lmuao3strwbfbmx46yu7acac2jz27iwtorgmbqlkurlclmancad\.onion/| (?:www\.)?fz253lmuao3strwbfbmx46yu7acac2jz27iwtorgmbqlkurlclmancad\.onion/|
(?:www\.)?invidious\.l4qlywnpwqsluw65ts7md3khrivpirse744un3x7mlskqauz5pyuzgqd\.onion/| (?:www\.)?invidious\.l4qlywnpwqsluw65ts7md3khrivpirse744un3x7mlskqauz5pyuzgqd\.onion/|
(?:www\.)?owxfohz4kjyv25fvlqilyxast7inivgiktls3th44jhk3ej3i7ya\.b32\.i2p/| (?:www\.)?owxfohz4kjyv25fvlqilyxast7inivgiktls3th44jhk3ej3i7ya\.b32\.i2p/|
(?:www\.)?4l2dgddgsrkf2ous66i6seeyi6etzfgrue332grh2n7madpwopotugyd\.onion/|
youtube\.googleapis\.com/) # the various hostnames, with wildcard subdomains youtube\.googleapis\.com/) # the various hostnames, with wildcard subdomains
(?:.*?\#/)? # handle anchor (#/) redirect urls (?:.*?\#/)? # handle anchor (#/) redirect urls
(?: # the various things that can precede the ID: (?: # the various things that can precede the ID:
@ -426,6 +440,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
(?(1).+)? # if we found the ID, everything can follow (?(1).+)? # if we found the ID, everything can follow
$""" % {'playlist_id': YoutubeBaseInfoExtractor._PLAYLIST_ID_RE} $""" % {'playlist_id': YoutubeBaseInfoExtractor._PLAYLIST_ID_RE}
_NEXT_URL_RE = r'[\?&]next_url=([^&]+)' _NEXT_URL_RE = r'[\?&]next_url=([^&]+)'
_PLAYER_INFO_RE = (
r'/(?P<id>[a-zA-Z0-9_-]{8,})/player_ias\.vflset(?:/[a-zA-Z]{2,3}_[a-zA-Z]{2,3})?/base\.(?P<ext>[a-z]+)$',
r'\b(?P<id>vfl[a-zA-Z0-9_-]+)\b.*?\.(?P<ext>[a-z]+)$',
)
_formats = { _formats = {
'5': {'ext': 'flv', 'width': 400, 'height': 240, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'}, '5': {'ext': 'flv', 'width': 400, 'height': 240, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'},
'6': {'ext': 'flv', 'width': 450, 'height': 270, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'}, '6': {'ext': 'flv', 'width': 450, 'height': 270, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'},
@ -1227,6 +1245,42 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'url': 'https://www.youtubekids.com/watch?v=3b8nCWDgZ6Q', 'url': 'https://www.youtubekids.com/watch?v=3b8nCWDgZ6Q',
'only_matching': True, 'only_matching': True,
}, },
{
# invalid -> valid video id redirection
'url': 'DJztXj2GPfl',
'info_dict': {
'id': 'DJztXj2GPfk',
'ext': 'mp4',
'title': 'Panjabi MC - Mundian To Bach Ke (The Dictator Soundtrack)',
'description': 'md5:bf577a41da97918e94fa9798d9228825',
'upload_date': '20090125',
'uploader': 'Prochorowka',
'uploader_id': 'Prochorowka',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/Prochorowka',
'artist': 'Panjabi MC',
'track': 'Beware of the Boys (Mundian to Bach Ke) - Motivo Hi-Lectro Remix',
'album': 'Beware of the Boys (Mundian To Bach Ke)',
},
'params': {
'skip_download': True,
},
},
{
# empty description results in an empty string
'url': 'https://www.youtube.com/watch?v=x41yOUIvK2k',
'info_dict': {
'id': 'x41yOUIvK2k',
'ext': 'mp4',
'title': 'IMG 3456',
'description': '',
'upload_date': '20170613',
'uploader_id': 'ElevageOrVert',
'uploader': 'ElevageOrVert',
},
'params': {
'skip_download': True,
},
},
] ]
def __init__(self, *args, **kwargs): def __init__(self, *args, **kwargs):
@ -1253,14 +1307,18 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
""" Return a string representation of a signature """ """ Return a string representation of a signature """
return '.'.join(compat_str(len(part)) for part in example_sig.split('.')) return '.'.join(compat_str(len(part)) for part in example_sig.split('.'))
def _extract_signature_function(self, video_id, player_url, example_sig): @classmethod
id_m = re.match( def _extract_player_info(cls, player_url):
r'.*?[-.](?P<id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player(?:-new)?|(?:/[a-z]{2,3}_[A-Z]{2})?/base)?\.(?P<ext>[a-z]+)$', for player_re in cls._PLAYER_INFO_RE:
player_url) id_m = re.search(player_re, player_url)
if not id_m: if id_m:
break
else:
raise ExtractorError('Cannot identify player %r' % player_url) raise ExtractorError('Cannot identify player %r' % player_url)
player_type = id_m.group('ext') return id_m.group('ext'), id_m.group('id')
player_id = id_m.group('id')
def _extract_signature_function(self, video_id, player_url, example_sig):
player_type, player_id = self._extract_player_info(player_url)
# Read from filesystem cache # Read from filesystem cache
func_id = '%s_%s_%s' % ( func_id = '%s_%s_%s' % (
@ -1342,7 +1400,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
funcname = self._search_regex( funcname = self._search_regex(
(r'\b[cs]\s*&&\s*[adf]\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(', (r'\b[cs]\s*&&\s*[adf]\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\b[a-zA-Z0-9]+\s*&&\s*[a-zA-Z0-9]+\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(', r'\b[a-zA-Z0-9]+\s*&&\s*[a-zA-Z0-9]+\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\b(?P<sig>[a-zA-Z0-9$]{2})\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)', r'(?:\b|[^a-zA-Z0-9$])(?P<sig>[a-zA-Z0-9$]{2})\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)',
r'(?P<sig>[a-zA-Z0-9$]+)\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)', r'(?P<sig>[a-zA-Z0-9$]+)\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)',
# Obsolete patterns # Obsolete patterns
r'(["\'])signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(', r'(["\'])signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
@ -1616,8 +1674,63 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
video_id = mobj.group(2) video_id = mobj.group(2)
return video_id return video_id
def _extract_chapters_from_json(self, webpage, video_id, duration):
if not webpage:
return
player = self._parse_json(
self._search_regex(
r'RELATED_PLAYER_ARGS["\']\s*:\s*({.+})\s*,?\s*\n', webpage,
'player args', default='{}'),
video_id, fatal=False)
if not player or not isinstance(player, dict):
return
watch_next_response = player.get('watch_next_response')
if not isinstance(watch_next_response, compat_str):
return
response = self._parse_json(watch_next_response, video_id, fatal=False)
if not response or not isinstance(response, dict):
return
chapters_list = try_get(
response,
lambda x: x['playerOverlays']
['playerOverlayRenderer']
['decoratedPlayerBarRenderer']
['decoratedPlayerBarRenderer']
['playerBar']
['chapteredPlayerBarRenderer']
['chapters'],
list)
if not chapters_list:
return
def chapter_time(chapter):
return float_or_none(
try_get(
chapter,
lambda x: x['chapterRenderer']['timeRangeStartMillis'],
int),
scale=1000)
chapters = []
for next_num, chapter in enumerate(chapters_list, start=1):
start_time = chapter_time(chapter)
if start_time is None:
continue
end_time = (chapter_time(chapters_list[next_num])
if next_num < len(chapters_list) else duration)
if end_time is None:
continue
title = try_get(
chapter, lambda x: x['chapterRenderer']['title']['simpleText'],
compat_str)
chapters.append({
'start_time': start_time,
'end_time': end_time,
'title': title,
})
return chapters
@staticmethod @staticmethod
def _extract_chapters(description, duration): def _extract_chapters_from_description(description, duration):
if not description: if not description:
return None return None
chapter_lines = re.findall( chapter_lines = re.findall(
@ -1651,6 +1764,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
}) })
return chapters return chapters
def _extract_chapters(self, webpage, description, video_id, duration):
return (self._extract_chapters_from_json(webpage, video_id, duration)
or self._extract_chapters_from_description(description, duration))
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {}) url, smuggled_data = unsmuggle_url(url, {})
@ -1678,7 +1795,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# Get video webpage # Get video webpage
url = proto + '://www.youtube.com/watch?v=%s&gl=US&hl=en&has_verified=1&bpctr=9999999999' % video_id url = proto + '://www.youtube.com/watch?v=%s&gl=US&hl=en&has_verified=1&bpctr=9999999999' % video_id
video_webpage = self._download_webpage(url, video_id) video_webpage, urlh = self._download_webpage_handle(url, video_id)
qs = compat_parse_qs(compat_urllib_parse_urlparse(urlh.geturl()).query)
video_id = qs.get('v', [None])[0] or video_id
# Attempt to extract SWF player URL # Attempt to extract SWF player URL
mobj = re.search(r'swfConfig.*?"(https?:\\/\\/.*?watch.*?-.*?\.swf)"', video_webpage) mobj = re.search(r'swfConfig.*?"(https?:\\/\\/.*?watch.*?-.*?\.swf)"', video_webpage)
@ -1721,7 +1841,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# Get video info # Get video info
video_info = {} video_info = {}
embed_webpage = None embed_webpage = None
if re.search(r'player-age-gate-content">', video_webpage) is not None: if (self._og_search_property('restrictions:age', video_webpage, default=None) == '18+'
or re.search(r'player-age-gate-content">', video_webpage) is not None):
age_gate = True age_gate = True
# We simulate the access to the video from www.youtube.com/v/{video_id} # We simulate the access to the video from www.youtube.com/v/{video_id}
# this can be viewed without login into Youtube # this can be viewed without login into Youtube
@ -1794,6 +1915,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
video_details = try_get( video_details = try_get(
player_response, lambda x: x['videoDetails'], dict) or {} player_response, lambda x: x['videoDetails'], dict) or {}
microformat = try_get(
player_response, lambda x: x['microformat']['playerMicroformatRenderer'], dict) or {}
video_title = video_info.get('title', [None])[0] or video_details.get('title') video_title = video_info.get('title', [None])[0] or video_details.get('title')
if not video_title: if not video_title:
self._downloader.report_warning('Unable to extract video title') self._downloader.report_warning('Unable to extract video title')
@ -1823,7 +1947,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
''', replace_url, video_description) ''', replace_url, video_description)
video_description = clean_html(video_description) video_description = clean_html(video_description)
else: else:
video_description = self._html_search_meta('description', video_webpage) or video_details.get('shortDescription') video_description = video_details.get('shortDescription')
if video_description is None:
video_description = self._html_search_meta('description', video_webpage)
if not smuggled_data.get('force_singlefeed', False): if not smuggled_data.get('force_singlefeed', False):
if not self._downloader.params.get('noplaylist'): if not self._downloader.params.get('noplaylist'):
@ -1840,15 +1966,26 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# fields may contain comma as well (see # fields may contain comma as well (see
# https://github.com/ytdl-org/youtube-dl/issues/8536) # https://github.com/ytdl-org/youtube-dl/issues/8536)
feed_data = compat_parse_qs(compat_urllib_parse_unquote_plus(feed)) feed_data = compat_parse_qs(compat_urllib_parse_unquote_plus(feed))
def feed_entry(name):
return try_get(feed_data, lambda x: x[name][0], compat_str)
feed_id = feed_entry('id')
if not feed_id:
continue
feed_title = feed_entry('title')
title = video_title
if feed_title:
title += ' (%s)' % feed_title
entries.append({ entries.append({
'_type': 'url_transparent', '_type': 'url_transparent',
'ie_key': 'Youtube', 'ie_key': 'Youtube',
'url': smuggle_url( 'url': smuggle_url(
'%s://www.youtube.com/watch?v=%s' % (proto, feed_data['id'][0]), '%s://www.youtube.com/watch?v=%s' % (proto, feed_data['id'][0]),
{'force_singlefeed': True}), {'force_singlefeed': True}),
'title': '%s (%s)' % (video_title, feed_data['title'][0]), 'title': title,
}) })
feed_ids.append(feed_data['id'][0]) feed_ids.append(feed_id)
self.to_screen( self.to_screen(
'Downloading multifeed video (%s) - add --no-playlist to just download video %s' 'Downloading multifeed video (%s) - add --no-playlist to just download video %s'
% (', '.join(feed_ids), video_id)) % (', '.join(feed_ids), video_id))
@ -1860,6 +1997,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
view_count = extract_view_count(video_info) view_count = extract_view_count(video_info)
if view_count is None and video_details: if view_count is None and video_details:
view_count = int_or_none(video_details.get('viewCount')) view_count = int_or_none(video_details.get('viewCount'))
if view_count is None and microformat:
view_count = int_or_none(microformat.get('viewCount'))
if is_live is None: if is_live is None:
is_live = bool_or_none(video_details.get('isLive')) is_live = bool_or_none(video_details.get('isLive'))
@ -1919,12 +2058,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
} }
for fmt in streaming_formats: for fmt in streaming_formats:
if fmt.get('drm_families'): if fmt.get('drmFamilies') or fmt.get('drm_families'):
continue continue
url = url_or_none(fmt.get('url')) url = url_or_none(fmt.get('url'))
if not url: if not url:
cipher = fmt.get('cipher') cipher = fmt.get('cipher') or fmt.get('signatureCipher')
if not cipher: if not cipher:
continue continue
url_data = compat_parse_qs(cipher) url_data = compat_parse_qs(cipher)
@ -1975,22 +2114,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if self._downloader.params.get('verbose'): if self._downloader.params.get('verbose'):
if player_url is None: if player_url is None:
player_version = 'unknown'
player_desc = 'unknown' player_desc = 'unknown'
else: else:
if player_url.endswith('swf'): player_type, player_version = self._extract_player_info(player_url)
player_version = self._search_regex( player_desc = '%s player %s' % ('flash' if player_type == 'swf' else 'html5', player_version)
r'-(.+?)(?:/watch_as3)?\.swf$', player_url,
'flash player', fatal=False)
player_desc = 'flash player %s' % player_version
else:
player_version = self._search_regex(
[r'html5player-([^/]+?)(?:/html5player(?:-new)?)?\.js',
r'(?:www|player(?:_ias)?)[-.]([^/]+)(?:/[a-z]{2,3}_[A-Z]{2})?/base\.js'],
player_url,
'html5 player', fatal=False)
player_desc = 'html5 player %s' % player_version
parts_sizes = self._signature_cache_id(encrypted_sig) parts_sizes = self._signature_cache_id(encrypted_sig)
self.to_screen('{%s} signature length %s, %s' % self.to_screen('{%s} signature length %s, %s' %
(format_id, parts_sizes, player_desc)) (format_id, parts_sizes, player_desc))
@ -2123,7 +2250,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
video_uploader_id = mobj.group('uploader_id') video_uploader_id = mobj.group('uploader_id')
video_uploader_url = mobj.group('uploader_url') video_uploader_url = mobj.group('uploader_url')
else: else:
self._downloader.report_warning('unable to extract uploader nickname') owner_profile_url = url_or_none(microformat.get('ownerProfileUrl'))
if owner_profile_url:
video_uploader_id = self._search_regex(
r'(?:user|channel)/([^/]+)', owner_profile_url, 'uploader id',
default=None)
video_uploader_url = owner_profile_url
channel_id = ( channel_id = (
str_or_none(video_details.get('channelId')) str_or_none(video_details.get('channelId'))
@ -2134,17 +2266,33 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
video_webpage, 'channel id', default=None, group='id')) video_webpage, 'channel id', default=None, group='id'))
channel_url = 'http://www.youtube.com/channel/%s' % channel_id if channel_id else None channel_url = 'http://www.youtube.com/channel/%s' % channel_id if channel_id else None
# thumbnail image thumbnails = []
# We try first to get a high quality image: thumbnails_list = try_get(
m_thumb = re.search(r'<span itemprop="thumbnail".*?href="(.*?)">', video_details, lambda x: x['thumbnail']['thumbnails'], list) or []
video_webpage, re.DOTALL) for t in thumbnails_list:
if m_thumb is not None: if not isinstance(t, dict):
video_thumbnail = m_thumb.group(1) continue
elif 'thumbnail_url' not in video_info: thumbnail_url = url_or_none(t.get('url'))
self._downloader.report_warning('unable to extract video thumbnail') if not thumbnail_url:
continue
thumbnails.append({
'url': thumbnail_url,
'width': int_or_none(t.get('width')),
'height': int_or_none(t.get('height')),
})
if not thumbnails:
video_thumbnail = None video_thumbnail = None
else: # don't panic if we can't find it # We try first to get a high quality image:
video_thumbnail = compat_urllib_parse_unquote_plus(video_info['thumbnail_url'][0]) m_thumb = re.search(r'<span itemprop="thumbnail".*?href="(.*?)">',
video_webpage, re.DOTALL)
if m_thumb is not None:
video_thumbnail = m_thumb.group(1)
thumbnail_url = try_get(video_info, lambda x: x['thumbnail_url'][0], compat_str)
if thumbnail_url:
video_thumbnail = compat_urllib_parse_unquote_plus(thumbnail_url)
if video_thumbnail:
thumbnails.append({'url': video_thumbnail})
# upload date # upload date
upload_date = self._html_search_meta( upload_date = self._html_search_meta(
@ -2154,6 +2302,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
[r'(?s)id="eow-date.*?>(.*?)</span>', [r'(?s)id="eow-date.*?>(.*?)</span>',
r'(?:id="watch-uploader-info".*?>.*?|["\']simpleText["\']\s*:\s*["\'])(?:Published|Uploaded|Streamed live|Started) on (.+?)[<"\']'], r'(?:id="watch-uploader-info".*?>.*?|["\']simpleText["\']\s*:\s*["\'])(?:Published|Uploaded|Streamed live|Started) on (.+?)[<"\']'],
video_webpage, 'upload date', default=None) video_webpage, 'upload date', default=None)
if not upload_date:
upload_date = microformat.get('publishDate') or microformat.get('uploadDate')
upload_date = unified_strdate(upload_date) upload_date = unified_strdate(upload_date)
video_license = self._html_search_regex( video_license = self._html_search_regex(
@ -2225,17 +2375,21 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
m_cat_container = self._search_regex( m_cat_container = self._search_regex(
r'(?s)<h4[^>]*>\s*Category\s*</h4>\s*<ul[^>]*>(.*?)</ul>', r'(?s)<h4[^>]*>\s*Category\s*</h4>\s*<ul[^>]*>(.*?)</ul>',
video_webpage, 'categories', default=None) video_webpage, 'categories', default=None)
category = None
if m_cat_container: if m_cat_container:
category = self._html_search_regex( category = self._html_search_regex(
r'(?s)<a[^<]+>(.*?)</a>', m_cat_container, 'category', r'(?s)<a[^<]+>(.*?)</a>', m_cat_container, 'category',
default=None) default=None)
video_categories = None if category is None else [category] if not category:
else: category = try_get(
video_categories = None microformat, lambda x: x['category'], compat_str)
video_categories = None if category is None else [category]
video_tags = [ video_tags = [
unescapeHTML(m.group('content')) unescapeHTML(m.group('content'))
for m in re.finditer(self._meta_regex('og:video:tag'), video_webpage)] for m in re.finditer(self._meta_regex('og:video:tag'), video_webpage)]
if not video_tags:
video_tags = try_get(video_details, lambda x: x['keywords'], list)
def _extract_count(count_name): def _extract_count(count_name):
return str_to_int(self._search_regex( return str_to_int(self._search_regex(
@ -2286,7 +2440,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
errnote='Unable to download video annotations', fatal=False, errnote='Unable to download video annotations', fatal=False,
data=urlencode_postdata({xsrf_field_name: xsrf_token})) data=urlencode_postdata({xsrf_field_name: xsrf_token}))
chapters = self._extract_chapters(description_original, video_duration) chapters = self._extract_chapters(video_webpage, description_original, video_id, video_duration)
# Look for the DASH manifest # Look for the DASH manifest
if self._downloader.params.get('youtube_include_dash_manifest', True): if self._downloader.params.get('youtube_include_dash_manifest', True):
@ -2377,7 +2531,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'creator': video_creator or artist, 'creator': video_creator or artist,
'title': video_title, 'title': video_title,
'alt_title': video_alt_title or track, 'alt_title': video_alt_title or track,
'thumbnail': video_thumbnail, 'thumbnails': thumbnails,
'description': video_description, 'description': video_description,
'categories': video_categories, 'categories': video_categories,
'tags': video_tags, 'tags': video_tags,
@ -2641,7 +2795,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
ids = [] ids = []
last_id = playlist_id[-11:] last_id = playlist_id[-11:]
for n in itertools.count(1): for n in itertools.count(1):
url = 'https://youtube.com/watch?v=%s&list=%s' % (last_id, playlist_id) url = 'https://www.youtube.com/watch?v=%s&list=%s' % (last_id, playlist_id)
webpage = self._download_webpage( webpage = self._download_webpage(
url, playlist_id, 'Downloading page {0} of Youtube mix'.format(n)) url, playlist_id, 'Downloading page {0} of Youtube mix'.format(n))
new_ids = orderedSet(re.findall( new_ids = orderedSet(re.findall(
@ -2873,7 +3027,7 @@ class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
class YoutubeUserIE(YoutubeChannelIE): class YoutubeUserIE(YoutubeChannelIE):
IE_DESC = 'YouTube.com user videos (URL or "ytuser" keyword)' IE_DESC = 'YouTube.com user videos (URL or "ytuser" keyword)'
_VALID_URL = r'(?:(?:https?://(?:\w+\.)?youtube\.com/(?:(?P<user>user|c)/)?(?!(?:attribution_link|watch|results|shared)(?:$|[^a-z_A-Z0-9-])))|ytuser:)(?!feed/)(?P<id>[A-Za-z0-9_-]+)' _VALID_URL = r'(?:(?:https?://(?:\w+\.)?youtube\.com/(?:(?P<user>user|c)/)?(?!(?:attribution_link|watch|results|shared)(?:$|[^a-z_A-Z0-9%-])))|ytuser:)(?!feed/)(?P<id>[A-Za-z0-9_%-]+)'
_TEMPLATE_URL = 'https://www.youtube.com/%s/%s/videos' _TEMPLATE_URL = 'https://www.youtube.com/%s/%s/videos'
IE_NAME = 'youtube:user' IE_NAME = 'youtube:user'
@ -2903,6 +3057,9 @@ class YoutubeUserIE(YoutubeChannelIE):
}, { }, {
'url': 'https://www.youtube.com/c/gametrailers', 'url': 'https://www.youtube.com/c/gametrailers',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.youtube.com/c/Pawe%C5%82Zadro%C5%BCniak',
'only_matching': True,
}, { }, {
'url': 'https://www.youtube.com/gametrailers', 'url': 'https://www.youtube.com/gametrailers',
'only_matching': True, 'only_matching': True,
@ -2981,7 +3138,7 @@ class YoutubeLiveIE(YoutubeBaseInfoExtractor):
class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor): class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
IE_DESC = 'YouTube.com user/channel playlists' IE_DESC = 'YouTube.com user/channel playlists'
_VALID_URL = r'https?://(?:\w+\.)?youtube\.com/(?:user|channel)/(?P<id>[^/]+)/playlists' _VALID_URL = r'https?://(?:\w+\.)?youtube\.com/(?:user|channel|c)/(?P<id>[^/]+)/playlists'
IE_NAME = 'youtube:playlists' IE_NAME = 'youtube:playlists'
_TESTS = [{ _TESTS = [{
@ -3007,6 +3164,9 @@ class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
'title': 'Chem Player', 'title': 'Chem Player',
}, },
'skip': 'Blocked', 'skip': 'Blocked',
}, {
'url': 'https://www.youtube.com/c/ChristophLaimer/playlists',
'only_matching': True,
}] }]
@ -3151,9 +3311,10 @@ class YoutubeFeedsInfoExtractor(YoutubeBaseInfoExtractor):
break break
more = self._download_json( more = self._download_json(
'https://youtube.com/%s' % mobj.group('more'), self._PLAYLIST_TITLE, 'https://www.youtube.com/%s' % mobj.group('more'), self._PLAYLIST_TITLE,
'Downloading page #%s' % page_num, 'Downloading page #%s' % page_num,
transform_source=uppercase_escape) transform_source=uppercase_escape,
headers=self._YOUTUBE_CLIENT_HEADERS)
content_html = more['content_html'] content_html = more['content_html']
more_widget_html = more['load_more_widget_html'] more_widget_html = more['load_more_widget_html']

View File

@ -853,7 +853,7 @@ def parseOpts(overrideArguments=None):
postproc.add_option( postproc.add_option(
'--exec', '--exec',
metavar='CMD', dest='exec_cmd', metavar='CMD', dest='exec_cmd',
help='Execute a command on the file after downloading, similar to find\'s -exec syntax. Example: --exec \'adb push {} /sdcard/Music/ && rm {}\'') help='Execute a command on the file after downloading and post-processing, similar to find\'s -exec syntax. Example: --exec \'adb push {} /sdcard/Music/ && rm {}\'')
postproc.add_option( postproc.add_option(
'--convert-subs', '--convert-subtitles', '--convert-subs', '--convert-subtitles',
metavar='FORMAT', dest='convertsubtitles', default=None, metavar='FORMAT', dest='convertsubtitles', default=None,

View File

@ -13,6 +13,7 @@ from ..utils import (
encodeFilename, encodeFilename,
PostProcessingError, PostProcessingError,
prepend_extension, prepend_extension,
replace_extension,
shell_quote shell_quote
) )
@ -41,6 +42,38 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
'Skipping embedding the thumbnail because the file is missing.') 'Skipping embedding the thumbnail because the file is missing.')
return [], info return [], info
def is_webp(path):
with open(encodeFilename(path), 'rb') as f:
b = f.read(12)
return b[0:4] == b'RIFF' and b[8:] == b'WEBP'
# Correct extension for WebP file with wrong extension (see #25687, #25717)
_, thumbnail_ext = os.path.splitext(thumbnail_filename)
if thumbnail_ext:
thumbnail_ext = thumbnail_ext[1:].lower()
if thumbnail_ext != 'webp' and is_webp(thumbnail_filename):
self._downloader.to_screen(
'[ffmpeg] Correcting extension to webp and escaping path for thumbnail "%s"' % thumbnail_filename)
thumbnail_webp_filename = replace_extension(thumbnail_filename, 'webp')
os.rename(encodeFilename(thumbnail_filename), encodeFilename(thumbnail_webp_filename))
thumbnail_filename = thumbnail_webp_filename
thumbnail_ext = 'webp'
# Convert unsupported thumbnail formats to JPEG (see #25687, #25717)
if thumbnail_ext not in ['jpg', 'png']:
# NB: % is supposed to be escaped with %% but this does not work
# for input files so working around with standard substitution
escaped_thumbnail_filename = thumbnail_filename.replace('%', '#')
os.rename(encodeFilename(thumbnail_filename), encodeFilename(escaped_thumbnail_filename))
escaped_thumbnail_jpg_filename = replace_extension(escaped_thumbnail_filename, 'jpg')
self._downloader.to_screen('[ffmpeg] Converting thumbnail "%s" to JPEG' % escaped_thumbnail_filename)
self.run_ffmpeg(escaped_thumbnail_filename, escaped_thumbnail_jpg_filename, ['-bsf:v', 'mjpeg2jpeg'])
os.remove(encodeFilename(escaped_thumbnail_filename))
thumbnail_jpg_filename = replace_extension(thumbnail_filename, 'jpg')
# Rename back to unescaped for further processing
os.rename(encodeFilename(escaped_thumbnail_jpg_filename), encodeFilename(thumbnail_jpg_filename))
thumbnail_filename = thumbnail_jpg_filename
if info['ext'] == 'mp3': if info['ext'] == 'mp3':
options = [ options = [
'-c', 'copy', '-map', '0', '-map', '1', '-c', 'copy', '-map', '0', '-map', '1',

View File

@ -447,6 +447,13 @@ class FFmpegMetadataPP(FFmpegPostProcessor):
metadata[meta_f] = info[info_f] metadata[meta_f] = info[info_f]
break break
# See [1-4] for some info on media metadata/metadata supported
# by ffmpeg.
# 1. https://kdenlive.org/en/project/adding-meta-data-to-mp4-video/
# 2. https://wiki.multimedia.cx/index.php/FFmpeg_Metadata
# 3. https://kodi.wiki/view/Video_file_tagging
# 4. http://atomicparsley.sourceforge.net/mpeg-4files.html
add('title', ('track', 'title')) add('title', ('track', 'title'))
add('date', 'upload_date') add('date', 'upload_date')
add(('description', 'comment'), 'description') add(('description', 'comment'), 'description')
@ -457,6 +464,10 @@ class FFmpegMetadataPP(FFmpegPostProcessor):
add('album') add('album')
add('album_artist') add('album_artist')
add('disc', 'disc_number') add('disc', 'disc_number')
add('show', 'series')
add('season_number')
add('episode_id', ('episode', 'episode_id'))
add('episode_sort', 'episode_number')
if not metadata: if not metadata:
self._downloader.to_screen('[ffmpeg] There isn\'t any metadata to add') self._downloader.to_screen('[ffmpeg] There isn\'t any metadata to add')

View File

@ -7,6 +7,7 @@ import base64
import binascii import binascii
import calendar import calendar
import codecs import codecs
import collections
import contextlib import contextlib
import ctypes import ctypes
import datetime import datetime
@ -30,6 +31,7 @@ import ssl
import subprocess import subprocess
import sys import sys
import tempfile import tempfile
import time
import traceback import traceback
import xml.etree.ElementTree import xml.etree.ElementTree
import zlib import zlib
@ -1835,6 +1837,12 @@ def write_json_file(obj, fn):
os.unlink(fn) os.unlink(fn)
except OSError: except OSError:
pass pass
try:
mask = os.umask(0)
os.umask(mask)
os.chmod(tf.name, 0o666 & ~mask)
except OSError:
pass
os.rename(tf.name, fn) os.rename(tf.name, fn)
except Exception: except Exception:
try: try:
@ -2735,14 +2743,66 @@ class YoutubeDLCookieJar(compat_cookiejar.MozillaCookieJar):
1. https://curl.haxx.se/docs/http-cookies.html 1. https://curl.haxx.se/docs/http-cookies.html
""" """
_HTTPONLY_PREFIX = '#HttpOnly_' _HTTPONLY_PREFIX = '#HttpOnly_'
_ENTRY_LEN = 7
_HEADER = '''# Netscape HTTP Cookie File
# This file is generated by youtube-dl. Do not edit.
'''
_CookieFileEntry = collections.namedtuple(
'CookieFileEntry',
('domain_name', 'include_subdomains', 'path', 'https_only', 'expires_at', 'name', 'value'))
def save(self, filename=None, ignore_discard=False, ignore_expires=False): def save(self, filename=None, ignore_discard=False, ignore_expires=False):
"""
Save cookies to a file.
Most of the code is taken from CPython 3.8 and slightly adapted
to support cookie files with UTF-8 in both python 2 and 3.
"""
if filename is None:
if self.filename is not None:
filename = self.filename
else:
raise ValueError(compat_cookiejar.MISSING_FILENAME_TEXT)
# Store session cookies with `expires` set to 0 instead of an empty # Store session cookies with `expires` set to 0 instead of an empty
# string # string
for cookie in self: for cookie in self:
if cookie.expires is None: if cookie.expires is None:
cookie.expires = 0 cookie.expires = 0
compat_cookiejar.MozillaCookieJar.save(self, filename, ignore_discard, ignore_expires)
with io.open(filename, 'w', encoding='utf-8') as f:
f.write(self._HEADER)
now = time.time()
for cookie in self:
if not ignore_discard and cookie.discard:
continue
if not ignore_expires and cookie.is_expired(now):
continue
if cookie.secure:
secure = 'TRUE'
else:
secure = 'FALSE'
if cookie.domain.startswith('.'):
initial_dot = 'TRUE'
else:
initial_dot = 'FALSE'
if cookie.expires is not None:
expires = compat_str(cookie.expires)
else:
expires = ''
if cookie.value is None:
# cookies.txt regards 'Set-Cookie: foo' as a cookie
# with no name, whereas http.cookiejar regards it as a
# cookie with no value.
name = ''
value = cookie.name
else:
name = cookie.name
value = cookie.value
f.write(
'\t'.join([cookie.domain, initial_dot, cookie.path,
secure, expires, name, value]) + '\n')
def load(self, filename=None, ignore_discard=False, ignore_expires=False): def load(self, filename=None, ignore_discard=False, ignore_expires=False):
"""Load cookies from a file.""" """Load cookies from a file."""
@ -2752,12 +2812,30 @@ class YoutubeDLCookieJar(compat_cookiejar.MozillaCookieJar):
else: else:
raise ValueError(compat_cookiejar.MISSING_FILENAME_TEXT) raise ValueError(compat_cookiejar.MISSING_FILENAME_TEXT)
def prepare_line(line):
if line.startswith(self._HTTPONLY_PREFIX):
line = line[len(self._HTTPONLY_PREFIX):]
# comments and empty lines are fine
if line.startswith('#') or not line.strip():
return line
cookie_list = line.split('\t')
if len(cookie_list) != self._ENTRY_LEN:
raise compat_cookiejar.LoadError('invalid length %d' % len(cookie_list))
cookie = self._CookieFileEntry(*cookie_list)
if cookie.expires_at and not cookie.expires_at.isdigit():
raise compat_cookiejar.LoadError('invalid expires at %s' % cookie.expires_at)
return line
cf = io.StringIO() cf = io.StringIO()
with open(filename) as f: with io.open(filename, encoding='utf-8') as f:
for line in f: for line in f:
if line.startswith(self._HTTPONLY_PREFIX): try:
line = line[len(self._HTTPONLY_PREFIX):] cf.write(prepare_line(line))
cf.write(compat_str(line)) except compat_cookiejar.LoadError as e:
write_string(
'WARNING: skipping cookie file entry due to %s: %r\n'
% (e, line), sys.stderr)
continue
cf.seek(0) cf.seek(0)
self._really_load(cf, filename, ignore_discard, ignore_expires) self._really_load(cf, filename, ignore_discard, ignore_expires)
# Session cookies are denoted by either `expires` field set to # Session cookies are denoted by either `expires` field set to
@ -4120,6 +4198,7 @@ def mimetype2ext(mt):
'vnd.ms-sstr+xml': 'ism', 'vnd.ms-sstr+xml': 'ism',
'quicktime': 'mov', 'quicktime': 'mov',
'mp2t': 'ts', 'mp2t': 'ts',
'x-wav': 'wav',
}.get(res, res) }.get(res, res)

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2020.03.24' __version__ = '2020.09.20'