
Merge remote-tracking branch 'origin/master' into makotv

relrelb 2020-04-10 16:18:03 +03:00
commit 0b1814211b
137 changed files with 4734 additions and 3844 deletions

View File

@@ -18,7 +18,7 @@ title: ''
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.05. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.03.24. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
 - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -26,7 +26,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 - [ ] I'm reporting a broken site support
-- [ ] I've verified that I'm running youtube-dl version **2019.11.05**
+- [ ] I've verified that I'm running youtube-dl version **2020.03.24**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar issues including closed ones
@@ -41,7 +41,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2019.11.05
+[debug] youtube-dl version 2020.03.24
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

View File

@@ -19,7 +19,7 @@ labels: 'site-support-request'
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.05. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.03.24. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
 - Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 - [ ] I'm reporting a new site support request
-- [ ] I've verified that I'm running youtube-dl version **2019.11.05**
+- [ ] I've verified that I'm running youtube-dl version **2020.03.24**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that none of provided URLs violate any copyrights
 - [ ] I've searched the bugtracker for similar site support requests including closed ones

View File

@@ -18,13 +18,13 @@ title: ''
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.05. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.03.24. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
 - Finally, put x into all relevant boxes (like this [x])
 -->
 - [ ] I'm reporting a site feature request
-- [ ] I've verified that I'm running youtube-dl version **2019.11.05**
+- [ ] I've verified that I'm running youtube-dl version **2020.03.24**
 - [ ] I've searched the bugtracker for similar site feature requests including closed ones

View File

@@ -18,7 +18,7 @@ title: ''
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.05. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.03.24. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
 - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 - [ ] I'm reporting a broken site support issue
-- [ ] I've verified that I'm running youtube-dl version **2019.11.05**
+- [ ] I've verified that I'm running youtube-dl version **2020.03.24**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -43,7 +43,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2019.11.05
+[debug] youtube-dl version 2020.03.24
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

View File

@@ -19,13 +19,13 @@ labels: 'request'
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.05. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.03.24. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
 - Finally, put x into all relevant boxes (like this [x])
 -->
 - [ ] I'm reporting a feature request
-- [ ] I've verified that I'm running youtube-dl version **2019.11.05**
+- [ ] I've verified that I'm running youtube-dl version **2020.03.24**
 - [ ] I've searched the bugtracker for similar feature requests including closed ones

View File

@@ -13,7 +13,7 @@ dist: trusty
 env:
 - YTDL_TEST_SET=core
 - YTDL_TEST_SET=download
-matrix:
+jobs:
 include:
 - python: 3.7
 dist: xenial
@@ -35,6 +35,11 @@ matrix:
 env: YTDL_TEST_SET=download
 - env: JYTHON=true; YTDL_TEST_SET=core
 - env: JYTHON=true; YTDL_TEST_SET=download
+- name: flake8
+python: 3.8
+dist: xenial
+install: pip install flake8
+script: flake8 .
 fast_finish: true
 allow_failures:
 - env: YTDL_TEST_SET=download

ChangeLog
View File

@@ -1,3 +1,313 @@
version 2020.03.24
Core
- [utils] Revert support for cookie files with spaces used instead of tabs
Extractors
* [teachable] Update upskillcourses and gns3 domains
* [generic] Look for teachable embeds before wistia
+ [teachable] Extract chapter metadata (#24421)
+ [bilibili] Add support for player.bilibili.com (#24402)
+ [bilibili] Add support for new URL schema with BV ids (#24439, #24442)
* [limelight] Remove disabled API requests (#24255)
* [soundcloud] Fix download URL extraction (#24394)
+ [cbc:watch] Add support for authentication (#19160)
* [hellporno] Fix extraction (#24399)
* [xtube] Fix formats extraction (#24348)
* [ndr] Fix extraction (#24326)
* [nhk] Update m3u8 URL and use native HLS downloader (#24329)
- [nhk] Remove obsolete rtmp formats (#24329)
* [nhk] Relax URL regular expression (#24329)
- [vimeo] Revert fix showcase password protected video extraction (#24224)
version 2020.03.08
Core
+ [utils] Add support for cookie files with spaces used instead of tabs
Extractors
+ [pornhub] Add support for pornhubpremium.com (#24288)
- [youtube] Remove outdated code and unnecessary requests
* [youtube] Improve extraction in 429 HTTP error conditions (#24283)
* [nhk] Update API version (#24270)
version 2020.03.06
Extractors
* [youtube] Fix age-gated videos support without login (#24248)
* [vimeo] Fix showcase password protected video extraction (#24224)
* [pornhub] Improve title extraction (#24184)
* [peertube] Improve extraction (#23657)
+ [servus] Add support for new URL schema (#23475, #23583, #24142)
* [vimeo] Fix subtitles URLs (#24209)
version 2020.03.01
Core
* [YoutubeDL] Force redirect URL to unicode on python 2
- [options] Remove duplicate short option -v for --version (#24162)
Extractors
* [xhamster] Fix extraction (#24205)
* [franceculture] Fix extraction (#24204)
+ [telecinco] Add support for article opening videos
* [telecinco] Fix extraction (#24195)
* [xtube] Fix metadata extraction (#21073, #22455)
* [youjizz] Fix extraction (#24181)
- Remove no longer needed compat_str around geturl
* [pornhd] Fix extraction (#24128)
+ [teachable] Add support for multiple videos per lecture (#24101)
+ [wistia] Add support for multiple generic embeds (#8347, #11385)
* [imdb] Fix extraction (#23443)
* [tv2dk:bornholm:play] Fix extraction (#24076)
version 2020.02.16
Core
* [YoutubeDL] Fix playlist entry indexing with --playlist-items (#10591,
#10622)
* [update] Fix updating via symlinks (#23991)
+ [compat] Introduce compat_realpath (#23991)
Extractors
+ [npr] Add support for streams (#24042)
+ [24video] Add support for porn.24video.net (#23779, #23784)
- [jpopsuki] Remove extractor (#23858)
* [nova] Improve extraction (#23690)
* [nova:embed] Improve (#23690)
* [nova:embed] Fix extraction (#23672)
+ [abc:iview] Add support for 720p (#22907, #22921)
* [nytimes] Improve format sorting (#24010)
+ [toggle] Add support for mewatch.sg (#23895, #23930)
* [thisoldhouse] Fix extraction (#23951)
+ [popcorntimes] Add support for popcorntimes.tv (#23949)
* [sportdeutschland] Update to new API
* [twitch:stream] Lowercase channel id for stream request (#23917)
* [tv5mondeplus] Fix extraction (#23907, #23911)
* [tva] Relax URL regular expression (#23903)
* [vimeo] Fix album extraction (#23864)
* [viewlift] Improve extraction
* Fix extraction (#23851)
+ Add support for authentication
+ Add support for more domains
* [svt] Fix series extraction (#22297)
* [svt] Fix article extraction (#22897, #22919)
* [soundcloud] Improve private playlist/set tracks extraction (#3707)
version 2020.01.24
Extractors
* [youtube] Fix sigfunc name extraction (#23819)
* [stretchinternet] Fix extraction (#4319)
* [voicerepublic] Fix extraction
* [azmedien] Fix extraction (#23783)
* [businessinsider] Fix jwplatform id extraction (#22929, #22954)
+ [24video] Add support for 24video.vip (#23753)
* [ivi:compilation] Fix entries extraction (#23770)
* [ard] Improve extraction (#23761)
* Simplify extraction
+ Extract age limit and series
* Bypass geo-restriction
+ [nbc] Add support for nbc multi network URLs (#23049)
* [americastestkitchen] Fix extraction
* [zype] Improve extraction
+ Extract subtitles (#21258)
+ Support URLs with alternative keys/tokens (#21258)
+ Extract more metadata
* [orf:tvthek] Improve geo restricted videos detection (#23741)
* [soundcloud] Restore previews extraction (#23739)
version 2020.01.15
Extractors
* [yourporn] Fix extraction (#21645, #22255, #23459)
+ [canvas] Add support for new API endpoint (#17680, #18629)
* [ndr:base:embed] Improve thumbnails extraction (#23731)
+ [vodplatform] Add support for embed.kwikmotion.com domain
+ [twitter] Add support for promo_video_website cards (#23711)
* [orf:radio] Clean description and improve extraction
* [orf:fm4] Fix extraction (#23599)
* [safari] Fix kaltura session extraction (#23679, #23670)
* [lego] Fix extraction and extract subtitle (#23687)
* [cloudflarestream] Improve extraction
+ Add support for bytehighway.net domain
+ Add support for signed URLs
+ Extract thumbnail
* [naver] Improve extraction
* Improve geo-restriction handling
+ Extract automatic captions
+ Extract uploader metadata
+ Extract VLive HLS formats
* Improve metadata extraction
- [pandatv] Remove extractor (#23630)
* [dctp] Fix format extraction (#23656)
+ [scrippsnetworks] Add support for www.discovery.com videos
* [discovery] Fix anonymous token extraction (#23650)
* [nrktv:seriebase] Fix extraction (#23625, #23537)
* [wistia] Improve format extraction and extract subtitles (#22590)
* [vice] Improve extraction (#23631)
* [redtube] Detect private videos (#23518)
version 2020.01.01
Extractors
* [brightcove] Invalidate policy key cache on failing requests
* [pornhub] Improve locked videos detection (#22449, #22780)
+ [pornhub] Add support for m3u8 formats
* [pornhub] Fix extraction (#22749, #23082)
* [brightcove] Update policy key on failing requests
* [spankbang] Improve removed video detection (#23423)
* [spankbang] Fix extraction (#23307, #23423, #23444)
* [soundcloud] Automatically update client id on failing requests
* [prosiebensat1] Improve geo restriction handling (#23571)
* [brightcove] Cache brightcove player policy keys
* [teachable] Fail with error message if no video URL found
* [teachable] Improve locked lessons detection (#23528)
+ [scrippsnetworks] Add support for Scripps Networks sites (#19857, #22981)
* [mitele] Fix extraction (#21354, #23456)
* [soundcloud] Update client id (#23516)
* [mailru] Relax URL regular expressions (#23509)
version 2019.12.25
Core
* [utils] Improve str_to_int
+ [downloader/hls] Add ability to override AES decryption key URL (#17521)
Extractors
* [mediaset] Fix parse formats (#23508)
+ [tv2dk:bornholm:play] Add support for play.tv2bornholm.dk (#23291)
+ [slideslive] Add support for url and vimeo service names (#23414)
* [slideslive] Fix extraction (#23413)
* [twitch:clips] Fix extraction (#23375)
+ [soundcloud] Add support for token protected embeds (#18954)
* [vk] Improve extraction
* Fix User Videos extraction (#23356)
* Extract all videos for lists with more than 1000 videos (#23356)
+ Add support for video albums (#14327, #14492)
- [kontrtube] Remove extractor
- [videopremium] Remove extractor
- [musicplayon] Remove extractor (#9225)
+ [ufctv] Add support for ufcfightpass.imgdge.com and
ufcfightpass.imggaming.com (#23343)
+ [twitch] Extract m3u8 formats frame rate (#23333)
+ [imggaming] Add support for playlists and extract subtitles
+ [ufcarabia] Add support for UFC Arabia (#23312)
* [ufctv] Fix extraction
* [yahoo] Fix gyao brightcove player id (#23303)
* [vzaar] Override AES decryption key URL (#17521)
+ [vzaar] Add support for AES HLS manifests (#17521, #23299)
* [nrl] Fix extraction
* [teachingchannel] Fix extraction
* [nintendo] Fix extraction and partially add support for Nintendo Direct
videos (#4592)
+ [ooyala] Add better fallback values for domain and streams variables
+ [youtube] Add support youtubekids.com (#23272)
* [tv2] Detect DRM protection
+ [tv2] Add support for katsomo.fi and mtv.fi (#10543)
* [tv2] Fix tv2.no article extraction
* [msn] Improve extraction
+ Add support for YouTube and NBCSports embeds
+ Add support for articles with multiple videos
* Improve AOL embed support
* Improve format extraction
* [abcotvs] Relax URL regular expression and improve metadata extraction
(#18014)
* [channel9] Reduce response size
* [adobetv] Improve extraction
* Use OnDemandPagedList for list extractors
* Reduce show extraction requests
* Extract original video format and subtitles
+ Add support for adobe tv embeds
version 2019.11.28
Core
+ [utils] Add generic caesar cipher and rot47
* [utils] Handle rd-suffixed day parts in unified_strdate (#23199)
Extractors
* [vimeo] Improve extraction
* Fix review extraction
* Fix ondemand extraction
* Make password protected player case as an expected error (#22896)
* Simplify channel based extractors code
- [openload] Remove extractor (#11999)
- [verystream] Remove extractor
- [streamango] Remove extractor (#15406)
* [dailymotion] Improve extraction
* Extract http formats included in m3u8 manifest
* Fix user extraction (#3553, #21415)
+ Add support for User Authentication (#11491)
* Fix password protected videos extraction (#23176)
* Respect age limit option and family filter cookie value (#18437)
* Handle video url playlist query param
* Report allowed countries for geo-restricted videos
* [corus] Improve extraction
+ Add support for Series Plus, W Network, YTV, ABC Spark, disneychannel.com
and disneylachaine.ca (#20861)
+ Add support for self hosted videos (#22075)
* Detect DRM protection (#14910, #9164)
* [vivo] Fix extraction (#22328, #22279)
+ [bitchute] Extract upload date (#22990, #23193)
* [soundcloud] Update client id (#23214)
version 2019.11.22
Core
+ [extractor/common] Clean jwplayer description HTML tags
+ [extractor/common] Add data, headers and query to all major extract formats
methods
Extractors
* [chaturbate] Fix extraction (#23010, #23012)
+ [ntvru] Add support for non relative file URLs (#23140)
* [vk] Fix wall audio thumbnails extraction (#23135)
* [ivi] Fix format extraction (#21991)
- [comcarcoff] Remove extractor
+ [drtv] Add support for new URL schema (#23059)
+ [nexx] Add support for Multi Player JS Setup (#23052)
+ [teamcoco] Add support for new videos (#23054)
* [soundcloud] Check if the soundtrack has downloads left (#23045)
* [facebook] Fix posts video data extraction (#22473)
- [addanime] Remove extractor
- [minhateca] Remove extractor
- [daisuki] Remove extractor
* [seeker] Fix extraction
- [revision3] Remove extractors
* [twitch] Fix video comments URL (#18593, #15828)
* [twitter] Improve extraction
+ Add support for generic embeds (#22168)
* Always extract http formats for native videos (#14934)
+ Add support for Twitter Broadcasts (#21369)
+ Extract more metadata
* Improve VMap format extraction
* Unify extraction code for both twitter statuses and cards
+ [twitch] Add support for Clip embed URLs
* [lnkgo] Fix extraction (#16834)
* [mixcloud] Improve extraction
* Improve metadata extraction (#11721)
* Fix playlist extraction (#22378)
* Fix user mixes extraction (#15197, #17865)
+ [kinja] Add support for Kinja embeds (#5756, #11282, #22237, #22384)
* [onionstudios] Fix extraction
+ [hotstar] Pass Referer header to format requests (#22836)
* [dplay] Minimize response size
+ [patreon] Extract uploader_id and filesize
* [patreon] Minimize response size
* [roosterteeth] Fix login request (#16094, #22689)
 version 2019.11.05
 Extractors
@@ -504,7 +814,7 @@ Extractors
 version 2019.04.17
 Extractors
-* [openload] Randomize User-Agent (closes #20688)
+* [openload] Randomize User-Agent (#20688)
 + [openload] Add support for oladblock domains (#20471)
 * [adn] Fix subtitle extraction (#12724)
 + [aol] Add support for localized websites
@@ -1069,7 +1379,7 @@ Extractors
 + [youtube] Extract channel meta fields (#9676, #12939)
 * [porntube] Fix extraction (#17541)
 * [asiancrush] Fix extraction (#15630)
-+ [twitch:clips] Extend URL regular expression (closes #17559)
++ [twitch:clips] Extend URL regular expression (#17559)
 + [vzaar] Add support for HLS
 * [tube8] Fix metadata extraction (#17520)
 * [eporner] Extract JSON-LD (#17519)

View File

@@ -835,7 +835,9 @@ In February 2015, the new YouTube player contained a character sequence in a str
 ### HTTP Error 429: Too Many Requests or 402: Payment Required
-These two error codes indicate that the service is blocking your IP address because of overuse. Contact the service and ask them to unblock your IP address, or - if you have acquired a whitelisted IP address already - use the [`--proxy` or `--source-address` options](#network-options) to select another IP address.
+These two error codes indicate that the service is blocking your IP address because of overuse. Usually this is a soft block meaning that you can gain access again after solving CAPTCHA. Just open a browser and solve a CAPTCHA the service suggests you and after that [pass cookies](#how-do-i-pass-cookies-to-youtube-dl) to youtube-dl. Note that if your machine has multiple external IPs then you should also pass exactly the same IP you've used for solving CAPTCHA with [`--source-address`](#network-options). Also you may need to pass a `User-Agent` HTTP header of your browser with [`--user-agent`](#workarounds).
+
+If this is not the case (no CAPTCHA suggested to solve by the service) then you can contact the service and ask them to unblock your IP address, or - if you have acquired a whitelisted IP address already - use the [`--proxy` or `--source-address` options](#network-options) to select another IP address.
 ### SyntaxError: Non-ASCII character
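
For reference, a minimal sketch of the same CAPTCHA workaround driven through the embedded Python API instead of the command line. The cookie file, IP address and User-Agent string are placeholders, not values from this commit:

    import youtube_dl

    # Reuse the browser session that solved the CAPTCHA: same cookies, same
    # external IP, and a matching User-Agent header.
    youtube_dl.utils.std_headers['User-Agent'] = 'Mozilla/5.0 (X11; Linux x86_64)'

    ydl_opts = {
        'cookiefile': 'cookies.txt',       # cookies exported from that browser
        'source_address': '203.0.113.17',  # the IP used while solving the CAPTCHA
    }
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKcj'])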

View File

@@ -1,7 +1,6 @@
 #!/usr/bin/env python
 from __future__ import unicode_literals
-import base64
 import io
 import json
 import mimetypes
@@ -15,7 +14,6 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 from youtube_dl.compat import (
 compat_basestring,
-compat_input,
 compat_getpass,
 compat_print,
 compat_urllib_request,
@@ -40,28 +38,20 @@ class GitHubReleaser(object):
 try:
 info = netrc.netrc().authenticators(self._NETRC_MACHINE)
 if info is not None:
-self._username = info[0]
-self._password = info[2]
+self._token = info[2]
 compat_print('Using GitHub credentials found in .netrc...')
 return
 else:
 compat_print('No GitHub credentials found in .netrc')
 except (IOError, netrc.NetrcParseError):
 compat_print('Unable to parse .netrc')
-self._username = compat_input(
-'Type your GitHub username or email address and press [Return]: ')
-self._password = compat_getpass(
-'Type your GitHub password and press [Return]: ')
+self._token = compat_getpass(
+'Type your GitHub PAT (personal access token) and press [Return]: ')
 def _call(self, req):
 if isinstance(req, compat_basestring):
 req = sanitized_Request(req)
-# Authorizing manually since GitHub does not response with 401 with
-# WWW-Authenticate header set (see
-# https://developer.github.com/v3/#basic-authentication)
-b64 = base64.b64encode(
-('%s:%s' % (self._username, self._password)).encode('utf-8')).decode('ascii')
-req.add_header('Authorization', 'Basic %s' % b64)
+req.add_header('Authorization', 'token %s' % self._token)
 response = self._opener.open(req).read().decode('utf-8')
 return json.loads(response)
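
Since the release script now reads only the third field returned by netrc, the personal access token goes into the "password" slot of the matching .netrc entry. A rough sketch of that lookup; the machine name 'github' is an assumption here, as the script's actual _NETRC_MACHINE constant is not shown in this hunk:

    import netrc

    # Assumed entry in ~/.netrc (machine name is an assumption, not from this diff):
    #   machine github login <your-username> password <personal-access-token>
    info = netrc.netrc().authenticators('github')
    if info is not None:
        token = info[2]  # third field = "password" slot, which now holds the PAT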

View File

@@ -26,13 +26,13 @@
 - **AcademicEarth:Course**
 - **acast**
 - **acast:channel**
-- **AddAnime**
 - **ADN**: Anime Digital Network
 - **AdobeConnect**
-- **AdobeTV**
-- **AdobeTVChannel**
-- **AdobeTVShow**
-- **AdobeTVVideo**
+- **adobetv**
+- **adobetv:channel**
+- **adobetv:embed**
+- **adobetv:show**
+- **adobetv:video**
 - **AdultSwim**
 - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault
 - **afreecatv**: afreecatv.com
@@ -98,6 +98,7 @@
 - **BiliBili**
 - **BilibiliAudio**
 - **BilibiliAudioAlbum**
+- **BiliBiliPlayer**
 - **BioBioChileTV**
 - **BIQLE**
 - **BitChute**
@@ -175,7 +176,6 @@
 - **CNN**
 - **CNNArticle**
 - **CNNBlogs**
-- **ComCarCoff**
 - **ComedyCentral**
 - **ComedyCentralFullEpisodes**
 - **ComedyCentralShortname**
@@ -203,8 +203,6 @@
 - **dailymotion**
 - **dailymotion:playlist**
 - **dailymotion:user**
-- **DaisukiMotto**
-- **DaisukiMottoPlaylist**
 - **daum.net**
 - **daum.net:clip**
 - **daum.net:playlist**
@@ -392,7 +390,6 @@
 - **JeuxVideo**
 - **Joj**
 - **Jove**
-- **jpopsuki.tv**
 - **JWPlatform**
 - **Kakao**
 - **Kaltura**
@@ -400,13 +397,14 @@
 - **Kankan**
 - **Karaoketv**
 - **KarriereVideos**
+- **Katsomo**
 - **KeezMovies**
 - **Ketnet**
 - **KhanAcademy**
 - **KickStarter**
+- **KinjaEmbed**
 - **KinoPoisk**
 - **KonserthusetPlay**
-- **kontrtube**: KontrTube.ru - Труба зовёт
 - **KrasView**: Красвью
 - **Ku6**
 - **KUSI**
@@ -485,14 +483,12 @@
 - **Mgoon**
 - **MGTV**: 芒果TV
 - **MiaoPai**
-- **Minhateca**
 - **MinistryGrid**
 - **Minoto**
 - **miomio.tv**
 - **MiTele**: mitele.es
 - **mixcloud**
 - **mixcloud:playlist**
-- **mixcloud:stream**
 - **mixcloud:user**
 - **Mixer:live**
 - **Mixer:vod**
@@ -518,7 +514,6 @@
 - **mtvjapan**
 - **mtvservices:embedded**
 - **MuenchenTV**: münchen.tv
-- **MusicPlayOn**
 - **mva**: Microsoft Virtual Academy videos
 - **mva:course**: Microsoft Virtual Academy courses
 - **Mwave**
@@ -623,7 +618,6 @@
 - **OnionStudios**
 - **Ooyala**
 - **OoyalaExternal**
-- **Openload**
 - **OraTV**
 - **orf:fm4**: radio FM4
 - **orf:fm4:story**: fm4.orf.at stories
@@ -634,7 +628,6 @@
 - **OutsideTV**
 - **PacktPub**
 - **PacktPubCourse**
-- **PandaTV**: 熊猫TV
 - **pandora.tv**: 판도라TV
 - **ParamountNetwork**
 - **parliamentlive.tv**: UK parliament videos
@@ -670,6 +663,7 @@
 - **Pokemon**
 - **PolskieRadio**
 - **PolskieRadioCategory**
+- **Popcorntimes**
 - **PopcornTV**
 - **PornCom**
 - **PornerBros**
@@ -723,8 +717,6 @@
 - **Restudy**
 - **Reuters**
 - **ReverbNation**
-- **revision**
-- **revision3:embed**
 - **RICE**
 - **RMCDecouverte**
 - **RockstarGames**
@@ -769,6 +761,7 @@
 - **screen.yahoo:search**: Yahoo screen search
 - **Screencast**
 - **ScreencastOMatic**
+- **ScrippsNetworks**
 - **scrippsnetworks:watch**
 - **SCTE**
 - **SCTECourse**
@@ -832,7 +825,6 @@
 - **Steam**
 - **Stitcher**
 - **Streamable**
-- **Streamango**
 - **streamcloud.eu**
 - **StreamCZ**
 - **StreetVoice**
@@ -922,6 +914,7 @@
 - **tv2.hu**
 - **TV2Article**
 - **TV2DK**
+- **TV2DKBornholmPlay**
 - **TV4**: tv4.se and tv4play.se
 - **TV5MondePlus**: TV5MONDE+
 - **TVA**
@@ -958,10 +951,12 @@
 - **twitch:vod**
 - **twitter**
 - **twitter:amplify**
+- **twitter:broadcast**
 - **twitter:card**
 - **udemy**
 - **udemy:course**
 - **UDNEmbed**: 聯合影音
+- **UFCArabia**
 - **UFCTV**
 - **UKTVPlay**
 - **umg:de**: Universal Music Deutschland
@@ -982,7 +977,6 @@
 - **Vbox7**
 - **VeeHD**
 - **Veoh**
-- **verystream**
 - **Vesti**: Вести.Ru
 - **Vevo**
 - **VevoPlaylist**
@@ -1002,7 +996,6 @@
 - **videomore**
 - **videomore:season**
 - **videomore:video**
-- **VideoPremium**
 - **VideoPress**
 - **Vidio**
 - **VidLii**
@@ -1012,8 +1005,8 @@
 - **Vidzi**
 - **vier**: vier.be and vijf.be
 - **vier:videos**
-- **ViewLift**
-- **ViewLiftEmbed**
+- **viewlift**
+- **viewlift:embed**
 - **Viidea**
 - **viki**
 - **viki:channel**

View File

@@ -816,11 +816,15 @@ class TestYoutubeDL(unittest.TestCase):
 'webpage_url': 'http://example.com',
 }
-def get_ids(params):
+def get_downloaded_info_dicts(params):
 ydl = YDL(params)
-# make a copy because the dictionary can be modified
-ydl.process_ie_result(playlist.copy())
-return [int(v['id']) for v in ydl.downloaded_info_dicts]
+# make a deep copy because the dictionary and nested entries
+# can be modified
+ydl.process_ie_result(copy.deepcopy(playlist))
+return ydl.downloaded_info_dicts
+def get_ids(params):
+return [int(v['id']) for v in get_downloaded_info_dicts(params)]
 result = get_ids({})
 self.assertEqual(result, [1, 2, 3, 4])
@@ -852,6 +856,22 @@ class TestYoutubeDL(unittest.TestCase):
 result = get_ids({'playlist_items': '2-4,3-4,3'})
 self.assertEqual(result, [2, 3, 4])
+# Tests for https://github.com/ytdl-org/youtube-dl/issues/10591
+# @{
+result = get_downloaded_info_dicts({'playlist_items': '2-4,3-4,3'})
+self.assertEqual(result[0]['playlist_index'], 2)
+self.assertEqual(result[1]['playlist_index'], 3)
+result = get_downloaded_info_dicts({'playlist_items': '2-4,3-4,3'})
+self.assertEqual(result[0]['playlist_index'], 2)
+self.assertEqual(result[1]['playlist_index'], 3)
+self.assertEqual(result[2]['playlist_index'], 4)
+result = get_downloaded_info_dicts({'playlist_items': '4,2'})
+self.assertEqual(result[0]['playlist_index'], 4)
+self.assertEqual(result[1]['playlist_index'], 2)
+# @}
 def test_urlopen_no_file_protocol(self):
 # see https://github.com/ytdl-org/youtube-dl/issues/8227
 ydl = YDL()

View File

@@ -26,7 +26,6 @@ from youtube_dl.extractor import (
 ThePlatformIE,
 ThePlatformFeedIE,
 RTVEALaCartaIE,
-FunnyOrDieIE,
 DemocracynowIE,
 )
@@ -322,18 +321,6 @@ class TestRtveSubtitles(BaseTestSubtitles):
 self.assertEqual(md5(subtitles['es']), '69e70cae2d40574fb7316f31d6eb7fca')
-class TestFunnyOrDieSubtitles(BaseTestSubtitles):
-url = 'http://www.funnyordie.com/videos/224829ff6d/judd-apatow-will-direct-your-vine'
-IE = FunnyOrDieIE
-def test_allsubtitles(self):
-self.DL.params['writesubtitles'] = True
-self.DL.params['allsubtitles'] = True
-subtitles = self.getSubtitles()
-self.assertEqual(set(subtitles.keys()), set(['en']))
-self.assertEqual(md5(subtitles['en']), 'c5593c193eacd353596c11c2d4f9ecc4')
 class TestDemocracynowSubtitles(BaseTestSubtitles):
 url = 'http://www.democracynow.org/shows/2015/7/3'
 IE = DemocracynowIE

View File

@@ -19,6 +19,7 @@ from youtube_dl.utils import (
 age_restricted,
 args_to_str,
 encode_base_n,
+caesar,
 clean_html,
 date_from_str,
 DateRange,
@@ -69,6 +70,7 @@ from youtube_dl.utils import (
 remove_start,
 remove_end,
 remove_quotes,
+rot47,
 shell_quote,
 smuggle_url,
 str_to_int,
@@ -340,6 +342,8 @@ class TestUtil(unittest.TestCase):
 self.assertEqual(unified_strdate('July 15th, 2013'), '20130715')
 self.assertEqual(unified_strdate('September 1st, 2013'), '20130901')
 self.assertEqual(unified_strdate('Sep 2nd, 2013'), '20130902')
+self.assertEqual(unified_strdate('November 3rd, 2019'), '20191103')
+self.assertEqual(unified_strdate('October 23rd, 2005'), '20051023')
 def test_unified_timestamps(self):
 self.assertEqual(unified_timestamp('December 21, 2010'), 1292889600)
@@ -495,6 +499,12 @@ class TestUtil(unittest.TestCase):
 def test_str_to_int(self):
 self.assertEqual(str_to_int('123,456'), 123456)
 self.assertEqual(str_to_int('123.456'), 123456)
+self.assertEqual(str_to_int(523), 523)
+# Python 3 has no long
+if sys.version_info < (3, 0):
+eval('self.assertEqual(str_to_int(123456L), 123456)')
+self.assertEqual(str_to_int('noninteger'), None)
+self.assertEqual(str_to_int([]), None)
 def test_url_basename(self):
 self.assertEqual(url_basename('http://foo.de/'), '')
@@ -1367,6 +1377,20 @@ Line 1
 self.assertRaises(ValueError, encode_base_n, 0, 70)
 self.assertRaises(ValueError, encode_base_n, 0, 60, custom_table)
+def test_caesar(self):
+self.assertEqual(caesar('ace', 'abcdef', 2), 'cea')
+self.assertEqual(caesar('cea', 'abcdef', -2), 'ace')
+self.assertEqual(caesar('ace', 'abcdef', -2), 'eac')
+self.assertEqual(caesar('eac', 'abcdef', 2), 'ace')
+self.assertEqual(caesar('ace', 'abcdef', 0), 'ace')
+self.assertEqual(caesar('xyz', 'abcdef', 2), 'xyz')
+self.assertEqual(caesar('abc', 'acegik', 2), 'ebg')
+self.assertEqual(caesar('ebg', 'acegik', -2), 'abc')
+def test_rot47(self):
+self.assertEqual(rot47('youtube-dl'), r'J@FEF36\5=')
+self.assertEqual(rot47('YOUTUBE-DL'), r'*~&%&qt\s{')
 def test_urshift(self):
 self.assertEqual(urshift(3, 1), 1)
 self.assertEqual(urshift(-3, 1), 2147483646)
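
The new tests above pin down the expected behaviour of the two cipher helpers added in this release. A sketch of functions that would satisfy them (not necessarily the exact youtube_dl.utils implementation): characters found in the given alphabet are shifted by the offset, everything else passes through, and ROT47 is a Caesar shift of 47 over the printable ASCII range.

    def caesar(s, alphabet, shift):
        # Shift only characters that appear in `alphabet`; leave the rest untouched.
        if shift == 0:
            return s
        l = len(alphabet)
        return ''.join(
            alphabet[(alphabet.index(c) + shift) % l] if c in alphabet else c
            for c in s)

    def rot47(s):
        # ROT47 operates on the 94 printable ASCII characters from '!' to '~'.
        return caesar(s, ''.join(chr(c) for c in range(33, 127)), 47)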

View File

@@ -92,6 +92,7 @@ from .utils import (
 YoutubeDLCookieJar,
 YoutubeDLCookieProcessor,
 YoutubeDLHandler,
+YoutubeDLRedirectHandler,
 )
 from .cache import Cache
 from .extractor import get_info_extractor, gen_extractor_classes, _LAZY_LOADER
@@ -990,7 +991,7 @@ class YoutubeDL(object):
 'playlist_title': ie_result.get('title'),
 'playlist_uploader': ie_result.get('uploader'),
 'playlist_uploader_id': ie_result.get('uploader_id'),
-'playlist_index': i + playliststart,
+'playlist_index': playlistitems[i - 1] if playlistitems else i + playliststart,
 'extractor': ie_result['extractor'],
 'webpage_url': ie_result['webpage_url'],
 'webpage_url_basename': url_basename(ie_result['webpage_url']),
@@ -2343,6 +2344,7 @@ class YoutubeDL(object):
 debuglevel = 1 if self.params.get('debug_printtraffic') else 0
 https_handler = make_HTTPS_handler(self.params, debuglevel=debuglevel)
 ydlh = YoutubeDLHandler(self.params, debuglevel=debuglevel)
+redirect_handler = YoutubeDLRedirectHandler()
 data_handler = compat_urllib_request_DataHandler()
 # When passing our own FileHandler instance, build_opener won't add the
@@ -2356,7 +2358,7 @@ class YoutubeDL(object):
 file_handler.file_open = file_open
 opener = compat_urllib_request.build_opener(
-proxy_handler, https_handler, cookie_processor, ydlh, data_handler, file_handler)
+proxy_handler, https_handler, cookie_processor, ydlh, redirect_handler, data_handler, file_handler)
 # Delete the default user-agent header, which would otherwise apply in
 # cases where our custom HTTP handler doesn't come into play
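
The `playlist_index` change above is the core of the #10591 fix tested earlier in test_YoutubeDL.py: when --playlist-items is given, the reported index is the requested item number rather than the position in the download order. A small worked example of just that expression; variable names mirror the hunk, values are illustrative:

    playliststart = 1
    playlistitems = [4, 2]              # parsed from --playlist-items 4,2
    for i, entry in enumerate(['a', 'b'], start=1):
        index = playlistitems[i - 1] if playlistitems else i + playliststart
        print(index)                    # 4, then 2 - the requested item numbers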

View File

@@ -2754,6 +2754,17 @@ else:
 compat_expanduser = os.path.expanduser
+if compat_os_name == 'nt' and sys.version_info < (3, 8):
+# os.path.realpath on Windows does not follow symbolic links
+# prior to Python 3.8 (see https://bugs.python.org/issue9949)
+def compat_realpath(path):
+while os.path.islink(path):
+path = os.path.abspath(os.readlink(path))
+return path
+else:
+compat_realpath = os.path.realpath
 if sys.version_info < (3, 0):
 def compat_print(s):
 from .utils import preferredencoding
@@ -2998,6 +3009,7 @@ __all__ = [
 'compat_os_name',
 'compat_parse_qs',
 'compat_print',
+'compat_realpath',
 'compat_setenv',
 'compat_shlex_quote',
 'compat_shlex_split',
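
Per the ChangeLog entry above ("[update] Fix updating via symlinks"), compat_realpath exists so the self-updater can resolve a symlinked youtube-dl binary before overwriting it. A minimal usage sketch; the surrounding path handling is illustrative, not the actual update.py code:

    import sys
    from youtube_dl.compat import compat_realpath

    # Resolve symlinks so an updater would rewrite the real file, not the link
    # (os.path.realpath only does this reliably on Windows from Python 3.8 on).
    filename = compat_realpath(sys.argv[0])
    print(filename)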

View File

@@ -64,7 +64,7 @@ class HlsFD(FragmentFD):
 s = urlh.read().decode('utf-8', 'ignore')
 if not self.can_download(s, info_dict):
-if info_dict.get('extra_param_to_segment_url'):
+if info_dict.get('extra_param_to_segment_url') or info_dict.get('_decryption_key_url'):
 self.report_error('pycrypto not found. Please install it.')
 return False
 self.report_warning(
@@ -169,7 +169,7 @@ class HlsFD(FragmentFD):
 if decrypt_info['METHOD'] == 'AES-128':
 iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
 decrypt_info['KEY'] = decrypt_info.get('KEY') or self.ydl.urlopen(
-self._prepare_url(info_dict, decrypt_info['URI'])).read()
+self._prepare_url(info_dict, info_dict.get('_decryption_key_url') or decrypt_info['URI'])).read()
 frag_content = AES.new(
 decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
 self._append_fragment(ctx, frag_content)
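
The two hls.py changes let an extractor force where the AES-128 key is fetched from: if the info dict carries a `_decryption_key_url` entry, that URL is used instead of the URI from the manifest (the ChangeLog ties this to the vzaar extractor). A rough sketch of the producing side; all field values are placeholders:

    # Hypothetical info dict an extractor could hand to the HLS downloader;
    # only '_decryption_key_url' is the new, special-cased field.
    info_dict = {
        'id': 'example',
        'url': 'https://example.com/master.m3u8',
        'ext': 'mp4',
        'protocol': 'm3u8_native',
        '_decryption_key_url': 'https://example.com/keys/segment.key',
    }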

View File

@@ -110,17 +110,17 @@ class ABCIViewIE(InfoExtractor):
 # ABC iview programs are normally available for 14 days only.
 _TESTS = [{
-'url': 'https://iview.abc.net.au/show/ben-and-hollys-little-kingdom/series/0/video/ZX9371A050S00',
-'md5': 'cde42d728b3b7c2b32b1b94b4a548afc',
+'url': 'https://iview.abc.net.au/show/gruen/series/11/video/LE1927H001S00',
+'md5': '67715ce3c78426b11ba167d875ac6abf',
 'info_dict': {
-'id': 'ZX9371A050S00',
+'id': 'LE1927H001S00',
 'ext': 'mp4',
-'title': "Gaston's Birthday",
-'series': "Ben And Holly's Little Kingdom",
-'description': 'md5:f9de914d02f226968f598ac76f105bcf',
-'upload_date': '20180604',
-'uploader_id': 'abc4kids',
-'timestamp': 1528140219,
+'title': "Series 11 Ep 1",
+'series': "Gruen",
+'description': 'md5:52cc744ad35045baf6aded2ce7287f67',
+'upload_date': '20190925',
+'uploader_id': 'abc1',
+'timestamp': 1569445289,
 },
 'params': {
 'skip_download': True,
@@ -148,7 +148,7 @@ class ABCIViewIE(InfoExtractor):
 'hdnea': token,
 })
-for sd in ('sd', 'sd-low'):
+for sd in ('720', 'sd', 'sd-low'):
 sd_url = try_get(
 stream, lambda x: x['streams']['hls'][sd], compat_str)
 if not sd_url:

View File

@@ -4,29 +4,30 @@ from __future__ import unicode_literals
 import re
 from .common import InfoExtractor
+from ..compat import compat_str
 from ..utils import (
+dict_get,
 int_or_none,
-parse_iso8601,
+try_get,
 )
 class ABCOTVSIE(InfoExtractor):
 IE_NAME = 'abcotvs'
 IE_DESC = 'ABC Owned Television Stations'
-_VALID_URL = r'https?://(?:abc(?:7(?:news|ny|chicago)?|11|13|30)|6abc)\.com(?:/[^/]+/(?P<display_id>[^/]+))?/(?P<id>\d+)'
+_VALID_URL = r'https?://(?P<site>abc(?:7(?:news|ny|chicago)?|11|13|30)|6abc)\.com(?:(?:/[^/]+)*/(?P<display_id>[^/]+))?/(?P<id>\d+)'
 _TESTS = [
 {
 'url': 'http://abc7news.com/entertainment/east-bay-museum-celebrates-vintage-synthesizers/472581/',
 'info_dict': {
-'id': '472581',
+'id': '472548',
 'display_id': 'east-bay-museum-celebrates-vintage-synthesizers',
 'ext': 'mp4',
-'title': 'East Bay museum celebrates vintage synthesizers',
+'title': 'East Bay museum celebrates synthesized music',
 'description': 'md5:24ed2bd527096ec2a5c67b9d5a9005f3',
 'thumbnail': r're:^https?://.*\.jpg$',
-'timestamp': 1421123075,
+'timestamp': 1421118520,
 'upload_date': '20150113',
-'uploader': 'Jonathan Bloom',
 },
 'params': {
 # m3u8 download
@@ -37,39 +38,63 @@ class ABCOTVSIE(InfoExtractor):
 'url': 'http://abc7news.com/472581',
 'only_matching': True,
 },
+{
+'url': 'https://6abc.com/man-75-killed-after-being-struck-by-vehicle-in-chester/5725182/',
+'only_matching': True,
+},
 ]
+_SITE_MAP = {
+'6abc': 'wpvi',
+'abc11': 'wtvd',
+'abc13': 'ktrk',
+'abc30': 'kfsn',
+'abc7': 'kabc',
+'abc7chicago': 'wls',
+'abc7news': 'kgo',
+'abc7ny': 'wabc',
+}
 def _real_extract(self, url):
-mobj = re.match(self._VALID_URL, url)
-video_id = mobj.group('id')
-display_id = mobj.group('display_id') or video_id
-webpage = self._download_webpage(url, display_id)
+site, display_id, video_id = re.match(self._VALID_URL, url).groups()
+display_id = display_id or video_id
+station = self._SITE_MAP[site]
+data = self._download_json(
+'https://api.abcotvs.com/v2/content', display_id, query={
+'id': video_id,
+'key': 'otv.web.%s.story' % station,
+'station': station,
+})['data']
+video = try_get(data, lambda x: x['featuredMedia']['video'], dict) or data
+video_id = compat_str(dict_get(video, ('id', 'publishedKey'), video_id))
+title = video.get('title') or video['linkText']
-m3u8 = self._html_search_meta(
-'contentURL', webpage, 'm3u8 url', fatal=True).split('?')[0]
-formats = self._extract_m3u8_formats(m3u8, display_id, 'mp4')
+formats = []
+m3u8_url = video.get('m3u8')
+if m3u8_url:
+formats = self._extract_m3u8_formats(
+video['m3u8'].split('?')[0], display_id, 'mp4', m3u8_id='hls', fatal=False)
+mp4_url = video.get('mp4')
+if mp4_url:
+formats.append({
+'abr': 128,
+'format_id': 'https',
+'height': 360,
+'url': mp4_url,
+'width': 640,
+})
 self._sort_formats(formats)
-title = self._og_search_title(webpage).strip()
-description = self._og_search_description(webpage).strip()
-thumbnail = self._og_search_thumbnail(webpage)
-timestamp = parse_iso8601(self._search_regex(
-r'<div class="meta">\s*<time class="timeago" datetime="([^"]+)">',
-webpage, 'upload date', fatal=False))
-uploader = self._search_regex(
-r'rel="author">([^<]+)</a>',
-webpage, 'uploader', default=None)
+image = video.get('image') or {}
 return {
 'id': video_id,
 'display_id': display_id,
 'title': title,
-'description': description,
+'description': dict_get(video, ('description', 'caption'), try_get(video, lambda x: x['meta']['description'])),
-'thumbnail': thumbnail,
+'thumbnail': dict_get(image, ('source', 'dynamicSource')),
-'timestamp': timestamp,
+'timestamp': int_or_none(video.get('date')),
-'uploader': uploader,
+'duration': int_or_none(video.get('length')),
 'formats': formats,
 }

View File

@ -1,25 +1,119 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import functools
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str from ..compat import compat_str
from ..utils import ( from ..utils import (
parse_duration,
unified_strdate,
str_to_int,
int_or_none,
float_or_none, float_or_none,
int_or_none,
ISO639Utils, ISO639Utils,
determine_ext, OnDemandPagedList,
parse_duration,
str_or_none,
str_to_int,
unified_strdate,
) )
class AdobeTVBaseIE(InfoExtractor): class AdobeTVBaseIE(InfoExtractor):
_API_BASE_URL = 'http://tv.adobe.com/api/v4/' def _call_api(self, path, video_id, query, note=None):
return self._download_json(
'http://tv.adobe.com/api/v4/' + path,
video_id, note, query=query)['data']
def _parse_subtitles(self, video_data, url_key):
subtitles = {}
for translation in video_data.get('translations', []):
vtt_path = translation.get(url_key)
if not vtt_path:
continue
lang = translation.get('language_w3c') or ISO639Utils.long2short(translation['language_medium'])
subtitles.setdefault(lang, []).append({
'ext': 'vtt',
'url': vtt_path,
})
return subtitles
def _parse_video_data(self, video_data):
video_id = compat_str(video_data['id'])
title = video_data['title']
s3_extracted = False
formats = []
for source in video_data.get('videos', []):
source_url = source.get('url')
if not source_url:
continue
f = {
'format_id': source.get('quality_level'),
'fps': int_or_none(source.get('frame_rate')),
'height': int_or_none(source.get('height')),
'tbr': int_or_none(source.get('video_data_rate')),
'width': int_or_none(source.get('width')),
'url': source_url,
}
original_filename = source.get('original_filename')
if original_filename:
if not (f.get('height') and f.get('width')):
mobj = re.search(r'_(\d+)x(\d+)', original_filename)
if mobj:
f.update({
'height': int(mobj.group(2)),
'width': int(mobj.group(1)),
})
if original_filename.startswith('s3://') and not s3_extracted:
formats.append({
'format_id': 'original',
'preference': 1,
'url': original_filename.replace('s3://', 'https://s3.amazonaws.com/'),
})
s3_extracted = True
formats.append(f)
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': video_data.get('description'),
'thumbnail': video_data.get('thumbnail'),
'upload_date': unified_strdate(video_data.get('start_date')),
'duration': parse_duration(video_data.get('duration')),
'view_count': str_to_int(video_data.get('playcount')),
'formats': formats,
'subtitles': self._parse_subtitles(video_data, 'vtt'),
}
class AdobeTVEmbedIE(AdobeTVBaseIE):
IE_NAME = 'adobetv:embed'
_VALID_URL = r'https?://tv\.adobe\.com/embed/\d+/(?P<id>\d+)'
_TEST = {
'url': 'https://tv.adobe.com/embed/22/4153',
'md5': 'c8c0461bf04d54574fc2b4d07ac6783a',
'info_dict': {
'id': '4153',
'ext': 'flv',
'title': 'Creating Graphics Optimized for BlackBerry',
'description': 'md5:eac6e8dced38bdaae51cd94447927459',
'thumbnail': r're:https?://.*\.jpg$',
'upload_date': '20091109',
'duration': 377,
'view_count': int,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._call_api(
'episode/' + video_id, video_id, {'disclosure': 'standard'})[0]
return self._parse_video_data(video_data)
class AdobeTVIE(AdobeTVBaseIE): class AdobeTVIE(AdobeTVBaseIE):
IE_NAME = 'adobetv'
_VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?watch/(?P<show_urlname>[^/]+)/(?P<id>[^/]+)' _VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?watch/(?P<show_urlname>[^/]+)/(?P<id>[^/]+)'
_TEST = { _TEST = {
@ -42,45 +136,33 @@ class AdobeTVIE(AdobeTVBaseIE):
if not language: if not language:
language = 'en' language = 'en'
video_data = self._download_json( video_data = self._call_api(
self._API_BASE_URL + 'episode/get/?language=%s&show_urlname=%s&urlname=%s&disclosure=standard' % (language, show_urlname, urlname), 'episode/get', urlname, {
urlname)['data'][0] 'disclosure': 'standard',
'language': language,
formats = [{ 'show_urlname': show_urlname,
'url': source['url'], 'urlname': urlname,
'format_id': source.get('quality_level') or source['url'].split('-')[-1].split('.')[0] or None, })[0]
'width': int_or_none(source.get('width')), return self._parse_video_data(video_data)
'height': int_or_none(source.get('height')),
'tbr': int_or_none(source.get('video_data_rate')),
} for source in video_data['videos']]
self._sort_formats(formats)
return {
'id': compat_str(video_data['id']),
'title': video_data['title'],
'description': video_data.get('description'),
'thumbnail': video_data.get('thumbnail'),
'upload_date': unified_strdate(video_data.get('start_date')),
'duration': parse_duration(video_data.get('duration')),
'view_count': str_to_int(video_data.get('playcount')),
'formats': formats,
}
class AdobeTVPlaylistBaseIE(AdobeTVBaseIE): class AdobeTVPlaylistBaseIE(AdobeTVBaseIE):
def _parse_page_data(self, page_data): _PAGE_SIZE = 25
return [self.url_result(self._get_element_url(element_data)) for element_data in page_data]
def _extract_playlist_entries(self, url, display_id): def _fetch_page(self, display_id, query, page):
page = self._download_json(url, display_id) page += 1
entries = self._parse_page_data(page['data']) query['page'] = page
for page_num in range(2, page['paging']['pages'] + 1): for element_data in self._call_api(
entries.extend(self._parse_page_data( self._RESOURCE, display_id, query, 'Download Page %d' % page):
self._download_json(url + '&page=%d' % page_num, display_id)['data'])) yield self._process_data(element_data)
return entries
def _extract_playlist_entries(self, display_id, query):
return OnDemandPagedList(functools.partial(
self._fetch_page, display_id, query), self._PAGE_SIZE)
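For readers unfamiliar with OnDemandPagedList, the pattern above boils down to pre-binding the fixed arguments with functools.partial so the paged-list helper only has to supply a page index. A rough standalone sketch under that assumption, with a stand-in fetcher and fabricated entries:

import functools

PAGE_SIZE = 25


def fetch_page(display_id, query, pagenum):
    # The Adobe TV API counts pages from 1, hence the +1 before the request
    # would be made; in the real extractor `query` is sent as URL parameters,
    # here the entries are fabricated for illustration.
    page = pagenum + 1
    return ['%s page %d entry %d' % (display_id, page, i) for i in range(PAGE_SIZE)]


# Only the page number is left unbound, which is exactly what a lazily
# evaluated paged list needs in order to request pages on demand.
get_page = functools.partial(fetch_page, 'some-show', {'disclosure': 'standard'})
print(get_page(0)[0])
print(get_page(3)[0])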
class AdobeTVShowIE(AdobeTVPlaylistBaseIE): class AdobeTVShowIE(AdobeTVPlaylistBaseIE):
IE_NAME = 'adobetv:show'
_VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?show/(?P<id>[^/]+)' _VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?show/(?P<id>[^/]+)'
_TEST = { _TEST = {
@ -92,26 +174,31 @@ class AdobeTVShowIE(AdobeTVPlaylistBaseIE):
}, },
'playlist_mincount': 136, 'playlist_mincount': 136,
} }
_RESOURCE = 'episode'
def _get_element_url(self, element_data): _process_data = AdobeTVBaseIE._parse_video_data
return element_data['urls'][0]
def _real_extract(self, url): def _real_extract(self, url):
language, show_urlname = re.match(self._VALID_URL, url).groups() language, show_urlname = re.match(self._VALID_URL, url).groups()
if not language: if not language:
language = 'en' language = 'en'
query = 'language=%s&show_urlname=%s' % (language, show_urlname) query = {
'disclosure': 'standard',
'language': language,
'show_urlname': show_urlname,
}
show_data = self._download_json(self._API_BASE_URL + 'show/get/?%s' % query, show_urlname)['data'][0] show_data = self._call_api(
'show/get', show_urlname, query)[0]
return self.playlist_result( return self.playlist_result(
self._extract_playlist_entries(self._API_BASE_URL + 'episode/?%s' % query, show_urlname), self._extract_playlist_entries(show_urlname, query),
compat_str(show_data['id']), str_or_none(show_data.get('id')),
show_data['show_name'], show_data.get('show_name'),
show_data['show_description']) show_data.get('show_description'))
class AdobeTVChannelIE(AdobeTVPlaylistBaseIE): class AdobeTVChannelIE(AdobeTVPlaylistBaseIE):
IE_NAME = 'adobetv:channel'
_VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?channel/(?P<id>[^/]+)(?:/(?P<category_urlname>[^/]+))?' _VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?channel/(?P<id>[^/]+)(?:/(?P<category_urlname>[^/]+))?'
_TEST = { _TEST = {
@ -121,24 +208,30 @@ class AdobeTVChannelIE(AdobeTVPlaylistBaseIE):
}, },
'playlist_mincount': 96, 'playlist_mincount': 96,
} }
_RESOURCE = 'show'
def _get_element_url(self, element_data): def _process_data(self, show_data):
return element_data['url'] return self.url_result(
show_data['url'], 'AdobeTVShow', str_or_none(show_data.get('id')))
def _real_extract(self, url): def _real_extract(self, url):
language, channel_urlname, category_urlname = re.match(self._VALID_URL, url).groups() language, channel_urlname, category_urlname = re.match(self._VALID_URL, url).groups()
if not language: if not language:
language = 'en' language = 'en'
query = 'language=%s&channel_urlname=%s' % (language, channel_urlname) query = {
'channel_urlname': channel_urlname,
'language': language,
}
if category_urlname: if category_urlname:
query += '&category_urlname=%s' % category_urlname query['category_urlname'] = category_urlname
return self.playlist_result( return self.playlist_result(
self._extract_playlist_entries(self._API_BASE_URL + 'show/?%s' % query, channel_urlname), self._extract_playlist_entries(channel_urlname, query),
channel_urlname) channel_urlname)
class AdobeTVVideoIE(InfoExtractor): class AdobeTVVideoIE(AdobeTVBaseIE):
IE_NAME = 'adobetv:video'
_VALID_URL = r'https?://video\.tv\.adobe\.com/v/(?P<id>\d+)' _VALID_URL = r'https?://video\.tv\.adobe\.com/v/(?P<id>\d+)'
_TEST = { _TEST = {
@ -160,38 +253,36 @@ class AdobeTVVideoIE(InfoExtractor):
video_data = self._parse_json(self._search_regex( video_data = self._parse_json(self._search_regex(
r'var\s+bridge\s*=\s*([^;]+);', webpage, 'bridged data'), video_id) r'var\s+bridge\s*=\s*([^;]+);', webpage, 'bridged data'), video_id)
title = video_data['title']
formats = [{ formats = []
'format_id': '%s-%s' % (determine_ext(source['src']), source.get('height')), sources = video_data.get('sources') or []
'url': source['src'], for source in sources:
'width': int_or_none(source.get('width')), source_src = source.get('src')
'height': int_or_none(source.get('height')), if not source_src:
'tbr': int_or_none(source.get('bitrate')), continue
} for source in video_data['sources']] formats.append({
'filesize': int_or_none(source.get('kilobytes') or None, invscale=1000),
'format_id': '-'.join(filter(None, [source.get('format'), source.get('label')])),
'height': int_or_none(source.get('height') or None),
'tbr': int_or_none(source.get('bitrate') or None),
'width': int_or_none(source.get('width') or None),
'url': source_src,
})
self._sort_formats(formats) self._sort_formats(formats)
# For both metadata and downloaded files the duration varies among
# formats. I just pick the max one
duration = max(filter(None, [ duration = max(filter(None, [
float_or_none(source.get('duration'), scale=1000) float_or_none(source.get('duration'), scale=1000)
for source in video_data['sources']])) for source in sources]))
subtitles = {}
for translation in video_data.get('translations', []):
lang_id = translation.get('language_w3c') or ISO639Utils.long2short(translation['language_medium'])
if lang_id not in subtitles:
subtitles[lang_id] = []
subtitles[lang_id].append({
'url': translation['vttPath'],
'ext': 'vtt',
})
return { return {
'id': video_id, 'id': video_id,
'formats': formats, 'formats': formats,
'title': video_data['title'], 'title': title,
'description': video_data.get('description'), 'description': video_data.get('description'),
'thumbnail': video_data['video'].get('poster'), 'thumbnail': video_data.get('video', {}).get('poster'),
'duration': duration, 'duration': duration,
'subtitles': subtitles, 'subtitles': self._parse_subtitles(video_data, 'vttPath'),
} }

View File

@ -5,6 +5,7 @@ from .common import InfoExtractor
from ..utils import (
    clean_html,
    int_or_none,
    js_to_json,
    try_get,
    unified_strdate,
)
@ -13,22 +14,21 @@ from ..utils import (
class AmericasTestKitchenIE(InfoExtractor): class AmericasTestKitchenIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?americastestkitchen\.com/(?:episode|videos)/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?americastestkitchen\.com/(?:episode|videos)/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.americastestkitchen.com/episode/548-summer-dinner-party', 'url': 'https://www.americastestkitchen.com/episode/582-weeknight-japanese-suppers',
'md5': 'b861c3e365ac38ad319cfd509c30577f', 'md5': 'b861c3e365ac38ad319cfd509c30577f',
'info_dict': { 'info_dict': {
'id': '1_5g5zua6e', 'id': '5b400b9ee338f922cb06450c',
'title': 'Summer Dinner Party', 'title': 'Weeknight Japanese Suppers',
'ext': 'mp4', 'ext': 'mp4',
'description': 'md5:858d986e73a4826979b6a5d9f8f6a1ec', 'description': 'md5:3d0c1a44bb3b27607ce82652db25b4a8',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://',
'timestamp': 1497285541, 'timestamp': 1523664000,
'upload_date': '20170612', 'upload_date': '20180414',
'uploader_id': 'roger.metcalf@americastestkitchen.com', 'release_date': '20180414',
'release_date': '20170617',
'series': "America's Test Kitchen", 'series': "America's Test Kitchen",
'season_number': 17, 'season_number': 18,
'episode': 'Summer Dinner Party', 'episode': 'Weeknight Japanese Suppers',
'episode_number': 24, 'episode_number': 15,
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -47,7 +47,7 @@ class AmericasTestKitchenIE(InfoExtractor):
self._search_regex( self._search_regex(
r'window\.__INITIAL_STATE__\s*=\s*({.+?})\s*;\s*</script>', r'window\.__INITIAL_STATE__\s*=\s*({.+?})\s*;\s*</script>',
webpage, 'initial context'), webpage, 'initial context'),
video_id) video_id, js_to_json)
ep_data = try_get( ep_data = try_get(
video_data, video_data,
@ -55,17 +55,7 @@ class AmericasTestKitchenIE(InfoExtractor):
lambda x: x['videoDetail']['content']['data']), dict) lambda x: x['videoDetail']['content']['data']), dict)
ep_meta = ep_data.get('full_video', {}) ep_meta = ep_data.get('full_video', {})
zype_id = ep_meta.get('zype_id') zype_id = ep_data.get('zype_id') or ep_meta['zype_id']
if zype_id:
embed_url = 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id
ie_key = 'Zype'
else:
partner_id = self._search_regex(
r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
webpage, 'kaltura partner id')
external_id = ep_data.get('external_id') or ep_meta['external_id']
embed_url = 'kaltura:%s:%s' % (partner_id, external_id)
ie_key = 'Kaltura'
title = ep_data.get('title') or ep_meta.get('title') title = ep_data.get('title') or ep_meta.get('title')
description = clean_html(ep_meta.get('episode_description') or ep_data.get( description = clean_html(ep_meta.get('episode_description') or ep_data.get(
@ -79,8 +69,8 @@ class AmericasTestKitchenIE(InfoExtractor):
return { return {
'_type': 'url_transparent', '_type': 'url_transparent',
'url': embed_url, 'url': 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id,
'ie_key': ie_key, 'ie_key': 'Zype',
'title': title, 'title': title,
'description': description, 'description': description,
'thumbnail': thumbnail, 'thumbnail': thumbnail,

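Since the extractor now always defers to Zype, building the embed URL is the whole of the hand-off. A small sketch using the API key from the code above and the zype_id from the updated test; everything else (formats, thumbnails) is resolved by the Zype extractor.

ZYPE_API_KEY = 'jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ'


def zype_embed_url(zype_id):
    # The url_transparent result above points the Zype extractor at this
    # player embed URL.
    return 'https://player.zype.com/embed/%s.js?api_key=%s' % (zype_id, ZYPE_API_KEY)


print(zype_embed_url('5b400b9ee338f922cb06450c'))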
View File

@ -1,6 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
@ -22,7 +23,101 @@ from ..utils import (
from ..compat import compat_etree_fromstring from ..compat import compat_etree_fromstring
class ARDMediathekIE(InfoExtractor): class ARDMediathekBaseIE(InfoExtractor):
_GEO_COUNTRIES = ['DE']
def _extract_media_info(self, media_info_url, webpage, video_id):
media_info = self._download_json(
media_info_url, video_id, 'Downloading media JSON')
return self._parse_media_info(media_info, video_id, '"fsk"' in webpage)
def _parse_media_info(self, media_info, video_id, fsk):
formats = self._extract_formats(media_info, video_id)
if not formats:
if fsk:
raise ExtractorError(
'This video is only available after 20:00', expected=True)
elif media_info.get('_geoblocked'):
self.raise_geo_restricted(
'This video is not available due to geoblocking',
countries=self._GEO_COUNTRIES)
self._sort_formats(formats)
subtitles = {}
subtitle_url = media_info.get('_subtitleUrl')
if subtitle_url:
subtitles['de'] = [{
'ext': 'ttml',
'url': subtitle_url,
}]
return {
'id': video_id,
'duration': int_or_none(media_info.get('_duration')),
'thumbnail': media_info.get('_previewImage'),
'is_live': media_info.get('_isLive') is True,
'formats': formats,
'subtitles': subtitles,
}
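The error handling in _parse_media_info is worth spelling out: an empty format list is attributed either to the FSK time lock or to geoblocking, based on fields of the mediaCollection JSON. A rough sketch with hypothetical payloads:

def classify_empty_formats(media_info, fsk_blocked):
    # Mirrors the branch above: FSK-locked videos are only published after
    # 20:00, geoblocked ones are restricted to German IPs.
    if fsk_blocked:
        return 'only available after 20:00'
    if media_info.get('_geoblocked'):
        return 'geo-restricted to DE'
    return 'no playable formats found'


print(classify_empty_formats({'_geoblocked': True}, False))
print(classify_empty_formats({}, True))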
def _extract_formats(self, media_info, video_id):
type_ = media_info.get('_type')
media_array = media_info.get('_mediaArray', [])
formats = []
for num, media in enumerate(media_array):
for stream in media.get('_mediaStreamArray', []):
stream_urls = stream.get('_stream')
if not stream_urls:
continue
if not isinstance(stream_urls, list):
stream_urls = [stream_urls]
quality = stream.get('_quality')
server = stream.get('_server')
for stream_url in stream_urls:
if not url_or_none(stream_url):
continue
ext = determine_ext(stream_url)
if quality != 'auto' and ext in ('f4m', 'm3u8'):
continue
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
update_url_query(stream_url, {
'hdcore': '3.1.1',
'plugin': 'aasp-3.1.1.69.124'
}), video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
stream_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
else:
if server and server.startswith('rtmp'):
f = {
'url': server,
'play_path': stream_url,
'format_id': 'a%s-rtmp-%s' % (num, quality),
}
else:
f = {
'url': stream_url,
'format_id': 'a%s-%s-%s' % (num, ext, quality)
}
m = re.search(
r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$',
stream_url)
if m:
f.update({
'width': int(m.group('width')),
'height': int(m.group('height')),
})
if type_ == 'audio':
f['vcodec'] = 'none'
formats.append(f)
return formats
class ARDMediathekIE(ARDMediathekBaseIE):
IE_NAME = 'ARD:mediathek' IE_NAME = 'ARD:mediathek'
_VALID_URL = r'^https?://(?:(?:(?:www|classic)\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de|one\.ard\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?' _VALID_URL = r'^https?://(?:(?:(?:www|classic)\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de|one\.ard\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
@ -63,94 +158,6 @@ class ARDMediathekIE(InfoExtractor):
def suitable(cls, url): def suitable(cls, url):
return False if ARDBetaMediathekIE.suitable(url) else super(ARDMediathekIE, cls).suitable(url) return False if ARDBetaMediathekIE.suitable(url) else super(ARDMediathekIE, cls).suitable(url)
def _extract_media_info(self, media_info_url, webpage, video_id):
media_info = self._download_json(
media_info_url, video_id, 'Downloading media JSON')
formats = self._extract_formats(media_info, video_id)
if not formats:
if '"fsk"' in webpage:
raise ExtractorError(
'This video is only available after 20:00', expected=True)
elif media_info.get('_geoblocked'):
raise ExtractorError('This video is not available due to geo restriction', expected=True)
self._sort_formats(formats)
duration = int_or_none(media_info.get('_duration'))
thumbnail = media_info.get('_previewImage')
is_live = media_info.get('_isLive') is True
subtitles = {}
subtitle_url = media_info.get('_subtitleUrl')
if subtitle_url:
subtitles['de'] = [{
'ext': 'ttml',
'url': subtitle_url,
}]
return {
'id': video_id,
'duration': duration,
'thumbnail': thumbnail,
'is_live': is_live,
'formats': formats,
'subtitles': subtitles,
}
def _extract_formats(self, media_info, video_id):
type_ = media_info.get('_type')
media_array = media_info.get('_mediaArray', [])
formats = []
for num, media in enumerate(media_array):
for stream in media.get('_mediaStreamArray', []):
stream_urls = stream.get('_stream')
if not stream_urls:
continue
if not isinstance(stream_urls, list):
stream_urls = [stream_urls]
quality = stream.get('_quality')
server = stream.get('_server')
for stream_url in stream_urls:
if not url_or_none(stream_url):
continue
ext = determine_ext(stream_url)
if quality != 'auto' and ext in ('f4m', 'm3u8'):
continue
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
update_url_query(stream_url, {
'hdcore': '3.1.1',
'plugin': 'aasp-3.1.1.69.124'
}),
video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
if server and server.startswith('rtmp'):
f = {
'url': server,
'play_path': stream_url,
'format_id': 'a%s-rtmp-%s' % (num, quality),
}
else:
f = {
'url': stream_url,
'format_id': 'a%s-%s-%s' % (num, ext, quality)
}
m = re.search(r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$', stream_url)
if m:
f.update({
'width': int(m.group('width')),
'height': int(m.group('height')),
})
if type_ == 'audio':
f['vcodec'] = 'none'
formats.append(f)
return formats
def _real_extract(self, url): def _real_extract(self, url):
# determine video id from url # determine video id from url
m = re.match(self._VALID_URL, url) m = re.match(self._VALID_URL, url)
@ -302,19 +309,20 @@ class ARDIE(InfoExtractor):
} }
class ARDBetaMediathekIE(InfoExtractor): class ARDBetaMediathekIE(ARDMediathekBaseIE):
_VALID_URL = r'https://(?:beta|www)\.ardmediathek\.de/[^/]+/(?:player|live)/(?P<video_id>[a-zA-Z0-9]+)(?:/(?P<display_id>[^/?#]+))?' _VALID_URL = r'https://(?:beta|www)\.ardmediathek\.de/(?P<client>[^/]+)/(?:player|live)/(?P<video_id>[a-zA-Z0-9]+)(?:/(?P<display_id>[^/?#]+))?'
_TESTS = [{ _TESTS = [{
'url': 'https://beta.ardmediathek.de/ard/player/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE/die-robuste-roswita', 'url': 'https://beta.ardmediathek.de/ard/player/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE/die-robuste-roswita',
'md5': '2d02d996156ea3c397cfc5036b5d7f8f', 'md5': 'dfdc87d2e7e09d073d5a80770a9ce88f',
'info_dict': { 'info_dict': {
'display_id': 'die-robuste-roswita', 'display_id': 'die-robuste-roswita',
'id': 'Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE', 'id': '70153354',
'title': 'Tatort: Die robuste Roswita', 'title': 'Die robuste Roswita',
'description': r're:^Der Mord.*trüber ist als die Ilm.', 'description': r're:^Der Mord.*trüber ist als die Ilm.',
'duration': 5316, 'duration': 5316,
'thumbnail': 'https://img.ardmediathek.de/standard/00/55/43/59/34/-1774185891/16x9/960?mandant=ard', 'thumbnail': 'https://img.ardmediathek.de/standard/00/70/15/33/90/-1852531467/16x9/960?mandant=ard',
'upload_date': '20180826', 'timestamp': 1577047500,
'upload_date': '20191222',
'ext': 'mp4', 'ext': 'mp4',
}, },
}, { }, {
@ -330,71 +338,69 @@ class ARDBetaMediathekIE(InfoExtractor):
video_id = mobj.group('video_id') video_id = mobj.group('video_id')
display_id = mobj.group('display_id') or video_id display_id = mobj.group('display_id') or video_id
webpage = self._download_webpage(url, display_id) player_page = self._download_json(
data_json = self._search_regex(r'window\.__APOLLO_STATE__\s*=\s*(\{.*);\n', webpage, 'json') 'https://api.ardmediathek.de/public-gateway',
data = self._parse_json(data_json, display_id) display_id, data=json.dumps({
'query': '''{
res = { playerPage(client:"%s", clipId: "%s") {
'id': video_id, blockedByFsk
'display_id': display_id, broadcastedOn
maturityContentRating
mediaCollection {
_duration
_geoblocked
_isLive
_mediaArray {
_mediaStreamArray {
_quality
_server
_stream
} }
formats = [] }
subtitles = {} _previewImage
geoblocked = False _subtitleUrl
for widget in data.values(): _type
if widget.get('_geoblocked') is True: }
geoblocked = True show {
if '_duration' in widget: title
res['duration'] = int_or_none(widget['_duration']) }
if 'clipTitle' in widget: synopsis
res['title'] = widget['clipTitle'] title
if '_previewImage' in widget: tracking {
res['thumbnail'] = widget['_previewImage'] atiCustomVars {
if 'broadcastedOn' in widget: contentId
res['timestamp'] = unified_timestamp(widget['broadcastedOn']) }
if 'synopsis' in widget: }
res['description'] = widget['synopsis'] }
subtitle_url = url_or_none(widget.get('_subtitleUrl')) }''' % (mobj.group('client'), video_id),
if subtitle_url: }).encode(), headers={
subtitles.setdefault('de', []).append({ 'Content-Type': 'application/json'
'ext': 'ttml', })['data']['playerPage']
'url': subtitle_url, title = player_page['title']
}) content_id = str_or_none(try_get(
if '_quality' in widget: player_page, lambda x: x['tracking']['atiCustomVars']['contentId']))
format_url = url_or_none(try_get( media_collection = player_page.get('mediaCollection') or {}
widget, lambda x: x['_stream']['json'][0])) if not media_collection and content_id:
if not format_url: media_collection = self._download_json(
continue 'https://www.ardmediathek.de/play/media/' + content_id,
ext = determine_ext(format_url) content_id, fatal=False) or {}
if ext == 'f4m': info = self._parse_media_info(
formats.extend(self._extract_f4m_formats( media_collection, content_id or video_id,
format_url + '?hdcore=3.11.0', player_page.get('blockedByFsk'))
video_id, f4m_id='hds', fatal=False)) age_limit = None
elif ext == 'm3u8': description = player_page.get('synopsis')
formats.extend(self._extract_m3u8_formats( maturity_content_rating = player_page.get('maturityContentRating')
format_url, video_id, 'mp4', m3u8_id='hls', if maturity_content_rating:
fatal=False)) age_limit = int_or_none(maturity_content_rating.lstrip('FSK'))
else: if not age_limit and description:
# HTTP formats are not available when geoblocked is True, age_limit = int_or_none(self._search_regex(
# other formats are fine though r'\(FSK\s*(\d+)\)\s*$', description, 'age limit', default=None))
if geoblocked: info.update({
continue 'age_limit': age_limit,
quality = str_or_none(widget.get('_quality')) 'display_id': display_id,
formats.append({ 'title': title,
'format_id': ('http-' + quality) if quality else 'http', 'description': description,
'url': format_url, 'timestamp': unified_timestamp(player_page.get('broadcastedOn')),
'preference': 10, # Plain HTTP, that's nice 'series': try_get(player_page, lambda x: x['show']['title']),
})
if not formats and geoblocked:
self.raise_geo_restricted(
msg='This video is not available due to geoblocking',
countries=['DE'])
self._sort_formats(formats)
res.update({
'subtitles': subtitles,
'formats': formats,
}) })
return info
return res
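The heart of the rewrite is that the beta Mediathek no longer scrapes window.__APOLLO_STATE__ from the page but POSTs a GraphQL query to the public gateway. A reduced standalone sketch of how such a request body is assembled; the field list is trimmed compared to the full playerPage query above, and client and clipId are taken from the test case.

import json

API_URL = 'https://api.ardmediathek.de/public-gateway'


def build_player_page_request(client, clip_id):
    # A cut-down version of the playerPage query; the real query also asks
    # for mediaCollection details, the show title and tracking ids.
    query = ('{ playerPage(client:"%s", clipId: "%s") '
             '{ title synopsis broadcastedOn maturityContentRating } }' % (client, clip_id))
    data = json.dumps({'query': query}).encode()
    headers = {'Content-Type': 'application/json'}
    return API_URL, data, headers


url, data, headers = build_player_page_request(
    'ard', 'Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE')
print(data.decode())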

View File

@ -47,39 +47,19 @@ class AZMedienIE(InfoExtractor):
'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1', 'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1',
'only_matching': True 'only_matching': True
}] }]
_API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/cb9f2f81ed22e9b47f4ca64ea3cc5a5d13e88d1d'
_PARTNER_ID = '1719221' _PARTNER_ID = '1719221'
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) host, display_id, article_id, entry_id = re.match(self._VALID_URL, url).groups()
host = mobj.group('host')
video_id = mobj.group('id')
entry_id = mobj.group('kaltura_id')
if not entry_id: if not entry_id:
api_url = 'https://www.%s/api/pub/gql/%s' % (host, host.split('.')[0]) entry_id = self._download_json(
payload = { self._API_TEMPL % (host, host.split('.')[0]), display_id, query={
'query': '''query VideoContext($articleId: ID!) { 'variables': json.dumps({
article: node(id: $articleId) { 'contextId': 'NewsArticle:' + article_id,
... on Article { }),
mainAssetRelation { })['data']['context']['mainAsset']['video']['kaltura']['kalturaId']
asset {
... on VideoAsset {
kalturaId
}
}
}
}
}
}''',
'variables': {'articleId': 'Article:%s' % mobj.group('article_id')},
}
json_data = self._download_json(
api_url, video_id, headers={
'Content-Type': 'application/json',
},
data=json.dumps(payload).encode())
entry_id = json_data['data']['article']['mainAssetRelation']['asset']['kalturaId']
return self.url_result( return self.url_result(
'kaltura:%s:%s' % (self._PARTNER_ID, entry_id), 'kaltura:%s:%s' % (self._PARTNER_ID, entry_id),

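The AZ Medien change swaps the hand-written GraphQL document for a persisted query, so only the variables travel with the request. A small sketch of how the request is formed, with the host and article id taken from the telebaern.tv test URL above:

import json

API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/cb9f2f81ed22e9b47f4ca64ea3cc5a5d13e88d1d'


def persisted_query_request(host, article_id):
    # The hash at the end of the path identifies the stored query on the
    # server; the client only supplies its variables.
    url = API_TEMPL % (host, host.split('.')[0])
    return url, {'variables': json.dumps({'contextId': 'NewsArticle:' + article_id})}


print(persisted_query_request('telebaern.tv', '133531189'))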
View File

@ -24,7 +24,18 @@ from ..utils import (
class BiliBiliIE(InfoExtractor): class BiliBiliIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/(?P<anime_id>\d+)/play#)(?P<id>\d+)' _VALID_URL = r'''(?x)
https?://
(?:(?:www|bangumi)\.)?
bilibili\.(?:tv|com)/
(?:
(?:
video/[aA][vV]|
anime/(?P<anime_id>\d+)/play\#
)(?P<id_bv>\d+)|
video/[bB][vV](?P<id>[^/?#&]+)
)
'''
_TESTS = [{ _TESTS = [{
'url': 'http://www.bilibili.tv/video/av1074402/', 'url': 'http://www.bilibili.tv/video/av1074402/',
@ -92,6 +103,10 @@ class BiliBiliIE(InfoExtractor):
'skip_download': True, # Test metadata only 'skip_download': True, # Test metadata only
}, },
}] }]
}, {
# new BV video id format
'url': 'https://www.bilibili.com/video/BV1JE411F741',
'only_matching': True,
}] }]
_APP_KEY = 'iVGUTjsxvpLeuDCf' _APP_KEY = 'iVGUTjsxvpLeuDCf'
@ -109,7 +124,7 @@ class BiliBiliIE(InfoExtractor):
url, smuggled_data = unsmuggle_url(url, {}) url, smuggled_data = unsmuggle_url(url, {})
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id') video_id = mobj.group('id') or mobj.group('id_bv')
anime_id = mobj.group('anime_id') anime_id = mobj.group('anime_id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
@ -419,3 +434,17 @@ class BilibiliAudioAlbumIE(BilibiliAudioBaseIE):
entries, am_id, album_title, album_data.get('intro')) entries, am_id, album_title, album_data.get('intro'))
return self.playlist_result(entries, am_id) return self.playlist_result(entries, am_id)
class BiliBiliPlayerIE(InfoExtractor):
_VALID_URL = r'https?://player\.bilibili\.com/player\.html\?.*?\baid=(?P<id>\d+)'
_TEST = {
'url': 'http://player.bilibili.com/player.html?aid=92494333&cid=157926707&page=1',
'only_matching': True,
}
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result(
'http://www.bilibili.tv/video/av%s/' % video_id,
ie=BiliBiliIE.ie_key(), video_id=video_id)

View File

@ -7,6 +7,7 @@ import re
from .common import InfoExtractor
from ..utils import (
    orderedSet,
    unified_strdate,
    urlencode_postdata,
)
@ -23,6 +24,7 @@ class BitChuteIE(InfoExtractor):
'description': 'md5:3f21f6fb5b1d17c3dee9cf6b5fe60b3a', 'description': 'md5:3f21f6fb5b1d17c3dee9cf6b5fe60b3a',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Victoria X Rave', 'uploader': 'Victoria X Rave',
'upload_date': '20170813',
}, },
}, { }, {
'url': 'https://www.bitchute.com/embed/lbb5G1hjPhw/', 'url': 'https://www.bitchute.com/embed/lbb5G1hjPhw/',
@ -74,12 +76,17 @@ class BitChuteIE(InfoExtractor):
r'(?s)<p\b[^>]+\bclass=["\']video-author[^>]+>(.+?)</p>'), r'(?s)<p\b[^>]+\bclass=["\']video-author[^>]+>(.+?)</p>'),
webpage, 'uploader', fatal=False) webpage, 'uploader', fatal=False)
upload_date = unified_strdate(self._search_regex(
r'class=["\']video-publish-date[^>]+>[^<]+ at \d+:\d+ UTC on (.+?)\.',
webpage, 'upload date', fatal=False))
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': description, 'description': description,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'uploader': uploader, 'uploader': uploader,
'upload_date': upload_date,
'formats': formats, 'formats': formats,
} }
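The new upload_date field is pulled out of the human-readable publish line on the page. A standalone approximation of that extraction; the markup sample is invented, and unified_strdate is replaced here by a plain strptime after stripping the ordinal suffix.

import re
from datetime import datetime


def parse_publish_date(html):
    m = re.search(
        r'class=["\']video-publish-date[^>]+>[^<]+ at \d+:\d+ UTC on (.+?)\.',
        html)
    if not m:
        return None
    # unified_strdate() copes with '13th' directly; strptime needs the
    # ordinal suffix removed first.
    date_str = re.sub(r'(\d+)(?:st|nd|rd|th)', r'\1', m.group(1))
    return datetime.strptime(date_str, '%B %d, %Y').strftime('%Y%m%d')


sample = '<div class="video-publish-date">First published at 12:03 UTC on August 13th, 2017.</div>'
print(parse_publish_date(sample))  # 20170813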

View File

@ -586,45 +586,63 @@ class BrightcoveNewIE(AdobePassIE):
account_id, player_id, embed, content_type, video_id = re.match(self._VALID_URL, url).groups() account_id, player_id, embed, content_type, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage( policy_key_id = '%s_%s' % (account_id, player_id)
'http://players.brightcove.net/%s/%s_%s/index.min.js' policy_key = self._downloader.cache.load('brightcove', policy_key_id)
% (account_id, player_id, embed), video_id) policy_key_extracted = False
store_pk = lambda x: self._downloader.cache.store('brightcove', policy_key_id, x)
policy_key = None def extract_policy_key():
webpage = self._download_webpage(
'http://players.brightcove.net/%s/%s_%s/index.min.js'
% (account_id, player_id, embed), video_id)
catalog = self._search_regex( policy_key = None
r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
if catalog: catalog = self._search_regex(
catalog = self._parse_json( r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
js_to_json(catalog), video_id, fatal=False)
if catalog: if catalog:
policy_key = catalog.get('policyKey') catalog = self._parse_json(
js_to_json(catalog), video_id, fatal=False)
if catalog:
policy_key = catalog.get('policyKey')
if not policy_key: if not policy_key:
policy_key = self._search_regex( policy_key = self._search_regex(
r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1', r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
webpage, 'policy key', group='pk') webpage, 'policy key', group='pk')
store_pk(policy_key)
return policy_key
api_url = 'https://edge.api.brightcove.com/playback/v1/accounts/%s/%ss/%s' % (account_id, content_type, video_id) api_url = 'https://edge.api.brightcove.com/playback/v1/accounts/%s/%ss/%s' % (account_id, content_type, video_id)
headers = { headers = {}
'Accept': 'application/json;pk=%s' % policy_key,
}
referrer = smuggled_data.get('referrer') referrer = smuggled_data.get('referrer')
if referrer: if referrer:
headers.update({ headers.update({
'Referer': referrer, 'Referer': referrer,
'Origin': re.search(r'https?://[^/]+', referrer).group(0), 'Origin': re.search(r'https?://[^/]+', referrer).group(0),
}) })
try:
json_data = self._download_json(api_url, video_id, headers=headers) for _ in range(2):
except ExtractorError as e: if not policy_key:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403: policy_key = extract_policy_key()
json_data = self._parse_json(e.cause.read().decode(), video_id)[0] policy_key_extracted = True
message = json_data.get('message') or json_data['error_code'] headers['Accept'] = 'application/json;pk=%s' % policy_key
if json_data.get('error_subcode') == 'CLIENT_GEO': try:
self.raise_geo_restricted(msg=message) json_data = self._download_json(api_url, video_id, headers=headers)
raise ExtractorError(message, expected=True) break
raise except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403):
json_data = self._parse_json(e.cause.read().decode(), video_id)[0]
message = json_data.get('message') or json_data['error_code']
if json_data.get('error_subcode') == 'CLIENT_GEO':
self.raise_geo_restricted(msg=message)
elif json_data.get('error_code') == 'INVALID_POLICY_KEY' and not policy_key_extracted:
policy_key = None
store_pk(None)
continue
raise ExtractorError(message, expected=True)
raise
errors = json_data.get('errors') errors = json_data.get('errors')
if errors and errors[0].get('error_subcode') == 'TVE_AUTH': if errors and errors[0].get('error_subcode') == 'TVE_AUTH':

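The Brightcove change introduces a cached policy key plus a single forced refresh when the API answers 401/403 with INVALID_POLICY_KEY. The retry shape, reduced to a standalone sketch with an in-memory cache and dummy callables (all names and values below are placeholders):

class AuthError(Exception):
    pass


def fetch_with_cached_key(cache, key_id, extract_key, request):
    # Use the cached policy key if there is one; on an auth failure throw it
    # away, re-extract the key once, and retry before giving up.
    policy_key = cache.get(key_id)
    key_extracted = False
    for _ in range(2):
        if not policy_key:
            policy_key = extract_key()
            cache[key_id] = policy_key
            key_extracted = True
        try:
            return request(policy_key)
        except AuthError:
            if key_extracted:
                raise
            policy_key = None
            cache[key_id] = None
    raise AuthError('no working policy key')


cache = {'account_player': 'stale-key'}
attempts = []


def request(key):
    attempts.append(key)
    if key == 'stale-key':
        raise AuthError('INVALID_POLICY_KEY')
    return {'ok': True}


print(fetch_with_cached_key(cache, 'account_player', lambda: 'fresh-key', request))
print(attempts)  # ['stale-key', 'fresh-key']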
View File

@ -9,21 +9,26 @@ class BusinessInsiderIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?businessinsider\.(?:com|nl)/(?:[^/]+/)*(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:[^/]+\.)?businessinsider\.(?:com|nl)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://uk.businessinsider.com/how-much-radiation-youre-exposed-to-in-everyday-life-2016-6', 'url': 'http://uk.businessinsider.com/how-much-radiation-youre-exposed-to-in-everyday-life-2016-6',
'md5': 'ca237a53a8eb20b6dc5bd60564d4ab3e', 'md5': 'ffed3e1e12a6f950aa2f7d83851b497a',
'info_dict': { 'info_dict': {
'id': 'hZRllCfw', 'id': 'cjGDb0X9',
'ext': 'mp4', 'ext': 'mp4',
'title': "Here's how much radiation you're exposed to in everyday life", 'title': "Bananas give you more radiation exposure than living next to a nuclear power plant",
'description': 'md5:9a0d6e2c279948aadaa5e84d6d9b99bd', 'description': 'md5:0175a3baf200dd8fa658f94cade841b3',
'upload_date': '20170709', 'upload_date': '20160611',
'timestamp': 1499606400, 'timestamp': 1465675620,
},
'params': {
'skip_download': True,
}, },
}, { }, {
'url': 'https://www.businessinsider.nl/5-scientifically-proven-things-make-you-less-attractive-2017-7/', 'url': 'https://www.businessinsider.nl/5-scientifically-proven-things-make-you-less-attractive-2017-7/',
'only_matching': True, 'md5': '43f438dbc6da0b89f5ac42f68529d84a',
'info_dict': {
'id': '5zJwd4FK',
'ext': 'mp4',
'title': 'Deze dingen zorgen ervoor dat je minder snel een date scoort',
'description': 'md5:2af8975825d38a4fed24717bbe51db49',
'upload_date': '20170705',
'timestamp': 1499270528,
},
}, { }, {
'url': 'http://www.businessinsider.com/excel-index-match-vlookup-video-how-to-2015-2?IR=T', 'url': 'http://www.businessinsider.com/excel-index-match-vlookup-video-how-to-2015-2?IR=T',
'only_matching': True, 'only_matching': True,
@ -35,7 +40,8 @@ class BusinessInsiderIE(InfoExtractor):
jwplatform_id = self._search_regex( jwplatform_id = self._search_regex(
(r'data-media-id=["\']([a-zA-Z0-9]{8})', (r'data-media-id=["\']([a-zA-Z0-9]{8})',
r'id=["\']jwplayer_([a-zA-Z0-9]{8})', r'id=["\']jwplayer_([a-zA-Z0-9]{8})',
r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})'), r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})',
r'(?:jwplatform\.com/players/|jwplayer_)([a-zA-Z0-9]{8})'),
webpage, 'jwplatform id') webpage, 'jwplatform id')
return self.url_result( return self.url_result(
'jwplatform:%s' % jwplatform_id, ie=JWPlatformIE.ie_key(), 'jwplatform:%s' % jwplatform_id, ie=JWPlatformIE.ie_key(),

View File

@ -13,6 +13,8 @@ from ..utils import (
    int_or_none,
    merge_dicts,
    parse_iso8601,
    str_or_none,
    url_or_none,
)
@ -20,15 +22,15 @@ class CanvasIE(InfoExtractor):
_VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet|vrt(?:video|nieuws)|sporza)/assets/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet|vrt(?:video|nieuws)|sporza)/assets/(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://mediazone.vrt.be/api/v1/ketnet/assets/md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475', 'url': 'https://mediazone.vrt.be/api/v1/ketnet/assets/md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'md5': '90139b746a0a9bd7bb631283f6e2a64e', 'md5': '68993eda72ef62386a15ea2cf3c93107',
'info_dict': { 'info_dict': {
'id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475', 'id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'display_id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475', 'display_id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'ext': 'flv', 'ext': 'mp4',
'title': 'Nachtwacht: De Greystook', 'title': 'Nachtwacht: De Greystook',
'description': 'md5:1db3f5dc4c7109c821261e7512975be7', 'description': 'Nachtwacht: De Greystook',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1468.03, 'duration': 1468.04,
}, },
'expected_warnings': ['is not a supported codec', 'Unknown MIME type'], 'expected_warnings': ['is not a supported codec', 'Unknown MIME type'],
}, { }, {
@ -39,23 +41,45 @@ class CanvasIE(InfoExtractor):
'HLS': 'm3u8_native', 'HLS': 'm3u8_native',
'HLS_AES': 'm3u8', 'HLS_AES': 'm3u8',
} }
_REST_API_BASE = 'https://media-services-public.vrt.be/vualto-video-aggregator-web/rest/external/v1'
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
site_id, video_id = mobj.group('site_id'), mobj.group('id') site_id, video_id = mobj.group('site_id'), mobj.group('id')
# Old API endpoint, serves more formats but may fail for some videos
data = self._download_json( data = self._download_json(
'https://mediazone.vrt.be/api/v1/%s/assets/%s' 'https://mediazone.vrt.be/api/v1/%s/assets/%s'
% (site_id, video_id), video_id) % (site_id, video_id), video_id, 'Downloading asset JSON',
'Unable to download asset JSON', fatal=False)
# New API endpoint
if not data:
token = self._download_json(
'%s/tokens' % self._REST_API_BASE, video_id,
'Downloading token', data=b'',
headers={'Content-Type': 'application/json'})['vrtPlayerToken']
data = self._download_json(
'%s/videos/%s' % (self._REST_API_BASE, video_id),
video_id, 'Downloading video JSON', fatal=False, query={
'vrtPlayerToken': token,
'client': '%s@PROD' % site_id,
}, expected_status=400)
message = data.get('message')
if message and not data.get('title'):
if data.get('code') == 'AUTHENTICATION_REQUIRED':
self.raise_login_required(message)
raise ExtractorError(message, expected=True)
title = data['title'] title = data['title']
description = data.get('description') description = data.get('description')
formats = [] formats = []
for target in data['targetUrls']: for target in data['targetUrls']:
format_url, format_type = target.get('url'), target.get('type') format_url, format_type = url_or_none(target.get('url')), str_or_none(target.get('type'))
if not format_url or not format_type: if not format_url or not format_type:
continue continue
format_type = format_type.upper()
if format_type in self._HLS_ENTRY_PROTOCOLS_MAP: if format_type in self._HLS_ENTRY_PROTOCOLS_MAP:
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', self._HLS_ENTRY_PROTOCOLS_MAP[format_type], format_url, video_id, 'mp4', self._HLS_ENTRY_PROTOCOLS_MAP[format_type],
@ -134,20 +158,20 @@ class CanvasEenIE(InfoExtractor):
}, },
'skip': 'Pagina niet gevonden', 'skip': 'Pagina niet gevonden',
}, { }, {
'url': 'https://www.een.be/sorry-voor-alles/herbekijk-sorry-voor-alles', 'url': 'https://www.een.be/thuis/emma-pakt-thilly-aan',
'info_dict': { 'info_dict': {
'id': 'mz-ast-11a587f8-b921-4266-82e2-0bce3e80d07f', 'id': 'md-ast-3a24ced2-64d7-44fb-b4ed-ed1aafbf90b8',
'display_id': 'herbekijk-sorry-voor-alles', 'display_id': 'emma-pakt-thilly-aan',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Herbekijk Sorry voor alles', 'title': 'Emma pakt Thilly aan',
'description': 'md5:8bb2805df8164e5eb95d6a7a29dc0dd3', 'description': 'md5:c5c9b572388a99b2690030afa3f3bad7',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'duration': 3788.06, 'duration': 118.24,
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Episode no longer available', 'expected_warnings': ['is not a supported codec'],
}, { }, {
'url': 'https://www.canvas.be/check-point/najaar-2016/de-politie-uw-vriend', 'url': 'https://www.canvas.be/check-point/najaar-2016/de-politie-uw-vriend',
'only_matching': True, 'only_matching': True,
@ -183,19 +207,44 @@ class VrtNUIE(GigyaBaseIE):
IE_DESC = 'VrtNU.be' IE_DESC = 'VrtNU.be'
_VALID_URL = r'https?://(?:www\.)?vrt\.be/(?P<site_id>vrtnu)/(?:[^/]+/)*(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?vrt\.be/(?P<site_id>vrtnu)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
# Available via old API endpoint
'url': 'https://www.vrt.be/vrtnu/a-z/postbus-x/1/postbus-x-s1a1/', 'url': 'https://www.vrt.be/vrtnu/a-z/postbus-x/1/postbus-x-s1a1/',
'info_dict': { 'info_dict': {
'id': 'pbs-pub-2e2d8c27-df26-45c9-9dc6-90c78153044d$vid-90c932b1-e21d-4fb8-99b1-db7b49cf74de', 'id': 'pbs-pub-2e2d8c27-df26-45c9-9dc6-90c78153044d$vid-90c932b1-e21d-4fb8-99b1-db7b49cf74de',
'ext': 'flv', 'ext': 'mp4',
'title': 'De zwarte weduwe', 'title': 'De zwarte weduwe',
'description': 'md5:d90c21dced7db869a85db89a623998d4', 'description': 'md5:db1227b0f318c849ba5eab1fef895ee4',
'duration': 1457.04, 'duration': 1457.04,
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'season': '1', 'season': 'Season 1',
'season_number': 1, 'season_number': 1,
'episode_number': 1, 'episode_number': 1,
}, },
'skip': 'This video is only available for registered users' 'skip': 'This video is only available for registered users',
'params': {
'username': '<snip>',
'password': '<snip>',
},
'expected_warnings': ['is not a supported codec'],
}, {
# Only available via new API endpoint
'url': 'https://www.vrt.be/vrtnu/a-z/kamp-waes/1/kamp-waes-s1a5/',
'info_dict': {
'id': 'pbs-pub-0763b56c-64fb-4d38-b95b-af60bf433c71$vid-ad36a73c-4735-4f1f-b2c0-a38e6e6aa7e1',
'ext': 'mp4',
'title': 'Aflevering 5',
'description': 'Wie valt door de mand tijdens een missie?',
'duration': 2967.06,
'season': 'Season 1',
'season_number': 1,
'episode_number': 5,
},
'skip': 'This video is only available for registered users',
'params': {
'username': '<snip>',
'password': '<snip>',
},
'expected_warnings': ['Unable to download asset JSON', 'is not a supported codec', 'Unknown MIME type'],
}] }]
_NETRC_MACHINE = 'vrtnu' _NETRC_MACHINE = 'vrtnu'
_APIKEY = '3_0Z2HujMtiWq_pkAjgnS2Md2E11a1AwZjYiBETtwNE-EoEHDINgtnvcAOpNgmrVGy' _APIKEY = '3_0Z2HujMtiWq_pkAjgnS2Md2E11a1AwZjYiBETtwNE-EoEHDINgtnvcAOpNgmrVGy'

View File

@ -1,8 +1,10 @@
# coding: utf-8
from __future__ import unicode_literals
import hashlib
import json
import re
from xml.sax.saxutils import escape
from .common import InfoExtractor
from ..compat import (
@ -216,6 +218,29 @@ class CBCWatchBaseIE(InfoExtractor):
'clearleap': 'http://www.clearleap.com/namespace/clearleap/1.0/', 'clearleap': 'http://www.clearleap.com/namespace/clearleap/1.0/',
} }
_GEO_COUNTRIES = ['CA'] _GEO_COUNTRIES = ['CA']
_LOGIN_URL = 'https://api.loginradius.com/identity/v2/auth/login'
_TOKEN_URL = 'https://cloud-api.loginradius.com/sso/jwt/api/token'
_API_KEY = '3f4beddd-2061-49b0-ae80-6f1f2ed65b37'
_NETRC_MACHINE = 'cbcwatch'
def _signature(self, email, password):
data = json.dumps({
'email': email,
'password': password,
}).encode()
headers = {'content-type': 'application/json'}
query = {'apikey': self._API_KEY}
resp = self._download_json(self._LOGIN_URL, None, data=data, headers=headers, query=query)
access_token = resp['access_token']
# exchange the access token for a JWT signature
query = {
'access_token': access_token,
'apikey': self._API_KEY,
'jwtapp': 'jwt',
}
resp = self._download_json(self._TOKEN_URL, None, headers=headers, query=query)
return resp['signature']
def _call_api(self, path, video_id): def _call_api(self, path, video_id):
url = path if path.startswith('http') else self._API_BASE_URL + path url = path if path.startswith('http') else self._API_BASE_URL + path
@ -239,7 +264,8 @@ class CBCWatchBaseIE(InfoExtractor):
def _real_initialize(self): def _real_initialize(self):
if self._valid_device_token(): if self._valid_device_token():
return return
device = self._downloader.cache.load('cbcwatch', 'device') or {} device = self._downloader.cache.load(
'cbcwatch', self._cache_device_key()) or {}
self._device_id, self._device_token = device.get('id'), device.get('token') self._device_id, self._device_token = device.get('id'), device.get('token')
if self._valid_device_token(): if self._valid_device_token():
return return
@ -248,16 +274,30 @@ class CBCWatchBaseIE(InfoExtractor):
def _valid_device_token(self): def _valid_device_token(self):
return self._device_id and self._device_token return self._device_id and self._device_token
def _cache_device_key(self):
email, _ = self._get_login_info()
return '%s_device' % hashlib.sha256(email.encode()).hexdigest() if email else 'device'
def _register_device(self): def _register_device(self):
self._device_id = self._device_token = None
result = self._download_xml( result = self._download_xml(
self._API_BASE_URL + 'device/register', self._API_BASE_URL + 'device/register',
None, 'Acquiring device token', None, 'Acquiring device token',
data=b'<device><type>web</type></device>') data=b'<device><type>web</type></device>')
self._device_id = xpath_text(result, 'deviceId', fatal=True) self._device_id = xpath_text(result, 'deviceId', fatal=True)
self._device_token = xpath_text(result, 'deviceToken', fatal=True) email, password = self._get_login_info()
if email and password:
signature = self._signature(email, password)
data = '<login><token>{0}</token><device><deviceId>{1}</deviceId><type>web</type></device></login>'.format(
escape(signature), escape(self._device_id)).encode()
url = self._API_BASE_URL + 'device/login'
result = self._download_xml(
url, None, data=data,
headers={'content-type': 'application/xml'})
self._device_token = xpath_text(result, 'token', fatal=True)
else:
self._device_token = xpath_text(result, 'deviceToken', fatal=True)
self._downloader.cache.store( self._downloader.cache.store(
'cbcwatch', 'device', { 'cbcwatch', self._cache_device_key(), {
'id': self._device_id, 'id': self._device_id,
'token': self._device_token, 'token': self._device_token,
}) })
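Because the device token is now tied to the signed-in account, the cache slot is derived from the e-mail address. A sketch of that key derivation; the address below is hypothetical.

import hashlib


def cache_device_key(email=None):
    # Anonymous sessions keep the old shared 'device' slot; logged-in ones
    # get a per-account slot keyed by a SHA-256 of the e-mail address.
    if not email:
        return 'device'
    return '%s_device' % hashlib.sha256(email.encode()).hexdigest()


print(cache_device_key())
print(cache_device_key('viewer@example.com'))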

View File

@ -32,7 +32,7 @@ class Channel9IE(InfoExtractor):
'upload_date': '20130828', 'upload_date': '20130828',
'session_code': 'KOS002', 'session_code': 'KOS002',
'session_room': 'Arena 1A', 'session_room': 'Arena 1A',
'session_speakers': ['Andrew Coates', 'Brady Gaster', 'Mads Kristensen', 'Ed Blankenship', 'Patrick Klug'], 'session_speakers': 'count:5',
}, },
}, { }, {
'url': 'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing', 'url': 'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing',
@ -64,15 +64,15 @@ class Channel9IE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
}, {
'url': 'https://channel9.msdn.com/Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b/RSS',
'info_dict': {
'id': 'Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b',
'title': 'Channel 9',
},
'playlist_mincount': 100,
}, { }, {
'url': 'https://channel9.msdn.com/Events/DEVintersection/DEVintersection-2016/RSS', 'url': 'https://channel9.msdn.com/Events/DEVintersection/DEVintersection-2016/RSS',
'info_dict': {
'id': 'Events/DEVintersection/DEVintersection-2016',
'title': 'DEVintersection 2016 Orlando Sessions',
},
'playlist_mincount': 14,
}, {
'url': 'https://channel9.msdn.com/Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b/RSS',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'https://channel9.msdn.com/Events/Speakers/scott-hanselman/RSS?UrlSafeName=scott-hanselman', 'url': 'https://channel9.msdn.com/Events/Speakers/scott-hanselman/RSS?UrlSafeName=scott-hanselman',
@ -112,11 +112,11 @@ class Channel9IE(InfoExtractor):
episode_data), content_path) episode_data), content_path)
content_id = episode_data['contentId'] content_id = episode_data['contentId']
is_session = '/Sessions(' in episode_data['api'] is_session = '/Sessions(' in episode_data['api']
content_url = 'https://channel9.msdn.com/odata' + episode_data['api'] content_url = 'https://channel9.msdn.com/odata' + episode_data['api'] + '?$select=Captions,CommentCount,MediaLengthInSeconds,PublishedDate,Rating,RatingCount,Title,VideoMP4High,VideoMP4Low,VideoMP4Medium,VideoPlayerPreviewImage,VideoWMV,VideoWMVHQ,Views,'
if is_session: if is_session:
content_url += '?$expand=Speakers' content_url += 'Code,Description,Room,Slides,Speakers,ZipFile&$expand=Speakers'
else: else:
content_url += '?$expand=Authors' content_url += 'Authors,Body&$expand=Authors'
content_data = self._download_json(content_url, content_id) content_data = self._download_json(content_url, content_id)
title = content_data['Title'] title = content_data['Title']
@ -210,7 +210,7 @@ class Channel9IE(InfoExtractor):
'id': content_id, 'id': content_id,
'title': title, 'title': title,
'description': clean_html(content_data.get('Description') or content_data.get('Body')), 'description': clean_html(content_data.get('Description') or content_data.get('Body')),
'thumbnail': content_data.get('Thumbnail') or content_data.get('VideoPlayerPreviewImage'), 'thumbnail': content_data.get('VideoPlayerPreviewImage'),
'duration': int_or_none(content_data.get('MediaLengthInSeconds')), 'duration': int_or_none(content_data.get('MediaLengthInSeconds')),
'timestamp': parse_iso8601(content_data.get('PublishedDate')), 'timestamp': parse_iso8601(content_data.get('PublishedDate')),
'avg_rating': int_or_none(content_data.get('Rating')), 'avg_rating': int_or_none(content_data.get('Rating')),

View File

@ -3,7 +3,11 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ExtractorError from ..utils import (
ExtractorError,
lowercase_escape,
url_or_none,
)
class ChaturbateIE(InfoExtractor): class ChaturbateIE(InfoExtractor):
@ -38,12 +42,31 @@ class ChaturbateIE(InfoExtractor):
'https://chaturbate.com/%s/' % video_id, video_id, 'https://chaturbate.com/%s/' % video_id, video_id,
headers=self.geo_verification_headers()) headers=self.geo_verification_headers())
m3u8_urls = [] found_m3u8_urls = []
for m in re.finditer( data = self._parse_json(
r'(["\'])(?P<url>http.+?\.m3u8.*?)\1', webpage): self._search_regex(
m3u8_fast_url, m3u8_no_fast_url = m.group('url'), m.group( r'initialRoomDossier\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
'url').replace('_fast', '') webpage, 'data', default='{}', group='value'),
video_id, transform_source=lowercase_escape, fatal=False)
if data:
m3u8_url = url_or_none(data.get('hls_source'))
if m3u8_url:
found_m3u8_urls.append(m3u8_url)
if not found_m3u8_urls:
for m in re.finditer(
r'(\\u002[27])(?P<url>http.+?\.m3u8.*?)\1', webpage):
found_m3u8_urls.append(lowercase_escape(m.group('url')))
if not found_m3u8_urls:
for m in re.finditer(
r'(["\'])(?P<url>http.+?\.m3u8.*?)\1', webpage):
found_m3u8_urls.append(m.group('url'))
m3u8_urls = []
for found_m3u8_url in found_m3u8_urls:
m3u8_fast_url, m3u8_no_fast_url = found_m3u8_url, found_m3u8_url.replace('_fast', '')
for m3u8_url in (m3u8_fast_url, m3u8_no_fast_url): for m3u8_url in (m3u8_fast_url, m3u8_no_fast_url):
if m3u8_url not in m3u8_urls: if m3u8_url not in m3u8_urls:
m3u8_urls.append(m3u8_url) m3u8_urls.append(m3u8_url)
@ -63,7 +86,12 @@ class ChaturbateIE(InfoExtractor):
formats = [] formats = []
for m3u8_url in m3u8_urls: for m3u8_url in m3u8_urls:
m3u8_id = 'fast' if '_fast' in m3u8_url else 'slow' for known_id in ('fast', 'slow'):
if '_%s' % known_id in m3u8_url:
m3u8_id = known_id
break
else:
m3u8_id = None
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4', m3u8_url, video_id, ext='mp4',
# ffmpeg skips segments for fast m3u8 # ffmpeg skips segments for fast m3u8

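The format id is now only labelled when the playlist URL really carries a _fast or _slow marker; the for/else makes the "no marker" case explicit. A standalone sketch of that check with made-up URLs:

def m3u8_variant_id(m3u8_url):
    for known_id in ('fast', 'slow'):
        if '_%s' % known_id in m3u8_url:
            m3u8_id = known_id
            break
    else:
        # no break: the URL carries neither marker
        m3u8_id = None
    return m3u8_id


print(m3u8_variant_id('https://example.org/live_fast/playlist.m3u8'))  # fast
print(m3u8_variant_id('https://example.org/live/playlist.m3u8'))       # None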
View File

@ -1,20 +1,24 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import re
from .common import InfoExtractor
class CloudflareStreamIE(InfoExtractor): class CloudflareStreamIE(InfoExtractor):
_DOMAIN_RE = r'(?:cloudflarestream\.com|(?:videodelivery|bytehighway)\.net)'
_EMBED_RE = r'embed\.%s/embed/[^/]+\.js\?.*?\bvideo=' % _DOMAIN_RE
_ID_RE = r'[\da-f]{32}|[\w-]+\.[\w-]+\.[\w-]+'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?: (?:
(?:watch\.)?(?:cloudflarestream\.com|videodelivery\.net)/| (?:watch\.)?%s/|
embed\.(?:cloudflarestream\.com|videodelivery\.net)/embed/[^/]+\.js\?.*?\bvideo= %s
) )
(?P<id>[\da-f]+) (?P<id>%s)
''' ''' % (_DOMAIN_RE, _EMBED_RE, _ID_RE)
_TESTS = [{ _TESTS = [{
'url': 'https://embed.cloudflarestream.com/embed/we4g.fla9.latest.js?video=31c9291ab41fac05471db4e73aa11717', 'url': 'https://embed.cloudflarestream.com/embed/we4g.fla9.latest.js?video=31c9291ab41fac05471db4e73aa11717',
'info_dict': { 'info_dict': {
@ -41,23 +45,28 @@ class CloudflareStreamIE(InfoExtractor):
return [ return [
mobj.group('url') mobj.group('url')
for mobj in re.finditer( for mobj in re.finditer(
r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//embed\.(?:cloudflarestream\.com|videodelivery\.net)/embed/[^/]+\.js\?.*?\bvideo=[\da-f]+?.*?)\1', r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//%s(?:%s).*?)\1' % (CloudflareStreamIE._EMBED_RE, CloudflareStreamIE._ID_RE),
webpage)] webpage)]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
domain = 'bytehighway.net' if 'bytehighway.net/' in url else 'videodelivery.net'
base_url = 'https://%s/%s/' % (domain, video_id)
if '.' in video_id:
video_id = self._parse_json(base64.urlsafe_b64decode(
video_id.split('.')[1]), video_id)['sub']
manifest_base_url = base_url + 'manifest/video.'
formats = self._extract_m3u8_formats( formats = self._extract_m3u8_formats(
'https://cloudflarestream.com/%s/manifest/video.m3u8' % video_id, manifest_base_url + 'm3u8', video_id, 'mp4',
video_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls', 'm3u8_native', m3u8_id='hls', fatal=False)
fatal=False)
formats.extend(self._extract_mpd_formats( formats.extend(self._extract_mpd_formats(
'https://cloudflarestream.com/%s/manifest/video.mpd' % video_id, manifest_base_url + 'mpd', video_id, mpd_id='dash', fatal=False))
video_id, mpd_id='dash', fatal=False))
self._sort_formats(formats) self._sort_formats(formats)
return { return {
'id': video_id, 'id': video_id,
'title': video_id, 'title': video_id,
'thumbnail': base_url + 'thumbnails/thumbnail.jpg',
'formats': formats, 'formats': formats,
} }
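Signed Cloudflare Stream embeds use a JWT-style token instead of the bare 32-hex video id; the real id sits in the token's 'sub' claim, which is what the base64 decode above recovers. A standalone sketch under that assumption, with a dummy token built on the spot (and the payload padding restored before decoding):

import base64
import json


def video_id_from_token(token):
    payload = token.split('.')[1]
    payload += '=' * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload).decode())['sub']


# Build a dummy token whose payload carries the video id from the test above.
payload = base64.urlsafe_b64encode(
    json.dumps({'sub': '31c9291ab41fac05471db4e73aa11717'}).encode()).decode().rstrip('=')
token = 'e30.%s.signature' % payload
print(video_id_from_token(token))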

View File

@ -1766,6 +1766,19 @@ class InfoExtractor(object):
# the same GROUP-ID # the same GROUP-ID
f['acodec'] = 'none' f['acodec'] = 'none'
formats.append(f) formats.append(f)
# for DailyMotion
progressive_uri = last_stream_inf.get('PROGRESSIVE-URI')
if progressive_uri:
http_f = f.copy()
del http_f['manifest_url']
http_f.update({
'format_id': f['format_id'].replace('hls-', 'http-'),
'protocol': 'http',
'url': progressive_uri,
})
formats.append(http_f)
last_stream_inf = {} last_stream_inf = {}
return formats return formats
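DailyMotion master playlists annotate each EXT-X-STREAM-INF entry with a PROGRESSIVE-URI attribute; the new code clones the HLS format into a plain-HTTP one pointing at that URL. A sketch of the cloning step with a hypothetical format dict:

def add_progressive_variant(hls_format, progressive_uri):
    # Copy the HLS format, drop the manifest reference and point the copy at
    # the direct MP4 URL advertised by PROGRESSIVE-URI.
    http_format = dict(hls_format)
    http_format.pop('manifest_url', None)
    http_format.update({
        'format_id': hls_format['format_id'].replace('hls-', 'http-'),
        'protocol': 'http',
        'url': progressive_uri,
    })
    return http_format


hls = {
    'format_id': 'hls-360',
    'protocol': 'm3u8_native',
    'url': 'https://example.org/video/360.m3u8',
    'manifest_url': 'https://example.org/video/master.m3u8',
}
print(add_progressive_variant(hls, 'https://example.org/video/360.mp4')['format_id'])  # http-360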
@ -2327,6 +2340,8 @@ class InfoExtractor(object):
if res is False: if res is False:
return [] return []
ism_doc, urlh = res ism_doc, urlh = res
if ism_doc is None:
return []
return self._parse_ism_formats(ism_doc, urlh.geturl(), ism_id) return self._parse_ism_formats(ism_doc, urlh.geturl(), ism_id)

View File

@ -4,7 +4,12 @@ from __future__ import unicode_literals
import re import re
from .theplatform import ThePlatformFeedIE from .theplatform import ThePlatformFeedIE
from ..utils import int_or_none from ..utils import (
dict_get,
ExtractorError,
float_or_none,
int_or_none,
)
class CorusIE(ThePlatformFeedIE): class CorusIE(ThePlatformFeedIE):
@ -12,24 +17,49 @@ class CorusIE(ThePlatformFeedIE):
https?:// https?://
(?:www\.)? (?:www\.)?
(?P<domain> (?P<domain>
(?:globaltv|etcanada)\.com| (?:
(?:hgtv|foodnetwork|slice|history|showcase|bigbrothercanada)\.ca globaltv|
etcanada|
seriesplus|
wnetwork|
ytv
)\.com|
(?:
hgtv|
foodnetwork|
slice|
history|
showcase|
bigbrothercanada|
abcspark|
disney(?:channel|lachaine)
)\.ca
)
/(?:[^/]+/)*
(?:
video\.html\?.*?\bv=|
videos?/(?:[^/]+/)*(?:[a-z0-9-]+-)?
)
(?P<id>
[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}|
(?:[A-Z]{4})?\d{12,20}
) )
/(?:video/(?:[^/]+/)?|(?:[^/]+/)+(?:videos/[a-z0-9-]+-|video\.html\?.*?\bv=))
(?P<id>\d+)
''' '''
_TESTS = [{ _TESTS = [{
'url': 'http://www.hgtv.ca/shows/bryan-inc/videos/movie-night-popcorn-with-bryan-870923331648/', 'url': 'http://www.hgtv.ca/shows/bryan-inc/videos/movie-night-popcorn-with-bryan-870923331648/',
'md5': '05dcbca777bf1e58c2acbb57168ad3a6',
'info_dict': { 'info_dict': {
'id': '870923331648', 'id': '870923331648',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Movie Night Popcorn with Bryan', 'title': 'Movie Night Popcorn with Bryan',
'description': 'Bryan whips up homemade popcorn, the old fashion way for Jojo and Lincoln.', 'description': 'Bryan whips up homemade popcorn, the old fashion way for Jojo and Lincoln.',
'uploader': 'SHWM-NEW',
'upload_date': '20170206', 'upload_date': '20170206',
'timestamp': 1486392197, 'timestamp': 1486392197,
}, },
'params': {
'format': 'bestvideo',
'skip_download': True,
},
'expected_warnings': ['Failed to parse JSON'],
}, { }, {
'url': 'http://www.foodnetwork.ca/shows/chopped/video/episode/chocolate-obsession/video.html?v=872683587753', 'url': 'http://www.foodnetwork.ca/shows/chopped/video/episode/chocolate-obsession/video.html?v=872683587753',
'only_matching': True, 'only_matching': True,
@ -48,58 +78,83 @@ class CorusIE(ThePlatformFeedIE):
}, { }, {
'url': 'https://www.bigbrothercanada.ca/video/big-brother-canada-704/1457812035894/', 'url': 'https://www.bigbrothercanada.ca/video/big-brother-canada-704/1457812035894/',
'only_matching': True 'only_matching': True
}, {
'url': 'https://www.seriesplus.com/emissions/dre-mary-mort-sur-ordonnance/videos/deux-coeurs-battant/SERP0055626330000200/',
'only_matching': True
}, {
'url': 'https://www.disneychannel.ca/shows/gabby-duran-the-unsittables/video/crybaby-duran-clip/2f557eec-0588-11ea-ae2b-e2c6776b770e/',
'only_matching': True
}] }]
_GEO_BYPASS = False
_TP_FEEDS = { _SITE_MAP = {
'globaltv': { 'globaltv': 'series',
'feed_id': 'ChQqrem0lNUp', 'etcanada': 'series',
'account_id': 2269680845, 'foodnetwork': 'food',
}, 'bigbrothercanada': 'series',
'etcanada': { 'disneychannel': 'disneyen',
'feed_id': 'ChQqrem0lNUp', 'disneylachaine': 'disneyfr',
'account_id': 2269680845,
},
'hgtv': {
'feed_id': 'L0BMHXi2no43',
'account_id': 2414428465,
},
'foodnetwork': {
'feed_id': 'ukK8o58zbRmJ',
'account_id': 2414429569,
},
'slice': {
'feed_id': '5tUJLgV2YNJ5',
'account_id': 2414427935,
},
'history': {
'feed_id': 'tQFx_TyyEq4J',
'account_id': 2369613659,
},
'showcase': {
'feed_id': '9H6qyshBZU3E',
'account_id': 2414426607,
},
'bigbrothercanada': {
'feed_id': 'ChQqrem0lNUp',
'account_id': 2269680845,
},
} }
def _real_extract(self, url): def _real_extract(self, url):
domain, video_id = re.match(self._VALID_URL, url).groups() domain, video_id = re.match(self._VALID_URL, url).groups()
feed_info = self._TP_FEEDS[domain.split('.')[0]] site = domain.split('.')[0]
return self._extract_feed_info('dtjsEC', feed_info['feed_id'], 'byId=' + video_id, video_id, lambda e: { path = self._SITE_MAP.get(site, site)
'episode_number': int_or_none(e.get('pl1$episode')), if path != 'series':
'season_number': int_or_none(e.get('pl1$season')), path = 'migration/' + path
'series': e.get('pl1$show'), video = self._download_json(
}, { 'https://globalcontent.corusappservices.com/templates/%s/playlist/' % path,
'HLS': { video_id, query={'byId': video_id},
'manifest': 'm3u', headers={'Accept': 'application/json'})[0]
}, title = video['title']
'DesktopHLS Default': {
'manifest': 'm3u', formats = []
}, for source in video.get('sources', []):
'MP4 MBR': { smil_url = source.get('file')
'manifest': 'm3u', if not smil_url:
}, continue
}, feed_info['account_id']) source_type = source.get('type')
note = 'Downloading%s smil file' % (' ' + source_type if source_type else '')
resp = self._download_webpage(
smil_url, video_id, note, fatal=False,
headers=self.geo_verification_headers())
if not resp:
continue
error = self._parse_json(resp, video_id, fatal=False)
if error:
if error.get('exception') == 'GeoLocationBlocked':
self.raise_geo_restricted(countries=['CA'])
raise ExtractorError(error['description'])
smil = self._parse_xml(resp, video_id, fatal=False)
if smil is None:
continue
namespace = self._parse_smil_namespace(smil)
formats.extend(self._parse_smil_formats(
smil, smil_url, video_id, namespace))
if not formats and video.get('drm'):
raise ExtractorError('This video is DRM protected.', expected=True)
self._sort_formats(formats)
subtitles = {}
for track in video.get('tracks', []):
track_url = track.get('file')
if not track_url:
continue
lang = 'fr' if site in ('disneylachaine', 'seriesplus') else 'en'
subtitles.setdefault(lang, []).append({'url': track_url})
metadata = video.get('metadata') or {}
get_number = lambda x: int_or_none(video.get('pl1$' + x) or metadata.get(x + 'Number'))
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': dict_get(video, ('defaultThumbnailUrl', 'thumbnail', 'image')),
'description': video.get('description'),
'timestamp': int_or_none(video.get('availableDate'), 1000),
'subtitles': subtitles,
'duration': float_or_none(metadata.get('duration')),
'series': dict_get(video, ('show', 'pl1$show')),
'season_number': get_number('season'),
'episode_number': get_number('episode'),
}
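
The rewritten CorusIE above drops the per-site ThePlatform feed table and instead asks Corus' own content service for the playlist entry by id. A minimal standalone sketch of that request, assuming Python 3 and the third-party requests library (the extractor itself uses _download_json); fetch_corus_video and site_path are illustrative names, not part of the extractor:

import requests

def fetch_corus_video(site_path, video_id):
    # site_path is 'series' for most sites, or 'migration/<name>' (e.g. 'migration/food')
    url = 'https://globalcontent.corusappservices.com/templates/%s/playlist/' % site_path
    resp = requests.get(url, params={'byId': video_id},
                        headers={'Accept': 'application/json'})
    resp.raise_for_status()
    video = resp.json()[0]  # the response is a JSON array; the first entry is the video
    return video['title'], [s.get('file') for s in video.get('sources', [])]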


@ -1,50 +1,93 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import base64
import functools import functools
import hashlib
import itertools
import json import json
import random
import re import re
import string
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_struct_pack from ..compat import compat_HTTPError
from ..utils import ( from ..utils import (
determine_ext, age_restricted,
error_to_compat_str, clean_html,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
mimetype2ext,
OnDemandPagedList, OnDemandPagedList,
parse_iso8601,
sanitized_Request,
str_to_int,
try_get, try_get,
unescapeHTML, unescapeHTML,
update_url_query,
url_or_none,
urlencode_postdata, urlencode_postdata,
) )
class DailymotionBaseInfoExtractor(InfoExtractor): class DailymotionBaseInfoExtractor(InfoExtractor):
_FAMILY_FILTER = None
_HEADERS = {
'Content-Type': 'application/json',
'Origin': 'https://www.dailymotion.com',
}
_NETRC_MACHINE = 'dailymotion'
def _get_dailymotion_cookies(self):
return self._get_cookies('https://www.dailymotion.com/')
@staticmethod @staticmethod
def _build_request(url): def _get_cookie_value(cookies, name):
"""Build a request with the family filter disabled""" cookie = cookies.get('name')
request = sanitized_Request(url) if cookie:
request.add_header('Cookie', 'family_filter=off; ff=off') return cookie.value
return request
def _download_webpage_handle_no_ff(self, url, *args, **kwargs): def _set_dailymotion_cookie(self, name, value):
request = self._build_request(url) self._set_cookie('www.dailymotion.com', name, value)
return self._download_webpage_handle(request, *args, **kwargs)
def _download_webpage_no_ff(self, url, *args, **kwargs): def _real_initialize(self):
request = self._build_request(url) cookies = self._get_dailymotion_cookies()
return self._download_webpage(request, *args, **kwargs) ff = self._get_cookie_value(cookies, 'ff')
self._FAMILY_FILTER = ff == 'on' if ff else age_restricted(18, self._downloader.params.get('age_limit'))
self._set_dailymotion_cookie('ff', 'on' if self._FAMILY_FILTER else 'off')
def _call_api(self, object_type, xid, object_fields, note, filter_extra=None):
if not self._HEADERS.get('Authorization'):
cookies = self._get_dailymotion_cookies()
token = self._get_cookie_value(cookies, 'access_token') or self._get_cookie_value(cookies, 'client_token')
if not token:
data = {
'client_id': 'f1a362d288c1b98099c7',
'client_secret': 'eea605b96e01c796ff369935357eca920c5da4c5',
}
username, password = self._get_login_info()
if username:
data.update({
'grant_type': 'password',
'password': password,
'username': username,
})
else:
data['grant_type'] = 'client_credentials'
try:
token = self._download_json(
'https://graphql.api.dailymotion.com/oauth/token',
None, 'Downloading Access Token',
data=urlencode_postdata(data))['access_token']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
raise ExtractorError(self._parse_json(
e.cause.read().decode(), xid)['error_description'], expected=True)
raise
self._set_dailymotion_cookie('access_token' if username else 'client_token', token)
self._HEADERS['Authorization'] = 'Bearer ' + token
resp = self._download_json(
'https://graphql.api.dailymotion.com/', xid, note, data=json.dumps({
'query': '''{
%s(xid: "%s"%s) {
%s
}
}''' % (object_type, xid, ', ' + filter_extra if filter_extra else '', object_fields),
}).encode(), headers=self._HEADERS)
obj = resp['data'][object_type]
if not obj:
raise ExtractorError(resp['errors'][0]['message'], expected=True)
return obj
class DailymotionIE(DailymotionBaseInfoExtractor): class DailymotionIE(DailymotionBaseInfoExtractor):
@ -54,18 +97,9 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
(?:(?:www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(?:(?:embed|swf|\#)/)?video|swf)| (?:(?:www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(?:(?:embed|swf|\#)/)?video|swf)|
(?:www\.)?lequipe\.fr/video (?:www\.)?lequipe\.fr/video
) )
/(?P<id>[^/?_]+) /(?P<id>[^/?_]+)(?:.+?\bplaylist=(?P<playlist_id>x[0-9a-z]+))?
''' '''
IE_NAME = 'dailymotion' IE_NAME = 'dailymotion'
_FORMATS = [
('stream_h264_ld_url', 'ld'),
('stream_h264_url', 'standard'),
('stream_h264_hq_url', 'hq'),
('stream_h264_hd_url', 'hd'),
('stream_h264_hd1080_url', 'hd180'),
]
_TESTS = [{ _TESTS = [{
'url': 'http://www.dailymotion.com/video/x5kesuj_office-christmas-party-review-jason-bateman-olivia-munn-t-j-miller_news', 'url': 'http://www.dailymotion.com/video/x5kesuj_office-christmas-party-review-jason-bateman-olivia-munn-t-j-miller_news',
'md5': '074b95bdee76b9e3654137aee9c79dfe', 'md5': '074b95bdee76b9e3654137aee9c79dfe',
@ -74,7 +108,6 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Office Christmas Party Review Jason Bateman, Olivia Munn, T.J. Miller', 'title': 'Office Christmas Party Review Jason Bateman, Olivia Munn, T.J. Miller',
'description': 'Office Christmas Party Review - Jason Bateman, Olivia Munn, T.J. Miller', 'description': 'Office Christmas Party Review - Jason Bateman, Olivia Munn, T.J. Miller',
'thumbnail': r're:^https?:.*\.(?:jpg|png)$',
'duration': 187, 'duration': 187,
'timestamp': 1493651285, 'timestamp': 1493651285,
'upload_date': '20170501', 'upload_date': '20170501',
@ -146,7 +179,16 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
}, { }, {
'url': 'https://www.lequipe.fr/video/k7MtHciueyTcrFtFKA2', 'url': 'https://www.lequipe.fr/video/k7MtHciueyTcrFtFKA2',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.dailymotion.com/video/x3z49k?playlist=xv4bw',
'only_matching': True,
}] }]
_GEO_BYPASS = False
_COMMON_MEDIA_FIELDS = '''description
geoblockedCountries {
allowed
}
xid'''
@staticmethod @staticmethod
def _extract_urls(webpage): def _extract_urls(webpage):
@ -162,264 +204,140 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
return urls return urls
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id, playlist_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage_no_ff( if playlist_id:
'https://www.dailymotion.com/video/%s' % video_id, video_id) if not self._downloader.params.get('noplaylist'):
self.to_screen('Downloading playlist %s - add --no-playlist to just download video' % playlist_id)
return self.url_result(
'http://www.dailymotion.com/playlist/' + playlist_id,
'DailymotionPlaylist', playlist_id)
self.to_screen('Downloading just video %s because of --no-playlist' % video_id)
age_limit = self._rta_search(webpage) password = self._downloader.params.get('videopassword')
media = self._call_api(
description = self._og_search_description( 'media', video_id, '''... on Video {
webpage, default=None) or self._html_search_meta( %s
'description', webpage, 'description') stats {
likes {
view_count_str = self._search_regex( total
(r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:([\s\d,.]+)"',
r'video_views_count[^>]+>\s+([\s\d\,.]+)'),
webpage, 'view count', default=None)
if view_count_str:
view_count_str = re.sub(r'\s', '', view_count_str)
view_count = str_to_int(view_count_str)
comment_count = int_or_none(self._search_regex(
r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserComments:(\d+)"',
webpage, 'comment count', default=None))
player_v5 = self._search_regex(
[r'buildPlayer\(({.+?})\);\n', # See https://github.com/ytdl-org/youtube-dl/issues/7826
r'playerV5\s*=\s*dmp\.create\([^,]+?,\s*({.+?})\);',
r'buildPlayer\(({.+?})\);',
r'var\s+config\s*=\s*({.+?});',
# New layout regex (see https://github.com/ytdl-org/youtube-dl/issues/13580)
r'__PLAYER_CONFIG__\s*=\s*({.+?});'],
webpage, 'player v5', default=None)
if player_v5:
player = self._parse_json(player_v5, video_id, fatal=False) or {}
metadata = try_get(player, lambda x: x['metadata'], dict)
if not metadata:
metadata_url = url_or_none(try_get(
player, lambda x: x['context']['metadata_template_url1']))
if metadata_url:
metadata_url = metadata_url.replace(':videoId', video_id)
else:
metadata_url = update_url_query(
'https://www.dailymotion.com/player/metadata/video/%s'
% video_id, {
'embedder': url,
'integration': 'inline',
'GK_PV5_NEON': '1',
})
metadata = self._download_json(
metadata_url, video_id, 'Downloading metadata JSON')
if try_get(metadata, lambda x: x['error']['type']) == 'password_protected':
password = self._downloader.params.get('videopassword')
if password:
r = int(metadata['id'][1:], 36)
us64e = lambda x: base64.urlsafe_b64encode(x).decode().strip('=')
t = ''.join(random.choice(string.ascii_letters) for i in range(10))
n = us64e(compat_struct_pack('I', r))
i = us64e(hashlib.md5(('%s%d%s' % (password, r, t)).encode()).digest())
metadata = self._download_json(
'http://www.dailymotion.com/player/metadata/video/p' + i + t + n, video_id)
self._check_error(metadata)
formats = []
for quality, media_list in metadata['qualities'].items():
for media in media_list:
media_url = media.get('url')
if not media_url:
continue
type_ = media.get('type')
if type_ == 'application/vnd.lumberjack.manifest':
continue
ext = mimetype2ext(type_) or determine_ext(media_url)
if ext == 'm3u8':
m3u8_formats = self._extract_m3u8_formats(
media_url, video_id, 'mp4', preference=-1,
m3u8_id='hls', fatal=False)
for f in m3u8_formats:
f['url'] = f['url'].split('#')[0]
formats.append(f)
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
media_url, video_id, preference=-1, f4m_id='hds', fatal=False))
else:
f = {
'url': media_url,
'format_id': 'http-%s' % quality,
'ext': ext,
}
m = re.search(r'H264-(?P<width>\d+)x(?P<height>\d+)', media_url)
if m:
f.update({
'width': int(m.group('width')),
'height': int(m.group('height')),
})
formats.append(f)
self._sort_formats(formats)
title = metadata['title']
duration = int_or_none(metadata.get('duration'))
timestamp = int_or_none(metadata.get('created_time'))
thumbnail = metadata.get('poster_url')
uploader = metadata.get('owner', {}).get('screenname')
uploader_id = metadata.get('owner', {}).get('id')
subtitles = {}
subtitles_data = metadata.get('subtitles', {}).get('data', {})
if subtitles_data and isinstance(subtitles_data, dict):
for subtitle_lang, subtitle in subtitles_data.items():
subtitles[subtitle_lang] = [{
'ext': determine_ext(subtitle_url),
'url': subtitle_url,
} for subtitle_url in subtitle.get('urls', [])]
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'uploader': uploader,
'uploader_id': uploader_id,
'age_limit': age_limit,
'view_count': view_count,
'comment_count': comment_count,
'formats': formats,
'subtitles': subtitles,
}
# vevo embed
vevo_id = self._search_regex(
r'<link rel="video_src" href="[^"]*?vevo\.com[^"]*?video=(?P<id>[\w]*)',
webpage, 'vevo embed', default=None)
if vevo_id:
return self.url_result('vevo:%s' % vevo_id, 'Vevo')
# fallback old player
embed_page = self._download_webpage_no_ff(
'https://www.dailymotion.com/embed/video/%s' % video_id,
video_id, 'Downloading embed page')
timestamp = parse_iso8601(self._html_search_meta(
'video:release_date', webpage, 'upload date'))
info = self._parse_json(
self._search_regex(
r'var info = ({.*?}),$', embed_page,
'video info', flags=re.MULTILINE),
video_id)
self._check_error(info)
formats = []
for (key, format_id) in self._FORMATS:
video_url = info.get(key)
if video_url is not None:
m_size = re.search(r'H264-(\d+)x(\d+)', video_url)
if m_size is not None:
width, height = map(int_or_none, (m_size.group(1), m_size.group(2)))
else:
width, height = None, None
formats.append({
'url': video_url,
'ext': 'mp4',
'format_id': format_id,
'width': width,
'height': height,
})
self._sort_formats(formats)
# subtitles
video_subtitles = self.extract_subtitles(video_id, webpage)
title = self._og_search_title(webpage, default=None)
if title is None:
title = self._html_search_regex(
r'(?s)<span\s+id="video_title"[^>]*>(.*?)</span>', webpage,
'title')
return {
'id': video_id,
'formats': formats,
'uploader': info['owner.screenname'],
'timestamp': timestamp,
'title': title,
'description': description,
'subtitles': video_subtitles,
'thumbnail': info['thumbnail_url'],
'age_limit': age_limit,
'view_count': view_count,
'duration': info['duration']
} }
views {
total
}
}
}
... on Live {
%s
audienceCount
isOnAir
}''' % (self._COMMON_MEDIA_FIELDS, self._COMMON_MEDIA_FIELDS), 'Downloading media JSON metadata',
'password: "%s"' % self._downloader.params.get('videopassword') if password else None)
xid = media['xid']
def _check_error(self, info): metadata = self._download_json(
error = info.get('error') 'https://www.dailymotion.com/player/metadata/video/' + xid,
xid, 'Downloading metadata JSON',
query={'app': 'com.dailymotion.neon'})
error = metadata.get('error')
if error: if error:
title = error.get('title') or error['message'] title = error.get('title') or error['raw_message']
# See https://developer.dailymotion.com/api#access-error # See https://developer.dailymotion.com/api#access-error
if error.get('code') == 'DM007': if error.get('code') == 'DM007':
self.raise_geo_restricted(msg=title) allowed_countries = try_get(media, lambda x: x['geoblockedCountries']['allowed'], list)
self.raise_geo_restricted(msg=title, countries=allowed_countries)
raise ExtractorError( raise ExtractorError(
'%s said: %s' % (self.IE_NAME, title), expected=True) '%s said: %s' % (self.IE_NAME, title), expected=True)
def _get_subtitles(self, video_id, webpage): title = metadata['title']
try: is_live = media.get('isOnAir')
sub_list = self._download_webpage( formats = []
'https://api.dailymotion.com/video/%s/subtitles?fields=id,language,url' % video_id, for quality, media_list in metadata['qualities'].items():
video_id, note=False) for m in media_list:
except ExtractorError as err: media_url = m.get('url')
self._downloader.report_warning('unable to download video subtitles: %s' % error_to_compat_str(err)) media_type = m.get('type')
return {} if not media_url or media_type == 'application/vnd.lumberjack.manifest':
info = json.loads(sub_list) continue
if (info['total'] > 0): if media_type == 'application/x-mpegURL':
sub_lang_list = dict((l['language'], [{'url': l['url'], 'ext': 'srt'}]) for l in info['list']) formats.extend(self._extract_m3u8_formats(
return sub_lang_list media_url, video_id, 'mp4',
self._downloader.report_warning('video doesn\'t have subtitles') 'm3u8' if is_live else 'm3u8_native',
return {} m3u8_id='hls', fatal=False))
else:
f = {
'url': media_url,
'format_id': 'http-' + quality,
}
m = re.search(r'/H264-(\d+)x(\d+)(?:-(60)/)?', media_url)
if m:
width, height, fps = map(int_or_none, m.groups())
f.update({
'fps': fps,
'height': height,
'width': width,
})
formats.append(f)
for f in formats:
f['url'] = f['url'].split('#')[0]
if not f.get('fps') and f['format_id'].endswith('@60'):
f['fps'] = 60
self._sort_formats(formats)
subtitles = {}
subtitles_data = try_get(metadata, lambda x: x['subtitles']['data'], dict) or {}
for subtitle_lang, subtitle in subtitles_data.items():
subtitles[subtitle_lang] = [{
'url': subtitle_url,
} for subtitle_url in subtitle.get('urls', [])]
thumbnails = []
for height, poster_url in metadata.get('posters', {}).items():
thumbnails.append({
'height': int_or_none(height),
'id': height,
'url': poster_url,
})
owner = metadata.get('owner') or {}
stats = media.get('stats') or {}
get_count = lambda x: int_or_none(try_get(stats, lambda y: y[x + 's']['total']))
return {
'id': video_id,
'title': self._live_title(title) if is_live else title,
'description': clean_html(media.get('description')),
'thumbnails': thumbnails,
'duration': int_or_none(metadata.get('duration')) or None,
'timestamp': int_or_none(metadata.get('created_time')),
'uploader': owner.get('screenname'),
'uploader_id': owner.get('id') or metadata.get('screenname'),
'age_limit': 18 if metadata.get('explicit') else 0,
'tags': metadata.get('tags'),
'view_count': get_count('view') or int_or_none(media.get('audienceCount')),
'like_count': get_count('like'),
'formats': formats,
'subtitles': subtitles,
'is_live': is_live,
}
class DailymotionPlaylistIE(DailymotionBaseInfoExtractor): class DailymotionPlaylistBaseIE(DailymotionBaseInfoExtractor):
IE_NAME = 'dailymotion:playlist'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>x[0-9a-z]+)'
_TESTS = [{
'url': 'http://www.dailymotion.com/playlist/xv4bw_nqtv_sport/1#video=xl8v3q',
'info_dict': {
'title': 'SPORT',
'id': 'xv4bw',
},
'playlist_mincount': 20,
}]
_PAGE_SIZE = 100 _PAGE_SIZE = 100
def _fetch_page(self, playlist_id, authorizaion, page): def _fetch_page(self, playlist_id, page):
page += 1 page += 1
videos = self._download_json( videos = self._call_api(
'https://graphql.api.dailymotion.com', self._OBJECT_TYPE, playlist_id,
playlist_id, 'Downloading page %d' % page, '''videos(allowExplicit: %s, first: %d, page: %d) {
data=json.dumps({
'query': '''{
collection(xid: "%s") {
videos(first: %d, page: %d) {
pageInfo {
hasNextPage
nextPage
}
edges { edges {
node { node {
xid xid
url url
} }
} }
} }''' % ('false' if self._FAMILY_FILTER else 'true', self._PAGE_SIZE, page),
} 'Downloading page %d' % page)['videos']
}''' % (playlist_id, self._PAGE_SIZE, page)
}).encode(), headers={
'Authorization': authorizaion,
'Origin': 'https://www.dailymotion.com',
})['data']['collection']['videos']
for edge in videos['edges']: for edge in videos['edges']:
node = edge['node'] node = edge['node']
yield self.url_result( yield self.url_result(
@ -427,86 +345,49 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
playlist_id = self._match_id(url) playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
api = self._parse_json(self._search_regex(
r'__PLAYER_CONFIG__\s*=\s*({.+?});',
webpage, 'player config'), playlist_id)['context']['api']
auth = self._download_json(
api.get('auth_url', 'https://graphql.api.dailymotion.com/oauth/token'),
playlist_id, data=urlencode_postdata({
'client_id': api.get('client_id', 'f1a362d288c1b98099c7'),
'client_secret': api.get('client_secret', 'eea605b96e01c796ff369935357eca920c5da4c5'),
'grant_type': 'client_credentials',
}))
authorizaion = '%s %s' % (auth.get('token_type', 'Bearer'), auth['access_token'])
entries = OnDemandPagedList(functools.partial( entries = OnDemandPagedList(functools.partial(
self._fetch_page, playlist_id, authorizaion), self._PAGE_SIZE) self._fetch_page, playlist_id), self._PAGE_SIZE)
return self.playlist_result( return self.playlist_result(
entries, playlist_id, entries, playlist_id)
self._og_search_title(webpage))
class DailymotionUserIE(DailymotionBaseInfoExtractor): class DailymotionPlaylistIE(DailymotionPlaylistBaseIE):
IE_NAME = 'dailymotion:playlist'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>x[0-9a-z]+)'
_TESTS = [{
'url': 'http://www.dailymotion.com/playlist/xv4bw_nqtv_sport/1#video=xl8v3q',
'info_dict': {
'id': 'xv4bw',
},
'playlist_mincount': 20,
}]
_OBJECT_TYPE = 'collection'
class DailymotionUserIE(DailymotionPlaylistBaseIE):
IE_NAME = 'dailymotion:user' IE_NAME = 'dailymotion:user'
_VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|swf|#|video|playlist)/)(?:(?:old/)?user/)?(?P<user>[^/]+)' _VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|swf|#|video|playlist)/)(?:(?:old/)?user/)?(?P<id>[^/]+)'
_MORE_PAGES_INDICATOR = r'(?s)<div class="pages[^"]*">.*?<a\s+class="[^"]*?icon-arrow_right[^"]*?"'
_PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
_TESTS = [{ _TESTS = [{
'url': 'https://www.dailymotion.com/user/nqtv', 'url': 'https://www.dailymotion.com/user/nqtv',
'info_dict': { 'info_dict': {
'id': 'nqtv', 'id': 'nqtv',
'title': 'Rémi Gaillard',
}, },
'playlist_mincount': 100, 'playlist_mincount': 152,
}, { }, {
'url': 'http://www.dailymotion.com/user/UnderProject', 'url': 'http://www.dailymotion.com/user/UnderProject',
'info_dict': { 'info_dict': {
'id': 'UnderProject', 'id': 'UnderProject',
'title': 'UnderProject',
}, },
'playlist_mincount': 1800, 'playlist_mincount': 1000,
'expected_warnings': [
'Stopped at duplicated page',
],
'skip': 'Takes too long time', 'skip': 'Takes too long time',
}, {
'url': 'https://www.dailymotion.com/user/nqtv',
'info_dict': {
'id': 'nqtv',
},
'playlist_mincount': 148,
'params': {
'age_limit': 0,
},
}] }]
_OBJECT_TYPE = 'channel'
def _extract_entries(self, id):
video_ids = set()
processed_urls = set()
for pagenum in itertools.count(1):
page_url = self._PAGE_TEMPLATE % (id, pagenum)
webpage, urlh = self._download_webpage_handle_no_ff(
page_url, id, 'Downloading page %s' % pagenum)
if urlh.geturl() in processed_urls:
self.report_warning('Stopped at duplicated page %s, which is the same as %s' % (
page_url, urlh.geturl()), id)
break
processed_urls.add(urlh.geturl())
for video_id in re.findall(r'data-xid="(.+?)"', webpage):
if video_id not in video_ids:
yield self.url_result(
'http://www.dailymotion.com/video/%s' % video_id,
DailymotionIE.ie_key(), video_id)
video_ids.add(video_id)
if re.search(self._MORE_PAGES_INDICATOR, webpage) is None:
break
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user = mobj.group('user')
webpage = self._download_webpage(
'https://www.dailymotion.com/user/%s' % user, user)
full_user = unescapeHTML(self._html_search_regex(
r'<a class="nav-image" title="([^"]+)" href="/%s">' % re.escape(user),
webpage, 'user'))
return {
'_type': 'playlist',
'id': user,
'title': full_user,
'entries': self._extract_entries(user),
}
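
The Dailymotion rewrite above moves everything to the GraphQL API: _call_api() first fetches a bearer token from the oauth endpoint (client credentials unless a login is configured) and then posts a small GraphQL query for the media object, while the player metadata endpoint still supplies qualities and subtitles. A condensed sketch of the token-plus-query flow, assuming Python 3 and requests; the client id/secret and the queried fields are the public values visible in the diff, and the xid is the one from the test case:

import json
import requests

token = requests.post(
    'https://graphql.api.dailymotion.com/oauth/token',
    data={
        'client_id': 'f1a362d288c1b98099c7',
        'client_secret': 'eea605b96e01c796ff369935357eca920c5da4c5',
        'grant_type': 'client_credentials',
    }).json()['access_token']

query = '{ media(xid: "x5kesuj") { ... on Video { xid description } } }'
media = requests.post(
    'https://graphql.api.dailymotion.com/',
    data=json.dumps({'query': query}).encode(),
    headers={
        'Authorization': 'Bearer ' + token,
        'Content-Type': 'application/json',
        'Origin': 'https://www.dailymotion.com',
    }).json()['data']['media']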


@ -16,10 +16,11 @@ class DctpTvIE(InfoExtractor):
_TESTS = [{ _TESTS = [{
# 4x3 # 4x3
'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/', 'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/',
'md5': '3ffbd1556c3fe210724d7088fad723e3',
'info_dict': { 'info_dict': {
'id': '95eaa4f33dad413aa17b4ee613cccc6c', 'id': '95eaa4f33dad413aa17b4ee613cccc6c',
'display_id': 'videoinstallation-fuer-eine-kaufhausfassade', 'display_id': 'videoinstallation-fuer-eine-kaufhausfassade',
'ext': 'flv', 'ext': 'm4v',
'title': 'Videoinstallation für eine Kaufhausfassade', 'title': 'Videoinstallation für eine Kaufhausfassade',
'description': 'Kurzfilm', 'description': 'Kurzfilm',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
@ -27,10 +28,6 @@ class DctpTvIE(InfoExtractor):
'timestamp': 1302172322, 'timestamp': 1302172322,
'upload_date': '20110407', 'upload_date': '20110407',
}, },
'params': {
# rtmp download
'skip_download': True,
},
}, { }, {
# 16x9 # 16x9
'url': 'http://www.dctp.tv/filme/sind-youtuber-die-besseren-lehrer/', 'url': 'http://www.dctp.tv/filme/sind-youtuber-die-besseren-lehrer/',
@ -59,33 +56,26 @@ class DctpTvIE(InfoExtractor):
uuid = media['uuid'] uuid = media['uuid']
title = media['title'] title = media['title']
ratio = '16x9' if media.get('is_wide') else '4x3' is_wide = media.get('is_wide')
play_path = 'mp4:%s_dctp_0500_%s.m4v' % (uuid, ratio) formats = []
servers = self._download_json( def add_formats(suffix):
'http://www.dctp.tv/streaming_servers/', display_id, templ = 'https://%%s/%s_dctp_%s.m4v' % (uuid, suffix)
note='Downloading server list JSON', fatal=False) formats.extend([{
'format_id': 'hls-' + suffix,
'url': templ % 'cdn-segments.dctp.tv' + '/playlist.m3u8',
'protocol': 'm3u8_native',
}, {
'format_id': 's3-' + suffix,
'url': templ % 'completed-media.s3.amazonaws.com',
}, {
'format_id': 'http-' + suffix,
'url': templ % 'cdn-media.dctp.tv',
}])
if servers: add_formats('0500_' + ('16x9' if is_wide else '4x3'))
endpoint = next( if is_wide:
server['endpoint'] add_formats('720p')
for server in servers
if url_or_none(server.get('endpoint'))
and 'cloudfront' in server['endpoint'])
else:
endpoint = 'rtmpe://s2pqqn4u96e4j8.cloudfront.net/cfx/st/'
app = self._search_regex(
r'^rtmpe?://[^/]+/(?P<app>.*)$', endpoint, 'app')
formats = [{
'url': endpoint,
'app': app,
'play_path': play_path,
'page_url': url,
'player_url': 'http://svm-prod-dctptv-static.s3.amazonaws.com/dctptv-relaunch2012-110.swf',
'ext': 'flv',
}]
thumbnails = [] thumbnails = []
images = media.get('images') images = media.get('images')
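
With the RTMP code gone, the updated DctpTvIE derives every format URL from one filename template per quality. A quick sketch of the URLs add_formats() produces for a given uuid (dctp_urls is an illustrative helper, not part of the extractor):

def dctp_urls(uuid, is_wide):
    suffix = '0500_' + ('16x9' if is_wide else '4x3')
    templ = 'https://%%s/%s_dctp_%s.m4v' % (uuid, suffix)
    return {
        'hls': templ % 'cdn-segments.dctp.tv' + '/playlist.m3u8',
        's3': templ % 'completed-media.s3.amazonaws.com',
        'http': templ % 'cdn-media.dctp.tv',
    }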


@ -13,8 +13,8 @@ from ..compat import compat_HTTPError
class DiscoveryIE(DiscoveryGoBaseIE): class DiscoveryIE(DiscoveryGoBaseIE):
_VALID_URL = r'''(?x)https?:// _VALID_URL = r'''(?x)https?://
(?P<site> (?P<site>
(?:(?:www|go)\.)?discovery| go\.discovery|
(?:www\.)? www\.
(?: (?:
investigationdiscovery| investigationdiscovery|
discoverylife| discoverylife|
@ -22,8 +22,7 @@ class DiscoveryIE(DiscoveryGoBaseIE):
ahctv| ahctv|
destinationamerica| destinationamerica|
sciencechannel| sciencechannel|
tlc| tlc
velocity
)| )|
watch\. watch\.
(?: (?:
@ -83,7 +82,7 @@ class DiscoveryIE(DiscoveryGoBaseIE):
'authRel': 'authorization', 'authRel': 'authorization',
'client_id': '3020a40c2356a645b4b4', 'client_id': '3020a40c2356a645b4b4',
'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]), 'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
'redirectUri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html?https://www.%s.com' % site, 'redirectUri': 'https://www.discovery.com/',
})['access_token'] })['access_token']
headers = self.geo_verification_headers() headers = self.geo_verification_headers()


@ -4,7 +4,6 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
encode_base_n, encode_base_n,
ExtractorError, ExtractorError,
@ -55,7 +54,7 @@ class EpornerIE(InfoExtractor):
webpage, urlh = self._download_webpage_handle(url, display_id) webpage, urlh = self._download_webpage_handle(url, display_id)
video_id = self._match_id(compat_str(urlh.geturl())) video_id = self._match_id(urlh.geturl())
hash = self._search_regex( hash = self._search_regex(
r'hash\s*:\s*["\']([\da-f]{32})', webpage, 'hash') r'hash\s*:\s*["\']([\da-f]{32})', webpage, 'hash')


@ -21,6 +21,7 @@ from .acast import (
from .adn import ADNIE from .adn import ADNIE
from .adobeconnect import AdobeConnectIE from .adobeconnect import AdobeConnectIE
from .adobetv import ( from .adobetv import (
AdobeTVEmbedIE,
AdobeTVIE, AdobeTVIE,
AdobeTVShowIE, AdobeTVShowIE,
AdobeTVChannelIE, AdobeTVChannelIE,
@ -104,6 +105,7 @@ from .bilibili import (
BiliBiliBangumiIE, BiliBiliBangumiIE,
BilibiliAudioIE, BilibiliAudioIE,
BilibiliAudioAlbumIE, BilibiliAudioAlbumIE,
BiliBiliPlayerIE,
) )
from .biobiochiletv import BioBioChileTVIE from .biobiochiletv import BioBioChileTVIE
from .bitchute import ( from .bitchute import (
@ -496,7 +498,6 @@ from .jeuxvideo import JeuxVideoIE
from .jove import JoveIE from .jove import JoveIE
from .joj import JojIE from .joj import JojIE
from .jwplatform import JWPlatformIE from .jwplatform import JWPlatformIE
from .jpopsukitv import JpopsukiIE
from .kakao import KakaoIE from .kakao import KakaoIE
from .kaltura import KalturaIE from .kaltura import KalturaIE
from .kanalplay import KanalPlayIE from .kanalplay import KanalPlayIE
@ -510,7 +511,6 @@ from .kickstarter import KickStarterIE
from .kinja import KinjaEmbedIE from .kinja import KinjaEmbedIE
from .kinopoisk import KinoPoiskIE from .kinopoisk import KinoPoiskIE
from .konserthusetplay import KonserthusetPlayIE from .konserthusetplay import KonserthusetPlayIE
from .kontrtube import KontrTubeIE
from .krasview import KrasViewIE from .krasview import KrasViewIE
from .ku6 import Ku6IE from .ku6 import Ku6IE
from .kusi import KUSIIE from .kusi import KUSIIE
@ -637,7 +637,10 @@ from .mixcloud import (
from .mlb import MLBIE from .mlb import MLBIE
from .mnet import MnetIE from .mnet import MnetIE
from .moevideo import MoeVideoIE from .moevideo import MoeVideoIE
from .mofosex import MofosexIE from .mofosex import (
MofosexIE,
MofosexEmbedIE,
)
from .mojvideo import MojvideoIE from .mojvideo import MojvideoIE
from .morningstar import MorningstarIE from .morningstar import MorningstarIE
from .motherless import ( from .motherless import (
@ -657,7 +660,6 @@ from .mtv import (
MTVJapanIE, MTVJapanIE,
) )
from .muenchentv import MuenchenTVIE from .muenchentv import MuenchenTVIE
from .musicplayon import MusicPlayOnIE
from .mwave import MwaveIE, MwaveMeetGreetIE from .mwave import MwaveIE, MwaveMeetGreetIE
from .mychannels import MyChannelsIE from .mychannels import MyChannelsIE
from .myspace import MySpaceIE, MySpaceAlbumIE from .myspace import MySpaceIE, MySpaceAlbumIE
@ -797,10 +799,6 @@ from .ooyala import (
OoyalaIE, OoyalaIE,
OoyalaExternalIE, OoyalaExternalIE,
) )
from .openload import (
OpenloadIE,
VerystreamIE,
)
from .ora import OraTVIE from .ora import OraTVIE
from .orf import ( from .orf import (
ORFTVthekIE, ORFTVthekIE,
@ -814,7 +812,6 @@ from .packtpub import (
PacktPubIE, PacktPubIE,
PacktPubCourseIE, PacktPubCourseIE,
) )
from .pandatv import PandaTVIE
from .pandoratv import PandoraTVIE from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE from .patreon import PatreonIE
@ -857,6 +854,7 @@ from .polskieradio import (
PolskieRadioIE, PolskieRadioIE,
PolskieRadioCategoryIE, PolskieRadioCategoryIE,
) )
from .popcorntimes import PopcorntimesIE
from .popcorntv import PopcornTVIE from .popcorntv import PopcornTVIE
from .porn91 import Porn91IE from .porn91 import Porn91IE
from .porncom import PornComIE from .porncom import PornComIE
@ -969,7 +967,10 @@ from .savefrom import SaveFromIE
from .sbs import SBSIE from .sbs import SBSIE
from .screencast import ScreencastIE from .screencast import ScreencastIE
from .screencastomatic import ScreencastOMaticIE from .screencastomatic import ScreencastOMaticIE
from .scrippsnetworks import ScrippsNetworksWatchIE from .scrippsnetworks import (
ScrippsNetworksWatchIE,
ScrippsNetworksIE,
)
from .scte import ( from .scte import (
SCTEIE, SCTEIE,
SCTECourseIE, SCTECourseIE,
@ -1061,7 +1062,6 @@ from .srmediathek import SRMediathekIE
from .stanfordoc import StanfordOpenClassroomIE from .stanfordoc import StanfordOpenClassroomIE
from .steam import SteamIE from .steam import SteamIE
from .streamable import StreamableIE from .streamable import StreamableIE
from .streamango import StreamangoIE
from .streamcloud import StreamcloudIE from .streamcloud import StreamcloudIE
from .streamcz import StreamCZIE from .streamcz import StreamCZIE
from .streetvoice import StreetVoiceIE from .streetvoice import StreetVoiceIE
@ -1173,8 +1173,12 @@ from .turbo import TurboIE
from .tv2 import ( from .tv2 import (
TV2IE, TV2IE,
TV2ArticleIE, TV2ArticleIE,
KatsomoIE,
)
from .tv2dk import (
TV2DKIE,
TV2DKBornholmPlayIE,
) )
from .tv2dk import TV2DKIE
from .tv2hu import TV2HuIE from .tv2hu import TV2HuIE
from .tv4 import TV4IE from .tv4 import TV4IE
from .tv5mondeplus import TV5MondePlusIE from .tv5mondeplus import TV5MondePlusIE
@ -1238,7 +1242,10 @@ from .udemy import (
UdemyCourseIE UdemyCourseIE
) )
from .udn import UDNEmbedIE from .udn import UDNEmbedIE
from .ufctv import UFCTVIE from .ufctv import (
UFCTVIE,
UFCArabiaIE,
)
from .uktvplay import UKTVPlayIE from .uktvplay import UKTVPlayIE
from .digiteka import DigitekaIE from .digiteka import DigitekaIE
from .dlive import ( from .dlive import (
@ -1292,7 +1299,6 @@ from .videomore import (
VideomoreVideoIE, VideomoreVideoIE,
VideomoreSeasonIE, VideomoreSeasonIE,
) )
from .videopremium import VideoPremiumIE
from .videopress import VideoPressIE from .videopress import VideoPressIE
from .vidio import VidioIE from .vidio import VidioIE
from .vidlii import VidLiiIE from .vidlii import VidLiiIE


@ -31,7 +31,13 @@ class FranceCultureIE(InfoExtractor):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
video_data = extract_attributes(self._search_regex( video_data = extract_attributes(self._search_regex(
r'(?s)<div[^>]+class="[^"]*?(?:title-zone-diffusion|heading-zone-(?:wrapper|player-button))[^"]*?"[^>]*>.*?(<button[^>]+data-asset-source="[^"]+"[^>]+>)', r'''(?sx)
(?:
</h1>|
<div[^>]+class="[^"]*?(?:title-zone-diffusion|heading-zone-(?:wrapper|player-button))[^"]*?"[^>]*>
).*?
(<button[^>]+data-asset-source="[^"]+"[^>]+>)
''',
webpage, 'video data')) webpage, 'video data'))
video_url = video_data['data-asset-source'] video_url = video_data['data-asset-source']


@ -60,6 +60,9 @@ from .tnaflix import TNAFlixNetworkEmbedIE
from .drtuber import DrTuberIE from .drtuber import DrTuberIE
from .redtube import RedTubeIE from .redtube import RedTubeIE
from .tube8 import Tube8IE from .tube8 import Tube8IE
from .mofosex import MofosexEmbedIE
from .spankwire import SpankwireIE
from .youporn import YouPornIE
from .vimeo import VimeoIE from .vimeo import VimeoIE
from .dailymotion import DailymotionIE from .dailymotion import DailymotionIE
from .dailymail import DailyMailIE from .dailymail import DailyMailIE
@ -88,10 +91,6 @@ from .piksel import PikselIE
from .videa import VideaIE from .videa import VideaIE
from .twentymin import TwentyMinutenIE from .twentymin import TwentyMinutenIE
from .ustream import UstreamIE from .ustream import UstreamIE
from .openload import (
OpenloadIE,
VerystreamIE,
)
from .videopress import VideoPressIE from .videopress import VideoPressIE
from .rutube import RutubeIE from .rutube import RutubeIE
from .limelight import LimelightBaseIE from .limelight import LimelightBaseIE
@ -2102,6 +2101,9 @@ class GenericIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Smoky Barbecue Favorites', 'title': 'Smoky Barbecue Favorites',
'thumbnail': r're:^https?://.*\.jpe?g', 'thumbnail': r're:^https?://.*\.jpe?g',
'description': 'md5:5ff01e76316bd8d46508af26dc86023b',
'upload_date': '20170909',
'timestamp': 1504915200,
}, },
'add_ie': [ZypeIE.ie_key()], 'add_ie': [ZypeIE.ie_key()],
'params': { 'params': {
@ -2288,7 +2290,7 @@ class GenericIE(InfoExtractor):
if head_response is not False: if head_response is not False:
# Check for redirect # Check for redirect
new_url = compat_str(head_response.geturl()) new_url = head_response.geturl()
if url != new_url: if url != new_url:
self.report_following_redirect(new_url) self.report_following_redirect(new_url)
if force_videoid: if force_videoid:
@ -2388,12 +2390,12 @@ class GenericIE(InfoExtractor):
return self.playlist_result( return self.playlist_result(
self._parse_xspf( self._parse_xspf(
doc, video_id, xspf_url=url, doc, video_id, xspf_url=url,
xspf_base_url=compat_str(full_response.geturl())), xspf_base_url=full_response.geturl()),
video_id) video_id)
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag): elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
info_dict['formats'] = self._parse_mpd_formats( info_dict['formats'] = self._parse_mpd_formats(
doc, doc,
mpd_base_url=compat_str(full_response.geturl()).rpartition('/')[0], mpd_base_url=full_response.geturl().rpartition('/')[0],
mpd_url=url) mpd_url=url)
self._sort_formats(info_dict['formats']) self._sort_formats(info_dict['formats'])
return info_dict return info_dict
@ -2537,15 +2539,21 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches( return self.playlist_from_matches(
dailymail_urls, video_id, video_title, ie=DailyMailIE.ie_key()) dailymail_urls, video_id, video_title, ie=DailyMailIE.ie_key())
# Look for Teachable embeds, must be before Wistia
teachable_url = TeachableIE._extract_url(webpage, url)
if teachable_url:
return self.url_result(teachable_url)
# Look for embedded Wistia player # Look for embedded Wistia player
wistia_url = WistiaIE._extract_url(webpage) wistia_urls = WistiaIE._extract_urls(webpage)
if wistia_url: if wistia_urls:
return { playlist = self.playlist_from_matches(wistia_urls, video_id, video_title, ie=WistiaIE.ie_key())
'_type': 'url_transparent', for entry in playlist['entries']:
'url': self._proto_relative_url(wistia_url), entry.update({
'ie_key': WistiaIE.ie_key(), '_type': 'url_transparent',
'uploader': video_uploader, 'uploader': video_uploader,
} })
return playlist
# Look for SVT player # Look for SVT player
svt_url = SVTIE._extract_url(webpage) svt_url = SVTIE._extract_url(webpage)
@ -2710,6 +2718,21 @@ class GenericIE(InfoExtractor):
if tube8_urls: if tube8_urls:
return self.playlist_from_matches(tube8_urls, video_id, video_title, ie=Tube8IE.ie_key()) return self.playlist_from_matches(tube8_urls, video_id, video_title, ie=Tube8IE.ie_key())
# Look for embedded Mofosex player
mofosex_urls = MofosexEmbedIE._extract_urls(webpage)
if mofosex_urls:
return self.playlist_from_matches(mofosex_urls, video_id, video_title, ie=MofosexEmbedIE.ie_key())
# Look for embedded Spankwire player
spankwire_urls = SpankwireIE._extract_urls(webpage)
if spankwire_urls:
return self.playlist_from_matches(spankwire_urls, video_id, video_title, ie=SpankwireIE.ie_key())
# Look for embedded YouPorn player
youporn_urls = YouPornIE._extract_urls(webpage)
if youporn_urls:
return self.playlist_from_matches(youporn_urls, video_id, video_title, ie=YouPornIE.ie_key())
# Look for embedded Tvigle player # Look for embedded Tvigle player
mobj = re.search( mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//cloud\.tvigle\.ru/video/.+?)\1', webpage) r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//cloud\.tvigle\.ru/video/.+?)\1', webpage)
@ -2964,7 +2987,7 @@ class GenericIE(InfoExtractor):
# Look for VODPlatform embeds # Look for VODPlatform embeds
mobj = re.search( mobj = re.search(
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?vod-platform\.net/[eE]mbed/.+?)\1', r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:(?:www\.)?vod-platform\.net|embed\.kwikmotion\.com)/[eE]mbed/.+?)\1',
webpage) webpage)
if mobj is not None: if mobj is not None:
return self.url_result( return self.url_result(
@ -3048,18 +3071,6 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches( return self.playlist_from_matches(
twentymin_urls, video_id, video_title, ie=TwentyMinutenIE.ie_key()) twentymin_urls, video_id, video_title, ie=TwentyMinutenIE.ie_key())
# Look for Openload embeds
openload_urls = OpenloadIE._extract_urls(webpage)
if openload_urls:
return self.playlist_from_matches(
openload_urls, video_id, video_title, ie=OpenloadIE.ie_key())
# Look for Verystream embeds
verystream_urls = VerystreamIE._extract_urls(webpage)
if verystream_urls:
return self.playlist_from_matches(
verystream_urls, video_id, video_title, ie=VerystreamIE.ie_key())
# Look for VideoPress embeds # Look for VideoPress embeds
videopress_urls = VideoPressIE._extract_urls(webpage) videopress_urls = VideoPressIE._extract_urls(webpage)
if videopress_urls: if videopress_urls:
@ -3153,10 +3164,6 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches( return self.playlist_from_matches(
peertube_urls, video_id, video_title, ie=PeerTubeIE.ie_key()) peertube_urls, video_id, video_title, ie=PeerTubeIE.ie_key())
teachable_url = TeachableIE._extract_url(webpage, url)
if teachable_url:
return self.url_result(teachable_url)
indavideo_urls = IndavideoEmbedIE._extract_urls(webpage) indavideo_urls = IndavideoEmbedIE._extract_urls(webpage)
if indavideo_urls: if indavideo_urls:
return self.playlist_from_matches( return self.playlist_from_matches(


@ -1,12 +1,11 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
js_to_json, int_or_none,
merge_dicts,
remove_end, remove_end,
determine_ext, unified_timestamp,
) )
@ -14,15 +13,21 @@ class HellPornoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hellporno\.(?:com/videos|net/v)/(?P<id>[^/]+)' _VALID_URL = r'https?://(?:www\.)?hellporno\.(?:com/videos|net/v)/(?P<id>[^/]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://hellporno.com/videos/dixie-is-posing-with-naked-ass-very-erotic/', 'url': 'http://hellporno.com/videos/dixie-is-posing-with-naked-ass-very-erotic/',
'md5': '1fee339c610d2049699ef2aa699439f1', 'md5': 'f0a46ebc0bed0c72ae8fe4629f7de5f3',
'info_dict': { 'info_dict': {
'id': '149116', 'id': '149116',
'display_id': 'dixie-is-posing-with-naked-ass-very-erotic', 'display_id': 'dixie-is-posing-with-naked-ass-very-erotic',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Dixie is posing with naked ass very erotic', 'title': 'Dixie is posing with naked ass very erotic',
'description': 'md5:9a72922749354edb1c4b6e540ad3d215',
'categories': list,
'thumbnail': r're:https?://.*\.jpg$', 'thumbnail': r're:https?://.*\.jpg$',
'duration': 240,
'timestamp': 1398762720,
'upload_date': '20140429',
'view_count': int,
'age_limit': 18, 'age_limit': 18,
} },
}, { }, {
'url': 'http://hellporno.net/v/186271/', 'url': 'http://hellporno.net/v/186271/',
'only_matching': True, 'only_matching': True,
@ -36,40 +41,36 @@ class HellPornoIE(InfoExtractor):
title = remove_end(self._html_search_regex( title = remove_end(self._html_search_regex(
r'<title>([^<]+)</title>', webpage, 'title'), ' - Hell Porno') r'<title>([^<]+)</title>', webpage, 'title'), ' - Hell Porno')
flashvars = self._parse_json(self._search_regex( info = self._parse_html5_media_entries(url, webpage, display_id)[0]
r'var\s+flashvars\s*=\s*({.+?});', webpage, 'flashvars'), self._sort_formats(info['formats'])
display_id, transform_source=js_to_json)
video_id = flashvars.get('video_id') video_id = self._search_regex(
thumbnail = flashvars.get('preview_url') (r'chs_object\s*=\s*["\'](\d+)',
ext = determine_ext(flashvars.get('postfix'), 'mp4') r'params\[["\']video_id["\']\]\s*=\s*(\d+)'), webpage, 'video id',
default=display_id)
description = self._search_regex(
r'class=["\']desc_video_view_v2[^>]+>([^<]+)', webpage,
'description', fatal=False)
categories = [
c.strip()
for c in self._html_search_meta(
'keywords', webpage, 'categories', default='').split(',')
if c.strip()]
duration = int_or_none(self._og_search_property(
'video:duration', webpage, fatal=False))
timestamp = unified_timestamp(self._og_search_property(
'video:release_date', webpage, fatal=False))
view_count = int_or_none(self._search_regex(
r'>Views\s+(\d+)', webpage, 'view count', fatal=False))
formats = [] return merge_dicts(info, {
for video_url_key in ['video_url', 'video_alt_url']:
video_url = flashvars.get(video_url_key)
if not video_url:
continue
video_text = flashvars.get('%s_text' % video_url_key)
fmt = {
'url': video_url,
'ext': ext,
'format_id': video_text,
}
m = re.search(r'^(?P<height>\d+)[pP]', video_text)
if m:
fmt['height'] = int(m.group('height'))
formats.append(fmt)
self._sort_formats(formats)
categories = self._html_search_meta(
'keywords', webpage, 'categories', default='').split(',')
return {
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
'title': title, 'title': title,
'thumbnail': thumbnail, 'description': description,
'categories': categories, 'categories': categories,
'duration': duration,
'timestamp': timestamp,
'view_count': view_count,
'age_limit': 18, 'age_limit': 18,
'formats': formats, })
}
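
The HellPorno rewrite above lets _parse_html5_media_entries() provide the formats and then layers the scraped metadata on top with merge_dicts(). In youtube-dl's utils, merge_dicts() gives priority to earlier dictionaries (a later value only fills a key that is still missing or empty), so the parsed HTML5 info wins over the regex-scraped fields; a simplified illustration, not the real implementation:

def merge_dicts_sketch(*dicts):
    merged = {}
    for d in dicts:
        for k, v in d.items():
            if v is None:
                continue
            if k not in merged or merged[k] == '':
                merged[k] = v
    return merged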


@ -1,5 +1,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import base64
import json
import re import re
from .common import InfoExtractor from .common import InfoExtractor
@ -8,6 +10,7 @@ from ..utils import (
mimetype2ext, mimetype2ext,
parse_duration, parse_duration,
qualities, qualities,
try_get,
url_or_none, url_or_none,
) )
@ -15,15 +18,16 @@ from ..utils import (
class ImdbIE(InfoExtractor): class ImdbIE(InfoExtractor):
IE_NAME = 'imdb' IE_NAME = 'imdb'
IE_DESC = 'Internet Movie Database trailers' IE_DESC = 'Internet Movie Database trailers'
_VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video|title|list).+?[/-]vi(?P<id>\d+)' _VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video|title|list).*?[/-]vi(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.imdb.com/video/imdb/vi2524815897', 'url': 'http://www.imdb.com/video/imdb/vi2524815897',
'info_dict': { 'info_dict': {
'id': '2524815897', 'id': '2524815897',
'ext': 'mp4', 'ext': 'mp4',
'title': 'No. 2 from Ice Age: Continental Drift (2012)', 'title': 'No. 2',
'description': 'md5:87bd0bdc61e351f21f20d2d7441cb4e7', 'description': 'md5:87bd0bdc61e351f21f20d2d7441cb4e7',
'duration': 152,
} }
}, { }, {
'url': 'http://www.imdb.com/video/_/vi2524815897', 'url': 'http://www.imdb.com/video/_/vi2524815897',
@ -47,21 +51,23 @@ class ImdbIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(
'https://www.imdb.com/videoplayer/vi' + video_id, video_id) data = self._download_json(
video_metadata = self._parse_json(self._search_regex( 'https://www.imdb.com/ve/data/VIDEO_PLAYBACK_DATA', video_id,
r'window\.IMDbReactInitialState\.push\(({.+?})\);', webpage, query={
'video metadata'), video_id)['videos']['videoMetadata']['vi' + video_id] 'key': base64.b64encode(json.dumps({
title = self._html_search_meta( 'type': 'VIDEO_PLAYER',
['og:title', 'twitter:title'], webpage) or self._html_search_regex( 'subType': 'FORCE_LEGACY',
r'<title>(.+?)</title>', webpage, 'title', fatal=False) or video_metadata['title'] 'id': 'vi%s' % video_id,
}).encode()).decode(),
})[0]
quality = qualities(('SD', '480p', '720p', '1080p')) quality = qualities(('SD', '480p', '720p', '1080p'))
formats = [] formats = []
for encoding in video_metadata.get('encodings', []): for encoding in data['videoLegacyEncodings']:
if not encoding or not isinstance(encoding, dict): if not encoding or not isinstance(encoding, dict):
continue continue
video_url = url_or_none(encoding.get('videoUrl')) video_url = url_or_none(encoding.get('url'))
if not video_url: if not video_url:
continue continue
ext = mimetype2ext(encoding.get( ext = mimetype2ext(encoding.get(
@ -69,7 +75,7 @@ class ImdbIE(InfoExtractor):
if ext == 'm3u8': if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native', video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False)) preference=1, m3u8_id='hls', fatal=False))
continue continue
format_id = encoding.get('definition') format_id = encoding.get('definition')
formats.append({ formats.append({
@ -80,13 +86,33 @@ class ImdbIE(InfoExtractor):
}) })
self._sort_formats(formats) self._sort_formats(formats)
webpage = self._download_webpage(
'https://www.imdb.com/video/vi' + video_id, video_id)
video_metadata = self._parse_json(self._search_regex(
r'args\.push\(\s*({.+?})\s*\)\s*;', webpage,
'video metadata'), video_id)
video_info = video_metadata.get('VIDEO_INFO')
if video_info and isinstance(video_info, dict):
info = try_get(
video_info, lambda x: x[list(video_info.keys())[0]][0], dict)
else:
info = {}
title = self._html_search_meta(
['og:title', 'twitter:title'], webpage) or self._html_search_regex(
r'<title>(.+?)</title>', webpage, 'title',
default=None) or info['videoTitle']
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'alt_title': info.get('videoSubTitle'),
'formats': formats, 'formats': formats,
'description': video_metadata.get('description'), 'description': info.get('videoDescription'),
'thumbnail': video_metadata.get('slate', {}).get('url'), 'thumbnail': url_or_none(try_get(
'duration': parse_duration(video_metadata.get('duration')), video_metadata, lambda x: x['videoSlate']['source'])),
'duration': parse_duration(info.get('videoRuntime')),
} }
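
Rather than scraping the player page for formats, the new ImdbIE asks the VIDEO_PLAYBACK_DATA endpoint, passing a base64-encoded JSON descriptor as the key parameter exactly as shown above. A standalone sketch of that request, assuming Python 3 and requests; the video id is the one from the test case:

import base64
import json
import requests

video_id = '2524815897'  # numeric part of vi2524815897
key = base64.b64encode(json.dumps({
    'type': 'VIDEO_PLAYER',
    'subType': 'FORCE_LEGACY',
    'id': 'vi%s' % video_id,
}).encode()).decode()

data = requests.get(
    'https://www.imdb.com/ve/data/VIDEO_PLAYBACK_DATA',
    params={'key': key}).json()[0]
for encoding in data['videoLegacyEncodings']:
    print(encoding.get('definition'), encoding.get('url'))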


@ -0,0 +1,133 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
ExtractorError,
int_or_none,
str_or_none,
try_get,
)
class ImgGamingBaseIE(InfoExtractor):
_API_BASE = 'https://dce-frontoffice.imggaming.com/api/v2/'
_API_KEY = '857a1e5d-e35e-4fdf-805b-a87b6f8364bf'
_HEADERS = None
_MANIFEST_HEADERS = {'Accept-Encoding': 'identity'}
_REALM = None
_VALID_URL_TEMPL = r'https?://(?P<domain>%s)/(?P<type>live|playlist|video)/(?P<id>\d+)(?:\?.*?\bplaylistId=(?P<playlist_id>\d+))?'
def _real_initialize(self):
self._HEADERS = {
'Realm': 'dce.' + self._REALM,
'x-api-key': self._API_KEY,
}
email, password = self._get_login_info()
if email is None:
self.raise_login_required()
p_headers = self._HEADERS.copy()
p_headers['Content-Type'] = 'application/json'
self._HEADERS['Authorization'] = 'Bearer ' + self._download_json(
self._API_BASE + 'login',
None, 'Logging in', data=json.dumps({
'id': email,
'secret': password,
}).encode(), headers=p_headers)['authorisationToken']
def _call_api(self, path, media_id):
return self._download_json(
self._API_BASE + path + media_id, media_id, headers=self._HEADERS)
def _extract_dve_api_url(self, media_id, media_type):
stream_path = 'stream'
if media_type == 'video':
stream_path += '/vod/'
else:
stream_path += '?eventId='
try:
return self._call_api(
stream_path, media_id)['playerUrlCallback']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
raise ExtractorError(
self._parse_json(e.cause.read().decode(), media_id)['messages'][0],
expected=True)
raise
def _real_extract(self, url):
domain, media_type, media_id, playlist_id = re.match(self._VALID_URL, url).groups()
if playlist_id:
if self._downloader.params.get('noplaylist'):
self.to_screen('Downloading just video %s because of --no-playlist' % media_id)
else:
self.to_screen('Downloading playlist %s - add --no-playlist to just download video' % playlist_id)
media_type, media_id = 'playlist', playlist_id
if media_type == 'playlist':
playlist = self._call_api('vod/playlist/', media_id)
entries = []
for video in try_get(playlist, lambda x: x['videos']['vods']) or []:
video_id = str_or_none(video.get('id'))
if not video_id:
continue
entries.append(self.url_result(
'https://%s/video/%s' % (domain, video_id),
self.ie_key(), video_id))
return self.playlist_result(
entries, media_id, playlist.get('title'),
playlist.get('description'))
dve_api_url = self._extract_dve_api_url(media_id, media_type)
video_data = self._download_json(dve_api_url, media_id)
is_live = media_type == 'live'
if is_live:
title = self._live_title(self._call_api('event/', media_id)['title'])
else:
title = video_data['name']
formats = []
for proto in ('hls', 'dash'):
media_url = video_data.get(proto + 'Url') or try_get(video_data, lambda x: x[proto]['url'])
if not media_url:
continue
if proto == 'hls':
m3u8_formats = self._extract_m3u8_formats(
media_url, media_id, 'mp4', 'm3u8' if is_live else 'm3u8_native',
m3u8_id='hls', fatal=False, headers=self._MANIFEST_HEADERS)
for f in m3u8_formats:
f.setdefault('http_headers', {}).update(self._MANIFEST_HEADERS)
formats.append(f)
else:
formats.extend(self._extract_mpd_formats(
media_url, media_id, mpd_id='dash', fatal=False,
headers=self._MANIFEST_HEADERS))
self._sort_formats(formats)
subtitles = {}
for subtitle in video_data.get('subtitles', []):
subtitle_url = subtitle.get('url')
if not subtitle_url:
continue
subtitles.setdefault(subtitle.get('lang', 'en_US'), []).append({
'url': subtitle_url,
})
return {
'id': media_id,
'title': title,
'formats': formats,
'thumbnail': video_data.get('thumbnailUrl'),
'description': video_data.get('description'),
'duration': int_or_none(video_data.get('duration')),
'tags': video_data.get('tags'),
'is_live': is_live,
'subtitles': subtitles,
}
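
The new ImgGamingBaseIE logs in by posting the account credentials to the /login endpoint with the realm and API key headers, then sends the returned authorisationToken as a bearer token on every later call. A compressed sketch of that handshake, assuming Python 3 and requests; the realm, credentials and VOD id below are placeholders (each site extractor supplies its own _REALM):

import json
import requests

API_BASE = 'https://dce-frontoffice.imggaming.com/api/v2/'
base_headers = {
    'Realm': 'dce.example',  # placeholder; real extractors use 'dce.' + _REALM
    'x-api-key': '857a1e5d-e35e-4fdf-805b-a87b6f8364bf',
}
token = requests.post(
    API_BASE + 'login',
    data=json.dumps({'id': 'user@example.com', 'secret': 'password'}).encode(),
    headers=dict(base_headers, **{'Content-Type': 'application/json'}),
).json()['authorisationToken']

stream = requests.get(
    API_BASE + 'stream/vod/12345',  # hypothetical VOD id
    headers=dict(base_headers, Authorization='Bearer ' + token)).json()
print(stream.get('playerUrlCallback'))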


@ -1,8 +1,9 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
import json import json
import re
import sys
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
@ -91,48 +92,67 @@ class IviIE(InfoExtractor):
'contentid': video_id 'contentid': video_id
} }
] ]
}).encode() })
try: bundled = hasattr(sys, 'frozen')
from Crypto.Cipher import Blowfish
from Crypto.Hash import CMAC
timestamp = self._download_json( for site in (353, 183):
content_data = (data % site).encode()
if site == 353:
if bundled:
continue
try:
from Cryptodome.Cipher import Blowfish
from Cryptodome.Hash import CMAC
pycryptodomex_found = True
except ImportError:
pycryptodomex_found = False
continue
timestamp = (self._download_json(
self._LIGHT_URL, video_id,
'Downloading timestamp JSON', data=json.dumps({
'method': 'da.timestamp.get',
'params': []
}).encode(), fatal=False) or {}).get('result')
if not timestamp:
continue
query = {
'ts': timestamp,
'sign': CMAC.new(self._LIGHT_KEY, timestamp.encode() + content_data, Blowfish).hexdigest(),
}
else:
query = {}
video_json = self._download_json(
self._LIGHT_URL, video_id, self._LIGHT_URL, video_id,
'Downloading timestamp JSON', data=json.dumps({ 'Downloading video JSON', data=content_data, query=query)
'method': 'da.timestamp.get',
'params': []
}).encode())['result']
data = data % 353 error = video_json.get('error')
query = { if error:
'ts': timestamp, origin = error.get('origin')
'sign': CMAC.new(self._LIGHT_KEY, timestamp.encode() + data, Blowfish).hexdigest(), message = error.get('message') or error.get('user_message')
} extractor_msg = 'Unable to download video %s'
except ImportError: if origin == 'NotAllowedForLocation':
data = data % 183 self.raise_geo_restricted(message, self._GEO_COUNTRIES)
query = {} elif origin == 'NoRedisValidData':
extractor_msg = 'Video %s does not exist'
video_json = self._download_json( elif site == 353:
self._LIGHT_URL, video_id, continue
'Downloading video JSON', data=data, query=query) elif bundled:
error = video_json.get('error')
if error:
origin = error.get('origin')
message = error.get('message') or error.get('user_message')
extractor_msg = 'Unable to download video %s'
if origin == 'NotAllowedForLocation':
self.raise_geo_restricted(message, self._GEO_COUNTRIES)
elif origin == 'NoRedisValidData':
extractor_msg = 'Video %s does not exist'
elif message:
if 'недоступен для просмотра на площадке s183' in message:
raise ExtractorError( raise ExtractorError(
'pycryptodome not found. Please install it.', 'This feature does not work from bundled exe. Run youtube-dl from sources.',
expected=True) expected=True)
extractor_msg += ': ' + message elif not pycryptodomex_found:
raise ExtractorError(extractor_msg % video_id, expected=True) raise ExtractorError(
'pycryptodomex not found. Please install it.',
expected=True)
elif message:
extractor_msg += ': ' + message
raise ExtractorError(extractor_msg % video_id, expected=True)
else:
break
result = video_json['result'] result = video_json['result']
title = result['title'] title = result['title']
@ -219,7 +239,7 @@ class IviCompilationIE(InfoExtractor):
self.url_result( self.url_result(
'http://www.ivi.ru/watch/%s/%s' % (compilation_id, serie), IviIE.ie_key()) 'http://www.ivi.ru/watch/%s/%s' % (compilation_id, serie), IviIE.ie_key())
for serie in re.findall( for serie in re.findall(
r'<a href="/watch/%s/(\d+)"[^>]+data-id="\1"' % compilation_id, html)] r'<a\b[^>]+\bhref=["\']/watch/%s/(\d+)["\']' % compilation_id, html)]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
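The reworked ivi.ru code above signs the site-353 "light" API request with a Blowfish-CMAC over the timestamp plus the request body. A minimal standalone sketch of that signing step, assuming pycryptodomex is installed; the key and the full request template are stand-ins here, not the extractor's real values:

    # sketch only: LIGHT_KEY and the request body are stubs, not the real values
    import json
    from Cryptodome.Cipher import Blowfish
    from Cryptodome.Hash import CMAC

    LIGHT_KEY = b'\x00' * 16                  # stand-in for the extractor's _LIGHT_KEY
    content_data = json.dumps({
        'method': 'da.content.get',           # simplified; the real params carry more fields
        'params': [{'contentid': '123456'}],
    }).encode()

    timestamp = '1586527200000'               # value returned by the da.timestamp.get call
    sign = CMAC.new(LIGHT_KEY, timestamp.encode() + content_data, Blowfish).hexdigest()
    query = {'ts': timestamp, 'sign': sign}   # query string attached to the light API request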

View File

@ -1,68 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
unified_strdate,
)
class JpopsukiIE(InfoExtractor):
IE_NAME = 'jpopsuki.tv'
_VALID_URL = r'https?://(?:www\.)?jpopsuki\.tv/(?:category/)?video/[^/]+/(?P<id>\S+)'
_TEST = {
'url': 'http://www.jpopsuki.tv/video/ayumi-hamasaki---evolution/00be659d23b0b40508169cdee4545771',
'md5': '88018c0c1a9b1387940e90ec9e7e198e',
'info_dict': {
'id': '00be659d23b0b40508169cdee4545771',
'ext': 'mp4',
'title': 'ayumi hamasaki - evolution',
'description': 'Release date: 2001.01.31\r\n浜崎あゆみ - evolution',
'thumbnail': 'http://www.jpopsuki.tv/cache/89722c74d2a2ebe58bcac65321c115b2.jpg',
'uploader': 'plama_chan',
'uploader_id': '404',
'upload_date': '20121101'
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_url = 'http://www.jpopsuki.tv' + self._html_search_regex(
r'<source src="(.*?)" type', webpage, 'video url')
video_title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage)
uploader = self._html_search_regex(
r'<li>from: <a href="/user/view/user/(.*?)/uid/',
webpage, 'video uploader', fatal=False)
uploader_id = self._html_search_regex(
r'<li>from: <a href="/user/view/user/\S*?/uid/(\d*)',
webpage, 'video uploader_id', fatal=False)
upload_date = unified_strdate(self._html_search_regex(
r'<li>uploaded: (.*?)</li>', webpage, 'video upload_date',
fatal=False))
view_count_str = self._html_search_regex(
r'<li>Hits: ([0-9]+?)</li>', webpage, 'video view_count',
fatal=False)
comment_count_str = self._html_search_regex(
r'<h2>([0-9]+?) comments</h2>', webpage, 'video comment_count',
fatal=False)
return {
'id': video_id,
'url': video_url,
'title': video_title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'uploader_id': uploader_id,
'upload_date': upload_date,
'view_count': int_or_none(view_count_str),
'comment_count': int_or_none(comment_count_str),
}

View File

@ -1,73 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
)
class KontrTubeIE(InfoExtractor):
IE_NAME = 'kontrtube'
IE_DESC = 'KontrTube.ru - Труба зовёт'
_VALID_URL = r'https?://(?:www\.)?kontrtube\.ru/videos/(?P<id>\d+)/(?P<display_id>[^/]+)/'
_TEST = {
'url': 'http://www.kontrtube.ru/videos/2678/nad-olimpiyskoy-derevney-v-sochi-podnyat-rossiyskiy-flag/',
'md5': '975a991a4926c9a85f383a736a2e6b80',
'info_dict': {
'id': '2678',
'display_id': 'nad-olimpiyskoy-derevney-v-sochi-podnyat-rossiyskiy-flag',
'ext': 'mp4',
'title': 'Над олимпийской деревней в Сочи поднят российский флаг',
'description': 'md5:80edc4c613d5887ae8ccf1d59432be41',
'thumbnail': 'http://www.kontrtube.ru/contents/videos_screenshots/2000/2678/preview.mp4.jpg',
'duration': 270,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id')
webpage = self._download_webpage(
url, display_id, 'Downloading page')
video_url = self._search_regex(
r"video_url\s*:\s*'(.+?)/?',", webpage, 'video URL')
thumbnail = self._search_regex(
r"preview_url\s*:\s*'(.+?)/?',", webpage, 'thumbnail', fatal=False)
title = self._html_search_regex(
r'(?s)<h2>(.+?)</h2>', webpage, 'title')
description = self._html_search_meta(
'description', webpage, 'description')
duration = self._search_regex(
r'Длительность: <em>([^<]+)</em>', webpage, 'duration', fatal=False)
if duration:
duration = parse_duration(duration.replace('мин', 'min').replace('сек', 'sec'))
view_count = self._search_regex(
r'Просмотров: <em>([^<]+)</em>',
webpage, 'view count', fatal=False)
if view_count:
view_count = int_or_none(view_count.replace(' ', ''))
comment_count = int_or_none(self._search_regex(
r'Комментарии \((\d+)\)<', webpage, ' comment count', fatal=False))
return {
'id': video_id,
'display_id': display_id,
'url': video_url,
'thumbnail': thumbnail,
'title': title,
'description': description,
'duration': duration,
'view_count': int_or_none(view_count),
'comment_count': int_or_none(comment_count),
}

View File

@ -4,7 +4,6 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
clean_html, clean_html,
determine_ext, determine_ext,
@ -36,7 +35,7 @@ class LecturioBaseIE(InfoExtractor):
self._LOGIN_URL, None, 'Downloading login popup') self._LOGIN_URL, None, 'Downloading login popup')
def is_logged(url_handle): def is_logged(url_handle):
return self._LOGIN_URL not in compat_str(url_handle.geturl()) return self._LOGIN_URL not in url_handle.geturl()
# Already logged in # Already logged in
if is_logged(urlh): if is_logged(urlh):

View File

@ -2,23 +2,24 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
import uuid
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str from ..compat import compat_HTTPError
from ..utils import ( from ..utils import (
unescapeHTML, ExtractorError,
parse_duration, int_or_none,
get_element_by_class, qualities,
) )
class LEGOIE(InfoExtractor): class LEGOIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?lego\.com/(?P<locale>[^/]+)/(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]+)' _VALID_URL = r'https?://(?:www\.)?lego\.com/(?P<locale>[a-z]{2}-[a-z]{2})/(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]{32})'
_TESTS = [{ _TESTS = [{
'url': 'http://www.lego.com/en-us/videos/themes/club/blocumentary-kawaguchi-55492d823b1b4d5e985787fa8c2973b1', 'url': 'http://www.lego.com/en-us/videos/themes/club/blocumentary-kawaguchi-55492d823b1b4d5e985787fa8c2973b1',
'md5': 'f34468f176cfd76488767fc162c405fa', 'md5': 'f34468f176cfd76488767fc162c405fa',
'info_dict': { 'info_dict': {
'id': '55492d823b1b4d5e985787fa8c2973b1', 'id': '55492d82-3b1b-4d5e-9857-87fa8c2973b1_en-US',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Blocumentary Great Creations: Akiyuki Kawaguchi', 'title': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
'description': 'Blocumentary Great Creations: Akiyuki Kawaguchi', 'description': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
@ -26,103 +27,123 @@ class LEGOIE(InfoExtractor):
}, { }, {
# geo-restricted but the contentUrl contain a valid url # geo-restricted but the contentUrl contain a valid url
'url': 'http://www.lego.com/nl-nl/videos/themes/nexoknights/episode-20-kingdom-of-heroes-13bdc2299ab24d9685701a915b3d71e7##sp=399', 'url': 'http://www.lego.com/nl-nl/videos/themes/nexoknights/episode-20-kingdom-of-heroes-13bdc2299ab24d9685701a915b3d71e7##sp=399',
'md5': '4c3fec48a12e40c6e5995abc3d36cc2e', 'md5': 'c7420221f7ffd03ff056f9db7f8d807c',
'info_dict': { 'info_dict': {
'id': '13bdc2299ab24d9685701a915b3d71e7', 'id': '13bdc229-9ab2-4d96-8570-1a915b3d71e7_nl-NL',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Aflevering 20 - Helden van het koninkrijk', 'title': 'Aflevering 20: Helden van het koninkrijk',
'description': 'md5:8ee499aac26d7fa8bcb0cedb7f9c3941', 'description': 'md5:8ee499aac26d7fa8bcb0cedb7f9c3941',
'age_limit': 5,
}, },
}, { }, {
# special characters in title # with subtitle
'url': 'http://www.lego.com/en-us/starwars/videos/lego-star-wars-force-surprise-9685ee9d12e84ff38e84b4e3d0db533d', 'url': 'https://www.lego.com/nl-nl/kids/videos/classic/creative-storytelling-the-little-puppy-aa24f27c7d5242bc86102ebdc0f24cba',
'info_dict': { 'info_dict': {
'id': '9685ee9d12e84ff38e84b4e3d0db533d', 'id': 'aa24f27c-7d52-42bc-8610-2ebdc0f24cba_nl-NL',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Force Surprise LEGO® Star Wars™ Microfighters', 'title': 'De kleine puppy',
'description': 'md5:9c673c96ce6f6271b88563fe9dc56de3', 'description': 'md5:5b725471f849348ac73f2e12cfb4be06',
'age_limit': 1,
'subtitles': {
'nl': [{
'ext': 'srt',
'url': r're:^https://.+\.srt$',
}],
},
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
}] }]
_BITRATES = [256, 512, 1024, 1536, 2560] _QUALITIES = {
'Lowest': (64, 180, 320),
'Low': (64, 270, 480),
'Medium': (96, 360, 640),
'High': (128, 540, 960),
'Highest': (128, 720, 1280),
}
def _real_extract(self, url): def _real_extract(self, url):
locale, video_id = re.match(self._VALID_URL, url).groups() locale, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, video_id) countries = [locale.split('-')[1].upper()]
title = get_element_by_class('video-header', webpage).strip() self._initialize_geo_bypass({
progressive_base = 'https://lc-mediaplayerns-live-s.legocdn.com/' 'countries': countries,
streaming_base = 'http://legoprod-f.akamaihd.net/' })
content_url = self._html_search_meta('contentUrl', webpage)
path = self._search_regex(
r'(?:https?:)?//[^/]+/(?:[iz]/s/)?public/(.+)_[0-9,]+\.(?:mp4|webm)',
content_url, 'video path', default=None)
if not path:
player_url = self._proto_relative_url(self._search_regex(
r'<iframe[^>]+src="((?:https?)?//(?:www\.)?lego\.com/[^/]+/mediaplayer/video/[^"]+)',
webpage, 'player url', default=None))
if not player_url:
base_url = self._proto_relative_url(self._search_regex(
r'data-baseurl="([^"]+)"', webpage, 'base url',
default='http://www.lego.com/%s/mediaplayer/video/' % locale))
player_url = base_url + video_id
player_webpage = self._download_webpage(player_url, video_id)
video_data = self._parse_json(unescapeHTML(self._search_regex(
r"video='([^']+)'", player_webpage, 'video data')), video_id)
progressive_base = self._search_regex(
r'data-video-progressive-url="([^"]+)"',
player_webpage, 'progressive base', default='https://lc-mediaplayerns-live-s.legocdn.com/')
streaming_base = self._search_regex(
r'data-video-streaming-url="([^"]+)"',
player_webpage, 'streaming base', default='http://legoprod-f.akamaihd.net/')
item_id = video_data['ItemId']
net_storage_path = video_data.get('NetStoragePath') or '/'.join([item_id[:2], item_id[2:4]]) try:
base_path = '_'.join([item_id, video_data['VideoId'], video_data['Locale'], compat_str(video_data['VideoVersion'])]) item = self._download_json(
path = '/'.join([net_storage_path, base_path]) # https://contentfeed.services.lego.com/api/v2/item/[VIDEO_ID]?culture=[LOCALE]&contentType=Video
streaming_path = ','.join(map(lambda bitrate: compat_str(bitrate), self._BITRATES)) 'https://services.slingshot.lego.com/mediaplayer/v2',
video_id, query={
'videoId': '%s_%s' % (uuid.UUID(video_id), locale),
}, headers=self.geo_verification_headers())
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 451:
self.raise_geo_restricted(countries=countries)
raise
formats = self._extract_akamai_formats( video = item['Video']
'%si/s/public/%s_,%s,.mp4.csmil/master.m3u8' % (streaming_base, path, streaming_path), video_id) video_id = video['Id']
m3u8_formats = list(filter( title = video['Title']
lambda f: f.get('protocol') == 'm3u8_native' and f.get('vcodec') != 'none',
formats)) q = qualities(['Lowest', 'Low', 'Medium', 'High', 'Highest'])
if len(m3u8_formats) == len(self._BITRATES): formats = []
self._sort_formats(m3u8_formats) for video_source in item.get('VideoFormats', []):
for bitrate, m3u8_format in zip(self._BITRATES, m3u8_formats): video_source_url = video_source.get('Url')
progressive_base_url = '%spublic/%s_%d.' % (progressive_base, path, bitrate) if not video_source_url:
mp4_f = m3u8_format.copy() continue
mp4_f.update({ video_source_format = video_source.get('Format')
'url': progressive_base_url + 'mp4', if video_source_format == 'F4M':
'format_id': m3u8_format['format_id'].replace('hls', 'mp4'), formats.extend(self._extract_f4m_formats(
'protocol': 'http', video_source_url, video_id,
}) f4m_id=video_source_format, fatal=False))
web_f = { elif video_source_format == 'M3U8':
'url': progressive_base_url + 'webm', formats.extend(self._extract_m3u8_formats(
'format_id': m3u8_format['format_id'].replace('hls', 'webm'), video_source_url, video_id, 'mp4', 'm3u8_native',
'width': m3u8_format['width'], m3u8_id=video_source_format, fatal=False))
'height': m3u8_format['height'], else:
'tbr': m3u8_format.get('tbr'), video_source_quality = video_source.get('Quality')
'ext': 'webm', format_id = []
for v in (video_source_format, video_source_quality):
if v:
format_id.append(v)
f = {
'format_id': '-'.join(format_id),
'quality': q(video_source_quality),
'url': video_source_url,
} }
formats.extend([web_f, mp4_f]) quality = self._QUALITIES.get(video_source_quality)
else: if quality:
for bitrate in self._BITRATES: f.update({
for ext in ('web', 'mp4'): 'abr': quality[0],
formats.append({ 'height': quality[1],
'format_id': '%s-%s' % (ext, bitrate), 'width': quality[2],
'url': '%spublic/%s_%d.%s' % (progressive_base, path, bitrate, ext), }),
'tbr': bitrate, formats.append(f)
'ext': ext,
})
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {}
sub_file_id = video.get('SubFileId')
if sub_file_id and sub_file_id != '00000000-0000-0000-0000-000000000000':
net_storage_path = video.get('NetstoragePath')
invariant_id = video.get('InvariantId')
video_file_id = video.get('VideoFileId')
video_version = video.get('VideoVersion')
if net_storage_path and invariant_id and video_file_id and video_version:
subtitles.setdefault(locale[:2], []).append({
'url': 'https://lc-mediaplayerns-live-s.legocdn.com/public/%s/%s_%s_%s_%s_sub.srt' % (net_storage_path, invariant_id, video_file_id, locale, video_version),
})
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': self._html_search_meta('description', webpage), 'description': video.get('Description'),
'thumbnail': self._html_search_meta('thumbnail', webpage), 'thumbnail': video.get('GeneratedCoverImage') or video.get('GeneratedThumbnail'),
'duration': parse_duration(self._html_search_meta('duration', webpage)), 'duration': int_or_none(video.get('Length')),
'formats': formats, 'formats': formats,
'subtitles': subtitles,
'age_limit': int_or_none(video.get('AgeFrom')),
'season': video.get('SeasonTitle'),
'season_number': int_or_none(video.get('Season')) or None,
'episode_number': int_or_none(video.get('Episode')) or None,
} }
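The new lego.com flow above replaces webpage scraping with a single JSON API call. A rough standalone sketch of that call (the extractor itself goes through _download_json with geo-verification headers; requests and the hard-coded test id are only used here for illustration):

    import uuid
    import requests

    video_id = '55492d823b1b4d5e985787fa8c2973b1'   # 32-hex id taken from the page URL
    locale = 'en-us'

    resp = requests.get(
        'https://services.slingshot.lego.com/mediaplayer/v2',
        params={'videoId': '%s_%s' % (uuid.UUID(video_id), locale)},
        timeout=30)
    resp.raise_for_status()                          # HTTP 451 means geo-restricted
    item = resp.json()
    video = item['Video']
    print(video['Id'], video['Title'])
    for source in item.get('VideoFormats', []):
        print(source.get('Format'), source.get('Quality'), source.get('Url'))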

View File

@ -18,7 +18,6 @@ from ..utils import (
class LimelightBaseIE(InfoExtractor): class LimelightBaseIE(InfoExtractor):
_PLAYLIST_SERVICE_URL = 'http://production-ps.lvp.llnw.net/r/PlaylistService/%s/%s/%s' _PLAYLIST_SERVICE_URL = 'http://production-ps.lvp.llnw.net/r/PlaylistService/%s/%s/%s'
_API_URL = 'http://api.video.limelight.com/rest/organizations/%s/%s/%s/%s.json'
@classmethod @classmethod
def _extract_urls(cls, webpage, source_url): def _extract_urls(cls, webpage, source_url):
@ -70,7 +69,8 @@ class LimelightBaseIE(InfoExtractor):
try: try:
return self._download_json( return self._download_json(
self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method), self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method),
item_id, 'Downloading PlaylistService %s JSON' % method, fatal=fatal, headers=headers) item_id, 'Downloading PlaylistService %s JSON' % method,
fatal=fatal, headers=headers)
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403: if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
error = self._parse_json(e.cause.read().decode(), item_id)['detail']['contentAccessPermission'] error = self._parse_json(e.cause.read().decode(), item_id)['detail']['contentAccessPermission']
@ -79,22 +79,22 @@ class LimelightBaseIE(InfoExtractor):
raise ExtractorError(error, expected=True) raise ExtractorError(error, expected=True)
raise raise
def _call_api(self, organization_id, item_id, method): def _extract(self, item_id, pc_method, mobile_method, referer=None):
return self._download_json(
self._API_URL % (organization_id, self._API_PATH, item_id, method),
item_id, 'Downloading API %s JSON' % method)
def _extract(self, item_id, pc_method, mobile_method, meta_method, referer=None):
pc = self._call_playlist_service(item_id, pc_method, referer=referer) pc = self._call_playlist_service(item_id, pc_method, referer=referer)
metadata = self._call_api(pc['orgId'], item_id, meta_method) mobile = self._call_playlist_service(
mobile = self._call_playlist_service(item_id, mobile_method, fatal=False, referer=referer) item_id, mobile_method, fatal=False, referer=referer)
return pc, mobile, metadata return pc, mobile
def _extract_info(self, pc, mobile, i, referer):
get_item = lambda x, y: try_get(x, lambda x: x[y][i], dict) or {}
pc_item = get_item(pc, 'playlistItems')
mobile_item = get_item(mobile, 'mediaList')
video_id = pc_item.get('mediaId') or mobile_item['mediaId']
title = pc_item.get('title') or mobile_item['title']
def _extract_info(self, streams, mobile_urls, properties):
video_id = properties['media_id']
formats = [] formats = []
urls = [] urls = []
for stream in streams: for stream in pc_item.get('streams', []):
stream_url = stream.get('url') stream_url = stream.get('url')
if not stream_url or stream.get('drmProtected') or stream_url in urls: if not stream_url or stream.get('drmProtected') or stream_url in urls:
continue continue
@ -155,7 +155,7 @@ class LimelightBaseIE(InfoExtractor):
}) })
formats.append(fmt) formats.append(fmt)
for mobile_url in mobile_urls: for mobile_url in mobile_item.get('mobileUrls', []):
media_url = mobile_url.get('mobileUrl') media_url = mobile_url.get('mobileUrl')
format_id = mobile_url.get('targetMediaPlatform') format_id = mobile_url.get('targetMediaPlatform')
if not media_url or format_id in ('Widevine', 'SmoothStreaming') or media_url in urls: if not media_url or format_id in ('Widevine', 'SmoothStreaming') or media_url in urls:
@ -179,54 +179,34 @@ class LimelightBaseIE(InfoExtractor):
self._sort_formats(formats) self._sort_formats(formats)
title = properties['title']
description = properties.get('description')
timestamp = int_or_none(properties.get('publish_date') or properties.get('create_date'))
duration = float_or_none(properties.get('duration_in_milliseconds'), 1000)
filesize = int_or_none(properties.get('total_storage_in_bytes'))
categories = [properties.get('category')]
tags = properties.get('tags', [])
thumbnails = [{
'url': thumbnail['url'],
'width': int_or_none(thumbnail.get('width')),
'height': int_or_none(thumbnail.get('height')),
} for thumbnail in properties.get('thumbnails', []) if thumbnail.get('url')]
subtitles = {} subtitles = {}
for caption in properties.get('captions', []): for flag in mobile_item.get('flags'):
lang = caption.get('language_code') if flag == 'ClosedCaptions':
subtitles_url = caption.get('url') closed_captions = self._call_playlist_service(
if lang and subtitles_url: video_id, 'getClosedCaptionsDetailsByMediaId',
subtitles.setdefault(lang, []).append({ False, referer) or []
'url': subtitles_url, for cc in closed_captions:
}) cc_url = cc.get('webvttFileUrl')
closed_captions_url = properties.get('closed_captions_url') if not cc_url:
if closed_captions_url: continue
subtitles.setdefault('en', []).append({ lang = cc.get('languageCode') or self._search_regex(r'/[a-z]{2}\.vtt', cc_url, 'lang', default='en')
'url': closed_captions_url, subtitles.setdefault(lang, []).append({
'ext': 'ttml', 'url': cc_url,
}) })
break
get_meta = lambda x: pc_item.get(x) or mobile_item.get(x)
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': description, 'description': get_meta('description'),
'formats': formats, 'formats': formats,
'timestamp': timestamp, 'duration': float_or_none(get_meta('durationInMilliseconds'), 1000),
'duration': duration, 'thumbnail': get_meta('previewImageUrl') or get_meta('thumbnailImageUrl'),
'filesize': filesize,
'categories': categories,
'tags': tags,
'thumbnails': thumbnails,
'subtitles': subtitles, 'subtitles': subtitles,
} }
def _extract_info_helper(self, pc, mobile, i, metadata):
return self._extract_info(
try_get(pc, lambda x: x['playlistItems'][i]['streams'], list) or [],
try_get(mobile, lambda x: x['mediaList'][i]['mobileUrls'], list) or [],
metadata)
class LimelightMediaIE(LimelightBaseIE): class LimelightMediaIE(LimelightBaseIE):
IE_NAME = 'limelight' IE_NAME = 'limelight'
@ -251,8 +231,6 @@ class LimelightMediaIE(LimelightBaseIE):
'description': 'md5:8005b944181778e313d95c1237ddb640', 'description': 'md5:8005b944181778e313d95c1237ddb640',
'thumbnail': r're:^https?://.*\.jpeg$', 'thumbnail': r're:^https?://.*\.jpeg$',
'duration': 144.23, 'duration': 144.23,
'timestamp': 1244136834,
'upload_date': '20090604',
}, },
'params': { 'params': {
# m3u8 download # m3u8 download
@ -268,30 +246,29 @@ class LimelightMediaIE(LimelightBaseIE):
'title': '3Play Media Overview Video', 'title': '3Play Media Overview Video',
'thumbnail': r're:^https?://.*\.jpeg$', 'thumbnail': r're:^https?://.*\.jpeg$',
'duration': 78.101, 'duration': 78.101,
'timestamp': 1338929955, # TODO: extract all languages that were accessible via API
'upload_date': '20120605', # 'subtitles': 'mincount:9',
'subtitles': 'mincount:9', 'subtitles': 'mincount:1',
}, },
}, { }, {
'url': 'https://assets.delvenetworks.com/player/loader.swf?mediaId=8018a574f08d416e95ceaccae4ba0452', 'url': 'https://assets.delvenetworks.com/player/loader.swf?mediaId=8018a574f08d416e95ceaccae4ba0452',
'only_matching': True, 'only_matching': True,
}] }]
_PLAYLIST_SERVICE_PATH = 'media' _PLAYLIST_SERVICE_PATH = 'media'
_API_PATH = 'media'
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {}) url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url) video_id = self._match_id(url)
source_url = smuggled_data.get('source_url')
self._initialize_geo_bypass({ self._initialize_geo_bypass({
'countries': smuggled_data.get('geo_countries'), 'countries': smuggled_data.get('geo_countries'),
}) })
pc, mobile, metadata = self._extract( pc, mobile = self._extract(
video_id, 'getPlaylistByMediaId', video_id, 'getPlaylistByMediaId',
'getMobilePlaylistByMediaId', 'properties', 'getMobilePlaylistByMediaId', source_url)
smuggled_data.get('source_url'))
return self._extract_info_helper(pc, mobile, 0, metadata) return self._extract_info(pc, mobile, 0, source_url)
class LimelightChannelIE(LimelightBaseIE): class LimelightChannelIE(LimelightBaseIE):
@ -313,6 +290,7 @@ class LimelightChannelIE(LimelightBaseIE):
'info_dict': { 'info_dict': {
'id': 'ab6a524c379342f9b23642917020c082', 'id': 'ab6a524c379342f9b23642917020c082',
'title': 'Javascript Sample Code', 'title': 'Javascript Sample Code',
'description': 'Javascript Sample Code - http://www.delvenetworks.com/sample-code/playerCode-demo.html',
}, },
'playlist_mincount': 3, 'playlist_mincount': 3,
}, { }, {
@ -320,22 +298,23 @@ class LimelightChannelIE(LimelightBaseIE):
'only_matching': True, 'only_matching': True,
}] }]
_PLAYLIST_SERVICE_PATH = 'channel' _PLAYLIST_SERVICE_PATH = 'channel'
_API_PATH = 'channels'
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {}) url, smuggled_data = unsmuggle_url(url, {})
channel_id = self._match_id(url) channel_id = self._match_id(url)
source_url = smuggled_data.get('source_url')
pc, mobile, medias = self._extract( pc, mobile = self._extract(
channel_id, 'getPlaylistByChannelId', channel_id, 'getPlaylistByChannelId',
'getMobilePlaylistWithNItemsByChannelId?begin=0&count=-1', 'getMobilePlaylistWithNItemsByChannelId?begin=0&count=-1',
'media', smuggled_data.get('source_url')) source_url)
entries = [ entries = [
self._extract_info_helper(pc, mobile, i, medias['media_list'][i]) self._extract_info(pc, mobile, i, source_url)
for i in range(len(medias['media_list']))] for i in range(len(pc['playlistItems']))]
return self.playlist_result(entries, channel_id, pc['title']) return self.playlist_result(
entries, channel_id, pc.get('title'), mobile.get('description'))
class LimelightChannelListIE(LimelightBaseIE): class LimelightChannelListIE(LimelightBaseIE):
@ -368,10 +347,12 @@ class LimelightChannelListIE(LimelightBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
channel_list_id = self._match_id(url) channel_list_id = self._match_id(url)
channel_list = self._call_playlist_service(channel_list_id, 'getMobileChannelListById') channel_list = self._call_playlist_service(
channel_list_id, 'getMobileChannelListById')
entries = [ entries = [
self.url_result('limelight:channel:%s' % channel['id'], 'LimelightChannel') self.url_result('limelight:channel:%s' % channel['id'], 'LimelightChannel')
for channel in channel_list['channelList']] for channel in channel_list['channelList']]
return self.playlist_result(entries, channel_list_id, channel_list['title']) return self.playlist_result(
entries, channel_list_id, channel_list['title'])
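After this refactor everything comes from the PlaylistService endpoints and the old api.video.limelight.com metadata calls are gone. A small illustrative sketch of such a request, assuming the service answers a plain GET the same way the extractor's _download_json call does; the media id is the one from the test above:

    import requests

    PLAYLIST_SERVICE_URL = 'http://production-ps.lvp.llnw.net/r/PlaylistService/%s/%s/%s'

    def call_playlist_service(path, item_id, method):
        # path is 'media' or 'channel', method e.g. 'getPlaylistByMediaId'
        return requests.get(PLAYLIST_SERVICE_URL % (path, item_id, method), timeout=30).json()

    pc = call_playlist_service('media', '8018a574f08d416e95ceaccae4ba0452', 'getPlaylistByMediaId')
    item = pc['playlistItems'][0]
    print(item.get('mediaId'), item.get('title'))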

View File

@ -8,7 +8,6 @@ from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_b64decode, compat_b64decode,
compat_HTTPError, compat_HTTPError,
compat_str,
) )
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
@ -99,7 +98,7 @@ class LinuxAcademyIE(InfoExtractor):
'sso': 'true', 'sso': 'true',
}) })
login_state_url = compat_str(urlh.geturl()) login_state_url = urlh.geturl()
try: try:
login_page = self._download_webpage( login_page = self._download_webpage(
@ -129,7 +128,7 @@ class LinuxAcademyIE(InfoExtractor):
}) })
access_token = self._search_regex( access_token = self._search_regex(
r'access_token=([^=&]+)', compat_str(urlh.geturl()), r'access_token=([^=&]+)', urlh.geturl(),
'access token') 'access token')
self._download_webpage( self._download_webpage(

View File

@ -20,10 +20,10 @@ class MailRuIE(InfoExtractor):
IE_DESC = 'Видео@Mail.Ru' IE_DESC = 'Видео@Mail.Ru'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?:(?:www|m)\.)?my\.mail\.ru/ (?:(?:www|m)\.)?my\.mail\.ru/+
(?: (?:
video/.*\#video=/?(?P<idv1>(?:[^/]+/){3}\d+)| video/.*\#video=/?(?P<idv1>(?:[^/]+/){3}\d+)|
(?:(?P<idv2prefix>(?:[^/]+/){2})video/(?P<idv2suffix>[^/]+/\d+))\.html| (?:(?P<idv2prefix>(?:[^/]+/+){2})video/(?P<idv2suffix>[^/]+/\d+))\.html|
(?:video/embed|\+/video/meta)/(?P<metaid>\d+) (?:video/embed|\+/video/meta)/(?P<metaid>\d+)
) )
''' '''
@ -85,6 +85,14 @@ class MailRuIE(InfoExtractor):
{ {
'url': 'http://my.mail.ru/+/video/meta/7949340477499637815', 'url': 'http://my.mail.ru/+/video/meta/7949340477499637815',
'only_matching': True, 'only_matching': True,
},
{
'url': 'https://my.mail.ru//list/sinyutin10/video/_myvideo/4.html',
'only_matching': True,
},
{
'url': 'https://my.mail.ru//list//sinyutin10/video/_myvideo/4.html',
'only_matching': True,
} }
] ]
@ -237,7 +245,7 @@ class MailRuMusicSearchBaseIE(InfoExtractor):
class MailRuMusicIE(MailRuMusicSearchBaseIE): class MailRuMusicIE(MailRuMusicSearchBaseIE):
IE_NAME = 'mailru:music' IE_NAME = 'mailru:music'
IE_DESC = 'Музыка@Mail.Ru' IE_DESC = 'Музыка@Mail.Ru'
_VALID_URL = r'https?://my\.mail\.ru/music/songs/[^/?#&]+-(?P<id>[\da-f]+)' _VALID_URL = r'https?://my\.mail\.ru/+music/+songs/+[^/?#&]+-(?P<id>[\da-f]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://my.mail.ru/music/songs/%D0%BC8%D0%BB8%D1%82%D1%85-l-a-h-luciferian-aesthetics-of-herrschaft-single-2017-4e31f7125d0dfaef505d947642366893', 'url': 'https://my.mail.ru/music/songs/%D0%BC8%D0%BB8%D1%82%D1%85-l-a-h-luciferian-aesthetics-of-herrschaft-single-2017-4e31f7125d0dfaef505d947642366893',
'md5': '0f8c22ef8c5d665b13ac709e63025610', 'md5': '0f8c22ef8c5d665b13ac709e63025610',
@ -273,7 +281,7 @@ class MailRuMusicIE(MailRuMusicSearchBaseIE):
class MailRuMusicSearchIE(MailRuMusicSearchBaseIE): class MailRuMusicSearchIE(MailRuMusicSearchBaseIE):
IE_NAME = 'mailru:music:search' IE_NAME = 'mailru:music:search'
IE_DESC = 'Музыка@Mail.Ru' IE_DESC = 'Музыка@Mail.Ru'
_VALID_URL = r'https?://my\.mail\.ru/music/search/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://my\.mail\.ru/+music/+search/+(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://my.mail.ru/music/search/black%20shadow', 'url': 'https://my.mail.ru/music/search/black%20shadow',
'info_dict': { 'info_dict': {
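The `/+` additions in the URL patterns above exist to accept the doubled slashes seen in the new test URLs. A quick self-contained check of the updated video pattern against those URLs:

    import re

    MAILRU_VIDEO_RE = r'''(?x)
                    https?://
                        (?:(?:www|m)\.)?my\.mail\.ru/+
                        (?:
                            video/.*\#video=/?(?P<idv1>(?:[^/]+/){3}\d+)|
                            (?:(?P<idv2prefix>(?:[^/]+/+){2})video/(?P<idv2suffix>[^/]+/\d+))\.html|
                            (?:video/embed|\+/video/meta)/(?P<metaid>\d+)
                        )
                    '''

    for url in (
        'https://my.mail.ru//list/sinyutin10/video/_myvideo/4.html',
        'https://my.mail.ru//list//sinyutin10/video/_myvideo/4.html',
    ):
        m = re.match(MAILRU_VIDEO_RE, url)
        print(m.group('idv2prefix'), m.group('idv2suffix'))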

View File

@ -6,7 +6,6 @@ import re
from .theplatform import ThePlatformBaseIE from .theplatform import ThePlatformBaseIE
from ..compat import ( from ..compat import (
compat_parse_qs, compat_parse_qs,
compat_str,
compat_urllib_parse_urlparse, compat_urllib_parse_urlparse,
) )
from ..utils import ( from ..utils import (
@ -114,7 +113,7 @@ class MediasetIE(ThePlatformBaseIE):
continue continue
urlh = ie._request_webpage( urlh = ie._request_webpage(
embed_url, video_id, note='Following embed URL redirect') embed_url, video_id, note='Following embed URL redirect')
embed_url = compat_str(urlh.geturl()) embed_url = urlh.geturl()
program_guid = _program_guid(_qs(embed_url)) program_guid = _program_guid(_qs(embed_url))
if program_guid: if program_guid:
entries.append(embed_url) entries.append(embed_url)
@ -123,7 +122,7 @@ class MediasetIE(ThePlatformBaseIE):
def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None): def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
for video in smil.findall(self._xpath_ns('.//video', namespace)): for video in smil.findall(self._xpath_ns('.//video', namespace)):
video.attrib['src'] = re.sub(r'(https?://vod05)t(-mediaset-it\.akamaized\.net/.+?.mpd)\?.+', r'\1\2', video.attrib['src']) video.attrib['src'] = re.sub(r'(https?://vod05)t(-mediaset-it\.akamaized\.net/.+?.mpd)\?.+', r'\1\2', video.attrib['src'])
return super()._parse_smil_formats(smil, smil_url, video_id, namespace, f4m_params, transform_rtmp_url) return super(MediasetIE, self)._parse_smil_formats(smil, smil_url, video_id, namespace, f4m_params, transform_rtmp_url)
def _real_extract(self, url): def _real_extract(self, url):
guid = self._match_id(url) guid = self._match_id(url)

View File

@ -129,7 +129,7 @@ class MediasiteIE(InfoExtractor):
query = mobj.group('query') query = mobj.group('query')
webpage, urlh = self._download_webpage_handle(url, resource_id) # XXX: add UrlReferrer? webpage, urlh = self._download_webpage_handle(url, resource_id) # XXX: add UrlReferrer?
redirect_url = compat_str(urlh.geturl()) redirect_url = urlh.geturl()
# XXX: might have also extracted UrlReferrer and QueryString from the html # XXX: might have also extracted UrlReferrer and QueryString from the html
service_path = compat_urlparse.urljoin(redirect_url, self._html_search_regex( service_path = compat_urlparse.urljoin(redirect_url, self._html_search_regex(

View File

@ -4,8 +4,8 @@ from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
parse_iso8601,
smuggle_url, smuggle_url,
parse_duration,
) )
@ -18,16 +18,18 @@ class MiTeleIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': 'FhYW1iNTE6J6H7NkQRIEzfne6t2quqPg', 'id': 'FhYW1iNTE6J6H7NkQRIEzfne6t2quqPg',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Tor, la web invisible', 'title': 'Diario de La redacción Programa 144',
'description': 'md5:3b6fce7eaa41b2d97358726378d9369f', 'description': 'md5:07c35a7b11abb05876a6a79185b58d27',
'series': 'Diario de', 'series': 'Diario de',
'season': 'La redacción', 'season': 'Season 14',
'season_number': 14, 'season_number': 14,
'season_id': 'diario_de_t14_11981', 'episode': 'Tor, la web invisible',
'episode': 'Programa 144',
'episode_number': 3, 'episode_number': 3,
'thumbnail': r're:(?i)^https?://.*\.jpg$', 'thumbnail': r're:(?i)^https?://.*\.jpg$',
'duration': 2913, 'duration': 2913,
'age_limit': 16,
'timestamp': 1471209401,
'upload_date': '20160814',
}, },
'add_ie': ['Ooyala'], 'add_ie': ['Ooyala'],
}, { }, {
@ -39,13 +41,15 @@ class MiTeleIE(InfoExtractor):
'title': 'Cuarto Milenio Temporada 6 Programa 226', 'title': 'Cuarto Milenio Temporada 6 Programa 226',
'description': 'md5:5ff132013f0cd968ffbf1f5f3538a65f', 'description': 'md5:5ff132013f0cd968ffbf1f5f3538a65f',
'series': 'Cuarto Milenio', 'series': 'Cuarto Milenio',
'season': 'Temporada 6', 'season': 'Season 6',
'season_number': 6, 'season_number': 6,
'season_id': 'cuarto_milenio_t06_12715', 'episode': 'Episode 24',
'episode': 'Programa 226',
'episode_number': 24, 'episode_number': 24,
'thumbnail': r're:(?i)^https?://.*\.jpg$', 'thumbnail': r're:(?i)^https?://.*\.jpg$',
'duration': 7313, 'duration': 7313,
'age_limit': 12,
'timestamp': 1471209021,
'upload_date': '20160814',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -54,67 +58,36 @@ class MiTeleIE(InfoExtractor):
}, { }, {
'url': 'http://www.mitele.es/series-online/la-que-se-avecina/57aac5c1c915da951a8b45ed/player', 'url': 'http://www.mitele.es/series-online/la-que-se-avecina/57aac5c1c915da951a8b45ed/player',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144-40_1006364575251/player/',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
paths = self._download_json( pre_player = self._parse_json(self._search_regex(
'https://www.mitele.es/amd/agp/web/metadata/general_configuration', r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=\s*({.+})',
video_id, 'Downloading paths JSON') webpage, 'Pre Player'), display_id)['prePlayer']
title = pre_player['title']
ooyala_s = paths['general_configuration']['api_configuration']['ooyala_search'] video = pre_player['video']
base_url = ooyala_s.get('base_url', 'cdn-search-mediaset.carbyne.ps.ooyala.com') video_id = video['dataMediaId']
full_path = ooyala_s.get('full_path', '/search/v1/full/providers/') content = pre_player.get('content') or {}
source = self._download_json( info = content.get('info') or {}
'%s://%s%s%s/docs/%s' % (
ooyala_s.get('protocol', 'https'), base_url, full_path,
ooyala_s.get('provider_id', '104951'), video_id),
video_id, 'Downloading data JSON', query={
'include_titles': 'Series,Season',
'product_name': ooyala_s.get('product_name', 'test'),
'format': 'full',
})['hits']['hits'][0]['_source']
embedCode = source['offers'][0]['embed_codes'][0]
titles = source['localizable_titles'][0]
title = titles.get('title_medium') or titles['title_long']
description = titles.get('summary_long') or titles.get('summary_medium')
def get(key1, key2):
value1 = source.get(key1)
if not value1 or not isinstance(value1, list):
return
if not isinstance(value1[0], dict):
return
return value1[0].get(key2)
series = get('localizable_titles_series', 'title_medium')
season = get('localizable_titles_season', 'title_medium')
season_number = int_or_none(source.get('season_number'))
season_id = source.get('season_id')
episode = titles.get('title_sort_name')
episode_number = int_or_none(source.get('episode_number'))
duration = parse_duration(get('videos', 'duration'))
return { return {
'_type': 'url_transparent', '_type': 'url_transparent',
# for some reason only HLS is supported # for some reason only HLS is supported
'url': smuggle_url('ooyala:' + embedCode, {'supportedformats': 'm3u8,dash'}), 'url': smuggle_url('ooyala:' + video_id, {'supportedformats': 'm3u8,dash'}),
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': description, 'description': info.get('synopsis'),
'series': series, 'series': content.get('title'),
'season': season, 'season_number': int_or_none(info.get('season_number')),
'season_number': season_number, 'episode': content.get('subtitle'),
'season_id': season_id, 'episode_number': int_or_none(info.get('episode_number')),
'episode': episode, 'duration': int_or_none(info.get('duration')),
'episode_number': episode_number, 'thumbnail': video.get('dataPoster'),
'duration': duration, 'age_limit': int_or_none(info.get('rating')),
'thumbnail': get('images', 'url'), 'timestamp': parse_iso8601(pre_player.get('publishedTime')),
} }

View File

@ -1,5 +1,8 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
str_to_int, str_to_int,
@ -54,3 +57,23 @@ class MofosexIE(KeezMoviesIE):
}) })
return info return info
class MofosexEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?mofosex\.com/embed/?\?.*?\bvideoid=(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.mofosex.com/embed/?videoid=318131&referrer=KM',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?mofosex\.com/embed/?\?.*?\bvideoid=\d+)',
webpage)
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result(
'http://www.mofosex.com/videos/{0}/{0}.html'.format(video_id),
ie=MofosexIE.ie_key(), video_id=video_id)
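A quick illustration (not part of the extractor) of what the new MofosexEmbedIE helper does: collect embed iframe URLs from a page and map each video id back to the canonical watch URL. The sample HTML is made up for the demo:

    import re

    sample_html = '<iframe src="https://www.mofosex.com/embed/?videoid=318131&referrer=KM"></iframe>'
    embed_urls = re.findall(
        r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?mofosex\.com/embed/?\?.*?\bvideoid=\d+)',
        sample_html)
    for embed_url in embed_urls:
        video_id = re.search(r'videoid=(\d+)', embed_url).group(1)
        print('http://www.mofosex.com/videos/{0}/{0}.html'.format(video_id))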

View File

@ -26,7 +26,7 @@ class MotherlessIE(InfoExtractor):
'categories': ['Gaming', 'anal', 'reluctant', 'rough', 'Wife'], 'categories': ['Gaming', 'anal', 'reluctant', 'rough', 'Wife'],
'upload_date': '20100913', 'upload_date': '20100913',
'uploader_id': 'famouslyfuckedup', 'uploader_id': 'famouslyfuckedup',
'thumbnail': r're:http://.*\.jpg', 'thumbnail': r're:https?://.*\.jpg',
'age_limit': 18, 'age_limit': 18,
} }
}, { }, {
@ -40,7 +40,7 @@ class MotherlessIE(InfoExtractor):
'game', 'hairy'], 'game', 'hairy'],
'upload_date': '20140622', 'upload_date': '20140622',
'uploader_id': 'Sulivana7x', 'uploader_id': 'Sulivana7x',
'thumbnail': r're:http://.*\.jpg', 'thumbnail': r're:https?://.*\.jpg',
'age_limit': 18, 'age_limit': 18,
}, },
'skip': '404', 'skip': '404',
@ -54,7 +54,7 @@ class MotherlessIE(InfoExtractor):
'categories': ['superheroine heroine superher'], 'categories': ['superheroine heroine superher'],
'upload_date': '20140827', 'upload_date': '20140827',
'uploader_id': 'shade0230', 'uploader_id': 'shade0230',
'thumbnail': r're:http://.*\.jpg', 'thumbnail': r're:https?://.*\.jpg',
'age_limit': 18, 'age_limit': 18,
} }
}, { }, {
@ -76,7 +76,8 @@ class MotherlessIE(InfoExtractor):
raise ExtractorError('Video %s is for friends only' % video_id, expected=True) raise ExtractorError('Video %s is for friends only' % video_id, expected=True)
title = self._html_search_regex( title = self._html_search_regex(
r'id="view-upload-title">\s+([^<]+)<', webpage, 'title') (r'(?s)<div[^>]+\bclass=["\']media-meta-title[^>]+>(.+?)</div>',
r'id="view-upload-title">\s+([^<]+)<'), webpage, 'title')
video_url = (self._html_search_regex( video_url = (self._html_search_regex(
(r'setup\(\{\s*["\']file["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', (r'setup\(\{\s*["\']file["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
r'fileurl\s*=\s*(["\'])(?P<url>(?:(?!\1).)+)\1'), r'fileurl\s*=\s*(["\'])(?P<url>(?:(?!\1).)+)\1'),
@ -84,14 +85,15 @@ class MotherlessIE(InfoExtractor):
or 'http://cdn4.videos.motherlessmedia.com/videos/%s.mp4?fs=opencloud' % video_id) or 'http://cdn4.videos.motherlessmedia.com/videos/%s.mp4?fs=opencloud' % video_id)
age_limit = self._rta_search(webpage) age_limit = self._rta_search(webpage)
view_count = str_to_int(self._html_search_regex( view_count = str_to_int(self._html_search_regex(
r'<strong>Views</strong>\s+([^<]+)<', (r'>(\d+)\s+Views<', r'<strong>Views</strong>\s+([^<]+)<'),
webpage, 'view count', fatal=False)) webpage, 'view count', fatal=False))
like_count = str_to_int(self._html_search_regex( like_count = str_to_int(self._html_search_regex(
r'<strong>Favorited</strong>\s+([^<]+)<', (r'>(\d+)\s+Favorites<', r'<strong>Favorited</strong>\s+([^<]+)<'),
webpage, 'like count', fatal=False)) webpage, 'like count', fatal=False))
upload_date = self._html_search_regex( upload_date = self._html_search_regex(
r'<strong>Uploaded</strong>\s+([^<]+)<', webpage, 'upload date') (r'class=["\']count[^>]+>(\d+\s+[a-zA-Z]{3}\s+\d{4})<',
r'<strong>Uploaded</strong>\s+([^<]+)<'), webpage, 'upload date')
if 'Ago' in upload_date: if 'Ago' in upload_date:
days = int(re.search(r'([0-9]+)', upload_date).group(1)) days = int(re.search(r'([0-9]+)', upload_date).group(1))
upload_date = (datetime.datetime.now() - datetime.timedelta(days=days)).strftime('%Y%m%d') upload_date = (datetime.datetime.now() - datetime.timedelta(days=days)).strftime('%Y%m%d')

View File

@ -14,20 +14,27 @@ from ..utils import (
class MSNIE(InfoExtractor): class MSNIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?msn\.com/(?:[^/]+/)+(?P<display_id>[^/]+)/[a-z]{2}-(?P<id>[\da-zA-Z]+)' _VALID_URL = r'https?://(?:(?:www|preview)\.)?msn\.com/(?:[^/]+/)+(?P<display_id>[^/]+)/[a-z]{2}-(?P<id>[\da-zA-Z]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.msn.com/en-ae/foodanddrink/joinourtable/criminal-minds-shemar-moore-shares-a-touching-goodbye-message/vp-BBqQYNE', 'url': 'https://www.msn.com/en-in/money/video/7-ways-to-get-rid-of-chest-congestion/vi-BBPxU6d',
'md5': '8442f66c116cbab1ff7098f986983458', 'md5': '087548191d273c5c55d05028f8d2cbcd',
'info_dict': { 'info_dict': {
'id': 'BBqQYNE', 'id': 'BBPxU6d',
'display_id': 'criminal-minds-shemar-moore-shares-a-touching-goodbye-message', 'display_id': '7-ways-to-get-rid-of-chest-congestion',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Criminal Minds - Shemar Moore Shares A Touching Goodbye Message', 'title': 'Seven ways to get rid of chest congestion',
'description': 'md5:e8e89b897b222eb33a6b5067a8f1bc25', 'description': '7 Ways to Get Rid of Chest Congestion',
'duration': 104, 'duration': 88,
'uploader': 'CBS Entertainment', 'uploader': 'Health',
'uploader_id': 'IT0X5aoJ6bJgYerJXSDCgFmYPB1__54v', 'uploader_id': 'BBPrMqa',
}, },
}, {
# Article, multiple Dailymotion Embeds
'url': 'https://www.msn.com/en-in/money/sports/hottest-football-wags-greatest-footballers-turned-managers-and-more/ar-BBpc7Nl',
'info_dict': {
'id': 'BBpc7Nl',
},
'playlist_mincount': 4,
}, { }, {
'url': 'http://www.msn.com/en-ae/news/offbeat/meet-the-nine-year-old-self-made-millionaire/ar-BBt6ZKf', 'url': 'http://www.msn.com/en-ae/news/offbeat/meet-the-nine-year-old-self-made-millionaire/ar-BBt6ZKf',
'only_matching': True, 'only_matching': True,
@ -43,93 +50,122 @@ class MSNIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}, { }, {
# Vidible(AOL) Embed # Vidible(AOL) Embed
'url': 'https://www.msn.com/en-us/video/animals/yellowstone-park-staffers-catch-deer-engaged-in-behavior-they-cant-explain/vi-AAGfdg1', 'url': 'https://www.msn.com/en-us/money/other/jupiter-is-about-to-come-so-close-you-can-see-its-moons-with-binoculars/vi-AACqsHR',
'only_matching': True, 'only_matching': True,
}, { }, {
# Dailymotion Embed # Dailymotion Embed
'url': 'https://www.msn.com/es-ve/entretenimiento/watch/winston-salem-paire-refait-des-siennes-en-perdant-sa-raquette-au-service/vp-AAG704L', 'url': 'https://www.msn.com/es-ve/entretenimiento/watch/winston-salem-paire-refait-des-siennes-en-perdant-sa-raquette-au-service/vp-AAG704L',
'only_matching': True, 'only_matching': True,
}, {
# YouTube Embed
'url': 'https://www.msn.com/en-in/money/news/meet-vikram-%E2%80%94-chandrayaan-2s-lander/vi-AAGUr0v',
'only_matching': True,
}, {
# NBCSports Embed
'url': 'https://www.msn.com/en-us/money/football_nfl/week-13-preview-redskins-vs-panthers/vi-BBXsCDb',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) display_id, page_id = re.match(self._VALID_URL, url).groups()
video_id, display_id = mobj.group('id', 'display_id')
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
video = self._parse_json( entries = []
self._search_regex( for _, metadata in re.findall(r'data-metadata\s*=\s*(["\'])(?P<data>.+?)\1', webpage):
r'data-metadata\s*=\s*(["\'])(?P<data>.+?)\1', video = self._parse_json(unescapeHTML(metadata), display_id)
webpage, 'video data', default='{}', group='data'),
display_id, transform_source=unescapeHTML)
if not video: provider_id = video.get('providerId')
player_name = video.get('playerName')
if player_name and provider_id:
entry = None
if player_name == 'AOL':
if provider_id.startswith('http'):
provider_id = self._search_regex(
r'https?://delivery\.vidible\.tv/video/redirect/([0-9a-f]{24})',
provider_id, 'vidible id')
entry = self.url_result(
'aol-video:' + provider_id, 'Aol', provider_id)
elif player_name == 'Dailymotion':
entry = self.url_result(
'https://www.dailymotion.com/video/' + provider_id,
'Dailymotion', provider_id)
elif player_name == 'YouTube':
entry = self.url_result(
provider_id, 'Youtube', provider_id)
elif player_name == 'NBCSports':
entry = self.url_result(
'http://vplayer.nbcsports.com/p/BxmELC/nbcsports_embed/select/media/' + provider_id,
'NBCSportsVPlayer', provider_id)
if entry:
entries.append(entry)
continue
video_id = video['uuid']
title = video['title']
formats = []
for file_ in video.get('videoFiles', []):
format_url = file_.get('url')
if not format_url:
continue
if 'format=m3u8-aapl' in format_url:
# m3u8_native should not be used here until
# https://github.com/ytdl-org/youtube-dl/issues/9913 is fixed
formats.extend(self._extract_m3u8_formats(
format_url, display_id, 'mp4',
m3u8_id='hls', fatal=False))
elif 'format=mpd-time-csf' in format_url:
formats.extend(self._extract_mpd_formats(
format_url, display_id, 'dash', fatal=False))
elif '.ism' in format_url:
if format_url.endswith('.ism'):
format_url += '/manifest'
formats.extend(self._extract_ism_formats(
format_url, display_id, 'mss', fatal=False))
else:
format_id = file_.get('formatCode')
formats.append({
'url': format_url,
'ext': 'mp4',
'format_id': format_id,
'width': int_or_none(file_.get('width')),
'height': int_or_none(file_.get('height')),
'vbr': int_or_none(self._search_regex(r'_(\d+)\.mp4', format_url, 'vbr', default=None)),
'preference': 1 if format_id == '1001' else None,
})
self._sort_formats(formats)
subtitles = {}
for file_ in video.get('files', []):
format_url = file_.get('url')
format_code = file_.get('formatCode')
if not format_url or not format_code:
continue
if compat_str(format_code) == '3100':
subtitles.setdefault(file_.get('culture', 'en'), []).append({
'ext': determine_ext(format_url, 'ttml'),
'url': format_url,
})
entries.append({
'id': video_id,
'display_id': display_id,
'title': title,
'description': video.get('description'),
'thumbnail': video.get('headlineImage', {}).get('url'),
'duration': int_or_none(video.get('durationSecs')),
'uploader': video.get('sourceFriendly'),
'uploader_id': video.get('providerId'),
'creator': video.get('creator'),
'subtitles': subtitles,
'formats': formats,
})
if not entries:
error = unescapeHTML(self._search_regex( error = unescapeHTML(self._search_regex(
r'data-error=(["\'])(?P<error>.+?)\1', r'data-error=(["\'])(?P<error>.+?)\1',
webpage, 'error', group='error')) webpage, 'error', group='error'))
raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True) raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
player_name = video.get('playerName') return self.playlist_result(entries, page_id)
if player_name:
provider_id = video.get('providerId')
if provider_id:
if player_name == 'AOL':
return self.url_result(
'aol-video:' + provider_id, 'Aol', provider_id)
elif player_name == 'Dailymotion':
return self.url_result(
'https://www.dailymotion.com/video/' + provider_id,
'Dailymotion', provider_id)
title = video['title']
formats = []
for file_ in video.get('videoFiles', []):
format_url = file_.get('url')
if not format_url:
continue
if 'm3u8' in format_url:
# m3u8_native should not be used here until
# https://github.com/ytdl-org/youtube-dl/issues/9913 is fixed
m3u8_formats = self._extract_m3u8_formats(
format_url, display_id, 'mp4',
m3u8_id='hls', fatal=False)
formats.extend(m3u8_formats)
elif determine_ext(format_url) == 'ism':
formats.extend(self._extract_ism_formats(
format_url + '/Manifest', display_id, 'mss', fatal=False))
else:
formats.append({
'url': format_url,
'ext': 'mp4',
'format_id': 'http',
'width': int_or_none(file_.get('width')),
'height': int_or_none(file_.get('height')),
})
self._sort_formats(formats)
subtitles = {}
for file_ in video.get('files', []):
format_url = file_.get('url')
format_code = file_.get('formatCode')
if not format_url or not format_code:
continue
if compat_str(format_code) == '3100':
subtitles.setdefault(file_.get('culture', 'en'), []).append({
'ext': determine_ext(format_url, 'ttml'),
'url': format_url,
})
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': video.get('description'),
'thumbnail': video.get('headlineImage', {}).get('url'),
'duration': int_or_none(video.get('durationSecs')),
'uploader': video.get('sourceFriendly'),
'uploader_id': video.get('providerId'),
'creator': video.get('creator'),
'subtitles': subtitles,
'formats': formats,
}
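The reworked MSN extractor above turns every data-metadata blob on the page into one playlist entry and delegates third-party players by playerName. A simplified Python 3 sketch of that loop, with `webpage` assumed to be the downloaded article HTML (only two of the four delegation branches are shown):

    import html
    import json
    import re

    def msn_entries(webpage):
        entries = []
        for _, metadata in re.findall(r'data-metadata\s*=\s*(["\'])(.+?)\1', webpage):
            video = json.loads(html.unescape(metadata))
            player_name = video.get('playerName')
            provider_id = video.get('providerId')
            if player_name == 'Dailymotion' and provider_id:
                entries.append('https://www.dailymotion.com/video/' + provider_id)
            elif player_name == 'YouTube' and provider_id:
                entries.append(provider_id)       # already a YouTube id/URL
            elif video.get('uuid'):
                entries.append(video['uuid'])     # hosted natively; videoFiles carry the formats
        return entries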

View File

@ -1,66 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
int_or_none,
js_to_json,
mimetype2ext,
)
class MusicPlayOnIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?musicplayon\.com/play(?:-touch)?\?(?:v|pl=\d+&play)=(?P<id>\d+)'
_TESTS = [{
'url': 'http://en.musicplayon.com/play?v=433377',
'md5': '00cdcdea1726abdf500d1e7fd6dd59bb',
'info_dict': {
'id': '433377',
'ext': 'mp4',
'title': 'Rick Ross - Interview On Chelsea Lately (2014)',
'description': 'Rick Ross Interview On Chelsea Lately',
'duration': 342,
'uploader': 'ultrafish',
},
}, {
'url': 'http://en.musicplayon.com/play?pl=102&play=442629',
'only_matching': True,
}]
_URL_TEMPLATE = 'http://en.musicplayon.com/play?v=%s'
def _real_extract(self, url):
video_id = self._match_id(url)
url = self._URL_TEMPLATE % video_id
page = self._download_webpage(url, video_id)
title = self._og_search_title(page)
description = self._og_search_description(page)
thumbnail = self._og_search_thumbnail(page)
duration = self._html_search_meta('video:duration', page, 'duration', fatal=False)
view_count = self._og_search_property('count', page, fatal=False)
uploader = self._html_search_regex(
r'<div>by&nbsp;<a href="[^"]+" class="purple">([^<]+)</a></div>', page, 'uploader', fatal=False)
sources = self._parse_json(
self._search_regex(r'setup\[\'_sources\'\]\s*=\s*([^;]+);', page, 'video sources'),
video_id, transform_source=js_to_json)
formats = [{
'url': compat_urlparse.urljoin(url, source['src']),
'ext': mimetype2ext(source.get('type')),
'format_note': source.get('data-res'),
} for source in sources]
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'duration': int_or_none(duration),
'view_count': int_or_none(view_count),
'formats': formats,
}

View File

@ -1,68 +1,33 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
clean_html,
dict_get,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
parse_duration,
try_get,
update_url_query, update_url_query,
) )
class NaverIE(InfoExtractor): class NaverBaseIE(InfoExtractor):
_VALID_URL = r'https?://(?:m\.)?tv(?:cast)?\.naver\.com/v/(?P<id>\d+)' _CAPTION_EXT_RE = r'\.(?:ttml|vtt)'
_TESTS = [{ def _extract_video_info(self, video_id, vid, key):
'url': 'http://tv.naver.com/v/81652',
'info_dict': {
'id': '81652',
'ext': 'mp4',
'title': '[9월 모의고사 해설강의][수학_김상희] 수학 A형 16~20번',
'description': '합격불변의 법칙 메가스터디 | 메가스터디 수학 김상희 선생님이 9월 모의고사 수학A형 16번에서 20번까지 해설강의를 공개합니다.',
'upload_date': '20130903',
},
}, {
'url': 'http://tv.naver.com/v/395837',
'md5': '638ed4c12012c458fefcddfd01f173cd',
'info_dict': {
'id': '395837',
'ext': 'mp4',
'title': '9년이 지나도 아픈 기억, 전효성의 아버지',
'description': 'md5:5bf200dcbf4b66eb1b350d1eb9c753f7',
'upload_date': '20150519',
},
'skip': 'Georestricted',
}, {
'url': 'http://tvcast.naver.com/v/81652',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
vid = self._search_regex(
r'videoId["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
'video id', fatal=None, group='value')
in_key = self._search_regex(
r'inKey["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
'key', default=None, group='value')
if not vid or not in_key:
error = self._html_search_regex(
r'(?s)<div class="(?:nation_error|nation_box|error_box)">\s*(?:<!--.*?-->)?\s*<p class="[^"]+">(?P<msg>.+?)</p>\s*</div>',
webpage, 'error', default=None)
if error:
raise ExtractorError(error, expected=True)
raise ExtractorError('couldn\'t extract vid and key')
video_data = self._download_json( video_data = self._download_json(
'http://play.rmcnmv.naver.com/vod/play/v2.0/' + vid, 'http://play.rmcnmv.naver.com/vod/play/v2.0/' + vid,
video_id, query={ video_id, query={
'key': in_key, 'key': key,
}) })
meta = video_data['meta'] meta = video_data['meta']
title = meta['subject'] title = meta['subject']
formats = [] formats = []
get_list = lambda x: try_get(video_data, lambda y: y[x + 's']['list'], list) or []
def extract_formats(streams, stream_type, query={}): def extract_formats(streams, stream_type, query={}):
for stream in streams: for stream in streams:
@ -73,7 +38,7 @@ class NaverIE(InfoExtractor):
encoding_option = stream.get('encodingOption', {}) encoding_option = stream.get('encodingOption', {})
bitrate = stream.get('bitrate', {}) bitrate = stream.get('bitrate', {})
formats.append({ formats.append({
'format_id': '%s_%s' % (stream.get('type') or stream_type, encoding_option.get('id') or encoding_option.get('name')), 'format_id': '%s_%s' % (stream.get('type') or stream_type, dict_get(encoding_option, ('name', 'id'))),
'url': stream_url, 'url': stream_url,
'width': int_or_none(encoding_option.get('width')), 'width': int_or_none(encoding_option.get('width')),
'height': int_or_none(encoding_option.get('height')), 'height': int_or_none(encoding_option.get('height')),
@ -83,7 +48,7 @@ class NaverIE(InfoExtractor):
'protocol': 'm3u8_native' if stream_type == 'HLS' else None, 'protocol': 'm3u8_native' if stream_type == 'HLS' else None,
}) })
extract_formats(video_data.get('videos', {}).get('list', []), 'H264') extract_formats(get_list('video'), 'H264')
for stream_set in video_data.get('streams', []): for stream_set in video_data.get('streams', []):
query = {} query = {}
for param in stream_set.get('keys', []): for param in stream_set.get('keys', []):
@ -101,28 +66,101 @@ class NaverIE(InfoExtractor):
'mp4', 'm3u8_native', m3u8_id=stream_type, fatal=False)) 'mp4', 'm3u8_native', m3u8_id=stream_type, fatal=False))
self._sort_formats(formats) self._sort_formats(formats)
replace_ext = lambda x, y: re.sub(self._CAPTION_EXT_RE, '.' + y, x)
def get_subs(caption_url):
if re.search(self._CAPTION_EXT_RE, caption_url):
return [{
'url': replace_ext(caption_url, 'ttml'),
}, {
'url': replace_ext(caption_url, 'vtt'),
}]
else:
return [{'url': caption_url}]
automatic_captions = {}
subtitles = {} subtitles = {}
for caption in video_data.get('captions', {}).get('list', []): for caption in get_list('caption'):
caption_url = caption.get('source') caption_url = caption.get('source')
if not caption_url: if not caption_url:
continue continue
subtitles.setdefault(caption.get('language') or caption.get('locale'), []).append({ sub_dict = automatic_captions if caption.get('type') == 'auto' else subtitles
'url': caption_url, sub_dict.setdefault(dict_get(caption, ('locale', 'language')), []).extend(get_subs(caption_url))
})
upload_date = self._search_regex( user = meta.get('user', {})
r'<span[^>]+class="date".*?(\d{4}\.\d{2}\.\d{2})',
webpage, 'upload date', fatal=False)
if upload_date:
upload_date = upload_date.replace('.', '')
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'description': self._og_search_description(webpage), 'automatic_captions': automatic_captions,
'thumbnail': meta.get('cover', {}).get('source') or self._og_search_thumbnail(webpage), 'thumbnail': try_get(meta, lambda x: x['cover']['source']),
'view_count': int_or_none(meta.get('count')), 'view_count': int_or_none(meta.get('count')),
'upload_date': upload_date, 'uploader_id': user.get('id'),
'uploader': user.get('name'),
'uploader_url': user.get('url'),
} }
class NaverIE(NaverBaseIE):
_VALID_URL = r'https?://(?:m\.)?tv(?:cast)?\.naver\.com/(?:v|embed)/(?P<id>\d+)'
_GEO_BYPASS = False
_TESTS = [{
'url': 'http://tv.naver.com/v/81652',
'info_dict': {
'id': '81652',
'ext': 'mp4',
'title': '[9월 모의고사 해설강의][수학_김상희] 수학 A형 16~20번',
'description': '메가스터디 수학 김상희 선생님이 9월 모의고사 수학A형 16번에서 20번까지 해설강의를 공개합니다.',
'timestamp': 1378200754,
'upload_date': '20130903',
'uploader': '메가스터디, 합격불변의 법칙',
'uploader_id': 'megastudy',
},
}, {
'url': 'http://tv.naver.com/v/395837',
'md5': '8a38e35354d26a17f73f4e90094febd3',
'info_dict': {
'id': '395837',
'ext': 'mp4',
'title': '9년이 지나도 아픈 기억, 전효성의 아버지',
'description': 'md5:eb6aca9d457b922e43860a2a2b1984d3',
'timestamp': 1432030253,
'upload_date': '20150519',
'uploader': '4가지쇼 시즌2',
'uploader_id': 'wrappinguser29',
},
'skip': 'Georestricted',
}, {
'url': 'http://tvcast.naver.com/v/81652',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
content = self._download_json(
'https://tv.naver.com/api/json/v/' + video_id,
video_id, headers=self.geo_verification_headers())
player_info_json = content.get('playerInfoJson') or {}
current_clip = player_info_json.get('currentClip') or {}
vid = current_clip.get('videoId')
in_key = current_clip.get('inKey')
if not vid or not in_key:
player_auth = try_get(player_info_json, lambda x: x['playerOption']['auth'])
if player_auth == 'notCountry':
self.raise_geo_restricted(countries=['KR'])
elif player_auth == 'notLogin':
self.raise_login_required()
raise ExtractorError('couldn\'t extract vid and key')
info = self._extract_video_info(video_id, vid, in_key)
info.update({
'description': clean_html(current_clip.get('description')),
'timestamp': int_or_none(current_clip.get('firstExposureTime'), 1000),
'duration': parse_duration(current_clip.get('displayPlayTime')),
'like_count': int_or_none(current_clip.get('recommendPoint')),
'age_limit': 19 if current_clip.get('adult') else None,
})
return info
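The caption handling added to NaverBaseIE above swaps a recognised caption extension for both TTML and WebVTT variants and routes auto-generated captions into a separate dictionary. A minimal standalone sketch of that idea, with a simplified extension pattern (not the extractor's actual _CAPTION_EXT_RE) and made-up caption data:

import re

CAPTION_EXT_RE = r'\.(?:ttml|vtt)'  # simplified stand-in for NaverBaseIE._CAPTION_EXT_RE

def get_subs(caption_url):
    # expose both TTML and WebVTT variants when the URL ends in a known caption extension
    if re.search(CAPTION_EXT_RE, caption_url):
        replace_ext = lambda url, ext: re.sub(CAPTION_EXT_RE, '.' + ext, url)
        return [{'url': replace_ext(caption_url, 'ttml')},
                {'url': replace_ext(caption_url, 'vtt')}]
    return [{'url': caption_url}]

subtitles, automatic_captions = {}, {}
captions = [  # made-up sample in the shape the extractor receives
    {'source': 'https://example.com/caps/ko.vtt', 'locale': 'ko', 'type': 'manual'},
    {'source': 'https://example.com/caps/en.vtt', 'locale': 'en', 'type': 'auto'},
]
for caption in captions:
    sub_dict = automatic_captions if caption.get('type') == 'auto' else subtitles
    sub_dict.setdefault(caption.get('locale'), []).extend(get_subs(caption['source']))

print(subtitles)
print(automatic_captions)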


@ -87,11 +87,25 @@ class NBCIE(AdobePassIE):
def _real_extract(self, url): def _real_extract(self, url):
permalink, video_id = re.match(self._VALID_URL, url).groups() permalink, video_id = re.match(self._VALID_URL, url).groups()
permalink = 'http' + compat_urllib_parse_unquote(permalink) permalink = 'http' + compat_urllib_parse_unquote(permalink)
response = self._download_json( video_data = self._download_json(
'https://friendship.nbc.co/v2/graphql', video_id, query={ 'https://friendship.nbc.co/v2/graphql', video_id, query={
'query': '''{ 'query': '''query bonanzaPage(
page(name: "%s", platform: web, type: VIDEO, userId: "0") { $app: NBCUBrands! = nbc
data { $name: String!
$oneApp: Boolean
$platform: SupportedPlatforms! = web
$type: EntityPageType! = VIDEO
$userId: String!
) {
bonanzaPage(
app: $app
name: $name
oneApp: $oneApp
platform: $platform
type: $type
userId: $userId
) {
metadata {
... on VideoPageData { ... on VideoPageData {
description description
episodeNumber episodeNumber
@ -100,15 +114,20 @@ class NBCIE(AdobePassIE):
mpxAccountId mpxAccountId
mpxGuid mpxGuid
rating rating
resourceId
seasonNumber seasonNumber
secondaryTitle secondaryTitle
seriesShortTitle seriesShortTitle
} }
} }
} }
}''' % permalink, }''',
}) 'variables': json.dumps({
video_data = response['data']['page']['data'] 'name': permalink,
'oneApp': True,
'userId': '0',
}),
})['data']['bonanzaPage']['metadata']
query = { query = {
'mbr': 'true', 'mbr': 'true',
'manifest': 'm3u', 'manifest': 'm3u',
@ -117,8 +136,8 @@ class NBCIE(AdobePassIE):
title = video_data['secondaryTitle'] title = video_data['secondaryTitle']
if video_data.get('locked'): if video_data.get('locked'):
resource = self._get_mvpd_resource( resource = self._get_mvpd_resource(
'nbcentertainment', title, video_id, video_data.get('resourceId') or 'nbcentertainment',
video_data.get('rating')) title, video_id, video_data.get('rating'))
query['auth'] = self._extract_mvpd_auth( query['auth'] = self._extract_mvpd_auth(
url, video_id, 'nbcentertainment', resource) url, video_id, 'nbcentertainment', resource)
theplatform_url = smuggle_url(update_url_query( theplatform_url = smuggle_url(update_url_query(
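The NBC change above replaces a string-interpolated GraphQL query with a parameterised bonanzaPage query whose values travel in a JSON-encoded variables field. A rough sketch of how such a request could be assembled; the query text is condensed, the permalink value is illustrative, and nothing is actually sent here:

import json
from urllib.parse import urlencode  # youtube-dl itself goes through compat helpers

QUERY = '''query bonanzaPage(
  $app: NBCUBrands! = nbc
  $name: String!
  $oneApp: Boolean
  $platform: SupportedPlatforms! = web
  $type: EntityPageType! = VIDEO
  $userId: String!
) {
  bonanzaPage(app: $app, name: $name, oneApp: $oneApp,
              platform: $platform, type: $type, userId: $userId) {
    metadata { ... on VideoPageData { mpxGuid mpxAccountId secondaryTitle } }
  }
}'''

permalink = 'https://www.nbc.com/some-show/video/some-episode/12345'  # illustrative
params = {
    'query': QUERY,
    'variables': json.dumps({'name': permalink, 'oneApp': True, 'userId': '0'}),
}
print('https://friendship.nbc.co/v2/graphql?' + urlencode(params))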


@ -7,8 +7,11 @@ from .common import InfoExtractor
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
int_or_none, int_or_none,
merge_dicts,
parse_iso8601, parse_iso8601,
qualities, qualities,
try_get,
urljoin,
) )
@ -85,21 +88,25 @@ class NDRIE(NDRBaseIE):
def _extract_embed(self, webpage, display_id): def _extract_embed(self, webpage, display_id):
embed_url = self._html_search_meta( embed_url = self._html_search_meta(
'embedURL', webpage, 'embed URL', fatal=True) 'embedURL', webpage, 'embed URL',
default=None) or self._search_regex(
r'\bembedUrl["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'embed URL', group='url')
description = self._search_regex( description = self._search_regex(
r'<p[^>]+itemprop="description">([^<]+)</p>', r'<p[^>]+itemprop="description">([^<]+)</p>',
webpage, 'description', default=None) or self._og_search_description(webpage) webpage, 'description', default=None) or self._og_search_description(webpage)
timestamp = parse_iso8601( timestamp = parse_iso8601(
self._search_regex( self._search_regex(
r'<span[^>]+itemprop="(?:datePublished|uploadDate)"[^>]+content="([^"]+)"', r'<span[^>]+itemprop="(?:datePublished|uploadDate)"[^>]+content="([^"]+)"',
webpage, 'upload date', fatal=False)) webpage, 'upload date', default=None))
return { info = self._search_json_ld(webpage, display_id, default={})
return merge_dicts({
'_type': 'url_transparent', '_type': 'url_transparent',
'url': embed_url, 'url': embed_url,
'display_id': display_id, 'display_id': display_id,
'description': description, 'description': description,
'timestamp': timestamp, 'timestamp': timestamp,
} }, info)
class NJoyIE(NDRBaseIE): class NJoyIE(NDRBaseIE):
@ -220,11 +227,17 @@ class NDREmbedBaseIE(InfoExtractor):
upload_date = ppjson.get('config', {}).get('publicationDate') upload_date = ppjson.get('config', {}).get('publicationDate')
duration = int_or_none(config.get('duration')) duration = int_or_none(config.get('duration'))
thumbnails = [{ thumbnails = []
'id': thumbnail.get('quality') or thumbnail_id, poster = try_get(config, lambda x: x['poster'], dict) or {}
'url': thumbnail['src'], for thumbnail_id, thumbnail in poster.items():
'preference': quality_key(thumbnail.get('quality')), thumbnail_url = urljoin(url, thumbnail.get('src'))
} for thumbnail_id, thumbnail in config.get('poster', {}).items() if thumbnail.get('src')] if not thumbnail_url:
continue
thumbnails.append({
'id': thumbnail.get('quality') or thumbnail_id,
'url': thumbnail_url,
'preference': quality_key(thumbnail.get('quality')),
})
return { return {
'id': video_id, 'id': video_id,
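The NDR embed change above builds the thumbnail list from the poster dictionary, resolving relative src values against the page URL and skipping entries without a usable src. A small sketch of that loop with a made-up page URL and poster dictionary:

from urllib.parse import urljoin

url = 'https://www.ndr.de/fernsehen/some-video.html'  # page URL, illustrative
poster = {  # made-up stand-in for config['poster']
    'l': {'src': '/resources/img/large.jpg', 'quality': 'l'},
    's': {'quality': 's'},  # no src -> skipped
}

thumbnails = []
for thumbnail_id, thumbnail in poster.items():
    src = thumbnail.get('src')
    if not src:
        continue
    thumbnails.append({
        'id': thumbnail.get('quality') or thumbnail_id,
        'url': urljoin(url, src),  # resolves relative paths against the page URL
    })
print(thumbnails)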


@ -6,7 +6,7 @@ from .common import InfoExtractor
class NhkVodIE(InfoExtractor): class NhkVodIE(InfoExtractor):
_VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand/(?P<type>video|audio)/(?P<id>\d{7}|[a-z]+-\d{8}-\d+)' _VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand/(?P<type>video|audio)/(?P<id>\d{7}|[^/]+?-\d{8}-\d+)'
# Content available only for a limited period of time. Visit # Content available only for a limited period of time. Visit
# https://www3.nhk.or.jp/nhkworld/en/ondemand/ for working samples. # https://www3.nhk.or.jp/nhkworld/en/ondemand/ for working samples.
_TESTS = [{ _TESTS = [{
@ -30,8 +30,11 @@ class NhkVodIE(InfoExtractor):
}, { }, {
'url': 'https://www3.nhk.or.jp/nhkworld/fr/ondemand/audio/plugin-20190404-1/', 'url': 'https://www3.nhk.or.jp/nhkworld/fr/ondemand/audio/plugin-20190404-1/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/audio/j_art-20150903-1/',
'only_matching': True,
}] }]
_API_URL_TEMPLATE = 'https://api.nhk.or.jp/nhkworld/%sod%slist/v7/episode/%s/%s/all%s.json' _API_URL_TEMPLATE = 'https://api.nhk.or.jp/nhkworld/%sod%slist/v7a/episode/%s/%s/all%s.json'
def _real_extract(self, url): def _real_extract(self, url):
lang, m_type, episode_id = re.match(self._VALID_URL, url).groups() lang, m_type, episode_id = re.match(self._VALID_URL, url).groups()
@ -82,15 +85,9 @@ class NhkVodIE(InfoExtractor):
audio = episode['audio'] audio = episode['audio']
audio_path = audio['audio'] audio_path = audio['audio']
info['formats'] = self._extract_m3u8_formats( info['formats'] = self._extract_m3u8_formats(
'https://nhks-vh.akamaihd.net/i%s/master.m3u8' % audio_path, 'https://nhkworld-vh.akamaihd.net/i%s/master.m3u8' % audio_path,
episode_id, 'm4a', m3u8_id='hls', fatal=False) episode_id, 'm4a', entry_protocol='m3u8_native',
for proto in ('rtmpt', 'rtmp'): m3u8_id='hls', fatal=False)
info['formats'].append({
'ext': 'flv',
'format_id': proto,
'url': '%s://flv.nhk.or.jp/ondemand/mp4:flv%s' % (proto, audio_path),
'vcodec': 'none',
})
for f in info['formats']: for f in info['formats']:
f['language'] = lang f['language'] = lang
return info return info


@ -5,13 +5,12 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from .ooyala import OoyalaIE from .ooyala import OoyalaIE
from ..utils import unescapeHTML
class NintendoIE(InfoExtractor): class NintendoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nintendo\.com/games/detail/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?nintendo\.com/(?:games/detail|nintendo-direct)/(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.nintendo.com/games/detail/yEiAzhU2eQI1KZ7wOHhngFoAHc1FpHwj', 'url': 'https://www.nintendo.com/games/detail/duck-hunt-wii-u/',
'info_dict': { 'info_dict': {
'id': 'MzMmticjp0VPzO3CCj4rmFOuohEuEWoW', 'id': 'MzMmticjp0VPzO3CCj4rmFOuohEuEWoW',
'ext': 'flv', 'ext': 'flv',
@ -28,7 +27,19 @@ class NintendoIE(InfoExtractor):
'id': 'tokyo-mirage-sessions-fe-wii-u', 'id': 'tokyo-mirage-sessions-fe-wii-u',
'title': 'Tokyo Mirage Sessions ♯FE', 'title': 'Tokyo Mirage Sessions ♯FE',
}, },
'playlist_count': 3, 'playlist_count': 4,
}, {
'url': 'https://www.nintendo.com/nintendo-direct/09-04-2019/',
'info_dict': {
'id': 'J2bXdmaTE6fe3dWJTPcc7m23FNbc_A1V',
'ext': 'mp4',
'title': 'Switch_ROS_ND0904-H264.mov',
'duration': 2324.758,
},
'params': {
'skip_download': True,
},
'add_ie': ['Ooyala'],
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -39,8 +50,11 @@ class NintendoIE(InfoExtractor):
entries = [ entries = [
OoyalaIE._build_url_result(m.group('code')) OoyalaIE._build_url_result(m.group('code'))
for m in re.finditer( for m in re.finditer(
r'class=(["\'])embed-video\1[^>]+data-video-code=(["\'])(?P<code>(?:(?!\2).)+)\2', r'data-(?:video-id|directVideoId)=(["\'])(?P<code>(?:(?!\1).)+)\1', webpage)]
webpage)]
title = self._html_search_regex(
r'(?s)<(?:span|div)[^>]+class="(?:title|wrapper)"[^>]*>.*?<h1>(.+?)</h1>',
webpage, 'title', fatal=False)
return self.playlist_result( return self.playlist_result(
entries, page_id, unescapeHTML(self._og_search_title(webpage, fatal=False))) entries, page_id, title)
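The Nintendo change above now collects Ooyala codes from data-video-id / data-directVideoId attributes instead of the old embed-video markup. A quick illustration of the new pattern against a made-up page snippet (the two codes are taken from the tests in the diff):

import re

webpage = '''
<div data-video-id="MzMmticjp0VPzO3CCj4rmFOuohEuEWoW"></div>
<div data-directVideoId='J2bXdmaTE6fe3dWJTPcc7m23FNbc_A1V'></div>
'''  # made-up snippet shaped like the markup the regex targets

codes = [m.group('code') for m in re.finditer(
    r'data-(?:video-id|directVideoId)=(["\'])(?P<code>(?:(?!\1).)+)\1', webpage)]
print(codes)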


@ -6,6 +6,7 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
clean_html, clean_html,
determine_ext,
int_or_none, int_or_none,
js_to_json, js_to_json,
qualities, qualities,
@ -18,7 +19,7 @@ class NovaEmbedIE(InfoExtractor):
_VALID_URL = r'https?://media\.cms\.nova\.cz/embed/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://media\.cms\.nova\.cz/embed/(?P<id>[^/?#&]+)'
_TEST = { _TEST = {
'url': 'https://media.cms.nova.cz/embed/8o0n0r?autoplay=1', 'url': 'https://media.cms.nova.cz/embed/8o0n0r?autoplay=1',
'md5': 'b3834f6de5401baabf31ed57456463f7', 'md5': 'ee009bafcc794541570edd44b71cbea3',
'info_dict': { 'info_dict': {
'id': '8o0n0r', 'id': '8o0n0r',
'ext': 'mp4', 'ext': 'mp4',
@ -33,36 +34,76 @@ class NovaEmbedIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
bitrates = self._parse_json( duration = None
self._search_regex(
r'(?s)(?:src|bitrates)\s*=\s*({.+?})\s*;', webpage, 'formats'),
video_id, transform_source=js_to_json)
QUALITIES = ('lq', 'mq', 'hq', 'hd')
quality_key = qualities(QUALITIES)
formats = [] formats = []
for format_id, format_list in bitrates.items():
if not isinstance(format_list, list): player = self._parse_json(
continue self._search_regex(
for format_url in format_list: r'Player\.init\s*\([^,]+,\s*({.+?})\s*,\s*{.+?}\s*\)\s*;',
format_url = url_or_none(format_url) webpage, 'player', default='{}'), video_id, fatal=False)
if not format_url: if player:
continue for format_id, format_list in player['tracks'].items():
f = { if not isinstance(format_list, list):
'url': format_url, format_list = [format_list]
} for format_dict in format_list:
f_id = format_id if not isinstance(format_dict, dict):
for quality in QUALITIES: continue
if '%s.mp4' % quality in format_url: format_url = url_or_none(format_dict.get('src'))
f_id += '-%s' % quality format_type = format_dict.get('type')
f.update({ ext = determine_ext(format_url)
'quality': quality_key(quality), if (format_type == 'application/x-mpegURL'
'format_note': quality.upper(), or format_id == 'HLS' or ext == 'm3u8'):
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False))
elif (format_type == 'application/dash+xml'
or format_id == 'DASH' or ext == 'mpd'):
formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id='dash', fatal=False))
else:
formats.append({
'url': format_url,
}) })
break duration = int_or_none(player.get('duration'))
f['format_id'] = f_id else:
formats.append(f) # Old path, not actual as of 08.04.2020
bitrates = self._parse_json(
self._search_regex(
r'(?s)(?:src|bitrates)\s*=\s*({.+?})\s*;', webpage, 'formats'),
video_id, transform_source=js_to_json)
QUALITIES = ('lq', 'mq', 'hq', 'hd')
quality_key = qualities(QUALITIES)
for format_id, format_list in bitrates.items():
if not isinstance(format_list, list):
format_list = [format_list]
for format_url in format_list:
format_url = url_or_none(format_url)
if not format_url:
continue
if format_id == 'hls':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, ext='mp4',
entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False))
continue
f = {
'url': format_url,
}
f_id = format_id
for quality in QUALITIES:
if '%s.mp4' % quality in format_url:
f_id += '-%s' % quality
f.update({
'quality': quality_key(quality),
'format_note': quality.upper(),
})
break
f['format_id'] = f_id
formats.append(f)
self._sort_formats(formats) self._sort_formats(formats)
title = self._og_search_title( title = self._og_search_title(
@ -75,7 +116,8 @@ class NovaEmbedIE(InfoExtractor):
r'poster\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage, r'poster\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
'thumbnail', fatal=False, group='value') 'thumbnail', fatal=False, group='value')
duration = int_or_none(self._search_regex( duration = int_or_none(self._search_regex(
r'videoDuration\s*:\s*(\d+)', webpage, 'duration', fatal=False)) r'videoDuration\s*:\s*(\d+)', webpage, 'duration',
default=duration))
return { return {
'id': video_id, 'id': video_id,
@ -91,7 +133,7 @@ class NovaIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^.]+\.)?(?P<site>tv(?:noviny)?|tn|novaplus|vymena|fanda|krasna|doma|prask)\.nova\.cz/(?:[^/]+/)+(?P<id>[^/]+?)(?:\.html|/|$)' _VALID_URL = r'https?://(?:[^.]+\.)?(?P<site>tv(?:noviny)?|tn|novaplus|vymena|fanda|krasna|doma|prask)\.nova\.cz/(?:[^/]+/)+(?P<id>[^/]+?)(?:\.html|/|$)'
_TESTS = [{ _TESTS = [{
'url': 'http://tn.nova.cz/clanek/tajemstvi-ukryte-v-podzemi-specialni-nemocnice-v-prazske-krci.html#player_13260', 'url': 'http://tn.nova.cz/clanek/tajemstvi-ukryte-v-podzemi-specialni-nemocnice-v-prazske-krci.html#player_13260',
'md5': '1dd7b9d5ea27bc361f110cd855a19bd3', 'md5': '249baab7d0104e186e78b0899c7d5f28',
'info_dict': { 'info_dict': {
'id': '1757139', 'id': '1757139',
'display_id': 'tajemstvi-ukryte-v-podzemi-specialni-nemocnice-v-prazske-krci', 'display_id': 'tajemstvi-ukryte-v-podzemi-specialni-nemocnice-v-prazske-krci',
@ -113,7 +155,8 @@ class NovaIE(InfoExtractor):
'params': { 'params': {
# rtmp download # rtmp download
'skip_download': True, 'skip_download': True,
} },
'skip': 'gone',
}, { }, {
# media.cms.nova.cz embed # media.cms.nova.cz embed
'url': 'https://novaplus.nova.cz/porad/ulice/epizoda/18760-2180-dil', 'url': 'https://novaplus.nova.cz/porad/ulice/epizoda/18760-2180-dil',
@ -128,6 +171,7 @@ class NovaIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
'add_ie': [NovaEmbedIE.ie_key()], 'add_ie': [NovaEmbedIE.ie_key()],
'skip': 'CHYBA 404: STRÁNKA NENALEZENA',
}, { }, {
'url': 'http://sport.tn.nova.cz/clanek/sport/hokej/nhl/zivot-jde-dal-hodnotil-po-vyrazeni-z-playoff-jiri-sekac.html', 'url': 'http://sport.tn.nova.cz/clanek/sport/hokej/nhl/zivot-jde-dal-hodnotil-po-vyrazeni-z-playoff-jiri-sekac.html',
'only_matching': True, 'only_matching': True,
@ -152,14 +196,29 @@ class NovaIE(InfoExtractor):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
description = clean_html(self._og_search_description(webpage, default=None))
if site == 'novaplus':
upload_date = unified_strdate(self._search_regex(
r'(\d{1,2}-\d{1,2}-\d{4})$', display_id, 'upload date', default=None))
elif site == 'fanda':
upload_date = unified_strdate(self._search_regex(
r'<span class="date_time">(\d{1,2}\.\d{1,2}\.\d{4})', webpage, 'upload date', default=None))
else:
upload_date = None
# novaplus # novaplus
embed_id = self._search_regex( embed_id = self._search_regex(
r'<iframe[^>]+\bsrc=["\'](?:https?:)?//media\.cms\.nova\.cz/embed/([^/?#&]+)', r'<iframe[^>]+\bsrc=["\'](?:https?:)?//media\.cms\.nova\.cz/embed/([^/?#&]+)',
webpage, 'embed url', default=None) webpage, 'embed url', default=None)
if embed_id: if embed_id:
return self.url_result( return {
'https://media.cms.nova.cz/embed/%s' % embed_id, '_type': 'url_transparent',
ie=NovaEmbedIE.ie_key(), video_id=embed_id) 'url': 'https://media.cms.nova.cz/embed/%s' % embed_id,
'ie_key': NovaEmbedIE.ie_key(),
'id': embed_id,
'description': description,
'upload_date': upload_date
}
video_id = self._search_regex( video_id = self._search_regex(
[r"(?:media|video_id)\s*:\s*'(\d+)'", [r"(?:media|video_id)\s*:\s*'(\d+)'",
@ -233,18 +292,8 @@ class NovaIE(InfoExtractor):
self._sort_formats(formats) self._sort_formats(formats)
title = mediafile.get('meta', {}).get('title') or self._og_search_title(webpage) title = mediafile.get('meta', {}).get('title') or self._og_search_title(webpage)
description = clean_html(self._og_search_description(webpage, default=None))
thumbnail = config.get('poster') thumbnail = config.get('poster')
if site == 'novaplus':
upload_date = unified_strdate(self._search_regex(
r'(\d{1,2}-\d{1,2}-\d{4})$', display_id, 'upload date', default=None))
elif site == 'fanda':
upload_date = unified_strdate(self._search_regex(
r'<span class="date_time">(\d{1,2}\.\d{1,2}\.\d{4})', webpage, 'upload date', default=None))
else:
upload_date = None
return { return {
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
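The reworked NovaEmbedIE above first reads the Player.init tracks and picks an HLS, DASH or progressive handler from the MIME type, the track group name or the file extension, keeping the old bitrates path only as a fallback. A simplified sketch of that dispatch decision, with determine_ext approximated by a plain suffix check and made-up track data:

def classify(format_id, format_dict):
    # mirrors the branching added in NovaEmbedIE, without the actual format extraction
    format_url = format_dict.get('src') or ''
    format_type = format_dict.get('type')
    ext = format_url.rpartition('.')[2]  # crude stand-in for determine_ext()
    if format_type == 'application/x-mpegURL' or format_id == 'HLS' or ext == 'm3u8':
        return 'hls'
    if format_type == 'application/dash+xml' or format_id == 'DASH' or ext == 'mpd':
        return 'dash'
    return 'progressive'

tracks = {  # made-up sample in the shape of player['tracks']
    'HLS': [{'src': 'https://example.com/stream/index.m3u8', 'type': 'application/x-mpegURL'}],
    'DASH': [{'src': 'https://example.com/stream/manifest.mpd'}],
    'MP4': [{'src': 'https://example.com/stream/hq.mp4'}],
}
for format_id, format_list in tracks.items():
    for format_dict in format_list:
        print(format_id, '->', classify(format_id, format_dict))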


@ -4,6 +4,7 @@ from .common import InfoExtractor
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
qualities, qualities,
url_or_none,
) )
@ -48,6 +49,10 @@ class NprIE(InfoExtractor):
}, },
}], }],
'expected_warnings': ['Failed to download m3u8 information'], 'expected_warnings': ['Failed to download m3u8 information'],
}, {
# multimedia, no formats, stream
'url': 'https://www.npr.org/2020/02/14/805476846/laura-stevenson-tiny-desk-concert',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -95,6 +100,17 @@ class NprIE(InfoExtractor):
'format_id': format_id, 'format_id': format_id,
'quality': quality(format_id), 'quality': quality(format_id),
}) })
for stream_id, stream_entry in media.get('stream', {}).items():
if not isinstance(stream_entry, dict):
continue
if stream_id != 'hlsUrl':
continue
stream_url = url_or_none(stream_entry.get('$text'))
if not stream_url:
continue
formats.extend(self._extract_m3u8_formats(
stream_url, stream_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
self._sort_formats(formats) self._sort_formats(formats)
entries.append({ entries.append({
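The NPR change above also walks the media's stream dictionary, keeping only hlsUrl entries whose $text holds a URL; the real code then hands the surviving URLs to _extract_m3u8_formats. A standalone sketch of that filtering step with made-up data:

media = {  # made-up sample shaped like an NPR story's multimedia entry
    'stream': {
        'hlsUrl': {'$text': 'https://example.com/tinydesk/master.m3u8'},
        'rtmpUrl': {'$text': 'rtmp://example.com/tinydesk'},
        'broken': 'not-a-dict',
    },
}

hls_urls = []
for stream_id, stream_entry in media.get('stream', {}).items():
    if not isinstance(stream_entry, dict):
        continue
    if stream_id != 'hlsUrl':
        continue
    stream_url = stream_entry.get('$text')
    if not stream_url:
        continue
    hls_urls.append(stream_url)
print(hls_urls)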


@ -12,6 +12,7 @@ from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
JSON_LD_RE, JSON_LD_RE,
js_to_json,
NO_DEFAULT, NO_DEFAULT,
parse_age_limit, parse_age_limit,
parse_duration, parse_duration,
@ -105,6 +106,7 @@ class NRKBaseIE(InfoExtractor):
MESSAGES = { MESSAGES = {
'ProgramRightsAreNotReady': 'Du kan dessverre ikke se eller høre programmet', 'ProgramRightsAreNotReady': 'Du kan dessverre ikke se eller høre programmet',
'ProgramRightsHasExpired': 'Programmet har gått ut', 'ProgramRightsHasExpired': 'Programmet har gått ut',
'NoProgramRights': 'Ikke tilgjengelig',
'ProgramIsGeoBlocked': 'NRK har ikke rettigheter til å vise dette programmet utenfor Norge', 'ProgramIsGeoBlocked': 'NRK har ikke rettigheter til å vise dette programmet utenfor Norge',
} }
message_type = data.get('messageType', '') message_type = data.get('messageType', '')
@ -255,6 +257,17 @@ class NRKTVIE(NRKBaseIE):
''' % _EPISODE_RE ''' % _EPISODE_RE
_API_HOSTS = ('psapi-ne.nrk.no', 'psapi-we.nrk.no') _API_HOSTS = ('psapi-ne.nrk.no', 'psapi-we.nrk.no')
_TESTS = [{ _TESTS = [{
'url': 'https://tv.nrk.no/program/MDDP12000117',
'md5': '8270824df46ec629b66aeaa5796b36fb',
'info_dict': {
'id': 'MDDP12000117AA',
'ext': 'mp4',
'title': 'Alarm Trolltunga',
'description': 'md5:46923a6e6510eefcce23d5ef2a58f2ce',
'duration': 2223,
'age_limit': 6,
},
}, {
'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014', 'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
'md5': '9a167e54d04671eb6317a37b7bc8a280', 'md5': '9a167e54d04671eb6317a37b7bc8a280',
'info_dict': { 'info_dict': {
@ -266,6 +279,7 @@ class NRKTVIE(NRKBaseIE):
'series': '20 spørsmål', 'series': '20 spørsmål',
'episode': '23.05.2014', 'episode': '23.05.2014',
}, },
'skip': 'NoProgramRights',
}, { }, {
'url': 'https://tv.nrk.no/program/mdfp15000514', 'url': 'https://tv.nrk.no/program/mdfp15000514',
'info_dict': { 'info_dict': {
@ -370,7 +384,24 @@ class NRKTVIE(NRKBaseIE):
class NRKTVEpisodeIE(InfoExtractor): class NRKTVEpisodeIE(InfoExtractor):
_VALID_URL = r'https?://tv\.nrk\.no/serie/(?P<id>[^/]+/sesong/\d+/episode/\d+)' _VALID_URL = r'https?://tv\.nrk\.no/serie/(?P<id>[^/]+/sesong/\d+/episode/\d+)'
_TEST = { _TESTS = [{
'url': 'https://tv.nrk.no/serie/hellums-kro/sesong/1/episode/2',
'info_dict': {
'id': 'MUHH36005220BA',
'ext': 'mp4',
'title': 'Kro, krig og kjærlighet 2:6',
'description': 'md5:b32a7dc0b1ed27c8064f58b97bda4350',
'duration': 1563,
'series': 'Hellums kro',
'season_number': 1,
'episode_number': 2,
'episode': '2:6',
'age_limit': 6,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://tv.nrk.no/serie/backstage/sesong/1/episode/8', 'url': 'https://tv.nrk.no/serie/backstage/sesong/1/episode/8',
'info_dict': { 'info_dict': {
'id': 'MSUI14000816AA', 'id': 'MSUI14000816AA',
@ -386,7 +417,8 @@ class NRKTVEpisodeIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
} 'skip': 'ProgramRightsHasExpired',
}]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
@ -409,7 +441,7 @@ class NRKTVSerieBaseIE(InfoExtractor):
(r'INITIAL_DATA(?:_V\d)?_*\s*=\s*({.+?})\s*;', (r'INITIAL_DATA(?:_V\d)?_*\s*=\s*({.+?})\s*;',
r'({.+?})\s*,\s*"[^"]+"\s*\)\s*</script>'), r'({.+?})\s*,\s*"[^"]+"\s*\)\s*</script>'),
webpage, 'config', default='{}' if not fatal else NO_DEFAULT), webpage, 'config', default='{}' if not fatal else NO_DEFAULT),
display_id, fatal=False) display_id, fatal=False, transform_source=js_to_json)
if not config: if not config:
return return
return try_get( return try_get(
@ -479,6 +511,14 @@ class NRKTVSeriesIE(NRKTVSerieBaseIE):
_VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/serie/(?P<id>[^/]+)' _VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/serie/(?P<id>[^/]+)'
_ITEM_RE = r'(?:data-season=["\']|id=["\']season-)(?P<id>\d+)' _ITEM_RE = r'(?:data-season=["\']|id=["\']season-)(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'https://tv.nrk.no/serie/blank',
'info_dict': {
'id': 'blank',
'title': 'Blank',
'description': 'md5:7664b4e7e77dc6810cd3bca367c25b6e',
},
'playlist_mincount': 30,
}, {
# new layout, seasons # new layout, seasons
'url': 'https://tv.nrk.no/serie/backstage', 'url': 'https://tv.nrk.no/serie/backstage',
'info_dict': { 'info_dict': {
@ -648,7 +688,7 @@ class NRKSkoleIE(InfoExtractor):
_TESTS = [{ _TESTS = [{
'url': 'https://www.nrk.no/skole/?page=search&q=&mediaId=14099', 'url': 'https://www.nrk.no/skole/?page=search&q=&mediaId=14099',
'md5': '6bc936b01f9dd8ed45bc58b252b2d9b6', 'md5': '18c12c3d071953c3bf8d54ef6b2587b7',
'info_dict': { 'info_dict': {
'id': '6021', 'id': '6021',
'ext': 'mp4', 'ext': 'mp4',


@ -23,8 +23,8 @@ class NRLTVIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
q_data = self._parse_json(self._search_regex( q_data = self._parse_json(self._html_search_regex(
r"(?s)q-data='({.+?})'", webpage, 'player data'), display_id) r'(?s)q-data="({.+?})"', webpage, 'player data'), display_id)
ooyala_id = q_data['videoId'] ooyala_id = q_data['videoId']
return self.url_result( return self.url_result(
'ooyala:' + ooyala_id, 'Ooyala', ooyala_id, q_data.get('title')) 'ooyala:' + ooyala_id, 'Ooyala', ooyala_id, q_data.get('title'))


@ -3,9 +3,10 @@ from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
clean_html,
xpath_text,
int_or_none, int_or_none,
strip_or_none,
unescapeHTML,
xpath_text,
) )
@ -47,10 +48,10 @@ class NTVRuIE(InfoExtractor):
'duration': 1496, 'duration': 1496,
}, },
}, { }, {
'url': 'http://www.ntv.ru/kino/Koma_film', 'url': 'https://www.ntv.ru/kino/Koma_film/m70281/o336036/video/',
'md5': 'f825770930937aa7e5aca0dc0d29319a', 'md5': 'e9c7cde24d9d3eaed545911a04e6d4f4',
'info_dict': { 'info_dict': {
'id': '1007609', 'id': '1126480',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Остросюжетный фильм «Кома»', 'title': 'Остросюжетный фильм «Кома»',
'description': 'Остросюжетный фильм «Кома»', 'description': 'Остросюжетный фильм «Кома»',
@ -68,6 +69,10 @@ class NTVRuIE(InfoExtractor):
'thumbnail': r're:^http://.*\.jpg', 'thumbnail': r're:^http://.*\.jpg',
'duration': 2590, 'duration': 2590,
}, },
}, {
# Schemeless file URL
'url': 'https://www.ntv.ru/video/1797442',
'only_matching': True,
}] }]
_VIDEO_ID_REGEXES = [ _VIDEO_ID_REGEXES = [
@ -96,37 +101,31 @@ class NTVRuIE(InfoExtractor):
'http://www.ntv.ru/vi%s/' % video_id, 'http://www.ntv.ru/vi%s/' % video_id,
video_id, 'Downloading video XML') video_id, 'Downloading video XML')
title = clean_html(xpath_text(player, './data/title', 'title', fatal=True)) title = strip_or_none(unescapeHTML(xpath_text(player, './data/title', 'title', fatal=True)))
description = clean_html(xpath_text(player, './data/description', 'description'))
video = player.find('./data/video') video = player.find('./data/video')
video_id = xpath_text(video, './id', 'video id')
thumbnail = xpath_text(video, './splash', 'thumbnail')
duration = int_or_none(xpath_text(video, './totaltime', 'duration'))
view_count = int_or_none(xpath_text(video, './views', 'view count'))
token = self._download_webpage(
'http://stat.ntv.ru/services/access/token',
video_id, 'Downloading access token')
formats = [] formats = []
for format_id in ['', 'hi', 'webm']: for format_id in ['', 'hi', 'webm']:
file_ = video.find('./%sfile' % format_id) file_ = xpath_text(video, './%sfile' % format_id)
if file_ is None: if not file_:
continue continue
size = video.find('./%ssize' % format_id) if file_.startswith('//'):
file_ = self._proto_relative_url(file_)
elif not file_.startswith('http'):
file_ = 'http://media.ntv.ru/vod/' + file_
formats.append({ formats.append({
'url': 'http://media2.ntv.ru/vod/%s&tok=%s' % (file_.text, token), 'url': file_,
'filesize': int_or_none(size.text if size is not None else None), 'filesize': int_or_none(xpath_text(video, './%ssize' % format_id)),
}) })
self._sort_formats(formats) self._sort_formats(formats)
return { return {
'id': video_id, 'id': xpath_text(video, './id'),
'title': title, 'title': title,
'description': description, 'description': strip_or_none(unescapeHTML(xpath_text(player, './data/description'))),
'thumbnail': thumbnail, 'thumbnail': xpath_text(video, './splash'),
'duration': duration, 'duration': int_or_none(xpath_text(video, './totaltime')),
'view_count': view_count, 'view_count': int_or_none(xpath_text(video, './views')),
'formats': formats, 'formats': formats,
} }
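The NTVRu rewrite above stops requesting an access token and instead normalises the file paths from the XML: protocol-relative URLs get a scheme and bare paths are prefixed with the media host. A small sketch of that normalisation; the media.ntv.ru prefix comes from the diff, the sample paths are made up:

def normalise_file_url(file_):
    # mirrors the branching in NTVRuIE._real_extract
    if file_.startswith('//'):
        return 'http:' + file_  # stand-in for InfoExtractor._proto_relative_url
    if not file_.startswith('http'):
        return 'http://media.ntv.ru/vod/' + file_
    return file_

for sample in ('//vi.ntv.ru/vod/123.mp4',
               '2/123456.mp4',
               'http://media2.ntv.ru/vod/abc.mp4'):
    print(normalise_file_url(sample))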


@ -69,10 +69,10 @@ class NYTimesBaseIE(InfoExtractor):
'width': int_or_none(video.get('width')), 'width': int_or_none(video.get('width')),
'height': int_or_none(video.get('height')), 'height': int_or_none(video.get('height')),
'filesize': get_file_size(video.get('file_size') or video.get('fileSize')), 'filesize': get_file_size(video.get('file_size') or video.get('fileSize')),
'tbr': int_or_none(video.get('bitrate'), 1000), 'tbr': int_or_none(video.get('bitrate'), 1000) or None,
'ext': ext, 'ext': ext,
}) })
self._sort_formats(formats) self._sort_formats(formats, ('height', 'width', 'filesize', 'tbr', 'fps', 'format_id'))
thumbnails = [] thumbnails = []
for image in video_data.get('images', []): for image in video_data.get('images', []):


@ -1,12 +1,12 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import base64
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_b64decode, compat_b64decode,
compat_str, compat_str,
compat_urllib_parse_urlencode,
) )
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
@ -21,9 +21,9 @@ from ..utils import (
class OoyalaBaseIE(InfoExtractor): class OoyalaBaseIE(InfoExtractor):
_PLAYER_BASE = 'http://player.ooyala.com/' _PLAYER_BASE = 'http://player.ooyala.com/'
_CONTENT_TREE_BASE = _PLAYER_BASE + 'player_api/v1/content_tree/' _CONTENT_TREE_BASE = _PLAYER_BASE + 'player_api/v1/content_tree/'
_AUTHORIZATION_URL_TEMPLATE = _PLAYER_BASE + 'sas/player_api/v2/authorization/embed_code/%s/%s?' _AUTHORIZATION_URL_TEMPLATE = _PLAYER_BASE + 'sas/player_api/v2/authorization/embed_code/%s/%s'
def _extract(self, content_tree_url, video_id, domain='example.org', supportedformats=None, embed_token=None): def _extract(self, content_tree_url, video_id, domain=None, supportedformats=None, embed_token=None):
content_tree = self._download_json(content_tree_url, video_id)['content_tree'] content_tree = self._download_json(content_tree_url, video_id)['content_tree']
metadata = content_tree[list(content_tree)[0]] metadata = content_tree[list(content_tree)[0]]
embed_code = metadata['embed_code'] embed_code = metadata['embed_code']
@ -31,59 +31,62 @@ class OoyalaBaseIE(InfoExtractor):
title = metadata['title'] title = metadata['title']
auth_data = self._download_json( auth_data = self._download_json(
self._AUTHORIZATION_URL_TEMPLATE % (pcode, embed_code) self._AUTHORIZATION_URL_TEMPLATE % (pcode, embed_code),
+ compat_urllib_parse_urlencode({ video_id, headers=self.geo_verification_headers(), query={
'domain': domain, 'domain': domain or 'player.ooyala.com',
'supportedFormats': supportedformats or 'mp4,rtmp,m3u8,hds,dash,smooth', 'supportedFormats': supportedformats or 'mp4,rtmp,m3u8,hds,dash,smooth',
'embedToken': embed_token, 'embedToken': embed_token,
}), video_id, headers=self.geo_verification_headers()) })['authorization_data'][embed_code]
cur_auth_data = auth_data['authorization_data'][embed_code]
urls = [] urls = []
formats = [] formats = []
if cur_auth_data['authorized']: streams = auth_data.get('streams') or [{
for stream in cur_auth_data['streams']: 'delivery_type': 'hls',
url_data = try_get(stream, lambda x: x['url']['data'], compat_str) 'url': {
if not url_data: 'data': base64.b64encode(('http://player.ooyala.com/hls/player/all/%s.m3u8' % embed_code).encode()).decode(),
continue }
s_url = compat_b64decode(url_data).decode('utf-8') }]
if not s_url or s_url in urls: for stream in streams:
continue url_data = try_get(stream, lambda x: x['url']['data'], compat_str)
urls.append(s_url) if not url_data:
ext = determine_ext(s_url, None) continue
delivery_type = stream.get('delivery_type') s_url = compat_b64decode(url_data).decode('utf-8')
if delivery_type == 'hls' or ext == 'm3u8': if not s_url or s_url in urls:
formats.extend(self._extract_m3u8_formats( continue
re.sub(r'/ip(?:ad|hone)/', '/all/', s_url), embed_code, 'mp4', 'm3u8_native', urls.append(s_url)
m3u8_id='hls', fatal=False)) ext = determine_ext(s_url, None)
elif delivery_type == 'hds' or ext == 'f4m': delivery_type = stream.get('delivery_type')
formats.extend(self._extract_f4m_formats( if delivery_type == 'hls' or ext == 'm3u8':
s_url + '?hdcore=3.7.0', embed_code, f4m_id='hds', fatal=False)) formats.extend(self._extract_m3u8_formats(
elif delivery_type == 'dash' or ext == 'mpd': re.sub(r'/ip(?:ad|hone)/', '/all/', s_url), embed_code, 'mp4', 'm3u8_native',
formats.extend(self._extract_mpd_formats( m3u8_id='hls', fatal=False))
s_url, embed_code, mpd_id='dash', fatal=False)) elif delivery_type == 'hds' or ext == 'f4m':
elif delivery_type == 'smooth': formats.extend(self._extract_f4m_formats(
self._extract_ism_formats( s_url + '?hdcore=3.7.0', embed_code, f4m_id='hds', fatal=False))
s_url, embed_code, ism_id='mss', fatal=False) elif delivery_type == 'dash' or ext == 'mpd':
elif ext == 'smil': formats.extend(self._extract_mpd_formats(
formats.extend(self._extract_smil_formats( s_url, embed_code, mpd_id='dash', fatal=False))
s_url, embed_code, fatal=False)) elif delivery_type == 'smooth':
else: self._extract_ism_formats(
formats.append({ s_url, embed_code, ism_id='mss', fatal=False)
'url': s_url, elif ext == 'smil':
'ext': ext or delivery_type, formats.extend(self._extract_smil_formats(
'vcodec': stream.get('video_codec'), s_url, embed_code, fatal=False))
'format_id': delivery_type, else:
'width': int_or_none(stream.get('width')), formats.append({
'height': int_or_none(stream.get('height')), 'url': s_url,
'abr': int_or_none(stream.get('audio_bitrate')), 'ext': ext or delivery_type,
'vbr': int_or_none(stream.get('video_bitrate')), 'vcodec': stream.get('video_codec'),
'fps': float_or_none(stream.get('framerate')), 'format_id': delivery_type,
}) 'width': int_or_none(stream.get('width')),
else: 'height': int_or_none(stream.get('height')),
'abr': int_or_none(stream.get('audio_bitrate')),
'vbr': int_or_none(stream.get('video_bitrate')),
'fps': float_or_none(stream.get('framerate')),
})
if not formats and not auth_data.get('authorized'):
raise ExtractorError('%s said: %s' % ( raise ExtractorError('%s said: %s' % (
self.IE_NAME, cur_auth_data['message']), expected=True) self.IE_NAME, auth_data['message']), expected=True)
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {} subtitles = {}


@ -3,21 +3,17 @@ from __future__ import unicode_literals
import json import json
import os import os
import re
import subprocess import subprocess
import tempfile import tempfile
from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urlparse, compat_urlparse,
compat_kwargs, compat_kwargs,
) )
from ..utils import ( from ..utils import (
check_executable, check_executable,
determine_ext,
encodeArgument, encodeArgument,
ExtractorError, ExtractorError,
get_element_by_id,
get_exe_version, get_exe_version,
is_outdated_version, is_outdated_version,
std_headers, std_headers,
@ -240,262 +236,3 @@ class PhantomJSwrapper(object):
self._load_cookies() self._load_cookies()
return (html, encodeArgument(out)) return (html, encodeArgument(out))
class OpenloadIE(InfoExtractor):
_DOMAINS = r'''
(?:
openload\.(?:co|io|link|pw)|
oload\.(?:tv|best|biz|stream|site|xyz|win|download|cloud|cc|icu|fun|club|info|online|monster|press|pw|life|live|space|services|website|vip)|
oladblock\.(?:services|xyz|me)|openloed\.co
)
'''
_VALID_URL = r'''(?x)
https?://
(?P<host>
(?:www\.)?
%s
)/
(?:f|embed)/
(?P<id>[a-zA-Z0-9-_]+)
''' % _DOMAINS
_EMBED_WORD = 'embed'
_STREAM_WORD = 'f'
_REDIR_WORD = 'stream'
_URL_IDS = ('streamurl', 'streamuri', 'streamurj')
_TESTS = [{
'url': 'https://openload.co/f/kUEfGclsU9o',
'md5': 'bf1c059b004ebc7a256f89408e65c36e',
'info_dict': {
'id': 'kUEfGclsU9o',
'ext': 'mp4',
'title': 'skyrim_no-audio_1080.mp4',
'thumbnail': r're:^https?://.*\.jpg$',
},
}, {
'url': 'https://openload.co/embed/rjC09fkPLYs',
'info_dict': {
'id': 'rjC09fkPLYs',
'ext': 'mp4',
'title': 'movie.mp4',
'thumbnail': r're:^https?://.*\.jpg$',
'subtitles': {
'en': [{
'ext': 'vtt',
}],
},
},
'params': {
'skip_download': True, # test subtitles only
},
}, {
'url': 'https://openload.co/embed/kUEfGclsU9o/skyrim_no-audio_1080.mp4',
'only_matching': True,
}, {
'url': 'https://openload.io/f/ZAn6oz-VZGE/',
'only_matching': True,
}, {
'url': 'https://openload.co/f/_-ztPaZtMhM/',
'only_matching': True,
}, {
# unavailable via https://openload.co/f/Sxz5sADo82g/, different layout
# for title and ext
'url': 'https://openload.co/embed/Sxz5sADo82g/',
'only_matching': True,
}, {
# unavailable via https://openload.co/embed/e-Ixz9ZR5L0/ but available
# via https://openload.co/f/e-Ixz9ZR5L0/
'url': 'https://openload.co/f/e-Ixz9ZR5L0/',
'only_matching': True,
}, {
'url': 'https://oload.tv/embed/KnG-kKZdcfY/',
'only_matching': True,
}, {
'url': 'http://www.openload.link/f/KnG-kKZdcfY',
'only_matching': True,
}, {
'url': 'https://oload.stream/f/KnG-kKZdcfY',
'only_matching': True,
}, {
'url': 'https://oload.xyz/f/WwRBpzW8Wtk',
'only_matching': True,
}, {
'url': 'https://oload.win/f/kUEfGclsU9o',
'only_matching': True,
}, {
'url': 'https://oload.download/f/kUEfGclsU9o',
'only_matching': True,
}, {
'url': 'https://oload.cloud/f/4ZDnBXRWiB8',
'only_matching': True,
}, {
# Its title has not got its extension but url has it
'url': 'https://oload.download/f/N4Otkw39VCw/Tomb.Raider.2018.HDRip.XviD.AC3-EVO.avi.mp4',
'only_matching': True,
}, {
'url': 'https://oload.cc/embed/5NEAbI2BDSk',
'only_matching': True,
}, {
'url': 'https://oload.icu/f/-_i4y_F_Hs8',
'only_matching': True,
}, {
'url': 'https://oload.fun/f/gb6G1H4sHXY',
'only_matching': True,
}, {
'url': 'https://oload.club/f/Nr1L-aZ2dbQ',
'only_matching': True,
}, {
'url': 'https://oload.info/f/5NEAbI2BDSk',
'only_matching': True,
}, {
'url': 'https://openload.pw/f/WyKgK8s94N0',
'only_matching': True,
}, {
'url': 'https://oload.pw/f/WyKgK8s94N0',
'only_matching': True,
}, {
'url': 'https://oload.live/f/-Z58UZ-GR4M',
'only_matching': True,
}, {
'url': 'https://oload.space/f/IY4eZSst3u8/',
'only_matching': True,
}, {
'url': 'https://oload.services/embed/bs1NWj1dCag/',
'only_matching': True,
}, {
'url': 'https://oload.online/f/W8o2UfN1vNY/',
'only_matching': True,
}, {
'url': 'https://oload.monster/f/W8o2UfN1vNY/',
'only_matching': True,
}, {
'url': 'https://oload.press/embed/drTBl1aOTvk/',
'only_matching': True,
}, {
'url': 'https://oload.website/embed/drTBl1aOTvk/',
'only_matching': True,
}, {
'url': 'https://oload.life/embed/oOzZjNPw9Dc/',
'only_matching': True,
}, {
'url': 'https://oload.biz/f/bEk3Gp8ARr4/',
'only_matching': True,
}, {
'url': 'https://oload.best/embed/kkz9JgVZeWc/',
'only_matching': True,
}, {
'url': 'https://oladblock.services/f/b8NWEgkqNLI/',
'only_matching': True,
}, {
'url': 'https://oladblock.xyz/f/b8NWEgkqNLI/',
'only_matching': True,
}, {
'url': 'https://oladblock.me/f/b8NWEgkqNLI/',
'only_matching': True,
}, {
'url': 'https://openloed.co/f/b8NWEgkqNLI/',
'only_matching': True,
}, {
'url': 'https://oload.vip/f/kUEfGclsU9o',
'only_matching': True,
}]
@classmethod
def _extract_urls(cls, webpage):
return re.findall(
r'(?x)<iframe[^>]+src=["\']((?:https?://)?%s/%s/[a-zA-Z0-9-_]+)'
% (cls._DOMAINS, cls._EMBED_WORD), webpage)
def _extract_decrypted_page(self, page_url, webpage, video_id):
phantom = PhantomJSwrapper(self, required_version='2.0')
webpage, _ = phantom.get(page_url, html=webpage, video_id=video_id)
return webpage
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
video_id = mobj.group('id')
url_pattern = 'https://%s/%%s/%s/' % (host, video_id)
for path in (self._EMBED_WORD, self._STREAM_WORD):
page_url = url_pattern % path
last = path == self._STREAM_WORD
webpage = self._download_webpage(
page_url, video_id, 'Downloading %s webpage' % path,
fatal=last)
if not webpage:
continue
if 'File not found' in webpage or 'deleted by the owner' in webpage:
if not last:
continue
raise ExtractorError('File not found', expected=True, video_id=video_id)
break
webpage = self._extract_decrypted_page(page_url, webpage, video_id)
for element_id in self._URL_IDS:
decoded_id = get_element_by_id(element_id, webpage)
if decoded_id:
break
if not decoded_id:
decoded_id = self._search_regex(
(r'>\s*([\w-]+~\d{10,}~\d+\.\d+\.0\.0~[\w-]+)\s*<',
r'>\s*([\w~-]+~\d+\.\d+\.\d+\.\d+~[\w~-]+)',
r'>\s*([\w-]+~\d{10,}~(?:[a-f\d]+:){2}:~[\w-]+)\s*<',
r'>\s*([\w~-]+~[a-f0-9:]+~[\w~-]+)\s*<',
r'>\s*([\w~-]+~[a-f0-9:]+~[\w~-]+)'), webpage,
'stream URL')
video_url = 'https://%s/%s/%s?mime=true' % (host, self._REDIR_WORD, decoded_id)
title = self._og_search_title(webpage, default=None) or self._search_regex(
r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,
'title', default=None) or self._html_search_meta(
'description', webpage, 'title', fatal=True)
entries = self._parse_html5_media_entries(page_url, webpage, video_id)
entry = entries[0] if entries else {}
subtitles = entry.get('subtitles')
return {
'id': video_id,
'title': title,
'thumbnail': entry.get('thumbnail') or self._og_search_thumbnail(webpage, default=None),
'url': video_url,
'ext': determine_ext(title, None) or determine_ext(url, 'mp4'),
'subtitles': subtitles,
}
class VerystreamIE(OpenloadIE):
IE_NAME = 'verystream'
_DOMAINS = r'(?:verystream\.com|woof\.tube)'
_VALID_URL = r'''(?x)
https?://
(?P<host>
(?:www\.)?
%s
)/
(?:stream|e)/
(?P<id>[a-zA-Z0-9-_]+)
''' % _DOMAINS
_EMBED_WORD = 'e'
_STREAM_WORD = 'stream'
_REDIR_WORD = 'gettoken'
_URL_IDS = ('videolink', )
_TESTS = [{
'url': 'https://verystream.com/stream/c1GWQ9ngBBx/',
'md5': 'd3e8c5628ccb9970b65fd65269886795',
'info_dict': {
'id': 'c1GWQ9ngBBx',
'ext': 'mp4',
'title': 'Big Buck Bunny.mp4',
'thumbnail': r're:^https?://.*\.jpg$',
},
}, {
'url': 'https://verystream.com/e/c1GWQ9ngBBx/',
'only_matching': True,
}]
def _extract_decrypted_page(self, page_url, webpage, video_id):
return webpage # for Verystream, the webpage is already decrypted


@ -6,12 +6,14 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str from ..compat import compat_str
from ..utils import ( from ..utils import (
clean_html,
determine_ext, determine_ext,
float_or_none, float_or_none,
HEADRequest, HEADRequest,
int_or_none, int_or_none,
orderedSet, orderedSet,
remove_end, remove_end,
str_or_none,
strip_jsonp, strip_jsonp,
unescapeHTML, unescapeHTML,
unified_strdate, unified_strdate,
@ -88,8 +90,11 @@ class ORFTVthekIE(InfoExtractor):
format_id = '-'.join(format_id_list) format_id = '-'.join(format_id_list)
ext = determine_ext(src) ext = determine_ext(src)
if ext == 'm3u8': if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats( m3u8_formats = self._extract_m3u8_formats(
src, video_id, 'mp4', m3u8_id=format_id, fatal=False)) src, video_id, 'mp4', m3u8_id=format_id, fatal=False)
if any('/geoprotection' in f['url'] for f in m3u8_formats):
self.raise_geo_restricted()
formats.extend(m3u8_formats)
elif ext == 'f4m': elif ext == 'f4m':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
src, video_id, f4m_id=format_id, fatal=False)) src, video_id, f4m_id=format_id, fatal=False))
@ -161,44 +166,48 @@ class ORFRadioIE(InfoExtractor):
show_date = mobj.group('date') show_date = mobj.group('date')
show_id = mobj.group('show') show_id = mobj.group('show')
if station == 'fm4':
show_id = '4%s' % show_id
data = self._download_json( data = self._download_json(
'http://audioapi.orf.at/%s/api/json/current/broadcast/%s/%s' % (station, show_id, show_date), 'http://audioapi.orf.at/%s/api/json/current/broadcast/%s/%s'
show_id % (station, show_id, show_date), show_id)
)
def extract_entry_dict(info, title, subtitle): entries = []
return { for info in data['streams']:
'id': info['loopStreamId'].replace('.mp3', ''), loop_stream_id = str_or_none(info.get('loopStreamId'))
'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (station, info['loopStreamId']), if not loop_stream_id:
continue
title = str_or_none(data.get('title'))
if not title:
continue
start = int_or_none(info.get('start'), scale=1000)
end = int_or_none(info.get('end'), scale=1000)
duration = end - start if end and start else None
entries.append({
'id': loop_stream_id.replace('.mp3', ''),
'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (station, loop_stream_id),
'title': title, 'title': title,
'description': subtitle, 'description': clean_html(data.get('subtitle')),
'duration': (info['end'] - info['start']) / 1000, 'duration': duration,
'timestamp': info['start'] / 1000, 'timestamp': start,
'ext': 'mp3', 'ext': 'mp3',
'series': data.get('programTitle') 'series': data.get('programTitle'),
} })
entries = [extract_entry_dict(t, data['title'], data['subtitle']) for t in data['streams']]
return { return {
'_type': 'playlist', '_type': 'playlist',
'id': show_id, 'id': show_id,
'title': data['title'], 'title': data.get('title'),
'description': data['subtitle'], 'description': clean_html(data.get('subtitle')),
'entries': entries 'entries': entries,
} }
class ORFFM4IE(ORFRadioIE): class ORFFM4IE(ORFRadioIE):
IE_NAME = 'orf:fm4' IE_NAME = 'orf:fm4'
IE_DESC = 'radio FM4' IE_DESC = 'radio FM4'
_VALID_URL = r'https?://(?P<station>fm4)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)' _VALID_URL = r'https?://(?P<station>fm4)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>4\w+)'
_TEST = { _TEST = {
'url': 'http://fm4.orf.at/player/20170107/CC', 'url': 'http://fm4.orf.at/player/20170107/4CC',
'md5': '2b0be47375432a7ef104453432a19212', 'md5': '2b0be47375432a7ef104453432a19212',
'info_dict': { 'info_dict': {
'id': '2017-01-07_2100_tl_54_7DaysSat18_31295', 'id': '2017-01-07_2100_tl_54_7DaysSat18_31295',
@ -209,7 +218,8 @@ class ORFFM4IE(ORFRadioIE):
'timestamp': 1483819257, 'timestamp': 1483819257,
'upload_date': '20170107', 'upload_date': '20170107',
}, },
'skip': 'Shows from ORF radios are only available for 7 days.' 'skip': 'Shows from ORF radios are only available for 7 days.',
'only_matching': True,
} }
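The ORF radio rework above builds one playlist entry per loop stream, converting the millisecond start/end values into a timestamp and a duration in seconds. A tiny sketch of that arithmetic on a made-up stream record (the id and timestamp are borrowed from the FM4 test in the diff):

info = {  # made-up stand-in for one element of data['streams']
    'loopStreamId': '2017-01-07_2100_tl_54_7DaysSat18_31295.mp3',
    'start': 1483819257000,  # milliseconds
    'end': 1483826457000,
}

start = info['start'] / 1000  # int_or_none(..., scale=1000) in the real code
end = info['end'] / 1000
entry = {
    'id': info['loopStreamId'].replace('.mp3', ''),
    'timestamp': start,
    'duration': end - start if end and start else None,
}
print(entry)  # duration: 7200.0 seconds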


@ -1,99 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
qualities,
)
class PandaTVIE(InfoExtractor):
IE_DESC = '熊猫TV'
_VALID_URL = r'https?://(?:www\.)?panda\.tv/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://www.panda.tv/66666',
'info_dict': {
'id': '66666',
'title': 're:.+',
'uploader': '刘杀鸡',
'ext': 'flv',
'is_live': True,
},
'params': {
'skip_download': True,
},
'skip': 'Live stream is offline',
}, {
'url': 'https://www.panda.tv/66666',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
config = self._download_json(
'https://www.panda.tv/api_room_v2?roomid=%s' % video_id, video_id)
error_code = config.get('errno', 0)
if error_code != 0:
raise ExtractorError(
'%s returned error %s: %s'
% (self.IE_NAME, error_code, config['errmsg']),
expected=True)
data = config['data']
video_info = data['videoinfo']
# 2 = live, 3 = offline
if video_info.get('status') != '2':
raise ExtractorError(
'Live stream is offline', expected=True)
title = data['roominfo']['name']
uploader = data.get('hostinfo', {}).get('name')
room_key = video_info['room_key']
stream_addr = video_info.get(
'stream_addr', {'OD': '1', 'HD': '1', 'SD': '1'})
# Reverse engineered from web player swf
# (http://s6.pdim.gs/static/07153e425f581151.swf at the moment of
# writing).
plflag0, plflag1 = video_info['plflag'].split('_')
plflag0 = int(plflag0) - 1
if plflag1 == '21':
plflag0 = 10
plflag1 = '4'
live_panda = 'live_panda' if plflag0 < 1 else ''
plflag_auth = self._parse_json(video_info['plflag_list'], video_id)
sign = plflag_auth['auth']['sign']
ts = plflag_auth['auth']['time']
rid = plflag_auth['auth']['rid']
quality_key = qualities(['OD', 'HD', 'SD'])
suffix = ['_small', '_mid', '']
formats = []
for k, v in stream_addr.items():
if v != '1':
continue
quality = quality_key(k)
if quality <= 0:
continue
for pref, (ext, pl) in enumerate((('m3u8', '-hls'), ('flv', ''))):
formats.append({
'url': 'https://pl%s%s.live.panda.tv/live_panda/%s%s%s.%s?sign=%s&ts=%s&rid=%s'
% (pl, plflag1, room_key, live_panda, suffix[quality], ext, sign, ts, rid),
'format_id': '%s-%s' % (k, ext),
'quality': quality,
'source_preference': pref,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': self._live_title(title),
'uploader': uploader,
'formats': formats,
'is_live': True,
}


@ -8,6 +8,7 @@ from ..compat import compat_str
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
parse_resolution, parse_resolution,
str_or_none,
try_get, try_get,
unified_timestamp, unified_timestamp,
url_or_none, url_or_none,
@ -415,6 +416,7 @@ class PeerTubeIE(InfoExtractor):
peertube\.cpy\.re peertube\.cpy\.re
)''' )'''
_UUID_RE = r'[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}' _UUID_RE = r'[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}'
_API_BASE = 'https://%s/api/v1/videos/%s/%s'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
(?: (?:
peertube:(?P<host>[^:]+):| peertube:(?P<host>[^:]+):|
@ -423,26 +425,30 @@ class PeerTubeIE(InfoExtractor):
(?P<id>%s) (?P<id>%s)
''' % (_INSTANCES_RE, _UUID_RE) ''' % (_INSTANCES_RE, _UUID_RE)
_TESTS = [{ _TESTS = [{
'url': 'https://peertube.cpy.re/videos/watch/2790feb0-8120-4e63-9af3-c943c69f5e6c', 'url': 'https://framatube.org/videos/watch/9c9de5e8-0a1e-484a-b099-e80766180a6d',
'md5': '80f24ff364cc9d333529506a263e7feb', 'md5': '9bed8c0137913e17b86334e5885aacff',
'info_dict': { 'info_dict': {
'id': '2790feb0-8120-4e63-9af3-c943c69f5e6c', 'id': '9c9de5e8-0a1e-484a-b099-e80766180a6d',
'ext': 'mp4', 'ext': 'mp4',
'title': 'wow', 'title': 'What is PeerTube?',
'description': 'wow such video, so gif', 'description': 'md5:3fefb8dde2b189186ce0719fda6f7b10',
'thumbnail': r're:https?://.*\.(?:jpg|png)', 'thumbnail': r're:https?://.*\.(?:jpg|png)',
'timestamp': 1519297480, 'timestamp': 1538391166,
'upload_date': '20180222', 'upload_date': '20181001',
'uploader': 'Luclu7', 'uploader': 'Framasoft',
'uploader_id': '7fc42640-efdb-4505-a45d-a15b1a5496f1', 'uploader_id': '3',
'uploder_url': 'https://peertube.nsa.ovh/accounts/luclu7', 'uploader_url': 'https://framatube.org/accounts/framasoft',
'license': 'Unknown', 'channel': 'Les vidéos de Framasoft',
'duration': 3, 'channel_id': '2',
'channel_url': 'https://framatube.org/video-channels/bf54d359-cfad-4935-9d45-9d6be93f63e8',
'language': 'en',
'license': 'Attribution - Share Alike',
'duration': 113,
'view_count': int, 'view_count': int,
'like_count': int, 'like_count': int,
'dislike_count': int, 'dislike_count': int,
'tags': list, 'tags': ['framasoft', 'peertube'],
'categories': list, 'categories': ['Science & Technology'],
} }
}, { }, {
'url': 'https://peertube.tamanoir.foucry.net/videos/watch/0b04f13d-1e18-4f1d-814e-4979aa7c9c44', 'url': 'https://peertube.tamanoir.foucry.net/videos/watch/0b04f13d-1e18-4f1d-814e-4979aa7c9c44',
@ -484,13 +490,38 @@ class PeerTubeIE(InfoExtractor):
entries = [peertube_url] entries = [peertube_url]
return entries return entries
def _call_api(self, host, video_id, path, note=None, errnote=None, fatal=True):
return self._download_json(
self._API_BASE % (host, video_id, path), video_id,
note=note, errnote=errnote, fatal=fatal)
def _get_subtitles(self, host, video_id):
captions = self._call_api(
host, video_id, 'captions', note='Downloading captions JSON',
fatal=False)
if not isinstance(captions, dict):
return
data = captions.get('data')
if not isinstance(data, list):
return
subtitles = {}
for e in data:
language_id = try_get(e, lambda x: x['language']['id'], compat_str)
caption_url = urljoin('https://%s' % host, e.get('captionPath'))
if not caption_url:
continue
subtitles.setdefault(language_id or 'en', []).append({
'url': caption_url,
})
return subtitles
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
host = mobj.group('host') or mobj.group('host_2') host = mobj.group('host') or mobj.group('host_2')
video_id = mobj.group('id') video_id = mobj.group('id')
video = self._download_json( video = self._call_api(
'https://%s/api/v1/videos/%s' % (host, video_id), video_id) host, video_id, '', note='Downloading video JSON')
title = video['name'] title = video['name']
@ -513,10 +544,28 @@ class PeerTubeIE(InfoExtractor):
formats.append(f) formats.append(f)
self._sort_formats(formats) self._sort_formats(formats)
def account_data(field): full_description = self._call_api(
return try_get(video, lambda x: x['account'][field], compat_str) host, video_id, 'description', note='Downloading description JSON',
fatal=False)
category = try_get(video, lambda x: x['category']['label'], compat_str) description = None
if isinstance(full_description, dict):
description = str_or_none(full_description.get('description'))
if not description:
description = video.get('description')
subtitles = self.extract_subtitles(host, video_id)
def data(section, field, type_):
return try_get(video, lambda x: x[section][field], type_)
def account_data(field, type_):
return data('account', field, type_)
def channel_data(field, type_):
return data('channel', field, type_)
category = data('category', 'label', compat_str)
categories = [category] if category else None categories = [category] if category else None
nsfw = video.get('nsfw') nsfw = video.get('nsfw')
@ -528,14 +577,17 @@ class PeerTubeIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': video.get('description'), 'description': description,
'thumbnail': urljoin(url, video.get('thumbnailPath')), 'thumbnail': urljoin(url, video.get('thumbnailPath')),
'timestamp': unified_timestamp(video.get('publishedAt')), 'timestamp': unified_timestamp(video.get('publishedAt')),
'uploader': account_data('displayName'), 'uploader': account_data('displayName', compat_str),
'uploader_id': account_data('uuid'), 'uploader_id': str_or_none(account_data('id', int)),
'uploder_url': account_data('url'), 'uploader_url': url_or_none(account_data('url', compat_str)),
'license': try_get( 'channel': channel_data('displayName', compat_str),
video, lambda x: x['licence']['label'], compat_str), 'channel_id': str_or_none(channel_data('id', int)),
'channel_url': url_or_none(channel_data('url', compat_str)),
'language': data('language', 'id', compat_str),
'license': data('licence', 'label', compat_str),
'duration': int_or_none(video.get('duration')), 'duration': int_or_none(video.get('duration')),
'view_count': int_or_none(video.get('views')), 'view_count': int_or_none(video.get('views')),
'like_count': int_or_none(video.get('likes')), 'like_count': int_or_none(video.get('likes')),
@ -544,4 +596,5 @@ class PeerTubeIE(InfoExtractor):
'tags': try_get(video, lambda x: x['tags'], list), 'tags': try_get(video, lambda x: x['tags'], list),
'categories': categories, 'categories': categories,
'formats': formats, 'formats': formats,
'subtitles': subtitles
} }
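The PeerTube change above adds a captions endpoint call and maps each caption's language id and captionPath onto youtube-dl's subtitles dictionary. A sketch of that mapping with a made-up API response; the /api/v1/videos/<id>/captions path itself comes from the diff:

from urllib.parse import urljoin

host = 'framatube.org'
captions = {  # made-up response shaped like the captions endpoint's JSON
    'data': [
        {'language': {'id': 'fr'}, 'captionPath': '/static/captions/abcd-fr.vtt'},
        {'language': {}, 'captionPath': '/static/captions/abcd-und.vtt'},
        {'language': {'id': 'en'}},  # no captionPath -> skipped
    ],
}

subtitles = {}
for e in captions.get('data', []):
    caption_path = e.get('captionPath')
    if not caption_path:
        continue
    language_id = (e.get('language') or {}).get('id')
    subtitles.setdefault(language_id or 'en', []).append({
        'url': urljoin('https://%s' % host, caption_path),
    })
print(subtitles)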


@ -46,7 +46,7 @@ class PlatziBaseIE(InfoExtractor):
headers={'Referer': self._LOGIN_URL}) headers={'Referer': self._LOGIN_URL})
# login succeeded # login succeeded
if 'platzi.com/login' not in compat_str(urlh.geturl()): if 'platzi.com/login' not in urlh.geturl():
return return
login_error = self._webpage_read_content( login_error = self._webpage_read_content(


@ -20,20 +20,16 @@ class PokemonIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'The Ol Raise and Switch!', 'title': 'The Ol Raise and Switch!',
'description': 'md5:7db77f7107f98ba88401d3adc80ff7af', 'description': 'md5:7db77f7107f98ba88401d3adc80ff7af',
'timestamp': 1511824728,
'upload_date': '20171127',
}, },
'add_id': ['LimelightMedia'], 'add_id': ['LimelightMedia'],
}, { }, {
# no data-video-title # no data-video-title
'url': 'https://www.pokemon.com/us/pokemon-episodes/pokemon-movies/pokemon-the-rise-of-darkrai-2008', 'url': 'https://www.pokemon.com/fr/episodes-pokemon/films-pokemon/pokemon-lascension-de-darkrai-2008',
'info_dict': { 'info_dict': {
'id': '99f3bae270bf4e5097274817239ce9c8', 'id': 'dfbaf830d7e54e179837c50c0c6cc0e1',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Pokémon: The Rise of Darkrai', 'title': "Pokémon : L'ascension de Darkrai",
'description': 'md5:ea8fbbf942e1e497d54b19025dd57d9d', 'description': 'md5:d1dbc9e206070c3e14a06ff557659fb5',
'timestamp': 1417778347,
'upload_date': '20141205',
}, },
'add_id': ['LimelightMedia'], 'add_id': ['LimelightMedia'],
'params': { 'params': {

View File

@ -0,0 +1,99 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_b64decode,
compat_chr,
)
from ..utils import int_or_none
class PopcorntimesIE(InfoExtractor):
_VALID_URL = r'https?://popcorntimes\.tv/[^/]+/m/(?P<id>[^/]+)/(?P<display_id>[^/?#&]+)'
_TEST = {
'url': 'https://popcorntimes.tv/de/m/A1XCFvz/haensel-und-gretel-opera-fantasy',
'md5': '93f210991ad94ba8c3485950a2453257',
'info_dict': {
'id': 'A1XCFvz',
'display_id': 'haensel-und-gretel-opera-fantasy',
'ext': 'mp4',
'title': 'Hänsel und Gretel',
'description': 'md5:1b8146791726342e7b22ce8125cf6945',
'thumbnail': r're:^https?://.*\.jpg$',
'creator': 'John Paul',
'release_date': '19541009',
'duration': 4260,
'tbr': 5380,
'width': 720,
'height': 540,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id, display_id = mobj.group('id', 'display_id')
webpage = self._download_webpage(url, display_id)
title = self._search_regex(
r'<h1>([^<]+)', webpage, 'title',
default=None) or self._html_search_meta(
'ya:ovs:original_name', webpage, 'title', fatal=True)
loc = self._search_regex(
r'PCTMLOC\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage, 'loc',
group='value')
loc_b64 = ''
for c in loc:
c_ord = ord(c)
if ord('a') <= c_ord <= ord('z') or ord('A') <= c_ord <= ord('Z'):
upper = ord('Z') if c_ord <= ord('Z') else ord('z')
c_ord += 13
if upper < c_ord:
c_ord -= 26
loc_b64 += compat_chr(c_ord)
video_url = compat_b64decode(loc_b64).decode('utf-8')
description = self._html_search_regex(
r'(?s)<div[^>]+class=["\']pt-movie-desc[^>]+>(.+?)</div>', webpage,
'description', fatal=False)
thumbnail = self._search_regex(
r'<img[^>]+class=["\']video-preview[^>]+\bsrc=(["\'])(?P<value>(?:(?!\1).)+)\1',
webpage, 'thumbnail', default=None,
group='value') or self._og_search_thumbnail(webpage)
creator = self._html_search_meta(
'video:director', webpage, 'creator', default=None)
release_date = self._html_search_meta(
'video:release_date', webpage, default=None)
if release_date:
release_date = release_date.replace('-', '')
def int_meta(name):
return int_or_none(self._html_search_meta(
name, webpage, default=None))
return {
'id': video_id,
'display_id': display_id,
'url': video_url,
'title': title,
'description': description,
'thumbnail': thumbnail,
'creator': creator,
'release_date': release_date,
'duration': int_meta('video:duration'),
'tbr': int_meta('ya:ovs:bitrate'),
'width': int_meta('og:video:width'),
'height': int_meta('og:video:height'),
'http_headers': {
'Referer': url,
},
}
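
The character loop above is plain ROT13: ASCII letters in the PCTMLOC value are rotated by 13 positions, every other character (digits, '+', '/', '=') passes through, and the result is base64-decoded into the stream URL. The same transform expressed with the Python 3 standard library, run on a synthetic value rather than a real PCTMLOC string:

import base64
import codecs

def decode_pctmloc(loc):
    # rot_13 only touches A-Z/a-z, which is exactly what the loop does
    loc_b64 = codecs.encode(loc, 'rot_13')
    return base64.b64decode(loc_b64).decode('utf-8')

print(decode_pctmloc('nUE0pUZ6Yl94Yay6Y3LhoKN0'))  # https://x.yz/v.mp4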

View File

@ -8,6 +8,7 @@ from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
js_to_json, js_to_json,
merge_dicts,
urljoin, urljoin,
) )
@ -27,23 +28,22 @@ class PornHdIE(InfoExtractor):
'view_count': int, 'view_count': int,
'like_count': int, 'like_count': int,
'age_limit': 18, 'age_limit': 18,
} },
'skip': 'HTTP Error 404: Not Found',
}, { }, {
# removed video
'url': 'http://www.pornhd.com/videos/1962/sierra-day-gets-his-cum-all-over-herself-hd-porn-video', 'url': 'http://www.pornhd.com/videos/1962/sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
'md5': '956b8ca569f7f4d8ec563e2c41598441', 'md5': '1b7b3a40b9d65a8e5b25f7ab9ee6d6de',
'info_dict': { 'info_dict': {
'id': '1962', 'id': '1962',
'display_id': 'sierra-day-gets-his-cum-all-over-herself-hd-porn-video', 'display_id': 'sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Sierra loves doing laundry', 'title': 'md5:98c6f8b2d9c229d0f0fde47f61a1a759',
'description': 'md5:8ff0523848ac2b8f9b065ba781ccf294', 'description': 'md5:8ff0523848ac2b8f9b065ba781ccf294',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'view_count': int, 'view_count': int,
'like_count': int, 'like_count': int,
'age_limit': 18, 'age_limit': 18,
}, },
'skip': 'Not available anymore',
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -61,7 +61,13 @@ class PornHdIE(InfoExtractor):
r"(?s)sources'?\s*[:=]\s*(\{.+?\})", r"(?s)sources'?\s*[:=]\s*(\{.+?\})",
webpage, 'sources', default='{}')), video_id) webpage, 'sources', default='{}')), video_id)
info = {}
if not sources: if not sources:
entries = self._parse_html5_media_entries(url, webpage, video_id)
if entries:
info = entries[0]
if not sources and not info:
message = self._html_search_regex( message = self._html_search_regex(
r'(?s)<(div|p)[^>]+class="no-video"[^>]*>(?P<value>.+?)</\1', r'(?s)<(div|p)[^>]+class="no-video"[^>]*>(?P<value>.+?)</\1',
webpage, 'error message', group='value') webpage, 'error message', group='value')
@ -80,23 +86,29 @@ class PornHdIE(InfoExtractor):
'format_id': format_id, 'format_id': format_id,
'height': height, 'height': height,
}) })
self._sort_formats(formats) if formats:
info['formats'] = formats
self._sort_formats(info['formats'])
description = self._html_search_regex( description = self._html_search_regex(
r'<(div|p)[^>]+class="description"[^>]*>(?P<value>[^<]+)</\1', (r'(?s)<section[^>]+class=["\']video-description[^>]+>(?P<value>.+?)</section>',
webpage, 'description', fatal=False, group='value') r'<(div|p)[^>]+class="description"[^>]*>(?P<value>[^<]+)</\1'),
webpage, 'description', fatal=False,
group='value') or self._html_search_meta(
'description', webpage, default=None) or self._og_search_description(webpage)
view_count = int_or_none(self._html_search_regex( view_count = int_or_none(self._html_search_regex(
r'(\d+) views\s*<', webpage, 'view count', fatal=False)) r'(\d+) views\s*<', webpage, 'view count', fatal=False))
thumbnail = self._search_regex( thumbnail = self._search_regex(
r"poster'?\s*:\s*([\"'])(?P<url>(?:(?!\1).)+)\1", webpage, r"poster'?\s*:\s*([\"'])(?P<url>(?:(?!\1).)+)\1", webpage,
'thumbnail', fatal=False, group='url') 'thumbnail', default=None, group='url')
like_count = int_or_none(self._search_regex( like_count = int_or_none(self._search_regex(
(r'(\d+)\s*</11[^>]+>(?:&nbsp;|\s)*\blikes', (r'(\d+)</span>\s*likes',
r'(\d+)\s*</11[^>]+>(?:&nbsp;|\s)*\blikes',
r'class=["\']save-count["\'][^>]*>\s*(\d+)'), r'class=["\']save-count["\'][^>]*>\s*(\d+)'),
webpage, 'like count', fatal=False)) webpage, 'like count', fatal=False))
return { return merge_dicts(info, {
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
'title': title, 'title': title,
@ -106,4 +118,4 @@ class PornHdIE(InfoExtractor):
'like_count': like_count, 'like_count': like_count,
'formats': formats, 'formats': formats,
'age_limit': 18, 'age_limit': 18,
} })
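
When the sources JSON is empty the extractor now falls back to the HTML5 video element and combines both dicts with merge_dicts, where earlier dictionaries win and empty values never overwrite real ones. A simplified sketch of that merge behaviour (the actual helper in youtube_dl/utils.py differs in the details):

def merge_dicts_sketch(*dicts):
    # earlier dicts take precedence; None and '' never clobber real values
    merged = {}
    for d in dicts:
        for k, v in d.items():
            if v in (None, ''):
                continue
            if merged.get(k) in (None, ''):
                merged[k] = v
    return merged

html5_info = {'formats': ['html5 format'], 'title': ''}
own_fields = {'id': '1962', 'title': 'title from the page'}
print(merge_dicts_sketch(html5_info, own_fields))
# {'formats': ['html5 format'], 'id': '1962', 'title': 'title from the page'}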

View File

@ -17,6 +17,7 @@ from ..utils import (
determine_ext, determine_ext,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
NO_DEFAULT,
orderedSet, orderedSet,
remove_quotes, remove_quotes,
str_to_int, str_to_int,
@ -51,7 +52,7 @@ class PornHubIE(PornHubBaseIE):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?: (?:
(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)| (?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)|
(?:www\.)?thumbzilla\.com/video/ (?:www\.)?thumbzilla\.com/video/
) )
(?P<id>[\da-z]+) (?P<id>[\da-z]+)
@ -148,6 +149,9 @@ class PornHubIE(PornHubBaseIE):
}, { }, {
'url': 'https://www.pornhub.net/view_video.php?viewkey=203640933', 'url': 'https://www.pornhub.net/view_video.php?viewkey=203640933',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.pornhubpremium.com/view_video.php?viewkey=ph5e4acdae54a82',
'only_matching': True,
}] }]
@staticmethod @staticmethod
@ -165,6 +169,13 @@ class PornHubIE(PornHubBaseIE):
host = mobj.group('host') or 'pornhub.com' host = mobj.group('host') or 'pornhub.com'
video_id = mobj.group('id') video_id = mobj.group('id')
if 'premium' in host:
if not self._downloader.params.get('cookiefile'):
raise ExtractorError(
'PornHub Premium requires authentication.'
' You may want to use --cookies.',
expected=True)
self._set_cookie(host, 'age_verified', '1') self._set_cookie(host, 'age_verified', '1')
def dl_webpage(platform): def dl_webpage(platform):
@ -188,10 +199,10 @@ class PornHubIE(PornHubBaseIE):
# http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying # http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying
# on that anymore. # on that anymore.
title = self._html_search_meta( title = self._html_search_meta(
'twitter:title', webpage, default=None) or self._search_regex( 'twitter:title', webpage, default=None) or self._html_search_regex(
(r'<h1[^>]+class=["\']title["\'][^>]*>(?P<title>[^<]+)', (r'(?s)<h1[^>]+class=["\']title["\'][^>]*>(?P<title>.+?)</h1>',
r'<div[^>]+data-video-title=(["\'])(?P<title>.+?)\1', r'<div[^>]+data-video-title=(["\'])(?P<title>(?:(?!\1).)+)\1',
r'shareTitle\s*=\s*(["\'])(?P<title>.+?)\1'), r'shareTitle["\']\s*[=:]\s*(["\'])(?P<title>(?:(?!\1).)+)\1'),
webpage, 'title', group='title') webpage, 'title', group='title')
video_urls = [] video_urls = []
@ -227,12 +238,13 @@ class PornHubIE(PornHubBaseIE):
else: else:
thumbnail, duration = [None] * 2 thumbnail, duration = [None] * 2
if not video_urls: def extract_js_vars(webpage, pattern, default=NO_DEFAULT):
tv_webpage = dl_webpage('tv')
assignments = self._search_regex( assignments = self._search_regex(
r'(var.+?mediastring.+?)</script>', tv_webpage, pattern, webpage, 'encoded url', default=default)
'encoded url').split(';') if not assignments:
return {}
assignments = assignments.split(';')
js_vars = {} js_vars = {}
@ -254,11 +266,35 @@ class PornHubIE(PornHubBaseIE):
assn = re.sub(r'var\s+', '', assn) assn = re.sub(r'var\s+', '', assn)
vname, value = assn.split('=', 1) vname, value = assn.split('=', 1)
js_vars[vname] = parse_js_value(value) js_vars[vname] = parse_js_value(value)
return js_vars
video_url = js_vars['mediastring'] def add_video_url(video_url):
if video_url not in video_urls_set: v_url = url_or_none(video_url)
video_urls.append((video_url, None)) if not v_url:
video_urls_set.add(video_url) return
if v_url in video_urls_set:
return
video_urls.append((v_url, None))
video_urls_set.add(v_url)
if not video_urls:
FORMAT_PREFIXES = ('media', 'quality')
js_vars = extract_js_vars(
webpage, r'(var\s+(?:%s)_.+)' % '|'.join(FORMAT_PREFIXES),
default=None)
if js_vars:
for key, format_url in js_vars.items():
if any(key.startswith(p) for p in FORMAT_PREFIXES):
add_video_url(format_url)
if not video_urls and re.search(
r'<[^>]+\bid=["\']lockedPlayer', webpage):
raise ExtractorError(
'Video %s is locked' % video_id, expected=True)
if not video_urls:
js_vars = extract_js_vars(
dl_webpage('tv'), r'(var.+?mediastring.+?)</script>')
add_video_url(js_vars['mediastring'])
for mobj in re.finditer( for mobj in re.finditer(
r'<a[^>]+\bclass=["\']downloadBtn\b[^>]+\bhref=(["\'])(?P<url>(?:(?!\1).)+)\1', r'<a[^>]+\bclass=["\']downloadBtn\b[^>]+\bhref=(["\'])(?P<url>(?:(?!\1).)+)\1',
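
extract_js_vars above pulls semicolon-separated JavaScript assignments such as var media_0="..." out of the page and resolves them through parse_js_value. A rough standalone illustration of the same idea on a synthetic snippet (the value parser here is deliberately simplified to quoted strings and '+' concatenation of earlier variables):

import re

def extract_js_vars(snippet):
    js_vars = {}
    for assn in snippet.split(';'):
        assn = re.sub(r'var\s+', '', assn.strip())
        if not assn:
            continue
        vname, value = assn.split('=', 1)
        resolved = ''
        for part in (p.strip() for p in value.split('+')):
            if part[:1] in ('"', "'"):
                resolved += part[1:-1]             # string literal
            else:
                resolved += js_vars.get(part, '')  # reference to an earlier var
        js_vars[vname.strip()] = resolved
    return js_vars

snippet = 'var media_base="https://cdn.example.com/";var media_0=media_base + "video_720p.mp4"'
print(extract_js_vars(snippet)['media_0'])  # https://cdn.example.com/video_720p.mp4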
@ -276,10 +312,16 @@ class PornHubIE(PornHubBaseIE):
r'/(\d{6}/\d{2})/', video_url, 'upload data', default=None) r'/(\d{6}/\d{2})/', video_url, 'upload data', default=None)
if upload_date: if upload_date:
upload_date = upload_date.replace('/', '') upload_date = upload_date.replace('/', '')
if determine_ext(video_url) == 'mpd': ext = determine_ext(video_url)
if ext == 'mpd':
formats.extend(self._extract_mpd_formats( formats.extend(self._extract_mpd_formats(
video_url, video_id, mpd_id='dash', fatal=False)) video_url, video_id, mpd_id='dash', fatal=False))
continue continue
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
continue
tbr = None tbr = None
mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', video_url) mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', video_url)
if mobj: if mobj:
@ -373,7 +415,7 @@ class PornHubPlaylistBaseIE(PornHubBaseIE):
class PornHubUserIE(PornHubPlaylistBaseIE): class PornHubUserIE(PornHubPlaylistBaseIE):
_VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?pornhub\.(?:com|net)/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)' _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.pornhub.com/model/zoe_ph', 'url': 'https://www.pornhub.com/model/zoe_ph',
'playlist_mincount': 118, 'playlist_mincount': 118,
@ -441,7 +483,7 @@ class PornHubPagedPlaylistBaseIE(PornHubPlaylistBaseIE):
class PornHubPagedVideoListIE(PornHubPagedPlaylistBaseIE): class PornHubPagedVideoListIE(PornHubPagedPlaylistBaseIE):
_VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?P<id>(?:[^/]+/)*[^/?#&]+)' _VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?P<id>(?:[^/]+/)*[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.pornhub.com/model/zoe_ph/videos', 'url': 'https://www.pornhub.com/model/zoe_ph/videos',
'only_matching': True, 'only_matching': True,
@ -556,7 +598,7 @@ class PornHubPagedVideoListIE(PornHubPagedPlaylistBaseIE):
class PornHubUserVideosUploadIE(PornHubPagedPlaylistBaseIE): class PornHubUserVideosUploadIE(PornHubPagedPlaylistBaseIE):
_VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos/upload)' _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos/upload)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.pornhub.com/pornstar/jenny-blighe/videos/upload', 'url': 'https://www.pornhub.com/pornstar/jenny-blighe/videos/upload',
'info_dict': { 'info_dict': {

View File

@ -16,7 +16,7 @@ from ..utils import (
class ProSiebenSat1BaseIE(InfoExtractor): class ProSiebenSat1BaseIE(InfoExtractor):
_GEO_COUNTRIES = ['DE'] _GEO_BYPASS = False
_ACCESS_ID = None _ACCESS_ID = None
_SUPPORTED_PROTOCOLS = 'dash:clear,hls:clear,progressive:clear' _SUPPORTED_PROTOCOLS = 'dash:clear,hls:clear,progressive:clear'
_V4_BASE_URL = 'https://vas-v4.p7s1video.net/4.0/get' _V4_BASE_URL = 'https://vas-v4.p7s1video.net/4.0/get'
@ -39,14 +39,18 @@ class ProSiebenSat1BaseIE(InfoExtractor):
formats = [] formats = []
if self._ACCESS_ID: if self._ACCESS_ID:
raw_ct = self._ENCRYPTION_KEY + clip_id + self._IV + self._ACCESS_ID raw_ct = self._ENCRYPTION_KEY + clip_id + self._IV + self._ACCESS_ID
server_token = (self._download_json( protocols = self._download_json(
self._V4_BASE_URL + 'protocols', clip_id, self._V4_BASE_URL + 'protocols', clip_id,
'Downloading protocols JSON', 'Downloading protocols JSON',
headers=self.geo_verification_headers(), query={ headers=self.geo_verification_headers(), query={
'access_id': self._ACCESS_ID, 'access_id': self._ACCESS_ID,
'client_token': sha1((raw_ct).encode()).hexdigest(), 'client_token': sha1((raw_ct).encode()).hexdigest(),
'video_id': clip_id, 'video_id': clip_id,
}, fatal=False) or {}).get('server_token') }, fatal=False, expected_status=(403,)) or {}
error = protocols.get('error') or {}
if error.get('title') == 'Geo check failed':
self.raise_geo_restricted(countries=['AT', 'CH', 'DE'])
server_token = protocols.get('server_token')
if server_token: if server_token:
urls = (self._download_json( urls = (self._download_json(
self._V4_BASE_URL + 'urls', clip_id, 'Downloading urls JSON', query={ self._V4_BASE_URL + 'urls', clip_id, 'Downloading urls JSON', query={

View File

@ -43,8 +43,15 @@ class RedTubeIE(InfoExtractor):
webpage = self._download_webpage( webpage = self._download_webpage(
'http://www.redtube.com/%s' % video_id, video_id) 'http://www.redtube.com/%s' % video_id, video_id)
if any(s in webpage for s in ['video-deleted-info', '>This video has been removed']): ERRORS = (
raise ExtractorError('Video %s has been removed' % video_id, expected=True) (('video-deleted-info', '>This video has been removed'), 'has been removed'),
(('private_video_text', '>This video is private', '>Send a friend request to its owner to be able to view it'), 'is private'),
)
for patterns, message in ERRORS:
if any(p in webpage for p in patterns):
raise ExtractorError(
'Video %s %s' % (video_id, message), expected=True)
info = self._search_json_ld(webpage, video_id, default={}) info = self._search_json_ld(webpage, video_id, default={})

View File

@ -8,7 +8,6 @@ from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_parse_qs, compat_parse_qs,
compat_str,
compat_urlparse, compat_urlparse,
) )
from ..utils import ( from ..utils import (
@ -39,13 +38,13 @@ class SafariBaseIE(InfoExtractor):
'Downloading login page') 'Downloading login page')
def is_logged(urlh): def is_logged(urlh):
return 'learning.oreilly.com/home/' in compat_str(urlh.geturl()) return 'learning.oreilly.com/home/' in urlh.geturl()
if is_logged(urlh): if is_logged(urlh):
self.LOGGED_IN = True self.LOGGED_IN = True
return return
redirect_url = compat_str(urlh.geturl()) redirect_url = urlh.geturl()
parsed_url = compat_urlparse.urlparse(redirect_url) parsed_url = compat_urlparse.urlparse(redirect_url)
qs = compat_parse_qs(parsed_url.query) qs = compat_parse_qs(parsed_url.query)
next_uri = compat_urlparse.urljoin( next_uri = compat_urlparse.urljoin(
@ -165,7 +164,8 @@ class SafariIE(SafariBaseIE):
kaltura_session = self._download_json( kaltura_session = self._download_json(
'%s/player/kaltura_session/?reference_id=%s' % (self._API_BASE, reference_id), '%s/player/kaltura_session/?reference_id=%s' % (self._API_BASE, reference_id),
video_id, 'Downloading kaltura session JSON', video_id, 'Downloading kaltura session JSON',
'Unable to download kaltura session JSON', fatal=False) 'Unable to download kaltura session JSON', fatal=False,
headers={'Accept': 'application/json'})
if kaltura_session: if kaltura_session:
session = kaltura_session.get('session') session = kaltura_session.get('session')
if session: if session:

View File

@ -7,6 +7,7 @@ import re
from .aws import AWSIE from .aws import AWSIE
from .anvato import AnvatoIE from .anvato import AnvatoIE
from .common import InfoExtractor
from ..utils import ( from ..utils import (
smuggle_url, smuggle_url,
urlencode_postdata, urlencode_postdata,
@ -102,3 +103,50 @@ class ScrippsNetworksWatchIE(AWSIE):
'anvato:anvato_scripps_app_web_prod_0837996dbe373629133857ae9eb72e740424d80a:%s' % mcp_id, 'anvato:anvato_scripps_app_web_prod_0837996dbe373629133857ae9eb72e740424d80a:%s' % mcp_id,
{'geo_countries': ['US']}), {'geo_countries': ['US']}),
AnvatoIE.ie_key(), video_id=mcp_id) AnvatoIE.ie_key(), video_id=mcp_id)
class ScrippsNetworksIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?P<site>cookingchanneltv|discovery|(?:diy|food)network|hgtv|travelchannel)\.com/videos/[0-9a-z-]+-(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.cookingchanneltv.com/videos/the-best-of-the-best-0260338',
'info_dict': {
'id': '0260338',
'ext': 'mp4',
'title': 'The Best of the Best',
'description': 'Catch a new episode of MasterChef Canada Tuedsay at 9/8c.',
'timestamp': 1475678834,
'upload_date': '20161005',
'uploader': 'SCNI-SCND',
},
'add_ie': ['ThePlatform'],
}, {
'url': 'https://www.diynetwork.com/videos/diy-barnwood-tablet-stand-0265790',
'only_matching': True,
}, {
'url': 'https://www.foodnetwork.com/videos/chocolate-strawberry-cake-roll-7524591',
'only_matching': True,
}, {
'url': 'https://www.hgtv.com/videos/cookie-decorating-101-0301929',
'only_matching': True,
}, {
'url': 'https://www.travelchannel.com/videos/two-climates-one-bag-5302184',
'only_matching': True,
}, {
'url': 'https://www.discovery.com/videos/guardians-of-the-glades-cooking-with-tom-cobb-5578368',
'only_matching': True,
}]
_ACCOUNT_MAP = {
'cookingchanneltv': 2433005105,
'discovery': 2706091867,
'diynetwork': 2433004575,
'foodnetwork': 2433005105,
'hgtv': 2433004575,
'travelchannel': 2433005739,
}
_TP_TEMPL = 'https://link.theplatform.com/s/ip77QC/media/guid/%d/%s?mbr=true'
def _real_extract(self, url):
site, guid = re.match(self._VALID_URL, url).groups()
return self.url_result(smuggle_url(
self._TP_TEMPL % (self._ACCOUNT_MAP[site], guid),
{'force_smil_url': True}), 'ThePlatform', guid)
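
The new extractor does very little itself: it maps the site name to a ThePlatform account id and interpolates the media GUID from the page URL into a fixed link.theplatform.com template, then delegates to ThePlatformIE. Just the URL construction, using a simplified version of the _VALID_URL pattern and a subset of the account map:

import re

ACCOUNT_MAP = {
    'cookingchanneltv': 2433005105,
    'foodnetwork': 2433005105,
    'hgtv': 2433004575,
}
TP_TEMPL = 'https://link.theplatform.com/s/ip77QC/media/guid/%d/%s?mbr=true'

url = 'https://www.foodnetwork.com/videos/chocolate-strawberry-cake-roll-7524591'
site, guid = re.match(
    r'https?://(?:www\.)?(?P<site>\w+)\.com/videos/[0-9a-z-]+-(?P<id>\d+)', url).groups()
print(TP_TEMPL % (ACCOUNT_MAP[site], guid))
# https://link.theplatform.com/s/ip77QC/media/guid/2433005105/7524591?mbr=true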

View File

@ -7,9 +7,18 @@ from .common import InfoExtractor
class ServusIE(InfoExtractor): class ServusIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?servus\.com/(?:(?:at|de)/p/[^/]+|tv/videos)/(?P<id>[aA]{2}-\w+|\d+-\d+)' _VALID_URL = r'''(?x)
https?://
(?:www\.)?
(?:
servus\.com/(?:(?:at|de)/p/[^/]+|tv/videos)|
servustv\.com/videos
)
/(?P<id>[aA]{2}-\w+|\d+-\d+)
'''
_TESTS = [{ _TESTS = [{
'url': 'https://www.servus.com/de/p/Die-Gr%C3%BCnen-aus-Sicht-des-Volkes/AA-1T6VBU5PW1W12/', # new URL schema
'url': 'https://www.servustv.com/videos/aa-1t6vbu5pw1w12/',
'md5': '3e1dd16775aa8d5cbef23628cfffc1f4', 'md5': '3e1dd16775aa8d5cbef23628cfffc1f4',
'info_dict': { 'info_dict': {
'id': 'AA-1T6VBU5PW1W12', 'id': 'AA-1T6VBU5PW1W12',
@ -18,6 +27,10 @@ class ServusIE(InfoExtractor):
'description': 'md5:1247204d85783afe3682644398ff2ec4', 'description': 'md5:1247204d85783afe3682644398ff2ec4',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
} }
}, {
# old URL schema
'url': 'https://www.servus.com/de/p/Die-Gr%C3%BCnen-aus-Sicht-des-Volkes/AA-1T6VBU5PW1W12/',
'only_matching': True,
}, { }, {
'url': 'https://www.servus.com/at/p/Wie-das-Leben-beginnt/1309984137314-381415152/', 'url': 'https://www.servus.com/at/p/Wie-das-Leben-beginnt/1309984137314-381415152/',
'only_matching': True, 'only_matching': True,

View File

@ -1,13 +1,18 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_b64decode from ..compat import (
compat_b64decode,
compat_urllib_parse_unquote_plus,
)
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
js_to_json,
KNOWN_EXTENSIONS, KNOWN_EXTENSIONS,
parse_filesize, parse_filesize,
rot47,
url_or_none, url_or_none,
urlencode_postdata, urlencode_postdata,
) )
@ -112,16 +117,22 @@ class VivoIE(SharedBaseIE):
webpage, 'filesize', fatal=False)) webpage, 'filesize', fatal=False))
def _extract_video_url(self, webpage, video_id, url): def _extract_video_url(self, webpage, video_id, url):
def decode_url(encoded_url): def decode_url_old(encoded_url):
return compat_b64decode(encoded_url).decode('utf-8') return compat_b64decode(encoded_url).decode('utf-8')
stream_url = url_or_none(decode_url(self._search_regex( stream_url = self._search_regex(
r'data-stream\s*=\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage, r'data-stream\s*=\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'stream url', default=None, group='url'))) 'stream url', default=None, group='url')
if stream_url:
stream_url = url_or_none(decode_url_old(stream_url))
if stream_url: if stream_url:
return stream_url return stream_url
return self._parse_json(
def decode_url(encoded_url):
return rot47(compat_urllib_parse_unquote_plus(encoded_url))
return decode_url(self._parse_json(
self._search_regex( self._search_regex(
r'InitializeStream\s*\(\s*(["\'])(?P<url>(?:(?!\1).)+)\1', r'(?s)InitializeStream\s*\(\s*({.+?})\s*\)\s*;', webpage,
webpage, 'stream', group='url'), 'stream'),
video_id, transform_source=decode_url)[0] video_id, transform_source=js_to_json)['source'])
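
For newer Vivo pages the stream URL is no longer base64-encoded: InitializeStream now receives a JSON object whose source value is URL-quoted and ROT47-rotated, hence the new rot47 utility and the js_to_json round trip. A standalone sketch of that decode step, where the rot47 body mirrors what the utility is expected to do (rotate printable ASCII 33-126 by 47) and the encoded sample is synthetic:

try:
    from urllib.parse import unquote_plus  # Python 3
except ImportError:
    from urllib import unquote_plus        # Python 2

def rot47(value):
    out = ''
    for c in value:
        o = ord(c)
        if 33 <= o <= 126:
            o = 33 + ((o + 14) % 94)  # same as 33 + ((o - 33 + 47) % 94)
        out += chr(o)
    return out

def decode_url(encoded_url):
    return rot47(unquote_plus(encoded_url))

print(decode_url('9EEADi^^6I2>A=6]4@>^G]>Ac'))  # https://example.com/v.mp4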

View File

@ -2,7 +2,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ExtractorError from ..utils import smuggle_url
class SlidesLiveIE(InfoExtractor): class SlidesLiveIE(InfoExtractor):
@ -14,9 +14,9 @@ class SlidesLiveIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': 'LMtgR8ba0b0', 'id': 'LMtgR8ba0b0',
'ext': 'mp4', 'ext': 'mp4',
'title': '38902413: external video', 'title': 'GCC IA16 backend',
'description': '3890241320170925-9-1yd6ech.mp4', 'description': 'Watch full version of this video at https://slideslive.com/38902413.',
'uploader': 'SlidesLive Administrator', 'uploader': 'SlidesLive Videos - A',
'uploader_id': 'UC62SdArr41t_-_fX40QCLRw', 'uploader_id': 'UC62SdArr41t_-_fX40QCLRw',
'upload_date': '20170925', 'upload_date': '20170925',
} }
@ -24,16 +24,38 @@ class SlidesLiveIE(InfoExtractor):
# video_service_name = youtube # video_service_name = youtube
'url': 'https://slideslive.com/38903721/magic-a-scientific-resurrection-of-an-esoteric-legend', 'url': 'https://slideslive.com/38903721/magic-a-scientific-resurrection-of-an-esoteric-legend',
'only_matching': True, 'only_matching': True,
}, {
# video_service_name = url
'url': 'https://slideslive.com/38922070/learning-transferable-skills-1',
'only_matching': True,
}, {
# video_service_name = vimeo
'url': 'https://slideslive.com/38921896/retrospectives-a-venue-for-selfreflection-in-ml-research-3',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
video_data = self._download_json( video_data = self._download_json(
url, video_id, headers={'Accept': 'application/json'}) 'https://ben.slideslive.com/player/' + video_id, video_id)
service_name = video_data['video_service_name'].lower() service_name = video_data['video_service_name'].lower()
if service_name == 'youtube': assert service_name in ('url', 'vimeo', 'youtube')
yt_video_id = video_data['video_service_id'] service_id = video_data['video_service_id']
return self.url_result(yt_video_id, 'Youtube', video_id=yt_video_id) info = {
'id': video_id,
'thumbnail': video_data.get('thumbnail'),
'url': service_id,
}
if service_name == 'url':
info['title'] = video_data['title']
else: else:
raise ExtractorError( info.update({
'Unsupported service name: {0}'.format(service_name), expected=True) '_type': 'url_transparent',
'ie_key': service_name.capitalize(),
'title': video_data.get('title'),
})
if service_name == 'vimeo':
info['url'] = smuggle_url(
'https://player.vimeo.com/video/' + service_id,
{'http_headers': {'Referer': url}})
return info
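
_real_extract now reads the player JSON from ben.slideslive.com and branches on video_service_name: a plain url entry is returned as a direct download, while youtube/vimeo ids are wrapped in a url_transparent result so the matching extractor resolves the stream while the fields filled in here (title, thumbnail) are, roughly speaking, kept. A compact sketch of that dispatch outside youtube-dl, with field names mirroring the JSON above and the smuggle_url/Referer detail left out:

def build_result(video_id, video_data):
    service_name = video_data['video_service_name'].lower()
    service_id = video_data['video_service_id']
    info = {
        'id': video_id,
        'thumbnail': video_data.get('thumbnail'),
        'url': service_id,
    }
    if service_name == 'url':
        info['title'] = video_data['title']       # direct media URL, nothing to delegate
    else:
        info.update({
            '_type': 'url_transparent',           # let another extractor resolve it
            'ie_key': service_name.capitalize(),  # 'Youtube' or 'Vimeo'
            'title': video_data.get('title'),
        })
        if service_name == 'vimeo':
            info['url'] = 'https://player.vimeo.com/video/' + service_id
    return info

print(build_result('38903721', {
    'video_service_name': 'youtube',
    'video_service_id': 'LMtgR8ba0b0',
}))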

View File

@ -9,10 +9,13 @@ from .common import (
SearchInfoExtractor SearchInfoExtractor
) )
from ..compat import ( from ..compat import (
compat_HTTPError,
compat_kwargs,
compat_str, compat_str,
compat_urlparse, compat_urlparse,
) )
from ..utils import ( from ..utils import (
error_to_compat_str,
ExtractorError, ExtractorError,
float_or_none, float_or_none,
HEADRequest, HEADRequest,
@ -24,11 +27,17 @@ from ..utils import (
unified_timestamp, unified_timestamp,
update_url_query, update_url_query,
url_or_none, url_or_none,
urlhandle_detect_ext,
) )
class SoundcloudEmbedIE(InfoExtractor): class SoundcloudEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:w|player|p)\.soundcloud\.com/player/?.*?url=(?P<id>.*)' _VALID_URL = r'https?://(?:w|player|p)\.soundcloud\.com/player/?.*?\burl=(?P<id>.+)'
_TEST = {
# from https://www.soundi.fi/uutiset/ennakkokuuntelussa-timo-kaukolammen-station-to-station-to-station-julkaisua-juhlitaan-tanaan-g-livelabissa/
'url': 'https://w.soundcloud.com/player/?visual=true&url=https%3A%2F%2Fapi.soundcloud.com%2Fplaylists%2F922213810&show_artwork=true&maxwidth=640&maxheight=960&dnt=1&secret_token=s-ziYey',
'only_matching': True,
}
@staticmethod @staticmethod
def _extract_urls(webpage): def _extract_urls(webpage):
@ -37,8 +46,13 @@ class SoundcloudEmbedIE(InfoExtractor):
webpage)] webpage)]
def _real_extract(self, url): def _real_extract(self, url):
return self.url_result(compat_urlparse.parse_qs( query = compat_urlparse.parse_qs(
compat_urlparse.urlparse(url).query)['url'][0]) compat_urlparse.urlparse(url).query)
api_url = query['url'][0]
secret_token = query.get('secret_token')
if secret_token:
api_url = update_url_query(api_url, {'secret_token': secret_token[0]})
return self.url_result(api_url)
class SoundcloudIE(InfoExtractor): class SoundcloudIE(InfoExtractor):
@ -83,7 +97,7 @@ class SoundcloudIE(InfoExtractor):
'repost_count': int, 'repost_count': int,
} }
}, },
# not streamable song # geo-restricted
{ {
'url': 'https://soundcloud.com/the-concept-band/goldrushed-mastered?in=the-concept-band/sets/the-royal-concept-ep', 'url': 'https://soundcloud.com/the-concept-band/goldrushed-mastered?in=the-concept-band/sets/the-royal-concept-ep',
'info_dict': { 'info_dict': {
@ -95,18 +109,13 @@ class SoundcloudIE(InfoExtractor):
'uploader_id': '9615865', 'uploader_id': '9615865',
'timestamp': 1337635207, 'timestamp': 1337635207,
'upload_date': '20120521', 'upload_date': '20120521',
'duration': 30, 'duration': 227.155,
'license': 'all-rights-reserved', 'license': 'all-rights-reserved',
'view_count': int, 'view_count': int,
'like_count': int, 'like_count': int,
'comment_count': int, 'comment_count': int,
'repost_count': int, 'repost_count': int,
}, },
'params': {
# rtmp
'skip_download': True,
},
'skip': 'Preview',
}, },
# private link # private link
{ {
@ -217,7 +226,6 @@ class SoundcloudIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
}, },
# not available via api.soundcloud.com/i1/tracks/id/streams
{ {
'url': 'https://soundcloud.com/giovannisarani/mezzo-valzer', 'url': 'https://soundcloud.com/giovannisarani/mezzo-valzer',
'md5': 'e22aecd2bc88e0e4e432d7dcc0a1abf7', 'md5': 'e22aecd2bc88e0e4e432d7dcc0a1abf7',
@ -226,7 +234,7 @@ class SoundcloudIE(InfoExtractor):
'ext': 'mp3', 'ext': 'mp3',
'title': 'Mezzo Valzer', 'title': 'Mezzo Valzer',
'description': 'md5:4138d582f81866a530317bae316e8b61', 'description': 'md5:4138d582f81866a530317bae316e8b61',
'uploader': 'Giovanni Sarani', 'uploader': 'Micronie',
'uploader_id': '3352531', 'uploader_id': '3352531',
'timestamp': 1551394171, 'timestamp': 1551394171,
'upload_date': '20190228', 'upload_date': '20190228',
@ -238,14 +246,16 @@ class SoundcloudIE(InfoExtractor):
'comment_count': int, 'comment_count': int,
'repost_count': int, 'repost_count': int,
}, },
'expected_warnings': ['Unable to download JSON metadata'], },
} {
# with AAC HQ format available via OAuth token
'url': 'https://soundcloud.com/wandw/the-chainsmokers-ft-daya-dont-let-me-down-ww-remix-1',
'only_matching': True,
},
] ]
_API_BASE = 'https://api.soundcloud.com/'
_API_V2_BASE = 'https://api-v2.soundcloud.com/' _API_V2_BASE = 'https://api-v2.soundcloud.com/'
_BASE_URL = 'https://soundcloud.com/' _BASE_URL = 'https://soundcloud.com/'
_CLIENT_ID = 'BeGVhOrGmfboy1LtiHTQF6Ejpt9ULJCI'
_IMAGE_REPL_RE = r'-([0-9a-z]+)\.jpg' _IMAGE_REPL_RE = r'-([0-9a-z]+)\.jpg'
_ARTWORK_MAP = { _ARTWORK_MAP = {
@ -261,14 +271,53 @@ class SoundcloudIE(InfoExtractor):
'original': 0, 'original': 0,
} }
def _store_client_id(self, client_id):
self._downloader.cache.store('soundcloud', 'client_id', client_id)
def _update_client_id(self):
webpage = self._download_webpage('https://soundcloud.com/', None)
for src in reversed(re.findall(r'<script[^>]+src="([^"]+)"', webpage)):
script = self._download_webpage(src, None, fatal=False)
if script:
client_id = self._search_regex(
r'client_id\s*:\s*"([0-9a-zA-Z]{32})"',
script, 'client id', default=None)
if client_id:
self._CLIENT_ID = client_id
self._store_client_id(client_id)
return
raise ExtractorError('Unable to extract client id')
def _download_json(self, *args, **kwargs):
non_fatal = kwargs.get('fatal') is False
if non_fatal:
del kwargs['fatal']
query = kwargs.get('query', {}).copy()
for _ in range(2):
query['client_id'] = self._CLIENT_ID
kwargs['query'] = query
try:
return super(SoundcloudIE, self)._download_json(*args, **compat_kwargs(kwargs))
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
self._store_client_id(None)
self._update_client_id()
continue
elif non_fatal:
self._downloader.report_warning(error_to_compat_str(e))
return False
raise
def _real_initialize(self):
self._CLIENT_ID = self._downloader.cache.load('soundcloud', 'client_id') or 'YUKXoArFcqrlQn9tfNHvvyfnDISj04zk'
@classmethod @classmethod
def _resolv_url(cls, url): def _resolv_url(cls, url):
return SoundcloudIE._API_V2_BASE + 'resolve?url=' + url + '&client_id=' + cls._CLIENT_ID return SoundcloudIE._API_V2_BASE + 'resolve?url=' + url
def _extract_info_dict(self, info, full_title=None, secret_token=None, version=2): def _extract_info_dict(self, info, full_title=None, secret_token=None):
track_id = compat_str(info['id']) track_id = compat_str(info['id'])
title = info['title'] title = info['title']
track_base_url = self._API_BASE + 'tracks/%s' % track_id
format_urls = set() format_urls = set()
formats = [] formats = []
@ -277,26 +326,27 @@ class SoundcloudIE(InfoExtractor):
query['secret_token'] = secret_token query['secret_token'] = secret_token
if info.get('downloadable') and info.get('has_downloads_left'): if info.get('downloadable') and info.get('has_downloads_left'):
format_url = update_url_query( download_url = update_url_query(
info.get('download_url') or track_base_url + '/download', query) self._API_V2_BASE + 'tracks/' + track_id + '/download', query)
format_urls.add(format_url) redirect_url = (self._download_json(download_url, track_id, fatal=False) or {}).get('redirectUri')
if version == 2: if redirect_url:
v1_info = self._download_json( urlh = self._request_webpage(
track_base_url, track_id, query=query, fatal=False) or {} HEADRequest(redirect_url), track_id, fatal=False)
else: if urlh:
v1_info = info format_url = urlh.geturl()
formats.append({ format_urls.add(format_url)
'format_id': 'download', formats.append({
'ext': v1_info.get('original_format') or 'mp3', 'format_id': 'download',
'filesize': int_or_none(v1_info.get('original_content_size')), 'ext': urlhandle_detect_ext(urlh) or 'mp3',
'url': format_url, 'filesize': int_or_none(urlh.headers.get('Content-Length')),
'preference': 10, 'url': format_url,
}) 'preference': 10,
})
def invalid_url(url): def invalid_url(url):
return not url or url in format_urls or re.search(r'/(?:preview|playlist)/0/30/', url) return not url or url in format_urls
def add_format(f, protocol): def add_format(f, protocol, is_preview=False):
mobj = re.search(r'\.(?P<abr>\d+)\.(?P<ext>[0-9a-z]{3,4})(?=[/?])', stream_url) mobj = re.search(r'\.(?P<abr>\d+)\.(?P<ext>[0-9a-z]{3,4})(?=[/?])', stream_url)
if mobj: if mobj:
for k, v in mobj.groupdict().items(): for k, v in mobj.groupdict().items():
@ -305,16 +355,27 @@ class SoundcloudIE(InfoExtractor):
format_id_list = [] format_id_list = []
if protocol: if protocol:
format_id_list.append(protocol) format_id_list.append(protocol)
ext = f.get('ext')
if ext == 'aac':
f['abr'] = '256'
for k in ('ext', 'abr'): for k in ('ext', 'abr'):
v = f.get(k) v = f.get(k)
if v: if v:
format_id_list.append(v) format_id_list.append(v)
preview = is_preview or re.search(r'/(?:preview|playlist)/0/30/', f['url'])
if preview:
format_id_list.append('preview')
abr = f.get('abr') abr = f.get('abr')
if abr: if abr:
f['abr'] = int(abr) f['abr'] = int(abr)
if protocol == 'hls':
protocol = 'm3u8' if ext == 'aac' else 'm3u8_native'
else:
protocol = 'http'
f.update({ f.update({
'format_id': '_'.join(format_id_list), 'format_id': '_'.join(format_id_list),
'protocol': 'm3u8_native' if protocol == 'hls' else 'http', 'protocol': protocol,
'preference': -10 if preview else None,
}) })
formats.append(f) formats.append(f)
@ -325,7 +386,7 @@ class SoundcloudIE(InfoExtractor):
if not isinstance(t, dict): if not isinstance(t, dict):
continue continue
format_url = url_or_none(t.get('url')) format_url = url_or_none(t.get('url'))
if not format_url or t.get('snipped') or '/preview/' in format_url: if not format_url:
continue continue
stream = self._download_json( stream = self._download_json(
format_url, track_id, query=query, fatal=False) format_url, track_id, query=query, fatal=False)
@ -348,44 +409,14 @@ class SoundcloudIE(InfoExtractor):
add_format({ add_format({
'url': stream_url, 'url': stream_url,
'ext': ext, 'ext': ext,
}, 'http' if protocol == 'progressive' else protocol) }, 'http' if protocol == 'progressive' else protocol,
t.get('snipped') or '/preview/' in format_url)
if not formats:
# Old API, does not work for some tracks (e.g.
# https://soundcloud.com/giovannisarani/mezzo-valzer)
# and might serve preview URLs (e.g.
# http://www.soundcloud.com/snbrn/ele)
format_dict = self._download_json(
track_base_url + '/streams', track_id,
'Downloading track url', query=query, fatal=False) or {}
for key, stream_url in format_dict.items():
if invalid_url(stream_url):
continue
format_urls.add(stream_url)
mobj = re.search(r'(http|hls)_([^_]+)_(\d+)_url', key)
if mobj:
protocol, ext, abr = mobj.groups()
add_format({
'abr': abr,
'ext': ext,
'url': stream_url,
}, protocol)
if not formats:
# We fallback to the stream_url in the original info, this
# cannot be always used, sometimes it can give an HTTP 404 error
urlh = self._request_webpage(
HEADRequest(info.get('stream_url') or track_base_url + '/stream'),
track_id, query=query, fatal=False)
if urlh:
stream_url = urlh.geturl()
if not invalid_url(stream_url):
add_format({'url': stream_url}, 'http')
for f in formats: for f in formats:
f['vcodec'] = 'none' f['vcodec'] = 'none'
if not formats and info.get('policy') == 'BLOCK':
self.raise_geo_restricted()
self._sort_formats(formats) self._sort_formats(formats)
user = info.get('user') or {} user = info.get('user') or {}
@ -441,9 +472,7 @@ class SoundcloudIE(InfoExtractor):
track_id = mobj.group('track_id') track_id = mobj.group('track_id')
query = { query = {}
'client_id': self._CLIENT_ID,
}
if track_id: if track_id:
info_json_url = self._API_V2_BASE + 'tracks/' + track_id info_json_url = self._API_V2_BASE + 'tracks/' + track_id
full_title = track_id full_title = track_id
@ -457,20 +486,24 @@ class SoundcloudIE(InfoExtractor):
resolve_title += '/%s' % token resolve_title += '/%s' % token
info_json_url = self._resolv_url(self._BASE_URL + resolve_title) info_json_url = self._resolv_url(self._BASE_URL + resolve_title)
version = 2
info = self._download_json( info = self._download_json(
info_json_url, full_title, 'Downloading info JSON', query=query, fatal=False) info_json_url, full_title, 'Downloading info JSON', query=query)
if not info:
info = self._download_json(
info_json_url.replace(self._API_V2_BASE, self._API_BASE),
full_title, 'Downloading info JSON', query=query)
version = 1
return self._extract_info_dict(info, full_title, token, version) return self._extract_info_dict(info, full_title, token)
class SoundcloudPlaylistBaseIE(SoundcloudIE): class SoundcloudPlaylistBaseIE(SoundcloudIE):
def _extract_track_entries(self, tracks, token=None): def _extract_set(self, playlist, token=None):
playlist_id = compat_str(playlist['id'])
tracks = playlist.get('tracks') or []
if not all([t.get('permalink_url') for t in tracks]) and token:
tracks = self._download_json(
self._API_V2_BASE + 'tracks', playlist_id,
'Downloading tracks', query={
'ids': ','.join([compat_str(t['id']) for t in tracks]),
'playlistId': playlist_id,
'playlistSecretToken': token,
})
entries = [] entries = []
for track in tracks: for track in tracks:
track_id = str_or_none(track.get('id')) track_id = str_or_none(track.get('id'))
@ -483,7 +516,10 @@ class SoundcloudPlaylistBaseIE(SoundcloudIE):
url += '?secret_token=' + token url += '?secret_token=' + token
entries.append(self.url_result( entries.append(self.url_result(
url, SoundcloudIE.ie_key(), track_id)) url, SoundcloudIE.ie_key(), track_id))
return entries return self.playlist_result(
entries, playlist_id,
playlist.get('title'),
playlist.get('description'))
class SoundcloudSetIE(SoundcloudPlaylistBaseIE): class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
@ -494,6 +530,7 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
'info_dict': { 'info_dict': {
'id': '2284613', 'id': '2284613',
'title': 'The Royal Concept EP', 'title': 'The Royal Concept EP',
'description': 'md5:71d07087c7a449e8941a70a29e34671e',
}, },
'playlist_mincount': 5, 'playlist_mincount': 5,
}, { }, {
@ -516,17 +553,13 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
msgs = (compat_str(err['error_message']) for err in info['errors']) msgs = (compat_str(err['error_message']) for err in info['errors'])
raise ExtractorError('unable to download video webpage: %s' % ','.join(msgs)) raise ExtractorError('unable to download video webpage: %s' % ','.join(msgs))
entries = self._extract_track_entries(info['tracks'], token) return self._extract_set(info, token)
return self.playlist_result(
entries, str_or_none(info.get('id')), info.get('title'))
class SoundcloudPagedPlaylistBaseIE(SoundcloudPlaylistBaseIE): class SoundcloudPagedPlaylistBaseIE(SoundcloudIE):
def _extract_playlist(self, base_url, playlist_id, playlist_title): def _extract_playlist(self, base_url, playlist_id, playlist_title):
COMMON_QUERY = { COMMON_QUERY = {
'limit': 2000000000, 'limit': 2000000000,
'client_id': self._CLIENT_ID,
'linked_partitioning': '1', 'linked_partitioning': '1',
} }
@ -712,9 +745,7 @@ class SoundcloudPlaylistIE(SoundcloudPlaylistBaseIE):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id') playlist_id = mobj.group('id')
query = { query = {}
'client_id': self._CLIENT_ID,
}
token = mobj.group('token') token = mobj.group('token')
if token: if token:
query['secret_token'] = token query['secret_token'] = token
@ -723,10 +754,7 @@ class SoundcloudPlaylistIE(SoundcloudPlaylistBaseIE):
self._API_V2_BASE + 'playlists/' + playlist_id, self._API_V2_BASE + 'playlists/' + playlist_id,
playlist_id, 'Downloading playlist', query=query) playlist_id, 'Downloading playlist', query=query)
entries = self._extract_track_entries(data['tracks'], token) return self._extract_set(data, token)
return self.playlist_result(
entries, playlist_id, data.get('title'), data.get('description'))
class SoundcloudSearchIE(SearchInfoExtractor, SoundcloudIE): class SoundcloudSearchIE(SearchInfoExtractor, SoundcloudIE):
@ -751,7 +779,6 @@ class SoundcloudSearchIE(SearchInfoExtractor, SoundcloudIE):
self._MAX_RESULTS_PER_PAGE) self._MAX_RESULTS_PER_PAGE)
query.update({ query.update({
'limit': limit, 'limit': limit,
'client_id': self._CLIENT_ID,
'linked_partitioning': 1, 'linked_partitioning': 1,
'offset': 0, 'offset': 0,
}) })
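
With the hard-coded client_id gone, the extractor now keeps the last working id in the cache and, whenever the API answers 401, re-scrapes soundcloud.com: it walks the page's script URLs in reverse order and greps each script body for a 32-character client_id literal. A rough, network-free sketch of that scraping step (the script URL and body below are made-up stand-ins):

import re

def find_client_id(page, scripts):
    # scripts: mapping of script URL -> already downloaded script body
    for src in reversed(re.findall(r'<script[^>]+src="([^"]+)"', page)):
        body = scripts.get(src)
        if not body:
            continue
        m = re.search(r'client_id\s*:\s*"([0-9a-zA-Z]{32})"', body)
        if m:
            return m.group(1)
    return None

page = '<script src="https://a.sndcdn.com/app.js"></script>'
scripts = {'https://a.sndcdn.com/app.js': 'var config={client_id:"' + 'a' * 32 + '"};'}
print(find_client_id(page, scripts))  # aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa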

View File

@ -4,6 +4,7 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
determine_ext,
ExtractorError, ExtractorError,
merge_dicts, merge_dicts,
orderedSet, orderedSet,
@ -64,7 +65,7 @@ class SpankBangIE(InfoExtractor):
url.replace('/%s/embed' % video_id, '/%s/video' % video_id), url.replace('/%s/embed' % video_id, '/%s/video' % video_id),
video_id, headers={'Cookie': 'country=US'}) video_id, headers={'Cookie': 'country=US'})
if re.search(r'<[^>]+\bid=["\']video_removed', webpage): if re.search(r'<[^>]+\b(?:id|class)=["\']video_removed', webpage):
raise ExtractorError( raise ExtractorError(
'Video %s is not available' % video_id, expected=True) 'Video %s is not available' % video_id, expected=True)
@ -75,11 +76,20 @@ class SpankBangIE(InfoExtractor):
if not f_url: if not f_url:
return return
f = parse_resolution(format_id) f = parse_resolution(format_id)
f.update({ ext = determine_ext(f_url)
'url': f_url, if format_id.startswith('m3u8') or ext == 'm3u8':
'format_id': format_id, formats.extend(self._extract_m3u8_formats(
}) f_url, video_id, 'mp4', entry_protocol='m3u8_native',
formats.append(f) m3u8_id='hls', fatal=False))
elif format_id.startswith('mpd') or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
f_url, video_id, mpd_id='dash', fatal=False))
elif ext == 'mp4' or f.get('width') or f.get('height'):
f.update({
'url': f_url,
'format_id': format_id,
})
formats.append(f)
STREAM_URL_PREFIX = 'stream_url_' STREAM_URL_PREFIX = 'stream_url_'
@ -93,28 +103,22 @@ class SpankBangIE(InfoExtractor):
r'data-streamkey\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1', r'data-streamkey\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
webpage, 'stream key', group='value') webpage, 'stream key', group='value')
sb_csrf_session = self._get_cookies(
'https://spankbang.com')['sb_csrf_session'].value
stream = self._download_json( stream = self._download_json(
'https://spankbang.com/api/videos/stream', video_id, 'https://spankbang.com/api/videos/stream', video_id,
'Downloading stream JSON', data=urlencode_postdata({ 'Downloading stream JSON', data=urlencode_postdata({
'id': stream_key, 'id': stream_key,
'data': 0, 'data': 0,
'sb_csrf_session': sb_csrf_session,
}), headers={ }), headers={
'Referer': url, 'Referer': url,
'X-CSRFToken': sb_csrf_session, 'X-Requested-With': 'XMLHttpRequest',
}) })
for format_id, format_url in stream.items(): for format_id, format_url in stream.items():
if format_id.startswith(STREAM_URL_PREFIX): if format_url and isinstance(format_url, list):
if format_url and isinstance(format_url, list): format_url = format_url[0]
format_url = format_url[0] extract_format(format_id, format_url)
extract_format(
format_id[len(STREAM_URL_PREFIX):], format_url)
self._sort_formats(formats) self._sort_formats(formats, field_preference=('preference', 'height', 'width', 'fps', 'tbr', 'format_id'))
info = self._search_json_ld(webpage, video_id, default={}) info = self._search_json_ld(webpage, video_id, default={})
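
extract_format now decides per stream key whether it is dealing with an HLS playlist, a DASH manifest or a plain progressive MP4, based on the key prefix and the URL extension. A condensed sketch of just that branching, with a trimmed-down stand-in for determine_ext and fictitious CDN URLs:

def determine_ext(url):
    # crude approximation of the youtube-dl helper: suffix of the last path segment
    guess = url.partition('?')[0].rpartition('/')[2].rpartition('.')[2]
    return guess if guess.isalnum() else None

def classify(format_id, f_url):
    ext = determine_ext(f_url)
    if format_id.startswith('m3u8') or ext == 'm3u8':
        return 'hls'          # expand via _extract_m3u8_formats
    if format_id.startswith('mpd') or ext == 'mpd':
        return 'dash'         # expand via _extract_mpd_formats
    return 'progressive'      # keep as a single direct format

print(classify('m3u8_240p', 'https://cdn.example.com/v/master.m3u8?token=abc'))  # hls
print(classify('240p', 'https://cdn.example.com/v/240p.mp4'))                    # progressive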

View File

@ -3,34 +3,47 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_unquote,
compat_urllib_parse_urlparse,
)
from ..utils import ( from ..utils import (
sanitized_Request, float_or_none,
int_or_none,
merge_dicts,
str_or_none,
str_to_int, str_to_int,
unified_strdate, url_or_none,
) )
from ..aes import aes_decrypt_text
class SpankwireIE(InfoExtractor): class SpankwireIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?P<url>spankwire\.com/[^/]*/video(?P<id>[0-9]+)/?)' _VALID_URL = r'''(?x)
https?://
(?:www\.)?spankwire\.com/
(?:
[^/]+/video|
EmbedPlayer\.aspx/?\?.*?\bArticleId=
)
(?P<id>\d+)
'''
_TESTS = [{ _TESTS = [{
# download URL pattern: */<height>P_<tbr>K_<video_id>.mp4 # download URL pattern: */<height>P_<tbr>K_<video_id>.mp4
'url': 'http://www.spankwire.com/Buckcherry-s-X-Rated-Music-Video-Crazy-Bitch/video103545/', 'url': 'http://www.spankwire.com/Buckcherry-s-X-Rated-Music-Video-Crazy-Bitch/video103545/',
'md5': '8bbfde12b101204b39e4b9fe7eb67095', 'md5': '5aa0e4feef20aad82cbcae3aed7ab7cd',
'info_dict': { 'info_dict': {
'id': '103545', 'id': '103545',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Buckcherry`s X Rated Music Video Crazy Bitch', 'title': 'Buckcherry`s X Rated Music Video Crazy Bitch',
'description': 'Crazy Bitch X rated music video.', 'description': 'Crazy Bitch X rated music video.',
'duration': 222,
'uploader': 'oreusz', 'uploader': 'oreusz',
'uploader_id': '124697', 'uploader_id': '124697',
'upload_date': '20070507', 'timestamp': 1178587885,
'upload_date': '20070508',
'average_rating': float,
'view_count': int,
'comment_count': int,
'age_limit': 18, 'age_limit': 18,
} 'categories': list,
'tags': list,
},
}, { }, {
# download URL pattern: */mp4_<format_id>_<video_id>.mp4 # download URL pattern: */mp4_<format_id>_<video_id>.mp4
'url': 'http://www.spankwire.com/Titcums-Compiloation-I/video1921551/', 'url': 'http://www.spankwire.com/Titcums-Compiloation-I/video1921551/',
@ -45,83 +58,125 @@ class SpankwireIE(InfoExtractor):
'upload_date': '20150822', 'upload_date': '20150822',
'age_limit': 18, 'age_limit': 18,
}, },
'params': {
'proxy': '127.0.0.1:8118'
},
'skip': 'removed',
}, {
'url': 'https://www.spankwire.com/EmbedPlayer.aspx/?ArticleId=156156&autostart=true',
'only_matching': True,
}] }]
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?spankwire\.com/EmbedPlayer\.aspx/?\?.*?\bArticleId=\d+)',
webpage)
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('id')
req = sanitized_Request('http://www.' + mobj.group('url')) video = self._download_json(
req.add_header('Cookie', 'age_verified=1') 'https://www.spankwire.com/api/video/%s.json' % video_id, video_id)
webpage = self._download_webpage(req, video_id)
title = self._html_search_regex( title = video['title']
r'<h1>([^<]+)', webpage, 'title')
description = self._html_search_regex(
r'(?s)<div\s+id="descriptionContent">(.+?)</div>',
webpage, 'description', fatal=False)
thumbnail = self._html_search_regex(
r'playerData\.screenShot\s*=\s*["\']([^"\']+)["\']',
webpage, 'thumbnail', fatal=False)
uploader = self._html_search_regex(
r'by:\s*<a [^>]*>(.+?)</a>',
webpage, 'uploader', fatal=False)
uploader_id = self._html_search_regex(
r'by:\s*<a href="/(?:user/viewProfile|Profile\.aspx)\?.*?UserId=(\d+).*?"',
webpage, 'uploader id', fatal=False)
upload_date = unified_strdate(self._html_search_regex(
r'</a> on (.+?) at \d+:\d+',
webpage, 'upload date', fatal=False))
view_count = str_to_int(self._html_search_regex(
r'<div id="viewsCounter"><span>([\d,\.]+)</span> views</div>',
webpage, 'view count', fatal=False))
comment_count = str_to_int(self._html_search_regex(
r'<span\s+id="spCommentCount"[^>]*>([\d,\.]+)</span>',
webpage, 'comment count', fatal=False))
videos = re.findall(
r'playerData\.cdnPath([0-9]{3,})\s*=\s*(?:encodeURIComponent\()?["\']([^"\']+)["\']', webpage)
heights = [int(video[0]) for video in videos]
video_urls = list(map(compat_urllib_parse_unquote, [video[1] for video in videos]))
if webpage.find(r'flashvars\.encrypted = "true"') != -1:
password = self._search_regex(
r'flashvars\.video_title = "([^"]+)',
webpage, 'password').replace('+', ' ')
video_urls = list(map(
lambda s: aes_decrypt_text(s, password, 32).decode('utf-8'),
video_urls))
formats = [] formats = []
for height, video_url in zip(heights, video_urls): videos = video.get('videos')
path = compat_urllib_parse_urlparse(video_url).path if isinstance(videos, dict):
m = re.search(r'/(?P<height>\d+)[pP]_(?P<tbr>\d+)[kK]', path) for format_id, format_url in videos.items():
if m: video_url = url_or_none(format_url)
tbr = int(m.group('tbr')) if not format_url:
height = int(m.group('height')) continue
else: height = int_or_none(self._search_regex(
tbr = None r'(\d+)[pP]', format_id, 'height', default=None))
formats.append({ m = re.search(
'url': video_url, r'/(?P<height>\d+)[pP]_(?P<tbr>\d+)[kK]', video_url)
'format_id': '%dp' % height, if m:
'height': height, tbr = int(m.group('tbr'))
'tbr': tbr, height = height or int(m.group('height'))
else:
tbr = None
formats.append({
'url': video_url,
'format_id': '%dp' % height if height else format_id,
'height': height,
'tbr': tbr,
})
m3u8_url = url_or_none(video.get('HLS'))
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
self._sort_formats(formats, ('height', 'tbr', 'width', 'format_id'))
view_count = str_to_int(video.get('viewed'))
thumbnails = []
for preference, t in enumerate(('', '2x'), start=0):
thumbnail_url = url_or_none(video.get('poster%s' % t))
if not thumbnail_url:
continue
thumbnails.append({
'url': thumbnail_url,
'preference': preference,
}) })
self._sort_formats(formats)
age_limit = self._rta_search(webpage) def extract_names(key):
entries_list = video.get(key)
if not isinstance(entries_list, list):
return
entries = []
for entry in entries_list:
name = str_or_none(entry.get('name'))
if name:
entries.append(name)
return entries
return { categories = extract_names('categories')
tags = extract_names('tags')
uploader = None
info = {}
webpage = self._download_webpage(
'https://www.spankwire.com/_/video%s/' % video_id, video_id,
fatal=False)
if webpage:
info = self._search_json_ld(webpage, video_id, default={})
thumbnail_url = None
if 'thumbnail' in info:
thumbnail_url = url_or_none(info['thumbnail'])
del info['thumbnail']
if not thumbnail_url:
thumbnail_url = self._og_search_thumbnail(webpage)
if thumbnail_url:
thumbnails.append({
'url': thumbnail_url,
'preference': 10,
})
uploader = self._html_search_regex(
r'(?s)by\s*<a[^>]+\bclass=["\']uploaded__by[^>]*>(.+?)</a>',
webpage, 'uploader', fatal=False)
if not view_count:
view_count = str_to_int(self._search_regex(
r'data-views=["\']([\d,.]+)', webpage, 'view count',
fatal=False))
return merge_dicts({
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': description, 'description': video.get('description'),
'thumbnail': thumbnail, 'duration': int_or_none(video.get('duration')),
'thumbnails': thumbnails,
'uploader': uploader, 'uploader': uploader,
'uploader_id': uploader_id, 'uploader_id': str_or_none(video.get('userId')),
'upload_date': upload_date, 'timestamp': int_or_none(video.get('time_approved_on')),
'average_rating': float_or_none(video.get('rating')),
'view_count': view_count, 'view_count': view_count,
'comment_count': comment_count, 'comment_count': int_or_none(video.get('comments')),
'age_limit': 18,
'categories': categories,
'tags': tags,
'formats': formats, 'formats': formats,
'age_limit': age_limit, }, info)
}
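
The JSON API returns a flat mapping of format keys to URLs, so height and total bitrate are still recovered from download URLs shaped like */<height>P_<tbr>K_<video_id>.mp4, the pattern called out in the first test comment. That parsing step in isolation, on an invented URL of the same shape:

import re

def parse_height_tbr(video_url):
    m = re.search(r'/(?P<height>\d+)[pP]_(?P<tbr>\d+)[kK]', video_url)
    if not m:
        return None, None
    return int(m.group('height')), int(m.group('tbr'))

print(parse_height_tbr('https://cdn.example.com/videos/720P_4000K_103545.mp4'))  # (720, 4000)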

View File

@ -13,36 +13,18 @@ from ..utils import (
class SportDeutschlandIE(InfoExtractor): class SportDeutschlandIE(InfoExtractor):
_VALID_URL = r'https?://sportdeutschland\.tv/(?P<sport>[^/?#]+)/(?P<id>[^?#/]+)(?:$|[?#])' _VALID_URL = r'https?://sportdeutschland\.tv/(?P<sport>[^/?#]+)/(?P<id>[^?#/]+)(?:$|[?#])'
_TESTS = [{ _TESTS = [{
'url': 'http://sportdeutschland.tv/badminton/live-li-ning-badminton-weltmeisterschaft-2014-kopenhagen', 'url': 'https://sportdeutschland.tv/badminton/re-live-deutsche-meisterschaften-2020-halbfinals?playlistId=0',
'info_dict': { 'info_dict': {
'id': 'live-li-ning-badminton-weltmeisterschaft-2014-kopenhagen', 'id': 're-live-deutsche-meisterschaften-2020-halbfinals',
'ext': 'mp4', 'ext': 'mp4',
'title': 're:Li-Ning Badminton Weltmeisterschaft 2014 Kopenhagen', 'title': 're:Re-live: Deutsche Meisterschaften 2020.*Halbfinals',
'categories': ['Badminton'], 'categories': ['Badminton-Deutschland'],
'view_count': int, 'view_count': int,
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
'description': r're:Die Badminton-WM 2014 aus Kopenhagen bei Sportdeutschland\.TV',
'timestamp': int, 'timestamp': int,
'upload_date': 're:^201408[23][0-9]$', 'upload_date': '20200201',
'description': 're:.*', # meaningless description for THIS video
}, },
'params': {
'skip_download': 'Live stream',
},
}, {
'url': 'http://sportdeutschland.tv/li-ning-badminton-wm-2014/lee-li-ning-badminton-weltmeisterschaft-2014-kopenhagen-herren-einzel-wei-vs',
'info_dict': {
'id': 'lee-li-ning-badminton-weltmeisterschaft-2014-kopenhagen-herren-einzel-wei-vs',
'ext': 'mp4',
'upload_date': '20140825',
'description': 'md5:60a20536b57cee7d9a4ec005e8687504',
'timestamp': 1408976060,
'duration': 2732,
'title': 'Li-Ning Badminton Weltmeisterschaft 2014 Kopenhagen: Herren Einzel, Wei Lee vs. Keun Lee',
'thumbnail': r're:^https?://.*\.jpg$',
'view_count': int,
'categories': ['Li-Ning Badminton WM 2014'],
}
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -50,7 +32,7 @@ class SportDeutschlandIE(InfoExtractor):
video_id = mobj.group('id') video_id = mobj.group('id')
sport_id = mobj.group('sport') sport_id = mobj.group('sport')
api_url = 'http://proxy.vidibusdynamic.net/sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % ( api_url = 'https://proxy.vidibusdynamic.net/ssl/backend.sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % (
sport_id, video_id) sport_id, video_id)
req = sanitized_Request(api_url, headers={ req = sanitized_Request(api_url, headers={
'Accept': 'application/vnd.vidibus.v2.html+json', 'Accept': 'application/vnd.vidibus.v2.html+json',

View File

@ -1,14 +1,14 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .ard import ARDMediathekIE from .ard import ARDMediathekBaseIE
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
get_element_by_attribute, get_element_by_attribute,
) )
class SRMediathekIE(ARDMediathekIE): class SRMediathekIE(ARDMediathekBaseIE):
IE_NAME = 'sr:mediathek' IE_NAME = 'sr:mediathek'
IE_DESC = 'Saarländischer Rundfunk' IE_DESC = 'Saarländischer Rundfunk'
_VALID_URL = r'https?://sr-mediathek(?:\.sr-online)?\.de/index\.php\?.*?&id=(?P<id>[0-9]+)' _VALID_URL = r'https?://sr-mediathek(?:\.sr-online)?\.de/index\.php\?.*?&id=(?P<id>[0-9]+)'

View File

@ -1,128 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_chr
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
js_to_json,
)
class StreamangoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:streamango\.com|fruithosts\.net|streamcherry\.com)/(?:f|embed)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://streamango.com/f/clapasobsptpkdfe/20170315_150006_mp4',
'md5': 'e992787515a182f55e38fc97588d802a',
'info_dict': {
'id': 'clapasobsptpkdfe',
'ext': 'mp4',
'title': '20170315_150006.mp4',
}
}, {
# no og:title
'url': 'https://streamango.com/embed/foqebrpftarclpob/asdf_asd_2_mp4',
'info_dict': {
'id': 'foqebrpftarclpob',
'ext': 'mp4',
'title': 'foqebrpftarclpob',
},
'params': {
'skip_download': True,
},
'skip': 'gone',
}, {
'url': 'https://streamango.com/embed/clapasobsptpkdfe/20170315_150006_mp4',
'only_matching': True,
}, {
'url': 'https://fruithosts.net/f/mreodparcdcmspsm/w1f1_r4lph_2018_brrs_720p_latino_mp4',
'only_matching': True,
}, {
'url': 'https://streamcherry.com/f/clapasobsptpkdfe/',
'only_matching': True,
}]
def _real_extract(self, url):
def decrypt_src(encoded, val):
ALPHABET = '=/+9876543210zyxwvutsrqponmlkjihgfedcbaZYXWVUTSRQPONMLKJIHGFEDCBA'
encoded = re.sub(r'[^A-Za-z0-9+/=]', '', encoded)
decoded = ''
sm = [None] * 4
i = 0
str_len = len(encoded)
while i < str_len:
for j in range(4):
sm[j % 4] = ALPHABET.index(encoded[i])
i += 1
char_code = ((sm[0] << 0x2) | (sm[1] >> 0x4)) ^ val
decoded += compat_chr(char_code)
if sm[2] != 0x40:
char_code = ((sm[1] & 0xf) << 0x4) | (sm[2] >> 0x2)
decoded += compat_chr(char_code)
if sm[3] != 0x40:
char_code = ((sm[2] & 0x3) << 0x6) | sm[3]
decoded += compat_chr(char_code)
return decoded
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._og_search_title(webpage, default=video_id)
formats = []
for format_ in re.findall(r'({[^}]*\bsrc\s*:\s*[^}]*})', webpage):
mobj = re.search(r'(src\s*:\s*[^(]+\(([^)]*)\)[\s,]*)', format_)
if mobj is None:
continue
format_ = format_.replace(mobj.group(0), '')
video = self._parse_json(
format_, video_id, transform_source=js_to_json,
fatal=False) or {}
mobj = re.search(
r'([\'"])(?P<src>(?:(?!\1).)+)\1\s*,\s*(?P<val>\d+)',
mobj.group(1))
if mobj is None:
continue
src = decrypt_src(mobj.group('src'), int_or_none(mobj.group('val')))
if not src:
continue
ext = determine_ext(src, default_ext=None)
if video.get('type') == 'application/dash+xml' or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
src, video_id, mpd_id='dash', fatal=False))
else:
formats.append({
'url': src,
'ext': ext or 'mp4',
'width': int_or_none(video.get('width')),
'height': int_or_none(video.get('height')),
'tbr': int_or_none(video.get('bitrate')),
})
if not formats:
error = self._search_regex(
r'<p[^>]+\bclass=["\']lead[^>]+>(.+?)</p>', webpage,
'error', default=None)
if not error and '>Sorry' in webpage:
error = 'Video %s is not available' % video_id
if error:
raise ExtractorError(error, expected=True)
self._sort_formats(formats)
return {
'id': video_id,
'url': url,
'title': title,
'formats': formats,
}

View File

@ -5,44 +5,28 @@ from ..utils import int_or_none
class StretchInternetIE(InfoExtractor):
-_VALID_URL = r'https?://portal\.stretchinternet\.com/[^/]+/portal\.htm\?.*?\beventId=(?P<id>\d+)'
+_VALID_URL = r'https?://portal\.stretchinternet\.com/[^/]+/(?:portal|full)\.htm\?.*?\beventId=(?P<id>\d+)'
_TEST = {
-'url': 'https://portal.stretchinternet.com/umary/portal.htm?eventId=313900&streamType=video',
+'url': 'https://portal.stretchinternet.com/umary/portal.htm?eventId=573272&streamType=video',
'info_dict': {
-'id': '313900',
+'id': '573272',
'ext': 'mp4',
-'title': 'Augustana (S.D.) Baseball vs University of Mary',
+'title': 'University of Mary Wrestling vs. Upper Iowa',
-'description': 'md5:7578478614aae3bdd4a90f578f787438',
-'timestamp': 1490468400,
-'upload_date': '20170325',
+'timestamp': 1575668361,
+'upload_date': '20191206',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
-stream = self._download_json(
-'https://neo-client.stretchinternet.com/streamservice/v1/media/stream/v%s'
-% video_id, video_id)
-video_url = 'https://%s' % stream['source']
event = self._download_json(
-'https://neo-client.stretchinternet.com/portal-ws/getEvent.json',
-video_id, query={
-'clientID': 99997,
-'eventID': video_id,
-'token': 'asdf',
-})['event']
-title = event.get('title') or event['mobileTitle']
-description = event.get('customText')
-timestamp = int_or_none(event.get('longtime'))
+'https://api.stretchinternet.com/trinity/event/tcg/' + video_id,
+video_id)[0]
return {
'id': video_id,
-'title': title,
-'description': description,
-'timestamp': timestamp,
-'url': video_url,
+'title': event['title'],
+'timestamp': int_or_none(event.get('dateCreated'), 1000),
+'url': 'https://' + event['media'][0]['url'],
}

View File

@ -4,19 +4,14 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
-from ..compat import (
-compat_parse_qs,
-compat_urllib_parse_urlparse,
-)
+from ..compat import compat_str
from ..utils import (
determine_ext,
dict_get,
int_or_none,
-orderedSet,
+str_or_none,
strip_or_none,
try_get,
-urljoin,
-compat_str,
)
@ -237,23 +232,23 @@ class SVTPlayIE(SVTPlayBaseIE):
class SVTSeriesIE(SVTPlayBaseIE):
-_VALID_URL = r'https?://(?:www\.)?svtplay\.se/(?P<id>[^/?&#]+)'
+_VALID_URL = r'https?://(?:www\.)?svtplay\.se/(?P<id>[^/?&#]+)(?:.+?\btab=(?P<season_slug>[^&#]+))?'
_TESTS = [{
'url': 'https://www.svtplay.se/rederiet',
'info_dict': {
-'id': 'rederiet',
+'id': '14445680',
'title': 'Rederiet',
-'description': 'md5:505d491a58f4fcf6eb418ecab947e69e',
+'description': 'md5:d9fdfff17f5d8f73468176ecd2836039',
},
'playlist_mincount': 318,
}, {
-'url': 'https://www.svtplay.se/rederiet?tab=sasong2',
+'url': 'https://www.svtplay.se/rederiet?tab=season-2-14445680',
'info_dict': {
-'id': 'rederiet-sasong2',
+'id': 'season-2-14445680',
'title': 'Rederiet - Säsong 2',
-'description': 'md5:505d491a58f4fcf6eb418ecab947e69e',
+'description': 'md5:d9fdfff17f5d8f73468176ecd2836039',
},
-'playlist_count': 12,
+'playlist_mincount': 12,
}]
@classmethod
@ -261,83 +256,87 @@ class SVTSeriesIE(SVTPlayBaseIE):
return False if SVTIE.suitable(url) or SVTPlayIE.suitable(url) else super(SVTSeriesIE, cls).suitable(url)
def _real_extract(self, url):
-series_id = self._match_id(url)
-qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
-season_slug = qs.get('tab', [None])[0]
-if season_slug:
-series_id += '-%s' % season_slug
-webpage = self._download_webpage(
-url, series_id, 'Downloading series page')
-root = self._parse_json(
-self._search_regex(
-self._SVTPLAY_RE, webpage, 'content', group='json'),
-series_id)
+series_slug, season_id = re.match(self._VALID_URL, url).groups()
+series = self._download_json(
+'https://api.svt.se/contento/graphql', series_slug,
+'Downloading series page', query={
+'query': '''{
+listablesBySlug(slugs: ["%s"]) {
+associatedContent(include: [productionPeriod, season]) {
+items {
+item {
+... on Episode {
+videoSvtId
+}
+}
+}
+id
+name
+}
+id
+longDescription
+name
+shortDescription
+}
+}''' % series_slug,
+})['data']['listablesBySlug'][0]
season_name = None
entries = []
-for season in root['relatedVideoContent']['relatedVideosAccordion']:
+for season in series['associatedContent']:
if not isinstance(season, dict):
continue
-if season_slug:
-if season.get('slug') != season_slug:
+if season_id:
+if season.get('id') != season_id:
continue
season_name = season.get('name')
-videos = season.get('videos')
-if not isinstance(videos, list):
+items = season.get('items')
+if not isinstance(items, list):
continue
-for video in videos:
-content_url = video.get('contentUrl')
-if not content_url or not isinstance(content_url, compat_str):
+for item in items:
+video = item.get('item') or {}
+content_id = video.get('videoSvtId')
+if not content_id or not isinstance(content_id, compat_str):
continue
-entries.append(
-self.url_result(
-urljoin(url, content_url),
-ie=SVTPlayIE.ie_key(),
-video_title=video.get('title')
-))
+entries.append(self.url_result(
+'svt:' + content_id, SVTPlayIE.ie_key(), content_id))
-metadata = root.get('metaData')
-if not isinstance(metadata, dict):
-metadata = {}
-title = metadata.get('title')
-season_name = season_name or season_slug
+title = series.get('name')
+season_name = season_name or season_id
if title and season_name:
title = '%s - %s' % (title, season_name)
-elif season_slug:
-title = season_slug
+elif season_id:
+title = season_id
return self.playlist_result(
-entries, series_id, title, metadata.get('description'))
+entries, season_id or series.get('id'), title,
+dict_get(series, ('longDescription', 'shortDescription')))
class SVTPageIE(InfoExtractor):
-_VALID_URL = r'https?://(?:www\.)?svt\.se/(?:[^/]+/)*(?P<id>[^/?&#]+)'
+_VALID_URL = r'https?://(?:www\.)?svt\.se/(?P<path>(?:[^/]+/)*(?P<id>[^/?&#]+))'
_TESTS = [{
-'url': 'https://www.svt.se/sport/oseedat/guide-sommartraningen-du-kan-gora-var-och-nar-du-vill',
+'url': 'https://www.svt.se/sport/ishockey/bakom-masken-lehners-kamp-mot-mental-ohalsa',
'info_dict': {
-'id': 'guide-sommartraningen-du-kan-gora-var-och-nar-du-vill',
+'id': '25298267',
-'title': 'GUIDE: Sommarträning du kan göra var och när du vill',
+'title': 'Bakom masken Lehners kamp mot mental ohälsa',
},
-'playlist_count': 7,
+'playlist_count': 4,
}, {
-'url': 'https://www.svt.se/nyheter/inrikes/ebba-busch-thor-kd-har-delvis-ratt-om-no-go-zoner',
+'url': 'https://www.svt.se/nyheter/utrikes/svenska-andrea-ar-en-mil-fran-branderna-i-kalifornien',
'info_dict': {
-'id': 'ebba-busch-thor-kd-har-delvis-ratt-om-no-go-zoner',
+'id': '24243746',
-'title': 'Ebba Busch Thor har bara delvis rätt om ”no-go-zoner”',
+'title': 'Svenska Andrea redo att fly sitt hem i Kalifornien',
},
-'playlist_count': 1,
+'playlist_count': 2,
}, {
# only programTitle
'url': 'http://www.svt.se/sport/ishockey/jagr-tacklar-giroux-under-intervjun',
'info_dict': {
-'id': '2900353',
+'id': '8439V2K',
'ext': 'mp4',
'title': 'Stjärnorna skojar till det - under SVT-intervjun',
'duration': 27,
@ -356,16 +355,26 @@ class SVTPageIE(InfoExtractor):
return False if SVTIE.suitable(url) else super(SVTPageIE, cls).suitable(url)
def _real_extract(self, url):
-playlist_id = self._match_id(url)
-webpage = self._download_webpage(url, playlist_id)
+path, display_id = re.match(self._VALID_URL, url).groups()
+article = self._download_json(
+'https://api.svt.se/nss-api/page/' + path, display_id,
+query={'q': 'articles'})['articles']['content'][0]
-entries = [
-self.url_result(
-'svt:%s' % video_id, ie=SVTPlayIE.ie_key(), video_id=video_id)
-for video_id in orderedSet(re.findall(
-r'data-video-id=["\'](\d+)', webpage))]
+entries = []
-title = strip_or_none(self._og_search_title(webpage, default=None))
-return self.playlist_result(entries, playlist_id, title)
+def _process_content(content):
+if content.get('_type') in ('VIDEOCLIP', 'VIDEOEPISODE'):
+video_id = compat_str(content['image']['svtId'])
+entries.append(self.url_result(
+'svt:' + video_id, SVTPlayIE.ie_key(), video_id))
+for media in article.get('media', []):
+_process_content(media)
+for obj in article.get('structuredBody', []):
+_process_content(obj.get('content') or {})
+return self.playlist_result(
+entries, str_or_none(article.get('id')),
+strip_or_none(article.get('title')))

View File

@ -4,11 +4,12 @@ import re
from .common import InfoExtractor
from .wistia import WistiaIE
-from ..compat import compat_str
from ..utils import (
clean_html,
ExtractorError,
+int_or_none,
get_element_by_class,
+strip_or_none,
urlencode_postdata,
urljoin,
)
@ -20,8 +21,8 @@ class TeachableBaseIE(InfoExtractor):
_SITES = {
# Only notable ones here
-'upskillcourses.com': 'upskill',
+'v1.upskillcourses.com': 'upskill',
-'academy.gns3.com': 'gns3',
+'gns3.teachable.com': 'gns3',
'academyhacker.com': 'academyhacker',
'stackskills.com': 'stackskills',
'market.saleshacker.com': 'saleshacker',
@ -58,7 +59,7 @@ class TeachableBaseIE(InfoExtractor):
self._logged_in = True
return
-login_url = compat_str(urlh.geturl())
+login_url = urlh.geturl()
login_form = self._hidden_inputs(login_page)
@ -110,27 +111,29 @@ class TeachableIE(TeachableBaseIE):
''' % TeachableBaseIE._VALID_URL_SUB_TUPLE
_TESTS = [{
-'url': 'http://upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
+'url': 'https://gns3.teachable.com/courses/gns3-certified-associate/lectures/6842364',
'info_dict': {
-'id': 'uzw6zw58or',
+'id': 'untlgzk1v7',
-'ext': 'mp4',
+'ext': 'bin',
-'title': 'Welcome to the Course!',
+'title': 'Overview',
-'description': 'md5:65edb0affa582974de4625b9cdea1107',
+'description': 'md5:071463ff08b86c208811130ea1c2464c',
-'duration': 138.763,
+'duration': 736.4,
-'timestamp': 1479846621,
+'timestamp': 1542315762,
-'upload_date': '20161122',
+'upload_date': '20181115',
+'chapter': 'Welcome',
+'chapter_number': 1,
},
'params': {
'skip_download': True,
},
}, {
-'url': 'http://upskillcourses.com/courses/119763/lectures/1747100',
+'url': 'http://v1.upskillcourses.com/courses/119763/lectures/1747100',
'only_matching': True,
}, {
-'url': 'https://academy.gns3.com/courses/423415/lectures/6885939',
+'url': 'https://gns3.teachable.com/courses/423415/lectures/6885939',
'only_matching': True,
}, {
-'url': 'teachable:https://upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
+'url': 'teachable:https://v1.upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
'only_matching': True,
}]
@ -160,22 +163,51 @@ class TeachableIE(TeachableBaseIE):
webpage = self._download_webpage(url, video_id)
-wistia_url = WistiaIE._extract_url(webpage)
-if not wistia_url:
+wistia_urls = WistiaIE._extract_urls(webpage)
+if not wistia_urls:
if any(re.search(p, webpage) for p in (
r'class=["\']lecture-contents-locked',
r'>\s*Lecture contents locked',
-r'id=["\']lecture-locked')):
+r'id=["\']lecture-locked',
+# https://academy.tailoredtutors.co.uk/courses/108779/lectures/1955313
+r'class=["\'](?:inner-)?lesson-locked',
+r'>LESSON LOCKED<')):
self.raise_login_required('Lecture contents locked')
+raise ExtractorError('Unable to find video URL')
title = self._og_search_title(webpage, default=None)
-return {
+chapter = None
+chapter_number = None
+section_item = self._search_regex(
+r'(?s)(?P<li><li[^>]+\bdata-lecture-id=["\']%s[^>]+>.+?</li>)' % video_id,
+webpage, 'section item', default=None, group='li')
+if section_item:
+chapter_number = int_or_none(self._search_regex(
+r'data-ss-position=["\'](\d+)', section_item, 'section id',
+default=None))
+if chapter_number is not None:
+sections = []
+for s in re.findall(
+r'(?s)<div[^>]+\bclass=["\']section-title[^>]+>(.+?)</div>', webpage):
+section = strip_or_none(clean_html(s))
+if not section:
+sections = []
+break
+sections.append(section)
+if chapter_number <= len(sections):
+chapter = sections[chapter_number - 1]
+entries = [{
'_type': 'url_transparent',
'url': wistia_url,
'ie_key': WistiaIE.ie_key(),
'title': title,
-}
+'chapter': chapter,
+'chapter_number': chapter_number,
+} for wistia_url in wistia_urls]
+return self.playlist_result(entries, video_id, title)
class TeachableCourseIE(TeachableBaseIE): class TeachableCourseIE(TeachableBaseIE):
@ -187,20 +219,20 @@ class TeachableCourseIE(TeachableBaseIE):
/(?:courses|p)/(?:enrolled/)?(?P<id>[^/?#&]+)
''' % TeachableBaseIE._VALID_URL_SUB_TUPLE
_TESTS = [{
-'url': 'http://upskillcourses.com/courses/essential-web-developer-course/',
+'url': 'http://v1.upskillcourses.com/courses/essential-web-developer-course/',
'info_dict': {
'id': 'essential-web-developer-course',
'title': 'The Essential Web Developer Course (Free)',
},
'playlist_count': 192,
}, {
-'url': 'http://upskillcourses.com/courses/119763/',
+'url': 'http://v1.upskillcourses.com/courses/119763/',
'only_matching': True,
}, {
-'url': 'http://upskillcourses.com/courses/enrolled/119763',
+'url': 'http://v1.upskillcourses.com/courses/enrolled/119763',
'only_matching': True,
}, {
-'url': 'https://academy.gns3.com/courses/enrolled/423415',
+'url': 'https://gns3.teachable.com/courses/enrolled/423415',
'only_matching': True,
}, {
'url': 'teachable:https://learn.vrdev.school/p/gear-vr-developer-mini',

View File

@ -1,35 +1,33 @@
from __future__ import unicode_literals
-import re
from .common import InfoExtractor
-from .ooyala import OoyalaIE
class TeachingChannelIE(InfoExtractor):
-_VALID_URL = r'https?://(?:www\.)?teachingchannel\.org/videos/(?P<title>.+)'
+_VALID_URL = r'https?://(?:www\.)?teachingchannel\.org/videos?/(?P<id>[^/?&#]+)'
_TEST = {
'url': 'https://www.teachingchannel.org/videos/teacher-teaming-evolution',
-'md5': '3d6361864d7cac20b57c8784da17166f',
'info_dict': {
-'id': 'F3bnlzbToeI6pLEfRyrlfooIILUjz4nM',
+'id': '3swwlzkT',
'ext': 'mp4',
'title': 'A History of Teaming',
'description': 'md5:2a9033db8da81f2edffa4c99888140b3',
-'duration': 422.255,
+'duration': 422,
+'upload_date': '20170316',
+'timestamp': 1489691297,
},
'params': {
'skip_download': True,
},
-'add_ie': ['Ooyala'],
+'add_ie': ['JWPlatform'],
}
def _real_extract(self, url):
-mobj = re.match(self._VALID_URL, url)
-title = mobj.group('title')
-webpage = self._download_webpage(url, title)
-ooyala_code = self._search_regex(
-r'data-embed-code=\'(.+?)\'', webpage, 'ooyala code')
-return OoyalaIE._build_url_result(ooyala_code)
+display_id = self._match_id(url)
+webpage = self._download_webpage(url, display_id)
+mid = self._search_regex(
+r'(?:data-mid=["\']|id=["\']jw-video-player-)([a-zA-Z0-9]{8})',
+webpage, 'media id')
+return self.url_result('jwplatform:' + mid, 'JWPlatform', mid)

View File

@ -1,9 +1,19 @@
# coding: utf-8
from __future__ import unicode_literals
+import re
from .common import InfoExtractor
+from .jwplatform import JWPlatformIE
from .nexx import NexxIE
-from ..compat import compat_urlparse
+from ..compat import (
+compat_str,
+compat_urlparse,
+)
+from ..utils import (
+NO_DEFAULT,
+try_get,
+)
class Tele5IE(InfoExtractor):
@ -44,14 +54,49 @@ class Tele5IE(InfoExtractor):
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
video_id = (qs.get('vid') or qs.get('ve_id') or [None])[0]
-if not video_id:
+NEXX_ID_RE = r'\d{6,}'
+JWPLATFORM_ID_RE = r'[a-zA-Z0-9]{8}'
+def nexx_result(nexx_id):
+return self.url_result(
+'https://api.nexx.cloud/v3/759/videos/byid/%s' % nexx_id,
+ie=NexxIE.ie_key(), video_id=nexx_id)
+nexx_id = jwplatform_id = None
+if video_id:
+if re.match(NEXX_ID_RE, video_id):
+return nexx_result(video_id)
+elif re.match(JWPLATFORM_ID_RE, video_id):
+jwplatform_id = video_id
+if not nexx_id:
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
-video_id = self._html_search_regex(
-(r'id\s*=\s*["\']video-player["\'][^>]+data-id\s*=\s*["\'](\d+)',
-r'\s+id\s*=\s*["\']player_(\d{6,})',
-r'\bdata-id\s*=\s*["\'](\d{6,})'), webpage, 'video id')
+def extract_id(pattern, name, default=NO_DEFAULT):
+return self._html_search_regex(
+(r'id\s*=\s*["\']video-player["\'][^>]+data-id\s*=\s*["\'](%s)' % pattern,
+r'\s+id\s*=\s*["\']player_(%s)' % pattern,
+r'\bdata-id\s*=\s*["\'](%s)' % pattern), webpage, name,
+default=default)
+nexx_id = extract_id(NEXX_ID_RE, 'nexx id', default=None)
+if nexx_id:
+return nexx_result(nexx_id)
+if not jwplatform_id:
+jwplatform_id = extract_id(JWPLATFORM_ID_RE, 'jwplatform id')
+media = self._download_json(
+'https://cdn.jwplayer.com/v2/media/' + jwplatform_id,
+display_id)
+nexx_id = try_get(
+media, lambda x: x['playlist'][0]['nexx_id'], compat_str)
+if nexx_id:
+return nexx_result(nexx_id)
return self.url_result(
-'https://api.nexx.cloud/v3/759/videos/byid/%s' % video_id,
-ie=NexxIE.ie_key(), video_id=video_id)
+'jwplatform:%s' % jwplatform_id, ie=JWPlatformIE.ie_key(),
+video_id=jwplatform_id)

View File

@ -11,6 +11,7 @@ from ..utils import (
determine_ext,
int_or_none,
str_or_none,
+try_get,
urljoin,
)
@ -24,7 +25,7 @@ class TelecincoIE(InfoExtractor):
'info_dict': {
'id': '1876350223',
'title': 'Bacalao con kokotxas al pil-pil',
-'description': 'md5:1382dacd32dd4592d478cbdca458e5bb',
+'description': 'md5:716caf5601e25c3c5ab6605b1ae71529',
},
'playlist': [{
'md5': 'adb28c37238b675dad0f042292f209a7',
@ -55,6 +56,26 @@ class TelecincoIE(InfoExtractor):
'description': 'md5:2771356ff7bfad9179c5f5cd954f1477',
'duration': 50,
},
+}, {
+# video in opening's content
+'url': 'https://www.telecinco.es/vivalavida/fiorella-sobrina-edmundo-arrocet-entrevista_18_2907195140.html',
+'info_dict': {
+'id': '2907195140',
+'title': 'La surrealista entrevista a la sobrina de Edmundo Arrocet: "No puedes venir aquí y tomarnos por tontos"',
+'description': 'md5:73f340a7320143d37ab895375b2bf13a',
+},
+'playlist': [{
+'md5': 'adb28c37238b675dad0f042292f209a7',
+'info_dict': {
+'id': 'TpI2EttSDAReWpJ1o0NVh2',
+'ext': 'mp4',
+'title': 'La surrealista entrevista a la sobrina de Edmundo Arrocet: "No puedes venir aquí y tomarnos por tontos"',
+'duration': 1015,
+},
+}],
+'params': {
+'skip_download': True,
+},
}, {
'url': 'http://www.telecinco.es/informativos/nacional/Pablo_Iglesias-Informativos_Telecinco-entrevista-Pedro_Piqueras_2_1945155182.html',
'only_matching': True,
@ -135,17 +156,28 @@ class TelecincoIE(InfoExtractor):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
article = self._parse_json(self._search_regex(
-r'window\.\$REACTBASE_STATE\.article\s*=\s*({.+})',
+r'window\.\$REACTBASE_STATE\.article(?:_multisite)?\s*=\s*({.+})',
webpage, 'article'), display_id)['article']
title = article.get('title')
-description = clean_html(article.get('leadParagraph'))
+description = clean_html(article.get('leadParagraph')) or ''
if article.get('editorialType') != 'VID':
entries = []
-for p in article.get('body', []):
-content = p.get('content')
-if p.get('type') != 'video' or not content:
+body = [article.get('opening')]
+body.extend(try_get(article, lambda x: x['body'], list) or [])
+for p in body:
+if not isinstance(p, dict):
continue
-entries.append(self._parse_content(content, url))
+content = p.get('content')
+if not content:
+continue
+type_ = p.get('type')
+if type_ == 'paragraph':
+content_str = str_or_none(content)
+if content_str:
+description += content_str
+continue
+if type_ == 'video' and isinstance(content, dict):
+entries.append(self._parse_content(content, url))
return self.playlist_result(
entries, str_or_none(article.get('id')), title, description)
content = article['opening']['content']

View File

@ -38,8 +38,6 @@ class TeleQuebecIE(TeleQuebecBaseIE):
'ext': 'mp4',
'title': 'Un petit choc et puis repart!',
'description': 'md5:b04a7e6b3f74e32d7b294cffe8658374',
-'upload_date': '20180222',
-'timestamp': 1519326631,
},
'params': {
'skip_download': True,

View File

@ -17,14 +17,12 @@ class TFOIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tfo\.org/(?:en|fr)/(?:[^/]+/){2}(?P<id>\d+)'
_TEST = {
'url': 'http://www.tfo.org/en/universe/tfo-247/100463871/video-game-hackathon',
-'md5': '47c987d0515561114cf03d1226a9d4c7',
+'md5': 'cafbe4f47a8dae0ca0159937878100d6',
'info_dict': {
-'id': '100463871',
+'id': '7da3d50e495c406b8fc0b997659cc075',
'ext': 'mp4',
'title': 'Video Game Hackathon',
'description': 'md5:558afeba217c6c8d96c60e5421795c07',
-'upload_date': '20160212',
-'timestamp': 1455310233,
}
} }

Some files were not shown because too many files have changed in this diff.