mirror of https://codeberg.org/polarisfm/youtube-dl synced 2024-11-29 19:47:54 +01:00

Merge Update

This commit is contained in:
SpeakerEnder 2019-10-03 20:02:14 -04:00
commit eb7707754f
93 changed files with 3703 additions and 3140 deletions

View File

@@ -18,7 +18,7 @@ title: ''
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.06.27. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.09.12.1. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
 - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -26,7 +26,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 - [ ] I'm reporting a broken site support
-- [ ] I've verified that I'm running youtube-dl version **2019.06.27**
+- [ ] I've verified that I'm running youtube-dl version **2019.09.12.1**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar issues including closed ones
@@ -41,7 +41,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2019.06.27
+[debug] youtube-dl version 2019.09.12.1
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

View File

@@ -19,7 +19,7 @@ labels: 'site-support-request'
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.06.27. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.09.12.1. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
 - Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 - [ ] I'm reporting a new site support request
-- [ ] I've verified that I'm running youtube-dl version **2019.06.27**
+- [ ] I've verified that I'm running youtube-dl version **2019.09.12.1**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that none of provided URLs violate any copyrights
 - [ ] I've searched the bugtracker for similar site support requests including closed ones

View File

@@ -18,13 +18,13 @@ title: ''
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.06.27. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.09.12.1. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
 - Finally, put x into all relevant boxes (like this [x])
 -->
 - [ ] I'm reporting a site feature request
-- [ ] I've verified that I'm running youtube-dl version **2019.06.27**
+- [ ] I've verified that I'm running youtube-dl version **2019.09.12.1**
 - [ ] I've searched the bugtracker for similar site feature requests including closed ones

View File

@@ -18,7 +18,7 @@ title: ''
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.06.27. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.09.12.1. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
 - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 - [ ] I'm reporting a broken site support issue
-- [ ] I've verified that I'm running youtube-dl version **2019.06.27**
+- [ ] I've verified that I'm running youtube-dl version **2019.09.12.1**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -43,7 +43,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2019.06.27
+[debug] youtube-dl version 2019.09.12.1
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

View File

@@ -19,13 +19,13 @@ labels: 'request'
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.06.27. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.09.12.1. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
 - Finally, put x into all relevant boxes (like this [x])
 -->
 - [ ] I'm reporting a feature request
-- [ ] I've verified that I'm running youtube-dl version **2019.06.27**
+- [ ] I've verified that I'm running youtube-dl version **2019.09.12.1**
 - [ ] I've searched the bugtracker for similar feature requests including closed ones

View File

@@ -339,6 +339,72 @@ Incorrect:
 'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
 ```
### Inline values
Extracting variables is acceptable for reducing code duplication and improving readability of complex expressions. However, you should avoid extracting variables used only once and moving them to opposite parts of the extractor file, which makes reading the linear flow difficult.
#### Example
Correct:
```python
title = self._html_search_regex(r'<title>([^<]+)</title>', webpage, 'title')
```
Incorrect:
```python
TITLE_RE = r'<title>([^<]+)</title>'
# ...some lines of code...
title = self._html_search_regex(TITLE_RE, webpage, 'title')
```
### Collapse fallbacks
Multiple fallback values can quickly become unwieldy. Collapse multiple fallback values into a single expression via a list of patterns.
#### Example
Good:
```python
description = self._html_search_meta(
['og:description', 'description', 'twitter:description'],
webpage, 'description', default=None)
```
Unwieldy:
```python
description = (
self._og_search_description(webpage, default=None)
or self._html_search_meta('description', webpage, default=None)
or self._html_search_meta('twitter:description', webpage, default=None))
```
Methods supporting list of patterns are: `_search_regex`, `_html_search_regex`, `_og_search_property`, `_html_search_meta`.
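A minimal sketch of the list-of-patterns idea, assuming a hypothetical helper `search_first` that is not part of youtube-dl and only imitates the fallback behaviour described above: each pattern is tried in order and the first match wins. The sample `webpage` string is made up for illustration.

```python
import re


def search_first(patterns, text, default=None):
    """Return group 1 of the first matching pattern, else `default`.

    Hypothetical stand-in for illustration; not youtube-dl's implementation.
    """
    if isinstance(patterns, str):
        patterns = [patterns]  # accept a single pattern as well as a list
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return match.group(1)
    return default


# Made-up page: only the second pattern matches.
webpage = '<meta name="description" content="A test clip">'
description = search_first(
    [r'<meta property="og:description" content="([^"]+)"',
     r'<meta name="description" content="([^"]+)"'],
    webpage, default=None)
print(description)
```

Keeping the fallbacks in one list keeps their order of preference visible in a single place.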
### Trailing parentheses
Always move trailing parentheses after the last argument.
#### Example
Correct:
```python
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list)
```
Incorrect:
```python
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list,
)
```
 ### Use convenience conversion and parsing functions
 Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
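The motivation for such wrappers can be sketched as follows. The two functions below are simplified, hypothetical stand-ins written for this example (the real `int_or_none` and `float_or_none` live in `youtube_dl/utils.py` and accept more parameters), and `video_data` is made-up input.

```python
def int_or_none_sketch(value, default=None):
    """Convert `value` to int, or return `default` if that is impossible."""
    if value is None or value == '':
        return default
    try:
        return int(value)
    except (ValueError, TypeError):
        return default


def float_or_none_sketch(value, default=None):
    """Convert `value` to float, or return `default` if that is impossible."""
    if value is None or value == '':
        return default
    try:
        return float(value)
    except (ValueError, TypeError):
        return default


# Made-up metadata as a site might return it: strings, missing values, junk.
video_data = {'view_count': '1024', 'duration': '12.5', 'like_count': None, 'width': 'n/a'}

info = {
    'view_count': int_or_none_sketch(video_data.get('view_count')),
    'duration': float_or_none_sketch(video_data.get('duration')),
    'like_count': int_or_none_sketch(video_data.get('like_count')),
    'width': int_or_none_sketch(video_data.get('width')),
}
print(info)
```

A missing or malformed optional field then yields `None` instead of aborting the whole extraction.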

ChangeLog
View File

@@ -1,3 +1,163 @@
version 2019.09.12.1
Extractors
* [youtube] Remove quality and tbr for itag 43 (#22372)
version 2019.09.12
Extractors
* [youtube] Quick extraction tempfix (#22367, #22163)
version 2019.09.01
Core
+ [extractor/generic] Add support for squarespace embeds (#21294, #21802,
#21859)
+ [downloader/external] Respect mtime option for aria2c (#22242)
Extractors
+ [xhamster:user] Add support for user pages (#16330, #18454)
+ [xhamster] Add support for more domains
+ [verystream] Add support for woof.tube (#22217)
+ [dailymotion] Add support for lequipe.fr (#21328, #22152)
+ [openload] Add support for oload.vip (#22205)
+ [bbccouk] Extend URL regular expression (#19200)
+ [youtube] Add support for invidious.nixnet.xyz and yt.elukerio.org (#22223)
* [safari] Fix authentication (#22161, #22184)
* [usanetwork] Fix extraction (#22105)
+ [einthusan] Add support for einthusan.ca (#22171)
* [youtube] Improve unavailable message extraction (#22117)
+ [piksel] Extract subtitles (#20506)
version 2019.08.13
Core
* [downloader/fragment] Fix ETA calculation of resumed download (#21992)
* [YoutubeDL] Check annotations availability (#18582)
Extractors
* [youtube:playlist] Improve flat extraction (#21927)
* [youtube] Fix annotations extraction (#22045)
+ [discovery] Extract series meta field (#21808)
* [youtube] Improve error detection (#16445)
* [vimeo] Fix album extraction (#1933, #15704, #15855, #18967, #21986)
+ [roosterteeth] Add support for watch URLs
* [discovery] Limit video data by show slug (#21980)
version 2019.08.02
Extractors
+ [tvigle] Add support for HLS and DASH formats (#21967)
* [tvigle] Fix extraction (#21967)
+ [yandexvideo] Add support for DASH formats (#21971)
* [discovery] Use API call for video data extraction (#21808)
+ [mgtv] Extract format_note (#21881)
* [tvn24] Fix metadata extraction (#21833, #21834)
* [dlive] Relax URL regular expression (#21909)
+ [openload] Add support for oload.best (#21913)
* [youtube] Improve metadata extraction for age gate content (#21943)
version 2019.07.30
Extractors
* [youtube] Fix and improve title and description extraction (#21934)
version 2019.07.27
Extractors
+ [yahoo:japannews] Add support for yahoo.co.jp (#21698, #21265)
+ [discovery] Add support go.discovery.com URLs
* [youtube:playlist] Relax video regular expression (#21844)
* [generic] Restrict --default-search schemeless URLs detection pattern
(#21842)
* [vrv] Fix CMS signing query extraction (#21809)
version 2019.07.16
Extractors
+ [asiancrush] Add support for yuyutv.com, midnightpulp.com and cocoro.tv
(#21281, #21290)
* [kaltura] Check source format URL (#21290)
* [ctsnews] Fix YouTube embeds extraction (#21678)
+ [einthusan] Add support for einthusan.com (#21748, #21775)
+ [youtube] Add support for invidious.mastodon.host (#21777)
+ [gfycat] Extend URL regular expression (#21779, #21780)
* [youtube] Restrict is_live extraction (#21782)
version 2019.07.14
Extractors
* [porn91] Fix extraction (#21312)
+ [yandexmusic] Extract track number and disk number (#21421)
+ [yandexmusic] Add support for multi disk albums (#21420, #21421)
* [lynda] Handle missing subtitles (#20490, #20513)
+ [youtube] Add more invidious instances to URL regular expression (#21694)
* [twitter] Improve uploader id extraction (#21705)
* [spankbang] Fix and improve metadata extraction
* [spankbang] Fix extraction (#21763, #21764)
+ [dlive] Add support for dlive.tv (#18080)
+ [livejournal] Add support for livejournal.com (#21526)
* [roosterteeth] Fix free episode extraction (#16094)
* [dbtv] Fix extraction
* [bellator] Fix extraction
- [rudo] Remove extractor (#18430, #18474)
* [facebook] Fallback to twitter:image meta for thumbnail extraction (#21224)
* [bleacherreport] Fix Bleacher Report CMS extraction
* [espn] Fix fivethirtyeight.com extraction
* [5tv] Relax video URL regular expression and support https URLs
* [youtube] Fix is_live extraction (#21734)
* [youtube] Fix authentication (#11270)
version 2019.07.12
Core
+ [adobepass] Add support for AT&T U-verse (mso ATT) (#13938, #21016)
Extractors
+ [mgtv] Pass Referer HTTP header for format URLs (#21726)
+ [beeg] Add support for api/v6 v2 URLs without t argument (#21701)
* [voxmedia:volume] Improve vox embed extraction (#16846)
* [funnyordie] Move extraction to VoxMedia extractor (#16846)
* [gameinformer] Fix extraction (#8895, #15363, #17206)
* [funk] Fix extraction (#17915)
* [packtpub] Relax lesson URL regular expression (#21695)
* [packtpub] Fix extraction (#21268)
* [philharmoniedeparis] Relax URL regular expression (#21672)
* [peertube] Detect embed URLs in generic extraction (#21666)
* [mixer:vod] Relax URL regular expression (#21657, #21658)
+ [lecturio] Add support id based URLs (#21630)
+ [go] Add site info for disneynow (#21613)
* [ted] Restrict info regular expression (#21631)
* [twitch:vod] Actualize m3u8 URL (#21538, #21607)
* [vzaar] Fix videos with empty title (#21606)
* [tvland] Fix extraction (#21384)
* [arte] Clean extractor (#15583, #21614)
version 2019.07.02
Core
+ [utils] Introduce random_user_agent and use as default User-Agent (#21546)
Extractors
+ [vevo] Add support for embed.vevo.com URLs (#21565)
+ [openload] Add support for oload.biz (#21574)
* [xiami] Update API base URL (#21575)
* [yourporn] Fix extraction (#21585)
+ [acast] Add support for URLs with episode id (#21444)
+ [dailymotion] Add support for DM.player embeds
* [soundcloud] Update client id
 version 2019.06.27
 Extractors

View File

@@ -1216,6 +1216,72 @@ Incorrect:
 'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
 ```
### Inline values
Extracting variables is acceptable for reducing code duplication and improving readability of complex expressions. However, you should avoid extracting variables used only once and moving them to opposite parts of the extractor file, which makes reading the linear flow difficult.
#### Example
Correct:
```python
title = self._html_search_regex(r'<title>([^<]+)</title>', webpage, 'title')
```
Incorrect:
```python
TITLE_RE = r'<title>([^<]+)</title>'
# ...some lines of code...
title = self._html_search_regex(TITLE_RE, webpage, 'title')
```
### Collapse fallbacks
Multiple fallback values can quickly become unwieldy. Collapse multiple fallback values into a single expression via a list of patterns.
#### Example
Good:
```python
description = self._html_search_meta(
['og:description', 'description', 'twitter:description'],
webpage, 'description', default=None)
```
Unwieldy:
```python
description = (
self._og_search_description(webpage, default=None)
or self._html_search_meta('description', webpage, default=None)
or self._html_search_meta('twitter:description', webpage, default=None))
```
Methods supporting list of patterns are: `_search_regex`, `_html_search_regex`, `_og_search_property`, `_html_search_meta`.
### Trailing parentheses
Always move trailing parentheses after the last argument.
#### Example
Correct:
```python
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list)
```
Incorrect:
```python
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list,
)
```
 ### Use convenience conversion and parsing functions
 Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.

View File

@@ -58,16 +58,8 @@
 - **ARD:mediathek**
 - **ARDBetaMediathek**
 - **Arkena**
-- **arte.tv**
 - **arte.tv:+7**
-- **arte.tv:cinema**
-- **arte.tv:concert**
-- **arte.tv:creative**
-- **arte.tv:ddc**
 - **arte.tv:embed**
-- **arte.tv:future**
-- **arte.tv:info**
-- **arte.tv:magazine**
 - **arte.tv:playlist**
 - **AsianCrush**
 - **AsianCrushPlaylist**
@@ -231,6 +223,8 @@
 - **DiscoveryNetworksDe**
 - **DiscoveryVR**
 - **Disney**
+- **dlive:stream**
+- **dlive:vod**
 - **Dotsub**
 - **DouyuShow**
 - **DouyuTV**: 斗鱼
@@ -313,9 +307,7 @@
 - **FrontendMastersCourse**
 - **FrontendMastersLesson**
 - **Funimation**
-- **FunkChannel**
-- **FunkMix**
-- **FunnyOrDie**
+- **Funk**
 - **Fusion**
 - **Fux**
 - **FXNetworks**
@@ -458,6 +450,7 @@
 - **linkedin:learning:course**
 - **LinuxAcademy**
 - **LiTV**
+- **LiveJournal**
 - **LiveLeak**
 - **LiveLeakEmbed**
 - **livestream**
@@ -764,7 +757,6 @@
 - **rtve.es:television**
 - **RTVNH**
 - **RTVS**
-- **Rudo**
 - **RUHD**
 - **rutube**: Rutube videos
 - **rutube:channel**: Rutube channels
@@ -896,7 +888,6 @@
 - **TF1**
 - **TFO**
 - **TheIntercept**
-- **theoperaplatform**
 - **ThePlatform**
 - **ThePlatformFeed**
 - **TheScene**
@@ -1109,6 +1100,7 @@
 - **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE, Vid ABC, VidBom, vidlo, RapidVideo.TV, FastVideo.me
 - **XHamster**
 - **XHamsterEmbed**
+- **XHamsterUser**
 - **xiami:album**: 虾米音乐 - 专辑
 - **xiami:artist**: 虾米音乐 - 歌手
 - **xiami:collection**: 虾米音乐 - 精选集
@@ -1126,6 +1118,7 @@
 - **Yahoo**: Yahoo screen and movies
 - **yahoo:gyao**
 - **yahoo:gyao:player**
+- **yahoo:japannews**: Yahoo! Japan News
 - **YandexDisk**
 - **yandexmusic:album**: Яндекс.Музыка - Альбом
 - **yandexmusic:playlist**: Яндекс.Музыка - Плейлист

View File

@@ -1783,6 +1783,8 @@ class YoutubeDL(object):
             annofn = replace_extension(filename, 'annotations.xml', info_dict.get('ext'))
             if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(annofn)):
                 self.to_screen('[info] Video annotations are already present')
+            elif not info_dict.get('annotations'):
+                self.report_warning('There are no annotations to write.')
             else:
                 try:
                     self.to_screen('[info] Writing video annotations to: ' + annofn)

View File

@@ -94,7 +94,7 @@ def _real_main(argv=None):
             if opts.verbose:
                 write_string('[debug] Batch file urls: ' + repr(batch_urls) + '\n')
         except IOError:
-            sys.exit('ERROR: batch file could not be read')
+            sys.exit('ERROR: batch file %s could not be read' % opts.batchfile)
     all_urls = batch_urls + [url.strip() for url in args] # batch_urls are already striped in read_batch_urls
     _enc = preferredencoding()
     all_urls = [url.decode(_enc, 'ignore') if isinstance(url, bytes) else url for url in all_urls]

View File

@@ -53,7 +53,7 @@ class DashSegmentsFD(FragmentFD):
                 except compat_urllib_error.HTTPError as err:
                     # YouTube may often return 404 HTTP error for a fragment causing the
                     # whole download to fail. However if the same fragment is immediately
-                    # retried with the same request data this usually succeeds (1-2 attemps
+                    # retried with the same request data this usually succeeds (1-2 attempts
                     # is usually enough) thus allowing to download the whole file successfully.
                     # To be future-proof we will retry all fragments that fail with any
                     # HTTP error.

View File

@@ -194,6 +194,7 @@ class Aria2cFD(ExternalFD):
         cmd += self._option('--interface', 'source_address')
         cmd += self._option('--all-proxy', 'proxy')
         cmd += self._bool_option('--check-certificate', 'nocheckcertificate', 'false', 'true', '=')
+        cmd += self._bool_option('--remote-time', 'updatetime', 'true', 'false', '=')
         cmd += ['--', info_dict['url']]
         return cmd
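To illustrate what the added `--remote-time` line is expected to produce, here is a sketch under the assumption that `_bool_option` maps a boolean parameter to `option<separator>value`. The function `bool_option_sketch` is a hypothetical stand-in written for this example, not the downloader's actual helper.

```python
def bool_option_sketch(params, command_option, param,
                       true_value='true', false_value='false', separator=None):
    """Map a boolean entry of `params` to command-line arguments.

    Hypothetical stand-in for illustration only.
    """
    value = params.get(param)
    if value is None:
        return []  # option not configured: emit nothing
    text = true_value if value else false_value
    if separator:
        return [command_option + separator + text]
    return [command_option, text]


# 'updatetime' set: ask aria2c to apply the remote file's timestamp.
args_on = bool_option_sketch(
    {'updatetime': True}, '--remote-time', 'updatetime', 'true', 'false', '=')
# 'nocheckcertificate' uses swapped values, so True becomes "false".
args_cert = bool_option_sketch(
    {'nocheckcertificate': True}, '--check-certificate', 'nocheckcertificate',
    'false', 'true', '=')
print(args_on, args_cert)
```

Note the swapped true/false values for `--check-certificate`: the parameter is a negation (`nocheckcertificate`), whereas `updatetime` maps directly.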

View File

@@ -190,12 +190,13 @@ class FragmentFD(FileDownloader):
         })
     def _start_frag_download(self, ctx):
+        resume_len = ctx['complete_frags_downloaded_bytes']
         total_frags = ctx['total_frags']
         # This dict stores the download progress, it's updated by the progress
         # hook
         state = {
             'status': 'downloading',
-            'downloaded_bytes': ctx['complete_frags_downloaded_bytes'],
+            'downloaded_bytes': resume_len,
             'fragment_index': ctx['fragment_index'],
             'fragment_count': total_frags,
             'filename': ctx['filename'],
@@ -234,8 +235,8 @@ class FragmentFD(FileDownloader):
                 state['downloaded_bytes'] += frag_downloaded_bytes - ctx['prev_frag_downloaded_bytes']
                 if not ctx['live']:
                     state['eta'] = self.calc_eta(
-                        start, time_now, estimated_size,
-                        state['downloaded_bytes'])
+                        start, time_now, estimated_size - resume_len,
+                        state['downloaded_bytes'] - resume_len)
                 state['speed'] = s.get('speed') or ctx.get('speed')
                 ctx['speed'] = state['speed']
                 ctx['prev_frag_downloaded_bytes'] = frag_downloaded_bytes
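Why subtracting `resume_len` matters can be shown with a small worked example. `calc_eta_sketch` below is a simplified stand-in (an assumption, not the downloader's own `calc_eta`) that estimates the remaining time from the average rate since `start`; all numbers are made up.

```python
def calc_eta_sketch(start, now, total, current):
    """Estimate remaining seconds from the average rate since `start`."""
    elapsed = now - start
    if total is None or current <= 0 or elapsed < 0.001:
        return None
    rate = current / elapsed  # bytes per second
    return int((total - current) / rate)


estimated_size = 100 * 1000 * 1000   # whole file: 100 MB
resume_len = 60 * 1000 * 1000        # already on disk before this run
downloaded = resume_len + 10 * 1000 * 1000  # 10 MB fetched in this run
start, time_now = 0.0, 10.0          # this run has lasted 10 seconds

# Old call: bytes from the previous run inflate the apparent rate.
eta_old = calc_eta_sketch(start, time_now, estimated_size, downloaded)
# New call: only bytes transferred in this run count towards the rate.
eta_new = calc_eta_sketch(
    start, time_now, estimated_size - resume_len, downloaded - resume_len)
print(eta_old, eta_new)
```

In both calls 30 MB remain, but only the second divides by the rate actually observed in the current session.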

View File

@@ -146,7 +146,7 @@ def write_piff_header(stream, params):
             sps, pps = codec_private_data.split(u32.pack(1))[1:]
             avcc_payload = u8.pack(1) # configuration version
             avcc_payload += sps[1:4] # avc profile indication + profile compatibility + avc level indication
-            avcc_payload += u8.pack(0xfc | (params.get('nal_unit_length_field', 4) - 1)) # complete represenation (1) + reserved (11111) + length size minus one
+            avcc_payload += u8.pack(0xfc | (params.get('nal_unit_length_field', 4) - 1)) # complete representation (1) + reserved (11111) + length size minus one
             avcc_payload += u8.pack(1) # reserved (0) + number of sps (0000001)
             avcc_payload += u16.pack(len(sps))
             avcc_payload += sps

View File

@@ -15,10 +15,13 @@ class AbcNewsVideoIE(AMPIE):
     IE_NAME = 'abcnews:video'
     _VALID_URL = r'''(?x)
                     https?://
-                        abcnews\.go\.com/
                         (?:
-                            [^/]+/video/(?P<display_id>[0-9a-z-]+)-|
-                            video/embed\?.*?\bid=
+                            abcnews\.go\.com/
+                            (?:
+                                [^/]+/video/(?P<display_id>[0-9a-z-]+)-|
+                                video/embed\?.*?\bid=
+                            )|
+                            fivethirtyeight\.abcnews\.go\.com/video/embed/\d+/
                         )
                         (?P<id>\d+)
                     '''
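The effect of the widened pattern can be checked in isolation. The sketch below copies the new expression from the hunk above into a standalone constant; the three URLs are made-up examples, not the extractor's test cases.

```python
import re

# New pattern from the hunk above, copied into a standalone constant.
VALID_URL = r'''(?x)
    https?://
    (?:
        abcnews\.go\.com/
        (?:
            [^/]+/video/(?P<display_id>[0-9a-z-]+)-|
            video/embed\?.*?\bid=
        )|
        fivethirtyeight\.abcnews\.go\.com/video/embed/\d+/
    )
    (?P<id>\d+)
'''

# Made-up URLs covering the three accepted shapes.
regular = re.match(VALID_URL, 'https://abcnews.go.com/Example/video/some-story-title-12345678')
embed = re.match(VALID_URL, 'https://abcnews.go.com/video/embed?id=12345678')
fte = re.match(VALID_URL, 'https://fivethirtyeight.abcnews.go.com/video/embed/111/12345678')

print(regular.group('display_id'), regular.group('id'))
print(embed.group('id'), fte.group('id'))
```

Only the regular video URL shape populates `display_id`; for the two embed shapes the group is `None`.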

View File

@@ -7,6 +7,7 @@ import functools
 from .common import InfoExtractor
 from ..compat import compat_str
 from ..utils import (
+    clean_html,
     float_or_none,
     int_or_none,
     try_get,
@@ -27,7 +28,7 @@ class ACastIE(InfoExtractor):
     '''
     _TESTS = [{
         'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
-        'md5': 'a02393c74f3bdb1801c3ec2695577ce0',
+        'md5': '16d936099ec5ca2d5869e3a813ee8dc4',
         'info_dict': {
             'id': '2a92b283-1a75-4ad8-8396-499c641de0d9',
             'ext': 'mp3',
@@ -46,28 +47,37 @@ class ACastIE(InfoExtractor):
     }, {
         'url': 'https://play.acast.com/s/rattegangspodden/s04e09-styckmordet-i-helenelund-del-22',
         'only_matching': True,
+    }, {
+        'url': 'https://play.acast.com/s/sparpodcast/2a92b283-1a75-4ad8-8396-499c641de0d9',
+        'only_matching': True,
     }]
     def _real_extract(self, url):
         channel, display_id = re.match(self._VALID_URL, url).groups()
         s = self._download_json(
-            'https://play-api.acast.com/stitch/%s/%s' % (channel, display_id),
-            display_id)['result']
+            'https://feeder.acast.com/api/v1/shows/%s/episodes/%s' % (channel, display_id),
+            display_id)
         media_url = s['url']
+        if re.search(r'[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}', display_id):
+            episode_url = s.get('episodeUrl')
+            if episode_url:
+                display_id = episode_url
+            else:
+                channel, display_id = re.match(self._VALID_URL, s['link']).groups()
         cast_data = self._download_json(
             'https://play-api.acast.com/splash/%s/%s' % (channel, display_id),
             display_id)['result']
         e = cast_data['episode']
-        title = e['name']
+        title = e.get('name') or s['title']
         return {
             'id': compat_str(e['id']),
             'display_id': display_id,
             'url': media_url,
             'title': title,
-            'description': e.get('description') or e.get('summary'),
+            'description': e.get('summary') or clean_html(e.get('description') or s.get('description')),
             'thumbnail': e.get('image'),
-            'timestamp': unified_timestamp(e.get('publishingDate')),
+            'timestamp': unified_timestamp(e.get('publishingDate') or s.get('publishDate')),
-            'duration': float_or_none(s.get('duration') or e.get('duration')),
+            'duration': float_or_none(e.get('duration') or s.get('duration')),
'filesize': int_or_none(e.get('contentLength')), 'filesize': int_or_none(e.get('contentLength')),
'creator': try_get(cast_data, lambda x: x['show']['author'], compat_str), 'creator': try_get(cast_data, lambda x: x['show']['author'], compat_str),
'series': try_get(cast_data, lambda x: x['show']['name'], compat_str), 'series': try_get(cast_data, lambda x: x['show']['name'], compat_str),
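
The new branch in `_real_extract` hinges on whether `display_id` contains a UUID. A minimal sketch of that check, using the same expression as the hunk; the function name `looks_like_uuid` is an assumption made for the example.

```python
import re

# Same expression as in the hunk: 8-4-4-4-12 lowercase hex groups.
UUID_RE = r'[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}'


def looks_like_uuid(display_id):
    """True when the display id contains a UUID (hypothetical helper)."""
    return re.search(UUID_RE, display_id) is not None


# Display ids taken from the test URLs in _TESTS.
uuid_style = looks_like_uuid('2a92b283-1a75-4ad8-8396-499c641de0d9')
slug_style = looks_like_uuid('2.raggarmordet-rosterurdetforflutna')
```

When the check is true, the extractor swaps the UUID for the episode slug before calling the splash endpoint; slug-style ids pass through unchanged.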

View File

@@ -25,6 +25,11 @@ MSO_INFO = {
         'username_field': 'username',
         'password_field': 'password',
     },
+    'ATT': {
+        'name': 'AT&T U-verse',
+        'username_field': 'userid',
+        'password_field': 'password',
+    },
     'ATTOTT': {
         'name': 'DIRECTV NOW',
         'username_field': 'userid',

View File

@ -4,17 +4,10 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import compat_str
compat_parse_qs,
compat_str,
compat_urllib_parse_urlparse,
)
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
find_xpath_attr,
get_element_by_attribute,
int_or_none, int_or_none,
NO_DEFAULT,
qualities, qualities,
try_get, try_get,
unified_strdate, unified_strdate,
@ -25,59 +18,7 @@ from ..utils import (
# add tests. # add tests.
class ArteTvIE(InfoExtractor):
_VALID_URL = r'https?://videos\.arte\.tv/(?P<lang>fr|de|en|es)/.*-(?P<id>.*?)\.html'
IE_NAME = 'arte.tv'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
lang = mobj.group('lang')
video_id = mobj.group('id')
ref_xml_url = url.replace('/videos/', '/do_delegate/videos/')
ref_xml_url = ref_xml_url.replace('.html', ',view,asPlayerXml.xml')
ref_xml_doc = self._download_xml(
ref_xml_url, video_id, note='Downloading metadata')
config_node = find_xpath_attr(ref_xml_doc, './/video', 'lang', lang)
config_xml_url = config_node.attrib['ref']
config = self._download_xml(
config_xml_url, video_id, note='Downloading configuration')
formats = [{
'format_id': q.attrib['quality'],
# The playpath starts at 'mp4:', if we don't manually
# split the url, rtmpdump will incorrectly parse them
'url': q.text.split('mp4:', 1)[0],
'play_path': 'mp4:' + q.text.split('mp4:', 1)[1],
'ext': 'flv',
'quality': 2 if q.attrib['quality'] == 'hd' else 1,
} for q in config.findall('./urls/url')]
self._sort_formats(formats)
title = config.find('.//name').text
thumbnail = config.find('.//firstThumbnailUrl').text
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'formats': formats,
}
class ArteTVBaseIE(InfoExtractor): class ArteTVBaseIE(InfoExtractor):
@classmethod
def _extract_url_info(cls, url):
mobj = re.match(cls._VALID_URL, url)
lang = mobj.group('lang')
query = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
if 'vid' in query:
video_id = query['vid'][0]
else:
# This is not a real id, it can be for example AJT for the news
# http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
video_id = mobj.group('id')
return video_id, lang
def _extract_from_json_url(self, json_url, video_id, lang, title=None): def _extract_from_json_url(self, json_url, video_id, lang, title=None):
info = self._download_json(json_url, video_id) info = self._download_json(json_url, video_id)
player_info = info['videoJsonPlayer'] player_info = info['videoJsonPlayer']
@ -108,13 +49,15 @@ class ArteTVBaseIE(InfoExtractor):
'upload_date': unified_strdate(upload_date_str), 'upload_date': unified_strdate(upload_date_str),
'thumbnail': player_info.get('programImage') or player_info.get('VTU', {}).get('IUR'), 'thumbnail': player_info.get('programImage') or player_info.get('VTU', {}).get('IUR'),
} }
qfunc = qualities(['HQ', 'MQ', 'EQ', 'SQ']) qfunc = qualities(['MQ', 'HQ', 'EQ', 'SQ'])
LANGS = { LANGS = {
'fr': 'F', 'fr': 'F',
'de': 'A', 'de': 'A',
'en': 'E[ANG]', 'en': 'E[ANG]',
'es': 'E[ESP]', 'es': 'E[ESP]',
'it': 'E[ITA]',
'pl': 'E[POL]',
} }
langcode = LANGS.get(lang, lang) langcode = LANGS.get(lang, lang)
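
The reordering of the quality list matters because the rank comes from list position. The sketch below re-implements `qualities` standalone, assuming it behaves as the youtube-dl helper does as far as I know (index in the list, `-1` for unknown ids); treat it as an illustration rather than the library code.

```python
def qualities(quality_ids):
    """Return a function mapping a quality id to its rank (higher is better).

    Standalone stand-in for the helper imported from ..utils (assumed behaviour).
    """
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q


old_rank = qualities(['HQ', 'MQ', 'EQ', 'SQ'])
new_rank = qualities(['MQ', 'HQ', 'EQ', 'SQ'])

# Language table after the change, including the two new entries.
LANGS = {
    'fr': 'F',
    'de': 'A',
    'en': 'E[ANG]',
    'es': 'E[ESP]',
    'it': 'E[ITA]',
    'pl': 'E[POL]',
}

italian = LANGS.get('it', 'it')
unknown = LANGS.get('pt', 'pt')  # unmapped codes fall back to themselves
```

After the swap `HQ` outranks `MQ`, while `EQ` and `SQ` keep their positions.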
@ -126,8 +69,8 @@ class ArteTVBaseIE(InfoExtractor):
l = re.escape(langcode) l = re.escape(langcode)
# Language preference from most to least priority # Language preference from most to least priority
# Reference: section 5.6.3 of # Reference: section 6.8 of
# http://www.arte.tv/sites/en/corporate/files/complete-technical-guidelines-arte-geie-v1-05.pdf # https://www.arte.tv/sites/en/corporate/files/complete-technical-guidelines-arte-geie-v1-07-1.pdf
PREFERENCES = ( PREFERENCES = (
# original version in requested language, without subtitles # original version in requested language, without subtitles
r'VO{0}$'.format(l), r'VO{0}$'.format(l),
@ -193,274 +136,59 @@ class ArteTVBaseIE(InfoExtractor):
class ArteTVPlus7IE(ArteTVBaseIE): class ArteTVPlus7IE(ArteTVBaseIE):
IE_NAME = 'arte.tv:+7' IE_NAME = 'arte.tv:+7'
_VALID_URL = r'https?://(?:(?:www|sites)\.)?arte\.tv/(?:[^/]+/)?(?P<lang>fr|de|en|es)/(?:videos/)?(?:[^/]+/)*(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>fr|de|en|es|it|pl)/videos/(?P<id>\d{6}-\d{3}-[AF])'
_TESTS = [{ _TESTS = [{
'url': 'http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D', 'url': 'https://www.arte.tv/en/videos/088501-000-A/mexico-stealing-petrol-to-survive/',
'only_matching': True, 'info_dict': {
}, { 'id': '088501-000-A',
'url': 'http://sites.arte.tv/karambolage/de/video/karambolage-22', 'ext': 'mp4',
'only_matching': True, 'title': 'Mexico: Stealing Petrol to Survive',
}, { 'upload_date': '20190628',
'url': 'http://www.arte.tv/de/videos/048696-000-A/der-kluge-bauch-unser-zweites-gehirn', },
'only_matching': True,
}] }]
@classmethod
def suitable(cls, url):
return False if ArteTVPlaylistIE.suitable(url) else super(ArteTVPlus7IE, cls).suitable(url)
def _real_extract(self, url): def _real_extract(self, url):
video_id, lang = self._extract_url_info(url) lang, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, video_id) return self._extract_from_json_url(
return self._extract_from_webpage(webpage, video_id, lang) 'https://api.arte.tv/api/player/v1/config/%s/%s' % (lang, video_id),
video_id, lang)
def _extract_from_webpage(self, webpage, video_id, lang):
patterns_templates = (r'arte_vp_url=["\'](.*?%s.*?)["\']', r'data-url=["\']([^"]+%s[^"]+)["\']')
ids = (video_id, '')
# some pages contain multiple videos (like
# http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D),
# so we first try to look for json URLs that contain the video id from
# the 'vid' parameter.
patterns = [t % re.escape(_id) for _id in ids for t in patterns_templates]
json_url = self._html_search_regex(
patterns, webpage, 'json vp url', default=None)
if not json_url:
def find_iframe_url(webpage, default=NO_DEFAULT):
return self._html_search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>.+\bjson_url=.+?)\1',
webpage, 'iframe url', group='url', default=default)
iframe_url = find_iframe_url(webpage, None)
if not iframe_url:
embed_url = self._html_search_regex(
r'arte_vp_url_oembed=\'([^\']+?)\'', webpage, 'embed url', default=None)
if embed_url:
player = self._download_json(
embed_url, video_id, 'Downloading player page')
iframe_url = find_iframe_url(player['html'])
# en and es URLs produce react-based pages with different layout (e.g.
# http://www.arte.tv/guide/en/053330-002-A/carnival-italy?zone=world)
if not iframe_url:
program = self._search_regex(
r'program\s*:\s*({.+?["\']embed_html["\'].+?}),?\s*\n',
webpage, 'program', default=None)
if program:
embed_html = self._parse_json(program, video_id)
if embed_html:
iframe_url = find_iframe_url(embed_html['embed_html'])
if iframe_url:
json_url = compat_parse_qs(
compat_urllib_parse_urlparse(iframe_url).query)['json_url'][0]
if json_url:
title = self._search_regex(
r'<h3[^>]+title=(["\'])(?P<title>.+?)\1',
webpage, 'title', default=None, group='title')
return self._extract_from_json_url(json_url, video_id, lang, title=title)
# Different kind of embed URL (e.g.
# http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium)
entries = [
self.url_result(url)
for _, url in re.findall(r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1', webpage)]
return self.playlist_result(entries)
# It also uses the arte_vp_url url from the webpage to extract the information
class ArteTVCreativeIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:creative'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://creative.arte.tv/fr/episode/osmosis-episode-1',
'info_dict': {
'id': '057405-001-A',
'ext': 'mp4',
'title': 'OSMOSIS - N\'AYEZ PLUS PEUR D\'AIMER (1)',
'upload_date': '20150716',
},
}, {
'url': 'http://creative.arte.tv/fr/Monty-Python-Reunion',
'playlist_count': 11,
'add_ie': ['Youtube'],
}, {
'url': 'http://creative.arte.tv/de/episode/agentur-amateur-4-der-erste-kunde',
'only_matching': True,
}]
class ArteTVInfoIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:info'
_VALID_URL = r'https?://info\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://info.arte.tv/fr/service-civique-un-cache-misere',
'info_dict': {
'id': '067528-000-A',
'ext': 'mp4',
'title': 'Service civique, un cache misère ?',
'upload_date': '20160403',
},
}]
class ArteTVFutureIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:future'
_VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://future.arte.tv/fr/info-sciences/les-ecrevisses-aussi-sont-anxieuses',
'info_dict': {
'id': '050940-028-A',
'ext': 'mp4',
'title': 'Les écrevisses aussi peuvent être anxieuses',
'upload_date': '20140902',
},
}, {
'url': 'http://future.arte.tv/fr/la-science-est-elle-responsable',
'only_matching': True,
}]
class ArteTVDDCIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:ddc'
_VALID_URL = r'https?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>[^/?#&]+)'
_TESTS = []
def _real_extract(self, url):
video_id, lang = self._extract_url_info(url)
if lang == 'folge':
lang = 'de'
elif lang == 'emission':
lang = 'fr'
webpage = self._download_webpage(url, video_id)
scriptElement = get_element_by_attribute('class', 'visu_video_block', webpage)
script_url = self._html_search_regex(r'src="(.*?)"', scriptElement, 'script url')
javascriptPlayerGenerator = self._download_webpage(script_url, video_id, 'Download javascript player generator')
json_url = self._search_regex(r"json_url=(.*)&rendering_place.*", javascriptPlayerGenerator, 'json url')
return self._extract_from_json_url(json_url, video_id, lang)
class ArteTVConcertIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:concert'
_VALID_URL = r'https?://concert\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://concert.arte.tv/de/notwist-im-pariser-konzertclub-divan-du-monde',
'md5': '9ea035b7bd69696b67aa2ccaaa218161',
'info_dict': {
'id': '186',
'ext': 'mp4',
'title': 'The Notwist im Pariser Konzertclub "Divan du Monde"',
'upload_date': '20140128',
'description': 'md5:486eb08f991552ade77439fe6d82c305',
},
}]
class ArteTVCinemaIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:cinema'
_VALID_URL = r'https?://cinema\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>.+)'
_TESTS = [{
'url': 'http://cinema.arte.tv/fr/article/les-ailes-du-desir-de-julia-reck',
'md5': 'a5b9dd5575a11d93daf0e3f404f45438',
'info_dict': {
'id': '062494-000-A',
'ext': 'mp4',
'title': 'Film lauréat du concours web - "Les ailes du désir" de Julia Reck',
'upload_date': '20150807',
},
}]
class ArteTVMagazineIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:magazine'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/magazine/[^/]+/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TESTS = [{
# Embedded via <iframe src="http://www.arte.tv/arte_vp/index.php?json_url=..."
'url': 'http://www.arte.tv/magazine/trepalium/fr/entretien-avec-le-realisateur-vincent-lannoo-trepalium',
'md5': '2a9369bcccf847d1c741e51416299f25',
'info_dict': {
'id': '065965-000-A',
'ext': 'mp4',
'title': 'Trepalium - Extrait Ep.01',
'upload_date': '20160121',
},
}, {
# Embedded via <iframe src="http://www.arte.tv/guide/fr/embed/054813-004-A/medium"
'url': 'http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium',
'md5': 'fedc64fc7a946110fe311634e79782ca',
'info_dict': {
'id': '054813-004_PLUS7-F',
'ext': 'mp4',
'title': 'Trepalium (4/6)',
'description': 'md5:10057003c34d54e95350be4f9b05cb40',
'upload_date': '20160218',
},
}, {
'url': 'http://www.arte.tv/magazine/metropolis/de/frank-woeste-german-paris-metropolis',
'only_matching': True,
}]
class ArteTVEmbedIE(ArteTVPlus7IE): class ArteTVEmbedIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:embed' IE_NAME = 'arte.tv:embed'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
http://www\.arte\.tv https://www\.arte\.tv
/(?:playerv2/embed|arte_vp/index)\.php\?json_url= /player/v3/index\.php\?json_url=
(?P<json_url> (?P<json_url>
http://arte\.tv/papi/tvguide/videos/stream/player/ https?://api\.arte\.tv/api/player/v1/config/
(?P<lang>[^/]+)/(?P<id>[^/]+)[^&]* (?P<lang>[^/]+)/(?P<id>\d{6}-\d{3}-[AF])
) )
''' '''
_TESTS = [] _TESTS = []
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) json_url, lang, video_id = re.match(self._VALID_URL, url).groups()
video_id = mobj.group('id')
lang = mobj.group('lang')
json_url = mobj.group('json_url')
return self._extract_from_json_url(json_url, video_id, lang) return self._extract_from_json_url(json_url, video_id, lang)
class TheOperaPlatformIE(ArteTVPlus7IE):
IE_NAME = 'theoperaplatform'
_VALID_URL = r'https?://(?:www\.)?theoperaplatform\.eu/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.theoperaplatform.eu/de/opera/verdi-otello',
'md5': '970655901fa2e82e04c00b955e9afe7b',
'info_dict': {
'id': '060338-009-A',
'ext': 'mp4',
'title': 'Verdi - OTELLO',
'upload_date': '20160927',
},
}]
class ArteTVPlaylistIE(ArteTVBaseIE): class ArteTVPlaylistIE(ArteTVBaseIE):
IE_NAME = 'arte.tv:playlist' IE_NAME = 'arte.tv:playlist'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/[^#]*#collection/(?P<id>PL-\d+)' _VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>fr|de|en|es|it|pl)/videos/(?P<id>RC-\d{6})'
_TESTS = [{ _TESTS = [{
'url': 'http://www.arte.tv/guide/de/plus7/?country=DE#collection/PL-013263/ARTETV', 'url': 'https://www.arte.tv/en/videos/RC-016954/earn-a-living/',
'info_dict': { 'info_dict': {
'id': 'PL-013263', 'id': 'RC-016954',
'title': 'Areva & Uramin', 'title': 'Earn a Living',
'description': 'md5:a1dc0312ce357c262259139cfd48c9bf', 'description': 'md5:d322c55011514b3a7241f7fb80d494c2',
}, },
'playlist_mincount': 6, 'playlist_mincount': 6,
}, {
'url': 'http://www.arte.tv/guide/de/playlists?country=DE#collection/PL-013190/ARTETV',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
playlist_id, lang = self._extract_url_info(url) lang, playlist_id = re.match(self._VALID_URL, url).groups()
collection = self._download_json( collection = self._download_json(
'https://api.arte.tv/api/player/v1/collectionData/%s/%s?source=videos' 'https://api.arte.tv/api/player/v1/collectionData/%s/%s?source=videos'
% (lang, playlist_id), playlist_id) % (lang, playlist_id), playlist_id)
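
After this change both Arte extractors derive the API endpoint straight from the page URL. A minimal sketch of that mapping, reusing the two new `_VALID_URL` patterns and the endpoint templates from the hunks; the `classify` helper is hypothetical.

```python
import re

VIDEO_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>fr|de|en|es|it|pl)/videos/(?P<id>\d{6}-\d{3}-[AF])'
PLAYLIST_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>fr|de|en|es|it|pl)/videos/(?P<id>RC-\d{6})'

# (kind, URL pattern, API endpoint template), in the order they are tried
RULES = (
    ('video', VIDEO_URL, 'https://api.arte.tv/api/player/v1/config/%s/%s'),
    ('playlist', PLAYLIST_URL,
     'https://api.arte.tv/api/player/v1/collectionData/%s/%s?source=videos'),
)


def classify(url):
    """Return (kind, api_url) for a supported page URL, else None (hypothetical helper)."""
    for kind, pattern, template in RULES:
        mobj = re.match(pattern, url)
        if mobj:
            lang, item_id = mobj.groups()
            return kind, template % (lang, item_id)
    return None


video = classify('https://www.arte.tv/en/videos/088501-000-A/mexico-stealing-petrol-to-survive/')
playlist = classify('https://www.arte.tv/en/videos/RC-016954/earn-a-living/')
legacy = classify('http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D')
```

Because the video id must look like `\d{6}-\d{3}-[AF]` and the playlist id like `RC-\d{6}`, the two patterns cannot both match the same URL, which is presumably why the old `suitable()` override could be dropped. Legacy `/guide/` URLs no longer match either pattern.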

View File

@ -5,14 +5,12 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from .kaltura import KalturaIE from .kaltura import KalturaIE
from ..utils import ( from ..utils import extract_attributes
extract_attributes,
remove_end,
)
class AsianCrushIE(InfoExtractor): class AsianCrushIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?asiancrush\.com/video/(?:[^/]+/)?0+(?P<id>\d+)v\b' _VALID_URL_BASE = r'https?://(?:www\.)?(?P<host>(?:(?:asiancrush|yuyutv|midnightpulp)\.com|cocoro\.tv))'
_VALID_URL = r'%s/video/(?:[^/]+/)?0+(?P<id>\d+)v\b' % _VALID_URL_BASE
_TESTS = [{ _TESTS = [{
'url': 'https://www.asiancrush.com/video/012869v/women-who-flirt/', 'url': 'https://www.asiancrush.com/video/012869v/women-who-flirt/',
'md5': 'c3b740e48d0ba002a42c0b72857beae6', 'md5': 'c3b740e48d0ba002a42c0b72857beae6',
@ -20,7 +18,7 @@ class AsianCrushIE(InfoExtractor):
'id': '1_y4tmjm5r', 'id': '1_y4tmjm5r',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Women Who Flirt', 'title': 'Women Who Flirt',
'description': 'md5:3db14e9186197857e7063522cb89a805', 'description': 'md5:7e986615808bcfb11756eb503a751487',
'timestamp': 1496936429, 'timestamp': 1496936429,
'upload_date': '20170608', 'upload_date': '20170608',
'uploader_id': 'craig@crifkin.com', 'uploader_id': 'craig@crifkin.com',
@ -28,10 +26,27 @@ class AsianCrushIE(InfoExtractor):
}, { }, {
'url': 'https://www.asiancrush.com/video/she-was-pretty/011886v-pretty-episode-3/', 'url': 'https://www.asiancrush.com/video/she-was-pretty/011886v-pretty-episode-3/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.yuyutv.com/video/013886v/the-act-of-killing/',
'only_matching': True,
}, {
'url': 'https://www.yuyutv.com/video/peep-show/013922v-warring-factions/',
'only_matching': True,
}, {
'url': 'https://www.midnightpulp.com/video/010400v/drifters/',
'only_matching': True,
}, {
'url': 'https://www.midnightpulp.com/video/mononoke/016378v-zashikiwarashi-part-1/',
'only_matching': True,
}, {
'url': 'https://www.cocoro.tv/video/the-wonderful-wizard-of-oz/008878v-the-wonderful-wizard-of-oz-ep01/',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
@ -51,7 +66,7 @@ class AsianCrushIE(InfoExtractor):
r'\bentry_id["\']\s*:\s*["\'](\d+)', webpage, 'entry id') r'\bentry_id["\']\s*:\s*["\'](\d+)', webpage, 'entry id')
player = self._download_webpage( player = self._download_webpage(
'https://api.asiancrush.com/embeddedVideoPlayer', video_id, 'https://api.%s/embeddedVideoPlayer' % host, video_id,
query={'id': entry_id}) query={'id': entry_id})
kaltura_id = self._search_regex( kaltura_id = self._search_regex(
@ -63,15 +78,23 @@ class AsianCrushIE(InfoExtractor):
r'/p(?:artner_id)?/(\d+)', player, 'partner id', r'/p(?:artner_id)?/(\d+)', player, 'partner id',
default='513551') default='513551')
return self.url_result( description = self._html_search_regex(
'kaltura:%s:%s' % (partner_id, kaltura_id), r'(?s)<div[^>]+\bclass=["\']description["\'][^>]*>(.+?)</div>',
ie=KalturaIE.ie_key(), video_id=kaltura_id, webpage, 'description', fatal=False)
video_title=title)
return {
'_type': 'url_transparent',
'url': 'kaltura:%s:%s' % (partner_id, kaltura_id),
'ie_key': KalturaIE.ie_key(),
'id': video_id,
'title': title,
'description': description,
}
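
The shared `_VALID_URL_BASE` captures the host so that the player API call can target the matching sister site. A small sketch of that step, using the patterns and test URLs from the hunk; the helper name `player_api_url` is an assumption.

```python
import re

VALID_URL_BASE = r'https?://(?:www\.)?(?P<host>(?:(?:asiancrush|yuyutv|midnightpulp)\.com|cocoro\.tv))'
VALID_URL = r'%s/video/(?:[^/]+/)?0+(?P<id>\d+)v\b' % VALID_URL_BASE


def player_api_url(url):
    """Return (video_id, embedded player endpoint) for a video page URL.

    Hypothetical helper; returns None when the URL is not supported.
    """
    mobj = re.match(VALID_URL, url)
    if not mobj:
        return None
    host = mobj.group('host')
    return mobj.group('id'), 'https://api.%s/embeddedVideoPlayer' % host


yuyu = player_api_url('https://www.yuyutv.com/video/013886v/the-act-of-killing/')
cocoro = player_api_url(
    'https://www.cocoro.tv/video/the-wonderful-wizard-of-oz/008878v-the-wonderful-wizard-of-oz-ep01/')
```

Leading zeros are consumed by `0+` before the `id` group, so `013886v` yields `13886`.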
class AsianCrushPlaylistIE(InfoExtractor): class AsianCrushPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?asiancrush\.com/series/0+(?P<id>\d+)s\b' _VALID_URL = r'%s/series/0+(?P<id>\d+)s\b' % AsianCrushIE._VALID_URL_BASE
_TEST = { _TESTS = [{
'url': 'https://www.asiancrush.com/series/012481s/scholar-walks-night/', 'url': 'https://www.asiancrush.com/series/012481s/scholar-walks-night/',
'info_dict': { 'info_dict': {
'id': '12481', 'id': '12481',
@ -79,7 +102,16 @@ class AsianCrushPlaylistIE(InfoExtractor):
'description': 'md5:7addd7c5132a09fd4741152d96cce886', 'description': 'md5:7addd7c5132a09fd4741152d96cce886',
}, },
'playlist_count': 20, 'playlist_count': 20,
} }, {
'url': 'https://www.yuyutv.com/series/013920s/peep-show/',
'only_matching': True,
}, {
'url': 'https://www.midnightpulp.com/series/016375s/mononoke/',
'only_matching': True,
}, {
'url': 'https://www.cocoro.tv/series/008549s/the-wonderful-wizard-of-oz/',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
playlist_id = self._match_id(url) playlist_id = self._match_id(url)
@ -96,15 +128,15 @@ class AsianCrushPlaylistIE(InfoExtractor):
entries.append(self.url_result( entries.append(self.url_result(
mobj.group('url'), ie=AsianCrushIE.ie_key())) mobj.group('url'), ie=AsianCrushIE.ie_key()))
title = remove_end( title = self._html_search_regex(
self._html_search_regex( r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage, 'title', default=None) or self._og_search_title(
'title', default=None) or self._og_search_title( webpage, default=None) or self._html_search_meta(
webpage, default=None) or self._html_search_meta( 'twitter:title', webpage, 'title',
'twitter:title', webpage, 'title', default=None) or self._search_regex(
default=None) or self._search_regex( r'<title>([^<]+)</title>', webpage, 'title', fatal=False)
r'<title>([^<]+)</title>', webpage, 'title', fatal=False), if title:
' | AsianCrush') title = re.sub(r'\s*\|\s*.+?$', '', title)
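
Replacing `remove_end(..., ' | AsianCrush')` with a regular expression makes the suffix stripping site-agnostic. A minimal sketch of the new behaviour; the wrapper `strip_site_suffix` and the sample titles are made up for illustration.

```python
import re


def strip_site_suffix(title):
    """Drop everything from the first ' | ' onwards (hypothetical wrapper)."""
    if title:
        title = re.sub(r'\s*\|\s*.+?$', '', title)
    return title


# Made-up titles, for illustration only.
a = strip_site_suffix('Scholar Who Walks the Night | AsianCrush')
b = strip_site_suffix('Peep Show | YuyuTV')
c = strip_site_suffix('Title Without Suffix')
d = strip_site_suffix(None)
```

Note that the match starts at the first pipe, so a title that legitimately contains `|` would be cut there as well.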
description = self._og_search_description( description = self._og_search_description(
webpage, default=None) or self._html_search_meta( webpage, default=None) or self._html_search_meta(

View File

@ -40,6 +40,7 @@ class BBCCoUkIE(InfoExtractor):
iplayer(?:/[^/]+)?/(?:episode/|playlist/)| iplayer(?:/[^/]+)?/(?:episode/|playlist/)|
music/(?:clips|audiovideo/popular)[/#]| music/(?:clips|audiovideo/popular)[/#]|
radio/player/| radio/player/|
sounds/play/|
events/[^/]+/play/[^/]+/ events/[^/]+/play/[^/]+/
) )
(?P<id>%s)(?!/(?:episodes|broadcasts|clips)) (?P<id>%s)(?!/(?:episodes|broadcasts|clips))
@ -70,7 +71,7 @@ class BBCCoUkIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': 'b039d07m', 'id': 'b039d07m',
'ext': 'flv', 'ext': 'flv',
'title': 'Leonard Cohen, Kaleidoscope - BBC Radio 4', 'title': 'Kaleidoscope, Leonard Cohen',
'description': 'The Canadian poet and songwriter reflects on his musical career.', 'description': 'The Canadian poet and songwriter reflects on his musical career.',
}, },
'params': { 'params': {
@ -220,6 +221,20 @@ class BBCCoUkIE(InfoExtractor):
# rtmp download # rtmp download
'skip_download': True, 'skip_download': True,
}, },
}, {
'url': 'https://www.bbc.co.uk/sounds/play/m0007jzb',
'note': 'Audio',
'info_dict': {
'id': 'm0007jz9',
'ext': 'mp4',
'title': 'BBC Proms, 2019, Prom 34: WestEastern Divan Orchestra',
'description': "Live BBC Proms. WestEastern Divan Orchestra with Daniel Barenboim and Martha Argerich.",
'duration': 9840,
},
'params': {
# rtmp download
'skip_download': True,
}
}, { }, {
'url': 'http://www.bbc.co.uk/iplayer/playlist/p01dvks4', 'url': 'http://www.bbc.co.uk/iplayer/playlist/p01dvks4',
'only_matching': True, 'only_matching': True,
@ -609,7 +624,7 @@ class BBCIE(BBCCoUkIE):
'url': 'http://www.bbc.com/news/world-europe-32668511', 'url': 'http://www.bbc.com/news/world-europe-32668511',
'info_dict': { 'info_dict': {
'id': 'world-europe-32668511', 'id': 'world-europe-32668511',
'title': 'Russia stages massive WW2 parade despite Western boycott', 'title': 'Russia stages massive WW2 parade',
'description': 'md5:00ff61976f6081841f759a08bf78cc9c', 'description': 'md5:00ff61976f6081841f759a08bf78cc9c',
}, },
'playlist_count': 2, 'playlist_count': 2,

View File

@@ -99,7 +99,7 @@ class BeamProLiveIE(BeamProBaseIE):
 class BeamProVodIE(BeamProBaseIE):
     IE_NAME = 'Mixer:vod'
-    _VALID_URL = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/[^/?#&]+\?.*?\bvod=(?P<id>\w+)'
+    _VALID_URL = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/[^/?#&]+\?.*?\bvod=(?P<id>[^?#&]+)'
     _TESTS = [{
         'url': 'https://mixer.com/willow8714?vod=2259830',
         'md5': 'b2431e6e8347dc92ebafb565d368b76b',
@@ -122,6 +122,9 @@ class BeamProVodIE(BeamProBaseIE):
     }, {
         'url': 'https://mixer.com/streamer?vod=IxFno1rqC0S_XJ1a2yGgNw',
         'only_matching': True,
+    }, {
+        'url': 'https://mixer.com/streamer?vod=Rh3LY0VAqkGpEQUe2pN-ig',
+        'only_matching': True,
     }]
     @staticmethod
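
The reason for widening the `id` group shows up with the newly added test URL, whose VOD id contains a hyphen. A quick comparison of the old and new patterns; the helper `vod_id` is hypothetical.

```python
import re

OLD = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/[^/?#&]+\?.*?\bvod=(?P<id>\w+)'
NEW = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/[^/?#&]+\?.*?\bvod=(?P<id>[^?#&]+)'


def vod_id(pattern, url):
    """Return the captured VOD id, or None if the URL does not match."""
    mobj = re.match(pattern, url)
    return mobj.group('id') if mobj else None


url = 'https://mixer.com/streamer?vod=Rh3LY0VAqkGpEQUe2pN-ig'
old_id = vod_id(OLD, url)  # \w+ stops at the hyphen
new_id = vod_id(NEW, url)  # [^?#&]+ runs to the end of the parameter
```

Ids made only of word characters, such as `2259830`, are captured identically by both patterns.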

View File

@@ -32,6 +32,10 @@ class BeegIE(InfoExtractor):
         # api/v6 v2
         'url': 'https://beeg.com/1941093077?t=911-1391',
         'only_matching': True,
+    }, {
+        # api/v6 v2 w/o t
+        'url': 'https://beeg.com/1277207756',
+        'only_matching': True,
     }, {
         'url': 'https://beeg.porn/video/5416503',
         'only_matching': True,
@@ -49,14 +53,17 @@ class BeegIE(InfoExtractor):
             r'beeg_version\s*=\s*([\da-zA-Z_-]+)', webpage, 'beeg version',
             default='1546225636701')
-        qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
-        t = qs.get('t', [''])[0].split('-')
-        if len(t) > 1:
+        if len(video_id) >= 10:
             query = {
                 'v': 2,
-                's': t[0],
-                'e': t[1],
             }
+            qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
+            t = qs.get('t', [''])[0].split('-')
+            if len(t) > 1:
+                query.update({
+                    's': t[0],
+                    'e': t[1],
+                })
         else:
             query = {'v': 1}
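
The revised logic picks the API variant from the length of the video id and treats the `t` range as optional. A standalone sketch of the same decision, using Python 3's `urllib.parse` in place of `compat_urlparse`; the function name `build_query` is an assumption.

```python
from urllib.parse import parse_qs, urlparse


def build_query(video_id, url):
    """Mirror the new branch: v2 for ids of 10+ characters, v1 otherwise."""
    if len(video_id) >= 10:
        query = {'v': 2}
        qs = parse_qs(urlparse(url).query)
        t = qs.get('t', [''])[0].split('-')
        if len(t) > 1:
            # start and end offsets are only sent when the URL carries ?t=start-end
            query.update({'s': t[0], 'e': t[1]})
    else:
        query = {'v': 1}
    return query


with_range = build_query('1941093077', 'https://beeg.com/1941093077?t=911-1391')
without_range = build_query('1277207756', 'https://beeg.com/1277207756')
short_id = build_query('5416503', 'https://beeg.porn/video/5416503')
```

Before the change, a long id without `t` fell through to `{'v': 1}`, which is the case the new `w/o t` test URL covers.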

View File

@ -15,6 +15,7 @@ from ..utils import (
float_or_none, float_or_none,
parse_iso8601, parse_iso8601,
smuggle_url, smuggle_url,
str_or_none,
strip_jsonp, strip_jsonp,
unified_timestamp, unified_timestamp,
unsmuggle_url, unsmuggle_url,
@ -306,3 +307,115 @@ class BiliBiliBangumiIE(InfoExtractor):
return self.playlist_result( return self.playlist_result(
entries, bangumi_id, entries, bangumi_id,
season_info.get('bangumi_title'), season_info.get('evaluate')) season_info.get('bangumi_title'), season_info.get('evaluate'))
class BilibiliAudioBaseIE(InfoExtractor):
def _call_api(self, path, sid, query=None):
if not query:
query = {'sid': sid}
return self._download_json(
'https://www.bilibili.com/audio/music-service-c/web/' + path,
sid, query=query)['data']
class BilibiliAudioIE(BilibiliAudioBaseIE):
_VALID_URL = r'https?://(?:www\.)?bilibili\.com/audio/au(?P<id>\d+)'
_TEST = {
'url': 'https://www.bilibili.com/audio/au1003142',
'md5': 'fec4987014ec94ef9e666d4d158ad03b',
'info_dict': {
'id': '1003142',
'ext': 'm4a',
'title': '【tsukimi】YELLOW / 神山羊',
'artist': 'tsukimi',
'comment_count': int,
'description': 'YELLOW的mp3版',
'duration': 183,
'subtitles': {
'origin': [{
'ext': 'lrc',
}],
},
'thumbnail': r're:^https?://.+\.jpg',
'timestamp': 1564836614,
'upload_date': '20190803',
'uploader': 'tsukimi-つきみぐー',
'view_count': int,
},
}
def _real_extract(self, url):
au_id = self._match_id(url)
play_data = self._call_api('url', au_id)
formats = [{
'url': play_data['cdns'][0],
'filesize': int_or_none(play_data.get('size')),
}]
song = self._call_api('song/info', au_id)
title = song['title']
statistic = song.get('statistic') or {}
subtitles = None
lyric = song.get('lyric')
if lyric:
subtitles = {
'origin': [{
'url': lyric,
}]
}
return {
'id': au_id,
'title': title,
'formats': formats,
'artist': song.get('author'),
'comment_count': int_or_none(statistic.get('comment')),
'description': song.get('intro'),
'duration': int_or_none(song.get('duration')),
'subtitles': subtitles,
'thumbnail': song.get('cover'),
'timestamp': int_or_none(song.get('passtime')),
'uploader': song.get('uname'),
'view_count': int_or_none(statistic.get('play')),
}
class BilibiliAudioAlbumIE(BilibiliAudioBaseIE):
_VALID_URL = r'https?://(?:www\.)?bilibili\.com/audio/am(?P<id>\d+)'
_TEST = {
'url': 'https://www.bilibili.com/audio/am10624',
'info_dict': {
'id': '10624',
'title': '每日新曲推荐每日11:00更新',
'description': '每天11:00更新为你推送最新音乐',
},
'playlist_count': 19,
}
def _real_extract(self, url):
am_id = self._match_id(url)
songs = self._call_api(
'song/of-menu', am_id, {'sid': am_id, 'pn': 1, 'ps': 100})['data']
entries = []
for song in songs:
sid = str_or_none(song.get('id'))
if not sid:
continue
entries.append(self.url_result(
'https://www.bilibili.com/audio/au' + sid,
BilibiliAudioIE.ie_key(), sid))
if entries:
album_data = self._call_api('menu/info', am_id) or {}
album_title = album_data.get('title')
if album_title:
for entry in entries:
entry['album'] = album_title
return self.playlist_result(
entries, am_id, album_title, album_data.get('intro'))
return self.playlist_result(entries, am_id)
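
The album extractor skips songs without an id and only tags entries with an album title when one is available. The sketch below reproduces that flow on plain dictionaries. The song data is invented, `str_or_none` is a simplified stand-in for the utility of the same name, and the entry dict only approximates what `url_result` returns (the `'BilibiliAudio'` key is an assumption about `BilibiliAudioIE.ie_key()`).

```python
def str_or_none(value):
    """Simplified stand-in: None stays None, anything else becomes a string."""
    return None if value is None else str(value)


def build_entries(songs):
    """Turn API song records into url-type entries, skipping records without an id."""
    entries = []
    for song in songs:
        sid = str_or_none(song.get('id'))
        if not sid:
            continue
        entries.append({
            '_type': 'url',
            'url': 'https://www.bilibili.com/audio/au' + sid,
            'ie_key': 'BilibiliAudio',  # assumed value of BilibiliAudioIE.ie_key()
            'id': sid,
        })
    return entries


def tag_album(entries, album_data):
    """Attach the album title to every entry when the menu info provides one."""
    album_title = (album_data or {}).get('title')
    if album_title:
        for entry in entries:
            entry['album'] = album_title
    return album_title


# Invented records, for illustration only.
songs = [{'id': 1003142}, {}, {'id': None}, {'id': 42}]
entries = build_entries(songs)
title = tag_album(entries, {'title': 'Sample album'})
```

Fetching `menu/info` only when `entries` is non-empty, as the hunk does, saves one request for empty albums.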

View File

@@ -6,7 +6,6 @@ from ..utils import (
     ExtractorError,
     remove_end,
 )
-from .rudo import RudoIE
 class BioBioChileTVIE(InfoExtractor):
@@ -41,11 +40,15 @@ class BioBioChileTVIE(InfoExtractor):
     }, {
         'url': 'http://www.biobiochile.cl/noticias/bbtv/comentarios-bio-bio/2016/07/08/edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos.shtml',
         'info_dict': {
-            'id': 'edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos',
+            'id': 'b4xd0LK3SK',
             'ext': 'mp4',
-            'uploader': '(none)',
-            'upload_date': '20160708',
-            'title': 'Edecanes del Congreso: Figuras decorativas que le cuestan muy caro a los chilenos',
+            # TODO: fix url_transparent information overriding
+            # 'uploader': 'Juan Pablo Echenique',
+            'title': 'Comentario Oscar Cáceres',
+        },
+        'params': {
+            # empty m3u8 manifest
+            'skip_download': True,
         },
     }, {
         'url': 'http://tv.biobiochile.cl/notas/2015/10/22/ninos-transexuales-de-quien-es-la-decision.shtml',
@@ -60,7 +63,9 @@ class BioBioChileTVIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)
-        rudo_url = RudoIE._extract_url(webpage)
+        rudo_url = self._search_regex(
+            r'<iframe[^>]+src=(?P<q1>[\'"])(?P<url>(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)',
+            webpage, 'embed URL', None, group='url')
         if not rudo_url:
             raise ExtractorError('No videos found')
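
With the `RudoIE` import gone, the embed URL is now found by an inline expression. A minimal sketch of that lookup with plain `re.search`; the `find_rudo_url` helper and the HTML snippets are made up for the example.

```python
import re

# Same expression as in the hunk; the named backreference enforces matching quotes.
RUDO_RE = r'<iframe[^>]+src=(?P<q1>[\'"])(?P<url>(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)'


def find_rudo_url(webpage):
    """Return the embedded rudo.video URL, or None when no such iframe exists."""
    mobj = re.search(RUDO_RE, webpage)
    return mobj.group('url') if mobj else None


# Made-up markup, for illustration only.
protocol_relative = find_rudo_url(
    '<iframe width="640" src="//rudo.video/vod/b4xd0LK3SK" allowfullscreen></iframe>')
single_quoted = find_rudo_url(
    "<iframe src='https://rudo.video/vod/abc123'></iframe>")
missing = find_rudo_url('<p>no player here</p>')
```

The `None` result corresponds to the `None` default passed to `_search_regex`, which lets the extractor raise its own `ExtractorError('No videos found')` instead of a generic regex failure.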
@ -68,7 +73,7 @@ class BioBioChileTVIE(InfoExtractor):
thumbnail = self._og_search_thumbnail(webpage) thumbnail = self._og_search_thumbnail(webpage)
uploader = self._html_search_regex( uploader = self._html_search_regex(
r'<a[^>]+href=["\']https?://(?:busca|www)\.biobiochile\.cl/(?:lista/)?(?:author|autor)[^>]+>(.+?)</a>', r'<a[^>]+href=["\'](?:https?://(?:busca|www)\.biobiochile\.cl)?/(?:lista/)?(?:author|autor)[^>]+>(.+?)</a>',
webpage, 'uploader', fatal=False) webpage, 'uploader', fatal=False)
return { return {
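
Note on the embed lookup in this hunk: the inline pattern takes over from `RudoIE._extract_url`. Below is a minimal standalone sketch of how that pattern behaves, using plain `re` rather than `self._search_regex`. The helper name `find_rudo_url` and the sample markup are illustrative assumptions, not part of the commit.

```python
import re

# Pattern copied from the new side of the hunk.
RUDO_EMBED_RE = (
    r'<iframe[^>]+src=(?P<q1>[\'"])'
    r'(?P<url>(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)')


def find_rudo_url(webpage):
    """Return the first rudo.video embed URL found, or None (hypothetical helper)."""
    mobj = re.search(RUDO_EMBED_RE, webpage)
    return mobj.group('url') if mobj else None


# Made-up markup, only to exercise the pattern.
sample = '<iframe width="100%" src="//rudo.video/vod/b4xd0LK3SK" allowfullscreen></iframe>'
print(find_rudo_url(sample))
```

Returning `None` on a miss mirrors the `None` default passed to `_search_regex`, which is what lets the following `if not rudo_url` check raise `ExtractorError`.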

@ -71,7 +71,7 @@ class BleacherReportIE(InfoExtractor):
video = article_data.get('video') video = article_data.get('video')
if video: if video:
video_type = video['type'] video_type = video['type']
if video_type == 'cms.bleacherreport.com': if video_type in ('cms.bleacherreport.com', 'vid.bleacherreport.com'):
info['url'] = 'http://bleacherreport.com/video_embed?id=%s' % video['id'] info['url'] = 'http://bleacherreport.com/video_embed?id=%s' % video['id']
elif video_type == 'ooyala.com': elif video_type == 'ooyala.com':
info['url'] = 'ooyala:%s' % video['id'] info['url'] = 'ooyala:%s' % video['id']
@ -87,9 +87,9 @@ class BleacherReportIE(InfoExtractor):
class BleacherReportCMSIE(AMPIE): class BleacherReportCMSIE(AMPIE):
_VALID_URL = r'https?://(?:www\.)?bleacherreport\.com/video_embed\?id=(?P<id>[0-9a-f-]{36})' _VALID_URL = r'https?://(?:www\.)?bleacherreport\.com/video_embed\?id=(?P<id>[0-9a-f-]{36}|\d{5})'
_TESTS = [{ _TESTS = [{
'url': 'http://bleacherreport.com/video_embed?id=8fd44c2f-3dc5-4821-9118-2c825a98c0e1', 'url': 'http://bleacherreport.com/video_embed?id=8fd44c2f-3dc5-4821-9118-2c825a98c0e1&library=video-cms',
'md5': '2e4b0a997f9228ffa31fada5c53d1ed1', 'md5': '2e4b0a997f9228ffa31fada5c53d1ed1',
'info_dict': { 'info_dict': {
'id': '8fd44c2f-3dc5-4821-9118-2c825a98c0e1', 'id': '8fd44c2f-3dc5-4821-9118-2c825a98c0e1',
@ -101,6 +101,6 @@ class BleacherReportCMSIE(AMPIE):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
info = self._extract_feed_info('http://cms.bleacherreport.com/media/items/%s/akamai.json' % video_id) info = self._extract_feed_info('http://vid.bleacherreport.com/videos/%s.akamai' % video_id)
info['id'] = video_id info['id'] = video_id
return info return info
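
Note on the widened `_VALID_URL`: the id group now accepts either a 36-character UUID-style id or a five-digit numeric id, and the feed is fetched from `vid.bleacherreport.com`. A small sketch, assuming plain `re.match` in place of `self._match_id`; `match_id` and `feed_url` are hypothetical helper names, and the five-digit id in the usage line is made up.

```python
import re

# Pattern and feed template copied from the new side of the hunk.
VALID_URL = r'https?://(?:www\.)?bleacherreport\.com/video_embed\?id=(?P<id>[0-9a-f-]{36}|\d{5})'
FEED_TEMPLATE = 'http://vid.bleacherreport.com/videos/%s.akamai'


def match_id(url):
    """Return the video id captured by VALID_URL, or None (hypothetical helper)."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


def feed_url(url):
    """Build the feed URL that would be requested for an embed URL."""
    video_id = match_id(url)
    return FEED_TEMPLATE % video_id if video_id else None


print(feed_url('http://bleacherreport.com/video_embed?id=12345'))
```

Because `re.match` only anchors at the start, trailing query parameters such as `&library=video-cms` in the updated test URL do not affect the captured id.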

@ -2,7 +2,6 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import base64 import base64
import json
import re import re
import struct import struct
@ -11,14 +10,12 @@ from .adobepass import AdobePassIE
from ..compat import ( from ..compat import (
compat_etree_fromstring, compat_etree_fromstring,
compat_parse_qs, compat_parse_qs,
compat_str,
compat_urllib_parse_urlparse, compat_urllib_parse_urlparse,
compat_urlparse, compat_urlparse,
compat_xml_parse_error, compat_xml_parse_error,
compat_HTTPError, compat_HTTPError,
) )
from ..utils import ( from ..utils import (
determine_ext,
ExtractorError, ExtractorError,
extract_attributes, extract_attributes,
find_xpath_attr, find_xpath_attr,
@ -27,18 +24,19 @@ from ..utils import (
js_to_json, js_to_json,
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
smuggle_url,
unescapeHTML, unescapeHTML,
unsmuggle_url, unsmuggle_url,
update_url_query, update_url_query,
clean_html, clean_html,
mimetype2ext, mimetype2ext,
UnsupportedError,
) )
class BrightcoveLegacyIE(InfoExtractor): class BrightcoveLegacyIE(InfoExtractor):
IE_NAME = 'brightcove:legacy' IE_NAME = 'brightcove:legacy'
_VALID_URL = r'(?:https?://.*brightcove\.com/(services|viewer).*?\?|brightcove:)(?P<query>.*)' _VALID_URL = r'(?:https?://.*brightcove\.com/(services|viewer).*?\?|brightcove:)(?P<query>.*)'
_FEDERATED_URL = 'http://c.brightcove.com/services/viewer/htmlFederated'
_TESTS = [ _TESTS = [
{ {
@ -55,7 +53,8 @@ class BrightcoveLegacyIE(InfoExtractor):
'timestamp': 1368213670, 'timestamp': 1368213670,
'upload_date': '20130510', 'upload_date': '20130510',
'uploader_id': '1589608506001', 'uploader_id': '1589608506001',
} },
'skip': 'The player has been deactivated by the content owner',
}, },
{ {
# From http://medianetwork.oracle.com/video/player/1785452137001 # From http://medianetwork.oracle.com/video/player/1785452137001
@ -70,6 +69,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'upload_date': '20120814', 'upload_date': '20120814',
'uploader_id': '1460825906', 'uploader_id': '1460825906',
}, },
'skip': 'video not playable',
}, },
{ {
# From http://mashable.com/2013/10/26/thermoelectric-bracelet-lets-you-control-your-body-temperature/ # From http://mashable.com/2013/10/26/thermoelectric-bracelet-lets-you-control-your-body-temperature/
@ -79,7 +79,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'This Bracelet Acts as a Personal Thermostat', 'title': 'This Bracelet Acts as a Personal Thermostat',
'description': 'md5:547b78c64f4112766ccf4e151c20b6a0', 'description': 'md5:547b78c64f4112766ccf4e151c20b6a0',
'uploader': 'Mashable', # 'uploader': 'Mashable',
'timestamp': 1382041798, 'timestamp': 1382041798,
'upload_date': '20131017', 'upload_date': '20131017',
'uploader_id': '1130468786001', 'uploader_id': '1130468786001',
@ -124,6 +124,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'id': '3550319591001', 'id': '3550319591001',
}, },
'playlist_mincount': 7, 'playlist_mincount': 7,
'skip': 'Unsupported URL',
}, },
{ {
# playlist with 'playlistTab' (https://github.com/ytdl-org/youtube-dl/issues/9965) # playlist with 'playlistTab' (https://github.com/ytdl-org/youtube-dl/issues/9965)
@ -133,6 +134,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'title': 'Lesson 08', 'title': 'Lesson 08',
}, },
'playlist_mincount': 10, 'playlist_mincount': 10,
'skip': 'Unsupported URL',
}, },
{ {
# playerID inferred from bcpid # playerID inferred from bcpid
@ -141,12 +143,6 @@ class BrightcoveLegacyIE(InfoExtractor):
'only_matching': True, # Tested in GenericIE 'only_matching': True, # Tested in GenericIE
} }
] ]
FLV_VCODECS = {
1: 'SORENSON',
2: 'ON2',
3: 'H264',
4: 'VP8',
}
@classmethod @classmethod
def _build_brighcove_url(cls, object_str): def _build_brighcove_url(cls, object_str):
@ -238,7 +234,8 @@ class BrightcoveLegacyIE(InfoExtractor):
@classmethod @classmethod
def _make_brightcove_url(cls, params): def _make_brightcove_url(cls, params):
return update_url_query(cls._FEDERATED_URL, params) return update_url_query(
'http://c.brightcove.com/services/viewer/htmlFederated', params)
@classmethod @classmethod
def _extract_brightcove_url(cls, webpage): def _extract_brightcove_url(cls, webpage):
@ -297,38 +294,12 @@ class BrightcoveLegacyIE(InfoExtractor):
videoPlayer = query.get('@videoPlayer') videoPlayer = query.get('@videoPlayer')
if videoPlayer: if videoPlayer:
# We set the original url as the default 'Referer' header # We set the original url as the default 'Referer' header
referer = smuggled_data.get('Referer', url) referer = query.get('linkBaseURL', [None])[0] or smuggled_data.get('Referer', url)
video_id = videoPlayer[0]
if 'playerID' not in query: if 'playerID' not in query:
mobj = re.search(r'/bcpid(\d+)', url) mobj = re.search(r'/bcpid(\d+)', url)
if mobj is not None: if mobj is not None:
query['playerID'] = [mobj.group(1)] query['playerID'] = [mobj.group(1)]
return self._get_video_info(
videoPlayer[0], query, referer=referer)
elif 'playerKey' in query:
player_key = query['playerKey']
return self._get_playlist_info(player_key[0])
else:
raise ExtractorError(
'Cannot find playerKey= variable. Did you forget quotes in a shell invocation?',
expected=True)
def _brightcove_new_url_result(self, publisher_id, video_id):
brightcove_new_url = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s' % (publisher_id, video_id)
return self.url_result(brightcove_new_url, BrightcoveNewIE.ie_key(), video_id)
def _get_video_info(self, video_id, query, referer=None):
headers = {}
linkBase = query.get('linkBaseURL')
if linkBase is not None:
referer = linkBase[0]
if referer is not None:
headers['Referer'] = referer
webpage = self._download_webpage(self._FEDERATED_URL, video_id, headers=headers, query=query)
error_msg = self._html_search_regex(
r"<h1>We're sorry.</h1>([\s\n]*<p>.*?</p>)+", webpage,
'error message', default=None)
if error_msg is not None:
publisher_id = query.get('publisherId') publisher_id = query.get('publisherId')
if publisher_id and publisher_id[0].isdigit(): if publisher_id and publisher_id[0].isdigit():
publisher_id = publisher_id[0] publisher_id = publisher_id[0]
@ -339,6 +310,9 @@ class BrightcoveLegacyIE(InfoExtractor):
else: else:
player_id = query.get('playerID') player_id = query.get('playerID')
if player_id and player_id[0].isdigit(): if player_id and player_id[0].isdigit():
headers = {}
if referer:
headers['Referer'] = referer
player_page = self._download_webpage( player_page = self._download_webpage(
'http://link.brightcove.com/services/player/bcpid' + player_id[0], 'http://link.brightcove.com/services/player/bcpid' + player_id[0],
video_id, headers=headers, fatal=False) video_id, headers=headers, fatal=False)
@ -349,136 +323,16 @@ class BrightcoveLegacyIE(InfoExtractor):
if player_key: if player_key:
enc_pub_id = player_key.split(',')[1].replace('~', '=') enc_pub_id = player_key.split(',')[1].replace('~', '=')
publisher_id = struct.unpack('>Q', base64.urlsafe_b64decode(enc_pub_id))[0] publisher_id = struct.unpack('>Q', base64.urlsafe_b64decode(enc_pub_id))[0]
if publisher_id: if publisher_id:
return self._brightcove_new_url_result(publisher_id, video_id) brightcove_new_url = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s' % (publisher_id, video_id)
raise ExtractorError( if referer:
'brightcove said: %s' % error_msg, expected=True) brightcove_new_url = smuggle_url(brightcove_new_url, {'referrer': referer})
return self.url_result(brightcove_new_url, BrightcoveNewIE.ie_key(), video_id)
self.report_extraction(video_id) # TODO: figure out if it's possible to extract playlistId from playerKey
info = self._search_regex(r'var experienceJSON = ({.*});', webpage, 'json') # elif 'playerKey' in query:
info = json.loads(info)['data'] # player_key = query['playerKey']
video_info = info['programmedContent']['videoPlayer']['mediaDTO'] # return self._get_playlist_info(player_key[0])
video_info['_youtubedl_adServerURL'] = info.get('adServerURL') raise UnsupportedError(url)
return self._extract_video_info(video_info)
def _get_playlist_info(self, player_key):
info_url = 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=%s' % player_key
playlist_info = self._download_webpage(
info_url, player_key, 'Downloading playlist information')
json_data = json.loads(playlist_info)
if 'videoList' in json_data:
playlist_info = json_data['videoList']
playlist_dto = playlist_info['mediaCollectionDTO']
elif 'playlistTabs' in json_data:
playlist_info = json_data['playlistTabs']
playlist_dto = playlist_info['lineupListDTO']['playlistDTOs'][0]
else:
raise ExtractorError('Empty playlist')
videos = [self._extract_video_info(video_info) for video_info in playlist_dto['videoDTOs']]
return self.playlist_result(videos, playlist_id='%s' % playlist_info['id'],
playlist_title=playlist_dto['displayName'])
def _extract_video_info(self, video_info):
video_id = compat_str(video_info['id'])
publisher_id = video_info.get('publisherId')
info = {
'id': video_id,
'title': video_info['displayName'].strip(),
'description': video_info.get('shortDescription'),
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
'uploader': video_info.get('publisherName'),
'uploader_id': compat_str(publisher_id) if publisher_id else None,
'duration': float_or_none(video_info.get('length'), 1000),
'timestamp': int_or_none(video_info.get('creationDate'), 1000),
}
renditions = video_info.get('renditions', []) + video_info.get('IOSRenditions', [])
if renditions:
formats = []
for rend in renditions:
url = rend['defaultURL']
if not url:
continue
ext = None
if rend['remote']:
url_comp = compat_urllib_parse_urlparse(url)
if url_comp.path.endswith('.m3u8'):
formats.extend(
self._extract_m3u8_formats(
url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
continue
elif 'akamaihd.net' in url_comp.netloc:
# This type of renditions are served through
# akamaihd.net, but they don't use f4m manifests
url = url.replace('control/', '') + '?&v=3.3.0&fp=13&r=FEEFJ&g=RTSJIMBMPFPB'
ext = 'flv'
if ext is None:
ext = determine_ext(url)
tbr = int_or_none(rend.get('encodingRate'), 1000)
a_format = {
'format_id': 'http%s' % ('-%s' % tbr if tbr else ''),
'url': url,
'ext': ext,
'filesize': int_or_none(rend.get('size')) or None,
'tbr': tbr,
}
if rend.get('audioOnly'):
a_format.update({
'vcodec': 'none',
})
else:
a_format.update({
'height': int_or_none(rend.get('frameHeight')),
'width': int_or_none(rend.get('frameWidth')),
'vcodec': rend.get('videoCodec'),
})
# m3u8 manifests with remote == false are media playlists
# Not calling _extract_m3u8_formats here to save network traffic
if ext == 'm3u8':
a_format.update({
'format_id': 'hls%s' % ('-%s' % tbr if tbr else ''),
'ext': 'mp4',
'protocol': 'm3u8_native',
})
formats.append(a_format)
self._sort_formats(formats)
info['formats'] = formats
elif video_info.get('FLVFullLengthURL') is not None:
info.update({
'url': video_info['FLVFullLengthURL'],
'vcodec': self.FLV_VCODECS.get(video_info.get('FLVFullCodec')),
'filesize': int_or_none(video_info.get('FLVFullSize')),
})
if self._downloader.params.get('include_ads', False):
adServerURL = video_info.get('_youtubedl_adServerURL')
if adServerURL:
ad_info = {
'_type': 'url',
'url': adServerURL,
}
if 'url' in info:
return {
'_type': 'playlist',
'title': info['title'],
'entries': [ad_info, info],
}
else:
return ad_info
if not info.get('url') and not info.get('formats'):
uploader_id = info.get('uploader_id')
if uploader_id:
info.update(self._brightcove_new_url_result(uploader_id, video_id))
else:
raise ExtractorError('Unable to extract video url for %s' % video_id)
return info
class BrightcoveNewIE(AdobePassIE): class BrightcoveNewIE(AdobePassIE):
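
Note on the `playerKey` fallback kept in `BrightcoveLegacyIE`: the publisher id is recovered from the second comma-separated field of the key, with `~` standing in for base64 `=` padding, and decoded as a big-endian unsigned 64-bit integer. The sketch below isolates that step. `make_player_key` and the `AQ~~` prefix are hypothetical, added only to build input for a round trip; real keys may carry different leading fields.

```python
import base64
import struct


def publisher_id_from_player_key(player_key):
    """Decode the publisher id the same way the extractor does."""
    # Second field holds the id; '~' replaces '=' padding in the key.
    enc_pub_id = player_key.split(',')[1].replace('~', '=')
    return struct.unpack('>Q', base64.urlsafe_b64decode(enc_pub_id))[0]


def make_player_key(publisher_id, prefix='AQ~~'):
    """Hypothetical inverse, used here only to construct a test key."""
    enc = base64.urlsafe_b64encode(struct.pack('>Q', publisher_id)).decode('ascii')
    return '%s,%s' % (prefix, enc.replace('=', '~'))


key = make_player_key(1027729757001)
print(publisher_id_from_player_key(key))
```

The decoded id is then substituted into the `players.brightcove.net/%s/default_default/index.html?videoId=%s` URL that is handed to `BrightcoveNewIE`.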

@ -220,7 +220,7 @@ class InfoExtractor(object):
* "preference" (optional, int) - quality of the image * "preference" (optional, int) - quality of the image
* "width" (optional, int) * "width" (optional, int)
* "height" (optional, int) * "height" (optional, int)
* "resolution" (optional, string "{width}x{height"}, * "resolution" (optional, string "{width}x{height}",
deprecated) deprecated)
* "filesize" (optional, int) * "filesize" (optional, int)
thumbnail: Full URL to a video thumbnail image. thumbnail: Full URL to a video thumbnail image.

@ -3,6 +3,7 @@ from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import unified_timestamp from ..utils import unified_timestamp
from .youtube import YoutubeIE
class CtsNewsIE(InfoExtractor): class CtsNewsIE(InfoExtractor):
@ -14,8 +15,8 @@ class CtsNewsIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '201501291578109', 'id': '201501291578109',
'ext': 'mp4', 'ext': 'mp4',
'title': '以色列.真主黨交火 3人死亡', 'title': '以色列.真主黨交火 3人死亡 - 華視新聞網',
'description': '以色列和黎巴嫩真主黨,爆發五年最嚴重衝突,雙方砲轟交火,兩名以軍死亡,還有一名西班牙籍的聯合國維和人...', 'description': '以色列和黎巴嫩真主黨,爆發五年最嚴重衝突,雙方砲轟交火,兩名以軍死亡,還有一名西班牙籍的聯合國維和人員也不幸罹難。大陸陝西、河南、安徽、江蘇和湖北五個省份出現大暴雪,嚴重影響陸空交通,不過九華山卻出現...',
'timestamp': 1422528540, 'timestamp': 1422528540,
'upload_date': '20150129', 'upload_date': '20150129',
} }
@ -26,7 +27,7 @@ class CtsNewsIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '201309031304098', 'id': '201309031304098',
'ext': 'mp4', 'ext': 'mp4',
'title': '韓國31歲童顏男 貌如十多歲小孩', 'title': '韓國31歲童顏男 貌如十多歲小孩 - 華視新聞網',
'description': '越有年紀的人越希望看起來年輕一點而南韓卻有一位31歲的男子看起來像是11、12歲的小孩身...', 'description': '越有年紀的人越希望看起來年輕一點而南韓卻有一位31歲的男子看起來像是11、12歲的小孩身...',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1378205880, 'timestamp': 1378205880,
@ -62,8 +63,7 @@ class CtsNewsIE(InfoExtractor):
video_url = mp4_feed['source_url'] video_url = mp4_feed['source_url']
else: else:
self.to_screen('Not CTSPlayer video, trying Youtube...') self.to_screen('Not CTSPlayer video, trying Youtube...')
youtube_url = self._search_regex( youtube_url = YoutubeIE._extract_url(page)
r'src="(//www\.youtube\.com/embed/[^"]+)"', page, 'youtube url')
return self.url_result(youtube_url, ie='Youtube') return self.url_result(youtube_url, ie='Youtube')

@ -48,7 +48,14 @@ class DailymotionBaseInfoExtractor(InfoExtractor):
class DailymotionIE(DailymotionBaseInfoExtractor): class DailymotionIE(DailymotionBaseInfoExtractor):
_VALID_URL = r'(?i)https?://(?:(www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(?:(?:embed|swf|#)/)?video|swf)/(?P<id>[^/?_]+)' _VALID_URL = r'''(?ix)
https?://
(?:
(?:(?:www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(?:(?:embed|swf|\#)/)?video|swf)|
(?:www\.)?lequipe\.fr/video
)
/(?P<id>[^/?_]+)
'''
IE_NAME = 'dailymotion' IE_NAME = 'dailymotion'
_FORMATS = [ _FORMATS = [
@ -133,14 +140,26 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
}, { }, {
'url': 'http://www.dailymotion.com/swf/x3ss1m_funny-magic-trick-barry-and-stuart_fun', 'url': 'http://www.dailymotion.com/swf/x3ss1m_funny-magic-trick-barry-and-stuart_fun',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.lequipe.fr/video/x791mem',
'only_matching': True,
}, {
'url': 'https://www.lequipe.fr/video/k7MtHciueyTcrFtFKA2',
'only_matching': True,
}] }]
@staticmethod @staticmethod
def _extract_urls(webpage): def _extract_urls(webpage):
urls = []
# Look for embedded Dailymotion player # Look for embedded Dailymotion player
matches = re.findall( # https://developer.dailymotion.com/player#player-parameters
r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage) for mobj in re.finditer(
return list(map(lambda m: unescapeHTML(m[1]), matches)) r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage):
urls.append(unescapeHTML(mobj.group('url')))
for mobj in re.finditer(
r'(?s)DM\.player\([^,]+,\s*{.*?video[\'"]?\s*:\s*["\']?(?P<id>[0-9a-zA-Z]+).+?}\s*\);', webpage):
urls.append('https://www.dailymotion.com/embed/video/' + mobj.group('id'))
return urls
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
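
Note on the reworked `_extract_urls`: it now makes two passes, one over `<embed>`/`<iframe>`/`<input>` markup and one over `DM.player(...)` script calls, which are normalized to embed URLs. A standalone sketch of the same logic follows. It assumes Python 3 and uses `html.unescape` as a stand-in for youtube-dl's `unescapeHTML`; the sample page is made up.

```python
import re
from html import unescape  # stand-in for youtube_dl.utils.unescapeHTML


def extract_dailymotion_urls(webpage):
    urls = []
    # Pass 1: embedded players referenced from markup attributes.
    for mobj in re.finditer(
            r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)'
            r'(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1',
            webpage):
        urls.append(unescape(mobj.group('url')))
    # Pass 2: players created through the JavaScript SDK.
    for mobj in re.finditer(
            r'(?s)DM\.player\([^,]+,\s*{.*?video[\'"]?\s*:\s*["\']?(?P<id>[0-9a-zA-Z]+).+?}\s*\);',
            webpage):
        urls.append('https://www.dailymotion.com/embed/video/' + mobj.group('id'))
    return urls


page = (
    '<iframe src="https://www.dailymotion.com/embed/video/x3ss1m?a=1&amp;b=2"></iframe>\n'
    '<script>DM.player(document.getElementById("player"), {video: "x791mem", width: "640"});</script>'
)
print(extract_dailymotion_urls(page))
```

Using the named `url` group instead of positional tuple indexing is what allows the second loop to be added without touching the first.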

@ -7,50 +7,51 @@ from .common import InfoExtractor
class DBTVIE(InfoExtractor): class DBTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dbtv\.no/(?:[^/]+/)?(?P<id>[0-9]+)(?:#(?P<display_id>.+))?' _VALID_URL = r'https?://(?:www\.)?dagbladet\.no/video/(?:(?:embed|(?P<display_id>[^/]+))/)?(?P<id>[0-9A-Za-z_-]{11}|[a-zA-Z0-9]{8})'
_TESTS = [{ _TESTS = [{
'url': 'http://dbtv.no/3649835190001#Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen', 'url': 'https://www.dagbladet.no/video/PynxJnNWChE/',
'md5': '2e24f67936517b143a234b4cadf792ec', 'md5': 'b8f850ba1860adbda668d367f9b77699',
'info_dict': { 'info_dict': {
'id': '3649835190001', 'id': 'PynxJnNWChE',
'display_id': 'Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Skulle teste ut fornøyelsespark, men kollegaen var bare opptatt av bikinikroppen', 'title': 'Skulle teste ut fornøyelsespark, men kollegaen var bare opptatt av bikinikroppen',
'description': 'md5:1504a54606c4dde3e4e61fc97aa857e0', 'description': 'md5:49cc8370e7d66e8a2ef15c3b4631fd3f',
'thumbnail': r're:https?://.*\.jpg', 'thumbnail': r're:https?://.*\.jpg',
'timestamp': 1404039863, 'upload_date': '20160916',
'upload_date': '20140629', 'duration': 69,
'duration': 69.544, 'uploader_id': 'UCk5pvsyZJoYJBd7_oFPTlRQ',
'uploader_id': '1027729757001', 'uploader': 'Dagbladet',
}, },
'add_ie': ['BrightcoveNew'] 'add_ie': ['Youtube']
}, { }, {
'url': 'http://dbtv.no/3649835190001', 'url': 'https://www.dagbladet.no/video/embed/xlGmyIeN9Jo/?autoplay=false',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'http://www.dbtv.no/lazyplayer/4631135248001', 'url': 'https://www.dagbladet.no/video/truer-iran-bor-passe-dere/PalfB2Cw',
'only_matching': True,
}, {
'url': 'http://dbtv.no/vice/5000634109001',
'only_matching': True,
}, {
'url': 'http://dbtv.no/filmtrailer/3359293614001',
'only_matching': True, 'only_matching': True,
}] }]
@staticmethod @staticmethod
def _extract_urls(webpage): def _extract_urls(webpage):
return [url for _, url in re.findall( return [url for _, url in re.findall(
r'<iframe[^>]+src=(["\'])((?:https?:)?//(?:www\.)?dbtv\.no/(?:lazy)?player/\d+.*?)\1', r'<iframe[^>]+src=(["\'])((?:https?:)?//(?:www\.)?dagbladet\.no/video/embed/(?:[0-9A-Za-z_-]{11}|[a-zA-Z0-9]{8}).*?)\1',
webpage)] webpage)]
def _real_extract(self, url): def _real_extract(self, url):
video_id, display_id = re.match(self._VALID_URL, url).groups() display_id, video_id = re.match(self._VALID_URL, url).groups()
info = {
return {
'_type': 'url_transparent', '_type': 'url_transparent',
'url': 'http://players.brightcove.net/1027729757001/default_default/index.html?videoId=%s' % video_id,
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
'ie_key': 'BrightcoveNew',
} }
if len(video_id) == 11:
info.update({
'url': video_id,
'ie_key': 'Youtube',
})
else:
info.update({
'url': 'jwplatform:' + video_id,
'ie_key': 'JWPlatform',
})
return info
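
Note on the new dispatch in `DBTVIE._real_extract`: the id length decides the downstream extractor, with 11-character ids passed to `Youtube` and 8-character ids passed to `JWPlatform` through a `jwplatform:` URL. A minimal sketch of that routing, assuming plain `re.match`; the function name `resolve` is hypothetical.

```python
import re

# Pattern copied from the new side of the hunk.
VALID_URL = (
    r'https?://(?:www\.)?dagbladet\.no/video/'
    r'(?:(?:embed|(?P<display_id>[^/]+))/)?'
    r'(?P<id>[0-9A-Za-z_-]{11}|[a-zA-Z0-9]{8})')


def resolve(url):
    """Return the url_transparent dict the extractor would build (hypothetical helper)."""
    display_id, video_id = re.match(VALID_URL, url).groups()
    info = {
        '_type': 'url_transparent',
        'id': video_id,
        'display_id': display_id,
    }
    if len(video_id) == 11:
        # 11-character ids are treated as YouTube video ids.
        info.update({'url': video_id, 'ie_key': 'Youtube'})
    else:
        # 8-character ids are treated as JW Platform media ids.
        info.update({'url': 'jwplatform:' + video_id, 'ie_key': 'JWPlatform'})
    return info


print(resolve('https://www.dagbladet.no/video/truer-iran-bor-passe-dere/PalfB2Cw'))
```

Note that `groups()` returns values in pattern order, so `display_id` comes first here; this matches the swapped unpacking order on the new side of the diff.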

@ -5,23 +5,17 @@ import re
import string import string
from .discoverygo import DiscoveryGoBaseIE from .discoverygo import DiscoveryGoBaseIE
from ..compat import ( from ..compat import compat_urllib_parse_unquote
compat_str, from ..utils import ExtractorError
compat_urllib_parse_unquote,
)
from ..utils import (
ExtractorError,
try_get,
)
from ..compat import compat_HTTPError from ..compat import compat_HTTPError
class DiscoveryIE(DiscoveryGoBaseIE): class DiscoveryIE(DiscoveryGoBaseIE):
_VALID_URL = r'''(?x)https?:// _VALID_URL = r'''(?x)https?://
(?P<site> (?P<site>
(?:(?:www|go)\.)?discovery|
(?:www\.)? (?:www\.)?
(?: (?:
discovery|
investigationdiscovery| investigationdiscovery|
discoverylife| discoverylife|
animalplanet| animalplanet|
@ -40,15 +34,15 @@ class DiscoveryIE(DiscoveryGoBaseIE):
cookingchanneltv| cookingchanneltv|
motortrend motortrend
) )
)\.com(?P<path>/tv-shows/[^/]+/(?:video|full-episode)s/(?P<id>[^./?#]+))''' )\.com/tv-shows/(?P<show_slug>[^/]+)/(?:video|full-episode)s/(?P<id>[^./?#]+)'''
_TESTS = [{ _TESTS = [{
'url': 'https://www.discovery.com/tv-shows/cash-cab/videos/dave-foley', 'url': 'https://go.discovery.com/tv-shows/cash-cab/videos/riding-with-matthew-perry',
'info_dict': { 'info_dict': {
'id': '5a2d9b4d6b66d17a5026e1fd', 'id': '5a2f35ce6b66d17a5026e29e',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Dave Foley', 'title': 'Riding with Matthew Perry',
'description': 'md5:4b39bcafccf9167ca42810eb5f28b01f', 'description': 'md5:a34333153e79bc4526019a5129e7f878',
'duration': 608, 'duration': 84,
}, },
'params': { 'params': {
'skip_download': True, # requires ffmpeg 'skip_download': True, # requires ffmpeg
@ -56,20 +50,20 @@ class DiscoveryIE(DiscoveryGoBaseIE):
}, { }, {
'url': 'https://www.investigationdiscovery.com/tv-shows/final-vision/full-episodes/final-vision', 'url': 'https://www.investigationdiscovery.com/tv-shows/final-vision/full-episodes/final-vision',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://go.discovery.com/tv-shows/alaskan-bush-people/videos/follow-your-own-road',
'only_matching': True,
}, {
# using `show_slug` is important to get the correct video data
'url': 'https://www.sciencechannel.com/tv-shows/mythbusters-on-science/full-episodes/christmas-special',
'only_matching': True,
}] }]
_GEO_COUNTRIES = ['US'] _GEO_COUNTRIES = ['US']
_GEO_BYPASS = False _GEO_BYPASS = False
_API_BASE_URL = 'https://api.discovery.com/v1/'
def _real_extract(self, url): def _real_extract(self, url):
site, path, display_id = re.match(self._VALID_URL, url).groups() site, show_slug, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
react_data = self._parse_json(self._search_regex(
r'window\.__reactTransmitPacket\s*=\s*({.+?});',
webpage, 'react data'), display_id)
content_blocks = react_data['layout'][path]['contentBlocks']
video = next(cb for cb in content_blocks if cb.get('type') == 'video')['content']['items'][0]
video_id = video['id']
access_token = None access_token = None
cookies = self._get_cookies(url) cookies = self._get_cookies(url)
@ -79,27 +73,36 @@ class DiscoveryIE(DiscoveryGoBaseIE):
if auth_storage_cookie and auth_storage_cookie.value: if auth_storage_cookie and auth_storage_cookie.value:
auth_storage = self._parse_json(compat_urllib_parse_unquote( auth_storage = self._parse_json(compat_urllib_parse_unquote(
compat_urllib_parse_unquote(auth_storage_cookie.value)), compat_urllib_parse_unquote(auth_storage_cookie.value)),
video_id, fatal=False) or {} display_id, fatal=False) or {}
access_token = auth_storage.get('a') or auth_storage.get('access_token') access_token = auth_storage.get('a') or auth_storage.get('access_token')
if not access_token: if not access_token:
access_token = self._download_json( access_token = self._download_json(
'https://%s.com/anonymous' % site, display_id, query={ 'https://%s.com/anonymous' % site, display_id,
'Downloading token JSON metadata', query={
'authRel': 'authorization', 'authRel': 'authorization',
'client_id': try_get( 'client_id': '3020a40c2356a645b4b4',
react_data, lambda x: x['application']['apiClientId'],
compat_str) or '3020a40c2356a645b4b4',
'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]), 'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
'redirectUri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html?https://www.%s.com' % site, 'redirectUri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html?https://www.%s.com' % site,
})['access_token'] })['access_token']
try: headers = self.geo_verification_headers()
headers = self.geo_verification_headers() headers['Authorization'] = 'Bearer ' + access_token
headers['Authorization'] = 'Bearer ' + access_token
try:
video = self._download_json(
self._API_BASE_URL + 'content/videos',
display_id, 'Downloading content JSON metadata',
headers=headers, query={
'embed': 'show.name',
'fields': 'authenticated,description.detailed,duration,episodeNumber,id,name,parental.rating,season.number,show,tags',
'slug': display_id,
'show_slug': show_slug,
})[0]
video_id = video['id']
stream = self._download_json( stream = self._download_json(
'https://api.discovery.com/v1/streaming/video/' + video_id, self._API_BASE_URL + 'streaming/video/' + video_id,
display_id, headers=headers) display_id, 'Downloading streaming JSON metadata', headers=headers)
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403): if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403):
e_description = self._parse_json( e_description = self._parse_json(
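
Note on the new request flow in `DiscoveryIE`: the page scrape for `__reactTransmitPacket` is gone, and the video is looked up through the API by `slug` and `show_slug` after an access token is obtained. The sketch below only assembles the request parameters shown in the diff and makes no network calls. The function names are hypothetical, and the token in the usage line is a placeholder.

```python
import random
import string

API_BASE_URL = 'https://api.discovery.com/v1/'


def anonymous_token_query(site):
    """Query parameters for the https://<site>.com/anonymous token request."""
    nonce = ''.join(random.choice(string.ascii_letters) for _ in range(32))
    return {
        'authRel': 'authorization',
        'client_id': '3020a40c2356a645b4b4',
        'nonce': nonce,
        'redirectUri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html?https://www.%s.com' % site,
    }


def video_lookup_request(display_id, show_slug, access_token):
    """URL, headers and query for the content/videos lookup."""
    url = API_BASE_URL + 'content/videos'
    headers = {'Authorization': 'Bearer ' + access_token}
    query = {
        'embed': 'show.name',
        'fields': 'authenticated,description.detailed,duration,episodeNumber,id,name,parental.rating,season.number,show,tags',
        'slug': display_id,
        'show_slug': show_slug,
    }
    return url, headers, query


url, headers, query = video_lookup_request(
    'riding-with-matthew-perry', 'cash-cab', 'PLACEHOLDER_TOKEN')
print(url, query['slug'], query['show_slug'])
```

Passing `show_slug` alongside `slug` is what the added test comment refers to: the same episode slug can exist under more than one show.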

@ -0,0 +1,97 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import int_or_none
class DLiveVODIE(InfoExtractor):
IE_NAME = 'dlive:vod'
_VALID_URL = r'https?://(?:www\.)?dlive\.tv/p/(?P<uploader_id>.+?)\+(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://dlive.tv/p/pdp+3mTzOl4WR',
'info_dict': {
'id': '3mTzOl4WR',
'ext': 'mp4',
'title': 'Minecraft with james charles epic',
'upload_date': '20190701',
'timestamp': 1562011015,
'uploader_id': 'pdp',
}
}, {
'url': 'https://dlive.tv/p/pdpreplay+D-RD-xSZg',
'only_matching': True,
}]
def _real_extract(self, url):
uploader_id, vod_id = re.match(self._VALID_URL, url).groups()
broadcast = self._download_json(
'https://graphigo.prd.dlive.tv/', vod_id,
data=json.dumps({'query': '''query {
pastBroadcast(permlink:"%s+%s") {
content
createdAt
length
playbackUrl
title
thumbnailUrl
viewCount
}
}''' % (uploader_id, vod_id)}).encode())['data']['pastBroadcast']
title = broadcast['title']
formats = self._extract_m3u8_formats(
broadcast['playbackUrl'], vod_id, 'mp4', 'm3u8_native')
self._sort_formats(formats)
return {
'id': vod_id,
'title': title,
'uploader_id': uploader_id,
'formats': formats,
'description': broadcast.get('content'),
'thumbnail': broadcast.get('thumbnailUrl'),
'timestamp': int_or_none(broadcast.get('createdAt'), 1000),
'view_count': int_or_none(broadcast.get('viewCount')),
}
class DLiveStreamIE(InfoExtractor):
IE_NAME = 'dlive:stream'
_VALID_URL = r'https?://(?:www\.)?dlive\.tv/(?!p/)(?P<id>[\w.-]+)'
def _real_extract(self, url):
display_name = self._match_id(url)
user = self._download_json(
'https://graphigo.prd.dlive.tv/', display_name,
data=json.dumps({'query': '''query {
userByDisplayName(displayname:"%s") {
livestream {
content
createdAt
title
thumbnailUrl
watchingCount
}
username
}
}''' % display_name}).encode())['data']['userByDisplayName']
livestream = user['livestream']
title = livestream['title']
username = user['username']
formats = self._extract_m3u8_formats(
'https://live.prd.dlive.tv/hls/live/%s.m3u8' % username,
display_name, 'mp4')
self._sort_formats(formats)
return {
'id': display_name,
'title': self._live_title(title),
'uploader': display_name,
'uploader_id': username,
'formats': formats,
'description': livestream.get('content'),
'thumbnail': livestream.get('thumbnailUrl'),
'is_live': True,
'timestamp': int_or_none(livestream.get('createdAt'), 1000),
'view_count': int_or_none(livestream.get('watchingCount')),
}
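
Note on the GraphQL requests in the new DLive extractors: both classes POST a JSON body with a single `query` string to `https://graphigo.prd.dlive.tv/`. The sketch below shows only how the VOD payload is assembled, with no network access; `past_broadcast_payload` is a hypothetical helper name.

```python
import json


def past_broadcast_payload(uploader_id, vod_id):
    """Build the POST body for the pastBroadcast lookup as bytes."""
    query = '''query {
  pastBroadcast(permlink:"%s+%s") {
    content
    createdAt
    length
    playbackUrl
    title
    thumbnailUrl
    viewCount
  }
}''' % (uploader_id, vod_id)
    # json.dumps escapes the embedded quotes and newlines; encode() yields bytes for the request body.
    return json.dumps({'query': query}).encode()


payload = past_broadcast_payload('pdp', '3mTzOl4WR')
print(json.loads(payload.decode())['query'].splitlines()[1].strip())
```

The ids are interpolated into the query text as-is, so a value containing a double quote would likely produce a malformed query; the URL pattern for `id` does not exclude that character.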

@ -2,6 +2,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import json import json
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
@ -18,7 +19,7 @@ from ..utils import (
class EinthusanIE(InfoExtractor): class EinthusanIE(InfoExtractor):
_VALID_URL = r'https?://einthusan\.tv/movie/watch/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?P<host>einthusan\.(?:tv|com|ca))/movie/watch/(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://einthusan.tv/movie/watch/9097/', 'url': 'https://einthusan.tv/movie/watch/9097/',
'md5': 'ff0f7f2065031b8a2cf13a933731c035', 'md5': 'ff0f7f2065031b8a2cf13a933731c035',
@ -32,6 +33,12 @@ class EinthusanIE(InfoExtractor):
}, { }, {
'url': 'https://einthusan.tv/movie/watch/51MZ/?lang=hindi', 'url': 'https://einthusan.tv/movie/watch/51MZ/?lang=hindi',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://einthusan.com/movie/watch/9097/',
'only_matching': True,
}, {
'url': 'https://einthusan.ca/movie/watch/4E9n/?lang=hindi',
'only_matching': True,
}] }]
# reversed from jsoncrypto.prototype.decrypt() in einthusan-PGMovieWatcher.js # reversed from jsoncrypto.prototype.decrypt() in einthusan-PGMovieWatcher.js
@ -41,7 +48,9 @@ class EinthusanIE(InfoExtractor):
)).decode('utf-8'), video_id) )).decode('utf-8'), video_id)
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
@ -53,7 +62,7 @@ class EinthusanIE(InfoExtractor):
page_id = self._html_search_regex( page_id = self._html_search_regex(
'<html[^>]+data-pageid="([^"]+)"', webpage, 'page ID') '<html[^>]+data-pageid="([^"]+)"', webpage, 'page ID')
video_data = self._download_json( video_data = self._download_json(
'https://einthusan.tv/ajax/movie/watch/%s/' % video_id, video_id, 'https://%s/ajax/movie/watch/%s/' % (host, video_id), video_id,
data=urlencode_postdata({ data=urlencode_postdata({
'xEvent': 'UIVideoPlayer.PingOutcome', 'xEvent': 'UIVideoPlayer.PingOutcome',
'xJson': json.dumps({ 'xJson': json.dumps({

@ -216,17 +216,14 @@ class FiveThirtyEightIE(InfoExtractor):
_TEST = { _TEST = {
'url': 'http://fivethirtyeight.com/features/how-the-6-8-raiders-can-still-make-the-playoffs/', 'url': 'http://fivethirtyeight.com/features/how-the-6-8-raiders-can-still-make-the-playoffs/',
'info_dict': { 'info_dict': {
'id': '21846851', 'id': '56032156',
'ext': 'mp4', 'ext': 'flv',
'title': 'FiveThirtyEight: The Raiders can still make the playoffs', 'title': 'FiveThirtyEight: The Raiders can still make the playoffs',
'description': 'Neil Paine breaks down the simplest scenario that will put the Raiders into the playoffs at 8-8.', 'description': 'Neil Paine breaks down the simplest scenario that will put the Raiders into the playoffs at 8-8.',
'timestamp': 1513960621,
'upload_date': '20171222',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'expected_warnings': ['Unable to download f4m manifest'],
} }
def _real_extract(self, url): def _real_extract(self, url):
@ -234,9 +231,8 @@ class FiveThirtyEightIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
video_id = self._search_regex( embed_url = self._search_regex(
r'data-video-id=["\'](?P<id>\d+)', r'<iframe[^>]+src=["\'](https?://fivethirtyeight\.abcnews\.go\.com/video/embed/\d+/\d+)',
webpage, 'video id', group='id') webpage, 'embed url')
return self.url_result( return self.url_result(embed_url, 'AbcNewsVideo')
'http://espn.go.com/video/clip?id=%s' % video_id, ESPNIE.ie_key())


@ -58,17 +58,8 @@ from .ard import (
ARDMediathekIE, ARDMediathekIE,
) )
from .arte import ( from .arte import (
ArteTvIE,
ArteTVPlus7IE, ArteTVPlus7IE,
ArteTVCreativeIE,
ArteTVConcertIE,
ArteTVInfoIE,
ArteTVFutureIE,
ArteTVCinemaIE,
ArteTVDDCIE,
ArteTVMagazineIE,
ArteTVEmbedIE, ArteTVEmbedIE,
TheOperaPlatformIE,
ArteTVPlaylistIE, ArteTVPlaylistIE,
) )
from .asiancrush import ( from .asiancrush import (
@ -113,6 +104,8 @@ from .bild import BildIE
from .bilibili import ( from .bilibili import (
BiliBiliIE, BiliBiliIE,
BiliBiliBangumiIE, BiliBiliBangumiIE,
BilibiliAudioIE,
BilibiliAudioAlbumIE,
) )
from .biobiochiletv import BioBioChileTVIE from .biobiochiletv import BioBioChileTVIE
from .bitchute import ( from .bitchute import (
@ -404,11 +397,7 @@ from .frontendmasters import (
FrontendMastersCourseIE FrontendMastersCourseIE
) )
from .funimation import FunimationIE from .funimation import FunimationIE
from .funk import ( from .funk import FunkIE
FunkMixIE,
FunkChannelIE,
)
from .funnyordie import FunnyOrDieIE
from .fusion import FusionIE from .fusion import FusionIE
from .fxnetworks import FXNetworksIE from .fxnetworks import FXNetworksIE
from .gaia import GaiaIE from .gaia import GaiaIE
@ -592,6 +581,7 @@ from .linkedin import (
) )
from .linuxacademy import LinuxAcademyIE from .linuxacademy import LinuxAcademyIE
from .litv import LiTVIE from .litv import LiTVIE
from .livejournal import LiveJournalIE
from .liveleak import ( from .liveleak import (
LiveLeakIE, LiveLeakIE,
LiveLeakEmbedIE, LiveLeakEmbedIE,
@ -980,7 +970,6 @@ from .rts import RTSIE
from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE, RTVELiveIE, RTVETelevisionIE from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE, RTVELiveIE, RTVETelevisionIE
from .rtvnh import RTVNHIE from .rtvnh import RTVNHIE
from .rtvs import RTVSIE from .rtvs import RTVSIE
from .rudo import RudoIE
from .ruhd import RUHDIE from .ruhd import RUHDIE
from .rutube import ( from .rutube import (
RutubeIE, RutubeIE,
@ -1268,6 +1257,10 @@ from .udn import UDNEmbedIE
from .ufctv import UFCTVIE from .ufctv import UFCTVIE
from .uktvplay import UKTVPlayIE from .uktvplay import UKTVPlayIE
from .digiteka import DigitekaIE from .digiteka import DigitekaIE
from .dlive import (
DLiveVODIE,
DLiveStreamIE,
)
from .umg import UMGDeIE from .umg import UMGDeIE
from .unistra import UnistraIE from .unistra import UnistraIE
from .unity import UnityIE from .unity import UnityIE
@ -1434,6 +1427,7 @@ from .xfileshare import XFileShareIE
from .xhamster import ( from .xhamster import (
XHamsterIE, XHamsterIE,
XHamsterEmbedIE, XHamsterEmbedIE,
XHamsterUserIE,
) )
from .xiami import ( from .xiami import (
XiamiSongIE, XiamiSongIE,
@ -1457,6 +1451,7 @@ from .yahoo import (
YahooSearchIE, YahooSearchIE,
YahooGyaOPlayerIE, YahooGyaOPlayerIE,
YahooGyaOIE, YahooGyaOIE,
YahooJapanNewsIE,
) )
from .yandexdisk import YandexDiskIE from .yandexdisk import YandexDiskIE
from .yandexmusic import ( from .yandexmusic import (


@ -428,7 +428,7 @@ class FacebookIE(InfoExtractor):
timestamp = int_or_none(self._search_regex( timestamp = int_or_none(self._search_regex(
r'<abbr[^>]+data-utime=["\'](\d+)', webpage, r'<abbr[^>]+data-utime=["\'](\d+)', webpage,
'timestamp', default=None)) 'timestamp', default=None))
thumbnail = self._og_search_thumbnail(webpage) thumbnail = self._html_search_meta(['og:image', 'twitter:image'], webpage)
view_count = parse_count(self._search_regex( view_count = parse_count(self._search_regex(
r'\bviewCount\s*:\s*["\']([\d,.]+)', webpage, 'view count', r'\bviewCount\s*:\s*["\']([\d,.]+)', webpage, 'view count',


@ -9,7 +9,7 @@ from ..utils import int_or_none
class FiveTVIE(InfoExtractor): class FiveTVIE(InfoExtractor):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
http:// https?://
(?:www\.)?5-tv\.ru/ (?:www\.)?5-tv\.ru/
(?: (?:
(?:[^/]+/)+(?P<id>\d+)| (?:[^/]+/)+(?P<id>\d+)|
@ -39,6 +39,7 @@ class FiveTVIE(InfoExtractor):
'duration': 180, 'duration': 180,
}, },
}, { }, {
# redirect to https://www.5-tv.ru/projects/1000095/izvestia-glavnoe/
'url': 'http://www.5-tv.ru/glavnoe/#itemDetails', 'url': 'http://www.5-tv.ru/glavnoe/#itemDetails',
'info_dict': { 'info_dict': {
'id': 'glavnoe', 'id': 'glavnoe',
@ -46,6 +47,7 @@ class FiveTVIE(InfoExtractor):
'title': r're:^Итоги недели с \d+ по \d+ \w+ \d{4} года$', 'title': r're:^Итоги недели с \d+ по \d+ \w+ \d{4} года$',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
}, },
'skip': 'redirect to «Известия. Главное» project page',
}, { }, {
'url': 'http://www.5-tv.ru/glavnoe/broadcasts/508645/', 'url': 'http://www.5-tv.ru/glavnoe/broadcasts/508645/',
'only_matching': True, 'only_matching': True,
@ -70,7 +72,7 @@ class FiveTVIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
video_url = self._search_regex( video_url = self._search_regex(
[r'<div[^>]+?class="flowplayer[^>]+?data-href="([^"]+)"', [r'<div[^>]+?class="(?:flow)?player[^>]+?data-href="([^"]+)"',
r'<a[^>]+?href="([^"]+)"[^>]+?class="videoplayer"'], r'<a[^>]+?href="([^"]+)"[^>]+?class="videoplayer"'],
webpage, 'video url') webpage, 'video url')


@ -1,89 +1,21 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import itertools
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from .nexx import NexxIE from .nexx import NexxIE
from ..compat import compat_str
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
try_get, str_or_none,
) )
class FunkBaseIE(InfoExtractor): class FunkIE(InfoExtractor):
_HEADERS = { _VALID_URL = r'https?://(?:www\.)?funk\.net/(?:channel|playlist)/[^/]+/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'
'Accept': '*/*',
'Accept-Language': 'en-US,en;q=0.9,ru;q=0.8',
'authorization': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoid2ViYXBwLXYzMSIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxuZXh4LWNvbnRlbnQtYXBpLXYzMSx3ZWJhcHAtYXBpIn0.mbuG9wS9Yf5q6PqgR4fiaRFIagiHk9JhwoKES7ksVX4',
}
_AUTH = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoid2ViYXBwLXYzMSIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxuZXh4LWNvbnRlbnQtYXBpLXYzMSx3ZWJhcHAtYXBpIn0.mbuG9wS9Yf5q6PqgR4fiaRFIagiHk9JhwoKES7ksVX4'
@staticmethod
def _make_headers(referer):
headers = FunkBaseIE._HEADERS.copy()
headers['Referer'] = referer
return headers
def _make_url_result(self, video):
return {
'_type': 'url_transparent',
'url': 'nexx:741:%s' % video['sourceId'],
'ie_key': NexxIE.ie_key(),
'id': video['sourceId'],
'title': video.get('title'),
'description': video.get('description'),
'duration': int_or_none(video.get('duration')),
'season_number': int_or_none(video.get('seasonNr')),
'episode_number': int_or_none(video.get('episodeNr')),
}
class FunkMixIE(FunkBaseIE):
_VALID_URL = r'https?://(?:www\.)?funk\.net/mix/(?P<id>[^/]+)/(?P<alias>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.funk.net/mix/59d65d935f8b160001828b5b/die-realste-kifferdoku-aller-zeiten', 'url': 'https://www.funk.net/channel/ba-793/die-lustigsten-instrumente-aus-dem-internet-teil-2-1155821',
'md5': '8edf617c2f2b7c9847dfda313f199009', 'md5': '8dd9d9ab59b4aa4173b3197f2ea48e81',
'info_dict': {
'id': '123748',
'ext': 'mp4',
'title': '"Die realste Kifferdoku aller Zeiten"',
'description': 'md5:c97160f5bafa8d47ec8e2e461012aa9d',
'timestamp': 1490274721,
'upload_date': '20170323',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
mix_id = mobj.group('id')
alias = mobj.group('alias')
lists = self._download_json(
'https://www.funk.net/api/v3.1/curation/curatedLists/',
mix_id, headers=self._make_headers(url), query={
'size': 100,
})['_embedded']['curatedListList']
metas = next(
l for l in lists
if mix_id in (l.get('entityId'), l.get('alias')))['videoMetas']
video = next(
meta['videoDataDelegate']
for meta in metas
if try_get(
meta, lambda x: x['videoDataDelegate']['alias'],
compat_str) == alias)
return self._make_url_result(video)
class FunkChannelIE(FunkBaseIE):
_VALID_URL = r'https?://(?:www\.)?funk\.net/channel/(?P<id>[^/]+)/(?P<alias>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.funk.net/channel/ba/die-lustigsten-instrumente-aus-dem-internet-teil-2',
'info_dict': { 'info_dict': {
'id': '1155821', 'id': '1155821',
'ext': 'mp4', 'ext': 'mp4',
@ -92,83 +24,26 @@ class FunkChannelIE(FunkBaseIE):
'timestamp': 1514507395, 'timestamp': 1514507395,
'upload_date': '20171229', 'upload_date': '20171229',
}, },
'params': {
'skip_download': True,
},
}, { }, {
# only available via byIdList API 'url': 'https://www.funk.net/playlist/neuesteVideos/kameras-auf-dem-fusion-festival-1618699',
'url': 'https://www.funk.net/channel/informr/martin-sonneborn-erklaert-die-eu',
'info_dict': {
'id': '205067',
'ext': 'mp4',
'title': 'Martin Sonneborn erklärt die EU',
'description': 'md5:050f74626e4ed87edf4626d2024210c0',
'timestamp': 1494424042,
'upload_date': '20170510',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.funk.net/channel/59d5149841dca100012511e3/mein-erster-job-lovemilla-folge-1/lovemilla/',
'only_matching': True, 'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) display_id, nexx_id = re.match(self._VALID_URL, url).groups()
channel_id = mobj.group('id') video = self._download_json(
alias = mobj.group('alias') 'https://www.funk.net/api/v4.0/videos/' + nexx_id, nexx_id)
return {
headers = self._make_headers(url) '_type': 'url_transparent',
'url': 'nexx:741:' + nexx_id,
video = None 'ie_key': NexxIE.ie_key(),
'id': nexx_id,
# Id-based channels are currently broken on their side: webplayer 'title': video.get('title'),
# tries to process them via byChannelAlias endpoint and fails 'description': video.get('description'),
# predictably. 'duration': int_or_none(video.get('duration')),
for page_num in itertools.count(): 'channel_id': str_or_none(video.get('channelId')),
by_channel_alias = self._download_json( 'display_id': display_id,
'https://www.funk.net/api/v3.1/webapp/videos/byChannelAlias/%s' 'tags': video.get('tags'),
% channel_id, 'thumbnail': video.get('imageUrlLandscape'),
'Downloading byChannelAlias JSON page %d' % (page_num + 1), }
headers=headers, query={
'filterFsk': 'false',
'sort': 'creationDate,desc',
'size': 100,
'page': page_num,
}, fatal=False)
if not by_channel_alias:
break
video_list = try_get(
by_channel_alias, lambda x: x['_embedded']['videoList'], list)
if not video_list:
break
try:
video = next(r for r in video_list if r.get('alias') == alias)
break
except StopIteration:
pass
if not try_get(
by_channel_alias, lambda x: x['_links']['next']):
break
if not video:
by_id_list = self._download_json(
'https://www.funk.net/api/v3.0/content/videos/byIdList',
channel_id, 'Downloading byIdList JSON', headers=headers,
query={
'ids': alias,
}, fatal=False)
if by_id_list:
video = try_get(by_id_list, lambda x: x['result'][0], dict)
if not video:
results = self._download_json(
'https://www.funk.net/api/v3.0/content/videos/filter',
channel_id, 'Downloading filter JSON', headers=headers, query={
'channelId': channel_id,
'size': 100,
})['result']
video = next(r for r in results if r.get('alias') == alias)
return self._make_url_result(video)
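Reviewer note: the rewritten `FunkIE` no longer pages through channel listings. It takes the numeric Nexx id directly from the tail of the URL slug. A small sketch of how that pattern splits a URL, using the `_VALID_URL` and API prefix from the new side of the diff; the helper name `split_funk_url` is hypothetical and no network request is made here.

```python
import re

# Pattern copied from the new FunkIE._VALID_URL in the diff above.
VALID_URL = r'https?://(?:www\.)?funk\.net/(?:channel|playlist)/[^/]+/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'


def split_funk_url(url):
    """Return (display_id, nexx_id, api_url) for a funk.net video URL."""
    # The greedy display_id group backtracks to the last '-' that is
    # followed by digits, so the trailing number becomes the id.
    display_id, nexx_id = re.match(VALID_URL, url).groups()
    api_url = 'https://www.funk.net/api/v4.0/videos/' + nexx_id
    return display_id, nexx_id, api_url
```

Because the slug itself may contain digits and hyphens (for example `teil-2`), the split depends on regex backtracking rather than on a fixed separator.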


@ -1,162 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
unified_timestamp,
)
class FunnyOrDieIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?funnyordie\.com/(?P<type>embed|articles|videos)/(?P<id>[0-9a-f]+)(?:$|[?#/])'
_TESTS = [{
'url': 'http://www.funnyordie.com/videos/0732f586d7/heart-shaped-box-literal-video-version',
'md5': 'bcd81e0c4f26189ee09be362ad6e6ba9',
'info_dict': {
'id': '0732f586d7',
'ext': 'mp4',
'title': 'Heart-Shaped Box: Literal Video Version',
'description': 'md5:ea09a01bc9a1c46d9ab696c01747c338',
'thumbnail': r're:^http:.*\.jpg$',
'uploader': 'DASjr',
'timestamp': 1317904928,
'upload_date': '20111006',
'duration': 318.3,
},
}, {
'url': 'http://www.funnyordie.com/embed/e402820827',
'info_dict': {
'id': 'e402820827',
'ext': 'mp4',
'title': 'Please Use This Song (Jon Lajoie)',
'description': 'Please use this to sell something. www.jonlajoie.com',
'thumbnail': r're:^http:.*\.jpg$',
'timestamp': 1398988800,
'upload_date': '20140502',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://www.funnyordie.com/articles/ebf5e34fc8/10-hours-of-walking-in-nyc-as-a-man',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
links = re.findall(r'<source src="([^"]+/v)[^"]+\.([^"]+)" type=\'video', webpage)
if not links:
raise ExtractorError('No media links available for %s' % video_id)
links.sort(key=lambda link: 1 if link[1] == 'mp4' else 0)
m3u8_url = self._search_regex(
r'<source[^>]+src=(["\'])(?P<url>.+?/master\.m3u8[^"\']*)\1',
webpage, 'm3u8 url', group='url')
formats = []
m3u8_formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False)
source_formats = list(filter(
lambda f: f.get('vcodec') != 'none', m3u8_formats))
bitrates = [int(bitrate) for bitrate in re.findall(r'[,/]v(\d+)(?=[,/])', m3u8_url)]
bitrates.sort()
if source_formats:
self._sort_formats(source_formats)
for bitrate, f in zip(bitrates, source_formats or [{}] * len(bitrates)):
for path, ext in links:
ff = f.copy()
if ff:
if ext != 'mp4':
ff = dict(
[(k, v) for k, v in ff.items()
if k in ('height', 'width', 'format_id')])
ff.update({
'format_id': ff['format_id'].replace('hls', ext),
'ext': ext,
'protocol': 'http',
})
else:
ff.update({
'format_id': '%s-%d' % (ext, bitrate),
'vbr': bitrate,
})
ff['url'] = self._proto_relative_url(
'%s%d.%s' % (path, bitrate, ext))
formats.append(ff)
self._check_formats(formats, video_id)
formats.extend(m3u8_formats)
self._sort_formats(
formats, field_preference=('height', 'width', 'tbr', 'format_id'))
subtitles = {}
for src, src_lang in re.findall(r'<track kind="captions" src="([^"]+)" srclang="([^"]+)"', webpage):
subtitles[src_lang] = [{
'ext': src.split('/')[-1],
'url': 'http://www.funnyordie.com%s' % src,
}]
timestamp = unified_timestamp(self._html_search_meta(
'uploadDate', webpage, 'timestamp', default=None))
uploader = self._html_search_regex(
r'<h\d[^>]+\bclass=["\']channel-preview-name[^>]+>(.+?)</h',
webpage, 'uploader', default=None)
title, description, thumbnail, duration = [None] * 4
medium = self._parse_json(
self._search_regex(
r'jsonMedium\s*=\s*({.+?});', webpage, 'JSON medium',
default='{}'),
video_id, fatal=False)
if medium:
title = medium.get('title')
duration = float_or_none(medium.get('duration'))
if not timestamp:
timestamp = unified_timestamp(medium.get('publishDate'))
post = self._parse_json(
self._search_regex(
r'fb_post\s*=\s*(\{.*?\});', webpage, 'post details',
default='{}'),
video_id, fatal=False)
if post:
if not title:
title = post.get('name')
description = post.get('description')
thumbnail = post.get('picture')
if not title:
title = self._og_search_title(webpage)
if not description:
description = self._og_search_description(webpage)
if not duration:
duration = int_or_none(self._html_search_meta(
('video:duration', 'duration'), webpage, 'duration', default=False))
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'timestamp': timestamp,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
}


@ -1,12 +1,19 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .brightcove import BrightcoveNewIE
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import (
clean_html,
get_element_by_class,
get_element_by_id,
)
class GameInformerIE(InfoExtractor): class GameInformerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gameinformer\.com/(?:[^/]+/)*(?P<id>.+)\.aspx' _VALID_URL = r'https?://(?:www\.)?gameinformer\.com/(?:[^/]+/)*(?P<id>[^.?&#]+)'
_TEST = { _TESTS = [{
# normal Brightcove embed code extracted with BrightcoveNewIE._extract_url
'url': 'http://www.gameinformer.com/b/features/archive/2015/09/26/replay-animal-crossing.aspx', 'url': 'http://www.gameinformer.com/b/features/archive/2015/09/26/replay-animal-crossing.aspx',
'md5': '292f26da1ab4beb4c9099f1304d2b071', 'md5': '292f26da1ab4beb4c9099f1304d2b071',
'info_dict': { 'info_dict': {
@ -18,16 +25,25 @@ class GameInformerIE(InfoExtractor):
'upload_date': '20150928', 'upload_date': '20150928',
'uploader_id': '694940074001', 'uploader_id': '694940074001',
}, },
} }, {
# Brightcove id inside unique element with field--name-field-brightcove-video-id class
'url': 'https://www.gameinformer.com/video-feature/new-gameplay-today/2019/07/09/new-gameplay-today-streets-of-rogue',
'info_dict': {
'id': '6057111913001',
'ext': 'mp4',
'title': 'New Gameplay Today Streets Of Rogue',
'timestamp': 1562699001,
'upload_date': '20190709',
'uploader_id': '694940074001',
},
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/694940074001/default_default/index.html?videoId=%s' BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/694940074001/default_default/index.html?videoId=%s'
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage( webpage = self._download_webpage(
url, display_id, headers=self.geo_verification_headers()) url, display_id, headers=self.geo_verification_headers())
brightcove_id = self._search_regex( brightcove_id = clean_html(get_element_by_class('field--name-field-brightcove-video-id', webpage) or get_element_by_id('video-source-content', webpage))
[r'<[^>]+\bid=["\']bc_(\d+)', r"getVideo\('[^']+video_id=(\d+)"], brightcove_url = self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id if brightcove_id else BrightcoveNewIE._extract_url(self, webpage)
webpage, 'brightcove id') return self.url_result(brightcove_url, 'BrightcoveNew', brightcove_id)
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew',
brightcove_id)


@ -2075,6 +2075,22 @@ class GenericIE(InfoExtractor):
}, },
'playlist_count': 6, 'playlist_count': 6,
}, },
{
# Squarespace video embed, 2019-08-28
'url': 'http://ootboxford.com',
'info_dict': {
'id': 'Tc7b_JGdZfw',
'title': 'Out of the Blue, at Childish Things 10',
'ext': 'mp4',
'description': 'md5:a83d0026666cf5ee970f8bd1cfd69c7f',
'uploader_id': 'helendouglashouse',
'uploader': 'Helen & Douglas House',
'upload_date': '20140328',
},
'params': {
'skip_download': True,
},
},
{ {
# Zype embed # Zype embed
'url': 'https://www.cookscountry.com/episode/554-smoky-barbecue-favorites', 'url': 'https://www.cookscountry.com/episode/554-smoky-barbecue-favorites',
@ -2104,6 +2120,23 @@ class GenericIE(InfoExtractor):
}, },
'expected_warnings': ['Failed to download MPD manifest'], 'expected_warnings': ['Failed to download MPD manifest'],
}, },
{
# DailyMotion embed with DM.player
'url': 'https://www.beinsports.com/us/copa-del-rey/video/the-locker-room-valencia-beat-barca-in-copa/1203804',
'info_dict': {
'id': 'k6aKkGHd9FJs4mtJN39',
'ext': 'mp4',
'title': 'The Locker Room: Valencia Beat Barca In Copa del Rey Final',
'description': 'This video is private.',
'uploader_id': 'x1jf30l',
'uploader': 'beIN SPORTS USA',
'upload_date': '20190528',
'timestamp': 1559062971,
},
'params': {
'skip_download': True,
},
},
# { # {
# # TODO: find another test # # TODO: find another test
# # http://schema.org/VideoObject # # http://schema.org/VideoObject
@ -2209,7 +2242,7 @@ class GenericIE(InfoExtractor):
default_search = 'fixup_error' default_search = 'fixup_error'
if default_search in ('auto', 'auto_warning', 'fixup_error'): if default_search in ('auto', 'auto_warning', 'fixup_error'):
if '/' in url: if re.match(r'^[^\s/]+\.[^\s/]+/', url):
self._downloader.report_warning('The url doesn\'t specify the protocol, trying with http') self._downloader.report_warning('The url doesn\'t specify the protocol, trying with http')
return self.url_result('http://' + url) return self.url_result('http://' + url)
elif default_search != 'fixup_error': elif default_search != 'fixup_error':
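Reviewer note: the hunk above tightens the "missing protocol" heuristic from "contains a slash" to "looks like `host.tld/...`". A minimal sketch of the new check in isolation; the function name `looks_like_bare_url` is an assumption, while the regex is the one from the new side of the diff.

```python
import re


def looks_like_bare_url(url):
    """True if the input resembles 'host.tld/path' with no scheme."""
    # Requires a dot inside the first slash-free, whitespace-free
    # segment, followed by a slash.
    return re.match(r'^[^\s/]+\.[^\s/]+/', url) is not None
```

Under this check a string such as `foo/bar` would no longer be retried as `http://foo/bar`, whereas `example.com/video` still would.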
@ -2378,6 +2411,12 @@ class GenericIE(InfoExtractor):
# Unescaping the whole page allows to handle those cases in a generic way # Unescaping the whole page allows to handle those cases in a generic way
webpage = compat_urllib_parse_unquote(webpage) webpage = compat_urllib_parse_unquote(webpage)
# Unescape squarespace embeds to be detected by generic extractor,
# see https://github.com/ytdl-org/youtube-dl/issues/21294
webpage = re.sub(
r'<div[^>]+class=[^>]*?\bsqs-video-wrapper\b[^>]*>',
lambda x: unescapeHTML(x.group(0)), webpage)
# it's tempting to parse this further, but you would # it's tempting to parse this further, but you would
# have to take into account all the variations like # have to take into account all the variations like
# Video Title - Site Name # Video Title - Site Name
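Reviewer note: the Squarespace change unescapes only the opening tag of `sqs-video-wrapper` divs, so that an embed stored as escaped HTML in an attribute becomes visible to the generic embed detectors. A rough sketch under stated assumptions: it uses the standard library's `html.unescape` as a stand-in for youtube-dl's `unescapeHTML`, and the `data-html` attribute in the usage example is assumed for illustration.

```python
import html
import re

# Regex copied from the new side of the diff above.
SQS_WRAPPER_RE = r'<div[^>]+class=[^>]*?\bsqs-video-wrapper\b[^>]*>'


def unescape_squarespace_embeds(webpage):
    """Unescape HTML entities inside matching wrapper tags only."""
    # html.unescape stands in for youtube-dl's unescapeHTML here.
    return re.sub(SQS_WRAPPER_RE, lambda m: html.unescape(m.group(0)), webpage)


page = ('<p>Tom &amp; Jerry</p>'
        '<div class="sqs-video-wrapper" data-html="&lt;iframe '
        'src=&quot;https://www.youtube.com/embed/Tc7b_JGdZfw&quot;&gt;'
        '&lt;/iframe&gt;"></div>')
result = unescape_squarespace_embeds(page)
```

Restricting the substitution to the wrapper tag leaves entities elsewhere on the page untouched, which appears to be why the diff does not unescape the whole document.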


@ -11,7 +11,7 @@ from ..utils import (
class GfycatIE(InfoExtractor): class GfycatIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?:ifr/|gifs/detail/)?(?P<id>[^-/?#]+)' _VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?:ru/|ifr/|gifs/detail/)?(?P<id>[^-/?#]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://gfycat.com/DeadlyDecisiveGermanpinscher', 'url': 'http://gfycat.com/DeadlyDecisiveGermanpinscher',
'info_dict': { 'info_dict': {
@ -44,6 +44,9 @@ class GfycatIE(InfoExtractor):
'categories': list, 'categories': list,
'age_limit': 0, 'age_limit': 0,
} }
}, {
'url': 'https://gfycat.com/ru/RemarkableDrearyAmurstarfish',
'only_matching': True
}, { }, {
'url': 'https://gfycat.com/gifs/detail/UnconsciousLankyIvorygull', 'url': 'https://gfycat.com/gifs/detail/UnconsciousLankyIvorygull',
'only_matching': True 'only_matching': True


@ -34,9 +34,13 @@ class GoIE(AdobePassIE):
'watchdisneyxd': { 'watchdisneyxd': {
'brand': '009', 'brand': '009',
'resource_id': 'DisneyXD', 'resource_id': 'DisneyXD',
},
'disneynow': {
'brand': '011',
'resource_id': 'Disney',
} }
} }
_VALID_URL = r'https?://(?:(?:(?P<sub_domain>%s)\.)?go|disneynow)\.com/(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))'\ _VALID_URL = r'https?://(?:(?:(?P<sub_domain>%s)\.)?go|(?P<sub_domain_2>disneynow))\.com/(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))'\
% '|'.join(list(_SITE_INFO.keys()) + ['disneynow']) % '|'.join(list(_SITE_INFO.keys()) + ['disneynow'])
_TESTS = [{ _TESTS = [{
'url': 'http://abc.go.com/shows/designated-survivor/video/most-recent/VDKA3807643', 'url': 'http://abc.go.com/shows/designated-survivor/video/most-recent/VDKA3807643',
@ -78,17 +82,16 @@ class GoIE(AdobePassIE):
def _extract_videos(self, brand, video_id='-1', show_id='-1'): def _extract_videos(self, brand, video_id='-1', show_id='-1'):
display_id = video_id if video_id != '-1' else show_id display_id = video_id if video_id != '-1' else show_id
foo = 'http://api.contents.watchabc.go.com/vp2/ws/contents/3000/videos/%s/001/-1/%s/-1/%s/-1/-1.json' % (brand, show_id, video_id)
print("Foo is:", foo)
return self._download_json( return self._download_json(
'http://api.contents.watchabc.go.com/vp2/ws/contents/3000/videos/%s/001/-1/%s/-1/%s/-1/-1.json' % (brand, show_id, video_id), 'http://api.contents.watchabc.go.com/vp2/ws/contents/3000/videos/%s/001/-1/%s/-1/%s/-1/-1.json' % (brand, show_id, video_id),
display_id)['video'] display_id)['video']
def _real_extract(self, url): def _real_extract(self, url):
sub_domain, video_id, display_id = re.match(self._VALID_URL, url).groups() mobj = re.match(self._VALID_URL, url)
print("sub_domain:",sub_domain) sub_domain = mobj.group('sub_domain') or mobj.group('sub_domain_2')
video_id, display_id = mobj.group('id', 'display_id')
site_info = self._SITE_INFO.get(sub_domain, {}) site_info = self._SITE_INFO.get(sub_domain, {})
print("site_info:", site_info)
brand = site_info.get('brand') brand = site_info.get('brand')
if not video_id or not site_info: if not video_id or not site_info:
webpage = self._download_webpage(url, display_id or video_id) webpage = self._download_webpage(url, display_id or video_id)
@ -101,8 +104,7 @@ class GoIE(AdobePassIE):
brand = self._search_regex( brand = self._search_regex(
(r'data-brand=\s*["\']\s*(\d+)', (r'data-brand=\s*["\']\s*(\d+)',
r'data-page-brand=\s*["\']\s*(\d+)'), webpage, 'brand', r'data-page-brand=\s*["\']\s*(\d+)'), webpage, 'brand',
default='008') default='004')  # was '008' before the merge
print("Brand 100:", brand)
site_info = next( site_info = next(
si for _, si in self._SITE_INFO.items() si for _, si in self._SITE_INFO.items()
if si.get('brand') == brand) if si.get('brand') == brand)
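Reviewer note: the new `_real_extract` reads named groups instead of unpacking `groups()` positionally, because the pattern now has two alternative sub-domain groups. A sketch of that lookup, assuming an illustrative subset of sub-domains; the full `_SITE_INFO` table is only partly visible in this diff, and `parse_go_url` is a hypothetical helper.

```python
import re

# Illustrative subset only: the complete _SITE_INFO keys are not shown in this diff.
SUB_DOMAINS = ['abc', 'watchdisneyxd', 'disneynow']

# Same shape as the new _VALID_URL in the diff above.
VALID_URL = (
    r'https?://(?:(?:(?P<sub_domain>%s)\.)?go|(?P<sub_domain_2>disneynow))\.com/'
    r'(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))'
) % '|'.join(SUB_DOMAINS)


def parse_go_url(url):
    """Return (sub_domain, video_id, display_id) for a go.com or disneynow.com URL."""
    mobj = re.match(VALID_URL, url)
    # Only one of the two sub-domain groups can participate in a match.
    sub_domain = mobj.group('sub_domain') or mobj.group('sub_domain_2')
    video_id, display_id = mobj.group('id', 'display_id')
    return sub_domain, video_id, display_id
```

With named groups, adding `sub_domain_2` does not shift the positions of `id` and `display_id`, which the old three-way tuple unpacking would not have tolerated.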


@ -3,6 +3,7 @@ from __future__ import unicode_literals
import hashlib import hashlib
import hmac import hmac
import re
import time import time
import uuid import uuid
@ -126,6 +127,8 @@ class HotStarIE(HotStarBaseIE):
format_url = url_or_none(playback_set.get('playbackUrl')) format_url = url_or_none(playback_set.get('playbackUrl'))
if not format_url: if not format_url:
continue continue
format_url = re.sub(
r'(?<=//staragvod)(\d)', r'web\1', format_url)
tags = str_or_none(playback_set.get('tagsCombination')) or '' tags = str_or_none(playback_set.get('tagsCombination')) or ''
if tags and 'encryption:plain' not in tags: if tags and 'encryption:plain' not in tags:
continue continue
@ -133,7 +136,8 @@ class HotStarIE(HotStarBaseIE):
try: try:
if 'package:hls' in tags or ext == 'm3u8': if 'package:hls' in tags or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', m3u8_id='hls')) format_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls'))
elif 'package:dash' in tags or ext == 'mpd': elif 'package:dash' in tags or ext == 'mpd':
formats.extend(self._extract_mpd_formats( formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id='dash')) format_url, video_id, mpd_id='dash'))
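Reviewer note: the HotStar change rewrites playback hosts by inserting `web` before the digit that follows `//staragvod`. A minimal sketch of that substitution on its own; the helper name `to_web_host` and the sample hostname in the usage note are assumptions, while the regex and replacement are taken from the diff.

```python
import re


def to_web_host(format_url):
    """Insert 'web' before the digit that directly follows '//staragvod'."""
    # The lookbehind keeps '//staragvod' out of the match, so only the
    # digit is replaced (by 'web' plus the same digit).
    return re.sub(r'(?<=//staragvod)(\d)', r'web\1', format_url)
```

For a hypothetical URL such as `https://staragvod1-vh.akamaihd.net/i/x.m3u8` this yields `https://staragvodweb1-vh.akamaihd.net/i/x.m3u8`; URLs on other hosts pass through unchanged.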


@ -22,7 +22,7 @@ from ..utils import (
class InstagramIE(InfoExtractor): class InstagramIE(InfoExtractor):
_VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+))' _VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/(?:p|tv)/(?P<id>[^/?#&]+))'
_TESTS = [{ _TESTS = [{
'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc', 'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc',
'md5': '0d2da106a9d2631273e192b372806516', 'md5': '0d2da106a9d2631273e192b372806516',
@ -92,6 +92,9 @@ class InstagramIE(InfoExtractor):
}, { }, {
'url': 'http://instagram.com/p/9o6LshA7zy/embed/', 'url': 'http://instagram.com/p/9o6LshA7zy/embed/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.instagram.com/tv/aye83DjauH/',
'only_matching': True,
}] }]
@staticmethod @staticmethod


@ -7,7 +7,7 @@ from .common import InfoExtractor
class JWPlatformIE(InfoExtractor): class JWPlatformIE(InfoExtractor):
_VALID_URL = r'(?:https?://(?:content\.jwplatform|cdn\.jwplayer)\.com/(?:(?:feed|player|thumb|preview|video)s|jw6|v2/media)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})' _VALID_URL = r'(?:https?://(?:content\.jwplatform|cdn\.jwplayer)\.com/(?:(?:feed|player|thumb|preview)s|jw6|v2/media)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'
_TESTS = [{ _TESTS = [{
'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js', 'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js',
'md5': 'fa8899fa601eb7c83a64e9d568bdf325', 'md5': 'fa8899fa601eb7c83a64e9d568bdf325',


@ -103,6 +103,11 @@ class KalturaIE(InfoExtractor):
{ {
'url': 'https://www.kaltura.com:443/index.php/extwidget/preview/partner_id/1770401/uiconf_id/37307382/entry_id/0_58u8kme7/embed/iframe?&flashvars[streamerType]=auto', 'url': 'https://www.kaltura.com:443/index.php/extwidget/preview/partner_id/1770401/uiconf_id/37307382/entry_id/0_58u8kme7/embed/iframe?&flashvars[streamerType]=auto',
'only_matching': True, 'only_matching': True,
},
{
# unavailable source format
'url': 'kaltura:513551:1_66x4rg7o',
'only_matching': True,
} }
] ]
@ -306,12 +311,17 @@ class KalturaIE(InfoExtractor):
f['fileExt'] = 'mp4' f['fileExt'] = 'mp4'
video_url = sign_url( video_url = sign_url(
'%s/flavorId/%s' % (data_url, f['id'])) '%s/flavorId/%s' % (data_url, f['id']))
format_id = '%(fileExt)s-%(bitrate)s' % f
# Source format may not be available (e.g. kaltura:513551:1_66x4rg7o)
if f.get('isOriginal') is True and not self._is_valid_url(
video_url, entry_id, format_id):
continue
# audio-only has no videoCodecId (e.g. kaltura:1926081:0_c03e1b5g # audio-only has no videoCodecId (e.g. kaltura:1926081:0_c03e1b5g
# -f mp4-56) # -f mp4-56)
vcodec = 'none' if 'videoCodecId' not in f and f.get( vcodec = 'none' if 'videoCodecId' not in f and f.get(
'frameRate') == 0 else f.get('videoCodecId') 'frameRate') == 0 else f.get('videoCodecId')
formats.append({ formats.append({
'format_id': '%(fileExt)s-%(bitrate)s' % f, 'format_id': format_id,
'ext': f.get('fileExt'), 'ext': f.get('fileExt'),
'tbr': int_or_none(f['bitrate']), 'tbr': int_or_none(f['bitrate']),
'fps': int_or_none(f.get('frameRate')), 'fps': int_or_none(f.get('frameRate')),


@ -6,8 +6,8 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str from ..compat import compat_str
from ..utils import ( from ..utils import (
clean_html,
determine_ext, determine_ext,
extract_attributes,
ExtractorError, ExtractorError,
float_or_none, float_or_none,
int_or_none, int_or_none,
@ -19,6 +19,7 @@ from ..utils import (
class LecturioBaseIE(InfoExtractor): class LecturioBaseIE(InfoExtractor):
_API_BASE_URL = 'https://app.lecturio.com/api/en/latest/html5/'
_LOGIN_URL = 'https://app.lecturio.com/en/login' _LOGIN_URL = 'https://app.lecturio.com/en/login'
_NETRC_MACHINE = 'lecturio' _NETRC_MACHINE = 'lecturio'
@ -67,51 +68,56 @@ class LecturioIE(LecturioBaseIE):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https:// https://
(?: (?:
app\.lecturio\.com/[^/]+/(?P<id>[^/?#&]+)\.lecture| app\.lecturio\.com/([^/]+/(?P<nt>[^/?#&]+)\.lecture|(?:\#/)?lecture/c/\d+/(?P<id>\d+))|
(?:www\.)?lecturio\.de/[^/]+/(?P<id_de>[^/?#&]+)\.vortrag (?:www\.)?lecturio\.de/[^/]+/(?P<nt_de>[^/?#&]+)\.vortrag
) )
''' '''
_TESTS = [{ _TESTS = [{
'url': 'https://app.lecturio.com/medical-courses/important-concepts-and-terms-introduction-to-microbiology.lecture#tab/videos', 'url': 'https://app.lecturio.com/medical-courses/important-concepts-and-terms-introduction-to-microbiology.lecture#tab/videos',
'md5': 'f576a797a5b7a5e4e4bbdfc25a6a6870', 'md5': '9a42cf1d8282a6311bf7211bbde26fde',
'info_dict': { 'info_dict': {
'id': '39634', 'id': '39634',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Important Concepts and Terms Introduction to Microbiology', 'title': 'Important Concepts and Terms Introduction to Microbiology',
}, },
'skip': 'Requires lecturio account credentials', 'skip': 'Requires lecturio account credentials',
}, { }, {
'url': 'https://www.lecturio.de/jura/oeffentliches-recht-staatsexamen.vortrag', 'url': 'https://www.lecturio.de/jura/oeffentliches-recht-staatsexamen.vortrag',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://app.lecturio.com/#/lecture/c/6434/39634',
'only_matching': True,
}] }]
_CC_LANGS = { _CC_LANGS = {
'Arabic': 'ar',
'Bulgarian': 'bg',
'German': 'de', 'German': 'de',
'English': 'en', 'English': 'en',
'Spanish': 'es', 'Spanish': 'es',
'Persian': 'fa',
'French': 'fr', 'French': 'fr',
'Japanese': 'ja',
'Polish': 'pl', 'Polish': 'pl',
'Pashto': 'ps',
'Russian': 'ru', 'Russian': 'ru',
} }
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id') or mobj.group('id_de') nt = mobj.group('nt') or mobj.group('nt_de')
lecture_id = mobj.group('id')
webpage = self._download_webpage( display_id = nt or lecture_id
'https://app.lecturio.com/en/lecture/%s/player.html' % display_id, api_path = 'lectures/' + lecture_id if lecture_id else 'lecture/' + nt + '.json'
display_id) video = self._download_json(
self._API_BASE_URL + api_path, display_id)
lecture_id = self._search_regex(
r'lecture_id\s*=\s*(?:L_)?(\d+)', webpage, 'lecture id')
api_url = self._search_regex(
r'lectureDataLink\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'api url', group='url')
video = self._download_json(api_url, display_id)
title = video['title'].strip() title = video['title'].strip()
if not lecture_id:
pid = video.get('productId') or video.get('uid')
if pid:
spid = pid.split('_')
if spid and len(spid) == 2:
lecture_id = spid[1]
formats = [] formats = []
for format_ in video['content']['media']: for format_ in video['content']['media']:
@ -129,24 +135,30 @@ class LecturioIE(LecturioBaseIE):
continue continue
label = str_or_none(format_.get('label')) label = str_or_none(format_.get('label'))
filesize = int_or_none(format_.get('fileSize')) filesize = int_or_none(format_.get('fileSize'))
formats.append({ f = {
'url': file_url, 'url': file_url,
'format_id': label, 'format_id': label,
'filesize': float_or_none(filesize, invscale=1000) 'filesize': float_or_none(filesize, invscale=1000)
}) }
if label:
mobj = re.match(r'(\d+)p\s*\(([^)]+)\)', label)
if mobj:
f.update({
'format_id': mobj.group(2),
'height': int(mobj.group(1)),
})
formats.append(f)
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {} subtitles = {}
automatic_captions = {} automatic_captions = {}
cc = self._parse_json( captions = video.get('captions') or []
self._search_regex( for cc in captions:
r'subtitleUrls\s*:\s*({.+?})\s*,', webpage, 'subtitles', cc_url = cc.get('url')
default='{}'), display_id, fatal=False)
for cc_label, cc_url in cc.items():
cc_url = url_or_none(cc_url)
if not cc_url: if not cc_url:
continue continue
lang = self._search_regex( cc_label = cc.get('translatedCode')
lang = cc.get('languageCode') or self._search_regex(
r'/([a-z]{2})_', cc_url, 'lang', r'/([a-z]{2})_', cc_url, 'lang',
default=cc_label.split()[0] if cc_label else 'en') default=cc_label.split()[0] if cc_label else 'en')
original_lang = self._search_regex( original_lang = self._search_regex(
@ -160,7 +172,7 @@ class LecturioIE(LecturioBaseIE):
}) })
return { return {
'id': lecture_id, 'id': lecture_id or nt,
'title': title, 'title': title,
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
@ -169,37 +181,40 @@ class LecturioIE(LecturioBaseIE):
class LecturioCourseIE(LecturioBaseIE): class LecturioCourseIE(LecturioBaseIE):
_VALID_URL = r'https://app\.lecturio\.com/[^/]+/(?P<id>[^/?#&]+)\.course' _VALID_URL = r'https://app\.lecturio\.com/(?:[^/]+/(?P<nt>[^/?#&]+)\.course|(?:#/)?course/c/(?P<id>\d+))'
_TEST = { _TESTS = [{
'url': 'https://app.lecturio.com/medical-courses/microbiology-introduction.course#/', 'url': 'https://app.lecturio.com/medical-courses/microbiology-introduction.course#/',
'info_dict': { 'info_dict': {
'id': 'microbiology-introduction', 'id': 'microbiology-introduction',
'title': 'Microbiology: Introduction', 'title': 'Microbiology: Introduction',
'description': 'md5:13da8500c25880c6016ae1e6d78c386a',
}, },
'playlist_count': 45, 'playlist_count': 45,
'skip': 'Requires lecturio account credentials', 'skip': 'Requires lecturio account credentials',
} }, {
'url': 'https://app.lecturio.com/#/course/c/6434',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) nt, course_id = re.match(self._VALID_URL, url).groups()
display_id = nt or course_id
webpage = self._download_webpage(url, display_id) api_path = 'courses/' + course_id if course_id else 'course/content/' + nt + '.json'
course = self._download_json(
self._API_BASE_URL + api_path, display_id)
entries = [] entries = []
for mobj in re.finditer( for lecture in course.get('lectures', []):
r'(?s)<[^>]+\bdata-url=(["\'])(?:(?!\1).)+\.lecture\b[^>]+>', lecture_id = str_or_none(lecture.get('id'))
webpage): lecture_url = lecture.get('url')
params = extract_attributes(mobj.group(0)) if lecture_url:
lecture_url = urljoin(url, params.get('data-url')) lecture_url = urljoin(url, lecture_url)
lecture_id = params.get('data-id') else:
lecture_url = 'https://app.lecturio.com/#/lecture/c/%s/%s' % (course_id, lecture_id)
entries.append(self.url_result( entries.append(self.url_result(
lecture_url, ie=LecturioIE.ie_key(), video_id=lecture_id)) lecture_url, ie=LecturioIE.ie_key(), video_id=lecture_id))
return self.playlist_result(
title = self._search_regex( entries, display_id, course.get('title'),
r'<span[^>]+class=["\']content-title[^>]+>([^<]+)', webpage, clean_html(course.get('description')))
'title', default=None)
return self.playlist_result(entries, display_id, title)
class LecturioDeCourseIE(LecturioBaseIE): class LecturioDeCourseIE(LecturioBaseIE):
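Review note on the Lecturio format labels: the new code splits a label into a height and a shorter format id. The sketch below isolates that step so it can be checked on its own. `parse_label` is a hypothetical helper, and the sample labels used with it are assumed, not taken from the Lecturio API.

```python
import re


def parse_label(label):
    """Split a label such as '720p (HD)' into a height and a short format id."""
    fmt = {'format_id': label}
    if label:
        # Same pattern as in the diff: digits, 'p', then a name in parentheses.
        mobj = re.match(r'(\d+)p\s*\(([^)]+)\)', label)
        if mobj:
            fmt.update({
                'format_id': mobj.group(2),
                'height': int(mobj.group(1)),
            })
    return fmt
```

Labels that do not follow the pattern keep the raw label as `format_id` and get no `height`.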


@ -326,7 +326,7 @@ class LetvCloudIE(InfoExtractor):
elif play_json.get('code'): elif play_json.get('code'):
raise ExtractorError('Letv cloud returned error %d' % play_json['code'], expected=True) raise ExtractorError('Letv cloud returned error %d' % play_json['code'], expected=True)
else: else:
raise ExtractorError('Letv cloud returned an unknwon error') raise ExtractorError('Letv cloud returned an unknown error')
def b64decode(s): def b64decode(s):
return compat_b64decode(s).decode('utf-8') return compat_b64decode(s).decode('utf-8')


@ -0,0 +1,42 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import int_or_none
class LiveJournalIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^.]+\.)?livejournal\.com/video/album/\d+.+?\bid=(?P<id>\d+)'
_TEST = {
'url': 'https://andrei-bt.livejournal.com/video/album/407/?mode=view&id=51272',
'md5': 'adaf018388572ced8a6f301ace49d4b2',
'info_dict': {
'id': '1263729',
'ext': 'mp4',
'title': 'Истребители против БПЛА',
'upload_date': '20190624',
'timestamp': 1561406715,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
record = self._parse_json(self._search_regex(
r'Site\.page\s*=\s*({.+?});', webpage,
'page data'), video_id)['video']['record']
storage_id = compat_str(record['storageid'])
title = record.get('name')
if title:
# strip the filename extension (.mp4, .mov, etc.)
title = title.rsplit('.', 1)[0]
return {
'_type': 'url_transparent',
'id': video_id,
'title': title,
'thumbnail': record.get('thumbnail'),
'timestamp': int_or_none(record.get('timecreate')),
'url': 'eagleplatform:vc.videos.livejournal.com:' + storage_id,
'ie_key': 'EaglePlatform',
}
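Review note on the LiveJournal title cleanup: the extractor drops everything after the last dot in the record name. A minimal sketch of that behaviour follows. `strip_extension` is a hypothetical helper and the file names are made up.

```python
def strip_extension(name):
    """Drop the text after the last dot, as the extractor does for titles."""
    if not name:
        return name
    return name.rsplit('.', 1)[0]
```

A title that contains a dot but no real extension is shortened as well, so `'v1.2 demo'` becomes `'v1'`. That trade-off is probably acceptable here, since the record name appears to be an uploaded file name.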


@ -117,6 +117,10 @@ class LyndaIE(LyndaBaseIE):
}, { }, {
'url': 'https://www.lynda.com/de/Graphic-Design-tutorials/Willkommen-Grundlagen-guten-Gestaltung/393570/393572-4.html', 'url': 'https://www.lynda.com/de/Graphic-Design-tutorials/Willkommen-Grundlagen-guten-Gestaltung/393570/393572-4.html',
'only_matching': True, 'only_matching': True,
}, {
# Status="NotFound", Message="Transcript not found"
'url': 'https://www.lynda.com/ASP-NET-tutorials/What-you-should-know/5034180/2811512-4.html',
'only_matching': True,
}] }]
def _raise_unavailable(self, video_id): def _raise_unavailable(self, video_id):
@ -247,12 +251,17 @@ class LyndaIE(LyndaBaseIE):
def _get_subtitles(self, video_id): def _get_subtitles(self, video_id):
url = 'https://www.lynda.com/ajax/player?videoId=%s&type=transcript' % video_id url = 'https://www.lynda.com/ajax/player?videoId=%s&type=transcript' % video_id
subs = self._download_json(url, None, False) subs = self._download_webpage(
url, video_id, 'Downloading subtitles JSON', fatal=False)
if not subs or 'Status="NotFound"' in subs:
return {}
subs = self._parse_json(subs, video_id, fatal=False)
if not subs:
return {}
fixed_subs = self._fix_subtitles(subs) fixed_subs = self._fix_subtitles(subs)
if fixed_subs: if fixed_subs:
return {'en': [{'ext': 'srt', 'data': fixed_subs}]} return {'en': [{'ext': 'srt', 'data': fixed_subs}]}
else: return {}
return {}
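Review note on the Lynda transcript change: the response is now fetched as text, checked for the `Status="NotFound"` marker, and only then parsed as JSON. The sketch below shows the same guard order using only the standard library. `parse_transcript` is a hypothetical helper, and the JSON shape in the usage is assumed.

```python
import json


def parse_transcript(body):
    """Return the decoded transcript, or None when there is nothing usable."""
    # A missing transcript comes back as a non-JSON status string.
    if not body or 'Status="NotFound"' in body:
        return None
    try:
        return json.loads(body)
    except ValueError:
        # Any other non-JSON body is treated as "no subtitles", not an error.
        return None
```

Checking the marker before parsing avoids a JSON decode warning for a response that is expected and harmless.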
class LyndaCourseIE(LyndaBaseIE): class LyndaCourseIE(LyndaBaseIE):


@ -79,6 +79,10 @@ class MGTVIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'tbr': tbr, 'tbr': tbr,
'protocol': 'm3u8_native', 'protocol': 'm3u8_native',
'http_headers': {
'Referer': url,
},
'format_note': stream.get('name'),
}) })
self._sort_formats(formats) self._sort_formats(formats)

View File

@ -164,7 +164,7 @@ class MixcloudIE(InfoExtractor):
def decrypt_url(f_url): def decrypt_url(f_url):
for k in (key, 'IFYOUWANTTHEARTISTSTOGETPAIDDONOTDOWNLOADFROMMIXCLOUD'): for k in (key, 'IFYOUWANTTHEARTISTSTOGETPAIDDONOTDOWNLOADFROMMIXCLOUD'):
decrypted_url = self._decrypt_xor_cipher(k, f_url) decrypted_url = self._decrypt_xor_cipher(k, f_url)
if re.search(r'^https?://[0-9a-z.]+/[0-9A-Za-z/.?=&_-]+$', decrypted_url): if re.search(r'^https?://[0-9A-Za-z.]+/[0-9A-Za-z/.?=&_-]+$', decrypted_url):
return decrypted_url return decrypted_url
for url_key in ('url', 'hlsUrl', 'dashUrl'): for url_key in ('url', 'hlsUrl', 'dashUrl'):

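Review note on the Mixcloud check: `decrypt_url` accepts a decrypted string only if it looks like a URL, and the change widens the host part to allow uppercase letters. The sketch compares both patterns from the diff. `looks_like_url` is a hypothetical helper and the host name in the usage is invented.

```python
import re

# Patterns copied from the old and new side of the diff.
OLD_URL_RE = r'^https?://[0-9a-z.]+/[0-9A-Za-z/.?=&_-]+$'
NEW_URL_RE = r'^https?://[0-9A-Za-z.]+/[0-9A-Za-z/.?=&_-]+$'


def looks_like_url(candidate, pattern=NEW_URL_RE):
    """True when the decrypted candidate matches the expected URL shape."""
    return re.search(pattern, candidate) is not None
```

With a mixed-case host such as `https://Stream.example.com/c/m4a/64/abc.m4a`, the old pattern rejects a correctly decrypted URL while the new one accepts it.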

@ -85,7 +85,8 @@ class NickBrIE(MTVServicesInfoExtractor):
https?:// https?://
(?: (?:
(?P<domain>(?:www\.)?nickjr|mundonick\.uol)\.com\.br| (?P<domain>(?:www\.)?nickjr|mundonick\.uol)\.com\.br|
(?:www\.)?nickjr\.[a-z]{2} (?:www\.)?nickjr\.[a-z]{2}|
(?:www\.)?nickelodeonjunior\.fr
) )
/(?:programas/)?[^/]+/videos/(?:episodios/)?(?P<id>[^/?\#.]+) /(?:programas/)?[^/]+/videos/(?:episodios/)?(?P<id>[^/?\#.]+)
''' '''
@ -101,6 +102,9 @@ class NickBrIE(MTVServicesInfoExtractor):
}, { }, {
'url': 'http://www.nickjr.de/blaze-und-die-monster-maschinen/videos/f6caaf8f-e4e8-4cc1-b489-9380d6dcd059/', 'url': 'http://www.nickjr.de/blaze-und-die-monster-maschinen/videos/f6caaf8f-e4e8-4cc1-b489-9380d6dcd059/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.nickelodeonjunior.fr/paw-patrol-la-pat-patrouille/videos/episode-401-entier-paw-patrol/',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):


@ -45,7 +45,11 @@ class NineNowIE(InfoExtractor):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
page_data = self._parse_json(self._search_regex( page_data = self._parse_json(self._search_regex(
r'window\.__data\s*=\s*({.*?});', webpage, r'window\.__data\s*=\s*({.*?});', webpage,
'page data'), display_id) 'page data', default='{}'), display_id, fatal=False)
if not page_data:
page_data = self._parse_json(self._parse_json(self._search_regex(
r'window\.__data\s*=\s*JSON\.parse\s*\(\s*(".+?")\s*\)\s*;',
webpage, 'page data'), display_id), display_id)
for kind in ('episode', 'clip'): for kind in ('episode', 'clip'):
current_key = page_data.get(kind, {}).get( current_key = page_data.get(kind, {}).get(

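Review note on the NineNow fallback: when `window.__data` is assigned via `JSON.parse("...")`, the captured group is a JSON string literal whose content is itself JSON, so it has to be decoded twice. The sketch reproduces both branches with the standard library. `extract_page_data` is a hypothetical helper, and the page snippets in the usage are made up.

```python
import json
import re


def extract_page_data(webpage):
    """Read window.__data from either a plain object or a JSON.parse("...") call."""
    mobj = re.search(r'window\.__data\s*=\s*({.*?});', webpage)
    if mobj:
        try:
            data = json.loads(mobj.group(1))
        except ValueError:
            data = None
        if data:
            return data
    mobj = re.search(
        r'window\.__data\s*=\s*JSON\.parse\s*\(\s*(".+?")\s*\)\s*;', webpage)
    if mobj:
        # First decode yields the inner JSON text, second decode yields the object.
        return json.loads(json.loads(mobj.group(1)))
    return None
```

Usage, building the wrapped form with `json.dumps` so the escaping is correct:

```python
inner = json.dumps({'episode': {'id': 1}})
page = 'window.__data = JSON.parse(%s);' % json.dumps(inner)
data = extract_page_data(page)
```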
File diff suppressed because it is too large


@ -5,26 +5,27 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_str, # compat_str,
compat_HTTPError, compat_HTTPError,
) )
from ..utils import ( from ..utils import (
clean_html, clean_html,
ExtractorError, ExtractorError,
remove_end, # remove_end,
str_or_none,
strip_or_none, strip_or_none,
unified_timestamp, unified_timestamp,
urljoin, # urljoin,
) )
class PacktPubBaseIE(InfoExtractor): class PacktPubBaseIE(InfoExtractor):
_PACKT_BASE = 'https://www.packtpub.com' # _PACKT_BASE = 'https://www.packtpub.com'
_MAPT_REST = '%s/mapt-rest' % _PACKT_BASE _STATIC_PRODUCTS_BASE = 'https://static.packt-cdn.com/products/'
class PacktPubIE(PacktPubBaseIE): class PacktPubIE(PacktPubBaseIE):
_VALID_URL = r'https?://(?:(?:www\.)?packtpub\.com/mapt|subscription\.packtpub\.com)/video/[^/]+/(?P<course_id>\d+)/(?P<chapter_id>\d+)/(?P<id>\d+)' _VALID_URL = r'https?://(?:(?:www\.)?packtpub\.com/mapt|subscription\.packtpub\.com)/video/[^/]+/(?P<course_id>\d+)/(?P<chapter_id>[^/]+)/(?P<id>[^/]+)(?:/(?P<display_id>[^/?&#]+))?'
_TESTS = [{ _TESTS = [{
'url': 'https://www.packtpub.com/mapt/video/web-development/9781787122215/20528/20530/Project+Intro', 'url': 'https://www.packtpub.com/mapt/video/web-development/9781787122215/20528/20530/Project+Intro',
@ -40,6 +41,9 @@ class PacktPubIE(PacktPubBaseIE):
}, { }, {
'url': 'https://subscription.packtpub.com/video/web_development/9781787122215/20528/20530/project-intro', 'url': 'https://subscription.packtpub.com/video/web_development/9781787122215/20528/20530/project-intro',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://subscription.packtpub.com/video/programming/9781838988906/p1/video1_1/business-card-project',
'only_matching': True,
}] }]
_NETRC_MACHINE = 'packtpub' _NETRC_MACHINE = 'packtpub'
_TOKEN = None _TOKEN = None
@ -50,9 +54,9 @@ class PacktPubIE(PacktPubBaseIE):
return return
try: try:
self._TOKEN = self._download_json( self._TOKEN = self._download_json(
self._MAPT_REST + '/users/tokens', None, 'https://services.packtpub.com/auth-v1/users/tokens', None,
'Downloading Authorization Token', data=json.dumps({ 'Downloading Authorization Token', data=json.dumps({
'email': username, 'username': username,
'password': password, 'password': password,
}).encode())['data']['access'] }).encode())['data']['access']
except ExtractorError as e: except ExtractorError as e:
@ -61,54 +65,40 @@ class PacktPubIE(PacktPubBaseIE):
raise ExtractorError(message, expected=True) raise ExtractorError(message, expected=True)
raise raise
def _handle_error(self, response):
if response.get('status') != 'success':
raise ExtractorError(
'% said: %s' % (self.IE_NAME, response['message']),
expected=True)
def _download_json(self, *args, **kwargs):
response = super(PacktPubIE, self)._download_json(*args, **kwargs)
self._handle_error(response)
return response
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) course_id, chapter_id, video_id, display_id = re.match(self._VALID_URL, url).groups()
course_id, chapter_id, video_id = mobj.group(
'course_id', 'chapter_id', 'id')
headers = {} headers = {}
if self._TOKEN: if self._TOKEN:
headers['Authorization'] = 'Bearer ' + self._TOKEN headers['Authorization'] = 'Bearer ' + self._TOKEN
video = self._download_json( try:
'%s/users/me/products/%s/chapters/%s/sections/%s' video_url = self._download_json(
% (self._MAPT_REST, course_id, chapter_id, video_id), video_id, 'https://services.packtpub.com/products-v1/products/%s/%s/%s' % (course_id, chapter_id, video_id), video_id,
'Downloading JSON video', headers=headers)['data'] 'Downloading JSON video', headers=headers)['data']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
self.raise_login_required('This video is locked')
raise
content = video.get('content') # TODO: find a better way to avoid duplicating course requests
if not content: # metadata = self._download_json(
self.raise_login_required('This video is locked') # '%s/products/%s/chapters/%s/sections/%s/metadata'
# % (self._MAPT_REST, course_id, chapter_id, video_id),
# video_id)['data']
video_url = content['file'] # title = metadata['pageTitle']
# course_title = metadata.get('title')
metadata = self._download_json( # if course_title:
'%s/products/%s/chapters/%s/sections/%s/metadata' # title = remove_end(title, ' - %s' % course_title)
% (self._MAPT_REST, course_id, chapter_id, video_id), # timestamp = unified_timestamp(metadata.get('publicationDate'))
video_id)['data'] # thumbnail = urljoin(self._PACKT_BASE, metadata.get('filepath'))
title = metadata['pageTitle']
course_title = metadata.get('title')
if course_title:
title = remove_end(title, ' - %s' % course_title)
timestamp = unified_timestamp(metadata.get('publicationDate'))
thumbnail = urljoin(self._PACKT_BASE, metadata.get('filepath'))
return { return {
'id': video_id, 'id': video_id,
'url': video_url, 'url': video_url,
'title': title, 'title': display_id or video_id, # title,
'thumbnail': thumbnail, # 'thumbnail': thumbnail,
'timestamp': timestamp, # 'timestamp': timestamp,
} }
@ -119,6 +109,7 @@ class PacktPubCourseIE(PacktPubBaseIE):
'info_dict': { 'info_dict': {
'id': '9781787122215', 'id': '9781787122215',
'title': 'Learn Nodejs by building 12 projects [Video]', 'title': 'Learn Nodejs by building 12 projects [Video]',
'description': 'md5:489da8d953f416e51927b60a1c7db0aa',
}, },
'playlist_count': 90, 'playlist_count': 90,
}, { }, {
@ -136,35 +127,38 @@ class PacktPubCourseIE(PacktPubBaseIE):
url, course_id = mobj.group('url', 'id') url, course_id = mobj.group('url', 'id')
course = self._download_json( course = self._download_json(
'%s/products/%s/metadata' % (self._MAPT_REST, course_id), self._STATIC_PRODUCTS_BASE + '%s/toc' % course_id, course_id)
course_id)['data'] metadata = self._download_json(
self._STATIC_PRODUCTS_BASE + '%s/summary' % course_id,
course_id, fatal=False) or {}
entries = [] entries = []
for chapter_num, chapter in enumerate(course['tableOfContents'], 1): for chapter_num, chapter in enumerate(course['chapters'], 1):
if chapter.get('type') != 'chapter': chapter_id = str_or_none(chapter.get('id'))
continue sections = chapter.get('sections')
children = chapter.get('children') if not chapter_id or not isinstance(sections, list):
if not isinstance(children, list):
continue continue
chapter_info = { chapter_info = {
'chapter': chapter.get('title'), 'chapter': chapter.get('title'),
'chapter_number': chapter_num, 'chapter_number': chapter_num,
'chapter_id': chapter.get('id'), 'chapter_id': chapter_id,
} }
for section in children: for section in sections:
if section.get('type') != 'section': section_id = str_or_none(section.get('id'))
continue if not section_id or section.get('contentType') != 'video':
section_url = section.get('seoUrl')
if not isinstance(section_url, compat_str):
continue continue
entry = { entry = {
'_type': 'url_transparent', '_type': 'url_transparent',
'url': urljoin(url + '/', section_url), 'url': '/'.join([url, chapter_id, section_id]),
'title': strip_or_none(section.get('title')), 'title': strip_or_none(section.get('title')),
'description': clean_html(section.get('summary')), 'description': clean_html(section.get('summary')),
'thumbnail': metadata.get('coverImage'),
'timestamp': unified_timestamp(metadata.get('publicationDate')),
'ie_key': PacktPubIE.ie_key(), 'ie_key': PacktPubIE.ie_key(),
} }
entry.update(chapter_info) entry.update(chapter_info)
entries.append(entry) entries.append(entry)
return self.playlist_result(entries, course_id, course.get('title')) return self.playlist_result(
entries, course_id, metadata.get('title'),
clean_html(metadata.get('about')))
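Review note on the PacktPub course change: entries are now built from the `chapters` and `sections` lists of the table of contents, keeping only sections whose `contentType` is `video`. The sketch below mirrors that walk without the extractor helpers. `build_entries` is a hypothetical function, and the sample data in the usage only assumes the keys that appear in the diff.

```python
def build_entries(course_url, toc):
    """Turn a table of contents into playlist entries for video sections."""
    entries = []
    for chapter_num, chapter in enumerate(toc.get('chapters') or [], 1):
        chapter_id = chapter.get('id')
        sections = chapter.get('sections')
        if not chapter_id or not isinstance(sections, list):
            continue
        chapter_id = str(chapter_id)
        for section in sections:
            section_id = section.get('id')
            if not section_id or section.get('contentType') != 'video':
                continue
            title = section.get('title')
            entries.append({
                # Section URLs are the course URL plus chapter and section ids.
                'url': '/'.join([course_url, chapter_id, str(section_id)]),
                'title': title.strip() if title else None,
                'chapter': chapter.get('title'),
                'chapter_number': chapter_num,
                'chapter_id': chapter_id,
            })
    return entries
```

Usage with assumed data:

```python
toc = {'chapters': [{'id': 'p1', 'title': 'Intro', 'sections': [
    {'id': 'video1_1', 'contentType': 'video', 'title': ' Business Card '},
    {'id': 'text1_2', 'contentType': 'text'},
]}]}
entries = build_entries(
    'https://subscription.packtpub.com/video/programming/9781838988906', toc)
```

As in the diff, a skipped chapter still advances `chapter_number`, because the counter comes from `enumerate`.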


@ -168,7 +168,7 @@ class PeerTubeIE(InfoExtractor):
@staticmethod @staticmethod
def _extract_peertube_url(webpage, source_url): def _extract_peertube_url(webpage, source_url):
mobj = re.match( mobj = re.match(
r'https?://(?P<host>[^/]+)/videos/watch/(?P<id>%s)' r'https?://(?P<host>[^/]+)/videos/(?:watch|embed)/(?P<id>%s)'
% PeerTubeIE._UUID_RE, source_url) % PeerTubeIE._UUID_RE, source_url)
if mobj and any(p in webpage for p in ( if mobj and any(p in webpage for p in (
'<title>PeerTube<', '<title>PeerTube<',


@ -14,7 +14,7 @@ class PhilharmonieDeParisIE(InfoExtractor):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?: (?:
live\.philharmoniedeparis\.fr/(?:[Cc]oncert/|misc/Playlist\.ashx\?id=)| live\.philharmoniedeparis\.fr/(?:[Cc]oncert/|embed(?:app)?/|misc/Playlist\.ashx\?id=)|
pad\.philharmoniedeparis\.fr/doc/CIMU/ pad\.philharmoniedeparis\.fr/doc/CIMU/
) )
(?P<id>\d+) (?P<id>\d+)
@ -40,6 +40,12 @@ class PhilharmonieDeParisIE(InfoExtractor):
}, { }, {
'url': 'http://live.philharmoniedeparis.fr/misc/Playlist.ashx?id=1030324&track=&lang=fr', 'url': 'http://live.philharmoniedeparis.fr/misc/Playlist.ashx?id=1030324&track=&lang=fr',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://live.philharmoniedeparis.fr/embedapp/1098406/berlioz-fantastique-lelio-les-siecles-national-youth-choir-of.html?lang=fr-FR',
'only_matching': True,
}, {
'url': 'https://live.philharmoniedeparis.fr/embed/1098406/berlioz-fantastique-lelio-les-siecles-national-youth-choir-of.html?lang=fr-FR',
'only_matching': True,
}] }]
_LIVE_URL = 'https://live.philharmoniedeparis.fr' _LIVE_URL = 'https://live.philharmoniedeparis.fr'


@ -18,15 +18,14 @@ class PikselIE(InfoExtractor):
_VALID_URL = r'https?://player\.piksel\.com/v/(?P<id>[a-z0-9]+)' _VALID_URL = r'https?://player\.piksel\.com/v/(?P<id>[a-z0-9]+)'
_TESTS = [ _TESTS = [
{ {
'url': 'http://player.piksel.com/v/nv60p12f', 'url': 'http://player.piksel.com/v/ums2867l',
'md5': 'd9c17bbe9c3386344f9cfd32fad8d235', 'md5': '34e34c8d89dc2559976a6079db531e85',
'info_dict': { 'info_dict': {
'id': 'nv60p12f', 'id': 'ums2867l',
'ext': 'mp4', 'ext': 'mp4',
'title': 'فن الحياة - الحلقة 1', 'title': 'GX-005 with Caption',
'description': 'احدث برامج الداعية الاسلامي " مصطفي حسني " فى رمضان 2016علي النهار نور', 'timestamp': 1481335659,
'timestamp': 1465231790, 'upload_date': '20161210'
'upload_date': '20160606',
} }
}, },
{ {
@ -39,7 +38,7 @@ class PikselIE(InfoExtractor):
'title': 'WAW- State of Washington vs. Donald J. Trump, et al', 'title': 'WAW- State of Washington vs. Donald J. Trump, et al',
'description': 'State of Washington vs. Donald J. Trump, et al, Case Number 17-CV-00141-JLR, TRO Hearing, Civil Rights Case, 02/3/2017, 1:00 PM (PST), Seattle Federal Courthouse, Seattle, WA, Judge James L. Robart presiding.', 'description': 'State of Washington vs. Donald J. Trump, et al, Case Number 17-CV-00141-JLR, TRO Hearing, Civil Rights Case, 02/3/2017, 1:00 PM (PST), Seattle Federal Courthouse, Seattle, WA, Judge James L. Robart presiding.',
'timestamp': 1486171129, 'timestamp': 1486171129,
'upload_date': '20170204', 'upload_date': '20170204'
} }
} }
] ]
@ -113,6 +112,13 @@ class PikselIE(InfoExtractor):
}) })
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {}
for caption in video_data.get('captions', []):
caption_url = caption.get('url')
if caption_url:
subtitles.setdefault(caption.get('locale', 'en'), []).append({
'url': caption_url})
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
@ -120,4 +126,5 @@ class PikselIE(InfoExtractor):
'thumbnail': video_data.get('thumbnailUrl'), 'thumbnail': video_data.get('thumbnailUrl'),
'timestamp': parse_iso8601(video_data.get('dateadd')), 'timestamp': parse_iso8601(video_data.get('dateadd')),
'formats': formats, 'formats': formats,
'subtitles': subtitles,
} }
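Review note on the Piksel captions: the new block groups caption URLs by locale with `dict.setdefault`, falling back to `en` when no locale is given. A standalone sketch of the same grouping follows. `build_subtitles` is a hypothetical helper and the URLs in the usage are placeholders.

```python
def build_subtitles(captions):
    """Group caption URLs by locale; captions without a URL are skipped."""
    subtitles = {}
    for caption in captions or []:
        caption_url = caption.get('url')
        if caption_url:
            # setdefault creates the per-locale list on first use.
            subtitles.setdefault(caption.get('locale', 'en'), []).append({
                'url': caption_url})
    return subtitles
```

Usage with assumed data:

```python
subs = build_subtitles([
    {'url': 'https://example.com/a.vtt', 'locale': 'en'},
    {'url': 'https://example.com/b.vtt'},
    {'locale': 'fr'},
])
```

Here both URLs end up under `en` and the entry without a URL is dropped.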


@ -18,43 +18,10 @@ from ..utils import (
) )
class PlatziIE(InfoExtractor): class PlatziBaseIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:
platzi\.com/clases| # es version
courses\.platzi\.com/classes # en version
)/[^/]+/(?P<id>\d+)-[^/?\#&]+
'''
_LOGIN_URL = 'https://platzi.com/login/' _LOGIN_URL = 'https://platzi.com/login/'
_NETRC_MACHINE = 'platzi' _NETRC_MACHINE = 'platzi'
_TESTS = [{
'url': 'https://platzi.com/clases/1311-next-js/12074-creando-nuestra-primera-pagina/',
'md5': '8f56448241005b561c10f11a595b37e3',
'info_dict': {
'id': '12074',
'ext': 'mp4',
'title': 'Creando nuestra primera página',
'description': 'md5:4c866e45034fc76412fbf6e60ae008bc',
'duration': 420,
},
'skip': 'Requires platzi account credentials',
}, {
'url': 'https://courses.platzi.com/classes/1367-communication-codestream/13430-background/',
'info_dict': {
'id': '13430',
'ext': 'mp4',
'title': 'Background',
'description': 'md5:49c83c09404b15e6e71defaf87f6b305',
'duration': 360,
},
'skip': 'Requires platzi account credentials',
'params': {
'skip_download': True,
},
}]
def _real_initialize(self): def _real_initialize(self):
self._login() self._login()
@ -97,6 +64,42 @@ class PlatziIE(InfoExtractor):
'Unable to login: %s' % error, expected=True) 'Unable to login: %s' % error, expected=True)
raise ExtractorError('Unable to log in') raise ExtractorError('Unable to log in')
class PlatziIE(PlatziBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:
platzi\.com/clases| # es version
courses\.platzi\.com/classes # en version
)/[^/]+/(?P<id>\d+)-[^/?\#&]+
'''
_TESTS = [{
'url': 'https://platzi.com/clases/1311-next-js/12074-creando-nuestra-primera-pagina/',
'md5': '8f56448241005b561c10f11a595b37e3',
'info_dict': {
'id': '12074',
'ext': 'mp4',
'title': 'Creando nuestra primera página',
'description': 'md5:4c866e45034fc76412fbf6e60ae008bc',
'duration': 420,
},
'skip': 'Requires platzi account credentials',
}, {
'url': 'https://courses.platzi.com/classes/1367-communication-codestream/13430-background/',
'info_dict': {
'id': '13430',
'ext': 'mp4',
'title': 'Background',
'description': 'md5:49c83c09404b15e6e71defaf87f6b305',
'duration': 360,
},
'skip': 'Requires platzi account credentials',
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url): def _real_extract(self, url):
lecture_id = self._match_id(url) lecture_id = self._match_id(url)
@ -104,7 +107,11 @@ class PlatziIE(InfoExtractor):
data = self._parse_json( data = self._parse_json(
self._search_regex( self._search_regex(
r'client_data\s*=\s*({.+?})\s*;', webpage, 'client data'), # client_data may contain "};" so that we have to try more
# strict regex first
(r'client_data\s*=\s*({.+?})\s*;\s*\n',
r'client_data\s*=\s*({.+?})\s*;'),
webpage, 'client data'),
lecture_id) lecture_id)
material = data['initialState']['material'] material = data['initialState']['material']
@ -146,7 +153,7 @@ class PlatziIE(InfoExtractor):
} }
class PlatziCourseIE(InfoExtractor): class PlatziCourseIE(PlatziBaseIE):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?: (?:

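Review note on the Platzi `client_data` regex: a non-greedy match that stops at the first `};` truncates the object when a string value contains `};`, which is why the stricter pattern (requiring a newline after the semicolon) is tried first. The sketch demonstrates this with plain `re`. `search_first` is a hypothetical stand-in for passing a tuple of patterns to `_search_regex`, and the page snippet is made up.

```python
import re

# Patterns copied from the diff: strict first, then the original loose one.
PATTERNS = (
    r'client_data\s*=\s*({.+?})\s*;\s*\n',
    r'client_data\s*=\s*({.+?})\s*;',
)


def search_first(patterns, text):
    """Return group 1 of the first pattern that matches, or None."""
    for pattern in patterns:
        mobj = re.search(pattern, text)
        if mobj:
            return mobj.group(1)
    return None
```

Usage with an assumed page fragment:

```python
page = 'client_data = {"a": "x};y"};\nvar other = 1;'
strict = search_first(PATTERNS, page)      # full object
loose = search_first(PATTERNS[1:], page)   # cut off inside the string value
```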

@ -39,7 +39,12 @@ class Porn91IE(InfoExtractor):
r'<div id="viewvideo-title">([^<]+)</div>', webpage, 'title') r'<div id="viewvideo-title">([^<]+)</div>', webpage, 'title')
title = title.replace('\n', '') title = title.replace('\n', '')
info_dict = self._parse_html5_media_entries(url, webpage, video_id)[0] video_link_url = self._search_regex(
r'<textarea[^>]+id=["\']fm-video_link[^>]+>([^<]+)</textarea>',
webpage, 'video link')
videopage = self._download_webpage(video_link_url, video_id)
info_dict = self._parse_html5_media_entries(url, videopage, video_id)[0]
duration = parse_duration(self._search_regex( duration = parse_duration(self._search_regex(
r'时长:\s*</span>\s*(\d+:\d+)', webpage, 'duration', fatal=False)) r'时长:\s*</span>\s*(\d+:\d+)', webpage, 'duration', fatal=False))


@ -4,32 +4,34 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_str,
)
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
strip_or_none, str_or_none,
unescapeHTML,
urlencode_postdata, urlencode_postdata,
) )
class RoosterTeethIE(InfoExtractor): class RoosterTeethIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?roosterteeth\.com/episode/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:.+?\.)?roosterteeth\.com/(?:episode|watch)/(?P<id>[^/?#&]+)'
_LOGIN_URL = 'https://roosterteeth.com/login' _LOGIN_URL = 'https://roosterteeth.com/login'
_NETRC_MACHINE = 'roosterteeth' _NETRC_MACHINE = 'roosterteeth'
_TESTS = [{ _TESTS = [{
'url': 'http://roosterteeth.com/episode/million-dollars-but-season-2-million-dollars-but-the-game-announcement', 'url': 'http://roosterteeth.com/episode/million-dollars-but-season-2-million-dollars-but-the-game-announcement',
'md5': 'e2bd7764732d785ef797700a2489f212', 'md5': 'e2bd7764732d785ef797700a2489f212',
'info_dict': { 'info_dict': {
'id': '26576', 'id': '9156',
'display_id': 'million-dollars-but-season-2-million-dollars-but-the-game-announcement', 'display_id': 'million-dollars-but-season-2-million-dollars-but-the-game-announcement',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Million Dollars, But...: Million Dollars, But... The Game Announcement', 'title': 'Million Dollars, But... The Game Announcement',
'description': 'md5:0cc3b21986d54ed815f5faeccd9a9ca5', 'description': 'md5:168a54b40e228e79f4ddb141e89fe4f5',
'thumbnail': r're:^https?://.*\.png$', 'thumbnail': r're:^https?://.*\.png$',
'series': 'Million Dollars, But...', 'series': 'Million Dollars, But...',
'episode': 'Million Dollars, But... The Game Announcement', 'episode': 'Million Dollars, But... The Game Announcement',
'comment_count': int,
}, },
}, { }, {
'url': 'http://achievementhunter.roosterteeth.com/episode/off-topic-the-achievement-hunter-podcast-2016-i-didn-t-think-it-would-pass-31', 'url': 'http://achievementhunter.roosterteeth.com/episode/off-topic-the-achievement-hunter-podcast-2016-i-didn-t-think-it-would-pass-31',
@ -47,6 +49,9 @@ class RoosterTeethIE(InfoExtractor):
# only available for FIRST members # only available for FIRST members
'url': 'http://roosterteeth.com/episode/rt-docs-the-world-s-greatest-head-massage-the-world-s-greatest-head-massage-an-asmr-journey-part-one', 'url': 'http://roosterteeth.com/episode/rt-docs-the-world-s-greatest-head-massage-the-world-s-greatest-head-massage-an-asmr-journey-part-one',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://roosterteeth.com/watch/million-dollars-but-season-2-million-dollars-but-the-game-announcement',
'only_matching': True,
}] }]
def _login(self): def _login(self):
@ -89,60 +94,55 @@ class RoosterTeethIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
api_episode_url = 'https://svod-be.roosterteeth.com/api/v1/episodes/%s' % display_id
webpage = self._download_webpage(url, display_id) try:
m3u8_url = self._download_json(
episode = strip_or_none(unescapeHTML(self._search_regex( api_episode_url + '/videos', display_id,
(r'videoTitle\s*=\s*(["\'])(?P<title>(?:(?!\1).)+)\1', 'Downloading video JSON metadata')['data'][0]['attributes']['url']
r'<title>(?P<title>[^<]+)</title>'), webpage, 'title', except ExtractorError as e:
default=None, group='title'))) if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
if self._parse_json(e.cause.read().decode(), display_id).get('access') is False:
title = strip_or_none(self._og_search_title( self.raise_login_required(
webpage, default=None)) or episode '%s is only available for FIRST members' % display_id)
raise
m3u8_url = self._search_regex(
r'file\s*:\s*(["\'])(?P<url>http.+?\.m3u8.*?)\1',
webpage, 'm3u8 url', default=None, group='url')
if not m3u8_url:
if re.search(r'<div[^>]+class=["\']non-sponsor', webpage):
self.raise_login_required(
'%s is only available for FIRST members' % display_id)
if re.search(r'<div[^>]+class=["\']golive-gate', webpage):
self.raise_login_required('%s is not available yet' % display_id)
raise ExtractorError('Unable to extract m3u8 URL')
formats = self._extract_m3u8_formats( formats = self._extract_m3u8_formats(
m3u8_url, display_id, ext='mp4', m3u8_url, display_id, 'mp4', 'm3u8_native', m3u8_id='hls')
entry_protocol='m3u8_native', m3u8_id='hls')
self._sort_formats(formats) self._sort_formats(formats)
description = strip_or_none(self._og_search_description(webpage)) episode = self._download_json(
thumbnail = self._proto_relative_url(self._og_search_thumbnail(webpage)) api_episode_url, display_id,
'Downloading episode JSON metadata')['data'][0]
attributes = episode['attributes']
title = attributes.get('title') or attributes['display_title']
video_id = compat_str(episode['id'])
series = self._search_regex( thumbnails = []
(r'<h2>More ([^<]+)</h2>', r'<a[^>]+>See All ([^<]+) Videos<'), for image in episode.get('included', {}).get('images', []):
webpage, 'series', fatal=False) if image.get('type') == 'episode_image':
img_attributes = image.get('attributes') or {}
comment_count = int_or_none(self._search_regex( for k in ('thumb', 'small', 'medium', 'large'):
r'>Comments \((\d+)\)<', webpage, img_url = img_attributes.get(k)
'comment count', fatal=False)) if img_url:
thumbnails.append({
video_id = self._search_regex( 'id': k,
(r'containerId\s*=\s*["\']episode-(\d+)\1', 'url': img_url,
r'<div[^<]+id=["\']episode-(\d+)'), webpage, })
'video id', default=display_id)
return { return {
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
'title': title, 'title': title,
'description': description, 'description': attributes.get('description') or attributes.get('caption'),
'thumbnail': thumbnail, 'thumbnails': thumbnails,
'series': series, 'series': attributes.get('show_title'),
'episode': episode, 'season_number': int_or_none(attributes.get('season_number')),
'comment_count': comment_count, 'season_id': attributes.get('season_id'),
'episode': title,
'episode_number': int_or_none(attributes.get('number')),
'episode_id': str_or_none(episode.get('uuid')),
'formats': formats, 'formats': formats,
'channel_id': attributes.get('channel_id'),
'duration': int_or_none(attributes.get('length')),
} }

@ -32,7 +32,7 @@ class RtlNlIE(InfoExtractor):
             'duration': 1167.96,
         },
     }, {
-        # best format avaialble a3t
+        # best format available a3t
         'url': 'http://www.rtl.nl/system/videoplayer/derden/rtlnieuws/video_embed.html#uuid=84ae5571-ac25-4225-ae0c-ef8d9efb2aed/autoplay=false',
         'md5': 'dea7474214af1271d91ef332fb8be7ea',
         'info_dict': {

@ -1,53 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
js_to_json,
get_element_by_class,
unified_strdate,
)
class RudoIE(InfoExtractor):
_VALID_URL = r'https?://rudo\.video/vod/(?P<id>[0-9a-zA-Z]+)'
_TEST = {
'url': 'http://rudo.video/vod/oTzw0MGnyG',
'md5': '2a03a5b32dd90a04c83b6d391cf7b415',
'info_dict': {
'id': 'oTzw0MGnyG',
'ext': 'mp4',
'title': 'Comentario Tomás Mosciatti',
'upload_date': '20160617',
},
}
@classmethod
def _extract_url(cls, webpage):
mobj = re.search(
r'<iframe[^>]+src=(?P<q1>[\'"])(?P<url>(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)',
webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id, encoding='iso-8859-1')
jwplayer_data = self._parse_json(self._search_regex(
r'(?s)playerInstance\.setup\(({.+?})\)', webpage, 'jwplayer data'), video_id,
transform_source=lambda s: js_to_json(re.sub(r'encodeURI\([^)]+\)', '""', s)))
info_dict = self._parse_jwplayer_data(
jwplayer_data, video_id, require_title=False, m3u8_id='hls', mpd_id='dash')
info_dict.update({
'title': self._og_search_title(webpage),
'upload_date': unified_strdate(get_element_by_class('date', webpage)),
})
return info_dict

@ -68,9 +68,10 @@ class SafariBaseIE(InfoExtractor):
             raise ExtractorError(
                 'Unable to login: %s' % credentials, expected=True)
-        # oreilly serves two same groot_sessionid cookies in Set-Cookie header
-        # and expects first one to be actually set
-        self._apply_first_set_cookie_header(urlh, 'groot_sessionid')
+        # oreilly serves two same instances of the following cookies
+        # in Set-Cookie header and expects first one to be actually set
+        for cookie in ('groot_sessionid', 'orm-jwt', 'orm-rt'):
+            self._apply_first_set_cookie_header(urlh, cookie)
         _, urlh = self._download_webpage_handle(
             auth.get('redirect_uri') or next_uri, None, 'Completing login',)
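
The cookie handling in this hunk keeps the first of several same-named `Set-Cookie` values. A minimal standalone sketch of that idea follows, assuming the raw header values are already available as a list of strings. `first_cookie_values` and the sample headers are hypothetical and are not youtube-dl's `_apply_first_set_cookie_header`.

```python
def first_cookie_values(set_cookie_headers, names):
    """Map each wanted cookie name to the value from its FIRST Set-Cookie header."""
    wanted = set(names)
    found = {}
    for header in set_cookie_headers:
        # The name=value pair is everything before the first ';'.
        pair = header.split(';', 1)[0]
        name, sep, value = pair.partition('=')
        name = name.strip()
        # Later duplicates are ignored because the name is already in `found`.
        if sep and name in wanted and name not in found:
            found[name] = value.strip()
    return found


# Hypothetical header values: the server sends groot_sessionid twice.
sample_headers = [
    'groot_sessionid=first; Path=/; HttpOnly',
    'groot_sessionid=second; Path=/; HttpOnly',
    'orm-jwt=abc; Path=/',
]
cookies = first_cookie_values(
    sample_headers, ('groot_sessionid', 'orm-jwt', 'orm-rt'))
```

Cookies that the server never sends (here `orm-rt`) are simply absent from the result, which mirrors looping over a fixed tuple of names without requiring all of them.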

@ -197,7 +197,7 @@ class SoundcloudIE(InfoExtractor):
                 'skip_download': True,
             },
         },
-        # not avaialble via api.soundcloud.com/i1/tracks/id/streams
+        # not available via api.soundcloud.com/i1/tracks/id/streams
         {
             'url': 'https://soundcloud.com/giovannisarani/mezzo-valzer',
             'md5': 'e22aecd2bc88e0e4e432d7dcc0a1abf7',
@ -221,7 +221,7 @@ class SoundcloudIE(InfoExtractor):
         }
     ]
-    _CLIENT_ID = 'FweeGBOOEOYJWLJN3oEyToGLKhmSz0I7'
+    _CLIENT_ID = 'BeGVhOrGmfboy1LtiHTQF6Ejpt9ULJCI'
     @staticmethod
     def _extract_urls(webpage):

@ -5,6 +5,7 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
merge_dicts,
orderedSet, orderedSet,
parse_duration, parse_duration,
parse_resolution, parse_resolution,
@ -26,6 +27,8 @@ class SpankBangIE(InfoExtractor):
'description': 'dillion harper masturbates on a bed', 'description': 'dillion harper masturbates on a bed',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'silly2587', 'uploader': 'silly2587',
'timestamp': 1422571989,
'upload_date': '20150129',
'age_limit': 18, 'age_limit': 18,
} }
}, { }, {
@ -106,31 +109,36 @@ class SpankBangIE(InfoExtractor):
for format_id, format_url in stream.items(): for format_id, format_url in stream.items():
if format_id.startswith(STREAM_URL_PREFIX): if format_id.startswith(STREAM_URL_PREFIX):
if format_url and isinstance(format_url, list):
format_url = format_url[0]
extract_format( extract_format(
format_id[len(STREAM_URL_PREFIX):], format_url) format_id[len(STREAM_URL_PREFIX):], format_url)
self._sort_formats(formats) self._sort_formats(formats)
info = self._search_json_ld(webpage, video_id, default={})
title = self._html_search_regex( title = self._html_search_regex(
r'(?s)<h1[^>]*>(.+?)</h1>', webpage, 'title') r'(?s)<h1[^>]*>(.+?)</h1>', webpage, 'title', default=None)
description = self._search_regex( description = self._search_regex(
r'<div[^>]+\bclass=["\']bottom[^>]+>\s*<p>[^<]*</p>\s*<p>([^<]+)', r'<div[^>]+\bclass=["\']bottom[^>]+>\s*<p>[^<]*</p>\s*<p>([^<]+)',
webpage, 'description', fatal=False) webpage, 'description', default=None)
thumbnail = self._og_search_thumbnail(webpage) thumbnail = self._og_search_thumbnail(webpage, default=None)
uploader = self._search_regex( uploader = self._html_search_regex(
r'class="user"[^>]*><img[^>]+>([^<]+)', (r'(?s)<li[^>]+class=["\']profile[^>]+>(.+?)</a>',
r'class="user"[^>]*><img[^>]+>([^<]+)'),
webpage, 'uploader', default=None) webpage, 'uploader', default=None)
duration = parse_duration(self._search_regex( duration = parse_duration(self._search_regex(
r'<div[^>]+\bclass=["\']right_side[^>]+>\s*<span>([^<]+)', r'<div[^>]+\bclass=["\']right_side[^>]+>\s*<span>([^<]+)',
webpage, 'duration', fatal=False)) webpage, 'duration', default=None))
view_count = str_to_int(self._search_regex( view_count = str_to_int(self._search_regex(
r'([\d,.]+)\s+plays', webpage, 'view count', fatal=False)) r'([\d,.]+)\s+plays', webpage, 'view count', default=None))
age_limit = self._rta_search(webpage) age_limit = self._rta_search(webpage)
return { return merge_dicts({
'id': video_id, 'id': video_id,
'title': title, 'title': title or video_id,
'description': description, 'description': description,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'uploader': uploader, 'uploader': uploader,
@ -138,7 +146,8 @@ class SpankBangIE(InfoExtractor):
'view_count': view_count, 'view_count': view_count,
'formats': formats, 'formats': formats,
'age_limit': age_limit, 'age_limit': age_limit,
} }, info
)
class SpankBangPlaylistIE(InfoExtractor): class SpankBangPlaylistIE(InfoExtractor):

@ -22,7 +22,7 @@ class BellatorIE(MTVServicesInfoExtractor):
         'only_matching': True,
     }]
-    _FEED_URL = 'http://www.spike.com/feeds/mrss/'
+    _FEED_URL = 'http://www.bellator.com/feeds/mrss/'
     _GEO_COUNTRIES = ['US']

@ -133,7 +133,7 @@ class TEDIE(InfoExtractor):
     def _extract_info(self, webpage):
         info_json = self._search_regex(
-            r'(?s)q\(\s*"\w+.init"\s*,\s*({.+})\)\s*</script>',
+            r'(?s)q\(\s*"\w+.init"\s*,\s*({.+?})\)\s*</script>',
             webpage, 'info json')
         return json.loads(info_json)

@ -72,8 +72,13 @@ class TV4IE(InfoExtractor):
video_id = self._match_id(url) video_id = self._match_id(url)
info = self._download_json( info = self._download_json(
'http://www.tv4play.se/player/assets/%s.json' % video_id, 'https://playback-api.b17g.net/asset/%s' % video_id,
video_id, 'Downloading video info JSON') video_id, 'Downloading video info JSON', query={
'service': 'tv4',
'device': 'browser',
'protocol': 'hls,dash',
'drm': 'widevine',
})['metadata']
title = info['title'] title = info['title']
@ -111,5 +116,9 @@ class TV4IE(InfoExtractor):
'timestamp': parse_iso8601(info.get('broadcast_date_time')), 'timestamp': parse_iso8601(info.get('broadcast_date_time')),
'duration': int_or_none(info.get('duration')), 'duration': int_or_none(info.get('duration')),
'thumbnail': info.get('image'), 'thumbnail': info.get('image'),
'is_live': info.get('is_live') is True, 'is_live': info.get('isLive') is True,
'series': info.get('seriesTitle'),
'season_number': int_or_none(info.get('seasonNumber')),
'episode': info.get('episodeTitle'),
'episode_number': int_or_none(info.get('episodeNumber')),
} }

View File

@ -9,6 +9,8 @@ from ..utils import (
float_or_none, float_or_none,
int_or_none, int_or_none,
parse_age_limit, parse_age_limit,
try_get,
url_or_none,
) )
@ -23,11 +25,10 @@ class TvigleIE(InfoExtractor):
_TESTS = [ _TESTS = [
{ {
'url': 'http://www.tvigle.ru/video/sokrat/', 'url': 'http://www.tvigle.ru/video/sokrat/',
'md5': '36514aed3657d4f70b4b2cef8eb520cd',
'info_dict': { 'info_dict': {
'id': '1848932', 'id': '1848932',
'display_id': 'sokrat', 'display_id': 'sokrat',
'ext': 'flv', 'ext': 'mp4',
'title': 'Сократ', 'title': 'Сократ',
'description': 'md5:d6b92ffb7217b4b8ebad2e7665253c17', 'description': 'md5:d6b92ffb7217b4b8ebad2e7665253c17',
'duration': 6586, 'duration': 6586,
@ -37,7 +38,6 @@ class TvigleIE(InfoExtractor):
}, },
{ {
'url': 'http://www.tvigle.ru/video/vladimir-vysotskii/vedushchii-teleprogrammy-60-minut-ssha-o-vladimire-vysotskom/', 'url': 'http://www.tvigle.ru/video/vladimir-vysotskii/vedushchii-teleprogrammy-60-minut-ssha-o-vladimire-vysotskom/',
'md5': 'e7efe5350dd5011d0de6550b53c3ba7b',
'info_dict': { 'info_dict': {
'id': '5142516', 'id': '5142516',
'ext': 'flv', 'ext': 'flv',
@ -62,7 +62,7 @@ class TvigleIE(InfoExtractor):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
video_id = self._html_search_regex( video_id = self._html_search_regex(
(r'<div[^>]+class=["\']player["\'][^>]+id=["\'](\d+)', (r'<div[^>]+class=["\']player["\'][^>]+id=["\'](\d+)',
r'var\s+cloudId\s*=\s*["\'](\d+)', r'cloudId\s*=\s*["\'](\d+)',
r'class="video-preview current_playing" id="(\d+)"'), r'class="video-preview current_playing" id="(\d+)"'),
webpage, 'video id') webpage, 'video id')
@ -90,21 +90,40 @@ class TvigleIE(InfoExtractor):
age_limit = parse_age_limit(item.get('ageRestrictions')) age_limit = parse_age_limit(item.get('ageRestrictions'))
formats = [] formats = []
for vcodec, fmts in item['videos'].items(): for vcodec, url_or_fmts in item['videos'].items():
if vcodec == 'hls': if vcodec == 'hls':
continue m3u8_url = url_or_none(url_or_fmts)
for format_id, video_url in fmts.items(): if not m3u8_url:
if format_id == 'm3u8':
continue continue
height = self._search_regex( formats.extend(self._extract_m3u8_formats(
r'^(\d+)[pP]$', format_id, 'height', default=None) m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native',
formats.append({ m3u8_id='hls', fatal=False))
'url': video_url, elif vcodec == 'dash':
'format_id': '%s-%s' % (vcodec, format_id), mpd_url = url_or_none(url_or_fmts)
'vcodec': vcodec, if not mpd_url:
'height': int_or_none(height), continue
'filesize': int_or_none(item.get('video_files_size', {}).get(vcodec, {}).get(format_id)), formats.extend(self._extract_mpd_formats(
}) mpd_url, video_id, mpd_id='dash', fatal=False))
else:
if not isinstance(url_or_fmts, dict):
continue
for format_id, video_url in url_or_fmts.items():
if format_id == 'm3u8':
continue
video_url = url_or_none(video_url)
if not video_url:
continue
height = self._search_regex(
r'^(\d+)[pP]$', format_id, 'height', default=None)
filesize = int_or_none(try_get(
item, lambda x: x['video_files_size'][vcodec][format_id]))
formats.append({
'url': video_url,
'format_id': '%s-%s' % (vcodec, format_id),
'vcodec': vcodec,
'height': int_or_none(height),
'filesize': filesize,
})
self._sort_formats(formats) self._sort_formats(formats)
return { return {
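
The format loop in this hunk dispatches on the shape of each `videos` entry: `hls` and `dash` carry a single manifest URL, while other codecs carry a dict of per-quality URLs. The sketch below shows that dispatch in isolation, assuming plain Python 3 types. `classify_sources` and the sample data are hypothetical, and the manifest parsing that the extractor delegates to `_extract_m3u8_formats` and `_extract_mpd_formats` is deliberately left out.

```python
import re


def classify_sources(videos):
    """Split a {codec: url-or-dict} mapping into manifest URLs and direct formats."""
    manifests = {}
    formats = []
    for vcodec, url_or_fmts in videos.items():
        if vcodec in ('hls', 'dash'):
            # Manifest entries are expected to be a single URL string.
            if isinstance(url_or_fmts, str) and re.match(r'https?://', url_or_fmts):
                manifests[vcodec] = url_or_fmts
            continue
        if not isinstance(url_or_fmts, dict):
            # Skip anything that is not a {format_id: url} mapping.
            continue
        for format_id, video_url in url_or_fmts.items():
            if format_id == 'm3u8' or not isinstance(video_url, str):
                continue
            height = re.match(r'^(\d+)[pP]$', format_id)
            formats.append({
                'url': video_url,
                'format_id': '%s-%s' % (vcodec, format_id),
                'vcodec': vcodec,
                'height': int(height.group(1)) if height else None,
            })
    return manifests, formats


# Hypothetical API payload covering each branch.
sample = {
    'hls': 'https://example.com/master.m3u8',
    'dash': None,
    'mp4': {
        '720p': 'https://example.com/720.mp4',
        'm3u8': 'https://example.com/x.m3u8',
    },
    'webm': 'unexpected string',
}
manifests, formats = classify_sources(sample)
```

Type checks before iterating are what keep a single malformed entry from aborting the whole extraction, which appears to be the motivation for the `isinstance` guard added in the diff.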

@ -1,32 +1,35 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .mtv import MTVServicesInfoExtractor from .spike import ParamountNetworkIE
class TVLandIE(MTVServicesInfoExtractor): class TVLandIE(ParamountNetworkIE):
IE_NAME = 'tvland.com' IE_NAME = 'tvland.com'
_VALID_URL = r'https?://(?:www\.)?tvland\.com/(?:video-clips|(?:full-)?episodes)/(?P<id>[^/?#.]+)' _VALID_URL = r'https?://(?:www\.)?tvland\.com/(?:video-clips|(?:full-)?episodes)/(?P<id>[^/?#.]+)'
_FEED_URL = 'http://www.tvland.com/feeds/mrss/' _FEED_URL = 'http://www.tvland.com/feeds/mrss/'
_TESTS = [{ _TESTS = [{
# Geo-restricted. Without a proxy metadata are still there. With a # Geo-restricted. Without a proxy metadata are still there. With a
# proxy it redirects to http://m.tvland.com/app/ # proxy it redirects to http://m.tvland.com/app/
'url': 'http://www.tvland.com/episodes/hqhps2/everybody-loves-raymond-the-invasion-ep-048', 'url': 'https://www.tvland.com/episodes/s04pzf/everybody-loves-raymond-the-dog-season-1-ep-19',
'info_dict': { 'info_dict': {
'description': 'md5:80973e81b916a324e05c14a3fb506d29', 'description': 'md5:84928e7a8ad6649371fbf5da5e1ad75a',
'title': 'The Invasion', 'title': 'The Dog',
}, },
'playlist': [], 'playlist_mincount': 5,
}, { }, {
'url': 'http://www.tvland.com/video-clips/zea2ev/younger-younger--hilary-duff---little-lies', 'url': 'https://www.tvland.com/video-clips/4n87f2/younger-a-first-look-at-younger-season-6',
'md5': 'e2c6389401cf485df26c79c247b08713', 'md5': 'e2c6389401cf485df26c79c247b08713',
'info_dict': { 'info_dict': {
'id': 'b8697515-4bbe-4e01-83d5-fa705ce5fa88', 'id': '891f7d3c-5b5b-4753-b879-b7ba1a601757',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Younger|December 28, 2015|2|NO-EPISODE#|Younger: Hilary Duff - Little Lies', 'title': 'Younger|April 30, 2019|6|NO-EPISODE#|A First Look at Younger Season 6',
'description': 'md5:7d192f56ca8d958645c83f0de8ef0269', 'description': 'md5:595ea74578d3a888ae878dfd1c7d4ab2',
'upload_date': '20151228', 'upload_date': '20190430',
'timestamp': 1451289600, 'timestamp': 1556658000,
},
'params': {
'skip_download': True,
}, },
}, { }, {
'url': 'http://www.tvland.com/full-episodes/iu0hz6/younger-a-kiss-is-just-a-kiss-season-3-ep-301', 'url': 'http://www.tvland.com/full-episodes/iu0hz6/younger-a-kiss-is-just-a-kiss-season-3-ep-301',

@ -4,6 +4,7 @@ from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
NO_DEFAULT,
unescapeHTML, unescapeHTML,
) )
@ -17,9 +18,21 @@ class TVN24IE(InfoExtractor):
'id': '1584444', 'id': '1584444',
'ext': 'mp4', 'ext': 'mp4',
'title': '"Święta mają być wesołe, dlatego, ludziska, wszyscy pod jemiołę"', 'title': '"Święta mają być wesołe, dlatego, ludziska, wszyscy pod jemiołę"',
'description': 'Wyjątkowe orędzie Artura Andrusa, jednego z gości "Szkła kontaktowego".', 'description': 'Wyjątkowe orędzie Artura Andrusa, jednego z gości Szkła kontaktowego.',
'thumbnail': 're:https?://.*[.]jpeg', 'thumbnail': 're:https?://.*[.]jpeg',
} }
}, {
# different layout
'url': 'https://tvnmeteo.tvn24.pl/magazyny/maja-w-ogrodzie,13/odcinki-online,1,4,1,0/pnacza-ptaki-i-iglaki-odc-691-hgtv-odc-29,1771763.html',
'info_dict': {
'id': '1771763',
'ext': 'mp4',
'title': 'Pnącza, ptaki i iglaki (odc. 691 /HGTV odc. 29)',
'thumbnail': 're:https?://.*',
},
'params': {
'skip_download': True,
},
}, { }, {
'url': 'http://fakty.tvn24.pl/ogladaj-online,60/53-konferencja-bezpieczenstwa-w-monachium,716431.html', 'url': 'http://fakty.tvn24.pl/ogladaj-online,60/53-konferencja-bezpieczenstwa-w-monachium,716431.html',
'only_matching': True, 'only_matching': True,
@ -35,18 +48,21 @@ class TVN24IE(InfoExtractor):
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, display_id)
title = self._og_search_title(webpage) title = self._og_search_title(
webpage, default=None) or self._search_regex(
r'<h\d+[^>]+class=["\']magazineItemHeader[^>]+>(.+?)</h',
webpage, 'title')
def extract_json(attr, name, fatal=True): def extract_json(attr, name, default=NO_DEFAULT, fatal=True):
return self._parse_json( return self._parse_json(
self._search_regex( self._search_regex(
r'\b%s=(["\'])(?P<json>(?!\1).+?)\1' % attr, webpage, r'\b%s=(["\'])(?P<json>(?!\1).+?)\1' % attr, webpage,
name, group='json', fatal=fatal) or '{}', name, group='json', default=default, fatal=fatal) or '{}',
video_id, transform_source=unescapeHTML, fatal=fatal) display_id, transform_source=unescapeHTML, fatal=fatal)
quality_data = extract_json('data-quality', 'formats') quality_data = extract_json('data-quality', 'formats')
@ -59,16 +75,24 @@ class TVN24IE(InfoExtractor):
}) })
self._sort_formats(formats) self._sort_formats(formats)
description = self._og_search_description(webpage) description = self._og_search_description(webpage, default=None)
thumbnail = self._og_search_thumbnail( thumbnail = self._og_search_thumbnail(
webpage, default=None) or self._html_search_regex( webpage, default=None) or self._html_search_regex(
r'\bdata-poster=(["\'])(?P<url>(?!\1).+?)\1', webpage, r'\bdata-poster=(["\'])(?P<url>(?!\1).+?)\1', webpage,
'thumbnail', group='url') 'thumbnail', group='url')
video_id = None
share_params = extract_json( share_params = extract_json(
'data-share-params', 'share params', fatal=False) 'data-share-params', 'share params', default=None)
if isinstance(share_params, dict): if isinstance(share_params, dict):
video_id = share_params.get('id') or video_id video_id = share_params.get('id')
if not video_id:
video_id = self._search_regex(
r'data-vid-id=["\'](\d+)', webpage, 'video id',
default=None) or self._search_regex(
r',(\d+)\.html', url, 'video id', default=display_id)
return { return {
'id': video_id, 'id': video_id,
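
The new id lookup in this hunk is a fallback chain: share params first, then a `data-vid-id` attribute, then the number before `.html` in the URL, and finally the display id. A minimal sketch of the same ordering with the standard `re` module follows. `pick_video_id` is a hypothetical helper, not part of the extractor.

```python
import re


def pick_video_id(share_params, webpage, url, display_id):
    """Try progressively weaker sources for the numeric video id."""
    video_id = None
    if isinstance(share_params, dict):
        video_id = share_params.get('id')
    if not video_id:
        # Page attribute first, then the trailing ",<digits>.html" of the URL.
        mobj = (re.search(r'data-vid-id=["\'](\d+)', webpage)
                or re.search(r',(\d+)\.html', url))
        video_id = mobj.group(1) if mobj else display_id
    return video_id
```

Each step only runs when the previous one produced nothing, so a page that still exposes share params behaves exactly as before.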

@ -317,7 +317,7 @@ class TwitchVodIE(TwitchItemBaseIE):
             'Downloading %s access token' % self._ITEM_TYPE)
         formats = self._extract_m3u8_formats(
-            '%s/vod/%s?%s' % (
+            '%s/vod/%s.m3u8?%s' % (
                 self._USHER_BASE, item_id,
                 compat_urllib_parse_urlencode({
                     'allow_source': 'true',

@ -428,11 +428,22 @@ class TwitterIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, # requires ffmpeg 'skip_download': True, # requires ffmpeg
}, },
}, {
'url': 'https://twitter.com/foobar/status/1087791357756956680',
'info_dict': {
'id': '1087791357756956680',
'ext': 'mp4',
'title': 'Twitter - A new is coming. Some of you got an opt-in to try it now. Check out the emoji button, quick keyboard shortcuts, upgraded trends, advanced search, and more. Let us know your thoughts!',
'thumbnail': r're:^https?://.*\.jpg',
'description': 'md5:66d493500c013e3e2d434195746a7f78',
'uploader': 'Twitter',
'uploader_id': 'Twitter',
'duration': 61.567,
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
user_id = mobj.group('user_id')
twid = mobj.group('id') twid = mobj.group('id')
webpage, urlh = self._download_webpage_handle( webpage, urlh = self._download_webpage_handle(
@ -441,8 +452,13 @@ class TwitterIE(InfoExtractor):
if 'twitter.com/account/suspended' in urlh.geturl(): if 'twitter.com/account/suspended' in urlh.geturl():
raise ExtractorError('Account suspended by Twitter.', expected=True) raise ExtractorError('Account suspended by Twitter.', expected=True)
if user_id is None: user_id = None
mobj = re.match(self._VALID_URL, urlh.geturl())
redirect_mobj = re.match(self._VALID_URL, urlh.geturl())
if redirect_mobj:
user_id = redirect_mobj.group('user_id')
if not user_id:
user_id = mobj.group('user_id') user_id = mobj.group('user_id')
username = remove_end(self._og_search_title(webpage), ' on Twitter') username = remove_end(self._og_search_title(webpage), ' on Twitter')
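
The change in this hunk prefers the user id found in the post-redirect URL and falls back to the one in the requested URL. A minimal sketch of that precedence follows. `STATUS_URL_RE` is a simplified stand-in for the extractor's `_VALID_URL` (an assumption), and `resolve_user_id` is a hypothetical helper.

```python
import re

# Simplified stand-in for the extractor's _VALID_URL (an assumption).
STATUS_URL_RE = (
    r'https?://(?:www\.)?twitter\.com/'
    r'(?:(?P<user_id>[^/]+)/status|i/web/status)/(?P<id>\d+)')


def resolve_user_id(requested_url, final_url):
    """Prefer the user id from the post-redirect URL, else the requested one."""
    mobj = re.match(STATUS_URL_RE, requested_url)
    redirect_mobj = re.match(STATUS_URL_RE, final_url)
    user_id = redirect_mobj.group('user_id') if redirect_mobj else None
    if not user_id and mobj:
        user_id = mobj.group('user_id')
    return user_id
```

With the new test URL, a request for `foobar/status/...` that redirects to `Twitter/status/...` would resolve to `Twitter` under this sketch.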

@ -1,11 +1,9 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .adobepass import AdobePassIE from .adobepass import AdobePassIE
from ..utils import ( from ..utils import (
extract_attributes, NO_DEFAULT,
smuggle_url, smuggle_url,
update_url_query, update_url_query,
) )
@ -31,22 +29,22 @@ class USANetworkIE(AdobePassIE):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
player_params = extract_attributes(self._search_regex( def _x(name, default=NO_DEFAULT):
r'(<div[^>]+data-usa-tve-player-container[^>]*>)', webpage, 'player params')) return self._search_regex(
video_id = player_params['data-mpx-guid'] r'data-%s\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % name,
title = player_params['data-episode-title'] webpage, name, default=default, group='value')
account_pid, path = re.search( video_id = _x('mpx-guid')
r'data-src="(?:https?)?//player\.theplatform\.com/p/([^/]+)/.*?/(media/guid/\d+/\d+)', title = _x('episode-title')
webpage).groups() mpx_account_id = _x('mpx-account-id', '2304992029')
query = { query = {
'mbr': 'true', 'mbr': 'true',
} }
if player_params.get('data-is-full-episode') == '1': if _x('is-full-episode', None) == '1':
query['manifest'] = 'm3u' query['manifest'] = 'm3u'
if player_params.get('data-entitlement') == 'auth': if _x('is-entitlement', None) == '1':
adobe_pass = {} adobe_pass = {}
drupal_settings = self._search_regex( drupal_settings = self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);', r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
@ -57,7 +55,7 @@ class USANetworkIE(AdobePassIE):
adobe_pass = drupal_settings.get('adobePass', {}) adobe_pass = drupal_settings.get('adobePass', {})
resource = self._get_mvpd_resource( resource = self._get_mvpd_resource(
adobe_pass.get('adobePassResourceId', 'usa'), adobe_pass.get('adobePassResourceId', 'usa'),
title, video_id, player_params.get('data-episode-rating', 'TV-14')) title, video_id, _x('episode-rating', 'TV-14'))
query['auth'] = self._extract_mvpd_auth( query['auth'] = self._extract_mvpd_auth(
url, video_id, adobe_pass.get('adobePassRequestorId', 'usa'), resource) url, video_id, adobe_pass.get('adobePassRequestorId', 'usa'), resource)
@ -65,11 +63,11 @@ class USANetworkIE(AdobePassIE):
info.update({ info.update({
'_type': 'url_transparent', '_type': 'url_transparent',
'url': smuggle_url(update_url_query( 'url': smuggle_url(update_url_query(
'http://link.theplatform.com/s/%s/%s' % (account_pid, path), 'http://link.theplatform.com/s/HNK2IC/media/guid/%s/%s' % (mpx_account_id, video_id),
query), {'force_smil_url': True}), query), {'force_smil_url': True}),
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'series': player_params.get('data-show-title'), 'series': _x('show-title', None),
'episode': title, 'episode': title,
'ie_key': 'ThePlatform', 'ie_key': 'ThePlatform',
}) })
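
The `_x` helper introduced in this hunk reads `data-*` attributes with a regex whose backreference accepts either quote style. The sketch below reproduces the pattern with the standard `re` module. `data_attr`, `_NO_DEFAULT`, and the sample markup are hypothetical stand-ins for the extractor's `_search_regex` and `NO_DEFAULT`.

```python
import re

_NO_DEFAULT = object()


def data_attr(webpage, name, default=_NO_DEFAULT):
    """Return the value of a data-<name> attribute, accepting ' or " quotes."""
    mobj = re.search(
        r'data-%s\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % re.escape(name),
        webpage)
    if mobj:
        return mobj.group('value')
    if default is _NO_DEFAULT:
        # Mandatory attribute: fail loudly instead of returning None.
        raise ValueError('Unable to find data-%s' % name)
    return default


# Hypothetical markup mixing both quote styles.
page = '<div data-mpx-guid="3450042" data-episode-title=\'Pilot "One"\'></div>'
guid = data_attr(page, 'mpx-guid')
title = data_attr(page, 'episode-title')
rating = data_attr(page, 'episode-rating', 'TV-14')
```

Passing a default makes a lookup optional, while omitting it keeps the lookup mandatory; that split matches how the diff calls `_x('mpx-guid')` versus `_x('show-title', None)`.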

@ -34,6 +34,7 @@ class VevoIE(VevoBaseIE):
(?:https?://(?:www\.)?vevo\.com/watch/(?!playlist|genre)(?:[^/]+/(?:[^/]+/)?)?| (?:https?://(?:www\.)?vevo\.com/watch/(?!playlist|genre)(?:[^/]+/(?:[^/]+/)?)?|
https?://cache\.vevo\.com/m/html/embed\.html\?video=| https?://cache\.vevo\.com/m/html/embed\.html\?video=|
https?://videoplayer\.vevo\.com/embed/embedded\?videoId=| https?://videoplayer\.vevo\.com/embed/embedded\?videoId=|
https?://embed\.vevo\.com/.*?[?&]isrc=|
vevo:) vevo:)
(?P<id>[^&?#]+)''' (?P<id>[^&?#]+)'''
@ -144,6 +145,9 @@ class VevoIE(VevoBaseIE):
# Geo-restricted to Netherlands/Germany # Geo-restricted to Netherlands/Germany
'url': 'http://www.vevo.com/watch/boostee/pop-corn-clip-officiel/FR1A91600909', 'url': 'http://www.vevo.com/watch/boostee/pop-corn-clip-officiel/FR1A91600909',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://embed.vevo.com/?isrc=USH5V1923499&partnerId=4d61b777-8023-4191-9ede-497ed6c24647&partnerAdCode=',
'only_matching': True,
}] }]
_VERSIONS = { _VERSIONS = {
0: 'youtube', # only in AuthenticateVideo videoVersions 0: 'youtube', # only in AuthenticateVideo videoVersions

@ -2,12 +2,14 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import base64 import base64
import functools
import json import json
import re import re
import itertools import itertools
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_kwargs,
compat_HTTPError, compat_HTTPError,
compat_str, compat_str,
compat_urlparse, compat_urlparse,
@ -19,6 +21,7 @@ from ..utils import (
int_or_none, int_or_none,
merge_dicts, merge_dicts,
NO_DEFAULT, NO_DEFAULT,
OnDemandPagedList,
parse_filesize, parse_filesize,
qualities, qualities,
RegexNotFoundError, RegexNotFoundError,
@ -98,6 +101,13 @@ class VimeoBaseInfoExtractor(InfoExtractor):
webpage, 'vuid', group='vuid') webpage, 'vuid', group='vuid')
return xsrft, vuid return xsrft, vuid
def _extract_vimeo_config(self, webpage, video_id, *args, **kwargs):
vimeo_config = self._search_regex(
r'vimeo\.config\s*=\s*(?:({.+?})|_extend\([^,]+,\s+({.+?})\));',
webpage, 'vimeo config', *args, **compat_kwargs(kwargs))
if vimeo_config:
return self._parse_json(vimeo_config, video_id)
def _set_vimeo_cookie(self, name, value): def _set_vimeo_cookie(self, name, value):
self._set_cookie('vimeo.com', name, value) self._set_cookie('vimeo.com', name, value)
@ -253,7 +263,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
\. \.
)? )?
vimeo(?P<pro>pro)?\.com/ vimeo(?P<pro>pro)?\.com/
(?!(?:channels|album)/[^/?#]+/?(?:$|[?#])|[^/]+/review/|ondemand/) (?!(?:channels|album|showcase)/[^/?#]+/?(?:$|[?#])|[^/]+/review/|ondemand/)
(?:.*?/)? (?:.*?/)?
(?: (?:
(?: (?:
@ -580,11 +590,9 @@ class VimeoIE(VimeoBaseInfoExtractor):
# and later we extract those that are Vimeo specific. # and later we extract those that are Vimeo specific.
self.report_extraction(video_id) self.report_extraction(video_id)
vimeo_config = self._search_regex( vimeo_config = self._extract_vimeo_config(webpage, video_id, default=None)
r'vimeo\.config\s*=\s*(?:({.+?})|_extend\([^,]+,\s+({.+?})\));', webpage,
'vimeo config', default=None)
if vimeo_config: if vimeo_config:
seed_status = self._parse_json(vimeo_config, video_id).get('seed_status', {}) seed_status = vimeo_config.get('seed_status', {})
if seed_status.get('state') == 'failed': if seed_status.get('state') == 'failed':
raise ExtractorError( raise ExtractorError(
'%s said: %s' % (self.IE_NAME, seed_status['title']), '%s said: %s' % (self.IE_NAME, seed_status['title']),
@ -905,7 +913,7 @@ class VimeoUserIE(VimeoChannelIE):
class VimeoAlbumIE(VimeoChannelIE): class VimeoAlbumIE(VimeoChannelIE):
IE_NAME = 'vimeo:album' IE_NAME = 'vimeo:album'
_VALID_URL = r'https://vimeo\.com/album/(?P<id>\d+)(?:$|[?#]|/(?!video))' _VALID_URL = r'https://vimeo\.com/(?:album|showcase)/(?P<id>\d+)(?:$|[?#]|/(?!video))'
_TITLE_RE = r'<header id="page_header">\n\s*<h1>(.*?)</h1>' _TITLE_RE = r'<header id="page_header">\n\s*<h1>(.*?)</h1>'
_TESTS = [{ _TESTS = [{
'url': 'https://vimeo.com/album/2632481', 'url': 'https://vimeo.com/album/2632481',
@ -925,21 +933,39 @@ class VimeoAlbumIE(VimeoChannelIE):
'params': { 'params': {
'videopassword': 'youtube-dl', 'videopassword': 'youtube-dl',
} }
}, {
'url': 'https://vimeo.com/album/2632481/sort:plays/format:thumbnail',
'only_matching': True,
}, {
# TODO: respect page number
'url': 'https://vimeo.com/album/2632481/page:2/sort:plays/format:thumbnail',
'only_matching': True,
}] }]
_PAGE_SIZE = 100
def _page_url(self, base_url, pagenum): def _fetch_page(self, album_id, authorization, hashed_pass, page):
return '%s/page:%d/' % (base_url, pagenum) api_page = page + 1
query = {
'fields': 'link',
'page': api_page,
'per_page': self._PAGE_SIZE,
}
if hashed_pass:
query['_hashed_pass'] = hashed_pass
videos = self._download_json(
'https://api.vimeo.com/albums/%s/videos' % album_id,
album_id, 'Downloading page %d' % api_page, query=query, headers={
'Authorization': 'jwt ' + authorization,
})['data']
for video in videos:
link = video.get('link')
if not link:
continue
yield self.url_result(link, VimeoIE.ie_key(), VimeoIE._match_id(link))
def _real_extract(self, url): def _real_extract(self, url):
album_id = self._match_id(url) album_id = self._match_id(url)
return self._extract_videos(album_id, 'https://vimeo.com/album/%s' % album_id) webpage = self._download_webpage(url, album_id)
webpage = self._login_list_password(url, album_id, webpage)
api_config = self._extract_vimeo_config(webpage, album_id)['api']
entries = OnDemandPagedList(functools.partial(
self._fetch_page, album_id, api_config['jwt'],
api_config.get('hashed_pass')), self._PAGE_SIZE)
return self.playlist_result(entries, album_id, self._html_search_regex(
r'<title>\s*(.+?)(?:\s+on Vimeo)?</title>', webpage, 'title', fatal=False))
class VimeoGroupsIE(VimeoAlbumIE): class VimeoGroupsIE(VimeoAlbumIE):
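
The album extractor now binds its fixed arguments with `functools.partial` and lets a paged list request pages on demand. The sketch below illustrates that pattern with a fake in-memory API. `fetch_page`, `iter_entries`, and `FAKE_LINKS` are hypothetical, and `iter_entries` is only a simplified stand-in for `OnDemandPagedList`, whose real behaviour (slicing, caching) is richer.

```python
import functools

PAGE_SIZE = 3
# Stand-in for the remote API: seven fake video links.
FAKE_LINKS = ['https://example.com/video/%d' % n for n in range(7)]
requested_pages = []


def fetch_page(album_id, token, page):
    """Yield the links of one zero-based page; the fake API pages are one-based."""
    api_page = page + 1
    requested_pages.append(api_page)
    start = page * PAGE_SIZE
    for link in FAKE_LINKS[start:start + PAGE_SIZE]:
        yield link


def iter_entries(page_func, page_size):
    """Call page_func(0), page_func(1), ... lazily until a short page is seen."""
    page = 0
    while True:
        entries = list(page_func(page))
        for entry in entries:
            yield entry
        if len(entries) < page_size:
            break
        page += 1


# Bind the per-album arguments once; only the page number varies per call.
entries = iter_entries(
    functools.partial(fetch_page, 'album-1', 'fake-token'), PAGE_SIZE)
first_four = [next(entries) for _ in range(4)]
```

Because pages are fetched only when an entry from them is consumed, taking the first four entries touches two pages rather than all three.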

@ -4,7 +4,10 @@ from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from .once import OnceIE from .once import OnceIE
from ..compat import compat_urllib_parse_unquote from ..compat import compat_urllib_parse_unquote
from ..utils import ExtractorError from ..utils import (
ExtractorError,
int_or_none,
)
class VoxMediaVolumeIE(OnceIE): class VoxMediaVolumeIE(OnceIE):
@ -13,18 +16,43 @@ class VoxMediaVolumeIE(OnceIE):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
video_data = self._parse_json(self._search_regex(
r'Volume\.createVideo\(({.+})\s*,\s*{.*}\s*,\s*\[.*\]\s*,\s*{.*}\);', webpage, 'video data'), video_id) setup = self._parse_json(self._search_regex(
r'setup\s*=\s*({.+});', webpage, 'setup'), video_id)
video_data = setup.get('video') or {}
info = {
'id': video_id,
'title': video_data.get('title_short'),
'description': video_data.get('description_long') or video_data.get('description_short'),
'thumbnail': video_data.get('brightcove_thumbnail')
}
asset = setup.get('asset') or setup.get('params') or {}
formats = []
hls_url = asset.get('hls_url')
if hls_url:
formats.extend(self._extract_m3u8_formats(
hls_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
mp4_url = asset.get('mp4_url')
if mp4_url:
tbr = self._search_regex(r'-(\d+)k\.', mp4_url, 'bitrate', default=None)
format_id = 'http'
if tbr:
format_id += '-' + tbr
formats.append({
'format_id': format_id,
'url': mp4_url,
'tbr': int_or_none(tbr),
})
if formats:
self._sort_formats(formats)
info['formats'] = formats
return info
for provider_video_type in ('ooyala', 'youtube', 'brightcove'): for provider_video_type in ('ooyala', 'youtube', 'brightcove'):
provider_video_id = video_data.get('%s_id' % provider_video_type) provider_video_id = video_data.get('%s_id' % provider_video_type)
if not provider_video_id: if not provider_video_id:
continue continue
info = {
'id': video_id,
'title': video_data.get('title_short'),
'description': video_data.get('description_long') or video_data.get('description_short'),
'thumbnail': video_data.get('brightcove_thumbnail')
}
if provider_video_type == 'brightcove': if provider_video_type == 'brightcove':
info['formats'] = self._extract_once_formats(provider_video_id) info['formats'] = self._extract_once_formats(provider_video_id)
self._sort_formats(info['formats']) self._sort_formats(info['formats'])
@ -39,46 +67,49 @@ class VoxMediaVolumeIE(OnceIE):
class VoxMediaIE(InfoExtractor): class VoxMediaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:(?:theverge|vox|sbnation|eater|polygon|curbed|racked)\.com|recode\.net)/(?:[^/]+/)*(?P<id>[^/?]+)' _VALID_URL = r'https?://(?:www\.)?(?:(?:theverge|vox|sbnation|eater|polygon|curbed|racked|funnyordie)\.com|recode\.net)/(?:[^/]+/)*(?P<id>[^/?]+)'
_TESTS = [{ _TESTS = [{
# Volume embed, Youtube
'url': 'http://www.theverge.com/2014/6/27/5849272/material-world-how-google-discovered-what-software-is-made-of', 'url': 'http://www.theverge.com/2014/6/27/5849272/material-world-how-google-discovered-what-software-is-made-of',
'info_dict': { 'info_dict': {
'id': '11eXZobjrG8DCSTgrNjVinU-YmmdYjhe', 'id': 'j4mLW6x17VM',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Google\'s new material design direction', 'title': 'Material world: how Google discovered what software is made of',
'description': 'md5:2f44f74c4d14a1f800ea73e1c6832ad2', 'description': 'md5:dfc17e7715e3b542d66e33a109861382',
'upload_date': '20190710',
'uploader_id': 'TheVerge',
'uploader': 'The Verge',
}, },
'params': { 'add_ie': ['Youtube'],
# m3u8 download
'skip_download': True,
},
'add_ie': ['Ooyala'],
}, { }, {
# data-ooyala-id # Volume embed, Youtube
'url': 'http://www.theverge.com/2014/10/21/7025853/google-nexus-6-hands-on-photos-video-android-phablet', 'url': 'http://www.theverge.com/2014/10/21/7025853/google-nexus-6-hands-on-photos-video-android-phablet',
'md5': 'd744484ff127884cd2ba09e3fa604e4b', 'md5': '4c8f4a0937752b437c3ebc0ed24802b5',
'info_dict': { 'info_dict': {
'id': 'RkZXU4cTphOCPDMZg5oEounJyoFI0g-B', 'id': 'Gy8Md3Eky38',
'ext': 'mp4', 'ext': 'mp4',
'title': 'The Nexus 6: hands-on with Google\'s phablet', 'title': 'The Nexus 6: hands-on with Google\'s phablet',
'description': 'md5:87a51fe95ff8cea8b5bdb9ac7ae6a6af', 'description': 'md5:d9f0216e5fb932dd2033d6db37ac3f1d',
'uploader_id': 'TheVerge',
'upload_date': '20141021',
'uploader': 'The Verge',
}, },
'add_ie': ['Ooyala'], 'add_ie': ['Youtube'],
'skip': 'Video Not Found', 'skip': 'similar to the previous test',
}, { }, {
# volume embed # Volume embed, Youtube
'url': 'http://www.vox.com/2016/3/31/11336640/mississippi-lgbt-religious-freedom-bill', 'url': 'http://www.vox.com/2016/3/31/11336640/mississippi-lgbt-religious-freedom-bill',
'info_dict': { 'info_dict': {
'id': 'wydzk3dDpmRz7PQoXRsTIX6XTkPjYL0b', 'id': 'YCjDnX-Xzhg',
'ext': 'mp4', 'ext': 'mp4',
'title': 'The new frontier of LGBTQ civil rights, explained', 'title': "Mississippi's laws are so bad that its anti-LGBTQ law isn't needed to allow discrimination",
'description': 'md5:0dc58e94a465cbe91d02950f770eb93f', 'description': 'md5:fc1317922057de31cd74bce91eb1c66c',
'uploader_id': 'voxdotcom',
'upload_date': '20150915',
'uploader': 'Vox',
}, },
'params': { 'add_ie': ['Youtube'],
# m3u8 download 'skip': 'similar to the previous test',
'skip_download': True,
},
'add_ie': ['Ooyala'],
}, { }, {
# youtube embed # youtube embed
'url': 'http://www.vox.com/2016/3/24/11291692/robot-dance', 'url': 'http://www.vox.com/2016/3/24/11291692/robot-dance',
@ -93,6 +124,7 @@ class VoxMediaIE(InfoExtractor):
'uploader': 'Vox', 'uploader': 'Vox',
}, },
'add_ie': ['Youtube'], 'add_ie': ['Youtube'],
'skip': 'Page no longer contains videos',
}, { }, {
# SBN.VideoLinkset.entryGroup multiple ooyala embeds # SBN.VideoLinkset.entryGroup multiple ooyala embeds
'url': 'http://www.sbnation.com/college-football-recruiting/2015/2/3/7970291/national-signing-day-rationalizations-itll-be-ok-itll-be-ok', 'url': 'http://www.sbnation.com/college-football-recruiting/2015/2/3/7970291/national-signing-day-rationalizations-itll-be-ok-itll-be-ok',
@ -118,10 +150,11 @@ class VoxMediaIE(InfoExtractor):
'description': 'md5:e02d56b026d51aa32c010676765a690d', 'description': 'md5:e02d56b026d51aa32c010676765a690d',
}, },
}], }],
'skip': 'Page no longer contains videos',
}, { }, {
# volume embed, Brightcove Once # volume embed, Brightcove Once
'url': 'https://www.recode.net/2014/6/17/11628066/post-post-pc-ceo-the-full-code-conference-video-of-microsofts-satya', 'url': 'https://www.recode.net/2014/6/17/11628066/post-post-pc-ceo-the-full-code-conference-video-of-microsofts-satya',
'md5': '01571a896281f77dc06e084138987ea2', 'md5': '2dbc77b8b0bff1894c2fce16eded637d',
'info_dict': { 'info_dict': {
'id': '1231c973d', 'id': '1231c973d',
'ext': 'mp4', 'ext': 'mp4',
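
The `mp4_url` branch added to `VoxMediaVolumeIE._real_extract` above derives the format ID and bitrate from the file name. A minimal standalone sketch of that step follows. `build_http_format` is a hypothetical helper name, the example URLs are made up, and plain `re.search` and `int` stand in for youtube-dl's `_search_regex` and `int_or_none`.

```python
import re


def build_http_format(mp4_url):
    """Build a format dict for a progressive MP4 URL.

    Hypothetical helper, not part of youtube-dl.
    """
    # The bitrate is expected as "-<digits>k." in the file name, e.g. "-1200k.mp4".
    mobj = re.search(r'-(\d+)k\.', mp4_url)
    tbr = mobj.group(1) if mobj else None
    format_id = 'http'
    if tbr:
        # Only suffix the format ID when a bitrate was found.
        format_id += '-' + tbr
    return {
        'format_id': format_id,
        'url': mp4_url,
        'tbr': int(tbr) if tbr else None,
    }


# Made-up URLs, for illustration only.
print(build_http_format('https://example.com/video-1200k.mp4'))
print(build_http_format('https://example.com/video.mp4'))
```

When the file name carries no bitrate, the format keeps the bare `http` ID and `tbr` stays `None`, so sorting falls back to the other format fields.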


@ -64,7 +64,15 @@ class VRVBaseIE(InfoExtractor):
def _call_cms(self, path, video_id, note): def _call_cms(self, path, video_id, note):
if not self._CMS_SIGNING: if not self._CMS_SIGNING:
self._CMS_SIGNING = self._call_api('index', video_id, 'CMS Signing')['cms_signing'] index = self._call_api('index', video_id, 'CMS Signing')
self._CMS_SIGNING = index.get('cms_signing') or {}
if not self._CMS_SIGNING:
for signing_policy in index.get('signing_policies', []):
signing_path = signing_policy.get('path')
if signing_path and signing_path.startswith('/cms/'):
name, value = signing_policy.get('name'), signing_policy.get('value')
if name and value:
self._CMS_SIGNING[name] = value
return self._download_json( return self._download_json(
self._API_DOMAIN + path, video_id, query=self._CMS_SIGNING, self._API_DOMAIN + path, video_id, query=self._CMS_SIGNING,
note='Downloading %s JSON metadata' % note, headers=self.geo_verification_headers()) note='Downloading %s JSON metadata' % note, headers=self.geo_verification_headers())
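
The `_call_cms` change above falls back to `signing_policies` when the `index` response has no `cms_signing` entry. The sketch below isolates that fallback as a plain function. `collect_cms_signing` is a hypothetical name, and the sample response is invented to exercise each branch; it is not real API output.

```python
def collect_cms_signing(index):
    """Return CMS signing query parameters from an `index` API response.

    Hypothetical standalone helper mirroring the fallback in `_call_cms`.
    """
    signing = index.get('cms_signing') or {}
    if not signing:
        for policy in index.get('signing_policies', []):
            path = policy.get('path')
            # Only policies scoped to the CMS path are relevant.
            if path and path.startswith('/cms/'):
                name, value = policy.get('name'), policy.get('value')
                if name and value:
                    signing[name] = value
    return signing


# Invented response data, for illustration only.
index = {
    'signing_policies': [
        {'path': '/cms/', 'name': 'Policy', 'value': 'abc'},
        {'path': '/other/', 'name': 'Ignored', 'value': 'xyz'},
        {'path': '/cms/', 'name': 'Signature'},  # no value, so it is skipped
    ],
}
print(collect_cms_signing(index))
```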


@ -32,6 +32,10 @@ class VzaarIE(InfoExtractor):
'ext': 'mp3', 'ext': 'mp3',
'title': 'MP3', 'title': 'MP3',
}, },
}, {
# with null videoTitle
'url': 'https://view.vzaar.com/20313539/download',
'only_matching': True,
}] }]
@staticmethod @staticmethod
@ -45,7 +49,7 @@ class VzaarIE(InfoExtractor):
video_data = self._download_json( video_data = self._download_json(
'http://view.vzaar.com/v2/%s/video' % video_id, video_id) 'http://view.vzaar.com/v2/%s/video' % video_id, video_id)
title = video_data['videoTitle'] title = video_data.get('videoTitle') or video_id
formats = [] formats = []


@ -1,5 +1,6 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import itertools
import re import re
from .common import InfoExtractor from .common import InfoExtractor
@ -8,6 +9,7 @@ from ..utils import (
clean_html, clean_html,
determine_ext, determine_ext,
dict_get, dict_get,
extract_attributes,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
parse_duration, parse_duration,
@ -18,21 +20,21 @@ from ..utils import (
class XHamsterIE(InfoExtractor): class XHamsterIE(InfoExtractor):
_DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster[27]\.com)'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?:.+?\.)?xhamster\.(?:com|one)/ (?:.+?\.)?%s/
(?: (?:
movies/(?P<id>\d+)/(?P<display_id>[^/]*)\.html| movies/(?P<id>\d+)/(?P<display_id>[^/]*)\.html|
videos/(?P<display_id_2>[^/]*)-(?P<id_2>\d+) videos/(?P<display_id_2>[^/]*)-(?P<id_2>\d+)
) )
''' ''' % _DOMAINS
_TESTS = [{ _TESTS = [{
'url': 'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html', 'url': 'https://xhamster.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
'md5': '8281348b8d3c53d39fffb377d24eac4e', 'md5': '98b4687efb1ffd331c4197854dc09e8f',
'info_dict': { 'info_dict': {
'id': '1509445', 'id': '1509445',
'display_id': 'femaleagent_shy_beauty_takes_the_bait', 'display_id': 'femaleagent-shy-beauty-takes-the-bait',
'ext': 'mp4', 'ext': 'mp4',
'title': 'FemaleAgent Shy beauty takes the bait', 'title': 'FemaleAgent Shy beauty takes the bait',
'timestamp': 1350194821, 'timestamp': 1350194821,
@ -40,13 +42,12 @@ class XHamsterIE(InfoExtractor):
'uploader': 'Ruseful2011', 'uploader': 'Ruseful2011',
'duration': 893, 'duration': 893,
'age_limit': 18, 'age_limit': 18,
'categories': ['Fake Hub', 'Amateur', 'MILFs', 'POV', 'Beauti', 'Beauties', 'Beautiful', 'Boss', 'Office', 'Oral', 'Reality', 'Sexy', 'Taking'],
}, },
}, { }, {
'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd', 'url': 'https://xhamster.com/videos/britney-spears-sexy-booty-2221348?hd=',
'info_dict': { 'info_dict': {
'id': '2221348', 'id': '2221348',
'display_id': 'britney_spears_sexy_booty', 'display_id': 'britney-spears-sexy-booty',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Britney Spears Sexy Booty', 'title': 'Britney Spears Sexy Booty',
'timestamp': 1379123460, 'timestamp': 1379123460,
@ -54,13 +55,12 @@ class XHamsterIE(InfoExtractor):
'uploader': 'jojo747400', 'uploader': 'jojo747400',
'duration': 200, 'duration': 200,
'age_limit': 18, 'age_limit': 18,
'categories': ['Britney Spears', 'Celebrities', 'HD Videos', 'Sexy', 'Sexy Booty'],
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
}, { }, {
# empty seo # empty seo, unavailable via new URL schema
'url': 'http://xhamster.com/movies/5667973/.html', 'url': 'http://xhamster.com/movies/5667973/.html',
'info_dict': { 'info_dict': {
'id': '5667973', 'id': '5667973',
@ -71,7 +71,6 @@ class XHamsterIE(InfoExtractor):
'uploader': 'parejafree', 'uploader': 'parejafree',
'duration': 72, 'duration': 72,
'age_limit': 18, 'age_limit': 18,
'categories': ['Amateur', 'Blowjobs'],
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -94,6 +93,18 @@ class XHamsterIE(InfoExtractor):
}, { }, {
'url': 'https://xhamster.one/videos/femaleagent-shy-beauty-takes-the-bait-1509445', 'url': 'https://xhamster.one/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://xhamster.desi/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
'only_matching': True,
}, {
'url': 'https://xhamster2.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
'only_matching': True,
}, {
'url': 'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html',
'only_matching': True,
}, {
'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -285,7 +296,7 @@ class XHamsterIE(InfoExtractor):
class XHamsterEmbedIE(InfoExtractor): class XHamsterEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?xhamster\.com/xembed\.php\?video=(?P<id>\d+)' _VALID_URL = r'https?://(?:.+?\.)?%s/xembed\.php\?video=(?P<id>\d+)' % XHamsterIE._DOMAINS
_TEST = { _TEST = {
'url': 'http://xhamster.com/xembed.php?video=3328539', 'url': 'http://xhamster.com/xembed.php?video=3328539',
'info_dict': { 'info_dict': {
@ -322,3 +333,49 @@ class XHamsterEmbedIE(InfoExtractor):
video_url = dict_get(vars, ('downloadLink', 'homepageLink', 'commentsLink', 'shareUrl')) video_url = dict_get(vars, ('downloadLink', 'homepageLink', 'commentsLink', 'shareUrl'))
return self.url_result(video_url, 'XHamster') return self.url_result(video_url, 'XHamster')
class XHamsterUserIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?%s/users/(?P<id>[^/?#&]+)' % XHamsterIE._DOMAINS
_TESTS = [{
# Paginated user profile
'url': 'https://xhamster.com/users/netvideogirls/videos',
'info_dict': {
'id': 'netvideogirls',
},
'playlist_mincount': 267,
}, {
# Non-paginated user profile
'url': 'https://xhamster.com/users/firatkaan/videos',
'info_dict': {
'id': 'firatkaan',
},
'playlist_mincount': 1,
}]
def _entries(self, user_id):
next_page_url = 'https://xhamster.com/users/%s/videos/1' % user_id
for pagenum in itertools.count(1):
page = self._download_webpage(
next_page_url, user_id, 'Downloading page %s' % pagenum)
for video_tag in re.findall(
r'(<a[^>]+class=["\'].*?\bvideo-thumb__image-container[^>]+>)',
page):
video = extract_attributes(video_tag)
video_url = url_or_none(video.get('href'))
if not video_url or not XHamsterIE.suitable(video_url):
continue
video_id = XHamsterIE._match_id(video_url)
yield self.url_result(
video_url, ie=XHamsterIE.ie_key(), video_id=video_id)
mobj = re.search(r'<a[^>]+data-page=["\']next[^>]+>', page)
if not mobj:
break
next_page = extract_attributes(mobj.group(0))
next_page_url = url_or_none(next_page.get('href'))
if not next_page_url:
break
def _real_extract(self, url):
user_id = self._match_id(url)
return self.playlist_result(self._entries(user_id), user_id)
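
The diff above factors the supported host names into `XHamsterIE._DOMAINS` and reuses that fragment in three `_VALID_URL` patterns. The sketch below shows the same composition outside the extractor classes. The regex fragments are copied from the diff; the module-level names and the `match_video_id` helper are illustrative assumptions.

```python
import re

# Same alternation as XHamsterIE._DOMAINS in the diff above.
DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster[27]\.com)'

VIDEO_URL_RE = r'''(?x)
    https?://
        (?:.+?\.)?%s/
        (?:
            movies/(?P<id>\d+)/(?P<display_id>[^/]*)\.html|
            videos/(?P<display_id_2>[^/]*)-(?P<id_2>\d+)
        )
    ''' % DOMAINS
USER_URL_RE = r'https?://(?:.+?\.)?%s/users/(?P<id>[^/?#&]+)' % DOMAINS


def match_video_id(url):
    """Return the numeric video ID for either URL schema, or None (hypothetical helper)."""
    mobj = re.match(VIDEO_URL_RE, url)
    if not mobj:
        return None
    return mobj.group('id') or mobj.group('id_2')


print(match_video_id('https://xhamster.desi/videos/femaleagent-shy-beauty-takes-the-bait-1509445'))
print(match_video_id('http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html'))
print(re.match(USER_URL_RE, 'https://xhamster.com/users/netvideogirls/videos').group('id'))
```

Keeping the host alternation in one place means a new mirror domain only needs to be added once for the video, embed, and user extractors.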


@ -7,7 +7,7 @@ from ..utils import int_or_none
class XiamiBaseIE(InfoExtractor): class XiamiBaseIE(InfoExtractor):
_API_BASE_URL = 'http://www.xiami.com/song/playlist/cat/json/id' _API_BASE_URL = 'https://emumo.xiami.com/song/playlist/cat/json/id'
def _download_webpage_handle(self, *args, **kwargs): def _download_webpage_handle(self, *args, **kwargs):
webpage = super(XiamiBaseIE, self)._download_webpage_handle(*args, **kwargs) webpage = super(XiamiBaseIE, self)._download_webpage_handle(*args, **kwargs)


@ -1,12 +1,14 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import hashlib
import itertools import itertools
import json import json
import re import re
from .common import InfoExtractor, SearchInfoExtractor from .common import InfoExtractor, SearchInfoExtractor
from ..compat import ( from ..compat import (
compat_str,
compat_urllib_parse, compat_urllib_parse,
compat_urlparse, compat_urlparse,
) )
@ -18,7 +20,9 @@ from ..utils import (
int_or_none, int_or_none,
mimetype2ext, mimetype2ext,
smuggle_url, smuggle_url,
try_get,
unescapeHTML, unescapeHTML,
url_or_none,
) )
from .brightcove import ( from .brightcove import (
@ -556,3 +560,130 @@ class YahooGyaOIE(InfoExtractor):
'https://gyao.yahoo.co.jp/player/%s/' % video_id.replace(':', '/'), 'https://gyao.yahoo.co.jp/player/%s/' % video_id.replace(':', '/'),
YahooGyaOPlayerIE.ie_key(), video_id)) YahooGyaOPlayerIE.ie_key(), video_id))
return self.playlist_result(entries, program_id) return self.playlist_result(entries, program_id)
class YahooJapanNewsIE(InfoExtractor):
IE_NAME = 'yahoo:japannews'
IE_DESC = 'Yahoo! Japan News'
_VALID_URL = r'https?://(?P<host>(?:news|headlines)\.yahoo\.co\.jp)[^\d]*(?P<id>\d[\d-]*\d)?'
_GEO_COUNTRIES = ['JP']
_TESTS = [{
'url': 'https://headlines.yahoo.co.jp/videonews/ann?a=20190716-00000071-ann-int',
'info_dict': {
'id': '1736242',
'ext': 'mp4',
'title': 'ムン大統領が対日批判を強化“現金化”効果はテレビ朝日系ANN - Yahoo!ニュース',
'description': '韓国の元徴用工らを巡る裁判の原告が弁護士が差し押さえた三菱重工業の資産を売却して - Yahoo!ニュース(テレビ朝日系ANN)',
'thumbnail': r're:^https?://.*\.[a-zA-Z\d]{3,4}$',
},
'params': {
'skip_download': True,
},
}, {
# geo restricted
'url': 'https://headlines.yahoo.co.jp/hl?a=20190721-00000001-oxv-l04',
'only_matching': True,
}, {
'url': 'https://headlines.yahoo.co.jp/videonews/',
'only_matching': True,
}, {
'url': 'https://news.yahoo.co.jp',
'only_matching': True,
}, {
'url': 'https://news.yahoo.co.jp/byline/hashimotojunji/20190628-00131977/',
'only_matching': True,
}, {
'url': 'https://news.yahoo.co.jp/feature/1356',
'only_matching': True
}]
def _extract_formats(self, json_data, content_id):
formats = []
video_data = try_get(
json_data,
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list)
for vid in video_data or []:
delivery = vid.get('delivery')
url = url_or_none(vid.get('Url'))
if not delivery or not url:
continue
elif delivery == 'hls':
formats.extend(
self._extract_m3u8_formats(
url, content_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
else:
formats.append({
'url': url,
'format_id': 'http-%s' % compat_str(vid.get('bitrate', '')),
'height': int_or_none(vid.get('height')),
'width': int_or_none(vid.get('width')),
'tbr': int_or_none(vid.get('bitrate')),
})
self._remove_duplicate_formats(formats)
self._sort_formats(formats)
return formats
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
display_id = mobj.group('id') or host
webpage = self._download_webpage(url, display_id)
title = self._html_search_meta(
['og:title', 'twitter:title'], webpage, 'title', default=None
) or self._html_search_regex('<title>([^<]+)</title>', webpage, 'title')
if display_id == host:
# Headline page (w/ multiple BC playlists) ('news.yahoo.co.jp', 'headlines.yahoo.co.jp/videonews/', ...)
stream_plists = re.findall(r'plist=(\d+)', webpage) or re.findall(r'plist["\']:\s*["\']([^"\']+)', webpage)
entries = [
self.url_result(
smuggle_url(
'http://players.brightcove.net/5690807595001/HyZNerRl7_default/index.html?playlistId=%s' % plist_id,
{'geo_countries': ['JP']}),
ie='BrightcoveNew', video_id=plist_id)
for plist_id in stream_plists]
return self.playlist_result(entries, playlist_title=title)
# Article page
description = self._html_search_meta(
['og:description', 'description', 'twitter:description'],
webpage, 'description', default=None)
thumbnail = self._og_search_thumbnail(
webpage, default=None) or self._html_search_meta(
'twitter:image', webpage, 'thumbnail', default=None)
space_id = self._search_regex([
r'<script[^>]+class=["\']yvpub-player["\'][^>]+spaceid=([^&"\']+)',
r'YAHOO\.JP\.srch\.\w+link\.onLoad[^;]+spaceID["\' ]*:["\' ]+([^"\']+)',
r'<!--\s+SpaceID=(\d+)'
], webpage, 'spaceid')
content_id = self._search_regex(
r'<script[^>]+class=["\']yvpub-player["\'][^>]+contentid=(?P<contentid>[^&"\']+)',
webpage, 'contentid', group='contentid')
json_data = self._download_json(
'https://feapi-yvpub.yahooapis.jp/v1/content/%s' % content_id,
content_id,
query={
'appid': 'dj0zaiZpPVZMTVFJR0FwZWpiMyZzPWNvbnN1bWVyc2VjcmV0Jng9YjU-',
'output': 'json',
'space_id': space_id,
'domain': host,
'ak': hashlib.md5('_'.join((space_id, host)).encode()).hexdigest(),
'device_type': '1100',
})
formats = self._extract_formats(json_data, content_id)
return {
'id': content_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'formats': formats,
}
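
`YahooJapanNewsIE._real_extract` above sends an `ak` query parameter computed as the MD5 hex digest of the space ID and host joined by an underscore. A small sketch of that computation in isolation follows. `compute_ak` is a hypothetical helper name and the space ID is a placeholder, not a value taken from a real page.

```python
import hashlib


def compute_ak(space_id, host):
    """MD5 hex digest of "<space_id>_<host>", as used for the `ak` parameter."""
    return hashlib.md5('_'.join((space_id, host)).encode()).hexdigest()


# Placeholder space ID, for illustration only.
ak = compute_ak('0000000000', 'headlines.yahoo.co.jp')
print(ak, len(ak))
```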


@ -10,6 +10,7 @@ from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
float_or_none, float_or_none,
try_get,
) )
@ -51,23 +52,43 @@ class YandexMusicTrackIE(YandexMusicBaseIE):
IE_DESC = 'Яндекс.Музыка - Трек' IE_DESC = 'Яндекс.Музыка - Трек'
_VALID_URL = r'https?://music\.yandex\.(?:ru|kz|ua|by)/album/(?P<album_id>\d+)/track/(?P<id>\d+)' _VALID_URL = r'https?://music\.yandex\.(?:ru|kz|ua|by)/album/(?P<album_id>\d+)/track/(?P<id>\d+)'
_TEST = { _TESTS = [{
'url': 'http://music.yandex.ru/album/540508/track/4878838', 'url': 'http://music.yandex.ru/album/540508/track/4878838',
'md5': 'f496818aa2f60b6c0062980d2e00dc20', 'md5': 'f496818aa2f60b6c0062980d2e00dc20',
'info_dict': { 'info_dict': {
'id': '4878838', 'id': '4878838',
'ext': 'mp3', 'ext': 'mp3',
'title': 'Carlo Ambrosio, Carlo Ambrosio & Fabio Di Bari - Gypsy Eyes 1', 'title': 'Carlo Ambrosio & Fabio Di Bari - Gypsy Eyes 1',
'filesize': 4628061, 'filesize': 4628061,
'duration': 193.04, 'duration': 193.04,
'track': 'Gypsy Eyes 1', 'track': 'Gypsy Eyes 1',
'album': 'Gypsy Soul', 'album': 'Gypsy Soul',
'album_artist': 'Carlo Ambrosio', 'album_artist': 'Carlo Ambrosio',
'artist': 'Carlo Ambrosio, Carlo Ambrosio & Fabio Di Bari', 'artist': 'Carlo Ambrosio & Fabio Di Bari',
'release_year': 2009, 'release_year': 2009,
}, },
'skip': 'Travis CI servers blocked by YandexMusic', 'skip': 'Travis CI servers blocked by YandexMusic',
} }, {
# multiple discs
'url': 'http://music.yandex.ru/album/3840501/track/705105',
'md5': 'ebe7b4e2ac7ac03fe11c19727ca6153e',
'info_dict': {
'id': '705105',
'ext': 'mp3',
'title': 'Hooverphonic - Sometimes',
'filesize': 5743386,
'duration': 239.27,
'track': 'Sometimes',
'album': 'The Best of Hooverphonic',
'album_artist': 'Hooverphonic',
'artist': 'Hooverphonic',
'release_year': 2016,
'genre': 'pop',
'disc_number': 2,
'track_number': 9,
},
'skip': 'Travis CI servers blocked by YandexMusic',
}]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
@ -110,9 +131,21 @@ class YandexMusicTrackIE(YandexMusicBaseIE):
'abr': int_or_none(download_data.get('bitrate')), 'abr': int_or_none(download_data.get('bitrate')),
} }
def extract_artist_name(artist):
decomposed = artist.get('decomposed')
if not isinstance(decomposed, list):
return artist['name']
parts = [artist['name']]
for element in decomposed:
if isinstance(element, dict) and element.get('name'):
parts.append(element['name'])
elif isinstance(element, compat_str):
parts.append(element)
return ''.join(parts)
def extract_artist(artist_list): def extract_artist(artist_list):
if artist_list and isinstance(artist_list, list): if artist_list and isinstance(artist_list, list):
artists_names = [a['name'] for a in artist_list if a.get('name')] artists_names = [extract_artist_name(a) for a in artist_list if a.get('name')]
if artists_names: if artists_names:
return ', '.join(artists_names) return ', '.join(artists_names)
@ -121,10 +154,17 @@ class YandexMusicTrackIE(YandexMusicBaseIE):
album = albums[0] album = albums[0]
if isinstance(album, dict): if isinstance(album, dict):
year = album.get('year') year = album.get('year')
disc_number = int_or_none(try_get(
album, lambda x: x['trackPosition']['volume']))
track_number = int_or_none(try_get(
album, lambda x: x['trackPosition']['index']))
track_info.update({ track_info.update({
'album': album.get('title'), 'album': album.get('title'),
'album_artist': extract_artist(album.get('artists')), 'album_artist': extract_artist(album.get('artists')),
'release_year': int_or_none(year), 'release_year': int_or_none(year),
'genre': album.get('genre'),
'disc_number': disc_number,
'track_number': track_number,
}) })
track_artist = extract_artist(track.get('artists')) track_artist = extract_artist(track.get('artists'))
@ -152,7 +192,7 @@ class YandexMusicAlbumIE(YandexMusicPlaylistBaseIE):
IE_DESC = 'Яндекс.Музыка - Альбом' IE_DESC = 'Яндекс.Музыка - Альбом'
_VALID_URL = r'https?://music\.yandex\.(?:ru|kz|ua|by)/album/(?P<id>\d+)/?(\?|$)' _VALID_URL = r'https?://music\.yandex\.(?:ru|kz|ua|by)/album/(?P<id>\d+)/?(\?|$)'
_TEST = { _TESTS = [{
'url': 'http://music.yandex.ru/album/540508', 'url': 'http://music.yandex.ru/album/540508',
'info_dict': { 'info_dict': {
'id': '540508', 'id': '540508',
@ -160,7 +200,15 @@ class YandexMusicAlbumIE(YandexMusicPlaylistBaseIE):
}, },
'playlist_count': 50, 'playlist_count': 50,
'skip': 'Travis CI servers blocked by YandexMusic', 'skip': 'Travis CI servers blocked by YandexMusic',
} }, {
'url': 'https://music.yandex.ru/album/3840501',
'info_dict': {
'id': '3840501',
'title': 'Hooverphonic - The Best of Hooverphonic (2016)',
},
'playlist_count': 33,
'skip': 'Travis CI servers blocked by YandexMusic',
}]
def _real_extract(self, url): def _real_extract(self, url):
album_id = self._match_id(url) album_id = self._match_id(url)
@ -169,7 +217,7 @@ class YandexMusicAlbumIE(YandexMusicPlaylistBaseIE):
'http://music.yandex.ru/handlers/album.jsx?album=%s' % album_id, 'http://music.yandex.ru/handlers/album.jsx?album=%s' % album_id,
album_id, 'Downloading album JSON') album_id, 'Downloading album JSON')
entries = self._build_playlist(album['volumes'][0]) entries = self._build_playlist([track for volume in album['volumes'] for track in volume])
title = '%s - %s' % (album['artists'][0]['name'], album['title']) title = '%s - %s' % (album['artists'][0]['name'], album['title'])
year = album.get('year') year = album.get('year')
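
Two Yandex Music changes above are easy to check in isolation: joining a `decomposed` artist entry into one name, and flattening every album volume instead of reading only `album['volumes'][0]`. The sketch below reproduces both. The sample dictionaries are invented to match the shapes the code expects, and plain `str` stands in for `compat_str`.

```python
def extract_artist_name(artist):
    """Join an artist name with its `decomposed` parts, as in the diff above."""
    decomposed = artist.get('decomposed')
    if not isinstance(decomposed, list):
        return artist['name']
    parts = [artist['name']]
    for element in decomposed:
        if isinstance(element, dict) and element.get('name'):
            parts.append(element['name'])
        elif isinstance(element, str):  # compat_str in the extractor
            parts.append(element)
    return ''.join(parts)


# Invented sample data, shaped like the structures the extractor reads.
artist = {'name': 'Carlo Ambrosio', 'decomposed': [' & ', {'name': 'Fabio Di Bari'}]}
print(extract_artist_name(artist))

volumes = [[{'id': 1}, {'id': 2}], [{'id': 3}]]
# Flatten all discs into a single track list.
tracks = [track for volume in volumes for track in volume]
print(len(tracks))
```

With the flattened list, a two-disc album contributes the tracks of both discs to the playlist rather than only the first disc.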


@ -3,6 +3,7 @@ from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
determine_ext,
int_or_none, int_or_none,
url_or_none, url_or_none,
) )
@ -47,6 +48,10 @@ class YandexVideoIE(InfoExtractor):
# episode, sports # episode, sports
'url': 'https://yandex.ru/?stream_channel=1538487871&stream_id=4132a07f71fb0396be93d74b3477131d', 'url': 'https://yandex.ru/?stream_channel=1538487871&stream_id=4132a07f71fb0396be93d74b3477131d',
'only_matching': True, 'only_matching': True,
}, {
# DASH with DRM
'url': 'https://yandex.ru/portal/video?from=morda&stream_id=485a92d94518d73a9d0ff778e13505f8',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -59,13 +64,22 @@ class YandexVideoIE(InfoExtractor):
'disable_trackings': 1, 'disable_trackings': 1,
})['content'] })['content']
m3u8_url = url_or_none(content.get('content_url')) or url_or_none( content_url = url_or_none(content.get('content_url')) or url_or_none(
content['streams'][0]['url']) content['streams'][0]['url'])
title = content.get('title') or content.get('computed_title') title = content.get('title') or content.get('computed_title')
formats = self._extract_m3u8_formats( ext = determine_ext(content_url)
m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls') if ext == 'm3u8':
formats = self._extract_m3u8_formats(
content_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
elif ext == 'mpd':
formats = self._extract_mpd_formats(
content_url, video_id, mpd_id='dash')
else:
formats = [{'url': content_url}]
self._sort_formats(formats) self._sort_formats(formats)
description = content.get('description') description = content.get('description')
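
The `YandexVideoIE` change above picks an extraction path from the extension of `content_url` rather than assuming HLS. The sketch below shows that dispatch on its own. `guess_ext` is a simplified stand-in for youtube-dl's `determine_ext`, `pick_manifest_kind` is a hypothetical name, and the URLs are made up.

```python
import posixpath
from urllib.parse import urlparse


def guess_ext(url):
    """Return the lowercase file extension of the URL path (simplified stand-in)."""
    return posixpath.splitext(urlparse(url).path)[1].lstrip('.').lower()


def pick_manifest_kind(content_url):
    """Map a content URL to the extraction path the diff above would take."""
    ext = guess_ext(content_url)
    if ext == 'm3u8':
        return 'hls'     # would call _extract_m3u8_formats
    if ext == 'mpd':
        return 'dash'    # would call _extract_mpd_formats
    return 'direct'      # would use the URL as a single format


# Made-up URLs, for illustration only.
for url in ('https://example.com/a/master.m3u8?token=1',
            'https://example.com/stream/manifest.mpd',
            'https://example.com/video.mp4'):
    print(url, pick_manifest_kind(url))
```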


@ -37,7 +37,7 @@ class YourPornIE(InfoExtractor):
self._search_regex( self._search_regex(
r'data-vnfo=(["\'])(?P<data>{.+?})\1', webpage, 'data info', r'data-vnfo=(["\'])(?P<data>{.+?})\1', webpage, 'data info',
group='data'), group='data'),
video_id)[video_id]).replace('/cdn/', '/cdn4/') video_id)[video_id]).replace('/cdn/', '/cdn5/')
title = (self._search_regex( title = (self._search_regex(
r'<[^>]+\bclass=["\']PostEditTA[^>]+>([^<]+)', webpage, 'title', r'<[^>]+\bclass=["\']PostEditTA[^>]+>([^<]+)', webpage, 'title',


@ -27,9 +27,11 @@ from ..compat import (
compat_str, compat_str,
) )
from ..utils import ( from ..utils import (
bool_or_none,
clean_html, clean_html,
dict_get, dict_get,
error_to_compat_str, error_to_compat_str,
extract_attributes,
ExtractorError, ExtractorError,
float_or_none, float_or_none,
get_element_by_attribute, get_element_by_attribute,
@ -39,7 +41,6 @@ from ..utils import (
orderedSet, orderedSet,
parse_codecs, parse_codecs,
parse_duration, parse_duration,
qualities,
remove_quotes, remove_quotes,
remove_start, remove_start,
smuggle_url, smuggle_url,
@ -116,6 +117,8 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
'f.req': json.dumps(f_req), 'f.req': json.dumps(f_req),
'flowName': 'GlifWebSignIn', 'flowName': 'GlifWebSignIn',
'flowEntry': 'ServiceLogin', 'flowEntry': 'ServiceLogin',
# TODO: reverse actual botguard identifier generation algo
'bgRequest': '["identifier",""]',
}) })
return self._download_json( return self._download_json(
url, None, note=note, errnote=errnote, url, None, note=note, errnote=errnote,
@ -321,17 +324,18 @@ class YoutubePlaylistBaseInfoExtractor(YoutubeEntryListBaseInfoExtractor):
for video_id, video_title in self.extract_videos_from_page(content): for video_id, video_title in self.extract_videos_from_page(content):
yield self.url_result(video_id, 'Youtube', video_id, video_title) yield self.url_result(video_id, 'Youtube', video_id, video_title)
def extract_videos_from_page(self, page): def extract_videos_from_page_impl(self, video_re, page, ids_in_page, titles_in_page):
ids_in_page = [] for mobj in re.finditer(video_re, page):
titles_in_page = []
for mobj in re.finditer(self._VIDEO_RE, page):
# The link with index 0 is not the first video of the playlist (not sure if still actual) # The link with index 0 is not the first video of the playlist (not sure if still actual)
if 'index' in mobj.groupdict() and mobj.group('id') == '0': if 'index' in mobj.groupdict() and mobj.group('id') == '0':
continue continue
video_id = mobj.group('id') video_id = mobj.group('id')
video_title = unescapeHTML(mobj.group('title')) video_title = unescapeHTML(
mobj.group('title')) if 'title' in mobj.groupdict() else None
if video_title: if video_title:
video_title = video_title.strip() video_title = video_title.strip()
if video_title == '► Play all':
video_title = None
try: try:
idx = ids_in_page.index(video_id) idx = ids_in_page.index(video_id)
if video_title and not titles_in_page[idx]: if video_title and not titles_in_page[idx]:
@ -339,6 +343,12 @@ class YoutubePlaylistBaseInfoExtractor(YoutubeEntryListBaseInfoExtractor):
except ValueError: except ValueError:
ids_in_page.append(video_id) ids_in_page.append(video_id)
titles_in_page.append(video_title) titles_in_page.append(video_title)
def extract_videos_from_page(self, page):
ids_in_page = []
titles_in_page = []
self.extract_videos_from_page_impl(
self._VIDEO_RE, page, ids_in_page, titles_in_page)
return zip(ids_in_page, titles_in_page) return zip(ids_in_page, titles_in_page)
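
The refactored `extract_videos_from_page_impl` above accumulates IDs and titles into caller-supplied lists, skips the "► Play all" pseudo-title, and fills in a missing title when the same ID appears again. A standalone sketch of that accumulation follows. `collect_videos`, the sample regex, and the sample page are illustrative assumptions, `html.unescape` stands in for `unescapeHTML`, and the index-0 check from the original is omitted.

```python
import re
from html import unescape


def collect_videos(video_re, page, ids_in_page, titles_in_page):
    """Append unique video IDs and their best-known titles to the given lists."""
    for mobj in re.finditer(video_re, page):
        video_id = mobj.group('id')
        raw_title = mobj.groupdict().get('title')
        video_title = unescape(raw_title).strip() if raw_title else None
        if video_title == '► Play all':
            video_title = None
        try:
            idx = ids_in_page.index(video_id)
            # Seen before: only fill in a title that was missing.
            if video_title and not titles_in_page[idx]:
                titles_in_page[idx] = video_title
        except ValueError:
            ids_in_page.append(video_id)
            titles_in_page.append(video_title)


# Invented markup and pattern, for illustration only.
page = ('<a href="/watch?v=abc" title="► Play all"></a>'
        '<a href="/watch?v=abc" title="First &amp; best"></a>'
        '<a href="/watch?v=def" title=" Second "></a>')
video_re = r'href="/watch\?v=(?P<id>\w+)" title="(?P<title>[^"]*)"'
ids, titles = [], []
collect_videos(video_re, page, ids, titles)
print(list(zip(ids, titles)))
```

Passing the lists in lets a subclass run several patterns over the same page and still get one deduplicated result.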
@ -368,11 +378,24 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
(?:www\.)?hooktube\.com/| (?:www\.)?hooktube\.com/|
(?:www\.)?yourepeat\.com/| (?:www\.)?yourepeat\.com/|
tube\.majestyc\.net/| tube\.majestyc\.net/|
# Invidious instances taken from https://github.com/omarroth/invidious/wiki/Invidious-Instances
(?:(?:www|dev)\.)?invidio\.us/| (?:(?:www|dev)\.)?invidio\.us/|
(?:www\.)?invidiou\.sh/| (?:(?:www|no)\.)?invidiou\.sh/|
(?:www\.)?invidious\.snopyta\.org/| (?:(?:www|fi|de)\.)?invidious\.snopyta\.org/|
(?:www\.)?invidious\.kabi\.tk/| (?:www\.)?invidious\.kabi\.tk/|
(?:www\.)?invidious\.enkirton\.net/|
(?:www\.)?invidious\.13ad\.de/|
(?:www\.)?invidious\.mastodon\.host/|
(?:www\.)?invidious\.nixnet\.xyz/|
(?:www\.)?tube\.poal\.co/|
(?:www\.)?vid\.wxzm\.sx/| (?:www\.)?vid\.wxzm\.sx/|
(?:www\.)?yt\.elukerio\.org/|
(?:www\.)?kgg2m7yk5aybusll\.onion/|
(?:www\.)?qklhadlycap4cnod\.onion/|
(?:www\.)?axqzx4s6s54s32yentfqojs3x5i7faxza6xo3ehd4bzzsg2ii4fv2iid\.onion/|
(?:www\.)?c7hqkpkpemu6e7emz5b4vyz7idjgdvgaaa3dyimmeojqbgpea3xqjoid\.onion/|
(?:www\.)?fz253lmuao3strwbfbmx46yu7acac2jz27iwtorgmbqlkurlclmancad\.onion/|
(?:www\.)?invidious\.l4qlywnpwqsluw65ts7md3khrivpirse744un3x7mlskqauz5pyuzgqd\.onion/|
youtube\.googleapis\.com/) # the various hostnames, with wildcard subdomains youtube\.googleapis\.com/) # the various hostnames, with wildcard subdomains
(?:.*?\#/)? # handle anchor (#/) redirect urls (?:.*?\#/)? # handle anchor (#/) redirect urls
(?: # the various things that can precede the ID: (?: # the various things that can precede the ID:
@ -1587,17 +1610,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
video_id = mobj.group(2) video_id = mobj.group(2)
return video_id return video_id
def _extract_annotations(self, video_id):
return self._download_webpage(
'https://www.youtube.com/annotations_invideo', video_id,
note='Downloading annotations',
errnote='Unable to download video annotations', fatal=False,
query={
'features': 1,
'legacy': 1,
'video_id': video_id,
})
@staticmethod @staticmethod
def _extract_chapters(description, duration): def _extract_chapters(description, duration):
if not description: if not description:
@ -1692,6 +1704,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
def extract_token(v_info): def extract_token(v_info):
return dict_get(v_info, ('account_playback_token', 'accountPlaybackToken', 'token')) return dict_get(v_info, ('account_playback_token', 'accountPlaybackToken', 'token'))
def extract_player_response(player_response, video_id):
pl_response = str_or_none(player_response)
if not pl_response:
return
pl_response = self._parse_json(pl_response, video_id, fatal=False)
if isinstance(pl_response, dict):
add_dash_mpd_pr(pl_response)
return pl_response
player_response = {} player_response = {}
# Get video info # Get video info
@ -1714,7 +1735,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
note='Refetching age-gated info webpage', note='Refetching age-gated info webpage',
errnote='unable to download video info webpage') errnote='unable to download video info webpage')
video_info = compat_parse_qs(video_info_webpage) video_info = compat_parse_qs(video_info_webpage)
pl_response = video_info.get('player_response', [None])[0]
player_response = extract_player_response(pl_response, video_id)
add_dash_mpd(video_info) add_dash_mpd(video_info)
view_count = extract_view_count(video_info)
else: else:
age_gate = False age_gate = False
video_info = None video_info = None
@ -1737,11 +1761,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
is_live = True is_live = True
sts = ytplayer_config.get('sts') sts = ytplayer_config.get('sts')
if not player_response: if not player_response:
pl_response = str_or_none(args.get('player_response')) player_response = extract_player_response(args.get('player_response'), video_id)
if pl_response:
pl_response = self._parse_json(pl_response, video_id, fatal=False)
if isinstance(pl_response, dict):
player_response = pl_response
if not video_info or self._downloader.params.get('youtube_include_dash_manifest', True): if not video_info or self._downloader.params.get('youtube_include_dash_manifest', True):
add_dash_mpd_pr(player_response) add_dash_mpd_pr(player_response)
# We also try looking in get_video_info since it may contain different dashmpd # We also try looking in get_video_info since it may contain different dashmpd
@ -1773,9 +1793,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
get_video_info = compat_parse_qs(video_info_webpage) get_video_info = compat_parse_qs(video_info_webpage)
if not player_response: if not player_response:
pl_response = get_video_info.get('player_response', [None])[0] pl_response = get_video_info.get('player_response', [None])[0]
if isinstance(pl_response, dict): player_response = extract_player_response(pl_response, video_id)
player_response = pl_response
add_dash_mpd_pr(player_response)
add_dash_mpd(get_video_info) add_dash_mpd(get_video_info)
if view_count is None: if view_count is None:
view_count = extract_view_count(get_video_info) view_count = extract_view_count(get_video_info)
@ -1798,9 +1816,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
break break
def extract_unavailable_message(): def extract_unavailable_message():
return self._html_search_regex( messages = []
r'(?s)<h1[^>]+id="unavailable-message"[^>]*>(.+?)</h1>', for tag, kind in (('h1', 'message'), ('div', 'submessage')):
video_webpage, 'unavailable message', default=None) msg = self._html_search_regex(
r'(?s)<{tag}[^>]+id=["\']unavailable-{kind}["\'][^>]*>(.+?)</{tag}>'.format(tag=tag, kind=kind),
video_webpage, 'unavailable %s' % kind, default=None)
if msg:
messages.append(msg)
if messages:
return '\n'.join(messages)
if not video_info: if not video_info:
unavailable_message = extract_unavailable_message() unavailable_message = extract_unavailable_message()
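Review note: the reworked `extract_unavailable_message` above now collects both the `unavailable-message` heading and the `unavailable-submessage` block and joins them with a newline. Here is a minimal standalone sketch of that behaviour, assuming Python 3 and the standard `re` module only. The tag-stripping step is a simplified stand-in for what `_html_search_regex` does and is an assumption, not the extractor's actual helper.

```python
import re


def extract_unavailable_message(webpage):
    """Join the text of #unavailable-message (h1) and #unavailable-submessage (div)."""
    messages = []
    for tag, kind in (('h1', 'message'), ('div', 'submessage')):
        # Same pattern shape as in the diff: either quote style is accepted around the id.
        pattern = r'(?s)<{tag}[^>]+id=["\']unavailable-{kind}["\'][^>]*>(.+?)</{tag}>'.format(
            tag=tag, kind=kind)
        mobj = re.search(pattern, webpage)
        if not mobj:
            continue
        # Simplified cleanup (assumption): drop nested tags and surrounding whitespace.
        text = re.sub(r'<[^>]+>', '', mobj.group(1)).strip()
        if text:
            messages.append(text)
    # Returns None when neither element is present, as the code in the diff does.
    return '\n'.join(messages) if messages else None
```

Returning `None` rather than an empty string keeps the `if not error_message:` fallbacks later in the diff working unchanged.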
@ -1812,16 +1836,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
video_details = try_get( video_details = try_get(
player_response, lambda x: x['videoDetails'], dict) or {} player_response, lambda x: x['videoDetails'], dict) or {}
# title video_title = video_info.get('title', [None])[0] or video_details.get('title')
if 'title' in video_info: if not video_title:
video_title = video_info['title'][0]
elif 'title' in player_response:
video_title = video_details['title']
else:
self._downloader.report_warning('Unable to extract video title') self._downloader.report_warning('Unable to extract video title')
video_title = '_' video_title = '_'
# description
description_original = video_description = get_element_by_id("eow-description", video_webpage) description_original = video_description = get_element_by_id("eow-description", video_webpage)
if video_description: if video_description:
@ -1846,11 +1865,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
''', replace_url, video_description) ''', replace_url, video_description)
video_description = clean_html(video_description) video_description = clean_html(video_description)
else: else:
fd_mobj = re.search(r'<meta name="description" content="([^"]+)"', video_webpage) video_description = self._html_search_meta('description', video_webpage) or video_details.get('shortDescription')
if fd_mobj:
video_description = unescapeHTML(fd_mobj.group(1))
else:
video_description = ''
if not smuggled_data.get('force_singlefeed', False): if not smuggled_data.get('force_singlefeed', False):
if not self._downloader.params.get('noplaylist'): if not self._downloader.params.get('noplaylist'):
@ -1888,6 +1903,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if view_count is None and video_details: if view_count is None and video_details:
view_count = int_or_none(video_details.get('viewCount')) view_count = int_or_none(video_details.get('viewCount'))
if is_live is None:
is_live = bool_or_none(video_details.get('isLive'))
# Check for "rental" videos # Check for "rental" videos
if 'ypc_video_rental_bar_text' in video_info and 'author' not in video_info: if 'ypc_video_rental_bar_text' in video_info and 'author' not in video_info:
raise ExtractorError('"rental" videos not supported. See https://github.com/ytdl-org/youtube-dl/issues/359 for more information.', expected=True) raise ExtractorError('"rental" videos not supported. See https://github.com/ytdl-org/youtube-dl/issues/359 for more information.', expected=True)
@ -1896,6 +1914,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
return int_or_none(self._search_regex( return int_or_none(self._search_regex(
r'\bclen[=/](\d+)', media_url, 'filesize', default=None)) r'\bclen[=/](\d+)', media_url, 'filesize', default=None))
streaming_formats = try_get(player_response, lambda x: x['streamingData']['formats'], list) or []
streaming_formats.extend(try_get(player_response, lambda x: x['streamingData']['adaptiveFormats'], list) or [])
if 'conn' in video_info and video_info['conn'][0].startswith('rtmp'): if 'conn' in video_info and video_info['conn'][0].startswith('rtmp'):
self.report_rtmp_download() self.report_rtmp_download()
formats = [{ formats = [{
@ -1904,10 +1925,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'url': video_info['conn'][0], 'url': video_info['conn'][0],
'player_url': player_url, 'player_url': player_url,
}] }]
elif not is_live and (len(video_info.get('url_encoded_fmt_stream_map', [''])[0]) >= 1 or len(video_info.get('adaptive_fmts', [''])[0]) >= 1): elif not is_live and (streaming_formats or len(video_info.get('url_encoded_fmt_stream_map', [''])[0]) >= 1 or len(video_info.get('adaptive_fmts', [''])[0]) >= 1):
encoded_url_map = video_info.get('url_encoded_fmt_stream_map', [''])[0] + ',' + video_info.get('adaptive_fmts', [''])[0] encoded_url_map = video_info.get('url_encoded_fmt_stream_map', [''])[0] + ',' + video_info.get('adaptive_fmts', [''])[0]
if 'rtmpe%3Dyes' in encoded_url_map: if 'rtmpe%3Dyes' in encoded_url_map:
raise ExtractorError('rtmpe downloads are not supported, see https://github.com/ytdl-org/youtube-dl/issues/343 for more information.', expected=True) raise ExtractorError('rtmpe downloads are not supported, see https://github.com/ytdl-org/youtube-dl/issues/343 for more information.', expected=True)
formats = []
formats_spec = {} formats_spec = {}
fmt_list = video_info.get('fmt_list', [''])[0] fmt_list = video_info.get('fmt_list', [''])[0]
if fmt_list: if fmt_list:
@ -1921,91 +1943,104 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'width': int_or_none(width_height[0]), 'width': int_or_none(width_height[0]),
'height': int_or_none(width_height[1]), 'height': int_or_none(width_height[1]),
} }
q = qualities(['small', 'medium', 'hd720']) for fmt in streaming_formats:
streaming_formats = try_get(player_response, lambda x: x['streamingData']['formats'], list) itag = str_or_none(fmt.get('itag'))
if streaming_formats: if not itag:
for fmt in streaming_formats:
itag = str_or_none(fmt.get('itag'))
if not itag:
continue
quality = fmt.get('quality')
quality_label = fmt.get('qualityLabel') or quality
formats_spec[itag] = {
'asr': int_or_none(fmt.get('audioSampleRate')),
'filesize': int_or_none(fmt.get('contentLength')),
'format_note': quality_label,
'fps': int_or_none(fmt.get('fps')),
'height': int_or_none(fmt.get('height')),
'quality': q(quality),
# bitrate for itag 43 is always 2147483647
'tbr': float_or_none(fmt.get('averageBitrate') or fmt.get('bitrate'), 1000) if itag != '43' else None,
'width': int_or_none(fmt.get('width')),
}
formats = []
for url_data_str in encoded_url_map.split(','):
url_data = compat_parse_qs(url_data_str)
if 'itag' not in url_data or 'url' not in url_data or url_data.get('drm_families'):
continue continue
quality = fmt.get('quality')
quality_label = fmt.get('qualityLabel') or quality
formats_spec[itag] = {
'asr': int_or_none(fmt.get('audioSampleRate')),
'filesize': int_or_none(fmt.get('contentLength')),
'format_note': quality_label,
'fps': int_or_none(fmt.get('fps')),
'height': int_or_none(fmt.get('height')),
# bitrate for itag 43 is always 2147483647
'tbr': float_or_none(fmt.get('averageBitrate') or fmt.get('bitrate'), 1000) if itag != '43' else None,
'width': int_or_none(fmt.get('width')),
}
for fmt in streaming_formats:
if fmt.get('drm_families'):
continue
url = url_or_none(fmt.get('url'))
if not url:
cipher = fmt.get('cipher')
if not cipher:
continue
url_data = compat_parse_qs(cipher)
url = url_or_none(try_get(url_data, lambda x: x['url'][0], compat_str))
if not url:
continue
else:
cipher = None
url_data = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
stream_type = int_or_none(try_get(url_data, lambda x: x['stream_type'][0])) stream_type = int_or_none(try_get(url_data, lambda x: x['stream_type'][0]))
# Unsupported FORMAT_STREAM_TYPE_OTF # Unsupported FORMAT_STREAM_TYPE_OTF
if stream_type == 3: if stream_type == 3:
continue continue
format_id = url_data['itag'][0]
url = url_data['url'][0]
if 's' in url_data or self._downloader.params.get('youtube_include_dash_manifest', True): format_id = fmt.get('itag') or url_data['itag'][0]
ASSETS_RE = r'"assets":.+?"js":\s*("[^"]+")' if not format_id:
jsplayer_url_json = self._search_regex( continue
ASSETS_RE, format_id = compat_str(format_id)
embed_webpage if age_gate else video_webpage,
'JS player URL (1)', default=None) if cipher:
if not jsplayer_url_json and not age_gate: if 's' in url_data or self._downloader.params.get('youtube_include_dash_manifest', True):
# We need the embed website after all ASSETS_RE = r'"assets":.+?"js":\s*("[^"]+")'
if embed_webpage is None:
embed_url = proto + '://www.youtube.com/embed/%s' % video_id
embed_webpage = self._download_webpage(
embed_url, video_id, 'Downloading embed webpage')
jsplayer_url_json = self._search_regex( jsplayer_url_json = self._search_regex(
ASSETS_RE, embed_webpage, 'JS player URL') ASSETS_RE,
embed_webpage if age_gate else video_webpage,
'JS player URL (1)', default=None)
if not jsplayer_url_json and not age_gate:
# We need the embed website after all
if embed_webpage is None:
embed_url = proto + '://www.youtube.com/embed/%s' % video_id
embed_webpage = self._download_webpage(
embed_url, video_id, 'Downloading embed webpage')
jsplayer_url_json = self._search_regex(
ASSETS_RE, embed_webpage, 'JS player URL')
player_url = json.loads(jsplayer_url_json) player_url = json.loads(jsplayer_url_json)
if player_url is None:
player_url_json = self._search_regex(
r'ytplayer\.config.*?"url"\s*:\s*("[^"]+")',
video_webpage, 'age gate player URL')
player_url = json.loads(player_url_json)
if 'sig' in url_data:
url += '&signature=' + url_data['sig'][0]
elif 's' in url_data:
encrypted_sig = url_data['s'][0]
if self._downloader.params.get('verbose'):
if player_url is None: if player_url is None:
player_version = 'unknown' player_url_json = self._search_regex(
player_desc = 'unknown' r'ytplayer\.config.*?"url"\s*:\s*("[^"]+")',
else: video_webpage, 'age gate player URL')
if player_url.endswith('swf'): player_url = json.loads(player_url_json)
player_version = self._search_regex(
r'-(.+?)(?:/watch_as3)?\.swf$', player_url, if 'sig' in url_data:
'flash player', fatal=False) url += '&signature=' + url_data['sig'][0]
player_desc = 'flash player %s' % player_version elif 's' in url_data:
encrypted_sig = url_data['s'][0]
if self._downloader.params.get('verbose'):
if player_url is None:
player_version = 'unknown'
player_desc = 'unknown'
else: else:
player_version = self._search_regex( if player_url.endswith('swf'):
[r'html5player-([^/]+?)(?:/html5player(?:-new)?)?\.js', player_version = self._search_regex(
r'(?:www|player(?:_ias)?)-([^/]+)(?:/[a-z]{2,3}_[A-Z]{2})?/base\.js'], r'-(.+?)(?:/watch_as3)?\.swf$', player_url,
player_url, 'flash player', fatal=False)
'html5 player', fatal=False) player_desc = 'flash player %s' % player_version
player_desc = 'html5 player %s' % player_version else:
player_version = self._search_regex(
[r'html5player-([^/]+?)(?:/html5player(?:-new)?)?\.js',
r'(?:www|player(?:_ias)?)-([^/]+)(?:/[a-z]{2,3}_[A-Z]{2})?/base\.js'],
player_url,
'html5 player', fatal=False)
player_desc = 'html5 player %s' % player_version
parts_sizes = self._signature_cache_id(encrypted_sig) parts_sizes = self._signature_cache_id(encrypted_sig)
self.to_screen('{%s} signature length %s, %s' % self.to_screen('{%s} signature length %s, %s' %
(format_id, parts_sizes, player_desc)) (format_id, parts_sizes, player_desc))
signature = self._decrypt_signature( signature = self._decrypt_signature(
encrypted_sig, video_id, player_url, age_gate) encrypted_sig, video_id, player_url, age_gate)
sp = try_get(url_data, lambda x: x['sp'][0], compat_str) or 'signature' sp = try_get(url_data, lambda x: x['sp'][0], compat_str) or 'signature'
url += '&%s=%s' % (sp, signature) url += '&%s=%s' % (sp, signature)
if 'ratebypass' not in url: if 'ratebypass' not in url:
url += '&ratebypass=yes' url += '&ratebypass=yes'
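Review note: the loop above now takes each format from `streamingData` and resolves its URL in one of two ways: a plain `url` field, or a `cipher` field holding a query string that carries `url`, `s` and `sp`. The sketch below isolates that branch, assuming Python 3 and `urllib.parse` in place of the `compat_*` helpers. `resolve_stream_url` is a hypothetical name for illustration, and the URL validation done by `url_or_none` is omitted.

```python
from urllib.parse import parse_qs, urlparse


def resolve_stream_url(fmt):
    """Return (url, url_data, is_ciphered) for one format entry, or None to skip it."""
    url = fmt.get('url')
    if url:
        # Plain URL: the metadata (itag, clen, ...) lives in the URL's own query string.
        return url, parse_qs(urlparse(url).query), False
    cipher = fmt.get('cipher')
    if not cipher:
        return None
    # Ciphered entry: the whole field is a query string wrapping the real URL.
    url_data = parse_qs(cipher)
    url = (url_data.get('url') or [None])[0]
    if not url:
        return None
    return url, url_data, True
```

In the ciphered case the caller still has to decrypt `url_data['s'][0]` and append it under the parameter named by `sp` (default `signature`), as the diff does.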
@ -2025,24 +2060,33 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
mobj = re.search(r'^(?P<width>\d+)[xX](?P<height>\d+)$', url_data.get('size', [''])[0]) mobj = re.search(r'^(?P<width>\d+)[xX](?P<height>\d+)$', url_data.get('size', [''])[0])
width, height = (int(mobj.group('width')), int(mobj.group('height'))) if mobj else (None, None) width, height = (int(mobj.group('width')), int(mobj.group('height'))) if mobj else (None, None)
if width is None:
width = int_or_none(fmt.get('width'))
if height is None:
height = int_or_none(fmt.get('height'))
filesize = int_or_none(url_data.get( filesize = int_or_none(url_data.get(
'clen', [None])[0]) or _extract_filesize(url) 'clen', [None])[0]) or _extract_filesize(url)
quality = url_data.get('quality', [None])[0] quality = url_data.get('quality', [None])[0] or fmt.get('quality')
quality_label = url_data.get('quality_label', [None])[0] or fmt.get('qualityLabel')
tbr = (float_or_none(url_data.get('bitrate', [None])[0], 1000)
or float_or_none(fmt.get('bitrate'), 1000)) if format_id != '43' else None
fps = int_or_none(url_data.get('fps', [None])[0]) or int_or_none(fmt.get('fps'))
more_fields = { more_fields = {
'filesize': filesize, 'filesize': filesize,
'tbr': float_or_none(url_data.get('bitrate', [None])[0], 1000), 'tbr': tbr,
'width': width, 'width': width,
'height': height, 'height': height,
'fps': int_or_none(url_data.get('fps', [None])[0]), 'fps': fps,
'format_note': url_data.get('quality_label', [None])[0] or quality, 'format_note': quality_label or quality,
'quality': q(quality),
} }
for key, value in more_fields.items(): for key, value in more_fields.items():
if value: if value:
dct[key] = value dct[key] = value
type_ = url_data.get('type', [None])[0] type_ = url_data.get('type', [None])[0] or fmt.get('mimeType')
if type_: if type_:
type_split = type_.split(';') type_split = type_.split(';')
kind_ext = type_split[0].split('/') kind_ext = type_split[0].split('/')
@ -2090,9 +2134,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = 'True' a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = 'True'
formats.append(a_format) formats.append(a_format)
else: else:
error_message = clean_html(video_info.get('reason', [None])[0]) error_message = extract_unavailable_message()
if not error_message: if not error_message:
error_message = extract_unavailable_message() error_message = clean_html(try_get(
player_response, lambda x: x['playabilityStatus']['reason'],
compat_str))
if not error_message:
error_message = clean_html(
try_get(video_info, lambda x: x['reason'][0], compat_str))
if error_message: if error_message:
raise ExtractorError(error_message, expected=True) raise ExtractorError(error_message, expected=True)
raise ExtractorError('no conn, hlsvp, hlsManifestUrl or url_encoded_fmt_stream_map information found in video info') raise ExtractorError('no conn, hlsvp, hlsManifestUrl or url_encoded_fmt_stream_map information found in video info')
@ -2263,7 +2312,21 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# annotations # annotations
video_annotations = None video_annotations = None
if self._downloader.params.get('writeannotations', False): if self._downloader.params.get('writeannotations', False):
video_annotations = self._extract_annotations(video_id) xsrf_token = self._search_regex(
r'([\'"])XSRF_TOKEN\1\s*:\s*([\'"])(?P<xsrf_token>[A-Za-z0-9+/=]+)\2',
video_webpage, 'xsrf token', group='xsrf_token', fatal=False)
invideo_url = try_get(
player_response, lambda x: x['annotations'][0]['playerAnnotationsUrlsRenderer']['invideoUrl'], compat_str)
if xsrf_token and invideo_url:
xsrf_field_name = self._search_regex(
r'([\'"])XSRF_FIELD_NAME\1\s*:\s*([\'"])(?P<xsrf_field_name>\w+)\2',
video_webpage, 'xsrf field name',
group='xsrf_field_name', default='session_token')
video_annotations = self._download_webpage(
self._proto_relative_url(invideo_url),
video_id, note='Downloading annotations',
errnote='Unable to download video annotations', fatal=False,
data=urlencode_postdata({xsrf_field_name: xsrf_token}))
chapters = self._extract_chapters(description_original, video_duration) chapters = self._extract_chapters(description_original, video_duration)
@ -2421,7 +2484,8 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
(%(playlist_id)s) (%(playlist_id)s)
)""" % {'playlist_id': YoutubeBaseInfoExtractor._PLAYLIST_ID_RE} )""" % {'playlist_id': YoutubeBaseInfoExtractor._PLAYLIST_ID_RE}
_TEMPLATE_URL = 'https://www.youtube.com/playlist?list=%s' _TEMPLATE_URL = 'https://www.youtube.com/playlist?list=%s'
_VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})&amp;[^"]*?index=(?P<index>\d+)(?:[^>]+>(?P<title>[^<]+))?' _VIDEO_RE_TPL = r'href="\s*/watch\?v=%s(?:&amp;(?:[^"]*?index=(?P<index>\d+))?(?:[^>]+>(?P<title>[^<]+))?)?'
_VIDEO_RE = _VIDEO_RE_TPL % r'(?P<id>[0-9A-Za-z_-]{11})'
IE_NAME = 'youtube:playlist' IE_NAME = 'youtube:playlist'
_TESTS = [{ _TESTS = [{
'url': 'https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re', 'url': 'https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re',
@ -2444,6 +2508,8 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'info_dict': { 'info_dict': {
'title': '29C3: Not my department', 'title': '29C3: Not my department',
'id': 'PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC', 'id': 'PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC',
'uploader': 'Christiaan008',
'uploader_id': 'ChRiStIaAn008',
}, },
'playlist_count': 95, 'playlist_count': 95,
}, { }, {
@ -2452,6 +2518,8 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'info_dict': { 'info_dict': {
'title': '[OLD]Team Fortress 2 (Class-based LP)', 'title': '[OLD]Team Fortress 2 (Class-based LP)',
'id': 'PLBB231211A4F62143', 'id': 'PLBB231211A4F62143',
'uploader': 'Wickydoo',
'uploader_id': 'Wickydoo',
}, },
'playlist_mincount': 26, 'playlist_mincount': 26,
}, { }, {
@ -2460,6 +2528,8 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'info_dict': { 'info_dict': {
'title': 'Uploads from Cauchemar', 'title': 'Uploads from Cauchemar',
'id': 'UUBABnxM4Ar9ten8Mdjj1j0Q', 'id': 'UUBABnxM4Ar9ten8Mdjj1j0Q',
'uploader': 'Cauchemar',
'uploader_id': 'Cauchemar89',
}, },
'playlist_mincount': 799, 'playlist_mincount': 799,
}, { }, {
@ -2477,13 +2547,17 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'info_dict': { 'info_dict': {
'title': 'JODA15', 'title': 'JODA15',
'id': 'PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu', 'id': 'PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu',
'uploader': 'milan',
'uploader_id': 'UCEI1-PVPcYXjB73Hfelbmaw',
} }
}, { }, {
'url': 'http://www.youtube.com/embed/_xDOZElKyNU?list=PLsyOSbh5bs16vubvKePAQ1x3PhKavfBIl', 'url': 'http://www.youtube.com/embed/_xDOZElKyNU?list=PLsyOSbh5bs16vubvKePAQ1x3PhKavfBIl',
'playlist_mincount': 485, 'playlist_mincount': 485,
'info_dict': { 'info_dict': {
'title': '2017 華語最新單曲 (2/24更新)', 'title': '2018 Chinese New Singles (11/6 updated)',
'id': 'PLsyOSbh5bs16vubvKePAQ1x3PhKavfBIl', 'id': 'PLsyOSbh5bs16vubvKePAQ1x3PhKavfBIl',
'uploader': 'LBK',
'uploader_id': 'sdragonfang',
} }
}, { }, {
'note': 'Embedded SWF player', 'note': 'Embedded SWF player',
@ -2492,13 +2566,16 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'info_dict': { 'info_dict': {
'title': 'JODA7', 'title': 'JODA7',
'id': 'YN5VISEtHet5D4NEvfTd0zcgFk84NqFZ', 'id': 'YN5VISEtHet5D4NEvfTd0zcgFk84NqFZ',
} },
'skip': 'This playlist does not exist',
}, { }, {
'note': 'Buggy playlist: the webpage has a "Load more" button but it doesn\'t have more videos', 'note': 'Buggy playlist: the webpage has a "Load more" button but it doesn\'t have more videos',
'url': 'https://www.youtube.com/playlist?list=UUXw-G3eDE9trcvY2sBMM_aA', 'url': 'https://www.youtube.com/playlist?list=UUXw-G3eDE9trcvY2sBMM_aA',
'info_dict': { 'info_dict': {
'title': 'Uploads from Interstellar Movie', 'title': 'Uploads from Interstellar Movie',
'id': 'UUXw-G3eDE9trcvY2sBMM_aA', 'id': 'UUXw-G3eDE9trcvY2sBMM_aA',
'uploader': 'Interstellar Movie',
'uploader_id': 'InterstellarMovie1',
}, },
'playlist_mincount': 21, 'playlist_mincount': 21,
}, { }, {
@ -2523,6 +2600,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'skip': 'This video is not available.',
'add_ie': [YoutubeIE.ie_key()], 'add_ie': [YoutubeIE.ie_key()],
}, { }, {
'url': 'https://youtu.be/yeWKywCrFtk?list=PL2qgrgXsNUG5ig9cat4ohreBjYLAPC0J5', 'url': 'https://youtu.be/yeWKywCrFtk?list=PL2qgrgXsNUG5ig9cat4ohreBjYLAPC0J5',
@ -2534,7 +2612,6 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'uploader_id': 'backuspagemuseum', 'uploader_id': 'backuspagemuseum',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/backuspagemuseum', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/backuspagemuseum',
'upload_date': '20161008', 'upload_date': '20161008',
'license': 'Standard YouTube License',
'description': 'md5:800c0c78d5eb128500bffd4f0b4f2e8a', 'description': 'md5:800c0c78d5eb128500bffd4f0b4f2e8a',
'categories': ['Nonprofits & Activism'], 'categories': ['Nonprofits & Activism'],
'tags': list, 'tags': list,
@ -2545,6 +2622,16 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'noplaylist': True, 'noplaylist': True,
'skip_download': True, 'skip_download': True,
}, },
}, {
# https://github.com/ytdl-org/youtube-dl/issues/21844
'url': 'https://www.youtube.com/playlist?list=PLzH6n4zXuckpfMu_4Ff8E7Z1behQks5ba',
'info_dict': {
'title': 'Data Analysis with Dr Mike Pound',
'id': 'PLzH6n4zXuckpfMu_4Ff8E7Z1behQks5ba',
'uploader_id': 'Computerphile',
'uploader': 'Computerphile',
},
'playlist_mincount': 11,
}, { }, {
'url': 'https://youtu.be/uWyaPkt-VOI?list=PL9D9FC436B881BA21', 'url': 'https://youtu.be/uWyaPkt-VOI?list=PL9D9FC436B881BA21',
'only_matching': True, 'only_matching': True,
@ -2563,6 +2650,34 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
def _real_initialize(self): def _real_initialize(self):
self._login() self._login()
def extract_videos_from_page(self, page):
ids_in_page = []
titles_in_page = []
for item in re.findall(
r'(<[^>]*\bdata-video-id\s*=\s*["\'][0-9A-Za-z_-]{11}[^>]+>)', page):
attrs = extract_attributes(item)
video_id = attrs['data-video-id']
video_title = unescapeHTML(attrs.get('data-title'))
if video_title:
video_title = video_title.strip()
ids_in_page.append(video_id)
titles_in_page.append(video_title)
# Fallback with old _VIDEO_RE
self.extract_videos_from_page_impl(
self._VIDEO_RE, page, ids_in_page, titles_in_page)
# Relaxed fallbacks
self.extract_videos_from_page_impl(
r'href="\s*/watch\?v\s*=\s*(?P<id>[0-9A-Za-z_-]{11})', page,
ids_in_page, titles_in_page)
self.extract_videos_from_page_impl(
r'data-video-ids\s*=\s*["\'](?P<id>[0-9A-Za-z_-]{11})', page,
ids_in_page, titles_in_page)
return zip(ids_in_page, titles_in_page)
def _extract_mix(self, playlist_id): def _extract_mix(self, playlist_id):
# The mixes are generated from a single video # The mixes are generated from a single video
# the id of the playlist is just 'RD' + video_id # the id of the playlist is just 'RD' + video_id
@ -2625,7 +2740,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
page, 'title', default=None) page, 'title', default=None)
_UPLOADER_BASE = r'class=["\']pl-header-details[^>]+>\s*<li>\s*<a[^>]+\bhref=' _UPLOADER_BASE = r'class=["\']pl-header-details[^>]+>\s*<li>\s*<a[^>]+\bhref='
uploader = self._search_regex( uploader = self._html_search_regex(
r'%s["\']/(?:user|channel)/[^>]+>([^<]+)' % _UPLOADER_BASE, r'%s["\']/(?:user|channel)/[^>]+>([^<]+)' % _UPLOADER_BASE,
page, 'uploader', default=None) page, 'uploader', default=None)
mobj = re.search( mobj = re.search(
@ -2711,6 +2826,8 @@ class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
'info_dict': { 'info_dict': {
'id': 'UUKfVa3S1e4PHvxWcwyMMg8w', 'id': 'UUKfVa3S1e4PHvxWcwyMMg8w',
'title': 'Uploads from lex will', 'title': 'Uploads from lex will',
'uploader': 'lex will',
'uploader_id': 'UCKfVa3S1e4PHvxWcwyMMg8w',
} }
}, { }, {
'note': 'Age restricted channel', 'note': 'Age restricted channel',
@ -2720,6 +2837,8 @@ class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
'info_dict': { 'info_dict': {
'id': 'UUs0ifCMCm1icqRbqhUINa0w', 'id': 'UUs0ifCMCm1icqRbqhUINa0w',
'title': 'Uploads from Deus Ex', 'title': 'Uploads from Deus Ex',
'uploader': 'Deus Ex',
'uploader_id': 'DeusExOfficial',
}, },
}, { }, {
'url': 'https://invidio.us/channel/UC23qupoDRn9YOAVzeoxjOQA', 'url': 'https://invidio.us/channel/UC23qupoDRn9YOAVzeoxjOQA',
@ -2804,6 +2923,8 @@ class YoutubeUserIE(YoutubeChannelIE):
'info_dict': { 'info_dict': {
'id': 'UUfX55Sx5hEFjoC3cNs6mCUQ', 'id': 'UUfX55Sx5hEFjoC3cNs6mCUQ',
'title': 'Uploads from The Linux Foundation', 'title': 'Uploads from The Linux Foundation',
'uploader': 'The Linux Foundation',
'uploader_id': 'TheLinuxFoundation',
} }
}, { }, {
# Only available via https://www.youtube.com/c/12minuteathlete/videos # Only available via https://www.youtube.com/c/12minuteathlete/videos
@ -2813,6 +2934,8 @@ class YoutubeUserIE(YoutubeChannelIE):
'info_dict': { 'info_dict': {
'id': 'UUVjM-zV6_opMDx7WYxnjZiQ', 'id': 'UUVjM-zV6_opMDx7WYxnjZiQ',
'title': 'Uploads from 12 Minute Athlete', 'title': 'Uploads from 12 Minute Athlete',
'uploader': '12 Minute Athlete',
'uploader_id': 'the12minuteathlete',
} }
}, { }, {
'url': 'ytuser:phihag', 'url': 'ytuser:phihag',
@ -2906,7 +3029,7 @@ class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
'playlist_mincount': 4, 'playlist_mincount': 4,
'info_dict': { 'info_dict': {
'id': 'ThirstForScience', 'id': 'ThirstForScience',
'title': 'Thirst for Science', 'title': 'ThirstForScience',
}, },
}, { }, {
# with "Load more" button # with "Load more" button
@ -2923,6 +3046,7 @@ class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
'id': 'UCiU1dHvZObB2iP6xkJ__Icw', 'id': 'UCiU1dHvZObB2iP6xkJ__Icw',
'title': 'Chem Player', 'title': 'Chem Player',
}, },
'skip': 'Blocked',
}] }]
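
Review note: the `_VIDEO_RE_TPL` change in `YoutubePlaylistIE` makes the `&amp;...index=` part and the title optional, so links without an index still match. The sketch below checks that with the two patterns copied from the diff, assuming Python 3 and the standard `re` module. The sample HTML and the `find_videos` helper are made up for illustration.

```python
import re

# Patterns copied from the diff above.
_VIDEO_RE_TPL = r'href="\s*/watch\?v=%s(?:&amp;(?:[^"]*?index=(?P<index>\d+))?(?:[^>]+>(?P<title>[^<]+))?)?'
_VIDEO_RE = _VIDEO_RE_TPL % r'(?P<id>[0-9A-Za-z_-]{11})'


def find_videos(page):
    """Return (id, index, title) tuples; index and title are None when absent."""
    return [
        (m.group('id'), m.group('index'), m.group('title'))
        for m in re.finditer(_VIDEO_RE, page)
    ]


# Hypothetical playlist markup: one link with an index and title, one bare link.
sample = (
    '<a href="/watch?v=abcdefghijk&amp;list=PL1&amp;index=3" class="t">First video</a>'
    '<a href="/watch?v=ABCDEFGHIJK">No index</a>'
)
videos = find_videos(sample)
```

With the old pattern the second link would not match at all, because `&amp;` and `index=` were mandatory.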

@ -41,6 +41,7 @@ class ZDFBaseIE(InfoExtractor):
class ZDFIE(ZDFBaseIE): class ZDFIE(ZDFBaseIE):
_VALID_URL = r'https?://www\.zdf\.de/(?:[^/]+/)*(?P<id>[^/?]+)\.html' _VALID_URL = r'https?://www\.zdf\.de/(?:[^/]+/)*(?P<id>[^/?]+)\.html'
_QUALITIES = ('auto', 'low', 'med', 'high', 'veryhigh') _QUALITIES = ('auto', 'low', 'med', 'high', 'veryhigh')
_GEO_COUNTRIES = ['DE']
_TESTS = [{ _TESTS = [{
'url': 'https://www.zdf.de/dokumentation/terra-x/die-magie-der-farben-von-koenigspurpur-und-jeansblau-100.html', 'url': 'https://www.zdf.de/dokumentation/terra-x/die-magie-der-farben-von-koenigspurpur-und-jeansblau-100.html',

File diff suppressed because it is too large

@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2019.06.27' __version__ = '2019.09.12.1'