mirror of https://codeberg.org/polarisfm/youtube-dl synced 2025-01-23 21:17:55 +01:00

Merge remote-tracking branch 'upstream/master' into fix-zing-mp3

Thinh Nguyen 2018-10-03 19:08:06 -04:00
commit 563b863d7d
45 changed files with 1314 additions and 487 deletions


@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.08.28*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.09.26*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.08.28** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.09.26**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -36,7 +36,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2018.08.28 [debug] youtube-dl version 2018.09.26
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}


@ -1,3 +1,76 @@
version 2018.09.26
Extractors
* [pluralsight] Fix subtitles extraction (#17671)
* [mediaset] Improve embed support (#17668)
+ [youtube] Add support for invidio.us (#17613)
+ [zattoo] Add support for more zattoo platform sites
* [zattoo] Fix extraction (#17175, #17542)
version 2018.09.18
Core
+ [extractor/common] Introduce channel meta fields
Extractors
* [adobepass] Don't pollute default headers dict
* [udemy] Don't pollute default headers dict
* [twitch] Don't pollute default headers dict
* [youtube] Don't pollute default query dict (#17593)
* [crunchyroll] Prefer hardsubless formats and formats in locale language
* [vrv] Make format ids deterministic
* [vimeo] Fix ondemand playlist extraction (#14591)
+ [pornhub] Extract upload date (#17574)
+ [porntube] Extract channel meta fields
+ [vimeo] Extract channel meta fields
+ [youtube] Extract channel meta fields (#9676, #12939)
* [porntube] Fix extraction (#17541)
* [asiancrush] Fix extraction (#15630)
+ [twitch:clips] Extend URL regular expression (closes #17559)
+ [vzaar] Add support for HLS
* [tube8] Fix metadata extraction (#17520)
* [eporner] Extract JSON-LD (#17519)
version 2018.09.10
Core
+ [utils] Properly recognize AV1 codec (#17506)
Extractors
+ [iprima] Add support for prima.iprima.cz (#17514)
+ [tele5] Add support for tele5.de (#7805, #7922, #17331, #17414)
* [nbc] Fix extraction of percent encoded URLs (#17374)
version 2018.09.08
Extractors
* [youtube] Fix extraction (#17457, #17464)
+ [pornhub:uservideos] Add support for new URLs (#17388)
* [iprima] Confirm adult check (#17437)
* [slideslive] Make check for video service name case-insensitive (#17429)
* [radiojavan] Fix extraction (#17151)
* [generic] Skip unsuccessful jwplayer extraction (#16735)
version 2018.09.01
Core
* [utils] Skip remote IP addresses non matching to source address' IP version
when creating a connection (#13422, #17362)
Extractors
+ [ard] Add support for one.ard.de (#17397)
* [niconico] Fix extraction on python3 (#17393, #17407)
* [ard] Extract f4m formats
* [crunchyroll] Parse vilos media data (#17343)
+ [ard] Add support for Beta ARD Mediathek
+ [bandcamp] Extract more metadata (#13197)
* [internazionale] Fix extraction of non-available-abroad videos (#17386)
version 2018.08.28 version 2018.08.28
Extractors Extractors


@ -511,6 +511,8 @@ The basic usage is not to set any template arguments when downloading a single f
- `timestamp` (numeric): UNIX timestamp of the moment the video became available - `timestamp` (numeric): UNIX timestamp of the moment the video became available
- `upload_date` (string): Video upload date (YYYYMMDD) - `upload_date` (string): Video upload date (YYYYMMDD)
- `uploader_id` (string): Nickname or id of the video uploader - `uploader_id` (string): Nickname or id of the video uploader
- `channel` (string): Full name of the channel the video is uploaded on
- `channel_id` (string): Id of the channel
- `location` (string): Physical location where the video was filmed - `location` (string): Physical location where the video was filmed
- `duration` (numeric): Length of the video in seconds - `duration` (numeric): Length of the video in seconds
- `view_count` (numeric): How many users have watched the video on the platform - `view_count` (numeric): How many users have watched the video on the platform
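
The new `channel` and `channel_id` fields can be referenced in an output template like any other field. The following is a minimal sketch of how such a template expands, assuming a hypothetical metadata dictionary (all values are made up for illustration); it relies only on Python's `%(name)s` mapping-style string formatting, which the template syntax follows.

```python
# Hypothetical metadata for one video; the values are illustrative only.
info = {
    'channel': 'Example Channel',
    'channel_id': 'UC123',
    'title': 'Example Title',
    'ext': 'mp4',
}

# Output templates use Python's %-style mapping keys.
template = '%(channel)s [%(channel_id)s]/%(title)s.%(ext)s'
filename = template % info
print(filename)
```

On the command line the same template would be passed with `-o`. Extractors that do not provide channel metadata may leave these fields unset.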


@ -56,6 +56,7 @@
- **archive.org**: archive.org videos - **archive.org**: archive.org videos
- **ARD** - **ARD**
- **ARD:mediathek** - **ARD:mediathek**
- **ARDBetaMediathek**
- **Arkena** - **Arkena**
- **arte.tv** - **arte.tv**
- **arte.tv:+7** - **arte.tv:+7**
@ -97,6 +98,7 @@
- **bbc.co.uk:article**: BBC articles - **bbc.co.uk:article**: BBC articles
- **bbc.co.uk:iplayer:playlist** - **bbc.co.uk:iplayer:playlist**
- **bbc.co.uk:playlist** - **bbc.co.uk:playlist**
- **BBVTV**
- **Beatport** - **Beatport**
- **Beeg** - **Beeg**
- **BehindKink** - **BehindKink**
@ -191,7 +193,7 @@
- **Crackle** - **Crackle**
- **Criterion** - **Criterion**
- **CrooksAndLiars** - **CrooksAndLiars**
- **Crunchyroll** - **crunchyroll**
- **crunchyroll:playlist** - **crunchyroll:playlist**
- **CSNNE** - **CSNNE**
- **CSpan**: C-SPAN - **CSpan**: C-SPAN
@ -250,6 +252,7 @@
- **egghead:course**: egghead.io course - **egghead:course**: egghead.io course
- **egghead:lesson**: egghead.io lesson - **egghead:lesson**: egghead.io lesson
- **eHow** - **eHow**
- **EinsUndEinsTV**
- **Einthusan** - **Einthusan**
- **eitb.tv** - **eitb.tv**
- **EllenTube** - **EllenTube**
@ -267,6 +270,7 @@
- **EsriVideo** - **EsriVideo**
- **Europa** - **Europa**
- **EveryonesMixtape** - **EveryonesMixtape**
- **EWETV**
- **ExpoTV** - **ExpoTV**
- **Expressen** - **Expressen**
- **ExtremeTube** - **ExtremeTube**
@ -326,6 +330,7 @@
- **Gfycat** - **Gfycat**
- **GiantBomb** - **GiantBomb**
- **Giga** - **Giga**
- **GlattvisionTV**
- **Glide**: Glide mobile video messages (glide.me) - **Glide**: Glide mobile video messages (glide.me)
- **Globo** - **Globo**
- **GloboArticle** - **GloboArticle**
@ -493,6 +498,7 @@
- **Mixer:vod** - **Mixer:vod**
- **MLB** - **MLB**
- **Mnet** - **Mnet**
- **MNetTV**
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net - **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
- **Mofosex** - **Mofosex**
- **Mojvideo** - **Mojvideo**
@ -524,6 +530,7 @@
- **Myvi** - **Myvi**
- **MyVidster** - **MyVidster**
- **MyviEmbed** - **MyviEmbed**
- **MyVisionTV**
- **n-tv.de** - **n-tv.de**
- **natgeo** - **natgeo**
- **natgeo:episodeguide** - **natgeo:episodeguide**
@ -549,6 +556,7 @@
- **netease:program**: 网易云音乐 - 电台节目 - **netease:program**: 网易云音乐 - 电台节目
- **netease:singer**: 网易云音乐 - 歌手 - **netease:singer**: 网易云音乐 - 歌手
- **netease:song**: 网易云音乐 - **netease:song**: 网易云音乐
- **NetPlus**
- **Netzkino** - **Netzkino**
- **Newgrounds** - **Newgrounds**
- **NewgroundsPlaylist** - **NewgroundsPlaylist**
@ -625,6 +633,7 @@
- **orf:iptv**: iptv.ORF.at - **orf:iptv**: iptv.ORF.at
- **orf:oe1**: Radio Österreich 1 - **orf:oe1**: Radio Österreich 1
- **orf:tvthek**: ORF TVthek - **orf:tvthek**: ORF TVthek
- **OsnatelTV**
- **PacktPub** - **PacktPub**
- **PacktPubCourse** - **PacktPubCourse**
- **PandaTV**: 熊猫TV - **PandaTV**: 熊猫TV
@ -685,6 +694,7 @@
- **qqmusic:playlist**: QQ音乐 - 歌单 - **qqmusic:playlist**: QQ音乐 - 歌单
- **qqmusic:singer**: QQ音乐 - 歌手 - **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜 - **qqmusic:toplist**: QQ音乐 - 排行榜
- **QuantumTV**
- **Quickline** - **Quickline**
- **QuicklineLive** - **QuicklineLive**
- **R7** - **R7**
@ -752,6 +762,7 @@
- **safari**: safaribooksonline.com online video - **safari**: safaribooksonline.com online video
- **safari:api** - **safari:api**
- **safari:course**: safaribooksonline.com online courses - **safari:course**: safaribooksonline.com online courses
- **SAKTV**
- **Sapo**: SAPO Vídeos - **Sapo**: SAPO Vídeos
- **savefrom.net** - **savefrom.net**
- **SBS**: sbs.com.au - **SBS**: sbs.com.au
@ -846,6 +857,7 @@
- **techtv.mit.edu** - **techtv.mit.edu**
- **ted** - **ted**
- **Tele13** - **Tele13**
- **Tele5**
- **TeleBruxelles** - **TeleBruxelles**
- **Telecinco**: telecinco.es, cuatro.com and mediaset.es - **Telecinco**: telecinco.es, cuatro.com and mediaset.es
- **Telegraaf** - **Telegraaf**
@ -1033,12 +1045,14 @@
- **vrv** - **vrv**
- **vrv:series** - **vrv:series**
- **VShare** - **VShare**
- **VTXTV**
- **vube**: Vube.com - **vube**: Vube.com
- **VuClip** - **VuClip**
- **VVVVID** - **VVVVID**
- **VyboryMos** - **VyboryMos**
- **Vzaar** - **Vzaar**
- **Walla** - **Walla**
- **WalyTV**
- **washingtonpost** - **washingtonpost**
- **washingtonpost:article** - **washingtonpost:article**
- **wat.tv** - **wat.tv**


@ -785,6 +785,10 @@ class TestUtil(unittest.TestCase):
'vcodec': 'h264', 'vcodec': 'h264',
'acodec': 'aac', 'acodec': 'aac',
}) })
self.assertEqual(parse_codecs('av01.0.05M.08'), {
'vcodec': 'av01.0.05M.08',
'acodec': 'none',
})
def test_escape_rfc3986(self): def test_escape_rfc3986(self):
reserved = "!*'();:@&=+$,/?#[]" reserved = "!*'();:@&=+$,/?#[]"
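
The added assertion expects an AV1 codec string to be reported as the video codec with no audio codec. The sketch below shows one way such a classification could work; `parse_codecs_sketch` and its prefix tables are hypothetical and simplified, not the project's actual `parse_codecs` implementation.

```python
def parse_codecs_sketch(codecs_str):
    """Simplified, hypothetical stand-in for a codec-string parser."""
    # Assumed prefix tables; the real function may recognize other families.
    video_prefixes = ('avc1', 'hev1', 'hvc1', 'vp8', 'vp9', 'av01')
    audio_prefixes = ('mp4a', 'opus', 'vorbis', 'mp3', 'aac')
    vcodec = acodec = None
    for full_codec in (c.strip() for c in codecs_str.split(',') if c.strip()):
        # The codec family is the part before the first dot.
        family = full_codec.split('.')[0]
        if family in video_prefixes and vcodec is None:
            vcodec = full_codec
        elif family in audio_prefixes and acodec is None:
            acodec = full_codec
    return {'vcodec': vcodec or 'none', 'acodec': acodec or 'none'}


result = parse_codecs_sketch('av01.0.05M.08')
print(result)
```

The full codec string is kept as the value, so profile and level details such as `0.05M.08` are not lost.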


@ -1325,8 +1325,8 @@ class AdobePassIE(InfoExtractor):
_DOWNLOADING_LOGIN_PAGE = 'Downloading Provider Login Page' _DOWNLOADING_LOGIN_PAGE = 'Downloading Provider Login Page'
def _download_webpage_handle(self, *args, **kwargs): def _download_webpage_handle(self, *args, **kwargs):
headers = kwargs.get('headers', {}) headers = self.geo_verification_headers()
headers.update(self.geo_verification_headers()) headers.update(kwargs.get('headers', {}))
kwargs['headers'] = headers kwargs['headers'] = headers
return super(AdobePassIE, self)._download_webpage_handle( return super(AdobePassIE, self)._download_webpage_handle(
*args, **compat_kwargs(kwargs)) *args, **compat_kwargs(kwargs))
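
The reordering above matters because `dict.update` mutates its receiver. The sketch below contrasts the two orders, assuming a hypothetical `geo_headers` helper that returns a fresh dictionary on every call; the header name is invented for illustration.

```python
def geo_headers():
    # Hypothetical stand-in for a helper that returns a fresh dict each call.
    return {'X-Example-Geo': '1'}


def merge_old(**kwargs):
    # Old order: start from the caller's dict and update it in place.
    headers = kwargs.get('headers', {})
    headers.update(geo_headers())
    return headers


def merge_new(**kwargs):
    # New order: start from a fresh dict, then layer the caller's headers on top.
    headers = geo_headers()
    headers.update(kwargs.get('headers', {}))
    return headers


shared = {'User-Agent': 'example'}
merge_old(headers=shared)
polluted = dict(shared)  # the caller's dict has gained the extra header

shared = {'User-Agent': 'example'}
merge_new(headers=shared)
clean = dict(shared)  # the caller's dict is unchanged
```

A side effect of the new order is that headers supplied by the caller take precedence over the geo verification headers when both define the same key.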


@ -21,7 +21,7 @@ from ..compat import compat_etree_fromstring
class ARDMediathekIE(InfoExtractor): class ARDMediathekIE(InfoExtractor):
IE_NAME = 'ARD:mediathek' IE_NAME = 'ARD:mediathek'
_VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?' _VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de|one\.ard\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
_TESTS = [{ _TESTS = [{
# available till 26.07.2022 # available till 26.07.2022
@ -37,6 +37,9 @@ class ARDMediathekIE(InfoExtractor):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
} }
}, {
'url': 'https://one.ard.de/tv/Mord-mit-Aussicht/Mord-mit-Aussicht-6-39-T%C3%B6dliche-Nach/ONE/Video?bcastId=46384294&documentId=55586872',
'only_matching': True,
}, { }, {
# audio # audio
'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086', 'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
@ -282,3 +285,76 @@ class ARDIE(InfoExtractor):
'upload_date': upload_date, 'upload_date': upload_date,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
} }
class ARDBetaMediathekIE(InfoExtractor):
_VALID_URL = r'https://beta\.ardmediathek\.de/[a-z]+/player/(?P<video_id>[a-zA-Z0-9]+)/(?P<display_id>[^/?#]+)'
_TESTS = [{
'url': 'https://beta.ardmediathek.de/ard/player/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE/die-robuste-roswita',
'md5': '2d02d996156ea3c397cfc5036b5d7f8f',
'info_dict': {
'display_id': 'die-robuste-roswita',
'id': 'Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
'title': 'Tatort: Die robuste Roswita',
'description': r're:^Der Mord.*trüber ist als die Ilm.',
'duration': 5316,
'thumbnail': 'https://img.ardmediathek.de/standard/00/55/43/59/34/-1774185891/16x9/960?mandant=ard',
'upload_date': '20180826',
'ext': 'mp4',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, display_id)
data_json = self._search_regex(r'window\.__APOLLO_STATE__\s*=\s*(\{.*);\n', webpage, 'json')
data = self._parse_json(data_json, display_id)
res = {
'id': video_id,
'display_id': display_id,
}
formats = []
for widget in data.values():
if widget.get('_geoblocked'):
raise ExtractorError('This video is not available due to geoblocking', expected=True)
if '_duration' in widget:
res['duration'] = widget['_duration']
if 'clipTitle' in widget:
res['title'] = widget['clipTitle']
if '_previewImage' in widget:
res['thumbnail'] = widget['_previewImage']
if 'broadcastedOn' in widget:
res['upload_date'] = unified_strdate(widget['broadcastedOn'])
if 'synopsis' in widget:
res['description'] = widget['synopsis']
if '_subtitleUrl' in widget:
res['subtitles'] = {'de': [{
'ext': 'ttml',
'url': widget['_subtitleUrl'],
}]}
if '_quality' in widget:
format_url = widget['_stream']['json'][0]
if format_url.endswith('.f4m'):
formats.extend(self._extract_f4m_formats(
format_url + '?hdcore=3.11.0',
video_id, f4m_id='hds', fatal=False))
elif format_url.endswith('m3u8'):
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
formats.append({
'format_id': 'http-' + widget['_quality'],
'url': format_url,
'preference': 10, # Plain HTTP, that's nice
})
self._sort_formats(formats)
res['formats'] = formats
return res
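
The new extractor pulls a JSON object out of an inline script and then scans every entry for known keys. The sketch below reproduces that flow with only the standard library, using the same regular expression on a hypothetical one-line page fragment; the sample data is invented.

```python
import json
import re

# Hypothetical page fragment; real pages are much larger.
webpage = (
    'window.__APOLLO_STATE__ = {"a": {"clipTitle": "Example", "_duration": 60}, '
    '"b": {"synopsis": "Text"}};\n'
)

# Same pattern as in the diff: capture from the first "{" up to the ";" that ends the line.
match = re.search(r'window\.__APOLLO_STATE__\s*=\s*(\{.*);\n', webpage)
data = json.loads(match.group(1))

res = {}
for widget in data.values():
    # Each widget may carry a different subset of the metadata.
    if 'clipTitle' in widget:
        res['title'] = widget['clipTitle']
    if '_duration' in widget:
        res['duration'] = widget['_duration']
    if 'synopsis' in widget:
        res['description'] = widget['synopsis']
print(res)
```

Because the pattern is greedy and `.` does not match a newline, it assumes the whole state object sits on a single line terminated by `;`.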


@ -8,7 +8,6 @@ from .kaltura import KalturaIE
from ..utils import ( from ..utils import (
extract_attributes, extract_attributes,
remove_end, remove_end,
urlencode_postdata,
) )
@ -34,19 +33,40 @@ class AsianCrushIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
data = self._download_json( webpage = self._download_webpage(url, video_id)
'https://www.asiancrush.com/wp-admin/admin-ajax.php', video_id,
data=urlencode_postdata({
'postid': video_id,
'action': 'get_channel_kaltura_vars',
}))
entry_id = data['entry_id'] entry_id, partner_id, title = [None] * 3
vars = self._parse_json(
self._search_regex(
r'iEmbedVars\s*=\s*({.+?})', webpage, 'embed vars',
default='{}'), video_id, fatal=False)
if vars:
entry_id = vars.get('entry_id')
partner_id = vars.get('partner_id')
title = vars.get('vid_label')
if not entry_id:
entry_id = self._search_regex(
r'\bentry_id["\']\s*:\s*["\'](\d+)', webpage, 'entry id')
player = self._download_webpage(
'https://api.asiancrush.com/embeddedVideoPlayer', video_id,
query={'id': entry_id})
kaltura_id = self._search_regex(
r'entry_id["\']\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1', player,
'kaltura id', group='id')
if not partner_id:
partner_id = self._search_regex(
r'/p(?:artner_id)?/(\d+)', player, 'partner id',
default='513551')
return self.url_result( return self.url_result(
'kaltura:%s:%s' % (data['partner_id'], entry_id), 'kaltura:%s:%s' % (partner_id, kaltura_id),
ie=KalturaIE.ie_key(), video_id=entry_id, ie=KalturaIE.ie_key(), video_id=kaltura_id,
video_title=data.get('vid_label')) video_title=title)
class AsianCrushPlaylistIE(InfoExtractor): class AsianCrushPlaylistIE(InfoExtractor):
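
The `kaltura id` pattern uses a backreference so that the value ends at the same kind of quote that opened it. A small sketch of that technique with the standard `re` module follows; the two input strings are hypothetical.

```python
import re

# Pattern shape taken from the diff: group 1 captures the opening quote, and
# (?:(?!\1).)+ consumes characters until that same quote appears again.
PATTERN = r'entry_id["\']\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1'

# Hypothetical player snippets using either quote style.
double_quoted = '{"entry_id": "1_abc123"}'
single_quoted = "{'entry_id': '1_abc123'}"

ids = [re.search(PATTERN, s).group('id') for s in (double_quoted, single_quoted)]
print(ids)
```

This approach tolerates the other quote character inside the value, which a plain `[^"']+` character class would not.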


@ -1,6 +1,5 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import json
import random import random
import re import re
import time import time
@ -16,15 +15,18 @@ from ..utils import (
int_or_none, int_or_none,
KNOWN_EXTENSIONS, KNOWN_EXTENSIONS,
parse_filesize, parse_filesize,
str_or_none,
try_get,
unescapeHTML, unescapeHTML,
update_url_query, update_url_query,
unified_strdate, unified_strdate,
unified_timestamp,
url_or_none, url_or_none,
) )
class BandcampIE(InfoExtractor): class BandcampIE(InfoExtractor):
_VALID_URL = r'https?://.*?\.bandcamp\.com/track/(?P<title>[^/?#&]+)' _VALID_URL = r'https?://[^/]+\.bandcamp\.com/track/(?P<title>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://youtube-dl.bandcamp.com/track/youtube-dl-test-song', 'url': 'http://youtube-dl.bandcamp.com/track/youtube-dl-test-song',
'md5': 'c557841d5e50261777a6585648adf439', 'md5': 'c557841d5e50261777a6585648adf439',
@ -36,13 +38,44 @@ class BandcampIE(InfoExtractor):
}, },
'_skip': 'There is a limit of 200 free downloads / month for the test song' '_skip': 'There is a limit of 200 free downloads / month for the test song'
}, { }, {
# free download
'url': 'http://benprunty.bandcamp.com/track/lanius-battle', 'url': 'http://benprunty.bandcamp.com/track/lanius-battle',
'md5': '0369ace6b939f0927e62c67a1a8d9fa7', 'md5': '853e35bf34aa1d6fe2615ae612564b36',
'info_dict': { 'info_dict': {
'id': '2650410135', 'id': '2650410135',
'ext': 'aiff', 'ext': 'aiff',
'title': 'Ben Prunty - Lanius (Battle)', 'title': 'Ben Prunty - Lanius (Battle)',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Ben Prunty', 'uploader': 'Ben Prunty',
'timestamp': 1396508491,
'upload_date': '20140403',
'release_date': '20140403',
'duration': 260.877,
'track': 'Lanius (Battle)',
'track_number': 1,
'track_id': '2650410135',
'artist': 'Ben Prunty',
'album': 'FTL: Advanced Edition Soundtrack',
},
}, {
# no free download, mp3 128
'url': 'https://relapsealumni.bandcamp.com/track/hail-to-fire',
'md5': 'fec12ff55e804bb7f7ebeb77a800c8b7',
'info_dict': {
'id': '2584466013',
'ext': 'mp3',
'title': 'Mastodon - Hail to Fire',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Mastodon',
'timestamp': 1322005399,
'upload_date': '20111122',
'release_date': '20040207',
'duration': 120.79,
'track': 'Hail to Fire',
'track_number': 5,
'track_id': '2584466013',
'artist': 'Mastodon',
'album': 'Call of the Mastodon',
}, },
}] }]
@ -51,19 +84,23 @@ class BandcampIE(InfoExtractor):
title = mobj.group('title') title = mobj.group('title')
webpage = self._download_webpage(url, title) webpage = self._download_webpage(url, title)
thumbnail = self._html_search_meta('og:image', webpage, default=None) thumbnail = self._html_search_meta('og:image', webpage, default=None)
m_download = re.search(r'freeDownloadPage: "(.*?)"', webpage)
if not m_download:
m_trackinfo = re.search(r'trackinfo: (.+),\s*?\n', webpage)
if m_trackinfo:
json_code = m_trackinfo.group(1)
data = json.loads(json_code)[0]
track_id = compat_str(data['id'])
if not data.get('file'): track_id = None
raise ExtractorError('Not streamable', video_id=track_id, expected=True) track = None
track_number = None
duration = None
formats = [] formats = []
for format_id, format_url in data['file'].items(): track_info = self._parse_json(
self._search_regex(
r'trackinfo\s*:\s*\[\s*({.+?})\s*\]\s*,\s*?\n',
webpage, 'track info', default='{}'), title)
if track_info:
file_ = track_info.get('file')
if isinstance(file_, dict):
for format_id, format_url in file_.items():
if not url_or_none(format_url):
continue
ext, abr_str = format_id.split('-', 1) ext, abr_str = format_id.split('-', 1)
formats.append({ formats.append({
'format_id': format_id, 'format_id': format_id,
@ -73,85 +110,110 @@ class BandcampIE(InfoExtractor):
'acodec': ext, 'acodec': ext,
'abr': int_or_none(abr_str), 'abr': int_or_none(abr_str),
}) })
track = track_info.get('title')
track_id = str_or_none(track_info.get('track_id') or track_info.get('id'))
track_number = int_or_none(track_info.get('track_num'))
duration = float_or_none(track_info.get('duration'))
self._sort_formats(formats) def extract(key):
return self._search_regex(
r'\b%s\s*["\']?\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % key,
webpage, key, default=None, group='value')
return { artist = extract('artist')
'id': track_id, album = extract('album_title')
'title': data['title'], timestamp = unified_timestamp(
'thumbnail': thumbnail, extract('publish_date') or extract('album_publish_date'))
'formats': formats, release_date = unified_strdate(extract('album_release_date'))
'duration': float_or_none(data.get('duration')),
}
else:
raise ExtractorError('No free songs found')
download_link = m_download.group(1) download_link = self._search_regex(
video_id = self._search_regex( r'freeDownloadPage\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
r'(?ms)var TralbumData = .*?[{,]\s*id: (?P<id>\d+),?$', 'download link', default=None, group='url')
webpage, 'video id') if download_link:
track_id = self._search_regex(
r'(?ms)var TralbumData = .*?[{,]\s*id: (?P<id>\d+),?$',
webpage, 'track id')
download_webpage = self._download_webpage( download_webpage = self._download_webpage(
download_link, video_id, 'Downloading free downloads page') download_link, track_id, 'Downloading free downloads page')
blob = self._parse_json( blob = self._parse_json(
self._search_regex( self._search_regex(
r'data-blob=(["\'])(?P<blob>{.+?})\1', download_webpage, r'data-blob=(["\'])(?P<blob>{.+?})\1', download_webpage,
'blob', group='blob'), 'blob', group='blob'),
video_id, transform_source=unescapeHTML) track_id, transform_source=unescapeHTML)
info = blob['digital_items'][0] info = try_get(
blob, (lambda x: x['digital_items'][0],
lambda x: x['download_items'][0]), dict)
if info:
downloads = info.get('downloads')
if isinstance(downloads, dict):
if not track:
track = info.get('title')
if not artist:
artist = info.get('artist')
if not thumbnail:
thumbnail = info.get('thumb_url')
downloads = info['downloads'] download_formats = {}
track = info['title'] download_formats_list = blob.get('download_formats')
if isinstance(download_formats_list, list):
for f in blob['download_formats']:
name, ext = f.get('name'), f.get('file_extension')
if all(isinstance(x, compat_str) for x in (name, ext)):
download_formats[name] = ext.strip('.')
artist = info.get('artist') for format_id, f in downloads.items():
title = '%s - %s' % (artist, track) if artist else track format_url = f.get('url')
if not format_url:
continue
# Stat URL generation algorithm is reverse engineered from
# download_*_bundle_*.js
stat_url = update_url_query(
format_url.replace('/download/', '/statdownload/'), {
'.rand': int(time.time() * 1000 * random.random()),
})
format_id = f.get('encoding_name') or format_id
stat = self._download_json(
stat_url, track_id, 'Downloading %s JSON' % format_id,
transform_source=lambda s: s[s.index('{'):s.rindex('}') + 1],
fatal=False)
if not stat:
continue
retry_url = url_or_none(stat.get('retry_url'))
if not retry_url:
continue
formats.append({
'url': self._proto_relative_url(retry_url, 'http:'),
'ext': download_formats.get(format_id),
'format_id': format_id,
'format_note': f.get('description'),
'filesize': parse_filesize(f.get('size_mb')),
'vcodec': 'none',
})
download_formats = {}
for f in blob['download_formats']:
name, ext = f.get('name'), f.get('file_extension')
if all(isinstance(x, compat_str) for x in (name, ext)):
download_formats[name] = ext.strip('.')
formats = []
for format_id, f in downloads.items():
format_url = f.get('url')
if not format_url:
continue
# Stat URL generation algorithm is reverse engineered from
# download_*_bundle_*.js
stat_url = update_url_query(
format_url.replace('/download/', '/statdownload/'), {
'.rand': int(time.time() * 1000 * random.random()),
})
format_id = f.get('encoding_name') or format_id
stat = self._download_json(
stat_url, video_id, 'Downloading %s JSON' % format_id,
transform_source=lambda s: s[s.index('{'):s.rindex('}') + 1],
fatal=False)
if not stat:
continue
retry_url = url_or_none(stat.get('retry_url'))
if not retry_url:
continue
formats.append({
'url': self._proto_relative_url(retry_url, 'http:'),
'ext': download_formats.get(format_id),
'format_id': format_id,
'format_note': f.get('description'),
'filesize': parse_filesize(f.get('size_mb')),
'vcodec': 'none',
})
self._sort_formats(formats) self._sort_formats(formats)
title = '%s - %s' % (artist, track) if artist else track
if not duration:
duration = float_or_none(self._html_search_meta(
'duration', webpage, default=None))
return { return {
'id': video_id, 'id': track_id,
'title': title, 'title': title,
'thumbnail': info.get('thumb_url') or thumbnail, 'thumbnail': thumbnail,
'uploader': info.get('artist'), 'uploader': artist,
'artist': artist, 'timestamp': timestamp,
'release_date': release_date,
'duration': duration,
'track': track, 'track': track,
'track_number': track_number,
'track_id': track_id,
'artist': artist,
'album': album,
'formats': formats, 'formats': formats,
} }
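
The free-download branch derives a "stat" URL from each download URL and trims the response down to its JSON object before parsing. The sketch below illustrates both steps with the standard library only; `make_stat_url` and `strip_wrapper` are hypothetical helper names, and the URL and response body are invented.

```python
import json
import random
import time
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def make_stat_url(format_url):
    # Swap the path segment, then append a pseudo-random '.rand' parameter.
    parsed = urlparse(format_url.replace('/download/', '/statdownload/'))
    query = parse_qs(parsed.query)
    query['.rand'] = [str(int(time.time() * 1000 * random.random()))]
    return urlunparse(parsed._replace(query=urlencode(query, doseq=True)))


def strip_wrapper(s):
    # Keep only the outermost {...} so that a wrapped response parses as JSON.
    return s[s.index('{'):s.rindex('}') + 1]


# Hypothetical URL and response body, for illustration only.
stat_url = make_stat_url('https://example.com/download/track?id=1&fmt=mp3-320')
stat = json.loads(strip_wrapper('callback({"retry_url": "//example.com/file.mp3"});'))
print(stat_url)
print(stat['retry_url'])
```

The returned `retry_url` in this example is protocol-relative, which is why the extractor passes it through `_proto_relative_url` before using it.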


@ -211,6 +211,11 @@ class InfoExtractor(object):
If not explicitly set, calculated from timestamp. If not explicitly set, calculated from timestamp.
uploader_id: Nickname or id of the video uploader. uploader_id: Nickname or id of the video uploader.
uploader_url: Full URL to a personal webpage of the video uploader. uploader_url: Full URL to a personal webpage of the video uploader.
channel: Full name of the channel the video is uploaded on.
Note that channel fields may or may not repeat uploader
fields. This depends on a particular extractor.
channel_id: Id of the channel.
channel_url: Full URL to a channel webpage.
location: Physical location where the video was filmed. location: Physical location where the video was filmed.
subtitles: The available subtitles as a dictionary in the format subtitles: The available subtitles as a dictionary in the format
{tag: subformats}. "tag" is usually a language code, and {tag: subformats}. "tag" is usually a language code, and
@ -1701,9 +1706,9 @@ class InfoExtractor(object):
# However, this is not always respected, for example, [2] # However, this is not always respected, for example, [2]
# contains EXT-X-STREAM-INF tag which references AUDIO # contains EXT-X-STREAM-INF tag which references AUDIO
# rendition group but does not have CODECS and despite # rendition group but does not have CODECS and despite
# referencing audio group an audio group, it represents # referencing an audio group it represents a complete
# a complete (with audio and video) format. So, for such cases # (with audio and video) format. So, for such cases we will
# we will ignore references to rendition groups and treat them # ignore references to rendition groups and treat them
# as complete formats. # as complete formats.
if audio_group_id and codecs and f.get('vcodec') != 'none': if audio_group_id and codecs and f.get('vcodec') != 'none':
audio_group = groups.get(audio_group_id) audio_group = groups.get(audio_group_id)


@ -8,6 +8,7 @@ import zlib
from hashlib import sha1 from hashlib import sha1
from math import pow, sqrt, floor from math import pow, sqrt, floor
from .common import InfoExtractor from .common import InfoExtractor
from .vrv import VRVIE
from ..compat import ( from ..compat import (
compat_b64decode, compat_b64decode,
compat_etree_fromstring, compat_etree_fromstring,
@ -18,6 +19,8 @@ from ..compat import (
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
bytes_to_intlist, bytes_to_intlist,
extract_attributes,
float_or_none,
intlist_to_bytes, intlist_to_bytes,
int_or_none, int_or_none,
lowercase_escape, lowercase_escape,
@ -26,7 +29,6 @@ from ..utils import (
unified_strdate, unified_strdate,
urlencode_postdata, urlencode_postdata,
xpath_text, xpath_text,
extract_attributes,
) )
from ..aes import ( from ..aes import (
aes_cbc_decrypt, aes_cbc_decrypt,
@ -43,7 +45,7 @@ class CrunchyrollBaseIE(InfoExtractor):
data['req'] = 'RpcApi' + method data['req'] = 'RpcApi' + method
data = compat_urllib_parse_urlencode(data).encode('utf-8') data = compat_urllib_parse_urlencode(data).encode('utf-8')
return self._download_xml( return self._download_xml(
'http://www.crunchyroll.com/xml/', 'https://www.crunchyroll.com/xml/',
video_id, note, fatal=False, data=data, headers={ video_id, note, fatal=False, data=data, headers={
'Content-Type': 'application/x-www-form-urlencoded', 'Content-Type': 'application/x-www-form-urlencoded',
}) })
@ -139,7 +141,8 @@ class CrunchyrollBaseIE(InfoExtractor):
parsed_url._replace(query=compat_urllib_parse_urlencode(qs, True))) parsed_url._replace(query=compat_urllib_parse_urlencode(qs, True)))
class CrunchyrollIE(CrunchyrollBaseIE): class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
IE_NAME = 'crunchyroll'
_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:media(?:-|/\?id=)|[^/]*/[^/?&]*?)(?P<video_id>[0-9]+))(?:[/?&]|$)' _VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:media(?:-|/\?id=)|[^/]*/[^/?&]*?)(?P<video_id>[0-9]+))(?:[/?&]|$)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513', 'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
@ -148,7 +151,7 @@ class CrunchyrollIE(CrunchyrollBaseIE):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Wanna be the Strongest in the World Episode 1 An Idol-Wrestler is Born!', 'title': 'Wanna be the Strongest in the World Episode 1 An Idol-Wrestler is Born!',
'description': 'md5:2d17137920c64f2f49981a7797d275ef', 'description': 'md5:2d17137920c64f2f49981a7797d275ef',
'thumbnail': 'http://img1.ak.crunchyroll.com/i/spire1-tmb/20c6b5e10f1a47b10516877d3c039cae1380951166_full.jpg', 'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Yomiuri Telecasting Corporation (YTV)', 'uploader': 'Yomiuri Telecasting Corporation (YTV)',
'upload_date': '20131013', 'upload_date': '20131013',
'url': 're:(?!.*&amp)', 'url': 're:(?!.*&amp)',
@ -221,7 +224,7 @@ class CrunchyrollIE(CrunchyrollBaseIE):
'info_dict': { 'info_dict': {
'id': '535080', 'id': '535080',
'ext': 'mp4', 'ext': 'mp4',
'title': '11eyes Episode 1 Piros éjszaka - Red Night', 'title': '11eyes Episode 1 Red Night ~ Piros éjszaka',
'description': 'Kakeru and Yuka are thrown into an alternate nightmarish world they call "Red Night".', 'description': 'Kakeru and Yuka are thrown into an alternate nightmarish world they call "Red Night".',
'uploader': 'Marvelous AQL Inc.', 'uploader': 'Marvelous AQL Inc.',
'upload_date': '20091021', 'upload_date': '20091021',
@ -437,13 +440,22 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
if 'To view this, please log in to verify you are 18 or older.' in webpage: if 'To view this, please log in to verify you are 18 or older.' in webpage:
self.raise_login_required() self.raise_login_required()
media = self._parse_json(self._search_regex(
r'vilos\.config\.media\s*=\s*({.+?});',
webpage, 'vilos media', default='{}'), video_id)
media_metadata = media.get('metadata') or {}
language = self._search_regex(
r'(?:vilos\.config\.player\.language|LOCALE)\s*=\s*(["\'])(?P<lang>(?:(?!\1).)+)\1',
webpage, 'language', default=None, group='lang')
video_title = self._html_search_regex( video_title = self._html_search_regex(
r'(?s)<h1[^>]*>((?:(?!<h1).)*?<span[^>]+itemprop=["\']title["\'][^>]*>(?:(?!<h1).)+?)</h1>', r'(?s)<h1[^>]*>((?:(?!<h1).)*?<span[^>]+itemprop=["\']title["\'][^>]*>(?:(?!<h1).)+?)</h1>',
webpage, 'video_title') webpage, 'video_title')
video_title = re.sub(r' {2,}', ' ', video_title) video_title = re.sub(r' {2,}', ' ', video_title)
video_description = self._parse_json(self._html_search_regex( video_description = (self._parse_json(self._html_search_regex(
r'<script[^>]*>\s*.+?\[media_id=%s\].+?({.+?"description"\s*:.+?})\);' % video_id, r'<script[^>]*>\s*.+?\[media_id=%s\].+?({.+?"description"\s*:.+?})\);' % video_id,
webpage, 'description', default='{}'), video_id).get('description') webpage, 'description', default='{}'), video_id) or media_metadata).get('description')
if video_description: if video_description:
video_description = lowercase_escape(video_description.replace(r'\r\n', '\n')) video_description = lowercase_escape(video_description.replace(r'\r\n', '\n'))
video_upload_date = self._html_search_regex( video_upload_date = self._html_search_regex(
@ -456,92 +468,113 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
[r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', r'<div>\s*Publisher:\s*<span>\s*(.+?)\s*</span>\s*</div>'], [r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', r'<div>\s*Publisher:\s*<span>\s*(.+?)\s*</span>\s*</div>'],
webpage, 'video_uploader', fatal=False) webpage, 'video_uploader', fatal=False)
available_fmts = []
for a, fmt in re.findall(r'(<a[^>]+token=["\']showmedia\.([0-9]{3,4})p["\'][^>]+>)', webpage):
attrs = extract_attributes(a)
href = attrs.get('href')
if href and '/freetrial' in href:
continue
available_fmts.append(fmt)
if not available_fmts:
for p in (r'token=["\']showmedia\.([0-9]{3,4})p"', r'showmedia\.([0-9]{3,4})p'):
available_fmts = re.findall(p, webpage)
if available_fmts:
break
video_encode_ids = []
formats = [] formats = []
for fmt in available_fmts: for stream in media.get('streams', []):
stream_quality, stream_format = self._FORMAT_IDS[fmt] audio_lang = stream.get('audio_lang')
video_format = fmt + 'p' hardsub_lang = stream.get('hardsub_lang')
stream_infos = [] vrv_formats = self._extract_vrv_formats(
streamdata = self._call_rpc_api( stream.get('url'), video_id, stream.get('format'),
'VideoPlayer_GetStandardConfig', video_id, audio_lang, hardsub_lang)
'Downloading media info for %s' % video_format, data={ for f in vrv_formats:
'media_id': video_id, if not hardsub_lang:
'video_format': stream_format, f['preference'] = 1
'video_quality': stream_quality, language_preference = 0
'current_page': url, if audio_lang == language:
}) language_preference += 1
if streamdata is not None: if hardsub_lang == language:
stream_info = streamdata.find('./{default}preload/stream_info') language_preference += 1
if language_preference:
f['language_preference'] = language_preference
formats.extend(vrv_formats)
if not formats:
available_fmts = []
for a, fmt in re.findall(r'(<a[^>]+token=["\']showmedia\.([0-9]{3,4})p["\'][^>]+>)', webpage):
attrs = extract_attributes(a)
href = attrs.get('href')
if href and '/freetrial' in href:
continue
available_fmts.append(fmt)
if not available_fmts:
for p in (r'token=["\']showmedia\.([0-9]{3,4})p"', r'showmedia\.([0-9]{3,4})p'):
available_fmts = re.findall(p, webpage)
if available_fmts:
break
if not available_fmts:
available_fmts = self._FORMAT_IDS.keys()
video_encode_ids = []
for fmt in available_fmts:
stream_quality, stream_format = self._FORMAT_IDS[fmt]
video_format = fmt + 'p'
stream_infos = []
streamdata = self._call_rpc_api(
'VideoPlayer_GetStandardConfig', video_id,
'Downloading media info for %s' % video_format, data={
'media_id': video_id,
'video_format': stream_format,
'video_quality': stream_quality,
'current_page': url,
})
if streamdata is not None:
stream_info = streamdata.find('./{default}preload/stream_info')
if stream_info is not None:
stream_infos.append(stream_info)
stream_info = self._call_rpc_api(
'VideoEncode_GetStreamInfo', video_id,
'Downloading stream info for %s' % video_format, data={
'media_id': video_id,
'video_format': stream_format,
'video_encode_quality': stream_quality,
})
if stream_info is not None: if stream_info is not None:
stream_infos.append(stream_info) stream_infos.append(stream_info)
stream_info = self._call_rpc_api( for stream_info in stream_infos:
'VideoEncode_GetStreamInfo', video_id, video_encode_id = xpath_text(stream_info, './video_encode_id')
'Downloading stream info for %s' % video_format, data={ if video_encode_id in video_encode_ids:
'media_id': video_id, continue
'video_format': stream_format, video_encode_ids.append(video_encode_id)
'video_encode_quality': stream_quality,
})
if stream_info is not None:
stream_infos.append(stream_info)
for stream_info in stream_infos:
video_encode_id = xpath_text(stream_info, './video_encode_id')
if video_encode_id in video_encode_ids:
continue
video_encode_ids.append(video_encode_id)
video_file = xpath_text(stream_info, './file') video_file = xpath_text(stream_info, './file')
if not video_file: if not video_file:
continue continue
if video_file.startswith('http'): if video_file.startswith('http'):
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
video_file, video_id, 'mp4', entry_protocol='m3u8_native', video_file, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False)) m3u8_id='hls', fatal=False))
continue
video_url = xpath_text(stream_info, './host')
if not video_url:
continue
metadata = stream_info.find('./metadata')
format_info = {
'format': video_format,
'height': int_or_none(xpath_text(metadata, './height')),
'width': int_or_none(xpath_text(metadata, './width')),
}
if '.fplive.net/' in video_url:
video_url = re.sub(r'^rtmpe?://', 'http://', video_url.strip())
parsed_video_url = compat_urlparse.urlparse(video_url)
direct_video_url = compat_urlparse.urlunparse(parsed_video_url._replace(
netloc='v.lvlt.crcdn.net',
path='%s/%s' % (remove_end(parsed_video_url.path, '/'), video_file.split(':')[-1])))
if self._is_valid_url(direct_video_url, video_id, video_format):
format_info.update({
'format_id': 'http-' + video_format,
'url': direct_video_url,
})
formats.append(format_info)
continue continue
format_info.update({ video_url = xpath_text(stream_info, './host')
'format_id': 'rtmp-' + video_format, if not video_url:
'url': video_url, continue
'play_path': video_file, metadata = stream_info.find('./metadata')
'ext': 'flv', format_info = {
}) 'format': video_format,
formats.append(format_info) 'height': int_or_none(xpath_text(metadata, './height')),
self._sort_formats(formats, ('height', 'width', 'tbr', 'fps')) 'width': int_or_none(xpath_text(metadata, './width')),
}
if '.fplive.net/' in video_url:
video_url = re.sub(r'^rtmpe?://', 'http://', video_url.strip())
parsed_video_url = compat_urlparse.urlparse(video_url)
direct_video_url = compat_urlparse.urlunparse(parsed_video_url._replace(
netloc='v.lvlt.crcdn.net',
path='%s/%s' % (remove_end(parsed_video_url.path, '/'), video_file.split(':')[-1])))
if self._is_valid_url(direct_video_url, video_id, video_format):
format_info.update({
'format_id': 'http-' + video_format,
'url': direct_video_url,
})
formats.append(format_info)
continue
format_info.update({
'format_id': 'rtmp-' + video_format,
'url': video_url,
'play_path': video_file,
'ext': 'flv',
})
formats.append(format_info)
self._sort_formats(formats, ('preference', 'language_preference', 'height', 'width', 'tbr', 'fps'))
metadata = self._call_rpc_api( metadata = self._call_rpc_api(
'VideoPlayer_GetMediaMetadata', video_id, 'VideoPlayer_GetMediaMetadata', video_id,
@ -549,7 +582,17 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'media_id': video_id, 'media_id': video_id,
}) })
subtitles = self.extract_subtitles(video_id, webpage) subtitles = {}
for subtitle in media.get('subtitles', []):
subtitle_url = subtitle.get('url')
if not subtitle_url:
continue
subtitles.setdefault(subtitle.get('language', 'enUS'), []).append({
'url': subtitle_url,
'ext': subtitle.get('format', 'ass'),
})
if not subtitles:
subtitles = self.extract_subtitles(video_id, webpage)
# webpage provides more accurate data than series_title from XML # webpage provides more accurate data than series_title from XML
series = self._html_search_regex( series = self._html_search_regex(
@ -557,8 +600,8 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
webpage, 'series', fatal=False) webpage, 'series', fatal=False)
season = xpath_text(metadata, 'series_title') season = xpath_text(metadata, 'series_title')
episode = xpath_text(metadata, 'episode_title') episode = xpath_text(metadata, 'episode_title') or media_metadata.get('title')
episode_number = int_or_none(xpath_text(metadata, 'episode_number')) episode_number = int_or_none(xpath_text(metadata, 'episode_number') or media_metadata.get('episode_number'))
season_number = int_or_none(self._search_regex( season_number = int_or_none(self._search_regex(
r'(?s)<h\d[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h\d>\s*<h4>\s*Season (\d+)', r'(?s)<h\d[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h\d>\s*<h4>\s*Season (\d+)',
@ -568,7 +611,8 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'id': video_id, 'id': video_id,
'title': video_title, 'title': video_title,
'description': video_description, 'description': video_description,
'thumbnail': xpath_text(metadata, 'episode_image_url'), 'duration': float_or_none(media_metadata.get('duration'), 1000),
'thumbnail': xpath_text(metadata, 'episode_image_url') or media_metadata.get('thumbnail', {}).get('url'),
'uploader': video_uploader, 'uploader': video_uploader,
'upload_date': video_upload_date, 'upload_date': video_upload_date,
'series': series, 'series': series,
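The stream-ranking rules added in the Crunchyroll hunks above can be read in isolation. The following is a minimal sketch, assuming each stream is a plain dict with the `audio_lang` and `hardsub_lang` keys used in the diff; the helper name `rank_stream` and the sample language codes are hypothetical and not part of youtube-dl.

```python
def rank_stream(stream, page_language):
    """Return the (preference, language_preference) pair the hunk would assign.

    Hypothetical helper: it mirrors the loop over media['streams'] but works
    on plain dicts instead of extracted format entries. None means the key
    would be left unset on the format.
    """
    audio_lang = stream.get('audio_lang')
    hardsub_lang = stream.get('hardsub_lang')

    # Streams without burned-in subtitles get preference 1.
    preference = 1 if not hardsub_lang else None

    # One point each for audio and hardsub matching the page language.
    language_preference = 0
    if audio_lang == page_language:
        language_preference += 1
    if hardsub_lang == page_language:
        language_preference += 1

    # The diff only sets 'language_preference' when it is non-zero.
    return preference, language_preference or None


# Example with made-up streams for a page whose language is 'enUS'.
raw = rank_stream({'audio_lang': 'jaJP', 'hardsub_lang': None}, 'enUS')
dubbed = rank_stream({'audio_lang': 'enUS', 'hardsub_lang': 'enUS'}, 'enUS')
```

Because `preference` comes first in the sort key passed to `_sort_formats`, it is considered before `language_preference` and the resolution fields.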


@ -59,7 +59,7 @@ class DTubeIE(InfoExtractor):
try: try:
self.to_screen('%s: Checking %s video format URL' % (video_id, format_id)) self.to_screen('%s: Checking %s video format URL' % (video_id, format_id))
self._downloader._opener.open(video_url, timeout=5).close() self._downloader._opener.open(video_url, timeout=5).close()
except timeout as e: except timeout:
self.to_screen( self.to_screen(
'%s: %s URL is invalid, skipping' % (video_id, format_id)) '%s: %s URL is invalid, skipping' % (video_id, format_id))
continue continue


@ -9,6 +9,7 @@ from ..utils import (
encode_base_n, encode_base_n,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
merge_dicts,
parse_duration, parse_duration,
str_to_int, str_to_int,
url_or_none, url_or_none,
@ -25,10 +26,16 @@ class EpornerIE(InfoExtractor):
'display_id': 'Infamous-Tiffany-Teen-Strip-Tease-Video', 'display_id': 'Infamous-Tiffany-Teen-Strip-Tease-Video',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Infamous Tiffany Teen Strip Tease Video', 'title': 'Infamous Tiffany Teen Strip Tease Video',
'description': 'md5:764f39abf932daafa37485eb46efa152',
'timestamp': 1232520922,
'upload_date': '20090121',
'duration': 1838, 'duration': 1838,
'view_count': int, 'view_count': int,
'age_limit': 18, 'age_limit': 18,
}, },
'params': {
'proxy': '127.0.0.1:8118'
}
}, { }, {
# New (May 2016) URL layout # New (May 2016) URL layout
'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0/Star-Wars-XXX-Parody/', 'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0/Star-Wars-XXX-Parody/',
@ -104,12 +111,15 @@ class EpornerIE(InfoExtractor):
}) })
self._sort_formats(formats) self._sort_formats(formats)
duration = parse_duration(self._html_search_meta('duration', webpage)) json_ld = self._search_json_ld(webpage, display_id, default={})
duration = parse_duration(self._html_search_meta(
'duration', webpage, default=None))
view_count = str_to_int(self._search_regex( view_count = str_to_int(self._search_regex(
r'id="cinemaviews">\s*([0-9,]+)\s*<small>views', r'id="cinemaviews">\s*([0-9,]+)\s*<small>views',
webpage, 'view count', fatal=False)) webpage, 'view count', fatal=False))
return { return merge_dicts(json_ld, {
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
'title': title, 'title': title,
@ -117,4 +127,4 @@ class EpornerIE(InfoExtractor):
'view_count': view_count, 'view_count': view_count,
'formats': formats, 'formats': formats,
'age_limit': 18, 'age_limit': 18,
} })
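The Eporner change wraps the returned dict in `merge_dicts(json_ld, {...})`. The sketch below illustrates the kind of first-wins merge this relies on, assuming that earlier dicts take priority and that `None` values never overwrite anything. The function `merge_first_wins` is a hypothetical stand-in written for illustration; the real `merge_dicts` helper in `..utils` may differ in details.

```python
def merge_first_wins(*dicts):
    """Merge dicts left to right, keeping the first non-None value per key.

    Hypothetical simplification of the behaviour assumed for merge_dicts.
    """
    merged = {}
    for a_dict in dicts:
        for key, value in a_dict.items():
            if value is None:
                continue  # missing metadata never masks a later value
            if key not in merged:
                merged[key] = value
    return merged


# Made-up values: JSON-LD supplies the title, the page scrape fills the rest.
json_ld = {'title': 'From JSON-LD', 'duration': None}
scraped = {'title': 'From page', 'duration': 1838, 'id': '95008'}
info = merge_first_wins(json_ld, scraped)
```

Under these assumptions the JSON-LD fields win where present, and the scraped fields act as a fallback.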


@ -54,6 +54,7 @@ from .appletrailers import (
from .archiveorg import ArchiveOrgIE from .archiveorg import ArchiveOrgIE
from .arkena import ArkenaIE from .arkena import ArkenaIE
from .ard import ( from .ard import (
ARDBetaMediathekIE,
ARDIE, ARDIE,
ARDMediathekIE, ARDMediathekIE,
) )
@ -1085,6 +1086,7 @@ from .teachingchannel import TeachingChannelIE
from .teamcoco import TeamcocoIE from .teamcoco import TeamcocoIE
from .techtalks import TechTalksIE from .techtalks import TechTalksIE
from .ted import TEDIE from .ted import TEDIE
from .tele5 import Tele5IE
from .tele13 import Tele13IE from .tele13 import Tele13IE
from .telebruxelles import TeleBruxellesIE from .telebruxelles import TeleBruxellesIE
from .telecinco import TelecincoIE from .telecinco import TelecincoIE
@ -1453,8 +1455,20 @@ from .youtube import (
from .zapiks import ZapiksIE from .zapiks import ZapiksIE
from .zaq1 import Zaq1IE from .zaq1 import Zaq1IE
from .zattoo import ( from .zattoo import (
BBVTVIE,
EinsUndEinsTVIE,
EWETVIE,
GlattvisionTVIE,
MNetTVIE,
MyVisionTVIE,
NetPlusIE,
OsnatelTVIE,
QuantumTVIE,
QuicklineIE, QuicklineIE,
QuicklineLiveIE, QuicklineLiveIE,
SAKTVIE,
VTXTVIE,
WalyTVIE,
ZattooIE, ZattooIE,
ZattooLiveIE, ZattooLiveIE,
) )


@ -3,15 +3,45 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse from ..compat import (
compat_b64decode,
compat_str,
compat_urllib_parse_unquote,
compat_urlparse,
)
from ..utils import ( from ..utils import (
int_or_none,
parse_duration, parse_duration,
parse_iso8601, parse_iso8601,
str_or_none,
str_to_int, str_to_int,
try_get,
unified_timestamp,
url_or_none,
) )
class FourTubeBaseIE(InfoExtractor): class FourTubeBaseIE(InfoExtractor):
_TKN_HOST = 'tkn.kodicdn.com'
def _extract_formats(self, url, video_id, media_id, sources):
token_url = 'https://%s/%s/desktop/%s' % (
self._TKN_HOST, media_id, '+'.join(sources))
parsed_url = compat_urlparse.urlparse(url)
tokens = self._download_json(token_url, video_id, data=b'', headers={
'Origin': '%s://%s' % (parsed_url.scheme, parsed_url.hostname),
'Referer': url,
})
formats = [{
'url': tokens[format]['token'],
'format_id': format + 'p',
'resolution': format + 'p',
'quality': int(format),
} for format in sources]
self._sort_formats(formats)
return formats
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
kind, video_id, display_id = mobj.group('kind', 'id', 'display_id') kind, video_id, display_id = mobj.group('kind', 'id', 'display_id')
@ -68,21 +98,7 @@ class FourTubeBaseIE(InfoExtractor):
media_id = params[0] media_id = params[0]
sources = ['%s' % p for p in params[2]] sources = ['%s' % p for p in params[2]]
token_url = 'https://tkn.kodicdn.com/{0}/desktop/{1}'.format( formats = self._extract_formats(url, video_id, media_id, sources)
media_id, '+'.join(sources))
parsed_url = compat_urlparse.urlparse(url)
tokens = self._download_json(token_url, video_id, data=b'', headers={
'Origin': '%s://%s' % (parsed_url.scheme, parsed_url.hostname),
'Referer': url,
})
formats = [{
'url': tokens[format]['token'],
'format_id': format + 'p',
'resolution': format + 'p',
'quality': int(format),
} for format in sources]
self._sort_formats(formats)
return { return {
'id': video_id, 'id': video_id,
@ -164,6 +180,7 @@ class FuxIE(FourTubeBaseIE):
class PornTubeIE(FourTubeBaseIE): class PornTubeIE(FourTubeBaseIE):
_VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?porntube\.com/(?:videos/(?P<display_id>[^/]+)_|embed/)(?P<id>\d+)' _VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?porntube\.com/(?:videos/(?P<display_id>[^/]+)_|embed/)(?P<id>\d+)'
_URL_TEMPLATE = 'https://www.porntube.com/videos/video_%s' _URL_TEMPLATE = 'https://www.porntube.com/videos/video_%s'
_TKN_HOST = 'tkn.porntube.com'
_TESTS = [{ _TESTS = [{
'url': 'https://www.porntube.com/videos/teen-couple-doing-anal_7089759', 'url': 'https://www.porntube.com/videos/teen-couple-doing-anal_7089759',
'info_dict': { 'info_dict': {
@ -171,13 +188,32 @@ class PornTubeIE(FourTubeBaseIE):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Teen couple doing anal', 'title': 'Teen couple doing anal',
'uploader': 'Alexy', 'uploader': 'Alexy',
'uploader_id': 'Alexy', 'uploader_id': '91488',
'upload_date': '20150606', 'upload_date': '20150606',
'timestamp': 1433595647, 'timestamp': 1433595647,
'duration': 5052, 'duration': 5052,
'view_count': int, 'view_count': int,
'like_count': int, 'like_count': int,
'categories': list, 'age_limit': 18,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.porntube.com/videos/squirting-teen-ballerina-ecg_1331406',
'info_dict': {
'id': '1331406',
'ext': 'mp4',
'title': 'Squirting Teen Ballerina on ECG',
'uploader': 'Exploited College Girls',
'uploader_id': '665',
'channel': 'Exploited College Girls',
'channel_id': '665',
'upload_date': '20130920',
'timestamp': 1379685485,
'duration': 851,
'view_count': int,
'like_count': int,
'age_limit': 18, 'age_limit': 18,
}, },
'params': { 'params': {
@ -191,6 +227,55 @@ class PornTubeIE(FourTubeBaseIE):
'only_matching': True, 'only_matching': True,
}] }]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id, display_id = mobj.group('id', 'display_id')
webpage = self._download_webpage(url, display_id)
video = self._parse_json(
self._search_regex(
r'INITIALSTATE\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
webpage, 'data', group='value'), video_id,
transform_source=lambda x: compat_urllib_parse_unquote(
compat_b64decode(x).decode('utf-8')))['page']['video']
title = video['title']
media_id = video['mediaId']
sources = [compat_str(e['height'])
for e in video['encodings'] if e.get('height')]
formats = self._extract_formats(url, video_id, media_id, sources)
thumbnail = url_or_none(video.get('masterThumb'))
uploader = try_get(video, lambda x: x['user']['username'], compat_str)
uploader_id = str_or_none(try_get(
video, lambda x: x['user']['id'], int))
channel = try_get(video, lambda x: x['channel']['name'], compat_str)
channel_id = str_or_none(try_get(
video, lambda x: x['channel']['id'], int))
like_count = int_or_none(video.get('likes'))
dislike_count = int_or_none(video.get('dislikes'))
view_count = int_or_none(video.get('playsQty'))
duration = int_or_none(video.get('durationInSeconds'))
timestamp = unified_timestamp(video.get('publishedAt'))
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': thumbnail,
'uploader': uploader or channel,
'uploader_id': uploader_id or channel_id,
'channel': channel,
'channel_id': channel_id,
'timestamp': timestamp,
'like_count': like_count,
'dislike_count': dislike_count,
'view_count': view_count,
'duration': duration,
'age_limit': 18,
}
class PornerBrosIE(FourTubeBaseIE): class PornerBrosIE(FourTubeBaseIE):
_VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?pornerbros\.com/(?:videos/(?P<display_id>[^/]+)_|embed/)(?P<id>\d+)' _VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?pornerbros\.com/(?:videos/(?P<display_id>[^/]+)_|embed/)(?P<id>\d+)'
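The new `PornTubeIE._real_extract` above reads the page state by base64-decoding the `INITIALSTATE` string, percent-decoding the result, and parsing it as JSON. A minimal standalone sketch of that decoding order follows, using the Python 3 standard library in place of the `compat_*` wrappers. The function name `decode_initial_state` and the sample payload are hypothetical.

```python
import base64
import json
from urllib.parse import quote, unquote


def decode_initial_state(encoded):
    """Base64-decode, then percent-decode, then parse as JSON."""
    percent_encoded = base64.b64decode(encoded).decode('utf-8')
    return json.loads(unquote(percent_encoded))


# Hypothetical payload, encoded in the reverse order for demonstration.
sample = {'page': {'video': {
    'title': 'Example',
    'mediaId': 123,
    'encodings': [{'height': 720}, {'height': None}],
}}}
encoded = base64.b64encode(
    quote(json.dumps(sample)).encode('utf-8')).decode('ascii')

video = decode_initial_state(encoded)['page']['video']
# As in the diff, encodings without a height are skipped.
sources = [str(e['height']) for e in video['encodings'] if e.get('height')]
```

The resulting `sources` list is what the diff passes to `_extract_formats`, which joins the heights with `+` to build the token URL.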


@ -3023,7 +3023,7 @@ class GenericIE(InfoExtractor):
wapo_urls, video_id, video_title, ie=WashingtonPostIE.ie_key()) wapo_urls, video_id, video_title, ie=WashingtonPostIE.ie_key())
# Look for Mediaset embeds # Look for Mediaset embeds
mediaset_urls = MediasetIE._extract_urls(webpage) mediaset_urls = MediasetIE._extract_urls(self, webpage)
if mediaset_urls: if mediaset_urls:
return self.playlist_from_matches( return self.playlist_from_matches(
mediaset_urls, video_id, video_title, ie=MediasetIE.ie_key()) mediaset_urls, video_id, video_title, ie=MediasetIE.ie_key())
@ -3112,7 +3112,7 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches( return self.playlist_from_matches(
foxnews_urls, video_id, video_title, ie=FoxNewsIE.ie_key()) foxnews_urls, video_id, video_title, ie=FoxNewsIE.ie_key())
sharevideos_urls = [mobj.group('url') for mobj in re.finditer( sharevideos_urls = [sharevideos_mobj.group('url') for sharevideos_mobj in re.finditer(
r'<iframe[^>]+?\bsrc\s*=\s*(["\'])(?P<url>(?:https?:)?//embed\.share-videos\.se/auto/embed/\d+\?.*?\buid=\d+.*?)\1', r'<iframe[^>]+?\bsrc\s*=\s*(["\'])(?P<url>(?:https?:)?//embed\.share-videos\.se/auto/embed/\d+\?.*?\buid=\d+.*?)\1',
webpage)] webpage)]
if sharevideos_urls: if sharevideos_urls:
@ -3150,9 +3150,13 @@ class GenericIE(InfoExtractor):
jwplayer_data = self._find_jwplayer_data( jwplayer_data = self._find_jwplayer_data(
webpage, video_id, transform_source=js_to_json) webpage, video_id, transform_source=js_to_json)
if jwplayer_data: if jwplayer_data:
info = self._parse_jwplayer_data( try:
jwplayer_data, video_id, require_title=False, base_url=url) info = self._parse_jwplayer_data(
return merge_dicts(info, info_dict) jwplayer_data, video_id, require_title=False, base_url=url)
return merge_dicts(info, info_dict)
except ExtractorError:
# See https://github.com/rg3/youtube-dl/pull/16735
pass
# Video.js embed # Video.js embed
mobj = re.search( mobj = re.search(


@ -1,49 +1,55 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import hashlib
import hmac
import time
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str from ..compat import compat_HTTPError
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
try_get,
) )
class HotStarBaseIE(InfoExtractor): class HotStarBaseIE(InfoExtractor):
_GEO_COUNTRIES = ['IN'] _AKAMAI_ENCRYPTION_KEY = b'\x05\xfc\x1a\x01\xca\xc9\x4b\xc4\x12\xfc\x53\x12\x07\x75\xf9\xee'
def _download_json(self, *args, **kwargs): def _call_api(self, path, video_id, query_name='contentId'):
response = super(HotStarBaseIE, self)._download_json(*args, **kwargs) st = int(time.time())
if response['resultCode'] != 'OK': exp = st + 6000
if kwargs.get('fatal'): auth = 'st=%d~exp=%d~acl=/*' % (st, exp)
raise ExtractorError( auth += '~hmac=' + hmac.new(self._AKAMAI_ENCRYPTION_KEY, auth.encode(), hashlib.sha256).hexdigest()
response['errorDescription'], expected=True) response = self._download_json(
return None 'https://api.hotstar.com/' + path,
return response['resultObj'] video_id, headers={
'hotstarauth': auth,
def _download_content_info(self, content_id): 'x-country-code': 'IN',
return self._download_json( 'x-platform-code': 'JIO',
'https://account.hotstar.com/AVS/besc', content_id, query={ }, query={
'action': 'GetAggregatedContentDetails', query_name: video_id,
'appVersion': '5.0.40', 'tas': 10000,
'channel': 'PCTV', })
'contentId': content_id, if response['statusCode'] != 'OK':
})['contentInfo'][0] raise ExtractorError(
response['body']['message'], expected=True)
return response['body']['results']
class HotStarIE(HotStarBaseIE): class HotStarIE(HotStarBaseIE):
IE_NAME = 'hotstar'
_VALID_URL = r'https?://(?:www\.)?hotstar\.com/(?:.+?[/-])?(?P<id>\d{10})' _VALID_URL = r'https?://(?:www\.)?hotstar\.com/(?:.+?[/-])?(?P<id>\d{10})'
_TESTS = [{ _TESTS = [{
'url': 'http://www.hotstar.com/on-air-with-aib--english-1000076273', 'url': 'https://www.hotstar.com/can-you-not-spread-rumours/1000076273',
'info_dict': { 'info_dict': {
'id': '1000076273', 'id': '1000076273',
'ext': 'mp4', 'ext': 'mp4',
'title': 'On Air With AIB', 'title': 'Can You Not Spread Rumours?',
'description': 'md5:c957d8868e9bc793ccb813691cc4c434', 'description': 'md5:c957d8868e9bc793ccb813691cc4c434',
'timestamp': 1447227000, 'timestamp': 1447248600,
'upload_date': '20151111', 'upload_date': '20151111',
'duration': 381, 'duration': 381,
}, },
@ -58,47 +64,47 @@ class HotStarIE(HotStarBaseIE):
'url': 'http://www.hotstar.com/1000000515', 'url': 'http://www.hotstar.com/1000000515',
'only_matching': True, 'only_matching': True,
}] }]
_GEO_BYPASS = False
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
video_data = self._download_content_info(video_id) webpage = self._download_webpage(url, video_id)
app_state = self._parse_json(self._search_regex(
r'<script>window\.APP_STATE\s*=\s*({.+?})</script>',
webpage, 'app state'), video_id)
video_data = {}
for v in app_state.values():
content = try_get(v, lambda x: x['initialState']['contentData']['content'], dict)
if content and content.get('contentId') == video_id:
video_data = content
title = video_data['episodeTitle'] title = video_data['title']
if video_data.get('encrypted') == 'Y': if video_data.get('drmProtected'):
raise ExtractorError('This video is DRM protected.', expected=True) raise ExtractorError('This video is DRM protected.', expected=True)
formats = [] formats = []
for f in ('JIO',): format_data = self._call_api('h/v1/play', video_id)['item']
format_data = self._download_json( format_url = format_data['playbackUrl']
'http://getcdn.hotstar.com/AVS/besc', ext = determine_ext(format_url)
video_id, 'Downloading %s JSON metadata' % f, if ext == 'm3u8':
fatal=False, query={ try:
'action': 'GetCDN', formats.extend(self._extract_m3u8_formats(
'asJson': 'Y', format_url, video_id, 'mp4', m3u8_id='hls'))
'channel': f, except ExtractorError as e:
'id': video_id, if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
'type': 'VOD', self.raise_geo_restricted(countries=['IN'])
}) raise
if format_data: elif ext == 'f4m':
format_url = format_data.get('src') # produce broken files
if not format_url: pass
continue else:
ext = determine_ext(format_url) formats.append({
if ext == 'm3u8': 'url': format_url,
formats.extend(self._extract_m3u8_formats( 'width': int_or_none(format_data.get('width')),
format_url, video_id, 'mp4', 'height': int_or_none(format_data.get('height')),
m3u8_id='hls', fatal=False)) })
elif ext == 'f4m':
# produce broken files
continue
else:
formats.append({
'url': format_url,
'width': int_or_none(format_data.get('width')),
'height': int_or_none(format_data.get('height')),
})
self._sort_formats(formats) self._sort_formats(formats)
return { return {
@ -106,57 +112,43 @@ class HotStarIE(HotStarBaseIE):
'title': title, 'title': title,
'description': video_data.get('description'), 'description': video_data.get('description'),
'duration': int_or_none(video_data.get('duration')), 'duration': int_or_none(video_data.get('duration')),
'timestamp': int_or_none(video_data.get('broadcastDate')), 'timestamp': int_or_none(video_data.get('broadcastDate') or video_data.get('startDate')),
'formats': formats, 'formats': formats,
'channel': video_data.get('channelName'),
'channel_id': video_data.get('channelId'),
'series': video_data.get('showName'),
'season': video_data.get('seasonName'),
'season_number': int_or_none(video_data.get('seasonNo')),
'season_id': video_data.get('seasonId'),
'episode': title, 'episode': title,
'episode_number': int_or_none(video_data.get('episodeNumber')), 'episode_number': int_or_none(video_data.get('episodeNo')),
'series': video_data.get('contentTitle'),
} }
class HotStarPlaylistIE(HotStarBaseIE): class HotStarPlaylistIE(HotStarBaseIE):
IE_NAME = 'hotstar:playlist' IE_NAME = 'hotstar:playlist'
_VALID_URL = r'(?P<url>https?://(?:www\.)?hotstar\.com/tv/[^/]+/(?P<content_id>\d+))/(?P<type>[^/]+)/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?hotstar\.com/tv/[^/]+/s-\w+/list/[^/]+/t-(?P<id>\w+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.hotstar.com/tv/pratidaan/14982/episodes/14812/9993', 'url': 'https://www.hotstar.com/tv/savdhaan-india/s-26/list/popular-clips/t-3_2_26',
'info_dict': { 'info_dict': {
'id': '14812', 'id': '3_2_26',
}, },
'playlist_mincount': 75, 'playlist_mincount': 20,
}, { }, {
'url': 'http://www.hotstar.com/tv/pratidaan/14982/popular-clips/9998/9998', 'url': 'https://www.hotstar.com/tv/savdhaan-india/s-26/list/extras/t-2480',
'only_matching': True, 'only_matching': True,
}] }]
_ITEM_TYPES = {
'episodes': 'EPISODE',
'popular-clips': 'CLIPS',
}
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) playlist_id = self._match_id(url)
base_url = mobj.group('url')
content_id = mobj.group('content_id')
playlist_type = mobj.group('type')
content_info = self._download_content_info(content_id) collection = self._call_api('o/v1/tray/find', playlist_id, 'uqId')
playlist_id = compat_str(content_info['categoryId'])
collection = self._download_json(
'https://search.hotstar.com/AVS/besc', playlist_id, query={
'action': 'SearchContents',
'appVersion': '5.0.40',
'channel': 'PCTV',
'moreFilters': 'series:%s;' % playlist_id,
'query': '*',
'searchOrder': 'last_broadcast_date desc,year desc,title asc',
'type': self._ITEM_TYPES.get(playlist_type, 'EPISODE'),
})
entries = [ entries = [
self.url_result( self.url_result(
'%s/_/%s' % (base_url, video['contentId']), 'https://www.hotstar.com/%s' % video['contentId'],
ie=HotStarIE.ie_key(), video_id=video['contentId']) ie=HotStarIE.ie_key(), video_id=video['contentId'])
for video in collection['response']['docs'] for video in collection['assets']['items']
if video.get('contentId')] if video.get('contentId')]
return self.playlist_result(entries, playlist_id) return self.playlist_result(entries, playlist_id)
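The `hotstarauth` header built in `_call_api` above is an HMAC-SHA256 over a short ACL string. The sketch below reproduces only that token construction so it can be checked without network access. The function name `build_hotstar_auth` and its `start` and `lifetime` parameters are hypothetical; the key bytes and string layout are copied from the diff.

```python
import hashlib
import hmac
import time

# Key bytes copied from _AKAMAI_ENCRYPTION_KEY in the diff above.
AKAMAI_ENCRYPTION_KEY = (
    b'\x05\xfc\x1a\x01\xca\xc9\x4b\xc4\x12\xfc\x53\x12\x07\x75\xf9\xee')


def build_hotstar_auth(start=None, lifetime=6000, key=AKAMAI_ENCRYPTION_KEY):
    """Build the 'st=...~exp=...~acl=/*~hmac=...' token string.

    start is a Unix timestamp; it defaults to the current time, as in the
    diff. Passing it explicitly makes the output reproducible.
    """
    st = int(time.time()) if start is None else int(start)
    exp = st + lifetime
    auth = 'st=%d~exp=%d~acl=/*' % (st, exp)
    # The signature covers the string before the '~hmac=' suffix is added.
    digest = hmac.new(key, auth.encode(), hashlib.sha256).hexdigest()
    return auth + '~hmac=' + digest


token = build_hotstar_auth(start=1000)
```

Whether the service still accepts tokens of this shape is not something the diff establishes; the sketch only shows how the string is assembled.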


@ -7,7 +7,7 @@ from ..utils import unified_timestamp
class InternazionaleIE(InfoExtractor): class InternazionaleIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?internazionale\.it/video/(?:[^/]+/)*(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?internazionale\.it/video/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = { _TESTS = [{
'url': 'https://www.internazionale.it/video/2015/02/19/richard-linklater-racconta-una-scena-di-boyhood', 'url': 'https://www.internazionale.it/video/2015/02/19/richard-linklater-racconta-una-scena-di-boyhood',
'md5': '3e39d32b66882c1218e305acbf8348ca', 'md5': '3e39d32b66882c1218e305acbf8348ca',
'info_dict': { 'info_dict': {
@ -23,7 +23,23 @@ class InternazionaleIE(InfoExtractor):
'params': { 'params': {
'format': 'bestvideo', 'format': 'bestvideo',
}, },
} }, {
'url': 'https://www.internazionale.it/video/2018/08/29/telefono-stare-con-noi-stessi',
'md5': '9db8663704cab73eb972d1cee0082c79',
'info_dict': {
'id': '761344',
'display_id': 'telefono-stare-con-noi-stessi',
'ext': 'mp4',
'title': 'Usiamo il telefono per evitare di stare con noi stessi',
'description': 'md5:75ccfb0d6bcefc6e7428c68b4aa1fe44',
'timestamp': 1535528954,
'upload_date': '20180829',
'thumbnail': r're:^https?://.*\.jpg$',
},
'params': {
'format': 'bestvideo',
},
}]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
@ -40,8 +56,13 @@ class InternazionaleIE(InfoExtractor):
DATA_RE % 'job-id', webpage, 'video id', group='value') DATA_RE % 'job-id', webpage, 'video id', group='value')
video_path = self._search_regex( video_path = self._search_regex(
DATA_RE % 'video-path', webpage, 'video path', group='value') DATA_RE % 'video-path', webpage, 'video path', group='value')
video_available_abroad = self._search_regex(
DATA_RE % 'video-available_abroad', webpage,
'video available abroad', default='1', group='value')
video_available_abroad = video_available_abroad == '1'
video_base = 'https://video.internazionale.it/%s/%s.' % (video_path, video_id) video_base = 'https://video%s.internazionale.it/%s/%s.' % \
('' if video_available_abroad else '-ita', video_path, video_id)
formats = self._extract_m3u8_formats( formats = self._extract_m3u8_formats(
video_base + 'm3u8', display_id, 'mp4', video_base + 'm3u8', display_id, 'mp4',
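The Internazionale change above switches the media host depending on the `data-video-available_abroad` attribute. A minimal sketch of that URL construction follows, assuming the attribute value arrives as the string `'1'` or something else, with `'1'` as the default as in the diff. The function name `build_video_base` and the sample path are hypothetical.

```python
def build_video_base(video_path, video_id, available_abroad_flag='1'):
    """Return the base URL to which 'm3u8' or another extension is appended."""
    available_abroad = available_abroad_flag == '1'
    # Videos not available abroad are served from the '-ita' host.
    return 'https://video%s.internazionale.it/%s/%s.' % (
        '' if available_abroad else '-ita', video_path, video_id)


# Made-up path; only the host part differs between the two cases.
worldwide = build_video_base('2018/08/29', '761344')
italy_only = build_video_base('2018/08/29', '761344', '0')
```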


@ -12,7 +12,7 @@ from ..utils import (
class IPrimaIE(InfoExtractor): class IPrimaIE(InfoExtractor):
_VALID_URL = r'https?://play\.iprima\.cz/(?:.+/)?(?P<id>[^?#]+)' _VALID_URL = r'https?://(?:play|prima)\.iprima\.cz/(?:.+/)?(?P<id>[^?#]+)'
_GEO_BYPASS = False _GEO_BYPASS = False
_TESTS = [{ _TESTS = [{
@ -33,14 +33,27 @@ class IPrimaIE(InfoExtractor):
# geo restricted # geo restricted
'url': 'http://play.iprima.cz/closer-nove-pripady/closer-nove-pripady-iv-1', 'url': 'http://play.iprima.cz/closer-nove-pripady/closer-nove-pripady-iv-1',
'only_matching': True, 'only_matching': True,
}, {
# iframe api.play-backend.iprima.cz
'url': 'https://prima.iprima.cz/my-little-pony/mapa-znameni-2-2',
'only_matching': True,
}, {
# iframe prima.iprima.cz
'url': 'https://prima.iprima.cz/porady/jak-se-stavi-sen/rodina-rathousova-praha',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
self._set_cookie('play.iprima.cz', 'ott_adult_confirmed', '1')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
video_id = self._search_regex(r'data-product="([^"]+)">', webpage, 'real id') video_id = self._search_regex(
(r'<iframe[^>]+\bsrc=["\'](?:https?:)?//(?:api\.play-backend\.iprima\.cz/prehravac/embedded|prima\.iprima\.cz/[^/]+/[^/]+)\?.*?\bid=(p\d+)',
r'data-product="([^"]+)">'),
webpage, 'real id')
playerpage = self._download_webpage( playerpage = self._download_webpage(
'http://play.iprima.cz/prehravac/init', 'http://play.iprima.cz/prehravac/init',


@ -26,8 +26,15 @@ class JamendoBaseIE(InfoExtractor):
class JamendoIE(JamendoBaseIE): class JamendoIE(JamendoBaseIE):
_VALID_URL = r'https?://(?:www\.)?jamendo\.com/track/(?P<id>[0-9]+)/(?P<display_id>[^/?#&]+)' _VALID_URL = r'''(?x)
_TEST = { https?://
(?:
licensing\.jamendo\.com/[^/]+|
(?:www\.)?jamendo\.com
)
/track/(?P<id>[0-9]+)/(?P<display_id>[^/?#&]+)
'''
_TESTS = [{
'url': 'https://www.jamendo.com/track/196219/stories-from-emona-i', 'url': 'https://www.jamendo.com/track/196219/stories-from-emona-i',
'md5': '6e9e82ed6db98678f171c25a8ed09ffd', 'md5': '6e9e82ed6db98678f171c25a8ed09ffd',
'info_dict': { 'info_dict': {
@ -40,14 +47,19 @@ class JamendoIE(JamendoBaseIE):
'duration': 210, 'duration': 210,
'thumbnail': r're:^https?://.*\.jpg' 'thumbnail': r're:^https?://.*\.jpg'
} }
} }, {
'url': 'https://licensing.jamendo.com/en/track/1496667/energetic-rock',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
mobj = self._VALID_URL_RE.match(url) mobj = self._VALID_URL_RE.match(url)
track_id = mobj.group('id') track_id = mobj.group('id')
display_id = mobj.group('display_id') display_id = mobj.group('display_id')
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(
'https://www.jamendo.com/track/%s/%s' % (track_id, display_id),
display_id)
title, artist, track = self._extract_meta(webpage) title, artist, track = self._extract_meta(webpage)
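Review note: the effect of the widened pattern plus the rebuilt download URL can be sketched as follows. The pattern is copied from the hunk; the helper name `canonical_track_url` is an assumption made for the example.

```python
import re

# Verbose-mode pattern as in the hunk: accepts both www.jamendo.com and
# licensing.jamendo.com/<lang> track URLs.
VALID_URL = r'''(?x)
    https?://
    (?:
        licensing\.jamendo\.com/[^/]+|
        (?:www\.)?jamendo\.com
    )
    /track/(?P<id>[0-9]+)/(?P<display_id>[^/?#&]+)
'''


def canonical_track_url(url):
    """Map any accepted track URL onto the www.jamendo.com form."""
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        return None
    return 'https://www.jamendo.com/track/%s/%s' % (
        mobj.group('id'), mobj.group('display_id'))


print(canonical_track_url(
    'https://licensing.jamendo.com/en/track/1496667/energetic-rock'))
```

Downloading the canonical page rather than the matched URL lets the existing metadata extraction run unchanged for licensing links.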


@ -4,6 +4,11 @@ from __future__ import unicode_literals
import re import re
from .theplatform import ThePlatformBaseIE from .theplatform import ThePlatformBaseIE
from ..compat import (
compat_parse_qs,
compat_str,
compat_urllib_parse_urlparse,
)
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
@ -76,12 +81,33 @@ class MediasetIE(ThePlatformBaseIE):
}] }]
@staticmethod @staticmethod
def _extract_urls(webpage): def _extract_urls(ie, webpage):
return [ def _qs(url):
mobj.group('url') return compat_parse_qs(compat_urllib_parse_urlparse(url).query)
for mobj in re.finditer(
r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>https?://(?:www\.)?video\.mediaset\.it/player/playerIFrame(?:Twitter)?\.shtml\?.*?\bid=\d+.*?)\1', def _program_guid(qs):
webpage)] return qs.get('programGuid', [None])[0]
entries = []
for mobj in re.finditer(
r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?video\.mediaset\.it/player/playerIFrame(?:Twitter)?\.shtml.*?)\1',
webpage):
embed_url = mobj.group('url')
embed_qs = _qs(embed_url)
program_guid = _program_guid(embed_qs)
if program_guid:
entries.append(embed_url)
continue
video_id = embed_qs.get('id', [None])[0]
if not video_id:
continue
urlh = ie._request_webpage(
embed_url, video_id, note='Following embed URL redirect')
embed_url = compat_str(urlh.geturl())
program_guid = _program_guid(_qs(embed_url))
if program_guid:
entries.append(embed_url)
return entries
def _real_extract(self, url): def _real_extract(self, url):
guid = self._match_id(url) guid = self._match_id(url)
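Review note: the `_qs` and `_program_guid` helpers in this hunk amount to a query-string lookup. A minimal sketch using the Python 3 standard library in place of the `compat_*` wrappers follows; the embed URLs in the usage lines are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def program_guid(embed_url):
    """Return the first programGuid query value, or None if absent."""
    qs = parse_qs(urlparse(embed_url).query)
    # parse_qs maps each key to a list of values; take the first one.
    return qs.get('programGuid', [None])[0]


# Hypothetical embed URLs for illustration only.
print(program_guid('//www.video.mediaset.it/player/playerIFrame.shtml?id=1&programGuid=ABC'))
print(program_guid('//www.video.mediaset.it/player/playerIFrame.shtml?id=1'))
```

When the GUID is missing, the hunk falls back to requesting the embed URL and re-running the same lookup on the redirect target.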


@ -167,9 +167,9 @@ class MotherlessGroupIE(InfoExtractor):
if not entries: if not entries:
entries = [ entries = [
self.url_result( self.url_result(
compat_urlparse.urljoin(base, '/' + video_id), compat_urlparse.urljoin(base, '/' + entry_id),
ie=MotherlessIE.ie_key(), video_id=video_id) ie=MotherlessIE.ie_key(), video_id=entry_id)
for video_id in orderedSet(re.findall( for entry_id in orderedSet(re.findall(
r'data-codename=["\']([A-Z0-9]+)', webpage))] r'data-codename=["\']([A-Z0-9]+)', webpage))]
return entries return entries


@ -7,6 +7,7 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from .theplatform import ThePlatformIE from .theplatform import ThePlatformIE
from .adobepass import AdobePassIE from .adobepass import AdobePassIE
from ..compat import compat_urllib_parse_unquote
from ..utils import ( from ..utils import (
find_xpath_attr, find_xpath_attr,
smuggle_url, smuggle_url,
@ -75,11 +76,16 @@ class NBCIE(AdobePassIE):
'url': 'https://www.nbc.com/classic-tv/charles-in-charge/video/charles-in-charge-pilot/n3310', 'url': 'https://www.nbc.com/classic-tv/charles-in-charge/video/charles-in-charge-pilot/n3310',
'only_matching': True, 'only_matching': True,
}, },
{
# Percent escaped url
'url': 'https://www.nbc.com/up-all-night/video/day-after-valentine%27s-day/n2189',
'only_matching': True,
}
] ]
def _real_extract(self, url): def _real_extract(self, url):
permalink, video_id = re.match(self._VALID_URL, url).groups() permalink, video_id = re.match(self._VALID_URL, url).groups()
permalink = 'http' + permalink permalink = 'http' + compat_urllib_parse_unquote(permalink)
response = self._download_json( response = self._download_json(
'https://api.nbc.com/v3/videos', video_id, query={ 'https://api.nbc.com/v3/videos', video_id, query={
'filter[permalink]': permalink, 'filter[permalink]': permalink,
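Review note: the percent-unescaping added here can be sketched with the standard library. This assumes the `permalink` group captured by `_VALID_URL` starts at `://` (implied by the `'http' +` prefix in the hunk); the helper name `api_permalink` is made up for the example.

```python
from urllib.parse import unquote


def api_permalink(permalink_tail):
    """Build the permalink filter value from the scheme-less URL tail."""
    # Decode escapes such as %27 before the value is used as a filter.
    return 'http' + unquote(permalink_tail)


print(api_permalink(
    "://www.nbc.com/up-all-night/video/day-after-valentine%27s-day/n2189"))
```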


@ -252,7 +252,7 @@ class NiconicoIE(InfoExtractor):
}, },
'timing_constraint': 'unlimited' 'timing_constraint': 'unlimited'
} }
})) }).encode())
resolution = video_quality.get('resolution', {}) resolution = video_quality.get('resolution', {})
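Review note: the `.encode()` added above matters because `json.dumps` returns text, while a POST body on Python 3 has to be bytes. A minimal sketch, with a made-up helper name and payload:

```python
import json


def build_post_body(payload):
    """Serialize a payload to a JSON request body as bytes."""
    # json.dumps returns str; encode() turns it into UTF-8 bytes.
    return json.dumps(payload).encode()


body = build_post_body({'timing_constraint': 'unlimited'})
print(type(body).__name__)
```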


@ -243,7 +243,7 @@ class PhantomJSwrapper(object):
class OpenloadIE(InfoExtractor): class OpenloadIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:openload\.(?:co|io|link)|oload\.(?:tv|stream|site|xyz|win|download))/(?:f|embed)/(?P<id>[a-zA-Z0-9-_]+)' _VALID_URL = r'https?://(?:www\.)?(?:openload\.(?:co|io|link)|oload\.(?:tv|stream|site|xyz|win|download|cloud))/(?:f|embed)/(?P<id>[a-zA-Z0-9-_]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://openload.co/f/kUEfGclsU9o', 'url': 'https://openload.co/f/kUEfGclsU9o',
@ -307,6 +307,9 @@ class OpenloadIE(InfoExtractor):
}, { }, {
'url': 'https://oload.download/f/kUEfGclsU9o', 'url': 'https://oload.download/f/kUEfGclsU9o',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://oload.cloud/f/4ZDnBXRWiB8',
'only_matching': True,
}, { }, {
# Its title has not got its extension but url has it # Its title has not got its extension but url has it
'url': 'https://oload.download/f/N4Otkw39VCw/Tomb.Raider.2018.HDRip.XviD.AC3-EVO.avi.mp4', 'url': 'https://oload.download/f/N4Otkw39VCw/Tomb.Raider.2018.HDRip.XviD.AC3-EVO.avi.mp4',


@ -2,31 +2,38 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
float_or_none, try_get,
int_or_none, urljoin,
parse_iso8601,
xpath_text,
) )
class PhilharmonieDeParisIE(InfoExtractor): class PhilharmonieDeParisIE(InfoExtractor):
IE_DESC = 'Philharmonie de Paris' IE_DESC = 'Philharmonie de Paris'
_VALID_URL = r'https?://live\.philharmoniedeparis\.fr/(?:[Cc]oncert/|misc/Playlist\.ashx\?id=)(?P<id>\d+)' _VALID_URL = r'''(?x)
https?://
(?:
live\.philharmoniedeparis\.fr/(?:[Cc]oncert/|misc/Playlist\.ashx\?id=)|
pad\.philharmoniedeparis\.fr/doc/CIMU/
)
(?P<id>\d+)
'''
_TESTS = [{ _TESTS = [{
'url': 'http://pad.philharmoniedeparis.fr/doc/CIMU/1086697/jazz-a-la-villette-knower',
'md5': 'a0a4b195f544645073631cbec166a2c2',
'info_dict': {
'id': '1086697',
'ext': 'mp4',
'title': 'Jazz à la Villette : Knower',
},
}, {
'url': 'http://live.philharmoniedeparis.fr/concert/1032066.html', 'url': 'http://live.philharmoniedeparis.fr/concert/1032066.html',
'info_dict': { 'info_dict': {
'id': '1032066', 'id': '1032066',
'ext': 'flv', 'title': 'md5:0a031b81807b3593cffa3c9a87a167a0',
'title': 'md5:d1f5585d87d041d07ce9434804bc8425',
'timestamp': 1428179400,
'upload_date': '20150404',
'duration': 6592.278,
}, },
'params': { 'playlist_mincount': 2,
# rtmp download
'skip_download': True,
}
}, { }, {
'url': 'http://live.philharmoniedeparis.fr/Concert/1030324.html', 'url': 'http://live.philharmoniedeparis.fr/Concert/1030324.html',
'only_matching': True, 'only_matching': True,
@ -34,45 +41,60 @@ class PhilharmonieDeParisIE(InfoExtractor):
'url': 'http://live.philharmoniedeparis.fr/misc/Playlist.ashx?id=1030324&track=&lang=fr', 'url': 'http://live.philharmoniedeparis.fr/misc/Playlist.ashx?id=1030324&track=&lang=fr',
'only_matching': True, 'only_matching': True,
}] }]
_LIVE_URL = 'https://live.philharmoniedeparis.fr'
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
concert = self._download_xml( config = self._download_json(
'http://live.philharmoniedeparis.fr/misc/Playlist.ashx?id=%s' % video_id, '%s/otoPlayer/config.ashx' % self._LIVE_URL, video_id, query={
video_id).find('./concert') 'id': video_id,
'lang': 'fr-FR',
})
formats = [] def extract_entry(source):
info_dict = { if not isinstance(source, dict):
'id': video_id, return
'title': xpath_text(concert, './titre', 'title', fatal=True), title = source.get('title')
'formats': formats, if not title:
} return
files = source.get('files')
fichiers = concert.find('./fichiers') if not isinstance(files, dict):
stream = fichiers.attrib['serveurstream'] return
for fichier in fichiers.findall('./fichier'): format_urls = set()
info_dict['duration'] = float_or_none(fichier.get('timecodefin')) formats = []
for quality, (format_id, suffix) in enumerate([('lq', ''), ('hq', '_hd')]): for format_id in ('mobile', 'desktop'):
format_url = fichier.get('url%s' % suffix) format_url = try_get(
if not format_url: files, lambda x: x[format_id]['file'], compat_str)
if not format_url or format_url in format_urls:
continue continue
formats.append({ format_urls.add(format_url)
'url': stream, m3u8_url = urljoin(self._LIVE_URL, format_url)
'play_path': format_url, formats.extend(self._extract_m3u8_formats(
'ext': 'flv', m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
'format_id': format_id, m3u8_id='hls', fatal=False))
'width': int_or_none(concert.get('largeur%s' % suffix)), if not formats:
'height': int_or_none(concert.get('hauteur%s' % suffix)), return
'quality': quality, self._sort_formats(formats)
}) return {
self._sort_formats(formats) 'title': title,
'formats': formats,
}
date, hour = concert.get('date'), concert.get('heure') thumbnail = urljoin(self._LIVE_URL, config.get('image'))
if date and hour:
info_dict['timestamp'] = parse_iso8601(
'%s-%s-%sT%s:00' % (date[0:4], date[4:6], date[6:8], hour))
elif date:
info_dict['upload_date'] = date
return info_dict info = extract_entry(config)
if info:
info.update({
'id': video_id,
'thumbnail': thumbnail,
})
return info
entries = []
for num, chapter in enumerate(config['chapters'], start=1):
entry = extract_entry(chapter)
# extract_entry returns None for chapters without a title or formats
if not entry:
continue
entry['id'] = '%s-%d' % (video_id, num)
entries.append(entry)
return self.playlist_result(entries, video_id, config.get('title'))
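Review note: the format loop in `extract_entry` skips missing, non-string, and repeated URLs. The sketch below mirrors that filtering on plain dicts; the function name `collect_format_urls` and the sample data are assumptions, and the `isinstance` checks stand in for `try_get(..., compat_str)`.

```python
def collect_format_urls(files, format_ids=('mobile', 'desktop')):
    """Return unique, non-empty 'file' URLs in format_ids order."""
    seen = set()
    urls = []
    for format_id in format_ids:
        entry = files.get(format_id)
        url = entry.get('file') if isinstance(entry, dict) else None
        # Skip missing, non-string, empty, or already-seen URLs.
        if not isinstance(url, str) or not url or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls


# Hypothetical config fragment: both variants point at one manifest.
print(collect_format_urls({
    'mobile': {'file': '/a.m3u8'},
    'desktop': {'file': '/a.m3u8'},
}))
```

Deduplicating before extraction avoids downloading the same manifest twice when both variants share a URL.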


@ -210,18 +210,26 @@ query viewClip {
raise ExtractorError('Unable to log in') raise ExtractorError('Unable to log in')
def _get_subtitles(self, author, clip_idx, lang, name, duration, video_id): def _get_subtitles(self, author, clip_idx, clip_id, lang, name, duration, video_id):
captions_post = { captions = None
'a': author, if clip_id:
'cn': clip_idx, captions = self._download_json(
'lc': lang, '%s/transcript/api/v1/caption/json/%s/%s'
'm': name, % (self._API_BASE, clip_id, lang), video_id,
} 'Downloading captions JSON', 'Unable to download captions JSON',
captions = self._download_json( fatal=False)
'%s/player/retrieve-captions' % self._API_BASE, video_id, if not captions:
'Downloading captions JSON', 'Unable to download captions JSON', captions_post = {
fatal=False, data=json.dumps(captions_post).encode('utf-8'), 'a': author,
headers={'Content-Type': 'application/json;charset=utf-8'}) 'cn': int(clip_idx),
'lc': lang,
'm': name,
}
captions = self._download_json(
'%s/player/retrieve-captions' % self._API_BASE, video_id,
'Downloading captions JSON', 'Unable to download captions JSON',
fatal=False, data=json.dumps(captions_post).encode('utf-8'),
headers={'Content-Type': 'application/json;charset=utf-8'})
if captions: if captions:
return { return {
lang: [{ lang: [{
@ -413,7 +421,7 @@ query viewClip {
# TODO: other languages? # TODO: other languages?
subtitles = self.extract_subtitles( subtitles = self.extract_subtitles(
author, clip_idx, 'en', name, duration, display_id) author, clip_idx, clip.get('clipId'), 'en', name, duration, display_id)
return { return {
'id': clip_id, 'id': clip_id,


@ -58,8 +58,6 @@ class PopcornTVIE(InfoExtractor):
thumbnail = self._og_search_thumbnail(webpage) thumbnail = self._og_search_thumbnail(webpage)
timestamp = unified_timestamp(self._html_search_meta( timestamp = unified_timestamp(self._html_search_meta(
'uploadDate', webpage, 'timestamp')) 'uploadDate', webpage, 'timestamp'))
print(self._html_search_meta(
'duration', webpage))
duration = int_or_none(self._html_search_meta( duration = int_or_none(self._html_search_meta(
'duration', webpage), invscale=60) 'duration', webpage), invscale=60)
view_count = int_or_none(self._html_search_meta( view_count = int_or_none(self._html_search_meta(


@ -40,6 +40,7 @@ class PornHubIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Seductive Indian beauty strips down and fingers her pink pussy', 'title': 'Seductive Indian beauty strips down and fingers her pink pussy',
'uploader': 'Babes', 'uploader': 'Babes',
'upload_date': '20130628',
'duration': 361, 'duration': 361,
'view_count': int, 'view_count': int,
'like_count': int, 'like_count': int,
@ -57,6 +58,7 @@ class PornHubIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': '重庆婷婷女王足交', 'title': '重庆婷婷女王足交',
'uploader': 'Unknown', 'uploader': 'Unknown',
'upload_date': '20150213',
'duration': 1753, 'duration': 1753,
'view_count': int, 'view_count': int,
'like_count': int, 'like_count': int,
@ -237,8 +239,14 @@ class PornHubIE(InfoExtractor):
video_urls.append((video_url, None)) video_urls.append((video_url, None))
video_urls_set.add(video_url) video_urls_set.add(video_url)
upload_date = None
formats = [] formats = []
for video_url, height in video_urls: for video_url, height in video_urls:
if not upload_date:
upload_date = self._search_regex(
r'/(\d{6}/\d{2})/', video_url, 'upload date', default=None)
if upload_date:
upload_date = upload_date.replace('/', '')
tbr = None tbr = None
mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', video_url) mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', video_url)
if mobj: if mobj:
@ -254,7 +262,7 @@ class PornHubIE(InfoExtractor):
self._sort_formats(formats) self._sort_formats(formats)
video_uploader = self._html_search_regex( video_uploader = self._html_search_regex(
r'(?s)From:&nbsp;.+?<(?:a\b[^>]+\bhref=["\']/(?:user|channel)s/|span\b[^>]+\bclass=["\']username)[^>]+>(.+?)<', r'(?s)From:&nbsp;.+?<(?:a\b[^>]+\bhref=["\']/(?:(?:user|channel)s|model|pornstar)/|span\b[^>]+\bclass=["\']username)[^>]+>(.+?)<',
webpage, 'uploader', fatal=False) webpage, 'uploader', fatal=False)
view_count = self._extract_count( view_count = self._extract_count(
@ -278,6 +286,7 @@ class PornHubIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'uploader': video_uploader, 'uploader': video_uploader,
'upload_date': upload_date,
'title': title, 'title': title,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'duration': duration, 'duration': duration,
@ -346,7 +355,7 @@ class PornHubPlaylistIE(PornHubPlaylistBaseIE):
class PornHubUserVideosIE(PornHubPlaylistBaseIE): class PornHubUserVideosIE(PornHubPlaylistBaseIE):
_VALID_URL = r'https?://(?:[^/]+\.)?pornhub\.com/(?:user|channel)s/(?P<id>[^/]+)/videos' _VALID_URL = r'https?://(?:[^/]+\.)?pornhub\.com/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos'
_TESTS = [{ _TESTS = [{
'url': 'http://www.pornhub.com/users/zoe_ph/videos/public', 'url': 'http://www.pornhub.com/users/zoe_ph/videos/public',
'info_dict': { 'info_dict': {
@ -378,6 +387,12 @@ class PornHubUserVideosIE(PornHubPlaylistBaseIE):
}, { }, {
'url': 'http://www.pornhub.com/users/zoe_ph/videos/public', 'url': 'http://www.pornhub.com/users/zoe_ph/videos/public',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.pornhub.com/model/jayndrea/videos/upload',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/pornstar/jenny-blighe/videos/upload',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
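Review note: the upload-date derivation added to `PornHubIE` earlier in this file's diff relies on a `YYYYMM/DD` path segment in the media URL. A minimal sketch using the same regular expression follows; the helper name and the sample URL are hypothetical.

```python
import re


def upload_date_from_url(video_url):
    """Return YYYYMMDD taken from a /YYYYMM/DD/ path segment, else None."""
    mobj = re.search(r'/(\d{6}/\d{2})/', video_url)
    # Drop the slash so '201306/28' becomes '20130628'.
    return mobj.group(1).replace('/', '') if mobj else None


# Hypothetical media URL, for illustration only.
print(upload_date_from_url(
    'https://cdn.example.com/videos/201306/28/12345/480P_600K.mp4'))
```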


@ -4,8 +4,11 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
unified_strdate, parse_resolution,
str_to_int, str_to_int,
unified_strdate,
urlencode_postdata,
urljoin,
) )
@ -29,13 +32,26 @@ class RadioJavanIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
download_host = self._download_json(
'https://www.radiojavan.com/videos/video_host', video_id,
data=urlencode_postdata({'id': video_id}),
headers={
'Content-Type': 'application/x-www-form-urlencoded',
'Referer': url,
}).get('host', 'https://host1.rjmusicmedia.com')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
formats = [{ formats = []
'url': 'https://media.rdjavan.com/media/music_video/%s' % video_path, for format_id, _, video_path in re.findall(
'format_id': '%sp' % height, r'RJ\.video(?P<format_id>\d+[pPkK])\s*=\s*(["\'])(?P<url>(?:(?!\2).)+)\2',
'height': int(height), webpage):
} for height, video_path in re.findall(r"RJ\.video(\d+)p\s*=\s*'/?([^']+)'", webpage)] f = parse_resolution(format_id)
f.update({
'url': urljoin(download_host, video_path),
'format_id': format_id,
})
formats.append(f)
self._sort_formats(formats) self._sort_formats(formats)
title = self._og_search_title(webpage) title = self._og_search_title(webpage)
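Review note: the new format regex accepts either quote style by capturing the opening quote and requiring the same character to close the value. The sketch below applies the pattern from the hunk to a hypothetical page snippet; the standard library's `urljoin` stands in for the `urljoin` helper imported from `..utils`, which may behave differently in edge cases.

```python
import re
from urllib.parse import urljoin

# Pattern from the hunk: group 2 is the opening quote, \2 closes it.
FORMAT_RE = r'RJ\.video(?P<format_id>\d+[pPkK])\s*=\s*(["\'])(?P<url>(?:(?!\2).)+)\2'


def find_formats(webpage, download_host):
    """Return (format_id, absolute_url) pairs found in the page source."""
    return [
        (m.group('format_id'), urljoin(download_host, m.group('url')))
        for m in re.finditer(FORMAT_RE, webpage)]


# Hypothetical page fragment mixing both quote styles.
page = "RJ.video480p = '/media/a-480.mp4'; RJ.video720p = \"/media/a-720.mp4\";"
print(find_formats(page, 'https://host1.rjmusicmedia.com'))
```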


@ -274,7 +274,6 @@ class RaiPlayPlaylistIE(InfoExtractor):
('programma', 'nomeProgramma'), webpage, 'title') ('programma', 'nomeProgramma'), webpage, 'title')
description = unescapeHTML(self._html_search_meta( description = unescapeHTML(self._html_search_meta(
('description', 'og:description'), webpage, 'description')) ('description', 'og:description'), webpage, 'description'))
print(description)
entries = [] entries = []
for mobj in re.finditer( for mobj in re.finditer(


@ -164,6 +164,6 @@ class SeznamZpravyArticleIE(InfoExtractor):
description = info.get('description') or self._og_search_description(webpage) description = info.get('description') or self._og_search_description(webpage)
return self.playlist_result([ return self.playlist_result([
self.url_result(url, ie=SeznamZpravyIE.ie_key()) self.url_result(entry_url, ie=SeznamZpravyIE.ie_key())
for url in SeznamZpravyIE._extract_urls(webpage)], for entry_url in SeznamZpravyIE._extract_urls(webpage)],
article_id, title, description) article_id, title, description)


@ -8,6 +8,7 @@ from ..utils import ExtractorError
class SlidesLiveIE(InfoExtractor): class SlidesLiveIE(InfoExtractor):
_VALID_URL = r'https?://slideslive\.com/(?P<id>[0-9]+)' _VALID_URL = r'https?://slideslive\.com/(?P<id>[0-9]+)'
_TESTS = [{ _TESTS = [{
# video_service_name = YOUTUBE
'url': 'https://slideslive.com/38902413/gcc-ia16-backend', 'url': 'https://slideslive.com/38902413/gcc-ia16-backend',
'md5': 'b29fcd6c6952d0c79c5079b0e7a07e6f', 'md5': 'b29fcd6c6952d0c79c5079b0e7a07e6f',
'info_dict': { 'info_dict': {
@ -19,14 +20,18 @@ class SlidesLiveIE(InfoExtractor):
'uploader_id': 'UC62SdArr41t_-_fX40QCLRw', 'uploader_id': 'UC62SdArr41t_-_fX40QCLRw',
'upload_date': '20170925', 'upload_date': '20170925',
} }
}, {
# video_service_name = youtube
'url': 'https://slideslive.com/38903721/magic-a-scientific-resurrection-of-an-esoteric-legend',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
video_data = self._download_json( video_data = self._download_json(
url, video_id, headers={'Accept': 'application/json'}) url, video_id, headers={'Accept': 'application/json'})
service_name = video_data['video_service_name'] service_name = video_data['video_service_name'].lower()
if service_name == 'YOUTUBE': if service_name == 'youtube':
yt_video_id = video_data['video_service_id'] yt_video_id = video_data['video_service_id']
return self.url_result(yt_video_id, 'Youtube', video_id=yt_video_id) return self.url_result(yt_video_id, 'Youtube', video_id=yt_video_id)
else: else:


@ -44,3 +44,10 @@ class ParamountNetworkIE(MTVServicesInfoExtractor):
_FEED_URL = 'http://www.paramountnetwork.com/feeds/mrss/' _FEED_URL = 'http://www.paramountnetwork.com/feeds/mrss/'
_GEO_COUNTRIES = ['US'] _GEO_COUNTRIES = ['US']
def _extract_mgid(self, webpage):
cs = self._parse_json(self._search_regex(
r'window\.__DATA__\s*=\s*({.+})',
webpage, 'data'), None)['children']
c = next(c for c in cs if c.get('type') == 'VideoPlayer')
return c['props']['media']['video']['config']['uri']


@ -0,0 +1,44 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from .nexx import NexxIE
from ..compat import compat_urlparse
class Tele5IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tele5\.de/(?:mediathek|tv)/(?P<id>[^?#&]+)'
_TESTS = [{
'url': 'https://www.tele5.de/mediathek/filme-online/videos?vid=1549416',
'info_dict': {
'id': '1549416',
'ext': 'mp4',
'upload_date': '20180814',
'timestamp': 1534290623,
'title': 'Pandorum',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.tele5.de/tv/kalkofes-mattscheibe/video-clips/politik-und-gesellschaft?ve_id=1551191',
'only_matching': True,
}, {
'url': 'https://www.tele5.de/tv/dark-matter/videos',
'only_matching': True,
}]
def _real_extract(self, url):
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
video_id = (qs.get('vid') or qs.get('ve_id') or [None])[0]
if not video_id:
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._html_search_regex(
r'id\s*=\s*["\']video-player["\'][^>]+data-id\s*=\s*["\'](\d+)',
webpage, 'video id')
return self.url_result(
'https://api.nexx.cloud/v3/759/videos/byid/%s' % video_id,
ie=NexxIE.ie_key(), video_id=video_id)
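Review note: the query-string lookup at the top of `_real_extract` can be tried against the test URLs above. The sketch uses the Python 3 standard library in place of `compat_urlparse`; the helper name `video_id_from_query` is an assumption.

```python
from urllib.parse import parse_qs, urlparse


def video_id_from_query(url):
    """Return the 'vid' or 've_id' query value, or None if neither is set."""
    qs = parse_qs(urlparse(url).query)
    # Each present key maps to a non-empty list; [None] is the fallback.
    return (qs.get('vid') or qs.get('ve_id') or [None])[0]


print(video_id_from_query(
    'https://www.tele5.de/mediathek/filme-online/videos?vid=1549416'))
```

A `None` result corresponds to the branch that downloads the page and reads the ID from the player element's `data-id` attribute.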


@ -45,7 +45,7 @@ class Tube8IE(KeezMoviesIE):
r'videoTitle\s*=\s*"([^"]+)', webpage, 'title') r'videoTitle\s*=\s*"([^"]+)', webpage, 'title')
description = self._html_search_regex( description = self._html_search_regex(
r'>Description:</strong>\s*(.+?)\s*<', webpage, 'description', fatal=False) r'(?s)Description:</dt>\s*<dd>(.+?)</dd>', webpage, 'description', fatal=False)
uploader = self._html_search_regex( uploader = self._html_search_regex(
r'<span class="username">\s*(.+?)\s*<', r'<span class="username">\s*(.+?)\s*<',
webpage, 'uploader', fatal=False) webpage, 'uploader', fatal=False)
@ -55,19 +55,19 @@ class Tube8IE(KeezMoviesIE):
dislike_count = int_or_none(self._search_regex( dislike_count = int_or_none(self._search_regex(
r'rdownVar\s*=\s*"(\d+)"', webpage, 'dislike count', fatal=False)) r'rdownVar\s*=\s*"(\d+)"', webpage, 'dislike count', fatal=False))
view_count = str_to_int(self._search_regex( view_count = str_to_int(self._search_regex(
r'<strong>Views: </strong>([\d,\.]+)\s*</li>', r'Views:\s*</dt>\s*<dd>([\d,\.]+)',
webpage, 'view count', fatal=False)) webpage, 'view count', fatal=False))
comment_count = str_to_int(self._search_regex( comment_count = str_to_int(self._search_regex(
r'<span id="allCommentsCount">(\d+)</span>', r'<span id="allCommentsCount">(\d+)</span>',
webpage, 'comment count', fatal=False)) webpage, 'comment count', fatal=False))
category = self._search_regex( category = self._search_regex(
r'Category:\s*</strong>\s*<a[^>]+href=[^>]+>([^<]+)', r'Category:\s*</dt>\s*<dd>\s*<a[^>]+href=[^>]+>([^<]+)',
webpage, 'category', fatal=False) webpage, 'category', fatal=False)
categories = [category] if category else None categories = [category] if category else None
tags_str = self._search_regex( tags_str = self._search_regex(
r'(?s)Tags:\s*</strong>(.+?)</(?!a)', r'(?s)Tags:\s*</dt>\s*<dd>(.+?)</(?!a)',
webpage, 'tags', fatal=False) webpage, 'tags', fatal=False)
tags = [t for t in re.findall( tags = [t for t in re.findall(
r'<a[^>]+href=[^>]+>([^<]+)', tags_str)] if tags_str else None r'<a[^>]+href=[^>]+>([^<]+)', tags_str)] if tags_str else None


@ -51,7 +51,9 @@ class TwitchBaseIE(InfoExtractor):
expected=True) expected=True)
def _call_api(self, path, item_id, *args, **kwargs): def _call_api(self, path, item_id, *args, **kwargs):
kwargs.setdefault('headers', {})['Client-ID'] = self._CLIENT_ID headers = kwargs.get('headers', {}).copy()
headers['Client-ID'] = self._CLIENT_ID
kwargs['headers'] = headers
response = self._download_json( response = self._download_json(
'%s/%s' % (self._API_BASE, path), item_id, '%s/%s' % (self._API_BASE, path), item_id,
*args, **compat_kwargs(kwargs)) *args, **compat_kwargs(kwargs))
@ -559,7 +561,8 @@ class TwitchStreamIE(TwitchBaseIE):
TwitchAllVideosIE, TwitchAllVideosIE,
TwitchUploadsIE, TwitchUploadsIE,
TwitchPastBroadcastsIE, TwitchPastBroadcastsIE,
TwitchHighlightsIE)) TwitchHighlightsIE,
TwitchClipsIE))
else super(TwitchStreamIE, cls).suitable(url)) else super(TwitchStreamIE, cls).suitable(url))
def _real_extract(self, url): def _real_extract(self, url):
@ -633,7 +636,7 @@ class TwitchStreamIE(TwitchBaseIE):
class TwitchClipsIE(TwitchBaseIE): class TwitchClipsIE(TwitchBaseIE):
IE_NAME = 'twitch:clips' IE_NAME = 'twitch:clips'
_VALID_URL = r'https?://clips\.twitch\.tv/(?:[^/]+/)*(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:clips\.twitch\.tv/(?:[^/]+/)*|(?:www\.)?twitch\.tv/[^/]+/clip/)(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://clips.twitch.tv/FaintLightGullWholeWheat', 'url': 'https://clips.twitch.tv/FaintLightGullWholeWheat',
@ -653,6 +656,9 @@ class TwitchClipsIE(TwitchBaseIE):
# multiple formats # multiple formats
'url': 'https://clips.twitch.tv/rflegendary/UninterestedBeeDAESuppy', 'url': 'https://clips.twitch.tv/rflegendary/UninterestedBeeDAESuppy',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.twitch.tv/sergeynixon/clip/StormyThankfulSproutFutureMan',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
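Review note: the extended `TwitchClipsIE` pattern can be checked against the three URL shapes listed in the tests. The regular expression is copied from the hunk; the helper name `clip_id` is made up for the example.

```python
import re

CLIPS_URL = (
    r'https?://(?:clips\.twitch\.tv/(?:[^/]+/)*|'
    r'(?:www\.)?twitch\.tv/[^/]+/clip/)(?P<id>[^/?#&]+)')


def clip_id(url):
    """Return the clip slug for a supported URL, or None."""
    mobj = re.match(CLIPS_URL, url)
    return mobj.group('id') if mobj else None


for url in (
        'https://clips.twitch.tv/FaintLightGullWholeWheat',
        'https://clips.twitch.tv/rflegendary/UninterestedBeeDAESuppy',
        'https://www.twitch.tv/sergeynixon/clip/StormyThankfulSproutFutureMan'):
    print(clip_id(url))
```

Because the new form lives on `twitch.tv/<channel>/...`, the same diff also adds `TwitchClipsIE` to the extractors that `TwitchStreamIE.suitable` defers to.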


@ -122,7 +122,9 @@ class UdemyIE(InfoExtractor):
raise ExtractorError(error_str, expected=True) raise ExtractorError(error_str, expected=True)
def _download_webpage_handle(self, *args, **kwargs): def _download_webpage_handle(self, *args, **kwargs):
kwargs.setdefault('headers', {})['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.4' headers = kwargs.get('headers', {}).copy()
headers['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.4'
kwargs['headers'] = headers
return super(UdemyIE, self)._download_webpage_handle( return super(UdemyIE, self)._download_webpage_handle(
*args, **compat_kwargs(kwargs)) *args, **compat_kwargs(kwargs))
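Review note: this hunk and the matching Twitch change replace `kwargs.setdefault('headers', {})[...] = ...` with a copy, so a headers dict passed in by the caller is no longer modified. A minimal sketch of the difference follows; the helper name `with_header` and the sample values are assumptions.

```python
def with_header(kwargs, name, value):
    """Return kwargs with one extra header, leaving the caller's dict alone."""
    # Copy first: writing into kwargs['headers'] directly would also
    # change the dict object the caller still holds.
    headers = kwargs.get('headers', {}).copy()
    headers[name] = value
    updated = dict(kwargs)
    updated['headers'] = headers
    return updated


shared = {'Accept': 'application/json'}
out = with_header({'headers': shared}, 'Client-ID', 'abc')
print(shared)
print(out['headers'])
```

With the copy in place, a headers dict reused across several requests does not pick up per-extractor headers from an earlier call.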


@ -299,10 +299,13 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader_url': r're:https?://(?:www\.)?vimeo\.com/atencio', 'uploader_url': r're:https?://(?:www\.)?vimeo\.com/atencio',
'uploader_id': 'atencio', 'uploader_id': 'atencio',
'uploader': 'Peter Atencio', 'uploader': 'Peter Atencio',
'channel_id': 'keypeele',
'channel_url': r're:https?://(?:www\.)?vimeo\.com/channels/keypeele',
'timestamp': 1380339469, 'timestamp': 1380339469,
'upload_date': '20130928', 'upload_date': '20130928',
'duration': 187, 'duration': 187,
}, },
'expected_warnings': ['Unable to download JSON metadata'],
}, },
{ {
'url': 'http://vimeo.com/76979871', 'url': 'http://vimeo.com/76979871',
@ -355,11 +358,13 @@ class VimeoIE(VimeoBaseInfoExtractor):
'url': 'https://vimeo.com/channels/tributes/6213729', 'url': 'https://vimeo.com/channels/tributes/6213729',
'info_dict': { 'info_dict': {
'id': '6213729', 'id': '6213729',
'ext': 'mov', 'ext': 'mp4',
'title': 'Vimeo Tribute: The Shining', 'title': 'Vimeo Tribute: The Shining',
'uploader': 'Casey Donahue', 'uploader': 'Casey Donahue',
'uploader_url': r're:https?://(?:www\.)?vimeo\.com/caseydonahue', 'uploader_url': r're:https?://(?:www\.)?vimeo\.com/caseydonahue',
'uploader_id': 'caseydonahue', 'uploader_id': 'caseydonahue',
'channel_url': r're:https?://(?:www\.)?vimeo\.com/channels/tributes',
'channel_id': 'tributes',
'timestamp': 1250886430, 'timestamp': 1250886430,
'upload_date': '20090821', 'upload_date': '20090821',
'description': 'md5:bdbf314014e58713e6e5b66eb252f4a6', 'description': 'md5:bdbf314014e58713e6e5b66eb252f4a6',
@ -465,6 +470,9 @@ class VimeoIE(VimeoBaseInfoExtractor):
if 'Referer' not in headers: if 'Referer' not in headers:
headers['Referer'] = url headers['Referer'] = url
channel_id = self._search_regex(
r'vimeo\.com/channels/([^/]+)', url, 'channel id', default=None)
# Extract ID from URL # Extract ID from URL
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id') video_id = mobj.group('id')
@ -543,6 +551,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
else: else:
config_re = [r' = {config:({.+?}),assets:', r'(?:[abc])=({.+?});'] config_re = [r' = {config:({.+?}),assets:', r'(?:[abc])=({.+?});']
config_re.append(r'\bvar\s+r\s*=\s*({.+?})\s*;') config_re.append(r'\bvar\s+r\s*=\s*({.+?})\s*;')
config_re.append(r'\bconfig\s*=\s*({.+?})\s*;')
config = self._search_regex(config_re, webpage, 'info section', config = self._search_regex(config_re, webpage, 'info section',
flags=re.DOTALL) flags=re.DOTALL)
config = json.loads(config) config = json.loads(config)
@ -563,19 +572,23 @@ class VimeoIE(VimeoBaseInfoExtractor):
             if config.get('view') == 4:
                 config = self._verify_player_video_password(redirect_url, video_id)

+        vod = config.get('video', {}).get('vod', {})
+
         def is_rented():
             if '>You rented this title.<' in webpage:
                 return True
             if config.get('user', {}).get('purchased'):
                 return True
-            label = try_get(
-                config, lambda x: x['video']['vod']['purchase_options'][0]['label_string'], compat_str)
-            if label and label.startswith('You rented this'):
-                return True
+            for purchase_option in vod.get('purchase_options', []):
+                if purchase_option.get('purchased'):
+                    return True
+                label = purchase_option.get('label_string')
+                if label and (label.startswith('You rented this') or label.endswith(' remaining')):
+                    return True
             return False

-        if is_rented():
-            feature_id = config.get('video', {}).get('vod', {}).get('feature_id')
+        if is_rented() and vod.get('is_trailer'):
+            feature_id = vod.get('feature_id')
             if feature_id and not data.get('force_feature_id', False):
                 return self.url_result(smuggle_url(
                     'https://player.vimeo.com/player/%s' % feature_id,
@ -652,6 +665,8 @@ class VimeoIE(VimeoBaseInfoExtractor):
r'<link[^>]+rel=["\']license["\'][^>]+href=(["\'])(?P<license>(?:(?!\1).)+)\1', r'<link[^>]+rel=["\']license["\'][^>]+href=(["\'])(?P<license>(?:(?!\1).)+)\1',
webpage, 'license', default=None, group='license') webpage, 'license', default=None, group='license')
channel_url = 'https://vimeo.com/channels/%s' % channel_id if channel_id else None
info_dict = { info_dict = {
'id': video_id, 'id': video_id,
'formats': formats, 'formats': formats,
@ -662,6 +677,8 @@ class VimeoIE(VimeoBaseInfoExtractor):
'like_count': like_count, 'like_count': like_count,
'comment_count': comment_count, 'comment_count': comment_count,
'license': cc_license, 'license': cc_license,
'channel_id': channel_id,
'channel_url': channel_url,
} }
info_dict = merge_dicts(info_dict, info_dict_config, json_ld) info_dict = merge_dicts(info_dict, info_dict_config, json_ld)
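
The rental check in the Vimeo hunks now walks every purchase option instead of reading only the first one. A minimal standalone sketch of that logic follows, assuming `webpage` is the page HTML and `config` is the decoded player config; the free function and its parameters are illustrative and not part of the extractor.

```python
def is_rented(webpage, config):
    """Return True if the page or the player config marks the title as rented or purchased."""
    if '>You rented this title.<' in webpage:
        return True
    if config.get('user', {}).get('purchased'):
        return True
    vod = config.get('video', {}).get('vod', {})
    # Check every purchase option, not only the first entry.
    for option in vod.get('purchase_options', []):
        if option.get('purchased'):
            return True
        label = option.get('label_string')
        if label and (label.startswith('You rented this')
                      or label.endswith(' remaining')):
            return True
    return False
```

With this shape, a rental that appears as the second purchase option, or one labelled with a remaining-time string, is still detected.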


@ -72,7 +72,7 @@ class VRVBaseIE(InfoExtractor):
class VRVIE(VRVBaseIE): class VRVIE(VRVBaseIE):
IE_NAME = 'vrv' IE_NAME = 'vrv'
_VALID_URL = r'https?://(?:www\.)?vrv\.co/watch/(?P<id>[A-Z0-9]+)' _VALID_URL = r'https?://(?:www\.)?vrv\.co/watch/(?P<id>[A-Z0-9]+)'
_TEST = { _TESTS = [{
'url': 'https://vrv.co/watch/GR9PNZ396/Hidden-America-with-Jonah-Ray:BOSTON-WHERE-THE-PAST-IS-THE-PRESENT', 'url': 'https://vrv.co/watch/GR9PNZ396/Hidden-America-with-Jonah-Ray:BOSTON-WHERE-THE-PAST-IS-THE-PRESENT',
'info_dict': { 'info_dict': {
'id': 'GR9PNZ396', 'id': 'GR9PNZ396',
@ -85,7 +85,34 @@ class VRVIE(VRVBaseIE):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
} }]
def _extract_vrv_formats(self, url, video_id, stream_format, audio_lang, hardsub_lang):
if not url or stream_format not in ('hls', 'dash'):
return []
assert audio_lang or hardsub_lang
stream_id_list = []
if audio_lang:
stream_id_list.append('audio-%s' % audio_lang)
if hardsub_lang:
stream_id_list.append('hardsub-%s' % hardsub_lang)
stream_id = '-'.join(stream_id_list)
format_id = '%s-%s' % (stream_format, stream_id)
if stream_format == 'hls':
adaptive_formats = self._extract_m3u8_formats(
url, video_id, 'mp4', m3u8_id=format_id,
note='Downloading %s m3u8 information' % stream_id,
fatal=False)
elif stream_format == 'dash':
adaptive_formats = self._extract_mpd_formats(
url, video_id, mpd_id=format_id,
note='Downloading %s MPD information' % stream_id,
fatal=False)
if audio_lang:
for f in adaptive_formats:
if f.get('acodec') != 'none':
f['language'] = audio_lang
return adaptive_formats
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
@ -115,26 +142,9 @@ class VRVIE(VRVBaseIE):
         for stream_type, streams in streams_json.get('streams', {}).items():
             if stream_type in ('adaptive_hls', 'adaptive_dash'):
                 for stream in streams.values():
-                    stream_url = stream.get('url')
-                    if not stream_url:
-                        continue
-                    stream_id = stream.get('hardsub_locale') or audio_locale
-                    format_id = '%s-%s' % (stream_type.split('_')[1], stream_id)
-                    if stream_type == 'adaptive_hls':
-                        adaptive_formats = self._extract_m3u8_formats(
-                            stream_url, video_id, 'mp4', m3u8_id=format_id,
-                            note='Downloading %s m3u8 information' % stream_id,
-                            fatal=False)
-                    else:
-                        adaptive_formats = self._extract_mpd_formats(
-                            stream_url, video_id, mpd_id=format_id,
-                            note='Downloading %s MPD information' % stream_id,
-                            fatal=False)
-                    if audio_locale:
-                        for f in adaptive_formats:
-                            if f.get('acodec') != 'none':
-                                f['language'] = audio_locale
-                    formats.extend(adaptive_formats)
+                    formats.extend(self._extract_vrv_formats(
+                        stream.get('url'), video_id, stream_type.split('_')[1],
+                        audio_locale, stream.get('hardsub_locale')))
         self._sort_formats(formats)
subtitles = {} subtitles = {}
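
The new `_extract_vrv_formats` helper builds the format id from both the audio and the hardsub language, where the old code used only one of them. The naming scheme can be sketched on its own as below; `build_format_id` is a hypothetical name for illustration, and the locale strings in the usage lines are made-up examples.

```python
def build_format_id(stream_format, audio_lang=None, hardsub_lang=None):
    """Compose a format id such as 'hls-audio-en-US-hardsub-es-LA'."""
    parts = []
    if audio_lang:
        parts.append('audio-%s' % audio_lang)
    if hardsub_lang:
        parts.append('hardsub-%s' % hardsub_lang)
    stream_id = '-'.join(parts)
    return '%s-%s' % (stream_format, stream_id)


both = build_format_id('hls', 'en-US', 'es-LA')
audio_only = build_format_id('dash', 'ja-JP')
```

Encoding both languages keeps two streams that share a hardsub locale but differ in audio from colliding on the same id.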


@ -4,15 +4,19 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
float_or_none, float_or_none,
unified_timestamp,
url_or_none,
) )
class VzaarIE(InfoExtractor): class VzaarIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www|view)\.)?vzaar\.com/(?:videos/)?(?P<id>\d+)' _VALID_URL = r'https?://(?:(?:www|view)\.)?vzaar\.com/(?:videos/)?(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
# HTTP and HLS
'url': 'https://vzaar.com/videos/1152805', 'url': 'https://vzaar.com/videos/1152805',
'md5': 'bde5ddfeb104a6c56a93a06b04901dbf', 'md5': 'bde5ddfeb104a6c56a93a06b04901dbf',
'info_dict': { 'info_dict': {
@ -40,24 +44,48 @@ class VzaarIE(InfoExtractor):
         video_id = self._match_id(url)
         video_data = self._download_json(
             'http://view.vzaar.com/v2/%s/video' % video_id, video_id)
-        source_url = video_data['sourceUrl']

-        info = {
+        title = video_data['videoTitle']
+
+        formats = []
+
+        source_url = url_or_none(video_data.get('sourceUrl'))
+        if source_url:
+            f = {
+                'url': source_url,
+                'format_id': 'http',
+            }
+            if 'audio' in source_url:
+                f.update({
+                    'vcodec': 'none',
+                    'ext': 'mp3',
+                })
+            else:
+                f.update({
+                    'width': int_or_none(video_data.get('width')),
+                    'height': int_or_none(video_data.get('height')),
+                    'ext': 'mp4',
+                    'fps': float_or_none(video_data.get('fps')),
+                })
+            formats.append(f)
+
+        video_guid = video_data.get('guid')
+        usp = video_data.get('usp')
+        if isinstance(video_guid, compat_str) and isinstance(usp, dict):
+            m3u8_url = ('http://fable.vzaar.com/v4/usp/%s/%s.ism/.m3u8?'
+                        % (video_guid, video_id)) + '&'.join(
+                '%s=%s' % (k, v) for k, v in usp.items())
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                m3u8_id='hls', fatal=False))
+
+        self._sort_formats(formats)
+
+        return {
             'id': video_id,
-            'title': video_data['videoTitle'],
-            'url': source_url,
+            'title': title,
             'thumbnail': self._proto_relative_url(video_data.get('poster')),
             'duration': float_or_none(video_data.get('videoDuration')),
+            'timestamp': unified_timestamp(video_data.get('ts')),
+            'formats': formats,
         }
-        if 'audio' in source_url:
-            info.update({
-                'vcodec': 'none',
-                'ext': 'mp3',
-            })
-        else:
-            info.update({
-                'width': int_or_none(video_data.get('width')),
-                'height': int_or_none(video_data.get('height')),
-                'ext': 'mp4',
-            })
-        return info
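
The Vzaar hunk assembles the HLS manifest URL from the video GUID, the numeric id, and the `usp` dictionary. The sketch below isolates that string construction, assuming the URL pattern from the diff; `build_hls_url` is a hypothetical helper, and the GUID and `usp` values in the usage line are placeholders rather than real data.

```python
def build_hls_url(video_guid, video_id, usp):
    """Join the usp key/value pairs into the manifest query string."""
    base = 'http://fable.vzaar.com/v4/usp/%s/%s.ism/.m3u8?' % (video_guid, video_id)
    # As in the diff, values are interpolated as-is, without percent-encoding.
    return base + '&'.join('%s=%s' % (k, v) for k, v in usp.items())


hls_url = build_hls_url('example-guid', '1152805', {'token': 'abc'})
```

Because the values are not percent-encoded, this only works while the `usp` entries are already URL-safe.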


@ -259,7 +259,9 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
         return True

     def _download_webpage_handle(self, *args, **kwargs):
-        kwargs.setdefault('query', {})['disable_polymer'] = 'true'
+        query = kwargs.get('query', {}).copy()
+        query['disable_polymer'] = 'true'
+        kwargs['query'] = query
         return super(YoutubeBaseInfoExtractor, self)._download_webpage_handle(
             *args, **compat_kwargs(kwargs))
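
The change to `_download_webpage_handle` stops the method from modifying a `query` dict that the caller still holds. The sketch below contrasts the two behaviours with plain functions; `add_flag_mutating` and `add_flag_copying` are hypothetical names used only for this illustration.

```python
def add_flag_mutating(**kwargs):
    # Old behaviour: setdefault returns the caller's dict, which is then changed in place.
    kwargs.setdefault('query', {})['disable_polymer'] = 'true'
    return kwargs


def add_flag_copying(**kwargs):
    # New behaviour: work on a shallow copy so the caller's dict stays untouched.
    query = kwargs.get('query', {}).copy()
    query['disable_polymer'] = 'true'
    kwargs['query'] = query
    return kwargs


shared = {'v': 'BaW_jenozKc'}
add_flag_copying(query=shared)
after_copying = dict(shared)   # snapshot: still the single original key
add_flag_mutating(query=shared)
after_mutating = dict(shared)  # snapshot: now also carries 'disable_polymer'
```

A caller that reuses one `query` dict across several requests would see the extra key leak into later calls with the mutating variant.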
@ -347,6 +349,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
(?:www\.)?hooktube\.com/| (?:www\.)?hooktube\.com/|
(?:www\.)?yourepeat\.com/| (?:www\.)?yourepeat\.com/|
tube\.majestyc\.net/| tube\.majestyc\.net/|
(?:www\.)?invidio\.us/|
youtube\.googleapis\.com/) # the various hostnames, with wildcard subdomains youtube\.googleapis\.com/) # the various hostnames, with wildcard subdomains
(?:.*?\#/)? # handle anchor (#/) redirect urls (?:.*?\#/)? # handle anchor (#/) redirect urls
(?: # the various things that can precede the ID: (?: # the various things that can precede the ID:
@ -490,6 +493,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader': 'Philipp Hagemeister', 'uploader': 'Philipp Hagemeister',
'uploader_id': 'phihag', 'uploader_id': 'phihag',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/phihag', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/phihag',
'channel_id': 'UCLqxVugv74EIW3VWh2NOa3Q',
'channel_url': r're:https?://(?:www\.)?youtube\.com/channel/UCLqxVugv74EIW3VWh2NOa3Q',
'upload_date': '20121002', 'upload_date': '20121002',
'license': 'Standard YouTube License', 'license': 'Standard YouTube License',
'description': 'test chars: "\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .', 'description': 'test chars: "\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .',
@ -1064,6 +1069,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'url': 'https://www.youtube.com/watch?v=MuAGGZNfUkU&list=RDMM', 'url': 'https://www.youtube.com/watch?v=MuAGGZNfUkU&list=RDMM',
'only_matching': True, 'only_matching': True,
}, },
{
'url': 'https://invidio.us/watch?v=BaW_jenozKc',
'only_matching': True,
},
] ]
def __init__(self, *args, **kwargs): def __init__(self, *args, **kwargs):
@ -1178,7 +1187,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
     def _parse_sig_js(self, jscode):
         funcname = self._search_regex(
             (r'(["\'])signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
-             r'\.sig\|\|(?P<sig>[a-zA-Z0-9$]+)\('),
+             r'\.sig\|\|(?P<sig>[a-zA-Z0-9$]+)\(',
+             r'yt\.akamaized\.net/\)\s*\|\|\s*.*?\s*c\s*&&\s*d\.set\([^,]+\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
+             r'\bc\s*&&\s*d\.set\([^,]+\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\('),
             jscode, 'Initial JS player signature function name', group='sig')

         jsi = JSInterpreter(jscode)
jsi = JSInterpreter(jscode) jsi = JSInterpreter(jscode)
@ -1905,6 +1916,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
else: else:
self._downloader.report_warning('unable to extract uploader nickname') self._downloader.report_warning('unable to extract uploader nickname')
channel_id = self._html_search_meta(
'channelId', video_webpage, 'channel id')
channel_url = 'http://www.youtube.com/channel/%s' % channel_id if channel_id else None
# thumbnail image # thumbnail image
# We try first to get a high quality image: # We try first to get a high quality image:
m_thumb = re.search(r'<span itemprop="thumbnail".*?href="(.*?)">', m_thumb = re.search(r'<span itemprop="thumbnail".*?href="(.*?)">',
@ -2076,6 +2091,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader': video_uploader, 'uploader': video_uploader,
'uploader_id': video_uploader_id, 'uploader_id': video_uploader_id,
'uploader_url': video_uploader_url, 'uploader_url': video_uploader_url,
'channel_id': channel_id,
'channel_url': channel_url,
'upload_date': upload_date, 'upload_date': upload_date,
'license': video_license, 'license': video_license,
'creator': video_creator or artist, 'creator': video_creator or artist,
@ -2407,7 +2424,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor): class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
IE_DESC = 'YouTube.com channels' IE_DESC = 'YouTube.com channels'
_VALID_URL = r'https?://(?:youtu\.be|(?:\w+\.)?youtube(?:-nocookie)?\.com)/channel/(?P<id>[0-9A-Za-z_-]+)' _VALID_URL = r'https?://(?:youtu\.be|(?:\w+\.)?youtube(?:-nocookie)?\.com|(?:www\.)?invidio\.us)/channel/(?P<id>[0-9A-Za-z_-]+)'
_TEMPLATE_URL = 'https://www.youtube.com/channel/%s/videos' _TEMPLATE_URL = 'https://www.youtube.com/channel/%s/videos'
_VIDEO_RE = r'(?:title="(?P<title>[^"]+)"[^>]+)?href="/watch\?v=(?P<id>[0-9A-Za-z_-]+)&?' _VIDEO_RE = r'(?:title="(?P<title>[^"]+)"[^>]+)?href="/watch\?v=(?P<id>[0-9A-Za-z_-]+)&?'
IE_NAME = 'youtube:channel' IE_NAME = 'youtube:channel'
@ -2428,6 +2445,9 @@ class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
'id': 'UUs0ifCMCm1icqRbqhUINa0w', 'id': 'UUs0ifCMCm1icqRbqhUINa0w',
'title': 'Uploads from Deus Ex', 'title': 'Uploads from Deus Ex',
}, },
}, {
'url': 'https://invidio.us/channel/UC23qupoDRn9YOAVzeoxjOQA',
'only_matching': True,
}] }]
@classmethod @classmethod


@ -18,12 +18,12 @@ from ..utils import (
 )


-class ZattooBaseIE(InfoExtractor):
-    _NETRC_MACHINE = 'zattoo'
-    _HOST_URL = 'https://zattoo.com'
-
+class ZattooPlatformBaseIE(InfoExtractor):
     _power_guide_hash = None

+    def _host_url(self):
+        return 'https://%s' % self._HOST
+
     def _login(self):
         username, password = self._get_login_info()
         if not username or not password:
@ -33,13 +33,13 @@ class ZattooBaseIE(InfoExtractor):
try: try:
data = self._download_json( data = self._download_json(
'%s/zapi/v2/account/login' % self._HOST_URL, None, 'Logging in', '%s/zapi/v2/account/login' % self._host_url(), None, 'Logging in',
data=urlencode_postdata({ data=urlencode_postdata({
'login': username, 'login': username,
'password': password, 'password': password,
'remember': 'true', 'remember': 'true',
}), headers={ }), headers={
'Referer': '%s/login' % self._HOST_URL, 'Referer': '%s/login' % self._host_url(),
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8', 'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
}) })
except ExtractorError as e: except ExtractorError as e:
@ -53,7 +53,7 @@ class ZattooBaseIE(InfoExtractor):
def _real_initialize(self): def _real_initialize(self):
webpage = self._download_webpage( webpage = self._download_webpage(
self._HOST_URL, None, 'Downloading app token') self._host_url(), None, 'Downloading app token')
app_token = self._html_search_regex( app_token = self._html_search_regex(
r'appToken\s*=\s*(["\'])(?P<token>(?:(?!\1).)+?)\1', r'appToken\s*=\s*(["\'])(?P<token>(?:(?!\1).)+?)\1',
webpage, 'app token', group='token') webpage, 'app token', group='token')
@ -62,7 +62,7 @@ class ZattooBaseIE(InfoExtractor):
# Will setup appropriate cookies # Will setup appropriate cookies
self._request_webpage( self._request_webpage(
'%s/zapi/v2/session/hello' % self._HOST_URL, None, '%s/zapi/v2/session/hello' % self._host_url(), None,
'Opening session', data=urlencode_postdata({ 'Opening session', data=urlencode_postdata({
'client_app_token': app_token, 'client_app_token': app_token,
'uuid': compat_str(uuid4()), 'uuid': compat_str(uuid4()),
@ -75,7 +75,7 @@ class ZattooBaseIE(InfoExtractor):
def _extract_cid(self, video_id, channel_name): def _extract_cid(self, video_id, channel_name):
channel_groups = self._download_json( channel_groups = self._download_json(
'%s/zapi/v2/cached/channels/%s' % (self._HOST_URL, '%s/zapi/v2/cached/channels/%s' % (self._host_url(),
self._power_guide_hash), self._power_guide_hash),
video_id, 'Downloading channel list', video_id, 'Downloading channel list',
query={'details': False})['channel_groups'] query={'details': False})['channel_groups']
@ -93,28 +93,30 @@ class ZattooBaseIE(InfoExtractor):
     def _extract_cid_and_video_info(self, video_id):
         data = self._download_json(
-            '%s/zapi/program/details' % self._HOST_URL,
+            '%s/zapi/v2/cached/program/power_details/%s' % (
+                self._host_url(), self._power_guide_hash),
             video_id,
             'Downloading video information',
             query={
-                'program_id': video_id,
-                'complete': True
+                'program_ids': video_id,
+                'complete': True,
             })

-        p = data['program']
+        p = data['programs'][0]
         cid = p['cid']

         info_dict = {
             'id': video_id,
-            'title': p.get('title') or p['episode_title'],
-            'description': p.get('description'),
-            'thumbnail': p.get('image_url'),
+            'title': p.get('t') or p['et'],
+            'description': p.get('d'),
+            'thumbnail': p.get('i_url'),
             'creator': p.get('channel_name'),
-            'episode': p.get('episode_title'),
-            'episode_number': int_or_none(p.get('episode_number')),
-            'season_number': int_or_none(p.get('season_number')),
+            'episode': p.get('et'),
+            'episode_number': int_or_none(p.get('e_no')),
+            'season_number': int_or_none(p.get('s_no')),
             'release_year': int_or_none(p.get('year')),
-            'categories': try_get(p, lambda x: x['categories'], list),
+            'categories': try_get(p, lambda x: x['c'], list),
+            'tags': try_get(p, lambda x: x['g'], list)
         }

         return cid, info_dict
@ -126,11 +128,11 @@ class ZattooBaseIE(InfoExtractor):
if is_live: if is_live:
postdata_common.update({'timeshift': 10800}) postdata_common.update({'timeshift': 10800})
url = '%s/zapi/watch/live/%s' % (self._HOST_URL, cid) url = '%s/zapi/watch/live/%s' % (self._host_url(), cid)
elif record_id: elif record_id:
url = '%s/zapi/watch/recording/%s' % (self._HOST_URL, record_id) url = '%s/zapi/watch/recording/%s' % (self._host_url(), record_id)
else: else:
url = '%s/zapi/watch/recall/%s/%s' % (self._HOST_URL, cid, video_id) url = '%s/zapi/watch/recall/%s/%s' % (self._host_url(), cid, video_id)
formats = [] formats = []
for stream_type in ('dash', 'hls', 'hls5', 'hds'): for stream_type in ('dash', 'hls', 'hls5', 'hds'):
@ -201,13 +203,13 @@ class ZattooBaseIE(InfoExtractor):
return info_dict return info_dict
class QuicklineBaseIE(ZattooBaseIE): class QuicklineBaseIE(ZattooPlatformBaseIE):
_NETRC_MACHINE = 'quickline' _NETRC_MACHINE = 'quickline'
_HOST_URL = 'https://mobiltv.quickline.com' _HOST = 'mobiltv.quickline.com'
class QuicklineIE(QuicklineBaseIE): class QuicklineIE(QuicklineBaseIE):
_VALID_URL = r'https?://(?:www\.)?mobiltv\.quickline\.com/watch/(?P<channel>[^/]+)/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?%s/watch/(?P<channel>[^/]+)/(?P<id>[0-9]+)' % re.escape(QuicklineBaseIE._HOST)
_TEST = { _TEST = {
'url': 'https://mobiltv.quickline.com/watch/prosieben/130671867-maze-runner-die-auserwaehlten-in-der-brandwueste', 'url': 'https://mobiltv.quickline.com/watch/prosieben/130671867-maze-runner-die-auserwaehlten-in-der-brandwueste',
@ -220,7 +222,7 @@ class QuicklineIE(QuicklineBaseIE):
class QuicklineLiveIE(QuicklineBaseIE): class QuicklineLiveIE(QuicklineBaseIE):
_VALID_URL = r'https?://(?:www\.)?mobiltv\.quickline\.com/watch/(?P<id>[^/]+)' _VALID_URL = r'https?://(?:www\.)?%s/watch/(?P<id>[^/]+)' % re.escape(QuicklineBaseIE._HOST)
_TEST = { _TEST = {
'url': 'https://mobiltv.quickline.com/watch/srf1', 'url': 'https://mobiltv.quickline.com/watch/srf1',
@ -236,8 +238,18 @@ class QuicklineLiveIE(QuicklineBaseIE):
return self._extract_video(channel_name, video_id, is_live=True) return self._extract_video(channel_name, video_id, is_live=True)
class ZattooBaseIE(ZattooPlatformBaseIE):
_NETRC_MACHINE = 'zattoo'
_HOST = 'zattoo.com'
def _make_valid_url(tmpl, host):
return tmpl % re.escape(host)
class ZattooIE(ZattooBaseIE): class ZattooIE(ZattooBaseIE):
_VALID_URL = r'https?://(?:www\.)?zattoo\.com/watch/(?P<channel>[^/]+?)/(?P<id>[0-9]+)[^/]+(?:/(?P<recid>[0-9]+))?' _VALID_URL_TEMPLATE = r'https?://(?:www\.)?%s/watch/(?P<channel>[^/]+?)/(?P<id>[0-9]+)[^/]+(?:/(?P<recid>[0-9]+))?'
_VALID_URL = _make_valid_url(_VALID_URL_TEMPLATE, ZattooBaseIE._HOST)
# Since regular videos are only available for 7 days and recorded videos # Since regular videos are only available for 7 days and recorded videos
# are only available for a specific user, we cannot have detailed tests. # are only available for a specific user, we cannot have detailed tests.
@ -269,3 +281,135 @@ class ZattooLiveIE(ZattooBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
channel_name = video_id = self._match_id(url) channel_name = video_id = self._match_id(url)
return self._extract_video(channel_name, video_id, is_live=True) return self._extract_video(channel_name, video_id, is_live=True)
class NetPlusIE(ZattooIE):
_NETRC_MACHINE = 'netplus'
_HOST = 'netplus.tv'
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://www.netplus.tv/watch/abc/123-abc',
'only_matching': True,
}]
class MNetTVIE(ZattooIE):
_NETRC_MACHINE = 'mnettv'
_HOST = 'tvplus.m-net.de'
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://www.tvplus.m-net.de/watch/abc/123-abc',
'only_matching': True,
}]
class WalyTVIE(ZattooIE):
_NETRC_MACHINE = 'walytv'
_HOST = 'player.waly.tv'
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://www.player.waly.tv/watch/abc/123-abc',
'only_matching': True,
}]
class BBVTVIE(ZattooIE):
_NETRC_MACHINE = 'bbvtv'
_HOST = 'bbv-tv.net'
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://www.bbv-tv.net/watch/abc/123-abc',
'only_matching': True,
}]
class VTXTVIE(ZattooIE):
_NETRC_MACHINE = 'vtxtv'
_HOST = 'vtxtv.ch'
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://www.vtxtv.ch/watch/abc/123-abc',
'only_matching': True,
}]
class MyVisionTVIE(ZattooIE):
_NETRC_MACHINE = 'myvisiontv'
_HOST = 'myvisiontv.ch'
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://www.myvisiontv.ch/watch/abc/123-abc',
'only_matching': True,
}]
class GlattvisionTVIE(ZattooIE):
_NETRC_MACHINE = 'glattvisiontv'
_HOST = 'iptv.glattvision.ch'
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://www.iptv.glattvision.ch/watch/abc/123-abc',
'only_matching': True,
}]
class SAKTVIE(ZattooIE):
_NETRC_MACHINE = 'saktv'
_HOST = 'saktv.ch'
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://www.saktv.ch/watch/abc/123-abc',
'only_matching': True,
}]
class EWETVIE(ZattooIE):
_NETRC_MACHINE = 'ewetv'
_HOST = 'tvonline.ewe.de'
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://www.tvonline.ewe.de/watch/abc/123-abc',
'only_matching': True,
}]
class QuantumTVIE(ZattooIE):
_NETRC_MACHINE = 'quantumtv'
_HOST = 'quantum-tv.com'
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://www.quantum-tv.com/watch/abc/123-abc',
'only_matching': True,
}]
class OsnatelTVIE(ZattooIE):
_NETRC_MACHINE = 'osnateltv'
_HOST = 'onlinetv.osnatel.de'
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://www.onlinetv.osnatel.de/watch/abc/123-abc',
'only_matching': True,
}]
class EinsUndEinsTVIE(ZattooIE):
_NETRC_MACHINE = '1und1tv'
_HOST = '1und1.tv'
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://www.1und1.tv/watch/abc/123-abc',
'only_matching': True,
}]
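
Every Zattoo-platform extractor above derives its `_VALID_URL` from one template by substituting an escaped host name. The sketch below reproduces that mechanism outside the extractor classes, using the template and one test URL from the diff; the module-level names `VALID_URL_TEMPLATE` and `make_valid_url` are illustrative stand-ins.

```python
import re

VALID_URL_TEMPLATE = (
    r'https?://(?:www\.)?%s/watch/(?P<channel>[^/]+?)/(?P<id>[0-9]+)[^/]+'
    r'(?:/(?P<recid>[0-9]+))?')


def make_valid_url(tmpl, host):
    # re.escape keeps the dots in the host from matching arbitrary characters.
    return tmpl % re.escape(host)


pattern = make_valid_url(VALID_URL_TEMPLATE, 'netplus.tv')
match = re.match(pattern, 'https://www.netplus.tv/watch/abc/123-abc')
```

Escaping matters here: without it, a host such as `netplus.tv` would also match `netplusXtv`.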


@ -2477,7 +2477,7 @@ def parse_codecs(codecs_str):
vcodec, acodec = None, None vcodec, acodec = None, None
for full_codec in splited_codecs: for full_codec in splited_codecs:
codec = full_codec.split('.')[0] codec = full_codec.split('.')[0]
if codec in ('avc1', 'avc2', 'avc3', 'avc4', 'vp9', 'vp8', 'hev1', 'hev2', 'h263', 'h264', 'mp4v', 'hvc1'): if codec in ('avc1', 'avc2', 'avc3', 'avc4', 'vp9', 'vp8', 'hev1', 'hev2', 'h263', 'h264', 'mp4v', 'hvc1', 'av01'):
if not vcodec: if not vcodec:
vcodec = full_codec vcodec = full_codec
elif codec in ('mp4a', 'opus', 'vorbis', 'mp3', 'aac', 'ac-3', 'ec-3', 'eac3', 'dtsc', 'dtse', 'dtsh', 'dtsl'): elif codec in ('mp4a', 'opus', 'vorbis', 'mp3', 'aac', 'ac-3', 'ec-3', 'eac3', 'dtsc', 'dtse', 'dtsh', 'dtsl'):
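
Adding `av01` to the video codec tuple lets AV1 streams be classified as video. The sketch below is a simplified stand-in for the classification loop shown in the hunk, not the full `parse_codecs` function: it returns a plain tuple, and the names `split_codecs`, `VIDEO_CODECS`, and `AUDIO_CODECS` are assumptions made for this example.

```python
VIDEO_CODECS = ('avc1', 'avc2', 'avc3', 'avc4', 'vp9', 'vp8', 'hev1', 'hev2',
                'h263', 'h264', 'mp4v', 'hvc1', 'av01')
AUDIO_CODECS = ('mp4a', 'opus', 'vorbis', 'mp3', 'aac', 'ac-3', 'ec-3',
                'eac3', 'dtsc', 'dtse', 'dtsh', 'dtsl')


def split_codecs(codecs_str):
    """Return (vcodec, acodec) for a comma-separated codecs string."""
    vcodec, acodec = None, None
    for full_codec in (c.strip() for c in codecs_str.split(',')):
        # Only the part before the first dot identifies the codec family.
        codec = full_codec.split('.')[0]
        if codec in VIDEO_CODECS:
            if not vcodec:
                vcodec = full_codec
        elif codec in AUDIO_CODECS:
            if not acodec:
                acodec = full_codec
    return vcodec, acodec


av1_pair = split_codecs('av01.0.05M.08, opus')
```

Before the change, a string starting with `av01` would have fallen through both branches and left the video codec unset.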


@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2018.08.28' __version__ = '2018.09.26'