mirror of https://codeberg.org/polarisfm/youtube-dl synced 2024-12-25 07:47:55 +01:00

Merge pull request #4 from ytdl-org/master

Master 01.03.2020 to own
Oneboy1979 2020-03-02 16:13:45 +01:00 committed by GitHub
commit 5770e2911c
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
85 changed files with 1913 additions and 1536 deletions


@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First off, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.01.01. If it's not, see https://yt-dl.org/update on how to update. Issues with an outdated version will be REJECTED.
- First off, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.03.01. If it's not, see https://yt-dl.org/update on how to update. Issues with an outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -26,7 +26,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support
- [ ] I've verified that I'm running youtube-dl version **2020.01.01**
- [ ] I've verified that I'm running youtube-dl version **2020.03.01**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones
@ -41,7 +41,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2020.01.01
[debug] youtube-dl version 2020.03.01
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}


@ -19,7 +19,7 @@ labels: 'site-support-request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First off, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.01.01. If it's not, see https://yt-dl.org/update on how to update. Issues with an outdated version will be REJECTED.
- First off, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.03.01. If it's not, see https://yt-dl.org/update on how to update. Issues with an outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a new site support request
- [ ] I've verified that I'm running youtube-dl version **2020.01.01**
- [ ] I've verified that I'm running youtube-dl version **2020.03.01**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones


@ -18,13 +18,13 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First off, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.01.01. If it's not, see https://yt-dl.org/update on how to update. Issues with an outdated version will be REJECTED.
- First off, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.03.01. If it's not, see https://yt-dl.org/update on how to update. Issues with an outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a site feature request
- [ ] I've verified that I'm running youtube-dl version **2020.01.01**
- [ ] I've verified that I'm running youtube-dl version **2020.03.01**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones


@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First off, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.01.01. If it's not, see https://yt-dl.org/update on how to update. Issues with an outdated version will be REJECTED.
- First off, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.03.01. If it's not, see https://yt-dl.org/update on how to update. Issues with an outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running youtube-dl version **2020.01.01**
- [ ] I've verified that I'm running youtube-dl version **2020.03.01**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones
@ -43,7 +43,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2020.01.01
[debug] youtube-dl version 2020.03.01
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}


@ -19,13 +19,13 @@ labels: 'request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First off, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.01.01. If it's not, see https://yt-dl.org/update on how to update. Issues with an outdated version will be REJECTED.
- First off, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.03.01. If it's not, see https://yt-dl.org/update on how to update. Issues with an outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a feature request
- [ ] I've verified that I'm running youtube-dl version **2020.01.01**
- [ ] I've verified that I'm running youtube-dl version **2020.03.01**
- [ ] I've searched the bugtracker for similar feature requests including closed ones


@ -13,7 +13,7 @@ dist: trusty
env:
- YTDL_TEST_SET=core
- YTDL_TEST_SET=download
matrix:
jobs:
include:
- python: 3.7
dist: xenial
@ -35,6 +35,11 @@ matrix:
env: YTDL_TEST_SET=download
- env: JYTHON=true; YTDL_TEST_SET=core
- env: JYTHON=true; YTDL_TEST_SET=download
- name: flake8
python: 3.8
dist: xenial
install: pip install flake8
script: flake8 .
fast_finish: true
allow_failures:
- env: YTDL_TEST_SET=download

ChangeLog

@ -1,3 +1,114 @@
version 2020.03.01
Core
* [YoutubeDL] Force redirect URL to unicode on python 2
- [options] Remove duplicate short option -v for --version (#24162)
Extractors
* [xhamster] Fix extraction (#24205)
* [franceculture] Fix extraction (#24204)
+ [telecinco] Add support for article opening videos
* [telecinco] Fix extraction (#24195)
* [xtube] Fix metadata extraction (#21073, #22455)
* [youjizz] Fix extraction (#24181)
- Remove no longer needed compat_str around geturl
* [pornhd] Fix extraction (#24128)
+ [teachable] Add support for multiple videos per lecture (#24101)
+ [wistia] Add support for multiple generic embeds (#8347, #11385)
* [imdb] Fix extraction (#23443)
* [tv2dk:bornholm:play] Fix extraction (#24076)
version 2020.02.16
Core
* [YoutubeDL] Fix playlist entry indexing with --playlist-items (#10591,
#10622)
* [update] Fix updating via symlinks (#23991)
+ [compat] Introduce compat_realpath (#23991)
Extractors
+ [npr] Add support for streams (#24042)
+ [24video] Add support for porn.24video.net (#23779, #23784)
- [jpopsuki] Remove extractor (#23858)
* [nova] Improve extraction (#23690)
* [nova:embed] Improve (#23690)
* [nova:embed] Fix extraction (#23672)
+ [abc:iview] Add support for 720p (#22907, #22921)
* [nytimes] Improve format sorting (#24010)
+ [toggle] Add support for mewatch.sg (#23895, #23930)
* [thisoldhouse] Fix extraction (#23951)
+ [popcorntimes] Add support for popcorntimes.tv (#23949)
* [sportdeutschland] Update to new API
* [twitch:stream] Lowercase channel id for stream request (#23917)
* [tv5mondeplus] Fix extraction (#23907, #23911)
* [tva] Relax URL regular expression (#23903)
* [vimeo] Fix album extraction (#23864)
* [viewlift] Improve extraction
* Fix extraction (#23851)
+ Add support for authentication
+ Add support for more domains
* [svt] Fix series extraction (#22297)
* [svt] Fix article extraction (#22897, #22919)
* [soundcloud] Improve private playlist/set tracks extraction (#3707)
version 2020.01.24
Extractors
* [youtube] Fix sigfunc name extraction (#23819)
* [stretchinternet] Fix extraction (#4319)
* [voicerepublic] Fix extraction
* [azmedien] Fix extraction (#23783)
* [businessinsider] Fix jwplatform id extraction (#22929, #22954)
+ [24video] Add support for 24video.vip (#23753)
* [ivi:compilation] Fix entries extraction (#23770)
* [ard] Improve extraction (#23761)
* Simplify extraction
+ Extract age limit and series
* Bypass geo-restriction
+ [nbc] Add support for nbc multi network URLs (#23049)
* [americastestkitchen] Fix extraction
* [zype] Improve extraction
+ Extract subtitles (#21258)
+ Support URLs with alternative keys/tokens (#21258)
+ Extract more metadata
* [orf:tvthek] Improve geo restricted videos detection (#23741)
* [soundcloud] Restore previews extraction (#23739)
version 2020.01.15
Extractors
* [yourporn] Fix extraction (#21645, #22255, #23459)
+ [canvas] Add support for new API endpoint (#17680, #18629)
* [ndr:base:embed] Improve thumbnails extraction (#23731)
+ [vodplatform] Add support for embed.kwikmotion.com domain
+ [twitter] Add support for promo_video_website cards (#23711)
* [orf:radio] Clean description and improve extraction
* [orf:fm4] Fix extraction (#23599)
* [safari] Fix kaltura session extraction (#23679, #23670)
* [lego] Fix extraction and extract subtitle (#23687)
* [cloudflarestream] Improve extraction
+ Add support for bytehighway.net domain
+ Add support for signed URLs
+ Extract thumbnail
* [naver] Improve extraction
* Improve geo-restriction handling
+ Extract automatic captions
+ Extract uploader metadata
+ Extract VLive HLS formats
* Improve metadata extraction
- [pandatv] Remove extractor (#23630)
* [dctp] Fix format extraction (#23656)
+ [scrippsnetworks] Add support for www.discovery.com videos
* [discovery] Fix anonymous token extraction (#23650)
* [nrktv:seriebase] Fix extraction (#23625, #23537)
* [wistia] Improve format extraction and extract subtitles (#22590)
* [vice] Improve extraction (#23631)
* [redtube] Detect private videos (#23518)
version 2020.01.01
Extractors


@ -1,7 +1,6 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import base64
import io
import json
import mimetypes


@ -389,7 +389,6 @@
- **JeuxVideo**
- **Joj**
- **Jove**
- **jpopsuki.tv**
- **JWPlatform**
- **Kakao**
- **Kaltura**
@ -628,7 +627,6 @@
- **OutsideTV**
- **PacktPub**
- **PacktPubCourse**
- **PandaTV**: 熊猫TV
- **pandora.tv**: 판도라TV
- **ParamountNetwork**
- **parliamentlive.tv**: UK parliament videos
@ -664,6 +662,7 @@
- **Pokemon**
- **PolskieRadio**
- **PolskieRadioCategory**
- **Popcorntimes**
- **PopcornTV**
- **PornCom**
- **PornerBros**
@ -1005,8 +1004,8 @@
- **Vidzi**
- **vier**: vier.be and vijf.be
- **vier:videos**
- **ViewLift**
- **ViewLiftEmbed**
- **viewlift**
- **viewlift:embed**
- **Viidea**
- **viki**
- **viki:channel**


@ -816,11 +816,15 @@ class TestYoutubeDL(unittest.TestCase):
'webpage_url': 'http://example.com',
}
def get_ids(params):
def get_downloaded_info_dicts(params):
ydl = YDL(params)
# make a copy because the dictionary can be modified
ydl.process_ie_result(playlist.copy())
return [int(v['id']) for v in ydl.downloaded_info_dicts]
# make a deep copy because the dictionary and nested entries
# can be modified
ydl.process_ie_result(copy.deepcopy(playlist))
return ydl.downloaded_info_dicts
def get_ids(params):
return [int(v['id']) for v in get_downloaded_info_dicts(params)]
result = get_ids({})
self.assertEqual(result, [1, 2, 3, 4])
@ -852,6 +856,22 @@ class TestYoutubeDL(unittest.TestCase):
result = get_ids({'playlist_items': '2-4,3-4,3'})
self.assertEqual(result, [2, 3, 4])
# Tests for https://github.com/ytdl-org/youtube-dl/issues/10591
# @{
result = get_downloaded_info_dicts({'playlist_items': '2-4,3-4,3'})
self.assertEqual(result[0]['playlist_index'], 2)
self.assertEqual(result[1]['playlist_index'], 3)
result = get_downloaded_info_dicts({'playlist_items': '2-4,3-4,3'})
self.assertEqual(result[0]['playlist_index'], 2)
self.assertEqual(result[1]['playlist_index'], 3)
self.assertEqual(result[2]['playlist_index'], 4)
result = get_downloaded_info_dicts({'playlist_items': '4,2'})
self.assertEqual(result[0]['playlist_index'], 4)
self.assertEqual(result[1]['playlist_index'], 2)
# @}
def test_urlopen_no_file_protocol(self):
# see https://github.com/ytdl-org/youtube-dl/issues/8227
ydl = YDL()
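
Note on the `playlist_items` tests above: a minimal, standalone sketch of the behaviour they check, assuming a spec such as '2-4,3-4,3' expands to unique 1-based positions in first-seen order. The helper names `expand_playlist_items` and `assign_playlist_indices` are hypothetical and are not part of youtube-dl; this only illustrates why each downloaded entry should keep its original playlist position.

```python
def expand_playlist_items(spec):
    """Expand a spec such as '2-4,3-4,3' into unique 1-based indices."""
    seen = set()
    ordered = []
    for segment in spec.split(','):
        if '-' in segment:
            start, end = segment.split('-')
            candidates = range(int(start), int(end) + 1)
        else:
            candidates = [int(segment)]
        for index in candidates:
            # Keep only the first occurrence so the order of the spec is preserved.
            if index not in seen:
                seen.add(index)
                ordered.append(index)
    return ordered


def assign_playlist_indices(entries, spec):
    """Pair each selected entry with its original playlist position."""
    items = [i for i in expand_playlist_items(spec) if 0 < i <= len(entries)]
    return [(i, entries[i - 1]) for i in items]
```

With a four-entry playlist, the spec '4,2' selects the fourth entry and then the second, and each keeps its own position rather than being renumbered from the playlist start.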


@ -26,7 +26,6 @@ from youtube_dl.extractor import (
ThePlatformIE,
ThePlatformFeedIE,
RTVEALaCartaIE,
FunnyOrDieIE,
DemocracynowIE,
)
@ -322,18 +321,6 @@ class TestRtveSubtitles(BaseTestSubtitles):
self.assertEqual(md5(subtitles['es']), '69e70cae2d40574fb7316f31d6eb7fca')
class TestFunnyOrDieSubtitles(BaseTestSubtitles):
url = 'http://www.funnyordie.com/videos/224829ff6d/judd-apatow-will-direct-your-vine'
IE = FunnyOrDieIE
def test_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), set(['en']))
self.assertEqual(md5(subtitles['en']), 'c5593c193eacd353596c11c2d4f9ecc4')
class TestDemocracynowSubtitles(BaseTestSubtitles):
url = 'http://www.democracynow.org/shows/2015/7/3'
IE = DemocracynowIE


@ -92,6 +92,7 @@ from .utils import (
YoutubeDLCookieJar,
YoutubeDLCookieProcessor,
YoutubeDLHandler,
YoutubeDLRedirectHandler,
)
from .cache import Cache
from .extractor import get_info_extractor, gen_extractor_classes, _LAZY_LOADER
@ -990,7 +991,7 @@ class YoutubeDL(object):
'playlist_title': ie_result.get('title'),
'playlist_uploader': ie_result.get('uploader'),
'playlist_uploader_id': ie_result.get('uploader_id'),
'playlist_index': i + playliststart,
'playlist_index': playlistitems[i - 1] if playlistitems else i + playliststart,
'extractor': ie_result['extractor'],
'webpage_url': ie_result['webpage_url'],
'webpage_url_basename': url_basename(ie_result['webpage_url']),
@ -2343,6 +2344,7 @@ class YoutubeDL(object):
debuglevel = 1 if self.params.get('debug_printtraffic') else 0
https_handler = make_HTTPS_handler(self.params, debuglevel=debuglevel)
ydlh = YoutubeDLHandler(self.params, debuglevel=debuglevel)
redirect_handler = YoutubeDLRedirectHandler()
data_handler = compat_urllib_request_DataHandler()
# When passing our own FileHandler instance, build_opener won't add the
@ -2356,7 +2358,7 @@ class YoutubeDL(object):
file_handler.file_open = file_open
opener = compat_urllib_request.build_opener(
proxy_handler, https_handler, cookie_processor, ydlh, data_handler, file_handler)
proxy_handler, https_handler, cookie_processor, ydlh, redirect_handler, data_handler, file_handler)
# Delete the default user-agent header, which would otherwise apply in
# cases where our custom HTTP handler doesn't come into play
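
Note on the redirect handler wiring above: a rough Python 3 sketch of how a custom redirect handler can be slotted into an opener. `RecordingRedirectHandler` is a hypothetical class written for illustration; it is not youtube-dl's `YoutubeDLRedirectHandler`, whose body is not shown in this diff. The sketch relies on the documented `build_opener` behaviour that an instance of a subclass of a default handler replaces that default.

```python
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that records every redirect target it sees."""

    def __init__(self):
        self.seen = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Remember the target, then let the standard handler build the request.
        self.seen.append(newurl)
        return super().redirect_request(req, fp, code, msg, headers, newurl)


handler = RecordingRedirectHandler()
# Passing the instance makes build_opener skip the stock HTTPRedirectHandler.
opener = urllib.request.build_opener(handler)
```

Overriding `redirect_request` is the usual hook for inspecting or normalising the redirect URL before the follow-up request is made.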


@ -2754,6 +2754,17 @@ else:
compat_expanduser = os.path.expanduser
if compat_os_name == 'nt' and sys.version_info < (3, 8):
# os.path.realpath on Windows does not follow symbolic links
# prior to Python 3.8 (see https://bugs.python.org/issue9949)
def compat_realpath(path):
while os.path.islink(path):
path = os.path.abspath(os.readlink(path))
return path
else:
compat_realpath = os.path.realpath
if sys.version_info < (3, 0):
def compat_print(s):
from .utils import preferredencoding
@ -2998,6 +3009,7 @@ __all__ = [
'compat_os_name',
'compat_parse_qs',
'compat_print',
'compat_realpath',
'compat_setenv',
'compat_shlex_quote',
'compat_shlex_split',
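
Note on `compat_realpath` above: a standalone sketch of the same idea, following a chain of symbolic links by hand. The function name `resolve_links` and the `max_hops` guard are assumptions added for illustration. Unlike the loop in the diff, this variant resolves a relative link target against the directory that holds the link, and it stops on link cycles.

```python
import os


def resolve_links(path, max_hops=32):
    """Follow a chain of symbolic links by hand, one hop at a time."""
    hops = 0
    while os.path.islink(path):
        if hops >= max_hops:
            raise OSError('too many levels of symbolic links: %r' % path)
        target = os.readlink(path)
        # A relative target is relative to the directory holding the link.
        if not os.path.isabs(target):
            target = os.path.join(os.path.dirname(path), target)
        path = os.path.abspath(target)
        hops += 1
    return path
```

Like the loop in the diff, this only follows links in the final path component; it does not resolve symlinked parent directories the way `os.path.realpath` does on POSIX systems.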


@ -110,17 +110,17 @@ class ABCIViewIE(InfoExtractor):
# ABC iview programs are normally available for 14 days only.
_TESTS = [{
'url': 'https://iview.abc.net.au/show/ben-and-hollys-little-kingdom/series/0/video/ZX9371A050S00',
'md5': 'cde42d728b3b7c2b32b1b94b4a548afc',
'url': 'https://iview.abc.net.au/show/gruen/series/11/video/LE1927H001S00',
'md5': '67715ce3c78426b11ba167d875ac6abf',
'info_dict': {
'id': 'ZX9371A050S00',
'id': 'LE1927H001S00',
'ext': 'mp4',
'title': "Gaston's Birthday",
'series': "Ben And Holly's Little Kingdom",
'description': 'md5:f9de914d02f226968f598ac76f105bcf',
'upload_date': '20180604',
'uploader_id': 'abc4kids',
'timestamp': 1528140219,
'title': "Series 11 Ep 1",
'series': "Gruen",
'description': 'md5:52cc744ad35045baf6aded2ce7287f67',
'upload_date': '20190925',
'uploader_id': 'abc1',
'timestamp': 1569445289,
},
'params': {
'skip_download': True,
@ -148,7 +148,7 @@ class ABCIViewIE(InfoExtractor):
'hdnea': token,
})
for sd in ('sd', 'sd-low'):
for sd in ('720', 'sd', 'sd-low'):
sd_url = try_get(
stream, lambda x: x['streams']['hls'][sd], compat_str)
if not sd_url:


@ -5,6 +5,7 @@ from .common import InfoExtractor
from ..utils import (
clean_html,
int_or_none,
js_to_json,
try_get,
unified_strdate,
)
@ -13,22 +14,21 @@ from ..utils import (
class AmericasTestKitchenIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?americastestkitchen\.com/(?:episode|videos)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.americastestkitchen.com/episode/548-summer-dinner-party',
'url': 'https://www.americastestkitchen.com/episode/582-weeknight-japanese-suppers',
'md5': 'b861c3e365ac38ad319cfd509c30577f',
'info_dict': {
'id': '1_5g5zua6e',
'title': 'Summer Dinner Party',
'id': '5b400b9ee338f922cb06450c',
'title': 'Weeknight Japanese Suppers',
'ext': 'mp4',
'description': 'md5:858d986e73a4826979b6a5d9f8f6a1ec',
'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1497285541,
'upload_date': '20170612',
'uploader_id': 'roger.metcalf@americastestkitchen.com',
'release_date': '20170617',
'description': 'md5:3d0c1a44bb3b27607ce82652db25b4a8',
'thumbnail': r're:^https?://',
'timestamp': 1523664000,
'upload_date': '20180414',
'release_date': '20180414',
'series': "America's Test Kitchen",
'season_number': 17,
'episode': 'Summer Dinner Party',
'episode_number': 24,
'season_number': 18,
'episode': 'Weeknight Japanese Suppers',
'episode_number': 15,
},
'params': {
'skip_download': True,
@ -47,7 +47,7 @@ class AmericasTestKitchenIE(InfoExtractor):
self._search_regex(
r'window\.__INITIAL_STATE__\s*=\s*({.+?})\s*;\s*</script>',
webpage, 'initial context'),
video_id)
video_id, js_to_json)
ep_data = try_get(
video_data,
@ -55,17 +55,7 @@ class AmericasTestKitchenIE(InfoExtractor):
lambda x: x['videoDetail']['content']['data']), dict)
ep_meta = ep_data.get('full_video', {})
zype_id = ep_meta.get('zype_id')
if zype_id:
embed_url = 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id
ie_key = 'Zype'
else:
partner_id = self._search_regex(
r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
webpage, 'kaltura partner id')
external_id = ep_data.get('external_id') or ep_meta['external_id']
embed_url = 'kaltura:%s:%s' % (partner_id, external_id)
ie_key = 'Kaltura'
zype_id = ep_data.get('zype_id') or ep_meta['zype_id']
title = ep_data.get('title') or ep_meta.get('title')
description = clean_html(ep_meta.get('episode_description') or ep_data.get(
@ -79,8 +69,8 @@ class AmericasTestKitchenIE(InfoExtractor):
return {
'_type': 'url_transparent',
'url': embed_url,
'ie_key': ie_key,
'url': 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id,
'ie_key': 'Zype',
'title': title,
'description': description,
'thumbnail': thumbnail,


@ -1,6 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
@ -22,7 +23,101 @@ from ..utils import (
from ..compat import compat_etree_fromstring
class ARDMediathekIE(InfoExtractor):
class ARDMediathekBaseIE(InfoExtractor):
_GEO_COUNTRIES = ['DE']
def _extract_media_info(self, media_info_url, webpage, video_id):
media_info = self._download_json(
media_info_url, video_id, 'Downloading media JSON')
return self._parse_media_info(media_info, video_id, '"fsk"' in webpage)
def _parse_media_info(self, media_info, video_id, fsk):
formats = self._extract_formats(media_info, video_id)
if not formats:
if fsk:
raise ExtractorError(
'This video is only available after 20:00', expected=True)
elif media_info.get('_geoblocked'):
self.raise_geo_restricted(
'This video is not available due to geoblocking',
countries=self._GEO_COUNTRIES)
self._sort_formats(formats)
subtitles = {}
subtitle_url = media_info.get('_subtitleUrl')
if subtitle_url:
subtitles['de'] = [{
'ext': 'ttml',
'url': subtitle_url,
}]
return {
'id': video_id,
'duration': int_or_none(media_info.get('_duration')),
'thumbnail': media_info.get('_previewImage'),
'is_live': media_info.get('_isLive') is True,
'formats': formats,
'subtitles': subtitles,
}
def _extract_formats(self, media_info, video_id):
type_ = media_info.get('_type')
media_array = media_info.get('_mediaArray', [])
formats = []
for num, media in enumerate(media_array):
for stream in media.get('_mediaStreamArray', []):
stream_urls = stream.get('_stream')
if not stream_urls:
continue
if not isinstance(stream_urls, list):
stream_urls = [stream_urls]
quality = stream.get('_quality')
server = stream.get('_server')
for stream_url in stream_urls:
if not url_or_none(stream_url):
continue
ext = determine_ext(stream_url)
if quality != 'auto' and ext in ('f4m', 'm3u8'):
continue
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
update_url_query(stream_url, {
'hdcore': '3.1.1',
'plugin': 'aasp-3.1.1.69.124'
}), video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
stream_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
else:
if server and server.startswith('rtmp'):
f = {
'url': server,
'play_path': stream_url,
'format_id': 'a%s-rtmp-%s' % (num, quality),
}
else:
f = {
'url': stream_url,
'format_id': 'a%s-%s-%s' % (num, ext, quality)
}
m = re.search(
r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$',
stream_url)
if m:
f.update({
'width': int(m.group('width')),
'height': int(m.group('height')),
})
if type_ == 'audio':
f['vcodec'] = 'none'
formats.append(f)
return formats
class ARDMediathekIE(ARDMediathekBaseIE):
IE_NAME = 'ARD:mediathek'
_VALID_URL = r'^https?://(?:(?:(?:www|classic)\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de|one\.ard\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
@ -63,94 +158,6 @@ class ARDMediathekIE(InfoExtractor):
def suitable(cls, url):
return False if ARDBetaMediathekIE.suitable(url) else super(ARDMediathekIE, cls).suitable(url)
def _extract_media_info(self, media_info_url, webpage, video_id):
media_info = self._download_json(
media_info_url, video_id, 'Downloading media JSON')
formats = self._extract_formats(media_info, video_id)
if not formats:
if '"fsk"' in webpage:
raise ExtractorError(
'This video is only available after 20:00', expected=True)
elif media_info.get('_geoblocked'):
raise ExtractorError('This video is not available due to geo restriction', expected=True)
self._sort_formats(formats)
duration = int_or_none(media_info.get('_duration'))
thumbnail = media_info.get('_previewImage')
is_live = media_info.get('_isLive') is True
subtitles = {}
subtitle_url = media_info.get('_subtitleUrl')
if subtitle_url:
subtitles['de'] = [{
'ext': 'ttml',
'url': subtitle_url,
}]
return {
'id': video_id,
'duration': duration,
'thumbnail': thumbnail,
'is_live': is_live,
'formats': formats,
'subtitles': subtitles,
}
def _extract_formats(self, media_info, video_id):
type_ = media_info.get('_type')
media_array = media_info.get('_mediaArray', [])
formats = []
for num, media in enumerate(media_array):
for stream in media.get('_mediaStreamArray', []):
stream_urls = stream.get('_stream')
if not stream_urls:
continue
if not isinstance(stream_urls, list):
stream_urls = [stream_urls]
quality = stream.get('_quality')
server = stream.get('_server')
for stream_url in stream_urls:
if not url_or_none(stream_url):
continue
ext = determine_ext(stream_url)
if quality != 'auto' and ext in ('f4m', 'm3u8'):
continue
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
update_url_query(stream_url, {
'hdcore': '3.1.1',
'plugin': 'aasp-3.1.1.69.124'
}),
video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
if server and server.startswith('rtmp'):
f = {
'url': server,
'play_path': stream_url,
'format_id': 'a%s-rtmp-%s' % (num, quality),
}
else:
f = {
'url': stream_url,
'format_id': 'a%s-%s-%s' % (num, ext, quality)
}
m = re.search(r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$', stream_url)
if m:
f.update({
'width': int(m.group('width')),
'height': int(m.group('height')),
})
if type_ == 'audio':
f['vcodec'] = 'none'
formats.append(f)
return formats
def _real_extract(self, url):
# determine video id from url
m = re.match(self._VALID_URL, url)
@ -302,19 +309,20 @@ class ARDIE(InfoExtractor):
}
class ARDBetaMediathekIE(InfoExtractor):
_VALID_URL = r'https://(?:beta|www)\.ardmediathek\.de/[^/]+/(?:player|live)/(?P<video_id>[a-zA-Z0-9]+)(?:/(?P<display_id>[^/?#]+))?'
class ARDBetaMediathekIE(ARDMediathekBaseIE):
_VALID_URL = r'https://(?:beta|www)\.ardmediathek\.de/(?P<client>[^/]+)/(?:player|live)/(?P<video_id>[a-zA-Z0-9]+)(?:/(?P<display_id>[^/?#]+))?'
_TESTS = [{
'url': 'https://beta.ardmediathek.de/ard/player/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE/die-robuste-roswita',
'md5': '2d02d996156ea3c397cfc5036b5d7f8f',
'md5': 'dfdc87d2e7e09d073d5a80770a9ce88f',
'info_dict': {
'display_id': 'die-robuste-roswita',
'id': 'Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
'title': 'Tatort: Die robuste Roswita',
'id': '70153354',
'title': 'Die robuste Roswita',
'description': r're:^Der Mord.*trüber ist als die Ilm.',
'duration': 5316,
'thumbnail': 'https://img.ardmediathek.de/standard/00/55/43/59/34/-1774185891/16x9/960?mandant=ard',
'upload_date': '20180826',
'thumbnail': 'https://img.ardmediathek.de/standard/00/70/15/33/90/-1852531467/16x9/960?mandant=ard',
'timestamp': 1577047500,
'upload_date': '20191222',
'ext': 'mp4',
},
}, {
@ -330,71 +338,69 @@ class ARDBetaMediathekIE(InfoExtractor):
video_id = mobj.group('video_id')
display_id = mobj.group('display_id') or video_id
webpage = self._download_webpage(url, display_id)
data_json = self._search_regex(r'window\.__APOLLO_STATE__\s*=\s*(\{.*);\n', webpage, 'json')
data = self._parse_json(data_json, display_id)
res = {
'id': video_id,
'display_id': display_id,
player_page = self._download_json(
'https://api.ardmediathek.de/public-gateway',
display_id, data=json.dumps({
'query': '''{
playerPage(client:"%s", clipId: "%s") {
blockedByFsk
broadcastedOn
maturityContentRating
mediaCollection {
_duration
_geoblocked
_isLive
_mediaArray {
_mediaStreamArray {
_quality
_server
_stream
}
formats = []
subtitles = {}
geoblocked = False
for widget in data.values():
if widget.get('_geoblocked') is True:
geoblocked = True
if '_duration' in widget:
res['duration'] = int_or_none(widget['_duration'])
if 'clipTitle' in widget:
res['title'] = widget['clipTitle']
if '_previewImage' in widget:
res['thumbnail'] = widget['_previewImage']
if 'broadcastedOn' in widget:
res['timestamp'] = unified_timestamp(widget['broadcastedOn'])
if 'synopsis' in widget:
res['description'] = widget['synopsis']
subtitle_url = url_or_none(widget.get('_subtitleUrl'))
if subtitle_url:
subtitles.setdefault('de', []).append({
'ext': 'ttml',
'url': subtitle_url,
})
if '_quality' in widget:
format_url = url_or_none(try_get(
widget, lambda x: x['_stream']['json'][0]))
if not format_url:
continue
ext = determine_ext(format_url)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
format_url + '?hdcore=3.11.0',
video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', m3u8_id='hls',
fatal=False))
else:
# HTTP formats are not available when geoblocked is True,
# other formats are fine though
if geoblocked:
continue
quality = str_or_none(widget.get('_quality'))
formats.append({
'format_id': ('http-' + quality) if quality else 'http',
'url': format_url,
'preference': 10, # Plain HTTP, that's nice
})
if not formats and geoblocked:
self.raise_geo_restricted(
msg='This video is not available due to geoblocking',
countries=['DE'])
self._sort_formats(formats)
res.update({
'subtitles': subtitles,
'formats': formats,
}
_previewImage
_subtitleUrl
_type
}
show {
title
}
synopsis
title
tracking {
atiCustomVars {
contentId
}
}
}
}''' % (mobj.group('client'), video_id),
}).encode(), headers={
'Content-Type': 'application/json'
})['data']['playerPage']
title = player_page['title']
content_id = str_or_none(try_get(
player_page, lambda x: x['tracking']['atiCustomVars']['contentId']))
media_collection = player_page.get('mediaCollection') or {}
if not media_collection and content_id:
media_collection = self._download_json(
'https://www.ardmediathek.de/play/media/' + content_id,
content_id, fatal=False) or {}
info = self._parse_media_info(
media_collection, content_id or video_id,
player_page.get('blockedByFsk'))
age_limit = None
description = player_page.get('synopsis')
maturity_content_rating = player_page.get('maturityContentRating')
if maturity_content_rating:
age_limit = int_or_none(maturity_content_rating.lstrip('FSK'))
if not age_limit and description:
age_limit = int_or_none(self._search_regex(
r'\(FSK\s*(\d+)\)\s*$', description, 'age limit', default=None))
info.update({
'age_limit': age_limit,
'display_id': display_id,
'title': title,
'description': description,
'timestamp': unified_timestamp(player_page.get('broadcastedOn')),
'series': try_get(player_page, lambda x: x['show']['title']),
})
return res
return info
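
The age-limit fallback in this hunk can be tried in isolation. The sketch below is a minimal standalone version, assuming a simplified `int_or_none` stand-in for the youtube-dl helper; the function name `extract_age_limit` and the sample inputs are hypothetical.

```python
import re


def int_or_none(value):
    # Simplified stand-in for youtube_dl.utils.int_or_none (assumption).
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def extract_age_limit(maturity_content_rating, description):
    age_limit = None
    if maturity_content_rating:
        # str.lstrip takes a set of characters, so this strips any leading
        # run of 'F', 'S' and 'K' rather than the literal prefix 'FSK'.
        age_limit = int_or_none(maturity_content_rating.lstrip('FSK'))
    if not age_limit and description:
        # Fall back to a trailing "(FSK 12)" style marker in the synopsis.
        mobj = re.search(r'\(FSK\s*(\d+)\)\s*$', description)
        if mobj:
            age_limit = int_or_none(mobj.group(1))
    return age_limit
```

Because the fallback is guarded by `not age_limit`, a rating of `FSK0` parses to 0 and still triggers the description lookup, which mirrors the logic in the diff.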

@ -47,39 +47,19 @@ class AZMedienIE(InfoExtractor):
'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1',
'only_matching': True
}]
_API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/cb9f2f81ed22e9b47f4ca64ea3cc5a5d13e88d1d'
_PARTNER_ID = '1719221'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
video_id = mobj.group('id')
entry_id = mobj.group('kaltura_id')
host, display_id, article_id, entry_id = re.match(self._VALID_URL, url).groups()
if not entry_id:
api_url = 'https://www.%s/api/pub/gql/%s' % (host, host.split('.')[0])
payload = {
'query': '''query VideoContext($articleId: ID!) {
article: node(id: $articleId) {
... on Article {
mainAssetRelation {
asset {
... on VideoAsset {
kalturaId
}
}
}
}
}
}''',
'variables': {'articleId': 'Article:%s' % mobj.group('article_id')},
}
json_data = self._download_json(
api_url, video_id, headers={
'Content-Type': 'application/json',
},
data=json.dumps(payload).encode())
entry_id = json_data['data']['article']['mainAssetRelation']['asset']['kalturaId']
entry_id = self._download_json(
self._API_TEMPL % (host, host.split('.')[0]), display_id, query={
'variables': json.dumps({
'contextId': 'NewsArticle:' + article_id,
}),
})['data']['context']['mainAsset']['video']['kaltura']['kalturaId']
return self.url_result(
'kaltura:%s:%s' % (self._PARTNER_ID, entry_id),
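
The new request passes the GraphQL variables as a JSON string in the query string instead of POSTing a full query. A rough sketch of the resulting URL, written for Python 3 only; `build_api_url` is a hypothetical helper, and the host and article ID are taken from the test URL above purely as sample input.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/cb9f2f81ed22e9b47f4ca64ea3cc5a5d13e88d1d'


def build_api_url(host, article_id):
    # The second template slot is the first label of the host name.
    base = API_TEMPL % (host, host.split('.')[0])
    query = {
        'variables': json.dumps({'contextId': 'NewsArticle:' + article_id}),
    }
    return base + '?' + urlencode(query)


sample_url = build_api_url('telebaern.tv', '133531189')
sample_variables = json.loads(
    parse_qs(urlsplit(sample_url).query)['variables'][0])
```

In the extractor itself, `_download_json(..., query=...)` does the URL encoding; the manual `urlencode` call here only makes that step visible.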

@ -9,21 +9,26 @@ class BusinessInsiderIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?businessinsider\.(?:com|nl)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://uk.businessinsider.com/how-much-radiation-youre-exposed-to-in-everyday-life-2016-6',
'md5': 'ca237a53a8eb20b6dc5bd60564d4ab3e',
'md5': 'ffed3e1e12a6f950aa2f7d83851b497a',
'info_dict': {
'id': 'hZRllCfw',
'id': 'cjGDb0X9',
'ext': 'mp4',
'title': "Here's how much radiation you're exposed to in everyday life",
'description': 'md5:9a0d6e2c279948aadaa5e84d6d9b99bd',
'upload_date': '20170709',
'timestamp': 1499606400,
},
'params': {
'skip_download': True,
'title': "Bananas give you more radiation exposure than living next to a nuclear power plant",
'description': 'md5:0175a3baf200dd8fa658f94cade841b3',
'upload_date': '20160611',
'timestamp': 1465675620,
},
}, {
'url': 'https://www.businessinsider.nl/5-scientifically-proven-things-make-you-less-attractive-2017-7/',
'only_matching': True,
'md5': '43f438dbc6da0b89f5ac42f68529d84a',
'info_dict': {
'id': '5zJwd4FK',
'ext': 'mp4',
'title': 'Deze dingen zorgen ervoor dat je minder snel een date scoort',
'description': 'md5:2af8975825d38a4fed24717bbe51db49',
'upload_date': '20170705',
'timestamp': 1499270528,
},
}, {
'url': 'http://www.businessinsider.com/excel-index-match-vlookup-video-how-to-2015-2?IR=T',
'only_matching': True,
@ -35,7 +40,8 @@ class BusinessInsiderIE(InfoExtractor):
jwplatform_id = self._search_regex(
(r'data-media-id=["\']([a-zA-Z0-9]{8})',
r'id=["\']jwplayer_([a-zA-Z0-9]{8})',
r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})'),
r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})',
r'(?:jwplatform\.com/players/|jwplayer_)([a-zA-Z0-9]{8})'),
webpage, 'jwplatform id')
return self.url_result(
'jwplatform:%s' % jwplatform_id, ie=JWPlatformIE.ie_key(),
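
`_search_regex` accepts a tuple of patterns and returns the first match, so the added fourth pattern only applies when the three older ones fail. A minimal sketch of that ordering with plain `re`; `find_jwplatform_id` is a hypothetical name and the HTML snippets in the test data are made-up samples.

```python
import re

JWPLATFORM_ID_PATTERNS = (
    r'data-media-id=["\']([a-zA-Z0-9]{8})',
    r'id=["\']jwplayer_([a-zA-Z0-9]{8})',
    r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})',
    r'(?:jwplatform\.com/players/|jwplayer_)([a-zA-Z0-9]{8})',
)


def find_jwplatform_id(webpage):
    # Try each pattern in order and stop at the first one that matches.
    for pattern in JWPLATFORM_ID_PATTERNS:
        mobj = re.search(pattern, webpage)
        if mobj:
            return mobj.group(1)
    return None
```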

@ -13,6 +13,8 @@ from ..utils import (
int_or_none,
merge_dicts,
parse_iso8601,
str_or_none,
url_or_none,
)
@ -20,15 +22,15 @@ class CanvasIE(InfoExtractor):
_VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet|vrt(?:video|nieuws)|sporza)/assets/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://mediazone.vrt.be/api/v1/ketnet/assets/md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'md5': '90139b746a0a9bd7bb631283f6e2a64e',
'md5': '68993eda72ef62386a15ea2cf3c93107',
'info_dict': {
'id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'display_id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'ext': 'flv',
'ext': 'mp4',
'title': 'Nachtwacht: De Greystook',
'description': 'md5:1db3f5dc4c7109c821261e7512975be7',
'description': 'Nachtwacht: De Greystook',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1468.03,
'duration': 1468.04,
},
'expected_warnings': ['is not a supported codec', 'Unknown MIME type'],
}, {
@ -39,23 +41,45 @@ class CanvasIE(InfoExtractor):
'HLS': 'm3u8_native',
'HLS_AES': 'm3u8',
}
_REST_API_BASE = 'https://media-services-public.vrt.be/vualto-video-aggregator-web/rest/external/v1'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
site_id, video_id = mobj.group('site_id'), mobj.group('id')
# Old API endpoint, serves more formats but may fail for some videos
data = self._download_json(
'https://mediazone.vrt.be/api/v1/%s/assets/%s'
% (site_id, video_id), video_id)
% (site_id, video_id), video_id, 'Downloading asset JSON',
'Unable to download asset JSON', fatal=False)
# New API endpoint
if not data:
token = self._download_json(
'%s/tokens' % self._REST_API_BASE, video_id,
'Downloading token', data=b'',
headers={'Content-Type': 'application/json'})['vrtPlayerToken']
data = self._download_json(
'%s/videos/%s' % (self._REST_API_BASE, video_id),
video_id, 'Downloading video JSON', fatal=False, query={
'vrtPlayerToken': token,
'client': '%s@PROD' % site_id,
}, expected_status=400)
message = data.get('message')
if message and not data.get('title'):
if data.get('code') == 'AUTHENTICATION_REQUIRED':
self.raise_login_required(message)
raise ExtractorError(message, expected=True)
title = data['title']
description = data.get('description')
formats = []
for target in data['targetUrls']:
format_url, format_type = target.get('url'), target.get('type')
format_url, format_type = url_or_none(target.get('url')), str_or_none(target.get('type'))
if not format_url or not format_type:
continue
format_type = format_type.upper()
if format_type in self._HLS_ENTRY_PROTOCOLS_MAP:
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', self._HLS_ENTRY_PROTOCOLS_MAP[format_type],
@ -134,20 +158,20 @@ class CanvasEenIE(InfoExtractor):
},
'skip': 'Pagina niet gevonden',
}, {
'url': 'https://www.een.be/sorry-voor-alles/herbekijk-sorry-voor-alles',
'url': 'https://www.een.be/thuis/emma-pakt-thilly-aan',
'info_dict': {
'id': 'mz-ast-11a587f8-b921-4266-82e2-0bce3e80d07f',
'display_id': 'herbekijk-sorry-voor-alles',
'id': 'md-ast-3a24ced2-64d7-44fb-b4ed-ed1aafbf90b8',
'display_id': 'emma-pakt-thilly-aan',
'ext': 'mp4',
'title': 'Herbekijk Sorry voor alles',
'description': 'md5:8bb2805df8164e5eb95d6a7a29dc0dd3',
'title': 'Emma pakt Thilly aan',
'description': 'md5:c5c9b572388a99b2690030afa3f3bad7',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 3788.06,
'duration': 118.24,
},
'params': {
'skip_download': True,
},
'skip': 'Episode no longer available',
'expected_warnings': ['is not a supported codec'],
}, {
'url': 'https://www.canvas.be/check-point/najaar-2016/de-politie-uw-vriend',
'only_matching': True,
@ -183,19 +207,44 @@ class VrtNUIE(GigyaBaseIE):
IE_DESC = 'VrtNU.be'
_VALID_URL = r'https?://(?:www\.)?vrt\.be/(?P<site_id>vrtnu)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
# Available via old API endpoint
'url': 'https://www.vrt.be/vrtnu/a-z/postbus-x/1/postbus-x-s1a1/',
'info_dict': {
'id': 'pbs-pub-2e2d8c27-df26-45c9-9dc6-90c78153044d$vid-90c932b1-e21d-4fb8-99b1-db7b49cf74de',
'ext': 'flv',
'ext': 'mp4',
'title': 'De zwarte weduwe',
'description': 'md5:d90c21dced7db869a85db89a623998d4',
'description': 'md5:db1227b0f318c849ba5eab1fef895ee4',
'duration': 1457.04,
'thumbnail': r're:^https?://.*\.jpg$',
'season': '1',
'season': 'Season 1',
'season_number': 1,
'episode_number': 1,
},
'skip': 'This video is only available for registered users'
'skip': 'This video is only available for registered users',
'params': {
'username': '<snip>',
'password': '<snip>',
},
'expected_warnings': ['is not a supported codec'],
}, {
# Only available via new API endpoint
'url': 'https://www.vrt.be/vrtnu/a-z/kamp-waes/1/kamp-waes-s1a5/',
'info_dict': {
'id': 'pbs-pub-0763b56c-64fb-4d38-b95b-af60bf433c71$vid-ad36a73c-4735-4f1f-b2c0-a38e6e6aa7e1',
'ext': 'mp4',
'title': 'Aflevering 5',
'description': 'Wie valt door de mand tijdens een missie?',
'duration': 2967.06,
'season': 'Season 1',
'season_number': 1,
'episode_number': 5,
},
'skip': 'This video is only available for registered users',
'params': {
'username': '<snip>',
'password': '<snip>',
},
'expected_warnings': ['Unable to download asset JSON', 'is not a supported codec', 'Unknown MIME type'],
}]
_NETRC_MACHINE = 'vrtnu'
_APIKEY = '3_0Z2HujMtiWq_pkAjgnS2Md2E11a1AwZjYiBETtwNE-EoEHDINgtnvcAOpNgmrVGy'

@ -1,20 +1,24 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import re
from .common import InfoExtractor
class CloudflareStreamIE(InfoExtractor):
_DOMAIN_RE = r'(?:cloudflarestream\.com|(?:videodelivery|bytehighway)\.net)'
_EMBED_RE = r'embed\.%s/embed/[^/]+\.js\?.*?\bvideo=' % _DOMAIN_RE
_ID_RE = r'[\da-f]{32}|[\w-]+\.[\w-]+\.[\w-]+'
_VALID_URL = r'''(?x)
https?://
(?:
(?:watch\.)?(?:cloudflarestream\.com|videodelivery\.net)/|
embed\.(?:cloudflarestream\.com|videodelivery\.net)/embed/[^/]+\.js\?.*?\bvideo=
(?:watch\.)?%s/|
%s
)
(?P<id>[\da-f]+)
'''
(?P<id>%s)
''' % (_DOMAIN_RE, _EMBED_RE, _ID_RE)
_TESTS = [{
'url': 'https://embed.cloudflarestream.com/embed/we4g.fla9.latest.js?video=31c9291ab41fac05471db4e73aa11717',
'info_dict': {
@ -41,23 +45,28 @@ class CloudflareStreamIE(InfoExtractor):
return [
mobj.group('url')
for mobj in re.finditer(
r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//embed\.(?:cloudflarestream\.com|videodelivery\.net)/embed/[^/]+\.js\?.*?\bvideo=[\da-f]+?.*?)\1',
r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//%s(?:%s).*?)\1' % (CloudflareStreamIE._EMBED_RE, CloudflareStreamIE._ID_RE),
webpage)]
def _real_extract(self, url):
video_id = self._match_id(url)
domain = 'bytehighway.net' if 'bytehighway.net/' in url else 'videodelivery.net'
base_url = 'https://%s/%s/' % (domain, video_id)
if '.' in video_id:
video_id = self._parse_json(base64.urlsafe_b64decode(
video_id.split('.')[1]), video_id)['sub']
manifest_base_url = base_url + 'manifest/video.'
formats = self._extract_m3u8_formats(
'https://cloudflarestream.com/%s/manifest/video.m3u8' % video_id,
video_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False)
manifest_base_url + 'm3u8', video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False)
formats.extend(self._extract_mpd_formats(
'https://cloudflarestream.com/%s/manifest/video.mpd' % video_id,
video_id, mpd_id='dash', fatal=False))
manifest_base_url + 'mpd', video_id, mpd_id='dash', fatal=False))
self._sort_formats(formats)
return {
'id': video_id,
'title': video_id,
'thumbnail': base_url + 'thumbnails/thumbnail.jpg',
'formats': formats,
}
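
When the ID contains dots, the extractor treats it as a three-part signed token and reads the real video ID from the `sub` field of the middle segment. The sketch below illustrates that step under two assumptions: the token is JWT-shaped, and missing base64 padding should be restored before decoding (the diff passes the segment to `urlsafe_b64decode` unchanged). All helper names and the sample token are hypothetical.

```python
import base64
import json


def subject_from_signed_id(video_id):
    """Return the 'sub' claim of a dotted token, or the ID unchanged."""
    if '.' not in video_id:
        return video_id
    payload = video_id.split('.')[1]
    # Token segments usually drop '=' padding; restore it so the decoder
    # does not reject the input. This step is an addition to the diff.
    payload += '=' * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload).decode('utf-8'))['sub']


def _b64url(data):
    # Helper used only to build a sample token for the demonstration.
    return base64.urlsafe_b64encode(data).rstrip(b'=').decode('ascii')


sample_id = '31c9291ab41fac05471db4e73aa11717'
sample_token = '.'.join([
    _b64url(b'{"alg":"RS256"}'),
    _b64url(json.dumps({'sub': sample_id}).encode('utf-8')),
    'signature-placeholder',
])
```

The signature segment is never verified here or in the extractor; only the payload is read.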

@ -16,10 +16,11 @@ class DctpTvIE(InfoExtractor):
_TESTS = [{
# 4x3
'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/',
'md5': '3ffbd1556c3fe210724d7088fad723e3',
'info_dict': {
'id': '95eaa4f33dad413aa17b4ee613cccc6c',
'display_id': 'videoinstallation-fuer-eine-kaufhausfassade',
'ext': 'flv',
'ext': 'm4v',
'title': 'Videoinstallation für eine Kaufhausfassade',
'description': 'Kurzfilm',
'thumbnail': r're:^https?://.*\.jpg$',
@ -27,10 +28,6 @@ class DctpTvIE(InfoExtractor):
'timestamp': 1302172322,
'upload_date': '20110407',
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
# 16x9
'url': 'http://www.dctp.tv/filme/sind-youtuber-die-besseren-lehrer/',
@ -59,33 +56,26 @@ class DctpTvIE(InfoExtractor):
uuid = media['uuid']
title = media['title']
ratio = '16x9' if media.get('is_wide') else '4x3'
play_path = 'mp4:%s_dctp_0500_%s.m4v' % (uuid, ratio)
is_wide = media.get('is_wide')
formats = []
servers = self._download_json(
'http://www.dctp.tv/streaming_servers/', display_id,
note='Downloading server list JSON', fatal=False)
def add_formats(suffix):
templ = 'https://%%s/%s_dctp_%s.m4v' % (uuid, suffix)
formats.extend([{
'format_id': 'hls-' + suffix,
'url': templ % 'cdn-segments.dctp.tv' + '/playlist.m3u8',
'protocol': 'm3u8_native',
}, {
'format_id': 's3-' + suffix,
'url': templ % 'completed-media.s3.amazonaws.com',
}, {
'format_id': 'http-' + suffix,
'url': templ % 'cdn-media.dctp.tv',
}])
if servers:
endpoint = next(
server['endpoint']
for server in servers
if url_or_none(server.get('endpoint'))
and 'cloudfront' in server['endpoint'])
else:
endpoint = 'rtmpe://s2pqqn4u96e4j8.cloudfront.net/cfx/st/'
app = self._search_regex(
r'^rtmpe?://[^/]+/(?P<app>.*)$', endpoint, 'app')
formats = [{
'url': endpoint,
'app': app,
'play_path': play_path,
'page_url': url,
'player_url': 'http://svm-prod-dctptv-static.s3.amazonaws.com/dctptv-relaunch2012-110.swf',
'ext': 'flv',
}]
add_formats('0500_' + ('16x9' if is_wide else '4x3'))
if is_wide:
add_formats('720p')
thumbnails = []
images = media.get('images')
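
The `add_formats` helper relies on a two-stage `%` template: `%%s` survives the first interpolation as `%s`, which is then filled with the CDN host. A standalone sketch of the same construction; `build_formats` and `build_all_formats` are hypothetical names, and returning lists instead of appending to an outer `formats` list is a simplification.

```python
def build_formats(uuid, suffix):
    # After this line the template still contains one '%s' for the host.
    templ = 'https://%%s/%s_dctp_%s.m4v' % (uuid, suffix)
    return [{
        'format_id': 'hls-' + suffix,
        # '%' binds tighter than '+', so the host is filled in before
        # the playlist path is appended.
        'url': templ % 'cdn-segments.dctp.tv' + '/playlist.m3u8',
        'protocol': 'm3u8_native',
    }, {
        'format_id': 's3-' + suffix,
        'url': templ % 'completed-media.s3.amazonaws.com',
    }, {
        'format_id': 'http-' + suffix,
        'url': templ % 'cdn-media.dctp.tv',
    }]


def build_all_formats(uuid, is_wide):
    formats = build_formats(uuid, '0500_' + ('16x9' if is_wide else '4x3'))
    if is_wide:
        # Wide videos additionally get a 720p variant.
        formats.extend(build_formats(uuid, '720p'))
    return formats
```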

@ -13,8 +13,8 @@ from ..compat import compat_HTTPError
class DiscoveryIE(DiscoveryGoBaseIE):
_VALID_URL = r'''(?x)https?://
(?P<site>
(?:(?:www|go)\.)?discovery|
(?:www\.)?
go\.discovery|
www\.
(?:
investigationdiscovery|
discoverylife|
@ -22,8 +22,7 @@ class DiscoveryIE(DiscoveryGoBaseIE):
ahctv|
destinationamerica|
sciencechannel|
tlc|
velocity
tlc
)|
watch\.
(?:
@ -83,7 +82,7 @@ class DiscoveryIE(DiscoveryGoBaseIE):
'authRel': 'authorization',
'client_id': '3020a40c2356a645b4b4',
'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
'redirectUri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html?https://www.%s.com' % site,
'redirectUri': 'https://www.discovery.com/',
})['access_token']
headers = self.geo_verification_headers()

@ -4,7 +4,6 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
encode_base_n,
ExtractorError,
@ -55,7 +54,7 @@ class EpornerIE(InfoExtractor):
webpage, urlh = self._download_webpage_handle(url, display_id)
video_id = self._match_id(compat_str(urlh.geturl()))
video_id = self._match_id(urlh.geturl())
hash = self._search_regex(
r'hash\s*:\s*["\']([\da-f]{32})', webpage, 'hash')

@ -497,7 +497,6 @@ from .jeuxvideo import JeuxVideoIE
from .jove import JoveIE
from .joj import JojIE
from .jwplatform import JWPlatformIE
from .jpopsukitv import JpopsukiIE
from .kakao import KakaoIE
from .kaltura import KalturaIE
from .kanalplay import KanalPlayIE
@ -808,7 +807,6 @@ from .packtpub import (
PacktPubIE,
PacktPubCourseIE,
)
from .pandatv import PandaTVIE
from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
@ -851,6 +849,7 @@ from .polskieradio import (
PolskieRadioIE,
PolskieRadioCategoryIE,
)
from .popcorntimes import PopcorntimesIE
from .popcorntv import PopcornTVIE
from .porn91 import Porn91IE
from .porncom import PornComIE

@ -31,7 +31,13 @@ class FranceCultureIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
video_data = extract_attributes(self._search_regex(
r'(?s)<div[^>]+class="[^"]*?(?:title-zone-diffusion|heading-zone-(?:wrapper|player-button))[^"]*?"[^>]*>.*?(<button[^>]+data-asset-source="[^"]+"[^>]+>)',
r'''(?sx)
(?:
</h1>|
<div[^>]+class="[^"]*?(?:title-zone-diffusion|heading-zone-(?:wrapper|player-button))[^"]*?"[^>]*>
).*?
(<button[^>]+data-asset-source="[^"]+"[^>]+>)
''',
webpage, 'video data'))
video_url = video_data['data-asset-source']

@ -2098,6 +2098,9 @@ class GenericIE(InfoExtractor):
'ext': 'mp4',
'title': 'Smoky Barbecue Favorites',
'thumbnail': r're:^https?://.*\.jpe?g',
'description': 'md5:5ff01e76316bd8d46508af26dc86023b',
'upload_date': '20170909',
'timestamp': 1504915200,
},
'add_ie': [ZypeIE.ie_key()],
'params': {
@ -2284,7 +2287,7 @@ class GenericIE(InfoExtractor):
if head_response is not False:
# Check for redirect
new_url = compat_str(head_response.geturl())
new_url = head_response.geturl()
if url != new_url:
self.report_following_redirect(new_url)
if force_videoid:
@ -2384,12 +2387,12 @@ class GenericIE(InfoExtractor):
return self.playlist_result(
self._parse_xspf(
doc, video_id, xspf_url=url,
xspf_base_url=compat_str(full_response.geturl())),
xspf_base_url=full_response.geturl()),
video_id)
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
info_dict['formats'] = self._parse_mpd_formats(
doc,
mpd_base_url=compat_str(full_response.geturl()).rpartition('/')[0],
mpd_base_url=full_response.geturl().rpartition('/')[0],
mpd_url=url)
self._sort_formats(info_dict['formats'])
return info_dict
@ -2534,14 +2537,15 @@ class GenericIE(InfoExtractor):
dailymail_urls, video_id, video_title, ie=DailyMailIE.ie_key())
# Look for embedded Wistia player
wistia_url = WistiaIE._extract_url(webpage)
if wistia_url:
return {
'_type': 'url_transparent',
'url': self._proto_relative_url(wistia_url),
'ie_key': WistiaIE.ie_key(),
'uploader': video_uploader,
}
wistia_urls = WistiaIE._extract_urls(webpage)
if wistia_urls:
playlist = self.playlist_from_matches(wistia_urls, video_id, video_title, ie=WistiaIE.ie_key())
for entry in playlist['entries']:
entry.update({
'_type': 'url_transparent',
'uploader': video_uploader,
})
return playlist
# Look for SVT player
svt_url = SVTIE._extract_url(webpage)
@ -2960,7 +2964,7 @@ class GenericIE(InfoExtractor):
# Look for VODPlatform embeds
mobj = re.search(
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?vod-platform\.net/[eE]mbed/.+?)\1',
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:(?:www\.)?vod-platform\.net|embed\.kwikmotion\.com)/[eE]mbed/.+?)\1',
webpage)
if mobj is not None:
return self.url_result(

@ -1,5 +1,7 @@
from __future__ import unicode_literals
import base64
import json
import re
from .common import InfoExtractor
@ -8,6 +10,7 @@ from ..utils import (
mimetype2ext,
parse_duration,
qualities,
try_get,
url_or_none,
)
@ -15,15 +18,16 @@ from ..utils import (
class ImdbIE(InfoExtractor):
IE_NAME = 'imdb'
IE_DESC = 'Internet Movie Database trailers'
_VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video|title|list).+?[/-]vi(?P<id>\d+)'
_VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video|title|list).*?[/-]vi(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.imdb.com/video/imdb/vi2524815897',
'info_dict': {
'id': '2524815897',
'ext': 'mp4',
'title': 'No. 2 from Ice Age: Continental Drift (2012)',
'title': 'No. 2',
'description': 'md5:87bd0bdc61e351f21f20d2d7441cb4e7',
'duration': 152,
}
}, {
'url': 'http://www.imdb.com/video/_/vi2524815897',
@ -47,21 +51,23 @@ class ImdbIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'https://www.imdb.com/videoplayer/vi' + video_id, video_id)
video_metadata = self._parse_json(self._search_regex(
r'window\.IMDbReactInitialState\.push\(({.+?})\);', webpage,
'video metadata'), video_id)['videos']['videoMetadata']['vi' + video_id]
title = self._html_search_meta(
['og:title', 'twitter:title'], webpage) or self._html_search_regex(
r'<title>(.+?)</title>', webpage, 'title', fatal=False) or video_metadata['title']
data = self._download_json(
'https://www.imdb.com/ve/data/VIDEO_PLAYBACK_DATA', video_id,
query={
'key': base64.b64encode(json.dumps({
'type': 'VIDEO_PLAYER',
'subType': 'FORCE_LEGACY',
'id': 'vi%s' % video_id,
}).encode()).decode(),
})[0]
quality = qualities(('SD', '480p', '720p', '1080p'))
formats = []
for encoding in video_metadata.get('encodings', []):
for encoding in data['videoLegacyEncodings']:
if not encoding or not isinstance(encoding, dict):
continue
video_url = url_or_none(encoding.get('videoUrl'))
video_url = url_or_none(encoding.get('url'))
if not video_url:
continue
ext = mimetype2ext(encoding.get(
@ -69,7 +75,7 @@ class ImdbIE(InfoExtractor):
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
preference=1, m3u8_id='hls', fatal=False))
continue
format_id = encoding.get('definition')
formats.append({
@ -80,13 +86,33 @@ class ImdbIE(InfoExtractor):
})
self._sort_formats(formats)
webpage = self._download_webpage(
'https://www.imdb.com/video/vi' + video_id, video_id)
video_metadata = self._parse_json(self._search_regex(
r'args\.push\(\s*({.+?})\s*\)\s*;', webpage,
'video metadata'), video_id)
video_info = video_metadata.get('VIDEO_INFO')
if video_info and isinstance(video_info, dict):
info = try_get(
video_info, lambda x: x[list(video_info.keys())[0]][0], dict)
else:
info = {}
title = self._html_search_meta(
['og:title', 'twitter:title'], webpage) or self._html_search_regex(
r'<title>(.+?)</title>', webpage, 'title',
default=None) or info['videoTitle']
return {
'id': video_id,
'title': title,
'alt_title': info.get('videoSubTitle'),
'formats': formats,
'description': video_metadata.get('description'),
'thumbnail': video_metadata.get('slate', {}).get('url'),
'duration': parse_duration(video_metadata.get('duration')),
'description': info.get('videoDescription'),
'thumbnail': url_or_none(try_get(
video_metadata, lambda x: x['videoSlate']['source'])),
'duration': parse_duration(info.get('videoRuntime')),
}
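
The `key` query parameter sent to `VIDEO_PLAYBACK_DATA` is a base64-encoded JSON object. A small sketch of how it is built and how it can be decoded again for inspection; the two function names are hypothetical, and the field values are copied from the diff.

```python
import base64
import json


def build_playback_key(video_id):
    payload = {
        'type': 'VIDEO_PLAYER',
        'subType': 'FORCE_LEGACY',
        'id': 'vi%s' % video_id,
    }
    # JSON text -> bytes -> base64 bytes -> str usable in a query string.
    return base64.b64encode(json.dumps(payload).encode()).decode()


def decode_playback_key(key):
    # Inverse of build_playback_key, useful when debugging requests.
    return json.loads(base64.b64decode(key).decode())
```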

@ -239,7 +239,7 @@ class IviCompilationIE(InfoExtractor):
self.url_result(
'http://www.ivi.ru/watch/%s/%s' % (compilation_id, serie), IviIE.ie_key())
for serie in re.findall(
r'<a href="/watch/%s/(\d+)"[^>]+data-id="\1"' % compilation_id, html)]
r'<a\b[^>]+\bhref=["\']/watch/%s/(\d+)["\']' % compilation_id, html)]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
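
The replacement pattern no longer requires `href` to be the first attribute, accepts either quote style, and drops the `data-id` back-reference. A quick sketch of its behaviour; `find_episode_ids`, the compilation ID and the HTML are hypothetical sample values.

```python
import re


def find_episode_ids(html, compilation_id):
    # As in the diff, compilation_id is interpolated without re.escape.
    return re.findall(
        r'<a\b[^>]+\bhref=["\']/watch/%s/(\d+)["\']' % compilation_id, html)


sample_html = (
    '<a class="card" data-id="7" href="/watch/dvoe_iz_lartsa/9549">1</a>'
    "<a href='/watch/dvoe_iz_lartsa/9550'>2</a>"
)
```

Neither sample link would have matched the old pattern, which expected `<a href="...">` with double quotes and a matching `data-id` attribute after it.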

@ -1,68 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
unified_strdate,
)
class JpopsukiIE(InfoExtractor):
IE_NAME = 'jpopsuki.tv'
_VALID_URL = r'https?://(?:www\.)?jpopsuki\.tv/(?:category/)?video/[^/]+/(?P<id>\S+)'
_TEST = {
'url': 'http://www.jpopsuki.tv/video/ayumi-hamasaki---evolution/00be659d23b0b40508169cdee4545771',
'md5': '88018c0c1a9b1387940e90ec9e7e198e',
'info_dict': {
'id': '00be659d23b0b40508169cdee4545771',
'ext': 'mp4',
'title': 'ayumi hamasaki - evolution',
'description': 'Release date: 2001.01.31\r\n浜崎あゆみ - evolution',
'thumbnail': 'http://www.jpopsuki.tv/cache/89722c74d2a2ebe58bcac65321c115b2.jpg',
'uploader': 'plama_chan',
'uploader_id': '404',
'upload_date': '20121101'
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_url = 'http://www.jpopsuki.tv' + self._html_search_regex(
r'<source src="(.*?)" type', webpage, 'video url')
video_title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage)
uploader = self._html_search_regex(
r'<li>from: <a href="/user/view/user/(.*?)/uid/',
webpage, 'video uploader', fatal=False)
uploader_id = self._html_search_regex(
r'<li>from: <a href="/user/view/user/\S*?/uid/(\d*)',
webpage, 'video uploader_id', fatal=False)
upload_date = unified_strdate(self._html_search_regex(
r'<li>uploaded: (.*?)</li>', webpage, 'video upload_date',
fatal=False))
view_count_str = self._html_search_regex(
r'<li>Hits: ([0-9]+?)</li>', webpage, 'video view_count',
fatal=False)
comment_count_str = self._html_search_regex(
r'<h2>([0-9]+?) comments</h2>', webpage, 'video comment_count',
fatal=False)
return {
'id': video_id,
'url': video_url,
'title': video_title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'uploader_id': uploader_id,
'upload_date': upload_date,
'view_count': int_or_none(view_count_str),
'comment_count': int_or_none(comment_count_str),
}

@ -4,7 +4,6 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
clean_html,
determine_ext,
@ -36,7 +35,7 @@ class LecturioBaseIE(InfoExtractor):
self._LOGIN_URL, None, 'Downloading login popup')
def is_logged(url_handle):
return self._LOGIN_URL not in compat_str(url_handle.geturl())
return self._LOGIN_URL not in url_handle.geturl()
# Already logged in
if is_logged(urlh):

@ -2,23 +2,24 @@
from __future__ import unicode_literals
import re
import uuid
from .common import InfoExtractor
from ..compat import compat_str
from ..compat import compat_HTTPError
from ..utils import (
unescapeHTML,
parse_duration,
get_element_by_class,
ExtractorError,
int_or_none,
qualities,
)
class LEGOIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?lego\.com/(?P<locale>[^/]+)/(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]+)'
_VALID_URL = r'https?://(?:www\.)?lego\.com/(?P<locale>[a-z]{2}-[a-z]{2})/(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]{32})'
_TESTS = [{
'url': 'http://www.lego.com/en-us/videos/themes/club/blocumentary-kawaguchi-55492d823b1b4d5e985787fa8c2973b1',
'md5': 'f34468f176cfd76488767fc162c405fa',
'info_dict': {
'id': '55492d823b1b4d5e985787fa8c2973b1',
'id': '55492d82-3b1b-4d5e-9857-87fa8c2973b1_en-US',
'ext': 'mp4',
'title': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
'description': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
@ -26,103 +27,123 @@ class LEGOIE(InfoExtractor):
}, {
# geo-restricted but the contentUrl contains a valid URL
'url': 'http://www.lego.com/nl-nl/videos/themes/nexoknights/episode-20-kingdom-of-heroes-13bdc2299ab24d9685701a915b3d71e7##sp=399',
'md5': '4c3fec48a12e40c6e5995abc3d36cc2e',
'md5': 'c7420221f7ffd03ff056f9db7f8d807c',
'info_dict': {
'id': '13bdc2299ab24d9685701a915b3d71e7',
'id': '13bdc229-9ab2-4d96-8570-1a915b3d71e7_nl-NL',
'ext': 'mp4',
'title': 'Aflevering 20 - Helden van het koninkrijk',
'title': 'Aflevering 20: Helden van het koninkrijk',
'description': 'md5:8ee499aac26d7fa8bcb0cedb7f9c3941',
'age_limit': 5,
},
}, {
# special characters in title
'url': 'http://www.lego.com/en-us/starwars/videos/lego-star-wars-force-surprise-9685ee9d12e84ff38e84b4e3d0db533d',
# with subtitle
'url': 'https://www.lego.com/nl-nl/kids/videos/classic/creative-storytelling-the-little-puppy-aa24f27c7d5242bc86102ebdc0f24cba',
'info_dict': {
'id': '9685ee9d12e84ff38e84b4e3d0db533d',
'id': 'aa24f27c-7d52-42bc-8610-2ebdc0f24cba_nl-NL',
'ext': 'mp4',
'title': 'Force Surprise LEGO® Star Wars™ Microfighters',
'description': 'md5:9c673c96ce6f6271b88563fe9dc56de3',
'title': 'De kleine puppy',
'description': 'md5:5b725471f849348ac73f2e12cfb4be06',
'age_limit': 1,
'subtitles': {
'nl': [{
'ext': 'srt',
'url': r're:^https://.+\.srt$',
}],
},
},
'params': {
'skip_download': True,
},
}]
_BITRATES = [256, 512, 1024, 1536, 2560]
_QUALITIES = {
'Lowest': (64, 180, 320),
'Low': (64, 270, 480),
'Medium': (96, 360, 640),
'High': (128, 540, 960),
'Highest': (128, 720, 1280),
}
def _real_extract(self, url):
locale, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, video_id)
title = get_element_by_class('video-header', webpage).strip()
progressive_base = 'https://lc-mediaplayerns-live-s.legocdn.com/'
streaming_base = 'http://legoprod-f.akamaihd.net/'
content_url = self._html_search_meta('contentUrl', webpage)
path = self._search_regex(
r'(?:https?:)?//[^/]+/(?:[iz]/s/)?public/(.+)_[0-9,]+\.(?:mp4|webm)',
content_url, 'video path', default=None)
if not path:
player_url = self._proto_relative_url(self._search_regex(
r'<iframe[^>]+src="((?:https?)?//(?:www\.)?lego\.com/[^/]+/mediaplayer/video/[^"]+)',
webpage, 'player url', default=None))
if not player_url:
base_url = self._proto_relative_url(self._search_regex(
r'data-baseurl="([^"]+)"', webpage, 'base url',
default='http://www.lego.com/%s/mediaplayer/video/' % locale))
player_url = base_url + video_id
player_webpage = self._download_webpage(player_url, video_id)
video_data = self._parse_json(unescapeHTML(self._search_regex(
r"video='([^']+)'", player_webpage, 'video data')), video_id)
progressive_base = self._search_regex(
r'data-video-progressive-url="([^"]+)"',
player_webpage, 'progressive base', default='https://lc-mediaplayerns-live-s.legocdn.com/')
streaming_base = self._search_regex(
r'data-video-streaming-url="([^"]+)"',
player_webpage, 'streaming base', default='http://legoprod-f.akamaihd.net/')
item_id = video_data['ItemId']
countries = [locale.split('-')[1].upper()]
self._initialize_geo_bypass({
'countries': countries,
})
net_storage_path = video_data.get('NetStoragePath') or '/'.join([item_id[:2], item_id[2:4]])
base_path = '_'.join([item_id, video_data['VideoId'], video_data['Locale'], compat_str(video_data['VideoVersion'])])
path = '/'.join([net_storage_path, base_path])
streaming_path = ','.join(map(lambda bitrate: compat_str(bitrate), self._BITRATES))
try:
item = self._download_json(
# https://contentfeed.services.lego.com/api/v2/item/[VIDEO_ID]?culture=[LOCALE]&contentType=Video
'https://services.slingshot.lego.com/mediaplayer/v2',
video_id, query={
'videoId': '%s_%s' % (uuid.UUID(video_id), locale),
}, headers=self.geo_verification_headers())
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 451:
self.raise_geo_restricted(countries=countries)
raise
formats = self._extract_akamai_formats(
'%si/s/public/%s_,%s,.mp4.csmil/master.m3u8' % (streaming_base, path, streaming_path), video_id)
m3u8_formats = list(filter(
lambda f: f.get('protocol') == 'm3u8_native' and f.get('vcodec') != 'none',
formats))
if len(m3u8_formats) == len(self._BITRATES):
self._sort_formats(m3u8_formats)
for bitrate, m3u8_format in zip(self._BITRATES, m3u8_formats):
progressive_base_url = '%spublic/%s_%d.' % (progressive_base, path, bitrate)
mp4_f = m3u8_format.copy()
mp4_f.update({
'url': progressive_base_url + 'mp4',
'format_id': m3u8_format['format_id'].replace('hls', 'mp4'),
'protocol': 'http',
})
web_f = {
'url': progressive_base_url + 'webm',
'format_id': m3u8_format['format_id'].replace('hls', 'webm'),
'width': m3u8_format['width'],
'height': m3u8_format['height'],
'tbr': m3u8_format.get('tbr'),
'ext': 'webm',
video = item['Video']
video_id = video['Id']
title = video['Title']
q = qualities(['Lowest', 'Low', 'Medium', 'High', 'Highest'])
formats = []
for video_source in item.get('VideoFormats', []):
video_source_url = video_source.get('Url')
if not video_source_url:
continue
video_source_format = video_source.get('Format')
if video_source_format == 'F4M':
formats.extend(self._extract_f4m_formats(
video_source_url, video_id,
f4m_id=video_source_format, fatal=False))
elif video_source_format == 'M3U8':
formats.extend(self._extract_m3u8_formats(
video_source_url, video_id, 'mp4', 'm3u8_native',
m3u8_id=video_source_format, fatal=False))
else:
video_source_quality = video_source.get('Quality')
format_id = []
for v in (video_source_format, video_source_quality):
if v:
format_id.append(v)
f = {
'format_id': '-'.join(format_id),
'quality': q(video_source_quality),
'url': video_source_url,
}
formats.extend([web_f, mp4_f])
else:
for bitrate in self._BITRATES:
for ext in ('web', 'mp4'):
formats.append({
'format_id': '%s-%s' % (ext, bitrate),
'url': '%spublic/%s_%d.%s' % (progressive_base, path, bitrate, ext),
'tbr': bitrate,
'ext': ext,
})
quality = self._QUALITIES.get(video_source_quality)
if quality:
f.update({
'abr': quality[0],
'height': quality[1],
'width': quality[2],
}),
formats.append(f)
self._sort_formats(formats)
subtitles = {}
sub_file_id = video.get('SubFileId')
if sub_file_id and sub_file_id != '00000000-0000-0000-0000-000000000000':
net_storage_path = video.get('NetstoragePath')
invariant_id = video.get('InvariantId')
video_file_id = video.get('VideoFileId')
video_version = video.get('VideoVersion')
if net_storage_path and invariant_id and video_file_id and video_version:
subtitles.setdefault(locale[:2], []).append({
'url': 'https://lc-mediaplayerns-live-s.legocdn.com/public/%s/%s_%s_%s_%s_sub.srt' % (net_storage_path, invariant_id, video_file_id, locale, video_version),
})
return {
'id': video_id,
'title': title,
'description': self._html_search_meta('description', webpage),
'thumbnail': self._html_search_meta('thumbnail', webpage),
'duration': parse_duration(self._html_search_meta('duration', webpage)),
'description': video.get('Description'),
'thumbnail': video.get('GeneratedCoverImage') or video.get('GeneratedThumbnail'),
'duration': int_or_none(video.get('Length')),
'formats': formats,
'subtitles': subtitles,
'age_limit': int_or_none(video.get('AgeFrom')),
'season': video.get('SeasonTitle'),
'season_number': int_or_none(video.get('Season')) or None,
'episode_number': int_or_none(video.get('Episode')) or None,
}

@ -8,7 +8,6 @@ from .common import InfoExtractor
from ..compat import (
compat_b64decode,
compat_HTTPError,
compat_str,
)
from ..utils import (
ExtractorError,
@ -99,7 +98,7 @@ class LinuxAcademyIE(InfoExtractor):
'sso': 'true',
})
login_state_url = compat_str(urlh.geturl())
login_state_url = urlh.geturl()
try:
login_page = self._download_webpage(
@ -129,7 +128,7 @@ class LinuxAcademyIE(InfoExtractor):
})
access_token = self._search_regex(
r'access_token=([^=&]+)', compat_str(urlh.geturl()),
r'access_token=([^=&]+)', urlh.geturl(),
'access token')
self._download_webpage(

@ -6,7 +6,6 @@ import re
from .theplatform import ThePlatformBaseIE
from ..compat import (
compat_parse_qs,
compat_str,
compat_urllib_parse_urlparse,
)
from ..utils import (
@ -114,7 +113,7 @@ class MediasetIE(ThePlatformBaseIE):
continue
urlh = ie._request_webpage(
embed_url, video_id, note='Following embed URL redirect')
embed_url = compat_str(urlh.geturl())
embed_url = urlh.geturl()
program_guid = _program_guid(_qs(embed_url))
if program_guid:
entries.append(embed_url)

@ -129,7 +129,7 @@ class MediasiteIE(InfoExtractor):
query = mobj.group('query')
webpage, urlh = self._download_webpage_handle(url, resource_id) # XXX: add UrlReferrer?
redirect_url = compat_str(urlh.geturl())
redirect_url = urlh.geturl()
# XXX: might have also extracted UrlReferrer and QueryString from the html
service_path = compat_urlparse.urljoin(redirect_url, self._html_search_regex(

@ -1,68 +1,33 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
clean_html,
dict_get,
ExtractorError,
int_or_none,
parse_duration,
try_get,
update_url_query,
)
class NaverIE(InfoExtractor):
_VALID_URL = r'https?://(?:m\.)?tv(?:cast)?\.naver\.com/v/(?P<id>\d+)'
class NaverBaseIE(InfoExtractor):
_CAPTION_EXT_RE = r'\.(?:ttml|vtt)'
_TESTS = [{
'url': 'http://tv.naver.com/v/81652',
'info_dict': {
'id': '81652',
'ext': 'mp4',
'title': '[9월 모의고사 해설강의][수학_김상희] 수학 A형 16~20번',
'description': '합격불변의 법칙 메가스터디 | 메가스터디 수학 김상희 선생님이 9월 모의고사 수학A형 16번에서 20번까지 해설강의를 공개합니다.',
'upload_date': '20130903',
},
}, {
'url': 'http://tv.naver.com/v/395837',
'md5': '638ed4c12012c458fefcddfd01f173cd',
'info_dict': {
'id': '395837',
'ext': 'mp4',
'title': '9년이 지나도 아픈 기억, 전효성의 아버지',
'description': 'md5:5bf200dcbf4b66eb1b350d1eb9c753f7',
'upload_date': '20150519',
},
'skip': 'Georestricted',
}, {
'url': 'http://tvcast.naver.com/v/81652',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
vid = self._search_regex(
r'videoId["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
'video id', fatal=None, group='value')
in_key = self._search_regex(
r'inKey["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
'key', default=None, group='value')
if not vid or not in_key:
error = self._html_search_regex(
r'(?s)<div class="(?:nation_error|nation_box|error_box)">\s*(?:<!--.*?-->)?\s*<p class="[^"]+">(?P<msg>.+?)</p>\s*</div>',
webpage, 'error', default=None)
if error:
raise ExtractorError(error, expected=True)
raise ExtractorError('couldn\'t extract vid and key')
def _extract_video_info(self, video_id, vid, key):
video_data = self._download_json(
'http://play.rmcnmv.naver.com/vod/play/v2.0/' + vid,
video_id, query={
'key': in_key,
'key': key,
})
meta = video_data['meta']
title = meta['subject']
formats = []
get_list = lambda x: try_get(video_data, lambda y: y[x + 's']['list'], list) or []
def extract_formats(streams, stream_type, query={}):
for stream in streams:
@ -73,7 +38,7 @@ class NaverIE(InfoExtractor):
encoding_option = stream.get('encodingOption', {})
bitrate = stream.get('bitrate', {})
formats.append({
'format_id': '%s_%s' % (stream.get('type') or stream_type, encoding_option.get('id') or encoding_option.get('name')),
'format_id': '%s_%s' % (stream.get('type') or stream_type, dict_get(encoding_option, ('name', 'id'))),
'url': stream_url,
'width': int_or_none(encoding_option.get('width')),
'height': int_or_none(encoding_option.get('height')),
@ -83,7 +48,7 @@ class NaverIE(InfoExtractor):
'protocol': 'm3u8_native' if stream_type == 'HLS' else None,
})
extract_formats(video_data.get('videos', {}).get('list', []), 'H264')
extract_formats(get_list('video'), 'H264')
for stream_set in video_data.get('streams', []):
query = {}
for param in stream_set.get('keys', []):
@ -101,28 +66,101 @@ class NaverIE(InfoExtractor):
'mp4', 'm3u8_native', m3u8_id=stream_type, fatal=False))
self._sort_formats(formats)
replace_ext = lambda x, y: re.sub(self._CAPTION_EXT_RE, '.' + y, x)
def get_subs(caption_url):
if re.search(self._CAPTION_EXT_RE, caption_url):
return [{
'url': replace_ext(caption_url, 'ttml'),
}, {
'url': replace_ext(caption_url, 'vtt'),
}]
else:
return [{'url': caption_url}]
automatic_captions = {}
subtitles = {}
for caption in video_data.get('captions', {}).get('list', []):
for caption in get_list('caption'):
caption_url = caption.get('source')
if not caption_url:
continue
subtitles.setdefault(caption.get('language') or caption.get('locale'), []).append({
'url': caption_url,
})
sub_dict = automatic_captions if caption.get('type') == 'auto' else subtitles
sub_dict.setdefault(dict_get(caption, ('locale', 'language')), []).extend(get_subs(caption_url))
upload_date = self._search_regex(
r'<span[^>]+class="date".*?(\d{4}\.\d{2}\.\d{2})',
webpage, 'upload date', fatal=False)
if upload_date:
upload_date = upload_date.replace('.', '')
user = meta.get('user', {})
return {
'id': video_id,
'title': title,
'formats': formats,
'subtitles': subtitles,
'description': self._og_search_description(webpage),
'thumbnail': meta.get('cover', {}).get('source') or self._og_search_thumbnail(webpage),
'automatic_captions': automatic_captions,
'thumbnail': try_get(meta, lambda x: x['cover']['source']),
'view_count': int_or_none(meta.get('count')),
'upload_date': upload_date,
'uploader_id': user.get('id'),
'uploader': user.get('name'),
'uploader_url': user.get('url'),
}
class NaverIE(NaverBaseIE):
_VALID_URL = r'https?://(?:m\.)?tv(?:cast)?\.naver\.com/(?:v|embed)/(?P<id>\d+)'
_GEO_BYPASS = False
_TESTS = [{
'url': 'http://tv.naver.com/v/81652',
'info_dict': {
'id': '81652',
'ext': 'mp4',
'title': '[9월 모의고사 해설강의][수학_김상희] 수학 A형 16~20번',
'description': '메가스터디 수학 김상희 선생님이 9월 모의고사 수학A형 16번에서 20번까지 해설강의를 공개합니다.',
'timestamp': 1378200754,
'upload_date': '20130903',
'uploader': '메가스터디, 합격불변의 법칙',
'uploader_id': 'megastudy',
},
}, {
'url': 'http://tv.naver.com/v/395837',
'md5': '8a38e35354d26a17f73f4e90094febd3',
'info_dict': {
'id': '395837',
'ext': 'mp4',
'title': '9년이 지나도 아픈 기억, 전효성의 아버지',
'description': 'md5:eb6aca9d457b922e43860a2a2b1984d3',
'timestamp': 1432030253,
'upload_date': '20150519',
'uploader': '4가지쇼 시즌2',
'uploader_id': 'wrappinguser29',
},
'skip': 'Georestricted',
}, {
'url': 'http://tvcast.naver.com/v/81652',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
content = self._download_json(
'https://tv.naver.com/api/json/v/' + video_id,
video_id, headers=self.geo_verification_headers())
player_info_json = content.get('playerInfoJson') or {}
current_clip = player_info_json.get('currentClip') or {}
vid = current_clip.get('videoId')
in_key = current_clip.get('inKey')
if not vid or not in_key:
player_auth = try_get(player_info_json, lambda x: x['playerOption']['auth'])
if player_auth == 'notCountry':
self.raise_geo_restricted(countries=['KR'])
elif player_auth == 'notLogin':
self.raise_login_required()
raise ExtractorError('couldn\'t extract vid and key')
info = self._extract_video_info(video_id, vid, in_key)
info.update({
'description': clean_html(current_clip.get('description')),
'timestamp': int_or_none(current_clip.get('firstExposureTime'), 1000),
'duration': parse_duration(current_clip.get('displayPlayTime')),
'like_count': int_or_none(current_clip.get('recommendPoint')),
'age_limit': 19 if current_clip.get('adult') else None,
})
return info
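
The caption handling added to `_extract_video_info` above can be sketched on its own. This is a minimal standalone approximation, not the extractor's code: `split_captions` is a hypothetical helper name, and a plain `or` chain stands in for `dict_get(caption, ('locale', 'language'))`, which is assumed to return the first non-empty value.

```python
import re

# Same pattern as NaverBaseIE._CAPTION_EXT_RE in the diff above.
CAPTION_EXT_RE = r'\.(?:ttml|vtt)'


def replace_ext(caption_url, ext):
    # Swap a .ttml/.vtt extension for the requested one.
    return re.sub(CAPTION_EXT_RE, '.' + ext, caption_url)


def get_subs(caption_url):
    # Offer both TTML and VTT variants when the URL carries one of them;
    # otherwise pass the URL through unchanged.
    if re.search(CAPTION_EXT_RE, caption_url):
        return [
            {'url': replace_ext(caption_url, 'ttml')},
            {'url': replace_ext(caption_url, 'vtt')},
        ]
    return [{'url': caption_url}]


def split_captions(captions):
    # Hypothetical helper: route each caption into automatic_captions or
    # subtitles depending on its 'type', keyed by locale (or language).
    automatic_captions, subtitles = {}, {}
    for caption in captions:
        caption_url = caption.get('source')
        if not caption_url:
            continue
        target = automatic_captions if caption.get('type') == 'auto' else subtitles
        lang = caption.get('locale') or caption.get('language')
        target.setdefault(lang, []).extend(get_subs(caption_url))
    return automatic_captions, subtitles
```

Entries without a `source` are skipped, so a malformed caption record does not abort extraction.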

@ -87,11 +87,25 @@ class NBCIE(AdobePassIE):
def _real_extract(self, url):
permalink, video_id = re.match(self._VALID_URL, url).groups()
permalink = 'http' + compat_urllib_parse_unquote(permalink)
response = self._download_json(
video_data = self._download_json(
'https://friendship.nbc.co/v2/graphql', video_id, query={
'query': '''{
page(name: "%s", platform: web, type: VIDEO, userId: "0") {
data {
'query': '''query bonanzaPage(
$app: NBCUBrands! = nbc
$name: String!
$oneApp: Boolean
$platform: SupportedPlatforms! = web
$type: EntityPageType! = VIDEO
$userId: String!
) {
bonanzaPage(
app: $app
name: $name
oneApp: $oneApp
platform: $platform
type: $type
userId: $userId
) {
metadata {
... on VideoPageData {
description
episodeNumber
@ -100,15 +114,20 @@ class NBCIE(AdobePassIE):
mpxAccountId
mpxGuid
rating
resourceId
seasonNumber
secondaryTitle
seriesShortTitle
}
}
}
}''' % permalink,
})
video_data = response['data']['page']['data']
}''',
'variables': json.dumps({
'name': permalink,
'oneApp': True,
'userId': '0',
}),
})['data']['bonanzaPage']['metadata']
query = {
'mbr': 'true',
'manifest': 'm3u',
@ -117,8 +136,8 @@ class NBCIE(AdobePassIE):
title = video_data['secondaryTitle']
if video_data.get('locked'):
resource = self._get_mvpd_resource(
'nbcentertainment', title, video_id,
video_data.get('rating'))
video_data.get('resourceId') or 'nbcentertainment',
title, video_id, video_data.get('rating'))
query['auth'] = self._extract_mvpd_auth(
url, video_id, 'nbcentertainment', resource)
theplatform_url = smuggle_url(update_url_query(

@ -9,6 +9,8 @@ from ..utils import (
int_or_none,
parse_iso8601,
qualities,
try_get,
urljoin,
)
@ -220,11 +222,17 @@ class NDREmbedBaseIE(InfoExtractor):
upload_date = ppjson.get('config', {}).get('publicationDate')
duration = int_or_none(config.get('duration'))
thumbnails = [{
'id': thumbnail.get('quality') or thumbnail_id,
'url': thumbnail['src'],
'preference': quality_key(thumbnail.get('quality')),
} for thumbnail_id, thumbnail in config.get('poster', {}).items() if thumbnail.get('src')]
thumbnails = []
poster = try_get(config, lambda x: x['poster'], dict) or {}
for thumbnail_id, thumbnail in poster.items():
thumbnail_url = urljoin(url, thumbnail.get('src'))
if not thumbnail_url:
continue
thumbnails.append({
'id': thumbnail.get('quality') or thumbnail_id,
'url': thumbnail_url,
'preference': quality_key(thumbnail.get('quality')),
})
return {
'id': video_id,

@ -18,7 +18,7 @@ class NovaEmbedIE(InfoExtractor):
_VALID_URL = r'https?://media\.cms\.nova\.cz/embed/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://media.cms.nova.cz/embed/8o0n0r?autoplay=1',
'md5': 'b3834f6de5401baabf31ed57456463f7',
'md5': 'ee009bafcc794541570edd44b71cbea3',
'info_dict': {
'id': '8o0n0r',
'ext': 'mp4',
@ -44,11 +44,17 @@ class NovaEmbedIE(InfoExtractor):
formats = []
for format_id, format_list in bitrates.items():
if not isinstance(format_list, list):
continue
format_list = [format_list]
for format_url in format_list:
format_url = url_or_none(format_url)
if not format_url:
continue
if format_id == 'hls':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, ext='mp4',
entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False))
continue
f = {
'url': format_url,
}
@ -91,7 +97,7 @@ class NovaIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^.]+\.)?(?P<site>tv(?:noviny)?|tn|novaplus|vymena|fanda|krasna|doma|prask)\.nova\.cz/(?:[^/]+/)+(?P<id>[^/]+?)(?:\.html|/|$)'
_TESTS = [{
'url': 'http://tn.nova.cz/clanek/tajemstvi-ukryte-v-podzemi-specialni-nemocnice-v-prazske-krci.html#player_13260',
'md5': '1dd7b9d5ea27bc361f110cd855a19bd3',
'md5': '249baab7d0104e186e78b0899c7d5f28',
'info_dict': {
'id': '1757139',
'display_id': 'tajemstvi-ukryte-v-podzemi-specialni-nemocnice-v-prazske-krci',
@ -113,7 +119,8 @@ class NovaIE(InfoExtractor):
'params': {
# rtmp download
'skip_download': True,
}
},
'skip': 'gone',
}, {
# media.cms.nova.cz embed
'url': 'https://novaplus.nova.cz/porad/ulice/epizoda/18760-2180-dil',
@ -128,6 +135,7 @@ class NovaIE(InfoExtractor):
'skip_download': True,
},
'add_ie': [NovaEmbedIE.ie_key()],
'skip': 'CHYBA 404: STRÁNKA NENALEZENA',
}, {
'url': 'http://sport.tn.nova.cz/clanek/sport/hokej/nhl/zivot-jde-dal-hodnotil-po-vyrazeni-z-playoff-jiri-sekac.html',
'only_matching': True,
@ -152,14 +160,29 @@ class NovaIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
description = clean_html(self._og_search_description(webpage, default=None))
if site == 'novaplus':
upload_date = unified_strdate(self._search_regex(
r'(\d{1,2}-\d{1,2}-\d{4})$', display_id, 'upload date', default=None))
elif site == 'fanda':
upload_date = unified_strdate(self._search_regex(
r'<span class="date_time">(\d{1,2}\.\d{1,2}\.\d{4})', webpage, 'upload date', default=None))
else:
upload_date = None
# novaplus
embed_id = self._search_regex(
r'<iframe[^>]+\bsrc=["\'](?:https?:)?//media\.cms\.nova\.cz/embed/([^/?#&]+)',
webpage, 'embed url', default=None)
if embed_id:
return self.url_result(
'https://media.cms.nova.cz/embed/%s' % embed_id,
ie=NovaEmbedIE.ie_key(), video_id=embed_id)
return {
'_type': 'url_transparent',
'url': 'https://media.cms.nova.cz/embed/%s' % embed_id,
'ie_key': NovaEmbedIE.ie_key(),
'id': embed_id,
'description': description,
'upload_date': upload_date
}
video_id = self._search_regex(
[r"(?:media|video_id)\s*:\s*'(\d+)'",
@ -233,18 +256,8 @@ class NovaIE(InfoExtractor):
self._sort_formats(formats)
title = mediafile.get('meta', {}).get('title') or self._og_search_title(webpage)
description = clean_html(self._og_search_description(webpage, default=None))
thumbnail = config.get('poster')
if site == 'novaplus':
upload_date = unified_strdate(self._search_regex(
r'(\d{1,2}-\d{1,2}-\d{4})$', display_id, 'upload date', default=None))
elif site == 'fanda':
upload_date = unified_strdate(self._search_regex(
r'<span class="date_time">(\d{1,2}\.\d{1,2}\.\d{4})', webpage, 'upload date', default=None))
else:
upload_date = None
return {
'id': video_id,
'display_id': display_id,

@ -4,6 +4,7 @@ from .common import InfoExtractor
from ..utils import (
int_or_none,
qualities,
url_or_none,
)
@ -48,6 +49,10 @@ class NprIE(InfoExtractor):
},
}],
'expected_warnings': ['Failed to download m3u8 information'],
}, {
# multimedia, no formats, stream
'url': 'https://www.npr.org/2020/02/14/805476846/laura-stevenson-tiny-desk-concert',
'only_matching': True,
}]
def _real_extract(self, url):
@ -95,6 +100,17 @@ class NprIE(InfoExtractor):
'format_id': format_id,
'quality': quality(format_id),
})
for stream_id, stream_entry in media.get('stream', {}).items():
if not isinstance(stream_entry, dict):
continue
if stream_id != 'hlsUrl':
continue
stream_url = url_or_none(stream_entry.get('$text'))
if not stream_url:
continue
formats.extend(self._extract_m3u8_formats(
stream_url, stream_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
self._sort_formats(formats)
entries.append({

@ -12,6 +12,7 @@ from ..utils import (
ExtractorError,
int_or_none,
JSON_LD_RE,
js_to_json,
NO_DEFAULT,
parse_age_limit,
parse_duration,
@ -105,6 +106,7 @@ class NRKBaseIE(InfoExtractor):
MESSAGES = {
'ProgramRightsAreNotReady': 'Du kan dessverre ikke se eller høre programmet',
'ProgramRightsHasExpired': 'Programmet har gått ut',
'NoProgramRights': 'Ikke tilgjengelig',
'ProgramIsGeoBlocked': 'NRK har ikke rettigheter til å vise dette programmet utenfor Norge',
}
message_type = data.get('messageType', '')
@ -255,6 +257,17 @@ class NRKTVIE(NRKBaseIE):
''' % _EPISODE_RE
_API_HOSTS = ('psapi-ne.nrk.no', 'psapi-we.nrk.no')
_TESTS = [{
'url': 'https://tv.nrk.no/program/MDDP12000117',
'md5': '8270824df46ec629b66aeaa5796b36fb',
'info_dict': {
'id': 'MDDP12000117AA',
'ext': 'mp4',
'title': 'Alarm Trolltunga',
'description': 'md5:46923a6e6510eefcce23d5ef2a58f2ce',
'duration': 2223,
'age_limit': 6,
},
}, {
'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
'md5': '9a167e54d04671eb6317a37b7bc8a280',
'info_dict': {
@ -266,6 +279,7 @@ class NRKTVIE(NRKBaseIE):
'series': '20 spørsmål',
'episode': '23.05.2014',
},
'skip': 'NoProgramRights',
}, {
'url': 'https://tv.nrk.no/program/mdfp15000514',
'info_dict': {
@ -370,7 +384,24 @@ class NRKTVIE(NRKBaseIE):
class NRKTVEpisodeIE(InfoExtractor):
_VALID_URL = r'https?://tv\.nrk\.no/serie/(?P<id>[^/]+/sesong/\d+/episode/\d+)'
_TEST = {
_TESTS = [{
'url': 'https://tv.nrk.no/serie/hellums-kro/sesong/1/episode/2',
'info_dict': {
'id': 'MUHH36005220BA',
'ext': 'mp4',
'title': 'Kro, krig og kjærlighet 2:6',
'description': 'md5:b32a7dc0b1ed27c8064f58b97bda4350',
'duration': 1563,
'series': 'Hellums kro',
'season_number': 1,
'episode_number': 2,
'episode': '2:6',
'age_limit': 6,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://tv.nrk.no/serie/backstage/sesong/1/episode/8',
'info_dict': {
'id': 'MSUI14000816AA',
@ -386,7 +417,8 @@ class NRKTVEpisodeIE(InfoExtractor):
'params': {
'skip_download': True,
},
}
'skip': 'ProgramRightsHasExpired',
}]
def _real_extract(self, url):
display_id = self._match_id(url)
@ -409,7 +441,7 @@ class NRKTVSerieBaseIE(InfoExtractor):
(r'INITIAL_DATA(?:_V\d)?_*\s*=\s*({.+?})\s*;',
r'({.+?})\s*,\s*"[^"]+"\s*\)\s*</script>'),
webpage, 'config', default='{}' if not fatal else NO_DEFAULT),
display_id, fatal=False)
display_id, fatal=False, transform_source=js_to_json)
if not config:
return
return try_get(
@ -479,6 +511,14 @@ class NRKTVSeriesIE(NRKTVSerieBaseIE):
_VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/serie/(?P<id>[^/]+)'
_ITEM_RE = r'(?:data-season=["\']|id=["\']season-)(?P<id>\d+)'
_TESTS = [{
'url': 'https://tv.nrk.no/serie/blank',
'info_dict': {
'id': 'blank',
'title': 'Blank',
'description': 'md5:7664b4e7e77dc6810cd3bca367c25b6e',
},
'playlist_mincount': 30,
}, {
# new layout, seasons
'url': 'https://tv.nrk.no/serie/backstage',
'info_dict': {
@ -648,7 +688,7 @@ class NRKSkoleIE(InfoExtractor):
_TESTS = [{
'url': 'https://www.nrk.no/skole/?page=search&q=&mediaId=14099',
'md5': '6bc936b01f9dd8ed45bc58b252b2d9b6',
'md5': '18c12c3d071953c3bf8d54ef6b2587b7',
'info_dict': {
'id': '6021',
'ext': 'mp4',

@ -69,10 +69,10 @@ class NYTimesBaseIE(InfoExtractor):
'width': int_or_none(video.get('width')),
'height': int_or_none(video.get('height')),
'filesize': get_file_size(video.get('file_size') or video.get('fileSize')),
'tbr': int_or_none(video.get('bitrate'), 1000),
'tbr': int_or_none(video.get('bitrate'), 1000) or None,
'ext': ext,
})
self._sort_formats(formats)
self._sort_formats(formats, ('height', 'width', 'filesize', 'tbr', 'fps', 'format_id'))
thumbnails = []
for image in video_data.get('images', []):

@ -6,12 +6,14 @@ import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
clean_html,
determine_ext,
float_or_none,
HEADRequest,
int_or_none,
orderedSet,
remove_end,
str_or_none,
strip_jsonp,
unescapeHTML,
unified_strdate,
@ -88,8 +90,11 @@ class ORFTVthekIE(InfoExtractor):
format_id = '-'.join(format_id_list)
ext = determine_ext(src)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', m3u8_id=format_id, fatal=False))
m3u8_formats = self._extract_m3u8_formats(
src, video_id, 'mp4', m3u8_id=format_id, fatal=False)
if any('/geoprotection' in f['url'] for f in m3u8_formats):
self.raise_geo_restricted()
formats.extend(m3u8_formats)
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
src, video_id, f4m_id=format_id, fatal=False))
@ -161,44 +166,48 @@ class ORFRadioIE(InfoExtractor):
show_date = mobj.group('date')
show_id = mobj.group('show')
if station == 'fm4':
show_id = '4%s' % show_id
data = self._download_json(
'http://audioapi.orf.at/%s/api/json/current/broadcast/%s/%s' % (station, show_id, show_date),
show_id
)
'http://audioapi.orf.at/%s/api/json/current/broadcast/%s/%s'
% (station, show_id, show_date), show_id)
def extract_entry_dict(info, title, subtitle):
return {
'id': info['loopStreamId'].replace('.mp3', ''),
'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (station, info['loopStreamId']),
entries = []
for info in data['streams']:
loop_stream_id = str_or_none(info.get('loopStreamId'))
if not loop_stream_id:
continue
title = str_or_none(data.get('title'))
if not title:
continue
start = int_or_none(info.get('start'), scale=1000)
end = int_or_none(info.get('end'), scale=1000)
duration = end - start if end and start else None
entries.append({
'id': loop_stream_id.replace('.mp3', ''),
'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (station, loop_stream_id),
'title': title,
'description': subtitle,
'duration': (info['end'] - info['start']) / 1000,
'timestamp': info['start'] / 1000,
'description': clean_html(data.get('subtitle')),
'duration': duration,
'timestamp': start,
'ext': 'mp3',
'series': data.get('programTitle')
}
entries = [extract_entry_dict(t, data['title'], data['subtitle']) for t in data['streams']]
'series': data.get('programTitle'),
})
return {
'_type': 'playlist',
'id': show_id,
'title': data['title'],
'description': data['subtitle'],
'entries': entries
'title': data.get('title'),
'description': clean_html(data.get('subtitle')),
'entries': entries,
}
class ORFFM4IE(ORFRadioIE):
IE_NAME = 'orf:fm4'
IE_DESC = 'radio FM4'
_VALID_URL = r'https?://(?P<station>fm4)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_VALID_URL = r'https?://(?P<station>fm4)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>4\w+)'
_TEST = {
'url': 'http://fm4.orf.at/player/20170107/CC',
'url': 'http://fm4.orf.at/player/20170107/4CC',
'md5': '2b0be47375432a7ef104453432a19212',
'info_dict': {
'id': '2017-01-07_2100_tl_54_7DaysSat18_31295',
@ -209,7 +218,8 @@ class ORFFM4IE(ORFRadioIE):
'timestamp': 1483819257,
'upload_date': '20170107',
},
'skip': 'Shows from ORF radios are only available for 7 days.'
'skip': 'Shows from ORF radios are only available for 7 days.',
'only_matching': True,
}
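
The defensive entry building that this change introduces in `ORFRadioIE` can be sketched as a standalone function. This is an illustration under assumptions: `build_entries` is a hypothetical name, and the local `int_or_none` is a simplified stand-in for the helper in `youtube_dl.utils`, not its real implementation.

```python
def int_or_none(v, scale=1):
    # Simplified stand-in (assumption): return None instead of raising
    # when the value is missing or not numeric.
    try:
        return int(v) // scale
    except (TypeError, ValueError):
        return None


def build_entries(data, station):
    # Hypothetical helper mirroring the new loop: skip streams that lack an
    # id or a title, and only compute a duration when both ends are known.
    entries = []
    title = data.get('title')
    for info in data['streams']:
        loop_stream_id = info.get('loopStreamId')
        if not loop_stream_id or not title:
            continue
        start = int_or_none(info.get('start'), scale=1000)  # ms -> s
        end = int_or_none(info.get('end'), scale=1000)
        entries.append({
            'id': loop_stream_id.replace('.mp3', ''),
            'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (station, loop_stream_id),
            'title': title,
            'duration': end - start if end and start else None,
            'timestamp': start,
        })
    return entries
```

Compared with the removed `extract_entry_dict`, a stream missing `start` or `end` now yields `None` fields rather than a `KeyError`.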

@ -1,99 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
qualities,
)
class PandaTVIE(InfoExtractor):
IE_DESC = '熊猫TV'
_VALID_URL = r'https?://(?:www\.)?panda\.tv/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://www.panda.tv/66666',
'info_dict': {
'id': '66666',
'title': 're:.+',
'uploader': '刘杀鸡',
'ext': 'flv',
'is_live': True,
},
'params': {
'skip_download': True,
},
'skip': 'Live stream is offline',
}, {
'url': 'https://www.panda.tv/66666',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
config = self._download_json(
'https://www.panda.tv/api_room_v2?roomid=%s' % video_id, video_id)
error_code = config.get('errno', 0)
if error_code != 0:
raise ExtractorError(
'%s returned error %s: %s'
% (self.IE_NAME, error_code, config['errmsg']),
expected=True)
data = config['data']
video_info = data['videoinfo']
# 2 = live, 3 = offline
if video_info.get('status') != '2':
raise ExtractorError(
'Live stream is offline', expected=True)
title = data['roominfo']['name']
uploader = data.get('hostinfo', {}).get('name')
room_key = video_info['room_key']
stream_addr = video_info.get(
'stream_addr', {'OD': '1', 'HD': '1', 'SD': '1'})
# Reverse engineered from web player swf
# (http://s6.pdim.gs/static/07153e425f581151.swf at the moment of
# writing).
plflag0, plflag1 = video_info['plflag'].split('_')
plflag0 = int(plflag0) - 1
if plflag1 == '21':
plflag0 = 10
plflag1 = '4'
live_panda = 'live_panda' if plflag0 < 1 else ''
plflag_auth = self._parse_json(video_info['plflag_list'], video_id)
sign = plflag_auth['auth']['sign']
ts = plflag_auth['auth']['time']
rid = plflag_auth['auth']['rid']
quality_key = qualities(['OD', 'HD', 'SD'])
suffix = ['_small', '_mid', '']
formats = []
for k, v in stream_addr.items():
if v != '1':
continue
quality = quality_key(k)
if quality <= 0:
continue
for pref, (ext, pl) in enumerate((('m3u8', '-hls'), ('flv', ''))):
formats.append({
'url': 'https://pl%s%s.live.panda.tv/live_panda/%s%s%s.%s?sign=%s&ts=%s&rid=%s'
% (pl, plflag1, room_key, live_panda, suffix[quality], ext, sign, ts, rid),
'format_id': '%s-%s' % (k, ext),
'quality': quality,
'source_preference': pref,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': self._live_title(title),
'uploader': uploader,
'formats': formats,
'is_live': True,
}

@ -46,7 +46,7 @@ class PlatziBaseIE(InfoExtractor):
headers={'Referer': self._LOGIN_URL})
# login succeeded
if 'platzi.com/login' not in compat_str(urlh.geturl()):
if 'platzi.com/login' not in urlh.geturl():
return
login_error = self._webpage_read_content(

@ -0,0 +1,99 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_b64decode,
compat_chr,
)
from ..utils import int_or_none
class PopcorntimesIE(InfoExtractor):
_VALID_URL = r'https?://popcorntimes\.tv/[^/]+/m/(?P<id>[^/]+)/(?P<display_id>[^/?#&]+)'
_TEST = {
'url': 'https://popcorntimes.tv/de/m/A1XCFvz/haensel-und-gretel-opera-fantasy',
'md5': '93f210991ad94ba8c3485950a2453257',
'info_dict': {
'id': 'A1XCFvz',
'display_id': 'haensel-und-gretel-opera-fantasy',
'ext': 'mp4',
'title': 'Hänsel und Gretel',
'description': 'md5:1b8146791726342e7b22ce8125cf6945',
'thumbnail': r're:^https?://.*\.jpg$',
'creator': 'John Paul',
'release_date': '19541009',
'duration': 4260,
'tbr': 5380,
'width': 720,
'height': 540,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id, display_id = mobj.group('id', 'display_id')
webpage = self._download_webpage(url, display_id)
title = self._search_regex(
r'<h1>([^<]+)', webpage, 'title',
default=None) or self._html_search_meta(
'ya:ovs:original_name', webpage, 'title', fatal=True)
loc = self._search_regex(
r'PCTMLOC\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage, 'loc',
group='value')
loc_b64 = ''
for c in loc:
c_ord = ord(c)
if ord('a') <= c_ord <= ord('z') or ord('A') <= c_ord <= ord('Z'):
upper = ord('Z') if c_ord <= ord('Z') else ord('z')
c_ord += 13
if upper < c_ord:
c_ord -= 26
loc_b64 += compat_chr(c_ord)
video_url = compat_b64decode(loc_b64).decode('utf-8')
description = self._html_search_regex(
r'(?s)<div[^>]+class=["\']pt-movie-desc[^>]+>(.+?)</div>', webpage,
'description', fatal=False)
thumbnail = self._search_regex(
r'<img[^>]+class=["\']video-preview[^>]+\bsrc=(["\'])(?P<value>(?:(?!\1).)+)\1',
webpage, 'thumbnail', default=None,
group='value') or self._og_search_thumbnail(webpage)
creator = self._html_search_meta(
'video:director', webpage, 'creator', default=None)
release_date = self._html_search_meta(
'video:release_date', webpage, default=None)
if release_date:
release_date = release_date.replace('-', '')
def int_meta(name):
return int_or_none(self._html_search_meta(
name, webpage, default=None))
return {
'id': video_id,
'display_id': display_id,
'url': video_url,
'title': title,
'description': description,
'thumbnail': thumbnail,
'creator': creator,
'release_date': release_date,
'duration': int_meta('video:duration'),
'tbr': int_meta('ya:ovs:bitrate'),
'width': int_meta('og:video:width'),
'height': int_meta('og:video:height'),
'http_headers': {
'Referer': url,
},
}
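
The loop over `loc` in `PopcorntimesIE._real_extract` above appears to be a ROT13 substitution applied to a Base64 string before decoding. A minimal sketch of the same transformation follows; it swaps in the standard library's `rot_13` codec for the hand-written loop, and `decode_loc` and `encode_loc` are hypothetical names used only for illustration.

```python
import base64
import codecs


def decode_loc(loc):
    # ROT13 only rotates ASCII letters; digits, '+', '/' and '=' pass
    # through unchanged, which matches the if-guard in the original loop.
    loc_b64 = codecs.decode(loc, 'rot_13')
    return base64.b64decode(loc_b64).decode('utf-8')


def encode_loc(url):
    # Inverse operation, useful for building test fixtures.
    loc_b64 = base64.b64encode(url.encode('utf-8')).decode('ascii')
    return codecs.encode(loc_b64, 'rot_13')
```

Because ROT13 is its own inverse, round-tripping a URL through `encode_loc` and `decode_loc` should return the original string.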

@ -8,6 +8,7 @@ from ..utils import (
ExtractorError,
int_or_none,
js_to_json,
merge_dicts,
urljoin,
)
@ -27,23 +28,22 @@ class PornHdIE(InfoExtractor):
'view_count': int,
'like_count': int,
'age_limit': 18,
}
},
'skip': 'HTTP Error 404: Not Found',
}, {
# removed video
'url': 'http://www.pornhd.com/videos/1962/sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
'md5': '956b8ca569f7f4d8ec563e2c41598441',
'md5': '1b7b3a40b9d65a8e5b25f7ab9ee6d6de',
'info_dict': {
'id': '1962',
'display_id': 'sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
'ext': 'mp4',
'title': 'Sierra loves doing laundry',
'title': 'md5:98c6f8b2d9c229d0f0fde47f61a1a759',
'description': 'md5:8ff0523848ac2b8f9b065ba781ccf294',
'thumbnail': r're:^https?://.*\.jpg',
'view_count': int,
'like_count': int,
'age_limit': 18,
},
'skip': 'Not available anymore',
}]
def _real_extract(self, url):
@ -61,7 +61,13 @@ class PornHdIE(InfoExtractor):
r"(?s)sources'?\s*[:=]\s*(\{.+?\})",
webpage, 'sources', default='{}')), video_id)
info = {}
if not sources:
entries = self._parse_html5_media_entries(url, webpage, video_id)
if entries:
info = entries[0]
if not sources and not info:
message = self._html_search_regex(
r'(?s)<(div|p)[^>]+class="no-video"[^>]*>(?P<value>.+?)</\1',
webpage, 'error message', group='value')
@ -80,23 +86,29 @@ class PornHdIE(InfoExtractor):
'format_id': format_id,
'height': height,
})
self._sort_formats(formats)
if formats:
info['formats'] = formats
self._sort_formats(info['formats'])
description = self._html_search_regex(
r'<(div|p)[^>]+class="description"[^>]*>(?P<value>[^<]+)</\1',
webpage, 'description', fatal=False, group='value')
(r'(?s)<section[^>]+class=["\']video-description[^>]+>(?P<value>.+?)</section>',
r'<(div|p)[^>]+class="description"[^>]*>(?P<value>[^<]+)</\1'),
webpage, 'description', fatal=False,
group='value') or self._html_search_meta(
'description', webpage, default=None) or self._og_search_description(webpage)
view_count = int_or_none(self._html_search_regex(
r'(\d+) views\s*<', webpage, 'view count', fatal=False))
thumbnail = self._search_regex(
r"poster'?\s*:\s*([\"'])(?P<url>(?:(?!\1).)+)\1", webpage,
'thumbnail', fatal=False, group='url')
'thumbnail', default=None, group='url')
like_count = int_or_none(self._search_regex(
(r'(\d+)\s*</11[^>]+>(?:&nbsp;|\s)*\blikes',
(r'(\d+)</span>\s*likes',
r'(\d+)\s*</11[^>]+>(?:&nbsp;|\s)*\blikes',
r'class=["\']save-count["\'][^>]*>\s*(\d+)'),
webpage, 'like count', fatal=False))
return {
return merge_dicts(info, {
'id': video_id,
'display_id': display_id,
'title': title,
@ -106,4 +118,4 @@ class PornHdIE(InfoExtractor):
'like_count': like_count,
'formats': formats,
'age_limit': 18,
}
})

@ -43,8 +43,15 @@ class RedTubeIE(InfoExtractor):
webpage = self._download_webpage(
'http://www.redtube.com/%s' % video_id, video_id)
if any(s in webpage for s in ['video-deleted-info', '>This video has been removed']):
raise ExtractorError('Video %s has been removed' % video_id, expected=True)
ERRORS = (
(('video-deleted-info', '>This video has been removed'), 'has been removed'),
(('private_video_text', '>This video is private', '>Send a friend request to its owner to be able to view it'), 'is private'),
)
for patterns, message in ERRORS:
if any(p in webpage for p in patterns):
raise ExtractorError(
'Video %s %s' % (video_id, message), expected=True)
info = self._search_json_ld(webpage, video_id, default={})
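
The table-driven error check introduced in this hunk can be exercised outside the extractor. The sketch below is an approximation: `find_error` is a hypothetical helper that returns the message instead of raising `ExtractorError`, and the pattern list is shortened.

```python
# Each row pairs the page markers to look for with the message suffix.
ERRORS = (
    (('video-deleted-info', '>This video has been removed'), 'has been removed'),
    (('private_video_text', '>This video is private'), 'is private'),
)


def find_error(webpage, video_id):
    # Return the first matching error message, or None when the page
    # contains none of the known markers.
    for patterns, message in ERRORS:
        if any(p in webpage for p in patterns):
            return 'Video %s %s' % (video_id, message)
    return None
```

Adding a new failure mode then only requires another row in `ERRORS`, not another `if` branch.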

@ -8,7 +8,6 @@ from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_str,
compat_urlparse,
)
from ..utils import (
@ -39,13 +38,13 @@ class SafariBaseIE(InfoExtractor):
'Downloading login page')
def is_logged(urlh):
return 'learning.oreilly.com/home/' in compat_str(urlh.geturl())
return 'learning.oreilly.com/home/' in urlh.geturl()
if is_logged(urlh):
self.LOGGED_IN = True
return
redirect_url = compat_str(urlh.geturl())
redirect_url = urlh.geturl()
parsed_url = compat_urlparse.urlparse(redirect_url)
qs = compat_parse_qs(parsed_url.query)
next_uri = compat_urlparse.urljoin(
@ -165,7 +164,8 @@ class SafariIE(SafariBaseIE):
kaltura_session = self._download_json(
'%s/player/kaltura_session/?reference_id=%s' % (self._API_BASE, reference_id),
video_id, 'Downloading kaltura session JSON',
'Unable to download kaltura session JSON', fatal=False)
'Unable to download kaltura session JSON', fatal=False,
headers={'Accept': 'application/json'})
if kaltura_session:
session = kaltura_session.get('session')
if session:

@ -106,7 +106,7 @@ class ScrippsNetworksWatchIE(AWSIE):
class ScrippsNetworksIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?P<site>cookingchanneltv|(?:diy|food)network|hgtv|travelchannel)\.com/videos/[0-9a-z-]+-(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?(?P<site>cookingchanneltv|discovery|(?:diy|food)network|hgtv|travelchannel)\.com/videos/[0-9a-z-]+-(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.cookingchanneltv.com/videos/the-best-of-the-best-0260338',
'info_dict': {
@ -131,9 +131,13 @@ class ScrippsNetworksIE(InfoExtractor):
}, {
'url': 'https://www.travelchannel.com/videos/two-climates-one-bag-5302184',
'only_matching': True,
}, {
'url': 'https://www.discovery.com/videos/guardians-of-the-glades-cooking-with-tom-cobb-5578368',
'only_matching': True,
}]
_ACCOUNT_MAP = {
'cookingchanneltv': 2433005105,
'discovery': 2706091867,
'diynetwork': 2433004575,
'foodnetwork': 2433005105,
'hgtv': 2433004575,

@ -96,7 +96,7 @@ class SoundcloudIE(InfoExtractor):
'repost_count': int,
}
},
# not streamable song
# not streamable song, preview
{
'url': 'https://soundcloud.com/the-concept-band/goldrushed-mastered?in=the-concept-band/sets/the-royal-concept-ep',
'info_dict': {
@ -119,7 +119,6 @@ class SoundcloudIE(InfoExtractor):
# rtmp
'skip_download': True,
},
'skip': 'Preview',
},
# private link
{
@ -239,7 +238,7 @@ class SoundcloudIE(InfoExtractor):
'ext': 'mp3',
'title': 'Mezzo Valzer',
'description': 'md5:4138d582f81866a530317bae316e8b61',
'uploader': 'Giovanni Sarani',
'uploader': 'Micronie',
'uploader_id': '3352531',
'timestamp': 1551394171,
'upload_date': '20190228',
@ -346,9 +345,9 @@ class SoundcloudIE(InfoExtractor):
})
def invalid_url(url):
return not url or url in format_urls or re.search(r'/(?:preview|playlist)/0/30/', url)
return not url or url in format_urls
def add_format(f, protocol):
def add_format(f, protocol, is_preview=False):
mobj = re.search(r'\.(?P<abr>\d+)\.(?P<ext>[0-9a-z]{3,4})(?=[/?])', stream_url)
if mobj:
for k, v in mobj.groupdict().items():
@ -361,12 +360,16 @@ class SoundcloudIE(InfoExtractor):
v = f.get(k)
if v:
format_id_list.append(v)
preview = is_preview or re.search(r'/(?:preview|playlist)/0/30/', f['url'])
if preview:
format_id_list.append('preview')
abr = f.get('abr')
if abr:
f['abr'] = int(abr)
f.update({
'format_id': '_'.join(format_id_list),
'protocol': 'm3u8_native' if protocol == 'hls' else 'http',
'preference': -10 if preview else None,
})
formats.append(f)
@ -377,7 +380,7 @@ class SoundcloudIE(InfoExtractor):
if not isinstance(t, dict):
continue
format_url = url_or_none(t.get('url'))
if not format_url or t.get('snipped') or '/preview/' in format_url:
if not format_url:
continue
stream = self._download_json(
format_url, track_id, query=query, fatal=False)
@ -400,7 +403,8 @@ class SoundcloudIE(InfoExtractor):
add_format({
'url': stream_url,
'ext': ext,
}, 'http' if protocol == 'progressive' else protocol)
}, 'http' if protocol == 'progressive' else protocol,
t.get('snipped') or '/preview/' in format_url)
if not formats:
# Old API, does not work for some tracks (e.g.
@ -520,7 +524,17 @@ class SoundcloudIE(InfoExtractor):
class SoundcloudPlaylistBaseIE(SoundcloudIE):
def _extract_track_entries(self, tracks, token=None):
def _extract_set(self, playlist, token=None):
playlist_id = compat_str(playlist['id'])
tracks = playlist.get('tracks') or []
if not all([t.get('permalink_url') for t in tracks]) and token:
tracks = self._download_json(
self._API_V2_BASE + 'tracks', playlist_id,
'Downloading tracks', query={
'ids': ','.join([compat_str(t['id']) for t in tracks]),
'playlistId': playlist_id,
'playlistSecretToken': token,
})
entries = []
for track in tracks:
track_id = str_or_none(track.get('id'))
@ -533,7 +547,10 @@ class SoundcloudPlaylistBaseIE(SoundcloudIE):
url += '?secret_token=' + token
entries.append(self.url_result(
url, SoundcloudIE.ie_key(), track_id))
return entries
return self.playlist_result(
entries, playlist_id,
playlist.get('title'),
playlist.get('description'))
class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
@ -544,6 +561,7 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
'info_dict': {
'id': '2284613',
'title': 'The Royal Concept EP',
'description': 'md5:71d07087c7a449e8941a70a29e34671e',
},
'playlist_mincount': 5,
}, {
@ -566,13 +584,10 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
msgs = (compat_str(err['error_message']) for err in info['errors'])
raise ExtractorError('unable to download video webpage: %s' % ','.join(msgs))
entries = self._extract_track_entries(info['tracks'], token)
return self.playlist_result(
entries, str_or_none(info.get('id')), info.get('title'))
return self._extract_set(info, token)
class SoundcloudPagedPlaylistBaseIE(SoundcloudPlaylistBaseIE):
class SoundcloudPagedPlaylistBaseIE(SoundcloudIE):
def _extract_playlist(self, base_url, playlist_id, playlist_title):
COMMON_QUERY = {
'limit': 2000000000,
@ -770,10 +785,7 @@ class SoundcloudPlaylistIE(SoundcloudPlaylistBaseIE):
self._API_V2_BASE + 'playlists/' + playlist_id,
playlist_id, 'Downloading playlist', query=query)
entries = self._extract_track_entries(data['tracks'], token)
return self.playlist_result(
entries, playlist_id, data.get('title'), data.get('description'))
return self._extract_set(data, token)
class SoundcloudSearchIE(SearchInfoExtractor, SoundcloudIE):
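The SoundCloud changes above stop discarding preview streams and instead tag them and rank them last. A simplified sketch of that tagging step, assuming only the protocol goes into the format id (the real code also appends `ext` and `abr`); `tag_format` and the sample URLs are hypothetical:

```python
import re

# Sketch only: mirrors the preview handling in the diff, outside youtube-dl.
PREVIEW_RE = re.compile(r'/(?:preview|playlist)/0/30/')


def tag_format(f, protocol, is_preview=False):
    """Return a copy of format dict `f` with format_id, protocol and preference set."""
    format_id_list = [protocol]
    # A stream counts as a preview if the caller says so or the URL looks like one.
    preview = is_preview or bool(PREVIEW_RE.search(f['url']))
    if preview:
        format_id_list.append('preview')
    tagged = dict(f)
    tagged.update({
        'format_id': '_'.join(format_id_list),
        'protocol': 'm3u8_native' if protocol == 'hls' else 'http',
        # A negative preference pushes previews below every full-length format.
        'preference': -10 if preview else None,
    })
    return tagged
```

With this shape a track that only offers a 30-second snippet still yields a downloadable format, while full streams keep winning whenever they exist.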

@ -13,36 +13,18 @@ from ..utils import (
class SportDeutschlandIE(InfoExtractor):
_VALID_URL = r'https?://sportdeutschland\.tv/(?P<sport>[^/?#]+)/(?P<id>[^?#/]+)(?:$|[?#])'
_TESTS = [{
'url': 'http://sportdeutschland.tv/badminton/live-li-ning-badminton-weltmeisterschaft-2014-kopenhagen',
'url': 'https://sportdeutschland.tv/badminton/re-live-deutsche-meisterschaften-2020-halbfinals?playlistId=0',
'info_dict': {
'id': 'live-li-ning-badminton-weltmeisterschaft-2014-kopenhagen',
'id': 're-live-deutsche-meisterschaften-2020-halbfinals',
'ext': 'mp4',
'title': 're:Li-Ning Badminton Weltmeisterschaft 2014 Kopenhagen',
'categories': ['Badminton'],
'title': 're:Re-live: Deutsche Meisterschaften 2020.*Halbfinals',
'categories': ['Badminton-Deutschland'],
'view_count': int,
'thumbnail': r're:^https?://.*\.jpg$',
'description': r're:Die Badminton-WM 2014 aus Kopenhagen bei Sportdeutschland\.TV',
'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
'timestamp': int,
'upload_date': 're:^201408[23][0-9]$',
'upload_date': '20200201',
'description': 're:.*', # meaningless description for THIS video
},
'params': {
'skip_download': 'Live stream',
},
}, {
'url': 'http://sportdeutschland.tv/li-ning-badminton-wm-2014/lee-li-ning-badminton-weltmeisterschaft-2014-kopenhagen-herren-einzel-wei-vs',
'info_dict': {
'id': 'lee-li-ning-badminton-weltmeisterschaft-2014-kopenhagen-herren-einzel-wei-vs',
'ext': 'mp4',
'upload_date': '20140825',
'description': 'md5:60a20536b57cee7d9a4ec005e8687504',
'timestamp': 1408976060,
'duration': 2732,
'title': 'Li-Ning Badminton Weltmeisterschaft 2014 Kopenhagen: Herren Einzel, Wei Lee vs. Keun Lee',
'thumbnail': r're:^https?://.*\.jpg$',
'view_count': int,
'categories': ['Li-Ning Badminton WM 2014'],
}
}]
def _real_extract(self, url):
@ -50,7 +32,7 @@ class SportDeutschlandIE(InfoExtractor):
video_id = mobj.group('id')
sport_id = mobj.group('sport')
api_url = 'http://proxy.vidibusdynamic.net/sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % (
api_url = 'https://proxy.vidibusdynamic.net/ssl/backend.sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % (
sport_id, video_id)
req = sanitized_Request(api_url, headers={
'Accept': 'application/vnd.vidibus.v2.html+json',

@ -1,14 +1,14 @@
# coding: utf-8
from __future__ import unicode_literals
from .ard import ARDMediathekIE
from .ard import ARDMediathekBaseIE
from ..utils import (
ExtractorError,
get_element_by_attribute,
)
class SRMediathekIE(ARDMediathekIE):
class SRMediathekIE(ARDMediathekBaseIE):
IE_NAME = 'sr:mediathek'
IE_DESC = 'Saarländischer Rundfunk'
_VALID_URL = r'https?://sr-mediathek(?:\.sr-online)?\.de/index\.php\?.*?&id=(?P<id>[0-9]+)'

@ -5,44 +5,28 @@ from ..utils import int_or_none
class StretchInternetIE(InfoExtractor):
_VALID_URL = r'https?://portal\.stretchinternet\.com/[^/]+/portal\.htm\?.*?\beventId=(?P<id>\d+)'
_VALID_URL = r'https?://portal\.stretchinternet\.com/[^/]+/(?:portal|full)\.htm\?.*?\beventId=(?P<id>\d+)'
_TEST = {
'url': 'https://portal.stretchinternet.com/umary/portal.htm?eventId=313900&streamType=video',
'url': 'https://portal.stretchinternet.com/umary/portal.htm?eventId=573272&streamType=video',
'info_dict': {
'id': '313900',
'id': '573272',
'ext': 'mp4',
'title': 'Augustana (S.D.) Baseball vs University of Mary',
'description': 'md5:7578478614aae3bdd4a90f578f787438',
'timestamp': 1490468400,
'upload_date': '20170325',
'title': 'University of Mary Wrestling vs. Upper Iowa',
'timestamp': 1575668361,
'upload_date': '20191206',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
stream = self._download_json(
'https://neo-client.stretchinternet.com/streamservice/v1/media/stream/v%s'
% video_id, video_id)
video_url = 'https://%s' % stream['source']
event = self._download_json(
'https://neo-client.stretchinternet.com/portal-ws/getEvent.json',
video_id, query={
'clientID': 99997,
'eventID': video_id,
'token': 'asdf',
})['event']
title = event.get('title') or event['mobileTitle']
description = event.get('customText')
timestamp = int_or_none(event.get('longtime'))
'https://api.stretchinternet.com/trinity/event/tcg/' + video_id,
video_id)[0]
return {
'id': video_id,
'title': title,
'description': description,
'timestamp': timestamp,
'url': video_url,
'title': event['title'],
'timestamp': int_or_none(event.get('dateCreated'), 1000),
'url': 'https://' + event['media'][0]['url'],
}
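The new StretchInternet code passes `1000` as the second argument to `int_or_none`, which suggests that `dateCreated` is reported in milliseconds; that is an inference from the diff and the test timestamp, not something the API documents here. A minimal stand-in for the scaled conversion, where `int_or_none_scaled` is a hypothetical name and not the youtube-dl helper itself:

```python
def int_or_none_scaled(value, scale=1):
    """Convert `value` to int and divide by `scale`; return None on bad input."""
    if value is None or value == '':
        return None
    try:
        # Floor division keeps the result an integer number of seconds.
        return int(value) // scale
    except (ValueError, TypeError):
        return None
```

Under that assumption, a millisecond value of `1575668361000` maps to the `1575668361` timestamp expected by the updated test.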

@ -4,19 +4,14 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlparse,
)
from ..compat import compat_str
from ..utils import (
determine_ext,
dict_get,
int_or_none,
orderedSet,
str_or_none,
strip_or_none,
try_get,
urljoin,
compat_str,
)
@ -237,23 +232,23 @@ class SVTPlayIE(SVTPlayBaseIE):
class SVTSeriesIE(SVTPlayBaseIE):
_VALID_URL = r'https?://(?:www\.)?svtplay\.se/(?P<id>[^/?&#]+)'
_VALID_URL = r'https?://(?:www\.)?svtplay\.se/(?P<id>[^/?&#]+)(?:.+?\btab=(?P<season_slug>[^&#]+))?'
_TESTS = [{
'url': 'https://www.svtplay.se/rederiet',
'info_dict': {
'id': 'rederiet',
'id': '14445680',
'title': 'Rederiet',
'description': 'md5:505d491a58f4fcf6eb418ecab947e69e',
'description': 'md5:d9fdfff17f5d8f73468176ecd2836039',
},
'playlist_mincount': 318,
}, {
'url': 'https://www.svtplay.se/rederiet?tab=sasong2',
'url': 'https://www.svtplay.se/rederiet?tab=season-2-14445680',
'info_dict': {
'id': 'rederiet-sasong2',
'id': 'season-2-14445680',
'title': 'Rederiet - Säsong 2',
'description': 'md5:505d491a58f4fcf6eb418ecab947e69e',
'description': 'md5:d9fdfff17f5d8f73468176ecd2836039',
},
'playlist_count': 12,
'playlist_mincount': 12,
}]
@classmethod
@ -261,83 +256,87 @@ class SVTSeriesIE(SVTPlayBaseIE):
return False if SVTIE.suitable(url) or SVTPlayIE.suitable(url) else super(SVTSeriesIE, cls).suitable(url)
def _real_extract(self, url):
series_id = self._match_id(url)
series_slug, season_id = re.match(self._VALID_URL, url).groups()
qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
season_slug = qs.get('tab', [None])[0]
if season_slug:
series_id += '-%s' % season_slug
webpage = self._download_webpage(
url, series_id, 'Downloading series page')
root = self._parse_json(
self._search_regex(
self._SVTPLAY_RE, webpage, 'content', group='json'),
series_id)
series = self._download_json(
'https://api.svt.se/contento/graphql', series_slug,
'Downloading series page', query={
'query': '''{
listablesBySlug(slugs: ["%s"]) {
associatedContent(include: [productionPeriod, season]) {
items {
item {
... on Episode {
videoSvtId
}
}
}
id
name
}
id
longDescription
name
shortDescription
}
}''' % series_slug,
})['data']['listablesBySlug'][0]
season_name = None
entries = []
for season in root['relatedVideoContent']['relatedVideosAccordion']:
for season in series['associatedContent']:
if not isinstance(season, dict):
continue
if season_slug:
if season.get('slug') != season_slug:
if season_id:
if season.get('id') != season_id:
continue
season_name = season.get('name')
videos = season.get('videos')
if not isinstance(videos, list):
items = season.get('items')
if not isinstance(items, list):
continue
for video in videos:
content_url = video.get('contentUrl')
if not content_url or not isinstance(content_url, compat_str):
for item in items:
video = item.get('item') or {}
content_id = video.get('videoSvtId')
if not content_id or not isinstance(content_id, compat_str):
continue
entries.append(
self.url_result(
urljoin(url, content_url),
ie=SVTPlayIE.ie_key(),
video_title=video.get('title')
))
entries.append(self.url_result(
'svt:' + content_id, SVTPlayIE.ie_key(), content_id))
metadata = root.get('metaData')
if not isinstance(metadata, dict):
metadata = {}
title = metadata.get('title')
season_name = season_name or season_slug
title = series.get('name')
season_name = season_name or season_id
if title and season_name:
title = '%s - %s' % (title, season_name)
elif season_slug:
title = season_slug
elif season_id:
title = season_id
return self.playlist_result(
entries, series_id, title, metadata.get('description'))
entries, season_id or series.get('id'), title,
dict_get(series, ('longDescription', 'shortDescription')))
class SVTPageIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?svt\.se/(?:[^/]+/)*(?P<id>[^/?&#]+)'
_VALID_URL = r'https?://(?:www\.)?svt\.se/(?P<path>(?:[^/]+/)*(?P<id>[^/?&#]+))'
_TESTS = [{
'url': 'https://www.svt.se/sport/oseedat/guide-sommartraningen-du-kan-gora-var-och-nar-du-vill',
'url': 'https://www.svt.se/sport/ishockey/bakom-masken-lehners-kamp-mot-mental-ohalsa',
'info_dict': {
'id': 'guide-sommartraningen-du-kan-gora-var-och-nar-du-vill',
'title': 'GUIDE: Sommarträning du kan göra var och när du vill',
'id': '25298267',
'title': 'Bakom masken Lehners kamp mot mental ohälsa',
},
'playlist_count': 7,
'playlist_count': 4,
}, {
'url': 'https://www.svt.se/nyheter/inrikes/ebba-busch-thor-kd-har-delvis-ratt-om-no-go-zoner',
'url': 'https://www.svt.se/nyheter/utrikes/svenska-andrea-ar-en-mil-fran-branderna-i-kalifornien',
'info_dict': {
'id': 'ebba-busch-thor-kd-har-delvis-ratt-om-no-go-zoner',
'title': 'Ebba Busch Thor har bara delvis rätt om ”no-go-zoner”',
'id': '24243746',
'title': 'Svenska Andrea redo att fly sitt hem i Kalifornien',
},
'playlist_count': 1,
'playlist_count': 2,
}, {
# only programTitle
'url': 'http://www.svt.se/sport/ishockey/jagr-tacklar-giroux-under-intervjun',
'info_dict': {
'id': '2900353',
'id': '8439V2K',
'ext': 'mp4',
'title': 'Stjärnorna skojar till det - under SVT-intervjun',
'duration': 27,
@ -356,16 +355,26 @@ class SVTPageIE(InfoExtractor):
return False if SVTIE.suitable(url) else super(SVTPageIE, cls).suitable(url)
def _real_extract(self, url):
playlist_id = self._match_id(url)
path, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, playlist_id)
article = self._download_json(
'https://api.svt.se/nss-api/page/' + path, display_id,
query={'q': 'articles'})['articles']['content'][0]
entries = [
self.url_result(
'svt:%s' % video_id, ie=SVTPlayIE.ie_key(), video_id=video_id)
for video_id in orderedSet(re.findall(
r'data-video-id=["\'](\d+)', webpage))]
entries = []
title = strip_or_none(self._og_search_title(webpage, default=None))
def _process_content(content):
if content.get('_type') in ('VIDEOCLIP', 'VIDEOEPISODE'):
video_id = compat_str(content['image']['svtId'])
entries.append(self.url_result(
'svt:' + video_id, SVTPlayIE.ie_key(), video_id))
return self.playlist_result(entries, playlist_id, title)
for media in article.get('media', []):
_process_content(media)
for obj in article.get('structuredBody', []):
_process_content(obj.get('content') or {})
return self.playlist_result(
entries, str_or_none(article.get('id')),
strip_or_none(article.get('title')))
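`SVTSeriesIE` above now takes the season from an optional `season_slug` group in `_VALID_URL` instead of parsing the query string separately. A small sketch of how that pattern splits a URL, using the regex copied from the diff; `split_series_url` is a hypothetical wrapper:

```python
import re

# Pattern copied from the new SVTSeriesIE._VALID_URL in the diff.
VALID_URL = r'https?://(?:www\.)?svtplay\.se/(?P<id>[^/?&#]+)(?:.+?\btab=(?P<season_slug>[^&#]+))?'


def split_series_url(url):
    """Return (series_slug, season_id); season_id is None without a tab parameter."""
    # re.match returns None for non-SVT Play URLs, so callers should check suitability first.
    return re.match(VALID_URL, url).groups()
```

Because the trailing group is optional, the same two-value unpacking works for both series and season URLs.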

@ -4,7 +4,6 @@ import re
from .common import InfoExtractor
from .wistia import WistiaIE
from ..compat import compat_str
from ..utils import (
clean_html,
ExtractorError,
@ -58,7 +57,7 @@ class TeachableBaseIE(InfoExtractor):
self._logged_in = True
return
login_url = compat_str(urlh.geturl())
login_url = urlh.geturl()
login_form = self._hidden_inputs(login_page)
@ -160,8 +159,8 @@ class TeachableIE(TeachableBaseIE):
webpage = self._download_webpage(url, video_id)
wistia_url = WistiaIE._extract_url(webpage)
if not wistia_url:
wistia_urls = WistiaIE._extract_urls(webpage)
if not wistia_urls:
if any(re.search(p, webpage) for p in (
r'class=["\']lecture-contents-locked',
r'>\s*Lecture contents locked',
@ -174,12 +173,14 @@ class TeachableIE(TeachableBaseIE):
title = self._og_search_title(webpage, default=None)
return {
entries = [{
'_type': 'url_transparent',
'url': wistia_url,
'ie_key': WistiaIE.ie_key(),
'title': title,
}
} for wistia_url in wistia_urls]
return self.playlist_result(entries, video_id, title)
class TeachableCourseIE(TeachableBaseIE):

@ -11,6 +11,7 @@ from ..utils import (
determine_ext,
int_or_none,
str_or_none,
try_get,
urljoin,
)
@ -24,7 +25,7 @@ class TelecincoIE(InfoExtractor):
'info_dict': {
'id': '1876350223',
'title': 'Bacalao con kokotxas al pil-pil',
'description': 'md5:1382dacd32dd4592d478cbdca458e5bb',
'description': 'md5:716caf5601e25c3c5ab6605b1ae71529',
},
'playlist': [{
'md5': 'adb28c37238b675dad0f042292f209a7',
@ -55,6 +56,26 @@ class TelecincoIE(InfoExtractor):
'description': 'md5:2771356ff7bfad9179c5f5cd954f1477',
'duration': 50,
},
}, {
# video in opening's content
'url': 'https://www.telecinco.es/vivalavida/fiorella-sobrina-edmundo-arrocet-entrevista_18_2907195140.html',
'info_dict': {
'id': '2907195140',
'title': 'La surrealista entrevista a la sobrina de Edmundo Arrocet: "No puedes venir aquí y tomarnos por tontos"',
'description': 'md5:73f340a7320143d37ab895375b2bf13a',
},
'playlist': [{
'md5': 'adb28c37238b675dad0f042292f209a7',
'info_dict': {
'id': 'TpI2EttSDAReWpJ1o0NVh2',
'ext': 'mp4',
'title': 'La surrealista entrevista a la sobrina de Edmundo Arrocet: "No puedes venir aquí y tomarnos por tontos"',
'duration': 1015,
},
}],
'params': {
'skip_download': True,
},
}, {
'url': 'http://www.telecinco.es/informativos/nacional/Pablo_Iglesias-Informativos_Telecinco-entrevista-Pedro_Piqueras_2_1945155182.html',
'only_matching': True,
@ -135,17 +156,28 @@ class TelecincoIE(InfoExtractor):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
article = self._parse_json(self._search_regex(
r'window\.\$REACTBASE_STATE\.article\s*=\s*({.+})',
r'window\.\$REACTBASE_STATE\.article(?:_multisite)?\s*=\s*({.+})',
webpage, 'article'), display_id)['article']
title = article.get('title')
description = clean_html(article.get('leadParagraph'))
description = clean_html(article.get('leadParagraph')) or ''
if article.get('editorialType') != 'VID':
entries = []
for p in article.get('body', []):
content = p.get('content')
if p.get('type') != 'video' or not content:
body = [article.get('opening')]
body.extend(try_get(article, lambda x: x['body'], list) or [])
for p in body:
if not isinstance(p, dict):
continue
entries.append(self._parse_content(content, url))
content = p.get('content')
if not content:
continue
type_ = p.get('type')
if type_ == 'paragraph':
content_str = str_or_none(content)
if content_str:
description += content_str
continue
if type_ == 'video' and isinstance(content, dict):
entries.append(self._parse_content(content, url))
return self.playlist_result(
entries, str_or_none(article.get('id')), title, description)
content = article['opening']['content']
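The Telecinco loop above now walks the article's `opening` block together with its `body`, appending paragraph text to the description and collecting video blocks. A simplified sketch of that traversal on plain dicts, assuming Python 3 and accepting only string paragraphs (the diff uses `str_or_none`); `collect_article` and the sample data are hypothetical:

```python
def collect_article(article):
    """Return (description, video_contents) gathered from an article dict."""
    description = article.get('leadParagraph') or ''
    videos = []
    # The opening block is treated like one more body element.
    blocks = [article.get('opening')]
    body = article.get('body')
    if isinstance(body, list):
        blocks.extend(body)
    for p in blocks:
        if not isinstance(p, dict):
            continue
        content = p.get('content')
        if not content:
            continue
        if p.get('type') == 'paragraph':
            if isinstance(content, str):
                description += content
            continue
        if p.get('type') == 'video' and isinstance(content, dict):
            videos.append(content)
    return description, videos
```

Skipping anything that is not a dict is what lets a missing `opening` (stored as `None` in the list) pass through without a special case.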

@ -2,43 +2,42 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import try_get
class ThisOldHouseIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?thisoldhouse\.com/(?:watch|how-to|tv-episode)/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?thisoldhouse\.com/(?:watch|how-to|tv-episode|(?:[^/]+/)?\d+)/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.thisoldhouse.com/how-to/how-to-build-storage-bench',
'md5': '568acf9ca25a639f0c4ff905826b662f',
'info_dict': {
'id': '2REGtUDQ',
'id': '5dcdddf673c3f956ef5db202',
'ext': 'mp4',
'title': 'How to Build a Storage Bench',
'description': 'In the workshop, Tom Silva and Kevin O\'Connor build a storage bench for an entryway.',
'timestamp': 1442548800,
'upload_date': '20150918',
}
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.thisoldhouse.com/watch/arlington-arts-crafts-arts-and-crafts-class-begins',
'only_matching': True,
}, {
'url': 'https://www.thisoldhouse.com/tv-episode/ask-toh-shelf-rough-electric',
'only_matching': True,
}, {
'url': 'https://www.thisoldhouse.com/furniture/21017078/how-to-build-a-storage-bench',
'only_matching': True,
}, {
'url': 'https://www.thisoldhouse.com/21113884/s41-e13-paradise-lost',
'only_matching': True,
}]
_ZYPE_TMPL = 'https://player.zype.com/embed/%s.html?api_key=hsOk_yMSPYNrT22e9pu8hihLXjaZf0JW5jsOWv4ZqyHJFvkJn6rtToHl09tbbsbe'
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
(r'data-mid=(["\'])(?P<id>(?:(?!\1).)+)\1',
r'id=(["\'])inline-video-player-(?P<id>(?:(?!\1).)+)\1'),
webpage, 'video id', default=None, group='id')
if not video_id:
drupal_settings = self._parse_json(self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
webpage, 'drupal settings'), display_id)
video_id = try_get(
drupal_settings, lambda x: x['jwplatform']['video_id'],
compat_str) or list(drupal_settings['comScore'])[0]
return self.url_result('jwplatform:' + video_id, 'JWPlatform', video_id)
r'<iframe[^>]+src=[\'"](?:https?:)?//thisoldhouse\.chorus\.build/videos/zype/([0-9a-f]{24})',
webpage, 'video id')
return self.url_result(self._ZYPE_TMPL % video_id, 'Zype', video_id)

@ -17,9 +17,9 @@ from ..utils import (
class ToggleIE(InfoExtractor):
IE_NAME = 'toggle'
_VALID_URL = r'https?://video\.toggle\.sg/(?:en|zh)/(?:[^/]+/){2,}(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:(?:www\.)?mewatch|video\.toggle)\.sg/(?:en|zh)/(?:[^/]+/){2,}(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://video.toggle.sg/en/series/lion-moms-tif/trailers/lion-moms-premier/343115',
'url': 'http://www.mewatch.sg/en/series/lion-moms-tif/trailers/lion-moms-premier/343115',
'info_dict': {
'id': '343115',
'ext': 'mp4',
@ -33,7 +33,7 @@ class ToggleIE(InfoExtractor):
}
}, {
'note': 'DRM-protected video',
'url': 'http://video.toggle.sg/en/movies/dug-s-special-mission/341413',
'url': 'http://www.mewatch.sg/en/movies/dug-s-special-mission/341413',
'info_dict': {
'id': '341413',
'ext': 'wvm',
@ -48,7 +48,7 @@ class ToggleIE(InfoExtractor):
}, {
# this also tests correct video id extraction
'note': 'm3u8 links are geo-restricted, but Android/mp4 is okay',
'url': 'http://video.toggle.sg/en/series/28th-sea-games-5-show/28th-sea-games-5-show-ep11/332861',
'url': 'http://www.mewatch.sg/en/series/28th-sea-games-5-show/28th-sea-games-5-show-ep11/332861',
'info_dict': {
'id': '332861',
'ext': 'mp4',
@ -65,19 +65,22 @@ class ToggleIE(InfoExtractor):
'url': 'http://video.toggle.sg/en/clips/seraph-sun-aloysius-will-suddenly-sing-some-old-songs-in-high-pitch-on-set/343331',
'only_matching': True,
}, {
'url': 'http://video.toggle.sg/zh/series/zero-calling-s2-hd/ep13/336367',
'url': 'http://www.mewatch.sg/en/clips/seraph-sun-aloysius-will-suddenly-sing-some-old-songs-in-high-pitch-on-set/343331',
'only_matching': True,
}, {
'url': 'http://video.toggle.sg/en/series/vetri-s2/webisodes/jeeva-is-an-orphan-vetri-s2-webisode-7/342302',
'url': 'http://www.mewatch.sg/zh/series/zero-calling-s2-hd/ep13/336367',
'only_matching': True,
}, {
'url': 'http://video.toggle.sg/en/movies/seven-days/321936',
'url': 'http://www.mewatch.sg/en/series/vetri-s2/webisodes/jeeva-is-an-orphan-vetri-s2-webisode-7/342302',
'only_matching': True,
}, {
'url': 'https://video.toggle.sg/en/tv-show/news/may-2017-cna-singapore-tonight/fri-19-may-2017/512456',
'url': 'http://www.mewatch.sg/en/movies/seven-days/321936',
'only_matching': True,
}, {
'url': 'http://video.toggle.sg/en/channels/eleven-plus/401585',
'url': 'https://www.mewatch.sg/en/tv-show/news/may-2017-cna-singapore-tonight/fri-19-may-2017/512456',
'only_matching': True,
}, {
'url': 'http://www.mewatch.sg/en/channels/eleven-plus/401585',
'only_matching': True,
}]

@ -1,21 +1,12 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
dict_get,
float_or_none,
int_or_none,
unified_timestamp,
update_url_query,
url_or_none,
)
class TruNewsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?trunews\.com/stream/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://www.trunews.com/stream/will-democrats-stage-a-circus-during-president-trump-s-state-of-the-union-speech',
'md5': 'a19c024c3906ff954fac9b96ce66bb08',
'info_dict': {
'id': '5c5a21e65d3c196e1c0020cc',
'display_id': 'will-democrats-stage-a-circus-during-president-trump-s-state-of-the-union-speech',
@ -28,48 +19,16 @@ class TruNewsIE(InfoExtractor):
},
'add_ie': ['Zype'],
}
_ZYPE_TEMPL = 'https://player.zype.com/embed/%s.js?api_key=X5XnahkjCwJrT_l5zUqypnaLEObotyvtUKJWWlONxDoHVjP8vqxlArLV8llxMbyt'
def _real_extract(self, url):
display_id = self._match_id(url)
video = self._download_json(
zype_id = self._download_json(
'https://api.zype.com/videos', display_id, query={
'app_key': 'PUVKp9WgGUb3-JUw6EqafLx8tFVP6VKZTWbUOR-HOm__g4fNDt1bCsm_LgYf_k9H',
'per_page': 1,
'active': 'true',
'friendly_title': display_id,
})['response'][0]
zype_id = video['_id']
thumbnails = []
thumbnails_list = video.get('thumbnails')
if isinstance(thumbnails_list, list):
for thumbnail in thumbnails_list:
if not isinstance(thumbnail, dict):
continue
thumbnail_url = url_or_none(thumbnail.get('url'))
if not thumbnail_url:
continue
thumbnails.append({
'url': thumbnail_url,
'width': int_or_none(thumbnail.get('width')),
'height': int_or_none(thumbnail.get('height')),
})
return {
'_type': 'url_transparent',
'url': update_url_query(
'https://player.zype.com/embed/%s.js' % zype_id,
{'api_key': 'X5XnahkjCwJrT_l5zUqypnaLEObotyvtUKJWWlONxDoHVjP8vqxlArLV8llxMbyt'}),
'ie_key': 'Zype',
'id': zype_id,
'display_id': display_id,
'title': video.get('title'),
'description': dict_get(video, ('description', 'ott_description', 'short_description')),
'duration': int_or_none(video.get('duration')),
'timestamp': unified_timestamp(video.get('published_at')),
'average_rating': float_or_none(video.get('rating')),
'view_count': int_or_none(video.get('request_count')),
'thumbnails': thumbnails,
}
})['response'][0]['_id']
return self.url_result(self._ZYPE_TEMPL % zype_id, 'Zype', zype_id)

@ -4,7 +4,6 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
@ -151,7 +150,7 @@ class TumblrIE(InfoExtractor):
url = 'http://%s.tumblr.com/post/%s/' % (blog, video_id)
webpage, urlh = self._download_webpage_handle(url, video_id)
redirect_url = compat_str(urlh.geturl())
redirect_url = urlh.geturl()
if 'tumblr.com/safe-mode' in redirect_url or redirect_url.startswith('/safe-mode'):
raise ExtractorError(
'This Tumblr may contain sensitive media. '

@ -106,7 +106,7 @@ class TV2DKBornholmPlayIE(InfoExtractor):
video_id = self._match_id(url)
video = self._download_json(
'http://play.tv2bornholm.dk/controls/AJAX.aspx/specifikVideo', video_id,
'https://play.tv2bornholm.dk/controls/AJAX.aspx/specifikVideo', video_id,
data=json.dumps({
'playlist_id': video_id,
'serienavn': '',

@ -3,31 +3,51 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_html,
determine_ext,
extract_attributes,
get_element_by_class,
int_or_none,
parse_duration,
parse_iso8601,
)
class TV5MondePlusIE(InfoExtractor):
IE_DESC = 'TV5MONDE+'
_VALID_URL = r'https?://(?:www\.)?tv5mondeplus\.com/toutes-les-videos/[^/]+/(?P<id>[^/?#]+)'
_TEST = {
'url': 'http://www.tv5mondeplus.com/toutes-les-videos/documentaire/tdah-mon-amour-tele-quebec-tdah-mon-amour-ep001-enfants',
'md5': '12130fc199f020673138a83466542ec6',
_VALID_URL = r'https?://(?:www\.)?(?:tv5mondeplus|revoir\.tv5monde)\.com/toutes-les-videos/[^/]+/(?P<id>[^/?#]+)'
_TESTS = [{
# movie
'url': 'https://revoir.tv5monde.com/toutes-les-videos/cinema/rendez-vous-a-atlit',
'md5': '8cbde5ea7b296cf635073e27895e227f',
'info_dict': {
'id': 'tdah-mon-amour-tele-quebec-tdah-mon-amour-ep001-enfants',
'id': '822a4756-0712-7329-1859-a13ac7fd1407',
'display_id': 'rendez-vous-a-atlit',
'ext': 'mp4',
'title': 'Tdah, mon amour - Enfants',
'description': 'md5:230e3aca23115afcf8006d1bece6df74',
'upload_date': '20170401',
'timestamp': 1491022860,
}
}
'title': 'Rendez-vous à Atlit',
'description': 'md5:2893a4c5e1dbac3eedff2d87956e4efb',
'upload_date': '20200130',
},
}, {
# series episode
'url': 'https://revoir.tv5monde.com/toutes-les-videos/series-fictions/c-est-la-vie-ennemie-juree',
'info_dict': {
'id': '0df7007c-4900-3936-c601-87a13a93a068',
'display_id': 'c-est-la-vie-ennemie-juree',
'ext': 'mp4',
'title': "C'est la vie - Ennemie jurée",
'description': 'md5:dfb5c63087b6f35fe0cc0af4fe44287e',
'upload_date': '20200130',
'series': "C'est la vie",
'episode': 'Ennemie jurée',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://revoir.tv5monde.com/toutes-les-videos/series-fictions/neuf-jours-en-hiver-neuf-jours-en-hiver',
'only_matching': True,
}, {
'url': 'https://revoir.tv5monde.com/toutes-les-videos/info-societe/le-journal-de-la-rts-edition-du-30-01-20-19h30',
'only_matching': True,
}]
_GEO_BYPASS = False
def _real_extract(self, url):
@ -37,11 +57,7 @@ class TV5MondePlusIE(InfoExtractor):
if ">Ce programme n'est malheureusement pas disponible pour votre zone géographique.<" in webpage:
self.raise_geo_restricted(countries=['FR'])
series = get_element_by_class('video-detail__title', webpage)
title = episode = get_element_by_class(
'video-detail__subtitle', webpage) or series
if series and series != title:
title = '%s - %s' % (series, title)
title = episode = self._html_search_regex(r'<h1>([^<]+)', webpage, 'title')
vpl_data = extract_attributes(self._search_regex(
r'(<[^>]+class="video_player_loader"[^>]+>)',
webpage, 'video player loader'))
@ -65,15 +81,37 @@ class TV5MondePlusIE(InfoExtractor):
})
self._sort_formats(formats)
description = self._html_search_regex(
r'(?s)<div[^>]+class=["\']episode-texte[^>]+>(.+?)</div>', webpage,
'description', fatal=False)
series = self._html_search_regex(
r'<p[^>]+class=["\']episode-emission[^>]+>([^<]+)', webpage,
'series', default=None)
if series and series != title:
title = '%s - %s' % (series, title)
upload_date = self._search_regex(
r'(?:date_publication|publish_date)["\']\s*:\s*["\'](\d{4}_\d{2}_\d{2})',
webpage, 'upload date', default=None)
if upload_date:
upload_date = upload_date.replace('_', '')
video_id = self._search_regex(
(r'data-guid=["\']([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})',
r'id_contenu["\']\s*:\s*(\d+)'), webpage, 'video id',
default=display_id)
return {
'id': display_id,
'id': video_id,
'display_id': display_id,
'title': title,
'description': clean_html(get_element_by_class('video-detail__description', webpage)),
'description': description,
'thumbnail': vpl_data.get('data-image'),
'duration': int_or_none(vpl_data.get('data-duration')) or parse_duration(self._html_search_meta('duration', webpage)),
'timestamp': parse_iso8601(self._html_search_meta('uploadDate', webpage)),
'upload_date': upload_date,
'formats': formats,
'episode': episode,
'series': series,
'episode': episode,
}
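The TV5MONDE+ extractor above no longer reads a timestamp from page metadata; it pulls an underscore-separated date out of inline page data and strips the underscores. A minimal sketch of that step with the regex copied from the diff; `extract_upload_date` is a hypothetical helper and any page text fed to it here is made up:

```python
import re

# Pattern copied from the diff; it accepts either field name.
UPLOAD_DATE_RE = r'(?:date_publication|publish_date)["\']\s*:\s*["\'](\d{4}_\d{2}_\d{2})'


def extract_upload_date(webpage):
    """Return the date as YYYYMMDD, or None when the page has no such field."""
    mobj = re.search(UPLOAD_DATE_RE, webpage)
    if not mobj:
        return None
    # '2020_01_30' becomes '20200130', the format youtube-dl uses for upload_date.
    return mobj.group(1).replace('_', '')
```

Returning `None` instead of raising matches the `default=None` used in the diff, so a page without the field still extracts.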

@ -9,8 +9,8 @@ from ..utils import (
class TVAIE(InfoExtractor):
_VALID_URL = r'https?://videos\.tva\.ca/details/_(?P<id>\d+)'
_TEST = {
_VALID_URL = r'https?://videos?\.tva\.ca/details/_(?P<id>\d+)'
_TESTS = [{
'url': 'https://videos.tva.ca/details/_5596811470001',
'info_dict': {
'id': '5596811470001',
@ -24,7 +24,10 @@ class TVAIE(InfoExtractor):
# m3u8 download
'skip_download': True,
}
}
}, {
'url': 'https://video.tva.ca/details/_5596811470001',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5481942443001/default_default/index.html?videoId=%s'
def _real_extract(self, url):


@ -17,8 +17,8 @@ class TwentyFourVideoIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?P<host>
(?:(?:www|porno)\.)?24video\.
(?:net|me|xxx|sexy?|tube|adult|site)
(?:(?:www|porno?)\.)?24video\.
(?:net|me|xxx|sexy?|tube|adult|site|vip)
)/
(?:
video/(?:(?:view|xml)/)?|
@ -59,6 +59,12 @@ class TwentyFourVideoIE(InfoExtractor):
}, {
'url': 'https://porno.24video.net/video/2640421-vsya-takaya-gibkaya-i-v-masle',
'only_matching': True,
}, {
'url': 'https://www.24video.vip/video/view/1044982',
'only_matching': True,
}, {
'url': 'https://porn.24video.net/video/2640421-vsya-takay',
'only_matching': True,
}]
def _real_extract(self, url):


@ -575,8 +575,8 @@ class TwitchStreamIE(TwitchBaseIE):
channel_id = self._match_id(url)
stream = self._call_api(
'kraken/streams/%s?stream_type=all' % channel_id, channel_id,
'Downloading stream JSON').get('stream')
'kraken/streams/%s?stream_type=all' % channel_id.lower(),
channel_id, 'Downloading stream JSON').get('stream')
if not stream:
raise ExtractorError('%s is offline' % channel_id, expected=True)


@ -251,10 +251,10 @@ class TwitterIE(TwitterBaseIE):
'info_dict': {
'id': '700207533655363584',
'ext': 'mp4',
'title': 'Simon Vertugo - BEAT PROD: @suhmeduh #Damndaniel',
'title': 'simon vetugo - BEAT PROD: @suhmeduh #Damndaniel',
'description': 'BEAT PROD: @suhmeduh https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ',
'thumbnail': r're:^https?://.*\.jpg',
'uploader': 'Simon Vertugo',
'uploader': 'simon vetugo',
'uploader_id': 'simonvertugo',
'duration': 30.0,
'timestamp': 1455777459,
@ -376,6 +376,10 @@ class TwitterIE(TwitterBaseIE):
# Twitch Clip Embed
'url': 'https://twitter.com/GunB1g/status/1163218564784017422',
'only_matching': True,
}, {
# promo_video_website card
'url': 'https://twitter.com/GunB1g/status/1163218564784017422',
'only_matching': True,
}]
def _real_extract(self, url):
@ -458,10 +462,11 @@ class TwitterIE(TwitterBaseIE):
return try_get(o, lambda x: x[x['type'].lower() + '_value'])
card_name = card['name'].split(':')[-1]
if card_name == 'amplify':
formats = self._extract_formats_from_vmap_url(
get_binding_value('amplify_url_vmap'),
get_binding_value('amplify_content_id') or twid)
if card_name in ('amplify', 'promo_video_website'):
is_amplify = card_name == 'amplify'
vmap_url = get_binding_value('amplify_url_vmap') if is_amplify else get_binding_value('player_stream_url')
content_id = get_binding_value('%s_content_id' % (card_name if is_amplify else 'player'))
formats = self._extract_formats_from_vmap_url(vmap_url, content_id or twid)
self._sort_formats(formats)
thumbnails = []
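
Note on the hunk above: the `amplify` and `promo_video_website` cards share one code path and differ only in which binding keys hold the VMAP URL and the content ID. The key selection can be sketched in isolation as follows. The helper name `pick_vmap_bindings` is an assumption for illustration; the key names are taken from the diff.

```python
def pick_vmap_bindings(card_name):
    """Return (vmap_url_key, content_id_key) for a supported card name."""
    is_amplify = card_name == 'amplify'
    # amplify cards use amplify_* keys; promo_video_website uses player_* keys.
    url_key = 'amplify_url_vmap' if is_amplify else 'player_stream_url'
    id_key = '%s_content_id' % (card_name if is_amplify else 'player')
    return url_key, id_key


for name in ('amplify', 'promo_video_website'):
    print(name, pick_vmap_bindings(name))
```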


@ -1,35 +1,50 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import time
import functools
import hashlib
import json
import random
import re
import time
from .adobepass import AdobePassIE
from .youtube import YoutubeIE
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..compat import (
compat_HTTPError,
compat_str,
)
from ..utils import (
clean_html,
ExtractorError,
int_or_none,
OnDemandPagedList,
parse_age_limit,
str_or_none,
try_get,
)
class ViceIE(AdobePassIE):
class ViceBaseIE(InfoExtractor):
def _call_api(self, resource, resource_key, resource_id, locale, fields, args=''):
return self._download_json(
'https://video.vice.com/api/v1/graphql', resource_id, query={
'query': '''{
%s(locale: "%s", %s: "%s"%s) {
%s
}
}''' % (resource, locale, resource_key, resource_id, args, fields),
})['data'][resource]
class ViceIE(ViceBaseIE, AdobePassIE):
IE_NAME = 'vice'
_VALID_URL = r'https?://(?:(?:video|vms)\.vice|(?:www\.)?viceland)\.com/(?P<locale>[^/]+)/(?:video/[^/]+|embed)/(?P<id>[\da-f]+)'
_VALID_URL = r'https?://(?:(?:video|vms)\.vice|(?:www\.)?vice(?:land|tv))\.com/(?P<locale>[^/]+)/(?:video/[^/]+|embed)/(?P<id>[\da-f]{24})'
_TESTS = [{
'url': 'https://video.vice.com/en_us/video/pet-cremator/58c69e38a55424f1227dc3f7',
'info_dict': {
'id': '5e647f0125e145c9aef2069412c0cbde',
'id': '58c69e38a55424f1227dc3f7',
'ext': 'mp4',
'title': '10 Questions You Always Wanted To Ask: Pet Cremator',
'description': 'md5:fe856caacf61fe0e74fab15ce2b07ca5',
@ -43,17 +58,16 @@ class ViceIE(AdobePassIE):
# m3u8 download
'skip_download': True,
},
'add_ie': ['UplynkPreplay'],
}, {
# geo restricted to US
'url': 'https://video.vice.com/en_us/video/the-signal-from-tolva/5816510690b70e6c5fd39a56',
'info_dict': {
'id': '930c0ad1f47141cc955087eecaddb0e2',
'id': '5816510690b70e6c5fd39a56',
'ext': 'mp4',
'uploader': 'waypoint',
'uploader': 'vice',
'title': 'The Signal From Tölva',
'description': 'md5:3927e3c79f9e8094606a2b3c5b5e55d5',
'uploader_id': '57f7d621e05ca860fa9ccaf9',
'uploader_id': '57a204088cb727dec794c67b',
'timestamp': 1477941983,
'upload_date': '20161031',
},
@ -61,15 +75,14 @@ class ViceIE(AdobePassIE):
# m3u8 download
'skip_download': True,
},
'add_ie': ['UplynkPreplay'],
}, {
'url': 'https://video.vice.com/alps/video/ulfs-wien-beruchtigste-grafitti-crew-part-1/581b12b60a0e1f4c0fb6ea2f',
'info_dict': {
'id': '581b12b60a0e1f4c0fb6ea2f',
'ext': 'mp4',
'title': 'ULFs - Wien berüchtigste Grafitti Crew - Part 1',
'description': '<p>Zwischen Hinterzimmer-Tattoos und U-Bahnschächten erzählen uns die Ulfs, wie es ist, "süchtig nach Sachbeschädigung" zu sein.</p>',
'uploader': 'VICE',
'description': 'Zwischen Hinterzimmer-Tattoos und U-Bahnschächten erzählen uns die Ulfs, wie es ist, "süchtig nach Sachbeschädigung" zu sein.',
'uploader': 'vice',
'uploader_id': '57a204088cb727dec794c67b',
'timestamp': 1485368119,
'upload_date': '20170125',
@ -78,9 +91,7 @@ class ViceIE(AdobePassIE):
'params': {
# AES-encrypted m3u8
'skip_download': True,
'proxy': '127.0.0.1:8118',
},
'add_ie': ['UplynkPreplay'],
}, {
'url': 'https://video.vice.com/en_us/video/pizza-show-trailer/56d8c9a54d286ed92f7f30e4',
'only_matching': True,
@ -98,7 +109,7 @@ class ViceIE(AdobePassIE):
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe\b[^>]+\bsrc=["\']((?:https?:)?//video\.vice\.com/[^/]+/embed/[\da-f]+)',
r'<iframe\b[^>]+\bsrc=["\']((?:https?:)?//video\.vice\.com/[^/]+/embed/[\da-f]{24})',
webpage)
@staticmethod
@ -109,31 +120,16 @@ class ViceIE(AdobePassIE):
def _real_extract(self, url):
locale, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(
'https://video.vice.com/%s/embed/%s' % (locale, video_id),
video_id)
video = self._parse_json(
self._search_regex(
r'PREFETCH_DATA\s*=\s*({.+?})\s*;\s*\n', webpage,
'app state'), video_id)['video']
video_id = video.get('vms_id') or video.get('id') or video_id
title = video['title']
is_locked = video.get('locked')
video = self._call_api('videos', 'id', video_id, locale, '''body
locked
rating
thumbnail_url
title''')[0]
title = video['title'].strip()
rating = video.get('rating')
thumbnail = video.get('thumbnail_url')
duration = int_or_none(video.get('duration'))
series = try_get(
video, lambda x: x['episode']['season']['show']['title'],
compat_str)
episode_number = try_get(
video, lambda x: x['episode']['episode_number'])
season_number = try_get(
video, lambda x: x['episode']['season']['season_number'])
uploader = None
query = {}
if is_locked:
if video.get('locked'):
resource = self._get_mvpd_resource(
'VICELAND', title, video_id, rating)
query['tvetoken'] = self._extract_mvpd_auth(
@ -148,12 +144,9 @@ class ViceIE(AdobePassIE):
query.update({
'exp': exp,
'sign': hashlib.sha512(('%s:GET:%d' % (video_id, exp)).encode()).hexdigest(),
'_ad_blocked': None,
'_ad_unit': '',
'_debug': '',
'skipadstitching': 1,
'platform': 'desktop',
'rn': random.randint(10000, 100000),
'fbprebidtoken': '',
})
try:
@ -169,85 +162,94 @@ class ViceIE(AdobePassIE):
raise
video_data = preplay['video']
base = video_data['base']
uplynk_preplay_url = preplay['preplayURL']
episode = video_data.get('episode', {})
channel = video_data.get('channel', {})
formats = self._extract_m3u8_formats(
preplay['playURL'], video_id, 'mp4', 'm3u8_native')
self._sort_formats(formats)
episode = video_data.get('episode') or {}
channel = video_data.get('channel') or {}
season = video_data.get('season') or {}
subtitles = {}
cc_url = preplay.get('ccURL')
if cc_url:
subtitles['en'] = [{
for subtitle in preplay.get('subtitleURLs', []):
cc_url = subtitle.get('url')
if not cc_url:
continue
language_code = try_get(subtitle, lambda x: x['languages'][0]['language_code'], compat_str) or 'en'
subtitles.setdefault(language_code, []).append({
'url': cc_url,
}]
})
return {
'_type': 'url_transparent',
'url': uplynk_preplay_url,
'formats': formats,
'id': video_id,
'title': title,
'description': base.get('body') or base.get('display_body'),
'thumbnail': thumbnail,
'duration': int_or_none(video_data.get('video_duration')) or duration,
'description': clean_html(video.get('body')),
'thumbnail': video.get('thumbnail_url'),
'duration': int_or_none(video_data.get('video_duration')),
'timestamp': int_or_none(video_data.get('created_at'), 1000),
'age_limit': parse_age_limit(video_data.get('video_rating')),
'series': video_data.get('show_title') or series,
'episode_number': int_or_none(episode.get('episode_number') or episode_number),
'age_limit': parse_age_limit(video_data.get('video_rating') or rating),
'series': try_get(video_data, lambda x: x['show']['base']['display_title'], compat_str),
'episode_number': int_or_none(episode.get('episode_number')),
'episode_id': str_or_none(episode.get('id') or video_data.get('episode_id')),
'season_number': int_or_none(season_number),
'season_id': str_or_none(episode.get('season_id')),
'uploader': channel.get('base', {}).get('title') or channel.get('name') or uploader,
'season_number': int_or_none(season.get('season_number')),
'season_id': str_or_none(season.get('id') or video_data.get('season_id')),
'uploader': channel.get('name'),
'uploader_id': str_or_none(channel.get('id')),
'subtitles': subtitles,
'ie_key': 'UplynkPreplay',
}
class ViceShowIE(InfoExtractor):
class ViceShowIE(ViceBaseIE):
IE_NAME = 'vice:show'
_VALID_URL = r'https?://(?:.+?\.)?vice\.com/(?:[^/]+/)?show/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://munchies.vice.com/en/show/fuck-thats-delicious-2',
_VALID_URL = r'https?://(?:video\.vice|(?:www\.)?vice(?:land|tv))\.com/(?P<locale>[^/]+)/show/(?P<id>[^/?#&]+)'
_PAGE_SIZE = 25
_TESTS = [{
'url': 'https://video.vice.com/en_us/show/fck-thats-delicious',
'info_dict': {
'id': 'fuck-thats-delicious-2',
'title': "Fuck, That's Delicious",
'description': 'Follow the culinary adventures of rapper Action Bronson during his ongoing world tour.',
'id': '57a2040c8cb727dec794c901',
'title': 'F*ck, Thats Delicious',
'description': 'The life and eating habits of raps greatest bon vivant, Action Bronson.',
},
'playlist_count': 17,
}
'playlist_mincount': 64,
}, {
'url': 'https://www.vicetv.com/en_us/show/fck-thats-delicious',
'only_matching': True,
}]
def _fetch_page(self, locale, show_id, page):
videos = self._call_api('videos', 'show_id', show_id, locale, '''body
id
url''', ', page: %d, per_page: %d' % (page + 1, self._PAGE_SIZE))
for video in videos:
yield self.url_result(
video['url'], ViceIE.ie_key(), video.get('id'))
def _real_extract(self, url):
show_id = self._match_id(url)
webpage = self._download_webpage(url, show_id)
locale, display_id = re.match(self._VALID_URL, url).groups()
show = self._call_api('shows', 'slug', display_id, locale, '''dek
id
title''')[0]
show_id = show['id']
entries = [
self.url_result(video_url, ViceIE.ie_key())
for video_url, _ in re.findall(
r'<h2[^>]+class="article-title"[^>]+data-id="\d+"[^>]*>\s*<a[^>]+href="(%s.*?)"'
% ViceIE._VALID_URL, webpage)]
entries = OnDemandPagedList(
functools.partial(self._fetch_page, locale, show_id),
self._PAGE_SIZE)
title = self._search_regex(
r'<title>(.+?)</title>', webpage, 'title', default=None)
if title:
title = re.sub(r'(.+)\s*\|\s*.+$', r'\1', title).strip()
description = self._html_search_meta(
'description', webpage, 'description')
return self.playlist_result(entries, show_id, title, description)
return self.playlist_result(
entries, show_id, show.get('title'), show.get('dek'))
class ViceArticleIE(InfoExtractor):
class ViceArticleIE(ViceBaseIE):
IE_NAME = 'vice:article'
_VALID_URL = r'https://www\.vice\.com/[^/]+/article/(?P<id>[^?#]+)'
_VALID_URL = r'https://(?:www\.)?vice\.com/(?P<locale>[^/]+)/article/(?:[0-9a-z]{6}/)?(?P<id>[^?#]+)'
_TESTS = [{
'url': 'https://www.vice.com/en_us/article/on-set-with-the-woman-making-mormon-porn-in-utah',
'info_dict': {
'id': '41eae2a47b174a1398357cec55f1f6fc',
'id': '58dc0a3dee202d2a0ccfcbd8',
'ext': 'mp4',
'title': 'Mormon War on Porn ',
'description': 'md5:6394a8398506581d0346b9ab89093fef',
'title': 'Mormon War on Porn',
'description': 'md5:1c5d91fe25fa8aa304f9def118b92dbf',
'uploader': 'vice',
'uploader_id': '57a204088cb727dec794c67b',
'timestamp': 1491883129,
@ -258,10 +260,10 @@ class ViceArticleIE(InfoExtractor):
# AES-encrypted m3u8
'skip_download': True,
},
'add_ie': ['UplynkPreplay'],
'add_ie': [ViceIE.ie_key()],
}, {
'url': 'https://www.vice.com/en_us/article/how-to-hack-a-car',
'md5': '7fe8ebc4fa3323efafc127b82bd821d9',
'md5': '13010ee0bc694ea87ec40724397c2349',
'info_dict': {
'id': '3jstaBeXgAs',
'ext': 'mp4',
@ -271,15 +273,15 @@ class ViceArticleIE(InfoExtractor):
'uploader_id': 'MotherboardTV',
'upload_date': '20140529',
},
'add_ie': ['Youtube'],
'add_ie': [YoutubeIE.ie_key()],
}, {
'url': 'https://www.vice.com/en_us/article/znm9dx/karley-sciortino-slutever-reloaded',
'md5': 'a7ecf64ee4fa19b916c16f4b56184ae2',
'info_dict': {
'id': 'e2ed435eb67e43efb66e6ef9a6930a88',
'id': '57f41d3556a0a80f54726060',
'ext': 'mp4',
'title': "Making The World's First Male Sex Doll",
'description': 'md5:916078ef0e032d76343116208b6cc2c4',
'description': 'md5:19b00b215b99961cf869c40fbe9df755',
'uploader': 'vice',
'uploader_id': '57a204088cb727dec794c67b',
'timestamp': 1476919911,
@ -288,6 +290,7 @@ class ViceArticleIE(InfoExtractor):
},
'params': {
'skip_download': True,
'format': 'bestvideo',
},
'add_ie': [ViceIE.ie_key()],
}, {
@ -299,14 +302,11 @@ class ViceArticleIE(InfoExtractor):
}]
def _real_extract(self, url):
display_id = self._match_id(url)
locale, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
prefetch_data = self._parse_json(self._search_regex(
r'__APP_STATE\s*=\s*({.+?})(?:\s*\|\|\s*{}\s*)?;\s*\n',
webpage, 'app state'), display_id)['pageData']
body = prefetch_data['body']
article = self._call_api('articles', 'slug', display_id, locale, '''body
embed_code''')[0]
body = article['body']
def _url_res(video_url, ie_key):
return {
@ -316,7 +316,7 @@ class ViceArticleIE(InfoExtractor):
'ie_key': ie_key,
}
vice_url = ViceIE._extract_url(webpage)
vice_url = ViceIE._extract_url(body)
if vice_url:
return _url_res(vice_url, ViceIE.ie_key())
@ -332,6 +332,6 @@ class ViceArticleIE(InfoExtractor):
video_url = self._html_search_regex(
r'data-video-url="([^"]+)"',
prefetch_data['embed_code'], 'video URL')
article['embed_code'], 'video URL')
return _url_res(video_url, ViceIE.ie_key())
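
Note on the Vice changes above: all three extractors now go through `ViceBaseIE._call_api`, which formats a small GraphQL query, and `ViceShowIE` appends paging arguments to it. The sketch below reproduces only the string building, with no network access. `build_query`, `page_args`, and the `SHOW_ID` placeholder are assumptions for illustration. The `page + 1` in the diff suggests the paged list passes a zero-based index, and the sketch follows that reading.

```python
def build_query(resource, locale, resource_key, resource_id, fields, args=''):
    # Same template and argument order as _call_api in the diff.
    return '''{
  %s(locale: "%s", %s: "%s"%s) {
    %s
  }
}''' % (resource, locale, resource_key, resource_id, args, fields)


def page_args(page, page_size=25):
    # Assumed zero-based page index from the paged list; the API is 1-based.
    return ', page: %d, per_page: %d' % (page + 1, page_size)


# 'SHOW_ID' is a placeholder, not a real show identifier.
query = build_query('videos', 'en_us', 'show_id', 'SHOW_ID', 'id\n    url', page_args(0))
print(query)
```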


@ -1,28 +1,62 @@
from __future__ import unicode_literals
import base64
import json
import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..compat import compat_HTTPError
from ..utils import (
ExtractorError,
clean_html,
determine_ext,
int_or_none,
js_to_json,
parse_age_limit,
parse_duration,
try_get,
)
class ViewLiftBaseIE(InfoExtractor):
_DOMAINS_REGEX = r'(?:(?:main\.)?snagfilms|snagxtreme|funnyforfree|kiddovid|winnersview|(?:monumental|lax)sportsnetwork|vayafilm)\.com|hoichoi\.tv'
_API_BASE = 'https://prod-api.viewlift.com/'
_DOMAINS_REGEX = r'(?:(?:main\.)?snagfilms|snagxtreme|funnyforfree|kiddovid|winnersview|(?:monumental|lax)sportsnetwork|vayafilm|failarmy|ftfnext|lnppass\.legapallacanestro|moviespree|app\.myoutdoortv|neoufitness|pflmma|theidentitytb)\.com|(?:hoichoi|app\.horseandcountry|kronon|marquee|supercrosslive)\.tv'
_SITE_MAP = {
'ftfnext': 'lax',
'funnyforfree': 'snagfilms',
'hoichoi': 'hoichoitv',
'kiddovid': 'snagfilms',
'laxsportsnetwork': 'lax',
'legapallacanestro': 'lnp',
'marquee': 'marquee-tv',
'monumentalsportsnetwork': 'monumental-network',
'moviespree': 'bingeflix',
'pflmma': 'pfl',
'snagxtreme': 'snagfilms',
'theidentitytb': 'tampabay',
'vayafilm': 'snagfilms',
}
_TOKENS = {}
def _call_api(self, site, path, video_id, query):
token = self._TOKENS.get(site)
if not token:
token_query = {'site': site}
email, password = self._get_login_info(netrc_machine=site)
if email:
resp = self._download_json(
self._API_BASE + 'identity/signin', video_id,
'Logging in', query=token_query, data=json.dumps({
'email': email,
'password': password,
}).encode())
else:
resp = self._download_json(
self._API_BASE + 'identity/anonymous-token', video_id,
'Downloading authorization token', query=token_query)
self._TOKENS[site] = token = resp['authorizationToken']
return self._download_json(
self._API_BASE + path, video_id,
headers={'Authorization': token}, query=query)
class ViewLiftEmbedIE(ViewLiftBaseIE):
_VALID_URL = r'https?://(?:(?:www|embed)\.)?(?:%s)/embed/player\?.*\bfilmId=(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})' % ViewLiftBaseIE._DOMAINS_REGEX
IE_NAME = 'viewlift:embed'
_VALID_URL = r'https?://(?:(?:www|embed)\.)?(?P<domain>%s)/embed/player\?.*\bfilmId=(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})' % ViewLiftBaseIE._DOMAINS_REGEX
_TESTS = [{
'url': 'http://embed.snagfilms.com/embed/player?filmId=74849a00-85a9-11e1-9660-123139220831&w=500',
'md5': '2924e9215c6eff7a55ed35b72276bd93',
@ -30,6 +64,9 @@ class ViewLiftEmbedIE(ViewLiftBaseIE):
'id': '74849a00-85a9-11e1-9660-123139220831',
'ext': 'mp4',
'title': '#whilewewatch',
'description': 'md5:b542bef32a6f657dadd0df06e26fb0c8',
'timestamp': 1334350096,
'upload_date': '20120413',
}
}, {
# invalid labels, 360p is better than 480p
@ -39,7 +76,8 @@ class ViewLiftEmbedIE(ViewLiftBaseIE):
'id': '17ca0950-a74a-11e0-a92a-0026bb61d036',
'ext': 'mp4',
'title': 'Life in Limbo',
}
},
'skip': 'The video does not exist',
}, {
'url': 'http://www.snagfilms.com/embed/player?filmId=0000014c-de2f-d5d6-abcf-ffef58af0017',
'only_matching': True,
@ -54,67 +92,68 @@ class ViewLiftEmbedIE(ViewLiftBaseIE):
return mobj.group('url')
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
if '>This film is not playable in your area.<' in webpage:
raise ExtractorError(
'Film %s is not playable in your area.' % video_id, expected=True)
domain, film_id = re.match(self._VALID_URL, url).groups()
site = domain.split('.')[-2]
if site in self._SITE_MAP:
site = self._SITE_MAP[site]
try:
content_data = self._call_api(
site, 'entitlement/video/status', film_id, {
'id': film_id
})['video']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
error_message = self._parse_json(e.cause.read().decode(), film_id).get('errorMessage')
if error_message == 'User does not have a valid subscription or has not purchased this content.':
self.raise_login_required()
raise ExtractorError(error_message, expected=True)
raise
gist = content_data['gist']
title = gist['title']
video_assets = content_data['streamingInfo']['videoAssets']
formats = []
has_bitrate = False
sources = self._parse_json(self._search_regex(
r'(?s)sources:\s*(\[.+?\]),', webpage,
'sources', default='[]'), video_id, js_to_json)
for source in sources:
file_ = source.get('file')
if not file_:
mpeg_video_assets = video_assets.get('mpeg') or []
for video_asset in mpeg_video_assets:
video_asset_url = video_asset.get('url')
if not video_asset_url:
continue
type_ = source.get('type')
ext = determine_ext(file_)
format_id = source.get('label') or ext
if all(v in ('m3u8', 'hls') for v in (type_, ext)):
formats.extend(self._extract_m3u8_formats(
file_, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
else:
bitrate = int_or_none(self._search_regex(
[r'(\d+)kbps', r'_\d{1,2}x\d{1,2}_(\d{3,})\.%s' % ext],
file_, 'bitrate', default=None))
if not has_bitrate and bitrate:
has_bitrate = True
height = int_or_none(self._search_regex(
r'^(\d+)[pP]$', format_id, 'height', default=None))
formats.append({
'url': file_,
'format_id': 'http-%s%s' % (format_id, ('-%dk' % bitrate if bitrate else '')),
'tbr': bitrate,
'height': height,
})
if not formats:
hls_url = self._parse_json(self._search_regex(
r'filmInfo\.src\s*=\s*({.+?});',
webpage, 'src'), video_id, js_to_json)['src']
formats = self._extract_m3u8_formats(
hls_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False)
field_preference = None if has_bitrate else ('height', 'tbr', 'format_id')
self._sort_formats(formats, field_preference)
bitrate = int_or_none(video_asset.get('bitrate'))
height = int_or_none(self._search_regex(
r'^_?(\d+)[pP]$', video_asset.get('renditionValue'),
'height', default=None))
formats.append({
'url': video_asset_url,
'format_id': 'http%s' % ('-%d' % bitrate if bitrate else ''),
'tbr': bitrate,
'height': height,
'vcodec': video_asset.get('codec'),
})
title = self._search_regex(
[r"title\s*:\s*'([^']+)'", r'<title>([^<]+)</title>'],
webpage, 'title')
hls_url = video_assets.get('hls')
if hls_url:
formats.extend(self._extract_m3u8_formats(
hls_url, film_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
self._sort_formats(formats, ('height', 'tbr', 'format_id'))
return {
'id': video_id,
info = {
'id': film_id,
'title': title,
'description': gist.get('description'),
'thumbnail': gist.get('videoImageUrl'),
'duration': int_or_none(gist.get('runtime')),
'age_limit': parse_age_limit(content_data.get('parentalRating')),
'timestamp': int_or_none(gist.get('publishDate'), 1000),
'formats': formats,
}
for k in ('categories', 'tags'):
info[k] = [v['title'] for v in content_data.get(k, []) if v.get('title')]
return info
class ViewLiftIE(ViewLiftBaseIE):
_VALID_URL = r'https?://(?:www\.)?(?P<domain>%s)(?:/(?:films/title|show|(?:news/)?videos?))?/(?P<id>[^?#]+)' % ViewLiftBaseIE._DOMAINS_REGEX
IE_NAME = 'viewlift'
_VALID_URL = r'https?://(?:www\.)?(?P<domain>%s)(?P<path>(?:/(?:films/title|show|(?:news/)?videos?|watch))?/(?P<id>[^?#]+))' % ViewLiftBaseIE._DOMAINS_REGEX
_TESTS = [{
'url': 'http://www.snagfilms.com/films/title/lost_for_life',
'md5': '19844f897b35af219773fd63bdec2942',
@ -151,10 +190,13 @@ class ViewLiftIE(ViewLiftBaseIE):
'id': '00000148-7b53-de26-a9fb-fbf306f70020',
'display_id': 'augie_alone/s_2_ep_12_love',
'ext': 'mp4',
'title': 'Augie, Alone:S. 2 Ep. 12 - Love',
'description': 'md5:db2a5c72d994f16a780c1eb353a8f403',
'title': 'S. 2 Ep. 12 - Love',
'description': 'Augie finds love.',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 107,
'upload_date': '20141012',
'timestamp': 1413129540,
'age_limit': 17,
},
'params': {
'skip_download': True,
@ -177,6 +219,9 @@ class ViewLiftIE(ViewLiftBaseIE):
# Was once Kaltura embed
'url': 'https://www.monumentalsportsnetwork.com/videos/john-carlson-postgame-2-25-15',
'only_matching': True,
}, {
'url': 'https://www.marquee.tv/watch/sadlerswells-sacredmonsters',
'only_matching': True,
}]
@classmethod
@ -184,119 +229,22 @@ class ViewLiftIE(ViewLiftBaseIE):
return False if ViewLiftEmbedIE.suitable(url) else super(ViewLiftIE, cls).suitable(url)
def _real_extract(self, url):
domain, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
if ">Sorry, the Film you're looking for is not available.<" in webpage:
raise ExtractorError(
'Film %s is not available.' % display_id, expected=True)
initial_store_state = self._search_regex(
r"window\.initialStoreState\s*=.*?JSON\.parse\(unescape\(atob\('([^']+)'\)\)\)",
webpage, 'Initial Store State', default=None)
if initial_store_state:
modules = self._parse_json(compat_urllib_parse_unquote(base64.b64decode(
initial_store_state).decode()), display_id)['page']['data']['modules']
content_data = next(m['contentData'][0] for m in modules if m.get('moduleType') == 'VideoDetailModule')
gist = content_data['gist']
film_id = gist['id']
title = gist['title']
video_assets = try_get(
content_data, lambda x: x['streamingInfo']['videoAssets'], dict)
if not video_assets:
token = self._download_json(
'https://prod-api.viewlift.com/identity/anonymous-token',
film_id, 'Downloading authorization token',
query={'site': 'snagfilms'})['authorizationToken']
video_assets = self._download_json(
'https://prod-api.viewlift.com/entitlement/video/status',
film_id, headers={
'Authorization': token,
'Referer': url,
}, query={
'id': film_id
})['video']['streamingInfo']['videoAssets']
formats = []
mpeg_video_assets = video_assets.get('mpeg') or []
for video_asset in mpeg_video_assets:
video_asset_url = video_asset.get('url')
if not video_asset:
continue
bitrate = int_or_none(video_asset.get('bitrate'))
height = int_or_none(self._search_regex(
r'^_?(\d+)[pP]$', video_asset.get('renditionValue'),
'height', default=None))
formats.append({
'url': video_asset_url,
'format_id': 'http%s' % ('-%d' % bitrate if bitrate else ''),
'tbr': bitrate,
'height': height,
'vcodec': video_asset.get('codec'),
})
hls_url = video_assets.get('hls')
if hls_url:
formats.extend(self._extract_m3u8_formats(
hls_url, film_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
self._sort_formats(formats, ('height', 'tbr', 'format_id'))
info = {
'id': film_id,
'display_id': display_id,
'title': title,
'description': gist.get('description'),
'thumbnail': gist.get('videoImageUrl'),
'duration': int_or_none(gist.get('runtime')),
'age_limit': parse_age_limit(content_data.get('parentalRating')),
'timestamp': int_or_none(gist.get('publishDate'), 1000),
'formats': formats,
}
for k in ('categories', 'tags'):
info[k] = [v['title'] for v in content_data.get(k, []) if v.get('title')]
return info
else:
film_id = self._search_regex(r'filmId=([\da-f-]{36})"', webpage, 'film id')
snag = self._parse_json(
self._search_regex(
r'Snag\.page\.data\s*=\s*(\[.+?\]);', webpage, 'snag', default='[]'),
display_id)
for item in snag:
if item.get('data', {}).get('film', {}).get('id') == film_id:
data = item['data']['film']
title = data['title']
description = clean_html(data.get('synopsis'))
thumbnail = data.get('image')
duration = int_or_none(data.get('duration') or data.get('runtime'))
categories = [
category['title'] for category in data.get('categories', [])
if category.get('title')]
break
else:
title = self._html_search_regex(
(r'itemprop="title">([^<]+)<',
r'(?s)itemprop="title">(.+?)<div'), webpage, 'title')
description = self._html_search_regex(
r'(?s)<div itemprop="description" class="film-synopsis-inner ">(.+?)</div>',
webpage, 'description', default=None) or self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage)
duration = parse_duration(self._search_regex(
r'<span itemprop="duration" class="film-duration strong">([^<]+)<',
webpage, 'duration', fatal=False))
categories = re.findall(r'<a href="/movies/[^"]+">([^<]+)</a>', webpage)
return {
'_type': 'url_transparent',
'url': 'http://%s/embed/player?filmId=%s' % (domain, film_id),
'id': film_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'categories': categories,
'ie_key': 'ViewLiftEmbed',
}
domain, path, display_id = re.match(self._VALID_URL, url).groups()
site = domain.split('.')[-2]
if site in self._SITE_MAP:
site = self._SITE_MAP[site]
modules = self._call_api(
site, 'content/pages', display_id, {
'includeContent': 'true',
'moduleOffset': 1,
'path': path,
'site': site,
})['modules']
film_id = next(m['contentData'][0]['gist']['id'] for m in modules if m.get('moduleType') == 'VideoDetailModule')
return {
'_type': 'url_transparent',
'url': 'http://%s/embed/player?filmId=%s' % (domain, film_id),
'id': film_id,
'display_id': display_id,
'ie_key': 'ViewLiftEmbed',
}
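
Note on the ViewLift changes above: both extractors derive the API `site` value from the URL's domain by taking the second-to-last label and then applying `_SITE_MAP`. Here is a minimal sketch of that lookup. It uses only a subset of the map from the diff, and the function name `site_for_domain` is an assumption for illustration.

```python
# Subset of _SITE_MAP from the diff; unmapped sites fall through unchanged.
SITE_MAP = {
    'hoichoi': 'hoichoitv',
    'laxsportsnetwork': 'lax',
    'legapallacanestro': 'lnp',
    'marquee': 'marquee-tv',
    'vayafilm': 'snagfilms',
}


def site_for_domain(domain):
    """Map a matched domain such as 'app.horseandcountry.tv' to an API site name."""
    # The label before the TLD identifies the site, whatever the subdomain.
    site = domain.split('.')[-2]
    return SITE_MAP.get(site, site)


for d in ('hoichoi.tv', 'main.snagfilms.com', 'lnppass.legapallacanestro.com'):
    print(d, '->', site_for_domain(d))
```

Using `dict.get` with the label as its own default is equivalent to the `if site in self._SITE_MAP` check in the diff.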

View File

@ -591,7 +591,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
# Retrieve video webpage to extract further information
webpage, urlh = self._download_webpage_handle(
url, video_id, headers=headers)
redirect_url = compat_str(urlh.geturl())
redirect_url = urlh.geturl()
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
errmsg = ee.cause.read()
@ -841,33 +841,6 @@ class VimeoChannelIE(VimeoBaseInfoExtractor):
return self._TITLE or self._html_search_regex(
self._TITLE_RE, webpage, 'list title', fatal=False)
def _login_list_password(self, page_url, list_id, webpage):
login_form = self._search_regex(
r'(?s)<form[^>]+?id="pw_form"(.*?)</form>',
webpage, 'login form', default=None)
if not login_form:
return webpage
password = self._downloader.params.get('videopassword')
if password is None:
raise ExtractorError('This album is protected by a password, use the --video-password option', expected=True)
fields = self._hidden_inputs(login_form)
token, vuid = self._extract_xsrft_and_vuid(webpage)
fields['token'] = token
fields['password'] = password
post = urlencode_postdata(fields)
password_path = self._search_regex(
r'action="([^"]+)"', login_form, 'password URL')
password_url = compat_urlparse.urljoin(page_url, password_path)
password_request = sanitized_Request(password_url, post)
password_request.add_header('Content-type', 'application/x-www-form-urlencoded')
self._set_vimeo_cookie('vuid', vuid)
self._set_vimeo_cookie('xsrft', token)
return self._download_webpage(
password_request, list_id,
'Verifying the password', 'Wrong password')
def _title_and_entries(self, list_id, base_url):
for pagenum in itertools.count(1):
page_url = self._page_url(base_url, pagenum)
@ -876,7 +849,6 @@ class VimeoChannelIE(VimeoBaseInfoExtractor):
'Downloading page %s' % pagenum)
if pagenum == 1:
webpage = self._login_list_password(page_url, list_id, webpage)
yield self._extract_list_title(webpage)
# Try extracting href first since not all videos are available via
@ -923,7 +895,7 @@ class VimeoUserIE(VimeoChannelIE):
_BASE_URL_TEMPL = 'https://vimeo.com/%s'
class VimeoAlbumIE(VimeoChannelIE):
class VimeoAlbumIE(VimeoBaseInfoExtractor):
IE_NAME = 'vimeo:album'
_VALID_URL = r'https://vimeo\.com/(?:album|showcase)/(?P<id>\d+)(?:$|[?#]|/(?!video))'
_TITLE_RE = r'<header id="page_header">\n\s*<h1>(.*?)</h1>'
@ -973,13 +945,39 @@ class VimeoAlbumIE(VimeoChannelIE):
def _real_extract(self, url):
album_id = self._match_id(url)
webpage = self._download_webpage(url, album_id)
webpage = self._login_list_password(url, album_id, webpage)
api_config = self._extract_vimeo_config(webpage, album_id)['api']
viewer = self._parse_json(self._search_regex(
r'bootstrap_data\s*=\s*({.+?})</script>',
webpage, 'bootstrap data'), album_id)['viewer']
jwt = viewer['jwt']
album = self._download_json(
'https://api.vimeo.com/albums/' + album_id,
album_id, headers={'Authorization': 'jwt ' + jwt},
query={'fields': 'description,name,privacy'})
hashed_pass = None
if try_get(album, lambda x: x['privacy']['view']) == 'password':
password = self._downloader.params.get('videopassword')
if not password:
raise ExtractorError(
'This album is protected by a password, use the --video-password option',
expected=True)
self._set_vimeo_cookie('vuid', viewer['vuid'])
try:
hashed_pass = self._download_json(
'https://vimeo.com/showcase/%s/auth' % album_id,
album_id, 'Verifying the password', data=urlencode_postdata({
'password': password,
'token': viewer['xsrft'],
}), headers={
'X-Requested-With': 'XMLHttpRequest',
})['hashed_pass']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
raise ExtractorError('Wrong password', expected=True)
raise
entries = OnDemandPagedList(functools.partial(
self._fetch_page, album_id, api_config['jwt'],
api_config.get('hashed_pass')), self._PAGE_SIZE)
return self.playlist_result(entries, album_id, self._html_search_regex(
r'<title>\s*(.+?)(?:\s+on Vimeo)?</title>', webpage, 'title', fatal=False))
self._fetch_page, album_id, jwt, hashed_pass), self._PAGE_SIZE)
return self.playlist_result(
entries, album_id, album.get('name'), album.get('description'))
class VimeoGroupsIE(VimeoChannelIE):


@ -6,22 +6,18 @@ import time
import itertools
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_urlencode,
compat_str,
)
from .naver import NaverBaseIE
from ..compat import compat_str
from ..utils import (
dict_get,
ExtractorError,
float_or_none,
int_or_none,
merge_dicts,
remove_start,
try_get,
urlencode_postdata,
)
class VLiveIE(InfoExtractor):
class VLiveIE(NaverBaseIE):
IE_NAME = 'vlive'
_VALID_URL = r'https?://(?:(?:www|m)\.)?vlive\.tv/video/(?P<id>[0-9]+)'
_NETRC_MACHINE = 'vlive'
@ -34,6 +30,7 @@ class VLiveIE(InfoExtractor):
'title': "[V LIVE] Girl's Day's Broadcast",
'creator': "Girl's Day",
'view_count': int,
'uploader_id': 'muploader_a',
},
}, {
'url': 'http://www.vlive.tv/video/16937',
@ -44,6 +41,7 @@ class VLiveIE(InfoExtractor):
'creator': 'EXO',
'view_count': int,
'subtitles': 'mincount:12',
'uploader_id': 'muploader_j',
},
'params': {
'skip_download': True,
@ -187,45 +185,9 @@ class VLiveIE(InfoExtractor):
'This video is only available for CH+ subscribers')
long_video_id, key = video_info['vid'], video_info['inkey']
playinfo = self._download_json(
'http://global.apis.naver.com/rmcnmv/rmcnmv/vod_play_videoInfo.json?%s'
% compat_urllib_parse_urlencode({
'videoId': long_video_id,
'key': key,
'ptc': 'http',
'doct': 'json', # document type (xml or json)
'cpt': 'vtt', # captions type (vtt or ttml)
}), video_id)
formats = [{
'url': vid['source'],
'format_id': vid.get('encodingOption', {}).get('name'),
'abr': float_or_none(vid.get('bitrate', {}).get('audio')),
'vbr': float_or_none(vid.get('bitrate', {}).get('video')),
'width': int_or_none(vid.get('encodingOption', {}).get('width')),
'height': int_or_none(vid.get('encodingOption', {}).get('height')),
'filesize': int_or_none(vid.get('size')),
} for vid in playinfo.get('videos', {}).get('list', []) if vid.get('source')]
self._sort_formats(formats)
view_count = int_or_none(playinfo.get('meta', {}).get('count'))
subtitles = {}
for caption in playinfo.get('captions', {}).get('list', []):
lang = dict_get(caption, ('locale', 'language', 'country', 'label'))
if lang and caption.get('source'):
subtitles[lang] = [{
'ext': 'vtt',
'url': caption['source']}]
info = self._get_common_fields(webpage)
info.update({
'id': video_id,
'formats': formats,
'view_count': view_count,
'subtitles': subtitles,
})
return info
return merge_dicts(
self._get_common_fields(webpage),
self._extract_video_info(video_id, long_video_id, key))
def _download_init_page(self, video_id):
return self._download_webpage(


@ -6,8 +6,8 @@ from ..utils import unescapeHTML
class VODPlatformIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?vod-platform\.net/[eE]mbed/(?P<id>[^/?#]+)'
_TEST = {
_VALID_URL = r'https?://(?:(?:www\.)?vod-platform\.net|embed\.kwikmotion\.com)/[eE]mbed/(?P<id>[^/?#]+)'
_TESTS = [{
# from http://www.lbcgroup.tv/watch/chapter/29143/52844/%D8%A7%D9%84%D9%86%D8%B5%D8%B1%D8%A9-%D9%81%D9%8A-%D8%B6%D9%8A%D8%A7%D9%81%D8%A9-%D8%A7%D9%84%D9%80-cnn/ar
'url': 'http://vod-platform.net/embed/RufMcytHDolTH1MuKHY9Fw',
'md5': '1db2b7249ce383d6be96499006e951fc',
@ -16,7 +16,10 @@ class VODPlatformIE(InfoExtractor):
'ext': 'mp4',
'title': 'LBCi News_ النصرة في ضيافة الـ "سي.أن.أن"',
}
}
}, {
'url': 'http://embed.kwikmotion.com/embed/RufMcytHDolTH1MuKHY9Fw',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
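
The widened `_VALID_URL` can be checked in isolation. The following is a minimal sketch using only the standard `re` module; the helper name `match_id` is hypothetical and merely approximates what `_match_id` does with the pattern, and the third URL is an invented negative case.

```python
import re

# Pattern copied from the updated VODPlatformIE._VALID_URL in the hunk above.
VALID_URL = r'https?://(?:(?:www\.)?vod-platform\.net|embed\.kwikmotion\.com)/[eE]mbed/(?P<id>[^/?#]+)'


def match_id(url):
    # Hypothetical stand-in for InfoExtractor._match_id: return the 'id'
    # group, or None when the URL is not covered by the pattern.
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


# Both hosts from the _TESTS list resolve to the same embed id.
vod_id = match_id('http://vod-platform.net/embed/RufMcytHDolTH1MuKHY9Fw')
kwik_id = match_id('http://embed.kwikmotion.com/embed/RufMcytHDolTH1MuKHY9Fw')
# Invented negative case: only the embed.kwikmotion.com host was added.
other_id = match_id('http://www.kwikmotion.com/embed/RufMcytHDolTH1MuKHY9Fw')
```

Only the `embed.` subdomain of kwikmotion.com is accepted, which is why the last lookup returns `None`.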


@ -1,17 +1,12 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urlparse,
)
from ..compat import compat_str
from ..utils import (
ExtractorError,
determine_ext,
int_or_none,
sanitized_Request,
urljoin,
)
@ -26,8 +21,7 @@ class VoiceRepublicIE(InfoExtractor):
'ext': 'm4a',
'title': 'Watching the Watchers: Building a Sousveillance State',
'description': 'Secret surveillance programs have metadata too. The people and companies that operate secret surveillance programs can be surveilled.',
'thumbnail': r're:^https?://.*\.(?:png|jpg)$',
'duration': 1800,
'duration': 1556,
'view_count': int,
}
}, {
@ -38,63 +32,31 @@ class VoiceRepublicIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
req = sanitized_Request(
compat_urlparse.urljoin(url, '/talks/%s' % display_id))
# Older versions of Firefox get redirected to an "upgrade browser" page
req.add_header('User-Agent', 'youtube-dl')
webpage = self._download_webpage(req, display_id)
webpage = self._download_webpage(url, display_id)
if '>Queued for processing, please stand by...<' in webpage:
raise ExtractorError(
'Audio is still queued for processing', expected=True)
config = self._search_regex(
r'(?s)return ({.+?});\s*\n', webpage,
'data', default=None)
data = self._parse_json(config, display_id, fatal=False) if config else None
if data:
title = data['title']
description = data.get('teaser')
talk_id = compat_str(data.get('talk_id') or display_id)
talk = data['talk']
duration = int_or_none(talk.get('duration'))
formats = [{
'url': compat_urlparse.urljoin(url, talk_url),
'format_id': format_id,
'ext': determine_ext(talk_url) or format_id,
'vcodec': 'none',
} for format_id, talk_url in talk['links'].items()]
else:
title = self._og_search_title(webpage)
description = self._html_search_regex(
r"(?s)<div class='talk-teaser'[^>]*>(.+?)</div>",
webpage, 'description', fatal=False)
talk_id = self._search_regex(
[r"id='jc-(\d+)'", r"data-shareable-id='(\d+)'"],
webpage, 'talk id', default=None) or display_id
duration = None
player = self._search_regex(
r"class='vr-player jp-jplayer'([^>]+)>", webpage, 'player')
formats = [{
'url': compat_urlparse.urljoin(url, talk_url),
'format_id': format_id,
'ext': determine_ext(talk_url) or format_id,
'vcodec': 'none',
} for format_id, talk_url in re.findall(r"data-([^=]+)='([^']+)'", player)]
talk = self._parse_json(self._search_regex(
r'initialSnapshot\s*=\s*({.+?});',
webpage, 'talk'), display_id)['talk']
title = talk['title']
formats = [{
'url': urljoin(url, talk_url),
'format_id': format_id,
'ext': determine_ext(talk_url) or format_id,
'vcodec': 'none',
} for format_id, talk_url in talk['media_links'].items()]
self._sort_formats(formats)
thumbnail = self._og_search_thumbnail(webpage)
view_count = int_or_none(self._search_regex(
r"class='play-count[^']*'>\s*(\d+) plays",
webpage, 'play count', fatal=False))
return {
'id': talk_id,
'id': compat_str(talk.get('id') or display_id),
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'view_count': view_count,
'description': talk.get('teaser'),
'thumbnail': talk.get('image_url'),
'duration': int_or_none(talk.get('archived_duration')),
'view_count': int_or_none(talk.get('play_count')),
'formats': formats,
}


@ -13,8 +13,7 @@ from ..utils import (
class WistiaIE(InfoExtractor):
_VALID_URL = r'(?:wistia:|https?://(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/)(?P<id>[a-z0-9]{10})'
_API_URL = 'http://fast.wistia.com/embed/medias/%s.json'
_IFRAME_URL = 'http://fast.wistia.net/embed/iframe/%s'
_EMBED_BASE_URL = 'http://fast.wistia.com/embed/'
_TESTS = [{
'url': 'http://fast.wistia.net/embed/iframe/sh7fpupwlt',
@ -46,31 +45,32 @@ class WistiaIE(InfoExtractor):
# https://wistia.com/support/embed-and-share/video-on-your-website
@staticmethod
def _extract_url(webpage):
match = re.search(
r'<(?:meta[^>]+?content|(?:iframe|script)[^>]+?src)=["\'](?P<url>(?:https?:)?//(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/[a-z0-9]{10})', webpage)
if match:
return unescapeHTML(match.group('url'))
urls = WistiaIE._extract_urls(webpage)
return urls[0] if urls else None
match = re.search(
r'''(?sx)
<script[^>]+src=(["'])(?:https?:)?//fast\.wistia\.com/assets/external/E-v1\.js\1[^>]*>.*?
<div[^>]+class=(["']).*?\bwistia_async_(?P<id>[a-z0-9]{10})\b.*?\2
''', webpage)
if match:
return 'wistia:%s' % match.group('id')
match = re.search(r'(?:data-wistia-?id=["\']|Wistia\.embed\(["\']|id=["\']wistia_)(?P<id>[a-z0-9]{10})', webpage)
if match:
return 'wistia:%s' % match.group('id')
@staticmethod
def _extract_urls(webpage):
urls = []
for match in re.finditer(
r'<(?:meta[^>]+?content|(?:iframe|script)[^>]+?src)=["\'](?P<url>(?:https?:)?//(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/[a-z0-9]{10})', webpage):
urls.append(unescapeHTML(match.group('url')))
for match in re.finditer(
r'''(?sx)
<div[^>]+class=(["']).*?\bwistia_async_(?P<id>[a-z0-9]{10})\b.*?\2
''', webpage):
urls.append('wistia:%s' % match.group('id'))
for match in re.finditer(r'(?:data-wistia-?id=["\']|Wistia\.embed\(["\']|id=["\']wistia_)(?P<id>[a-z0-9]{10})', webpage):
urls.append('wistia:%s' % match.group('id'))
return urls
def _real_extract(self, url):
video_id = self._match_id(url)
data_json = self._download_json(
self._API_URL % video_id, video_id,
self._EMBED_BASE_URL + 'medias/%s.json' % video_id, video_id,
# Some videos require this.
headers={
'Referer': url if url.startswith('http') else self._IFRAME_URL % video_id,
'Referer': url if url.startswith('http') else self._EMBED_BASE_URL + 'iframe/' + video_id,
})
if data_json.get('error'):
@ -95,27 +95,61 @@ class WistiaIE(InfoExtractor):
'url': aurl,
'width': int_or_none(a.get('width')),
'height': int_or_none(a.get('height')),
'filesize': int_or_none(a.get('size')),
})
else:
aext = a.get('ext')
is_m3u8 = a.get('container') == 'm3u8' or aext == 'm3u8'
formats.append({
'format_id': atype,
display_name = a.get('display_name')
format_id = atype
if atype and atype.endswith('_video') and display_name:
format_id = '%s-%s' % (atype[:-6], display_name)
f = {
'format_id': format_id,
'url': aurl,
'tbr': int_or_none(a.get('bitrate')),
'vbr': int_or_none(a.get('opt_vbitrate')),
'width': int_or_none(a.get('width')),
'height': int_or_none(a.get('height')),
'filesize': int_or_none(a.get('size')),
'vcodec': a.get('codec'),
'container': a.get('container'),
'ext': 'mp4' if is_m3u8 else aext,
'protocol': 'm3u8' if is_m3u8 else None,
'tbr': int_or_none(a.get('bitrate')) or None,
'preference': 1 if atype == 'original' else None,
})
}
if display_name == 'Audio':
f.update({
'vcodec': 'none',
})
else:
f.update({
'width': int_or_none(a.get('width')),
'height': int_or_none(a.get('height')),
'vcodec': a.get('codec'),
})
if a.get('container') == 'm3u8' or aext == 'm3u8':
ts_f = f.copy()
ts_f.update({
'ext': 'ts',
'format_id': f['format_id'].replace('hls-', 'ts-'),
'url': f['url'].replace('.bin', '.ts'),
})
formats.append(ts_f)
f.update({
'ext': 'mp4',
'protocol': 'm3u8_native',
})
else:
f.update({
'container': a.get('container'),
'ext': aext,
'filesize': int_or_none(a.get('size')),
})
formats.append(f)
self._sort_formats(formats)
subtitles = {}
for caption in data.get('captions', []):
language = caption.get('language')
if not language:
continue
subtitles[language] = [{
'url': self._EMBED_BASE_URL + 'captions/' + video_id + '.vtt?language=' + language,
}]
return {
'id': video_id,
'title': title,
@ -124,4 +158,5 @@ class WistiaIE(InfoExtractor):
'thumbnails': thumbnails,
'duration': float_or_none(data.get('duration')),
'timestamp': int_or_none(data.get('createdAt')),
'subtitles': subtitles,
}
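
The new Wistia format naming can be read as a small pure function. This is a sketch of that logic only, not the extractor itself: the function name `derive_format_ids` is hypothetical, and the asset type strings in the sample calls (`hls_video`, `mp4_video`, `original`) are illustrative assumptions rather than values confirmed by the diff. Like the hunk above, it assumes `atype` is a string whenever the asset is HLS.

```python
def derive_format_ids(atype, display_name, is_m3u8):
    # Mirrors the naming logic in the hunk above: drop the '_video'
    # suffix (6 characters) and append the display name when both exist.
    format_id = atype
    if atype and atype.endswith('_video') and display_name:
        format_id = '%s-%s' % (atype[:-6], display_name)
    ids = []
    if is_m3u8:
        # HLS assets additionally yield a direct MPEG-TS variant whose
        # id swaps the 'hls-' prefix for 'ts-'.
        ids.append(format_id.replace('hls-', 'ts-'))
    ids.append(format_id)
    return ids


hls_ids = derive_format_ids('hls_video', '720p', True)
mp4_ids = derive_format_ids('mp4_video', '360p', False)
original_ids = derive_format_ids('original', None, False)
```

The TS variant is listed first because the diff appends `ts_f` to `formats` before the HLS entry it was copied from.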


@ -113,7 +113,7 @@ class XHamsterIE(InfoExtractor):
display_id = mobj.group('display_id') or mobj.group('display_id_2')
desktop_url = re.sub(r'^(https?://(?:.+?\.)?)m\.', r'\1', url)
webpage = self._download_webpage(desktop_url, video_id)
webpage, urlh = self._download_webpage_handle(desktop_url, video_id)
error = self._html_search_regex(
r'<div[^>]+id=["\']videoClosed["\'][^>]*>(.+?)</div>',
@ -161,6 +161,9 @@ class XHamsterIE(InfoExtractor):
'ext': determine_ext(format_url, 'mp4'),
'height': get_height(quality),
'filesize': filesize,
'http_headers': {
'Referer': urlh.geturl(),
},
})
self._sort_formats(formats)


@ -47,7 +47,7 @@ class XTubeIE(InfoExtractor):
'display_id': 'A-Super-Run-Part-1-YT',
'ext': 'flv',
'title': 'A Super Run - Part 1 (YT)',
'description': 'md5:ca0d47afff4a9b2942e4b41aa970fd93',
'description': 'md5:4cc3af1aa1b0413289babc88f0d4f616',
'uploader': 'tshirtguy59',
'duration': 579,
'view_count': int,
@ -87,10 +87,24 @@ class XTubeIE(InfoExtractor):
'Cookie': 'age_verified=1; cookiesAccepted=1',
})
sources = self._parse_json(self._search_regex(
r'(["\'])?sources\1?\s*:\s*(?P<sources>{.+?}),',
webpage, 'sources', group='sources'), video_id,
transform_source=js_to_json)
title, thumbnail, duration = [None] * 3
config = self._parse_json(self._search_regex(
r'playerConf\s*=\s*({.+?})\s*,\s*\n', webpage, 'config',
default='{}'), video_id, transform_source=js_to_json, fatal=False)
if config:
config = config.get('mainRoll')
if isinstance(config, dict):
title = config.get('title')
thumbnail = config.get('poster')
duration = int_or_none(config.get('duration'))
sources = config.get('sources')
if isinstance(sources, dict):
sources = self._parse_json(self._search_regex(
r'(["\'])?sources\1?\s*:\s*(?P<sources>{.+?}),',
webpage, 'sources', group='sources'), video_id,
transform_source=js_to_json)
formats = []
for format_id, format_url in sources.items():
@ -102,20 +116,25 @@ class XTubeIE(InfoExtractor):
self._remove_duplicate_formats(formats)
self._sort_formats(formats)
title = self._search_regex(
(r'<h1>\s*(?P<title>[^<]+?)\s*</h1>', r'videoTitle\s*:\s*(["\'])(?P<title>.+?)\1'),
webpage, 'title', group='title')
description = self._search_regex(
if not title:
title = self._search_regex(
(r'<h1>\s*(?P<title>[^<]+?)\s*</h1>', r'videoTitle\s*:\s*(["\'])(?P<title>.+?)\1'),
webpage, 'title', group='title')
description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
'twitter:description', webpage, default=None) or self._search_regex(
r'</h1>\s*<p>([^<]+)', webpage, 'description', fatal=False)
uploader = self._search_regex(
(r'<input[^>]+name="contentOwnerId"[^>]+value="([^"]+)"',
r'<span[^>]+class="nickname"[^>]*>([^<]+)'),
webpage, 'uploader', fatal=False)
duration = parse_duration(self._search_regex(
r'<dt>Runtime:?</dt>\s*<dd>([^<]+)</dd>',
webpage, 'duration', fatal=False))
if not duration:
duration = parse_duration(self._search_regex(
r'<dt>Runtime:?</dt>\s*<dd>([^<]+)</dd>',
webpage, 'duration', fatal=False))
view_count = str_to_int(self._search_regex(
r'<dt>Views:?</dt>\s*<dd>([\d,\.]+)</dd>',
(r'["\']viewsCount["\'][^>]*>(\d+)\s+views',
r'<dt>Views:?</dt>\s*<dd>([\d,\.]+)</dd>'),
webpage, 'view count', fatal=False))
comment_count = str_to_int(self._html_search_regex(
r'>Comments? \(([\d,\.]+)\)<',
@ -126,6 +145,7 @@ class XTubeIE(InfoExtractor):
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'duration': duration,
'view_count': view_count,
@ -144,7 +164,7 @@ class XTubeUserIE(InfoExtractor):
'id': 'greenshowers-4056496',
'age_limit': 18,
},
'playlist_mincount': 155,
'playlist_mincount': 154,
}
def _real_extract(self, url):


@ -44,7 +44,7 @@ class YouJizzIE(InfoExtractor):
encodings = self._parse_json(
self._search_regex(
r'encodings\s*=\s*(\[.+?\]);\n', webpage, 'encodings',
r'[Ee]ncodings\s*=\s*(\[.+?\]);\n', webpage, 'encodings',
default='[]'),
video_id, fatal=False)
for encoding in encodings:


@ -1,6 +1,7 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
parse_duration,
urljoin,
@ -8,9 +9,9 @@ from ..utils import (
class YourPornIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:yourporn\.sexy|sxyprn\.com)/post/(?P<id>[^/?#&.]+)'
_VALID_URL = r'https?://(?:www\.)?sxyprn\.com/post/(?P<id>[^/?#&.]+)'
_TESTS = [{
'url': 'https://yourporn.sexy/post/57ffcb2e1179b.html',
'url': 'https://sxyprn.com/post/57ffcb2e1179b.html',
'md5': '6f8682b6464033d87acaa7a8ff0c092e',
'info_dict': {
'id': '57ffcb2e1179b',
@ -33,11 +34,19 @@ class YourPornIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
video_url = urljoin(url, self._parse_json(
parts = self._parse_json(
self._search_regex(
r'data-vnfo=(["\'])(?P<data>{.+?})\1', webpage, 'data info',
group='data'),
video_id)[video_id]).replace('/cdn/', '/cdn5/')
video_id)[video_id].split('/')
num = 0
for c in parts[6] + parts[7]:
if c.isnumeric():
num += int(c)
parts[5] = compat_str(int(parts[5]) - num)
parts[1] += '8'
video_url = urljoin(url, '/'.join(parts))
title = (self._search_regex(
r'<[^>]+\bclass=["\']PostEditTA[^>]+>([^<]+)', webpage, 'title',
@ -54,4 +63,5 @@ class YourPornIE(InfoExtractor):
'thumbnail': thumbnail,
'duration': duration,
'age_limit': 18,
'ext': 'mp4',
}
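
The path rewrite introduced above is easier to follow on a concrete value. A minimal sketch, assuming Python 3 (plain `str` in place of `compat_str`); the function name `rewrite_media_path` and the sample path are hypothetical and only chosen so that the segment indices line up with the ones used in the diff.

```python
def rewrite_media_path(path):
    # Split on '/', so a leading slash makes parts[0] an empty string.
    parts = path.split('/')
    # Sum every numeric character found in segments 6 and 7,
    # as the loop in the diff does.
    num = 0
    for c in parts[6] + parts[7]:
        if c.isnumeric():
            num += int(c)
    # Subtract that sum from the numeric segment 5 ...
    parts[5] = str(int(parts[5]) - num)
    # ... and append '8' to segment 1 (e.g. 'cdn' becomes 'cdn8').
    parts[1] += '8'
    return '/'.join(parts)


# Hypothetical input: digits in 'x1y2' and 'z3' sum to 6.
rewritten = rewrite_media_path('/cdn/c5/abc/def/1234567/x1y2/z3/video.mp4')
```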


@ -1343,6 +1343,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
funcname = self._search_regex(
(r'\b[cs]\s*&&\s*[adf]\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\b[a-zA-Z0-9]+\s*&&\s*[a-zA-Z0-9]+\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\b(?P<sig>[a-zA-Z0-9$]{2})\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)',
r'(?P<sig>[a-zA-Z0-9$]+)\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)',
# Obsolete patterns
r'(["\'])signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
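
The added signature pattern differs from the generic one after it only in requiring a two-character function name on a word boundary. A small sketch of that difference, run against synthetic JavaScript fragments that are invented for illustration and not taken from a real player file:

```python
import re

# The pattern added by this hunk, and the pre-existing generic pattern.
NEW_SIG_RE = r'\b(?P<sig>[a-zA-Z0-9$]{2})\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)'
GENERIC_SIG_RE = r'(?P<sig>[a-zA-Z0-9$]+)\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)'

# Synthetic fragments (assumptions, not real YouTube player code).
js_short = 'var Zx=function(a){a=a.split("");Qq.r(a,2);return a.join("")};'
js_long = 'abc=function(a){a=a.split("");return a.join("")};'

# A two-character name is picked up by the new pattern.
short_name = re.search(NEW_SIG_RE, js_short).group('sig')
# A three-character name is rejected by the new pattern ...
long_match_new = re.search(NEW_SIG_RE, js_long)
# ... and is still found by the generic pattern that follows it.
long_name = re.search(GENERIC_SIG_RE, js_long).group('sig')
```

Because `_search_regex` tries the patterns in order, the stricter pattern is consulted before the generic one.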


@ -29,7 +29,6 @@ class ZapiksIE(InfoExtractor):
'timestamp': 1359044972,
'upload_date': '20130124',
'view_count': int,
'comment_count': int,
},
},
{


@ -244,14 +244,14 @@ class ZDFChannelIE(ZDFBaseIE):
'id': 'das-aktuelle-sportstudio',
'title': 'das aktuelle sportstudio | ZDF',
},
'playlist_count': 21,
'playlist_mincount': 23,
}, {
'url': 'https://www.zdf.de/dokumentation/planet-e',
'info_dict': {
'id': 'planet-e',
'title': 'planet e.',
},
'playlist_count': 4,
'playlist_mincount': 50,
}, {
'url': 'https://www.zdf.de/filme/taunuskrimi/',
'only_matching': True,


@ -4,10 +4,20 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
dict_get,
ExtractorError,
int_or_none,
js_to_json,
parse_iso8601,
)
class ZypeIE(InfoExtractor):
_VALID_URL = r'https?://player\.zype\.com/embed/(?P<id>[\da-fA-F]+)\.js\?.*?api_key=[^&]+'
_ID_RE = r'[\da-fA-F]+'
_COMMON_RE = r'//player\.zype\.com/embed/%s\.(?:js|json|html)\?.*?(?:access_token|(?:ap[ip]|player)_key)='
_VALID_URL = r'https?:%s[^&]+' % (_COMMON_RE % ('(?P<id>%s)' % _ID_RE))
_TEST = {
'url': 'https://player.zype.com/embed/5b400b834b32992a310622b9.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ&autoplay=false&controls=true&da=false',
'md5': 'eaee31d474c76a955bdaba02a505c595',
@ -16,6 +26,9 @@ class ZypeIE(InfoExtractor):
'ext': 'mp4',
'title': 'Smoky Barbecue Favorites',
'thumbnail': r're:^https?://.*\.jpe?g',
'description': 'md5:5ff01e76316bd8d46508af26dc86023b',
'timestamp': 1504915200,
'upload_date': '20170909',
},
}
@ -24,34 +37,98 @@ class ZypeIE(InfoExtractor):
return [
mobj.group('url')
for mobj in re.finditer(
r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//player\.zype\.com/embed/[\da-fA-F]+\.js\?.*?api_key=.+?)\1',
r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?%s.+?)\1' % (ZypeIE._COMMON_RE % ZypeIE._ID_RE),
webpage)]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
try:
response = self._download_json(re.sub(
r'\.(?:js|html)\?', '.json?', url), video_id)['response']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 401, 403):
raise ExtractorError(self._parse_json(
e.cause.read().decode(), video_id)['message'], expected=True)
raise
title = self._search_regex(
r'video_title\s*[:=]\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
'title', group='value')
body = response['body']
video = response['video']
title = video['title']
m3u8_url = self._search_regex(
r'(["\'])(?P<url>(?:(?!\1).)+\.m3u8(?:(?!\1).)*)\1', webpage,
'm3u8 url', group='url')
formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
if isinstance(body, dict):
formats = []
for output in body.get('outputs', []):
output_url = output.get('url')
if not output_url:
continue
name = output.get('name')
if name == 'm3u8':
formats = self._extract_m3u8_formats(
output_url, video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False)
else:
f = {
'format_id': name,
'tbr': int_or_none(output.get('bitrate')),
'url': output_url,
}
if name in ('m4a', 'mp3'):
f['vcodec'] = 'none'
else:
f.update({
'height': int_or_none(output.get('height')),
'width': int_or_none(output.get('width')),
})
formats.append(f)
text_tracks = body.get('subtitles') or []
else:
m3u8_url = self._search_regex(
r'(["\'])(?P<url>(?:(?!\1).)+\.m3u8(?:(?!\1).)*)\1',
body, 'm3u8 url', group='url')
formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls')
text_tracks = self._search_regex(
r'textTracks\s*:\s*(\[[^]]+\])',
body, 'text tracks', default=None)
if text_tracks:
text_tracks = self._parse_json(
text_tracks, video_id, js_to_json, False)
self._sort_formats(formats)
thumbnail = self._search_regex(
r'poster\s*[:=]\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage, 'thumbnail',
default=False, group='url')
subtitles = {}
if text_tracks:
for text_track in text_tracks:
tt_url = dict_get(text_track, ('file', 'src'))
if not tt_url:
continue
subtitles.setdefault(text_track.get('label') or 'English', []).append({
'url': tt_url,
})
thumbnails = []
for thumbnail in video.get('thumbnails', []):
thumbnail_url = thumbnail.get('url')
if not thumbnail_url:
continue
thumbnails.append({
'url': thumbnail_url,
'width': int_or_none(thumbnail.get('width')),
'height': int_or_none(thumbnail.get('height')),
})
return {
'id': video_id,
'display_id': video.get('friendly_title'),
'title': title,
'thumbnail': thumbnail,
'thumbnails': thumbnails,
'description': dict_get(video, ('description', 'ott_description', 'short_description')),
'timestamp': parse_iso8601(video.get('published_at')),
'duration': int_or_none(video.get('duration')),
'view_count': int_or_none(video.get('request_count')),
'average_rating': int_or_none(video.get('rating')),
'season_number': int_or_none(video.get('season')),
'episode_number': int_or_none(video.get('episode')),
'formats': formats,
'subtitles': subtitles,
}
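
The nested `%` formatting that builds the Zype `_VALID_URL` is compact, so it may help to expand it. The sketch below recomposes the pattern from the two class attributes in the hunk above and applies the same `.js`/`.html` to `.json` rewrite used in `_real_extract`; the key values `EXAMPLE_KEY` and `K` and the second URL are placeholders invented for the example.

```python
import re

# Copied from the updated ZypeIE class attributes.
ID_RE = r'[\da-fA-F]+'
COMMON_RE = r'//player\.zype\.com/embed/%s\.(?:js|json|html)\?.*?(?:access_token|(?:ap[ip]|player)_key)='
# The inner substitution wraps the id pattern in a named group, the
# outer one splices the result into the scheme-prefixed URL pattern.
VALID_URL = r'https?:%s[^&]+' % (COMMON_RE % ('(?P<id>%s)' % ID_RE))

url = 'https://player.zype.com/embed/5b400b834b32992a310622b9.js?api_key=EXAMPLE_KEY&autoplay=false'
video_id = re.match(VALID_URL, url).group('id')

# Same rewrite as in _real_extract: request the JSON flavour of the embed.
json_url = re.sub(r'\.(?:js|html)\?', '.json?', url)

# 'app_key' and the '.html' flavour are accepted by the new pattern too.
html_id = re.match(
    VALID_URL, 'https://player.zype.com/embed/abc123.html?app_key=K').group('id')
```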


@ -134,7 +134,7 @@ def parseOpts(overrideArguments=None):
action='help',
help='Print this help text and exit')
general.add_option(
'-v', '--version',
'--version',
action='version',
help='Print program version and exit')
general.add_option(


@ -9,6 +9,7 @@ import subprocess
import sys
from zipimport import zipimporter
from .compat import compat_realpath
from .utils import encode_compat_str
from .version import __version__
@ -84,7 +85,9 @@ def update_self(to_screen, verbose, opener):
print_notes(to_screen, versions_info['versions'])
# sys.executable is set to the full pathname of the exe-file for py2exe
filename = sys.executable if hasattr(sys, 'frozen') else sys.argv[0]
# but symlinks are not followed, so we need to resolve them manually
# with the help of realpath
filename = compat_realpath(sys.executable if hasattr(sys, 'frozen') else sys.argv[0])
if not os.access(filename, os.W_OK):
to_screen('ERROR: no write permissions on %s' % filename)
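
The effect of resolving the path before the write-permission check can be reproduced with the standard library. This sketch uses `os.path.realpath` directly, on the assumption that `compat_realpath` behaves the same way on the platform at hand; the file names are invented, and creating a symlink may require extra privileges on Windows.

```python
import os
import tempfile

# Resolve the temp dir itself first so the comparison below is not
# affected by a symlinked temp location.
tmpdir = os.path.realpath(tempfile.mkdtemp())
target = os.path.join(tmpdir, 'youtube-dl-real')  # hypothetical real file
link = os.path.join(tmpdir, 'youtube-dl')         # hypothetical symlink

with open(target, 'w'):
    pass
os.symlink(target, link)

# Without realpath the updater would inspect (and replace) the link;
# with it, the path of the actual file is used.
resolved = os.path.realpath(link)
```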


@ -2795,6 +2795,15 @@ class YoutubeDLCookieProcessor(compat_urllib_request.HTTPCookieProcessor):
https_response = http_response
class YoutubeDLRedirectHandler(compat_urllib_request.HTTPRedirectHandler):
if sys.version_info[0] < 3:
def redirect_request(self, req, fp, code, msg, headers, newurl):
# On Python 2, urlh.geturl() may sometimes return the redirect URL
# as a byte string instead of unicode. This workaround forces
# it to always return unicode.
return compat_urllib_request.HTTPRedirectHandler.redirect_request(self, req, fp, code, msg, headers, compat_str(newurl))
def extract_timezone(date_str):
m = re.search(
r'^.{8,}?(?P<tz>Z$| ?(?P<sign>\+|-)(?P<hours>[0-9]{2}):?(?P<minutes>[0-9]{2})$)',


@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2020.01.01'
__version__ = '2020.03.01'