Mario 2018-05-29 17:48:34 +01:00
commit 5d348bfcb0
102 changed files with 2629 additions and 1320 deletions

View File

@@ -6,8 +6,8 @@
 ---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.04.16*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.05.26*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.04.16**
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.05.26**
 
 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -36,7 +36,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2018.04.16
+[debug] youtube-dl version 2018.05.26
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

View File

@@ -236,3 +236,6 @@ Lei Wang
 Petr Novák
 Leonardo Taccari
 Martin Weinelt
+Surya Oktafendri
+TingPing
+Alexandre Macabies

ChangeLog
View File

@@ -1,3 +1,135 @@
version 2018.05.26
Core
* [utils] Improve parse_age_limit
Extractors
* [audiomack] Stringify video id (#15310)
* [izlesene] Fix extraction (#16233, #16271, #16407)
+ [indavideo] Add support for generic embeds (#11989)
* [indavideo] Fix extraction (#11221)
* [indavideo] Sign download URLs (#16174)
+ [peertube] Add support for PeerTube based sites (#16301, #16329)
* [imgur] Fix extraction (#16537)
+ [hidive] Add support for authentication (#16534)
+ [nbc] Add support for stream.nbcsports.com (#13911)
+ [viewlift] Add support for hoichoi.tv (#16536)
* [go90] Extract age limit and detect DRM protection (#10127)
* [viewlift] Fix extraction for snagfilms.com (#15766)
* [globo] Improve extraction (#4189)
    * Add support for authentication
    * Simplify URL signing
    * Extract DASH and MSS formats
* [leeco] Fix extraction (#16464)
* [teamcoco] Add fallback for format extraction (#16484)
* [teamcoco] Improve URL regular expression (#16484)
* [imdb] Improve extraction (#4085, #14557)
version 2018.05.18
Extractors
* [vimeo:likes] Relax URL regular expression and fix single page likes
extraction (#16475)
* [pluralsight] Fix clip id extraction (#16460)
+ [mychannels] Add support for mychannels.com (#15334)
- [moniker] Remove extractor (#15336)
* [pbs] Fix embed data extraction (#16474)
+ [mtv] Add support for paramountnetwork.com and bellator.com (#15418)
* [youtube] Fix hd720 format position
* [dailymotion] Remove fragment part from m3u8 URLs (#8915)
* [3sat] Improve extraction (#15350)
    * Extract all formats
    * Extract more format metadata
    * Improve format sorting
    * Use hls native downloader
    * Detect and bypass geo-restriction
+ [dtube] Add support for d.tube (#15201)
* [options] Fix typo (#16450)
* [youtube] Improve format filesize extraction (#16453)
* [youtube] Make uploader extraction non-fatal (#16444)
* [youtube] Fix extraction for embed restricted live streams (#16433)
* [nbc] Improve info extraction (#16440)
* [twitch:clips] Fix extraction (#16429)
* [redditr] Relax URL regular expression (#16426, #16427)
* [mixcloud] Bypass throttling for HTTP formats (#12579, #16424)
+ [nick] Add support for nickjr.de (#13230)
* [teamcoco] Fix extraction (#16374)
version 2018.05.09
Core
* [YoutubeDL] Ensure ext exists for automatic captions
* Introduce --geo-bypass-ip-block
Extractors
+ [udemy] Extract asset captions
+ [udemy] Extract stream URLs (#16372)
+ [businessinsider] Add support for businessinsider.com (#16387, #16388, #16389)
+ [cloudflarestream] Add support for cloudflarestream.com (#16375)
* [watchbox] Fix extraction (#16356)
* [discovery] Extract Affiliate/Anonymous Auth Token from cookies (#14954)
+ [itv:btcc] Add support for itv.com/btcc (#16139)
* [tunein] Use live title for live streams (#16347)
* [itv] Improve extraction (#16253)
version 2018.05.01
Core
* [downloader/fragment] Restart download if .ytdl file is corrupt (#16312)
+ [extractor/common] Extract interaction statistic
+ [utils] Add merge_dicts
+ [extractor/common] Add _download_json_handle
Extractors
* [kaltura] Improve iframe embeds detection (#16337)
+ [udemy] Extract outputs renditions (#16289, #16291, #16320, #16321, #16334,
#16335)
+ [zattoo] Add support for zattoo.com and mobiltv.quickline.com (#14668, #14676)
* [yandexmusic] Convert release_year to int
* [udemy] Override _download_webpage_handle instead of _download_webpage
* [xiami] Override _download_webpage_handle instead of _download_webpage
* [yandexmusic] Override _download_webpage_handle instead of _download_webpage
* [youtube] Correctly disable polymer on all requests (#16323, #16326)
* [generic] Prefer enclosures over links in RSS feeds (#16189)
+ [redditr] Add support for old.reddit.com URLs (#16274)
* [nrktv] Update API host (#16324)
+ [imdb] Extract all formats (#16249)
+ [vimeo] Extract JSON-LD (#16295)
* [funk:channel] Improve extraction (#16285)
version 2018.04.25
Core
* [utils] Fix match_str for boolean meta fields
+ [Makefile] Add support for pandoc 2 and disable smart extension (#16251)
* [YoutubeDL] Fix typo in media extension compatibility checker (#16215)
Extractors
+ [openload] Recognize IPv6 stream URLs (#16136, #16137, #16205, #16246,
#16250)
+ [twitch] Extract is_live according to status (#16259)
* [pornflip] Relax URL regular expression (#16258)
- [etonline] Remove extractor (#16256)
* [breakcom] Fix extraction (#16254)
+ [youtube] Add ability to authenticate with cookies
* [youtube:feed] Implement lazy playlist extraction (#10184)
+ [svt] Add support for TV channel live streams (#15279, #15809)
* [ccma] Fix video extraction (#15931)
* [rentv] Fix extraction (#15227)
+ [nick] Add support for nickjr.nl (#16230)
* [extremetube] Fix metadata extraction
+ [keezmovies] Add support for generic embeds (#16134, #16154)
* [nexx] Extract new azure URLs (#16223)
* [cbssports] Fix extraction (#16217)
* [kaltura] Improve embeds detection (#16201)
* [instagram:user] Fix extraction (#16119)
* [cbs] Skip DRM asset types (#16104)
version 2018.04.16
Extractors

View File

@@ -93,8 +93,8 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
 ## Network Options:
     --proxy URL                      Use the specified HTTP/HTTPS/SOCKS proxy.
-                                     To enable experimental SOCKS proxy, specify
-                                     a proper scheme. For example
+                                     To enable SOCKS proxy, specify a proper
+                                     scheme. For example
                                      socks5://127.0.0.1:1080/. Pass in an empty
                                      string (--proxy "") for direct connection
     --socket-timeout SECONDS         Time to wait before giving up, in seconds
@@ -106,16 +106,18 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
     --geo-verification-proxy URL     Use this proxy to verify the IP address for
                                      some geo-restricted sites. The default
                                      proxy specified by --proxy (or none, if the
-                                     options is not present) is used for the
+                                     option is not present) is used for the
                                      actual downloading.
     --geo-bypass                     Bypass geographic restriction via faking
-                                     X-Forwarded-For HTTP header (experimental)
+                                     X-Forwarded-For HTTP header
     --no-geo-bypass                  Do not bypass geographic restriction via
                                      faking X-Forwarded-For HTTP header
-                                     (experimental)
     --geo-bypass-country CODE        Force bypass geographic restriction with
                                      explicitly provided two-letter ISO 3166-2
-                                     country code (experimental)
+                                     country code
+    --geo-bypass-ip-block IP_BLOCK   Force bypass geographic restriction with
+                                     explicitly provided IP block in CIDR
+                                     notation
 
 ## Video Selection:
     --playlist-start NUMBER          Playlist video to start at (default is 1)
@@ -206,7 +208,7 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
     --playlist-reverse               Download playlist videos in reverse order
     --playlist-random                Download playlist videos in random order
     --xattr-set-filesize             Set file xattribute ytdl.filesize with
-                                     expected file size (experimental)
+                                     expected file size
     --hls-prefer-native              Use the native HLS downloader instead of
                                      ffmpeg
     --hls-prefer-ffmpeg              Use ffmpeg instead of the native HLS
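As a quick, hypothetical illustration of the new option added above (URL and IP block are made up), it combines with the existing geo-bypass flags like so:

    youtube-dl --geo-bypass-ip-block 203.0.113.0/24 'https://www.example.com/watch?v=abc123'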

View File

@@ -1,27 +1,22 @@
 #!/usr/bin/env python3
 from __future__ import unicode_literals
 
-import hashlib
-import urllib.request
 import json
 
 versions_info = json.load(open('update/versions.json'))
 version = versions_info['latest']
-URL = versions_info['versions'][version]['bin'][0]
-
-data = urllib.request.urlopen(URL).read()
+version_dict = versions_info['versions'][version]
 
 # Read template page
 with open('download.html.in', 'r', encoding='utf-8') as tmplf:
     template = tmplf.read()
 
-sha256sum = hashlib.sha256(data).hexdigest()
 template = template.replace('@PROGRAM_VERSION@', version)
-template = template.replace('@PROGRAM_URL@', URL)
-template = template.replace('@PROGRAM_SHA256SUM@', sha256sum)
-template = template.replace('@EXE_URL@', versions_info['versions'][version]['exe'][0])
-template = template.replace('@EXE_SHA256SUM@', versions_info['versions'][version]['exe'][1])
-template = template.replace('@TAR_URL@', versions_info['versions'][version]['tar'][0])
-template = template.replace('@TAR_SHA256SUM@', versions_info['versions'][version]['tar'][1])
+template = template.replace('@PROGRAM_URL@', version_dict['bin'][0])
+template = template.replace('@PROGRAM_SHA256SUM@', version_dict['bin'][1])
+template = template.replace('@EXE_URL@', version_dict['exe'][0])
+template = template.replace('@EXE_SHA256SUM@', version_dict['exe'][1])
+template = template.replace('@TAR_URL@', version_dict['tar'][0])
+template = template.replace('@TAR_SHA256SUM@', version_dict['tar'][1])
 
 with open('download.html', 'w', encoding='utf-8') as dlf:
     dlf.write(template)
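The refactor assumes each artifact entry in update/versions.json now carries its SHA-256 checksum alongside the URL (the code indexes [0] for the URL and [1] for the checksum), so nothing has to be downloaded and hashed at page-generation time. Under that assumption, with illustrative values, the relevant JSON shape would be roughly:

    {
        "latest": "2018.05.26",
        "versions": {
            "2018.05.26": {
                "bin": ["<url to youtube-dl>", "<sha256>"],
                "exe": ["<url to youtube-dl.exe>", "<sha256>"],
                "tar": ["<url to source tarball>", "<sha256>"]
            }
        }
    }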

View File

@@ -100,6 +100,7 @@
 - **Beatport**
 - **Beeg**
 - **BehindKink**
+- **Bellator**
 - **BellMedia**
 - **Bet**
 - **Bigflix**
@@ -122,6 +123,7 @@
 - **BRMediathek**: Bayerischer Rundfunk Mediathek
 - **bt:article**: Bergens Tidende Articles
 - **bt:vestlendingen**: Bergens Tidende - Vestlendingen
+- **BusinessInsider**
 - **BuzzFeed**
 - **BYUtv**
 - **Camdemy**
@@ -163,6 +165,7 @@
 - **ClipRs**
 - **Clipsyndicate**
 - **CloserToTruth**
+- **CloudflareStream**
 - **cloudtime**: CloudTime
 - **Cloudy**
 - **Clubic**
@@ -232,6 +235,7 @@
 - **DrTuber**
 - **drtv**
 - **drtv:live**
+- **DTube**
 - **Dumpert**
 - **dvtv**: http://video.aktualne.cz/
 - **dw**
@@ -257,7 +261,6 @@
 - **ESPN**
 - **ESPNArticle**
 - **EsriVideo**
-- **ETOnline**
 - **Europa**
 - **EveryonesMixtape**
 - **ExpoTV**
@@ -362,7 +365,6 @@
 - **ImgurAlbum**
 - **Ina**
 - **Inc**
-- **Indavideo**
 - **IndavideoEmbed**
 - **InfoQ**
 - **Instagram**
@@ -374,6 +376,7 @@
 - **Ir90Tv**
 - **ITTF**
 - **ITV**
+- **ITVBTCC**
 - **ivi**: ivi.ru
 - **ivi:compilation**: ivi.ru compilations
 - **ivideon**: Ivideon TV
@@ -446,7 +449,6 @@
 - **mailru**: Видео@Mail.Ru
 - **mailru:music**: Музыка@Mail.Ru
 - **mailru:music:search**: Музыка@Mail.Ru
-- **MakersChannel**
 - **MakerTV**
 - **mangomolo:live**
 - **mangomolo:video**
@@ -484,7 +486,6 @@
 - **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
 - **Mofosex**
 - **Mojvideo**
-- **Moniker**: allmyvideos.net and vidspot.net
 - **Morningstar**: morningstar.com
 - **Motherless**
 - **MotherlessGroup**
@@ -506,6 +507,7 @@
 - **mva:course**: Microsoft Virtual Academy courses
 - **Mwave**
 - **MwaveMeetGreet**
+- **MyChannels**
 - **MySpace**
 - **MySpace:album**
 - **MySpass**
@@ -523,6 +525,7 @@
 - **nbcolympics**
 - **nbcolympics:stream**
 - **NBCSports**
+- **NBCSportsStream**
 - **NBCSportsVPlayer**
 - **ndr**: NDR.de - Norddeutscher Rundfunk
 - **ndr:embed**
@@ -616,11 +619,13 @@
 - **PacktPubCourse**
 - **PandaTV**: 熊猫TV
 - **pandora.tv**: 판도라TV
+- **ParamountNetwork**
 - **parliamentlive.tv**: UK parliament videos
 - **Patreon**
 - **pbs**: Public Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! (WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC)
 - **pcmag**
 - **PearVideo**
+- **PeerTube**
 - **People**
 - **PerformGroup**
 - **periscope**: Periscope
@@ -668,6 +673,8 @@
 - **qqmusic:playlist**: QQ音乐 - 歌单
 - **qqmusic:singer**: QQ音乐 - 歌手
 - **qqmusic:toplist**: QQ音乐 - 排行榜
+- **Quickline**
+- **QuicklineLive**
 - **R7**
 - **R7Article**
 - **radio.de**
@@ -785,7 +792,6 @@
 - **Spiegel**
 - **Spiegel:Article**: Articles on spiegel.de
 - **Spiegeltv**
-- **Spike**
 - **Sport5**
 - **SportBoxEmbed**
 - **SportDeutschland**
@@ -1093,6 +1099,8 @@
 - **youtube:watchlater**: Youtube watch later list, ":ytwatchlater" for short (requires authentication)
 - **Zapiks**
 - **Zaq1**
+- **Zattoo**
+- **ZattooLive**
 - **ZDF**
 - **ZDFChannel**
 - **zingmp3**: mp3.zing.vn

View File

@@ -42,6 +42,7 @@ from youtube_dl.utils import (
     is_html,
     js_to_json,
     limit_length,
+    merge_dicts,
     mimetype2ext,
     month_by_name,
     multipart_encode,
@@ -518,6 +519,8 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(parse_age_limit('PG-13'), 13)
         self.assertEqual(parse_age_limit('TV-14'), 14)
         self.assertEqual(parse_age_limit('TV-MA'), 17)
+        self.assertEqual(parse_age_limit('TV14'), 14)
+        self.assertEqual(parse_age_limit('TV_G'), 0)
 
     def test_parse_duration(self):
         self.assertEqual(parse_duration(None), None)
@@ -669,6 +672,17 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(dict_get(d, ('b', 'c', key, )), None)
         self.assertEqual(dict_get(d, ('b', 'c', key, ), skip_false_values=False), false_value)
 
+    def test_merge_dicts(self):
+        self.assertEqual(merge_dicts({'a': 1}, {'b': 2}), {'a': 1, 'b': 2})
+        self.assertEqual(merge_dicts({'a': 1}, {'a': 2}), {'a': 1})
+        self.assertEqual(merge_dicts({'a': 1}, {'a': None}), {'a': 1})
+        self.assertEqual(merge_dicts({'a': 1}, {'a': ''}), {'a': 1})
+        self.assertEqual(merge_dicts({'a': 1}, {}), {'a': 1})
+        self.assertEqual(merge_dicts({'a': None}, {'a': 1}), {'a': 1})
+        self.assertEqual(merge_dicts({'a': ''}, {'a': 1}), {'a': ''})
+        self.assertEqual(merge_dicts({'a': ''}, {'a': 'abc'}), {'a': 'abc'})
+        self.assertEqual(merge_dicts({'a': None}, {'a': ''}, {'a': 'abc'}), {'a': 'abc'})
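The assertions above pin down the merge semantics: for each key the first non-None value wins, with the single exception that an empty string may later be upgraded to a non-empty string. A minimal sketch consistent with these tests (not necessarily the exact utils.py implementation) would be:

    def merge_dicts(*dicts):
        # Earlier dicts take precedence; None never overwrites anything,
        # and only a non-empty string may replace an empty string.
        merged = {}
        for a_dict in dicts:
            for k, v in a_dict.items():
                if v is None:
                    continue
                if (k not in merged or
                        (isinstance(v, str) and v and
                         isinstance(merged[k], str) and not merged[k])):
                    merged[k] = v
        return merged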
     def test_encode_compat_str(self):
         self.assertEqual(encode_compat_str(b'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82', 'utf-8'), 'тест')
         self.assertEqual(encode_compat_str('тест', 'utf-8'), 'тест')
@@ -1072,6 +1086,18 @@ ffmpeg version 2.4.4 Copyright (c) 2000-2014 the FFmpeg ...'''), '2.4.4')
         self.assertFalse(match_str(
             'like_count > 100 & dislike_count <? 50 & description',
             {'like_count': 190, 'dislike_count': 10}))
+        self.assertTrue(match_str('is_live', {'is_live': True}))
+        self.assertFalse(match_str('is_live', {'is_live': False}))
+        self.assertFalse(match_str('is_live', {'is_live': None}))
+        self.assertFalse(match_str('is_live', {}))
+        self.assertFalse(match_str('!is_live', {'is_live': True}))
+        self.assertTrue(match_str('!is_live', {'is_live': False}))
+        self.assertTrue(match_str('!is_live', {'is_live': None}))
+        self.assertTrue(match_str('!is_live', {}))
+        self.assertTrue(match_str('title', {'title': 'abc'}))
+        self.assertTrue(match_str('title', {'title': ''}))
+        self.assertFalse(match_str('!title', {'title': 'abc'}))
+        self.assertFalse(match_str('!title', {'title': ''}))
 
     def test_parse_dfxp_time_expr(self):
         self.assertEqual(parse_dfxp_time_expr(None), None)

View File

@@ -211,7 +211,7 @@ class YoutubeDL(object):
                        At the moment, this is only supported by YouTube.
     proxy:             URL of the proxy server to use
     geo_verification_proxy:  URL of the proxy to use for IP address verification
-                       on geo-restricted sites. (Experimental)
+                       on geo-restricted sites.
     socket_timeout:    Time to wait for unresponsive hosts, in seconds
     bidi_workaround:   Work around buggy terminals without bidirectional text
                        support, using fridibi
@@ -259,7 +259,7 @@ class YoutubeDL(object):
                        - "warn": only emit a warning
                        - "detect_or_warn": check whether we can do anything
                          about it, warn otherwise (default)
-    source_address:    (Experimental) Client-side IP address to bind to.
+    source_address:    Client-side IP address to bind to.
     call_home:         Boolean, true iff we are allowed to contact the
                        youtube-dl servers for debugging.
     sleep_interval:    Number of seconds to sleep before each download when
@@ -281,11 +281,14 @@ class YoutubeDL(object):
                        match_filter_func in utils.py is one example for this.
     no_color:          Do not emit color codes in output.
     geo_bypass:        Bypass geographic restriction via faking X-Forwarded-For
-                       HTTP header (experimental)
+                       HTTP header
     geo_bypass_country:
                        Two-letter ISO 3166-2 country code that will be used for
                        explicit geographic restriction bypassing via faking
-                       X-Forwarded-For HTTP header (experimental)
+                       X-Forwarded-For HTTP header
+    geo_bypass_ip_block:
+                       IP range in CIDR notation that will be used similarly to
+                       geo_bypass_country
 
     The following options determine which downloader is picked:
     external_downloader: Executable of the external downloader to call.
@@ -1479,23 +1482,28 @@ class YoutubeDL(object):
             if info_dict.get('%s_number' % field) is not None and not info_dict.get(field):
                 info_dict[field] = '%s %d' % (field.capitalize(), info_dict['%s_number' % field])
 
-        subtitles = info_dict.get('subtitles')
-        if subtitles:
-            for _, subtitle in subtitles.items():
-                for subtitle_format in subtitle:
-                    if subtitle_format.get('url'):
-                        subtitle_format['url'] = sanitize_url(subtitle_format['url'])
-                    if subtitle_format.get('ext') is None:
-                        subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower()
+        for cc_kind in ('subtitles', 'automatic_captions'):
+            cc = info_dict.get(cc_kind)
+            if cc:
+                for _, subtitle in cc.items():
+                    for subtitle_format in subtitle:
+                        if subtitle_format.get('url'):
+                            subtitle_format['url'] = sanitize_url(subtitle_format['url'])
+                        if subtitle_format.get('ext') is None:
+                            subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower()
+
+        automatic_captions = info_dict.get('automatic_captions')
+        subtitles = info_dict.get('subtitles')
 
         if self.params.get('listsubtitles', False):
             if 'automatic_captions' in info_dict:
-                self.list_subtitles(info_dict['id'], info_dict.get('automatic_captions'), 'automatic captions')
+                self.list_subtitles(
+                    info_dict['id'], automatic_captions, 'automatic captions')
             self.list_subtitles(info_dict['id'], subtitles, 'subtitles')
             return
+
         info_dict['requested_subtitles'] = self.process_subtitles(
-            info_dict['id'], subtitles,
-            info_dict.get('automatic_captions'))
+            info_dict['id'], subtitles, automatic_captions)
 
         # We now pick which formats have to be downloaded
         if info_dict.get('formats') is None:
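As the loop above implies, both 'subtitles' and 'automatic_captions' are expected to be dicts mapping a language code to a list of format dicts, each with at least a 'url' (the 'ext' is derived from the URL when missing). A hypothetical minimal info_dict fragment:

    info_dict = {
        'id': 'abc123',
        'subtitles': {
            'en': [{'url': 'https://example.com/subs.en.vtt'}],  # ext inferred as 'vtt'
        },
        'automatic_captions': {},
    }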

View File

@@ -430,6 +430,7 @@ def _real_main(argv=None):
         'config_location': opts.config_location,
         'geo_bypass': opts.geo_bypass,
         'geo_bypass_country': opts.geo_bypass_country,
+        'geo_bypass_ip_block': opts.geo_bypass_ip_block,
         # just for deprecation check
         'autonumber': opts.autonumber if opts.autonumber is True else None,
         'usetitle': opts.usetitle if opts.usetitle is True else None,

View File

@@ -45,7 +45,6 @@ class FileDownloader(object):
     min_filesize:       Skip files smaller than this size
     max_filesize:       Skip files larger than this size
     xattr_set_filesize: Set ytdl.filesize user xattribute with expected size.
-                        (experimental)
     external_downloader_args:  A list of additional command-line arguments for the
                         external downloader.
     hls_use_mpegts:     Use the mpegts container for HLS videos.

View File

@@ -74,8 +74,13 @@ class FragmentFD(FileDownloader):
         return not ctx['live'] and not ctx['tmpfilename'] == '-'
 
     def _read_ytdl_file(self, ctx):
+        assert 'ytdl_corrupt' not in ctx
         stream, _ = sanitize_open(self.ytdl_filename(ctx['filename']), 'r')
-        ctx['fragment_index'] = json.loads(stream.read())['downloader']['current_fragment']['index']
-        stream.close()
+        try:
+            ctx['fragment_index'] = json.loads(stream.read())['downloader']['current_fragment']['index']
+        except Exception:
+            ctx['ytdl_corrupt'] = True
+        finally:
+            stream.close()
 
     def _write_ytdl_file(self, ctx):
@@ -158,11 +163,17 @@ class FragmentFD(FileDownloader):
         if self.__do_ytdl_file(ctx):
             if os.path.isfile(encodeFilename(self.ytdl_filename(ctx['filename']))):
                 self._read_ytdl_file(ctx)
-                if ctx['fragment_index'] > 0 and resume_len == 0:
+                is_corrupt = ctx.get('ytdl_corrupt') is True
+                is_inconsistent = ctx['fragment_index'] > 0 and resume_len == 0
+                if is_corrupt or is_inconsistent:
+                    message = (
+                        '.ytdl file is corrupt' if is_corrupt else
+                        'Inconsistent state of incomplete fragment download')
                     self.report_warning(
-                        'Inconsistent state of incomplete fragment download. '
-                        'Restarting from the beginning...')
+                        '%s. Restarting from the beginning...' % message)
                     ctx['fragment_index'] = resume_len = 0
+                    if 'ytdl_corrupt' in ctx:
+                        del ctx['ytdl_corrupt']
                     self._write_ytdl_file(ctx)
             else:
                 self._write_ytdl_file(ctx)
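For reference, the .ytdl state file parsed in _read_ytdl_file is a small JSON document; judging from the key path in the code, a minimal well-formed example (index value illustrative) looks like:

    {"downloader": {"current_fragment": {"index": 7}}}

Anything that fails to parse into that shape now marks the context as 'ytdl_corrupt' and triggers a restart from the beginning instead of crashing.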

View File

@@ -24,10 +24,12 @@ class RtmpFD(FileDownloader):
     def real_download(self, filename, info_dict):
         def run_rtmpdump(args):
             start = time.time()
-            resume_percent = None
-            resume_downloaded_data_len = None
             proc = subprocess.Popen(args, stderr=subprocess.PIPE)
             cursor_in_new_line = True
+
+            def dl():
+                resume_percent = None
+                resume_downloaded_data_len = None
                 proc_stderr_closed = False
                 while not proc_stderr_closed:
                     # read line from stderr
@@ -88,7 +90,12 @@ class RtmpFD(FileDownloader):
                         self.to_screen('')
                         cursor_in_new_line = True
                     self.to_screen('[rtmpdump] ' + line)
-            proc.wait()
+
+            try:
+                dl()
+            finally:
+                proc.wait()
+
             if not cursor_in_new_line:
                 self.to_screen('')
             return proc.returncode
@@ -163,7 +170,15 @@ class RtmpFD(FileDownloader):
         RD_INCOMPLETE = 2
         RD_NO_CONNECT = 3
 
-        retval = run_rtmpdump(args)
+        started = time.time()
+
+        try:
+            retval = run_rtmpdump(args)
+        except KeyboardInterrupt:
+            if not info_dict.get('is_live'):
+                raise
+            retval = RD_SUCCESS
+            self.to_screen('\n[rtmpdump] Interrupted by user')
 
         if retval == RD_NO_CONNECT:
             self.report_error('[rtmpdump] Could not connect to RTMP server.')
@@ -171,7 +186,7 @@ class RtmpFD(FileDownloader):
         while retval in (RD_INCOMPLETE, RD_FAILED) and not test and not live:
             prevsize = os.path.getsize(encodeFilename(tmpfilename))
-            self.to_screen('[rtmpdump] %s bytes' % prevsize)
+            self.to_screen('[rtmpdump] Downloaded %s bytes' % prevsize)
             time.sleep(5.0)  # This seems to be needed
             args = basic_args + ['--resume']
             if retval == RD_FAILED:
@@ -188,13 +203,14 @@ class RtmpFD(FileDownloader):
                 break
         if retval == RD_SUCCESS or (test and retval == RD_INCOMPLETE):
             fsize = os.path.getsize(encodeFilename(tmpfilename))
-            self.to_screen('[rtmpdump] %s bytes' % fsize)
+            self.to_screen('[rtmpdump] Downloaded %s bytes' % fsize)
             self.try_rename(tmpfilename, filename)
             self._hook_progress({
                 'downloaded_bytes': fsize,
                 'total_bytes': fsize,
                 'filename': filename,
                 'status': 'finished',
+                'elapsed': time.time() - started,
             })
             return True
         else:

View File

@@ -52,7 +52,7 @@ class AnimeOnDemandIE(InfoExtractor):
     }]
 
     def _login(self):
-        (username, password) = self._get_login_info()
+        username, password = self._get_login_info()
         if username is None:
             return

View File

@@ -277,7 +277,9 @@ class AnvatoIE(InfoExtractor):
     def _real_extract(self, url):
         url, smuggled_data = unsmuggle_url(url, {})
-        self._initialize_geo_bypass(smuggled_data.get('geo_countries'))
+        self._initialize_geo_bypass({
+            'countries': smuggled_data.get('geo_countries'),
+        })
 
         mobj = re.match(self._VALID_URL, url)
         access_key, video_id = mobj.group('access_key_or_mcp', 'id')

View File

@@ -0,0 +1,94 @@
# coding: utf-8
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
    determine_ext,
    js_to_json,
)


class APAIE(InfoExtractor):
    _VALID_URL = r'https?://[^/]+\.apa\.at/embed/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
    _TESTS = [{
        'url': 'http://uvp.apa.at/embed/293f6d17-692a-44e3-9fd5-7b178f3a1029',
        'md5': '2b12292faeb0a7d930c778c7a5b4759b',
        'info_dict': {
            'id': 'jjv85FdZ',
            'ext': 'mp4',
            'title': '"Blau ist mysteriös": Die Blue Man Group im Interview',
            'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
            'thumbnail': r're:^https?://.*\.jpg$',
            'duration': 254,
            'timestamp': 1519211149,
            'upload_date': '20180221',
        },
    }, {
        'url': 'https://uvp-apapublisher.sf.apa.at/embed/2f94e9e6-d945-4db2-9548-f9a41ebf7b78',
        'only_matching': True,
    }, {
        'url': 'http://uvp-rma.sf.apa.at/embed/70404cca-2f47-4855-bbb8-20b1fae58f76',
        'only_matching': True,
    }, {
        'url': 'http://uvp-kleinezeitung.sf.apa.at/embed/f1c44979-dba2-4ebf-b021-e4cf2cac3c81',
        'only_matching': True,
    }]

    @staticmethod
    def _extract_urls(webpage):
        return [
            mobj.group('url')
            for mobj in re.finditer(
                r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//[^/]+\.apa\.at/embed/[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}.*?)\1',
                webpage)]

    def _real_extract(self, url):
        video_id = self._match_id(url)

        webpage = self._download_webpage(url, video_id)

        jwplatform_id = self._search_regex(
            r'media[iI]d\s*:\s*["\'](?P<id>[a-zA-Z0-9]{8})', webpage,
            'jwplatform id', default=None)

        if jwplatform_id:
            return self.url_result(
                'jwplatform:' + jwplatform_id, ie='JWPlatform',
                video_id=video_id)

        sources = self._parse_json(
            self._search_regex(
                r'sources\s*=\s*(\[.+?\])\s*;', webpage, 'sources'),
            video_id, transform_source=js_to_json)

        formats = []
        for source in sources:
            if not isinstance(source, dict):
                continue
            source_url = source.get('file')
            if not source_url or not isinstance(source_url, compat_str):
                continue
            ext = determine_ext(source_url)
            if ext == 'm3u8':
                formats.extend(self._extract_m3u8_formats(
                    source_url, video_id, 'mp4', entry_protocol='m3u8_native',
                    m3u8_id='hls', fatal=False))
            else:
                formats.append({
                    'url': source_url,
                })
        self._sort_formats(formats)

        thumbnail = self._search_regex(
            r'image\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
            'thumbnail', fatal=False, group='url')

        return {
            'id': video_id,
            'title': video_id,
            'thumbnail': thumbnail,
            'formats': formats,
        }
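The static _extract_urls helper follows the pattern used across youtube-dl extractors: presumably the generic extractor calls it to discover APA iframes inside arbitrary pages. A hypothetical stand-alone illustration:

    from youtube_dl.extractor.apa import APAIE

    webpage = '<iframe src="http://uvp.apa.at/embed/293f6d17-692a-44e3-9fd5-7b178f3a1029"></iframe>'
    for embed_url in APAIE._extract_urls(webpage):
        print(embed_url)  # -> http://uvp.apa.at/embed/293f6d17-...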

View File

@@ -74,7 +74,7 @@ class AtresPlayerIE(InfoExtractor):
         self._login()
 
     def _login(self):
-        (username, password) = self._get_login_info()
+        username, password = self._get_login_info()
         if username is None:
             return

View File

@@ -65,7 +65,7 @@ class AudiomackIE(InfoExtractor):
             return {'_type': 'url', 'url': api_response['url'], 'ie_key': 'Soundcloud'}
 
         return {
-            'id': api_response.get('id', album_url_tag),
+            'id': compat_str(api_response.get('id', album_url_tag)),
             'uploader': api_response.get('artist'),
             'title': api_response.get('title'),
             'url': api_response['url'],

View File

@@ -44,7 +44,7 @@ class BambuserIE(InfoExtractor):
     }
 
     def _login(self):
-        (username, password) = self._get_login_info()
+        username, password = self._get_login_info()
         if username is None:
             return

View File

@@ -12,7 +12,7 @@ class BellMediaIE(InfoExtractor):
             (?:
                 ctv|
                 tsn|
-                bnn|
+                bnn(?:bloomberg)?|
                 thecomedynetwork|
                 discovery|
                 discoveryvelocity|
@@ -27,17 +27,16 @@ class BellMediaIE(InfoExtractor):
                 much\.com
             )/.*?(?:\bvid(?:eoid)?=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
     _TESTS = [{
-        'url': 'http://www.ctv.ca/video/player?vid=706966',
-        'md5': 'ff2ebbeae0aa2dcc32a830c3fd69b7b0',
+        'url': 'https://www.bnnbloomberg.ca/video/david-cockfield-s-top-picks~1403070',
+        'md5': '36d3ef559cfe8af8efe15922cd3ce950',
         'info_dict': {
-            'id': '706966',
-            'ext': 'mp4',
-            'title': 'Larry Day and Richard Jutras on the TIFF red carpet of \'Stonewall\'',
-            'description': 'etalk catches up with Larry Day and Richard Jutras on the TIFF red carpet of "Stonewall”.',
-            'upload_date': '20150919',
-            'timestamp': 1442624700,
+            'id': '1403070',
+            'ext': 'flv',
+            'title': 'David Cockfield\'s Top Picks',
+            'description': 'md5:810f7f8c6a83ad5b48677c3f8e5bb2c3',
+            'upload_date': '20180525',
+            'timestamp': 1527288600,
         },
-        'expected_warnings': ['HTTP Error 404'],
     }, {
         'url': 'http://www.thecomedynetwork.ca/video/player?vid=923582',
         'only_matching': True,
@@ -70,6 +69,7 @@ class BellMediaIE(InfoExtractor):
         'investigationdiscovery': 'invdisc',
         'animalplanet': 'aniplan',
         'etalk': 'ctv',
+        'bnnbloomberg': 'bnn',
     }
 
     def _real_extract(self, url):

View File

@@ -669,7 +669,10 @@ class BrightcoveNewIE(AdobePassIE):
     def _real_extract(self, url):
         url, smuggled_data = unsmuggle_url(url, {})
-        self._initialize_geo_bypass(smuggled_data.get('geo_countries'))
+        self._initialize_geo_bypass({
+            'countries': smuggled_data.get('geo_countries'),
+            'ip_blocks': smuggled_data.get('geo_ip_blocks'),
+        })
 
         account_id, player_id, embed, video_id = re.match(self._VALID_URL, url).groups()

View File

@@ -0,0 +1,42 @@
# coding: utf-8
from __future__ import unicode_literals

from .common import InfoExtractor
from .jwplatform import JWPlatformIE


class BusinessInsiderIE(InfoExtractor):
    _VALID_URL = r'https?://(?:[^/]+\.)?businessinsider\.(?:com|nl)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
    _TESTS = [{
        'url': 'http://uk.businessinsider.com/how-much-radiation-youre-exposed-to-in-everyday-life-2016-6',
        'md5': 'ca237a53a8eb20b6dc5bd60564d4ab3e',
        'info_dict': {
            'id': 'hZRllCfw',
            'ext': 'mp4',
            'title': "Here's how much radiation you're exposed to in everyday life",
            'description': 'md5:9a0d6e2c279948aadaa5e84d6d9b99bd',
            'upload_date': '20170709',
            'timestamp': 1499606400,
        },
        'params': {
            'skip_download': True,
        },
    }, {
        'url': 'https://www.businessinsider.nl/5-scientifically-proven-things-make-you-less-attractive-2017-7/',
        'only_matching': True,
    }, {
        'url': 'http://www.businessinsider.com/excel-index-match-vlookup-video-how-to-2015-2?IR=T',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)
        jwplatform_id = self._search_regex(
            (r'data-media-id=["\']([a-zA-Z0-9]{8})',
             r'id=["\']jwplayer_([a-zA-Z0-9]{8})',
             r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})'),
            webpage, 'jwplatform id')
        return self.url_result(
            'jwplatform:%s' % jwplatform_id, ie=JWPlatformIE.ie_key(),
            video_id=video_id)

View File

@@ -0,0 +1,96 @@
# coding: utf-8
from __future__ import unicode_literals

from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
    ExtractorError,
    int_or_none,
)


class CamModelsIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?cammodels\.com/cam/(?P<id>[^/?#&]+)'
    _TESTS = [{
        'url': 'https://www.cammodels.com/cam/AutumnKnight/',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        user_id = self._match_id(url)

        webpage = self._download_webpage(
            url, user_id, headers=self.geo_verification_headers())

        manifest_root = self._html_search_regex(
            r'manifestUrlRoot=([^&\']+)', webpage, 'manifest', default=None)

        if not manifest_root:
            ERRORS = (
                ("I'm offline, but let's stay connected", 'This user is currently offline'),
                ('in a private show', 'This user is in a private show'),
                ('is currently performing LIVE', 'This model is currently performing live'),
            )
            for pattern, message in ERRORS:
                if pattern in webpage:
                    error = message
                    expected = True
                    break
            else:
                error = 'Unable to find manifest URL root'
                expected = False
            raise ExtractorError(error, expected=expected)

        manifest = self._download_json(
            '%s%s.json' % (manifest_root, user_id), user_id)

        formats = []
        for format_id, format_dict in manifest['formats'].items():
            if not isinstance(format_dict, dict):
                continue
            encodings = format_dict.get('encodings')
            if not isinstance(encodings, list):
                continue
            vcodec = format_dict.get('videoCodec')
            acodec = format_dict.get('audioCodec')
            for media in encodings:
                if not isinstance(media, dict):
                    continue
                media_url = media.get('location')
                if not media_url or not isinstance(media_url, compat_str):
                    continue

                format_id_list = [format_id]
                height = int_or_none(media.get('videoHeight'))
                if height is not None:
                    format_id_list.append('%dp' % height)
                f = {
                    'url': media_url,
                    'format_id': '-'.join(format_id_list),
                    'width': int_or_none(media.get('videoWidth')),
                    'height': height,
                    'vbr': int_or_none(media.get('videoKbps')),
                    'abr': int_or_none(media.get('audioKbps')),
                    'fps': int_or_none(media.get('fps')),
                    'vcodec': vcodec,
                    'acodec': acodec,
                }
                if 'rtmp' in format_id:
                    f['ext'] = 'flv'
                elif 'hls' in format_id:
                    f.update({
                        'ext': 'mp4',
                        # hls skips fragments, preferring rtmp
                        'preference': -1,
                    })
                else:
                    continue
                formats.append(f)
        self._sort_formats(formats)

        return {
            'id': user_id,
            'title': self._live_title(user_id),
            'is_live': True,
            'formats': formats,
        }

View File

@@ -20,6 +20,7 @@ from ..utils import (
     parse_duration,
     parse_iso8601,
     parse_age_limit,
+    strip_or_none,
     int_or_none,
     ExtractorError,
 )
@@ -129,6 +130,9 @@ class CBCIE(InfoExtractor):
     def _real_extract(self, url):
         display_id = self._match_id(url)
         webpage = self._download_webpage(url, display_id)
+        title = self._og_search_title(webpage, default=None) or self._html_search_meta(
+            'twitter:title', webpage, 'title', default=None) or self._html_search_regex(
+            r'<title>([^<]+)</title>', webpage, 'title', fatal=False)
         entries = [
             self._extract_player_init(player_init, display_id)
             for player_init in re.findall(r'CBC\.APP\.Caffeine\.initInstance\(({.+?})\);', webpage)]
@@ -136,8 +140,7 @@ class CBCIE(InfoExtractor):
             self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
             for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)])
         return self.playlist_result(
-            entries, display_id,
-            self._og_search_title(webpage, fatal=False),
+            entries, display_id, strip_or_none(title),
             self._og_search_description(webpage))

View File

@@ -0,0 +1,60 @@
# coding: utf-8
from __future__ import unicode_literals

import re

from .common import InfoExtractor


class CloudflareStreamIE(InfoExtractor):
    _VALID_URL = r'''(?x)
                    https?://
                        (?:
                            (?:watch\.)?cloudflarestream\.com/|
                            embed\.cloudflarestream\.com/embed/[^/]+\.js\?.*?\bvideo=
                        )
                        (?P<id>[\da-f]+)
                    '''
    _TESTS = [{
        'url': 'https://embed.cloudflarestream.com/embed/we4g.fla9.latest.js?video=31c9291ab41fac05471db4e73aa11717',
        'info_dict': {
            'id': '31c9291ab41fac05471db4e73aa11717',
            'ext': 'mp4',
            'title': '31c9291ab41fac05471db4e73aa11717',
        },
        'params': {
            'skip_download': True,
        },
    }, {
        'url': 'https://watch.cloudflarestream.com/9df17203414fd1db3e3ed74abbe936c1',
        'only_matching': True,
    }, {
        'url': 'https://cloudflarestream.com/31c9291ab41fac05471db4e73aa11717/manifest/video.mpd',
        'only_matching': True,
    }]

    @staticmethod
    def _extract_urls(webpage):
        return [
            mobj.group('url')
            for mobj in re.finditer(
                r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//embed\.cloudflarestream\.com/embed/[^/]+\.js\?.*?\bvideo=[\da-f]+?.*?)\1',
                webpage)]

    def _real_extract(self, url):
        video_id = self._match_id(url)

        formats = self._extract_m3u8_formats(
            'https://cloudflarestream.com/%s/manifest/video.m3u8' % video_id,
            video_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls',
            fatal=False)
        formats.extend(self._extract_mpd_formats(
            'https://cloudflarestream.com/%s/manifest/video.mpd' % video_id,
            video_id, mpd_id='dash', fatal=False))
        self._sort_formats(formats)

        return {
            'id': video_id,
            'title': video_id,
            'formats': formats,
        }

View File

@ -339,15 +339,17 @@ class InfoExtractor(object):
_GEO_BYPASS attribute may be set to False in order to disable _GEO_BYPASS attribute may be set to False in order to disable
geo restriction bypass mechanisms for a particular extractor. geo restriction bypass mechanisms for a particular extractor.
Though it won't disable explicit geo restriction bypass based on Though it won't disable explicit geo restriction bypass based on
country code provided with geo_bypass_country. (experimental) country code provided with geo_bypass_country.
_GEO_COUNTRIES attribute may contain a list of presumably geo unrestricted _GEO_COUNTRIES attribute may contain a list of presumably geo unrestricted
countries for this extractor. One of these countries will be used by countries for this extractor. One of these countries will be used by
geo restriction bypass mechanism right away in order to bypass geo restriction bypass mechanism right away in order to bypass
geo restriction, of course, if the mechanism is not disabled. (experimental) geo restriction, of course, if the mechanism is not disabled.
NB: both these geo attributes are experimental and may change in future _GEO_IP_BLOCKS attribute may contain a list of presumably geo unrestricted
or be completely removed. IP blocks in CIDR notation for this extractor. One of these IP blocks
will be used by geo restriction bypass mechanism similarly
to _GEO_COUNTRIES.
Finally, the _WORKING attribute should be set to False for broken IEs Finally, the _WORKING attribute should be set to False for broken IEs
in order to warn the users and skip the tests. in order to warn the users and skip the tests.
@ -358,6 +360,7 @@ class InfoExtractor(object):
_x_forwarded_for_ip = None _x_forwarded_for_ip = None
_GEO_BYPASS = True _GEO_BYPASS = True
_GEO_COUNTRIES = None _GEO_COUNTRIES = None
_GEO_IP_BLOCKS = None
_WORKING = True _WORKING = True
def __init__(self, downloader=None): def __init__(self, downloader=None):
@ -392,12 +395,15 @@ class InfoExtractor(object):
def initialize(self): def initialize(self):
"""Initializes an instance (authentication, etc).""" """Initializes an instance (authentication, etc)."""
self._initialize_geo_bypass(self._GEO_COUNTRIES) self._initialize_geo_bypass({
'countries': self._GEO_COUNTRIES,
'ip_blocks': self._GEO_IP_BLOCKS,
})
if not self._ready: if not self._ready:
self._real_initialize() self._real_initialize()
self._ready = True self._ready = True
def _initialize_geo_bypass(self, countries): def _initialize_geo_bypass(self, geo_bypass_context):
""" """
Initialize geo restriction bypass mechanism. Initialize geo restriction bypass mechanism.
@ -408,28 +414,82 @@ class InfoExtractor(object):
HTTP requests. HTTP requests.
This method will be used for initial geo bypass mechanism initialization This method will be used for initial geo bypass mechanism initialization
during the instance initialization with _GEO_COUNTRIES. during the instance initialization with _GEO_COUNTRIES and
_GEO_IP_BLOCKS.
You may also manually call it from extractor's code if geo countries You may also manually call it from extractor's code if geo bypass
information is not available beforehand (e.g. obtained during information is not available beforehand (e.g. obtained during
extraction) or due to some another reason. extraction) or due to some other reason. In this case you should pass
this information in geo bypass context passed as first argument. It may
contain following fields:
countries: List of geo unrestricted countries (similar
to _GEO_COUNTRIES)
ip_blocks: List of geo unrestricted IP blocks in CIDR notation
(similar to _GEO_IP_BLOCKS)
""" """
        if not self._x_forwarded_for_ip:
            country_code = self._downloader.params.get('geo_bypass_country', None)
            # If there is no explicit country for geo bypass specified and
            # the extractor is known to be geo restricted let's fake IP
            # as X-Forwarded-For right away.
            if (not country_code and
                    self._GEO_BYPASS and
                    self._downloader.params.get('geo_bypass', True) and
                    countries):
                country_code = random.choice(countries)
            if country_code:
                self._x_forwarded_for_ip = GeoUtils.random_ipv4(country_code)
                if self._downloader.params.get('verbose', False):
                    self._downloader.to_screen(
                        '[debug] Using fake IP %s (%s) as X-Forwarded-For.'
                        % (self._x_forwarded_for_ip, country_code.upper()))
        if not self._x_forwarded_for_ip:

            # Geo bypass mechanism is explicitly disabled by user
            if not self._downloader.params.get('geo_bypass', True):
                return

            if not geo_bypass_context:
                geo_bypass_context = {}

            # Backward compatibility: previously _initialize_geo_bypass
            # expected a list of countries, some 3rd party code may still use
            # it this way
            if isinstance(geo_bypass_context, (list, tuple)):
                geo_bypass_context = {
                    'countries': geo_bypass_context,
                }

            # The whole point of geo bypass mechanism is to fake IP
            # as X-Forwarded-For HTTP header based on some IP block or
            # country code.

            # Path 1: bypassing based on IP block in CIDR notation

            # Explicit IP block specified by user, use it right away
            # regardless of whether extractor is geo bypassable or not
            ip_block = self._downloader.params.get('geo_bypass_ip_block', None)

            # Otherwise use random IP block from geo bypass context but only
            # if extractor is known as geo bypassable
            if not ip_block:
                ip_blocks = geo_bypass_context.get('ip_blocks')
                if self._GEO_BYPASS and ip_blocks:
                    ip_block = random.choice(ip_blocks)

            if ip_block:
                self._x_forwarded_for_ip = GeoUtils.random_ipv4(ip_block)
                if self._downloader.params.get('verbose', False):
                    self._downloader.to_screen(
                        '[debug] Using fake IP %s as X-Forwarded-For.'
                        % self._x_forwarded_for_ip)
                return

            # Path 2: bypassing based on country code

            # Explicit country code specified by user, use it right away
            # regardless of whether extractor is geo bypassable or not
            country = self._downloader.params.get('geo_bypass_country', None)

            # Otherwise use random country code from geo bypass context but
            # only if extractor is known as geo bypassable
            if not country:
                countries = geo_bypass_context.get('countries')
                if self._GEO_BYPASS and countries:
                    country = random.choice(countries)

            if country:
                self._x_forwarded_for_ip = GeoUtils.random_ipv4(country)
                if self._downloader.params.get('verbose', False):
                    self._downloader.to_screen(
                        '[debug] Using fake IP %s (%s) as X-Forwarded-For.'
                        % (self._x_forwarded_for_ip, country.upper()))
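Note: with the new context dict, an extractor can hint both country codes and CIDR blocks at once. A minimal sketch of a hypothetical extractor seeding the mechanism during extraction (values are illustrative, not from this commit):

# Hypothetical extractor snippet, illustrative values only: geo bypass
# hints discovered during extraction are passed as a context dict.
def _real_extract(self, url):
    video_id = self._match_id(url)
    # Suppose the page revealed the service is limited to German users:
    self._initialize_geo_bypass({
        'countries': ['DE'],          # like _GEO_COUNTRIES
        'ip_blocks': ['53.0.0.0/8'],  # like _GEO_IP_BLOCKS, CIDR notation
    })
    # ... continue extraction with the faked X-Forwarded-For in place ...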
    def extract(self, url):
        """Extracts URL information and returns it in list of dicts."""

@@ -682,18 +742,30 @@ class InfoExtractor(object):
            else:
                self.report_warning(errmsg + str(ve))

    def _download_json(self, url_or_request, video_id,
                       note='Downloading JSON metadata',
                       errnote='Unable to download JSON metadata',
                       transform_source=None,
                       fatal=True, encoding=None, data=None, headers={}, query={}):
        json_string = self._download_webpage(
            url_or_request, video_id, note, errnote, fatal=fatal,
            encoding=encoding, data=data, headers=headers, query=query)
        if (not fatal) and json_string is False:
            return None
        return self._parse_json(
            json_string, video_id, transform_source=transform_source, fatal=fatal)
    def _download_json_handle(
            self, url_or_request, video_id, note='Downloading JSON metadata',
            errnote='Unable to download JSON metadata', transform_source=None,
            fatal=True, encoding=None, data=None, headers={}, query={}):
        """Return a tuple (JSON object, URL handle)"""
        res = self._download_webpage_handle(
            url_or_request, video_id, note, errnote, fatal=fatal,
            encoding=encoding, data=data, headers=headers, query=query)
        if res is False:
            return res
        json_string, urlh = res
        return self._parse_json(
            json_string, video_id, transform_source=transform_source,
            fatal=fatal), urlh

    def _download_json(
            self, url_or_request, video_id, note='Downloading JSON metadata',
            errnote='Unable to download JSON metadata', transform_source=None,
            fatal=True, encoding=None, data=None, headers={}, query={}):
        res = self._download_json_handle(
            url_or_request, video_id, note=note, errnote=errnote,
            transform_source=transform_source, fatal=fatal, encoding=encoding,
            data=data, headers=headers, query=query)
        return res if res is False else res[0]

    def _parse_json(self, json_string, video_id, transform_source=None, fatal=True):
        if transform_source:
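Note: the _handle variant exists for callers that need the response object as well as the parsed JSON; a hedged usage sketch inside a hypothetical extractor (the URL is made up):

# Hypothetical usage, not from this commit: the URL handle exposes the
# final URL after redirects, which some sites encode tokens into.
api_data, urlh = self._download_json_handle(
    'https://example.com/api/video.json', video_id)
final_url = urlh.geturl()  # e.g. recover a session token from the redirect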
@@ -1008,6 +1080,40 @@ class InfoExtractor(object):
        if isinstance(json_ld, dict):
            json_ld = [json_ld]

        INTERACTION_TYPE_MAP = {
            'CommentAction': 'comment',
            'AgreeAction': 'like',
            'DisagreeAction': 'dislike',
            'LikeAction': 'like',
            'DislikeAction': 'dislike',
            'ListenAction': 'view',
            'WatchAction': 'view',
            'ViewAction': 'view',
        }

        def extract_interaction_statistic(e):
            interaction_statistic = e.get('interactionStatistic')
            if not isinstance(interaction_statistic, list):
                return
            for is_e in interaction_statistic:
                if not isinstance(is_e, dict):
                    continue
                if is_e.get('@type') != 'InteractionCounter':
                    continue
                interaction_type = is_e.get('interactionType')
                if not isinstance(interaction_type, compat_str):
                    continue
                interaction_count = int_or_none(is_e.get('userInteractionCount'))
                if interaction_count is None:
                    continue
                count_kind = INTERACTION_TYPE_MAP.get(interaction_type.split('/')[-1])
                if not count_kind:
                    continue
                count_key = '%s_count' % count_kind
                if info.get(count_key) is not None:
                    continue
                info[count_key] = interaction_count

        def extract_video_object(e):
            assert e['@type'] == 'VideoObject'
            info.update({
@@ -1023,6 +1129,7 @@ class InfoExtractor(object):
                'height': int_or_none(e.get('height')),
                'view_count': int_or_none(e.get('interactionCount')),
            })
            extract_interaction_statistic(e)

        for e in json_ld:
            if isinstance(e.get('@context'), compat_str) and re.match(r'^https?://schema.org/?$', e.get('@context')):
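Note: for reference, this is the kind of schema.org payload the new helper consumes; the last path component of interactionType is looked up in INTERACTION_TYPE_MAP, so WatchAction feeds view_count (illustrative values):

# Illustrative JSON-LD fragment, not from the commit:
json_ld = {
    '@type': 'VideoObject',
    'interactionStatistic': [{
        '@type': 'InteractionCounter',
        'interactionType': 'http://schema.org/WatchAction',
        'userInteractionCount': 5647018,
    }],
}
# extract_interaction_statistic(json_ld) would set info['view_count'] = 5647018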

View File

@@ -49,7 +49,7 @@ class CrunchyrollBaseIE(InfoExtractor):
        })

    def _login(self):
        (username, password) = self._get_login_info()
        username, password = self._get_login_info()
        if username is None:
            return

View File

@@ -11,10 +11,10 @@ class CTVNewsIE(InfoExtractor):
    _VALID_URL = r'https?://(?:.+?\.)?ctvnews\.ca/(?:video\?(?:clip|playlist|bin)Id=|.*?)(?P<id>[0-9.]+)'
    _TESTS = [{
        'url': 'http://www.ctvnews.ca/video?clipId=901995',
        'md5': '10deb320dc0ccb8d01d34d12fc2ea672',
        'md5': '9b8624ba66351a23e0b6e1391971f9af',
        'info_dict': {
            'id': '901995',
            'ext': 'mp4',
            'ext': 'flv',
            'title': 'Extended: \'That person cannot be me\' Johnson says',
            'description': 'md5:958dd3b4f5bbbf0ed4d045c790d89285',
            'timestamp': 1467286284,

View File

@@ -35,7 +35,7 @@ class CuriosityStreamBaseIE(InfoExtractor):
        return result['data']

    def _real_initialize(self):
        (email, password) = self._get_login_info()
        email, password = self._get_login_info()
        if email is None:
            return
        result = self._download_json(

View File

@@ -180,9 +180,12 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
                    continue
                ext = mimetype2ext(type_) or determine_ext(media_url)
                if ext == 'm3u8':
                    formats.extend(self._extract_m3u8_formats(
                        media_url, video_id, 'mp4', preference=-1,
                        m3u8_id='hls', fatal=False))
                    m3u8_formats = self._extract_m3u8_formats(
                        media_url, video_id, 'mp4', preference=-1,
                        m3u8_id='hls', fatal=False)
                    for f in m3u8_formats:
                        f['url'] = f['url'].split('#')[0]
                        formats.append(f)
                elif ext == 'f4m':
                    formats.extend(self._extract_f4m_formats(
                        media_url, video_id, preference=-1, f4m_id='hds', fatal=False))
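Note: the split('#')[0] strips a URL fragment that would otherwise be handed to the HLS downloader verbatim; illustratively:

# Illustrative only, made-up URL:
url = 'https://example.com/master.m3u8#cell=core'
clean = url.split('#')[0]  # 'https://example.com/master.m3u8'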

View File

@@ -5,7 +5,10 @@ import re
import string

from .discoverygo import DiscoveryGoBaseIE
from ..compat import compat_str
from ..compat import (
    compat_str,
    compat_urllib_parse_unquote,
)
from ..utils import (
    ExtractorError,
    try_get,
@@ -55,6 +58,18 @@ class DiscoveryIE(DiscoveryGoBaseIE):
        video = next(cb for cb in content_blocks if cb.get('type') == 'video')['content']['items'][0]
        video_id = video['id']

        access_token = None
        cookies = self._get_cookies(url)

        # prefer Affiliate Auth Token over Anonymous Auth Token
        auth_storage_cookie = cookies.get('eosAf') or cookies.get('eosAn')
        if auth_storage_cookie and auth_storage_cookie.value:
            auth_storage = self._parse_json(compat_urllib_parse_unquote(
                compat_urllib_parse_unquote(auth_storage_cookie.value)),
                video_id, fatal=False) or {}
            access_token = auth_storage.get('a') or auth_storage.get('access_token')

        if not access_token:
            access_token = self._download_json(
                'https://www.%s.com/anonymous' % site, display_id, query={
                    'authRel': 'authorization',
@@ -72,7 +87,7 @@ class DiscoveryIE(DiscoveryGoBaseIE):
                    'Authorization': 'Bearer ' + access_token,
                })
        except ExtractorError as e:
            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
            if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403):
                e_description = self._parse_json(
                    e.cause.read().decode(), display_id)['description']
                if 'resource not available for country' in e_description:
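Note: the eosAf/eosAn cookie value is percent-encoded twice (once by the site's JS, once by the cookie layer), hence the nested unquote. A standalone sketch of the same decode with a fabricated cookie value:

# Standalone sketch; real cookie payloads differ, but the double
# percent-decoding is the point.
import json
from urllib.parse import unquote

raw = '%257B%2522a%2522%253A%2522TOKEN123%2522%257D'
decoded = unquote(unquote(raw))  # '{"a":"TOKEN123"}'
access_token = json.loads(decoded).get('a')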

View File

@@ -102,7 +102,9 @@ class DPlayIE(InfoExtractor):
        display_id = mobj.group('id')
        domain = mobj.group('domain')

        self._initialize_geo_bypass([mobj.group('country').upper()])
        self._initialize_geo_bypass({
            'countries': [mobj.group('country').upper()],
        })

        webpage = self._download_webpage(url, display_id)

View File

@@ -42,7 +42,7 @@ class DramaFeverBaseIE(InfoExtractor):
        self._login()

    def _login(self):
        (username, password) = self._get_login_info()
        username, password = self._get_login_info()
        if username is None:
            return

View File

@@ -8,7 +8,6 @@ from ..utils import (
    unified_strdate,
    xpath_text,
    determine_ext,
    qualities,
    float_or_none,
    ExtractorError,
)
@@ -16,7 +15,8 @@ from ..utils import (

class DreiSatIE(InfoExtractor):
    IE_NAME = '3sat'
    _VALID_URL = r'(?:https?://)?(?:www\.)?3sat\.de/mediathek/(?:index\.php|mediathek\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
    _GEO_COUNTRIES = ['DE']
    _VALID_URL = r'https?://(?:www\.)?3sat\.de/mediathek/(?:(?:index|mediathek)\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)'
    _TESTS = [
        {
            'url': 'http://www.3sat.de/mediathek/index.php?mode=play&obj=45918',
@@ -43,7 +43,8 @@ class DreiSatIE(InfoExtractor):
    def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
        param_groups = {}
        for param_group in smil.findall(self._xpath_ns('./head/paramGroup', namespace)):
            group_id = param_group.attrib.get(self._xpath_ns('id', 'http://www.w3.org/XML/1998/namespace'))
            group_id = param_group.get(self._xpath_ns(
                'id', 'http://www.w3.org/XML/1998/namespace'))
            params = {}
            for param in param_group:
                params[param.get('name')] = param.get('value')
@@ -54,7 +55,7 @@ class DreiSatIE(InfoExtractor):
                src = video.get('src')
                if not src:
                    continue
                bitrate = float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
                bitrate = int_or_none(self._search_regex(r'_(\d+)k', src, 'bitrate', None)) or float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
                group_id = video.get('paramGroup')
                param_group = param_groups[group_id]
                for proto in param_group['protocols'].split(','):
@@ -75,66 +76,36 @@ class DreiSatIE(InfoExtractor):
            note='Downloading video info',
            errnote='Failed to download video info')

        status_code = doc.find('./status/statuscode')
        if status_code is not None and status_code.text != 'ok':
            code = status_code.text
            if code == 'notVisibleAnymore':
        status_code = xpath_text(doc, './status/statuscode')
        if status_code and status_code != 'ok':
            if status_code == 'notVisibleAnymore':
                message = 'Video %s is not available' % video_id
            else:
                message = '%s returned error: %s' % (self.IE_NAME, code)
                message = '%s returned error: %s' % (self.IE_NAME, status_code)
            raise ExtractorError(message, expected=True)

        title = doc.find('.//information/title').text
        description = xpath_text(doc, './/information/detail', 'description')
        duration = int_or_none(xpath_text(doc, './/details/lengthSec', 'duration'))
        uploader = xpath_text(doc, './/details/originChannelTitle', 'uploader')
        uploader_id = xpath_text(doc, './/details/originChannelId', 'uploader id')
        upload_date = unified_strdate(xpath_text(doc, './/details/airtime', 'upload date'))

        def xml_to_thumbnails(fnode):
            thumbnails = []
            for node in fnode:
                thumbnail_url = node.text
                if not thumbnail_url:
                    continue
                thumbnail = {
                    'url': thumbnail_url,
                }
                if 'key' in node.attrib:
                    m = re.match('^([0-9]+)x([0-9]+)$', node.attrib['key'])
                    if m:
                        thumbnail['width'] = int(m.group(1))
                        thumbnail['height'] = int(m.group(2))
                thumbnails.append(thumbnail)
            return thumbnails

        thumbnails = xml_to_thumbnails(doc.findall('.//teaserimages/teaserimage'))

        format_nodes = doc.findall('.//formitaeten/formitaet')
        quality = qualities(['veryhigh', 'high', 'med', 'low'])

        def get_quality(elem):
            return quality(xpath_text(elem, 'quality'))
        format_nodes.sort(key=get_quality)
        format_ids = []
        title = xpath_text(doc, './/information/title', 'title', True)

        urls = []
        formats = []
        for fnode in format_nodes:
            video_url = fnode.find('url').text
            is_available = 'http://www.metafilegenerator' not in video_url
            if not is_available:
        for fnode in doc.findall('.//formitaeten/formitaet'):
            video_url = xpath_text(fnode, 'url')
            if not video_url or video_url in urls:
                continue
            urls.append(video_url)

            is_available = 'http://www.metafilegenerator' not in video_url
            geoloced = 'static_geoloced_online' in video_url
            if not is_available or geoloced:
                continue

            format_id = fnode.attrib['basetype']
            quality = xpath_text(fnode, './quality', 'quality')
            format_m = re.match(r'''(?x)
                (?P<vcodec>[^_]+)_(?P<acodec>[^_]+)_(?P<container>[^_]+)_
                (?P<proto>[^_]+)_(?P<index>[^_]+)_(?P<indexproto>[^_]+)
            ''', format_id)

            ext = determine_ext(video_url, None) or format_m.group('container')
            if ext not in ('smil', 'f4m', 'm3u8'):
                format_id = format_id + '-' + quality
            if format_id in format_ids:
                continue

            if ext == 'meta':
                continue
@@ -147,24 +118,23 @@ class DreiSatIE(InfoExtractor):
                if video_url.startswith('https://'):
                    continue
                formats.extend(self._extract_m3u8_formats(
                    video_url, video_id, 'mp4', m3u8_id=format_id, fatal=False))
                    video_url, video_id, 'mp4', 'm3u8_native',
                    m3u8_id=format_id, fatal=False))
            elif ext == 'f4m':
                formats.extend(self._extract_f4m_formats(
                    video_url, video_id, f4m_id=format_id, fatal=False))
            else:
                proto = format_m.group('proto').lower()
                quality = xpath_text(fnode, './quality')
                if quality:
                    format_id += '-' + quality

                abr = int_or_none(xpath_text(fnode, './audioBitrate', 'abr'), 1000)
                vbr = int_or_none(xpath_text(fnode, './videoBitrate', 'vbr'), 1000)
                width = int_or_none(xpath_text(fnode, './width', 'width'))
                height = int_or_none(xpath_text(fnode, './height', 'height'))

                filesize = int_or_none(xpath_text(fnode, './filesize', 'filesize'))
                format_note = ''
                if not format_note:
                    format_note = None
                abr = int_or_none(xpath_text(fnode, './audioBitrate'), 1000)
                vbr = int_or_none(xpath_text(fnode, './videoBitrate'), 1000)
                tbr = int_or_none(self._search_regex(
                    r'_(\d+)k', video_url, 'bitrate', None))
                if tbr and vbr and not abr:
                    abr = tbr - vbr

                formats.append({
                    'format_id': format_id,
@@ -174,31 +144,50 @@ class DreiSatIE(InfoExtractor):
                    'vcodec': format_m.group('vcodec'),
                    'abr': abr,
                    'vbr': vbr,
                    'width': width,
                    'height': height,
                    'filesize': filesize,
                    'format_note': format_note,
                    'protocol': proto,
                    '_available': is_available,
                    'tbr': tbr,
                    'width': int_or_none(xpath_text(fnode, './width')),
                    'height': int_or_none(xpath_text(fnode, './height')),
                    'filesize': int_or_none(xpath_text(fnode, './filesize')),
                    'protocol': format_m.group('proto').lower(),
                })
                format_ids.append(format_id)

        geolocation = xpath_text(doc, './/details/geolocation')
        if not formats and geolocation and geolocation != 'none':
            self.raise_geo_restricted(countries=self._GEO_COUNTRIES)

        self._sort_formats(formats)

        thumbnails = []
        for node in doc.findall('.//teaserimages/teaserimage'):
            thumbnail_url = node.text
            if not thumbnail_url:
                continue
            thumbnail = {
                'url': thumbnail_url,
            }
            thumbnail_key = node.get('key')
            if thumbnail_key:
                m = re.match('^([0-9]+)x([0-9]+)$', thumbnail_key)
                if m:
                    thumbnail['width'] = int(m.group(1))
                    thumbnail['height'] = int(m.group(2))
            thumbnails.append(thumbnail)

        upload_date = unified_strdate(xpath_text(doc, './/details/airtime'))

        return {
            'id': video_id,
            'title': title,
            'description': description,
            'duration': duration,
            'description': xpath_text(doc, './/information/detail'),
            'duration': int_or_none(xpath_text(doc, './/details/lengthSec')),
            'thumbnails': thumbnails,
            'uploader': uploader,
            'uploader_id': uploader_id,
            'uploader': xpath_text(doc, './/details/originChannelTitle'),
            'uploader_id': xpath_text(doc, './/details/originChannelId'),
            'upload_date': upload_date,
            'formats': formats,
        }

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        video_id = mobj.group('id')
        details_url = 'http://www.3sat.de/mediathek/xmlservice/web/beitragsDetails?ak=web&id=%s' % video_id
        video_id = self._match_id(url)
        details_url = 'http://www.3sat.de/mediathek/xmlservice/web/beitragsDetails?id=%s' % video_id
        return self.extract_from_xml_url(video_id, details_url)
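Note: the new bitrate logic recovers the total bitrate from URL infixes like _1496k and, when only the video bitrate is known, approximates the audio bitrate as tbr - vbr. A standalone sketch under an assumed URL shape:

# Standalone sketch with an assumed 3sat-style URL; '_1496k' is treated
# as the total bitrate in kbit/s.
import re

video_url = 'http://example.com/video_1496k_p13v13.mp4'  # assumed shape
m = re.search(r'_(\d+)k', video_url)
tbr = int(m.group(1)) if m else None       # 1496
vbr = 1356                                 # from <videoBitrate>, say
abr = tbr - vbr if tbr and vbr else None   # 140, if no <audioBitrate>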

View File

@@ -0,0 +1,83 @@
# coding: utf-8
from __future__ import unicode_literals

import json
import re
from socket import timeout

from .common import InfoExtractor
from ..utils import (
    int_or_none,
    parse_iso8601,
)


class DTubeIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?d\.tube/(?:#!/)?v/(?P<uploader_id>[0-9a-z.-]+)/(?P<id>[0-9a-z]{8})'
    _TEST = {
        'url': 'https://d.tube/#!/v/benswann/zqd630em',
        'md5': 'a03eaa186618ffa7a3145945543a251e',
        'info_dict': {
            'id': 'zqd630em',
            'ext': 'mp4',
            'title': 'Reality Check: FDA\'s Disinformation Campaign on Kratom',
            'description': 'md5:700d164e066b87f9eac057949e4227c2',
            'uploader_id': 'benswann',
            'upload_date': '20180222',
            'timestamp': 1519328958,
        },
        'params': {
            'format': '480p',
        },
    }

    def _real_extract(self, url):
        uploader_id, video_id = re.match(self._VALID_URL, url).groups()
        result = self._download_json('https://api.steemit.com/', video_id, data=json.dumps({
            'jsonrpc': '2.0',
            'method': 'get_content',
            'params': [uploader_id, video_id],
        }).encode())['result']

        metadata = json.loads(result['json_metadata'])
        video = metadata['video']
        content = video['content']
        info = video.get('info', {})
        title = info.get('title') or result['title']

        def canonical_url(h):
            if not h:
                return None
            return 'https://ipfs.io/ipfs/' + h

        formats = []
        for q in ('240', '480', '720', '1080', ''):
            video_url = canonical_url(content.get('video%shash' % q))
            if not video_url:
                continue
            format_id = (q + 'p') if q else 'Source'
            try:
                self.to_screen('%s: Checking %s video format URL' % (video_id, format_id))
                self._downloader._opener.open(video_url, timeout=5).close()
            except timeout as e:
                self.to_screen(
                    '%s: %s URL is invalid, skipping' % (video_id, format_id))
                continue
            formats.append({
                'format_id': format_id,
                'url': video_url,
                'height': int_or_none(q),
                'ext': 'mp4',
            })

        return {
            'id': video_id,
            'title': title,
            'description': content.get('description'),
            'thumbnail': canonical_url(info.get('snaphash')),
            'tags': content.get('tags') or metadata.get('tags'),
            'duration': info.get('duration'),
            'formats': formats,
            'timestamp': parse_iso8601(result.get('created')),
            'uploader_id': uploader_id,
        }
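Note: d.tube resolves video metadata through the Steem JSON-RPC API; a condensed standalone sketch of the same call with the standard library (uploader/permlink are the test values above):

# Condensed sketch of the same JSON-RPC call:
import json
from urllib.request import Request, urlopen

req = Request('https://api.steemit.com/', json.dumps({
    'jsonrpc': '2.0',
    'method': 'get_content',
    'params': ['benswann', 'zqd630em'],
}).encode(), {'Content-Type': 'application/json'})
result = json.loads(urlopen(req).read())['result']
metadata = json.loads(result['json_metadata'])  # holds the 'video' dict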

View File

@@ -91,17 +91,6 @@ class DVTVIE(InfoExtractor):
    }, {
        'url': 'http://video.aktualne.cz/v-cechach-poprve-zazni-zelenkova-zrestaurovana-mse/r~45b4b00483ec11e4883b002590604f2e/',
        'only_matching': True,
    }, {
        'url': 'https://video.aktualne.cz/dvtv/babis-a-zeman-nesou-vinu-za-to-ze-nemame-jasno-v-tom-kdo-bud/r~026afb54fad711e79704ac1f6b220ee8/',
        'md5': '87defe16681b1429c91f7a74809823c6',
        'info_dict': {
            'id': 'f5ae72f6fad611e794dbac1f6b220ee8',
            'ext': 'mp4',
            'title': 'Babiš a Zeman nesou vinu za to, že nemáme jasno v tom, kdo bude vládnout, říká Pekarová Adamová',
        },
        'params': {
            'skip_download': True,
        },
    }]

    def _parse_video_metadata(self, js, video_id, live_js=None):

View File

@@ -44,6 +44,7 @@ from .anysex import AnySexIE
from .aol import AolIE
from .allocine import AllocineIE
from .aliexpress import AliExpressLiveIE
from .apa import APAIE
from .aparat import AparatIE
from .appleconnect import AppleConnectIE
from .appletrailers import (
@@ -137,6 +138,7 @@ from .brightcove import (
    BrightcoveLegacyIE,
    BrightcoveNewIE,
)
from .businessinsider import BusinessInsiderIE
from .buzzfeed import BuzzFeedIE
from .byutv import BYUtvIE
from .c56 import C56IE
@@ -144,6 +146,7 @@ from .camdemy import (
    CamdemyIE,
    CamdemyFolderIE
)
from .cammodels import CamModelsIE
from .camwithher import CamWithHerIE
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
@@ -195,6 +198,7 @@ from .clippit import ClippitIE
from .cliprs import ClipRsIE
from .clipsyndicate import ClipsyndicateIE
from .closertotruth import CloserToTruthIE
from .cloudflarestream import CloudflareStreamIE
from .cloudy import CloudyIE
from .clubic import ClubicIE
from .clyp import ClypIE
@@ -281,6 +285,7 @@ from .drtv import (
    DRTVIE,
    DRTVLiveIE,
)
from .dtube import DTubeIE
from .dvtv import DVTVIE
from .dumpert import DumpertIE
from .defense import DefenseGouvFrIE
@@ -466,10 +471,7 @@ from .imgur import (
)
from .ina import InaIE
from .inc import IncIE
from .indavideo import (
    IndavideoIE,
    IndavideoEmbedIE,
)
from .indavideo import IndavideoEmbedIE
from .infoq import InfoQIE
from .instagram import InstagramIE, InstagramUserIE
from .internazionale import InternazionaleIE
@@ -477,7 +479,10 @@ from .internetvideoarchive import InternetVideoArchiveIE
from .iprima import IPrimaIE
from .iqiyi import IqiyiIE
from .ir90tv import Ir90TvIE
from .itv import ITVIE
from .itv import (
    ITVIE,
    ITVBTCCIE,
)
from .ivi import (
    IviIE,
    IviCompilationIE
@@ -576,7 +581,6 @@ from .mailru import (
    MailRuMusicIE,
    MailRuMusicSearchIE,
)
from .makerschannel import MakersChannelIE
from .makertv import MakerTVIE
from .mangomolo import (
    MangomoloVideoIE,
@@ -619,7 +623,6 @@ from .mnet import MnetIE
from .moevideo import MoeVideoIE
from .mofosex import MofosexIE
from .mojvideo import MojvideoIE
from .moniker import MonikerIE
from .morningstar import MorningstarIE
from .motherless import (
    MotherlessIE,
@@ -640,6 +643,7 @@ from .mtv import (
from .muenchentv import MuenchenTVIE
from .musicplayon import MusicPlayOnIE
from .mwave import MwaveIE, MwaveMeetGreetIE
from .mychannels import MyChannelsIE
from .myspace import MySpaceIE, MySpaceAlbumIE
from .myspass import MySpassIE
from .myvi import (
@@ -661,6 +665,7 @@ from .nbc import (
    NBCOlympicsIE,
    NBCOlympicsStreamIE,
    NBCSportsIE,
    NBCSportsStreamIE,
    NBCSportsVPlayerIE,
)
from .ndr import (
@@ -714,10 +719,7 @@ from .nick import (
    NickRuIE,
)
from .niconico import NiconicoIE, NiconicoPlaylistIE
from .ninecninemedia import (
    NineCNineMediaStackIE,
    NineCNineMediaIE,
)
from .ninecninemedia import NineCNineMediaIE
from .ninegag import NineGagIE
from .ninenow import NineNowIE
from .nintendo import NintendoIE
@@ -805,6 +807,7 @@ from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
from .pbs import PBSIE
from .pearvideo import PearVideoIE
from .peertube import PeerTubeIE
from .people import PeopleIE
from .performgroup import PerformGroupIE
from .periscope import (
@@ -1010,7 +1013,10 @@ from .spankbang import SpankBangIE
from .spankwire import SpankwireIE
from .spiegel import SpiegelIE, SpiegelArticleIE
from .spiegeltv import SpiegeltvIE
from .spike import SpikeIE
from .spike import (
    BellatorIE,
    ParamountNetworkIE,
)
from .stitcher import StitcherIE
from .sport5 import Sport5IE
from .sportbox import SportBoxEmbedIE
@@ -1418,5 +1424,11 @@ from .youtube import (
)
from .zapiks import ZapiksIE
from .zaq1 import Zaq1IE
from .zattoo import (
    QuicklineIE,
    QuicklineLiveIE,
    ZattooIE,
    ZattooLiveIE,
)
from .zdf import ZDFIE, ZDFChannelIE
from .zingmp3 import ZingMp3IE

View File

@@ -226,7 +226,7 @@ class FacebookIE(InfoExtractor):
        return urls

    def _login(self):
        (useremail, password) = self._get_login_info()
        useremail, password = self._get_login_info()
        if useremail is None:
            return

View File

@@ -46,7 +46,7 @@ class FC2IE(InfoExtractor):
    }]

    def _login(self):
        (username, password) = self._get_login_info()
        username, password = self._get_login_info()
        if username is None or password is None:
            return False

View File

@@ -51,7 +51,7 @@ class FunimationIE(InfoExtractor):
    }]

    def _login(self):
        (username, password) = self._get_login_info()
        username, password = self._get_login_info()
        if username is None:
            return
        try:

View File

@@ -5,7 +5,10 @@ import re

from .common import InfoExtractor
from .nexx import NexxIE
from ..utils import int_or_none
from ..utils import (
    int_or_none,
    try_get,
)


class FunkBaseIE(InfoExtractor):
@@ -77,6 +80,20 @@ class FunkChannelIE(FunkBaseIE):
        'params': {
            'skip_download': True,
        },
    }, {
        # only available via byIdList API
        'url': 'https://www.funk.net/channel/informr/martin-sonneborn-erklaert-die-eu',
        'info_dict': {
            'id': '205067',
            'ext': 'mp4',
            'title': 'Martin Sonneborn erklärt die EU',
            'description': 'md5:050f74626e4ed87edf4626d2024210c0',
            'timestamp': 1494424042,
            'upload_date': '20170510',
        },
        'params': {
            'skip_download': True,
        },
    }, {
        'url': 'https://www.funk.net/channel/59d5149841dca100012511e3/mein-erster-job-lovemilla-folge-1/lovemilla/',
        'only_matching': True,
@@ -87,16 +104,28 @@ class FunkChannelIE(FunkBaseIE):
        channel_id = mobj.group('id')
        alias = mobj.group('alias')

        results = self._download_json(
            'https://www.funk.net/api/v3.0/content/videos/filter', channel_id,
            headers={
                'authorization': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoiY3VyYXRpb24tdG9vbCIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxzZWFyY2gtYXBpIn0.q4Y2xZG8PFHai24-4Pjx2gym9RmJejtmK6lMXP5wAgc',
                'Referer': url,
            }, query={
                'channelId': channel_id,
                'size': 100,
            })['result']
        video = next(r for r in results if r.get('alias') == alias)
        headers = {
            'authorization': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoiY3VyYXRpb24tdG9vbCIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxzZWFyY2gtYXBpIn0.q4Y2xZG8PFHai24-4Pjx2gym9RmJejtmK6lMXP5wAgc',
            'Referer': url,
        }

        video = None

        by_id_list = self._download_json(
            'https://www.funk.net/api/v3.0/content/videos/byIdList', channel_id,
            headers=headers, query={
                'ids': alias,
            }, fatal=False)
        if by_id_list:
            video = try_get(by_id_list, lambda x: x['result'][0], dict)

        if not video:
            results = self._download_json(
                'https://www.funk.net/api/v3.0/content/videos/filter', channel_id,
                headers=headers, query={
                    'channelId': channel_id,
                    'size': 100,
                })['result']
            video = next(r for r in results if r.get('alias') == alias)

        return self._make_url_result(video)

View File

@@ -91,7 +91,7 @@ class GDCVaultIE(InfoExtractor):
    ]

    def _login(self, webpage_url, display_id):
        (username, password) = self._get_login_info()
        username, password = self._get_login_info()
        if username is None or password is None:
            self.report_warning('It looks like ' + webpage_url + ' requires a login. Try specifying a username and password and try again.')
            return None

View File

@@ -23,6 +23,7 @@ from ..utils import (
    is_html,
    js_to_json,
    KNOWN_EXTENSIONS,
    merge_dicts,
    mimetype2ext,
    orderedSet,
    sanitized_Request,
@@ -106,6 +107,10 @@ from .springboardplatform import SpringboardPlatformIE
from .yapfiles import YapFilesIE
from .vice import ViceIE
from .xfileshare import XFileShareIE
from .cloudflarestream import CloudflareStreamIE
from .peertube import PeerTubeIE
from .indavideo import IndavideoEmbedIE
from .apa import APAIE


class GenericIE(InfoExtractor):
@@ -190,6 +195,16 @@ class GenericIE(InfoExtractor):
                'title': 'pdv_maddow_netcast_m4v-02-27-2015-201624',
            }
        },
        # RSS feed with enclosures and unsupported link URLs
        {
            'url': 'http://www.hellointernet.fm/podcast?format=rss',
            'info_dict': {
                'id': 'http://www.hellointernet.fm/podcast?format=rss',
                'description': 'CGP Grey and Brady Haran talk about YouTube, life, work, whatever.',
                'title': 'Hello Internet',
            },
            'playlist_mincount': 100,
        },
        # SMIL from http://videolectures.net/promogram_igor_mekjavic_eng
        {
            'url': 'http://videolectures.net/promogram_igor_mekjavic_eng/video/1/smil.xml',
@@ -1271,6 +1286,23 @@ class GenericIE(InfoExtractor):
            },
            'add_ie': ['Kaltura'],
        },
        {
            # Kaltura iframe embed, more sophisticated
            'url': 'http://www.cns.nyu.edu/~eero/math-tools/Videos/lecture-05sep2017.html',
            'info_dict': {
                'id': '1_9gzouybz',
                'ext': 'mp4',
                'title': 'lecture-05sep2017',
                'description': 'md5:40f347d91fd4ba047e511c5321064b49',
                'upload_date': '20170913',
                'uploader_id': 'eps2',
                'timestamp': 1505340777,
            },
            'params': {
                'skip_download': True,
            },
            'add_ie': ['Kaltura'],
        },
        {
            # meta twitter:player
            'url': 'http://thechive.com/2017/12/08/all-i-want-for-christmas-is-more-twerk/',
@@ -1443,21 +1475,6 @@ class GenericIE(InfoExtractor):
            },
            'expected_warnings': ['Failed to parse JSON Expecting value'],
        },
        # Ooyala embed
        {
            'url': 'http://www.businessinsider.com/excel-index-match-vlookup-video-how-to-2015-2?IR=T',
            'info_dict': {
                'id': '50YnY4czr4ms1vJ7yz3xzq0excz_pUMs',
                'ext': 'mp4',
                'description': 'Index/Match versus VLOOKUP.',
                'title': 'This is what separates the Excel masters from the wannabes',
                'duration': 191.933,
            },
            'params': {
                # m3u8 downloads
                'skip_download': True,
            }
        },
        # Brightcove URL in single quotes
        {
            'url': 'http://www.sportsnet.ca/baseball/mlb/sn-presents-russell-martin-world-citizen/',
@@ -1985,6 +2002,63 @@ class GenericIE(InfoExtractor):
                'skip_download': True,
            },
        },
        {
            # CloudflareStream embed
            'url': 'https://www.cloudflare.com/products/cloudflare-stream/',
            'info_dict': {
                'id': '31c9291ab41fac05471db4e73aa11717',
                'ext': 'mp4',
                'title': '31c9291ab41fac05471db4e73aa11717',
            },
            'add_ie': [CloudflareStreamIE.ie_key()],
            'params': {
                'skip_download': True,
            },
        },
        {
            # PeerTube embed
            'url': 'https://joinpeertube.org/fr/home/',
            'info_dict': {
                'id': 'home',
                'title': 'Reprenez le contrôle de vos vidéos ! #JoinPeertube',
            },
            'playlist_count': 2,
        },
        {
            # Indavideo embed
            'url': 'https://streetkitchen.hu/receptek/igy_kell_otthon_hamburgert_sutni/',
            'info_dict': {
                'id': '1693903',
                'ext': 'mp4',
                'title': 'Így kell otthon hamburgert sütni',
                'description': 'md5:f5a730ecf900a5c852e1e00540bbb0f7',
                'timestamp': 1426330212,
                'upload_date': '20150314',
                'uploader': 'StreetKitchen',
                'uploader_id': '546363',
            },
            'add_ie': [IndavideoEmbedIE.ie_key()],
            'params': {
                'skip_download': True,
            },
        },
        {
            # APA embed via JWPlatform embed
            'url': 'http://www.vol.at/blue-man-group/5593454',
            'info_dict': {
                'id': 'jjv85FdZ',
                'ext': 'mp4',
                'title': '"Blau ist mysteriös": Die Blue Man Group im Interview',
                'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
                'thumbnail': r're:^https?://.*\.jpg$',
                'duration': 254,
                'timestamp': 1519211149,
                'upload_date': '20180221',
            },
            'params': {
                'skip_download': True,
            },
        },
        {
            'url': 'http://share-videos.se/auto/video/83645793?uid=13',
            'md5': 'b68d276de422ab07ee1d49388103f457',
@@ -2025,14 +2099,16 @@ class GenericIE(InfoExtractor):
        entries = []
        for it in doc.findall('./channel/item'):
            next_url = xpath_text(it, 'link', fatal=False)
            if not next_url:
                enclosure_nodes = it.findall('./enclosure')
                for e in enclosure_nodes:
                    next_url = e.attrib.get('url')
                    if next_url:
                        break
            next_url = None
            enclosure_nodes = it.findall('./enclosure')
            for e in enclosure_nodes:
                next_url = e.attrib.get('url')
                if next_url:
                    break

            if not next_url:
                next_url = xpath_text(it, 'link', fatal=False)

            if not next_url:
                continue
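Note: with this ordering, an RSS item whose <link> points at an unsupported page but whose <enclosure> carries the media URL still yields a download. Illustrative item (made-up URLs):

# Illustrative RSS item: the enclosure URL now wins over the link URL.
item_xml = '''
<item>
  <link>http://example.com/episode-page</link>
  <enclosure url="http://example.com/audio/ep42.mp3" type="audio/mpeg"/>
</item>
'''
# The loop above picks http://example.com/audio/ep42.mp3 first and only
# falls back to the <link> if no enclosure has a url attribute.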
@@ -2995,6 +3071,26 @@ class GenericIE(InfoExtractor):
            return self.playlist_from_matches(
                xfileshare_urls, video_id, video_title, ie=XFileShareIE.ie_key())

        cloudflarestream_urls = CloudflareStreamIE._extract_urls(webpage)
        if cloudflarestream_urls:
            return self.playlist_from_matches(
                cloudflarestream_urls, video_id, video_title, ie=CloudflareStreamIE.ie_key())

        peertube_urls = PeerTubeIE._extract_urls(webpage)
        if peertube_urls:
            return self.playlist_from_matches(
                peertube_urls, video_id, video_title, ie=PeerTubeIE.ie_key())

        indavideo_urls = IndavideoEmbedIE._extract_urls(webpage)
        if indavideo_urls:
            return self.playlist_from_matches(
                indavideo_urls, video_id, video_title, ie=IndavideoEmbedIE.ie_key())

        apa_urls = APAIE._extract_urls(webpage)
        if apa_urls:
            return self.playlist_from_matches(
                apa_urls, video_id, video_title, ie=APAIE.ie_key())

        sharevideos_urls = [mobj.group('url') for mobj in re.finditer(
            r'<iframe[^>]+?\bsrc\s*=\s*(["\'])(?P<url>(?:https?:)?//embed\.share-videos\.se/auto/embed/\d+\?.*?\buid=\d+.*?)\1',
            webpage)]
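Note: each embed hook added here follows the same convention: the concrete extractor exposes a classmethod-style _extract_urls(webpage) returning embed URLs, and GenericIE wraps any hits in a playlist. A schematic hook (the regex is illustrative, not PeerTube's actual pattern):

# Schematic _extract_urls hook as GenericIE expects it, inside an
# extractor module; the iframe regex is illustrative only.
import re

class SomeEmbedIE(InfoExtractor):
    @staticmethod
    def _extract_urls(webpage):
        return [m.group('url') for m in re.finditer(
            r'<iframe[^>]+src=["\'](?P<url>https?://player\.example\.com/videos/embed/[^"\']+)',
            webpage)]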
@@ -3002,21 +3098,6 @@ class GenericIE(InfoExtractor):
            return self.playlist_from_matches(
                sharevideos_urls, video_id, video_title)

        def merge_dicts(dict1, dict2):
            merged = {}
            for k, v in dict1.items():
                if v is not None:
                    merged[k] = v
            for k, v in dict2.items():
                if v is None:
                    continue
                if (k not in merged or
                        (isinstance(v, compat_str) and v and
                            isinstance(merged[k], compat_str) and
                            not merged[k])):
                    merged[k] = v
            return merged

        # Look for HTML5 media
        entries = self._parse_html5_media_entries(url, webpage, video_id, m3u8_id='hls')
        if entries:

View File

@@ -1,15 +1,16 @@
# coding: utf-8
from __future__ import unicode_literals

import base64
import hashlib
import json
import random
import re
import math

from .common import InfoExtractor
from ..compat import (
    compat_HTTPError,
    compat_str,
    compat_chr,
    compat_ord,
)
from ..utils import (
    ExtractorError,
@@ -22,12 +23,7 @@ from ..utils import (

class GloboIE(InfoExtractor):
    _VALID_URL = r'(?:globo:|https?://.+?\.globo\.com/(?:[^/]+/)*(?:v/(?:[^/]+/)?|videos/))(?P<id>\d{7,})'

    _API_URL_TEMPLATE = 'http://api.globovideos.com/videos/%s/playlist'
    _SECURITY_URL_TEMPLATE = 'http://security.video.globo.com/videos/%s/hash?player=flash&version=17.0.0.132&resource_id=%s'

    _RESIGN_EXPIRATION = 86400

    _NETRC_MACHINE = 'globo'
    _TESTS = [{
        'url': 'http://g1.globo.com/carros/autoesporte/videos/t/exclusivos-do-g1/v/mercedes-benz-gla-passa-por-teste-de-colisao-na-europa/3607726/',
        'md5': 'b3ccc801f75cd04a914d51dadb83a78d',
@@ -70,287 +66,51 @@ class GloboIE(InfoExtractor):
        'only_matching': True,
    }]

    def _real_initialize(self):
        email, password = self._get_login_info()
        if email is None:
            return

        try:
            self._download_json(
                'https://login.globo.com/api/authentication', None, data=json.dumps({
                    'payload': {
                        'email': email,
                        'password': password,
                        'serviceId': 4654,
                    },
                }).encode(), headers={
                    'Content-Type': 'application/json; charset=utf-8',
                })
        except ExtractorError as e:
            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
                resp = self._parse_json(e.cause.read(), None)
                raise ExtractorError(resp.get('userMessage') or resp['id'], expected=True)
            raise

    class MD5(object):
        HEX_FORMAT_LOWERCASE = 0
        HEX_FORMAT_UPPERCASE = 1
        BASE64_PAD_CHARACTER_DEFAULT_COMPLIANCE = ''
        BASE64_PAD_CHARACTER_RFC_COMPLIANCE = '='
        PADDING = '=0xFF01DD'
        hexcase = 0
        b64pad = ''

        def __init__(self):
            pass

        class JSArray(list):
            def __getitem__(self, y):
                try:
                    return list.__getitem__(self, y)
                except IndexError:
                    return 0

            def __setitem__(self, i, y):
                try:
                    return list.__setitem__(self, i, y)
                except IndexError:
                    self.extend([0] * (i - len(self) + 1))
                    self[-1] = y

        @classmethod
        def hex_md5(cls, param1):
            return cls.rstr2hex(cls.rstr_md5(cls.str2rstr_utf8(param1)))

        @classmethod
        def b64_md5(cls, param1, param2=None):
            return cls.rstr2b64(cls.rstr_md5(cls.str2rstr_utf8(param1, param2)))

        @classmethod
        def any_md5(cls, param1, param2):
            return cls.rstr2any(cls.rstr_md5(cls.str2rstr_utf8(param1)), param2)

        @classmethod
        def rstr_md5(cls, param1):
            return cls.binl2rstr(cls.binl_md5(cls.rstr2binl(param1), len(param1) * 8))

        @classmethod
        def rstr2hex(cls, param1):
            _loc_2 = '0123456789ABCDEF' if cls.hexcase else '0123456789abcdef'
            _loc_3 = ''
            for _loc_5 in range(0, len(param1)):
                _loc_4 = compat_ord(param1[_loc_5])
                _loc_3 += _loc_2[_loc_4 >> 4 & 15] + _loc_2[_loc_4 & 15]
            return _loc_3

        @classmethod
        def rstr2b64(cls, param1):
            _loc_2 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_'
            _loc_3 = ''
            _loc_4 = len(param1)
            for _loc_5 in range(0, _loc_4, 3):
                _loc_6_1 = compat_ord(param1[_loc_5]) << 16
                _loc_6_2 = compat_ord(param1[_loc_5 + 1]) << 8 if _loc_5 + 1 < _loc_4 else 0
                _loc_6_3 = compat_ord(param1[_loc_5 + 2]) if _loc_5 + 2 < _loc_4 else 0
                _loc_6 = _loc_6_1 | _loc_6_2 | _loc_6_3
                for _loc_7 in range(0, 4):
                    if _loc_5 * 8 + _loc_7 * 6 > len(param1) * 8:
                        _loc_3 += cls.b64pad
                    else:
                        _loc_3 += _loc_2[_loc_6 >> 6 * (3 - _loc_7) & 63]
            return _loc_3

        @staticmethod
        def rstr2any(param1, param2):
            _loc_3 = len(param2)
            _loc_4 = []
            _loc_9 = [0] * ((len(param1) >> 2) + 1)
            for _loc_5 in range(0, len(_loc_9)):
                _loc_9[_loc_5] = compat_ord(param1[_loc_5 * 2]) << 8 | compat_ord(param1[_loc_5 * 2 + 1])

            while len(_loc_9) > 0:
                _loc_8 = []
                _loc_7 = 0
                for _loc_5 in range(0, len(_loc_9)):
                    _loc_7 = (_loc_7 << 16) + _loc_9[_loc_5]
                    _loc_6 = math.floor(_loc_7 / _loc_3)
                    _loc_7 -= _loc_6 * _loc_3
                    if len(_loc_8) > 0 or _loc_6 > 0:
                        _loc_8[len(_loc_8)] = _loc_6

                _loc_4[len(_loc_4)] = _loc_7
                _loc_9 = _loc_8

            _loc_10 = ''
            _loc_5 = len(_loc_4) - 1
            while _loc_5 >= 0:
                _loc_10 += param2[_loc_4[_loc_5]]
                _loc_5 -= 1

            return _loc_10

        @classmethod
        def str2rstr_utf8(cls, param1, param2=None):
            _loc_3 = ''
            _loc_4 = -1
            if not param2:
                param2 = cls.PADDING
            param1 = param1 + param2[1:9]
            while True:
                _loc_4 += 1
                if _loc_4 >= len(param1):
                    break
                _loc_5 = compat_ord(param1[_loc_4])
                _loc_6 = compat_ord(param1[_loc_4 + 1]) if _loc_4 + 1 < len(param1) else 0
                if 55296 <= _loc_5 <= 56319 and 56320 <= _loc_6 <= 57343:
                    _loc_5 = 65536 + ((_loc_5 & 1023) << 10) + (_loc_6 & 1023)
                    _loc_4 += 1
                if _loc_5 <= 127:
                    _loc_3 += compat_chr(_loc_5)
                    continue
                if _loc_5 <= 2047:
                    _loc_3 += compat_chr(192 | _loc_5 >> 6 & 31) + compat_chr(128 | _loc_5 & 63)
                    continue
                if _loc_5 <= 65535:
                    _loc_3 += compat_chr(224 | _loc_5 >> 12 & 15) + compat_chr(128 | _loc_5 >> 6 & 63) + compat_chr(
                        128 | _loc_5 & 63)
                    continue
                if _loc_5 <= 2097151:
                    _loc_3 += compat_chr(240 | _loc_5 >> 18 & 7) + compat_chr(128 | _loc_5 >> 12 & 63) + compat_chr(
                        128 | _loc_5 >> 6 & 63) + compat_chr(128 | _loc_5 & 63)
            return _loc_3

        @staticmethod
        def rstr2binl(param1):
            _loc_2 = [0] * ((len(param1) >> 2) + 1)
            for _loc_3 in range(0, len(_loc_2)):
                _loc_2[_loc_3] = 0
            for _loc_3 in range(0, len(param1) * 8, 8):
                _loc_2[_loc_3 >> 5] |= (compat_ord(param1[_loc_3 // 8]) & 255) << _loc_3 % 32
            return _loc_2

        @staticmethod
        def binl2rstr(param1):
            _loc_2 = ''
            for _loc_3 in range(0, len(param1) * 32, 8):
                _loc_2 += compat_chr(param1[_loc_3 >> 5] >> _loc_3 % 32 & 255)
            return _loc_2

        @classmethod
        def binl_md5(cls, param1, param2):
            param1 = cls.JSArray(param1)
            param1[param2 >> 5] |= 128 << param2 % 32
            param1[(param2 + 64 >> 9 << 4) + 14] = param2
            _loc_3 = 1732584193
            _loc_4 = -271733879
            _loc_5 = -1732584194
            _loc_6 = 271733878
            for _loc_7 in range(0, len(param1), 16):
                _loc_8 = _loc_3
                _loc_9 = _loc_4
                _loc_10 = _loc_5
                _loc_11 = _loc_6
                _loc_3 = cls.md5_ff(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 0], 7, -680876936)
                _loc_6 = cls.md5_ff(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 1], 12, -389564586)
                _loc_5 = cls.md5_ff(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 2], 17, 606105819)
                _loc_4 = cls.md5_ff(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 3], 22, -1044525330)
                _loc_3 = cls.md5_ff(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 4], 7, -176418897)
                _loc_6 = cls.md5_ff(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 5], 12, 1200080426)
                _loc_5 = cls.md5_ff(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 6], 17, -1473231341)
                _loc_4 = cls.md5_ff(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 7], 22, -45705983)
                _loc_3 = cls.md5_ff(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 8], 7, 1770035416)
                _loc_6 = cls.md5_ff(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 9], 12, -1958414417)
                _loc_5 = cls.md5_ff(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 10], 17, -42063)
                _loc_4 = cls.md5_ff(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 11], 22, -1990404162)
                _loc_3 = cls.md5_ff(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 12], 7, 1804603682)
                _loc_6 = cls.md5_ff(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 13], 12, -40341101)
                _loc_5 = cls.md5_ff(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 14], 17, -1502002290)
                _loc_4 = cls.md5_ff(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 15], 22, 1236535329)
                _loc_3 = cls.md5_gg(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 1], 5, -165796510)
                _loc_6 = cls.md5_gg(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 6], 9, -1069501632)
                _loc_5 = cls.md5_gg(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 11], 14, 643717713)
                _loc_4 = cls.md5_gg(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 0], 20, -373897302)
                _loc_3 = cls.md5_gg(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 5], 5, -701558691)
                _loc_6 = cls.md5_gg(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 10], 9, 38016083)
                _loc_5 = cls.md5_gg(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 15], 14, -660478335)
                _loc_4 = cls.md5_gg(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 4], 20, -405537848)
                _loc_3 = cls.md5_gg(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 9], 5, 568446438)
                _loc_6 = cls.md5_gg(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 14], 9, -1019803690)
                _loc_5 = cls.md5_gg(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 3], 14, -187363961)
                _loc_4 = cls.md5_gg(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 8], 20, 1163531501)
                _loc_3 = cls.md5_gg(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 13], 5, -1444681467)
                _loc_6 = cls.md5_gg(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 2], 9, -51403784)
                _loc_5 = cls.md5_gg(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 7], 14, 1735328473)
                _loc_4 = cls.md5_gg(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 12], 20, -1926607734)
                _loc_3 = cls.md5_hh(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 5], 4, -378558)
                _loc_6 = cls.md5_hh(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 8], 11, -2022574463)
                _loc_5 = cls.md5_hh(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 11], 16, 1839030562)
                _loc_4 = cls.md5_hh(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 14], 23, -35309556)
                _loc_3 = cls.md5_hh(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 1], 4, -1530992060)
                _loc_6 = cls.md5_hh(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 4], 11, 1272893353)
                _loc_5 = cls.md5_hh(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 7], 16, -155497632)
                _loc_4 = cls.md5_hh(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 10], 23, -1094730640)
                _loc_3 = cls.md5_hh(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 13], 4, 681279174)
                _loc_6 = cls.md5_hh(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 0], 11, -358537222)
                _loc_5 = cls.md5_hh(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 3], 16, -722521979)
                _loc_4 = cls.md5_hh(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 6], 23, 76029189)
                _loc_3 = cls.md5_hh(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 9], 4, -640364487)
                _loc_6 = cls.md5_hh(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 12], 11, -421815835)
                _loc_5 = cls.md5_hh(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 15], 16, 530742520)
                _loc_4 = cls.md5_hh(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 2], 23, -995338651)
                _loc_3 = cls.md5_ii(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 0], 6, -198630844)
                _loc_6 = cls.md5_ii(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 7], 10, 1126891415)
                _loc_5 = cls.md5_ii(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 14], 15, -1416354905)
                _loc_4 = cls.md5_ii(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 5], 21, -57434055)
                _loc_3 = cls.md5_ii(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 12], 6, 1700485571)
                _loc_6 = cls.md5_ii(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 3], 10, -1894986606)
                _loc_5 = cls.md5_ii(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 10], 15, -1051523)
                _loc_4 = cls.md5_ii(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 1], 21, -2054922799)
                _loc_3 = cls.md5_ii(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 8], 6, 1873313359)
                _loc_6 = cls.md5_ii(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 15], 10, -30611744)
                _loc_5 = cls.md5_ii(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 6], 15, -1560198380)
                _loc_4 = cls.md5_ii(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 13], 21, 1309151649)
                _loc_3 = cls.md5_ii(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 4], 6, -145523070)
                _loc_6 = cls.md5_ii(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 11], 10, -1120210379)
                _loc_5 = cls.md5_ii(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 2], 15, 718787259)
                _loc_4 = cls.md5_ii(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 9], 21, -343485551)
                _loc_3 = cls.safe_add(_loc_3, _loc_8)
                _loc_4 = cls.safe_add(_loc_4, _loc_9)
                _loc_5 = cls.safe_add(_loc_5, _loc_10)
                _loc_6 = cls.safe_add(_loc_6, _loc_11)
            return [_loc_3, _loc_4, _loc_5, _loc_6]

        @classmethod
        def md5_cmn(cls, param1, param2, param3, param4, param5, param6):
            return cls.safe_add(
                cls.bit_rol(cls.safe_add(cls.safe_add(param2, param1), cls.safe_add(param4, param6)), param5), param3)

        @classmethod
        def md5_ff(cls, param1, param2, param3, param4, param5, param6, param7):
            return cls.md5_cmn(param2 & param3 | ~param2 & param4, param1, param2, param5, param6, param7)

        @classmethod
        def md5_gg(cls, param1, param2, param3, param4, param5, param6, param7):
            return cls.md5_cmn(param2 & param4 | param3 & ~param4, param1, param2, param5, param6, param7)

        @classmethod
        def md5_hh(cls, param1, param2, param3, param4, param5, param6, param7):
            return cls.md5_cmn(param2 ^ param3 ^ param4, param1, param2, param5, param6, param7)

        @classmethod
        def md5_ii(cls, param1, param2, param3, param4, param5, param6, param7):
            return cls.md5_cmn(param3 ^ (param2 | ~param4), param1, param2, param5, param6, param7)

        @classmethod
        def safe_add(cls, param1, param2):
            _loc_3 = (param1 & 65535) + (param2 & 65535)
            _loc_4 = (param1 >> 16) + (param2 >> 16) + (_loc_3 >> 16)
            return cls.lshift(_loc_4, 16) | _loc_3 & 65535

        @classmethod
        def bit_rol(cls, param1, param2):
            return cls.lshift(param1, param2) | (param1 & 0xFFFFFFFF) >> (32 - param2)

        @staticmethod
        def lshift(value, count):
            r = (0xFFFFFFFF & value) << count
            return -(~(r - 1) & 0xFFFFFFFF) if r > 0x7FFFFFFF else r
    def _real_extract(self, url):
        video_id = self._match_id(url)

        video = self._download_json(
            self._API_URL_TEMPLATE % video_id, video_id)['videos'][0]
        video = self._download_json(
            'http://api.globovideos.com/videos/%s/playlist' % video_id,
            video_id)['videos'][0]

        title = video['title']

        formats = []
        for resource in video['resources']:
            resource_id = resource.get('_id')
            if not resource_id or resource_id.endswith('manifest'):
            resource_url = resource.get('url')
            if not resource_id or not resource_url:
                continue

            security = self._download_json(
                self._SECURITY_URL_TEMPLATE % (video_id, resource_id),
                video_id, 'Downloading security hash for %s' % resource_id)
                'http://security.video.globo.com/videos/%s/hash' % video_id,
                video_id, 'Downloading security hash for %s' % resource_id, query={
                    'player': 'flash',
                    'version': '17.0.0.132',
                    'resource_id': resource_id,
                })

            security_hash = security.get('hash')
            if not security_hash:
@@ -361,22 +121,28 @@ class GloboIE(InfoExtractor):
                continue

            hash_code = security_hash[:2]
            received_time = int(security_hash[2:12])
            received_time = security_hash[2:12]
            received_random = security_hash[12:22]
            received_md5 = security_hash[22:]

            sign_time = received_time + self._RESIGN_EXPIRATION
            sign_time = compat_str(int(received_time) + 86400)
            padding = '%010d' % random.randint(1, 10000000000)

            signed_md5 = self.MD5.b64_md5(received_md5 + compat_str(sign_time) + padding)
            signed_hash = hash_code + compat_str(received_time) + received_random + compat_str(sign_time) + padding + signed_md5
            md5_data = (received_md5 + sign_time + padding + '0xFF01DD').encode()
            signed_md5 = base64.urlsafe_b64encode(hashlib.md5(md5_data).digest()).decode().strip('=')
            signed_hash = hash_code + received_time + received_random + sign_time + padding + signed_md5

            resource_url = resource['url']
            signed_url = '%s?h=%s&k=%s' % (resource_url, signed_hash, 'flash')
            if resource_id.endswith('m3u8') or resource_url.endswith('.m3u8'):
                formats.extend(self._extract_m3u8_formats(
                    signed_url, resource_id, 'mp4', entry_protocol='m3u8_native',
                    m3u8_id='hls', fatal=False))
            elif resource_id.endswith('mpd') or resource_url.endswith('.mpd'):
                formats.extend(self._extract_mpd_formats(
                    signed_url, resource_id, mpd_id='dash', fatal=False))
            elif resource_id.endswith('manifest') or resource_url.endswith('/manifest'):
                formats.extend(self._extract_ism_formats(
                    signed_url, resource_id, ism_id='mss', fatal=False))
            else:
                formats.append({
                    'url': signed_url,
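Note: the deleted JS-style MD5 port boils down to standard MD5 plus unpadded URL-safe base64, which is exactly what the hashlib/base64 replacement computes. A standalone sketch of the resulting signing step (the 42-character security hash here is made up):

# Standalone signing sketch; the security hash value is fabricated.
import base64
import hashlib
import random

security_hash = 'ab' + '1527601200' + 'r4nd0m1234' + 'receivedmd5part'
hash_code, received_time = security_hash[:2], security_hash[2:12]
received_random, received_md5 = security_hash[12:22], security_hash[22:]
sign_time = str(int(received_time) + 86400)  # re-sign for one more day
padding = '%010d' % random.randint(1, 10000000000)
md5_data = (received_md5 + sign_time + padding + '0xFF01DD').encode()
signed_md5 = base64.urlsafe_b64encode(
    hashlib.md5(md5_data).digest()).decode().strip('=')
signed_hash = hash_code + received_time + received_random + sign_time + padding + signed_md5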

View File

@@ -123,7 +123,7 @@ class GoIE(AdobePassIE):
                'adobe_requestor_id': requestor_id,
            })
        else:
            self._initialize_geo_bypass(['US'])
            self._initialize_geo_bypass({'countries': ['US']})
        entitlement = self._download_json(
            'https://api.entitlement.watchabc.go.com/vp2/ws-secure/entitlement/2020/authorize.json',
            video_id, data=urlencode_postdata(data))

View File

@@ -6,7 +6,9 @@ import re
from .common import InfoExtractor
from ..utils import (
    determine_ext,
    ExtractorError,
    int_or_none,
    parse_age_limit,
    parse_iso8601,
)
@@ -23,6 +25,7 @@ class Go90IE(InfoExtractor):
            'description': 'VICE\'s Karley Sciortino meets with activists who discuss the state\'s strong anti-porn stance. Then, VICE Sports explains NFL contracts.',
            'timestamp': 1491868800,
            'upload_date': '20170411',
            'age_limit': 14,
        }
    }
@@ -33,6 +36,8 @@ class Go90IE(InfoExtractor):
            video_id, headers={
                'Content-Type': 'application/json; charset=utf-8',
            }, data=b'{"client":"web","device_type":"pc"}')

        if video_data.get('requires_drm'):
            raise ExtractorError('This video is DRM protected.', expected=True)

        main_video_asset = video_data['main_video_asset']

        episode_number = int_or_none(video_data.get('episode_number'))
@@ -123,4 +128,5 @@ class Go90IE(InfoExtractor):
            'season_number': season_number,
            'episode_number': episode_number,
            'subtitles': subtitles,
            'age_limit': parse_age_limit(video_data.get('rating')),
        }

View File

@@ -17,6 +17,8 @@ class HiDiveIE(InfoExtractor):
     # Using X-Forwarded-For results in 403 HTTP error for HLS fragments,
     # so disabling geo bypass completely
     _GEO_BYPASS = False
+    _NETRC_MACHINE = 'hidive'
+    _LOGIN_URL = 'https://www.hidive.com/account/login'

     _TESTS = [{
         'url': 'https://www.hidive.com/stream/the-comic-artist-and-his-assistants/s01e001',
@@ -31,8 +33,26 @@ class HiDiveIE(InfoExtractor):
         'params': {
             'skip_download': True,
         },
+        'skip': 'Requires Authentication',
     }]

+    def _real_initialize(self):
+        email, password = self._get_login_info()
+        if email is None:
+            return
+
+        webpage = self._download_webpage(self._LOGIN_URL, None)
+        form = self._search_regex(
+            r'(?s)<form[^>]+action="/account/login"[^>]*>(.+?)</form>',
+            webpage, 'login form')
+        data = self._hidden_inputs(form)
+        data.update({
+            'Email': email,
+            'Password': password,
+        })
+        self._download_webpage(
+            self._LOGIN_URL, None, 'Logging in', data=urlencode_postdata(data))
+
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         title, key = mobj.group('title', 'key')
@@ -43,6 +63,7 @@ class HiDiveIE(InfoExtractor):
             data=urlencode_postdata({
                 'Title': title,
                 'Key': key,
+                'PlayerId': 'f4f895ce1ca713ba263b91caeb1daa2d08904783',
             }))

         restriction = settings.get('restrictionReason')
@@ -79,6 +100,7 @@ class HiDiveIE(InfoExtractor):
                 subtitles.setdefault(cc_lang, []).append({
                     'url': cc_url,
                 })
+        self._sort_formats(formats)

         season_number = int_or_none(self._search_regex(
             r's(\d+)', key, 'season number', default=None))

View File

@@ -66,7 +66,7 @@ class HRTiBaseIE(InfoExtractor):
         self._logout_url = modules['user']['resources']['logout']['uri']

     def _login(self):
-        (username, password) = self._get_login_info()
+        username, password = self._get_login_info()
         # TODO: figure out authentication with cookies
         if username is None or password is None:
             self.raise_login_required()

View File

@@ -3,25 +3,27 @@ from __future__ import unicode_literals
 import re

 from .common import InfoExtractor
+from ..compat import compat_str
 from ..utils import (
+    determine_ext,
     mimetype2ext,
+    parse_duration,
     qualities,
-    remove_end,
 )


 class ImdbIE(InfoExtractor):
     IE_NAME = 'imdb'
     IE_DESC = 'Internet Movie Database trailers'
-    _VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video|title).+?[/-]vi(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video|title|list).+?[/-]vi(?P<id>\d+)'

     _TESTS = [{
         'url': 'http://www.imdb.com/video/imdb/vi2524815897',
         'info_dict': {
             'id': '2524815897',
             'ext': 'mp4',
-            'title': 'Ice Age: Continental Drift Trailer (No. 2)',
-            'description': 'md5:9061c2219254e5d14e03c25c98e96a81',
+            'title': 'No. 2 from Ice Age: Continental Drift (2012)',
+            'description': 'md5:87bd0bdc61e351f21f20d2d7441cb4e7',
         }
     }, {
         'url': 'http://www.imdb.com/video/_/vi2524815897',
@@ -38,76 +40,67 @@ class ImdbIE(InfoExtractor):
     }, {
         'url': 'http://www.imdb.com/title/tt4218696/videoplayer/vi2608641561',
         'only_matching': True,
+    }, {
+        'url': 'https://www.imdb.com/list/ls009921623/videoplayer/vi260482329',
+        'only_matching': True,
     }]

     def _real_extract(self, url):
         video_id = self._match_id(url)
-        webpage = self._download_webpage('http://www.imdb.com/video/imdb/vi%s' % video_id, video_id)
-        descr = self._html_search_regex(
-            r'(?s)<span itemprop="description">(.*?)</span>',
-            webpage, 'description', fatal=False)
-        player_url = 'http://www.imdb.com/video/imdb/vi%s/imdb/single' % video_id
-        player_page = self._download_webpage(
-            player_url, video_id, 'Downloading player page')
-        # the player page contains the info for the default format, we have to
-        # fetch other pages for the rest of the formats
-        extra_formats = re.findall(r'href="(?P<url>%s.*?)".*?>(?P<name>.*?)<' % re.escape(player_url), player_page)
-        format_pages = [
-            self._download_webpage(
-                f_url, video_id, 'Downloading info for %s format' % f_name)
-            for f_url, f_name in extra_formats]
-        format_pages.append(player_page)
+        webpage = self._download_webpage(
+            'https://www.imdb.com/videoplayer/vi' + video_id, video_id)
+        video_metadata = self._parse_json(self._search_regex(
+            r'window\.IMDbReactInitialState\.push\(({.+?})\);', webpage,
+            'video metadata'), video_id)['videos']['videoMetadata']['vi' + video_id]
+        title = self._html_search_meta(
+            ['og:title', 'twitter:title'], webpage) or self._html_search_regex(
+            r'<title>(.+?)</title>', webpage, 'title', fatal=False) or video_metadata['title']

         quality = qualities(('SD', '480p', '720p', '1080p'))
         formats = []
-        for format_page in format_pages:
-            json_data = self._search_regex(
-                r'<script[^>]+class="imdb-player-data"[^>]*?>(.*?)</script>',
-                format_page, 'json data', flags=re.DOTALL)
-            info = self._parse_json(json_data, video_id, fatal=False)
-            if not info:
+        for encoding in video_metadata.get('encodings', []):
+            if not encoding or not isinstance(encoding, dict):
                 continue
-            format_info = info.get('videoPlayerObject', {}).get('video', {})
-            if not format_info:
+            video_url = encoding.get('videoUrl')
+            if not video_url or not isinstance(video_url, compat_str):
                 continue
-            video_info_list = format_info.get('videoInfoList')
-            if not video_info_list or not isinstance(video_info_list, list):
+            ext = determine_ext(video_url, mimetype2ext(encoding.get('mimeType')))
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    video_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                    m3u8_id='hls', fatal=False))
                 continue
-            video_info = video_info_list[0]
-            if not video_info or not isinstance(video_info, dict):
-                continue
-            video_url = video_info.get('videoUrl')
-            if not video_url:
-                continue
-            format_id = format_info.get('ffname')
+            format_id = encoding.get('definition')
             formats.append({
                 'format_id': format_id,
                 'url': video_url,
-                'ext': mimetype2ext(video_info.get('videoMimeType')),
+                'ext': ext,
                 'quality': quality(format_id),
             })
         self._sort_formats(formats)

         return {
             'id': video_id,
-            'title': remove_end(self._og_search_title(webpage), ' - IMDb'),
+            'title': title,
             'formats': formats,
-            'description': descr,
-            'thumbnail': format_info.get('slate'),
+            'description': video_metadata.get('description'),
+            'thumbnail': video_metadata.get('slate', {}).get('url'),
+            'duration': parse_duration(video_metadata.get('duration')),
         }


 class ImdbListIE(InfoExtractor):
     IE_NAME = 'imdb:list'
     IE_DESC = 'Internet Movie Database lists'
-    _VALID_URL = r'https?://(?:www\.)?imdb\.com/list/(?P<id>[\da-zA-Z_-]{11})'
+    _VALID_URL = r'https?://(?:www\.)?imdb\.com/list/ls(?P<id>\d{9})(?!/videoplayer/vi\d+)'
     _TEST = {
-        'url': 'http://www.imdb.com/list/JFs9NWw6XI0',
+        'url': 'https://www.imdb.com/list/ls009921623/',
         'info_dict': {
-            'id': 'JFs9NWw6XI0',
-            'title': 'March 23, 2012 Releases',
+            'id': '009921623',
+            'title': 'The Bourne Legacy',
+            'description': 'A list of trailers, clips, and more from The Bourne Legacy, starring Jeremy Renner and Rachel Weisz.',
         },
-        'playlist_count': 7,
+        'playlist_count': 8,
     }

     def _real_extract(self, url):
@@ -115,9 +108,13 @@ class ImdbListIE(InfoExtractor):
         webpage = self._download_webpage(url, list_id)
         entries = [
             self.url_result('http://www.imdb.com' + m, 'Imdb')
-            for m in re.findall(r'href="(/video/imdb/vi[^"]+)"\s+data-type="playlist"', webpage)]
+            for m in re.findall(r'href="(/list/ls%s/videoplayer/vi[^"]+)"' % list_id, webpage)]

         list_title = self._html_search_regex(
-            r'<h1 class="header">(.*?)</h1>', webpage, 'list title')
+            r'<h1[^>]+class="[^"]*header[^"]*"[^>]*>(.*?)</h1>',
+            webpage, 'list title')
+        list_description = self._html_search_regex(
+            r'<div[^>]+class="[^"]*list-description[^"]*"[^>]*><p>(.*?)</p>',
+            webpage, 'list description')

-        return self.playlist_result(entries, list_id, list_title)
+        return self.playlist_result(entries, list_id, list_title, list_description)

View File

@@ -3,7 +3,6 @@ from __future__ import unicode_literals
 import re

 from .common import InfoExtractor
-from ..compat import compat_urlparse
 from ..utils import (
     int_or_none,
     js_to_json,
@@ -21,7 +20,7 @@ class ImgurIE(InfoExtractor):
             'id': 'A61SaA1',
             'ext': 'mp4',
             'title': 're:Imgur GIF$|MRW gifv is up and running without any bugs$',
-            'description': 'Imgur: The most awesome images on the Internet.',
+            'description': 'Imgur: The magic of the Internet',
         },
     }, {
         'url': 'https://imgur.com/A61SaA1',
@@ -29,7 +28,7 @@ class ImgurIE(InfoExtractor):
             'id': 'A61SaA1',
             'ext': 'mp4',
             'title': 're:Imgur GIF$|MRW gifv is up and running without any bugs$',
-            'description': 'Imgur: The most awesome images on the Internet.',
+            'description': 'Imgur: The magic of the Internet',
         },
     }, {
         'url': 'https://imgur.com/gallery/YcAQlkx',
@@ -37,8 +36,6 @@ class ImgurIE(InfoExtractor):
             'id': 'YcAQlkx',
             'ext': 'mp4',
             'title': 'Classic Steve Carell gif...cracks me up everytime....damn the repost downvotes....',
-            'description': 'Imgur: The most awesome images on the Internet.'
         }
     }, {
         'url': 'http://imgur.com/topic/Funny/N8rOudd',
@@ -50,8 +47,8 @@ class ImgurIE(InfoExtractor):
     def _real_extract(self, url):
         video_id = self._match_id(url)
-        webpage = self._download_webpage(
-            compat_urlparse.urljoin(url, video_id), video_id)
+        gifv_url = 'https://i.imgur.com/{id}.gifv'.format(id=video_id)
+        webpage = self._download_webpage(gifv_url, video_id)

         width = int_or_none(self._og_search_property(
             'video:width', webpage, default=None))
@@ -107,7 +104,7 @@ class ImgurIE(InfoExtractor):
         return {
             'id': video_id,
             'formats': formats,
-            'description': self._og_search_description(webpage),
+            'description': self._og_search_description(webpage, default=None),
             'title': self._og_search_title(webpage),
         }

View File

@@ -1,11 +1,15 @@
 # coding: utf-8
 from __future__ import unicode_literals

+import re
+
 from .common import InfoExtractor
+from ..compat import compat_str
 from ..utils import (
     int_or_none,
     parse_age_limit,
     parse_iso8601,
+    update_url_query,
 )
@@ -13,7 +17,7 @@ class IndavideoEmbedIE(InfoExtractor):
     _VALID_URL = r'https?://(?:(?:embed\.)?indavideo\.hu/player/video/|assets\.indavideo\.hu/swf/player\.swf\?.*\b(?:v(?:ID|id))=)(?P<id>[\da-f]+)'
     _TESTS = [{
         'url': 'http://indavideo.hu/player/video/1bdc3c6d80/',
-        'md5': 'f79b009c66194acacd40712a6778acfa',
+        'md5': 'c8a507a1c7410685f83a06eaeeaafeab',
         'info_dict': {
             'id': '1837039',
             'ext': 'mp4',
@@ -36,6 +40,20 @@ class IndavideoEmbedIE(InfoExtractor):
         'only_matching': True,
     }]

+    # Some example URLs covered by generic extractor:
+    # http://indavideo.hu/video/Vicces_cica_1
+    # http://index.indavideo.hu/video/2015_0728_beregszasz
+    # http://auto.indavideo.hu/video/Sajat_utanfutoban_a_kis_tacsko
+    # http://erotika.indavideo.hu/video/Amator_tini_punci
+    # http://film.indavideo.hu/video/f_hrom_nagymamm_volt
+    # http://palyazat.indavideo.hu/video/Embertelen_dal_Dodgem_egyuttes
+
+    @staticmethod
+    def _extract_urls(webpage):
+        return re.findall(
+            r'<iframe[^>]+\bsrc=["\'](?P<url>(?:https?:)?//embed\.indavideo\.hu/player/video/[\da-f]+)',
+            webpage)
+
     def _real_extract(self, url):
         video_id = self._match_id(url)
@@ -45,7 +63,14 @@ class IndavideoEmbedIE(InfoExtractor):
         title = video['title']

-        video_urls = video.get('video_files', [])
+        video_urls = []
+
+        video_files = video.get('video_files')
+        if isinstance(video_files, list):
+            video_urls.extend(video_files)
+        elif isinstance(video_files, dict):
+            video_urls.extend(video_files.values())
+
         video_file = video.get('video_file')
         if video_file:
             video_urls.append(video_file)
@@ -58,11 +83,23 @@ class IndavideoEmbedIE(InfoExtractor):
             if flv_url not in video_urls:
                 video_urls.append(flv_url)

-        formats = [{
-            'url': video_url,
-            'height': int_or_none(self._search_regex(
-                r'\.(\d{3,4})\.mp4(?:\?|$)', video_url, 'height', default=None)),
-        } for video_url in video_urls]
+        filesh = video.get('filesh')
+
+        formats = []
+        for video_url in video_urls:
+            height = int_or_none(self._search_regex(
+                r'\.(\d{3,4})\.mp4(?:\?|$)', video_url, 'height', default=None))
+            if filesh:
+                if not height:
+                    continue
+                token = filesh.get(compat_str(height))
+                if token is None:
+                    continue
+                video_url = update_url_query(video_url, {'token': token})
+            formats.append({
+                'url': video_url,
+                'height': height,
+            })
         self._sort_formats(formats)

         timestamp = video.get('date')
@@ -89,55 +126,3 @@ class IndavideoEmbedIE(InfoExtractor):
             'tags': tags,
             'formats': formats,
         }
-
-
-class IndavideoIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:.+?\.)?indavideo\.hu/video/(?P<id>[^/#?]+)'
-    _TESTS = [{
-        'url': 'http://indavideo.hu/video/Vicces_cica_1',
-        'md5': '8c82244ba85d2a2310275b318eb51eac',
-        'info_dict': {
-            'id': '1335611',
-            'display_id': 'Vicces_cica_1',
-            'ext': 'mp4',
-            'title': 'Vicces cica',
-            'description': 'Játszik a tablettel. :D',
-            'thumbnail': r're:^https?://.*\.jpg$',
-            'uploader': 'Jet_Pack',
-            'uploader_id': '491217',
-            'timestamp': 1390821212,
-            'upload_date': '20140127',
-            'duration': 7,
-            'age_limit': 0,
-            'tags': ['vicces', 'macska', 'cica', 'ügyes', 'nevetés', 'játszik', 'Cukiság', 'Jet_Pack'],
-        },
-    }, {
-        'url': 'http://index.indavideo.hu/video/2015_0728_beregszasz',
-        'only_matching': True,
-    }, {
-        'url': 'http://auto.indavideo.hu/video/Sajat_utanfutoban_a_kis_tacsko',
-        'only_matching': True,
-    }, {
-        'url': 'http://erotika.indavideo.hu/video/Amator_tini_punci',
-        'only_matching': True,
-    }, {
-        'url': 'http://film.indavideo.hu/video/f_hrom_nagymamm_volt',
-        'only_matching': True,
-    }, {
-        'url': 'http://palyazat.indavideo.hu/video/Embertelen_dal_Dodgem_egyuttes',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, display_id)
-        embed_url = self._search_regex(
-            r'<link[^>]+rel="video_src"[^>]+href="(.+?)"', webpage, 'embed url')
-
-        return {
-            '_type': 'url_transparent',
-            'ie_key': 'IndavideoEmbed',
-            'url': embed_url,
-            'display_id': display_id,
-        }

View File

@@ -239,7 +239,7 @@ class IqiyiIE(InfoExtractor):
         return ohdave_rsa_encrypt(data, e, N)

     def _login(self):
-        (username, password) = self._get_login_info()
+        username, password = self._get_login_info()
         # No authentication to be performed
         if not username:

View File

@@ -7,6 +7,7 @@ import json
 import re

 from .common import InfoExtractor
+from .brightcove import BrightcoveNewIE
 from ..compat import (
     compat_str,
     compat_etree_register_namespace,
@@ -18,6 +19,7 @@ from ..utils import (
     xpath_text,
     int_or_none,
     parse_duration,
+    smuggle_url,
     ExtractorError,
     determine_ext,
 )
@@ -41,6 +43,14 @@ class ITVIE(InfoExtractor):
         # unavailable via data-playlist-url
         'url': 'https://www.itv.com/hub/through-the-keyhole/2a2271a0033',
         'only_matching': True,
+    }, {
+        # InvalidVodcrid
+        'url': 'https://www.itv.com/hub/james-martins-saturday-morning/2a5159a0034',
+        'only_matching': True,
+    }, {
+        # ContentUnavailable
+        'url': 'https://www.itv.com/hub/whos-doing-the-dishes/2a2898a0024',
+        'only_matching': True,
     }]

     def _real_extract(self, url):
@@ -127,7 +137,8 @@ class ITVIE(InfoExtractor):
                 if fault_code == 'InvalidGeoRegion':
                     self.raise_geo_restricted(
                         msg=fault_string, countries=self._GEO_COUNTRIES)
-                elif fault_code != 'InvalidEntity':
+                elif fault_code not in (
+                        'InvalidEntity', 'InvalidVodcrid', 'ContentUnavailable'):
                     raise ExtractorError(
                         '%s said: %s' % (self.IE_NAME, fault_string), expected=True)
                 info.update({
@@ -251,3 +262,38 @@ class ITVIE(InfoExtractor):
             'subtitles': subtitles,
         })
         return info
+
+
+class ITVBTCCIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?itv\.com/btcc/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+    _TEST = {
+        'url': 'http://www.itv.com/btcc/races/btcc-2018-all-the-action-from-brands-hatch',
+        'info_dict': {
+            'id': 'btcc-2018-all-the-action-from-brands-hatch',
+            'title': 'BTCC 2018: All the action from Brands Hatch',
+        },
+        'playlist_mincount': 9,
+    }
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1582188683001/HkiHLnNRx_default/index.html?videoId=%s'
+
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, playlist_id)
+
+        entries = [
+            self.url_result(
+                smuggle_url(self.BRIGHTCOVE_URL_TEMPLATE % video_id, {
+                    # ITV does not like some GB IP ranges, so here are some
+                    # IP blocks it accepts
+                    'geo_ip_blocks': [
+                        '193.113.0.0/16', '54.36.162.0/23', '159.65.16.0/21'
+                    ],
+                    'referrer': url,
+                }),
+                ie=BrightcoveNewIE.ie_key(), video_id=video_id)
+            for video_id in re.findall(r'data-video-id=["\'](\d+)', webpage)]
+
+        title = self._og_search_title(webpage, fatal=False)
+
+        return self.playlist_result(entries, playlist_id, title)

View File

@@ -1,10 +1,11 @@
 # coding: utf-8
 from __future__ import unicode_literals

-import re
-
 from .common import InfoExtractor
-from ..compat import compat_urllib_parse_unquote
+from ..compat import (
+    compat_str,
+    compat_urllib_parse_unquote,
+)
 from ..utils import (
     determine_ext,
     float_or_none,
@@ -57,12 +58,33 @@ class IzleseneIE(InfoExtractor):
     def _real_extract(self, url):
         video_id = self._match_id(url)

-        url = 'http://www.izlesene.com/video/%s' % video_id
-        webpage = self._download_webpage(url, video_id)
+        webpage = self._download_webpage('http://www.izlesene.com/video/%s' % video_id, video_id)

+        video = self._parse_json(
+            self._search_regex(
+                r'videoObj\s*=\s*({.+?})\s*;\s*\n', webpage, 'streams'),
+            video_id)
+
+        title = video.get('videoTitle') or self._og_search_title(webpage)
+
+        formats = []
+        for stream in video['media']['level']:
+            source_url = stream.get('source')
+            if not source_url or not isinstance(source_url, compat_str):
+                continue
+            ext = determine_ext(url, 'mp4')
+            quality = stream.get('value')
+            height = int_or_none(quality)
+            formats.append({
+                'format_id': '%sp' % quality if quality else 'sd',
+                'url': compat_urllib_parse_unquote(source_url),
+                'ext': ext,
+                'height': height,
+            })
+        self._sort_formats(formats)
+
-        title = self._og_search_title(webpage)
         description = self._og_search_description(webpage, default=None)
-        thumbnail = self._proto_relative_url(
+        thumbnail = video.get('posterURL') or self._proto_relative_url(
             self._og_search_thumbnail(webpage), scheme='http:')

         uploader = self._html_search_regex(
@@ -71,41 +93,15 @@ class IzleseneIE(InfoExtractor):
         timestamp = parse_iso8601(self._html_search_meta(
             'uploadDate', webpage, 'upload date'))

-        duration = float_or_none(self._html_search_regex(
-            r'"videoduration"\s*:\s*"([^"]+)"',
-            webpage, 'duration', fatal=False), scale=1000)
+        duration = float_or_none(video.get('duration') or self._html_search_regex(
+            r'videoduration["\']?\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
+            webpage, 'duration', fatal=False, group='value'), scale=1000)

         view_count = str_to_int(get_element_by_id('videoViewCount', webpage))
         comment_count = self._html_search_regex(
             r'comment_count\s*=\s*\'([^\']+)\';',
             webpage, 'comment_count', fatal=False)

-        content_url = self._html_search_meta(
-            'contentURL', webpage, 'content URL', fatal=False)
-        ext = determine_ext(content_url, 'mp4')
-
-        # Might be empty for some videos.
-        streams = self._html_search_regex(
-            r'"qualitylevel"\s*:\s*"([^"]+)"', webpage, 'streams', default='')
-
-        formats = []
-        if streams:
-            for stream in streams.split('|'):
-                quality, url = re.search(r'\[(\w+)\](.+)', stream).groups()
-                formats.append({
-                    'format_id': '%sp' % quality if quality else 'sd',
-                    'url': compat_urllib_parse_unquote(url),
-                    'ext': ext,
-                })
-        else:
-            stream_url = self._search_regex(
-                r'"streamurl"\s*:\s*"([^"]+)"', webpage, 'stream URL')
-            formats.append({
-                'format_id': 'sd',
-                'url': compat_urllib_parse_unquote(stream_url),
-                'ext': ext,
-            })

         return {
             'id': video_id,
             'title': title,

View File

@@ -136,9 +136,10 @@ class KalturaIE(InfoExtractor):
             re.search(
                 r'''(?xs)
                     <(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["'])
-                      (?:https?:)?//(?:(?:www|cdnapi)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
+                      (?:https?:)?//(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
                     (?:(?!(?P=q1)).)*
                     [?&;]entry_id=(?P<id>(?:(?!(?P=q1))[^&])+)
+                    (?:(?!(?P=q1)).)*
                     (?P=q1)
                 ''', webpage)
             )

View File

@ -130,7 +130,7 @@ class LeIE(InfoExtractor):
media_id, 'Downloading flash playJson data', query={ media_id, 'Downloading flash playJson data', query={
'id': media_id, 'id': media_id,
'platid': 1, 'platid': 1,
'splatid': 101, 'splatid': 105,
'format': 1, 'format': 1,
'source': 1000, 'source': 1000,
'tkey': self.calc_time_key(int(time.time())), 'tkey': self.calc_time_key(int(time.time())),

View File

@ -282,7 +282,9 @@ class LimelightMediaIE(LimelightBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {}) url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url) video_id = self._match_id(url)
self._initialize_geo_bypass(smuggled_data.get('geo_countries')) self._initialize_geo_bypass({
'countries': smuggled_data.get('geo_countries'),
})
pc, mobile, metadata = self._extract( pc, mobile, metadata = self._extract(
video_id, 'getPlaylistByMediaId', video_id, 'getPlaylistByMediaId',

View File

@ -4,7 +4,10 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import int_or_none from ..utils import (
int_or_none,
parse_codecs,
)
class MinotoIE(InfoExtractor): class MinotoIE(InfoExtractor):
@ -26,7 +29,7 @@ class MinotoIE(InfoExtractor):
formats.extend(fmt_url, video_id, 'mp4', m3u8_id='hls', fatal=False) formats.extend(fmt_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
else: else:
fmt_profile = fmt.get('profile') or {} fmt_profile = fmt.get('profile') or {}
f = { formats.append({
'format_id': fmt_profile.get('name-short'), 'format_id': fmt_profile.get('name-short'),
'format_note': fmt_profile.get('name'), 'format_note': fmt_profile.get('name'),
'url': fmt_url, 'url': fmt_url,
@ -35,16 +38,8 @@ class MinotoIE(InfoExtractor):
'filesize': int_or_none(fmt.get('filesize')), 'filesize': int_or_none(fmt.get('filesize')),
'width': int_or_none(fmt.get('width')), 'width': int_or_none(fmt.get('width')),
'height': int_or_none(fmt.get('height')), 'height': int_or_none(fmt.get('height')),
} 'codecs': parse_codecs(fmt.get('codecs')),
codecs = fmt.get('codecs')
if codecs:
codecs = codecs.split(',')
if len(codecs) == 2:
f.update({
'vcodec': codecs[0],
'acodec': codecs[1],
}) })
formats.append(f)
self._sort_formats(formats) self._sort_formats(formats)
return { return {

View File

@ -179,6 +179,10 @@ class MixcloudIE(InfoExtractor):
formats.append({ formats.append({
'format_id': 'http', 'format_id': 'http',
'url': decrypted, 'url': decrypted,
'downloader_options': {
# Mixcloud starts throttling at >~5M
'http_chunk_size': 5242880,
},
}) })
self._sort_formats(formats) self._sort_formats(formats)
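The `http_chunk_size` hint tells the HTTP downloader to fetch the file in ranged requests so no single request crosses the ~5 MiB throttling threshold. A minimal standalone sketch of the same idea, assuming the server honours `Range` headers (the helper name is hypothetical, this is not youtube-dl's actual downloader):

```python
import urllib.error
import urllib.request

def download_in_chunks(url, dest, chunk_size=5242880):
    # Fetch `url` via ranged requests so each HTTP request stays under
    # the ~5 MiB threshold noted in the patch above.
    offset = 0
    with open(dest, 'wb') as f:
        while True:
            req = urllib.request.Request(url, headers={
                'Range': 'bytes=%d-%d' % (offset, offset + chunk_size - 1)})
            try:
                with urllib.request.urlopen(req) as resp:
                    data = resp.read()
            except urllib.error.HTTPError as e:
                if e.code == 416:  # requested range past end of file
                    break
                raise
            f.write(data)
            offset += len(data)
            if len(data) < chunk_size:  # short read: end of file
                break
```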

View File

@ -1,116 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import os.path
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
remove_start,
sanitized_Request,
urlencode_postdata,
)
class MonikerIE(InfoExtractor):
IE_DESC = 'allmyvideos.net and vidspot.net'
_VALID_URL = r'https?://(?:www\.)?(?:allmyvideos|vidspot)\.net/(?:(?:2|v)/v-)?(?P<id>[a-zA-Z0-9_-]+)'
_TESTS = [{
'url': 'http://allmyvideos.net/jih3nce3x6wn',
'md5': '710883dee1bfc370ecf9fa6a89307c88',
'info_dict': {
'id': 'jih3nce3x6wn',
'ext': 'mp4',
'title': 'youtube-dl test video',
},
}, {
'url': 'http://allmyvideos.net/embed-jih3nce3x6wn',
'md5': '710883dee1bfc370ecf9fa6a89307c88',
'info_dict': {
'id': 'jih3nce3x6wn',
'ext': 'mp4',
'title': 'youtube-dl test video',
},
}, {
'url': 'http://vidspot.net/l2ngsmhs8ci5',
'md5': '710883dee1bfc370ecf9fa6a89307c88',
'info_dict': {
'id': 'l2ngsmhs8ci5',
'ext': 'mp4',
'title': 'youtube-dl test video',
},
}, {
'url': 'https://www.vidspot.net/l2ngsmhs8ci5',
'only_matching': True,
}, {
'url': 'http://vidspot.net/2/v-ywDf99',
'md5': '5f8254ce12df30479428b0152fb8e7ba',
'info_dict': {
'id': 'ywDf99',
'ext': 'mp4',
'title': 'IL FAIT LE MALIN EN PORSHE CAYENNE ( mais pas pour longtemps)',
'description': 'IL FAIT LE MALIN EN PORSHE CAYENNE.',
},
}, {
'url': 'http://allmyvideos.net/v/v-HXZm5t',
'only_matching': True,
}]
def _real_extract(self, url):
orig_video_id = self._match_id(url)
video_id = remove_start(orig_video_id, 'embed-')
url = url.replace(orig_video_id, video_id)
assert re.match(self._VALID_URL, url) is not None
orig_webpage = self._download_webpage(url, video_id)
if '>File Not Found<' in orig_webpage:
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
error = self._search_regex(
r'class="err">([^<]+)<', orig_webpage, 'error', default=None)
if error:
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, error), expected=True)
builtin_url = self._search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>.+?/builtin-.+?)\1',
orig_webpage, 'builtin URL', default=None, group='url')
if builtin_url:
req = sanitized_Request(builtin_url)
req.add_header('Referer', url)
webpage = self._download_webpage(req, video_id, 'Downloading builtin page')
title = self._og_search_title(orig_webpage).strip()
description = self._og_search_description(orig_webpage).strip()
else:
fields = re.findall(r'type="hidden" name="(.+?)"\s* value="?(.+?)">', orig_webpage)
data = dict(fields)
post = urlencode_postdata(data)
headers = {
b'Content-Type': b'application/x-www-form-urlencoded',
}
req = sanitized_Request(url, post, headers)
webpage = self._download_webpage(
req, video_id, note='Downloading video page ...')
title = os.path.splitext(data['fname'])[0]
description = None
# Could be several links with different quality
links = re.findall(r'"file" : "?(.+?)",', webpage)
# Assume the links are ordered in quality
formats = [{
'url': l,
'quality': i,
} for i, l in enumerate(links)]
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'formats': formats,
}

View File

@@ -6,17 +6,17 @@ import re
 from .common import InfoExtractor


-class MakersChannelIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?makerschannel\.com/.*(?P<id_type>video|production)_id=(?P<id>[0-9]+)'
+class MyChannelsIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?mychannels\.com/.*(?P<id_type>video|production)_id=(?P<id>[0-9]+)'
     _TEST = {
-        'url': 'http://makerschannel.com/en/zoomin/community-highlights?video_id=849',
-        'md5': '624a512c6969236b5967bf9286345ad1',
+        'url': 'https://mychannels.com/missholland/miss-holland?production_id=3416',
+        'md5': 'b8993daad4262dd68d89d651c0c52c45',
         'info_dict': {
-            'id': '849',
+            'id': 'wUUDZZep6vQD',
             'ext': 'mp4',
-            'title': 'Landing a bus on a plane is an epic win',
-            'uploader': 'ZoomIn',
-            'description': 'md5:cd9cca2ea7b69b78be81d07020c97139',
+            'title': 'Miss Holland joins VOTE LEAVE',
+            'description': 'Miss Holland | #13 Not a potato',
+            'uploader': 'Miss Holland',
         }
     }
@@ -27,12 +27,12 @@ class MakersChannelIE(InfoExtractor):
         def extract_data_val(attr, fatal=False):
             return self._html_search_regex(r'data-%s\s*=\s*"([^"]+)"' % attr, video_data, attr, fatal=fatal)

-        minoto_id = self._search_regex(r'/id/([a-zA-Z0-9]+)', extract_data_val('video-src', True), 'minoto id')
+        minoto_id = extract_data_val('minoto-id') or self._search_regex(r'/id/([a-zA-Z0-9]+)', extract_data_val('video-src', True), 'minoto id')

         return {
             '_type': 'url_transparent',
             'url': 'minoto:%s' % minoto_id,
-            'id': extract_data_val('video-id', True),
+            'id': url_id,
             'title': extract_data_val('title', True),
             'description': extract_data_val('description'),
             'thumbnail': extract_data_val('image'),

View File

@ -1,7 +1,8 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
import base64 import base64
import json
import re
from .common import InfoExtractor from .common import InfoExtractor
from .theplatform import ThePlatformIE from .theplatform import ThePlatformIE
@ -9,6 +10,7 @@ from .adobepass import AdobePassIE
from ..utils import ( from ..utils import (
find_xpath_attr, find_xpath_attr,
smuggle_url, smuggle_url,
try_get,
unescapeHTML, unescapeHTML,
update_url_query, update_url_query,
int_or_none, int_or_none,
@ -78,10 +80,14 @@ class NBCIE(AdobePassIE):
def _real_extract(self, url): def _real_extract(self, url):
permalink, video_id = re.match(self._VALID_URL, url).groups() permalink, video_id = re.match(self._VALID_URL, url).groups()
permalink = 'http' + permalink permalink = 'http' + permalink
video_data = self._download_json( response = self._download_json(
'https://api.nbc.com/v3/videos', video_id, query={ 'https://api.nbc.com/v3/videos', video_id, query={
'filter[permalink]': permalink, 'filter[permalink]': permalink,
})['data'][0]['attributes'] 'fields[videos]': 'description,entitlement,episodeNumber,guid,keywords,seasonNumber,title,vChipRating',
'fields[shows]': 'shortTitle',
'include': 'show.shortTitle',
})
video_data = response['data'][0]['attributes']
query = { query = {
'mbr': 'true', 'mbr': 'true',
'manifest': 'm3u', 'manifest': 'm3u',
@ -103,10 +109,11 @@ class NBCIE(AdobePassIE):
'title': title, 'title': title,
'url': theplatform_url, 'url': theplatform_url,
'description': video_data.get('description'), 'description': video_data.get('description'),
'keywords': video_data.get('keywords'), 'tags': video_data.get('keywords'),
'season_number': int_or_none(video_data.get('seasonNumber')), 'season_number': int_or_none(video_data.get('seasonNumber')),
'episode_number': int_or_none(video_data.get('episodeNumber')), 'episode_number': int_or_none(video_data.get('episodeNumber')),
'series': video_data.get('showName'), 'episode': title,
'series': try_get(response, lambda x: x['included'][0]['attributes']['shortTitle']),
'ie_key': 'ThePlatform', 'ie_key': 'ThePlatform',
} }
@ -169,6 +176,65 @@ class NBCSportsIE(InfoExtractor):
NBCSportsVPlayerIE._extract_url(webpage), 'NBCSportsVPlayer') NBCSportsVPlayerIE._extract_url(webpage), 'NBCSportsVPlayer')
class NBCSportsStreamIE(AdobePassIE):
_VALID_URL = r'https?://stream\.nbcsports\.com/.+?\bpid=(?P<id>\d+)'
_TEST = {
'url': 'http://stream.nbcsports.com/nbcsn/generic?pid=206559',
'info_dict': {
'id': '206559',
'ext': 'mp4',
'title': 'Amgen Tour of California Women\'s Recap',
'description': 'md5:66520066b3b5281ada7698d0ea2aa894',
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'Requires Adobe Pass Authentication',
}
def _real_extract(self, url):
video_id = self._match_id(url)
live_source = self._download_json(
'http://stream.nbcsports.com/data/live_sources_%s.json' % video_id,
video_id)
video_source = live_source['videoSources'][0]
title = video_source['title']
source_url = None
for k in ('source', 'msl4source', 'iossource', 'hlsv4'):
sk = k + 'Url'
source_url = video_source.get(sk) or video_source.get(sk + 'Alt')
if source_url:
break
else:
source_url = video_source['ottStreamUrl']
is_live = video_source.get('type') == 'live' or video_source.get('status') == 'Live'
resource = self._get_mvpd_resource('nbcsports', title, video_id, '')
token = self._extract_mvpd_auth(url, video_id, 'nbcsports', resource)
tokenized_url = self._download_json(
'https://token.playmakerservices.com/cdn',
video_id, data=json.dumps({
'requestorId': 'nbcsports',
'pid': video_id,
'application': 'NBCSports',
'version': 'v1',
'platform': 'desktop',
'cdn': 'akamai',
'url': video_source['sourceUrl'],
'token': base64.b64encode(token.encode()).decode(),
'resourceId': base64.b64encode(resource.encode()).decode(),
}).encode())['tokenizedUrl']
formats = self._extract_m3u8_formats(tokenized_url, video_id, 'mp4')
self._sort_formats(formats)
return {
'id': video_id,
'title': self._live_title(title) if is_live else title,
'description': live_source.get('description'),
'formats': formats,
'is_live': is_live,
}
class CSNNEIE(InfoExtractor): class CSNNEIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?csnne\.com/video/(?P<id>[0-9a-z-]+)' _VALID_URL = r'https?://(?:www\.)?csnne\.com/video/(?P<id>[0-9a-z-]+)'
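The token exchange in `NBCSportsStreamIE` above is simple enough to demonstrate standalone. A hedged sketch using only the standard library: the endpoint and JSON fields are taken straight from the diff, while the function name and the way you obtain `adobe_token` and `resource` (normally via Adobe Pass authentication) are assumptions:

```python
import base64
import json
import urllib.request

def tokenize_stream_url(source_url, pid, adobe_token, resource):
    # POST the base64-encoded Adobe Pass token and resource to the
    # playmaker service and receive a signed (tokenized) manifest URL.
    payload = json.dumps({
        'requestorId': 'nbcsports',
        'pid': pid,
        'application': 'NBCSports',
        'version': 'v1',
        'platform': 'desktop',
        'cdn': 'akamai',
        'url': source_url,
        'token': base64.b64encode(adobe_token.encode()).decode(),
        'resourceId': base64.b64encode(resource.encode()).decode(),
    }).encode()
    req = urllib.request.Request(
        'https://token.playmakerservices.com/cdn', data=payload,
        headers={'Content-Type': 'application/json'})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())['tokenizedUrl']
```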

View File

@ -85,7 +85,7 @@ class NickBrIE(MTVServicesInfoExtractor):
https?:// https?://
(?: (?:
(?P<domain>(?:www\.)?nickjr|mundonick\.uol)\.com\.br| (?P<domain>(?:www\.)?nickjr|mundonick\.uol)\.com\.br|
(?:www\.)?nickjr\.nl (?:www\.)?nickjr\.[a-z]{2}
) )
/(?:programas/)?[^/]+/videos/(?:episodios/)?(?P<id>[^/?\#.]+) /(?:programas/)?[^/]+/videos/(?:episodios/)?(?P<id>[^/?\#.]+)
''' '''
@ -98,6 +98,9 @@ class NickBrIE(MTVServicesInfoExtractor):
}, { }, {
'url': 'http://www.nickjr.nl/paw-patrol/videos/311-ge-wol-dig-om-terug-te-zijn/', 'url': 'http://www.nickjr.nl/paw-patrol/videos/311-ge-wol-dig-om-terug-te-zijn/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.nickjr.de/blaze-und-die-monster-maschinen/videos/f6caaf8f-e4e8-4cc1-b489-9380d6dcd059/',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -163,7 +163,7 @@ class NiconicoIE(InfoExtractor):
self._login() self._login()
def _login(self): def _login(self):
(username, password) = self._get_login_info() username, password = self._get_login_info()
# No authentication to be performed # No authentication to be performed
if not username: if not username:
return True return True

View File

@ -13,38 +13,11 @@ from ..utils import (
) )
class NineCNineMediaBaseIE(InfoExtractor): class NineCNineMediaIE(InfoExtractor):
_API_BASE_TEMPLATE = 'http://capi.9c9media.com/destinations/%s/platforms/desktop/contents/%s/'
class NineCNineMediaStackIE(NineCNineMediaBaseIE):
IE_NAME = '9c9media:stack'
_GEO_COUNTRIES = ['CA']
_VALID_URL = r'9c9media:stack:(?P<destination_code>[^:]+):(?P<content_id>\d+):(?P<content_package>\d+):(?P<id>\d+)'
def _real_extract(self, url):
destination_code, content_id, package_id, stack_id = re.match(self._VALID_URL, url).groups()
stack_base_url_template = self._API_BASE_TEMPLATE + 'contentpackages/%s/stacks/%s/manifest.'
stack_base_url = stack_base_url_template % (destination_code, content_id, package_id, stack_id)
formats = []
formats.extend(self._extract_m3u8_formats(
stack_base_url + 'm3u8', stack_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False))
formats.extend(self._extract_f4m_formats(
stack_base_url + 'f4m', stack_id,
f4m_id='hds', fatal=False))
self._sort_formats(formats)
return {
'id': stack_id,
'formats': formats,
}
class NineCNineMediaIE(NineCNineMediaBaseIE):
IE_NAME = '9c9media' IE_NAME = '9c9media'
_GEO_COUNTRIES = ['CA']
_VALID_URL = r'9c9media:(?P<destination_code>[^:]+):(?P<id>\d+)' _VALID_URL = r'9c9media:(?P<destination_code>[^:]+):(?P<id>\d+)'
_API_BASE_TEMPLATE = 'http://capi.9c9media.com/destinations/%s/platforms/desktop/contents/%s/'
def _real_extract(self, url): def _real_extract(self, url):
destination_code, content_id = re.match(self._VALID_URL, url).groups() destination_code, content_id = re.match(self._VALID_URL, url).groups()
@ -58,13 +31,26 @@ class NineCNineMediaIE(NineCNineMediaBaseIE):
content_package = content['ContentPackages'][0] content_package = content['ContentPackages'][0]
package_id = content_package['Id'] package_id = content_package['Id']
content_package_url = api_base_url + 'contentpackages/%s/' % package_id content_package_url = api_base_url + 'contentpackages/%s/' % package_id
content_package = self._download_json(content_package_url, content_id) content_package = self._download_json(
content_package_url, content_id, query={
'$include': '[HasClosedCaptions]',
})
if content_package.get('Constraints', {}).get('Security', {}).get('Type') == 'adobe-drm': if content_package.get('Constraints', {}).get('Security', {}).get('Type'):
raise ExtractorError('This video is DRM protected.', expected=True) raise ExtractorError('This video is DRM protected.', expected=True)
stacks = self._download_json(content_package_url + 'stacks/', package_id)['Items'] manifest_base_url = content_package_url + 'manifest.'
multistacks = len(stacks) > 1 formats = []
formats.extend(self._extract_m3u8_formats(
manifest_base_url + 'm3u8', content_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False))
formats.extend(self._extract_f4m_formats(
manifest_base_url + 'f4m', content_id,
f4m_id='hds', fatal=False))
formats.extend(self._extract_mpd_formats(
manifest_base_url + 'mpd', content_id,
mpd_id='dash', fatal=False))
self._sort_formats(formats)
thumbnails = [] thumbnails = []
for image in content.get('Images', []): for image in content.get('Images', []):
@ -85,10 +71,12 @@ class NineCNineMediaIE(NineCNineMediaBaseIE):
continue continue
container.append(e_name) container.append(e_name)
description = content.get('Desc') or content.get('ShortDesc')
season = content.get('Season', {}) season = content.get('Season', {})
base_info = {
'description': description, info = {
'id': content_id,
'title': title,
'description': content.get('Desc') or content.get('ShortDesc'),
'timestamp': parse_iso8601(content.get('BroadcastDateTime')), 'timestamp': parse_iso8601(content.get('BroadcastDateTime')),
'episode_number': int_or_none(content.get('Episode')), 'episode_number': int_or_none(content.get('Episode')),
'season': season.get('Name'), 'season': season.get('Name'),
@ -97,26 +85,19 @@ class NineCNineMediaIE(NineCNineMediaBaseIE):
'series': content.get('Media', {}).get('Name'), 'series': content.get('Media', {}).get('Name'),
'tags': tags, 'tags': tags,
'categories': categories, 'categories': categories,
'duration': float_or_none(content_package.get('Duration')),
'formats': formats,
} }
entries = [] if content_package.get('HasClosedCaptions'):
for stack in stacks: info['subtitles'] = {
stack_id = compat_str(stack['Id']) 'en': [{
entry = { 'url': manifest_base_url + 'vtt',
'_type': 'url_transparent', 'ext': 'vtt',
'url': '9c9media:stack:%s:%s:%s:%s' % (destination_code, content_id, package_id, stack_id), }, {
'id': stack_id, 'url': manifest_base_url + 'srt',
'title': '%s_part%s' % (title, stack['Name']) if multistacks else title, 'ext': 'srt',
'duration': float_or_none(stack.get('Duration')), }]
'ie_key': 'NineCNineMediaStack',
} }
entry.update(base_info)
entries.append(entry)
return { return info
'_type': 'multi_video',
'id': content_id,
'title': title,
'description': description,
'entries': entries,
}

View File

@ -65,7 +65,7 @@ class NocoIE(InfoExtractor):
self._login() self._login()
def _login(self): def _login(self):
(username, password) = self._get_login_info() username, password = self._get_login_info()
if username is None: if username is None:
return return

View File

@ -237,7 +237,7 @@ class NRKTVIE(NRKBaseIE):
(?:/\d{2}-\d{2}-\d{4})? (?:/\d{2}-\d{2}-\d{4})?
(?:\#del=(?P<part_id>\d+))? (?:\#del=(?P<part_id>\d+))?
''' % _EPISODE_RE ''' % _EPISODE_RE
_API_HOST = 'psapi-ne.nrk.no' _API_HOST = 'psapi-we.nrk.no'
_TESTS = [{ _TESTS = [{
'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014', 'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',

View File

@ -340,7 +340,10 @@ class OpenloadIE(InfoExtractor):
get_element_by_id('streamurj', webpage) or get_element_by_id('streamurj', webpage) or
self._search_regex( self._search_regex(
(r'>\s*([\w-]+~\d{10,}~\d+\.\d+\.0\.0~[\w-]+)\s*<', (r'>\s*([\w-]+~\d{10,}~\d+\.\d+\.0\.0~[\w-]+)\s*<',
r'>\s*([\w~-]+~\d+\.\d+\.\d+\.\d+~[\w~-]+)'), webpage, r'>\s*([\w~-]+~\d+\.\d+\.\d+\.\d+~[\w~-]+)',
r'>\s*([\w-]+~\d{10,}~(?:[a-f\d]+:){2}:~[\w-]+)\s*<',
r'>\s*([\w~-]+~[a-f0-9:]+~[\w~-]+)\s*<',
r'>\s*([\w~-]+~[a-f0-9:]+~[\w~-]+)'), webpage,
'stream URL')) 'stream URL'))
video_url = 'https://openload.co/stream/%s?mime=true' % decoded_id video_url = 'https://openload.co/stream/%s?mime=true' % decoded_id

View File

@ -42,7 +42,7 @@ class PacktPubIE(PacktPubBaseIE):
_TOKEN = None _TOKEN = None
def _real_initialize(self): def _real_initialize(self):
(username, password) = self._get_login_info() username, password = self._get_login_info()
if username is None: if username is None:
return return
try: try:

View File

@ -53,7 +53,7 @@ class PatreonIE(InfoExtractor):
# needed. Keeping this commented for when this inevitably changes. # needed. Keeping this commented for when this inevitably changes.
''' '''
def _login(self): def _login(self):
(username, password) = self._get_login_info() username, password = self._get_login_info()
if username is None: if username is None:
return return

View File

@ -505,7 +505,7 @@ class PBSIE(InfoExtractor):
if player: if player:
video_info = self._parse_json( video_info = self._parse_json(
self._search_regex( self._search_regex(
r'(?s)PBS\.videoData\s*=\s*({.+?});\n', [r'(?s)PBS\.videoData\s*=\s*({.+?});\n', r'window\.videoBridge\s*=\s*({.+?});'],
player, '%s video data' % page, default='{}'), player, '%s video data' % page, default='{}'),
display_id, transform_source=js_to_json, fatal=False) display_id, transform_source=js_to_json, fatal=False)
if video_info: if video_info:
@ -513,10 +513,14 @@ class PBSIE(InfoExtractor):
if not info: if not info:
info = video_info info = video_info
if not chapters: if not chapters:
raw_chapters = video_info.get('chapters') or []
if not raw_chapters:
for chapter_data in re.findall(r'(?s)chapters\.push\(({.*?})\)', player): for chapter_data in re.findall(r'(?s)chapters\.push\(({.*?})\)', player):
chapter = self._parse_json(chapter_data, video_id, js_to_json, fatal=False) chapter = self._parse_json(chapter_data, video_id, js_to_json, fatal=False)
if not chapter: if not chapter:
continue continue
raw_chapters.append(chapter)
for chapter in raw_chapters:
start_time = float_or_none(chapter.get('start_time'), 1000) start_time = float_or_none(chapter.get('start_time'), 1000)
duration = float_or_none(chapter.get('duration'), 1000) duration = float_or_none(chapter.get('duration'), 1000)
if start_time is None or duration is None: if start_time is None or duration is None:

View File

@@ -0,0 +1,228 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    int_or_none,
+    parse_resolution,
+    try_get,
+    unified_timestamp,
+    urljoin,
+)
+
+
+class PeerTubeIE(InfoExtractor):
+    _INSTANCES_RE = r'''(?:
+                            # Taken from https://instances.joinpeertube.org/instances
+                            tube\.openalgeria\.org|
+                            peertube\.pointsecu\.fr|
+                            peertube\.nogafa\.org|
+                            peertube\.pl|
+                            megatube\.lilomoino\.fr|
+                            peertube\.tamanoir\.foucry\.net|
+                            peertube\.inapurna\.org|
+                            peertube\.netzspielplatz\.de|
+                            video\.deadsuperhero\.com|
+                            peertube\.devosi\.org|
+                            peertube\.1312\.media|
+                            tube\.worldofhauru\.xyz|
+                            tube\.bootlicker\.party|
+                            skeptikon\.fr|
+                            peertube\.geekshell\.fr|
+                            tube\.opportunis\.me|
+                            peertube\.peshane\.net|
+                            video\.blueline\.mg|
+                            tube\.homecomputing\.fr|
+                            videos\.cloudfrancois\.fr|
+                            peertube\.viviers-fibre\.net|
+                            tube\.ouahpiti\.info|
+                            video\.tedomum\.net|
+                            video\.g3l\.org|
+                            fontube\.fr|
+                            peertube\.gaialabs\.ch|
+                            peertube\.extremely\.online|
+                            peertube\.public-infrastructure\.eu|
+                            tube\.kher\.nl|
+                            peertube\.qtg\.fr|
+                            tube\.22decembre\.eu|
+                            facegirl\.me|
+                            video\.migennes\.net|
+                            janny\.moe|
+                            tube\.p2p\.legal|
+                            video\.atlanti\.se|
+                            troll\.tv|
+                            peertube\.geekael\.fr|
+                            vid\.leotindall\.com|
+                            video\.anormallostpod\.ovh|
+                            p-tube\.h3z\.jp|
+                            tube\.darfweb\.eu|
+                            videos\.iut-orsay\.fr|
+                            peertube\.solidev\.net|
+                            videos\.symphonie-of-code\.fr|
+                            testtube\.ortg\.de|
+                            videos\.cemea\.org|
+                            peertube\.gwendalavir\.eu|
+                            video\.passageenseine\.fr|
+                            videos\.festivalparminous\.org|
+                            peertube\.touhoppai\.moe|
+                            peertube\.duckdns\.org|
+                            sikke\.fi|
+                            peertube\.mastodon\.host|
+                            firedragonvideos\.com|
+                            vidz\.dou\.bet|
+                            peertube\.koehn\.com|
+                            peer\.hostux\.social|
+                            share\.tube|
+                            peertube\.walkingmountains\.fr|
+                            medias\.libox\.fr|
+                            peertube\.moe|
+                            peertube\.xyz|
+                            jp\.peertube\.network|
+                            videos\.benpro\.fr|
+                            tube\.otter\.sh|
+                            peertube\.angristan\.xyz|
+                            peertube\.parleur\.net|
+                            peer\.ecutsa\.fr|
+                            peertube\.heraut\.eu|
+                            peertube\.tifox\.fr|
+                            peertube\.maly\.io|
+                            vod\.mochi\.academy|
+                            exode\.me|
+                            coste\.video|
+                            tube\.aquilenet\.fr|
+                            peertube\.gegeweb\.eu|
+                            framatube\.org|
+                            thinkerview\.video|
+                            tube\.conferences-gesticulees\.net|
+                            peertube\.datagueule\.tv|
+                            video\.lqdn\.fr|
+                            meilleurtube\.delire\.party|
+                            tube\.mochi\.academy|
+                            peertube\.dav\.li|
+                            media\.zat\.im|
+                            pytu\.be|
+                            peertube\.valvin\.fr|
+                            peertube\.nsa\.ovh|
+                            video\.colibris-outilslibres\.org|
+                            video\.hispagatos\.org|
+                            tube\.svnet\.fr|
+                            peertube\.video|
+                            videos\.lecygnenoir\.info|
+                            peertube3\.cpy\.re|
+                            peertube2\.cpy\.re|
+                            videos\.tcit\.fr|
+                            peertube\.cpy\.re
+                        )'''
+    _VALID_URL = r'''(?x)
+                    https?://
+                        %s
+                        /(?:videos/(?:watch|embed)|api/v\d/videos)/
+                        (?P<id>[^/?\#&]+)
+                    ''' % _INSTANCES_RE
+    _TESTS = [{
+        'url': 'https://peertube.moe/videos/watch/2790feb0-8120-4e63-9af3-c943c69f5e6c',
+        'md5': '80f24ff364cc9d333529506a263e7feb',
+        'info_dict': {
+            'id': '2790feb0-8120-4e63-9af3-c943c69f5e6c',
+            'ext': 'mp4',
+            'title': 'wow',
+            'description': 'wow such video, so gif',
+            'thumbnail': r're:https?://.*\.(?:jpg|png)',
+            'timestamp': 1519297480,
+            'upload_date': '20180222',
+            'uploader': 'Luclu7',
+            'uploader_id': '7fc42640-efdb-4505-a45d-a15b1a5496f1',
+            'uploader_url': 'https://peertube.nsa.ovh/accounts/luclu7',
+            'license': 'Unknown',
+            'duration': 3,
+            'view_count': int,
+            'like_count': int,
+            'dislike_count': int,
+            'tags': list,
+            'categories': list,
+        }
+    }, {
+        'url': 'https://peertube.tamanoir.foucry.net/videos/watch/0b04f13d-1e18-4f1d-814e-4979aa7c9c44',
+        'only_matching': True,
+    }, {
+        # nsfw
+        'url': 'https://tube.22decembre.eu/videos/watch/9bb88cd3-9959-46d9-9ab9-33d2bb704c39',
+        'only_matching': True,
+    }, {
+        'url': 'https://tube.22decembre.eu/videos/embed/fed67262-6edb-4d1c-833b-daa9085c71d7',
+        'only_matching': True,
+    }, {
+        'url': 'https://tube.openalgeria.org/api/v1/videos/c1875674-97d0-4c94-a058-3f7e64c962e8',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def _extract_urls(webpage):
+        return [
+            mobj.group('url')
+            for mobj in re.finditer(
+                r'''(?x)<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//%s/videos/embed/[^/?\#&]+)\1'''
+                % PeerTubeIE._INSTANCES_RE, webpage)]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        video = self._download_json(
+            urljoin(url, '/api/v1/videos/%s' % video_id), video_id)
+
+        title = video['name']
+
+        formats = []
+        for file_ in video['files']:
+            if not isinstance(file_, dict):
+                continue
+            file_url = file_.get('fileUrl')
+            if not file_url or not isinstance(file_url, compat_str):
+                continue
+            file_size = int_or_none(file_.get('size'))
+            format_id = try_get(
+                file_, lambda x: x['resolution']['label'], compat_str)
+            f = parse_resolution(format_id)
+            f.update({
+                'url': file_url,
+                'format_id': format_id,
+                'filesize': file_size,
+            })
+            formats.append(f)
+        self._sort_formats(formats)
+
+        def account_data(field):
+            return try_get(video, lambda x: x['account'][field], compat_str)
+
+        category = try_get(video, lambda x: x['category']['label'], compat_str)
+        categories = [category] if category else None
+
+        nsfw = video.get('nsfw')
+        if isinstance(nsfw, bool):
+            age_limit = 18 if nsfw else 0
+        else:
+            age_limit = None
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video.get('description'),
+            'thumbnail': urljoin(url, video.get('thumbnailPath')),
+            'timestamp': unified_timestamp(video.get('publishedAt')),
+            'uploader': account_data('displayName'),
+            'uploader_id': account_data('uuid'),
+            'uploader_url': account_data('url'),
+            'license': try_get(
+                video, lambda x: x['licence']['label'], compat_str),
+            'duration': int_or_none(video.get('duration')),
+            'view_count': int_or_none(video.get('views')),
+            'like_count': int_or_none(video.get('likes')),
+            'dislike_count': int_or_none(video.get('dislikes')),
+            'age_limit': age_limit,
+            'tags': try_get(video, lambda x: x['tags'], list),
+            'categories': categories,
+            'formats': formats,
+        }
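Since every PeerTube instance exposes the same public REST API the extractor above relies on, the core of the extraction can be reproduced standalone. A sketch assuming a reachable instance and a public video UUID (both placeholders; field names `files`, `fileUrl` and `resolution.label` come from the diff):

```python
import json
import urllib.request

def fetch_peertube_formats(instance, video_id):
    # Query the instance's REST API, as the extractor does via
    # urljoin(url, '/api/v1/videos/%s' % video_id).
    api_url = 'https://%s/api/v1/videos/%s' % (instance, video_id)
    with urllib.request.urlopen(api_url) as resp:
        video = json.loads(resp.read().decode())
    # Each entry in video['files'] carries a direct URL and a resolution label.
    return [(f.get('resolution', {}).get('label'), f.get('fileUrl'))
            for f in video.get('files', [])]

# Hypothetical usage; instance and UUID are placeholders:
# print(fetch_peertube_formats('framatube.org', '2790feb0-8120-4e63-9af3-c943c69f5e6c'))
```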

View File

@@ -94,7 +94,7 @@ class PluralsightIE(PluralsightBaseIE):
         self._login()

     def _login(self):
-        (username, password) = self._get_login_info()
+        username, password = self._get_login_info()
         if username is None:
             return
@@ -140,10 +140,10 @@ class PluralsightIE(PluralsightBaseIE):
             raise ExtractorError('Unable to log in')

-    def _get_subtitles(self, author, clip_id, lang, name, duration, video_id):
+    def _get_subtitles(self, author, clip_idx, lang, name, duration, video_id):
         captions_post = {
             'a': author,
-            'cn': clip_id,
+            'cn': clip_idx,
             'lc': lang,
             'm': name,
         }
@@ -195,13 +195,13 @@ class PluralsightIE(PluralsightBaseIE):
         author = qs.get('author', [None])[0]
         name = qs.get('name', [None])[0]
-        clip_id = qs.get('clip', [None])[0]
+        clip_idx = qs.get('clip', [None])[0]
         course_name = qs.get('course', [None])[0]

-        if any(not f for f in (author, name, clip_id, course_name,)):
+        if any(not f for f in (author, name, clip_idx, course_name,)):
             raise ExtractorError('Invalid URL', expected=True)

-        display_id = '%s-%s' % (name, clip_id)
+        display_id = '%s-%s' % (name, clip_idx)

         course = self._download_course(course_name, url, display_id)
@@ -217,7 +217,7 @@ class PluralsightIE(PluralsightBaseIE):
             clip_index = clip_.get('index')
             if clip_index is None:
                 continue
-            if compat_str(clip_index) == clip_id:
+            if compat_str(clip_index) == clip_idx:
                 clip = clip_
                 break
@@ -225,6 +225,7 @@ class PluralsightIE(PluralsightBaseIE):
             raise ExtractorError('Unable to resolve clip')

         title = clip['title']
+        clip_id = clip.get('clipName') or clip.get('name') or clip['clipId']

         QUALITIES = {
             'low': {'width': 640, 'height': 480},
@@ -277,7 +278,7 @@ class PluralsightIE(PluralsightBaseIE):
         clip_post = {
             'author': author,
             'includeCaptions': False,
-            'clipIndex': int(clip_id),
+            'clipIndex': int(clip_idx),
             'courseName': course_name,
             'locale': 'en',
             'moduleName': name,
@@ -330,10 +331,10 @@ class PluralsightIE(PluralsightBaseIE):
         # TODO: other languages?
         subtitles = self.extract_subtitles(
-            author, clip_id, 'en', name, duration, display_id)
+            author, clip_idx, 'en', name, duration, display_id)

         return {
-            'id': clip.get('clipName') or clip['name'],
+            'id': clip_id,
             'title': title,
             'duration': duration,
             'creator': author,

View File

@ -19,7 +19,7 @@ class RDSIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '604333', 'id': '604333',
'display_id': 'fowler-jr-prend-la-direction-de-jacksonville', 'display_id': 'fowler-jr-prend-la-direction-de-jacksonville',
'ext': 'mp4', 'ext': 'flv',
'title': 'Fowler Jr. prend la direction de Jacksonville', 'title': 'Fowler Jr. prend la direction de Jacksonville',
'description': 'Dante Fowler Jr. est le troisième choix du repêchage 2015 de la NFL. ', 'description': 'Dante Fowler Jr. est le troisième choix du repêchage 2015 de la NFL. ',
'timestamp': 1430397346, 'timestamp': 1430397346,

View File

@@ -47,7 +47,7 @@ class RedditIE(InfoExtractor):
 class RedditRIE(InfoExtractor):
-    _VALID_URL = r'(?P<url>https?://(?:www\.)?reddit\.com/r/[^/]+/comments/(?P<id>[^/?#&]+))'
+    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?reddit\.com/r/[^/]+/comments/(?P<id>[^/?#&]+))'
     _TESTS = [{
         'url': 'https://www.reddit.com/r/videos/comments/6rrwyj/that_small_heart_attack/',
         'info_dict': {
@@ -74,6 +74,10 @@ class RedditRIE(InfoExtractor):
         # imgur
         'url': 'https://www.reddit.com/r/MadeMeSmile/comments/6t7wi5/wait_for_it/',
         'only_matching': True,
+    }, {
+        # imgur @ old reddit
+        'url': 'https://old.reddit.com/r/MadeMeSmile/comments/6t7wi5/wait_for_it/',
+        'only_matching': True,
     }, {
         # streamable
         'url': 'https://www.reddit.com/r/videos/comments/6t7sg9/comedians_hilarious_joke_about_the_guam_flag/',
@@ -82,6 +86,10 @@ class RedditRIE(InfoExtractor):
         # youtube
         'url': 'https://www.reddit.com/r/videos/comments/6t75wq/southern_man_tries_to_speak_without_an_accent/',
         'only_matching': True,
+    }, {
+        # reddit video @ nm reddit
+        'url': 'https://nm.reddit.com/r/Cricket/comments/8idvby/lousy_cameraman_finds_himself_in_cairns_line_of/',
+        'only_matching': True,
     }]
     def _real_extract(self, url):
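The loosened host pattern is easy to sanity-check: `(?:[^/]+\.)?reddit\.com` now accepts bare, `www.`, `old.`, and regional subdomains alike. A quick standalone check of the pattern as committed:

import re

VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?reddit\.com/r/[^/]+/comments/(?P<id>[^/?#&]+))'

for url in (
        'https://www.reddit.com/r/videos/comments/6rrwyj/that_small_heart_attack/',
        'https://old.reddit.com/r/MadeMeSmile/comments/6t7wi5/wait_for_it/',
        'https://nm.reddit.com/r/Cricket/comments/8idvby/lousy_cameraman_finds_himself_in_cairns_line_of/'):
    mobj = re.match(VALID_URL, url)
    print(mobj.group('id'))  # 6rrwyj, then 6t7wi5, then 8idvby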
View File
@@ -50,7 +50,7 @@ class RoosterTeethIE(InfoExtractor):
     }]
     def _login(self):
-        (username, password) = self._get_login_info()
+        username, password = self._get_login_info()
         if username is None:
             return
View File
@@ -27,7 +27,7 @@ class SafariBaseIE(InfoExtractor):
         self._login()
     def _login(self):
-        (username, password) = self._get_login_info()
+        username, password = self._get_login_info()
         if username is None:
             return
View File
@@ -64,7 +64,7 @@ class SinaIE(InfoExtractor):
             # The video id is in the redirected url
             self.to_screen('Getting video id')
             request = HEADRequest(url)
-            (_, urlh) = self._download_webpage_handle(request, 'NA', False)
+            _, urlh = self._download_webpage_handle(request, 'NA', False)
             return self._real_extract(urlh.geturl())
         else:
             pseudo_id = mobj.group('pseudo_id')
View File
@@ -181,7 +181,6 @@ class SoundcloudIE(InfoExtractor):
         thumbnail = info.get('artwork_url') or info.get('user', {}).get('avatar_url')
         if isinstance(thumbnail, compat_str):
             thumbnail = thumbnail.replace('-large', '-t500x500')
-        ext = 'mp3'
         result = {
             'id': track_id,
             'uploader': info.get('user', {}).get('username'),
@@ -215,8 +214,11 @@ class SoundcloudIE(InfoExtractor):
                 track_id, 'Downloading track url', query=query)
             for key, stream_url in format_dict.items():
-                abr = int_or_none(self._search_regex(
-                    r'_(\d+)_url', key, 'audio bitrate', default=None))
+                ext, abr = 'mp3', None
+                mobj = re.search(r'_([^_]+)_(\d+)_url', key)
+                if mobj:
+                    ext, abr = mobj.groups()
+                    abr = int(abr)
                 if key.startswith('http'):
                     stream_formats = [{
                         'format_id': key,
@@ -234,11 +236,12 @@ class SoundcloudIE(InfoExtractor):
                     }]
                 elif key.startswith('hls'):
                     stream_formats = self._extract_m3u8_formats(
-                        stream_url, track_id, 'mp3', entry_protocol='m3u8_native',
+                        stream_url, track_id, ext, entry_protocol='m3u8_native',
                         m3u8_id=key, fatal=False)
                 else:
                     continue
+                if abr:
                     for f in stream_formats:
                         f['abr'] = abr
@@ -250,7 +253,7 @@ class SoundcloudIE(InfoExtractor):
             formats.append({
                 'format_id': 'fallback',
                 'url': update_url_query(info['stream_url'], query),
-                'ext': ext,
+                'ext': 'mp3',
             })
             for f in formats:
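The new key parsing recovers both the container and the bitrate from transcoding keys such as `http_mp3_128_url` or `hls_opus_64_url`; the old pattern only captured the digits and hard-coded mp3. A standalone sketch of the same parsing (the keys are illustrative, not captured API output):

import re

for key in ('http_mp3_128_url', 'hls_opus_64_url', 'preview_url'):
    ext, abr = 'mp3', None          # defaults, as in the extractor
    mobj = re.search(r'_([^_]+)_(\d+)_url', key)
    if mobj:
        ext, abr = mobj.groups()
        abr = int(abr)
    print(key, ext, abr)
# http_mp3_128_url mp3 128
# hls_opus_64_url opus 64
# preview_url mp3 None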
View File
@@ -11,9 +11,9 @@ from .nexx import (
 from .spiegeltv import SpiegeltvIE
 from ..compat import compat_urlparse
 from ..utils import (
-    extract_attributes,
-    unified_strdate,
-    get_element_by_attribute,
+    parse_duration,
+    strip_or_none,
+    unified_timestamp,
 )
@@ -21,35 +21,38 @@ class SpiegelIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?spiegel\.de/video/[^/]*-(?P<id>[0-9]+)(?:-embed|-iframe)?(?:\.html)?(?:#.*)?$'
     _TESTS = [{
         'url': 'http://www.spiegel.de/video/vulkan-tungurahua-in-ecuador-ist-wieder-aktiv-video-1259285.html',
-        'md5': '2c2754212136f35fb4b19767d242f66e',
+        'md5': 'b57399839d055fccfeb9a0455c439868',
         'info_dict': {
-            'id': '1259285',
+            'id': '563747',
             'ext': 'mp4',
             'title': 'Vulkanausbruch in Ecuador: Der "Feuerschlund" ist wieder aktiv',
             'description': 'md5:8029d8310232196eb235d27575a8b9f4',
             'duration': 49,
             'upload_date': '20130311',
+            'timestamp': 1362994320,
         },
     }, {
         'url': 'http://www.spiegel.de/video/schach-wm-videoanalyse-des-fuenften-spiels-video-1309159.html',
-        'md5': 'f2cdf638d7aa47654e251e1aee360af1',
+        'md5': '5b6c2f4add9d62912ed5fc78a1faed80',
         'info_dict': {
-            'id': '1309159',
+            'id': '580988',
             'ext': 'mp4',
             'title': 'Schach-WM in der Videoanalyse: Carlsen nutzt die Fehlgriffe des Titelverteidigers',
             'description': 'md5:c2322b65e58f385a820c10fa03b2d088',
             'duration': 983,
             'upload_date': '20131115',
+            'timestamp': 1384546642,
         },
     }, {
         'url': 'http://www.spiegel.de/video/astronaut-alexander-gerst-von-der-iss-station-beantwortet-fragen-video-1519126-embed.html',
-        'md5': 'd8eeca6bfc8f1cd6f490eb1f44695d51',
+        'md5': '97b91083a672d72976faa8433430afb9',
         'info_dict': {
-            'id': '1519126',
+            'id': '601883',
             'ext': 'mp4',
             'description': 'SPIEGEL ONLINE-Nutzer durften den deutschen Astronauten Alexander Gerst über sein Leben auf der ISS-Station befragen. Hier kommen seine Antworten auf die besten sechs Fragen.',
             'title': 'Fragen an Astronaut Alexander Gerst: "Bekommen Sie die Tageszeiten mit?"',
             'upload_date': '20140904',
+            'timestamp': 1409834160,
         }
     }, {
         'url': 'http://www.spiegel.de/video/astronaut-alexander-gerst-von-der-iss-station-beantwortet-fragen-video-1519126-iframe.html',
@@ -62,59 +65,28 @@ class SpiegelIE(InfoExtractor):
     def _real_extract(self, url):
         video_id = self._match_id(url)
-        webpage, handle = self._download_webpage_handle(url, video_id)
+        metadata_url = 'http://www.spiegel.de/video/metadata/video-%s.json' % video_id
+        handle = self._request_webpage(metadata_url, video_id)
         # 302 to spiegel.tv, like http://www.spiegel.de/video/der-film-zum-wochenende-die-wahrheit-ueber-maenner-video-99003272.html
         if SpiegeltvIE.suitable(handle.geturl()):
             return self.url_result(handle.geturl(), 'Spiegeltv')
-        nexx_id = self._search_regex(
-            r'nexxOmniaId\s*:\s*(\d+)', webpage, 'nexx id', default=None)
-        if nexx_id:
-            domain_id = NexxIE._extract_domain_id(webpage) or '748'
-            return self.url_result(
-                'nexx:%s:%s' % (domain_id, nexx_id), ie=NexxIE.ie_key(),
-                video_id=nexx_id)
-        video_data = extract_attributes(self._search_regex(r'(<div[^>]+id="spVideoElements"[^>]+>)', webpage, 'video element', default=''))
-        title = video_data.get('data-video-title') or get_element_by_attribute('class', 'module-title', webpage)
-        description = video_data.get('data-video-teaser') or self._html_search_meta('description', webpage, 'description')
-        base_url = self._search_regex(
-            [r'server\s*:\s*(["\'])(?P<url>.+?)\1', r'var\s+server\s*=\s*"(?P<url>[^"]+)\"'],
-            webpage, 'server URL', group='url')
-        xml_url = base_url + video_id + '.xml'
-        idoc = self._download_xml(xml_url, video_id)
-        formats = []
-        for n in list(idoc):
-            if n.tag.startswith('type') and n.tag != 'type6':
-                format_id = n.tag.rpartition('type')[2]
-                video_url = base_url + n.find('./filename').text
-                formats.append({
-                    'format_id': format_id,
-                    'url': video_url,
-                    'width': int(n.find('./width').text),
-                    'height': int(n.find('./height').text),
-                    'abr': int(n.find('./audiobitrate').text),
-                    'vbr': int(n.find('./videobitrate').text),
-                    'vcodec': n.find('./codec').text,
-                    'acodec': 'MP4A',
-                })
-        duration = float(idoc[0].findall('./duration')[0].text)
-        self._check_formats(formats, video_id)
-        self._sort_formats(formats)
+        video_data = self._parse_json(self._webpage_read_content(
+            handle, metadata_url, video_id), video_id)
+        title = video_data['title']
+        nexx_id = video_data['nexxOmniaId']
+        domain_id = video_data.get('nexxOmniaDomain') or '748'
         return {
+            '_type': 'url_transparent',
             'id': video_id,
+            'url': 'nexx:%s:%s' % (domain_id, nexx_id),
             'title': title,
-            'description': description.strip() if description else None,
-            'duration': duration,
-            'upload_date': unified_strdate(video_data.get('data-video-date')),
-            'formats': formats,
+            'description': strip_or_none(video_data.get('teaser')),
+            'duration': parse_duration(video_data.get('duration')),
+            'timestamp': unified_timestamp(video_data.get('datum')),
+            'ie_key': NexxIE.ie_key(),
         }
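The rewritten extractor no longer scrapes the page at all: it fetches a small JSON document and hands playback off to the Nexx extractor through a `url_transparent` result. A sketch of that handoff for a hypothetical metadata payload (field names follow the diff; the values are invented):

video_id = '1259285'
metadata_url = 'http://www.spiegel.de/video/metadata/video-%s.json' % video_id

# Hypothetical response body, shaped after the fields the diff reads.
video_data = {
    'title': 'Vulkanausbruch in Ecuador',
    'nexxOmniaId': 563747,
    'nexxOmniaDomain': '748',
    'duration': '00:00:49',
    'datum': '2013-03-11T12:32:00+01:00',
}

result = {
    '_type': 'url_transparent',
    'id': video_id,
    # youtube-dl resolves nexx:<domain>:<id> URLs with NexxIE
    'url': 'nexx:%s:%s' % (video_data.get('nexxOmniaDomain') or '748',
                           video_data['nexxOmniaId']),
    'title': video_data['title'],
}
print(result['url'])  # nexx:748:563747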
View File
@@ -1,55 +1,46 @@
 from __future__ import unicode_literals
-import re
 from .mtv import MTVServicesInfoExtractor
-class SpikeIE(MTVServicesInfoExtractor):
-    _VALID_URL = r'https?://(?:[^/]+\.)?spike\.com/[^/]+/[\da-z]{6}(?:[/?#&]|$)'
+class BellatorIE(MTVServicesInfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?bellator\.com/[^/]+/[\da-z]{6}(?:[/?#&]|$)'
     _TESTS = [{
-        'url': 'http://www.spike.com/video-clips/lhtu8m/auction-hunters-can-allen-ride-a-hundred-year-old-motorcycle',
-        'md5': '1a9265f32b0c375793d6c4ce45255256',
+        'url': 'http://www.bellator.com/fight/atwr7k/bellator-158-michael-page-vs-evangelista-cyborg',
         'info_dict': {
-            'id': 'b9c8221a-4e50-479a-b86d-3333323e38ba',
+            'id': 'b55e434e-fde1-4a98-b7cc-92003a034de4',
             'ext': 'mp4',
-            'title': 'Auction Hunters|December 27, 2013|4|414|Can Allen Ride A Hundred Year-Old Motorcycle?',
-            'description': 'md5:fbed7e82ed5fad493615b3094a9499cb',
-            'timestamp': 1388120400,
-            'upload_date': '20131227',
+            'title': 'Douglas Lima vs. Paul Daley - Round 1',
+            'description': 'md5:805a8dd29310fd611d32baba2f767885',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
         },
     }, {
-        'url': 'http://www.spike.com/full-episodes/j830qm/lip-sync-battle-joel-mchale-vs-jim-rash-season-2-ep-209',
-        'md5': 'b25c6f16418aefb9ad5a6cae2559321f',
+        'url': 'http://www.bellator.com/video-clips/bw6k7n/bellator-158-foundations-michael-venom-page',
+        'only_matching': True,
+    }]
+    _FEED_URL = 'http://www.spike.com/feeds/mrss/'
+    _GEO_COUNTRIES = ['US']
+class ParamountNetworkIE(MTVServicesInfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?paramountnetwork\.com/[^/]+/[\da-z]{6}(?:[/?#&]|$)'
+    _TESTS = [{
+        'url': 'http://www.paramountnetwork.com/episodes/j830qm/lip-sync-battle-joel-mchale-vs-jim-rash-season-2-ep-13',
         'info_dict': {
             'id': '37ace3a8-1df6-48be-85b8-38df8229e241',
             'ext': 'mp4',
             'title': 'Lip Sync Battle|April 28, 2016|2|209|Joel McHale Vs. Jim Rash|Act 1',
             'description': 'md5:a739ca8f978a7802f67f8016d27ce114',
         },
-    }, {
-        'url': 'http://www.spike.com/video-clips/lhtu8m/',
-        'only_matching': True,
-    }, {
-        'url': 'http://www.spike.com/video-clips/lhtu8m',
-        'only_matching': True,
-    }, {
-        'url': 'http://bellator.spike.com/fight/atwr7k/bellator-158-michael-page-vs-evangelista-cyborg',
-        'only_matching': True,
-    }, {
-        'url': 'http://bellator.spike.com/video-clips/bw6k7n/bellator-158-foundations-michael-venom-page',
-        'only_matching': True,
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
     }]
-    _FEED_URL = 'http://www.spike.com/feeds/mrss/'
-    _MOBILE_TEMPLATE = 'http://m.spike.com/videos/video.rbml?id=%s'
-    _CUSTOM_URL_REGEX = re.compile(r'spikenetworkapp://([^/]+/[-a-fA-F0-9]+)')
+    _FEED_URL = 'http://www.paramountnetwork.com/feeds/mrss/'
     _GEO_COUNTRIES = ['US']
-    def _extract_mgid(self, webpage):
-        mgid = super(SpikeIE, self)._extract_mgid(webpage)
-        if mgid is None:
-            url_parts = self._search_regex(self._CUSTOM_URL_REGEX, webpage, 'episode_id')
-            video_type, episode_id = url_parts.split('/', 1)
-            mgid = 'mgid:arc:{0}:spike.com:{1}'.format(video_type, episode_id)
-        return mgid
View File
@@ -1,35 +1,34 @@
 # coding: utf-8
 from __future__ import unicode_literals
-import binascii
-import re
 import json
 from .common import InfoExtractor
-from ..compat import (
-    compat_b64decode,
-    compat_ord,
-)
 from ..utils import (
-    ExtractorError,
-    qualities,
     determine_ext,
+    ExtractorError,
+    int_or_none,
+    mimetype2ext,
+    parse_duration,
+    parse_iso8601,
+    qualities,
 )
 class TeamcocoIE(InfoExtractor):
-    _VALID_URL = r'https?://teamcoco\.com/video/(?P<video_id>[0-9]+)?/?(?P<display_id>.*)'
+    _VALID_URL = r'https?://teamcoco\.com/(?P<id>([^/]+/)*[^/?#]+)'
     _TESTS = [
         {
-            'url': 'http://teamcoco.com/video/80187/conan-becomes-a-mary-kay-beauty-consultant',
-            'md5': '3f7746aa0dc86de18df7539903d399ea',
+            'url': 'http://teamcoco.com/video/mary-kay-remote',
+            'md5': '55d532f81992f5c92046ad02fec34d7d',
             'info_dict': {
                 'id': '80187',
                 'ext': 'mp4',
                 'title': 'Conan Becomes A Mary Kay Beauty Consultant',
                 'description': 'Mary Kay is perhaps the most trusted name in female beauty, so of course Conan is a natural choice to sell their products.',
-                'duration': 504,
-                'age_limit': 0,
+                'duration': 495.0,
+                'upload_date': '20140402',
+                'timestamp': 1396407600,
             }
         }, {
             'url': 'http://teamcoco.com/video/louis-ck-interview-george-w-bush',
@@ -40,7 +39,8 @@ class TeamcocoIE(InfoExtractor):
                 'description': 'Louis C.K. got starstruck by George W. Bush, so what? Part one.',
                 'title': 'Louis C.K. Interview Pt. 1 11/3/11',
                 'duration': 288,
-                'age_limit': 0,
+                'upload_date': '20111104',
+                'timestamp': 1320405840,
             }
         }, {
             'url': 'http://teamcoco.com/video/timothy-olyphant-drinking-whiskey',
@@ -49,6 +49,8 @@ class TeamcocoIE(InfoExtractor):
                 'ext': 'mp4',
                 'title': 'Timothy Olyphant Raises A Toast To “Justified”',
                 'description': 'md5:15501f23f020e793aeca761205e42c24',
+                'upload_date': '20150415',
+                'timestamp': 1429088400,
             },
             'params': {
                 'skip_download': True,  # m3u8 downloads
@@ -63,110 +65,111 @@ class TeamcocoIE(InfoExtractor):
             },
             'params': {
                 'skip_download': True,  # m3u8 downloads
-            }
+            },
+            'skip': 'This video is no longer available.',
+        }, {
+            'url': 'http://teamcoco.com/video/the-conan-audiencey-awards-for-04/25/18',
+            'only_matching': True,
+        }, {
+            'url': 'http://teamcoco.com/italy/conan-jordan-schlansky-hit-the-streets-of-florence',
+            'only_matching': True,
+        }, {
+            'url': 'http://teamcoco.com/haiti/conan-s-haitian-history-lesson',
+            'only_matching': True,
+        }, {
+            'url': 'http://teamcoco.com/israel/conan-hits-the-streets-beaches-of-tel-aviv',
+            'only_matching': True,
         }
     ]
-    _VIDEO_ID_REGEXES = (
-        r'"eVar42"\s*:\s*(\d+)',
-        r'Ginger\.TeamCoco\.openInApp\("video",\s*"([^"]+)"',
-        r'"id_not"\s*:\s*(\d+)'
-    )
+    def _graphql_call(self, query_template, object_type, object_id):
+        find_object = 'find' + object_type
+        return self._download_json(
+            'http://teamcoco.com/graphql/', object_id, data=json.dumps({
+                'query': query_template % (find_object, object_id)
+            }))['data'][find_object]
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        display_id = mobj.group('display_id')
-        webpage, urlh = self._download_webpage_handle(url, display_id)
-        if 'src=expired' in urlh.geturl():
-            raise ExtractorError('This video is expired.', expected=True)
-        video_id = mobj.group('video_id')
-        if not video_id:
-            video_id = self._html_search_regex(
-                self._VIDEO_ID_REGEXES, webpage, 'video id')
-        data = None
-        preload_codes = self._html_search_regex(
-            r'(function.+)setTimeout\(function\(\)\{playlist',
-            webpage, 'preload codes')
-        base64_fragments = re.findall(r'"([a-zA-Z0-9+/=]+)"', preload_codes)
-        base64_fragments.remove('init')
-        def _check_sequence(cur_fragments):
-            if not cur_fragments:
-                return
-            for i in range(len(cur_fragments)):
-                cur_sequence = (''.join(cur_fragments[i:] + cur_fragments[:i])).encode('ascii')
-                try:
-                    raw_data = compat_b64decode(cur_sequence)
-                    if compat_ord(raw_data[0]) == compat_ord('{'):
-                        return json.loads(raw_data.decode('utf-8'))
-                except (TypeError, binascii.Error, UnicodeDecodeError, ValueError):
-                    continue
-        def _check_data():
-            for i in range(len(base64_fragments) + 1):
-                for j in range(i, len(base64_fragments) + 1):
-                    data = _check_sequence(base64_fragments[:i] + base64_fragments[j:])
-                    if data:
-                        return data
-        self.to_screen('Try to compute possible data sequence. This may take some time.')
-        data = _check_data()
-        if not data:
-            raise ExtractorError(
-                'Preload information could not be extracted', expected=True)
+        display_id = self._match_id(url)
+        response = self._graphql_call('''{
+  %s(slug: "%s") {
+    ... on RecordSlug {
+      record {
+        id
+        title
+        teaser
+        publishOn
+        thumb {
+          preview
+        }
+        file {
+          url
+        }
+        tags {
+          name
+        }
+        duration
+      }
+    }
+    ... on NotFoundSlug {
+      status
+    }
+  }
+}''', 'Slug', display_id)
+        if response.get('status'):
+            raise ExtractorError('This video is no longer available.', expected=True)
+        record = response['record']
+        video_id = record['id']
+        video_sources = self._graphql_call('''{
+  %s(id: "%s") {
+    src
+  }
+}''', 'RecordVideoSource', video_id) or {}
         formats = []
-        get_quality = qualities(['500k', '480p', '1000k', '720p', '1080p'])
-        for filed in data['files']:
-            if determine_ext(filed['url']) == 'm3u8':
-                # compat_urllib_parse.urljoin does not work here
-                if filed['url'].startswith('/'):
-                    m3u8_url = 'http://ht.cdn.turner.com/tbs/big/teamcoco' + filed['url']
-                else:
-                    m3u8_url = filed['url']
-                m3u8_formats = self._extract_m3u8_formats(
-                    m3u8_url, video_id, ext='mp4')
-                for m3u8_format in m3u8_formats:
-                    if m3u8_format not in formats:
-                        formats.append(m3u8_format)
-            elif determine_ext(filed['url']) == 'f4m':
-                # TODO Correct f4m extraction
+        get_quality = qualities(['low', 'sd', 'hd', 'uhd'])
+        for format_id, src in video_sources.get('src', {}).items():
+            if not isinstance(src, dict):
                 continue
+            src_url = src.get('src')
+            if not src_url:
+                continue
+            ext = determine_ext(src_url, mimetype2ext(src.get('type')))
+            if format_id == 'hls' or ext == 'm3u8':
+                # compat_urllib_parse.urljoin does not work here
+                if src_url.startswith('/'):
+                    src_url = 'http://ht.cdn.turner.com/tbs/big/teamcoco' + src_url
+                formats.extend(self._extract_m3u8_formats(
+                    src_url, video_id, 'mp4', m3u8_id=format_id, fatal=False))
             else:
-                if filed['url'].startswith('/mp4:protected/'):
+                if src_url.startswith('/mp4:protected/'):
                     # TODO Correct extraction for these files
                     continue
-                m_format = re.search(r'(\d+(k|p))\.mp4', filed['url'])
-                if m_format is not None:
-                    format_id = m_format.group(1)
-                else:
-                    format_id = filed['bitrate']
-                tbr = (
-                    int(filed['bitrate'])
-                    if filed['bitrate'].isdigit()
-                    else None)
+                tbr = int_or_none(self._search_regex(
+                    r'(\d+)k\.mp4', src_url, 'tbr', default=None))
                 formats.append({
-                    'url': filed['url'],
-                    'ext': 'mp4',
+                    'url': src_url,
+                    'ext': ext,
                     'tbr': tbr,
                     'format_id': format_id,
                     'quality': get_quality(format_id),
                 })
+        if not formats:
+            formats = self._extract_m3u8_formats(
+                record['file']['url'], video_id, 'mp4', fatal=False)
         self._sort_formats(formats)
         return {
             'id': video_id,
             'display_id': display_id,
             'formats': formats,
-            'title': data['title'],
-            'thumbnail': data.get('thumb', {}).get('href'),
-            'description': data.get('teaser'),
-            'duration': data.get('duration'),
-            'age_limit': self._family_friendly_search(webpage),
+            'title': record['title'],
+            'thumbnail': record.get('thumb', {}).get('preview'),
+            'description': record.get('teaser'),
+            'duration': parse_duration(record.get('duration')),
+            'timestamp': parse_iso8601(record.get('publishOn')),
         }
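The `_graphql_call` helper just templates a GraphQL document and POSTs it as JSON; the brute-force base64 reassembly is gone entirely. A standalone sketch of the payload it would build for a slug lookup (no network involved; the endpoint and shape follow the diff):

import json

def build_graphql_payload(query_template, object_type, object_id):
    # Mirrors _graphql_call: 'Slug' becomes the findSlug root field.
    find_object = 'find' + object_type
    return json.dumps({'query': query_template % (find_object, object_id)})

query = '''{
  %s(slug: "%s") {
    ... on RecordSlug { record { id title } }
    ... on NotFoundSlug { status }
  }
}'''
print(build_graphql_payload(query, 'Slug', 'video/mary-kay-remote'))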
View File
@@ -32,7 +32,7 @@ class TennisTVIE(InfoExtractor):
     _NETRC_MACHINE = 'tennistv'
     def _login(self):
-        (username, password) = self._get_login_info()
+        username, password = self._get_login_info()
         if not username or not password:
             raise ExtractorError('No login info available, needed for using %s.' % self.IE_NAME, expected=True)
View File
@@ -36,7 +36,7 @@ class TubiTvIE(InfoExtractor):
     }]
     def _login(self):
-        (username, password) = self._get_login_info()
+        username, password = self._get_login_info()
         if username is None:
             return
         self.report_login()
View File
@@ -4,11 +4,18 @@ from __future__ import unicode_literals
 import re
 from .common import InfoExtractor
-from ..utils import int_or_none
+from ..compat import compat_str
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    urlencode_postdata
+)
 class TumblrIE(InfoExtractor):
     _VALID_URL = r'https?://(?P<blog_name>[^/?#&]+)\.tumblr\.com/(?:post|video)/(?P<id>[0-9]+)(?:$|[/?#])'
+    _NETRC_MACHINE = 'tumblr'
+    _LOGIN_URL = 'https://www.tumblr.com/login'
     _TESTS = [{
         'url': 'http://tatianamaslanydaily.tumblr.com/post/54196191430/orphan-black-dvd-extra-behind-the-scenes',
         'md5': '479bb068e5b16462f5176a6828829767',
@@ -97,6 +104,45 @@ class TumblrIE(InfoExtractor):
         'add_ie': ['Instagram'],
     }]
+    def _real_initialize(self):
+        self._login()
+    def _login(self):
+        username, password = self._get_login_info()
+        if username is None:
+            return
+        login_page = self._download_webpage(
+            self._LOGIN_URL, None, 'Downloading login page')
+        login_form = self._hidden_inputs(login_page)
+        login_form.update({
+            'user[email]': username,
+            'user[password]': password
+        })
+        response, urlh = self._download_webpage_handle(
+            self._LOGIN_URL, None, 'Logging in',
+            data=urlencode_postdata(login_form), headers={
+                'Content-Type': 'application/x-www-form-urlencoded',
+                'Referer': self._LOGIN_URL,
+            })
+        # Successful login
+        if '/dashboard' in urlh.geturl():
+            return
+        login_errors = self._parse_json(
+            self._search_regex(
+                r'RegistrationForm\.errors\s*=\s*(\[.+?\])\s*;', response,
+                'login errors', default='[]'),
+            None, fatal=False)
+        if login_errors:
+            raise ExtractorError(
+                'Unable to login: %s' % login_errors[0], expected=True)
+        self.report_warning('Login has probably failed')
     def _real_extract(self, url):
         m_url = re.match(self._VALID_URL, url)
         video_id = m_url.group('id')
@@ -105,11 +151,19 @@ class TumblrIE(InfoExtractor):
         url = 'http://%s.tumblr.com/post/%s/' % (blog, video_id)
         webpage, urlh = self._download_webpage_handle(url, video_id)
+        redirect_url = compat_str(urlh.geturl())
+        if 'tumblr.com/safe-mode' in redirect_url or redirect_url.startswith('/safe-mode'):
+            raise ExtractorError(
+                'This Tumblr may contain sensitive media. '
+                'Disable safe mode in your account settings '
+                'at https://www.tumblr.com/settings/account#safe_mode',
+                expected=True)
         iframe_url = self._search_regex(
             r'src=\'(https?://www\.tumblr\.com/video/[^\']+)\'',
             webpage, 'iframe url', default=None)
         if iframe_url is None:
-            return self.url_result(urlh.geturl(), 'Generic')
+            return self.url_result(redirect_url, 'Generic')
         iframe = self._download_webpage(iframe_url, video_id, 'Downloading iframe page')
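The login flow is plain form-based auth: scrape the hidden inputs from the login page, add the credentials, and POST everything back URL-encoded. A minimal standalone sketch of the body construction (the hidden `form_key` field is a made-up example of what `_hidden_inputs` might scrape):

try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

# Hypothetical hidden inputs scraped from the login page.
login_form = {'form_key': 'abc123', 'context': 'login'}
login_form.update({
    'user[email]': 'user@example.com',
    'user[password]': 'hunter2',
})
# urlencode_postdata in youtube-dl does essentially this and returns
# bytes suitable for an HTTP POST body.
post_data = urlencode(sorted(login_form.items())).encode('ascii')
print(post_data)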
View File
@@ -62,7 +62,7 @@ class TuneInBaseIE(InfoExtractor):
         return {
             'id': content_id,
-            'title': title,
+            'title': self._live_title(title) if is_live else title,
             'formats': formats,
             'thumbnail': thumbnail,
             'location': location,
View File
@@ -227,14 +227,16 @@ class TVPlayIE(InfoExtractor):
     def _real_extract(self, url):
         url, smuggled_data = unsmuggle_url(url, {})
-        self._initialize_geo_bypass(smuggled_data.get('geo_countries'))
+        self._initialize_geo_bypass({
+            'countries': smuggled_data.get('geo_countries'),
+        })
         video_id = self._match_id(url)
         geo_country = self._search_regex(
             r'https?://[^/]+\.([a-z]{2})', url,
             'geo country', default=None)
         if geo_country:
-            self._initialize_geo_bypass([geo_country.upper()])
+            self._initialize_geo_bypass({'countries': [geo_country.upper()]})
         video = self._download_json(
             'http://playapi.mtgx.tv/v3/videos/%s' % video_id, video_id, 'Downloading video JSON')
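The country guess comes straight from the TLD of the requested host, and `_initialize_geo_bypass` now takes a dict so hints beyond a bare country list can be added later. Checking the TLD regex standalone (the URL is illustrative):

import re

url = 'http://www.viafree.se/program/underhallning/example'  # hypothetical
geo_country = re.search(r'https?://[^/]+\.([a-z]{2})', url)
if geo_country:
    print({'countries': [geo_country.group(1).upper()]})  # {'countries': ['SE']}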
View File
@@ -8,6 +8,7 @@ import random
 from .common import InfoExtractor
 from ..compat import (
     compat_HTTPError,
+    compat_kwargs,
     compat_parse_qs,
     compat_str,
     compat_urllib_parse_urlencode,
@@ -16,11 +17,14 @@ from ..compat import (
 from ..utils import (
     clean_html,
     ExtractorError,
+    float_or_none,
     int_or_none,
-    js_to_json,
     orderedSet,
     parse_duration,
     parse_iso8601,
+    qualities,
+    try_get,
+    unified_timestamp,
     update_url_query,
     urlencode_postdata,
     urljoin,
@@ -45,10 +49,11 @@ class TwitchBaseIE(InfoExtractor):
                 '%s returned error: %s - %s' % (self.IE_NAME, error, response.get('message')),
                 expected=True)
-    def _call_api(self, path, item_id, note):
+    def _call_api(self, path, item_id, *args, **kwargs):
+        kwargs.setdefault('headers', {})['Client-ID'] = self._CLIENT_ID
         response = self._download_json(
-            '%s/%s' % (self._API_BASE, path), item_id, note,
-            headers={'Client-ID': self._CLIENT_ID})
+            '%s/%s' % (self._API_BASE, path), item_id,
+            *args, **compat_kwargs(kwargs))
         self._handle_error(response)
         return response
@@ -56,7 +61,7 @@ class TwitchBaseIE(InfoExtractor):
         self._login()
     def _login(self):
-        (username, password) = self._get_login_info()
+        username, password = self._get_login_info()
         if username is None:
             return
@@ -168,6 +173,13 @@ class TwitchItemBaseIE(TwitchBaseIE):
         return self.playlist_result(entries, info['id'], info['title'])
     def _extract_info(self, info):
+        status = info.get('status')
+        if status == 'recording':
+            is_live = True
+        elif status == 'recorded':
+            is_live = False
+        else:
+            is_live = None
         return {
             'id': info['_id'],
             'title': info.get('title') or 'Untitled Broadcast',
@@ -178,6 +190,7 @@ class TwitchItemBaseIE(TwitchBaseIE):
             'uploader_id': info.get('channel', {}).get('name'),
             'timestamp': parse_iso8601(info.get('recorded_at')),
             'view_count': int_or_none(info.get('views')),
+            'is_live': is_live,
         }
     def _real_extract(self, url):
@@ -614,21 +627,23 @@ class TwitchStreamIE(TwitchBaseIE):
     }
-class TwitchClipsIE(InfoExtractor):
+class TwitchClipsIE(TwitchBaseIE):
     IE_NAME = 'twitch:clips'
     _VALID_URL = r'https?://clips\.twitch\.tv/(?:[^/]+/)*(?P<id>[^/?#&]+)'
     _TESTS = [{
-        'url': 'https://clips.twitch.tv/ea/AggressiveCobraPoooound',
+        'url': 'https://clips.twitch.tv/FaintLightGullWholeWheat',
         'md5': '761769e1eafce0ffebfb4089cb3847cd',
         'info_dict': {
-            'id': 'AggressiveCobraPoooound',
+            'id': '42850523',
             'ext': 'mp4',
             'title': 'EA Play 2016 Live from the Novo Theatre',
             'thumbnail': r're:^https?://.*\.jpg',
+            'timestamp': 1465767393,
+            'upload_date': '20160612',
             'creator': 'EA',
             'uploader': 'stereotype_',
-            'uploader_id': 'stereotype_',
+            'uploader_id': '43566419',
         },
     }, {
         # multiple formats
@@ -639,34 +654,63 @@ class TwitchClipsIE(InfoExtractor):
     def _real_extract(self, url):
         video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-        clip = self._parse_json(
-            self._search_regex(
-                r'(?s)clipInfo\s*=\s*({.+?});', webpage, 'clip info'),
-            video_id, transform_source=js_to_json)
-        title = clip.get('title') or clip.get('channel_title') or self._og_search_title(webpage)
-        formats = [{
-            'url': option['source'],
-            'format_id': option.get('quality'),
-            'height': int_or_none(option.get('quality')),
-        } for option in clip.get('quality_options', []) if option.get('source')]
-        if not formats:
-            formats = [{
-                'url': clip['clip_video_url'],
-            }]
+        status = self._download_json(
+            'https://clips.twitch.tv/api/v2/clips/%s/status' % video_id,
+            video_id)
+        formats = []
+        for option in status['quality_options']:
+            if not isinstance(option, dict):
+                continue
+            source = option.get('source')
+            if not source or not isinstance(source, compat_str):
+                continue
+            formats.append({
+                'url': source,
+                'format_id': option.get('quality'),
+                'height': int_or_none(option.get('quality')),
+                'fps': int_or_none(option.get('frame_rate')),
+            })
         self._sort_formats(formats)
-        return {
-            'id': video_id,
-            'title': title,
-            'thumbnail': self._og_search_thumbnail(webpage),
-            'creator': clip.get('broadcaster_display_name') or clip.get('broadcaster_login'),
-            'uploader': clip.get('curator_login'),
-            'uploader_id': clip.get('curator_display_name'),
+        info = {
             'formats': formats,
         }
+        clip = self._call_api(
+            'kraken/clips/%s' % video_id, video_id, fatal=False, headers={
+                'Accept': 'application/vnd.twitchtv.v5+json',
+            })
+        if clip:
+            quality_key = qualities(('tiny', 'small', 'medium'))
+            thumbnails = []
+            thumbnails_dict = clip.get('thumbnails')
+            if isinstance(thumbnails_dict, dict):
+                for thumbnail_id, thumbnail_url in thumbnails_dict.items():
+                    thumbnails.append({
+                        'id': thumbnail_id,
+                        'url': thumbnail_url,
+                        'preference': quality_key(thumbnail_id),
+                    })
+            info.update({
+                'id': clip.get('tracking_id') or video_id,
+                'title': clip.get('title') or video_id,
+                'duration': float_or_none(clip.get('duration')),
+                'views': int_or_none(clip.get('views')),
+                'timestamp': unified_timestamp(clip.get('created_at')),
+                'thumbnails': thumbnails,
+                'creator': try_get(clip, lambda x: x['broadcaster']['display_name'], compat_str),
+                'uploader': try_get(clip, lambda x: x['curator']['display_name'], compat_str),
+                'uploader_id': try_get(clip, lambda x: x['curator']['id'], compat_str),
+            })
+        else:
+            info.update({
+                'title': video_id,
+                'id': video_id,
+            })
+        return info
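Making `_call_api` accept arbitrary `*args`/`**kwargs` lets callers pass extra headers (like the v5 `Accept` header above) while the base class still injects the `Client-ID` without clobbering them. The merging idiom in isolation (the client id value is a hypothetical placeholder):

def call_api(path, **kwargs):
    # setdefault returns the headers dict the caller passed (or a new one),
    # so the Client-ID is merged in alongside caller-supplied headers.
    kwargs.setdefault('headers', {})['Client-ID'] = 'XXXXXXXX'  # placeholder
    return kwargs['headers']

print(call_api(
    'kraken/clips/42850523',
    headers={'Accept': 'application/vnd.twitchtv.v5+json'}))
# {'Accept': 'application/vnd.twitchtv.v5+json', 'Client-ID': 'XXXXXXXX'}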
View File
@@ -18,6 +18,7 @@ from ..utils import (
     int_or_none,
     js_to_json,
     sanitized_Request,
+    try_get,
     unescapeHTML,
     urlencode_postdata,
 )
@@ -58,6 +59,10 @@ class UdemyIE(InfoExtractor):
         # no url in outputs format entry
         'url': 'https://www.udemy.com/learn-web-development-complete-step-by-step-guide-to-success/learn/v4/t/lecture/4125812',
         'only_matching': True,
+    }, {
+        # only outputs rendition
+        'url': 'https://www.udemy.com/how-you-can-help-your-local-community-5-amazing-examples/learn/v4/t/lecture/3225750?start=0',
+        'only_matching': True,
     }]
     def _extract_course_info(self, webpage, video_id):
@@ -101,7 +106,7 @@ class UdemyIE(InfoExtractor):
             % (course_id, lecture_id),
             lecture_id, 'Downloading lecture JSON', query={
                 'fields[lecture]': 'title,description,view_html,asset',
-                'fields[asset]': 'asset_type,stream_url,thumbnail_url,download_urls,data',
+                'fields[asset]': 'asset_type,stream_url,thumbnail_url,download_urls,stream_urls,captions,data',
             })
     def _handle_error(self, response):
@@ -115,9 +120,9 @@ class UdemyIE(InfoExtractor):
             error_str += ' - %s' % error_data.get('formErrors')
         raise ExtractorError(error_str, expected=True)
-    def _download_webpage(self, *args, **kwargs):
+    def _download_webpage_handle(self, *args, **kwargs):
         kwargs.setdefault('headers', {})['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.4'
-        return super(UdemyIE, self)._download_webpage(
+        return super(UdemyIE, self)._download_webpage_handle(
             *args, **compat_kwargs(kwargs))
     def _download_json(self, url_or_request, *args, **kwargs):
@@ -146,7 +151,7 @@ class UdemyIE(InfoExtractor):
         self._login()
     def _login(self):
-        (username, password) = self._get_login_info()
+        username, password = self._get_login_info()
         if username is None:
             return
@@ -299,9 +304,25 @@ class UdemyIE(InfoExtractor):
                     'url': src,
                 })
-        download_urls = asset.get('download_urls')
-        if isinstance(download_urls, dict):
-            extract_formats(download_urls.get('Video'))
+        for url_kind in ('download', 'stream'):
+            urls = asset.get('%s_urls' % url_kind)
+            if isinstance(urls, dict):
+                extract_formats(urls.get('Video'))
+        captions = asset.get('captions')
+        if isinstance(captions, list):
+            for cc in captions:
+                if not isinstance(cc, dict):
+                    continue
+                cc_url = cc.get('url')
+                if not cc_url or not isinstance(cc_url, compat_str):
+                    continue
+                lang = try_get(cc, lambda x: x['locale']['locale'], compat_str)
+                sub_dict = (automatic_captions if cc.get('source') == 'auto'
+                            else subtitles)
+                sub_dict.setdefault(lang or 'en', []).append({
+                    'url': cc_url,
+                })
         view_html = lecture.get('view_html')
         if view_html:
@@ -357,6 +378,12 @@ class UdemyIE(InfoExtractor):
                 fatal=False)
             extract_subtitles(text_tracks)
+        if not formats and outputs:
+            for format_id, output in outputs.items():
+                f = extract_output_format(output, format_id)
+                if f.get('url'):
+                    formats.append(f)
         self._sort_formats(formats, field_preference=('height', 'width', 'tbr', 'format_id'))
         return {
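The caption handling sorts each track into `subtitles` or `automatic_captions` based on the API's `source` flag and keys tracks by locale. A pure-Python sketch over a hypothetical captions list (shaped after the fields read above, not a captured payload):

captions = [
    {'url': 'https://example.com/en.vtt', 'source': 'manual',
     'locale': {'locale': 'en_US'}},
    {'url': 'https://example.com/en-auto.vtt', 'source': 'auto',
     'locale': {'locale': 'en_US'}},
]

subtitles, automatic_captions = {}, {}
for cc in captions:
    lang = (cc.get('locale') or {}).get('locale')
    sub_dict = automatic_captions if cc.get('source') == 'auto' else subtitles
    sub_dict.setdefault(lang or 'en', []).append({'url': cc['url']})

print(sorted(subtitles), sorted(automatic_captions))  # ['en_US'] ['en_US']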
View File
@@ -3,13 +3,16 @@ from __future__ import unicode_literals
 from .common import InfoExtractor
 from ..utils import (
+    ExtractorError,
     parse_duration,
     parse_iso8601,
+    urlencode_postdata,
 )
 class UFCTVIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?ufc\.tv/video/(?P<id>[^/]+)'
+    _NETRC_MACHINE = 'ufctv'
     _TEST = {
         'url': 'https://www.ufc.tv/video/ufc-219-countdown-full-episode',
         'info_dict': {
@@ -26,6 +29,21 @@ class UFCTVIE(InfoExtractor):
         }
     }
+    def _real_initialize(self):
+        username, password = self._get_login_info()
+        if username is None:
+            return
+        code = self._download_json(
+            'https://www.ufc.tv/secure/authenticate',
+            None, 'Logging in', data=urlencode_postdata({
+                'username': username,
+                'password': password,
+                'format': 'json',
+            })).get('code')
+        if code and code != 'loginsuccess':
+            raise ExtractorError(code, expected=True)
     def _real_extract(self, url):
         display_id = self._match_id(url)
         video_data = self._download_json(url, display_id, query={
View File
@@ -75,7 +75,7 @@ class VesselIE(InfoExtractor):
             'Access to this content is restricted. (%s said: %s)' % (self.IE_NAME, err_code), expected=True)
     def _login(self):
-        (username, password) = self._get_login_info()
+        username, password = self._get_login_info()
         if username is None:
             return
         self.report_login()
View File

@ -1,24 +1,27 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import base64
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
clean_html, clean_html,
determine_ext, determine_ext,
int_or_none, int_or_none,
js_to_json, js_to_json,
parse_age_limit,
parse_duration, parse_duration,
) )
class ViewLiftBaseIE(InfoExtractor): class ViewLiftBaseIE(InfoExtractor):
_DOMAINS_REGEX = r'(?:snagfilms|snagxtreme|funnyforfree|kiddovid|winnersview|monumentalsportsnetwork|vayafilm)\.com|kesari\.tv' _DOMAINS_REGEX = r'(?:snagfilms|snagxtreme|funnyforfree|kiddovid|winnersview|(?:monumental|lax)sportsnetwork|vayafilm)\.com|hoichoi\.tv'
class ViewLiftEmbedIE(ViewLiftBaseIE): class ViewLiftEmbedIE(ViewLiftBaseIE):
_VALID_URL = r'https?://(?:(?:www|embed)\.)?(?:%s)/embed/player\?.*\bfilmId=(?P<id>[\da-f-]{36})' % ViewLiftBaseIE._DOMAINS_REGEX _VALID_URL = r'https?://(?:(?:www|embed)\.)?(?:%s)/embed/player\?.*\bfilmId=(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})' % ViewLiftBaseIE._DOMAINS_REGEX
_TESTS = [{ _TESTS = [{
'url': 'http://embed.snagfilms.com/embed/player?filmId=74849a00-85a9-11e1-9660-123139220831&w=500', 'url': 'http://embed.snagfilms.com/embed/player?filmId=74849a00-85a9-11e1-9660-123139220831&w=500',
'md5': '2924e9215c6eff7a55ed35b72276bd93', 'md5': '2924e9215c6eff7a55ed35b72276bd93',
@ -60,8 +63,10 @@ class ViewLiftEmbedIE(ViewLiftBaseIE):
formats = [] formats = []
has_bitrate = False has_bitrate = False
for source in self._parse_json(js_to_json(self._search_regex( sources = self._parse_json(self._search_regex(
r'(?s)sources:\s*(\[.+?\]),', webpage, 'json')), video_id): r'(?s)sources:\s*(\[.+?\]),', webpage,
'sources', default='[]'), video_id, js_to_json)
for source in sources:
file_ = source.get('file') file_ = source.get('file')
if not file_: if not file_:
continue continue
@ -70,7 +75,8 @@ class ViewLiftEmbedIE(ViewLiftBaseIE):
format_id = source.get('label') or ext format_id = source.get('label') or ext
if all(v in ('m3u8', 'hls') for v in (type_, ext)): if all(v in ('m3u8', 'hls') for v in (type_, ext)):
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
file_, video_id, 'mp4', m3u8_id='hls')) file_, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
else: else:
bitrate = int_or_none(self._search_regex( bitrate = int_or_none(self._search_regex(
[r'(\d+)kbps', r'_\d{1,2}x\d{1,2}_(\d{3,})\.%s' % ext], [r'(\d+)kbps', r'_\d{1,2}x\d{1,2}_(\d{3,})\.%s' % ext],
@ -85,6 +91,13 @@ class ViewLiftEmbedIE(ViewLiftBaseIE):
'tbr': bitrate, 'tbr': bitrate,
'height': height, 'height': height,
}) })
if not formats:
hls_url = self._parse_json(self._search_regex(
r'filmInfo\.src\s*=\s*({.+?});',
webpage, 'src'), video_id, js_to_json)['src']
formats = self._extract_m3u8_formats(
hls_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False)
field_preference = None if has_bitrate else ('height', 'tbr', 'format_id') field_preference = None if has_bitrate else ('height', 'tbr', 'format_id')
self._sort_formats(formats, field_preference) self._sort_formats(formats, field_preference)
@ -109,10 +122,13 @@ class ViewLiftIE(ViewLiftBaseIE):
'display_id': 'lost_for_life', 'display_id': 'lost_for_life',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Lost for Life', 'title': 'Lost for Life',
'description': 'md5:fbdacc8bb6b455e464aaf98bc02e1c82', 'description': 'md5:ea10b5a50405ae1f7b5269a6ec594102',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'duration': 4489, 'duration': 4489,
'categories': ['Documentary', 'Crime', 'Award Winning', 'Festivals'] 'categories': 'mincount:3',
'age_limit': 14,
'upload_date': '20150421',
'timestamp': 1429656819,
} }
}, { }, {
'url': 'http://www.snagfilms.com/show/the_world_cut_project/india', 'url': 'http://www.snagfilms.com/show/the_world_cut_project/india',
@ -125,7 +141,9 @@ class ViewLiftIE(ViewLiftBaseIE):
'description': 'md5:5c168c5a8f4719c146aad2e0dfac6f5f', 'description': 'md5:5c168c5a8f4719c146aad2e0dfac6f5f',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'duration': 979, 'duration': 979,
'categories': ['Documentary', 'Sports', 'Politics'] 'categories': 'mincount:2',
'timestamp': 1399478279,
'upload_date': '20140507',
} }
}, { }, {
# Film is not playable in your area. # Film is not playable in your area.
@ -138,9 +156,6 @@ class ViewLiftIE(ViewLiftBaseIE):
}, { }, {
'url': 'http://www.winnersview.com/videos/the-good-son', 'url': 'http://www.winnersview.com/videos/the-good-son',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.kesari.tv/news/video/1461919076414',
'only_matching': True,
}, { }, {
# Was once Kaltura embed # Was once Kaltura embed
'url': 'https://www.monumentalsportsnetwork.com/videos/john-carlson-postgame-2-25-15', 'url': 'https://www.monumentalsportsnetwork.com/videos/john-carlson-postgame-2-25-15',
@ -156,11 +171,62 @@ class ViewLiftIE(ViewLiftBaseIE):
raise ExtractorError( raise ExtractorError(
'Film %s is not available.' % display_id, expected=True) 'Film %s is not available.' % display_id, expected=True)
initial_store_state = self._search_regex(
r"window\.initialStoreState\s*=.*?JSON\.parse\(unescape\(atob\('([^']+)'\)\)\)",
webpage, 'Initial Store State', default=None)
if initial_store_state:
modules = self._parse_json(compat_urllib_parse_unquote(base64.b64decode(
initial_store_state).decode()), display_id)['page']['data']['modules']
content_data = next(m['contentData'][0] for m in modules if m.get('moduleType') == 'VideoDetailModule')
gist = content_data['gist']
film_id = gist['id']
title = gist['title']
video_assets = content_data['streamingInfo']['videoAssets']
formats = []
mpeg_video_assets = video_assets.get('mpeg') or []
for video_asset in mpeg_video_assets:
video_asset_url = video_asset.get('url')
if not video_asset:
continue
bitrate = int_or_none(video_asset.get('bitrate'))
height = int_or_none(self._search_regex(
r'^_?(\d+)[pP]$', video_asset.get('renditionValue'),
'height', default=None))
formats.append({
'url': video_asset_url,
'format_id': 'http%s' % ('-%d' % bitrate if bitrate else ''),
'tbr': bitrate,
'height': height,
'vcodec': video_asset.get('codec'),
})
hls_url = video_assets.get('hls')
if hls_url:
formats.extend(self._extract_m3u8_formats(
hls_url, film_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
self._sort_formats(formats, ('height', 'tbr', 'format_id'))
info = {
'id': film_id,
'display_id': display_id,
'title': title,
'description': gist.get('description'),
'thumbnail': gist.get('videoImageUrl'),
'duration': int_or_none(gist.get('runtime')),
'age_limit': parse_age_limit(content_data.get('parentalRating')),
'timestamp': int_or_none(gist.get('publishDate'), 1000),
'formats': formats,
}
for k in ('categories', 'tags'):
info[k] = [v['title'] for v in content_data.get(k, []) if v.get('title')]
return info
else:
film_id = self._search_regex(r'filmId=([\da-f-]{36})"', webpage, 'film id') film_id = self._search_regex(r'filmId=([\da-f-]{36})"', webpage, 'film id')
snag = self._parse_json( snag = self._parse_json(
self._search_regex( self._search_regex(
r'Snag\.page\.data\s*=\s*(\[.+?\]);', webpage, 'snag'), r'Snag\.page\.data\s*=\s*(\[.+?\]);', webpage, 'snag', default='[]'),
display_id) display_id)
for item in snag: for item in snag:
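The new code path mirrors in Python what the site does in JavaScript: `JSON.parse(unescape(atob(...)))` becomes base64-decode, percent-decode, then `json.loads`. A round-trip sketch with a toy payload (the module layout imitates the fields the extractor reads; it is not captured site data):

import base64
import json
try:
    from urllib.parse import quote, unquote  # Python 3
except ImportError:
    from urllib import quote, unquote  # Python 2

# Encode a toy store state the same way the page does.
state = {'page': {'data': {'modules': [
    {'moduleType': 'VideoDetailModule',
     'contentData': [{'gist': {'id': 'abc', 'title': 'Lost for Life'}}]},
]}}}
blob = base64.b64encode(quote(json.dumps(state)).encode()).decode()

# Decode as the extractor does: atob -> unescape -> JSON.parse.
modules = json.loads(
    unquote(base64.b64decode(blob).decode()))['page']['data']['modules']
content_data = next(
    m['contentData'][0] for m in modules
    if m.get('moduleType') == 'VideoDetailModule')
print(content_data['gist']['title'])  # Lost for Life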
View File
@@ -88,7 +88,7 @@ class VikiBaseIE(InfoExtractor):
         self._login()
     def _login(self):
-        (username, password) = self._get_login_info()
+        username, password = self._get_login_info()
         if username is None:
             return
View File
@@ -16,6 +16,7 @@ from ..utils import (
     ExtractorError,
     InAdvancePagedList,
     int_or_none,
+    merge_dicts,
     NO_DEFAULT,
     RegexNotFoundError,
     sanitized_Request,
@@ -36,7 +37,7 @@ class VimeoBaseInfoExtractor(InfoExtractor):
     _LOGIN_URL = 'https://vimeo.com/log_in'
     def _login(self):
-        (username, password) = self._get_login_info()
+        username, password = self._get_login_info()
         if username is None:
             if self._LOGIN_REQUIRED:
                 raise ExtractorError('No login info available, needed for using %s.' % self.IE_NAME, expected=True)
@@ -639,16 +640,18 @@ class VimeoIE(VimeoBaseInfoExtractor):
                 'preference': 1,
             })
-        info_dict = self._parse_config(config, video_id)
-        formats.extend(info_dict['formats'])
+        info_dict_config = self._parse_config(config, video_id)
+        formats.extend(info_dict_config['formats'])
         self._vimeo_sort_formats(formats)
+        json_ld = self._search_json_ld(webpage, video_id, default={})
         if not cc_license:
             cc_license = self._search_regex(
                 r'<link[^>]+rel=["\']license["\'][^>]+href=(["\'])(?P<license>(?:(?!\1).)+)\1',
                 webpage, 'license', default=None, group='license')
-        info_dict.update({
+        info_dict = {
             'id': video_id,
             'formats': formats,
             'timestamp': unified_timestamp(timestamp),
@@ -658,7 +661,9 @@ class VimeoIE(VimeoBaseInfoExtractor):
             'like_count': like_count,
             'comment_count': comment_count,
             'license': cc_license,
-        })
+        }
+        info_dict = merge_dicts(info_dict, info_dict_config, json_ld)
         return info_dict
@@ -984,10 +989,10 @@ class VimeoWatchLaterIE(VimeoChannelIE):
 class VimeoLikesIE(InfoExtractor):
-    _VALID_URL = r'https://(?:www\.)?vimeo\.com/user(?P<id>[0-9]+)/likes/?(?:$|[?#]|sort:)'
+    _VALID_URL = r'https://(?:www\.)?vimeo\.com/(?P<id>[^/]+)/likes/?(?:$|[?#]|sort:)'
     IE_NAME = 'vimeo:likes'
     IE_DESC = 'Vimeo user likes'
-    _TEST = {
+    _TESTS = [{
         'url': 'https://vimeo.com/user755559/likes/',
         'playlist_mincount': 293,
         'info_dict': {
@@ -995,7 +1000,10 @@ class VimeoLikesIE(InfoExtractor):
             'description': 'See all the videos urza likes',
             'title': 'Videos urza likes',
         },
-    }
+    }, {
+        'url': 'https://vimeo.com/stormlapse/likes',
+        'only_matching': True,
+    }]
     def _real_extract(self, url):
         user_id = self._match_id(url)
@@ -1004,7 +1012,7 @@ class VimeoLikesIE(InfoExtractor):
             self._search_regex(
                 r'''(?x)<li><a\s+href="[^"]+"\s+data-page="([0-9]+)">
                     .*?</a></li>\s*<li\s+class="pagination_next">
-                ''', webpage, 'page count'),
+                ''', webpage, 'page count', default=1),
             'page count', fatal=True)
         PAGE_SIZE = 12
         title = self._html_search_regex(
@@ -1012,7 +1020,7 @@ class VimeoLikesIE(InfoExtractor):
         description = self._html_search_meta('description', webpage)
         def _get_page(idx):
-            page_url = 'https://vimeo.com/user%s/likes/page:%d/sort:date' % (
+            page_url = 'https://vimeo.com/%s/likes/page:%d/sort:date' % (
                 user_id, idx + 1)
             webpage = self._download_webpage(
                 page_url, user_id,
@@ -1032,7 +1040,7 @@ class VimeoLikesIE(InfoExtractor):
         return {
             '_type': 'playlist',
-            'id': 'user%s_likes' % user_id,
+            'id': '%s_likes' % user_id,
             'title': title,
             'description': description,
             'entries': pl,
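Switching from `info_dict.update(...)` to `merge_dicts(info_dict, info_dict_config, json_ld)` changes precedence: earlier dicts win, so the JSON-LD metadata only fills in fields the page config left empty. A sketch of the merge semantics, written from memory of `youtube_dl.utils.merge_dicts` (first non-None value wins; an empty string may still be replaced by a later non-empty one):

def merge_dicts(*dicts):
    # Sketch, not the library function itself.
    merged = {}
    for a_dict in dicts:
        for k, v in a_dict.items():
            if v is None:
                continue
            if (k not in merged
                    or (isinstance(v, str) and v
                        and isinstance(merged[k], str) and not merged[k])):
                merged[k] = v
    return merged

info_dict = {'id': '123', 'uploader': ''}
json_ld = {'uploader': 'urza', 'description': 'from JSON-LD'}
print(merge_dicts(info_dict, json_ld))
# {'id': '123', 'uploader': 'urza', 'description': 'from JSON-LD'}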
View File
@@ -32,7 +32,7 @@ class VKBaseIE(InfoExtractor):
     _NETRC_MACHINE = 'vk'
     def _login(self):
-        (username, password) = self._get_login_info()
+        username, password = self._get_login_info()
         if username is None:
             return
View File
@@ -69,7 +69,7 @@ class WatchBoxIE(InfoExtractor):
         source = self._parse_json(
             self._search_regex(
-                r'(?s)source\s*:\s*({.+?})\s*,\s*\n', webpage, 'source',
+                r'(?s)source["\']?\s*:\s*({.+?})\s*[,}]', webpage, 'source',
                 default='{}'),
             video_id, transform_source=js_to_json, fatal=False) or {}
View File
@@ -9,8 +9,8 @@ from ..utils import int_or_none
 class XiamiBaseIE(InfoExtractor):
     _API_BASE_URL = 'http://www.xiami.com/song/playlist/cat/json/id'
-    def _download_webpage(self, *args, **kwargs):
-        webpage = super(XiamiBaseIE, self)._download_webpage(*args, **kwargs)
+    def _download_webpage_handle(self, *args, **kwargs):
+        webpage = super(XiamiBaseIE, self)._download_webpage_handle(*args, **kwargs)
         if '>Xiami is currently not available in your country.<' in webpage:
             self.raise_geo_restricted('Xiami is currently not available in your country')
         return webpage
View File
@@ -34,8 +34,8 @@ class YandexMusicBaseIE(InfoExtractor):
                 'youtube-dl with --cookies',
                 expected=True)
-    def _download_webpage(self, *args, **kwargs):
-        webpage = super(YandexMusicBaseIE, self)._download_webpage(*args, **kwargs)
+    def _download_webpage_handle(self, *args, **kwargs):
+        webpage = super(YandexMusicBaseIE, self)._download_webpage_handle(*args, **kwargs)
         if 'Нам очень жаль, но&nbsp;запросы, поступившие с&nbsp;вашего IP-адреса, похожи на&nbsp;автоматические.' in webpage:
             self._raise_captcha()
         return webpage
@@ -57,14 +57,14 @@ class YandexMusicTrackIE(YandexMusicBaseIE):
         'info_dict': {
             'id': '4878838',
             'ext': 'mp3',
-            'title': 'Carlo Ambrosio & Fabio Di Bari, Carlo Ambrosio - Gypsy Eyes 1',
+            'title': 'Carlo Ambrosio, Carlo Ambrosio & Fabio Di Bari - Gypsy Eyes 1',
             'filesize': 4628061,
             'duration': 193.04,
             'track': 'Gypsy Eyes 1',
             'album': 'Gypsy Soul',
             'album_artist': 'Carlo Ambrosio',
-            'artist': 'Carlo Ambrosio & Fabio Di Bari, Carlo Ambrosio',
-            'release_year': '2009',
+            'artist': 'Carlo Ambrosio, Carlo Ambrosio & Fabio Di Bari',
+            'release_year': 2009,
         },
         'skip': 'Travis CI servers blocked by YandexMusic',
     }
@@ -120,7 +120,7 @@ class YandexMusicTrackIE(YandexMusicBaseIE):
         track_info.update({
             'album': album.get('title'),
             'album_artist': extract_artist(album.get('artists')),
-            'release_year': compat_str(year) if year else None,
+            'release_year': int_or_none(year),
         })
         track_artist = extract_artist(track.get('artists'))
View File
@ -37,6 +37,7 @@ from ..utils import (
orderedSet, orderedSet,
parse_codecs, parse_codecs,
parse_duration, parse_duration,
qualities,
remove_quotes, remove_quotes,
remove_start, remove_start,
smuggle_url, smuggle_url,
@ -84,7 +85,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
If _LOGIN_REQUIRED is set and no authentication was provided, an error is raised. If _LOGIN_REQUIRED is set and no authentication was provided, an error is raised.
""" """
(username, password) = self._get_login_info() username, password = self._get_login_info()
# No authentication to be performed # No authentication to be performed
if username is None: if username is None:
if self._LOGIN_REQUIRED and self._downloader.params.get('cookiefile') is None: if self._LOGIN_REQUIRED and self._downloader.params.get('cookiefile') is None:
@ -246,9 +247,9 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
return True return True
def _download_webpage(self, *args, **kwargs): def _download_webpage_handle(self, *args, **kwargs):
kwargs.setdefault('query', {})['disable_polymer'] = 'true' kwargs.setdefault('query', {})['disable_polymer'] = 'true'
return super(YoutubeBaseInfoExtractor, self)._download_webpage( return super(YoutubeBaseInfoExtractor, self)._download_webpage_handle(
*args, **compat_kwargs(kwargs)) *args, **compat_kwargs(kwargs))
def _real_initialize(self): def _real_initialize(self):
@ -1537,7 +1538,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
ytplayer_config = self._get_ytplayer_config(video_id, video_webpage) ytplayer_config = self._get_ytplayer_config(video_id, video_webpage)
if ytplayer_config: if ytplayer_config:
args = ytplayer_config['args'] args = ytplayer_config['args']
if args.get('url_encoded_fmt_stream_map'): if args.get('url_encoded_fmt_stream_map') or args.get('hlsvp'):
# Convert to the same format returned by compat_parse_qs # Convert to the same format returned by compat_parse_qs
video_info = dict((k, [v]) for k, v in args.items()) video_info = dict((k, [v]) for k, v in args.items())
add_dash_mpd(video_info) add_dash_mpd(video_info)
@ -1697,9 +1698,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
self.report_information_extraction(video_id) self.report_information_extraction(video_id)
# uploader # uploader
if 'author' not in video_info: video_uploader = try_get(video_info, lambda x: x['author'][0], compat_str)
raise ExtractorError('Unable to extract uploader name') if video_uploader:
video_uploader = compat_urllib_parse_unquote_plus(video_info['author'][0]) video_uploader = compat_urllib_parse_unquote_plus(video_uploader)
else:
self._downloader.report_warning('unable to extract uploader name')
# uploader_id # uploader_id
video_uploader_id = None video_uploader_id = None
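The uploader hunk above trades a hard ExtractorError for try_get(), youtube-dl's tolerant accessor; a short sketch with an invented video_info dict:

from youtube_dl.compat import compat_str
from youtube_dl.utils import try_get

video_info = {'author': ['SomeUploader']}
assert try_get(video_info, lambda x: x['author'][0], compat_str) == 'SomeUploader'
# missing keys raise inside the lambda; try_get swallows that and returns None
assert try_get({}, lambda x: x['author'][0], compat_str) is None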
@ -1813,6 +1816,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
chapters = self._extract_chapters(description_original, video_duration) chapters = self._extract_chapters(description_original, video_duration)
def _extract_filesize(media_url):
return int_or_none(self._search_regex(
r'\bclen[=/](\d+)', media_url, 'filesize', default=None))
if 'conn' in video_info and video_info['conn'][0].startswith('rtmp'): if 'conn' in video_info and video_info['conn'][0].startswith('rtmp'):
self.report_rtmp_download() self.report_rtmp_download()
formats = [{ formats = [{
@ -1838,6 +1845,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'width': int_or_none(width_height[0]), 'width': int_or_none(width_height[0]),
'height': int_or_none(width_height[1]), 'height': int_or_none(width_height[1]),
} }
q = qualities(['small', 'medium', 'hd720'])
formats = [] formats = []
for url_data_str in encoded_url_map.split(','): for url_data_str in encoded_url_map.split(','):
url_data = compat_parse_qs(url_data_str) url_data = compat_parse_qs(url_data_str)
@ -1917,13 +1925,19 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
mobj = re.search(r'^(?P<width>\d+)[xX](?P<height>\d+)$', url_data.get('size', [''])[0]) mobj = re.search(r'^(?P<width>\d+)[xX](?P<height>\d+)$', url_data.get('size', [''])[0])
width, height = (int(mobj.group('width')), int(mobj.group('height'))) if mobj else (None, None) width, height = (int(mobj.group('width')), int(mobj.group('height'))) if mobj else (None, None)
filesize = int_or_none(url_data.get(
'clen', [None])[0]) or _extract_filesize(url)
quality = url_data.get('quality_label', [None])[0] or url_data.get('quality', [None])[0]
more_fields = { more_fields = {
'filesize': int_or_none(url_data.get('clen', [None])[0]), 'filesize': filesize,
'tbr': float_or_none(url_data.get('bitrate', [None])[0], 1000), 'tbr': float_or_none(url_data.get('bitrate', [None])[0], 1000),
'width': width, 'width': width,
'height': height, 'height': height,
'fps': int_or_none(url_data.get('fps', [None])[0]), 'fps': int_or_none(url_data.get('fps', [None])[0]),
'format_note': url_data.get('quality_label', [None])[0] or url_data.get('quality', [None])[0], 'format_note': quality,
'quality': q(quality),
} }
for key, value in more_fields.items(): for key, value in more_fields.items():
if value: if value:
@ -1969,9 +1983,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = 'True' a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = 'True'
formats.append(a_format) formats.append(a_format)
else: else:
unavailable_message = extract_unavailable_message() error_message = clean_html(video_info.get('reason', [None])[0])
if unavailable_message: if not error_message:
raise ExtractorError(unavailable_message, expected=True) error_message = extract_unavailable_message()
if error_message:
raise ExtractorError(error_message, expected=True)
raise ExtractorError('no conn, hlsvp or url_encoded_fmt_stream_map information found in video info') raise ExtractorError('no conn, hlsvp or url_encoded_fmt_stream_map information found in video info')
# Look for the DASH manifest # Look for the DASH manifest
@ -1990,6 +2006,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
for df in self._extract_mpd_formats( for df in self._extract_mpd_formats(
mpd_url, video_id, fatal=dash_mpd_fatal, mpd_url, video_id, fatal=dash_mpd_fatal,
formats_dict=self._formats): formats_dict=self._formats):
if not df.get('filesize'):
df['filesize'] = _extract_filesize(df['url'])
# Do not overwrite DASH format found in some previous DASH manifest # Do not overwrite DASH format found in some previous DASH manifest
if df['format_id'] not in dash_formats: if df['format_id'] not in dash_formats:
dash_formats[df['format_id']] = df dash_formats[df['format_id']] = df
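Two utils helpers do the heavy lifting in the youtube.py hunks above: qualities() builds a ranking function over format labels, and the new _extract_filesize() pulls the clen byte count out of media URLs. A sketch of both (the googlevideo URL is fabricated):

import re

from youtube_dl.utils import qualities

q = qualities(['small', 'medium', 'hd720'])
assert q('hd720') > q('medium') > q('small')  # labels rank by list position
assert q('something-else') == -1              # unknown labels sort below all known ones

url = 'https://r1---sn-example.googlevideo.com/videoplayback?clen=4628061&mime=video%2Fmp4'
print(re.search(r'\bclen[=/](\d+)', url).group(1))  # -> 4628061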

View File

@ -0,0 +1,270 @@
# coding: utf-8
from __future__ import unicode_literals

import re
from uuid import uuid4

from .common import InfoExtractor
from ..compat import (
    compat_HTTPError,
    compat_str,
)
from ..utils import (
    ExtractorError,
    int_or_none,
    try_get,
    urlencode_postdata,
)


class ZattooBaseIE(InfoExtractor):
    _NETRC_MACHINE = 'zattoo'
    _HOST_URL = 'https://zattoo.com'

    _power_guide_hash = None

    def _login(self):
        username, password = self._get_login_info()
        if not username or not password:
            self.raise_login_required(
                'A valid %s account is needed to access this media.'
                % self._NETRC_MACHINE)

        try:
            data = self._download_json(
                '%s/zapi/v2/account/login' % self._HOST_URL, None, 'Logging in',
                data=urlencode_postdata({
                    'login': username,
                    'password': password,
                    'remember': 'true',
                }), headers={
                    'Referer': '%s/login' % self._HOST_URL,
                    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
                })
        except ExtractorError as e:
            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
                raise ExtractorError(
                    'Unable to login: incorrect username and/or password',
                    expected=True)
            raise

        self._power_guide_hash = data['session']['power_guide_hash']

    def _real_initialize(self):
        webpage = self._download_webpage(
            self._HOST_URL, None, 'Downloading app token')

        app_token = self._html_search_regex(
            r'appToken\s*=\s*(["\'])(?P<token>(?:(?!\1).)+?)\1',
            webpage, 'app token', group='token')
        app_version = self._html_search_regex(
            r'<!--\w+-(.+?)-', webpage, 'app version', default='2.8.2')

        # Will setup appropriate cookies
        self._request_webpage(
            '%s/zapi/v2/session/hello' % self._HOST_URL, None,
            'Opening session', data=urlencode_postdata({
                'client_app_token': app_token,
                'uuid': compat_str(uuid4()),
                'lang': 'en',
                'app_version': app_version,
                'format': 'json',
            }))

        self._login()

    def _extract_cid(self, video_id, channel_name):
        channel_groups = self._download_json(
            '%s/zapi/v2/cached/channels/%s' % (self._HOST_URL,
                                               self._power_guide_hash),
            video_id, 'Downloading channel list',
            query={'details': False})['channel_groups']
        channel_list = []
        for chgrp in channel_groups:
            channel_list.extend(chgrp['channels'])
        try:
            return next(
                chan['cid'] for chan in channel_list
                if chan.get('cid') and (
                    chan.get('display_alias') == channel_name or
                    chan.get('cid') == channel_name))
        except StopIteration:
            raise ExtractorError('Could not extract channel id')

    def _extract_cid_and_video_info(self, video_id):
        data = self._download_json(
            '%s/zapi/program/details' % self._HOST_URL,
            video_id,
            'Downloading video information',
            query={
                'program_id': video_id,
                'complete': True
            })

        p = data['program']
        cid = p['cid']

        info_dict = {
            'id': video_id,
            'title': p.get('title') or p['episode_title'],
            'description': p.get('description'),
            'thumbnail': p.get('image_url'),
            'creator': p.get('channel_name'),
            'episode': p.get('episode_title'),
            'episode_number': int_or_none(p.get('episode_number')),
            'season_number': int_or_none(p.get('season_number')),
            'release_year': int_or_none(p.get('year')),
            'categories': try_get(p, lambda x: x['categories'], list),
        }

        return cid, info_dict

    def _extract_formats(self, cid, video_id, record_id=None, is_live=False):
        postdata_common = {
            'https_watch_urls': True,
        }

        if is_live:
            postdata_common.update({'timeshift': 10800})
            url = '%s/zapi/watch/live/%s' % (self._HOST_URL, cid)
        elif record_id:
            url = '%s/zapi/watch/recording/%s' % (self._HOST_URL, record_id)
        else:
            url = '%s/zapi/watch/recall/%s/%s' % (self._HOST_URL, cid, video_id)

        formats = []
        for stream_type in ('dash', 'hls', 'hls5', 'hds'):
            postdata = postdata_common.copy()
            postdata['stream_type'] = stream_type

            data = self._download_json(
                url, video_id, 'Downloading %s formats' % stream_type.upper(),
                data=urlencode_postdata(postdata), fatal=False)
            if not data:
                continue

            watch_urls = try_get(
                data, lambda x: x['stream']['watch_urls'], list)
            if not watch_urls:
                continue

            for watch in watch_urls:
                if not isinstance(watch, dict):
                    continue
                watch_url = watch.get('url')
                if not watch_url or not isinstance(watch_url, compat_str):
                    continue
                format_id_list = [stream_type]
                maxrate = watch.get('maxrate')
                if maxrate:
                    format_id_list.append(compat_str(maxrate))
                audio_channel = watch.get('audio_channel')
                if audio_channel:
                    format_id_list.append(compat_str(audio_channel))
                preference = 1 if audio_channel == 'A' else None
                format_id = '-'.join(format_id_list)
                if stream_type in ('dash', 'dash_widevine', 'dash_playready'):
                    this_formats = self._extract_mpd_formats(
                        watch_url, video_id, mpd_id=format_id, fatal=False)
                elif stream_type in ('hls', 'hls5', 'hls5_fairplay'):
                    this_formats = self._extract_m3u8_formats(
                        watch_url, video_id, 'mp4',
                        entry_protocol='m3u8_native', m3u8_id=format_id,
                        fatal=False)
                elif stream_type == 'hds':
                    this_formats = self._extract_f4m_formats(
                        watch_url, video_id, f4m_id=format_id, fatal=False)
                elif stream_type == 'smooth_playready':
                    this_formats = self._extract_ism_formats(
                        watch_url, video_id, ism_id=format_id, fatal=False)
                else:
                    assert False
                for this_format in this_formats:
                    this_format['preference'] = preference
                formats.extend(this_formats)
        self._sort_formats(formats)
        return formats

    def _extract_video(self, channel_name, video_id, record_id=None, is_live=False):
        if is_live:
            cid = self._extract_cid(video_id, channel_name)
            info_dict = {
                'id': channel_name,
                'title': self._live_title(channel_name),
                'is_live': True,
            }
        else:
            cid, info_dict = self._extract_cid_and_video_info(video_id)
        formats = self._extract_formats(
            cid, video_id, record_id=record_id, is_live=is_live)
        info_dict['formats'] = formats
        return info_dict


class QuicklineBaseIE(ZattooBaseIE):
    _NETRC_MACHINE = 'quickline'
    _HOST_URL = 'https://mobiltv.quickline.com'


class QuicklineIE(QuicklineBaseIE):
    _VALID_URL = r'https?://(?:www\.)?mobiltv\.quickline\.com/watch/(?P<channel>[^/]+)/(?P<id>[0-9]+)'

    _TEST = {
        'url': 'https://mobiltv.quickline.com/watch/prosieben/130671867-maze-runner-die-auserwaehlten-in-der-brandwueste',
        'only_matching': True,
    }

    def _real_extract(self, url):
        channel_name, video_id = re.match(self._VALID_URL, url).groups()
        return self._extract_video(channel_name, video_id)


class QuicklineLiveIE(QuicklineBaseIE):
    _VALID_URL = r'https?://(?:www\.)?mobiltv\.quickline\.com/watch/(?P<id>[^/]+)'

    _TEST = {
        'url': 'https://mobiltv.quickline.com/watch/srf1',
        'only_matching': True,
    }

    @classmethod
    def suitable(cls, url):
        return False if QuicklineIE.suitable(url) else super(QuicklineLiveIE, cls).suitable(url)

    def _real_extract(self, url):
        channel_name = video_id = self._match_id(url)
        return self._extract_video(channel_name, video_id, is_live=True)


class ZattooIE(ZattooBaseIE):
    _VALID_URL = r'https?://(?:www\.)?zattoo\.com/watch/(?P<channel>[^/]+?)/(?P<id>[0-9]+)[^/]+(?:/(?P<recid>[0-9]+))?'

    # Since regular videos are only available for 7 days and recorded videos
    # are only available for a specific user, we cannot have detailed tests.
    _TESTS = [{
        'url': 'https://zattoo.com/watch/prosieben/130671867-maze-runner-die-auserwaehlten-in-der-brandwueste',
        'only_matching': True,
    }, {
        'url': 'https://zattoo.com/watch/srf_zwei/132905652-eishockey-spengler-cup/102791477/1512211800000/1514433500000/92000',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        channel_name, video_id, record_id = re.match(self._VALID_URL, url).groups()
        return self._extract_video(channel_name, video_id, record_id)


class ZattooLiveIE(ZattooBaseIE):
    _VALID_URL = r'https?://(?:www\.)?zattoo\.com/watch/(?P<id>[^/]+)'

    _TEST = {
        'url': 'https://zattoo.com/watch/srf1',
        'only_matching': True,
    }

    @classmethod
    def suitable(cls, url):
        return False if ZattooIE.suitable(url) else super(ZattooLiveIE, cls).suitable(url)

    def _real_extract(self, url):
        channel_name = video_id = self._match_id(url)
        return self._extract_video(channel_name, video_id, is_live=True)
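Because the new extractors declare _NETRC_MACHINE, credentials can come from a "machine zattoo" (or "machine quickline") entry in ~/.netrc together with --netrc, or be passed programmatically; a usage sketch with placeholder credentials:

import youtube_dl

ydl_opts = {
    # consumed by _get_login_info(); replace with a real Zattoo account
    'username': 'you@example.com',
    'password': 'hunter2',
}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://zattoo.com/watch/srf1'])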

View File

@ -203,7 +203,7 @@ def parseOpts(overrideArguments=None):
network.add_option( network.add_option(
'--proxy', dest='proxy', '--proxy', dest='proxy',
default=None, metavar='URL', default=None, metavar='URL',
help='Use the specified HTTP/HTTPS/SOCKS proxy. To enable experimental ' help='Use the specified HTTP/HTTPS/SOCKS proxy. To enable '
'SOCKS proxy, specify a proper scheme. For example ' 'SOCKS proxy, specify a proper scheme. For example '
'socks5://127.0.0.1:1080/. Pass in an empty string (--proxy "") ' 'socks5://127.0.0.1:1080/. Pass in an empty string (--proxy "") '
'for direct connection') 'for direct connection')
@ -232,7 +232,7 @@ def parseOpts(overrideArguments=None):
'--geo-verification-proxy', '--geo-verification-proxy',
dest='geo_verification_proxy', default=None, metavar='URL', dest='geo_verification_proxy', default=None, metavar='URL',
help='Use this proxy to verify the IP address for some geo-restricted sites. ' help='Use this proxy to verify the IP address for some geo-restricted sites. '
'The default proxy specified by --proxy (or none, if the options is not present) is used for the actual downloading.') 'The default proxy specified by --proxy (or none, if the option is not present) is used for the actual downloading.')
geo.add_option( geo.add_option(
'--cn-verification-proxy', '--cn-verification-proxy',
dest='cn_verification_proxy', default=None, metavar='URL', dest='cn_verification_proxy', default=None, metavar='URL',
@ -240,15 +240,19 @@ def parseOpts(overrideArguments=None):
geo.add_option( geo.add_option(
'--geo-bypass', '--geo-bypass',
action='store_true', dest='geo_bypass', default=True, action='store_true', dest='geo_bypass', default=True,
help='Bypass geographic restriction via faking X-Forwarded-For HTTP header (experimental)') help='Bypass geographic restriction via faking X-Forwarded-For HTTP header')
geo.add_option( geo.add_option(
'--no-geo-bypass', '--no-geo-bypass',
action='store_false', dest='geo_bypass', default=True, action='store_false', dest='geo_bypass', default=True,
help='Do not bypass geographic restriction via faking X-Forwarded-For HTTP header (experimental)') help='Do not bypass geographic restriction via faking X-Forwarded-For HTTP header')
geo.add_option( geo.add_option(
'--geo-bypass-country', metavar='CODE', '--geo-bypass-country', metavar='CODE',
dest='geo_bypass_country', default=None, dest='geo_bypass_country', default=None,
help='Force bypass geographic restriction with explicitly provided two-letter ISO 3166-2 country code (experimental)') help='Force bypass geographic restriction with explicitly provided two-letter ISO 3166-2 country code')
geo.add_option(
'--geo-bypass-ip-block', metavar='IP_BLOCK',
dest='geo_bypass_ip_block', default=None,
help='Force bypass geographic restriction with explicitly provided IP block in CIDR notation')
selection = optparse.OptionGroup(parser, 'Video Selection') selection = optparse.OptionGroup(parser, 'Video Selection')
selection.add_option( selection.add_option(
@ -498,7 +502,7 @@ def parseOpts(overrideArguments=None):
downloader.add_option( downloader.add_option(
'--xattr-set-filesize', '--xattr-set-filesize',
dest='xattr_set_filesize', action='store_true', dest='xattr_set_filesize', action='store_true',
help='Set file xattribute ytdl.filesize with expected file size (experimental)') help='Set file xattribute ytdl.filesize with expected file size')
downloader.add_option( downloader.add_option(
'--hls-prefer-native', '--hls-prefer-native',
dest='hls_prefer_native', action='store_true', default=None, dest='hls_prefer_native', action='store_true', default=None,
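The new --geo-bypass-ip-block switch picks a random address from the given CIDR block for the faked X-Forwarded-For header; embedders can do the same via the matching YoutubeDL option, assumed here to be the geo_bypass_ip_block key (range and URL are placeholders):

import youtube_dl

ydl = youtube_dl.YoutubeDL({
    # spoof a source address from the 198.51.100.0/24 documentation range;
    # in practice pick a block the target site treats as domestic
    'geo_bypass_ip_block': '198.51.100.0/24',
})
ydl.download(['https://example.com/geo-locked-video'])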

Some files were not shown because too many files have changed in this diff