1
0
mirror of https://codeberg.org/polarisfm/youtube-dl synced 2024-12-04 14:27:53 +01:00
This commit is contained in:
frkhit 2019-02-03 02:04:28 +08:00
commit 78f2427add
123 changed files with 4274 additions and 1458 deletions

View File

@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.11.07*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2019.01.30.1*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.11.07** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2019.01.30.1**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -36,7 +36,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2018.11.07 [debug] youtube-dl version 2019.01.30.1
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@ -15,6 +15,18 @@ env:
- YTDL_TEST_SET=download - YTDL_TEST_SET=download
matrix: matrix:
include: include:
- python: 3.7
dist: xenial
env: YTDL_TEST_SET=core
- python: 3.7
dist: xenial
env: YTDL_TEST_SET=download
- python: 3.8-dev
dist: xenial
env: YTDL_TEST_SET=core
- python: 3.8-dev
dist: xenial
env: YTDL_TEST_SET=download
- env: JYTHON=true; YTDL_TEST_SET=core - env: JYTHON=true; YTDL_TEST_SET=core
- env: JYTHON=true; YTDL_TEST_SET=download - env: JYTHON=true; YTDL_TEST_SET=download
fast_finish: true fast_finish: true

View File

@ -152,16 +152,20 @@ After you have ensured this site is distributing its content legally, you can fo
``` ```
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py). 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in. 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want. 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](https://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+. 8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](http://flake8.pycqa.org/en/latest/index.html#quickstart):
9. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
$ flake8 youtube_dl/extractor/yourextractor.py
9. Make sure your code works under all [Python](https://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
10. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/extractors.py $ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py $ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor' $ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor $ git push origin yourextractor
10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it. 11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
In any case, thank you very much for your contributions! In any case, thank you very much for your contributions!
@ -173,7 +177,7 @@ Extractors are very fragile by nature since they depend on the layout of the sou
### Mandatory and optional metafields ### Mandatory and optional metafields
For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by an [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by youtube-dl: For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by an [information dictionary](https://github.com/rg3/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by youtube-dl:
- `id` (media identifier) - `id` (media identifier)
- `title` (media title) - `title` (media title)
@ -181,7 +185,7 @@ For extraction to work youtube-dl relies on metadata your extractor extracts and
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` as mandatory. Thus the aforementioned metafields are the critical data that the extraction does not make any sense without and if any of them fail to be extracted then the extractor is considered completely broken. In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` as mandatory. Thus the aforementioned metafields are the critical data that the extraction does not make any sense without and if any of them fail to be extracted then the extractor is considered completely broken.
[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerant** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. [Any field](https://github.com/rg3/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L188-L303) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerant** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
#### Example #### Example
@ -257,11 +261,33 @@ title = meta.get('title') or self._og_search_title(webpage)
This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`. This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
### Make regular expressions flexible ### Regular expressions
When using regular expressions try to write them fuzzy and flexible. #### Don't capture groups you don't use
Capturing group must be an indication that it's used somewhere in the code. Any group that is not used must be non capturing.
##### Example
Don't capture id attribute name here since you can't use it for anything anyway.
Correct:
```python
r'(?:id|ID)=(?P<id>\d+)'
```
Incorrect:
```python
r'(id|ID)=(?P<id>\d+)'
```
#### Make regular expressions relaxed and flexible
When using regular expressions try to write them fuzzy, relaxed and flexible, skipping insignificant parts that are more likely to change, allowing both single and double quotes for quoted values and so on.
#### Example ##### Example
Say you need to extract `title` from the following HTML code: Say you need to extract `title` from the following HTML code:
@ -294,7 +320,26 @@ title = self._search_regex(
webpage, 'title', group='title') webpage, 'title', group='title')
``` ```
### Use safe conversion functions ### Long lines policy
There is a soft limit to keep lines of code under 80 characters long. This means it should be respected if possible and if it does not make readability and code maintenance worse.
For example, you should **never** split long string literals like URLs or some other often copied entities over multiple lines to fit this limit:
Correct:
```python
'https://www.youtube.com/watch?v=FqZTN594JQw&list=PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
```
Incorrect:
```python
'https://www.youtube.com/watch?v=FqZTN594JQw&list='
'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
```
### Use convenience conversion and parsing functions
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well. Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
@ -302,6 +347,8 @@ Use `url_or_none` for safe URL processing.
Use `try_get` for safe metadata extraction from parsed JSON. Use `try_get` for safe metadata extraction from parsed JSON.
Use `unified_strdate` for uniform `upload_date` or any `YYYYMMDD` meta field extraction, `unified_timestamp` for uniform `timestamp` extraction, `parse_filesize` for `filesize` extraction, `parse_count` for count meta fields extraction, `parse_resolution`, `parse_duration` for `duration` extraction, `parse_age_limit` for `age_limit` extraction.
Explore [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions. Explore [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions.
#### More examples #### More examples

260
ChangeLog
View File

@ -1,3 +1,263 @@
version 2019.01.30.1
Core
* [postprocessor/ffmpeg] Fix avconv processing broken in #19025 (#19067)
version 2019.01.30
Core
* [postprocessor/ffmpeg] Do not copy Apple TV chapter tracks while embedding
subtitles (#19024, #19042)
* [postprocessor/ffmpeg] Disable "Last message repeated" messages (#19025)
Extractors
* [yourporn] Fix extraction and extract duration (#18815, #18852, #19061)
* [drtv] Improve extraction (#19039)
+ Add support for EncryptedUri videos
+ Extract more metadata
* Fix subtitles extraction
+ [fox] Add support for locked videos using cookies (#19060)
* [fox] Fix extraction for free videos (#19060)
+ [zattoo] Add support for tv.salt.ch (#19059)
version 2019.01.27
Core
+ [extractor/common] Extract season in _json_ld
* [postprocessor/ffmpeg] Fallback to ffmpeg/avconv for audio codec detection
(#681)
Extractors
* [vice] Fix extraction for locked videos (#16248)
+ [wakanim] Detect DRM protected videos
+ [wakanim] Add support for wakanim.tv (#14374)
* [usatoday] Fix extraction for videos with custom brightcove partner id
(#18990)
* [drtv] Fix extraction (#18989)
* [nhk] Extend URL regular expression (#18968)
* [go] Fix Adobe Pass requests for Disney Now (#18901)
+ [openload] Add support for oload.club (#18969)
version 2019.01.24
Core
* [YoutubeDL] Fix negation for string operators in format selection (#18961)
version 2019.01.23
Core
* [utils] Fix urljoin for paths with non-http(s) schemes
* [extractor/common] Improve jwplayer relative URL handling (#18892)
+ [YoutubeDL] Add negation support for string comparisons in format selection
expressions (#18600, #18805)
* [extractor/common] Improve HLS video-only format detection (#18923)
Extractors
* [crunchyroll] Extend URL regular expression (#18955)
* [pornhub] Bypass scrape detection (#4822, #5930, #7074, #10175, #12722,
#17197, #18338 #18842, #18899)
+ [vrv] Add support for authentication (#14307)
* [videomore:season] Fix extraction
* [videomore] Improve extraction (#18908)
+ [tnaflix] Pass Referer in metadata request (#18925)
* [radiocanada] Relax DRM check (#18608, #18609)
* [vimeo] Fix video password verification for videos protected by
Referer HTTP header
+ [hketv] Add support for hkedcity.net (#18696)
+ [streamango] Add support for fruithosts.net (#18710)
+ [instagram] Add support for tags (#18757)
+ [odnoklassniki] Detect paid videos (#18876)
* [ted] Correct acodec for HTTP formats (#18923)
* [cartoonnetwork] Fix extraction (#15664, #17224)
* [vimeo] Fix extraction for password protected player URLs (#18889)
version 2019.01.17
Extractors
* [youtube] Extend JS player signature function name regular expressions
(#18890, #18891, #18893)
version 2019.01.16
Core
+ [test/helper] Add support for maxcount and count collection len checkers
* [downloader/hls] Fix uplynk ad skipping (#18824)
* [postprocessor/ffmpeg] Improve ffmpeg version parsing (#18813)
Extractors
* [youtube] Skip unsupported adaptive stream type (#18804)
+ [youtube] Extract DASH formats from player response (#18804)
* [funimation] Fix extraction (#14089)
* [skylinewebcams] Fix extraction (#18853)
+ [curiositystream] Add support for non app URLs
+ [bitchute] Check formats (#18833)
* [wistia] Extend URL regular expression (#18823)
+ [playplustv] Add support for playplus.com (#18789)
version 2019.01.10
Core
* [extractor/common] Use episode name as title in _json_ld
+ [extractor/common] Add support for movies in _json_ld
* [postprocessor/ffmpeg] Embed subtitles with non-standard language codes
(#18765)
+ [utils] Add language codes replaced in 1989 revision of ISO 639
to ISO639Utils (#18765)
Extractors
* [youtube] Extract live HLS URL from player response (#18799)
+ [outsidetv] Add support for outsidetv.com (#18774)
* [jwplatform] Use JW Platform Delivery API V2 and add support for more URLs
+ [fox] Add support National Geographic (#17985, #15333, #14698)
+ [playplustv] Add support for playplus.tv (#18789)
* [globo] Set GLBID cookie manually (#17346)
+ [gaia] Add support for gaia.com (#14605)
* [youporn] Fix title and description extraction (#18748)
+ [hungama] Add support for hungama.com (#17402, #18771)
* [dtube] Fix extraction (#18741)
* [tvnow] Fix and rework extractors and prepare for a switch to the new API
(#17245, #18499)
* [carambatv:page] Fix extraction (#18739)
version 2019.01.02
Extractors
* [discovery] Use geo verification headers (#17838)
+ [packtpub] Add support for subscription.packtpub.com (#18718)
* [yourporn] Fix extraction (#18583)
+ [acast:channel] Add support for play.acast.com (#18587)
+ [extractors] Add missing age limits (#18621)
+ [rmcdecouverte] Add support for live stream
* [rmcdecouverte] Bypass geo restriction
* [rmcdecouverte] Update URL regular expression (#18595, 18697)
* [manyvids] Fix extraction (#18604, #18614)
* [bitchute] Fix extraction (#18567)
version 2018.12.31
Extractors
+ [bbc] Add support for another embed pattern (#18643)
+ [npo:live] Add support for npostart.nl (#18644)
* [beeg] Fix extraction (#18610, #18626)
* [youtube] Unescape HTML for series (#18641)
+ [youtube] Extract more format metadata
* [youtube] Detect DRM protected videos (#1774)
* [youtube] Relax HTML5 player regular expressions (#18465, #18466)
* [youtube] Extend HTML5 player regular expression (#17516)
+ [liveleak] Add support for another embed type and restore original
format extraction
+ [crackle] Extract ISM and HTTP formats
+ [twitter] Pass Referer with card request (#18579)
* [mediasite] Extend URL regular expression (#18558)
+ [lecturio] Add support for lecturio.de (#18562)
+ [discovery] Add support for Scripps Networks watch domains (#17947)
version 2018.12.17
Extractors
* [ard:beta] Improve geo restricted videos extraction
* [ard:beta] Fix subtitles extraction
* [ard:beta] Improve extraction robustness
* [ard:beta] Relax URL regular expression (#18441)
* [acast] Add support for embed.acast.com and play.acast.com (#18483)
* [iprima] Relax URL regular expression (#18515, #18540)
* [vrv] Fix initial state extraction (#18553)
* [youtube] Fix mark watched (#18546)
+ [safari] Add support for learning.oreilly.com (#18510)
* [youtube] Fix multifeed extraction (#18531)
* [lecturio] Improve subtitles extraction (#18488)
* [uol] Fix format URL extraction (#18480)
+ [ard:mediathek] Add support for classic.ardmediathek.de (#18473)
version 2018.12.09
Core
* [YoutubeDL] Keep session cookies in cookie file between runs
* [YoutubeDL] Recognize session cookies with expired set to 0 (#12929)
Extractors
+ [teachable] Add support for teachable platform sites (#5451, #18150, #18272)
+ [aenetworks] Add support for historyvault.com (#18460)
* [imgur] Improve gallery and album detection and extraction (#9133, #16577,
#17223, #18404)
* [iprima] Relax URL regular expression (#18453)
* [hotstar] Fix video data extraction (#18386)
* [ard:mediathek] Fix title and description extraction (#18349, #18371)
* [xvideos] Switch to HTTPS (#18422, #18427)
+ [lecturio] Add support for lecturio.com (#18405)
+ [nrktv:series] Add support for extra materials
* [nrktv:season,series] Fix extraction (#17159, #17258)
* [nrktv] Relax URL regular expression (#18304, #18387)
* [yourporn] Fix extraction (#18424, #18425)
* [tbs] Fix info extraction (#18403)
+ [gamespot] Add support for review URLs
version 2018.12.03
Core
* [utils] Fix random_birthday to generate existing dates only (#18284)
Extractors
+ [tiktok] Add support for tiktok.com (#18108, #18135)
* [pornhub] Use actual URL host for requests (#18359)
* [lynda] Fix authentication (#18158, #18217)
* [gfycat] Update API endpoint (#18333, #18343)
+ [hotstar] Add support for alternative app state layout (#18320)
* [azmedien] Fix extraction (#18334, #18336)
+ [vimeo] Add support for VHX (Vimeo OTT) (#14835)
* [joj] Fix extraction (#18280, #18281)
+ [wistia] Add support for fast.wistia.com (#18287)
version 2018.11.23
Core
+ [setup.py] Add more relevant classifiers
Extractors
* [mixcloud] Fallback to hardcoded decryption key (#18016)
* [nbc:news] Fix article extraction (#16194)
* [foxsports] Fix extraction (#17543)
* [loc] Relax regular expression and improve formats extraction
+ [ciscolive] Add support for ciscolive.cisco.com (#17984)
* [nzz] Relax kaltura regex (#18228)
* [sixplay] Fix formats extraction
* [bitchute] Improve title extraction
* [kaltura] Limit requested MediaEntry fields
+ [americastestkitchen] Add support for zype embeds (#18225)
+ [pornhub] Add pornhub.net alias
* [nova:embed] Fix extraction (#18222)
version 2018.11.18
Extractors
+ [wwe] Extract subtitles
+ [wwe] Add support for playlistst (#14781)
+ [wwe] Add support for wwe.com (#14781, #17450)
* [vk] Detect geo restriction (#17767)
* [openload] Use original host during extraction (#18211)
* [atvat] Fix extraction (#18041)
+ [rte] Add support for new API endpoint (#18206)
* [tnaflixnetwork:embed] Fix extraction (#18205)
* [picarto] Use API and add token support (#16518)
+ [zype] Add support for player.zype.com (#18143)
* [vivo] Fix extraction (#18139)
* [ruutu] Update API endpoint (#18138)
version 2018.11.07 version 2018.11.07
Extractors Extractors

View File

@ -496,7 +496,7 @@ The `-o` option allows users to indicate a template for the output file names.
**tl;dr:** [navigate me to examples](#output-template-examples). **tl;dr:** [navigate me to examples](#output-template-examples).
The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "https://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [python string formatting operations](https://docs.python.org/2/library/stdtypes.html#string-formatting). For example, `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by a formatting operations. Allowed names along with sequence type are: The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "https://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [python string formatting operations](https://docs.python.org/2/library/stdtypes.html#string-formatting). For example, `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by formatting operations. Allowed names along with sequence type are:
- `id` (string): Video identifier - `id` (string): Video identifier
- `title` (string): Video title - `title` (string): Video title
@ -667,7 +667,7 @@ The following numeric meta fields can be used with comparisons `<`, `<=`, `>`, `
- `asr`: Audio sampling rate in Hertz - `asr`: Audio sampling rate in Hertz
- `fps`: Frame rate - `fps`: Frame rate
Also filtering work for comparisons `=` (equals), `!=` (not equals), `^=` (begins with), `$=` (ends with), `*=` (contains) and following string meta fields: Also filtering work for comparisons `=` (equals), `^=` (starts with), `$=` (ends with), `*=` (contains) and following string meta fields:
- `ext`: File extension - `ext`: File extension
- `acodec`: Name of the audio codec in use - `acodec`: Name of the audio codec in use
- `vcodec`: Name of the video codec in use - `vcodec`: Name of the video codec in use
@ -675,6 +675,8 @@ Also filtering work for comparisons `=` (equals), `!=` (not equals), `^=` (begin
- `protocol`: The protocol that will be used for the actual download, lower-case (`http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `mms`, `f4m`, `ism`, `http_dash_segments`, `m3u8`, or `m3u8_native`) - `protocol`: The protocol that will be used for the actual download, lower-case (`http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `mms`, `f4m`, `ism`, `http_dash_segments`, `m3u8`, or `m3u8_native`)
- `format_id`: A short description of the format - `format_id`: A short description of the format
Any string comparison may be prefixed with negation `!` in order to produce an opposite comparison, e.g. `!*=` (does not contain).
Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the video hoster. Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the video hoster.
Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s. Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s.
@ -1024,16 +1026,20 @@ After you have ensured this site is distributing its content legally, you can fo
``` ```
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py). 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in. 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want. 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](https://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+. 8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](http://flake8.pycqa.org/en/latest/index.html#quickstart):
9. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
$ flake8 youtube_dl/extractor/yourextractor.py
9. Make sure your code works under all [Python](https://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
10. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/extractors.py $ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py $ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor' $ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor $ git push origin yourextractor
10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it. 11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
In any case, thank you very much for your contributions! In any case, thank you very much for your contributions!
@ -1045,7 +1051,7 @@ Extractors are very fragile by nature since they depend on the layout of the sou
### Mandatory and optional metafields ### Mandatory and optional metafields
For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by an [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by youtube-dl: For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by an [information dictionary](https://github.com/rg3/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by youtube-dl:
- `id` (media identifier) - `id` (media identifier)
- `title` (media title) - `title` (media title)
@ -1053,7 +1059,7 @@ For extraction to work youtube-dl relies on metadata your extractor extracts and
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` as mandatory. Thus the aforementioned metafields are the critical data that the extraction does not make any sense without and if any of them fail to be extracted then the extractor is considered completely broken. In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` as mandatory. Thus the aforementioned metafields are the critical data that the extraction does not make any sense without and if any of them fail to be extracted then the extractor is considered completely broken.
[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerant** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. [Any field](https://github.com/rg3/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L188-L303) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerant** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
#### Example #### Example
@ -1129,11 +1135,33 @@ title = meta.get('title') or self._og_search_title(webpage)
This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`. This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
### Make regular expressions flexible ### Regular expressions
When using regular expressions try to write them fuzzy and flexible. #### Don't capture groups you don't use
Capturing group must be an indication that it's used somewhere in the code. Any group that is not used must be non capturing.
##### Example
Don't capture id attribute name here since you can't use it for anything anyway.
Correct:
```python
r'(?:id|ID)=(?P<id>\d+)'
```
Incorrect:
```python
r'(id|ID)=(?P<id>\d+)'
```
#### Make regular expressions relaxed and flexible
When using regular expressions try to write them fuzzy, relaxed and flexible, skipping insignificant parts that are more likely to change, allowing both single and double quotes for quoted values and so on.
#### Example ##### Example
Say you need to extract `title` from the following HTML code: Say you need to extract `title` from the following HTML code:
@ -1166,7 +1194,26 @@ title = self._search_regex(
webpage, 'title', group='title') webpage, 'title', group='title')
``` ```
### Use safe conversion functions ### Long lines policy
There is a soft limit to keep lines of code under 80 characters long. This means it should be respected if possible and if it does not make readability and code maintenance worse.
For example, you should **never** split long string literals like URLs or some other often copied entities over multiple lines to fit this limit:
Correct:
```python
'https://www.youtube.com/watch?v=FqZTN594JQw&list=PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
```
Incorrect:
```python
'https://www.youtube.com/watch?v=FqZTN594JQw&list='
'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
```
### Use convenience conversion and parsing functions
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well. Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
@ -1174,6 +1221,8 @@ Use `url_or_none` for safe URL processing.
Use `try_get` for safe metadata extraction from parsed JSON. Use `try_get` for safe metadata extraction from parsed JSON.
Use `unified_strdate` for uniform `upload_date` or any `YYYYMMDD` meta field extraction, `unified_timestamp` for uniform `timestamp` extraction, `parse_filesize` for `filesize` extraction, `parse_count` for count meta fields extraction, `parse_resolution`, `parse_duration` for `duration` extraction, `parse_age_limit` for `age_limit` extraction.
Explore [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions. Explore [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions.
#### More examples #### More examples

View File

@ -33,7 +33,7 @@
- **AdobeTVShow** - **AdobeTVShow**
- **AdobeTVVideo** - **AdobeTVVideo**
- **AdultSwim** - **AdultSwim**
- **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault
- **afreecatv**: afreecatv.com - **afreecatv**: afreecatv.com
- **AirMozilla** - **AirMozilla**
- **AliExpressLive** - **AliExpressLive**
@ -163,6 +163,8 @@
- **chirbit** - **chirbit**
- **chirbit:profile** - **chirbit:profile**
- **Cinchcast** - **Cinchcast**
- **CiscoLiveSearch**
- **CiscoLiveSession**
- **CJSW** - **CJSW**
- **cliphunter** - **cliphunter**
- **Clippit** - **Clippit**
@ -318,6 +320,7 @@
- **Fusion** - **Fusion**
- **Fux** - **Fux**
- **FXNetworks** - **FXNetworks**
- **Gaia**
- **GameInformer** - **GameInformer**
- **GameOne** - **GameOne**
- **gameone:playlist** - **gameone:playlist**
@ -358,6 +361,7 @@
- **hitbox** - **hitbox**
- **hitbox:live** - **hitbox:live**
- **HitRecord** - **HitRecord**
- **hketv**: 香港教育局教育電視 (HKETV) Educational Television, Hong Kong Educational Bureau
- **HornBunny** - **HornBunny**
- **HotNewHipHop** - **HotNewHipHop**
- **hotstar** - **hotstar**
@ -368,18 +372,22 @@
- **HRTiPlaylist** - **HRTiPlaylist**
- **Huajiao**: 花椒直播 - **Huajiao**: 花椒直播
- **HuffPost**: Huffington Post - **HuffPost**: Huffington Post
- **Hungama**
- **HungamaSong**
- **Hypem** - **Hypem**
- **Iconosquare** - **Iconosquare**
- **ign.com** - **ign.com**
- **imdb**: Internet Movie Database trailers - **imdb**: Internet Movie Database trailers
- **imdb:list**: Internet Movie Database lists - **imdb:list**: Internet Movie Database lists
- **Imgur** - **Imgur**
- **ImgurAlbum** - **imgur:album**
- **imgur:gallery**
- **Ina** - **Ina**
- **Inc** - **Inc**
- **IndavideoEmbed** - **IndavideoEmbed**
- **InfoQ** - **InfoQ**
- **Instagram** - **Instagram**
- **instagram:tag**: Instagram hashtag search
- **instagram:user**: Instagram user profile - **instagram:user**: Instagram user profile
- **Internazionale** - **Internazionale**
- **InternetVideoArchive** - **InternetVideoArchive**
@ -433,6 +441,9 @@
- **Le**: 乐视网 - **Le**: 乐视网
- **Learnr** - **Learnr**
- **Lecture2Go** - **Lecture2Go**
- **Lecturio**
- **LecturioCourse**
- **LecturioDeCourse**
- **LEGO** - **LEGO**
- **Lemonde** - **Lemonde**
- **Lenta** - **Lenta**
@ -534,9 +545,8 @@
- **MyviEmbed** - **MyviEmbed**
- **MyVisionTV** - **MyVisionTV**
- **n-tv.de** - **n-tv.de**
- **natgeo**
- **natgeo:episodeguide**
- **natgeo:video** - **natgeo:video**
- **NationalGeographicTV**
- **Naver** - **Naver**
- **NBA** - **NBA**
- **NBC** - **NBC**
@ -636,6 +646,7 @@
- **orf:oe1**: Radio Österreich 1 - **orf:oe1**: Radio Österreich 1
- **orf:tvthek**: ORF TVthek - **orf:tvthek**: ORF TVthek
- **OsnatelTV** - **OsnatelTV**
- **OutsideTV**
- **PacktPub** - **PacktPub**
- **PacktPubCourse** - **PacktPubCourse**
- **PandaTV**: 熊猫TV - **PandaTV**: 熊猫TV
@ -660,6 +671,7 @@
- **Pinkbike** - **Pinkbike**
- **Pladform** - **Pladform**
- **play.fm** - **play.fm**
- **PlayPlusTV**
- **PlaysTV** - **PlaysTV**
- **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz - **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz
- **Playvid** - **Playvid**
@ -765,6 +777,7 @@
- **safari:api** - **safari:api**
- **safari:course**: safaribooksonline.com online courses - **safari:course**: safaribooksonline.com online courses
- **SAKTV** - **SAKTV**
- **SaltTV**
- **Sapo**: SAPO Vídeos - **Sapo**: SAPO Vídeos
- **savefrom.net** - **savefrom.net**
- **SBS**: sbs.com.au - **SBS**: sbs.com.au
@ -851,6 +864,8 @@
- **TastyTrade** - **TastyTrade**
- **TBS** - **TBS**
- **TDSLifeway** - **TDSLifeway**
- **Teachable**
- **TeachableCourse**
- **teachertube**: teachertube.com videos - **teachertube**: teachertube.com videos
- **teachertube:user:collection**: teachertube.com user and collection videos - **teachertube:user:collection**: teachertube.com user and collection videos
- **TeachingChannel** - **TeachingChannel**
@ -883,6 +898,8 @@
- **ThisAmericanLife** - **ThisAmericanLife**
- **ThisAV** - **ThisAV**
- **ThisOldHouse** - **ThisOldHouse**
- **TikTok**
- **TikTokUser**
- **tinypic**: tinypic.com videos - **tinypic**: tinypic.com videos
- **TMZ** - **TMZ**
- **TMZArticle** - **TMZArticle**
@ -924,7 +941,9 @@
- **TVNet** - **TVNet**
- **TVNoe** - **TVNoe**
- **TVNow** - **TVNow**
- **TVNowList** - **TVNowAnnual**
- **TVNowNew**
- **TVNowSeason**
- **TVNowShow** - **TVNowShow**
- **tvp**: Telewizja Polska - **tvp**: Telewizja Polska
- **tvp:embed**: Telewizja Polska - **tvp:embed**: Telewizja Polska
@ -957,8 +976,6 @@
- **uol.com.br** - **uol.com.br**
- **uplynk** - **uplynk**
- **uplynk:preplay** - **uplynk:preplay**
- **Upskill**
- **UpskillCourse**
- **Urort**: NRK P3 Urørt - **Urort**: NRK P3 Urørt
- **URPlay** - **URPlay**
- **USANetwork** - **USANetwork**
@ -977,6 +994,7 @@
- **VevoPlaylist** - **VevoPlaylist**
- **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet - **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet
- **vh1.com** - **vh1.com**
- **vhx:embed**
- **Viafree** - **Viafree**
- **vice** - **vice**
- **vice:article** - **vice:article**
@ -1053,6 +1071,7 @@
- **VVVVID** - **VVVVID**
- **VyboryMos** - **VyboryMos**
- **Vzaar** - **Vzaar**
- **Wakanim**
- **Walla** - **Walla**
- **WalyTV** - **WalyTV**
- **washingtonpost** - **washingtonpost**
@ -1080,6 +1099,7 @@
- **wrzuta.pl:playlist** - **wrzuta.pl:playlist**
- **WSJ**: Wall Street Journal - **WSJ**: Wall Street Journal
- **WSJArticle** - **WSJArticle**
- **WWE**
- **XBef** - **XBef**
- **XboxClips** - **XboxClips**
- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE, Vid ABC, VidBom, vidlo, RapidVideo.TV, FastVideo.me - **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE, Vid ABC, VidBom, vidlo, RapidVideo.TV, FastVideo.me
@ -1139,3 +1159,4 @@
- **ZDF** - **ZDF**
- **ZDFChannel** - **ZDFChannel**
- **zingmp3**: mp3.zing.vn - **zingmp3**: mp3.zing.vn
- **Zype**

View File

@ -124,6 +124,8 @@ setup(
'Development Status :: 5 - Production/Stable', 'Development Status :: 5 - Production/Stable',
'Environment :: Console', 'Environment :: Console',
'License :: Public Domain', 'License :: Public Domain',
'Programming Language :: Python',
'Programming Language :: Python :: 2',
'Programming Language :: Python :: 2.6', 'Programming Language :: Python :: 2.6',
'Programming Language :: Python :: 2.7', 'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3', 'Programming Language :: Python :: 3',
@ -132,6 +134,13 @@ setup(
'Programming Language :: Python :: 3.4', 'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: 3.5', 'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6', 'Programming Language :: Python :: 3.6',
'Programming Language :: Python :: 3.7',
'Programming Language :: Python :: 3.8',
'Programming Language :: Python :: Implementation',
'Programming Language :: Python :: Implementation :: CPython',
'Programming Language :: Python :: Implementation :: IronPython',
'Programming Language :: Python :: Implementation :: Jython',
'Programming Language :: Python :: Implementation :: PyPy',
], ],
cmdclass={'build_lazy_extractors': build_lazy_extractors}, cmdclass={'build_lazy_extractors': build_lazy_extractors},

View File

@ -153,15 +153,27 @@ def expect_value(self, got, expected, field):
isinstance(got, compat_str), isinstance(got, compat_str),
'Expected field %s to be a unicode object, but got value %r of type %r' % (field, got, type(got))) 'Expected field %s to be a unicode object, but got value %r of type %r' % (field, got, type(got)))
got = 'md5:' + md5(got) got = 'md5:' + md5(got)
elif isinstance(expected, compat_str) and expected.startswith('mincount:'): elif isinstance(expected, compat_str) and re.match(r'^(?:min|max)?count:\d+', expected):
self.assertTrue( self.assertTrue(
isinstance(got, (list, dict)), isinstance(got, (list, dict)),
'Expected field %s to be a list or a dict, but it is of type %s' % ( 'Expected field %s to be a list or a dict, but it is of type %s' % (
field, type(got).__name__)) field, type(got).__name__))
expected_num = int(expected.partition(':')[2]) op, _, expected_num = expected.partition(':')
assertGreaterEqual( expected_num = int(expected_num)
if op == 'mincount':
assert_func = assertGreaterEqual
msg_tmpl = 'Expected %d items in field %s, but only got %d'
elif op == 'maxcount':
assert_func = assertLessEqual
msg_tmpl = 'Expected maximum %d items in field %s, but got %d'
elif op == 'count':
assert_func = assertEqual
msg_tmpl = 'Expected exactly %d items in field %s, but got %d'
else:
assert False
assert_func(
self, len(got), expected_num, self, len(got), expected_num,
'Expected %d items in field %s, but only got %d' % (expected_num, field, len(got))) msg_tmpl % (expected_num, field, len(got)))
return return
self.assertEqual( self.assertEqual(
expected, got, expected, got,
@ -237,6 +249,20 @@ def assertGreaterEqual(self, got, expected, msg=None):
self.assertTrue(got >= expected, msg) self.assertTrue(got >= expected, msg)
def assertLessEqual(self, got, expected, msg=None):
if not (got <= expected):
if msg is None:
msg = '%r not less than or equal to %r' % (got, expected)
self.assertTrue(got <= expected, msg)
def assertEqual(self, got, expected, msg=None):
if not (got == expected):
if msg is None:
msg = '%r not equal to %r' % (got, expected)
self.assertTrue(got == expected, msg)
def expect_warnings(ydl, warnings_re): def expect_warnings(ydl, warnings_re):
real_warning = ydl.report_warning real_warning = ydl.report_warning

View File

@ -497,7 +497,64 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
'width': 1280, 'width': 1280,
'height': 720, 'height': 720,
}] }]
) ),
(
# https://github.com/rg3/youtube-dl/issues/18923
# https://www.ted.com/talks/boris_hesser_a_grassroots_healthcare_revolution_in_africa
'ted_18923',
'http://hls.ted.com/talks/31241.m3u8',
[{
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/audio/600k.m3u8?nobumpers=true&uniqueId=76011e2b',
'format_id': '600k-Audio',
'vcodec': 'none',
}, {
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/audio/600k.m3u8?nobumpers=true&uniqueId=76011e2b',
'format_id': '68',
'vcodec': 'none',
}, {
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/64k.m3u8?nobumpers=true&uniqueId=76011e2b',
'format_id': '163',
'acodec': 'none',
'width': 320,
'height': 180,
}, {
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/180k.m3u8?nobumpers=true&uniqueId=76011e2b',
'format_id': '481',
'acodec': 'none',
'width': 512,
'height': 288,
}, {
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/320k.m3u8?nobumpers=true&uniqueId=76011e2b',
'format_id': '769',
'acodec': 'none',
'width': 512,
'height': 288,
}, {
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/450k.m3u8?nobumpers=true&uniqueId=76011e2b',
'format_id': '984',
'acodec': 'none',
'width': 512,
'height': 288,
}, {
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/600k.m3u8?nobumpers=true&uniqueId=76011e2b',
'format_id': '1255',
'acodec': 'none',
'width': 640,
'height': 360,
}, {
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/950k.m3u8?nobumpers=true&uniqueId=76011e2b',
'format_id': '1693',
'acodec': 'none',
'width': 853,
'height': 480,
}, {
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/1500k.m3u8?nobumpers=true&uniqueId=76011e2b',
'format_id': '2462',
'acodec': 'none',
'width': 1280,
'height': 720,
}]
),
] ]
for m3u8_file, m3u8_url, expected_formats in _TEST_CASES: for m3u8_file, m3u8_url, expected_formats in _TEST_CASES:

View File

@ -239,6 +239,76 @@ class TestFormatSelection(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0] downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'vid-vcodec-dot') self.assertEqual(downloaded['format_id'], 'vid-vcodec-dot')
def test_format_selection_string_ops(self):
formats = [
{'format_id': 'abc-cba', 'ext': 'mp4', 'url': TEST_URL},
{'format_id': 'zxc-cxz', 'ext': 'webm', 'url': TEST_URL},
]
info_dict = _make_result(formats)
# equals (=)
ydl = YDL({'format': '[format_id=abc-cba]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'abc-cba')
# does not equal (!=)
ydl = YDL({'format': '[format_id!=abc-cba]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'zxc-cxz')
ydl = YDL({'format': '[format_id!=abc-cba][format_id!=zxc-cxz]'})
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
# starts with (^=)
ydl = YDL({'format': '[format_id^=abc]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'abc-cba')
# does not start with (!^=)
ydl = YDL({'format': '[format_id!^=abc]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'zxc-cxz')
ydl = YDL({'format': '[format_id!^=abc][format_id!^=zxc]'})
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
# ends with ($=)
ydl = YDL({'format': '[format_id$=cba]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'abc-cba')
# does not end with (!$=)
ydl = YDL({'format': '[format_id!$=cba]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'zxc-cxz')
ydl = YDL({'format': '[format_id!$=cba][format_id!$=cxz]'})
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
# contains (*=)
ydl = YDL({'format': '[format_id*=bc-cb]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'abc-cba')
# does not contain (!*=)
ydl = YDL({'format': '[format_id!*=bc-cb]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'zxc-cxz')
ydl = YDL({'format': '[format_id!*=abc][format_id!*=zxc]'})
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
ydl = YDL({'format': '[format_id!*=-]'})
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
def test_youtube_format_selection(self): def test_youtube_format_selection(self):
order = [ order = [
'38', '37', '46', '22', '45', '35', '44', '18', '34', '43', '6', '5', '17', '36', '13', '38', '37', '46', '22', '45', '35', '44', '18', '34', '43', '6', '5', '17', '36', '13',

View File

@ -0,0 +1,34 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
import os
import re
import sys
import tempfile
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.utils import YoutubeDLCookieJar
class TestYoutubeDLCookieJar(unittest.TestCase):
def test_keep_session_cookies(self):
cookiejar = YoutubeDLCookieJar('./test/testdata/cookies/session_cookies.txt')
cookiejar.load(ignore_discard=True, ignore_expires=True)
tf = tempfile.NamedTemporaryFile(delete=False)
try:
cookiejar.save(filename=tf.name, ignore_discard=True, ignore_expires=True)
temp = tf.read().decode('utf-8')
self.assertTrue(re.search(
r'www\.foobar\.foobar\s+FALSE\s+/\s+TRUE\s+0\s+YoutubeDLExpiresEmpty\s+YoutubeDLExpiresEmptyValue', temp))
self.assertTrue(re.search(
r'www\.foobar\.foobar\s+FALSE\s+/\s+TRUE\s+0\s+YoutubeDLExpires0\s+YoutubeDLExpires0Value', temp))
finally:
tf.close()
os.remove(tf.name)
if __name__ == '__main__':
unittest.main()

View File

@ -39,7 +39,7 @@ class TestCompat(unittest.TestCase):
def test_compat_expanduser(self): def test_compat_expanduser(self):
old_home = os.environ.get('HOME') old_home = os.environ.get('HOME')
test_str = 'C:\Documents and Settings\тест\Application Data' test_str = r'C:\Documents and Settings\тест\Application Data'
compat_setenv('HOME', test_str) compat_setenv('HOME', test_str)
self.assertEqual(compat_expanduser('~'), test_str) self.assertEqual(compat_expanduser('~'), test_str)
compat_setenv('HOME', old_home or '') compat_setenv('HOME', old_home or '')

View File

@ -14,4 +14,4 @@ from youtube_dl.postprocessor import MetadataFromTitlePP
class TestMetadataFromTitle(unittest.TestCase): class TestMetadataFromTitle(unittest.TestCase):
def test_format_to_regex(self): def test_format_to_regex(self):
pp = MetadataFromTitlePP(None, '%(title)s - %(artist)s') pp = MetadataFromTitlePP(None, '%(title)s - %(artist)s')
self.assertEqual(pp._titleregex, '(?P<title>.+)\ \-\ (?P<artist>.+)') self.assertEqual(pp._titleregex, r'(?P<title>.+)\ \-\ (?P<artist>.+)')

View File

@ -507,6 +507,8 @@ class TestUtil(unittest.TestCase):
self.assertEqual(urljoin('http://foo.de/', ''), None) self.assertEqual(urljoin('http://foo.de/', ''), None)
self.assertEqual(urljoin('http://foo.de/', ['foobar']), None) self.assertEqual(urljoin('http://foo.de/', ['foobar']), None)
self.assertEqual(urljoin('http://foo.de/a/b/c.txt', '.././../d.txt'), 'http://foo.de/d.txt') self.assertEqual(urljoin('http://foo.de/a/b/c.txt', '.././../d.txt'), 'http://foo.de/d.txt')
self.assertEqual(urljoin('http://foo.de/a/b/c.txt', 'rtmp://foo.de'), 'rtmp://foo.de')
self.assertEqual(urljoin(None, 'rtmp://foo.de'), 'rtmp://foo.de')
def test_url_or_none(self): def test_url_or_none(self):
self.assertEqual(url_or_none(None), None) self.assertEqual(url_or_none(None), None)

View File

@ -0,0 +1,6 @@
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This is a generated file! Do not edit.
www.foobar.foobar FALSE / TRUE YoutubeDLExpiresEmpty YoutubeDLExpiresEmptyValue
www.foobar.foobar FALSE / TRUE 0 YoutubeDLExpires0 YoutubeDLExpires0Value

28
test/testdata/m3u8/ted_18923.m3u8 vendored Normal file
View File

@ -0,0 +1,28 @@
#EXTM3U
#EXT-X-VERSION:4
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=1255659,PROGRAM-ID=1,CODECS="avc1.42c01e,mp4a.40.2",RESOLUTION=640x360
/videos/BorisHesser_2018S/video/600k.m3u8?nobumpers=true&uniqueId=76011e2b
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=163154,PROGRAM-ID=1,CODECS="avc1.42c00c,mp4a.40.2",RESOLUTION=320x180
/videos/BorisHesser_2018S/video/64k.m3u8?nobumpers=true&uniqueId=76011e2b
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=481701,PROGRAM-ID=1,CODECS="avc1.42c015,mp4a.40.2",RESOLUTION=512x288
/videos/BorisHesser_2018S/video/180k.m3u8?nobumpers=true&uniqueId=76011e2b
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=769968,PROGRAM-ID=1,CODECS="avc1.42c015,mp4a.40.2",RESOLUTION=512x288
/videos/BorisHesser_2018S/video/320k.m3u8?nobumpers=true&uniqueId=76011e2b
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=984037,PROGRAM-ID=1,CODECS="avc1.42c015,mp4a.40.2",RESOLUTION=512x288
/videos/BorisHesser_2018S/video/450k.m3u8?nobumpers=true&uniqueId=76011e2b
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=1693925,PROGRAM-ID=1,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=853x480
/videos/BorisHesser_2018S/video/950k.m3u8?nobumpers=true&uniqueId=76011e2b
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=2462469,PROGRAM-ID=1,CODECS="avc1.640028,mp4a.40.2",RESOLUTION=1280x720
/videos/BorisHesser_2018S/video/1500k.m3u8?nobumpers=true&uniqueId=76011e2b
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=68101,PROGRAM-ID=1,CODECS="mp4a.40.2",DEFAULT=YES
/videos/BorisHesser_2018S/audio/600k.m3u8?nobumpers=true&uniqueId=76011e2b
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=74298,PROGRAM-ID=1,CODECS="avc1.42c00c",RESOLUTION=320x180,URI="/videos/BorisHesser_2018S/video/64k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=216200,PROGRAM-ID=1,CODECS="avc1.42c015",RESOLUTION=512x288,URI="/videos/BorisHesser_2018S/video/180k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=304717,PROGRAM-ID=1,CODECS="avc1.42c015",RESOLUTION=512x288,URI="/videos/BorisHesser_2018S/video/320k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=350933,PROGRAM-ID=1,CODECS="avc1.42c015",RESOLUTION=512x288,URI="/videos/BorisHesser_2018S/video/450k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=495850,PROGRAM-ID=1,CODECS="avc1.42c01e",RESOLUTION=640x360,URI="/videos/BorisHesser_2018S/video/600k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=810750,PROGRAM-ID=1,CODECS="avc1.4d401f",RESOLUTION=853x480,URI="/videos/BorisHesser_2018S/video/950k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=1273700,PROGRAM-ID=1,CODECS="avc1.640028",RESOLUTION=1280x720,URI="/videos/BorisHesser_2018S/video/1500k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="600k",LANGUAGE="en",NAME="Audio",AUTOSELECT=YES,DEFAULT=YES,URI="/videos/BorisHesser_2018S/audio/600k.m3u8?nobumpers=true&uniqueId=76011e2b",BANDWIDTH=614400

View File

@ -88,6 +88,7 @@ from .utils import (
version_tuple, version_tuple,
write_json_file, write_json_file,
write_string, write_string,
YoutubeDLCookieJar,
YoutubeDLCookieProcessor, YoutubeDLCookieProcessor,
YoutubeDLHandler, YoutubeDLHandler,
) )
@ -558,7 +559,7 @@ class YoutubeDL(object):
self.restore_console_title() self.restore_console_title()
if self.params.get('cookiefile') is not None: if self.params.get('cookiefile') is not None:
self.cookiejar.save() self.cookiejar.save(ignore_discard=True, ignore_expires=True)
def trouble(self, message=None, tb=None): def trouble(self, message=None, tb=None):
"""Determine action to take when a download problem appears. """Determine action to take when a download problem appears.
@ -1062,21 +1063,24 @@ class YoutubeDL(object):
if not m: if not m:
STR_OPERATORS = { STR_OPERATORS = {
'=': operator.eq, '=': operator.eq,
'!=': operator.ne,
'^=': lambda attr, value: attr.startswith(value), '^=': lambda attr, value: attr.startswith(value),
'$=': lambda attr, value: attr.endswith(value), '$=': lambda attr, value: attr.endswith(value),
'*=': lambda attr, value: value in attr, '*=': lambda attr, value: value in attr,
} }
str_operator_rex = re.compile(r'''(?x) str_operator_rex = re.compile(r'''(?x)
\s*(?P<key>ext|acodec|vcodec|container|protocol|format_id) \s*(?P<key>ext|acodec|vcodec|container|protocol|format_id)
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)? \s*(?P<negation>!\s*)?(?P<op>%s)(?P<none_inclusive>\s*\?)?
\s*(?P<value>[a-zA-Z0-9._-]+) \s*(?P<value>[a-zA-Z0-9._-]+)
\s*$ \s*$
''' % '|'.join(map(re.escape, STR_OPERATORS.keys()))) ''' % '|'.join(map(re.escape, STR_OPERATORS.keys())))
m = str_operator_rex.search(filter_spec) m = str_operator_rex.search(filter_spec)
if m: if m:
comparison_value = m.group('value') comparison_value = m.group('value')
op = STR_OPERATORS[m.group('op')] str_op = STR_OPERATORS[m.group('op')]
if m.group('negation'):
op = lambda attr, value: not str_op(attr, value)
else:
op = str_op
if not m: if not m:
raise ValueError('Invalid filter specification %r' % filter_spec) raise ValueError('Invalid filter specification %r' % filter_spec)
@ -2056,15 +2060,21 @@ class YoutubeDL(object):
self.report_warning('Unable to remove downloaded original file') self.report_warning('Unable to remove downloaded original file')
def _make_archive_id(self, info_dict): def _make_archive_id(self, info_dict):
video_id = info_dict.get('id')
if not video_id:
return
# Future-proof against any change in case # Future-proof against any change in case
# and backwards compatibility with prior versions # and backwards compatibility with prior versions
extractor = info_dict.get('extractor_key') extractor = info_dict.get('extractor_key') or info_dict.get('ie_key') # key in a playlist
if extractor is None: if extractor is None:
if 'id' in info_dict: # Try to find matching extractor for the URL and take its ie_key
extractor = info_dict.get('ie_key') # key in a playlist for ie in self._ies:
if extractor is None: if ie.suitable(info_dict['url']):
return None # Incomplete video information extractor = ie.ie_key()
return extractor.lower() + ' ' + info_dict['id'] break
else:
return
return extractor.lower() + ' ' + video_id
def in_download_archive(self, info_dict): def in_download_archive(self, info_dict):
fn = self.params.get('download_archive') fn = self.params.get('download_archive')
@ -2072,7 +2082,7 @@ class YoutubeDL(object):
return False return False
vid_id = self._make_archive_id(info_dict) vid_id = self._make_archive_id(info_dict)
if vid_id is None: if not vid_id:
return False # Incomplete video information return False # Incomplete video information
try: try:
@ -2297,10 +2307,9 @@ class YoutubeDL(object):
self.cookiejar = compat_cookiejar.CookieJar() self.cookiejar = compat_cookiejar.CookieJar()
else: else:
opts_cookiefile = expand_path(opts_cookiefile) opts_cookiefile = expand_path(opts_cookiefile)
self.cookiejar = compat_cookiejar.MozillaCookieJar( self.cookiejar = YoutubeDLCookieJar(opts_cookiefile)
opts_cookiefile)
if os.access(opts_cookiefile, os.R_OK): if os.access(opts_cookiefile, os.R_OK):
self.cookiejar.load() self.cookiejar.load(ignore_discard=True, ignore_expires=True)
cookie_processor = YoutubeDLCookieProcessor(self.cookiejar) cookie_processor = YoutubeDLCookieProcessor(self.cookiejar)
if opts_proxy is not None: if opts_proxy is not None:

View File

@ -75,10 +75,14 @@ class HlsFD(FragmentFD):
fd.add_progress_hook(ph) fd.add_progress_hook(ph)
return fd.real_download(filename, info_dict) return fd.real_download(filename, info_dict)
def is_ad_fragment(s): def is_ad_fragment_start(s):
return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s or return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s or
s.startswith('#UPLYNK-SEGMENT') and s.endswith(',ad')) s.startswith('#UPLYNK-SEGMENT') and s.endswith(',ad'))
def is_ad_fragment_end(s):
return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=master' in s or
s.startswith('#UPLYNK-SEGMENT') and s.endswith(',segment'))
media_frags = 0 media_frags = 0
ad_frags = 0 ad_frags = 0
ad_frag_next = False ad_frag_next = False
@ -87,12 +91,13 @@ class HlsFD(FragmentFD):
if not line: if not line:
continue continue
if line.startswith('#'): if line.startswith('#'):
if is_ad_fragment(line): if is_ad_fragment_start(line):
ad_frags += 1
ad_frag_next = True ad_frag_next = True
elif is_ad_fragment_end(line):
ad_frag_next = False
continue continue
if ad_frag_next: if ad_frag_next:
ad_frag_next = False ad_frags += 1
continue continue
media_frags += 1 media_frags += 1
@ -123,7 +128,6 @@ class HlsFD(FragmentFD):
if line: if line:
if not line.startswith('#'): if not line.startswith('#'):
if ad_frag_next: if ad_frag_next:
ad_frag_next = False
continue continue
frag_index += 1 frag_index += 1
if frag_index <= ctx['fragment_index']: if frag_index <= ctx['fragment_index']:
@ -196,8 +200,10 @@ class HlsFD(FragmentFD):
'start': sub_range_start, 'start': sub_range_start,
'end': sub_range_start + int(splitted_byte_range[0]), 'end': sub_range_start + int(splitted_byte_range[0]),
} }
elif is_ad_fragment(line): elif is_ad_fragment_start(line):
ad_frag_next = True ad_frag_next = True
elif is_ad_fragment_end(line):
ad_frag_next = False
self._finish_frag_download(ctx) self._finish_frag_download(ctx)

View File

@ -17,25 +17,15 @@ from ..utils import (
class ACastIE(InfoExtractor): class ACastIE(InfoExtractor):
IE_NAME = 'acast' IE_NAME = 'acast'
_VALID_URL = r'https?://(?:www\.)?acast\.com/(?P<channel>[^/]+)/(?P<id>[^/#?]+)' _VALID_URL = r'''(?x)
https?://
(?:
(?:(?:embed|www)\.)?acast\.com/|
play\.acast\.com/s/
)
(?P<channel>[^/]+)/(?P<id>[^/#?]+)
'''
_TESTS = [{ _TESTS = [{
# test with one bling
'url': 'https://www.acast.com/condenasttraveler/-where-are-you-taipei-101-taiwan',
'md5': 'ada3de5a1e3a2a381327d749854788bb',
'info_dict': {
'id': '57de3baa-4bb0-487e-9418-2692c1277a34',
'ext': 'mp3',
'title': '"Where Are You?": Taipei 101, Taiwan',
'description': 'md5:a0b4ef3634e63866b542e5b1199a1a0e',
'timestamp': 1196172000,
'upload_date': '20071127',
'duration': 211,
'creator': 'Concierge',
'series': 'Condé Nast Traveler Podcast',
'episode': '"Where Are You?": Taipei 101, Taiwan',
}
}, {
# test with multiple blings
'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna', 'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
'md5': 'a02393c74f3bdb1801c3ec2695577ce0', 'md5': 'a02393c74f3bdb1801c3ec2695577ce0',
'info_dict': { 'info_dict': {
@ -50,6 +40,12 @@ class ACastIE(InfoExtractor):
'series': 'Spår', 'series': 'Spår',
'episode': '2. Raggarmordet - Röster ur det förflutna', 'episode': '2. Raggarmordet - Röster ur det förflutna',
} }
}, {
'url': 'http://embed.acast.com/adambuxton/ep.12-adam-joeschristmaspodcast2015',
'only_matching': True,
}, {
'url': 'https://play.acast.com/s/rattegangspodden/s04e09-styckmordet-i-helenelund-del-22',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -83,17 +79,27 @@ class ACastIE(InfoExtractor):
class ACastChannelIE(InfoExtractor): class ACastChannelIE(InfoExtractor):
IE_NAME = 'acast:channel' IE_NAME = 'acast:channel'
_VALID_URL = r'https?://(?:www\.)?acast\.com/(?P<id>[^/#?]+)' _VALID_URL = r'''(?x)
_TEST = { https?://
'url': 'https://www.acast.com/condenasttraveler', (?:
(?:www\.)?acast\.com/|
play\.acast\.com/s/
)
(?P<id>[^/#?]+)
'''
_TESTS = [{
'url': 'https://www.acast.com/todayinfocus',
'info_dict': { 'info_dict': {
'id': '50544219-29bb-499e-a083-6087f4cb7797', 'id': '4efc5294-5385-4847-98bd-519799ce5786',
'title': 'Condé Nast Traveler Podcast', 'title': 'Today in Focus',
'description': 'md5:98646dee22a5b386626ae31866638fbd', 'description': 'md5:9ba5564de5ce897faeb12963f4537a64',
}, },
'playlist_mincount': 20, 'playlist_mincount': 35,
} }, {
_API_BASE_URL = 'https://www.acast.com/api/' 'url': 'http://play.acast.com/s/ft-banking-weekly',
'only_matching': True,
}]
_API_BASE_URL = 'https://play.acast.com/api/'
_PAGE_SIZE = 10 _PAGE_SIZE = 10
@classmethod @classmethod
@ -106,7 +112,7 @@ class ACastChannelIE(InfoExtractor):
channel_slug, note='Download page %d of channel data' % page) channel_slug, note='Download page %d of channel data' % page)
for cast in casts: for cast in casts:
yield self.url_result( yield self.url_result(
'https://www.acast.com/%s/%s' % (channel_slug, cast['url']), 'https://play.acast.com/s/%s/%s' % (channel_slug, cast['url']),
'ACast', cast['id']) 'ACast', cast['id'])
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -22,18 +22,19 @@ class AENetworksBaseIE(ThePlatformIE):
class AENetworksIE(AENetworksBaseIE): class AENetworksIE(AENetworksBaseIE):
IE_NAME = 'aenetworks' IE_NAME = 'aenetworks'
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network' IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?:www\.)? (?:www\.)?
(?P<domain> (?P<domain>
(?:history|aetv|mylifetime|lifetimemovieclub)\.com| (?:history(?:vault)?|aetv|mylifetime|lifetimemovieclub)\.com|
fyi\.tv fyi\.tv
)/ )/
(?: (?:
shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})| shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|
movies/(?P<movie_display_id>[^/]+)(?:/full-movie)?| movies/(?P<movie_display_id>[^/]+)(?:/full-movie)?|
specials/(?P<special_display_id>[^/]+)/full-special specials/(?P<special_display_id>[^/]+)/full-special|
collections/[^/]+/(?P<collection_display_id>[^/]+)
) )
''' '''
_TESTS = [{ _TESTS = [{
@ -80,6 +81,9 @@ class AENetworksIE(AENetworksBaseIE):
}, { }, {
'url': 'http://www.history.com/specials/sniper-into-the-kill-zone/full-special', 'url': 'http://www.history.com/specials/sniper-into-the-kill-zone/full-special',
'only_matching': True 'only_matching': True
}, {
'url': 'https://www.historyvault.com/collections/america-the-story-of-us/westward',
'only_matching': True
}] }]
_DOMAIN_TO_REQUESTOR_ID = { _DOMAIN_TO_REQUESTOR_ID = {
'history.com': 'HISTORY', 'history.com': 'HISTORY',
@ -90,9 +94,9 @@ class AENetworksIE(AENetworksBaseIE):
} }
def _real_extract(self, url): def _real_extract(self, url):
domain, show_path, movie_display_id, special_display_id = re.match(self._VALID_URL, url).groups() domain, show_path, movie_display_id, special_display_id, collection_display_id = re.match(self._VALID_URL, url).groups()
display_id = show_path or movie_display_id or special_display_id display_id = show_path or movie_display_id or special_display_id or collection_display_id
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id, headers=self.geo_verification_headers())
if show_path: if show_path:
url_parts = show_path.split('/') url_parts = show_path.split('/')
url_parts_len = len(url_parts) url_parts_len = len(url_parts)

View File

@ -43,10 +43,6 @@ class AmericasTestKitchenIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
partner_id = self._search_regex(
r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
webpage, 'kaltura partner id')
video_data = self._parse_json( video_data = self._parse_json(
self._search_regex( self._search_regex(
r'window\.__INITIAL_STATE__\s*=\s*({.+?})\s*;\s*</script>', r'window\.__INITIAL_STATE__\s*=\s*({.+?})\s*;\s*</script>',
@ -58,7 +54,18 @@ class AmericasTestKitchenIE(InfoExtractor):
(lambda x: x['episodeDetail']['content']['data'], (lambda x: x['episodeDetail']['content']['data'],
lambda x: x['videoDetail']['content']['data']), dict) lambda x: x['videoDetail']['content']['data']), dict)
ep_meta = ep_data.get('full_video', {}) ep_meta = ep_data.get('full_video', {})
external_id = ep_data.get('external_id') or ep_meta['external_id']
zype_id = ep_meta.get('zype_id')
if zype_id:
embed_url = 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id
ie_key = 'Zype'
else:
partner_id = self._search_regex(
r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
webpage, 'kaltura partner id')
external_id = ep_data.get('external_id') or ep_meta['external_id']
embed_url = 'kaltura:%s:%s' % (partner_id, external_id)
ie_key = 'Kaltura'
title = ep_data.get('title') or ep_meta.get('title') title = ep_data.get('title') or ep_meta.get('title')
description = clean_html(ep_meta.get('episode_description') or ep_data.get( description = clean_html(ep_meta.get('episode_description') or ep_data.get(
@ -72,8 +79,8 @@ class AmericasTestKitchenIE(InfoExtractor):
return { return {
'_type': 'url_transparent', '_type': 'url_transparent',
'url': 'kaltura:%s:%s' % (partner_id, external_id), 'url': embed_url,
'ie_key': 'Kaltura', 'ie_key': ie_key,
'title': title, 'title': title,
'description': description, 'description': description,
'thumbnail': thumbnail, 'thumbnail': thumbnail,

View File

@ -8,20 +8,23 @@ from .generic import GenericIE
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
ExtractorError, ExtractorError,
qualities,
int_or_none, int_or_none,
parse_duration, parse_duration,
qualities,
str_or_none,
try_get,
unified_strdate, unified_strdate,
xpath_text, unified_timestamp,
update_url_query, update_url_query,
url_or_none, url_or_none,
xpath_text,
) )
from ..compat import compat_etree_fromstring from ..compat import compat_etree_fromstring
class ARDMediathekIE(InfoExtractor): class ARDMediathekIE(InfoExtractor):
IE_NAME = 'ARD:mediathek' IE_NAME = 'ARD:mediathek'
_VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de|one\.ard\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?' _VALID_URL = r'^https?://(?:(?:(?:www|classic)\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de|one\.ard\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
_TESTS = [{ _TESTS = [{
# available till 26.07.2022 # available till 26.07.2022
@ -51,8 +54,15 @@ class ARDMediathekIE(InfoExtractor):
# audio # audio
'url': 'http://mediathek.rbb-online.de/radio/Hörspiel/Vor-dem-Fest/kulturradio/Audio?documentId=30796318&topRessort=radio&bcastId=9839158', 'url': 'http://mediathek.rbb-online.de/radio/Hörspiel/Vor-dem-Fest/kulturradio/Audio?documentId=30796318&topRessort=radio&bcastId=9839158',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://classic.ardmediathek.de/tv/Panda-Gorilla-Co/Panda-Gorilla-Co-Folge-274/Das-Erste/Video?bcastId=16355486&documentId=58234698',
'only_matching': True,
}] }]
@classmethod
def suitable(cls, url):
return False if ARDBetaMediathekIE.suitable(url) else super(ARDMediathekIE, cls).suitable(url)
def _extract_media_info(self, media_info_url, webpage, video_id): def _extract_media_info(self, media_info_url, webpage, video_id):
media_info = self._download_json( media_info = self._download_json(
media_info_url, video_id, 'Downloading media JSON') media_info_url, video_id, 'Downloading media JSON')
@ -173,13 +183,18 @@ class ARDMediathekIE(InfoExtractor):
title = self._html_search_regex( title = self._html_search_regex(
[r'<h1(?:\s+class="boxTopHeadline")?>(.*?)</h1>', [r'<h1(?:\s+class="boxTopHeadline")?>(.*?)</h1>',
r'<meta name="dcterms\.title" content="(.*?)"/>', r'<meta name="dcterms\.title" content="(.*?)"/>',
r'<h4 class="headline">(.*?)</h4>'], r'<h4 class="headline">(.*?)</h4>',
r'<title[^>]*>(.*?)</title>'],
webpage, 'title') webpage, 'title')
description = self._html_search_meta( description = self._html_search_meta(
'dcterms.abstract', webpage, 'description', default=None) 'dcterms.abstract', webpage, 'description', default=None)
if description is None: if description is None:
description = self._html_search_meta( description = self._html_search_meta(
'description', webpage, 'meta description') 'description', webpage, 'meta description', default=None)
if description is None:
description = self._html_search_regex(
r'<p\s+class="teasertext">(.+?)</p>',
webpage, 'teaser text', default=None)
# Thumbnail is sometimes not present. # Thumbnail is sometimes not present.
# It is in the mobile version, but that seems to use a different URL # It is in the mobile version, but that seems to use a different URL
@ -288,7 +303,7 @@ class ARDIE(InfoExtractor):
class ARDBetaMediathekIE(InfoExtractor): class ARDBetaMediathekIE(InfoExtractor):
_VALID_URL = r'https://beta\.ardmediathek\.de/[a-z]+/player/(?P<video_id>[a-zA-Z0-9]+)/(?P<display_id>[^/?#]+)' _VALID_URL = r'https://(?:beta|www)\.ardmediathek\.de/[^/]+/(?:player|live)/(?P<video_id>[a-zA-Z0-9]+)(?:/(?P<display_id>[^/?#]+))?'
_TESTS = [{ _TESTS = [{
'url': 'https://beta.ardmediathek.de/ard/player/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE/die-robuste-roswita', 'url': 'https://beta.ardmediathek.de/ard/player/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE/die-robuste-roswita',
'md5': '2d02d996156ea3c397cfc5036b5d7f8f', 'md5': '2d02d996156ea3c397cfc5036b5d7f8f',
@ -302,12 +317,18 @@ class ARDBetaMediathekIE(InfoExtractor):
'upload_date': '20180826', 'upload_date': '20180826',
'ext': 'mp4', 'ext': 'mp4',
}, },
}, {
'url': 'https://www.ardmediathek.de/ard/player/Y3JpZDovL3N3ci5kZS9hZXgvbzEwNzE5MTU/',
'only_matching': True,
}, {
'url': 'https://www.ardmediathek.de/swr/live/Y3JpZDovL3N3ci5kZS8xMzQ4MTA0Mg',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id') video_id = mobj.group('video_id')
display_id = mobj.group('display_id') display_id = mobj.group('display_id') or video_id
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
data_json = self._search_regex(r'window\.__APOLLO_STATE__\s*=\s*(\{.*);\n', webpage, 'json') data_json = self._search_regex(r'window\.__APOLLO_STATE__\s*=\s*(\{.*);\n', webpage, 'json')
@ -318,43 +339,62 @@ class ARDBetaMediathekIE(InfoExtractor):
'display_id': display_id, 'display_id': display_id,
} }
formats = [] formats = []
subtitles = {}
geoblocked = False
for widget in data.values(): for widget in data.values():
if widget.get('_geoblocked'): if widget.get('_geoblocked') is True:
raise ExtractorError('This video is not available due to geoblocking', expected=True) geoblocked = True
if '_duration' in widget: if '_duration' in widget:
res['duration'] = widget['_duration'] res['duration'] = int_or_none(widget['_duration'])
if 'clipTitle' in widget: if 'clipTitle' in widget:
res['title'] = widget['clipTitle'] res['title'] = widget['clipTitle']
if '_previewImage' in widget: if '_previewImage' in widget:
res['thumbnail'] = widget['_previewImage'] res['thumbnail'] = widget['_previewImage']
if 'broadcastedOn' in widget: if 'broadcastedOn' in widget:
res['upload_date'] = unified_strdate(widget['broadcastedOn']) res['timestamp'] = unified_timestamp(widget['broadcastedOn'])
if 'synopsis' in widget: if 'synopsis' in widget:
res['description'] = widget['synopsis'] res['description'] = widget['synopsis']
if '_subtitleUrl' in widget: subtitle_url = url_or_none(widget.get('_subtitleUrl'))
res['subtitles'] = {'de': [{ if subtitle_url:
subtitles.setdefault('de', []).append({
'ext': 'ttml', 'ext': 'ttml',
'url': widget['_subtitleUrl'], 'url': subtitle_url,
}]} })
if '_quality' in widget: if '_quality' in widget:
format_url = widget['_stream']['json'][0] format_url = url_or_none(try_get(
widget, lambda x: x['_stream']['json'][0]))
if format_url.endswith('.f4m'): if not format_url:
continue
ext = determine_ext(format_url)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
format_url + '?hdcore=3.11.0', format_url + '?hdcore=3.11.0',
video_id, f4m_id='hds', fatal=False)) video_id, f4m_id='hds', fatal=False))
elif format_url.endswith('m3u8'): elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', m3u8_id='hls', fatal=False)) format_url, video_id, 'mp4', m3u8_id='hls',
fatal=False))
else: else:
# HTTP formats are not available when geoblocked is True,
# other formats are fine though
if geoblocked:
continue
quality = str_or_none(widget.get('_quality'))
formats.append({ formats.append({
'format_id': 'http-' + widget['_quality'], 'format_id': ('http-' + quality) if quality else 'http',
'url': format_url, 'url': format_url,
'preference': 10, # Plain HTTP, that's nice 'preference': 10, # Plain HTTP, that's nice
}) })
if not formats and geoblocked:
self.raise_geo_restricted(
msg='This video is not available due to geoblocking',
countries=['DE'])
self._sort_formats(formats) self._sort_formats(formats)
res['formats'] = formats res.update({
'subtitles': subtitles,
'formats': formats,
})
return res return res

View File

@ -28,8 +28,10 @@ class ATVAtIE(InfoExtractor):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
video_data = self._parse_json(unescapeHTML(self._search_regex( video_data = self._parse_json(unescapeHTML(self._search_regex(
r'class="[^"]*jsb_video/FlashPlayer[^"]*"[^>]+data-jsb="([^"]+)"', [r'flashPlayerOptions\s*=\s*(["\'])(?P<json>(?:(?!\1).)+)\1',
webpage, 'player data')), display_id)['config']['initial_video'] r'class="[^"]*jsb_video/FlashPlayer[^"]*"[^>]+data-jsb="(?P<json>[^"]+)"'],
webpage, 'player data', group='json')),
display_id)['config']['initial_video']
video_id = video_data['id'] video_id = video_data['id']
video_title = video_data['title'] video_title = video_data['title']

View File

@ -62,7 +62,7 @@ class AudiomackIE(InfoExtractor):
# Audiomack wraps a lot of soundcloud tracks in their branded wrapper # Audiomack wraps a lot of soundcloud tracks in their branded wrapper
# if so, pass the work off to the soundcloud extractor # if so, pass the work off to the soundcloud extractor
if SoundcloudIE.suitable(api_response['url']): if SoundcloudIE.suitable(api_response['url']):
return {'_type': 'url', 'url': api_response['url'], 'ie_key': 'Soundcloud'} return self.url_result(api_response['url'], SoundcloudIE.ie_key())
return { return {
'id': compat_str(api_response.get('id', album_url_tag)), 'id': compat_str(api_response.get('id', album_url_tag)),

View File

@ -36,7 +36,6 @@ class AZMedienIE(InfoExtractor):
'id': '1_anruz3wy', 'id': '1_anruz3wy',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Bundesrats-Vakanzen / EU-Rahmenabkommen', 'title': 'Bundesrats-Vakanzen / EU-Rahmenabkommen',
'description': 'md5:dd9f96751ec9c35e409a698a328402f3',
'uploader_id': 'TVOnline', 'uploader_id': 'TVOnline',
'upload_date': '20180930', 'upload_date': '20180930',
'timestamp': 1538328802, 'timestamp': 1538328802,
@ -53,15 +52,12 @@ class AZMedienIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
video_id = mobj.group('id') video_id = mobj.group('id')
entry_id = mobj.group('kaltura_id') entry_id = mobj.group('kaltura_id')
if not entry_id: if not entry_id:
webpage = self._download_webpage(url, video_id) api_url = 'https://www.%s/api/pub/gql/%s' % (host, host.split('.')[0])
api_path = self._search_regex(
r'["\']apiPath["\']\s*:\s*["\']([^"^\']+)["\']',
webpage, 'api path')
api_url = 'https://www.%s%s' % (mobj.group('host'), api_path)
payload = { payload = {
'query': '''query VideoContext($articleId: ID!) { 'query': '''query VideoContext($articleId: ID!) {
article: node(id: $articleId) { article: node(id: $articleId) {

View File

@ -795,6 +795,15 @@ class BBCIE(BBCCoUkIE):
'uploader': 'Radio 3', 'uploader': 'Radio 3',
'uploader_id': 'bbc_radio_three', 'uploader_id': 'bbc_radio_three',
}, },
}, {
'url': 'http://www.bbc.co.uk/learningenglish/chinese/features/lingohack/ep-181227',
'info_dict': {
'id': 'p06w9tws',
'ext': 'mp4',
'title': 'md5:2fabf12a726603193a2879a055f72514',
'description': 'Learn English words and phrases from this story',
},
'add_ie': [BBCCoUkIE.ie_key()],
}] }]
@classmethod @classmethod
@ -945,6 +954,15 @@ class BBCIE(BBCCoUkIE):
if entries: if entries:
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description) return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
# http://www.bbc.co.uk/learningenglish/chinese/features/lingohack/ep-181227
group_id = self._search_regex(
r'<div[^>]+\bclass=["\']video["\'][^>]+\bdata-pid=["\'](%s)' % self._ID_REGEX,
webpage, 'group id', default=None)
if playlist_id:
return self.url_result(
'https://www.bbc.co.uk/programmes/%s' % group_id,
ie=BBCCoUkIE.ie_key())
# single video story (e.g. http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret) # single video story (e.g. http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret)
programme_id = self._search_regex( programme_id = self._search_regex(
[r'data-(?:video-player|media)-vpid="(%s)"' % self._ID_REGEX, [r'data-(?:video-player|media)-vpid="(%s)"' % self._ID_REGEX,

View File

@ -1,15 +1,10 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import compat_str
compat_chr,
compat_ord,
compat_urllib_parse_unquote,
)
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
parse_iso8601, unified_timestamp,
urljoin,
) )
@ -36,29 +31,9 @@ class BeegIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
cpl_url = self._search_regex( beeg_version = self._search_regex(
r'<script[^>]+src=(["\'])(?P<url>(?:/static|(?:https?:)?//static\.beeg\.com)/cpl/\d+\.js.*?)\1', r'beeg_version\s*=\s*([\da-zA-Z_-]+)', webpage, 'beeg version',
webpage, 'cpl', default=None, group='url') default='1546225636701')
cpl_url = urljoin(url, cpl_url)
beeg_version, beeg_salt = [None] * 2
if cpl_url:
cpl = self._download_webpage(
self._proto_relative_url(cpl_url), video_id,
'Downloading cpl JS', fatal=False)
if cpl:
beeg_version = int_or_none(self._search_regex(
r'beeg_version\s*=\s*([^\b]+)', cpl,
'beeg version', default=None)) or self._search_regex(
r'/(\d+)\.js', cpl_url, 'beeg version', default=None)
beeg_salt = self._search_regex(
r'beeg_salt\s*=\s*(["\'])(?P<beeg_salt>.+?)\1', cpl, 'beeg salt',
default=None, group='beeg_salt')
beeg_version = beeg_version or '2185'
beeg_salt = beeg_salt or 'pmweAkq8lAYKdfWcFCUj0yoVgoPlinamH5UE1CB3H'
for api_path in ('', 'api.'): for api_path in ('', 'api.'):
video = self._download_json( video = self._download_json(
@ -68,37 +43,6 @@ class BeegIE(InfoExtractor):
if video: if video:
break break
def split(o, e):
def cut(s, x):
n.append(s[:x])
return s[x:]
n = []
r = len(o) % e
if r > 0:
o = cut(o, r)
while len(o) > e:
o = cut(o, e)
n.append(o)
return n
def decrypt_key(key):
# Reverse engineered from http://static.beeg.com/cpl/1738.js
a = beeg_salt
e = compat_urllib_parse_unquote(key)
o = ''.join([
compat_chr(compat_ord(e[n]) - compat_ord(a[n % len(a)]) % 21)
for n in range(len(e))])
return ''.join(split(o, 3)[::-1])
def decrypt_url(encrypted_url):
encrypted_url = self._proto_relative_url(
encrypted_url.replace('{DATA_MARKERS}', ''), 'https:')
key = self._search_regex(
r'/key=(.*?)%2Cend=', encrypted_url, 'key', default=None)
if not key:
return encrypted_url
return encrypted_url.replace(key, decrypt_key(key))
formats = [] formats = []
for format_id, video_url in video.items(): for format_id, video_url in video.items():
if not video_url: if not video_url:
@ -108,18 +52,20 @@ class BeegIE(InfoExtractor):
if not height: if not height:
continue continue
formats.append({ formats.append({
'url': decrypt_url(video_url), 'url': self._proto_relative_url(
video_url.replace('{DATA_MARKERS}', 'data=pc_XX__%s_0' % beeg_version), 'https:'),
'format_id': format_id, 'format_id': format_id,
'height': int(height), 'height': int(height),
}) })
self._sort_formats(formats) self._sort_formats(formats)
title = video['title'] title = video['title']
video_id = video.get('id') or video_id video_id = compat_str(video.get('id') or video_id)
display_id = video.get('code') display_id = video.get('code')
description = video.get('desc') description = video.get('desc')
series = video.get('ps_name')
timestamp = parse_iso8601(video.get('date'), ' ') timestamp = unified_timestamp(video.get('date'))
duration = int_or_none(video.get('duration')) duration = int_or_none(video.get('duration'))
tags = [tag.strip() for tag in video['tags'].split(',')] if video.get('tags') else None tags = [tag.strip() for tag in video['tags'].split(',')] if video.get('tags') else None
@ -129,6 +75,7 @@ class BeegIE(InfoExtractor):
'display_id': display_id, 'display_id': display_id,
'title': title, 'title': title,
'description': description, 'description': description,
'series': series,
'timestamp': timestamp, 'timestamp': timestamp,
'duration': duration, 'duration': duration,
'tags': tags, 'tags': tags,

View File

@ -5,7 +5,10 @@ import itertools
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import urlencode_postdata from ..utils import (
orderedSet,
urlencode_postdata,
)
class BitChuteIE(InfoExtractor): class BitChuteIE(InfoExtractor):
@ -37,16 +40,22 @@ class BitChuteIE(InfoExtractor):
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.57 Safari/537.36', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.57 Safari/537.36',
}) })
title = self._search_regex( title = self._html_search_regex(
(r'<[^>]+\bid=["\']video-title[^>]+>([^<]+)', r'<title>([^<]+)'), (r'<[^>]+\bid=["\']video-title[^>]+>([^<]+)', r'<title>([^<]+)'),
webpage, 'title', default=None) or self._html_search_meta( webpage, 'title', default=None) or self._html_search_meta(
'description', webpage, 'title', 'description', webpage, 'title',
default=None) or self._og_search_description(webpage) default=None) or self._og_search_description(webpage)
format_urls = []
for mobj in re.finditer(
r'addWebSeed\s*\(\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage):
format_urls.append(mobj.group('url'))
format_urls.extend(re.findall(r'as=(https?://[^&"\']+)', webpage))
formats = [ formats = [
{'url': mobj.group('url')} {'url': format_url}
for mobj in re.finditer( for format_url in orderedSet(format_urls)]
r'addWebSeed\s*\(\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage)] self._check_formats(formats, video_id)
self._sort_formats(formats) self._sort_formats(formats)
description = self._html_search_regex( description = self._html_search_regex(

View File

@ -14,6 +14,7 @@ class CamModelsIE(InfoExtractor):
_TESTS = [{ _TESTS = [{
'url': 'https://www.cammodels.com/cam/AutumnKnight/', 'url': 'https://www.cammodels.com/cam/AutumnKnight/',
'only_matching': True, 'only_matching': True,
'age_limit': 18
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -93,4 +94,5 @@ class CamModelsIE(InfoExtractor):
'title': self._live_title(user_id), 'title': self._live_title(user_id),
'is_live': True, 'is_live': True,
'formats': formats, 'formats': formats,
'age_limit': 18
} }

View File

@ -20,6 +20,7 @@ class CamTubeIE(InfoExtractor):
'duration': 1274, 'duration': 1274,
'timestamp': 1528018608, 'timestamp': 1528018608,
'upload_date': '20180603', 'upload_date': '20180603',
'age_limit': 18
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -66,4 +67,5 @@ class CamTubeIE(InfoExtractor):
'like_count': like_count, 'like_count': like_count,
'creator': creator, 'creator': creator,
'formats': formats, 'formats': formats,
'age_limit': 18
} }

View File

@ -25,6 +25,7 @@ class CamWithHerIE(InfoExtractor):
'comment_count': int, 'comment_count': int,
'uploader': 'MileenaK', 'uploader': 'MileenaK',
'upload_date': '20160322', 'upload_date': '20160322',
'age_limit': 18,
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -84,4 +85,5 @@ class CamWithHerIE(InfoExtractor):
'comment_count': comment_count, 'comment_count': comment_count,
'uploader': uploader, 'uploader': uploader,
'upload_date': upload_date, 'upload_date': upload_date,
'age_limit': 18
} }

View File

@ -82,6 +82,12 @@ class CarambaTVPageIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
videomore_url = VideomoreIE._extract_url(webpage) videomore_url = VideomoreIE._extract_url(webpage)
if not videomore_url:
videomore_id = self._search_regex(
r'getVMCode\s*\(\s*["\']?(\d+)', webpage, 'videomore id',
default=None)
if videomore_id:
videomore_url = 'videomore:%s' % videomore_id
if videomore_url: if videomore_url:
title = self._og_search_title(webpage) title = self._og_search_title(webpage)
return { return {

View File

@ -1,20 +1,19 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .turner import TurnerBaseIE from .turner import TurnerBaseIE
from ..utils import int_or_none
class CartoonNetworkIE(TurnerBaseIE): class CartoonNetworkIE(TurnerBaseIE):
_VALID_URL = r'https?://(?:www\.)?cartoonnetwork\.com/video/(?:[^/]+/)+(?P<id>[^/?#]+)-(?:clip|episode)\.html' _VALID_URL = r'https?://(?:www\.)?cartoonnetwork\.com/video/(?:[^/]+/)+(?P<id>[^/?#]+)-(?:clip|episode)\.html'
_TEST = { _TEST = {
'url': 'http://www.cartoonnetwork.com/video/teen-titans-go/starfire-the-cat-lady-clip.html', 'url': 'https://www.cartoonnetwork.com/video/ben-10/how-to-draw-upgrade-episode.html',
'info_dict': { 'info_dict': {
'id': '8a250ab04ed07e6c014ef3f1e2f9016c', 'id': '6e3375097f63874ebccec7ef677c1c3845fa850e',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Starfire the Cat Lady', 'title': 'How to Draw Upgrade',
'description': 'Robin decides to become a cat so that Starfire will finally love him.', 'description': 'md5:2061d83776db7e8be4879684eefe8c0f',
}, },
'params': { 'params': {
# m3u8 download # m3u8 download
@ -25,18 +24,39 @@ class CartoonNetworkIE(TurnerBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
id_type, video_id = re.search(r"_cnglobal\.cvp(Video|Title)Id\s*=\s*'([^']+)';", webpage).groups()
query = ('id' if id_type == 'Video' else 'titleId') + '=' + video_id def find_field(global_re, name, content_re=None, value_re='[^"]+', fatal=False):
return self._extract_cvp_info( metadata_re = ''
'http://www.cartoonnetwork.com/video-seo-svc/episodeservices/getCvpPlaylist?networkName=CN2&' + query, video_id, { if content_re:
'secure': { metadata_re = r'|video_metadata\.content_' + content_re
'media_src': 'http://androidhls-secure.cdn.turner.com/toon/big', return self._search_regex(
'tokenizer_src': 'https://token.vgtf.net/token/token_mobile', r'(?:_cnglobal\.currentVideo\.%s%s)\s*=\s*"(%s)";' % (global_re, metadata_re, value_re),
}, webpage, name, fatal=fatal)
}, {
media_id = find_field('mediaId', 'media id', 'id', '[0-9a-f]{40}', True)
title = find_field('episodeTitle', 'title', '(?:episodeName|name)', fatal=True)
info = self._extract_ngtv_info(
media_id, {'networkId': 'cartoonnetwork'}, {
'url': url, 'url': url,
'site_name': 'CartoonNetwork', 'site_name': 'CartoonNetwork',
'auth_required': self._search_regex( 'auth_required': find_field('authType', 'auth type') != 'unauth',
r'_cnglobal\.cvpFullOrPreviewAuth\s*=\s*(true|false);',
webpage, 'auth required', default='false') == 'true',
}) })
series = find_field(
'propertyName', 'series', 'showName') or self._html_search_meta('partOfSeries', webpage)
info.update({
'id': media_id,
'display_id': display_id,
'title': title,
'description': self._html_search_meta('description', webpage),
'series': series,
'episode': title,
})
for field in ('season', 'episode'):
field_name = field + 'Number'
info[field + '_number'] = int_or_none(find_field(
field_name, field + ' number', value_re=r'\d+') or self._html_search_meta(field_name, webpage))
return info

View File

@ -0,0 +1,142 @@
# coding: utf-8
from __future__ import unicode_literals
import itertools
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlparse,
)
from ..utils import (
clean_html,
float_or_none,
int_or_none,
try_get,
urlencode_postdata,
)
class CiscoLiveBaseIE(InfoExtractor):
# These appear to be constant across all Cisco Live presentations
# and are not tied to any user session or event
RAINFOCUS_API_URL = 'https://events.rainfocus.com/api/%s'
RAINFOCUS_API_PROFILE_ID = 'Na3vqYdAlJFSxhYTYQGuMbpafMqftalz'
RAINFOCUS_WIDGET_ID = 'n6l4Lo05R8fiy3RpUBm447dZN8uNWoye'
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5647924234001/SyK2FdqjM_default/index.html?videoId=%s'
HEADERS = {
'Origin': 'https://ciscolive.cisco.com',
'rfApiProfileId': RAINFOCUS_API_PROFILE_ID,
'rfWidgetId': RAINFOCUS_WIDGET_ID,
}
def _call_api(self, ep, rf_id, query, referrer, note=None):
headers = self.HEADERS.copy()
headers['Referer'] = referrer
return self._download_json(
self.RAINFOCUS_API_URL % ep, rf_id, note=note,
data=urlencode_postdata(query), headers=headers)
def _parse_rf_item(self, rf_item):
event_name = rf_item.get('eventName')
title = rf_item['title']
description = clean_html(rf_item.get('abstract'))
presenter_name = try_get(rf_item, lambda x: x['participants'][0]['fullName'])
bc_id = rf_item['videos'][0]['url']
bc_url = self.BRIGHTCOVE_URL_TEMPLATE % bc_id
duration = float_or_none(try_get(rf_item, lambda x: x['times'][0]['length']))
location = try_get(rf_item, lambda x: x['times'][0]['room'])
if duration:
duration = duration * 60
return {
'_type': 'url_transparent',
'url': bc_url,
'ie_key': 'BrightcoveNew',
'title': title,
'description': description,
'duration': duration,
'creator': presenter_name,
'location': location,
'series': event_name,
}
class CiscoLiveSessionIE(CiscoLiveBaseIE):
_VALID_URL = r'https?://ciscolive\.cisco\.com/on-demand-library/\??[^#]*#/session/(?P<id>[^/?&]+)'
_TEST = {
'url': 'https://ciscolive.cisco.com/on-demand-library/?#/session/1423353499155001FoSs',
'md5': 'c98acf395ed9c9f766941c70f5352e22',
'info_dict': {
'id': '5803694304001',
'ext': 'mp4',
'title': '13 Smart Automations to Monitor Your Cisco IOS Network',
'description': 'md5:ec4a436019e09a918dec17714803f7cc',
'timestamp': 1530305395,
'upload_date': '20180629',
'uploader_id': '5647924234001',
'location': '16B Mezz.',
},
}
def _real_extract(self, url):
rf_id = self._match_id(url)
rf_result = self._call_api('session', rf_id, {'id': rf_id}, url)
return self._parse_rf_item(rf_result['items'][0])
class CiscoLiveSearchIE(CiscoLiveBaseIE):
_VALID_URL = r'https?://ciscolive\.cisco\.com/on-demand-library/'
_TESTS = [{
'url': 'https://ciscolive.cisco.com/on-demand-library/?search.event=ciscoliveus2018&search.technicallevel=scpsSkillLevel_aintroductory&search.focus=scpsSessionFocus_designAndDeployment#/',
'info_dict': {
'title': 'Search query',
},
'playlist_count': 5,
}, {
'url': 'https://ciscolive.cisco.com/on-demand-library/?search.technology=scpsTechnology_applicationDevelopment&search.technology=scpsTechnology_ipv6&search.focus=scpsSessionFocus_troubleshootingTroubleshooting#/',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if CiscoLiveSessionIE.suitable(url) else super(CiscoLiveSearchIE, cls).suitable(url)
@staticmethod
def _check_bc_id_exists(rf_item):
return int_or_none(try_get(rf_item, lambda x: x['videos'][0]['url'])) is not None
def _entries(self, query, url):
query['size'] = 50
query['from'] = 0
for page_num in itertools.count(1):
results = self._call_api(
'search', None, query, url,
'Downloading search JSON page %d' % page_num)
sl = try_get(results, lambda x: x['sectionList'][0], dict)
if sl:
results = sl
items = results.get('items')
if not items or not isinstance(items, list):
break
for item in items:
if not isinstance(item, dict):
continue
if not self._check_bc_id_exists(item):
continue
yield self._parse_rf_item(item)
size = int_or_none(results.get('size'))
if size is not None:
query['size'] = size
total = int_or_none(results.get('total'))
if total is not None and query['from'] + query['size'] > total:
break
query['from'] += query['size']
def _real_extract(self, url):
query = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
query['type'] = 'session'
return self.playlist_result(
self._entries(query, url), playlist_title='Search query')

View File

@ -119,11 +119,7 @@ class CNNBlogsIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
webpage = self._download_webpage(url, url_basename(url)) webpage = self._download_webpage(url, url_basename(url))
cnn_url = self._html_search_regex(r'data-url="(.+?)"', webpage, 'cnn url') cnn_url = self._html_search_regex(r'data-url="(.+?)"', webpage, 'cnn url')
return { return self.url_result(cnn_url, CNNIE.ie_key())
'_type': 'url',
'url': cnn_url,
'ie_key': CNNIE.ie_key(),
}
class CNNArticleIE(InfoExtractor): class CNNArticleIE(InfoExtractor):
@ -145,8 +141,4 @@ class CNNArticleIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
webpage = self._download_webpage(url, url_basename(url)) webpage = self._download_webpage(url, url_basename(url))
cnn_url = self._html_search_regex(r"video:\s*'([^']+)'", webpage, 'cnn url') cnn_url = self._html_search_regex(r"video:\s*'([^']+)'", webpage, 'cnn url')
return { return self.url_result('http://cnn.com/video/?/video/' + cnn_url, CNNIE.ie_key())
'_type': 'url',
'url': 'http://cnn.com/video/?/video/' + cnn_url,
'ie_key': CNNIE.ie_key(),
}

View File

@ -1239,17 +1239,30 @@ class InfoExtractor(object):
if expected_type is not None and expected_type != item_type: if expected_type is not None and expected_type != item_type:
return info return info
if item_type in ('TVEpisode', 'Episode'): if item_type in ('TVEpisode', 'Episode'):
episode_name = unescapeHTML(e.get('name'))
info.update({ info.update({
'episode': unescapeHTML(e.get('name')), 'episode': episode_name,
'episode_number': int_or_none(e.get('episodeNumber')), 'episode_number': int_or_none(e.get('episodeNumber')),
'description': unescapeHTML(e.get('description')), 'description': unescapeHTML(e.get('description')),
}) })
if not info.get('title') and episode_name:
info['title'] = episode_name
part_of_season = e.get('partOfSeason') part_of_season = e.get('partOfSeason')
if isinstance(part_of_season, dict) and part_of_season.get('@type') in ('TVSeason', 'Season', 'CreativeWorkSeason'): if isinstance(part_of_season, dict) and part_of_season.get('@type') in ('TVSeason', 'Season', 'CreativeWorkSeason'):
info['season_number'] = int_or_none(part_of_season.get('seasonNumber')) info.update({
'season': unescapeHTML(part_of_season.get('name')),
'season_number': int_or_none(part_of_season.get('seasonNumber')),
})
part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries') part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
if isinstance(part_of_series, dict) and part_of_series.get('@type') in ('TVSeries', 'Series', 'CreativeWorkSeries'): if isinstance(part_of_series, dict) and part_of_series.get('@type') in ('TVSeries', 'Series', 'CreativeWorkSeries'):
info['series'] = unescapeHTML(part_of_series.get('name')) info['series'] = unescapeHTML(part_of_series.get('name'))
elif item_type == 'Movie':
info.update({
'title': unescapeHTML(e.get('name')),
'description': unescapeHTML(e.get('description')),
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('dateCreated')),
})
elif item_type in ('Article', 'NewsArticle'): elif item_type in ('Article', 'NewsArticle'):
info.update({ info.update({
'timestamp': parse_iso8601(e.get('datePublished')), 'timestamp': parse_iso8601(e.get('datePublished')),
@ -1586,6 +1599,7 @@ class InfoExtractor(object):
# References: # References:
# 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-21 # 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-21
# 2. https://github.com/rg3/youtube-dl/issues/12211 # 2. https://github.com/rg3/youtube-dl/issues/12211
# 3. https://github.com/rg3/youtube-dl/issues/18923
# We should try extracting formats only from master playlists [1, 4.3.4], # We should try extracting formats only from master playlists [1, 4.3.4],
# i.e. playlists that describe available qualities. On the other hand # i.e. playlists that describe available qualities. On the other hand
@ -1657,11 +1671,16 @@ class InfoExtractor(object):
rendition = stream_group[0] rendition = stream_group[0]
return rendition.get('NAME') or stream_group_id return rendition.get('NAME') or stream_group_id
# parse EXT-X-MEDIA tags before EXT-X-STREAM-INF in order to have the
# chance to detect video only formats when EXT-X-STREAM-INF tags
# precede EXT-X-MEDIA tags in HLS manifest such as [3].
for line in m3u8_doc.splitlines():
if line.startswith('#EXT-X-MEDIA:'):
extract_media(line)
for line in m3u8_doc.splitlines(): for line in m3u8_doc.splitlines():
if line.startswith('#EXT-X-STREAM-INF:'): if line.startswith('#EXT-X-STREAM-INF:'):
last_stream_inf = parse_m3u8_attributes(line) last_stream_inf = parse_m3u8_attributes(line)
elif line.startswith('#EXT-X-MEDIA:'):
extract_media(line)
elif line.startswith('#') or not line.strip(): elif line.startswith('#') or not line.strip():
continue continue
else: else:
@ -2614,7 +2633,7 @@ class InfoExtractor(object):
'id': this_video_id, 'id': this_video_id,
'title': unescapeHTML(video_data['title'] if require_title else video_data.get('title')), 'title': unescapeHTML(video_data['title'] if require_title else video_data.get('title')),
'description': video_data.get('description'), 'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')), 'thumbnail': urljoin(base_url, self._proto_relative_url(video_data.get('image'))),
'timestamp': int_or_none(video_data.get('pubdate')), 'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')), 'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
'subtitles': subtitles, 'subtitles': subtitles,
@ -2641,12 +2660,9 @@ class InfoExtractor(object):
for source in jwplayer_sources_data: for source in jwplayer_sources_data:
if not isinstance(source, dict): if not isinstance(source, dict):
continue continue
source_url = self._proto_relative_url(source.get('file')) source_url = urljoin(
if not source_url: base_url, self._proto_relative_url(source.get('file')))
continue if not source_url or source_url in urls:
if base_url:
source_url = compat_urlparse.urljoin(base_url, source_url)
if source_url in urls:
continue continue
urls.append(source_url) urls.append(source_url)
source_type = source.get('type') or '' source_type = source.get('type') or ''

View File

@ -1,7 +1,10 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals, division from __future__ import unicode_literals, division
import hashlib
import hmac
import re import re
import time
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_HTTPError from ..compat import compat_HTTPError
@ -48,6 +51,21 @@ class CrackleIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
_MEDIA_FILE_SLOTS = {
'360p.mp4': {
'width': 640,
'height': 360,
},
'480p.mp4': {
'width': 768,
'height': 432,
},
'480p_1mbps.mp4': {
'width': 852,
'height': 480,
},
}
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
@ -59,13 +77,16 @@ class CrackleIE(InfoExtractor):
for country in countries: for country in countries:
try: try:
# Authorization generation algorithm is reverse engineered from:
# https://www.sonycrackle.com/static/js/main.ea93451f.chunk.js
media_detail_url = 'https://web-api-us.crackle.com/Service.svc/details/media/%s/%s?disableProtocols=true' % (video_id, country)
timestamp = time.strftime('%Y%m%d%H%M', time.gmtime())
h = hmac.new(b'IGSLUQCBDFHEOIFM', '|'.join([media_detail_url, timestamp]).encode(), hashlib.sha1).hexdigest().upper()
media = self._download_json( media = self._download_json(
'https://web-api-us.crackle.com/Service.svc/details/media/%s/%s' media_detail_url, video_id, 'Downloading media JSON as %s' % country,
% (video_id, country), video_id, 'Unable to download media JSON', headers={
'Downloading media JSON as %s' % country, 'Accept': 'application/json',
'Unable to download media JSON', query={ 'Authorization': '|'.join([h, timestamp, '117', '1']),
'disableProtocols': 'true',
'format': 'json'
}) })
except ExtractorError as e: except ExtractorError as e:
# 401 means geo restriction, trying next country # 401 means geo restriction, trying next country
@ -95,6 +116,20 @@ class CrackleIE(InfoExtractor):
elif ext == 'mpd': elif ext == 'mpd':
formats.extend(self._extract_mpd_formats( formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id='dash', fatal=False)) format_url, video_id, mpd_id='dash', fatal=False))
elif format_url.endswith('.ism/Manifest'):
formats.extend(self._extract_ism_formats(
format_url, video_id, ism_id='mss', fatal=False))
else:
mfs_path = e.get('Type')
mfs_info = self._MEDIA_FILE_SLOTS.get(mfs_path)
if not mfs_info:
continue
formats.append({
'url': format_url,
'format_id': 'http-' + mfs_path.split('.')[0],
'width': mfs_info['width'],
'height': mfs_info['height'],
})
self._sort_formats(formats) self._sort_formats(formats)
description = media.get('Description') description = media.get('Description')

View File

@ -144,7 +144,7 @@ class CrunchyrollBaseIE(InfoExtractor):
class CrunchyrollIE(CrunchyrollBaseIE, VRVIE): class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
IE_NAME = 'crunchyroll' IE_NAME = 'crunchyroll'
_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:media(?:-|/\?id=)|[^/]*/[^/?&]*?)(?P<video_id>[0-9]+))(?:[/?&]|$)' _VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:media(?:-|/\?id=)|(?:[^/]*/){1,2}[^/?&]*?)(?P<video_id>[0-9]+))(?:[/?&]|$)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513', 'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
'info_dict': { 'info_dict': {
@ -269,6 +269,9 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
}, { }, {
'url': 'http://www.crunchyroll.com/media-723735', 'url': 'http://www.crunchyroll.com/media-723735',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.crunchyroll.com/en-gb/mob-psycho-100/episode-2-urban-legends-encountering-rumors-780921',
'only_matching': True,
}] }]
_FORMAT_IDS = { _FORMAT_IDS = {

View File

@ -46,8 +46,24 @@ class CuriosityStreamBaseIE(InfoExtractor):
self._handle_errors(result) self._handle_errors(result)
self._auth_token = result['message']['auth_token'] self._auth_token = result['message']['auth_token']
def _extract_media_info(self, media):
video_id = compat_str(media['id']) class CuriosityStreamIE(CuriosityStreamBaseIE):
IE_NAME = 'curiositystream'
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/video/(?P<id>\d+)'
_TEST = {
'url': 'https://app.curiositystream.com/video/2',
'md5': '262bb2f257ff301115f1973540de8983',
'info_dict': {
'id': '2',
'ext': 'mp4',
'title': 'How Did You Develop The Internet?',
'description': 'Vint Cerf, Google\'s Chief Internet Evangelist, describes how he and Bob Kahn created the internet.',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
media = self._call_api('media/' + video_id, video_id)
title = media['title'] title = media['title']
formats = [] formats = []
@ -114,38 +130,21 @@ class CuriosityStreamBaseIE(InfoExtractor):
} }
class CuriosityStreamIE(CuriosityStreamBaseIE):
IE_NAME = 'curiositystream'
_VALID_URL = r'https?://app\.curiositystream\.com/video/(?P<id>\d+)'
_TEST = {
'url': 'https://app.curiositystream.com/video/2',
'md5': '262bb2f257ff301115f1973540de8983',
'info_dict': {
'id': '2',
'ext': 'mp4',
'title': 'How Did You Develop The Internet?',
'description': 'Vint Cerf, Google\'s Chief Internet Evangelist, describes how he and Bob Kahn created the internet.',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
media = self._call_api('media/' + video_id, video_id)
return self._extract_media_info(media)
class CuriosityStreamCollectionIE(CuriosityStreamBaseIE): class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
IE_NAME = 'curiositystream:collection' IE_NAME = 'curiositystream:collection'
_VALID_URL = r'https?://app\.curiositystream\.com/collection/(?P<id>\d+)' _VALID_URL = r'https?://(?:app\.)?curiositystream\.com/(?:collection|series)/(?P<id>\d+)'
_TEST = { _TESTS = [{
'url': 'https://app.curiositystream.com/collection/2', 'url': 'https://app.curiositystream.com/collection/2',
'info_dict': { 'info_dict': {
'id': '2', 'id': '2',
'title': 'Curious Minds: The Internet', 'title': 'Curious Minds: The Internet',
'description': 'How is the internet shaping our lives in the 21st Century?', 'description': 'How is the internet shaping our lives in the 21st Century?',
}, },
'playlist_mincount': 12, 'playlist_mincount': 17,
} }, {
'url': 'https://curiositystream.com/series/2',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
collection_id = self._match_id(url) collection_id = self._match_id(url)
@ -153,7 +152,10 @@ class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
'collections/' + collection_id, collection_id) 'collections/' + collection_id, collection_id)
entries = [] entries = []
for media in collection.get('media', []): for media in collection.get('media', []):
entries.append(self._extract_media_info(media)) media_id = compat_str(media.get('id'))
entries.append(self.url_result(
'https://curiositystream.com/video/' + media_id,
CuriosityStreamIE.ie_key(), media_id))
return self.playlist_result( return self.playlist_result(
entries, collection_id, entries, collection_id,
collection.get('title'), collection.get('description')) collection.get('title'), collection.get('description'))

View File

@ -17,16 +17,29 @@ from ..compat import compat_HTTPError
class DiscoveryIE(DiscoveryGoBaseIE): class DiscoveryIE(DiscoveryGoBaseIE):
_VALID_URL = r'''(?x)https?://(?:www\.)?(?P<site> _VALID_URL = r'''(?x)https?://
discovery| (?P<site>
investigationdiscovery| (?:www\.)?
discoverylife| (?:
animalplanet| discovery|
ahctv| investigationdiscovery|
destinationamerica| discoverylife|
sciencechannel| animalplanet|
tlc| ahctv|
velocity destinationamerica|
sciencechannel|
tlc|
velocity
)|
watch\.
(?:
hgtv|
foodnetwork|
travelchannel|
diynetwork|
cookingchanneltv|
motortrend
)
)\.com(?P<path>/tv-shows/[^/]+/(?:video|full-episode)s/(?P<id>[^./?#]+))''' )\.com(?P<path>/tv-shows/[^/]+/(?:video|full-episode)s/(?P<id>[^./?#]+))'''
_TESTS = [{ _TESTS = [{
'url': 'https://www.discovery.com/tv-shows/cash-cab/videos/dave-foley', 'url': 'https://www.discovery.com/tv-shows/cash-cab/videos/dave-foley',
@ -71,7 +84,7 @@ class DiscoveryIE(DiscoveryGoBaseIE):
if not access_token: if not access_token:
access_token = self._download_json( access_token = self._download_json(
'https://www.%s.com/anonymous' % site, display_id, query={ 'https://%s.com/anonymous' % site, display_id, query={
'authRel': 'authorization', 'authRel': 'authorization',
'client_id': try_get( 'client_id': try_get(
react_data, lambda x: x['application']['apiClientId'], react_data, lambda x: x['application']['apiClientId'],
@ -81,11 +94,12 @@ class DiscoveryIE(DiscoveryGoBaseIE):
})['access_token'] })['access_token']
try: try:
headers = self.geo_verification_headers()
headers['Authorization'] = 'Bearer ' + access_token
stream = self._download_json( stream = self._download_json(
'https://api.discovery.com/v1/streaming/video/' + video_id, 'https://api.discovery.com/v1/streaming/video/' + video_id,
display_id, headers={ display_id, headers=headers)
'Authorization': 'Bearer ' + access_token,
})
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403): if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403):
e_description = self._parse_json( e_description = self._parse_json(

View File

@ -4,7 +4,9 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
int_or_none,
NO_DEFAULT, NO_DEFAULT,
parse_duration,
str_to_int, str_to_int,
) )
@ -65,6 +67,9 @@ class DrTuberIE(InfoExtractor):
}) })
self._sort_formats(formats) self._sort_formats(formats)
duration = int_or_none(video_data.get('duration')) or parse_duration(
video_data.get('duration_format'))
title = self._html_search_regex( title = self._html_search_regex(
(r'<h1[^>]+class=["\']title[^>]+>([^<]+)', (r'<h1[^>]+class=["\']title[^>]+>([^<]+)',
r'<title>([^<]+)\s*@\s+DrTuber', r'<title>([^<]+)\s*@\s+DrTuber',
@ -103,4 +108,5 @@ class DrTuberIE(InfoExtractor):
'comment_count': comment_count, 'comment_count': comment_count,
'categories': categories, 'categories': categories,
'age_limit': self._rta_search(webpage), 'age_limit': self._rta_search(webpage),
'duration': duration,
} }

View File

@ -1,15 +1,25 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import binascii
import hashlib
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..aes import aes_cbc_decrypt
from ..compat import compat_urllib_parse_unquote
from ..utils import ( from ..utils import (
bytes_to_intlist,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
intlist_to_bytes,
float_or_none, float_or_none,
mimetype2ext, mimetype2ext,
parse_iso8601, str_or_none,
remove_end, unified_timestamp,
update_url_query, update_url_query,
url_or_none,
) )
@ -20,23 +30,31 @@ class DRTVIE(InfoExtractor):
IE_NAME = 'drtv' IE_NAME = 'drtv'
_TESTS = [{ _TESTS = [{
'url': 'https://www.dr.dk/tv/se/boern/ultra/klassen-ultra/klassen-darlig-taber-10', 'url': 'https://www.dr.dk/tv/se/boern/ultra/klassen-ultra/klassen-darlig-taber-10',
'md5': '7ae17b4e18eb5d29212f424a7511c184', 'md5': '25e659cccc9a2ed956110a299fdf5983',
'info_dict': { 'info_dict': {
'id': 'klassen-darlig-taber-10', 'id': 'klassen-darlig-taber-10',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Klassen - Dårlig taber (10)', 'title': 'Klassen - Dårlig taber (10)',
'description': 'md5:815fe1b7fa656ed80580f31e8b3c79aa', 'description': 'md5:815fe1b7fa656ed80580f31e8b3c79aa',
'timestamp': 1471991907, 'timestamp': 1539085800,
'upload_date': '20160823', 'upload_date': '20181009',
'duration': 606.84, 'duration': 606.84,
'series': 'Klassen',
'season': 'Klassen I',
'season_number': 1,
'season_id': 'urn:dr:mu:bundle:57d7e8216187a4031cfd6f6b',
'episode': 'Episode 10',
'episode_number': 10,
'release_year': 2016,
}, },
'expected_warnings': ['Unable to download f4m manifest'],
}, { }, {
# embed # embed
'url': 'https://www.dr.dk/nyheder/indland/live-christianias-rydning-af-pusher-street-er-i-gang', 'url': 'https://www.dr.dk/nyheder/indland/live-christianias-rydning-af-pusher-street-er-i-gang',
'info_dict': { 'info_dict': {
'id': 'christiania-pusher-street-ryddes-drdkrjpo', 'id': 'urn:dr:mu:programcard:57c926176187a50a9c6e83c6',
'ext': 'mp4', 'ext': 'mp4',
'title': 'LIVE Christianias rydning af Pusher Street er i gang', 'title': 'christiania pusher street ryddes drdkrjpo',
'description': 'md5:2a71898b15057e9b97334f61d04e6eb5', 'description': 'md5:2a71898b15057e9b97334f61d04e6eb5',
'timestamp': 1472800279, 'timestamp': 1472800279,
'upload_date': '20160902', 'upload_date': '20160902',
@ -45,17 +63,18 @@ class DRTVIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'expected_warnings': ['Unable to download f4m manifest'],
}, { }, {
# with SignLanguage formats # with SignLanguage formats
'url': 'https://www.dr.dk/tv/se/historien-om-danmark/-/historien-om-danmark-stenalder', 'url': 'https://www.dr.dk/tv/se/historien-om-danmark/-/historien-om-danmark-stenalder',
'info_dict': { 'info_dict': {
'id': 'historien-om-danmark-stenalder', 'id': 'historien-om-danmark-stenalder',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Historien om Danmark: Stenalder (1)', 'title': 'Historien om Danmark: Stenalder',
'description': 'md5:8c66dcbc1669bbc6f873879880f37f2a', 'description': 'md5:8c66dcbc1669bbc6f873879880f37f2a',
'timestamp': 1490401996, 'timestamp': 1546628400,
'upload_date': '20170325', 'upload_date': '20190104',
'duration': 3502.04, 'duration': 3502.56,
'formats': 'mincount:20', 'formats': 'mincount:20',
}, },
'params': { 'params': {
@ -74,20 +93,26 @@ class DRTVIE(InfoExtractor):
video_id = self._search_regex( video_id = self._search_regex(
(r'data-(?:material-identifier|episode-slug)="([^"]+)"', (r'data-(?:material-identifier|episode-slug)="([^"]+)"',
r'data-resource="[^>"]+mu/programcard/expanded/([^"]+)"'), r'data-resource="[^>"]+mu/programcard/expanded/([^"]+)"'),
webpage, 'video id') webpage, 'video id', default=None)
programcard = self._download_json( if not video_id:
'http://www.dr.dk/mu/programcard/expanded/%s' % video_id, video_id = compat_urllib_parse_unquote(self._search_regex(
video_id, 'Downloading video JSON') r'(urn(?:%3A|:)dr(?:%3A|:)mu(?:%3A|:)programcard(?:%3A|:)[\da-f]+)',
data = programcard['Data'][0] webpage, 'urn'))
title = remove_end(self._og_search_title( data = self._download_json(
webpage, default=None), ' | TV | DR') or data['Title'] 'https://www.dr.dk/mu-online/api/1.4/programcard/%s' % video_id,
video_id, 'Downloading video JSON', query={'expanded': 'true'})
title = str_or_none(data.get('Title')) or re.sub(
r'\s*\|\s*(?:TV\s*\|\s*DR|DRTV)$', '',
self._og_search_title(webpage))
description = self._og_search_description( description = self._og_search_description(
webpage, default=None) or data.get('Description') webpage, default=None) or data.get('Description')
timestamp = parse_iso8601(data.get('CreatedTime')) timestamp = unified_timestamp(
data.get('PrimaryBroadcastStartTime') or data.get('SortDateTime'))
thumbnail = None thumbnail = None
duration = None duration = None
@ -97,24 +122,62 @@ class DRTVIE(InfoExtractor):
formats = [] formats = []
subtitles = {} subtitles = {}
for asset in data['Assets']: assets = []
primary_asset = data.get('PrimaryAsset')
if isinstance(primary_asset, dict):
assets.append(primary_asset)
secondary_assets = data.get('SecondaryAssets')
if isinstance(secondary_assets, list):
for secondary_asset in secondary_assets:
if isinstance(secondary_asset, dict):
assets.append(secondary_asset)
def hex_to_bytes(hex):
return binascii.a2b_hex(hex.encode('ascii'))
def decrypt_uri(e):
n = int(e[2:10], 16)
a = e[10 + n:]
data = bytes_to_intlist(hex_to_bytes(e[10:10 + n]))
key = bytes_to_intlist(hashlib.sha256(
('%s:sRBzYNXBzkKgnjj8pGtkACch' % a).encode('utf-8')).digest())
iv = bytes_to_intlist(hex_to_bytes(a))
decrypted = aes_cbc_decrypt(data, key, iv)
return intlist_to_bytes(
decrypted[:-decrypted[-1]]).decode('utf-8').split('?')[0]
for asset in assets:
kind = asset.get('Kind') kind = asset.get('Kind')
if kind == 'Image': if kind == 'Image':
thumbnail = asset.get('Uri') thumbnail = url_or_none(asset.get('Uri'))
elif kind in ('VideoResource', 'AudioResource'): elif kind in ('VideoResource', 'AudioResource'):
duration = float_or_none(asset.get('DurationInMilliseconds'), 1000) duration = float_or_none(asset.get('DurationInMilliseconds'), 1000)
restricted_to_denmark = asset.get('RestrictedToDenmark') restricted_to_denmark = asset.get('RestrictedToDenmark')
asset_target = asset.get('Target') asset_target = asset.get('Target')
for link in asset.get('Links', []): for link in asset.get('Links', []):
uri = link.get('Uri') uri = link.get('Uri')
if not uri:
encrypted_uri = link.get('EncryptedUri')
if not encrypted_uri:
continue
try:
uri = decrypt_uri(encrypted_uri)
except Exception:
self.report_warning(
'Unable to decrypt EncryptedUri', video_id)
continue
uri = url_or_none(uri)
if not uri: if not uri:
continue continue
target = link.get('Target') target = link.get('Target')
format_id = target or '' format_id = target or ''
preference = None if asset_target in ('SpokenSubtitles', 'SignLanguage', 'VisuallyInterpreted'):
if asset_target in ('SpokenSubtitles', 'SignLanguage'):
preference = -1 preference = -1
format_id += '-%s' % asset_target format_id += '-%s' % asset_target
elif asset_target == 'Default':
preference = 1
else:
preference = None
if target == 'HDS': if target == 'HDS':
f4m_formats = self._extract_f4m_formats( f4m_formats = self._extract_f4m_formats(
uri + '?hdcore=3.3.0&plugin=aasp-3.3.0.99.43', uri + '?hdcore=3.3.0&plugin=aasp-3.3.0.99.43',
@ -140,19 +203,22 @@ class DRTVIE(InfoExtractor):
'vcodec': 'none' if kind == 'AudioResource' else None, 'vcodec': 'none' if kind == 'AudioResource' else None,
'preference': preference, 'preference': preference,
}) })
subtitles_list = asset.get('SubtitlesList') subtitles_list = asset.get('SubtitlesList') or asset.get('Subtitleslist')
if isinstance(subtitles_list, list): if isinstance(subtitles_list, list):
LANGS = { LANGS = {
'Danish': 'da', 'Danish': 'da',
} }
for subs in subtitles_list: for subs in subtitles_list:
if not subs.get('Uri'): if not isinstance(subs, dict):
continue continue
lang = subs.get('Language') or 'da' sub_uri = url_or_none(subs.get('Uri'))
subtitles.setdefault(LANGS.get(lang, lang), []).append({ if not sub_uri:
'url': subs['Uri'], continue
'ext': mimetype2ext(subs.get('MimeType')) or 'vtt' lang = subs.get('Language') or 'da'
}) subtitles.setdefault(LANGS.get(lang, lang), []).append({
'url': sub_uri,
'ext': mimetype2ext(subs.get('MimeType')) or 'vtt'
})
if not formats and restricted_to_denmark: if not formats and restricted_to_denmark:
self.raise_geo_restricted( self.raise_geo_restricted(
@ -170,6 +236,13 @@ class DRTVIE(InfoExtractor):
'duration': duration, 'duration': duration,
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'series': str_or_none(data.get('SeriesTitle')),
'season': str_or_none(data.get('SeasonTitle')),
'season_number': int_or_none(data.get('SeasonNumber')),
'season_id': str_or_none(data.get('SeasonUrn')),
'episode': str_or_none(data.get('EpisodeTitle')),
'episode_number': int_or_none(data.get('EpisodeNumber')),
'release_year': int_or_none(data.get('ProductionYear')),
} }

View File

@ -15,16 +15,16 @@ from ..utils import (
class DTubeIE(InfoExtractor): class DTubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?d\.tube/(?:#!/)?v/(?P<uploader_id>[0-9a-z.-]+)/(?P<id>[0-9a-z]{8})' _VALID_URL = r'https?://(?:www\.)?d\.tube/(?:#!/)?v/(?P<uploader_id>[0-9a-z.-]+)/(?P<id>[0-9a-z]{8})'
_TEST = { _TEST = {
'url': 'https://d.tube/#!/v/benswann/zqd630em', 'url': 'https://d.tube/#!/v/broncnutz/x380jtr1',
'md5': 'a03eaa186618ffa7a3145945543a251e', 'md5': '9f29088fa08d699a7565ee983f56a06e',
'info_dict': { 'info_dict': {
'id': 'zqd630em', 'id': 'x380jtr1',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Reality Check: FDA\'s Disinformation Campaign on Kratom', 'title': 'Lefty 3-Rings is Back Baby!! NCAA Picks',
'description': 'md5:700d164e066b87f9eac057949e4227c2', 'description': 'md5:60be222088183be3a42f196f34235776',
'uploader_id': 'benswann', 'uploader_id': 'broncnutz',
'upload_date': '20180222', 'upload_date': '20190107',
'timestamp': 1519328958, 'timestamp': 1546854054,
}, },
'params': { 'params': {
'format': '480p', 'format': '480p',
@ -48,7 +48,7 @@ class DTubeIE(InfoExtractor):
def canonical_url(h): def canonical_url(h):
if not h: if not h:
return None return None
return 'https://ipfs.io/ipfs/' + h return 'https://video.dtube.top/ipfs/' + h
formats = [] formats = []
for q in ('240', '480', '720', '1080', ''): for q in ('240', '480', '720', '1080', ''):

View File

@ -194,6 +194,10 @@ from .chirbit import (
ChirbitProfileIE, ChirbitProfileIE,
) )
from .cinchcast import CinchcastIE from .cinchcast import CinchcastIE
from .ciscolive import (
CiscoLiveSessionIE,
CiscoLiveSearchIE,
)
from .cjsw import CJSWIE from .cjsw import CJSWIE
from .cliphunter import CliphunterIE from .cliphunter import CliphunterIE
from .clippit import ClippitIE from .clippit import ClippitIE
@ -407,6 +411,7 @@ from .funk import (
from .funnyordie import FunnyOrDieIE from .funnyordie import FunnyOrDieIE
from .fusion import FusionIE from .fusion import FusionIE
from .fxnetworks import FXNetworksIE from .fxnetworks import FXNetworksIE
from .gaia import GaiaIE
from .gameinformer import GameInformerIE from .gameinformer import GameInformerIE
from .gameone import ( from .gameone import (
GameOneIE, GameOneIE,
@ -447,6 +452,7 @@ from .hellporno import HellPornoIE
from .helsinki import HelsinkiIE from .helsinki import HelsinkiIE
from .hentaistigma import HentaiStigmaIE from .hentaistigma import HentaiStigmaIE
from .hgtv import HGTVComShowIE from .hgtv import HGTVComShowIE
from .hketv import HKETVIE
from .hidive import HiDiveIE from .hidive import HiDiveIE
from .historicfilms import HistoricFilmsIE from .historicfilms import HistoricFilmsIE
from .hitbox import HitboxIE, HitboxLiveIE from .hitbox import HitboxIE, HitboxLiveIE
@ -465,6 +471,10 @@ from .hrti import (
) )
from .huajiao import HuajiaoIE from .huajiao import HuajiaoIE
from .huffpost import HuffPostIE from .huffpost import HuffPostIE
from .hungama import (
HungamaIE,
HungamaSongIE,
)
from .hypem import HypemIE from .hypem import HypemIE
from .iconosquare import IconosquareIE from .iconosquare import IconosquareIE
from .ign import ( from .ign import (
@ -479,12 +489,17 @@ from .imdb import (
from .imgur import ( from .imgur import (
ImgurIE, ImgurIE,
ImgurAlbumIE, ImgurAlbumIE,
ImgurGalleryIE,
) )
from .ina import InaIE from .ina import InaIE
from .inc import IncIE from .inc import IncIE
from .indavideo import IndavideoEmbedIE from .indavideo import IndavideoEmbedIE
from .infoq import InfoQIE from .infoq import InfoQIE
from .instagram import InstagramIE, InstagramUserIE from .instagram import (
InstagramIE,
InstagramUserIE,
InstagramTagIE,
)
from .internazionale import InternazionaleIE from .internazionale import InternazionaleIE
from .internetvideoarchive import InternetVideoArchiveIE from .internetvideoarchive import InternetVideoArchiveIE
from .iprima import IPrimaIE from .iprima import IPrimaIE
@ -549,6 +564,11 @@ from .lcp import (
) )
from .learnr import LearnrIE from .learnr import LearnrIE
from .lecture2go import Lecture2GoIE from .lecture2go import Lecture2GoIE
from .lecturio import (
LecturioIE,
LecturioCourseIE,
LecturioDeCourseIE,
)
from .leeco import ( from .leeco import (
LeIE, LeIE,
LePlaylistIE, LePlaylistIE,
@ -674,8 +694,7 @@ from .myvi import (
from .myvidster import MyVidsterIE from .myvidster import MyVidsterIE
from .nationalgeographic import ( from .nationalgeographic import (
NationalGeographicVideoIE, NationalGeographicVideoIE,
NationalGeographicIE, NationalGeographicTVIE,
NationalGeographicEpisodeGuideIE,
) )
from .naver import NaverIE from .naver import NaverIE
from .nba import NBAIE from .nba import NBAIE
@ -818,6 +837,7 @@ from .orf import (
ORFOE1IE, ORFOE1IE,
ORFIPTVIE, ORFIPTVIE,
) )
from .outsidetv import OutsideTVIE
from .packtpub import ( from .packtpub import (
PacktPubIE, PacktPubIE,
PacktPubCourseIE, PacktPubCourseIE,
@ -846,6 +866,7 @@ from .piksel import PikselIE
from .pinkbike import PinkbikeIE from .pinkbike import PinkbikeIE
from .pladform import PladformIE from .pladform import PladformIE
from .playfm import PlayFMIE from .playfm import PlayFMIE
from .playplustv import PlayPlusTVIE
from .plays import PlaysTVIE from .plays import PlaysTVIE
from .playtvak import PlaytvakIE from .playtvak import PlaytvakIE
from .playvid import PlayvidIE from .playvid import PlayvidIE
@ -1082,6 +1103,10 @@ from .tass import TassIE
from .tastytrade import TastyTradeIE from .tastytrade import TastyTradeIE
from .tbs import TBSIE from .tbs import TBSIE
from .tdslifeway import TDSLifewayIE from .tdslifeway import TDSLifewayIE
from .teachable import (
TeachableIE,
TeachableCourseIE,
)
from .teachertube import ( from .teachertube import (
TeacherTubeIE, TeacherTubeIE,
TeacherTubeUserIE, TeacherTubeUserIE,
@ -1120,6 +1145,10 @@ from .thisamericanlife import ThisAmericanLifeIE
from .thisav import ThisAVIE from .thisav import ThisAVIE
from .thisoldhouse import ThisOldHouseIE from .thisoldhouse import ThisOldHouseIE
from .threeqsdn import ThreeQSDNIE from .threeqsdn import ThreeQSDNIE
from .tiktok import (
TikTokIE,
TikTokUserIE,
)
from .tinypic import TinyPicIE from .tinypic import TinyPicIE
from .tmz import ( from .tmz import (
TMZIE, TMZIE,
@ -1175,7 +1204,9 @@ from .tvnet import TVNetIE
from .tvnoe import TVNoeIE from .tvnoe import TVNoeIE
from .tvnow import ( from .tvnow import (
TVNowIE, TVNowIE,
TVNowListIE, TVNowNewIE,
TVNowSeasonIE,
TVNowAnnualIE,
TVNowShowIE, TVNowShowIE,
) )
from .tvp import ( from .tvp import (
@ -1227,10 +1258,6 @@ from .uplynk import (
UplynkIE, UplynkIE,
UplynkPreplayIE, UplynkPreplayIE,
) )
from .upskill import (
UpskillIE,
UpskillCourseIE,
)
from .urort import UrortIE from .urort import UrortIE
from .urplay import URPlayIE from .urplay import URPlayIE
from .usanetwork import USANetworkIE from .usanetwork import USANetworkIE
@ -1299,6 +1326,7 @@ from .vimeo import (
VimeoReviewIE, VimeoReviewIE,
VimeoUserIE, VimeoUserIE,
VimeoWatchLaterIE, VimeoWatchLaterIE,
VHXEmbedIE,
) )
from .vimple import VimpleIE from .vimple import VimpleIE
from .vine import ( from .vine import (
@ -1334,7 +1362,6 @@ from .voxmedia import (
VoxMediaVolumeIE, VoxMediaVolumeIE,
VoxMediaIE, VoxMediaIE,
) )
from .vporn import VpornIE
from .vrt import VRTIE from .vrt import VRTIE
from .vrak import VrakIE from .vrak import VrakIE
from .vrv import ( from .vrv import (
@ -1348,6 +1375,7 @@ from .vuclip import VuClipIE
from .vvvvid import VVVVIDIE from .vvvvid import VVVVIDIE
from .vyborymos import VyboryMosIE from .vyborymos import VyboryMosIE
from .vzaar import VzaarIE from .vzaar import VzaarIE
from .wakanim import WakanimIE
from .walla import WallaIE from .walla import WallaIE
from .washingtonpost import ( from .washingtonpost import (
WashingtonPostIE, WashingtonPostIE,
@ -1386,6 +1414,7 @@ from .wsj import (
WSJIE, WSJIE,
WSJArticleIE, WSJArticleIE,
) )
from .wwe import WWEIE
from .xbef import XBefIE from .xbef import XBefIE
from .xboxclips import XboxClipsIE from .xboxclips import XboxClipsIE
from .xfileshare import XFileShareIE from .xfileshare import XFileShareIE
@ -1470,6 +1499,7 @@ from .zattoo import (
QuantumTVIE, QuantumTVIE,
QuicklineIE, QuicklineIE,
QuicklineLiveIE, QuicklineLiveIE,
SaltTVIE,
SAKTVIE, SAKTVIE,
VTXTVIE, VTXTVIE,
WalyTVIE, WalyTVIE,

View File

@ -1,17 +1,20 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import json
import uuid
from .adobepass import AdobePassIE from .adobepass import AdobePassIE
from .uplynk import UplynkPreplayIE from ..compat import (
from ..compat import compat_str compat_str,
compat_urllib_parse_unquote,
)
from ..utils import ( from ..utils import (
HEADRequest,
int_or_none, int_or_none,
parse_age_limit, parse_age_limit,
parse_duration, parse_duration,
try_get, try_get,
unified_timestamp, unified_timestamp,
update_url_query,
) )
@ -31,6 +34,7 @@ class FOXIE(AdobePassIE):
'upload_date': '20170901', 'upload_date': '20170901',
'creator': 'FOX', 'creator': 'FOX',
'series': 'Gotham', 'series': 'Gotham',
'age_limit': 14,
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -44,48 +48,54 @@ class FOXIE(AdobePassIE):
'url': 'https://www.fox.com/watch/30056b295fb57f7452aeeb4920bc3024/', 'url': 'https://www.fox.com/watch/30056b295fb57f7452aeeb4920bc3024/',
'only_matching': True, 'only_matching': True,
}] }]
_HOME_PAGE_URL = 'https://www.fox.com/'
_API_KEY = 'abdcbed02c124d393b39e818a4312055'
_access_token = None
def _call_api(self, path, video_id, data=None):
headers = {
'X-Api-Key': self._API_KEY,
}
if self._access_token:
headers['Authorization'] = 'Bearer ' + self._access_token
return self._download_json(
'https://api2.fox.com/v2.0/' + path,
video_id, data=data, headers=headers)
def _real_initialize(self):
if not self._access_token:
mvpd_auth = self._get_cookies(self._HOME_PAGE_URL).get('mvpd-auth')
if mvpd_auth:
self._access_token = (self._parse_json(compat_urllib_parse_unquote(
mvpd_auth.value), None, fatal=False) or {}).get('accessToken')
if not self._access_token:
self._access_token = self._call_api(
'login', None, json.dumps({
'deviceId': compat_str(uuid.uuid4()),
}).encode())['accessToken']
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
video = self._download_json( video = self._call_api('vodplayer/' + video_id, video_id)
'https://api.fox.com/fbc-content/v1_4/video/%s' % video_id,
video_id, headers={
'apikey': 'abdcbed02c124d393b39e818a4312055',
'Content-Type': 'application/json',
'Referer': url,
})
title = video['name'] title = video['name']
release_url = video['videoRelease']['url'] release_url = video['url']
m3u8_url = self._download_json(release_url, video_id)['playURL']
description = video.get('description') formats = self._extract_m3u8_formats(
duration = int_or_none(video.get('durationInSeconds')) or int_or_none( m3u8_url, video_id, 'mp4',
video.get('duration')) or parse_duration(video.get('duration')) entry_protocol='m3u8_native', m3u8_id='hls')
timestamp = unified_timestamp(video.get('datePublished')) self._sort_formats(formats)
rating = video.get('contentRating')
age_limit = parse_age_limit(rating)
data = try_get( data = try_get(
video, lambda x: x['trackingData']['properties'], dict) or {} video, lambda x: x['trackingData']['properties'], dict) or {}
duration = int_or_none(video.get('durationInSeconds')) or int_or_none(
video.get('duration')) or parse_duration(video.get('duration'))
timestamp = unified_timestamp(video.get('datePublished'))
creator = data.get('brand') or data.get('network') or video.get('network') creator = data.get('brand') or data.get('network') or video.get('network')
series = video.get('seriesName') or data.get( series = video.get('seriesName') or data.get(
'seriesName') or data.get('show') 'seriesName') or data.get('show')
season_number = int_or_none(video.get('seasonNumber'))
episode = video.get('name')
episode_number = int_or_none(video.get('episodeNumber'))
release_year = int_or_none(video.get('releaseYear'))
if data.get('authRequired'):
resource = self._get_mvpd_resource(
'fbc-fox', title, video.get('guid'), rating)
release_url = update_url_query(
release_url, {
'auth': self._extract_mvpd_auth(
url, video_id, 'fbc-fox', resource)
})
subtitles = {} subtitles = {}
for doc_rel in video.get('documentReleases', []): for doc_rel in video.get('documentReleases', []):
@ -98,36 +108,19 @@ class FOXIE(AdobePassIE):
}] }]
break break
info = { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': description, 'formats': formats,
'description': video.get('description'),
'duration': duration, 'duration': duration,
'timestamp': timestamp, 'timestamp': timestamp,
'age_limit': age_limit, 'age_limit': parse_age_limit(video.get('contentRating')),
'creator': creator, 'creator': creator,
'series': series, 'series': series,
'season_number': season_number, 'season_number': int_or_none(video.get('seasonNumber')),
'episode': episode, 'episode': video.get('name'),
'episode_number': episode_number, 'episode_number': int_or_none(video.get('episodeNumber')),
'release_year': release_year, 'release_year': int_or_none(video.get('releaseYear')),
'subtitles': subtitles, 'subtitles': subtitles,
} }
urlh = self._request_webpage(HEADRequest(release_url), video_id)
video_url = compat_str(urlh.geturl())
if UplynkPreplayIE.suitable(video_url):
info.update({
'_type': 'url_transparent',
'url': video_url,
'ie_key': UplynkPreplayIE.ie_key(),
})
else:
m3u8_url = self._download_json(release_url, video_id)['playURL']
formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
self._sort_formats(formats)
info['formats'] = formats
return info

View File

@ -1,43 +1,33 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import (
smuggle_url,
update_url_query,
)
class FoxSportsIE(InfoExtractor): class FoxSportsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?foxsports\.com/(?:[^/]+/)*(?P<id>[^/]+)' _VALID_URL = r'https?://(?:www\.)?foxsports\.com/(?:[^/]+/)*video/(?P<id>\d+)'
_TEST = { _TEST = {
'url': 'http://www.foxsports.com/tennessee/video/432609859715', 'url': 'http://www.foxsports.com/tennessee/video/432609859715',
'md5': 'b49050e955bebe32c301972e4012ac17', 'md5': 'b49050e955bebe32c301972e4012ac17',
'info_dict': { 'info_dict': {
'id': 'bwduI3X_TgUB', 'id': '432609859715',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Courtney Lee on going up 2-0 in series vs. Blazers', 'title': 'Courtney Lee on going up 2-0 in series vs. Blazers',
'description': 'Courtney Lee talks about Memphis being focused.', 'description': 'Courtney Lee talks about Memphis being focused.',
'upload_date': '20150423', # TODO: fix timestamp
'timestamp': 1429761109, 'upload_date': '19700101', # '20150423',
# 'timestamp': 1429761109,
'uploader': 'NEWA-FNG-FOXSPORTS', 'uploader': 'NEWA-FNG-FOXSPORTS',
}, },
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['ThePlatform'], 'add_ie': ['ThePlatform'],
} }
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) return self.url_result(
'https://feed.theplatform.com/f/BKQ29B/foxsports-all?byId=' + video_id, 'ThePlatformFeed')
config = self._parse_json(
self._html_search_regex(
r"""class="[^"]*(?:fs-player|platformPlayer-wrapper)[^"]*".+?data-player-config='([^']+)'""",
webpage, 'data player config'),
video_id)
return self.url_result(smuggle_url(update_url_query(
config['releaseURL'], {
'mbr': 'true',
'switch': 'http',
}), {'force_smil_url': True}))

View File

@ -1,6 +1,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from .youtube import YoutubeIE
class FreespeechIE(InfoExtractor): class FreespeechIE(InfoExtractor):
@ -27,8 +28,4 @@ class FreespeechIE(InfoExtractor):
r'data-video-url="([^"]+)"', r'data-video-url="([^"]+)"',
webpage, 'youtube url') webpage, 'youtube url')
return { return self.url_result(youtube_url, YoutubeIE.ie_key())
'_type': 'url',
'url': youtube_url,
'ie_key': 'Youtube',
}

View File

@ -1,6 +1,9 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import random
import string
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_HTTPError from ..compat import compat_HTTPError
from ..utils import ( from ..utils import (
@ -87,7 +90,7 @@ class FunimationIE(InfoExtractor):
video_id = title_data.get('id') or self._search_regex([ video_id = title_data.get('id') or self._search_regex([
r"KANE_customdimensions.videoID\s*=\s*'(\d+)';", r"KANE_customdimensions.videoID\s*=\s*'(\d+)';",
r'<iframe[^>]+src="/player/(\d+)"', r'<iframe[^>]+src="/player/(\d+)',
], webpage, 'video_id', default=None) ], webpage, 'video_id', default=None)
if not video_id: if not video_id:
player_url = self._html_search_meta([ player_url = self._html_search_meta([
@ -108,8 +111,10 @@ class FunimationIE(InfoExtractor):
if self._TOKEN: if self._TOKEN:
headers['Authorization'] = 'Token %s' % self._TOKEN headers['Authorization'] = 'Token %s' % self._TOKEN
sources = self._download_json( sources = self._download_json(
'https://prod-api-funimationnow.dadcdigital.com/api/source/catalog/video/%s/signed/' % video_id, 'https://www.funimation.com/api/showexperience/%s/' % video_id,
video_id, headers=headers)['items'] video_id, headers=headers, query={
'pinst_id': ''.join([random.choice(string.digits + string.ascii_letters) for _ in range(8)]),
})['items']
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403: if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
error = self._parse_json(e.cause.read(), video_id)['errors'][0] error = self._parse_json(e.cause.read(), video_id)['errors'][0]

View File

@ -0,0 +1,98 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
str_or_none,
strip_or_none,
try_get,
)
class GaiaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gaia\.com/video/(?P<id>[^/?]+).*?\bfullplayer=(?P<type>feature|preview)'
_TESTS = [{
'url': 'https://www.gaia.com/video/connecting-universal-consciousness?fullplayer=feature',
'info_dict': {
'id': '89356',
'ext': 'mp4',
'title': 'Connecting with Universal Consciousness',
'description': 'md5:844e209ad31b7d31345f5ed689e3df6f',
'upload_date': '20151116',
'timestamp': 1447707266,
'duration': 936,
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'https://www.gaia.com/video/connecting-universal-consciousness?fullplayer=preview',
'info_dict': {
'id': '89351',
'ext': 'mp4',
'title': 'Connecting with Universal Consciousness',
'description': 'md5:844e209ad31b7d31345f5ed689e3df6f',
'upload_date': '20151116',
'timestamp': 1447707266,
'duration': 53,
},
'params': {
# m3u8 download
'skip_download': True,
},
}]
def _real_extract(self, url):
display_id, vtype = re.search(self._VALID_URL, url).groups()
node_id = self._download_json(
'https://brooklyn.gaia.com/pathinfo', display_id, query={
'path': 'video/' + display_id,
})['id']
node = self._download_json(
'https://brooklyn.gaia.com/node/%d' % node_id, node_id)
vdata = node[vtype]
media_id = compat_str(vdata['nid'])
title = node['title']
media = self._download_json(
'https://brooklyn.gaia.com/media/' + media_id, media_id)
formats = self._extract_m3u8_formats(
media['mediaUrls']['bcHLS'], media_id, 'mp4')
self._sort_formats(formats)
subtitles = {}
text_tracks = media.get('textTracks', {})
for key in ('captions', 'subtitles'):
for lang, sub_url in text_tracks.get(key, {}).items():
subtitles.setdefault(lang, []).append({
'url': sub_url,
})
fivestar = node.get('fivestar', {})
fields = node.get('fields', {})
def get_field_value(key, value_key='value'):
return try_get(fields, lambda x: x[key][0][value_key])
return {
'id': media_id,
'display_id': display_id,
'title': title,
'formats': formats,
'description': strip_or_none(get_field_value('body') or get_field_value('teaser')),
'timestamp': int_or_none(node.get('created')),
'subtitles': subtitles,
'duration': int_or_none(vdata.get('duration')),
'like_count': int_or_none(try_get(fivestar, lambda x: x['up_count']['value'])),
'dislike_count': int_or_none(try_get(fivestar, lambda x: x['down_count']['value'])),
'comment_count': int_or_none(node.get('comment_count')),
'series': try_get(node, lambda x: x['series']['title'], compat_str),
'season_number': int_or_none(get_field_value('season')),
'season_id': str_or_none(get_field_value('series_nid', 'nid')),
'episode_number': int_or_none(get_field_value('episode')),
}

View File

@ -14,7 +14,7 @@ from ..utils import (
class GameSpotIE(OnceIE): class GameSpotIE(OnceIE):
_VALID_URL = r'https?://(?:www\.)?gamespot\.com/(?:video|article)s/(?:[^/]+/\d+-|embed/)(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?gamespot\.com/(?:video|article|review)s/(?:[^/]+/\d+-|embed/)(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.gamespot.com/videos/arma-3-community-guide-sitrep-i/2300-6410818/', 'url': 'http://www.gamespot.com/videos/arma-3-community-guide-sitrep-i/2300-6410818/',
'md5': 'b2a30deaa8654fcccd43713a6b6a4825', 'md5': 'b2a30deaa8654fcccd43713a6b6a4825',
@ -41,6 +41,9 @@ class GameSpotIE(OnceIE):
}, { }, {
'url': 'https://www.gamespot.com/articles/the-last-of-us-2-receives-new-ps4-trailer/1100-6454469/', 'url': 'https://www.gamespot.com/articles/the-last-of-us-2-receives-new-ps4-trailer/1100-6454469/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.gamespot.com/reviews/gears-of-war-review/1900-6161188/',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -109,6 +109,7 @@ from .vice import ViceIE
from .xfileshare import XFileShareIE from .xfileshare import XFileShareIE
from .cloudflarestream import CloudflareStreamIE from .cloudflarestream import CloudflareStreamIE
from .peertube import PeerTubeIE from .peertube import PeerTubeIE
from .teachable import TeachableIE
from .indavideo import IndavideoEmbedIE from .indavideo import IndavideoEmbedIE
from .apa import APAIE from .apa import APAIE
from .foxnews import FoxNewsIE from .foxnews import FoxNewsIE
@ -2196,10 +2197,7 @@ class GenericIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
if url.startswith('//'): if url.startswith('//'):
return { return self.url_result(self.http_scheme() + url)
'_type': 'url',
'url': self.http_scheme() + url,
}
parsed_url = compat_urlparse.urlparse(url) parsed_url = compat_urlparse.urlparse(url)
if not parsed_url.scheme: if not parsed_url.scheme:
@ -3112,6 +3110,10 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches( return self.playlist_from_matches(
peertube_urls, video_id, video_title, ie=PeerTubeIE.ie_key()) peertube_urls, video_id, video_title, ie=PeerTubeIE.ie_key())
teachable_url = TeachableIE._extract_url(webpage, url)
if teachable_url:
return self.url_result(teachable_url)
indavideo_urls = IndavideoEmbedIE._extract_urls(webpage) indavideo_urls = IndavideoEmbedIE._extract_urls(webpage)
if indavideo_urls: if indavideo_urls:
return self.playlist_from_matches( return self.playlist_from_matches(

View File

@ -53,7 +53,7 @@ class GfycatIE(InfoExtractor):
video_id = self._match_id(url) video_id = self._match_id(url)
gfy = self._download_json( gfy = self._download_json(
'http://gfycat.com/cajax/get/%s' % video_id, 'https://api.gfycat.com/v1/gfycats/%s' % video_id,
video_id, 'Downloading video info') video_id, 'Downloading video info')
if 'error' in gfy: if 'error' in gfy:
raise ExtractorError('Gfycat said: ' + gfy['error'], expected=True) raise ExtractorError('Gfycat said: ' + gfy['error'], expected=True)

View File

@ -72,7 +72,7 @@ class GloboIE(InfoExtractor):
return return
try: try:
self._download_json( glb_id = (self._download_json(
'https://login.globo.com/api/authentication', None, data=json.dumps({ 'https://login.globo.com/api/authentication', None, data=json.dumps({
'payload': { 'payload': {
'email': email, 'email': email,
@ -81,7 +81,9 @@ class GloboIE(InfoExtractor):
}, },
}).encode(), headers={ }).encode(), headers={
'Content-Type': 'application/json; charset=utf-8', 'Content-Type': 'application/json; charset=utf-8',
}) }) or {}).get('glbId')
if glb_id:
self._set_cookie('.globo.com', 'GLBID', glb_id)
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401: if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
resp = self._parse_json(e.cause.read(), None) resp = self._parse_json(e.cause.read(), None)

View File

@ -25,15 +25,15 @@ class GoIE(AdobePassIE):
}, },
'watchdisneychannel': { 'watchdisneychannel': {
'brand': '004', 'brand': '004',
'requestor_id': 'Disney', 'resource_id': 'Disney',
}, },
'watchdisneyjunior': { 'watchdisneyjunior': {
'brand': '008', 'brand': '008',
'requestor_id': 'DisneyJunior', 'resource_id': 'DisneyJunior',
}, },
'watchdisneyxd': { 'watchdisneyxd': {
'brand': '009', 'brand': '009',
'requestor_id': 'DisneyXD', 'resource_id': 'DisneyXD',
} }
} }
_VALID_URL = r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))'\ _VALID_URL = r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))'\
@ -130,8 +130,8 @@ class GoIE(AdobePassIE):
'device': '001', 'device': '001',
} }
if video_data.get('accesslevel') == '1': if video_data.get('accesslevel') == '1':
requestor_id = site_info['requestor_id'] requestor_id = site_info.get('requestor_id', 'DisneyChannels')
resource = self._get_mvpd_resource( resource = site_info.get('resource_id') or self._get_mvpd_resource(
requestor_id, title, video_id, None) requestor_id, title, video_id, None)
auth = self._extract_mvpd_auth( auth = self._extract_mvpd_auth(
url, video_id, requestor_id, resource) url, video_id, requestor_id, resource)

View File

@ -0,0 +1,191 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
clean_html,
ExtractorError,
int_or_none,
merge_dicts,
parse_count,
str_or_none,
try_get,
unified_strdate,
urlencode_postdata,
urljoin,
)
class HKETVIE(InfoExtractor):
IE_NAME = 'hketv'
IE_DESC = '香港教育局教育電視 (HKETV) Educational Television, Hong Kong Educational Bureau'
_GEO_BYPASS = False
_GEO_COUNTRIES = ['HK']
_VALID_URL = r'https?://(?:www\.)?hkedcity\.net/etv/resource/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'https://www.hkedcity.net/etv/resource/2932360618',
'md5': 'f193712f5f7abb208ddef3c5ea6ed0b7',
'info_dict': {
'id': '2932360618',
'ext': 'mp4',
'title': '喜閱一生(共享閱讀樂) (中、英文字幕可供選擇)',
'description': 'md5:d5286d05219ef50e0613311cbe96e560',
'upload_date': '20181024',
'duration': 900,
'subtitles': 'count:2',
},
'skip': 'Geo restricted to HK',
}, {
'url': 'https://www.hkedcity.net/etv/resource/972641418',
'md5': '1ed494c1c6cf7866a8290edad9b07dc9',
'info_dict': {
'id': '972641418',
'ext': 'mp4',
'title': '衣冠楚楚 (天使系列之一)',
'description': 'md5:10bb3d659421e74f58e5db5691627b0f',
'upload_date': '20070109',
'duration': 907,
'subtitles': {},
},
'params': {
'geo_verification_proxy': '<HK proxy here>',
},
'skip': 'Geo restricted to HK',
}]
_CC_LANGS = {
'中文(繁體中文)': 'zh-Hant',
'中文(简体中文)': 'zh-Hans',
'English': 'en',
'Bahasa Indonesia': 'id',
'\u0939\u093f\u0928\u094d\u0926\u0940': 'hi',
'\u0928\u0947\u092a\u093e\u0932\u0940': 'ne',
'Tagalog': 'tl',
'\u0e44\u0e17\u0e22': 'th',
'\u0627\u0631\u062f\u0648': 'ur',
}
_FORMAT_HEIGHTS = {
'SD': 360,
'HD': 720,
}
_APPS_BASE_URL = 'https://apps.hkedcity.net'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = (
self._html_search_meta(
('ed_title', 'search.ed_title'), webpage, default=None) or
self._search_regex(
r'data-favorite_title_(?:eng|chi)=(["\'])(?P<id>(?:(?!\1).)+)\1',
webpage, 'title', default=None, group='url') or
self._html_search_regex(
r'<h1>([^<]+)</h1>', webpage, 'title', default=None) or
self._og_search_title(webpage)
)
file_id = self._search_regex(
r'post_var\[["\']file_id["\']\s*\]\s*=\s*(.+?);',
webpage, 'file ID')
curr_url = self._search_regex(
r'post_var\[["\']curr_url["\']\s*\]\s*=\s*"(.+?)";',
webpage, 'curr URL')
data = {
'action': 'get_info',
'curr_url': curr_url,
'file_id': file_id,
'video_url': file_id,
}
response = self._download_json(
self._APPS_BASE_URL + '/media/play/handler.php', video_id,
data=urlencode_postdata(data),
headers=merge_dicts({
'Content-Type': 'application/x-www-form-urlencoded'},
self.geo_verification_headers()))
result = response['result']
if not response.get('success') or not response.get('access'):
error = clean_html(response.get('access_err_msg'))
if 'Video streaming is not available in your country' in error:
self.raise_geo_restricted(
msg=error, countries=self._GEO_COUNTRIES)
else:
raise ExtractorError(error, expected=True)
formats = []
width = int_or_none(result.get('width'))
height = int_or_none(result.get('height'))
playlist0 = result['playlist'][0]
for fmt in playlist0['sources']:
file_url = urljoin(self._APPS_BASE_URL, fmt.get('file'))
if not file_url:
continue
# If we ever wanted to provide the final resolved URL that
# does not require cookies, albeit with a shorter lifespan:
# urlh = self._downloader.urlopen(file_url)
# resolved_url = urlh.geturl()
label = fmt.get('label')
h = self._FORMAT_HEIGHTS.get(label)
w = h * width // height if h and width and height else None
formats.append({
'format_id': label,
'ext': fmt.get('type'),
'url': file_url,
'width': w,
'height': h,
})
self._sort_formats(formats)
subtitles = {}
tracks = try_get(playlist0, lambda x: x['tracks'], list) or []
for track in tracks:
if not isinstance(track, dict):
continue
track_kind = str_or_none(track.get('kind'))
if not track_kind or not isinstance(track_kind, compat_str):
continue
if track_kind.lower() not in ('captions', 'subtitles'):
continue
track_url = urljoin(self._APPS_BASE_URL, track.get('file'))
if not track_url:
continue
track_label = track.get('label')
subtitles.setdefault(self._CC_LANGS.get(
track_label, track_label), []).append({
'url': self._proto_relative_url(track_url),
'ext': 'srt',
})
# Likes
emotion = self._download_json(
'https://emocounter.hkedcity.net/handler.php', video_id,
data=urlencode_postdata({
'action': 'get_emotion',
'data[bucket_id]': 'etv',
'data[identifier]': video_id,
}),
headers={'Content-Type': 'application/x-www-form-urlencoded'},
fatal=False) or {}
like_count = int_or_none(try_get(
emotion, lambda x: x['data']['emotion_data'][0]['count']))
return {
'id': video_id,
'title': title,
'description': self._html_search_meta(
'description', webpage, fatal=False),
'upload_date': unified_strdate(self._html_search_meta(
'ed_date', webpage, fatal=False), day_first=False),
'duration': int_or_none(result.get('length')),
'formats': formats,
'subtitles': subtitles,
'thumbnail': urljoin(self._APPS_BASE_URL, result.get('image')),
'view_count': parse_count(result.get('view_count')),
'like_count': like_count,
}

View File

@ -43,6 +43,7 @@ class HotStarIE(HotStarBaseIE):
IE_NAME = 'hotstar' IE_NAME = 'hotstar'
_VALID_URL = r'https?://(?:www\.)?hotstar\.com/(?:.+?[/-])?(?P<id>\d{10})' _VALID_URL = r'https?://(?:www\.)?hotstar\.com/(?:.+?[/-])?(?P<id>\d{10})'
_TESTS = [{ _TESTS = [{
# contentData
'url': 'https://www.hotstar.com/can-you-not-spread-rumours/1000076273', 'url': 'https://www.hotstar.com/can-you-not-spread-rumours/1000076273',
'info_dict': { 'info_dict': {
'id': '1000076273', 'id': '1000076273',
@ -57,6 +58,10 @@ class HotStarIE(HotStarBaseIE):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
} }
}, {
# contentDetail
'url': 'https://www.hotstar.com/movies/radha-gopalam/1000057157',
'only_matching': True,
}, { }, {
'url': 'http://www.hotstar.com/sports/cricket/rajitha-sizzles-on-debut-with-329/2001477583', 'url': 'http://www.hotstar.com/sports/cricket/rajitha-sizzles-on-debut-with-329/2001477583',
'only_matching': True, 'only_matching': True,
@ -74,10 +79,15 @@ class HotStarIE(HotStarBaseIE):
r'<script>window\.APP_STATE\s*=\s*({.+?})</script>', r'<script>window\.APP_STATE\s*=\s*({.+?})</script>',
webpage, 'app state'), video_id) webpage, 'app state'), video_id)
video_data = {} video_data = {}
getters = list(
lambda x, k=k: x['initialState']['content%s' % k]['content']
for k in ('Data', 'Detail')
)
for v in app_state.values(): for v in app_state.values():
content = try_get(v, lambda x: x['initialState']['contentData']['content'], dict) content = try_get(v, getters, dict)
if content and content.get('contentId') == video_id: if content and content.get('contentId') == video_id:
video_data = content video_data = content
break
title = video_data['title'] title = video_data['title']

View File

@ -0,0 +1,117 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
urlencode_postdata,
)
class HungamaIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:www\.)?hungama\.com/
(?:
(?:video|movie)/[^/]+/|
tv-show/(?:[^/]+/){2}\d+/episode/[^/]+/
)
(?P<id>\d+)
'''
_TESTS = [{
'url': 'http://www.hungama.com/video/krishna-chants/39349649/',
'md5': 'a845a6d1ebd08d80c1035126d49bd6a0',
'info_dict': {
'id': '2931166',
'ext': 'mp4',
'title': 'Lucky Ali - Kitni Haseen Zindagi',
'track': 'Kitni Haseen Zindagi',
'artist': 'Lucky Ali',
'album': 'Aks',
'release_year': 2000,
}
}, {
'url': 'https://www.hungama.com/movie/kahaani-2/44129919/',
'only_matching': True,
}, {
'url': 'https://www.hungama.com/tv-show/padded-ki-pushup/season-1/44139461/episode/ep-02-training-sasu-pathlaag-karing/44139503/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
info = self._search_json_ld(webpage, video_id)
m3u8_url = self._download_json(
'https://www.hungama.com/index.php', video_id,
data=urlencode_postdata({'content_id': video_id}), headers={
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'X-Requested-With': 'XMLHttpRequest',
}, query={
'c': 'common',
'm': 'get_video_mdn_url',
})['stream_url']
formats = self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
self._sort_formats(formats)
info.update({
'id': video_id,
'formats': formats,
})
return info
class HungamaSongIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hungama\.com/song/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'https://www.hungama.com/song/kitni-haseen-zindagi/2931166/',
'md5': 'a845a6d1ebd08d80c1035126d49bd6a0',
'info_dict': {
'id': '2931166',
'ext': 'mp4',
'title': 'Lucky Ali - Kitni Haseen Zindagi',
'track': 'Kitni Haseen Zindagi',
'artist': 'Lucky Ali',
'album': 'Aks',
'release_year': 2000,
}
}
def _real_extract(self, url):
audio_id = self._match_id(url)
data = self._download_json(
'https://www.hungama.com/audio-player-data/track/%s' % audio_id,
audio_id, query={'_country': 'IN'})[0]
track = data['song_name']
artist = data.get('singer_name')
m3u8_url = self._download_json(
data.get('file') or data['preview_link'],
audio_id)['response']['media_url']
formats = self._extract_m3u8_formats(
m3u8_url, audio_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
self._sort_formats(formats)
title = '%s - %s' % (artist, track) if artist else track
thumbnail = data.get('img_src') or data.get('album_image')
return {
'id': audio_id,
'title': title,
'thumbnail': thumbnail,
'track': track,
'artist': artist,
'album': data.get('album_name'),
'release_year': int_or_none(data.get('date')),
'formats': formats,
}

View File

@ -12,7 +12,7 @@ from ..utils import (
class ImgurIE(InfoExtractor): class ImgurIE(InfoExtractor):
_VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:(?:gallery|(?:topic|r)/[^/]+)/)?(?P<id>[a-zA-Z0-9]{6,})(?:[/?#&]+|\.[a-z0-9]+)?$' _VALID_URL = r'https?://(?:i\.)?imgur\.com/(?!(?:a|gallery|(?:t(?:opic)?|r)/[^/]+)/)(?P<id>[a-zA-Z0-9]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://i.imgur.com/A61SaA1.gifv', 'url': 'https://i.imgur.com/A61SaA1.gifv',
@ -20,28 +20,9 @@ class ImgurIE(InfoExtractor):
'id': 'A61SaA1', 'id': 'A61SaA1',
'ext': 'mp4', 'ext': 'mp4',
'title': 're:Imgur GIF$|MRW gifv is up and running without any bugs$', 'title': 're:Imgur GIF$|MRW gifv is up and running without any bugs$',
'description': 'Imgur: The magic of the Internet',
}, },
}, { }, {
'url': 'https://imgur.com/A61SaA1', 'url': 'https://imgur.com/A61SaA1',
'info_dict': {
'id': 'A61SaA1',
'ext': 'mp4',
'title': 're:Imgur GIF$|MRW gifv is up and running without any bugs$',
'description': 'Imgur: The magic of the Internet',
},
}, {
'url': 'https://imgur.com/gallery/YcAQlkx',
'info_dict': {
'id': 'YcAQlkx',
'ext': 'mp4',
'title': 'Classic Steve Carell gif...cracks me up everytime....damn the repost downvotes....',
}
}, {
'url': 'http://imgur.com/topic/Funny/N8rOudd',
'only_matching': True,
}, {
'url': 'http://imgur.com/r/aww/VQcQPhM',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'https://i.imgur.com/crGpqCV.mp4', 'url': 'https://i.imgur.com/crGpqCV.mp4',
@ -50,8 +31,8 @@ class ImgurIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
gifv_url = 'https://i.imgur.com/{id}.gifv'.format(id=video_id) webpage = self._download_webpage(
webpage = self._download_webpage(gifv_url, video_id) 'https://i.imgur.com/{id}.gifv'.format(id=video_id), video_id)
width = int_or_none(self._og_search_property( width = int_or_none(self._og_search_property(
'video:width', webpage, default=None)) 'video:width', webpage, default=None))
@ -72,7 +53,6 @@ class ImgurIE(InfoExtractor):
'format_id': m.group('type').partition('/')[2], 'format_id': m.group('type').partition('/')[2],
'url': self._proto_relative_url(m.group('src')), 'url': self._proto_relative_url(m.group('src')),
'ext': mimetype2ext(m.group('type')), 'ext': mimetype2ext(m.group('type')),
'acodec': 'none',
'width': width, 'width': width,
'height': height, 'height': height,
'http_headers': { 'http_headers': {
@ -107,44 +87,64 @@ class ImgurIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'formats': formats, 'formats': formats,
'description': self._og_search_description(webpage, default=None),
'title': self._og_search_title(webpage), 'title': self._og_search_title(webpage),
} }
class ImgurAlbumIE(InfoExtractor): class ImgurGalleryIE(InfoExtractor):
_VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:(?:a|gallery|topic/[^/]+)/)?(?P<id>[a-zA-Z0-9]{5})(?:[/?#&]+)?$' IE_NAME = 'imgur:gallery'
_VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:gallery|(?:t(?:opic)?|r)/[^/]+)/(?P<id>[a-zA-Z0-9]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://imgur.com/gallery/Q95ko', 'url': 'http://imgur.com/gallery/Q95ko',
'info_dict': { 'info_dict': {
'id': 'Q95ko', 'id': 'Q95ko',
'title': 'Adding faces make every GIF better',
}, },
'playlist_count': 25, 'playlist_count': 25,
}, { }, {
'url': 'http://imgur.com/a/j6Orj', 'url': 'http://imgur.com/topic/Aww/ll5Vk',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'http://imgur.com/topic/Aww/ll5Vk', 'url': 'https://imgur.com/gallery/YcAQlkx',
'info_dict': {
'id': 'YcAQlkx',
'ext': 'mp4',
'title': 'Classic Steve Carell gif...cracks me up everytime....damn the repost downvotes....',
}
}, {
'url': 'http://imgur.com/topic/Funny/N8rOudd',
'only_matching': True,
}, {
'url': 'http://imgur.com/r/aww/VQcQPhM',
'only_matching': True, 'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
album_id = self._match_id(url) gallery_id = self._match_id(url)
album_images = self._download_json( data = self._download_json(
'http://imgur.com/gallery/%s/album_images/hit.json?all=true' % album_id, 'https://imgur.com/gallery/%s.json' % gallery_id,
album_id, fatal=False) gallery_id)['data']['image']
if album_images: if data.get('is_album'):
data = album_images.get('data') entries = [
if data and isinstance(data, dict): self.url_result('http://imgur.com/%s' % image['hash'], ImgurIE.ie_key(), image['hash'])
images = data.get('images') for image in data['album_images']['images'] if image.get('hash')]
if images and isinstance(images, list): return self.playlist_result(entries, gallery_id, data.get('title'), data.get('description'))
entries = [
self.url_result('http://imgur.com/%s' % image['hash'])
for image in images if image.get('hash')]
return self.playlist_result(entries, album_id)
# Fallback to single video return self.url_result('http://imgur.com/%s' % gallery_id, ImgurIE.ie_key(), gallery_id)
return self.url_result('http://imgur.com/%s' % album_id, ImgurIE.ie_key())
class ImgurAlbumIE(ImgurGalleryIE):
IE_NAME = 'imgur:album'
_VALID_URL = r'https?://(?:i\.)?imgur\.com/a/(?P<id>[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'http://imgur.com/a/j6Orj',
'info_dict': {
'id': 'j6Orj',
'title': 'A Literary Analysis of "Star Wars: The Force Awakens"',
},
'playlist_count': 12,
}]

View File

@ -227,44 +227,37 @@ class InstagramIE(InfoExtractor):
} }
class InstagramUserIE(InfoExtractor): class InstagramPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?instagram\.com/(?P<id>[^/]{2,})/?(?:$|[?#])' # A superclass for handling any kind of query based on GraphQL which
IE_DESC = 'Instagram user profile' # results in a playlist.
IE_NAME = 'instagram:user'
_TEST = {
'url': 'https://instagram.com/porsche',
'info_dict': {
'id': 'porsche',
'title': 'porsche',
},
'playlist_count': 5,
'params': {
'extract_flat': True,
'skip_download': True,
'playlistend': 5,
}
}
_gis_tmpl = None _gis_tmpl = None # used to cache GIS request type
def _entries(self, data): def _parse_graphql(self, webpage, item_id):
# Reads a webpage and returns its GraphQL data.
return self._parse_json(
self._search_regex(
r'sharedData\s*=\s*({.+?})\s*;\s*[<\n]', webpage, 'data'),
item_id)
def _extract_graphql(self, data, url):
# Parses GraphQL queries containing videos and generates a playlist.
def get_count(suffix): def get_count(suffix):
return int_or_none(try_get( return int_or_none(try_get(
node, lambda x: x['edge_media_' + suffix]['count'])) node, lambda x: x['edge_media_' + suffix]['count']))
uploader_id = data['entry_data']['ProfilePage'][0]['graphql']['user']['id'] uploader_id = self._match_id(url)
csrf_token = data['config']['csrf_token'] csrf_token = data['config']['csrf_token']
rhx_gis = data.get('rhx_gis') or '3c7ca9dcefcf966d11dacf1f151335e8' rhx_gis = data.get('rhx_gis') or '3c7ca9dcefcf966d11dacf1f151335e8'
self._set_cookie('instagram.com', 'ig_pr', '1')
cursor = '' cursor = ''
for page_num in itertools.count(1): for page_num in itertools.count(1):
variables = json.dumps({ variables = {
'id': uploader_id,
'first': 12, 'first': 12,
'after': cursor, 'after': cursor,
}) }
variables.update(self._query_vars_for(data))
variables = json.dumps(variables)
if self._gis_tmpl: if self._gis_tmpl:
gis_tmpls = [self._gis_tmpl] gis_tmpls = [self._gis_tmpl]
@ -276,21 +269,26 @@ class InstagramUserIE(InfoExtractor):
'%s:%s:%s' % (rhx_gis, csrf_token, std_headers['User-Agent']), '%s:%s:%s' % (rhx_gis, csrf_token, std_headers['User-Agent']),
] ]
# try all of the ways to generate a GIS query, and not only use the
# first one that works, but cache it for future requests
for gis_tmpl in gis_tmpls: for gis_tmpl in gis_tmpls:
try: try:
media = self._download_json( json_data = self._download_json(
'https://www.instagram.com/graphql/query/', uploader_id, 'https://www.instagram.com/graphql/query/', uploader_id,
'Downloading JSON page %d' % page_num, headers={ 'Downloading JSON page %d' % page_num, headers={
'X-Requested-With': 'XMLHttpRequest', 'X-Requested-With': 'XMLHttpRequest',
'X-Instagram-GIS': hashlib.md5( 'X-Instagram-GIS': hashlib.md5(
('%s:%s' % (gis_tmpl, variables)).encode('utf-8')).hexdigest(), ('%s:%s' % (gis_tmpl, variables)).encode('utf-8')).hexdigest(),
}, query={ }, query={
'query_hash': '42323d64886122307be10013ad2dcc44', 'query_hash': self._QUERY_HASH,
'variables': variables, 'variables': variables,
})['data']['user']['edge_owner_to_timeline_media'] })
media = self._parse_timeline_from(json_data)
self._gis_tmpl = gis_tmpl self._gis_tmpl = gis_tmpl
break break
except ExtractorError as e: except ExtractorError as e:
# if it's an error caused by a bad query, and there are
# more GIS templates to try, ignore it and keep trying
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403: if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
if gis_tmpl != gis_tmpls[-1]: if gis_tmpl != gis_tmpls[-1]:
continue continue
@ -348,14 +346,80 @@ class InstagramUserIE(InfoExtractor):
break break
def _real_extract(self, url): def _real_extract(self, url):
username = self._match_id(url) user_or_tag = self._match_id(url)
webpage = self._download_webpage(url, user_or_tag)
data = self._parse_graphql(webpage, user_or_tag)
webpage = self._download_webpage(url, username) self._set_cookie('instagram.com', 'ig_pr', '1')
data = self._parse_json(
self._search_regex(
r'sharedData\s*=\s*({.+?})\s*;\s*[<\n]', webpage, 'data'),
username)
return self.playlist_result( return self.playlist_result(
self._entries(data), username, username) self._extract_graphql(data, url), user_or_tag, user_or_tag)
class InstagramUserIE(InstagramPlaylistIE):
_VALID_URL = r'https?://(?:www\.)?instagram\.com/(?P<id>[^/]{2,})/?(?:$|[?#])'
IE_DESC = 'Instagram user profile'
IE_NAME = 'instagram:user'
_TEST = {
'url': 'https://instagram.com/porsche',
'info_dict': {
'id': 'porsche',
'title': 'porsche',
},
'playlist_count': 5,
'params': {
'extract_flat': True,
'skip_download': True,
'playlistend': 5,
}
}
_QUERY_HASH = '42323d64886122307be10013ad2dcc44',
@staticmethod
def _parse_timeline_from(data):
# extracts the media timeline data from a GraphQL result
return data['data']['user']['edge_owner_to_timeline_media']
@staticmethod
def _query_vars_for(data):
# returns a dictionary of variables to add to the timeline query based
# on the GraphQL of the original page
return {
'id': data['entry_data']['ProfilePage'][0]['graphql']['user']['id']
}
class InstagramTagIE(InstagramPlaylistIE):
_VALID_URL = r'https?://(?:www\.)?instagram\.com/explore/tags/(?P<id>[^/]+)'
IE_DESC = 'Instagram hashtag search'
IE_NAME = 'instagram:tag'
_TEST = {
'url': 'https://instagram.com/explore/tags/lolcats',
'info_dict': {
'id': 'lolcats',
'title': 'lolcats',
},
'playlist_count': 50,
'params': {
'extract_flat': True,
'skip_download': True,
'playlistend': 50,
}
}
_QUERY_HASH = 'f92f56d47dc7a55b606908374b43a314',
@staticmethod
def _parse_timeline_from(data):
# extracts the media timeline data from a GraphQL result
return data['data']['hashtag']['edge_hashtag_to_media']
@staticmethod
def _query_vars_for(data):
# returns a dictionary of variables to add to the timeline query based
# on the GraphQL of the original page
return {
'tag_name':
data['entry_data']['TagPage'][0]['graphql']['hashtag']['name']
}

View File

@ -12,7 +12,7 @@ from ..utils import (
class IPrimaIE(InfoExtractor): class IPrimaIE(InfoExtractor):
_VALID_URL = r'https?://(?:play|prima)\.iprima\.cz/(?:.+/)?(?P<id>[^?#]+)' _VALID_URL = r'https?://(?:[^/]+)\.iprima\.cz/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_GEO_BYPASS = False _GEO_BYPASS = False
_TESTS = [{ _TESTS = [{
@ -41,6 +41,24 @@ class IPrimaIE(InfoExtractor):
# iframe prima.iprima.cz # iframe prima.iprima.cz
'url': 'https://prima.iprima.cz/porady/jak-se-stavi-sen/rodina-rathousova-praha', 'url': 'https://prima.iprima.cz/porady/jak-se-stavi-sen/rodina-rathousova-praha',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.iprima.cz/filmy/desne-rande',
'only_matching': True,
}, {
'url': 'https://zoom.iprima.cz/10-nejvetsich-tajemstvi-zahad/posvatna-mista-a-stavby',
'only_matching': True,
}, {
'url': 'https://krimi.iprima.cz/mraz-0/sebevrazdy',
'only_matching': True,
}, {
'url': 'https://cool.iprima.cz/derava-silnice-nevadi',
'only_matching': True,
}, {
'url': 'https://love.iprima.cz/laska-az-za-hrob/slib-dany-bratrovi',
'only_matching': True,
}, {
'url': 'https://autosalon.iprima.cz/motorsport/7-epizoda-1',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -61,7 +61,7 @@ class JojIE(InfoExtractor):
bitrates = self._parse_json( bitrates = self._parse_json(
self._search_regex( self._search_regex(
r'(?s)bitrates\s*=\s*({.+?});', webpage, 'bitrates', r'(?s)(?:src|bitrates)\s*=\s*({.+?});', webpage, 'bitrates',
default='{}'), default='{}'),
video_id, transform_source=js_to_json, fatal=False) video_id, transform_source=js_to_json, fatal=False)

View File

@ -7,8 +7,8 @@ from .common import InfoExtractor
class JWPlatformIE(InfoExtractor): class JWPlatformIE(InfoExtractor):
_VALID_URL = r'(?:https?://content\.jwplatform\.com/(?:feeds|players|jw6)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})' _VALID_URL = r'(?:https?://(?:content\.jwplatform|cdn\.jwplayer)\.com/(?:(?:feed|player|thumb|preview|video|manifest)s|jw6|v2/media)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'
_TEST = { _TESTS = [{
'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js', 'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js',
'md5': 'fa8899fa601eb7c83a64e9d568bdf325', 'md5': 'fa8899fa601eb7c83a64e9d568bdf325',
'info_dict': { 'info_dict': {
@ -19,7 +19,10 @@ class JWPlatformIE(InfoExtractor):
'upload_date': '20081127', 'upload_date': '20081127',
'timestamp': 1227796140, 'timestamp': 1227796140,
} }
} }, {
'url': 'https://cdn.jwplayer.com/players/nPripu9l-ALJ3XQCI.js',
'only_matching': True,
}]
@staticmethod @staticmethod
def _extract_url(webpage): def _extract_url(webpage):
@ -34,5 +37,5 @@ class JWPlatformIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
json_data = self._download_json('http://content.jwplatform.com/feeds/%s.json' % video_id, video_id) json_data = self._download_json('https://cdn.jwplayer.com/v2/media/' + video_id, video_id)
return self._parse_jwplayer_data(json_data, video_id) return self._parse_jwplayer_data(json_data, video_id)

View File

@ -192,6 +192,8 @@ class KalturaIE(InfoExtractor):
'entryId': video_id, 'entryId': video_id,
'service': 'baseentry', 'service': 'baseentry',
'ks': '{1:result:ks}', 'ks': '{1:result:ks}',
'responseProfile:fields': 'createdAt,dataUrl,duration,name,plays,thumbnailUrl,userId',
'responseProfile:type': 1,
}, },
{ {
'action': 'getbyentryid', 'action': 'getbyentryid',

View File

@ -0,0 +1,229 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
determine_ext,
extract_attributes,
ExtractorError,
float_or_none,
int_or_none,
str_or_none,
url_or_none,
urlencode_postdata,
urljoin,
)
class LecturioBaseIE(InfoExtractor):
_LOGIN_URL = 'https://app.lecturio.com/en/login'
_NETRC_MACHINE = 'lecturio'
def _real_initialize(self):
self._login()
def _login(self):
username, password = self._get_login_info()
if username is None:
return
# Sets some cookies
_, urlh = self._download_webpage_handle(
self._LOGIN_URL, None, 'Downloading login popup')
def is_logged(url_handle):
return self._LOGIN_URL not in compat_str(url_handle.geturl())
# Already logged in
if is_logged(urlh):
return
login_form = {
'signin[email]': username,
'signin[password]': password,
'signin[remember]': 'on',
}
response, urlh = self._download_webpage_handle(
self._LOGIN_URL, None, 'Logging in',
data=urlencode_postdata(login_form))
# Logged in successfully
if is_logged(urlh):
return
errors = self._html_search_regex(
r'(?s)<ul[^>]+class=["\']error_list[^>]+>(.+?)</ul>', response,
'errors', default=None)
if errors:
raise ExtractorError('Unable to login: %s' % errors, expected=True)
raise ExtractorError('Unable to log in')
class LecturioIE(LecturioBaseIE):
_VALID_URL = r'''(?x)
https://
(?:
app\.lecturio\.com/[^/]+/(?P<id>[^/?#&]+)\.lecture|
(?:www\.)?lecturio\.de/[^/]+/(?P<id_de>[^/?#&]+)\.vortrag
)
'''
_TESTS = [{
'url': 'https://app.lecturio.com/medical-courses/important-concepts-and-terms-introduction-to-microbiology.lecture#tab/videos',
'md5': 'f576a797a5b7a5e4e4bbdfc25a6a6870',
'info_dict': {
'id': '39634',
'ext': 'mp4',
'title': 'Important Concepts and Terms Introduction to Microbiology',
},
'skip': 'Requires lecturio account credentials',
}, {
'url': 'https://www.lecturio.de/jura/oeffentliches-recht-staatsexamen.vortrag',
'only_matching': True,
}]
_CC_LANGS = {
'German': 'de',
'English': 'en',
'Spanish': 'es',
'French': 'fr',
'Polish': 'pl',
'Russian': 'ru',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id') or mobj.group('id_de')
webpage = self._download_webpage(
'https://app.lecturio.com/en/lecture/%s/player.html' % display_id,
display_id)
lecture_id = self._search_regex(
r'lecture_id\s*=\s*(?:L_)?(\d+)', webpage, 'lecture id')
api_url = self._search_regex(
r'lectureDataLink\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'api url', group='url')
video = self._download_json(api_url, display_id)
title = video['title'].strip()
formats = []
for format_ in video['content']['media']:
if not isinstance(format_, dict):
continue
file_ = format_.get('file')
if not file_:
continue
ext = determine_ext(file_)
if ext == 'smil':
# smil contains only broken RTMP formats anyway
continue
file_url = url_or_none(file_)
if not file_url:
continue
label = str_or_none(format_.get('label'))
filesize = int_or_none(format_.get('fileSize'))
formats.append({
'url': file_url,
'format_id': label,
'filesize': float_or_none(filesize, invscale=1000)
})
self._sort_formats(formats)
subtitles = {}
automatic_captions = {}
cc = self._parse_json(
self._search_regex(
r'subtitleUrls\s*:\s*({.+?})\s*,', webpage, 'subtitles',
default='{}'), display_id, fatal=False)
for cc_label, cc_url in cc.items():
cc_url = url_or_none(cc_url)
if not cc_url:
continue
lang = self._search_regex(
r'/([a-z]{2})_', cc_url, 'lang',
default=cc_label.split()[0] if cc_label else 'en')
original_lang = self._search_regex(
r'/[a-z]{2}_([a-z]{2})_', cc_url, 'original lang',
default=None)
sub_dict = (automatic_captions
if 'auto-translated' in cc_label or original_lang
else subtitles)
sub_dict.setdefault(self._CC_LANGS.get(lang, lang), []).append({
'url': cc_url,
})
return {
'id': lecture_id,
'title': title,
'formats': formats,
'subtitles': subtitles,
'automatic_captions': automatic_captions,
}
class LecturioCourseIE(LecturioBaseIE):
_VALID_URL = r'https://app\.lecturio\.com/[^/]+/(?P<id>[^/?#&]+)\.course'
_TEST = {
'url': 'https://app.lecturio.com/medical-courses/microbiology-introduction.course#/',
'info_dict': {
'id': 'microbiology-introduction',
'title': 'Microbiology: Introduction',
},
'playlist_count': 45,
'skip': 'Requires lecturio account credentials',
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
entries = []
for mobj in re.finditer(
r'(?s)<[^>]+\bdata-url=(["\'])(?:(?!\1).)+\.lecture\b[^>]+>',
webpage):
params = extract_attributes(mobj.group(0))
lecture_url = urljoin(url, params.get('data-url'))
lecture_id = params.get('data-id')
entries.append(self.url_result(
lecture_url, ie=LecturioIE.ie_key(), video_id=lecture_id))
title = self._search_regex(
r'<span[^>]+class=["\']content-title[^>]+>([^<]+)', webpage,
'title', default=None)
return self.playlist_result(entries, display_id, title)
class LecturioDeCourseIE(LecturioBaseIE):
_VALID_URL = r'https://(?:www\.)?lecturio\.de/[^/]+/(?P<id>[^/?#&]+)\.kurs'
_TEST = {
'url': 'https://www.lecturio.de/jura/grundrechte.kurs',
'only_matching': True,
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
entries = []
for mobj in re.finditer(
r'(?s)<td[^>]+\bdata-lecture-id=["\'](?P<id>\d+).+?\bhref=(["\'])(?P<url>(?:(?!\2).)+\.vortrag)\b[^>]+>',
webpage):
lecture_url = urljoin(url, mobj.group('url'))
lecture_id = mobj.group('id')
entries.append(self.url_result(
lecture_url, ie=LecturioIE.ie_key(), video_id=lecture_id))
title = self._search_regex(
r'<h1[^>]*>([^<]+)', webpage, 'title', default=None)
return self.playlist_result(entries, display_id, title)

View File

@ -16,16 +16,15 @@ from ..utils import (
class LibraryOfCongressIE(InfoExtractor): class LibraryOfCongressIE(InfoExtractor):
IE_NAME = 'loc' IE_NAME = 'loc'
IE_DESC = 'Library of Congress' IE_DESC = 'Library of Congress'
_VALID_URL = r'https?://(?:www\.)?loc\.gov/(?:item/|today/cyberlc/feature_wdesc\.php\?.*\brec=)(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?loc\.gov/(?:item/|today/cyberlc/feature_wdesc\.php\?.*\brec=)(?P<id>[0-9a-z_.]+)'
_TESTS = [{ _TESTS = [{
# embedded via <div class="media-player" # embedded via <div class="media-player"
'url': 'http://loc.gov/item/90716351/', 'url': 'http://loc.gov/item/90716351/',
'md5': '353917ff7f0255aa6d4b80a034833de8', 'md5': '6ec0ae8f07f86731b1b2ff70f046210a',
'info_dict': { 'info_dict': {
'id': '90716351', 'id': '90716351',
'ext': 'mp4', 'ext': 'mp4',
'title': "Pa's trip to Mars", 'title': "Pa's trip to Mars",
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 0, 'duration': 0,
'view_count': int, 'view_count': int,
}, },
@ -57,6 +56,12 @@ class LibraryOfCongressIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
}, {
'url': 'https://www.loc.gov/item/ihas.200197114/',
'only_matching': True,
}, {
'url': 'https://www.loc.gov/item/afc1981005_afs20503/',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -67,12 +72,13 @@ class LibraryOfCongressIE(InfoExtractor):
(r'id=(["\'])media-player-(?P<id>.+?)\1', (r'id=(["\'])media-player-(?P<id>.+?)\1',
r'<video[^>]+id=(["\'])uuid-(?P<id>.+?)\1', r'<video[^>]+id=(["\'])uuid-(?P<id>.+?)\1',
r'<video[^>]+data-uuid=(["\'])(?P<id>.+?)\1', r'<video[^>]+data-uuid=(["\'])(?P<id>.+?)\1',
r'mediaObjectId\s*:\s*(["\'])(?P<id>.+?)\1'), r'mediaObjectId\s*:\s*(["\'])(?P<id>.+?)\1',
r'data-tab="share-media-(?P<id>[0-9A-F]{32})"'),
webpage, 'media id', group='id') webpage, 'media id', group='id')
data = self._download_json( data = self._download_json(
'https://media.loc.gov/services/v1/media?id=%s&context=json' % media_id, 'https://media.loc.gov/services/v1/media?id=%s&context=json' % media_id,
video_id)['mediaObject'] media_id)['mediaObject']
derivative = data['derivatives'][0] derivative = data['derivatives'][0]
media_url = derivative['derivativeUrl'] media_url = derivative['derivativeUrl']
@ -89,25 +95,29 @@ class LibraryOfCongressIE(InfoExtractor):
if ext not in ('mp4', 'mp3'): if ext not in ('mp4', 'mp3'):
media_url += '.mp4' if is_video else '.mp3' media_url += '.mp4' if is_video else '.mp3'
if 'vod/mp4:' in media_url: formats = []
formats = [{ if '/vod/mp4:' in media_url:
'url': media_url.replace('vod/mp4:', 'hls-vod/media/') + '.m3u8', formats.append({
'url': media_url.replace('/vod/mp4:', '/hls-vod/media/') + '.m3u8',
'format_id': 'hls', 'format_id': 'hls',
'ext': 'mp4', 'ext': 'mp4',
'protocol': 'm3u8_native', 'protocol': 'm3u8_native',
'quality': 1, 'quality': 1,
}] })
elif 'vod/mp3:' in media_url: http_format = {
formats = [{ 'url': re.sub(r'(://[^/]+/)(?:[^/]+/)*(?:mp4|mp3):', r'\1', media_url),
'url': media_url.replace('vod/mp3:', ''), 'format_id': 'http',
'vcodec': 'none', 'quality': 1,
}] }
if not is_video:
http_format['vcodec'] = 'none'
formats.append(http_format)
download_urls = set() download_urls = set()
for m in re.finditer( for m in re.finditer(
r'<option[^>]+value=(["\'])(?P<url>.+?)\1[^>]+data-file-download=[^>]+>\s*(?P<id>.+?)(?:(?:&nbsp;|\s+)\((?P<size>.+?)\))?\s*<', webpage): r'<option[^>]+value=(["\'])(?P<url>.+?)\1[^>]+data-file-download=[^>]+>\s*(?P<id>.+?)(?:(?:&nbsp;|\s+)\((?P<size>.+?)\))?\s*<', webpage):
format_id = m.group('id').lower() format_id = m.group('id').lower()
if format_id == 'gif': if format_id in ('gif', 'jpeg'):
continue continue
download_url = m.group('url') download_url = m.group('url')
if download_url in download_urls: if download_url in download_urls:

View File

@ -87,7 +87,7 @@ class LiveLeakIE(InfoExtractor):
@staticmethod @staticmethod
def _extract_urls(webpage): def _extract_urls(webpage):
return re.findall( return re.findall(
r'<iframe[^>]+src="(https?://(?:\w+\.)?liveleak\.com/ll_embed\?[^"]*[if]=[\w_]+[^"]+)"', r'<iframe[^>]+src="(https?://(?:\w+\.)?liveleak\.com/ll_embed\?[^"]*[ift]=[\w_]+[^"]+)"',
webpage) webpage)
def _real_extract(self, url): def _real_extract(self, url):
@ -120,13 +120,27 @@ class LiveLeakIE(InfoExtractor):
} }
for idx, info_dict in enumerate(entries): for idx, info_dict in enumerate(entries):
formats = []
for a_format in info_dict['formats']: for a_format in info_dict['formats']:
if not a_format.get('height'): if not a_format.get('height'):
a_format['height'] = int_or_none(self._search_regex( a_format['height'] = int_or_none(self._search_regex(
r'([0-9]+)p\.mp4', a_format['url'], 'height label', r'([0-9]+)p\.mp4', a_format['url'], 'height label',
default=None)) default=None))
formats.append(a_format)
self._sort_formats(info_dict['formats']) # Removing '.*.mp4' gives the raw video, which is essentially
# the same video without the LiveLeak logo at the top (see
# https://github.com/rg3/youtube-dl/pull/4768)
orig_url = re.sub(r'\.mp4\.[^.]+', '', a_format['url'])
if a_format['url'] != orig_url:
format_id = a_format.get('format_id')
formats.append({
'format_id': 'original' + ('-' + format_id if format_id else ''),
'url': orig_url,
'preference': 1,
})
self._sort_formats(formats)
info_dict['formats'] = formats
# Don't append entry ID for one-video pages to keep backward compatibility # Don't append entry ID for one-video pages to keep backward compatibility
if len(entries) > 1: if len(entries) > 1:
@ -146,7 +160,7 @@ class LiveLeakIE(InfoExtractor):
class LiveLeakEmbedIE(InfoExtractor): class LiveLeakEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?liveleak\.com/ll_embed\?.*?\b(?P<kind>[if])=(?P<id>[\w_]+)' _VALID_URL = r'https?://(?:www\.)?liveleak\.com/ll_embed\?.*?\b(?P<kind>[ift])=(?P<id>[\w_]+)'
# See generic.py for actual test cases # See generic.py for actual test cases
_TESTS = [{ _TESTS = [{
@ -158,15 +172,14 @@ class LiveLeakEmbedIE(InfoExtractor):
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) kind, video_id = re.match(self._VALID_URL, url).groups()
kind, video_id = mobj.group('kind', 'id')
if kind == 'f': if kind == 'f':
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
liveleak_url = self._search_regex( liveleak_url = self._search_regex(
r'logourl\s*:\s*(?P<q1>[\'"])(?P<url>%s)(?P=q1)' % LiveLeakIE._VALID_URL, r'(?:logourl\s*:\s*|window\.open\()(?P<q1>[\'"])(?P<url>%s)(?P=q1)' % LiveLeakIE._VALID_URL,
webpage, 'LiveLeak URL', group='url') webpage, 'LiveLeak URL', group='url')
elif kind == 'i': else:
liveleak_url = 'http://www.liveleak.com/view?i=%s' % video_id liveleak_url = 'http://www.liveleak.com/view?%s=%s' % (kind, video_id)
return self.url_result(liveleak_url, ie=LiveLeakIE.ie_key()) return self.url_result(liveleak_url, ie=LiveLeakIE.ie_key())

View File

@ -363,7 +363,4 @@ class LivestreamShortenerIE(InfoExtractor):
id = mobj.group('id') id = mobj.group('id')
webpage = self._download_webpage(url, id) webpage = self._download_webpage(url, id)
return { return self.url_result(self._og_search_url(webpage))
'_type': 'url',
'url': self._og_search_url(webpage),
}

View File

@ -15,7 +15,7 @@ from ..utils import (
class LyndaBaseIE(InfoExtractor): class LyndaBaseIE(InfoExtractor):
_SIGNIN_URL = 'https://www.lynda.com/signin' _SIGNIN_URL = 'https://www.lynda.com/signin/lynda'
_PASSWORD_URL = 'https://www.lynda.com/signin/password' _PASSWORD_URL = 'https://www.lynda.com/signin/password'
_USER_URL = 'https://www.lynda.com/signin/user' _USER_URL = 'https://www.lynda.com/signin/user'
_ACCOUNT_CREDENTIALS_HINT = 'Use --username and --password options to provide lynda.com account credentials.' _ACCOUNT_CREDENTIALS_HINT = 'Use --username and --password options to provide lynda.com account credentials.'

View File

@ -2,12 +2,18 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import int_or_none from ..utils import (
determine_ext,
int_or_none,
str_to_int,
urlencode_postdata,
)
class ManyVidsIE(InfoExtractor): class ManyVidsIE(InfoExtractor):
_VALID_URL = r'(?i)https?://(?:www\.)?manyvids\.com/video/(?P<id>\d+)' _VALID_URL = r'(?i)https?://(?:www\.)?manyvids\.com/video/(?P<id>\d+)'
_TEST = { _TESTS = [{
# preview video
'url': 'https://www.manyvids.com/Video/133957/everthing-about-me/', 'url': 'https://www.manyvids.com/Video/133957/everthing-about-me/',
'md5': '03f11bb21c52dd12a05be21a5c7dcc97', 'md5': '03f11bb21c52dd12a05be21a5c7dcc97',
'info_dict': { 'info_dict': {
@ -17,7 +23,18 @@ class ManyVidsIE(InfoExtractor):
'view_count': int, 'view_count': int,
'like_count': int, 'like_count': int,
}, },
} }, {
# full video
'url': 'https://www.manyvids.com/Video/935718/MY-FACE-REVEAL/',
'md5': 'f3e8f7086409e9b470e2643edb96bdcc',
'info_dict': {
'id': '935718',
'ext': 'mp4',
'title': 'MY FACE REVEAL',
'view_count': int,
'like_count': int,
},
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
@ -28,12 +45,41 @@ class ManyVidsIE(InfoExtractor):
r'data-(?:video-filepath|meta-video)\s*=s*(["\'])(?P<url>(?:(?!\1).)+)\1', r'data-(?:video-filepath|meta-video)\s*=s*(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'video URL', group='url') webpage, 'video URL', group='url')
title = '%s (Preview)' % self._html_search_regex( title = self._html_search_regex(
r'<h2[^>]+class="m-a-0"[^>]*>([^<]+)', webpage, 'title') (r'<span[^>]+class=["\']item-title[^>]+>([^<]+)',
r'<h2[^>]+class=["\']h2 m-0["\'][^>]*>([^<]+)'),
webpage, 'title', default=None) or self._html_search_meta(
'twitter:title', webpage, 'title', fatal=True)
if any(p in webpage for p in ('preview_videos', '_preview.mp4')):
title += ' (Preview)'
mv_token = self._search_regex(
r'data-mvtoken=(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
'mv token', default=None, group='value')
if mv_token:
# Sets some cookies
self._download_webpage(
'https://www.manyvids.com/includes/ajax_repository/you_had_me_at_hello.php',
video_id, fatal=False, data=urlencode_postdata({
'mvtoken': mv_token,
'vid': video_id,
}), headers={
'Referer': url,
'X-Requested-With': 'XMLHttpRequest'
})
if determine_ext(video_url) == 'm3u8':
formats = self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
else:
formats = [{'url': video_url}]
like_count = int_or_none(self._search_regex( like_count = int_or_none(self._search_regex(
r'data-likes=["\'](\d+)', webpage, 'like count', default=None)) r'data-likes=["\'](\d+)', webpage, 'like count', default=None))
view_count = int_or_none(self._html_search_regex( view_count = str_to_int(self._html_search_regex(
r'(?s)<span[^>]+class="views-wrapper"[^>]*>(.+?)</span', webpage, r'(?s)<span[^>]+class="views-wrapper"[^>]*>(.+?)</span', webpage,
'view count', default=None)) 'view count', default=None))
@ -42,7 +88,5 @@ class ManyVidsIE(InfoExtractor):
'title': title, 'title': title,
'view_count': view_count, 'view_count': view_count,
'like_count': like_count, 'like_count': like_count,
'formats': [{ 'formats': formats,
'url': video_url,
}],
} }

View File

@ -21,7 +21,7 @@ from ..utils import (
class MediasiteIE(InfoExtractor): class MediasiteIE(InfoExtractor):
_VALID_URL = r'(?xi)https?://[^/]+/Mediasite/Play/(?P<id>[0-9a-f]{32,34})(?P<query>\?[^#]+|)' _VALID_URL = r'(?xi)https?://[^/]+/Mediasite/(?:Play|Showcase/(?:default|livebroadcast)/Presentation)/(?P<id>[0-9a-f]{32,34})(?P<query>\?[^#]+|)'
_TESTS = [ _TESTS = [
{ {
'url': 'https://hitsmediaweb.h-its.org/mediasite/Play/2db6c271681e4f199af3c60d1f82869b1d', 'url': 'https://hitsmediaweb.h-its.org/mediasite/Play/2db6c271681e4f199af3c60d1f82869b1d',
@ -84,7 +84,15 @@ class MediasiteIE(InfoExtractor):
'timestamp': 1333983600, 'timestamp': 1333983600,
'duration': 7794, 'duration': 7794,
} }
} },
{
'url': 'https://collegerama.tudelft.nl/Mediasite/Showcase/livebroadcast/Presentation/ada7020854f743c49fbb45c9ec7dbb351d',
'only_matching': True,
},
{
'url': 'https://mediasite.ntnu.no/Mediasite/Showcase/default/Presentation/7d8b913259334b688986e970fae6fcb31d',
'only_matching': True,
},
] ]
# look in Mediasite.Core.js (Mediasite.ContentStreamType[*]) # look in Mediasite.Core.js (Mediasite.ContentStreamType[*])

View File

@ -161,11 +161,17 @@ class MixcloudIE(InfoExtractor):
stream_info = info_json['streamInfo'] stream_info = info_json['streamInfo']
formats = [] formats = []
def decrypt_url(f_url):
for k in (key, 'IFYOUWANTTHEARTISTSTOGETPAIDDONOTDOWNLOADFROMMIXCLOUD'):
decrypted_url = self._decrypt_xor_cipher(k, f_url)
if re.search(r'^https?://[0-9a-z.]+/[0-9A-Za-z/.?=&_-]+$', decrypted_url):
return decrypted_url
for url_key in ('url', 'hlsUrl', 'dashUrl'): for url_key in ('url', 'hlsUrl', 'dashUrl'):
format_url = stream_info.get(url_key) format_url = stream_info.get(url_key)
if not format_url: if not format_url:
continue continue
decrypted = self._decrypt_xor_cipher(key, compat_b64decode(format_url)) decrypted = decrypt_url(compat_b64decode(format_url))
if not decrypted: if not decrypted:
continue continue
if url_key == 'hlsUrl': if url_key == 'hlsUrl':

View File

@ -1,15 +1,10 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from .adobepass import AdobePassIE from .fox import FOXIE
from .theplatform import ThePlatformIE
from ..utils import ( from ..utils import (
smuggle_url, smuggle_url,
url_basename, url_basename,
update_url_query,
get_element_by_class,
) )
@ -66,130 +61,22 @@ class NationalGeographicVideoIE(InfoExtractor):
} }
class NationalGeographicIE(ThePlatformIE, AdobePassIE): class NationalGeographicTVIE(FOXIE):
IE_NAME = 'natgeo' _VALID_URL = r'https?://(?:www\.)?nationalgeographic\.com/tv/watch/(?P<id>[\da-fA-F]+)'
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:(?:(?:wild/)?[^/]+/)?(?:videos|episodes)|u)/(?P<id>[^/?]+)' _TESTS = [{
'url': 'https://www.nationalgeographic.com/tv/watch/6a875e6e734b479beda26438c9f21138/',
_TESTS = [ 'info_dict': {
{ 'id': '6a875e6e734b479beda26438c9f21138',
'url': 'http://channel.nationalgeographic.com/u/kdi9Ld0PN2molUUIMSBGxoeDhD729KRjQcnxtetilWPMevo8ZwUBIDuPR0Q3D2LVaTsk0MPRkRWDB8ZhqWVeyoxfsZZm36yRp1j-zPfsHEyI_EgAeFY/', 'ext': 'mp4',
'md5': '518c9aa655686cf81493af5cc21e2a04', 'title': 'Why Nat Geo? Valley of the Boom',
'info_dict': { 'description': 'The lives of prominent figures in the tech world, including their friendships, rivalries, victories and failures.',
'id': 'vKInpacll2pC', 'timestamp': 1542662458,
'ext': 'mp4', 'upload_date': '20181119',
'title': 'Uncovering a Universal Knowledge', 'age_limit': 14,
'description': 'md5:1a89148475bf931b3661fcd6ddb2ae3a',
'timestamp': 1458680907,
'upload_date': '20160322',
'uploader': 'NEWA-FNG-NGTV',
},
'add_ie': ['ThePlatform'],
}, },
{ 'params': {
'url': 'http://channel.nationalgeographic.com/u/kdvOstqYaBY-vSBPyYgAZRUL4sWUJ5XUUPEhc7ISyBHqoIO4_dzfY3K6EjHIC0hmFXoQ7Cpzm6RkET7S3oMlm6CFnrQwSUwo/', 'skip_download': True,
'md5': 'c4912f656b4cbe58f3e000c489360989',
'info_dict': {
'id': 'Pok5lWCkiEFA',
'ext': 'mp4',
'title': 'The Stunning Red Bird of Paradise',
'description': 'md5:7bc8cd1da29686be4d17ad1230f0140c',
'timestamp': 1459362152,
'upload_date': '20160330',
'uploader': 'NEWA-FNG-NGTV',
},
'add_ie': ['ThePlatform'],
}, },
{ }]
'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/episodes/the-power-of-miracles/', _HOME_PAGE_URL = 'https://www.nationalgeographic.com/tv/'
'only_matching': True, _API_KEY = '238bb0a0c2aba67922c48709ce0c06fd'
},
{
'url': 'http://channel.nationalgeographic.com/videos/treasures-rediscovered/',
'only_matching': True,
},
{
'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/videos/uncovering-a-universal-knowledge/',
'only_matching': True,
},
{
'url': 'http://channel.nationalgeographic.com/wild/destination-wild/videos/the-stunning-red-bird-of-paradise/',
'only_matching': True,
}
]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
release_url = self._search_regex(
r'video_auth_playlist_url\s*=\s*"([^"]+)"',
webpage, 'release url')
theplatform_path = self._search_regex(r'https?://link\.theplatform\.com/s/([^?]+)', release_url, 'theplatform path')
video_id = theplatform_path.split('/')[-1]
query = {
'mbr': 'true',
}
is_auth = self._search_regex(r'video_is_auth\s*=\s*"([^"]+)"', webpage, 'is auth', fatal=False)
if is_auth == 'auth':
auth_resource_id = self._search_regex(
r"video_auth_resourceId\s*=\s*'([^']+)'",
webpage, 'auth resource id')
query['auth'] = self._extract_mvpd_auth(url, video_id, 'natgeo', auth_resource_id)
formats = []
subtitles = {}
for key, value in (('switch', 'http'), ('manifest', 'm3u')):
tp_query = query.copy()
tp_query.update({
key: value,
})
tp_formats, tp_subtitles = self._extract_theplatform_smil(
update_url_query(release_url, tp_query), video_id, 'Downloading %s SMIL data' % value)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
info = self._extract_theplatform_metadata(theplatform_path, display_id)
info.update({
'id': video_id,
'formats': formats,
'subtitles': subtitles,
'display_id': display_id,
})
return info
class NationalGeographicEpisodeGuideIE(InfoExtractor):
IE_NAME = 'natgeo:episodeguide'
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?(?P<id>[^/]+)/episode-guide'
_TESTS = [
{
'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/episode-guide/',
'info_dict': {
'id': 'the-story-of-god-with-morgan-freeman-season-1',
'title': 'The Story of God with Morgan Freeman - Season 1',
},
'playlist_mincount': 6,
},
{
'url': 'http://channel.nationalgeographic.com/underworld-inc/episode-guide/?s=2',
'info_dict': {
'id': 'underworld-inc-season-2',
'title': 'Underworld, Inc. - Season 2',
},
'playlist_mincount': 7,
},
]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
show = get_element_by_class('show', webpage)
selected_season = self._search_regex(
r'<div[^>]+class="select-seasons[^"]*".*?<a[^>]*>(.*?)</a>',
webpage, 'selected season')
entries = [
self.url_result(self._proto_relative_url(entry_url), 'NationalGeographic')
for entry_url in re.findall('(?s)<div[^>]+class="col-inner"[^>]*?>.*?<a[^>]+href="([^"]+)"', webpage)]
return self.playlist_result(
entries, '%s-%s' % (display_id, selected_season.lower().replace(' ', '-')),
'%s - %s' % (show, selected_season))

View File

@ -9,10 +9,8 @@ from .theplatform import ThePlatformIE
from .adobepass import AdobePassIE from .adobepass import AdobePassIE
from ..compat import compat_urllib_parse_unquote from ..compat import compat_urllib_parse_unquote
from ..utils import ( from ..utils import (
find_xpath_attr,
smuggle_url, smuggle_url,
try_get, try_get,
unescapeHTML,
update_url_query, update_url_query,
int_or_none, int_or_none,
) )
@ -269,27 +267,14 @@ class CSNNEIE(InfoExtractor):
class NBCNewsIE(ThePlatformIE): class NBCNewsIE(ThePlatformIE):
_VALID_URL = r'''(?x)https?://(?:www\.)?(?:nbcnews|today|msnbc)\.com/ _VALID_URL = r'(?x)https?://(?:www\.)?(?:nbcnews|today|msnbc)\.com/([^/]+/)*(?:.*-)?(?P<id>[^/?]+)'
(?:video/.+?/(?P<id>\d+)|
([^/]+/)*(?:.*-)?(?P<mpx_id>[^/?]+))
'''
_TESTS = [ _TESTS = [
{
'url': 'http://www.nbcnews.com/video/nbc-news/52753292',
'md5': '47abaac93c6eaf9ad37ee6c4463a5179',
'info_dict': {
'id': '52753292',
'ext': 'flv',
'title': 'Crew emerges after four-month Mars food study',
'description': 'md5:24e632ffac72b35f8b67a12d1b6ddfc1',
},
},
{ {
'url': 'http://www.nbcnews.com/watch/nbcnews-com/how-twitter-reacted-to-the-snowden-interview-269389891880', 'url': 'http://www.nbcnews.com/watch/nbcnews-com/how-twitter-reacted-to-the-snowden-interview-269389891880',
'md5': 'af1adfa51312291a017720403826bb64', 'md5': 'af1adfa51312291a017720403826bb64',
'info_dict': { 'info_dict': {
'id': 'p_tweet_snow_140529', 'id': '269389891880',
'ext': 'mp4', 'ext': 'mp4',
'title': 'How Twitter Reacted To The Snowden Interview', 'title': 'How Twitter Reacted To The Snowden Interview',
'description': 'md5:65a0bd5d76fe114f3c2727aa3a81fe64', 'description': 'md5:65a0bd5d76fe114f3c2727aa3a81fe64',
@ -313,7 +298,7 @@ class NBCNewsIE(ThePlatformIE):
'url': 'http://www.nbcnews.com/nightly-news/video/nightly-news-with-brian-williams-full-broadcast-february-4-394064451844', 'url': 'http://www.nbcnews.com/nightly-news/video/nightly-news-with-brian-williams-full-broadcast-february-4-394064451844',
'md5': '73135a2e0ef819107bbb55a5a9b2a802', 'md5': '73135a2e0ef819107bbb55a5a9b2a802',
'info_dict': { 'info_dict': {
'id': 'nn_netcast_150204', 'id': '394064451844',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Nightly News with Brian Williams Full Broadcast (February 4)', 'title': 'Nightly News with Brian Williams Full Broadcast (February 4)',
'description': 'md5:1c10c1eccbe84a26e5debb4381e2d3c5', 'description': 'md5:1c10c1eccbe84a26e5debb4381e2d3c5',
@ -326,7 +311,7 @@ class NBCNewsIE(ThePlatformIE):
'url': 'http://www.nbcnews.com/business/autos/volkswagen-11-million-vehicles-could-have-suspect-software-emissions-scandal-n431456', 'url': 'http://www.nbcnews.com/business/autos/volkswagen-11-million-vehicles-could-have-suspect-software-emissions-scandal-n431456',
'md5': 'a49e173825e5fcd15c13fc297fced39d', 'md5': 'a49e173825e5fcd15c13fc297fced39d',
'info_dict': { 'info_dict': {
'id': 'x_lon_vwhorn_150922', 'id': '529953347624',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Volkswagen U.S. Chief:\xa0 We Have Totally Screwed Up', 'title': 'Volkswagen U.S. Chief:\xa0 We Have Totally Screwed Up',
'description': 'md5:c8be487b2d80ff0594c005add88d8351', 'description': 'md5:c8be487b2d80ff0594c005add88d8351',
@ -339,7 +324,7 @@ class NBCNewsIE(ThePlatformIE):
'url': 'http://www.today.com/video/see-the-aurora-borealis-from-space-in-stunning-new-nasa-video-669831235788', 'url': 'http://www.today.com/video/see-the-aurora-borealis-from-space-in-stunning-new-nasa-video-669831235788',
'md5': '118d7ca3f0bea6534f119c68ef539f71', 'md5': '118d7ca3f0bea6534f119c68ef539f71',
'info_dict': { 'info_dict': {
'id': 'tdy_al_space_160420', 'id': '669831235788',
'ext': 'mp4', 'ext': 'mp4',
'title': 'See the aurora borealis from space in stunning new NASA video', 'title': 'See the aurora borealis from space in stunning new NASA video',
'description': 'md5:74752b7358afb99939c5f8bb2d1d04b1', 'description': 'md5:74752b7358afb99939c5f8bb2d1d04b1',
@ -352,7 +337,7 @@ class NBCNewsIE(ThePlatformIE):
'url': 'http://www.msnbc.com/all-in-with-chris-hayes/watch/the-chaotic-gop-immigration-vote-314487875924', 'url': 'http://www.msnbc.com/all-in-with-chris-hayes/watch/the-chaotic-gop-immigration-vote-314487875924',
'md5': '6d236bf4f3dddc226633ce6e2c3f814d', 'md5': '6d236bf4f3dddc226633ce6e2c3f814d',
'info_dict': { 'info_dict': {
'id': 'n_hayes_Aimm_140801_272214', 'id': '314487875924',
'ext': 'mp4', 'ext': 'mp4',
'title': 'The chaotic GOP immigration vote', 'title': 'The chaotic GOP immigration vote',
'description': 'The Republican House votes on a border bill that has no chance of getting through the Senate or signed by the President and is drawing criticism from all sides.', 'description': 'The Republican House votes on a border bill that has no chance of getting through the Senate or signed by the President and is drawing criticism from all sides.',
@ -374,60 +359,22 @@ class NBCNewsIE(ThePlatformIE):
] ]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('id') if not video_id.isdigit():
if video_id is not None:
all_info = self._download_xml('http://www.nbcnews.com/id/%s/displaymode/1219' % video_id, video_id)
info = all_info.find('video')
return {
'id': video_id,
'title': info.find('headline').text,
'ext': 'flv',
'url': find_xpath_attr(info, 'media', 'type', 'flashVideo').text,
'description': info.find('caption').text,
'thumbnail': find_xpath_attr(info, 'media', 'type', 'thumbnail').text,
}
else:
# "feature" and "nightly-news" pages use theplatform.com
video_id = mobj.group('mpx_id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
filter_param = 'byId' data = self._parse_json(self._search_regex(
bootstrap_json = self._search_regex( r'window\.__data\s*=\s*({.+});', webpage,
[r'(?m)(?:var\s+(?:bootstrapJson|playlistData)|NEWS\.videoObj)\s*=\s*({.+});?\s*$', 'bootstrap json'), video_id)
r'videoObj\s*:\s*({.+})', r'data-video="([^"]+)"', video_id = data['article']['content'][0]['primaryMedia']['video']['mpxMetadata']['id']
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);'],
webpage, 'bootstrap json', default=None)
if bootstrap_json:
bootstrap = self._parse_json(
bootstrap_json, video_id, transform_source=unescapeHTML)
info = None return {
if 'results' in bootstrap: '_type': 'url_transparent',
info = bootstrap['results'][0]['video'] 'id': video_id,
elif 'video' in bootstrap: # http://feed.theplatform.com/f/2E2eJC/nbcnews also works
info = bootstrap['video'] 'url': update_url_query('http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews', {'byId': video_id}),
elif 'msnbcVideoInfo' in bootstrap: 'ie_key': 'ThePlatformFeed',
info = bootstrap['msnbcVideoInfo']['meta'] }
elif 'msnbcThePlatform' in bootstrap:
info = bootstrap['msnbcThePlatform']['videoPlayer']['video']
else:
info = bootstrap
if 'guid' in info:
video_id = info['guid']
filter_param = 'byGuid'
elif 'mpxId' in info:
video_id = info['mpxId']
return {
'_type': 'url_transparent',
'id': video_id,
# http://feed.theplatform.com/f/2E2eJC/nbcnews also works
'url': update_url_query('http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews', {filter_param: video_id}),
'ie_key': 'ThePlatformFeed',
}
class NBCOlympicsIE(InfoExtractor): class NBCOlympicsIE(InfoExtractor):

View File

@ -5,8 +5,8 @@ from ..utils import ExtractorError
class NhkVodIE(InfoExtractor): class NhkVodIE(InfoExtractor):
_VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/en/vod/(?P<id>[^/]+/[^/?#&]+)' _VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/en/(?:vod|ondemand)/(?P<id>[^/]+/[^/?#&]+)'
_TEST = { _TESTS = [{
# Videos available only for a limited period of time. Visit # Videos available only for a limited period of time. Visit
# http://www3.nhk.or.jp/nhkworld/en/vod/ for working samples. # http://www3.nhk.or.jp/nhkworld/en/vod/ for working samples.
'url': 'http://www3.nhk.or.jp/nhkworld/en/vod/tokyofashion/20160815', 'url': 'http://www3.nhk.or.jp/nhkworld/en/vod/tokyofashion/20160815',
@ -19,7 +19,10 @@ class NhkVodIE(InfoExtractor):
'episode': 'The Kimono as Global Fashion', 'episode': 'The Kimono as Global Fashion',
}, },
'skip': 'Videos available only for a limited period of time', 'skip': 'Videos available only for a limited period of time',
} }, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/video/2015173/',
'only_matching': True,
}]
_API_URL = 'http://api.nhk.or.jp/nhkworld/vodesdlist/v1/all/all/all.json?apikey=EJfK8jdS57GqlupFgAfAAwr573q01y6k' _API_URL = 'http://api.nhk.or.jp/nhkworld/vodesdlist/v1/all/all/all.json?apikey=EJfK8jdS57GqlupFgAfAAwr573q01y6k'
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -35,7 +35,7 @@ class NovaEmbedIE(InfoExtractor):
bitrates = self._parse_json( bitrates = self._parse_json(
self._search_regex( self._search_regex(
r'(?s)bitrates\s*=\s*({.+?})\s*;', webpage, 'formats'), r'(?s)(?:src|bitrates)\s*=\s*({.+?})\s*;', webpage, 'formats'),
video_id, transform_source=js_to_json) video_id, transform_source=js_to_json)
QUALITIES = ('lq', 'mq', 'hq', 'hd') QUALITIES = ('lq', 'mq', 'hq', 'hd')

View File

@ -363,7 +363,7 @@ class NPOIE(NPOBaseIE):
class NPOLiveIE(NPOBaseIE): class NPOLiveIE(NPOBaseIE):
IE_NAME = 'npo.nl:live' IE_NAME = 'npo.nl:live'
_VALID_URL = r'https?://(?:www\.)?npo\.nl/live(?:/(?P<id>[^/?#&]+))?' _VALID_URL = r'https?://(?:www\.)?npo(?:start)?\.nl/live(?:/(?P<id>[^/?#&]+))?'
_TESTS = [{ _TESTS = [{
'url': 'http://www.npo.nl/live/npo-1', 'url': 'http://www.npo.nl/live/npo-1',
@ -380,6 +380,9 @@ class NPOLiveIE(NPOBaseIE):
}, { }, {
'url': 'http://www.npo.nl/live', 'url': 'http://www.npo.nl/live',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.npostart.nl/live/npo-1',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -211,13 +211,13 @@ class NRKIE(NRKBaseIE):
_TESTS = [{ _TESTS = [{
# video # video
'url': 'http://www.nrk.no/video/PS*150533', 'url': 'http://www.nrk.no/video/PS*150533',
'md5': '2f7f6eeb2aacdd99885f355428715cfa', 'md5': '706f34cdf1322577589e369e522b50ef',
'info_dict': { 'info_dict': {
'id': '150533', 'id': '150533',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Dompap og andre fugler i Piip-Show', 'title': 'Dompap og andre fugler i Piip-Show',
'description': 'md5:d9261ba34c43b61c812cb6b0269a5c8f', 'description': 'md5:d9261ba34c43b61c812cb6b0269a5c8f',
'duration': 263, 'duration': 262,
} }
}, { }, {
# audio # audio
@ -248,7 +248,7 @@ class NRKTVIE(NRKBaseIE):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?:tv|radio)\.nrk(?:super)?\.no/ (?:tv|radio)\.nrk(?:super)?\.no/
(?:serie/[^/]+|program)/ (?:serie(?:/[^/]+){1,2}|program)/
(?![Ee]pisodes)%s (?![Ee]pisodes)%s
(?:/\d{2}-\d{2}-\d{4})? (?:/\d{2}-\d{2}-\d{4})?
(?:\#del=(?P<part_id>\d+))? (?:\#del=(?P<part_id>\d+))?
@ -256,14 +256,14 @@ class NRKTVIE(NRKBaseIE):
_API_HOSTS = ('psapi-ne.nrk.no', 'psapi-we.nrk.no') _API_HOSTS = ('psapi-ne.nrk.no', 'psapi-we.nrk.no')
_TESTS = [{ _TESTS = [{
'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014', 'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
'md5': '4e9ca6629f09e588ed240fb11619922a', 'md5': '9a167e54d04671eb6317a37b7bc8a280',
'info_dict': { 'info_dict': {
'id': 'MUHH48000314AA', 'id': 'MUHH48000314AA',
'ext': 'mp4', 'ext': 'mp4',
'title': '20 spørsmål 23.05.2014', 'title': '20 spørsmål 23.05.2014',
'description': 'md5:bdea103bc35494c143c6a9acdd84887a', 'description': 'md5:bdea103bc35494c143c6a9acdd84887a',
'duration': 1741, 'duration': 1741,
'series': '20 spørsmål - TV', 'series': '20 spørsmål',
'episode': '23.05.2014', 'episode': '23.05.2014',
}, },
}, { }, {
@ -301,7 +301,7 @@ class NRKTVIE(NRKBaseIE):
'id': 'MSPO40010515AH', 'id': 'MSPO40010515AH',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Sprint fri teknikk, kvinner og menn 06.01.2015 (Part 1)', 'title': 'Sprint fri teknikk, kvinner og menn 06.01.2015 (Part 1)',
'description': 'md5:c03aba1e917561eface5214020551b7a', 'description': 'md5:1f97a41f05a9486ee00c56f35f82993d',
'duration': 772, 'duration': 772,
'series': 'Tour de Ski', 'series': 'Tour de Ski',
'episode': '06.01.2015', 'episode': '06.01.2015',
@ -314,7 +314,7 @@ class NRKTVIE(NRKBaseIE):
'id': 'MSPO40010515BH', 'id': 'MSPO40010515BH',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Sprint fri teknikk, kvinner og menn 06.01.2015 (Part 2)', 'title': 'Sprint fri teknikk, kvinner og menn 06.01.2015 (Part 2)',
'description': 'md5:c03aba1e917561eface5214020551b7a', 'description': 'md5:1f97a41f05a9486ee00c56f35f82993d',
'duration': 6175, 'duration': 6175,
'series': 'Tour de Ski', 'series': 'Tour de Ski',
'episode': '06.01.2015', 'episode': '06.01.2015',
@ -326,7 +326,7 @@ class NRKTVIE(NRKBaseIE):
'info_dict': { 'info_dict': {
'id': 'MSPO40010515', 'id': 'MSPO40010515',
'title': 'Sprint fri teknikk, kvinner og menn 06.01.2015', 'title': 'Sprint fri teknikk, kvinner og menn 06.01.2015',
'description': 'md5:c03aba1e917561eface5214020551b7a', 'description': 'md5:1f97a41f05a9486ee00c56f35f82993d',
}, },
'expected_warnings': ['Video is geo restricted'], 'expected_warnings': ['Video is geo restricted'],
}, { }, {
@ -362,6 +362,9 @@ class NRKTVIE(NRKBaseIE):
}, { }, {
'url': 'https://radio.nrk.no/serie/dagsnytt/NPUB21019315/12-07-2015#', 'url': 'https://radio.nrk.no/serie/dagsnytt/NPUB21019315/12-07-2015#',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://tv.nrk.no/serie/lindmo/2018/MUHU11006318/avspiller',
'only_matching': True,
}] }]
@ -403,21 +406,35 @@ class NRKTVSerieBaseIE(InfoExtractor):
def _extract_series(self, webpage, display_id, fatal=True): def _extract_series(self, webpage, display_id, fatal=True):
config = self._parse_json( config = self._parse_json(
self._search_regex( self._search_regex(
r'({.+?})\s*,\s*"[^"]+"\s*\)\s*</script>', webpage, 'config', (r'INITIAL_DATA_*\s*=\s*({.+?})\s*;',
default='{}' if not fatal else NO_DEFAULT), r'({.+?})\s*,\s*"[^"]+"\s*\)\s*</script>'),
webpage, 'config', default='{}' if not fatal else NO_DEFAULT),
display_id, fatal=False) display_id, fatal=False)
if not config: if not config:
return return
return try_get(config, lambda x: x['series'], dict) return try_get(
config,
(lambda x: x['initialState']['series'], lambda x: x['series']),
dict)
def _extract_seasons(self, seasons):
if not isinstance(seasons, list):
return []
entries = []
for season in seasons:
entries.extend(self._extract_episodes(season))
return entries
def _extract_episodes(self, season): def _extract_episodes(self, season):
entries = []
if not isinstance(season, dict): if not isinstance(season, dict):
return entries return []
episodes = season.get('episodes') return self._extract_entries(season.get('episodes'))
if not isinstance(episodes, list):
return entries def _extract_entries(self, entry_list):
for episode in episodes: if not isinstance(entry_list, list):
return []
entries = []
for episode in entry_list:
nrk_id = episode.get('prfId') nrk_id = episode.get('prfId')
if not nrk_id or not isinstance(nrk_id, compat_str): if not nrk_id or not isinstance(nrk_id, compat_str):
continue continue
@ -462,7 +479,7 @@ class NRKTVSeriesIE(NRKTVSerieBaseIE):
_VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/serie/(?P<id>[^/]+)' _VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/serie/(?P<id>[^/]+)'
_ITEM_RE = r'(?:data-season=["\']|id=["\']season-)(?P<id>\d+)' _ITEM_RE = r'(?:data-season=["\']|id=["\']season-)(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
# new layout # new layout, seasons
'url': 'https://tv.nrk.no/serie/backstage', 'url': 'https://tv.nrk.no/serie/backstage',
'info_dict': { 'info_dict': {
'id': 'backstage', 'id': 'backstage',
@ -471,20 +488,21 @@ class NRKTVSeriesIE(NRKTVSerieBaseIE):
}, },
'playlist_mincount': 60, 'playlist_mincount': 60,
}, { }, {
# old layout # new layout, instalments
'url': 'https://tv.nrk.no/serie/groenn-glede', 'url': 'https://tv.nrk.no/serie/groenn-glede',
'info_dict': { 'info_dict': {
'id': 'groenn-glede', 'id': 'groenn-glede',
'title': 'Grønn glede', 'title': 'Grønn glede',
'description': 'md5:7576e92ae7f65da6993cf90ee29e4608', 'description': 'md5:7576e92ae7f65da6993cf90ee29e4608',
}, },
'playlist_mincount': 9, 'playlist_mincount': 10,
}, { }, {
'url': 'http://tv.nrksuper.no/serie/labyrint', # old layout
'url': 'https://tv.nrksuper.no/serie/labyrint',
'info_dict': { 'info_dict': {
'id': 'labyrint', 'id': 'labyrint',
'title': 'Labyrint', 'title': 'Labyrint',
'description': 'md5:58afd450974c89e27d5a19212eee7115', 'description': 'md5:318b597330fdac5959247c9b69fdb1ec',
}, },
'playlist_mincount': 3, 'playlist_mincount': 3,
}, { }, {
@ -517,11 +535,12 @@ class NRKTVSeriesIE(NRKTVSerieBaseIE):
description = try_get( description = try_get(
series, lambda x: x['titles']['subtitle'], compat_str) series, lambda x: x['titles']['subtitle'], compat_str)
entries = [] entries = []
for season in series['seasons']: entries.extend(self._extract_seasons(series.get('seasons')))
entries.extend(self._extract_episodes(season)) entries.extend(self._extract_entries(series.get('instalments')))
entries.extend(self._extract_episodes(series.get('extraMaterial')))
return self.playlist_result(entries, series_id, title, description) return self.playlist_result(entries, series_id, title, description)
# Old layout (e.g. https://tv.nrk.no/serie/groenn-glede) # Old layout (e.g. https://tv.nrksuper.no/serie/labyrint)
entries = [ entries = [
self.url_result( self.url_result(
'https://tv.nrk.no/program/Episodes/{series}/{season}'.format( 'https://tv.nrk.no/program/Episodes/{series}/{season}'.format(
@ -533,6 +552,9 @@ class NRKTVSeriesIE(NRKTVSerieBaseIE):
'seriestitle', webpage, 'seriestitle', webpage,
'title', default=None) or self._og_search_title( 'title', default=None) or self._og_search_title(
webpage, fatal=False) webpage, fatal=False)
if title:
title = self._search_regex(
r'NRK (?:Super )?TV\s*[-]\s*(.+)', title, 'title', default=title)
description = self._html_search_meta( description = self._html_search_meta(
'series_description', webpage, 'series_description', webpage,
@ -593,7 +615,7 @@ class NRKPlaylistIE(NRKPlaylistBaseIE):
'title': 'Rivertonprisen til Karin Fossum', 'title': 'Rivertonprisen til Karin Fossum',
'description': 'Første kvinne på 15 år til å vinne krimlitteraturprisen.', 'description': 'Første kvinne på 15 år til å vinne krimlitteraturprisen.',
}, },
'playlist_count': 5, 'playlist_count': 2,
}] }]
def _extract_title(self, webpage): def _extract_title(self, webpage):

View File

@ -11,20 +11,27 @@ from ..utils import (
class NZZIE(InfoExtractor): class NZZIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nzz\.ch/(?:[^/]+/)*[^/?#]+-ld\.(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?nzz\.ch/(?:[^/]+/)*[^/?#]+-ld\.(?P<id>\d+)'
_TEST = { _TESTS = [{
'url': 'http://www.nzz.ch/zuerich/gymizyte/gymizyte-schreiben-schueler-heute-noch-diktate-ld.9153', 'url': 'http://www.nzz.ch/zuerich/gymizyte/gymizyte-schreiben-schueler-heute-noch-diktate-ld.9153',
'info_dict': { 'info_dict': {
'id': '9153', 'id': '9153',
}, },
'playlist_mincount': 6, 'playlist_mincount': 6,
} }, {
'url': 'https://www.nzz.ch/video/nzz-standpunkte/cvp-auf-der-suche-nach-dem-mass-der-mitte-ld.1368112',
'info_dict': {
'id': '1368112',
},
'playlist_count': 1,
}]
def _real_extract(self, url): def _real_extract(self, url):
page_id = self._match_id(url) page_id = self._match_id(url)
webpage = self._download_webpage(url, page_id) webpage = self._download_webpage(url, page_id)
entries = [] entries = []
for player_element in re.findall(r'(<[^>]+class="kalturaPlayer"[^>]*>)', webpage): for player_element in re.findall(
r'(<[^>]+class="kalturaPlayer[^"]*"[^>]*>)', webpage):
player_params = extract_attributes(player_element) player_params = extract_attributes(player_element)
if player_params.get('data-type') not in ('kaltura_singleArticle',): if player_params.get('data-type') not in ('kaltura_singleArticle',):
self.report_warning('Unsupported player type') self.report_warning('Unsupported player type')

View File

@ -115,6 +115,10 @@ class OdnoklassnikiIE(InfoExtractor):
}, { }, {
'url': 'https://m.ok.ru/dk?st.cmd=movieLayer&st.discId=863789452017&st.retLoc=friend&st.rtu=%2Fdk%3Fst.cmd%3DfriendMovies%26st.mode%3Down%26st.mrkId%3D%257B%2522uploadedMovieMarker%2522%253A%257B%2522marker%2522%253A%25221519410114503%2522%252C%2522hasMore%2522%253Atrue%257D%252C%2522sharedMovieMarker%2522%253A%257B%2522marker%2522%253Anull%252C%2522hasMore%2522%253Afalse%257D%257D%26st.friendId%3D561722190321%26st.frwd%3Don%26_prevCmd%3DfriendMovies%26tkn%3D7257&st.discType=MOVIE&st.mvId=863789452017&_prevCmd=friendMovies&tkn=3648#lst#', 'url': 'https://m.ok.ru/dk?st.cmd=movieLayer&st.discId=863789452017&st.retLoc=friend&st.rtu=%2Fdk%3Fst.cmd%3DfriendMovies%26st.mode%3Down%26st.mrkId%3D%257B%2522uploadedMovieMarker%2522%253A%257B%2522marker%2522%253A%25221519410114503%2522%252C%2522hasMore%2522%253Atrue%257D%252C%2522sharedMovieMarker%2522%253A%257B%2522marker%2522%253Anull%252C%2522hasMore%2522%253Afalse%257D%257D%26st.friendId%3D561722190321%26st.frwd%3Don%26_prevCmd%3DfriendMovies%26tkn%3D7257&st.discType=MOVIE&st.mvId=863789452017&_prevCmd=friendMovies&tkn=3648#lst#',
'only_matching': True, 'only_matching': True,
}, {
# Paid video
'url': 'https://ok.ru/video/954886983203',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -244,6 +248,11 @@ class OdnoklassnikiIE(InfoExtractor):
'ext': 'flv', 'ext': 'flv',
}) })
if not formats:
payment_info = metadata.get('paymentInfo')
if payment_info:
raise ExtractorError('This video is paid, subscribe to download it', expected=True)
self._sort_formats(formats) self._sort_formats(formats)
info['formats'] = formats info['formats'] = formats

View File

@ -243,7 +243,18 @@ class PhantomJSwrapper(object):
class OpenloadIE(InfoExtractor): class OpenloadIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:openload\.(?:co|io|link)|oload\.(?:tv|stream|site|xyz|win|download|cloud|cc|icu|fun))/(?:f|embed)/(?P<id>[a-zA-Z0-9-_]+)' _VALID_URL = r'''(?x)
https?://
(?P<host>
(?:www\.)?
(?:
openload\.(?:co|io|link|pw)|
oload\.(?:tv|stream|site|xyz|win|download|cloud|cc|icu|fun|club|info|pw)
)
)/
(?:f|embed)/
(?P<id>[a-zA-Z0-9-_]+)
'''
_TESTS = [{ _TESTS = [{
'url': 'https://openload.co/f/kUEfGclsU9o', 'url': 'https://openload.co/f/kUEfGclsU9o',
@ -323,6 +334,18 @@ class OpenloadIE(InfoExtractor):
}, { }, {
'url': 'https://oload.fun/f/gb6G1H4sHXY', 'url': 'https://oload.fun/f/gb6G1H4sHXY',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://oload.club/f/Nr1L-aZ2dbQ',
'only_matching': True,
}, {
'url': 'https://oload.info/f/5NEAbI2BDSk',
'only_matching': True,
}, {
'url': 'https://openload.pw/f/WyKgK8s94N0',
'only_matching': True,
}, {
'url': 'https://oload.pw/f/WyKgK8s94N0',
'only_matching': True,
}] }]
_USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36' _USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
@ -334,8 +357,11 @@ class OpenloadIE(InfoExtractor):
webpage) webpage)
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
url_pattern = 'https://openload.co/%%s/%s/' % video_id host = mobj.group('host')
video_id = mobj.group('id')
url_pattern = 'https://%s/%%s/%s/' % (host, video_id)
headers = { headers = {
'User-Agent': self._USER_AGENT, 'User-Agent': self._USER_AGENT,
} }
@ -368,7 +394,7 @@ class OpenloadIE(InfoExtractor):
r'>\s*([\w~-]+~[a-f0-9:]+~[\w~-]+)'), webpage, r'>\s*([\w~-]+~[a-f0-9:]+~[\w~-]+)'), webpage,
'stream URL')) 'stream URL'))
video_url = 'https://openload.co/stream/%s?mime=true' % decoded_id video_url = 'https://%s/stream/%s?mime=true' % (host, decoded_id)
title = self._og_search_title(webpage, default=None) or self._search_regex( title = self._og_search_title(webpage, default=None) or self._search_regex(
r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage, r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,
@ -379,7 +405,7 @@ class OpenloadIE(InfoExtractor):
entry = entries[0] if entries else {} entry = entries[0] if entries else {}
subtitles = entry.get('subtitles') subtitles = entry.get('subtitles')
info_dict = { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'thumbnail': entry.get('thumbnail') or self._og_search_thumbnail(webpage, default=None), 'thumbnail': entry.get('thumbnail') or self._og_search_thumbnail(webpage, default=None),
@ -388,4 +414,3 @@ class OpenloadIE(InfoExtractor):
'subtitles': subtitles, 'subtitles': subtitles,
'http_headers': headers, 'http_headers': headers,
} }
return info_dict

View File

@ -0,0 +1,28 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class OutsideTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?outsidetv\.com/(?:[^/]+/)*?play/[a-zA-Z0-9]{8}/\d+/\d+/(?P<id>[a-zA-Z0-9]{8})'
_TESTS = [{
'url': 'http://www.outsidetv.com/category/snow/play/ZjQYboH6/1/10/Hdg0jukV/4',
'md5': '192d968fedc10b2f70ec31865ffba0da',
'info_dict': {
'id': 'Hdg0jukV',
'ext': 'mp4',
'title': 'Home - Jackson Ep 1 | Arbor Snowboards',
'description': 'md5:41a12e94f3db3ca253b04bb1e8d8f4cd',
'upload_date': '20181225',
'timestamp': 1545742800,
}
}, {
'url': 'http://www.outsidetv.com/home/play/ZjQYboH6/1/10/Hdg0jukV/4',
'only_matching': True,
}]
def _real_extract(self, url):
jw_media_id = self._match_id(url)
return self.url_result(
'jwplatform:' + jw_media_id, 'JWPlatform', jw_media_id)

View File

@ -24,9 +24,9 @@ class PacktPubBaseIE(InfoExtractor):
class PacktPubIE(PacktPubBaseIE): class PacktPubIE(PacktPubBaseIE):
_VALID_URL = r'https?://(?:www\.)?packtpub\.com/mapt/video/[^/]+/(?P<course_id>\d+)/(?P<chapter_id>\d+)/(?P<id>\d+)' _VALID_URL = r'https?://(?:(?:www\.)?packtpub\.com/mapt|subscription\.packtpub\.com)/video/[^/]+/(?P<course_id>\d+)/(?P<chapter_id>\d+)/(?P<id>\d+)'
_TEST = { _TESTS = [{
'url': 'https://www.packtpub.com/mapt/video/web-development/9781787122215/20528/20530/Project+Intro', 'url': 'https://www.packtpub.com/mapt/video/web-development/9781787122215/20528/20530/Project+Intro',
'md5': '1e74bd6cfd45d7d07666f4684ef58f70', 'md5': '1e74bd6cfd45d7d07666f4684ef58f70',
'info_dict': { 'info_dict': {
@ -37,7 +37,10 @@ class PacktPubIE(PacktPubBaseIE):
'timestamp': 1490918400, 'timestamp': 1490918400,
'upload_date': '20170331', 'upload_date': '20170331',
}, },
} }, {
'url': 'https://subscription.packtpub.com/video/web_development/9781787122215/20528/20530/project-intro',
'only_matching': True,
}]
_NETRC_MACHINE = 'packtpub' _NETRC_MACHINE = 'packtpub'
_TOKEN = None _TOKEN = None
@ -110,15 +113,18 @@ class PacktPubIE(PacktPubBaseIE):
class PacktPubCourseIE(PacktPubBaseIE): class PacktPubCourseIE(PacktPubBaseIE):
_VALID_URL = r'(?P<url>https?://(?:www\.)?packtpub\.com/mapt/video/[^/]+/(?P<id>\d+))' _VALID_URL = r'(?P<url>https?://(?:(?:www\.)?packtpub\.com/mapt|subscription\.packtpub\.com)/video/[^/]+/(?P<id>\d+))'
_TEST = { _TESTS = [{
'url': 'https://www.packtpub.com/mapt/video/web-development/9781787122215', 'url': 'https://www.packtpub.com/mapt/video/web-development/9781787122215',
'info_dict': { 'info_dict': {
'id': '9781787122215', 'id': '9781787122215',
'title': 'Learn Nodejs by building 12 projects [Video]', 'title': 'Learn Nodejs by building 12 projects [Video]',
}, },
'playlist_count': 90, 'playlist_count': 90,
} }, {
'url': 'https://subscription.packtpub.com/video/web_development/9781787122215',
'only_matching': True,
}]
@classmethod @classmethod
def suitable(cls, url): def suitable(cls, url):

View File

@ -0,0 +1,109 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
clean_html,
ExtractorError,
int_or_none,
PUTRequest,
)
class PlayPlusTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?playplus\.(?:com|tv)/VOD/(?P<project_id>[0-9]+)/(?P<id>[0-9a-f]{32})'
_TEST = {
'url': 'https://www.playplus.tv/VOD/7572/db8d274a5163424e967f35a30ddafb8e',
'md5': 'd078cb89d7ab6b9df37ce23c647aef72',
'info_dict': {
'id': 'db8d274a5163424e967f35a30ddafb8e',
'ext': 'mp4',
'title': 'Capítulo 179 - Final',
'description': 'md5:01085d62d8033a1e34121d3c3cabc838',
'timestamp': 1529992740,
'upload_date': '20180626',
},
'skip': 'Requires account credential',
}
_NETRC_MACHINE = 'playplustv'
_GEO_COUNTRIES = ['BR']
_token = None
_profile_id = None
def _call_api(self, resource, video_id=None, query=None):
return self._download_json('https://api.playplus.tv/api/media/v2/get' + resource, video_id, headers={
'Authorization': 'Bearer ' + self._token,
}, query=query)
def _real_initialize(self):
email, password = self._get_login_info()
if email is None:
self.raise_login_required()
req = PUTRequest(
'https://api.playplus.tv/api/web/login', json.dumps({
'email': email,
'password': password,
}).encode(), {
'Content-Type': 'application/json; charset=utf-8',
})
try:
self._token = self._download_json(req, None)['token']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
raise ExtractorError(self._parse_json(
e.cause.read(), None)['errorMessage'], expected=True)
raise
self._profile = self._call_api('Profiles')['list'][0]['_id']
def _real_extract(self, url):
project_id, media_id = re.match(self._VALID_URL, url).groups()
media = self._call_api(
'Media', media_id, {
'profileId': self._profile,
'projectId': project_id,
'mediaId': media_id,
})['obj']
title = media['title']
formats = []
for f in media.get('files', []):
f_url = f.get('url')
if not f_url:
continue
file_info = f.get('fileInfo') or {}
formats.append({
'url': f_url,
'width': int_or_none(file_info.get('width')),
'height': int_or_none(file_info.get('height')),
})
self._sort_formats(formats)
thumbnails = []
for thumb in media.get('thumbs', []):
thumb_url = thumb.get('url')
if not thumb_url:
continue
thumbnails.append({
'url': thumb_url,
'width': int_or_none(thumb.get('width')),
'height': int_or_none(thumb.get('height')),
})
return {
'id': media_id,
'title': title,
'formats': formats,
'thumbnails': thumbnails,
'description': clean_html(media.get('description')) or media.get('shortDescription'),
'timestamp': int_or_none(media.get('publishDate'), 1000),
'view_count': int_or_none(media.get('numberOfViews')),
'comment_count': int_or_none(media.get('numberOfComments')),
'tags': media.get('tags'),
}

View File

@ -10,7 +10,9 @@ from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_HTTPError, compat_HTTPError,
compat_str, compat_str,
compat_urllib_request,
) )
from .openload import PhantomJSwrapper
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
@ -22,12 +24,34 @@ from ..utils import (
) )
class PornHubIE(InfoExtractor): class PornHubBaseIE(InfoExtractor):
def _download_webpage_handle(self, *args, **kwargs):
def dl(*args, **kwargs):
return super(PornHubBaseIE, self)._download_webpage_handle(*args, **kwargs)
webpage, urlh = dl(*args, **kwargs)
if any(re.search(p, webpage) for p in (
r'<body\b[^>]+\bonload=["\']go\(\)',
r'document\.cookie\s*=\s*["\']RNKEY=',
r'document\.location\.reload\(true\)')):
url_or_request = args[0]
url = (url_or_request.get_full_url()
if isinstance(url_or_request, compat_urllib_request.Request)
else url_or_request)
phantom = PhantomJSwrapper(self, required_version='2.0')
phantom.get(url, html=webpage)
webpage, urlh = dl(*args, **kwargs)
return webpage, urlh
class PornHubIE(PornHubBaseIE):
IE_DESC = 'PornHub and Thumbzilla' IE_DESC = 'PornHub and Thumbzilla'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?: (?:
(?:[^/]+\.)?pornhub\.com/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)| (?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)|
(?:www\.)?thumbzilla\.com/video/ (?:www\.)?thumbzilla\.com/video/
) )
(?P<id>[\da-z]+) (?P<id>[\da-z]+)
@ -121,12 +145,15 @@ class PornHubIE(InfoExtractor):
}, { }, {
'url': 'http://www.pornhub.com/video/show?viewkey=648719015', 'url': 'http://www.pornhub.com/video/show?viewkey=648719015',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.pornhub.net/view_video.php?viewkey=203640933',
'only_matching': True,
}] }]
@staticmethod @staticmethod
def _extract_urls(webpage): def _extract_urls(webpage):
return re.findall( return re.findall(
r'<iframe[^>]+?src=["\'](?P<url>(?:https?:)?//(?:www\.)?pornhub\.com/embed/[\da-z]+)', r'<iframe[^>]+?src=["\'](?P<url>(?:https?:)?//(?:www\.)?pornhub\.(?:com|net)/embed/[\da-z]+)',
webpage) webpage)
def _extract_count(self, pattern, webpage, name): def _extract_count(self, pattern, webpage, name):
@ -134,14 +161,16 @@ class PornHubIE(InfoExtractor):
pattern, webpage, '%s count' % name, fatal=False)) pattern, webpage, '%s count' % name, fatal=False))
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
host = mobj.group('host') or 'pornhub.com'
video_id = mobj.group('id')
self._set_cookie('pornhub.com', 'age_verified', '1') self._set_cookie(host, 'age_verified', '1')
def dl_webpage(platform): def dl_webpage(platform):
self._set_cookie('pornhub.com', 'platform', platform) self._set_cookie(host, 'platform', platform)
return self._download_webpage( return self._download_webpage(
'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id, 'http://www.%s/view_video.php?viewkey=%s' % (host, video_id),
video_id, 'Downloading %s webpage' % platform) video_id, 'Downloading %s webpage' % platform)
webpage = dl_webpage('pc') webpage = dl_webpage('pc')
@ -302,8 +331,8 @@ class PornHubIE(InfoExtractor):
} }
class PornHubPlaylistBaseIE(InfoExtractor): class PornHubPlaylistBaseIE(PornHubBaseIE):
def _extract_entries(self, webpage): def _extract_entries(self, webpage, host):
# Only process container div with main playlist content skipping # Only process container div with main playlist content skipping
# drop-down menu that uses similar pattern for videos (see # drop-down menu that uses similar pattern for videos (see
# https://github.com/rg3/youtube-dl/issues/11594). # https://github.com/rg3/youtube-dl/issues/11594).
@ -313,7 +342,7 @@ class PornHubPlaylistBaseIE(InfoExtractor):
return [ return [
self.url_result( self.url_result(
'http://www.pornhub.com/%s' % video_url, 'http://www.%s/%s' % (host, video_url),
PornHubIE.ie_key(), video_title=title) PornHubIE.ie_key(), video_title=title)
for video_url, title in orderedSet(re.findall( for video_url, title in orderedSet(re.findall(
r'href="/?(view_video\.php\?.*\bviewkey=[\da-z]+[^"]*)"[^>]*\s+title="([^"]+)"', r'href="/?(view_video\.php\?.*\bviewkey=[\da-z]+[^"]*)"[^>]*\s+title="([^"]+)"',
@ -321,11 +350,13 @@ class PornHubPlaylistBaseIE(InfoExtractor):
] ]
def _real_extract(self, url): def _real_extract(self, url):
playlist_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
playlist_id = mobj.group('id')
webpage = self._download_webpage(url, playlist_id) webpage = self._download_webpage(url, playlist_id)
entries = self._extract_entries(webpage) entries = self._extract_entries(webpage, host)
playlist = self._parse_json( playlist = self._parse_json(
self._search_regex( self._search_regex(
@ -340,7 +371,7 @@ class PornHubPlaylistBaseIE(InfoExtractor):
class PornHubPlaylistIE(PornHubPlaylistBaseIE): class PornHubPlaylistIE(PornHubPlaylistBaseIE):
_VALID_URL = r'https?://(?:[^/]+\.)?pornhub\.com/playlist/(?P<id>\d+)' _VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/playlist/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.pornhub.com/playlist/4667351', 'url': 'http://www.pornhub.com/playlist/4667351',
'info_dict': { 'info_dict': {
@ -355,7 +386,7 @@ class PornHubPlaylistIE(PornHubPlaylistBaseIE):
class PornHubUserVideosIE(PornHubPlaylistBaseIE): class PornHubUserVideosIE(PornHubPlaylistBaseIE):
_VALID_URL = r'https?://(?:[^/]+\.)?pornhub\.com/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos' _VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos'
_TESTS = [{ _TESTS = [{
'url': 'http://www.pornhub.com/users/zoe_ph/videos/public', 'url': 'http://www.pornhub.com/users/zoe_ph/videos/public',
'info_dict': { 'info_dict': {
@ -396,7 +427,9 @@ class PornHubUserVideosIE(PornHubPlaylistBaseIE):
}] }]
def _real_extract(self, url): def _real_extract(self, url):
user_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
user_id = mobj.group('id')
entries = [] entries = []
for page_num in itertools.count(1): for page_num in itertools.count(1):
@ -408,7 +441,7 @@ class PornHubUserVideosIE(PornHubPlaylistBaseIE):
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404: if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404:
break break
raise raise
page_entries = self._extract_entries(webpage) page_entries = self._extract_entries(webpage, host)
if not page_entries: if not page_entries:
break break
entries.extend(page_entries) entries.extend(page_entries)

View File

@ -49,6 +49,16 @@ class RadioCanadaIE(InfoExtractor):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
},
{
# with protectionType but not actually DRM protected
'url': 'radiocanada:toutv:140872',
'info_dict': {
'id': '140872',
'title': 'Épisode 1',
'series': 'District 31',
},
'only_matching': True,
} }
] ]
@ -67,8 +77,10 @@ class RadioCanadaIE(InfoExtractor):
el = find_xpath_attr(metadata, './/Meta', 'name', name) el = find_xpath_attr(metadata, './/Meta', 'name', name)
return el.text if el is not None else None return el.text if el is not None else None
# protectionType does not necessarily mean the video is DRM protected (see
# https://github.com/rg3/youtube-dl/pull/18609).
if get_meta('protectionType'): if get_meta('protectionType'):
raise ExtractorError('This video is DRM protected.', expected=True) self.report_warning('This video is probably DRM protected.')
device_types = ['ipad'] device_types = ['ipad']
if not smuggled_data: if not smuggled_data:

View File

@ -1,38 +1,46 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from .brightcove import BrightcoveLegacyIE from .brightcove import BrightcoveLegacyIE
from ..compat import ( from ..compat import (
compat_parse_qs, compat_parse_qs,
compat_urlparse, compat_urlparse,
) )
from ..utils import smuggle_url
class RMCDecouverteIE(InfoExtractor): class RMCDecouverteIE(InfoExtractor):
_VALID_URL = r'https?://rmcdecouverte\.bfmtv\.com/mediaplayer-replay.*?\bid=(?P<id>\d+)' _VALID_URL = r'https?://rmcdecouverte\.bfmtv\.com/(?:(?:[^/]+/)*program_(?P<id>\d+)|(?P<live_id>mediaplayer-direct))'
_TEST = { _TESTS = [{
'url': 'http://rmcdecouverte.bfmtv.com/mediaplayer-replay/?id=13502&title=AQUAMEN:LES%20ROIS%20DES%20AQUARIUMS%20:UN%20DELICIEUX%20PROJET', 'url': 'https://rmcdecouverte.bfmtv.com/wheeler-dealers-occasions-a-saisir/program_2566/',
'info_dict': { 'info_dict': {
'id': '5419055995001', 'id': '5983675500001',
'ext': 'mp4', 'ext': 'mp4',
'title': 'UN DELICIEUX PROJET', 'title': 'CORVETTE',
'description': 'md5:63610df7c8b1fc1698acd4d0d90ba8b5', 'description': 'md5:c1e8295521e45ffebf635d6a7658f506',
'uploader_id': '1969646226001', 'uploader_id': '1969646226001',
'upload_date': '20170502', 'upload_date': '20181226',
'timestamp': 1493745308, 'timestamp': 1545861635,
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'skip': 'only available for a week', 'skip': 'only available for a week',
} }, {
# live, geo restricted, bypassable
'url': 'https://rmcdecouverte.bfmtv.com/mediaplayer-direct/',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1969646226001/default_default/index.html?videoId=%s' BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1969646226001/default_default/index.html?videoId=%s'
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
webpage = self._download_webpage(url, video_id) display_id = mobj.group('id') or mobj.group('live_id')
webpage = self._download_webpage(url, display_id)
brightcove_legacy_url = BrightcoveLegacyIE._extract_brightcove_url(webpage) brightcove_legacy_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
if brightcove_legacy_url: if brightcove_legacy_url:
brightcove_id = compat_parse_qs(compat_urlparse.urlparse( brightcove_id = compat_parse_qs(compat_urlparse.urlparse(
@ -41,5 +49,7 @@ class RMCDecouverteIE(InfoExtractor):
brightcove_id = self._search_regex( brightcove_id = self._search_regex(
r'data-video-id=["\'](\d+)', webpage, 'brightcove id') r'data-video-id=["\'](\d+)', webpage, 'brightcove id')
return self.url_result( return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', smuggle_url(
brightcove_id) self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
{'geo_countries': ['FR']}),
'BrightcoveNew', brightcove_id)

View File

@ -8,7 +8,10 @@ from ..compat import compat_HTTPError
from ..utils import ( from ..utils import (
float_or_none, float_or_none,
parse_iso8601, parse_iso8601,
str_or_none,
try_get,
unescapeHTML, unescapeHTML,
url_or_none,
ExtractorError, ExtractorError,
) )
@ -17,65 +20,87 @@ class RteBaseIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
item_id = self._match_id(url) item_id = self._match_id(url)
try: info_dict = {}
json_string = self._download_json(
'http://www.rte.ie/rteavgen/getplaylist/?type=web&format=json&id=' + item_id,
item_id)
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 404:
error_info = self._parse_json(ee.cause.read().decode(), item_id, fatal=False)
if error_info:
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error_info['message']),
expected=True)
raise
# NB the string values in the JSON are stored using XML escaping(!)
show = json_string['shows'][0]
title = unescapeHTML(show['title'])
description = unescapeHTML(show.get('description'))
thumbnail = show.get('thumbnail')
duration = float_or_none(show.get('duration'), 1000)
timestamp = parse_iso8601(show.get('published'))
mg = show['media:group'][0]
formats = [] formats = []
if mg.get('url'): ENDPOINTS = (
m = re.match(r'(?P<url>rtmpe?://[^/]+)/(?P<app>.+)/(?P<playpath>mp4:.*)', mg['url']) 'https://feeds.rasset.ie/rteavgen/player/playlist?type=iptv&format=json&showId=',
if m: 'http://www.rte.ie/rteavgen/getplaylist/?type=web&format=json&id=',
m = m.groupdict() )
formats.append({
'url': m['url'] + '/' + m['app'],
'app': m['app'],
'play_path': m['playpath'],
'player_url': url,
'ext': 'flv',
'format_id': 'rtmp',
})
if mg.get('hls_server') and mg.get('hls_url'): for num, ep_url in enumerate(ENDPOINTS, start=1):
formats.extend(self._extract_m3u8_formats( try:
mg['hls_server'] + mg['hls_url'], item_id, 'mp4', data = self._download_json(ep_url + item_id, item_id)
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)) except ExtractorError as ee:
if num < len(ENDPOINTS) or formats:
continue
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 404:
error_info = self._parse_json(ee.cause.read().decode(), item_id, fatal=False)
if error_info:
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error_info['message']),
expected=True)
raise
if mg.get('hds_server') and mg.get('hds_url'): # NB the string values in the JSON are stored using XML escaping(!)
formats.extend(self._extract_f4m_formats( show = try_get(data, lambda x: x['shows'][0], dict)
mg['hds_server'] + mg['hds_url'], item_id, if not show:
f4m_id='hds', fatal=False)) continue
if not info_dict:
title = unescapeHTML(show['title'])
description = unescapeHTML(show.get('description'))
thumbnail = show.get('thumbnail')
duration = float_or_none(show.get('duration'), 1000)
timestamp = parse_iso8601(show.get('published'))
info_dict = {
'id': item_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
}
mg = try_get(show, lambda x: x['media:group'][0], dict)
if not mg:
continue
if mg.get('url'):
m = re.match(r'(?P<url>rtmpe?://[^/]+)/(?P<app>.+)/(?P<playpath>mp4:.*)', mg['url'])
if m:
m = m.groupdict()
formats.append({
'url': m['url'] + '/' + m['app'],
'app': m['app'],
'play_path': m['playpath'],
'player_url': url,
'ext': 'flv',
'format_id': 'rtmp',
})
if mg.get('hls_server') and mg.get('hls_url'):
formats.extend(self._extract_m3u8_formats(
mg['hls_server'] + mg['hls_url'], item_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
if mg.get('hds_server') and mg.get('hds_url'):
formats.extend(self._extract_f4m_formats(
mg['hds_server'] + mg['hds_url'], item_id,
f4m_id='hds', fatal=False))
mg_rte_server = str_or_none(mg.get('rte:server'))
mg_url = str_or_none(mg.get('url'))
if mg_rte_server and mg_url:
hds_url = url_or_none(mg_rte_server + mg_url)
if hds_url:
formats.extend(self._extract_f4m_formats(
hds_url, item_id, f4m_id='hds', fatal=False))
self._sort_formats(formats) self._sort_formats(formats)
return { info_dict['formats'] = formats
'id': item_id, return info_dict
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
'formats': formats,
}
class RteIE(RteBaseIE): class RteIE(RteBaseIE):

View File

@ -15,10 +15,10 @@ from ..utils import (
class SafariBaseIE(InfoExtractor): class SafariBaseIE(InfoExtractor):
_LOGIN_URL = 'https://www.safaribooksonline.com/accounts/login/' _LOGIN_URL = 'https://learning.oreilly.com/accounts/login/'
_NETRC_MACHINE = 'safari' _NETRC_MACHINE = 'safari'
_API_BASE = 'https://www.safaribooksonline.com/api/v1' _API_BASE = 'https://learning.oreilly.com/api/v1'
_API_FORMAT = 'json' _API_FORMAT = 'json'
LOGGED_IN = False LOGGED_IN = False
@ -76,7 +76,7 @@ class SafariIE(SafariBaseIE):
IE_DESC = 'safaribooksonline.com online video' IE_DESC = 'safaribooksonline.com online video'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?:www\.)?safaribooksonline\.com/ (?:www\.)?(?:safaribooksonline|learning\.oreilly)\.com/
(?: (?:
library/view/[^/]+/(?P<course_id>[^/]+)/(?P<part>[^/?\#&]+)\.html| library/view/[^/]+/(?P<course_id>[^/]+)/(?P<part>[^/?\#&]+)\.html|
videos/[^/]+/[^/]+/(?P<reference_id>[^-]+-[^/?\#&]+) videos/[^/]+/[^/]+/(?P<reference_id>[^-]+-[^/?\#&]+)
@ -104,6 +104,9 @@ class SafariIE(SafariBaseIE):
}, { }, {
'url': 'https://www.safaribooksonline.com/videos/python-programming-language/9780134217314/9780134217314-PYMC_13_00', 'url': 'https://www.safaribooksonline.com/videos/python-programming-language/9780134217314/9780134217314-PYMC_13_00',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://learning.oreilly.com/videos/hadoop-fundamentals-livelessons/9780133392838/9780133392838-00_SeriesIntro',
'only_matching': True,
}] }]
_PARTNER_ID = '1926081' _PARTNER_ID = '1926081'
@ -160,7 +163,7 @@ class SafariIE(SafariBaseIE):
class SafariApiIE(SafariBaseIE): class SafariApiIE(SafariBaseIE):
IE_NAME = 'safari:api' IE_NAME = 'safari:api'
_VALID_URL = r'https?://(?:www\.)?safaribooksonline\.com/api/v1/book/(?P<course_id>[^/]+)/chapter(?:-content)?/(?P<part>[^/?#&]+)\.html' _VALID_URL = r'https?://(?:www\.)?(?:safaribooksonline|learning\.oreilly)\.com/api/v1/book/(?P<course_id>[^/]+)/chapter(?:-content)?/(?P<part>[^/?#&]+)\.html'
_TESTS = [{ _TESTS = [{
'url': 'https://www.safaribooksonline.com/api/v1/book/9780133392838/chapter/part00.html', 'url': 'https://www.safaribooksonline.com/api/v1/book/9780133392838/chapter/part00.html',
@ -185,7 +188,7 @@ class SafariCourseIE(SafariBaseIE):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?: (?:
(?:www\.)?safaribooksonline\.com/ (?:www\.)?(?:safaribooksonline|learning\.oreilly)\.com/
(?: (?:
library/view/[^/]+| library/view/[^/]+|
api/v1/book| api/v1/book|
@ -213,6 +216,9 @@ class SafariCourseIE(SafariBaseIE):
}, { }, {
'url': 'https://www.safaribooksonline.com/videos/python-programming-language/9780134217314', 'url': 'https://www.safaribooksonline.com/videos/python-programming-language/9780134217314',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://learning.oreilly.com/videos/hadoop-fundamentals-livelessons/9780133392838',
'only_matching': True,
}] }]
@classmethod @classmethod

View File

@ -30,8 +30,5 @@ class SaveFromIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = os.path.splitext(url.split('/')[-1])[0] video_id = os.path.splitext(url.split('/')[-1])[0]
return {
'_type': 'url', return self.url_result(mobj.group('url'), video_id=video_id)
'id': video_id,
'url': mobj.group('url'),
}

View File

@ -19,7 +19,7 @@ class ScrippsNetworksWatchIE(AWSIE):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
watch\. watch\.
(?P<site>hgtv|foodnetwork|travelchannel|diynetwork|cookingchanneltv|geniuskitchen)\.com/ (?P<site>geniuskitchen)\.com/
(?: (?:
player\.[A-Z0-9]+\.html\#| player\.[A-Z0-9]+\.html\#|
show/(?:[^/]+/){2}| show/(?:[^/]+/){2}|
@ -28,38 +28,23 @@ class ScrippsNetworksWatchIE(AWSIE):
(?P<id>\d+) (?P<id>\d+)
''' '''
_TESTS = [{ _TESTS = [{
'url': 'http://watch.hgtv.com/show/HGTVE/Best-Ever-Treehouses/2241515/Best-Ever-Treehouses/', 'url': 'http://watch.geniuskitchen.com/player/3787617/Ample-Hills-Ice-Cream-Bike/',
'md5': '26545fd676d939954c6808274bdb905a',
'info_dict': { 'info_dict': {
'id': '4173834', 'id': '4194875',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Best Ever Treehouses', 'title': 'Ample Hills Ice Cream Bike',
'description': "We're searching for the most over the top treehouses.", 'description': 'Courtney Rada churns up a signature GK Now ice cream with The Scoopmaster.',
'uploader': 'ANV', 'uploader': 'ANV',
'upload_date': '20170922', 'upload_date': '20171011',
'timestamp': 1506056400, 'timestamp': 1507698000,
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'add_ie': [AnvatoIE.ie_key()], 'add_ie': [AnvatoIE.ie_key()],
}, {
'url': 'http://watch.diynetwork.com/show/DSAL/Salvage-Dawgs/2656646/Covington-Church/',
'only_matching': True,
}, {
'url': 'http://watch.diynetwork.com/player.HNT.html#2656646',
'only_matching': True,
}, {
'url': 'http://watch.geniuskitchen.com/player/3787617/Ample-Hills-Ice-Cream-Bike/',
'only_matching': True,
}] }]
_SNI_TABLE = { _SNI_TABLE = {
'hgtv': 'hgtv',
'diynetwork': 'diy',
'foodnetwork': 'food',
'cookingchanneltv': 'cook',
'travelchannel': 'trav',
'geniuskitchen': 'genius', 'geniuskitchen': 'genius',
} }

View File

@ -64,7 +64,7 @@ class SixPlayIE(InfoExtractor):
for asset in clip_data['assets']: for asset in clip_data['assets']:
asset_url = asset.get('full_physical_path') asset_url = asset.get('full_physical_path')
protocol = asset.get('protocol') protocol = asset.get('protocol')
if not asset_url or protocol == 'primetime' or asset_url in urls: if not asset_url or protocol == 'primetime' or asset.get('type') == 'usp_hlsfp_h264' or asset_url in urls:
continue continue
urls.append(asset_url) urls.append(asset_url)
container = asset.get('video_container') container = asset.get('video_container')
@ -81,19 +81,17 @@ class SixPlayIE(InfoExtractor):
if not urlh: if not urlh:
continue continue
asset_url = urlh.geturl() asset_url = urlh.geturl()
asset_url = re.sub(r'/([^/]+)\.ism/[^/]*\.m3u8', r'/\1.ism/\1.m3u8', asset_url) for i in range(3, 0, -1):
formats.extend(self._extract_m3u8_formats( asset_url = asset_url = asset_url.replace('_sd1/', '_sd%d/' % i)
asset_url, video_id, 'mp4', 'm3u8_native', m3u8_formats = self._extract_m3u8_formats(
m3u8_id='hls', fatal=False)) asset_url, video_id, 'mp4', 'm3u8_native',
formats.extend(self._extract_f4m_formats( m3u8_id='hls', fatal=False)
asset_url.replace('.m3u8', '.f4m'), formats.extend(m3u8_formats)
video_id, f4m_id='hds', fatal=False)) formats.extend(self._extract_mpd_formats(
formats.extend(self._extract_mpd_formats( asset_url.replace('.m3u8', '.mpd'),
asset_url.replace('.m3u8', '.mpd'), video_id, mpd_id='dash', fatal=False))
video_id, mpd_id='dash', fatal=False)) if m3u8_formats:
formats.extend(self._extract_ism_formats( break
re.sub(r'/[^/]+\.m3u8', '/Manifest', asset_url),
video_id, ism_id='mss', fatal=False))
else: else:
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
asset_url, video_id, 'mp4', 'm3u8_native', asset_url, video_id, 'mp4', 'm3u8_native',

View File

@ -26,7 +26,7 @@ class SkylineWebcamsIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
stream_url = self._search_regex( stream_url = self._search_regex(
r'url\s*:\s*(["\'])(?P<url>(?:https?:)?//.+?\.m3u8.*?)\1', webpage, r'(?:url|source)\s*:\s*(["\'])(?P<url>(?:https?:)?//.+?\.m3u8.*?)\1', webpage,
'stream url', group='url') 'stream url', group='url')
title = self._og_search_title(webpage) title = self._og_search_title(webpage)

View File

@ -18,6 +18,7 @@ from ..utils import (
int_or_none, int_or_none,
unified_strdate, unified_strdate,
update_url_query, update_url_query,
url_or_none,
) )
@ -34,7 +35,7 @@ class SoundcloudIE(InfoExtractor):
(?:(?:(?:www\.|m\.)?soundcloud\.com/ (?:(?:(?:www\.|m\.)?soundcloud\.com/
(?!stations/track) (?!stations/track)
(?P<uploader>[\w\d-]+)/ (?P<uploader>[\w\d-]+)/
(?!(?:tracks|sets(?:/.+?)?|reposts|likes|spotlight)/?(?:$|[?#])) (?!(?:tracks|albums|sets(?:/.+?)?|reposts|likes|spotlight)/?(?:$|[?#]))
(?P<title>[\w\d-]+)/? (?P<title>[\w\d-]+)/?
(?P<token>[^?]+?)?(?:[?].*)?$) (?P<token>[^?]+?)?(?:[?].*)?$)
|(?:api\.soundcloud\.com/tracks/(?P<track_id>\d+) |(?:api\.soundcloud\.com/tracks/(?P<track_id>\d+)
@ -157,7 +158,7 @@ class SoundcloudIE(InfoExtractor):
}, },
] ]
_CLIENT_ID = 'LvWovRaJZlWCHql0bISuum8Bd2KX79mb' _CLIENT_ID = 'NmW1FlPaiL94ueEu7oziOWjYEzZzQDcK'
@staticmethod @staticmethod
def _extract_urls(webpage): def _extract_urls(webpage):
@ -368,7 +369,6 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
class SoundcloudPagedPlaylistBaseIE(SoundcloudPlaylistBaseIE): class SoundcloudPagedPlaylistBaseIE(SoundcloudPlaylistBaseIE):
_API_BASE = 'https://api.soundcloud.com'
_API_V2_BASE = 'https://api-v2.soundcloud.com' _API_V2_BASE = 'https://api-v2.soundcloud.com'
def _extract_playlist(self, base_url, playlist_id, playlist_title): def _extract_playlist(self, base_url, playlist_id, playlist_title):
@ -389,21 +389,30 @@ class SoundcloudPagedPlaylistBaseIE(SoundcloudPlaylistBaseIE):
next_href, playlist_id, 'Downloading track page %s' % (i + 1)) next_href, playlist_id, 'Downloading track page %s' % (i + 1))
collection = response['collection'] collection = response['collection']
if not collection:
break
def resolve_permalink_url(candidates): if not isinstance(collection, list):
collection = []
# Empty collection may be returned, in this case we proceed
# straight to next_href
def resolve_entry(candidates):
for cand in candidates: for cand in candidates:
if isinstance(cand, dict): if not isinstance(cand, dict):
permalink_url = cand.get('permalink_url') continue
entry_id = self._extract_id(cand) permalink_url = url_or_none(cand.get('permalink_url'))
if permalink_url and permalink_url.startswith('http'): if not permalink_url:
return permalink_url, entry_id continue
return self.url_result(
permalink_url,
ie=SoundcloudIE.ie_key() if SoundcloudIE.suitable(permalink_url) else None,
video_id=self._extract_id(cand),
video_title=cand.get('title'))
for e in collection: for e in collection:
permalink_url, entry_id = resolve_permalink_url((e, e.get('track'), e.get('playlist'))) entry = resolve_entry((e, e.get('track'), e.get('playlist')))
if permalink_url: if entry:
entries.append(self.url_result(permalink_url, video_id=entry_id)) entries.append(entry)
next_href = response.get('next_href') next_href = response.get('next_href')
if not next_href: if not next_href:
@ -429,46 +438,53 @@ class SoundcloudUserIE(SoundcloudPagedPlaylistBaseIE):
(?:(?:www|m)\.)?soundcloud\.com/ (?:(?:www|m)\.)?soundcloud\.com/
(?P<user>[^/]+) (?P<user>[^/]+)
(?:/ (?:/
(?P<rsrc>tracks|sets|reposts|likes|spotlight) (?P<rsrc>tracks|albums|sets|reposts|likes|spotlight)
)? )?
/?(?:[?#].*)?$ /?(?:[?#].*)?$
''' '''
IE_NAME = 'soundcloud:user' IE_NAME = 'soundcloud:user'
_TESTS = [{ _TESTS = [{
'url': 'https://soundcloud.com/the-akashic-chronicler', 'url': 'https://soundcloud.com/soft-cell-official',
'info_dict': { 'info_dict': {
'id': '114582580', 'id': '207965082',
'title': 'The Akashic Chronicler (All)', 'title': 'Soft Cell (All)',
}, },
'playlist_mincount': 74, 'playlist_mincount': 28,
}, { }, {
'url': 'https://soundcloud.com/the-akashic-chronicler/tracks', 'url': 'https://soundcloud.com/soft-cell-official/tracks',
'info_dict': { 'info_dict': {
'id': '114582580', 'id': '207965082',
'title': 'The Akashic Chronicler (Tracks)', 'title': 'Soft Cell (Tracks)',
}, },
'playlist_mincount': 37, 'playlist_mincount': 27,
}, { }, {
'url': 'https://soundcloud.com/the-akashic-chronicler/sets', 'url': 'https://soundcloud.com/soft-cell-official/albums',
'info_dict': { 'info_dict': {
'id': '114582580', 'id': '207965082',
'title': 'The Akashic Chronicler (Playlists)', 'title': 'Soft Cell (Albums)',
},
'playlist_mincount': 1,
}, {
'url': 'https://soundcloud.com/jcv246/sets',
'info_dict': {
'id': '12982173',
'title': 'Jordi / cv (Playlists)',
}, },
'playlist_mincount': 2, 'playlist_mincount': 2,
}, { }, {
'url': 'https://soundcloud.com/the-akashic-chronicler/reposts', 'url': 'https://soundcloud.com/jcv246/reposts',
'info_dict': { 'info_dict': {
'id': '114582580', 'id': '12982173',
'title': 'The Akashic Chronicler (Reposts)', 'title': 'Jordi / cv (Reposts)',
}, },
'playlist_mincount': 7, 'playlist_mincount': 6,
}, { }, {
'url': 'https://soundcloud.com/the-akashic-chronicler/likes', 'url': 'https://soundcloud.com/clalberg/likes',
'info_dict': { 'info_dict': {
'id': '114582580', 'id': '11817582',
'title': 'The Akashic Chronicler (Likes)', 'title': 'clalberg (Likes)',
}, },
'playlist_mincount': 321, 'playlist_mincount': 5,
}, { }, {
'url': 'https://soundcloud.com/grynpyret/spotlight', 'url': 'https://soundcloud.com/grynpyret/spotlight',
'info_dict': { 'info_dict': {
@ -479,10 +495,11 @@ class SoundcloudUserIE(SoundcloudPagedPlaylistBaseIE):
}] }]
_BASE_URL_MAP = { _BASE_URL_MAP = {
'all': '%s/profile/soundcloud:users:%%s' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE, 'all': '%s/stream/users/%%s' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE,
'tracks': '%s/users/%%s/tracks' % SoundcloudPagedPlaylistBaseIE._API_BASE, 'tracks': '%s/users/%%s/tracks' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE,
'albums': '%s/users/%%s/albums' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE,
'sets': '%s/users/%%s/playlists' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE, 'sets': '%s/users/%%s/playlists' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE,
'reposts': '%s/profile/soundcloud:users:%%s/reposts' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE, 'reposts': '%s/stream/users/%%s/reposts' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE,
'likes': '%s/users/%%s/likes' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE, 'likes': '%s/users/%%s/likes' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE,
'spotlight': '%s/users/%%s/spotlight' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE, 'spotlight': '%s/users/%%s/spotlight' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE,
} }
@ -490,6 +507,7 @@ class SoundcloudUserIE(SoundcloudPagedPlaylistBaseIE):
_TITLE_MAP = { _TITLE_MAP = {
'all': 'All', 'all': 'All',
'tracks': 'Tracks', 'tracks': 'Tracks',
'albums': 'Albums',
'sets': 'Playlists', 'sets': 'Playlists',
'reposts': 'Reposts', 'reposts': 'Reposts',
'likes': 'Likes', 'likes': 'Likes',

View File

@ -14,7 +14,7 @@ from ..utils import (
class StreamangoIE(InfoExtractor): class StreamangoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?streamango\.com/(?:f|embed)/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?(?:streamango\.com|fruithosts\.net)/(?:f|embed)/(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://streamango.com/f/clapasobsptpkdfe/20170315_150006_mp4', 'url': 'https://streamango.com/f/clapasobsptpkdfe/20170315_150006_mp4',
'md5': 'e992787515a182f55e38fc97588d802a', 'md5': 'e992787515a182f55e38fc97588d802a',
@ -38,6 +38,9 @@ class StreamangoIE(InfoExtractor):
}, { }, {
'url': 'https://streamango.com/embed/clapasobsptpkdfe/20170315_150006_mp4', 'url': 'https://streamango.com/embed/clapasobsptpkdfe/20170315_150006_mp4',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://fruithosts.net/f/mreodparcdcmspsm/w1f1_r4lph_2018_brrs_720p_latino_mp4',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -16,7 +16,7 @@ from ..utils import (
class TBSIE(TurnerBaseIE): class TBSIE(TurnerBaseIE):
_VALID_URL = r'https?://(?:www\.)?(?P<site>tbs|tntdrama)\.com/(?:movies|shows/[^/]+/(?:clips|season-\d+/episode-\d+))/(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?(?P<site>tbs|tntdrama)\.com(?P<path>/(?:movies|shows/[^/]+/(?:clips|season-\d+/episode-\d+))/(?P<id>[^/?#]+))'
_TESTS = [{ _TESTS = [{
'url': 'http://www.tntdrama.com/shows/the-alienist/clips/monster', 'url': 'http://www.tntdrama.com/shows/the-alienist/clips/monster',
'info_dict': { 'info_dict': {
@ -40,12 +40,12 @@ class TBSIE(TurnerBaseIE):
}] }]
def _real_extract(self, url): def _real_extract(self, url):
site, display_id = re.match(self._VALID_URL, url).groups() site, path, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
drupal_settings = self._parse_json(self._search_regex( drupal_settings = self._parse_json(self._search_regex(
r'<script[^>]+?data-drupal-selector="drupal-settings-json"[^>]*?>({.+?})</script>', r'<script[^>]+?data-drupal-selector="drupal-settings-json"[^>]*?>({.+?})</script>',
webpage, 'drupal setting'), display_id) webpage, 'drupal setting'), display_id)
video_data = drupal_settings['turner_playlist'][0] video_data = next(v for v in drupal_settings['turner_playlist'] if v.get('url') == path)
media_id = video_data['mediaID'] media_id = video_data['mediaID']
title = video_data['title'] title = video_data['title']

View File

@ -14,20 +14,38 @@ from ..utils import (
) )
class UpskillBaseIE(InfoExtractor): class TeachableBaseIE(InfoExtractor):
_LOGIN_URL = 'http://upskillcourses.com/sign_in' _NETRC_MACHINE = 'teachable'
_NETRC_MACHINE = 'upskill' _URL_PREFIX = 'teachable:'
_SITES = {
# Only notable ones here
'upskillcourses.com': 'upskill',
'academy.gns3.com': 'gns3',
'academyhacker.com': 'academyhacker',
'stackskills.com': 'stackskills',
'market.saleshacker.com': 'saleshacker',
'learnability.org': 'learnability',
'edurila.com': 'edurila',
}
_VALID_URL_SUB_TUPLE = (_URL_PREFIX, '|'.join(re.escape(site) for site in _SITES.keys()))
def _real_initialize(self): def _real_initialize(self):
self._login() self._logged_in = False
def _login(self): def _login(self, site):
username, password = self._get_login_info() if self._logged_in:
return
username, password = self._get_login_info(
netrc_machine=self._SITES.get(site, site))
if username is None: if username is None:
return return
login_page, urlh = self._download_webpage_handle( login_page, urlh = self._download_webpage_handle(
self._LOGIN_URL, None, 'Downloading login page') 'https://%s/sign_in' % site, None,
'Downloading %s login page' % site)
login_url = compat_str(urlh.geturl()) login_url = compat_str(urlh.geturl())
@ -46,18 +64,24 @@ class UpskillBaseIE(InfoExtractor):
post_url = urljoin(login_url, post_url) post_url = urljoin(login_url, post_url)
response = self._download_webpage( response = self._download_webpage(
post_url, None, 'Logging in', post_url, None, 'Logging in to %s' % site,
data=urlencode_postdata(login_form), data=urlencode_postdata(login_form),
headers={ headers={
'Content-Type': 'application/x-www-form-urlencoded', 'Content-Type': 'application/x-www-form-urlencoded',
'Referer': login_url, 'Referer': login_url,
}) })
if '>I accept the new Privacy Policy<' in response:
raise ExtractorError(
'Unable to login: %s asks you to accept new Privacy Policy. '
'Go to https://%s/ and accept.' % (site, site), expected=True)
# Successful login # Successful login
if any(re.search(p, response) for p in ( if any(re.search(p, response) for p in (
r'class=["\']user-signout', r'class=["\']user-signout',
r'<a[^>]+\bhref=["\']/sign_out', r'<a[^>]+\bhref=["\']/sign_out',
r'>\s*Log out\s*<')): r'>\s*Log out\s*<')):
self._logged_in = True
return return
message = get_element_by_class('alert', response) message = get_element_by_class('alert', response)
@ -68,8 +92,14 @@ class UpskillBaseIE(InfoExtractor):
raise ExtractorError('Unable to log in') raise ExtractorError('Unable to log in')
class UpskillIE(UpskillBaseIE): class TeachableIE(TeachableBaseIE):
_VALID_URL = r'https?://(?:www\.)?upskillcourses\.com/courses/[^/]+/lectures/(?P<id>\d+)' _VALID_URL = r'''(?x)
(?:
%shttps?://(?P<site_t>[^/]+)|
https?://(?:www\.)?(?P<site>%s)
)
/courses/[^/]+/lectures/(?P<id>\d+)
''' % TeachableBaseIE._VALID_URL_SUB_TUPLE
_TESTS = [{ _TESTS = [{
'url': 'http://upskillcourses.com/courses/essential-web-developer-course/lectures/1747100', 'url': 'http://upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
@ -77,7 +107,7 @@ class UpskillIE(UpskillBaseIE):
'id': 'uzw6zw58or', 'id': 'uzw6zw58or',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Welcome to the Course!', 'title': 'Welcome to the Course!',
'description': 'md5:8d66c13403783370af62ca97a7357bdd', 'description': 'md5:65edb0affa582974de4625b9cdea1107',
'duration': 138.763, 'duration': 138.763,
'timestamp': 1479846621, 'timestamp': 1479846621,
'upload_date': '20161122', 'upload_date': '20161122',
@ -88,10 +118,37 @@ class UpskillIE(UpskillBaseIE):
}, { }, {
'url': 'http://upskillcourses.com/courses/119763/lectures/1747100', 'url': 'http://upskillcourses.com/courses/119763/lectures/1747100',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://academy.gns3.com/courses/423415/lectures/6885939',
'only_matching': True,
}, {
'url': 'teachable:https://upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
'only_matching': True,
}] }]
@staticmethod
def _is_teachable(webpage):
return 'teachableTracker.linker:autoLink' in webpage and re.search(
r'<link[^>]+href=["\']https?://process\.fs\.teachablecdn\.com',
webpage)
@staticmethod
def _extract_url(webpage, source_url):
if not TeachableIE._is_teachable(webpage):
return
if re.match(r'https?://[^/]+/(?:courses|p)', source_url):
return '%s%s' % (TeachableBaseIE._URL_PREFIX, source_url)
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
site = mobj.group('site') or mobj.group('site_t')
video_id = mobj.group('id')
self._login(site)
prefixed = url.startswith(self._URL_PREFIX)
if prefixed:
url = url[len(self._URL_PREFIX):]
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
@ -113,12 +170,18 @@ class UpskillIE(UpskillBaseIE):
} }
class UpskillCourseIE(UpskillBaseIE): class TeachableCourseIE(TeachableBaseIE):
_VALID_URL = r'https?://(?:www\.)?upskillcourses\.com/courses/(?:enrolled/)?(?P<id>[^/?#&]+)' _VALID_URL = r'''(?x)
(?:
%shttps?://(?P<site_t>[^/]+)|
https?://(?:www\.)?(?P<site>%s)
)
/(?:courses|p)/(?:enrolled/)?(?P<id>[^/?#&]+)
''' % TeachableBaseIE._VALID_URL_SUB_TUPLE
_TESTS = [{ _TESTS = [{
'url': 'http://upskillcourses.com/courses/essential-web-developer-course/', 'url': 'http://upskillcourses.com/courses/essential-web-developer-course/',
'info_dict': { 'info_dict': {
'id': '119763', 'id': 'essential-web-developer-course',
'title': 'The Essential Web Developer Course (Free)', 'title': 'The Essential Web Developer Course (Free)',
}, },
'playlist_count': 192, 'playlist_count': 192,
@ -128,21 +191,37 @@ class UpskillCourseIE(UpskillBaseIE):
}, { }, {
'url': 'http://upskillcourses.com/courses/enrolled/119763', 'url': 'http://upskillcourses.com/courses/enrolled/119763',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://academy.gns3.com/courses/enrolled/423415',
'only_matching': True,
}, {
'url': 'teachable:https://learn.vrdev.school/p/gear-vr-developer-mini',
'only_matching': True,
}, {
'url': 'teachable:https://filmsimplified.com/p/davinci-resolve-15-crash-course',
'only_matching': True,
}] }]
@classmethod @classmethod
def suitable(cls, url): def suitable(cls, url):
return False if UpskillIE.suitable(url) else super( return False if TeachableIE.suitable(url) else super(
UpskillCourseIE, cls).suitable(url) TeachableCourseIE, cls).suitable(url)
def _real_extract(self, url): def _real_extract(self, url):
course_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
site = mobj.group('site') or mobj.group('site_t')
course_id = mobj.group('id')
self._login(site)
prefixed = url.startswith(self._URL_PREFIX)
if prefixed:
prefix = self._URL_PREFIX
url = url[len(prefix):]
webpage = self._download_webpage(url, course_id) webpage = self._download_webpage(url, course_id)
course_id = self._search_regex( url_base = 'https://%s/' % site
r'data-course-id=["\'](\d+)', webpage, 'course id',
default=course_id)
entries = [] entries = []
@ -162,10 +241,13 @@ class UpskillCourseIE(UpskillBaseIE):
title = self._html_search_regex( title = self._html_search_regex(
r'<span[^>]+class=["\']lecture-name[^>]+>([^<]+)', li, r'<span[^>]+class=["\']lecture-name[^>]+>([^<]+)', li,
'title', default=None) 'title', default=None)
entry_url = urljoin(url_base, lecture_url)
if prefixed:
entry_url = self._URL_PREFIX + entry_url
entries.append( entries.append(
self.url_result( self.url_result(
urljoin('http://upskillcourses.com/', lecture_url), entry_url,
ie=UpskillIE.ie_key(), video_id=lecture_id, ie=TeachableIE.ie_key(), video_id=lecture_id,
video_title=clean_html(title))) video_title=clean_html(title)))
course_title = self._html_search_regex( course_title = self._html_search_regex(

View File

@ -203,10 +203,8 @@ class TEDIE(InfoExtractor):
ext_url = None ext_url = None
if service.lower() == 'youtube': if service.lower() == 'youtube':
ext_url = external.get('code') ext_url = external.get('code')
return {
'_type': 'url', return self.url_result(ext_url or external['uri'])
'url': ext_url or external['uri'],
}
resources_ = player_talk.get('resources') or talk_info.get('resources') resources_ = player_talk.get('resources') or talk_info.get('resources')
@ -267,6 +265,8 @@ class TEDIE(InfoExtractor):
'format_id': m3u8_format['format_id'].replace('hls', 'http'), 'format_id': m3u8_format['format_id'].replace('hls', 'http'),
'protocol': 'http', 'protocol': 'http',
}) })
if f.get('acodec') == 'none':
del f['acodec']
formats.append(f) formats.append(f)
audio_download = talk_info.get('audioDownload') audio_download = talk_info.get('audioDownload')

View File

@ -61,8 +61,4 @@ class TestURLIE(InfoExtractor):
self.to_screen('Test URL: %s' % tc['url']) self.to_screen('Test URL: %s' % tc['url'])
return { return self.url_result(tc['url'], video_id=video_id)
'_type': 'url',
'url': tc['url'],
'id': video_id,
}

View File

@ -343,7 +343,7 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
def _extract_feed_info(self, provider_id, feed_id, filter_query, video_id, custom_fields=None, asset_types_query={}, account_id=None): def _extract_feed_info(self, provider_id, feed_id, filter_query, video_id, custom_fields=None, asset_types_query={}, account_id=None):
real_url = self._URL_TEMPLATE % (self.http_scheme(), provider_id, feed_id, filter_query) real_url = self._URL_TEMPLATE % (self.http_scheme(), provider_id, feed_id, filter_query)
entry = self._download_json(real_url, video_id)['entries'][0] entry = self._download_json(real_url, video_id)['entries'][0]
main_smil_url = 'http://link.theplatform.com/s/%s/media/guid/%d/%s' % (provider_id, account_id, entry['guid']) if account_id else None main_smil_url = 'http://link.theplatform.com/s/%s/media/guid/%d/%s' % (provider_id, account_id, entry['guid']) if account_id else entry.get('plmedia$publicUrl')
formats = [] formats = []
subtitles = {} subtitles = {}
@ -356,7 +356,8 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
if first_video_id is None: if first_video_id is None:
first_video_id = cur_video_id first_video_id = cur_video_id
duration = float_or_none(item.get('plfile$duration')) duration = float_or_none(item.get('plfile$duration'))
for asset_type in item['plfile$assetTypes']: file_asset_types = item.get('plfile$assetTypes') or compat_parse_qs(compat_urllib_parse_urlparse(smil_url).query)['assetTypes']
for asset_type in file_asset_types:
if asset_type in asset_types: if asset_type in asset_types:
continue continue
asset_types.append(asset_type) asset_types.append(asset_type)

View File

@ -0,0 +1,117 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
compat_str,
ExtractorError,
int_or_none,
str_or_none,
try_get,
url_or_none,
)
class TikTokBaseIE(InfoExtractor):
def _extract_aweme(self, data):
video = data['video']
description = str_or_none(try_get(data, lambda x: x['desc']))
width = int_or_none(try_get(data, lambda x: video['width']))
height = int_or_none(try_get(data, lambda x: video['height']))
format_urls = set()
formats = []
for format_id in (
'play_addr_lowbr', 'play_addr', 'play_addr_h264',
'download_addr'):
for format in try_get(
video, lambda x: x[format_id]['url_list'], list) or []:
format_url = url_or_none(format)
if not format_url:
continue
if format_url in format_urls:
continue
format_urls.add(format_url)
formats.append({
'url': format_url,
'ext': 'mp4',
'height': height,
'width': width,
})
self._sort_formats(formats)
thumbnail = url_or_none(try_get(
video, lambda x: x['cover']['url_list'][0], compat_str))
uploader = try_get(data, lambda x: x['author']['nickname'], compat_str)
timestamp = int_or_none(data.get('create_time'))
comment_count = int_or_none(data.get('comment_count')) or int_or_none(
try_get(data, lambda x: x['statistics']['comment_count']))
repost_count = int_or_none(try_get(
data, lambda x: x['statistics']['share_count']))
aweme_id = data['aweme_id']
return {
'id': aweme_id,
'title': uploader or aweme_id,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'timestamp': timestamp,
'comment_count': comment_count,
'repost_count': repost_count,
'formats': formats,
}
class TikTokIE(TikTokBaseIE):
_VALID_URL = r'https?://(?:m\.)?tiktok\.com/v/(?P<id>\d+)'
_TEST = {
'url': 'https://m.tiktok.com/v/6606727368545406213.html',
'md5': 'd584b572e92fcd48888051f238022420',
'info_dict': {
'id': '6606727368545406213',
'ext': 'mp4',
'title': 'Zureeal',
'description': '#bowsette#mario#cosplay#uk#lgbt#gaming#asian#bowsettecosplay',
'thumbnail': r're:^https?://.*~noop.image',
'uploader': 'Zureeal',
'timestamp': 1538248586,
'upload_date': '20180929',
'comment_count': int,
'repost_count': int,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
data = self._parse_json(self._search_regex(
r'\bdata\s*=\s*({.+?})\s*;', webpage, 'data'), video_id)
return self._extract_aweme(data)
class TikTokUserIE(TikTokBaseIE):
_VALID_URL = r'https?://(?:m\.)?tiktok\.com/h5/share/usr/(?P<id>\d+)'
_TEST = {
'url': 'https://m.tiktok.com/h5/share/usr/188294915489964032.html',
'info_dict': {
'id': '188294915489964032',
},
'playlist_mincount': 24,
}
def _real_extract(self, url):
user_id = self._match_id(url)
data = self._download_json(
'https://m.tiktok.com/h5/share/usr/list/%s/' % user_id, user_id,
query={'_signature': '_'})
entries = []
for aweme in data['aweme_list']:
try:
entry = self._extract_aweme(aweme)
except ExtractorError:
continue
entry['extractor_key'] = TikTokIE.ie_key()
entries.append(entry)
return self.playlist_result(entries, user_id)

Some files were not shown because too many files have changed in this diff Show More