Anisse Astier
ec0fafbb19
[extractor/common] fallback on utf-8 when charset is not found
...
fixes #2721
2014-04-07 23:10:16 +02:00
Philipp Hagemeister
b6cfde99b7
Only mention websense URL once
2014-04-03 08:12:53 +02:00
Philipp Hagemeister
2410c43d83
Detect Websense censorship ( Fixes #2670 )
2014-04-03 06:09:38 +02:00
Philipp Hagemeister
38d63d846e
[extractor/common] Clarify preference key in formats
2014-03-23 17:41:43 +01:00
Philipp Hagemeister
955c451456
Rename upload_timestamp to timestamp
2014-03-13 18:45:14 +01:00
Philipp Hagemeister
9d2ecdbc71
[vevo] Centralize timestamp handling
2014-03-13 15:30:25 +01:00
Philipp Hagemeister
5a25f39653
Correct extractor documentation
2014-03-10 13:09:55 +01:00
Philipp Hagemeister
9f62eaf4ef
[canal13cl] Add test and improve extraction ( #2498 )
2014-03-03 12:53:11 +01:00
Philipp Hagemeister
0afef30b23
Add display_id field
2014-03-03 12:06:28 +01:00
Philipp Hagemeister
81c2f20b53
[youtube] Correct invalid JSON ( Fixes #2353 )
2014-02-09 17:56:10 +01:00
dst
c1206423c4
Fix extraction of og content in single quotes
2014-01-31 03:57:33 +07:00
Jaime Marquínez Ferrándiz
0c708f11cb
[bloomberg] Fix ooyala url extraction
...
Added a helper method to InfoExtractor for searching the ‘twitter:player’ meta property.
Now the OoyalaIE also recognizes the ‘ec’ parameter in the url as the embed code.
2014-01-29 18:03:32 +01:00
Philipp Hagemeister
7e8caf30c0
Throw an error if no video formats are found
2014-01-27 07:31:54 +01:00
Philipp Hagemeister
db1f388878
[huffpost] Add support
2014-01-27 05:47:38 +01:00
Jaime Marquínez Ferrándiz
944d65c762
[extractor/common] Encode the url when calculating the md5 with --write-pages option
...
This doesn’t cause any problem in python 2.*, but on python 3 the `md5` function only accepts bytes.
2014-01-25 15:32:56 +01:00
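A minimal sketch of the point made in the commit above: on Python 3, `hashlib.md5` accepts only bytes, so a text URL has to be encoded first. The helper name `url_md5` and the choice of UTF-8 are assumptions for illustration, not taken from the commit.

```python
import hashlib


def url_md5(url):
    """Return the hex MD5 digest of a text URL (hypothetical helper)."""
    # hashlib.md5 requires bytes on Python 3; encode the text first.
    # UTF-8 is assumed here.
    return hashlib.md5(url.encode('utf-8')).hexdigest()
```

Passing the unencoded string to `hashlib.md5` on Python 3 raises `TypeError`, which is the failure the commit message describes.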
Philipp Hagemeister
1394ce65b4
[youtube] Add new formats ( Fixes #2221 )
2014-01-23 23:54:06 +01:00
Philipp Hagemeister
50317b111d
Merge branch 'youtube-dash-manifest'
...
Conflicts:
youtube_dl/extractor/youtube.py
2014-01-22 19:58:31 +01:00
Philipp Hagemeister
9d4288b2d4
[extractor/common] Clarify when we do and do not generate the filename
2014-01-21 01:41:13 +01:00
Philipp Hagemeister
b60016e831
Deal with implicitly UTF-16 decoded webpages
...
These webpages don't specify an encoding and rely on the BOM
2014-01-21 01:39:40 +01:00
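A rough sketch of BOM-based detection as described in the commit above, for pages that declare no encoding. The function name `guess_encoding_from_bom` and the UTF-8 default are assumptions for illustration; the commit does not show its implementation.

```python
import codecs


def guess_encoding_from_bom(content, default='utf-8'):
    """Guess the encoding of raw page bytes from a UTF-16 byte order mark."""
    # A UTF-16 page without a declared charset starts with one of these BOMs.
    if content.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The generic 'utf-16' codec reads the BOM to pick the byte order.
        return 'utf-16'
    return default
```

Decoding with the generic `utf-16` codec also consumes the BOM, so it does not appear in the decoded text.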
Philipp Hagemeister
dd27fd1739
[youtube] Download DASH manifest
...
If given, download and parse the DASH manifest file, in order to get ultra-HQ formats.
Fixes #2166
2014-01-19 05:47:20 +01:00
Philipp Hagemeister
3ec05685f7
[extractor/common] Limit --write-pages filename to 200 chars
...
This avoids problems with very long URLs.
2014-01-17 14:47:47 +01:00
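One possible way to cap a dump filename at 200 characters, as the commit above describes. Replacing the tail with a hash suffix is an assumption for illustration (it keeps names from different long URLs distinct); the name `shorten_filename` is hypothetical.

```python
import hashlib

MAX_LEN = 200  # limit named in the commit subject


def shorten_filename(filename, max_len=MAX_LEN):
    """Return filename unchanged if short enough, else truncate it."""
    if len(filename) <= max_len:
        return filename
    # Assumption: append a digest of the full name so truncated names
    # derived from different URLs do not collide.
    suffix = '___' + hashlib.md5(filename.encode('utf-8')).hexdigest()
    return filename[:max_len - len(suffix)] + suffix
```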
Philipp Hagemeister
9933b57430
[pornhub] Use centralized sorting
2014-01-07 10:25:34 +01:00
Philipp Hagemeister
3d3538e422
[khanacademy] Add support ( Fixes #2066 )
2014-01-07 09:35:34 +01:00
Philipp Hagemeister
5d73273f6f
[orf] Use new extraction method ( Fixes #2057 )
2014-01-06 17:15:27 +01:00
Philipp Hagemeister
9887c9b2d6
[jpopsuki] Simplify
2014-01-03 12:51:37 +01:00
Philipp Hagemeister
08d13955dd
[wistia] Prefer original video format above all others
...
We could also set up a formula which would weigh filesize/bitrate and vcodec/acodec (say, 1GB h264 < 3 GB MPEG2 < 2 GB h264), but that would get really messy real soon.
2014-01-01 20:23:49 +01:00
Philipp Hagemeister
5d4f3985be
Document that format_id field should be present
2013-12-26 21:19:00 +01:00
Philipp Hagemeister
7217e148fb
[yahoo] Use centralized sorting, and add tbr field
2013-12-25 15:18:40 +01:00
Philipp Hagemeister
c7deaa4c74
[zdf] Use centralized sorting
2013-12-24 23:32:04 +01:00
Philipp Hagemeister
e6812ac99d
[spiegel] Use centralized sorting
2013-12-24 12:40:23 +01:00
Philipp Hagemeister
4bcc7bd1f2
Add temporary _sort_formats helper function
2013-12-24 12:31:42 +01:00
Philipp Hagemeister
f49d89ee04
Add a resolution field and improve general --list-formats output
2013-12-24 11:56:02 +01:00
Philipp Hagemeister
f45f96f8f8
[myvideo] Use RTMP instead of RTMPT ( Fixes #2032 )
2013-12-23 15:57:43 +01:00
Philipp Hagemeister
1538eff6d8
[bliptv] Remove support for direct downloads
...
This is now handled by the generic IE
2013-12-23 15:49:21 +01:00
Philipp Hagemeister
aa94a6d315
[aparat] Add support ( Fixes #2012 )
2013-12-20 17:05:39 +01:00
Jaime Marquínez Ferrándiz
c0d0b01f0e
[generic] Detect ooyala videos ( fixes #2013 )
2013-12-19 20:32:12 +01:00
Philipp Hagemeister
46374a56b2
[youtube] Do not warn for videos with allow_rating=0
...
This fixes #1982
Test video: http://www.youtube.com/watch?v=gi2uH3YxohU
2013-12-17 02:49:56 +01:00
Itay Brandes
87a28127d2
_search_regex's "isatty" call fails with Py2exe's
...
_search_regex calls the sys.stderr.isatty() function for unix systems.
Py2exe uses a custom Stderr() stream which doesn't have an `isatty()`
function, leading to its crash.
This is easily fixed by checking that it's a unix system first.
2013-12-16 21:50:26 +01:00
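A sketch of the ordering fix described in the commit above: test the platform before touching `isatty()`, so a stream without that method is never queried on Windows. The function name `supports_color` is hypothetical, and the extra `getattr` guard goes beyond what the commit states.

```python
import os
import sys


def supports_color(stream=None):
    """Return True only on non-Windows systems with a TTY stream."""
    stream = sys.stderr if stream is None else stream
    # Check the platform first; on Windows we never call isatty().
    if os.name == 'nt':
        return False
    # Extra guard (an assumption, not in the commit): tolerate streams
    # that lack isatty() altogether.
    isatty = getattr(stream, 'isatty', None)
    return bool(isatty and isatty())
```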
Philipp Hagemeister
d67b0b1596
Reorder info_dict documentation
2013-12-16 14:13:40 +01:00
Philipp Hagemeister
c0ba0f4859
Document duration field
2013-12-16 04:09:43 +01:00
Philipp Hagemeister
e2b38da931
[mtv] Fixup incorrectly encoded XML documents
2013-12-10 12:45:22 +01:00
Philipp Hagemeister
7cc3570e53
Add fatal=False parameter to _download_* functions.
...
This allows us to simplify the calls in the youtube extractor even further.
2013-12-09 01:49:03 +01:00
Philipp Hagemeister
19e3dfc9f8
[9gag] Like/dislike count ( #1895 )
2013-12-05 18:29:07 +01:00
Philipp Hagemeister
aaebed13a8
[smotri] Simplify
2013-12-02 17:08:17 +01:00
Philipp Hagemeister
2a275ab007
[zdf] Use _download_xml
2013-11-28 05:47:50 +01:00
Philipp Hagemeister
79d09f47c2
Merge branch 'opener-to-ydl'
2013-11-25 03:30:37 +01:00
Philipp Hagemeister
c059bdd432
Remove quality_name field and improve zdf extractor
2013-11-25 03:28:55 +01:00
Philipp Hagemeister
02dbf93f0e
[zdf/common] Use API in ZDF extractor.
...
This also comes with a lot of extra format fields
Fixes #1518
2013-11-25 03:13:22 +01:00
Philipp Hagemeister
e03db0a077
Merge branch 'master' into opener-to-ydl
2013-11-24 15:18:44 +01:00
Jaime Marquínez Ferrándiz
267ed0c5d3
[collegehumor] Encode the xml before calling xml.etree.ElementTree.fromstring ( fixes #1822 )
...
Uses a new helper method in InfoExtractor: _download_xml
2013-11-24 14:59:19 +01:00
Philipp Hagemeister
7012b23c94
Match --download-archive during playlist processing ( Fixes #1745 )
2013-11-22 22:46:46 +01:00
Philipp Hagemeister
dca0872056
Move the opener to the YoutubeDL object.
...
This is the first step towards being able to just import youtube_dl and start using it.
Apart from removing global state, this would fix problems like #1805.
2013-11-22 19:57:52 +01:00
Philipp Hagemeister
5904088811
Add support for tou.tv ( Fixes #1792 )
2013-11-20 06:13:19 +01:00
Philipp Hagemeister
91c7271aab
Add automatic generation of format note based on bitrate and codecs
2013-11-16 01:08:43 +01:00
Jaime Marquínez Ferrándiz
78fb87b283
Don't accept '>' inside the content attribute in OpenGraph regexes
2013-11-15 12:54:13 +01:00
Jaime Marquínez Ferrándiz
ab2d524780
Improve the OpenGraph regex
...
* Do not accept '>' between the property and content attributes.
* Recognize the properties if the content attribute is before the property attribute using two regexes (fixes the extraction of the description for SlideshareIE).
2013-11-15 12:24:54 +01:00
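The two-regex approach from the commit above could look roughly like this: one pattern for `property` before `content`, one for the reverse order, neither allowing `>` in between. The function names `og_regexes` and `og_search` are hypothetical, and this is a sketch rather than the extractor's code.

```python
import re


def og_regexes(prop):
    """Build patterns for both attribute orders of an og: meta tag."""
    # [^>] keeps a match from running past the end of the tag.
    content_re = r'content=(?:"([^>]+?)"|\'([^>]+?)\')'
    property_re = r'property=[\'"]og:%s[\'"]' % re.escape(prop)
    template = r'<meta[^>]+?%s[^>]+?%s'
    return [
        template % (property_re, content_re),  # property first
        template % (content_re, property_re),  # content first
    ]


def og_search(prop, html):
    """Return the first matching og:<prop> content value, or None."""
    for pattern in og_regexes(prop):
        match = re.search(pattern, html, re.DOTALL)
        if match:
            # Exactly one of the two quote-style groups is set.
            return next(g for g in match.groups() if g is not None)
    return None
```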
Philipp Hagemeister
eb0a839866
[common] Simplify og_search_property
2013-11-12 10:36:23 +01:00
Marcin Cieślak
a8eeb0597b
Fix AssertionError when og property not found
...
On tvp.pl some webpages contain OpenGraph
metadata and some don't.
If og property is not found, _og_search_description
fails with
WARNING: unable to extract OpenGraph description; please report this issue on http://yt-dl.org/bug
Traceback (most recent call last):
  File "/usr/home/saper/bin/youtube-dl", line 18, in <module>
    youtube_dl.main()
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/__init__.py", line 766, in main
    _real_main(argv)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/__init__.py", line 719, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/YoutubeDL.py", line 715, in download
    videos = self.extract_info(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/YoutubeDL.py", line 348, in extract_info
    ie_result = ie.extract(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 125, in extract
    return self._real_extract(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/tvp.py", line 56, in _real_extract
    info['description'] = self._og_search_description(webpage)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 331, in _og_search_description
    return self._og_search_property('description', html, fatal=False, **kargs)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 325, in _og_search_property
    return unescapeHTML(escaped)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/utils.py", line 494, in unescapeHTML
    assert type(s) == type(u'')
AssertionError
The patch allows me to use:
    try:
        info['description'] = self._og_search_description(webpage)
        info['thumbnail'] = self._og_search_thumbnail(webpage)
    except RegexNotFoundError:
        pass
2013-11-05 23:19:29 +01:00
Jaime Marquínez Ferrándiz
9103bbc5cd
Add the 'webpage_url' field to info_dict
...
The URL of the video page; it must allow the result to be reproduced.
It's automatically set by YoutubeDL if it's missing.
2013-11-03 12:11:13 +01:00
Philipp Hagemeister
b5d0d817bc
Remove superfluous space
2013-10-30 01:09:44 +01:00
Philipp Hagemeister
ebc14f251c
Merge remote-tracking branch 'origin/master'
2013-10-28 10:44:13 +01:00
Philipp Hagemeister
d41e6efc85
New debug option --write-pages
2013-10-28 10:44:02 +01:00
Filippo Valsorda
8ffa13e03e
[Instagram] get the non-https link, as they are serving Akamai cert from an instagram.com domain
2013-10-28 02:34:29 -04:00
Jaime Marquínez Ferrándiz
55b3e45bba
[vimeo] Fix pro videos and player.vimeo.com urls
...
The old process can still be used for those videos.
Added RegexNotFoundError, which is raised by _search_regex if it can't extract the info.
2013-10-23 14:38:03 +02:00
Jaime Marquínez Ferrándiz
8c51aa6506
The 'format' field now defaults to '{format_id} - {width}x{height}{format_note}'
...
Following the YoutubeIE format. The 'format_note' gives additional info about the format, for example '3D' or 'DASH video'.
2013-10-21 14:42:06 +02:00
Philipp Hagemeister
416a5efce7
fix typos
2013-10-18 00:49:45 +02:00
Philipp Hagemeister
8dbe9899a9
Allow users to specify an age limit ( fixes #1545 )
...
With these changes, users can now restrict what videos are downloaded by the intended audience, by specifying their age with --age-limit YEARS.
Add rudimentary support in youtube, pornotube, and youporn.
2013-10-06 06:08:56 +02:00
Philipp Hagemeister
2f5865cc6d
Clarify that url and ext are optional when formats is given ( #980 )
2013-10-04 11:09:43 +02:00
Philipp Hagemeister
deefc05b88
Document formats (for #980 )
2013-10-04 10:40:42 +02:00
Jaime Marquínez Ferrándiz
0d75ae2ce3
Fix detection of the webpage charset if it's declared using ' instead of "
...
Like in "<meta charset='utf-8'/>"
2013-08-29 11:35:15 +02:00
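A small sketch of charset detection that accepts either quote style, as the commit above requires. The function name `charset_from_meta` and the exact pattern are assumptions for illustration, not the regex used in the commit.

```python
import re


def charset_from_meta(head_bytes):
    """Return the charset named in a <meta> tag, or None if absent."""
    # ['"]? accepts a double quote, a single quote, or no quote at all
    # before the charset name.
    match = re.search(br'<meta[^>]+charset=[\'"]?([\w-]+)', head_bytes)
    return match.group(1).decode('ascii') if match else None
```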
Philipp Hagemeister
f143d86ad2
[sohu] Handle encoding, and fix tests
2013-08-28 14:00:05 +02:00
Philipp Hagemeister
6d69d03bac
Merge remote-tracking branch 'origin/reuse_ies'
2013-08-28 13:05:21 +02:00
Philipp Hagemeister
2eabb80254
[addanime] improve
2013-08-28 04:25:38 +02:00
Jaime Marquínez Ferrándiz
9e9c164052
Merge pull request #937 from jaimeMF/subtitles_rework
...
Subtitles rework
2013-08-23 02:40:25 -07:00
Philipp Hagemeister
79cb25776f
Cache suitable regular expressions
...
This speeds up TestAllURLsMatching.test_no_duplicates by about 8000% at the cost of minimal memory overhead.
2013-08-21 04:06:48 +02:00
Jaime Marquínez Ferrándiz
5d51a883c2
Use a dictionary for storing the subtitles
...
The errors while getting the subtitles are reported as warnings; if no subtitles are found, return an empty dict.
2013-07-20 12:52:25 +02:00
Philipp Hagemeister
f38de77f6e
Use unescapeHTML for OpenGraph properties
...
These are attribute values, so we don't need the more complex and whitespace-destroying cleanHTML - we just need to unescape quotes, that's it.
2013-07-17 10:38:23 +02:00
Philipp Hagemeister
b9d3e1635f
Strip hash info from URL when making requests ( Fixes #1038 )
2013-07-13 22:52:12 +02:00
Philipp Hagemeister
3c4e6d8337
Improve OpenGraph property matching
2013-07-13 20:39:47 +02:00
Jaime Marquínez Ferrándiz
44dbe89035
Use re.DOTALL by default when searching OpenGraph properties
2013-07-13 11:29:08 +02:00
Jaime Marquínez Ferrándiz
46720279c2
InfoExtractor: add some helper methods to extract OpenGraph info
2013-07-12 22:12:04 +02:00
Philipp Hagemeister
690e872c51
Remove video_result helper method
...
Calling it was more complex than actually including the type in the video info
2013-07-11 12:12:30 +02:00
Jaime Marquínez Ferrándiz
56c7366547
YoutubeIE: reuse instances of InfoExtractors ( closes #998 )
...
When an IE is added to the list, it's also added to a dictionary. When an IE is requested, it first looks in the dictionary and, if there's no instance, it will create a new one.
That way _real_initialize is only called once for each IE, saving time if it needs to login for example.
2013-07-08 15:14:27 +02:00
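The instance reuse described in the commit above can be sketched as a dictionary lookup that creates an extractor only on first request. `Registry` and `DummyIE` are hypothetical stand-ins, not classes from the project.

```python
class DummyIE(object):
    """Stand-in extractor that counts how often it really initializes."""
    init_calls = 0

    def __init__(self):
        self._ready = False

    def initialize(self):
        # Expensive setup (for example a login) runs once per instance.
        if not self._ready:
            DummyIE.init_calls += 1
            self._ready = True


class Registry(object):
    """Keeps one extractor instance per key."""

    def __init__(self):
        self._instances = {}

    def get(self, key, ie_class):
        ie = self._instances.get(key)
        if ie is None:
            # First request: create the instance and remember it.
            ie = ie_class()
            self._instances[key] = ie
        return ie
```

Because every request for the same key returns the same object, repeated `initialize()` calls do the setup work only once.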
Philipp Hagemeister
d93e4dcbb7
Merge branch 'master' of github.com:rg3/youtube-dl
2013-07-08 01:15:19 +02:00
Philipp Hagemeister
73e79f2a1b
[3sat] Add support ( Fixes #1001 )
2013-07-08 01:13:55 +02:00
Jaime Marquínez Ferrándiz
fc79158de2
VimeoIE: authentication support ( closes #885 ) and add a method in the base InfoExtractor to get the login info
2013-07-07 23:24:34 +02:00
Philipp Hagemeister
0f81866329
Add --list-extractor-descriptions (human-readable list of IEs)
2013-07-01 18:52:19 +02:00
Philipp Hagemeister
f3d294617f
Document view_count ( Closes #963 )
2013-06-29 16:32:28 +02:00
Filippo Valsorda
98bcd2834a
improve generic and encrypted signature error messages
2013-06-25 16:47:16 +02:00
Philipp Hagemeister
3c25b9abae
Remove useless headers
2013-06-23 20:35:50 +02:00
Philipp Hagemeister
d6983cb460
Fix generic class move (add all files)
2013-06-23 19:57:38 +02:00