Misaka¶
Misaka is a CFFI-based binding for Hoedown, a fast markdown processing library written in C. It features a fast HTML renderer and functionality to make custom renderers (e.g. man pages or LaTeX).
See the Changelog for all changes.
Installation¶
Misaka has been tested on CPython 2.6, 2.7, 3.2, 3.3, 3.4, 3.5, 3.6 and PyPy 2.6+. CFFI 1.0 or newer is required. This means Misaka will not work on PyPy 2.5 and older versions.
If you’re installing from source on Debian or a Debian derivative
(e.g. Ubuntu), make sure build-essential, python-dev and libffi-dev
are installed.
Install with pip:
pip install misaka
Or grab the source from GitHub:
git clone https://github.com/FSX/misaka.git
cd misaka
python setup.py install
Consult the CFFI documentation if you experience problems installing CFFI.
Use the following commands to install Misaka in Termux:
apt update
apt upgrade
apt install clang python python-dev libffi libffi-dev
pip install misaka
Usage¶
Very simple example:
import misaka as m
print(m.html('some other text'))
Or:
from misaka import Markdown, HtmlRenderer
rndr = HtmlRenderer()
md = Markdown(rndr)
print(md('some text'))
Here’s a simple example that uses Pygments to highlight code (houdini is used to escape the HTML):
import houdini as h
import misaka as m
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name
from pygments.util import ClassNotFound


class HighlighterRenderer(m.HtmlRenderer):
    def blockcode(self, text, lang):
        # Look up a Pygments lexer for the language of the code block, if any.
        try:
            lexer = get_lexer_by_name(lang, stripall=True)
        except ClassNotFound:
            lexer = None

        if lexer:
            formatter = HtmlFormatter()
            return highlight(text, lexer, formatter)
        # default: no usable lexer, so emit an escaped plain code block
        return '\n<pre><code>{}</code></pre>\n'.format(
            h.escape_html(text.strip()))


renderer = HighlighterRenderer()
md = m.Markdown(renderer, extensions=('fenced-code',))

print(md("""
Here is some code:

```python
print(123)
```

More code:

    print(123)
"""))
The above code listing subclasses HtmlRenderer and overrides
the BaseRenderer.blockcode() method. See tests/test_renderer.py
for a renderer with all its methods implemented.
Tests¶
tidy is needed to run the tests. tox can be used to run the tests on all supported Python versions with one command.
Run one of the following commands to install tidy:
apt-get install tidy # Debian and derivatives
pacman -S tidyhtml # Arch Linux
And run the tests with:
python setup.py test
It’s also possible to include or exclude tests. -i and -e accept a
comma-separated list of test cases:
# Only run MarkdownConformanceTest_10
python setup.py test -i MarkdownConformanceTest_10
# Or everything except MarkdownConformanceTest_10
python setup.py test -e MarkdownConformanceTest_10
# Or everything except MarkdownConformanceTest_10 and MarkdownConformanceTest_103
python setup.py test -e MarkdownConformanceTest_10,MarkdownConformanceTest_103
-l prints a list of all testcases:
$ python setup.py test -l
[... build output ...]
MarkdownConformanceTest_10
MarkdownConformanceTest_103
BenchmarkLibraries
ArgsToIntTest
CustomRendererTest
SmartypantsTest
And -b runs benchmarks (-i and -e can also be used in
combination with -b):
$ python setup.py test -b
[... build output ...]
>> BenchmarkLibraries
test_hoep              3270    1.00 s/t    305.91 us/op
test_markdown            20    1.23 s/t     61.44 ms/op
test_markdown2           10    3.29 s/t    329.34 ms/op
test_misaka            3580    1.00 s/t    280.01 us/op
test_misaka_classes    3190    1.00 s/t    314.00 us/op
test_mistune             70    1.04 s/t     14.91 ms/op
The columns in the above output are the benchmark name, the number of repetitions, the total time (in seconds) and the time taken per operation (one repetition). A benchmark tries to stay within one second: it runs a test for a minimum of ten repetitions and keeps adding another ten while there’s time left.
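The timing strategy described above can be sketched in a few lines. This is not the code of Misaka's test runner, only a minimal illustration under stated assumptions: run_benchmark, budget and batch are hypothetical names, and the budget is shortened in the usage line so the sketch finishes quickly.

```python
import time


def run_benchmark(func, budget=1.0, batch=10):
    """Call func in batches of `batch` until `budget` seconds are used up.

    At least one batch always runs, so a slow function can overshoot
    the budget. Returns (repetitions, total_seconds, seconds_per_op).
    """
    repetitions = 0
    total = 0.0
    while True:
        start = time.perf_counter()
        for _ in range(batch):
            func()
        total += time.perf_counter() - start
        repetitions += batch
        if total >= budget:
            break
    return repetitions, total, total / repetitions


# Hypothetical workload; a real benchmark would render a Markdown document.
reps, total, per_op = run_benchmark(lambda: sum(range(1000)), budget=0.05)
print('{:<12} {:>6} {:.2f} s/t {:.2f} us/op'.format(
    'sum_range', reps, total, per_op * 1e6))
```

Running whole batches explains why slow libraries in the listing report a total above one second: the first ten repetitions are never cut short.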
API¶
Extensions¶
| Name | Constant |
|---|---|
| tables | EXT_TABLES |
| fenced-code | EXT_FENCED_CODE |
| footnotes | EXT_FOOTNOTES |
| autolink | EXT_AUTOLINK |
| strikethrough | EXT_STRIKETHROUGH |
| underline | EXT_UNDERLINE |
| highlight | EXT_HIGHLIGHT |
| quote | EXT_QUOTE |
| superscript | EXT_SUPERSCRIPT |
| math | EXT_MATH |
| no-intra-emphasis | EXT_NO_INTRA_EMPHASIS |
| space-headers | EXT_SPACE_HEADERS |
| math-explicit | EXT_MATH_EXPLICIT |
| disable-indented-code | EXT_DISABLE_INDENTED_CODE |
HTML render flags¶
| Name | Constant |
|---|---|
| skip-html | HTML_SKIP_HTML |
| escape | HTML_ESCAPE |
| hard-wrap | HTML_HARD_WRAP |
| use-xhtml | HTML_USE_XHTML |
Functions¶
- misaka.html(text, extensions=0, render_flags=0)¶
Convert markdown text to HTML.
extensions can be a list or tuple of extensions (e.g. ('fenced-code', 'footnotes', 'strikethrough')) or an integer (e.g. EXT_FENCED_CODE | EXT_FOOTNOTES | EXT_STRIKETHROUGH).
render_flags can be a list or tuple of flags (e.g. ('skip-html', 'hard-wrap')) or an integer (e.g. HTML_SKIP_HTML | HTML_HARD_WRAP).
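The equivalence between the two argument forms can be illustrated without Misaka itself. The sketch below is an assumption about the general idea, not Misaka's implementation: Ext and to_flags are hypothetical names, and the bit values assigned by enum.auto() are placeholders that do not match the real EXT_* constants.

```python
import enum
from functools import reduce
from operator import or_


class Ext(enum.IntFlag):
    # Hypothetical stand-ins for a few EXT_* constants; real values differ.
    TABLES = enum.auto()
    FENCED_CODE = enum.auto()
    FOOTNOTES = enum.auto()
    STRIKETHROUGH = enum.auto()


def to_flags(value):
    """Accept an integer bitmask or an iterable of names like 'fenced-code'."""
    if isinstance(value, int):
        return value
    # 'fenced-code' -> 'FENCED_CODE' -> Ext.FENCED_CODE
    members = (Ext[name.upper().replace('-', '_')] for name in value)
    return int(reduce(or_, members, Ext(0)))


by_name = to_flags(('fenced-code', 'footnotes', 'strikethrough'))
by_mask = to_flags(Ext.FENCED_CODE | Ext.FOOTNOTES | Ext.STRIKETHROUGH)
print(by_name == by_mask)  # prints True
```

Names are convenient in configuration files, while the integer form avoids the lookup when the same combination is reused many times.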
- misaka.smartypants(text)¶
Transforms sequences of characters into HTML entities.
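| Markdown | HTML | Result |
|---|---|---|
| 's (s, t, m, d, re, ll, ve) | &rsquo;s | ’s |
| "Quotes" | &ldquo;Quotes&rdquo; | “Quotes” |
| --- | &mdash; | — |
| -- | &ndash; | – |
| ... | &hellip; | … |
| . . . | &hellip; | … |
| (c) | &copy; | © |
| (r) | &reg; | ® |
| (tm) | &trade; | ™ |
| 3/4 | &frac34; | ¾ |
| 1/2 | &frac12; | ½ |
| 1/4 | &frac14; | ¼ |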
- misaka.escape_html(text, escape_slash=False)¶
Binding for Hoedown’s HTML escaping function.
The implementation is inspired by the OWASP XSS Prevention recommendations:
& --> &amp;
< --> &lt;
> --> &gt;
" --> &quot;
' --> &#x27;
/ --> &#x2F; when escape_slash is set to True
Added in version 2.1.0.
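For comparison, similar escaping can be sketched with Python's standard library. This is an approximation, not the Hoedown binding: escape_html_sketch is a hypothetical name, and the exact numeric character references that Hoedown emits for the apostrophe and the slash may differ from the hexadecimal forms used here.

```python
import html


def escape_html_sketch(text, escape_slash=False):
    """Escape &, <, >, double and single quotes, and optionally the slash."""
    # html.escape handles &, <, > and, with quote=True, both quote characters.
    escaped = html.escape(text, quote=True)
    if escape_slash:
        # Assumption: the slash becomes a hexadecimal character reference.
        escaped = escaped.replace('/', '&#x2F;')
    return escaped


print(escape_html_sketch('<a href="/x">Tom & Jerry</a>'))
# prints: &lt;a href=&quot;/x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
print(escape_html_sketch('</script>', escape_slash=True))
# prints: &lt;&#x2F;script&gt;
```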
Classes¶
- class misaka.Markdown(renderer, extensions=0)¶
Parses markdown text and renders it using the given renderer.
extensions can be a list or tuple of extensions (e.g. ('fenced-code', 'footnotes', 'strikethrough')) or an integer (e.g. EXT_FENCED_CODE | EXT_FOOTNOTES | EXT_STRIKETHROUGH).
- class misaka.HtmlRenderer(flags=0, nesting_level=0)¶
A wrapper for the HTML renderer that’s included in Hoedown.
flags can be a list or tuple of render flags (e.g. ('skip-html', 'hard-wrap')) or an integer (e.g. HTML_SKIP_HTML | HTML_HARD_WRAP).
nesting_level limits what’s included in the table of contents. The default value is 0, no headers.
An instance of HtmlRenderer cannot be shared between multiple Markdown instances, because it carries state that’s changed by the Markdown instance.
- class misaka.SaferHtmlRenderer(flags=(), sanitization_mode='skip-html', nesting_level=0, link_rewrite=None, img_src_rewrite=None)¶
A subclass of HtmlRenderer which adds protections against Cross-Site Scripting (XSS):
1. The 'skip-html' flag is turned on by default, preventing injection of HTML elements. If you want to escape HTML code instead of removing it entirely, change sanitization_mode to 'escape'.
2. The URLs of links and images are filtered to prevent JavaScript injection. This also blocks the rendering of email addresses into links. See the check_url() method below.
3. Optionally, the URLs can also be rewritten to counter other attacks such as phishing.
Enabling URL rewriting requires extra arguments:
- Parameters:
link_rewrite – the URL of a redirect page, necessary to rewrite the href attributes of links
img_src_rewrite – the URL of an image proxy, necessary to rewrite the src attributes of images
Both strings should include a {url} placeholder for the URL-encoded target. Examples:
link_rewrite='https://example.com/redirect?url={url}', img_src_rewrite='https://img-proxy-domain/{url}'
Added in version 2.1.0.
- autolink(raw_url, is_email)¶
Filters links generated by the autolink extension.
- check_url(url, is_image_src=False)¶
This method is used to check a URL.
Returns True if the URL is “safe”, False otherwise.
The default implementation only allows HTTP and HTTPS links. That means no mailto:, no xmpp:, no ftp:, etc.
This method exists specifically to allow easy customization of link filtering through subclassing, so don’t hesitate to write your own.
If you’re thinking of implementing a blacklist approach, see “Which URL schemes are dangerous (XSS exploitable)?”.
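An allow-list check of the kind a custom check_url() might perform can be sketched with the standard library. The names is_allowed, LINK_SCHEMES and IMAGE_SCHEMES are hypothetical, and the choice to permit mailto: for links is only an example policy, not a recommendation from Misaka.

```python
from urllib.parse import urlsplit

# Hypothetical allow-lists: links may also use mailto:, images may not.
LINK_SCHEMES = {'http', 'https', 'mailto'}
IMAGE_SCHEMES = {'http', 'https'}


def is_allowed(url, is_image_src=False):
    """Return True only if the URL has an explicitly allowed scheme."""
    scheme = urlsplit(url.strip()).scheme.lower()
    allowed = IMAGE_SCHEMES if is_image_src else LINK_SCHEMES
    return scheme in allowed


# In a SaferHtmlRenderer subclass this could back an override such as:
#     def check_url(self, url, is_image_src=False):
#         return is_allowed(url, is_image_src)

print(is_allowed('https://example.com/'))   # prints True
print(is_allowed('javascript:alert(1)'))    # prints False
```

Note that this sketch also rejects relative URLs, because they have no scheme; whether that is desirable depends on the application.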
- image(raw_url, title='', alt='')¶
Filters the src attribute of an image.
Note that filtering the source URL of an <img> tag is only a very basic protection, and it’s mostly useless in modern browsers (they block JavaScript in there by default). An example of an attack that filtering does not thwart is phishing based on HTTP Auth; see this issue for details.
To mitigate this issue you should only allow images from trusted services, for example your own image store, or a proxy (see rewrite_url()).
- link(content, raw_url, title='')¶
Filters links.
- rewrite_url(url, is_image_src=False)¶
This method is called to rewrite URLs.
It uses either self.link_rewrite or self.img_src_rewrite depending on the value of is_image_src. The URL is returned unchanged if the corresponding attribute is None.
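The rewriting behaviour described above can be sketched as a standalone function. This is an illustration, not Misaka's source: rewrite_url_sketch is a hypothetical name, the templates are passed as arguments instead of being read from self, and encoding every reserved character (safe='') is an assumption about what "URL-encoded" means here.

```python
from urllib.parse import quote


def rewrite_url_sketch(url, link_rewrite=None, img_src_rewrite=None,
                       is_image_src=False):
    """Fill the {url} placeholder of the matching template, if there is one."""
    template = img_src_rewrite if is_image_src else link_rewrite
    if template is None:
        # No template configured for this kind of URL: leave it unchanged.
        return url
    return template.format(url=quote(url, safe=''))


print(rewrite_url_sketch(
    'https://a.example/x?y=z',
    link_rewrite='https://example.com/redirect?url={url}'))
# prints: https://example.com/redirect?url=https%3A%2F%2Fa.example%2Fx%3Fy%3Dz
```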
- class misaka.HtmlTocRenderer(nesting_level=6)¶
A wrapper for the HTML table of contents renderer that’s included in Hoedown.
nesting_level limits what’s included in the table of contents. The default value is 6, all headers.
An instance of HtmlTocRenderer cannot be shared between multiple Markdown instances, because it carries state that’s changed by the Markdown instance.
- class misaka.BaseRenderer¶
- blockcode(text, lang='')¶
lang contains the language when fenced code blocks are enabled and a language is defined in the code block.
- blockquote(content)¶
- header(content, level)¶
level can be a number from 1 to 6.
- hrule()¶
- list(content, is_ordered, is_block)¶
- listitem(content, is_ordered, is_block)¶
- paragraph(content)¶
- table(content)¶
Depends on the tables extension.
- table_header(content)¶
Depends on the tables extension.
- table_body(content)¶
Depends on the tables extension.
- table_row(content)¶
Depends on the tables extension.
- table_cell(content, align, is_header)¶
Depends on the tables extension.
align can be empty, center, left or right.
- footnotes(content)¶
Depends on the footnotes extension.
- footnote_def(content, num)¶
Depends on the footnotes extension.
- footnote_ref(num)¶
Depends on the footnotes extension.
- blockhtml(text)¶
- autolink(link, is_email)¶
Depends on the autolink extension.
- codespan(text)¶
- double_emphasis(content)¶
- emphasis(content)¶
- underline(content)¶
Depends on the underline extension.
- highlight(content)¶
Depends on the highlight extension.
- quote(content)¶
Depends on the quote extension.
- image(link, title='', alt='')¶
- linebreak()¶
- link(content, link, title='')¶
- triple_emphasis(content)¶
- strikethrough(content)¶
Depends on the strikethrough extension.
- superscript(content)¶
Depends on the superscript extension.
- math(text, displaymode)¶
Depends on the math extension.
displaymode can be 0 or 1. This is how HtmlRenderer handles it:
if displaymode == 1:
    return '\\[{}\\]'.format(text)
else:  # displaymode == 0
    return '\\({}\\)'.format(text)
- raw_html(text)¶
- entity(text)¶
- normal_text(text)¶
- doc_header(inline_render)¶