PyXWF.Parsers – Parser baseclass

class PyXWF.Parsers.ParserBase(site, parser_mimetypes=[], **kwargs)

Base class for parser implementations. Because parsers are sitletons, they have to use the sitleton metaclass (Registry.SitletonMeta). You will usually also want to derive from TweakSitleton to support site-wide configuration of your parser:

# order of inheritance matters!
class RestParser(Tweaks.TweakSitleton, ParserBase):
    __metaclass__ = Registry.SitletonMeta

    def __init__(self, site):
        super(RestParser, self).__init__(site,
            # pass keyword arguments to TweakSitleton here
        )

    def parse(self, fileref):
        # fancy parsing happens here
        return Document.Document()

Check out the TweakSitleton documentation for an example of arguments and their effects.

Parsers have to implement the parse() method.

parse(fileref, header_offset=1)

Take a file name or file-like object in fileref and parse it. Return a Document instance with all relevant data filled in.
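The "file name or file-like object" contract can be handled in one place. The following is a minimal sketch, not part of PyXWF: open_fileref is a hypothetical helper name, and treating any object with a read() method as file-like is an assumption.

```python
import io
from contextlib import contextmanager


@contextmanager
def open_fileref(fileref):
    """Yield a readable binary file object for a file name or file-like object.

    Hypothetical helper for illustration only; not part of PyXWF.
    """
    if hasattr(fileref, "read"):
        # Already file-like: hand it through and leave closing to the caller.
        yield fileref
    else:
        # Assume a file name: open it here and close it when the block exits.
        with open(fileref, "rb") as opened:
            yield opened


# Usage with an in-memory file-like object:
with open_fileref(io.BytesIO(b"<page />")) as stream:
    data = stream.read()
```

A parse() implementation could wrap its input this way before handing the stream to its actual parsing code, so that callers may pass either form.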

classmethod transform_headers(body, header_offset)

header_offset must be a non-negative integer. That number of header levels will be added to any <h:hN /> elements encountered in the body element tree. A header_offset of 1 will thus convert all <h:h1 /> to <h:h2 />, all <h:h2 /> to <h:h3 /> and so on.

If the conversion would result in a <h:h7 /> or above, the tag is converted into a <h:p /> tag.


This operation is in-place and returns None.
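The header shifting described above can be sketched as follows. This is an illustration of the documented behaviour, not PyXWF's actual implementation: the function name shift_headers is hypothetical, the h: prefix is assumed to be bound to the XHTML namespace, and the standard library's xml.etree.ElementTree stands in for whatever element tree library PyXWF uses.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the h: prefix refers to the XHTML namespace.
XHTML_NS = "http://www.w3.org/1999/xhtml"
HEADER_RE = re.compile(r"^\{%s\}h([1-6])$" % re.escape(XHTML_NS))


def shift_headers(body, header_offset):
    """Add header_offset levels to every hN element below body, in place."""
    if header_offset < 0:
        raise ValueError("header_offset must be a non-negative integer")
    for element in body.iter():
        # Comments and processing instructions have non-string tags.
        if not isinstance(element.tag, str):
            continue
        match = HEADER_RE.match(element.tag)
        if match is None:
            continue
        level = int(match.group(1)) + header_offset
        if level > 6:
            # There is no h7 or above, so fall back to a paragraph.
            element.tag = "{%s}p" % XHTML_NS
        else:
            element.tag = "{%s}h%d" % (XHTML_NS, level)
    return None


body = ET.fromstring(
    '<h:body xmlns:h="http://www.w3.org/1999/xhtml">'
    "<h:h1>Title</h:h1><h:h2>Section</h:h2><h:h6>Deep</h:h6>"
    "</h:body>"
)
shift_headers(body, 1)
print([child.tag.split("}")[1] for child in body])  # ['h2', 'h3', 'p']
```

Only the tag names change. Text, attributes and child elements of each header are left untouched, which is why the operation can work in place.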

PyXWF.Parsers.PyWebXML – The default document format for PyXWF

class PyXWF.Parsers.PyWebXML.PyWebXML(site)

This class parses PyWebXML documents. Usually, you don’t create instances of this class yourself; you access it via the parser_registry attribute of your Site instance.

parse(fileref, **kwargs)

Parse the file referenced by fileref as a PyWebXML document and return the resulting Document instance.

parse_tree(root, header_offset=1)

Take the root element of an ElementTree and interpret it as a PyWebXML document. Return the resulting Document instance on success and raise an exception on error.

header_offset works as documented in the base class’ transform_headers() method.
