Initially I wanted to distinguish between parsing and mapping. But as I study MediaWiki markup, I am increasingly convinced that the best approach is to map the markup first and build the parsing algorithm from that map. I took a break from this series for some other posts. This is a late-evening endeavor, so the pace is slow, but I have put some time into the map. In the course of mapping, I’ve realized that the structure of MediaWiki markup is varied: some constructs are paired delimiters, some are line prefixes, and others are HTML-like tags or templates. My feeling is that the markup is a product of what works to promote wiki contribution and that it is an evolving structure.
I’m working through the Wikipedia help article Help:Wiki markup and making notes in a spreadsheet as I go. Developing the map will take at least a couple of iterations. At the moment, I’m about one third of the way through the help article. Some of the Flare tags in the table do not show the unique class that will be used; this is a pasting issue.
MediaWiki Markup | Flare Topic XML / XHTML | ||||||
Start | End | Start Tag | Nested Start or End Tag | Nested Start or End Tag | Nested Start or End Tag | Nested Start or End Tag | End |
== | == | <h2> | </h2> | ||||
=== | === | <h3> | </h3> | ||||
==== | ==== | <h4> | </h4> | ||||
===== | ===== | <h5> | </h5> | ||||
====== | ====== | <h6> | </h6> | ||||
; | : | <div > | <div> | </div> | <div > | </div> | </div> |
---- | <hr /> | ||||||
__TOC__ | Mini TOC Proxy | ||||||
__NOTOC__ | <div /> | ||||||
<br /> | <br /> | ||||||
<br> | <br /> | ||||||
single newline | <div /> | ||||||
empty line | </p> | <p> | |||||
: | <div /> | ||||||
<blockquote> | </blockquote> | ||||||
<div style="width:auto; margin-left:auto; margin-right:auto;"> | </div> | <div style="width:auto; margin-left:auto; margin-right:auto;"> | </div> | ||||
* | <ul> | <li> | <p> | </p> | </li> | ||
*: | <p> | </p> | </li> | ||||
</ul> | |||||||
# | <ol> | <li> | <p> | </p> | </li> | ||
#: | <p> | </p> | </li> | ||||
newline | </ol> | ||||||
<poem> | </poem> | <pre> | </pre> | ||||
'' | '' | <i> | </i> | ||||
''' | ''' | <b> | </b> | ||||
''''' | ''''' | <b> | <i> | </i> | </b> | ||
{{Smallcaps|small caps}} | |||||||
<code> | </code> | <code> | </code> | ||||
<syntaxhighlight> | </syntaxhighlight> | <syntaxhighlight> | </syntaxhighlight> | ||||
<small> | </small> | <small> | </small> | ||||
<big> | </big> | <big> | </big> | ||||
| | ||||||
{{pad|4em}} | <div> | </div> | |||||
<tt> | </tt> | ||||||
À | |||||||
Á Â Ã Ä Å Æ | |||||||
Ç È É Ê Ë | |||||||
Ì Í Î Ï Ñ | |||||||
Ò Ó Ô Õ Ö Ø | |||||||
Ù Ú Û Ü ß | |||||||
à á â ã ä å æ ç | |||||||
è é ê ë | |||||||
ì í î ï ñ | |||||||
ò ó ô õ ö ø œ | |||||||
ù ú û ü ÿ | |||||||
¿ ¡ § ¶ | |||||||
† ‡ • – — | |||||||
‹ › « » | |||||||
‘ ’ “ ” | |||||||
' " | |||||||
<pre> | </pre> | <pre> | </pre> | ||||
<nowiki> | </nowiki> | ||||||
™ © ® ¢ € ¥ | |||||||
£ ¤ |
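
To make the idea of deriving the parser from the map more concrete, here is a minimal Python sketch that drives a converter from a few rows of the table above (headings, the horizontal rule, and bold/italic). It is only an illustration under my own assumptions: the names `HEADING_RE`, `INLINE_MAP`, `convert_inline`, and `convert_line` are hypothetical, it handles one line at a time, and it leaves out the class attributes the Flare tags will eventually carry.

```python
import re

# Hypothetical names throughout; this is a sketch, not MediaWiki's or Flare's API.

# Headings: two to six '=' on each side, with the same count on both ends.
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")

# Inline rows of the map as (pattern, replacement) pairs.
# Longer delimiters come first so ''''' is not consumed as ''' plus ''.
INLINE_MAP = [
    (re.compile(r"'''''(.+?)'''''"), r"<b><i>\1</i></b>"),
    (re.compile(r"'''(.+?)'''"), r"<b>\1</b>"),
    (re.compile(r"''(.+?)''"), r"<i>\1</i>"),
]


def convert_inline(text):
    """Apply each inline row of the map in order."""
    for pattern, replacement in INLINE_MAP:
        text = pattern.sub(replacement, text)
    return text


def convert_line(line):
    """Convert a single line of wiki markup using the map."""
    stripped = line.strip()
    if stripped == "----":
        return "<hr />"
    match = HEADING_RE.match(stripped)
    if match:
        level = len(match.group(1))
        body = convert_inline(match.group(2))
        return f"<h{level}>{body}</h{level}>"
    # Anything else is treated as plain text with inline markup.
    return convert_inline(line)


if __name__ == "__main__":
    for sample in ["== Overview ==", "----", "'''bold''' and ''italic''"]:
        print(convert_line(sample))
```

Keeping the rows as data means that each new pass through the help article should mostly add entries to the map rather than change the conversion logic. Lists, definition lists, and paragraph breaks need state that spans several lines, so they would not fit this per-line approach as written.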