Tag Archives: MediaWiki

Flare to MediaWiki to Flare (part 6, of wiki template articles and snippets)

A mapping effort chronicled in a blog is less than exciting to read. So let’s switch gears and lay some groundwork for the parser. We will use a Wikipedia article as the first test and as a guide for what parts of the markup to tackle firstly. Since wikis are moving targets, we will work with the markup at a specific point in time.

This discussion is going to become complicated very fast. It may seem overly complicated. But rest assured that the parser will be doing the work both ways and the complex structures created in Flare to solve the issues we encounter aren’t much more complex than the wiki markup itself.

The article is http://en.wikipedia.org/wiki/Data_mapping. We can see the source for the article by switching to the Edit tab at http://en.wikipedia.org/w/index.php?title=Data_mapping&action=edit. For my use, I copied the markup to a text file. But remember, we have already established that the markup can be retrieved with an API.

The markup begins with this line followed by an empty line:

{{Data transformation}}

The double curly braces indicate a template. Templates are described in http://en.wikipedia.org/wiki/Help:Template. The term, template, surprises me because templates are pages which can be used in other pages. Snippet or boilerplate would have made more sense to me. But with MediaWiki, templates are for content reuse.

How do we track down that template? Templates can be maintained in the Template namespace. Let’s see if that works. Look here: http://en.wikipedia.org/wiki/Template:Data_transformation.

The template is there. As of writing, it establishes a box in the upper right-hand corner of the article which provides links to data transformation topics. So we will need to pull that article over during import. At the moment, I’m leaning toward storing template articles as snippets and using a naming convention for the snippet files which indicates the file’s status as a Flare version of a wiki template.

Snippets help: http://webhelp.madcapsoftware.com/flare8/Content/Snippets/About_Snippets.htm

This popped up in a Google search: http://www.contentinsomnia.net/reusing-content-on-wikis-and-blogs/. I thought I would share the link since it confirms someone else equates Flare snippets with MediaWiki templates.

For the Flare markup, we can use this:

<MadCap:snippetBlock src=”../Snippets/Template_Data_transformation.flsnp” />

We could use a regular expression for this since the object is replace {{ with:

<MadCap:snippetBlock src=”../Snippets/Template_

And }} with:

.flsnp” />

We should also retain the line ending. This replacement is our rule A for now. Rule B is to record the template name for future export and import. By the way, there is an export option to include templates with http://www.mediawiki.org/wiki/Special:Export.

The third line in the markup is:

{{inline|date=June 2010}}

There are more curly braces. But the contents of the braces seem more complex. The first part indicates another template: http://en.wikipedia.org/wiki/Template:Inline. But the second part indicates to display a date. The template article includes instructions about how to use the Inline template. The template is an appeal to improve the article because the citations are insufficient. The date portion is a parameter: http://www.mediawiki.org/wiki/Help:Templates#Parameters. This must be why it is called a template and not a snippet. How are we going to handle parameters in a snippet? Conditions, variables, something else?

A condition applied to the markup for the snippet block is an attribute for the snippet block element. This makes a condition attractive. But imagine how many conditions we would introduce as we imported more templates with more parameters. Besides, the condition doesn’t influence the appearance of the content until a target is generated. What to include or exclude is set on targets. Similarly, variable values are set on targets. Conditions and variables don’t give us what we want.

What about styles?

That feels closer. But unfortunately, which style to use is established in the markup with a class attribute. An element only gets one of those. If class is used, every parameter value would have to be conveyed through that one attribute. We want class for styles and we don’t want hundreds of MediaWiki template parameter mapped styles.

We could also consider grouping the snippet block in a div element with other elements to represent the parameters. But unfortunately some parameters influence the text of the MediaWiki template article. With the Inline template, you can replace the word article with another word indicated in a parameter. How can we achieve parity with the wiki if this is the case? This is an important decision point. We should convey the parameters to Flare so we can make the round trip. But how concerned are we about the way the parameters are manifested in Flare? In this case, we are probably not very concerned. But the meaning of the template can be changed drastically with a parameter. We must address it.

We can address it with a new attribute. But if we do that, we will run into problems conveying the parameters if the snippet is used elsewhere with different parameters. We will convey the parameters within the name of the snippet itself.

<MadCap:snippetBlock src=”../Snippets/Template_Inline|date=June 2010.flsnp”  />

I’m not sure about the pipe character in the markup, I’ll have to check. Secondly, we can place the parameter information in the snippet so as to display in Flare as intended. We can use another snippet within the template snippet for the parameter.

Snippet propagation.

It can be named according to the template and the parameter and contain the appropriate value. When the round trip is completed, the snippet block for the parameter will be translated to the markup in the template which declares the parameter. But the parameter snippet content will not be conveyed. That content is just for Flare. We can see the markup for the Inline template here: http://en.wikipedia.org/w/index.php?title=Template:No_footnotes&action=edit.

Only one version of the snippet contents will be parsed back into wiki markup when the round trip is completed. But the parameter values will still be parsed from the template snippet names. But what if we want to edit a snippet for the template and have the changes propagate to other versions of the template? You guessed it.

More snippet propagation!

All of the content in between snippets for parameters can also be snippets. This way someone can drill down and make changes in Flare which will be reflected in each parameter set version for the snippet for the wiki template article.

The ground rules and guidelines are:

  1. Don’t rename these snippets. Renaming will change the parameter values returned to the wiki on completion of the round trip.
  2. Only create new snippets as needed. Name the template snippets and parameter snippets according to the template article name and parameter values. Follow the convention strictly. The new parameters will convey to the wiki.
  3. Consider that template changes in wikis are a big deal. Don’t take them lightly.

Wikis use bots and people to maintain things like parameters in transclusions. We are attempting to maintain a similar structure in Flare without the advantage of bots. It may be a good idea to create a validation algorithm to check these projects periodically or even constantly just to ensure the Flare side is maintained such that the wiki is respected.

And on to line 4 of the wiki markup…




Flare to MediaWiki to Flare (part 5, Still Mapping)

Initially I wanted to distinguish between parsing and mapping. But as I study MediaWiki markup, I am increasingly convinced the best approach is to map the markups and create a parsing algorithm based on that. I took a break from this series for some other posts. This is a late evening endeavor. So the pace is slow. But I put some time into the map. In the course of mapping, I’ve realized the structure of the MediaWiki markup is varied. My feeling is that the markup is a product of what works to promote wiki contribution and that it is an evolving structure.

I’m working through this Wikipedia Help article: Help:Wiki markup and making notes in a spreadsheet as I go. Developing the map will take at least a couple of iterations. At the moment, I’m through about one third of the help article. Some of the Flare tags in the table do not show the unique class which will be used. This is a pasting issue.

MediaWiki Markup Flare Topic XML / XHTML
Start End Start Tag Nested Start or End Tag Nested Start or End Tag Nested Start or End Tag Nested Start or End Tag End
== == <h2> </h2>
=== === <h3> </h3>
==== ==== <h4> </h4>
===== ===== <h5> </h5>
====== ====== <h6> </h6>
; : <div > <div> </div> <div > </div> </div>
—- <hr />
__TOC__ Mini TOC Proxy
__NOTOC__ <div />
<br /> <br />
<br> <br />
single newline <div />
empty line </p> <p>
: <div />
<blockquote> </blockquote>
<div style=”width:auto; margin-left:auto; margin-right:auto;”> </div> <div style=”width:auto; margin-left:auto; margin-right:auto;”> </div>
* <ul> <li> <p> </p> </li>
*: <p> </p> </li>
# <ol> <li> <p> </p> </li>
#: <p> </p> </li>
newline </ol>
<poem> </poem> <pre> </pre>
<i> </i>
”’ ”’ <b> </b>
”” ”” <b> <i> </i> </b>
{{Smallcaps|small caps}}
<code> </code> <code> </code>
<syntaxhighlight> </syntaxhighlight> <syntaxhighlight> </syntaxhighlight>
<small> </small> <small> </small>
<big> </big> <big> </big>
&nbsp; &nbsp;
{{pad|4em}} <div> </div>
<tt> </tt>
&Aacute; &Acirc; &Atilde; &Auml; &Aring; &AElig;
&Ccedil; &Egrave; &Eacute; &Ecirc; &Euml;
&Igrave; &Iacute; &Icirc; &Iuml; &Ntilde;
&Ograve; &Oacute; &Ocirc; &Otilde; &Ouml; &Oslash;
&Ugrave; &Uacute; &Ucirc; &Uuml; &szlig;
&agrave; &aacute; &acirc; &atilde; &auml; &aring; &aelig; &ccedil;
&egrave; &eacute; &ecirc; &euml;
&igrave; &iacute; &icirc; &iuml; &ntilde;
&ograve; &oacute; &ocirc; &otilde; &ouml; &oslash; &oelig;
&ugrave; &uacute; &ucirc; &uuml; &yuml;
&iquest; &iexcl; &sect; &para;
&dagger; &Dagger; &bull; &ndash; &mdash;
&lsaquo; &rsaquo; &laquo; &raquo;
&lsquo; &rsquo; &ldquo; &rdquo;
&apos; &quot;
<pre> </pre> <pre> </pre>
<nowiki> <nowiki>
&trade; &copy; &reg; &cent; &euro; &yen;
&pound; &curren;

Flare to MediaWiki to Flare (part 4, Starting to Map MediaWiki Markup to Flare Topic XML)

Parsing the wiki markup for MediaWiki wikis is a challenge. There is not a markup spec to which to build a parser. But parsers exist. There would be no Wikipedia without one.

The original plan for this blog series was to use an existing libary to parse wikitext. That may still be the solution. But to convert one markup to another, a good map comes in handy. Here is the simplest version of the map: MediaWiki wikitext > Flare topic XML. But the nuts and bolts are much more complicated.

There is a wiki page on the MediaWiki.org website which discusses the possibility of a spec: Markup spec. The conclusion I draw from reading that page is that the rules for wikitext on a MediaWiki wiki is defined by the parser for a MediaWiki site. In other words, if you want your wiki markup to work, determine the parser behavior.

Writing markup to suit the needs of a parser isn’t especially shocking. Web developers constantly tweak their HTML to accommodate the whims of web browser makers. At least with this markup, there is really only one parser about which to be concerned, the MediaWiki parser. How does that parser behave?

The Markup spec page describes the parser’s actions. Notably, the outline describes a preprocessor, a parser, and a save parser with different behaviors. This is a potential model for different parsing behaviors in the round trip. A section of the page describes parts of the markup language. For example, content within equals signs (=…=) is a first level heading. That will probably be mapped to <h1>…</h1>, possibly with an attribute to indicate the heading is derived from wikitext: <h1 class=”MediaWikiFirstLevelHeading”>…</h1>. Then again, one could just treat all h1 tags the same. We’ll see as we get deeper into the mapping.

To avoid rework, decisions about how to convey metadata in the wikitext to metadata in the Flare topic XML should be made at this point. If conveying metadata is to rely on markup, conveying metadata is a part of the mapping process. An alternative is to store that information in a separate repository, like a database. But maintaining the information in the Flare topic seems like a more convenient option at this point.

For the next post, I’ll build a table with three columns: one for the wikitext tokens, one for the Flare topic XML tags, and one to explain the mapping behavior. That process will go on for a few posts until the mapping is satisfactory.


Flare to MediaWiki to Flare (part 3, Placeholder Flare Topics)

In the previous post about this effort, I discussed the MediaWiki API and how to create a Flare TOC from the response from an API call. The same approach is possible for the Flare topics. But since the wiki page content is not maintained as XML, the conversion is more complicated from wiki page to Flare topic. That is why this post is called Placeholder Flare Topics. Converting the wiki markup will be tackled in a subsequent post.

It may also be a good idea at this point to note some technical debt in the work so far. Although an API call was made and a TOC was created, there are a few other considerations depending on the MediaWiki instance. Firstly, some calls may require authentication. The example so far did not because I did not require a password for myself as the administrator on my local instance. But down the road, the issue of authentication will come up. But the issue has been encountered by many others and handled before, so I feel it is okay to leave that one for later. Secondly, some calls to the MediaWiki API may have limitations on the number of items returned. This is this case for the list of pages call from the previous post. If the wiki becomes large enough, that will have to be addressed.

Creating a Flare topic can be accomplished in the same way as the creating the TOC. Determining what topics to create can be accomplished by cycling through the TocEntry elements in the TOC and using Linq to XML to create one. Populating the topic with content can be accomplished through a MediaWiki API call to retrieve the content and then parsing the wiki markup to create XML. This post’s sample will only show how to create the topics and touch on the API call. For now, I haven’t broken anything out into separate procedures so that you can see the similarities between the TOC creation and the topic creation. Here is the updated (and only partially checked) code:

Flare to MediaWiki to Flare (part 2, make an API call and a Flare TOC)

To get the wiki pages, something has to make the API calls. A program is needed. I decided to build it rather than look for one that fits. But the plan was to use preexisting libraries to access the MediaWiki API if possible. I also decided to go with a Visual Basic project for this next part of the experiment. That decision could have gone many ways. MediaWiki is built on PHP, so that would be a good choice. Java is nice (and I’m familiar with it) and there are plenty of scripting languages which would fit the bill. But, this program is going to run on the same machine as a Flare installation. Flare is a .NET application and so too will be this program to move content to and from Flare projects.

But the real motivation to use Visual Basic was Linq to XML. Once wiki markup was converted to XML, there would probably be more XML manipulation to perform. Linq to XML is good for that. The general idea to get conversion framework started was this:

  • Make an API call to get the list of pages on the wiki
  • Create a one level Flare TOC based on the list of pages
  • Make API calls to get the wiki page markup
  • Convert the wiki markup to some kind of XML
  • Create Flare topics with the XML

Accomplishing that will not finish the MediaWiki to Flare conversion part of the story. But it will give it a good start and something to work with. Once the basic framework for moving the content is in place, other decisions can be made such as how to handle persisting metadata from the wiki page.

In a previous post, how to create a Flare topic with Linq to XML was demonstrated. So one piece of the puzzle is out of the way. I decided to tackle the easier part of the problem firstly: getting the list of pages and creating the Flare TOC. This was started at the end of the previous post with this API call:

Which returns this:

When I looked at that, TRANSFORMATION screamed in my head. But I wasn’t sold yet. A previous post demonstrated how to perform a transformation to create a Flare topic from XML with XSLT and the approach is roughly the same to create a Flare TOC from XML data. But before that, I figured I should create a Visual Basic project. I created a Console Application project called MediaWikiToFlare. At this point, I planned to run the conversion from a command line or PowerShell.

I know I said I planned to use a preexisting library for the API calls. But just to start, I decided to hard code the REST call. Here is a good example for Yahoo API REST calls: Make Yahoo! Web Service REST Calls With VB.NET. I followed that basic pattern for my initial test. I used Linq to XML to store the returned XML and to make the conversion to a new document.

I created the folder structure for the save:

And ran the application. A file called WikiPages.fltoc was created in the folder. Here is how the file contents look:

Flare to MediaWiki to Flare (part 1, getting started)

I asked for suggestions for posts on the Users of MadCap Flare on Linkedin. The first one was to see content flow freely between Flare and a wiki. I’m not promising anything. But the suggestion is very good and it is worth a try. I saw some posts on MadCap Software Forums about this. It sounds like Flare to wiki to Flare has been done with varying degrees of success.

I’m going to try this with MediaWiki. Click a few links from a Google search and you’ll see that MediaWiki is not necessarily the preferred wiki platform for wiki customization. But Wikipedia is powered by powered by MediaWiki and that is a strong enough argument for me. The idea is to blog this as it happens. If I misstep, I’m going to blog it.

If you are reading this, you probably already have Flare installed. I have never implemented a MediaWiki wiki, so I am not going to assume everyone else has. The download is appropriately posted on the MediaWiki wiki. I downloaded 1.19.1. It is a .tar.gz file and WikiMedia notes that you can use 7-Zip to extract the files. Use the file manager application that comes with 7-ZIP. You will have to drill into the file to get to the actual folder.

I’m testing this on my computer which is a Windows 7 laptop. I copied the mediawiki-1.19.1 folder from the downloaded archive to C:\inetpub\wwwroot. Then in a web browser, I entered this URL: http://localhost/mediawiki-1.19.1/index.php, which brought up an webpage with a link to install:

I clicked that and followed the wizard. For my setup, I re-installed MySQL separately. Remember to run these items as an administrator. I had previously installed MySQL for another stack and I couldn’t remember the password. I had to reset it. Instructions to do that for Windows are here: How to Reset the Root Password. After that I mostly selected the defaults on the wizard. The point is just to get a test wiki up and running.

A freshly installed MediaWiki wiki already has a page. I didn’t add anything else right away. The big question is how to retrieve the wiki content. The solution should be programmatic and repeatable. MediaWiki has an API. There is also a bulk export. Without committing to either one, I firstly explored the API.

One requirement for this round trip is to get wiki pages from the wiki and convert the pages to Flare topics. The API is a web service and if you browse to the endpoint for the web service on your installation of MediaWiki, you can see the generated documentation page. Again, I installed the wiki locally. My installation’s API endpoint is http://localhost/mediawiki-1.19.1/api.php.

The MediaWiki wiki has an introductory example that describes a get of the content for the main page on the English version of Wikipedia. The call for my installation is:

When I viewed the call in a web browser, the browser displayed the returned XML from the call. I could have specified another format such as JSON or PHP. But I thought XML would be a safe place to start. Unfortunately, the returned XML wraps another kind of markup which is the actual content. Parsing XML would great since that is what Flare wants. But instead, it is necessary to parse Wikitext. It has been done before and MediaWiki provides a repository of links at Alternative parsers.

A few side notes: It may be worth looking into direct queries of the database which houses the wiki content. It is tempting to look into grabbing HTML. But I have a feeling that would greatly complicate the round trip back into the wiki.

The upside of the API is that other information is nicely organized in the XML. The content would be difficult without a parser. Although it would be interesting to build one, I won’t. But the rest of the metadata and probably the navigation seem easier. Here is a call to get all pages:

To recap this initial exploration, installing MediaWiki isn’t perfectly strait forward but it isn’t terrible. MediaWiki has an API which among other things can be used to retrieve content. Content is not in XML but in Wikitext. There are many Wikitext parsers out there already and there is no reason to reinvent the wheel.