Monthly Archives: August 2012

Flare to MediaWiki to Flare (part 4, Starting to Map MediaWiki Markup to Flare Topic XML)

Parsing the wiki markup for MediaWiki wikis is a challenge. There is no markup spec against which to build a parser. But parsers exist. There would be no Wikipedia without one.

The original plan for this blog series was to use an existing library to parse wikitext. That may still be the solution. But to convert one markup to another, a good map comes in handy. Here is the simplest version of the map: MediaWiki wikitext > Flare topic XML. But the nuts and bolts are much more complicated.

There is a wiki page on the MediaWiki.org website which discusses the possibility of a spec: Markup spec. The conclusion I draw from reading that page is that the rules for wikitext on a MediaWiki wiki are defined by the parser for that site. In other words, if you want your wiki markup to work, determine the parser’s behavior.

Writing markup to suit the needs of a parser isn’t especially shocking. Web developers constantly tweak their HTML to accommodate the whims of web browser makers. At least with this markup, there is really only one parser about which to be concerned, the MediaWiki parser. How does that parser behave?

The Markup spec page describes the parser’s actions. Notably, the outline describes a preprocessor, a parser, and a save parser with different behaviors. This is a potential model for different parsing behaviors in the round trip. A section of the page describes parts of the markup language. For example, content within equals signs (=…=) is a first-level heading. That will probably be mapped to <h1>…</h1>, possibly with an attribute to indicate the heading is derived from wikitext: <h1 class="MediaWikiFirstLevelHeading">…</h1>. Then again, one could just treat all h1 tags the same. We’ll see as we get deeper into the mapping.
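
To make the mapping concrete, here is a quick sketch, not part of any eventual converter, that maps a single line of wikitext to an h1 element with Linq to XML. The regular expression and the class name are only placeholders:

    Imports System.Text.RegularExpressions
    Imports System.Xml.Linq

    Module HeadingMapSketch
        ' Maps a single line of wikitext such as "= Page title =" to an h1 element.
        ' The pattern and the class name are placeholders, and this only tokenizes
        ' one line of wikitext; it is not parsing any XML.
        Function MapFirstLevelHeading(wikiLine As String) As XElement
            Dim m As Match = Regex.Match(wikiLine.Trim(), "^=([^=].*?)=$")
            If m.Success Then
                Return New XElement("h1",
                                    New XAttribute("class", "MediaWikiFirstLevelHeading"),
                                    m.Groups(1).Value.Trim())
            End If
            Return Nothing
        End Function

        Sub Main()
            ' Prints: <h1 class="MediaWikiFirstLevelHeading">Extension points</h1>
            Console.WriteLine(MapFirstLevelHeading("= Extension points ="))
        End Sub
    End Module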

To avoid rework, decisions about how metadata in the wikitext will be carried over into the Flare topic XML should be made now. If conveying metadata is to rely on markup, it is part of the mapping process. An alternative is to store that information in a separate repository, like a database. But keeping the information in the Flare topic seems like the more convenient option at this point.

For the next post, I’ll build a table with three columns: one for the wikitext tokens, one for the Flare topic XML tags, and one to explain the mapping behavior. That process will go on for a few posts until the mapping is satisfactory.

 

Frameborder around Topic Frame in WebHelp

You can’t remove or change the frameborder around the topic frame in WebHelp with the skin editor. At least I haven’t figured out how. Moving to the HTML5 target type eliminates the issue. But in case you aren’t ready for that yet, here is how you can change it post-build with a Visual Basic command line application.

  1. Create a Visual Basic command line application project in Visual Studio.
  2. Add the module below.
  3. Build the application. You may need to add some references to your project.
  4. Copy the executable to a known location.
  5. Use the application from the command line or a batch file. The argument is the name of the file to update.
The originating frameborder setting is on line 35 of Default.htm in this folder or a similar location in your installation:

You could change it there. But that would change the setting for any WebHelp generated by your Flare installation. Instead, you can change the setting post-build in the corresponding file in the WebHelp output. The program reads the text of the file into a string, changes the frameborder setting within the string, and replaces the text of the file with the string. Double double quotes are escape sequences for double quotes in Visual Basic.
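
Here is a sketch of the module along those lines. The exact frameborder markup in your Default.htm may differ, so treat the search and replacement strings as placeholders:

    Imports System.IO

    Module FrameborderFix
        Sub Main(args As String())
            If args.Length < 1 Then
                Console.WriteLine("Usage: FrameborderFix <path to file>")
                Return
            End If

            Dim filePath As String = args(0)

            ' Read the whole file into a string.
            Dim text As String = File.ReadAllText(filePath)

            ' Change the frameborder setting in the string. The value searched for
            ' here is an assumption; match it to the markup in your own output.
            text = text.Replace("frameborder=""1""", "frameborder=""0""")

            ' Replace the text of the file with the updated string.
            File.WriteAllText(filePath, text)
        End Sub
    End Module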

Flare to MediaWiki to Flare (part 3, Placeholder Flare Topics)

In the previous post about this effort, I discussed the MediaWiki API and how to create a Flare TOC from the response to an API call. The same approach is possible for the Flare topics. But since wiki page content is not maintained as XML, converting a wiki page to a Flare topic is more complicated. That is why this post is called Placeholder Flare Topics. Converting the wiki markup will be tackled in a subsequent post.

It may also be a good idea at this point to note some technical debt in the work so far. Although an API call was made and a TOC was created, there are a few other considerations depending on the MediaWiki instance. Firstly, some calls may require authentication. The example so far did not because I did not require a password for myself as the administrator on my local instance. Down the road, though, the issue of authentication will come up. Since it has been encountered and handled by many others before, I feel it is okay to leave that one for later. Secondly, some calls to the MediaWiki API may limit the number of items returned. This is the case for the list of pages call from the previous post. If the wiki becomes large enough, that will have to be addressed.
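
When it does, the fix will probably be to follow the API’s continuation values. Here is a rough sketch of that loop, not code from this series; it assumes the 1.19-era XML continuation format (a query-continue element with an apfrom attribute) and uses placeholder endpoint and limit values:

    Imports System.Collections.Generic
    Imports System.Net
    Imports System.Xml.Linq

    Module AllPagesPagingSketch
        Sub Main()
            ' Placeholder endpoint and limit; verify the continuation format
            ' against the responses from your own wiki.
            Dim endpoint As String = "http://localhost/mediawiki-1.19.1/api.php"
            Dim apfrom As String = Nothing
            Dim titles As New List(Of String)

            Using client As New WebClient()
                Do
                    Dim url As String = endpoint & "?action=query&list=allpages&aplimit=50&format=xml"
                    If apfrom IsNot Nothing Then
                        url &= "&apfrom=" & Uri.EscapeDataString(apfrom)
                    End If

                    Dim doc As XDocument = XDocument.Parse(client.DownloadString(url))

                    ' Collect the page titles in this batch.
                    For Each p As XElement In doc.Root.Element("query").Element("allpages").Elements("p")
                        titles.Add(p.Attribute("title").Value)
                    Next

                    ' If the response carries a continuation value, keep going.
                    Dim cont As XElement = doc.Root.Element("query-continue")
                    apfrom = If(cont IsNot Nothing,
                                cont.Element("allpages").Attribute("apfrom").Value,
                                Nothing)
                Loop While apfrom IsNot Nothing
            End Using

            Console.WriteLine("Retrieved {0} page titles.", titles.Count)
        End Sub
    End Module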

Creating a Flare topic can be accomplished in the same way as creating the TOC. Determining which topics to create is a matter of cycling through the TocEntry elements in the TOC, and Linq to XML can create a topic for each one. Populating a topic with content means making a MediaWiki API call to retrieve the content and then parsing the wiki markup to create XML. This post’s sample will only show how to create the topics and touch on the API call. For now, I haven’t broken anything out into separate procedures, so that you can see the similarities between the TOC creation and the topic creation. Here is the updated (and only partially checked) code:
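
A rough sketch of the loop, rather than the full listing, looks like this. The paths are placeholders, and the topic skeleton is deliberately bare; compare it against a topic created by Flare itself to get the declarations right:

    Imports System.IO
    Imports System.Xml.Linq

    Module PlaceholderTopicsSketch
        Sub Main()
            ' Hypothetical project location; adjust for your own project.
            Dim projectFolder As String = "C:\FlareProjects\WikiImport"
            Dim toc As XDocument =
                XDocument.Load(Path.Combine(projectFolder, "Project\TOCs\WikiPages.fltoc"))

            For Each entry As XElement In toc.Descendants("TocEntry")
                Dim title As String = entry.Attribute("Title").Value
                Dim link As String = entry.Attribute("Link").Value

                ' The Link attribute is project relative (for example /Content/Page.htm).
                Dim topicPath As String =
                    Path.Combine(projectFolder, link.TrimStart("/"c).Replace("/"c, "\"c))

                ' A bare placeholder: a heading only. Content comes later from the
                ' API call and the wikitext conversion.
                Dim topic As New XDocument(
                    New XElement("html",
                        New XElement("head"),
                        New XElement("body",
                            New XElement("h1", title))))

                Directory.CreateDirectory(Path.GetDirectoryName(topicPath))
                topic.Save(topicPath)
            Next
        End Sub
    End Module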

Notes about Word and MadCap Flare

This isn’t a code-oriented post. Here are a few thoughts about how to use Word as a part of your Flare infrastructure.

When it comes to technical authoring, inevitably someone will ask if they can use Microsoft Word. And as authors’ skills with Word grow, so does their attachment to it. A desire to structure content, create a single source, and produce multi-channel output leads the author to ask: how can I do those things with Word? It’s a fair question, and it isn’t impossible to accomplish those things with Word. But if you are reading this, you have probably passed through that learning curve.

Word is everywhere and there are many solid use-cases for Word. It turns out that Word can be leveraged to improve one’s Flare workflow as well. In particular, two features of Word combined with Flare’s Word imports can be very powerful.

Flare’s Word import functionality is strong. If a Word document has been thoughtfully constructed with Styles, Flare can do a lot with it. Styles in Word map nicely to CSS styles in Flare. Word emphasizes headings in the default style galleries, which makes it easy to split a Word document into Flare topics at its headings.

Microsoft Office supports a technology called Object Linking and Embedding (OLE). OLE enables embedding of other objects, such as Visio diagrams, into Office documents. A Visio diagram embedded in Word can be seen by the reader of a Word document as a diagram. But to someone editing the document, it can be edited as a Visio file. When a Word document with an embedded Visio diagram is imported into Flare, the resulting item in Flare is an image file and a reference to it in the corresponding Flare topic.

This is great if you want subject matter experts to contribute content. It is almost certain the SME is familiar with Word. If the SME is technically minded, there is a good chance she is familiar with Visio. An SME can create content in Word, including embedded objects, and that content can be imported into Flare.

One can place the Word file in a known location, such as on a network drive. If you agree to edit only the Word document and not the resulting Flare topics, the SME can come back and make changes time and time again. Add to that, the SME can update embedded objects, and those will come over on the import also.

But Word can be used as a starting point for a Flare project too. The natural use for Word import is to import existing content. When a Word shop switches to Flare, Word imports are central to the conversion process. But Word can also help create new content. Here is why.

Think about the experience of creating a Flare TOC with blank topics from scratch. It is pretty easy. But it takes longer than creating a new Word document and adding headings to create an outline. So why not just create the Word outline and import it into Flare? If you are only going to do it once, it may not be worth it. There is a time cost to setting up a Word template and a Word import file in Flare to map to the Styles in the template. But, if you like to outline your documentation before you fill in the content, it is definitely worth the time.

Flare to MediaWiki to Flare (part 2, make an API call and a Flare TOC)

To get the wiki pages, something has to make the API calls. A program is needed. I decided to build it rather than look for one that fits. But the plan was to use preexisting libraries to access the MediaWiki API if possible. I also decided to go with a Visual Basic project for this next part of the experiment. That decision could have gone many ways. MediaWiki is built on PHP, so that would be a good choice. Java is nice (and I’m familiar with it), and there are plenty of scripting languages which would fit the bill. But this program is going to run on the same machine as a Flare installation. Flare is a .NET application, so the program that moves content to and from Flare projects will be a .NET application too.

But the real motivation to use Visual Basic was Linq to XML. Once wiki markup was converted to XML, there would probably be more XML manipulation to perform. Linq to XML is good for that. The general idea to get the conversion framework started was this:

  • Make an API call to get the list of pages on the wiki
  • Create a one level Flare TOC based on the list of pages
  • Make API calls to get the wiki page markup
  • Convert the wiki markup to some kind of XML
  • Create Flare topics with the XML

Accomplishing that will not finish the MediaWiki to Flare conversion part of the story. But it will give it a good start and something to work with. Once the basic framework for moving the content is in place, other decisions can be made such as how to handle persisting metadata from the wiki page.

A previous post demonstrated how to create a Flare topic with Linq to XML, so one piece of the puzzle is out of the way. I decided to tackle the easier part of the problem firstly: getting the list of pages and creating the Flare TOC. This was started at the end of the previous post with this API call:

Which returns this:

When I looked at that, TRANSFORMATION screamed in my head. But I wasn’t sold yet. A previous post demonstrated how to perform a transformation to create a Flare topic from XML with XSLT and the approach is roughly the same to create a Flare TOC from XML data. But before that, I figured I should create a Visual Basic project. I created a Console Application project called MediaWikiToFlare. At this point, I planned to run the conversion from a command line or PowerShell.

I know I said I planned to use a preexisting library for the API calls. But just to start, I decided to hard code the REST call. Here is a good example for Yahoo API REST calls: Make Yahoo! Web Service REST Calls With VB.NET. I followed that basic pattern for my initial test. I used Linq to XML to store the returned XML and to make the conversion to a new document.
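
A sketch of that kind of test looks like this. The endpoint is the local installation from part 1, the output path and the link format are placeholders, and CatapultToc and TocEntry elements form the skeleton of an .fltoc file:

    Imports System.Net
    Imports System.Xml.Linq

    Module MediaWikiToFlareSketch
        Sub Main()
            ' Hard-coded REST call, the same basic pattern as the Yahoo example.
            Dim url As String =
                "http://localhost/mediawiki-1.19.1/api.php?action=query&list=allpages&format=xml"

            Dim response As XDocument
            Using client As New WebClient()
                response = XDocument.Parse(client.DownloadString(url))
            End Using

            ' Convert the list of pages (p elements in the response) into a one-level
            ' Flare TOC. Real page titles may need more clean-up than this for links.
            Dim toc As New XDocument(
                New XElement("CatapultToc",
                    New XAttribute("Version", "1"),
                    From p In response.Descendants("p")
                    Select New XElement("TocEntry",
                        New XAttribute("Title", p.Attribute("title").Value),
                        New XAttribute("Link", "/Content/" & p.Attribute("title").Value.Replace(" ", "") & ".htm"))))

            ' Placeholder path; point this at your own project folder.
            toc.Save("C:\FlareProjects\WikiImport\Project\TOCs\WikiPages.fltoc")
        End Sub
    End Module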

I created the folder structure for the save:

And ran the application. A file called WikiPages.fltoc was created in the folder. Here is how the file contents look:

Flare to MediaWiki to Flare (part 1, getting started)

I asked for suggestions for posts in the Users of MadCap Flare group on LinkedIn. The first one was to see content flow freely between Flare and a wiki. I’m not promising anything. But the suggestion is very good and it is worth a try. I saw some posts on the MadCap Software Forums about this. It sounds like Flare to wiki to Flare has been done with varying degrees of success.

I’m going to try this with MediaWiki. Click a few links from a Google search and you’ll see that MediaWiki is not necessarily the preferred wiki platform for wiki customization. But Wikipedia is powered by MediaWiki, and that is a strong enough argument for me. The idea is to blog this as it happens. If I misstep, I’m going to blog it.

If you are reading this, you probably already have Flare installed. I have never implemented a MediaWiki wiki, so I am not going to assume everyone else has. The download is appropriately posted on the MediaWiki wiki. I downloaded 1.19.1. It is a .tar.gz file, and MediaWiki notes that you can use 7-Zip to extract the files. Use the file manager application that comes with 7-Zip. You will have to drill into the archive to get to the actual folder.

I’m testing this on my computer, which is a Windows 7 laptop. I copied the mediawiki-1.19.1 folder from the downloaded archive to C:\inetpub\wwwroot. Then in a web browser, I entered this URL: http://localhost/mediawiki-1.19.1/index.php, which brought up a webpage with a link to install:

I clicked that and followed the wizard. For my setup, I re-installed MySQL separately. Remember to run these items as an administrator. I had previously installed MySQL for another stack and I couldn’t remember the password. I had to reset it. Instructions to do that for Windows are here: How to Reset the Root Password. After that I mostly selected the defaults on the wizard. The point is just to get a test wiki up and running.

A freshly installed MediaWiki wiki already has a page. I didn’t add anything else right away. The big question is how to retrieve the wiki content. The solution should be programmatic and repeatable. MediaWiki has an API. There is also a bulk export. Without committing to either one, I firstly explored the API.

One requirement for this round trip is to get wiki pages from the wiki and convert the pages to Flare topics. The API is a web service and if you browse to the endpoint for the web service on your installation of MediaWiki, you can see the generated documentation page. Again, I installed the wiki locally. My installation’s API endpoint is http://localhost/mediawiki-1.19.1/api.php.

The MediaWiki wiki has an introductory example that describes a get of the content for the main page on the English version of Wikipedia. The call for my installation is:

When I viewed the call in a web browser, the browser displayed the XML returned from the call. I could have specified another format such as JSON or PHP, but I thought XML would be a safe place to start. Unfortunately, the returned XML wraps another kind of markup, which is the actual content. Parsing XML would be great since that is what Flare wants. But instead, it is necessary to parse wikitext. It has been done before, and MediaWiki provides a repository of links at Alternative parsers.
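
For what it is worth, pulling the raw wikitext out of the response is straightforward with Linq to XML. This sketch assumes the content comes back as the text value of a rev element; check the response from your own installation:

    Imports System.Linq
    Imports System.Net
    Imports System.Xml.Linq

    Module PageContentSketch
        ' Returns the raw wikitext for a page title, or Nothing if no revision
        ' element is found in the response.
        Function GetPageWikitext(title As String) As String
            Dim url As String =
                "http://localhost/mediawiki-1.19.1/api.php?action=query&prop=revisions&rvprop=content" &
                "&format=xml&titles=" & Uri.EscapeDataString(title)

            Using client As New WebClient()
                Dim doc As XDocument = XDocument.Parse(client.DownloadString(url))
                Dim rev As XElement = doc.Descendants("rev").FirstOrDefault()
                Return If(rev IsNot Nothing, rev.Value, Nothing)
            End Using
        End Function

        Sub Main()
            Console.WriteLine(GetPageWikitext("Main Page"))
        End Sub
    End Module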

A few side notes: It may be worth looking into direct queries of the database which houses the wiki content. It is tempting to look into grabbing HTML. But I have a feeling that would greatly complicate the round trip back into the wiki.

The upside of the API is that the other information is nicely organized in the XML. The content would be difficult to handle without a parser. Although it would be interesting to build one, I won’t. But the rest of the metadata, and probably the navigation, seem easier. Here is a call to get all pages:

To recap this initial exploration: installing MediaWiki isn’t perfectly straightforward, but it isn’t terrible. MediaWiki has an API which, among other things, can be used to retrieve content. The content is not XML but wikitext. There are many wikitext parsers out there already, and there is no reason to reinvent the wheel.

Batch Files, madbuild.exe, and commands

Flare’s help contains topics about batch targets, and there is no need to cover all of that here. Briefly, you can create batch targets for a project. From the Batch Target Editor, you can select actions to take and schedule tasks. To generate output from Flare targets using commands specified outside of the Flare user interface, you can use a command line executable called madbuild.exe. This executable is located in the Flare.app folder of your Flare installation. For example:

The executable describes its usage as:

But there is also a switch to enable logging:

The primary benefit of batch is scheduling. Scheduling builds means you don’t have to kick off a process manually, which is a time saver.

Scheduling nightly or continuous builds becomes more important as the volume of content increases. For large API and database references, build times can run into hours. Waiting until the end of the day to kick off a build can be a hassle. But there are other scheduling benefits to batch processing:

  • Scheduling other tasks for post-build processing
  • Organizing build schedules for outputs with subsystems to build in parallel

Here are some other tasks you can perform:

  • If you use source control and the source control solution supports the command line, you can automate checkouts and other actions. For example, with Team Foundation Server, there is an executable called TF.exe.
  • You can perform PDF post-processing. For example, if you have a copy of Acrobat, there is an Acrobat API from Adobe which can be used to create command line utilities to run from batch.
  • Copy the output or the source to an archive, for example with XCOPY or Robocopy.
  • Copy a different version of a stylesheet into the output (a small sketch follows this list).
  • Copy external files into the project folder structure.
  • Adjust skin CSS beyond what is possible in the Skin Editor.
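
As a small example of the stylesheet item above, here is a tiny console utility a batch file could call after madbuild.exe finishes. Both paths are passed in as arguments, and the names here are only placeholders:

    Imports System.IO

    Module SwapStylesheet
        Sub Main(args As String())
            If args.Length < 2 Then
                Console.WriteLine("Usage: SwapStylesheet <replacement css> <output css>")
                Return
            End If

            Dim replacementCss As String = args(0)  ' for example, a brand-specific stylesheet
            Dim outputCss As String = args(1)       ' the stylesheet inside the generated output

            ' Overwrite the generated stylesheet with the replacement version.
            File.Copy(replacementCss, outputCss, True)
            Console.WriteLine("Replaced {0}", outputCss)
        End Sub
    End Module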

Microsoft TechNet Command Line Reference

Building in parallel means firstly building subsystems at the same time and then building an encompassing output after the parallel builds. For that, you need many targets.

Check out this post about a utility (made by me) to generate batch files: Targets Everywhere. This one is good for generating a file or files for all targets in any project in a folder. You can also save configurations for the utility such as which targets are selected for inclusion. And there is an interesting post at Roughly Everything about Flare about another utility to create batch scripts: Coming soon – a new batch script builder released on CodePlex. This one looks like it is good for generating the text for commands to copy into a batch.

If you use PowerShell, most of what you do from cmd.exe is portable to PowerShell. And that is a topic for another day.

Use XSLT to create a Flare Topic from XML Data

XSLT is a language for transforming XML documents. The World Wide Web Consortium (W3C) published XSL Transformations (XSLT) Version 1.0 as a recommendation in 1999 and XSL Transformations (XSLT) Version 2.0 in 2007. The W3C recommendations are posted at:

http://www.w3.org/TR/xslt/

http://www.w3.org/TR/xslt20/

If you are lucky enough to have information housed as XML, you are just one transformation away from a Flare topic. XSLT can be used to create XHTML files from XML files. Suppose you have this XML:

You can use XSLT to create this Flare topic:

The XSLT used is:
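
As a companion to the stylesheet itself, running a transform from code is a few lines with the .NET XslCompiledTransform class. The file names in this sketch are placeholders:

    Imports System.Xml.Xsl

    Module RunTransform
        Sub Main()
            ' Load the stylesheet, then transform the source XML into a topic file.
            Dim xslt As New XslCompiledTransform()
            xslt.Load("C:\Transforms\DataToTopic.xslt")
            xslt.Transform("C:\Data\Source.xml", "C:\FlareProjects\Sample\Content\Generated.htm")
        End Sub
    End Module

The same transform can be run against as many XML files as you have.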

Regular Expressions and Flare Markup (Danger)

Danger ahead…

This post is not a recommendation to use regular expressions for automation which involves XML and XHTML parsing. Unless you are watching the action or verifying the results later, there are dangers to using regular expressions to parse XML and XHTML. This is discussed extensively on developer forum websites. Computer science curricula which cover language theory may even include formal proofs demonstrating why parsing XML and XHTML with traditional regular expressions is impossible.

There is a particularly colorful post about the subject by Jeff Atwood called “Parsing Html The Cthulhu Way.”

Flare has a find and replace feature which supports wildcards and regular expressions. The feature includes an option to search source code. In other words, you can search the text of a topic or the underlying markup. Flare’s find and replace works well for project-specific needs. Analyzer includes more advanced features to find items. This is a better option if your goal is to change markup. But the functionality is also project-specific. Since Flare artifacts are text files, non-Flare text editors can be used to edit Flare artifacts. And many editors support regular expressions.

If you maintain content in a small number of projects, find and replace in Flare may be sufficient. But if you use many projects to maintain your content, an external find and replace may come in handy. When you step outside of Flare, find and replace is limited to the XML and XHTML of the Flare artifacts. When working within Flare, find and replace works in the context of the XML (semi-WYSIWYG) editor as well as in the context of the text editor (true XML).

Many programs which support text find and replace also support scripting. Visual Studio includes find and replace functionality which can be scripted through Visual Studio macros. The .NET Framework has its own flavor of regular expressions, distinct from Visual Studio’s. Many scripting languages and IDEs support find and replace. Most have their own variation of syntax and functionality.

XML and XHTML are text. Regular expressions parse text. It is tempting to use regular expressions to parse XML and XHTML. But when it comes to XHTML and XML, regular expressions have limitations. In computer science terms, a traditional regular expression describes a regular language. XHTML and XML are not regular languages, and a regular expression cannot fully parse a non-regular language. In particular, regular expressions have difficulty with nested tags.

Many languages called regular expressions exceed the scope of traditional regular expressions. For example, some support recursion and backtracking, which can be exploited to overcome some of the issues with nested tags.

Since Flare projects are openly maintained in XML, XHTML, and CSS text files, there are patterns in Flare artifacts which can be exploited through regular expressions. For XML and XHTML artifacts, the formatting for markup is similar. There is an opening and closing tag which wraps content.

The obstacle to changing elements in markup such as XML is changing both the opening tag and the corresponding closing tag. The natural desire is to identify the content of the element in such a way that the closing point of the content is also identified. Another obstacle is that implementations of regular expressions may treat line ends differently from other characters. There is potentially more than one line end in any given element, so that must also be considered.

The issue with line ends can be handled. Some issues with nested tags can be handled. But as an overall solution for XML and XHTML parsing, regular expressions are not appropriate. Here is a demonstration as to why.

Let’s begin with a simple regular expression to find the opening tag for a p element. Assume we have a document which contains this element in the source: <p>Example</p>. In the Find field of a find and replace feature, enter <p>. With Visual Studio, if you attempt to find with this value, with no options selected, the opening p tag will be found and highlighted. If Use: Regular expressions is selected, <p> will not be found. The greater-than and less-than symbols must be escaped if Regular expressions are used. To escape a character in Visual Studio regular expressions, precede it with \. So adjust the find field to:

With Use: Regular expressions selected, <p> will be found and highlighted. But if Use: Regular expressions is not selected, a message will appear which says:

The following specified text was not found: \<p\>

So far, we have no problems. We sought to find the opening tag for a p element with no attributes and no whitespace in the tag. In order to find the rest of the content in the opening tag’s first line, we can adjust the regular expression to:

This expression uses . to represent any character other than line breaks and * to represent unlimited repetitions of the preceding item. The effect is to return everything until there is a line break. With Use: Regular expressions selected, the entire line is found:

If the goal is just to find content starting with a particular tag which spans one line, that technique is sufficient. But there are two more considerations. Firstly, line breaks are still an issue. Let’s look at an element with a line break:

The same find with Use: Regular expressions would find:

The line break excluded everything after the first line from the find. In Visual Studio regular expressions, a line break is represented by: \n

and “or” is represented with: |

Changing the find to:

Will return everything after and including the opening p tag. But the expression does not stop until the end of the file. The expression will find everything including <p> and after. To terminate the find text at another point, we can specify not to include a character. Visual Studio regular expressions indicate any one character not in a set with:

[^…], where the ellipsis stands for the set of characters

We can use this to specify a stopping point for the find this way:

Now the find will return everything from the opening p tag up to the first closing tag in the markup, any closing tag. And herein lies the problem. Elements can be nested in elements. For example, the p element may contain a span element or an a element. You could continue down this road for a while. But the more you adapt (complicate) a regular expression to handle these situations, the less elegant the solution becomes.

In short, regular expressions work well for some plain text problems. With markup, regular expressions work well for innermost elements, which are elements that contain no other elements. Regular expressions also work well for changing an attribute across every kind of element, for example name=" for p, pre, b, and every other element. But for true XML and XHTML parsing, regular expressions are not reliable.
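
To see the same problem outside of the find dialog, here is a short sketch with the .NET Regex class (a different flavor from the Visual Studio find and replace used above) against a made-up fragment:

    Imports System.Text.RegularExpressions
    Imports System.Xml.Linq

    Module NestedTagDemo
        Sub Main()
            Dim markup As String = "<p>Before <span>nested</span> after</p>"

            ' A lazy match up to the first closing tag stops at </span>, not </p>.
            Dim m As Match = Regex.Match(markup, "<p>.*?</")
            Console.WriteLine(m.Value)    ' <p>Before <span></

            ' An XML parser has no trouble with the nesting.
            Dim p As XElement = XElement.Parse(markup)
            Console.WriteLine(p.Value)    ' Before nested after
        End Sub
    End Module

The lazy expression stops at the first closing tag it meets, which belongs to the nested span. The XML API returns the whole element without any fuss.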

Welcome to Flare for Programmers

This is a blog about how to script, program, and automate with the XML-based project structure of MadCap Flare. To get started, see the first posts about Flare Topics and Flare TOCs. Then move on to:

These first posts focus on basics. As time goes on, the posts will target specific technologies and examples. You can skip around or read from the beginning. Please leave comments and questions.

For a look at other technologies, consider these posts:

Thanks for reading!