XML in, with, and for VFP
Session E-XML

Lisa Slater Nicholls
download the source for this session here

Session Overview

As a VFP Developer, you will find plenty of use for XML in the near future. This session will show you why, and how, you'll use it. Here are some general reasons to keep in mind:

  • XML provides a platform- and language- neutral syntax for interchanging information between applications and business partners, so it is ideal for VFP applications that function as mid-tier components, in web as well as non-web-enabled architectures.
  • It is ideal for multilingual applications, because XML was designed from the ground up to handle Unicode.
  • It provides raw content for unlimited forms of display, so it is an ideal  tool to complement the shortcomings of the VFP FRX and reporting systems.
  • It is well-equipped to express relational data as well as describe objects, so it is ideal for use in an environment, like VFP, that use both normalized databases and object hierarchies.

As you reflect on the material in this session, you will see many illustrations of these general points although we may not discuss them directly.  In this session we'll cover:

  • why using XML extends VFP in directions that are critical to VFP;
  • practical applications of XML in VFP;
  • what's been added in VFP 7.0 to facilitate using XML;
  • recommendations for tools and techniques to get you up to speed with XML and its associated technologies.

Introduction (why we are here)

This is going to be a long introduction, more like an essay in itself.  I’m going to tell you at least one story and give you some background information. 

Don’t worry – I promise that the rest of this paper will also have lots of code samples!  If you’re already convinced that XML has a place in your future and just want to get started on “how”, you can skip this section.  If you’re interested in the “why”, here we go…

A few weeks ago, I mentioned to John Alden that I was doing a lot of XML-centric work these days. 

NB: Some of you may remember John Alden from US Fox conferences or from conversations on FoxForum. He is the author of Raven, a fascinating and unusual Visual FoxPro application, of a type generally referred to as an expert system. Raven “learns” and stores complex decision trees created by experts in various fields and then generate recommendations, diagnoses, or other comprehensive results for less-expert users.  (If you want to know more, reach John at mailto:alden@tychosoft.com.)

John responded to my e-mail as follows:

[John] XML is where it's at, yet I keep looking for how I might need to use it and not finding much. I'm sure that's because my app is so off the beaten path... 

He couldn’t have been more wrong.  And, because I’m the person I am, I proceeded to tell him so:

[Lisa] Any app that needs to communicate with other apps (IOW that is not a world unto itself) cannot be so far off the beaten path that XML won't find its way in. 
But you have to find that yourself (and you will).  There is no point in deciding that it is "where it's at" and just "going there".  You will just get there w
ithout knowing how it happened, when you need to <s>.

At this point in writing my message to John, I realized that I had an important task in undertaking this topic at the German DevCon this year. 

I certainly don’t think you should magically believe that “XML is where it’s at”, no matter where Microsoft or anybody else says you should want to go today.  And it’s also true that there are some things you have to experience yourself before they can be meaningful.  There is no reason to expect you to derive meaning, or internalize implications, by simply hearing about my experiences.

Still, this session is designed to give you a push in the right direction.  I want to let you know why I think XML is so important to your work and how I think you can start using it, now and in the near future, to expand the horizons of VFP applications.  I will tell you something about my experiences and extrapolate some general meaning from them, if possible.

No app is an island

Part of the battle has already been won.  All of you have written “collaborative software” already.  Perhaps you haven’t used DLLs and ActiveX controls explicitly.  Maybe you haven’t once exported data to a waiting Excel application or used ODBC to attach to SQL Server or Oracle.  You’ve still written collaborative VFP applications.

These days, you are still making use of external components in your VFP applications every time you use GETFILE().   Your VFP app is still dependent on the outside world for help every time you send a REPORT FORM TO PRINT out through the Windows printing subsystem.  If your report contains a letterhead image embedded in the FRX, VFP calls the image’s host application when rendering the graphical portions of the page.

All application development environments, including VFP, have weaknesses and limitations.  Your users do not accept that as an excuse for limiting functionality, so you write Fox applications that collaborate with other software to get the full job done.

As soon as you write collaborative software, the different components have to talk to each other. XML gives us a way to have a maximum number of components talking to each other with a minimum of fuss.

So… don’t we have COM for this purpose already?  Don’t we have established ways to allow applications to communicate?  Not really.  XML extends your ability to communicate to other environments and applications that do not, or cannot, follow COM rules.  Even within a COM-world, XML makes it easier to communicate through firewalls, to transcend configuration hassles, and to alleviate character set and codepage issues. Beyond COM, as you’ll see, the two communicating partners do not have to share much at all. 

FoxPro DOS, still great for lots of uses, comes to mind!  FPD is not a COM client.  But it can parse XML, and it can provide data in XML form.  So can almost anything else.

It's true that not even XML is going to spare us from (for example) Fox's non-standard representation of NULLs, or (for instance) vagaries in VB's understanding of arrays.  But XML and its sister- and daughter- technologies are better than anything else ever has been at reconciling these, and other, issues. 

The people who work on the XML and related standards have a lot of hard-won experience. They are putting it all to good use, designing XML to be both exacting and forgiving, both explicit and inclusive of as-yet-unknown architectures.  Just as importantly, these standards are catching on in the real world.  All that standards and design work must be paying off!  People like using this stuff!

By writing VFP applications that expose XML interfaces, you make your work accessible to more types of developers, who can hook together more types of functionality, than ever before.

As I continued in my message to John:

[Lisa] Before you even get to components... an app like yours has a natural XML tie-in in terms of output.  You are writing and preparing output for a completely unknown set of audiences, so you should be preparing it for a completely unknown set of output devices and presentation formats.  What do you think is the right way to do that <s>?

And now some people at this conference, those of you who have been reading my essays for several years, immediately know why I am so interested. 

Publishing everywhere

Starting in 1997’s Frankfurt sessions, I’ve been talking to you about something called “the one document process". I’ve been looking for ways to make it work, for much longer than that.  Lots of other people were searching for ways to make this come true, too, even though they might have called it something else.

In a one document process, we store information only once across an enterprise, no matter how many business units need it, no matter how many formats they want for views and reports, even data input screens. When it comes to the metadata that describes our application, the data that describes our work as developers we also have one document.  This one document, or dataset, starts with requirements, goes through testing, and proceeds to end-user help files.  We generate the necessary output materials as customized views and formats of one set of metadata.

I don’t mean there is one storage facility for every piece of data (like an Access MDB).  I simply mean that data is only stored once, wherever it is stored, and tied together with other data as needed.  You don’t write a set of use cases or scenarios for your requirements document, for example, and then have the testers start from scratch to write use cases or scenarios. There is one set of use cases, even though they may look different when the testers use them from their presentation in the requirements document.  They may be prioritized differently, or offer different sets of details, in the requirements document and the test plan. The same use cases, again presented differently, help end users understand how the finished application works.

In VFP, we were limited in this goal, partly because our abilities to interchange data and present data were not complete.  VFP did not suffice for all input, storage, and output needs; it needed collaboration.  For example, we might have Visio diagrams indicating process flow and functionality that were integrated into our metadata describing the application. 

We weren’t able to communicate with other components effectively. ODBC, COM, and ADO helped somewhat on the data interchange side but we never really surmounted the limitations of the Report Writer to “speak” properly, or present, the results of our work. 

With XML, the one document process for development work finally becomes a reachable goal, without all kinds of kludges.  It gives VFP the tool to write once, publish every-where, every-what, and every-how.

Some of you may be wondering, how do all the different views of the single document stay synchronized?  For instance, suppose somebody adds a use case late in the development cycle, to match a new requirement.  How do you make sure it shows up in the documentation, other than publishing a text file labeled README.TXT?

The answer lies in establishing a “publish and subscribe” system, in which all known consumers of a piece of data register their interest in that data.  Typically, the consumers indicate how often they would like to be notified of any changes to the data, and/or receive brief notification of all changes, in a form that allows them to evaluate whether this change is pertinent to their “view” of the data.
Those of you who are interested in design patterns take note: in this system, we have an Observable  (the one document data source), and a lot of Observers (the document views).  In a typical system, the document object maintains a Changed flag, and also sends out a notifyObservers event to its registered Observers.  The notifyObservers event often passes a reference to an object providing details on the typeOfChange.

Any Observer, such as the help file generator system, may choose to re-generate some items on the fly, cache others every time a change is noted, and recreate others only when specified events occur.  For example, a list of links on the web in an on-line help file, showing recommended resources, might be sourced “fresh” from the data source every time it is needed, to keep it up to date.  Meanwhile, a listing of examples shown in this help file could be cached on the web server, re-generated only when the examples have been changed in some way.  The API listing or the table of contents, for this same help file, would only be re-generated when a release version change occurs. The Observable datasource remains unaware of these distinctions, of course. 

Where we will use XML (this is not about the Internet)

Rick Strahl is going to show you how all this affects VFP’s ability to function within a distributed application.  These days, it’s true, “distributed” often automatically means “working over the Internet”.  Some of my examples will have an Internet component or “background”, too. 

However, data interchange and data presentation are problems you have to solve whether you work with the Internet or not.  My examples and advice are intended to show you how VFP and XML work together within any component enviroment to solve these problems. 

What VFP applications do

Think about what all your applications do, and by extension what you do every day for a living.  They all do pretty much the same thing, even John Alden’s “off the beaten track” Raven (see figure 1 below).

Figure 1, An oversimplified, but comprehensive, view of VFP activity.

Essentially, you receive data input, you take care of it, and you provide data output.  VFP is great at doing these things, but that’s all it really does. Looks simple and straightforward, but as you know, the “devil is in the details”. 

If the data comes in from an IMPORT process, you have to write certain types of procedures to clean it and import it into your database, error handle, et cetera.  If the data comes from a human typist, you have to write entirely different procedures – to validate, provide feedback, and handle any visual cues and infelicities of the GUI controls in your forms and toolbars – to bring essentially the same rows and columns into your database.  If your client wishes data to come into the application from a new source – say, a Palm device or a bar code reader – you have to handle different cues, different feedback, a different interface.  You will also have to write some code to talk to the Palm device, which is code of a type that VFP is not well prepared to provide.

If your data output or presentation is going to an EXCEL spreadsheet, you may write some automation procedures, whereas you use a REPORT FORM to go to a printer.  If you send the data out to a fax, you face a different set of problems once again.  When your client says she’d like browser-based reports as well, you go back to the drawing board to start generating HTML in yet another, distinct, output procedure.

What XML-enabled VFP applications can do

Suppose all these input and output devices could accept and provide the same message format.  Each device would still have different limitations and different needs, but the procedures you used to address them all – to provide feedback, to deliver data, to communicate errors – would be the same.  In a client-neutral strategy, your application activity and the code you planned to write would be simplified in structure, looking more like figure 2 below.

Figure 2, A “client-neutral” VFP application has a simpler job.

In fact, we could take this a step further and erase the distinction between data input and data output, from the point of view of structuring the code you have to write to collaborate with other applications.  Collaboration and messaging between components would be reduced to a series of requests and responses, as you see in figure 3.

Figure 3, A “messaging view” of VFP applications simply accepts
requests and provides responses, with no distinction between input and output.

What should the common messaging standard be like? There are several reasons why XML is a good choice for messages between disparate clients and servers. Among them:

  • XML is a format easy for multiple types of participants to create and decode, because it is not binary.  Yet it makes provision for multiple extended character sets and encodings, and does not require the participants to share these character sets or encodings.
  • XML easily handles relational data, because of the way elements nest   Lookup elements can be handled with external references to other documents, the use of ID and IDREF attribute types in the DTD, and many other schemes.  Yet it is not restricted to a regular format like relational databases.  Binary data of any imaginable rich data type can be designated using NOTATION-type attributes or the XLINK standard. Like Lotus Notes and object databases, XML can “stretch” to irregular formats that easily represent real-life needs without following a pre-designed structure. 
  • Unlike EDI, XML messages can be broken up with separate sections sent in parallel, for better performance.  Also unlike EDI, it is not expensive to get started in XML and participate in XML exchanges.
  • XML passes easily over HTTP transport, and can pass through firewalls.  Yet it doesn’t execute or trigger code directly, even though an XML message may describe code to be executed.  This makes it easy to send and receive, but also allows security dangers to be minimized.  The receiver is in charge of what code can be described and executed on the receiving system.

With standardized XML requests and responses as your collaboration strategy, you do not plan separately for separate clients.  You define the following elements, the same for all clients:

  • Define a message, or as many message formats, as you wish, to expose your public interface, your way of working with other applications.  (Don’t go it alone – investigate SOAP or Hewlett Packard’s e-speak, or one of several other available standards for defining document formats.  But if you choose to invent your own format(s) instead, as you’ll see, you haven’t precluded any opportunities. )
  • Define a way or ways in which you will allow the request messages to be passed to your applications (will you use a COM message call accepting an XML string, or a filename, or both? Will a command-line argument indicate a file to be processed? Will your application look for files in a particular physical location representing a task queue?).   See “a short but pertinent digression”, listed as Appendix A at the end of this paper.
  • Define your application’s method of passing back a response back to its collaborators (usually, responses are provided according to a similar plan as the requests were accepted).

Nobody does it better

When you see your work put in these terms (a series of uniform requests and responses), it almost looks as though anybody could do database development work!  Not so.

Remember that while we’re simplifying the act of collaborating, you still have to distinguish between input and output to perform the actual data work that is the real raison d’être of a VFP application.Once you receive input data, you have the same storage and manipulation tasks as before.      To send the data, you still need to manipulate complex data relationships and calculate results, as before.

We’re just removing the distractions, getting you back to the work you used to do when you wrote FoxPro applications.  If you remember, you had plenty of work to do then, and things haven’t changed.  Data is still complex, and proper data storage and analysis takes your professional skills.  From what I’ve seen in the last couple of years, businesses and organizations of all types sorely need your skills focused on the work of data management, not siphoned off into understanding how a serial port or some other interface to an input or output device can be coaxed to work with VFP.

You don’t have to change all at once

Suppose your applications currently have a standard Windows GUI interface, do you have to stop using it or change how it works to start providing this common message interface?  Suppose your applications currently use REPORT FORMs to provide printed output, do you have to stop?  No, of course not.

You define your XML requests and responses based on the kinds of input your applications need, not necessarily to replace them.  They can exist in parallel with your current activities, while making room for new clients, without a problem. 

For example a database update XML request would include elements currently provided by your data input screen.  Your data input screen can continue to function as before, while the other clients used the XML request-processing to handle updates.  Whenever you’re ready, you can retrofit the data input “save” button to construct and send the same XML update request to your database procedures as all other clients.

In addition, there may even be some new clients who cannot “speak XML”.  For example, the Palm device I mentioned earlier doesn’t yet support WML, the set of XML tags designed as a standard for wireless devices.  (The next generation of Palms almost certainly will.)  You have two possible resolutions to this problem:

  1. Create a completely new method of handling Palm input, making your application “Palm-knowledgeable”.  In this resolution, your application evaluates where the request came from and, if it were from the Palm, reads the request differently than it would read other requests, constructs different responses tailored for the Palm.
  2. Accept the Palm input and “translate” it directly behind your public interface to a client-neutral XML version.Create a proxy client, inside your application, which invokes standard XML request processing by the rest of your application.      The proxy client receives a standard XML response, which is then translated again – in the same, quarantined space – into a Palm-specific format relayed back to the real client.  The proxy client doesn’t do the translation itself, it just provides an intermediary to relay requests and responses.   For example, the VFP form “save button” we described above might call it.

The first strategy, needless to say, is not what I would recommend.  It is of limited value, since it fits only one client for only a short period of time. 

The second strategy gives you room to handle additional collaborations that are of a non-XML type whenever they come up.  You would use this translation and proxy strategy if your customer asked you to handle an EDI exchange.  The bulk of your application remains “Palm-“ and “EDI- unaware”.

How we use XML in VFP

In our work for Acxiom Corporation, XML has allowed us to be completely “client neutral”.  We receive requests, and send out responses to Java clients, Flash clients, VFP clients, VB Script clients, Perl clients, Palm clients… you name it.  As described above, we sometimes have to add a thin translation layer when the client doesn’t speak XML natively, but most of the processing continues exactly the same for each request.

Figure 4, Acxiom architecture shows XML and XSLT
at work both in internal and external exchanges.

Our XML-handling mid-tier has changed backend data resources, without the client being aware of the difference, more times than you can imagine.  These back end data resources are not accessed through the same interfaces or mechanisms -- we have been through ODBC, sockets programming, and CORBA – and they don’t even have the same structures.  The client receives the same XML response, all the time, no matter what.

Actually, the client is completely unaware when our mid-tier changes, too.  So sometimes the mid-tier includes VFP in the solution and at other times it does not.  All I can say is the solutions that include VFP are more robust than the alternatives!  Our goal has been to maintain what the client experiences, through each change.

You have probably heard people say that XML doesn’t do much good unless people agree on document formats or DTDs.  In fact, this is unnecessary when you add XML’s sister-standard XSLT (eXstensible Stylesheet Language: Transformations) to the processing.  As shown in figure 4 above, we have an architecture in which different parts of our own enterprise don’t share the same format, and we use XSLT to move between them.  On the client side, our XML-enabled partners have their own requirements for document format, and XSLT comes to the rescue again.

XML document design: a detailed look

I will talk about the problems that XML and XSLT have allowed Acxiom to solve, with detailed examples, during the session, and the next section will show you all the techniques we use.

First, let’s look at a separate example of a design problem, using something closer to home: the COVERAGE.APP that ships with VFP.  It will give you a chance to think about those important definitions I mentioned earlier: what XML document structures will you accept, and provide, as messages?  The design possibilities are almost unlimited – certainly less structured than relational database design – so how do you choose? 

The COVERAGE.APP provides an example with some unusual aspects, but, at the same time, it’s is an obvious showcase example for VFP-using-XML.

The problem to solve

You may recall that, starting in VFP 6, the Coverage log offers a “stack column”, which is not represented in the standard COVERAGE.APP interface or analyzed by the VFP 6 Coverage engine.  This omission exists for two reasons:

  • The stack information was added into the Coverage text log relatively late in the VFP 6 development cycle.
  • The shipping Coverage models Profiling and Coverage aspects of the log that do not lend themselves to the visual representation using the same interface as the nested stack levels of programs.  They are different dimensions of the same log data.

In previous conferences, I have demonstrated ways to include the stack information in the Coverage interface using a separate dialog, since you can add any additional dialogs you want into the standard UI (see figure 5 below).  If you were interested in stack levels and not concerning with Coverage and Profiling, you could also build a special Coverage interface on top of the Coverage engine, with this type of dialog as your main display.

As you can probably tell looking at Figure 5, I used a grid to represent the stack information in the Coverage log.  Underneath the grid, of course, was a VFP cursor.    In the cursor, I added columns as the stack levels grew deeper.  With rows representing code components, such as classes and programs, and stack levels represented as columns, each new stack level could be represented in the grid.  You simply scrolled far enough to the right or left. I built some extra features, such as tooltips on a cell level, to overlay more information into this interface.

This was convenient – because cursors are natural to us in VFP – but it wasn’t an entirely apt metaphor. Nested levels of programs were awkward to display using this flat structure of rows and columns.  Although the grid allowed me to represent the stack, it placed the visual emphasis on the wrong elements. Tracking all program elements that happened to fall in stack level 6, for example, was easy, paging through the grid, but didn’t present meaningful information.  I couldn’t turn the row-and-column structure sideways, making stack levels rows and program elements columns, because I’d easily run out of available columns in the cursor.

Attempting to use a relational structure was even worse.  Stack levels to code components is actually a many-to-many relationship.  This relationship can be modeled relationally, but it does not “feel” natural – nor does such a relationship display “naturally” in a VFP interface.

A tree view might have been more apt as a metaphor, to represent the stack.  But it takes one heck of a treeview to display the variety of calls and program elements in any Coverage log.  It would have been slow to create, and memory-intensive.  What’s more, this representation would not allow me to represent the various statistical elements of the Coverage log entry properly, without a kludge.  The display would have been attractive and evoke a stack, visually, but would not have lent itself to statistical analysis.

Figure 5, Coverage VFP6 interface displays stack levels in a special dialog,
along with other types of display for different purposes.

The XML resolution

XML easily solved these problems, and provided a reasonable structure for Coverage stack results.  Using Internet Explorer as a default display device – easy to override – provided a simple presentation, and allows the user to collapse and expand the various “limbs” of the stack tree, just as a tree view would, but at far less cost. The XML log also is also economical in its use of disk space.  The XML stacklevel document representing a typical TASTRADE.APP run, derived from a Coverage text log of 6 MB, requires about 800k of space.

Our first step, then would be to design an XML document that fit the data properly.

Figure 6, below, shows one of two separate StackXML documents generated by the VFP 7 Coverage engine; the two documents analyze the log slightly differently, but as you can see they are fairly straightforward. 

The document level tag shown in the figure is a default version of the tag name. Refer to Appendix Ba sneak preview of new Coverage engine PEMs in VFP7” for list of new properties and methods added to the coverage engine class in VFP7 to support StackXML.  This tag name is one of the “tuneable” options available from COV_TUNE.H, as you’ll see in the appendix.

In the “standard” document, shown in figure 6, all calls from one program component on one level will be included from one “root”.  For example, if main.prg appears as a stacklevel 1 program, then you will only see one element under the XML document-level element for any successive level 1 main.prg lines in the log.  (By “successive” I mean lines in the logs that may be separated by other programs at higher stacklevels, but none at the current stacklevel, 1 in this case.) A stacklevel 1 program with a different name, of course, would have its own tree.   This can happen in a log, even at level 1, if you execute various applications from the command window while maintaining a single Coverage log.

If you choose “extended” StackXML, the engine generates a new parent tag whenever the Coverage text log has gone down into the stack, back up to a particular stack level, and then proceeds down again.  In this format, main.prg could proceed to several level-2 calls, each of which would have its own tree.  When the log next shows a program component at stack level 1, this XML format closes the first stack level 1 tag and opens a new one, whether or not the successive lines at this level were from the same program component.

The “standard” document is more compact and more intuitive if you are looking for an overview of “what calls what”.  The “extended” document structure is more accurate, especially if you are doing performance analysis of the separate trees, because it takes into account the fact that the parent program may have been re-invoked, possibly with different parameters, rather than continuing procedurally after the called program returns.

Looking carefully at this resolution

It would have been possible to offer only one base document (the extended one) and then use XSLT transformation to provide the other document on demand.  However, providing two base document structures gives each both equal weight as valid base representations. 

Figure 6, a sample of Coverage 7 StackXML output.

We don’t ordinarily produce different document structures every time we have a different need.  You can think of the Coverage API as allowing you to make two separate requests for two separate data sets, not two representations of the same data, even though they came out of the same log.

When do you provide additional requests and responses, and when do you opt to externally transform a single XML document from one format to another?  There is probably no clear-cut line between the two practices that will fit every case.  But looking at the data critically and asking “is this a new data set or structure or is this a new representation of the same data?” will help you make the decision.

A third, valid data structure might have separated out the “ON EVENT” lines of code.  The Coverage text log doesn’t give you complete information for these lines; you can’t tell what bar or key label was actually invoked.  Since these lines can’t be tied to a line of code, they can’t be statistically tallied in the same way.  (The standard Coverage interface includes these lines of code as “never marked” or “unmarkable” to make them statisitically netural.)  In both the standard and extended StackXML data structures, these lines have “N-A” as the attribute value for their “RunsFor” attribute.  This is one way to signify their special status to any code running on this data.  However, removing them from the main tree would have been equally valid, and perhaps easier for people doing calculations with this data. XML is flexible enough to handle “inconsistent” or “irregular” data sets – much as the appendices of a book don’t have to use the same format as the rest of the text.

Eventually, you have to arise on one structure that seems appropriate for your data design, and move on.

Consistency is the hobgoblin of small minds and small systems

You’ve decided how your data should be expressed in elements, in this case how the lines of code shown in the text log are arranged and summarized within these XML structures.  Even though the structures are more elastic, and the choices more varied, than rows and columns and tables in a relational database, it’s a similar design task.  You’ll probably enjoy it.

You can’t use the VFP Database Designer, or other design tools you’re used to, to work out your XML document structures.  You need a different IDE. 
I use XML SPY 3.0, a shareware tool from Icon Information Systems, for my XML and XSL design work.   It’s the most flexible of the design surfaces I’ve tried, allowing full validation and intellisense but still providing a “text only” mode when I feel like doing some raw typing. It even incorporates an IE Browser view, so you can see the familiar Internet Explorer collapsible and context-colored version of your document. There’s unlimited undo, a project view, customizable toolbars, and all that good stuff -- but it doesn’t get in my way.
Check out http://www.xmlspy.com/default.html.

Now, what about the format of the tags and elements? The Coverage StackXML documents use a fairly straightforward set of tags and attributes (tag names are code elements, attributes give you statistics about each element).  No particular standard has been followed here, just a desire for the default display (Internet Explorer’s rendering of the document) to be easy to follow.

Why not use something like the various XML formats Microsoft makes available for SQL Server, ADO, or Office – each of which, incidentally, can be generated by VFP 7 using the new CURSORTOXML() function ?  Attractive as the thought of using the new function is, and flexible as its features are (see figure 7 below), it was impractical to use the actual CURSORTOXML()for  the Coverage project for the same reasons that a set of rows and columns doesn’t fit stacklevel data very well: this function emits cursor-shaped, rather than nested-tree-shaped structures.  Still, we could have made an attempt to provide a similar-looking structure, in element and attribute names, to make people feel that the StackXML “belonged” to one of the standard structures Microsoft promotes.  No such attempt has been made. Why not?

Figure 7, VFP 7’s CURSORTOXML() function is great for regular,
cursor-shaped data, and gives you lots of good choices with its arguments and flags.

Here’s why not: this type of consistency is relatively unimportant.  You may be raising somebody’s initial comfort level looking at your document because it seems familiar, but you may be making it harder for that same person to actually get work done with your data.

  Be as true to the needs of the data, and as clean as possible, when designing your XML formats. You will find that your XML partners use different element and attribute names and styles, no matter what standard you choose (as you see, even Microsoft provides at least three, and other XML-enabled partners will have lots more!).  You transform, or map, one way of expressing the data to what your XML partner needs dynamically, using XSLT, whenever necessary.

Let’s face it: the various Microsoft standard XML document formats, and most of the other ones, are fairly ugly and verbose. You’ll also get dragged into religious issues such as “should a document be element-centric or attribute-centric?  Most of the time, it doesn’t matter.  If you prepare yourself to serve multiple clients and handle any and all standards via XSLT, you can use whatever you want internally.  Even participating in a e-Speak or SOAP exchanges, where the broad outlines of the message format are required by the framework, the data interchange portion of that message is up to you. 

You will learn design principles for good XML documents the same way you learned good table and database design (only much quicker): by building them and seeing what works efficiently.  For example, you will learn not to use attributes for an item that might someday deserve to have a parent relationship with some other data element!  It’s practically common sense.

Although the two Coverage StackXML document formats are legible on their own, the VFP 7 Coverage engine recognizes your rights and needs, as a partner in an XML interchange, to have access to transformed versions of these structures.. It gives you a quick way to do the transformation, using the TransformStackXML(tcXSLT, tcXMLIn, tcXMLOut, tlNoShow) method and some related features.  This gives you the chance, for example, to display the StackXML as straight HTML in a browser with a “greenbar” style, alternating colors as you move between stack levels, as shown in figure 8. Although figure 8 is an admittedly trivial example of a transformation, this image should help you keep in mind that the presentation of the data is always malleable.  This includes its position, its level of aggregation, et cetera.  You can, on the fly, decide that whole portions of your data structure are not necessary for a particular transaction (or that a particular customer does not have rights to view all the data).  A transformation easily masks these items out of your delivered version of the XML.

Figure 8, an XSLTransformed view of Coverage StackXML.

In the next section, we will go over the technologies we use to perform the major miracles of XML document creation, parsing, and transformation with only minor pain.  Before continuing, however, I should mention that there is another HTML-related technology often confused with XSL because of its similar name: CSS. 

Although both CSS and XSL technologies are expressed in documents referred to as “stylesheets”, they are in fact very different and complementary in cases where your XSL serves up HTML output.  Transform your XSL into HTML or XHTML with class attributes, attach a CSS-type stylesheet reference in the <HEAD> element of your document, and you are all set to change the physical attributes (fonts, colors, etc) of your entire site or browseable document(s) without regenerating all your content.

If you want to see a non-trivial example of this strategy at work, look no further than the news articles at http://www.infoworld.com. Notice, in figure 9 below, that the news stories have an XML extension on the filename?  Now imagine the database from which the various elements of this page are generated: the banners, the main story in the center, the side panels.  Data-oriented people that you are, you can probably figure out the structure of the XML underneath – or at least a workable version of the structure – just by examining this page surface in the screen shot.  This content is very definitely created by many tables working together, each with a different structure, each type of element expiring on a different schedule.

The XML document underneath is just the beginning, however.  InfoWorld takes that same news story and surrounding content and delivers it in several other forms as well.If      you examine the source, you will just see the HTML end-product of the transform that is appropriate to your client (plus some CSS instructions, if your browser supports it.)

A “smart” pager or other wireless device receives a very tailored version of the same content.  A different browser than yours, with different capabilities, probably receives a different transformed HTML version than you see here.

Figure 9, XML at work at InfoWorld’s news site.

Here’s where we start our whirlwind tour of coding XML in VFP.  You usually design your document as a DTD or schema (in XML SPY, Visual Notepad, or your own favorite tool), and you put some sample data in it (just as you’d append records interactively in a newly-created Visual FoxPro table, to make sure you had the fields right).

Now we have to figure out how VFP will handle the three main tasks with real documents:

  • Creating the documents we send to partners, clients, and servers
  • Parsing the documents they send to us
  • Transforming between XML formats, as necessary, in between.

Creating the XML documents we’ve designed

In VFP we are used to creating strings on the fly, and they can be very large strings.  There is no reason why you can’t create a couple of classes designed especially for this purpose.  In VFP 7, SET TEXTMERGE and TEXT… ENDTEXT have been improved in ways that you’ll find useful for this work.  You can also build the document as a string and use StrtoFile(), introduced in VFP 6.  If the document is eventually saved to disk, I often use low level file functions rather than SET TEXTMERGE or StrToFile() for the best possible performance.

Manual document creation (some tips)

Most of us started off creating our XML documents in Fox manually, in this manner.  I’ll give you some advice, in case you want to do this, and then we’ll go on to some alternatives.

The method you’ll use to build up a document manually varies depending on the content of the document.  You’ll often SCAN through tables evaluating the contents of various fields, or include the property values of relevant objects.  You will find that the TRANSFORM() method becomes your best friend, because it allows you to convert everything to character type, as you concatenate your string or write out your file, without worrying about the original type of each value.

I’ve created a subclass of my FRXCLASS document generation system, which uses a template FRX to drive report creation.  The advantage of this system, which I’ve used previously for WinWord and HTML document creation, is that a single FRX, without any visible objects, can handle grouping and record pointer movement for you, while the events it triggers in the associated FRXClass object creates the real output.  The FRX2XML subclass is particularly good in that it generates a default representation of any open file, by default (the other FRXClass subclasses have to know something about your data before they can generate .DOC or .HTM files).  FRX2XML creates its default representation of a row using this VFP-native, or “manual”, method:

   PROC GenDetailRow()
     LOCAL liIndex, laElements[1]
     SCATTER TO laElements
     FOR liIndex = 1 TO ALEN(laElements)
THIS.cXML = THIS.cXML + ;
                            
                    THIS.XMLNode(FIELD(liIndex),;
                                 laElements[liIndex])
     ENDFOR
   ENDPROC

This method is neither a recommendation nor necessarily the best way to create XML that looks like a row available to you in VFP – I’ve done the same task numerous times and keep coming up with different variations.  (It never ceases to amaze me how many good ways Fox has of massaging data and text.)  I’ve included all the source for FRX2XML, its superclass FRXClass and the other format-specific subclasses of FRXClass, with several examples, as part of the source for this session.  You will find a text document named FRXClass.API with full instructions.  I find it very convenient to use FRX2XML for various manual parsing situations, including another example application I’ll present later in this paper, and you may too, even though it is just one method of many.

You knew it had to happen.  I brought up VFP REPORT FORMs, so we had to discuss at least one bug in the product, FRXs are such a fertile field for them!
Be aware that VFP 7 has added a new application-level property, .LanguageOptions, which allows you to specify strict memvar declaration.  Unfortunately, there is no way to get the option to give you compile-time errors – you’ll just have to turn .LanguageOptions on and test all your code to see what variables remain undeclared.  (Use the Coverage Profiler to help you find code that you haven’t tested!) 
Even more unfortunately, , at this writing report variables can not be declared so they don’t cause an error when strict memvar declaration is turned on.  No word yet on whether this will be fixed before release. The Coverage application saves and restores _VFP.LanguageOptions setting in code surrounding the REPORT FORM command in the DisplayProjectStatistics()method.  There is a #DEFINE that will remove this kludge if they fix the bug. If you use  use REPORT FORM… NOWAIT, not even this kludge will save you (at least in the current build).   You might have to turn _VFP.LanguageOptions off in an early DataEnvironment method, and turn it back on in a late one.  I haven’t tried this.

No matter how many times I create a manual XML document creation procedure, however, there are a couple of XML principles I have to keep in mind.  You’ll need to follow them, too, for any manual parsing you do.

Here is the most critical rule: be careful, if you build the documents manually in this manner, to make sure your documents are well formed before passing them to another application or a user.  An XML document is well-formed if it follows the syntax rules of XML, including the necessity of closing all tags and having a single root node.

NB: The output from an XSL transformation is not required to be well-formed, in that the single document or root node is not required.  It is required only to be well-balanced (all the tags that exist have to be closed).  This output is considered to be an object of type Document Fragment, not Document, within the standard.  These document fragment objects can be concatenated together, or further built up with additional elements using other methods, before being released as a final document object.

Don’t confuse well-formedness with validity.  An XML document is not required to have a DTD (document type definition) or a schema.  If the document does reference a DTD or schema, it is required to be valid according to that DTD or schema.  But every XML document is required to be well-formed, regardless of validity.  Although your manual string-building may not be able to tell whether or not a document is well-formed, many of your clients for the document will not be able to use your document at all unless it is well-formed.  They are using standard XML parsers to handle the document you send – and the document will not load in the parser if it is not well formed.  It is the responsibility of the application formulating the XML to make sure the XML will be parse-able, without any further adjustment, by the receiver of the document. 

How can you check for well-formedness?  The easiest way to test the XML you’re generating manually is to load your results into a parser conforming to the XML standard, at least for debugging purposes. 

To check from the command window, simply CREATEOBJECT(“Microsoft.XMLDOM”), which gives you a reference to the Microsoft XML parser.  Now load your document into the parser as a string using the LoadXML(tcString) method, or use the Load(tcFile) method for a file on disk.  If the method returns .F. (or errors, depending on your version of the Microsoft parser and/or your version of FoxPro!), you should be able to check the <ParserReference>.parseError.reason property to see what went wrong.

If you don’t mind leaving the VFP environment for your tests, load your XML output into _CLIPTEXT and move over to an XML-oriented IDE like XML Spy.  In this type of editor, you will get more exact information about your error, and usually more help in resolving it.

An important error that people often make when creating documents manually is to omit translating characters used in expressing XML syntax when they include data elements. There are only five such characters, but if your data includes them the document will not parse, because the parser won’t be able to tell where tags and entity references begin and end.  Here’s how FRX2XML handles the problem:

   PROC XMLTransform(tvElement)
LOCAL lcElement 
            
      lcElement = TRANSFORM(tvElement)
      lcElement = STRTRAN(lcElement, '&', '&amp;'   )
      lcElement = STRTRAN(lcElement, '<', '&lt;'    )
      lcElement = STRTRAN(lcElement, '>', '&gt;'    )
      lcElement = STRTRAN(lcElement, '"', '&quot;'  )
      lcElement = STRTRAN(lcElement, ['], '&apos;'  )
   RETURN lcElement

Another way of resolving errors arising from XML control characters in data is to surround the data in a CDATA block, which tells parsers not to parse the contents of the block at all.  In this way the offending characters are masked from the parser.  I think this is less flexible, however.

Ensuring well-formedness (and validity if your scenario includes a DTD or a schema) is quite a bit of work if you create documents manually.  Your alternative in VFP 7 might be to use CURSORTOXML(), which I described earlier, and which removes the problem from your hands.  The problem you’ll face, as we saw in the StackXML example, is that the resulting document will be more closely tied to your database structure than might be appropriate for the contents of your data.

A better alternative is to use an XML-standard parser object model to construct, or finish constructing, your document.  For document construction, you will usually use the DOM (document object model) parser we’ve already mentioned briefly.  In the next section, instead of using the DOM parser just to test well-formedness, we’ll use it for document creation.

The FRX2XML class can just as easily generate its XML using a reference to the DOM parser, as FRX2DOC uses a reference to Word, rather than the manual parsing currently used in this class.  It’s too useful a system for me to stop using, even though the mechanics driving each form of output changes over time.
FRX2XML is probably the last FRXClass subclass I’ll ever write.  That’s because , with XML available, I’m not really interested in getting VFP to generate other formats such as RTF or PDF.  I’ll use VFP to generate my XML document structure, and then transform the results using objects and applications expressly written to provide the various output formats, using the principles of XSL:FO (formatting objects).  You can check out a commercial XSL rendering engine application at RenderX (http://www.renderx.com/) or FOP, an open source XSL renderer, at http://xml.apache.org/fop/
The XSL:FO standard is, unfortunately, not as far along as the other XML-related standards, possibly because established cross platform alternatives, such as PDF, do exist.  This is an area I don’t know much about but will be pursuing in the future. 
MS - DOM parser syntax for document creation.

Return to the parser object reference we used earlier. Because this is an XML-intimate,object, you will immediately see how it protects you from some of the problems you face when you generate documents manually.  For example, try this at the command window:

ox = CREATEOBJECT(“Microsoft.XMLDOM”)
oy = ox.createElement("xxx")
oy.text = "<>&'"+["]
? oy.text
* result: <>&'"
? oy.xml
* result: <xxx>&lt;&gt;&amp;'"</xxx>

See the difference?  The XML for this newly-created element object is appropriately transformed to include the entity references.  I have not yet appended this element into an XML document, using the parser – but when I do, this element will load, and become a valid part of a parseable document without any problems.

The DOM has rich syntax for creating nodes of different types and using them to build documents, of which the above is a small sample.  The syntax I’m showing you in this section may include some Microsoft extensions, which means that you will not be able to do exactly the same thing with other parsers.  Even within the standard, other parsers beyond Microsoft’s may use different method names for the same functionality.  Basically, however, they all work the same way.

Here is an excerpt from the VFP 7 Coverage engine’s GetStackXML(tcLog) method, to give you a more extensive example.  I’ve added some comments so you can see what the different parser-related activities are:

* instantiate the parser:
loXML = CREATEOBJECT("Microsoft.XMLDom")
* set up a root node, an empty document, 
* by loading a string:
loXML.LoadXML("<?xml version='1.0'?>”+    ;
   <"+COV_STACKROOT+"-"+JUSTFNAME(THIS.cSourceFile)+"/>")
liStackLevel = 0
SCAN
  * I have omitted a section here in which
  * the current record is checked to figure out
  * whether it is an “ON EVENT” and what its parent node is
  IF llAdd
loThisNode = NULL
            
      IF NOT THIS.lStackXMLExtendedTree
         * check to see if this element already exists
         * at this level, using an XPATH expression
         * This expression can be read “wherethe parent node
         * name matches the one I stipulate, the nodename matches
         * the one I stipulate, and the stack level is the same
         * as the level for the current log entry”
         loThisNode = loXML.SelectSingleNode( ;
                       ""+loParent.NodeName+"/"+ ;
                       lcTag+"[@StackLevel="+TRANSFORM(FStack)+"]")
ENDIF
            
      IF ISNULL(loThisNode)
         * create a node
         loThisNode = loXML.createElement(lcTag)   

         IF llEvent
            * set an attribute on the node
            loThisNode.setAttribute("RunsFor","N-A")ELSE
                        

         
            loThisNode.setAttribute("RunsFor",TRANSFORM(FDuration))
         ENDIF
         * set another attribute
         loThisNode.setAttribute("StackLevel",TRANSFORM(FStack))
         * add the child node to the parent --
         * since the document is “living”, appending the
         * child to the parent immediately changes the contents
         * of the document XML as a whole:
         loParent.appendChild(loThisNode)
ELSE
               
         IF NOT llEvent
            * add the current duration statistic
            * to the value of the current RunsFor attribute
* for the existing node.                Again we use an XPATH
            * expression, this time to grab the value of an 
* attribute rather than a reference to a node:          
         
            loThisNode.SetAttribute("RunsFor",;
           
         TRANSF(VAL(loThisNode.SelectSingleNode("@RunsFor").nodeValue)+;
                   FDuration))
ENDIF
      ENDIF
               
      
      loNode = loThisNode
ENDIF      
         
   * end of XML-adjusting code, some
   * maintenance code here…
ENDSCAN

The code above is not complicated, given a little time spent with an Object Browser and the MSXML parser.  The only part that will be strange to you, and take some practice, is the XPATH syntax, used twice in this code snippet.  XPATH is a kind of query syntax, which you can use directly in the MS parser to select a single node or value (as used above, with the SelectSingleNode(cXPATHExpression) method) or to get a reference to a collection of nodes sharing some characteristics (use the SelectNodes(cXPATHExpression) method. 

XPATH is odd looking but very powerful, something like “regular expression” syntax.  In both direct use with a parser, as here, and within an XSL transformation, XPATH allows you to winnow through very complex document structures, using object references and variables, and setting multiple filter conditions on different levels of your query expression.  As you can see when you contrast the two uses of XPATH above, you can apply the query expression either to an entire document or the portion of a document represented by a single node reference (which includes all its children).

The DOM syntax and object model has a few other concepts you may find strange at first.  For example, it’s often difficult for people to realize that the text surrounded by two element tags (for example the word whatever in the document fragment <mytag>whatever</mytag><someothertag/>) is actually an object, a node of type text, not just a string.  Sometimes the Microsoft parser will let you forget about this distinction (other parsers usually won’t) but often you will have to treat this text as the proper text node object type to get the results you want. 

Consider the following javascript snippet, which changes the text for a particular element if it has been previously filled, and otherwise just adds new text.  If we were editing the document fragment above, this code would change the whatever text to some new text.  If <mytag> had no current text, this code would simply add it.  In this code you can see how the text is being treated as an object and also how it exists as a child node of another document element:

var oNode = httpXML.selectSingleNode(tcQualifiedNodeName)  ;
if (oNode) {
   var oTextNode = httpXML.createTextNode(tcNodeValue)  ;
   if (oNode.hasChildNodes()) {
      // we should be at the bottom level
      // at this point, in this particular document
      // so there should only be one child, the 
   text node.
      // If oNode could have other children, such  as
      // child elements, I might
      // write this code a little differently:
      var oOldChild = oNode.firstChild ;
      oNode.replaceChild(oTextNode,oOldChild) ;
}
   else {
         
   
      // no text yet, just add it
      oNode.appendChild(oTextNode) ;
   }
} 
Recommendations

Although using the DOM parser in this manner is powerful and flexible, I recommend you use a combination of VFP’s string-handling abilities and the parser to create your documents.   In VFP 7, use the native faciities, such as CURSORTOXML(), too. An especially good technique is to have a skeletal or template document ready, either on disk, in a memo field, or built as a string.You      don’t worry about making sure that this template is well-formed because it is a known part of your system, not something created out of unpredictable parts at runtime.   (You checked it for well-formedness when you created  it, didn’t you?  You won’t need to check it at runtime). 

You load this template into the parser and then use a few parser calls to add the dynamic values for this instance of the document using code similar to the javascript above.  For example, you could have a request document ready for somebody else’s server that included a complex request format, complete with a number of elements you set the same way every time you make a request of this server.  All these options and the general request document would be included in your template. At runtime, you add your user ID and password into the appropriate nodes, add a few request parameters specific to this occasion, and you’re ready to send the request. 

If some parts of the request parameters were repetitive (for example, multiple names and addresses), you might have a cursor holding the various instances of the request within one document.  You’d use CURSORTOXML() to generate XML for these the sections. There is no reason to have COM calls, building static sections of the document, just for the joy of creating and appending elements to the DOM model!

When your application functions as a server, you may want to have many response templates as appropriate to different sorts of clients. (the InfoWorld news article example discussed earlier may be “kickstarted” using this strategy).  Web servers can evaluate what type of client has made a request both to determine an appropriate template and the an appropriate XSL transformation to apply. 
I recommend you check cyScape Products for a look at BrowserHawk (http://www.cyscape.com/products/ ).  This product gives you much finer understanding of your client than other web-server-based methods, and also can be used on multiple servers, in multiple environments.  The latest version will tell you about the user’s connection speed, available plug-ins, and a lot more – plus it is intelligent about a wide variety of clients (see figure 10 below).

Now that you have some sense of how you construct an XML document, how will you handle the XML documents you receive? The next section will give you some ideas.

Figure 10, BrowserHawk documentation shows you
how this product can give you critical aid in determining “who you’re talking to”. 
This helps you figure out what type of data interchange or
presentation is best suited to the current “conversation”.

Handling documents (parsing) –

You can use the DOM parser described above to handle the documents you receive from partners in XML exchanges.  Using XPATH query expressions, you can easily grab one or two crucial values from the document, and discard the rest.

Be aware, however, that DOM parsers load documents into memory, so this may not be the best choice for large result sets.  This is not a necessary limit of the DOM, it is just a limitation of DOM parsers currently released.  (It is easy to imagine a DOM parser swapping parts of a document to and from memory, just as VFP swaps cursors to and from disk depending on their size and available resources.)

Normal VFP string-handling procedures might be more appropriate here, given our ability to handle long strings and less concern with the possibility that you might unwittingly do something “non standard” when you are parsing, rather than creating, a document.  You can grab single values with an AT() search, or use low level file functions,  MLINE() (don’t forget the performance-enhancing _MLINE offset!) or ALINES() to scan through an entire document efficiently, inserting rows into a cursor as you go.

VFP 7 makes parsing easier than ever, with enhancements to ALINES() and ASCAN() that may be useful to XML development, plus a new STREXTRACT(cString, cBeginDelim [,cEndDelim [,nOccurence, [nFlags]]]) function targeted at XML string handling.

In VFP 7, you might also decide to XSL-transform the document so the result was “cursor-shaped”, matching one of the formats understood by the new XMLToCursor(cExpression|cFile, [cCursorName, [nCursorType | lStructureOnly]]) function. After you ran XMLToCursor  on your transformed document, it would be easy to manipulate and store the data using normal VFP methods.  This could be a very efficient approach, especially for documents containing multiple rows of results.

There is a second XML standard way of approaching documents, however, beyond the DOM, called SAX (the Simple API for XML).  You should consider it for document parsing, ,because it is efficient for large documents. 

Until VFP 7 we haven’t had a way to use SAX within VFP.  SAX is an event driven model.  Your program must instantiate a class that implements the SAX interface – something we couldn’t do until VFP 7.  When you tell this “document handler” to parse your document, it triggers the standard SAX events, and the code in your class runs as these events are triggered.

The code you’d write to instantiate the parser and parse the document would look something like this:

LOCAL loReader, loContentHandler, loErrorHandler
loReader = CREATEOBJECT("MSXML2.VBSAXXMLReader")
loContentHandler = CREATEOBJECT("FoxSAXContentHandler")
loErrorHandler = CREATEOBJECT("FoxSAXErrorHandler")
loReader.contentHandler = loContentHandler  
loReader.errorHandler = loErrorHandler       
loReader.parseURL(GETFILE("XML"))             

RETURN

In the code above, FoxSAXContentHandler and FoxSAXErrorHandler are my two classes implementing the standard interfaces.  You’ll find my complete example in your source code for this session, as the file MISC\VFPSAX.PRG.  As the comments there will tell you, I haven’t been entirely successful with SAX and VFP 7, and I can’t tell whether the problem lies with the VFP beta or the relatively new IVBSAX interfaces I’m trying to use. For the record, I wrote this code just before the September MSXML beta drop came out, and I think they have changed the SAX interfaces considerably in the latest version. 

The FoxSAXContentHandler class declaration declares its interface implementation and  implements the required class members, in code that looks like this:

DEFINE CLASS FoxSAXContentHandler AS Custom 
   Implements IVBSAXContentHandler IN MSXML3.DLL
   IVBSAXContentHandler_documentLocator = NULL
   FUNCTION IVBSAXContentHandler_startElement ( ;
strNamespaceURI As String, ;
       strLocalName As String, ;
       strQName As String, ;
              
       
       
       attribs As MSXML2.IVBSAXAttributes)
     LOCAL liIndex
       
     
     ? "tag: ",strLocalName
     ? "start text node: "
ENDFUNC
      
   FUNCTION IVBSAXContentHandler_endElement( ;
strNamespaceURI As String, ;
       strLocalName As String, ;
       strQName As String)
            
       
       
     ?? " :end text node"ENDFUNC
          
   
   FUNCTION IVBSAXContentHandler_characters(text 
   As String)
?? text
   ENDFUNC
         
   
   FUNCTION IVBSAXContentHandler_endDocument()
ENDFUNC
      
   * lots more empty methods here…
ENDDEFINE
* error handler class is implemented here…
* see VFPSAX.PRG.

Note that all required events must be implemented and all required interface members declared, but you don’t have to actually use them all, you can leave a method empty if you don’t need to trigger any code at the event represented by that method. 

If you need only a few bits from the center of the document, or if you have to do some calculations based on some parts of the document before you can decide how to treat other parts, SAX is not a good choice.  For one thing, you usually can’t guarantee the order of the nodes in an XML document. 

But, in other cases, you can see that this approach is quite exciting and has a lot of potential.  It is best when your use of the document will work well with a steady , one pass “read” through the entire document, since that is what the SAX parser does. 

Document transformations and exchanges through XSLT

Once you can create and parse XML documents, you get to the crux of the problem: how do people share these documents? If everybody creates their own internal documents to fit their own processes, as I’ve recommended, what happens then?

Throughout this paper I’ve been saying “you have a document and then you transform it using XSLT to meet somebody else’s needs” or “you receive a document and transform it so it matches your requirements”.  What exactly does the transformation do, why do I think it’s a good idea, and how does it work?

The “why” part is probably easiest for me to answer:

The truth is that people will create their own formats, and they will not cooperate on one standard format.There are many reasons for this.      But, whatever the reasons, even when you are dealing with simple row-and-column-shaped data, people will not agree on what how that data should be represented. 

Oracle will require one root tag and one row structure to do an INSERT into their tables, and their SELECT statement will likewise generate one sort of XML.  SQL Server might do something similar, but they will not even use the same format as ADO, let alone Oracle!  Siebel will create something called an XML representation of an “business object” (where the “business object” is what VFP programmers think of as an “updateable view”), and this document format will, likewise, be required and nothing like the other two.

So… get used to it.  You’re going to use XSLT to map between XML exchange partner formats, even when they are doing exactly the same kind of job (showing rows and columns in a table).  When they are doing something more complicated and more specialized than showing rows and columns, the mapping gets a little more complicated to do but it is just as necessary, if not more.

The “what” deserves a paper as long as what I’ve already written (!)… and the “how” might take another paper that long.

I’m going to give you an overview by walking you through an XML exchange process, and then get down to  specifics.

The case I’d direct your attention to is the recent partnership between various large airlines, to use XML to solve the problem of ticket transfers. (This story was reported in Computerworld,, 25 September, 2000, “Airlines turn to XML to try to fix e-ticket transfer problems”, by Michael Meehan.)  Here’s a statement of the problem and the use of XML to resolve the problem:

Currently, passengers who have electronic tickets have to wait in line to receive a paper ticket from their initial airline if a flight has been canceled and they want to try to switch to another carrier. In addition, airline employees must fill out a handwritten "flight interruption manifest" for each ticketholder who's looking to rebook elsewhere.

But with an industry-standard setup based on XML, Young said, a passenger's electronic ticket could automatically be transferred to another airline's system. The common XML technology would provide an easy-to-process format for all the airlines and could make electronic tickets more valuable than paper ones, he added.

I think all of us can sympathize with the airlines’ desire to better this process, to make the growing numbers of flight cancellations and overbookings easier to handle, for everybody concerned.  Let’s walk through what happens now, and what will happen with XML, to see how things are going to work in the new, improved system.

As you see in the quotation above, currently a ticket agent reads an electronic ticket record and either manually or through his/her system translates that e-ticket into a paper ticket.  The passenger then takes the paper ticket to another ticket agent in another airline, with whom the passenger hopes to get a seat.  The second agent reads the paper ticket and fills out a new record based on the contents of the old one, and issues a new ticket booking the passenger on the second airline.

Here’s the critical bit: the old booking record and the new booking record do not have the same format.The two airlines don’t keep their records the same way.      Luckily the agents have read each other’s tickets so many times that the experienced ones are very good at this.  They easily transfer the information from the right boxes on one record to the equivalent boxes in the second record.  Where necessary, they translate between currencies or timezones or languages, and they squeeze the contents of two boxes together into one field, or break up the contents of one box into two entries, until everything fits their system.  (The inexperienced ones tear up three new tickets before they get it right… lines get longer and more flights get missed…)

When you change over this system to XML, the agents no longer have to write out the tickets, which is a good thing.  But the two airlines still don’t share the same system (and have no intention of sharing the same record-keeping systems, for many reasons!).  

That’s the “why” of XSLT, as you can easily see: it enables one system to be mapped to another.  Leaving aside “what” XSLT does to accomplish the translation for a moment, here’s the “how”: a systems developer sits down with one or more experienced agents and learns how these agents convert the contents of one form to the next.  Once this process is recorded, no agent ever has to do it again, with five people shouting at them, two new trainees, and somebody’s baggage all over the floor.

The process by which a developer learns a manual system and transfers it to an automatic one should be a familiar process to you.  You are all experienced at observing manual procedures and putting them into a program. 

The difference between placing a transformation into a program and into an XSLT document is that XSLT is a declarative syntax, not a procedural syntax.  You specify what mappings you want to occur, and you can use logic to do so, but you don’t write any code that emits any text.  In other words, you don’t write the code that tells the XSLT processor how to do the transformation.  In fact, although XSLT processors are based on XML parsers, you aren’t even supposed to think about whether your XSLT processor uses a SAX model parser or a DOM model parser to do its job.  The XSLT specifies only the results you want, including any conditional logic, but not how the results are created.

You could, indeed, write DOM handling code or SAX event code to handle the mapping problem instead of using XSLT.  But you’d write a lot of code and if even one box on a form or one use of a column changed your logic might be incorrect.  In addition, when your disgruntled airline passenger took his paper ticket to a third airline, your logic for handling the mapping between airline 1 and airline 3 would be entirely different.

With XSLT, you don’t change any logic in your program.  You don’t recompile anything in your application when changes occur, or when you add another partner to the exchange.  You make XSLT stylesheets available, specify which ones go with which translations, and you apply these translations with simple lines of code that do not change.  For example, with the Microsoft parser, your code for the transformation might look like this:

oXML = CREATEOBJECT(“Microsoft.XMLDOM”)
oXSL = CREATEOBJECT(“Microsoft.XMLDOM”)
oXML.LoadXML(cMySourceDocumentAsString)
oXSL.Load(cMyXSLTFile)
lcResult = oXML.transformNode(oXSL)
* If you prefer, you can get an object reference back 
* from the transformation, rather than a string result, 
* using .transformNodetoObject method rather than 
* transformNode().

As you can see above, you have two instances of the Microsoft parser loaded with two XML documents, one the source XML document you wish to transform and the other the stylesheet. Yes, XSLT stylesheets are written in XML.  You can manipulate them with the DOM, change parameters in them at runtime using the DOM, like any other XML document. 

You may want a quick, command line or Windows shell method of testing transformations, especially if you’re writing your XML and XSL in an editor like Notepad and have no built-in way of associating the transformation and applying it.  Here’s a VBS script (which you can also call from a .BAT file using CSHELL.EXE) I use to do this:

'Call it: xslproc.vbs mydoc.xml mysheet.xsl output.htm 

Dim XMLDoc 
Dim XSLDoc 
Set XMLDoc = WScript.CreateObject("MSXML.DOMDocument") 

XMLDoc.load WScript.Arguments(0) 
Set XSLDoc = WScript.CreateObject("MSXML.DOMDocument")
XSLDoc.load WScript.Arguments(1) 
Dim OutFile 
Dim FSO 
Set FSO = WScript.CreateObject("Scripting.FileSystemObject") 

Set OutFile = FSO.CreateTextFile(WScript.Arguments(2)) 

OutFile.Write XMLDoc.transformNode(XSLDoc) 
OutFile.Close

I hesitate to write a lot of XSL examples here, for several reasons.  First, the version of the XSLT processors that most of you have available is the one Microsoft published before the standard became available, and has a lot of deficiencies and non-standard syntax. 

To make sure that you are writing and testing standard XSL, I suggest you test with at least one parser besides Microsoft’s.  My choice is SAXON, written by Michael Kay, one of the authors of the XSLT standard.  You can use SAXON on the command line to do transformations.  You can download SAXON or “Instant SAXON for Windows”, which is just the interpreter without source, at http://users.iclway.co.uk/mhkay/saxon/

If you want to start using XSL with the Microsoft parser, you should download the MSXML Technical Preview from Microsoft’s MSDN site.  Their more recent versions are far more standards-compliant than the one they released with IE.  Be sure to install the MSXML files in replace mode, following the included instructions, or else remember to switch back and forth between the versions.  If you don’t, IE and other default invocations of MSXML will keep using the old version.
You can load two separate XSLT transformation engines within the XML SPY interface (use the Edit Settings dialog, on the XSL tab). I usually keep IE loaded along with either SAXON or Oracle’s java-based processor, and I try to make sure my transformations work in both.  If I have any doubts, I go with SAXON’s results as indicating a definitive (standards) ruling.

Second, although XSLT is XML, it is extremely unusual looking XML and tends to look quite alarming unless you have a particular goal in mind and can understand the XSLT you’re looking at in relation to that goal.  There is no such thing as a “typical” translation document, in my experience.  I’ve included one short and somewhat frivolous example of XSLT in the sample application I discuss in the next section (you’ll find it in the ASP\SUPPORT directory of the source for this session).  Time permitting, we will go over several examples of non-trival XSLT syntax in detail during the session.

Putting it all together

As part of your session notes, in the \ASP directories, I’ve included a tiny but complete ASP application, using a VFP COM component where some data manipulation takes place.  This COM server, which you’ll find supplied with all source in the \COM directory, is basically an “all purpose” VFP server that exposes the Application property and its crucial methods.  In this version, a subclass augments the base DataToClip method to be able to provide XML along with its standard data formats.  I use an instance of FRX2XML to create the XML within this augmented method.

Don’t be too concerned about the fact that it is an ASP application, because it is the structure of this application rather than its environment that should drive home the point I wanted to make.  The COM component represents “the stuff that VFP does really well” and that we want to pass in to VFP to do.  Although my little all-purpose VFP server won’t be anything like your implementation of the real life component it represents, keep in mind that a minimum of COM calls is a good idea, whether the caller is a web server or not.  Replace this VFP component with some VFP component that accepts and returns XML instructions, and you’ve got it.

The external part of this application, here written in VBScript-ASP, represents “the stuff that is required for an XML-enabled application”.  Some of it might be done in VFP in your case, rather than externally as I show here, but I wanted to make sure you could see all the XML-related pieces spread out in script code, while the “standard” VFP processing parts remained hidden behind the COM object.

What are the external pieces?

LSN_XML.TXT describes each file, in all the source directories, individually, so I’ll just quote from the relevant section of that text file here:

  • ASP\VFPBASED.ASP: Main file for ASP demo application, showing VBScript + XSLT for presentation and business interchange, while VFP handles all data chores
  • ASP\MYPROCS.INC: “Include" file for ASP application, showing how your app might separate out standard processing chores used by requests to multiple ASP pages.
  • ASP\MYVARS.INC: "Include" file for ASP application, showing how your ASP app might set global options and parameters such as location of data and support files
  • ASP\MYLOCS.INC: "Include" file for ASP application, showing how to personalize/localize messages for an appframework, using Application.Contents() here to load and store these values.  In a web app you aren't likely to localize but you *would* want to change these values to suit the current application all the same.
  • ASP\SUPPORT\HTMLRESPONSE.XSL: An example support file used in XSL transformation of an XML transaction

That’s it.  You have configuration options, you have localization message strings, you have some standard chores such as figuring out what transforms are appropriate to your current action and current options, and actually performing the transformation, you have a set of XSL files, and you have a “face” of the application, some “main” routine, which accepts requests from the outside and returns responses to the outside.

As far as I am concerned, most, if not all, of these exposed pieces can be done in VFP rather than externally as shown here.You might certainly decide to instantiate your DOM parser objects and do the transforms within the VFP component.      You could evaluate which XSLT transforms are appropriate for the current action, either inside or outside the VFP component.  In this case, the deciding factor probably will be: Where can you best cache objects that do the work, even if they would the same objects (in this case, MSXML parser instances) in each implementation?

This example happens to be about ASP, and hence a web server handling the XML exchange, although once again you shouldn’t assume that this sort of exchange is only “important” over the Internet.
Within the Internet space, I just want to point out that there are additional differences of opinion about “where some of the work should be done”, beyond the server-side application portioning we’re discussion here..      The division of labor to be performed isn’t only “which application component on the server does what job”, it’s also “what does the client do versus what do the various serving tiers do”. 
Microsoft, as usual in favor of relatively heavy clients, often indicates that you should hand over a reference to the XSL spreadsheet within the body of the XML document and then send that document to the client, so that the client can do the transform.  I disagree. They like this approach because it lessens the burden on the server, but it also assumes a level of familiarity with XSL and a capability on the part of the client that is not a wise assumption (unless you want everybody in the world using IE as a client <g>!).  It is much better, in my opinion, to do the transform on the server where you have control over it and so that you can serve all potential clients equally well.  You can do the transform efficiently to maximize server performance – this is like anything else.

The important thing you should notice, when reviewing this application, is how little of it there is.  It’s just a thin shell around the work you already know and do well (represented by the “VFPAllPurpose” server in this implementation).

Among these external pieces, you’ve also already seen that some work can be done by native VFP code, such as string manipulation, as well as by COM components designed expressly to work with XML.  Your goals should be to use each tool where it is optimizable.  For example, your VFP code can sort and calculate output before translating that output to its XML response format.  You could also go to the XML response format and then ask your XSLT transformation to handle the sorting and calculating chores – but VFP will do it faster. (XSLT even has indexing and lookup capabilities – but whose version of these features do you think you should use?)  On the other hand, when you’re ready to prepare an HTML version of your XML response, this is something XSLT handles far more elegantly than you will in FoxPro code, in my opinion.

One other criterion you might want to use when deciding which components do the work:  What do you know well, what do you not know well?     The answer to this doesn’t always point in the same direction that you might think.Because you know FoxPro well, and you do not know the DOM well, you might expect me to recommend manual document creation using VFP code.      However, as you’ve seen, I recommend you let the parser take care of document manipulation to avoid errors, even though it means you have to learn parser syntax. 

The parser knows what a valid document and a well-formed document look like, better than your code, and will not make mistakes.  The goal of delivering valid and well-formed XML as your application’s responses is so crucial to an XML-enabled application that this is my highest concern.  It’s this goal, faithfully pursued, that makes everything else run smoothly.

Conclusion (why we will still be here)

I started this paper with a bow to my friend John Alden, and I’ll close it by fulfilling a promise to mention another friend, Kevin Jamieson. Kevin is a young, but happily shining IT professional.  I’ve known him since he and my son Josh were 5 years old and, when he asks a question, it’s generally a good one.

Kevin has become something of a Luddite, even though he works with high tech equipment all day.  He has bought a manual typewriter to record his important (trans: non-work) thoughts.  He asked me to ask, and think about, “why anybody would ever type XML on a typewriter”.  Kevin says if XML were really good we’d want it everywhere.

I promised I would record Kevin’s question in this paper. I don’t really have an answer for you, or for him, about why we’d type XML on a typewriter.Unlike Kevin’s physical      journal pages, XML isn’t really a product or an end result, in itself.  It just provides a conduit – both for data exchange and data presentation – to more products and end results than anything else I can imagine.  Since it’s not an end product, since it requires some application or device to extract meaning from it and apply format to it… it’s hard to imagine XML existing outside the world of electronic devices and processing power. 

But, within that world, it has so much to offer!   I expect, for as long as I work with computers in the future, I’ll be working with XML.  When I’m using VFP to extract the meaning and apply the format to XML, I know I’m working with two well-matched technologies, and one of the most creative partnerships that the world of computers can offer any developer today.


Appendices

Appendix A:
A short but pertinent digression

Colin and I have been using a strategy that seems to work very well for different types of clients: we design our applications as out-of-process (EXE) COM objects, but also give them a command-line usage.  This strategy does not have to be adjusted or re-implemented for each application.  For example, we typically allow only one command line argument, and it always does the same thing.

The command line argument is a filename. We stipulate that the format of this file is a set of lines in the format property = value.  The main program associated with EXE does the following:

  • instantiates the COM object
  • invokes the COM object’s .LoadFile method , passing it this argument
  • invokes the COM object’s .Run or .Execute method.

Internally, the .LoadFile method simply validates the filename and converts the contents of a file to a string, which it sends to a .LoadString method. 

The .LoadString method goes line by line (normalizing whitespace, and using CRs, CRLFs, or LFs as line terminators) through the string:

  • check each line for a potential property=value statement. 
  • subject each possible property to a PEMSTATUS() test
  • if appropriate, STORE a type-transformed version of the value to (“THIS.”+  property).

One beauty of this system is that the .Run, .LoadString, and .LoadFile methods are also available to the COM object. By calling.LoadFile or .LoadString – whichever suits you – followed by .Run, a COM client can make efficient use of a VFP server, with the fewest possible number of COM calls. This is great for performance reasons as well as giving consistent functionality between your command-line and COM clients. 

We find that some early binding versus late binding COM clients have different needs with regard to passing Fox parameters. Some client environments must pass the arguments (no argument can be optional), others have difficulty with methods that require arguments (all arguments must be optional).

With strong typing in VFP 7, this problem may be somewhat alleviated.  To date, however, to serve these differently-abled clients equally, we usually have something like a .cParameter property. If an exposed method such as .LoadString expects an argument and does not receive it, it looks to this property for an alternate source of the required information. A client that has trouble passing VFP parameters uses this property.

Although it is true that not every system has a single process easily defined, to be invoked by a .Run or .Execute call,this is usually solved with      an “action” property  the client can set, like any other, in the set of properties processed by .LoadString

I highly recommend this practice.  I guess it has nothing explicitly to do with XML, on the face of it!  However, it is all part of giving equal access to different types of clients, creates efficient COM applications, and works very well with XML-enabled applications in general.


Appendix B:
A sneak preview of new Coverage engine PEMs in VFP7

Stack XML enhancements to Coverage in VFP7 include the following new PEMs for the cov_engine class in COVERAGE.VCX.  They are shown here with the PEM descriptions added to each member as part of the VCX, with a little extra explanation in some cases. Note: cov_standard (the interface class) was not touched in this enhancement.

.cSavedStackXML

Holds the name of the saved Coverage stacklevel analysis in XML form, after this file has been saved to disk. Set back to default ("") when you load a new log.

.lStackXMLExtendedTree

If .T., generates more extensive StackXML, so Profiling of each branch can assess effects of args and other factors for different invocations of a module. Defaults to .F. but if COV_LOAD_STACK_FROM_DBF is .T., this more extensive XML is always gen'd.

.ShowStackXML(tcLog)

Calls GetStackXML(tcLog) and DisplayStackXML(tcXMLFile) to run coverage analysis figures against a specific VFP project set of files. tcLog argument ignored if DEFINEd COV_LOAD_STACK_FROM_DBF is .T. Returns .T. if successful.
(like ShowProjectStatisics)

.GetStackXML(tcLog)

Generates Stack Analysis XML from tcLog, defaulting to current Coverage source log. tcLog argument ignored if DEFINEd COV_LOAD_STACK_FROM_DBF is .T. Returns .T. if successful.
(like GetProjectStatistics, this is the one that does the real work)

.DisplayStackXML(tcXMLFile)

Displays XML file, defaulting to the current Stack Analysis filename indicated by the cSavedStackXML property. Returns .T. if no error occurs.
(like DisplayProjectStatistics, except that, instead of being abstract in the engine, cov_engine.DisplayStackXML does a ShellExecuteA)

.ToggleStackXMLExtendedTree()

Toggles .lStackXMLExtendedTree. Designed to be augmented in subclasses to reflect this switch in the UI, change default XSLT on this basis, etc. Returns .T. if no error occurs.
(like ToggleCoverageProfileMode)

.cStackXSLT

Holds filename providing default XSL Transformation stylesheet to be applied by default to the generated Stack XML analysis document, when TransformStackXML is called.

.TransformStackXML(tcXSLT, tcXMLIn, tcXMLOut, tlNoShow)

Applies tcXSLT (default to .cStackXSLT) to tcXMLIn (default .cSavedStackXML, GetStackXML called if empty). Saves result to tcXMLOut (defaults to tcXMLIn-based generated name w/ HTM ext). If ! tlNoShow, calls DisplayStackXML. Returns .T. if successful.

Additional Coverage changes in VFP 7 not shown here to support this enhancement include various existing methods updated to account for the new Stack piece.For example,      cov_engine.Init(..) has been adjusted to call GetStackXML() if the engine is instantiated in unattended mode, along with creating and saving the target dbf and skipped files dbf.

Interface changes in the COV_OPTIONSDIALOG class (see figure) and COV_LOCS.H localization message file as well as the shortcut menu expose the Stack features.

New bottom section of Coverage Options dialog allows you to set Stack options and defaults.


New tune-able options in COV_TUNE.H to support the Stack feature include the following:

   * the first two represent element node prefixes 
#DEFINE COV_STACKROOT "VFPCallStackLog" 
#DEFINE COV_STACK_ONEVENT_TAG event 

* the next items are used to help identify the XML and HTM files written 
* to disk as being generated from this particular process. 
#DEFINE COV_STACKXML_SUFFIX "_STACK" 
#DEFINE COV_STACKXMLEXT_SUFFIX "_STACKX" 
#DEFINE COV_TRANSFORM_SUFFIX "_XSL" 

* the next item indicates whether lines are loaded from the Coverage 
* source workfile dbf or gathered directly by reading the original 
* text log. The former has a very slight speed advantage but will 
* not include ON... events, since those lines are ignored by Cov workfiles 
#DEFINE COV_LOAD_STACK_FROM_DBF .F. 

* the vars in the next item have the same meaning as the columns of the 
* same name in the Coverage source workfile DBF -- equivalents for these 
* items are read in from the source text log when COV_LOAD_STACK_FROM_DBF 
* is .F.,and this expression stays the same. You can change it as long as 
* you ensure that the result will never be empty except for ON... events.
* Load a log into COVERAGE.APP, SET DATASESSION TO _oCoverage.DataSessionID, 
* and refer to the source workfile (by default, its alias is FromLog) for 
* some indication on the possible contents of these columns. 
#DEFINE COV_STACKEXPR ALLTR(IIF(INLIST(FileType,".fxp",".mpx",".qpx",".spx"), ; 
IIF(EMPTY(ObjClass),IF(NOT EMPTY(Executing),ALLTR(Executing),""),; 
IIF(LEFT(Executing,1)=".",ALLTR(ObjClass),"")+ALLTR(Executing)), ;
 ALLTR(Executing)))