ePublisher Archive

Archive for the ‘ePublisher’ Category

DOC vs DOCX

Posted on: March 19th, 2012 No Comments

In ePublisher 2011.3, we introduced an alternate processing flow for the Microsoft Word Office Open XML document format (OOXML, referred to hereafter as DOCX). Following is a brief explanation of the reason for this new processing flow, some of the existing side effects, and the implications of this approach down the road.

Let’s start with a brief history of the Microsoft Word integration with ePublisher. In 2004 – 2005, when ePublisher was being designed, the Word adapter leveraged existing code to process Word DOC files to ePublisher intermediate files (WIF), using Word VBA. Through the years, ePublisher development on the Word adapter has been based on this processing flow, and with each successive release of Word (2007 and 2010), the same processing flow has been used.

In Word 2007, Microsoft introduced a new document format named Office Open XML, which uses a *.docx file extension when saved. Unlike the DOC format, the DOCX format is an XML-based open standard. Until 2011.3, ePublisher continued to use the same VBA processing flow for both the DOC and DOCX formats.

So why another adapter? A number of issues show up when processing DOCX files using the DOC processing flow. The root cause of these issues is that the VBA-based processing flow normalizes all files to DOC format by saving them as DOC. This save is lossy, and the effect is that formatting information from DOCX files is dropped in some cases.

Why save DOCX to DOC? Why not just leave the file in its native format when generating ePublisher’s intermediate files? The answer: VBA does not allow inspection of character style runs. Because ePublisher is unable to use VBA to iterate runs of character formatting, it relies on a library which inspects the raw bytes of a DOC file and derives the runs of character formatting from that analysis. This library only works with DOC files, so all files must be saved as DOC before the VBA-based processing flow can be applied. (In DOCX, runs of character formatting can be inspected via XPath instead.) As a result, the VBA-based processing flow can only be applied to the DOC format, which is an inherent constraint of the DOC/VBA approach.

The new DOCX adapter works around the limitations enumerated above by leaving the original DOCX file in its native format. It uses a combination of DOM manipulation and XSL to produce the ePublisher intermediate files. The effect is that formatting information derived from DOCX files is more correct and complete.
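To make the idea of inspecting character style runs in DOCX concrete, here is a minimal Python sketch. It is not the ePublisher adapter's code, which uses DOM manipulation and XSL. It only illustrates that a DOCX file is a ZIP package whose XML can be queried with XPath-style expressions. The function names are hypothetical, and the sketch assumes the main document part lives at the conventional path word/document.xml.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def character_style_runs(docx_file):
    """Return a (style_id, text) pair for every run in the document body.

    Assumes the main part is stored at word/document.xml (the usual
    location; a robust reader would follow the package relationships).
    """
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    runs = []
    for run in root.iterfind(".//w:r", NS):
        # A run's character style, if any, is named by w:rPr/w:rStyle/@w:val.
        style = run.find("w:rPr/w:rStyle", NS)
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else None
        text = "".join(t.text or "" for t in run.iterfind("w:t", NS))
        runs.append((style_id, text))
    return runs


def make_sample_docx():
    """Build a tiny in-memory package holding only the part this sketch reads.

    This is not a complete DOCX file that Word would open.
    """
    document = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
        '<w:r><w:t xml:space="preserve">Plain </w:t></w:r>'
        '<w:r><w:rPr><w:rStyle w:val="Emphasis"/></w:rPr><w:t>styled</w:t></w:r>'
        "</w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", document)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for style_id, text in character_style_runs(make_sample_docx()):
        print(style_id, repr(text))
```

Because the file is never converted, nothing is lost to a lossy save: each run and its character style are read directly from the original XML.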

There are some growing pains associated with this new approach. The DOCX processing flow is not as mature as the DOC processing flow, and there are a number of issues with the DOCX adapter as of the 2011.4 release, which we are working to address. Starting with the 2011.4 release, intermediate patches for the DOCX processing flow are being made available so that these issues can be addressed sooner than the regular quarterly release cycle allows. Following is a link to the page from which these intermediate patches are available:

http://wiki.webworks.com/Updates/DocxUpdates

There are a number of natural advantages to the DOCX adapter. Because of the problems with character style runs, the DOC adapter is forever tied to legacy 32-bit code. The DOCX adapter has no such limitation, so it represents a viable path toward 64-bit binaries. Also, the speed and memory performance of the DOCX implementation are far superior to those of the DOC implementation, which raises the scalability ceiling for DOCX input. Finally, while there are no current plans to make the needed changes, the fact that DOCX is open (reading and manipulating it doesn’t require Word) opens up the potential for the format to be used across platforms.

But I’m not a developer!

Posted on: September 9th, 2010 8 Comments

I lack objectivity when it comes to programming. I enjoy it. Programming is problem solving. It almost always includes a reasonable explanation. Programming involves building things, and fixing and refining the things that you build.

(more…)

Getting ePub output into iBooks on iPhone/iPad

Posted on: August 27th, 2010 2 Comments

In a recent Study Hall, someone asked how one can deploy the ePub output generated by ePublisher to an iPad. This path has been a little unclear to me and so I promised to research and blog what I found.

(more…)

Use JScript .NET instead

Posted on: August 15th, 2010 No Comments

The ePublisher platform rests on top of the .NET platform. Even though the bulk of the ePublisher processing is done with XSL, you can opt to process items with any of the .NET CLR languages rather than XSL. I published a wiki article which includes an ePublisher project that demonstrates processing content paragraphs with JScript .NET. I am curious whether XSL is the barrier to customization in ePublisher. Since JavaScript is ubiquitous, this sample provides an alternative to XSL.

iBooks thoughts

Posted on: August 13th, 2010 2 Comments

Since the launch of ePublisher 2010.2, which included the new ePub format, I have been spending time with the iBooks app to see how I like it. That’s what I want to talk about in this article: my impressions, prejudices, likes, dislikes and so on.

(more…)

Study Hall

Posted on: March 29th, 2010 3 Comments

I will be moderating an online session named Study Hall, beginning this Wednesday, March 31 at 7pm CDT. Study Hall is an informal online session in which we will explore ePublisher-related concepts and questions. Study Hall will occur on the second and last Wednesday of each month. Bookmark the link above to keep track of the next Study Hall session.

possible: Google Docs in ePublisher

Posted on: November 5th, 2009 No Comments

Here is a screen-cast of the GoogleDocs input adapter that we demoed at RoundUp 2009. A couple of caveats:

This is not currently available and there are no definite plans for making this a part of any ePublisher release. That largely depends on user interest (your interest).
The implementation is a demo. As such, there are aspects which will require further development.
This is my first-ish screen-recording.

Google Docs Demo

2-Celled Note Table

Posted on: January 22nd, 2009 No Comments

Someone asked me about this at RoundUp a few months ago, and I’m just now getting around to publishing it. I’ve just posted a tip for creating a two-celled note style table for a given paragraph style on the WebWorks wiki.

The implementation uses the wwtransform:super resolver. We are working to make this the blessed way to do overrides in ePublisher and to formalize the implementation for the 2009.1 release.

Enjoy and please report any problems.

ePublisher 2008.2 Eclipse Help Article

Posted on: July 7th, 2008 No Comments

The 2008.2 Release of the ePublisher Platform went live today. This release includes the Eclipse Help Format. I have just posted a wiki page which documents the items which are peculiar to the Eclipse Help format.

ePublisher 2008.1 and wiki articles

Posted on: March 31st, 2008 No Comments

I posted two new wiki articles. The first is a general outline of ePublisher URI Resolvers. The second is documentation for wwtransform:super, a cool new URI available in the 2008.1 runtime.

ePublisher 2008.1 is here. Happy Q2.