About Me

Hello, my name is Cathal Coffey. I am best described as a hybrid between a developer and an adventurer. When I am not behind a keyboard coding, I am hiking and climbing the beautiful mountains of my home country, Ireland. I am a full-time student studying Computer Science & Software Engineering at the National University of Ireland Maynooth, and I finish the final year of a four-year degree in September 2009. I am the creator of an open source project on codeplex.com called DocX. At the moment I spend a lot of my free time advancing DocX, and I enjoy this very much. My aim is to build a community around DocX and add features based on requests from this community. I really enjoy hearing about how people are using DocX in their work or personal projects, so if you are one of these people, please send me an email. Cathal (coffey.cathal@gmail.com)

Thursday, February 26, 2009

DocX - A .NET library for manipulating Word 2007 files

Note: Code samples have been updated to work with DocX version

Hello, my name is Cathal Coffey. I am an intern working at Microsoft Ireland Research. This blog post is about a personal project which I have created outside of my work time.

My project, which can be downloaded from here, is called DocX. DocX is a .NET library which allows developers to manipulate Word 2007 files in an easy and intuitive manner. It does not use COM libraries, nor does it require Office to be installed in order to function. The rest of this post explains the current features offered by DocX. Please keep in mind that this is a young library; at the moment it offers two very useful and powerful features:

1) String replacement,
2) Set custom properties.


1) String replacement

The document below, Test.docx, contains the string “pear” many times. There are instances of the string “pear” inside structures such as a table, a list and a hyperlink. The document also contains lots of different style properties such as font, colour, bold, italic, strikethrough and underline.

Figure 1.1 - Test.docx before manipulation

Replacing the string “pear” with the string “banana” is a trivial task using the library DocX.

    // Assumes: using Novacode; (the DocX namespace) and using System.Text.RegularExpressions;

    // Load a .docx file
    using (DocX document = DocX.Load("Test.docx"))
    {
        /*
         * Replace each instance of the string pear with the string banana.
         * Specifying true as the third argument informs DocX to track the
         * changes made by this replace. The fourth argument tells DocX to
         * ignore case when matching the string pear.
         */
        document.ReplaceText("pear", "banana", true, RegexOptions.IgnoreCase);

        // Save changes made to this document
        document.Save();
    }// Release this document from memory.

After running the above code and reopening Test.docx we can see that every instance of the string “pear” has been replaced by the string “banana” and that both deletions and insertions have been tracked. By hovering over a deletion or insertion, we can see that the DocX library has used the credentials that it was executed with, as the author of the edits.
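
The fourth argument passed to ReplaceText above is an ordinary .NET RegexOptions value. As a minimal sketch of what ignoring case means for matching, the snippet below does the same kind of replacement on a plain string using only System.Text.RegularExpressions. The class and method names are hypothetical and chosen for illustration; this is not how DocX is implemented internally, and it does not touch Word documents, styles or tracked changes.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of DocX.
public static class CaseInsensitiveReplaceSketch
{
    // Replaces every occurrence of oldValue in input with newValue,
    // ignoring case, and treating both strings as literal text.
    public static string ReplaceLiteral(string input, string oldValue, string newValue)
    {
        // Escape the search text so characters such as '.' are matched literally.
        string pattern = Regex.Escape(oldValue);

        // Double any '$' so the replacement is not read as a group reference.
        string replacement = newValue.Replace("$", "$$");

        return Regex.Replace(input, pattern, replacement, RegexOptions.IgnoreCase);
    }
}
```

Calling CaseInsensitiveReplaceSketch.ReplaceLiteral("Pear, pear and PEAR", "pear", "banana") returns "banana, banana and banana", which mirrors how “Pear”, “pear” and “PEAR” are all treated as matches when case is ignored.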

Figure 1.2 - Test.docx after manipulation

If we click on the “Review” section of the ribbon and select “Accept All Changes in Document”, it is now clear that DocX has correctly replaced all instances of the string “pear” with the string “banana”.

Figure 1.3 – Test.docx Accept All Changes in Document

An important point to note is that the DocX library inserted the string “banana” with the correct style information in each case, regardless of whether it was inside a table, a list or a hyperlink.

Figure 1.4 – Test.docx After Accept All Changes in Document


2) Set custom properties

Custom properties are placeholders for real data; they can be of type Text (String), Yes or No (Boolean), Number (Integer or Double) or Date (Universal Date).
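
To make the pairing between Word's property types and .NET types concrete, here is a small sketch that returns the Word type label for a given .NET value. The class and method names are hypothetical, and the mapping simply restates the list above; it is not taken from the DocX source.

```csharp
using System;

// Hypothetical helper for illustration only; not part of DocX.
public static class CustomPropertyKindSketch
{
    // Returns the Word custom property type that matches the .NET value.
    public static string Describe(object value)
    {
        if (value is string) return "Text";
        if (value is bool) return "Yes or No";
        if (value is int || value is double) return "Number";
        if (value is DateTime) return "Date";

        throw new ArgumentException("Unsupported custom property value type.", "value");
    }
}
```

For example, Describe("John87") gives "Text" and Describe(4) gives "Number".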

To add custom properties to a document, you select “Prepare -> Properties” from the Office button menu.

Figure 2.1 – Office button –> Prepare –> Properties

You then select “Document Properties -> Advanced Properties…”

Figure 2.2 – Document Properties –> Advanced Properties…

The following window will then pop up and you can create your own custom properties.

Figure 2.3 – Custom properties

I have created seven custom properties for this demo. Four are of type Text: Forename, Username, HomeAddress and FreeGift. One is of type Number: PleaseWaitNDays. One is of type Date: GiftArrivalDate. One is of type Yes or No: RecieveFurtherMail.

Once you have defined custom properties, you can use them throughout your document by selecting “Insert -> Quick Parts -> Field…”

Figure 2.4 – Insert –> Quick Parts –> Field…

If you double-click on one of your custom properties, it will appear in the document at the current caret position.

Figure 2.5 – Select custom property

The following document is a welcome letter that will be sent to all new users who subscribe to the fictitious magazine called “Home Appliances”. The letter, which includes the seven custom properties listed above, looks as follows.

Figure 2.6 - Fictitious magazine welcome letter

Setting the values of the custom properties for this document is a trivial task using DocX.

    using System;
    using System.Collections.Generic;
    using Novacode; // DocX namespace (assumed here)

    // This class represents a user
    class User
    {
        public string forname, surname, username, freeGift, HomeAddress;
        public DateTime joined;
        public bool RecieveFurtherMail;

        public User()
        { }
    }

    class Program
    {
        static void Main(string[] args)
        {
            // A list which contains three new users
            List<User> newUsers = new List<User>
            {
                new User
                {
                    forname = "John", surname = "Smith", username = "John87",
                    freeGift = "toaster", joined = DateTime.Now,
                    HomeAddress = "21 Hillview, Naas, Co. Kildare",
                    RecieveFurtherMail = true
                },

                new User
                {
                    forname = "James", surname = "O'Brian", username = "KingJames",
                    freeGift = "kitchen knife", joined = DateTime.Now,
                    HomeAddress = "37 Mill Lane, Maynooth, Co. Meath",
                    RecieveFurtherMail = false
                },

                new User
                {
                    forname = "Mary", surname = "McNamara", username = "McNamara1",
                    freeGift = "microwave", joined = DateTime.Now,
                    HomeAddress = "110 Cherry Orchard Drive, Navan, Co. Roscommon",
                    RecieveFurtherMail = true
                }
            };

            // For each of the three new users, create a welcome document based on Template.docx
            foreach (User newUser in newUsers)
            {
                /*
                 * Load the template to be manipulated and set the custom properties to this
                 * user's specific data
                 */
                using (DocX doc = DocX.Load("Template.docx"))
                {
                    doc.AddCustomProperty(new CustomProperty("Forname", newUser.forname));
                    doc.AddCustomProperty(new CustomProperty("Username", newUser.username));
                    doc.AddCustomProperty(new CustomProperty("FreeGift", newUser.freeGift));
                    doc.AddCustomProperty(new CustomProperty("HomeAddress", newUser.HomeAddress));
                    doc.AddCustomProperty(new CustomProperty("PleaseWaitNDays", 4));
                    doc.AddCustomProperty(new CustomProperty("GiftArrivalDate", newUser.joined.AddDays(4).ToUniversalTime()));
                    doc.AddCustomProperty(new CustomProperty("RecieveFurtherMail", newUser.RecieveFurtherMail));

                    // Save this document as the user's name followed by .docx
                    doc.SaveAs(string.Format(@"{0}.docx", newUser.username));
                }// Release this document from memory
            }
        }
    }

The above code will generate three .docx files:

Figure 2.7 – John87.docx

Figure 2.8 – KingJames.docx

Figure 2.9 – McNamara1.docx
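
The two per-user values computed in the loop above, the output file name and the gift arrival date, can be checked without loading any document. The sketch below isolates them using only the .NET base class library; the class and method names are hypothetical, and the four-day wait is the PleaseWaitNDays value used in the sample.

```csharp
using System;

// Hypothetical helper for illustration only; not part of DocX.
public static class WelcomeLetterSketch
{
    // Builds the output file name: the user's username followed by .docx.
    public static string OutputFileName(string username)
    {
        return string.Format("{0}.docx", username);
    }

    // Computes the gift arrival date: the join date plus the waiting period,
    // converted to universal time as in the sample code.
    public static DateTime GiftArrivalDate(DateTime joined, int waitDays)
    {
        return joined.AddDays(waitDays).ToUniversalTime();
    }
}
```

For a user named John87 who joins at midnight UTC on 26 February 2009, this gives the file name John87.docx and an arrival date of 2 March 2009.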


If you would like to give me feedback on my library DocX, please either post a comment here or email me at coffey.cathal@gmail.com.

Happy coding,