About Me

Ireland
Hello, my name is Cathal Coffey. I am best described as a hybrid between a developer and an adventurer. When I am not behind a keyboard coding, I am hiking and climbing the beautiful mountains of my home country, Ireland. I am a full-time student studying Computer Science & Software Engineering at the National University of Ireland Maynooth. I am finishing the final year of a four-year degree in September 2009. I am the creator of an open source project on codeplex.com called DocX. At the moment I spend a lot of my free time advancing DocX and I enjoy this very much. My aim is to build a community around DocX and add features based on requests from this community. I really enjoy hearing about how people are using DocX in their work/personal projects. So if you are one of these people, please send me an email. Cathal coffey.cathal@gmail.com

Thursday, February 26, 2009

DocX - A .NET library for manipulating Word 2007 files

Note: Code samples have been updated to work with DocX version 1.0.0.6.

Hello, my name is Cathal Coffey. I am an intern working at Microsoft Ireland Research. This blog post is about a personal project which I have created outside of my work time.

My project, which can be downloaded from here, is called DocX. DocX is a .NET library which allows developers to manipulate Word 2007 files in an easy and intuitive manner. It does not use COM libraries, nor does it require Office to be installed in order to function. The rest of this post explains the current features offered by DocX. Please keep in mind that this is a young library; at the moment it offers two very useful and powerful features:

1) String replacement,
2) Set custom properties.

-----------------------------------------------------------------------------------------------------------------------

1) String replacement

The document below, Test.docx, contains the string “pear” many times. There are instances of the string “pear” inside structures such as a table, a list and a hyperlink. The document also contains lots of different style properties such as font, colour, bold, italic, strikethrough and underline.

Figure 1.1 - Test.docx before manipulation

Replacing the string “pear” with the string “banana” is a trivial task using the library DocX.

// Requires: using Novacode; using System.Text.RegularExpressions;
// Load a .docx file
using (DocX document = DocX.Load("Test.docx"))
{
    /*
     * Replace each instance of the string pear with the string banana.
     * Specifying true as the third argument tells DocX to track the
     * changes made by this replace. The fourth argument tells DocX to
     * ignore case when matching the string pear.
     */
    document.ReplaceText("pear", "banana", true, RegexOptions.IgnoreCase);

    // Save changes made to this document
    document.Save();
} // Release this document from memory.

After running the above code and reopening Test.docx we can see that every instance of the string “pear” has been replaced by the string “banana” and that both deletions and insertions have been tracked. By hovering over a deletion or insertion, we can see that the DocX library has used the credentials that it was executed with, as the author of the edits.

Figure 1.2 - Test.docx after manipulation

If we click on the “Review” section of the ribbon and select “Accept All Changes in Document”, it is now clear that DocX has correctly replaced all instances of the string “pear” with the string “banana”.

Figure 1.3 – Test.docx Accept All Changes in Document

An important point to note is that the DocX library inserted the string “banana” with the correct style information in each case, regardless of whether it was inside a table, a list or a hyperlink.

Figure 1.4 – Test.docx After Accept All Changes in Document
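
The case-insensitive matching requested through RegexOptions.IgnoreCase in the ReplaceText call above can be illustrated on a plain string with the standard .NET Regex class. This is only a sketch: it assumes DocX matches text the way Regex does (the RegexOptions parameter suggests this, but the post does not confirm it), and ReplaceSketch and ReplaceIgnoreCase are hypothetical names that are not part of DocX.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceSketch
{
    // Hypothetical helper, not part of DocX: replaces every occurrence of
    // oldValue in text with newValue, ignoring case.
    public static string ReplaceIgnoreCase(string text, string oldValue, string newValue)
    {
        // Regex.Escape makes oldValue match literally rather than as a pattern.
        return Regex.Replace(text, Regex.Escape(oldValue), newValue, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // "Pear", "pear" and "PEAR" are all matched because case is ignored.
        Console.WriteLine(ReplaceIgnoreCase("Pear, pear and PEAR", "pear", "banana"));
        // prints "banana, banana and banana"
    }
}
```

Without RegexOptions.IgnoreCase only the lowercase “pear” in this input would be replaced. Unlike DocX, this plain-string version knows nothing about styles or tracked changes.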

-------------------------------------------------------------------------------------

2) Set custom properties

Custom properties are placeholders for real data; they can be of type Text (String), Yes or No (Boolean), Number (Integer or Double) or Date (Universal Date).

To add custom properties to a document, select “Prepare -> Properties” from the Office button menu.

Figure 2.1 – Office button –> Prepare –> Properties

You then select “Document Properties -> Advanced Properties…”

Figure 2.2 – Document Properties –> Advanced Properties…

The following window will then pop up and you can create your own custom properties.

Figure 2.3 – Custom properties

I have created seven custom properties for this demo. Four are of type Text: Forename, Username, HomeAddress and FreeGift. One is of type Number: PleaseWaitNDays. One is of type Date: GiftArrivalDate. One is of type Yes or No: RecieveFurtherMail.

Once you have defined custom properties, you can use them throughout your document by selecting “Insert -> Quick Parts -> Field…”

Figure 2.4 – Insert –> Quick Parts –> Field…

If you double-click on one of your custom properties, it will appear in the document at the current caret position.

Figure 2.5 – Select custom property

The following document is a welcome letter that will be sent to all new users who subscribe to the fictitious magazine called “Home Appliances”. The letter, which includes the seven custom properties listed above, looks as follows.

Figure 2.6 - Fictitious magazine welcome letter

Setting the values of the custom properties in this document is a trivial task using DocX.

// Requires: using System; using System.Collections.Generic; using Novacode;

// This class represents a user
class User
{
    public string forname, surname, username, freeGift, HomeAddress;
    public DateTime joined;
    public bool RecieveFurtherMail;

    public User()
    { }
}

static void Main(string[] args)
{
    // A list which contains three new users
    List<User> newUsers = new List<User>
    {
        new User
        {
            forname = "John", surname = "Smith", username = "John87",
            freeGift = "toaster", joined = DateTime.Now,
            HomeAddress = "21 Hillview, Naas, Co. Kildare",
            RecieveFurtherMail = true
        },

        new User
        {
            forname = "James", surname = "O'Brian", username = "KingJames",
            freeGift = "kitchen knife", joined = DateTime.Now,
            HomeAddress = "37 Mill Lane, Maynooth, Co. Meath",
            RecieveFurtherMail = false
        },

        new User
        {
            forname = "Mary", surname = "McNamara", username = "McNamara1",
            freeGift = "microwave", joined = DateTime.Now,
            HomeAddress = "110 Cherry Orchard Drive, Navan, Co. Roscommon",
            RecieveFurtherMail = true
        }
    };

    // For each of the three new users, create a welcome document based on Template.docx
    foreach (User newUser in newUsers)
    {
        /*
         * Load the template to be manipulated and set the custom properties to this
         * user's specific data
         */
        using (DocX doc = DocX.Load("Template.docx"))
        {
            doc.AddCustomProperty(new CustomProperty("Forname", newUser.forname));
            doc.AddCustomProperty(new CustomProperty("Username", newUser.username));
            doc.AddCustomProperty(new CustomProperty("FreeGift", newUser.freeGift));
            doc.AddCustomProperty(new CustomProperty("HomeAddress", newUser.HomeAddress));
            doc.AddCustomProperty(new CustomProperty("PleaseWaitNDays", 4));
            doc.AddCustomProperty(new CustomProperty("GiftArrivalDate", newUser.joined.AddDays(4).ToUniversalTime()));
            doc.AddCustomProperty(new CustomProperty("RecieveFurtherMail", newUser.RecieveFurtherMail));

            // Save this document as the user's username followed by .docx
            doc.SaveAs(string.Format(@"{0}.docx", newUser.username));
        } // Release this document from memory
    }
}

The above code will generate three .docx files, one per user:

Figure 2.7 – John87.docx

Figure 2.8 – KingJames.docx

Figure 2.9 – McNamara1.docx

-----------------------------------------------------------------------------------------------------------------------

If you would like to give me feedback on my library DocX, please either post a comment here or email me at coffey.cathal@gmail.com.

Happy coding,
Cathal

20 comments:

  1. Hi Cathal,
    Cool features! Is there a way to do a "find" of a pattern (I noted the Regular Expression setting, so it sounds possible) and get back what the pattern found? For example, string Find("[*]") might return [Address], which would allow me to look up "Address" then do document.ReplaceText("[Address]", "21 Vine Street", true, RegexOptions.IgnoreCase); and avoid Custom Properties (which are difficult to use, but I can get a list of them). If find could return a list, it would be even better! Thanks!

  2. Thanks for this beautiful library. The primary importance shall only come when your library will support RTF Format files or data to be converted to DocX files or data, and DocX files or data to RTF Format files or data, such that a Word Editor which uses RichTextBox or RichEdit 6.0 would be able to export/import DocX files. There is no other importance.

    Replies
    1. Lay off! this is a great idea!

    2. "There is no other importance."

      No other importance to YOU maybe but you are not the world and I for one find this library to be exceedingly helpful.

  3. Hey, these are super cool features! Also, I like how you describe yourself as an adventurer. It especially helps that you are Irish. Anyway, I was wondering if you have more information converting files and XSL-FO. I have checked out a few sites (like the linked one), but am still trying to figure this all out for a project. -Janie

  4. Thanks a lot.
    Now I am using Word to generate a report for a product.
    First I tried Crystal Reports, but it is a data-driven report tool and is not suitable for reports with many text blocks.
    Then I saw DocX on CodePlex. It is wonderful.

  5. Hi. This is Shanmugam. I used DocX.dll. It is nice. I have one problem. When the Word file is closed there is no error in the browser of my ASP.NET application. If I open the .docx file and then run it, I get an error. Can you give me a solution so that it works even when the .docx file is open?


    My code is as

    Protected void Page_Load(object sender, EventArgs e)
    {
    List obContractIssueList = new List();

    int t = 0, d = 0, v = 0, va = 0, act = 0, tot = 0, dep = 0, bal = 0; ;
    using (DocX doc = DocX.Load(@"C:\Documents and Settings\shanmugam\Desktop\test.docx"))
    {
    foreach (Paragraph p in doc.Paragraphs)
    {
    List time = doc.FindAll("");
    List date = doc.FindAll("");
    List venue = doc.FindAll("");
    List venueAddress = doc.FindAll("");
    List actName = doc.FindAll("");
    List totalCost = doc.FindAll("");
    List deposit = doc.FindAll("");
    List balance = doc.FindAll("");
    t = time.Count;

    d = date.Count;
    v = venue.Count;
    va = venueAddress.Count;
    act = actName.Count;
    tot = totalCost.Count;
    dep = deposit.Count;
    bal = balance.Count;


    }
    if (t == 0)
    { obContractIssueList.Add("<TIME>"); }
    if (d == 0)
    { obContractIssueList.Add("<DATE>"); }
    if (v == 0)
    { obContractIssueList.Add("<VENUE>"); }
    if (va == 0)
    { obContractIssueList.Add("<VENUE ADDRESS>"); }
    if (act == 0)
    { obContractIssueList.Add("<ACT NAME>"); }
    if (tot == 0)
    { obContractIssueList.Add("<TOTAL COST>"); }
    if (dep == 0)
    { obContractIssueList.Add("<DEPOSIT>"); }
    if (bal == 0)
    { obContractIssueList.Add("<BALANCE>"); }
    }

    foreach (var s in obContractIssueList)
    {
    Response.Write(s.ToString());
    }
    }
    }

    It's urgent.
    Thanks in advance
    by Shanmugam.s

  6. Hi Cathal, it's a very nice project.
    I am facing an issue related to AddCustomProperty.

    My problem is: how can I make a template document without using AddCustomProperty? My customer already has a template, and I need to fill in the values using DocX.dll.
    Or, in simple words, how can I add a CustomProperty to a Word document?

    thanks
    Gill

  7. Thanks a lot for this release.

    I wondered if I can save the file with a save as/open dialog, without saving the file on a server?

    Thanks again

  8. Hi, I want to set the font for the whole file programmatically. I am using this DocX DLL for writing the Word file, but after the file is written the formatting is not right. I want to use the Courier New font at size 7. Please give me sample code and help me.
    Thanks and regards
    Sumeet.

  9. Hey I just want to know is this possible with DocX library that I find a specific word from the document file and add some string just after that word. I don't want to replace any text but to find a text and add another text with a space after that.

    Thanks And Regards
    Toshim Shaikh

  10. I am also a .NET developer and follow a group on my Google Plus profile for a .NET Word library by the name of Aspose. It's not a free API like DocX, but it allows a free trial, so you can try this API also.

  11. Hi Cathal,

    I notice you can work with templates but your sample uses docx instead of dotx:

    using (DocX doc = DocX.Load("Template.docx"))

    Can you use dotx extension or does it not matter?

    thanks
    Russ

  12. Hi Cathal,

    Brilliant work. Following on from your Invoice example, is it possible to add Decimal Tabs to align printed values with the DocX library?

    Thanks in advance
    Glenn

  13. Custom properties which exist in the header and footer are not getting updated. Please help to resolve the issue.

  14. Hi Cathal
    How do I use custom properties whose value is an HTML tag?

    Replies
    1. Quite interested in this also. Been trying to use Novacode to import some HTML fragments and for them to keep its formatting. Is this possible?

  15. Hi Cathal,
    How can I add the page number using DocX? I have 2 to 3 pages and I want to implement a page count...

    Replies
    1. http://cathalscorner.blogspot.fr/2010/06/docx-version-10010.html
