About Me

Ireland
Hello, my name is Cathal Coffey. I am best described as a hybrid between a developer and an adventurer. When I am not behind a keyboard coding, I am hiking and climbing the beautiful mountains of my home country, Ireland. I am a full-time student studying Computer Science & Software Engineering at the National University of Ireland Maynooth. I am finishing the final year of a four-year degree in September 2009. I am the creator of an open source project on codeplex.com called DocX. At the moment I spend a lot of my free time advancing DocX and I enjoy this very much. My aim is to build a community around DocX and add features based on requests from this community. I really enjoy hearing about how people are using DocX in their work or personal projects, so if you are one of these people, please send me an email at coffey.cathal@gmail.com. Cathal

Thursday, December 10, 2009

Is DocX dead?

DocX is not dead. I have just returned to college to finish the fourth and final year of my computer science and software engineering degree. At the moment, because of lectures, assignments, and my final year project, I have very little time to contribute to DocX. I am still answering questions that people email me, but I cannot release a new version of DocX until my exams are finished in June. I have received a lot of emails lately requesting new features for DocX and I am maintaining a priority list of these. I will work off this list in June.

In case you are interested, below is some information about my final year project.

Project title: Real time, naked hand tracking with minimal calibration.
Overview: My project aims to answer the following question. Can a user's hands be reliably tracked in real time from a video sequence without using markers, gloves or electronic devices?


It is now December and I have been working on this problem since September. I have had a lot of success in these four short months.


At this point I am building example applications to demonstrate my project's use. I can easily extract two very useful pieces of information: the distance between my hands and the angle of the line that joins them.


Here is a video of what I have so far. The change in distance d and angle a between my hands controls the zoom and rotation of the teapot.
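
As a rough sketch of how d and a could be computed once the two hand positions are known, the snippet below takes two points in image coordinates and returns the distance between them and the angle of the line joining them. The class name, method names and sample coordinates are all hypothetical, and this is not the tracking code from my project; how the hand positions are found is a separate problem.

```csharp
using System;

// Hypothetical helper for illustration only.
static class HandGeometry
{
    // Euclidean distance between two hand centres, in pixels.
    public static double Distance(double x1, double y1, double x2, double y2)
    {
        double dx = x2 - x1;
        double dy = y2 - y1;
        return Math.Sqrt(dx * dx + dy * dy);
    }

    // Angle of the line from hand 1 to hand 2, in degrees,
    // measured from the positive x axis.
    public static double AngleDegrees(double x1, double y1, double x2, double y2)
    {
        return Math.Atan2(y2 - y1, x2 - x1) * 180.0 / Math.PI;
    }
}

static class Program
{
    static void Main()
    {
        // Assumed hand centres; a real tracker would supply these every frame.
        double d = HandGeometry.Distance(100, 200, 400, 600);
        double a = HandGeometry.AngleDegrees(100, 200, 400, 600);
        Console.WriteLine("d = {0:F1} pixels, a = {1:F1} degrees", d, a);
    }
}
```

An application would then compare d and a from one frame to the next and feed the change into the zoom and rotation of whatever it is displaying.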


Let me know what you think,
Cathal

video

Saturday, October 31, 2009

Converting .docx into (.doc, .pdf, .html)

Introduction

A DocX user asked me during the week when I was going to support converting Word 2007 documents (.docx) into other useful formats such as .doc, .pdf and .html. I would love to add this functionality to DocX; however, there is a problem.

The Problem

The only easy way to do this conversion is to use Microsoft's Office interop libraries. If you don't know what the Microsoft Office interop libraries are, I envy you.

The Microsoft Office interop libraries are available in the Add Reference dialog.


The Code

Once you have added a reference to Microsoft.Office.Interop.Word, you can use the project below to convert a Word 2007 .docx into .doc, .pdf, and .html.

using System;
using Word = Microsoft.Office.Interop.Word;
using Microsoft.Office.Interop.Word;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Convert Input.docx into Output.doc
            Convert(@"C:\users\cathal\Desktop\Input.docx", @"c:\users\cathal\Desktop\Output.doc", WdSaveFormat.wdFormatDocument);

            /*
             * Convert Input.docx into Output.pdf
             * Please note: You must have the Microsoft Office 2007 Add-in: Microsoft Save as PDF or XPS installed
             * http://www.microsoft.com/downloads/details.aspx?FamilyId=4D951911-3E7E-4AE6-B059-A2E79ED87041&displaylang=en
             */
            Convert(@"c:\users\cathal\Desktop\Input.docx", @"c:\users\cathal\Desktop\Output.pdf", WdSaveFormat.wdFormatPDF);

            // Convert Input.docx into Output.html
            Convert(@"c:\users\cathal\Desktop\Input.docx", @"c:\users\cathal\Desktop\Output.html", WdSaveFormat.wdFormatHTML);
        }

        // Convert a Word 2007 .docx into the requested format (.doc, .pdf or .html).
        public static void Convert(string input, string output, WdSaveFormat format)
        {
            // Interop requires objects passed by reference.
            object oMissing = System.Reflection.Missing.Value;
            object isVisible = true;
            object readOnly = false;
            object oInput = input;
            object oOutput = output;
            object oFormat = format;

            // Create an instance of Word.exe
            Word._Application oWord = new Word.Application();

            try
            {
                // Make this instance of Word invisible (it can still be seen in taskmgr).
                oWord.Visible = false;

                // Load a document into our instance of Word.exe
                Word._Document oDoc = oWord.Documents.Open(ref oInput, ref oMissing, ref readOnly, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref isVisible, ref oMissing, ref oMissing, ref oMissing, ref oMissing);

                // Make this document the active document.
                oDoc.Activate();

                // Save this document in the requested format.
                oDoc.SaveAs(ref oOutput, ref oFormat, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing);
            }
            finally
            {
                // Always close Word.exe, even if opening or saving failed.
                oWord.Quit(ref oMissing, ref oMissing, ref oMissing);
            }
        }
    }
}

The result

 

(Screenshots: Input.docx, followed by the converted Output.doc, Output.pdf and Output.html.)

Please note

This code will only execute on a machine that has Microsoft Office installed on it. The Microsoft Office interop libraries actually run a "hidden" instance of Word. If you run the above code and then take a look at taskmgr you will see the following.

image

If you want to convert to .pdf, you must also have the Microsoft Office 2007 Add-in: Microsoft Save as PDF or XPS installed.

It is for this reason that I have not included convert functionality into my DocX library. I do not want DocX to have a dependency on Word.exe.

The future

Is there no way to do conversions without having Word.exe installed on my machine? I didn't say that; I said there is no easy way. This looks very promising, now if I could only find the time.

Donation?

As always, I offer this code to you for free. I am however a student, and if you would like to say thank you, you can buy me lunch by sending a €5 donation via PayPal.

Thursday, September 17, 2009

DocX codeplex page redesign

I have spent some time over the past few days redesigning http://docx.codeplex.com. I have decided to go with a minimalistic design. My girlfriend Cristina, who created all of the artwork for me, told me that I should not be afraid of white space. I think she's on to something.

The biggest addition to the new page design is a section aptly named "How do I learn more?". It is my intention to upload a large series of video tutorials to this section. I have already uploaded the first two.

1. Getting started with DocX.
2. Paragraphs and text formatting.

If you have any requests, topics that you would like to see covered, please send them to coffey.cathal@gmail.com. I am definitely going to create a video about CustomProperties, and another about Images and Pictures. I would also welcome feedback about the new page design, both positive and negative.

Kind regards and happy coding,
Cathal

Friday, August 28, 2009

DocX v1.0.0.8 released

DocX version 1.0.0.8 is now available for download from here. So what's new in this version? Why should you download it?

The biggest addition to this version of DocX is a better method of building highly formatted Paragraphs. This new method is a fluent interface and it was proposed by one of DocX's users Morten Bjerre. So what is a fluent interface? This Wikipedia article explains fluent interfaces quite well.
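
In short, every method in a fluent interface returns the object it was called on, so calls can be chained into one readable expression. Here is a minimal sketch of the pattern, independent of DocX. TextBuilder is a hypothetical class made up for this illustration; it is not part of the DocX API.

```csharp
using System;
using System.Text;

// Hypothetical class for illustration; it is not part of DocX.
class TextBuilder
{
    private readonly StringBuilder text = new StringBuilder();

    // Returning 'this' is what makes chaining possible.
    public TextBuilder Append(string s)
    {
        text.Append(s);
        return this;
    }

    // Start a new line, then append the text.
    public TextBuilder AppendLine(string s)
    {
        text.Append('\n').Append(s);
        return this;
    }

    public override string ToString()
    {
        return text.ToString();
    }
}

static class Program
{
    static void Main()
    {
        string result = new TextBuilder()
            .Append("I am ")
            .Append("chained")
            .AppendLine("and so am I.")
            .ToString();

        Console.WriteLine(result);
    }
}
```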

Here is an example of how this new addition to DocX makes building highly formatted Paragraphs easier and more intuitive. Let's say that we want to create a document that looks like this.

Untitled
This can now be accomplished with the below code.

// Create a document.
using (DocX document = DocX.Create(@"Test.docx"))
{
    // Insert a new Paragraph.
    Paragraph p = document.InsertParagraph();

    p.Append("I am ").Append("bold").Bold()
     .Append(" and I am ")
     .Append("italic").Italic().Append(".")
     .AppendLine("I am ")
     .Append("Arial Black")
     .Font(new FontFamily("Arial Black"))
     .Append(" and I am not.")
     .AppendLine("I am ")
     .Append("BLUE").Color(Color.Blue)
     .Append(" and I am ")
     .Append("Red").Color(Color.Red).Append(".");

    // Save this document.
    document.Save();
}// Release this document from memory.


This version of DocX also contains three new features that were added by a user, Joel Folkerts. Joel is the first user to post working and tested code for inclusion in DocX. These new features are listed below.

1. Row.Height property
2. Cell.Width property
3. Row.MergeCells

Row.Height and Cell.Width are self explanatory, but here is an example of using Row.MergeCells. Let's say we have the following Table.

Untitled

We can merge the cells {1-3} and the cells {6-8} on row 1 to get the following table.

Untitled3

This can now be accomplished in DocX with the following code.

// Load a document.
using (DocX document = DocX.Load(@"Test.docx"))
{
    // Grab the first Table.
    Table t = document.Tables[0];

    // Grab row 1 of this Table (Rows is zero-indexed).
    Row r = t.Rows[1];

    // Merge the cells 1-3 into one new cell.
    r.MergeCells(1, 3);

    // The original cells 6-8 now sit at indexes 4-6, because the
    // first merge replaced three cells with one. Merge them.
    r.MergeCells(4, 6);

    // Save all changes made to this document.
    document.Save();
}// Release this document from memory.


I have also added a new feature that was requested by a user, Mathetes. This new feature allows ReplaceText() to take a Formatting object as an extra parameter; it then only replaces text that matches both the input string and that formatting. Another new parameter for this function is a Formatting object to be applied to the inserted text.

// Load a document.
using (DocX document = DocX.Load(@"Test.docx"))
{
    // The formatting to match.
    Formatting matchFormatting = new Formatting();
    matchFormatting.Size = 10;
    matchFormatting.Italic = true;
    matchFormatting.FontFamily = new FontFamily("Times New Roman");

    // The new formatting to apply.
    Formatting newFormatting = new Formatting();
    newFormatting.Size = 22;
    newFormatting.UnderlineStyle = UnderlineStyle.dotted;
    newFormatting.Bold = true;

    // Iterate through the paragraphs
    foreach (Paragraph p in document.Paragraphs)
    {
        p.ReplaceText
        (
            "wrong", "right", false,
            RegexOptions.IgnoreCase,
            newFormatting, matchFormatting,
            MatchFormattingOptions.SubsetMatch
        );
    }

    // Save all changes made to this document.
    document.Save();
}// Release this document from memory.

I want to take this opportunity to say a huge thank you to Morten Bjerre, Joel Folkerts and Mathetes. You have all enhanced DocX greatly with the features that you have added to this version.

If anyone out there reading this wants to suggest a new feature, please send it my way. I will try my best to include it in a future version for DocX.
happy coding,
Cathal

Friday, August 14, 2009

DocX v1.0.0.7 released

I have just uploaded DocX version 1.0.0.7 to docx.codeplex.com. This version comes with an extra optional download “Advance – Invoice Example.zip”.

“Advance – Invoice Example.zip” is a zipped Visual Studio 2008 solution that I created to demonstrate how far DocX has evolved since version 1.0.0.1. The first time that you run InvoiceExample.exe it will check for the existence of a document called “InvoiceTemplate.docx”. Because this is the first execution, this document will not exist, so it is created.

Figure 1.0 - InvoiceTemplate.docx created by first run of InvoiceExample.exe

It is important to note that everything you see in the above screenshot was created by DocX. If you now run InvoiceExample.exe again, the document “InvoiceTemplate.docx” is found and it is used to create an invoice for a fictitious company called “The Happy Builder”.

Figure 1.1 - Invoice_The_Happy_Builder.docx created by second run of InvoiceExample.exe

As you can see DocX has loaded the document “InvoiceTemplate.docx” into memory and customized it. This was done by setting custom property values, inserting an image, creating a picture from this image, and replacing a Table with a new Table that contains data from a data source.
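
The create-on-first-run, reuse-afterwards flow described above can be sketched as follows. This is not the source of InvoiceExample.exe: TemplateWorkflow and EnsureTemplate are hypothetical names, and writing a placeholder text file stands in for the step where the real example builds the template with DocX.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not taken from InvoiceExample.
static class TemplateWorkflow
{
    // Returns "created" when the template had to be created,
    // and "reused" when it already existed.
    public static string EnsureTemplate(string templatePath)
    {
        if (!File.Exists(templatePath))
        {
            // Stand-in for building the template document.
            File.WriteAllText(templatePath, "template placeholder");
            return "created";
        }

        return "reused";
    }
}

static class Program
{
    static void Main()
    {
        // Use a unique file in the temp folder so the sketch is repeatable.
        string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".docx");

        Console.WriteLine(TemplateWorkflow.EnsureTemplate(path)); // first run
        Console.WriteLine(TemplateWorkflow.EnsureTemplate(path)); // second run

        File.Delete(path);
    }
}
```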

This example is intended for an experienced user of DocX. A first time user may find this solution overwhelming. Over the next few days, I am going to create a video tutorial called "Getting started with DocX". This video will be targeted at the first time user of DocX.

I hope that from this example you can see how quickly DocX is growing and also how useful it is becoming. As always, if you would like to report a bug, request a feature or even just say hi, please email me at coffey.cathal@gmail.com.

happy coding,
Cathal

DocX is completely free, but if you have found it useful and you would like to make a donation you can do so via paypal.

Tuesday, August 4, 2009

DocX version 1.0.0.6 released

Hi again, sorry it has been so long since my last post. My progress in the physical world has slowed my progress in the digital.

This newest version of DocX brings many bug fixes and also two new features. Firstly, I would like to thank the following people for taking the time to email me with bugs and feature requests. Without your interest and feedback, DocX would not progress.

Many thanks to: John Marsh, Francesco, Johan Laagland, Gaspar, Brian (chicken), Jianbang, Christopher W. Steelman, Peter Thompson, Brian Minert.

New features

1) InsertDocProperty(CustomProperty cp)

CustomProperties are displayed in a document using fields of type DocProperty. Until this version, DocX was capable of adding a CustomProperty to a document, but it was not capable of displaying these CustomProperties. As of DocX version 1.0.0.6, this is now possible. Below is an example.

// Create a document
using (DocX document = DocX.Create(@"Test.docx"))
{
    // Create a custom property.
    CustomProperty name = new CustomProperty("name", "Cathal Coffey");

    // Add this custom property to this document.
    document.AddCustomProperty(name);

    // Insert a new paragraph.
    Paragraph p = document.InsertParagraph("Author: ", false);

    // Insert a field of type document property to display the custom property name
    p.InsertDocProperty(name);

    // Save all changes made to this document.
    document.Save();
}// Release this document from memory.

The above code example will create the following document. Please note that the text 'Cathal Coffey' is being displayed by a field; this field's value is controlled by the CustomProperty 'name'.

2) FindAll(string str)

This version of DocX allows you to search a document for a string. The function FindAll(string str) returns a list containing all of the start indexes of the found string. Below is an example of this new function.

// Load a document
using (DocX document = DocX.Load(@"Test.docx"))
{
    // Loop through the paragraphs in this document.
    foreach (Paragraph p in document.Paragraphs)
    {
        // Find all instances of 'go' in this paragraph. The indexes must be
        // relative to this paragraph because they are passed to p.InsertText.
        List<int> gos = p.FindAll("go");

        /*
         * Insert 'don't ' in front of every instance of 'go' in this
         * paragraph to produce 'don't go'. An important trick here is to
         * do the inserting in reverse document order. If you inserted in
         * document order, every insert would shift the index of the
         * remaining matches.
         */
        gos.Reverse();
        foreach (int index in gos)
        {
            p.InsertText(index, "don't ", true);
        }
    }

    // Save all changes made to this document.
    document.Save();
}// Release this document from memory.

If the above code is run on this document (the drawing in red illustrates the index of each instance of 'go').

Then the following document is produced.

As you can see, 'don't' was inserted at the original index of each instance of 'go'.
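
To see why the reverse order matters without opening Word, the same trick can be tried on a plain string. The sketch below uses only StringBuilder from the .NET base class library. ReverseInsert, FindAll and InsertBeforeEach are hypothetical names chosen for this illustration, not DocX functions.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; it does not use DocX.
static class ReverseInsert
{
    // Returns the start index of every occurrence of 'value' in 'text'.
    public static List<int> FindAll(string text, string value)
    {
        var indexes = new List<int>();
        if (value.Length == 0)
        {
            return indexes; // Nothing to search for.
        }

        int i = text.IndexOf(value, StringComparison.Ordinal);
        while (i >= 0)
        {
            indexes.Add(i);
            i = text.IndexOf(value, i + value.Length, StringComparison.Ordinal);
        }

        return indexes;
    }

    // Inserts 'prefix' in front of every occurrence of 'value'.
    public static string InsertBeforeEach(string text, string value, string prefix)
    {
        List<int> indexes = FindAll(text, value);

        // Work from the last match to the first, so that an insert never
        // moves a match that still has to be processed.
        indexes.Reverse();

        var sb = new StringBuilder(text);
        foreach (int index in indexes)
        {
            sb.Insert(index, prefix);
        }

        return sb.ToString();
    }
}

static class Program
{
    static void Main()
    {
        // Prints "Please don't go. Just don't go."
        Console.WriteLine(ReverseInsert.InsertBeforeEach("Please go. Just go.", "go", "don't "));
    }
}
```

If the loop ran in document order instead, the first insert would push the second 'go' six characters to the right and the second insert would land in the wrong place.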

DocX is completely free, but if you have found it useful and you would like to make a donation you can do so via paypal.

As always, please contact me if you find any bugs, have any feature requests or just want to provide feedback.

happy coding,
Cathal

Sunday, April 19, 2009

DocX version 1.0.0.4 released

This blog post is to announce the release of DocX version 1.0.0.4. I spent a lot of time refactoring the library for this release. I have renamed a lot of functions to give a more consistent implementation. I have also created a (.chm) documentation file which contains lots of useful examples.

The .chm documentation
When you open the documentation file, you may see the message "Navigation to the webpage was cancelled", as in the screenshot below.

image

If so, right click on the .chm file and select Properties.

image

In the properties window, click unblock, then press apply and ok.

image

You should then be able to view the documentation and it should look like this.

image

The documentation contains lots of examples of how to use all of DocX’s features. If you find a mistake in the documentation or if you would like to add a code example that you have written please email me. If I think your example adds value to the documentation then I will gladly add it and mention you in the credits section.

Changes in this release

  1. Added two new overloads, DocX.Load(System.IO.Stream) and DocX.Create(System.IO.Stream).
  2. DocX.AddParagraph has been renamed to InsertParagraph and now allows a user to insert a new Paragraph at a specified character index in the document. Four overloads are available.
  3. DocX.Value has been renamed to DocX.Text.
  4. DocX.Save has been renamed to DocX.Close(bool saveChanges).
  5. Paragraph.Insert, Paragraph.Remove and Paragraph.Replace have been renamed to Paragraph.InsertText, Paragraph.RemoveText, Paragraph.ReplaceText, new overloads are also available.
  6. Paragraph.Value has been renamed as Paragraph.Text.

If you would like to send me feedback on DocX, or if you would like to make a suggestion for the next feature I implement, please email me at coffey.cathal@gmail.com.

happy coding,
Cathal

Wednesday, April 8, 2009

DocX version 1.0.0.2 released

Note: Code samples have been updated to work with DocX version 1.0.0.6.

Lots of people emailed me asking me to add support for Images in DocX. So I did. Version 1.0.0.2 which can be downloaded from here adds basic support for Images and Pictures.

Images and Pictures? Are they not the same thing, you might ask? No, they are not! You can think of a Picture as a customized view of an Image. First you add Images to a docx file.

image

You can then add multiple customized Pictures of those Images into Paragraphs. You can do all sorts of interesting customizations with these Pictures before inserting them into Paragraphs. At the moment DocX supports the following Picture customizations:

Rotations and transformations

image image image

Resizing

image image image

Shaping

image image image

image image image

Time for an example using DocX; below is a file entitled Example.docx.

Figure 1.0 – Example.docx before code execution

The following code adds the Image Donkey.jpg into this file. It then creates two Pictures using this Image and inserts them into the last Paragraph.

// Create a .docx file
using (DocX document = DocX.Create(@"Example.docx"))
{
    // Add an Image to the docx file
    Novacode.Image img = document.AddImage(@"Donkey.jpg");

    // Insert an empty Paragraph into this document.
    Paragraph p = document.InsertParagraph("", false);

    #region pic1
    // Create a Picture. A Picture is a customized view of an Image
    Picture pic1 = p.InsertPicture(img.Id, "Donkey", "Taken on Omey island");

    // Set the Picture pic1's shape
    pic1.SetPictureShape(BasicShapes.cube);

    // Rotate the Picture pic1 clockwise by 30 degrees
    pic1.Rotation = 30;
    #endregion

    #region pic2
    // Create a Picture. A Picture is a customized view of an Image
    Picture pic2 = p.InsertPicture(img.Id, "Donkey", "Taken on Omey island");

    // Set the Picture pic2's shape
    pic2.SetPictureShape(CalloutShapes.cloudCallout);

    // Flip the Picture pic2 horizontally
    pic2.FlipHorizontal = true;
    #endregion

    // Save the docx file
    document.Save();
}// Release this document from memory.
image

Figure 1.1 – Example.docx after code execution

Another interesting thing that you could do would be to apply an operation to every Picture in a document. The below snippet would rotate every Picture in a document clockwise by 30 degrees.

// Load the document that you want to manipulate
using (DocX document = DocX.Load(@"Test.docx"))
{
    // Loop through each Paragraph
    foreach (Paragraph p in document.Paragraphs)
    {
        // Loop through each Picture in this Paragraph
        foreach (Picture pi in p.Pictures)
        {
            // Rotate this picture clockwise by 30 degrees
            pi.Rotation = 30;
        }
    }

    // Save the document
    document.Save();
}// Release this document from memory

If you would like to send me feedback on DocX, or if you would like to make a suggestion for the next feature I implement, please email me at coffey.cathal@gmail.com.

happy coding,
Cathal

Thursday, March 19, 2009

DocX version 1.0.0.1 released

Note: Code samples have been updated to work with DocX version 1.0.0.6.

I promised that I would release DocX frequently, and with lots of new useful features.

Version 1.0.0.1, which can be downloaded from here, allows a developer to create a document on the fly, add new paragraphs and insert highly formatted text. Just about every font option that is available in Word 2007 is now exposed by DocX. These options include, but are not limited to: font, size, bold, italics, underline, strike through, superscript, subscript, color, highlight color, scale, shadow, outline, emboss, engrave, hidden, spacing, position, kerning.

Figure 1.0 – Word 2007 font options, tab 1

image
Figure 1.1 – Word 2007 font options, tab 2

Here’s an example of how to generate a docx file containing custom formatted text, out of thin air, using DocX.

// Create a new document.
using (DocX document = DocX.Create(@"Test.docx"))
{
    // Create a formatting called f1
    Formatting f1 = new Formatting();
    f1.FontFamily = new FontFamily("Agency FB");
    f1.Size = 28;
    f1.Bold = true;
    f1.FontColor = Color.RoyalBlue;
    f1.UnderlineStyle = UnderlineStyle.doubleLine;
    f1.UnderlineColor = Color.Red;

    // Insert a new Paragraph into this document.
    Paragraph p = document.InsertParagraph("I've got style!", false, f1);

    // Create a formatting called f2
    Formatting f2 = new Formatting();
    f2.FontFamily = new FontFamily("Colonna MT");
    f2.Size = 36.5;
    f2.Italic = true;
    f2.FontColor = Color.SeaGreen;

    // Insert new text at the end of this Paragraph.
    p.InsertText("I have a different style.", false, f2);

    // Save all changes made to this document.
    document.Save();
}// Release this document from memory

The above code will generate the following document.

Figure 1.2 – Out of thin air

The coolest thing about this code is that you can generate .docx files in the cloud, without a dependency on Microsoft Word.

Please remember that DocX is not a full blown solution for .docx creation and manipulation, not yet anyway. It is a project that I am working on in my spare time. If you would like to request a feature for version 1.0.0.2, please email me or alternatively leave a comment here.

My current aim for version 1.0.0.3, is to add the ability to create and manipulate images, tables and lists.

happy coding,
Cathal

Thursday, March 12, 2009

Compiling DocX from source code

I received a few emails outlining problems that people have encountered while trying to compile DocX from source. This post explains how to solve those problems.

Unable to read the project file “DocX.csproj”

Anyone who downloaded change set 17108 (the build uploaded Mon at 6:58 PM) would have experienced the following error on opening DocX.csproj in Visual Studio.

Figure 1.0 – Unable to read the project file “DocX.csproj”

There are two ways to overcome this problem:

  1. You can simply download change set 15663 (the build uploaded Thursday at 11:08 AM).

    or
  2. Open DocX.csproj in a text editor (notepad.exe) and remove the following two lines:
    1. <DeepSeaObfuscate>false</DeepSeaObfuscate>
    2. <Import Project="$(MSBuildExtensionsPath)\DeepSea Obfuscator\DeepSea.Obfuscator.targets" />

Missing reference DocumentFormat.OpenXml

If Visual Studio’s reference window cannot find DocumentFormat.OpenXml then you need to download and install the Open XML Format SDK 2.0.

Figure 1.1 – Missing reference DocumentFormat.OpenXml

Test projects missing reference DocX

If Visual Studio’s reference window for the projects CustomPropertyTextApp and StringReplaceTestApp cannot find DocX then you need to update the reference.

Figure 1.2 – Test projects missing reference DocX

First you need to build the DocX project. You can do this by right clicking on the project and selecting Build.

Figure 1.3 – Build DocX

Right click on the references for both projects and select “Add Reference…”

Figure 1.4 – Add reference…

Select the Projects tab and choose DocX.

Figure 1.5 – Select the projects tab

That’s it, you’re done. The entire solution will now compile.