Skip to main content

Tip: Converting a CSV File into XML by Excel

I came across a problem to convert a historical data in comma-separated format (CSV) into a modern & more useful XML format. There are many ways to do such conversion. The most obvious one is to do it programmatically by writing a program to read CSV data and exporting it to an XML file by means of the Simple API for XML (SAX) or the Document Object Model (DOM) libraries.

But I was too lazy to do it.

I knew that Microsoft Excel 2003 supports exporting worksheets into XML data. I tried it out but found that it is not that simple -- we have to define an XML schema and bind each element to each column of the worksheet.

So I went on researching.

Finally, I found a very useful tool on MSDN:
Excel 2003 XML Tools Add-in Version 1.1 (download).
The Excel Add-in's "XML Tools -> Convert a Range to an XML List" menu command allows us to simply select a range in a worksheet (with or without first row as column names) -- with the advanced options to specify the XML root element name and row element name.

The add-in automatically prepare our worksheet for exporting of XML data by the "Data -> XML -> Export" menu command.

Here is a summary of how I used Excel to convert a CSV file into XML:


  1. Install Excel 2003 XML Tools Add-in

  2. Open the CSV file, e.g. LegacyData.csv, in Excel

  3. Save as an Excel workbook

  4. Open the Excel workbook

  5. Use Menu -> XML Tools -> Convert a Range to an XML List

  6. Select the whole data rage

  7. Choose "Yes, first row contains column names"

  8. Select "Use Advanced Options"

  9. Set XML root element name, e.g. "Root"

  10. Set XML row element name, e.g. "Record"

  11. Click "OK"

  12. An error message "The data that you are attempting to map contains formatting that is incompatible with the format specified in the worksheet" will pop up if we try to convert an automatically formatted date column. This is because of the formatting in the Date column. As a workaround, I chose "Match element data type." Dates in the date column will be converted to the raw format, e.g. 38013 for 1/27/2004.

  13. Export the data by using Menu -> Data -> XML -> Export

  14. The dates in the Date elements of the exported XML file will be in the raw format.

  15. Open the exported XML file in an XML editor. In my case, I used XMLSpy.

  16. Go to the Grid view. Select all entries in the Date column (but DO NOT delete the "Date" column title) and delete the entries.

  17. Copy the corresponding date entries from the Excel worksheet and simply paste them into the Date column.

  18. Save the XML file. The XML data file is now properly converted.


I hope this posting will be useful to those who face the same problem as mine.


Keywords: , , , ,

Comments

Popular posts from this blog

"Microsoft.ACE.OLEDB.12.0 provider is not registered on the local machine" Error on Windows 7 (64-bit) + Office 2010 (64-bit) + Visual Studio 2010

If you use (1) Windows 7 (64-bit), and (2) Office 2010 (64-bit), and  (3) Visual Studio 2010 to write an ASP.NET code to connect to Access or Excel database using the Microsoft.ACE.OLEDB.12.0 provider and consistently get the "Microsoft.ACE.OLEDB.12.0 provider is not registered on the local machine" error, try installing the 2007 Office System Driver: Data Connectivity Components , which is basically a Microsoft Access Database Engine 2007 Redistributable for Windows (32-bit) from http://www.microsoft.com/download/en/details.aspx?id=23734 Many forums suggested by Google Search suggest installing the Microsoft Access Database Engine 2010 Redistributable for Windows (32-bit, 64-bit) downloadable from http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=13255 but it wouldn't help because Visual Studio 2010 is a 32-bit application; what you need is a 32-bit Data Connectivity component. The 2010 download will not allow you to install i...

Tips: Mac OS X: Full ANSI Color Support in Terminal.app

I'm trying to switch my Java development platform from Windows XP to Mac OS X Tiger. Wondering how to colorize the Terminal screen, I spent some time googling. From the discussions at the end of this page: macosxhints.com - Add full ANSI color support to Terminal.app Here's a summary of how to enable it: With bash shell as default, simply add export TERM=xterm-color [I prefer this for Linux compatibility] or export TERM=dtterm in the ~/.profile (single-user) or /etc/profile (system-wide) Color terminal is enabled. Use ' ls -G ' (the -G enables color output) to test. Add alias ls='ls -G' in the profile file for convenience. Keywords: mac-os-x , unix , terminal , shell , tips

Lenovo IdeaPad S10: Annoying Fan Noise Fix and Heat Conduction Upgrade

Here's a repost of my message on the Lenovo.com board: I would like to thank Slash (on Lenovo.com board) so much for his messages on this thread: http://forums.lenovo.com/lnv/board/message?board.id=IdeaPad_Netbook&message.id=1183#M1183 I decide today (February 28, 2009) to walk into a Lenovo Service Center inside a computer mall in downtown Bangkok because I cannot tolerate the constant grinding noise from my S10 anymore. I bought my S10 in late November 2008 and I started to hear some grinding noise in late December 2008. I have been using the S10 as my primary computer both for work and for home. At work, this noise has embarrassed me many, many times. At home, I left the S10 with Vista in High Performance mode on overnight so that it can complete the maintenance tasks, e.g., HDD Defrag and Indexing, but the fan noise from high CPU temperature woke me up at 4:45 in the morning! I learned from Slash's picture http://i41.tinypic.com/25alq3d.jpg that Lenovo chose to use chea...