
Don’t bother zipping .PPTX files

June 22, 2007

Somebody recently sent me a .pptx file inside a .zip file. Remember that .xps, .docx, .xlsx, and .pptx files are all based on the Open Packaging Conventions (OPC) and are therefore already stored as ZIP archives, so you usually won’t see any compression benefit from putting those files into a .zip file.
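To see the effect without a real presentation at hand, here is a minimal Python sketch. It assumes a synthetic stand-in: XML-like text is zipped once (roughly what an OPC-based format does on save) and then zipped again (what the sender did). The part name, the file name deck.pptx, and the generated markup are all made up for illustration, and the archive is not a valid .pptx.

```python
import io
import random
import zipfile


def zip_bytes(name: str, payload: bytes) -> bytes:
    """Store one member in a new in-memory ZIP archive using DEFLATE."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


# Synthetic XML-like text: repeated markup around varying content.
rng = random.Random(0)
raw = "".join(
    f"<a:p><a:r><a:t>{rng.getrandbits(64):016x}</a:t></a:r></a:p>\n"
    for _ in range(2000)
).encode("utf-8")

# First pass: stand-in for the compression an OPC-based format applies on save.
package = zip_bytes("ppt/slides/slide1.xml", raw)
# Second pass: the sender zips the already-compressed file again.
zipped_again = zip_bytes("deck.pptx", package)

print(f"raw XML:            {len(raw):>7} bytes")
print(f"package (1st zip):  {len(package):>7} bytes")
print(f"package in a .zip:  {len(zipped_again):>7} bytes")
```

The first pass should shrink the text considerably, while the second pass has almost nothing left to squeeze and can even add a little archive overhead. Real files will vary, especially ones that embed media that is stored uncompressed.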

Update [6/25/2007]: a good (somewhat developer-centric) list of the benefits of the new Office file formats can be found in the Office developer documentation on MSDN. A snapshot/rewording of those benefits for end users:

The new Office XML Formats introduce a number of benefits for users:

  • Robust. The new Office file formats are designed to be more robust than the old formats, and, therefore, to help reduce the risk of lost information due to damaged or corrupted files.
  • Efficient. The Office XML Formats use ZIP and compression technologies to store documents. A significant benefit of the new formats is substantially smaller file sizes, up to 75 percent smaller than comparable binary documents.
  • Backward-compatible. The 2007 Microsoft Office system is backward-compatible with these earlier versions: Microsoft Office 2000, Microsoft Office XP, and Microsoft Office 2003. Users of these versions can adopt the new format with little effort and continue to gain maximum benefit from existing files.
  • Secure. The openness of the Office XML Formats translates to more secure and transparent files. You can share documents confidently because you can easily identify and remove personally identifiable information and business-sensitive information, such as user names, comments, and file paths. By default, the new Word 2007, Excel 2007, and PowerPoint 2007 file formats (.docx, etc.) do not execute embedded code. So, if a person receives an e-mail message with a Word document attached, he or she can open the attachment knowing that the document does not execute harmful code. The Office XML Formats include a special-purpose format with a separate extension (.docm, etc.) for files with embedded code, enabling IT staff to quickly identify files that contain code.
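The "separate extension" idea lends itself to a simple filter. Below is a minimal Python sketch of an extension-only check; the function name classify and the two sets are hypothetical, and the sets cover only the document extensions named in this post (other macro-enabled extensions, such as templates and add-ins, would need to be added for real use).

```python
from pathlib import PurePath

# Extensions named in this post; this is not an exhaustive list.
MACRO_ENABLED = {".docm", ".xlsm", ".pptm"}
MACRO_FREE = {".docx", ".xlsx", ".pptx"}


def classify(filename: str) -> str:
    """Classify a file name by its final extension only (contents are not inspected)."""
    ext = PurePath(filename).suffix.lower()
    if ext in MACRO_ENABLED:
        return "macro-enabled"
    if ext in MACRO_FREE:
        return "macro-free"
    return "unknown"


for name in ("budget.xlsm", "deck.pptx", "deck.pptx.zip"):
    print(name, "->", classify(name))
```

Note that a file wrapped in a .zip comes back as "unknown": the outer extension hides the inner one, which is one more reason not to re-zip these files.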

Update 2 [6/25/2007]: an Office PM pointed me to "Introduction to new file name extensions and Office XML Formats", which lists good end-user-level benefits of the new file formats. Here is a snapshot:

  • Compact files   Files are automatically compressed and can be up to 75 percent smaller in some cases. The Office XML Formats use zip compression technology to store documents, offering potential cost savings because it reduces the disk space required to store files and decreases the bandwidth needed to send files via e-mail, over networks, and across the Internet. When you open a file, it is automatically unzipped. When you save a file, it is automatically zipped again. You do not have to install any special zip utilities to open and close files in the 2007 Office release.
  • Improved damaged-file recovery   Files are structured in a modular fashion that keeps different data components in the file separate from each other. This allows files to be opened even if a component within the file (for example, a chart or table) is damaged or corrupted.
  • Better privacy and more control over personal information   Documents can be shared confidentially, because personally identifiable information and business-sensitive information, such as author names, comments, tracked changes, and file paths can be easily identified and removed by using Document Inspector. For details, see Remove hidden data and personal information from Office documents.
  • Better integration and interoperability of business data   Using Office XML Formats as the data interoperability framework for the 2007 Office release set of products means that documents, worksheets, presentations, and forms can be saved in an XML file format that is freely available for anyone to use and to license, royalty free. Office also supports customer-defined XML Schemas that enhance the existing Office document types. This means that customers can easily unlock information in existing systems and act upon it in familiar Office programs. Information that is created within Office can be easily used by other business applications. All you need to open and edit an Office file is a ZIP utility and an XML editor.
  • Easier detection of documents that contain macros   Files that are saved by using the default "x" suffix (such as .docx, .xlsx, and .pptx) cannot contain Visual Basic for Applications (VBA) macros and XLM macros. Only files whose file name extension ends with an "m" (such as .docm, .xlsm, and .pptm) can contain macros.
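The "ZIP utility and an XML editor" point can be tried from a script as well. The following Python sketch uses only the standard library and makes several assumptions: the helper names are hypothetical, the demo archive is a synthetic stand-in built in memory (it is NOT a valid Office document, and real parts use XML namespaces that are omitted here), and the macro check relies on the assumed convention that VBA code lives in a part named vbaProject.bin.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def list_parts(package) -> list[str]:
    """Return the names of all parts stored in a ZIP-based package."""
    with zipfile.ZipFile(package) as zf:
        return zf.namelist()


def has_vba_project(package) -> bool:
    """Heuristic: look for a part named vbaProject.bin (assumed convention)."""
    return any(n.lower().endswith("vbaproject.bin") for n in list_parts(package))


def read_xml_part(package, part_name: str) -> ET.Element:
    """Parse one XML part of the package and return its root element."""
    with zipfile.ZipFile(package) as zf:
        return ET.fromstring(zf.read(part_name))


# Synthetic stand-in archive with simplified, namespace-free XML.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr(
        "docProps/core.xml",
        "<coreProperties><creator>Example Author</creator></coreProperties>",
    )
    zf.writestr("word/document.xml", "<document><body/></document>")

parts = list_parts(demo)
creator = read_xml_part(demo, "docProps/core.xml").find("creator").text

print(parts)
print("creator:", creator)
print("contains VBA project:", has_vba_project(demo))
```

The same functions accept a path to a file on disk instead of the in-memory buffer. A content check like has_vba_project complements the extension check rather than replacing it, since a file can be renamed.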


4 Comments
  1. William

    Well, there are other reasons to zip it. Some e-mail programs, rightly or wrongly, block many document formats, including .doc files, because they might include trojans. Despite the "logic" here, I can see those e-mail programs blocking .docx and other formats using the same reasoning.

    Granted, a better solution is to change the file name foo.docx to foo.docx.safe and let the consumer change it back, but you’ll still see users with bad habits doing this, even if they understand that .docx is compressed already.

  2. Rob

    I believe that .docm is the extension used for Word documents that can contain macros… so I don’t think that .docx files should be blocked…

  3. Jiping

    I’ve just learned that the new Office files are saved in a zipped format, but many of my Chinese friends don’t know this yet.
    I’ll announce it in Chinese.
    [Translated from Chinese: The new Office document files are already compressed, so there is no need to compress them again.]

  4. Unknown

    Hey Rob, you’re correct: the .docm/.xlsm/.pptm extensions are used for documents that may contain VBA macros (or XLM sheets in Excel spreadsheets). Blocking documents by extension is part of a defense-in-depth strategy. You can block potentially dangerous external documents by extension, then set the appropriate security settings in the Office applications. Rezipping the file may counteract some of that security by further obscuring the existence of macros within the file. Use the x formats whenever you need to share a document without macros.

    http://blogs.msdn.com/kevinboske
