Mail Archiving: What It Is and Why You Need It
Date Added: September 30, 2007 03:31:57 PM
There seems to be a bit of confusion about what e-mail archiving is and how it differs from e-mail system backups. After all, both are about saving mail for longer periods of time and recovering items at will. I hope to shed a little light on the subject by explaining the differences between archiving and backup, and the situations in which each is most appropriate.

Backups are a familiar concept and have been around almost as long as there has been data stored on computers. Outside the computer world, the concept is even older. People have been making copies of important documents for centuries. More recently this has been accomplished with carbon paper and photocopying, but for almost as long as there has been writing, people have made multiple copies of important information to protect it from loss due to unforeseen circumstances, whether war, natural disaster or political power shifts. Computer backups serve a similar purpose: in the event of an unforeseen problem, such as a fire in the data center or a crashed hard disk, we simply cannot afford to lose important information, because recreating it, if it can be recreated at all, is time consuming and expensive.

While everybody pretty much agrees that backups are a good thing, it is important to be aware of their limitations. A backup is a snapshot of the data at the time the backup was taken. That is, whatever data was there at the time the backup was made is what you will find in it later, nothing more and nothing less. The purpose of a backup is also somewhat limited: to restore some or all data after an unforeseen event. In some cases, a crashed hard disk for example, you need to restore all data. In other cases, such as a user who inadvertently deletes something important, only that single item needs to be restored.

For practical reasons, there are limits on how often backups are made.
If we look at mail system backups for the moment, some systems cannot be backed up while they are running, so you want to limit how often you back the system up so as not to limit its usability. And some systems are only capable of "full" backups, i.e. you must copy all data, not just the data that is new since the last backup. This puts a strain on storage systems, since sooner or later you run out of disk space and have to start deleting old backups. It also tends to be wasteful, since a large percentage of the data in a backup will be the same as in many previous backups. If you doubt this, simply take a look at your Inbox and count the mail you received before today. If you are doing daily backups, each of those mails has been copied into every single backup made since it arrived in your Inbox. Some of this has been mitigated by some systems' ability to do "incremental" backups, where only new data is backed up. But again, the main purpose of backups is to be able to restore a system to a known state, even if that state is not identical to the state of the data at the time of the unforeseen event.

Mail archiving is an entirely different beast. A bit depends upon why you are archiving, which I'll get to in a minute, but an archive is not just a snapshot of the system; it is a cumulative view of the system. Every mail entering or leaving the system is put directly into the archive and nothing is ever removed.

So why, you are probably asking yourself, can't I just reassemble all of my old backups and get the same thing? The answer is that backups are made periodically, not constantly, because their purpose is to restore the system to a known state in the event of disaster. A mail could come into the system, be quickly deleted by the recipient, and never make it into a backup. An archiving solution, on the other hand, dutifully adds the mail to the archive because it serves a different purpose.

Which leads us to why you would want to archive mail. There are three main reasons for archiving mail.
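Before getting to those reasons, the difference between periodic snapshots and a cumulative archive can be sketched in a few lines of Python. This is only a toy model, not how any real mail server or archiving product works; the class MailSystem and all of its methods are hypothetical names made up for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MailSystem:
    """Toy mail store with periodic snapshot backups and a cumulative archive.

    Hypothetical model for illustration only.
    """
    mailbox: set = field(default_factory=set)    # mail currently in the system
    backups: list = field(default_factory=list)  # one frozen snapshot per backup run
    archive: list = field(default_factory=list)  # every mail ever seen, in arrival order

    def receive(self, message_id: str) -> None:
        self.mailbox.add(message_id)
        self.archive.append(message_id)  # archived at once, on arrival

    def delete(self, message_id: str) -> None:
        self.mailbox.discard(message_id)  # the archive is never touched

    def run_backup(self) -> None:
        # A backup only captures what is in the mailbox right now.
        self.backups.append(frozenset(self.mailbox))

    def in_any_backup(self, message_id: str) -> bool:
        return any(message_id in snapshot for snapshot in self.backups)


system = MailSystem()
system.receive("status-report")
system.run_backup()                # first backup
system.receive("product-warning")  # arrives after the first backup ...
system.delete("product-warning")   # ... and is deleted before the next one
system.run_backup()                # second backup

print(system.in_any_backup("product-warning"))  # False
print("product-warning" in system.archive)      # True
```

Even with every backup ever made kept on hand, the deleted mail appears in none of them, while the archive still holds it.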
The first is disk space management. Most messaging systems limit the amount of data individual users may store in the system. Once that limit is reached, users must remove some data. But it is not always desirable to simply delete the data. Users may have mail that they access infrequently but still need to keep for years. So rather than simply increasing users' space quotas on expensive storage devices (which, by the way, makes backups even larger), an archiving solution allows them to keep this mail on less expensive disks, normally compressed to save even more space, while still giving them access to the information when needed.

The second reason is laws and regulations with funny names like SOX and HIPAA. Some companies and organizations are required to store all correspondence, including e-mail, for a given period of time. They could store backups for the prescribed time period but, as discussed above, this does not necessarily fulfill the "all correspondence" requirement.

The last reason is as an aid in legal proceedings. In the same way that courts may require a company to hand over paper documents to opposing counsel, electronic documents such as e-mail are also being requested in what is now called "e-discovery". And again, as with paper documents, the absence of a given e-mail can be used to prove that such a mail does not exist and never did, provided the archiving system can be shown to contain everything sent or received by the system. This is where an archive is superior to a backup. For example, a mail from an employee to the CEO warning about faults in a product could be deleted by both of them and never end up in a backup, so its absence from the backups proves nothing. With a complete archive, the absence of such a mail shows that it never existed.

An additional problem with backups in an e-discovery context has been demonstrated in several high-profile lawsuits. Several companies have lost lawsuits simply because they could not perform e-discovery from backups within the court-imposed time frame.
So even though they had backups going back years, even a decade, they could not load and search them fast enough to meet the deadline, and lost the suits.

In summary, archiving solutions came into being to answer the shortcomings of backups as the needs of companies and organizations changed. While there are some similarities between the two, they really provide solutions to two different problems. Backups serve the need to restore a system to a known state, whereas archives serve the need to store perhaps all data ever seen by the system for efficient retrieval later.

Of course, it should be possible to provide one piece of software capable of serving both these needs. Still, the problems solved are somewhat contradictory, so you would end up with more complex software than you really need. At the same time, not everyone needs both solutions, so there are advantages to encapsulating the functionality in separate products. This encapsulation also gives each solution more freedom to evolve as needs evolve, which is usually a good thing.

I hope this helps clear up the differences between backup and archiving and helps you understand why you may need an archiving solution. Next time, we'll talk about what you should look for in an archiving solution to help you better evaluate the alternatives.

Mark Manning