I am working on an approach to archive our website (dynamically generated) periodically (say, every month) and keep it versioned, so that I can go back and pull a page as it was at a certain point in time.

My initial approach is to crawl the site recursively and commit the result to a Subversion repository, so that I can use Subversion's history and export features.

Is there a better solution that uses as little space as possible? I am also not sure how long a Subversion commit of an entire site would take, so a faster solution is desired as well.
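The crawl-and-commit approach described above could look something like this shell sketch. The URL, working-copy path, and schedule are assumptions, and the directory must already be a checked-out Subversion working copy:

```shell
#!/bin/sh
# Sketch of the crawl-and-commit idea from the question.
# SITE_URL and WC_DIR are hypothetical; WC_DIR must already be
# a checked-out Subversion working copy.
SITE_URL="${SITE_URL:-http://example.com/}"
WC_DIR="${WC_DIR:-$HOME/site-archive}"
MSG="Monthly snapshot $(date +%Y-%m-%d)"

if [ -d "$WC_DIR/.svn" ] && command -v wget >/dev/null 2>&1; then
    # Mirror the rendered site into the working copy.
    wget --mirror --no-parent --convert-links \
         --directory-prefix="$WC_DIR" "$SITE_URL"
    # Add anything new and commit with a dated message.
    svn add --force --quiet "$WC_DIR"
    svn commit -m "$MSG" "$WC_DIR"
else
    echo "not a working copy or wget missing; would commit as: $MSG"
fi
```

A monthly cron entry such as `0 3 1 * * /path/to/snapshot.sh` (a hypothetical path) would run it on the first of each month.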

Hennes
    Is there anything wrong with just creating a gzipped tarball of the site in regular intervals? Space shouldn't be the primary concern when taking backups. – slhck Aug 23 '13 at 14:51
    Why don't you just put the site itself in a version control system and when you make a change, update the deployed site from your repository? – Oliver Salzburg Aug 23 '13 at 14:51
  • @OliverSalzburg I should have mentioned in the question. The site is dynamically generated. – Balaji Natarajan Aug 23 '13 at 15:03
  • If you want a ready-to-view archive, I would suggest [wget](http://www.gnu.org/software/wget/) combined with the suggestion of tarballing the output. – Doktoro Reichard Aug 23 '13 at 15:03
  • Dynamically generated static pages? If so, then "dynamically generated" doesn't really matter: a gzipped tarball per slhck and a cron job. If dynamic at runtime, then a database snapshot and standard source control. – ToddB Aug 23 '13 at 15:05
  • Subversion (or any other revision control tool) on the static source files is probably the best solution. Add to that the configuration files for the software. The actual generated content is not important; you can always recreate it. And SVN is perfect for seeing what changed when. So I guess you already answered your own question. – Hennes Sep 13 '13 at 22:24
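The gzipped-tarball-plus-cron suggestion from the comments can be sketched as follows. All paths are assumptions; the script falls back to a small throwaway directory so it runs anywhere, and it verifies the archive after writing it:

```shell
#!/bin/sh
# Gzipped-tarball approach suggested in the comments; all paths are assumptions.
SITE_DIR="${SITE_DIR:-/var/www/site}"
BACKUP_DIR="${BACKUP_DIR:-$HOME/site-backups}"

# Fall back to a small demo tree so the sketch runs anywhere.
if [ ! -d "$SITE_DIR" ]; then
    SITE_DIR=$(mktemp -d)
    echo "<html>demo</html>" > "$SITE_DIR/index.html"
fi

STAMP=$(date +%Y-%m-%d)
mkdir -p "$BACKUP_DIR"
ARCHIVE="$BACKUP_DIR/site-$STAMP.tar.gz"

# -C changes into the parent directory so archive paths stay relative.
tar -czf "$ARCHIVE" -C "$(dirname "$SITE_DIR")" "$(basename "$SITE_DIR")"

# Verify the archive is readable -- a backup you never test is no backup.
tar -tzf "$ARCHIVE" >/dev/null && echo "archive OK: $ARCHIVE"
```

A cron entry like `0 3 1 * * /path/to/tarball.sh` (a hypothetical path) would produce one dated tarball per month.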

1 Answer

Use 7-Zip on a cron job to periodically archive the site recursively with a date-time stamp in the file name, then test the archive and move it to a fail-over cluster. Always test your backups periodically, or you will end up with corrupt data and not find out until it's critical. I believe 7-Zip has a test switch for this as well. We've been using this approach for 7 years with mission-critical data, zipping 16 times a day, with no failures whatsoever.
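The answer's steps might be sketched like this. The paths and the fail-over destination are assumptions, `7z` must be installed (the p7zip package on most Linux distributions), and `7z t` is the test command the answer alludes to:

```shell
#!/bin/sh
# Sketch of the answer's 7-Zip rotation. All paths are assumptions.
SITE_DIR="${SITE_DIR:-/var/www/site}"
BACKUP_DIR="${BACKUP_DIR:-$HOME/7z-backups}"
STAMP=$(date +%Y-%m-%d_%H%M)          # date-time stamp for the file name
ARCHIVE="$BACKUP_DIR/site-$STAMP.7z"

mkdir -p "$BACKUP_DIR"
if command -v 7z >/dev/null 2>&1 && [ -d "$SITE_DIR" ]; then
    7z a "$ARCHIVE" "$SITE_DIR" >/dev/null   # directories are archived recursively
    7z t "$ARCHIVE" >/dev/null && echo "verified $ARCHIVE"
    # After verification, copy to the fail-over cluster, e.g.:
    # scp "$ARCHIVE" backup-host:/srv/failover/   # hypothetical host and path
else
    echo "7z missing or $SITE_DIR absent; nothing archived"
fi
```

Running this from cron 16 times a day, as the answer describes, is just a matter of the crontab schedule (e.g. every 90 minutes).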

jonsca
user254060