At work, I’m doing a bunch of data acquisition on various machines outside the firewall. I think I’ve hit on an absolutely great combination of Python technologies to do this in a really manageable and scalable manner.
The design constraints:
- No long-running processes. These are heavily loaded boxes. The data acquisition will run via scheduled cron jobs.
- No new ports open. I can’t run MySQL on these boxes.
- No drilling back from these servers into the corporate network, so I can’t run MySQL inside the firewall and connect to it from outside.
- Each remote box will run a small Python app on a scheduled basis, writing its data to a single file.
- A "gatherer" system inside the firewall will retrieve the files from all remote machines on an hourly basis.
- On acquisition, the remote files will be removed.
- The gatherer process will then merge the remote data into a master MySQL database for reporting.
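The remote-side piece of the workflow above can be sketched in a few lines. This is a minimal illustration using the stdlib sqlite3 module rather than PySqlite, and the `sample` table with a `load_avg` metric is a hypothetical schema, not the actual app:

```python
import sqlite3
import time

def record_sample(db_path, load_avg):
    """Append one sample to the on-the-fly SQLite file on a remote box.

    The schema and the load_avg metric are illustrative; the real app
    would record whatever the acquisition job measures.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Create the table on first run -- the file is generated on the fly.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sample ("
            "  recorded_at REAL NOT NULL,"
            "  load_avg REAL NOT NULL)"
        )
        conn.execute(
            "INSERT INTO sample (recorded_at, load_avg) VALUES (?, ?)",
            (time.time(), load_avg),
        )
        conn.commit()
    finally:
        conn.close()
```

Because each run opens the file, appends, and exits, there is no long-running process: cron invokes the script on whatever schedule the box can afford.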
The solution:
- PySqlite: provides access to an on-the-fly generated SQLite database file on the remote machines.
- SQLObject: can talk to both PySqlite and MySQL. I simply define the object model using its declarative syntax, and it will happily generate the tables for me in both SQLite and MySQL. Writing the merge process was the trickiest bit, but it only took a few hours to write and test.
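The merge step can be sketched like this. It's a minimal version using the stdlib sqlite3 module, with a second SQLite connection standing in for the master MySQL database (the real code used SQLObject against both backends), and the `sample` table here is a hypothetical schema:

```python
import sqlite3

def merge_remote_file(master_conn, remote_path, hostname):
    """Copy all rows from a retrieved remote SQLite file into the master
    database, tagging each row with the host it came from.

    Table and column names are illustrative, not the post's actual schema.
    """
    master_conn.execute(
        "CREATE TABLE IF NOT EXISTS sample ("
        "  hostname TEXT NOT NULL,"
        "  recorded_at REAL NOT NULL,"
        "  load_avg REAL NOT NULL)"
    )
    # Read everything out of the retrieved per-host file.
    remote = sqlite3.connect(remote_path)
    try:
        rows = remote.execute(
            "SELECT recorded_at, load_avg FROM sample"
        ).fetchall()
    finally:
        remote.close()
    # Insert into the master, adding the hostname so rows stay attributable.
    master_conn.executemany(
        "INSERT INTO sample (hostname, recorded_at, load_avg)"
        " VALUES (?, ?, ?)",
        [(hostname, t, v) for (t, v) in rows],
    )
    master_conn.commit()
    return len(rows)
```

Since the remote file is deleted after retrieval, every row in it is new by construction, so the merge is a straight copy rather than a reconciliation.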
This was an extremely fast development path. I was all the way from concept to deployment in just three days, and I didn’t even know anything about SQLObject or Sqlite at the start.
Contrast that productivity with Java, where a similar project would take a couple of weeks at minimum, and you can see why Python has become the secret weapon in my developer toolbox.