I have often used FTP for automated batch transport of files between servers in a corporate environment. I just wanted to share some lessons learned:
1. A good file naming convention is important. Incorporate at least the following elements:
– data source identity. If the files come from a CRM logs DB, then perhaps “CRMLOGDB”.
– file sequence number. The sequence number is important because it lets the destination detect missing files when the received FTP files are evaluated. The sequence can rotate after a fixed limit, depending on field length.
– timestamp at source. This is not essential, but it helps make the file name unique, and if the OS filesystems at source and destination allow long file names, it is good to put it in the filename. The international date format YYYY-MM-DD (ISO 8601) is a good way to format dates. I particularly like this format placed at the start of the filename, because files sorted by name then sort chronologically. Very useful.
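As a sketch, the convention above (date first, then source identity, then a fixed-width sequence number) could be generated like this. The field width, the `.dat` extension, and the underscore separators are illustrative assumptions, not part of any standard:

```python
from datetime import datetime, timezone

def build_filename(source: str, seq: int, when: datetime, seq_width: int = 6) -> str:
    """Build a transfer filename: <date>_<source>_<sequence>.dat.
    The date prefix makes an alphabetical sort chronological; the
    zero-padded sequence number lets the receiver detect gaps.  The
    sequence rotates once it exceeds the field width."""
    return f"{when:%Y-%m-%d}_{source}_{seq % 10**seq_width:0{seq_width}d}.dat"

# example: a file from the hypothetical "CRMLOGDB" source
name = build_filename("CRMLOGDB", 42, datetime(2024, 3, 5, tzinfo=timezone.utc))
print(name)  # 2024-03-05_CRMLOGDB_000042.dat
```

Zero-padding matters here: without it, sequence 100 would sort before sequence 42 in a name-ordered listing.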
2. Use a temp file name while the file is in transit. During the FTP GET (in a pull) or FTP PUT (in a push), the destination filename can be specified as a temp name, or a name with a temp extension or prefix. After the FTP GET or FTP PUT, the transfer script renames the file to its original name. To achieve this, the transfer script should ideally iterate over the files individually and not simply use an FTP MPUT or FTP MGET. With this in place, it is easy to identify incomplete transfers at the destination.
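The receiving side of this pattern can be sketched as below; the `.part` suffix is one common convention, not a requirement. Over FTP itself, the equivalent would be a STOR to the temp name followed by a server-side rename (ftplib exposes this as `FTP.rename`):

```python
import os
import tempfile

def receive_atomically(payload: bytes, final_path: str) -> None:
    """Write incoming data under a temporary '.part' name and rename it
    to the final name only once the write is complete.  Any file still
    ending in '.part' is therefore a visibly incomplete transfer."""
    tmp_path = final_path + ".part"
    with open(tmp_path, "wb") as f:
        f.write(payload)
    os.rename(tmp_path, final_path)  # atomic on POSIX filesystems

# demo: after a completed transfer the final name exists, the temp name does not
inbox = tempfile.mkdtemp()
target = os.path.join(inbox, "2024-03-05_CRMLOGDB_000042.dat")
receive_atomically(b"record data", target)
```

A downstream job can then safely process every file in the inbox that does not end in `.part`.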
3. FTP PUT (push) and FTP GET (pull) are equally plausible choices, but where the choice is freely available, I am partial to setting up FTP PUT (push). A push lets the sending system send files as soon as they are available for transfer; with a pull setup, the receiving system has to poll the sending system.
4. Obviously, a way of tracking which files have already been transferred is necessary. Some convention has to be agreed between the sending and receiving parties. For example, it could be agreed that once a file is transferred it is renamed or moved to another directory, or deleted on the sending system, or tracked another way such as via a database table.
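The move-to-another-directory convention is the simplest of these to script. A minimal sketch, assuming a "sent" archive directory alongside the outbox (both names are illustrative):

```python
import os
import shutil
import tempfile

def mark_transferred(path: str, sent_dir: str) -> str:
    """After a successful transfer, move the file into a 'sent'
    directory so the next outbox scan never picks it up again.
    Returns the archived path."""
    os.makedirs(sent_dir, exist_ok=True)
    dest = os.path.join(sent_dir, os.path.basename(path))
    shutil.move(path, dest)
    return dest

# demo: create a dummy outbox file, then archive it after "transfer"
outbox = tempfile.mkdtemp()
src = os.path.join(outbox, "CRMLOGDB_000042.dat")
open(src, "w").close()
archived = mark_transferred(src, os.path.join(outbox, "sent"))
```

The same move doubles as an audit trail: the sent directory shows exactly what has gone out, which a delete-after-send convention cannot.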
5. Some method for checking that the data within a transferred file is complete should be incorporated. At file level there can be a file sequence number, and at record level there can also be a record sequence number. It is also possible to use a trailer record, or an end-of-file character or marker sequence. The record sequence number gives greater assurance of completeness and integrity, of course: data corrupted in the middle of a transfer could still pass an end-of-file marker check.
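A receiver-side check combining both ideas might look like this. The `seq|payload` record layout and the `TRL|<count>` trailer format are illustrative assumptions; real interfaces would pin these down in the interface specification:

```python
def verify_file(lines: list[str]) -> bool:
    """Check that record sequence numbers run 1..N with no gaps and
    that the trailer record 'TRL|<count>' matches the record count."""
    *records, trailer = lines
    tag, _, count = trailer.partition("|")
    if tag != "TRL" or not count.isdigit() or int(count) != len(records):
        return False
    # each data record: <seq>|<payload>
    return all(int(rec.split("|", 1)[0]) == i
               for i, rec in enumerate(records, start=1))

good = ["1|alpha", "2|beta", "3|gamma", "TRL|3"]
gap  = ["1|alpha", "3|gamma", "TRL|2"]   # record 2 lost in transit
```

Note that the `gap` file has a self-consistent trailer count, so a trailer-only check would pass it; only the per-record sequence check catches the missing record.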
6. There are also, of course, timing considerations: how often does the sending system produce files? At what time does the sending or pulling system start transfers, and how often?
7. What does the sending system do when there is no file to send? A sending system can create a zero-content file anyway, so that the receiving system is assured the sending system is up and running in good condition.
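A sketch of such a heartbeat file, assuming the dated-filename convention from point 1; the `HEARTBEAT` token is an illustrative naming choice:

```python
import os
import tempfile
from datetime import datetime, timezone

def write_heartbeat(outbox: str, source: str, when: datetime) -> str:
    """When there is nothing to send, drop a dated zero-byte marker so
    the receiver can tell 'no data today' apart from 'sender is down'."""
    name = f"{when:%Y-%m-%d}_{source}_HEARTBEAT.dat"
    path = os.path.join(outbox, name)
    open(path, "wb").close()  # zero-byte file
    return path

beat = write_heartbeat(tempfile.mkdtemp(), "CRMLOGDB",
                       datetime(2024, 3, 5, tzinfo=timezone.utc))
```

On the receiving side, the absence of both data files and a heartbeat for a cycle becomes an actionable alert rather than an ambiguous silence.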
8. Writing a simple interface specification document between the two interfacing systems is useful, particularly when the sending and receiving parties are from different teams or groups. This way it is clear to both parties what conventions and processes are to be followed.
9. For text data, the systems involved should take advantage of FTP ASCII mode transfer, which converts line endings to each side's native convention.
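What ASCII mode actually does on the wire can be mimicked in a few lines: lines travel as CRLF regardless of the local convention, and each end converts. This sketch models both directions:

```python
def to_netascii(text: str) -> bytes:
    """Sender side of FTP ASCII mode: normalize line endings, then
    transmit every line terminated with CRLF on the wire."""
    return text.replace("\r\n", "\n").replace("\n", "\r\n").encode("ascii")

def from_netascii(data: bytes) -> str:
    """Receiver side: fold wire CRLF back to the native Unix '\n'
    convention (a Windows receiver would keep CRLF instead)."""
    return data.decode("ascii").replace("\r\n", "\n")
```

This is why binary mode is the wrong choice for text moving between Unix and Windows (stray `^M` characters or run-together lines), and ASCII mode is the wrong choice for binary files (silent corruption of any byte sequence that looks like a line ending).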
In the end, I think it all comes down to writing a good FTP transfer script. Even Windows DOS batch scripts provide sufficient facility to make good and safe FTP transfer scripts. FTP is still useful despite its age when used properly. It is ubiquitous, easy and simple, but also easy to mishandle and set up improperly. The basic issue, as always, is to ensure that FTP transfers are complete and have the right control measures in place.
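For the Windows batch case, the usual approach is to generate a command file and run `ftp -s:commands.txt`. A sketch that folds in the temp-name-then-rename convention from point 2 (host, user, and password here are placeholders, and plain-text credentials in a script file are one of FTP's real weaknesses):

```python
def build_ftp_script(host: str, user: str, password: str,
                     files: list[str]) -> str:
    """Emit the command file consumed by 'ftp -s:commands.txt' on
    Windows.  Each file is pushed under a '.part' temp name and then
    renamed server-side, so a crashed transfer never leaves a file
    with its final name."""
    lines = [f"open {host}", user, password, "binary"]
    for name in files:
        lines.append(f"put {name} {name}.part")
        lines.append(f"rename {name}.part {name}")
    lines.append("bye")
    return "\n".join(lines) + "\n"

script = build_ftp_script("ftp.example.com", "batchuser", "secret",
                          ["2024-03-05_CRMLOGDB_000042.dat"])
```

Generating the command file from a directory listing at run time, rather than hand-maintaining it, is what turns this from a one-off into a reliable batch interface.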