File Transfer Protocol (FTP)
FTP Protocol | |
---|---|
Purpose | Transfer files between hosts |
Standard | RFC 959 |
Runs atop | TCP/IP |
Port numbers | 20 (data), 21 (control) |
Libraries | ftplib, urllib2 (urllib.request in Python 3), pexpect |
FTP overview
- FTP was once among the most widely used protocols on the Internet.
- FTP service allows users to transfer files between hosts in a TCP-based network.
- Typically, users authenticate to FTP servers using a combination of a username and password.
- However, some sites provide the ability to authenticate anonymously. In this scenario, a user enters the username “anonymous” and submits an email address in lieu of a password.
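The two login styles can be sketched with Python's ftplib, which the scripting section below introduces. This is a minimal sketch, not a definitive implementation: the helper name `login_to`, the placeholder e-mail address and the example host and credentials are hypothetical. `FTP.login()` sends the USER and PASS commands over an already-open connection and returns the server's final reply.

```python
from ftplib import FTP
from typing import Optional


def login_to(ftp: FTP, user: Optional[str] = None, password: str = "",
             email: str = "guest@example.com") -> str:
    """Log in on an already-connected FTP object and return the server's reply.

    If no user name is given, use the anonymous convention: the user name
    "anonymous" and an e-mail address (a placeholder here) in place of a password.
    """
    if user is None:
        return ftp.login(user="anonymous", passwd=email)
    return ftp.login(user=user, passwd=password)


# Usage (needs network access; the host name is a placeholder):
# with FTP("ftp.example.com", timeout=30) as ftp:
#     print(login_to(ftp))                  # anonymous login
# For a named account instead: login_to(ftp, "alice", "s3cret")
```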
Why NOT to use FTP?
- FTP is an insecure protocol: files, usernames, and passwords are all sent completely in the clear (unencrypted).
- FTP is stateful. A client makes a connection, chooses a working directory, and performs several operations over that same connection, so the server has the overhead of remembering per-session state such as the current working directory.
- Syncing with FTP is inefficient: every file is copied over again, even if it is neither new nor changed.
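To make the first point concrete, here is a small sketch of how a command looks on the control channel: per RFC 959 it is a plain-text line ending in CRLF, with no encryption. The function name `control_line`, the account name and the password are hypothetical, and UTF-8 text is assumed.

```python
CRLF = "\r\n"


def control_line(command: str, argument: str = "") -> bytes:
    """Encode one FTP command as it travels on the control channel:
    a plain-text line terminated by CRLF. Nothing here is encrypted."""
    line = f"{command} {argument}" if argument else command
    return (line + CRLF).encode("utf-8")


# What an eavesdropper on the network path would capture during a login
# (made-up credentials).
captured = control_line("USER", "alice") + control_line("PASS", "s3cret")
print(captured)  # b'USER alice\r\nPASS s3cret\r\n'
```

Anyone who can see the traffic reads the password directly from these bytes, which is why the alternatives listed next use SSL/TLS or SSH.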
What to use instead?
- For file download, HTTP has become the standard protocol on today’s Internet.
- For secure connections, use protocols protected by SSL/TLS, such as FTPS or HTTPS.
- For syncing files between hosts, rsync or rdist are more efficient. They copy only files that are new or changed.
- SFTP (which runs over SSH) is a secure alternative for transferring files. It is especially useful when the user needs full file-system access.
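If FTP itself must be kept, ftplib's `FTP_TLS` class speaks FTPS. The sketch below is a minimal example under stated assumptions: the helper name `secure_login`, the host and the credentials are hypothetical, and the server must support explicit TLS (AUTH TLS). `FTP_TLS.login()` secures the control channel before sending the password, and `prot_p()` asks for an encrypted data channel as well.

```python
from ftplib import FTP_TLS


def secure_login(ftps: FTP_TLS, user: str, password: str) -> str:
    """Log in over explicit FTPS and encrypt the data channel too.

    FTP_TLS.login() sends AUTH TLS before USER/PASS, so the credentials are
    not sent in the clear; prot_p() then requests a protected (TLS) data
    channel for listings and file transfers.
    """
    reply = ftps.login(user=user, passwd=password)
    ftps.prot_p()
    return reply


# Usage (placeholder host and credentials; needs an FTPS-capable server):
# with FTP_TLS("ftp.example.com", timeout=30) as ftps:
#     secure_login(ftps, "alice", "s3cret")
#     print(ftps.nlst())
```

Without the `prot_p()` call only the control channel is encrypted, and file contents would still travel in the clear.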
FTP Connection details
Communication Channels
- FTP is unusual because, by default, it actually uses two TCP connections during operation.
- One connection is the control channel (usually port 21 on the server), which carries commands and the resulting acknowledgments or error codes.
- The second connection is the data channel (usually port 20 on the server in active mode), which is used for transmitting files or other blocks of information such as directory listings. Like any TCP connection it is full duplex, but each FTP transfer sends data in only one direction, and a new data connection is normally opened for each transfer.
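Every reply on the control channel begins with a three-digit code, and RFC 959 defines the meaning of the first digit. The following is a small sketch of a classifier for those codes; the names `classify_reply` and `REPLY_CLASSES` are hypothetical.

```python
import re
from typing import Tuple

# RFC 959: the first digit of a reply code says how the command fared.
REPLY_CLASSES = {
    "1": "positive preliminary",   # e.g. 150: about to open data connection
    "2": "positive completion",    # e.g. 226: closing data connection, transfer done
    "3": "positive intermediate",  # e.g. 331: user name okay, need password
    "4": "transient negative",     # e.g. 425: can't open data connection (may retry)
    "5": "permanent negative",     # e.g. 530: not logged in
}

# Three digits at the start of the line, followed by a space, a hyphen
# (first line of a multi-line reply) or the end of the line.
_REPLY_CODE = re.compile(r"([1-5])[0-9]{2}(?=[ -]|$)")


def classify_reply(line: str) -> Tuple[int, str]:
    """Return (code, class) for one reply line read from the control channel."""
    match = _REPLY_CODE.match(line)
    if match is None:
        raise ValueError(f"not an FTP reply line: {line!r}")
    return int(match.group(0)), REPLY_CLASSES[match.group(1)]


print(classify_reply("331 Please specify the password."))  # (331, 'positive intermediate')
```

ftplib performs this check itself when it reads a reply, raising `ftplib.error_temp` for 4xx replies and `ftplib.error_perm` for 5xx replies.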
Connection process
- The FTP client establishes the control connection by connecting to the FTP port (21) on the server.
- The client authenticates itself, usually with a username and password.
- The client changes directory on the server to where it wants to deposit or retrieve files.
- The client begins listening on a new port for the data connection and then tells the server about that port (with the PORT command).
- The server connects to the port that the client has opened.
- The file is transmitted.
- The data connection is closed.
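In the step where the client tells the server about its listening port, it sends a PORT command whose argument is six comma-separated numbers (RFC 959): the four octets of the client's IPv4 address, then the port split into a high byte and a low byte. A minimal sketch of building that argument, with a hypothetical function name and example address:

```python
import ipaddress


def port_argument(ip: str, port: int) -> str:
    """Build the PORT argument h1,h2,h3,h4,p1,p2 for active-mode FTP,
    where port == p1 * 256 + p2."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    high, low = divmod(port, 256)     # the port travels as two 8-bit numbers
    return ",".join(str(addr).split(".") + [str(high), str(low)])


print("PORT " + port_argument("192.168.1.10", 50000))  # PORT 192,168,1,10,195,80
```

PORT can only describe IPv4 addresses; RFC 2428 adds the EPRT command for IPv6.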
Having the client open a port for the server to connect to worked well in the early days, but firewalls and NAT have made it more complicated, because they often block or cannot correctly forward connections coming in to the client.
FTP therefore also has a mode called ‘passive mode’, in which the server opens an extra port (announced in its reply to the PASV command) for the client to connect to, after which the data transfer proceeds. Passive mode is the default in most FTP implementations today, including Python’s ftplib.
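In passive mode the server announces its port in a 227 reply to PASV that carries the same six-number encoding used by PORT. The sketch below extracts the host and port from such a reply. The function name is hypothetical, and the example reply follows the form shown in RFC 959.

```python
import re
from typing import Tuple

# Six comma-separated numbers: h1,h2,h3,h4,p1,p2
_PASV_NUMBERS = re.compile(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)")


def parse_pasv_reply(reply: str) -> Tuple[str, int]:
    """Return the (host, port) the server opened, from its '227' reply to PASV."""
    if not reply.startswith("227"):
        raise ValueError(f"unexpected reply to PASV: {reply!r}")
    match = _PASV_NUMBERS.search(reply)
    if match is None:
        raise ValueError(f"no address in reply: {reply!r}")
    numbers = [int(n) for n in match.groups()]
    host = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]
    return host, port


print(parse_pasv_reply("227 Entering Passive Mode (192,168,1,20,195,80)."))
# ('192.168.1.20', 50000)
```

ftplib handles this exchange internally. Calling `ftp.set_pasv(False)` on an `FTP` object switches it back to active (PORT) mode if a server requires it.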
Scripting FTP in Python
- The Python standard library module ‘ftplib’ is used to script FTP in Python.
- ftplib not only handles the details of establishing connections but also provides convenient ways to automate common commands.
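As a sketch of how ftplib automates the connection process described earlier, the hypothetical helper below changes directory and downloads one file. The directory, file and host names in the usage comment are placeholders. `retrbinary()` switches to binary mode (TYPE I), sends RETR, opens the data connection, and passes each received block to the callback.

```python
from ftplib import FTP


def download(ftp: FTP, directory: str, filename: str, local_path: str) -> str:
    """Change to `directory` on the server and save `filename` to `local_path`.

    Returns the server's final reply (normally a 226 "transfer complete").
    """
    ftp.cwd(directory)  # CWD is sent on the control channel
    with open(local_path, "wb") as out:
        # Each block arriving on the data channel is written to the local file.
        return ftp.retrbinary(f"RETR {filename}", out.write)


# Usage (needs network access; host, directory and file are placeholders):
# with FTP("ftp.example.com", timeout=30) as ftp:
#     ftp.login()                                # anonymous login
#     print(download(ftp, "/pub", "README", "README"))
```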
Scanning Directories
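- The nlst() and dir() methods in ftplib provide ways to explore directories on the server.
- nlst() returns a list of the names of all the files and directories inside the given (or current) directory.
- dir() produces a directory listing in a server-defined format (the output of the LIST command), which typically contains a file name, size, modification date, and file type. It does not return the listing: each line is passed to a callback, which by default prints it to standard output.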
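Because dir() hands lines to a callback instead of returning them, capturing its output means passing a function such as `list.append`. A minimal sketch with a hypothetical helper name, run against an already logged-in `FTP` object:

```python
from ftplib import FTP
from typing import List, Tuple


def list_directory(ftp: FTP, path: str = "") -> Tuple[List[str], List[str]]:
    """Return (names, long_listing) for `path`, or the current directory."""
    args = (path,) if path else ()
    names = ftp.nlst(*args)           # NLST: bare names, returned as a list
    lines: List[str] = []
    ftp.dir(*args, lines.append)      # LIST: each line goes to the callback
    return names, lines


# Usage (assumes `ftp` is a connected, logged-in ftplib.FTP object):
# names, lines = list_directory(ftp, "/pub")
# for line in lines:
#     print(line)
```

Because the LIST format varies between servers, parsing `lines` reliably is difficult. If the server supports the MLSD command, ftplib's `mlsd()` method (Python 3.3+) yields each entry's name together with a dictionary of machine-readable facts.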