As discussed previously, files may contain many different types of data, such as images, videos, documents, text files or spreadsheets. Even applications are files. This raises the question of how the operating system recognises a chunk of data as an image or as a text file.
File Types
The answer is that each file has a type, and one common technique for implementing file types is to include the type as part of the file name. The name is split into two parts, a name and an extension, usually separated by a period: for example, resume.docx or server.c.
In this way, both the user and the operating system can tell from the name alone what type a file is. Most operating systems allow users to specify a file name as a sequence of characters followed by a period and terminated by an extension made up of additional characters.
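As a minimal sketch, assuming Python and its standard library, the split described above can be done with `os.path.splitext`, which separates a name at its last period. The helper `split_name` is hypothetical, introduced only for illustration; it also lowercases the extension so that REPORT.DOCX and report.docx are treated alike.

```python
import os.path


def split_name(filename: str) -> tuple[str, str]:
    """Split a file name into (name, extension) at the last period.

    The extension is returned without its leading period and in lowercase,
    and is "" when the name has no extension.
    """
    name, ext = os.path.splitext(filename)
    return name, ext.lstrip(".").lower()


for f in ["resume.docx", "server.c", "archive.tar.gz", "Makefile", ".bashrc"]:
    print(f, "->", split_name(f))
```

Because only the last period counts, archive.tar.gz yields the extension gz, and a name such as .bashrc, whose only period is the leading one, is treated as having no extension at all.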
The extension indicates not only the type of the file but also the operations that can be performed on it; for example, on some systems only a file with a .com, .exe, or .sh extension can be executed.
Application programs also use extensions to indicate the file types they are interested in. For example, Java compilers expect source files to have a .java extension, and the Microsoft Word word processor expects its files to end with a .doc or .docx extension.
Common file types
File Type | Usual Extension | Function |
executable | exe, com, bin or none | ready-to-run machine-language program |
object | obj, o | compiled, machine language, not linked |
source code | c, cc, java, pl, asm | source code in various languages |
batch | bat, sh | commands to the command interpreter |
markup | xml, html, tex | textual data, documents |
word processor | xml, rtf, docx | various word-processor formats |
library | lib, a, so, dll | libraries of routines for programmers |
print or view | gif, pdf, jpg | ASCII or binary file in a format for printing or viewing |
archive | rar, zip, tar | related files grouped into one file, sometimes compressed, for archiving or storage |
multimedia | mpeg, mov, mp3, mp4, avi | binary file containing audio or A/V information |
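The table above can also be read in reverse, from extension to type. The following Python sketch encodes a subset of the table and looks a file name up by its extension; the names `FILE_TYPES`, `build_index` and `guess_types` are hypothetical. It also exposes a weakness of the approach: xml appears under both markup and word processor, so the extension alone cannot always settle the type.

```python
from collections import defaultdict

# A subset of the "Common file types" table: type -> usual extensions.
FILE_TYPES: dict[str, list[str]] = {
    "executable": ["exe", "com", "bin"],
    "object": ["obj", "o"],
    "source code": ["c", "cc", "java", "asm"],
    "batch": ["bat", "sh"],
    "markup": ["xml", "html", "tex"],
    "word processor": ["xml", "rtf", "docx"],
    "library": ["lib", "a", "so", "dll"],
    "print or view": ["gif", "pdf", "jpg"],
    "archive": ["rar", "zip", "tar"],
    "multimedia": ["mpeg", "mov", "mp3", "mp4", "avi"],
}


def build_index(types: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the table so each extension maps to every type that uses it."""
    index: dict[str, list[str]] = defaultdict(list)
    for file_type, extensions in types.items():
        for ext in extensions:
            index[ext].append(file_type)
    return dict(index)


EXT_INDEX = build_index(FILE_TYPES)


def guess_types(filename: str) -> list[str]:
    """Return candidate types for a file name based only on its extension."""
    _, dot, ext = filename.rpartition(".")
    if not dot:
        return []  # no extension: the name alone gives no type information
    return EXT_INDEX.get(ext.lower(), [])


print(guess_types("resume.docx"))  # ['word processor']
print(guess_types("notes.xml"))    # ['markup', 'word processor']
```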
File Signatures/Magic Numbers
Files, however, do not always have an extension. At times, a user may even intentionally rename a file and give it the wrong extension, in which case the extension does not reflect the actual file format. Even then, the system can try a number of other techniques to determine the file type, so that it can open the file in the most appropriate program.
Magic numbers, also called file signatures, are the first few bytes of a file that are characteristic of a particular file type and, as a result, provide information about the data the file contains. These few bytes of numerical and text values at the beginning of a file can be used by the system to “differentiate between and recognise different file formats/types” without relying on a file extension.
For example, GIF images always begin with the ASCII representation of either GIF87a or GIF89a, depending on the version of the standard they adhere to.
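The following Python sketch shows the idea behind signature-based detection, in the spirit of (but far simpler than) the Unix `file` utility: read the first few bytes of a file and compare them against a table of known signatures, ignoring the file name entirely. The signature list is a small illustrative subset, and the names `SIGNATURES`, `identify` and `identify_file` are hypothetical.

```python
import os
import tempfile
from typing import Optional

# Leading bytes of a few well-known formats (illustrative subset only).
SIGNATURES: list[tuple[bytes, str]] = [
    (b"GIF87a", "GIF image (87a)"),
    (b"GIF89a", "GIF image (89a)"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x7fELF", "ELF executable or object file"),
]


def identify(header: bytes) -> Optional[str]:
    """Return the format whose signature matches the start of header, if any."""
    for magic, description in SIGNATURES:
        if header.startswith(magic):
            return description
    return None


def identify_file(path: str) -> Optional[str]:
    """Identify a file from its first bytes only; its name is never consulted."""
    longest = max(len(magic) for magic, _ in SIGNATURES)
    with open(path, "rb") as f:
        return identify(f.read(longest))


# Usage: a GIF saved under a misleading .txt name is still recognised.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "holiday.txt")
    with open(path, "wb") as f:
        f.write(b"GIF89a" + bytes(10))
    print(identify_file(path))  # GIF image (89a)
```

Reading only as many bytes as the longest signature keeps the check cheap even for very large files. Note that a match identifies the container format only: a .docx document, for example, is internally a ZIP archive and would be reported as one here.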
This sequence of bytes is essential for a file to be opened correctly, and changing it may render the file unusable, because most tools will no longer recognise the file as being in the format they expect. Magic numbers are typically not visible to the user, but they can be viewed and edited with a hex editor (a program that allows manipulation of the raw binary data that makes up a computer file).
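To see a file signature roughly the way a hex editor displays it, the Python sketch below formats bytes as an offset, the bytes in hexadecimal, and their printable ASCII characters, with non-printable bytes shown as dots. The function names `hex_dump` and `dump_header` are hypothetical, and the layout only approximates a typical hex editor's view; unlike an editor, it is read-only.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex and printable-ASCII columns."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII (32..126) is shown as-is; anything else as ".".
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)


def dump_header(path: str, length: int = 16) -> str:
    """Read the first `length` bytes of a file and return them as a hex dump."""
    with open(path, "rb") as f:
        return hex_dump(f.read(length))


print(hex_dump(b"GIF89a\x01\x00\x01\x00\x80\x00\x00"))
```

For this GIF header, the hex column begins 47 49 46 38 39 61, which are the ASCII codes for G, I, F, 8, 9 and a; the remaining bytes are non-printable and appear as dots in the ASCII column.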