nel, which had to interrupt the process anyway to handle the clock interrupt, very little additional system overhead is required.

However, some operating systems, most notably Linux 2.0 (and earlier), do not provide a 'profil()' system call. On such a system, arrangements are made for the kernel to periodically deliver a signal to the process (typically via 'setitimer()'), which then performs the same operation of examining the program counter and incrementing a slot in the memory array. Since this method requires a signal to be delivered to user space every time a sample is taken, it incurs considerably more overhead than kernel-based profiling. Because of the added delay in delivering the signal, it is also less accurate.

A special startup routine allocates memory for the histogram and either calls 'profil()' or sets up a clock signal handler. This routine ('monstartup') can be invoked in several ways. On Linux systems, a special profiling startup file 'gcrt0.o', which invokes 'monstartup' before 'main', is used instead of the default 'crt0.o'. Use of this special startup file is one of the effects of using 'gcc ... -pg' to link. On SPARC systems, no special startup files are used. Rather, the 'mcount' routine, when it is invoked for the first time (typically when 'main' is called), calls 'monstartup'.

If the compiler's '-a' option was used, basic-block counting is also enabled. Each object file is then compiled with a static array of counts, initially zero. In the executable code, every time a new basic-block begins (e.g., when an 'if' statement appears), an extra instruction is inserted to increment the corresponding count in the array. At compile time, a paired array is constructed that records the starting address of each basic-block. Taken together, the two arrays record the starting address of every basic-block, along with the number of times it was executed.
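
As a rough illustration of the paired arrays used for basic-block counting, the following C sketch instruments a small function by hand. The names 'bb_counts', 'bb_labels', 'BB()', 'bb_count', 'bb_label' and 'abs_value' are hypothetical and chosen for this example only. Real instrumentation is emitted by the compiler and records block start addresses; the string labels here are merely a stand-in for those addresses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout: this is a hand-written imitation of
   what the compiler inserts, not the code it actually generates. */
enum { BB_ENTRY, BB_THEN, BB_ELSE, BB_EXIT, BB_TOTAL };

/* Static array of counts; static storage guarantees it starts at zero. */
static unsigned long bb_counts[BB_TOTAL];

/* Paired array built "at compile time". Real instrumentation stores the
   starting address of each basic-block; labels stand in for addresses. */
static const char *const bb_labels[BB_TOTAL] = {
    "abs_value: entry",
    "abs_value: x < 0 branch",
    "abs_value: x >= 0 branch",
    "abs_value: exit"
};

/* The "extra instruction" placed at the start of each basic-block. */
#define BB(id) (bb_counts[(id)]++)

int abs_value(int x)
{
    int r;

    BB(BB_ENTRY);
    if (x < 0) {
        BB(BB_THEN);
        r = -x;
    } else {
        BB(BB_ELSE);
        r = x;
    }
    BB(BB_EXIT);
    return r;
}

/* Number of times basic-block `id` has been executed so far. */
unsigned long bb_count(int id)
{
    assert(id >= 0 && id < BB_TOTAL);
    return bb_counts[id];
}

/* Label (stand-in for the start address) of basic-block `id`. */
const char *bb_label(int id)
{
    assert(id >= 0 && id < BB_TOTAL);
    return bb_labels[id];
}
```

Calling 'abs_value' with -3, 5 and -1 leaves the entry and exit counts at 3, the negative branch at 2 and the non-negative branch at 1. Reading 'bb_count' and 'bb_label' side by side then gives the kind of per-block listing that the two arrays make possible.
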
The profiling library also includes a function ('mcleanup') which is typically registered using 'atexit()' to be called as the program exits, and is responsible for writing the file 'gmon.out'. When it runs, profiling is turned off, various headers are output, and the histogram is written, followed by the call-graph arcs and the basic-block counts.

The output from 'gprof' gives no indication of parts of your program that are limited by I/O or swapping bandwidth. This is because samples of the program counter are taken at fixed intervals of the program's run time. Therefore, the time measurements in 'gprof' output say nothing about time during which your program was not running. For example, a part of the program that creates so much data that it cannot all fit in physical memory at once may run very slowly due to thrashing, but 'gprof' will say it uses little time. On the other hand, sampling by run time has the advantage that the amount of load due to other users won't directly affect the output you get.
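
The effect of sampling by run time can be seen with a small signal-based sampler. The sketch below is not the profiling library's code: it assumes a POSIX system, uses 'setitimer()' with 'ITIMER_PROF' so that the timer advances only while the process consumes CPU time, and counts samples per program phase instead of per program-counter range. All function and variable names ('sampling_start', 'burn_cpu', 'sleep_wall', and so on) are hypothetical.

```c
/* Request the XSI interfaces (setitimer, sigaction, nanosleep). */
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical phases; a real profiler indexes its histogram by
   program counter, which this sketch does not attempt to read. */
enum { PHASE_BUSY, PHASE_SLEEP, PHASE_COUNT };

static volatile sig_atomic_t current_phase;
static volatile sig_atomic_t samples[PHASE_COUNT];

/* Signal handler: one sample means one increment of the current slot. */
static void on_sigprof(int signo)
{
    (void)signo;
    samples[current_phase]++;
}

/* Deliver SIGPROF after every interval_usec of CPU time consumed. */
int sampling_start(long interval_usec)
{
    struct sigaction sa;
    struct itimerval tv;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    tv.it_interval.tv_sec = interval_usec / 1000000;
    tv.it_interval.tv_usec = interval_usec % 1000000;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_PROF, &tv, NULL);
}

/* Disarm the timer; a zeroed it_value turns it off. */
int sampling_stop(void)
{
    struct itimerval off;

    memset(&off, 0, sizeof off);
    return setitimer(ITIMER_PROF, &off, NULL);
}

/* Spin until roughly `seconds` of CPU time have been used. */
void burn_cpu(double seconds)
{
    clock_t end = clock() + (clock_t)(seconds * CLOCKS_PER_SEC);

    current_phase = PHASE_BUSY;
    while (clock() < end)
        ;
}

/* Block for `millis` of wall-clock time without using the CPU. */
void sleep_wall(long millis)
{
    struct timespec ts;

    ts.tv_sec = millis / 1000;
    ts.tv_nsec = (millis % 1000) * 1000000L;
    current_phase = PHASE_SLEEP;
    /* Resume the remaining sleep if a signal interrupts it. */
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/* Samples collected so far in the given phase. */
long sample_count(int phase)
{
    assert(phase >= 0 && phase < PHASE_COUNT);
    return (long)samples[phase];
}
```

With a 10 ms interval, a call such as 'burn_cpu(0.3)' should collect a few dozen samples, while 'sleep_wall(300)' should collect almost none even though it takes about as long by the wall clock. A phase that spends its time waiting, whether on I/O, swapping, or a sleep, is therefore nearly invisible in the resulting histogram.
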