Formatting ΒΆ

This section was originally developed for incoming graduate students in ECE by some senior PhD students who were developing major codes. It later evolved into guidelines for OrangeFS (formerly PVFS). Some sections of this guide may be so simple as to not be of concern, or may be dated (it was begun in the early 90's). The style rules are still in force, though a tour of the code will show they have not always been followed. The overall concept is improve readability and understandability, and keep the as code consistent as possible. Many of these rules are similar to those for the Linux kernel, while others are completely opposite. The history of the choices is not known, but we try to keep the code as close to the guide as we can.

As you work on the code, if you see glaring violations it would be great if you would correct them. In some cases it may be too much of a distraction, but we ask that at least the code you add or edit keeps to the spirit of this guide.

GPL

The GPL, or General Public License, is a software license created by the GNU organization ( http://www.gnu.org). You can find out more information about it at their web site. A brief summary is that the GPL insists that the source code be distributed with any software released under the GPL. OrangeFS is licensed under the LGPL which allows others to linked to the provided interfaces without requiring that their code be GPL licensed.

Copyright information at the top of source code in OrangeFS should be noted as below

/*
 * (C) 2009 Clemson University, The University of Chicago and Omnibond. *
 * See COPYING in top-level directory.
 */
Other organizations that have contributed to the project may be listed in the copyright line as well (See section  OrangeFS Copyright for information on how do this in OrangeFS code). If you wish to credit particular developers or provide contact information, please do so in the README file located in the top level directory.

Other source code header information

In addition to the copyright comments, it is usually helpful to provide a brief description of what is contained in the each source file. This should just be a few summary lines below the copyright information.

Footer information

Every source file must have a footer for vim as follows:

/*
 * Local variables:
 *  mode: c
 *  c-indent-level: 4
 *  c-basic-offset: 4
 * End:
 *
 * vim: ft=c ts=8 sts=4 sw=4 expandtab
 */
This does several things - makes indentation level 4 spaces, replaces all tabs with spaces and activates c programming mode. Users of other editors need to make sure they do not insert tabs or use other spacing.

Commenting

Commenting your code effectively is very important! Please comment important sections of your code clearly and concisely as you write it. The habit of commenting after completing the code often leads to poor comments.

Do not use c++ style comment delimiters ( // ) in c code. Some c compilers do not accept this as a comment delimiter, and it is not a part of the c language specification.

For single line comments (or brief comments trailing a line of code), just use the /* and */ delimiters. If the comment is longer than one line, use this format:

/* This code does lots of cool things.  It is also written perfectly and
 * will never break.  It is fast, robust, extensible, and resistant to
 * rust and corrosion.
 */
This makes it easy to tell where the comment begins and ends.

Files should have a comment block that using the doxygen format and the command \file

/** \file These are functions that do ...
 *
 *  This is the full description
 */
Comments that describe the operation of a particular function should be listed just above the function definition, not the prototype. The comment should give the function name, what it does, what any potential side effects are, and the range of return values. This is one example:

/** This function shuts down the method control subsystem.
 *
 * It is responsible for tearing down internal data structures, shutting down
 * individual method devices, and gracefully removing any unfinished
 * operations.
 *
 * returns 0 on success, -errno on failure
 */
int MC_finalize(void)
{
   ...
}
Note this format enables the use of Doxygen for automatically documenting your code.

Function naming

Interface functions and global variables in PVFS should use a standard naming convention for clarity. Here are a few guidelines:

The letters "PVFS" should only be prepended to functions and global variables that exist as part of an application level interface. Some examples would be PVFS_open and PVFS_read. For clarity, do not use this naming scheme for interfaces internal to OrangeFS.
Well defined internal OrangeFS interfaces should use the prefix "PINT" (this is short for "PVFS interface"). This should then be followed by an identifier for the interface, and then a description of what the particular function does. Some examples are PINT_flow_alloc, PINT_flow_free, and PINT_flow_post.
There are exceptions to the above rule. For example, well defined interfaces that exist within a very distinct module of OrangeFS may use a different prefix. Examples include the method functions within the BMI layer of OrangeFS communications have names such as BMI_tcp_send and BMI_tcp_recv.
Any variables that are globally visible should follow the rules listed above as well. This naming convention is for both functions and global variables.
Avoid over used variable names such as "flag", "context", "cookie", etc. unless a variable has a very limited scope and is well commented.
Brackets, Indentation, and Spacing

Brackets are of course used to delineate blocks of code contained within loops, conditional statements, or functions. For clarity, any statement executed within a conditional or loop should be enclosed in brackets, even if it is just one line. For example:

if (something true)
{
    do_something;
}
and not

if(something true)
    do_something;
Each bracket gets it's own line in the source code and must line up with its corresponding bracket in the same column.

Indentation is also very important to writing clear code. The easiest rule to remember is that any new set of brackets should add a level of indentation for the code contained within it. This holds for functions, loops, and conditionals. Indentation should be 4 spaces and should NOT use tabs. The following is an example:

int foofunction(int x)
{

    int y = 0;

    if(x <= 0)
    {
        do some stuff;
    }
    else
    {
        for (y = 0; y > x; y++)
        {
            do lots of stuff;
        }
    }

    return(0);
}
int somefunc(int arg)
{
    int a;
    for (a = 0; a < 10; a++)
    {
        float b;
        b = a * arg;
        if (b < 2 * arg)
        {
            printf("WOW!\n");
        }
    }
}
There should be a space placed around every binary operator (including =, <, >=, +, *, etc.) but NOT between a unary operator (including +, - *, ++, etc.) and its argument. There should be a space after each comma in a list of arguments and a comma after each semi-colon in a for loop (unless it is a degenerate form). Space between a function name or keyword (including if, for, while, etc.) are preferred but optional. Potentially ambiguous expression grouping should be parenthesized, even if the default is correct in order to emphasize the programmer's intention.

    /* this is wrong */
    foo=avar*somefunc(bvar,cvar*dvar+1);

    /* this is right */
    foo = avar * somefunc(bvar, (cvar * dvar) + 1);
The dereference operator should be adjacent to the variable it pertains to and have a space between it and the type it returns as follows:

    /* right */
    int *foo, *bar;
    struct foobar *my_function(int *arg);
    int *p = (int *)malloc(sizeof(int));

    /* wrong */
    int* foo, * bar;
    struct foobar * my_function(int* arg);
    int* p = (int*)malloc(sizeof(int));
Readability is more important that reducing the number of lines of code or the width of the code. White space and logical alignment improves readability. Put blank lines in the code to improve readability, and to distinguish different logical steps of the code. This does not mean put a blank line after EVERY line or to put multiple blank lines inside of a function (though between they may be useful in some cases).

If long lines become heavily indented they can be broken across multiple lines, which case they should be indented by at least 8 spaces, or more if it allows the code to line up naturally. The rule is do not let lines wrap around 80 columns, but keep them neat and logical, and do not let them look like regular indentation (hence at least 8 spaces, not 4). Generally the more things line up to represent the structure, the easier it is to read. Once again, using more lines is considered a good thing.

    /* this is fine */
    some_log_var_name = (struct some_big_type *)malloc(some_important number
            * some_other_number * sizeof(struct some_big_type));

    /* this may be better */
    some_log_var_name = (struct some_big_type *)malloc(some_important number *
                        some_other_number * 
                        sizeof(struct some_big_type));
Functions calls co by the rule "All arguments on one line, or each argument on a line by itself." Multi-line argument lists should line up in the same column, preferably after the opening parenthesis, but if that doesn't fit they can line up under the function name, indented by at least 8 spaces.

    /* BAD example */
    PVFS_big_function_name_that_makes_things_not_fit(big_var, small_var, rather_long_var,
                                                     really_really_really_really_big_one,
                                                     one, two, three, four, five,
                                                     six_and_seven_and_eight,
                                                     nine);
    /* GOOD exmple */
    PVFS_big_function_name_that_makes_things_not_fit(big_var, 
                                                     small_var, 
                                                     rather_long_var,
                                                     really_really_really_really_big_one,
                                                     one, 
                                                     two, 
                                                     three,
                                                     four, 
                                                     five,
                                                     six_and_seven_and_eight,
                                                     nine);
And now you have room for lots of comments on those arguments, as well as see how many arguments there are and find a particular one during debugging. We have many functions with long argument lists, when they are formatted this way it is easy to spot to major steps of the code where important things happen and see what is being passed in and out.

If you use vim, a comment block at the end of each file can automatically set parameters such as tab stop and autoexpand tabs that help keep the code in the proper format. Note that these are overridden if you have the same parameter set in your .vimrc file.

Function definitions with more than one argument should have each argument on a line by itself, with each argument indented the same and a comment documenting the argument that adheres to the Doxygen formats.

PVFS_object_ref *PVSF_find_an _object (PVFS_fs_id id,    /**< file system id of the object */
                                       char *name,       /**< string with name of the object */
                                       PVFS_attr_t attr, /**< attributes of the object */
                                       PVFS_hints *hints /**< User supplied hints */
                                      )
{
...
}
Hints for writing maintainable code

General code layout

These are a few general guidelines for how to organize your code:

Group similar functions together into the same .c file.
If a function will only be called from within the .c file where it is defined, then include the prototype for the function in the same .c file near the top. (see section  Static Declarations for information on static declarations)
If a function will be called from outside of the .c file in which it is declared, then put the prototype in a header file separate from the .c file. This header should be included in any other .c file where the function will be called.
Put comments describing the behavior of the function just before its definition, not with the prototype (see section  Commenting for more detailed information about commenting functions).
Header files should only contain prototypes and structures that are needed by external pieces of code. It helps to encapsulate things by not providing extraneous information in the header files.
Length of functions

Try not to make extremely long functions. A good rule of thumb is to limit your functions to 100 lines or less. If a function is longer than this, then it should probably be broken apart into smaller subfunctions. Exceptions to this rule are rare.

Preventing double inclusion

If you are using a header file in several locations, it is easy to create a situation in which the same header file is indirectly included twice in a single compilation. This causes compilation errors because of function, variable, or type redefinition. In order to ensure that this does not happen, you should always wrap your header files in preprocessor macros that prevent the code from being read more than once by the compiler. This may be done by creating a special define that can be detected the second time the code is included. The name of this define should stand out so that it does not conflict with other variables or definitions in your code. It is usually safe to pick the filename of header, convert it to all uppercase, and replace punctuation with underscores. Here is an example for a header file called bmi.h:

/*
 * (C) 2001 Clemson University and The University of Chicago
 *
 * See COPYING in top-level directory.
 */

/** \file This file contains the primary application interface to the BMI
 *  library.
 */

#ifndef __BMI_H    /* these macros tell the compiler to skip the */
#define __BMI_H    /* following code if it hits it a second time */

/* now do whatever you would normally do in your header: */

#include<bmi_types.h>

struct foo
{
    int x;
    int y;
};

int foo_function(double a, double b);

/* don't forget to end your header with this statement */
#endif /* __BMI_H */
Static declarations

Any function or variable that is declared global in a particular .c file but not referenced in any other .c file should be declared static. This helps to keep the symbol name space from becoming cluttered. It also insures that local functions are not accidentally called somewhere that they were not intended to be called.

Initializing variables

Initialize all variables when they are declared in your software. Even if it is a trivial scalar variable, go ahead and initialize it. Integers and floats can typically be initialized to -1 or 0, while pointers can be initialized to NULL. This simple habit can help uncover many problems that occur when the validity of a value is not checked before it is used. There is no guarantee what the value of a variable will be when it is created. Picking a known initial value to start out with can prevent garbage data from being interpreted as valid information.

A similar argument applies to memory regions that are dynamically allocated. Any dynamically allocated structure or variable should at least be zeroed out before being used in the code. This can be done with the memset() function:

foopointer = (struct foostruct)malloc(sizeof(struct foostruct));
if(foopointer == NULL)
{
    /* alloc failed */
    return(some error value);
}
memset(foopointer, 0, sizeof(struct foostruct));
If there are sentinal values other than 0 for elements contained in your struct, they should be set as well.

Allocating and deallocating complex structures

If there is a particular structure that you are frequently dynamically allocating or deallocating, it usually pays off to go ahead and create functions to handle those operations. This is especially helpful if there are further dynamically allocated structures within the original structure. Encapsulating all of this memory management in a pair of functions aids in debugging and makes your code more readable overall. A good naming convention is:

/* returns a pointer to new structure on success, null on failure */
struct foo *alloc_foo(void);

and

/* no return value */
void dealloc_foo(struct foo *);
Keeping up with work in progress

There are often questionable issues, or even issues that you don't have time to deal with at the moment, that come up when writing large pieces of code. It is generally helpful to document these questions or "todo" items in a known location so that they are not forgotten. There are two recommended ways of handling this. Keep larger or more imporant items listed in a file called "TODO" in the top level directory of your project. This file can be added to CVS so that other developers can see a quick list of known bugs or issues that need resolution. As items on this list are corrected, you may wish to log them in another file at the top level called "Changelog". Smaller issues, that are perhaps only important from a stylistic point of view, can be commented in the code and marked with the text string "TODO" within the comment. This is highlighted with a special color with vi syntax highlighting, and can easily be found with the grep tool later.

Choosing good variable and function names

Try to pick descriptive names for variables and functions, rather than saving keystrokes by picking obtuse abbreviations. This makes it easier for people who look at your code afterwards to understand what is going on. If your function or variable name is comprised of more than one word, then separate the word with underscores. If a collection of functions are related, or collectively form a common interface, the prepend an identifier to each function so that it is obvious that they belong together:

int test_control_open();
int test_control_close();
int test_control_read();
Function and variable nameing issues specific to OrangeFS can be found in section  OrangeFS Naming.

Advanced topics

Checking for interrupted system calls

If a system call fails, always check the return value to see if it was set to EINTR. If this happens, it means that the system call was interrupted by a signal and probably did not actually fail; it just needs to be restarted. This is a fairly common situation when doing reads, writes, or polls. You can restart operations either by wrapping them in a while loop that causes it to try again if EINTR occurs, or you can use a goto and a label to jump back to the the system call you wish to repeat.

Constant arguments

If you are passing in pointers as arguments to a function, but do not wish for the value contained in the pointer to be modified, then it is sometimes helpful to make the argument declaration a constant. This makes the compiler present a warning or an error if the value is accidentally modified within your function. This technique is especially useful when one of the arguments to your function is a string. In this case, you will probably be passing in a char* argument for convenience. However, passing in a string in this manner allows the function to modify the argument, which may not be desirable. Using a const char* argument can prevent this. Example:

int string_key(const char *key, const char *id_string)
{
    /* within this function it is now impossible to accidentally modify
     * the character strings pointed to by key or id_string
     */
    return(0);
}
Obscure coding practices

By all means, try to avoid the use of obscure coding tricks when writing software as part of a group. This especially true when there is there is an equally valid but much clearer method of accomplishing your goal. Obscure coding practices include but are not limited to:

the : ? conditional operator
uneccessary goto statements
nested switches
implicit type conversion
placing too much emphasis on making code small
Some of these can be OK if used in a clear, well-documented manner. In particular, goto can be used to break out of long loops on error and jump to the end of the function where it cleans up and exits. It should never be used to implement loops or other complex code structures.

Locking data structures

If you are programming in a multithreaded or reentrant environment, it is very important to use locking mechanisms effectively. Any global variable should be locked before it is accessed in this type of environment. The pthread library contains almost any sort of portable primitive you may need for a single application. It is also helpful to wrap these calls behind an interface that allows you to turn locking on or off at compile time. The ability to disable locking can be useful during development or when running code on a system that does not require locking. Look in the pvfs-locks CVS module for an example.

Select vs. poll

Try to avoid using the select system call and use poll in its place. Poll scales more efficiently. It is also the most direct function call for accomplishing the desired task on modern Linux kernels because select is implemented on top of the kernel's poll function.

String parsing

Be careful with regards to which functions you use when doing simple string parsing. Some of the functions provided in {{string.h}}} are dangerous to use, either because they do not return error values, or because they alter their arguments. Most of these issues are documented in the man pages. One common example occurs when an integer value must be read out of a string. In this case, it is better to use sscanf than atoi:

    char number_string[] = "300";
    int my_number = -1;
    ret = -1;

    /* if you use sscanf, you can check the return value */
    ret = sscanf(number_string, "%d", &my_number);
    if(ret < 1)
    {
        return an error;
    }
    /* as opposed to atoi, which will not tell you if it fails */
    my_number = atoi(number_string);
Abstraction

When you are designing new interfaces, think carefully about how to create an abstraction for what you want the interface to do. The important idea here is to not be tied down to a particular implementation below your interface because you made the interface too restrictive. For example, suppose that you wish to create an interface for storing and retrieving a large number of independent objects. One way to implement this may be to use a hashing function. However, most people consider it to be much quicker to get a simple linked list working. If you abstract the interface correctly, you can implement the functionality with a linked list for now just to get your program working and then come back later and plug in a hash table implementation. This is only possible with a good abstraction, however. If your first interface has functions such as "add_to_head" or "create_new_list" that pass around pointers to lists, then it will of course be difficult to change this interface to use a hash table. It would be better to use functions such as "store_item" or "create_new_container" and use opaque types to keep up with your data structure.

Function pointers

Function pointers can be useful when creating modular code. They allow you to pick which function will be used to perform a given task at run time rather than compile time. This is not really any harder than manipulating pointers to variables:

/* this is the first way to send a message */
int send_message_one(void* data, int size);

/* this is the second way to send a message */
int send_message_two(void* data, int size);

/* this is a pointer to the preferred method */
int (*send_message_generic)(void*, int) = NULL;

...

if(something is true)
{
    send_message_generic = send_message_one;
}
else
{
    send_message_generic = send_message_two;
}


/* We don't care which method the user chose.  We know that it can be
 * accessed through this function pointer without us modifying our code.
 */
send_message_generic(my_data, sizeof(my_data));
Typedefs and opaque types

Choosing appropriate types for objects passed around in your code can be very important in some situations. There are a couple of different issues here:

Platform dependence: Different architectures use a different number of bytes for some variable types. This means that it can sometimes be very helpful to explicitly choose the size of some variables to aid portability. This is especially true if the data is going to passed over a network, although there are more issues (such as big-endian vs. little-endian) to worry about in those situations. It is often a good idea to use typedefs to create new type names that have a known, fixed size:
   typedef int32_t pvfs_flag_t;
This guarantees that when a pvfs_flag_t variable is declared, it will be a 32 bit integer, regardless of the host architecture.

Opaque types: Sometimes you wish to have an interface operate in terms of a specific type. If you are not certain of what type should be used for this purpose in the long term, you can hide it behind a typedef'd opaque type. That way, if you change the type later, you may not have to change every reference to it in the code. You just have to change the initial typedef statement. This can be done for structs or scalar types.
Guess I need an example here...

Error logging with Gossip

Gossip is a simple library for logging both errors and debugging messages. It allows you to send logging messages to either stderr, syslog, or a text file.Gossip uses a debug mask to determine which messages get logged. You may specify a mask level with each debugging call. These messages can then be toggled on or off depending on what the global mask value is. This allows you to turn debugging on or off just for specific parts of your software at run time. The global mask may be made up of several individual mask values logially or'd together in order to enable logging for multiple parts of your software simultaneously. Gossip also allows you to send error messages. These are similar to debugging messages, except that they get logged regardless of the mask value and whether debugging is turned on or off. These error messages should only be used in situations in which a critical error should be recorded.

The following is a list of functions provided in the Gossip library:

gossip_enable_stderr(): Directs logging messages to stderr.
gossip_enable_file(filename, mode): Directs logging messages to a specified file. The arguments are the same as those taken by the fopen() function.
gossip_enable_syslog(priority): Directs logging to syslog. The priority argument is the same as that given to the syslog() function.
gossip_set_debug_mask(debug_on, mask): Turns debugging messages on or off and specifies the mask value to use if turned on.
gossip_disable(): Gracefully shuts down the Gossip logging facilities.
gossip_debug(mask, format, ...): Logs a debugging message. Uses the same format syntax as the printf() function call. It will only print if debugging is turned on and the mask value matches the global mask specified with gossip_set_debug_mask().
gossip_ldebug(mask, format, ...): Same as above, except that it prepends each message with the file name and line number of the source code that invoked it.
gossip_err(format, ...): Logs error messages. These will print regardless of the mask and whether debugging is turned on or off.
gossip_lerr(format, ...): Same as above, except that it prepends each message with the file name and line number of the source code that invoked it.
Examples of how to use Gossip can be found in the gossip/examples directory of the Gossip source code. This code can be found in the pvfs2/src/common/gossip directory within the OrangeFS source tree.

Similarly, run-time client debugging information can be gathered by using environment variables before running the client application. The default client logging method is to set the variable PVFS2_DEBUGMASK to values such as client,network. Many of the supported client debugging masks overlap the server masks that can be verified using pvfs2-set-debugmask. By default, setting PVFS2_DEBUGMASK emits debugging information to stderr, often intermixed with the client program output. If you'd like to redirect client debugging to a file, set the PVFS2_DEBUGFILE environment variable to a valid file name. This causes all debug information specified by the PVFS2_DEBUGMASK to be stored in the file specified, no longer intermixing the output with the client program. The file specified in the PVFS2_DEBUGFILE environment variable will be appended if it already exists. Another influential environment variable that can be used to trigger kmod logging messages is PVFS2_KMODMASK. By setting values of this variable to file, inode prior to starting pvfs2-client-core daemon, verbose kmod subsystem error diagnostics are written to the system ring buffer and eventually to the kernel logs. One could also set the kmod diagnostic level when the kernel module is loaded like so, insmod pvfs2.ko module_parm_debug_mask=<diagnostic level>. The diagnostic level will be a bitwise OR of values specified in pvfs2-debug.h. For more information on setting the kernel or client debug mask, see doc/pvfs2-logging.txt in the OrangeFS source tree.

Suggested error handling

Traditional application error handling with errno

Most unix system calls set a global variable called errno when an error condition occurs. Since this is a global variable, it is overwritten everytime a system call is made. This means that it must be checked immediately following the failure of the system call in question. The errno values correspond to to various error conditions, wuch as bad file descriptor or permission denied. One can print out a textual description of these error values using the perror() or strerror() functions. More information about the use of errno can be found in the man pages for errno, perror, and and strerror.

The use of errno in this manner is fine for small applications, but becomes more tedious when building larger software projects. The problem is that you must store the error value somewhere when passing the error back through multiple abstraction layers. This tends to cause confusion in large projects.