Cal State San Bernardino / [CNS] / [Comp Sci Dept] / [R J Botting] >> [CSci202] >> functions
Mon Jun 4 07:51:12 PDT 2007


    Functions in C++

      Introduction

      This page is a quick summary of things you may need to know about C++ functions at the CS202 (CS1/CS2) level. Without these facts you will make more mistakes and will not be able to use the full power of C++ to make your programming easier.

      Basics

      A C++ function is a sub-program. It is a small, named piece of a program that is called up from other parts of the same program.

      A function must be defined once and can be called any number of times. A function has a name, arguments, and a body. A common problem in programs is a function call that cannot be matched up to any function in the rest of the program.

      The body is compiled and stored in memory. When the call is executed the computer jumps to the start of the body. Once in the body it executes its code until it gets to the end or a 'return' statement. If it calls the library function "exit" the whole program terminates. If the computer gets to a return or the end of the function then it jumps to the instruction in the program immediately after the call.
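
      As a minimal sketch of these ideas, here is one function that is defined once and called twice. The names square and sum_of_squares are invented for this illustration.

```cpp
// Hypothetical example: one definition, several calls.
int square(int x)            // header: the name "square" and one int argument
{                            // the body starts here
    return x * x;            // control goes back to just after the call
}

int sum_of_squares(int a, int b)
{
    // Each call jumps into the body of square and then comes back here.
    return square(a) + square(b);
}
```

      A call such as sum_of_squares(3, 4) runs the body of square twice, once for each argument.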

      The Main Function

      All executable programs must have a special function with name "main" and special arguments. The program run starts when the operating system calls this "main" function.

      The main function has either no arguments, 2 arguments, or 3 arguments. The 2 argument case allows the program to access the command that the user typed in. The 3 argument case (a common extension on many systems rather than part of standard C++) also gives the program access to the variables defined by the operating system (the "environment").

      The main function should return an int that indicates whether any errors have been discovered by the program. The number zero indicates that zero errors have occurred. A number other than zero indicates that the program did not complete its task, and the precise number is often used to indicate the reason for the program failing. The operating system has the power to find out this returned number.
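
      The 2 argument case and the returned number can be sketched as follows. To keep the fragment easy to test, the work is done in a helper called run, a name assumed for this example; the closing comment shows how main would forward its arguments to it.

```cpp
#include <iostream>

// Hypothetical helper with the same arguments as the 2 argument main.
// argc counts the words on the command line; argv[0] is the program name.
int run(int argc, char* argv[])
{
    if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " name\n";
        return 1;                 // nonzero: the task was not completed
    }
    std::cout << "Hello, " << argv[1] << "\n";
    return 0;                     // zero: no errors
}

// In a complete program the operating system calls main, which can
// simply forward its arguments:
//     int main(int argc, char* argv[]) { return run(argc, argv); }
```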

      Compilation

      The compiler has to match up the function call with the right function. It uses the name of the function plus the number and type of the arguments in the call to find a matching function. If no function is found that has the right name and arguments then the compiler reports an error and the program cannot be run.

      A compiler doesn't have to know everything about a function to set up the linkage between the calls and the function body. All it needs to know is the name of the function and the types of arguments that it accepts.

      A function definition has two parts: a header and a body. The header specifies how the function can be called. The body defines what happens if and when the function is called. We say that the header specifies the function. The complete body and header is said to be an implementation of the function.

      In C++ there is a special syntax for giving the header without defining the body: In essence, the body is replaced by a semicolon. The result is called a function prototype, or specification. A file that contains a series of function prototypes is called a header file. Typically a compiler learns about the functions you plan to use in a program file by compiling a series of these header files #included at the start of the program. These are often predefined library files that come with the compiler. Here [ string ] is a slightly simplified version of the C++ standard header file <string>.
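
      The difference between a prototype and a definition can be sketched like this. The function names are invented for the example, and everything is kept in one file for brevity, whereas the prototype would normally sit in a header file.

```cpp
// Prototype (specification): the header followed by a semicolon.
// This is all the compiler needs in order to compile calls.
double average(double a, double b);

// A call can be compiled here, before the body has been seen.
double midpoint_of_unit_interval()
{
    return average(0.0, 1.0);
}

// Definition (implementation): the same header plus a body.
double average(double a, double b)
{
    return (a + b) / 2.0;
}
```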

      Linking

      The final connection between a function call and its function is normally made after the compiler has finished its work, by a separate program called a linker. The linker takes a series of compiled pieces of programs (called object or .o files) and puts them together to give a complete executable program. If a function is called in one object file but has its definition in a different file, then it is the linker that makes the connection between them. This is said to resolve the link. The result is that the address of the function body is put in place of the name and type of the function in the call.

      Many compilers encode the name of the function and its arguments into a single symbol. This encoding is called name-mangling. Name-mangling can make it difficult to interpret some error messages produced by the linker. They will refer to the mangled name as missing. As a result you may have to think about several similarly named functions before figuring out which call fails to reach a linkable function.

      Execution

        Before a function is called some of its arguments have to be evaluated.

        When a function is called, after all arguments have been prepared for the function but before control is passed to the function, the address of the call is placed on the run-time stack. This address is used when the function returns control to the calling program.

        Handling arguments

          Call by Value

          The normal argument is an expression that must be evaluated first. The resulting values are placed on a stack where the function can find them and use them.

          Call by Reference

          Some arguments, however, are called by reference. These must be variables rather than expressions, and they are not evaluated. The function is instead given the address of the variables and uses these in place of the symbol in the function header.

          A call by reference is shown in the header by adding an ampersand symbol after the type:

                   type name (...., type & argument, ....)

          Call by reference permits a function to access the actual arguments directly in RAM. So, it is often faster to pass the reference rather than copy over the whole value of the data.

          However this also permits the function to change the arguments. This can make it difficult to find out what is going on in a program. It is therefore possible to indicate that an argument is called by reference but is safe from being changed:

                   type name (...., const type & argument, ....)
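
          The three ways of handling an argument described so far can be compared in a short sketch. The function names are invented for the illustration.

```cpp
#include <string>

// Call by value: n is a copy, so the caller's variable is untouched.
void add_one_to_copy(int n) { n = n + 1; }

// Call by reference: n is another name for the caller's variable.
void add_one(int& n) { n = n + 1; }

// Call by const reference: no copy is made and no change is allowed.
std::string::size_type length_of(const std::string& s) { return s.size(); }
```

          A call such as add_one(x + 1) does not compile, because x + 1 is an expression rather than a variable, whereas add_one_to_copy(x + 1) is accepted.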

          Calls with Pointers

          It is also possible for a function to specify that it will be given an expression that evaluates to an address -- rather than being given an object via its address. This is shown:
                   type name (...., type * argument, ....)
          and requires a lot more care from the programmer than the use of call by reference. The commonest form of this is the older, C-style data type "char*" (pronounced char-star). This is the address of a character in memory and is often taken to indicate the start of an array of similar characters. It is simpler and safer to use the string library instead.
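
          A small sketch of pointer arguments, with invented function names, shows where the extra care goes. It assumes a compiler that supports nullptr (C++11 or later).

```cpp
// Hypothetical example: the argument is an address, not an object.
// The function must check for a null pointer before using it.
void reset(int* p)
{
    if (p != nullptr) {
        *p = 0;          // change the int that p points to
    }
}

// A char* is taken to point at the first of a run of characters
// that ends with the character '\0'.
int count_chars(const char* s)
{
    int n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}
```

          The caller passes an address explicitly, as in reset(&v), and count_chars trusts its caller to supply a properly terminated array.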


        Return Values

        A function header can specify that a value must and will be returned by a function. If so there must be one or more return statements that contain an expression. When such a return statement is executed the expression is evaluated and the value is left behind (typically in a register or on the run-time stack) for the calling program to collect and use.
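
        As a sketch with invented names, here is one function that returns a value through either of two return statements, and one void function whose return statement has no expression:

```cpp
// Returns a value: every path through the body ends in "return expression".
int largest(int a, int b)
{
    if (a > b) {
        return a;
    }
    return b;
}

// Returns nothing: "return;" has no expression and may be omitted.
void clamp_to_zero(int& n)
{
    if (n >= 0) {
        return;          // leave early, nothing is handed back
    }
    n = 0;
}
```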


      Overloading

      A C++ function is matched, by the compiler+linker, to its calls by using both its name and the types of its arguments. So the same name can be used for several similar functions with different types of data.

      This works well as long as the names you choose actually reflect what the functions do.
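
      For instance, three functions can share the name twice (a name invented for this sketch) because their argument types differ:

```cpp
#include <string>

// Three hypothetical functions sharing one name. The compiler picks
// one by looking at the types of the arguments in each call.
int         twice(int x)                { return 2 * x; }
double      twice(double x)             { return 2.0 * x; }
std::string twice(const std::string& s) { return s + s; }
```

      Each version doubles its argument in the sense that suits its type, so the shared name still reflects what the functions do.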

      Operators

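      C++ operators are, in effect, functions with a peculiar syntax. Adding two objects of a class type (two strings a and b, say), for example, is normally written:
               a+b
      but can be written as a function call instead:
               operator +(a,b)
      The built-in operators on types such as int cannot be called this way, but they behave like functions in the same sense.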
      The keyword "operator" followed by a symbol is a special kind of function name.

      You can define operators for your own types by declaring them as functions in the following typical ways. Operators with two arguments:

               type operator symbol(type left, type right) body
               type class::operator symbol(type argument) body
      Operators with one argument (like "-" for example)
               type class::operator symbol() body
      Consult a good reference manual before choosing the best form for your operators.

      By the way -- operators are very useful as long as they do what they seem to do. A program that uses a plus sign to do a convolution is not a good idea. You should aim to mimic normal mathematical and C++ usage of operator symbols.
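
      The typical forms can be sketched with a small two-dimensional vector type. Vec2 is invented for this illustration, and the brace initialization assumes C++11 or later.

```cpp
// Hypothetical two-dimensional vector used only for this sketch.
struct Vec2 {
    double x;
    double y;

    // One-argument operator as a member: unary minus.
    Vec2 operator-() const { return Vec2{-x, -y}; }

    // Two-argument operator as a member: the left operand is *this.
    bool operator==(const Vec2& right) const
    {
        return x == right.x && y == right.y;
    }
};

// Two-argument operator as a non-member function.
Vec2 operator+(const Vec2& left, const Vec2& right)
{
    return Vec2{left.x + right.x, left.y + right.y};
}
```

      With these definitions a + b and operator+(a, b) mean the same thing, and the plus sign does what a reader would expect for vectors.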

      Default Argument Values

      The arguments at the end of an argument list can be given default values. If a function call omits these arguments the default value is used in its place.
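
      A brief sketch, using an invented function repeat whose last two arguments have defaults:

```cpp
#include <string>

// Hypothetical function: the last two arguments have default values.
std::string repeat(const std::string& s, int times = 2,
                   const std::string& separator = "")
{
    std::string result;
    for (int i = 0; i < times; ++i) {
        if (i > 0) {
            result += separator;
        }
        result += s;
    }
    return result;
}
```

      The calls repeat("ab"), repeat("ab", 3) and repeat("ab", 3, "-") are all accepted; only trailing arguments can be left out.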

      Template Functions

      You can spend a lot of time developing a function that works on int's and then have to rewrite it to work with float's and then as double's. Instead you could write a generic version of the function and let the compiler generate the individual functions for you. In this case it is just a matter of putting
       		template<typename T>
      in front of the function header and writing the body using "T" instead of int/float/double. As long as there is at least one function argument that has type T the compiler can then create the function that you want by matching the type of the argument to "T" and replacing "T" by the actual type of the argument.

      A template is a clever labor saving device. You write generic code that can be used to generate many different kinds of functions and classes automatically.
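
      As a minimal sketch, a generic function for the larger of two values might look like this. The name largest_of is invented, and T is assumed to be a type that supports the < operator.

```cpp
// Generic version: T stands for whatever type the arguments have.
template <typename T>
T largest_of(T a, T b)
{
    return (a < b) ? b : a;
}
```

      The compiler generates an int version for largest_of(2, 5) and a double version for largest_of(2.5, 1.5). When the two arguments have different types, as in largest_of(2, 2.5), it cannot match T and the type has to be given explicitly: largest_of<double>(2, 2.5).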

      There is a catch. Generating instances of templates is a complicated process. Many compilers and linkers cannot precompile templates to object (*.o) files and link them successfully. According to the author of C++ a separately compiled template needs to have the keyword "export" put before its definition. However, few compilers ever accepted this, and the C++11 standard later removed the feature. Instead you may have to set special compiler switches or flags.

      As a result, until compilers and linkers improve, it is best to not separate specification (*.h) files and implementations (*.cpp) of templates. We normally put the complete function definition (Header+body) in the *.h file and #include it in files that use the template.

      For more see my notes [ templates.html ] on templates.

      Inline Functions

      Normally a function is stored inside a program until it is invoked. There is one stored copy and many calls (invocations) all over the program. These calls transfer control (jump) into the function and the function later jumps back to the instruction after the call. This takes time. If a function is small enough then it may be worth generating a copy to replace each call in the program. The older technique was called macro-substitution and used "#define" to do it. This was unsafe because it was done before the compiler starts to check syntax and semantics. The modern and safe way to ask for a function to be "compiled inline" is to put the keyword "inline" in front of the function header.

      Because inline functions are expanded by the compiler they cannot be precompiled into a *.o file. They should be put in a .h file that is included in the programs that need to use them.

      As a rule inline functions should be only a few lines long at most. This way the compiler doesn't generate a lot of duplicate code -- one copy for each call. The result is a program that does not jump all over the place and so runs faster.
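
      The difference between the two techniques can be sketched with an invented macro CUBE and an invented inline function cube. Both are assumed to live in a header file.

```cpp
// Old technique: textual substitution before the compiler checks anything.
// CUBE(1 + 1) becomes 1 + 1*1 + 1*1 + 1, which is 4 rather than 8.
#define CUBE(x) x*x*x

// Modern technique: a real function, so the argument is evaluated once
// and type checked. The compiler may copy the body into each call, but
// it is free to decline.
inline int cube(int x)
{
    return x * x * x;
}
```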

      Functions in Classes

        One powerful way to think about a C++ class is to imagine that it is a family of functions with access to some common data. Together these functions undertake the various responsibilities of objects that belong to the class.

        Member Functions

        A function that is a member of a class is always called in association with an object:
                 object_name . function_name ( actual_arguments )
                 pointer -> function_name ( actual_arguments )
                 (*pointer) . function_name ( actual_arguments )
        (The last two are equivalent).
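
        The three forms can be tried out on a small class. Counter and call_three_ways are names invented for this sketch.

```cpp
// Hypothetical class with one member function.
class Counter {
public:
    Counter() : count(0) {}
    int next() { return ++count; }   // uses the data of "its" object
private:
    int count;
};

// The three call forms applied to the same object.
int call_three_ways()
{
    Counter c;
    Counter* p = &c;
    int first  = c.next();      // object . function
    int second = p->next();     // pointer -> function
    int third  = (*p).next();   // (*pointer) . function
    return first + second + third;
}
```

        All three calls act on the same object c, so the counter advances on each one.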

        Inline Functions in Classes

        If a complete function definition (header+body) is placed inside a class definition:
         		class Object{ public: Object(int x){ *i=x; } ...}
        then it is compiled as an inline function. For more see [ Inline Functions ] above. These declarations include specification and implementation. They should be inside .h files and so #included in files that refer to the class. The result will be a slightly longer program that runs a little faster. As a rule inline member functions should be no longer than a single line of code. Here is a sample of such a class in a header file: [ inline_stack.h ] and a test program [ tin_stack.cpp ] for this Stack class. Notice that this "header" file includes the bodies of the member functions.

        Commonly, inline functions appear in generic classes. This means inside a template class. For an example here is a generic stack with all functions inline:

         	#include <cassert>
         	#include <vector>

         	// Generic stack; every member function is defined inline in the class.
         	template <typename T>
         	class Stack
         	{
         	public:
         		void push(T n){ s.push_back(n); }
         		// Remove and return the top item; the stack must not be empty.
         		T pop(){ assert( ! empty()); T r=s.back(); s.pop_back(); return r;}
         		// Return the top item without removing it.
         		T top() const { assert( ! empty() ); return s.back();}
         		bool empty() const { return s.empty(); }
         	private:
         		std::vector<T> s;
         	}; // Stack
        See the complete header file [ gin_stack.h ] and [ gtin_stack.cpp ] a test program that includes it. Note that the test deliberately crashes the program by popping more out of the stack than has been pushed on to it.

        Encapsulation

        Public members (functions and data) can be used by any command in the program -- in the same class, in a different class, or outside any class.

        Private members are present but not accessible to the other pieces of the program. Protected members are present and accessible only to the class itself and its derived classes.

        Overriding Functions

        When a new class C (say) is derived from a base class (B say) [ Generalization ]
                 class C:public B { newstuff };
        some of the functions in the newstuff can have the same name and argument types as those in class B. They are said to override the functions in B. Normally the compiler selects the correct function (in C or B) by looking at the declared types of the objects involved. Sometimes this is not what we need.

        Note: override and overload shouldn't be confused. They seem similar but apply in different circumstances.

        Virtual Functions

        If class C is derived from B then a pointer to C is also pointing at an object of type B [ Generalization ] because objects of type C contain a hidden object of type B inside them.

        Normal function calls are connected to their functions by the compiler+linker before the program starts running. This is simple and efficient. For example in:

         		B * pc = new C;
         		pc->makeMyDay();
        The compiler will normally assume that you want B's version of the makeMyDay() function, not the C version(if any).

        However this makes programs harder to maintain and more complex. In C++ you can declare a function so that the connection is not made until the function call takes place. This is less efficient but useful in complicated projects. The technique is called "using a virtual function". If makeMyDay were a virtual function then the compiler would not choose by the declared type (B*). Instead it plants code that looks at the type of *pc (the object pc is referring to) and then picks the version of makeMyDay for that object.

        Virtual functions have a very limited application:
        Net

        1. virtual functions are member functions.
        2. the word "virtual" is needed in the header in the base class.
        3. the object must be referred to by a pointer (or reference) to the base class.
        4. the object must actually be an object of the base or a derived class.
        5. all functions that directly or indirectly override the virtual function are also virtual.

        (End of Net)

        The effect is
        Net

        1. the compiler does not resolve virtual function calls.
        2. The run time code uses the actual object's type to resolve the function call.
        3. extra data to determine the type of an object has to be added to the object.
        4. each class in the family has an array of pointers to its virtual functions.
        5. Finding the virtual function involves a quick array lookup.
        6. Old compilers used an inefficient lookup table and suffered from "code bloat" as a result.

        (End of Net)
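
        The difference between a virtual and a non-virtual member function can be sketched with the classes B and C used above. The function plain is invented for contrast, and the override keyword assumes C++11 or later.

```cpp
#include <string>

class B {
public:
    virtual ~B() {}
    virtual std::string makeMyDay() const { return "B"; }  // virtual
    std::string plain() const { return "B"; }              // not virtual
};

class C : public B {
public:
    std::string makeMyDay() const override { return "C"; } // also virtual
    std::string plain() const { return "C"; }              // hides B::plain
};
```

        Given a pointer declared as B* that actually points at a C object, a call of makeMyDay runs the version in C, while a call of plain runs the version in B because the compiler resolves it from the declared type.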

        Classwide Functions

        A classwide or static member function is attached to the class of object in which it is a member rather than to the objects in the class. For example: each student in a class has an individual height, but the class as a whole has a separate "classwide" average height. Similarly, the number of objects created in a program of a particular class is a classwide property of the class rather than of the objects.
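
        The object-counting example can be sketched as follows. Tracked is an invented class name.

```cpp
class Tracked {
public:
    Tracked() { ++created; }
    static int how_many() { return created; }  // classwide: no object needed
private:
    static int created;                        // one copy for the whole class
};

int Tracked::created = 0;   // the single shared counter is defined once
```

        The function is called through the class name, as in Tracked::how_many(), rather than through a particular object.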


      Syntax Rules

    1. function_call::= function_name actual_arguments. -- a function call can occur in an expression or as a freestanding statement.

    2. function_header::= returned_type function_name formal_arguments.
    3. function_specification::=returned_type function_name specification_formal_arguments.

    4. function_prototype::= function_header semicolon | function_specification semicolon.

    5. function_definition::= function_header function_body.

    6. function_body::= left_brace commands right_brace.

    7. returned_type::= "void" | some data type or class name. --indicates if the function returns a value. The void indicates that the function must not return any value, and must be called as a statement rather than inside an expression.

    8. function_name::= any valid identifier | operator_function_name.

    9. operator_function_name::= "operator" operator_symbol.

    10. specification_formal_arguments::= left_parenthesis list_of_data_types right_parenthesis.
    11. formal_arguments::= left_parenthesis list_of_formal_arguments right_parenthesis.
    12. actual_arguments::=left_parenthesis list_of_actual_arguments right_parenthesis.

    13. list_of_formal_arguments::= empty | formal_argument Optional(comma list_of_formal_arguments).
    14. list_of_actual_arguments::= empty | actual_argument Optional(comma list_of_actual_arguments).
    15. list_of_data_types::= empty | data_type Optional(comma list_of_data_types).

    16. formal_argument::= type name Optional( default_value ).
    17. actual_argument::= expression.
    18. default_value::= equals expression.

    19. return_statement::= "return" Optional(expression). -- the expression must be present if the function returns a value and must be missing if it has a void returned_type.

    20. Optional(X)::= empty | X.

      Jargon

    21. int::data_type=`used to store integers -- whole numbers`.
    22. float::data_type=`used for real numbers when precision is not important`.
    23. double::data_type=`used for real numbers when precision is important`.
    24. char::data_type=`used to store a single 7, 8 or 9 bit character`.

    25. data_type::=`a collection of objects that are stored in similar formats and have the same operations and operators.`

      Symbols

    26. equals::="=".
    27. semicolon::=";".

    28. left_parenthesis::= "(".
    29. right_parenthesis::= ")".
    30. left_brace::="{".
    31. right_brace::="}".


    Appendix

      Generalization

      Once a class has been defined another class can be derived from it.
       		class Derived: public Base { newstuff };
      This means that all the properties and functions of the original (base) class are available to objects in the derived class. This is designed so that the base class doesn't have to be recompiled when a new class is derived from it. The process that makes this happen is called inheritance.

      Because a complete base object is hidden at the start of each derived object the address of the derived object is exactly the same as the address of the base object. The compiler will therefore let you attach a pointer to a base class to any object of any derived type. So a pointer can be declared like this:

       		Base * basepointer= new Derived(data);
      and the compiler will accept the result.

      This is handy when you need an array of objects that share some properties but have different detailed behavior. Instead of storing the object, we store the pointers in the array. To make sure that this executes the Derived function:

       		basepointer->function(data)
      the function must be a [ Virtual Function ] otherwise the Base function will be chosen by the compiler. Virtual functions are selected as the program runs.
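
      A sketch of such a collection of pointers, using a vector in place of an array and an invented family of classes (Shape, Circle, Square), might look like this. The override keyword assumes C++11 or later.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical family of classes used only for this sketch.
class Shape {
public:
    virtual ~Shape() {}
    virtual std::string name() const { return "shape"; }
};

class Circle : public Shape {
public:
    std::string name() const override { return "circle"; }
};

class Square : public Shape {
public:
    std::string name() const override { return "square"; }
};

// The vector stores base-class pointers; each call picks the function
// that belongs to the object actually pointed at.
std::string names(const std::vector<const Shape*>& shapes)
{
    std::string result;
    for (std::size_t i = 0; i < shapes.size(); ++i) {
        if (i > 0) {
            result += " ";
        }
        result += shapes[i]->name();
    }
    return result;
}
```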

