[Skip Navigation] [CSUSB] / [CNS] / [CSE] / [R J Botting] / [CSci202] / pointers
[Text Version] [Syllabus] [Schedule] [Glossary] [Resources] [Grading] [Contact] [Question] [Search ]
Notes: [01] [02] [03] [04] [05] [06] [07] [08] [09] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20]
Labs: [01] [02] [03] [04] [05] [06] [07] [08] [09] [10]
Tue May 31 15:33:02 PDT 2011

Contents


    Notes on Pointers

      These notes summarize just about everything you need to know about pointers in CS202.

      See Also

      [ Pointers ] (Wikipedia) [ http://cslibrary.stanford.edu/104/ ] and [ 03binky.cpp ] (Pointer fun with Binky) [ lovelypointers.asp ] (Only for experts who wantto know about MicroSoft C++ stuff).

      Pointer as dog trained to point at the bird etc you are hunting

      [ haysi-fantazee.jpg ]

      Computer pointers also tell your program where to find things... but they point out a place in memory not a bird in the brush.

      A Problem that needs Pointers

      Construct a family tree storing the people, their marriages, their children, their brothers and sisters, etc.

      We need a way to link together related people.... to be able to find a person's mother, father, son, sister, ...

       dick---father--> frank ---father--->john
                         ||                  |married
                        frank ---mother--->chrystabel
                         |married
       dick---mother--> meg ---mother--->sarah---father--->caleb
                         ||                |married
                        meg ---father--->richard
                                          |brother-sister
                                         kath
      Notice that I used names ("dick", "meg", and "frank") in the diagram above. In computer memory there are no names! Instead we have addresses. We place the data in memory and must remember where we put it. So we must put addresses into memory as well.

      The addresses are special numbers. We call them pointers. We connect different objects by storing one object's address (a number) as part of the other object. We, therefore, start by reviewing the ideas of address and contents in computer storage..

      Introduction

      Pointers are hard to understand because they are part of the machine's way of doing business. They are not designed to be easy for humans. They are part of the way the CPU and Memory works. You need, therefore, to learn to imagine the memory of the computer, and how it works.

      You should think of computer memory or RAM a gigantic array of bytes:

      1. byte
      2. byte
      3. byte
      4. ...

      The bytes are numbered. The number of a byte is called its address. Each address is a number that identifies a particular piece of memory.

      You have already learned that a variable is an identifier that is associated (tied, bound, linked) to a piece of storage of the right size to hold the variable's values.

      In a running program the memory is divided into three pieces:

      When you declare a variable like this:

       		int remainder;
      the program interprets this by (1)reserving a piece of the stack and (2) using the address of of this piece of storage in place of the variable remainder. At the end of the block in which remainder was declare the storage is "popped" off the Stack ready for recycling for other temporary storage.

      This is a local variable.

      The Heap is random access storage and available for data that is not on the stack. It is not created and deleted automatically. You must explicitly reserve a piece of it. You must explicitly access it. You must make sure your program remembers where it is. You must explicitly recycle it. The heap is often highly disorganized.

      The parts of the heap are not bound to a fixed variable. The storage has no name. It has an address instead. We store the address in a special variables so that we can find the data again.

      Pointers

      In C and C++ there is a special kind of variable called a pointer. A pointer variable is a variable that has some storage of its own.... often on the stack. But a pointer also stores another address. This address is where the "real" data can be found. So a pointer has two addresses - one where it is and one where it points. It forms a link between two addresses.

      You need to think of pointers as an indirect way of getting to data. Like a phone number points to a person. Like an address identifies a house, and so people inside the house. Pointers are like a finger pointing at something and saying "That thing, there ---->".

      Do not confuse the finger with the thing it is pointing at!

      For example the amount of storage needed for an address is the same whatever data is stored in that address. Think of two twins at the zoo: one points at an elephant, and the other other at an ant. The twins are the same size, but the ant and the elephant are different sizes!

      C++ Pointers

      If you declare a pointer variable called p then it points at a thing written
       		*p

      So in C/C++ we declare a pointer to an object of type T by writing:

       		T *p;
      It says: If you put an asterisk in front of p you will get a T.

      After the above statement you've got a dangling pointer because it hasn't been assigned to a reliable address yet. You have three choices for safely getting an address:


      1. Assign the address of a variable t of type T:
         		p = & t;
      2. Copy an existing and valid/alive/nondangling pointer:
         		p = another_pointer;
      3. Grab some new storage off the heap of unused memory:
         		p = new T;

      Think about this program:
       		#include <iostream.h>
       		main()
       		{
       			int *p;
       			int i=12;
       			p= & i;
       			*p = 17;
       			cout << "p is " << p << " and i is " << i;
       		}
      If you compile it and run it then it prints out two values. The address in p is printed in Hexadecimal format ( "0x...") and i is printed and should equal 17, because changing *p is changing i.

      In C++ you get a new piece of heap storage of type T by:

       		new T
      This is an expression whose value is an address! However, this storage has not been given a value yet. If T is an elementary data type then you will have to initialise the new storage:
       		int* pi = new int;  *pi=42;
      Note: it is an error to write pi=42; which makes pi point at byte 42 in RAM!

      If you repeatedly grab new storage then your program will run out of memory. We call this a memory leak. When you assign an existing address (using & for example) then you don't have to worry. It is the calls to new that give storage that needs tidying up. Learn to assoicate every new with a delete:


      1. Grab some new storage off the heap of unused memory:
         		p = new T;
          ...
      2. Return the storage to the heap of unused memory:
          ...
         		delete p;

      Many years ago, in a country far away from here, a friend wrote the software used to reconcile checks every night.... the software ran on minicomputers all England. It worked perfectly for three weeks and then started to fail mysteriously.... all over England. The machines started to report they had run out of memory. It was a small memory leak. My friend was very embarassed by this.

      It is wise to return unused storage to the heap when you have finished with it. Storage you can't access any more is called garbage. Losing track of such storage causes a memory leak. Collecting and returning useless memory is called [ Garbage Collection ] , and there is more on this below.

      Think about this program:

       		#include <iostream.h>
       		main()
       		{
       			int *p;
       			p=new int;
       			*p = 17;
       			cout << "p is " << p << " and *p is " << *p;
       			delete p;
       		}
      If you compile it and run it then it prints out two values. The address in p is printed in Hexadecimal format ( "0x...") and the contents (*p) is printed in decimal and should equal 17.

      Pointers and Constructors

      If we have a class C then new C will use C's default constructor to initialize the contents of the new storage. The expression new C(arguments) calls a special constructor to initialize the object and returns the address of the object.

      Notice that a constructor can also grab heap storage. If class C has a constructor C::C(...){.....new....} then it grabs storage off the heap. You should also write a destructor with header ~C() to delete the storage grabbed inside C(...).

      C++ Notation

      if *p is an object (or struct) then part of the thing it points at is written:
      		p->name_of_part

      If p points at an object with a member function then

       		p->name_of_function(arguments)
      calls the function and applies it to *p.

      Objects vs Pointer at objects


      Table
      declare an object in class:Class_Name object_variable;
      address of object:& object_variable
      declare pointer:Class_Name *pointer_variable;
      make default dynamic object:pointer_variable = new Class_Name;
      make object on the heap:pointer_variable = new Class_Name(arguments);
      address of object:pointer_variable
      The object pointed at:*pointer_variable
      part of objectobject_variable.name_of_part
      part of objectpointer_variable->name_of_part
      Do something to objectobject_variable.name_of_function(arguments)
      Do something to objectpointer_variable->name_of_function(arguments)
      Store value in objectobject_variable= new_value;
      Store value in object*pointer_variable = new_value;
      Move pointerpointer_variable = another_pointer_or_address

      (Close Table)
      Notice an object_variable is stuck to its object, but a pointer_variable can point at different objects at different times.

      Pointers get tricky because two or more pointers can point at the same thing! In the following program the change to *q also changes *p. But changing *q does not change q. The pointer is not the object that it points at:

       		#include <iostream.h>
       		main()
       		{
       			int *p;
       			p=new int;
       			*p = 42;
       			cout << "p is " << p << " and *p is " << *p;
       			int *q;
       			q = p;
       			cout << "q is " << q << " and *q is " << *q;
       			*q = 17;
       			cout << "q is " << q << " and *q is " << *q;
       			cout << "p is " << p << " and *p is " << *p;
       			delete p;
       		}
      Here is a picture of the storage just as the output is being produced:

      Example p[x--]-->x[42]<--q[--x]

      Notice: I've used x as a symbol for the unknown address created by

       			p=new int;
      Doing this is a good way to trace programs that contain pointers.

      Notice I only have to delete one piece of new storage. I chose to do this via delete p;. Since p and q point at the same int it doesn't matter which I use.... but I mustn't do both. After deleting p, both p and q can not be derefenced: *p and *q can crash the program.

      Addresses of Variables


      Net
      1. Declare a variable of type T like this:
         				T v;
      2. Then v is a variable and has type T.

      3. The address of variable v is written with the ampersand character (shifted 7)
         				&v
      4. Then &v has type T* and can point at v. It can not be made to point any where else either!

         				T * p = &v;
      5. p is a variable of type T* and can point at objects of type T. It is initially pointing at v -- because it is equal to &v. So *p has the same value (initially) as v has.

      6. Always, *(&v) is the same as v. Asterisks(*) cancel out ampersands(&).

      (End of Net)

      The following may help:


        Memory is like a gigantic array of byte-sized boxes. Each box is numbered. Imagine a large number of labeled boxes... all the same size, with labels stuck on them... The labels are numbers 0,1,2,3,...

        The numbers are called the addresses. The boxes are memory locations. Inside each box is its contents. If you don't put something in a box you will find some junk left behind by the previous user...

        The compiler handles a normal declaration of a variable by writing the name of the variable on an unused boxes label, underneath the number. When the variable goes out of scope the name is erased and the box can be reused for something else.

        A Pointer is a box that has a number in it. This number must be the address(on a label) of another box. When the compiler sees the declaration of a pointer variable it again finds an unused box and writes the name of the pointer on it. When the program exits the block the variable's name is erased again.

        A program can put an address into a box used for a pointer. Then the number in the pointer's box is the number on another box. It is said to point at the other box.

        The '*' operator in an expression returns follows the pointer from its box to the address of stored in the pointers box. If p is the name on box number 12 for example and box 12 has the number 123 inside it (12 outside, 123 inside) then *p is whatever is in box 123 and p is 123.


      It helps to draw many pictures!
        A pointer is actually the index of a very large array of data. It points to one piece of RAM: Random Access Memory, or Primary memory. Suppose we declare p by
         	Type * p;
        then
      1. p is an index to a gigantic array RAM[0..???] of Type items and
      2. *p is short for RAM[p].

        So until you make p point at something old or something new:

         		p = & old_thing;
        or
         		p = new Type;
        you dare not use *p or RAM[p].

      NULL

      In one thing a pointer is not an index in RAM. There is a special value that a pointer can have that is not an address of an item in RAM. It is symbolized by
       		NULL
      in C and C++. It is called the the Null-pointer. It is used to indicate that there is NO data linked in at that point (think of a loose phone chord).

      If a pointer p is set to NULL then it is saying that p points at no data.

      It is said that madness destroys those who follow the NULL pointer. In other words if p==NULL then *p will crash the program.

      Pointers and Vectors

      It is quite common to have a vector full of pointers. for example if:
       		vector<int*> pv;
      then you can add new int locations to the vector:
       		pv.push_back(new int);
       		pv.push_back(new int);
      and do things with them:
       		*(pv[0])=42;
       		*(pv[1])=32;

      It is also possible to have a variable that points at a vector!

       		vector<int> *vp;
      The above declares a pointer variable, but it is a dangling pointer until we assign it to the address of an existing vector:
       		vp = & v;
      or a new and empty vector of ints:
       		vp = new vector<int>;
      Here you can add ints to the vector that vp points at:
       		(*vp).push_back(42);
       		(*vp).push_back(32);
      or in shorthand
       		vp->push_back(42);
       		vp->push_back(32);
      There are two examples of these two techniques in [ pv.cpp ] and [ vp.cpp ] can you figure out which uses a pointer to a vector, and which uses a vector of pointers?

      Pointers and Arrays

      First, you can have arrays that have pointers in them. Here is a common example:
       		char * argv[]
      Each argv[i] is the address of a character. Or in short hand each argv[i] is a char*.

      Second, sneakily, each C++ array name is a constant pointer! In C/C++ all the items in an array are the same size and they are placed in memory like this:
      Table
      AddressContents
      name+0name[0]
      name+1name[1]
      name+2name[2]
      ...
      name+length-1name[length-1]

      (Close Table)
      Indeed for all arrays

    1. *(name+subscript) == a[subscript]

      In other words we can add integers to pointers to get another pointer. This is called pointer arithmetic and is rather powerful.

      If you use the name of an array without any index then you get the address of the data, not the data itself. This means you can attach a pointer to the first item in the array like this:

       		Item * pointer = array_name;

      Now, if p points at an item in an array then p+1 points at the next item in the array (if it exists). C/C++ is designed so that we can use pointers to rapidly scan the items in an array:

       		for(Item *p = array_name; p is OK; p++) do something with *p;

      Notice that, if your pointer goes outside the array then lizards can come out of your nose!

      Data Structures with Objects

      With simple data like int's and doubles you can store them in a vector or array quite safely. If you have more complex data -- objects in classes or structs -- then it may be best to use arrays and vectors of pointers to the objects rather than the objects themselves. The explanation will have to wait until we cover polymorphism!
       		vector <*Classname> vpc;
       		(*Classname)[Size] apc;

      Common and Subtle errors with Pointers

      There are two big problems with pointers
      1. (following a bad pointer): Following a dangling or NULL pointer.
      2. (memory leak): Creating undeleted garbage.
      3. (running out of heap): THis is rare with good programs and is usually the reult of a meory leak in bad programs.

      The first error is usually signalled by the program going illegal -- accessing a restricted piece of memory, or the operating system crashing.

      The second is subtle and tends to cause problems after people have started to use the system. A memory leak occurs when you accidently create garbage and don't collect it... it then ceases be useful to the computer and ultimately you can run out of memory.

      For example -- my friend Tony worked on the computers that the Banks in England used to clear checks. They formed a nationwide network. He had a memory leak if one word (4 bytes) each time a chack was cleared. The network ran perfectl;y for three weeks and then machines started to report that they had no more memory -- embarrassment and panic ensured...

      Here is a way to work out if an operation on a pointer is safe, risky, or unsafe. One reason C++ and C programs are unreliable is the shear complexity of these rules.

    2. Pointer_states::=following
      Net
        A pointer is in one of four states: dead, alive, null, or out_of_scope.

        It is only when a pointer is alive that you can use it safely.

        The classic use of a pointer is to start dead, become alive, be used for various purposes, and then to be deleted and become dead again before the program removes it when it goes out of scope.


        (dead): Pointers are dead when declared and become alive when you assign an address, or a new location or an alive pointer. It is safe (indeed wise) to assign a NULL to a dead pointer -- it becomes a null pointer. It is safe to leave the scope (at a "}" in the program) if the pointer is dead. It is unsafe to delete or follow(dereference) a dead pointer.


        (alive): Alive pointers are pointing at a real safe location. You can safely follow them (derefence them). You can delete them -- and they become dead pointers -- however there is a risk that other pointers will suddenly become dead if they refer to the same storage! Assigning a new value to a pointer is risky because it can create garbage -- by losing track of the old address which then becomes a memory-leak. If you assign NULL they become NULL pointers. Assigning a dead pointer is unsafe and makes the pointer dead. If you exit the scope of a pointer that is alive then you risk losing track of it's memory and creating garbage and a possible memory leak.


        (null): Null pointers can not be followed safely -- they go nowhere. They can not be deleted safely -- there is nothing to delete. But you can assign an alive pointer (or a new piece of storage) safely and they become alive. If you assign a dead pointer they also become dead, but this is rather a stupid thing to do! It is safe to let a null pointer go out of scope.


      (End of Net)

      Note: In some of my sample programs in this page I have some memory leaks that are safe -- you can safely exit a program with live pointers because the operating system clears the memory up, garbage and all.

      Hints

      I have some rules about pointers:
    3. dicks_rules::=following,
      Net
      1. K.I.S.S.: Keep pointers as simple as you possibly can.
      2. Encapsulate pointers in thoroughly tested, reusable classes that implement well know data structures.
      3. Don't reinvent the Wheel. In C++ use [ The Standard Library ]
      4. Draw lots of diagrams of the RAM.
      5. Walk through the program with a table of addresses and data, step by step.
      6. Document your data structures using the UML.
      7. Take a Data Structures Class (CS330).
      8. Take out the trash: Every new needs a delete.
      9. Never follow a dead or null pointer.

      (End of Net)

      Pointers in the UML

      A pointer should be shown in a UML daigram as an association with an arrow-head. For example

      [Example class with a pointer]

      Older books put and open diamond at the other end ( <>-----> ). This is is called an aggregation. It looks good but is not necessary.

      The commonest special links in a UML class diagram are

      [ USES, ISA, HAS, KNOWS ]

      Other associations are a simple line connecting two classes and should be given a descriptive name.

      The Standard Library

      Standard C++ comes with a large library of powerful templates designed to make programming a faster and less error prone process, without slowing down the speed of the program. The library is called STL after its old name "Standard Template Library." It is used by including files with an names like this: list, stack, vector,... [ stl.html ] You have already met vector and string from this library.

      Dynamically Allocated Arrays

        A Problem that needs Dynamic Arrays

        Input and store a line of users input. You have a virtual memory system with therefore a GByte of storage.... but you don't want to declare an array with 1000MBytes of storage when the user input 5 characters!

        The creation of vectors as part of the Standard Library for C++ hides the need for storage that appears to grow and shrink to fit the current size of the problem.

        Arrays on the Heap

        The storage that a pointer points at is usually explicitly created and destroyed as the program runs. The size of this can be determined after the program starts running. Such dynamic data has an address that is determined after the program starts running... and must be stored in a special variable. This called a pointer variable.

        A pointer variable stores an address and this address contains the data. For example a dynamic array is set up like this:


        1. declaration: Class_Name *pointer_variable;
        2. Get storage pointer_variable = new Class_Name[Size]; (Size can be any expression)

        It is operated on like this:
        1. Put data in pointer_variable[place] = value;

        2. Operate on pointer_variable[place] . function_name(arguments);

        3. first Item pointer_variable[0]
        4. next Item pointer_variable[1]
        5. ...
        6. last Item pointer_variable[Size-1]

        You should delete a dynamically allocated array like this
        1. Free storage delete [] pointer_variable;

      . . . . . . . . . ( end of section Dynamically Allocated Arrays) <<Contents | End>>

      Garbage Collection

      Nobody wants to take out the trash! In programming garbage is the storage that you have asked for (in C++ using new) and yet no longer can use because no pointer points at it any more. C++ doesn't tidy this up for you so a porogram can run out of RAM purely because it hasn't taken tidied up as it goes. There is a command that removes the storage that a pointer points at:
       		delete p;
      Afterwards p may or may not have changed.... but the storage p refered to is now available for other parts of the program to use. It is wise to also clear the pointer p at the same time:
       		delete p; p=NULL;

      Notice that if data is allocated with new[...] then it must be free'd with delete [] ....;. Do not forget the []!

      Some of the subtlest bugs occur because of bad garbage collection. If you forget to delete storage when it is no longer needed, you have a memory leak. In time your program will break. You can buy tools that make sure that you don't have any memory leaks (Purify).

      • Incorrect rule: Each pointer must be deleted.
      • Correct rule: Each new that is executed must be matched by a `delete'.

      Allocated storage, that can not be used again is called garbage. Only garbage can be safely deleted. If there is any pointer that is directly or indirectly attached to some allocated storage than it might get used.... so it is not garbage. Finding the garbage is called garbage collection. If any pointer points at a piece of storage, you must not delete the storage. When several links have been made to a node it can not be deleted until all the links are broken. Safe garbage collection in C/C++ is hard work... but is automatic in most other languages: LISP, Java, SmallTalk, Ada,.... In the old days Garbage collection was triggered off when memory ran short. These days we have "On-the-fly garbage collection".

      Classic War Story


        In the early days of Artificial Intelligence the Department of Defense (DoD) sponsored an experimental system that would answer any question put to it about the armed forces of the USA. The input would be in English and a large collection of data would be searched for relevant information. If the answer was found it would be printed. Other wise the information would be assembled into a complex data structure as the software searched for an answer to the question. They used LISP 1.5 to do the programming.

        When the day came for testing a 5 star general marched up to the console and typed in the question:

         	What is the function of the General Staff of the United states army.

        Sadly the question was not on file. The machine started to try and work out the answer. after a while its data structures had filled the memory.

        The General was not pleased when the computer answered his question by printing:

         			Garbage Collection.

      A Design Pattern for Simple Dynamic Data


      Net
      1. Hide pointers inside a class. Write setters and getters for the data, not the pointers.
      2. Allocate storage in constructors for that class.
      3. Delete storage in the destructor for that class.

      (End of Net)

    . . . . . . . . . ( end of section Notes on Pointers) <<Contents | End>>

    See Also

    Notes on linked data structures:
     http://cse.csusb.edu/dick/cs202/linked.html
    [ linked.html ]

    If you think you understand pointers, please see [ wiki?ThreeStarProgrammers ] which discusses the pros and cons of using:

     		char***ppc;
    Have fun...

    Syntax and Semantics of pointers in C++


    Net
    1. declaration::statement= type asterisk variable.
       		int * pi;
       		char * pc;
       		Person * pperson;
    2. variable::=identifier.
    3. type::=name of a struct, class, enum, typedef, or built-in data types
    4. asterisk::="*".

    5. get_value_pointed_at_expression::= asterisk expression, dereference the pointer. This means finding the contents of the address in p.
       		* pi;
       		* pc;
       		* pperson;

      Table
      AddressContents
      p*p

      (Close Table)
      Take note of how an asterisk works. It converts something of type T* into something of type T. If p is a T* then *p is a T! If t is a variable of type T then &t is a constant of type T*. So:
      • *&t == t

    6. new_element_expression::expression= new type. Gets a piece of memory of the right size and returns the address.

    7. new_array_expression::expression= new type[ expression ].
    8. For all pointers p and integers i, p[i]::=*(p+i).

    9. deletion_statement::statement= delete variable.

    10. indirect_reference_to_field::expression= variable -> field.

    11. For all pointers to structures p and fields f, p->f::= (*p).f.


    (End of Net)

    Glossary

  1. constructor::=a member function with the same name as the class that is called automatically when a variable is declared of that class.

  2. garbage::=storage allocated but no longer accessable after a pointer has been moved.
  3. destructor::=a member function whose name starts with a tilde and ends with the name of the class that is called automatically when the object goes out of scope.

    Glossary

  4. accessor::=`A Function that accesses information in an object with out changing the object in any visible way". In C++ this is called a "const function". In the UML it is called a query.
  5. Algorithm::=A precise description of a series of steps to attain a goal, [ Algorithm ] (Wikipedia).
  6. class::="A description of a set of similar objects that have similar data plus the functions needed to manipulate the data".
  7. constructor::="A Function in a class that creates new objects in the class".
  8. Data_Structure::=A small data base.
  9. destructor::=`A Function that is called when an object is destroyed".
  10. Function::programming=A selfcontained and named piece of program that knows how to do something.
  11. Gnu::="Gnu's Not Unix", a long running open source project that supplies a very popular and free C++ compiler.
  12. mutator::="A Function that changes an object".
  13. object::="A little bit of knowledge -- some data and some know how". An object is instance of a class.
  14. objects::=plural of object.
  15. OOP::="Object-Oriented Programming", Current paradigm for programming.
  16. Semantics::=Rules determining the meaning of correct statements in a language.
  17. SP::="Structured Programming", a previous paradigm for programming.
  18. STL::="The standard C++ library of classes and functions" -- also called the "Standard Template Library" because many of the classes and functions will work with any kind of data.
  19. Syntax::=The rules determining the correctness and structure of statements in a language, grammar.
  20. Q::software="A program I wrote to make software easier to develop",
  21. TBA::="To Be Announced", something I should do.
  22. TBD::="To Be Done", something you have to do.
  23. UML::="Unified Modeling Language".
  24. void::C++Keyword="Indicates a function that has no return".

End