Tue May 8 07:08:47 PDT 2007



    CSci202 Computer Science II, Session 11 Files and Strings


      (previous): Review of formatted I/O and the arguments to main: [ 10.html ]

      Preparation

      Focus on 11.8 and 11.9. Here is an example based on Horstmann's book [ tss2004.cpp ]

      Study this page and 11.6 (binary files), 11.8 (stringstreams), and 11.9 (strstreams), and write down on a piece of paper the questions, doubts, and surprises that you have.

      Assigned Work Due

      Hand in a question, doubt or surprise and the page it happens on. With your name. For credit.

      Input

        Looking inside files on UNIX -- file, wc, cat, od

        When working with binary data you need tools to look at it when things go wrong with your code and the file is in a mess.

        Download [ example ] and save as source, and then try these commands:

        The file command will make a good guess at the kind of file you have:

         	file example

        The wc command will count the lines, words, and characters in a file:

         	wc -l example
         	wc -w example
         	wc -c example

        The cat command outputs text files to the terminal (among other things).

         	cat example
        For long files, use more to see the file one screen at a time: tap space for the next page and 'q' to quit.

        The od command will tell you precisely what is in a file -- character by character or byte by byte. It makes unprintable and "white space" characters explicit.

         	od -c example
        (Characters)
         	od -x example
        (hexadecimal bytes)
         	od -cx example
        (both).

        My Experience of Binary files

        Use binary files only when you need something special: security, efficiency, or to interface with existing data. It is nearly always easier to develop applications using normal line-oriented formatted files. Why? Because you can edit text files with normal editors! When you have unformatted data you have to write code to create the data, to delete the data, to view the data, and also to edit it. This was confirmed when I developed the code for lab06!

        The book uses direct access into a normal line-oriented text file. I personally avoid designing programs and software that do this. I prefer handling irregular lines one line after another... It is much simpler. But here is an example -- a program to read any 10 characters from a file: [ 11peek.cpp ] (Download and try it.... there may be bugs).

        Unformatted data is efficient because our program can go directly to any random place in the file: direct access or, in IBM-speak, "DASD" (dazz-dee). We have seek to move to a character in a random access file and tell to find out the current position in the file. Call them like this:

         		fstreamvar.tellp();
        Returns the position where characters will be put.
         		fstreamvar.seekp(position);
        Go to position to put characters out.

         		fstreamvar.tellg();
        Position for getting characters in.
         		fstreamvar.seekg(position);
        Go to position to get characters in. The book shows the other versions: all are useful.

        Demonstration of seek and tell with formatted data [ 11seek.cpp ]

        But to be able to use direct access your program must be able to calculate where the data is.

        Start by designing the layout of the data in the unformatted file. Draw a simple picture. Tabulate the data and calculate where it is in a simple case. Work out formulas that calculate where the general data is and hence code for reading and writing it. Document any special constraints. For example the file in lab06 must always start with zero or more real records followed by zero or more blank records. Another example would be to require that the file is always sorted in a particular way. A third trick is to place data near a position calculated by a hash function, that way it is quick to find it again.
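
        For the simplest layout, fixed-size records packed one after another from the start of the file, the formula is just index times record size. The sketch below assumes a made-up Record type; the real lab06 record will differ, and a layout with a header would need the header size added to the offset.

```cpp
#include <cstddef>
#include <ios>

// Hypothetical fixed-size record; the real lab06 layout may differ.
struct Record {
    char name[16];
    char password[16];
};

// Record number `index` (counting from 0) starts this many bytes into the file.
std::streamoff record_offset(std::size_t index) {
    return static_cast<std::streamoff>(index * sizeof(Record));
}

// How many whole records fit in a file of `file_size` bytes.
std::size_t record_count(std::streamoff file_size) {
    return static_cast<std::size_t>(file_size) / sizeof(Record);
}
```

        A program would then call something like fstreamvar.seekg(record_offset(n)) before reading record n.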

        It is also wise not to mix formatted input/output with unformatted input/output. As a rule, use 'read' and 'write' to handle the data. Stick to the following:

         		fstreamvar.read(address_of_object, sizeof(object));
         		fstreamvar.write(address_of_object, sizeof(object));
        These copy bytes into and out of primary memory. Read copies them from disk into RAM and write does the reverse. No transformation occurs: the bytes are copied as is. Both functions expect the address as a pointer to char, so the address of any other kind of object must be cast. [ passwd.dat ] (file) [ 11read.cpp ] (read file) [ 11write.cpp ] (change file).

        Another rule that preserves sanity: never, ever switch from reading to writing without first executing a seekp. Similarly, when you have written some data and want to read from a direct access file, always take care to do a seekg before reading. However, you can safely execute a series of reads (or a series of writes) and get each item from the file in turn.

        A direct access file is a repository for pieces of RAM. We can store them, keep them, and then read them back in again. But when read they are not necessarily anywhere near their old address.

        Never store pointers or pointer based data in an unformatted file. This means that you can not store any of the Standard Library classes (vectors, strings, deques, etc.) in objects that are written and read into unformatted files.
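
        One common way around this, sketched below under the assumption of a C++11 compiler, is to keep a fixed-size char array in the record and convert to and from std::string at the edges. Entry, make_entry, and entry_name are hypothetical names for this illustration.

```cpp
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical record that is safe to write(): every byte of the data
// lives inside the struct itself.
struct Entry {
    char name[24];   // fixed-size buffer instead of std::string
    int  id;
};
static_assert(std::is_trivially_copyable<Entry>::value,
              "Entry can be copied byte by byte");

// Copy a std::string into the fixed buffer, truncating if it is too long.
Entry make_entry(const std::string& name, int id) {
    Entry e = {};                                  // zero-fill, so name stays terminated
    std::strncpy(e.name, name.c_str(), sizeof e.name - 1);
    e.id = id;
    return e;
}

// Rebuild a std::string after the record has been read back in.
std::string entry_name(const Entry& e) {
    return std::string(e.name);
}
```

        The same static_assert applied to a struct holding a std::string would fail to compile, which is a cheap way to catch the mistake early.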

        It is better to use a database to handle persistent data than to invent your own data format for a direct access file!

        More Experience with Random Access Files

        The fifth lab for this course worked nicely the last time I taught this course. But in 2007 the programs started to misbehave -- badly. First, the compiler started whining about adding a "sizeof" to a "streampos". Second, the listing program started to output garbage and terminated with a "Segmentation fault". I checked the old compiled code and it worked. But the moment I recompiled the program it went badly wrong -- accessing the 123,456th character in the file (or some such silly number).

        Reason: the updates to the GNU compiler changed the internal coding of objects like the user password in the lab. I guess that it added typeid information at the start of the object. So what my code thought was a length was something else....

        The solution was to reconstruct the "passwd" file from scratch using newly compiled programs throughout.

        Now, I'm hoping that they do not upgrade the compiler before the next lab.

        So, add another reason for avoiding direct access binary files. The format of the data is compiler dependent.

        stringstream and strstream -- real cool stuff

        [ tss2004.cpp ]

        Handy for converting an argument to the main program into a numeric value. Also useful for interpreting a buffer of direct access data, or placing numeric data into character format in a direct access file.
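
        A minimal sketch of both directions, separate from tss2004.cpp: parse_int turns text such as argv[1] into an int, and to_text formats a number as characters. Both function names are invented for this example.

```cpp
#include <sstream>
#include <string>

// Convert text such as argv[1] to an int; reject "abc" and "12x".
bool parse_int(const std::string& text, int& value) {
    std::istringstream in(text);
    int parsed = 0;
    char extra;
    if (!(in >> parsed)) return false;   // no number at the front
    if (in >> extra) return false;       // something left over after it
    value = parsed;
    return true;
}

// Go the other way: format a number as characters.
std::string to_text(int value) {
    std::ostringstream out;
    out << value;
    return out.str();
}
```

        The value argument is only changed when the whole text is a number, so a caller can keep a default in it.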

      Questions

        How do I get access to a .txt file in C++?

        [ ptxt.cpp ]

        How can an open file close itself by a destructor?

        When any object goes out of scope... at the end of the block in which it is declared.... its destructor is called. In the case of an fstream, the destructor then calls the function close().
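
        A small sketch of this, with a made-up function name and no call to close() anywhere. The inner block exists only to end the life of the ofstream before the file is opened again for reading.

```cpp
#include <fstream>
#include <string>

// Write one line to `path` and read it back. No close() is called anywhere.
std::string write_then_read(const std::string& path, const std::string& line) {
    {
        std::ofstream out(path.c_str());
        out << line << '\n';
    }   // `out` goes out of scope here: its destructor flushes and closes the file

    std::ifstream in(path.c_str());
    std::string result;
    std::getline(in, result);
    return result;
}       // `in` is closed here in the same way
```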

        What is the difference between ixxx and oxxx?

        The i indicates the stream is for input and not for output. The o indicates the stream is for output only.

        If you have a program that needs to both read and write data to a file use an fstream.

        What is the difference between fstream, strstream, and stringstream?

        The first is attached to a file, the second to a char*, and the last to a C++ string variable. Note -- strstreams are older than stringstreams and may be removed in future versions of C++.

        What happens if you clear() and no flags are set?

        Nothing.......... I think.
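
        This can be checked with a short experiment. The two hypothetical functions below use an istringstream: the first calls clear() on a stream with no flags set and compares the state before and after, the second shows the usual case where clear() resets a failed stream.

```cpp
#include <sstream>

// Returns true when clear() on an already-good stream changes nothing.
bool clear_changes_nothing_when_good() {
    std::istringstream in("42");
    const std::ios::iostate before = in.rdstate();   // goodbit: no flags set
    in.clear();
    return before == std::ios::goodbit && in.rdstate() == before;
}

// Returns true when clear() brings a failed stream back to the good state.
bool clear_resets_after_failure() {
    std::istringstream in("not a number");
    int n = 0;
    in >> n;                         // fails and sets failbit
    const bool failed = in.fail();
    in.clear();                      // all flags back to goodbit
    return failed && in.good();
}
```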

        Do we have to learn all these new words for types?

        Learn the pattern that underlies them, pick one item from each line below:

        1. i | o | nothing
        2. f | string | str
        3. stream

        Where does sstream fit?

        This is the library header that defines the three kinds of stringstreams: istringstream, ostringstream, and stringstream.
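
        As an illustration, here is one small made-up function for each of the three kinds. They follow the naming pattern above: i for input, o for output, and nothing for both.

```cpp
#include <sstream>
#include <string>

// istringstream: read words out of a string, as if it were input.
int count_words(const std::string& line) {
    std::istringstream in(line);
    std::string word;
    int count = 0;
    while (in >> word) ++count;
    return count;
}

// ostringstream: build a string with <<, as if writing output.
std::string label(const std::string& name, int number) {
    std::ostringstream out;
    out << name << '-' << number;
    return out.str();
}

// stringstream: write first, then read the same characters back.
int round_trip(int value) {
    std::stringstream both;
    both << value;
    int back = 0;
    both >> back;
    return back;
}
```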

        Does it take time to seek data in a file?

        Yes. It depends on the average seek time for the disk and the number of positions moved. A long 'seek' takes longer to execute than a short one.

        Why would you use just istream or ostream?

        To make the size of the compiled program smaller.

        Is position 0 in text file really the first character?

        Yes. Nothing hidden! Information about the file is kept in a separate place that depends on the Operating system.

        Why do I need to access the arguments of main?

        (1) See the lab work. (2) When commands are input, they have the name of the program plus other data:
         		11peek name_of_file
        and so these programs need to get this information. The operating system puts the words of the command into the arguments of main.
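
        A minimal sketch of picking up those words, using a hypothetical helper rather than the code in 11peek.cpp. It skips argv[0], which holds the name of the program, and returns the rest as strings.

```cpp
#include <string>
#include <vector>

// Collect the arguments that follow the program name (argv[0]).
std::vector<std::string> arguments(int argc, const char* const argv[]) {
    std::vector<std::string> result;
    for (int i = 1; i < argc; ++i) {
        result.push_back(argv[i]);
    }
    return result;
}
```

        Inside int main(int argc, char* argv[]) a program could call arguments(argc, argv); for the command above the result would hold the single string name_of_file.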

        When writing a file, does anything stop other processes writing it as well?

        NO. Dangerous! There is File Locking (flock?) but it is better to use a real database with "Record Locking".

      Exercises

      Depends on the questions!

      Preparation for Laboratory 06: Security and Random Access

      [ lab06.gif ] [ list.cpp ] [ add.cpp ] [ del.cpp ] [ use.cpp ]

    . . . . . . . . . ( end of section CSci202 Computer Science II, Session 11 Files and Strings)

    Laboratory 6 Information Security

    [ lab06.html ]

    Next

    Introduction to Containers (chapter 12) + [ 12.html ] Including a quiz on exceptions, I/O, Files, Arguments to Main, ... in next session

    Abbreviations

  1. TBA::="To Be Announced", something I have to do.
  2. TBD::="To Be Done", something you have to do.
  3. Dia::="A free Open Source Diagramming tool for Linux, Windoze, etc. ".

End