.Open CSci202 Computer Science II, Session 13 File Handling
(previous): Exceptions
.See http://www/dick/cs202/12.html

.Open Preparation -- 17 File Processing (page 848)
17.1 Introduction
17.2 Data Hierarchy
17.3 Files and Streams
17.4 Creating a Sequential File
17.5 Reading Data from a Sequential File
17.6 Updating Sequential Files
17.7 Random Access files
17.8 Creating a random access file
17.9 Writing to a random access file
17.10 Read random access sequentially
17.11 Case Study -- Skip
17.12 Skip until CSCI350!
17.13 Wrap-Up
.Close

. Note -- getline into a string from a stream
To input from a stream into a C++ string use
.As_is getline(aStream, aStringVariable);
not
.As_is aStream.getline(aStringVariable);
Example
.See ./outin.cpp

. Assigned Work Due -- Send a Question

. Chapter 17 pages 851-852 -- Files and Streams
Can you elaborate more on Files and Streams?

Key point -- in C/C++ files are accessed as streams. A stream provides a handle -- an object that keeps track of the state of the file: open, closed, ok, failed, at end of file.... More stuff on files below.

. Chapter 17 pages -- files
What do files allow us to do in our C++ programs?

Key point about Files: they are permanently stored data. They should survive the programs that read and write them. They should even survive the machine shutting down properly. It takes something like a head crash to make data on a disk or tape unreadable. A worn out flash drive may become unwritable in time.... a long time.

. Chapter 17 pages 858 -- Reading and printing a sequential file
In order to determine whether the file was opened successfully, is !inClientFile the only condition I can use?

It is just the simplest and safest test you can do. There are others (for example `inClientFile.fail()` or `!inClientFile.is_open()`) but I don't use them.

. Chapter 17 pages 848 -- Sequential File
Could you simply explain what a Sequetial File is?

A mistyping of Sequential File.... of course:-) A sequential file is a sequence of records (possibly millions of them) that must be read one after another.

.
Chapter 17 pages 852-856 -- Sequential Files
What is sequential filing used for?

It is the simplest way to store data, outside a program, so that it can be kept after the program has halted. It can also be accessed by many programs. All output goes to a sequential file, even if that `file` is your terminal -- also known as "/dev/tty" for historical reasons.

. Chapter 17 pages 860 -- Random-Access File
What is the purpose of creating a random-access file?

It is a way of storing data so it can be accessed easily in any order.

. Is there a more complex way to store data on disk?
Yes. It is called a data base. It is managed by a Data Base Management System or DBMS. It is a lot more complex, and uses a special language called the "Structured Query Language" or "SQL". Take a data base class if you need to store or retrieve complex data.

. Chapter 17 pages -- Sequential versus Random
What is the main difference between sequential file processing and random-access files?

In a sequential file you should start at the beginning, read each record in turn, and stop when you get to the end. In a random access file you can read any record you like, in any order, and there is a formula that calculates where a particular record can be found in the file.

As a rule sequential access is faster but less flexible. Typically you have to do a lot of sorting to get efficient access to a sequential file. By closing and reopening a sequential file you can access any record.... but this takes a lot of time.

Another difference between random access files (as shown in the book) and sequential files is that the records in a sequential file can vary in size -- all you have to do is choose a character to signal the end of a record. The wise programmer chooses the "end of line" character and uses "getline" to reinput the record. If all the data on the line is encoded as normal characters you can then use a normal editor to read, write, and edit the file.
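The one-record-per-line idea above can be sketched as follows. This is only an illustration: the function names `writeRecords` and `readRecords`, and the idea of holding each record in a `std::string`, are assumptions made for the sketch and do not come from the book.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Write each record as one line of ordinary characters.
// The "end of line" character marks the end of a record,
// so records may have different lengths.
void writeRecords(const std::string& fileName,
                  const std::vector<std::string>& records) {
    std::ofstream out(fileName);      // creates or overwrites the file
    for (const std::string& r : records) {
        out << r << '\n';
    }
}                                     // the file is closed when `out` is destroyed

// Read the records back, in order, one line at a time.
std::vector<std::string> readRecords(const std::string& fileName) {
    std::vector<std::string> records;
    std::ifstream in(fileName);
    std::string line;
    while (std::getline(in, line)) {  // getline(aStream, aStringVariable)
        records.push_back(line);
    }
    return records;
}
```

A `main` that calls `writeRecords` with a few strings and then prints each string returned by `readRecords` completes the program. The resulting file can also be opened in a normal editor.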
More recently people have started to use the XML rules for storing data in sequential files. These are a lot more complex than we need in CS202.

Files on magnetic tape are always sequential; you need disk or flash drives to get efficient random access.

. Chapter 17 pages 854-855 -- Files
Can you name a file the same name and attempt to open it without an error occurring?

If you try to write to a file that already exists the system assumes you want to either overwrite the existing file or add to it (append). No error occurs. However -- valuable data can be destroyed.

If you try to read or open from a file that exists -- it works. If `it does not exist` the open fails: the stream goes into a failed state, and your program should test for this (for example with `!inClientFile`) and report the error.

. Chapter 17 pages 856-859 -- File Processing
Is there any easier way to read a sequential file?

The book covers all the ways.... and some are easier than others. You always need to open and close the file and, in between, access the data. If all the program has to do is read and write characters then see this example:
.See http://www/dick/cs202/ptxt.cpp
It gets complex if the program has to "parse" the data -- create records or read and write numbers.

. Chapter 17 pages 856 -- closing files when no longer needed
If closing files when the program doesn't need them reduces resource usage, how can the reduction of resource usage help?

The program gets some RAM back to use. An open file has data stored, on the heap, describing its status and the next data to be transferred. This can be quite large.

. Chapter 17 pages 856 -- close
When should you use the close function?

Close all open files when your program has done with them. Close all open files when your program has done with them. Close all open files when your program has done with them.

. Chapter 17 pages 858 -- reinterpret_cast
Why is reinterpret_cast necessary? Is it a shortcut to getting the correct data type output without doing a conversion?

It is not a short-cut.
It just tells the compiler -- in the clearest possible terms -- what you want done: take this data and treat it as another type of data.

. Chapter 17 pages 865 -- reinterpret_cast
Can you give an example of how to use "reinterpret_cast"?

I don't have one to hand -- it is a new feature. And I treat all files as characters whenever I can. But see the book's example from fig. 17.13, which is explained in these notes.

. Chapter 17 pages 852-853 -- Sequential Files
When would you need to use sequential files and why?

Exercise for the class -- why do we use files? and why sequential files?

. Chapter 17.3 pages 851-852 -- Files and Streams
Can you better explain the typedef aliases that the library provides?

A .key typedef gives a short name to a complex description of a type. This means you don't have to type (and remember) the full description -- just the abbreviated name.

The C++ library is designed to work with any character set -- but most programs -- in the USA, at least -- use 8-bit ASCII characters. So the `typedef`s merely make it easier for you to access the commonest form of data.

Another point -- in CS202 you can almost certainly get away with only using "fstream", "ifstream" (input file), and "ofstream" (output file).

. Chapter 17 pages 869-870 -- Random-Access File
In fig. 17.13, could you explain how the following lines work?
.As_is outCredit.seekp( ( client.getAccountNumber() - 1 ) * sizeof( ClientData ) );
.As_is outCredit.write(reinterpret_cast< const char * >(&client), sizeof( ClientData ) );

The compiler replaces "sizeof" by the number of bytes in the data.

The data is stored in the file as if it was in numbered boxes:
.As_is     0       1       2       3       4       5
.As_is |client0|client1|.......|.......|.......|.......
Suppose that client data takes up 30 bytes; then the position (in bytes) of each box in the file must be:
.As_is     0       1       2       3       4       5
.As_is 0       30      60      90      120     150
.As_is |client0|client1|.......|.......|.......|.......
So `( client.getAccountNumber() - 1 ) * sizeof( ClientData )` calculates the position `p` in the file.
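The arithmetic above can be tried out with a small sketch. Everything named here is an assumption made for illustration: `Record` is a made-up fixed-size struct standing in for the book's `ClientData`, and `positionOf`, `putRecord` and `getRecord` are hypothetical helpers, not functions from the book or the library. The sketch also assumes the record holds no pointers or strings, so that its bytes can be copied as they are.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>

// Hypothetical fixed-size record (stands in for ClientData).
struct Record {
    int accountNumber;   // accounts are numbered from 1
    char name[16];
    double balance;
};

// Byte position of the box that holds a given account:
// (accountNumber - 1) * size of one record.
std::streamoff positionOf(int accountNumber,
                          std::size_t recordSize = sizeof(Record)) {
    return static_cast<std::streamoff>(accountNumber - 1) *
           static_cast<std::streamoff>(recordSize);
}

// Move the "put" position to the record's box and overwrite the bytes there.
bool putRecord(std::fstream& file, const Record& r) {
    file.seekp(positionOf(r.accountNumber));
    file.write(reinterpret_cast<const char*>(&r), sizeof(Record));
    return !file.fail();
}

// Move the "get" position to the box and copy its bytes into r.
bool getRecord(std::fstream& file, int accountNumber, Record& r) {
    file.seekg(positionOf(accountNumber));
    file.read(reinterpret_cast<char*>(&r), sizeof(Record));
    return !file.fail();
}
```

With a 30 byte record, `positionOf(1, 30)` is 0 and `positionOf(3, 30)` is 60, matching the table. To use the helpers, open an `fstream` with `ios::in | ios::out | ios::binary` (adding `ios::trunc` if the file has to be created first).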
So, the first command does this
.As_is outCredit.seekp( p );
The second command then outputs a ClientData record to the file:
.As_is outCredit.write(reinterpret_cast< const char * >(&client), sizeof( ClientData ) );
First it takes the client's record and interprets it as an array of bytes (char*), then it copies the bytes into the file, overwriting whatever was there before.

Hint..... copy the commands and change the names of the datatype (ClientData) and key (getAccountNumber()).

. Chapter 17.3 pages 852 -- Files and Streams
Do other programming languages impose structure on file handling?

Yes -- but the basic rules of file handling are dictated by (1) the hardware, (2) the operating system, (3) any special DBMS in use, and (finally) the language. So we have a special purpose language -- SQL -- for modern data bases.

. Chapter 17 pages 861 -- ifstream functions -- clear
What does inClientFile.clear() do?

It resets the status flags on the `inClientFile`. When we read a complete file the stream enters a special "end of file" or "eof" state that stops further access. The "clear()" resets that state so that the program can then move back to the start of the file (with "seekg") and read it again.

. Chapter 17 pages 861 -- Bug on line 107
Should the "&&" be an "or"?

$TBA

. Chapter 17 pages 863 -- Random Access files
What would a UML diagram look like with Random Access files?

UML doesn't do files -- but a class diagram with attributes (and no operations) can be used to describe a record in a file or data base. We could also use the UML to draw the C++ Application Programmer Interface.

. Chapter 17 pages 759-862 -- translate binary/hexadecimal
Is there an STL function to translate to and from hexadecimal/binary for file manipulation?

I think ($TBA) that the "hex" I/O manipulator switches input from decimal to hexadecimal
.As_is input >> hex >> integer >> dec;

I have never known anybody but teachers to use the characters '0' and '1' to store data in a file using binary. This means using 8 bits (a char) to encode a single bit of data.
This wastes 7 bits per character.... Plus the result is unreadable.

C/C++ reads binary files as a sequence of bytes, stored in a string or character array.

There does exist an old way to extract the bits from a piece of data. It is called bit-fields. We should mention them later in this course if all goes as planned. But this is part of the C language and not part of the STL.
.See ./17.html

There are `bitwise` operations (&, |, ~) that C/C++ provide for manipulating data at the bit-level ... and these can easily translate between binary and decimal...
.See ./17.html

And then there are the traditional pencil and paper techniques that all Computer People claimed to master.... and special tables of binary that all of us actually used:-)

The STL does provide a bit-based data structure `bitset` ($TBA ?) which is in the last chapters we will be covering.

. Chapter 17 pages 875 -- constructors for files
When calling a constructor for an fstream, what does ios::in | ios::out | ios::binary do?

These are used when `using` a constructor to access a file -- fstream. They determine the kind of access that you will be doing: input vs output. And the format -- characters or binary. The "|" operator combines the various kinds of access (it is a bitwise "or"
.See ./17.html
by the way) so the example given means: the file is binary and the program may either read or write it (or both).

. Chapter 17 pages 881 -- Object Serialization
Will we be using object serialization, and if so could you elaborate on its functionality?

No -- not in CS202. No -- I won't elaborate, except for the following warning.

. Warning -- do not store addresses in a file
You have to invent special coding (e.g. XML) to store a linked data structure on disk.

. Exercises -- write a program to copy a file
Your program should get the name of an existing file and the name of a new file. It should open both files, get every character, in turn, from the first file, and output it (unchanged) into the new file.
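One possible shape for the copying step in this exercise is sketched here. It is an illustration only and is not the Copy.cpp mentioned in these notes; the function name `copyFile` and the choice to open both files in binary mode (so that no character is translated on the way through) are assumptions of the sketch.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Copy every character of the file named `from` into a file named `to`.
// Returns false if either file cannot be opened or the output fails.
bool copyFile(const std::string& from, const std::string& to) {
    std::ifstream in(from, std::ios::binary);
    if (!in) {
        return false;               // the existing file could not be opened
    }
    std::ofstream out(to, std::ios::binary);
    if (!out) {
        return false;               // the new file could not be created
    }
    char c;
    while (in.get(c)) {             // get each character in turn...
        out.put(c);                 // ...and output it unchanged
    }
    out.close();                    // close when done with the file
    return !out.fail();
}
```

A `main` for the exercise would read the two file names (for example with `cin >> name`), call `copyFile`, and print an error message if it returns false.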
Here
.See ./Copy.cpp
is a slightly improved version of the program we developed in class.

.Close CSci202 Computer Science II, Session 13 File Handling

. Laboratory 7 Exceptions
.See http://www/dick/cs202/lab07.html

. Next -- Strings
.See http://www/dick/cs202/14.html

. Bring a pack of cards next WEEK.