Concept of data files. What is the DATA file extension? What are data files

The second file that See5 requires is the data file, which has the extension .data. In our case this is the file USR.data.

Each object in the data file occupies its own row. If the target variable comes first in the names file, the row begins with the value of that target variable, followed by the values of all other attributes, separated by commas. Unknown values are encoded with a question mark “?”. After a vertical bar “|” you can write a comment, which the system ignores.
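The row format described above can be illustrated with a short Python sketch. This is not part of See5; the function name parse_data_line and the sample row are hypothetical and only show how commas, the “?” marker and “|” comments might be handled.

```python
def parse_data_line(line):
    """Split one data-file row into values.

    Text after '|' is treated as a comment and dropped;
    a '?' field is returned as None (unknown value).
    """
    body = line.split("|", 1)[0].strip()
    if not body:
        return None  # blank or comment-only line
    fields = [field.strip() for field in body.split(",")]
    return [None if field == "?" else field for field in fields]


# Hypothetical row: target value first, then three attribute values.
row = parse_data_line("1, 0.72, ?, 45 | checked twice")
print(row)
```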

Below is the entire data file USR.data, which we will use to demonstrate the capabilities of See5.

Test data files (optional)

To check the quality of the constructed decision tree and the corresponding set of logical rules, the See5 system provides the ability to work with special files that contain additional test data.

The third type of file used by See5 contains new test objects, also known as a control sample. This file, USR.test, is optional and, if used, has the same format as the file USR.data described above.

The next auxiliary file, USR.cases, is also optional. It contains objects whose classification is unknown.

Cost File

The last file type, USR.costs, contains information about the cost of various classification errors. Filling in this file is optional, but assigning penalties for errors can be very useful in some applications.

User Interface

There are five buttons in the main See5 window (Fig. 1). Let's list them from left to right.

The Locate Data button opens a window for browsing the available data files and loading them into the system.

The Construct Classifier button opens a dialog box for selecting the type of classifier and setting its parameters. The Stop button interrupts the construction of a decision tree.

The Use Classifier button starts interactive classification of one or more objects. The Cross-Reference button opens a window that shows the links between the objects of the training set and the rules that classify them.

All of the above functions are also available from the File menu. In addition, the Edit menu lets you edit the names file and the classification error cost file.

Fig. 1. Main window of the See5 system

Building a decision tree

At the first stage of data processing, the default system parameters are usually used. Click the Construct Classifier button and then, in the dialog box that appears (Fig. 2), immediately click OK (assuming the data file USR.data is already loaded). The system displays a results window (Fig. 3). The first line of the report gives the version of See5 used and the current time. The next two lines state that the classifying variable is diagnosis and that the data file USR.data contains 74 objects, each described by eleven features.

Fig. 2. Dialog box for setting the parameters of the classifier construction algorithm

The following lines of the report display the constructed decision tree. It can be interpreted as follows:

IF Index is greater than 0.69 and Speed is greater than 18, THEN class No. 3; otherwise

IF Index is greater than 0.69 and Speed is no more than 18 and Thickness is no more than 46, THEN class No. 1

etc.
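As a rough illustration, the two branches quoted above can be written as nested conditions in Python. Only these two branches are given in the text, so the sketch returns None for every other combination; the function name classify is hypothetical and this is not output produced by See5.

```python
def classify(index, speed, thickness):
    """Apply the two tree branches quoted in the text.

    Returns the class number, or None for branches
    that the text does not spell out.
    """
    if index > 0.69:
        if speed > 18:
            return 3  # Index > 0.69 and Speed > 18
        if thickness <= 46:
            return 1  # Index > 0.69, Speed <= 18, Thickness <= 46
    return None  # remaining branches are not listed in the text


print(classify(0.8, 20, 50))
print(classify(0.8, 18, 46))
```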

Each branch of the tree ends with the number of the class to which it leads. The number is immediately followed by an entry of the form (n) or (n/m). For example, the very first branch ends with the entry (12.0). This means that 12 objects, all of the indicated (third) class, correspond to this branch. The last branch ends with the entry 1 (6.0/1.0), which means that this branch describes class No. 1 and that 6 objects fall here, 1 of them mistakenly. The values n and m may be fractional when a branch contains objects with unknown feature values.

The next section of the report presents the characteristics of the constructed classifier, evaluated on the training set. Here we see that the constructed decision tree has 9 branches (size = 9), and the classification error is observed on 5 objects, which is 6.8%.

The final part of the report contains a table with a detailed analysis of the classification results. According to this table, 20 objects of class 1 (healthy kidneys) are classified correctly and 2 are incorrectly assigned to class 2; among the objects of class 2 (multiple cysts), 35 are diagnosed correctly and 2 are incorrectly recognized as healthy; all objects of class 3 (hydronephrosis) are classified correctly except for one object that falls into class No. 2.
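The error figures in the report can be checked against this table with a few lines of Python. The matrix below is reconstructed from the counts in the text; the value 14 for correctly classified class 3 objects is not stated directly and is an assumption derived from the total of 74 objects (74 - 22 - 37 = 15 objects in class 3, one of them misclassified).

```python
# Rows: true class 1, 2, 3; columns: predicted class 1, 2, 3.
confusion = [
    [20, 2, 0],   # healthy kidneys
    [2, 35, 0],   # multiple cysts
    [0, 1, 14],   # hydronephrosis (14 is derived, see lead-in)
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(confusion)))
errors = total - correct
error_rate = 100.0 * errors / total

print(f"{errors} errors out of {total} objects ({error_rate:.1f}%)")
```

Under this assumption the sketch reproduces the 5 errors and 6.8% quoted in the report.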

Finally, See5 reports the time spent on the solution; in our case it was 0.5 s. Note that the See5 algorithm is generally very fast, which lets it quickly process high-dimensional data sets containing thousands or tens of thousands of records.

We can analyze the classification results in even more detail. To do this, click the Cross-Reference button in the main See5 window. The system displays a window whose left half shows the constructed decision tree and whose right half lists the objects that fall on a given branch of the tree. To select a branch of interest, click it with the left mouse button (a dark circle appears to the right of the branch; an arrow points to it in Fig. 4). In addition, if you click the number of an object in the right-hand field, the system displays another window, called Case, which shows the feature values of the selected object. In the case shown in the figure, we are interested in the branch (Index <= 0.69 and Age <= 43), which contains 10 objects from class 1 and 1 object from class 2.

Fig. 4. Classification results in the cross-reference window

1. Executable files

1.1. Batch (BAT)

1.2. Software (COM, EXE)

2.1. Simple texts (TXT)

2.2. Complex (DOC)

2.3. Spreadsheets (XLS)

2.4. Databases (MDB)

2.5. Archive (RAR, ZIP)

2.6. Graphic (BMP, JPG, GIF)

2.7. Application components (LIB, OVL)

2.8. Temporary (TMP), etc.

3. Shortcuts: small files containing links to other objects, used to open them (PIF, LNK)

File attributes:

“Hidden” – not visible unless the display of hidden files is enabled;

“Ready for archiving” – will be processed by the archiving (backup) wizard;

“Compressed” – will be compressed to save disk space;

“Encrypted” – cannot be opened and copied in another user session, but can be deleted and renamed;

“Indexed” – will be marked for quick search in the future.

Note. The attributes “Compressed”, “Encrypted”, “Indexed” can only be set in the NTFS file system.

Managing File System Objects

Types of operations with FS objects:

1. Navigation and search

2. Creation and deletion

3. Editing and viewing

4. Renaming and setting attributes

5. Copy and move

6. Archiving and unarchiving

Ways to create objects:

1. Program - editor or program wizard

2. File shell or file manager

3. Copying (via clipboard or dragging)

Options for opening objects:

1. Folder: List contents

2. Executable file: launch

3. Data file: launch the editor or viewer registered (associated) for this document type

Possible outcomes of dragging an object (“Drag and Drop”):

1. Left mouse button: move within one disk, copy between disks

2. Right mouse button: choose the action from the context menu when dragging ends

3. Ctrl + left mouse button: copy

4. Shift + left mouse button: move

Features of copying and moving:

1. When copying, a byte-for-byte duplicate of the contents is created

2. When moving within one disk, only the full file name recorded in the FAT changes

3. When moving between different disks, the file is first copied, and then the original is marked as deleted in the FAT

Conclusion: within one disk, a move operation is faster than a copy operation; between different disks, the opposite is true.

Examples of MS-DOS commands:

1. External:

1.3. Format the disk: FORMAT disk_name


2. Internal (executed by the command processor):

2.1. Create a directory: MD directory_name

2.2. Remove a directory: RD directory_name

2.3. Delete a file: DEL filename

2.4. View the contents of the current directory: DIR

2.5. Exit the shell: EXIT

Table Ways to copy and move files


A file is a sequential chain of data that has a name and an extension (the extension may be missing, in which case Windows treats the file as being of unknown type). A file is an information entity, so it is stored on a physical medium (hard drive, flash drive, etc.). For a file to be fully identified, its full name, consisting of a name, a dot and an extension (in that order), must be unique within one storage location (a directory, in Windows terminology). The file extension lets the system determine which programs can correctly open, run, read and use the file.

It is worth knowing that a directory is also a file, but with specific features. Unlike regular files, it cannot contain data, but it can contain files or other directories. This is similar to documents in an archive: there are folders with sheets of paper (files) inside, and there are thicker folders that contain other folders (directories).

Let's look at an example. File "document.doc". Here, "document" is the file name, and "doc" is the extension, which tells Windows that the file should be opened and edited using word processors such as Microsoft Word or OpenOffice Writer. If the extension is not known to Windows (the necessary programs are not installed), the system will prompt you to select the program manually. It is worth knowing that Windows has predefined sets of well-known extensions, such as txt (text file), exe (executable program) and others.

Classification of files by functional application

The variety of file types is usually divided according to their functional use. This is not a mandatory criterion, but it makes it easier to understand the purpose of the files. In addition, it is important to know that classification by functional application does not in any way limit the set of extensions. For example, each of the classes can contain archives, documents, executable files, etc.

1. User files: pictures, web pages, documents, tables and other files that users work with for their own tasks. The names of such files are limited only by the Windows naming standard. Their extensions are usually not specified by users but are assigned automatically by the programs that create the files. For example, Microsoft Word sets the extension to "doc" or "docx" (depending on the version), and OpenOffice Writer sets it to "odt".

2. System files: all the files that Windows needs in order to function normally. The names of such files also comply with the Windows naming standard, but they are fixed before the operating system is installed. Users should therefore not rename or modify such files, as this may cause errors.

3. Program files: all the files used by installed software. In terms of naming they are similar to system files: they comply with the naming standard but have fixed names defined by the author. Programs can also create program files while they run, for example error logs or configuration files. Such files should be edited only by the program itself (unless the software specifies otherwise); editing them by hand may cause errors.

File naming standard in the Windows operating system.

Let's look at the file naming standard in the Windows operating system. There are two character sets:

1. Recommended character set. File names may contain digits and letters of the Latin, Russian and any other national alphabets. The hyphen is also supported. Letters may be written in either case (upper or lower).

2. Valid character set. This includes the space, underscore, apostrophe, semicolon, period and comma, as well as the special characters ! @ # $ % & (and some other special characters from the main encoding). Windows handles these characters in a special way, so avoid them where possible. The space, period, comma and underscore rarely cause problems, but the other symbols, especially some of the special characters, may lead to errors. If you need files that open normally in other operating systems, limit yourself to digits and the Latin alphabet, and use a period only to separate the name from the extension.

Prohibited character set: file names cannot contain the characters \ / : * ? " < > |
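A check against the prohibited set could look like the following Python sketch. The function names are hypothetical. The set includes the double quote, which Windows also rejects, and the check covers only the prohibited characters, not other Windows restrictions such as reserved device names.

```python
# Characters that may not appear in a Windows file name.
FORBIDDEN = set('\\/:*?"<>|')


def forbidden_chars(file_name):
    """Return the prohibited characters found in file_name, sorted."""
    return sorted(set(file_name) & FORBIDDEN)


def is_valid_name(file_name):
    """True if the name is non-empty and has no prohibited characters."""
    return bool(file_name) and not forbidden_chars(file_name)


print(is_valid_name("report_2024.txt"))
print(forbidden_chars("a:b?.txt"))
```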

What is a file in the GUI? Windows has its own graphical shell that lets users interact with the computer. In this interface, files are shown as small images, called icons, together with a name and extension (with certain settings, file extensions are not displayed). Typically, if the extension is known to Windows, the file gets an icon specific to that type. For example, text files with the extension "txt" are usually represented by a notepad icon.

How the full path to a file is formed. The computer has disks, each denoted by a Latin letter; this is the first component of the path. It is followed by a separator consisting of a colon and a backslash, ":\". If the file is located in the root of the disk, the full file name comes next (hereinafter simply the file name). If the file is in a directory, the directory name comes first, then the "\" character, then the file name. If the file is in a subdirectory, the directory name is followed by the subdirectory name and another "\", and so on by analogy. Here the "\" character is a separator that makes it possible to identify each part of the full path.
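The way the path is assembled can be sketched with Python's standard pathlib module, which models Windows paths on any operating system. The helper name full_path and the directory names are hypothetical examples.

```python
from pathlib import PureWindowsPath


def full_path(drive_letter, directories, file_name):
    """Build a full Windows path: drive, ':\\', directories, file name."""
    return PureWindowsPath(f"{drive_letter}:\\", *directories, file_name)


root_file = full_path("C", [], "document.doc")
nested_file = full_path("C", ["Reports", "2024"], "document.doc")

print(root_file)          # C:\document.doc
print(nested_file)        # C:\Reports\2024\document.doc
print(nested_file.parts)  # each component separated by "\"
```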

The MathCAD11 data file must simply be an ASCII file. MathCAD11 reads files that consist of numbers separated by commas, spaces, or carriage returns. Below are examples of some files read by MathCAD11, assuming they are written in ASCII format:

§ a file created by outputting data from a spreadsheet to disk;

§ a column of numbers typed in a word processor and saved in ASCII format;

§ the result of a program written in a high-level language;

§ data exported from the database.

The numbers in a data file can be integers such as 3 or -1, floating-point numbers such as 2.54, or numbers in scientific notation such as 4.51E-4 (for 4.51 × 10^-4). For example, the following list of numbers would be a valid line in a MathCAD11 data file:

200, 50 25.1256, 16E-2, -16.125E15
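To show how such a line breaks down into numbers, here is a Python sketch (not MathCAD code) that splits on commas and whitespace. The function name parse_numbers is hypothetical, and the sample line is written with its signs and exponents free of internal spaces.

```python
import re


def parse_numbers(line):
    """Split a line on commas and/or whitespace and convert to floats."""
    tokens = [t for t in re.split(r"[,\s]+", line.strip()) if t]
    return [float(t) for t in tokens]


values = parse_numbers("200, 50 25.1256, 16E-2, -16.125E15")
print(values)
```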

MathCAD11 also saves data to ASCII files. Data files saved by MathCAD11 contain numbers separated by spaces and carriage returns. MathCAD11 documents themselves are not data files in this sense. The only way to create a data file from MathCAD11 is to use the file access functions.

File access functions

MathCAD11 has six file access functions: READ, WRITE, APPEND, READPRN, WRITEPRN and APPENDPRN. Their properties:

§ Function names must be typed in capital letters;

§ If MathCAD11 cannot find a data file, it marks the corresponding access function with the error message “file not found”. If MathCAD11 tries to read a file in an inappropriate format, it marks the function with the message “file error”;

§ The left side of an assignment that uses one of the functions WRITE, APPEND, WRITEPRN or APPENDPRN must contain nothing else;

§ Each new equation that uses an access function reopens the data file. When reading data, for example, each new equation starts reading from the beginning of the file;

§ Within one equation a file is opened only once. This means that if READ is used with the same file name argument twice in the same equation (which is possible when a discrete argument is used), the second READ begins reading where the first one stopped. Because READPRN reads the entire file, it cannot be used with the same argument twice in the same equation: the second READPRN would have nothing left to read;

§ If two equations in a worksheet use WRITE or WRITEPRN with the same argument, the data from the second equation overwrites the data from the first. Use APPEND or APPENDPRN if the first piece of data must be kept; these functions append new data to an existing file.
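The difference between overwriting and appending can be illustrated with a Python analogue. This is not MathCAD syntax; the helper names and the file name data.prn are hypothetical, and the file modes "w" and "a" merely play the roles of WRITE/WRITEPRN and APPEND/APPENDPRN.

```python
import os
import tempfile


def write_values(path, values):
    """Analogue of WRITE/WRITEPRN: mode "w" replaces any existing file."""
    with open(path, "w", encoding="ascii") as f:
        f.write(" ".join(str(v) for v in values) + "\n")


def append_values(path, values):
    """Analogue of APPEND/APPENDPRN: mode "a" keeps the existing data."""
    with open(path, "a", encoding="ascii") as f:
        f.write(" ".join(str(v) for v in values) + "\n")


def read_values(path):
    """Read every number in the file, starting from the beginning."""
    with open(path, encoding="ascii") as f:
        return [float(t) for t in f.read().split()]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.prn")
    write_values(path, [1, 2])
    write_values(path, [3, 4])        # overwrites the first write
    after_overwrite = read_values(path)
    append_values(path, [5, 6])       # keeps 3 4, adds 5 6
    after_append = read_values(path)

print(after_overwrite)
print(after_append)
```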

Table 6.1 describes these functions.

Table 6.1

File access functions

Function        Meaning
READ(file)      Reads a value from a data file. Returns a scalar. Typically used as follows: v_i := READ(file)
WRITE(file)     Writes a value to a data file. If the file already exists, replaces it with a new file. Used in definitions of the following form: WRITE(file) := v_i
APPEND(file)    Appends a value to an existing file. Used in definitions of the following form: APPEND(file) := v_i
READPRN(file)   Reads a structured data file. Returns a matrix. Each row in the data file becomes a row in the matrix. The number of elements in each row must be the same. Typically used as follows: A := READPRN(file)

End of Table 6.1
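The behaviour described for READPRN (each row of the file becomes a matrix row, and every row must have the same number of elements) can be mimicked in Python as follows. This is an illustrative analogue only; the function name read_matrix is hypothetical, and it takes the file contents as a string to stay self-contained.

```python
import re


def read_matrix(text):
    """Turn structured text into a list of rows of floats.

    Blank lines are skipped; rows of unequal length raise ValueError.
    """
    rows = []
    for line in text.splitlines():
        tokens = [t for t in re.split(r"[,\s]+", line.strip()) if t]
        if tokens:
            rows.append([float(t) for t in tokens])
    if len({len(row) for row in rows}) > 1:
        raise ValueError("every row must contain the same number of elements")
    return rows


matrix = read_matrix("1 2 3\n4 5 6\n")
print(matrix)
```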

Data file: a file in a computer system that contains data, as opposed to a file containing a program. A data file is usually divided into records and fields.

