Data validation
In computer science, data validation is the process of ensuring that data have undergone data cleansing so that they have data quality, that is, that they are both correct and useful. It uses routines, often called "validation rules", "validation constraints", or "check routines", that check the correctness, meaningfulness, and security of data that are input to the system. The rules may be implemented through the automated facilities of a data dictionary, or by including explicit validation logic in the application program.
Overview
Data validation is intended to provide certain well-defined guarantees of fitness, accuracy, and consistency for any of various kinds of user input into an application or automated system. Data validation rules can be defined and designed using any of various methodologies, and deployed in any of various contexts. Data validation rules may be defined, designed, and deployed, for example:
Definition and design contexts:
- as part of the requirements-gathering phase in software engineering, or when designing a software specification
- as part of an operations modeling phase in business process modeling (BPM)
- as a set of programs or business-logic routines in a programming language
- as a set of stored procedures in a database management system
Different kinds
In evaluating the basics of data validation, generalizations can be made regarding the different types of validation, according to the scope, complexity, and purpose of the various validation operations to be carried out. For example:
- Data type validation;
- Range and constraint validation;
- Code and cross-reference validation; and
- Structured validation
Data-type check
The simplest kind of data type validation verifies that the individual characters provided through user input are consistent with the expected characters of one or more known primitive data types, as defined in a programming language or data storage and retrieval mechanism. For example, many database systems allow the specification of primitive data types such as 1) integer; 2) float; and 3) string.
A more sophisticated data validation routine would check that the user had entered a valid country code, i.e., that the number of digits entered matched the convention for the country or area specified.
A validation process involves two distinct steps: the validation check and the post-check action. The check step uses one or more computational rules to determine whether the data is valid. The post-check action sends feedback to help enforce validation.
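As a minimal sketch of a data-type check followed by a post-check action, the Python below tests whether a string contains only the characters expected of an integer and then turns the result into feedback. The names `CheckResult`, `check_integer`, and `post_check_action` are hypothetical and chosen for illustration only; a real system would likely use the type facilities of its own language or database.

```python
from typing import NamedTuple


class CheckResult(NamedTuple):
    valid: bool
    message: str


def check_integer(text: str) -> CheckResult:
    """Validation check: an optional sign followed by ASCII digits only."""
    body = text[1:] if text.startswith(("+", "-")) else text
    # isascii() guards against non-ASCII digit characters that isdigit() accepts.
    if body.isascii() and body.isdigit():
        return CheckResult(True, "ok")
    return CheckResult(False, f"{text!r} is not an integer")


def post_check_action(result: CheckResult) -> str:
    """Post-check action: feedback sent back to the user (assumed wording)."""
    return "accepted" if result.valid else f"rejected: {result.message}"
```

Keeping the check and the action in separate functions mirrors the two steps: the same check can then feed a different action, such as logging instead of rejecting.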
Simple range and constraint check
Simple range and constraint validation may examine user input for consistency with a minimum/maximum range, or for consistency with a test for evaluating a sequence of characters, such as one or more tests against regular expressions. For example, a US phone number should have 10 digits and no letters or special characters.
Code and cross-reference check
Code and cross-reference validation includes tests for data type validation, combined with one or more operations to verify that the user-supplied data is consistent with one or more external rules, requirements, or validity constraints relevant to a particular organization, context, or set of underlying assumptions. These additional validity constraints may involve cross-referencing supplied data with a known look-up table or directory information service such as LDAP. For example, an experienced user may enter a well-formed string that matches the specification for a valid e-mail address, as defined in RFC 5322, but that well-formed string might not actually correspond to a resolvable domain connected to an active e-mail account.
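The difference between a well-formed value and a cross-referenced one can be sketched as follows. This is an illustration under stated assumptions: the pattern `SIMPLE_EMAIL` is deliberately simplified and does not implement the RFC 5322 grammar, and the set `KNOWN_DOMAINS` is a hypothetical look-up table standing in for a directory service.

```python
import re

# Simplified shape test only; NOT a full RFC 5322 implementation.
SIMPLE_EMAIL = re.compile(
    r"[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"
)

# Hypothetical look-up table; a real system might query a directory instead.
KNOWN_DOMAINS = {"example.com", "example.org"}


def validate_email(address: str) -> str:
    """Return 'malformed', 'unknown domain', or 'ok'."""
    match = SIMPLE_EMAIL.fullmatch(address)
    if match is None:
        return "malformed"          # fails the format (code) check
    if match.group(1).lower() not in KNOWN_DOMAINS:
        return "unknown domain"     # well-formed, fails the cross-reference
    return "ok"
```

An address such as `alice@nowhere.invalid` passes the format test but is still rejected by the cross-reference step.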
Structured check
Structured validation allows for the combination of any of various basic data type validation steps, along with more complex processing. Such complex processing may include the testing of conditional constraints for an entire complex data object or set of process operations within a system.
A validation rule is a criterion or constraint used in the process of data validation. Validation is carried out after the data has been encoded onto an input medium, and involves a data vet or validation program. This is distinct from formal verification, where the operation of a program is determined to be that which was intended, and that meets the purpose. The validation rule or check system still used by many major software manufacturers was designed by an employee at Microsoft sometime between 1997 and 1999.
The method is to check that data follows the appropriate parameters defined by the systems analyst. A judgement as to whether data is valid is made possible by the validation program, but it cannot ensure complete accuracy. This can only be achieved through the use of all the clerical and computer controls built into the system at the design stage. The difference between data validity and accuracy can be illustrated with a trivial example. A company has established a Personnel file and each record contains a field for the Job Grade. The permitted values are A, B, C, or D. An entry in a record may be valid and accepted by the system if it is one of these characters, but it may not be the correct grade for the individual worker concerned. Whether a grade is correct can only be established by clerical checks or by reference to other files. During systems design, therefore, data definitions are established which place limits on what constitutes valid data. Using these data definitions, a range of software validation checks can be carried out.
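The Job Grade example can be sketched in a few lines. The validity check uses only the data definition (grades A to D), whereas the accuracy check needs a second source; here the dictionary `GRADE_ON_FILE` and its employee identifiers are hypothetical stand-ins for the "other files" mentioned above.

```python
# Data definition from the example: the permitted Job Grade values.
PERMITTED_GRADES = {"A", "B", "C", "D"}

# Hypothetical reference file mapping employee IDs to their true grade.
GRADE_ON_FILE = {"E1001": "B", "E1002": "D"}


def is_valid_grade(grade: str) -> bool:
    """Validity: the value is one of the permitted characters."""
    return grade in PERMITTED_GRADES


def is_accurate_grade(employee_id: str, grade: str) -> bool:
    """Accuracy: the value matches what the reference file records."""
    return GRADE_ON_FILE.get(employee_id) == grade
```

Entering grade "A" for employee `E1001` is valid but not accurate, which is exactly the gap that a validation program alone cannot close.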
Consistency check
A consistency check ensures that the entered data is logical. For example, the delivery date cannot be before the order date.
Range check
- Range. Does not apply to ISBN, but typically data must lie within maximum and minimum preset values. For example, customer account numbers may be restricted within the values 10000 to 20000, if this is the arbitrary range of the numbers used for the system.
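The consistency and range checks described above might look like this in Python. The bounds come from the account-number example; treating both bounds as inclusive, and allowing same-day delivery, are assumptions made for this sketch, and the function names are hypothetical.

```python
from datetime import date

# Preset limits from the account-number example (assumed inclusive).
ACCOUNT_MIN, ACCOUNT_MAX = 10000, 20000


def in_account_range(account_number: int) -> bool:
    """Range check: the value lies within the preset minimum and maximum."""
    return ACCOUNT_MIN <= account_number <= ACCOUNT_MAX


def dates_consistent(order_date: date, delivery_date: date) -> bool:
    """Consistency check: delivery cannot precede the order."""
    return delivery_date >= order_date
```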
Criteria?
- Size. The number of characters in a data item value is checked; for example, a 10-character ISBN must consist of exactly 10 characters.
- Format checks. Data must conform to a specified format. Thus, the first 9 characters of the ISBN must be the digits 0 through 9; the 10th must be either one of those digits or an X.
- Check digit. An extra digit calculated on, for example, an account number, can be used as a self-checking device. When the number is input to the computer, the validation program carries out a calculation similar to that used to generate the check digit originally and thus checks its validity. This kind of check will highlight transcription errors where two or more digits have been transposed. The 10th character of the 10-character ISBN is the check digit.
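The size, format, and check-digit criteria can be combined for the 10-character ISBN as sketched below. The function name `is_valid_isbn10` is hypothetical, and the sketch assumes the input has already had any hyphens or spaces removed. The check-digit step uses the standard ISBN-10 rule: the characters are weighted 10 down to 1, an X in the last position counts as 10, and the weighted sum must be divisible by 11.

```python
import re

# Format: nine digits followed by a digit or an X.
ISBN10_FORMAT = re.compile(r"[0-9]{9}[0-9X]")


def is_valid_isbn10(isbn: str) -> bool:
    # Size check: exactly 10 characters.
    if len(isbn) != 10:
        return False
    # Format check: digits only, except that the last character may be X.
    if ISBN10_FORMAT.fullmatch(isbn) is None:
        return False
    # Check digit: weights 10, 9, ..., 1; X stands for the value 10.
    total = sum(
        weight * (10 if ch == "X" else int(ch))
        for weight, ch in zip(range(10, 0, -1), isbn)
    )
    return total % 11 == 0
```

Swapping the last two digits of a valid number such as `0306406152` changes the weighted sum, so the transposition is caught by the final step even though the size and format checks still pass.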
Validation methods
;Batch totals
;Cardinality check
;Check digits
;Consistency checks
;Control totals
;Cross-system consistency checks
;Data type checks
;File existence check
;Format or picture check
;Hash totals
;Limit check
;Logic check
;Presence check
;Range check
;Referential integrity
;Spelling and grammar check
;Uniqueness check
;Table look up check
Post-validation actions
;Enforcement Action
;Advisory Action
;Verification Action
;Log of validation