Glossary of Information Management Terms
A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
A
ACCURACY RATE
The proportion of characters that an OCR/ICR system identifies correctly out of all characters processed, usually expressed on a per-character or per-field basis.
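As a rough illustration, a per-character accuracy rate can be computed by comparing recognized text against a known-correct transcription. This is only a sketch: the function name `accuracy_rate` and the sample strings are made up for the example, and it assumes both strings have the same length (real evaluations usually also need to handle inserted or dropped characters).

```python
def accuracy_rate(recognized: str, truth: str) -> float:
    """Return the fraction of character positions recognized correctly.

    Assumes both strings are the same length (hypothetical simplification).
    """
    if len(recognized) != len(truth):
        raise ValueError("strings must be the same length for this sketch")
    if not truth:
        raise ValueError("nothing to compare")
    correct = sum(r == t for r, t in zip(recognized, truth))
    return correct / len(truth)


# The OCR engine misread "I" as "1" and "O" as "0": 10 of 12 characters match.
rate = accuracy_rate("1NV0ICE 2041", "INVOICE 2041")
print(f"{rate:.1%}")  # 83.3%
```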
ANNOTATION
The ability to attach digital sticky notes, text, rubber stamps, highlights, call-outs, and any number of other markings to a document image in order to communicate about or clarify it, usually without affecting the original version.
APPLICATION PROGRAM INTERFACE (API)
The common term for any language and format that helps one program communicate with another program.
APPLICATIONXTENDER
[EMC|Documentum] Electronically stores, organizes, and manages documents, files, and other business-critical information, and provides fast, security-controlled access to information from either Windows or Web-based clients. Built on a central repository, ApplicationXtender provides specific capabilities for high-speed image capture and storage, and is designed for quick deployment.
ARCHITECTURE
The conceptual design that defines how a system’s hardware and software components are integrated and interconnected to work with each other. The term covers computer architectures, network architectures, and software architectures.
ARCHIVE
A depository of data, documents, media, and other information, usually compressed and retained for long-term storage and access.
B
BANDWIDTH
The amount of data that can be passed through a given communications medium, from copper wire to optical fiber, in a given period of time, usually expressed in hertz (Hz) for analog media and in bits per second (bps) for digital media.
BINARY CODE
Code which represents information as a sequential series of 0s and 1s.
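For illustration, the short sketch below shows how text can be written out as a series of 0s and 1s. It assumes ASCII encoding, where each character occupies one 8-bit byte; the helper name `to_binary` is made up for the example.

```python
def to_binary(text: str) -> str:
    """Return the ASCII bytes of text as space-separated 8-bit groups."""
    return " ".join(f"{byte:08b}" for byte in text.encode("ascii"))


encoded = to_binary("Hi")
print(encoded)  # 01001000 01101001
```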
BIT
A common contraction of the term “binary digit”, the smallest unit of data a computer can process. There are 8 bits in a byte, which typically represents one character.
BIT RATE (BPS)
Speed at which data is transferred through a digital communications medium, expressed in bits per second.
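As a worked example, the time needed to move a file follows directly from its size and the bit rate. The sketch below assumes an idealized link with no protocol overhead; the function name and the figures are illustrative only.

```python
def transfer_seconds(size_bytes: int, bits_per_second: float) -> float:
    """Idealized transfer time: total bits divided by the bit rate."""
    return size_bytes * 8 / bits_per_second


# A 1,000,000-byte file is 8,000,000 bits, so at 8,000,000 bps it takes 1 second.
seconds = transfer_seconds(1_000_000, 8_000_000)
print(seconds)  # 1.0
```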
BITMAP
An image file format, usually with the extension .bmp, that stores a digital image as a point-by-point description of its pixels. A BMP file is typically several times larger than an equivalent JPG; however, BMP is usually stored uncompressed and so avoids the quality loss of JPG compression.
BITONAL
An image in which the picture elements have only two intensity values, 1 or 0, which usually stand for black and white.
BOOLEAN SEARCH
A search strategy that allows users to combine words or phrases with operators such as AND, OR, and NOT when searching online catalogs or databases.
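The three operators map naturally onto set operations. The sketch below is a minimal illustration, not how any particular catalog or database implements search: the sample documents and the `build_index` helper are invented for the example.

```python
# Hypothetical sample documents, keyed by document ID.
documents = {
    1: "invoice from vendor for scanner",
    2: "purchase order for scanner",
    3: "vendor contract renewal",
}


def build_index(docs: dict) -> dict:
    """Map each word to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


index = build_index(documents)

# AND: documents containing both terms (set intersection).
scanner_and_vendor = index.get("scanner", set()) & index.get("vendor", set())
# OR: documents containing either term (set union).
scanner_or_contract = index.get("scanner", set()) | index.get("contract", set())
# NOT: documents containing the first term but not the second (set difference).
scanner_not_invoice = index.get("scanner", set()) - index.get("invoice", set())

print(scanner_and_vendor)   # {1}
print(scanner_or_contract)  # {1, 2, 3}
print(scanner_not_invoice)  # {2}
```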
BUSINESS PROCESS AUTOMATION
The use of computer-based information technology (specifically workflow technology) to automate the steps in a business process, coordinate the assignment and distribution of work items and information among individuals, and manage the completion of tasks, activities, and ultimately the business process as a whole.
BYTE
Eight bits of data grouped together to represent a character or some other computing data.
C
CHECK-IN
The process of uploading a new version of a document or object to a repository, usually accompanied by releasing the lock that the check-out owner had on the document.
CHECK-OUT
The process of downloading a document or object from a repository and placing a lock on the object. The lock prevents other users from checking out or otherwise modifying the document while checked out.
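The interplay between check-out and check-in can be sketched with a small in-memory repository. This is an assumption-laden illustration rather than the behavior of any specific product: the `Repository` class, its method names, and the document name are all hypothetical.

```python
class Repository:
    """Toy in-memory repository with version history and check-out locks."""

    def __init__(self):
        self._versions = {}  # document name -> list of contents, oldest first
        self._locks = {}     # document name -> user holding the lock

    def add(self, name, content):
        self._versions[name] = [content]

    def check_out(self, name, user):
        """Lock the document for one user and return its latest version."""
        holder = self._locks.get(name)
        if holder is not None:
            raise PermissionError(f"{name} is checked out by {holder}")
        self._locks[name] = user
        return self._versions[name][-1]

    def check_in(self, name, user, content):
        """Store a new version, release the lock, return the version number."""
        if self._locks.get(name) != user:
            raise PermissionError(f"{user} does not hold the lock on {name}")
        self._versions[name].append(content)
        del self._locks[name]
        return len(self._versions[name])


repo = Repository()
repo.add("policy.doc", "draft 1")
checked_out = repo.check_out("policy.doc", "alice")

try:
    repo.check_out("policy.doc", "bob")  # alice still holds the lock
    bob_blocked = False
except PermissionError:
    bob_blocked = True

new_version = repo.check_in("policy.doc", "alice", "draft 2")
after_release = repo.check_out("policy.doc", "bob")  # lock is free again
```

While the first user holds the lock, the second user's check-out fails; once the first user checks in, the lock is released and the second user receives the new version.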
COMPRESSION
A software or hardware process that reduces the size of an image or data file by removing the bits that define blank spaces and other redundant data and replacing them with a shorter code that represents the removed bits.
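Run-length encoding is one simple technique that fits this description: long runs of identical values, such as the blank areas of a scanned page, are replaced by a value and a count. The sketch below is illustrative only; it is not the scheme used by any particular image format, and the function names and sample row are made up.

```python
from itertools import groupby


def rle_encode(bits: str) -> list:
    """Collapse each run of identical characters into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(bits)]


def rle_decode(runs: list) -> str:
    """Expand (value, count) pairs back into the original string."""
    return "".join(value * count for value, count in runs)


row = "0000000011100000"  # one row of a bitonal image: mostly blank
encoded_row = rle_encode(row)
print(encoded_row)  # [('0', 8), ('1', 3), ('0', 5)]
restored_row = rle_decode(encoded_row)
```

Sixteen pixel values are reduced to three pairs, and decoding restores the row exactly, so nothing is lost.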
COMPUTER OUTPUT TO LASER DISK (COLD)
Technique used to transfer computer-generated output to optical disk.
CONFIDENCE VALUE
A number indicating the degree of probability, or confidence, that the classification of a given character is accurate.
CONTENT MANAGEMENT SYSTEM (CMS)
Similar to a Document Management System (DMS), a CMS is a computer application used to manage the process flows needed to collaboratively create, edit, review, index, search, publish and archive various kinds of digital information, such as scanned documents and rich media.
D
DATA CAPTURE
The process of collecting information and converting it into digital form for processing, either by manual data entry (word processing), Optical Character Recognition (OCR), imaging or recording.
DEDUPLICATION
Technology that reduces the amount of physical backup data by identifying and stripping out duplicate copies of data, either at the data source or target backup device.
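One common way to identify duplicate copies is to compare content hashes. The sketch below assumes that approach and uses SHA-256 from Python's standard library; the `deduplicate` function and the sample blocks are hypothetical and do not reflect how any specific backup product works.

```python
import hashlib


def deduplicate(blocks: list) -> tuple:
    """Store each distinct block once; keep a list of references for restore.

    Returns (store, refs): store maps a SHA-256 digest to the block bytes,
    and refs lists the digest of every input block in its original order.
    """
    store = {}
    refs = []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # only the first copy is kept
        refs.append(digest)
    return store, refs


blocks = [b"quarterly report", b"invoice 2041", b"quarterly report"]
store, refs = deduplicate(blocks)
restored = [store[digest] for digest in refs]

print(len(blocks), "blocks in,", len(store), "blocks stored")
# 3 blocks in, 2 blocks stored
```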
DESPECKLE
An image processing operation that cleans an image by removing flyspecks, or pepper, to improve OCR accuracy, legibility, and compressibility.
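A very simple form of despeckling removes any black pixel that has no black neighbors. The sketch below assumes a bitonal image held as a list of rows with 1 for black and 0 for white; production filters are typically more sophisticated (for example, removing small connected groups up to a size limit), and the function name is made up for the example.

```python
def despeckle(image: list) -> list:
    """Return a copy of a bitonal image with isolated black pixels cleared."""
    height = len(image)
    width = len(image[0]) if image else 0
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            # Look at the (up to) eight surrounding pixels.
            has_black_neighbor = any(
                image[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_black_neighbor:
                cleaned[y][x] = 0  # a lone speck: treat it as noise
    return cleaned


noisy = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],  # isolated speck
    [0, 0, 0, 1, 1],  # part of a real mark
    [0, 0, 0, 1, 1],
]
clean = despeckle(noisy)
```

The lone pixel in the second row is cleared, while the 2x2 mark in the lower right is kept because each of its pixels has black neighbors.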
DIGITIZED INFORMATION
Any data or information represented in numerical form-- the only language computers understand. Whether it is video, photos, images, or audio, more than 90 percent of all new information is now created in digital form.
DISPATCHER
[EMC|Captiva] An application that uses intelligent document recognition for the automatic classification, indexing, extraction, and routing of documents. Its recognition technology can extract data from all types of documents, including free-form data from less structured documents.
DISTRIBUTED CAPTURE
Enables organizations to capture and submit scanned documents from remote locations anywhere in the world via a simple Internet connection, allowing incoming data to be available and utilized across the enterprise. This means that labor-intensive tasks such as document scanning can be distributed to locations where they can be performed most economically.
DOCUMENT
A collection of data organized into some logical order and presented on paper.
DOCUMENT IMAGING
The process in which paper-based information, such as documents, is scanned, converted into a digital format (TIFF, GIF, etc.), and imported into an Electronic Document Management System (DMS). Also referred to as Document Capture.
DOCUMENT MANAGEMENT SYSTEM (DMS)
A computer system, or set of programs, that tracks and stores electronic documents or images of paper documents-- often viewed as an integral component of Enterprise Content Management. See ECM.
DOCUMENT PREPARATION
Steps taken to prepare documents for scanning, such as removing them from envelopes, pre-sorting them by category, and removing paper clips, staples, and bindings.
DOCUMENT RETRIEVAL
The ability to search, locate, select and immediately display a document from a repository or storage medium.
DOTS PER INCH (DPI)
A measurement of output device resolution and quality based on the number of dots a printer can print per inch, both horizontally and vertically. The higher the DPI, the higher the resolution and detail of the document image. See Image Resolution.
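To make the relationship between DPI and image detail concrete, the sketch below computes the pixel dimensions and approximate uncompressed size of a scanned page. The page size, resolution, and function names are assumptions chosen for the example, and the size estimate ignores file headers and row padding.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple:
    """Pixels across and down for a page of the given size in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Approximate raw image size, rounded up to whole bytes."""
    return (width_px * height_px * bits_per_pixel + 7) // 8


# A US Letter page (8.5 x 11 inches) scanned at 300 dpi.
width_px, height_px = pixel_dimensions(8.5, 11, 300)
bitonal_size = uncompressed_bytes(width_px, height_px, 1)   # 1 bit per pixel
color_size = uncompressed_bytes(width_px, height_px, 24)    # 24-bit color

print(width_px, height_px)  # 2550 3300
print(bitonal_size)         # 1051875
print(color_size)           # 25245000
```

Doubling the DPI doubles both dimensions, so the raw size grows by a factor of four.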
DROP-OUT INK
Specially colored ink which is invisible to scanners, commonly used for printing forms. The result is that the scanner only recognizes the hand-printed or machine-printed data that is filled in the fields of the form, which facilitates extraction.
E
ENCRYPTION
A security measure that uses an algorithm to scramble sensitive information and digitized data, such as account numbers or transaction information, into a coded form in order to prevent unauthorized use of or access to the data.
ENTERPRISE CONTENT MANAGEMENT (ECM)
The technologies and strategies used by an organization to capture, manage, store, preserve, and analyze all its information assets, both structured and unstructured, throughout their lifecycle.
F
FIBER OPTICS
The superhighway of the information age, this technology uses a modulated beam of light for moving enormous quantities of data through flexible glass wires, capable of moving entire libraries in seconds.
FORMS PROCESSING
The ability of software to read a scanned form, extract the data into a computer-readable format, and export it to back-end systems or databases-- often eliminating the need for manual data entry. Forms processing typically uses zonal OCR and “drop-out ink” to improve recognition accuracy.
G
GIF
An image file format, introduced by CompuServe, that supports up to 256 distinct colors per image and uses LZW compression, a scheme formerly patented by Unisys. It is mainly used in Internet applications for 8-bit images. Also supports animation.
I
IMAGE RESOLUTION
The fineness or coarseness of an image as it was digitized, measured as dots-per-inch (dpi), typically from 200 to 400 dpi.
INPUTACCEL
[EMC|Captiva] An image capture application suite that enables the scanning of paper documents and the conversion of the scanned images into information that can then be stored as data in a repository.
INTELLIGENT CHARACTER RECOGNITION (ICR)
Advanced form of OCR technology that may include sophisticated capabilities such as learning fonts during processing or using context to strengthen probabilities, and improve accuracy and recognition. See Dispatcher.
INTEROPERABILITY
The ability of one company’s hardware or software to seamlessly talk to and exchange data with other companies’ systems or devices.
ISIS
ISIS scanner drivers are the de facto standard for document imaging scanner control. They allow you to take full advantage of the built-in power of your scanner by ensuring that it runs at its rated speed or higher.
M
METADATA
Literally, “data about the data”-- be it a document, image, dataset or other resource. Metadata is valuable in the effective storage and retrieval of documents and information. The better the quality of the metadata, the more easily the resource it describes can be discovered.
N
NAS
Network-Attached Storage (NAS) is a hard disk storage device that is set up with its own network address rather than being attached directly to the computer that is serving applications or files to a network's users. This makes it possible for many users, usually on a Local Area Network, to share information contained in files. See also SAN.
O
OPTICAL CHARACTER RECOGNITION (OCR)
Technology that searches scanned documents for printed and other symbolic information and converts it into character codes, such as ASCII, that can then be manipulated and edited as text. OCR can also be used for the indexing of documents for subsequent search and retrieval. See Dispatcher.
R
RECORDS MANAGEMENT
The practice of maintaining an organization’s important records from the time they are created up to their eventual disposal, including the classifying, storing, securing, and compliant preservation or destruction of records.
REPOSITORY
In document management, a repository is a central place where document images and their associated metadata can be stored, maintained, accessed and analyzed.
REJECTION RATE
The rate or percentage of characters or fields that an ICR system determines are too ambiguous to attempt to identify correctly.
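Rejection is often driven by the confidence value assigned to each character: anything scoring below a chosen threshold is flagged rather than guessed. The sketch below assumes that approach; the confidence figures, the 0.70 threshold, and the function name are invented for the example.

```python
def rejection_rate(confidences: list, threshold: float) -> float:
    """Fraction of characters whose confidence falls below the threshold."""
    if not confidences:
        raise ValueError("no characters to evaluate")
    rejected = sum(1 for value in confidences if value < threshold)
    return rejected / len(confidences)


# Hypothetical per-character confidence values from a recognition engine.
confidences = [0.99, 0.97, 0.42, 0.88, 0.65]
reject_rate = rejection_rate(confidences, threshold=0.70)
print(reject_rate)  # 0.4
```

Raising the threshold rejects more characters, which usually sends more work to manual review but reduces the number of characters that are accepted incorrectly.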
RICH CONTENT
Digitized information other than text-- including photos, music files, video, digital movies, etc.
S
SCALABLE
The ability of a system to be expanded in size and functionality to accommodate the continued growth and evolution of an organization.
SAN or STORAGE AREA NETWORK
A network of information storage devices and the computers that access them, designed to give users access to whatever information they need, whenever they need it, providing speed and scalability.
SAN vs. NAS
A long-running debate that EMC considers obsolete since its introduction of Celerra with HighRoad software, which is designed to deliver the benefits of both. EMC believes all information storage should be networked.
SCANNING
The process of transforming documents into digital form by creating a two-dimensional image of the document, whereby the surface of a document is analyzed by a beam of light for characters and graphics. Also referred to as imaging or document capture.
T
TRANSACTIONAL CONTENT
Term coined by Forrester Research to describe content that originates outside an organization and relies on internal workflows to drive business processes. Typically, this content initiates transactions between companies and can include orders and invoices from customers, vendors, suppliers, and other business partners.
TERABYTE
A terabyte is a measure of computer storage equal to 1,024 gigabytes. The prefix “tera” means trillion, although in a computer’s binary system a terabyte is actually 1,099,511,627,776 bytes.
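The figure above can be checked with a few lines of arithmetic. The sketch below uses the binary convention given in this entry (each unit is 1,024 times the previous one) and compares it with an even trillion; the constant names are just labels for the example.

```python
KILOBYTE = 1024
MEGABYTE = 1024 * KILOBYTE
GIGABYTE = 1024 * MEGABYTE
TERABYTE = 1024 * GIGABYTE  # 1,024 gigabytes, as defined in this entry

TRILLION = 10**12

print(f"{TERABYTE:,}")             # 1,099,511,627,776
print(TERABYTE == 2**40)           # True
print(f"{TERABYTE - TRILLION:,}")  # 99,511,627,776
```

The binary terabyte is roughly 10 percent larger than an even trillion bytes.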
W
WORKFLOW
A work process involving a series of distinct steps that must be executed in a particular order. Workflow often involves tracking and managing documents as they progress from entry into the system, through the various departments in the organization, to their final destination.
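The idea of steps that must be executed in a particular order can be sketched as a small tracker that refuses out-of-order work. The step names, the `WorkflowItem` class, and the document name below are hypothetical and are not taken from any workflow product.

```python
class WorkflowItem:
    """Tracks one document through a fixed, ordered series of steps."""

    # Hypothetical steps for the example; a real workflow defines its own.
    STEPS = ("scanned", "indexed", "reviewed", "archived")

    def __init__(self, name: str):
        self.name = name
        self.completed = []

    @property
    def done(self) -> bool:
        return len(self.completed) == len(self.STEPS)

    def complete(self, step: str) -> None:
        """Record a step, but only if it is the next one in the sequence."""
        if self.done:
            raise ValueError(f"{self.name} has already finished the workflow")
        expected = self.STEPS[len(self.completed)]
        if step != expected:
            raise ValueError(f"expected '{expected}' next, got '{step}'")
        self.completed.append(step)


item = WorkflowItem("invoice-001")
item.complete("scanned")

try:
    item.complete("reviewed")  # skips "indexed", so it is refused
    skipped_step_allowed = True
except ValueError:
    skipped_step_allowed = False

for step in ("indexed", "reviewed", "archived"):
    item.complete(step)

print(item.completed)  # ['scanned', 'indexed', 'reviewed', 'archived']
```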
WIRELESS
Electronic communication using radio or light waves rather than hard wiring between systems or devices. The continued proliferation of wireless devices greatly accelerates the need for intelligent information storage.