Data Glossary

Office of Data Analytics and Business Intelligence | June 24, 2021
This glossary covers terms commonly used when communicating about and working with data day to day.
Application
An application is a piece of software, designed to run on the web or on mobile phones, that connects to large databases. Applications are one way of consuming open data and are often real-time, personalized, and location-specific.
Application Programming Interface (API)
A computing interface that defines interactions between multiple software applications or mixed hardware-software intermediaries. In simple terms, an API can be used to connect to data and extract it. The Dallas Open Data Portal (ODP) provides an API for every published dataset, so data can be extracted and analyzed without manually downloading it from the portal.
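As a hedged illustration of how an API request might be put together, the sketch below builds a query URL and defines a helper that would fetch rows as JSON. The host name, the `/resource/<id>.json` path, the `$limit` parameter, and the dataset identifier `abcd-1234` are all assumptions made for illustration, not values taken from this glossary; the portal's own API documentation gives the real endpoint.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical values: replace with the real host and dataset identifier.
BASE_URL = "https://data.example.org/resource"
DATASET_ID = "abcd-1234"


def build_query_url(dataset_id, limit=5):
    """Return a URL asking for the first `limit` rows of a dataset as JSON."""
    params = urlencode({"$limit": limit})
    return f"{BASE_URL}/{dataset_id}.json?{params}"


def fetch_rows(url):
    """Download the URL and decode the JSON body (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


url = build_query_url(DATASET_ID, limit=5)
print(url)
```

Only the URL is built here; calling `fetch_rows(url)` would perform the actual download once a real endpoint is substituted.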
Attribute
A characteristic of a geographic feature, typically stored in tabular format and linked to the feature in a relational database. For example, the attributes of a feature represented by a point might include an identification number, address, and type.
Base Layer
A primary layer for spatial reference, upon which other layers are built. Typical base layers are parcels or street centerlines.
Buffer
A zone of a specified distance around a feature.
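A minimal sketch of a buffer test, assuming a point feature and planar x,y coordinates measured in the same unit as the buffer distance. The function name and the sample coordinates are hypothetical; GIS software also buffers lines and polygons, which this sketch does not cover.

```python
import math


def within_buffer(feature, point, distance):
    """Return True if `point` lies within `distance` of a point feature."""
    dx = point[0] - feature[0]
    dy = point[1] - feature[1]
    return math.hypot(dx, dy) <= distance


# Hypothetical feature at (100, 200) with a 50-unit buffer.
station = (100.0, 200.0)
inside = within_buffer(station, (130.0, 240.0), 50.0)   # 50 units away
outside = within_buffer(station, (160.0, 200.0), 50.0)  # 60 units away
print(inside, outside)
```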
Catalog
A catalog is a collection of datasets, maps, and other visuals. The Dallas Open Data Portal catalog has three broad categories: economy, services, and public safety. All datasets are cataloged under one of these three categories.
Category
Methodology by which items or datasets are classified or grouped under a similar theme or topic. Also referred to as Taxonomy.
Comma Separated Values File (CSV)
A standard format for spreadsheets where data is stored in a plain text file, with each data row on a new line and commas separating the values on each row.
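A short sketch of reading CSV text with Python's standard `csv` module. The column names and rows are made up for illustration. The second row shows how a value that itself contains a comma is wrapped in double quotes.

```python
import csv
import io

# Hypothetical CSV content: a header row followed by two data rows.
text = (
    "id,address,type\n"
    "1,100 Main St,hydrant\n"
    '2,"200 Elm St, Suite 5",sign\n'
)

# DictReader uses the header row as the keys for each data row.
rows = list(csv.DictReader(io.StringIO(text)))
for row in rows:
    print(row["id"], row["address"], row["type"])
```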
Computer Aided Design (CAD)
An automated system for the design, drafting and display of graphically oriented information.
Confidential Data
Data that should not be disclosed because disclosure is prohibited by law or could be detrimental to the City or to other individuals or entities. Also referred to as Protected Data.
Coordinate
An x,y location in a Cartesian coordinate system or an x,y,z coordinate in a three-dimensional system. Coordinates represent locations on the Earth’s surface relative to other locations.
Data
Data can be defined broadly as information collected on a specific subject. Data can be divided further into structured data (organized in rows and columns) and unstructured data (such as images). Data includes numbers or information located in tables, graphs, maps, images, and other forms. Data can be used to understand the spatial and temporal trends of a phenomenon, to identify the factors associated with it, and to predict its future behavior.
Data Cleaning or Scrubbing
Data in its raw form needs to be cleaned and processed to eliminate errors. Data scrubbing can take as much as 80% of an analyst's time in the entire analytical process.
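The sketch below illustrates a few common cleaning steps on made-up records: trimming stray whitespace, normalizing capitalization, parsing numbers stored as text, and dropping blank or duplicate rows. The field names and the rules themselves are assumptions; real cleaning rules depend on the dataset.

```python
# Hypothetical raw records with typical problems.
raw = [
    {"name": "  Oak Park ", "visitors": "1,200"},
    {"name": "oak park", "visitors": "1,200"},
    {"name": "Elm Plaza", "visitors": ""},
    {"name": "Pine Square", "visitors": "350"},
]


def clean(records):
    """Trim and normalize names, parse counts, drop blanks and duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        name = " ".join(record["name"].split()).title()
        count_text = record["visitors"].replace(",", "").strip()
        if not count_text.isdigit():
            continue  # skip rows with a missing or non-numeric count
        key = (name, int(count_text))
        if key in seen:
            continue  # skip rows that duplicate an earlier row once normalized
        seen.add(key)
        cleaned.append({"name": name, "visitors": int(count_text)})
    return cleaned


cleaned = clean(raw)
print(cleaned)
```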
Data Dictionary
A data dictionary is information that accompanies a data table and describes its variables and their names.
Data Portal
A data portal, in this case the City of Dallas Open Data Portal, is a platform where data and maps from the City of Dallas and other valid sources are published and shared with the public free of charge. Data on the Dallas Open Data Portal is regularly maintained and refreshed to keep it recent and relevant for users.
Data Schema
A specification that defines the structure of the data (required data elements and types, and supporting definitions).
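One way to picture a schema is as a mapping from required field names to expected types, against which each record can be checked. The sketch below assumes that simple form; the field names and the `validate` helper are hypothetical, and real schema languages are considerably richer.

```python
# Hypothetical schema: required field names mapped to expected Python types.
SCHEMA = {"id": int, "address": str, "latitude": float}


def validate(record, schema=SCHEMA):
    """Return a list of problems found in `record`; empty means it conforms."""
    problems = []
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"wrong type for {field}")
    return problems


good = validate({"id": 1, "address": "100 Main St", "latitude": 32.78})
bad = validate({"id": "1", "address": "100 Main St"})
print(good, bad)
```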
Data Story/Perspective Pages
Data stories or perspective pages (inside the Dallas Open Data Portal) are a very important part of data communication, particularly for those who are not familiar with a given dataset. Data stories make it easier for the audience to understand the objective of data collection, processing, and analysis, and they also help in understanding the conclusions of the analysis.
Data Table
A data table is data in tabular form, displayed in rows and columns.
Data Users
Any individual or organization that accesses, downloads, or analyzes data, or that uses data to develop apps, visualizations, reports, and other information products or services.
Database
A logical collection of interrelated information, managed and stored as a unit. A GIS database includes data about the spatial location and shape of geographic features recorded as points, lines, and polygons as well as their attributes.
Dataset
A dataset is any organized collection of data. Dataset is a flexible term and may refer to an entire database, a spreadsheet or other data file, or a related collection of data resources.
Digital Elevation Model (DEM)
Terrain elevation data provided in digital form.
Digitize
To encode map features as x,y coordinates in digital form. Lines are traced to define their shapes. This can be accomplished either manually or by use of a scanner.
File Format
The file format refers to the internal arrangement (format) of the file, not how it is displayed to users. For example, CSV, XLS, and JSON files are structured very differently but may look similar or identical when opened in a spreadsheet program. The format usually corresponds to the file name extension, the last part of the file name.
Flat Files
A flat file is an informal term for a single table of data from which all word processing or other structure characters or markup have been removed. A flat file stores data in plain text format. Because of their simple structure, flat files can only be read, stored and sent. CSV files are one of the most common types of flat files.
Geocode
The process of identifying a location by one or more attributes from a base layer.
Geographic Information System (GIS)
An organized collection of computer hardware, software, geographic data, and personnel designed to efficiently capture, store, update, manipulate, analyze, and display all forms of geographically referenced information.
Geospatial Data
Data related to the position of things in the real world, including boundaries or locations.
GitHub
GitHub is a code-hosting platform for version control and collaboration. It allows users to work together on projects from any location and supports open source development.
Global Positioning System (GPS)
A satellite-based system used to record x,y,z coordinates and other data. Ground locations are calculated from signals sent by satellites orbiting the Earth. GPS devices can be taken into the field to record data while walking, driving, or flying.
JavaScript Object Notation (JSON)
A simple format for data that can describe complex data structures, is both machine-readable and somewhat human-readable, is independent of platform and programming language, and has become a format for data exchange between apps, programs and computer systems.
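A brief sketch of how a nested record can be written out as JSON text and read back using Python's standard `json` module. The record contents are invented for illustration.

```python
import json

# Hypothetical record with a nested object and a list.
record = {
    "id": 7,
    "name": "Main Library",
    "tags": ["books", "wifi"],
    "location": {"lat": 32.78, "lon": -96.8},
}

text = json.dumps(record, indent=2)  # Python object -> JSON text
restored = json.loads(text)          # JSON text -> Python object
print(text)
```

Because the text form is independent of platform and language, the same string could be parsed by a program written in any language with a JSON library.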
Layer
A logical set of thematic data described and stored in a map library. Layers act as digital transparencies that can be laid atop one another for viewing or spatial analysis.
Line
Lines represent geographic features too narrow to be displayed as an area at a given scale, such as contours, street centerlines, or streams.
Metadata
Metadata is information about a dataset that makes the data easier to find or identify. Metadata includes the title and description, method of collection, limitations, author, publisher, time period covered, license, date and frequency of release. Metadata describes the dataset’s structure, data elements, its creation, access, format, and content.
Open Data Commons Attribution License (ODC-By)
A license agreement intended to allow users to freely share, modify, and use a database, provided that they attribute the source.
Ortho Imagery
Aerial photographs that have been rectified to produce an accurate image of the Earth by removing the tilt and relief displacements that occurred when the photo was taken.
Point
A single x,y coordinate that represents a geographic feature too small to be displayed as a line or area at that scale.
Polygon
A multisided figure that represents area on a map. Polygons have attributes that describe the geographic feature they represent.
Query
A request for data or information from a database table or combination of tables. This data may be generated as results returned by Structured Query Language (SQL) or as pictorials, graphs or complex results, e.g., trend analyses from data-mining tools.
Scale
The ratio or relationship between a distance or area on a map and the corresponding distance or area on the ground.
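As a worked example of a distance scale, the sketch below converts a distance measured on a map into the corresponding ground distance. The 1:24,000 scale and the 2 cm measurement are illustrative values, and the function name is hypothetical.

```python
def ground_distance(map_distance, scale_denominator):
    """Convert a map distance to ground distance, in the same unit."""
    return map_distance * scale_denominator


# 2 cm measured on a 1:24,000 map.
ground_cm = ground_distance(2, 24000)
ground_m = ground_cm / 100  # centimeters to meters
print(ground_cm, ground_m)
```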
Shapefile
The shapefile format is a popular geospatial vector data format for geographic information system (GIS) software. The shapefile format can spatially describe vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes.
Spatial
Spatial is a metadata term that means the dataset has locational (geographic) information such as coordinates, address, city, or ZIP code.
Spatial Analysis
The process of modeling, examining, and interpreting model results. Spatial analysis is useful for evaluating suitability and capability, for estimating and predicting, and for interpreting and understanding.
Structured Data
Structured data refers to information with a high degree of organization, making the data readily searchable by search engines. For publishing consistency, a tall table layout is preferred over a wide one.
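The "tall versus wide" remark can be read as a preference for one row per observation rather than one column per period or category; that reading is an assumption. Under that assumption, the sketch below reshapes a made-up wide table into a tall one. The helper `to_tall` and the field names are hypothetical.

```python
# Hypothetical wide table: one column per year.
wide = [
    {"district": "A", "2019": 10, "2020": 12},
    {"district": "B", "2019": 7, "2020": 9},
]


def to_tall(rows, id_field, var_name, value_name):
    """Turn every non-identifier column into its own row."""
    tall = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue
            tall.append(
                {id_field: row[id_field], var_name: column, value_name: value}
            )
    return tall


tall = to_tall(wide, "district", "year", "count")
for row in tall:
    print(row)
```

In the tall layout, adding another year adds rows rather than a new column, so the column structure stays the same from one release to the next.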
Structured Query Language (SQL)
A syntax for defining and manipulating data in a relational database. Developed by IBM in the 1970s, it has become an industry standard for query languages in most relational database management systems.
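A minimal sketch of defining, loading, and querying a table with SQL, run here through Python's built-in `sqlite3` module against an in-memory database. The table name, columns, and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database, nothing is saved

# Define a table, then insert a few hypothetical rows.
conn.execute(
    "CREATE TABLE requests (id INTEGER PRIMARY KEY, category TEXT, district INTEGER)"
)
conn.executemany(
    "INSERT INTO requests (category, district) VALUES (?, ?)",
    [("pothole", 1), ("graffiti", 2), ("pothole", 2)],
)

# Query: count the requests in each category.
counts = conn.execute(
    "SELECT category, COUNT(*) FROM requests GROUP BY category ORDER BY category"
).fetchall()
conn.close()
print(counts)
```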
Tag
A tag is a keyword or term assigned to a piece of information or a file. This type of metadata helps describe an item and allows it to be found by browsing or searching.
Theme
An ArcView theme stores map features as primary features (such as arcs, nodes, polygons, and points) and secondary features such as tics, map extent, links, and annotation. A theme usually represents a single geographic layer, such as soils, roads, or land use.
Unstructured Data
Unstructured data (or unstructured information) is information that either does not have a pre-defined data model or is not organized in a pre-defined manner, such as a flat file. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well.
Visualization
A visual representation of data, such as a chart, graph, or dashboard. Visualization is often the easiest way of communicating with data and bringing out its key features. Many visualization tools exist, such as Google Charts, Excel, ArcGIS, Tableau, and Power BI. Creating a visualization of a dataset requires careful attention to the meaning of the variables, the relations between them, and the story they tell.
XML
Extensible Markup Language (XML) is a flexible file format designed to store, transport, and share data over the Internet. XML is both human- and machine-readable.
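A short sketch of reading XML with Python's standard `xml.etree.ElementTree` module. The element names, attributes, and values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document describing two parks.
text = """<parks>
  <park id="1"><name>Oak Park</name><acres>12.5</acres></park>
  <park id="2"><name>Elm Plaza</name><acres>3.0</acres></park>
</parks>"""

root = ET.fromstring(text)

# Collect each <park> element's attribute and child values into a dict.
parks = [
    {
        "id": park.get("id"),
        "name": park.findtext("name"),
        "acres": float(park.findtext("acres")),
    }
    for park in root.findall("park")
]
print(parks)
```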