Information Architecture is the art and science of labeling and organizing information so that it is findable, manageable and useful. Its use has expanded enormously in recent years: from websites to intranets, Content Management Systems and Knowledge Management Systems, then to Data Analytics (data preparation for analytics and machine learning) and to defining data structures for your IoT devices (sensors).
Information Architecture provides the methods and tools for organizing, labeling, building relationships (through associations) and describing (through metadata) your unstructured content. This adds that content to your overall pool of data available for analysis. Information architecture enables data analytics to rapidly explore and analyze any combination of structured and unstructured sources. Big Data requires information architecture to exploit relationships and synergies between information, aligning unstructured and structured data. This infrastructure lets organizations make decisions using the full spectrum of their big data sources.
To include unstructured data (content) in analysis, you must use a metadata schema. The schema is an essential artifact developed as part of delivering the information architecture. A sound information architecture lets you impose a consistent structure on big data so that this data provides value to the organization. Data preparation for machine learning leverages information architecture principles and supports data integration, analytics and data science use cases.
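The Python sketch below illustrates the metadata schema described above. It treats a schema as a list of fields. Some fields are required, and some are restricted to a controlled vocabulary. The code checks a document's metadata against that schema. The field names and vocabularies are hypothetical assumptions, not a standard schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One field in a hypothetical metadata schema."""
    name: str
    required: bool = True
    allowed: frozenset[str] | None = None  # None means free text


# Hypothetical schema: field names and vocabularies are illustrative only.
SCHEMA = (
    FieldSpec("title"),
    FieldSpec("owner"),
    FieldSpec("content_type", allowed=frozenset({"policy", "report", "email", "sensor_log"})),
    FieldSpec("topic", allowed=frozenset({"finance", "hr", "operations"})),
    FieldSpec("review_date", required=False),
)


def validate(metadata: dict[str, str], schema: tuple[FieldSpec, ...] = SCHEMA) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for spec in schema:
        value = metadata.get(spec.name)
        if value is None:
            if spec.required:
                problems.append(f"missing required field '{spec.name}'")
            continue
        # Controlled-vocabulary check keeps labels consistent across sources.
        if spec.allowed is not None and value not in spec.allowed:
            problems.append(f"'{value}' is not an allowed value for '{spec.name}'")
    known = {spec.name for spec in schema}
    for name in metadata:
        if name not in known:
            problems.append(f"unknown field '{name}'")
    return problems


doc = {"title": "Q3 supplier review", "owner": "procurement",
       "content_type": "report", "topic": "finance"}
print(validate(doc))  # []
```

A record that passes validation can be joined with structured data on shared fields such as `topic` or `owner`. A failing record is returned together with a readable list of what to fix.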
Applying information architecture processes and tools in data preparation is essential to training your machine learning algorithms with the entirety of your available unstructured data sources. Some of the essential elements of the information architecture process as it pertains to data preparation include:
Content Consumption, which uses a content audit to establish the universe of relevant content (unstructured data). This contributes directly to the volume of available content.
Content Generation, which fills the gaps identified in the content audit by gathering the requirements for content creation/generation. This in turn directly increases the amount of content available in the organization’s data resources.
Content Organization, which provides business rules that identify relationships between content items. It also creates a metadata schema for assigning characteristics to all content. This increases the volume of data available and, in some cases, uses existing data to assign metadata values.
Content Access, which enables machine learning algorithms to access and process content. This contributes to the volume of data by establishing the access parameters. It often adds metadata fields and values that enhance data analytics.
Content Governance, which establishes accountability for the accuracy, consistency and timeliness of content, content relationships, metadata and taxonomy (as well as ontology development). This accountability covers the areas of the enterprise and the applications in use. Content Governance often “prunes” the content available in the organization’s data resources. It allows access only to pertinent, relevant content and deletes or archives the rest.
Content Quality of Service, which focuses on the security, availability, scalability and usefulness of content. It improves the overall quality of the content in the organization’s data resources by:
– defending content from unauthorized access, use, disclosure, disruption, modification, perusal, inspection, recording or destruction
– eliminating or minimizing disruptions from planned system downtime
– making sure that the content accessed comes from, or is based on, the authoritative or trusted source; that it is reviewed regularly (according to the specific governance policies) and modified when needed; and that it is archived when it becomes obsolete
– enabling content to behave the same regardless of the application or tool that uses it, and making it flexible enough for use at both the enterprise and the local level without changing its meaning, intended use or function
– tailoring content to its specific audience and ensuring that it serves a distinct purpose and is helpful and practical
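Three of the elements above can be sketched as a small Python pipeline: Content Consumption (the audit), Content Organization (rule-based metadata) and Content Governance (pruning). This is a minimal illustration under stated assumptions. The `ContentItem` fields, the keyword `RULES` and the 365-day review window are hypothetical. They are not part of any standard or tool.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from datetime import date, timedelta
from pathlib import PurePosixPath


@dataclass
class ContentItem:
    """One piece of unstructured content found during the audit (hypothetical fields)."""
    path: str
    text: str
    last_reviewed: date
    metadata: dict[str, str] = field(default_factory=dict)


def audit(items: list[ContentItem]) -> dict[str, int]:
    """Content audit: count items by file extension as a rough proxy for content type."""
    counts = Counter(PurePosixPath(i.path).suffix.lstrip(".").lower() or "unknown" for i in items)
    return dict(counts)


# Hypothetical business rules: (keyword in text, metadata field, value to assign).
RULES = [
    ("invoice", "topic", "finance"),
    ("onboarding", "topic", "hr"),
    ("temperature", "content_type", "sensor_log"),
]


def apply_rules(item: ContentItem, rules=RULES) -> ContentItem:
    """Content organization: derive metadata values from the existing text."""
    text = item.text.lower()
    for keyword, name, value in rules:
        if keyword in text:
            # setdefault keeps any value already assigned, so earlier rules win.
            item.metadata.setdefault(name, value)
    return item


def prune(items: list[ContentItem], today: date,
          max_age: timedelta = timedelta(days=365)) -> tuple[list[ContentItem], list[ContentItem]]:
    """Content governance: keep recently reviewed items, flag the rest for archiving."""
    keep, archive = [], []
    for item in items:
        (keep if today - item.last_reviewed <= max_age else archive).append(item)
    return keep, archive
```

The code uses keyword matching only because it is easy to read. The same structure would work with rules driven by the organization's own taxonomy.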
The information architecture also needs to include additional types of data. One example is semi-structured data, such as readings from sensors (for example RFID), location information from mobile devices, web logs, documents and emails. These data are often produced in far greater volume and at much higher rates than classical transactional data. Enterprises need to be able to manage these new types of data and incorporate them into their overall information architecture framework.
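One common way to fold semi-structured sources into an existing architecture is to map each source onto one shared record envelope before analysis. The Python sketch below is illustrative only. The envelope keys (`source`, `timestamp`, `entity`, `attributes`) and the JSON shape of the RFID read are hypothetical assumptions. The web-log parser follows the widely used Common Log Format.

```python
from __future__ import annotations

import json
import re
from datetime import datetime, timezone

# Hypothetical common envelope: every normalized record has exactly these keys.
ENVELOPE_KEYS = ("source", "timestamp", "entity", "attributes")


def from_rfid(raw: str) -> dict:
    """Map an RFID read in an assumed JSON shape: {"tag": ..., "reader": ..., "ts": <epoch seconds>}."""
    msg = json.loads(raw)
    return {
        "source": "rfid",
        "timestamp": datetime.fromtimestamp(msg["ts"], tz=timezone.utc),
        "entity": msg["tag"],
        "attributes": {"reader": msg["reader"]},
    }


# Common Log Format: host ident user [time] "request" status size
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def from_web_log(line: str) -> dict:
    """Map one Common Log Format line onto the same envelope."""
    m = CLF.match(line)
    if m is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    return {
        "source": "web_log",
        "timestamp": datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "entity": m["host"],
        "attributes": {"request": m["request"], "status": int(m["status"])},
    }


record = from_web_log(
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
)
```

After normalization, sensor reads and log lines share one timestamped shape. They can then be tagged with the same metadata schema and governed like any other content.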
Implementing an Information Architecture (IA) will:
- Drive a user-centric taxonomy/ontology, metadata and associated keywords to enable consistent labeling, organization and categorization of your content.
- Enable machine learning algorithms to exploit relationships and synergies between your data and facilitate your organization’s ability to make decisions utilizing the full spectrum of your data sources.
- Provide, for IoT, a viable way to structure content so that it is represented in a flexible, object-oriented fashion.
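The taxonomy and object-oriented points above can be illustrated together. Each device class inherits a common interface, and every reading it emits is labeled with a path in a shared taxonomy. This is a hypothetical Python sketch. The taxonomy terms, class names and fields are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:
    """A term in a hypothetical controlled taxonomy, linked to its parent term."""
    label: str
    parent: TaxonomyNode | None = None

    def path(self) -> list[str]:
        """Walk up to the root and return labels from broadest to narrowest."""
        node, labels = self, []
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return list(reversed(labels))


ASSETS = TaxonomyNode("Assets")
SENSORS = TaxonomyNode("Sensors", ASSETS)
ENVIRONMENTAL = TaxonomyNode("Environmental", SENSORS)
LOCATION = TaxonomyNode("Location", SENSORS)


@dataclass
class Device:
    """Base class: every device is classified by one taxonomy node."""
    device_id: str
    category: TaxonomyNode
    metadata: dict[str, str] = field(default_factory=dict)

    def describe(self, reading) -> dict:
        """Label a raw reading with the device's taxonomy path and metadata."""
        return {"device": self.device_id,
                "category": "/".join(self.category.path()),
                **self.metadata,
                "reading": reading}


@dataclass
class TemperatureSensor(Device):
    unit: str = "C"

    def describe(self, reading: float) -> dict:
        record = super().describe(reading)
        record["unit"] = self.unit  # subclass adds its own attribute
        return record


@dataclass
class GpsTracker(Device):
    def describe(self, reading: tuple[float, float]) -> dict:
        lat, lon = reading
        return super().describe({"lat": lat, "lon": lon})


probe = TemperatureSensor("t-01", ENVIRONMENTAL, {"site": "warehouse-2"})
print(probe.describe(21.5))
```

Adding a new device type only requires a new subclass and, if needed, a new taxonomy node. Existing consumers keep reading the same `device`, `category` and `reading` keys.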