Field of the Invention
[0001] The present invention generally relates to managing data associated with computer-generated
documents. More particularly, the present invention relates to a data store for storing
and relating data associated with computer-generated documents in a separate location
from presentation data for a document's typical presentation format.
Background of the Invention
[0002] With the advent of the computer age, computer and software users have grown accustomed
to user-friendly software applications that help them write, calculate, organize,
prepare presentations, send and receive electronic mail, make music, and the like.
For example, modern electronic word processing applications allow users to prepare
a variety of useful documents. Modern spreadsheet applications allow users to enter,
manipulate, and organize data. Modern electronic slide presentation applications allow
users to create a variety of slide presentations containing text, pictures, data or
other useful objects.
[0003] According to prior methods and systems, documents created by such applications (e.g.
word processing documents, spreadsheets, slide presentation documents) have limited
facility for storing/transporting the contents of arbitrary metadata required by the
context of the documents. For example, a solution built on top of a word processing
document may require the storage of workflow data that describes various states of
the document, such as previous workflow approval states (dates, times, names),
current approval states, future workflow states before completion, name and office
address of document author, document changes, and the like. According to such prior
methods and systems, the options for storing this information were primarily limited
to the use of document variables or existing custom object linking and embedding (OLE)
document properties that have several limitations. For example, such prior methods
can only store name/value pairs (no hierarchical data). Such methods are limited to
255 characters maximum. Such methods are built to contain only text. All properties
for such methods are stored in a single store, for example, an OLE properties store,
which means the properties have a possibility of conflicting. Further, such stored
properties have no data validation because they are plain text. The result of these
limitations is that it is difficult for users of such applications and related documents
to store arbitrary data with documents, which is a common need of many users.
[0004] Another problem with prior methods and systems is that structured markup language
data, for example Extensible Markup Language (XML) data, may not be concurrently edited
by multiple clients (for example, multiple add-ins each independently running in the
context of a word processing document). However, in the context of many documents,
there is a higher likelihood that the scenarios involving this metadata will require
concurrent editing by one or more sources.
[0005] Accordingly, there is a need for a data store for storing and relating data associated
with a computer-generated document and for allowing use and manipulation of such data
by one or more software applications. It is with respect to these and other considerations
that the present invention has been made.
Summary of the Invention
[0006] Embodiments of the invention solve the above and other problems by providing a data
store within the document, yet separate in location (and possibly format) from the
primary presentation storage location for storing, relating and for allowing use of
data associated with a computer-generated document.
[0007] According to one aspect of the invention, data for structuring information associated
with a document, such as document metadata, is maintained in a data store where relationships
between different pieces of data are maintained. The data store exposes interfaces
to the various pieces of data in the data store for allowing different applications
to access and operate on one or more of the data pieces.
[0008] According to another aspect of the invention, the pieces of data are structured according
to a markup language such as the Extensible Markup Language (XML). XML schemas may
be associated with each piece of data, and the data store may validate the XML structure
applied to the data based on an XML schema associated with a given piece of data.
According to this aspect of the invention, documents may contain any number of arbitrary
data items, for example metadata, structured according to the Extensible Markup Language
(XML). Accordingly, document solution providers may store arbitrary metadata as XML
with a given document and have that information automatically processed by a given
solution having access to the data when the document is opened/edited/saved by a user.
[0009] According to another aspect of the invention, programmatic access is provided to the
data in its XML form while the document is being edited. Thus, a standard mechanism
is provided that is familiar to solution developers via which the data may be accessed
and modified programmatically while the document is open. This programmatic access
mimics standard XML interfaces. Programmatic access to the data is provided via application
programming interfaces to one or more editing client applications (for example, document
editing or creating applications and/or third party application add-in solutions,
and the like). According to this aspect, multiple client applications may access and
edit the same piece of document data, and any conflicting changes to a given piece
of data are resolved. "Side effects" to any given change may be made (for example,
in response to setting a company name to "Microsoft," changing a stock symbol to "MSFT").
In addition, changes to data and any associated side effects may be "bundled" by the
data store so that undoing one or more changes reverses all related changes. This
removes the burden of development from the solution itself to ensure that it has reversed
all changes when the user initiates an undo of the original change from the document
surface, for example, by pressing an Undo command.
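The following Python sketch illustrates the bundling idea only; it is not the implementation described here. It assumes a hypothetical `UndoBundle` helper and a hypothetical company-to-symbol table, and shows a change and its side effect being recorded together so that a single undo reverses both.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class UndoBundle:
    """Hypothetical helper: records (element, previous text) pairs so that a
    change and its side effects can be undone as one unit."""
    entries: list = field(default_factory=list)

    def set_text(self, elem, value):
        self.entries.append((elem, elem.text))
        elem.text = value

    def undo(self):
        # Restore in reverse order so the original state is recovered exactly.
        for elem, old in reversed(self.entries):
            elem.text = old
        self.entries.clear()


# Hypothetical lookup table, for illustration only.
SYMBOLS = {"Microsoft": "MSFT"}


def set_company(root, name, bundle):
    bundle.set_text(root.find("company"), name)
    # Side effect: keep the stock symbol consistent with the new company name.
    symbol = SYMBOLS.get(name)
    if symbol is not None:
        bundle.set_text(root.find("symbol"), symbol)


root = ET.fromstring("<props><company>Contoso</company><symbol>CTSO</symbol></props>")
bundle = UndoBundle()
set_company(root, "Microsoft", bundle)
print(root.findtext("symbol"))  # MSFT
bundle.undo()                   # one undo reverses both the change and its side effect
print(root.findtext("company"), root.findtext("symbol"))  # Contoso CTSO
```

Because the side effect is recorded in the same bundle as the originating change, the solution itself does not need to track what to reverse when the user presses Undo.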
[0010] According to another aspect of the invention, standard XML schemas (XSDs) may be
used to define the contents of any of the pieces of custom XML data associated with
document metadata in order to ensure that XML data applied to the document data are
valid. These schemas may be attached to any instance of XML data stored in the document,
and the data store will disallow any change to the XML data that would cause the
XML structure (that is, the XML tags as opposed to their contents) of that data to
become invalid. This ensures that the solution developer can attach a specific piece
of XML metadata to a document and ensure that the XML data will continue to be structurally
"correct" according to the associated schema, regardless of which processes (for example,
add-ins) are used to modify that data.
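The distinction between XML structure (tags) and content can be made concrete with a small Python sketch. It assumes, for illustration, that "structure" means the element tags, attribute names, and nesting, while text is content; the `structure` helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def structure(elem):
    """Return the tag skeleton of an element: its tag, its attribute names, and
    the skeletons of its children. Text content is deliberately ignored."""
    return (elem.tag, tuple(sorted(elem.attrib)), tuple(structure(c) for c in elem))


before = ET.fromstring("<author><name>Ann</name></author>")
content_edit = ET.fromstring("<author><name>Bob</name></author>")
structural_edit = ET.fromstring("<author><name>Ann</name><office/></author>")

print(structure(before) == structure(content_edit))     # True: only content changed
print(structure(before) == structure(structural_edit))  # False: a tag was added
```

Under this reading, only edits of the second kind would need to be checked against the attached schema.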
[0011] These and other features and advantages, which characterize the present invention,
will be apparent from a reading of the following detailed description and a review
of the associated drawings. It is to be understood that both the foregoing general
description and the following detailed description are exemplary and explanatory only
and are not restrictive of the invention as claimed.
Brief Description of the Drawings
[0012]
FIGURE 1 illustrates an exemplary computing device that may be used in one exemplary
embodiment of the present invention.
FIGURE 2 is a block diagram illustrating a relationship between one or more client
applications and a data store and the contents of the data store according to embodiments
of the present invention.
Detailed Description
[0013] As briefly described above, embodiments of the present invention are directed to
methods and systems for storing and relating data associated with a computer-generated
document and for efficiently allowing use and manipulation of data associated with
a computer-generated document by one or more software applications. These embodiments
may be combined, other embodiments may be utilized, and structural changes may be
made without departing from the spirit or scope of the present invention. The following
detailed description is therefore not to be taken in a limiting sense and the scope
of the present invention is defined by the appended claims and their equivalents.
[0014] With reference to FIGURE 1, one exemplary system for implementing the invention includes
a computing device, such as computing device 100. In a very basic configuration, computing
device 100 typically includes at least one processing unit 102 and system memory 104.
Depending on the exact configuration and type of computing device, system memory 104
may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.) or some
combination of the two. System memory 104 typically includes an operating system 105,
one or more applications 106, and may include program data 107. In one embodiment,
application 106 may include a word processor application 120. This basic configuration
is illustrated in FIGURE 1 by those components within dashed line 108.
[0015] Computing device 100 may have additional features or functionality. For example,
computing device 100 may also include additional data storage devices (removable and/or
non-removable) such as, for example, magnetic disks, optical disks, or tape. Such
additional storage is illustrated in FIGURE 1 by removable storage 109 and non-removable
storage 110. Computer storage media may include volatile and nonvolatile, removable
and non-removable media implemented in any method or technology for storage of information,
such as computer readable instructions, data structures, program modules, or other
data. System memory 104, removable storage 109 and non-removable storage 110 are all
examples of computer storage media. Computer storage media includes, but is not limited
to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile
disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic
disk storage or other magnetic storage devices, or any other medium which can be used
to store the desired information and which can be accessed by computing device 100.
Any such computer storage media may be part of device 100. Computing device 100 may
also have input device(s) 112 such as keyboard, mouse, pen, voice input device, touch
input device, etc. Output device(s) 114 such as a display, speakers, printer, etc.
may also be included. These devices are well known in the art and need not be discussed
at length here.
[0016] Computing device 100 may also contain communication connections 116 that allow the
device to communicate with other computing devices 118, such as over a network. Communication
connection 116 is one example of communication media. Communication media may typically
be embodied by computer readable instructions, data structures, program modules, or
other data in a modulated data signal, such as a carrier wave or other transport mechanism,
and includes any information delivery media. The term "modulated data signal" means
a signal that has one or more of its characteristics set or changed in such a manner
as to encode information in the signal. By way of example, and not limitation, communication
media includes wired media such as a wired network or direct-wired connection, and
wireless media such as acoustic, RF, infrared and other wireless media. The term computer
readable media as used herein includes both storage media and communication media.
[0017] A number of program modules and data files may be stored in the system memory 104
of the computing device 100, including an operating system 105 suitable for controlling
the operation of a networked personal computer, such as the WINDOWS operating systems
from MICROSOFT Corporation of Redmond, Washington. System memory 104 may also store
one or more program modules, such as word processor application 120, and others described
below. Word processor application 120 is operative to provide functionality for creating,
editing, and processing electronic documents.
[0018] According to one embodiment of the invention, the word processor application 120
comprises the WORD program from MICROSOFT Corporation. It should be appreciated, however,
that word processor application programs from other manufacturers may be utilized
to embody the various aspects of the present invention. It should further be appreciated
that illustration of a word processing application is for purposes of example only
and is not limiting of other types of applications that may produce and operate on
documents according to the present invention. For example, other application programs
106 which are capable of processing various forms of content (e.g. text, images, pictures,
etc.), such as spreadsheet application programs, database application programs, slide
presentation application programs, drawing or computer-aided application programs,
etc. are equally applicable to embodiments of the present invention. An example application
program 106 that produces and operates on a variety of different types of documents
includes OFFICE from MICROSOFT Corporation.
[0019] Embodiments of the invention may be implemented as a computer process, a computing
system, or as an article of manufacture such as a computer program product or computer
readable media. The computer program product may be a computer storage medium readable
by a computer system and encoding a computer program of instructions for executing
a computer process. The computer program product may also be a propagated signal on
a carrier readable by a computing system and encoding a computer program of instructions
for executing a computer process.
[0020] Throughout the specification and claims, the following terms take the meanings associated
herein, unless the context of the term dictates otherwise.
[0021] The term "data" may refer to document surface level or presentation level information
such as words, sentences, paragraphs and the like, as well as, supplementary information,
for example, metadata, which is carried with, referred to, or used by the word processing
document. This information is often large and is likely not exposed on the presentation
layer of the document.
[0022] The terms "markup language" or "ML" refer to a language for special codes within
a document that specify how parts of the document are to be interpreted by an application.
In a word processor file, the markup language specifies how the text is to be formatted
or laid out.
[0023] The term "element" refers to the basic unit of an XML document. The element may contain
attributes, other elements, text, and other content regions for an XML document.
[0024] The term "presentation" refers to the visible portion of the document - the text
and layout that would appear if the document were printed.
[0025] The term "tag" refers to a character inserted in a document that delineates elements
within an XML document. Each element can have no more than two tags: the start tag
and the end tag. It is possible to have an empty element (with no content) in which
case one tag is allowed.
[0026] The XML content between the tags is considered the element's "children" (or descendants).
Hence other elements embedded in the element's content are called "child elements"
or "child nodes" of the element. Text embedded directly in the content of the element
is considered the element's "child text nodes". Together, the child elements and the
text within an element constitute that element's "content".
[0027] The term "attribute" refers to an additional property set to a particular value and
associated with the element. Elements may have an arbitrary number of attribute settings
associated with them, including none. Attributes are used to associate additional
information with an element that will not contain additional elements, or be treated
as a text node.
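The terms defined above can be seen in a short Python example using the standard `xml.etree.ElementTree` module; the element and attribute names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# <author id="7"> is a start tag and </author> its end tag; id is an attribute;
# <name> is a child element whose child text node is "Ann Lee";
# <office/> is an empty element written with a single tag.
doc = ET.fromstring('<author id="7"><name>Ann Lee</name><office/></author>')

print(doc.tag)                       # author
print(doc.attrib)                    # {'id': '7'}
print([child.tag for child in doc])  # ['name', 'office']
print(doc.find("name").text)         # Ann Lee
print(len(doc.find("office")))       # 0 (no child elements)
```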
[0028] "XPath" is a query language that uses a pattern expression to identify nodes in an XML
document. An XPath pattern is a slash-separated list of child element names that describe
a path through the XML document. The pattern "selects" elements that match the path.
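As an illustration, Python's `xml.etree.ElementTree` supports a limited subset of XPath, enough to show a slash-separated pattern selecting matching elements. The sample metadata below is hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<metadata>"
    "<author><name>Ann Lee</name></author>"
    "<author><name>Raj Patel</name></author>"
    "<created>2005-01-15</created>"
    "</metadata>"
)

# The pattern "author/name" selects every <name> element that is a child
# of an <author> element directly under the root.
names = [e.text for e in root.findall("author/name")]
print(names)  # ['Ann Lee', 'Raj Patel']
```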
[0029] The term "XML data store" refers to a container within a document, such as a word
processor document, a spreadsheet document, a slide presentation document, etc., which
provides access for storage and modification of the data (in XML format, for example)
stored in the document while the file is open. Further definition of XML data store
is provided below with respect to Figure 2.
[0030] Figure 2 is a block diagram illustrating a relationship between one or more client
applications and a data store and the contents of the data store according to embodiments
of the present invention. Referring to Fig. 2, the document data 220 includes XML
structure data and associated document data representing the surface or presentation
level view of a document. For example the document data 220 may include XML structure
(e.g., heading tags, body tags, conclusion tags) and associated surface view data
(e.g., words, sentences, paragraphs) of a word processing document, spreadsheet document,
slide presentation document, and the like.
[0031] The data store 208 is a document data repository for storing one or more pieces of
structured data associated with one or more types of data associated with a given
document. The metadata1 225 (structured data item) may include XML structure data
and associated data for a first piece of metadata associated with the document. For
example, the metadata1 225 may include XML structure data (e.g., date tags, name tags,
etc.) applied to metadata listing the document author, date of document creation,
date of document last change/save, and the like. The metadata2 230 (structured data
item) may include XML structure data (tags) and associated metadata representing a
second piece of metadata associated with the document. As should be understood, the
metadata1 and metadata2 are for purposes of example and are not limiting of the variety
and number of different types of data that may be maintained in the data store 208
in association with a given document. For example, as described herein, arbitrary
data may be structured and added to the document by one or more software applications
as desired by solution providers or users having access to the document data.
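A data store holding several independent structured data items might be sketched in Python as follows. The `DataStore` class, its method names, and the integer item identifiers are hypothetical; they only illustrate that metadata1 and metadata2 are stored and addressed separately.

```python
import itertools
import xml.etree.ElementTree as ET


class DataStore:
    """Hypothetical in-memory document data store holding independent XML items."""

    def __init__(self):
        self._items = {}
        self._ids = itertools.count(1)

    def add_item(self, xml_text):
        item_id = next(self._ids)
        self._items[item_id] = ET.fromstring(xml_text)
        return item_id

    def get_item(self, item_id):
        return self._items[item_id]

    def remove_item(self, item_id):
        del self._items[item_id]

    def __len__(self):
        return len(self._items)


store = DataStore()
metadata1 = store.add_item("<docinfo><docauthor>Ann Lee</docauthor></docinfo>")
metadata2 = store.add_item("<workflow><state>approved</state></workflow>")
print(len(store))                                    # 2
print(store.get_item(metadata2).findtext("state"))   # approved
```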
[0032] Referring still to Fig. 2, a schema file 240, 245 may be attached to each piece of
data stored in the data store 208 for dictating the syntax and validation rules associated
with Extensible Markup Language (XML) data applied to each piece of data 225, 230.
As known to those skilled in the art, XML schema files provide a way to describe and
validate data in an XML environment. A schema file states what XML markup data, including
elements and attributes, are used to describe content in an XML document, and the
schema file defines XML markup syntax, including where each element is allowed, what
types of content are allowed within an element and which elements can appear within
other elements. The use of schema files ensures that the document (or individual piece
of data in this case) is structured in a consistent and predictable manner. Schema
files 240, 245 may be created by a user and generally supported by an associated markup
language, such as XML.
[0033] This schematization of the document allows the data store to provide the ability
to "guarantee" the structural validity of the document by rejecting any change that
violates a given schema file at the data store level. According to an embodiment,
the data store 208 utilizes a schema validation module 260 for validating XML structure
added to or changes made to a given piece of data against an associated schema file.
For example, if a document creator or editor makes XML structural changes to a given
piece of data, for example, the metadata1, wherein the editor adds or removes a given
XML tag, the data store 208 will utilize the schema validation module to check the
XML structural changes against the associated schema file to ensure the validity
of the change. If the change is not valid, an error can be generated to the editor.
As is understood, such control of the XML structure applied to a given piece of data
allows for structural consistency and predictability which is especially important
for allowing client and third party applications to interact with associated data.
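The following Python sketch shows the reject-on-invalid behavior of a schema validation module. The Python standard library has no XSD validator, so the sketch assumes a simplified stand-in schema (a map from each tag to its allowed child tags); `SCHEMA`, `validate`, and `apply_structural_change` are hypothetical names. A structural change is applied to a copy and committed only if the copy still validates.

```python
import copy
import xml.etree.ElementTree as ET

# Simplified stand-in for an attached XSD: each tag maps to the child tags it may contain.
SCHEMA = {
    "docinfo": {"docauthor", "doccreationdate"},
    "docauthor": set(),
    "doccreationdate": set(),
}


class SchemaError(ValueError):
    """Raised when a structural change would violate the schema."""


def validate(elem, schema):
    allowed = schema.get(elem.tag)
    if allowed is None:
        raise SchemaError(f"unknown element <{elem.tag}>")
    for child in elem:
        if child.tag not in allowed:
            raise SchemaError(f"<{child.tag}> not allowed in <{elem.tag}>")
        validate(child, schema)


def apply_structural_change(root, change, schema):
    """Apply `change` to a copy and return the copy only if it still validates."""
    candidate = copy.deepcopy(root)
    change(candidate)
    validate(candidate, schema)  # on failure, the caller's original tree is untouched
    return candidate


def add_creation_date(item):
    ET.SubElement(item, "doccreationdate").text = "2005-01-15"


def add_salary(item):
    ET.SubElement(item, "salary")


metadata1 = ET.fromstring("<docinfo><docauthor>Ann Lee</docauthor></docinfo>")
metadata1 = apply_structural_change(metadata1, add_creation_date, SCHEMA)
try:
    metadata1 = apply_structural_change(metadata1, add_salary, SCHEMA)
except SchemaError as err:
    print("rejected:", err)  # rejected: <salary> not allowed in <docinfo>
```

Validating a copy rather than the live item is one design choice; it keeps the stored data valid even if validation fails partway through.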
[0034] According to an embodiment of the invention, the data store 208 provides one or more
application programming interfaces (API) 270 which can be accessed by client applications
205 (e.g., word processing applications, spreadsheet applications, slide presentation
applications, etc.), as well as, third party applications 210, 215 via the object
models (OM) of the respective applications 205, 210, 215. These interfaces allow client
applications and third party applications to load any existing XML file into a given
document's data store 208, thus ensuring that that data is now part of the document
and will travel within that document for its lifetime (e.g., through opening/editing/saving/renaming/etc.)
or until the data is deleted from the data store. According to one embodiment, the
data in the data store is available in its XML format even when a source application
for a given piece of data 225, 230 is closed or is otherwise not available. That is,
a given piece of data 225, 230 may be accessed via the APIs 270 by other applications
(other than a source application). As described below, the APIs also allow client
and third party applications to make changes to the XML markup data applied to the
data items 225, 230.
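The Python sketch below assumes a hypothetical document package represented as a dictionary. It shows an existing XML fragment being loaded into a document's data store, saved in a part separate from the presentation data, and read back later by a different application without the source application. `load_into_store`, `save_document`, and `open_document` are illustrative names, not a real API.

```python
import xml.etree.ElementTree as ET


def load_into_store(store, xml_text):
    """Hypothetical API: parse existing XML and attach it to the document's store."""
    store.append(ET.fromstring(xml_text))
    return len(store) - 1


def save_document(presentation, store):
    # Presentation data and data-store items are kept in separate parts.
    return {"presentation": presentation,
            "items": [ET.tostring(e, encoding="unicode") for e in store]}


def open_document(package):
    return package["presentation"], [ET.fromstring(x) for x in package["items"]]


store = []
idx = load_into_store(store, "<docinfo><docauthor>Ann Lee</docauthor></docinfo>")
package = save_document("Quarterly report text", store)

# Later, another application opens the package; the source application is not needed.
_, reopened = open_document(package)
print(reopened[idx].findtext("docauthor"))  # Ann Lee
```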
[0035] Once XML data 225, 230 is loaded into the data store for association with a document
220, it can be manipulated as standard XML using the data store interfaces designed
to provide similar methods to existing XML editing interfaces in order to leverage
developers' existing knowledge of the XML programming standard. This allows users
to perform standard XML operations on XML data added to the data store for a document,
such as adding elements and attributes, removing elements and attributes, changing
the value of existing elements/attributes, and reading the values of any existing
part of the associated XML tree. Using these XML standard operations, solutions may
store structured complex metadata with a document subject to none of the previous
restrictions on the length/size of the data or structure of the data, which enables
the use of this XML data store for significantly more structured solutions than prior
solutions. For example, a third party application 215 may be written for locating
and extracting document author names and document creation dates from a number of
documents 204 by reading the metadata1 225 added to the data store 208 for each document.
The example third party application may be a spreadsheet application programmed for
making a list of document author names and document creation dates for all documents
created by a given organization. In accordance with embodiments of the present invention,
the third party application may utilize the XML structure applied to the metadata1
for efficiently locating and extracting the desired data. For example, the third party
application may be written to parse the XML structure of the metadata1 file to locate
XML tags, such as <docauthor> and <doccreationdate> for obtaining and using data associated
with those tags. As should be appreciated, the foregoing is just one example of the
many ways one or more applications may interact with structured data that is associated
with the document via the data store 208.
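The third party extraction example above could be sketched in Python as follows. The document names and metadata1 contents are hypothetical sample data; the sketch reads the `<docauthor>` and `<doccreationdate>` tags from each document's metadata and tolerates a missing date.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata1 items, one per document.
documents = {
    "plan": "<docinfo><docauthor>Ann Lee</docauthor>"
            "<doccreationdate>2005-01-15</doccreationdate></docinfo>",
    "memo": "<docinfo><docauthor>Raj Patel</docauthor>"
            "<doccreationdate>2005-02-03</doccreationdate></docinfo>",
    "notes": "<docinfo><docauthor>Ann Lee</docauthor></docinfo>",  # no date recorded
}


def author_report(docs):
    """Return (document, author, creation date) rows sorted by document name."""
    rows = []
    for name, xml_text in sorted(docs.items()):
        root = ET.fromstring(xml_text)
        # findtext returns the default when the tag is absent.
        rows.append((name,
                     root.findtext("docauthor", default=""),
                     root.findtext("doccreationdate", default="unknown")))
    return rows


for row in author_report(documents):
    print(row)
```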
[0036] In addition, the data store 208 provides any number of API interfaces 270 to any
individual piece of XML data 220, 225, 230 (also known as a store item) to enable
multiple applications 205, 210, 215 to work with the same piece of data. For example,
several solutions, such as a client application (e.g., word processing application)
and third party application solutions (e.g., the example spreadsheet application described
above), may work with the same set of document properties (e.g., properties contained
in the metadata2 230 file). Using the data store 208, each of these applications receives
separate access to the desired XML data 230 through their own data store API interface
270 for allowing each application to communicate with the data via its own OM without
having to deal with the complexity of having multiple processes accessing the same
piece of data.
[0037] In order to allow for these multiple applications 205, 210, 215 to access the same
data, the data store 208 notifies each of these applications when any part of the
XML data is changed by another application so that a given application may respond
to that change (both internally to its own process and externally by other changes
to the same data). When one application requests a change to a given data item, that
request is automatically sent to all other applications to allow other applications
to decide how or if to respond to the requested change. According to one embodiment,
this is accomplished by allowing each application to register to "listen" to any part
of the XML data to which it has an interface so that a given application solution/program
only receives those messages which are pertinent to its own logic. For example, one
type of application 210 may wish to register to listen to all changes made to a given
XML data in order to provide detailed business logic capabilities to a third party
solution, but another type of application 215 may wish to only listen to changes to
one or two specific XML elements within the same data because its logic does not care
about changes to any other part of the XML data.
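A minimal Python sketch of scoped change notification follows, assuming a hypothetical `StoreItem` class. Each application registers either for the whole item (path `""`) or for one element path, and every change is sent to all registered applications other than the one that made it.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class StoreItem:
    """Hypothetical store item that notifies registered applications of changes.
    An application registers for one child element path, or "" for the whole item."""

    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)
        self._listeners = defaultdict(list)  # path -> [(owner, callback)]

    def listen(self, owner, path, callback):
        self._listeners[path].append((owner, callback))

    def set_text(self, source, path, value):
        elem = self.root.find(path)
        old = elem.text
        elem.text = value
        # Notify whole-item listeners, then listeners for this path,
        # skipping the application that made the change.
        for key in ("", path):
            for owner, callback in self._listeners[key]:
                if owner != source:
                    callback(path, old, value)


item = StoreItem("<props><company>Contoso</company><status>draft</status></props>")
log = []
item.listen("auditApp", "", lambda p, old, new: log.append(("audit", p, new)))
item.listen("tickerApp", "company", lambda p, old, new: log.append(("ticker", p, new)))

item.set_text("editor", "status", "final")      # only auditApp is notified
item.set_text("editor", "company", "Fabrikam")  # both applications are notified
print(log)
```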
[0038] According to this embodiment, the multiple applications 205, 210, 215 may access
and edit the same piece of document data, and any conflicting changes to a given piece
of data are resolved. For instance, "side effects" to any given change may be made
when one change by one application causes a side effect change by another application.
For example, a first application 210 may be tasked with extracting company names from
one or more data items 225, 230 associated with a given document for translating those
names into corresponding stock symbols, if available, for compiling a list of company
stock symbols related to a given document. If a second application 215 causes a given
company name in a given piece of metadata to be added or to be changed, for example,
changing a company name from "Company ABC" to "Company XYZ," the first application
may listen to this change for automatically updating its list of stock symbols to
include the stock symbol for "Company XYZ" instead of "Company ABC." In addition,
such changes and any associated side effects may be bundled by the data store 208
so that undoing one or more changes reverses all related changes.
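The following Python sketch assumes a hypothetical `Store` class and a hypothetical name-to-symbol table. In this variant, the listening application writes its side effect back into the same item rather than into a private list. The store opens one undo unit for the originating change and records every change that listeners make in response, so a single undo reverses the whole cascade.

```python
import xml.etree.ElementTree as ET


class Store:
    """Hypothetical data store: a change, plus every change that listeners
    make in response to it, is recorded in one undo unit."""

    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)
        self.listeners = []
        self.undo_stack = []
        self._unit = None

    def set_text(self, path, value):
        elem = self.root.find(path)
        outermost = self._unit is None
        if outermost:
            self._unit = []
        try:
            self._unit.append((elem, elem.text))
            elem.text = value
            for listener in self.listeners:
                listener(self, path, value)  # may call set_text again: a side effect
        finally:
            if outermost:
                self.undo_stack.append(self._unit)
                self._unit = None

    def undo(self):
        for elem, old in reversed(self.undo_stack.pop()):
            elem.text = old


# Hypothetical name-to-symbol table used by a "ticker" application.
SYMBOLS = {"Company ABC": "ABC", "Company XYZ": "XYZ"}


def ticker_app(store, path, value):
    if path == "company" and value in SYMBOLS:
        store.set_text("symbol", SYMBOLS[value])


store = Store("<props><company>Company ABC</company><symbol>ABC</symbol></props>")
store.listeners.append(ticker_app)
store.set_text("company", "Company XYZ")
print(store.root.findtext("symbol"))  # XYZ
store.undo()                          # reverses the edit and its side effect together
print(store.root.findtext("company"), store.root.findtext("symbol"))  # Company ABC ABC
```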
[0039] As described herein, embodiments of the invention provide a data store for storing,
relating and for allowing use of data associated with a computer-generated document.
It will be apparent to those skilled in the art that various modifications or variations
may be made in the present invention without departing from the scope or spirit of
the invention. Other embodiments of the invention will be apparent to those skilled
in the art from consideration of the specification and practice of the invention disclosed
herein.
1. A method of managing data associated with computer-generated documents, comprising:
storing a document with an associated document data store;
storing a structured data item associated with the document in the document data store;
and
exposing one or more application programming interfaces (API) to one or more software
applications for allowing programmatic access to the structured data item by the one
or more software applications.
2. The method of claim 1, further comprising structuring the structured data item according
to the Extensible Markup Language (XML).
3. The method of claim 2, further comprising associating with the structured data item
an XML schema file for providing XML markup data and XML markup syntax that may be
validly applied to the structured data item.
4. The method of claim 3, further comprising receiving a change to an XML markup data
applied to the structured data item via the exposed one or more APIs.
5. The method of claim 4, in response to receiving a change to an XML markup data applied
to the structured data item,
reading an XML schema file associated with the structured data item to which the change
to the XML markup data is directed; and
determining whether the change to the XML markup data is valid according to the read
XML schema file.
6. The method of claim 5, further comprising if the change to the XML markup data is
not valid according to the read XML schema file, disallowing the change to the XML
markup data.
7. The method of claim 1, further comprising receiving programmatic access to the structured
data item associated with the document by one of the one or more software applications
via the one or more application programming interfaces.
8. The method of claim 7, further comprising if a change is received to the structured
data item via one of the one or more software applications having access to the document
and to the structured data item, notifying any other of the one or more software applications
having access to the structured data item of the change received to the structured
data item.
9. The method of claim 8, further comprising allowing a notified one of the one or more
software applications to make changes to the structured data item in response to the
change received to the structured data item.
10. The method of claim 9, whereby if the change received to the structured data item
is undone by one of the one or more software applications, correspondingly undoing
any changes made to the structured data item by any other of the one or more software
applications where the any changes made by the other of the one or more software applications
were made in response to the undone change.
11. The method of claim 1, prior to storing a structured data item associated with the
document in the document data store, receiving the structured data item from one of
the one or more software applications via the one or more application programming
interfaces (API).
12. The method of claim 1, whereby the structured data item associated with the document
includes metadata associated with the computer-generated document.
13. A document data store for managing data associated with computer-generated documents
and being operative:
to store a structured data item associated with a document in the document data store;
and
to expose one or more application programming interfaces (API) to one or more software
applications for allowing programmatic access to the structured data item by the one
or more software applications.
14. The document data store of claim 13, being further operative
to receive a change to an Extensible Markup Language (XML) markup data applied to
the structured data item via the exposed one or more APIs; and
to read an XML schema file associated with the structured data item to which the
change to the XML markup data is directed;
to determine whether the change to the XML markup data is valid according to the read
XML schema file; and
to disallow the change to the XML markup data if the change to the XML markup data
is not valid according to the read XML schema file.
15. The document data store of claim 13, being further operative
to receive programmatic access to the structured data item associated with the document
by one of the one or more software applications via the one or more application programming
interfaces; and
to notify any other of the one or more software applications having access to the
structured data item of the change received to the structured data item if a change
is received to the structured data item via one of the one or more software applications
having access to the structured data item.
16. A computer-readable medium having stored thereon computer-executable instructions
which when executed by a computer perform a method of managing data associated with
computer-generated documents, comprising:
storing a structured data item associated with a document in the document data store;
and
exposing one or more application programming interfaces (API) to one or more software
applications for allowing programmatic access to the stored document and to the structured
data item by the one or more software applications.
17. The computer-readable medium of claim 16, further comprising
structuring the structured data item according to the Extensible Markup Language (XML);
and
associating with the structured data item an XML schema file for providing XML markup
data and XML markup syntax that may be validly applied to the structured data item.
18. The computer-readable medium of claim 17, further comprising
receiving a change to an XML markup data applied to the structured data item via the
exposed one or more APIs;
reading an XML schema file associated with the structured data item to which the change
to the XML markup data is directed;
determining whether the change to the XML markup data is valid according to the read
XML schema file; and
if the change to the XML markup data is not valid according to the read XML schema
file, disallowing the change to the XML markup data.
19. The computer-readable medium of claim 16, further comprising
receiving programmatic access to the structured data item associated with the document
by one of the one or more software applications via the one or more application programming
interfaces;
if a change is received to the structured data item via one of the one or more software
applications having access to the structured data item, notifying any other of the
one or more software applications having access to the structured data item of the change
received to the structured data item; and
allowing a notified one of the one or more software applications to make changes to
the structured data item in response to the change received to the structured data
item.
20. The computer-readable medium of claim 19, whereby if the change received to the structured
data item is undone by one of the one or more software applications, correspondingly
undoing any changes made to the structured data item by any other of the one or more
software applications where the any changes made by the other of the one or more software
applications were made in response to the undone change.