(19)
(11)EP 3 211 547 B1

(12)EUROPEAN PATENT SPECIFICATION

(45)Mention of the grant of the patent:
30.03.2022 Bulletin 2022/13

(21)Application number: 17154307.7

(22)Date of filing:  02.02.2017
(51)International Patent Classification (IPC): 
G06F 16/00(2019.01)
(52)Cooperative Patent Classification (CPC):
G06F 16/188; G06F 16/13; G06F 16/172; G06F 16/84; G06F 16/162

(54)

SYSTEM AND METHODS FOR PROVIDING FAST CACHEABLE ACCESS TO A KEY-VALUE DEVICE THROUGH A FILESYSTEM INTERFACE

SYSTEM UND VERFAHREN ZUR BEREITSTELLUNG VON SCHNELLEM ZWISCHENSPEICHERBAREN ZUGANG AUF EINE SCHLÜSSELWERTVORRICHTUNG ÜBER EINE DATEISYSTEMSCHNITTSTELLE

SYSTÈME ET PROCÉDÉS DE FOURNITURE D'ACCÈS RAPIDE MIS EN CACHE POUR DISPOSITIF DE VALEURS CLÉS PAR L'INTERMÉDIAIRE D'UNE INTERFACE DE SYSTÈME DE FICHIERS


(84)Designated Contracting States:
AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

(30)Priority: 23.02.2016 US 201662298987 P
29.04.2016 US 201615143504

(43)Date of publication of application:
30.08.2017 Bulletin 2017/35

(73)Proprietor: Samsung Electronics Co., Ltd.
Gyeonggi-do 16677 (KR)

(72)Inventors:
  • SINHA, Vikas
    Sunnyvale, CA 94089 (US)
  • GUZ, Zvi
    Palo Alto, CA 94036 (US)
  • LIN, Ming
    San Jose, CA 95119 (US)

(74)Representative: Kuhnen & Wacker Patent- und Rechtsanwaltsbüro PartG mbB 
Prinz-Ludwig-Straße 40A
85354 Freising (DE)


(56)References cited:
  
  • William Jannen ET AL: "BetrFS: A Right-Optimized Write-Optimized File System", Proceedings of the 13th USENIX Conference on File and Storage Technologies, USENIX, 19 February 2015 (2015-02-19), pages 300-315, XP055367872, Retrieved from the Internet: URL:http://supertech.csail.mit.edu/papers/JannenYuZh15a.pdf [retrieved on 2017-04-26]
  • Pradeep J Shetty: "From Tuples to Files: a Fast Transactional System Store and File System", Technical Report FSL-12-03, 2 May 2012 (2012-05-02), XP055367882, Stony Brook University, NY, USA Retrieved from the Internet: URL:https://pdfs.semanticscholar.org/fa2e/620eea034a58421f70340fd223e215bd2c54.pdf [retrieved on 2017-04-26]
  • UNKNOWN ET AL: "Storage-class memory needs flexible interfaces", PROCEEDINGS OF THE 4TH ASIA-PACIFIC WORKSHOP ON SYSTEMS, APSYS '13, 30 July 2013 (2013-07-30), pages 1-7, XP055367876, New York, New York, USA DOI: 10.1145/2500727.2500732 ISBN: 978-1-4503-2316-1
  
Note: Within nine months from the publication of the mention of the grant of the European patent, any person may give notice to the European Patent Office of opposition to the European patent granted. Notice of opposition shall be filed in a written reasoned statement. It shall not be deemed to have been filed until the opposition fee has been paid. (Art. 99(1) European Patent Convention).


Description

FIELD



[0001] This inventive concept relates to key-value stores, and more particularly to using an operating system's cache when accessing a key-value device.

BACKGROUND



[0002] Existing operating systems implement a myriad of mechanisms to cache file system data in memory and improve performance. Specifically, the page cache (or buffer cache) heavily caches frequently accessed data to improve overall filesystem performance. While the page cache itself does not require a file system to reside on a block device, in practice, most configurations that utilize a page cache to improve file system performance require the filesystem to be resident on a block device.

[0003] Key-value Solid State Drives (SSDs) are an emerging technology that delivers better storage performance. But the key-value system used by these SSDs exports object semantics rather than block semantics, and thus may not usually be connected to the page cache. Using key-value SSDs currently requires either bypassing the file system entirely or using a file system without the benefits of the page cache. In either case, data from the key-value SSD is not cached in the operating system's page cache or buffer cache.

[0004] This creates a performance cliff, and usually requires the user program to implement its own caching mechanism to restore reasonable performance. Implementing a cache within the user program is a significant complexity and software development cost for the user. Moreover, when user space caching is used, different programs may not easily share their caches, and the entire cache content is lost when the program terminates.

[0005] From the manuscript "BetrFS: A Right-Optimized Write-Optimized File System", published by William Jannen et al. in the proceedings of the 13th USENIX Conference on File and Storage Technologies, pages 300-315, in February 2015, an in-kernel write-optimized file system called BetrFS is known that handles both microwrites and large scans efficiently. BetrFS takes advantage of the performance strengths of Bε-trees, implements a key-value store, translates file-system operations into key-value operations and uses the OS kernel cache to accelerate reads without throttling writes. The system can match or outperform traditional file systems on almost every operation.

[0006] From the Master Thesis "From Tuples to Files: a Fast Transactional System Store and File System", Technical Report FSL-12-03, presented by Pradeep J. Shetty at the Stony Brook University, NY, USA, in May 2012, a transactional system store is known that can efficiently manage a continuum of interrelated objects from small to large. The system is based on a data structure, the VT-tree, which is an extension of the log-structured merge-tree (LSM) data structure. In the system, applications can perform key-value storage and POSIX file operations in the same ACID system transaction, providing support for operations such as file indexing and tracking, meta-data search, and package installation in a generic and flexible manner. The new transactional file system's performance is comparable to that of existing native file systems and its transactional interface adds a minimal overhead and supports highly concurrent transactions.

[0007] From the manuscript "Storage-Class Memory Needs Flexible Interfaces" published by Haris Volos et al. in the proceedings of the 4th Asia-Pacific Workshop on Systems in July 2013, a decentralized file-system architecture is known that represents a new design targeting storage-class memory (SCM), and reduces the kernel role to just multiplexing physical memory. Applications can achieve high performance by optimizing the file-system interface for application needs without changes to complex kernel code. Initial results with implementing a POSIX and key-value file-system interface to SCM on top of the architecture have been encouraging.

[0008] A need remains for a way to permit a system with a key-value SSD to utilize the benefits of the page cache.

[0009] The invention is set out in the appended set of claims.

BRIEF DESCRIPTION OF THE DRAWINGS



[0010] 

FIG. 1 shows a system enabling using the page cache of the operating system when accessing a key-value system storage device, according to an embodiment of the inventive concept.

FIG. 2 shows additional details of the computer of FIG. 1.

FIGs. 3A-3B show the flow of commands and data across the layers of the computer of FIG. 1 (the method illustrated in FIG. 3B is as such no method according to the claimed invention but useful to understand certain aspects of the claimed invention).

FIG. 4 shows details of the Key-Value File System (KVFS) layer of FIG. 1.

FIG. 5 shows details of the KVFS shim of FIG. 1.

FIG. 6 shows details of the name generator unit of FIG. 5.

FIG. 7 shows details of the file descriptor lookup table of FIG. 5.

FIG. 8 shows details of the structure of the metadata object of FIG. 1.

FIGs. 9A-9E show a flowchart of an example procedure for processing a command using the computer of FIG. 1, according to an embodiment of the inventive concept.

FIGs. 10A-10B show a flowchart of an example procedure for the operation of the KVFS shim of FIG. 1 (which example procedure is as such no embodiment of the claimed invention but useful to understand certain aspects of the claimed invention).

FIGs. 11A-11B show a flowchart of an example procedure for the operation of the KVFS of FIG. 1 (which example procedure is as such no embodiment of the claimed invention but useful to understand certain aspects of the claimed invention).

FIGs. 12A-12B show a flowchart of an example procedure for using the KVFS cache of FIG. 1 (which example procedure is as such no embodiment of the claimed invention but useful to understand certain aspects of the claimed invention).

FIG. 13 shows a flowchart of an example procedure for generating a file name from an object name using the name generator unit of FIG. 5, according to an embodiment of the inventive concept.

FIG. 14 shows a flowchart of an example procedure for modifying the metadata object of FIG. 1 in the system of FIG. 1 (which example procedure is as such no embodiment of the claimed invention but useful to understand certain aspects of the claimed invention).


DETAILED DESCRIPTION



[0011] Reference will now be made in detail to embodiments of the inventive concept, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth to enable a thorough understanding of the inventive concept. It should be understood, however, that persons having ordinary skill in the art may practice the inventive concept without these specific details. In other instances, well-known methods, procedures, components, circuits, and networks have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.

[0012] It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first unit could be termed a second unit, and, similarly, a second unit could be termed a first unit, without departing from the scope of the inventive concept.

[0013] The terminology used in the description of the inventive concept herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the inventive concept. As used in the description of the inventive concept and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The components and features of the drawings are not necessarily drawn to scale.

[0014] Embodiments of the inventive concept include methods for accessing a key-value device that leverage an operating system's page cache (or buffer cache) to accelerate data access. A key-value request (or key-value system command) is transformed to a file system request that may utilize the page cache. Embodiments of the inventive concept also transform file system requests to key-value system requests: for example, to transform the page cache ReadPage command to key-value system GET and PUT commands. To facilitate these transformations, a key-value file system (KVFS) may include its own internal page cache, which may further reduce the number of access requests made to the storage device, and may also address partial reads and writes. The storage device is also configured to store a metadata object that supports a file system interface and functionality while requiring a minimal overhead.

[0015] By utilizing the existing operating system page cache, embodiments of the inventive concept improve the data access performance of key-value applications. This result has the added benefit of enabling multiple applications to share the page cache and to permit cached data to persist across application restarts.

[0016] Embodiments of the inventive concept enable use of standard operating system page cache and buffer cache behaviors without requiring changes to any generic part of an operating system. To achieve these results, embodiments of the inventive concept introduce two new components:
  1) Within the user space of the operating system, a new Key-Value File System (KVFS) shim is introduced. The KVFS shim may override a small subset of methods to which an application may link, implementing the changes transparently to the application.
  2) Within the file system layer of the operating system, a KVFS driver (sometimes referred to as the KVFS layer, or just KVFS) is introduced. The KVFS driver conforms to standard file system interfaces (such as BSD's Vnode or Linux's VFS interface) required by the operating system, and translates file system requests to key-value system requests.


[0017] FIG. 1 shows a system enabling using the page cache of the operating system when accessing a key-value system storage device, according to an embodiment of the inventive concept. In FIG. 1, computer 105 is shown as including processor 110, memory 115, and storage device 120. Processor 110 may be any variety of processor: for example, an Intel Xeon or Intel Celeron processor. Memory 115 may be any variety of memory, such as non-volatile memory (e.g., flash memory) or Static Random Access Memory (SRAM), but is typically Dynamic RAM (DRAM). Storage device 120 may be any variety of storage device that does not use a conventional block interface. Embodiments of the inventive concept include Solid State Drives (SSDs) offering a key-value (object) interface, but other embodiments of the inventive concept may support other types of storage devices. In the description below, in general any reference to "key-value", whether in the context of an interface, command, or other context, may be replaced with any other alternative appropriate to a different specialized storage device 120.

[0018] Memory 115 may include application 125, which may be any variety of application. In embodiments of the inventive concept, application 125 is an application designed to utilize a key-value interface of storage device 120. As is described below with reference to FIG. 3B, embodiments of the inventive concept may in addition permit an application utilizing conventional file system commands to access a storage device, such as storage device 120, offering a key-value interface.

[0019] Memory 115 may also include operating system 130, which includes file system 135. File system 135 may be a conventional file system, just as operating system 130 is a conventional operating system including a page cache. (The term "page cache" is intended to encompass any cache offered by an operating system to store data for applications, be it a more conventional buffer cache or a more modern Linux-type page cache.) To enable transition between conventional file system commands and key-value system commands, operating system 130 includes key-value file system (KVFS) shim 140 and KVFS 145. KVFS shim 140 is configured to translate key-value system commands to file system commands, which file system 135 may then process. KVFS 145 is configured to translate file system commands back to key-value system commands to interface with storage device 120 (which, as described above, offers a key-value system interface rather than a conventional block interface). KVFS shim 140 may be implemented as functions that override library functions normally called by application 125.

[0020] Since the specifics of the implementation of KVFS shim 140 and KVFS 145 may depend on variables including the particulars of operating system 130 and file system 135 and the commands accepted by storage device 120, implementation may vary across different installations. In some embodiments of the inventive concept, KVFS shim 140 and KVFS 145 may be implemented using pluggable functions, with KVFS shim 140 and KVFS 145 both including a complete set of all possible functions. Then, for a particular implementation, specific functions may be activated, with the remaining functions left inactive. For example, KVFS shim 140 and KVFS 145 may include functions to handle all possible file system commands for all possible file systems 135, and functions to handle all possible key-value system commands for all possible storage devices 120. Then, when KVFS shim 140 and KVFS 145 are installed on computer 105, the functions that process the particular commands recognized by file system 135 and storage device 120 may be activated, implementing the particular KVFS shim 140 and KVFS 145 needed for computer 105.
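As an illustration only, the pluggable-function arrangement described above might be sketched in Python as a registry from which an installation activates a subset. The registry `ALL_HANDLERS`, the function `activate`, the string results, and the `EXISTS` command are hypothetical names introduced for this sketch and do not appear in the specification.

```python
from typing import Callable, Dict, Iterable

# Hypothetical registry: every handler the package ships with, keyed by
# command name. The returned strings merely label the file system call
# that a real handler would issue.
ALL_HANDLERS: Dict[str, Callable[[str], str]] = {
    "GET": lambda key: f"read({key})",
    "PUT": lambda key: f"write({key})",
    "DELETE": lambda key: f"unlink({key})",
    "EXISTS": lambda key: f"stat({key})",  # assumed extra command, for illustration
}


def activate(supported: Iterable[str]) -> Dict[str, Callable[[str], str]]:
    """Return only the handlers for commands this installation recognizes."""
    return {name: ALL_HANDLERS[name] for name in supported if name in ALL_HANDLERS}


# An installation whose storage device accepts only GET, PUT, and DELETE.
active = activate(["GET", "PUT", "DELETE"])
```

The remaining handlers stay in the registry but are never dispatched to, which mirrors the idea of leaving unused functions inactive.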

[0021] While operating system 130 includes its own page cache, further enhancements are made to computer 105 to reduce the need to access data from storage device 120. KVFS 145 includes KVFS cache 150. KVFS cache 150 may store copies 155 and 160 of data and metadata. Copies 155 and 160 may be copies of data object 165 and metadata object 170 stored in storage device 120. As will be described further below with reference to FIGs. 3A-8, data object 165 may store the underlying data, and metadata object 170 may store the metadata of a file. Thus, together, data object 165 and metadata object 170 may establish file 175.

[0022] One reason to include KVFS cache 150 is to address partial reads and writes. Key-value system semantics may specify that objects are read or written in their entirety: partial data reads and writes might not be permitted. Thus, if any data is needed from data object 165 stored in storage device 120, the entirety of data object 165 must be read. Similarly, if any data is to be written to data object 165 stored in storage device 120, the entirety of data object 165 must be written.

[0023] But file system semantics may permit partial data reads and writes. For example, a file system command might only want to read a data field from data object 165. Since key-value system semantics would require the entirety of data object 165 to be read regardless of how much data is actually to be used, the remaining data may be cached somewhere in case it is needed in the future, avoiding the need to re-read data object 165. But since the file system command from operating system 130 only requests the specific data required by application 125, the page cache within operating system 130 would not cache the remaining data from data object 165. Thus, KVFS cache 150 provides a means to store data that would otherwise be discarded, even though it might be needed at some point in the future.
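The interplay between whole-object device reads and partial file system reads may be illustrated with a minimal Python sketch. The classes `FakeKVDevice` and `WholeObjectCache` are hypothetical stand-ins, not part of the described system; the sketch assumes an in-memory dictionary in place of a real key-value device.

```python
class FakeKVDevice:
    """Stand-in for a key-value device: objects are read whole, never in part."""

    def __init__(self):
        self.objects = {}
        self.get_calls = 0  # counts device accesses

    def get(self, key: str) -> bytes:
        self.get_calls += 1
        return self.objects[key]


class WholeObjectCache:
    """Keeps the full object after the first GET so later partial reads hit memory."""

    def __init__(self, device: FakeKVDevice):
        self.device = device
        self.cache = {}

    def read(self, key: str, offset: int, length: int) -> bytes:
        if key not in self.cache:
            # The device can only return the entire object.
            self.cache[key] = self.device.get(key)
        # Serve just the requested byte range from the cached copy.
        return self.cache[key][offset:offset + length]
```

With an object `b"0123456789"` stored under one key, two reads of different byte ranges trigger a single device GET; the second read is served from the cached copy.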

[0024] Of course, this means that KVFS cache 150 is a cache, with the limitations that exist for any cache. KVFS cache 150 will have a finite size determined by the space allocated to KVFS cache 150. If KVFS cache 150 is asked to store more data than fits in its allocated space, KVFS cache 150 will need to rotate data out of KVFS cache 150. KVFS cache 150 may use any desired algorithm for expunging older data to make room for new data, such as Least Frequently Used (LFU), Least Recently Used (LRU), or any other schedule.
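One of the eviction schedules named above, Least Recently Used, may be sketched in Python as follows. This is a generic LRU illustration under the assumption that capacity is counted in entries rather than bytes; the class `LRUCache` is a hypothetical name, and the specification does not prescribe this or any other policy.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used
```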

[0025] One consequence of KVFS cache 150 expunging older data is that for some object, KVFS cache 150 might contain only part of its data. For example, consider a situation where data is requested from a database that is 200 MB in size. Since objects are written and read in their entirety from key-value system storage devices, a single object, roughly 200 MB in size, stores the database. So when part of the database is to be read, the entire 200 MB of the database would be loaded into KVFS cache 150. Later, assume a request comes to read a file that is 10 KB in size, but KVFS cache 150 is now full. For whatever reason, KVFS cache 150 decides to evict 10 KB worth of the database to make room for the requested file.

[0026] Now further assume that another request comes for data from the database. With more than 199 MB of the database still in KVFS cache 150, it is likely that the requested data is still present in KVFS cache 150. If so, then the request may be satisfied from KVFS cache 150 without accessing storage device 120. But if the requested data happens to be part of the data evicted from KVFS cache 150 when the smaller file was read, KVFS 145 will need to request the entire 200 MB database object again.
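The partial-eviction scenario of the preceding two paragraphs may be sketched in Python with a page-granular cache. The names `CountingDevice`, `PagedCache`, and the tiny page size `PAGE` are assumptions made for illustration; a real cache would use much larger pages and its own eviction policy.

```python
PAGE = 4  # bytes per page; deliberately tiny for illustration


class CountingDevice:
    """Stand-in key-value device that counts whole-object GETs."""

    def __init__(self, objects):
        self.objects = dict(objects)
        self.get_calls = 0

    def get(self, key):
        self.get_calls += 1
        return self.objects[key]


class PagedCache:
    """Caches objects page by page; a miss on any page re-reads the whole object."""

    def __init__(self, device):
        self.device = device
        self.pages = {}  # (key, page_no) -> bytes

    def _load(self, key):
        data = self.device.get(key)  # the device returns the entire object
        for start in range(0, len(data), PAGE):
            self.pages[(key, start // PAGE)] = data[start:start + PAGE]

    def read_page(self, key, page_no):
        if (key, page_no) not in self.pages:
            self._load(key)
        return self.pages[(key, page_no)]

    def evict(self, key, page_no):
        self.pages.pop((key, page_no), None)
```

After one page of an object is evicted, reads of the surviving pages still hit the cache, while a read of the evicted page causes a second GET of the full object.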

[0027] Data writes may be handled similarly. When data is to be written, if the data being replaced is stored in KVFS cache 150, then the data within KVFS cache 150 may be updated and KVFS 145 may return a result. Later, KVFS 145 may write the data from KVFS cache 150 to storage device 120, to ensure the data is updated in the more permanent storage, after which the data in KVFS cache 150 may be flagged as being available to erase. Of course, if new data is to be loaded into KVFS cache 150 when KVFS cache 150 is full, KVFS cache 150 needs to know which data has been written to storage device 120 and which has not, so that data may be flushed to storage device 120 if those pages are to be expunged from KVFS cache 150. So KVFS cache 150 would need to track dirty bits for each page in KVFS cache 150. Another alternative, of course, is to ensure that the data object is written to storage device 120 before KVFS 145 returns a result of the data write operation: in that situation, KVFS cache 150 may be certain that any data may be expunged safely.
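The two write strategies described above (deferred write-back with dirty tracking, and writing to the device before returning) may be contrasted in a short Python sketch. The class `WriteCache` is hypothetical, a plain dictionary stands in for the storage device, and dirty state is tracked per object rather than per page as a simplification.

```python
class WriteCache:
    """Cache in front of a dict-like device, with optional write-through."""

    def __init__(self, device: dict, write_through: bool = False):
        self.device = device
        self.write_through = write_through
        self.entries = {}  # key -> [value, dirty_flag]

    def write(self, key, value):
        if self.write_through:
            # Device is updated before returning, so the entry is never dirty.
            self.device[key] = value
            self.entries[key] = [value, False]
        else:
            # Write-back: update the cache only and remember it is dirty.
            self.entries[key] = [value, True]

    def flush(self, key):
        entry = self.entries[key]
        if entry[1]:
            self.device[key] = entry[0]
            entry[1] = False  # now safe to expunge

    def evict(self, key):
        self.flush(key)  # never drop data the device has not seen
        del self.entries[key]
```

In write-back mode the device sees the data only at flush or eviction time; in write-through mode every entry is always clean and may be expunged without further work.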

[0028] Data object 165 has object name 180. Object name 180 is data that can be used to uniquely locate data object 165 on storage device 120. In a similar way, metadata object 170 may have its own name, although as described below with reference to FIGs. 5 and 6, the name of metadata object 170 may be derived from name 180 of data object 165. By making the name of metadata object 170 a derivative of object name 180, metadata object 170 may always be located knowing object name 180.

[0029] File 175 has file name 185. File name 185 is independent of object name 180. File name 185 may change without changing object name 180, and vice versa.

[0030] FIG. 2 shows additional details of computer 105 of FIG. 1. Referring to FIG. 2, typically, machine or machines 105 include one or more processors 110, which may include memory controller 205 and clock 210, which may be used to coordinate the operations of the components of machine or machines 105. Processors 110 may also be coupled to memory 115, which may include random access memory (RAM), read-only memory (ROM), or other state preserving media, as examples. Processors 110 may also be coupled to storage devices 120, and to network connector 215, which may be, for example, an Ethernet connector or a wireless connector. Processors 110 may also be connected to a bus 220, to which may be attached user interface 225 and input/output interface ports that may be managed using input/output engine 230, among other components.

[0031] FIGs. 3A-3B show the flow of commands and data across the layers of computer 105 of FIG. 1. In FIG. 3A, one embodiment of the inventive concept is shown, in which application 125 may issue key-value system commands that would be recognized by storage device 120. When application 125 issues key-value system command 305, application 125 may use a library function. This library function may be overridden by KVFS shim 140, which may then receive key-value system command 305. KVFS shim 140 then maps key-value system command 305 to file system command 310. File system command 310 is an analogous file system command to key-value system command 305, but one that may be processed by file system 135, part of operating system 130 of FIG. 1. File system 135 (or operating system 130 of FIG. 1, depending on the implementation of operating system 130 of FIG. 1 and file system 135) accesses page cache 315 in an attempt to satisfy file system command 310. If page cache 315 satisfies file system command 310, then file system 135 (or operating system 130 of FIG. 1) may return result 320 back to KVFS shim 140. KVFS shim 140 may then map result 320 into a form expected by application 125: application 125 is expecting a result for key-value system command 305, which might take a different form than that of a result for file system command 310.

[0032] If file system 135 (or operating system 130 of FIG. 1) does not satisfy file system command 310 using page cache 315, file system 135 sends file system command 310 on to KVFS 145. KVFS 145 then attempts to satisfy file system command 310 using KVFS cache 150. If KVFS 145 satisfies file system command 310 using KVFS cache 150, KVFS 145 may return result 325. File system 135 (or operating system 130 of FIG. 1) may then make any needed updates to page cache 315 and may return result 325 (shown as result 320 in FIG. 3A) to KVFS shim 140, where processing may continue as described before.

[0033] KVFS 145 might also need to update storage device 120. For example, if file system command 310 updates the metadata for file 175 of FIG. 1, KVFS 145 may update metadata object 170 as stored on storage device 120. But whether KVFS 145 needs to make any changes to storage device 120 is dependent on the implementation of KVFS 145, storage device 120, and the specifics of file system command 310, and is not necessarily required for all file system commands 310.

[0034] If KVFS 145 does not satisfy file system command 310 using KVFS cache 150, KVFS 145 maps file system command 310 to key-value system command 330. It may be expected that key-value system command 330 will usually be identical to key-value system command 305 as issued by application 125, but it is possible that key-value system command 330 might differ somehow from key-value system command 305. KVFS 145 then receives a result from storage device 120, which KVFS 145 returns to file system 135 (or operating system 130 of FIG. 1) as result 335, after which processing may continue as described before. KVFS 145 may also update KVFS cache 150 of FIG. 1 based on the result 335 received from storage device 120. For example, if file system command 310 involved renaming file 175 of FIG. 1, and KVFS cache 150 of FIG. 1 did not already store metadata object 170 of FIG. 1, KVFS 145 may issue key-value system command 330 to retrieve metadata object 170 of FIG. 1, store copy 160 of metadata object 170 of FIG. 1 in KVFS cache 150 of FIG. 1, and update copy 160 of metadata object 170 of FIG. 1 in KVFS cache 150 of FIG. 1. KVFS 145 may then issue additional second key-value system commands 330 to delete metadata object 170 of FIG. 1 from storage device 120 and to store a replacement metadata object 170 of FIG. 1 in storage device 120, so that storage device 120 includes updated metadata.
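The lookup order described in the preceding paragraphs (page cache first, then the KVFS cache, then the storage device) may be summarized in a minimal Python sketch. The function `read_object` and the use of plain dictionaries for both caches and the device are assumptions made for illustration; command mapping and result translation are omitted.

```python
def read_object(key, page_cache: dict, kvfs_cache: dict, device: dict):
    """Return (data, source), trying each layer in turn and filling caches on the way back."""
    if key in page_cache:
        return page_cache[key], "page cache"

    if key in kvfs_cache:
        data, source = kvfs_cache[key], "KVFS cache"
    else:
        data = device[key]      # corresponds to a key-value GET on the device
        kvfs_cache[key] = data  # the KVFS layer updates its own cache
        source = "device"

    page_cache[key] = data      # the file system updates the page cache
    return data, source
```

A first read of a key reaches the device, a second is served by the page cache, and if the page cache entry is later dropped, the KVFS cache still avoids a device access.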

[0035] FIG. 3B is similar to FIG. 3A. FIG. 3B illustrates a method according to a comparative example. In the method illustrated in FIG. 3B application 125 issues file system command 310, rather than key-value system command 305 of FIG. 3A. For example, application 125 might be an application that is not designed to utilize the key-value interface offered by storage device 120, but instead expects to use conventional file system commands.

[0036] As application 125 issues conventional file system commands, KVFS shim 140 is not needed to translate key-value system commands into file system commands. As a result, file system 135 (or operating system 130 of FIG. 1) may utilize page cache 315 based on file system command 310. But KVFS 145 may still map file system command 310 to key-value system command 330. By using KVFS 145 to map file system command 310 to key-value system command 330, KVFS 145 may make it appear to file system 135 that storage device 120 uses conventional block storage, when in fact storage device 120 actually uses object storage. In such embodiments of the inventive concept, application 125 may leverage the benefits of page cache 315, despite the fact that storage device 120 does not use conventional block storage. Note that the operations of file system 135 (and/or operating system 130 of FIG. 1), KVFS 145 (and KVFS cache 150 of FIG. 1), and storage device 120 are identical to those described in FIG. 3A.

[0037] While FIGs. 3A and 3B are presented as alternative embodiments, embodiments of the inventive concept represented in FIGs. 3A and 3B may also be combined. For example, embodiments of the inventive concept may operate as shown in FIG. 3A when an application 125 issues key-value system commands such as key-value system command 305 of FIG. 3A, and may operate as shown in FIG. 3B when an application 125 issues file system commands such as file system command 310 of FIG. 3B. As a result, page cache 315 and KVFS cache 150 of FIG. 1 may be leveraged across applications 125 to use either key-value system commands or file system commands, and data may be shared across such applications 125 within page cache 315 and KVFS cache 150 of FIG. 1.

[0038] FIG. 4 shows details of Key-Value File System (KVFS) layer 145 of FIG. 1. In FIG. 4, aside from KVFS cache 150, which was described above with reference to FIGs. 1 and 3A-3B, KVFS 145 is shown as including reception unit 405, mapping unit 410, command unit 415, and return unit 420. Reception unit 405 is configured to receive commands from other levels in operating system 130 of FIG. 1, such as file system 135 of FIG. 1. Mapping unit 410 is configured to map file system commands to key-value system commands. Command unit 415 is configured to issue key-value system commands to storage device 120 of FIG. 1. And return unit 420 is configured to return results to the calling level of operating system 130 of FIG. 1, such as file system 135 of FIG. 1. Note that not every unit is needed in response to all file system commands. For example, if a file system command may be satisfied from data resident in KVFS cache 150, mapping unit 410 and command unit 415 might not be needed to access information from storage device 120 of FIG. 1.

[0039] The mapping of file system commands to key-value system commands was discussed above with reference to FIGs. 3A-3B. To achieve this mapping, mapping unit 410 is configured to include any desired mapping from file system commands to key-value system commands. For example, mapping unit 410 might include a table that specifies what key-value system command(s) correspond to a given file system command. Note that the association may be one-to-many: a single file system command might include multiple key-value system commands. For example, in a flash SSD, data may not be overwritten. Changing data involves invalidating the original data (which may be subject to garbage collection by the SSD whenever appropriate) and writing a new data object. Thus, changing any metadata for a file may require KVFS 145 to delete metadata object 170 of FIG. 1 (more accurately, KVFS 145 may invalidate metadata object 170 of FIG. 1 on storage device 120, and let storage device 120 perform garbage collection to free the space that was occupied by the old object) and to store a replacement metadata object.
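A table-driven mapping of the kind mentioned above might look like the following Python sketch. The file system command names (`read`, `write`, `rename`) and the exact expansions are assumptions chosen for illustration; only the one-to-many expansion of a metadata change into a DELETE followed by a PUT is taken from the description.

```python
from typing import Dict, List

# Illustrative table only: the command names on the left and the exact
# expansions on the right are assumptions, not taken from the specification.
FS_TO_KV: Dict[str, List[str]] = {
    "read": ["GET"],
    "write": ["PUT"],
    # A metadata change replaces the metadata object:
    # invalidate the old object, then store a new one.
    "rename": ["DELETE", "PUT"],
}


def map_command(fs_command: str) -> List[str]:
    """Return the key-value system commands for one file system command."""
    if fs_command not in FS_TO_KV:
        raise ValueError(f"no mapping for file system command {fs_command!r}")
    return list(FS_TO_KV[fs_command])  # copy so callers cannot alter the table
```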

[0040] It is worthwhile noting that a distinction may be drawn between the names of various data elements within the system. Returning momentarily to FIG. 1, data object 165 has object name 180, whereas file 175 has file name 185. (Metadata object 170 has an object name as well, as discussed below with reference to FIGs. 5-8. But as the name of metadata object 170 is an element that is strictly internal to the operation of computer 105, the name of metadata object 170 is not significant to this discussion). Object name 180 identifies data object 165; file name 185 identifies file 175. File name 185 itself is metadata stored within metadata object 170. The representation shown in FIG. 1 is merely symbolic. File 175 is effectively an element within file system 135, whereas data object 165 is an element within the key-value system of storage device 120. Object name 180 and file name 185 are distinct. It would be highly unusual, if not outright impossible, for object name 180 and file name 185 to be the same.

[0041] Furthermore, object name 180 and file name 185 may each be modified without affecting the other. For example, if application 125 decides to rename file name 185, this change affects the contents of metadata object 170, but does not change object name 180. Alternatively, if object name 180 were to be changed, this would affect data object 165 (and would have an indirect effect on metadata object 170, as the object name for metadata object 170 would also change); but file name 185 would remain unchanged. Thus it is important to keep separate the concepts of object names and file names: they are related but distinct concepts.

[0042] Returning to FIG. 4, KVFS 145 also includes inode 425. The inode 425 is a data structure representative of a file. The inode 425 may be a conventional inode as used in Unix-based systems, or inode 425 may be a novel data structure. The inode 425 stores information about a file, such as file 175 of FIG. 1. Typically, inode 425 may store file metadata, such as file name 185, the date and time of file creation, the file's owner, etc. But inode 425 may include additional information, as appropriate to the implementation.

[0043] FIG. 5 shows details of KVFS shim 140 of FIG. 1. In FIG. 5, KVFS shim 140 includes reception unit 505, mapping unit 510, command unit 515, and return unit 520. Reception unit 505 is configured to receive commands from application 125 of FIG. 1. Mapping unit 510 is configured to map key-value system commands to file system commands. Command unit 515 is configured to issue file system commands to file system 135 of FIG. 1 (or operating system 130 of FIG. 1). And return unit 520 is configured to return results to application 125 of FIG. 1. Note that unlike KVFS 145 of FIG. 4, KVFS shim 140 may not satisfy key-value system commands on its own, and sends file system commands to file system 135 of FIG. 1 (or operating system 130 of FIG. 1).

[0044] The mapping of key-value system commands to file system commands was discussed above with reference to FIGs. 3A-3B. To achieve this mapping, mapping unit 510 may include any desired mapping from key-value system commands to file system commands. For example, mapping unit 510 might include a table that specifies what file system command(s) correspond to a given key-value system command. But in contrast to mapping unit 410 of FIG. 4, mapping unit 510 in KVFS shim 140 generally has a simpler implementation. While there are numerous file system commands that may be issued to file system 135 of FIG. 1, there are only three key-value system commands that may be issued to a key-value storage device: GET, PUT, and DELETE. A GET command reads data from the storage device; a PUT command writes data to the storage device; and a DELETE command invalidates data on the storage device. Thus the implementation of mapping unit 510 tends to be simpler, given the smaller number of commands that may be issued to key-value storage devices. In addition, file system 135 of FIG. 1 typically has analogous commands for reading, writing, and deleting data, making the mapping from key-value system command to file system command relatively simple. Nevertheless, depending on the specifics of operating system 130 of FIG. 1 and storage device 120 of FIG. 1, a single key-value system command might map to multiple file system commands.
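
The following sketch illustrates how the three key-value system commands might be carried out with ordinary file operations. The function kv_to_fs and its arguments are hypothetical; the sketch assumes each object is represented by one file and uses a temporary directory in place of file system 135. Note that a single GET or PUT already expands to several file operations (open, read or write, close).

```python
import os
import tempfile


def kv_to_fs(command, path, value=None):
    """Carry out one key-value command using ordinary file operations."""
    if command == "GET":
        with open(path, "rb") as f:      # open + read + close
            return f.read()
    if command == "PUT":
        with open(path, "wb") as f:      # open/create + write + close
            f.write(value)
        return None
    if command == "DELETE":
        os.remove(path)                  # unlink
        return None
    raise ValueError(f"unknown key-value command: {command}")


# Usage: store, read back, and delete one object in a scratch directory.
with tempfile.TemporaryDirectory() as scratch:
    object_path = os.path.join(scratch, "object")
    kv_to_fs("PUT", object_path, b"hello")
    fetched = kv_to_fs("GET", object_path)
    kv_to_fs("DELETE", object_path)
    still_there = os.path.exists(object_path)
```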

[0045] KVFS shim 140 may also include file descriptor lookup table 525. A file descriptor is an internal mechanism for accessing data in a file (either for writing or reading). KVFS shim 140 may store identifiers for file descriptors in file descriptor lookup table 525. A located file descriptor may then be passed to file system 135 of FIG. 1 as an argument to a file system command. Without file descriptor lookup table 525, either KVFS shim 140 would need to query operating system 130 of FIG. 1 for a file descriptor every time a file needed to be accessed, or else open a file, perform any necessary commands, and then close the file for every key-value system command. But both of these approaches are time-intensive. By storing file descriptors in file descriptor lookup table 525, KVFS shim 140 may quickly determine the appropriate file descriptor for a file system command corresponding to a received key-value system command. File descriptor lookup table 525 is described further below with reference to FIG. 7.

[0046] KVFS shim 140 may also include name generator unit 530. As described above, metadata objects have names (necessary to be able to access the object), but metadata object names only matter when converting from objects to files, and therefore the names of metadata objects only matter within KVFS shim 140 and KVFS 145 of FIG. 1. As a result, almost any desired algorithm for generating names for metadata objects may be used.

[0047] There are a few desired features for a procedure to generate names for metadata objects. First, the procedure should be deterministic: given the same data, the same metadata name should always result. Second, the procedure should avoid collisions: given different data, different metadata names should result. Third, as object names may have any length, the procedure should be able to process data of any potential length. These are all properties that should be present in name generator unit 530, which may generate a name for metadata object 170 of FIG. 1, given object name 180.

[0048] FIG. 6 shows details of name generator unit 530 of FIG. 5, which may generate a name for metadata object 170 of FIG. 1 from object name 180 of FIG. 1. By starting with object name 180, name generator unit 530 may avoid the complication of trying to consistently generate the same name for metadata object 170 of FIG. 1 from inconsistent input. In FIG. 6, name generator unit 530 may include hash unit 605, ASCII representation unit 610, and collision index unit 615. Hash unit 605 may perform a hash on object name 180. Cryptographic hash algorithms, such as SHA-1, offer most of the properties that are desired for name generation, and therefore make an excellent choice for generating names for metadata objects. But there are some characters (such as a slash, often used to separate the file from its container) that may not be part of a file name. Since the result of a cryptographic hash algorithm is not necessarily usable as a file name, ASCII representation unit 610 may take the result of hash unit 605 and generate an ASCII representation of that result. ASCII representation unit 610 may eliminate any problematic characters from the result of hash unit 605.

[0049] Of course, a cryptographic hash algorithm may not guarantee that there are no collisions between hash results. For example, SHA-1 produces a 160-bit hash result, regardless of the size of the input data. Thus if SHA-1 were given more than 160 bits of input data, SHA-1 would produce a 160-bit hash. For any input size greater than 160 bits, since there are more possible inputs than there are outputs, the possibility of collisions still exists, even if the likelihood is small. To address this possibility, collision index unit 615 may add a collision index to the ASCII representation, in case a collision occurs. The combination of an ASCII representation of the result of hash unit 605 and a collision index may avoid any possible collision in the generation of a name for metadata object 170 of FIG. 1.
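
A minimal sketch of such a name generator is shown below, using SHA-1 from the Python standard library, a hexadecimal rendering as the ASCII representation, and a numeric suffix as the collision index. The function metadata_object_name, the "-" separator, and the existing_names argument are assumptions made for illustration; in particular, the sketch assumes the caller supplies the set of names already taken by other objects, since how a collision is detected is left to the implementation.

```python
import hashlib


def metadata_object_name(object_name, existing_names=()):
    """Hash the object name, render it as hex ASCII, append a collision index."""
    # SHA-1 accepts input of any length and always yields 160 bits,
    # which hexdigest() renders as 40 characters from 0-9 and a-f.
    digest = hashlib.sha1(object_name.encode("utf-8")).hexdigest()
    index = 0
    # Advance the collision index until the candidate name is unused.
    while f"{digest}-{index}" in existing_names:
        index += 1
    return f"{digest}-{index}"


name = metadata_object_name("bucket/photos/cat.jpg")
next_name = metadata_object_name("bucket/photos/cat.jpg", existing_names={name})
```

Because the hexadecimal digits never include a slash, the result is usable as a file name even when the object name itself contains one.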

[0050] Once the name for metadata object 170 of FIG. 1 has been generated, KVFS shim 140 of FIG. 1 and KVFS 145 of FIG. 1 use this name to access metadata object 170 of FIG. 1. By sending a PUT, GET, or DELETE request to storage device 120 of FIG. 1 with the generated name for metadata object 170 of FIG. 1, KVFS shim 140 of FIG. 1 and KVFS 145 of FIG. 1 may access and use metadata object 170 of FIG. 1 reliably and consistently.

[0051] FIG. 7 shows details of file descriptor lookup table 525 of FIG. 5. As described above with reference to FIG. 5, file descriptor lookup table 525 provides a mechanism for KVFS shim 140 of FIG. 1 to access a file descriptor for a given file. In FIG. 7, file descriptor lookup table 525 may include any number of associations of hashes and file descriptors. For example, first hash 705 is associated with first file descriptor 710, second hash 715 is associated with second file descriptor 720, and third hash 725 is associated with third file descriptor 730. In FIG. 7, file descriptor lookup table 525 shows three such associations, but embodiments of the inventive concept may support any number of such associations. Given a hash value, KVFS shim 140 may find a corresponding file descriptor, if it exists in file descriptor lookup table 525.

[0052] Hashes 705, 715, and 725 may store file descriptors for files as managed by operating system 130 of FIG. 1. If no file descriptor has yet been opened, KVFS shim 140 of FIG. 1 may open a file and receive a file descriptor back. KVFS shim 140 of FIG. 1 may then add the hash value and the file descriptor to file descriptor lookup table 525 for later use.

[0053] KVFS shim 140 of FIG. 1 may use the name for metadata object 170 of FIG. 1 as the hash for lookup in file descriptor lookup table 525. Since the name for metadata object 170 of FIG. 1 may be generated by using hash unit 605 of FIG. 6 (along with other procedures), the likelihood that two different file descriptors would be associated with the same hash value in file descriptor lookup table 525 is effectively zero.
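
The lookup-or-open behavior of such a table might be sketched as follows. The class FileDescriptorTable, its method names, and the opens counter are hypothetical; the sketch assumes the table is keyed by the generated metadata object name and that opening a file through os.open stands in for the request to operating system 130.

```python
import os
import tempfile


class FileDescriptorTable:
    """Maps a hash (metadata object name) to an already-open file descriptor."""

    def __init__(self):
        self._table = {}
        self.opens = 0                     # how many times a file was opened

    def get(self, name_hash, path):
        fd = self._table.get(name_hash)
        if fd is None:
            # Not yet in the table: open the file once and remember the result.
            fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
            self._table[name_hash] = fd
            self.opens += 1
        return fd

    def close_all(self):
        for fd in self._table.values():
            os.close(fd)
        self._table.clear()


# Usage: the second lookup reuses the descriptor instead of reopening the file.
with tempfile.TemporaryDirectory() as scratch:
    table = FileDescriptorTable()
    path = os.path.join(scratch, "file")
    first_fd = table.get("hash-0", path)
    second_fd = table.get("hash-0", path)
    same_fd = first_fd == second_fd
    open_count = table.opens
    table.close_all()
```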

[0054] FIG. 8 shows details of an example structure of metadata object 170 of FIG. 1. In FIG. 8, metadata object 170 may include various data fields. These data fields may include file name 185, date 805 ("date" as used herein is intended to include both the date and time) that file 175 of FIG. 1 was created, date 810 that file 175 of FIG. 1 was last modified, date 815 that file 175 of FIG. 1 was last accessed, type 820 for file 175 of FIG. 1 (e.g., executable, document, text file, or others), size 825 of file 175 of FIG. 1, container 830 that stores file 175 of FIG. 1, and owner 835 of file 175 of FIG. 1.

[0055] Metadata object 170 may also include object name 180. By including object name 180, access to metadata object 170 gives the system a way back to data object 165 (recall that the name for metadata object 170 may be generated from object name 180). In some embodiments of the inventive concept, metadata object 170 may include object name 180 directly. To make access to metadata object 170 efficient, metadata object 170 should have a fixed size, which means that the space allocated for object name 180 would have to be fixed in advance. But since object names are potentially unbounded in length, including object name 180 within metadata object 170 may create a complication: object name 180 would need to be no longer than the size of the field allocated for object name 180 within metadata object 170. In practice, this is unlikely to be a problem: the field allocated for name 180 may include any desired number of characters: 200, 1000, 10,000, or more. But the possibility of field overflow does exist, which could create an error within operating system 130 of FIG. 1.

[0056] As an alternative, as shown in FIG. 8, metadata object 170 may include pointer 840, which may point to where object name 180 is stored. Once the system knows where object name 180 is stored and length 845 of object name 180, the system may retrieve object name 180. The reason FIG. 8 shows metadata object 170 including a pointer to name length 845 is that reading a fixed size of data is more efficient than reading data of unknown size. While FIG. 8 shows name length 845 as stored with object name 180, in other embodiments of the inventive concept name length 845 may be stored within metadata object 170.
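
To illustrate the fixed-size layout with a pointer to a variable-length name, the following sketch packs a few metadata fields into a fixed-size record and keeps the object name, prefixed by its length, in a separate byte store. The field selection, the little-endian 64-bit encoding, and the use of a byte offset as the "pointer" are all assumptions made for illustration and are not prescribed by FIG. 8.

```python
import struct

# Hypothetical fixed-size record: created, modified, accessed, size, name pointer.
RECORD = struct.Struct("<QQQQQ")

name_store = bytearray()                   # stands in for separate name storage


def store_name(object_name):
    """Append a length-prefixed name and return its offset (the 'pointer')."""
    raw = object_name.encode("utf-8")
    offset = len(name_store)
    name_store.extend(struct.pack("<I", len(raw)) + raw)
    return offset


def load_name(offset):
    """Read the fixed-size length first, then exactly that many name bytes."""
    (length,) = struct.unpack_from("<I", name_store, offset)
    start = offset + 4
    return bytes(name_store[start:start + length]).decode("utf-8")


long_name = "a-very-long-object-name/" * 10
pointer = store_name(long_name)
record = RECORD.pack(1, 2, 3, 4096, pointer)
recovered = load_name(RECORD.unpack(record)[4])
```

The record stays the same size however long the object name is, which is the property the pointer is meant to preserve.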

[0057] Metadata object 170 may also include pointer 850 to permissions 855. Permissions 855 specify what permissions exist for data object 165 of FIG. 1. The structure of permissions 855 may vary depending on operating system 130 of FIG. 1. For example, in a Unix-based system, permissions 855 may specify whether the owner of file 175 of FIG. 1, other users in the group including the owner of file 175 of FIG. 1, and others may read, write, and execute the file. Other operating systems specify permissions 855 in other ways. While FIG. 8 shows permissions 855 being accessed via pointer 850 from metadata object 170, in other embodiments of the inventive concept permissions 855 may be stored within metadata object 170.
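
For the Unix-based case, the owner, group, and others permissions are conventionally encoded as mode bits. The sketch below decodes them with the constants of the Python stat module; the function describe_permissions is hypothetical, and the sketch assumes permissions 855 holds a conventional Unix mode value.

```python
import stat


def describe_permissions(mode):
    """Split Unix mode bits into (read, write, execute) flags per class."""
    classes = {
        "owner": (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
        "group": (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
        "others": (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
    }
    return {who: tuple(bool(mode & bit) for bit in bits)
            for who, bits in classes.items()}


perms = describe_permissions(0o754)
```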

[0058] FIGs. 9A-9E show a flowchart of an example procedure for processing a command using computer 105 of FIG. 1, according to an embodiment of the inventive concept. In FIG. 9A, at block 903, KVFS shim 140 of FIG. 1 receives key-value system command 305 of FIG. 3A from application 125 of FIG. 1. At block 906, KVFS shim 140 of FIG. 1 maps key-value system command 305 of FIG. 3A to file system command 310 of FIG. 3A. At block 909, KVFS shim 140 of FIG. 1 may search file descriptor lookup table 525 of FIG. 5 to see if the desired file has previously been opened. As described above with reference to FIG. 6, this search may use a name for metadata object 170 of FIG. 1 generated by name generator unit 530 of FIG. 5.

[0059] At block 912 (FIG. 9B), KVFS shim 140 of FIG. 1 may determine if file descriptor lookup table 525 of FIG. 5 contains the desired file descriptor. If file descriptor lookup table 525 of FIG. 5 contains the desired file descriptor, then at block 915 KVFS shim 140 of FIG. 1 may access the desired file descriptor from file descriptor lookup table 525 of FIG. 5. Otherwise, at block 918, KVFS shim 140 may request a new file descriptor from operating system 130 of FIG. 1 by opening the desired file. At block 921, KVFS shim 140 of FIG. 1 may receive the new file descriptor, and at block 924, KVFS shim 140 of FIG. 1 may add the new file descriptor to file descriptor lookup table 525 of FIG. 5 for future use.

[0060] Either way, once KVFS shim 140 of FIG. 1 has the desired file descriptor, at block 927 (FIG. 9C), KVFS 145 of FIG. 1 receives file system command 310 of FIG. 3A from file system 135 of FIG. 1 (more generally, from operating system 130 of FIG. 1). This block involves KVFS shim 140 of FIG. 1 sending file system command 310 of FIG. 3A to operating system 130 of FIG. 1, to attempt to use page cache 315 of FIG. 3A to satisfy the request; if page cache 315 of FIG. 3A may not satisfy the request, then operating system 130 of FIG. 1 forwards file system command 310 of FIG. 3A to KVFS 145 of FIG. 1.
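
The fall-through from the page cache to the key-value file system might be sketched as follows. This is a simplified model only: the dictionary used as a page cache, the command fields, and the functions run_file_system_command and fake_kvfs are hypothetical, and a real operating system page cache operates on pages rather than on whole commands.

```python
def run_file_system_command(command, page_cache, kvfs_handler):
    """Serve a read from the page cache if possible, else forward to KVFS."""
    key = (command["file"], command["offset"], command["length"])
    if key in page_cache:
        return page_cache[key]            # satisfied without reaching KVFS
    result = kvfs_handler(command)        # forwarded to the key-value file system
    page_cache[key] = result
    return result


forwarded = []


def fake_kvfs(command):
    # Hypothetical stand-in for KVFS 145: records the call and returns bytes.
    forwarded.append(command["file"])
    return b"x" * command["length"]


cache = {}
read_command = {"file": "meta-0", "offset": 0, "length": 4}
first = run_file_system_command(read_command, cache, fake_kvfs)
second = run_file_system_command(read_command, cache, fake_kvfs)
```

Only the first read reaches the stand-in for KVFS; the repeated read is answered from the cache.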

[0061] At block 930, KVFS 145 checks to see if inode 425 of FIG. 4 exists that stores the desired metadata. If not, then at block 933, KVFS 145 of FIG. 1 requests metadata for file 175 of FIG. 1 from storage device 120 of FIG. 1 (more specifically, KVFS 145 of FIG. 1 requests metadata object 170 of FIG. 1 from storage device 120 of FIG. 1). At block 936, KVFS 145 of FIG. 1 receives metadata object 170 of FIG. 1 for file 175 of FIG. 1 from storage device 120 of FIG. 1. At block 939, KVFS 145 of FIG. 1 stores the metadata in inode 425 of FIG. 4.

[0062] At block 942, regardless of whether or not inode 425 of FIG. 4 exists that stores the desired metadata, KVFS 145 of FIG. 1 accesses object name 180 of FIG. 1 from metadata object 170 of FIG. 1 or from inode 425 of FIG. 4. At block 945, KVFS 145 of FIG. 1 maps file system command 310 of FIG. 3A to key-value system command 330 of FIG. 3A.

[0063] At block 948 (FIG. 9D), KVFS 145 of FIG. 1 may modify inode 425 of FIG. 4, if file system command 310 of FIG. 3A modifies the metadata of file 175 of FIG. 1 in some way. At block 951, KVFS 145 of FIG. 1 attempts to satisfy key-value system command 330 of FIG. 3A using KVFS cache 150 of FIG. 1.

[0064] At block 954, KVFS 145 of FIG. 1 searches KVFS cache 150 of FIG. 1 to see if KVFS cache 150 of FIG. 1 stores the desired data. At block 957, KVFS 145 of FIG. 1 determines if KVFS cache 150 of FIG. 1 stores the desired data. If data object 165 of FIG. 1 (or some pertinent portion of data object 165 of FIG. 1) is not stored in KVFS cache 150 of FIG. 1, then at block 960 KVFS 145 of FIG. 1 sends key-value system command 330 of FIG. 3A to storage device 120 of FIG. 1 to retrieve data object 165 of FIG. 1. At block 963, KVFS 145 of FIG. 1 receives data object 165 of FIG. 1 from storage device 120, and at block 966, KVFS 145 of FIG. 1 stores copy 155 of FIG. 1 of data object 165 of FIG. 1 in KVFS cache 150 of FIG. 1. This storage block, of course, might involve expunging some data from KVFS cache 150 of FIG. 1, to make room for the new data. KVFS 145 of FIG. 1 may use any desired algorithm to select what data to expunge from KVFS cache 150 of FIG. 1.
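
Blocks 954 through 966 might be sketched as a read-through cache. The class KVFSCache, the dictionary standing in for storage device 120, and the least-recently-used eviction policy are assumptions made for illustration; as noted above, any desired algorithm may be used to select what data to expunge.

```python
from collections import OrderedDict


class KVFSCache:
    """Read-through cache in front of a key-value device (a dict stand-in)."""

    def __init__(self, device, capacity):
        self.device = device
        self.capacity = capacity
        self.entries = OrderedDict()       # oldest entry first
        self.device_gets = 0

    def get(self, object_name):
        if object_name in self.entries:
            self.entries.move_to_end(object_name)   # mark as recently used
            return self.entries[object_name]
        value = self.device[object_name]            # GET sent to the device
        self.device_gets += 1
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)        # expunge least recently used
        self.entries[object_name] = value
        return value


device = {"a": b"1", "b": b"2", "c": b"3"}
kvfs_cache = KVFSCache(device, capacity=2)
for object_name in ("a", "b", "a", "c"):
    kvfs_cache.get(object_name)
cached_names = list(kvfs_cache.entries)
```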

[0065] At this point, KVFS 145 of FIG. 1 may be certain that KVFS cache 150 of FIG. 1 stores the desired data. At block 969 (FIG. 9E), KVFS 145 of FIG. 1 may access the data or portion thereof from copy 155 of FIG. 1 of data object 165 of FIG. 1 from KVFS cache 150 of FIG. 1. If data is being written, this access operation may involve either deleting data object 165 of FIG. 1 from storage device 120 of FIG. 1 and writing a new data object, or merely flagging the page in KVFS cache 150 of FIG. 1 as dirty (so that the page may be flushed to storage device 120 of FIG. 1 at a later time). At block 972, KVFS 145 of FIG. 1 returns result 335 of FIG. 3A to operating system 130 of FIG. 1, which eventually propagates up to application 125 of FIG. 1 as result 320 of FIG. 3A at block 975.

[0066] The above description is very complicated, as it views the operations of all levels within operating system 130 of FIG. 1: KVFS shim 140 of FIG. 1, file system 135 of FIG. 1, and KVFS 145 of FIG. 1. Reviewing the operations at KVFS shim 140 of FIG. 1 and KVFS 145 of FIG. 1 separately might be beneficial. (Since file system 135 of FIG. 1 remains unchanged in embodiments of the inventive concept, no analysis of the operations of file system 135 of FIG. 1 is provided below.)

[0067] FIGs. 10A-10B show a flowchart of an example procedure for the operation of KVFS shim 140 of FIG. 1. In FIG. 10A, at block 1005, reception unit 505 of FIG. 5 may receive key-value system command 305 of FIG. 3A from application 125 of FIG. 1. At block 1010, mapping unit 510 of FIG. 5 may map key-value system command 305 of FIG. 3A to file system command 310 of FIG. 3A. As described below with reference to FIG. 13, this may involve generating a name for metadata object 170 of FIG. 1. At block 1015, KVFS shim 140 of FIG. 1 may search file descriptor lookup table 525 to see if a file descriptor exists for file 175 of FIG. 1.

[0068] At block 1020 (FIG. 10B), KVFS shim 140 of FIG. 1 may determine if a file descriptor for file 175 of FIG. 1 was found in file descriptor lookup table 525 of FIG. 5. If a file descriptor for file 175 of FIG. 1 was found in file descriptor lookup table 525 of FIG. 5, then the file descriptor is accessed at block 1025. Otherwise, at block 1030, KVFS shim 140 of FIG. 1 requests a new file descriptor for file 175 of FIG. 1. This request may involve asking file system 135 of FIG. 1 to open file 175 of FIG. 1. At block 1035, KVFS shim 140 of FIG. 1 may receive the new file descriptor from file system 135 of FIG. 1, and at block 1040, KVFS shim 140 of FIG. 1 may add the new file descriptor (and the name for metadata object 170 of FIG. 1) to file descriptor lookup table 525 of FIG. 5.

[0069] Either way, once KVFS shim 140 of FIG. 1 has the file descriptor for file 175 of FIG. 1, at block 1045, KVFS shim 140 of FIG. 1 may send file system command 310 toward storage device 120 of FIG. 1 (via operating system 130 of FIG. 1). Then, at block 1050, KVFS shim 140 of FIG. 1 may return result 325 of FIG. 3A, as received from operating system 130 of FIG. 3A, to application 125 of FIG. 1.

[0070] Note again that KVFS shim 140 of FIG. 1 is responsible for translating key-value system commands to file system commands, so that native page cache 315 of FIG. 3A of operating system 130 of FIG. 3A may be leveraged. If application 125 issues file system commands rather than key-value system commands, KVFS shim 140 of FIG. 1 may be bypassed and the file system command may be delivered directly to operating system 130 of FIG. 1 (and result 325 of FIG. 3B may be returned directly to application 125 of FIG. 1).

[0071] FIGs. 11A-11B show a flowchart of an example procedure for the operation of KVFS 145 of FIG. 1. In FIG. 11A, at block 1105, KVFS 145 of FIG. 1 may receive file system command 310 of FIG. 3A. At block 1110, KVFS 145 of FIG. 1 may search for inode 425 of FIG. 4 that contains metadata for file 175 of FIG. 1 identified by file system command 310 of FIG. 3A. At block 1115, KVFS 145 of FIG. 1 may determine if inode 425 of FIG. 4 was located. If inode 425 of FIG. 4 was located, then at block 1120, KVFS 145 of FIG. 1 may access inode 425 of FIG. 4, and at block 1125, KVFS 145 of FIG. 1 may access object name 180 from inode 425 of FIG. 4.

[0072] On the other hand, if at block 1115 KVFS 145 of FIG. 1 could not locate inode 425 of FIG. 4, then at block 1130 (FIG. 11B), KVFS 145 of FIG. 1 may request metadata object 170 of FIG. 1 from storage device 120 of FIG. 1. At block 1135, KVFS 145 of FIG. 1 may receive metadata object 170 of FIG. 1 from storage device 120. At block 1140, KVFS 145 of FIG. 1 may extract metadata from metadata object 170 of FIG. 1. At block 1145, KVFS 145 of FIG. 1 may access object name 180 of FIG. 1 from metadata object 170 of FIG. 1. This extraction might be a direct operation, if metadata object 170 directly stores object name 180, or it might be an indirect operation: KVFS 145 of FIG. 1 might first extract a pointer to object name 180 (and possibly name length 845) before loading object name 180. And at block 1150, KVFS 145 of FIG. 1 may create inode 425 of FIG. 4.
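
The inode lookup of FIGs. 11A-11B might be sketched as follows. The function object_name_for and the dictionaries standing in for the inode table and storage device 120 are hypothetical, and the sketch assumes that the file name carried by the file system command can be used directly as the key of the metadata object.

```python
def object_name_for(file_name, inodes, device):
    """Return the object name for a file, creating the inode on first use."""
    inode = inodes.get(file_name)
    if inode is None:
        metadata = device[file_name]       # GET the metadata object
        inode = dict(metadata)             # extract metadata into a new inode
        inodes[file_name] = inode
    return inode["object_name"]


device = {"meta-0": {"object_name": "object-180", "size": 3}}
inodes = {}
first_lookup = object_name_for("meta-0", inodes, device)
del device["meta-0"]                       # the device is no longer consulted
second_lookup = object_name_for("meta-0", inodes, device)
```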

[0073] Regardless of whether inode 425 of FIG. 4 was located or created, at block 1155 KVFS 145 of FIG. 1 may attempt to perform the file system command on copy 155 of FIG. 1 of data object 165 of FIG. 1, if present in KVFS cache 150 of FIG. 1. Finally, at block 1160, KVFS 145 of FIG. 1 may return a result of the command.

[0074] A review of FIGs. 11A-11B might suggest that KVFS 145 of FIG. 1 does not send a key-value system command to storage device 120 of FIG. 1. This conclusion would be incorrect, as explained below with reference to FIGs. 12A-12B, which elaborate on block 1155 of FIG. 11B.

[0075] FIGs. 12A-12B show a flowchart of an example procedure for using KVFS cache 150 of FIG. 1. In FIG. 12A, at block 1205, KVFS 145 of FIG. 1 may search KVFS cache 150 of FIG. 1 to see if copies 155 and 160 of FIG. 1 of data object 165 and metadata object 170 of FIG. 1 are in KVFS cache 150 of FIG. 1. At block 1210, KVFS 145 of FIG. 1 may determine if KVFS cache 150 of FIG. 1 stores copies 155 and 160 of FIG. 1 of data object 165 and metadata object 170 of FIG. 1. Note that in this context, "storing a copy" does not necessarily mean storing the entirety of copies 155 and 160 of FIG. 1 of data object 165 and metadata object 170 of FIG. 1, or even necessarily parts of both data object 165 and metadata object 170 of FIG. 1. All that is needed is for KVFS cache 150 of FIG. 1 to store copies of the portions of data object 165 and/or metadata object 170 of FIG. 1 to which the file system command applies. If KVFS cache 150 of FIG. 1 stores copies of all of the pertinent portions of data object 165 and/or metadata object 170 of FIG. 1, KVFS 145 of FIG. 1 may conclude that KVFS cache 150 of FIG. 1 stores copies 155 and 160 of FIG. 1 of data object 165 and metadata object 170 of FIG. 1, even if copies 155 and 160 of FIG. 1 are not complete copies of data object 165 and metadata object 170 of FIG. 1.

[0076] If KVFS cache 150 of FIG. 1 stores copies 155 and 160 of FIG. 1 of data object 165 and metadata object 170 of FIG. 1, then at block 1215, KVFS 145 of FIG. 1 may perform file system command 310 of FIG. 3A on copies 155 and 160 of FIG. 1. If file system command 310 involves changing any data for either data object 165 of FIG. 1 or metadata object 170 of FIG. 1, then KVFS 145 of FIG. 1 may either mark the affected pages in KVFS cache 150 of FIG. 1 as dirty, so the changed data may eventually be flushed to storage device 120, or KVFS 145 of FIG. 1 may immediately delete the existing affected objects and store replacement copies of the changed objects in storage device 120.
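
The two options for handling a change, marking pages dirty for a later flush or replacing the object immediately, might be sketched as follows. The class WriteCache, its write_through flag, and the dictionary standing in for storage device 120 are hypothetical names chosen for illustration.

```python
class WriteCache:
    """Cache that either defers writes (dirty pages) or writes immediately."""

    def __init__(self, device, write_through=False):
        self.device = device
        self.write_through = write_through
        self.pages = {}
        self.dirty = set()

    def write(self, name, value):
        self.pages[name] = value
        if self.write_through:
            self._replace_on_device(name)
        else:
            self.dirty.add(name)           # flushed to the device later

    def flush(self):
        for name in sorted(self.dirty):
            self._replace_on_device(name)
        self.dirty.clear()

    def _replace_on_device(self, name):
        self.device.pop(name, None)            # DELETE: invalidate old object
        self.device[name] = self.pages[name]   # PUT: store the replacement


device = {"object": b"old"}
write_cache = WriteCache(device)
write_cache.write("object", b"new")
before_flush = device["object"]
write_cache.flush()
after_flush = device["object"]
```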

[0077] If KVFS cache 150 of FIG. 1 does not store copy 155 of FIG. 1 of data object 165 of FIG. 1, then at block 1220 (FIG. 12B), KVFS 145 of FIG. 1 may map file system command 310 of FIG. 3A to key-value system command 330 of FIG. 3A. At block 1225, KVFS 145 of FIG. 1 may send key-value system command 330 of FIG. 3A to storage device 120 of FIG. 1. At block 1230, KVFS 145 of FIG. 1 may receive a copy of the object that was affected by key-value system command 330 of FIG. 3A from storage device 120. At block 1235, KVFS 145 of FIG. 1 may store copies 155 and/or 160 of FIG. 1 of data object 165 and/or metadata object 170 of FIG. 1 in KVFS cache 150 of FIG. 1, as received from storage device 120 of FIG. 1.

[0078] At block 1240, whether or not KVFS cache 150 of FIG. 1 stored copies 155 and/or 160 of FIG. 1 of data object 165 and/or metadata object 170 of FIG. 1, the command has been performed. At block 1240, KVFS 145 of FIG. 1 may modify inode 425 of FIG. 4 to reflect whatever changes were indicated by file system command 310 of FIG. 3A. At block 1245, KVFS 145 of FIG. 1 may access the pertinent portion of the data from either or both of copies 155 and/or 160 of FIG. 1 from KVFS cache 150 of FIG. 1. At block 1250, KVFS 145 of FIG. 1 may return the accessed portion of data to operating system 130 of FIG. 1.

[0079] FIG. 13 shows a flowchart of an example procedure for generating a file name from object name 180 using name generator unit 530 of FIG. 5, according to an embodiment of the inventive concept. In FIG. 13, at block 1305, KVFS shim 140 of FIG. 1 may receive object name 180, from which a file name is to be generated. At block 1310, hash unit 605 of FIG. 6 may apply a hash algorithm to object name 180 of FIG. 1 to produce a hash value. At block 1315, ASCII representation unit 610 of FIG. 6 may generate an ASCII representation of the hash value, thereby producing a valid file name within file system 135 of FIG. 1. At block 1320, collision index unit 615 may combine the ASCII representation of the hash value with a collision index to produce a name for metadata object 170 of FIG. 1 that is guaranteed to be unique within operating system 130 of FIG. 1 (or at least, unique within the folder that is supposed to contain file 175 of FIG. 1).

[0080] FIG. 14 shows a flowchart of an example procedure for modifying metadata object 170 of FIG. 1 in the system of FIG. 1. Recall that when storage device 120 of FIG. 1 is a flash-based storage device, data may not be overwritten. Instead, to modify data the original data is invalidated (and later subject to garbage collection), and a new data object is written containing the modified data. In FIG. 14, at block 1410, KVFS 145 of FIG. 1 may delete metadata object 170 of FIG. 1 from storage device 120 of FIG. 1. At block 1415, KVFS 145 of FIG. 1 may store a replacement metadata object in storage device 120 of FIG. 1.
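
A minimal sketch of this delete-then-store sequence, applied to a file rename, is shown below. The function modify_metadata, the field names, and the dictionary standing in for storage device 120 are hypothetical. The example also shows that renaming the file changes only the metadata, leaving the object name untouched.

```python
def modify_metadata(device, metadata_name, **changes):
    """Replace a metadata object: DELETE the old one, PUT the new one."""
    old = device.pop(metadata_name)        # DELETE: invalidate the old object
    replacement = {**old, **changes}
    device[metadata_name] = replacement    # PUT: store the replacement object
    return replacement


device = {"meta-0": {"file_name": "old.txt", "object_name": "object-180"}}
updated = modify_metadata(device, "meta-0", file_name="new.txt")
```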

[0081] The following discussion is intended to provide a brief, general description of a suitable machine or machines in which certain aspects of the inventive concept may be implemented. The machine or machines may be controlled, at least in part, by input from conventional input devices, such as keyboards, mice, etc., as well as by directives received from another machine, interaction with a virtual reality (VR) environment, biometric feedback, or other input signal. As used herein, the term "machine" is intended to broadly encompass a single machine, a virtual machine, or a system of communicatively coupled machines, virtual machines, or devices operating together. Exemplary machines include computing devices such as personal computers, workstations, servers, portable computers, handheld devices, telephones, tablets, etc., as well as transportation devices, such as private or public transportation, e.g., automobiles, trains, cabs, etc.

[0082] The machine or machines may include embedded controllers, such as programmable or non-programmable logic devices or arrays, Application Specific Integrated Circuits (ASICs), embedded computers, smart cards, and the like. The machine or machines may utilize one or more connections to one or more remote machines, such as through a network interface, modem, or other communicative coupling. Machines may be interconnected by way of a physical and/or logical network, such as an intranet, the Internet, local area networks, wide area networks, etc. One skilled in the art will appreciate that network communication may utilize various wired and/or wireless short range or long range carriers and protocols, including radio frequency (RF), satellite, microwave, Institute of Electrical and Electronics Engineers (IEEE) 802.11, Bluetooth®, optical, infrared, cable, laser, etc.

[0083] Embodiments of the present inventive concept may be described by reference to or in conjunction with associated data including functions, procedures, data structures, application programs, etc. which when accessed by a machine results in the machine performing tasks or defining abstract data types or low-level hardware contexts. Associated data may be stored in, for example, the volatile and/or non-volatile memory, e.g., RAM, ROM, etc., or in other storage devices and their associated storage media, including hard-drives, floppy-disks, optical storage, tapes, flash memory, memory sticks, digital video disks, biological storage, etc. Associated data may be delivered over transmission environments, including the physical and/or logical network, in the form of packets, serial data, parallel data, propagated signals, etc., and may be used in a compressed or encrypted format. Associated data may be used in a distributed environment, and stored locally and/or remotely for machine access.

[0084] Embodiments of the inventive concept may include a tangible, non-transitory machine-readable medium comprising instructions executable by one or more processors, the instructions comprising instructions to perform the elements of the inventive concepts as described herein.

[0085] Having described and illustrated the principles of the inventive concept with reference to illustrated embodiments, it will be recognized that the illustrated embodiments may be modified in arrangement and detail without departing from such principles, and may be combined in any desired manner. And, although the foregoing discussion has focused on particular embodiments, other configurations are contemplated. In particular, even though expressions such as "according to an embodiment of the inventive concept" or the like are used herein, these phrases are meant to generally reference embodiment possibilities, and are not intended to limit the inventive concept to particular embodiment configurations. As used herein, these terms may reference the same or different embodiments that are combinable into other embodiments.

[0086] The foregoing illustrative embodiments are not to be construed as limiting the inventive concept thereof. Although a few embodiments have been described, those skilled in the art will readily appreciate that many modifications are possible to those embodiments without materially departing from the novel teachings and advantages of the present disclosure. Accordingly, all such modifications are intended to be included within the scope of this inventive concept as defined in the claims.


Claims

1. A method, comprising:

step A) receiving (903) a first key-value system command (305) at a key-value file system shim (140), wherein the first key-value system command (305) includes an object name identifying an object and wherein the first key-value system command (305) is a command issued by an application (125) to access a storage device (120);

step B) mapping (906) by the key-value file system shim (140) the first key-value system command (305) to a file system command (310), the file system command (310) including a file name (185) identifying a file (175);

step C) sending by the key-value file system shim (140) the file system command (310) to a file system (135);

step D) attempting to use a page cache (315) of an operating system (130), the page cache (315) being included in the file system (135) to satisfy the file system command (310);

step E) if the page cache (315) cannot satisfy the file system command (310),

step E1) receiving (927) the file system command (310) at a key-value file system (145) from the file system (135);

step E2) checking (930) by the key-value file system (145) if an inode (425) for the file that stores desired metadata for the file (175) exists;

step E3) if no inode for the file that stores the desired metadata of the file (175) exists, requesting (933) by the key-value file system (145) a metadata object (170) for the file (175) from the storage device (120);

receiving (936) the metadata object (170) for the file at the key-value file system (145) from the storage device (120);

storing (939) by the key-value file system (145) the metadata from the metadata object (170) in the inode (425) for the file;

step E4) accessing (942) by the key-value file system (145) the object name (180) from the metadata object (170) or from the inode (425);

step E5) mapping (945) by the key-value file system (145) the file system command (310) to a second key-value system command (330) including the object name identifying the object;

step E6) attempting (951) by the key-value file system (145) to satisfy the second key-value system command (330) using a key-value file system cache (150);

searching (954) by the key-value file system (145) the key-value file system cache (150) to see if the key-value file system cache (150) stores the object (165);

step E7) if the object (165) is not found in the key-value file system cache (150), sending (960) by the key-value file system (145) the second key-value system command (330) to the storage device (120) to retrieve the object (165);

receiving the object (165) from the storage device (120);

storing (966) by the key-value file system (145) a copy (155) of the object (165) in the key-value file system cache (150);

returning the object (165) to the page cache of the operating system (130); and

propagating the object (165) to the application (125).


 
2. The method according to claim 1, wherein attempting (951) by the key-value file system (145) to satisfy the second key-value system command (330) using the key-value file system cache (150) comprises:
searching (954) the key-value file system cache (150) for the object name (180).
 
3. The method according to claim 1, further comprising:

receiving (1305) at the key-value file system shim (140) the object name (180); and

applying (1310) a hash function to the object name (180) to produce a file name (185).


 
4. The method according to claim 3, further comprising generating (1315) the file name (185) as an ASCII representation of a hash value of the hash function.
 
5. The method according to claim 4, wherein generating (1315) the file name (185) as an ASCII representation of the hash value includes combining the ASCII representation of the hash value with a collision index.
 
6. A computer system implementing the method according to any one of claims 1 to 5.
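
By way of non-limiting illustration only, the layered read path recited in claim 1 may be sketched as follows. The sketch forms no part of the claims. All class names, the dictionary-based caches, the `name_map` lookup used by the shim, and the `"object_name"` metadata field are assumptions made for the example and are not taken from the specification; the reference numerals in the comments merely point back to the corresponding claim elements.

```python
from typing import Dict


class StorageDevice:
    """Hypothetical stand-in for the key-value storage device (120)."""

    def __init__(self, objects: Dict[str, bytes], metadata: Dict[str, dict]):
        self.objects = objects      # object name -> object data (165)
        self.metadata = metadata    # file name -> metadata object (170)
        self.reads = 0              # number of data-object reads reaching the device

    def get_metadata(self, file_name: str) -> dict:
        return self.metadata[file_name]

    def get(self, object_name: str) -> bytes:
        self.reads += 1
        return self.objects[object_name]


class KeyValueFileSystem:
    """Hypothetical stand-in for the key-value file system (145) and its cache (150)."""

    def __init__(self, device: StorageDevice):
        self.device = device
        self.inodes: Dict[str, dict] = {}   # file name -> inode (425) contents
        self.cache: Dict[str, bytes] = {}   # object name -> cached copy (155)

    def read(self, file_name: str) -> bytes:
        # Steps E2/E3: request the metadata object only if no inode holds it yet.
        inode = self.inodes.get(file_name)
        if inode is None:
            inode = dict(self.device.get_metadata(file_name))
            self.inodes[file_name] = inode
        # Steps E4/E5: recover the object name for the second key-value command.
        object_name = inode["object_name"]
        # Step E6: try the key-value file system cache first.
        data = self.cache.get(object_name)
        if data is None:
            # Step E7: on a miss, read from the device and keep a copy.
            data = self.device.get(object_name)
            self.cache[object_name] = data
        return data


class FileSystem:
    """Hypothetical stand-in for the file system (135) with its page cache (315)."""

    def __init__(self, kvfs: KeyValueFileSystem):
        self.kvfs = kvfs
        self.page_cache: Dict[str, bytes] = {}

    def read(self, file_name: str) -> bytes:
        # Step D: attempt to satisfy the command from the page cache.
        data = self.page_cache.get(file_name)
        if data is None:
            # Step E: otherwise hand the command to the key-value file system.
            data = self.kvfs.read(file_name)
            self.page_cache[file_name] = data
        return data


class Shim:
    """Hypothetical stand-in for the key-value file system shim (140)."""

    def __init__(self, fs: FileSystem, name_map: Dict[str, str]):
        self.fs = fs
        self.name_map = name_map    # object name -> file name (assumed lookup table)

    def get(self, object_name: str) -> bytes:
        # Steps A to C: map the key-value command to a file system command.
        return self.fs.read(self.name_map[object_name])
```

In this sketch a repeated `get` for the same object name is answered from the page cache, and, if the page cache entry is dropped, from the key-value file system cache, so that the storage device is read only once per object.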
 
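
Likewise by way of non-limiting illustration only, the file name generation recited in claims 3 to 5 may be sketched as follows. The claims do not name a hash function, a separator, or a format for the collision index; the use of SHA-256, the hexadecimal ASCII rendering, the underscore separator, the decimal index, and the `in_use` table are all assumptions made for the example, and the function names are hypothetical.

```python
import hashlib
from typing import Callable, Dict


def default_hash(object_name: str) -> str:
    """Return an ASCII (hexadecimal) representation of a hash of the object name.

    SHA-256 is an assumed choice; any hash function could be substituted.
    """
    return hashlib.sha256(object_name.encode("utf-8")).hexdigest()


def make_file_name(
    object_name: str,
    in_use: Dict[str, str],
    hash_fn: Callable[[str], str] = default_hash,
) -> str:
    """Derive a file name from an object name.

    `in_use` maps file names that were already handed out to the object names
    that own them. If the hashed name is owned by a different object, the
    collision index is incremented until a free (or matching) name is found.
    """
    digest = hash_fn(object_name)
    index = 0
    while True:
        candidate = f"{digest}_{index}"   # hash value combined with collision index
        owner = in_use.get(candidate)
        if owner is None or owner == object_name:
            in_use[candidate] = object_name
            return candidate
        index += 1
```

Because the name is a fixed-length string of ASCII characters, object names that are too long or that contain characters a file system would reject can still be mapped to valid file names; the collision index keeps two object names with the same hash value from sharing one file name.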


Ansprüche

1. Verfahren, aufweisend:

Schritt A) Empfangen (903) eines ersten Schlüsselwert-Systembefehls (305) bei einem Schlüsselwert-Dateisystemshim (140), wobei der erste Schlüsselwert-Systembefehl (305) einen Objektnamen, der ein Objekt identifiziert, beinhaltet, und wobei der erste Schlüsselwert-Systembefehl (305) ein Befehl ist, der durch eine Anwendung (125) ausgegeben wird, um auf eine Speichervorrichtung (120) zuzugreifen;

Schritt B) Mappen (906), durch den Schlüsselwert-Dateisystemshim (140), des ersten Schlüsselwert-Systembefehls (305) auf einen Dateisystembefehl (310),

wobei der Dateisystembefehl (310) einen Dateinamen (185), der eine Datei (175) identifiziert, beinhaltet;

Schritt C) Senden, durch den Schlüsselwert-Dateisystemshim (140), des Dateisystembefehls (310) an ein Dateisystem (135);

Schritt D) Versuchen, einen Seitencache (315) eines Betriebssystems (130) zu nutzen, wobei der Seitencache (315) in dem Dateisystem (135) enthalten ist, um den Dateisystembefehl (310) zu erfüllen;

Schritt E) falls der Seitencache (315) den Dateisystembefehl (310) nicht erfüllen kann,

Schritt E1) Empfangen (927) des Dateisystembefehls (310) an einem Schlüsselwert-Dateisystem (145) von dem Dateisystem (135);

Schritt E2) Prüfen (930) durch das Schlüsselwert-Dateisystem (145), ob ein Inode (425) für die Datei, das gewünschten Metadaten für die Datei (175) speichert, existiert,

Schritt E3) falls kein Inode für die Datei, das die gewünschten Metadaten der Datei (175) speichert, existiert, Anfordern (933), durch das Schlüsselwert-Dateisystem (145), eines Metadaten-Objekts (170) für die Datei (175) von der Speichervorrichtung (120);

Empfangen (936) des Metadaten-Objekts (170) für die Datei an dem Schlüsselwert-Dateisystem (145) von der Speichervorrichtung (120);

Speichern (939), durch das Schlüsselwert-Dateisystem (145), der Metadaten von dem Metadaten-Objekt (170) in dem Inode (425) für die Datei;

Schritt E4) Zugreifen (942), durch das Schlüsselwert-Dateisystem (145), auf den Objektnamen (180) von dem Metadaten-Objekt (170) oder von dem Inode (425);

Schritt E5) Mappen (945), durch das Schlüsselwert-Dateisystem (145), des Dateisystembefehls (310) auf einen zweiten Schlüsselwert-Systembefehl (330), der den Objektnamen enthält, der das Objekt identifiziert;

Schritt E6) Versuchen (951), durch das Schlüsselwert-Dateisystem (145), den zweiten Schlüsselwert-Systembefehl (330) unter Verwendung eines Schlüsselwert-Dateisystemcaches (150) zu erfüllen;

Suchen (954), durch das Schlüsselwert-Dateisystem (145), des Schlüsselwert-Dateisystemcaches (150), um festzustellen, ob der Schlüsselwert-Dateisystemcache (150) das Objekt (165) speichert;

Schritt E7) falls das Objekt (165) in dem Schlüsselwert-Dateisystemcache (150) nicht gefunden wird, Senden (960), durch das Schlüsselwert-Dateisystem (145), des zweiten Schlüsselwert-Systembefehls (330) an die Speichervorrichtung (120), um das Objekt (165) abzurufen;

Empfangen des Objekts (165) von der Speichervorrichtung (120);

Speichern (966), durch das Schlüsselwert-Dateisystem (145), einer Kopie (155) des Objekts (165) in dem Schlüsselwert-Dateisystemcache (150);

Zurückführen des Objekts (165) zu dem Seitencache des Betriebssystems (130);

und

Übertragen des Objekts (165) an die Anwendung (125).


 
2. Verfahren nach Anspruch 1, wobei das Versuchen (951), durch das Schlüsselwert-Dateisystem (145), den zweiten Schlüsselwert-Systembefehl (330) unter Verwendung des Schlüsselwert-Dateisystemcaches (150) zu erfüllen, aufweist:
Durchsuchen (954) des Schlüsselwert-Dateisystemcaches (150) nach dem Objektnamen (180).
 
3. Verfahren nach Anspruch 1, ferner aufweisend:

Empfangen (1305) des Objektnamens (180) an dem Schlüsselwert-Dateisystemshim (140); und

Anwenden (1310) einer Hashfunktion auf den Objektnamen (180), um einen Dateinamen (175) zu erzeugen.


 
4. Verfahren nach Anspruch 3, ferner aufweisend ein Erzeugen (1315) des Dateinamens (175) als eine ASCII-Repräsentation eines Hashwerts der Hashfunktion.
 
5. Verfahren nach Anspruch 4, wobei das Erzeugen (1315) des Dateinamens (175) als eine ASCII-Repräsentation des Hashwerts ein Kombinieren der ASCII-Repräsentation des Hashwerts mit einem Kollisionsindex beinhaltet.
 
6. Computersystem, das das Verfahren nach einem der Ansprüche 1 bis 5 implementiert.
 


Revendications

1. Procédé, comprenant :

une étape A) de réception (903) d'une première commande de système de valeur de clé (305) au niveau d'une cale de système de fichiers de valeur de clé (140), où la première commande de système de valeur de clé (305) inclut un nom d'objet identifiant un objet et où la première commande de système de valeur de clé (305) est une commande émise par une application (125) pour accéder à un dispositif de stockage (120) ;

une étape B) de mappage (906), par la cale de système de fichiers de valeur de clé (140), de la première commande de système de valeur de clé (305) vers une commande de système de fichiers (310), la commande de système de fichiers (310) incluant un nom de fichier (185) identifiant un fichier (175) ;

une étape C) d'envoi, par la cale de système de fichiers de valeur de clé (140), de la commande de système de fichiers (310) vers un système de fichiers (135) ;

une étape D) de tentative pour utiliser un cache de page (315) d'un système d'exploitation (130), le cache de page (315) étant inclus dans le système de fichiers (135) afin de satisfaire la commande de système de fichiers (310) ;

une étape E) si le cache de page (315) ne peut pas satisfaire la commande de système de fichiers (310),

une étape E1) de réception (927) de la commande de système de fichiers (310) au niveau d'un système de fichiers de valeur de clé (145) depuis le système de fichiers (135) ;

une étape E2) de vérification (930), par le système de fichiers de valeur de clé (145), si un inode (425) pour le fichier qui stocke une métadonnée souhaitée pour le fichier (175) existe ;

une étape E3), si aucun inode pour le fichier qui stocke la métadonnée souhaitée du fichier (175) n'existe, de demande (933) par le système de fichiers de valeur de clé (145), d'un objet de métadonnée (170) pour le fichier (175) depuis le dispositif de stockage (120) ;

la réception (936) de l'objet de métadonnée (170) pour le fichier au niveau du système de fichiers de valeur de clé (145) depuis le dispositif de stockage (120) ;

le stockage (939), par le système de fichiers de valeur de clé (145), de la métadonnée depuis l'objet de métadonnée (170) dans l'inode (425) pour le fichier ;

une étape E4) d'accès (942), par le système de fichiers de valeur de clé (145), au nom d'objet (180) depuis l'objet de métadonnée (170) ou depuis l'inode (425) ;

une étape E5) de mappage (945), par le système de fichiers de valeur de clé (145), de la commande de système de fichiers (310) vers une seconde commande de système de valeur de clé (330) incluant le nom d'objet identifiant l'objet ;

une étape E6) de tentative (951), par le système de fichiers de valeur de clé (145), pour satisfaire la seconde commande de système de valeur de clé (330) à l'aide d'un cache de système de fichiers de valeur de clé (150) ;

la recherche (954), par le système de fichiers de valeur de clé (145), du cache de système de fichiers de valeur de clé (150) pour voir si le cache de système de fichiers de valeur de clé (150) stocke l'objet (165) ;

une étape E7), si l'objet (165) n'a pas été trouvé dans le cache de système de fichiers de valeur de clé (150), d'envoi (960), par le système de fichiers de valeur de clé (145), de la seconde commande de système de valeur de clé (330) vers le dispositif de stockage (120) pour récupérer l'objet (165) ;

la réception de l'objet (165) depuis le dispositif de stockage (120) ;

le stockage (966), par le système de fichiers de valeur de clé (145), d'une copie (155) de l'objet (165) dans le cache de système de fichiers de valeur de clé (150) ;

le renvoi de l'objet (165) vers le cache de page du système d'exploitation (130) ; et

la propagation de l'objet (165) vers l'application (125).


 
2. Procédé selon la revendication 1, dans lequel la tentative (951), par le système de fichiers de valeur de clé (145), pour satisfaire la seconde commande de système de valeur de clé (330) à l'aide d'un cache de système de fichiers de valeur de clé (150) comprend :
la recherche (954) du cache de système de fichiers de valeur de clé (150) pour le nom d'objet (180).
 
3. Procédé selon la revendication 1, comprenant en outre :

la réception (1305), au niveau de la cale de système de fichiers de valeur de clé (140), du nom d'objet (180) ; et

l'application (1310) d'une fonction de hachage au nom d'objet (180) pour produire un nom de fichier (175).


 
4. Procédé selon la revendication 3, comprenant en outre la génération (1315) du nom de fichier (175) en tant que représentation ASCII d'une valeur de hachage de la fonction de hachage.
 
5. Procédé selon la revendication 4, dans lequel la génération (1315) du nom de fichier (175) en tant que représentation ASCII de la valeur de hachage inclut la combinaison de la représentation ASCII de la valeur de hachage avec un indice de collision.
 
6. Système informatique mettant en œuvre le procédé selon l'une quelconque des revendications 1 à 5.
 




Cited references

REFERENCES CITED IN THE DESCRIPTION



This list of references cited by the applicant is for the reader's convenience only. It does not form part of the European patent document. Even though great care has been taken in compiling the references, errors or omissions cannot be excluded and the EPO disclaims all liability in this regard.

Non-patent literature cited in the description