
Computer Science EE – 3863 Words

Advances in File Systems: the effective improvement of APFS over HFS+

grz888

Table of Contents
1 Introduction
    1.1 File Systems
    1.2 Importance of File Systems
    1.3 Hierarchical File System (Plus)
    1.4 Apple File System and its Advantages
2 Investigation
    2.1 Test Results
    2.2 Graphing and Analysis
    2.3 Limitations of Investigation
    2.4 Secondary Data
3 Conclusion
    3.1 Evaluation
Bibliography
Appendix

1 Introduction

Computer system utilization is at a peak. Humans use computer-based technologies for most daily activities, and this skyrocketing of computer usage has led to a massive increase in data storage. In 2012, it was estimated that the cumulative quantity of digitally stored data on Earth was 2.7 zettabytes. We must therefore look for ways to improve data storage methods in order to ease transfer and increase speeds. For this, we should look at the most basic level of storage device organization: file systems. In this vein, Apple, Inc. replaced its nineteen-year-old Hierarchical File System (Plus) with a more optimized file system known as the Apple File System (APFS). To see how advances in file systems actually affect real-world computation, I have settled on the research question: “To what extent does the Apple File System improve upon the Hierarchical File System (Plus)?” and will therefore investigate the changes in transfer speed that result from the new file system.
Ultimately, I will be able to evaluate whether improvements in file systems result in a visible increase in access speeds, which would lead to faster computation overall.

1.1 File Systems

File systems are features of storage devices that control how the data on those devices is accessed. Their basic function is to keep pieces of data separate on the storage device and to store an address for each, identified by a name. This allocation of space is essential for both access and storage on the disk. Furthermore, file systems determine the organization of files and directories on the drive.

Normally, file systems are composed of three parts: the logical file system, the virtual file system and the physical file system. The logical file system provides the application programming interface (API) for read and write operations on the disk. It passes operations to the next layer, the virtual file system. This layer has relatively little function of its own and simply serves as an interface that allows multiple file systems to be present on the same disk, which is achieved by partitioning the storage device. The physical file system manages the physical operations on files or blocks on the disk, and also handles memory management and the buffering of operations.

1.2 Importance of File Systems

Without file systems, the access and storage of data would not be possible. They specify the essential aspects of files that make them accessible. File systems control:

Space Management: File systems generally allocate space granularly, which means that the disk is divided into small physical units, or blocks, each of which holds part of a file. Multiple ‘grains’ make up a file, hence the term ‘granularity’.

File Naming: A file name refers to a storage location in the file system. Some file systems have case-sensitive file names, while others do not.
There may also be restrictions on the length of file names.

Directories and Folders: Directories are containers for files and possibly for sub-directories as well. They can be implemented either through indexes in tables of contents or through inodes. An inode is a data structure in a Unix-style file system that describes a file system object such as a file or a directory. Each inode stores the attributes and disk block location(s) of the object’s data.

Metadata: Additional information that allows for easy bookkeeping, such as time created, time modified, owner and permissions.

1.3 Hierarchical File System (Plus)

The Hierarchical File System (Plus) (HFS+) is the updated version of the former Hierarchical File System (HFS). It was released almost two decades ago, in 1998, and has received a large amount of criticism, mainly for lacking features that are common in modern file systems.

The structure of a typical HFS+ volume revolves around the following objects:

HFS Boot Blocks: In charge of starting up the drive and its components, hence the name ‘boot’ blocks.

Volume Header: Contains data about the volume itself, such as timestamps and the size of the allocation blocks in the volume.

Allocation File: Keeps track of which allocation blocks in the volume are free. Allocation blocks are groupings of sectors of the storage device. Together, these blocks make up file storage locations.

Catalog File: Records the files and folders in the volume and acts like a table of contents, showing which file is present in which directory.

Attributes File: Stores metadata corresponding to files in the volume.

The absence of data checksums in HFS+ has led to a great deal of criticism of the file system since its release. Further problems stem from the age of HFS+.
It was not designed for Unix-based systems such as Mac OS X, and therefore led to many inefficiencies when Apple moved its computers to the new OS while keeping the same old file system. Another weakness of HFS+ is that it was designed for older processors that were big-endian, that is, they store the most significant byte of a value first. However, the newer processors that current Apple computers use are little-endian, storing the least significant byte first. For this reason, the byte order of the metadata must be swapped every time a read or write operation occurs.

1.4 Apple File System and its Advantages

APFS is not an extension of HFS+. From HFS+, we are familiar with special files such as the catalog file, attributes file, allocation file and extents overflow file. These files no longer exist in APFS, nor does the journal. APFS uses a different strategy to ensure that changes to the file system are made safely.

The major components of the APFS file system are:

Container Superblock: Contains information about the entire APFS container, such as the block size limitations, the total number of blocks and previous checkpoints.

Checkpoint Superblock Descriptor: Contains information about metadata structures in APFS.

Volume Superblock: The highest level in a volume; it contains information about that volume.

File and folder B-Tree: Records all files and folders in the volume. It performs the same role as the catalog file in HFS+.

Extents B-Tree: A separate B-Tree of all extents per volume. Extents are references to file content, with information about where the data content starts and its length in blocks.

Snapshots: A snapshot is a user-stored state of a volume as it was when the snapshot was created.

Checkpoints: A checkpoint is a historical state of the container.

In theory, APFS has an edge over HFS+. APFS better suits the needs of today’s and tomorrow’s computers and mobile devices because it is made for solid-state storage, that is, flash memory and SSDs.
These storage technologies work differently from spinning drives, so it makes sense to optimize the file system to take advantage of them.

HFS+ supports 32-bit file IDs, for example, while APFS raises that to 64-bit. That means that while HFS+ can keep track of about 4 billion individual pieces of information on a drive, APFS can keep track of about 9 quintillion. APFS can therefore keep track of orders of magnitude more data than HFS+. In addition, instead of duplicating information as HFS+ does, APFS updates metadata links to the actual stored information.

Space Sharing is another new feature of APFS. It helps the system manage free space on its drives more efficiently. You can set up multiple partitions, even multiple file systems, on a single physical device, and all of them can share the same space. Without it, you have to jump through hoops if you are resizing partitions and want to re-use de-allocated space. APFS views individual physical devices as “containers” with multiple “volumes” inside, which contrasts with the approach of HFS+.

2 Investigation

To investigate the quantitative level of advancement of APFS over HFS+, if it exists, I decided to test the read and write speeds under each file system. APFS was developed to increase the quality and efficiency of the file system that Apple’s operating systems use, which should eventually lead to increases in transfer speeds. For this task, I used XBench®, a freeware tool developed by Spiny Software for benchmarking Macintosh machines. For disk testing, XBench runs sequential access and random access tests on the storage medium being tested. It first checks read and write speeds while transferring data in 4-kilobyte units, and then repeats the checks with 256-kilobyte units. The output of XBench is given in megabytes per second (MB/sec). All tests are uncached, which keeps the readings consistent.
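
To make the four kinds of read test concrete, the following is a minimal Python sketch of how sequential and random reads at the two unit sizes could be timed. It is not how XBench works internally, which this essay does not have access to; the function names (`read_throughput`, `run_read_tests`, `make_test_file`) are hypothetical, and unlike XBench this sketch does not bypass the operating system's cache, so its numbers would not be comparable to the readings in this investigation.

```python
import os
import random
import tempfile
import time


def read_throughput(path, block_size, sequential=True, seed=0):
    """Read the whole file once in block_size units and return MB/s."""
    file_size = os.path.getsize(path)
    block_count = file_size // block_size
    # Offsets of every block, in order (sequential access).
    offsets = [i * block_size for i in range(block_count)]
    if not sequential:
        # Random access: visit the same blocks in a shuffled order.
        random.Random(seed).shuffle(offsets)

    bytes_read = 0
    start = time.perf_counter()
    # buffering=0 disables Python's own buffer (not the OS cache).
    with open(path, "rb", buffering=0) as f:
        for offset in offsets:
            f.seek(offset)
            bytes_read += len(f.read(block_size))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return bytes_read / elapsed / 1_000_000


def run_read_tests(path):
    """Run the four read tests named in this investigation."""
    results = {}
    for label, block_size in (("4K", 4 * 1024), ("256K", 256 * 1024)):
        for mode, sequential in (("Sequential", True), ("Random", False)):
            name = f"{mode} {label} Read"
            results[name] = read_throughput(path, block_size, sequential)
    return results


def make_test_file(size_bytes=4 * 1024 * 1024):
    """Create a temporary file of random bytes to read back."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path
```

Calling `run_read_tests(make_test_file())` returns a dictionary with one MB/s value per test name. To compare file systems, the test file would have to be created on the drive under test rather than in the default temporary directory.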
For additional information about XBench and benchmarking in the investigation, refer to Appendix 2.

To get readings with which to compare the file systems, I first formatted an external USB drive to HFS+. Then I benchmarked it using the XBench® software, obtaining values for the following:

Sequential 4K Read
Sequential 4K Write
Sequential 256K Read
Sequential 256K Write
Random 4K Read
Random 4K Write
Random 256K Read
Random 256K Write

Following this, I formatted the same drive to APFS and benchmarked it to get readings for the same tests.

To keep the readings consistent, the same computer was used throughout: a MacBook Air with a 1.4 GHz Intel Core i5 processor and 4 GB of 1600 MHz DDR3 RAM. All other user-run applications were stopped, so that only background system processes were running. The laptop was connected to battery power to ensure consistent full performance of the processor.

Storage Devices: To make sure that the quality of the pen drives did not affect the readings, three different pen drives were used. They were all of the same size, 8 GB, and of the same standard, USB 2.0. The following pen drives were used for this experiment:

Transcend USB 2.0 – 8 GB
SanDisk SanCruzer USB 2.0 – 8 GB
Sony USB 2.0 – 8 GB

All three were inserted into the same port on the computer to ensure further consistency in the investigative process.

2.1 Test Results

Hierarchical File System Plus (HFS+) Raw Data (values in MB/second)

Apple File System (APFS) Raw Data (values in MB/second)

From the raw data collected in the investigation, we can see an overall increase in data transfer speeds with APFS. Another general trend is that random access is slower than sequential access under both file systems. Furthermore, reading and writing larger amounts of data, that is, 256 kilobytes, is faster than reading and writing 4 kilobytes.
This also applies in both cases.

2.2 Graphing and Analysis

To gain a better understanding of the significance of the results, the average speeds in all tests were taken for both the Apple File System and the Hierarchical File System Plus. The average speeds obtained were then graphed to give a visual depiction of the recordings.

Hierarchical File System Plus (HFS+)

Apple File System (APFS)

Averages for Sequential Access

In terms of sequential access, the pen drives performed much better when formatted with the Apple File System. The increase is approximately two-fold in both Sequential 4K Write and Sequential 256K Write. For 256K Read, however, the increase in speed under APFS is almost four-fold. This can be better understood in the following graph:

The above line graph uses the average speeds as chart data. It shows that the shapes of the curves are quite similar, which suggests that the results have not been influenced by other factors and that they are accurate. Moreover, we can see the significant increase in speed of the Apple File System format in comparison to the Hierarchical File System Plus format. To get a better understanding of the extent to which APFS is better, we calculate the percentage increase in speeds and graph the results:

This result shows that APFS is faster than HFS+ in all aspects of sequential access. However, the increase in read speeds is relatively higher than the increase in write speeds. Similarly, the increase in 256K speeds is relatively higher than the increase in 4K speeds.

Averages for Random Access

In the random access tests, we see increases with APFS just as in sequential access. The improvements follow a similar trend, as can be seen in the above data.
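
The averaging and percentage-increase calculations used throughout this section can be sketched in Python as follows. The function names are hypothetical, and the numbers in the usage note are placeholders chosen to illustrate the terms "two-fold" and "four-fold"; they are not readings from this investigation.

```python
def average(values):
    """Mean of the readings from the individual pen drives."""
    return sum(values) / len(values)


def percentage_increase(hfs_speed, apfs_speed):
    """Percentage by which the APFS speed exceeds the HFS+ speed."""
    if hfs_speed <= 0:
        raise ValueError("the HFS+ speed must be positive")
    return (apfs_speed - hfs_speed) / hfs_speed * 100.0
```

With placeholder speeds, `percentage_increase(2.0, 4.0)` gives 100.0, so a two-fold increase corresponds to a 100% increase, and `percentage_increase(1.0, 4.0)` gives 300.0, so a four-fold increase corresponds to a 300% increase.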
The following line graph compares the speeds of the two file systems:

The above line graph shows that the shapes of the curves are quite similar to those in the sequential access graphs, which suggests that the results are accurate. We continue to see the significant increase in speed of the Apple File System format in comparison to the Hierarchical File System Plus format. The percentage increase in speeds is calculated and graphed as follows:

This result shows that APFS is faster than HFS+ in all aspects of random access. However, the increase in read speeds is relatively higher than the increase in write speeds. The major difference between the random access tests and the sequential access tests is that, in this case, the percentage increase for 4K read is higher than for 256K read. This is due to the way random access works in general. When there is more data, random access tends to be less efficient at getting the information from the data blocks. Sequential access is more reliable in this case, that is, in 256-kilobyte read testing.

2.3 Limitations of Investigation

Although steps and precautions were taken to ensure a consistent environment, the accuracy of the resulting data may be limited.

Possible Reasons for Inaccuracy

The benchmarking software used may not time the transfer of data accurately, since there is no transparency in how the output data is produced.
The computer system used may not be optimal, since background processes run by the operating system cannot be controlled consistently.
The storage devices tested may have been slowed by age or over-use.

Possible Fixes to Above Issues

It might be better to develop a method specifically for this process, since the benchmarking software used may not give accurate recordings.
A dedicated environment should be created on a desktop computer, since laptop computers may not be consistent due to changes in battery life.
The pen drives used must be new, to prevent any quality issues from affecting performance.

2.4 Secondary Data

Due to the supposed limitations
in the gathering of test data, I referred to external sources. If the information from the secondary source matches the trend in the graphs created, a suitable conclusion can be reached.

The complete secondary data has been added to Appendix 1. The results are from an experiment that benchmarked a Solid State Drive (SSD) formatted in APFS and in HFS+. On both file systems, 128 files were transferred while being tested by sysbench®, the benchmarking software used. Each file was 800 megabytes in size, and the total data transferred was around 100 gigabytes. The processed data is as follows:

The transfer speeds are, in general, much higher than the ones from the investigation. There are several reasons for this. For one, the device tested here is the primary storage device of the computer used. Moreover, SSDs are much faster than flash disks. Additionally, the system used to benchmark the file systems is more powerful than the one used to collect the primary data: the secondary data was collected on a 2017 MacBook Pro A1706 with i5-7267U, 16 GB 2133 RAM and AP0256J NVMe disk. Therefore, we can only compare the trends in the two sets of data, not the exact figures. For this reason, the above processed data has been graphed in order to compare it to the experiment’s data:

From the bar graph above, it can be seen that the trends from the investigation also appear in the secondary data. Apart from the obvious advantage of read speeds over write speeds, we can see that the Apple File System has a significant data transfer speed advantage over the Hierarchical File System Plus. This agrees with my calculated data and graphs. Therefore, we can reach a suitable conclusion.

3 Conclusion

After analyzing the test results and comparing them to secondary benchmarks, I have reached a suitable conclusion for this experiment: the Apple File System (APFS) is a significant improvement over the Hierarchical File System Plus (HFS+) when tested on a flash drive.
This is because APFS is optimized for these devices, whereas HFS+ was made for spinning disk systems.

3.1 Evaluation

The experiment also yields other observations. Read speeds are generally faster than write speeds on flash drives under both file systems. Usually, transferring larger amounts of data (256 kilobytes) is faster than transferring smaller amounts (4 kilobytes). However, for random access, the percentage increase in 256-kilobyte read is much smaller than in 4-kilobyte read. This is because, when there is more data, random access tends to be less efficient at getting the information from the data blocks. Sequential access is more reliable in this case, that is, in 256-kilobyte read testing.

Therefore, we see that disks formatted with the Apple File System (APFS) are faster than HFS+ storage devices in both sequential and random access. After researching APFS and HFS+, I concluded that APFS is faster because of the more efficient storage of data in its data blocks. This makes it much better suited to flash drives and Solid-State Drives, of which flash drives were used here. Since external USB drives are not spinning-disk storage devices, HFS+ is not optimal for storage on them.

Bibliography

Statistics from http://www.forbes.com/sites/ciocentral/2012/05/01/big-data-the-hidden-opportunity/ accessed Nov 23 2017.
Definition of Inode from https://en.wikipedia.org/wiki/Inode accessed Nov 25 2017.
Information about APFS from https://www.researchgate.net/publication/319573636_Decoding_the_APFS_file_system accessed Nov 23 2017.
Secondary data from https://malcont.net/2017/09/apfs-vs-hfs-benchmarks-on-2017-macbook-pro-with-macos-high-sierra/ accessed Jan 29 2018.
Information about HFS+ from https://wikipedia.org/HFS_Plus/ accessed Nov 24 2018.
Information about APFS from https://wikipedia.org/Apple_File_System/ accessed Nov 24 2018.

Appendix

Appendix 1

Data from Secondary Source: