mit 6583 lab1E
GODB的Lab1习题
Exercise 1
Q:
tuple.go
At this point, your code should pass the unit tests in tuple_test.go.
此时,您的代码应该通过 tuple_test.go 中的单元测试。
TupleDesc
类型FileType
1 | // FieldType is the type of a field in a tuple, e.g., its name, table, and [godb.DBType]. |
类型 TupleDesc
1 | // TupleDesc is "type" of the tuple, e.g., the field names and types |
TupleDesc 由一个FieldType切片组成,描述Tuple中数据的名称,Table名与类型
TupleDesc.equals 方法
1 | // Compare two tuple descs, and return true iff |
比较d1 和 d2 是否相同
TupleDesc.copy 方法
1 | // Make a copy of a tuple desc. Note that in go, assignment of a slice to |
将 td 拷贝一份并返回
TupleDesc.merge 方法
1 | // Merge two TupleDescs together. The resulting TupleDesc |
将 desc2 接到 desc 后, 并返回一个新的TupleDesc
Tuple
1 | // Interface used for tuple field values |
Tuple 由Desc描述,Fields 数据与recordID组成
Tuple.writeTo方法
1 | // Serialize the contents of the tuple into a byte array Since all tuples are of |
将 Tuple中的字段序列化写入 buffer中
区分 IntType 和 StringType
int类型以int64进行写入
string类型补充StringLength长度后写入
!不确定UnknownType 如何处理
暂时以报错进行处理
Tuple.readTupleFrom 方法
1 | // TODO: recordID inplement |
1 | // Read the contents of a tuple with the specified [TupleDesc] from the |
从buffer中读取数据,根据desc中提供的类型
int类型以int64读入
string类型以stringlength长度字节串读入。
将data中的数据转换为fields
!不确定UnknownType 如何处理
以报错处理
Tuple.equals 方法
1 | // Compare two tuples for equality. Equality means that the TupleDescs are equal |
比较 t1 和 t2 是否相等
TupleDesc 使用 TupleDesc.equals 方法
Fields 进行遍历比较
Tuple.joinTuples 方法
1 | // Merge two tuples together, producing a new tuple with the fields of t2 appended to t1. |
合并 t1 和 t2
先合并 Desc,使用Desc.merge方法
再合并Fields
Tuple.compareField 方法
1 | // Apply the supplied expression to both t and t2, and compare the results, |
首先获取t和t2的值
然后获取类型
根据类型进行类型断言处理,比较值的结果
Tuple.project 方法
1 | // Project out the supplied fields from the tuple. Should return a new Tuple |
从 tuples 投影出 fields中对应的字段
通过遍历进行寻找fields中对应的字段
测试
tuple_test
全部测试通过
Exercise 2
Q:
实现
getPage()方法buffer_pool.go
There is a unit test buffer_pool_test.go, but you will not be able to pass this testuntil you implement the heap file and heap page methods below. You will also test the functionalityof the buffer pool when you implement your heap file iterator.
When more than this many pages are in the buffer pool, one page should be evicted from the pool before the next is loaded. The choice of eviction policy is up to you; it is not necessary to do something sophisticated.
Notice that BufferPool asks you to implementa flush_all_pages() method. This is not something you would everneed in a real implementation of a buffer pool. However, we need this methodfor testing purposes. You really should never call this method from anywherein your code.
存在单元测试Buffer_Pool_Test.go,但在实现下面的堆文件和堆页面方法之前,您将无法通过此测试。在实现堆文件迭代器时,您还将测试缓冲池的功能。
当缓冲池中的页面数量超过这个数目时,应该在加载下一个页面之前将一个页面从池中逐出。驱逐政策的选择由你自己决定;没有必要做一些复杂的事情。
注意,BufferPool要求您实现flush_all_ages()方法。在真正的缓冲池实现中,这不是您永远需要的东西。
然而,出于测试目的,我们需要此方法。您真的不应该从代码中的任何位置调用此方法。
Bufferpool
1 | type BufferPool struct { |
BufferPool中含有一个map存储Page, numPages为BufferPool的容量
在HeapFile中实现的PageKey方法,为对应的FileName和PageNo生成一个哈希值作为key
Bufferpool.FlushAllPages()
1 | // Testing method -- iterate through all pages in the buffer pool |
遍历所有的Page,调用dbfile的flush方法
测试方法,仅用于测试,在实际使用中不需要
Bufferpool.GetPage()
1 | func (bp *BufferPool) GetPage(file DBFile, pageNo int, tid TransactionID, perm RWPerm) (*Page, error) { |
从BufferPool中获取pageNo对应的页面
根据Page的hash值,判断Page是否位于BufferPool中
若在,则直接返回
若不在,则从读取页面,存入BufferPool中
若BufferPool已满,则需要驱逐一个非脏页。
测试
buffer_pool_test.go
测试需要完成以下的HeapFile和HeapPage才能通过
Exercise 3
Q:
heap_page.go
Although you are not required to use exactly our interface for heap_page.go, you will likely find the methods we have provided to be useful and we recommend following our skeleton.
尽管您不需要完全使用我们的heap_page.go接口,但您可能会发现我们提供的方法很有用,我们建议您遵循我们的框架。
Assuming you follow our outline, there are five non-trivial methods to implement:
- insertTuple() : This method should add a tuple to the page if there is space. Because a heap file is unordered, itcan be inserted in any free slot.
- deleteTuple() : Delete a specific tuple from the page.Note that this method takes a specific recordID (or “rid”) to delete. recordID is an empty interface; you are freeto use any struct you like for the rid, but for a heap file a rid would typically include the page number and the slot number on the page.The page number would typically be the offset in the heap file of the page, and the slot number would likely by the position of the tuplein the in-memory slice of tuples on the page. You will set the rid field of the tuples you return from your iterator. Your heap file implementation should use this rid to identify the specific page to delete from, and then pass the rid into this method so that you can delete the appropriate tuple. Note that if you choose to represent a page in memory as a slice of tuples, and the slot in the rid is the position in the slice, you should take care to not cause the rid to change when you perform the deletion. One way to achieve this is to set the position in the slice to nil (rather than creating a new slice with the deleted tuple removed from it), but many implementations are possible.
- toBuffer() : Serialize the pages to a bytes.Buffer object for saving to disk, using the binary.Write() method to encode the header and the writeTo() method from your tuple implementation. Note that the header includes the number of used slots, but does not encode which slots are empty and which are not. This is ok, because, in GoDB you do not need to preserve the record ids of records when they are written out (so a particular tuple’s rid may change after it is written and then read back.)
- initFromBuffer() : Read the page from the specified buffer by reading the header with the binary.Read() method and then the tuples using the readTupleFrom() method.
- tupleIter() : Return a function that can be invoked to interate through the tuples of the page. See the note about iterators in 2.2 above.
假设您遵循我们的大纲,有五种重要的方法可以实现:
insertTuple():如果有空间,此方法应该向页面添加一个元组。 由于堆文件是无序的,因此可以将其插入到任何空闲槽中。deleteTuple():从页面中删除特定的元组。请注意,此方法需要特定的 recordID(或“rid”)来删除。 recordID是一个空接口; 您可以自由地使用任何您喜欢的结构来删除,但对于堆文件,删除通常包括页码和页面上的槽号。页号通常是该页的堆文件中的偏移量,而槽号可能是该页上元组的内存片中元组的位置。 您将设置从迭代器返回的元组的 Rid 字段。 您的堆文件实现应该使用此rid 来识别要从中删除的特定页面,然后将rid 传递到此方法中,以便您可以删除适当的元组。 请注意,如果您选择将内存中的页面表示为元组切片,并且rid中的槽是切片中的位置,则应注意在执行删除时不要导致rid发生更改。 实现此目的的一种方法是将切片中的位置设置为 nil(而不是创建一个新切片并从中删除已删除的元组),但许多实现都是可能的。toBuffer():将页面序列化为bytes.Buffer对象以保存到磁盘,使用binary.Write()方法对标头进行编码,并使用元组实现中的writeTo()方法 。 请注意,标头包括已使用的时隙数,但不编码哪些时隙为空、哪些时隙不是。 这是可以的,因为在 GoDB 中,您不需要在写出记录时保留记录的记录 ID(因此特定元组的 ID 在写入然后读回后可能会发生变化。)initFromBuffer(): 通过使用“binary.Read()”方法读取标头,然后使用“readTupleFrom()”方法读取元组,从指定缓冲区读取页面。tupleIter():返回一个可以被调用以通过页面的元组进行交互的函数。 请参阅上面 2.2 中有关迭代器的注释。There are a few other methods (setDirty(), isDirty(), getNumSlots(), and the newHeapPage() constructor) that you will need to implement, but these should be straightfoward.
At this point, your code should pass the unit tests in heap_page_test.go.
After you have implemented HeapPage, you will write methods for HeapFile thatread pages from the file, iterate through pages, and insert and deleterecords.
您还需要实现一些其他方法(setDirty()、isDirty()、getNumSlots() 和 newHeapPage() 构造函数),但这些方法应该很简单。
此时,您的代码应该通过“heap_page_test.go”中的单元测试。
实现“HeapPage”后,您将为“HeapFile”编写方法 从文件中读取页面、遍历页面以及插入和删除记录。
heapPage
1 | type heapPage struct { |
一个页面在文件中的存储如下
- 大小为PageSize
- 4个字节存储numSlots
- 4个字节存储numUsed
- 剩下存储numSlots个槽的内容
:::info
!Attention:
若页面不能刚好存储整数倍个槽的时候或有空闲的槽,存在空位需要用0来填充(否则影响下一个页面)
将页面内容写入文件时,空的槽不写入,存数据的槽直接写入
下次读取页面的时候,数据将填充前numUsed个槽
简单理解: 0, 2, 4, 5, 6, 10槽有数据,写入文件,则文件前6个槽中有数据
下次读取时,RecordID变为0,1,2,3,4,5
:::
NumSlots
1 | func (h *heapPage) getNumSlots() int { |
返回现有的槽数,空闲的槽数
insertTuple
1 | // Insert the tuple into a free slot on the page, or return an error if there are |
在heapPage中,可以通过遍历tuples数组,若值为nil则说明为空,插入Tuple
deleteTuple
1 | // Delete the tuple in the specified slot number, or return an error if |
删除指定recordID的元组
Dirty & File
1 | // Page method - return whether or not the page is dirty |
toBuffer()
1 | func (h *heapPage) toBuffer() (*bytes.Buffer, error) { |
首先写入槽位数和已使用槽位数
接着将非空的写入
将剩余空间以0进行填充
读写前后,元组的RID会发生变化
initFromBuffer
1 | // Read the contents of the HeapPage from the supplied buffer. |
从字节数据中初始化页面
首先初始化页头
然后将存有数据的元组读取出来
tupleIter
1 | func (p *heapPage) tupleIter() func() (*Tuple, error) { |
返回一个迭代器函数函数
每调用一次该函数,将会返回下一个非空tuple,若为结尾则返回空
测试
heap_page_test.go
全部测试通过
Exercise 4
Q:
Implement the skeleton methods in:
heap_file.go
There are a number of methods you need to implement; we have provided additional implementation tips in the comments inheap_file.go.
NewHeapFile()- The constructor. It takes a file name that contains the binary encoding of the file (we name thesetable.datby convention), as well as the TupleDesc that can be used to determine the expected format of the file and a buffer pool object that you will use to retrieve cached pages.NumPages()- Return the number of pages in the heap file; you can use theFile.Stat()method to determine the size of the heap file in bytes.readPage()- Read a specific page from storage. To read a page from disk, you will first need to calculate the correct offset in
the file. Hint: you will need random access to the file in order to read and
write pages at arbitrary offsets – check out the golangos.Filetype and itsReadAt()method.
You should not callBufferPoolmethods when reading a page from disk in thereadPage()method, but you will
use the buffer poolgetPage()method in your implementations of the heap fileiterator. Once you have read in the bytes of the page you can create the page using the heap page methodnewHeapPage(). You can convert bytes read from a file to a buffer via thebytes.NewBuffer()method.flushPage()- Force a given page object back to disk. The supplied page will be aHeapPage; you should cast it and retrieve its bytes via the heap page methodtoBytes(). You can then write these bytes back to the appropriate location on disk by opening the backing file and using a method likeos.File.WriteAt().insertTuple()- Add a tuple to the heap file; because the heap file is unordered, it can be inserted in any free slot in the filedeleteTuple()- Remove a specific tuple from the heap file. You should use the rid field of the tuple to determine which page the
tuple is in, and call the heap page methoddeleteTuple()on the appropriage page.Descriptor()Iterator()- Return a function that iterates through the tuples of the heap file one at a time. You should iterate through the pages and use thetupleIter()to iterate through the the tuples of each heap page. See the note above about iterators in GoDB in 2.2 above.
This method should read pages using the buffer pool methodgetPage()which will eventually be used (in
a later lab) to implement locking-based concurrency control and recovery. Do
not load the entire table into memory when the iterator is instantiated – this will cause an
out of memory error for very large tables. Instead, you will just load one page at a
time as the buffer pool accesses them via calls toreadPage().pageKey()- Return a struct that can be used as a key for the page. The buffer pool uses this to determine whether the page is cached or not. We have provided an implementation hint in the comment of this function.
At this point, your code should pass the unit tests in heap_file_test.go and buffer_pool_test.go. This completes the tests for this lab. You should complete the final exercises in the next section.
您需要实施多种方法; 我们在 heap_file.go 的注释中提供了额外的实现技巧。
- NewHeapFile() - 构造函数。 它需要一个文件名,其中包含文件的二进制编码(我们按照惯例将这些文件命名为 table.dat),以及可用于确定文件的预期格式和您将使用的缓冲池对象的 TupleDesc 检索缓存的页面。
- NumPages()——返回堆文件中的页数; 您可以使用 File.Stat() 方法来确定堆文件的大小(以字节为单位)。
- readPage() - 从存储中读取特定页面。 要从磁盘读取页面,您首先需要计算正确的偏移量文件。 提示:您需要随机访问该文件才能读取和 以任意偏移量写入页面——查看 golang os.File 类型及其 ReadAt() 方法。在 readPage() 方法中从磁盘读取页面时,不应调用 BufferPool 方法,但您会 在堆文件迭代器的实现中使用缓冲池 getPage() 方法。 读入页面的字节后,您可以使用堆页面方法 newHeapPage() 创建页面。 您可以通过 bytes.NewBuffer() 方法将从文件读取的字节转换为缓冲区。
- flushPage() - 强制给定的页面对象返回磁盘。 提供的页面将是HeapPage; 您应该通过堆页方法 toBytes() 对其进行强制转换并检索其字节。 然后,您可以通过打开备份文件并使用 os.File.WriteAt() 等方法将这些字节写回磁盘上的适当位置。
- insertTuple() - 将元组添加到堆文件中; 由于堆文件是无序的,因此可以将其插入到文件中的任何空闲槽中
- deleteTuple() - 从堆文件中删除特定的元组。 您应该使用元组的rid字段来确定哪个页面 tuple进入,并在相应的页上调用堆页方法deleteTuple()。
- Descriptor()
- Iterator() - 返回一个一次迭代堆文件元组的函数。 您应该迭代页面并使用 tupleIter() 迭代每个堆页面的元组。 请参阅上面 2.2 中关于 GoDB 迭代器的注释。此方法应使用最终将使用的缓冲池方法 getPage() 读取页面(在 稍后的实验)来实现基于锁定的并发控制和恢复。 做 当迭代器实例化时,不要将整个表加载到内存中——这将导致 非常大的表出现内存不足错误。 相反,您只需加载一页 缓冲池通过调用 readPage() 访问它们的时间。
- pageKey() - 返回一个可以用作页面键的结构。 缓冲池使用它来确定页面是否被缓存。 我们在该函数的注释中提供了实现提示。
此时,您的代码应该通过 heap_file_test.go 和 buffer_pool_test.go 中的单元测试。 本实验室的测试就此完成。 您应该完成下一节中的最终练习。
HeapFile
1 | type HeapFile struct { |
fromFile文件名
td tuple描述
NumPages
1 | // Return the number of pages in the heap file |
此处的页面数是根据数据库文件的大小确定的
readPage
1 | func (f *HeapFile) readPage(pageNo int) (*Page, error) { |
从文件中对应位置读取内容
然后初始化页面
flushPage
1 | func (f *HeapFile) flushPage(p *Page) error { |
将页面写入文件中对应位置
insert & delete Tuple
1 | func (f *HeapFile) insertTuple(t *Tuple, tid TransactionID) error { |
首先看现有的page中是否有空的slot
使用bufPool读取page
如果有,则插入tuple
否则,新建一个页面,插入tuple,并将页面插入到文件的最后(flush page)
1 | func (f *HeapFile) deleteTuple(t *Tuple, tid TransactionID) error { |
根据Tuple的RID获取page
再将RID传给page的删除tuple方法进行tuple删除
Descriptor
1 | func (f *HeapFile) Descriptor() *TupleDesc { |
Iterator
1 | func (f *HeapFile) Iterator(tid TransactionID) (func() (*Tuple, error), error) { |
pageNo负责迭代页面
iter负责页面内的元组迭代
注意迭代页面时将pageNo递增同时将iter置为nil,方便下个页面iter的获取
heapHash
1 | // internal strucuture to use as key for a heap page |
将传入的pgNo与文件名进行hash
测试
heap_file_test.go
测试通过,但大数据读写效率较低
buffer_pool_test.go
测试通过
Exercise 5
Q
lab1_query.go
We have supplied a simple test case for you for this method in lab1_query_test.go, although we will also test it with other files to confirm your implementation is working.
我们在“lab1_query_test.go”中为此方法提供了一个简单的测试用例,尽管我们还将使用其他文件对其进行测试以确认您的实现正常工作。
computeFieldSum
1 | func computeFieldSum(fileName string, td TupleDesc, sumField string) (int, error) { |
新建一个heapFile,指定数据库目录
从CSV中加载数据库
对数据库进行查询,返回特定Field的sum值
测试
lab1_query_test.go
测试通过